Re: [HACKERS] Synchronization levels in SR
On Thu, 2010-05-27 at 11:28 +0900, Fujii Masao wrote: On Wed, May 26, 2010 at 10:20 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, 2010-05-26 at 18:52 +0900, Fujii Masao wrote: I guess that dropping the support of #3 doesn't reduce complexity since the code of #3 is almost the same as that of #2. Like walreceiver sends the ACK after receiving the WAL in #2 case, it has only to do the same thing after the WAL flush. Hmm, well the code for #3 is similar also to the code for #4. So if you do #2, its easy to do #2, #3 and #4 together. No. #4 requires the way of prompt communication between walreceiver and startup process, but #2 and #3 not. That is, in #4, walreceiver has to wake the startup process up as soon as it has flushed WAL. OTOH, the startup process has to wake walreceiver up as soon as it has replayed WAL, to request it to send the ACK to the master. In #2 and #3, the prompt communication from walreceiver to startup process, i.e., changing the poll loop in the startup process would also be useful for the data to be visible immediately on the standby. But it's not required. You need to pass WAL promptly on primary from backend to WALSender. Whatever mechanism you use can also be reused symmetrically on standby to provide #4. So not a problem. The comment is about whether having #3 makes sense from a user interface perspective. It's easy to add options, but they must have useful meaning. #3 would be useful for people wanting further robustness. In #2, when simultaneous power failure on the master and the standby, and concurrent disk crash on the master happen, transaction whose success indicator has been returned to a client might be lost. #3 can avoid such a critical situation. This is one of reasons that DRBD supports Protocol C, I think. Which few people use it, or if they do its because DRBD didn't originally support multiple standbys. Not worth emulating IMHO. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [spf:guess] Re: [HACKERS] ROLLBACK TO SAVEPOINT
On 27/05/10 03:25, Florian Pflug wrote: On May 27, 2010, at 0:58 , Heikki Linnakangas wrote: On 26/05/10 02:00, Sam Vilain wrote: Florian Pflug wrote: On May 25, 2010, at 12:18 , Heikki Linnakangas wrote: Releasing the newer savepoint will cause the older one to again become accessible, as the doc says, but rolling back to a savepoint does not implicitly release it. You'll have to use RELEASE SAVEPOINT for that. Ah, now I get it. Thanks. Would changing Releasing the newer savepoint will cause ... to Explicitly releasing the newer savepoint or maybe even Explicitly releasing the newer savepoint with RELEASE SAVEPOINT will cause ... make things clearer? Yes, probably - your misreading matches my misreading of it :-) +1. Patch that changes the wording to Explicitly releasing the newer savepoint with RELEASE SAVEPOINT will cause ... is attached. Thanks, committed. I left out the Explicitly, though, because as Sam pointed out the newer savepoint can also be implicitly released by rolling back to an earlier savepoint. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
I think we should fix it now. Quick thought: maybe we could use FOR instead of AS: select myfunc(7 for a, 6 for b); IIRC the standard's mechanism for this is 'paramname = value', but I think that has problems because of our possibly use of = as an operator - otherwise that would be by far the best way to go. What is advice of FOR instead AS? it is exactly same. Regards Pavel cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
2010/5/27 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: On 27/05/10 02:09, alvherre wrote: Excerpts from Andrew Dunstan's message of mié may 26 18:52:33 -0400 2010: I think we should fix it now. Quick thought: maybe we could use FOR instead of AS: select myfunc(7 for a, 6 for b); IIRC the standard's mechanism for this is 'paramname = value', but I think that has problems because of our possibly use of = as an operator - otherwise that would be by far the best way to go. I think we were refraining from = because the standard didn't specify this back then -- AFAIU this was introduced very recently. But now that it does, and that the syntax we're implementing conflicts with a different feature, it seems wise to use the standard-mandated syntax. The problem with the = operator seems best resolved as not accepting such an operator in a function parameter, which sucks but we don't seem to have a choice. Perhaps we could allow = to resolve as the operator for the case the user really needs to use it; or a schema-qualified operator. AFAIU, the standard doesn't say anything about named parameters. Oracle uses =, but as you said, that's ambiguous with the = operator. +1 for FOR. I don't see any advantage of FOR. We can change ir to support new standard or don't change it. Pavel -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, 2010-05-27 at 02:18 +0300, Heikki Linnakangas wrote: On 27/05/10 01:23, Simon Riggs wrote: On Thu, 2010-05-27 at 00:21 +0300, Heikki Linnakangas wrote: On 26/05/10 23:31, Dimitri Fontaine wrote: d. choice of commit or rollback at timeout Rollback is not an option. There is no going back after the commit record has been flushed to disk or sent to a standby. There's definitely no going back after the xid has been removed from procarray because other transactions will then depend upon the final state. Currently we PANIC if we abort after we've marked clog, though that happens after XLogFlush(), which is where we're planning to wait for synch rep. If we abort after having written a commit record to disk we can still successfully generate an abort record as well. (Luckily, I note HS does actually cope with that. Phew!) So actually, an abort is a reasonable possibility, though I know it doesn't sound like it could be at first thought. Hmm, that's an interesting thought. Interesting, as in crazy ;-). :-) It's a surprising thought for me also. I don't understand how HS could handle that. As soon as it sees the commit record, the transaction becomes visible to readers. I meant not-barf completely. The choice is to either commit anyway after the timeout, or wait forever. Hmm, wait forever. What happens if we try to shutdown fast while there is a transaction that is waiting forever? Is that then a commit, even though it never made it to the standby? How would we know it was safe to switchover or not? Hmm. Refuse to shut down until the standby acknowledges the commit. That's the only way to be sure.. In practice, hard synchronous don't return ever until the commit hits the standby behavior is rarely what admins actually want, because it's disastrous from an availability point of view. More likely, admins want wait for ack from standby, unless it's not responding, in which case to hell with redundancy and just act like a single server. It makes sense if you just want to make sure that the standby doesn't return stale results when it's working properly, and you're not worried about durability but I'm not sure it's very sound otherwise. Which is also crazy. If you're using synch rep its because you care deeply about durability. Some people wish to treat the COMMIT as a guarantee, not just a shrug. I agree that don't-return-ever isn't something anyone will want. What we need is a COMMIT with ERROR message! Note that Oracle gives the options of COMMIT | SHUTDOWN at this point. Shutdown is an implicit abort for the writing transaction... At this point the primary thinks standby is no longer available. If we have a split brain situation then we should be assuming we will STONITH and shutdown the primary anyway. If we have more than one standby we can stay up and probably shouldn't be sending an abort after a commit. The trouble is *every* option is crazy from some perspective, so we must consider them all, to see whether they are practical or impractical. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
2010/5/27 Tom Lane t...@sss.pgh.pa.us: Robert Haas robertmh...@gmail.com writes: On Wed, May 26, 2010 at 8:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: If we go with the spec's syntax I think we'd have no realistic choice except to forbid = altogether as an operator name. (And no, I'm not for that.) I suppose the most painful thing about doing that is that it would break hstore. Are there other commonly-used modules that rely on = as an operator name? There don't seem to be any other contrib modules that define = as an operator name, but I'm not sure what's out there on pgfoundry or elsewhere. The bigger issue to me is not so much hstore itself as that this is an awfully attractive operator name for anything container-ish. Wasn't the JSON-datatype proposal using = for an operator at one stage? (The current wiki page for it doesn't seem to reflect any such idea, though.) And I think I remember Oleg Teodor proposing such an operator in conjunction with some GIN-related idea or other. In spite of the difficulties, I'm reluctant to give up on it. I always thought that the AS syntax was a crock and I'm not eager to invent another crock to replace it. Being compatible with the SQL standard and with Oracle is not to be taken lightly. Yeah, I know. Though this could end up being one of the bits of the spec that we politely decline to follow, like upper-casing identifiers. Still, it's a good idea to think again before we've set the release in stone ... we have a last minutes for decision. any other change will need years - like 'standard strings'. I agree so it's not good time for change. But this change is a few lines in parser. Regards Pavel regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
2010/5/27 Robert Haas robertmh...@gmail.com: On Wed, May 26, 2010 at 9:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, May 26, 2010 at 8:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: If we go with the spec's syntax I think we'd have no realistic choice except to forbid = altogether as an operator name. (And no, I'm not for that.) I suppose the most painful thing about doing that is that it would break hstore. Are there other commonly-used modules that rely on = as an operator name? There don't seem to be any other contrib modules that define = as an operator name, but I'm not sure what's out there on pgfoundry or elsewhere. The bigger issue to me is not so much hstore itself as that this is an awfully attractive operator name for anything container-ish. Wasn't the JSON-datatype proposal using = for an operator at one stage? (The current wiki page for it doesn't seem to reflect any such idea, though.) And I think I remember Oleg Teodor proposing such an operator in conjunction with some GIN-related idea or other. In spite of the difficulties, I'm reluctant to give up on it. I always thought that the AS syntax was a crock and I'm not eager to invent another crock to replace it. Being compatible with the SQL standard and with Oracle is not to be taken lightly. Yeah, I know. Though this could end up being one of the bits of the spec that we politely decline to follow, like upper-casing identifiers. Still, it's a good idea to think again before we've set the release in stone ... Perhaps one idea would be to: 1. Invent a new crock for now. 2. Add a duplicate version of the hstore = operator with a different name. 3. Emit a warning whenever an operator called = is created. 4. Document that beginning in PG 9.1, we will no longer support = as an operator name. +1 Pavel That's still going to cause a fair amount of pain, but certainly less if we decide it now rather than later. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [RFC] Security label support
As we talked at the developer meeting on Ottawa, it needs to provide a capability to assign a short text identifier on database objects to support label based ESP (such as SELinux). So, I'd like to propose a few approaches to support security label as a draft of discussion. An example of label: system_u:object_r:sepgsql_ro_table_t:s0. The format/contains/meanings of the security label shall be parsed and validated by ESP module, so all we need to do is associate such a short text on a certain database. It is quite similar to COMMENT ON. I don't want to support multiple labels of an object in this stage, because it makes ESP interfaces more complex and it is unclear whether it is actually wanted. For example, OS does not support multiple MAC features concurrently. Here are a few idea to support security labels. In this stage, I think the idea of [2] is most reasonable for us. (Perhaps, I guess Stephen has same opinion, because the idea was originally came from him.) [1] Inject a text label field to every system catalog - This idea tries to add a new field to the schema of existing system catalog. Its implementation will be simple at first, however, it will be entirely painful to modify every system catalog definitions and (typically) Create() functions under the src/backend/commands/. I doubt it is a correct way, even if short-term development. It will be reasonabel just only conceptual development. [2] Using OID as a key of text representation in separated catalog -- This idea is similar to pg_description/pg_shdescription. A new system catalog pg_seclabel and pg_shseclabel stores text form of labels for pair of the relation-Id, object-Oid and object-Subid. It does not damage to the schema of existing system catalog, It adds two new system catalogs; pg_seclabel (local) and pg_shseclabel (shared). The catalogs shall be declared as follows: CATALOG(pg_seclabel, 3037) BKI_WITHOUT_OIDS { Oid relid; /* OID of the catalog containing the object */ Oid objid; /* OID of the object itself */ int4subid; /* column number, or 0 if unused */ textlabel; /* text form of security label */ } FormData_pg_seclabel; We also add a dependency between the labeled object and the security label itself. It also enables to clean up orphan labels automatically, without any new invention. The related code will be stored in src/backend/catalog/pg_seclabel.c. It provides an internal interface to assign a security label on a certain database object when creation or relabeling. However, it also has a limitation from the viewpoint of long-term. From the definition, OID of database objects are unique. So, we cannot share text form of labels even if massive number of database objects have an identical security label; it can lead waste of storage consumption because of the duplicated security labels. So, this idea shall be switched to the [3] when we support row-level security with ESP. But I think the idea [2] is reasonable in short-term development. [3] Using security-Id as a key of text representation in separated catalog -- This idea is a derivation from the idea of [2]. It also stores text form of labels into pg_seclabel/pg_shseclabel, but it shall be identified with a pair of relation-Id and security-Id which is newly supported. The security-Id shall be stored within padding area of HeapTupleHeader like object-Id. But, unlike object-Id, it does not need to be unique for each tuples. It allows multiple tuples has same security-Id that is related to a certain text form of security label. It means we can reduce waste of storage due to the duplicated labels in text (Note, massive number of objects tend to share a limited number of labels in general). So, this approach has advantage toward the idea of [2], however, it needs more code to be implemented/reviewed than [2], such as management of security-Id, reclaim of orphan labels and so on. Therefore, it is not feasible at the statring-up stage, as long as row-level security with ESP is not available. * SQL Statement --- It also need to provide SQL statement to manage security label of the database object. I plan the following statement to change the security label. ALTER xxx name SECURITY LABEL TO 'label'; (For columns) ALTER TABLE name ALTER column SECURITY LABEL TO 'label'; The 'xxx' part is replaced by an object class, such as TABLE, SCHEMA and so on. When the ALTER command is executed, ESP module validate the given label, in addition to permission checks to relabel it. If no ESP module is available, the ALTER always raises a feature-not-supported error. Example) ALTER TABLE t1 SECURITY LABEL TO 'system_u:object_r:sepgsql_ro_table_t:s0'; ALTER SCHEMA kaigai SECURITY
Re: [HACKERS] Synchronization levels in SR
On 27/05/10 09:51, Simon Riggs wrote: On Thu, 2010-05-27 at 02:18 +0300, Heikki Linnakangas wrote: In practice, hard synchronous don't return ever until the commit hits the standby behavior is rarely what admins actually want, because it's disastrous from an availability point of view. More likely, admins want wait for ack from standby, unless it's not responding, in which case to hell with redundancy and just act like a single server. It makes sense if you just want to make sure that the standby doesn't return stale results when it's working properly, and you're not worried about durability but I'm not sure it's very sound otherwise. Which is also crazy. If you're using synch rep its because you care deeply about durability. No, not necessarily. As I said above, you might just want a guarantee that *if* you query the standby, you get up-to-date results. But if the standby is down for any reason, you don't care about it. That's a very sensible mode of operation, for example if you're offloading reads to the standby with something like pgpool. In fact I have the feeling that that's the most common use case for synchronous replication, not a deep concern of durability. I agree that don't-return-ever isn't something anyone will want. What we need is a COMMIT with ERROR message! Hmm, perhaps we could emit a warning with the commit. I'm not sure what an application could do with it, though. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
On 27/05/10 09:50, Pavel Stehule wrote: 2010/5/27 Heikki Linnakangasheikki.linnakan...@enterprisedb.com: AFAIU, the standard doesn't say anything about named parameters. Oracle uses =, but as you said, that's ambiguous with the = operator. +1 for FOR. I don't see any advantage of FOR. Any advantage over AS? It doesn't clash with the foo AS bar syntax that the standard is using for something completely different, as Peter pointed out in the original post. We can change ir to support new standard or don't change it. What new standard? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Wed, May 26, 2010 at 10:37 PM, Simon Riggs si...@2ndquadrant.com wrote: If the remote server responded first, then that proves it is a better candidate for failover than the one you think of as near. If the two standbys vary over time then you have network problems that will directly affect the performance on the master; synch_rep = N would respond better to any such problems. No. The remote standby might respond first temporarily though it's almost behind the near one. The read-only queries or incrementally updated backup operation might cause a bursty disk write, and delay the ACK from the standby. The lock contention between read-only queries and recovery would delay the ACK. So the standby which responds first is not always the best candidate for failover. Also the administrator generally doesn't put the remote standby under the control of a clusterware like heartbeat. In this case, the remote standby will never be the candidate for failover. But quorum commit cannot cover this simple case. OTOH, synchronous_replication=2 degrades the performance on the master very much. Yes, but only because you have only one near standby. It would clearly to be foolish to make this setting without 2+ near standbys. We would then have 4 or more servers; how do we specify everything for that config?? If you always want to use the near standby as the candidate for failover by using quorum commit in the above simple case, you would need to choose such a foolish setting. Otherwise, unfortunately you might have to failover to the remote standby not under the control of a clusterware. synchronous_replication approach doesn't seem to cover the typical use case. You described the failure modes for the quorum proposal, but avoided describing the failure modes for the per-standby proposal. Please explain what will happen when the near server is unavailable, with per-standby settings. Please also explain what will happen if we choose to have 4 or 5 servers to maintain performance in case of the near server going down. How will we specify the failure modes? I'll try to explain that. (1) most standard case: 1 master + 1 sync standby (near) When the master goes down, something like a clusterware detects that failure, and brings the standby online. Since we can ensure that the standby has all the committed transactions, failover doesn't cause any data loss. When the standby goes down or network outage happens, walsender detects that failure via the replication timeout, keepalive or error return from the system calls. Then walsender does something according to the specified reaction (GUC) to the failure of the standby, e.g., walsender wakes the transaction commit up from the wait-for-ACK, and exits. Then the master runs standalone. (2) 1 master + 1 sync standby (near) + 1 async standby (remote) When the master goes down, something like a clusterware brings the sync standby in the near location online. The administrator would need to take a fresh base backup of the new master, load it on the remote standby, change the primary_conninfo, and restart the remote standby. When one of standbys goes down, walsender does the same thing described in (1). Until the failed standby has restarted, the master runs together with another standby. In (1) and (2), after some failure happens, there would be only one server which is guaranteed to have all the committed transactions. When it also goes down, the database service stops. If you want to avoid this fragile situation, you would need to add one more sync standby in the near site. (3) 1 master + 2 sync standbys (near) + 1 async standby (remote) When the master goes down, something like a clusterware brings the one of sync standbys online by using some selection algorithm. The administrator would need to take a fresh base backup of the new master, load it on both remaining standbys, change the primary_conninfo, and restart them. When one of standbys goes down, walsender does the same thing described in (1). Until the failed standby has restarted, the master runs together with two standbys. At least one standby is guaranteed to be sync with the master. Is this explanation enough? Also, when synchronous_replication=1 and one of synchronous standbys goes down, how should the surviving standby catch up with the master? Such standby might be too far behind the master. The transaction commit should wait for the ACK from the lagging standby immediately even if there might be large gap? If yes, synch_rep_timeout would screw up the replication easily. That depends upon whether we send the ACK at point #2, #3 or #4. It would only cause a problem if you waited until #4. Yeah, the problem happens. If we implement quorum commit, we need to design how the surviving standby catches up with the master. Regards, -- Fujii Masao NIPPON TELEGRAPH AND
Re: [HACKERS] functional call named notation clashes with SQL feature
At 2010-05-27 08:50:18 +0200, pavel.steh...@gmail.com wrote: I don't see any advantage of FOR. We can change ir to support new standard or don't change it. Adopting FOR would mean we don't use AS in a way that conflicts with the standard. That's its only advantage. But I agree with you, I don't think it's worth inventing a new non-standard wart for this case. I don't really like the idea of getting rid of = as an operator either; I'm torn between staying true to the standard and politely looking the other way as Tom suggested we might end up doing. -- ams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
2010/5/27 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: On 27/05/10 09:50, Pavel Stehule wrote: 2010/5/27 Heikki Linnakangasheikki.linnakan...@enterprisedb.com: AFAIU, the standard doesn't say anything about named parameters. Oracle uses =, but as you said, that's ambiguous with the = operator. +1 for FOR. I don't see any advantage of FOR. Any advantage over AS? It doesn't clash with the foo AS bar syntax that the standard is using for something completely different, as Peter pointed out in the original post. No, standard knows AS in different context. In param list standard doesn't use keyword AS. We can change ir to support new standard or don't change it. What new standard? ANSI SQL 2011 Pavel -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
2010/5/27 Abhijit Menon-Sen a...@toroid.org: At 2010-05-27 08:50:18 +0200, pavel.steh...@gmail.com wrote: I don't see any advantage of FOR. We can change ir to support new standard or don't change it. Adopting FOR would mean we don't use AS in a way that conflicts with the standard. That's its only advantage. But I agree with you, I don't think it's worth inventing a new non-standard wart for this case. current using AS isn't in conflict with standard .. look to standard, please. Pavel I don't really like the idea of getting rid of = as an operator either; I'm torn between staying true to the standard and politely looking the other way as Tom suggested we might end up doing. -- ams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
On 27/05/10 08:56, Jesper Krogh wrote: Just a thought. Wouldn't a All-visible bit also enable index only scans to some degree? Yes. In fact, that's one reason I implemented the visibility map in the first place. I started working on index-only scans based on that last year, if you search the archives for index-only scans you'll find those discussions. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, May 27, 2010 at 3:21 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, 2010-05-27 at 11:28 +0900, Fujii Masao wrote: On Wed, May 26, 2010 at 10:20 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, 2010-05-26 at 18:52 +0900, Fujii Masao wrote: I guess that dropping the support of #3 doesn't reduce complexity since the code of #3 is almost the same as that of #2. Like walreceiver sends the ACK after receiving the WAL in #2 case, it has only to do the same thing after the WAL flush. Hmm, well the code for #3 is similar also to the code for #4. So if you do #2, its easy to do #2, #3 and #4 together. No. #4 requires the way of prompt communication between walreceiver and startup process, but #2 and #3 not. That is, in #4, walreceiver has to wake the startup process up as soon as it has flushed WAL. OTOH, the startup process has to wake walreceiver up as soon as it has replayed WAL, to request it to send the ACK to the master. In #2 and #3, the prompt communication from walreceiver to startup process, i.e., changing the poll loop in the startup process would also be useful for the data to be visible immediately on the standby. But it's not required. You need to pass WAL promptly on primary from backend to WALSender. Whatever mechanism you use can also be reused symmetrically on standby to provide #4. So not a problem. I cannot be so optimistic since the situation differs from one process to another. The comment is about whether having #3 makes sense from a user interface perspective. It's easy to add options, but they must have useful meaning. #3 would be useful for people wanting further robustness. In #2, when simultaneous power failure on the master and the standby, and concurrent disk crash on the master happen, transaction whose success indicator has been returned to a client might be lost. #3 can avoid such a critical situation. This is one of reasons that DRBD supports Protocol C, I think. Which few people use it, or if they do its because DRBD didn't originally support multiple standbys. Not worth emulating IMHO. If so, #3 would be useful for people who don't afford to buy more than one standby servers, too :) Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
On 27/05/10 10:16, Pavel Stehule wrote: 2010/5/27 Heikki Linnakangasheikki.linnakan...@enterprisedb.com: On 27/05/10 09:50, Pavel Stehule wrote: 2010/5/27 Heikki Linnakangasheikki.linnakan...@enterprisedb.com: AFAIU, the standard doesn't say anything about named parameters. Oracle uses =, but as you said, that's ambiguous with the =operator. +1 for FOR. I don't see any advantage of FOR. Any advantage over AS? It doesn't clash with the foo AS bar syntax that the standard is using for something completely different, as Peter pointed out in the original post. No, standard knows AS in different context. In param list standard doesn't use keyword AS. As Peter pointed out in the original post, according to the standard function(foo AS bar) means something else than what we have now. Please re-read the original post. We can change ir to support new standard or don't change it. What new standard? ANSI SQL 2011 Oh, does that have something to say about named parameters? Is the draft publicly available somewhere? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
On tor, 2010-05-27 at 04:06 +0300, Heikki Linnakangas wrote: On 27/05/10 03:57, Robert Haas wrote: Being compatible with the SQL standard and with Oracle is not to be taken lightly. I seem to be alone believing that the SQL standard doesn't say anything about named function parameters. Can someone point me to the relevant section of the standard? It will be in SQL:2011. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
On 27/05/10 10:49, Peter Eisentraut wrote: On tor, 2010-05-27 at 04:06 +0300, Heikki Linnakangas wrote: On 27/05/10 03:57, Robert Haas wrote: Being compatible with the SQL standard and with Oracle is not to be taken lightly. I seem to be alone believing that the SQL standard doesn't say anything about named function parameters. Can someone point me to the relevant section of the standard? It will be in SQL:2011. Does it mandate = ? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
On tor, 2010-05-27 at 10:51 +0300, Heikki Linnakangas wrote: On 27/05/10 10:49, Peter Eisentraut wrote: On tor, 2010-05-27 at 04:06 +0300, Heikki Linnakangas wrote: On 27/05/10 03:57, Robert Haas wrote: Being compatible with the SQL standard and with Oracle is not to be taken lightly. I seem to be alone believing that the SQL standard doesn't say anything about named function parameters. Can someone point me to the relevant section of the standard? It will be in SQL:2011. Does it mandate = ? routine invocation ::= routine name SQL argument list routine name ::= [ schema name period ] qualified identifier SQL argument list ::= left paren [ SQL argument [ { comma SQL argument }... ] ] right paren SQL argument ::= value expression | generalized expression | target specification | contextually typed value specification | named argument specification generalized expression ::= value expression AS path-resolved user-defined type name named argument specification ::= SQL parameter name named argument assignment token named argument SQL argument named argument SQL argument ::= value expression | target specification | contextually typed value specification named argument assignment token ::= = -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
2010/5/26 Peter Eisentraut pete...@gmx.net: It turns out that the SQL standard uses the function call notation foo(this AS that) for something else: routine invocation ::= routine name SQL argument list routine name ::= [ schema name period ] qualified identifier SQL argument list ::= left paren [ SQL argument [ { comma SQL argument }... ] ] right paren SQL argument ::= value expression | generalized expression | target specification generalized expression ::= value expression AS path-resolved user-defined type name In systems that have inheritance of composite types, this is used to specify which type the value is supposed to be interpreted as (for example, to treat the value as a supertype). can it be used (in ANSI SQL semantic) as cast? like SELECT foo(10.33 AS int) Seems kind of bad to overload this with something completely different. What should we do? Is ANSI SQL consistent in this syntax? SQL/XML use AS in different meaning. Regards Pavel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 26/05/10 23:31, Dimitri Fontaine wrote: So if you want simplicity to admin, effective data availability and precise control over the global setup, I say go for: a. transaction level control of the replication level b. cascading support c. quorum with timeout d. choice of commit or rollback at timeout Then give me a setup example that you can't express fully. One master, one synchronous standby on another continent for HA purposes, and one asynchronous reporting server in the same rack as the master. You don't want to set up the reporting server as a cascaded slave of the standby on the other continent, because that would double the bandwidth required, but you also don't want the master to wait for the reporting server. The possibilities are endless... Your proposal above covers a pretty good set of scenarios, but it's by no means complete. If we try to solve everything the configuration will need to be written in a Turing-complete Replication Description Language. We'll have to pick a useful, easy-to-understand subset that covers the common scenarios. To handle the more exotic scenarios, you can write a proxy that sits in front of the master, and implements whatever rules you wish, with the rules written in C. Agreed on the Turing-completeness side of those things. My current thinking is that the proxy I want might simply be a PostgreSQL instance with cascading support. In your example that would give us: Remote Standby, HA Master -- Proxy - Local Standby, Reporting So what I think we have here is a pretty good trade-off in terms of what you can do with some simple setup knobs. What's left there is that with the quorum idea, you're not sure if the one server that's synced is the remote or local standby, in this example. Several ideas are floating around (votes, mixed per-standby and per-transaction settings). Maybe we could have the standby be able to say it's not interesting into participating into the quorum, that is, it's an async replica, full stop. In your example we'd set the local reporting standby as a non-voting member of the replication setting, the proxy and the master would have a quorum of 1, and the remote HA standby would vote. I don't think the idea of having any number of voting coupons other than 0 or 1 on any server will help us the least. I do think that your proxy idea is a great one and should be in core. By the way, the cascading/proxy instance could be set without Hot Standby, if you don't like to be able to monitor it via a libpq connection and some queries. BTW, I think we're going to need a separate config file for listing the standbys anyway. There you can write per-server rules and options, but explicitly knowing about all the standbys also allows the master to recycle WAL as soon as it has been streamed to all the registered standbys. Currently we just keep wal_keep_segments files around, just in case there's a standby out there that needs them. I much prefer that each server in the set publish what it wants. It only connects to 1 given provider. Then we've been talking about this exact same retention problem for queueing solutions, with Jan, Marko and Jim. The idea we came up with is a watermarking solution (which already exists in Skytools 3, in its coarse-grain version). The first approach is to have all slave give back to its local master/provider/origin the last replayed WAL/LSN, once in a while. You derive from that a global watermark and drop WAL files depending on it. You now have two problems: no more space and why keeping that many files on the master anyway, maybe some slave could be set up for retention instead? To solve that it's possible for each server to be setup with a restricted set of servers they're deriving their watermark from. That's when you need per-server options and an explicit list of all the standbys whatever their level in the cascading tree. That means explicit maintenance of the entire replication topology. I don't think we need to solve that already. I think we need to provide an option on each member of the replication tree to either PANIC or lose WALs in case they're running out of space when trying to follow the watermark. It's crude but already allows to have a standby set to maintain the common archive and have the master drop the WAL files as soon as possible (respecting wal_keep_segments). In our case, if a WAL file is no more available from any active server we still have the option to fetch it from the archives... Regards, -- Dimitri Fontaine PostgreSQL DBA, Architecte -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, 2010-05-27 at 10:09 +0300, Heikki Linnakangas wrote: No, not necessarily. As I said above, you might just want a guarantee that *if* you query the standby, you get up-to-date results. Of course. COMMIT was already one of the options, so this comment was already understood. What we are discussing is whether additional options exist and/or are desirable. We should not be forcing everybody to COMMIT whether or not it is robust. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, 2010-05-27 at 16:35 +0900, Fujii Masao wrote: On Thu, May 27, 2010 at 3:21 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, 2010-05-27 at 11:28 +0900, Fujii Masao wrote: On Wed, May 26, 2010 at 10:20 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, 2010-05-26 at 18:52 +0900, Fujii Masao wrote: I guess that dropping the support of #3 doesn't reduce complexity since the code of #3 is almost the same as that of #2. Like walreceiver sends the ACK after receiving the WAL in #2 case, it has only to do the same thing after the WAL flush. Hmm, well the code for #3 is similar also to the code for #4. So if you do #2, its easy to do #2, #3 and #4 together. No. #4 requires the way of prompt communication between walreceiver and startup process, but #2 and #3 not. That is, in #4, walreceiver has to wake the startup process up as soon as it has flushed WAL. OTOH, the startup process has to wake walreceiver up as soon as it has replayed WAL, to request it to send the ACK to the master. In #2 and #3, the prompt communication from walreceiver to startup process, i.e., changing the poll loop in the startup process would also be useful for the data to be visible immediately on the standby. But it's not required. You need to pass WAL promptly on primary from backend to WALSender. Whatever mechanism you use can also be reused symmetrically on standby to provide #4. So not a problem. I cannot be so optimistic since the situation differs from one process to another. This spurs some architectural thinking: I think we need to disconnect the idea of waiting in any of the components. Anytime we ask WALSender or WALReceiver to wait for acknowledgement we will be reducing throughput. So we should assume that they will continue to work as quickly as possible. The acknowledgement from standby can contain the latest xlog location of WAL received, WAL written to disk and WAL applied, all by reading values from shared memory. It's all the same, whether we send back 2 or 3 xlog locations in the ack message. Who sends the ack message? Who receives it? Would it be easier to have this happen in a second pair of processes WALSynchroniser (on primary) and WAL Acknowledger (on standby). WALAcknowledger would send back a stream of ack messages with latest xlog positions. WALSynchroniser would receive these messages and wake up sleeping backends. If we did that then there'd be almost no change at all to existing code, just additional code and processes for the sync case. Code would be separate and there would be no performance concerns either. Backends can then choose to wait until the xlog location they wish has been achieved which might be in the next acknowledgement message or in a subsequent one. That also ensures that the logic for this is completely on the master and the standby doesn't act differently, apart from needing to start a WALAcknowledger process if sync rep is requested. If you do choose to make #3 important, then I'd say you need to work out how to make WALWriter active as well, so it can perform regular fsyncs, rather than having WALReceiver wait across that I/O. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] primary/secondary/master/slave/standby
Greg Stark gsst...@mit.edu writes: Fwiw I like the word replica but I don't see an obvious choice of word to pair it with I guess it's replica / origin, per choice of Jan Wieck to be found in our catalogs: http://www.postgresql.org/docs/current/static/catalog-pg-trigger.html tgenabled char Controls in which session_replication_role modes the trigger fires. O = trigger fires in origin and local modes, D = trigger is disabled, R = trigger fires in replica mode, A = trigger fires always. So that's origin/replica, master/slave, primary/standby, master/standby. -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, 2010-05-27 at 16:13 +0900, Fujii Masao wrote: On Wed, May 26, 2010 at 10:37 PM, Simon Riggs si...@2ndquadrant.com wrote: Please explain what will happen when the near server is unavailable, with per-standby settings. Please also explain what will happen if we choose to have 4 or 5 servers to maintain performance in case of the near server going down. How will we specify the failure modes? I'll try to explain that. We've been discussing parameters and how we would define what we want to happen in various scenarios. You've not explained what parameters you would use, how and where they would be set, so we aren't yet any closer to understanding what it is your proposing. Please explain how your proposal will work. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, 2010-05-27 at 16:13 +0900, Fujii Masao wrote: On Wed, May 26, 2010 at 10:37 PM, Simon Riggs si...@2ndquadrant.com wrote: If the remote server responded first, then that proves it is a better candidate for failover than the one you think of as near. If the two standbys vary over time then you have network problems that will directly affect the performance on the master; synch_rep = N would respond better to any such problems. No. The remote standby might respond first temporarily though it's almost behind the near one. The read-only queries or incrementally updated backup operation might cause a bursty disk write, and delay the ACK from the standby. The lock contention between read-only queries and recovery would delay the ACK. So the standby which responds first is not always the best candidate for failover. Seems strange. If you have 2 standbys and you say you would like node1 to be the preferred candidate, then you load it so heavily that a remote server with by-definition much larger network delay responds first, then I say your preference was wrong. The above situation is caused by the DBA and the DBA can solve it also - if the preference is to keep a preferred server then that server would need to be lightly loaded so it could respond sensibly. This is the same thing as having an optimizer pick the best path and then the user saying no dumb-ass, use the index I tell you even though it is slower. If you really don't want to know the fastest way, then I personally will agree you can have that, as is my view (now) on the optimizer issue also - sometimes the admin does know best. Also the administrator generally doesn't put the remote standby under the control of a clusterware like heartbeat. In this case, the remote standby will never be the candidate for failover. But quorum commit cannot cover this simple case. If you, Jan and Yeb wish to completely exclude standbys from being part of any quorum, then I guess we need to have per-standby settings to allow that to be defined. I'm in favour of giving people options. That needn't be a mandatory per-standby setting, just a non-default option, so that we can reduce the complexity of configuration for common cases. If we're looking for simplest-implementation-first that isn't it. Currently, Oracle provides these settings, which correspond to Maximum Performance = quorum = 0 Maximum Availability = quorum = 1, timeout_action = commit Maximum Protection = quorum = 1, timeout_action = shutdown So Oracle already supports the quorum case... Oracle doesn't provide i) any capability to have quorum 1 ii) any capability to include an async node as a sync node, if the quorum cannot be reached with servers marked sync, or in the situation where because of mis-use/mis-configuration the sync servers are actually slower. iii) ability to wait for apply iv) ability to specify wait mode at transaction level all of those are desirable in some cases and easily possible by specifying things in the way I've suggested. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, May 27, 2010 at 6:30 PM, Simon Riggs si...@2ndquadrant.com wrote: Who sends the ack message? walreceiver Who receives it? walsender Would it be easier to have this happen in a second pair of processes WALSynchroniser (on primary) and WAL Acknowledger (on standby). WALAcknowledger would send back a stream of ack messages with latest xlog positions. WALSynchroniser would receive these messages and wake up sleeping backends. If we did that then there'd be almost no change at all to existing code, just additional code and processes for the sync case. Code would be separate and there would be no performance concerns either. No, this seems to be bad idea. We should not establish extra connection between servers. That would be a source of trouble. If you do choose to make #3 important, then I'd say you need to work out how to make WALWriter active as well, so it can perform regular fsyncs, rather than having WALReceiver wait across that I/O. Yeah, this might be an option for optimization though I'm not sure how it has good effect. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
Peter Eisentraut wrote: On ons, 2010-05-26 at 11:47 +0100, Mike Fowler wrote: The XMLEXISTS function works with XQuery expressions and doesn't have the call signature that your patch implements Looking at the manuals of Oracle, Derby and DB2 I see how the call signature differs. I also note that Oracle's implementation is XPath only, Derby's is partial XQuery and DB2 appears to be full XQuery. What do people prefer me to do? I see the options as: 1) Change the call signature to match the standard 2) Change the function name back to xpath_exists It would be nice to make XMLEXISTS work as in the standard, seeing how many others are providing the same interface. Should option one be the more popular there's further choices: 1) Integrate XQuery support to completely match the standard, however this will require the addition of a new library libxquery 2) Leave the XPath as is, inline with Oracle's implementation 3) Hybrid approach. Since XML is a comple time option, add XQuery as another. Conditional completion gives the full XQuery support when available or just the XPath when not I think providing XPath is enough, at least for now Agreed. I'll get another patch together in the next day or two. Regards, -- Mike Fowler Registered Linux user: 379787 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, 2010-05-27 at 19:21 +0900, Fujii Masao wrote: On Thu, May 27, 2010 at 6:30 PM, Simon Riggs si...@2ndquadrant.com wrote: Would it be easier to have this happen in a second pair of processes WALSynchroniser (on primary) and WAL Acknowledger (on standby). WALAcknowledger would send back a stream of ack messages with latest xlog positions. WALSynchroniser would receive these messages and wake up sleeping backends. If we did that then there'd be almost no change at all to existing code, just additional code and processes for the sync case. Code would be separate and there would be no performance concerns either. No, this seems to be bad idea. We should not establish extra connection between servers. That would be a source of trouble. What kind of trouble? You think using an extra connection would cause problems; why? I've explained it would greatly simplify the code to do it that way and improve performance. Those sound like good things, not problems. If you do choose to make #3 important, then I'd say you need to work out how to make WALWriter active as well, so it can perform regular fsyncs, rather than having WALReceiver wait across that I/O. Yeah, this might be an option for optimization though I'm not sure how it has good effect. As I said, WALreceiver would not need to wait across fsync... -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, May 27, 2010 at 7:21 PM, Simon Riggs si...@2ndquadrant.com wrote: Seems strange. If you have 2 standbys and you say you would like node1 to be the preferred candidate, then you load it so heavily that a remote server with by-definition much larger network delay responds first, then I say your preference was wrong. The above situation is caused by the DBA and the DBA can solve it also - if the preference is to keep a preferred server then that server would need to be lightly loaded so it could respond sensibly. No. Even if the load is very low in the preferred server, there is *no* guarantee that it responds first. Per-standby setting can give such a guarantee, i.e., we can specify #2, #3 or #4 in the preferred server and #1 in the other. This is the same thing as having an optimizer pick the best path and then the user saying no dumb-ass, use the index I tell you even though it is slower. If you really don't want to know the fastest way, then I personally will agree you can have that, as is my view (now) on the optimizer issue also - sometimes the admin does know best. I think that choice of wrong master causes more serious situation than that of wrong plan. Also the administrator generally doesn't put the remote standby under the control of a clusterware like heartbeat. In this case, the remote standby will never be the candidate for failover. But quorum commit cannot cover this simple case. If you, Jan and Yeb wish to completely exclude standbys from being part of any quorum, then I guess we need to have per-standby settings to allow that to be defined. I'm in favour of giving people options. That needn't be a mandatory per-standby setting, just a non-default option, so that we can reduce the complexity of configuration for common cases. If we're looking for simplest-implementation-first that isn't it. For now, I agree that we support a quorum commit feature for 9.1 or later. But I don't think that it's simpler, more intuitive and easier-to-understand than per-standby setting. So I think that we should include the per-standby setting in the first patch. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
Anyone working on make contrib/pg_trgm mutibyte encoding aware? If not, I'm interested in the work. It's already multibyte safe since 8.4 -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, May 27, 2010 at 7:33 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, 2010-05-27 at 19:21 +0900, Fujii Masao wrote: On Thu, May 27, 2010 at 6:30 PM, Simon Riggs si...@2ndquadrant.com wrote: Would it be easier to have this happen in a second pair of processes WALSynchroniser (on primary) and WAL Acknowledger (on standby). WALAcknowledger would send back a stream of ack messages with latest xlog positions. WALSynchroniser would receive these messages and wake up sleeping backends. If we did that then there'd be almost no change at all to existing code, just additional code and processes for the sync case. Code would be separate and there would be no performance concerns either. No, this seems to be bad idea. We should not establish extra connection between servers. That would be a source of trouble. What kind of trouble? You think using an extra connection would cause problems; why? Because the number of connection failure cases doubles. Likewise, the number of process failure cases would double. If you do choose to make #3 important, then I'd say you need to work out how to make WALWriter active as well, so it can perform regular fsyncs, rather than having WALReceiver wait across that I/O. Yeah, this might be an option for optimization though I'm not sure how it has good effect. As I said, WALreceiver would not need to wait across fsync... Right, but walreceiver still needs to wait for WAL flush by walwriter. If currently WAL flush is the dominant workload for walreceiver, only leaving it to walwriter might not have so good effect. I'm not sure whether. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
Simon Riggs si...@2ndquadrant.com writes: Seems strange. If you have 2 standbys and you say you would like node1 to be the preferred candidate, then you load it so heavily that a remote server with by-definition much larger network delay responds first, then I say your preference was wrong. There's a communication mismatch here I think. The problem with the dynamic aspect of the system is that the admin wants to plan ahead and choose in advance the failover server. Other than that I much prefer the automatic and dynamic quorum idea. If you, Jan and Yeb wish to completely exclude standbys from being part of any quorum, then I guess we need to have per-standby settings to allow that to be defined. I'm in favour of giving people options. That needn't be a mandatory per-standby setting, just a non-default option, so that we can reduce the complexity of configuration for common cases. +1 Maximum Performance = quorum = 0 Maximum Availability = quorum = 1, timeout_action = commit Maximum Protection = quorum = 1, timeout_action = shutdown +1 Being able to say that a given server has not been granted to participate into the vote allowing to reach the global durability quorum will allow for choosing the failover candidates. Now you're able to have this reporting server and know for sure that your sync replicated transactions are not waiting for it. To summarize, the current per-transaction approach would be : - transaction level replication synchronous behaviour - proxy/cascading in core - quorum setup for deciding any commit is safe - any server can be excluded from the sync quorum - timeout can still raises exception or ignore (commit)? This last point seems to need some more discussion, or I didn't understand well the current positions and proposals. Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, May 27, 2010 at 3:13 AM, Fujii Masao masao.fu...@gmail.com wrote: (1) most standard case: 1 master + 1 sync standby (near) When the master goes down, something like a clusterware detects that failure, and brings the standby online. Since we can ensure that the standby has all the committed transactions, failover doesn't cause any data loss. How do you propose to guarantee that? ISTM that you have to either commit locally first, or send the commit to the remote first. Either way, the two events won't occur exactly simultaneously. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, 2010-05-27 at 20:13 +0900, Fujii Masao wrote: On Thu, May 27, 2010 at 7:33 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, 2010-05-27 at 19:21 +0900, Fujii Masao wrote: On Thu, May 27, 2010 at 6:30 PM, Simon Riggs si...@2ndquadrant.com wrote: Would it be easier to have this happen in a second pair of processes WALSynchroniser (on primary) and WAL Acknowledger (on standby). WALAcknowledger would send back a stream of ack messages with latest xlog positions. WALSynchroniser would receive these messages and wake up sleeping backends. If we did that then there'd be almost no change at all to existing code, just additional code and processes for the sync case. Code would be separate and there would be no performance concerns either. No, this seems to be bad idea. We should not establish extra connection between servers. That would be a source of trouble. What kind of trouble? You think using an extra connection would cause problems; why? Because the number of connection failure cases doubles. Likewise, the number of process failure cases would double. Not really. The users wait for just the synchroniser to return not for two things. Looks to me that other processes are independent of each other. Very simple. If you do choose to make #3 important, then I'd say you need to work out how to make WALWriter active as well, so it can perform regular fsyncs, rather than having WALReceiver wait across that I/O. Yeah, this might be an option for optimization though I'm not sure how it has good effect. As I said, WALreceiver would not need to wait across fsync... Right, but walreceiver still needs to wait for WAL flush by walwriter. Why does it? I just explained a design where that wasn't required. If currently WAL flush is the dominant workload for walreceiver, only leaving it to walwriter might not have so good effect. I'm not sure whether. If we're not sure, we could check before agreeing a design. WAL flush will be costly unless you have huge disk cache. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, 2010-05-27 at 19:50 +0900, Fujii Masao wrote: For now, I agree that we support a quorum commit feature for 9.1 or later. But I don't think that it's simpler, more intuitive and easier-to-understand than per-standby setting. So I think that we should include the per-standby setting in the first patch. There already is a first patch to the community that implements quorum commit, just not by you. If you have a better way, describe it in detail and in full now, with reference to each of the use cases you mentioned, so that people get a chance to give their opinions on your design. Then we can let the community decide whether or not that second way is actually better. We may not need a second patch. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
It's already multibyte safe since 8.4 No, it doesn't. $ psql test Pager usage is off. psql (8.4.4) Type help for help. test=# select similarity('abc', 'abd'); -- OK similarity 0.33 (1 row) test=# select similarity('日本語', '日本後'); -- NG similarity NaN (1 row) test=# select show_trgm('abc'); -- OK show_trgm - { a, ab,abc,bc } (1 row) test=# select show_trgm('日本語'); -- NG show_trgm --- {} (1 row) Encoding is EUC_JP, locale is C. Included is the script to reproduce the problem. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp pg_trgm.sql Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, May 27, 2010 at 8:28 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, May 27, 2010 at 3:13 AM, Fujii Masao masao.fu...@gmail.com wrote: (1) most standard case: 1 master + 1 sync standby (near) When the master goes down, something like a clusterware detects that failure, and brings the standby online. Since we can ensure that the standby has all the committed transactions, failover doesn't cause any data loss. How do you propose to guarantee that? ISTM that you have to either commit locally first, or send the commit to the remote first. Either way, the two events won't occur exactly simultaneously. Letting the transaction wait until the standby has received / flushed / replayed the WAL before it returns a success indicator to a client would guarantee that. This ensures that all transactions which a client knows as committed exist in the memory or disk of the standby. So we would be able to see those transactions from new master after failover. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, May 27, 2010 at 8:30 PM, Simon Riggs si...@2ndquadrant.com wrote: Why does it? I just explained a design where that wasn't required. Hmm.. my expression might have been ambiguous. Walreceiver still needs to wait for WAL flush by walwriter *before* sending the ACK to the master, in #3 case. Because, in #3, the master has to wait until the standby has flushed the WAL. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, May 27, 2010 at 8:30 PM, Simon Riggs si...@2ndquadrant.com wrote: There already is a first patch to the community that implements quorum commit, just not by you. Yeah, AFAIK, that patch includes also per-standby setting. If you have a better way, describe it in detail and in full now, with reference to each of the use cases you mentioned, so that people get a chance to give their opinions on your design. Then we can let the community decide whether or not that second way is actually better. We may not need a second patch. See http://archives.postgresql.org/pgsql-hackers/2010-05/msg01407.php But I think that we should focus on per-standby setting at first. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
Hi, On Thursday 27 May 2010 13:53:37 Tatsuo Ishii wrote: It's already multibyte safe since 8.4 No, it doesn't. Encoding is EUC_JP, locale is C. Included is the script to reproduce the problem. test=# select show_trgm('日本語'); show_trgm --- {0x8194c0,0x836e53,0x1dc363,0x1e22e9} (1 row) Time: 0.443 ms test=# select similarity('日本語', '日本後'); similarity 0.33 (1 row) Time: 0.426 ms Encoding is UTF-8... Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
No, it doesn't. Encoding is EUC_JP, locale is C. Included is the script to reproduce the problem. test=# select show_trgm('日本語'); show_trgm --- {0x8194c0,0x836e53,0x1dc363,0x1e22e9} (1 row) Time: 0.443 ms test=# select similarity('日本語', '日本後'); similarity 0.33 (1 row) Time: 0.426 ms Encoding is UTF-8... What is your locale? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, May 27, 2010 at 8:02 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, May 27, 2010 at 8:28 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, May 27, 2010 at 3:13 AM, Fujii Masao masao.fu...@gmail.com wrote: (1) most standard case: 1 master + 1 sync standby (near) When the master goes down, something like a clusterware detects that failure, and brings the standby online. Since we can ensure that the standby has all the committed transactions, failover doesn't cause any data loss. How do you propose to guarantee that? ISTM that you have to either commit locally first, or send the commit to the remote first. Either way, the two events won't occur exactly simultaneously. Letting the transaction wait until the standby has received / flushed / replayed the WAL before it returns a success indicator to a client would guarantee that. This ensures that all transactions which a client knows as committed exist in the memory or disk of the standby. So we would be able to see those transactions from new master after failover. There could still be additional transactions that the original master has committed locally but were not acked to the client. I guess you'd just work around that by taking a new base backup from the new master. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
On Thursday 27 May 2010 14:40:41 Tatsuo Ishii wrote: No, it doesn't. Encoding is EUC_JP, locale is C. Included is the script to reproduce the problem. test=# select show_trgm('日本語'); show_trgm --- {0x8194c0,0x836e53,0x1dc363,0x1e22e9} (1 row) Time: 0.443 ms test=# select similarity('日本語', '日本後'); similarity 0.33 (1 row) Time: 0.426 ms Encoding is UTF-8... What is your locale? It was en_EN.UTF-8. Interesting. With C it fails... Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Thu, May 27, 2010 at 9:48 PM, Robert Haas robertmh...@gmail.com wrote: There could still be additional transactions that the original master has committed locally but were not acked to the client. I guess you'd just work around that by taking a new base backup from the new master. Right. Unfortunately the transaction aborted for a client might have already been committed in the standby. In this case, we might need to eliminate the mismatch of transaction status between a client and new master after failover. BTW, the similar situation might happen even when only one server is running. If the server goes down before returning a success to a client after flushing the commit record, the mismatch would happen after restart of the server. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] primary/secondary/master/slave/standby
On 27/05/10 12:39, Dimitri Fontaine wrote: Greg Starkgsst...@mit.edu writes: Fwiw I like the word replica but I don't see an obvious choice of word to pair it with I guess it's replica / origin, per choice of Jan Wieck to be found in our catalogs: http://www.postgresql.org/docs/current/static/catalog-pg-trigger.html tgenabledchar Controls in which session_replication_role modes the trigger fires. O = trigger fires in origin and local modes, D = trigger is disabled, R = trigger fires in replica mode, A = trigger fires always. So that's origin/replica, master/slave, primary/standby, master/standby. master/standby is my favourite, and I believe we have a rough consensus on that. I started to search/replace primary - master, but started to have second thoughts when I got to the section in the docs about standby servers: http://developer.postgresql.org/pgdocs/postgres/warm-standby.html Somehow that just doesn't sound as good after s/primary/master, the first sentence in particular. I think the reason is that master brings to mind an active connection between the master and standby, while primary sounds more loosely-coupled. Perhaps we should use master/standby when discussing streaming replication, and primary/standby when talking about a standby setup in general, possibly using file-based log shipping. The distinction is quite vague, so we'll have to document both terms as synonyms of each other. Thoughts? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] primary/secondary/master/slave/standby
On 27/05/10 12:39, Dimitri Fontaine wrote: Greg Starkgsst...@mit.edu writes: Fwiw I like the word replica but I don't see an obvious choice of word to pair it with I guess it's replica / origin, per choice of Jan Wieck to be found in our catalogs: http://www.postgresql.org/docs/current/static/catalog-pg-trigger.html tgenabledchar Controls in which session_replication_role modes the trigger fires. O = trigger fires in origin and local modes, D = trigger is disabled, R = trigger fires in replica mode, A = trigger fires always. So that's origin/replica, master/slave, primary/standby, master/standby. master/standby is my favorite, and I believe we have a rough consensus on that. I started to search/replace primary - master, but started to have second thoughts when I got to the section in the docs about standby servers: http://developer.postgresql.org/pgdocs/postgres/warm-standby.html Somehow that just doesn't sound as good after s/primary/master, the first sentence in particular. I think the reason is that master brings to mind an active connection between the master and standby, while primary sounds more loosely-coupled. Perhaps we should use master/standby when discussing streaming replication, and primary/standby when talking about a standby setup in general, possibly using file-based log shipping. The distinction is quite vague, so we'll have to document both terms as synonyms of each other. Thoughts? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
What is your locale? It was en_EN.UTF-8. Interesting. With C it fails... Yes, pg_trgm seems to have problems with multibyte + C locale. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Straightforward Synchronous Replication
Following design offers simplicity of design, performance and user control over sync rep waits, including wait-for-apply for HS. This implements Oracle's Maximum Availability option AND Maximum Performance options both together, rather than just one or the other: async and sync replication together, under user control. * BACKEND: In xact.c: Immediately after fsync during commit logic if (sync_rep != NONE) { max_wakeup_time = commit_timestamp + sync_rep_timeout; SetAlarm(max_wakeup_time); // similar to statement timeout WaitOnQueue(commitLSN); DisableAlarm(); } In proc.c: in signal handler code if (wakeup waiting_on_commit) RemoveFromQueue() * New process: WALSync (on primary) Receives messages from WALAck on standby and wakes up queued backends that have reached the requested commitLSN. If there are multiple WALSync processes they all try to remove backends from the head of the queue. Process started in same way as WALSender, when request arrives from standby. (WaitOnQueue() returns immediately if no WALSync are started, since that means no sync rep yet available) * New process: WALAck (on standby) Reads shared memory to get last received and last applied xlog location and sends message to WALSync on primary. Loop/Sleep forever. Values in shared mem already put there by WALReceiver and Startup processes. Reuse message protocol as for WALSender-WALReceiver. Process started after WALReceiver connects, if additional option in recovery.conf. Initiates second connection to primary, issues slightly different startup command to create WALSync. That's it. The above needs just two parameters at user level synch_rep = none | recv | apply synch_rep_timeout = Ns and an additional parameter in recovery.conf to say whether a standby is providing the facility for sync replication (as requested by Yeb etc) (default = yes). So this is the same as having quorum = 0 or 1 (boring but simple) and having sync_rep_timeout_action = commit in all cases (clear behaviour in failure modes, without need for per-standby parameters). The user specifies how long they wish to wait, but that wait never changes the flow of WAL data through the cluster, so we don't need to retune and redesign the existing system for reduced latency. It allows mixed synchronous and asynchronous replication with *ease*. If we design things differently that wouldn't be the case. The design is: * simple - Doesn't require any WAL or libpq changes * modular - almost completely isolated from existing components in 9.0. (e.g. WALSender doesn't know or care about WALSync, WALReceiver never needs to speak to WALAck directly). * performant - async and sync can co-exist; WALReceiver never waits; no need to retune WALSender operation for synchronous mode * low latency - the backchannel from standby to primary uses a separate connection so can operate without slowing down data from primary * user centric - allows user control over this feature, an important tool for real world performance * hot standby - implements xid back channel with ease (later phase) We can hang other options on this later - nothing else is essential. Development time ~ 1 man month because similar code exists for all aspects described above, so no research or internals discussion required. Yes, this is a 3rd design for sync rep, though I think it improves upon the things I've heard so far from other authors and also includes feedback from Dimitri, Heikki, Yeb, Alastair. I'm happy to code this as well, when 9.1 dev starts and a benchmark should be interesting also. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby
With the parameter checkpoint_segment and wal_keep_segments the max. number of wal segments are set. If now the max number is reached, (1) the segments are deleted/recycled or (2) if the time set by the checkpoint_timeout is over, a checkpoint is set and if possible a deletion/recycling is done. This is the mechanism on the active side of a db server. On the standby side however only unused tranferred segments will be deleted if the checkpoint_timeout mechanism (2) is executed. Is this a correct behaviour or it is an error? I have observed (checkpoint_segment set to 3; wal_keep_segments set to 10 and checkpoint_timeout set to 30min) that in my stress test the disk usage on standby side is increased up to 2GB with xlog segments whereby on the active side only ~60MB xlog files are available (we have patched the xlog file size to 4MB). To prevent this one possibility is to decreace the checkpoint_timeout to a low value (30sec), however this had the disadvantage that a checkpoint is often executed on active side which can influence the performance. Another possibility is to have different postgresql.conf on active and on standby side, but this is not our preferred solution. Best Regards/mfG Ingo Sander = Nokia Siemens Networks GmbH Co. KG NWS EP CP SVSS Platform Tech Support DE St.-Martin-Str. 76 D-81541 München *Tel.: +49-89-515938390 *ingo.san...@nsn.com
Re: [HACKERS] Synchronization levels in SR
On Thu, May 27, 2010 at 9:09 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, May 27, 2010 at 9:48 PM, Robert Haas robertmh...@gmail.com wrote: There could still be additional transactions that the original master has committed locally but were not acked to the client. I guess you'd just work around that by taking a new base backup from the new master. Right. Unfortunately the transaction aborted for a client might have already been committed in the standby. In this case, we might need to eliminate the mismatch of transaction status between a client and new master after failover. BTW, the similar situation might happen even when only one server is running. If the server goes down before returning a success to a client after flushing the commit record, the mismatch would happen after restart of the server. True. But that's a slightly different case. Clients could fail to receive commit ACKs for a variety of reasons, like losing network connectivity momentarily. They had better be prepared for that no matter whether replication is in use or not. The new issue that replication adds is that you've got to make sure that the two (or n) nodes don't disagree with each other. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck janwi...@yahoo.com wrote: On 5/26/2010 4:34 PM, Kevin Grittner wrote: My latest idea for handling this in WAL-based replication involves WAL-logging information about the transaction through which a the committing transaction makes it safe to view. There are a few options here at the detail level that I'm still thinking through. The idea would be that the xmin from read-only queries on the slaves might be somewhere behind where you would expect based on transactions committed. (The details involve such things as where non-serializable transactions fall into the plan on both sides, and whether it's worth the effort to special-case read-only transactions on the master.) I can't say that I'm 100% sure that some lurking detail won't shoot this technique down for HS, but it seems good to me at a conceptual level. Without simulating multiple simultaneous transactions during playback, how are you going to manage that the tuples, already inserted on behalf of the ongoing master transactions, disappear when they abort on the master? When do writes ever become visible to a snapshot without having been committed? I'm not talking about changing that in any way. I'm talking about deferring visibility of committed transactions until they can be viewed without risking serialization anomalies. This requires, at a minimum, that any concurrent serializable transactions which are not read-only have completed. (Perhaps I'm not understanding your question) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
Tatsuo Ishii is...@postgresql.org writes: What is your locale? It was en_EN.UTF-8. Interesting. With C it fails... Yes, pg_trgm seems to have problems with multibyte + C locale. It's not a problem, it's just pilot error, or possibly inadequate documentation. pg_trgm uses the locale's definition of alpha, digit, etc. In C locale only basic ASCII letters and digits will be recognized as word constituents. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Wed, May 26, 2010 at 5:38 PM, Greg Stark gsst...@mit.edu wrote: How about just API generalities? Like, where do you need this data, on the master or on the slave? Would PGXC like it on the transaction coordinator? What question do you need to answer, do you need to pull out sets of commits in certain ranges or look up specific transaction ids and find out when they committed? Or do you only need to answer which of two transaction ids committed first? This thread has been hard to follow for me. Were any of these questions answered? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
Yes, pg_trgm seems to have problems with multibyte + C locale. It's not a problem, it's just pilot error, or possibly inadequate documentation. pg_trgm uses the locale's definition of alpha, digit, etc. In C locale only basic ASCII letters and digits will be recognized as word constituents. That means there is no chance to make pg_trgm work with multibyte + C locale? If so, I will leave pg_trgm as it is and provide private patches for those who need the functionality. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby
On Thu, May 27, 2010 at 10:13 PM, Sander, Ingo (NSN - DE/Munich) ingo.san...@nsn.com wrote: With the parameter checkpoint_segment and wal_keep_segments the max. number of wal segments are set. If now the max number is reached, (1) the segments are deleted/recycled or (2) if the time set by the checkpoint_timeout is over, a checkpoint is set and if possible a deletion/recycling is done. This is the mechanism on the active side of a db server. On the standby side however only unused tranferred segments will be deleted if the checkpoint_timeout mechanism (2) is executed. Is this a correct behaviour or it is an error? I have observed (checkpoint_segment set to 3; wal_keep_segments set to 10 and checkpoint_timeout set to 30min) that in my stress test the disk usage on standby side is increased up to 2GB with xlog segments whereby on the active side only ~60MB xlog files are available (we have patched the xlog file size to 4MB). To prevent this one possibility is to decreace the checkpoint_timeout to a low value (30sec), however this had the disadvantage that a checkpoint is often executed on active side which can influence the performance. Another possibility is to have different postgresql.conf on active and on standby side, but this is not our preferred solution. I guess this happens because the frequency of checkpoint on the standby is too lower than that on the master. In the master, checkpoint occurs for every consumption of three segments because of checkpoint_segments = 3. On the other hand, in the standby, only checkpoint_timeout has effect, so checkpoint occurs for every 30 minutes because of checkpoint_timeout = 30min. The walreceiver should signal the bgwriter to start checkpoint if it has received more than checkpoint_segments WAL files, like normal processing? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Straightforward Synchronous Replication
On Thu, May 27, 2010 at 9:08 AM, Simon Riggs si...@2ndquadrant.com wrote: * New process: WALAck (on standby) Reads shared memory to get last received and last applied xlog location and sends message to WALSync on primary. Loop/Sleep forever. So would WALAck be polling shared memory? That would increase latency significantly, I think, though perhaps you have a plan for avoiding that? The above needs just two parameters at user level synch_rep = none | recv | apply synch_rep_timeout = Ns and an additional parameter in recovery.conf to say whether a standby is providing the facility for sync replication (as requested by Yeb etc) (default = yes). So this is the same as having quorum = 0 or 1 (boring but simple) and having sync_rep_timeout_action = commit in all cases (clear behaviour in failure modes, without need for per-standby parameters). This seems good, but I think we need a little more definition about what happens with sync_rep_timeout expires. Yes, this is a 3rd design for sync rep, though I think it improves upon the things I've heard so far from other authors and also includes feedback from Dimitri, Heikki, Yeb, Alastair. I'm happy to code this as well, when 9.1 dev starts and a benchmark should be interesting also. It's great that we have so many people who want to implement this feature, or in one case already have. I'm not sure whose design is best, but I do hope that we can avoid dueling patches. There are plenty of other good features to work on also. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby
On Thu, May 27, 2010 at 10:09 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, May 27, 2010 at 10:13 PM, Sander, Ingo (NSN - DE/Munich) ingo.san...@nsn.com wrote: With the parameter checkpoint_segment and wal_keep_segments the max. number of wal segments are set. If now the max number is reached, (1) the segments are deleted/recycled or (2) if the time set by the checkpoint_timeout is over, a checkpoint is set and if possible a deletion/recycling is done. This is the mechanism on the active side of a db server. On the standby side however only unused tranferred segments will be deleted if the checkpoint_timeout mechanism (2) is executed. Is this a correct behaviour or it is an error? I have observed (checkpoint_segment set to 3; wal_keep_segments set to 10 and checkpoint_timeout set to 30min) that in my stress test the disk usage on standby side is increased up to 2GB with xlog segments whereby on the active side only ~60MB xlog files are available (we have patched the xlog file size to 4MB). To prevent this one possibility is to decreace the checkpoint_timeout to a low value (30sec), however this had the disadvantage that a checkpoint is often executed on active side which can influence the performance. Another possibility is to have different postgresql.conf on active and on standby side, but this is not our preferred solution. I guess this happens because the frequency of checkpoint on the standby is too lower than that on the master. In the master, checkpoint occurs for every consumption of three segments because of checkpoint_segments = 3. On the other hand, in the standby, only checkpoint_timeout has effect, so checkpoint occurs for every 30 minutes because of checkpoint_timeout = 30min. The walreceiver should signal the bgwriter to start checkpoint if it has received more than checkpoint_segments WAL files, like normal processing? Is this also an issue when using log shipping, or just with SR? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
Tatsuo Ishii is...@postgresql.org writes: It's not a problem, it's just pilot error, or possibly inadequate documentation. pg_trgm uses the locale's definition of alpha, digit, etc. In C locale only basic ASCII letters and digits will be recognized as word constituents. That means there is no chance to make pg_trgm work with multibyte + C locale? If so, I will leave pg_trgm as it is and provide private patches for those who need the functionality. Exactly what do you consider to be the missing functionality? You need a notion of word vs non-word character from somewhere, and the locale setting is the standard place to get that. The core text search functionality behaves the same way. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
Exactly what do you consider to be the missing functionality? You need a notion of word vs non-word character from somewhere, and the locale setting is the standard place to get that. The core text search functionality behaves the same way. No. Text search works fine with multibyte + C locale. Anyway locale is completely usesless for finding word vs non-character an agglutinative language such as Japanese. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
Tatsuo Ishii is...@sraoss.co.jp writes: Anyway locale is completely usesless for finding word vs non-character an agglutinative language such as Japanese. Well, that doesn't mean that the answer is to use C locale ;-) However, you could possibly think about making this bit of code more flexible: #ifdef KEEPONLYALNUM #define iswordchr(c)(t_isalpha(c) || t_isdigit(c)) #else #define iswordchr(c)(!t_isspace(c)) #endif Currently it seems to be hard-wired to the first case in standard builds. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
Well, that doesn't mean that the answer is to use C locale ;-) Of course it's up to user whether to use C locale or not. I just want pg_trgm work with C locale as well. However, you could possibly think about making this bit of code more flexible: #ifdef KEEPONLYALNUM #define iswordchr(c) (t_isalpha(c) || t_isdigit(c)) #else #define iswordchr(c) (!t_isspace(c)) #endif Currently it seems to be hard-wired to the first case in standard builds. Yup. Here is the patch in my mind: *** trgm_op.c~ 2009-06-11 23:48:51.0 +0900 --- trgm_op.c 2010-05-27 23:38:20.0 +0900 *** *** 59,65 } #ifdef KEEPONLYALNUM ! #define iswordchr(c) (t_isalpha(c) || t_isdigit(c)) #else #define iswordchr(c) (!t_isspace(c)) #endif --- 59,65 } #ifdef KEEPONLYALNUM ! #define iswordchr(c) (t_isalpha(c) || t_isdigit(c) || (lc_ctype_is_c() !t_isspace(c))) #else #define iswordchr(c) (!t_isspace(c)) #endif -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
Tatsuo Ishii is...@postgresql.org writes: ! #define iswordchr(c)(t_isalpha(c) || t_isdigit(c) || (lc_ctype_is_c() !t_isspace(c))) This seems entirely arbitrary. It might fix things in your view but it will break the longstanding behavior for other people. I think a more appropriate type of fix would be to expose the KEEPONLYALNUM option as a GUC, or some other way of letting the user decide what he wants. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
On Thu, May 27, 2010 at 3:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think a more appropriate type of fix would be to expose the KEEPONLYALNUM option as a GUC, or some other way of letting the user decide what he wants. So I think a GUC is broken because pg_tgrm has a index opclasses and any indexes built using one setting will be broken if the GUC is changed. Perhaps we need two sets of functions (which presumably call the same implementation with a flag to indicate which definition to use). Then you can define an index using one or the other and the meaning would be stable. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9.0beta2 release plans
Current thought among core is to wrap beta2 Thursday June 3 (a week from today) for public release on Monday the 7th. Get those fixes in. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Specification for Trusted PLs?
Tom Lane wrote: Joshua Tolley eggyk...@gmail.com writes: Agreed. As long as a trusted language can do things outside the database only by going through a database and calling some function to which the user has rights, in an untrusted language, that seems decent to me. A user with permissions to launch_missiles() would have a function in an untrusted language to do it, but there's no reason an untrusted language shouldn't be able to say SELECT s/untrusted/trusted/ here, right? One thing that has always bugged me is that the use of trusted/untrusted for languages is confusing, because it is trusted users who can run untrusted languages. I think trust is more associated with users than with software features. I have no idea how this confusion could be clarified. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
On tor, 2010-05-27 at 23:20 +0900, Tatsuo Ishii wrote: Anyway locale is completely usesless for finding word vs non-character an agglutinative language such as Japanese. I don't know about Japanese, but the locale approach works just fine for other agglutinative languages. I would rather suspect that it is the trigram approach that might be rather useless for such languages, because you are going to get a lot of similarity hits for the affixes. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] quoting and recovery.conf
On Fri, May 14, 2010 at 6:41 AM, Simon Riggs si...@2ndquadrant.com wrote: The main reason for having a separate recovery.conf file is that its existence is what drives the setting of InArchiveRecovery. If we were to devise some other trigger for that condition, it'd be possible to fold all those settings in as GUC variables. And the removal of recovery.conf at end of recovery prevents the re-entry into archive recovery if we crash. These things made sense when we were only dealing with PITR recovery from a backup intending to bring up the database when the PITR reached the target. That's now only one specific use case. For hot standbys we don't want them to come up ever, even if we crash. And for other standby databases we probably want to control this manually, not automatically. In these cases it's really confusing for users that some parameters have to be in postgresql.conf and some in recovery.conf. They don't see any distinction between these two files since they don't intend to ever see the case where the target is reached and the file moved out of the way automatically. My suggestion is we should fold all the parameters into postgresql.conf and treat recovery.conf as an additional postgresql.conf to read. It would allow any GUC. The only difference is that it would be moved out of the way automatically when the target is reached. Ideally I would have all the configuration be in postgresql.conf and be identical between all servers, masters and slaves. The only difference would be a single line which controlled whether to push or pull wal data. Switching roles would be a regular GUC change in postgresql.conf and pg_ctl reload. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Distclean does not remove gram.c
On Wed, May 26, 2010 at 9:00 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, May 26, 2010 at 8:20 PM, Andrew Dunstan and...@dunslane.net wrote: Gurjeet Singh wrote: I did a `git clean -f -d` and even that did not remove gram.c, apparently because this file _was_ alive at some point in the past hence git won't remove it even though the current branch does not have gram.c. At first glance that looks like a git bug. My guess is that either .git/info/exclude or a .gitignore file someplace says to ignore gram.c. git clean -df will not remove such files; you need git clean -dfx if you want that. Yes indeed, -x is what I needed. Thanks. -- gurjeet.singh @ EnterpriseDB - The Enterprise Postgres Company http://www.enterprisedb.com singh.gurj...@{ gmail | yahoo }.com Twitter/Skype: singh_gurjeet Mail sent from my BlackLaptop device
Re: [HACKERS] List traffic
On Sat, May 15, 2010 at 7:38 AM, Marc G. Fournier scra...@hub.org wrote: most people are not prepared to understand the concept of more than one list for project... Apparently you don't use very many large projects ... FreeBSD has 20+ lists, dedicated to various aspects of both end user and developer ... I imagine Linux has *as many if not more* ... MySQL, if memory servers, has a half dozen or more ... etc ... Sure, if we have distinctions which make sense then having separate lists makes sense. Linux has separate lists for different drivers, different parts of the kernel, projects to improve the kernel in various specific ways (latency, etc). I'm all for having a list dedicated to infrastructure (oddly named -www here) and a list dedicated to printing flyers and arranging conferences (-advocacy) since those topics are usually well defined. Lists like -ecpg or -odbc would work fine if the traffic warranted them. But some of the lists we have now are 99% overlap with each other -- I claim because the definitions are meaningless. What part of postgres discussion (aside from this thread) *don't* relate in some way to SQL? Or administration? Or performance? Most performance problems end up being solved by adjusting SQL or changing GUCs. Mot administration questions are originally posed as general help questions. If you're subscribed to these lists you get a random, fairly small, subset of discussion related these topics. Perhaps what I'm looking for is a more sensible division that allows most of the traffic related to the subtopics to actually go there. It would have to be a division so clearcut that anyone who doesn't follow could reasonably be blamed for not following etiquette. That's simply not true with the current divisions. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] quoting and recovery.conf
On Thu, 2010-05-27 at 16:20 +0100, Greg Stark wrote: My suggestion is we should fold all the parameters into postgresql.conf and treat recovery.conf as an additional postgresql.conf to read. It would allow any GUC. The only difference is that it would be moved out of the way automatically when the target is reached. Good idea, needs a little fleshing out. What do we do if recovery.conf and postgresql.conf have different settings in them? Trigger reload at end of recovery? Presumably we would also ignore server startup parameters in recovery.conf? -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
I don't know about Japanese, but the locale approach works just fine for other agglutinative languages. I would rather suspect that it is the trigram approach that might be rather useless for such languages, because you are going to get a lot of similarity hits for the affixes. I'm not sure what you mean by affixes. But I will explain... A Japanese sentence consists of words. Problem is, each word is not separated by space (agglutinative). So most text tools such as text search need preprocess which finds word boundaries by looking up dictionaries (and smart grammer analysis routine). In the process affixes can be determined and perhaps removed from the target word group to be used for text search (note that removing affixes is no relevant to locale). Once we get space separated sentence, it can be processed by text search or by pg_trgm just same as Engligh. (Note that these preprocessing are done outside PostgreSQL world). The difference is just the word can be consists of non ASCII letters. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Straightforward Synchronous Replication
On Thu, 2010-05-27 at 10:11 -0400, Robert Haas wrote: On Thu, May 27, 2010 at 9:08 AM, Simon Riggs si...@2ndquadrant.com wrote: * New process: WALAck (on standby) Reads shared memory to get last received and last applied xlog location and sends message to WALSync on primary. Loop/Sleep forever. So would WALAck be polling shared memory? That would increase latency significantly, I think, though perhaps you have a plan for avoiding that? The backends are going to be released in batches anyway, so I can't see how polling makes a difference. Polling means no waiting, so asynchronous action and higher throughput, and with sufficiently high polling rate no significant loss of latency. The other plan requires WALReceiver to wait for fsync and apply, which seems very likely to suck badly from a latency perspective. While its waiting it is also reducing throughout of incoming WAL. It's hard to see how that would work well. You could also do this by avoiding the wait in WALReceiver, but then that becomes more like polling anyway. The above needs just two parameters at user level synch_rep = none | recv | apply synch_rep_timeout = Ns and an additional parameter in recovery.conf to say whether a standby is providing the facility for sync replication (as requested by Yeb etc) (default = yes). So this is the same as having quorum = 0 or 1 (boring but simple) and having sync_rep_timeout_action = commit in all cases (clear behaviour in failure modes, without need for per-standby parameters). This seems good, but I think we need a little more definition about what happens with sync_rep_timeout expires. It commits... that is very clear: sync_rep_timeout_action = commit in all cases. Commit is the only viable option, since abort and wait-forever both have disadvantages pointed out for them. Yes, this is a 3rd design for sync rep, though I think it improves upon the things I've heard so far from other authors and also includes feedback from Dimitri, Heikki, Yeb, Alastair. I'm happy to code this as well, when 9.1 dev starts and a benchmark should be interesting also. It's great that we have so many people who want to implement this feature, or in one case already have. I'm not sure whose design is best, but I do hope that we can avoid dueling patches. There are plenty of other good features to work on also. There is already a patch on SR, yet Masao is discussing another that contains what looks to me like very close to nothing of Zoltan's work, not even similar ideas. The dueling patches situation looks like it already exists to me, though not of my making or encouragement. Even if I agreed with everything one of those authors say, there would still be two patches. Considering a variety of design approaches seems like a good idea for an important feature, especially when the information is thin and opinions run high. It's unlikely that anyone is right about everything, which is why I've amalgamated this simple proposal from everything said so far. It's easy to add some things if we add them at the start, much harder to retrofit them. I've shown that some things are easier than has been said, with fewer parameters and a good case for better performance also. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
So I think a GUC is broken because pg_tgrm has a index opclasses and any indexes built using one setting will be broken if the GUC is changed. Perhaps we need two sets of functions (which presumably call the same implementation with a flag to indicate which definition to use). Then you can define an index using one or the other and the meaning would be stable. It's worse. pg_trgm has another compile option IGNORECASE which might affect index opclasses. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] List traffic
On Thu, 27 May 2010, Greg Stark wrote: Sure, if we have distinctions which make sense then having separate lists makes sense. Linux has separate lists for different drivers, different parts of the kernel, projects to improve the kernel in various specific ways (latency, etc). I'm all for having a list dedicated to infrastructure (oddly named -www here) Actually, infrastructure is appropriately discussed on -sysadmins ... web is on -www ... tends to be a bit of overlap since -sysadmins was added later, and prior to that we did discuss on -www ... since those topics are usually well defined. Lists like -ecpg or -odbc would work fine if the traffic warranted them. I don't agree with the comment about 'if traffic warranted them' though ... the fact that there is very little traffic should be what makes them attractive / useful ... you don't have to weed through alot of posts to find the odbc/ecpg related ones ... Perhaps what I'm looking for is a more sensible division that allows most of the traffic related to the subtopics to actually go there. It would have to be a division so clearcut that anyone who doesn't follow could reasonably be blamed for not following etiquette. That's simply not true with the current divisions. how about something -sql vs -tuning ... ? -tuning replacing -performance, which I do agree could be sql *or* server ... where -tuning would be more obviously server related ... Marc G. FournierHub.Org Hosting Solutions S.A. scra...@hub.org http://www.hub.org Yahoo:yscrappySkype: hub.orgICQ:7615664MSN:scra...@hub.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Straightforward Synchronous Replication
On Thu, May 27, 2010 at 11:50 AM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, 2010-05-27 at 10:11 -0400, Robert Haas wrote: On Thu, May 27, 2010 at 9:08 AM, Simon Riggs si...@2ndquadrant.com wrote: * New process: WALAck (on standby) Reads shared memory to get last received and last applied xlog location and sends message to WALSync on primary. Loop/Sleep forever. So would WALAck be polling shared memory? That would increase latency significantly, I think, though perhaps you have a plan for avoiding that? The backends are going to be released in batches anyway, so I can't see how polling makes a difference. Polling means no waiting, so asynchronous action and higher throughput, and with sufficiently high polling rate no significant loss of latency. I guess what I'm trying to figure out is the part that says loop/sleep forever. That sounds like you wait 50 ms (or some other interval), then check shared memory to see if anything has changed, if not you do it again. That means that up to 49.9 ms (or whatever interval you picked) could be spent waiting before you realize that new WAL has been applied, which I suspect will not work out very well. On the other hand checking it in a TIGHT loop would mean using up a whole CPU on an idle system, so that's not practical either. ISTM you'd need some kind of signalling system between the startup process and the WALAck process, so that the startup process can wake WALAck after applying each bit of WAL (or maybe the startup process knows about the lowest LSN that WALAck cares about, and wakes it only upon reaching that point). The other plan requires WALReceiver to wait for fsync and apply, which seems very likely to suck badly from a latency perspective. While its waiting it is also reducing throughout of incoming WAL. It's hard to see how that would work well. You could also do this by avoiding the wait in WALReceiver, but then that becomes more like polling anyway. I'm not sure if I understand this part, so let me try to say it another way and you can tell me if I've got it right. I think your concern is that, during the time that WALReceiver is waiting for one chunk of WAL to get fsynced, the startup process might finish applying an earlier chunk of WAL that is of interest to the master. The ACK will therefore be delayed until the fsync completes and WALReceiver can again do other things, like check whether there are any ACKs that must be sent. Is that it, or have I missed the boat completely? The above needs just two parameters at user level synch_rep = none | recv | apply synch_rep_timeout = Ns and an additional parameter in recovery.conf to say whether a standby is providing the facility for sync replication (as requested by Yeb etc) (default = yes). So this is the same as having quorum = 0 or 1 (boring but simple) and having sync_rep_timeout_action = commit in all cases (clear behaviour in failure modes, without need for per-standby parameters). This seems good, but I think we need a little more definition about what happens with sync_rep_timeout expires. It commits... that is very clear: sync_rep_timeout_action = commit in all cases. Commit is the only viable option, since abort and wait-forever both have disadvantages pointed out for them. So, do we declare the sync server offline at that point and stop waiting for it, or do we continue waiting for it on every transaction? If we declare it dead, what are the criteria for subsequently making it alive again? Yes, this is a 3rd design for sync rep, though I think it improves upon the things I've heard so far from other authors and also includes feedback from Dimitri, Heikki, Yeb, Alastair. I'm happy to code this as well, when 9.1 dev starts and a benchmark should be interesting also. It's great that we have so many people who want to implement this feature, or in one case already have. I'm not sure whose design is best, but I do hope that we can avoid dueling patches. There are plenty of other good features to work on also. There is already a patch on SR, yet Masao is discussing another that contains what looks to me like very close to nothing of Zoltan's work, not even similar ideas. The dueling patches situation looks like it already exists to me, though not of my making or encouragement. Even if I agreed with everything one of those authors say, there would still be two patches. Oh, I wasn't aware that Fujii Masao's work had progressed as far as an actual patch yet. Considering a variety of design approaches seems like a good idea for an important feature, especially when the information is thin and opinions run high. It's unlikely that anyone is right about everything, which is why I've amalgamated this simple proposal from everything said so far. Agreed. It's easy to add some things if we add them at the start, much harder to retrofit them. I've shown that some things are easier than has been said, with
Re: [HACKERS] [ADMIN] command tag logging
alvherre alvhe...@commandprompt.com writes: Excerpts from Ray Stell's message of mié may 26 17:08:33 -0400 2010: I just installed a compiled from src 8.3.11. I usually include %i, command tag, in the log_line_prefix setting. This causes some spewage I'd not seen before on connection received lines as if it is dumping the environment: Hmm, I bet it's the recent %.*s patch. That is in the right place, isn't it. That would suggest that get_ps_display() is returning a wrong length on Ray's machine. It works okay here, but since that's platform-specific code that hardly proves much. Ray, what platform is this exactly? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] command tag logging
Excerpts from Tom Lane's message of jue may 27 12:49:49 -0400 2010: alvherre alvhe...@commandprompt.com writes: Excerpts from Ray Stell's message of mié may 26 17:08:33 -0400 2010: I just installed a compiled from src 8.3.11. I usually include %i, command tag, in the log_line_prefix setting. This causes some spewage I'd not seen before on connection received lines as if it is dumping the environment: Hmm, I bet it's the recent %.*s patch. That is in the right place, isn't it. That would suggest that get_ps_display() is returning a wrong length on Ray's machine. It works okay here, but since that's platform-specific code that hardly proves much. Ray, what platform is this exactly? FWIW it fails for me too (Debian running Linux 2.6.32). Adding some logging to stderr results in this: psdisp (len 2130) is: “““/pgsql/install/83_rel/bin/postmaster””” /pgsql/install/83_rel/bin/postmasterPGDATA=/pgsql/install/83_rel/dataORBIT_SOCKETDIR=/home/alvherre/tmp/orbit-alvherreSSH_AGENT_PID=2739GPG_AGENT_INFO=/tmp/gpg-aXAHSs/S.gpg-agent:2704:1SHELL=/bin/bashTERM=xtermXDG_SESSION_COOKIE=e50959452240490c59b0366b96665400-1274967349.87074-853952583HISTSIZE=1TMPDIR=/home/alvherre/tmpGTK_RC_FILES=/etc/gtk/gtkrc:/home/alvherre/.gtkrc-1.2-gnome2WINDOWID=29360152GNOME_KEYRING_CONTROL=/home/alvherre/tmp/keyring-EUoSfgGTK_MODULES=canberra-gtk-moduleUSER=alvherrehttp_proxy=http://localhost:8118XTERM_SHELL=/bin/bashHISTFILESIZE=1LD_LIBRARY_PATH=/pgsql/install/83_rel/libLS_COLORS=no=00:fi=00:di=01;35:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.tar=01;31:*.tgz=01;31:*.tbz2=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lha=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.jpg=01 ;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:*.tiff=01;35:SSH_AUTH_SOCK=/home/alvherre/tmp/keyring-EUoSfg/sshTMOUT=0USERNAME=alvherreSESSION_MANAGER=local/perhan:@/tmp/.ICE-unix/2689,unix/perhan:/tmp/.ICE-unix/2689PAGER=lessDESKTOP_SESSION=gnomePATH=/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/home/alvherre/bin:/sbin:/usr/sbinGDM_XSERVER_LOCATION=localPX_CONFIG_ORDER=envvarPWD=/home/alvherreEDITOR=vimLANG=es_CL.UTF-8GDM_LANG=es_CL.UTF-8TZ=America/SantiagoGDMSESSION=gnomeHISTIGNORE=ls:bg:fg:cd:exit:XTERM_VERSION=XTerm(256)XTERM_LOCALE=es_CL.UTF-8HISTCONTROL=ignorespace:erasedupsHOME=/home/alvherreSHLVL=1GNOME_DESKTOP_SESSION_ID=this-is-deprecatedno_proxy=localhost,127.0.0.0/8BASH_ENV=/home/alvherre/.bashrcLOGNAME=alvherreLESS=-XRM -x4VISUAL=vimXDG_DATA_DIRS=/usr/share/gnome:/usr/local/share/:/usr/share/:/usr/share/gdm/DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-TnbbC5PUiR,guid=833f76565b26a89543f6aa42004f_PX_CONFIG _ORDER=WINDOWPATH=7DISPLAY=:0.0HOSTFILE=/home/alvherre/.hostsXAUTHORITY=/home/alvherre/.Xauthority LOG: connection received: host=[local] Maybe the problem is the PS_PADDING setting? I patched as below -- obviously the \0 didn't make any difference (it was the first thing I tried), because the length, as you say, is wrong. *** log_line_prefix(StringInfo buf) *** 1615,1621 --- 1615,1623 int displen; psdisp = get_ps_display(displen); + fprintf(stderr, psdisp (len %d) is: “““%s”””\n, displen, psdisp); appendBinaryStringInfo(buf, psdisp, displen); + appendStringInfoChar(buf, '\0'); } break; case 'r': -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/27/2010 9:59 AM, Greg Stark wrote: On Wed, May 26, 2010 at 5:38 PM, Greg Stark gsst...@mit.edu wrote: How about just API generalities? Like, where do you need this data, on the master or on the slave? Would PGXC like it on the transaction coordinator? What question do you need to answer, do you need to pull out sets of commits in certain ranges or look up specific transaction ids and find out when they committed? Or do you only need to answer which of two transaction ids committed first? This thread has been hard to follow for me. Were any of these questions answered? Yes. On 5/26/2010 4:49 PM, Jan Wieck wrote: On 5/26/2010 12:38 PM, Greg Stark wrote: On Wed, May 26, 2010 at 5:10 PM, Jan Wieck janwi...@yahoo.com wrote: ... but to answer that request, actually I don't even think we should be discussing API specifics. How about just API generalities? Like, where do you need this data, on the master or on the slave? Would PGXC like it on the transaction coordinator? What question do you need to answer, do you need to pull out sets of commits in certain ranges or look up specific transaction ids and find out when they committed? Or do you only need to answer which of two transaction ids committed first? The question I want answered is what was the order and xid of the next 0..n transactions, that committed after transaction X? Preferably I would avoid scanning the entire available WAL just to get the next n xid's to process. The proposal assigned a unique serial number (file segment and position driven) to each xid and used that for the ordering as well as identification of the last known transaction. That is certainly a premature implementation detail. In this implementation it wouldn't even matter if a transaction that was recorded actually never made it because it crashed before the WAL flush. It would be reported by this commit order feature, but there would be no traces of whatever it did to be found inside the DB, so that anomaly is harmless. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
Andrew Dunstan and...@dunslane.net writes: Peter Eisentraut wrote: In systems that have inheritance of composite types, this is used to specify which type the value is supposed to be interpreted as (for example, to treat the value as a supertype). Why don't they just use CAST() syntax for that, instead of adding this unnecessary syntax wart? If their complaint is that CAST() is too much typing, perhaps they could adopt :: cast notation ;-) I think we should fix it now. Quick thought: maybe we could use FOR instead of AS: select myfunc(7 for a, 6 for b); I'm afraid FOR doesn't work either; it'll create a conflict with the spec-defined SUBSTRING(x FOR y) syntax. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] command tag logging
alvherre alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of jue may 27 12:49:49 -0400 2010: That is in the right place, isn't it. That would suggest that get_ps_display() is returning a wrong length on Ray's machine. It works okay here, but since that's platform-specific code that hardly proves much. Ray, what platform is this exactly? FWIW it fails for me too (Debian running Linux 2.6.32). Hmm. It seems like the %.*s change could only have affected things if the PS display area contains \0 characters before the theoretical end of the string. Which it shouldn't, once we've set the display, but Ray is only reporting this for log_connection output which might come out before that. In any case it strikes me that get_ps_display() is designed on the assumption that it needn't be particularly fast, but using its result in log_line_prefix is a place in which performance could indeed matter. Maybe we should go to some effort to track the intended display string length explicitly so we could avoid the mucking about in get_ps_display(). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] command tag logging
On Thu, May 27, 2010 at 12:49:49PM -0400, Tom Lane wrote: alvherre alvhe...@commandprompt.com writes: Excerpts from Ray Stell's message of mi?? may 26 17:08:33 -0400 2010: I just installed a compiled from src 8.3.11. I usually include %i, command tag, in the log_line_prefix setting. This causes some spewage I'd not seen before on connection received lines as if it is dumping the environment: Hmm, I bet it's the recent %.*s patch. That is in the right place, isn't it. That would suggest that get_ps_display() is returning a wrong length on Ray's machine. It works okay here, but since that's platform-specific code that hardly proves much. Ray, what platform is this exactly? I should have included this: version - PostgreSQL 8.3.11 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46) (1 row) [postgres ~]$ uname -a Linux horntail.cns.vt.edu 2.6.18-194.3.1.el5PAE #1 SMP Sun May 2 04:42:25 EDT 2010 i686 i686 i386 GNU/Linux [postgres ~]$ cat /etc/issue Red Hat Enterprise Linux Server release 5.5 (Tikanga) Kernel \r on an \m -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Keepalive for max_standby_delay
On Wed, 2010-05-26 at 16:22 -0700, Josh Berkus wrote: Just this second posted about that, as it turns out. I have a v3 *almost* ready of the keepalive patch. It still makes sense to me after a few days reflection, so is worth discussion and review. In or out, I want this settled within a week. Definitely need some RR here. Does the keepalive fix all the issues with max_standby_delay? Tom? OK, here's v4. Summary * WALSender adds a timestamp onto the header of every WAL chunk sent. * Each WAL record now has a conceptual send timestamp that remains constant while that record is replayed. This is used as the basis from which max_standby_delay is calculated when required during replay. * Send timestamp is calculated as the later of the timestamp of chunk in which WAL record was sent and the latest XLog time. * WALSender sends an empty message as a keepalive when nothing else to send. (No longer a special message type for the keepalive). I think its close, but if there's a gaping hole here somewhere then I'll punt for this release. -- Simon Riggs www.2ndQuadrant.com *** a/doc/src/sgml/protocol.sgml --- b/doc/src/sgml/protocol.sgml *** *** 4222,4247 The commands accepted in walsender mode are: /varlistentry varlistentry term ! Bytereplaceablen/replaceable /term listitem para ! Data that forms part of WAL data stream. /para /listitem /varlistentry ! /variablelist /para /listitem /varlistentry /variablelist ! /para ! para A single WAL record is never split across two CopyData messages. When a WAL record crosses a WAL page boundary, however, and is therefore already split using continuation records, it can be split at the page boundary. In other words, the first main WAL record and its continuation records can be split across different CopyData messages. /para /listitem /varlistentry --- 4222,4257 /varlistentry varlistentry term ! Bytereplaceable8/replaceable /term listitem para ! Message timestamp. /para /listitem /varlistentry ! varlistentry ! term ! Bytereplaceablen/replaceable ! /term ! listitem ! para ! Data that forms part of WAL data stream. (May be zero length). /para /listitem /varlistentry /variablelist ! /para ! para A single WAL record is never split across two CopyData messages. When a WAL record crosses a WAL page boundary, however, and is therefore already split using continuation records, it can be split at the page boundary. In other words, the first main WAL record and its continuation records can be split across different CopyData messages. + /para + /listitem + /varlistentry + /variablelist /para /listitem /varlistentry *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *** *** 1938,1944 UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force) UpdateControlFile(); minRecoveryPoint = newMinRecoveryPoint; ! ereport(DEBUG2, (errmsg(updated min recovery point to %X/%X, minRecoveryPoint.xlogid, minRecoveryPoint.xrecoff))); } --- 1938,1944 UpdateControlFile(); minRecoveryPoint = newMinRecoveryPoint; ! ereport(DEBUG3, (errmsg(updated min recovery point to %X/%X, minRecoveryPoint.xlogid, minRecoveryPoint.xrecoff))); } *** *** 9210,9218 retry: { /* * While walreceiver is active, wait for new WAL to arrive ! * from primary. */ ! receivedUpto = GetWalRcvWriteRecPtr(); if (XLByteLT(*RecPtr, receivedUpto)) { /* --- 9210,9218 { /* * While walreceiver is active, wait for new WAL to arrive ! * from primary. Get next applychunk and do other bookkeeping. */ ! receivedUpto = GetWalRcvNextApplyChunk(); if (XLByteLT(*RecPtr, receivedUpto)) { /* *** a/src/backend/replication/walreceiver.c --- b/src/backend/replication/walreceiver.c *** *** 394,410 XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len) case 'w':/* WAL records */ { XLogRecPtr recptr; ! if (len sizeof(XLogRecPtr)) ereport(ERROR, (errcode(ERRCODE_PROTOCOL_VIOLATION), errmsg_internal(invalid WAL message received from primary))); memcpy(recptr, buf, sizeof(XLogRecPtr)); buf += sizeof(XLogRecPtr); len -= sizeof(XLogRecPtr); ! XLogWalRcvWrite(buf, len, recptr); break; } default: --- 394,427 case 'w':/* WAL records */ { XLogRecPtr recptr;
Re: [HACKERS] functional call named notation clashes with SQL feature
On May 27, 2010, at 9:59 AM, Tom Lane wrote: I think we should fix it now. Quick thought: maybe we could use FOR instead of AS: select myfunc(7 for a, 6 for b); I'm afraid FOR doesn't work either; it'll create a conflict with the spec-defined SUBSTRING(x FOR y) syntax. How about ISPARAMVALUEFOR? That shouldn't conflict with anything. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Why my manualy constructed raw parser tree produce failed to execute?
Hi all, I have try to understand how parser work. So far the raw parser in PostgreSQL will produce that called raw parser tree and the raw parser tree will be passed to analyzer, right? I have modified PostgreSQL, so the program wont call function raw_parser -a function container that make by yacc and lex-. The part that i try to modified was at src/backend/tcop/postgres.c function pg_parse_query. I have using GDB to see list and node structure (a parse tree structure that made using yacc) of my working query. I have constructed same list and node structure as my working query, using manual code (lit_make1, lappend, makeNode, etc). GDB pprint show that my manualy constructed list are identically with my working query, at least when i try to compare it by my eye 1 on 1. But when my manualy constructed query tree is execute, it produce error: unrecognized node type. Now to the question, why my manualy constructed list was failed to execute? I was pretty sure that my list node was identical with yacc. Is there something that i miss when i consctructed my list (perhaps some list structure part that not printed by GDB)? By the way in GBD i using call pprint(node/list name) to display my list. Thank You. -- Mohammad Heykal Abdillah heykal.abdil...@gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Well, maybe I'm confused here, but arranging things so that we NEVER have to visit the page after initially writing it seems like it's setting the bar almost impossibly high. That is the use case, though. What I've encountered so far at 3 client sites is tables which are largely append-only, with a few selects and very few updates ( 2%) on recent data. In general, once data gets flushed out of memory, it goes to disk and never gets recalled, and certainly not written. Thinks are hunky-dory until we reach max_freeze_age, at which point the server has to chew through hundreds of gigabytes of old data just to freeze them, sometimes bringing the application to a halt in the process. The user's perspective on this is quite reasonable: if I haven't selected these pages, and I haven't written to them, why does autovacuum need to visit them and screw up my server performance? Consider a table that is regularly written but append-only. Every time autovacuum kicks in, we'll go and remove any dead tuples and then mark the pages PD_ALL_VISIBLE and set the visibility map bits, which will cause subsequent vacuums to ignore the all-visible portions of the table... until anti-wraparound kicks in, at which point we'll vacuum the entire table and freeze everything. If, however, we decree that you can't write a new tuple into a PD_ALL_VISIBLE page without freezing the existing tuples, then you'll still have the small, incremental vacuums but those are pretty cheap, That only works if those pages were going to be autovacuumed anyway. In the case outlined above (which I've seen at 3 different production sites this year), they wouldn't be; a table with less than 2% updates and deletes does not get vacuumed until max_freeze_age for any reason. For that matter, pages which are getting autovacuumed are not a problem, period; they're being read and written and freezing them is not an issue. I'm not seeing a way of fixing this common issue short of overhauling CLOG, or of creating a freeze_map. Darn. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
On fre, 2010-05-28 at 00:46 +0900, Tatsuo Ishii wrote: I don't know about Japanese, but the locale approach works just fine for other agglutinative languages. I would rather suspect that it is the trigram approach that might be rather useless for such languages, because you are going to get a lot of similarity hits for the affixes. I'm not sure what you mean by affixes. But I will explain... A Japanese sentence consists of words. Problem is, each word is not separated by space (agglutinative). So most text tools such as text search need preprocess which finds word boundaries by looking up dictionaries (and smart grammer analysis routine). In the process affixes can be determined and perhaps removed from the target word group to be used for text search (note that removing affixes is no relevant to locale). Once we get space separated sentence, it can be processed by text search or by pg_trgm just same as Engligh. (Note that these preprocessing are done outside PostgreSQL world). The difference is just the word can be consists of non ASCII letters. I think the problem at hand has nothing at all to do with agglutination or CJK-specific issues. You will get the same problem with other languages *if* you set a locale that does not adequately support the characters in use. E.g., Russian with locale C and encoding UTF8: select similarity(E'\u0441\u043B\u043E\u043D', E'\u0441\u043B\u043E \u043D\u044B'); similarity NaN (1 row) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] List traffic
On 5/27/10 8:38 AM, Greg Stark wrote: Lists like -ecpg or -odbc would work fine if the traffic warranted them. A low-traffic list is a feature, not a bug. Most people don't *like* subscribing to lists which have 80posts/day. But some of the lists we have now are 99% overlap with each other -- I claim because the definitions are meaningless. What part of postgres discussion (aside from this thread) *don't* relate in some way to SQL? Or administration? Or performance? Most performance problems end up being solved by adjusting SQL or changing GUCs. This is a set theory fallacy. While most performance issues are administration issues as well, it is NOT therefore true that most administration issues are also performance issues. In fact, I'd say that the -performance list does an excellent job of sticking to troubleshooting performance issues only. And for someone who has a performance issue, and does not want to field 100 emails about can't install Postgre, that's a feature. Mot administration questions are originally posed as general help questions. If you're subscribed to these lists you get a random, fairly small, subset of discussion related these topics. Only someone who is a postgresql developer would consider 15-30 posts/day small. For most of our user base, the level of traffic on -performance, -sql, and -general is already too high and many people don't subscribe to these lists because it is too high. I get complaints -- and people personal-sending me questions because they don't want to subscribe -- all the time. Having fewer posts on any particular list is *desireable*. It's a good thing. It's *only* a problem when a bug report or user question goes unanswered because the list is unattended. And so far, I've only seen one report of that. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.0beta2 release plans
On Thu, 2010-05-27 at 11:24 -0400, Tom Lane wrote: Current thought among core is to wrap beta2 Thursday June 3 (a week from today) for public release on Monday the 7th. Get those fixes in. I believe my issues are all sorted, apart from one outstanding patch to attempt to address the max_standby_delay discussions. If anybody knows different please ping me directly. I'm going to be in powersave mode for a few weeks. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
On 5/26/10 6:32 PM, Robert Haas wrote: Hmm, yeah. Maybe we should freeze when we set PD_ALL_VISIBLE; that might be just as good, and simpler. Assuming the visibility map is sufficiently crash-safe/non-buggy, we could then teach VACUUM that it's OK to advance relfrozenxid even when doing just a partial vacuum - because any pages that were skipped must contain only frozen tuples. Previously you've objected to proposals in this direction because they might destroy forensic information, but maybe we should do it anyway. It would be an improvement, and easier than the various ways of never having to visit the pages, which are all fairly intensive. Given the destruction of rollback information, though, we'd probably want a way to switch this behaviour on and off as an autovacuum setting. Does this send us down the wrong path, though? I thought we wanted to think about removing hint bits so that we could implement things like CRCs. No? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [ADMIN] command tag logging
Ray Stell ste...@cns.vt.edu writes: On Thu, May 27, 2010 at 12:49:49PM -0400, Tom Lane wrote: That is in the right place, isn't it. That would suggest that get_ps_display() is returning a wrong length on Ray's machine. It works okay here, but since that's platform-specific code that hardly proves much. Ray, what platform is this exactly? [postgres ~]$ cat /etc/issue Red Hat Enterprise Linux Server release 5.5 (Tikanga) OK, I can reproduce it when I try on my Fedora box. The problem is that log_connections emits a log message before init_ps_display() has been called, and the ps_status.c logic isn't careful to ensure that it returns something sane in that case. It accidentally failed to fail too badly before I changed the elog.c logic, because there'd be an embedded null after the program name in most cases. I suppose people might even have thought that printing postmaster or postgres for %i was intended behavior there. I think the most useful fix is to create a static variable to hold the notional strlen(ps_buffer), which will initialize to zero, and then we can use that instead of groveling over the string in get_ps_display. Should improve performance a tad as well as fixing this problem. Will work on that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON manipulation functions
I've started implementing the JSON datatype; the repo is at http://git.postgresql.org/gitweb?p=json-datatype.git . On Fri, May 14, 2010 at 1:15 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, May 13, 2010 at 9:47 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: Would it be a bad idea to give an enum and a function the same name (which appears to be allowed by PostgreSQL) ? If so, json_type(json) could be json_typeof(json) or something instead. No, I think that's a fine idea. I tried making a function named json_type that has the same name as the type json_type. However, this doesn't work as expected: SELECT json_type('[1,2,3]'); Instead of calling json_type with '[1,2,3]' casted to JSON, it's trying to cast '[1,2,3]' to json_type. Is there a way to override this behavior, or would I be better off renaming the function? Note that if the function were renamed, the literal would implicitly be json: SELECT json_typeof('[1,2,3]'); -- works I tried this: CREATE OR REPLACE FUNCTION json_type(json) RETURNS json_type AS 'MODULE_PATHNAME','json_get_type' LANGUAGE C STRICT IMMUTABLE; CREATE CAST (json AS json_type) WITH FUNCTION json_type(json); However, json_type('[1,2,3]') still doesn't work (it doesn't infer that '[1,2,3]' should be casted to json first). I also tried each of AS ASSIGNMENT and AS IMPLICIT as well. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON manipulation functions
On Thu, May 27, 2010 at 2:39 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: I've started implementing the JSON datatype; the repo is at http://git.postgresql.org/gitweb?p=json-datatype.git . On Fri, May 14, 2010 at 1:15 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, May 13, 2010 at 9:47 PM, Joseph Adams joeyadams3.14...@gmail.com wrote: Would it be a bad idea to give an enum and a function the same name (which appears to be allowed by PostgreSQL) ? If so, json_type(json) could be json_typeof(json) or something instead. No, I think that's a fine idea. I tried making a function named json_type that has the same name as the type json_type. However, this doesn't work as expected: SELECT json_type('[1,2,3]'); Instead of calling json_type with '[1,2,3]' casted to JSON, it's trying to cast '[1,2,3]' to json_type. Is there a way to override this behavior, or would I be better off renaming the function? Well, you should rename either the function or the type, I guess. Not sure which. Calling it json_typeof would be reasonable... Note that if the function were renamed, the literal would implicitly be json: SELECT json_typeof('[1,2,3]'); -- works I tried this: CREATE OR REPLACE FUNCTION json_type(json) RETURNS json_type AS 'MODULE_PATHNAME','json_get_type' LANGUAGE C STRICT IMMUTABLE; CREATE CAST (json AS json_type) WITH FUNCTION json_type(json); However, json_type('[1,2,3]') still doesn't work (it doesn't infer that '[1,2,3]' should be casted to json first). I also tried each of AS ASSIGNMENT and AS IMPLICIT as well. Yeah, I don't think you want to go that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
On Thu, May 27, 2010 at 2:17 PM, Josh Berkus j...@agliodbs.com wrote: On 5/26/10 6:32 PM, Robert Haas wrote: Hmm, yeah. Maybe we should freeze when we set PD_ALL_VISIBLE; that might be just as good, and simpler. Assuming the visibility map is sufficiently crash-safe/non-buggy, we could then teach VACUUM that it's OK to advance relfrozenxid even when doing just a partial vacuum - because any pages that were skipped must contain only frozen tuples. Previously you've objected to proposals in this direction because they might destroy forensic information, but maybe we should do it anyway. It would be an improvement, and easier than the various ways of never having to visit the pages, which are all fairly intensive. Given the destruction of rollback information, though, we'd probably want a way to switch this behaviour on and off as an autovacuum setting. It's not going to destroy anything that is needed for rollback unless there's a bug - PD_ALL_VISIBLE only gets set when all tuples on the page are visible to all backends. That can't happen until all transactions that wrote the page, and all others that have a lower xmin, have committed. That having been said, if making it a GUC makes people less nervous about doing it, then +1 from me. Does this send us down the wrong path, though? I thought we wanted to think about removing hint bits so that we could implement things like CRCs. No? PD_ALL_VISIBLE is a page-level bit, not a tuple-level bit, and I strongly suspect it's not going anywhere. It's critical infrastructure for index-only scans, among other things. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
On Thu, May 27, 2010 at 2:00 PM, Josh Berkus j...@agliodbs.com wrote: Well, maybe I'm confused here, but arranging things so that we NEVER have to visit the page after initially writing it seems like it's setting the bar almost impossibly high. That is the use case, though. What I've encountered so far at 3 client sites is tables which are largely append-only, with a few selects and very few updates ( 2%) on recent data. In general, once data gets flushed out of memory, it goes to disk and never gets recalled, and certainly not written. We might be able to optimize this case if the transactions are small, such that they commit before dirtying too large a fraction of shared_buffers. We could - at least in theory - teach the bgwriter or some other process to freeze them before writing them to disk the first time. But if the blocks have to be written to disk before transaction commit it seems to me we're DOA, unless we're willing to retain arbitrarily large amounts of CLOG. What might be more practical is to try to find ways to spread out the I/O so that it doesn't happen all at once in a huge ornery spike. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
On Thu, May 27, 2010 at 1:27 PM, David E. Wheeler da...@kineticode.com wrote: On May 27, 2010, at 9:59 AM, Tom Lane wrote: I think we should fix it now. Quick thought: maybe we could use FOR instead of AS: select myfunc(7 for a, 6 for b); I'm afraid FOR doesn't work either; it'll create a conflict with the spec-defined SUBSTRING(x FOR y) syntax. How about ISPARAMVALUEFOR? That shouldn't conflict with anything. Or we could use the Finnish word epäjärjestelmällistyttämättömyydellänsäkäänköhän, which I'm pretty sure is not currently used in our grammar. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
On May 27, 2010, at 11:55 AM, Robert Haas wrote: Or we could use the Finnish word epäjärjestelmällistyttämättömyydellänsäkäänköhän, which I'm pretty sure is not currently used in our grammar. I thought that was an Icelandic volcano. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_trgm
On Thu, May 27, 2010 at 2:01 PM, Peter Eisentraut pete...@gmx.net wrote: On fre, 2010-05-28 at 00:46 +0900, Tatsuo Ishii wrote: I don't know about Japanese, but the locale approach works just fine for other agglutinative languages. I would rather suspect that it is the trigram approach that might be rather useless for such languages, because you are going to get a lot of similarity hits for the affixes. I'm not sure what you mean by affixes. But I will explain... A Japanese sentence consists of words. Problem is, each word is not separated by space (agglutinative). So most text tools such as text search need preprocess which finds word boundaries by looking up dictionaries (and smart grammer analysis routine). In the process affixes can be determined and perhaps removed from the target word group to be used for text search (note that removing affixes is no relevant to locale). Once we get space separated sentence, it can be processed by text search or by pg_trgm just same as Engligh. (Note that these preprocessing are done outside PostgreSQL world). The difference is just the word can be consists of non ASCII letters. I think the problem at hand has nothing at all to do with agglutination or CJK-specific issues. You will get the same problem with other languages *if* you set a locale that does not adequately support the characters in use. E.g., Russian with locale C and encoding UTF8: select similarity(E'\u0441\u043B\u043E\u043D', E'\u0441\u043B\u043E \u043D\u044B'); similarity NaN (1 row) What I can't help wondering as I'm reading this discussion is - Tatsuo-san said upthread that he has a problem with pg_trgm that he does not have with full text search. So what is full text search doing differently than pg_trgm? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] functional call named notation clashes with SQL feature
On Thu, May 27, 2010 at 2:59 PM, David E. Wheeler da...@kineticode.com wrote: On May 27, 2010, at 11:55 AM, Robert Haas wrote: Or we could use the Finnish word epäjärjestelmällistyttämättömyydellänsäkäänköhän, which I'm pretty sure is not currently used in our grammar. I thought that was an Icelandic volcano. No, that's Eyjafjallajökull. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers