Re: [HACKERS] ExecutorCheckPerms() hook
The attached patch is a revised one for DML permission checks. List of updates:
- Source code comments in the patched functions were revised.
- ExecCheckRTPerms() and ExecCheckRTEPerms() were moved to aclchk.c, and renamed to chkpriv_relation_perms() and chkpriv_rte_perms().
- The checker now takes a second argument (bool abort) as a hint for its behavior on access violations.
- It also returns AclResult, instead of bool.
- I assume RI_Initial_Check() is not broken right now, so this patch just reworks DML permission checks without any bugfixes.
- The ESP hook was moved from ExecCheckRTEPerms() to ExecCheckRTPerms().
- DoCopy() and RI_Initial_Check() call the checker function with list_make1(&rte), instead of &rte.
- In DoCopy(), required_access is used to store either ACL_SELECT or ACL_INSERT; it is initialized at the head of the function.
- In DoCopy(), selectedCols or modifiedCols of the RTE is initialized depending on is_from, instead of using columnsSet.

ToDo:
- makeRangeTblEntry() stuff to allocate an RTE node with the given parameters is not done yet.

Thanks,

(2010/05/26 12:04), KaiGai Kohei wrote:
> (2010/05/26 11:12), Stephen Frost wrote:
>> * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
>>>> #2: REALLY BIG ISSUE- You've added ExecutorCheckPerms_hook as part of this patch- don't, we're in feature-freeze right now and should not be adding hooks at this time.
>>>
>>> The patch is intended to submit for the v9.1 development, not v9.0, isn't it?
>>
>> That really depends on if this is actually fixing a bug in the existing code or not. I'm on the fence about that at the moment, to be honest. I was trying to find if we explicitly say that SELECT rights are needed to reference a column but wasn't able to. If every code path is expecting that, then perhaps we should just document it that way and move on. In that case, all these changes would be for 9.1. If we decide the current behavior is a bug, it might be something which could be fixed in 9.0 and maybe back-patched.
>
> Ahh, because I found out an independent problem during the discussion, it made us confused. Please make it clear that this patch does not intend to fix the bug.
>
> If we decide it is an actual bug to be fixed/informed, I also agree it should be worked on in a separate patch.
>
> Well, the rest of the discussion should be held in a different thread.
>
>> In *either* case, given that one is a 'clean-up' patch and the other is 'new functionality', they should be independent *anyway*. Small incremental changes that don't break things when applied is what we're shooting for here.
>
> Agreed.
>
>>>> #3: You didn't move ExecCheckRTPerms() and ExecCheckRTEPerms() to utils/acl and instead added executor/executor.h to ri_triggers.c. I don't particularly like that. I admit that DoCopy() already knew about the executor, and if that were the only case outside of the executor where ExecCheckRTPerms() was getting called it'd probably be alright, but we already have another place that wants to use it, so let's move it to a more appropriate place.
>>>
>>> Sorry, I'm a bit confused. It seemed to me you suggested to utilize ExecCheckRTPerms() rather than moving its logic anywhere, so I kept it here. (Was it a misunderstanding?)
>>
>> I'm talking about moving the whole function (all 3 lines of it) to somewhere else and then reworking the function to be more appropriate based on its new location (including renaming and changing arguments and return values, as appropriate).
>
> OK, I agreed.
>
>>> If so, but, I doubt utils/acl is the best place for the moved ExecCheckRTPerms(), because the checker function calls both the default acl functions and an optional external security function.
>>
>> Can you explain why you think that having a function in utils/acl (eg: include/utils/acl.h and backend/utils/aclchk.c) which calls default acl functions and allows for an external hook would be a bad idea?
>>
>>> It means the ExecCheckRTPerms() is a caller of acl functions, not an acl function itself, isn't it?
>>
>> It's providing a higher-level service, sure, but there's nothing particularly interesting or special about what it's doing in this case, and we need it in multiple places. Why duplicate it?
>
> If the number of checker functions is the only reason why we move ExecCheckRTPerms() into backend/utils/aclchk.c right now, I don't have any opposition. When it reaches a dozen, we can consider a new location. Right?
>
> Sorry, the name of pg_rangetbl_aclcheck() was misleading for me.
>
>>> I agreed the checker function is not a part of the executor, but it is also not a part of the acl functions in my opinion.
>>>
>>> If it is disinclined to create a new directory to deploy the checker function, my preference is src/backend/utils/adt/security.c and src/include/utils/security.h .
>>
>> We don't need a new directory or file for one function, as Robert
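For readers following the patch summary at the top of this message, the reworked entry point might look roughly like this. This is pseudocode based only on the names mentioned in the summary (chkpriv_relation_perms(), the bool abort hint, the AclResult return value, the relocated ESP hook); the actual patch may differ in details:

```
/* sketch: checker moved from execMain.c into aclchk.c */
AclResult chkpriv_relation_perms(List *rangeTable, bool abort)
    for each RangeTblEntry rte in rangeTable:
        result = chkpriv_rte_perms(rte, abort)   /* default ACL checks   */
        if result != ACLCHECK_OK:
            return result     /* with abort=true, ereport() was raised */
    if ExecutorCheckPerms_hook:                  /* ESP hook lives here  */
        return ExecutorCheckPerms_hook(rangeTable, abort)
    return ACLCHECK_OK

/* DoCopy() and RI_Initial_Check() wrap their single RTE in a list: */
chkpriv_relation_perms(list_make1(&rte), true)
```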
Re: [HACKERS] Synchronization levels in SR
On Wed, 2010-05-26 at 13:03 +0900, Fujii Masao wrote: > On Wed, May 26, 2010 at 1:04 AM, Simon Riggs wrote: > > On Tue, 2010-05-25 at 12:40 +0900, Fujii Masao wrote: > >> On Tue, May 25, 2010 at 10:29 AM, Josh Berkus wrote: > >> > I agree that #4 should be done last, but it will be needed, not in the > >> > least by your employer ;-) . I don't see any obvious way to make #4 > >> > compatible with any significant query load on the slave, but in general > >> > I'd think that users of #4 are far more concerned with 0% data loss than > >> > they are with getting the slave to run read queries. > >> > >> Since #2 and #3 are enough for 0% data loss, I think that such users > >> would be more concerned about what results are visible in the standby. > >> No? > > > > Please add #4 also. You can do that easily at the same time as #2 and > > #3, and it will leave me free to fix the perceived conflict problems. > > I think that we should implement the feature in small steps rather than > submit one big patch at a time. So I'd like to focus on #2 and #3 at first, > and #4 later (maybe third or fourth CF). We both know if you do #2 and #3 then doing #4 also is trivial. If you leave it out then we'll end up missing something that is required and have to rework everything. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 23:59 -0400, Robert Haas wrote:
> Quorum commit is definitely an extra knob, IMHO.

No, it's about three less, as I have explained. Explain your position, don't just demand others listen.

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
On 25/05/10 23:56, Josh Berkus wrote:
> Do we get a bit in the visibility map for a page which has aborted transaction rows on it?

If there's a tuple with an aborted xmin on a page, the bit in the visibility map is not set. A tuple with an aborted xmax doesn't matter.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] tsvector pg_stats seems quite a bit off.
On 26/05/2010, at 01.16, Jan Urbański wrote:

On 19/05/10 21:01, Jesper Krogh wrote: The document base is around 350,000 documents and I have set the statistics target on the tsvector column to 1000, since the 100 seems way off.

So for tsvectors the statistics target means more or less "at any time track at most 10 * statistics target lexemes simultaneously" where "track" means keeping them in memory while going through the tuples being analysed. Remember that the measure is in lexemes, not whole tsvectors, and the factor of 10 is meant to approximate the average number of unique lexemes in a tsvector. If your documents are very large, this might not be a good approximation.

I just did an avg(length(document_tsvector)), which is 154. Doc count is 1.3m now in my sample set. But the distribution is very "flat" at the end; the last 128 values are exactly 1.00189e-05, which means that any term sitting outside the array would get an estimate of 1.00189e-05 * 350174 / 2 = 1.75 ~ 2 rows.

Yeah, this might mean that you could try cranking up the stats target a lot, to make the set of simultaneously tracked lexemes larger (it will cost time and memory during analyse though). If the documents have completely different contents, what can happen is that almost all lexemes are only seen a few times and get removed during the pruning of the working set. I have seen similar behaviour while working on the typanalyze function for tsvectors.

I think I would prefer something less "magic". I can increase the statistics target and get more reliable data, but that also increases the amount of tuples being picked out for analysis, which is really time consuming.

But that also means that what gets stored as the lower bound of the histogram isn't anywhere near the lower bound; it is more the lower bound of the "artificial" histogram that happened after the last pruning. I would suggest that the pruning in the end should be aware of this.
Perhaps by keeping track of the least frequent value that never got pruned and using that as the lower bound after the last pruning?

Thanks a lot for the explanation; it fits fairly well with why I couldn't construct a simple test set that had the problem.

So far I have no idea if this is bad or good, so a couple of sample runs of stuff that is sitting outside the "most_common_vals" array: [gathered statistics suck]

So the "most_common_vals" seems to contain a lot of values that should never have been kept in favor of other values that are more common. In practice, just cranking the statistics estimate up high enough seems to solve the problem, but doesn't there seem to be something wrong in how the statistics are collected?

The algorithm to determine most common vals does not do it accurately. That would require keeping all lexemes from the analysed tsvectors in memory, which would be impractical. If you want to learn more about the algorithm being used, try reading http://www.vldb.org/conf/2002/S10P03.pdf and the corresponding comments in ts_typanalyze.c

I'll do some reading.

Jesper
Re: [HACKERS] Synchronization levels in SR
On Wed, May 26, 2010 at 1:04 AM, Simon Riggs wrote: > On Tue, 2010-05-25 at 12:40 +0900, Fujii Masao wrote: >> On Tue, May 25, 2010 at 10:29 AM, Josh Berkus wrote: >> > I agree that #4 should be done last, but it will be needed, not in the >> > least by your employer ;-) . I don't see any obvious way to make #4 >> > compatible with any significant query load on the slave, but in general >> > I'd think that users of #4 are far more concerned with 0% data loss than >> > they are with getting the slave to run read queries. >> >> Since #2 and #3 are enough for 0% data loss, I think that such users >> would be more concerned about what results are visible in the standby. >> No? > > Please add #4 also. You can do that easily at the same time as #2 and > #3, and it will leave me free to fix the perceived conflict problems. I think that we should implement the feature in small steps rather than submit one big patch at a time. So I'd like to focus on #2 and #3 at first, and #4 later (maybe third or fourth CF). Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Synchronization levels in SR
On Tue, May 25, 2010 at 11:36 PM, Fujii Masao wrote: > On Wed, May 26, 2010 at 2:10 AM, Simon Riggs wrote: >> My suggestion is simply to have a single parameter (name unimportant) >> >> number_of_synch_servers_we_wait_for = N >> >> which is much easier to understand because it is phrased in terms of the >> guarantee given to the transaction, not in terms of what the admin >> thinks is the situation. > > How can we choose #2, #3 or #4 by using your proposed option? > > As the result of the discussion, I'm inclined towards choosing the > "mix" approach. How about specifying #1, #2, #3 or #4 per standby, > and specifying the number of "synchronous" (i.e., means #2, #3 or > #4) standbys the transaction commit waits for at the master as > Simon suggests? > > We add new option "replication_mode" (better name?) specifying > when the standby sends the ACK meaning the completion of replication > to the master into recovery.conf. Valid values are "async", "recv", > "fsync" and "redo". Those correspond to #1, #2, #3 and #4 I defined > on the top of the thread. > > If "async", the standby never sends any ACK. If "recv", "fsync", > or "redo", the standby sends the ACK when it has received, fsynced > or replayed the WAL from the master, respectively. > > On the other hand, we add new GUC "max_synchronous_standbys" > (I prefer it to "number_of_synch_servers_we_wait_for", but does > anyone have better name?) as PGC_USERSET into postgresql.conf. > It specifies the maximum number of standbys which transaction > commit must wait for the ACK from. > > If max_synchronous_standbys is 0, no transaction commit waits for > ACK even if some connected standbys set their replication_mode to > "recv", "fsync" or "redo". If it's positive, transaction commit waits > for N ACKs. N is the smaller number between max_synchronous_standbys > and the actual number of connected "synchronous" standbys. > > Thought? I think we're over-engineering this. For a first version we should do something simple. 
Then we can add some of these extra knobs in a follow-on patch. Quorum commit is definitely an extra knob, IMHO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] ExecutorCheckPerms() hook
(2010/05/26 12:17), Tom Lane wrote:
> Stephen Frost writes:
>> That may be the case. I'm certainly more concerned with a bug in the existing code than any new code that we're working on. The question is- is this actually a user-visible bug? Or do we require that a user creating an FK needs SELECT rights on the primary table? If so, it's still a bug, but at that point it's a bug in our documentation where we don't mention that SELECT rights are also needed.
>
> Having an FK to another table certainly allows at least an indirect form of SELECT, because you can determine whether any given key exists in the PK table by seeing if you're allowed to insert a referencing row. I haven't dug in the SQL spec to see if that addresses the point, but it wouldn't bother me in the least to insist that both REFERENCES and SELECT privilege are required to create an FK.
>
> In any case, RI_Initial_Check isn't broken, because if it can't do the SELECTs it just falls back to a slower method. It's arguable that the FK triggers themselves are assuming more than they should about permissions, but I don't think that RI_Initial_Check can be claimed to be buggy.

Hmm. If both REFERENCES and SELECT privilege are required to create a new FK constraint, why does RI_Initial_Check() need to check SELECT permission prior to SPI_execute()? It eventually checks SELECT privilege during execution of the secondary query. It is unclear to me why we need to provide a slower fallback.

Thanks,
-- KaiGai Kohei
Re: [HACKERS] Synchronization levels in SR
On Wed, May 26, 2010 at 2:10 AM, Simon Riggs wrote:
> My suggestion is simply to have a single parameter (name unimportant)
>
> number_of_synch_servers_we_wait_for = N
>
> which is much easier to understand because it is phrased in terms of the guarantee given to the transaction, not in terms of what the admin thinks is the situation.

How can we choose #2, #3 or #4 by using your proposed option?

As the result of the discussion, I'm inclined towards choosing the "mix" approach. How about specifying #1, #2, #3 or #4 per standby, and specifying the number of "synchronous" (i.e., #2, #3 or #4) standbys the transaction commit waits for at the master, as Simon suggests?

We add a new option "replication_mode" (better name?) into recovery.conf, specifying when the standby sends the ACK meaning the completion of replication to the master. Valid values are "async", "recv", "fsync" and "redo". Those correspond to #1, #2, #3 and #4 I defined at the top of the thread.

If "async", the standby never sends any ACK. If "recv", "fsync", or "redo", the standby sends the ACK when it has received, fsynced or replayed the WAL from the master, respectively.

On the other hand, we add a new GUC "max_synchronous_standbys" (I prefer it to "number_of_synch_servers_we_wait_for", but does anyone have a better name?) as PGC_USERSET into postgresql.conf. It specifies the maximum number of standbys which transaction commit must wait for the ACK from.

If max_synchronous_standbys is 0, no transaction commit waits for ACK even if some connected standbys set their replication_mode to "recv", "fsync" or "redo". If it's positive, transaction commit waits for N ACKs. N is the smaller number between max_synchronous_standbys and the actual number of connected "synchronous" standbys.

Thought? 
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
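Fujii's proposal above, if adopted, would look something like this in the configuration files. The option names and values here are only the ones proposed in this thread, not anything released:

```
# recovery.conf on each standby: when to send the ACK back to the master.
#   async: never    recv: on WAL receipt    fsync: after fsync    redo: after replay
replication_mode = 'fsync'

# postgresql.conf on the master (PGC_USERSET): how many "synchronous"
# standbys (recv/fsync/redo) a transaction commit waits for; 0 disables
# waiting entirely, even if synchronous standbys are connected.
max_synchronous_standbys = 1
```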
Re: [HACKERS] Open Item: pg_controldata - machine readable?
Joe Conway wrote: > >> There is an open item "pg_controldata - machine readable?" in the list: > >> http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items > >> Should we add the module to 9.0? > > > > No. This is a new feature that wasn't even under consideration, > > let alone written, at the time of 9.0 feature freeze. It does not > > get into either core or contrib this time around. > > Yup, agreed. That was why I put it on github instead of sending a patch > to the list. It was also a quick and dirty hack -- maybe it could be > cleaned up for 9.1, but I'm not sure there was consensus that it was > really needed. OK, I moved it from 9.0 open items to new features for 9.1. Regards, --- Takahiro Itagaki NTT Open Source Software Center
Re: [HACKERS] Open Item: invalid declspec for PG_MODULE_MAGIC
Takahiro Itagaki writes: > * Should we backport the fix to previous releases? Certainly not. It hasn't gotten through beta, and the risk of breaking third-party modules is nonnegligible. regards, tom lane
Re: [HACKERS] ExecutorCheckPerms() hook
Stephen Frost writes: > That may be the case. I'm certainly more concerned with a bug in the > existing code than any new code that we're working on. The question is- > is this actually a user-visible bug? Or do we require that a user > creating an FK needs SELECT rights on the primary table? If so, it's > still a bug, but at that point it's a bug in our documentation where we > don't mention that SELECT rights are also needed. Having an FK to another table certainly allows at least an indirect form of SELECT, because you can determine whether any given key exists in the PK table by seeing if you're allowed to insert a referencing row. I haven't dug in the SQL spec to see if that addresses the point, but it wouldn't bother me in the least to insist that both REFERENCES and SELECT privilege are required to create an FK. In any case, RI_Initial_Check isn't broken, because if it can't do the SELECTs it just falls back to a slower method. It's arguable that the FK triggers themselves are assuming more than they should about permissions, but I don't think that RI_Initial_Check can be claimed to be buggy. regards, tom lane
Re: [HACKERS] Open Item: pg_controldata - machine readable?
On 05/25/2010 08:03 PM, Tom Lane wrote: > Takahiro Itagaki writes: >> There is an open item "pg_controldata - machine readable?" in the list: >> http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items > >> The proposal by Joe Conway is adding a new contrib module. >> http://archives.postgresql.org/message-id/4b959d7a.6010...@joeconway.com >> http://github.com/jconway/pg_controldata > >> Should we add the module to 9.0? > > No. This is a new feature that wasn't even under consideration, > let alone written, at the time of 9.0 feature freeze. It does not > get into either core or contrib this time around. Yup, agreed. That was why I put it on github instead of sending a patch to the list. It was also a quick and dirty hack -- maybe it could be cleaned up for 9.1, but I'm not sure there was consensus that it was really needed. Joe
Re: [HACKERS] ExecutorCheckPerms() hook
(2010/05/26 11:12), Stephen Frost wrote: > * KaiGai Kohei (kai...@ak.jp.nec.com) wrote: >>> #2: REALLY BIG ISSUE- You've added ExecutorCheckPerms_hook as part of >>> this patch- don't, we're in feature-freeze right now and should not be >>> adding hooks at this time. >> >> The patch is intended to submit for the v9.1 development, not v9.0, isn't it? > > That really depends on if this is actually fixing a bug in the existing > code or not. I'm on the fence about that at the moment, to be honest. > I was trying to find if we explicitly say that SELECT rights are needed > to reference a column but wasn't able to. If every code path is > expecting that, then perhaps we should just document it that way and > move on. In that case, all these changes would be for 9.1. If we > decide the current behavior is a bug, it might be something which could > be fixed in 9.0 and maybe back-patched. Ahh, because I found out an independent problem during the discussion, it made us confused. Please make it clear that this patch does not intend to fix the bug. If we decide it is an actual bug to be fixed/informed, I also agree it should be worked on in a separate patch. Well, the rest of the discussion should be held in a different thread. > In *either* case, given that one is a 'clean-up' patch and the other is > 'new functionality', they should be independent *anyway*. Small > incremental changes that don't break things when applied is what we're > shooting for here. Agreed. >>> #3: You didn't move ExecCheckRTPerms() and ExecCheckRTEPerms() to >>> utils/acl and instead added executor/executor.h to ri_triggers.c. >>> I don't particularly like that. I admit that DoCopy() already knew >>> about the executor, and if that were the only case outside of the >>> executor where ExecCheckRTPerms() was getting called it'd probably be >>> alright, but we already have another place that wants to use it, so >>> let's move it to a more appropriate place. >> >> Sorry, I'm a bit confused. 
>> It seemed to me you suggested to utilize ExecCheckRTPerms() rather than >> moving its logic anywhere, so I kept it here. (Was it a misunderstanding?) > > I'm talking about moving the whole function (all 3 lines of it) to > somewhere else and then reworking the function to be more appropriate > based on its new location (including renaming and changing arguments > and return values, as appropriate). OK, I agreed. >> If so, but, I doubt utils/acl is the best place for the moved >> ExecCheckRTPerms(), because the checker function calls both the >> default acl functions and an optional external security function. > > Can you explain why you think that having a function in utils/acl (eg: > include/utils/acl.h and backend/utils/aclchk.c) which calls default acl > functions and allows for an external hook would be a bad idea? > >> It means the ExecCheckRTPerms() is a caller of acl functions, not an acl >> function itself, isn't it? > > It's providing a higher-level service, sure, but there's nothing > particularly interesting or special about what it's doing in this case, > and, we need it in multiple places. Why duplicate it? If the number of checker functions is the only reason why we move ExecCheckRTPerms() into backend/utils/aclchk.c right now, I don't have any opposition. When it reaches a dozen, we can consider a new location. Right? Sorry, the name of pg_rangetbl_aclcheck() was misleading for me. >> I agreed the checker function is not a part of the executor, but it is >> also not a part of the acl functions in my opinion. >> >> If it is disinclined to create a new directory to deploy the checker >> function, my preference is src/backend/utils/adt/security.c and >> src/include/utils/security.h . > > We don't need a new directory or file for one function, as Robert > already pointed out. OK, let's consider when aclchk.c holds a dozen checker functions. 
>>> #6: I haven't checked yet, but if there are other things in an RTE which >>> would make sense in the DoCopy case, beyond just what's needed for the >>> permissions checking, and which wouldn't be 'correct' with a NULL'd >>> value, I would set those. Yes, we're building the RTE to check >>> permissions, but we don't want someone downstream to be surprised when >>> they make a change to something in the permissions checking and discover >>> that a value in the RTE they expected to be there wasn't valid. Even more >>> so, if there are function helpers which can be used to build an RTE, we >>> should be using them. The same goes for RI_Initial_Check(). >> >> Are you saying something like makeFuncExpr()? >> I basically agree. However, should it be done in this patch? > > Actually, I mean looking for, and using, things like > markRTEForSelectPriv() and addRangeTableEntry() or > addRangeTableEntryForRelation(). OK, I'll make it in a separate patch. >>> #8: When moving ExecCheckRTPerms(), you should rename it to be more like >>> the other function calls in acl.h. Perhaps pg_rangetbl_aclcheck()? >>> Also, it sh
Re: [HACKERS] Open Item: pg_controldata - machine readable?
Takahiro Itagaki writes: > There is an open item "pg_controldata - machine readable?" in the list: > http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items > The proposal by Joe Conway is adding a new contrib module. > http://archives.postgresql.org/message-id/4b959d7a.6010...@joeconway.com > http://github.com/jconway/pg_controldata > Should we add the module to 9.0? No. This is a new feature that wasn't even under consideration, let alone written, at the time of 9.0 feature freeze. It does not get into either core or contrib this time around. regards, tom lane
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/25/2010 4:16 PM, Tom Lane wrote: Jan Wieck writes: No, I meant how will the *function* know, if a superuser and/or some background process can purge records at any time? The data contains timestamps which are supposedly taken in commit order. You can *not* rely on the commit timestamps to be in exact order. (Perhaps approximate ordering is good enough for what you want here, but just be careful to not fall into the trap of assuming that they're exactly ordered.) I am well aware of the fact that commit timestamps within the WAL can go backwards and that the serial numbers of this proposed implementation of commit order can even be different from what the timestamps AND the WAL are saying. As long as the serial number (record position inside of segment) is determined while the transaction still holds all its locks, this is going to be good enough for what async replication users today are used to. Again, it will not magically make it possible to determine a serializable order of actions, that happened from transactions running in read committed isolation level, post mortem. I don't even think that is possible at all. And I don't think anyone proposed a solution for that problem anyways. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
[HACKERS] libpq should not be using SSL_CTX_set_client_cert_cb
I've been experimenting with SSL setups involving chains of CA certificates, ie, where the server or client cert itself is signed by an intermediate CA rather than a trusted root CA. This appears to work well enough on the server side if you configure the server correctly (see discussion of bug #5468). However, libpq is not able to work with a client certificate unless that cert is directly signed by a CA that the server trusts (ie, one listed directly in the server's root.crt file). This is because there is no good way to feed back any intermediate CA certs to the server. The man page for SSL_CTX_set_client_cert_cb says in so many words that the client_cert_cb API is maldesigned: BUGS The client_cert_cb() cannot return a complete certificate chain, it can only return one client certificate. If the chain only has a length of 2, the root CA certificate may be omitted according to the TLS standard and thus a standard conforming answer can be sent to the server. For a longer chain, the client must send the complete chain (with the option to leave out the root CA certificate). This can only be accomplished by either adding the intermediate CA certificates into the trusted certificate store for the SSL_CTX object (resulting in having to add CA certificates that otherwise maybe would not be trusted), or by adding the chain certificates using the SSL_CTX_add_extra_chain_cert(3) function, which is only available for the SSL_CTX object as a whole and that therefore probably can only apply for one client certificate, making the concept of the callback function (to allow the choice from several certificates) questionable. It strikes me that we could not only fix this case, but make the libpq code simpler and more like the backend case, if we got rid of client_cert_cb and instead preloaded the ~/.postgresql/postgresql.crt file using SSL_CTX_use_certificate_chain_file(). Then, using an indirectly signed client cert would only require including the full cert chain in that file. 
So I'm wondering if there was any specific reason behind using the callback API to start with. Anybody remember? regards, tom lane
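For reference, with the proposed switch to SSL_CTX_use_certificate_chain_file(), the client-side fix for an intermediate CA would just be concatenating the chain into the one file libpq loads. A sketch (the certificate file names are illustrative, and placeholder files stand in for real PEM certificates):

```shell
# Placeholder files standing in for real PEM certificates.
mkdir -p "$HOME/.postgresql"
printf -- '-----CLIENT CERT (placeholder)-----\n' > client.crt
printf -- '-----INTERMEDIATE CA (placeholder)-----\n' > intermediate-ca.crt

# The actual fix: build postgresql.crt as a chain, the client certificate
# first, then each intermediate CA cert in leaf-to-root order (the trusted
# root itself may be omitted, per the TLS standard).
cat client.crt intermediate-ca.crt > "$HOME/.postgresql/postgresql.crt"
```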
[HACKERS] Open Item: invalid declspec for PG_MODULE_MAGIC
This open item is for replacing the PGDLLIMPORT markers for PG_MODULE_MAGIC and PG_FUNCTION_INFO_V1 with __declspec(dllexport), because they are always exported by user modules rather than by the core code. http://archives.postgresql.org/message-id/20100329184705.a60e.52131...@oss.ntt.co.jp The fix is simple, so I think we can include it in 9.0. Arguable issues for the patch are: * Is there a better name than PGMODULEEXPORT? I like PGDLLEXPORT because it is similar to PGDLLIMPORT, but it might be too similar. * Should we backport the fix to previous releases? I'd like to backport it because it should not break any existing third-party modules, because they cannot even be built on Windows. Regards, --- Takahiro Itagaki NTT Open Source Software Center
Re: [HACKERS] ExecutorCheckPerms() hook
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote: > The reason why the user must have SELECT privileges on the PK/FK tables is > that validateForeignKeyConstraint() always calls SPI_execute() to verify > that FK constraints can be established between two tables (even in the fallback path). > > And, the reason why RI_Initial_Check() now calls pg_class_aclcheck() is > to try to avoid an unexpected access violation error from SPI_execute(). > However, the fallback path also calls SPI_execute(), so I concluded > the permission checks in RI_Initial_Check() are nonsense. That may be the case. I'm certainly more concerned with a bug in the existing code than any new code that we're working on. The question is- is this actually a user-visible bug? Or do we require that a user creating an FK needs SELECT rights on the primary table? If so, it's still a bug, but at that point it's a bug in our documentation where we don't mention that SELECT rights are also needed. Anyone know what the SQL spec says about this (if anything...)? > However, it is an independent issue right now, so I kept it as is. Uh, I don't really see it as independent.. If we have a bug there that we need to fix, and it's because we have two different bits of code trying to do the same checking, we should fix it by eliminating the duplicate checking, imv. > The origin of the matter is that we apply unnecessary permission checks, > although it is purely internal use and the user was already checked for > permission to execute the whole ALTER TABLE statement. Right? That's certainly a nice thought, but given the complexity in ALTER TABLE, in particular with regard to permissions checking, I have no idea if what it's doing is intentional or wrong. Thanks, Stephen
Re: [HACKERS] ExecutorCheckPerms() hook
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote: > > #2: REALLY BIG ISSUE- You've added ExecutorCheckPerms_hook as part of > > this patch- don't, we're in feature-freeze right now and should not be > > adding hooks at this time. > > The patch is intended to submit for the v9.1 development, not v9.0, isn't it? That really depends on if this is actually fixing a bug in the existing code or not. I'm on the fence about that at the moment, to be honest. I was trying to find if we explicitly say that SELECT rights are needed to reference a column but wasn't able to. If every code path is expecting that, then perhaps we should just document it that way and move on. In that case, all these changes would be for 9.1. If we decide the current behavior is a bug, it might be something which could be fixed in 9.0 and maybe back-patched. In *either* case, given that one is a 'clean-up' patch and the other is 'new functionality', they should be independent *anyway*. Small incremental changes that don't break things when applied is what we're shooting for here. > > #3: You didn't move ExecCheckRTPerms() and ExecCheckRTEPerms() to > > utils/acl and instead added executor/executor.h to rt_triggers.c. > > I don't particularly like that. I admit that DoCopy() already knew > > about the executor, and if that were the only case outside of the > > executor where ExecCheckRTPerms() was getting called it'd probably be > > alright, but we already have another place that wants to use it, so > > let's move it to a more appropriate place. > > Sorry, I'm a bit confused. > It seemed to me you suggested utilizing ExecCheckRTPerms() rather than > moving its logic anywhere, so I kept it here. (Did I misunderstand?) I'm talking about moving the whole function (all 3 lines of it) to somewhere else and then reworking the function to be more appropriate based on its new location (including renaming and changing arguments and return values, as appropriate). 
> If so, I doubt utils/acl is the best place for the moved > ExecCheckRTPerms(), because the checker function calls both the > default acl functions and an optional external security function. Can you explain why you think that having a function in utils/acl (eg: include/utils/acl.h and backend/utils/aclchk.c) which calls default acl functions and allows for an external hook would be a bad idea? > It means that ExecCheckRTPerms() is a caller of acl functions, not an acl > function itself, doesn't it? It's providing a higher-level service, sure, but there's nothing particularly interesting or special about what it's doing in this case, and, we need it in multiple places. Why duplicate it? > I agree the checker function is not part of the executor, but it is > also not part of the acl functions in my opinion. > > If creating a new directory for the checker function is undesirable, > my preference is src/backend/utils/adt/security.c and > src/include/utils/security.h . We don't need a new directory or file for one function, as Robert already pointed out. > > #6: I haven't checked yet, but if there are other things in an RTE which > > would make sense in the DoCopy case, beyond just what's needed for the > > permissions checking, and which wouldn't be 'correct' with a NULL'd > > value, I would set those. Yes, we're building the RTE to check > > permissions, but we don't want someone downstream to be surprised when > > they make a change to something in the permissions checking and discover > > that a value in RTE they expected to be there wasn't valid. Even more > > so, if there are function helpers which can be used to build an RTE, we > > should be using them. The same goes for RI_Initial_Check(). > > Are you saying something like makeFuncExpr()? > I basically agree. However, should it be done in this patch? Actually, I mean looking for, and using, things like markRTEForSelectPriv() and addRangeTableEntry() or addRangeTableEntryForRelation(). 
> > #8: When moving ExecCheckRTPerms(), you should rename it to be more like > > the other function calls in acl.h Perhaps pg_rangetbl_aclcheck()? > > Also, it should return an actual AclResult instead of just true/false. > > See the comments in #3. > And, if the caller has to handle aclcheck_error(), the user cannot > distinguish access violation errors raised by the default PG permissions > from those raised by any other external security stuff, can they? I'm not suggesting that the caller handle aclcheck_error().. ExecCheckRTPerms() could just as easily have a flag which indicates whether it will call aclcheck_error() or not, and if not, to return an AclResult to the caller. That flag could then be passed to ExecCheckRTEPerms() as you have it now. Thanks, Stephen
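The flag-based arrangement Stephen describes — one shared checker that either raises the access-violation error itself or hands back an AclResult, with the external hook applied in the shared entry point so it can never be skipped — can be modelled roughly like this. Python is used purely as illustration; every name below is a hypothetical stand-in for the C originals (ExecCheckRTPerms, ExecCheckRTEPerms, ExecutorCheckPerms_hook, aclcheck_error):

```python
from enum import Enum

class AclResult(Enum):
    ACLCHECK_OK = 0
    ACLCHECK_NO_PRIV = 1

class AclError(Exception):
    """Stand-in for ereport()ing an access-violation error (aclcheck_error)."""

def default_acl_check(rte):
    # stand-in for the per-RTE permission test (ExecCheckRTEPerms);
    # 'rte' here is just a dict, not a real RangeTblEntry
    return AclResult.ACLCHECK_OK if rte.get("allowed") else AclResult.ACLCHECK_NO_PRIV

# optional external security hook, as with ExecutorCheckPerms_hook
security_hook = None

def check_rt_perms(range_table, abort_on_violation):
    """Shared entry point: the executor, DoCopy() and RI_Initial_Check()
    would all come through here, so the hook is never bypassed."""
    for rte in range_table:
        result = default_acl_check(rte)
        if result is AclResult.ACLCHECK_OK and security_hook is not None:
            result = security_hook(rte)
        if result is not AclResult.ACLCHECK_OK:
            if abort_on_violation:
                raise AclError("permission denied for %s" % rte.get("relname"))
            return result
    return AclResult.ACLCHECK_OK
```

The executor would call check_rt_perms(rtable, True) and let the error propagate, while a caller like RI_Initial_Check() would pass False and fall back to its slow path on ACLCHECK_NO_PRIV.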
[HACKERS] Fwd: PDXPUG Day at OSCON 2010
It was recommended to me to forward this to -hackers. Regards, Mark -- Forwarded message -- From: Mark Wong Date: Tue, May 18, 2010 at 6:57 AM Subject: PDXPUG Day at OSCON 2010 To: pgsql-annou...@postgresql.org Thanks to the generosity of O'Reilly, we will be having a full day of free PostgreSQL sessions on Sunday, July 18 at the Oregon Convention Center. Location details and schedule information can be found on the wiki at: http://wiki.postgresql.org/wiki/PDXPUGDay2010 We will ask for a $30 donation towards PostgreSQL at the conference, but no one will be turned away. Sign up here: https://spreadsheets.google.com/viewform?hl=en&formkey=dDVBRnJGWVlZRkdycFdXbXVuYTNiU2c6MQ Please submit your talk proposal here: http://spreadsheets.google.com/viewform?hl=en&formkey=dHBFMGFIWmxJUzhRM3R6dXVlWWxYQ1E6MQ. Proposals will be decided upon by June 7th and the wiki updated.
[HACKERS] Open Item: pg_controldata - machine readable?
There is an open item "pg_controldata - machine readable?" in the list: http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items The proposal by Joe Conway is adding a new contrib module. http://archives.postgresql.org/message-id/4b959d7a.6010...@joeconway.com http://github.com/jconway/pg_controldata Should we add the module to 9.0? If we do so, SGML documentation is required. IMHO, I'd like to put the feature into the core instead of a contrib module, but we cannot change the catalog version at this time. So, how about providing control file information through the pg_settings view? We would retrieve those variables as GUC options. Regards, --- Takahiro Itagaki NTT Open Source Software Center
Re: [HACKERS] Fwd: Hiding data in postgresql
Hector, * Hector Beyers (hqbey...@gmail.com) wrote: > Does someone have any ideas how I can hide data without the meta data > noticing? To explain further, I would like to save some collection of data > where the meta-data does not see it. I am trying to do some security through > obscurity. It is for research purposes. This explanation doesn't actually explain anything, near as I can tell. Perhaps if you would share what your actual problem is, we could recommend a solution. Thanks, Stephen
Re: [HACKERS] Synchronization levels in SR
On May 25, 2010, at 22:16 , Simon Riggs wrote: > All of these issues show why I want to specify the synchronisation mode > as a USERSET. That will allow us to specify more easily which parts of > our application are important when the cluster is degraded and which > data is so critical it must reach multiple servers. Hm, but since flushing an important COMMIT to the slave(s) will also need to flush all previous (potentially unimportant) COMMITs to the slave(s), isn't there a substantial chance of priority-inversion type problems there? Then again, if asynchronous commit proved to be effective then so will this, probably, so maybe my fear is unjustified. best regards, Florian Pflug
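The inversion Florian worries about follows directly from WAL being a single ordered stream: before a synchronous commit at some LSN can be acknowledged, every WAL record before that LSN — including other sessions' unimportant commits — must reach the standby first. A toy model (record layout entirely hypothetical):

```python
def wait_set(records, commit_lsn):
    """WAL ships strictly in order, so acknowledging a synchronous commit at
    commit_lsn requires shipping every record at or below it first."""
    return [tag for lsn, tag in records if lsn <= commit_lsn]

records = [
    (100, "async commit A"),   # unimportant, but earlier in the stream
    (200, "async commit B"),
    (300, "SYNC commit C"),    # the 'important' transaction
]
print(wait_set(records, 300))  # C's ack waits on A and B too
```

So a single slow or bursty async writer ahead of you in the stream delays your synchronous acknowledgement, which is the priority-inversion shape Florian describes.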
Re: [HACKERS] ExecutorCheckPerms() hook
(2010/05/25 22:59), Stephen Frost wrote: > * KaiGai Kohei (kai...@ak.jp.nec.com) wrote: >> * DoCopy() and RI_Initial_Check() were reworked to call ExecCheckRTEPerms() >>with locally built RangeTblEntry. > > Maybe I missed it somewhere, but we still need to address the case where > the user doesn't have those SELECT permissions that we're looking for in > RI_Initial_Check(), right? KaiGai, your patch should be addressing that > in a similar fashion.. The reason why the user must have SELECT privileges on the PK/FK tables is that validateForeignKeyConstraint() always calls SPI_execute() to verify that FK constraints can be established between two tables (even in the fallback path). And, the reason why RI_Initial_Check() now calls pg_class_aclcheck() is to try to avoid an unexpected access violation error from SPI_execute(). However, the fallback path also calls SPI_execute(), so I concluded the permission checks in RI_Initial_Check() are nonsense. However, it is an independent issue right now, so I kept it as is. The origin of the matter is that we apply unnecessary permission checks, although it is purely internal use and the user was already checked for permission to execute the whole ALTER TABLE statement. Right? Thanks, -- KaiGai Kohei
Re: [HACKERS] ExecutorCheckPerms() hook
(2010/05/25 21:44), Stephen Frost wrote: > KaiGai, > > * KaiGai Kohei (kai...@ak.jp.nec.com) wrote: >> OK, the attached patch reworks it according to the way. > > Reviewing this patch, there are a whole slew of problems. > > #1: REALLY BIG ISSUE- Insufficient comment updates. You've changed > function definitions in a pretty serious way as well as moved some code > around such that some of the previous comments don't make sense. You > have got to update comments when you're writing a patch. Indeed, the > places I see a change in comments are when you've removed what appears > to still be valid and appropriate comments, or places where you've added > comments which are just blatantly wrong with the submitted patch. Hmm. I'll revise/add the comments around the patched code. > #2: REALLY BIG ISSUE- You've added ExecutorCheckPerms_hook as part of > this patch- don't, we're in feature-freeze right now and should not be > adding hooks at this time. The patch is intended to submit for the v9.1 development, not v9.0, isn't it? > #3: You didn't move ExecCheckRTPerms() and ExecCheckRTEPerms() to > utils/acl and instead added executor/executor.h to rt_triggers.c. > I don't particularly like that. I admit that DoCopy() already knew > about the executor, and if that were the only case outside of the > executor where ExecCheckRTPerms() was getting called it'd probably be > alright, but we already have another place that wants to use it, so > let's move it to a more appropriate place. Sorry, I'm a bit confused. It seemed to me you suggested utilizing ExecCheckRTPerms() rather than moving its logic anywhere, so I kept it here. (Did I misunderstand?) If so, I doubt utils/acl is the best place for the moved ExecCheckRTPerms(), because the checker function calls both the default acl functions and an optional external security function. It means that ExecCheckRTPerms() is a caller of acl functions, not an acl function itself, doesn't it? 
In other words, I wonder whether we should categorize a function X which calls A and (optionally) B as a part of A. I agree the checker function is not part of the executor, but it is also not part of the acl functions in my opinion. If creating a new directory for the checker function is undesirable, my preference is src/backend/utils/adt/security.c and src/include/utils/security.h . > #4: As mentioned previously, the hook (which should be added in a > separate patch anyway) makes more sense to me to be in > ExecCheckRTPerms(), not ExecCheckRTEPerms(). This also means that we > need to be calling ExecCheckRTPerms() from DoCopy and > RI_Initial_Check(), to make sure that the hook gets called. To that > end, I wouldn't even expose ExecCheckRTEPerms() outside of acl.c. Also, > there should be a big comment about not using or calling > ExecCheckRTEPerms() directly outside of ExecCheckRTPerms() since the > hook would then be skipped. I don't have any difference in preference between ExecCheckRTPerms() and ExecCheckRTEPerms(), except that DoCopy() and RI_Initial_Check() have to call the checker function with list_make1(&rte), instead of &rte. > #5: In DoCopy, you can remove relPerms and remainingPerms, but I'd > probably leave required_access up near the top and then just use it to > set rte->required_access directly rather than moving that bit deep down > into the function. OK, > #6: I haven't checked yet, but if there are other things in an RTE which > would make sense in the DoCopy case, beyond just what's needed for the > permissions checking, and which wouldn't be 'correct' with a NULL'd > value, I would set those. Yes, we're building the RTE to check > permissions, but we don't want someone downstream to be surprised when > they make a change to something in the permissions checking and discover > that a value in RTE they expected to be there wasn't valid. Even more > so, if there are function helpers which can be used to build an RTE, we > should be using them. 
The same goes for RI_Initial_Check(). Are you saying something like makeFuncExpr()? I basically agree. However, should it be done in this patch? > #7: I'd move the conditional if (is_from) into the foreach which is > building the columnsSet and eliminate the need for columnsSet; I don't > see that it's really adding much here. OK, > #8: When moving ExecCheckRTPerms(), you should rename it to be more like > the other function calls in acl.h Perhaps pg_rangetbl_aclcheck()? > Also, it should return an actual AclResult instead of just true/false. See the comments in #3. And, if the caller has to handle aclcheck_error(), the user cannot distinguish access violation errors raised by the default PG permissions from those raised by any other external security stuff, can they? Thanks, -- KaiGai Kohei
Re: [HACKERS] tsvector pg_stats seems quite a bit off.
On 19/05/10 21:01, Jesper Krogh wrote: > The document base is around 350,000 documents and > I have set the statistics target on the tsvector column > to 1000 since the 100 seems way off. So for tsvectors the statistics target means more or less "at any time track at most 10 * statistics_target lexemes simultaneously" where "track" means keeping them in memory while going through the tuples being analysed. Remember that the measure is in lexemes, not whole tsvectors, and the factor of 10 is meant to approximate the average number of unique lexemes in a tsvector. If your documents are very large, this might not be a good approximation. > # ANALYZE verbose reference (document_tsvector); > INFO: analyzing "reference" > INFO: "reference": scanned 14486 of 14486 pages, containing 350174 live > rows and 6027 dead rows; 30 rows in sample, 350174 estimated total rows > ANALYZE > > Ok, so analyze almost examined all rows. Looking into > "most_common_freqs" I find > # select count(unnest) from (select unnest(most_common_freqs) from > pg_stats where attname = 'document_tsvector') as foo; > count > --- > 2810 > (1 row) So the size of the most_common_freqs and most_common_vals rows in pg_statistics for tsvectors has an upper bound of statistics_target * 10 (for the same reasons as mentioned before) and holds lexemes (not whole tsvectors). What happens also is that lexemes that were seen only once while going through the analysed set are discarded, so that's why you can actually get fewer entries in these arrays, even if your document set is big. > But the distribution is very "flat" at the end, the last 128 values are > exactly > 1.00189e-05 > which means that any term sitting outside the array would get an > estimate of > 1.00189e-05 * 350174 / 2 = 1.75 ~ 2 rows Yeah, this might mean that you could try cranking up the stats target a lot, to make the set of simultaneously tracked lexemes larger (it will cost time and memory during analyse though). 
If the documents have completely different contents, what can happen is that almost all lexemes are only seen a few times and get removed during the pruning of the working set. I have seen similar behaviour while working on the typanalyze function for tsvectors. > So far I have no idea if this is bad or good, so a couple of sample runs > of stuff that > is sitting outside the "most_common_vals" array: > > [gathered statistics suck] > So the "most_common_vals" seems to contain a lot of values that should > never have been kept in favor > of other values that are more common. > In practice, just cranking the statistics estimate up high enough seems > to solve the problem, but doesn't > there seem to be something wrong in how the statistics are collected? The algorithm to determine the most common vals does not do it accurately. That would require keeping all lexemes from the analysed tsvectors in memory, which would be impractical. If you want to learn more about the algorithm being used, try reading http://www.vldb.org/conf/2002/S10P03.pdf and the corresponding comments in ts_typanalyze.c It would be interesting to know what's the average size of a tsvector in your document set (ie. how many unique lexemes does a tsvector have on average). In general, the tsvector typanalyze function is designed to suck less than the constant factor that has been used previously, but it only works really well on the most common lexemes (thus preventing most gross misestimates). I'm not very surprised it misses the difference between 1612/350174 and 4/350174 and I'm quite happy that it gets that if you set the stats target really high :o) There's always the possibility that there's some stupid bug there, but I think you just set your expectations too high for the tsvector typanalyze function. If you could come up with a better way of doing tsvector stats, that would be awesome - currently it's just doing its best to prevent the most outrageous errors. 
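The pruning behaviour described above is the Lossy Counting algorithm from the cited VLDB 2002 paper: each tracked item carries a count plus an error bound, and at every bucket boundary items whose count plus error no longer exceeds the bucket number are discarded — which is exactly why lexemes seen only once between prunes vanish. A condensed sketch (not the actual ts_typanalyze.c code):

```python
def lossy_count(stream, bucket_width):
    """Lossy Counting sketch: track (count, error) per item; at each bucket
    boundary, prune items whose count + error <= current bucket number."""
    counts, bucket = {}, 1
    for n, item in enumerate(stream, start=1):
        if item in counts:
            c, e = counts[item]
            counts[item] = (c + 1, e)
        else:
            # a newly tracked item may have been pruned before, so its
            # true count could be up to (bucket - 1) higher than recorded
            counts[item] = (1, bucket - 1)
        if n % bucket_width == 0:
            counts = {k: (c, e) for k, (c, e) in counts.items() if c + e > bucket}
            bucket += 1
    return counts

# a genuinely common lexeme survives pruning; one-off lexemes are dropped
stream = ["common", "rare1", "common", "rare2",
          "common", "rare3", "common", "rare4"]
result = lossy_count(stream, bucket_width=4)
print(sorted(result))  # only 'common' remains tracked
```

The bucket width is derived from the allowed error, so raising the statistics target effectively widens the working set and lets rarer lexemes survive — matching the observation that a very high target fixes the estimates.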
Cheers, Jan
Re: [spf:guess] Re: [HACKERS] ROLLBACK TO SAVEPOINT
Florian Pflug wrote: > On May 25, 2010, at 12:18 , Heikki Linnakangas wrote: > >> On 25/05/10 13:03, Florian Pflug wrote: >> >>> On May 25, 2010, at 6:08 , Sam Vilain wrote: >>> http://www.postgresql.org/docs/8.4/static/sql-savepoint.html Lead us to believe that if you roll back to the same savepoint name twice in a row, that you might start walking back through the savepoints. I guess I missed the note on ROLLBACK TO SAVEPOINT that that is not how it works. Here is the section: SQL requires a savepoint to be destroyed automatically when another savepoint with the same name is established. In PostgreSQL, the old savepoint is kept, though only the more recent one will be used when rolling back or releasing. (Releasing the newer savepoint will cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming. >>> I'm confused. The sentence in brackets "Releasing the newer savepoint will >>> cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and >>> RELEASE SAVEPOINT" implies that you *will* walk backwards through all the >>> savepoints named "a" if you repeatedly issue "ROLLBACK TO SAVEPOINT a", no? >>> If that is not how it actually works, then this whole paragraph is wrong, >>> I'd say. >>> >> Releasing the newer savepoint will cause the older one to again become >> accessible, as the doc says, but rolling back to a savepoint does not >> implicitly release it. You'll have to use RELEASE SAVEPOINT for that. >> > > Ah, now I get it. Thanks. > > Would changing "Releasing the newer savepoint will cause ... " to "Explicitly > releasing the newer savepoint" or maybe even "Explicitly releasing the newer > savepoint with RELEASE SAVEPOINT will cause ..." make things clearer? 
> Yes, probably - your misreading matches my misreading of it :-) There is another way you can get there - releasing a savepoint established before the re-used savepoint name will also release the savepoints after it. ie savepoint foo; savepoint bar; savepoint foo; release savepoint bar; release savepoint foo; After the first release, the second 'foo' savepoint is gone. I think this is a key advantage in saving the old savepoints. Cheers, Sam
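The keep-the-old-savepoint behaviour being discussed, including Sam's foo/bar/foo example, can be modelled as a simple stack: ROLLBACK TO keeps the matched savepoint and discards only what came after it, while RELEASE discards the matched savepoint too, taking any later savepoints with it. A sketch of the semantics (not real server code):

```python
class SavepointStack:
    def __init__(self):
        self.stack = []

    def savepoint(self, name):
        self.stack.append(name)  # duplicate names allowed; newest shadows older

    def _newest(self, name):
        # ROLLBACK TO / RELEASE always address the most recent match
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i] == name:
                return i
        raise KeyError(name)

    def rollback_to(self, name):
        # the savepoint itself survives: only entries above it are destroyed
        self.stack = self.stack[: self._newest(name) + 1]

    def release(self, name):
        # destroys the savepoint and everything established after it
        self.stack = self.stack[: self._newest(name)]

s = SavepointStack()
for name in ["foo", "bar", "foo"]:
    s.savepoint(name)
s.release("bar")    # takes the second "foo" with it
print(s.stack)      # ['foo'] -- the first "foo" is accessible again
```

This also shows why repeated ROLLBACK TO SAVEPOINT a does not walk backwards through older savepoints named "a": rollback_to never pops the matched entry.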
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/25/2010 4:50 PM, Simon Riggs wrote: On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote: On 5/25/2010 12:03 PM, Simon Riggs wrote: > On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: > >> In some systems (data warehousing, replication), the order of commits is >> important, since that is the order in which changes have become visible. >> This information could theoretically be extracted from the WAL, but >> scanning the entire WAL just to extract this tidbit of information would >> be excruciatingly painful. > > I think it would be quite simple to read WAL. WALSender reads the WAL > file after it's been flushed, so it would be simple for it to read a blob > of WAL and then extract the commit order from it. > > Overall though, it would be easier and more efficient to *add* info to > WAL and then do all this processing *after* WAL has been transported > elsewhere. Extracting info with DDL triggers, normal triggers, commit > order and everything else seems like too much work to me. Every other > RDBMS has moved away from trigger-based replication and we should give > that serious consideration also. Reading the entire WAL just to find all COMMIT records, then go back to the origin database to get the actual replication log you're looking for is simpler and more efficient? I don't think so. Agreed, but I think I've not explained myself well enough. I proposed two completely separate ideas; the first one was this: If you must get commit order, get it from WAL on *origin*, using exact same code that current WALSender provides, plus some logic to read through the WAL records and extract commit/aborts. That seems much simpler than the proposal you outlined and as SR shows, its low latency as well since commits write to WAL. No need to generate event ticks either, just use XLogRecPtrs as WALSender already does. I see no problem with integrating that into core, technically or philosophically. 
Which means that if I want to allow a consumer of that commit order data to go offline for three days or so to replicate the 5 requested, low volume tables, the origin needs to hang on to the entire WAL log from all 100 other high volume tables? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
> Correct. The problem actually is aborted transactions. Just because an > XID is really old doesn't mean it was committed. Yes, that's the main issue with my idea; XIDs which fell off the CLOG would become visible even if they'd aborted. Do we get a bit in the visibility map for a page which has aborted transaction rows on it? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] Confused about the buffer pool size
MMK, > But it does not tell me anything about what the actual buffer size is. > How do I know what the real buffer size is? I am using 8.4.4 and I am > running only one query at a time. Please move this discussion to the pgsql-general or pgsql-performance lists. pgsql-hackers is for working on PostgreSQL code, and further questions on this list will probably not be answered. Other than that, I have no idea what you mean by "buffer size", nor why you need to know it. I'd suggest starting your post on the other mailing list by explaining what specific problem you're trying to solve. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
On 5/24/2010 9:30 AM, Heikki Linnakangas wrote: On 22/05/10 16:35, Tom Lane wrote: Josh Berkus writes: From a discussion at dinner at pgcon, I wanted to send this to the list for people to poke holes in it: Somebody (I think Joe or Heikki) poked a big hole in this last night at the Royal Oak. Me. Although the scheme would get rid of the need to replace old XIDs with FrozenXid, it does not get rid of the need to set hint bits before you can truncate CLOG. Hmm, we don't rely on setting hint bits to truncate CLOG anymore (http://archives.postgresql.org/pgsql-committers/2006-11/msg00026.php). It's the replacement of xids with FrozenXid that matters, the hint bits are really just hints. Doesn't change the conclusion, though: you still need to replace XIDs with FrozenXids to truncate the clog. Conceivably we could keep around more than 2^32 transactions in clog with this scheme, but then you need a lot more space for the clog. But perhaps it would be better to do that than to launch anti-wraparound vacuums, or to refuse more updates in the extreme cases. Correct. The problem actually is aborted transactions. Just because an XID is really old doesn't mean it was committed. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [HACKERS] Exposing the Xact commit order to the user
On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote: > On 5/25/2010 12:03 PM, Simon Riggs wrote: > > On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: > > > >> In some systems (data warehousing, replication), the order of commits is > >> important, since that is the order in which changes have become visible. > >> This information could theoretically be extracted from the WAL, but > >> scanning the entire WAL just to extract this tidbit of information would > >> be excruciatingly painful. > > > > I think it would be quite simple to read WAL. WALSender reads the WAL > > file after it's been flushed, so it would be simple for it to read a blob > > of WAL and then extract the commit order from it. > > > > Overall though, it would be easier and more efficient to *add* info to > > WAL and then do all this processing *after* WAL has been transported > > elsewhere. Extracting info with DDL triggers, normal triggers, commit > > order and everything else seems like too much work to me. Every other > > RDBMS has moved away from trigger-based replication and we should give > > that serious consideration also. > > Reading the entire WAL just to find all COMMIT records, then go back to > the origin database to get the actual replication log you're looking for > is simpler and more efficient? I don't think so. Agreed, but I think I've not explained myself well enough. I proposed two completely separate ideas; the first one was this: If you must get commit order, get it from WAL on *origin*, using exact same code that current WALSender provides, plus some logic to read through the WAL records and extract commit/aborts. That seems much simpler than the proposal you outlined and as SR shows, its low latency as well since commits write to WAL. No need to generate event ticks either, just use XLogRecPtrs as WALSender already does. I see no problem with integrating that into core, technically or philosophically. 
-- Simon Riggs www.2ndQuadrant.com
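Simon's first idea — walk the WAL stream on the origin and pull out just the transaction-end records — amounts to a filter over an ordered record stream; WAL order of the commit records *is* the commit order, and the XLogRecPtr (LSN) serves as the tick. A toy model (the dict record layout is entirely hypothetical, not the real XLOG format):

```python
def commit_order(wal_stream):
    """Yield (lsn, xid, action) for transaction-end records, in WAL order --
    which is exactly the commit order, since commit writes to WAL."""
    return [(r["lsn"], r["xid"], r["type"])
            for r in wal_stream
            if r["type"] in ("COMMIT", "ABORT")]

wal = [
    {"lsn": 10, "xid": 101, "type": "HEAP_INSERT"},
    {"lsn": 20, "xid": 102, "type": "COMMIT"},
    {"lsn": 30, "xid": 101, "type": "COMMIT"},
]
print(commit_order(wal))  # xid 102 committed before 101, despite starting later
```

Jan's objection is about retention, not extraction: a consumer that goes offline forces the origin to keep the whole WAL stream around, even for tables the consumer never asked about.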
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Alvaro, >> This sounds like extending Xid to 64 bits, without having to store the >> high bits everywhere. Was this discussed in the PGCon devs meeting? Essentially, yes. One of the main objections to raising XID to 64-bit has been the per-row overhead. But adding 4 bytes per page wouldn't be much of an impact. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
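The "high bits per page" idea sketched here: store one 4-byte XID epoch per heap page, and reconstruct a logically 64-bit XID from the page epoch plus the 32-bit XID already present in each tuple header. With full-width XIDs, comparison becomes plain integer order, so the circular modulo-2^32 arithmetic (and the wraparound it forces) goes away. Roughly (names hypothetical):

```python
XID32 = 1 << 32  # the 32-bit XID space wraps at this boundary

def full_xid(page_epoch, xid32):
    """Reconstruct a 64-bit XID: 4-byte epoch stored once per page,
    4-byte XID stored per tuple as today."""
    return (page_epoch << 32) | xid32

def precedes(a, b):
    # with full 64-bit XIDs this is plain integer comparison -- no circular
    # modulo-2^32 logic, hence no need to freeze tuples to dodge wraparound
    return a < b

old = full_xid(page_epoch=1, xid32=XID32 - 5)   # assigned late in epoch 1
new = full_xid(page_epoch=2, xid32=3)           # assigned early in epoch 2
print(precedes(old, new))  # ordering survives the 32-bit wrap
```

As the thread notes, this removes the FrozenXid replacement problem but not the CLOG-truncation one: a very old 64-bit XID still needs its commit/abort status resolvable somehow.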
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/25/2010 12:03 PM, Simon Riggs wrote: On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This information could theoretically be extracted from the WAL, but scanning the entire WAL just to extract this tidbit of information would be excruciatingly painful. I think it would be quite simple to read WAL. WALSender reads the WAL file after it's been flushed, so it would be simple for it to read a blob of WAL and then extract the commit order from it. Overall though, it would be easier and more efficient to *add* info to WAL and then do all this processing *after* WAL has been transported elsewhere. Extracting info with DDL triggers, normal triggers, commit order and everything else seems like too much work to me. Every other RDBMS has moved away from trigger-based replication and we should give that serious consideration also. Reading the entire WAL just to find all COMMIT records, then go back to the origin database to get the actual replication log you're looking for is simpler and more efficient? I don't think so. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [HACKERS] Exposing the Xact commit order to the user
Robert Haas wrote: > maybe we should get serializable working and committed on one > node first and then worry about how to distribute it. I think > there might be other approaches to this problem Well, I've got two or three other ideas on how we can manage this for HS, but since I now realize that I've totally misunderstood the main use case for this (which is to support trigger-based replication), I'd like to be clear on something before letting it drop. The big question is, do such replicas need to support serializable access to the data modified by serializable transactions in the source database? That is, is there a need for such replicas to only see states which are possible in some serial order of execution of serializable transactions on the source database? Or to phrase the same question a third way, should there be a way to run queries on such replicas with confidence that what is viewed is consistent with user-defined constraints and business rules? If not, there's no intersection between this feature and SSI. If there is, I think we should think through at least a general strategy sooner, rather than later. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 21:19 +0200, Yeb Havinga wrote: > Simon Riggs wrote: > > How we handle degraded mode is important, yes. Whatever parameters we > > choose the problem will remain the same. > > > > Should we just ignore degraded mode and respond as if nothing bad had > > happened? Most people would say not. > > > > If we specify server1 = synch and server2 = async we then also need to > > specify what happens if server1 is down. People might often specify > > if (server1 == down) server2 = synch. > > > I have a hard time imagining including async servers in the quorum. If > an async servers vote is necessary to reach quorum due to a 'real' sync > standby server failure, it would mean that the async-intended standby is > now also in sync with the master transactions. IMHO this is a bad > situation, since instead of the DBA getting the error: "not enough sync > standbys to reach quorum", he'll now get "database is slow" complaints, > only to find out later that too much sync standby servers went south. > (under the assumption that async servers are mostly on too slow links to > consider for sync standby). Yeh, there's difficulty either way. We don't need to think of servers as being "synch" or "async", more likely we would rate them in terms of typical synchronisation delay. So yeh, calling them "fast" and "slow" in terms of synchronisation delay makes sense. Some people with low xact rate and high need for protection might want to switch across to the slow server and keep running. If not, the max_synch_delay would trip and you would then select synch_failure_action = rollback. The realistic response is to add a second "fast" sync server, to allow you to stay up even when you lose one of the fast servers. That now gives you 4 servers and the failure modes start to get real complex. Specifying rules to achieve what you're after would be much harder. 
Some people might want that, but most people won't in the general case and if they did specify them they'd likely get them wrong. All of these issues show why I want to specify the synchronisation mode as a USERSET. That will allow us to specify more easily which parts of our application are important when the cluster is degraded and which data is so critical it must reach multiple servers. -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck writes: >> No, I meant how will the *function* know, if a superuser and/or some >> background process can purge records at any time? > The data contains timestamps which are supposedly taken in commit order. You can *not* rely on the commit timestamps to be in exact order. (Perhaps approximate ordering is good enough for what you want here, but just be careful to not fall into the trap of assuming that they're exactly ordered.) regards, tom lane
Re: [HACKERS] Fwd: Hiding data in postgresql
On Tue, May 25, 2010 at 3:39 PM, Hector Beyers wrote: > > Hi guys, > (I tried the question in another forum first) > Does someone have any ideas how I can hide data without the meta data > noticing? To explain further, I would like to save some collection of data > where the meta-data does not see it. I am trying to do some security through > obscurity. It is for research purposes. > For example, populate a table with 1000 rows, but the meta-data only knows > of about 500 of them? Only on an export of a dump can you find all the data > again. Or maybe to make a hidden duplicate schema that can point to the > hidden data? > Does someone have any good ideas on how to achieve this or something > similar? > Kind regards > Hector > > > On Mon, May 24, 2010 at 9:16 PM, Hector Beyers wrote: >> >> Hi guys, >> does ANYONE have any tips on hiding data on a database server? This means >> that data is stored in places that is not necessarily picked up in the >> schema of the database. I am doing some research on databases and need some >> direction. >> Any help or direction will be highly appreciated. >> Kind regards >> Hector Not sure if this is helpful, but be sure to know about views, which can be used to filter out rows of a table. 
Example:

CREATE TABLE foo (name TEXT, visible BOOL);
INSERT INTO foo VALUES ('two', true);
INSERT INTO foo VALUES ('three', true);
INSERT INTO foo VALUES ('four', false);
INSERT INTO foo VALUES ('five', true);
INSERT INTO foo VALUES ('six', false);
INSERT INTO foo VALUES ('seven', true);
INSERT INTO foo VALUES ('eight', false);
INSERT INTO foo VALUES ('nine', false);
INSERT INTO foo VALUES ('ten', false);
INSERT INTO foo VALUES ('eleven', true);
CREATE VIEW foo_view AS SELECT foo.name FROM foo WHERE visible=true;

=> SELECT * FROM foo;
  name  | visible
--------+---------
 two    | t
 three  | t
 four   | f
 five   | t
 six    | f
 seven  | t
 eight  | f
 nine   | f
 ten    | f
 eleven | t
(10 rows)

=> SELECT * FROM foo_view;
  name
--------
 two
 three
 five
 seven
 eleven
(5 rows)

Note that views are SELECT-only, but you can use CREATE RULE to simulate an updatable view. You may also want to read about Veil: http://veil.projects.postgresql.org/curdocs/main.html
[HACKERS] mergejoin null handling (was Re: [PERFORM] merge join killing performance)
Scott Marlowe writes: > So, Tom, so you think it's possible that the planner isn't noticing > all those nulls and thinks it'll just take a row or two to get to the > value it needs to join on? I dug through this and have concluded that it's really an oversight in the patch I wrote some years ago in response to this: http://archives.postgresql.org/pgsql-performance/2005-05/msg00219.php That patch taught nodeMergejoin that a row containing a NULL key can't possibly match anything on the other side. However, its response to observing a NULL is just to advance to the next row of that input. What we should do, if the NULL is in the first merge column and the sort order is nulls-high, is realize that every following row in that input must also contain a NULL and so we can just terminate the mergejoin immediately. The original patch works well for cases where there are just a few nulls in one input and the important factor is to not read all the rest of the other input --- but it fails to cover the case where there are many nulls and the important factor is to not read all the rest of the nulls. The problem can be demonstrated if you modify the example given in the above-referenced message so that table t1 contains lots of nulls rather than just a few: explain analyze will show that all of t1 gets read by the mergejoin, and that's not necessary. I'm inclined to think this is a performance bug and should be back-patched, assuming the fix is simple (which I think it is, but haven't coded/tested yet). It'd probably be reasonable to go back to 8.3; before that, sorting nulls high versus nulls low was pretty poorly defined and so there'd be risk of breaking cases that gave the right answers before. Comments? regards, tom lane
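The early-termination idea Tom describes can be sketched outside PostgreSQL. This is a toy merge join in plain Python (not nodeMergejoin's actual logic), with None standing in for SQL NULL and both inputs sorted ascending with NULLs last; the names are illustrative only:

```python
# Sketch of early termination for a merge join whose inputs are sorted
# ascending with NULLs last ("nulls high"). None plays the role of SQL NULL.
def merge_join(left, right):
    """Join two sorted key lists; duplicates handled naively for brevity."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        l, r = left[i], right[j]
        # A NULL key can't match anything, and with nulls sorted high every
        # remaining row on that side is also NULL, so we can stop entirely
        # instead of merely advancing past this row.
        if l is None or r is None:
            break
        if l < r:
            i += 1
        elif l > r:
            j += 1
        else:
            out.append(l)
            i += 1
            j += 1
    return out
```

With many trailing NULLs on one side, the `break` avoids reading them all, which is exactly the case the original patch missed.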
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/24/2010 9:30 AM, Greg Sabino Mullane wrote: In light of the proposed purging scheme, how would it be able to distinguish between those two cases (nothing there yet vs. was there but purged)? There is a difference between an empty result set and an exception. No, I meant how will the *function* know, if a superuser and/or some background process can purge records at any time? The data contains timestamps which are supposedly taken in commit order. Checking the age of the last entry in the file should be simple enough to determine if the segment matches the "max age" configuration (if set). In the case of a superuser telling what to purge he would just call a function with a serial number (telling the obsolete segments). Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [HACKERS] Exposing the Xact commit order to the user
Florian Pflug wrote: > Hm, but for there to be an actual problem (and not a false > positive), an actual dangerous circle has to exist in the > dependency graph. The existence of a dangerous structure is just a > necessary (but not sufficient) and easily checked-for condition > for that, right? Now, if a read-only transaction only ever has > outgoing edges, it cannot be part of a (dangerous or not) circle, > and hence any dangerous structure it is part of is a false > positive. > > I guess my line of reasoning is flawed somehow, but I cannot > figure out why... Here's why: We're tracking rw-dependencies, where the "time-arrow" showing effective order of execution points from the reader to the writer (since the reader sees a state prior to the write, it effectively executes before it). These are important because there have to be two such dependencies, one in to the pivot and one out from the pivot, for a problem to exist. (See various works by Dr. Alan Fekete, et al, for details.) But other dependencies can imply an order of execution. In particular, a wr-dependency, where a transaction *can* see data committed by another transaction, implies that the *writer* came first in the order of execution. In this example, the transaction which lists the receipts successfully reads the control table update, but is not able to read the receipt insert. This completes the cycle, making it a real anomaly and not a false positive. Note that the wr-dependency can actually exist outside the database, making it pretty much impossible to accurately tell a false positive from a true anomaly when the pivot exists and the transaction writing data which the pivot can't read commits first. For example, let's say that the update to the control table is committed from an application which, seeing that its update came back without error, proceeds to list the receipts for the old date in a subsequent transaction. 
You have a wr-dependency which is, in reality, quite real and solid with no way to notice it within the database engine. That's why the techniques used in SSI are pretty hard to improve upon beyond more detailed and accurate tracking of rw-conflicts. -Kevin
Re: [HACKERS] Exposing the Xact commit order to the user
2010/5/25 Florian Pflug : > Hm, but for there to be an actual problem (and not a false positive), an > actual dangerous circle has to exist in the dependency graph. The > existence of a dangerous structure is just a necessary (but not > sufficient) and easily checked-for condition for that, right? Now, if a > read-only transaction only ever has outgoing edges, it cannot be part > of a (dangerous or not) circle, and hence any dangerous structure it is > part of is a false positive. > > I guess my line of reasoning is flawed somehow, but I cannot figure out why... In the general case, "wr" dependencies also create "must be serialized before" edges. It seems that those edges can be discarded when finding a pivot, but if you want to go "back to basics": ("<" means "must be serialized before".) * T1 < T2, because T1 reads a version of a data element for which T2 later creates a newer version (rw between T1 and T2). * T3 < T1, because T3 reads a version of a data element for which T1 later creates a newer version (rw between T3 and T1). * T2 < T3, because T2 creates a version of a data element, which is then read by T3 (wr between T2 and T3). (As you can see, those 3 edges form a cycle.) Nicolas
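The three "must be serialized before" edges Nicolas lists can be checked mechanically. A toy sketch (illustrative Python, nothing like the backend's actual bookkeeping) that detects a cycle in such a graph by depth-first search:

```python
# Edges mean "must be serialized before". A cycle means no serial order
# of the transactions exists.
def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited / on current path / done
    color = {n: WHITE for n in graph}

    def dfs(n):
        color[n] = GRAY
        for m in graph[n]:
            if color[m] == GRAY:      # back edge: cycle found
                return True
            if color[m] == WHITE and dfs(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in graph)

# The edges from the message: T1 < T2 (rw), T3 < T1 (rw), T2 < T3 (wr).
edges = [("T1", "T2"), ("T3", "T1"), ("T2", "T3")]
# has_cycle(edges) is True; drop any one edge and the cycle disappears.
```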
[HACKERS] Fwd: Hiding data in postgresql
Hi guys, (I tried the question in another forum first) Does someone have any ideas how I can hide data without the meta data noticing? To explain further, I would like to save some collection of data where the meta-data does not see it. I am trying to do some security through obscurity. It is for research purposes. For example, populate a table with 1000 rows, but the meta-data only knows of about 500 of them? Only on an export of a dump can you find all the data again. Or maybe to make a hidden duplicate schema that can point to the hidden data? Does someone have any good ideas on how to achieve this or something similar? Kind regards Hector On Mon, May 24, 2010 at 9:16 PM, Hector Beyers wrote: > > Hi guys, > > does ANYONE have any tips on hiding data on a database server? This means > that data is stored in places that is not necessarily picked up in the > schema of the database. I am doing some research on databases and need some > direction. > > Any help or direction will be highly appreciated. > > Kind regards > > Hector > >
Re: [HACKERS] Exposing the Xact commit order to the user
2010/5/25 Florian Pflug : > On May 25, 2010, at 20:18 , Dan Ports wrote: > >> T3, which is a read-only transaction, sees the incremented date and an >> empty list of receipts. But T1 later commits a new entry in the >> receipts table with the old date. No serializable ordering allows this. >> >> However, if T3 hadn't performed its read, there'd be no problem; we'd >> just serialize T1 before T2 and no one would be the wiser. > > Hm, so in fact SSI sometimes allows the database to be inconsistent, but only > as long as nobody tries to observe it? I would not call this an inconsistent state: it would become inconsistent only after someone (e.g., T3) has observed it _and_ T1 commits. Nicolas
Re: [HACKERS] Exposing the Xact commit order to the user
On May 25, 2010, at 20:48 , Dan Ports wrote: > On Tue, May 25, 2010 at 08:35:44PM +0200, Florian Pflug wrote: >> Hm, so in fact SSI sometimes allows the database to be inconsistent, but >> only as long as nobody tries to observe it? > > Yes. Note that even while it's in an inconsistent state, you can still > perform any query that doesn't observe the inconsistency -- hopefully > most queries fall into this category. Yeah, as long as you just walk by without looking, the database is happy ;-) >> Btw, I still don't get how this follows from the Cahill paper. For a >> transaction to lie on a dangerous circle, it needs incoming and outgoing >> edges in the conflict graph, right? But I'd have though that conflicts are >> always between a reader and a writer or between two writers. So how can a >> read-only transaction have incoming and outgoing edges? > > Right, the read-only transaction can't have incoming edges, but it can > have outgoing edges. So it can't be the "pivot" itself (the transaction > with both outgoing and incoming edges), but it can cause *another* > transaction to be. > > In the example I gave, T3 (the r/o transaction) has an outgoing edge to > T1, because it didn't see T1's concurrent update. T1 already had an > outgoing edge to T2, so adding in this incoming edge from T3 creates > the dangerous structure. Hm, but for there to be an actual problem (and not a false positive), an actual dangerous circle has to exist in the dependency graph. The existence of a dangerous structure is just a necessary (but not sufficient) and easily checked-for condition for that, right? Now, if a read-only transaction only ever has outgoing edges, it cannot be part of a (dangerous or not) circle, and hence any dangerous structure it is part of is a false positive. I guess my line of reasoning is flawed somehow, but I cannot figure out why... 
best regards, Florian Pflug
Re: [HACKERS] Exposing the Xact commit order to the user
2010/5/25 Dan Ports : > On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote: > >> I don't understand the problem. According to me, in the context of >> SSI, a read-only slave can just map SERIALIZABLE to the technical >> implementation of REPEATABLE READ (i.e., the currently-existing >> "SERIALIZABLE"). The union of the transactions on the master and the >> slave(s) will still exhibit SERIALIZABLE behavior because the >> transactions on the slave cannot write anything and are therefore >> irrelevant. > > This, unfortunately, isn't true in SSI. > > Consider read-only transactions on a single node SSI database -- the > situation is the same for read-only transactions that run on a slave. > These transactions can be part of anomalies, so they need to be checked > for conflicts and potentially aborted. > > Consider Kevin's favorite example, where one table contains the current > date and the other is a list of receipts (initially empty). > T1 inserts (select current_date) into receipts, but doesn't commit > T2 increments current_date and commits > T3 reads both current_date and the receipt table > T1 commits > > T3, which is a read-only transaction, sees the incremented date and an > empty list of receipts. But T1 later commits a new entry in the > receipts table with the old date. No serializable ordering allows this. > However, if T3 hadn't performed its read, there'd be no problem; we'd > just serialize T1 before T2 and no one would be the wiser. > > SSI would detect a potential conflict here, which we could resolve by > aborting T3. (We could also abort T1, but if this is a replicated > system this isn't always an option -- T3 might be running on the > slave, so only the slave will know about the conflict, and it can't > very well abort an update transaction on the master.) Ah, indeed. I made the same reasoning mistake as Florian (presumably) did: I didn't think of the fact that the read-only transaction doesn't need to be the pivot. 
Nicolas
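The receipts example discussed in this thread can also be verified by brute force: enumerate every serial order of T1, T2 and T3 and observe that none reproduces both what T3 saw and the committed result. A toy sketch (the numeric encoding of "current_date" and the transaction bodies are made up for illustration):

```python
from itertools import permutations

# T1 inserts a receipt for current_date; T2 increments current_date;
# T3 (read-only) reads both. Dates are encoded as 0 (old) and 1 (new).
def run_serial(order):
    date, receipts, t3_view = 0, [], None
    for t in order:
        if t == "T1":                  # insert (select current_date)
            receipts.append(date)
        elif t == "T2":                # increment current_date
            date += 1
        elif t == "T3":                # read date and the receipts list
            t3_view = (date, tuple(receipts))
    return t3_view, receipts

# In the interleaved run, T3 saw the new date and an empty receipts list,
# yet the receipt T1 later committed carries the old date.
observed_view, final_receipts = (1, ()), [0]
consistent = [o for o in permutations(["T1", "T2", "T3"])
              if run_serial(o) == (observed_view, final_receipts)]
# consistent == [] -- no serial ordering explains the run, as Dan said.
```

The near-miss is T2, T3, T1: it matches T3's view, but then the committed receipt would carry the new date, contradicting the final state.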
Re: [HACKERS] Confused about the buffer pool size
Hello Heikki: This is what the documentation says (see below). But it does not tell me anything about what the actual buffer size is. How do I know what the real buffer size is? I am using 8.4.4 and I am running only one query at a time. Cheers, MMK. Sets the planner's assumption about the effective size of the disk cache that is available to a single query. This is factored into estimates of the cost of using an index; a higher value makes it more likely index scans will be used, a lower value makes it more likely sequential scans will be used. When setting this parameter you should consider both PostgreSQL's shared buffers and the portion of the kernel's disk cache that will be used for PostgreSQL data files. Also, take into account the expected number of concurrent queries on different tables, since they will have to share the available space. This parameter has no effect on the size of shared memory allocated by PostgreSQL, nor does it reserve kernel disk cache; it is used only for estimation purposes. The default is 128 megabytes (128MB). --- On Tue, 5/25/10, Heikki Linnakangas wrote: On 25/05/10 19:49, MMK wrote: > Hello All: > In the code (costsize.c), I see that effective_cache_size is set to > DEFAULT_EFFECTIVE_CACHE_SIZE. > This is defined as follows in cost.h > #define DEFAULT_EFFECTIVE_CACHE_SIZE 16384 > But when I say > show shared_buffers in psql I get, > shared_buffers 28MB > In postgresql.conf file, the following lines appear: > shared_buffers = 28MB # min 128kB (change requires restart) > #temp_buffers = 8MB # min 800kB > So I am assuming that the buffer pool size is 28MB = 28 * 128 = 3584 8K pages. > So should effective_cache_size be set to 3584 rather than the 16384? No. 
Please see the manual for what effective_cache_size means: http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
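Incidentally, the numbers in the question are plain unit conversion: both settings are measured internally in pages of BLCKSZ bytes (8 kB by default). A quick sketch of the arithmetic:

```python
# Unit arithmetic behind the numbers in this thread: shared_buffers and
# effective_cache_size are both counted in 8 kB pages internally.
BLCKSZ = 8192                      # default PostgreSQL block size in bytes

def mb_to_pages(mb, blcksz=BLCKSZ):
    return mb * 1024 * 1024 // blcksz

def pages_to_mb(pages, blcksz=BLCKSZ):
    return pages * blcksz // (1024 * 1024)

# shared_buffers = 28MB -> 3584 pages, as computed in the question.
# DEFAULT_EFFECTIVE_CACHE_SIZE = 16384 pages -> 128MB, matching the docs.
```

The two settings measure different things, though: shared_buffers is memory PostgreSQL actually allocates, while effective_cache_size is only a planner estimate of total cache (shared buffers plus kernel cache).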
Re: [HACKERS] Synchronization levels in SR
Hi, Simon Riggs writes: > On Tue, 2010-05-25 at 19:08 +0200, Alastair Turner wrote: >> On Tue, May 25, 2010 at 6:28 PM, Simon Riggs wrote: >> > The best parameter we can specify is the number of servers that we wish >> > to wait for confirmation from. >> >> This may be an incredibly naive question, but what happens to the >> transaction on the master if the number of confirmations is not >> received? > > It's much easier to say you want to wait for N servers to respond, but > don't care which they are. One parameter, simple and flexible. [...] > So whatever we do, we need additional parameters to specify timeouts > (including wait-forever as an option) and action-on-timeout: commit or > rollback. I was preparing an email along the lines that we need each slave to declare its desired minimum level of synchronicity, and have the master filter that with what the transaction wants. Scratch that. Thinking about it some more, I see that Simon's proposal is both simpler and more effective: we already have Hot Standby and admin functions that tell us the last replayed LSN. The bigger LSN wins, so in case of failover we know which slave to choose. The only use case I can see for what I had in mind is to allow the user to choose which server is trusted to have accurate data or better read-only performance. But if the link is slow, the code will soon enough notice, mind you. I'm still not sure about my preference here, but I can see why Simon's proposal is simpler and addresses all concerns, apart from forcing the servers into a non-optimal setup for a gain that is hard to see. Regards, -- dim
Re: [HACKERS] Synchronization levels in SR
Simon Riggs wrote: How we handle degraded mode is important, yes. Whatever parameters we choose the problem will remain the same. Should we just ignore degraded mode and respond as if nothing bad had happened? Most people would say not. If we specify server1 = synch and server2 = async we then also need to specify what happens if server1 is down. People might often specify if (server1 == down) server2 = synch. I have a hard time imagining including async servers in the quorum. If an async servers vote is necessary to reach quorum due to a 'real' sync standby server failure, it would mean that the async-intended standby is now also in sync with the master transactions. IMHO this is a bad situation, since instead of the DBA getting the error: "not enough sync standbys to reach quorum", he'll now get "database is slow" complaints, only to find out later that too much sync standby servers went south. (under the assumption that async servers are mostly on too slow links to consider for sync standby). regards, Yeb Havinga -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck wrote: > Have you ever looked at one of those queries, that Londiste or > Slony issue against the provider DB in order to get all the log > data that has been committed between two snapshots? Is that really > the best you can think of? No, I admit I haven't. In fact, I was thinking primarily in terms of log-driven situations, like HS. What would be the best place for me to look to come up to speed on your use case? (I'm relatively sure that the issue isn't that there's no information to find, but that a sequential pass over all available information would take a *long* time.) I've been working through the issues on WAL-based replicas, and have some additional ideas and alternatives, but I'd like to see the "big picture", including trigger-based replication, before posting. -Kevin
Re: [HACKERS] Exposing the Xact commit order to the user
On Tue, May 25, 2010 at 08:35:44PM +0200, Florian Pflug wrote: > Hm, so in fact SSI sometimes allows the database to be inconsistent, but only > as long as nobody tries to observe it? Yes. Note that even while it's in an inconsistent state, you can still perform any query that doesn't observe the inconsistency -- hopefully most queries fall into this category. > Btw, I still don't get how this follows from the Cahill paper. For a > transaction to lie on a dangerous circle, it needs incoming and outgoing > edges in the conflict graph, right? But I'd have though that conflicts are > always between a reader and a writer or between two writers. So how can a > read-only transaction have incoming and outgoing edges? Right, the read-only transaction can't have incoming edges, but it can have outgoing edges. So it can't be the "pivot" itself (the transaction with both outgoing and incoming edges), but it can cause *another* transaction to be. In the example I gave, T3 (the r/o transaction) has an outgoing edge to T1, because it didn't see T1's concurrent update. T1 already had an outgoing edge to T2, so adding in this incoming edge from T3 creates the dangerous structure. Dan -- Dan R. K. Ports MIT CSAIL http://drkp.net/
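Dan's point can be made concrete with a toy check for a pivot, i.e. a transaction with both an incoming and an outgoing rw-conflict edge, where edges run from reader to writer. This is illustrative Python only, nothing like the real SSI bookkeeping:

```python
# A "pivot" has both incoming and outgoing rw-conflict edges.
# Edges are (reader, writer) pairs: reader -> writer.
def find_pivots(rw_edges):
    incoming, outgoing = set(), set()
    for reader, writer in rw_edges:
        outgoing.add(reader)
        incoming.add(writer)
    return incoming & outgoing

# T1 already had an outgoing edge to T2; the read-only T3 adds an edge
# into T1, making T1 the pivot. T3 itself can never be one: it writes
# nothing, so no edge can point into it.
edges = [("T1", "T2"), ("T3", "T1")]
# find_pivots(edges) == {"T1"}
```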
Re: [HACKERS] Exposing the Xact commit order to the user
Florian Pflug wrote: > Hm, so in fact SSI sometimes allows the database to be > inconsistent, but only as long as nobody tries to observe it? Not exactly. The eventually-persisted state is always consistent, but there can be a transitory committed state which would violate user-defined constraints or business rules *if viewed*. This is what I've been on about -- the commit sequence is not necessarily the same as the apparent order of execution. A read-only transaction, if run before the overlapping commits "settle", can view a state which is not consistent with any serial order of execution, and might therefore break the rules. SSI detects that and rolls one of the transactions back if they're all running at serializable transaction isolation in a single SSI database, but the question is how to handle this when the read happens in a replica. > Btw, I still don't get how this follows from the Cahill paper. For > a transaction to lie on a dangerous circle, it needs incoming and > outgoing edges in the conflict graph, right? At least one of the transactions participating in the cycle does. There's no requirement that they all do. -Kevin
Re: [HACKERS] tsvector pg_stats seems quite a bit off.
Excerpts from Jesper Krogh's message of mié may 19 15:01:18 -0400 2010: > But the distribution is very "flat" at the end, the last 128 values are > exactly > 1.00189e-05 > which means that any term sitting outside the array would get an estimate of > 1.00189e-05 * 350174 / 2 = 1.75 ~ 2 rows I don't know if this is related, but tsvector stats are computed and stored per term, not per datum. This is different from all other datatypes. Maybe there's code somewhere that's assuming per-datum and coming up with the wrong estimates? Or maybe the tsvector-specific code contains a bug somewhere; maybe a rounding error? -- Álvaro Herrera
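As a sanity check on the quoted arithmetic (numbers taken from the message; the rule shown -- half the minimum stored frequency for a term outside the array -- is the estimate Jesper describes):

```python
# A term falling outside the most-common-elements array is estimated at
# half the smallest frequency stored in the array.
min_freq = 1.00189e-05        # the flat frequency at the tail of the array
n_rows = 350174               # table row count used in the message
estimate = min_freq * n_rows / 2
# estimate is about 1.75, which shows up as the ~2-row estimate quoted above
```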
Re: [HACKERS] Exposing the Xact commit order to the user
On May 25, 2010, at 20:18 , Dan Ports wrote: > On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote: >> I don't understand the problem. According to me, in the context of >> SSI, a read-only slave can just map SERIALIZABLE to the technical >> implementation of REPEATABLE READ (i.e., the currently-existing >> "SERIALIZABLE"). The union of the transactions on the master and the >> slave(s) will still exhibit SERIALIZABLE behavior because the >> transactions on the slave cannot write anything and are therefore >> irrelevant. > > This, unfortunately, isn't true in SSI. > > Consider read-only transactions on a single node SSI database -- the > situation is the same for read-only transactions that run on a slave. > These transactions can be part of anomalies, so they need to be checked > for conflicts and potentially aborted. > > Consider Kevin's favorite example, where one table contains the current > date and the other is a list of receipts (initially empty). > T1 inserts (select current_date) into receipts, but doesn't commit > T2 increments current_date and commits > T3 reads both current_date and the receipt table > T1 commits > > T3, which is a read-only transaction, sees the incremented date and an > empty list of receipts. But T1 later commits a new entry in the > receipts table with the old date. No serializable ordering allows this. > > However, if T3 hadn't performed its read, there'd be no problem; we'd > just serialize T1 before T2 and no one would be the wiser. Hm, so in fact SSI sometimes allows the database to be inconsistent, but only as long as nobody tries to observe it? Btw, I still don't get how this follows from the Cahill paper. For a transaction to lie on a dangerous circle, it needs incoming and outgoing edges in the conflict graph, right? But I'd have though that conflicts are always between a reader and a writer or between two writers. So how can a read-only transaction have incoming and outgoing edges? 
best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote: > I don't understand the problem. According to me, in the context of > SSI, a read-only slave can just map SERIALIZABLE to the technical > implementation of REPEATABLE READ (i.e., the currently-existing > "SERIALIZABLE"). The union of the transactions on the master and the > slave(s) will still exhibit SERIALIZABLE behavior because the > transactions on the slave cannot write anything and are therefore > irrelevant. This, unfortunately, isn't true in SSI. Consider read-only transactions on a single node SSI database -- the situation is the same for read-only transactions that run on a slave. These transactions can be part of anomalies, so they need to be checked for conflicts and potentially aborted. Consider Kevin's favorite example, where one table contains the current date and the other is a list of receipts (initially empty). T1 inserts (select current_date) into receipts, but doesn't commit T2 increments current_date and commits T3 reads both current_date and the receipt table T1 commits T3, which is a read-only transaction, sees the incremented date and an empty list of receipts. But T1 later commits a new entry in the receipts table with the old date. No serializable ordering allows this. However, if T3 hadn't performed its read, there'd be no problem; we'd just serialize T1 before T2 and no one would be the wiser. SSI would detect a potential conflict here, which we could resolve by aborting T3. (We could also abort T1, but if this is a replicated system this isn't always an option -- T3 might be running on the slave, so only the slave will know about the conflict, and it can't very well abort an update transaction on the master.) There's another example of a read-only transaction anomaly that could cause similar problems at http://portal.acm.org/citation.cfm?doid=1031570.1031573, but I think this one is easier to follow. Dan -- Dan R. K. 
Ports MIT CSAIL http://drkp.net/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Alvaro Herrera writes: > This sounds like extending Xid to 64 bits, without having to store the > high bits everywhere. Was this discussed in the PGCon devs meeting? Yeah, that's what it would amount to. It was not discussed at the dev meeting --- it was an idea that came up one evening at PGCon. I'm not sure whether this would imply having to widen xid to 64 bits internally. That could be a bit unpleasant as far as CPU and shared memory space go, although every year that goes by makes 32-bit machines less interesting as DB servers. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
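A sketch of what widening to 64 bits buys, assuming the usual modulo-2^32 XID ordering (the difference interpreted as a signed 32-bit integer, as in TransactionIdPrecedes, and ignoring the special XIDs below FirstNormalTransactionId):

```python
# 32-bit circular comparison: the 32-bit difference is read as signed.
def xid_precedes(a, b):
    diff = (a - b) & 0xFFFFFFFF
    return diff >= 0x80000000   # signed (a - b) < 0

# Normal case: the lower xid is older.
print(xid_precedes(100, 200))         # True

# Circular case: an xid just below wraparound counts as older than a
# small one -- this is why anti-wraparound freezing exists at all.
print(xid_precedes(0xFFFFFFF0, 100))  # True

# With an epoch carried in the high bits, plain integer comparison
# suffices and wraparound never occurs in practice.
def full_xid(epoch, xid):
    return (epoch << 32) | xid

print(full_xid(1, 100) > full_xid(0, 0xFFFFFFF0))  # True
```

The idea discussed above is to get the second behavior without physically storing the high 32 bits in every tuple header, which is where the CPU and shared-memory cost question comes in.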
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
Robert Haas writes: > On Tue, May 25, 2010 at 1:09 PM, Mike Fowler wrote: >>> We're unlikely to accept this patch if it changes the minimum version >>> of libxml2 required to compile PostgreSQL >> >> Why? 2.6.27 is almost 4 years old. > Because we work hard to minimize our dependencies and make them as > non-onerous as possible. > At a minimum, I think it's fair to say that the burden is on you to > justify what it's worth bumping the version number. Yes. Increasing the minimum required version of some library is a Big Deal, we don't do it on a whim. And we definitely don't do it just because it's old. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
Robert Haas wrote: On Tue, May 25, 2010 at 1:09 PM, Mike Fowler wrote: We're unlikely to accept this patch if it changes the minimum version of libxml2 required to compile PostgreSQL Why? 2.6.27 is almost 4 years old. Because we work hard to minimize our dependencies and make them as non-onerous as possible. At a minimum, I think it's fair to say that the burden is on you to justify what it's worth bumping the version number. If there is some major speed or performance advantage to using the newer API, maybe we'll consider it. But if it's just a few extra lines of code to work around it, then it's better to write those extra lines of code rather than potentially force users to upgrade packages they're otherwise happy with. The real issue is what's going to be available on most of the platforms we build on. Unfortunately, 2.6.26 is what's on my CentOS 5.4 boxes, for example. I'm sure we don't want to make 9.1 not buildable with the installed libraries on still fairly current RedHat-derived platforms. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Tue, May 25, 2010 at 6:28 PM, Simon Riggs wrote: ... > > The best parameter we can specify is the number of servers that we wish > to wait for confirmation from. That is a definition that easily manages > the complexity of having various servers up/down at any one time. It > also survives misconfiguration more easily, as well as providing a > workaround if replicating across a bursty network where we can't > guarantee response times, even if the typical response time is good. > This may be an incredibly naive question, but what happens to the transaction on the master if the number of confirmations is not received? Is this intended to create a situation where the master effectively becomes unavailable for write operations when its synchronous slaves are unavailable? Alastair "Bell" Turner ^F5 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 13:31 -0400, Robert Haas wrote: > So I agree that we need to talk about whether or not we want to do > this. I'll give my opinion. I am not sure how useful this really is. > Consider a master with two standbys. The master commits a > transaction and waits for one of the two standbys, then acknowledges > the commit back to the user. Then the master crashes. Now what? > It's not immediately obvious which standby we should bring online as > the primary, and if we guess wrong we could lose transactions thought > to be committed. This is probably a solvable problem, with enough > work: we can write a script to check the last LSN received by each of > the two standbys and promote whichever one is further along. > > But... what happens if the master and one standby BOTH crash > simultaneously? There's no way of knowing (until we get at least one > of them back up) whether it's safe to promote the other standby. Not much of a problem really, is it? If you have one server left out of 3, then you promote it OR you stay down - your choice. There is no "safe to promote" knowledge in *any* scenario; you never know what was on the primary, only what was received by the standby. If you have N standbys still up, you can pick which using the algorithm you mention. Remember that the WAL is sequential, so it's not like the commit order of transactions will differ across servers if we use quorum commit. So not a problem. The multiple simultaneous case is fairly common for people that pick the "synch to server in next rack" because there's a 100 reasons why we'd take out both at the same time, ask JD. > I like the idea of a "quorum commit" type feature where we promise the > user that things are committed when "enough" servers have acknowledged > the commit. But I think most people are not going to want that > configuration unless we also provide some really good management tools > that we don't have today. Good name. 
Management tools has nothing to do with this; completely orthogonal. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
On 05/25/2010 01:09 PM, Mike Fowler wrote: Why? 2.6.27 is almost 4 years old. RHEL 5 ships with 2.6.26. I imagine that supporting it is very desirable, regardless of its age, since that is unfortunately still the latest version of RHEL. -- m. tharp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 19:08 +0200, Alastair Turner wrote: > On Tue, May 25, 2010 at 6:28 PM, Simon Riggs wrote: > ... > > > > The best parameter we can specify is the number of servers that we wish > > to wait for confirmation from. That is a definition that easily manages > > the complexity of having various servers up/down at any one time. It > > also survives misconfiguration more easily, as well as providing a > > workaround if replicating across a bursty network where we can't > > guarantee response times, even if the typical response time is good. > > > > This may be an incredibly naive question, but what happens to the > transaction on the master if the number of confirmations is not > received? Is this intended to create a situation where the master > effectively becomes unavailable for write operations when its > synchronous slaves are unavailable? How we handle degraded mode is important, yes. Whatever parameters we choose the problem will remain the same. Should we just ignore degraded mode and respond as if nothing bad had happened? Most people would say not. If we specify server1 = synch and server2 = async we then also need to specify what happens if server1 is down. People might often specify if (server1 == down) server2 = synch. So now we have 3 configuration settings, one quite complex. It's much easier to say you want to wait for N servers to respond, but don't care which they are. One parameter, simple and flexible. In both cases, we have to figure what to do if we can't get either server to respond. In replication there is no such thing as "server down" just a "server didn't reply in time X". So we need to define timeouts. So whatever we do, we need additional parameters to specify timeouts (including wait-forever as an option) and action-on-timeout: commit or rollback. 
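The wait-for-N-plus-timeout policy described above can be sketched in a few lines. This is purely illustrative — the function name, arguments, and the idea of simulating acknowledgement arrival times are invented for the example, not PostgreSQL code:

```python
# Sketch: commit waits until `needed` standbys have acknowledged, or
# until `timeout` seconds pass, at which point the configured
# action-on-timeout applies. ack_times gives the arrival time of each
# standby's ack in seconds (None = standby never replies).
def wait_for_quorum(ack_times, needed, timeout, action_on_timeout="commit"):
    arrived = sorted(t for t in ack_times if t is not None and t <= timeout)
    if len(arrived) >= needed:
        return "commit", arrived[needed - 1]  # done when the Nth ack lands
    # Degraded mode: quorum was not reached within the timeout.
    return action_on_timeout, timeout

# Two nearby standbys plus a slow remote one; only one ack is needed,
# so the slow remote never delays the commit.
print(wait_for_quorum([0.002, 0.003, 1.5], needed=1, timeout=0.1))

# Both nearby standbys down: fall back to the configured action.
print(wait_for_quorum([None, None, 1.5], needed=1, timeout=0.1,
                      action_on_timeout="rollback"))
```

This also shows why "server down" never appears as a state: the only observable fact is that an ack did or did not arrive within the timeout.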
-- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
On Tue, May 25, 2010 at 1:09 PM, Mike Fowler wrote: >> We're unlikely to accept this patch if it changes the minimum version >> of libxml2 required to compile PostgreSQL > > Why? 2.6.27 is almost 4 years old. Because we work hard to minimize our dependencies and make them as non-onerous as possible. At a minimum, I think it's fair to say that the burden is on you to justify why it's worth bumping the version number. If there is some major speed or performance advantage to using the newer API, maybe we'll consider it. But if it's just a few extra lines of code to work around it, then it's better to write those extra lines of code rather than potentially force users to upgrade packages they're otherwise happy with. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 13:31 -0400, Robert Haas wrote: > On Tue, May 25, 2010 at 1:10 PM, Simon Riggs wrote: > > On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote: > >> Robert Haas wrote: > >> > Simon Riggs wrote: > >> >> If we define robustness at the standby level then robustness > >> >> depends upon unseen administrators, as well as the current > >> >> up/down state of standbys. This is action-at-a-distance in its > >> >> worst form. > >> > > >> > Maybe, but I can't help thinking people are going to want some > >> > form of this. The case where someone wants to do sync rep to the > >> > machine in the next rack over and async rep to a server at a > >> > remote site seems too important to ignore. > >> > >> I think there may be a terminology issue here -- I took "configure > >> by standby" to mean that *at the master* you would specify rules for > >> each standby. I think Simon took it to mean that each standby would > >> define the rules for replication to it. Maybe this issue can > >> resolve gracefully with a bit of clarification? > > > > The use case of "machine in the next rack over and async rep to a server > > at a remote site" would require the settings > > > > server.nextrack = synch > > server.remotesite = async > > > > which leaves open the question of what happens when "nextrack" is down. > > > > In many cases, to give adequate performance in that situation people add > > an additional server, so the config becomes > > > > server.nextrack1 = synch > > server.nextrack2 = synch > > server.remotesite = async > > > > We then want to specify for performance reasons that we can get a reply > > from either nextrack1 or nextrack2, so it all still works safely and > > quickly if one of them is down. How can we express that rule concisely? > > With some difficulty. > > Perhaps the difficulty here is that those still look like per-server > settings to me. Just maybe with a different set of semantics. (Those are the per-server settings.) 
-- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confused about the buffer pool size
On 25/05/10 19:49, MMK wrote: Hello All: In the code (costsize.c), I see that effective_cache_size is set to DEFAULT_EFFECTIVE_CACHE_SIZE. This is defined as follows in cost.h #define DEFAULT_EFFECTIVE_CACHE_SIZE 16384 But when I say show shared_buffers in psql I get, shared_buffers 28MB In postgresql.conf file, the following lines appear shared_buffers = 28MB # min 128kB # (change requires restart)#temp_buffers = 8MB # min 800kB So I am assuming that the buffer pool size is 28MB = 28 * 128 = 3584 8K pages. So should effective_cache_size be set to 3584 rather than the 16384? No. Please see the manual for what effective_cache_size means: http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
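For what it's worth, the page arithmetic in the question is internally consistent; the confusion is only about what the two settings mean. shared_buffers is memory PostgreSQL actually allocates, while effective_cache_size is a planner-only estimate of how much data the OS cache plus shared buffers can hold. A quick sketch of the unit conversions (both are measured in 8 kB pages internally):

```python
BLCKSZ = 8 * 1024  # default PostgreSQL page size, 8 kB

# shared_buffers = 28MB, expressed in pages:
shared_buffers_pages = 28 * 1024 * 1024 // BLCKSZ
print(shared_buffers_pages)  # 3584 -- matches the question's arithmetic

# DEFAULT_EFFECTIVE_CACHE_SIZE from cost.h, expressed in MB:
DEFAULT_EFFECTIVE_CACHE_SIZE = 16384  # pages
print(DEFAULT_EFFECTIVE_CACHE_SIZE * BLCKSZ // (1024 * 1024))  # 128 (MB)
```

So the default effective_cache_size corresponds to 128 MB of assumed caching, which is deliberately larger than a 28 MB buffer pool: it accounts for the kernel's cache as well, not just shared_buffers.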
Re: [HACKERS] Synchronization levels in SR
On Tue, May 25, 2010 at 1:10 PM, Simon Riggs wrote: > On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote: >> Robert Haas wrote: >> > Simon Riggs wrote: >> >> If we define robustness at the standby level then robustness >> >> depends upon unseen administrators, as well as the current >> >> up/down state of standbys. This is action-at-a-distance in its >> >> worst form. >> > >> > Maybe, but I can't help thinking people are going to want some >> > form of this. The case where someone wants to do sync rep to the >> > machine in the next rack over and async rep to a server at a >> > remote site seems too important to ignore. >> >> I think there may be a terminology issue here -- I took "configure >> by standby" to mean that *at the master* you would specify rules for >> each standby. I think Simon took it to mean that each standby would >> define the rules for replication to it. Maybe this issue can >> resolve gracefully with a bit of clarification? > > The use case of "machine in the next rack over and async rep to a server > at a remote site" would require the settings > > server.nextrack = synch > server.remotesite = async > > which leaves open the question of what happens when "nextrack" is down. > > In many cases, to give adequate performance in that situation people add > an additional server, so the config becomes > > server.nextrack1 = synch > server.nextrack2 = synch > server.remotesite = async > > We then want to specify for performance reasons that we can get a reply > from either nextrack1 or nextrack2, so it all still works safely and > quickly if one of them is down. How can we express that rule concisely? > With some difficulty. Perhaps the difficulty here is that those still look like per-server settings to me. Just maybe with a different set of semantics. 
> My suggestion is simply to have a single parameter (name unimportant) > > number_of_synch_servers_we_wait_for = N > > which is much easier to understand because it is phrased in terms of the > guarantee given to the transaction, not in terms of what the admin > thinks is the situation. So I agree that we need to talk about whether or not we want to do this. I'll give my opinion. I am not sure how useful this really is. Consider a master with two standbys. The master commits a transaction and waits for one of the two standbys, then acknowledges the commit back to the user. Then the master crashes. Now what? It's not immediately obvious which standby we should bring online as the primary, and if we guess wrong we could lose transactions thought to be committed. This is probably a solvable problem, with enough work: we can write a script to check the last LSN received by each of the two standbys and promote whichever one is further along. But... what happens if the master and one standby BOTH crash simultaneously? There's no way of knowing (until we get at least one of them back up) whether it's safe to promote the other standby. I like the idea of a "quorum commit" type feature where we promise the user that things are committed when "enough" servers have acknowledged the commit. But I think most people are not going to want that configuration unless we also provide some really good management tools that we don't have today. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
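The "check the last LSN received by each standby" script mentioned above is easy to sketch, since WAL locations print as two hexadecimal words separated by a slash (high word / low word). The standby names and values below are made up for illustration:

```python
# Sketch of promotion-by-LSN: parse each standby's reported WAL receive
# location and promote whichever has received the most WAL.
def parse_lsn(text):
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

# Hypothetical values, as might be reported by each standby.
standbys = {
    "standby1": "0/3000158",
    "standby2": "0/3000220",
}
best = max(standbys, key=lambda name: parse_lsn(standbys[name]))
print(best)  # standby2 -- it is further along
```

Note this only picks the best *surviving* candidate; it cannot tell you whether some WAL existed on the crashed master that no standby ever received, which is the "master and one standby both crash" gap discussed above.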
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Excerpts from Heikki Linnakangas's message of mar may 25 04:41:30 -0400 2010: > On 24/05/10 22:49, Alvaro Herrera wrote: > > I think this is nonsense. If you have 3-years-old sales transactions, > > and your database has any interesting churn, tuples in those pages have > > been frozen for a very long time *already*. > What's missing from the suggestion is that relfrozenxid and datfrozenxid > also need to be expanded to 8-bytes. That way you effectively have > 8-byte XIDs, which means that you never need to vacuum to avoid XID > wraparound. Hmm, so are we going to use the "xid epoch" more officially? That's entirely a new line of development, perhaps it opens new possibilities. This sounds like extending Xid to 64 bits, without having to store the high bits everywhere. Was this discussed in the PGCon devs meeting? -- Álvaro Herrera -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote: > Robert Haas wrote: > > Simon Riggs wrote: > >> If we define robustness at the standby level then robustness > >> depends upon unseen administrators, as well as the current > >> up/down state of standbys. This is action-at-a-distance in its > >> worst form. > > > > Maybe, but I can't help thinking people are going to want some > > form of this. The case where someone wants to do sync rep to the > > machine in the next rack over and async rep to a server at a > > remote site seems too important to ignore. > > I think there may be a terminology issue here -- I took "configure > by standby" to mean that *at the master* you would specify rules for > each standby. I think Simon took it to mean that each standby would > define the rules for replication to it. Maybe this issue can > resolve gracefully with a bit of clarification? The use case of "machine in the next rack over and async rep to a server at a remote site" would require the settings server.nextrack = synch server.remotesite = async which leaves open the question of what happens when "nextrack" is down. In many cases, to give adequate performance in that situation people add an additional server, so the config becomes server.nextrack1 = synch server.nextrack2 = synch server.remotesite = async We then want to specify for performance reasons that we can get a reply from either nextrack1 or nextrack2, so it all still works safely and quickly if one of them is down. How can we express that rule concisely? With some difficulty. My suggestion is simply to have a single parameter (name unimportant) number_of_synch_servers_we_wait_for = N which is much easier to understand because it is phrased in terms of the guarantee given to the transaction, not in terms of what the admin thinks is the situation. 
-- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
Robert Haas wrote: On Tue, May 25, 2010 at 12:04 PM, Mike Fowler wrote: Erik Rijkers wrote: libxml2.x86_64 2.6.26-2.1.2.8 installed libxml2-devel.x86_642.6.26-2.1.2.8 installed Thanks for testing my patch Erik. It turns out I've got libxml2 installed at version 2.7.5. Searching the gnome mailing lists, it turns out xmlXPathCompiledEvalToBoolean was added (unbelievably) in the very next version from yours, 2.6.27 (see: http://mail.gnome.org/archives/xml/2006-October/msg00119.html). We're unlikely to accept this patch if it changes the minimum version of libxml2 required to compile PostgreSQL Why? 2.6.27 is almost 4 years old. I realise that my patch didn't update configure and configure.in, and indeed I didn't think of it when I responded to Erik (I'm too used to the Java world where people manage their own dependencies). I've now attached the updated patch which ups the check from version 2.6.23 to 2.6.27. Regards, -- Mike Fowler Registered Linux user: 379787 Index: configure === RCS file: /home/mfowler/cvsrepo/pgrepo/pgsql/configure,v retrieving revision 1.679 diff -c -r1.679 configure *** configure 13 May 2010 22:07:40 - 1.679 --- configure 25 May 2010 16:57:49 - *** *** 9079,9087 if test "$with_libxml" = yes ; then ! { $as_echo "$as_me:$LINENO: checking for xmlSaveToBuffer in -lxml2" >&5 ! $as_echo_n "checking for xmlSaveToBuffer in -lxml2... " >&6; } ! if test "${ac_cv_lib_xml2_xmlSaveToBuffer+set}" = set; then $as_echo_n "(cached) " >&6 else ac_check_lib_save_LIBS=$LIBS --- 9079,9087 if test "$with_libxml" = yes ; then ! { $as_echo "$as_me:$LINENO: checking for xmlXPathCompiledEvalToBoolean in -lxml2" >&5 ! $as_echo_n "checking for xmlXPathCompiledEvalToBoolean in -lxml2... " >&6; } ! if test "${ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean+set}" = set; then $as_echo_n "(cached) " >&6 else ac_check_lib_save_LIBS=$LIBS *** *** 9099,9109 #ifdef __cplusplus extern "C" #endif ! char xmlSaveToBuffer (); int main () { ! 
return xmlSaveToBuffer (); ; return 0; } --- 9099,9109 #ifdef __cplusplus extern "C" #endif ! char xmlXPathCompiledEvalToBoolean (); int main () { ! return xmlXPathCompiledEvalToBoolean (); ; return 0; } *** *** 9129,9140 test "$cross_compiling" = yes || $as_test_x conftest$ac_exeext }; then ! ac_cv_lib_xml2_xmlSaveToBuffer=yes else $as_echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ! ac_cv_lib_xml2_xmlSaveToBuffer=no fi rm -rf conftest.dSYM --- 9129,9140 test "$cross_compiling" = yes || $as_test_x conftest$ac_exeext }; then ! ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean=yes else $as_echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ! ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean=no fi rm -rf conftest.dSYM *** *** 9142,9150 conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi ! { $as_echo "$as_me:$LINENO: result: $ac_cv_lib_xml2_xmlSaveToBuffer" >&5 ! $as_echo "$ac_cv_lib_xml2_xmlSaveToBuffer" >&6; } ! if test "x$ac_cv_lib_xml2_xmlSaveToBuffer" = x""yes; then cat >>confdefs.h <<_ACEOF #define HAVE_LIBXML2 1 _ACEOF --- 9142,9150 conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi ! { $as_echo "$as_me:$LINENO: result: $ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean" >&5 ! $as_echo "$ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean" >&6; } ! if test "x$ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean" = x""yes; then cat >>confdefs.h <<_ACEOF #define HAVE_LIBXML2 1 _ACEOF *** *** 9152,9159 LIBS="-lxml2 $LIBS" else ! { { $as_echo "$as_me:$LINENO: error: library 'xml2' (version >= 2.6.23) is required for XML support" >&5 ! $as_echo "$as_me: error: library 'xml2' (version >= 2.6.23) is required for XML support" >&2;} { (exit 1); exit 1; }; } fi --- 9152,9159 LIBS="-lxml2 $LIBS" else ! { { $as_echo "$as_me:$LINENO: error: library 'xml2' (version >= 2.6.27) is required for XML support" >&5 ! 
$as_echo "$as_me: error: library 'xml2' (version >= 2.6.27) is required for XML support" >&2;} { (exit 1); exit 1; }; } fi Index: configure.in === RCS file: /home/mfowler/cvsrepo/pgrepo/pgsql/configure.in,v retrieving revision 1.627 diff -c -r1.627 configure.in *** configure.in 13 May 2010 22:07:42 - 1.627 --- configure.in 25 May 2010 16:22:32 - *** *** 940,946 fi if test "$with_libxml" = yes ; then ! AC_CHECK_LIB(xml2, xmlSaveToBuffer, [], [AC_MSG_ERROR([library 'xml2' (version >= 2.6.23) is required for XML support])]) fi if test "$with_libxslt" = yes ; then --- 940,946 ---
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 12:40 -0400, Robert Haas wrote: > On Tue, May 25, 2010 at 12:28 PM, Simon Riggs wrote: > > Synchronous replication implies that a commit should wait. This wait is > > experienced by the transaction, not by other parts of the system. If we > > define robustness at the standby level then robustness depends upon > > unseen administrators, as well as the current up/down state of standbys. > > This is action-at-a-distance in its worst form. > > Maybe, but I can't help thinking people are going to want some form of > this. > The case where someone wants to do sync rep to the machine in > the next rack over and async rep to a server at a remote site seems > too important to ignore. The use case of "machine in the next rack over and async rep to a server at a remote site" *is* important, but you give no explanation as to why that implies "per-standby" is the solution to it. If you read the rest of my email, you'll see that I have explained the problems "per-standby" settings would cause. Please don't be so quick to claim it is me ignoring anything. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Add _PG_init to PL language handler documentation
Howdy, This tiny doc patch adds _PG_init to the skeleton example code for a PL. The information is quite valuable to PL authors, who might miss it when it is described in the shared library documentation. This patch was based off of 6e2ba96 in the git mirror and a colorized diff can be viewed here: http://github.com/leto/postgres/commit/a9e265a7f55a0605fb4c6135f0f689c8b89e9623 Duke -- Jonathan "Duke" Leto jonat...@leto.net http://leto.net pginit.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
Robert Haas wrote: > Simon Riggs wrote: >> If we define robustness at the standby level then robustness >> depends upon unseen administrators, as well as the current >> up/down state of standbys. This is action-at-a-distance in its >> worst form. > > Maybe, but I can't help thinking people are going to want some > form of this. The case where someone wants to do sync rep to the > machine in the next rack over and async rep to a server at a > remote site seems too important to ignore. I think there may be a terminology issue here -- I took "configure by standby" to mean that *at the master* you would specify rules for each standby. I think Simon took it to mean that each standby would define the rules for replication to it. Maybe this issue can resolve gracefully with a bit of clarification? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON manipulation functions
On Tue, May 25, 2010 at 10:52 AM, Joseph Adams wrote: >> Well, I think it's fine to use the wiki for brainstorming, but before >> you change the design you probably need to talk about it here. You >> can't rely on everyone on -hackers to follow changes on a wiki page >> somewhere. It looks like the API has been overhauled pretty heavily >> since the last version we talked about here, and I'm not sure I >> understand it. > > I'll try to explain it in one big nutshell: > > Instead of, for instance, json_to_number('5') and number_to_json(5), I > propose changing it to from_json(5)::INT and to_json('5'). Note how > from_json simply returns TEXT containing the underlying value for the > user to cast. I plan to make calling to_json/from_json with arrays or > objects (e.g. to_json(ARRAY[1,2,3]) and from_json('[1,2,3]') ) throw > an error for now, as implementing all the specifics of this could be > quite distracting. I don't see how that's an improvement over the previous design. It seems like it adds a lot of extra casting and removes useful list operations without any corresponding advantage. > If I'm not mistaken, json_object([content [AS name] [, ...]] | *) > RETURNS json can't be implemented without augmenting the grammar (as > was done with xmlforest), so I considered making it take a RECORD > parameter like the hstore(RECORD) function does, as was suggested on > IRC. However, this may be inadequate for selecting some columns but > not others. Using examples from hstore: > > SELECT hstore(foo) FROM foo; => '"e"=>"2.71828", "pi"=>"3.14159"' > -- this works, but what if we only want one field? > > SELECT hstore(pi) FROM foo; > -- function type error > > SELECT hstore(row(pi)) FROM foo; => '"f1"=>"3.14159"' > -- field name is lost > > SELECT hstore(bar) FROM (select pi FROM foo) AS bar; => '"f1"=>"3.14159"' > -- ugly, and field name is *still* lost Yeah. I'm not sure what to do about this problem. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Confused about the buffer pool size
Hello All: In the code (costsize.c), I see that effective_cache_size is set to DEFAULT_EFFECTIVE_CACHE_SIZE. This is defined as follows in cost.h #define DEFAULT_EFFECTIVE_CACHE_SIZE 16384 But when I say show shared_buffers in psql I get, shared_buffers 28MB In postgresql.conf file, the following lines appear shared_buffers = 28MB # min 128kB # (change requires restart)#temp_buffers = 8MB # min 800kB So I am assuming that the buffer pool size is 28MB = 28 * 128 = 3584 8K pages. So should effective_cache_size be set to 3584 rather than the 16384? Thanks, MMK.
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 12:40 -0400, Robert Haas wrote:
> On Tue, May 25, 2010 at 12:28 PM, Simon Riggs wrote:
> > Synchronous replication implies that a commit should wait. This wait is
> > experienced by the transaction, not by other parts of the system. If we
> > define robustness at the standby level then robustness depends upon
> > unseen administrators, as well as the current up/down state of standbys.
> > This is action-at-a-distance in its worst form.
>
> Maybe, but I can't help thinking people are going to want some form of
> this. The case where someone wants to do sync rep to the machine in
> the next rack over and async rep to a server at a remote site seems
> too important to ignore.

Uhh yeah, that is pretty much the standard use case. The "next rack" is only 50% of the equation. The next part is the disaster recovery rack over 100Mb (or even 10Mb) that is half way across the country. It is common, very common.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
On Tue, May 25, 2010 at 12:04 PM, Mike Fowler wrote:
> Erik Rijkers wrote:
>>
>> libxml2.x86_64 2.6.26-2.1.2.8 installed
>> libxml2-devel.x86_64 2.6.26-2.1.2.8 installed
>>
>
> Thanks for testing my patch Erik. It turns out I've got libxml2 installed at
> version 2.7.5. Searching the gnome mailing lists, it turns out
> xmlXPathCompiledEvalToBoolean was added (unbelievably) in the very next
> version from yours, 2.6.27 (see:
> http://mail.gnome.org/archives/xml/2006-October/msg00119.html).

We're unlikely to accept this patch if it changes the minimum version of libxml2 required to compile PostgreSQL.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] recovery getting interrupted is not so unusual as it used to be
On Tue, May 25, 2010 at 12:36 PM, Simon Riggs wrote: > On Tue, 2010-05-25 at 19:12 +0900, Fujii Masao wrote: >> On Mon, May 17, 2010 at 5:33 PM, Fujii Masao wrote: >> > On Sat, May 15, 2010 at 3:20 AM, Robert Haas wrote: >> >> Hmm, OK, I think that makes sense. Would you care to propose a patch? >> > >> > Yep. Here is the patch. >> > >> > This patch distinguishes normal shutdown from unexpected exit, while the >> > server is in recovery. That is, when smart or fast shutdown is requested >> > during recovery, the bgwriter sets the ControlFile->state to new-introduced >> > DB_SHUTDOWNED_IN_RECOVERY state. >> >> This patch is worth applying for 9.0? If not, I'll add it into >> the next CF for 9.1. > > Presumably Robert will be applying the patch? It seems to address the > concern raised on the thread. Yes, I was planning to review it. But if you or someone else would like to cut in, that's OK too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Tue, May 25, 2010 at 12:28 PM, Simon Riggs wrote:
> Synchronous replication implies that a commit should wait. This wait is
> experienced by the transaction, not by other parts of the system. If we
> define robustness at the standby level then robustness depends upon
> unseen administrators, as well as the current up/down state of standbys.
> This is action-at-a-distance in its worst form.

Maybe, but I can't help thinking people are going to want some form of this. The case where someone wants to do sync rep to the machine in the next rack over and async rep to a server at a remote site seems too important to ignore.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] recovery getting interrupted is not so unusual as it used to be
On Tue, 2010-05-25 at 19:12 +0900, Fujii Masao wrote:
> On Mon, May 17, 2010 at 5:33 PM, Fujii Masao wrote:
> > On Sat, May 15, 2010 at 3:20 AM, Robert Haas wrote:
> >> Hmm, OK, I think that makes sense. Would you care to propose a patch?
> >
> > Yep. Here is the patch.
> >
> > This patch distinguishes normal shutdown from unexpected exit, while the
> > server is in recovery. That is, when smart or fast shutdown is requested
> > during recovery, the bgwriter sets the ControlFile->state to the newly-introduced
> > DB_SHUTDOWNED_IN_RECOVERY state.
>
> This patch is worth applying for 9.0? If not, I'll add it into
> the next CF for 9.1.

Presumably Robert will be applying the patch? It seems to address the concern raised on the thread.

--
Simon Riggs
www.2ndQuadrant.com
Re: [HACKERS] Synchronization levels in SR
On Mon, 2010-05-24 at 18:29 -0700, Josh Berkus wrote:
> If people agree that the above is our roadmap, implementing
> "per-standby" first makes sense, and then we can implement "per-session"
> GUC later.

IMHO "per-standby" sounds simple, but is dangerously simplistic, as explained on another part of the thread. We need to think clearly about failure modes and how they will be handled. Failure modes and edge cases completely govern the design here. "All running smoothly" isn't a major concern and so it appears that the user interface can be done various ways.

--
Simon Riggs
www.2ndQuadrant.com
Re: [HACKERS] Synchronization levels in SR
On Mon, 2010-05-24 at 22:20 +0900, Fujii Masao wrote:
> Second, we need to discuss how to specify the synch
> level. There are three approaches:
>
> * Per standby
> Since the purpose, location and H/W resource often differ
> from one standby to another, specifying the level per standby
> (i.e., we set the level in recovery.conf) is a
> straightforward approach, I think. For example, we can
> choose #3 for a high-availability standby near the master,
> and choose #1 (async) for the remote disaster recovery
> standby.
>
> * Per transaction
> Define the PGC_USERSET option specifying the level and
> specify it on the master in response to the purpose of the
> transaction. In this approach, for example, we can choose
> #4 for a transaction which should be visible on the
> standby as soon as a "success" of the commit has been
> returned to a client. We can also choose #1 for
> time-critical but not mission-critical transactions.
>
> * Mix
> Allow users to specify the level per standby and per
> transaction at the same time, and then calculate the real
> level from them by using some algorithm.
>
> Which should we adopt for 9.1? I'd like to implement the
> "per-standby" approach at first since it's simple and seems
> to cover more use cases. Thought?

-1

Synchronous replication implies that a commit should wait. This wait is experienced by the transaction, not by other parts of the system. If we define robustness at the standby level then robustness depends upon unseen administrators, as well as the current up/down state of standbys. This is action-at-a-distance in its worst form.

Imagine having 2 standbys, 1 synch, 1 async. If the synch server goes down, performance will improve and robustness will have been lost. What good would that be?

Imagine a standby connected over a long distance. The DBA brings up the standby in synch mode accidentally and the primary server hits massive performance problems without any way of directly controlling this.

The worst aspect of standby-level controls is that nobody ever knows how safe a transaction is. There is no definition or test for us to check exactly how safe any particular transaction is. Also, the lack of safety occurs at the time when you least want it - when one of your servers is already down. So I call "per-standby" settings simple, and broken in multiple ways.

Putting the control in the hands of the transaction owner (i.e. on the master) is exactly where the control should be. I personally like the idea of that being a USERSET, though I could live with system-wide settings if need be. But the control must be on the *master*, not on the standbys.

The best parameter we can specify is the number of servers that we wish to wait for confirmation from. That is a definition that easily manages the complexity of having various servers up/down at any one time. It also survives misconfiguration more easily, as well as providing a workaround if replicating across a bursty network where we can't guarantee response times, even if the typical response time is good.

(We've discussed this many times before over a period of years, and I'm not really sure why we have to re-discuss this repeatedly just because people disagree. You don't mention the earlier discussions, not sure why. If we want to follow the community process, then all previous discussions need to be taken into account, unless things have changed - which they haven't: same topic, same people, AFAICS.)

--
Simon Riggs
www.2ndQuadrant.com
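Simon's "wait for confirmation from N servers" idea can be illustrated with a small sketch. All names here are illustrative stand-ins, not walsender code: a commit returns once any `quorum` of the attached standbys have acknowledged, whichever they are, so the guarantee survives individual standbys being down or slow.

```python
# Sketch of quorum-style synchronous commit: the master waits until
# `quorum` standbys have confirmed the commit record, whichever they are.

def wait_for_quorum(ack_times, quorum, timeout):
    """Return True if at least `quorum` standbys ack within `timeout`.

    ack_times: per-standby acknowledgement time in ms; None = never acks.
    """
    acked = sorted(t for t in ack_times if t is not None)
    # The commit releases when the quorum-th fastest ack arrives.
    return len(acked) >= quorum and acked[quorum - 1] <= timeout

# Three standbys: the next-rack one acks at 2 ms, the remote DR one at
# 90 ms, and one is down entirely (None).
acks = [2, 90, None]
print(wait_for_quorum(acks, quorum=1, timeout=10))  # True: next-rack ack suffices
print(wait_for_quorum(acks, quorum=2, timeout=10))  # False: would need the remote ack
```

The point of the sketch: with a count-based setting, losing the nearby standby degrades commit latency but the guarantee ("N acks") stays well defined, whereas a per-standby flag silently changes what a commit means.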
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 12:40 +0900, Fujii Masao wrote:
> On Tue, May 25, 2010 at 10:29 AM, Josh Berkus wrote:
> > I agree that #4 should be done last, but it will be needed, not in the
> > least by your employer ;-) . I don't see any obvious way to make #4
> > compatible with any significant query load on the slave, but in general
> > I'd think that users of #4 are far more concerned with 0% data loss than
> > they are with getting the slave to run read queries.
>
> Since #2 and #3 are enough for 0% data loss, I think that such users
> would be more concerned about what results are visible in the standby.
> No?

Please add #4 also. You can do that easily at the same time as #2 and #3, and it will leave me free to fix the perceived conflict problems.

--
Simon Riggs
www.2ndQuadrant.com
Re: [HACKERS] Exposing the Xact commit order to the user
On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:
> In some systems (data warehousing, replication), the order of commits is
> important, since that is the order in which changes have become visible.
> This information could theoretically be extracted from the WAL, but
> scanning the entire WAL just to extract this tidbit of information would
> be excruciatingly painful.

I think it would be quite simple to read WAL. WALSender reads the WAL file after it's been flushed, so it would be simple for it to read a blob of WAL and then extract the commit order from it.

Overall though, it would be easier and more efficient to *add* info to WAL and then do all this processing *after* WAL has been transported elsewhere. Extracting info with DDL triggers, normal triggers, commit order and everything else seems like too much work to me. Every other RDBMS has moved away from trigger-based replication and we should give that serious consideration also.

--
Simon Riggs
www.2ndQuadrant.com
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
Erik Rijkers wrote:
> libxml2.x86_64 2.6.26-2.1.2.8 installed
> libxml2-devel.x86_64 2.6.26-2.1.2.8 installed

Thanks for testing my patch Erik. It turns out I've got libxml2 installed at version 2.7.5. Searching the gnome mailing lists, it turns out xmlXPathCompiledEvalToBoolean was added (unbelievably) in the very next version from yours, 2.6.27 (see: http://mail.gnome.org/archives/xml/2006-October/msg00119.html).

Regards,

--
Mike Fowler
Registered Linux user: 379787
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard (was: Add xpath_exists Function)
On Tue, May 25, 2010 16:31, Mike Fowler wrote:
> I've been reading the SQL/XML standard and discovered that it defines a
> function named XMLEXISTS that does exactly what the todo item
> xpath_exists defines. My original patch named the function as per the
> todo but I think using the function name from the standard is a better
> idea. So this patch is the same as before, but the function is now named
> XMLEXISTS instead of xpath_exists.

I tried this patch (cvs HEAD, applies without error), but get this error:

[...]
utils/adt/xml.o: In function `xmlexists':
/var/data1/pg_stuff/pg_sandbox/pgsql.xmlexists/src/backend/utils/adt/xml.c:3639: undefined reference to `xmlXPathCompiledEvalToBoolean'
collect2: ld returned 1 exit status
make[2]: *** [postgres] Error 1
make[2]: Leaving directory `/var/data1/pg_stuff/pg_sandbox/pgsql.xmlexists/src/backend'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/var/data1/pg_stuff/pg_sandbox/pgsql.xmlexists/src'
make: *** [all] Error 2

./configure --prefix=/var/data1/pg_stuff/pg_installations/pgsql.xmlexists --with-pgport=6548 --quiet --enable-depend --enable-cassert --enable-debug --with-openssl --with-perl --with-libxml --with-libxslt

centos 5.4 2.6.18-164.el5 x86_64 GNU/Linux
libxml2.x86_64 2.6.26-2.1.2.8 installed
libxml2-devel.x86_64 2.6.26-2.1.2.8 installed

Erik Rijkers
Re: [HACKERS] Clearing psql`s input buffer after auto-reconnect
> 3. Have CheckConnection do longjmp(sigint_interrupt_jmp) after resetting
...
> Now #1 might be the best long-term solution but I have no particular
> appetite to tackle it, and #2 is just too ugly to contemplate. That
> leaves #3, which is a bit ugly in its own right but seems like the best
> fix we're likely to get.
>
> Comments, better ideas?

I like #3. If this were a more common event I might lean towards #1, but it's not, so #3 seems fine.

--
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201005251113
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Re: [HACKERS] pg_upgrade docs
Robert Haas wrote: > On Mon, May 24, 2010 at 11:35 PM, Bruce Momjian wrote: > > Have you read the docs? ?It does mention the issue with /contrib and > > stuff. ?How do I document a limitation I don't know about? ?This is all > > very vague. ?Please suggest some wording. > > OK, here's an attempt. Please fact-check. > > -- > > General Limitations > > pg_upgrade relies on binary compatibility between the old and new > on-disk formats, including the on-disk formats of individual data > types. pg_upgrade attempts to detect cases in which the on-disk > format has changed; for example, it verifies that the old and new > clusters have the same value for --enable-integer-datetimes. However, > there is no systematic way for pg_upgrade to detect problems of this > type; it has hard-coded knowledge of the specific cases known to exist > in core PostgreSQL, including /contrib. If third-party or > user-defined data types or access methods are used, it is the user's > responsibility to verify that the versions loaded into the old and new > clusters use compatible on-disk formats. If they do not, pg_upgrade > may appear to work but subsequently crash or silently corrupt data. OK, I have added a mention of the issues above, in a more abbreviated format. > pg_upgrade also relies on ABI compatibility between modules loaded > into the old and new clusters. For example, if an SQL function in the > old cluster is defined to call a particular C function, pg_upgrade > will recreate SQL function in the new cluster and will configure it to > call the same C function. If no such C function can be found by the > new cluster, pg_upgrade will simply fail. However, if a C function of > the same name exists in the new cluster, but expects a different > number of arguments or different types of arguments, then it is likely > to crash the system when called. In the worst case, data corruption > could result. These issues are not unique to pg_upgrade, and could happen even in a pg_dump restore. 
> Also, the following sentence appears not to fit with our "only to 9.0"
> policy: "For Windows users, note that due to different integer
> datetimes settings used by the one-click installer and the MSI
> installer, it is only possible to upgrade from version 8.3 of the
> one-click distribution to version 8.4 of the one-click distribution.
> It is not possible to upgrade from the MSI installer to the one-click
> installer."

Agreed. I added an "8.4 or later" mention. It is not worth calling it "9.0 or later" because then I would have to update this mention for every major release.

Applied patch attached.

--
Bruce Momjian http://momjian.us
EnterpriseDB http://enterprisedb.com

Index: doc/src/sgml/pgupgrade.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/pgupgrade.sgml,v
retrieving revision 1.10
diff -c -c -r1.10 pgupgrade.sgml
*** doc/src/sgml/pgupgrade.sgml 24 May 2010 17:43:39 - 1.10
--- doc/src/sgml/pgupgrade.sgml 25 May 2010 14:50:36 -
***
*** 16,21 ****
--- 16,31 ----
  9.0.1 -> 9.0.4.
+
+ pg_upgrade works because, though new features are
+ regularly added to Postgres major releases, the internal data storage
+ format rarely changes. pg_upgrade does its best to
+ make sure the old and new clusters are binary-compatible, e.g. by
+ checking for compatible compile-time settings. It is important that
+ any external modules are also binary compatible, though this cannot
+ be checked by pg_upgrade.
+
  Supported Versions
***
*** 440,446 ****
  Limitations in migrating from PostgreSQL 8.3
-
  Upgrading from PostgreSQL 8.3 has additional restrictions not present
  when upgrading from later PostgreSQL releases. For example,
--- 450,455 ----
***
*** 502,509 ****
  For Windows users, note that due to different integer datetimes settings
  used by the one-click installer and the MSI installer, it is only
  possible to upgrade from version 8.3 of the one-click distribution to
! version 8.4 of the one-click distribution. It is not possible to upgrade
! from the MSI installer to the one-click installer.
--- 511,518 ----
  For Windows users, note that due to different integer datetimes settings
  used by the one-click installer and the MSI installer, it is only
  possible to upgrade from version 8.3 of the one-click distribution to
! version 8.4 or later of the one-click distribution. It is not
! possible to upgrade from the MSI installer to the one-click installer.
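The kind of compile-time-setting comparison being discussed here can be sketched abstractly. This is only a toy model of the idea (real pg_upgrade inspects the clusters' control data; the field names below are illustrative, not pg_controldata's actual labels):

```python
# Toy compatibility check in the spirit of pg_upgrade: compare the
# on-disk-format-relevant settings of the old and new clusters and
# report any mismatches that would make a binary upgrade unsafe.

FORMAT_CRITICAL = ["block_size", "integer_datetimes", "maximum_alignment"]

def find_incompatibilities(old, new):
    """Return the format-critical settings on which the clusters differ."""
    return [k for k in FORMAT_CRITICAL if old.get(k) != new.get(k)]

old_cluster = {"block_size": 8192, "integer_datetimes": False, "maximum_alignment": 8}
new_cluster = {"block_size": 8192, "integer_datetimes": True,  "maximum_alignment": 8}

print(find_incompatibilities(old_cluster, new_cluster))  # ['integer_datetimes']
```

As the thread notes, this only covers settings the tool knows to look at; third-party data types with their own on-disk formats fall outside any such hard-coded list.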
Re: [HACKERS] JSON manipulation functions
> Well, I think it's fine to use the wiki for brainstorming, but before > you change the design you probably need to talk about it here. You > can't rely on everyone on -hackers to follow changes on a wiki page > somewhere. It looks like the API has been overhauled pretty heavily > since the last version we talked about here, and I'm not sure I > understand it. I'll try to explain it in one big nutshell: Instead of, for instance, json_to_number('5') and number_to_json(5), I propose changing it to from_json(5)::INT and to_json('5'). Note how from_json simply returns TEXT containing the underlying value for the user to cast. I plan to make calling to_json/from_json with arrays or objects (e.g. to_json(ARRAY[1,2,3]) and from_json('[1,2,3]') ) throw an error for now, as implementing all the specifics of this could be quite distracting. If I'm not mistaken, json_object([content [AS name] [, ...]] | *) RETURNS json can't be implemented without augmenting the grammar (as was done with xmlforest), so I considered making it take a RECORD parameter like the hstore(RECORD) function does, as was suggested on IRC. However, this may be inadequate for selecting some columns but not others. Using examples from hstore: SELECT hstore(foo) FROM foo; => '"e"=>"2.71828", "pi"=>"3.14159"' -- this works, but what if we only want one field? SELECT hstore(pi) FROM foo; -- function type error SELECT hstore(row(pi)) FROM foo; => '"f1"=>"3.14159"' -- field name is lost SELECT hstore(bar) FROM (select pi FROM foo) AS bar; => '"f1"=>"3.14159"' -- ugly, and field name is *still* lost To get (and set, which I overlooked before), use json_get and json_set. These take "JSONPath" expressions, but I don't plan to implement all sorts of fancy features during the summer. 
However, I do plan to support some kind of parameter substitution so you can do this:

json_get('[0,1,4,9,16,25]', '[%]' %% 2) => '4'::TEXT

For this use case, though, it would be simpler to say:

'[0,1,4,9,16,25]'::JSON -> 2
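As a rough illustration of what json_get with parameter substitution might behave like, here is a sketch in Python. The JSONPath dialect and the %% operator are still unsettled in the proposal, so this only mirrors the `'[%]' %% 2` example above and supports nothing beyond bracketed indexes:

```python
import json

# Toy model of json_get('[0,1,4,9,16,25]', '[%]' %% 2): substitute each
# parameter into the path template, then walk the parsed document.
def json_get(doc: str, path_template: str, *params) -> str:
    path = path_template
    for p in params:
        path = path.replace("%", json.dumps(p), 1)  # '[%]' -> '[2]'
    value = json.loads(doc)
    # Support only the bracketed-index form used in the example.
    for index in path.strip("[]").split("]["):
        value = value[int(index)]
    return json.dumps(value)  # result comes back as text, as with from_json

print(json_get("[0,1,4,9,16,25]", "[%]", 2))  # prints 4
```

The text return value matches the proposal's convention that extraction functions hand back TEXT for the caller to cast.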
Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT
Alex Goncharov wrote:

,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) *
| No equivalent of FETCH_COUNT is available at the libpq level, so I
| assume that the interface I am using is smart enough not to send
| gigabytes of data to FE.
|
| Where does the result set (GBs of data) reside after I call
| PQexecPrepared? On BE, I hope?

Sorry for asking again... No sarcasm meant: is there no straightforward answer here? Or nobody is certain? Or a wrong list?

You have been given the answer. Please re-read the replies, e.g. the one from Abhijit Menon-Sen. The data is saved on the client side before the call returns. If that uses too much memory, use a cursor.

cheers

andrew
[HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard (was: Add xpath_exists Function)
I've been reading the SQL/XML standard and discovered that it defines a function named XMLEXISTS that does exactly what the todo item xpath_exists defines. My original patch named the function as per the todo but I think using the function name from the standard is a better idea. So this patch is the same as before, but the function is now named XMLEXISTS instead of xpath_exists. Regards, -- Mike Fowler Registered Linux user: 379787 Index: src/backend/utils/adt/xml.c === RCS file: /home/mfowler/cvsrepo/pgrepo/pgsql/src/backend/utils/adt/xml.c,v retrieving revision 1.97 diff -c -r1.97 xml.c *** src/backend/utils/adt/xml.c 3 Mar 2010 17:29:45 - 1.97 --- src/backend/utils/adt/xml.c 25 May 2010 14:02:33 - *** *** 3495,3497 --- 3495,3668 return 0; #endif } + + /* + * Determines if the node specified by the supplied XPath exists + * in a given XML document, returning a boolean. + * + * It is up to the user to ensure that the XML passed is in fact + * an XML document - XPath doesn't work easily on fragments without + * a context node being known. + */ + Datum + xmlexists(PG_FUNCTION_ARGS) + { + #ifdef USE_LIBXML + text *xpath_expr_text = PG_GETARG_TEXT_P(0); + xmltype*data = PG_GETARG_XML_P(1); + ArrayType *namespaces = PG_GETARG_ARRAYTYPE_P(2); + xmlParserCtxtPtr ctxt = NULL; + xmlDocPtr doc = NULL; + xmlXPathContextPtr xpathctx = NULL; + xmlXPathCompExprPtr xpathcomp = NULL; + char *datastr; + int32 len; + int32 xpath_len; + xmlChar*string; + xmlChar*xpath_expr; + int i; + int ndim; + Datum *ns_names_uris; + bool *ns_names_uris_nulls; + int ns_count; + int result; + + /* + * Namespace mappings are passed as text[]. If an empty array is passed + * (ndim = 0, "0-dimensional"), then there are no namespace mappings. + * Else, a 2-dimensional array with length of the second axis being equal + * to 2 should be passed, i.e., every subarray contains 2 elements, the + * first element defining the name, the second one the URI. 
Example: + * ARRAY[ARRAY['myns', 'http://example.com'], ARRAY['myns2', + * 'http://example2.com']]. + */ + ndim = ARR_NDIM(namespaces); + if (ndim != 0) + { + int *dims; + + dims = ARR_DIMS(namespaces); + + if (ndim != 2 || dims[1] != 2) + ereport(ERROR, + (errcode(ERRCODE_DATA_EXCEPTION), + errmsg("invalid array for XML namespace mapping"), + errdetail("The array must be two-dimensional with length of the second axis equal to 2."))); + + Assert(ARR_ELEMTYPE(namespaces) == TEXTOID); + + deconstruct_array(namespaces, TEXTOID, -1, false, 'i', + &ns_names_uris, &ns_names_uris_nulls, + &ns_count); + + Assert((ns_count % 2) == 0); /* checked above */ + ns_count /= 2; /* count pairs only */ + } + else + { + ns_names_uris = NULL; + ns_names_uris_nulls = NULL; + ns_count = 0; + } + + datastr = VARDATA(data); + len = VARSIZE(data) - VARHDRSZ; + xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ; + if (xpath_len == 0) + ereport(ERROR, + (errcode(ERRCODE_DATA_EXCEPTION), + errmsg("empty XPath expression"))); + + string = (xmlChar *) palloc((len + 1) * sizeof(xmlChar)); + memcpy(string, datastr, len); + string[len] = '\0'; + + xpath_expr = (xmlChar *) palloc((xpath_len + 1) * sizeof(xmlChar)); + memcpy(xpath_expr, VARDATA(xpath_expr_text), xpath_len); + xpath_expr[xpath_len] = '\0'; + + pg_xml_init(); + xmlInitParser(); + + PG_TRY(); + { + /* + * redundant XML parsing (two parsings for the same value during one + * command execution are possible) + */ + ctxt = xmlNewParserCtxt(); + if (ctxt == NULL) + xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY, + "could not allocate parser context"); + doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0); + if (doc == NULL) + xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT, + "could not parse XML document"); + xpathctx = xmlXPathNewContext(doc); + if (xpathctx == NULL) + xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY, + "could not allocate XPath context"); + xpathctx->node = xmlDocGetRootElement(doc); + if (xpathctx->node == NULL) + 
xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR, + "could not find root XML element"); + + /* register namespaces, if any */ + if (ns_count > 0) + { + for (i = 0; i < ns_count; i++) + { + char *ns_name; + char *ns_uri; + + if (ns_names_uris_nulls[i * 2] || + ns_names_uris_nulls[i * 2 + 1]) + ereport(ERROR, + (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("neither namespace name nor URI may be null"))); + ns_name = TextDatumGetCString(ns_names_uris[i * 2]); + ns_uri = TextDatumGetCString(ns_names_uris[i * 2 + 1]); + if (xmlXPathRegisterNs(xpathctx, + (xmlChar *) ns_name, + (xmlChar *) ns_uri) != 0) + e
Re: [HACKERS] ExecutorCheckPerms() hook
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
> * DoCopy() and RI_Initial_Check() were reworked to call ExecCheckRTEPerms()
> with locally built RangeTblEntry.

Maybe I missed it somewhere, but we still need to address the case where the user doesn't have those SELECT permissions that we're looking for in RI_Initial_Check(), right? KaiGai, your patch should be addressing that in a similar fashion.

Thanks,

Stephen
Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT
On 05/25/2010 07:35 AM, Alex Goncharov wrote:

,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) *
| No equivalent of FETCH_COUNT is available at the libpq level, so I
| assume that the interface I am using is smart enough not to send
| gigabytes of data to FE.
|
| Where does the result set (GBs of data) reside after I call
| PQexecPrepared? On BE, I hope?

Sorry for asking again... No sarcasm meant: is there no straightforward answer here? Or nobody is certain? Or a wrong list?

Issue multiple queries and make use of LIMIT/OFFSET. You'll have to go manual on this one.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
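The LIMIT/OFFSET suggestion amounts to paging through the result set with repeated queries. Here is a sketch of the client-side loop, simulated against an in-memory "table" so the shape is visible without a server (any real code would issue the SELECT through libpq or, better, use an SQL-level cursor):

```python
# Sketch of client-side paging with LIMIT/OFFSET, simulated in memory.
# Caveat: OFFSET-based paging makes the server skip over the earlier
# rows on every query, so for very large results DECLARE ... / FETCH n
# (an SQL-level cursor) is usually the better tool.

def run_query(table, limit, offset):
    """Stand-in for executing SELECT ... LIMIT <limit> OFFSET <offset>."""
    return table[offset:offset + limit]

def fetch_all_in_batches(table, batch_size):
    offset = 0
    while True:
        rows = run_query(table, batch_size, offset)
        if not rows:
            break
        yield rows          # process one manageable chunk at a time
        offset += batch_size

table = list(range(10))     # pretend this is a 10-row result set
batches = list(fetch_all_in_batches(table, batch_size=4))
print(batches)              # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Note that without REPEATABLE READ or a cursor, rows can move between pages if the table changes mid-loop, which is another reason the cursor answer given elsewhere in the thread is the safer default.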
Re: [HACKERS] pg_upgrade docs
On Mon, May 24, 2010 at 11:35 PM, Bruce Momjian wrote: > Have you read the docs? It does mention the issue with /contrib and > stuff. How do I document a limitation I don't know about? This is all > very vague. Please suggest some wording. OK, here's an attempt. Please fact-check. -- General Limitations pg_upgrade relies on binary compatibility between the old and new on-disk formats, including the on-disk formats of individual data types. pg_upgrade attempts to detect cases in which the on-disk format has changed; for example, it verifies that the old and new clusters have the same value for --enable-integer-datetimes. However, there is no systematic way for pg_upgrade to detect problems of this type; it has hard-coded knowledge of the specific cases known to exist in core PostgreSQL, including /contrib. If third-party or user-defined data types or access methods are used, it is the user's responsibility to verify that the versions loaded into the old and new clusters use compatible on-disk formats. If they do not, pg_upgrade may appear to work but subsequently crash or silently corrupt data. pg_upgrade also relies on ABI compatibility between modules loaded into the old and new clusters. For example, if an SQL function in the old cluster is defined to call a particular C function, pg_upgrade will recreate SQL function in the new cluster and will configure it to call the same C function. If no such C function can be found by the new cluster, pg_upgrade will simply fail. However, if a C function of the same name exists in the new cluster, but expects a different number of arguments or different types of arguments, then it is likely to crash the system when called. In the worst case, data corruption could result. 
-- Also, the following sentence appears not to fit with our "only to 9.0" policy: "For Windows users, note that due to different integer datetimes settings used by the one-click installer and the MSI installer, it is only possible to upgrade from version 8.3 of the one-click distribution to version 8.4 of the one-click distribution. It is not possible to upgrade from the MSI installer to the one-click installer." -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT
,--- Abhijit Menon-Sen (Tue, 25 May 2010 17:26:18 +0530) *
| Unless you explicitly declare and fetch from an SQL-level cursor, your
| many GBs of data are going to be transmitted to libpq, which will eat
| lots of memory. (The wire protocol does have something like cursors,
| but libpq does not use them, it retrieves the entire result set.)

,--- Yeb Havinga (Tue, 25 May 2010 14:08:51 +0200) *
| The GBs of data are gathered at the site of the libpq client (pgresult
| object gathered/allocated while consuming result input from backend).
`--*

Thank you very much!

-- Alex -- alex-goncha...@comcast.net --
Re: [HACKERS] ExecutorCheckPerms() hook
KaiGai,

* KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
> OK, the attached patch reworks it according to the way.

Reviewing this patch, there are a whole slew of problems.

#1: REALLY BIG ISSUE- Insufficient comment updates. You've changed function
definitions in a pretty serious way as well as moved some code around such
that some of the previous comments don't make sense. You have got to update
comments when you're writing a patch. Indeed, the places I see changes in
comments are where you've removed what appear to still be valid and
appropriate comments, or places where you've added comments which are just
blatantly wrong with the submitted patch.

#2: REALLY BIG ISSUE- You've added ExecutorCheckPerms_hook as part of this
patch- don't, we're in feature-freeze right now and should not be adding
hooks at this time.

#3: You didn't move ExecCheckRTPerms() and ExecCheckRTEPerms() to utils/acl
and instead added executor/executor.h to ri_triggers.c. I don't
particularly like that. I admit that DoCopy() already knew about the
executor, and if that were the only case outside of the executor where
ExecCheckRTPerms() was getting called it'd probably be alright, but we
already have another place that wants to use it, so let's move it to a more
appropriate place.

#4: As mentioned previously, the hook (which should be added in a separate
patch anyway) makes more sense to me to be in ExecCheckRTPerms(), not
ExecCheckRTEPerms(). This also means that we need to be calling
ExecCheckRTPerms() from DoCopy and RI_Initial_Check(), to make sure that
the hook gets called. To that end, I wouldn't even expose
ExecCheckRTEPerms() outside of acl.c. Also, there should be a big comment
about not using or calling ExecCheckRTEPerms() directly outside of
ExecCheckRTPerms(), since the hook would then be skipped.
#5: In DoCopy, you can remove relPerms and remainingPerms, but I'd probably
leave required_access up near the top and then just use it to set
rte->required_access directly rather than moving that bit deep down into
the function.

#6: I haven't checked yet, but if there are other things in an RTE which
would make sense in the DoCopy case, beyond just what's needed for the
permissions checking, and which wouldn't be 'correct' with a NULL'd value,
I would set those. Yes, we're building the RTE to check permissions, but we
don't want someone downstream to be surprised when they make a change to
something in the permissions checking and discover that a value in the RTE
they expected to be there wasn't valid. Even more so, if there are helper
functions which can be used to build an RTE, we should be using them. The
same goes for RI_Initial_Check().

#7: I'd move the conditional if (is_from) into the foreach which is
building the columnsSet and eliminate the need for columnsSet; I don't see
that it's really adding much here.

#8: When moving ExecCheckRTPerms(), you should rename it to be more like
the other function calls in acl.h. Perhaps pg_rangetbl_aclcheck()? Also, it
should return an actual AclResult instead of just true/false.

Thanks,

Stephen
Re: [HACKERS] JSON manipulation functions
On Tue, May 25, 2010 at 12:57, Robert Haas wrote:
> On Tue, May 25, 2010 at 5:37 AM, Joseph Adams wrote:
>> I started a wiki article for brainstorming the JSON API:
>> http://wiki.postgresql.org/wiki/JSON_API_Brainstorm . I also made
>> substantial changes to the draft of the API based on discussion here
>> and on the #postgresql IRC channel.
>>
>> Is it alright to use the wiki for brainstorming, or should it stay on
>> the mailing list or go somewhere else?
>
> Well, I think it's fine to use the wiki for brainstorming, but before
> you change the design you probably need to talk about it here. You
> can't rely on everyone on -hackers to follow changes on a wiki page
> somewhere. It looks like the API has been overhauled pretty heavily
> since the last version we talked about here, and I'm not sure I
> understand it.

The general idea that most people have been using, and that I think is
correct, is to have the discussion here on the list, and then keep a
summary of the current state of it on the wiki page so it's easier for
someone entering the discussion to catch up on where it is.

>> I'll try not to spend too much time quibbling over the specifics as I
>> tend to do. While the brainstorming is going on, I plan to start
>> implementing the datatype by itself so I can establish an initial
>> working codebase.
>
> Sounds good.

Agreed.

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT
At 2010-05-25 07:35:34 -0400, alex-goncha...@comcast.net wrote:
>
> | Where does the result set (GBs of data) reside after I call
> | PQexecPrepared? On BE, I hope?

Unless you explicitly declare and fetch from an SQL-level cursor, your many
GBs of data are going to be transmitted to libpq, which will eat lots of
memory. (The wire protocol does have something like cursors, but libpq does
not use them, it retrieves the entire result set.)

-- ams
Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT
Alex Goncharov wrote:

,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) *
| No equivalent of FETCH_COUNT is available at the libpq level, so I
| assume that the interface I am using is smart enough not to send
| gigabytes of data to FE.
|
| Where does the result set (GBs of data) reside after I call
| PQexecPrepared? On BE, I hope?

Sorry for asking again... No sarcasm meant: is there no straightforward
answer here? Or nobody is certain? Or a wrong list?

The straightforward answer is that the libpq frontend C library does not
support something like the JDBC client's setFetchSize. The GBs of data are
gathered at the site of the libpq client (pgresult object
gathered/allocated while consuming result input from backend).

regards,
Yeb Havinga
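Since libpq itself offers no FETCH_COUNT knob, the workaround both replies point at is an explicit SQL-level cursor, fetched in batches. A minimal sketch of that pattern (the empty connection string, the table name "big", and the batch size of 1000 are assumptions for illustration; a live server is required, so this is not a standalone test program):

```c
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("");   /* connection parameters from environment */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    PQclear(PQexec(conn, "BEGIN"));   /* cursors only live inside a transaction */
    PQclear(PQexec(conn, "DECLARE c NO SCROLL CURSOR FOR SELECT * FROM big"));

    for (;;)
    {
        PGresult *res = PQexec(conn, "FETCH 1000 FROM c");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "fetch failed: %s", PQerrorMessage(conn));
            PQclear(res);
            break;
        }
        int ntuples = PQntuples(res);
        /* ... process at most 1000 rows here; only this batch is ever
         * held in client memory ... */
        PQclear(res);
        if (ntuples == 0)             /* cursor exhausted */
            break;
    }

    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}
```

This is essentially what psql does internally when FETCH_COUNT is set: the backend materializes the portal, and the client holds only one batch of rows at a time.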
Re: [HACKERS] Exposing the Xact commit order to the user
2010/5/25 Dan Ports:
> On Mon, May 24, 2010 at 10:24:07AM -0500, Kevin Grittner wrote:
>
>> Replicating or recreating the whole predicate locking and conflict
>> detection on slaves is not feasible for performance reasons. (I
>> won't elaborate unless someone feels that's not intuitively
>> obvious.) The only sane way I can see to have a slave database allow
>> serializable behavior is to WAL-log the acquisition of a snapshot by
>> a serializable transaction, and the rollback or commit, on the
>> master, and to have the serializable snapshot build on a slave
>> exclude any serializable transactions for which there are still
>> concurrent serializable transactions. Yes, that does mean WAL-logging
>> the snapshot acquisition even if the transaction doesn't yet have an
>> xid, and WAL-logging the commit or rollback even if it never
>> acquires an xid.
>
> One important observation is that any anomaly that occurs on the slave
> can be resolved by aborting a local read-only transaction. This is a
> good thing, because the alternatives are too horrible to consider.
>
> You could possibly cut the costs of predicate locking by having the
> master ship with each transaction the list of predicate locks it
> acquired. But you'd still have to track locks for read-only
> transactions, so maybe that's not a significant cost improvement. On
> the other hand, if you're willing to pay the price of serializability
> on the master, why not the slaves too?

I don't understand the problem. According to me, in the context of SSI, a
read-only slave can just map SERIALIZABLE to the technical implementation
of REPEATABLE READ (i.e., the currently-existing "SERIALIZABLE"). The union
of the transactions on the master and the slave(s) will still exhibit
SERIALIZABLE behavior because the transactions on the slave cannot write
anything and are therefore irrelevant. Is anything wrong with that
reasoning?
Nicolas