Re: [HACKERS] Bug in new buffering GiST build code
On 05.06.2012 09:45, Alexander Korotkov wrote:
> On Tue, Jun 5, 2012 at 2:00 AM, Alexander Korotkov wrote:
>> On Mon, May 28, 2012 at 1:46 AM, Alexander Korotkov wrote:
>>> On Sat, May 26, 2012 at 12:33 AM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
>>>> Alexander, do you still have the test environments and data lying around that you used for GiST buffering testing last summer? Could you rerun some of those tests with this patch?
>>> I think I can restore the test environment and data. Will rerun the tests soon.
>> I reran some of the tests. Here are the index build times, in seconds, for the old and new ways of refinding the parent:
>>
>>                  old   new
>> usnoa2          2385  2452
>> usnoa2_shuffled 8131  8055
>> uniform         8327  8359
>>
>> I think the difference can be described by "round" error.
> Oh, I meant not "round" error but "random" error, i.e. not exactly the same state of shared buffers at the start of each index build, and so on.

Thanks. I was expecting a small performance gain from the new approach, but I guess not. Oh well, that would've just been a bonus; the important thing is that it now works.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in new buffering GiST build code
On Tue, Jun 5, 2012 at 2:00 AM, Alexander Korotkov wrote:
> On Mon, May 28, 2012 at 1:46 AM, Alexander Korotkov wrote:
>> On Sat, May 26, 2012 at 12:33 AM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
>>> Alexander, do you still have the test environments and data lying around that you used for GiST buffering testing last summer? Could you rerun some of those tests with this patch?
>> I think I can restore the test environment and data. Will rerun the tests soon.
> I reran some of the tests. Here are the index build times, in seconds, for the old and new ways of refinding the parent:
>
>                  old   new
> usnoa2          2385  2452
> usnoa2_shuffled 8131  8055
> uniform         8327  8359
>
> I think the difference can be described by "round" error.

Oh, I meant not "round" error but "random" error, i.e. not exactly the same state of shared buffers at the start of each index build, and so on.

--
With best regards, Alexander Korotkov.
Re: [HACKERS] Unnecessary WAL archiving after failover
On Fri, Mar 23, 2012 at 11:03:27PM +0900, Fujii Masao wrote:
> On Wed, Feb 29, 2012 at 5:48 AM, Fujii Masao wrote:
>> In streaming replication, after failover, the new master might have lots of un-applied WAL files with the old timeline ID. They are the WAL files which were recycled as future ones while the server was running as a standby. Since they will never be used later, they don't need to be archived after failover. But since they have neither a .ready nor a .done file in archive_status, checkpoints after failover newly create .ready files for them, and then finally they are archived. Which might cause a disk I/O spike in both the WAL and archive storage.

If the old master archived later WAL that the new master never restored, won't this attempt to archive a file under a name that already exists in the archive? The documentation says this:

    The archive command should generally be designed to refuse to overwrite any pre-existing archive file. This is an important safety feature to preserve the integrity of your archive in case of administrator error (such as sending the output of two different servers to the same archive directory). It is advisable to test your proposed archive command to ensure that it indeed does not overwrite an existing file, and that it returns nonzero status in this case.

Archiving on the new master would halt until the operator intervenes.

>> To avoid the above problem, I think that un-applied WAL files with the old timeline ID should be marked as already-archived and recycled immediately at the end of recovery. Thoughts?

A small hazard comes to mind. If the administrator manually copied post-timeline-divergence segments from the failed master to the new master's pg_xlog, the current implementation loads them into the archive for you. The new master could never apply those files locally, but they might be useful for alternate recoveries down the previous timeline.
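The overwrite-refusing behavior the documentation calls for can be sketched as a tiny shell wrapper: "test ! -f" exits nonzero if the target already exists, so the archiver retries later instead of clobbering a segment that was already archived (e.g. by the old master). All paths and segment names below are made-up placeholders, not anyone's actual setup.

```shell
# Minimal archive_command-style wrapper that refuses to overwrite.
archive() {
    seg=$1
    archivedir=$2
    target="$archivedir/$(basename "$seg")"
    # Fails (nonzero status) if the target already exists.
    test ! -f "$target" && cp "$seg" "$target"
}

# Demo: archiving the same segment name twice; the second attempt fails.
dir=$(mktemp -d)
mkdir -p "$dir/archive"
echo "wal data" > "$dir/000000010000000000000001"
archive "$dir/000000010000000000000001" "$dir/archive" &&
    echo "archived"            # prints "archived"
archive "$dir/000000010000000000000001" "$dir/archive" ||
    echo "refused overwrite"   # prints "refused overwrite"
```

A real archive_command would also need to fsync the copied file and handle partial copies; this sketch only shows the no-overwrite check.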
Nonetheless, we can just as reasonably specify that it's not the role of the new master to provide this service. Call the fact that it did so in previous releases an implementation artifact. What about instead creating an archive status file at recycle time and deleting it as we begin to populate the file? That would distinguish copied-in, unarchived segments from recycled ones.

Incidentally, RemoveOldXlogFiles() has this comment:

    /*
     * We ignore the timeline part of the XLOG segment identifiers in
     * deciding whether a segment is still needed.  This ensures that we
     * won't prematurely remove a segment from a parent timeline. We could
     * probably be a little more proactive about removing segments of
     * non-parent timelines, but that would be a whole lot more
     * complicated.
     */

Should both instances of "parent" be "child" or "descendant"?

> Just after failover, there can be three kinds of WAL files in the new master's pg_xlog directory:
>
> (1) WAL files which were recycled by a restartpoint
>
> I've already explained upthread the issue which these WAL files cause after failover.
>
> (2) WAL files which were restored from the archive
>
> In 9.1 or before, the restored WAL files don't remain after failover because they are always restored onto the temporary filename "RECOVERYXLOG". So the issue which I explain from now doesn't exist in 9.1 or before.
>
> In 9.2dev, as a result of supporting cascading replication, an archived WAL file is restored onto its correct file name so that a cascading walsender can send it to another standby. This restored

The documentation still says this:

    WAL segments that cannot be found in the archive will be sought in pg_xlog/; this allows use of recent un-archived segments. However, segments that are available from the archive will be used in preference to files in pg_xlog/. The system will not overwrite the existing contents of pg_xlog/ when retrieving archived files.

I gather the last sentence is now false?
> WAL file has neither a .ready nor a .done archive status file. After failover, a checkpoint checks the archive status file of the restored WAL file to attempt to recycle it, finds that it has neither .ready nor .done, and creates .ready. Because of the existence of .ready, it will be archived again even though it obviously already exists in the archival storage :(
>
> To prevent a restored WAL file from being archived again, I think that .done should be created whenever a WAL file is successfully restored (of course this should happen only when archive_mode is enabled). Thoughts?

Your proposed fix makes sense, and I cannot think of any disadvantage. Concerning only doing it when archive_mode=on, would there ever be a case where a segment is restored under archive_mode=off, then the server restarted with archive_mode=on and an archival attempted on that segment?
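The .ready/.done protocol under discussion can be illustrated with a small shell sketch. This is illustrative only, not PostgreSQL's archiver code, and the directory layout and segment names are made up: the archiver processes every segment carrying a .ready marker and renames the marker to .done after a successful archive_command, so pre-creating .done for a restored segment (the proposed fix) keeps it from ever being re-archived.

```shell
# Toy model of the archive_status directory protocol.
status_dir=$(mktemp -d)

mark_ready() { : > "$status_dir/$1.ready"; }   # segment awaits archiving
mark_done()  { : > "$status_dir/$1.done"; }    # segment already archived

run_archiver() {
    for f in "$status_dir"/*.ready; do
        [ -e "$f" ] || continue                # no .ready files at all
        seg=$(basename "$f" .ready)
        echo "archiving $seg"                  # stand-in for archive_command
        mv "$f" "$status_dir/$seg.done"        # mark as archived
    done
}

mark_ready 000000010000000000000002   # normal, locally generated WAL
mark_done  000000010000000000000003   # restored from archive: never re-archive
run_archiver                          # archives only segment ...0002
```

Running `run_archiver` a second time does nothing, since every segment now has a .done marker and no .ready remains.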
[HACKERS] 9.3: load path to mitigate load penalty for checksums
Introduction
============

A lot of progress has been made on the checksums issue, with Simon's excellent work beginning to gain consensus:

http://archives.postgresql.org/message-id/CA+u5nmkw_gbs6qq_y8-rjgl1v7mvw2hwbhartb8lojhnpfx...@mail.gmail.com

For the purposes of this proposal, I'll assume that's the general direction we'll be taking for CRCs. The major drawback of that proposal is that it increases the amount of work to be done after a large data load by requiring more WAL.

Proposal
========

I propose a special user-initiated loading mode at the table granularity. During this time, readers must ignore PD_ALL_VISIBLE, HEAP_XMIN_COMMITTED, and the visibility map entirely. However, writers may set all of those bits before the writing transaction commits, obviating the need to rewrite (and WAL) the data again later. Ideally, there would be no work for VACUUM to do after the data load (unless a transaction aborted). This would also help the non-CRC case of course, but I expect CRCs to make this significantly more important.

Goals
=====

* Table granularity (doesn't affect other tables at all)
* Allows concurrent loaders
* Allows loading into existing tables with existing data
* Online (allow reads to proceed, even if degraded)

Rough user interaction
======================

INITIATE LOAD ON foo AS 'job name';
-- run concurrent loading sessions
FINALIZE LOAD 'job name';

High-level design
=================

By "hints" I mean the VM bit, PD_ALL_VISIBLE, and HEAP_XMIN_COMMITTED. By "ignorehints" I mean a flag in pg_class indicating that readers should ignore hints. By "optimistichints" I mean a flag in pg_class indicating that writers can optimistically set hints. Obviously, readers and writers would need a mechanism to honor those flags, but I haven't dug into the details yet (additional routines in tqual.c?).
States:

0: normal
   * ignorehints = false
   * optimistichints = false

1: trying to enter data load mode, waiting on existing lockers (who might be scanning) to finish
   * ignorehints = true
   * optimistichints = false

2: data load mode
   * ignorehints = true
   * optimistichints = true

3: trying to leave data load mode, waiting on old snapshots to be released and aborted transactions to be cleaned up
   * ignorehints = true
   * optimistichints = false

INITIATE LOAD would first transition from state 0 to 1 by acquiring a ShareUpdateExclusiveLock on the table (to be sure no concurrent INITIATE or FINALIZE LOAD is going on) and setting ignorehints = true. Then it moves from state 1 to state 2 by waiting for all transactions that hold a lock on that table. Any transactions that don't already have a lock will see the new flag when they try to get it. Now we're sure that all readers will see the "ignorehints" flag, so we can set the "optimistichints" flag to indicate that writers can write hints optimistically.

FINALIZE LOAD would first move from state 2 to state 3 by acquiring a ShareUpdateExclusiveLock on the table and setting optimistichints = false. Then, it would move from state 3 to state 0 by first waiting for all transactions that currently hold a lock on the table, to ensure they see the optimistichints=false flag. Then, it would remember the current xid as max_loader_xid, and wait until the global xmin is greater than max_loader_xid. This should ensure that all snapshots regard all loading transactions as complete. Also, it would need to run a lazy VACUUM to remove any tuples from aborted transactions.

Details and optimizations
=========================

* We probably want a graceful way to handle multiple data loads happening on the same table. Rather than erroring out, we could treat it more like a reference count, and only do the work to move into data load mode if we're not there already, and only move out of data load mode if we're the last loading job on the table.
* In general, there are some usability issues to sort out, to make sure a table isn't put into data load mode and left that way. Right now, I'm mostly concerned with getting a working design, but those will be important, too.

* We could optimize away the VACUUM going from 3 -> 0 if we are sure no writing transactions aborted.

* INITIATE and FINALIZE probably need to use PreventTransactionChain() and multiple transactions, to avoid holding the ShareUpdateExclusiveLock for too long. Also, we want to keep people from using them in the same transaction as the loading xact, because they might not realize that they would get a concurrency of 1 that way (because of the ShareUpdateExclusiveLock).

Thoughts?

Regards,
Jeff Davis
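The state transitions described under "States" can be sketched as a toy shell script. This is purely illustrative: the two pg_class flags are modeled as shell variables, the lock-waiting and VACUUM steps are elided, and nothing here is real PostgreSQL syntax.

```shell
#!/bin/sh
# Toy model of the proposed load-mode state machine.  The flag pair
# determines the state; states 1 and 3 share the same flag values and
# differ only in which transition is in progress.
ignorehints=false
optimistichints=false

state() {
    case "$ignorehints,$optimistichints" in
        false,false) echo 0 ;;     # normal
        true,true)   echo 2 ;;     # data load mode
        true,false)  echo "1/3" ;; # entering or leaving load mode
    esac
}

initiate_load() {
    ignorehints=true      # 0 -> 1: readers start ignoring hints
    # ... wait for existing lockers on the table to finish ...
    optimistichints=true  # 1 -> 2: writers may now set hints eagerly
}

finalize_load() {
    optimistichints=false # 2 -> 3: stop optimistic hint-setting
    # ... wait for old snapshots; lazy VACUUM away aborted loaders ...
    ignorehints=false     # 3 -> 0: back to normal
}

state          # prints 0
initiate_load
state          # prints 2
finalize_load
state          # prints 0
```

The reference-count idea from the first bullet would wrap initiate_load/finalize_load so that only the first INITIATE and the last FINALIZE actually change the flags.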
Re: [HACKERS] [RFC] Interface of Row Level Security
Florian Pflug writes:
> On Jun4, 2012, at 18:38 , Kohei KaiGai wrote:
>> 2012/6/4 Florian Pflug:
>>> Without something like RLSBYPASS, the DBA needs to have intimate knowledge about the different RLS policies to e.g. guarantee that his backups aren't missing crucial information, or that the replication system indeed replicates all rows.
>> It seems to me you can define a function which implements a site-specific security requirement (e.g. "backup should not be prevented by RLS policy"), then include it as a part of the RLS policy (or have it implicitly added by extensions, as sepgsql tries to do).
> Sure. But that requires each and every application which uses RLS to provide support for special backup (or replication, or whatever) privileges. And it requires the DBA to know how to assign these privileges to certain roles for each and every application in question. For shops which use a lot of different applications, all with their own RLS policies, that can quickly get out of hand.
> Plus, a bug in one of these RLS policies has the potential to render backups incomplete.

I agree with Florian here: if there is no way to disable RLS for backups, the database will be un-administratable. RLSBYPASS is not necessarily the only or best way to provide such an override, but we have to have something that is simple, foolproof, and *not* dependent on the details of any one RLS policy.

I suspect that KaiGai-san's objection basically comes down to not wanting to have what amounts to a backdoor in RLS policies. However, what Florian is saying is that you have to have a backdoor anyway, unless you'd like to be at risk of losing data because it wasn't backed up. You can either have one well-understood, well-documented, well-tested backdoor, or you can have an ad-hoc backdoor in each RLS policy. Nobody can think that the latter approach is preferable.
regards, tom lane
Re: [HACKERS] [RFC] Interface of Row Level Security
On Jun4, 2012, at 18:38 , Kohei KaiGai wrote:
> 2012/6/4 Florian Pflug:
>> Without something like RLSBYPASS, the DBA needs to have intimate knowledge about the different RLS policies to e.g. guarantee that his backups aren't missing crucial information, or that the replication system indeed replicates all rows.
>>
>> With RLSBYPASS, all he needs to do is grant one privilege to his replication or backup user. The rest can be left to the development or support team for a specific application.
> It seems to me you can define a function which implements a site-specific security requirement (e.g. "backup should not be prevented by RLS policy"), then include it as a part of the RLS policy (or have it implicitly added by extensions, as sepgsql tries to do).

Sure. But that requires each and every application which uses RLS to provide support for special backup (or replication, or whatever) privileges. And it requires the DBA to know how to assign these privileges to certain roles for each and every application in question. For shops which use a lot of different applications, all with their own RLS policies, that can quickly get out of hand.

Plus, a bug in one of these RLS policies has the potential to render backups incomplete.

best regards,
Florian Pflug
Re: [HACKERS] No, pg_size_pretty(numeric) was not such a hot idea
Jim Nasby writes:
> On 5/27/12 2:54 PM, Euler Taveira wrote:
>> On 27-05-2012 10:45, Fujii Masao wrote:
>>> OK, let me propose another approach: add pg_size_pretty(int).
>> I wouldn't like to add another function but if it solves both problems... +1.
> FWIW, I would argue that the case of pg_size_pretty(8*1024*1024) is pretty contrived...

Yeah, possibly. In any case, I don't think we're making either of these changes in 9.2, because the time for forcing initdbs is past. It would only be realistic to think about changing pg_size_pretty() if we come across some other, much more compelling reason to force a system catalog contents change. Assuming that's how 9.2 ships, we might as well wait to see if there are any real complaints from the field before we decide whether any change is needed.

regards, tom lane
Re: [HACKERS] Bug in new buffering GiST build code
On Mon, May 28, 2012 at 1:46 AM, Alexander Korotkov wrote:
> On Sat, May 26, 2012 at 12:33 AM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
>> Alexander, do you still have the test environments and data lying around that you used for GiST buffering testing last summer? Could you rerun some of those tests with this patch?
> I think I can restore the test environment and data. Will rerun the tests soon.

I reran some of the tests. Here are the index build times, in seconds, for the old and new ways of refinding the parent:

                 old   new
usnoa2          2385  2452
usnoa2_shuffled 8131  8055
uniform         8327  8359

I think the difference can be described by round error. The indexes seem to be exactly the same. That's predictable, because changing the algorithm for refinding the parent shouldn't change the result.

--
With best regards, Alexander Korotkov.
Re: [HACKERS] No, pg_size_pretty(numeric) was not such a hot idea
On 5/27/12 2:54 PM, Euler Taveira wrote:
> On 27-05-2012 10:45, Fujii Masao wrote:
>> OK, let me propose another approach: add pg_size_pretty(int). If we do this, all usability and performance problems will be solved.
> I wouldn't like to add another function but if it solves both problems... +1.

FWIW, I would argue that the case of pg_size_pretty(8*1024*1024) is pretty contrived... when would you actually do something like that? ISTM that any time you're using pg_size_pretty you'd be coming off a real datatype.

--
Jim C. Nasby, Database Architect
j...@nasby.net   512.569.9461 (cell)   http://jim.nasby.net
Re: [HACKERS] VIP: new format for psql - shell - simple using psql in shell
On 5/26/12 10:16 PM, Pavel Stehule wrote:
> My idea is safe with respect to the separator, because the separator is just space and newline, and these symbols are escaped.

ISTM it'd be a really good idea to support something other than space, since presumably that'd be trivial.

I'm not a fan of supporting the array construction. If you get to that level of complexity in bash it's a really good sign that you need to use a real language. If we do add array support then it needs to be keyed to the actual shell in use, because it's inconsistent between them.

--
Jim C. Nasby, Database Architect
j...@nasby.net   512.569.9461 (cell)   http://jim.nasby.net
Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.
On Sat, Jun 2, 2012 at 12:01 PM, Michael Nolan wrote:
> On 6/2/12, Tom Lane wrote:
>> Robert Haas writes:
>>> On the other hand, if we simply say "PostgreSQL computes the replication delay by subtracting the time at which the WAL was generated, as recorded on the master, from the time at which it is replayed by the slave" then, hey, we still have a wart, but it's pretty clear what the wart is and how to fix it, and we can easily document that. Again, if we could get rid of the failure modes and make this really water-tight, I think I'd be in favor of that, but it seems to me that we are in the process of expending a lot of energy and an even larger amount of calendar time to create a system that will misbehave in numerous subtle ways instead of one straightforward one. I don't see that as a good trade.
>> Well, okay, but let's document "if you use this feature, it's incumbent on you to make sure the master and slave clocks are synced. We recommend running NTP." or words to that effect.
> What if the two servers are in different time zones?

NTP shouldn't have any problem; it uses UTC underneath. As does PostgreSQL, underneath.

--
When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/6/4 Tom Lane:
> Kohei KaiGai writes:
>> Here are two problems around RLSBYPASS. The first is that we have no way to handle invalidation of a prepared statement when the current user is switched, right now.
> How is that specifically the fault of RLSBYPASS? *Any* of the schemes you're proposing for inlined RLS checks will have problems with userID switching.

Really? I can't find a scenario that causes a problem with user-ID switching in the case where the RLS policy is *unconditionally* appended and then evaluated at executor stage. I'd like to see the scenario.

> My guess is we'd have to treat the effective userID as part of the plancache lookup key to make it safe to inline anything related to RLS.

It might be a solution, if we append the individual RLS policy at the planner stage, depending on user-id.

Thanks,
--
KaiGai Kohei
Re: [HACKERS] Btree or not btree? That is the question
Greg Sabino Mullane writes:
> We have an 8.3.18 system (yes, the same one from the previous thread, finally upgraded!) that gave us this error yesterday:
> ERROR: index "pg_class_oid_index" is not a btree

That means you got bogus data while reading the metapage. I'm beginning to wonder about the hardware on this server ...

regards, tom lane
Re: [HACKERS] [RFC] Interface of Row Level Security
Kohei KaiGai writes:
> Here are two problems around RLSBYPASS. The first is that we have no way to handle invalidation of a prepared statement when the current user is switched, right now.

How is that specifically the fault of RLSBYPASS? *Any* of the schemes you're proposing for inlined RLS checks will have problems with userID switching.

My guess is we'd have to treat the effective userID as part of the plancache lookup key to make it safe to inline anything related to RLS.

regards, tom lane
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Mon, Jun 4, 2012 at 10:42 AM, Ants Aasma wrote:
> On Mon, Jun 4, 2012 at 6:38 PM, Merlin Moncure wrote:
>> On Mon, Jun 4, 2012 at 10:17 AM, Merlin Moncure wrote:
>>> What happens (in the very unlikely, but possible case?) if another backend races to the buffer you've pointed at with 'victim'? It looks like multiple backends share the clock sweep now, but don't you need an extra test to ensure it's still a candidate victim buffer?
>> Actually, I don't think you do: the existing check on refcount is probably good enough. Hm, why did you get rid of BufferStrategyControl.lastFreeBuffer?
> It was dead code as far as I could tell. That change isn't actually relevant for this patch because free-list management is still protected by a lock (except the initial unlocked test, which is double-checked under lock) and so doesn't need any adjustment.

I have to admit -- this is pretty cool. The changes look pretty clean, but I think it's going to need some benchmarking to prove it's actually faster under all workloads, and some more review. If there's heavy contention on getting victim buffers you're going to be making a lot of atomic operations -- each one of which will lock down the cache line. A yielding lwlock is obviously going to have very different behavior under contention, so there needs to be a performance test that really burns up the freelist.

If it's Well And Truly faster, you have to immediately start wondering if the freelist lock can't be completely scrapped: buffer invalidation only occurs in fairly rare events -- but when it does, ISTM to be fairly counterproductive. In trying to avoid the sweep you end up pushing everyone towards a locky free list. I don't know about you, but I'd prefer to have more consistent behavior (as in, not "drop big table, concurrency drops 50%").

Noted, this doesn't really address the OP's problem :-). I bet it'd supplement the nailing strategy nicely, however.
merlin
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/6/4 Florian Pflug:
> On Jun4, 2012, at 17:38 , Kohei KaiGai wrote:
>> I'm worried about future maintenance issues, once we have RLSBYPASS permission or something user visible…
> I fear that without a generic way to disable RLS regardless of which RLS policy function is in effect, we're creating a huge maintenance issue for DBAs. In a lot of shops, the DBA is responsible for a large number of databases, each potentially using a completely different approach to RLS and hence a completely different policy function.

There are two problems around RLSBYPASS. The first is that we have no way to handle invalidation of a prepared statement when the current user is switched, right now. The second is that we can describe the same RLS policy in another way, without a PostgreSQL-original enhancement to the permission mechanism...

> Without something like RLSBYPASS, the DBA needs to have intimate knowledge about the different RLS policies to e.g. guarantee that his backups aren't missing crucial information, or that the replication system indeed replicates all rows.
>
> With RLSBYPASS, all he needs to do is grant one privilege to his replication or backup user. The rest can be left to the development or support team for a specific application.

It seems to me you can define a function which implements a site-specific security requirement (e.g. "backup should not be prevented by RLS policy"), then include it as a part of the RLS policy (or have it implicitly added by extensions, as sepgsql tries to do).

These are the reasons why I hesitate to go ahead with the RLSBYPASS permission.

Thanks,
--
KaiGai Kohei
[HACKERS] Btree or not btree? That is the question
We have an 8.3.18 system (yes, the same one from the previous thread, finally upgraded!) that gave us this error yesterday:

ERROR: index "pg_class_oid_index" is not a btree

It seems like this happened from just a normal SELECT query, and it did not occur again (and this is a very busy system). The pg_class table does get vacuumed and reindexed often. All appeared fine when I did some manual investigation. Since this only happened this one time, can I chalk it up to some random rare race condition having to do with a reindex? Or is there anything else I should be looking for, or can proactively do? No other system catalog problems have been seen before it or since.

--
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8
Re: [HACKERS] Updated version of pg_receivexlog
On Mon, Jun 4, 2012 at 6:53 PM, Magnus Hagander wrote:
> No, it's more a "there's no reason to do that". I don't think it should necessarily be an actual problem.

Ok, good to know.

> In your case the missing piece of information is why was there a timeline switch? pg_basebackup shouldn't cause a timeline switch whether you use it in -x mode or not.

No mystery there. The timeline switch was because I had just promoted the master from standby mode. There's a chance I might have accidentally done something horribly wrong somewhere, because I can't immediately reproduce this. I'll let you know if I find out how I managed to create this error.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
Re: [HACKERS] [RFC] Interface of Row Level Security
On Jun4, 2012, at 17:38 , Kohei KaiGai wrote:
> I'm worried about future maintenance issues, once we have RLSBYPASS permission or something user visible…

I fear that without a generic way to disable RLS regardless of which RLS policy function is in effect, we're creating a huge maintenance issue for DBAs. In a lot of shops, the DBA is responsible for a large number of databases, each potentially using a completely different approach to RLS and hence a completely different policy function.

Without something like RLSBYPASS, the DBA needs to have intimate knowledge about the different RLS policies to e.g. guarantee that his backups aren't missing crucial information, or that the replication system indeed replicates all rows.

With RLSBYPASS, all he needs to do is grant one privilege to his replication or backup user. The rest can be left to the development or support team for a specific application.

best regards,
Florian Pflug
Re: [HACKERS] Updated version of pg_receivexlog
On Mon, Jun 4, 2012 at 5:48 PM, Ants Aasma wrote:
> On Mon, Jun 4, 2012 at 6:20 PM, Fujii Masao wrote:
>> On Mon, Jun 4, 2012 at 11:25 PM, Ants Aasma wrote:
>>> On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander wrote:
>>>>> it doesn't say that it is not possible to use this for a standby server... probably that's why I get the error when I put a recovery.conf after pg_basebackup finished... maybe we can say that more loudly?
>>>> The idea is, if you use it with -x (or --xlog), it's for taking a backup/clone, *not* for replication. If you use it without -x, then you can use it as the start of a replica, by adding a recovery.conf. But you can't do both at once, that will confuse it.
>>> I stumbled upon this again today. There's nothing in the docs that would even hint that using -x shouldn't work to create a replica. Why does it get confused, and can we (easily) make it not get confused? At the very least it needs a big fat warning in the documentation for the -x option that the resulting backup might not be usable as a standby.
>> Unless I'm missing something, you can use pg_basebackup -x for the standby. If lots of WAL files are generated in the master after pg_basebackup -x ends and before you start the standby instance, you may get the following error. In this case, you need to consult the archived WAL files even though you specified the -x option in pg_basebackup.
>>
>>> FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 0001005C has already been removed
>>
>> Though we have the above problem, pg_basebackup -x is usable for the standby, I think.
> I assumed from Magnus's comment that this is a known problem. I wonder what went wrong if it should have worked. In the case where this turned up, the missing file was an xlog file with the new timeline ID but one segment before the timeline switch. I'll have to see if I can create a reproducible case for this.

No, it's more a "there's no reason to do that".
I don't think it should necessarily be an actual problem. In your case the missing piece of information is why there was a timeline switch: pg_basebackup shouldn't cause a timeline switch whether you use it in -x mode or not. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Updated version of pg_receivexlog
On Mon, Jun 4, 2012 at 6:20 PM, Fujii Masao wrote: > On Mon, Jun 4, 2012 at 11:25 PM, Ants Aasma wrote: >> On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander >> wrote: it doesn't say that is not possible to use this for a standby server... probably that's why i get the error i put a recovery.conf after pg_basebackup finished... maybe we can say that more loudly? >>> >>> The idea is, if you use it with -x (or --xlog), it's for taking a >>> backup/clone, *not* for replication. >>> >>> If you use it without -x, then you can use it as the start of a >>> replica, by adding a recovery.conf. >>> >>> But you can't do both at once, that will confuse it. >> >> I stumbled upon this again today. There's nothing in the docs that >> would even hint that using -x shouldn't work to create a replica. Why >> does it get confused and can we (easily) make it not get confused? At >> the very least it needs a big fat warning in documentation for the -x >> option that the resulting backup might not be usable as a standby. > > Unless I'm missing something, you can use pg_basebackup -x for the > standby. If lots of WAL files are generated in the master after > pg_basebackup -x ends and before you start the standby instance, > you may get the following error. In this case, you need to consult with > archived WAL files even though you specified -x option in pg_basebackup. > >> FATAL: could not receive data from WAL stream: FATAL: requested WAL >> segment 0001005C has already been removed > > Though we have the above problem, pg_basebackup -x is usable for > the standby, I think. I assumed from Magnus's comment that this is a known problem. I wonder what went wrong if it should have worked. In the case where this turned up the missing file was an xlog file with the new timeline ID but one segment before the timeline switch. I'll have to see if I can create a reproducible case for this. 
Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Mon, Jun 4, 2012 at 6:38 PM, Merlin Moncure wrote: > On Mon, Jun 4, 2012 at 10:17 AM, Merlin Moncure wrote: >> What happens (in the very unlikely, but possible, case) if another >> backend races to the buffer you've pointed at with 'victim'? It looks >> like multiple backends share the clock sweep now, but don't you need >> an extra test to ensure it's still a candidate victim buffer? > > Actually, I don't think you do: the existing check on refcount is > probably good enough. Hm, why did you get rid of > BufferStrategyControl.lastFreeBuffer? It was dead code as far as I could tell. That change isn't actually relevant for this patch because free-list management is still protected by a lock (except for the initial unlocked test, which is double-checked under the lock) and so doesn't need any adjustment. Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/6/4 Robert Haas : > On Sat, Jun 2, 2012 at 12:58 AM, Kohei KaiGai wrote: >> 2012/6/1 Robert Haas : >>> On Thu, May 31, 2012 at 3:52 PM, Kohei KaiGai wrote: It may be an option to separate the case into two; a situation to execute the given query immediately just after optimization and never reused, and others. >>> >>> Yes. Simon suggested exactly this a while back, and I agree with him >>> that we ought to have it. >>> >> It is good for me, also. >> Then, if so, we will be able to push the stuff corresponding to RLSBYPASS into the query optimization, and works transparently for users. >>> >>> You're still going to need a way to make sure that the cluster can be >>> dumped properly. RLSBYPASS accomplishes that; your scheme doesn't. >>> >> If something like "has_superuser_privilege() OR" is automatically >> appended to user given clauses, it makes sure whole of the database >> cluster is dumped. >> That also means any permission checks are delayed to executor stage >> (except for the case on simple query protocol, I mentioned above), >> thus it simplifies the condition to invalidate prepared statement. >> >> One problem I noticed last night around RLSBYPASS approach is: >> it can take much number of invalidation whenever current user-id is >> switched. Not only SET AUTHORIZATION, but SECURITY DEFINER >> function's invocation also. I'm not sure whether this invalidation >> storm is reasonable level, or not. >> >> Is it unavailable to handle this type of implicit superuser checks >> with existing EXECUTE statement? >> I tried to run EXPLAIN with similar case. 
>> >> postgres=# PREPARE p1(int, bool) AS SELECT * FROM tbl WHERE y > $1 OR $2; >> PREPARE >> postgres=# EXPLAIN EXECUTE p1(10, current_user in ('kaigai','rhaas')); >> QUERY PLAN >> >> Seq Scan on tbl (cost=0.00..21.60 rows=1160 width=40) >> (1 row) >> >> However, >> >> postgres=# PREPARE p2(int) AS SELECT * FROM tbl >> WHERE y > $1 OR current_user in ('kaigai','rhaas'); >> PREPARE >> postgres=# EXPLAIN EXECUTE p2(10); >> QUERY PLAN >> - >> Seq Scan on tbl (cost=0.00..30.30 rows=394 width=40) >> Filter: ((y > 10) OR ("current_user"() = ANY ('{kaigai,rhaas}'::name[]))) >> (2 rows) >> >> Please assume the second condition is something like a superuser >> check in addition to the main security policy. >> It implies an idea to replace a certain portion of a clause that >> consists of only constant values and stable / immutable functions >> by a shadow parameter to be calculated at executor stage, which makes >> it possible to wipe out the RLS policy for superusers. In addition, it requires >> no new invalidation mechanism for prepared statements. > I'm not sure how best to handle the invalidation issue... but I still > think that relying on the query planner to use theorem-proving to > simplify RLS conditions is going to be problematic. You can certainly > try it and see how it comes out... I think the planner enhancement can be handled independently from the RLS policy stuff; we should avoid packing too much complexity into one feature. At least, the idea of appending an implicit condition ("OR has_superuser_privilege()") will work correctly without any new invalidation mechanism and without the planner enhancement, although its performance penalty is not negligible. What is your opinion? I'm worried about future maintenance issues once we have an RLSBYPASS permission or something else user-visible... Thanks, -- KaiGai Kohei
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Mon, Jun 4, 2012 at 10:17 AM, Merlin Moncure wrote: > What happens (in the very unlikely, but possible, case) if another > backend races to the buffer you've pointed at with 'victim'? It looks > like multiple backends share the clock sweep now, but don't you need > an extra test to ensure it's still a candidate victim buffer? Actually, I don't think you do: the existing check on refcount is probably good enough. Hm, why did you get rid of BufferStrategyControl.lastFreeBuffer? merlin
Re: [HACKERS] Updated version of pg_receivexlog
On Mon, Jun 4, 2012 at 11:25 PM, Ants Aasma wrote: > On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander wrote: >>> it doesn't say that is not possible to use this for a standby >>> server... probably that's why i get the error i put a recovery.conf >>> after pg_basebackup finished... maybe we can say that more loudly? >> >> The idea is, if you use it with -x (or --xlog), it's for taking a >> backup/clone, *not* for replication. >> >> If you use it without -x, then you can use it as the start of a >> replica, by adding a recovery.conf. >> >> But you can't do both at once, that will confuse it. > > I stumbled upon this again today. There's nothing in the docs that > would even hint that using -x shouldn't work to create a replica. Why > does it get confused, and can we (easily) make it not get confused? At > the very least it needs a big fat warning in the documentation for the -x > option that the resulting backup might not be usable as a standby. Unless I'm missing something, you can use pg_basebackup -x for the standby. If lots of WAL files are generated on the master after pg_basebackup -x ends and before you start the standby instance, you may get the following error. In this case, you need to fall back to archived WAL files even though you specified the -x option to pg_basebackup. > FATAL: could not receive data from WAL stream: FATAL: requested WAL > segment 0001005C has already been removed Though we have the above problem, pg_basebackup -x is usable for the standby, I think. Regards, -- Fujii Masao
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Mon, Jun 4, 2012 at 9:20 AM, Ants Aasma wrote: > On Mon, Jun 4, 2012 at 5:12 PM, Robert Haas wrote: >> Not sure about the rest of this patch, but this part is definitely bogus: >>
>> +#if !defined(pg_atomic_fetch_and_set)
>> +#define pg_atomic_fetch_and_set(dst, src, value) \
>> +	do { S_LOCK(&dummy_spinlock); \
>> +	dst = src; \
>> +	src = value; \
>> +	S_UNLOCK(&dummy_spinlock); } while (0)
>> +#endif
>>
>> Locking a dummy backend-local spinlock doesn't provide atomicity >> across multiple processes. > > Right, of course. I don't know why I assumed that dummy_spinlock would > be global. In any case, this is very WIP and doesn't even aspire to be > portable yet. The main point was to see if there's any significant > performance to be gained by this method. yeah -- those are fallback routines in case the compiler primitives don't exist. I think I understand what Ants is doing here: he's reducing the coverage of the free-list lock to only cover actually popping a buffer off the free list; it no longer covers the clock sweep. That's a massive win if it works. In order to get away with that, he had to decompose all manipulations of the clock sweep and the strategy into thread-safe atomic operations. What happens (in the very unlikely, but possible, case) if another backend races to the buffer you've pointed at with 'victim'? It looks like multiple backends share the clock sweep now, but don't you need an extra test to ensure it's still a candidate victim buffer? merlin
Re: [HACKERS] [RFC] Interface of Row Level Security
On Sat, Jun 2, 2012 at 12:58 AM, Kohei KaiGai wrote: > 2012/6/1 Robert Haas : >> On Thu, May 31, 2012 at 3:52 PM, Kohei KaiGai wrote: >>> It may be an option to separate the case into two; a situation to execute >>> the given query immediately just after optimization and never reused, >>> and others. >> >> Yes. Simon suggested exactly this a while back, and I agree with him >> that we ought to have it. >> > It is good for me, also. > >>> Then, if so, we will be able to push the stuff corresponding to >>> RLSBYPASS into the query optimization, and works transparently >>> for users. >> >> You're still going to need a way to make sure that the cluster can be >> dumped properly. RLSBYPASS accomplishes that; your scheme doesn't. >> > If something like "has_superuser_privilege() OR" is automatically > appended to user given clauses, it makes sure whole of the database > cluster is dumped. > That also means any permission checks are delayed to executor stage > (except for the case on simple query protocol, I mentioned above), > thus it simplifies the condition to invalidate prepared statement. > > One problem I noticed last night around RLSBYPASS approach is: > it can take much number of invalidation whenever current user-id is > switched. Not only SET AUTHORIZATION, but SECURITY DEFINER > function's invocation also. I'm not sure whether this invalidation > storm is reasonable level, or not. > > Is it unavailable to handle this type of implicit superuser checks > with existing EXECUTE statement? > I tried to run EXPLAIN with similar case. 
> > postgres=# PREPARE p1(int, bool) AS SELECT * FROM tbl WHERE y > $1 OR $2; > PREPARE > postgres=# EXPLAIN EXECUTE p1(10, current_user in ('kaigai','rhaas')); > QUERY PLAN > > Seq Scan on tbl (cost=0.00..21.60 rows=1160 width=40) > (1 row) > > However, > > postgres=# PREPARE p2(int) AS SELECT * FROM tbl > WHERE y > $1 OR current_user in ('kaigai','rhaas'); > PREPARE > postgres=# EXPLAIN EXECUTE p2(10); > QUERY PLAN > - > Seq Scan on tbl (cost=0.00..30.30 rows=394 width=40) > Filter: ((y > 10) OR ("current_user"() = ANY ('{kaigai,rhaas}'::name[]))) > (2 rows) > > Please assume the second condition is something like a superuser > check in addition to the main security policy. > It implies an idea to replace a certain portion of a clause that > consists of only constant values and stable / immutable functions > by a shadow parameter to be calculated at executor stage, which makes > it possible to wipe out the RLS policy for superusers. In addition, it requires > no new invalidation mechanism for prepared statements. I'm not sure how best to handle the invalidation issue... but I still think that relying on the query planner to use theorem-proving to simplify RLS conditions is going to be problematic. You can certainly try it and see how it comes out... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Updated version of pg_receivexlog
On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander wrote: >> it doesn't say that is not possible to use this for a standby >> server... probably that's why i get the error i put a recovery.conf >> after pg_basebackup finished... maybe we can say that more loudly? > > The idea is, if you use it with -x (or --xlog), it's for taking a > backup/clone, *not* for replication. > > If you use it without -x, then you can use it as the start of a > replica, by adding a recovery.conf. > > But you can't do both at once, that will confuse it. I stumbled upon this again today. There's nothing in the docs that would even hint that using -x shouldn't work to create a replica. Why does it get confused, and can we (easily) make it not get confused? At the very least it needs a big fat warning in the documentation for the -x option that the resulting backup might not be usable as a standby. Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Mon, Jun 4, 2012 at 5:12 PM, Robert Haas wrote: > Not sure about the rest of this patch, but this part is definitely bogus: >
> +#if !defined(pg_atomic_fetch_and_set)
> +#define pg_atomic_fetch_and_set(dst, src, value) \
> +	do { S_LOCK(&dummy_spinlock); \
> +	dst = src; \
> +	src = value; \
> +	S_UNLOCK(&dummy_spinlock); } while (0)
> +#endif
>
> Locking a dummy backend-local spinlock doesn't provide atomicity > across multiple processes. Right, of course. I don't know why I assumed that dummy_spinlock would be global. In any case, this is very WIP and doesn't even aspire to be portable yet. The main point was to see if there's any significant performance to be gained by this method. Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] Re: [GENERAL] pg_upgrade from 9.0.7 to 9.1.3: duplicate key pg_authid_oid_index
On Sat, Jun 02, 2012 at 05:10:03PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > On Fri, Jun 01, 2012 at 09:52:59AM -0400, Tom Lane wrote: > >> It seems that pg_upgrade needs a check to make sure that the bootstrap > >> superuser is named the same in old and new clusters. > > > The attached patch adds checks to verify the the old/new servers have > > the same install-user oid. > > That may or may not be a useful check to make, but it's got > approximately nothing to do with what I was complaining about. > > In particular, supposing that the user has given you a username that > isn't the bootstrap superuser in the new cluster, this patch is not > going to stop the update script from failing. Because the script is > then going to try to replace the bootstrap superuser, and that is > certainly going to give an error. > > I see the point of worrying about the install user as well as the > bootstrap superuser, but wouldn't it be best to insist they be the same? > Particularly in the new cluster, where if they aren't the same it means > the user has manually created at least one role in the new cluster, > which is likely to lead to OID conflicts or worse. > > Furthermore, if the bootstrap superusers aren't named the same, your > patch fails to handle the original complaint. In the case the > OP mentioned, the old cluster had > OID 10: "ubuntu" > some user-defined OID: "postgres" > and the new cluster had > OID 10: "postgres" > If the user tells pg_upgrade to use username postgres, your check will > not fail AFAICS, but nonetheless things are going to be messed up after > the upgrade, because some objects and privileges that used to belong to > the bootstrap superuser will now belong to a non-default superuser, > whereas what used to belong to the non-default superuser will now belong > to the bootstrap superuser. That cannot be thought desirable. 
For one > reason, in the old installation the postgres role could have been > dropped (possibly after dropping a few non-builtin objects) whereas the > "ubuntu" role was pinned. In the new installation, "postgres" is pinned > and "ubuntu" won't be. > > I think the checks that are actually needed here are (1) bootstrap > superusers are named the same, and (2) there are no roles other than the > bootstrap superuser in the new cluster. You are right that it is more complex than I stated, but given the limited feedback I got on the pg_upgrade/plpython issue, I figured people didn't want to hear the details. Here they are. There are three failure modes for pg_upgrade:

1. check failure
2. schema restore failure
3. silent failure/corruption

Of course, the latter items are worse than the earlier ones. The reporter got a "schema restore failure" while still following the pg_upgrade instructions. My initial patch changed that #2 error to a #1 error. Tom is right that creating users in the new cluster (against the instructions) can still generate a #2 error if new/old pg_authid.oid values match and they are not the install user, but since that is against the instructions, I was going to leave it as a #2. However, since Tom feels we should check that and make it a #1 failure, I have added that test to the attached patch. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index d226f00..9f3dcda
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** check_new_cluster(void)
*** 138,144
	 * We don't restore our own user, so both clusters must match have
	 * matching install-user oids.
	 */
!	if (old_cluster.install_user_oid != new_cluster.install_user_oid)
		pg_log(PG_FATAL,
			   "Old and new cluster install users have different values for pg_authid.oid.\n");
--- 138,144
	 * We don't restore our own user, so both clusters must match have
	 * matching install-user oids.
	 */
!	if (old_cluster.install_role_oid != new_cluster.install_role_oid)
		pg_log(PG_FATAL,
			   "Old and new cluster install users have different values for pg_authid.oid.\n");
*** check_new_cluster(void)
*** 147,153
	 * defined users might match users defined in the old cluster and
	 * generate an error during pg_dump restore.
	 */
!	if (new_cluster.user_count != 1)
		pg_log(PG_FATAL,
			   "Only the install user can be defined in the new cluster.\n");
	check_for_prepared_transactions(&new_cluster);
--- 147,153
	 * defined users might match users defined in the old cluster and
	 * generate an error during pg_dump restore.
	 */
!	if (new_cluster.role_count != 1)
		pg_log(PG_FATAL,
			   "Only the install user can be defined in the new cluster.\n");
	check_for_prepared_transactions(&new_cluster);
*** check_is_super_user(ClusterInfo *cluster
*** 618,624
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Fri, Jun 1, 2012 at 9:55 PM, Ants Aasma wrote: > On Sat, Jun 2, 2012 at 1:48 AM, Merlin Moncure wrote: >> Buffer pins aren't a cache: with a cache you are trying to mask a slow >> operation (like a disk i/o) with a faster such that the amount of slow >> operations are minimized. Buffer pins however are very different in >> that we only care about contention on the reference count (the buffer >> itself is not locked!) which makes me suspicious that caching type >> algorithms are the wrong place to be looking. I think it comes to do >> picking between your relatively complex but general, lock displacement >> approach or a specific strategy based on known bottlenecks. > > I agree that pins aren't like a cache. I mentioned the caching > algorithms because they work based on access frequency and highly > contended locks are likely to be accessed frequently even from a > single backend. However this only makes sense for the delayed > unpinning method, and I also have come to the conclusion that it's not > likely to work well. Besides delaying cleanup, the overhead for the > common case of uncontended access is just too much. > > It seems to me that even the nailing approach will need a replacement > algorithm. The local pins still need to be published globally and > because shared memory size is fixed, the maximum amount of locally > pinned nailed buffers needs to be limited as well. > > But anyway, I managed to completely misread the profile that Sergey > gave. Somehow I missed that the time went into the retry TAS in slock > instead of the inlined TAS. This shows that the issue isn't just > cacheline ping-pong but cacheline stealing. This could be somewhat > mitigated by making pinning lock-free. The Nb-GCLOCK paper that Robert > posted earlier in another thread describes an approach for this. I > have a WIP patch (attached) that makes the clock sweep lock-free in > the common case. 
This patch gave a 40% performance increase for an > extremely allocation heavy load running with 64 clients on a 4 core 1 > socket system, lesser gains across the board. Pinning has a shorter > lock duration (and a different lock type) so the gain might be less, > or it might be a larger problem and post a higher gain. Either way, I > think the nailing approach should be explored further, cacheline > ping-pong could still be a problem with higher number of processors > and losing the spinlock also loses the ability to detect contention. Not sure about the rest of this patch, but this part is definitely bogus:

+#if !defined(pg_atomic_fetch_and_set)
+#define pg_atomic_fetch_and_set(dst, src, value) \
+	do { S_LOCK(&dummy_spinlock); \
+	dst = src; \
+	src = value; \
+	S_UNLOCK(&dummy_spinlock); } while (0)
+#endif

Locking a dummy backend-local spinlock doesn't provide atomicity across multiple processes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] How to avoid base backup in automated failover
Hi all, Recently I was writing an application to implement automated failover in this environment: two 2008 R2 servers, network area storage, asynchronous replication, WAL archiving enabled on the primary. Is there any way to avoid always starting the standby server from a base backup in automated failover? The database is growing huge, and I can't keep taking a base backup every day. Please suggest a solution. Regards, Reddy -- View this message in context: http://postgresql.1045698.n5.nabble.com/How-to-avoid-base-backup-in-automated-failover-tp5711147.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] pg_basebackup --xlog compatibility break
On Tue, 2012-05-29 at 22:31 +0200, Magnus Hagander wrote: > Yeah, good arguments all around, I agree too :-) Next question is - > suggestions for naming of said parameter? --xlog-method=something? And/or -Xsomething, which would automatically enable -x?