Re: [HACKERS] Create collation incorrect error code
Magnus Hagander writes: > On Thu, May 31, 2012 at 8:16 PM, Tom Lane wrote: >> What platform? > Ubuntu 12.04. FWIW, I get the same on Fedora 16. It works if I say locale='nb_NO.utf8' to prevent case-folding, so apparently case of the locale name has something to do with it ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas
On 31 May 2012 19:09, Sergey Koposov wrote: > On Thu, 31 May 2012, Simon Riggs wrote: > >> >> That struck me as a safe and easy optimisation. This was a problem I'd >> been trying to optimise for 9.2, so I've written a patch that appears >> simple and clean enough to be applied directly. > > > Thanks! The patch indeed improved the timings. The dropping of 100 tables in > a single commit before the patch took ~ 50 seconds, now it takes ~ 5 sec Thanks for the timing. > (it > would be nice to reduce it further though, because the dropping of 10,000 > tables still takes ~10 min). Why do you have 10,000 tables and why is it important to drop them so quickly? If it's that important, why not run the drop in parallel sessions? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.
Robert Haas writes: > Now, measuring time skew is potentially a useful thing to do, if we > believe that this will actually give us an accurate measurement of > what the time skew is, because there are a whole series of things that > people want to do which involve subtracting a slave timestamp from a > master timestamp. Tom has persistently rebuffed all such proposals on > the grounds that there might be time skew, so in theory we could make > those things possible by having a way to measure time skew, which this > does. Here's what we do: given a slave timestamp, add the estimated > time skew to find an equivalent master timestamp, and then subtract. > Using a method of this type would allow us to compute a *real* apply > delay. Woohoo! Unfortunately, if time synchronization IS in use, > then the system clocks are probably already synchronized three to six > orders of magnitude more precisely than what this method can measure, > so the effect of using GetReplicationTransferLatency() to adjust slave > timestamps will be to massively reduce the accuracy of such > calculations. However, I've thus far been unable to convince anyone > that this is a bad idea, so maybe this is where we're gonna end up. Hmm ... first question is do we actually care whether the clocks are synced to the millisecond level, ie what is it you'd do differently if you know that the master and slave clocks are synced more closely than you can measure at the protocol level. But if there is a reason to care, perhaps we could have a setting that says "we're using NTP, so trust the clocks to be synced"? What I object to is assuming that without any evidence, or being unable to operate correctly in an environment where it's not true. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Wed, May 30, 2012 at 6:10 PM, Sergey Koposov wrote: > On Wed, 30 May 2012, Jeff Janes wrote: > >> But anyway, is idt_match a fairly static table? If so, I'd partition >> that into 16 tables, and then have each one of your tasks join against >> a different one of those tables. That should relieve the contention >> on the index root block, and might have some other benefits as well. > > > No, idt_match is getting filled by multi-threaded copy() and then joined > with 4 other big tables like idt_phot. The result is then split into > partitions. That does make things more complicated. But could you partition it at that level and then do the joins partition-wise? I don't have much experience at data partitioning (well, I do, but the experience is with partitioning in Perl with terabytes of flat files, not in PG :) ) but I think that once you have your partitioning keys you want to apply them the same way up and down the data set. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas
On Thu, May 31, 2012 at 11:09 AM, Sergey Koposov wrote: > On Thu, 31 May 2012, Simon Riggs wrote: > >> >> That struck me as a safe and easy optimisation. This was a problem I'd >> been trying to optimise for 9.2, so I've written a patch that appears >> simple and clean enough to be applied directly. > > > Thanks! The patch indeed improved the timings. The dropping of 100 tables in > a single commit before the patch took ~ 50 seconds, now it takes ~ 5 sec (it > would be nice to reduce it further though, because the dropping of 10,000 > tables still takes ~10 min). I'm surprised it helped that much. I thought the most it could theoretically help would be a factor of 4. I tried the initially unlocked test, and for me it cut the time by a factor of 3. But I only have 1GB of shared_buffers at the max; I would expect it to help more at larger sizes, because there is a constant overhead not related to scanning the shared buffers which gets diluted out the larger shared_buffers is. I added to that a drop-all very similar to what Simon posted and got another factor of 3. But, if you can do this during a maintenance window, then just restarting with a much smaller shared_buffers should give you a much larger speed up than either or both of these. If I can extrapolate up to 10G from my current curve, setting it to 8MB instead would give a speed up of nearly 400 fold. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
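To put numbers behind the constant per-relation overhead discussed above: before batching, each dropped relation forced its own full scan of the buffer pool. Below is a minimal standalone sketch of the two shapes, with simplified stand-in types rather than the real bufmgr API (the real code also takes the buffer header spinlock before invalidating, which is skipped here):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct { unsigned spcNode, dbNode, relNode; } RelFileNode;
    typedef struct { RelFileNode tag; bool valid; } BufferDesc;

    #define NBUFFERS (1024 * 1024)      /* e.g. 8GB of 8kB pages */
    static BufferDesc buffers[NBUFFERS];

    static bool
    rnode_equal(RelFileNode a, RelFileNode b)
    {
        return a.spcNode == b.spcNode && a.dbNode == b.dbNode &&
               a.relNode == b.relNode;
    }

    /* old shape: one full scan of the pool per dropped relation */
    static void
    drop_one_at_a_time(const RelFileNode *drops, int ndrops)
    {
        for (int d = 0; d < ndrops; d++)
            for (size_t i = 0; i < NBUFFERS; i++)
                if (buffers[i].valid && rnode_equal(buffers[i].tag, drops[d]))
                    buffers[i].valid = false;   /* invalidate */
    }

    /* batched shape: one pass over the pool, checking the whole drop
     * list per buffer -- the same comparison count, but only one walk
     * over the large, cache-unfriendly descriptor array */
    static void
    drop_batched(const RelFileNode *drops, int ndrops)
    {
        for (size_t i = 0; i < NBUFFERS; i++)
        {
            if (!buffers[i].valid)
                continue;
            for (int d = 0; d < ndrops; d++)
                if (rnode_equal(buffers[i].tag, drops[d]))
                {
                    buffers[i].valid = false;
                    break;
                }
        }
    }

    int
    main(void)
    {
        RelFileNode drops[100] = {{0}};

        drop_one_at_a_time(drops, 100);
        drop_batched(drops, 100);
        return 0;
    }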
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 11:50 AM, Robert Haas wrote: > > This test case is unusual because it hits a whole series of buffers > very hard. However, there are other cases where this happens on a > single buffer that is just very, very hot, like the root block of a > btree index, where the pin/unpin overhead hurts us. I think that very very hot page is also the problem here, not a whole sequence of hot pages. Most of his buffer content share lwlocks are on just two buffers, and most of his blocked buffer mapping lwlocks are on just two partitions. So I am guessing that almost all of his spin-lock contention from Pin and Unpin is also coming from those same two buffers. Why there are two buffers when there is only one index root block involved, I don't know. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 9:17 AM, Robert Haas wrote: > > Oh, ho. So from this we can see that the problem is that we're > getting huge amounts of spinlock contention when pinning and unpinning > index pages. > > It would be nice to have a self-contained reproducible test case for > this, so that we could experiment with it on other systems. I just posted a patch under subject "pgbench--new transaction type" that introduces a pgbench -P option. I think that that would do a good job of simulating unique-key look-ups on the inner side of a nested loop (which is basically what we have here) and so creating contention on index pages. Right now I don't have anything with more than 2 CPUs and 2 is not high enough to get much contention so I can't post any meaningful numbers. (pgbench -P might also be of interest in hash index investigation) Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Sun, May 27, 2012 at 11:45 AM, Sergey Koposov wrote: > Hi, > > I did another test using the same data and the same code, which I've > provided before and the performance of the single thread seems to be > degrading quadratically with the number of threads. > > Here are the results:
>
> Nthreads  Time_to_execute_one_thread
>        1   8.1
>        2   7.8
>        3   8.1
>        4   9.0
>        5  10.2
>        6  11.4
>        7  13.3
>        8  16.1
>        9  19.0
>       10  21.4
>       11  23.8
>       12  27.3
>       13  30.2
>       14  32.0
>       15  34.1
>       16  37.5
>
This looks more hyperbolic than quadratic. In the higher region, it doesn't seem to be increasing at an increasing rate, but rather increasing at a constant rate of about 2.5 sec per additional backend, i.e. converging to an asymptote with a slope of 2.5. A pedantic point, perhaps, but if you are trying to infer the nature of the bottleneck from the shape of the curve, they lead to rather different sets of hypotheses. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
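A quick way to check that reading of the curve is to look at the first differences of the timings: under quadratic growth the increments themselves keep growing, whereas here they level off at roughly 2.5 s per added thread, which is what convergence to an asymptote of slope ~2.5 looks like. A trivial standalone program over the numbers in the table above:

    #include <stdio.h>

    int
    main(void)
    {
        /* per-thread execution times from the table above, in seconds */
        double t[] = {8.1, 7.8, 8.1, 9.0, 10.2, 11.4, 13.3, 16.1,
                      19.0, 21.4, 23.8, 27.3, 30.2, 32.0, 34.1, 37.5};
        int n = sizeof(t) / sizeof(t[0]);

        /* print the increment added by each extra thread */
        for (int i = 1; i < n; i++)
            printf("%2d -> %2d threads: %+.1f s\n", i, i + 1, t[i] - t[i - 1]);
        return 0;
    }

The later increments hover around 1.8-3.5 s with no upward trend, supporting the asymptotic-linear reading over the quadratic one.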
Re: [HACKERS] pgbench--new transaction type
On Sun, Jun 19, 2011 at 3:30 PM, Greg Smith wrote: > I applied Jeff's patch but changed this to address concerns about the > program getting stuck running for too long in the function: > > #define plpgsql_loops 512 > > This would be better named as "plpgsql_batch_size" or something similar > instead; the current name suggests it's how many loops to run, which is > confusing. > > My main performance concern here was whether this change really matters so > much once a larger number of clients were involved. Some of the other > things you can do to optimize single-client performance aren't as useful > with lots of them. Here's how the improvements in this mode worked for me > on a server with 4 Hyper-Threaded cores (i870); shared_buffers=256MB, > scale=100:
>
> 1 client:
> -S: 11533
> -S -M prepared: 19498
> -P: 49547
>
> 12 clients, 4 workers:
> -S: 56052
> -S -M prepared: 82043
> -P: 159443
>
> 96 clients, 8 workers:
> -S: 49940
> -S -M prepared: 76099
> -P: 137942
>
> I think this is a really nice new workload to demonstrate. One of the > things we tell people is that code works much faster when moved server-side, > but how much faster isn't always easy to show. Having this mode available > lets them see how dramatic that can be quite easily. I know I'd like to be > able to run performance tests for clients of new hardware using PostgreSQL > and tell them something like this: "With simple clients executing a > statement at a time, this server reaches 56K SELECTs/second. But using > server-side functions to execute them in larger batches it can do 159K". > > The value this provides as an alternate source for benchmark load > generation, with a very different profile for how it exercises the server, > is good too. > > Things to fix in the patch before it would be a commit candidate:
>
> - Adjust the loop size/name, per above
> - Reformat some of the longer lines to try and respect the implied right margin in the code formatting
> - Don't include the "plgsql function created." line unless in debugging mode.
> - Add the docs. Focus on how this measures how fast the database can execute SELECT statements using server-side code. An explanation that the "transaction" block size is 512 is important to share. It also needs a warning that time based runs ("-T") may have to wait for a block to finish and go beyond its normally expected end time.
> - The word "via" in the "transaction type" output description is probably not the best choice. Changing to "SELECT only using PL/pgSQL" would translate better, and follow the standard casing used for the name of that language.
Sorry it has taken me a year to get back to this patch. I have wanted to use it, and to ask other people to run it and report their results, several times recently, so I would like to get it into the core. I've attached a new patch which addresses several of your concerns, and adds the documentation. The description is much longer than the descriptions of other nearby options, which mostly just give a simple statement of what they do rather than a description of why that is useful. I don't know if that means I'm starting a good trend, or a bad one, or I'm just putting the exposition in the wrong place. In addition to showing the benefits of coding things on the server side when that is applicable, it also allows hackers to stress parts of the server code that are not easy to stress otherwise.
Cheers, Jeff (Attachment: pgbench_loop_v2.patch) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Figuring out shared buffer pressure
On Thu, May 31, 2012 at 10:11 AM, Bruce Momjian wrote: > However, this doesn't help people configure shared buffers larger (e.g. > 35%) if their working set is larger. Right now, I don't see how a user > would know this is happening. On the flip side, they might have a > smaller working set than 25% and spending the overhead of managing 1 > million shared buffers. Again, there is no way to know if that is the > case. Another important use case: downgrades. They do happen, and right now they are amazingly risky and made with limited information. Clearly a complete picture is impossible because of the reliance on kernel buffer management, but knowing the PG buffer pool occupancy and flux seems like it would be so much better than knowing nothing, and it is likely that some conservative intuition could be learned to perform relatively safe downgrades. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 3:25 PM, Merlin Moncure wrote: > Hm, couple questions: how do you determine if/when to un-nail a > buffer, and who makes that decision (bgwriter?) Well, I think some experimentation might be required, but my first thought is to tie it into buffer eviction. If we record a pin in some side array in lieu of acquiring the buffer header spinlock, then we're not bumping the usage count. So perhaps what we might do is - when we first nail the buffer, we set its usage count to something much higher than what's normally allowed, like say 50. When it gets down to 0 and somebody tries to evict it, we scan the per-backend arrays and count the number of pins we find there, and set the usage count to 5 * that number. The details could be completely wrong here. The point is that I think every so often we can just reconsider whether the buffer is still worthy of being nailed. Exactly how often we should reconsider it seems like an empirical question. > Is there a limit to > how many buffers you are allowed to nail? Probably not. The system had better be designed so that the threshold for nailing a buffer is pretty high. It should be exceedingly unlikely for more than a small percentage of buffers to meet the criteria; if it happens, then there's probably a good reason. > It seems like a much > stronger idea, but one downside I see vs the 'pin for longer' idea I > was kicking around was how to deal with stale nailed buffers and keeping > them from uncontrollably growing so that you have to either stop > nailing or forcibly evicting them. I think we're really talking about variants on the same theme. My idea is basically "pin forever", which is taking "longer" to its logical extreme, but even if it were something short of forever I think you'd still need some way to flush the stale pins. Otherwise, vacuum can get stuck, etc. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
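To make the eviction-time reconsideration above concrete, here is a compilable sketch with entirely hypothetical names and data structures (nothing like this exists in the tree): when the clock sweep finds a nailed buffer at usage count zero, count the pins recorded in the per-backend areas, and either refresh the usage count to 5 * that number or begin un-nailing:

    #include <stdbool.h>

    #define MAX_BACKENDS 64

    typedef struct
    {
        int buf_id;
        int pincount;
    } BackendPin;

    typedef struct
    {
        /* in real life, protected by a per-backend spinlock or lwlock */
        BackendPin pins[16];
        int        npins;
    } BackendPinArea;

    static BackendPinArea backend_areas[MAX_BACKENDS];

    static int
    count_nailed_pins(int buf_id)
    {
        int total = 0;

        for (int b = 0; b < MAX_BACKENDS; b++)
            for (int i = 0; i < backend_areas[b].npins; i++)
                if (backend_areas[b].pins[i].buf_id == buf_id)
                    total += backend_areas[b].pins[i].pincount;
        return total;
    }

    /* Called when the clock sweep finds a nailed buffer at usage_count
     * == 0.  Returns true if the buffer stays nailed. */
    static bool
    reconsider_nail(int buf_id, int *usage_count)
    {
        int npins = count_nailed_pins(buf_id);

        if (npins > 0)
        {
            *usage_count = 5 * npins;   /* still hot: keep the nail */
            return true;
        }
        /* no pins recorded anywhere: start nail removal (not shown) */
        return false;
    }

    int
    main(void)
    {
        int usage = 0;

        (void) reconsider_nail(0, &usage);
        return 0;
    }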
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/5/31 Kohei KaiGai : > 2012/5/31 Robert Haas : >>> If we would have an "ideal optimizer", I'd still like the optimizer to >>> wipe out redundant clauses transparently, rather than RLSBYPASS >>> permissions, because it just controls all-or-nothing stuff. >>> For example, if tuples are categorized as unclassified, classified or >>> secret, and the RLS policy is configured as: >>> ((current_user IN ('alice', 'bob') AND X IN ('unclassified', >>> 'classified')) OR (X IN ('unclassified'))), >>> superuser can see all the tuples, and alice and bob can see >>> up to classified tuples. >>> Is it really hard to wipe out the redundant condition at planner stage? >>> If current_user is obviously 'kaigai', it seems to me the left side of >>> this clause can be wiped out at the planner stage. >>> Do I consider the issue too simple? >> >> Yes. :-) >> >> There are two problems. First, if using the extended query protocol >> (e.g. pgbench -M prepared) you can prepare a statement just once and >> then execute it multiple times. In this case, stable functions cannot >> be constant-folded at plan time, because they are only guaranteed to >> remain constant for a *single* execution of the query, not for all >> executions of the query. So any optimization in this area would have >> to be limited to cases where the simple query protocol is used. I >> think that might still be worth doing, but it's a significant >> limitation, to be sure. Second, at present, there is no guarantee >> that the snapshot used for planning the query is the same as the >> snapshot used for executing the query, though commit >> d573e239f03506920938bf0be56c868d9c3416da made that happen in some >> common cases. If we were to do constant-folding of stable functions >> using the planner snapshot, it would represent a behavior change from >> previous releases. I am not clear whether that has any real-world >> consequences that we should be worried about. It seems to me that the >> path of least resistance might be to refactor the portal stuff so that >> we can provide a uniform guarantee that, when using the simple query >> protocol, the planner and executor snapshots will be the same ... but >> I might be wrong. >> > It may be an option to separate the cases into two: a situation where the > given query is executed immediately, just after optimization, and never > reused, and all others. > Even in the second situation, it may give us a better query execution > plan if we reconstruct the query plan just before the executor runs, under > the assumption that immutable / stable functions can be replaced > by constant values prior to execution. > In other words, this idea retries query optimization at EXECUTE > time, against its usual nature, to replace immutable / stable functions > by constant values, and to generate a wiser execution plan. > At least, it may make sense to have a flag on the prepared statement to > indicate whether a better plan is possible with this re-construction. > > Then, if so, we will be able to push the stuff corresponding to > RLSBYPASS into query optimization, and it would work transparently > for users. > > Isn't it feasible to implement? > If we could replace a particular term that consists only of constant values and stable / immutable functions by a parameter reference, it would let us handle the term as if it were a constant value, while the actual calculation is delayed to the executor stage.
For example, according to this idea, PREPARE p1(int) AS SELECT * FROM tbl WHERE current_user IN ('alice','bob') AND X > $1; would be internally rewritten to PREPARE p1(int) AS SELECT * FROM tbl WHERE $2 AND X > $1; and then $2 would be implicitly calculated just before execution of this prepared statement, using the same snapshot as the executor's. It seems to me a feasible idea, with a less invasive implementation for the existing planner. Does it make sense to describe the exceptional condition using a regular clause, instead of a special permission? Thanks, -- KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/5/31 Robert Haas : >> If we would have an "ideal optimizer", I'd still like the optimizer to >> wipe out redundant clauses transparently, rather than RLSBYPASS >> permissions, because it just controls all-or-nothing stuff. >> For example, if tuples are categorized as unclassified, classified or >> secret, and the RLS policy is configured as: >> ((current_user IN ('alice', 'bob') AND X IN ('unclassified', >> 'classified')) OR (X IN ('unclassified'))), >> superuser can see all the tuples, and alice and bob can see >> up to classified tuples. >> Is it really hard to wipe out the redundant condition at planner stage? >> If current_user is obviously 'kaigai', it seems to me the left side of >> this clause can be wiped out at the planner stage. >> Do I consider the issue too simple? > > Yes. :-) > > There are two problems. First, if using the extended query protocol > (e.g. pgbench -M prepared) you can prepare a statement just once and > then execute it multiple times. In this case, stable functions cannot > be constant-folded at plan time, because they are only guaranteed to > remain constant for a *single* execution of the query, not for all > executions of the query. So any optimization in this area would have > to be limited to cases where the simple query protocol is used. I > think that might still be worth doing, but it's a significant > limitation, to be sure. Second, at present, there is no guarantee > that the snapshot used for planning the query is the same as the > snapshot used for executing the query, though commit > d573e239f03506920938bf0be56c868d9c3416da made that happen in some > common cases. If we were to do constant-folding of stable functions > using the planner snapshot, it would represent a behavior change from > previous releases. I am not clear whether that has any real-world > consequences that we should be worried about. It seems to me that the > path of least resistance might be to refactor the portal stuff so that > we can provide a uniform guarantee that, when using the simple query > protocol, the planner and executor snapshots will be the same ... but > I might be wrong. > It may be an option to separate the cases into two: a situation where the given query is executed immediately, just after optimization, and never reused, and all others. Even in the second situation, it may give us a better query execution plan if we reconstruct the query plan just before the executor runs, under the assumption that immutable / stable functions can be replaced by constant values prior to execution. In other words, this idea retries query optimization at EXECUTE time, against its usual nature, to replace immutable / stable functions by constant values, and to generate a wiser execution plan. At least, it may make sense to have a flag on the prepared statement to indicate whether a better plan is possible with this re-construction. Then, if so, we will be able to push the stuff corresponding to RLSBYPASS into query optimization, and it would work transparently for users. Isn't it feasible to implement? Thanks, -- KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.
On Wed, May 30, 2012 at 12:17 PM, Fujii Masao wrote: > OTOH, I wonder whether we really need to send keepalive messages > periodically to calculate a network latency. ISTM we don't unless a network > latency varies from situation to situation so frequently and we'd like to > monitor that in almost real time. I didn't look at this patch too carefully when it was committed. Looking at it more carefully now, it looks to me like this patch does two different things. One is to add a function called GetReplicationApplyDelay(), which returns the number of milliseconds since replay was fully caught up. So if you were last caught up 5 minutes ago and you have replayed 4 minutes and 50 seconds worth of WAL during that time, this function will return 5 minutes, not 10 seconds. That is not what I would call "apply delay", which I would define as how far behind you are NOW, not how long it's been since you weren't behind at all. The second thing it does is add a function called GetReplicationTransferLatency(). The return value of this function is the difference between the slave's clock at the time the last master keepalive was processed and the master's clock at the time that keepalive was generated. I think that in practice, unless network time synchronization is in use, this is mostly going to be computing the clock skew between the master and the slave. If time synchronization is in use, then as you say it'll be a very jittery measure of master-slave network latency, which can be monitored perfectly well from outside PG. Now, measuring time skew is potentially a useful thing to do, if we believe that this will actually give us an accurate measurement of what the time skew is, because there are a whole series of things that people want to do which involve subtracting a slave timestamp from a master timestamp. Tom has persistently rebuffed all such proposals on the grounds that there might be time skew, so in theory we could make those things possible by having a way to measure time skew, which this does. Here's what we do: given a slave timestamp, add the estimated time skew to find an equivalent master timestamp, and then subtract. Using a method of this type would allow us to compute a *real* apply delay. Woohoo! Unfortunately, if time synchronization IS in use, then the system clocks are probably already synchronized three to six orders of magnitude more precisely than what this method can measure, so the effect of using GetReplicationTransferLatency() to adjust slave timestamps will be to massively reduce the accuracy of such calculations. However, I've thus far been unable to convince anyone that this is a bad idea, so maybe this is where we're gonna end up. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
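For concreteness, here is a standalone sketch (hypothetical names, made-up numbers) of the arithmetic being debated above: what the standby can compute from one keepalive is the one-way latency plus the clock skew, and separating the two requires an extra assumption, such as a symmetric round trip:

    #include <stdio.h>

    typedef long long usec;     /* microseconds since some shared epoch */

    /* directly computable from one keepalive: latency + skew */
    static usec
    measured_offset(usec master_send, usec slave_recv)
    {
        return slave_recv - master_send;
    }

    /* skew estimate, if we also had a round-trip time and were willing
     * to assume the outbound and return paths take equally long */
    static usec
    estimated_skew(usec master_send, usec slave_recv, usec round_trip)
    {
        return (slave_recv - master_send) - round_trip / 2;
    }

    int
    main(void)
    {
        usec send = 1000000, recv = 1004500, rtt = 3000;

        printf("measured offset: %lld us\n", measured_offset(send, recv));
        printf("skew estimate:   %lld us (assuming a symmetric 3 ms RTT)\n",
               estimated_skew(send, recv, rtt));
        return 0;
    }

With NTP in play the true skew is typically microseconds or less, while the jitter in this measurement is on the order of network latency -- which is the three-to-six-orders-of-magnitude objection above.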
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 1:50 PM, Robert Haas wrote: > On Thu, May 31, 2012 at 2:03 PM, Merlin Moncure wrote: >> On Thu, May 31, 2012 at 11:54 AM, Sergey Koposov >> wrote: >>> On Thu, 31 May 2012, Robert Haas wrote: >>> Oh, ho. So from this we can see that the problem is that we're getting huge amounts of spinlock contention when pinning and unpinning index pages. It would be nice to have a self-contained reproducible test case for this, so that we could experiment with it on other systems. >>> >>> >>> I have created it a few days ago: >>> http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php >>> >>> It is still valid. And I'm using exactly it to test. The only thing to >>> change is to create a two-col index and drop another index. >>> The scripts are precisely the ones I'm using now. >>> >>> The problem is that in order to see a really big slowdown (10 times slower >>> than a single thread) I've had to raise the buffers to 48g but it was slow >>> for smaller shared buffer settings as well. >>> >>> But I'm not sure how sensitive the test is to the hardware. >> >> It's not: high contention on spinlocks is going to suck no matter what >> hardware you have. I think the problem is pretty obvious now: any >> case where multiple backends are scanning the same sequence of buffers >> in a very tight loop is going to display this behavior. It doesn't >> come up that often: it takes a pretty unusual sequence of events to >> get a bunch of backends hitting the same buffer like that. >> >> Hm, I wonder if you could alleviate the symptoms by making making the >> Pin/UnpinBuffer smarter so that frequently pinned buffers could stay >> pinned longer -- kinda as if your private ref count was hacked to be >> higher in that case. It would be a complex fix for a narrow issue >> though. > > This test case is unusual because it hits a whole series of buffers > very hard. However, there are other cases where this happens on a > single buffer that is just very, very hot, like the root block of a > btree index, where the pin/unpin overhead hurts us. I've been > thinking about this problem for a while, but it hasn't made it up to > the top of my priority list, because workloads where pin/unpin is the > dominant cost are still relatively uncommon. I expect them to get > more common as we fix other problems. > > Anyhow, I do have some vague thoughts on how to fix this. Buffer pins > are a lot like weak relation locks, in that they are a type of lock > that is taken frequently, but rarely conflicts. And the fast-path > locking in 9.2 provides a demonstration of how to handle this kind of > problem efficiently: making the weak, rarely-conflicting locks > cheaper, at the cost of some additional expense when a conflicting > lock (in this case, a buffer cleanup lock) is taken. In particular, > each backend has its own area to record weak relation locks, and a > strong relation lock must scan all of those areas and migrate any > locks found there to the main lock table. I don't think it would be > feasible to adopt exactly this solution for buffer pins, because page > eviction and buffer cleanup locks, while not exactly common, are > common enough that we can't require a scan of N per-backend areas > every time one of those operations occurs. > > But, maybe we could have a system of this type that only applies to > the very hottest buffers. Suppose we introduce two new buffer flags, > BUF_NAILED and BUF_NAIL_REMOVAL. When we detect excessive contention > on the buffer header spinlock, we set BUF_NAILED. 
> Once we do that, the buffer can't be evicted until that flag is removed, and backends > are permitted to record pins in a per-backend area protected by a > per-backend spinlock or lwlock, rather than in the buffer header. > When we want to un-nail the buffer, we set BUF_NAIL_REMOVAL. Hm, couple of questions: how do you determine if/when to un-nail a buffer, and who makes that decision (bgwriter?) Is there a limit to how many buffers you are allowed to nail? It seems like a much stronger idea, but one downside I see vs the 'pin for longer' idea I was kicking around is how to deal with stale nailed buffers and keep them from growing uncontrollably, so that you don't have to either stop nailing or forcibly evict them. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extending relations more efficiently
On Tue, 2012-05-01 at 10:08 -0400, Robert Haas wrote: > We've previously discussed the possible desirability of extending > relations in larger increments, rather than one block at a time, for > performance reasons. I attempted to determine how much performance we > could possibly buy this way, and found that, as far as I can see, the > answer is, basically, none. Another point here is that with checksums, we will want to make sure that zero pages can be treated as corrupt. That will probably involve using the WAL for extension operations, and we'll want to mitigate that cost somehow. Extending in larger chunks would probably be necessary. There are some challenges there, but I think it's worth pursuing. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
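As a rough illustration of the chunked-extension idea (plain POSIX file I/O here, not the smgr layer; the file name and chunk size are arbitrary choices for the sketch), extending by a zero-filled multi-block chunk turns 64 extension calls, each normally taken under the relation extension lock, into one:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLCKSZ        8192
    #define EXTEND_BLOCKS 64            /* extend 512kB at a time */

    int
    main(void)
    {
        char *chunk = calloc(EXTEND_BLOCKS, BLCKSZ);  /* zeroed pages */
        int   fd = open("relation.dat",
                        O_WRONLY | O_CREAT | O_APPEND, 0600);

        if (chunk == NULL || fd < 0)
        {
            perror("setup");
            return 1;
        }
        /* one syscall appends 64 pages; block-at-a-time extension
         * would issue 64 separate writes */
        if (write(fd, chunk, (size_t) EXTEND_BLOCKS * BLCKSZ) < 0)
            perror("write");
        close(fd);
        free(chunk);
        return 0;
    }

If extension is WAL-logged as suggested, a page that still reads back as all zeroes can then be treated as evidence of corruption rather than as a legitimately new page.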
Re: [HACKERS] \conninfo and SSL
On Thu, May 31, 2012 at 9:04 PM, Bruce Momjian wrote: > On startup, psql shows the SSL information: > > $ psql 'sslmode=require host=localhost' > psql (9.2beta1) > SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256) > > However, \conninfo does not mention SSL: > > postgres=> \conninfo > You are connected to database "postgres" as user "postgres" on > host "localhost" at port "5432". > > Should \conninfo mention SSL? Fortunately \c shows SSL information: > > postgres=> \c > SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256) > You are now connected to database "postgres" as user "postgres". Well, \c creates a new connection, so that's not really an information command. It might not connect that trivially, depending on what authentication method you use. Including ssl info in \conninfo would be useful, I think. //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] \conninfo and SSL
On startup, psql shows the SSL information:

    $ psql 'sslmode=require host=localhost'
    psql (9.2beta1)
    SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)

However, \conninfo does not mention SSL:

    postgres=> \conninfo
    You are connected to database "postgres" as user "postgres" on host "localhost" at port "5432".

Should \conninfo mention SSL? Fortunately \c shows SSL information:

    postgres=> \c
    SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
    You are now connected to database "postgres" as user "postgres".

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with MinGW W64
On Thu, May 31, 2012 at 2:13 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, May 31, 2012 at 8:15 AM, Johann 'Myrkraverk' Oskarsson >> wrote: >>> Is this something to discuss with the MinGW W64 team? > >> My viewpoint on this (which is different than Tom's) is that we're >> probably not entitled to assume anything about what the system header >> files do with respect to stat. On some systems, they might just have >> a function prototype, while others might define stat or stat() as a >> macro. It seems to me that our source code is hoping for a function >> definition rather than a macro definition and falling over when that's >> not how it is. I don't see that as very reasonable, unless we have >> some basis for believing that the OS isn't entitled to define stat as >> a macro rather than a function, and maybe not even then. > > I quote from the POSIX:2008 specification for <sys/stat.h>: > > The following shall be declared as functions and may also be > defined as macros. Function prototypes shall be provided. > ... > int stat(const char *restrict, struct stat *restrict); > > I do not believe that the standard intends the word "shall" to have any > wiggle room. I would also read this to mean that if the header defines > "stat" as a macro, that macro ought to be an alternative way of invoking > the function. There's nothing in the passage you quote that says the macro definition can't do anything other than invoke the eponymous function, but... > Now we are messing up by failing to #undef the macro > before redefining it, but if we do that and it still doesn't work, the > header is not conformant to POSIX. ...this is probably still true. >> We have >> plenty of other places where we use our own wrapper function in lieu >> of OS facilities for various reasons (e.g. BasicOpenFile) and I don't >> think adding one more is a big deal. > > Well, I think it is. Where we use a wrapper function, it's because it > has somewhat different semantics from the underlying standard function. > I do not think it's good for readability to define pgfoo() when that's > only meant to be exactly foo(). I'm especially not keen on doing that > just because one variant of MinGW has managed to break their conformance > with POSIX. In this case, I feel like we've kind of already stepped in it, because we've defined stat() to be pgwin32_safestat(), which is, in fact, not exactly stat(). So right now ISTM that a naive backend hacker might think that stat() means "the stat provided by the OS", but, on Windows, it doesn't. I wouldn't exactly describe myself as "keen" on adding wrapper functions to things like stat(), but until non-POSIX operating systems go the way of the dodo bird, I'm not sure there's any way around it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 2:03 PM, Merlin Moncure wrote: > On Thu, May 31, 2012 at 11:54 AM, Sergey Koposov > wrote: >> On Thu, 31 May 2012, Robert Haas wrote: >> >>> Oh, ho. So from this we can see that the problem is that we're >>> getting huge amounts of spinlock contention when pinning and unpinning >>> index pages. >>> >>> It would be nice to have a self-contained reproducible test case for >>> this, so that we could experiment with it on other systems. >> >> >> I have created it a few days ago: >> http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php >> >> It is still valid. And I'm using exactly it to test. The only thing to >> change is to create a two-col index and drop another index. >> The scripts are precisely the ones I'm using now. >> >> The problem is that in order to see a really big slowdown (10 times slower >> than a single thread) I've had to raise the buffers to 48g but it was slow >> for smaller shared buffer settings as well. >> >> But I'm not sure how sensitive the test is to the hardware. > > It's not: high contention on spinlocks is going to suck no matter what > hardware you have. I think the problem is pretty obvious now: any > case where multiple backends are scanning the same sequence of buffers > in a very tight loop is going to display this behavior. It doesn't > come up that often: it takes a pretty unusual sequence of events to > get a bunch of backends hitting the same buffer like that. > > Hm, I wonder if you could alleviate the symptoms by making making the > Pin/UnpinBuffer smarter so that frequently pinned buffers could stay > pinned longer -- kinda as if your private ref count was hacked to be > higher in that case. It would be a complex fix for a narrow issue > though. This test case is unusual because it hits a whole series of buffers very hard. However, there are other cases where this happens on a single buffer that is just very, very hot, like the root block of a btree index, where the pin/unpin overhead hurts us. I've been thinking about this problem for a while, but it hasn't made it up to the top of my priority list, because workloads where pin/unpin is the dominant cost are still relatively uncommon. I expect them to get more common as we fix other problems. Anyhow, I do have some vague thoughts on how to fix this. Buffer pins are a lot like weak relation locks, in that they are a type of lock that is taken frequently, but rarely conflicts. And the fast-path locking in 9.2 provides a demonstration of how to handle this kind of problem efficiently: making the weak, rarely-conflicting locks cheaper, at the cost of some additional expense when a conflicting lock (in this case, a buffer cleanup lock) is taken. In particular, each backend has its own area to record weak relation locks, and a strong relation lock must scan all of those areas and migrate any locks found there to the main lock table. I don't think it would be feasible to adopt exactly this solution for buffer pins, because page eviction and buffer cleanup locks, while not exactly common, are common enough that we can't require a scan of N per-backend areas every time one of those operations occurs. But, maybe we could have a system of this type that only applies to the very hottest buffers. Suppose we introduce two new buffer flags, BUF_NAILED and BUF_NAIL_REMOVAL. When we detect excessive contention on the buffer header spinlock, we set BUF_NAILED. 
Once we do that, the buffer can't be evicted until that flag is removed, and backends are permitted to record pins in a per-backend area protected by a per-backend spinlock or lwlock, rather than in the buffer header. When we want to un-nail the buffer, we set BUF_NAIL_REMOVAL. At that point, it's no longer permissible to record new pins in the per-backend areas, but old ones may still exist. So then we scan all the per-backend areas and transfer the pins to the buffer header, or else just wait until no more exist; then, we clear both BUF_NAILED and BUF_NAIL_REMOVAL. So the pin algorithm looks like this:

    read buffer header flags (unlocked)
    if ((flags & (BUF_NAILED|BUF_NAIL_REMOVAL)) != BUF_NAILED)
    {
        take buffer header spinlock
        record pin in buffer header
        release buffer header spinlock
    }
    else
    {
        take per-backend lwlock
        record pin in per-backend area
        release per-backend lwlock
        read buffer header flags (unlocked)
        if ((flags & (BUF_NAILED|BUF_NAIL_REMOVAL)) != BUF_NAILED)
        {
            take per-backend lwlock
            forget pin in per-backend area
            release per-backend lwlock
            take buffer header spinlock
            record pin in buffer header
            release buffer header spinlock
        }
    }

Due to memory ordering effects, we might see the buffer as nailed when in fact nail removal has already begun (or even, completed). We can prevent that if (1) the nail removal code sets the nail removal flag before checking the per-backend areas and (2) the pin code checks the nail removal flag AFTER recording its pin in the per-backend area.
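The removal side isn't spelled out above in code form; here is a compilable sketch of it under the same scheme (hypothetical names; real locking and memory barriers reduced to comments). The ordering matters: BUF_NAIL_REMOVAL must be visible before the per-backend areas are scanned, mirroring the re-check in the pin algorithm:

    #include <stdbool.h>

    #define BUF_NAILED        0x01
    #define BUF_NAIL_REMOVAL  0x02

    typedef struct
    {
        unsigned flags;     /* protected by the buffer header spinlock */
        int      refcount;
    } BufferHdr;

    /* drain every per-backend pin area for this buffer; stubbed out */
    static int
    collect_backend_pins(int buf_id)
    {
        (void) buf_id;
        return 0;
    }

    static void
    unnail_buffer(BufferHdr *hdr, int buf_id)
    {
        /* (1) stop new per-backend pins; needs a barrier so the flag
         * is visible before we scan the per-backend areas */
        hdr->flags |= BUF_NAIL_REMOVAL;

        /* (2) migrate pins still recorded in per-backend areas */
        hdr->refcount += collect_backend_pins(buf_id);

        /* (3) every pin is now in the header; drop both flags */
        hdr->flags &= ~(BUF_NAILED | BUF_NAIL_REMOVAL);
    }

    int
    main(void)
    {
        BufferHdr hdr = {BUF_NAILED, 0};

        unnail_buffer(&hdr, 0);
        return (int) hdr.flags;     /* 0: fully un-nailed */
    }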
Re: [HACKERS] Figuring out shared buffer pressure
On Thursday, May 31, 2012 at 19:11, Bruce Momjian wrote: > On Wed, May 30, 2012 at 05:55:07PM -0400, Bruce Momjian wrote: > > > > Seems buffers_alloc is the number of calls to StrategyGetBuffer(), > > > > which tells how many times we have requested a buffer. Not sure how > > > > that helps measure buffer pressure. > > > > > > Once the linked list is empty, every request for a buffer to read a > > > new page into must result in the eviction of the previous occupant > > > from this conceptual freelist buffer (except perhaps for some race > > > conditions). Isn't that what you wanted? Except that the > > > buffers_alloc does not get incremented when the StrategyGetBuffer is > > > satisfied by a ring strategy rather than the default strategy. > > > > Well, the ideal case is that I could find out how often data that is > > near to be discarded is actually needed, hence the "reclaimed" field > > that is often important for kernel memory pressure reporting on older > > operating systems. I will post an email soon about my theory of why > > buffer pressure is an important thing to report to users. > > OK, realizing there is no simple way to measure shared buffer pressure, > let me explain why I want to. > > Right now we simplistically recommend 25% of RAM for shared_buffers, with > a maximum of 8GB (512MB on Windows). This helps to be sure that there > are sufficient kernel buffers for high-write operations, and perhaps a > kernel cache larger than shared buffers. > > However, this doesn't help people configure shared buffers larger (e.g. > 35%) if their working set is larger. Right now, I don't see how a user > would know this is happening. On the flip side, they might have a > smaller working set than 25% and be paying the overhead of managing 1 > million shared buffers. Again, there is no way to know if that is the > case. > > For example, we have reports that larger shared buffers are sometimes > better, sometimes not, but there is no feedback we give the user to > explain why this is happening. My guess is that if their working set is > larger than 25% of RAM, they benefit; if not, the buffer management > overhead makes things slower. > > I feel we need to allow users to get clearer information on how active > their shared buffer cache is, perhaps allowing them to shrink/grow it as > appropriate. Asking them to blindly try different shared buffer sizes > seems suboptimal. There is also a recent thread about the usage count. Jeff suggested maybe increasing by 2 and decreasing by 1. This is really close to another idea I had but haven't tested yet: increment/decrement by 1 or 2, making even numbers ascending and odd numbers descending, so that just by looking at the usage count you can know what's going on with your buffers (more odd than even, or the reverse? building the cache, or wasting it, or ...). It should also allow finer control of eviction. Increasing/decreasing shared_buffers while PostgreSQL is up is a nice idea. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Re: [HACKERS] Create collation incorrect error code
On Thu, May 31, 2012 at 8:16 PM, Tom Lane wrote: > Magnus Hagander writes: >> template1=# create collation "nb_NO.utf8" (locale=nb_NO.utf8); >> ERROR: could not create locale "nb_no.utf8": Success > > What platform? D'uh, sorry. Ubuntu 12.04. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Create collation incorrect error code
Magnus Hagander writes: > template1=# create collation "nb_NO.utf8" (locale=nb_NO.utf8); > ERROR: could not create locale "nb_no.utf8": Success What platform? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with MinGW W64
Robert Haas writes: > On Thu, May 31, 2012 at 8:15 AM, Johann 'Myrkraverk' Oskarsson > wrote: >> Is this something to discuss with the MinGW W64 team? > My viewpoint on this (which is different than Tom's) is that we're > probably not entitled to assume anything about what the system header > files do with respect to stat. On some systems, they might just have > a function prototype, while others might define stat or stat() as a > macro. It seems to me that our source code is hoping for a function > definition rather than a macro definition and falling over when that's > not how it is. I don't see that as very reasonable, unless we have > some basis for believing that the OS isn't entitled to define stat as > a macro rather than a function, and maybe not even then. I quote from the POSIX:2008 specification for <sys/stat.h>: The following shall be declared as functions and may also be defined as macros. Function prototypes shall be provided. ... int stat(const char *restrict, struct stat *restrict); I do not believe that the standard intends the word "shall" to have any wiggle room. I would also read this to mean that if the header defines "stat" as a macro, that macro ought to be an alternative way of invoking the function. Now we are messing up by failing to #undef the macro before redefining it, but if we do that and it still doesn't work, the header is not conformant to POSIX. You can read this yourself at http://pubs.opengroup.org/onlinepubs/9699919799/ > We have > plenty of other places where we use our own wrapper function in lieu > of OS facilities for various reasons (e.g. BasicOpenFile) and I don't > think adding one more is a big deal. Well, I think it is. Where we use a wrapper function, it's because it has somewhat different semantics from the underlying standard function. I do not think it's good for readability to define pgfoo() when that's only meant to be exactly foo(). I'm especially not keen on doing that just because one variant of MinGW has managed to break their conformance with POSIX. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
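The #undef-before-redefine pattern Tom mentions can be shown as a small standalone program (the wrapper body here is a toy stand-in; the real pgwin32_safestat lives in src/port and does actual work). Note that a function-like macro only expands when followed by a parenthesis, so declarations like "struct stat" are unaffected:

    #include <sys/stat.h>

    static int
    pgwin32_safestat(const char *path, struct stat *buf)
    {
        /* stand-in: the real wrapper corrects fields the native stat
         * gets wrong on Windows */
        (void) path;
        (void) buf;
        return 0;
    }

    #ifdef stat
    #undef stat             /* drop any macro the system header defined,
                             * which POSIX permits it to do */
    #endif
    #define stat(path, buf) pgwin32_safestat(path, buf)

    int
    main(void)
    {
        struct stat st;     /* not expanded: "stat" not followed by '(' */

        return stat("/tmp", &st);   /* expands to pgwin32_safestat(...) */
    }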
Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas
On Thu, 31 May 2012, Simon Riggs wrote: > That struck me as a safe and easy optimisation. This was a problem I'd been trying to optimise for 9.2, so I've written a patch that appears simple and clean enough to be applied directly. Thanks! The patch indeed improved the timings. The dropping of 100 tables in a single commit before the patch took ~ 50 seconds, now it takes ~ 5 sec (it would be nice to reduce it further though, because the dropping of 10,000 tables still takes ~10 min). Cheers, S * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Create collation incorrect error code
Strangeness: template1=# create collation "nb_NO.utf8" (locale=nb_NO.utf8); ERROR: could not create locale "nb_no.utf8": Success Clearly it's not successful... On some runs, I get: template1=# create collation "nb_NO.utf8" (locale=nb_NO.utf8); ERROR: could not create locale "nb_no.utf8": No such file or directory DETAIL: The operating system could not find any locale data for the locale name "nb_no.utf8". Which is correct. The first one is clearly wrong. Seems there are some cases where errno isn't set properly? But from what I can tell (http://pubs.opengroup.org/onlinepubs/9699919799/functions/newlocale.html) errno should be set. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
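The defensive pattern for this class of bug, shown as a small standalone program (glibc-flavoured; the locale name is the one from the report), is to clear errno before the call and to fall back to a generic message when the call fails without setting it -- reporting strerror(errno) while errno is still 0 is exactly how "could not create locale ...: Success" gets printed:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <locale.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        locale_t loc;

        errno = 0;                  /* clear any stale errno first */
        loc = newlocale(LC_ALL_MASK, "nb_no.utf8", (locale_t) 0);
        if (loc == (locale_t) 0)
        {
            if (errno != 0)
                fprintf(stderr, "could not create locale: %s\n",
                        strerror(errno));
            else
                /* failure, but the libc did not set errno */
                fprintf(stderr, "could not create locale: unknown error\n");
            return 1;
        }
        freelocale(loc);
        return 0;
    }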
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 11:54 AM, Sergey Koposov wrote: > On Thu, 31 May 2012, Robert Haas wrote: > >> Oh, ho. So from this we can see that the problem is that we're >> getting huge amounts of spinlock contention when pinning and unpinning >> index pages. >> >> It would be nice to have a self-contained reproducible test case for >> this, so that we could experiment with it on other systems. > > > I have created it a few days ago: > http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php > > It is still valid. And I'm using exactly it to test. The only thing to > change is to create a two-col index and drop another index. > The scripts are precisely the ones I'm using now. > > The problem is that in order to see a really big slowdown (10 times slower > than a single thread) I've had to raise the buffers to 48g but it was slow > for smaller shared buffer settings as well. > > But I'm not sure how sensitive the test is to the hardware. It's not: high contention on spinlocks is going to suck no matter what hardware you have. I think the problem is pretty obvious now: any case where multiple backends are scanning the same sequence of buffers in a very tight loop is going to display this behavior. It doesn't come up that often: it takes a pretty unusual sequence of events to get a bunch of backends hitting the same buffer like that. Hm, I wonder if you could alleviate the symptoms by making the Pin/UnpinBuffer logic smarter so that frequently pinned buffers could stay pinned longer -- kinda as if your private ref count was hacked to be higher in that case. It would be a complex fix for a narrow issue though. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap_form_tuple crashing
On Thu, May 31, 2012 at 7:01 PM, Tom Lane wrote: > Atri Sharma writes: >> My call to heap_form_tuple looks like: >> val1=0; >> tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att); >> tuple=heap_form_tuple(tupledesc1,p1,&val1); > >> p1 is a pointer to a Datum instance which is created from a char array. > > Does that actually match the tupdesc you're using? Are you sure you > created the Datum correctly (ie, did you call the appropriate datatype > input routine)? > > BTW, the BlessTupleDesc call here seems to be pure cargo-cult > programming. It should not be necessary to bless a relation's tupdesc > (because that should be a named type already); and even if it were, > heap_form_tuple doesn't care. > > regards, tom lane Hi Tom, Thanks for the advice and help. Your diagnosis is correct: I used the correct datatype input routine and it worked like a charm. I will remove the calls to BlessTupleDesc, then? Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers
Sorry, I forgot to mention: in my setup the primary and standby servers receive the same traffic, so there is no issue with a "network fault between the primary and the standby, but not between the primary and some of the clients updating it". -- View this message in context: http://postgresql.1045698.n5.nabble.com/Re-hot-standby-PSQL-9-1-Windows-2008-Servers-tp5710824p5710832.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers
Dear Kevin, Thank you for your reply. Yes, I am writing an application using PowerShell; it's true it is not trivial, especially for a guy like me who has little knowledge of databases. You raised all the cases I am muddling through. Currently I am testing this setup: the standby server is always configured from a base backup restored from the storage server, and then streaming replication (asynchronous) is started. Base backups are taken frequently. I am sure there is some data loss during switching; I am still researching how to do it cleanly. Please suggest any good papers on this. Reddy. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Re-hot-standby-PSQL-9-1-Windows-2008-Servers-tp5710824p5710830.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Figuring out shared buffer pressure
On Wed, May 30, 2012 at 05:55:07PM -0400, Bruce Momjian wrote: > > > Seems buffers_alloc is the number of calls to StrategyGetBuffer(), which > > > tells how many times we have requested a buffer. Not sure how that helps > > > measure buffer pressure. > > > > Once the linked list is empty, every request for a buffer to read a > > new page into must result in the eviction of the previous occupant > > from this conceptual freelist buffer (except perhaps for some race > > conditions). Isn't that what you wanted? Except that the > > buffers_alloc does not get incremented when the StrategyGetBuffer is > > satisfied by a ring strategy rather than the default strategy. > > Well, the ideal case is that I could find out how often data that is > near to be discarded is actually needed, hence the "reclaimed" field > that is often important for kernel memory pressure reporting on older > operating systems. I will post an email soon about my theory of why > buffer pressure is an important thing to report to users. OK, realizing there is no simple way to measure shared buffer pressure, let me explain why I want to. Right now we simplistically recommend 25% of RAM for shared_buffers, with a maximum of 8GB (512MB on Windows). This helps to be sure that there are sufficient kernel buffers for high-write operations, and perhaps a kernel cache larger than shared buffers. However, this doesn't help people configure shared buffers larger (e.g. 35%) if their working set is larger. Right now, I don't see how a user would know this is happening. On the flip side, they might have a smaller working set than 25% and be paying the overhead of managing 1 million shared buffers. Again, there is no way to know if that is the case. For example, we have reports that larger shared buffers are sometimes better, sometimes not, but there is no feedback we give the user to explain why this is happening. My guess is that if their working set is larger than 25% of RAM, they benefit; if not, the buffer management overhead makes things slower. I feel we need to allow users to get clearer information on how active their shared buffer cache is, perhaps allowing them to shrink/grow it as appropriate. Asking them to blindly try different shared buffer sizes seems suboptimal. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
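No such instrumentation exists today, but as a sketch of the kind of counters that could answer the question above (every name here is hypothetical), one might separate allocations served from the freelist from allocations that had to evict a valid page, plus a "reclaimed" count for evicted pages wanted again soon after. A high eviction and reclaim rate suggests the working set exceeds shared_buffers; a near-zero rate at a large setting suggests it could be shrunk:

    typedef struct
    {
        unsigned long alloc_from_freelist;  /* nothing displaced */
        unsigned long alloc_by_eviction;    /* a valid page displaced */
        unsigned long reclaimed;            /* evicted page re-read
                                             * within some window */
    } BufferPressureStats;

    static BufferPressureStats pressure;

    /* to be called from a (hypothetical) buffer allocation path */
    static void
    count_alloc(int evicted_valid_buffer)
    {
        if (evicted_valid_buffer)
            pressure.alloc_by_eviction++;
        else
            pressure.alloc_from_freelist++;
    }

    int
    main(void)
    {
        count_alloc(1);
        return pressure.alloc_by_eviction == 1 ? 0 : 1;
    }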
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, 31 May 2012, Robert Haas wrote: > Oh, ho. So from this we can see that the problem is that we're getting > huge amounts of spinlock contention when pinning and unpinning index > pages. It would be nice to have a self-contained reproducible test case > for this, so that we could experiment with it on other systems. I created it a few days ago: http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php It is still valid, and it is exactly what I'm using to test. The only thing to change is to create a two-column index and drop another index. The scripts are precisely the ones I'm using now. The problem is that in order to see a really big slowdown (10 times slower than a single thread) I've had to raise shared_buffers to 48GB, but it was slow for smaller shared buffer settings as well. I'm not sure how sensitive the test is to the hardware, though. Cheers, S * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers
chinnaobi wrote: > You mean when the primary which is going to switch its role to > standby might not have sent all the WAL records to the standby, and > if it is switched to standby it has more WAL records than the > standby which now serves as primary. Is it ?? What happens when there is a network fault between the primary and the standby, but not between the primary and some of the clients updating it? Similarly, if this is asynchronous replication, what if there have been commits on the primary which were still in the network buffer when the primary crashed? Clean automated failover is not a trivial task. If you are writing your own, it would be best to follow the steps recommended in the documentation rather closely. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 10:23 AM, Sergey Koposov wrote: > On Thu, 31 May 2012, Robert Haas wrote: >> >> >> Thanks. How did you generate this perf report? It's cool, because I >> haven't figured out how to make perf generate a report that is easily >> email-able, and it seems you have. > > > I did pretty much what you have said, e.g. > attached it to running process by > perf record -g -p PID > and then > perf report -g > output > > And postgresql was compiled with cflags=-g > >> >> The only trouble is that there's no call stack information here for >> s_lock or PinBuffer, which is what I really want. It seems to have >> spit out call stack information only for the kernel functions, and not >> for user functions. > > > Yes, I forgot to clean the old binaries when recompiled with cflags=-g. > So now it is fixed. I attach the updated perf report (i.e. the first 1 > lines of it to reduce the file size). That's basically what we needed. The sequential scans driving the index scans are all simultaneously pointing at the same couple of pages in the index, which they are constantly pinning and unpinning -- the database is schizophrenically going back and forth between the key pages in the index being and not being allowed to be candidates for buffer eviction. Raising shared buffers doesn't help because it's the marking of buffers as available for eviction, not the eviction process itself, that is the problem. IOS doesn't help because it's the index relation buffers, not the heap buffers, that we are binding up on (although if the btree were fixed, it's entirely possible the problem could head right back to the heap for non-IOS scans). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
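The failure mode described here is easy to demonstrate outside the server. Below is a toy standalone C program -- not PostgreSQL code, just the same test-and-set spinlock pattern that s_lock() and the buffer-header lock use -- in which every thread "pins" and "unpins" a single shared buffer header:

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 8
    #define PINS_PER_THREAD 1000000

    /* one shared "buffer header": a spinlock protecting a pin count */
    static volatile int hdr_lock = 0;
    static int refcount = 0;

    static void toy_s_lock(volatile int *lock)
    {
        /* test-and-set with a read-only spin loop, like s_lock() */
        while (__sync_lock_test_and_set(lock, 1))
            while (*lock)
                ;               /* spin until it looks free, then retry */
    }

    static void toy_s_unlock(volatile int *lock)
    {
        __sync_lock_release(lock);
    }

    static void *worker(void *arg)
    {
        int i;

        for (i = 0; i < PINS_PER_THREAD; i++)
        {
            toy_s_lock(&hdr_lock);      /* "PinBuffer": bump the shared count */
            refcount++;
            toy_s_unlock(&hdr_lock);

            toy_s_lock(&hdr_lock);      /* "UnpinBuffer": drop it again */
            refcount--;
            toy_s_unlock(&hdr_lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t threads[NTHREADS];
        int i;

        for (i = 0; i < NTHREADS; i++)
            pthread_create(&threads[i], NULL, worker, NULL);
        for (i = 0; i < NTHREADS; i++)
            pthread_join(threads[i], NULL);

        printf("final refcount: %d (should be 0)\n", refcount);
        return 0;
    }

Built with something like gcc -O2 -pthread, throughput drops sharply as the thread count rises, even though almost no work happens under the lock: the cost is cache-line ping-pong on the lock word itself, which is why s_lock dominates the profile.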
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 11:23 AM, Sergey Koposov wrote: > On Thu, 31 May 2012, Robert Haas wrote: >> >> Thanks. How did you generate this perf report? It's cool, because I >> haven't figured out how to make perf generate a report that is easily >> email-able, and it seems you have. > > I did pretty much what you have said, e.g. > attached it to running process by > perf record -g -p PID > and then > perf report -g > output Ah, interesting. I never tried sending the output to a file. > And postgresql was compiled with cflags=-g >> >> The only trouble is that there's no call stack information here for >> s_lock or PinBuffer, which is what I really want. It seems to have >> spit out call stack information only for the kernel functions, and not >> for user functions. > > Yes, I forgot to clean the old binaries when recompiled with cflags=-g. > So now it is fixed. I attach the updated perf report (i.e. the first 1 > lines of it to reduce the file size). Oh, ho. So from this we can see that the problem is that we're getting huge amounts of spinlock contention when pinning and unpinning index pages. It would be nice to have a self-contained reproducible test case for this, so that we could experiment with it on other systems. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Draft release notes complete
On Thu, May 31, 2012 at 5:55 PM, Bruce Momjian wrote: > On Tue, May 15, 2012 at 12:57:37PM -0400, Magnus Hagander wrote: >> On Fri, May 11, 2012 at 11:44 AM, Andrew Dunstan wrote: >> > >> > >> > On 05/11/2012 05:32 AM, Magnus Hagander wrote: >> >> >> >> >> >> But in the interest of actually being productive - what *is* the >> >> usecase for needing a 5 minute turnaround time? I don't buy the "check >> >> what a patch looks like", because that should be done *before* the >> >> commit, not after - so it's best verified by a local docs build anyway >> >> (which will also be faster). >> >> >> >> I'm sure we can put something in with a pretty quick turnaround again >> >> without too much strain on the system, but it does, as I mentioned >> >> before, require decoupling it from the buildfarm which means it's not >> >> just tweaking a config file. >> > >> > >> > If it's of any use to you I have made some adjustments to the buildfarm >> > code >> > which would let you do *just* the docs build (and dist make if you want). >> > It >> > would still pull from git, and only do anything if there's a (relevant) >> > change. So using that to set up a machine that would run every few minutes >> > might work. Of course, building the docs can itself be fairly compute >> > intensive, so you still might not want to run every few minutes if that's a >> > limiting factor. >> >> that would definitely be useful. Compute intensive is not really a >> problem, we can easily shape the box on that (and I think we already >> do). >> >> Do you have some details of what to do and how to do it to use that, >> so Stefan can set it up for us ? ;) > > Where are we on building the development docs more frequently? Still waiting for details on how it works to set that up on the buildfarm client. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Draft release notes complete
On Tue, May 15, 2012 at 12:57:37PM -0400, Magnus Hagander wrote: > On Fri, May 11, 2012 at 11:44 AM, Andrew Dunstan wrote: > > > > > > On 05/11/2012 05:32 AM, Magnus Hagander wrote: > >> > >> > >> But in the interest of actually being productive - what *is* the > >> usecase for needing a 5 minute turnaround time? I don't buy the "check > >> what a patch looks like", because that should be done *before* the > >> commit, not after - so it's best verified by a local docs build anyway > >> (which will also be faster). > >> > >> I'm sure we can put something in with a pretty quick turnaround again > >> without too much strain on the system, but it does, as I mentioned > >> before, require decoupling it from the buildfarm which means it's not > >> just tweaking a config file. > > > > > > If it's of any use to you I have made some adjustments to the buildfarm code > > which would let you do *just* the docs build (and dist make if you want). It > > would still pull from git, and only do anything if there's a (relevant) > > change. So using that to set up a machine that would run every few minutes > > might work. Of course, building the docs can itself be fairly compute > > intensive, so you still might not want to run every few minutes if that's a > > limiting factor. > > that would definitely be useful. Compute intensive is not really a > problem, we can easily shape the box on that (and I think we already > do). > > Do you have some details of what to do and how to do it to use that, > so Stefan can set it up for us ? ;) Where are we on building the development docs more frequently? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 31 May 2012 16:23, Tom Lane wrote: > Simon Riggs writes: >> In what way is it possibly destabilising? > > I'm prepared to believe that it only affects performance, but it could > be destabilizing to that. It needs proper review and testing, and the > next CF is the right environment for that to happen. It couldn't possibly be as destabilising to performance as commit_delay was in 9.1. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index concurrency
On Wed, May 30, 2012 at 12:21:33AM -0400, Tom Lane wrote: > Robert Haas writes: > > On Tue, May 29, 2012 at 11:21 PM, Jeff Janes wrote: > >> 2) Only support bitmap scans and not ordinary tid scans (the way gin > >> indexes already do). > > > -1 on losing amgettuple. I regret that we lost that for GIN and I > > shall regret it more if we lose it anywhere else. > > Not sure that's all that big a deal for hash. IIRC the only reasons to > want it are for index-only scans (not possible anyway with hash) and > exclusion constraints (for which you might as well use a btree, or plain > index-supported uniqueness if hash had that). It does support uniqueness via EXCLUDE constraints, so it could have it natively with what, as far as I've been able to tell, would be a relatively small amount of coding. dfetter@dfetter:5492=# CREATE TABLE foo(i TEXT, EXCLUDE USING HASH(i WITH =)); NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo_i_excl" for table "foo" CREATE TABLE dfetter@dfetter:5492=# insert into foo VALUES (1),(1); ERROR: conflicting key value violates exclusion constraint "foo_i_excl" DETAIL: Key (i)=(1) conflicts with existing key (i)=(1). Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 31 May 2012 16:26, Peter Geoghegan wrote: > On 31 May 2012 16:23, Tom Lane wrote: >> Simon Riggs writes: >>> In what way is it possibly destabilising? >> >> I'm prepared to believe that it only affects performance, but it could >> be destabilizing to that. It needs proper review and testing, and the >> next CF is the right environment for that to happen. > > It couldn't possibly be as destabilising to performance as > commit_delay was in 9.1. Furthermore, it couldn't possibly affect performance in any way unless commit_delay is set. I've just moved the delay site so that it's only executed by the group commit leader. The leader would execute the code anyway, but now the followers don't. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
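For readers following along, the shape of the change being described is roughly the following -- a sketch using the 9.2-era names (CommitDelay, CommitSiblings, MinimumActiveBackends(), pg_usleep()), not the actual patch:

    /*
     * In the WAL flush path, executed only by the backend that has just
     * become the group commit leader -- i.e. the one that acquired
     * WALWriteLock and will flush on behalf of everyone queued behind it.
     */
    if (CommitDelay > 0 && enableFsync &&
        MinimumActiveBackends(CommitSiblings))
        pg_usleep(CommitDelay);     /* let more commits join this batch */

    /*
     * ...then recompute the flush point and do the single write/fsync,
     * which now also covers any followers that arrived during the sleep.
     */

The point of moving the sleep site is that followers no longer each pay the delay: only the leader waits, and everything it flushes afterwards benefits from the wait.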
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 12:25 PM, Tom Lane wrote: >> No, Tatsuo's patch attacks a phase dominated by latency in some >> setups. > > No, it does not. The reason it's a win is that it avoids the O(N^2) > behavior in the server. Whether the bandwidth savings is worth worrying > about cannot be proven one way or the other as long as that elephant > is in the room. > > regards, tom lane I understand that, but if the locking is fixed and made to be O(N) (and hence each table locking O(1)), then latency suddenly becomes the dominating factor. I'm thinking, though, that pg_upgrade runs locally, contrary to pg_dump backups, so in that case latency would be negligible and Tatsuo's patch inconsequential. I'm also wondering whether the ResourceOwner patch you've proposed would get negated by Tatsuo's patch, because suddenly a "portal" (IIRC) has a lot more locks than ResourceOwner could accommodate, forcing a reversal to O(N^2) behavior. In that case, that patch would in fact be detrimental... huh... way to go 180 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Claudio Freire writes: > On Thu, May 31, 2012 at 11:50 AM, Tom Lane wrote: >> The performance patches we applied to pg_dump over the past couple weeks >> were meant to relieve pain in situations where the big server-side >> lossage wasn't the dominant factor in runtime (ie, partial dumps). >> But this one is targeting exactly that area, which is why it looks like >> a band-aid and not a fix to me. > No, Tatsuo's patch attacks a phase dominated by latency in some > setups. No, it does not. The reason it's a win is that it avoids the O(N^2) behavior in the server. Whether the bandwidth savings is worth worrying about cannot be proven one way or the other as long as that elephant is in the room. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 31 May 2012 14:58, Robert Haas wrote: > Fixing regressions before release is essential; improving performance > is not - especially when the improvement relates to a little-used > feature that you were proposing to get rid of two weeks ago. Yes, the fact that I wanted to get rid of commit_delay is well established - I called for its deprecation in a dedicated thread, and during my talk at pgCon. Bruce's confusion as to how that interacted with what I've been calling "new group commit" was actually what crystallised my position here: it is trying, for the most part, to do the same thing as new group commit, but in an entirely orthogonal way. Bruce's confusion actually reflected the confusion of the code. So I'm in a sense removing the overlap between what commit_delay used to do but shouldn't try to do anymore (making commits coincide, giving good benchmark results) and what new group commit now does, while preserving commit_delay's ability to trade off latency for throughput. I didn't have an answer to the question of how we might continue to offer a throughput/latency trade-off to users before, but knew that with 9.2, commit_delay was totally ineffective anyway. The realisation that it could be made effective by working with rather than against new group commit changed my mind. > It can't simultaneously be so unimportant that we should remove it altogether > and so important that it's must-fix-before-release, and if one test > can completely overturn your view of which category this falls into, > that seems like a reason for taking some more time to think it over > and, perhaps, run more tests. We don't have a lot of latitude to > maneuver at this point - anything we do now is going to go straight > out into the wild. Caution is appropriate. The patch can be justified as a way of removing the tension between new group commit and commit_delay. Removing commit_delay would also do this, but then there'd be no way to make the aforementioned trade-off that we previously offered. I suspect that if it restored the peaks and valleys of commit_delay's changes to throughput in 9.1, over and above a new group commit baseline, this would be more readily accepted. I hope the patch isn't being punished for being effective. Yes, it does offer a large boost to performance, but that happens to be incidental, unlikely though that sounds. You've called this a clever idea. I actually don't agree. I was fairly surprised that no one noticed this earlier. It is rather obviously the case that a delay that hopes to maximise the batching of commits at the expense of latency should occur only in a single leader backend that will proceed with the flush for the batch, and not within each and every backend as it commits. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
Simon Riggs writes: > On 31 May 2012 15:00, Tom Lane wrote: >> If we want to finish the beta cycle in a reasonable time period and get >> back to actual development, we have to refrain from adding more >> possibly-destabilizing development work to 9.2. And that is what >> this is. > In what way is it possibly destabilising? I'm prepared to believe that it only affects performance, but it could be destabilizing to that. It needs proper review and testing, and the next CF is the right environment for that to happen. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 11:04:12AM -0400, Robert Haas wrote: > On Thu, May 31, 2012 at 10:50 AM, Tom Lane wrote: > > Robert Haas writes: > >> On Thu, May 31, 2012 at 10:31 AM, Tom Lane wrote: > >>> I'm not; Jeff Janes is. But you shouldn't be holding your breath > >>> anyway, since it's 9.3 material at this point. > > > >> I agree we can't back-patch that change, but then I think we ought to > >> consider back-patching some variant of Tatsuo's patch. Maybe it's not > >> reasonable to thunk an arbitrary number of relation names in there on > >> one line, but how about 1000 relations per LOCK statement or so? I > >> guess we'd need to see how much that erodes the benefit, but we've > >> certainly done back-branch rearrangements in pg_dump in the past to > >> fix various kinds of issues, and this is pretty non-invasive. > > > > I am not convinced either that this patch will still be useful after > > Jeff's fix goes in, ... > > But people on older branches are not going to GET Jeff's fix. FYI, if it got into Postgres 9.2, everyone upgrading to Postgres 9.2 would benefit because pg_upgrade uses the new cluster's pg_dumpall. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 10:50 AM, Tom Lane wrote: > Robert Haas writes: >> On Thu, May 31, 2012 at 10:31 AM, Tom Lane wrote: >>> I'm not; Jeff Janes is. But you shouldn't be holding your breath >>> anyway, since it's 9.3 material at this point. > >> I agree we can't back-patch that change, but then I think we ought to >> consider back-patching some variant of Tatsuo's patch. Maybe it's not >> reasonable to thunk an arbitrary number of relation names in there on >> one line, but how about 1000 relations per LOCK statement or so? I >> guess we'd need to see how much that erodes the benefit, but we've >> certainly done back-branch rearrangements in pg_dump in the past to >> fix various kinds of issues, and this is pretty non-invasive. > > I am not convinced either that this patch will still be useful after > Jeff's fix goes in, ... But people on older branches are not going to GET Jeff's fix. > or that it provides any meaningful savings when > you consider a complete pg_dump run. Yeah, it will make the lock > acquisition phase faster, but that's not a big part of the runtime > except in very limited scenarios (--schema-only, perhaps). That is not a borderline scenario, as others have also pointed out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 11:50 AM, Tom Lane wrote: > The performance patches we applied to pg_dump over the past couple weeks > were meant to relieve pain in situations where the big server-side > lossage wasn't the dominant factor in runtime (ie, partial dumps). > But this one is targeting exactly that area, which is why it looks like > a band-aid and not a fix to me. No, Tatsuo's patch attacks a phase dominated by latency in some setups. That it's also becoming slow currently because of the locking cost is irrelevant; with locking sped up, the patch should only improve that phase even further. Imagine the current timeline: * = locking . = waiting *.*.**.**.***.***...*. Tatsuo's patch converts it to: *.** The locking fix would turn the timeline into: *.*.*.*.*.*.* Tatsuo's patch would turn that into: *** And, as noted before, pg_dump --schema-only is a key bottleneck in pg_upgrade. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 10:50:51AM -0400, Tom Lane wrote: > Robert Haas writes: > > On Thu, May 31, 2012 at 10:31 AM, Tom Lane wrote: > >> I'm not; Jeff Janes is. But you shouldn't be holding your breath > >> anyway, since it's 9.3 material at this point. > > > I agree we can't back-patch that change, but then I think we ought to > > consider back-patching some variant of Tatsuo's patch. Maybe it's not > > reasonable to thunk an arbitrary number of relation names in there on > > one line, but how about 1000 relations per LOCK statement or so? I > > guess we'd need to see how much that erodes the benefit, but we've > > certainly done back-branch rearrangements in pg_dump in the past to > > fix various kinds of issues, and this is pretty non-invasive. > > I am not convinced either that this patch will still be useful after > Jeff's fix goes in, or that it provides any meaningful savings when > you consider a complete pg_dump run. Yeah, it will make the lock > acquisition phase faster, but that's not a big part of the runtime > except in very limited scenarios (--schema-only, perhaps). FYI, that is the pg_upgrade use-case, and pg_dump/restore time is reportedly taking the majority of time in many cases. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 31 May 2012 15:00, Tom Lane wrote: > Simon Riggs writes: >> On 31 May 2012 13:16, Robert Haas wrote: >>> Frankly, I think this whole thing should be pushed to 9.3. > >> What matters is that we have a patch that provides a massive >> performance gain in write performance in just a few lines of code, and >> that should be committed to 9.2. > > I agree with Robert on this. This patch hasn't had *nearly* enough > testing to justify cramming it into 9.2 at this point. AFAIK the > claim of "massive performance gain" is based on a single test case run > by a single person, which doesn't even give me any confidence that it > doesn't break anything, much less that it's a win across the board. I agree with you. You would be mistaken if you thought that I think Peter's laptop was sufficient proof for anyone to commit something and I've already said exactly that to him. My description of "massive performance gain" is appropriate based on the measurements so far. > If we want to finish the beta cycle in a reasonable time period and get > back to actual development, we have to refrain from adding more > possibly-destabilizing development work to 9.2. And that is what > this is. In what way is it possibly destabilising? I see nothing in the patch to merit that claim, so presumably you haven't read the patch yet? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Robert Haas writes: > On Thu, May 31, 2012 at 10:31 AM, Tom Lane wrote: >> I'm not; Jeff Janes is. But you shouldn't be holding your breath >> anyway, since it's 9.3 material at this point. > I agree we can't back-patch that change, but then I think we ought to > consider back-patching some variant of Tatsuo's patch. Maybe it's not > reasonable to thunk an arbitrary number of relation names in there on > one line, but how about 1000 relations per LOCK statement or so? I > guess we'd need to see how much that erodes the benefit, but we've > certainly done back-branch rearrangements in pg_dump in the past to > fix various kinds of issues, and this is pretty non-invasive. I am not convinced either that this patch will still be useful after Jeff's fix goes in, or that it provides any meaningful savings when you consider a complete pg_dump run. Yeah, it will make the lock acquisition phase faster, but that's not a big part of the runtime except in very limited scenarios (--schema-only, perhaps). The performance patches we applied to pg_dump over the past couple weeks were meant to relieve pain in situations where the big server-side lossage wasn't the dominant factor in runtime (ie, partial dumps). But this one is targeting exactly that area, which is why it looks like a band-aid and not a fix to me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Interface of Row Level Security
On Wed, May 30, 2012 at 3:26 PM, Kohei KaiGai wrote: > My preference is RLSBYPASS permission rather than the approach > with functions that return policy clause at run-time, because it needs > to invalidate prepared statement at random timing. > In case of this function approach, the RLS policy shall be generated > on planner stage, and we cannot have any assumption to the criteria > of RLS policy. A function might generate RLS policy regarding to the > current user id. Yes, it is straightforward. The prepared statement > should be invalidate whenever current user-id got switched. > However, someone may define a function that generate RLS policy > depending on the value of "client_min_messages" for example. > Do we need to invalidate prepared statement whenever GUC get > updated? I think it is overkill. We cannot predicate all the criteria > user want to control the RLS policy using the functions. > So, RLSBYPASS permission is more simple way to limit number of > situations to invalidate prepared statements. That's a good point. > If we would have an "ideal optimizer", I'd still like the optimizer to > wipe out redundant clauses transparently, rather than RLSBYPASS > permissions, because it just controls all-or-nothing stuff. > For example, if tuples are categorized to unclassified, classified or > secret, and RLS policy is configured as: > ((current_user IN ('alice', 'bob') AND X IN ('unclassified', > 'classified')) OR (X IN 'unclassified)), > superuser can see all the tuples, and alice and bob can see > up to classified tuples. > Is it really hard to wipe out redundant condition at planner stage? > If current_user is obviously 'kaigai', it seems to me the left-side of > this clause can be wiped out at the planner stage. > Do I consider the issue too simple? Yes. :-) There are two problems. First, if using the extended query protocol (e.g. pgbench -M prepared) you can prepare a statement just once and then execute it multiple times. In this case, stable functions cannot be constant-folded at plan time, because they are only guaranteed to remain constant for a *single* execution of the query, not for all executions of the query. So any optimization in this area would have to be limited to cases where the simple query protocol is used. I think that might still be worth doing, but it's a significant limitation, to be sure. Second, at present, there is no guarantee that the snapshot used for planning the query is the same as the snapshot used for executing the query, though commit d573e239f03506920938bf0be56c868d9c3416da made that happen in some common cases. If we were to do constant-folding of stable functions using the planner snapshot, it would represent a behavior change from previous releases. I am not clear whether that has any real-world consequences that we should be worried about. It seems to me that the path of least resistance might be to refactor the portal stuff so that we can provide a uniform guarantee that, when using the simple query protocol, the planner and executor snapshots will be the same ... but I might be wrong. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 10:31 AM, Tom Lane wrote: > Claudio Freire writes: >> It's not clear whether Tom is already working on that O(N^2) fix in locking. > > I'm not; Jeff Janes is. But you shouldn't be holding your breath > anyway, since it's 9.3 material at this point. I agree we can't back-patch that change, but then I think we ought to consider back-patching some variant of Tatsuo's patch. Maybe it's not reasonable to thunk an arbitrary number of relation names in there on one line, but how about 1000 relations per LOCK statement or so? I guess we'd need to see how much that erodes the benefit, but we've certainly done back-branch rearrangements in pg_dump in the past to fix various kinds of issues, and this is pretty non-invasive. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
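A sketch of what that batching might look like on the pg_dump side -- hypothetical code: TABLES_PER_LOCK_STMT and the pre-quoted qualified_name[] array are invented for illustration, while appendPQExpBufferStr(), resetPQExpBuffer(), and ExecuteSqlStatement() are pg_dump's existing helpers:

    #define TABLES_PER_LOCK_STMT 1000   /* hypothetical batch size */

    resetPQExpBuffer(query);
    for (i = 0; i < ntups; i++)
    {
        if (i % TABLES_PER_LOCK_STMT == 0)
        {
            if (i > 0)
            {
                /* close out and send the previous batch */
                appendPQExpBufferStr(query, " IN ACCESS SHARE MODE");
                ExecuteSqlStatement(fout, query->data);
                resetPQExpBuffer(query);
            }
            appendPQExpBufferStr(query, "LOCK TABLE ");
        }
        else
            appendPQExpBufferStr(query, ", ");
        appendPQExpBufferStr(query, qualified_name[i]);
    }
    if (ntups > 0)
    {
        appendPQExpBufferStr(query, " IN ACCESS SHARE MODE");
        ExecuteSqlStatement(fout, query->data);
    }

That caps the number of round trips at roughly ntups/1000 while still taking one lock per table, so it bounds statement length without changing the locking semantics.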
Re: [HACKERS] Issues with MinGW W64
On Thu, May 31, 2012 at 8:15 AM, Johann 'Myrkraverk' Oskarsson wrote: >>>> MinGW W64's sys/stat.h #defines stat to be _stati64 and there is >>>> subsequently a compilation error in port.h: >>>> >>>> note: expected 'struct _stati64 *' but argument is of type 'struct >>>> stat *' error: conflicting types for 'pgwin32_safestat' >> >>> In this case, I really think we ought to change all backend calls >>> that hit stat() to use something like pgstat() instead. >> >> I disagree with this conclusion. That'd be an unnecessarily >> nonstandard notation, which all existing and future developers would >> have to learn. I'd rather work around this in port.h if at all >> possible. I'm not quite sure why the existing code fails, though --- >> is there a conflict between "#define stat" and "#define stat(a,b)"? > > I wouldn't know, the compiler is GCC 4.6.3 here (any 4.5+ will do I > think) so all the usual GCC macro magic should be working. > > Is this something to discuss with the MinGW W64 team? My viewpoint on this (which is different than Tom's) is that we're probably not entitled to assume anything about what the system header files do with respect to stat. On some systems, they might just have a function prototype, while others might define stat or stat() as a macro. It seems to me that our source code is hoping for a function definition rather than a macro definition and falling over when that's not how it is. I don't see that as very reasonable, unless we have some basis for believing that the OS isn't entitled to define stat as a macro rather than a function, and maybe not even then. We have plenty of other places where we use our own wrapper function in lieu of OS facilities for various reasons (e.g. BasicOpenFile) and I don't think adding one more is a big deal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
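To illustrate the wrapper idea, a minimal sketch -- the name pgstat() is the hypothetical one floated above, and pgwin32_safestat() is the existing Windows shim declared in port.h; this is not code from the tree:

    #include <sys/stat.h>

    /*
     * Route every backend stat() call through one function, so that
     * whatever the CRT does to the name (e.g. #define stat _stati64)
     * is confined to code that includes the same headers and therefore
     * sees a consistent struct type.
     */
    static int
    pgstat(const char *path, struct stat *buf)
    {
    #ifdef WIN32
        return pgwin32_safestat(path, buf);
    #else
        return stat(path, buf);
    #endif
    }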
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Claudio Freire writes: > It's not clear whether Tom is already working on that O(N^2) fix in locking. I'm not; Jeff Janes is. But you shouldn't be holding your breath anyway, since it's 9.3 material at this point. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 11:17 AM, Robert Klemme wrote: > > OK, my fault was to assume you wanted to measure only your part, while > apparently you meant overall savings. But Tom had asked for separate > measurements if I understood him correctly. Also, that measurement of > your change would go after the O(N^2) fix. It could actually turn out > to be much more than 9% because the overall time would be reduced even > more dramatic. So it might actually be good for your fix to wait a > bit. ;-) It's not clear whether Tom is already working on that O(N^2) fix in locking. I'm asking because it doesn't seem like a complicated patch, contributors may want to get working if not ;-) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 7:31 AM, Sergey Koposov wrote: > On Wed, 30 May 2012, Robert Haas wrote: > >> I'd really like to find out exactly where all those s_lock calls are >> coming from. Is there any way you can get oprofile to output a >> partial stack backtrace? If you have perf it's very easy, just 'perf >> record -g -a ' and then 'perf report >> -g'. > > > I repeated my test with 8 threads (without tasksetting) and with > sharedbuffers=48g (because that seemed to trigger in particular long times ~ > 80 seconds). And I attach the perf report. Thanks. How did you generate this perf report? It's cool, because I haven't figured out how to make perf generate a report that is easily email-able, and it seems you have. The only trouble is that there's no call stack information here for s_lock or PinBuffer, which is what I really want. It seems to have spit out call stack information only for the kernel functions, and not for user functions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 4:07 PM, Tatsuo Ishii wrote: >> On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii wrote: >>> Just for record, I rerun the test again with my single-LOCK patch, and >>> now total runtime of pg_dump is 113 minutes. >>> 188 minutes(9.0)->125 minutes(git master)->113 minutes(with my patch). >>> >>> So far, I'm glad to see 40% time savings at this point. >> >> I see only 9.6% savings (100 * (113/125 - 1)). What am I missing? > > What I meant was (100 * (113/188 - 1)). OK, my fault was to assume you wanted to measure only your part, while apparently you meant overall savings. But Tom had asked for separate measurements if I understood him correctly. Also, that measurement of your change would go after the O(N^2) fix. It could actually turn out to be much more than 9% because the overall time would be reduced even more dramatic. So it might actually be good for your fix to wait a bit. ;-) Kind regards robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
On Thu, May 31, 2012 at 9:51 AM, Tom Lane wrote: > Robert Haas writes: >> The one thing that still seems a little odd to me is that this caused >> a pin count to get orphaned. It seems reasonable that ignoring the >> AccessExclusiveLock could result in not-found errors trying to open a >> missing relation, and even fsync requests on a missing relation. But >> I don't see why that would cause the backend-local pin counts to get >> messed up, which makes me wonder if there really is another bug here >> somewhere. > > According to Heikki's log, the Assert was in the startup process itself, > and it happened after an error: > >> 2012-05-26 10:44:28.587 CEST 10270 FATAL: could not open file >> "base/21268/32994": No such file or directory >> 2012-05-26 10:44:28.588 CEST 10270 CONTEXT: writing block 2508 of relation >> base/21268/32994 >> xlog redo multi-insert (init): rel 1663/21268/33006; blk 3117; 58 >> tuples >> TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: >> 1741) >> 2012-05-26 10:44:31.131 CEST 10269 LOG: startup process (PID 10270) was >> terminated by signal 6: Aborted > > I don't think that code is meant to recover from errors anyway, so > the fact that it fails with a pin count held isn't exactly surprising. > But it might be worth looking at exactly which on_proc_exit callbacks > are installed in the startup process and what assumptions they make. Which code isn't meant to recover from errors? > As for where the error came from in the first place, it's easy to > imagine somebody who's not got the word about the AccessExclusiveLock > reading pages of the table into buffers that have already been scanned > by the DROP. So you'd end up with orphaned buffers belonging to a > vanished table. If somebody managed to dirty them by setting hint bits > (we do allow that in HS mode no?) then later you'd have various processes > trying to write the buffer before recycling it, which seems to fit the > reported error. Right, I understand the other errors. It's just the pin count that I am a bit confused about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
> On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii wrote: >> Just for record, I rerun the test again with my single-LOCK patch, and >> now total runtime of pg_dump is 113 minutes. >> 188 minutes(9.0)->125 minutes(git master)->113 minutes(with my patch). >> >> So far, I'm glad to see 40% time savings at this point. > > I see only 9.6% savings (100 * (113/125 - 1)). What am I missing? What I meant was (100 * (113/188 - 1)). -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
Simon Riggs writes: > On 31 May 2012 13:16, Robert Haas wrote: >> Frankly, I think this whole thing should be pushed to 9.3. > What matters is that we have a patch that provides a massive > performance gain in write performance in just a few lines of code, and > that should be committed to 9.2. I agree with Robert on this. This patch hasn't had *nearly* enough testing to justify cramming it into 9.2 at this point. AFAIK the claim of "massive performance gain" is based on a single test case run by a single person, which doesn't even give me any confidence that it doesn't break anything, much less that it's a win across the board. If we want to finish the beta cycle in a reasonable time period and get back to actual development, we have to refrain from adding more possibly-destabilizing development work to 9.2. And that is what this is. Add it to the upcoming CF, please. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On Thu, May 31, 2012 at 8:38 AM, Peter Geoghegan wrote: > On 31 May 2012 13:16, Robert Haas wrote: >> On Thu, May 31, 2012 at 6:19 AM, Simon Riggs wrote: >> Frankly, I think this whole thing should be pushed to 9.3. The >> commit_delay and commit_siblings knobs suck, but they've sucked for a >> long time, and it won't kill anybody to wait another release cycle to >> fix them. We have plenty of more important things queued up for 9.3 >> already, and I don't believe there's any compelling reason to think >> that this particular thing needs preferential treatment. > > Why do you think that? Those knobs are now quite ineffective, though > we never even considered that when the group commit delay patch was > committed. The entire body of research and commentary that exists on > commit_delay has been invalidated for 9.2. If that isn't something > that needs to be addressed before release, I don't know what is. The > fact that the patch can sometimes double transaction throughput for an > absolutely trivial change, moving 2 lines of code, is also a good > reason to not bump this for another year. Fixing regressions before release is essential; improving performance is not - especially when the improvement relates to a little-used feature that you were proposing to get rid of two weeks ago. It can't simultaneously be so unimportant that we should remove it altogether and so important that it's must-fix-before-release, and if one test can completely overturn your view of which category this falls into, that seems like a reason for taking some more time to think it over and, perhaps, run more tests. We don't have a lot of latitude to maneuver at this point - anything we do now is going to go straight out into the wild. Caution is appropriate. However, rather than arguing about it, let's see if anyone else has an opinion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
Robert Haas writes: > The one thing that still seems a little odd to me is that this caused > a pin count to get orphaned. It seems reasonable that ignoring the > AccessExclusiveLock could result in not-found errors trying to open a > missing relation, and even fsync requests on a missing relation. But > I don't see why that would cause the backend-local pin counts to get > messed up, which makes me wonder if there really is another bug here > somewhere. According to Heikki's log, the Assert was in the startup process itself, and it happened after an error: > 2012-05-26 10:44:28.587 CEST 10270 FATAL: could not open file > "base/21268/32994": No such file or directory > 2012-05-26 10:44:28.588 CEST 10270 CONTEXT: writing block 2508 of relation > base/21268/32994 > xlog redo multi-insert (init): rel 1663/21268/33006; blk 3117; 58 > tuples > TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: > 1741) > 2012-05-26 10:44:31.131 CEST 10269 LOG: startup process (PID 10270) was > terminated by signal 6: Aborted I don't think that code is meant to recover from errors anyway, so the fact that it fails with a pin count held isn't exactly surprising. But it might be worth looking at exactly which on_proc_exit callbacks are installed in the startup process and what assumptions they make. As for where the error came from in the first place, it's easy to imagine somebody who's not got the word about the AccessExclusiveLock reading pages of the table into buffers that have already been scanned by the DROP. So you'd end up with orphaned buffers belonging to a vanished table. If somebody managed to dirty them by setting hint bits (we do allow that in HS mode no?) then later you'd have various processes trying to write the buffer before recycling it, which seems to fit the reported error. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii wrote: > Just for record, I rerun the test again with my single-LOCK patch, and > now total runtime of pg_dump is 113 minutes. > 188 minutes(9.0)->125 minutes(git master)->113 minutes(with my patch). > > So far, I'm glad to see 40% time savings at this point. I see only 9.6% savings (100 * (113/125 - 1)). What am I missing? Cheers robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap_form_tuple crashing
Atri Sharma writes: > My call to heap_form_tuple looks like: > val1=0; > tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att); > tuple=heap_form_tuple(tupledesc1,p1,&val1); > p1 is a pointer to a Datum instance which is created from a char array. Does that actually match the tupdesc you're using? Are you sure you created the Datum correctly (ie, did you call the appropriate datatype input routine)? BTW, the BlessTupleDesc call here seems to be pure cargo-cult programming. It should not be necessary to bless a relation's tupdesc (because that should be a named type already); and even if it were, heap_form_tuple doesn't care. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
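For reference, the usual correct pattern in a composite-returning C function looks roughly like this -- a sketch against the 9.2-era API, with the function name and column values invented for illustration:

    #include "postgres.h"
    #include "fmgr.h"
    #include "funcapi.h"
    #include "access/htup.h"        /* heap_form_tuple (9.2) */
    #include "utils/builtins.h"     /* CStringGetTextDatum */

    PG_FUNCTION_INFO_V1(make_row);

    Datum
    make_row(PG_FUNCTION_ARGS)
    {
        TupleDesc   tupdesc;
        Datum       values[2];
        bool        nulls[2] = {false, false};
        HeapTuple   tuple;

        /* the tuple descriptor comes from the declared result type */
        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
            elog(ERROR, "return type must be a row type");

        /*
         * Each Datum is built with the appropriate conversion routine,
         * never by casting a raw char array.
         */
        values[0] = Int32GetDatum(42);
        values[1] = CStringGetTextDatum("hello");

        tuple = heap_form_tuple(BlessTupleDesc(tupdesc), values, nulls);
        PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
    }

Here BlessTupleDesc() does earn its keep, because a RECORD-returning function's descriptor may be anonymous; as noted above, it is pointless for a relation's own tupdesc.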
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 31 May 2012 13:16, Robert Haas wrote: > On Thu, May 31, 2012 at 6:19 AM, Simon Riggs wrote: > Frankly, I think this whole thing should be pushed to 9.3. The > commit_delay and commit_siblings knobs suck, but they've sucked for a > long time, and it won't kill anybody to wait another release cycle to > fix them. We have plenty of more important things queued up for 9.3 > already, and I don't believe there's any compelling reason to think > that this particular thing needs preferential treatment. Why do you think that? Those knobs are now quite ineffective, though we never even considered that when the group commit delay patch was committed. The entire body of research and commentary that exists on commit_delay has been invalidated for 9.2. If that isn't something that needs to be addressed before release, I don't know what is. The fact that the patch can sometimes double transaction throughput for an absolutely trivial change, moving 2 lines of code, is also a good reason to not bump this for another year. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 31 May 2012 13:16, Robert Haas wrote: > On Thu, May 31, 2012 at 6:19 AM, Simon Riggs wrote: >> I've looked at this more closely now and I can see that the call to >> XLogFlush() that is made from xact_redo_commit_internal() doesn't ever >> actually flush WAL, so whether we delay or not is completely >> irrelevant. >> >> So un-agreed. No change required to patch there. > > I think Peter's suggestion of forcibly setting the delay to 0 in the > startup process is a good one, though. It's one line of code, and if > it isn't strictly necessary today, it still seems like good > future-proofing. Adding a line that does nothing is not a good idea. The Startup process flushes very, very few WAL messages, so the setting is irrelevant. > I am not very happy about the idea of renaming commit_* to > group_commit_*. It's basically a cosmetic renaming, and breaking > existing configuration files for cosmetic purposes does not seem > warranted to me, especially when the old and new names are so close. > I certainly don't think we can do that in 9.2, now that beta1 has > already shipped. Modifying the default contents of postgresql.conf > after we've shipped beta has been a historical no-no for reasons that > escape me at the moment, but IIRC they're not stupid reasons. > > Frankly, I think this whole thing should be pushed to 9.3. The > commit_delay and commit_siblings knobs suck, but they've sucked for a > long time, and it won't kill anybody to wait another release cycle to > fix them. We have plenty of more important things queued up for 9.3 > already, and I don't believe there's any compelling reason to think > that this particular thing needs preferential treatment. No problem with pushing a variable rename through to 9.3. To be honest, I don't care whether we rename them or not. What matters is that we have a patch that provides a massive performance gain in write performance in just a few lines of code, and that should be committed to 9.2. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with MinGW W64
Tom Lane writes: > Robert Haas writes: >> On Tue, May 29, 2012 at 9:04 AM, Johann 'Myrkraverk' Oskarsson >> wrote: >>> The header file crtdefs.h in MinGW typedefs errcode which conflicts >>> with Postgres' elog.h. > >> Eep. Maybe this is not directly relevant, but I guess my first >> question is: why is MinGW doing that? > > I concur with Robert here: your first step should be to push back on > the MinGW developers about this nonstandard intrusion on application > namespace. We've been using errcode() as a function name since 2003, > and it's never been a problem before on any platform, including > previous versions of MinGW. I have contacted the MinGW W64 team on this. > If they won't change it, then we could consider some other hack, but > that should really be the first attempt. >>> MinGW W64's sys/stat.h #defines stat to be _stati64 and there is >>> subsequently a compilation error in port.h: >>> >>> note: expected 'struct _stati64 *' but argument is of type 'struct >>> stat *' error: conflicting types for 'pgwin32_safestat' > >> In this case, I really think we ought to change all backend calls >> that hit stat() to use something like pgstat() instead. > > I disagree with this conclusion. That'd be an unnecessarily > nonstandard notation, which all existing and future developers would > have to learn. I'd rather work around this in port.h if at all > possible. I'm not quite sure why the existing code fails, though --- > is there a conflict between "#define stat" and "#define stat(a,b)"? I wouldn't know, the compiler is GCC 4.6.3 here (any 4.5+ will do I think) so all the usual GCC macro magic should be working. Is this something to discuss with the MinGW W64 team? >>> There are a series of redefined macros from the MinGW W64 CRT. >>> ... >>> And possibly some more. Do we need these redefines? > >> We probably need them somewhere, or they wouldn't have been added. >> But maybe we don't need them on the exact platform you're using. > > Can we deal with this by just wrapping each #define in #ifndef? I'll take a look and make sure the #defines end up with the same values. If so I'll attach a patch for this. -- Johann Oskarsson http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services Blog: http://my.opera.com/myrkraverk/blog/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
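The #ifndef approach would look like this for each affected definition -- illustrated with one of the errno-style macros from the win32 port; whether these are exactly the macros in conflict is an assumption:

    /* in src/include/port/win32.h, guard anything the CRT may now supply */
    #ifndef EADDRINUSE
    #define EADDRINUSE WSAEADDRINUSE
    #endif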
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On Thu, May 31, 2012 at 6:19 AM, Simon Riggs wrote: > I've looked at this more closely now and I can see that the call to > XLogFlush() that is made from xact_redo_commit_internal() doesn't ever > actually flush WAL, so whether we delay or not is completely > irrelevant. > > So un-agreed. No change required to patch there. I think Peter's suggestion of forcibly setting the delay to 0 in the startup process is a good one, though. It's one line of code, and if it isn't strictly necessary today, it still seems like good future-proofing. I am not very happy about the idea of renaming commit_* to group_commit_*. It's basically a cosmetic renaming, and breaking existing configuration files for cosmetic purposes does not seem warranted to me, especially when the old and new names are so close. I certainly don't think we can do that in 9.2, now that beta1 has already shipped. Modifying the default contents of postgresql.conf after we've shipped beta has been a historical no-no for reasons that escape me at the moment, but IIRC they're not stupid reasons. Frankly, I think this whole thing should be pushed to 9.3. The commit_delay and commit_siblings knobs suck, but they've sucked for a long time, and it won't kill anybody to wait another release cycle to fix them. We have plenty of more important things queued up for 9.3 already, and I don't believe there's any compelling reason to think that this particular thing needs preferential treatment. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
On Thu, May 31, 2012 at 7:49 AM, Erik Rijkers wrote: > On Thu, May 31, 2012 13:14, Robert Haas wrote: >> On Thu, May 31, 2012 at 1:06 AM, Erik Rijkers wrote: >>> In my test, I run the bash code (the bits that I posted earlier) in a while >>> loop so that the >>> table >>> is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that >>> wasn't clear. That loop >>> is necessary; a few iterations are often successful. >> >> Yes... I let it run all night on my laptop with no errors. > > My apologies to both of you. It seems the problem was indeed solved with > that commit from Robert. > I managed to forget that I, uh... temporary, commented out the git-pull from > my build script... No problem. The one thing that still seems a little odd to me is that this caused a pin count to get orphaned. It seems reasonable that ignoring the AccessExclusiveLock could result in not-found errors trying to open a missing relation, and even fsync requests on a missing relation. But I don't see why that would cause the backend-local pin counts to get messed up, which makes me wonder if there really is another bug here somewhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas
On 30 May 2012 12:10, Heikki Linnakangas wrote: > Hmm, we do this in smgrDoPendingDeletes: > > for (i = 0; i <= MAX_FORKNUM; i++) > { > smgrdounlink(srel, i, false); > } > > So we drop the buffers for each relation fork separately, which means that > we scan the buffer pool four times. Relation forks in 8.4 introduced that > issue, and 9.1 made it worse by adding another fork for unlogged tables. > With some refactoring, we could scan the buffer pool just once. That would > help a lot. That struck me as a safe and easy optimisation. This was a problem I'd been trying to optimise for 9.2, so I've written a patch that appears simple and clean enough to be applied directly. > Also, I wonder if DropRelFileNodeBuffers() could scan the pool without > grabbing the spinlocks on every buffer? It could do an unlocked test first, > and only grab the spinlock on buffers that need to be dropped. Sounds less good and we'd need reasonable proof it actually did anything useful without being dangerous. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services dropallforks.v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
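A rough sketch of what the single-scan refactoring could look like, folding in the unlocked pre-check Heikki floats above (the function name and signature are hypothetical, and InvalidateBuffer is assumed to follow bufmgr.c's convention of being entered with the buffer-header spinlock held):

void
DropRelFileNodeAllBuffers(RelFileNode rnode)
{
    int         i;

    for (i = 0; i < NBuffers; i++)
    {
        volatile BufferDesc *bufHdr = &BufferDescriptors[i];

        /*
         * Unlocked pre-check: a false positive merely costs one spinlock
         * acquisition, and the tag is rechecked under the lock below.
         */
        if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode))
            continue;

        LockBufHdr(bufHdr);

        /*
         * Note there is no test on tag.forkNum, so a single pass over
         * the pool drops the buffers of every fork of the relation,
         * instead of one full scan per fork.
         */
        if (RelFileNodeEquals(bufHdr->tag.rnode, rnode))
            InvalidateBuffer(bufHdr);   /* releases the spinlock */
        else
            UnlockBufHdr(bufHdr);
    }
}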
Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
On Thu, May 31, 2012 13:14, Robert Haas wrote: > On Thu, May 31, 2012 at 1:06 AM, Erik Rijkers wrote: >> In my test, I run the bash code (the bits that I posted earlier) in a while >> loop so that the >> table >> is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that >> wasn't clear. That loop >> is necessary; a few iterations are often successful. > > Yes... I let it run all night on my laptop with no errors. My apologies to both of you. It seems the problem was indeed solved with that commit from Robert. I managed to forget that I, uh... temporarily, commented out the git-pull from my build script... Thanks, Erik Rijkers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
On 31 May 2012 12:14, Robert Haas wrote: > On Thu, May 31, 2012 at 1:06 AM, Erik Rijkers wrote: >> In my test, I run the bash code (the bits that I posted earlier) in a while >> loop so that the table >> is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that >> wasn't clear. That loop >> is necessary; a few iterations are often successful. > > Yes... I let it run all night on my laptop with no errors. It looked like the correct fix to me as well, FWIW. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap_form_tuple crashing
On Thu, May 31, 2012 at 5:14 PM, Heikki Linnakangas wrote: > On 31.05.2012 14:42, Atri Sharma wrote: >> >> Another thing I wanted to ask was that would you recommend building >> tuples from strings directly or converting them to Datum first and >> then build the tuples from Datum instances? > > > It depends. If you have all the values in strings already, then > BuildTupleFromCStrings() is probably the easiest. But if you have some > attributes in Datum format already, then it's probably easier and faster to > use heap_form_tuple(). > > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com Performance-wise, which one would be better (I am envisioning a large set of strings to be converted to tuples)? -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap_form_tuple crashing
On 31.05.2012 14:42, Atri Sharma wrote: Another thing I wanted to ask was that would you recommend building tuples from strings directly or converting them to Datum first and then build the tuples from Datum instances? It depends. If you have all the values in strings already, then BuildTupleFromCStrings() is probably the easiest. But if you have some attributes in Datum format already, then it's probably easier and faster to use heap_form_tuple(). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
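As a rough illustration of the two paths, assuming a two-column descriptor with an int4 and a text attribute (the values here are made up, and error handling is omitted):

#include "postgres.h"
#include "access/htup.h"    /* heap_form_tuple, heap_freetuple */
#include "funcapi.h"        /* TupleDescGetAttInMetadata, BuildTupleFromCStrings */
#include "utils/builtins.h" /* CStringGetTextDatum */

static HeapTuple
build_tuple_both_ways(TupleDesc tupdesc)
{
    char       *cstrings[2] = {"42", "hello"};
    Datum       values[2];
    bool        isnull[2] = {false, false};
    AttInMetadata *attinmeta;
    HeapTuple   tup;

    /*
     * Path 1: all values in string form; each attribute's input
     * function is invoked for us.
     */
    attinmeta = TupleDescGetAttInMetadata(tupdesc);
    tup = BuildTupleFromCStrings(attinmeta, cstrings);
    heap_freetuple(tup);    /* discard path-1 result in this sketch */

    /* Path 2: values already at hand as Datums. */
    values[0] = Int32GetDatum(42);
    values[1] = CStringGetTextDatum("hello");
    return heap_form_tuple(tupdesc, values, isnull);
}

If the inputs start life as strings anyway, both paths pay for the per-attribute input function calls, so the difference is mostly convenience; heap_form_tuple() only wins when some Datums are already available, which matches the answer above.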
Re: [HACKERS] heap_form_tuple crashing
On Thu, May 31, 2012 at 5:02 PM, Heikki Linnakangas wrote: > On 31.05.2012 13:42, Atri Sharma wrote: >> >> I am trying to call heap_form_tuple to create a tuple from a datum. >> >> My call to heap_form_tuple looks like: >> >> >> val1=0; >> tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att); >> tuple=heap_form_tuple(tupledesc1,p1,&val1); >> >> >> p1 is a pointer to a Datum instance which is created from a char array. >> >> When I am running the code,the system is crashing. >> >> Please let me know what can the problem be and also what I should do >> to rectify it. > > Hard to say without seeing the full code... > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com Hi Heikki, Thanks for the reply. Another thing I wanted to ask: would you recommend building tuples from strings directly, or converting them to Datum first and then building the tuples from the Datum instances? Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap_form_tuple crashing
On 31.05.2012 13:42, Atri Sharma wrote: I am trying to call heap_form_tuple to create a tuple from a datum. My call to heap_form_tuple looks like: val1=0; tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att); tuple=heap_form_tuple(tupledesc1,p1,&val1); p1 is a pointer to a Datum instance which is created from a char array. When I am running the code, the system is crashing. Please let me know what can the problem be and also what I should do to rectify it. Hard to say without seeing the full code... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 31 May 2012 11:19, Simon Riggs wrote: > I've looked at this more closely now and I can see that the call to > XLogFlush() that is made from xact_redo_commit_internal() doesn't ever > actually flush WAL, so whether we delay or not is completely > irrelevant. > > So un-agreed. No change required to patch there. So, does that clear up the question of it being acceptable to add a delay to every existing XLogFlush() call site? I think so. Aside from the outstanding question of what to rename commit_delay/commit_siblings to, and how we might want to reframe those settings in the docs, I think that's everything. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
On Thu, May 31, 2012 at 1:06 AM, Erik Rijkers wrote: > In my test, I run the bash code (the bits that I posted earlier) in a while > loop so that the table > is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that > wasn't clear. That loop > is necessary; a few iterations are often successful. Yes... I let it run all night on my laptop with no errors. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: avoid heavyweight locking on hash metapage
On Thu, May 31, 2012 at 7:07 AM, Dickson S. Guedes wrote: > 2012/5/30 Robert Haas : >> I tested the effect of this by setting up a series of 5-minute >> read-only pgbench run at scale factor 300 with 8GB of shared buffers >> on the IBM POWER7 machine. > > I know it doesn't matter, but out of curiosity what OS you used? Fedora 16, Linux 3.2.6-3.fc16.ppc64 -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: avoid heavyweight locking on hash metapage
2012/5/30 Robert Haas : > I tested the effect of this by setting up a series of 5-minute > read-only pgbench run at scale factor 300 with 8GB of shared buffers > on the IBM POWER7 machine. I know it doesn't matter, but out of curiosity, what OS did you use? best regards, -- Dickson S. Guedes mail/xmpp: gue...@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] heap_form_tuple crashing
Hi all, I am trying to call heap_form_tuple to create a tuple from a datum. My call to heap_form_tuple looks like: val1=0; tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att); tuple=heap_form_tuple(tupledesc1,p1,&val1); p1 is a pointer to a Datum instance which is created from a char array. When I am running the code, the system is crashing. Please let me know what the problem could be and also what I should do to rectify it. Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
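Without the full code this is only a guess, but the usual failure with that call pattern is that heap_form_tuple() expects one Datum and one isnull flag per attribute in the descriptor, with each Datum already in the attribute's internal format; a pointer derived from a raw char array will crash for a varlena column. Assuming a single text attribute (my_char_buffer is a hypothetical name for the char array), a correct call would look roughly like:

Datum   values[1];
bool    isnull[1] = {false};

/* Convert the C string into a text Datum before forming the tuple. */
values[0] = CStringGetTextDatum(my_char_buffer);
tuple = heap_form_tuple(tupledesc1, values, isnull);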
Re: [HACKERS] GiST buffering build, bug in levelStep calculation
On Thu, May 31, 2012 at 1:36 PM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote: > I note that the calculations assume that leaf tuples and internal tuples > have similar sizes. We calculate the average leaf tuple size, and use that > to calculate the fan-out of internal pages. On some GiST opclasses, the > values stored on internal pages might be quite different from the leaf > tuples. I don't think we need to worry about that in practice, these > calculations are not very accurate anyway, but perhaps a comment would be > in order. > Perhaps we could collect per-level statistics of average tuple size? -- With best regards, Alexander Korotkov.
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 30 May 2012 17:19, Peter Geoghegan wrote: > On 30 May 2012 15:25, Simon Riggs wrote: >>> 1. It seems wrong to do it in xact_redo_commit_internal(). It won't >>> matter if commit_siblings>0 since there won't be any other backends >>> with transaction IDs anyway, but if commit_siblings==0 then we'll >>> sleep for no possible benefit. >> >> Agreed I've looked at this more closely now and I can see that the call to XLogFlush() that is made from xact_redo_commit_internal() doesn't ever actually flush WAL, so whether we delay or not is completely irrelevant. So un-agreed. No change required to patch there. >>> 2. Doing it in FlushBuffer() seems slightly iffy since we might be >>> sitting on a buffer lock. But maybe it's a win anyway, or just not >>> worth worrying about. >> >> Agreed. > > As I've pointed out, we cannot meaningfully skip the wait for > auxiliary processes alone (except perhaps by having commit_siblings > set sufficiently high). > >> The remaining cases aren't worth worrying about, apart from >> SlruPhysicalWritePage() which happens during visibility checks and >> needs to happen as quickly as possible also. > > I'm not so sure. It says in that function: > > /* > * We must determine the largest async-commit LSN for the > page. This > * is a bit tedious, but since this entire function is a slow > path > * anyway, it seems better to do this here than to maintain a > per-page > * LSN variable (which'd need an extra comparison in the > * transaction-commit path). > */ > >> I would say the additional contention from waiting outweighs the >> benefit of the wait in those 3 places, so skipping the wait is wise. > > MinimumActiveBackends() reports the "count backends (other than > myself) that are in active transactions", so unnecessary calls will > have to occur when we have active transactions >= CommitSiblings, not > connections >= CommitSiblings as was previously the case. > > What if we were to skip the wait during recovery only, by specially > setting CommitDelay to 0 in the start-up process? Would that satisfy > everyone's concerns about unhelpful delays? I'm not sure how this > might interact with hot standby. Hmm, that was a good idea, but as of my comments above, that isn't required or useful. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
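For reference, the delay being discussed is gated like this at each wait site; this mirrors the long-standing pattern in the commit path, though exactly which call sites get it is the open question above:

/*
 * Sleep only if a delay is configured, fsync is enabled, and at least
 * CommitSiblings other backends currently have active transactions.
 */
if (CommitDelay > 0 && enableFsync &&
    MinimumActiveBackends(CommitSiblings))
    pg_usleep(CommitDelay);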
Re: [HACKERS] GiST buffering build, bug in levelStep calculation
On 29.05.2012 23:46, Alexander Korotkov wrote: On Wed, May 30, 2012 at 12:25 AM, Tom Lane wrote: Alexander Korotkov writes: On Tue, May 29, 2012 at 11:42 PM, Tom Lane wrote: While I'm looking at this, is the first test involving effective_cache_size bulletproof either? In particular, is avgIndexTuplesPerPage clamped to be strictly greater than 1? It's based on collected statistics on already inserted tuple sizes. Since tuple sizes are measured after possible toasting, I don't see the way for avgIndexTuplesPerPage to be less than 1. Yeah, but if it could be *equal* to one, you've got a zero-divide there. avgIndexTuplesPerPage is calculated as: avgIndexTuplesPerPage = pageFreeSpace / itupAvgSize; I think size of each index tuple must be at least few times lower than pageFreeSpace to let us create any index. Hmm, in theory, it seems possible that every leaf level index tuple would completely fill an index page. Not sure how useful such an index would be, though. On internal pages, at least, you have to fit at least two tuples on a page or you can't build a tree. I note that the calculations assume that leaf tuples and internal tuples have similar sizes. We calculate the average leaf tuple size, and use that to calculate the fan-out of internal pages. On some GiST opclasses, the values stored on internal pages might be quite different from the leaf tuples. I don't think we need to worry about that in practice, these calculations are not very accurate anyway, but perhaps a comment would be in order. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
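One way to make the calculation bulletproof against the zero-divide Tom raised would be a clamp along these lines (an illustrative sketch against the variables quoted above, not the committed fix):

avgIndexTuplesPerPage = pageFreeSpace / itupAvgSize;

/*
 * An internal page must hold at least two downlinks to form a tree,
 * so clamp the estimated fan-out; this also keeps any divisor later
 * derived from it safely away from zero.
 */
if (avgIndexTuplesPerPage < 2.0)
    avgIndexTuplesPerPage = 2.0;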
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
>>> We recently fixed a couple of O(N^2) loops in pg_dump, but those covered >>> extremely specific cases that might or might not have anything to do >>> with what you're seeing. The complainant was extremely helpful about >>> tracking down the problems: >>> http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php >>> http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php >>> http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php >> >> I'm wondering if these fixes (or today's commit) include the case of >> a database with ~100,000 tables and indexes. One of my customers >> has had trouble with pg_dump for such a database; it takes over 10 >> hours. > > So I did a quick test with old PostgreSQL 9.0.2 and current (as of > commit 2755abf386e6572bad15cb6a032e504ad32308cc). In a freshly initdb-ed > database I created 100,000 tables, each with two integer > attributes, one of them a primary key. Creating the tables was > reasonably fast, as expected (18-20 minutes). This created a 1.4GB > database cluster. > > pg_dump dbname >/dev/null took 188 minutes on 9.0.2, which was a pretty > long time, as the customer complained. Now, what about current? Well, it > took 125 minutes. ps showed that most of the time was spent in the backend. > > Below is the script to create the tables. > > cnt=100000 > while [ $cnt -gt 0 ] > do > psql -e -p 5432 -c "create table t$cnt(i int primary key, j int);" test > cnt=`expr $cnt - 1` > done > > p.s. You need to increase max_locks_per_transaction before running > pg_dump (I raised it to 640 in my case). Just for the record, I reran the test with my single-LOCK patch, and now the total runtime of pg_dump is 113 minutes: 188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch). So far, I'm glad to see 40% time savings at this point. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/5/31 Yeb Havinga : > On 2012-05-30 21:26, Kohei KaiGai wrote: >> If we had an "ideal optimizer", I'd still like the optimizer to >> wipe out redundant clauses transparently, rather than RLSBYPASS >> permissions, because it just controls all-or-nothing stuff. >> For example, if tuples are categorized as unclassified, classified or >> secret, and the RLS policy is configured as: >> ((current_user IN ('alice', 'bob') AND X IN ('unclassified', >> 'classified')) OR (X IN ('unclassified'))), >> superuser can see all the tuples, and alice and bob can see >> up to classified tuples. >> Is it really hard to wipe out redundant conditions at the planner stage? >> If current_user is obviously 'kaigai', it seems to me the left side of >> this clause can be wiped out at the planner stage. > > The query's RLS policy would be simpler if the RLS policy function that > returns the WHERE clause took the user as an argument, so its result does > not contain user conditionals. > > IF current_user IN ('alice', 'bob') > THEN > RETURN X IN ('unclassified', 'classified') > ELSE > RETURN X IN ('unclassified') > END IF; > Yes, that is a happy case. But the point I'm concerned about is that the conditions used to branch between cases are not limited to the current user-id. The RLS policy is appended at planner stage, so a prepared statement needs to be invalidated whenever its prerequisites change. For example, someone may assign a function that returns an RLS policy depending on whether the current hour is an even or odd number. It implies that we cannot predict all the cases in which a prepared statement with an RLS policy must be invalidated, because of too much flexibility. Thanks, -- KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
On 31.05.2012 08:06, Erik Rijkers wrote: On Thu, May 31, 2012 03:30, Robert Haas wrote: On Wed, May 30, 2012 at 6:00 PM, Erik Rijkers wrote: directory 2012-05-30 23:40:57.909 CEST 3909 CONTEXT: writing block 5152 of relation base/21268/26569 xlog redo multi-insert (init): rel 1663/21268/26581; blk 3852; 35 tuples TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741) 2012-05-30 23:40:58.006 CEST 5331 FATAL: could not open file "base/21268/26569": No such file or directory 2012-05-30 23:40:58.006 CEST 5331 CONTEXT: writing block 5153 of relation base/21268/26569 2012-05-30 23:40:59.661 CEST 3908 LOG: startup process (PID 3909) was terminated by signal 6: Aborted 2012-05-30 23:40:59.661 CEST 3908 LOG: terminating any other active server processes Hmm. I set up what I believe to be the same test case you were using, and it's not crashing for me. In fact, with the latest code, I haven't been able to produce any error at all. When I reverted my last commit, I managed to get this: ERROR: could not open relation with OID 18229 STATEMENT: select current_setting('port') port, count(*) from t ...but just once, and with no other error messages. So I'm either missing a step somewhere, or something about your system just happens to tickle this more so than on mine. In my test, I run the bash code (the bits that I posted earlier) in a while loop so that the table is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that wasn't clear. That loop is necessary; a few iterations are often successful. I can test it today on a few other systems to see if it is reproducible. I could no longer reproduce it after Robert's fix; the test case has been running for about an hour now. Please triple-check that you have it applied in the standby :-). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Interface of Row Level Security
On 2012-05-30 21:26, Kohei KaiGai wrote: If we had an "ideal optimizer", I'd still like the optimizer to wipe out redundant clauses transparently, rather than RLSBYPASS permissions, because it just controls all-or-nothing stuff. For example, if tuples are categorized as unclassified, classified or secret, and the RLS policy is configured as: ((current_user IN ('alice', 'bob') AND X IN ('unclassified', 'classified')) OR (X IN ('unclassified'))), superuser can see all the tuples, and alice and bob can see up to classified tuples. Is it really hard to wipe out redundant conditions at the planner stage? If current_user is obviously 'kaigai', it seems to me the left side of this clause can be wiped out at the planner stage. The query's RLS policy would be simpler if the RLS policy function that returns the WHERE clause took the user as an argument, so its result does not contain user conditionals. IF current_user IN ('alice', 'bob') THEN RETURN X IN ('unclassified', 'classified') ELSE RETURN X IN ('unclassified') END IF; regards, Yeb -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers