Re: [HACKERS] Allowing multiple concurrent base backups
On 18.03.2011 13:56, Heikki Linnakangas wrote: On 18.03.2011 10:48, Heikki Linnakangas wrote: On 17.03.2011 21:39, Robert Haas wrote: On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masaomasao.fu...@gmail.com wrote: On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Hmm, good point. It's harmless, but creating the history file in the first place sure seems like a waste of time. The attached patch changes pg_stop_backup so that it doesn't create the backup history file if archiving is not enabled. When I tested multiple backups, I found that they can have the same checkpoint location and the same history file name.

$ for ((i=0; i<4; i++)); do pg_basebackup -D test$i -c fast -x -l test$i; done

$ cat test0/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test0

$ cat test1/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test1

$ cat test2/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test2

$ cat test3/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test3

$ ls archive/*.backup
archive/00010002.00B0.backup

This would cause a serious problem, because the backup-end record which indicates the same START WAL LOCATION can be written by the first backup before the other finishes. So we might wrongly think that we've already reached a consistent state by reading the backup-end record (written by the first backup) before reading the last required WAL file.

/*
 * Force a CHECKPOINT.  Aside from being necessary to prevent torn
 * page problems, this guarantees that two successive backup runs will
 * have different checkpoint positions and hence different history
 * file names, even if nothing happened in between.
 *
 * We use CHECKPOINT_IMMEDIATE only if requested by user (via passing
 * fast = true). Otherwise this can take awhile.
 */
RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
                  (fast ? CHECKPOINT_IMMEDIATE : 0));

This problem happens because the above code (in do_pg_start_backup) doesn't actually ensure that concurrent backups have different checkpoint locations. ISTM that we should change the above or elsewhere to ensure that. Yes, good point. Here's a patch based on that approach, ensuring that each base backup uses a different checkpoint as the start location. I think I'll commit this, rather than invent a new unique ID mechanism for backups. The latter would need changes in recovery and the control file too, and I don't feel like tinkering with that at this stage. Ok, committed this. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
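The committed idea -- give each base backup a distinct checkpoint location so history file names can never collide -- can be sketched roughly as follows. This is an illustration of the approach only, not the actual do_pg_start_backup() code; XLogRecPtr_t and the function names are stand-ins.

```c
#include <assert.h>

/* Sketch: before using a checkpoint as a backup's start location,
 * compare it against the one handed to the previous backup and force
 * another checkpoint on a collision, so concurrent backups never share
 * a start location (and hence never share a history file name).
 * Illustrative names; not PostgreSQL's code. */
typedef unsigned long long XLogRecPtr_t;   /* stand-in for XLogRecPtr */

static XLogRecPtr_t last_backup_ckpt = 0;

XLogRecPtr_t choose_backup_start(XLogRecPtr_t (*request_checkpoint)(void))
{
    XLogRecPtr_t ckpt;
    do
        ckpt = request_checkpoint();         /* stand-in for RequestCheckpoint() */
    while (ckpt == last_backup_ckpt);        /* same as last backup: retry */
    last_backup_ckpt = ckpt;
    return ckpt;
}

/* Test scaffolding only: a fake checkpoint requester that returns the
 * same location twice (two backups racing), then fresh locations. */
static XLogRecPtr_t fake_calls = 0;
XLogRecPtr_t fake_request_checkpoint(void)
{
    fake_calls++;
    return (fake_calls <= 2) ? 100 : 100 * fake_calls;
}
```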
Re: [HACKERS] 2nd Level Buffer Cache
On Fri, Mar 18, 2011 at 11:55 PM, Josh Berkus j...@agliodbs.com wrote: To take the opposite approach... has anyone looked at having the OS just manage all caching for us? Something like MMAPed shared buffers? Even if we find the issue with large shared buffers, we still can't dedicate serious amounts of memory to them because of work_mem issues. Granted, that's something else on the TODO list, but it really seems like we're re-inventing the wheels that the OS has already created here... A lot of people have talked about it. You can find references to mmap going at least as far back as 2001 or so. The problem is that it would depend on the OS implementing things in a certain way and guaranteeing things we don't think can be portably assumed. We would need to mlock large amounts of address space which most OS's don't allow, and we would need to at least mlock and munlock lots of small bits of memory all over the place which would create lots and lots of mappings which the kernel and hardware implementations would generally not appreciate. As far as I know, no OS has a more sophisticated approach to eviction than LRU. And clock-sweep is a significant performance improvement over LRU for frequently accessed database objects ... plus our optimizations around not overwriting the whole cache for things like VACUUM. The clock-sweep algorithm was standard OS design before you or I knew how to type. I would expect any half-decent OS to have something at least as good -- perhaps better because it can rely on hardware features to handle things. However the second point is the crux of the issue and of all similar issues on where to draw the line between the OS and Postgres. The OS knows better about the hardware characteristics and can better optimize the overall system behaviour, but Postgres understands better its own access patterns and can better optimize its behaviour, whereas the OS is stuck reverse-engineering what Postgres needs, usually from simple heuristics.
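For readers following along, the clock-sweep idea being discussed can be sketched in a few lines. This is a deliberately simplified illustration of the algorithm, not PostgreSQL's actual bufmgr/freelist code; the struct fields are illustrative.

```c
#include <assert.h>

#define NBUFFERS 8

/* Simplified buffer descriptor: a pin count and a usage counter, in the
 * spirit of PostgreSQL's clock-sweep replacement; names illustrative. */
typedef struct { int pin_count; int usage_count; } Buf;

static Buf buffers[NBUFFERS];
static int clock_hand = 0;

/* Advance the clock hand until we find an unpinned buffer whose
 * usage_count has decayed to zero, decrementing counts as we pass;
 * return the victim buffer's index. */
int clock_sweep_victim(void)
{
    for (;;)
    {
        Buf *b = &buffers[clock_hand];
        int victim = clock_hand;
        clock_hand = (clock_hand + 1) % NBUFFERS;
        if (b->pin_count > 0)
            continue;            /* pinned: in use, skip */
        if (b->usage_count > 0)
        {
            b->usage_count--;    /* recently used: give it another lap */
            continue;
        }
        return victim;           /* cold and unpinned: evict */
    }
}
```

A buffer that is touched often keeps its usage_count topped up and survives many laps of the hand, which is how the scheme approximates LRU without a global ordering.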
2-level caches work well for a variety of applications. I think 2-level caches with simple heuristics like pin all the indexes is unlikely to be helpful. At least it won't optimize the average case and I think that's been proven. It might be helpful for optimizing the worst case, which would reduce the standard deviation. Perhaps we're at the point now where that matters. Where it might be helpful is as a more refined version of the "sequential scans use limited set of buffers" patch. Instead of having each sequential scan use a hard-coded number of buffers, perhaps all sequential scans should share a fraction of the global buffer pool managed separately from the main pool. Though in my thought experiments I don't see any real win here. In the current scheme if there's any sign the buffer is useful it gets thrown from the sequential scan's set of buffers to reuse anyways. Now, what would be *really* useful is some way to avoid all the data copying we do between shared_buffers and the FS cache. Well, the two options are mmap/mlock or directio. The former might be a fun experiment but I expect any OS to fall over pretty quickly when faced with thousands (or millions) of 8kB mappings. The latter would need Postgres to do async i/o and hopefully a global view of its i/o access patterns so it could do prefetching in a lot more cases. -- greg
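The mmap option mentioned above can be made concrete with a minimal sketch: map one 8kB "page" of a file read-only, getting a pointer into the kernel's page cache with no copy into a private buffer. This is only an illustration of the mechanism (and of why naive per-page mappings multiply quickly), not a proposal-quality design; map_page and PAGE_SZ are assumptions of this sketch.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SZ 8192

/* Map one 8kB page of a file read-only.  The returned pointer aliases
 * the kernel page cache, so no memcpy into a user buffer is needed --
 * the attraction Greg mentions.  Done naively, though, this costs one
 * mapping per page, which is exactly the "millions of 8kB mappings"
 * concern. */
const char *map_page(int fd, off_t pageno)
{
    void *p = mmap(NULL, PAGE_SZ, PROT_READ, MAP_SHARED,
                   fd, pageno * PAGE_SZ);
    return (p == MAP_FAILED) ? NULL : (const char *) p;
}
```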
Re: [HACKERS] Rectifying wrong Date outputs
On 21.03.2011 07:40, Piyush Newe wrote: On Thu, Mar 17, 2011 at 7:56 PM, Tom Lanet...@sss.pgh.pa.us wrote: Robert Haasrobertmh...@gmail.com writes: On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Keep in mind that the datetime stuff was abandoned by the maintainer some years ago with quite some rough edges. Some of it has been fixed, but a lot of bugs remain. Looks like this is one of those places and it seems appropriate to spend some time fixing it. Since it would involve a behavior change, it should only go to 9.2, of course. I wouldn't object to fixing the problem with # of digits > # of Ys in 9.1, if the fix is simple and clear-cut. I think we are still accepting patches to make minor tweaks, like the tab-completion patch I committed yesterday. It also doesn't bother me tremendously if we push it off, but I don't think that anyone's going to be too sad if TO_DATE('01-jan-2010', 'DD-MON-YYY') starts returning something more sensible than 3010-01-01. Agreed, it's certainly not too late for bug fixes in 9.1. I agree that this isn't something we would want to tweak in released branches, but 9.1 isn't there yet. I feel the patch for the same would be easier and was attached in the initial mail of this thread. For your ready reference, I am attaching the same patch here again. Having said that, it's not entirely clear to me what sane behavior is here. Personally I would expect that an n-Ys format spec would consume at most n digits from the input. Otherwise how are you going to use to_date to pick apart strings that don't have any separators? Yeah, seems reasonable. So I think the problem is actually upstream of the behavior complained of here. However, what we should first do is see what Oracle does in such cases, because the main driving factor for these functions is Oracle compatibility, not what might seem sane in a vacuum. Following is the extended chart which compares the behavior of Oracle, PG and EDBAS.
*Data                 Format          Oracle       PostgreSQL   EDBAS*
TO_DATE('01-jan-1',   'DD-MON-Y')     01-JAN-2011  01-JAN-2001  01-JAN-2001
TO_DATE('01-jan-1',   'DD-MON-YY')    01-JAN-2001  01-JAN-2001  01-JAN-2001
TO_DATE('01-jan-1',   'DD-MON-YYY')   01-JAN-2001  01-JAN-2001  01-JAN-2001
TO_DATE('01-jan-1',   'DD-MON-YYYY')  01-JAN-0001  01-JAN-0001  01-JAN-0001

In this case, all the cases are in sync except the 1st one. I didn't understand why Oracle is interpreting year '1' as '2011'.

It makes sense to me. Year 1, when the date format is Y, means the year closest to the current date that ends with 1. Or maybe the year that ends with 1 in the current decade. This is analogous to how two-digit years are interpreted (except that we've hardcoded that the current date to compare against is year 2000 - an assumption that will start to bite us some time before year 2100).

*Data                 Format          Oracle       PostgreSQL   EDBAS*
TO_DATE('01-jan-10',  'DD-MON-Y')     Error        01-JAN-2010  Error
TO_DATE('01-jan-10',  'DD-MON-YY')    01-JAN-2010  01-JAN-2010  01-JAN-2010
TO_DATE('01-jan-10',  'DD-MON-YYY')   01-JAN-2010  01-JAN-2010  01-JAN-2010
TO_DATE('01-jan-10',  'DD-MON-YYYY')  01-JAN-0010  01-JAN-0010  01-JAN-0010

In this case, it seems that in the last 3 cases PostgreSQL is behaving correctly. Oracle is throwing an error in the 1st case since the format ('Y') is shorter than the actual value ('10'). But PostgreSQL is ignoring this and echoing whatever was input. The output might not be what the user is expecting.

*Data                 Format          Oracle       PostgreSQL   EDBAS*
TO_DATE('01-jan-067', 'DD-MON-Y')     Error        01-JAN-2067  Error
TO_DATE('01-jan-111', 'DD-MON-YY')    01-JAN-0111  01-JAN-2011  Error
TO_DATE('01-jan-678', 'DD-MON-YYY')   01-JAN-2678  01-JAN-1678  01-JAN-2678
TO_DATE('01-jan-001', 'DD-MON-YYYY')  01-JAN-0001  01-JAN-0001  01-JAN-0001

In this case, just the last case was correct in PG. The other cases are not in sync with Oracle; rather, the output is vague.
*Data                  Format          Oracle       PostgreSQL   EDBAS*
TO_DATE('01-jan-2010', 'DD-MON-Y')     Error        01-JAN-4010  Error
TO_DATE('01-jan-2010', 'DD-MON-YY')    01-JAN-2010  01-JAN-3910  Error
TO_DATE('01-jan-2010', 'DD-MON-YYY')   Error        01-JAN-3010  Error
TO_DATE('01-jan-2010', 'DD-MON-YYYY')  01-JAN-2010  01-JAN-2010  01-JAN-2010

In this case, PG is giving wrong output in the first 3 cases. Those need to get rectified. Oracle is throwing an error in the 1st and 3rd cases, and the reason is that the format is shorter than the actual value. It seems this rule is not applicable to the 2nd case in Oracle.

Yeah, quite inconsistent :-(. These results are not in favor of the idea that a format with n Ys always consumes up to n digits from the input. With that rule, to_date('01-jan-2010', 'DD-MON-YY') would return 01-JAN-2020, which isn't what Oracle does and seems quite surprising too. So ignoring the cases where Oracle throws an error but PostgreSQL doesn't, there's four cases where the results differ:

*Data                  Format          Oracle       PostgreSQL   EDBAS*
TO_DATE('01-jan-1',    'DD-MON-Y')     01-JAN-2011  01-JAN-2001  01-JAN-2001
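One way to read the "closest year" interpretation described above is: expand an n-digit year to the candidate nearest the anchor year that ends in those digits, with the anchor hardcoded to 2000 as discussed in the thread. The sketch below is illustrative only -- it is not PostgreSQL's formatting.c logic, and it reproduces only the simpler PostgreSQL outputs in the tables above, not the buggier multi-digit cases under discussion.

```c
#include <stdlib.h>

/* Expand a year given with ndigits digits to the candidate nearest the
 * anchor year (2000 in the thread) ending in those digits.  A sketch of
 * the "closest year" reading; names and rule are assumptions. */
int expand_year(int year, int ndigits, int anchor)
{
    int mod = 1;
    for (int i = 0; i < ndigits; i++)
        mod *= 10;
    if (ndigits >= 4)
        return year;                        /* YYYY: taken literally */
    int best = (anchor / mod) * mod + year; /* same high digits as anchor */
    int lo = best - mod, hi = best + mod;   /* neighbouring candidates */
    if (abs(lo - anchor) < abs(best - anchor))
        best = lo;
    if (abs(hi - anchor) < abs(best - anchor))
        best = hi;
    return best;
}
```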
[HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL
Hi list, When I have fields with lots of null values, I often create indexes like this: CREATE INDEX foo_i ON foo(i) WHERE i IS NOT NULL; This saves me some space, as most indexed queries exclude NULLs anyway. In PostgreSQL 9.0.3, min(i) can successfully use this index:

---
marti=# create table foo as select null::int as i from generate_series(1,10);
marti=# create index foo_i_notnull on foo (i) where i is not null;
marti=# analyze foo;
marti=# explain analyze select min(i) from foo;
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.025..0.026 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.00..0.00 rows=1 width=4) (actual time=0.021..0.021 rows=0 loops=1)
          ->  Index Scan using foo_i_notnull on foo  (cost=0.00..8.27 rows=10 width=4) (actual time=0.019..0.019 rows=0 loops=1)
Total runtime: 0.063 ms
---

It seems that PostgreSQL 9.1alpha3 cannot, however:

---
marti=# explain analyze select min(i) from foo;
Aggregate  (cost=1594.00..1594.01 rows=1 width=4) (actual time=29.612..29.612 rows=1 loops=1)
  ->  Seq Scan on foo  (cost=0.00..1344.00 rows=10 width=4) (actual time=0.023..14.221 rows=10 loops=1)
Total runtime: 29.661 ms
---

It would be cool to have this feature re-added before a 9.1 release. I know that the Merge Append patch required some changes in the min/max optimization, which is probably the cause. Regards, Marti
[HACKERS] How to look at the Expression Trees
Hi all, I have been trying to work on the expression evaluator (trying to alter it just for the seqscan case). I have understood a few things. I wish someone could tell me if I am wrong at some point. As far as I have gone through the code, I think:
1. Quals are formulated by the planner.
2. Quals are actually a list of Expression Trees.
3. They are created in the ExecInitExpr function.
4. Every row / tuple table slot is run through the same qual list and thus goes through the same expression tree execution path as the others (of course being filtered out in between if it does not fit the quals in the list).
5. The most common nodes in the expression trees are ExecEvalVar, ExecEvalConst and ExecEvalParam.
I might be wrong somewhere (especially most people would be able to say a lot about the 5th point). But if the above were to be correct, then how and why are ExecMakeFunctionResultNoSets, ExecEvalRelabelType, ExecEvalFuncArgs and the like used? I wanted to see how the expression tree gets into form before it gets into ExecQual for evaluation by the ExecEvalExpr function. Is there a way to see the Expression Tree so that I get a better idea about what is happening? Regards, Vaibhav
Re: [HACKERS] How to look at the Expression Trees
On 21.03.2011 13:44, Vaibhav Kaushal wrote: Hi all, I have been trying to work on the expression evaluator (trying to alter it just for the seqscan case). I have understood a few things. I wish someone could tell me if I am wrong at some point. As far as I have gone through the code, I think: 1. Quals are formulated by planner 2. Quals are actually a list of Expression Trees. 3. They are created in the ExecInitExpr function. 4. Every row / tuple table slot is run through the same qual list and thus goes through the same expression tree execution path as the other (of course being filtered in between if they do not fit the qual in the list). Yes. There's actually two trees involved. The planner produces a tree of Expr nodes, and ExecInitExpr prepares a tree of ExprState nodes that mirrors the first tree. The ExprStates contain run-time state needed to execute the Expr tree. 5. The most common nodes in the expression trees are the ExecEvalVar, ExecEvalConst and ExecEvalParam. Well, that's obviously going to depend on the query. I might be wrong somewhere (especially most people would be able to say a lot about the 5th point). But if the above were to be correct then how and why are the ExecMakeFunctionResultNoSets, ExecEvalRelabelType, ExecEvalFuncArgs and the likes are used? ExecMakeFunctionResultNoSets is used to evaluate function calls. The first function call invocation always uses ExecMakeFunctionResult, but if ExecMakeFunctionResult sees on that first invocation that it was not a set-returning-function, it changes the evaluator function for subsequent invocations to ExecMakeFunctionResultNoSets. ExecMakeFunctionResultNoSets does the same thing as ExecMakeFunctionResult, but skips the checks for set-returning functions, making the evaluation a bit faster. ExecEvalFuncArgs is used by ExecMakeFunctionResult to evaluate the function arguments. ExecEvalRelabelType is used to evaluate RelabelType nodes. 
RelabelType doesn't really do anything, it's just a placeholder when a type is cast to another, and the source and target types are binary compatible. I wanted to see how the expression tree gets into form before it gets into ExecQual for evaluation by the ExecEvalExpr function. Is there a way to see the Expression Tree so that I get a better idea about what is happening? set debug_print_plan=on -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
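The Expr/ExprState split and the evaluator-function switch Heikki describes can be illustrated with a toy sketch. All of the type and function names below are illustrative, not PostgreSQL's actual definitions; the point is the idiom of an immutable planner node mirrored by a state node whose dispatch pointer is swapped after the first call, as ExecMakeFunctionResult does when it switches to ExecMakeFunctionResultNoSets.

```c
/* Toy mirror of the Expr (planner) / ExprState (executor) split. */
struct ExprState;
typedef int (*EvalFunc)(struct ExprState *);

typedef struct Expr { int constval; } Expr;          /* planner node */

typedef struct ExprState {
    const Expr *expr;    /* the planner node this state mirrors */
    EvalFunc evalfunc;   /* current evaluator; may be swapped */
    int ncalls;
} ExprState;

static int eval_fast(struct ExprState *s)
{
    s->ncalls++;
    return s->expr->constval;
}

/* First invocation: do one-time checks here, then switch the dispatch
 * pointer so later calls skip them -- the NoSets idiom. */
static int eval_first(struct ExprState *s)
{
    s->evalfunc = eval_fast;
    return eval_fast(s);
}

/* Build the run-time state tree node for a planner node, as
 * ExecInitExpr does for real expression trees. */
ExprState init_expr(const Expr *e)
{
    ExprState s = { e, eval_first, 0 };
    return s;
}
```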
Re: [HACKERS] How to look at the Expression Trees
On Mon, Mar 21, 2011 at 5:47 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Yes. There's actually two trees involved. The planner produces a tree of Expr nodes, and ExecInitExpr prepares a tree of ExprState nodes that mirrors the first tree. The ExprStates contain run-time state needed to execute the Expr tree. Yes, I am seeing that in the code. 5. The most common nodes in the expression trees are ExecEvalVar, ExecEvalConst and ExecEvalParam. Well, that's obviously going to depend on the query. Indeed yes, it does, but still for most cases ExecEvalVar or ExecEvalScalarVar is seen a lot of times (I am trying only simple queries which take most data from disk), which is why I said that. Anyway, you are right there. I might be wrong somewhere (especially most people would be able to say a lot about the 5th point). But if the above were to be correct, then how and why are ExecMakeFunctionResultNoSets, ExecEvalRelabelType, ExecEvalFuncArgs and the like used? ExecMakeFunctionResultNoSets is used to evaluate function calls. The first function call invocation always uses ExecMakeFunctionResult, but if ExecMakeFunctionResult sees on that first invocation that it was not a set-returning function, it changes the evaluator function for subsequent invocations to ExecMakeFunctionResultNoSets. ExecMakeFunctionResultNoSets does the same thing as ExecMakeFunctionResult, but skips the checks for set-returning functions, making the evaluation a bit faster. ExecEvalFuncArgs is used by ExecMakeFunctionResult to evaluate the function arguments. ExecEvalRelabelType is used to evaluate RelabelType nodes. RelabelType doesn't really do anything, it's just a placeholder when a type is cast to another, and the source and target types are binary compatible. I did not know that last fact. I wanted to see how the expression tree gets into form before it gets into ExecQual for evaluation by the ExecEvalExpr function.
Is there a way to see the Expression Tree so that I get a better idea about what is happening? set debug_print_plan=on I am already using the PostgreSQL server with the -d 4 option and it shows a lot of things. But I am not able to see the Expression State trees. OK. I know that the output DOES show the 'expr' entries. But if those are what make the Expression _tree_ then I am not able to understand them. A little help on that would be much appreciated. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com Regards, Vaibhav
Re: [HACKERS] How to look at the Expression Trees
On 21.03.2011 14:37, Vaibhav Kaushal wrote: I am already using the PostgreSQL server with the -d 4 option and it shows a lot of things. But I am not able to see the Expression State trees. To be precise, debug_print_plan=on prints the expression tree that comes from the planner, not the execution tree of ExprStates. But the structure of the ExprState tree is the same as the planner tree; you just have ExprState nodes in place of Expr nodes. OK. I know that the output DOES show the 'expr' entries. But if those are what make the Expression _tree_ then I am not able to understand them. A little help on that would be much appreciated. It is indeed a tree that gets printed with debug_print_plan. There's more than the expression tree there, with information about the planner, the relations involved, etc. Look for the targetlist and qual fields in the output; that's where the expression trees are. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL
On Mon, Mar 21, 2011 at 7:17 AM, Marti Raudsepp ma...@juffo.org wrote: Hi list, When I have fields with lots of null values, I often create indexes like this: CREATE INDEX foo_i ON foo(i) WHERE i IS NOT NULL; This saves me some space, as most indexed queries exclude NULLs anyway. In PostgreSQL 9.0.3, min(i) can successfully use this index:

---
marti=# create table foo as select null::int as i from generate_series(1,10);
marti=# create index foo_i_notnull on foo (i) where i is not null;
marti=# analyze foo;
marti=# explain analyze select min(i) from foo;
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.025..0.026 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.00..0.00 rows=1 width=4) (actual time=0.021..0.021 rows=0 loops=1)
          ->  Index Scan using foo_i_notnull on foo  (cost=0.00..8.27 rows=10 width=4) (actual time=0.019..0.019 rows=0 loops=1)
Total runtime: 0.063 ms
---

It seems that PostgreSQL 9.1alpha3 cannot, however:

---
marti=# explain analyze select min(i) from foo;
Aggregate  (cost=1594.00..1594.01 rows=1 width=4) (actual time=29.612..29.612 rows=1 loops=1)
  ->  Seq Scan on foo  (cost=0.00..1344.00 rows=10 width=4) (actual time=0.023..14.221 rows=10 loops=1)
Total runtime: 29.661 ms
---

It would be cool to have this feature re-added before a 9.1 release. I know that the Merge Append patch required some changes in the min/max optimization, which is probably the cause. Yeah, I think this is a direct result of commit 034967bdcbb0c7be61d0500955226e1234ec5f04. I was kind of nervous about that one when it went in, and the fact that we're getting our first complaint about it before we've even hit beta is not setting my mind at ease... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Rectifying wrong Date outputs
On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Having said that, it's not entirely clear to me what sane behavior is here. Personally I would expect that an n-Ys format spec would consume at most n digits from the input. Otherwise how are you going to use to_date to pick apart strings that don't have any separators? Yeah, seems reasonable. On the flip side, what if you want to allow either a two digit year or a four digit year? It doesn't seem unreasonable to allow YY to encompass what YYYY would have allowed, unless there's a separate notion for 'either YY or YYYY'. It makes sense to me. Year 1, when the date format is Y, means the year closest to the current date that ends with 1. Or maybe the year that ends with 1 in the current decade. This is analogous to how two-digit years are interpreted (except that we've hardcoded that the current date to compare against is year 2000 - an assumption that will start to bite us some time before year 2100). Agree with all of this. So ignoring the cases where Oracle throws an error but PostgreSQL doesn't, there's four cases where the results differ:

*Data                  Format          Oracle       PostgreSQL   EDBAS*
TO_DATE('01-jan-1',    'DD-MON-Y')     01-JAN-2011  01-JAN-2001  01-JAN-2001
TO_DATE('01-jan-111',  'DD-MON-YY')    01-JAN-0111  01-JAN-2011  Error
TO_DATE('01-jan-678',  'DD-MON-YYY')   01-JAN-2678  01-JAN-1678  01-JAN-2678
TO_DATE('01-jan-2010', 'DD-MON-YY')    01-JAN-2010  01-JAN-3910  Error

IMHO our current behavior in the 2nd and 4th cases is so bizarre that we should change them to match Oracle. I think we should fix the 1st too; the notion that a single-digit year means something between 2000-2009 seems pretty useless (granted, using a single digit for year is brain-dead to begin with). I agree, but do we understand what Oracle does categorically, rather than just its output on this specific input? The 3rd one is debatable. The range for three-digit years is currently 1100-2099, which is enough range for many applications.
But should we change it for the sake of matching Oracle's behavior? Not that anyone uses YYY in practice, but still. I'm OK with that, but again, exactly what rule is Oracle applying here? BTW, whatever behavior we choose, this needs to be documented. I don't see anything in the docs on how Y, YY or YYY are expanded. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Rectifying wrong Date outputs
Robert Haas robertmh...@gmail.com writes: On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Having said that, it's not entirely clear to me what sane behavior is here. Personally I would expect that an n-Ys format spec would consume at most n digits from the input. Otherwise how are you going to use to_date to pick apart strings that don't have any separators? Yeah, seems reasonable. On the flip side, what if you want to allow either a two digit year or a four digit year? It doesn't seem unreasonable to allow YY to encompass what YYYY would have allowed, unless there's a separate notion for 'either YY or YYYY'. What I was thinking was that YYYY would take either 2 or 4 digits. Whatever you do here, the year will have to be delimited by a non-digit for such cases to be parseable. I'm OK with that, but again, exactly what rule is Oracle applying here? Yeah. Hopefully they documented it, and we don't have to try to reverse-engineer the intention from an undersized set of samples. regards, tom lane
Re: [HACKERS] Rectifying wrong Date outputs
On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Having said that, it's not entirely clear to me what sane behavior is here. Personally I would expect that an n-Ys format spec would consume at most n digits from the input. Otherwise how are you going to use to_date to pick apart strings that don't have any separators? Yeah, seems reasonable. On the flip side, what if you want to allow either a two digit year or a four digit year? It doesn't seem unreasonable to allow YY to encompass what YYYY would have allowed, unless there's a separate notion for 'either YY or YYYY'. What I was thinking was that YYYY would take either 2 or 4 digits. Whatever you do here, the year will have to be delimited by a non-digit for such cases to be parseable. I was assuming a slightly more general variant of that - namely, Y, YY, or YYY would all accept that many digits, or more; and the result of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or YYYY, respectively, had been used. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Rectifying wrong Date outputs
Robert Haas robertmh...@gmail.com writes: On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: What I was thinking was that YYYY would take either 2 or 4 digits. Whatever you do here, the year will have to be delimited by a non-digit for such cases to be parseable. I was assuming a slightly more general variant of that - namely, Y, YY, or YYY would all accept that many digits, or more; and the result of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or YYYY, respectively, had been used. As far as I can see, that would completely destroy the use-case of trying to parse a string where there are no non-digit delimiters and so you have to take exactly the specified number of digits, not more. Why not head in the other direction of allowing fewer digits than suggested by the format, instead of more? regards, tom lane
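The two parsing policies being debated -- consume at most n digits (needed for separator-free inputs) versus greedily consume n or more (Robert's proposal) -- can be contrasted in a small sketch. This is illustrative only, not formatting.c; the function and its parameters are assumptions of the sketch.

```c
#include <ctype.h>

/* Parse a year field from s under one of two policies: strict (take at
 * most nys digits) or greedy (take nys or more, up to the first
 * non-digit).  Reports how many characters were consumed. */
int parse_year_field(const char *s, int nys, int greedy, int *consumed)
{
    int val = 0, n = 0;
    while (isdigit((unsigned char) s[n]) && (greedy || n < nys))
    {
        val = val * 10 + (s[n] - '0');
        n++;
    }
    *consumed = n;
    return val;
}
```

Under the strict policy, 'YYMMDD' against "20100101" can still be picked apart; under the greedy policy the year field swallows every leading digit, which is exactly Tom's objection.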
Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL
Robert Haas robertmh...@gmail.com writes: On Mon, Mar 21, 2011 at 7:17 AM, Marti Raudsepp ma...@juffo.org wrote: I know that the Merge Append patch required some changes in the min/max optimization, which is probably the cause. Yeah, I think this is a direct result of commit 034967bdcbb0c7be61d0500955226e1234ec5f04. Yeah, looks that way. I'm not sure what it would take to re-support this case without losing the other advantages of the change. Personally I'm not terribly excited about it: I don't think that suppressing nulls from an index this way is really very useful. Using a partial index probably eats more planner cycles than you'll save, overall. regards, tom lane
[HACKERS] When and where do PG invoke PLs module?
Hello, Thank you for reading my question! Lately, I am reading PostgreSQL's source code! As I understand it, the PLs module (such as PL/pgSQL) allows user-defined functions to be written in other languages. As the PostgreSQL 9.0 documentation says: For a function written in a procedural language, the database server has no built-in knowledge about how to interpret the function's source text. Instead, the task is passed to a special handler that knows the details of the language. The handler could either do all the work of parsing, syntax analysis, execution, etc. itself, or it could serve as "glue" between PostgreSQL and an existing implementation of a programming language. I've tried to find when and where PG invokes the PLs module, but failed. There are four stages for a query string -- parse, rewrite, plan and execute. I want to know which part invokes the PLs module, and which function is the entry point to do that. Looking forward to your reply! from stone.
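As far as I know, the dispatch the question asks about happens at execution time: when the executor evaluates a call to a function, the function-manager layer (fmgr) sees that the function belongs to a procedural language and routes the call through that language's handler rather than through built-in code. A toy sketch of that routing idea follows -- all structures and names here are illustrative stand-ins, not PostgreSQL's actual fmgr data structures.

```c
#include <string.h>

/* A PL handler receives the function's source text and arguments and is
 * responsible for interpreting it; here the "interpreter" just doubles
 * its argument. */
typedef int (*PLHandler)(const char *source_text, int arg);

static int plfoo_handler(const char *src, int arg)
{
    (void) src;       /* a real handler would parse/interpret src */
    return arg * 2;
}

typedef struct { const char *langname; PLHandler handler; } PLEntry;

static PLEntry pl_table[] = { { "plfoo", plfoo_handler } };

/* fmgr-style lookup: find the language's registered handler and route
 * the call through it; -1 stands in for "unknown language" error. */
int call_pl_function(const char *langname, const char *src, int arg)
{
    for (size_t i = 0; i < sizeof pl_table / sizeof pl_table[0]; i++)
        if (strcmp(pl_table[i].langname, langname) == 0)
            return pl_table[i].handler(src, arg);
    return -1;
}
```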
Re: [HACKERS] 2nd Level Buffer Cache
On Mon, 21 Mar 2011 10:24:22 +, Greg Stark wrote: On Fri, Mar 18, 2011 at 11:55 PM, Josh Berkus j...@agliodbs.com wrote: To take the opposite approach... has anyone looked at having the OS just manage all caching for us? Something like MMAPed shared buffers? Even if we find the issue with large shared buffers, we still can't dedicate serious amounts of memory to them because of work_mem issues. Granted, that's something else on the TODO list, but it really seems like we're re-inventing the wheels that the OS has already created here... A lot of people have talked about it. You can find references to mmap going at least as far back as 2001 or so. The problem is that it would depend on the OS implementing things in a certain way and guaranteeing things we don't think can be portably assumed. We would need to mlock large amounts of address space which most OS's don't allow, and we would need to at least mlock and munlock lots of small bits of memory all over the place which would create lots and lots of mappings which the kernel and hardware implementations would generally not appreciate. Actually, just out of curiosity, I did a test with mmap, and I got a 2% boost on data reading, maybe because of skipping the memcpy in fread. I am really curious how fast, if at all, it will be once I add some good and needed stuff, and how e.g. vacuum will work. snip 2-level caches work well for a variety of applications. I think 2-level caches with simple heuristics like pin all the indexes is unlikely to be helpful. At least it won't optimize the average case and I think that's been proven. It might be helpful for optimizing the worst case, which would reduce the standard deviation. Perhaps we're at the point now where that matters. Actually, 2nd level caches do not pin index buffers. In simple words, it's just some set of reserved buffer ids to be used for index pages; all the logic with pinning, etc. is the same. The difference is that default-level operations will not touch the 2nd level.
I'm posting some reports from my simple tests. When I was experimenting with 2nd-level caches I saw that some operations may swap out system table buffers, too. snip Regards, Radek -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL
On Mon, Mar 21, 2011 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Mar 21, 2011 at 7:17 AM, Marti Raudsepp ma...@juffo.org wrote: I know that the Merge Append patch required some changes in the min/max optimization, which is probably the cause. Yeah, I think this is a direct result of commit 034967bdcbb0c7be61d0500955226e1234ec5f04. Yeah, looks that way. I'm not sure what it would take to re-support this case without losing the other advantages of the change. Personally I'm not terribly excited about it: I don't think that suppressing nulls from an index this way is really very useful. Using a partial index probably eats more planner cycles than you'll save, overall. If only 1% of the table has non-NULL values in that column, maybe not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rectifying wrong Date outputs
On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: What I was thinking was that YYYY would take either 2 or 4 digits. Whatever you do here, the year will have to be delimited by a non-digit for such cases to be parseable. I was assuming a slightly more general variant of that - namely, Y, YY, or YYY would all accept that many digits, or more; and the result of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or YYYY, respectively, had been used. As far as I can see, that would completely destroy the use-case of trying to parse a string where there's no non-digit delimiters and so you have to take exactly the specified number of digits, not more. Yeah, I thought about that, but it seems that use case is already hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to worry about it. If Piyush's table is to be believed, Oracle only throws an error for the wrong number of digits if the format is Y or YYY, and the actual number of digits is more. If the format is YY, then it accepts 2, 3, or 4 digit years. And since YY is exponentially more likely to be used than Y or YYY, that pretty much means you can't do what you're talking about using this syntax anyway. Why not head in the other direction of allowing fewer digits than suggested by the format, instead of more? Well, that seems a bit counterintuitive to me. I think it's much more likely that someone wants to insist on a four-digit year (and not allow just two digits) than that they want to insist on a two-digit year (and not allow four digits). I also think that would be pretty terrible for Oracle compatibility, since they're clearly interpreting 99 with YYYY as meaning 0099, not 1999 or 2099. I don't think we want to be randomly incompatible there. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 2nd Level Buffer Cache
On Mon, Mar 21, 2011 at 5:24 AM, Greg Stark gsst...@mit.edu wrote: On Fri, Mar 18, 2011 at 11:55 PM, Josh Berkus j...@agliodbs.com wrote: To take the opposite approach... has anyone looked at having the OS just manage all caching for us? Something like MMAPed shared buffers? Even if we find the issue with large shared buffers, we still can't dedicate serious amounts of memory to them because of work_mem issues. Granted, that's something else on the TODO list, but it really seems like we're re-inventing the wheels that the OS has already created here... A lot of people have talked about it. You can find references to mmap going at least as far back as 2001 or so. The problem is that it would depend on the OS implementing things in a certain way and guaranteeing things we don't think can be portably assumed. We would need to mlock large amounts of address space which most OS's don't allow, and we would need to at least mlock and munlock lots of small bits of memory all over the place which would create lots and lots of mappings which the kernel and hardware implementations would generally not appreciate. As far as I know, no OS has a more sophisticated approach to eviction than LRU. And clock-sweep is a significant improvement on performance over LRU for frequently accessed database objects ... plus our optimizations around not overwriting the whole cache for things like VACUUM. The clock-sweep algorithm was standard OS design before you or I knew how to type. I would expect any half-decent OS to have something at least as good -- perhaps better because it can rely on hardware features to handle things. However the second point is the crux of the issue and of all similar issues on where to draw the line between the OS and Postgres. 
The OS knows better about the hardware characteristics and can better optimize the overall system behaviour, but Postgres understands better its own access patterns and can better optimize its behaviour whereas the OS is stuck reverse-engineering what Postgres needs, usually from simple heuristics. 2-level caches work well for a variety of applications. I think 2-level caches with simple heuristics like pin all the indexes is unlikely to be helpful. At least it won't optimize the average case and I think that's been proven. It might be helpful for optimizing the worst-case which would reduce the standard deviation. Perhaps we're at the point now where that matters. Where it might be helpful is as a more refined version of the sequential scans use limited set of buffers patch. Instead of having each sequential scan use a hard coded number of buffers, perhaps all sequential scans should share a fraction of the global buffer pool managed separately from the main pool. Though in my thought experiments I don't see any real win here. In the current scheme if there's any sign the buffer is useful it gets thrown from the sequential scan's set of buffers to reuse anyways. Now, what would be *really* useful is some way to avoid all the data copying we do between shared_buffers and the FS cache. Well the two options are mmap/mlock or directio. The former might be a fun experiment but I expect any OS to fall over pretty quickly when faced with thousands (or millions) of 8kB mappings. The latter would need Postgres to do async i/o and hopefully a global view of its i/o access patterns so it could do prefetching in a lot more cases. Can't you make just one large mapping and lock it in 8k regions? I thought the problem with mmap was not being able to detect other processes (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html) compatibility issues (possibly obsolete), etc. 
merlin
Re: [HACKERS] 2nd Level Buffer Cache
On 21.03.2011 17:54, Merlin Moncure wrote: Can't you make just one large mapping and lock it in 8k regions? I thought the problem with mmap was not being able to detect other processes (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html) compatibility issues (possibly obsolete), etc. That mail is about replacing SysV shared memory with mmap(). Detecting other processes is a problem in that use, but that's not an issue with using mmap() to replace shared buffers. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL
Robert Haas wrote: Tom Lane wrote: I don't think that suppressing nulls from an index this way is really very useful. Using a partial index probably eats more planner cycles than you'll save, overall. If only 1% of the table has non-NULL values in that column, maybe not. We definitely have indexes with less than 1% non-NULL, and we've found partial indexes to be efficient for them. On the other hand, I can't think where we do min/max on any of them; so as long as this regression only affects those aggregates, it won't hurt our shop. The use case doesn't seem all that far-fetched to me, though. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Sync Rep and shutdown Re: [HACKERS] Sync Rep v19
On 2011-03-21 02:05, Robert Haas wrote: On Sun, Mar 20, 2011 at 11:03 AM, Yeb Havinga yebhavi...@gmail.com wrote: On 2011-03-20 05:44, Robert Haas wrote: Hmm, I'm not going to be able to reproduce this here, and my test setup didn't show a clear regression. I can try beating on it some more, but... Any chance you could rerun your test with the latest master-branch code, and perhaps also with the patch I proposed upthread to remove a branch from the section protected by SyncRepLock? I can't really tell from reading the emails you linked what was responsible for the slowdowns and speedups, and it is unclear to me how much impact my recent changes actually had. No problem. Could you tell me the name of the "remove a branch from the section protected by SyncRepLock" patch, or perhaps a message-link? Upthread I see sync-standbys-defined-rearrangement.patch but also two sync-rep-wait-fixes. Thanks! The things I'd like to see compared are:

pgbench -i -s 50 test
Two runs of pgbench -c 10 -M prepared -T 600 test with 1 sync standby
- server configs etc were mailed upthread.
- performance as of commit e148443ddd95cd29edf4cc1de6188eb9cee029c5: 1158 and 1306 (avg 1232)
- performance as of current git master: 1181 and 1280 (avg 1230,5)
- performance as of current git master with sync-standbys-defined-rearrangement applied: 1152 and 1269 (avg 1210,5)

-- Yeb Havinga http://www.mgrid.net/ Mastering Medical Data
Re: [HACKERS] Rectifying wrong Date outputs
Robert Haas robertmh...@gmail.com writes: On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane t...@sss.pgh.pa.us wrote: As far as I can see, that would completely destroy the use-case of trying to parse a string where there's no non-digit delimiters and so you have to take exactly the specified number of digits, not more. Yeah, I thought about that, but it seems that use case is already hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to worry about it. How so?

regression=# select to_date('20110321', 'YYYYMMDD');
  to_date
------------
 2011-03-21
(1 row)

regression=# select to_date('110321', 'YYMMDD');
  to_date
------------
 2011-03-21
(1 row)

If you break the latter case, I am sure the villagers will be on your doorstep shortly. regards, tom lane
Re: [HACKERS] 2nd Level Buffer Cache
On 3/21/11 3:24 AM, Greg Stark wrote: 2-level caches work well for a variety of applications. I think 2-level caches with simple heuristics like pin all the indexes is unlikely to be helpful. At least it won't optimize the average case and I think that's been proven. It might be helpful for optimizing the worst-case which would reduce the standard deviation. Perhaps we're at the point now where that matters. You're missing my point ... Postgres already *has* a 2-level cache: shared_buffers and the FS cache. Anything we add to that will be adding levels. We already did that, actually, when we implemented ARC: effectively gave PostgreSQL a 3-level cache. The results were not very good, although the algorithm could be at fault there. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rectifying wrong Date outputs
On Mon, Mar 21, 2011 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane t...@sss.pgh.pa.us wrote: As far as I can see, that would completely destroy the use-case of trying to parse a string where there's not non-digit delimiters and so you have to take exactly the specified number of digits, not more. Yeah, I thought about that, but it seems that use case is already hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to worry about it. How so? regression=# select to_date('20110321', 'MMDD'); to_date 2011-03-21 (1 row) regression=# select to_date('110321', 'YYMMDD'); to_date 2011-03-21 (1 row) If you break the latter case, I am sure the villagers will be on your doorstep shortly. Oh, dear. No wonder this code is so hard to get right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Missing semicolon in parser's gram.y
Attached is a single-line patch to add a missing semicolon to gram.y; caught using yyextract. Regards, -- Gurjeet Singh EnterpriseDB Corporation The Enterprise PostgreSQL Company

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d32e480..44c4fd6 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -9156,6 +9156,7 @@ SimpleTypename:
 					$$ = $1;
 					$$->collnames = $2;
 				}
+			;
 
 SimpleTypenameWithoutCollation:
 			GenericType								{ $$ = $1; }
Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL
Kevin Grittner kevin.gritt...@wicourts.gov writes: Robert Haas wrote: Tom Lane wrote: I don't think that suppressing nulls from an index this way is really very useful. Using a partial index probably eats more planner cycles than you'll save, overall. If only 1% of the table has non-NULL values in that column, maybe not. We definitely have indexes with less than 1% non-NULL, and we've found partial indexes to be efficient for them. On the other hand, I can't think where we do min/max on any of them; so as long as this regression only affects those aggregates, it won't hurt our shop. The use case doesn't seem all that far-fetched to me, though. Hmm. We could possibly fix this by having planagg.c do a completely separate planner run for each aggregate, wherein it actually does build the equivalent query SELECT col FROM tab WHERE existing-quals AND col IS NOT NULL ORDER BY col ASC/DESC LIMIT 1 and plan that. That'd be less efficient than the current way, especially for cases where there are multiple aggregates, because there would be some duplication of processing between the per-aggregate planner runs and the main one. But since we can only do this optimization for rather simple queries anyway, maybe it wouldn't matter much. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Sync Rep and shutdown Re: [HACKERS] Sync Rep v19
On Mon, Mar 21, 2011 at 12:29 PM, Yeb Havinga yebhavi...@gmail.com wrote: pgbench -i -s 50 test Two runs of pgbench -c 10 -M prepared -T 600 test with 1 sync standby - server configs etc were mailed upthread. - performance as of commit e148443ddd95cd29edf4cc1de6188eb9cee029c5 1158 and 1306 (avg 1232) - performance as of current git master 1181 and 1280 (avg 1230,5) - performance as of current git master with sync-standbys-defined-rearrangement applied 1152 and 1269 (avg 1210,5) Hmm, that doesn't appear to show the 20% regression Simon claimed upthread. That's good... but I'm confused as to how you are getting numbers this high at all without a BBU. If every commit has to wait for two consecutive fsyncs, cranking out 1200+ commits per second is a lot. Maybe it's just barely plausible if these are 15K drives and all the commits are piggybacking on the fsyncs at top speed, but, man, that's fast. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Planner regression in 9.1: min(x) cannot use partial index with NOT NULL
So it's a clever hack that we used to allow the partial indexes to be used. It relied on the implicit assumption that min(x) and max(x), when the only values of x were NULL, were both NULL. It would be nice if we were clever enough to support *any* strict aggregate using partial indexes on WHERE NOT NULL since they'll all have that property. 
Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL
Greg Stark gsst...@mit.edu writes: So it's a clever hack that we used to allow the partial indexes to be used. It relied on the implicit assumption that min(x) and max(x) where the only values of x where NULL were both NULL. It would be nice if we were clever enough to support *any* strict aggregate using partial indexes on WHERE NOT NULL since they'll all have that property. Huh? The point of the min/max optimization is to not scan the whole index but just fetch the endpoint value. For general aggregates, you have to scan the table anyway. If an index is useful for that, it'll get picked up in the normal planning process. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] psql \dt and table size
It struck me again today that \dt+ isn't displaying the accurate table size for tables, since it still uses pg_relation_size(). With pg_table_size() available since PostgreSQL 9.0, I believe it would be more useful to have the total acquired storage displayed, including implicit objects (the mentioned case where it was not very useful was a table with a big TOAST table). The attached minor patch extends \dt to use pg_table_size() starting with PostgreSQL 9.0; not sure if we backport such changes though. It would be interesting for 9.1, however. -- Thanks Bernd psql_tablesize.patch Description: Binary data
Re: [HACKERS] Missing semicolon in parser's gram.y
Gurjeet Singh singh.gurj...@gmail.com writes: Attached is a single-line patch to add a missing semicolon to gram.y; caught using yyextract. Done some time ago ... http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a051ef699c3ed1f89088dd6bbc2574f13d0b20eb#patch16 regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Sync Rep and shutdown Re: [HACKERS] Sync Rep v19
On 2011-03-21 18:04, Robert Haas wrote: On Mon, Mar 21, 2011 at 12:29 PM, Yeb Havingayebhavi...@gmail.com wrote: pgbench -i -s 50 test Two runs of pgbench -c 10 -M prepared -T 600 test with 1 sync standby - server configs etc were mailed upthread. - performance as of commit e148443ddd95cd29edf4cc1de6188eb9cee029c5 1158 and 1306 (avg 1232) - performance as of current git master 1181 and 1280 (avg 1230,5) - performance as of current git master with sync-standbys-defined-rearrangement applied 1152 and 1269 (avg 1210,5) I ran another pgbench with this last setup, which gives it a 1240,33 average: tps = 1300.786386 (including connections establishing) tps = 1300.844220 (excluding connections establishing) IMO what these tests have shown is that there is no 20% performance difference between the different versions. To determine if there are differences, n should be a lot higher, or perhaps a single one with a very large duration. Hmm, that doesn't appear to show the 20% regression Simon claimed upthread. That's good... but I'm confused as to how you are getting numbers this high at all without a BBU. For the sake of testing syncrep, I put xfs in nobarrier mode on both master and standby: /dev/sdc1 on /xlog type xfs (rw,noatime,nodiratime,nobarrier) /dev/md11 on /archive type xfs (rw,noatime,nodiratime,nobarrier,logdev=/dev/sdc3) /dev/md10 on /data type xfs (rw,noatime,nodiratime,nobarrier,logdev=/dev/sdc2) -- Yeb Havinga http://www.mgrid.net/ Mastering Medical Data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 2nd Level Buffer Cache
On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure mmonc...@gmail.com wrote: Can't you make just one large mapping and lock it in 8k regions? I thought the problem with mmap was not being able to detect other processes (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html) compatibility issues (possibly obsolete), etc. I was assuming that locking part of a mapping would force the kernel to split the mapping. It has to record the locked state somewhere so it needs a data structure that represents the size of the locked section and that would, I assume, be the mapping. It's possible the kernel would not in fact fall over too badly doing this. At some point I'll go ahead and do experiments on it. It's a bit fraught though, as the performance may depend on the memory management features of the chipset. That said, that's only part of the battle. On 32bit you can't map the whole database as your database could easily be larger than your address space. I have some ideas on how to tackle that but the simplest test would be to just mmap 8kB chunks everywhere. But it's worse than that. Since you're not responsible for flushing blocks to disk any longer you need some way to *unlock* a block when it's possible for it to be flushed. That means when you flush the xlog you have to somehow find all the blocks that might no longer need to be locked and atomically unlock them. That would require new infrastructure we don't have though it might not be too hard. What would be nice is a mlock_until() where you eventually issue a call to tell the kernel what point in time you've reached and it unlocks everything older than that time. -- greg
Re: [HACKERS] 2nd Level Buffer Cache
On Mon, Mar 21, 2011 at 4:47 PM, Josh Berkus j...@agliodbs.com wrote: You're missing my point ... Postgres already *has* a 2-level cache: shared_buffers and the FS cache. Anything we add to that will be adding levels. I don't think those two levels are interesting -- they don't interact cleverly at all. I was assuming the two levels were segments of the shared buffers that didn't interoperate at all. If you kick buffers from the higher level cache into the lower level one then why not just increase the number of clock sweeps before you flush a buffer and insert non-index pages into a lower clock level instead of writing code for two levels? I don't think it will outperform in general because LRU is provably within some margin from optimal and the clock sweep is an approximate LRU. The only place you're going to find wins is when you know something extra about the *future* access pattern that the lru/clock doesn't know based on the past behaviour. Just saying indexes are heavily used or system tables are heavily used isn't really extra information since the LRU can figure that out. Something like sequential scans of tables larger than shared buffers don't go back and read old pages before they age out is. The other place you might win is if you have some queries that you want to always be fast at the expense of slower queries. So your short web queries that only need to touch a few small tables and system tables can tag buffers that are higher priority and shouldn't be swapped out to achieve a slightly higher hit rate on the global cache. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Planner regression in 9.1: min(x) cannot use partial index with NOT NULL
On Mon, Mar 21, 2011 at 5:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark gsst...@mit.edu writes: So it's a clever hack that we used to allow the partial indexes to be used. It relied on the implicit assumption that min(x) and max(x) where the only values of x where NULL were both NULL. It would be nice if we were clever enough to support *any* strict aggregate using partial indexes on WHERE NOT NULL since they'll all have that property. Huh? The point of the min/max optimization is to not scan the whole index but just fetch the endpoint value. But in the case where the index has no records it doesn't know whether there were no records in the table or they were just all NULL. As it happens min() and max() return NULL in both cases so it doesn't matter. My point was that this is a clever hack and a non-obvious deduction the planner is making. For general aggregates, you have to scan the table anyway. If an index is useful for that, it'll get picked up in the normal planning process. if I do SELECT count(col) from tab with no WHERE clauses on a table with 1% non-null values in col will the planner correctly find the partial index? If so why doesn't the min/max planning find it? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tolower() identifier downcasing versus multibyte encodings
I just received feedback from our bug report about this problem and it seems the problem also occurred on a windows machine. http://pgfoundry.org/tracker/index.php?func=detail&aid=1010988&group_id=1000140&atid=590 On Sat, Mar 19, 2011 at 14:13, Marko Kreen mark...@gmail.com wrote: On Sat, Mar 19, 2011 at 5:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: Marko Kreen mark...@gmail.com writes: On Sat, Mar 19, 2011 at 6:10 AM, Tom Lane t...@sss.pgh.pa.us wrote: Or we could bite the bullet and start using str_tolower(), but the performance implications of that are unpleasant; not to mention that we really don't want to re-introduce the Turkish problem with unexpected handling of i/I in identifiers. How about first pass with 'a' - 'A' and if highbit is found then str_tolower()? Hm, maybe. There's still the problem of what to do in src/port/pgstrcasecmp.c, which won't have the infrastructure needed to do that. You mean client-side? Could we have a str_tolower without xxx_l branch that always does wide-char conversion if the high bit is set? A custom locale won't make sense there anyway? -- marko -- Regards, Francisco Figueiredo Jr. Npgsql Lead Developer http://www.npgsql.org http://fxjr.blogspot.com http://twitter.com/franciscojunior
[HACKERS] Chinese initdb on Windows
On windows, if you have OS locale set to Chinese (Simplified, PRC), initdb fails:

X:\> C:\pgsql-install\bin\initdb.exe -D data2
The files belonging to this database system will be owned by user Heikki.
This user must also own the server process.
The database cluster will be initialized with locale Chinese (Simplified)_People's Republic of China.936.
initdb: locale Chinese (Simplified)_People's Republic of China.936 requires unsupported encoding GBK
Encoding GBK is not allowed as a server-side encoding.
Rerun initdb with a different locale selection.

The easy workaround for that is to specify --encoding=UTF-8, as UTF-8 can be used with any locale on Windows. How about doing that automatically in initdb? Now that we have the smarts in psql to detect current encoding from the environment and set client_encoding accordingly, it Just Works. Attached is a patch for that. Once you get past that, however, there's another issue:

...
creating directory data2 ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in data2/base/1 ... ok
initializing pg_authid ... FATAL: database locale is incompatible with operating system
DETAIL: The database was initialized with LC_COLLATE Chinese (Simplified)_Peoples Republic of China.936, which is not recognized by setlocale().
HINT: Recreate the database with another locale or install the missing locale.
child process exited with exit code 1

The problem is probably the apostrophe in the locale name, although it seems to be missing from the above error message. setlocale() has a known problem with locale names that have dots in the country name, and looks like it has similar issues with apostrophes. Fortunately, there are aliases for those problematic locales on Windows, that don't have dots or apostrophes in the names.
We did some testing in EnterpriseDB of various locales on various versions of Windows, and came up with the following mappings:

*_Hong Kong S.A.R.* -> *_HKG.*
*_U.A.E.* -> *_ARE.*
*_People's Republic of China.* -> *_China.*
China_Macau S.A.R..950 -> ZHM

The first three mappings map the full country name to an abbreviation that is also accepted by Windows' setlocale(). See http://msdn.microsoft.com/en-us/library/cdax410z%28v=vs.71%29.aspx. ARE is not on that list, but seems to work. Macau is trickier. ZHM is not an abbreviation of the country, but of the whole locale, so we can't replace just the country part. So this will not work for Finnish_Macau S.A.R..950, like the other mappings do. Nevertheless, it works for the common case. Any objections to the 2nd attached patch, which adds the mapping of those locale names on Windows? I'm thinking it's not too late to do this in 9.1. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 98e864d..6999a42 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2909,7 +2909,19 @@ main(int argc, char *argv[])
 	}
 	else if (!pg_valid_server_encoding_id(ctype_enc))
 	{
-		/* We recognized it, but it's not a legal server encoding */
+		/*
+		 * We recognized it, but it's not a legal server encoding.
+		 * On Windows, UTF-8 works with any locale, so we can fall back
+		 * to UTF-8.
+		 */
+#ifdef WIN32
+		printf(_("Encoding %s implied by locale is not allowed as a server-side encoding.\n"
+				 "The default database encoding has been set to %s instead.\n"),
+			   pg_encoding_to_char(ctype_enc),
+			   pg_encoding_to_char(PG_UTF8));
+		ctype_enc = PG_UTF8;
+		encodingid = encodingid_to_string(ctype_enc);
+#else
 		fprintf(stderr,
 				_("%s: locale %s requires unsupported encoding %s\n"),
 				progname, lc_ctype, pg_encoding_to_char(ctype_enc));
@@ -2918,6 +2930,7 @@ main(int argc, char *argv[])
 				  "Rerun %s with a different locale selection.\n"),
 				pg_encoding_to_char(ctype_enc), progname);
 		exit(1);
+#endif
 	}
 	else
 	{

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 6999a42..754f65b 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -185,6 +185,8 @@ static int	locale_date_order(const char *locale);
 static bool check_locale_name(const char *locale);
 static bool check_locale_encoding(const char *locale, int encoding);
 static void setlocales(void);
+static void strreplace(char *str, char *needle, char *replacement);
+static char *localemap(char *locale);
 static void usage(const char *progname);
 
 #ifdef WIN32
@@ -2253,6 +2255,79 @@ check_locale_encoding(const char *locale, int user_enc)
 	return true;
 }
 
+/*
+ * Replace 'needle' with 'replacement' in 'str'. Note that the replacement
+ * is done in-place, so 'replacement' must be shorter than 'needle'.
+ */
+static void
+strreplace(char *str, char *needle, char *replacement)
+{
+	char	   *s;
+
+	s = strstr(str, needle);
+	if (s != NULL)
Re: [HACKERS] 2nd Level Buffer Cache
Excerpts from Josh Berkus's message of Mon Mar 21 13:47:21 -0300 2011: We already did that, actually, when we implemented ARC: effectively gave PostgreSQL a 3-level cache. The results were not very good, although the algorithm could be at fault there. Was it really all that bad? IIRC we replaced ARC with the current clock sweep due to patent concerns. (Maybe there were performance concerns as well, I don't remember). -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL
Greg Stark gsst...@mit.edu writes: On Mon, Mar 21, 2011 at 5:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: For general aggregates, you have to scan the table anyway. If an index is useful for that, it'll get picked up in the normal planning process. If I do SELECT count(col) FROM tab with no WHERE clause on a table with 1% non-null values in col, will the planner correctly find the partial index? If so, why doesn't the min/max planning find it? It will not. The hard part of doing something with that is that there could be more than one aggregate. I did think about whether we could just push the IS NOT NULL into the main query, but that falls down on cases like this: select min(x), max(y) from tab; If we try to modify that to select min(x), max(y) from tab where x is not null and y is not null; then we get the wrong answers, since x and y are probably nonnull in different subsets of the table. In the case of min/max, the endpoint hack makes the aggregates so cheap that we can afford to perform a separate indexscan for each aggregate, and thus having a NOT NULL qual that is different for each aggregate isn't a problem (as long as we make sure it only affects that aggregate's subquery and not the whole query). This approach doesn't scale to aggregates that will scan the whole table, though. I suppose we might be able to do what you're suggesting for the case of only one aggregate, but that isn't going to meet the desire of not having a regression from what 9.0 could do with min/max. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
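The wrong-answer case and the per-aggregate rewrite described above can be sketched in SQL, using a hypothetical table tab(x, y) where x and y are NULL in different rows:

```sql
-- Wrong: one combined WHERE discards any row whose *other* column is
-- NULL, so it can throw away the row holding the true minimum of x:
--   SELECT min(x), max(y) FROM tab WHERE x IS NOT NULL AND y IS NOT NULL;

-- Safe: each aggregate gets its own (index-friendly) subquery, so the
-- NOT NULL qual affects only that aggregate:
SELECT
    (SELECT x FROM tab WHERE x IS NOT NULL ORDER BY x ASC  LIMIT 1) AS min_x,
    (SELECT y FROM tab WHERE y IS NOT NULL ORDER BY y DESC LIMIT 1) AS max_y;
```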
Re: [HACKERS] 2nd Level Buffer Cache
Was it really all that bad? IIRC we replaced ARC with the current clock sweep due to patent concerns. (Maybe there were performance concerns as well, I don't remember). Yeah, that was why the patent was frustrating. Performance was poor and we were planning on replacing ARC in 8.2 anyway. Instead we had to backport it. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 2nd Level Buffer Cache
On Mon, Mar 21, 2011 at 2:08 PM, Greg Stark gsst...@mit.edu wrote: On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure mmonc...@gmail.com wrote: Can't you make just one large mapping and lock it in 8k regions? I thought the problem with mmap was not being able to detect other processes (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html), compatibility issues (possibly obsolete), etc. I was assuming that locking part of a mapping would force the kernel to split the mapping. It has to record the locked state somewhere so it needs a data structure that represents the size of the locked section and that would, I assume, be the mapping. It's possible the kernel would not in fact fall over too badly doing this. At some point I'll go ahead and do experiments on it. It's a bit fraught though, as the performance may depend on the memory management features of the chipset. That said, that's only part of the battle. On 32bit you can't map the whole database as your database could easily be larger than your address space. I have some ideas on how to tackle that but the simplest test would be to just mmap 8kB chunks everywhere. Even on 64 bit systems you only have 48 bit address space which is not a theoretical limitation. However, at least on linux you can map in and map out pretty quick (10 microseconds paired on my linux vm) so that's not so big of a deal. Dealing with rapidly growing files is a problem. That said, probably you are not going to want to reserve multiple gigabytes in 8k non-contiguous chunks. But it's worse than that. Since you're not responsible for flushing blocks to disk any longer you need some way to *unlock* a block when it's possible to be flushed. That means when you flush the xlog you have to somehow find all the blocks that might no longer need to be locked and atomically unlock them. That would require new infrastructure we don't have, though it might not be too hard.
What would be nice is a mlock_until() where you eventually issue a call to tell the kernel what point in time you've reached and it unlocks everything older than that time. I wonder if there is any reason to mlock at all...if you are going to 'do' mmap, can't you just hide under current lock architecture for actual locking and do direct memory access without mlock? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Chinese initdb on Windows
On Mon, Mar 21, 2011 at 7:29 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On Windows, if you have the OS locale set to Chinese (Simplified, PRC), initdb fails:

X:\> C:\pgsql-install\bin\initdb.exe -D data2
The files belonging to this database system will be owned by user Heikki.
This user must also own the server process.
The database cluster will be initialized with locale Chinese (Simplified)_People's Republic of China.936.
initdb: locale Chinese (Simplified)_People's Republic of China.936 requires unsupported encoding GBK
Encoding GBK is not allowed as a server-side encoding.
Rerun initdb with a different locale selection.

The easy workaround for that is to specify --encoding=UTF-8, as UTF-8 can be used with any locale on Windows. How about doing that automatically in initdb? Now that we have the smarts in psql to detect current encoding from the environment and set client_encoding accordingly, it Just Works. Attached is a patch for that. Once you get past that, however, there's another issue:

...
creating directory data2 ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in data2/base/1 ... ok
initializing pg_authid ... FATAL: database locale is incompatible with operating system
DETAIL: The database was initialized with LC_COLLATE Chinese (Simplified)_Peoples Republic of China.936, which is not recognized by setlocale().
HINT: Recreate the database with another locale or install the missing locale.
child process exited with exit code 1

The problem is probably the apostrophe in the locale name, although it seems to be missing from the above error message. setlocale() has a known problem with locale names that have dots in the country name, and it looks like it has similar issues with apostrophes.
Fortunately, there are aliases for those problematic locales on Windows that don't have dots or apostrophes in the names. We did some testing in EnterpriseDB of various locales on various versions of Windows, and came up with the following mappings:

*_Hong Kong S.A.R.* -> *_HKG.*
*_U.A.E.* -> *_ARE.*
*_People's Republic of China.* -> *_China.*
China_Macau S.A.R..950 -> ZHM

The first three mappings map the full country name to an abbreviation that is also accepted by Windows' setlocale(). See http://msdn.microsoft.com/en-us/library/cdax410z%28v=vs.71%29.aspx. ARE is not on that list, but seems to work. Macau is trickier. ZHM is not an abbreviation of the country, but of the whole locale, so we can't replace just the country part. So this will not work for Finnish_Macau S.A.R..950, like the other mappings do. Nevertheless, it works for the common case. Any objections to the 2nd attached patch, which adds the mapping of those locale names on Windows? I'm thinking it's not too late to do this in 9.1. I've heard complaints a number of times from Chinese users whom I believe this would help. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] really lazy vacuums?
On Mar 16, 2011, at 7:44 PM, Robert Haas wrote: It would be really nice (for this and for other things) if we had some way of measuring the I/O saturation of the system, so that we could automatically adjust the aggressiveness of background processes accordingly. Has anyone looked at the overhead of measuring how long IO requests to the kernel take? If we did that not only could we get an idea of what our IO workload looked like, we could also figure out whether a block came out of cache or not. That information could potentially be useful to the planner, but even if the database couldn't use that knowledge itself it would be a damn useful statistic to have... IMHO, far more useful than our current hit rate statistics. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 2nd Level Buffer Cache
Merlin Moncure mmonc...@gmail.com Monday 21 March 2011 20:58:16 On Mon, Mar 21, 2011 at 2:08 PM, Greg Stark gsst...@mit.edu wrote: On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure mmonc...@gmail.com wrote: Can't you make just one large mapping and lock it in 8k regions? I thought the problem with mmap was not being able to detect other processes (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.html), compatibility issues (possibly obsolete), etc. I was assuming that locking part of a mapping would force the kernel to split the mapping. It has to record the locked state somewhere so it needs a data structure that represents the size of the locked section and that would, I assume, be the mapping. It's possible the kernel would not in fact fall over too badly doing this. At some point I'll go ahead and do experiments on it. It's a bit fraught though, as the performance may depend on the memory management features of the chipset. That said, that's only part of the battle. On 32bit you can't map the whole database as your database could easily be larger than your address space. I have some ideas on how to tackle that but the simplest test would be to just mmap 8kB chunks everywhere. Even on 64 bit systems you only have 48 bit address space which is not a theoretical limitation. However, at least on linux you can map in and map out pretty quick (10 microseconds paired on my linux vm) so that's not so big of a deal. Dealing with rapidly growing files is a problem. That said, probably you are not going to want to reserve multiple gigabytes in 8k non-contiguous chunks. But it's worse than that. Since you're not responsible for flushing blocks to disk any longer you need some way to *unlock* a block when it's possible to be flushed. That means when you flush the xlog you have to somehow find all the blocks that might no longer need to be locked and atomically unlock them. That would require new infrastructure we don't have, though it might not be too hard.
What would be nice is a mlock_until() where you eventually issue a call to tell the kernel what point in time you've reached and it unlocks everything older than that time. Sorry for being curious, but I think mlock is for preventing swapping, not for preventing flushing. I wonder if there is any reason to mlock at all...if you are going to 'do' mmap, can't you just hide under current lock architecture for actual locking and do direct memory access without mlock? merlin The mmap man page does not say anything about when a flush occurs when the mapping is a shared file mapping, so flushes may happen whether intended or not. What's more, from what I have read, SysV shared memory is emulated with mmap (and I think this mmap is on /dev/shm). Radek -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Sync Rep and shutdown Re: [HACKERS] Sync Rep v19
On Mon, Mar 21, 2011 at 7:51 PM, Yeb Havinga yebhavi...@gmail.com wrote: On 2011-03-21 18:04, Robert Haas wrote: On Mon, Mar 21, 2011 at 12:29 PM, Yeb Havinga yebhavi...@gmail.com wrote: pgbench -i -s 50 test Two runs of pgbench -c 10 -M prepared -T 600 test with 1 sync standby - server configs etc were mailed upthread.

- performance as of commit e148443ddd95cd29edf4cc1de6188eb9cee029c5: 1158 and 1306 (avg 1232)
- performance as of current git master: 1181 and 1280 (avg 1230.5)
- performance as of current git master with sync-standbys-defined-rearrangement applied: 1152 and 1269 (avg 1210.5)

IMO what these tests have shown is that there is no 20% performance difference between the different versions. To determine if there are differences, n should be a lot higher, or perhaps a single one with a very large duration. pgbench -T 3600: sync-standbys-defined-rearrangement 1270 tps, current git master 1306 tps. -- Yeb Havinga http://www.mgrid.net/ Mastering Medical Data
Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL
I wrote: Hmm. We could possibly fix this by having planagg.c do a completely separate planner run for each aggregate, wherein it actually does build the equivalent query SELECT col FROM tab WHERE existing-quals AND col IS NOT NULL ORDER BY col ASC/DESC LIMIT 1 and plan that. That'd be less efficient than the current way, especially for cases where there are multiple aggregates, because there would be some duplication of processing between the per-aggregate planner runs and the main one. But since we can only do this optimization for rather simple queries anyway, maybe it wouldn't matter much. I studied the code some more, and I think this probably can be made to work. The basic idea is to have preprocess_minmax_aggregates build simplified queries like the above (working by modifying the query tree that exists at the point where it's called) and call query_planner on them. Save aside the resulting path data, then let the regular planning process continue. When optimize_minmax_aggregates is called, see whether the regular plan is cheaper than the sum of the path costs. If not, use the paths to construct a replacement plan, same as now. The reason this should work is that query_planner() embodies pretty much all the useful processing that happens between preprocess_minmax_aggregates and optimize_minmax_aggregates --- the other code in that stretch is mostly about grouping, which would disable the minmax optimization anyway. So no important steps will get left out. Of course, this introduces still more coupling between planagg.c and planner.c, but I think that's probably tolerable. The main objection to this approach is having to do all the index analysis N+1 times for an N-aggregate query. I don't see any practical alternative though if we want to make use of indexes that wouldn't be used without the IS NOT NULL clause. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Chinese initdb on Windows
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Any objections to the 2nd attached patch, which adds the mapping of those locale names on Windows? I think the added initdb message isn't following our style guidelines --- it certainly doesn't match the adjacent existing message. Other than that quibble, ok here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \dt and table size
+1 for fixing this behavior in 9.1. -1 for changing in 9.0, as the change in behavior mid-release will cause more confusion than the incomplete accounting does. Cheers, David. On Mon, Mar 21, 2011 at 06:44:51PM +0100, Bernd Helmle wrote: It struck me again today that \dt+ isn't displaying the accurate table size for tables, since it still uses pg_relation_size(). With pg_table_size() available since PostgreSQL 9.0, I believe it would be more useful to have the total acquired storage displayed, including implicit objects (the mentioned case where it was not very useful atm was a table with a big TOAST table). Attached minor patch extends \dt to use pg_table_size() starting with PostgreSQL 9.0, not sure if we backport such changes though. It would be interesting for 9.1, however. -- Thanks Bernd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
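The difference Bernd describes can be seen directly in SQL (the table name t is hypothetical): pg_relation_size() reports only the main fork of the heap, while pg_table_size(), available since 9.0, also counts the TOAST data and the free space and visibility maps:

```sql
-- 't' is a hypothetical table with a heavily TOASTed column; the two
-- numbers can differ by orders of magnitude in that case.
SELECT pg_size_pretty(pg_relation_size('t')) AS main_heap_only,
       pg_size_pretty(pg_table_size('t'))    AS incl_toast_and_maps;
```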
Re: [HACKERS] really lazy vacuums?
On Mon, Mar 21, 2011 at 6:08 PM, Jim Nasby j...@nasby.net wrote: Has anyone looked at the overhead of measuring how long IO requests to the kernel take? If we did that not only could we get an idea of what our IO workload looked like, we could also figure out whether a block came out of cache or not. That information could potentially be useful to the planner, but even if the database couldn't use that knowledge itself it would be a damn useful statistic to have... IMHO, far more useful than our current hit rate statistics. I've done this -- actually better, I used mincore to actually check whether the block was in cache before issuing the read -- but it turns out you can't get what you're looking for this way. It turns out when you do this you see one block being read from disk followed by n blocks that all appear to be cache hits. Because they've been prefetched by the kernel. What you end up with is actually something like the number of iops which is also an interesting measure but not really what you were looking for. My getrusage patch, which I should still dig out though it's rather too late to be committing now unless someone tells me otherwise, would tell you how much i/o a plan node actually did. But you won't know which blocks did the i/o since I was only tracking totals for the plan node. That's probably what you're looking for here. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2011 - Mentors? Projects?
On Mon, 2011-03-14 at 16:08 +0000, Dave Page wrote: Please count me in as a mentor for a GUI project. It is an open source dashboard for PostgreSQL DBAs, called ADJ Dashboard. I can't find any info on that via Google. URL please? Erdinc said that he will set up a web page for this, and let the list know. Regards, -- Devrim GÜNDÜZ Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz
Re: [HACKERS] How to look at the Expression Trees
Hi, You said: ExecMakeFunctionResultNoSets is used to evaluate function calls. What are the 'functions' there? Are they the user-supplied PL/pgSQL-style functions, the functions handled by fmgr, or just other C functions that make up the expression evaluator? Regards, Vaibhav
[HACKERS] Re: Planner regression in 9.1: min(x) cannot use partial index with NOT NULL
On Tue, Mar 22, 2011 at 4:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: Reimplement planner's handling of MIN/MAX aggregate optimization (again). I'm just curious: why is this no longer an interesting special case?

--- this is an interesting special case as of 9.1
-explain (costs off)
-  select min(unique2) from tenk1 where unique2 = 42;
-         QUERY PLAN
- Aggregate
-   ->  Index Scan using tenk1_unique2 on tenk1
-         Index Cond: (unique2 = 42)

-- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planner regression in 9.1: min(x) cannot use partial index with NOT NULL
Greg Stark gsst...@mit.edu writes: On Tue, Mar 22, 2011 at 4:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: Reimplement planner's handling of MIN/MAX aggregate optimization (again). I'm just curious: why is this no longer an interesting special case?

--- this is an interesting special case as of 9.1
-explain (costs off)
-  select min(unique2) from tenk1 where unique2 = 42;
-         QUERY PLAN
- Aggregate
-   ->  Index Scan using tenk1_unique2 on tenk1
-         Index Cond: (unique2 = 42)

In the pathkey-based implementation, that resulted in an empty pathkey list, which that implementation couldn't deal with. I figured that was okay because the default plan isn't bad in such a case, but I put in a test case (probably because the code failed before I put in a defense against it, but I don't recall for sure). It's not particularly a corner case for the new code, though, and the resulting plan changed (because the new code will in fact turn this into a LIMIT subselect anyway). So I debated whether to change the expected output or just take it out, and I chose the latter. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers