Re: [HACKERS] switch UNLOGGED to LOGGED
Yes, that seems like a very appealing approach. There is plenty of bit-space available in xinfo, and we could reserve a bit each for nrels, nsubxacts, and nmsgs, with set meaning that an integer count of that item is present and clear meaning that the count is omitted from the structure (and zero). This will probably require a bit of tricky code reorganization so I think it should be done separately from the main patch. OK, I'll try to send a patch with this change only. BTW, xinfo is 32 bits long, but I think only 2 bits are used right now? I think I can make it 8 bits, and add another 8 bits for nrels, nsubxacts, and nmsgs and the new thing. That should save another 2 bytes, while leaving space for extension. Or we can make it 8 bits only, but only 2 bits would be left empty for future extensions; I don't know if we care about it... Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
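[Editorial note: to make the presence-flag idea above concrete, here is a minimal sketch. The struct and XINFO_HAS_* names and bit values are illustrative assumptions, not the actual xact.h definitions.]

/*
 * Hedged sketch only -- names and bit values are assumptions, not the real
 * xact.h definitions.  The idea: a set bit in xinfo means the corresponding
 * count (and array) follows the fixed-size header; a clear bit means the
 * count is omitted and implicitly zero, shrinking the common case.
 */
#define XINFO_HAS_RELFILENODES  0x01    /* nrels + relfilenode array present */
#define XINFO_HAS_SUBXACTS      0x02    /* nsubxacts + subxid array present */
#define XINFO_HAS_INVAL_MSGS    0x04    /* nmsgs + invalidation messages present */

typedef struct xl_xact_commit_sketch
{
    TimestampTz xact_time;      /* time of commit */
    uint32      xinfo;          /* presence flags as above */
    /* optional counts/arrays follow here, only for the bits that are set */
} xl_xact_commit_sketch;

Readers of the record would test each bit and step a pointer past only the optional pieces that are actually present.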
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
On 10.05.2011 04:43, Greg Smith wrote: Josh Berkus wrote: As I don't think we can change this, I think the best answer is to tell people Don't submit a big patch to PostgreSQL until you've done a few small patches first. You'll regret it. When I last did a talk about getting started writing patches, I had a few people ask me afterwards if I'd ever run into problems with having patch submissions rejected. I said I hadn't. When asked what my secret was, I told them my first serious submission modified exactly one line of code[1]. And *that* I had to defend in regards to its performance impact.[2] Anyway, I think the intro message should be Don't submit a big patch to PostgreSQL until you've done a small patch and some patch review instead though. Well, my first patch was two-phase commit. And I had never even used PostgreSQL before I dived into the source tree and started to work on that. I did, however, lurk on the pgsql-hackers mailing list for a few months before posting, so I knew the social dynamics. I basically did exactly what Robert described elsewhere in this thread, and successfully avoided the culture shock. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Process wakeups when idle and power consumption
On 10 May 2011 02:58, Fujii Masao masao.fu...@gmail.com wrote: Alright. I'm currently working on a proof-of-concept implementation of that. In the meantime, any thoughts on how this should meld with the existing latch implementation? How about making WaitLatch monitor the file descriptor for the pipe by using select()? Alright, so it's reasonable to assume that all clients of the latch code are happy to be invariably woken up on Postmaster death? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Process wakeups when idle and power consumption
On 10.05.2011 11:22, Peter Geoghegan wrote: On 10 May 2011 02:58, Fujii Masao masao.fu...@gmail.com wrote: Alright. I'm currently working on a proof-of-concept implementation of that. In the meantime, any thoughts on how this should meld with the existing latch implementation? How about making WaitLatch monitor the file descriptor for the pipe by using select()? Alright, so it's reasonable to assume that all clients of the latch code are happy to be invariably woken up on Postmaster death? That doesn't sound like a safe assumption. All the helper processes should die quickly on postmaster death, but I'm not sure if that holds for all inter-process communication. I think the caller needs to specify if he wants that or not. Once you add that to the WaitLatchOrSocket function, it's quite clear that the API is getting out of hand. There are five different events that can wake it up:

* latch is set
* a socket becomes readable
* a socket becomes writeable
* timeout
* postmaster dies

I think we need to refactor the function into something like:

#define WL_LATCH_SET 1
#define WL_SOCKET_READABLE 2
#define WL_SOCKET_WRITEABLE 4
#define WL_TIMEOUT 8
#define WL_POSTMASTER_DEATH 16

int WaitLatch(Latch *latch, int events, long timeout)

Where 'events' is a bitmask of events that should cause a wakeup, and the return value is a bitmask identifying which event(s) caused the call to return. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
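[Editorial note: as a rough illustration of how a caller might use the proposed flag-based interface -- a sketch under the assumptions above; the latch variable, the timeout, and the work done on each wakeup are placeholders, not existing code.]

/*
 * Sketch of a background process main loop using the proposed flag-based
 * WaitLatch().  Assumes the WL_* bits and the signature from the proposal
 * above; not an existing API at the time of writing.
 */
static void
example_main_loop(Latch *mylatch)
{
    for (;;)
    {
        int     rc;

        rc = WaitLatch(mylatch,
                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                       10000L);     /* wake up at least every 10 seconds */

        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);           /* postmaster is gone: exit promptly */

        if (rc & WL_LATCH_SET)
            ResetLatch(mylatch);    /* someone set the latch: handle the request */

        /* on WL_TIMEOUT, fall through and do periodic housekeeping */
    }
}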
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
On Tue, May 10, 2011 at 1:46 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 10.05.2011 04:43, Greg Smith wrote: Josh Berkus wrote: As I don't think we can change this, I think the best answer is to tell people Don't submit a big patch to PostgreSQL until you've done a few small patches first. You'll regret it. When I last did a talk about getting started writing patches, I had a few people ask me afterwards if I'd ever run into problems with having patch submissions rejected. I said I hadn't. When asked what my secret was, I told them my first serious submission modified exactly one line of code[1]. And *that* I had to defend in regards to its performance impact.[2] Anyway, I think the intro message should be Don't submit a big patch to PostgreSQL until you've done a small patch and some patch review instead though. Well, my first patch was two-phase commit. And I had never even used PostgreSQL before I dived into the source tree and started to work on that. I did, however, lurk on the pgsql-hackers mailing list for a few months before posting, so I knew the social dynamics. I basically did exactly what Robert described elsewhere in this thread, and successfully avoided the culture shock. Yeah, probably same for me, though I got a lot of support from existing hackers during my first submission. But it was a tiring experience for sure. I would submit a patch and then wait anxiously for any comments. I used to get a lot of interesting and valuable comments, but I knew that unless one of the very few key members (Tom?) said something, good or bad, it wouldn't go anywhere, and those comments did not come in the early days/months. I was an unknown name and what I was trying to do was very invasive. So when I look back now, I can understand the reluctance of other members to get excited about the work. Most often they would see something in the design or the patch which was completely stupid and they would lose all interest at that very moment. Since I had the backing of EnterpriseDB and it was my paid job, it was much easier to keep up the enthusiasm, but I wouldn't be surprised if a few others had turned their backs on the project forever. Fortunately, things have changed for the better now. I think the entire commit fest business is good. Also, we now have a lot more hackers with expertise in different areas and with influential opinions. It's very likely that if you submit an idea or a patch, you would get some comment/suggestion/criticism very early. Since HOT is mentioned often in these discussions, I thought I should share my experience. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Process wakeups when idle and power consumption
On 10 May 2011 09:45, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I think we need to refactor the function into something like: #define WL_LATCH_SET 1 #define WL_SOCKET_READABLE 2 #define WL_SOCKET_WRITEABLE 4 #define WL_TIMEOUT 8 #define WL_POSTMASTER_DEATH 16 While I agree with the need to not box ourselves into a corner on the latch interface by making sweeping assumptions, isn't the fact that a socket became readable or writable strictly an implementation detail? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Process wakeups when idle and power consumption
On Tue, May 10, 2011 at 5:14 AM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 10 May 2011 09:45, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I think we need to refactor the function into something like: #define WL_LATCH_SET 1 #define WL_SOCKET_READABLE 2 #define WL_SOCKET_WRITEABLE 4 #define WL_TIMEOUT 8 #define WL_POSTMASTER_DEATH 16 While I agree with the need to not box ourselves into a corner on the latch interface by making sweeping assumptions, isn't the fact that a socket became readable or writable strictly an implementation detail? The thing about the socket being readable/writeable is needed for walsender. It needs to notice when its connection to walreceiver is writeable (so it can send more WAL) or readable (so it can receive a reply message). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] switch UNLOGGED to LOGGED
On Tue, May 10, 2011 at 3:35 AM, Leonardo Francalanci m_li...@yahoo.it wrote: Yes, that seems like a very appealing approach. There is plenty of bit-space available in xinfo, and we could reserve a bit each for nrels, nsubxacts, and nmsgs, with set meaning that an integer count of that item is present and clear meaning that the count is omitted from the structure (and zero). This will probably require a bit of tricky code reorganization so I think it should be done separately from the main patch. Ok, I'll try and send a patch with this change only. BTW xinfo is 32 bit long, but I think only 2 bits are used right now? I think I can make it a 8 bits, and add another 8 bits for nrels, nsubxacts, and nmsgs and the new thing. That should save another 2 bytes, while leaving space for extention. Or we can make it a 8 bits only, but only 2 bits would be left empty for future extentions; I don't know if we care about it... I don't think making xinfo shorter will save anything, because whatever follows it is going to be a 4-byte quantity and therefore 4-byte aligned. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] switch UNLOGGED to LOGGED
I don't think making xinfo shorter will save anything, because whatever follows it is going to be a 4-byte quantity and therefore 4-byte aligned. Oops, I didn't notice that. I'll split xinfo into: uint16 xinfo; uint16 presentFlags; I guess it helps with readability? I mean, instead of having a single uint32? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash-safe visibility map, take five
On Mon, May 9, 2011 at 10:25 PM, Merlin Moncure mmonc...@gmail.com wrote: On Fri, May 6, 2011 at 5:47 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Mar 30, 2011 at 8:52 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Another question: To address the problem in http://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php , should we just clear the vm before the log of insert/update/delete? This may reduce the performance, is there another solution? Yeah, that's a straightforward way to fix it. I don't think the performance hit will be too bad. But we need to be careful not to hold locks while doing I/O, which might require some rearrangement of the code. We might want to do a similar dance that we do in vacuum, and call visibilitymap_pin first, then lock and update the heap page, and then set the VM bit while holding the lock on the heap page. Here's an attempt at implementing the necessary gymnastics. Is there a quick synopsis of why you have to do (sometimes) the pin-lock-unlock-pin-lock mechanic? How come you only can fail to get the pin at most once? I thought I'd explained it fairly thoroughly in the comments, but evidently not. Suggestions for improvement are welcome. Here goes in more detail: Every time we insert, update, or delete a tuple in a particular heap page, we must check whether the page is marked all-visible. If it is, then we need to clear the page-level bit marking it as all-visible, and also the corresponding page in the visibility map. On the other hand, if the page isn't marked all-visible, then we needn't touch the visibility map at all. So, there are either one or two buffers involved: the buffer containing the heap page (buffer) and possibly also a buffer containing the visibility map page in which the bit for the heap page is to be found (vmbuffer). Before taking an exclusive content-lock on the heap buffer, we check whether the page appears to be all-visible. If it does, then we pin the visibility map page and then lock the buffer. If not, we just lock the buffer. However, since we weren't holding any lock, it's possible that between the time when we checked the visibility map bit and the time when we obtained the exclusive buffer-lock, the visibility map bit might have changed from clear to set (because someone is concurrently running VACUUM on the table; or on platforms with weak memory-ordering, someone was running VACUUM almost concurrently). If that happens, we give up our buffer lock, go pin the visibility map page, and reacquire the buffer lock. At this point in the process, we know that *if* the page is marked all-visible, *then* we have the appropriate visibility map page pinned. There are three possible pathways: (1) If the buffer initially appeared to be all-visible, we will have pinned the visibility map page before acquiring the exclusive lock; (2) If the buffer initially appeared NOT to be all-visible, but by the time we obtained the exclusive lock it now appeared to be all-visible, then we will have done the unfortunate unlock-pin-relock dance, and the visibility map page will now be pinned; (3) if the buffer initially appeared NOT to be all-visible, and by the time we obtained the exclusive lock it STILL appeared NOT to be all-visible, then we don't have the visibility map page pinned - but that's OK, because in this case no operation on the visibility map needs to be performed. 
Now it is very possible that in case (1) or (2) the visibility map bit, though we saw it set at some point, will actually have been cleared in the meantime. In case (1), this could happen before we obtain the exclusive lock; while in case (2), it could happen after we give up the lock to go pin the visibility map page and before we reacquire it. This will typically happen when a buffer has been sitting around for a while in an all-visible state and suddenly two different backends both try to update or delete tuples in that buffer at almost exactly the same time. But it causes no great harm - both backends will pin the visibility map page, whichever one gets the exclusive lock on the heap page first will clear it, and when the other backend gets the heap page afterwards, it will see that the bit has already been cleared and do nothing further. We've wasted the effort of pinning and unpinning the visibility map page when it wasn't really necessary, but that's not the end of the world. We could avoid all of this complexity - and the possibility of pinning the visibility map page needlessly - by locking the heap buffer first and then pinning the visibility map page if the heap page is all-visible. However, that would involve holding the lock on the heap buffer across a possible disk I/O to bring the visibility map page into memory, which is something the existing code tries pretty hard to avoid. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing
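[Editorial note: for reference, the dance described above boils down to roughly the following shape. This is a simplified sketch based on the description, not the patch itself; "relation", "buffer", and "blkno" are assumed to be set up by the caller, the visibilitymap_* signatures follow the patch discussion and may differ from the shipped code, and error handling and WAL logging are omitted.]

Buffer      vmbuffer = InvalidBuffer;

/* If the page looks all-visible, pin the map page *before* locking. */
if (PageIsAllVisible(BufferGetPage(buffer)))
    visibilitymap_pin(relation, blkno, &vmbuffer);

LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

/*
 * Re-check under the lock: the bit may have been set concurrently.  If so,
 * release the lock, pin the map page (possibly doing I/O), and relock, so
 * that we never hold the content lock across an I/O.
 */
if (vmbuffer == InvalidBuffer && PageIsAllVisible(BufferGetPage(buffer)))
{
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    visibilitymap_pin(relation, blkno, &vmbuffer);
    LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}

/* ... insert/update/delete the tuple on the heap page here ... */

/* If the page was still all-visible, clear both the page bit and the VM bit. */
if (PageIsAllVisible(BufferGetPage(buffer)))
{
    PageClearAllVisible(BufferGetPage(buffer));
    visibilitymap_clear(relation, blkno, vmbuffer);
}

if (vmbuffer != InvalidBuffer)
    ReleaseBuffer(vmbuffer);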
Re: [HACKERS] switch UNLOGGED to LOGGED
On Tue, May 10, 2011 at 8:03 AM, Leonardo Francalanci m_li...@yahoo.it wrote: I don't think making xinfo shorter will save anything, because whatever follows it is going to be a 4-byte quantity and therefore 4-byte aligned. ups, didn't notice it. I'll split xinfo into: uint16 xinfo; uint16 presentFlags; I guess it helps with the reading? I mean, instead of having a single uint32? My feeling would be just keep it as uint32. Breaking it up into chunks doesn't seem useful to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] stored procedures - use cases?
On Mon, May 9, 2011 at 11:58 PM, Pavel Stehule pavel.steh...@gmail.com wrote: no - you are little bit confused :). CALL and function execution shares nothing. There is significant differences between function and procedure. Function is called only from executor - from some plan, and you have to know a structure of result before run. The execution of CALL is much simple - you just execute code - without plan and waiting for any result - if there is. Now I'm a little confused, or you are. Surely any SQL has to be planned and executed, regardless of whether it appears in a function, a stored procedure, or anywhere else. Non-SQL statements within a stored procedure don't need to go through the planner and executor, but that's true in PL/python or PL/pgsql or whatever today. I think people are using the term stored procedures to refer to approximately whatever it is that they're unhappy that functions don't allow, and that's leading to a lot of people talking across each other. The main features seem to be (1) explicit transaction control and/or execution of commands like VACUUM that can't be invoked from within a transaction, (2) autonomous transactions, and (3) returning multiple result sets. But I don't think anybody would be desperately unhappy if it magically became possible to do those things from regular functions, unlikely as that may seem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] stored procedures - use cases?
2011/5/10 Robert Haas robertmh...@gmail.com: On Mon, May 9, 2011 at 11:58 PM, Pavel Stehule pavel.steh...@gmail.com wrote: no - you are little bit confused :). CALL and function execution shares nothing. There is significant differences between function and procedure. Function is called only from executor - from some plan, and you have to know a structure of result before run. The execution of CALL is much simple - you just execute code - without plan and waiting for any result - if there is. Now I'm a little confused, or you are. Surely any SQL has to be planned and executed, regardless of whether it appears in a function, a stored procedure, or anywhere else. Non-SQL statements within a stored procedure don't need to go through the planner and executor, but that's true in PL/python or PL/pgsql or whatever today. The CALL statement is a utility command rather than SQL. It has to execute some non-SQL code. You could think of the CALL statement as a synonym for SELECT, but that isn't correct (in my opinion). Stored procedures came before stored functions (more correctly, UDFs - user-defined functions). Those old-time stored procedures were simple - they were client code moved to the server. Usually these procedures were executed in a different process or a different thread. Inside a procedure you had the full client-side functionality, without the network overhead. A CALL statement is more or less a remote call; it isn't an SQL statement. I think people are using the term stored procedures to refer to approximately whatever it is that they're unhappy that functions don't allow, and that's leading to a lot of people talking across each other. The main features seem to be (1) explicit transaction control and/or execution of commands like VACUUM that can't be invoked from within a transaction, (2) autonomous transactions, and (3) returning multiple result sets. But I don't think anybody would be desperately unhappy if it magically became possible to do those things from regular functions, unlikely as that may seem. Yes. Regarding (2), autonomous transactions don't need stored procedures; an autonomous transaction could be isolated by a function flag or by some special PL/pgSQL statement, like BEGIN EXECUTE AUTONOMOUS '' END. Regarding (3), it is possible now too, but not very user-friendly. Point (3) is strange: Oracle doesn't support it, support in DB2 is a little strange, and yet it is well supported by MySQL, MSSQL, and maybe Informix and Sybase. Regards, Pavel -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
On Mon, May 9, 2011 at 10:36 PM, Merlin Moncure mmonc...@gmail.com wrote: 1. The visibility map needs to be crash-safe. The basic idea of index-only scans is that, instead of checking the heap to find out whether each tuple is visible, we first check the visibility map. If the visibility map bit is set, then we know all tuples on the page are visible to all transactions, and therefore the tuple of interest is visible to our transaction. Assuming that a significant number of visibility map bits are set, this should enable us to avoid a fair amount of I/O, especially on large tables, because the visibility map is roughly 8000 times smaller than the heap, and therefore far more practical to keep in cache. hm, what are the implications for tuple hint bits, short and long term? I'm particularly interested if you think any hint bit i/o mitigation strategies are worth pursuing. Well, I don't really want to let this thread on my project get hijacked to talk about your project (not that I haven't been guilty of that myself!) but, in brief, I think the main effect of index-only scans is that the performance difference between a vacuumed table and an unvacuumed table is going to increase. It's already the case that sequential scanning a table which has been vacuumed (and, therefore, all the pages are marked all-visible) is noticeably faster than sequential scanning a table which is not vacuumed (even if all the hint bits are set). Index-only scans are going to extend that by making index scans run faster on a table with lots of all-visible pages than on one where no pages are all-visible. So the importance of vacuuming an insert-only table occasionally (which autovacuum won't do, at present, until it's needed to prevent XID wraparound) is already more than zero, and it's going to go up. But the all-visible bits aren't quite the same as hint bits: I don't think there's any impact on hint bits per se. 2. Crash safe visibility map vs. pg_upgrade. Even if we make the visibility map crash-safe in 9.2, people are going to want to use pg_upgrade to migrate from older versions, bringing their possibly-not-quite-correct visibility map forks along with them. How should we handle that? We could (2A) arrange to have pg_upgrade nuke all visibility forks when upgrading from a release where the visibility map is not crash-safe to one where it is; +1 on 2A. OK. Anybody else? 3. Statistics. I believe that in order to accurately estimate the cost of an index-only scan, we're going to need to know the fraction of tuples that are on pages whose visibility map bits are set. It would be helpful to know the performance benefit of index only scans before knowing how much benefit to attribute here. Maybe a system wide kludge would do for starters anyway, like assuming 60% of pages can be vis checked from the VM, or a single GUC. Then again, maybe not. Yeah, maybe I should try to beat the main patch into some kind of shape before working too much on the statistics stuff. Then we could actually benchmark it a bit, which would be good. I don't think that a system-wide kludge or GUC is going to work for prime time, but it's probably fine for initial performance testing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash-safe visibility map, take five
On Tue, May 10, 2011 at 7:48 AM, Robert Haas robertmh...@gmail.com wrote: I thought I'd explained it fairly thoroughly in the comments, but evidently not. Suggestions for improvement are welcome. ok. that clears it up nicely. Here goes in more detail: Every time we insert, update, or delete a tuple in a particular heap page, we must check whether the page is marked all-visible. If it is, then we need to clear the page-level bit marking it as all-visible, and also the corresponding page in the visibility map. On the other hand, if the page isn't marked all-visible, then we needn't touch the visibility map at all. So, there are either one or two buffers involved: the buffer containing the heap page (buffer) and possibly also a buffer containing the visibility map page in which the bit for the heap page is to be found (vmbuffer). Before taking an exclusive content-lock on the heap buffer, we check whether the page appears to be all-visible. If it does, then we pin the visibility map page and then lock the buffer. If not, we just lock the buffer. I see: here's a comment that was throwing me off: + /* +* If we didn't get the lock and it turns out we need it, we'll have to +* unlock and re-lock, to avoid holding the buffer lock across an I/O. +* That's a bit unfortunate, but hopefully shouldn't happen often. +*/ I think that might be phrased as didn't get the pin and it turns out we need it because the bit can change after inspection. The visible bit isn't 'wrong' as suggested in the comments, it just can change so that it becomes wrong. Maybe a note of why it could change would be helpful. Other than that, it looks pretty good...ISTM an awfully small amount of code to provide what it's doing (that's a good thing!). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
On Tue, May 10, 2011 at 8:22 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, May 9, 2011 at 10:36 PM, Merlin Moncure mmonc...@gmail.com wrote: 1. The visibility map needs to be crash-safe. The basic idea of index-only scans is that, instead of checking the heap to find out whether each tuple is visible, we first check the visibility map. If the visibility map bit is set, then we know all tuples on the page are visible to all transactions, and therefore the tuple of interest is visible to our transaction. Assuming that a significant number of visibility map bits are set, this should enable us to avoid a fair amount of I/O, especially on large tables, because the visibility map is roughly 8000 times smaller than the heap, and therefore far more practical to keep in cache. hm, what are the implications for tuple hint bits, short and long term? I'm particularly interested if you think any hint bit i/o mitigation strategies are worth pursuing. Well, I don't really want to let this thread on my project get hijacked to talk about your project (not that I haven't been guilty of that myself!) no, that wasn't my intent at all, except in the sense of wondering if a crash-safe visibility map provides a route of displacing a lot of hint bit i/o and by extension, making alternative approaches of doing that, including mine, a lot less useful. that's a good thing. meaning: since the vis map approach is going to be a fairly large win over the classic approach to checking visibility in so many scenarios, maybe the real long term goal should be just being as aggressive as possible in terms of making sure it's set properly, and just give up and be a bit more brute forcey when it's not set. it's a fair question. that's a pretty broad statement, but that's what I'm thinking about. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash-safe visibility map, take five
On Tue, May 10, 2011 at 9:45 AM, Merlin Moncure mmonc...@gmail.com wrote: I see: here's a comment that was throwing me off: + /* + * If we didn't get the lock and it turns out we need it, we'll have to + * unlock and re-lock, to avoid holding the buffer lock across an I/O. + * That's a bit unfortunate, but hopefully shouldn't happen often. + */ I think that might be phrased as didn't get the pin and it turns out we need it because the bit can change after inspection. The visible bit isn't 'wrong' as suggested in the comments, it just can change so that it becomes wrong. Maybe a note of why it could change would be helpful. Oh, I see. I did write lock when I meant pin, and your other point is well-taken as well. Here's a revised version with some additional wordsmithing. Other than that, it looks pretty good...ISTM an awfully small amount of code to provide what it's doing (that's a good thing!). Thanks. It's definitely not big in terms of code footprint; it's mostly a matter of making sure we've dotted all the is and crossed all the ts. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company visibility-map-v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] ts_rank
Could somebody explain to me which methods ts_rank is based on and how it works? I would appreciate some articles, if any exist. Thanks a lot for any reply. Mark -- View this message in context: http://postgresql.1045698.n5.nabble.com/ts-rank-tp4384120p4384120.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [v9.2] Leaky view and RLS
I'd like to summarize the expected issues around leaky views and RLS ahead of v9.2 and the PGCon 2011 Developer Meeting. We have already reached consensus that the leaky-view problem needs to be fixed before the row-level security feature. We know several ways to leak or infer the contents of tuples that should be invisible, using a view that is defined for row-level security purposes.

[1] User-defined functions with low cost and side effects (e.g. error messages). If such a function is appended to the WHERE clause, it may be executed earlier than the functions that implement the row-level security policy of the security view. These functions can take arguments that reference either visible or invisible tuples, so a function with side effects can leak the contents of invisible tuples when it is invoked before the conditions that filter them out.

[2] Iterated probing of PK/FK or UNIQUE constraints. This type of iteration makes it possible to estimate invisible values. E.g., failing to insert a tuple with a particular primary key hints at the existence of an invisible tuple. We reached consensus that RLS does not need to handle this type of scenario, called a covert channel. The point is that the user cannot see the hidden value directly.

[3] References to statistics derived from table contents. One example is the selectivity-estimator functions Tom mentioned before. pg_statistic holds statistical information derived from the table contents, so it may help users infer the contents through its histograms. The discussion didn't get very far at the time. However, my point is the same as the reason we don't handle covert channels: the statistics are an irreversible transformation of the original data, so inferring the contents requires an intelligence process, not a direct data reference.

We also discussed a principle for which types of scenarios should be considered problems. The first conclusion was that it is not a problem if a function internally references invisible rows, as long as it consumes them internally. Thus, we accepted that index access methods can be launched earlier than the functions that filter out rows violating the policy. The second was that it is not a problem if RLS allows users to infer invisible rows from the circumstances, as long as it is not possible to dump invisible data directly, because the bandwidth of such an information leak is quite low. So we decided not to handle covert channels, such as iterated PK/FK probing, in RLS.

I still think scenario [1] is the only problem that needs to be solved before the RLS features. Scenarios [2] and [3] do not allow the original data to be leaked directly. In addition, the estimator functions mentioned in [3] only reference the statistical data internally; that is the same situation as the index access methods. Please point out anything at issue that I have overlooked. Thanks, -- NEC Europe Ltd, SAP Global Competence Center KaiGai Kohei kohei.kai...@eu.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] collateral benefits of a crash-safe visibility map
On Tue, May 10, 2011 at 9:59 AM, Merlin Moncure mmonc...@gmail.com wrote: no, that wasn't my intent at all, except in the sense of wondering if a crash-safe visibility map provides a route of displacing a lot of hint bit i/o and by extension, making alternative approaches of doing that, including mine, a lot less useful. that's a good thing. Sadly, I don't think it's going to have that effect. The page-is-all-visible bits seem to offer a significant performance benefit over the xmin-committed hint bits; but the benefit of xmin-committed all by itself is too much to ignore. The advantages of the xmin-committed hint bit (as opposed to the all-visible page-level bit) are: (1) Setting the xmin-committed hint bit is a much more light-weight operation than setting the all-visible page-level bit. It can by done on-the-fly by any backend, rather than only by VACUUM, and need not be XLOG'd. (2) If there are long-running transactions on the system, xmin-committed can be set much sooner than all-visible - the transaction need only commit. All-visible can't be set until overlapping transactions have ended. (3) xmin-committed is useful on standby servers, whereas all-visible is ignored there. (Note that neither this patch nor index-only scans changes anything about that: it's existing behavior, necessitated by different xmin horizons.) So I think that attempts to minimize the overhead of setting the xmin-committed bit are not likely to be mooted by anything I'm doing. Keep up the good work. :-) Where I do think that we can possibly squeeze some additional benefit out of a crash-safe visibility map is in regards to anti-wraparound vacuuming. The existing visibility map is used to skip vacuuming of all-visible pages, but it's not used when XID wraparound is at issue. The reason is fairly obvious: a regular vacuum only needs to worry about getting rid of dead tuples (and a visibility map bit being set is good evidence that there are none), but an anti-wraparound vacuum also needs to worry about live tuples with xmins that are about to wrap around from past to future (such tuples must be frozen). There's a second reason, too: the visibility map bit, not being crash-safe, has a small chance of being wrong, and we'd like to eventually get rid of any dead tuples that slip through the cracks. Making the visibility map crash-safe doesn't directly address the first problem, but it does (if or when we're convinced that it's fairly bug-free) address the second one. To address the first problem, what we've talked about doing is something along the line of freezing the tuples at the time we mark the page all-visible, so we don't have to go back and do it again later. Unfortunately, it's not quite that simple, because freezing tuples that early would cause all sorts of headaches for hot standby, not to mention making Tom and Alvaro grumpy when they're trying to figure out a corruption problem and all the xmins are FrozenXID rather than whatever they were originally. We floated the idea of a tuple-level bit HEAP_XMIN_FROZEN that would tell the system to treat the tuple as frozen, but wouldn't actually overwrite the xmin field. That would solve the forensic problem with earlier freezing, but it doesn't do anything to resolve the Hot Standby problem. There is a performance issue to worry about, too: freezing operations must be xlog'd, as we update relfrozenxid based on the results, and therefore can't risk losing a freezing operation later on. 
So freezing sooner means more xlog activity for pages that might very well never benefit from it (if the tuples therein don't stick around long enough for it to matter). Nonetheless, I haven't completely given up hope. The current situation is that a big table into which new records are slowly being inserted has to be repeatedly scanned in its entirety for unfrozen tuples even though only a small and readily identifiable part of it can actually contain any such tuples, which is clearly less than ideal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
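[Editorial note: if the floated HEAP_XMIN_FROZEN idea were pursued, visibility checks might treat it roughly as below. This is purely a hypothetical sketch: the flag name, its bit value, and the helper are inventions for illustration, not an agreed design, and a real patch would have to find a free infomask bit.]

/*
 * Hypothetical sketch of the "frozen without overwriting xmin" idea floated
 * above.  HEAP_XMIN_FROZEN_FLAG and its value are placeholders; this helper
 * does not exist.  It only illustrates how visibility checks could honour
 * such a flag while keeping the original xmin around for forensics.
 */
#define HEAP_XMIN_FROZEN_FLAG   0x8000  /* placeholder bit, not a real one */

static TransactionId
effective_xmin(HeapTupleHeader tup)
{
    if (tup->t_infomask & HEAP_XMIN_FROZEN_FLAG)
        return FrozenTransactionId;         /* treat as visible to everyone */

    return HeapTupleHeaderGetXmin(tup);     /* otherwise use the stored xmin */
}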
Re: [HACKERS] improvements to pgtune
FYI, I can help if you need javascript assistance. --- Greg Smith wrote: Shiv wrote: So my exams are over now and am fully committed to the project in terms of time. I have started compiling a sort of personal todo for myself. I agree with your advice to start the project with small steps first. (I have a copy of the code and am trying to glean as much of it as I can) I just fixed a couple of bugs in the program that were easier to correct than explain. The code changes have been pushed to the github repo. I've also revised the output format to be a lot nicer. There's a UI shortcut you may find useful too; the program now takes a single input parameter as the input file, outputting to standard out. So a sample run might look like this now:

$ ./pgtune postgresql.conf.sample
[old settings]
#--
# pgtune wizard run on 2011-05-08
# Based on 2060728 KB RAM in the server
#--
default_statistics_target = 100
maintenance_work_mem = 120MB
checkpoint_completion_target = 0.9
effective_cache_size = 1408MB
work_mem = 12MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 480MB
max_connections = 80

I would really appreciate your reply to Josh's thoughts. It would help me understand the variety of tasks and a possible ordering for me to attempt them. Josh's comments: What would you list as the main things pgtune doesn't cover right now? I have my own list, but I suspect that yours is somewhat different. I do think that autotuning based on interrogating the database is possible. However, I think the way to make it not be a tar baby is to tackle it one setting at a time, and start with ones we have the most information for. One of the real challenges there is that some data can be gleaned from pg_* views, but a *lot* of useful performance data only shows up in the activity log, and then only if certain settings are enabled. I just revised the entire TODO file (which is now TODO.rst, formatted in ReST markup: http://docutils.sourceforge.net/rst.html ; test with rst2html TODO.rst TODO.html and look at the result). It should be easier to follow the flow of it now, and it's organized in approximately the order I think things need to get finished in. There are a few major areas for expansion that might happen on this program to choose from. I was thinking about doing them in this order:

1) Fix the settings validation and limits. I consider this a good place to start on hacking the code. It's really necessary work eventually, and it's easier to get started with than the other ideas.
2) Improve internals related to tracking things like memory and connections so they're easier to pass around the program. Adding a platform class is what I was thinking of. See the Estimating shared memory usage section of the TODO for more information. Add PostgreSQL version as another input to that.
3) Improve the settings model used for existing parameters. Right now people have reported that the work_mem settings suggested in particular are too high for many servers. Ideas about why that is are in the TODO. (This really requires the platform change be done first, or the code will be too hard to write/maintain)
4) Estimate memory used by the configuration and output sysctl.conf files. (Needs platform change too)
5) Add tuning suggestions for new parameters. The most obvious ideas all involve adding common logging changes.
6) Create some new UIs for running the program. A text-based program that asked questions (a 'wizard') or a GUI program doing the same are two common suggestions.
The ideas Josh was talking about for interrogating the database for things are all a long ways off from the current state of the code being able to support them. If (1) through (3) here were done, that whole direction starts with (5) and then runs further that way. That might be a valid direction to move next instead of the (4), (6) I've listed here. You'd have finished something that taught enough about how the existing program works to be able to make some more difficult design decisions about fitting new features into it. If you really want to get right into live server analysis, there's no way for that to fit into the current program yet. And I don't think you'll get enough practice to see how it would without doing some more basic work first. You might as well write something new if that's your goal, and expect that you may not finish anything useful by the end of the summer. If you want to complete a project that results in code that people absolutely will use, the more boring plan I've outlined goes that way. One of
Re: [HACKERS] the big picture for index-only scans
2011/5/10 Robert Haas robertmh...@gmail.com: So, what do we need in order to find our way to index-only scans? 3. Statistics. I believe that in order to accurately estimate the cost of an index-only scan, we're going to need to know the fraction of tuples that are on pages whose visibility map bits are set. I believe it should be fairly straightforward to have ANALYZE collect this information; and I'm inclined to do that as a separate patch. It seems like it would also be nice to know what fraction of tuples are on pages that don't have the visibility map set but where, in fact, all tuples on the page are visible to all transactions, so it would be legal to set the bit. A large discrepancy between these two percentages might be a good reason to auto-vacuum the table (perhaps using a really lazy vacuum[2]). I'm not sure if this can be added cheaply, though, and in any case, any change to the set of criteria that will trigger an auto-vacuum is probably a can of worms. Thoughts? ANALYZE can do the stats job for 'free' on the pages it collects anyway. So that looks like a good idea. I believe the really lazy vacuum is another topic; even if it will improve the performance of the index only scan to have tables already vacuumed, the stats should expose that, with the function cost_index(_only?)() taking care of it. 4. Planner and executor changes. In contrast to Heikki's original implementation, I'm inclined to not to try to split the Index Scan node into index scan and heap fetch. Since there are many choices for where to put the heap fetch node (any level of the join tree between the index scan and the root), this seems likely to result in a combinatorial explosion of paths[3], and I'm not real sure that the payback will be adequate. Furthermore, the idea of allowing user code to see tuples that will only later be determined not to have been visible to that MVCC snapshot in the first place seems pretty scary from a security perspective, though certainly there are possible benefits[4]. Instead, I'm inclined to just have the planner evaluate whether the necessary columns can be extracted purely from the index. The temptation is high to estimate the cost of an index_scan(only) + ordered(by ctid) table pages fetch if heap required. (this is what I understood from Heikki's suggestions 3-4, and it makes sense). It may be easier to implement both at once but I didn't find the branch in Heikki's git repos (probably removed a long time ago). If not, we proceed as now. If so, we can use the index only approach of using the visibility map to decide which heap fetches can be skipped. It's not clear to me whether we need to compare the cost of the standard approach with the cost of the index only approach: in theory, if there aren't any visibility map bits anyway, the index only approach could be slower. But I'm not sure whether that problem is significant or common enough to be worth expending a lot of code on. Either way, the number of actual paths doesn't need to increase, because in this design, even if we apply a costing model, one approach will dominate the other. Heikki also suggested considering index scans in cases where we don't now[4, again] but I'm inclined to leave this, too, for a later optimization, again because balancing the increase in paths against the possible performance benefits of using indexes in more situations seems finicky. In short, for a first cut at this, I just want to look at this as a way to get cheaper index scans, and leave everything else to future work. 
Based on ANALYZE stats for the visibility, I believe cost_index and cost_index_only should be very similar functions (well, atm, I don't see the point to split it in 2 functions). Any thoughts welcome. Incidentally, if anyone else feels like working on this, feel free to let me know and I'm happy to step away, from all of it or from whatever part someone else wants to tackle. I'm mostly working on this because it's something that I think we really need to get done, more than having a burning desire to be the one who does it. Indexonly scans are welcome! I believe I can help on 3 and 4, but (really) not sure for 1 and 2. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] http://archives.postgresql.org/pgsql-hackers/2011-05/msg00292.php [2] http://archives.postgresql.org/pgsql-hackers/2011-03/msg00946.php [3] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01379.php [4] http://archives.postgresql.org/pgsql-hackers/2009-07/msg00675.php -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
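[Editorial note: as a very rough sketch of how that visibility fraction might feed the cost model. This is illustrative only; "allvisfrac" as an ANALYZE-collected per-relation statistic and this helper are assumptions of the discussion, not existing planner code.]

/*
 * Illustrative only: discount expected heap fetches by the fraction of the
 * table's tuples that sit on all-visible pages.  "allvisfrac" is an assumed
 * ANALYZE-collected statistic; a real cost model would be more refined
 * (caching effects, clustering, and so on).
 */
static Cost
index_only_heap_fetch_cost(double tuples_fetched, double allvisfrac,
                           Cost random_page_cost)
{
    /* only tuples on pages whose VM bit is clear force a visit to the heap */
    double      heap_fetches = (1.0 - allvisfrac) * tuples_fetched;

    return heap_fetches * random_page_cost;     /* crude: one random page per fetch */
}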
Re: [HACKERS] the big picture for index-only scans
On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: ANALYZE can do the stats job for 'free' on the pages it collects anyway. So that looks like a good idea. I believe the really lazy vacuum is another topic; even if it will improve the performance of the index only scan to have tables already vacuuumed, the stats should expose that and the function cost_index(_only?)() taking care of that. I basically agree. The connection is that - as we use the all-visible for more things, the performance penalty for failing to vacuum (say) an insert-only table will continue to grow. Still, as you say, clearly a separate topic. The temptation is high to estimate the cost of an index_scan(only) + ordered(by ctid) table pages fetch if heap required. (this is what I understood from heikki suggestion 3-4. and it makes sense). It may be easier to implement both at once but I didn't find the branch in the Heikki's git repos. (probably removed since the long time) I was thinking about this as well, at least if I understand you correctly. That would be similar to a bitmap index scan, and I think it would be a great thing to have, not only because it would allow us to get the advantages of index-only scans in situations that are well-suited to our current bitmap scans, but also because it could be batched. You could allocate a buffer of work_mem bytes and fill it up with TIDs; then, when it's full, you sort the buffer and start doing the necessary heap fetches in physical order. If you still need more rows, you can clear the buffer and go around for another pass. Based on ANALYZE stats for the visibility, I believe cost_index and cost_index_only should be very similar functions (well, atm, I don't see the point to split it in 2 functions). Yeah, I would more imagine modifying the existing function. Any thoughts welcome. Incidentally, if anyone else feels like working on this, feel free to let me know and I'm happy to step away, from all of it or from whatever part someone else wants to tackle. I'm mostly working on this because it's something that I think we really need to get done, more than having a burning desire to be the one who does it. Indexonly scans are welcome! I believe I can help on 3 and 4, but (really) not sure for 1 and 2. Well, I have code for #1, and just need reviews, and #2 shouldn't be that hard, and with luck I'll twist Bruce's arm into doing it (*waves to Bruce*). So #3 and #4 are the next thing to tackle. Any thoughts on what/how you'd like to contribute there? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
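[Editorial note: the batching idea Robert describes might look something like the sketch below. This is hypothetical executor-side code: the work_mem-sized TID array and the per-TID fetch callback are assumptions; only the ItemPointer types and ItemPointerCompare() come from the real headers.]

/*
 * Hypothetical sketch of the "fill a work_mem-sized buffer with TIDs, sort,
 * then fetch in physical order" idea.  ItemPointerData/ItemPointerCompare
 * are real (storage/itemptr.h); fetch_heap_tuple() is a placeholder for
 * whatever the executor would do with each heap tuple.
 */
static int
tid_qsort_cmp(const void *a, const void *b)
{
    return ItemPointerCompare((ItemPointer) a, (ItemPointer) b);
}

static void
flush_tid_buffer(ItemPointer tids, int ntids)
{
    int         i;

    /* sort so that heap visits happen in block order, not index order */
    qsort(tids, ntids, sizeof(ItemPointerData), tid_qsort_cmp);

    for (i = 0; i < ntids; i++)
        fetch_heap_tuple(&tids[i]);     /* placeholder per-tuple heap fetch */
}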
Re: [HACKERS] the big picture for index-only scans
2011/5/10 Robert Haas robertmh...@gmail.com: On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: ANALYZE can do the stats job for 'free' on the pages it collects anyway. So that looks like a good idea. I believe the really lazy vacuum is another topic; even if it will improve the performance of the index only scan to have tables already vacuuumed, the stats should expose that and the function cost_index(_only?)() taking care of that. I basically agree. The connection is that - as we use the all-visible for more things, the performance penalty for failing to vacuum (say) an insert-only table will continue to grow. Still, as you say, clearly a separate topic. The temptation is high to estimate the cost of an index_scan(only) + ordered(by ctid) table pages fetch if heap required. (this is what I understood from heikki suggestion 3-4. and it makes sense). It may be easier to implement both at once but I didn't find the branch in the Heikki's git repos. (probably removed since the long time) I was thinking about this as well, at least if I understand you correctly. That would be similar to a bitmap index scan, and I think it would be a great thing to have, not only because it would allow us to get the advantages of index-only scans in situations that are well-suited to our current bitmap scans, but also because it could be batched. You could allocate a buffer of work_mem bytes and fill it up with TIDs; then, when it's full, you sort the buffer and start doing the necessary heap fetches in physical order. If you still need more rows, you can clear the buffer and go around for another pass. Based on ANALYZE stats for the visibility, I believe cost_index and cost_index_only should be very similar functions (well, atm, I don't see the point to split it in 2 functions). Yeah, I would more imagine modifying the existing function. Any thoughts welcome. Incidentally, if anyone else feels like working on this, feel free to let me know and I'm happy to step away, from all of it or from whatever part someone else wants to tackle. I'm mostly working on this because it's something that I think we really need to get done, more than having a burning desire to be the one who does it. Indexonly scans are welcome! I believe I can help on 3 and 4, but (really) not sure for 1 and 2. Well, I have code for #1, and just need reviews, and #2 shouldn't be that hard, and with luck I'll twist Bruce's arm into doing it (*waves to Bruce*). So #3 and #4 are the next thing to tackle. Any thoughts on what/how you'd like to contribute there? I can provide initial patchs for cost and analyze, at least. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash-safe visibility map, take five
On 2011-05-10 14:48, Robert Haas wrote: We could avoid all of this complexity - and the possibility of pinning the visibility map page needlessly - by locking the heap buffer first and then pinning the visibility map page if the heap page is all-visible. However, that would involve holding the lock on the heap buffer across a possible disk I/O to bring the visibility map page into memory, which is something the existing code tries pretty hard to avoid. Assuming that the visibillity map would be used for visibillity testing, just picking the lock would effectively mean we want it in the buffers, which would not be that bad? Or what is the downside for keeping it across IO? Will it block other processes trying to read it? -- Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
On Tue, May 10, 2011 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote: So, what do we need in order to find our way to index-only scans? 1. The visibility map needs to be crash-safe. The basic idea of index-only scans is that, instead of checking the heap to find out whether each tuple is visible, we first check the visibility map. If This topic has been discussed many times, yet I have never seen an assessment that explains WHY we would want to do index-only scans. This will be a complex addition to the codebase and one that could introduce bugs into MVCC. It seems reasonable to look at what the benefit of this would be, and what the use case/ benefit profile is before we spend a long time adding this optimization. I asked for this previously on earlier threads also. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
The temptation is high to estimate the cost of an index_scan(only) + ordered(by ctid) table pages fetch if heap required. (this is what I understood from heikki suggestion 3-4. and it makes sense). It may be easier to implement both at once but I didn't find the branch in the Heikki's git repos. (probably removed since the long time) I was thinking about this as well, at least if I understand you correctly. Yes. That would be similar to a bitmap index scan, and I think it would be a great thing to have, not only because it would allow us to get the advantages of index-only scans in situations that are well-suited to our current bitmap scans, but also because it could be batched. You could allocate a buffer of work_mem bytes and fill it up with TIDs; then, when it's full, you sort the buffer and start doing the necessary heap fetches in physical order. If you still need more rows, you can clear the buffer and go around for another pass. The remaining issue here is that we don't have a 'safe' index-only scan, just an index scan with some probability of being 'only'. -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Process wakeups when idle and power consumption
On Tue, May 10, 2011 at 12:45 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, May 10, 2011 at 5:14 AM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 10 May 2011 09:45, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I think we need to refactor the function into something like: #define WL_LATCH_SET 1 #define WL_SOCKET_READABLE 2 #define WL_SOCKET_WRITEABLE 4 #define WL_TIMEOUT 8 #define WL_POSTMASTER_DEATH 16 While I agree with the need to not box ourselves into a corner on the latch interface by making sweeping assumptions, isn't the fact that a socket became readable or writable strictly an implementation detail? The thing about the socket being readable/writeable is needed for walsender. It needs to notice when its connection to walreceiver is writeable (so it can send more WAL) or readable (so it can receive a reply message). I've got a feeling that things will go easier if we have a separate connection for the feedback channel. Yes, two connections, one in either direction. That would make everything simple, nice one way connections. It would also mean we could stream at higher data rates. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
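To make the bitmask style quoted above concrete, here is a compilable toy with a dummy wait function standing in for the real one. The flag values and the WaitLatch_demo() body are placeholders, and the final signature is exactly what is still being debated in this thread, so treat this purely as an illustration of how a caller such as walsender might consume the returned event mask, not as the committed API. The appeal of the bitmask is that each caller asks for exactly the wakeups it cares about, so postmaster-death handling becomes opt-in rather than invariable.

#include <stdio.h>

/* Event flags in the style proposed upthread (values illustrative). */
#define WL_LATCH_SET         1
#define WL_SOCKET_READABLE   2
#define WL_SOCKET_WRITEABLE  4
#define WL_TIMEOUT           8
#define WL_POSTMASTER_DEATH 16

/* Dummy stand-in: a real implementation would block in select()/poll()
 * and return the subset of requested events that actually occurred. */
static int
WaitLatch_demo(int events, long timeout_ms)
{
    (void) timeout_ms;
    return events & (WL_SOCKET_WRITEABLE | WL_TIMEOUT);
}

int
main(void)
{
    int rc = WaitLatch_demo(WL_LATCH_SET | WL_SOCKET_WRITEABLE |
                            WL_TIMEOUT | WL_POSTMASTER_DEATH, 1000);

    if (rc & WL_POSTMASTER_DEATH)
        printf("postmaster died: exit\n");
    if (rc & WL_SOCKET_WRITEABLE)
        printf("socket writeable: walsender can send more WAL\n");
    if (rc & WL_TIMEOUT)
        printf("timeout: send a keepalive\n");
    return 0;
}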
Re: [HACKERS] the big picture for index-only scans
Simon Riggs si...@2ndquadrant.com wrote: This topic has been discussed many times, yet I have never seen an assessment that explains WHY we would want to do index-only scans. In databases with this feature, it's not too unusual for a query which uses just an index to run one or more orders of magnitude faster than a query which has to randomly access the heap for each index entry. That seems like enough evidence of its possible value in PostgreSQL to proceed to the point where benchmarks become possible. I'm assuming that, like all other features added as performance optimizations, it won't be committed until there are benchmarks showing the net benefit. As a thought experiment, picture the relative costs of scanning a portion of an index in index sequence, and being done, versus scanning a portion of an index in index sequence and jumping to a random heap access for each index entry as you go. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Process wakeups when idle and power consumption
Simon Riggs si...@2ndquadrant.com writes: I've got a feeling that things will go easier if we have a separate connection for the feedback channel. Yes, two connections, one in either direction. That would make everything simple, nice one way connections. It would also mean we could stream at higher data rates. The above sounds like complete nonsense. TCP connections are already full-duplex. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Process wakeups when idle and power consumption
On 10.05.2011 14:39, Peter Geoghegan wrote: Attached is a win32 implementation of the named pipe trick. It consists of a Visual Studio 2008 solution that contains two projects, named_pipe_trick (which represents the postmaster) and auxiliary_backend (which represents each auxiliary process). I split the solution into two projects/programs because Windows lacks fork() to make it all happen with a single program. Thoughts? Once I have some buy-in, I'd like to write a patch for the latch code that incorporates monitoring the postmaster using the named pipe trick (for both unix_latch.c and win32_latch.c), plus Heikki's suggestions. It should be an anonymous pipe that's inherited by the child process, rather than a named pipe. Otherwise seems fine to me, as far as this proof of concept program goes. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
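For anyone who hasn't seen the pipe trick before, the Unix side of it fits in a short, self-contained program: the parent creates a pipe before forking, the child closes the write end it inherited, and the read end becomes readable (EOF) the moment the last write end disappears - i.e. when the parent exits. The program below only demonstrates that mechanism; it is not the patch being discussed, and it omits the Windows (anonymous pipe inherited via CreateProcess) side entirely.

#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/types.h>

int
main(void)
{
    int fd[2];
    pid_t pid;

    if (pipe(fd) != 0)
    {
        perror("pipe");
        return 1;
    }

    pid = fork();
    if (pid == 0)
    {
        fd_set rfds;

        /* Child: drop the inherited write end, then wait on the read end.
         * It becomes readable only once the parent's write end goes away. */
        close(fd[1]);
        FD_ZERO(&rfds);
        FD_SET(fd[0], &rfds);
        select(fd[0] + 1, &rfds, NULL, NULL, NULL);
        printf("child: parent is gone, exiting\n");
        _exit(0);
    }

    /* Parent: keep the write end open for its whole lifetime, then exit. */
    close(fd[0]);
    sleep(1);
    printf("parent: exiting; child should wake up now\n");
    return 0;
}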
Re: [HACKERS] the big picture for index-only scans
On Tue, May 10, 2011 at 11:27 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 2011/5/10 Robert Haas robertmh...@gmail.com: On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: ANALYZE can do the stats job for 'free' on the pages it collects anyway. So that looks like a good idea. I believe the really lazy vacuum is another topic; even if it will improve the performance of the index only scan to have tables already vacuumed, the stats should expose that and the function cost_index(_only?)() taking care of that. I basically agree. The connection is that - as we use the all-visible for more things, the performance penalty for failing to vacuum (say) an insert-only table will continue to grow. Still, as you say, clearly a separate topic. The temptation is high to estimate the cost of an index_scan(only) + ordered(by ctid) table pages fetch if heap required. (this is what I understood from Heikki's suggestions 3-4, and it makes sense). It may be easier to implement both at once but I didn't find the branch in Heikki's git repos (probably removed a long time ago). I was thinking about this as well, at least if I understand you correctly. That would be similar to a bitmap index scan, and I think it would be a great thing to have, not only because it would allow us to get the advantages of index-only scans in situations that are well-suited to our current bitmap scans, but also because it could be batched. You could allocate a buffer of work_mem bytes and fill it up with TIDs; then, when it's full, you sort the buffer and start doing the necessary heap fetches in physical order. If you still need more rows, you can clear the buffer and go around for another pass. Based on ANALYZE stats for the visibility, I believe cost_index and cost_index_only should be very similar functions (well, atm, I don't see the point to split it in 2 functions). Yeah, I would more imagine modifying the existing function. Any thoughts welcome. Incidentally, if anyone else feels like working on this, feel free to let me know and I'm happy to step away, from all of it or from whatever part someone else wants to tackle. I'm mostly working on this because it's something that I think we really need to get done, more than having a burning desire to be the one who does it. Index-only scans are welcome! I believe I can help on 3 and 4, but (really) not sure for 1 and 2. Well, I have code for #1, and just need reviews, and #2 shouldn't be that hard, and with luck I'll twist Bruce's arm into doing it (*waves to Bruce*). So #3 and #4 are the next thing to tackle. Any thoughts on what/how you'd like to contribute there? I can provide initial patches for cost and analyze, at least. OK, cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash-safe visibility map, take five
On Tue, May 10, 2011 at 11:34 AM, Jesper Krogh jes...@krogh.cc wrote: On 2011-05-10 14:48, Robert Haas wrote: We could avoid all of this complexity - and the possibility of pinning the visibility map page needlessly - by locking the heap buffer first and then pinning the visibility map page if the heap page is all-visible. However, that would involve holding the lock on the heap buffer across a possible disk I/O to bring the visibility map page into memory, which is something the existing code tries pretty hard to avoid. Assuming that the visibility map would be used for visibility testing, just taking the lock would effectively mean we want it in the buffers, which would not be that bad? Or what is the downside for keeping it across IO? Will it block other processes trying to read it? Heikki might be in a better position to comment on that than I am, since he wrote the existing code. But I think that's basically the issue. When one process has an exclusive buffer lock, nobody else can scan the tuples in that block - so a sequential scan, for example, that reached that block, or an index scan that needed to probe into it, would pile up behind the read of the visibility map page. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
Kevin Grittner kevin.gritt...@wicourts.gov writes: Simon Riggs si...@2ndquadrant.com wrote: This topic has been discussed many times, yet I have never seen an assessment that explains WHY we would want to do index-only scans. In databases with this feature, it's not too unusual for a query which uses just an index to run one or more orders of magnitude faster than a query which has to randomly access the heap for each index entry. That seems like enough evidence of its possible value in PostgreSQL to proceed to the point where benchmarks become possible. I'm assuming that, like all other features added as performance optimizations, it won't be committed until there are benchmarks showing the net benefit. As a thought experiment, picture the relative costs of scanning a portion of an index in index sequence, and being done, versus scanning a portion of an index in index sequence and jumping to a random heap access for each index entry as you go. It's already the case that we'll flip over to a bitmap indexscan, and thus get rid of most/all of the random page accesses, in situations where this is likely to be a big win. Pointing to the performance difference in databases that don't do that is therefore not too convincing. I'm inclined to agree that index-only scans might be worth the amount of work that's involved ... but I share Simon's desire to see some proof before anything gets committed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] collateral benefits of a crash-safe visibility map
On Tue, May 10, 2011 at 3:47 PM, Robert Haas robertmh...@gmail.com wrote: To address the first problem, what we've talked about doing is something along the line of freezing the tuples at the time we mark the page all-visible, so we don't have to go back and do it again later. Unfortunately, it's not quite that simple, because freezing tuples that early would cause all sorts of headaches for hot standby, not to mention making Tom and Alvaro grumpy when they're trying to figure out a corruption problem and all the xmins are FrozenXID rather than whatever they were originally. We floated the idea of a tuple-level bit HEAP_XMIN_FROZEN that would tell the system to treat the tuple as frozen, but wouldn't actually overwrite the xmin field. That would solve the forensic problem with earlier freezing, but it doesn't do anything to resolve the Hot Standby problem. There is a performance issue to worry about, too: freezing operations must be xlog'd, as we update relfrozenxid based on the results, and therefore can't risk losing a freezing operation later on. So freezing sooner means more xlog activity for pages that might very well never benefit from it (if the tuples therein don't stick around long enough for it to matter). Hmmm, do we really need to WAL log freezing? Can we break down freezing into a 2 stage process, so that we can have first stage as a lossy operation and a second stage that is WAL logged? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
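The HEAP_XMIN_FROZEN idea mentioned above is easy to picture with a toy tuple header: freezing sets a flag bit instead of overwriting xmin, so visibility code can treat the tuple as frozen while the original xmin stays readable for forensics. The structure, bit values, and names below are invented for the illustration and do not correspond to the real heap tuple header definitions. As the message notes, this only addresses the forensic objection, not the Hot Standby or WAL-volume concerns.

#include <stdio.h>
#include <stdint.h>

#define TOY_XMIN_COMMITTED 0x0100
#define TOY_XMIN_FROZEN    0x0200      /* hypothetical "treat xmin as frozen" bit */

typedef struct
{
    uint32_t xmin;                     /* inserting transaction id */
    uint16_t infomask;                 /* hint/status bits */
} ToyTupleHeader;

static int
toy_xmin_frozen(const ToyTupleHeader *tup)
{
    return (tup->infomask & TOY_XMIN_FROZEN) != 0;
}

int
main(void)
{
    ToyTupleHeader t = { 123456, TOY_XMIN_COMMITTED };

    /* "Freeze" the tuple without destroying the original xmin. */
    t.infomask |= TOY_XMIN_FROZEN;

    printf("frozen = %d, original xmin still visible: %u\n",
           toy_xmin_frozen(&t), t.xmin);
    return 0;
}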
Re: [HACKERS] hint bit cache v5
On Mon, May 9, 2011 at 5:12 PM, Merlin Moncure mmonc...@gmail.com wrote: I'd like to know if this is a strategy that merits further work...If anybody has time/interest that is. It's getting close to the point where I can just post it to the commit fest for review. In particular, I'm concerned if Tom's earlier objections can be satisfied. If not, it's back to the drawing board... I'm interested in what you're doing here. From here, there's quite a lot of tuning possibilities. It would be very useful to be able to define some metrics we are interested in reducing and working out how to measure them. That way we can compare all the different variants of this to see which way of doing things works best. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash-safe visibility map, take five
Robert Haas robertmh...@gmail.com writes: On Tue, May 10, 2011 at 11:34 AM, Jesper Krogh jes...@krogh.cc wrote: Or what is the downside for keeping it across IO? Will it block other processes trying to read it? Heikki might be in a better position to comment on that than I am, since he wrote the existing code. But I think that's basically the issue. When one process has an exclusive buffer lock, nobody else can scan the tuples in that block - so a sequential scan, for example, that reached that block, or an index scan that needed to probe into it, would pile up behind the read of the visibility map page. Right, it's the loss of potential concurrency that's annoying here. On the other hand, the concurrency loss might be entirely theoretical --- in particular, if other potential readers of the heap page would probably also need to wait for the visibility page to come in, then nothing is gained by letting them acquire the heap page lock sooner. I've not read this thread in any detail yet, but if we're going to be jumping through extremely complex hoops to avoid that scenario, it might be better to KISS ... especially in the first iteration. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] collateral benefits of a crash-safe visibility map
On Tue, May 10, 2011 at 12:57 PM, Simon Riggs si...@2ndquadrant.com wrote: Hmmm, do we really need to WAL log freezing? Can we break down freezing into a 2 stage process, so that we can have first stage as a lossy operation and a second stage that is WAL logged? That might solve the relfrozenxid problem - set the bits in the heap, sync the heap, then update relfrozenxid once the heap is guaranteed safely on disk - but it again seems problematic for Hot Standby. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash-safe visibility map, take five
On Tue, May 10, 2011 at 1:00 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, May 10, 2011 at 11:34 AM, Jesper Krogh jes...@krogh.cc wrote: Or what is the downside for keeping it across IO? Will it block other processes trying to read it? Heikki might be in a better position to comment on that than I am, since he wrote the existing code. But I think that's basically the issue. When one process has an exclusive buffer lock, nobody else can scan the tuples in that block - so a sequential scan, for example, that reached that block, or an index scan that needed to probe into it, would pile up behind the read of the visibility map page. Right, it's the loss of potential concurrency that's annoying here. On the other hand, the concurrency loss might be entirely theoretical --- in particular, if other potential readers of the heap page would probably also need to wait for the visibility page to come in, then nothing is gained by letting them acquire the heap page lock sooner. I've not read this thread in any detail yet, but if we're going to be jumping through extremely complex hoops to avoid that scenario, it might be better to KISS ... especially in the first iteration. I wouldn't describe the hoops as extremely complex; I don't feel any inclination to simplify the patch beyond what it is right now. Of course, we'll see what the feedback looks like after more people have read the patch, but my feeling is that the patch strikes a reasonable balance between performance and keeping it simple. There are some more complicated shenanigans that I started to experiment with and ripped out as premature optimization, but this part I think is OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
On Tue, May 10, 2011 at 12:53 PM, Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Simon Riggs si...@2ndquadrant.com wrote: This topic has been discussed many times, yet I have never seen an assessment that explains WHY we would want to do index-only scans. In databases with this feature, it's not too unusual for a query which uses just an index to run one or more orders of magnitude faster than a query which has to randomly access the heap for each index entry. That seems like enough evidence of its possible value in PostgreSQL to proceed to the point where benchmarks become possible. I'm assuming that, like all other features added as performance optimizations, it won't be committed until there are benchmarks showing the net benefit. As a thought experiment, picture the relative costs of scanning a portion of an index in index sequence, and being done, versus scanning a portion of an index in index sequence and jumping to a random heap access for each index entry as you go. It's already the case that we'll flip over to a bitmap indexscan, and thus get rid of most/all of the random page accesses, in situations where this is likely to be a big win. Pointing to the performance difference in databases that don't do that is therefore not too convincing. I'm inclined to agree that index-only scans might be worth the amount of work that's involved ... but I share Simon's desire to see some proof before anything gets committed. Well, we're not in the habit of committing performance patches without performance numbers, so I doubt we'll reverse that trend now, and certainly I had no intention of doing so. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Collation mega-cleanups
On Mon, May 09, 2011 at 03:57:12PM -0400, Robert Haas wrote: On Mon, May 9, 2011 at 2:58 PM, Bruce Momjian br...@momjian.us wrote: Tom this collation stuff has seen more post-feature-commit cleanups than I think any patch I remember. Is there anything we can learn from this? How about don't commit all the large patches at the end of the cycle? My take home from following this is: 'Even Tom can get caught in the just one more little change trap' :-) Ross -- Ross Reedstrom, Ph.D. reeds...@rice.edu Systems Engineer Admin, Research Scientist phone: 713-348-6166 Connexions http://cnx.org fax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] collateral benefits of a crash-safe visibility map
Robert Haas robertmh...@gmail.com writes: On Tue, May 10, 2011 at 12:57 PM, Simon Riggs si...@2ndquadrant.com wrote: Hmmm, do we really need to WAL log freezing? That might solve the relfrozenxid problem - set the bits in the heap, sync the heap, then update relfrozenxid once the heap is guaranteed safely on disk - but it again seems problematic for Hot Standby. ... or even warm standby. You basically *have to* WAL-log freezing before you can truncate pg_clog. The only freedom you have here is freedom to mess with the policy about how soon you try to truncate pg_clog. (Doing an unlogged freeze operation first is right out, too, if it causes the system to fail to perform/log the operation later.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
On Tue, May 10, 2011 at 5:17 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Simon Riggs si...@2ndquadrant.com wrote: This topic has been discussed many times, yet I have never seen an assessment that explains WHY we would want to do index-only scans. In databases with this feature, it's not too unusual for a query which uses just an index to run one or more orders of magnitude faster than a query which has to randomly access the heap for each index entry. That seems like enough evidence of its possible value in PostgreSQL to proceed to the point where benchmarks become possible. I'm assuming that, like all other features added as performance optimizations, it won't be committed until there are benchmarks showing the net benefit. As a thought experiment, picture the relative costs of scanning a portion of an index in index sequence, and being done, versus scanning a portion of an index in index sequence and jumping to a random heap access for each index entry as you go. I can picture that. Regrettably, I can also picture the accesses to the visibility map, the maintenance operations on the VM that are needed for this and the contention that both of those will cause. ISTM quite likely that we'll slow down writes to some extent in order to improve this use case. So I'm interested in knowing how broad the use case is and what the overheads are, rather than have an aw crap! moment in the future where we finish the code and only then realise its benefit footprint is not useful. Best to start out with a clear benefit analysis, rather than just that other DBMSs do it. For example, will this be an index-specific tuning option (manual/automatic), per table or an always-on feature? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Collation mega-cleanups
On Tuesday, May 10, 2011 07:08:23 PM Ross J. Reedstrom wrote: On Mon, May 09, 2011 at 03:57:12PM -0400, Robert Haas wrote: On Mon, May 9, 2011 at 2:58 PM, Bruce Momjian br...@momjian.us wrote: Tom this collation stuff has seen more post-feature-commit cleanups than I think any patch I remember. Is there anything we can learn from this? How about don't commit all the large patches at the end of the cycle? My take home from following this is: 'Even Tom can get caught in the just one more little change trap' I don't think any of the changes from Tom deserves that categorization. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] collateral benefits of a crash-safe visibility map
On 10.05.2011 17:47, Robert Haas wrote: On Tue, May 10, 2011 at 9:59 AM, Merlin Moncure mmonc...@gmail.com wrote: no, that wasn't my intent at all, except in the sense of wondering if a crash-safe visibility map provides a route of displacing a lot of hint bit i/o and by extension, making alternative approaches of doing that, including mine, a lot less useful. that's a good thing. Sadly, I don't think it's going to have that effect. The page-is-all-visible bits seem to offer a significant performance benefit over the xmin-committed hint bits; but the benefit of xmin-committed all by itself is too much to ignore. The advantages of the xmin-committed hint bit (as opposed to the all-visible page-level bit) are: (1) Setting the xmin-committed hint bit is a much more light-weight operation than setting the all-visible page-level bit. It can be done on-the-fly by any backend, rather than only by VACUUM, and need not be XLOG'd. (2) If there are long-running transactions on the system, xmin-committed can be set much sooner than all-visible - the transaction need only commit. All-visible can't be set until overlapping transactions have ended. (3) xmin-committed is useful on standby servers, whereas all-visible is ignored there. (Note that neither this patch nor index-only scans changes anything about that: it's existing behavior, necessitated by different xmin horizons.) (4) xmin-committed flag attached directly to the tuple provides some robustness in case of corruption, due to bad hw. Without the flag, a single bit flip in the clog could in the worst case render all of your bulk-loaded data invisible and vacuumable. Of course, corruption will always eat your data to some extent, but the hint bits provide some robustness. Hint bits are close to the data itself, not in another file like the clog, which can come in handy during disaster recovery. A flag in the heap page header isn't too different from a per-tuple hint bit from that point of view, it's still in the same page as the data itself. A bit in the clog or visibility map is not. Not sure how much performance we're willing to sacrifice for that, but it's something to keep in mind. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Simon Riggs si...@2ndquadrant.com wrote: This topic has been discussed many times, yet I have never seen an assessment that explains WHY we would want to do index-only scans. In databases with this feature, it's not too unusual for a query which uses just an index to run one or more orders of magnitude faster than a query which has to randomly access the heap for each index entry. That seems like enough evidence of its possible value in PostgreSQL to proceed to the point where benchmarks become possible. I'm assuming that, like all other features added as performance optimizations, it won't be committed until there are benchmarks showing the net benefit. As a thought experiment, picture the relative costs of scanning a portion of an index in index sequence, and being done, versus scanning a portion of an index in index sequence and jumping to a random heap access for each index entry as you go. It's already the case that we'll flip over to a bitmap indexscan, and thus get rid of most/all of the random page accesses, in situations where this is likely to be a big win. Pointing to the performance difference in databases that don't do that is therefore not too convincing. Sure. Of course, if you're only accessing twenty thousand rows from a table containing fifty million rows, bitmap index scans could come out pretty close to random access times; but on the whole I agree that the scale of benefit in PostgreSQL won't tend to match what people see in other products. Note that my words were enough evidence of its possible value in PostgreSQL to proceed to the point where benchmarks become possible. In particular, we might want to somehow try to make clear to people that the very wide indexes they are accustomed to creating to allow this optimization in other products might be inefficient compared to creating several one-column indexes which would enable bitmap logical operations. I'm inclined to agree that index-only scans might be worth the amount of work that's involved So we agree there. ... but I share Simon's desire to see some proof before anything gets committed. And we agree there. In fact, I can't think of anyone in the community who doesn't want to see that for *any* purported performance enhancement. My overall gut feel is that there will be some circumstances where the covering index optimization is much faster, and some where people expect it to be, but it isn't. The trickiest part of this might be developing a costing model which allows us to make the right choice most of the time. And even if we get it perfect, we can expect questions about why the covering index wasn't used, and requests for hints so they can force it. :-( -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hint bit cache v5
On Tue, May 10, 2011 at 11:59 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, May 9, 2011 at 5:12 PM, Merlin Moncure mmonc...@gmail.com wrote: I'd like to know if this is a strategy that merits further work...If anybody has time/interest that is. It's getting close to the point where I can just post it to the commit fest for review. In particular, I'm concerned if Tom's earlier objections can be satisfied. If not, it's back to the drawing board... I'm interested in what you're doing here. From here, there's quite a lot of tuning possibilities. It would be very useful to be able to define some metrics we are interested in reducing and working out how to measure them. That way we can compare all the different variants of this to see which way of doing things works best. thanks for that! I settled on this approach because the downside cases should hopefully be pretty limited. The upside is a matter of debate although fairly trivial to demonstrate synthetically. I'm looking for some way of benchmarking the benefits in non-simulated fashion. We desperately need something like a performance farm (as many many others have mentioned). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
Heikki Linnakangas wrote: Well, my first patch was two-phase commit. And I had never even used PostgreSQL before I dived into the source tree and started to work on that Well, everyone knows you're awesome. A small percentage of the people who write patches end up having the combination of background skills, mindset, and approach to pull something like that off. But there are at least a dozen submissions that start review with I don't think this will ever work, but I can't even read your malformed patch to be sure for every one that went like 2PC. If every submitter was a budding Heikki we wouldn't need patch submission guidelines at all. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] collateral benefits of a crash-safe visibility map
On Tue, May 10, 2011 at 9:47 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, May 10, 2011 at 9:59 AM, Merlin Moncure mmonc...@gmail.com wrote: no, that wasn't my intent at all, except in the sense of wondering if a crash-safe visibility map provides a route of displacing a lot of hint bit i/o and by extension, making alternative approaches of doing that, including mine, a lot less useful. that's a good thing. Sadly, I don't think it's going to have that effect. The page-is-all-visible bits seem to offer a significant performance benefit over the xmin-committed hint bits; but the benefit of xmin-committed all by itself is too much to ignore. The advantages of the xmin-committed hint bit (as opposed to the all-visible page-level bit) are: (1) Setting the xmin-committed hint bit is a much more light-weight operation than setting the all-visible page-level bit. It can be done on-the-fly by any backend, rather than only by VACUUM, and need not be XLOG'd. (2) If there are long-running transactions on the system, xmin-committed can be set much sooner than all-visible - the transaction need only commit. All-visible can't be set until overlapping transactions have ended. (3) xmin-committed is useful on standby servers, whereas all-visible is ignored there. (Note that neither this patch nor index-only scans changes anything about that: it's existing behavior, necessitated by different xmin horizons.) right. #1 could maybe be worked around somehow and #2 is perhaps arguable, at least in some workloads, but #3 is admittedly a killer especially since the bit is on the page. I noted your earlier skepticism regarding moving the page visibility check completely to the VM: In some ways, that would make things much simpler. But to make that work, every insert/update/delete to a page would have to pin the visibility map page and clear PD_ALL_VISIBLE if appropriate, so it might not be good from a performance standpoint, especially in high-concurrency workloads. Right now, if PD_ALL_VISIBLE isn't set, we don't bother touching the visibility map page, which seems like a possibly important optimization. That's debatable, but probably moot. Thanks for thinking that through though. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays
Given: CREATE DOMAIN int_array AS int[]; The operator [] works fine in 9.1beta1: SELECT (ARRAY[1,2,3]::int_array)[1]; proving that int_array is an array type with element type int. It is inconsistent that other array functions and operators don't work. On Mon, 2011-05-09 at 23:32 -0400, Tom Lane wrote: So we basically had three alternatives to make it better: #1 downcast to the array type, which would possibly silently break applications that were relying on the function result being considered of the domain type I do not think of this as Downcasting int_array to int[] but as allowing an ANYARRAY to match int_array which is an array type. Since no cast is logically required, the return type is the same as the first argument type, as expected and as PostgreSQL has done for some time. #2 re-apply domain checks on the function result, which would be a performance hit and possibly again result in unobvious breakage If the function result is a new value then nothing is being re-applied. If it is an existing value of the domain type which was passed in or extracted from a data structure, then the domain checks have already been applied. This is a red herring. #3 explicitly break it by throwing a parse error until you downcast (and then upcast the function result if you want) I realize that #3 is a bit unpleasant, but are either of the other two better? At least #3 shows you where you need to check for problems. Wrapping most (but not all) of your array operations in downcasts and upcasts is horrible. There is another issue that wasn't really mentioned in the previous thread, which is that if we are matching a domain-over-array to a function's ANYARRAY argument, what exactly should be allowed to match to ANYELEMENT --- or if the function returns ANYELEMENT, what should the imputed result type be? Since PostgreSQL allows indexing of the domain type, we already know the answer. I don't even get why there is confusion about the element type of an array. AFAICS it's impossible to give an answer to that without effectively deciding that function argument matching smashes the domain to its base type (the array type). It's not very clear what's the point of a domain type if every operation on it is going to neglect its domain-ness. Yes, what is the point of neglecting the domain-ness of a domain type by being forced to downcast it to an unchecked type before (some) array operations? If a value is being constructed of a domain-type which has constraints, check them. When I don't want the security of a domain type I can cast it to its representation type before I pass it, but it seems bizarre to be required to do such a thing! I did read the previous threads some time ago. They seemed mostly to be concerned with discussing the internal implementation of these matters and the typmod feature (which I still don't understand). The internal algorithms and data structures which PostgreSQL uses to internally represent SQL types and operations are a weak justification for PostgreSQL's behavior - they can be changed if they are wrong. I am still hoping to get rid of my domains which are arrays when PostgreSQL supports arrays of elements which are of domain types. Could we at least defer this change until that is done? _Greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Collation mega-cleanups
On Tue, May 10, 2011 at 07:21:16PM +0200, Andres Freund wrote: On Tuesday, May 10, 2011 07:08:23 PM Ross J. Reedstrom wrote: On Mon, May 09, 2011 at 03:57:12PM -0400, Robert Haas wrote: On Mon, May 9, 2011 at 2:58 PM, Bruce Momjian br...@momjian.us wrote: Tom this collation stuff has seen more post-feature-commit cleanups than I think any patch I remember. Is there anything we can learn from this? How about don't commit all the large patches at the end of the cycle? My take home from following this is: 'Even Tom can get caught in the just one more little change trap' I don't think any of the changes from Tom deserves that categorization. No disrespect intended, far from it. The trap is that something that seems at a distance to be relatively small can grow on closer inspection. Which I think is exactly what Tom said (paraphrased) The pre-commit review was insufficient i.e. the remaining problems seemed little, but were not. In addition, little is relative to who's doing the changes, over what domain. Things that are little for Tom on PostgreSQL would not be so for me. Presumably the inverse is true over other domains. So perhaps it was more of the This code is less ready than I thought it was, but now that I've spent the time understanding it and the problem, the shortest way out is forward. I think we've all been in that swamp, at one time or another. Ross -- Ross Reedstrom, Ph.D. reeds...@rice.edu Systems Engineer Admin, Research Scientist phone: 713-348-6166 Connexions http://cnx.org fax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays
On Mon, May 9, 2011 at 11:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: J. Greg Davidson g...@ngender.net writes: * Tighten casting checks for domains based on arrays (Tom Lane) When a domain is based on an array type,..., such a domain type is no longer allowed to match an anyarray parameter of a polymorphic function, except by explicitly downcasting it to the base array type. This will require me to add hundreds of casts to my code. I do not get how this will Tighten casting checks. It will certainly not tighten my code! Could you explain how it is good to not be able to do array operations with a type which is an array? The discussion that led up to that decision is in this thread: http://archives.postgresql.org/pgsql-hackers/2010-10/msg01362.php specifically here: http://archives.postgresql.org/pgsql-hackers/2010-10/msg01545.php The previous behavior was clearly broken. The new behavior is at least consistent. It might be more user-friendly if we did automatic downcasts in these cases, but we were not (and still are not) doing automatic downcasts for domains over scalar types in comparable cases, so it's not very clear why domains over array types should be treated differently. To be concrete, consider the function array_append(anyarray, anyelement) yielding anyarray. Suppose we have a domain D over int[] and the call array_append(var_of_type_D, 42). If we automatically downcast the variable to int[], should the result of the function be considered to be of type D, or type int[]? This isn't a trivial distinction because choosing to consider it of type D means we have to re-check D's domain constraints, which might or might not be satisfied by the modified array. Previous releases considered the result to be of type D, *without* rechecking the domain constraints, which was flat out wrong. So we basically had three alternatives to make it better: * downcast to the array type, which would possibly silently break applications that were relying on the function result being considered of the domain type * re-apply domain checks on the function result, which would be a performance hit and possibly again result in unobvious breakage * explicitly break it by throwing a parse error until you downcast (and then upcast the function result if you want) I realize that #3 is a bit unpleasant, but are either of the other two better? At least #3 shows you where you need to check for problems. Aren't any applications that would be broken by #1 broken already? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] collateral benefits of a crash-safe visibility map
On Tue, May 10, 2011 at 6:02 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, May 10, 2011 at 12:57 PM, Simon Riggs si...@2ndquadrant.com wrote: Hmmm, do we really need to WAL log freezing? Can we break down freezing into a 2 stage process, so that we can have first stage as a lossy operation and a second stage that is WAL logged? That might solve the relfrozenxid problem - set the bits in the heap, sync the heap, then update relfrozenxid once the heap is guaranteed safely on disk - but it again seems problematic for Hot Standby. How about we truncate the clog differently on each server? We could have a special kind of VACUUM that runs during Hot Standby, setting frozen hint bits only. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] collateral benefits of a crash-safe visibility map
On Tue, May 10, 2011 at 1:49 PM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, May 10, 2011 at 6:02 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, May 10, 2011 at 12:57 PM, Simon Riggs si...@2ndquadrant.com wrote: Hmmm, do we really need to WAL log freezing? Can we break down freezing into a 2 stage process, so that we can have first stage as a lossy operation and a second stage that is WAL logged? That might solve the relfrozenxid problem - set the bits in the heap, sync the heap, then update relfrozenxid once the heap is guaranteed safely on disk - but it again seems problematic for Hot Standby. How about we truncate the clog differently on each server? We could have a special kind of VACUUM that runs during Hot Standby, setting frozen hint bits only. Interesting idea. It does seem complicated. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays
Robert Haas robertmh...@gmail.com writes: On Mon, May 9, 2011 at 11:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: So we basically had three alternatives to make it better: * downcast to the array type, which would possibly silently break applications that were relying on the function result being considered of the domain type * re-apply domain checks on the function result, which would be a performance hit and possibly again result in unobvious breakage * explicitly break it by throwing a parse error until you downcast (and then upcast the function result if you want) I realize that #3 is a bit unpleasant, but are either of the other two better? At least #3 shows you where you need to check for problems. Aren't any applications that would be broken by #1 broken already? My point is that doing #1 would break them *silently* --- if you did have a problem, figuring out what it was could require a great deal of sleuthing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays
On Tue, May 10, 2011 at 1:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, May 9, 2011 at 11:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: So we basically had three alternatives to make it better: * downcast to the array type, which would possibly silently break applications that were relying on the function result being considered of the domain type * re-apply domain checks on the function result, which would be a performance hit and possibly again result in unobvious breakage * explicitly break it by throwing a parse error until you downcast (and then upcast the function result if you want) I realize that #3 is a bit unpleasant, but are either of the other two better? At least #3 shows you where you need to check for problems. Aren't any applications that would be broken by #1 broken already? My point is that doing #1 would break them *silently* --- if you did have a problem, figuring out what it was could require a great deal of sleuthing. Eh, I'm confused. Explain further? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
All, Part of the trouble is in the question. Having a patch rejected is not really a problem; it's something you should learn from. I know it can be annoying. I get annoyed when it happens to me too. But I try to get over it as quickly as possible, and either fix the patch, or find another (and better) way to do the same thing, or move on. Everybody here is acting in good faith, and nobody's on a power trip. That's one of the good things about working on Postgres. If it were otherwise I would have moved on to something else long ago. The problem is not that patches get rejected. It's *how* they get rejected, and how the submitter experiences the process of them getting rejected. Did they learn something from it and understand the reasons for the rejection? or did they experience the process as arbitrary, frustrating, and incomprehensible? Ideally, we want a submitter whose patch has been rejected to walk away with either my proposal was rejected, and I understand why it's a bad idea even if I don't agree, or my proposal was rejected, and I know what needs to be done to fix it. Of course, there are always idiots who won't learn anything no matter how good our process is. But if the whole submission process is perceived to be fair and understandable, those will be a tiny minority. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] collateral benefits of a crash-safe visibility map
On Tue, May 10, 2011 at 6:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, May 10, 2011 at 12:57 PM, Simon Riggs si...@2ndquadrant.com wrote: Hmmm, do we really need to WAL log freezing? That might solve the relfrozenxid problem - set the bits in the heap, sync the heap, then update relfrozenxid once the heap is guaranteed safely on disk - but it again seems problematic for Hot Standby. ... or even warm standby. You basically *have to* WAL-log freezing before you can truncate pg_clog. The only freedom you have here is freedom to mess with the policy about how soon you try to truncate pg_clog. (Doing an unlogged freeze operation first is right out, too, if it causes the system to fail to perform/log the operation later.) Trying to think outside of the box from all these things we can't do. Can we keep track of the relfrozenxid and then note when we fsync the relevant file, then issue a single WAL record to indicate that? Still WAL logging, but 1 record per table, not 1 record per tuple. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Process wakeups when idle and power consumption
On 10 May 2011 17:43, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: It should be an anonymous pipe that's inherited by the child process by rather than a named pipe. Otherwise seems fine to me, as far as this proof of concept program goes. Alright, thanks. I'll use an anonymous pipe in the patch itself. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Collation mega-cleanups
Ross J. Reedstrom reeds...@rice.edu writes: So perhaps it was more of the This code is less ready than I thought it was, but now that I've spent the time understanding it and the problem, the shortest way out is forward. Yeah, exactly. By the time I really understood how incomplete the collation patch was, I'd done most of the work to fix it; and insisting on backing it out of 9.1 didn't seem productive (even assuming that I could have won that argument, which was by no means a given). I'm still fairly troubled by the potential overhead in the form of extra lookups during parse time, but have not had the time to try to measure that. Too bad we haven't got a performance-test farm. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Server Programming Interface underspecified in 9.1beta1
J. Greg Davidson g...@ngender.net wrote: I would like to be able to program to a C or C++ SPI which is clean, complete and type-safe. I am good at reading API documentation in C or C++ and would be happy to review any proposed improvements. I want to second Andrew's post, and emphasize that such suggestions are not intended to be flippant -- a lot of PostgreSQL development (documentation included) occurs because someone feels the need for something and goes ahead with making it happen. If you want to put the effort into improving the API documentation for SPI, but don't know where to start, I'd be happy to help you get started. It's great when someone who benefits from PostgreSQL can scratch their own itch and give back to the community in the process. Let me know. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
On Tue, May 10, 2011 at 5:53 PM, Tom Lane t...@sss.pgh.pa.us wrote: It's already the case that we'll flip over to a bitmap indexscan, and thus get rid of most/all of the random page accesses, in situations where this is likely to be a big win. Pointing to the performance difference in databases that don't do that is therefore not too convincing. The other major effect is row size. Many databases have very wide rows, perhaps on the order of 1kB. So the table with a million rows might be 8GB but the index on a few key columns might only be a few megabytes. Even if you have to read the entire index in random order it'll likely all be cached and scan faster than the table itself. One problem with hanging on benchmarks is that database schema design can actually change based on what performs well. People get in the habit of creating indexes in Oracle that are only logical when you realize they allow the database to do an index-only scan because they contain extra columns that aren't actually used in where clauses but are typically in the select list. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays
Robert Haas robertmh...@gmail.com writes: On Tue, May 10, 2011 at 1:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, May 9, 2011 at 11:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: So we basically had three alternatives to make it better: * downcast to the array type, which would possibly silently break applications that were relying on the function result being considered of the domain type Aren't any applications that would be broken by #1 broken already? My point is that doing #1 would break them *silently* --- if you did have a problem, figuring out what it was could require a great deal of sleuthing. Eh, I'm confused. Explain further? The previous behavior was effectively to allow a domain-over-array type to match the ANYARRAY symbol, without doing anything else special with it. In particular if the function returned ANYARRAY then its output would be taken to be of the domain type, which is wrong since the function might produce an array value that doesn't meet the domain's constraints. We could, and perhaps should, instead downcast the domain to the array type, which would imply that ANYARRAY is matching the base type not the domain, and in particular that a declared ANYARRAY result type means the base type not the domain. The things that were bothering me about this at the time were (1) it would be a silent change of behavior, and (2) it doesn't seem very consistent to handle domain-to-ANYARRAY matching this way without also doing something with domain-to-ANYELEMENT matching. An example of the inconsistency is that something like create domain myi as int; select array[1,2,3] || 4::myi; fails with operator does not exist: integer[] || myi, not only in HEAD but all recent releases. If we're going to downcast a domain-over-array to plain array to allow it to be used with array_append, it's not clear why we don't allow myi to be automatically downcast to int for the same purpose. However, exactly what we ought to do instead isn't entirely clear, and when I brought it up back in October no one seemed to care enough to pursue the matter. So I just left both cases as throwing error, which seemed the most conservative course. I'm still willing to talk about alternatives, though it seems a bit late in the 9.1 cycle to be designing behaviors. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
On Tue, May 10, 2011 at 6:25 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: ... but I share Simon's desire to see some proof before anything gets committed. And we agree there. In fact, I can't think of anyone in the community who doesn't want to see that for *any* purported performance enhancement. I'm not talking about eventual commit, I'm talking about the whole process of development. We should be focusing on improving a measurable performance issue, not on implementing one exact design that someone thought might help. How will we review the patch except by measuring it against the declared performance goal? Otherwise all the various options along the way will just be matters of opinion, instead of measurement. From what has been said so far, the use case for this is related to the practice of using covered indexes, which makes me nervous because that is an expert level tuning task on other DBMS, limiting the range of people who get benefit. The typical speed up for non-covered indexes will come when we access a very large table (not in cache) via an index scan that is smaller than a bitmap index scan. Will we be able to gauge selectivities sufficiently accurately to be able to pinpoint that during optimization? How will we know that the table is not in cache? Or is this an optimisation in the executor for a bitmap heap scan? I'm not being negative, I'm trying to avoid arguments, blind alleys and much wasted development if we focus on the wrong things or go to design too early. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
On Tue, May 10, 2011 at 6:54 PM, Josh Berkus j...@agliodbs.com wrote: Of course, there are always idiots who won't learn anything no matter how good our process is. But if the whole submission process is perceived to be fair and understandible, those will be a tiny minority. The thing is, I think things are much better now than they were three or four years ago. At the time the project had grown much faster than the existing stable of developers and the rate at which patches were being submitted was much greater than they could review. It's not perfect, Tom still spends more of his time reviewing patches when he would probably enjoy writing fresh code -- and it's a shame if you think about the possibilities we might be missing out on if he were let loose. And patches still don't get a detailed HOWTO on what needs to happen before commit. But it's better. We need to be careful about looking at the current situation and deciding it's not perfect so a wholesale change is needed when the only reason it's not worse is because the current system was adopted. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching
On mån, 2011-05-09 at 10:56 -0400, Robert Haas wrote: I'm just shooting from the hip here, but maybe we could have a separate (probably smaller) set of tests that are only designed to work in a limited range of locales and/or encodings. I'm really pleased that we now have the src/test/isolation stuff, and I think some more auxiliary test suites would be quite excellent. Even if people didn't always want to run every single one when doing things manually, the buildfarm certainly could. Well, the result of "people don't always run them" is the rest of src/test/. How much of that stuff even works anymore? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
The thing is, I think things are much better now than they were three or four years ago. Oh, no question. If you read above in this thread, I'm not really proposing a change in the current process, just documenting the current process. Right now there's a gap between how submitters expect things to work, and how they actually do work. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching
On mån, 2011-05-09 at 12:42 -0400, Tom Lane wrote: The problem we'd have is that there's no way (at present) to make such a test pass on every platform. Windows has its own set of locale names (which initdb fails to install as collations anyway) and we also have the problem that OS X can be counted on to get UTF8 sorting wrong. (It might be okay for case-folding though; not sure.) Possibly we could just provide an alternate expected file for OS X, but I don't see a decent workaround for Windows --- it would pretty much have to have its very own test case. Windows >= Vista has locale names similar to Linux, and my cursory testing with some hacked up test suite indicates that it produces the same results as the Linux expected file, modulo some error message differences. So I think this could be made to work, it just needs someone to implement a few bits. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching
Peter Eisentraut pete...@gmx.net writes: On mån, 2011-05-09 at 12:42 -0400, Tom Lane wrote: The problem we'd have is that there's no way (at present) to make such a test pass on every platform. Windows has its own set of locale names (which initdb fails to install as collations anyway) and we also have the problem that OS X can be counted on to get UTF8 sorting wrong. (It might be okay for case-folding though; not sure.) Possibly we could just provide an alternate expected file for OS X, but I don't see a decent workaround for Windows --- it would pretty much have to have its very own test case. Windows >= Vista has locale names similar to Linux, and my cursory testing with some hacked up test suite indicates that it produces the same results as the Linux expected file, modulo some error message differences. So I think this could be made to work, it just needs someone to implement a few bits. Well, that would be great, but the someone is not going to be me; I don't do Windows. I'd be willing to take responsibility for putting in the regression test once the necessary Windows-specific code was committed, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
Simon Riggs si...@2ndquadrant.com wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: ... but I share Simon's desire to see some proof before anything gets committed. And we agree there. In fact, I can't think of anyone in the community who doesn't want to see that for *any* purported performance enhancement. I'm not talking about eventual commit, I'm talking about the whole process of development. I'm confused -- you want to see proof that the concept works well in PostgreSQL before development effort on it begins? Or there is some alternative you would like to see pursued instead? Something else? From what has been said so far, the use case for this is related to the practice of using covered indexes, which makes me nervous because that is an expert level tuning task on other DBMS What? On the versions of MS SQL Server and Sybase ASE I've used it costs covered index plans against all the other plans automatically, and picks this type of plan if the cost looks lower. Sure, DBAs sometimes add indexes, or add columns to indexes, in hopes that such a plan will be chosen -- but what's new and different there? The typical speed up for non-covered indexes will come when we access a very large table (not in cache) via an index scan that is smaller than a bitmapindex scan. Will we be able to gauge selectivities sufficiently accurately to be able to pinpoint that during optimization? How will we know that the table is not in cache? Or is this an optimisation in the executor for a bitmapheap scan? I would continue to object to using current cache contents for plan choice because of plan instability and the fact that an odd initial cache load could skew plans in a bad direction indefinitely. I do agree (and have already posted) that I think the hardest part of this might be developing a good cost model. I doubt that's an insoluble problem, especially since it is something we can refine over time as we gain experience with the edge cases. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers
On Tue, May 10, 2011 at 3:09 PM, Greg Stark gsst...@mit.edu wrote: The thing is, I think things are much better now than they were three or four years ago. At the time the project had grown much faster than the existing stable of developers and the rate at which patches were being submitted was much greater than they could review. Just in the last 2.5 years since I've been around, there have, AFAICT, been major improvements both in the timeliness and quality of the feedback we provide, and the quality of the patches we receive. When I first started reviewing, it was very common to blow through the CommitFest application and bounce half the patches back for failure to apply, failure to pass the regression tests, or other blindingly obvious breakage. That's gone down almost to nothing. It's also become much more common for patches to include adequate documentation and regression tests - or at least *an effort* at documentation and regression tests - than was formerly the case. We still bounce things for those reasons from time to time - generally from recurring contributors who think for some reason that it's someone else's job to worry about cleaning up their patch - but it's less common than it used to be. We still have some rough edges around the incorporation of large patches. But it could be so much worse. We committed something like six major features in a month: collations, sync rep, SSI, SQL/MED, extensions, writeable CTEs, and a major overhaul of PL/python. While that's likely to delay the release a bit (and already has), and has already produced quite a few bug reports and will produce many more before we're done, it's still an impressive accomplishment. I'm not sure we could have done that even a year ago. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching
On Tue, May 10, 2011 at 3:09 PM, Peter Eisentraut pete...@gmx.net wrote: On mån, 2011-05-09 at 10:56 -0400, Robert Haas wrote: I'm just shooting from the hip here, but maybe we could have a separate (probably smaller) set of tests that are only designed to work in a limited range of locales and/or encodings. I'm really pleased that we now have the src/test/isolation stuff, and I think some more auxilliary test suites would be quite excellent. Even if people didn't always want to run every single one when doing things manually, the buildfarm certainly could. Well, the result of people don't always run them is the rest of src/test/. How much of that stuff even works anymore? I don't know. But I'm not sure I see your point. The fact that we haven't yet succeeded in doing something doesn't mean that it's either impossible or unimportant. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VARIANT / ANYTYPE datatype
Darren Duncan wrote: To follow-up, an additional feature that would be useful and resembles union types is the variant where you could declare a union type first and then separately other types could declare they are a member of the union. I'm talking about loosely what mixins or type-roles or interfaces etc are in other languages. The most trivial example would be declaring an ENUM-alike first and then separately declaring the component values where the latter declare they are part of the ENUM, and this could make it easier to add or change ENUM values. But keep in mind that this is a distinct concept from what we're otherwise talking about as being union types. -- Darren Duncan Should this be a TODO item? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VARIANT / ANYTYPE datatype
Excerpts from Bruce Momjian's message of mar may 10 16:21:36 -0400 2011: Darren Duncan wrote: To follow-up, an additional feature that would be useful and resembles union types is the variant where you could declare a union type first and then separately other types could declare they are a member of the union. I'm talking about loosely what mixins or type-roles or interfaces etc are in other languages. The most trivial example would be declaring an ENUM-alike first and then separately declaring the component values where the latter declare they are part of the ENUM, and this could make it easier to add or change ENUM values. But keep in mind that this is a distinct concept from what we're otherwise talking about as being union types. -- Darren Duncan Should this be a TODO item? The general idea of C-style unions, sure. Mixin-style stuff ... not sure. Seems like it'd be pretty painful. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VARIANT / ANYTYPE datatype
Alvaro Herrera wrote: Excerpts from Bruce Momjian's message of mar may 10 16:21:36 -0400 2011: Darren Duncan wrote: To follow-up, an additional feature that would be useful and resembles union types is the variant where you could declare a union type first and then separately other types could declare they are a member of the union. I'm talking about loosely what mixins or type-roles or interfaces etc are in other languages. The most trivial example would be declaring an ENUM-alike first and then separately declaring the component values where the latter declare they are part of the ENUM, and this could make it easier to add or change ENUM values. But keep in mind that this is a distinct concept from what we're otherwise talking about as being union types. -- Darren Duncan Should this be a TODO item? The general idea of C-style unions, sure. Mixin-style stuff ... not sure. Seems like it'd be pretty painful. From the perspective of users, the single greatest distinction between these 2 kinds of unions is being closed versus being open, and that is the primary reason to choose one over the other. A closed union is the C-style, where the union type declares what other types or values it ranges over. The closed union is best when the union definer can reasonably assume that the union won't either ever or would rarely be changed, and in particular can assume that application or database code would have knowledge of the parts that it deals specially with, so it can be assumed that if the closed union type ever is changed then any code designed to use it may be changed at the same time. A good example for a closed union would be a boolean type which just ranges over the two singletons false and true or an order type which ranges just over the three singletons decrease, same, increase. Or a type which enumerates the 7 days of the week, as this is unlikely to change in the life of a system. An open union is the mixin style, where the component types declare they are part of the union. The open union is best when it is likely that there would be either user-defined or extension-defined new types for the union to come along later, and we want to have code that can be generic or polymorphic for any types that can be used in particular ways. Examples of open union types could be number, which all the numeric types compose, and so you can know say that you can use the generic numeric operators on values you have simply if their types compose the number union type, and it still works if more numeric types appear later. Likewise, the string open union could include both text and blob, as both support catenation and substring matches or extraction, for example. This would aid to operator overloading in a generic way, letting you use the same syntax for different types, but allowing types to mix is optional; eg, you could support add(int,int) and add(real,real) without supporting add(int,real) etc but the syntax add(x,y) is shared, and you do this while still having a strong type system; allowing the mixing is optional case-by-case. Supporting the open union is closer to supporting ANYTYPE while the closed union isn't so much. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
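For concreteness, the "closed" case (what Alvaro calls a C-style union) corresponds to a plain tagged union in C; the sketch below is purely illustrative and not from any posted patch -- the point is that the closed type itself enumerates every member it can ever hold, whereas the open/mixin style lets members declare their membership later.

/*
 * Illustrative only: a "closed union" in plain C.  Adding a new member
 * means editing this declaration, which is exactly the property being
 * contrasted with open (mixin-style) unions above.
 */
typedef enum
{
    V_NULL,
    V_INT,
    V_TEXT
} variant_tag;

typedef struct
{
    variant_tag tag;            /* says which member of u is valid */
    union
    {
        int         ival;       /* valid when tag == V_INT */
        const char *tval;       /* valid when tag == V_TEXT */
    }           u;
} variant;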
Re: [HACKERS] Why not install pgstattuple by default?
Tom Lane wrote: Christopher Browne cbbro...@gmail.com writes: But people are evidently still setting packaging policies based on how things were back in 7.3, even though that perhaps isn't necessary anymore. FWIW, once you get past the client versus server distinction, I think most subpackaging decisions are based on either the idea that only a minority of people will want this, or a desire to limit how many dependencies are pulled in by the main package(s). Both of those concerns apply to various subsets of -contrib, which means it's going to be hard to persuade packagers to fold -contrib into the -server package altogether. Nor would you gain their approval by trying to pre-empt the decision. We might get somewhere by trying to identify a small set of particularly popular contrib modules that don't add any extra dependencies, and then recommending to packagers that those ones get bundled into the main server package. Certainly it's not a huge amount of code; less than 2MB these days. - % wc `dpkg -L postgresql-contrib-9.0` | tail -1 15952 67555 1770987 total Well, to add some concrete facts rather than generalities to my own post, here are the sizes of the built RPMs from my last build for Fedora: -rw-r--r--. 1 tgl tgl 3839458 Apr 18 10:50 postgresql-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 490788 Apr 18 10:50 postgresql-contrib-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 27337677 Apr 18 10:51 postgresql-debuginfo-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 961660 Apr 18 10:50 postgresql-devel-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 7569048 Apr 18 10:50 postgresql-docs-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 246506 Apr 18 10:50 postgresql-libs-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl64940 Apr 18 10:50 postgresql-plperl-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl65776 Apr 18 10:50 postgresql-plpython-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl45941 Apr 18 10:50 postgresql-pltcl-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 5302117 Apr 18 10:50 postgresql-server-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 1370509 Apr 18 10:50 postgresql-test-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 3644113 Apr 18 10:50 postgresql-upgrade-9.0.4-1.fc13.x86_64.rpm Is that last one pg_upgrade? It seems very big. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching
On tis, 2011-05-10 at 15:17 -0400, Tom Lane wrote: Well, that would be great, but the someone is not going to be me; I don't do Windows. Yeah, me neither. At least not for this release. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching
On tis, 2011-05-10 at 15:48 -0400, Robert Haas wrote: On Tue, May 10, 2011 at 3:09 PM, Peter Eisentraut pete...@gmx.net wrote: Well, the result of people don't always run them is the rest of src/test/. How much of that stuff even works anymore? I don't know. But I'm not sure I see your point. The fact that we haven't yet succeeded in doing something doesn't mean that it's either impossible or unimportant. Yes, but doing the same thing again in hope of different results isn't the right thing either. I'm all for more test suites, but we should make them as widely accessible and accessed as possible so that they get maintained. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why not install pgstattuple by default?
Bruce Momjian br...@momjian.us writes: Tom Lane wrote: here are the sizes of the built RPMs from my last build for Fedora: -rw-r--r--. 1 tgl tgl 3839458 Apr 18 10:50 postgresql-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 490788 Apr 18 10:50 postgresql-contrib-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 27337677 Apr 18 10:51 postgresql-debuginfo-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 961660 Apr 18 10:50 postgresql-devel-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 7569048 Apr 18 10:50 postgresql-docs-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 246506 Apr 18 10:50 postgresql-libs-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl64940 Apr 18 10:50 postgresql-plperl-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl65776 Apr 18 10:50 postgresql-plpython-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl45941 Apr 18 10:50 postgresql-pltcl-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 5302117 Apr 18 10:50 postgresql-server-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 1370509 Apr 18 10:50 postgresql-test-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 3644113 Apr 18 10:50 postgresql-upgrade-9.0.4-1.fc13.x86_64.rpm Is that last one pg_upgrade? It seems very big. pg_upgrade plus a supporting set of 8.4 files ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VARIANT / ANYTYPE datatype
On Wed, May 4, 2011 at 5:58 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote: A customer came to us with this request: a way to store any data in a column. We've gone back and forth trying to determine reasonable implementation restrictions, safety and useful semantics for them. I note that this has been requested in the past: http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php I think it's a reasonably common use case. Would it be possible to do this with a typed hstore? Seems easier to add something there than it would be to add the VARIANT type as discussed here. both Oracle and MS-SQL have it Do they? What types are they called? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Infinity bsearch crash on Windows
A 9.1Beta1 test report from Richard Broersma (and confirmed on another system by Mark Watson) showed up on pgsql-testers this week at http://archives.postgresql.org/pgsql-testers/2011-05/msg0.php with the following test crashing his Windows server every time: SELECT 'INFINITY'::TIMESTAMP; That works fine for me on Linux. Richard chased the error in the logs, which was a generic "you can't touch that memory" one, down to a full stack trace: http://archives.postgresql.org/pgsql-testers/2011-05/msg9.php It looks like it's losing its mind inside of src/backend/utils/adt/datetime.c, specifically at this line in datebsearch: 3576 while (last >= base) 3577 { 3578 position = base + ((last - base) >> 1); 3579 result = key[0] - position->token[0]; Why crash there only on Windows? Was the problem actually introduced above this part of the code? These are all questions I have no answer for. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
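For what it's worth, Tom's analysis later in this thread pins the immediate cause on datebsearch() being reached with a NULL table and a zero count. A minimal, self-contained sketch of that failure mode (using a simplified datetkn, not the real datetime.c declarations) looks like this:

#include <string.h>

typedef struct
{
    char        token[11];      /* simplified; the real struct differs */
    int         value;
} datetkn;

const datetkn *
datebsearch_sketch(const char *key, const datetkn *base, int nel)
{
    /*
     * With base == NULL and nel == 0 (an abbreviation table that was never
     * installed), "base + nel - 1" is already undefined behavior; in
     * practice the pointer wraps around, the loop test succeeds, and the
     * first dereference of "position" faults.  Guarding with
     * "if (nel <= 0) return NULL;" before computing "last" avoids it.
     */
    const datetkn *last = base + nel - 1;
    const datetkn *position;
    int         result;

    while (last >= base)
    {
        position = base + ((last - base) >> 1);
        result = key[0] - position->token[0];
        if (result == 0)
        {
            result = strncmp(key, position->token, sizeof(position->token) - 1);
            if (result == 0)
                return position;
        }
        if (result < 0)
            last = position - 1;
        else
            base = position + 1;
    }
    return NULL;
}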
Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching
Peter Eisentraut pete...@gmx.net writes: I'm all for more test suites, but we should make them as widely accessible and accessed as possible so that they get maintained. Yeah. My preference would really be to push something like collate.linux.utf8 into the standard regression tests, but we'd first have to get it to where there was only one .sql file and not more than three or so variant expected files (one of which would be the one for platforms that don't support locale_t, analogous to the no-support expected file for the xml test). If we were at that point, then instead of having a separate make target, I'd be very strongly tempted to include the test in the standard tests by the expedient of having it create and \c to a separate database with suitable values of ENCODING, LC_COLLATE, etc. The lack of initdb support for getting more-or-less-standard collation entries into pg_collation on Windows seems to be the major missing piece from here (dunno if Peter is aware of others). If we don't fix that before release, we're going to regret it anyway IMO, because of the inevitable tide of questions/complaints from Windows users trying to use the collation feature. We've already seen at least one such from a beta tester. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VARIANT / ANYTYPE datatype
On Tue, May 10, 2011 at 3:57 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, May 4, 2011 at 5:58 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote: A customer came to us with this request: a way to store any data in a column. We've gone back and forth trying to determine reasonable implementation restrictions, safety and useful semantics for them. I note that this has been requested in the past: http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php I think it's a reasonably common use case. Would it be possible to do this with a typed hstore? Seems easier to add something there than it would be to add the VARIANT type as discussed here. both Oracle and MS-SQL have it Do they? What types are they called? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services MS SQL Server calls its variant type 'sql_variant', but it's limited to a subset of the data types they support. Basically, it can store any numeric type, or any binary or text type with a constrained length. No timestamps, geometry, XML, user-defined types, etc. allowed. So it's not really as much of an "any value" type as it might look on the surface. Don't know any details of Oracle's implementation. -Eric
Re: [HACKERS] the big picture for index-only scans
On Tue, May 10, 2011 at 8:35 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Simon Riggs si...@2ndquadrant.com wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: ... but I share Simon's desire to see some proof before anything gets committed. And we agree there. In fact, I can't think of anyone in the community who doesn't want to see that for *any* purported performance enhancement. I'm not talking about eventual commit, I'm talking about the whole process of development. I'm confused -- you want to see proof that the concept works well in PostgreSQL before development effort on it begins? Or there is some alternative you would like to see pursued instead? Something else? Well, I didn't ask for that and agree it would be foolish to demand proof ahead of development. I know this technique is effective in other DBMS, I just want to be certain it will be effective for us before too much work is done. We have the additional requirement for a crash safe vacuum map that needs to be consulted, with possible contention effects. Sybase etc can simply avoid the logical I/O, which is always a win, in or out of cache. So the problem is quite different for us. What I suggested was an assessment and benefit case because we normally start with a problem and then work out how to solve it. Normally, others come forward with the why? when? questions and it feels like there's a bit of groupthink going on here. This looks to me like it's being approached like it was a feature, but it looks to me like a possible optimisation, so I suggest we treat it that way. Out of concern, I don't want you to waste time on work that *may* not be that useful in practice, and I don't want to miss improvements or alternatives either. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Feature proposal: distinguish each PostgreSQL instance in the event log
From: Tom Lane t...@sss.pgh.pa.us MauMau maumau...@gmail.com writes: I've encountered one problem on Windows. I need to support running all of my products on one host simultaneously. Plus, I need to log messages in syslog/event log. On Linux, I can distinguish the messages of one product and those of other products by setting syslog_ident in postgresql.conf. On the other hand, I can't do that on Windows: all of the PostgreSQL instances use the same fixed event source name "PostgreSQL". So I'd like to propose a trivial feature which allows users to set the event source. I'm a bit concerned by the fact that this seems to require global actions to be taken (registering/deregistering) when a GUC value changes. That's going to cause headaches, not least because you cannot assume that the value changes synchronously across all Postgres processes. Maybe that's only an artifact of the way you worded this and not a real requirement ... but if it is a requirement you should think about how to avoid it. If you can't avoid it then most likely the event name is going to have to be PGC_POSTMASTER, ie not changeable after startup. Also, what happens if we fail to deregister because of crashing? Is it better to have one parameter for syslog program name and event source, instead of having both syslog_ident and event_source? I don't like aliases for GUC variables, and in any case I think it'd be confusing to force both usages to share a name that's not particularly appropriate for either one alone. So personally, -1 for unifying those. Others might see it differently though. Considering Tom's advice, I'll make a patch as follows: 1. Create a new GUC variable called event_source and don't touch anything with the existing syslog_ident. 2. Make event_source PGC_POSTMASTER at first, because I don't think it is a requirement to change the event source name while the PostgreSQL server is active. Also, what happens if we fail to deregister because of crashing? The registration/deregistration is performed separately from PostgreSQL's start/stop, so there is no concern about this. Please see "Registering eventlog on Windows" on the page below: Installation Procedure http://www.postgresql.org/docs/9.0/static/install-procedure.html In fact, I've almost finished writing the program patch and testing it. However, I'd like to ask for your opinions about a few things before completing and submitting the patch. 1. Documentation Currently, the event source registration is described on the above page. However, many of my colleagues fail to find the article. They use PostgreSQL as one of many software products and don't fully read the manual. I wonder if it would be better to put the article in the following section, because this is in the chapter about installation on Windows: 16.1.4. Cleaning and installing http://www.postgresql.org/docs/9.0/static/install-windows-full.html or: Post-Installation Setup http://www.postgresql.org/docs/9.0/static/install-post.html In addition, I think the above page should be linked from the description of the log_destination parameter. 2. pg_ctl's event logging pg_ctl also logs to the event log. Should pg_ctl use the event_source setting? According to the response to the bug #6011 report, pg_ctl does not need to obey the postgresql.conf setting, because postgresql.conf is for the server. However, I'm afraid that users will claim that logging with the event source "PostgreSQL" instead of the event_source setting is a bug. For reference, pg_ctl uses the port parameter in postgresql.conf. 
Regards, MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
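To make the proposal concrete: the Windows eventlog path ultimately registers a source name before reporting an event, so the patch boils down to feeding that registration from a GUC rather than a hard-wired string. A rough, simplified sketch follows -- this is not MauMau's actual patch, only the documented ANSI Win32 calls wired up the way the proposal describes.

#include <windows.h>

HANDLE evtHandle = INVALID_HANDLE_VALUE;

/* would be filled in from the proposed event_source GUC (PGC_POSTMASTER) */
const char *event_source = NULL;

void
write_eventlog_sketch(WORD eventlevel, const char *line)
{
    if (evtHandle == INVALID_HANDLE_VALUE)
        evtHandle = RegisterEventSourceA(NULL,
                                         event_source ? event_source
                                                      : "PostgreSQL");

    ReportEventA(evtHandle,
                 eventlevel,    /* EVENTLOG_ERROR_TYPE, etc. */
                 0, 0, NULL,    /* category, event id, user SID */
                 1, 0,          /* one string, no raw data */
                 &line, NULL);
}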
Re: [HACKERS] Infinity bsearch crash on Windows
On 10 May 2011 23:02, Greg Smith g...@2ndquadrant.com wrote: Why crash there only on Windows? Was the problem actually introduced above this part of the code? These are all questions I have no answer for. I don't find it at all surprising that there's a memory corruption bug that only manifests itself on Windows. Recently, I reported a bug in pgAdmin that turned out to be a simple case of forgetting to allocate an extra byte of memory for a null in a c string. The outward problem couldn't be reproduced on Mac - it only occurred on Linux. Of course, the problem with undefined behaviour is not that it might cause your program to crash, but that it might not cause your program to crash. For debug builds, Visual C++ allocates no man's land guard bytes on either side of areas of allocated memory, which is great for catching heap corruption bugs. My guess is that when the VC++ debugger issues a breakpoint, that's exactly where the memory is being corrupted/improperly dereferenced. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Feature proposal: distinguish each PostgreSQL instance in the event log
MauMau maumau...@gmail.com writes: MauMau maumau...@gmail.com writes: I've encountered one problem on Windows. I need to support running all of my products on one host simultaneously. Plus, I need to log messages in syslog/event log. On Linux, I can distinguish the messages of one product and those of other products by setting syslog_ident in postgresql.conf. On the other hand, I can't do that on Windows: all of the PostgreSQL instances use the same fixed event source name PostgreSQL. SO I'd like to propose a trivial feature which allows users to set event source. BTW, what will this accomplish exactly that couldn't be accomplished by setting log_line_prefix to include the desired identifier? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Infinity bsearch crash on Windows
Greg Smith g...@2ndquadrant.com writes: A 9.1Beta1 test report from Richard Broersma (and confirmed on another system by Mark Watson) showed up pgsql-testers this week at http://archives.postgresql.org/pgsql-testers/2011-05/msg0.php with the following test crashing his Windows server every time: SELECT 'INFINITY'::TIMESTAMP; Hmm ... I bet this is related to the recent reports about ALTER USER VALID UNTIL 'infinity' crashing on Windows. Can the people seeing this get through the regression tests? Perhaps more to the point, what is their setting of TimeZone? What does the pg_timezone_abbrevs view show for them? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
Simon Riggs si...@2ndquadrant.com wrote: Normally, others come forward with the why? when? questions and it feels like there's a bit of groupthink going on here. This looks to me like its being approached like it was a feature, but it looks to me like a possible optimisation, so suggest we treat it that way. This issue has come up a great many times over the years, and there has been much discussion around it. The Wiki page is here: http://wiki.postgresql.org/wiki/Index-only_scans This currently references 11 threads on the topic. I'd bet that by spending a couple hours at it I could quadruple that number of threads. (I'd really rather not, though.) The problem is that there are regular and fairly frequent complaints on the list about queries which run slower than people expect because the heap must be checked for visibility information when matching index entries are found. It has become enough of a conspicuous issue that a lot of people are interested in seeing if something can be done about it. After much discussion, people are trying to advance a plan to find an answer. I'm sure nobody involved would ignore any suggestion on how it might be done better; but at this point, I don't think it's fair to suggest that this is not being pursued in response to a real problem, or that no serious thought has been given to direction before people started moving. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [BUGS] BUG #5957: createdb with description and md5 auth forces to provide password twice
Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Prior to PG 8.2, this was necessary to put the comment on the database, but now that we have the shared comment/description table pg_shdescription, this is not necessary. Do we need createdb to be able to create databases for pre-8.2 clusters? If not, the attached patch fixes the double-prompting. Well, if you're only going to change this in HEAD, that might be an acceptable limitation, but if you intend to back-patch I think not. Older versions of createdb are probably significantly more likely to be used with even-older servers. This code has been that way since pre-8.2 so I see no need to backpatch; this is the first such complaint I have seen. Seems like it wouldn't be that hard to test the server version and only reconnect if it's pre-8.2. I am not excited about adding more code for this so I am thinking head-only. Attached patch applied to head only. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c new file mode 100644 index 9b72eac..544f2f6 *** a/src/bin/scripts/createdb.c --- b/src/bin/scripts/createdb.c *** main(int argc, char *argv[]) *** 192,197 --- 192,202 appendPQExpBuffer(sql, ;\n); + /* + * Connect to the 'postgres' database by default, except have + * the 'postgres' user use 'template1' so he can create the + * 'postgres' database. + */ conn = connectDatabase(strcmp(dbname, postgres) == 0 ? template1 : postgres, host, port, username, prompt_password, progname); *** main(int argc, char *argv[]) *** 208,219 } PQclear(result); - PQfinish(conn); if (comment) { - conn = connectDatabase(dbname, host, port, username, prompt_password, progname); - printfPQExpBuffer(sql, COMMENT ON DATABASE %s IS , fmtId(dbname)); appendStringLiteralConn(sql, comment, conn); appendPQExpBuffer(sql, ;\n); --- 213,221 *** main(int argc, char *argv[]) *** 231,239 } PQclear(result); - PQfinish(conn); } exit(0); } --- 233,242 } PQclear(result); } + PQfinish(conn); + exit(0); } diff --git a/src/bin/scripts/dropdb.c b/src/bin/scripts/dropdb.c new file mode 100644 index 1cf18fd..48f73ae *** a/src/bin/scripts/dropdb.c --- b/src/bin/scripts/dropdb.c *** main(int argc, char *argv[]) *** 113,118 --- 113,123 appendPQExpBuffer(sql, DROP DATABASE %s;\n, fmtId(dbname)); + /* + * Connect to the 'postgres' database by default, except have + * the 'postgres' user use 'template1' so he can drop the + * 'postgres' database. + */ conn = connectDatabase(strcmp(dbname, postgres) == 0 ? template1 : postgres, host, port, username, prompt_password, progname); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
On Wed, May 11, 2011 at 12:14 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: The problem is that there are regular and fairly frequent complaints on the list about queries which run slower than people expect To be fair, about 3/4 of them were actually complaining about the lack of some global materialized cache of the aggregate value. Covering index-only scans are only going to be a linear speedup no matter how large the factor; it's not going to turn select count(*) into an O(1) operation. I support the idea of thinking of this as an optimization. But I don't think there's much question. If we can avoid doing the i/o on the heap that's an obvious and huge win. Sure, the costs of maintaining the vm need to be measured against the gains, but we don't know what those costs are yet and whoever works on it will be well aware of that balance. On a separate note though, Simon, I don't know what you mean by "we normally start with a problem". It's a free software project and people are free to work on whatever interests them, whether that's because it solves a problem they have, helps a client who's paying them, or just because it's of academic interest to them. We don't always take their patches if they aren't of general interest, but people propose all kinds of crazy experimental ideas all the time. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgresql.conf error checking strategy
Robert Haas wrote: On Mon, May 9, 2011 at 11:10 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sun, May 8, 2011 at 1:04 AM, Tom Lane t...@sss.pgh.pa.us wrote: Yes, definitely. Perhaps summarize as "rethink how we handle partially correct postgresql.conf files". Or maybe Robert sees it as "rethink approach to making sure all backends share the same value of critical settings"? Or maybe those are two different TODOs? The second is what I had in mind. I'm thinking that at least for critical GUCs we need a different mechanism for making sure everything stays in sync, like having the postmaster write a precompiled file and convincing the backends to read it in some carefully synchronized fashion. However, it's not clear to me whether something along those lines (or some other lines) would solve the problem you were complaining about; therefore it's possible, as you say, that there are two separate action items here. Or maybe not: maybe someone can come up with an approach that swats both problems in one go. Well, the thing that was annoying me was that because a backend saw one value in postgresql.conf as incorrect, it was refusing to apply any changes at all from postgresql.conf. And worse, there was no log entry to give any hint what was going on. This doesn't seem to me to have much to do with the problem you're on about. I agree it's conceivable that someone might think of a way to solve both issues at once, but I think we'd better list them as separate TODOs. OK by me. Two TODOs added: Allow postgresql.conf settings to be accepted by backends even if some settings are invalid for those backends * http://archives.postgresql.org/pgsql-hackers/2011-04/msg00330.php * http://archives.postgresql.org/pgsql-hackers/2011-05/msg00375.php Incomplete item: Allow all backends to receive postgresql.conf setting changes at the same time * http://archives.postgresql.org/pgsql-hackers/2011-04/msg00330.php * http://archives.postgresql.org/pgsql-hackers/2011-05/msg00375.php -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
Robert Haas wrote: Any thoughts welcome. Incidentally, if anyone else feels like working on this, feel free to let me know and I'm happy to step away, from all of it or from whatever part someone else wants to tackle. I'm mostly working on this because it's something that I think we really need to get done, more than having a burning desire to be the one who does it. Index-only scans are welcome! I believe I can help on 3 and 4, but (really) not sure for 1 and 2. Well, I have code for #1, and just need reviews, and #2 shouldn't be that hard, and with luck I'll twist Bruce's arm into doing it (*waves to Bruce*). So #3 and #4 are the next thing to tackle. Any thoughts on what/how you'd like to contribute there? I am happy to have pg_upgrade skip upgrading visibility map files --- it already has code to conditionally process them because they only exist in >= 8.4: /* fsm/vm files added in PG 8.4 */ if (GET_MAJOR_VERSION(old_cluster.major_version) >= 804) { /* * Copy/link any fsm and vm files, if they exist */ Just give the word and it will be done. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
Greg Stark wrote: On a separate note though, Simon, I don't know what you mean by we normally start with a problem. It's an free software project and people are free to work on whatever interests them whether that's because it solves a problem they have, helps a client who's paying them, or just because it's of academic interest to them. We don't always take their patches if they aren't of general interest but people propose all kinds of crazy experimental ideas all the time. I am confused by Simon's questions too. Simon seems to regularly argue for adding features late in the development cycle and backpatch things no one else thinks should be backpatched, but he wants more research that index-only scans are going to improve things before it is implemented? The first is aggressive development, the second is very conservative development --- they don't match, so I now wonder what the motivation is since it isn't consistent. Isn't speeding up COUNT(*) a sufficient case because it will not have to touch the heap in many cases? No one is going to apply this patch until we fully understand the performance implications, just like every other patch. No one has suggested otherwise. It is helpful to have people critically review all our work, but disagreeing just for the sake of causing discussion isn't helpful, and I have seen a lot of these discussions lately. I am sensing a pattern. :-( -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
On Wed, May 11, 2011 at 1:47 AM, Bruce Momjian br...@momjian.us wrote: Isn't speeding up COUNT(*) a sufficient case because it will not have to touch the heap in many cases? Putting aside the politics questions, count(*) is an interesting case -- it exposes some of the unanswered questions about index-only scans. The reason select count(*) might win would be because we could pick any index and do an index scan, relying on the visibility map to optimize away the heap reads. This is only going to be a win if a large fraction of the heap reads get optimized away. It's going to be pretty tricky to determine in the optimizer a) which index will be cheapest and b) what fraction of index tuples will point to pages where the heap reference can be optimized away. The penalty for guessing wrong if we use an index-only scan and it turns out to have many pages that aren't all-visible would be pretty high. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
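The kind of planner adjustment being discussed would presumably look something like the sketch below -- this is not actual costsize.c code; where all_visible_frac comes from, and how stale it may be allowed to get, is exactly the open question in this thread.

/*
 * Sketch only: expected number of heap fetches an index-only scan would
 * still have to perform, given an estimate of the fraction of the table's
 * pages whose visibility-map bits are set.  The estimate would have to
 * come from ANALYZE-style statistics, not from current cache contents.
 */
double
expected_heap_fetches(double index_tuples_fetched, double all_visible_frac)
{
    if (all_visible_frac < 0.0)
        all_visible_frac = 0.0;
    if (all_visible_frac > 1.0)
        all_visible_frac = 1.0;

    return index_tuples_fetched * (1.0 - all_visible_frac);
}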
Re: [HACKERS] Why not install pgstattuple by default?
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Tom Lane wrote: here are the sizes of the built RPMs from my last build for Fedora: -rw-r--r--. 1 tgl tgl 3839458 Apr 18 10:50 postgresql-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 490788 Apr 18 10:50 postgresql-contrib-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 27337677 Apr 18 10:51 postgresql-debuginfo-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 961660 Apr 18 10:50 postgresql-devel-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 7569048 Apr 18 10:50 postgresql-docs-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 246506 Apr 18 10:50 postgresql-libs-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl64940 Apr 18 10:50 postgresql-plperl-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl65776 Apr 18 10:50 postgresql-plpython-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl45941 Apr 18 10:50 postgresql-pltcl-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 5302117 Apr 18 10:50 postgresql-server-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 1370509 Apr 18 10:50 postgresql-test-9.0.4-1.fc13.x86_64.rpm -rw-r--r--. 1 tgl tgl 3644113 Apr 18 10:50 postgresql-upgrade-9.0.4-1.fc13.x86_64.rpm Is that last one pg_upgrade? It seems very big. pg_upgrade plus a supporting set of 8.4 files ... OK, where do I get to dance around that pg_upgrade is packaged in Fedora thanks to you? At PGCon? LOL -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
Greg Stark wrote: On Wed, May 11, 2011 at 1:47 AM, Bruce Momjian br...@momjian.us wrote: Isn't speeding up COUNT(*) a sufficient case because it will not have to touch the heap in many cases? Putting aside the politics questions, count(*) is an interesting case -- it exposes some of the unanswered questions about index-only scans. The reason select count(*) might win would be because we could pick any index and do an index scan, relying on the visibility map to optimize away the heap reads. This is only going to be a win if a large fraction of the heap reads get optimized away. It's going to be pretty tricky to determine in the optimizer a) which index will be cheapest and b) what fraction of index tuples will point to pages where the heap reference can be optimized away. The penalty for guessing wrong if we use an index-only scan and it turns out to have many pages that aren't all-visible would be pretty high. Yes, that is the tricky optimizer/analyze part. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
Robert Haas wrote: So, what do we need in order to find our way to index-only scans? 1. The visibility map needs to be crash-safe. The basic idea of index-only scans is that, instead of checking the heap to find out whether each tuple is visible, we first check the visibility map. If the visibility map bit is set, then we know all tuples on the page are visible to all transactions, and therefore the tuple of interest is visible to our transaction. Assuming that a significant number of visibility map bits are set, this should enable us to avoid a fair amount of I/O, especially on large tables, because the visibility map is roughly 8000 times smaller than the heap, and therefore far more practical to keep in cache. However, before we can rely on the FYI, because the visibility map is only one _bit_ per page, it is 8000 * 8 or 64k times smaller than the heap, e.g. one 8k visibility map page covers 512MB of heap. This is important because we rely on this compactness in the hope that the WAL logging of this information will not be burdensome. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
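Worked out with the default 8 kB block size (this is where the 512MB figure above comes from):

#include <stdio.h>

int
main(void)
{
    const long blcksz = 8192;                    /* default BLCKSZ, bytes */
    const long bits_per_vm_page = blcksz * 8;    /* one bit per heap page */
    const long heap_bytes_covered = bits_per_vm_page * blcksz;

    /* prints: 65536 heap pages, 512 MB of heap per visibility-map page */
    printf("%ld heap pages, %ld MB of heap per visibility-map page\n",
           bits_per_vm_page, heap_bytes_covered / (1024 * 1024));
    return 0;
}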
Re: [HACKERS] the big picture for index-only scans
Greg Stark wrote: On Wed, May 11, 2011 at 1:47 AM, Bruce Momjian br...@momjian.us wrote: Isn't speeding up COUNT(*) a sufficient case because it will not have to touch the heap in many cases? Putting aside the politics questions, count(*) is an interesting case -- it exposes some of the unanswered questions about index-only scans. The reason select count(*) might win would be because we could pick any index and do an index scan, relying on the visibility map to optimize away the heap reads. This is only going to be a win if a large fraction of the heap reads get optimized away. It's going to be pretty tricky to determine in the optimizer a) which index will be cheapest and b) what fraction of index tuples will point I assume the smallest non-partial index would be the cheapest index. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PGC_S_DEFAULT is inadequate
I believe I've sussed the reason for the recent reports of Windows builds crashing when asked to process 'infinity'::timestamp. It's a bit tedious, so bear with me: 1. The immediate cause is that datebsearch() is being called with a NULL pointer and zero count, ie, the powerup default values of timezonetktbl and sztimezonetktbl, because InstallTimeZoneAbbrevs is never called, because the GUC variable timezone_abbreviations is never set. That routine should be a bit more robust about the case, and I've already fixed that, but the underlying failure to initialize the GUC variable remains a problem. 2. The 9.1 change that created the issue is that I changed pg_timezone_abbrev_initialize to use a source value of PGC_S_DEFAULT instead of the previous, rather arbitrary choice of PGC_S_ARGV. That seemed like a good idea at the time because (a) it looked a lot saner in pg_settings and (b) it wouldn't override a setting coming from the postmaster's command line, should anyone ever try to do that (evidently no one ever has, or at least not complained to us that it didn't work). 3. The reason it fails in Windows and nowhere else is that write_one_nondefault_variable() ignores and doesn't write out variables having source == PGC_S_DEFAULT. So, even though the postmaster has correctly set its own value of timezone_abbreviations, child processes don't receive that setting. You can duplicate this behavior in a non-Windows machine if you #define EXEC_BACKEND. Too bad it didn't occur to me to test the GUC assign hook changes that way. Although I might not have found it anyway, because: 4. The problem is masked in the regression database because we create a database-level setting of timezone_abbreviations, so that backends do receive a value of the GUC before they are ever asked to parse any timestamp values. Else we would have seen this immediately in the buildfarm. Isn't that special? Effectively, write_one_nondefault_variable is assuming that variables having source == PGC_S_DEFAULT *must* have exactly their boot values. It turns out there's another place making the same assumption, namely the kludge in guc-file.l that tries to reset variables that were formerly set by postgresql.conf and no longer are. What it does is to reset them using source == PGC_S_DEFAULT, which will override the existing setting with the boot_val if and only if the variable currently has source == PGC_S_DEFAULT, which it just forced to be the case for anything previously having source == PGC_S_FILE. So this is fine if the current value was from the file or was the boot_val, but if we'd overridden the boot value with a replacement default value using PGC_S_DEFAULT, that code would cause the value to revert to the boot_val not the replacement value. Not desirable. So, having recognized these two problems, I was about to knuckle under and make the replacement default value in pg_timezone_abbrev_initialize be assigned with source PGC_S_ENV_VAR, which is the next step up. That would be ugly in the sense of exposing a confusing source value in pg_settings, but it would not have any worse effects because there is no real environment variable from which we might absorb a setting for timezone_abbreviations. But wait: there's another place that's also using PGC_S_DEFAULT like this, and that's the assignment of client_encoding from database encoding in postinit.c. And for that variable, there *is* a potential assignment from an environment variable, namely we will absorb a value from PGCLIENTENCODING if that's set in the server environment. 
(For the record, I take no position on whether that's actually a good behavior; but it is the historical, documented behavior and we've not had complaints about it.) If we have postinit.c use PGC_S_ENV_VAR for this purpose, then PGCLIENTENCODING will stop working because it will always be overridden from the database encoding, because that setting will be applied later with the same priority level. My conclusion about all this is that we really need to invent another GucSource value falling between PGC_S_DEFAULT and PGC_S_ENV_VAR, called perhaps PGC_S_DYNAMIC_DEFAULT, for the purpose of denoting values that are defaults in terms of the precedence pecking order but are not simply the hard-wired boot values. There's no real need for clients to see the difference, so we could have the external representation in pg_settings be default for both, but guc.c really needs to be aware of which settings are truly boot values and which are not. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
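For readers following along in guc.h, the proposal amounts to inserting one new rung in the GucSource precedence ladder; a sketch of where it would sit, with the unrelated members elided:

typedef enum
{
    PGC_S_DEFAULT,              /* hard-wired default ("boot_val") */
    PGC_S_DYNAMIC_DEFAULT,      /* proposed: default computed during startup */
    PGC_S_ENV_VAR,              /* postmaster environment variable */
    PGC_S_FILE,                 /* postgresql.conf */
    PGC_S_ARGV,                 /* postmaster command line */
    /* ... per-database, per-user, client, and test sources elided ... */
    PGC_S_SESSION               /* SET command */
} GucSource;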