Re: [HACKERS] global temporary tables
2010/4/24 Robert Haas robertmh...@gmail.com:
> A couple of recent threads got me thinking again about the idea of
> global temporary tables. There seem to be two principal issues:
>
> 1. What is a global temporary table?
> 2. How could we implement that?
>
> Despite rereading the "idea: global temp tables" thread from April 2009
> in some detail, I was not able to get a clear understanding of (1).
> What I *think* it is supposed to mean is that the table is a permanent
> object which is globally visible - that is, it's part of some non-temp
> schema like public or $user, and its column definitions etc. are
> visible to all backends - and it's not automatically removed on commit,
> backend exit, etc. - but the *contents* of the table are temporary and
> backend-local, so that each new backend initially sees it as empty and
> can then insert, update, and delete data independently of what any
> other backend does.
>
> As to (2), my thought is that perhaps we could implement this by
> instantiating a separate relfilenode for the relation for each backend
> which accesses it. relfilenode would be 0 in pg_class, as it is for
> mapped relations, but every time a backend touched the rel, we'd
> allocate a relfilenode and associate the oid of the temp table with it
> using some kind of backend-local storage - actually similar to what the
> relmapper code does, except without the complexity of ever actually
> having to persist the value; and perhaps using a hash table rather than
> an array, since the number of mapped rels that a backend can need to
> deal with is rather more limited than the number of temp tables it
> might want to use.
>
> Thoughts?
>
> ...Robert

It is a good idea. I am missing some ideas about statistics and about
indexes, though.

Regards

Pavel Stehule
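To make the relmapper-like idea above concrete, here is a minimal C
sketch of the backend-local mapping Robert describes - global temp
table OID to this backend's private relfilenode - using a dynahash
table instead of the relmapper's fixed array. All names here are
hypothetical, and allocating the relfilenode itself is left to the
caller; this illustrates the data structure, not the actual patch.

#include "postgres.h"
#include "utils/hsearch.h"

typedef struct TempRelMapEntry
{
	Oid		relid;			/* hash key: OID of the global temp table */
	Oid		relfilenode;	/* this backend's private relfilenode */
} TempRelMapEntry;

static HTAB *TempRelMap = NULL;

/* Record the relfilenode chosen the first time this backend touches relid. */
static void
RememberTempRelFileNode(Oid relid, Oid relfilenode)
{
	TempRelMapEntry *entry;
	bool		found;

	if (TempRelMap == NULL)
	{
		HASHCTL		ctl;

		MemSet(&ctl, 0, sizeof(ctl));
		ctl.keysize = sizeof(Oid);
		ctl.entrysize = sizeof(TempRelMapEntry);
		TempRelMap = hash_create("backend temp rel map", 64, &ctl,
								 HASH_ELEM | HASH_BLOBS);
	}
	entry = (TempRelMapEntry *) hash_search(TempRelMap, &relid,
											HASH_ENTER, &found);
	entry->relfilenode = relfilenode;
}

/* Return this backend's private relfilenode, or InvalidOid if none yet. */
static Oid
LookupTempRelFileNode(Oid relid)
{
	TempRelMapEntry *entry;

	if (TempRelMap == NULL)
		return InvalidOid;
	entry = (TempRelMapEntry *) hash_search(TempRelMap, &relid,
											HASH_FIND, NULL);
	return entry ? entry->relfilenode : InvalidOid;
}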
[HACKERS] Re: [COMMITTERS] pgsql: Add missing optimizer hooks for function cost and number of rows.
On Fri, 2010-04-23 at 18:55 -0400, Tom Lane wrote:
> sri...@postgresql.org (Simon Riggs) writes:
>> Log Message:
>> ---
>> Add missing optimizer hooks for function cost and number of rows.
>> Closely follow design of other optimizer hooks: if hook exists
>> retrieve value from plugin; if still not set then get from cache.
>
> What exactly are we doing adding new features without discussion (or
> documentation, or known use cases) at this stage of the release cycle?

Existing hooks were not fully complete in their coverage. That has
happened before, and we have discussed that before on hackers, so I
took care not to deviate from that implementation. This is a very low
impact change, isn't in a new area, and similar optimizer-related
changes were made recently, so I saw little to object to in this
particular change. No such hooks are documented, even ones with strong
use cases.

--
Simon Riggs           www.2ndQuadrant.com
[HACKERS] Re: Re: [HACKERS] vcregress.bat check triggered Heap error in the Debug version of win32 build
Hi,

I think I've answered the question. The command that "vcregress.bat
check" runs is, expanded from the source, the following line:

..\..\..\Debug\pg_regress\pg_regress --dlpath=. --psqldir=..\..\..\Debug\psql --schedule=serial_schedule --multibyte=SQL_ASCII --no-locale --temp-install=.\tmp_check --top-builddir=E:\learn\db_research\postgreSQL\cvsroot\pgsql.latests

And this command will trigger the error. The release version does not
have such an issue.

Thanks in advance.

Xiong He
2010-04-24 16:34:14

From: Andrew Dunstan
Sent: 2010-04-24 08:42:27
To: Xiong He
Cc: pgsql-hackers
Subject: Re: [HACKERS] vcregress.bat check triggered Heap error in the Debug version of win32 build

Xiong He wrote:
> Thanks. In my test, it fails during the "vcregress.bat check" startup.
> It's a Debug Assertion Error. File: dbgheap.c Line: 1252.
>
> E:\learn\db_research\postgreSQL\cvsroot\pgsql.latest\src\tools\msvc\vcregress.bat check
>
> No test can run. I used VS2005 for the build.

Please do not top-answer. You still need to tell us more about exactly
what it is doing when it fails. What program is it that is failing (and
don't answer "vcregress.bat")?

cheers

andrew
Re: [HACKERS] testing HS/SR - 1 vs 2 performance
On Fri, 2010-04-23 at 19:07 -0400, Robert Haas wrote:
> On Fri, Apr 23, 2010 at 6:39 PM, Simon Riggs si...@2ndquadrant.com wrote:
>> On Fri, 2010-04-23 at 11:32 -0400, Robert Haas wrote:
>> 99% of transactions happen in similar times between primary and
>> standby, everything dragged down by rare but severe spikes. We're
>> looking for something that would delay something that normally takes
>> 0.1ms into something that takes 100ms, yet does eventually return.
>> That looks like a severe resource contention issue.
>
> Wow. Good detective work.
>
>> While we haven't fully established the source of those problems, I am
>> now happy that these test results don't present any reason to avoid
>> committing the main patch tested by Erik (not the smaller additional
>> one I sent). I expect to commit that on Sunday.
>
> Both Heikki and I objected to that patch.

Please explain your objection, based upon the patch and my
explanations.

> And apparently it doesn't fix the problem, either. So, -1 from me.

There is an issue observed in Erik's later tests, but my interpretation
of the results so far is that the sorted-array patch successfully
removes the initially reported loss of performance.

--
Simon Riggs           www.2ndQuadrant.com
Re: recovery_connections cannot start (was Re: [HACKERS] master in standby mode croaks)
On Fri, 2010-04-23 at 19:33 -0400, Robert Haas wrote:
> Principle of obvious breakage.

That is a good principle. It can be applied both ways here. Changing
user interfaces (or indeed, anything) for very little obvious gain is a
considerable annoyance to users. IIABDFI (if it ain't broke, don't fix
it).

We need to be aware of the timing issues on the project. Changing
something that has been the same for years is just annoying to existing
users and makes upgrading to our brand new shiny software much harder
than we ourselves would like it to be. But also, deferring solutions to
user problems for vague reasons needs to be avoided, because waiting
until the next release moves the time-to-fix from about 6 months to
about 18 months on average, which crosses the patience threshold.

So in general, I seek to speed up necessary change and slow down
unnecessary change requests. I think we're improving on both.

--
Simon Riggs           www.2ndQuadrant.com
Re: [HACKERS] Re: [COMMITTERS] pgsql: Add missing optimizer hooks for function cost and number of rows.
On Sat, Apr 24, 2010 at 3:31 AM, Simon Riggs si...@2ndquadrant.com wrote:
> On Fri, 2010-04-23 at 18:55 -0400, Tom Lane wrote:
>> sri...@postgresql.org (Simon Riggs) writes:
>>> Log Message:
>>> ---
>>> Add missing optimizer hooks for function cost and number of rows.
>>> Closely follow design of other optimizer hooks: if hook exists
>>> retrieve value from plugin; if still not set then get from cache.
>>
>> What exactly are we doing adding new features without discussion (or
>> documentation, or known use cases) at this stage of the release cycle?
>
> Existing hooks were not fully complete in their coverage. That has
> happened before, and we have discussed that before on hackers, so I
> took care not to deviate from that implementation. This is a very low
> impact change, isn't in a new area, and similar optimizer-related
> changes were made recently, so I saw little to object to in this
> particular change. No such hooks are documented, even ones with strong
> use cases.

The point isn't whether the existing hooks are complete or not. The
point is that we shouldn't be making undiscussed changes EVER, and
particularly not a week before beta. Hooks are frequently proposed and
rejected - every once in a while they are proposed and accepted. So it
is not as if there is any reason to believe that no one could possibly
object to this. And you carefully failed to answer Tom's other point
about lack of use case.

I think the use case for these hooks is pretty thin, but I don't really
want to argue about it now. I want you to revert the patch and resubmit
it for 9.1 when there is time to properly discuss it, and focus on the
remaining open items so that we can put out a beta.

...Robert
Re: [HACKERS] psql: Add setting to make '+' on \d implicit
On Fri, 23 Apr 2010 14:28:38 -0400 Tom Lane t...@sss.pgh.pa.us wrote:
> If we were to do something like that, it would certainly have to
> affect every \d variant that has a + option. Which is probably not a
> very good idea --- in many cases that's a very expensive/verbose
> option. I can't get excited about this proposal, personally.

I was thinking of affecting all \d* commands; I didn't know some were
expensive, though.

> What the OP actually seemed to care about was database object
> comments. I could see somebody who relied heavily on comments wanting
> his comments to be included in all display commands, even without the
> + option. Maybe a configuration variable along the lines of
> 'always_show_comments' would be a better design.

Sounds good; comment visibility is my goal. I know full metadata should
be handled in other ways, but I like to do most processing in psql
sessions for the self-documenting effect, and seeing comments helps in
that context. If \pset's not the right place, anywhere else would be
fine.

Thanks for considering,

Cheers -Terry
Re: [HACKERS] psql: Add setting to make '+' on \d implicit
On Fri, 23 Apr 2010 14:34:45 -0700 Steve Atkins st...@blighty.com wrote:
>> Maybe a configuration variable along the lines of
>> 'always_show_comments' would be a better design.
>
> Or more generally an ability to set aliases via .psqlrc similar to
> \set, maybe?
>
>   \alias \d- = \d
>   \alias \d = \d+

Sounds harder, but better because of the generality.

Cheers -Terry
Re: [HACKERS] Re: [COMMITTERS] pgsql: Add missing optimizer hooks for function cost and number of rows.
Robert Haas robertmh...@gmail.com writes:
> On Sat, Apr 24, 2010 at 3:31 AM, Simon Riggs si...@2ndquadrant.com wrote:
>> Existing hooks were not fully complete in their coverage. That has
>> happened before, and we have discussed that before on hackers, so I
>> took care not to deviate from that implementation. This is a very low
>> impact change, isn't in a new area, and similar optimizer-related
>> changes were made recently, so I saw little to object to in this
>> particular change. No such hooks are documented, even ones with
>> strong use cases.
>
> The point isn't whether the existing hooks are complete or not. The
> point is that we shouldn't be making undiscussed changes EVER, and
> particularly not a week before beta.

I have a problem with not only the process (or lack of it) but the
substance of the patch. I don't believe that a system-wide hook point
has any great use for improving function estimation. You need
function-specific knowledge, and this is just not a useful way to
package it.

When we put in the COST/ROWS options for functions, it was generally
agreed that the way forward would be to generalize those, eg by
allowing per-function estimator functions to be called instead of just
inserting constants. (And I think the main reason we didn't just do
that immediately was that we wanted a feature that could be used
without doing C-level programming.) This patch doesn't do that, nor
even lay any useful groundwork for doing it. It would be impossible,
for instance, for this hook function to lay its hands on the arguments
to the function to be estimated, which certainly begs the question as
to how it's going to deliver any estimate more useful than the constant
value.

Please revert.

			regards, tom lane
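For reference, the COST/ROWS options Tom refers to are existing CREATE
FUNCTION syntax that attaches constant estimates to a function;
per-function estimator functions would generalize exactly these. A
minimal example of the existing feature (the function body and the
dictionary table are made up):

CREATE FUNCTION expensive_match(text) RETURNS SETOF text AS $$
    SELECT word FROM dictionary WHERE word ~ $1
$$ LANGUAGE sql
COST 1000   -- constant per-call cost estimate, in cpu_operator_cost units
ROWS 50;    -- constant row-count estimate for this set-returning function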
Re: [HACKERS] global temporary tables
> ... surprised to find my clone unaffected? If it modifies both, how do
> we avoid complete havoc if the original has since been modified
> (perhaps incompatibly, perhaps not) by some other backend doing its
> own ALTER TABLE?

Since this is such a thorny problem, and this is a temporary table, why
not just disallow ALTER completely for the first pass?

--
Greg Sabino Mullane g...@turnstep.com
PGP Key: 0x14964AC8 201004241201
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Re: [HACKERS] global temporary tables
Greg Sabino Mullane g...@turnstep.com writes:
>> ... surprised to find my clone unaffected? If it modifies both, how
>> do we avoid complete havoc if the original has since been modified
>> (perhaps incompatibly, perhaps not) by some other backend doing its
>> own ALTER TABLE?
>
> Since this is such a thorny problem, and this is a temporary table,
> why not just disallow ALTER completely for the first pass?

Usually the way we approach these kinds of problems is that we want to
see some plausible outline for how they might be fixed before we move
forward with the base feature. IOW, I wouldn't object to not having
ALTER in the first release, but if we have no idea how to do ALTER at
all, I'd be too worried that we were painting ourselves into a corner.

Or maybe you can make a case that there's no need to allow ALTER at
all, ever. But surely DROP needs to be possible, and that seems to
already introduce some of the same issues.

			regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] pgsql: Add missing optimizer hooks for function cost and number of rows.
On Sat, 2010-04-24 at 11:17 -0400, Tom Lane wrote:
> Robert Haas robertmh...@gmail.com writes:
>> The point isn't whether the existing hooks are complete or not. The
>> point is that we shouldn't be making undiscussed changes EVER, and
>> particularly not a week before beta.
>
> I have a problem with not only the process (or lack of it) but the
> substance of the patch. I don't believe that a system-wide hook point
> has any great use for improving function estimation. You need
> function-specific knowledge, and this is just not a useful way to
> package it.

I've reverted it.

> When we put in the COST/ROWS options for functions, it was generally
> agreed that the way forward would be to generalize those, eg by
> allowing per-function estimator functions to be called instead of just
> inserting constants.

I completely agree that the above is the best way forwards.

> (And I think the main reason we didn't just do that immediately was
> that we wanted a feature that could be used without doing C-level
> programming.) This patch doesn't do that, nor even lay any useful
> groundwork for doing it. It would be impossible, for instance, for
> this hook function to lay its hands on the arguments to the function
> to be estimated, which certainly begs the question as to how it's
> going to deliver any estimate more useful than the constant value.

We can override table stats but not function stats. I noticed that hole
in the previous implementation, and rectified it with the intention of
helping users with what I saw as a small, low-risk patch that exactly
followed existing code. It would seem I did that too quickly, without
realising that an objection would occur.

For the record, it's possible to use the main optimizer hooks to get
the same result, so I'm in no way personally blocked by this. I think
that's harder, so others may not find it as easy. This may actually
help obtain funding to implement the full approach.

--
Simon Riggs           www.2ndQuadrant.com
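For illustration, the "main optimizer hooks" Simon mentions follow the
standard chain-and-install pattern sketched below, here using
get_relation_info_hook, the existing hook that does let a plugin
override table statistics. The module name and its one-line heuristic
are hypothetical; only the hook itself is real.

#include "postgres.h"
#include "fmgr.h"
#include "optimizer/plancat.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

static get_relation_info_hook_type prev_get_relation_info_hook = NULL;

/* Hypothetical override: adjust the statistics the planner sees. */
static void
my_get_relation_info(PlannerInfo *root, Oid relationObjectId,
					 bool inhparent, RelOptInfo *rel)
{
	/* chain to any previously installed hook first */
	if (prev_get_relation_info_hook)
		prev_get_relation_info_hook(root, relationObjectId, inhparent, rel);

	/* made-up heuristic: never let the planner assume a zero-page table */
	if (rel->pages == 0)
		rel->pages = 1;
}

void
_PG_init(void)
{
	prev_get_relation_info_hook = get_relation_info_hook;
	get_relation_info_hook = my_get_relation_info;
}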
Re: [HACKERS] Re: [COMMITTERS] pgsql: Add missing optimizer hooks for function cost and number of rows.
Simon Riggs si...@2ndquadrant.com writes:
> We can override table stats but not function stats. I noticed that
> hole in the previous implementation, and rectified it with the
> intention of helping users with what I saw as a small, low-risk patch
> that exactly followed existing code. It would seem I did that too
> quickly, without realising that an objection would occur.

Well, you did it without much thought at all. I think this episode is a
perfect demonstration of why we ask for concrete use-cases for proposed
hooks. If you'd actually tried to write something that used the hook,
you'd surely have noticed that it wasn't being passed the information
that it would need to do anything useful, and you'd probably have
recognized the problem that there's no good way for a single hook
function to provide an extensible collection of function-specific
knowledge.

But the other point is that people aren't going to want to have to
write C-language hook functions in order to provide estimators for
user-defined functions. We need to think of something higher-level than
that. I think there was some discussion of generalizing the COST/ROWS
constants into SQL expressions using the function arguments, which the
planner could evaluate if it could reduce the arguments to constants.
I'm not sure if that would be adequate or even useful, but it seems
more likely to be helpful than a bare hook function.

			regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] pgsql: Add missing optimizer hooks for function cost and number of rows.
On Sat, 2010-04-24 at 12:59 -0400, Tom Lane wrote:
> Well, you did it without much thought at all.

To the consequences, definitely.

> I think this episode is a perfect demonstration of why we ask for
> concrete use-cases for proposed hooks. If you'd actually tried to
> write something that used the hook, you'd surely have noticed that it
> wasn't being passed the information that it would need to do anything
> useful, and you'd probably have recognized the problem that there's no
> good way for a single hook function to provide an extensible
> collection of function-specific knowledge.

To the value, no. The limitations of the hook approach were clear, but
they do at least allow overriding values on a session-by-session basis,
allowing you to write a program to estimate the result and then set the
function costs accordingly. It's not clever or the best way, but it was
the same situation as with the other hooks, and I imagined it would be
accepted without question, wrongly.

--
Simon Riggs           www.2ndQuadrant.com
Re: [HACKERS] global temporary tables
On Sat, Apr 24, 2010 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Greg Sabino Mullane g...@turnstep.com writes:
>> Since this is such a thorny problem, and this is a temporary table,
>> why not just disallow ALTER completely for the first pass?
>
> Usually the way we approach these kinds of problems is that we want to
> see some plausible outline for how they might be fixed before we move
> forward with the base feature. IOW, I wouldn't object to not having
> ALTER in the first release, but if we have no idea how to do ALTER at
> all, I'd be too worried that we were painting ourselves into a corner.
>
> Or maybe you can make a case that there's no need to allow ALTER at
> all, ever. But surely DROP needs to be possible, and that seems to
> already introduce some of the same issues.

I had the same thought as GSM this morning. More specifically, it seems
to me that the problematic cases are precisely those in which you might
feel an urge to touch somebody else's local buffers, so I think we
should disallow, approximately, those ALTER TABLE cases which require a
full-table rewrite.

I don't see the problem with DROP. Under the proposed design, it's
approximately equivalent to dropping a table that someone else has
truncated. You just wait for the necessary lock and then do it.

At least AIUI, the use case for this feature is that you want to avoid
creating the same temporary table over and over again. The schema is
fixed and doesn't change much, but you're creating it lots and lots of
times in lots and lots of different backends, leading to both
management and performance difficulties. If you want to be able to
change the schema frequently or in a backend-local way, use the
existing temporary table feature.

Now, there is ONE problem with DROP, which is that you might orphan
some heaps. Of course, that can also happen due to a backend crash.
Currently, autovacuum arranges to drop any orphaned temp tables that
have passed the wraparound threshold, but even if we were happy with
waiting 2 billion transactions to get things cleaned up, the mechanism
can't work here, because it relies on being able to examine the
pg_class row to determine which backend owns it and where the storage
is located. We could possibly set things up so that a running backend
will notice if a global temporary table for which it's created a
private relfilenode gets dropped, and blow away the backing file. But
that doesn't protect against crashes, so I think we're going to need
some other garbage collection mechanism, either instead of or in
addition to asking backends to clean up after themselves. I'm not quite
sure what the design of that should look like, though.

...Robert
Re: [HACKERS] global temporary tables
Robert Haas robertmh...@gmail.com writes:
> At least AIUI, the use case for this feature is that you want to avoid
> creating the same temporary table over and over again.

The context that I've seen it come up in is that people don't want to
clutter their functions with create-it-if-it-doesn't-exist logic, which
you have to have given the current behavior of temp tables. Any
performance gain from reduced catalog churn would be gravy.

Aside from the DROP problem, I think this implementation proposal has
one other big shortcoming: what are you going to do about table
statistics? In many cases, you really *have* to do an ANALYZE once
you've populated a temp table, if you want to get decent plans for it.
Where will you put those stats?

			regards, tom lane
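For context, this is where those stats live for an ordinary temp table
today: ANALYZE writes per-column statistics into pg_statistic, visible
through the pg_stats view. The table and data below are made up:

CREATE TEMP TABLE orders_tmp AS
    SELECT g AS order_id, g % 10 AS status
    FROM generate_series(1, 100000) g;

ANALYZE orders_tmp;

SELECT attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'orders_tmp';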
Re: [HACKERS] global temporary tables
[ forgot to respond to this part ] Robert Haas robertmh...@gmail.com writes: ... I don't see the problem with DROP. Under the proposed design, it's approximately equivalent to dropping a table that someone else has truncated. You just wait for the necessary lock and then do it. And do *what*? You can remove the catalog entries, but how are you going to make the physical storage of other backends' versions go away? (To say nothing of making them flush their local buffers for it.) If you do remove the catalog entries, won't you be cutting the knees out from under whatever end-of-session cleanup processing might exist in those other backends? The idea of the global table as a template that individual sessions clone working tables from would avoid most of these problems. You rejected it on the grounds that ALTER would be too hard; but if you're blowing off ALTER anyway, that argument seems pretty unimpressive. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] global temporary tables
On Sat, Apr 24, 2010 at 1:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> At least AIUI, the use case for this feature is that you want to
>> avoid creating the same temporary table over and over again.
>
> The context that I've seen it come up in is that people don't want to
> clutter their functions with create-it-if-it-doesn't-exist logic,
> which you have to have given the current behavior of temp tables. Any
> performance gain from reduced catalog churn would be gravy.

I think there's a significant contingent on this mailing list who feel
that that gravy would be rather tasty and would like very much to enjoy
some of it along with their temporary table tetrazzini.

> Aside from the DROP problem, I think this implementation proposal has
> one other big shortcoming: what are you going to do about table
> statistics? In many cases, you really *have* to do an ANALYZE once
> you've populated a temp table, if you want to get decent plans for it.
> Where will you put those stats?

For a first cut, I had thought about ignoring the problem. Now, that
may sound stupid, because if two different backends have very different
distributions of data in the table and both do an ANALYZE, one set of
statistics will clobber the other set. On the flip side, for some usage
patterns, it might actually work out to a win. Maybe the data I'm
putting in here today is a great deal like the data I put in here
yesterday, and planning it with yesterday's statistics doesn't cost
enough to be worth a re-ANALYZE.

If we don't want to do that, I suppose one option is to create a
pg_statistic-like table in the backend's temporary tablespace and put
them there; or we could put them into a backend-local hash table.

The current setup of pg_statistic is actually somewhat weak for a
number of things we might want to do: for example, it might be
interesting to gather statistics for the subset of a table for which a
particular partial index is predOK. When such an index is available for
a particular query, we could use the statistics for that subset of the
table instead of the overall statistics for the table, and get better
estimates. Or we could even let the user specify predicates which will
cause the table to have a different statistical distribution than the
table as a whole, and gather statistics for the subset that matches the
predicate. One approach would be to make the starelid column able to
reference something other than a relation OID, although I don't think
that actually helps with the global temp table problem, because if we
use the real pg_statistic to store the data then we have to arrange to
clean it up.

...Robert
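A small made-up example of the partial-index case described above: the
planner can prove orders_open_idx is predOK for the final query, but
today it gathers no statistics specific to the status = 'open' subset:

CREATE TABLE orders (id int, status text, total numeric);
CREATE INDEX orders_open_idx ON orders (id) WHERE status = 'open';

-- The hypothetical feature would keep a separate statistics entry for
-- the "status = 'open'" subset and use it for queries like this one:
SELECT * FROM orders WHERE status = 'open' AND total > 100;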
Re: [HACKERS] global temporary tables
> For a first cut, I had thought about ignoring the problem. Now, that
> may sound stupid, because if two different backends have very
> different distributions of data in the table and both do an ANALYZE,
> one set of statistics will clobber the other set. On the flip side,
> for some usage patterns, it might actually work out to a win. Maybe
> the data I'm putting in here today is a great deal like the data I put
> in here yesterday, and planning it with yesterday's statistics doesn't
> cost enough to be worth a re-ANALYZE.

Both variants are possible. The first time, the statistics can be taken
from some original (which can be empty). After ANALYZE, the statistics
can be individual.

Regards

Pavel
Re: [HACKERS] global temporary tables
On Sat, Apr 24, 2010 at 1:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> [ forgot to respond to this part ]
>
> Robert Haas robertmh...@gmail.com writes:
>> ... I don't see the problem with DROP. Under the proposed design,
>> it's approximately equivalent to dropping a table that someone else
>> has truncated. You just wait for the necessary lock and then do it.
>
> And do *what*? You can remove the catalog entries, but how are you
> going to make the physical storage of other backends' versions go
> away? (To say nothing of making them flush their local buffers for
> it.) If you do remove the catalog entries, won't you be cutting the
> knees out from under whatever end-of-session cleanup processing might
> exist in those other backends?

Well, if I knew for sure what the best way was to solve all of these
problems, I'd be posting a finished patch rather than a request for
comment on a design. It's not obvious to me that there's a terribly
thorny problem in the area you're concerned about, but your concern is
making me worry that I'm missing something. Why would the
end-of-session processing need the catalog entries? It seems like
whatever backend-local data structure we're using to record the
relfilenode mappings would be sufficient to nuke the backend storage,
and what else needs doing?

> The idea of the global table as a template that individual sessions
> clone working tables from would avoid most of these problems. You
> rejected it on the grounds that ALTER would be too hard; but if you're
> blowing off ALTER anyway, that argument seems pretty unimpressive.

I don't think that avoiding the catalog churn is something to be
dismissed lightly, but I also think that cloning the table is likely to
be significantly more difficult from an implementation point of view.
Under the implementation I'm proposing, we don't need much that is
fundamentally all that new. Global temporary tables can be treated like
our existing temp tables for purposes of XLog and bufmgr, but they
don't get forced into a temporary namespace. The relation mapping
infrastructure provides a pretty good start for using a relfilenode
that isn't stored in pg_class. I've already gone through the exercise
of finding all the places where we check rd_istemp and changing them to
use macros instead (RelationNeedsWAL, IsBackendLocal, etc.) and it's
not bad.

There's a related project which I think can also leverage much of this
same infrastructure: unlogged tables. We've talked about this before,
but in short the idea is that an unlogged table behaves like a regular
table in all respects except that we never write WAL for it, and we
truncate it at shutdown and at startup. Therefore, it doesn't show up
on standby servers, and its contents are not persistent across
restarts, but performance is improved. It's suitable for things like
the table of currently logged-in users, where you don't mind forcing
everyone to log in again if the database crashes. (It might even be
possible to allow writes to unlogged tables on standby servers, though
I'm not feeling that ambitious ATM.)

So:

- local temp tables exist in a temp namespace, use local buffers, and
  skip WAL
- global temp tables exist in a non-temp namespace, use local buffers,
  and skip WAL
- unlogged tables exist in a non-temp namespace, use shared buffers,
  and skip WAL
- normal tables exist in a non-temp namespace, use shared buffers, and
  write WAL

Thoughts?

...Robert
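In DDL terms, that taxonomy might look like the sketch below. Only the
plain and TEMPORARY forms exist today; the GLOBAL TEMPORARY semantics
and the UNLOGGED keyword are the proposals under discussion (GLOBAL
currently parses only as a noise word), and the table names are made
up:

CREATE TEMPORARY TABLE scratch (id int);              -- local temp: temp schema, local buffers, no WAL
CREATE GLOBAL TEMPORARY TABLE session_state (id int); -- proposed: permanent catalog entry, per-backend contents
CREATE UNLOGGED TABLE logged_in_users (id int);       -- proposed: shared buffers, no WAL, truncated at restart
CREATE TABLE accounts (id int);                       -- normal: shared buffers, WAL-logged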
Re: [HACKERS] global temporary tables
Robert Haas robertmh...@gmail.com writes:
> ... Why would the end-of-session processing need the catalog entries?
> It seems like whatever backend-local data structure we're using to
> record the relfilenode mappings would be sufficient to nuke the
> backend storage, and what else needs doing?

Well, if you're intending to have a separate data structure and code
path for cleaning up this type of temp table, then maybe you don't need
to touch any catalog entries. I'm concerned though about how far the
effects will propagate --- things like TRUNCATE, VACUUM FULL, REINDEX
will probably have issues with this. Right now they think in terms of
writing a new pg_class entry in order to reassociate tables with new
relfilenodes.

Have you thought much about the previously proposed design, ie keeping
catalog entries for temp tables in backend-local temporary catalogs?
That would certainly be a lot of work, but I think in the end it might
fit in better. This design feels like it's being driven by "hey, we can
abuse the relmapper to sort of do what we want", and not by what we
really want.

			regards, tom lane
Re: [HACKERS] global temporary tables
On Fri, 2010-04-23 at 22:52 -0400, Robert Haas wrote:
> Thoughts?

Only a requirement: that we design this in a way that will allow temp
tables to be used during Hot Standby. I make no other comment.

--
Simon Riggs           www.2ndQuadrant.com
Re: [HACKERS] Assertion failure twophase.c (3) (testing HS/SR)
On Fri, 2010-04-23 at 03:08 +0200, Erik Rijkers wrote:
> It's a very easy test; I will probably run it a few more times.

Please share details of your system and hardware.

--
Simon Riggs           www.2ndQuadrant.com
Re: [HACKERS] global temporary tables
On Sat, Apr 24, 2010 at 3:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Well, if you're intending to have a separate data structure and code
> path for cleaning up this type of temp table, then maybe you don't
> need to touch any catalog entries. I'm concerned though about how far
> the effects will propagate --- things like TRUNCATE, VACUUM FULL,
> REINDEX will probably have issues with this. Right now they think in
> terms of writing a new pg_class entry in order to reassociate tables
> with new relfilenodes.
>
> Have you thought much about the previously proposed design, ie keeping
> catalog entries for temp tables in backend-local temporary catalogs?
> That would certainly be a lot of work, but I think in the end it might
> fit in better. This design feels like it's being driven by "hey, we
> can abuse the relmapper to sort of do what we want", and not by what
> we really want.

Well, yes and no. I think there are definitely some good things that
can happen if we can see our way to taking a hammer to pg_class and
pg_attribute. If we create, e.g., pg_shared_class and
pg_shared_attribute, then we can un-nail the catalogs you just nailed
to make the authentication process able to work without selecting a
database. We can also enable (without guilt) clustering both those
catalogs and the database-specific versions of pg_class, since we no
longer have to worry about having multiple copies of the row that can
get out of sync with each other. And if we further break off
pg_temp_class and pg_temp_attribute, then we can also have our existing
flavor of temporary tables without worrying about catalog bloat, which
would be great. There may be other applications as well.

Having said all that, it doesn't actually allow us to implement global
temporary tables, because obviously the catalog entries for a global
temporary table have to be permanent. Of course, if we didn't have to
worry about catalog bloat, the clone approach you're proposing would be
somewhat more attractive, but I actually think that the synergy is in
the other direction: the perfect place to store the catalog entries and
statistics for local temporary tables is - in a global temporary table!

Note that while a local temporary table can never inherit from a
permanent table, it's entirely sensible to let global temporary tables
inherit from permanent tables. Different backends will have different
views of the overall contents of the parent table, but that's OK, even
desirable.

...Robert
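A sketch of the inheritance point above, with made-up table names.
CREATE TABLE ... INHERITS is existing syntax; the GLOBAL TEMPORARY
semantics (permanent catalog entry, per-backend contents, permitted to
inherit from a permanent parent) are the proposal, not current
behavior:

CREATE TABLE audit_log (ts timestamptz, msg text);

-- Proposed: each backend would see audit_log's rows plus only its own
-- rows in session_audit_log.
CREATE GLOBAL TEMPORARY TABLE session_audit_log () INHERITS (audit_log);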
Re: [HACKERS] global temporary tables
On Sat, Apr 24, 2010 at 6:29 PM, Simon Riggs si...@2ndquadrant.com wrote:
> On Fri, 2010-04-23 at 22:52 -0400, Robert Haas wrote:
>> Thoughts?
>
> Only a requirement: that we design this in a way that will allow temp
> tables to be used during Hot Standby. I make no other comment.

For so long as local temporary tables put their catalog entries in
pg_class, we're not going to be able to use them during Hot Standby.
See the email I just sent elsewhere on this thread for a long-term
roadmap to getting out of that pickle. At least under the
implementation I'm proposing here, making global temporary tables
usable would be an easier nut to crack, because the catalog entries are
a non-issue. There is one major problem, though: assigning a scratch
relfilenode to the temporary table requires generating an OID, which we
currently have no way to allow on the standby.

Upthread I also proposed an implementation for unlogged tables (that
is, contents don't survive a server bounce) which wouldn't have that
problem either, although I haven't fully thought it through and there
may very well be other issues.

So in short: I don't think anything we're talking about would make HS
use harder, and some of it might make it easier. But probably some
additional engineering effort dedicated specifically to solving the
problems unique to HS would still be needed.

...Robert
Re: [HACKERS] [GENERAL] trouble with to_char('L')
Takahiro Itagaki wrote:
> Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp wrote:
>> Revised patch attached. Please test it.
>
> I applied this version of the patch. Please check whether the bug is
> fixed, and watch for any buildfarm failures.

Great. I have merged my C comments into the code with the attached
patch, so we remember why the code is set up as it is.

One thing I am confused about is that, for Win32, our numeric/monetary
handling sets lc_ctype to match numeric/monetary, while our time code
in the same file uses that method _and_ uses wcsftime() to return the
value in wide characters. So, why do we do both for time? Is there any
value to that? Seems we should do the same for both numeric/monetary
and time.

--
Bruce Momjian br...@momjian.us    http://momjian.us
EnterpriseDB                      http://enterprisedb.com

Index: src/backend/utils/adt/pg_locale.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/pg_locale.c,v
retrieving revision 1.54
diff -c -c -r1.54 pg_locale.c
*** src/backend/utils/adt/pg_locale.c	22 Apr 2010 01:55:52 -0000	1.54
--- src/backend/utils/adt/pg_locale.c	24 Apr 2010 22:43:53 -0000
***************
*** 41,46 ****
--- 41,50 ----
   * DOES NOT WORK RELIABLY: on some platforms the second setlocale() call
   * will change the memory save is pointing at.  To do this sort of thing
   * safely, you *must* pstrdup what setlocale returns the first time.
+  *
+  * FYI, The Open Group locale standard is defined here:
+  *
+  *	http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html
   *----------
   */
***************
*** 424,430 ****
  	char	   *grouping;
  	char	   *thousands_sep;
  	int			encoding;
- 
  #ifdef WIN32
  	char	   *save_lc_ctype;
  #endif
--- 428,433 ----
***************
*** 435,459 ****
  		free_struct_lconv(CurrentLocaleConv);
  
! 	/* Set user's values of monetary and numeric locales */
  	save_lc_monetary = setlocale(LC_MONETARY, NULL);
  	if (save_lc_monetary)
  		save_lc_monetary = pstrdup(save_lc_monetary);
  	save_lc_numeric = setlocale(LC_NUMERIC, NULL);
  	if (save_lc_numeric)
  		save_lc_numeric = pstrdup(save_lc_numeric);
  
  #ifdef WIN32
! 	/* set user's value of ctype locale */
  	save_lc_ctype = setlocale(LC_CTYPE, NULL);
  	if (save_lc_ctype)
  		save_lc_ctype = pstrdup(save_lc_ctype);
- #endif
  
! 	/* Get formatting information for numeric */
! #ifdef WIN32
  	setlocale(LC_CTYPE, locale_numeric);
  #endif
  	setlocale(LC_NUMERIC, locale_numeric);
  	extlconv = localeconv();
  	encoding = pg_get_encoding_from_locale(locale_numeric);
--- 438,485 ----
  		free_struct_lconv(CurrentLocaleConv);
  
! 	/* Save user's values of monetary and numeric locales */
  	save_lc_monetary = setlocale(LC_MONETARY, NULL);
  	if (save_lc_monetary)
  		save_lc_monetary = pstrdup(save_lc_monetary);
+ 
  	save_lc_numeric = setlocale(LC_NUMERIC, NULL);
  	if (save_lc_numeric)
  		save_lc_numeric = pstrdup(save_lc_numeric);
  
  #ifdef WIN32
! 	/*
! 	 * Ideally, monetary and numeric local symbols could be returned in
! 	 * any server encoding.  Unfortunately, the WIN32 API does not allow
! 	 * setlocale() to return values in a codepage/CTYPE that uses more
! 	 * than two bytes per character, like UTF-8:
! 	 *
! 	 *	http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
! 	 *
! 	 * Evidently, LC_CTYPE allows us to control the encoding used for
! 	 * strings returned by localeconv().  The Open Group standard,
! 	 * mentioned at the top of this C file, doesn't explicitly state
! 	 * this.
! 	 *
! 	 * Therefore, we set LC_CTYPE to match LC_NUMERIC or LC_MONETARY
! 	 * (which cannot be UTF8), call localeconv(), and then convert from
! 	 * the numeric/monetary LC_CTYPE to the server encoding.  One
! 	 * example use of this is for the Euro symbol.
! 	 *
! 	 * Perhaps someday we will use GetLocaleInfoW() which returns values
! 	 * in UTF16 and convert from that.
! 	 */
! 
! 	/* save user's value of ctype locale */
  	save_lc_ctype = setlocale(LC_CTYPE, NULL);
  	if (save_lc_ctype)
  		save_lc_ctype = pstrdup(save_lc_ctype);
  
! 	/* use numeric to set the ctype */
  	setlocale(LC_CTYPE, locale_numeric);
  #endif
+ 
+ 	/* Get formatting information for numeric */
  	setlocale(LC_NUMERIC, locale_numeric);
  	extlconv = localeconv();
  	encoding = pg_get_encoding_from_locale(locale_numeric);
***************
*** 462,471 ****
  	thousands_sep = db_encoding_strdup(encoding, extlconv->thousands_sep);
  	grouping = strdup(extlconv->grouping);
  
- 	/* Get formatting information for monetary */
  #ifdef WIN32
  	setlocale(LC_CTYPE, locale_monetary);
  #endif
  	setlocale(LC_MONETARY, locale_monetary);
  	extlconv = localeconv();
  	encoding = pg_get_encoding_from_locale(locale_monetary);
--- 488,499 ----
  	thousands_sep = db_encoding_strdup(encoding, extlconv->thousands_sep);
  	grouping = strdup(extlconv->grouping);
  
  #ifdef WIN32
+ 	/* use monetary to set the ctype */
Re: [HACKERS] [RFC] nodeToString format and exporting the SQL parser
Jehan-Guillaume (ioguix) de Rorthais wrote:
> A simple example of a tokenizer is the php one:
> http://fr.php.net/token_get_all
>
> And here is a basic example which returns pseudo rows:
>
> = TOKENIZE $script$
> SELECT 1;
> UPDATE test SET a=2;
> $script$;
>
>  type        | pos | value    | line
> -------------+-----+----------+------
>  SQL_COMMAND |   1 | 'SELECT' |    1
>  CONSTANT    |   8 | '1'      |    1
>  DELIMITER   |   9 | ';'      |    1
>  SQL_COMMAND |  11 | 'UPDATE' |    2
>  IDENTIFIER  |  18 | 'test'   |    2
>  SQL_KEYWORD |  23 | 'SET'    |    2
>  IDENTIFIER  |  27 | 'a'      |    2
>  OPERATOR    |  30 | '='      |    2
>  CONSTANT    |  31 | '2'      |    2

Sounds useful to me, though as a function, as suggested in a later
email.

--
Bruce Momjian br...@momjian.us    http://momjian.us
EnterpriseDB                      http://enterprisedb.com
Re: [HACKERS] [RFC] nodeToString format and exporting the SQL parser
On Sat, Apr 24, 2010 at 8:07 PM, Bruce Momjian br...@momjian.us wrote:
> Jehan-Guillaume (ioguix) de Rorthais wrote:
>> A simple example of a tokenizer is the php one:
>> http://fr.php.net/token_get_all
>>
>> And here is a basic example which returns pseudo rows:
>>
>> = TOKENIZE $script$
>> SELECT 1;
>> UPDATE test SET a=2;
>> $script$;
>>
>>  type        | pos | value    | line
>> -------------+-----+----------+------
>>  SQL_COMMAND |   1 | 'SELECT' |    1
>>  CONSTANT    |   8 | '1'      |    1
>>  DELIMITER   |   9 | ';'      |    1
>>  SQL_COMMAND |  11 | 'UPDATE' |    2
>>  IDENTIFIER  |  18 | 'test'   |    2
>>  SQL_KEYWORD |  23 | 'SET'    |    2
>>  IDENTIFIER  |  27 | 'a'      |    2
>>  OPERATOR    |  30 | '='      |    2
>>  CONSTANT    |  31 | '2'      |    2
>
> Sounds useful to me, though as a function, as suggested in a later
> email.

If tool-builders think this is useful, I have no problem with making it
available. It should be suitably disclaimed: "We reserve the right to
rip out the entire flex/yacc-based lexer and parser at any time and
replace them with a hand-coded system written in Prolog that emits
tokenization information only in ASN.1-encoded pig latin. If massive
changes in the way this function works - or its complete disappearance
- are going to make you grumpy, don't call it."

But having said that, assuming there is a real use case for this, I
think it's better to let people get at it rather than forcing them to
roll their own. Because frankly, if we do rip out the whole thing, then
people are going to have to adjust their stuff anyway, regardless of
whether they're using some API we provide or something they've cooked
up from scratch. And in practice, most changes on our end are likely to
be incremental, though, again, we're not guaranteeing that in any way.

...Robert
Re: [HACKERS] global temporary tables
On Apr 24, 2010, at 12:31 PM, Tom Lane wrote:
> Robert Haas robertmh...@gmail.com writes:
>> At least AIUI, the use case for this feature is that you want to
>> avoid creating the same temporary table over and over again.
>
> The context that I've seen it come up in is that people don't want to
> clutter their functions with create-it-if-it-doesn't-exist logic,
> which you have to have given the current behavior of temp tables. Any
> performance gain from reduced catalog churn would be gravy.
>
> Aside from the DROP problem, I think this implementation proposal has
> one other big shortcoming: what are you going to do about table
> statistics? In many cases, you really *have* to do an ANALYZE once
> you've populated a temp table, if you want to get decent plans for it.
> Where will you put those stats?

One possibility: rename the existing pg_stats to pg_stats_permanent.
Create a global temporary table called pg_stats_temporary. pg_stats
becomes a union of the two. I know the backend wouldn't be able to use
the view, but hopefully access to statistics goes through a limited set
of functions, so that teaching them about the two different tables
isn't hard.

As for cleanup and storage questions: what about having temp objects
live in pgsql_tmp? I'm thinking we create a directory under pgsql_tmp
for a backend PID the first time it creates a temp object (global or
local), and create the files in there. That also means that we don't
have to come up with different relfilenodes for each backend. On the
other hand, some layer (presumably smgr) would need to understand
whether a relation was temporary or not.

If we do that, cleanup is easy: you can remove any directories that no
longer have a running PID. For forensics you probably only want to do
that automatically when a backend starts and discovers it already has a
directory, though we should also provide an administrator function that
will clobber all directories that no longer have backends.

--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell)                http://jim.nasby.net
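A sketch of the view Jim proposes; pg_stats_permanent and
pg_stats_temporary are names from his proposal, not existing catalogs:

CREATE VIEW pg_stats AS
    SELECT * FROM pg_stats_permanent
    UNION ALL
    SELECT * FROM pg_stats_temporary;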
Re: [HACKERS] [RFC] nodeToString format and exporting the SQL parser
On Fri, Apr 2, 2010 at 3:53 PM, Michael Tharp g...@partiallystapled.com wrote:
> Most Esteemed Hackers:
>
> Due to popular demand on #postgresql (by which I mean David Fetter), I
> have been spending a little time making the internal SQL parser
> available to clients via a C-language SQL function. The function
> itself is extremely simple: just a wrapper around a call to raw_parser
> followed by nodeToString.

Seems reasonable.

> Most of the hard stuff has been in parsing the output of nodeToString
> on the client side. So, I have a few questions to help gauge interest
> in related patches:
>
> Is there interest in a patch to extend nodes/outfuncs.c with support
> for serializing more node types? Coverage has been pretty good so far,
> but various utility statements and their related nodes are missing,
> e.g. AlterTableStmt and GrantStmt. I expect that this will be the
> least contentious suggestion.

This wouldn't bother me provided the code footprint is small. I would
be against adding a lot of complexity for this.

> The nodeToString format as it stands is somewhat ambiguous with
> respect to the type of a node member's value if one does not have
> access to readfuncs.c. For example, a T_BitString called foo is
> serialized as ':foo b1010', while a char * containing 'b1010' is also
> serialized as ':foo b1010'. This may just mean that _outToken needs to
> escape the leading 'b'. A similar problem exists for booleans ('true'
> as a string vs. as a boolean).

I am not inclined to change this. Turning the format into something
self-describing seems to me to be significant work and a significant
compatibility break for a very small amount of benefit.

> Additionally, values may span more than one token for certain types,
> e.g. Datum (:constvalue 4 [ 16 0 0 0 ]). Plan trees have a few types
> that don't have a corresponding read function and output an array of
> space-separated integers. PlanInvalItem even seems to use a format
> containing parentheses, which the tokenizer splits as if it were a
> list. While most of these only occur in plan nodes and thus don't
> affect my use case (Datum being the exception), it would be ideal if
> they could be parsed more straightforwardly.

I'm not inclined to change this, either.

> These last two problems perhaps can be worked around by escaping more
> things in _outToken, but maybe it would be smarter to make the fields
> self-descriptive in terms of type. For example, the field names could
> be prefixed with a short string describing the type, which in most
> cases would be a single character, e.g. 's:schemaname' for a char*,
> 'b:true' for a bool, 'n:...' for any node (including Value nodes), or
> longer strings for less commonly used types like the integer arrays in
> plan nodes (although these would probably be better as a real integer
> list). These could be used to unambiguously parse individual tokens
> and also to determine how many or what kind of tokens to expect for
> multi-token values such as Datum, which would otherwise require
> guessing. Does this seem reasonable? Is there another format that
> might make more sense?

This seems ugly to me and I don't see the utility of it.

> As far as I can tell, the current parser in nodes/read.c ignores the
> field names entirely, so this can be done without changing postgres'
> own parsing code at all and without affecting backwards compatibility
> of any stored trees.
>
> Does anyone else out there use nodeToString() output in their own
> tools, and if so, does this make your life easier or harder?
>
> Lastly, I'll leave a link to my WIP implementation in case anyone is
> interested: http://bitbucket.org/gxti/parse_sql/src/
>
> Currently I'm working on adding support for cooked parse trees and
> figuring out what, if anything, I need to do to support multibyte
> encodings. My personal use is for parsing DDL, so the input is
> decidedly not hostile, but I'd still like to make this a generally
> useful module.
>
> Thanks in advance for any comments, tips, or flames sent my way.

Thanks for having a thick skin. :-)

I'm having a hard time imagining what you could use this for without
encoding a lot of information about the meaning of particular
constructs, in which case the self-describing stuff is not needed. As
you point out downthread, if all you want to do is compare, it's not
needed either.

...Robert
Re: [HACKERS] [RFC] nodeToString format and exporting the SQL parser
Added to TODO: Allow the parser output to be seen by clients This is useful for external tools. * http://archives.postgresql.org/pgsql-hackers/2010-04/msg00095.php --- Robert Haas wrote: On Fri, Apr 2, 2010 at 3:53 PM, Michael Tharp g...@partiallystapled.com wrote: Most Esteemed Hackers: Due to popular demand on #postgresql (by which I mean David Fetter), I have been spending a little time making the internal SQL parser available to clients via a C-language SQL function. The function itself is extremely simple: just a wrapper around a call to raw_parser followed by nodeToString. Seems reasonable. Most of the hard stuff has been in parsing the output of nodeToString on the client side. So, I have a few questions to help gauge interest in related patches: Is there interest in a patch to extend nodes/outfuncs.c with support for serializing more node types? Coverage has been pretty good so far but various utility statements and their related nodes are missing, e.g. AlterTableStmt and GrantStmt. I expect that this will be the least contentious suggestion. This wouldn't bother me provided the code footprint is small. I would be against adding a lot of complexity for this. The nodeToString format as it stands is somewhat ambiguous with respect to the type of a node member's value if one does not have access to readfuncs.c. For example, a T_BitString called foo is serialized as ':foo b1010' while a char * containing 'b1010' is also serialized as ':foo b1010'. This may just mean that _outToken needs to escape the leading 'b'. A similar problem exists for booleans ('true' as a string vs. as a boolean). I am not inclined to change this. Turning the format into something self-describing seems to me to be significant work and a significant compatibility break for a very small amount of benefit. Additionally, values may span more than one token for certain types e.g. Datum (:constvalue 4 [ 16 0 0 0 ]). Plan trees have a few types that don't have a corresponding read function and output an array of space-separated integers. PlanInvalItem even seems to use a format containing parentheses, which the tokenizer splits as if it were a list. While most of these only occur in plan nodes and thus don't affect my use case (Datum being the exception), it would be ideal if they could be parsed more straightforwardly. I'm not inclined to change this, either. These last two problems perhaps can be worked around by escaping more things in _outToken, but maybe it would be smarter to make the fields self-descriptive in terms of type. For example, the field names could be prefixed with a short string describing its type, which in most cases would be a single character, e.g. 's:schemaname' for a char*, 'b:true' for a bool, 'n:...' for any node (including Value nodes), or longer strings for less commonly used types like the integer arrays in plan nodes (although these would probably be better as a real integer list). These could be used to unambiguously parse individual tokens and also to determine how many or what kind of token to expect for multi-token values such as Datum which would otherwise require guessing. Does this seem reasonable? Is there another format that might make more sense? This seems ugly to me and I don't see the utility of it. As far as I can tell, the current parser in nodes/read.c ignores the field names entirely, so this can be done without changing postgres' own parsing code at all and without affecting backwards compatibility of any stored trees. 
>> Does anyone else out there use nodeToString() output in their own
>> tools, and if so, does this make your life easier or harder?
>>
>> Lastly, I'll leave a link to my WIP implementation in case anyone is
>> interested: http://bitbucket.org/gxti/parse_sql/src/
>>
>> Currently I'm working on adding support for cooked parse trees and
>> figuring out what, if anything, I need to do to support multibyte
>> encodings. My personal use is for parsing DDL so the input is
>> decidedly not hostile, but I'd still like to make this a generally
>> useful module.
>>
>> Thanks in advance for any comments, tips, or flames sent my way.
>
> Thanks for having a thick skin. :-)
>
> I'm having a hard time imagining what you could use this for without
> encoding a lot of information about the meaning of particular
> constructs. In which case the self-describing stuff is not needed. As
> you point out downthread, if all you want to do is compare, it's not
> needed either.
>
> ...Robert

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB   http://enterprisedb.com
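To make the ambiguity Michael describes concrete, here is a minimal
standalone C sketch (not PostgreSQL source; the 'x:' type tag is
invented purely for illustration) contrasting today's output with his
proposed type-prefixed field names:

    #include <stdio.h>

    int
    main(void)
    {
        /* Current format: a T_BitString and a plain char * serialize
         * identically, so a reader cannot tell them apart. */
        printf(":foo b1010\n");     /* T_BitString b'1010' */
        printf(":foo b1010\n");     /* char * containing "b1010" */

        /* Proposed format: a short type prefix on the field name
         * makes each token self-describing. */
        printf("x:foo b1010\n");    /* hypothetical bit-string tag */
        printf("s:foo b1010\n");    /* 's' = char *, per the proposal */
        return 0;
    }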
Re: [HACKERS] [RFC] nodeToString format and exporting the SQL parser
Robert Haas robertmh...@gmail.com writes:
> On Sat, Apr 24, 2010 at 8:07 PM, Bruce Momjian br...@momjian.us wrote:
>> Sounds useful to me, though as a function like suggested in a later email.
>
> If tool-builders think this is useful, I have no problem with making
> it available. It should be suitably disclaimed: We reserve the right
> to rip out the entire flex/yacc-based lexer and parser at any time and
> replace them with a hand-coded system written in Prolog that emits
> tokenization information only in ASN.1-encoded pig latin. If massive
> changes in the way this function works - or its complete disappearance
> - are going to make you grumpy, don't call it.

I'm a bit concerned with the vagueness of the goals here. We started
with a request to dump out node trees, ie, post-parsing representation;
but the example use case of syntax highlighting would find that
representation quite useless. (Example: foo::bar and CAST(foo AS bar)
yield the same parse tree.) A syntax highlighter might get some use out
of the lexer-output token stream, but I'm afraid from the proposed
output that people might be expecting more semantic information than
the lexer can provide. The lexer doesn't, for example, have any clue
that some keywords are commands and others aren't; nor any very clear
understanding about the semantic difference between the tokens '=' and
';'.

Also, if all you want is the lexer, it's not that hard to steal psql's
version and adapt it to your purposes. The lexer doesn't change very
fast, and it's not that big either.

Anyway, it certainly wouldn't be hard for an add-on module to provide a
SRF that calls the lexer (or parser) and returns some sort of tabular
representation of the results. I'm just not sure how useful it'll be in
the real world.

			regards, tom lane
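A function of the sort being discussed can indeed be quite small. Here
is a minimal sketch, assuming the 2010-era backend API (raw_parser()
taking the query string and returning the raw parse tree as a List).
The function name parse_sql_to_string is invented; this illustrates the
approach only, and is not Michael's actual module:

    #include "postgres.h"
    #include "fmgr.h"
    #include "nodes/pg_list.h"
    #include "parser/parser.h"
    #include "utils/builtins.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(parse_sql_to_string);

    /* Parse a query string and return the serialized raw parse tree. */
    Datum
    parse_sql_to_string(PG_FUNCTION_ARGS)
    {
        char *query = text_to_cstring(PG_GETARG_TEXT_PP(0));
        List *tree = raw_parser(query);   /* raw parse, no analysis */
        char *out = nodeToString(tree);   /* serialize via outfuncs.c */

        PG_RETURN_TEXT_P(cstring_to_text(out));
    }

A matching CREATE FUNCTION declaration (taking text, returning text,
LANGUAGE C STRICT) would expose it at the SQL level.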
Re: [HACKERS] [RFC] nodeToString format and exporting the SQL parser
On 04/24/2010 08:49 PM, Robert Haas wrote:
>> The nodeToString format as it stands is somewhat ambiguous with
>> respect to the type of a node member's value if one does not have
>> access to readfuncs.c. For example, a T_BitString called foo is
>> serialized as ':foo b1010' while a char * containing 'b1010' is also
>> serialized as ':foo b1010'. This may just mean that _outToken needs
>> to escape the leading 'b'. A similar problem exists for booleans
>> ('true' as a string vs. as a boolean).
>
> I am not inclined to change this. Turning the format into something
> self-describing seems to me to be significant work and a significant
> compatibility break for a very small amount of benefit.

The funny thing is, it doesn't seem to be a compatibility break because
the code in readfuncs.c that parses the node strings ignores the field
names entirely because it assumes they are in a particular order. It
also isn't much work to change the output because the code is, with the
exception of a few weirdos, all at the top of outfuncs.c, and the
weirdos are also dispersed within that file.

However, I'm no longer convinced that using a serialized node tree is
the way to go for my use case, nor am I particularly sure that it even
matches my use case at all anymore as I keep simplifying the goals as
time goes on. I won't be able to make any compelling arguments until I
figure out what I need :-)

Thanks for the feedback.

-- m. tharp
Re: [HACKERS] global temporary tables
On Sat, Apr 24, 2010 at 8:47 PM, Jim Nasby deci...@decibel.org wrote:
> On Apr 24, 2010, at 12:31 PM, Tom Lane wrote:
>> Robert Haas robertmh...@gmail.com writes:
>>> At least AIUI, the use case for this feature is that you want to
>>> avoid creating the same temporary table over and over again. The
>>> context that I've seen it come up in is that people don't want to
>>> clutter their functions with create-it-if-it-doesn't-exist logic,
>>> which you have to have given the current behavior of temp tables.
>>> Any performance gain from reduced catalog churn would be gravy.
>>
>> Aside from the DROP problem, I think this implementation proposal has
>> one other big shortcoming: what are you going to do about table
>> statistics? In many cases, you really *have* to do an ANALYZE once
>> you've populated a temp table, if you want to get decent plans for
>> it. Where will you put those stats?
>
> One possibility: rename the existing pg_stats to pg_stats_permanent.
> Create a global temporary table called pg_stats_temporary. pg_stats
> becomes a union of the two. I know the backend wouldn't be able to use
> the view, but hopefully access to statistics goes through a limited
> set of functions so that teaching them about the two different tables
> isn't hard.

Yeah, I don't think that would be too horrible. Part of me feels like
you'd want to have the ability to store stats for a global temp table
in either one of those tables depending on use-case, but I'm also
reluctant to invent a lot of new syntax for a very limited use case.

> As for cleanup and storage questions: what about having temp objects
> live in pgsql_tmp? I'm thinking create a directory under pgsql_tmp for
> a backend PID the first time it creates a temp object (global or
> local) and create the files in there. That also means that we don't
> have to come up with different relfilenodes for each backend.

That would impose a couple of implementation restrictions that don't
seem necessary. One, it would imply ignoring reltablespace. Two, it
would prohibit (or at least complicate) allowing a backend to CLUSTER
or REINDEX its own private copy of the rel. On the other hand, some
layer (presumably smgr) would need to understand whether a relation was
temporary or not.

> If we do that, cleanup is easy: you can remove any directories that no
> longer have a running PID. For forensics you probably only want to do
> that automatically when a backend starts and discovers it already has
> a directory, though we should also provide an administrator function
> that will clobber all directories that no longer have backends.

Unfortunately, I don't see much alternative to making smgr know
something about the temp-ness of the relation, though I'm hoping to
keep the smgr surgery to an absolute minimum. Maybe what we could do is
incorporate the backend ID or PID into the file name when the relation
is temp. Then we could scan for and nuke such files pretty easily.
Otherwise I can't really think how to make it work.

...Robert
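The scan-and-nuke cleanup this would enable could be as simple as the
following rough sketch, which assumes (purely for illustration) that
temp relation files are named "t<pid>_<relfilenode>"; neither that
naming scheme nor these function names come from an actual patch:

    #include <dirent.h>
    #include <errno.h>
    #include <signal.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* True if a process with the given PID still exists. */
    static bool
    pid_is_running(pid_t pid)
    {
        if (kill(pid, 0) == 0)
            return true;
        return errno == EPERM;      /* exists, but not ours to signal */
    }

    /* Remove temp-relation files whose owning backend is gone. */
    static void
    nuke_orphaned_temp_rels(const char *dirpath)
    {
        DIR *dir = opendir(dirpath);
        struct dirent *de;

        if (dir == NULL)
            return;

        while ((de = readdir(dir)) != NULL)
        {
            long pid;
            unsigned long relfilenode;
            char path[1024];

            /* Only files matching the assumed temp-file pattern are
             * candidates, so a permanent relation's file can never be
             * zapped by mistake. */
            if (sscanf(de->d_name, "t%ld_%lu", &pid, &relfilenode) != 2)
                continue;

            if (pid_is_running((pid_t) pid))
                continue;

            snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
            unlink(path);
        }
        closedir(dir);
    }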
Re: [HACKERS] [RFC] nodeToString format and exporting the SQL parser
On Sat, Apr 24, 2010 at 9:08 PM, Michael Tharp g...@partiallystapled.com wrote:
> The funny thing is, it doesn't seem to be a compatibility break
> because the code in readfuncs.c that parses the node strings ignores
> the field names entirely because it assumes they are in a particular
> order. It also isn't much work to change the output because the code
> is, with the exception of a few weirdos, all at the top of outfuncs.c,
> and the weirdos are also dispersed within that file.

Yeah, I think that's basically all true.

> However, I'm no longer convinced that using a serialized node tree is
> the way to go for my use case, nor am I particularly sure that it even
> matches my use case at all anymore as I keep simplifying the goals as
> time goes on. I won't be able to make any compelling arguments until I
> figure out what I need :-)

However this is, as you say, a fairly damning point. :-)

Bruce, I think we should not have a TODO for this until the OP (or
someone) can address this issue a bit better.

...Robert
Re: [HACKERS] [RFC] nodeToString format and exporting the SQL parser
Robert Haas wrote:
> On Sat, Apr 24, 2010 at 9:08 PM, Michael Tharp g...@partiallystapled.com wrote:
>> The funny thing is, it doesn't seem to be a compatibility break
>> because the code in readfuncs.c that parses the node strings ignores
>> the field names entirely because it assumes they are in a particular
>> order. It also isn't much work to change the output because the code
>> is, with the exception of a few weirdos, all at the top of
>> outfuncs.c, and the weirdos are also dispersed within that file.
>
> Yeah, I think that's basically all true.
>
>> However, I'm no longer convinced that using a serialized node tree is
>> the way to go for my use case, nor am I particularly sure that it
>> even matches my use case at all anymore as I keep simplifying the
>> goals as time goes on. I won't be able to make any compelling
>> arguments until I figure out what I need :-)
>
> However this is, as you say, a fairly damning point. :-)
>
> Bruce, I think we should not have a TODO for this until the OP (or
> someone) can address this issue a bit better.

OK, removed.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB   http://enterprisedb.com
Re: [HACKERS] global temporary tables
Robert Haas robertmh...@gmail.com writes:
> Unfortunately, I don't see much alternative to making smgr know
> something about the temp-ness of the relation, though I'm hoping to
> keep the smgr surgery to an absolute minimum. Maybe what we could do
> is incorporate the backend ID or PID into the file name when the
> relation is temp. Then we could scan for and nuke such files pretty
> easily. Otherwise I can't really think how to make it work.

I think that could be a really good idea, mainly because it makes
post-crash cleanup MUCH safer: you can tell with certainty from the
filename that it's a leftover temp table. The notion of zapping files
just because we don't see them listed in pg_class has always scared the
heck out of me.

We already know temp-ness at pretty low levels, like bufmgr vs
localbuf. Pushing it all the way down to smgr doesn't seem like a leap;
in fact I think it would eliminate a separate isTemp parameter in a lot
of places.

			regards, tom lane
Re: [HACKERS] global temporary tables
On Apr 24, 2010, at 8:14 PM, Robert Haas wrote:
>> One possibility: rename the existing pg_stats to pg_stats_permanent.
>> Create a global temporary table called pg_stats_temporary. pg_stats
>> becomes a union of the two. I know the backend wouldn't be able to
>> use the view, but hopefully access to statistics goes through a
>> limited set of functions so that teaching them about the two
>> different tables isn't hard.
>
> Yeah, I don't think that would be too horrible. Part of me feels like
> you'd want to have the ability to store stats for a global temp table
> in either one of those tables depending on use-case, but I'm also
> reluctant to invent a lot of new syntax for a very limited use case.

Yeah, I'm thinking that's very probably overkill. And if we were going
to go to that level, I think it would be far more useful to provide an
interface to allow manual control over statistics first, so that you
can give the optimizer custom information.

>> As for cleanup and storage questions: what about having temp objects
>> live in pgsql_tmp? I'm thinking create a directory under pgsql_tmp
>> for a backend PID the first time it creates a temp object (global or
>> local) and create the files in there. That also means that we don't
>> have to come up with different relfilenodes for each backend.
>
> That would impose a couple of implementation restrictions that don't
> seem necessary. One, it would imply ignoring reltablespace. Two, it
> would prohibit (or at least complicate) allowing a backend to CLUSTER
> or REINDEX its own private copy of the rel.

Well, the same structure could be imposed underneath a temp tablespace.
I don't think it matters where you ultimately put it; the goal is just
to make sure you can definitively tell (a) that a file was temporary
and (b) what PID it belonged to. That allows for safe cleanup.

-- 
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell)                http://jim.nasby.net
Re: [HACKERS] global temporary tables
On Sat, Apr 24, 2010 at 9:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> Unfortunately, I don't see much alternative to making smgr know
>> something about the temp-ness of the relation, though I'm hoping to
>> keep the smgr surgery to an absolute minimum. Maybe what we could do
>> is incorporate the backend ID or PID into the file name when the
>> relation is temp. Then we could scan for and nuke such files pretty
>> easily. Otherwise I can't really think how to make it work.
>
> I think that could be a really good idea, mainly because it makes
> post-crash cleanup MUCH safer: you can tell with certainty from the
> filename that it's a leftover temp table. The notion of zapping files
> just because we don't see them listed in pg_class has always scared
> the heck out of me.
>
> We already know temp-ness at pretty low levels, like bufmgr vs
> localbuf. Pushing it all the way down to smgr doesn't seem like a
> leap; in fact I think it would eliminate a separate isTemp parameter
> in a lot of places.

Eh? I don't see how it's going to do that.

...Robert
Re: [HACKERS] global temporary tables
On Sat, Apr 24, 2010 at 9:59 PM, Robert Haas robertmh...@gmail.com wrote:
> On Sat, Apr 24, 2010 at 9:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Robert Haas robertmh...@gmail.com writes:
>>> Unfortunately, I don't see much alternative to making smgr know
>>> something about the temp-ness of the relation, though I'm hoping to
>>> keep the smgr surgery to an absolute minimum. Maybe what we could do
>>> is incorporate the backend ID or PID into the file name when the
>>> relation is temp. Then we could scan for and nuke such files pretty
>>> easily. Otherwise I can't really think how to make it work.
>>
>> I think that could be a really good idea, mainly because it makes
>> post-crash cleanup MUCH safer: you can tell with certainty from the
>> filename that it's a leftover temp table. The notion of zapping files
>> just because we don't see them listed in pg_class has always scared
>> the heck out of me.
>>
>> We already know temp-ness at pretty low levels, like bufmgr vs
>> localbuf. Pushing it all the way down to smgr doesn't seem like a
>> leap; in fact I think it would eliminate a separate isTemp parameter
>> in a lot of places.
>
> Eh? I don't see how it's going to do that.

Oh, maybe I do see. If we pass it to smgropen() and stash it in the
SMgrRelation, we don't have to keep supplying it later on, maybe? Will
investigate further.

...Robert
Re: [HACKERS] global temporary tables
Robert Haas robertmh...@gmail.com writes:
> Oh, maybe I do see. If we pass it to smgropen() and stash it in the
> SMgrRelation, we don't have to keep supplying it later on, maybe?

Right. I'm unsure whether we should push it into the RelFileNode struct
itself, but even having it in SMgrRelation ought to cut out a few
places where it now has to be passed separately.

			regards, tom lane
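The shape being converged on here can be sketched in a few lines; the
field name smgr_istemp, the extra smgropen() parameter, and the helper
below are all assumptions for illustration, not the actual smgr code:

    #include <stdbool.h>
    #include <stdlib.h>

    typedef unsigned int Oid;

    /* Physical relation identity (the real struct is three OIDs). */
    typedef struct RelFileNode
    {
        Oid spcNode;    /* tablespace */
        Oid dbNode;     /* database */
        Oid relNode;    /* relation */
    } RelFileNode;

    typedef struct SMgrRelationData
    {
        RelFileNode smgr_rnode;
        bool        smgr_istemp;    /* remembered once, at open time */
    } SMgrRelationData;

    typedef SMgrRelationData *SMgrRelation;

    /* Hypothetical smgropen(): temp-ness is supplied exactly once... */
    static SMgrRelation
    smgropen_sketch(RelFileNode rnode, bool istemp)
    {
        SMgrRelation reln = malloc(sizeof(SMgrRelationData));

        reln->smgr_rnode = rnode;
        reln->smgr_istemp = istemp;
        return reln;
    }

    /* ...so later operations need no separate isTemp argument. */
    static bool
    smgr_needs_wal(SMgrRelation reln)
    {
        return !reln->smgr_istemp;  /* temp relations skip WAL */
    }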
Re: [HACKERS] Order of pg_stat_activity timestamp columns
Tom Lane wrote:
> Kevin Grittner kevin.gritt...@wicourts.gov writes:
>> Tom Lane t...@sss.pgh.pa.us wrote:
>>> The current column ordering can be rationalized to some extent as
>>> 1. identity info (user id, db id, application name)
>>> 2. current query info
>>> 3. session info (backend start time, client addr/port)
>>
>> OK. I guess that trumps my idea, although it would sure be nice if it
>> were possible to swap 2 and 3 so that we could put the query text at
>> the end.
>
> Well, the current ordering is definitely historical rather than
> designed, but I'm hesitant to do more than minor tweaking. Even if we
> think/hope it won't break applications, people are probably used to
> seeing a particular ordering. I'm not necessarily dead set against it
> though.
>
> I guess if we were to do what you suggest, we'd end up with
>
> identity:
>  datid            | oid
>  datname          | name
>  procpid          | integer
>  usesysid         | oid
>  usename          | name
>  application_name | text
> session:
>  client_addr      | inet
>  client_port      | integer
>  backend_start    | timestamp with time zone
> transaction:
>  xact_start       | timestamp with time zone
> query:
>  query_start      | timestamp with time zone
>  waiting          | boolean
>  current_query    | text
>
> or possibly that plus relocate procpid somewhere else. Anyone think
> this is sufficiently better to justify possible confusion?

I implemented Tom's suggested ordering above:

	test=> SELECT * FROM pg_stat_activity;
	-[ RECORD 1 ]----+------------------------------
	datid            | 16384
	datname          | test
	procpid          | 22216
	usesysid         | 10
	usename          | postgres
	application_name | psql
	client_addr      |
	client_port      | -1
	backend_start    | 2010-04-24 22:35:21.683308-04
	xact_start       | 2010-04-24 22:47:19.53821-04
	query_start      | 2010-04-24 22:47:19.53821-04
	waiting          | f
	current_query    | SELECT * FROM pg_stat_activity;

Patch attached. It will require a catversion bump too.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

Index: src/backend/catalog/system_views.sql
===================================================================
RCS file: /cvsroot/pgsql/src/backend/catalog/system_views.sql,v
retrieving revision 1.65
diff -c -c -r1.65 system_views.sql
*** src/backend/catalog/system_views.sql	2 Jan 2010 16:57:36 -0000	1.65
--- src/backend/catalog/system_views.sql	25 Apr 2010 02:47:39 -0000
***************
*** 335,347 ****
              S.usesysid,
              U.rolname AS usename,
              S.application_name,
!             S.current_query,
!             S.waiting,
              S.xact_start,
              S.query_start,
!             S.backend_start,
!             S.client_addr,
!             S.client_port
      FROM pg_database D, pg_stat_get_activity(NULL) AS S, pg_authid U
      WHERE S.datid = D.oid AND S.usesysid = U.oid;
--- 335,347 ----
              S.usesysid,
              U.rolname AS usename,
              S.application_name,
!             S.client_addr,
!             S.client_port,
!             S.backend_start,
              S.xact_start,
              S.query_start,
!             S.waiting,
!             S.current_query
      FROM pg_database D, pg_stat_get_activity(NULL) AS S, pg_authid U
      WHERE S.datid = D.oid AND S.usesysid = U.oid;
Index: src/test/regress/expected/rules.out
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/expected/rules.out,v
retrieving revision 1.154
diff -c -c -r1.154 rules.out
*** src/test/regress/expected/rules.out	29 Dec 2009 20:11:45 -0000	1.154
--- src/test/regress/expected/rules.out	25 Apr 2010 02:47:40 -0000
***************
*** 1289,1295 ****
  pg_rules | SELECT n.nspname AS schemaname, c.relname AS tablename, r.rulename, pg_get_ruledef(r.oid) AS definition FROM ((pg_rewrite r JOIN pg_class c ON ((c.oid = r.ev_class))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (r.rulename <> '_RETURN'::name);
  pg_settings | SELECT a.name, a.setting, a.unit, a.category, a.short_desc, a.extra_desc, a.context, a.vartype, a.source, a.min_val, a.max_val, a.enumvals, a.boot_val, a.reset_val, a.sourcefile, a.sourceline FROM pg_show_all_settings() a(name, setting, unit, category, short_desc, extra_desc, context, vartype, source, min_val, max_val, enumvals, boot_val, reset_val, sourcefile, sourceline);
  pg_shadow | SELECT pg_authid.rolname AS usename, pg_authid.oid
Re: [HACKERS] global temporary tables
On Sat, Apr 24, 2010 at 10:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> Oh, maybe I do see. If we pass it to smgropen() and stash it in the
>> SMgrRelation, we don't have to keep supplying it later on, maybe?
>
> Right. I'm unsure whether we should push it into the RelFileNode
> struct itself, but even having it in SMgrRelation ought to cut out a
> few places where it now has to be passed separately.

Pushing it into the RelFileNode has some advantages in terms of being
able to get at the information from everywhere, but one thing that
makes me think that's probably not a good decision is that we sometimes
WAL-log relfilenodes. And WAL-logging the value of the isTemp flag is a
waste, because if we're WAL-logging, it's zero.

...Robert
Re: [HACKERS] global temporary tables
Robert Haas robertmh...@gmail.com writes:
> Pushing it into the RelFileNode has some advantages in terms of being
> able to get at the information from everywhere, but one thing that
> makes me think that's probably not a good decision is that we
> sometimes WAL-log relfilenodes. And WAL-logging the value of the
> isTemp flag is a waste, because if we're WAL-logging, it's zero.

Yeah. I think we also use RelFileNode as a hash tag in places, and so
adding a bool to it would be problematic for a couple of reasons:
possibly uninitialized pad bytes, and uselessly incorporating more
bytes into the hash calculation.

			regards, tom lane
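Both objections fall straight out of the struct layout. A small worked
example (RelFileNode's three-OID layout is the real one; the widened
variant is hypothetical):

    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned int Oid;

    typedef struct RelFileNode      /* actual: 12 bytes, no padding */
    {
        Oid spcNode;
        Oid dbNode;
        Oid relNode;
    } RelFileNode;

    typedef struct RelFileNodeTemp  /* hypothetical widened version */
    {
        Oid  spcNode;
        Oid  dbNode;
        Oid  relNode;
        bool isTemp;                /* 1 byte + 3 pad bytes, typically */
    } RelFileNodeTemp;

    int
    main(void)
    {
        /* A hash computed over sizeof(struct) bytes would now consume
         * three pad bytes of indeterminate content, and every hashed
         * key grows from 12 to (typically) 16 bytes. */
        printf("RelFileNode:     %zu bytes\n", sizeof(RelFileNode));
        printf("RelFileNodeTemp: %zu bytes\n", sizeof(RelFileNodeTemp));
        return 0;
    }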