On Tue, Dec 25, 2012 at 10:34 AM, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
> Thank you for this partial commit, and Simon and Andres to fill in the
> gaps. I should mention that the missing header parts were all in my
> patch, and that headers hacking is proving surprisingly uneasy.
Apologies to all about the missed bits.  I believe that I had them in my
version of the patch as well, but I think the changes ended up unstaged
in my repository rather than folded into the patch.  I'm usually smarter
than that, but, obviously, not this time.

>> I was thinking that we might need to go beyond what pg_regress can do
>> in this case.  For example, one idea would be to install an audit
>> trigger that records all the DDL that happens during the regression
>> tests.  Then, afterwards, replay that DDL into a new database.  Then
>> do a schema-only dump of the old and new databases and diff the dump
>> files.  That's a little wacky by current community standards but FWIW
>> EDB has a bunch of internal tests that we run to check our proprietary
>> stuff; some of them work along these lines and it's pretty effective
>> at shaking out bugs.
>
> Are you in a position to contribute those parts to the community?
> Implementing them again then having to support two different variants of
> them does not look like the best use of both our times.

If I thought there were some useful code, I would try to see if we could
contribute it, but I'm pretty sure there isn't.  We have a bunch of
pg_dump tests, but their intended purpose is to verify that our
proprietary stuff dumps and reloads properly, which is not what we need
here.  The reason I mentioned it was just to make the point that it is
possible to have a large suite of tests that goes beyond what is
possible with pg_regress, and that this can be a useful way of finding
out whether you've broken things.  In our case, we actually cheat pretty
hard by using things like \! pg_dump in the .sql files, so it actually
ends up going through pg_regress after all, technically anyway.  I'm not
quite sure what to think about that approach from a community point of
view.
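For illustration, the final comparison step of the dump-and-diff idea I
described above could look something like this.  This is only a sketch,
not anything we actually have: the `schema_diff` helper is hypothetical,
and the inline dump texts are stand-ins for real `pg_dump --schema-only`
output taken from the original database and the DDL-replay database.

```python
# Sketch of the comparison step of the proposed test: after replaying
# the audited DDL into a fresh database, take schema-only dumps of the
# old and new databases and diff them.  The dump texts passed in here
# are stand-ins for real `pg_dump --schema-only` output.
import difflib


def schema_diff(original_dump: str, replayed_dump: str) -> str:
    """Return a unified diff of two schema-only dumps, or "" if identical."""
    return "".join(difflib.unified_diff(
        original_dump.splitlines(keepends=True),
        replayed_dump.splitlines(keepends=True),
        fromfile="original.sql",
        tofile="replayed.sql",
    ))


if __name__ == "__main__":
    original = "CREATE TABLE t (a integer);\n"
    replayed = "CREATE TABLE t (a integer, b text);\n"
    d = schema_diff(original, replayed)
    print("PASS" if not d else "FAIL:\n" + d)
```

An empty diff would count as a pass; any difference points at DDL that
was not faithfully captured or replayed.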
It certainly ends up reducing the number of new things that you need to
invent in order to do new kinds of testing, but it's still not as
flexible as I would like - for example, I don't see a way to do the
exact kind of testing we need here in a cross-platform way using that
infrastructure.

I'm tempted to propose that we define a Perl-based testing API for
"additional regression tests" that can't be handled using pg_regress.
Like: we put a bunch of Perl modules in a certain directory, and then
write a script that goes through and does:

    require $module;
    $module->regress;

...on each one, based on some schedule file.  If a test passes or fails,
it calls some API that we provide to report the result.  That way,
whatever tests we write for this effort can potentially become the core
of a larger test suite, and by basing it on Perl we add no new tool
dependencies and can (hopefully) run on both Linux and Windows.  I'm
open to other ideas, of course.

> The case where we should certainly prefer looking at the catalog caches
> are when we want to know the actual schema where the object has been
> created.  The current patch is deriving that information mostly from the
> parse tree, using the first entry of the search_path when the schema is
> not given in the command.  It's ok because we are still in the same
> transaction and no command has been able to run in between the user
> command and our lookup, but I could easily get convinced to look up the
> catalogs instead, even more so once we have the OID of the object easily
> available in all places.

I haven't looked at the code, but it seems to me that there have got to
be edge cases where that will fail.  For one thing, we only do pretty
minimal locking on namespaces, so I'm not at all sure that someone
couldn't (for example) rename the old namespace out of the way, thus
causing one search_path lookup to find the first schema in the
search_path and the other lookup to find the second schema in the
search_path.
Frankly, I think we have some hazards of that type in the DDL code as it
stands, so it's not like nobody's ever made that mistake before, but it
would be nice not to propagate it.  Really, the only safe way to do
these things is to have any given name lookup done in one and only one
place.  We're some distance from there, and I don't know that this patch
should be required to carry the burden of nailing down all the loose
ends in that area, but I think it's reasonable to insist that it
shouldn't do anything which makes it impossible for someone else to nail
down those loose ends, which is still fairly high on my personal to-do
list.

The other danger here is - what exactly do you mean by "no command has
been able to run between the user command and our lookup"?  Because you
can do stupid things with functions like set_config().  This would only
be safe if no user-provided expressions can possibly get evaluated
between point A and point B, and that strikes me as the sort of thing
that could easily be false unless this is all done VERY close to the
start of processing.

A belated Merry Christmas to you as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers