Re: [HACKERS] Eliminating unnecessary left joins
2007/4/7, Ottó Havasvölgyi [EMAIL PROTECTED]: My simple example: Class hierarchy and fields: Shape (ID, X, Y) | +-Circle (ID, Radius) | +-Rectangle (ID, Width, Height) The mapper creates 3 tables with the columns next to the class name. And it creates 3 views. One of them: RectangleView: SELECT r.ID as ID, s.X as X, s.Y as Y, r.Width as Width, r.Height as Height FROM Rectangle r LEFT JOIN Shape s ON ( r.ID=s.ID) I find this view definition a bit strange: why is there a left outer join? I expect there to be a FK from Rectangle.ID to Shape.ID (all rectangles are shapes), which makes the definition totally equivalent with one in which a normal join is used (whether attributes of Shape are used or not). The main use case I see for the original optimization is ORMs that join in a whole hierarchy, even when only a part of it is needed. I guess that that is rather common. The ORM that I use does exactly this, because the main target-DBMSs (MS-SQL and Oracle) do the optimization for it. Example (somewhat less contrived than my previous one): Imagine an implementation of the typical books that are borrowed by people n-m relationship, using three tables (Book, Borrowed, Person). Let's find all books that have been borrowed by a certain person. The non-ORM version would be something like: SELECT Book.* FROM Book JOIN Borrowed ON Borrowed.book_id = Book.id WHERE Borrowed.person_id = x; Now assume that Borrowed is a class hierarchy mapped into multiple tables by a typical ORM. The query would probably become something like: SELECT Book.* FROM Book JOIN Borrowed_Parent ON Borrowed_Parent.book_id = Book.id LEFT JOIN Borrowed_Child1 ON Borrowed_Child1.id = Borrowed_Parent.id LEFT JOIN Borrowed_Child2 ON Borrowed_Child2.id = Borrowed_Parent.id (...) WHERE Borrowed_Parent.person_id = x; It is clear that the children of the hierarchy are needlessly joined in (as the only attribute that is actually needed is person_id, which is on the parent level). It is not always trivial for the ORM to find that out, without writing stuff that looks suspiciously similar to a DBMS optimizer. Maybe it is debatable whether this optimization should be done by the application (i.e. the ORM) or by the DBMS. I am personally in favor of doing it in the DBMS. greetings, Nicolas -- Nicolas Barbier http://www.gnu.org/philosophy/no-word-attachments.html ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] problem with install scripts.
Hi, I used the development version of PostgreSQL to compile my module fulldisjunctions. The install scripts works well there (passes). However, i wanted to check if it still works (backward compatible) with the stable version and it complains on these lines in the sql script when i do make installcheck: SET standard_conforming_strings = off; SET escape_string_warning = off; When i remove this lines the stable version make installcheck passes. What should i do? 10x. -- Regards, Tzahi. -- Tzahi Fadida Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info WARNING TO SPAMMERS: see at http://members.lycos.co.uk/my2nis/spamwarning.html ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Eliminating unnecessary left joins
My mapper joins the parent classes' tables to the current class' table in the view. In the ShapeView only ID, X, and Y is selected from the shape table, and none of the child tables are touched, opposite to your sample. But even though all Shape objects (circles and rectangles too) are in the resultset as Shape objects. I see this storage model quite consistent. You are right, that this can be done with inner join too, this is an option in the mapper. Oracle and MSSQL performs this left join optimization, so it is usually used with left join by other mapper users. I have asked them (the developers of the mapper) to perform this optimization at mapper level because not all DBMSs supported this optimization, but it seemed they didn't like this idea... And then I came here. This optimization would be useful for every Postgres users. To be honest I also find your sample strange, more exactly that *sibling* child tables are left joined to the parent. Maybe because the storage model is different than in my mapper. In my case the left joined parent tables should be excluded by the optimizer if possible. Best regards, Otto 2007/4/8, Nicolas Barbier [EMAIL PROTECTED]: 2007/4/7, Ottó Havasvölgyi [EMAIL PROTECTED]: My simple example: Class hierarchy and fields: Shape (ID, X, Y) | +-Circle (ID, Radius) | +-Rectangle (ID, Width, Height) The mapper creates 3 tables with the columns next to the class name. And it creates 3 views. One of them: RectangleView: SELECT r.ID as ID, s.X as X, s.Y as Y, r.Width as Width, r.Height as Height FROM Rectangle r LEFT JOIN Shape s ON ( r.ID=s.ID) I find this view definition a bit strange: why is there a left outer join? I expect there to be a FK from Rectangle.ID to Shape.ID (all rectangles are shapes), which makes the definition totally equivalent with one in which a normal join is used (whether attributes of Shape are used or not). The main use case I see for the original optimization is ORMs that join in a whole hierarchy, even when only a part of it is needed. I guess that that is rather common. The ORM that I use does exactly this, because the main target-DBMSs (MS-SQL and Oracle) do the optimization for it. Example (somewhat less contrived than my previous one): Imagine an implementation of the typical books that are borrowed by people n-m relationship, using three tables (Book, Borrowed, Person). Let's find all books that have been borrowed by a certain person. The non-ORM version would be something like: SELECT Book.* FROM Book JOIN Borrowed ON Borrowed.book_id = Book.id http://book.id/ WHERE Borrowed.person_id = x; Now assume that Borrowed is a class hierarchy mapped into multiple tables by a typical ORM. The query would probably become something like: SELECT Book.* FROM Book JOIN Borrowed_Parent ON Borrowed_Parent.book_id = Book.idhttp://book.id/ LEFT JOIN Borrowed_Child1 ON Borrowed_Child1.id = Borrowed_Parent.id LEFT JOIN Borrowed_Child2 ON Borrowed_Child2.id = Borrowed_Parent.id (...) WHERE Borrowed_Parent.person_id = x; It is clear that the children of the hierarchy are needlessly joined in (as the only attribute that is actually needed is person_id, which is on the parent level). It is not always trivial for the ORM to find that out, without writing stuff that looks suspiciously similar to a DBMS optimizer. Maybe it is debatable whether this optimization should be done by the application (i.e. the ORM) or by the DBMS. I am personally in favor of doing it in the DBMS. greetings, Nicolas -- Nicolas Barbier http://www.gnu.org/philosophy/no-word-attachments.html
Re: [HACKERS] problem with install scripts.
Tzahi Fadida wrote: What should i do? Telling what version you use and what error messages you see would help. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] problem with install scripts.
On Sunday 08 April 2007 21:34:52 Peter Eisentraut wrote: Tzahi Fadida wrote: What should i do? Telling what version you use and what error messages you see would help. SET client_min_messages = warning; \set ECHO none + ERROR: unrecognized configuration parameter standard_conforming_strings + ERROR: unrecognized configuration parameter escape_string_warning RESET client_min_messages; #define CATALOG_VERSION_NO 200411041 -- Regards, Tzahi. -- Tzahi Fadida Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info WARNING TO SPAMMERS: see at http://members.lycos.co.uk/my2nis/spamwarning.html ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] problem with install scripts.
Tzahi Fadida wrote: On Sunday 08 April 2007 21:34:52 Peter Eisentraut wrote: Tzahi Fadida wrote: What should i do? Telling what version you use and what error messages you see would help. SET client_min_messages = warning; \set ECHO none + ERROR: unrecognized configuration parameter standard_conforming_strings + ERROR: unrecognized configuration parameter escape_string_warning RESET client_min_messages; #define CATALOG_VERSION_NO 200411041 Well, your version is too old then. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] [HACKERS] Arrays of Complex Types
Tom Lane wrote: David Fetter [EMAIL PROTECTED] writes: On Mon, Apr 02, 2007 at 10:01:44PM -0400, Alvaro Herrera wrote: So, hum, what happened to the idea of creating the array types only on demand? Scotched, as far as I could tell, More like you submitted a patch that entirely ignores multiple people's opinion on what is needed. Bruce may have put this into the patch queue, but do not labor under the delusion that that means it'll get applied as-is. The queue is currently operating as a list of open issues. One of the things that's been bothering me about this proposal is that it leaves untouched and indeed greatly expands the scope of the typename mangling we do. (i.e. adding an entry to pg_type with _ prepended). Up to now we've only used this gadget in a way that might matter a lot on user defined non-composite types, I think, and now we have expanded that to include enums, which are really a special case of user defined non-composites which don't require an extra C module. That's a comparatively small window, but this proposal will extend it to all composites, which is quite a large expansion in scope. And since _ is a perfectly legal initial char for an identifier, if type _foo exists then any attempt to create a table or view or composite called foo will fail. Is it possible to fix this, or am I trying to shut the stable door after the horse has well and truly bolted? If it can be fixed, I'd like to see it fixed before we fix the problem David is trying to address here. It's been suggested to me that this is an insignificant corner case. But I have often seen coding standards that actually require certain classes of identifier to being with _, so it's very far from a merely theoretical point. I'm slightly inclined to agree with David that the danger of catalog bloat isn't that great, and might not justify the extra work that some sort of explicit array creation would involve (e.g. changes in grammar, pg_dump), as long as we are agreed that we don't want array types ever to have their own user definable names or settable namespace. cheers andrew ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] [HACKERS] Arrays of Complex Types
Andrew Dunstan [EMAIL PROTECTED] writes: One of the things that's been bothering me about this proposal is that it leaves untouched and indeed greatly expands the scope of the typename mangling we do. (i.e. adding an entry to pg_type with _ prepended). Yeah, that's been bothering me too. One of the problems with the patch as-is is that it extends the 62-instead-of-63-char limit to table names as well as type names. I've been thinking of proposing that we add a column to pg_type that points from a type to its array type (if any), ie the reverse link from typelem. If we had that then the parser could follow that to determine which type is foo[], instead of relying on the _foo naming convention. I don't suggest that we stop using the naming convention, but it would no longer be a hard-and-fast rule, just a convention. In particular we could rejigger things around the edges to reduce the name conflict problem. For instance the rule for forming array type names could be prepend _, truncate to less than 64 bytes if necessary, then substitute numbers at the end if needed to get something unique. This is not all that different from what we do now to get unique serial sequence names, for example. This would also open the door to supporting CREATE TYPE foo AS ARRAY OF bar without having to have any restrictions about the name of foo. I'd still much rather do things that way for arrays of composites than invent a ton of pg_type entries that are mostly going to go unused. regards, tom lane PS: Has anyone looked at what it will take to make the entries in an array-of-composite be something smaller than full tuples? It's not going to be anything but a toy unless you can get the per-entry overhead down to something sane. Perhaps the MinimalTuple representation would work. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] [HACKERS] Arrays of Complex Types
On Sun, Apr 08, 2007 at 07:08:38PM -0400, Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: One of the things that's been bothering me about this proposal is that it leaves untouched and indeed greatly expands the scope of the typename mangling we do. (i.e. adding an entry to pg_type with _ prepended). Yeah, that's been bothering me too. One of the problems with the patch as-is is that it extends the 62-instead-of-63-char limit to table names as well as type names. I did this by copying some code which already creates array names, so should that code change to do something different, the 62-instead-of-63-char thing would go away along with it. I agree that the prepended _s are far from optimal. I've been thinking of proposing that we add a column to pg_type that points from a type to its array type (if any), ie the reverse link from typelem. If we had that then the parser could follow that to determine which type is foo[], instead of relying on the _foo naming convention. I don't suggest that we stop using the naming convention, but it would no longer be a hard-and-fast rule, just a convention. That'd be neat :) In particular we could rejigger things around the edges to reduce the name conflict problem. For instance the rule for forming array type names could be prepend _, truncate to less than 64 bytes if necessary, then substitute numbers at the end if needed to get something unique. This is not all that different from what we do now to get unique serial sequence names, for example. This would also open the door to supporting CREATE TYPE foo AS ARRAY OF bar I'm sorry to keep harping on this, but I really don't see a use case and do see foot guns both with making the array types optional and with decoupling their names from those of their respective compound types. When they're optional, we get all kinds of stepping on a step that isn't there issues, and when they're decoupled, operations like, ALTER TABLE foo RENAME TO bar have either surprising or undefined behavior, or both. without having to have any restrictions about the name of foo. I'd still much rather do things that way for arrays of composites than invent a ton of pg_type entries that are mostly going to go unused. I'm sure there's a better way than my first attempt. PS: Has anyone looked at what it will take to make the entries in an array-of-composite be something smaller than full tuples? It's not going to be anything but a toy unless you can get the per-entry overhead down to something sane. Perhaps the MinimalTuple representation would work. Sounds neat, too :) Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778AIM: dfetter666 Skype: davidfetter Remember to vote! Consider donating to PostgreSQL: http://www.postgresql.org/about/donate ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] [HACKERS] Arrays of Complex Types
Tom Lane wrote: I've been thinking of proposing that we add a column to pg_type that points from a type to its array type (if any), ie the reverse link from typelem. If we had that then the parser could follow that to determine which type is foo[], instead of relying on the _foo naming convention. good. I don't suggest that we stop using the naming convention, but it would no longer be a hard-and-fast rule, just a convention. In particular we could rejigger things around the edges to reduce the name conflict problem. For instance the rule for forming array type names could be prepend _, truncate to less than 64 bytes if necessary, then substitute numbers at the end if needed to get something unique. This is not all that different from what we do now to get unique serial sequence names, for example. Sounds OK but I'd add something that might make it even more unlikely to generate a name clash. This would also open the door to supporting CREATE TYPE foo AS ARRAY OF bar without having to have any restrictions about the name of foo. I'd still much rather do things that way for arrays of composites than invent a ton of pg_type entries that are mostly going to go unused. ISTM we should either do it all automatically or all manually. If you want user defined names for array types then we can forget name mangling for user defined types and do everything manually. cheers andrew ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] [HACKERS] Arrays of Complex Types
Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: I don't suggest that we stop using the naming convention, but it would no longer be a hard-and-fast rule, just a convention. In particular we could rejigger things around the edges to reduce the name conflict problem. For instance the rule for forming array type names could be prepend _, truncate to less than 64 bytes if necessary, then substitute numbers at the end if needed to get something unique. This is not all that different from what we do now to get unique serial sequence names, for example. Sounds OK but I'd add something that might make it even more unlikely to generate a name clash. Like what? I don't want to stray far from _foo when we don't have to, because I'm sure there is user code out there that'll still rely on that naming convention; we shouldn't break it if we don't have to. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] [HACKERS] Arrays of Complex Types
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: I don't suggest that we stop using the naming convention, but it would no longer be a hard-and-fast rule, just a convention. In particular we could rejigger things around the edges to reduce the name conflict problem. For instance the rule for forming array type names could be prepend _, truncate to less than 64 bytes if necessary, then substitute numbers at the end if needed to get something unique. This is not all that different from what we do now to get unique serial sequence names, for example. Sounds OK but I'd add something that might make it even more unlikely to generate a name clash. Like what? I don't want to stray far from _foo when we don't have to, because I'm sure there is user code out there that'll still rely on that naming convention; we shouldn't break it if we don't have to. Oh, in that case maybe we'd better live with it :-( I certainly think we should deprecate relying on it. cheers andrew ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] [HACKERS] Arrays of Complex Types
Andrew Dunstan [EMAIL PROTECTED] writes: I'm slightly inclined to agree with David that the danger of catalog bloat isn't that great, and might not justify the extra work that some sort of explicit array creation would involve (e.g. changes in grammar, pg_dump), as long as we are agreed that we don't want array types ever to have their own user definable names or settable namespace. I did some tests just now to determine the total number of catalog entries associated with a simple table definition. Assuming it has N user columns of built-in types (hence not requiring pg_depend entries for the datatypes), I count 1 pg_class entry for the table itself 1 pg_type entry for the rowtype N + 6 pg_attribute entries for the user and system columns 2 pg_depend entries (type - table and table - namespace) 2 pg_shdepend entries (ownership of table and type) Of course this goes up *fast* if you need a toast table, indexes, constraints, etc, but that's the irreducible minimum. Generating an array rowtype would add three more catalog entries to this (the array pg_type entry, a pg_depend arraytype-rowtype link, and another pg_shdepend entry), which isn't a huge percentage overhead. Obviously if we wanted to trim some fat here, getting rid of the redundant pg_attribute entries for system columns would be the first place to look. Based on this, I withdraw my efficiency concern about generating rowtypes for all user tables. I do, however, still object to generating them for system tables. In particular an array type for pg_statistic will actively Not Work and probably constitute a security hole, because of the anyarray hack we use there. BTW, I just noticed that we currently create array types with AUTO dependencies on their element type, meaning that you can drop them separately: regression=# create type fooey as enum ('a','b'); CREATE TYPE regression=# drop type _fooey; DROP TYPE Is this a bad idea? If we made the dependency INTERNAL then the system would refuse the drop above. I think we would have to do that if we wanted to add the base-array link I suggested, because otherwise this drop would leave a dangling pointer in pg_type. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] [HACKERS] Arrays of Complex Types
Tom Lane wrote: Based on this, I withdraw my efficiency concern about generating rowtypes for all user tables. I do, however, still object to generating them for system tables. In particular an array type for pg_statistic will actively Not Work and probably constitute a security hole, because of the anyarray hack we use there. How would we do that? Not create the array types in bootstrap mode? Or just special-case pg_statistic? cheers andrew ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Bitmap index thoughts (another segfault)
I'm seeing a segfault on a size TPC-H size 10 database. The patch and code are: - bitmap patch from 12 Mar - 8.3 dev from 27 Mar Thanks Mark. I tracked this down. I'll post a new patch soon. Gavin ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Bitmap index thoughts (another segfault)
On Sat, 7 Apr 2007, Mark Kirkwood wrote: Mark Kirkwood wrote: bitmap=# SELECT count(*) FROM bitmaptest WHERE val1 in (1,7) AND val0 IN (4,3) ; ERROR: XX000: unknown stream type 2 LOCATION: stream_add_node, tidbitmap.c:1033 Thanks. Turned out to be a typo in stream_add_node()! I'll post a new patch soon. Thanks for the test kit and your testing! Gavin ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] [HACKERS] Arrays of Complex Types
Andrew Dunstan [EMAIL PROTECTED] writes: How would we do that? Not create the array types in bootstrap mode? Or just special-case pg_statistic? Not generate them in bootstrap mode works for me. IIRC, there's code somewhere in there that allows anyarray to pass as a column type in bootstrap mode, so that seems to fit ... regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings