Re: [HACKERS] Eliminating unnecessary left joins

2007-04-08 Thread Nicolas Barbier

2007/4/7, Ottó Havasvölgyi [EMAIL PROTECTED]:


My simple example:

Class hierarchy and fields:
Shape (ID, X, Y)
|
+-Circle (ID, Radius)
|
+-Rectangle (ID, Width, Height)

The mapper creates 3 tables with the columns next to the class name.
And it creates 3 views. One of them:

RectangleView:  SELECT r.ID as ID, s.X as X, s.Y as Y, r.Width
as Width, r.Height as Height FROM Rectangle r LEFT JOIN Shape s ON
( r.ID=s.ID)


I find this view definition a bit strange: why is there a left outer
join? I expect there to be a FK from Rectangle.ID to Shape.ID (all
rectangles are shapes), which makes the definition totally equivalent
with one in which a normal join is used (whether attributes of Shape
are used or not).

The main use case I see for the original optimization is ORMs that
join in a whole hierarchy, even when only a part of it is needed. I
guess that that is rather common. The ORM that I use does exactly
this, because the main target-DBMSs (MS-SQL and Oracle) do the
optimization for it.

Example (somewhat less contrived than my previous one):

Imagine an implementation of the typical books that are borrowed by
people n-m relationship, using three tables (Book, Borrowed,
Person). Let's find all books that have been borrowed by a certain
person.

The non-ORM version would be something like:

SELECT Book.*
FROM
Book
   JOIN Borrowed ON Borrowed.book_id = Book.id
WHERE Borrowed.person_id = x;

Now assume that Borrowed is a class hierarchy mapped into multiple
tables by a typical ORM. The query would probably become something
like:

SELECT Book.*
FROM
 Book
JOIN Borrowed_Parent ON Borrowed_Parent.book_id = Book.id
   LEFT JOIN Borrowed_Child1 ON Borrowed_Child1.id = Borrowed_Parent.id
   LEFT JOIN Borrowed_Child2 ON Borrowed_Child2.id = Borrowed_Parent.id
   (...)
WHERE Borrowed_Parent.person_id = x;

It is clear that the children of the hierarchy are needlessly joined
in (as the only attribute that is actually needed is person_id, which
is on the parent level). It is not always trivial for the ORM to find
that out, without writing stuff that looks suspiciously similar to a
DBMS optimizer.

Maybe it is debatable whether this optimization should be done by the
application (i.e. the ORM) or by the DBMS. I am personally in favor of
doing it in the DBMS.

greetings,
Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[HACKERS] problem with install scripts.

2007-04-08 Thread Tzahi Fadida
Hi,
I used the development version of PostgreSQL to compile my module 
fulldisjunctions. The install scripts works well there (passes).
However, i wanted to check if it still works (backward compatible) with the 
stable version and it complains on these lines in the sql script when i do 
make installcheck:
SET standard_conforming_strings = off;
SET escape_string_warning = off;

When i remove this lines the stable version make installcheck passes.

What should i do?
10x.

-- 
Regards,
Tzahi.
--
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS:  see at 
http://members.lycos.co.uk/my2nis/spamwarning.html

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Eliminating unnecessary left joins

2007-04-08 Thread Ottó Havasvölgyi

My mapper joins the parent classes' tables to the current class' table in
the view. In the ShapeView only ID, X, and Y is selected from the shape
table, and none of the child tables are touched, opposite to your sample.
But even though all Shape objects (circles and rectangles too) are in the
resultset as Shape objects. I see this storage model quite consistent.
You are right, that this can be done with inner join too, this is an option
in the mapper. Oracle and MSSQL performs this left join optimization, so it
is usually used with left join by other mapper users. I have asked them (the
developers of the mapper) to perform this optimization at mapper level
because not all DBMSs supported this optimization, but it seemed they didn't
like this idea... And then I came here. This optimization would be useful
for every Postgres users.

To be honest I also find your sample strange, more exactly that
*sibling* child tables are left joined to the parent. Maybe because the
storage model is different than in my mapper.

In my case the left joined parent tables should be excluded by the optimizer
if possible.

Best regards,
Otto



2007/4/8, Nicolas Barbier [EMAIL PROTECTED]:


2007/4/7, Ottó Havasvölgyi [EMAIL PROTECTED]:

 My simple example:

 Class hierarchy and fields:
 Shape (ID, X, Y)
 |
 +-Circle (ID, Radius)
 |
 +-Rectangle (ID, Width, Height)

 The mapper creates 3 tables with the columns next to the class name.
 And it creates 3 views. One of them:

 RectangleView:  SELECT r.ID as ID, s.X as X, s.Y as Y,
r.Width
 as Width, r.Height as Height FROM Rectangle r LEFT JOIN Shape
s ON
 ( r.ID=s.ID)

I find this view definition a bit strange: why is there a left outer
join? I expect there to be a FK from Rectangle.ID to Shape.ID (all
rectangles are shapes), which makes the definition totally equivalent
with one in which a normal join is used (whether attributes of Shape
are used or not).

The main use case I see for the original optimization is ORMs that
join in a whole hierarchy, even when only a part of it is needed. I
guess that that is rather common. The ORM that I use does exactly
this, because the main target-DBMSs (MS-SQL and Oracle) do the
optimization for it.

Example (somewhat less contrived than my previous one):

Imagine an implementation of the typical books that are borrowed by
people n-m relationship, using three tables (Book, Borrowed,
Person). Let's find all books that have been borrowed by a certain
person.

The non-ORM version would be something like:

SELECT Book.*
FROM
Book
   JOIN Borrowed ON Borrowed.book_id = Book.id http://book.id/
WHERE Borrowed.person_id = x;

Now assume that Borrowed is a class hierarchy mapped into multiple
tables by a typical ORM. The query would probably become something
like:

SELECT Book.*
FROM
 Book
JOIN Borrowed_Parent ON Borrowed_Parent.book_id = 
Book.idhttp://book.id/
   LEFT JOIN Borrowed_Child1 ON Borrowed_Child1.id = Borrowed_Parent.id
   LEFT JOIN Borrowed_Child2 ON Borrowed_Child2.id = Borrowed_Parent.id
   (...)
WHERE Borrowed_Parent.person_id = x;

It is clear that the children of the hierarchy are needlessly joined
in (as the only attribute that is actually needed is person_id, which
is on the parent level). It is not always trivial for the ORM to find
that out, without writing stuff that looks suspiciously similar to a
DBMS optimizer.

Maybe it is debatable whether this optimization should be done by the
application (i.e. the ORM) or by the DBMS. I am personally in favor of
doing it in the DBMS.

greetings,
Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html



Re: [HACKERS] problem with install scripts.

2007-04-08 Thread Peter Eisentraut
Tzahi Fadida wrote:
 What should i do?

Telling what version you use and what error messages you see would help.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] problem with install scripts.

2007-04-08 Thread Tzahi Fadida
On Sunday 08 April 2007 21:34:52 Peter Eisentraut wrote:
 Tzahi Fadida wrote:
  What should i do?

 Telling what version you use and what error messages you see would help.

  SET client_min_messages = warning;
  \set ECHO none
+ ERROR:  unrecognized configuration parameter standard_conforming_strings
+ ERROR:  unrecognized configuration parameter escape_string_warning
  RESET client_min_messages;

#define CATALOG_VERSION_NO  200411041


-- 
Regards,
Tzahi.
--
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS:  see at 
http://members.lycos.co.uk/my2nis/spamwarning.html

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] problem with install scripts.

2007-04-08 Thread Peter Eisentraut
Tzahi Fadida wrote:
 On Sunday 08 April 2007 21:34:52 Peter Eisentraut wrote:
  Tzahi Fadida wrote:
   What should i do?
 
  Telling what version you use and what error messages you see would
  help.

   SET client_min_messages = warning;
   \set ECHO none
 + ERROR:  unrecognized configuration parameter
 standard_conforming_strings + ERROR:  unrecognized configuration
 parameter escape_string_warning RESET client_min_messages;

 #define CATALOG_VERSION_NO  200411041

Well, your version is too old then.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] [HACKERS] Arrays of Complex Types

2007-04-08 Thread Andrew Dunstan

Tom Lane wrote:

David Fetter [EMAIL PROTECTED] writes:
  

On Mon, Apr 02, 2007 at 10:01:44PM -0400, Alvaro Herrera wrote:


So, hum, what happened to the idea of creating the array types only
on demand?
  


  

Scotched, as far as I could tell,



More like you submitted a patch that entirely ignores multiple people's
opinion on what is needed.

Bruce may have put this into the patch queue, but do not labor under
the delusion that that means it'll get applied as-is.  The queue is
currently operating as a list of open issues.

  


One of the things that's been bothering me about this proposal is that 
it leaves untouched and indeed greatly expands the scope of the typename 
mangling we do. (i.e. adding an entry to pg_type with _ prepended). Up 
to now we've only used this gadget in a way that might matter a lot on 
user defined non-composite types, I think, and now we have expanded that 
to include enums, which are really a special case of user defined 
non-composites which don't require an extra C module. That's a 
comparatively small window, but this proposal will extend it to all 
composites, which is quite a large expansion in scope. And since _ is a 
perfectly legal initial char for an identifier, if type _foo exists then 
any attempt to create a table or view or composite called foo will fail.


Is it possible to fix this, or am I trying to shut the stable door after 
the horse has well and truly bolted? If it can be fixed, I'd like to see 
it fixed before we fix the problem David is trying to address here.


It's been suggested to me that this is an insignificant corner case. But 
I have often seen coding standards that actually require certain classes 
of identifier to being with _, so it's very far from a merely 
theoretical point.


I'm slightly inclined to agree with David that the danger of catalog 
bloat isn't that great, and might not justify the extra work that some 
sort of explicit array creation would involve (e.g. changes in grammar, 
pg_dump), as long as we are agreed that we don't want array types ever 
to have their own user definable names or settable namespace.


cheers

andrew


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PATCHES] [HACKERS] Arrays of Complex Types

2007-04-08 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 One of the things that's been bothering me about this proposal is that 
 it leaves untouched and indeed greatly expands the scope of the typename 
 mangling we do. (i.e. adding an entry to pg_type with _ prepended).

Yeah, that's been bothering me too.  One of the problems with the patch
as-is is that it extends the 62-instead-of-63-char limit to table names
as well as type names.

I've been thinking of proposing that we add a column to pg_type that
points from a type to its array type (if any), ie the reverse link
from typelem.  If we had that then the parser could follow that to
determine which type is foo[], instead of relying on the _foo naming
convention.  I don't suggest that we stop using the naming convention,
but it would no longer be a hard-and-fast rule, just a convention.
In particular we could rejigger things around the edges to reduce
the name conflict problem.  For instance the rule for forming array type
names could be prepend _, truncate to less than 64 bytes if necessary,
then substitute numbers at the end if needed to get something unique.
This is not all that different from what we do now to get unique
serial sequence names, for example.

This would also open the door to supporting 

CREATE TYPE foo AS ARRAY OF bar

without having to have any restrictions about the name of foo.
I'd still much rather do things that way for arrays of composites
than invent a ton of pg_type entries that are mostly going to go
unused.

regards, tom lane

PS: Has anyone looked at what it will take to make the entries
in an array-of-composite be something smaller than full tuples?
It's not going to be anything but a toy unless you can get the
per-entry overhead down to something sane.  Perhaps the
MinimalTuple representation would work.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] [HACKERS] Arrays of Complex Types

2007-04-08 Thread David Fetter
On Sun, Apr 08, 2007 at 07:08:38PM -0400, Tom Lane wrote:
 Andrew Dunstan [EMAIL PROTECTED] writes:
  One of the things that's been bothering me about this proposal is that 
  it leaves untouched and indeed greatly expands the scope of the typename 
  mangling we do. (i.e. adding an entry to pg_type with _ prepended).
 
 Yeah, that's been bothering me too.  One of the problems with the patch
 as-is is that it extends the 62-instead-of-63-char limit to table names
 as well as type names.

I did this by copying some code which already creates array names, so
should that code change to do something different, the
62-instead-of-63-char thing would go away along with it.  I agree that
the prepended _s are far from optimal.

 I've been thinking of proposing that we add a column to pg_type that
 points from a type to its array type (if any), ie the reverse link
 from typelem.  If we had that then the parser could follow that to
 determine which type is foo[], instead of relying on the _foo naming
 convention.  I don't suggest that we stop using the naming convention,
 but it would no longer be a hard-and-fast rule, just a convention.

That'd be neat :)

 In particular we could rejigger things around the edges to reduce
 the name conflict problem.  For instance the rule for forming array type
 names could be prepend _, truncate to less than 64 bytes if necessary,
 then substitute numbers at the end if needed to get something unique.
 This is not all that different from what we do now to get unique
 serial sequence names, for example.
 
 This would also open the door to supporting 
 
 CREATE TYPE foo AS ARRAY OF bar

I'm sorry to keep harping on this, but I really don't see a use case
and do see foot guns both with making the array types optional and
with decoupling their names from those of their respective compound
types.  When they're optional, we get all kinds of stepping on a step
that isn't there issues, and when they're decoupled, operations like,
ALTER TABLE foo RENAME TO bar have either surprising or undefined
behavior, or both.

 without having to have any restrictions about the name of foo.
 I'd still much rather do things that way for arrays of composites
 than invent a ton of pg_type entries that are mostly going to go
 unused.

I'm sure there's a better way than my first attempt.

 PS: Has anyone looked at what it will take to make the entries in an
 array-of-composite be something smaller than full tuples?  It's not
 going to be anything but a toy unless you can get the per-entry
 overhead down to something sane.  Perhaps the MinimalTuple
 representation would work.

Sounds neat, too :)

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] [HACKERS] Arrays of Complex Types

2007-04-08 Thread Andrew Dunstan

Tom Lane wrote:


I've been thinking of proposing that we add a column to pg_type that
points from a type to its array type (if any), ie the reverse link
from typelem.  If we had that then the parser could follow that to
determine which type is foo[], instead of relying on the _foo naming
convention.  


good.



I don't suggest that we stop using the naming convention,
but it would no longer be a hard-and-fast rule, just a convention.
In particular we could rejigger things around the edges to reduce
the name conflict problem.  For instance the rule for forming array type
names could be prepend _, truncate to less than 64 bytes if necessary,
then substitute numbers at the end if needed to get something unique.
This is not all that different from what we do now to get unique
serial sequence names, for example.
  


Sounds OK but I'd add something that might make it even more unlikely to 
generate a name clash.


This would also open the door to supporting 


CREATE TYPE foo AS ARRAY OF bar

without having to have any restrictions about the name of foo.
I'd still much rather do things that way for arrays of composites
than invent a ton of pg_type entries that are mostly going to go
unused.


  


ISTM we should either do it all automatically or all manually. If you 
want user defined names for array types then we can forget name mangling 
for user defined types and do everything manually.


cheers

andrew



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] [HACKERS] Arrays of Complex Types

2007-04-08 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I don't suggest that we stop using the naming convention,
 but it would no longer be a hard-and-fast rule, just a convention.
 In particular we could rejigger things around the edges to reduce
 the name conflict problem.  For instance the rule for forming array type
 names could be prepend _, truncate to less than 64 bytes if necessary,
 then substitute numbers at the end if needed to get something unique.
 This is not all that different from what we do now to get unique
 serial sequence names, for example.

 Sounds OK but I'd add something that might make it even more unlikely to 
 generate a name clash.

Like what?  I don't want to stray far from _foo when we don't have to,
because I'm sure there is user code out there that'll still rely on
that naming convention; we shouldn't break it if we don't have to.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] [HACKERS] Arrays of Complex Types

2007-04-08 Thread Andrew Dunstan

Tom Lane wrote:

Andrew Dunstan [EMAIL PROTECTED] writes:
  

Tom Lane wrote:


I don't suggest that we stop using the naming convention,
but it would no longer be a hard-and-fast rule, just a convention.
In particular we could rejigger things around the edges to reduce
the name conflict problem.  For instance the rule for forming array type
names could be prepend _, truncate to less than 64 bytes if necessary,
then substitute numbers at the end if needed to get something unique.
This is not all that different from what we do now to get unique
serial sequence names, for example.
  


  
Sounds OK but I'd add something that might make it even more unlikely to 
generate a name clash.



Like what?  I don't want to stray far from _foo when we don't have to,
because I'm sure there is user code out there that'll still rely on
that naming convention; we shouldn't break it if we don't have to.


  

Oh, in that case maybe we'd better live with it :-(

I certainly think we should deprecate relying on it.

cheers

andrew


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] [HACKERS] Arrays of Complex Types

2007-04-08 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 I'm slightly inclined to agree with David that the danger of catalog 
 bloat isn't that great, and might not justify the extra work that some 
 sort of explicit array creation would involve (e.g. changes in grammar, 
 pg_dump), as long as we are agreed that we don't want array types ever 
 to have their own user definable names or settable namespace.

I did some tests just now to determine the total number of catalog
entries associated with a simple table definition.  Assuming it has
N user columns of built-in types (hence not requiring pg_depend entries
for the datatypes), I count

1 pg_class entry for the table itself
1 pg_type entry for the rowtype
N + 6 pg_attribute entries for the user and system columns
2 pg_depend entries (type - table and table - namespace)
2 pg_shdepend entries (ownership of table and type)

Of course this goes up *fast* if you need a toast table, indexes,
constraints, etc, but that's the irreducible minimum.

Generating an array rowtype would add three more catalog entries to this
(the array pg_type entry, a pg_depend arraytype-rowtype link, and
another pg_shdepend entry), which isn't a huge percentage overhead.
Obviously if we wanted to trim some fat here, getting rid of the
redundant pg_attribute entries for system columns would be the first
place to look.

Based on this, I withdraw my efficiency concern about generating
rowtypes for all user tables.  I do, however, still object to generating
them for system tables.  In particular an array type for pg_statistic
will actively Not Work and probably constitute a security hole, because
of the anyarray hack we use there.

BTW, I just noticed that we currently create array types with AUTO
dependencies on their element type, meaning that you can drop them
separately:

regression=# create type fooey as enum ('a','b');
CREATE TYPE
regression=# drop type _fooey;
DROP TYPE

Is this a bad idea?  If we made the dependency INTERNAL then the
system would refuse the drop above.  I think we would have to do
that if we wanted to add the base-array link I suggested, because
otherwise this drop would leave a dangling pointer in pg_type.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] [HACKERS] Arrays of Complex Types

2007-04-08 Thread Andrew Dunstan

Tom Lane wrote:


Based on this, I withdraw my efficiency concern about generating
rowtypes for all user tables.  I do, however, still object to generating
them for system tables.  In particular an array type for pg_statistic
will actively Not Work and probably constitute a security hole, because
of the anyarray hack we use there.

  


How would we do that? Not create the array types in bootstrap mode? Or 
just special-case pg_statistic?


cheers

andrew

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Bitmap index thoughts (another segfault)

2007-04-08 Thread Gavin Sherry

 I'm seeing a segfault on a size TPC-H size 10 database. The patch and
 code are:
 - bitmap patch from 12 Mar
 - 8.3 dev from 27 Mar

Thanks Mark. I tracked this down. I'll post a new patch soon.

Gavin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Bitmap index thoughts (another segfault)

2007-04-08 Thread Gavin Sherry
On Sat, 7 Apr 2007, Mark Kirkwood wrote:

 Mark Kirkwood wrote:
 bitmap=# SELECT count(*) FROM bitmaptest
 WHERE val1 in (1,7)
 AND val0 IN (4,3)
 ;

 ERROR:  XX000: unknown stream type 2
 LOCATION:  stream_add_node, tidbitmap.c:1033

Thanks. Turned out to be a typo in stream_add_node()! I'll post a new
patch soon. Thanks for the test kit and your testing!

Gavin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] [HACKERS] Arrays of Complex Types

2007-04-08 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 How would we do that? Not create the array types in bootstrap mode? Or 
 just special-case pg_statistic?

Not generate them in bootstrap mode works for me.  IIRC, there's code
somewhere in there that allows anyarray to pass as a column type in
bootstrap mode, so that seems to fit ...

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings