Re: [PERFORM] single index on more than two coulumns a bad thing?

2004-04-02 Thread Aaron Werman
another thing that I have all over the place is a hierarchy:
index on grandfather_table(grandfather)
index on father_table(grandfather, father)
index on son_table(grandfather, father, son)

almost all of my indices are composite. Are you thinking about composite
indices with low cardinality leading columns?

/Aaron

- Original Message - 
From: Josh Berkus [EMAIL PROTECTED]
To: Palle Girgensohn [EMAIL PROTECTED];
[EMAIL PROTECTED]
Sent: Thursday, April 01, 2004 7:35 PM
Subject: Re: [PERFORM] single index on more than two coulumns a bad thing?


 Palle,

  Is it always bad to create index xx on yy (field1, field2, field3);

 No, it seldom bad, in fact.I have some indexes that run up to seven
 columns, becuase they are required for unique keys.

 Indexes of 3-4 columns are often *required* for many-to-many join tables.

 I'm afraid that you've been given some misleading advice.

  I guess the problem is that the index might often grow bigger than the
  table, or at least big enough not to speed up the queries?

 Well, yes ... a 4-column index on a 5-column table could be bigger than
the
 table if allowed to bloat and not re-indexed.   But that's just a reason
for
 better maintainence.

 -- 
 -Josh Berkus
  Aglio Database Solutions
  San Francisco


 ---(end of broadcast)---
 TIP 7: don't forget to increase your free space map settings


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] Spatial join insists on sequential scan of larger table

2004-04-02 Thread Clive Page
I am trying to do a spatial join between two tables each of which has a
column of type BOX called ERRBOX, with R-TREE indices created on both.

The smaller table, xmm1, has  56,711 rows,
the larger one, twomass, has 177,757,299 rows.

The most efficient way to join these is to do a sequential scan of the
smaller table, and an R-tree lookup on the larger.  However for a simple
inner join the optimiser seems to want to do the reverse, for example:

EXPLAIN
SELECT x.ra AS xra, x.decl AS xdecl, t.ra AS tra, t.decl AS tdecl
FROM xmm1 AS x INNER JOIN twomass AS t
ON x.errbox  t.errbox;

QUERY PLAN
--
 Nested Loop  (cost=0.00..196642756520.34 rows=49506496044 width=32)
   -  Seq Scan on twomass t  (cost=0.00..9560002.72 rows=177023872 width=48)
   -  Index Scan using xmm1box on xmm1 x  (cost=0.00..1107.28 rows=280 width=48)
 Index Cond: (x.errbox  outer.errbox)


Reversing the join condition (i.e. t.errbox  x.errbox) and similar make
no difference, nor does using the old implicit join syntax.

If, however, I specify an outer join such as:

EXPLAIN
SELECT x.ra AS xra, x.decl AS xdecl, t.ra AS tra, t.decl AS tdecl
FROM xmm1 AS x LEFT OUTER JOIN twomass AS t
ON x.errbox  t.errbox;

   QUERY PLAN

 Nested Loop Left Join  (cost=0.00..198945259325.90 rows=49506496044
width=32)
   -  Seq Scan on xmm1 x  (cost=0.00..8592.32 rows=55932 width=48)
   -  Index Scan using tbox on twomass t  (cost=0.00..3545848.88 rows=885119 width=48)
 Index Cond: (outer.errbox  t.errbox)


This executes, it need hardly be said, a whole lot faster.

I found that I can also force a sequential scan of the smaller table by
dropping its R-tree index, but I may need this in other operations, so
this isn't a very satisfactory solution.  It's odd that an outer join
should be faster than an inner one, or to put it another way, after
dropping an index there is more than an order of magnitude speed increase.

I'm using Postgres 7.4.1 on Red Hat Linux.  Has anyone had similar
problems with spatial joins?


-- 
Clive Page
Dept of Physics  Astronomy,
University of Leicester,
Leicester, LE1 7RH,  U.K.



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Spatial join insists on sequential scan of larger table

2004-04-02 Thread Tom Lane
Clive Page [EMAIL PROTECTED] writes:
 This executes, it need hardly be said, a whole lot faster.

Could we see EXPLAIN ANALYZE output?

The estimated costs for the two cases are nearly the same, which says to
me that there's something wrong with the cost model for r-tree lookups,
but I don't know what it is.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] single index on more than two coulumns a bad thing?

2004-04-02 Thread Bruno Wolff III
On Fri, Apr 02, 2004 at 01:00:45 +0200,
  Palle Girgensohn [EMAIL PROTECTED] wrote:
 
 Is it always bad to create index xx on yy (field1, field2, field3);
 
 I guess the problem is that the index might often grow bigger than the 
 table, or at least big enough not to speed up the queries?

One place where you need them in postgres is enforcing unique multicolumn
keys. These will get created implicitly from the unique (or primary key)
constraint. It isn't all that unusual to have a table that describes
a many to many (to many ...) relationship where the primary key is all
of the columns.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Spatial join insists on sequential scan of larger

2004-04-02 Thread Clive Page
On Fri, 2 Apr 2004, Tom Lane wrote:

 Could we see EXPLAIN ANALYZE output?

Certainly, but that's going to take a little time (as the ANALYZE causes
it to run the actual query, which I only just discovered), so may have to
wait until Monday if I don't get time to finish it this afternoon.


-- 
Clive Page
Dept of Physics  Astronomy,
University of Leicester,
Leicester, LE1 7RH,  U.K.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings