different query plan because different limit # (Re: [PERFORM] weird query plan)

2007-06-06 Thread weiping
I changed the query to :
EXPLAIN ANALYZE select id from wd_urlusermaps where id in (select id
from wd_urlusermaps where share =1 and userid='219177') order by id desc
limit 20;

and it's much better now (in actual execution time), but the reported cost is
higher than that of the slower query above. Maybe I should do some tuning of the
planner parameters, or is it a planner bug?
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=16118.83..16118.88 rows=20 width=4) (actual time=17.539..17.619 rows=20 loops=1)
   ->  Sort  (cost=16118.83..16121.57 rows=1094 width=4) (actual time=17.534..17.560 rows=20 loops=1)
         Sort Key: public.wd_urlusermaps.id
         ->  Nested Loop  (cost=6753.28..16063.61 rows=1094 width=4) (actual time=16.739..17.439 rows=41 loops=1)
               ->  HashAggregate  (cost=6753.28..6764.22 rows=1094 width=4) (actual time=16.707..16.786 rows=41 loops=1)
                     ->  Index Scan using urlusermaps_userid on wd_urlusermaps  (cost=0.00..6750.55 rows=1094 width=4) (actual time=1.478..16.563 rows=41 loops=1)
                           Index Cond: (userid = 219177)
                           Filter: (share = 1)
               ->  Index Scan using wd_urlusermaps_pkey on wd_urlusermaps  (cost=0.00..8.49 rows=1 width=4) (actual time=0.008..0.010 rows=1 loops=41)
                     Index Cond: (public.wd_urlusermaps.id = public.wd_urlusermaps.id)
 Total runtime: 17.762 ms
(11 rows)

 Sorry, forgot to mention our version: it's PostgreSQL 8.2.3.

 -laser
   
 I have a table:
 webdigest=# \d wd_urlusermaps
 Table "public.wd_urlusermaps"
  Column  |            Type             |                          Modifiers
 ---------+-----------------------------+--------------------------------------------------------------
  id      | integer                     | not null default nextval('wd_urlusermaps_id_seq'::regclass)
  urlid   | integer                     | not null
  tag     | character varying(512)      |
  title   | character varying(512)      |
  summary | character varying(1024)     |
  comment | character varying(1024)     |
  ctime   | timestamp without time zone |
  mtime   | timestamp without time zone |
  share   | smallint                    |
  userid  | integer                     |
  import  | smallint                    | default 0
 Indexes:
  wd_urlusermaps_pkey PRIMARY KEY, btree (id) CLUSTER
  urlusermaps_urlid_userid UNIQUE, btree (urlid, userid)
  urlusermaps_urlid btree (urlid)
  urlusermaps_userid btree (userid)
  wd_urlusermaps_ctime_idx btree (ctime)
  wd_urlusermaps_share_idx btree (share)

 and the statistics target is set to 1000. Here are two different query plans:

 webdigest=# explain analyze select A.id as
 fav_id,A.urlid,A.tag,A.title,A.summary,A.comment,A.ctime,A.share from
 wd_urlusermaps A where share =1 and A.userid='219177' ORDER BY A.id DESC
 limit 20 ;
 QUERY PLAN
 ----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..4932.56 rows=20 width=96) (actual time=730.461..2374.435 rows=20 loops=1)
   ->  Index Scan Backward using wd_urlusermaps_pkey on wd_urlusermaps a  (cost=0.00..269810.77 rows=1094 width=96) (actual time=730.456..2374.367 rows=20 loops=1)
         Filter: ((share = 1) AND (userid = 219177))
 Total runtime: 2374.513 ms
 (4 rows)

 webdigest=# explain analyze select A.id as
 fav_id,A.urlid,A.tag,A.title,A.summary,A.comment,A.ctime,A.share from
 wd_urlusermaps A where share =1 and A.userid='219177' ORDER BY A.id DESC
 limit 40 ;
 QUERY PLAN
 ----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=6805.77..6805.87 rows=40 width=96) (actual time=5.731..5.905 rows=40 loops=1)
   ->  Sort  (cost=6805.77..6808.50 rows=1094 width=96) (actual time=5.726..5.785 rows=40 loops=1)
         Sort Key: id
         ->  Index Scan using urlusermaps_userid on wd_urlusermaps a  (cost=0.00..6750.55 rows=1094 width=96) (actual time=0.544..5.616 rows=41 loops=1)
               Index Cond: (userid = 219177)
               Filter: (share = 1)
 Total runtime: 6.013 ms
 (7 rows)

 userid=219177 has 2000+ records, of which around 40 have share=1; why do the
 two queries above show such a big difference?

 any hint would be greatly appreciated.

 -laser




Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Heikki Linnakangas

[EMAIL PROTECTED] wrote:

On Wed, 6 Jun 2007, Steinar H. Gunderson wrote:


On Tue, Jun 05, 2007 at 05:59:25PM -0400, Tom Lane wrote:

I think the main argument for partitioning is when you are interested in
being able to drop whole partitions cheaply.


Wasn't there also talk about adding the ability to mark individual partitions
as read-only, thus bypassing MVCC and allowing queries to be satisfied using
indexes only?

Not that I think I've seen it on the TODO... :-)


now that's a very interesting idea, especially when combined with 
time-based data where the old times will never change.


That's been discussed, but it's controversial. IMHO a better way to 
achieve that is to design the dead-space-map so that it can be used to 
check which parts of a table are visible to everyone, and skip 
visibility checks. That doesn't require any user action, and allows updates.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: different query plan because different limit # (Re: [PERFORM] weird query plan)

2007-06-06 Thread weiping
Continuing to dig shows:
set cpu_tuple_cost to 0.1;
explain analyze select * from wd_urlusermaps where share =1 and
userid='219177' order by id desc limit 20;
SET
Time: 0.256 ms
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=7063.98..7064.03 rows=20 width=110) (actual time=6.047..6.130 rows=20 loops=1)
   ->  Sort  (cost=7063.98..7066.71 rows=1094 width=110) (actual time=6.043..6.070 rows=20 loops=1)
         Sort Key: id
         ->  Index Scan using urlusermaps_userid on wd_urlusermaps  (cost=0.00..7008.76 rows=1094 width=110) (actual time=0.710..5.838 rows=41 loops=1)
               Index Cond: (userid = 219177)
               Filter: (share = 1)
 Total runtime: 6.213 ms
(7 rows)

now it's what I need, which means we should increase cpu_tuple_cost on a
large-RAM node (we have 16 GB of RAM and the table is only several hundred MB)
so the planner doesn't pass over the sort-based plan too eagerly. Is that right?

-laser
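
For reference, a minimal sketch of keeping that setting beyond the current
session (the 0.1 value is just the one from the experiment above, not a general
recommendation; webdigest is the database name from the psql prompt):

  ALTER DATABASE webdigest SET cpu_tuple_cost = 0.1;  -- applies to new sessions
  -- or cluster-wide in postgresql.conf:
  --   cpu_tuple_cost = 0.1        # planner default is 0.01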

Re: different query plan because different limit # (Re: [PERFORM] weird query plan)

2007-06-06 Thread Gregory Stark
weiping [EMAIL PROTECTED] writes:

 - Index Scan using urlusermaps_userid on wd_urlusermaps
(cost=0.00..6750.55 rows=1094 width=4) (actual time=1.478..16.563 rows=41 
 loops=1)
Index Cond: (userid = 219177)
Filter: (share = 1)

It's estimating 1094 rows and getting 41 rows. You might consider raising
the statistics target for that table.

Does it get accurate estimates for the number of rows for each of these?

explain analyze select * from wd_urlusermaps where userid=219177 
explain analyze select * from wd_urlusermaps where share=1

(the latter might take a while)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PERFORM] upgraded to pgsql 8.2.4, getting worse performance then 7.4.x

2007-06-06 Thread Douglas J Hunley
On Tuesday 05 June 2007 10:34:04 Douglas J Hunley wrote:
 On Monday 04 June 2007 17:11:23 Gregory Stark wrote:
  Those plans look like they have a lot of casts to text in them. How have
  you defined your indexes? Are your id columns really text?

 project table:
 Indexes:
 project_pk PRIMARY KEY, btree (id)
 project_path UNIQUE, btree (path)

 role table:
 Indexes:
 role_pk PRIMARY KEY, btree (id)

 role_default_user table:
 Indexes:
 role_def_user_pk PRIMARY KEY, btree (id)
 role_def_u_prj_idx UNIQUE, btree (role_id, default_user_class_id,
 project_id)

 role_operation table:
 Indexes:
 role_operation_pk PRIMARY KEY, btree (id)
 role_oper_obj_oper btree (object_type_id, operation_category,
 operation_name)
 role_oper_role_id btree (role_id)

 sfuser table:
 Indexes:
 sfuser_pk PRIMARY KEY, btree (id)
 sfuser_username UNIQUE, btree (username)

 projectmembership table:
 Indexes:
 pjmb_pk PRIMARY KEY, btree (id)
 pjmb_projmember UNIQUE, btree (project_id, member_id)
 pjmb_member btree (member_id)

 relationship table:
 Indexes:
 relationship_pk PRIMARY KEY, btree (id)
 relation_origin btree (origin_id)
 relation_target btree (target_id)
 relation_type btree (relationship_type_name)

 field_value table:
 Indexes:
 field_value_pk PRIMARY KEY, btree (id)
 f_val_fid_val_idx UNIQUE, btree (field_id, value)
 field_class_idx btree (value_class)
 field_value_idx btree (value)

 item table:
 Indexes:
 item_pk PRIMARY KEY, btree (id)
 item_created_by_id btree (created_by_id)
 item_folder btree (folder_id)
 item_name btree (name)

 and yes, the 'id' column is always of type character varying

  And you don't have a 7.4 install around to compare the plans do you?

 I have a 7.3.19 db, if that would be useful

Any insight given the above?

-- 
Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778
http://doug.hunley.homeip.net

It is our moral duty to corrupt the young



Re: [PERFORM] weird query plan

2007-06-06 Thread Tom Lane
weiping [EMAIL PROTECTED] writes:
 - Index Scan using urlusermaps_userid on wd_urlusermaps a
 (cost=0.00..6750.55 rows=1094 width=96) (actual time=0.544..5.616
 rows=41 loops=1)
 Index Cond: (userid = 219177)
 Filter: (share = 1)

 the userid=219177 got 2000+ record and around 40 shared=1, why above 2 query
 shows so much difference?

Probably because the rowcount estimate is so far off (1094 vs 41).

Possibly boosting the statistics target would help.
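
As a sketch of that suggestion against the schema posted earlier (the column
choices and value are only illustrative; 1000 is the upper limit for a
statistics target on 8.2, and a fresh ANALYZE is needed before the planner
sees the new statistics):

  ALTER TABLE wd_urlusermaps ALTER COLUMN userid SET STATISTICS 1000;
  ALTER TABLE wd_urlusermaps ALTER COLUMN share SET STATISTICS 1000;
  ANALYZE wd_urlusermaps;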

regards, tom lane



Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Scott Marlowe

[EMAIL PROTECTED] wrote:

On Tue, 5 Jun 2007, Tom Lane wrote:


[EMAIL PROTECTED] writes:

however I really don't understand why it is more efficient to have a 5B-row
table that you do a report/query against 0.1% of than it is to have
1000 different tables of 5M rows each and do a report/query against 100%
of.


Essentially what you are doing when you do that is taking the top few
levels of the index out of the database and putting it into the
filesystem; plus creating duplicative indexing information in the
database's system catalogs.

The degree to which this is a win is *highly* debatable, and certainly
depends on a whole lot of assumptions about filesystem performance.
You also need to assume that constraint-exclusion in the planner is
pretty doggone cheap relative to the table searches, which means it
almost certainly will lose badly if you carry the subdivision out to
the extent that the individual tables become small.  (This last could

  ^^
what is considered 'small'? a few thousand records, a few million 
records?


I would say small is when the individual tables are in the 10 to 20 
Megabyte range.  How many records that is depends on record width, of 
course.  Basically, it's when the tables get small enough that you don't 
really need indexes much, since you tend to grab 25% or more of each one 
that you're going to hit in a query.


what multiplication factor would there need to be on the partitioning 
to make it worthwhile? 100 tables, 1000 tables, 10,000 tables?
Really depends on the size of the master table I think.  If the master 
table is about 500 Megs in size, and you partition it down to about 1 
meg per child table, you're probably ok.  Walking through 500 entries 
for constraint exclusion seems pretty speedy from the tests I've run on 
a 12M row table that was about 250 Megs, split into 200 to 400 or so 
equisized child tables.  The time to retrieve 85,000 rows that were all 
neighbors went from 2 to 6 seconds, to about 0.2 seconds, and we got rid 
of indexes entirely since they weren't really needed anymore.


the company that I'm at started out with a separate database per 
customer (not using postgres), there are basically zero cross-customer 
queries, with a large volume of updates and lookups.


overall things have now grown to millions of updates/day (some 
multiple of this in lookups), and ~2000 customers, with tens of 
millions of rows between them.


having each one as a separate database has really helped us over the 
years as it's made it easy to scale (run 500 databases on each server 
instead of 1000, performance just doubled)
I think that for what you're doing, partitioning at the database level 
is probably a pretty good compromise solution.  Like you say, it's easy 
to put busy databases on a new server to balance out the load.  Hardware 
is cheap.


various people (not database experts) are pushing to install Oracle 
cluster so that they can move all of these to one table with a 
customerID column.
Have these people identified a particular problem they're trying to 
solve, or is this a religious issue for them?  From your description it 
sounds like a matter of dogma, not problem solving.
the database folks won't comment much on this either way, but they 
don't seem enthusiastic to combine all the data togeather.
I think they can see the fecal matter heading towards the rotational 
cooling device on this one.  I can't imagine this being a win from the 
perspective of saving the company money.





Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Scott Marlowe

Tom Lane wrote:

The degree to which this is a win is *highly* debatable, and certainly
depends on a whole lot of assumptions about filesystem performance.
You also need to assume that constraint-exclusion in the planner is
pretty doggone cheap relative to the table searches, which means it
almost certainly will lose badly if you carry the subdivision out to
the extent that the individual tables become small.  (This last could
be improved in some cases if we had a more explicit representation of
partitioning, but it'll never be as cheap as one more level of index
search.)
I did some testing a while back on some of this, and with 400 or so 
partitions, the select time was still very fast.


We were testing grabbing 50-80k rows from 12M at a time, all adjacent to 
each other.  With the one big table and one big two way index method, we 
were getting linearly increasing select times as the dataset grew larger 
and larger.  The indexes were much larger than available memory and 
shared buffers.  The retrieval time for 50-80k rows was on the order of 
2 to 6 seconds, while the retrieval time for the same number of rows 
with 400 partitions was about 0.2 to 0.5 seconds.


I haven't tested with more partitions than that, but might if I get a 
chance.  What was really slow was the inserts since I was using rules at 
the time.  I'd like to try re-writing it to use triggers, since I would 
then have one trigger on the parent table instead of 400 rules.  Or I 
could embed the rules into the app that was creating / inserting the 
data.  The insert performance dropped off VERY fast as I went over 100 
rules, and that was what primarily stopped me from testing larger 
numbers of partitions.


The select performance stayed very fast with more partitions, so I'm 
guessing that the constraint exclusion is pretty well optimized.


I'll play with it some more when I get a chance.  For certain operations 
like the one we were testing, partitioning seems to pay off big time.
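
For readers who haven't set this up before, a minimal sketch of the kind of
partitioning being discussed (table and column names are made up; 8.x-style
inheritance with CHECK constraints, with constraint_exclusion turned on so the
planner can skip non-matching children; inserts are routed with rules or, as
mentioned above, a single trigger on the parent):

  CREATE TABLE measurements (id integer, logged_at timestamp, value integer);
  CREATE TABLE measurements_2007_06 (
      CHECK (logged_at >= '2007-06-01' AND logged_at < '2007-07-01')
  ) INHERITS (measurements);
  -- ...one child table per range, created the same way...
  SET constraint_exclusion = on;
  EXPLAIN SELECT * FROM measurements
   WHERE logged_at >= '2007-06-10' AND logged_at < '2007-06-11';
  -- only measurements_2007_06 (plus the empty parent) should be scanned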




Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Craig James

[EMAIL PROTECTED] wrote:
various people (not database experts) are pushing to install Oracle 
cluster so that they can move all of these to one table with a 
customerID column.


They're blowing smoke if they think Oracle can do this.  One of my applications 
had this exact same problem -- table-per-customer versus 
big-table-for-everyone.  Oracle fell over dead, even with the best indexing 
possible, tuned by the experts, and using partitions keyed to the customerID.

We ended up breaking it up into table-per-customer because Oracle fell over 
dead when we had to do a big update on a customer's entire dataset.  All other 
operations were slowed by the additional index on the customer-ID, especially 
complex joins.  With a table-for-everyone, you're forced to create tricky 
partitioning or clustering, clever indexes, and even with that, big updates are 
problematic.  And once you do this, then you become heavily tied to one RDBMS 
and your applications are no longer portable, because clustering, indexing, 
partitioning and other DB tuning tricks are very specific to each RDBMS.

When we moved to Postgres, we never revisited this issue, because both Oracle 
and Postgres are able to handle thousands of tables well.  As I wrote in a 
previous message on a different topic, often the design of your application is 
more important than the performance.  In our case, the table-per-customer makes 
the applications simpler, and security is MUCH easier.

Oracle is simply not better than Postgres in this regard.  As far as I know, 
there is only one specific situation (discussed frequently here) where Oracle 
is faster: the count(), min() and max() functions, and I know significant 
progress has been made since I started using Postgres.  I have not found any 
other query where Oracle is significantly better, and I've found several where 
Postgres is the clear winner.

It's telling that Oracle's license contract prohibits you from publishing 
comparisons and benchmarks.  You have to wonder why.

Craig



Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Scott Marlowe

Craig James wrote:


Oracle is simply not better than Postgres in this regard.  As far as I 
know, there is only one specific situation (discussed frequently here) 
where Oracle is faster: the count(), min() and max() functions, and I 
know significant progress has been made since I started using 
Postgres.  I have not found any other query where Oracle is 
significantly better, and I've found several where Postgres is the 
clear winner. 
In my testing between a commercial database that cannot be named and 
postgresql, I found max() / min() to be basically the same, even with 
where clauses and joins happening.


count(*), OTOH, is still a clear winner for the big commercial 
database.  With smaller sets (1 Million or so) both dbs are in the same 
ballpark.


With 30+million rows, count(*) took 2 minutes on pgsql and 4 seconds on 
the big database.


OTOH, there are some things, like importing data, which are MUCH faster 
in pgsql than in the big database.




Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Craig James

Scott Marlowe wrote:
OTOH, there are some things, like importing data, which are MUCH faster 
in pgsql than in the big database.


An excellent point, I forgot about this. The COPY command is the best thing 
since the invention of a shirt pocket.  We have a database-per-customer design, 
and one of the monstrous advantages of Postgres is that we can easily do 
backups.  A pg_dump, then scp to a backup server, and in just a minute or two 
we have a full backup.  For recovery, pg_restore is equally fast and amazing.  
Last time I checked, Oracle didn't have anything close to this.

Craig





Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Jonah H. Harris

On 6/6/07, Craig James [EMAIL PROTECTED] wrote:

They're blowing smoke if they think Oracle can do this.


Oracle could handle this fine.


Oracle fell over dead, even with the best indexing possible,
tuned by the experts, and using partitions keyed to the
customerID.


I don't think so, whoever tuned this likely didn't know what they were doing.


It's telling that Oracle's license contract prohibits you from
publishing comparisons and benchmarks.  You have to wonder why.


They did this for the same reason as everyone else.  They don't want
non-experts tuning the database incorrectly, writing a benchmark paper
about it, and making the software look bad.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/



Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Jonah H. Harris

On 6/6/07, Craig James [EMAIL PROTECTED] wrote:

Last time I checked, Oracle didn't have anything close to this.


When did you check, 15 years ago?  Oracle has direct-path
import/export and data pump; both of which make generic COPY look like
a turtle.  The new PostgreSQL bulk-loader takes similar concepts from
Oracle and is a fair bit faster than COPY.

Don't get me wrong, I'm pro-PostgreSQL... but spouting personal
observations on other databases as facts just bolsters a
PostgreSQL-centric, egotistical view of the world.  If you don't tune
Oracle, it will suck.  If you don't understand Oracle architecture
when you tune an application, it will suck; just like PostgreSQL.
People who don't have extensive experience in the other databases just
hear what you say and regurgitate it as fact, which it is not.

Look at how many people in these lists still go on and on about MySQL
flaws based on their experience with MySQL 3.23.  Times change and it
doesn't do anyone any good to be ignorant of other databases.  If
you're going to speak about another database in a comparison, please
stay current or specify the database you're comparing against.

This is nothing against you, but it always starts an avalanche of "look
how perfect we are compared to everyone else."

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/



Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Andrew Sullivan
On Wed, Jun 06, 2007 at 12:06:09AM +0200, Steinar H. Gunderson wrote:

 Wasn't there also talk about adding the ability to mark individual
 partitions as read-only, thus bypassing MVCC and allowing queries
 to be satisfied using indexes only?

I have a (different) problem that read-only data segments (maybe
partitions, maybe something else) would help, so I know for sure that
someone is working on a problem like this, but I don't think it's the
sort of thing that's going to come any time soon.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
I remember when computers were frustrating because they *did* exactly what 
you told them to.  That actually seems sort of quaint now.
--J.D. Baldwin



[PERFORM] VERY slow queries at random

2007-06-06 Thread Gunther Mayer

Hi there,

We run a small ISP with a FreeBSD/freeradius/postgresql 8.2.4 backend
and 200+ users. Authentication happens via UAM/hotspot and I see a lot
of authorisation and accounting packets that are handled via PL/PGSQL
functions directly in the database.

Everything seems to work 100% except that a few times a day I see

Jun  6 10:41:31 caligula postgres[57347]: [4-1] radiususer: LOG:
duration: 19929.291 ms  statement: SELECT fn_accounting_start(...)

in my logs. I'm logging slow queries with log_min_duration_statement =
500 in my postgresql.conf. Sometimes another query runs equally slow or
even slower (I've seen 139 seconds!!!) a few minutes before or after as
well, but then everything is back to normal.

Even though I haven't yet indexed my data I know that the system is
performant because my largest table (the accounting one) only has 5000+
rows, the entire database is only a few MB's and I have plenty of memory
(2GB), shared_buffers = 100MB and max_fsm_pages = 179200.  Also from
briefly enabling

log_parser_stats = on
log_planner_stats = on
log_executor_stats = on

I saw that most queries are 100% satisfied from cache so the disk
doesn't even get hit. Finally, the problem seems unrelated to load
because it happens at 4am just as likely as at peak traffic time.

What the heck could cause such erratic behaviour? I suspect some type of
resource problem but what and how could I dig deeper?

Gunther




Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Andrew Sullivan
On Tue, Jun 05, 2007 at 03:31:55PM -0700, [EMAIL PROTECTED] wrote:
 various people (not database experts) are pushing to install Oracle 
 cluster so that they can move all of these to one table with a customerID 
 column.

Well, you will always have to deal with the sort of people who will
base their technical prescriptions on the shiny ads they read in
SuperGlobalNetworkedExecutiveGoFast, or whatever rag they're reading
these days.  I usually encourage such people actually to perform the
analysis of the license, salary, contingency, and migration costs
(and do a similar analysis myself, actually, so when they have
overlooked the 30 things that individually cost $1million a piece, I
can point them out).  More than one jaw has had to be picked up off
the floor when presented with the bill for RAC.  Frequently, people
discover that it is a good way to turn your tidy money-making
enterprise into a giant money hole that produces a sucking sound on
the other end of which is Oracle Corporation.  

All of that aside, I have pretty severe doubts that RAC would be a
win for you.  A big honkin' single database in Postgres ought to be
able to do this too, if you throw enough hardware money at it.  But
it seems a waste to re-implement something that's already apparently
working for you in favour of something more expensive that you don't
seem to need.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
When my information changes, I alter my conclusions.  What do you do sir?
--attr. John Maynard Keynes



Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Jonah H. Harris

On 6/6/07, Andrew Sullivan [EMAIL PROTECTED] wrote:

Well, you will always have to deal with the sort of people who will
base their technical prescriptions on the shiny ads they read in
SuperGlobalNetworkedExecutiveGoFast, or whatever rag they're reading
these days.


Always.


I usually encourage such people actually to perform the
analysis of the license, salary, contingency, and migrations costs


Yes, this is the best way.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/



control of benchmarks (was: [PERFORM] Thousands of tables)

2007-06-06 Thread Andrew Sullivan
On Wed, Jun 06, 2007 at 02:01:59PM -0400, Jonah H. Harris wrote:
 They did this for the same reason as everyone else.  They don't want
 non-experts tuning the database incorrectly, writing a benchmark paper
 about it, and making the software look bad.

I agree that Oracle is a fine system, and I have my doubts about the
likelihood Oracle will fall over under fairly heavy loads.  But I
think the above is giving Oracle Corp a little too much credit. 

Corporations exist to make money, and the reason they prohibit doing
anything with their software and then publishing it without their
approval is because they want to control all the public perception of
their software, whether deserved or not.  Every user of any large
software system (Oracle or otherwise) has their favourite horror
story about the grotty corners of that software;
commercially-licensed people just aren't allowed to prove it in
public.  It's not only the clueless Oracle is protecting themselves
against; it's also the smart, accurate, but expensive corner-case
testers.  I get to complain that PostgreSQL is mostly fast but has
terrible outlier performance problems.  I can think of another system
that I've used that certainly had a similar issue, but I couldn't
show you the data to prove it.  Everyone who used it knew about it,
though.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
--Alexander Hamilton



Re: [PERFORM] VERY slow queries at random

2007-06-06 Thread Andrew Sullivan
On Wed, Jun 06, 2007 at 09:20:54PM +0200, Gunther Mayer wrote:
 
 What the heck could cause such erratic behaviour? I suspect some type of
 resource problem but what and how could I dig deeper?

Is something (perhaps implicitly) locking the table?  That will cause
this.
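
A sketch of one way to check for that the next time a query stalls (using the
8.2-era catalog columns, hence procpid and current_query):

  SELECT l.pid, l.relation::regclass, l.mode, l.granted, a.current_query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.procpid = l.pid
   WHERE NOT l.granted;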

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The year's penultimate month is not in truth a good way of saying
November.
--H.W. Fowler



Re: control of benchmarks (was: [PERFORM] Thousands of tables)

2007-06-06 Thread Jonah H. Harris

On 6/6/07, Andrew Sullivan [EMAIL PROTECTED] wrote:

But I think the above is giving Oracle Corp a little too
much credit.


Perhaps.  However, Oracle has a thousand or so knobs which can control
almost every aspect of every subsystem.  If you know how they interact
with each other and how to use them properly, they can make a huge
difference in performance.  Most people do not know all the knobs or
understand what difference each can make given the theory and
architecture of the system, which results in poor general
configurations.  Arguably, there is a cost associated with having
someone staffed and/or consulted that has the depth of knowledge
required to tune it in such a manner which goes back to a basic
cost/benefit analysis.

Oracle, while seeming like a one-size-fits-all system, has the same
basic issue as PostgreSQL and everyone else; to get optimum
performance, it has to be tuned specifically for the
application/workload at hand.


Corporations exist to make money, and the reason they prohibit doing
anything with their software and then publishing it without their
approval is because they want to control all the public perception of
their software, whether deserved or not.


Of course.  Which is why audited benchmarks like SPEC and TPC are
around.  While they may not represent one's particular workload, they
are the only way to fairly demonstrate comparable performance.


Every user of any large software system (Oracle or otherwise)
has their favourite horror story about the grotty corners of
that software;


Of course, but they also never say why it was caused.  With Oracle,
almost all bad-performance cases I've seen are related to improper
tuning and/or hardware; even by experienced DBAs.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/



Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Craig James

Jonah H. Harris wrote:

On 6/6/07, Craig James [EMAIL PROTECTED] wrote:

They're blowing smoke if they think Oracle can do this.


Oracle could handle this fine.


Oracle fell over dead, even with the best indexing possible,
tuned by the experts, and using partitions keyed to the
customerID.


I don't think so, whoever tuned this likely didn't know what they were 
doing.


Wrong on both counts.

You didn't read my message.  I said that *BOTH* Oracle and Postgres performed 
well with table-per-customer.  I wasn't Oracle bashing.  In fact, I was doing 
the opposite: Someone's coworker claimed ORACLE was the miracle cure for all 
problems, and I was simply pointing out that there are no miracle cures.  (I 
prefer Postgres for many reasons, but Oracle is a fine RDBMS that I have used 
extensively.)

The technical question is simple: Table-per-customer or big-table-for-everyone.  The 
answer is, it depends.  It depends on your application, your 
read-versus-write ratio, the table size, the design of your application software, and a 
dozen other factors.  There is no simple answer, but there are important technical 
insights which, I'm happy to report, various people contributed to this discussion.  
Perhaps you have some technical insight too, because it really is an important question.

The reason I assert (and stand by this) that "they're blowing smoke" when they 
claim Oracle has the magic cure is that Oracle and Postgres are both relational 
databases: they write their data to disks, and they both have indexes with O(log(N)) 
retrieval/update times.  Oracle doesn't have a magical workaround to these facts, nor 
does Postgres.

Craig



Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Jonah H. Harris

On 6/6/07, Craig James [EMAIL PROTECTED] wrote:

You didn't read my message.  I said that *BOTH* Oracle
and Postgres performed well with table-per-customer.


Yes, I did.  My belief is that Oracle can handle all customers in a
single table.


The technical question is simple: Table-per-customer or
big-table-for-everyone.  The answer is, it depends.


I agree, it does depend on the data, workload, etc.  No
one-size-fits-all answer there.


The reason I assert (and stand by this) that They're
blowing smoke when they claim Oracle has the magic
cure, is because Oracle and Postgres are both relational
databases, they write their data to disks, and they both
have indexes with O(log(N)) retrieval/update times.  Oracle
doesn't have a magical workaround to these facts,
nor does Postgres.


Agreed that they are similar on the basics, but they do use
significantly different algorithms and optimizations.  Likewise, there
is more tuning that can be done with Oracle given the amount of time
and money one has to spend on it.  Again, cost/benefit analysis on
this type of an issue... but you're right, there is no magic cure.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/



[PERFORM] Is this true?

2007-06-06 Thread Chris Hoover

Question,

Does (pg_stat_get_db_blocks_fetched(oid) - pg_stat_get_db_blocks_hit(oid)) * 8 =
the number of KB read from disk for the listed database since the last server
startup?

Thanks,

Chris


Re: [PERFORM] VERY slow queries at random

2007-06-06 Thread Scott Marlowe

Gunther Mayer wrote:

What the heck could cause such erratic behaviour? I suspect some type of
resource problem but what and how could I dig deeper? 


Maybe your hard drive is set to spin down after a certain period of 
idle, and since almost all your data is coming from memory, it might 
be that on the rare occasion when it needs to hit the drive it's not 
spun up anymore.


Maybe some other process is cranking up (cron jobs???) that are chewing 
up all your I/O bandwidth?


Hard to say.  Anything in the system logs that would give you a hint?  
Try correlating them by the time of the slow pgsql queries.





Re: [PERFORM] Is this true?

2007-06-06 Thread Jeff Davis
On Wed, 2007-06-06 at 16:58 -0400, Chris Hoover wrote:
 Question,
 
 Does (pg_stat_get_db_blocks_fetched(oid)-pg_stat_get_db_blocks_hit
 (oid)*8) = number of KB read from disk for the listed database since
 the last server startup?

That will give you the amount of data requested from the OS. The OS
does its own caching, so many of those reads might come from the OS
buffer cache and not the disk itself.

Also, if you're concerned with the number since the last server restart,
make sure you have stats_reset_on_server_start set appropriately.
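
A sketch of the per-database arithmetic with the parentheses as intended (the
default block size is 8 kB, so this is kilobytes requested from the OS, not
necessarily read from disk):

  SELECT datname,
         (pg_stat_get_db_blocks_fetched(oid) - pg_stat_get_db_blocks_hit(oid)) * 8
           AS kb_requested_from_os
    FROM pg_database;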

Regards,
Jeff Davis




Re: [PERFORM] LIKE search and performance

2007-06-06 Thread James Mansion

[EMAIL PROTECTED] wrote:

What is a real life example where an intelligent and researched
database application would issue a like or ilike query as their
primary condition in a situation where they expected very high
selectivity?
  
In my case the canonical example is to search against textual keys where the
search is performed automatically if the user has typed enough data and paused.
In almost all cases the '%' trails, and I'm looking for 'starts with' in effect.
Usually the search will have a specified upper number of returned rows, if
that's an available facility.  I realise in this case that matching against the
index does not allow the match count unless we check MVCC as we go, but I don't
see why another thread can't be doing that.
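
A sketch of that trailing-'%' case with an index (table and column names here
are made up; in a non-C locale it's the text_pattern_ops operator class that
lets LIKE 'abc%' use a btree index):

  CREATE INDEX items_name_prefix_idx ON items (name text_pattern_ops);
  SELECT name FROM items WHERE name LIKE 'abc%' LIMIT 50;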

James




[PERFORM] Weird 8.2.4 performance

2007-06-06 Thread Kurt Overberg

Gang,

I'm running a mid-size production 8.0 environment.  I'd really like  
to upgrade to 8.2, so I've been doing some testing to make sure my  
app works well with 8.2, and I ran across this weirdness.  I set up  
and configured 8.2 in the standard way, MacOSX Tiger, current  
patches, download src, configure, make, make install, initdb, start  
the db, create a few users, dump out my 8.0 DB (its about 13 GB raw  
text), load it into 8.2.4, vacuum analyze.


This is a simple query the shows some weird behavior.  I have two  
tables, task and taskinstance.  A taskinstance is tied to a campaign  
through the task table (taskinstance points at task which points at  
campaign).  Very simple.  To select all the taskinstances associated  
with a certain campaign, I use this query:


select id from taskinstance where taskid in (select id from task  
where campaignid = 75);


Now, I know this could (and should) be rewritten to not use the WHERE  
x IN () style, but this is actually a sub-query of a larger query.  
The bigger query was acting slow, and I've narrowed it down to this  
snippet.  Task has a total of ~2000 rows, of which 11 belong to  
campaign 75.  TaskInstance has around 650,000 rows.


This query runs great on production under 8.0 (27ms), but under 8.2.4  
(on my mac) I'm seeing times in excess of 50,000ms.  Note that on  
8.2.4, if I run the query again, it gets successively faster  
(50,000ms -> 6,000ms -> 27ms).  Is this normal?  If I change the  
campaignid from 75 to another number, it jumps back to 50,000ms,  
which leads me to believe that postgresql is somehow caching the  
results of the query and not figuring out a better way to run the query.


Indexes:
Taskinstance has taskid_taskinstance_key btree (taskid)
Task has Task_campaignId_key btree (campaignid)

Explain Outputs:

-- 8.2


explain analyze select id from taskinstance where taskid in (select id from task where campaignid = 75);

                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=37.65..15068.50 rows=2301 width=4) (actual time=99.986..50905.512 rows=881 loops=1)
   ->  HashAggregate  (cost=16.94..17.01 rows=7 width=4) (actual time=0.213..0.236 rows=9 loops=1)
         ->  Index Scan using Task_campaignId_key on task  (cost=0.00..16.93 rows=7 width=4) (actual time=0.091..0.197 rows=9 loops=1)
               Index Cond: (campaignid = 76)
   ->  Bitmap Heap Scan on taskinstance  (cost=20.71..2143.26 rows=556 width=8) (actual time=421.423..5655.745 rows=98 loops=9)
         Recheck Cond: (taskinstance.taskid = task.id)
         ->  Bitmap Index Scan on taskid_taskinstance_key  (cost=0.00..20.57 rows=556 width=0) (actual time=54.709..54.709 rows=196 loops=9)
               Index Cond: (taskinstance.taskid = task.id)
 Total runtime: 50907.264 ms
(9 rows)



-- 8.0

 explain analyze select id from taskinstance where taskid in (select id from task where campaignid = 75);

                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=13.70..17288.28 rows=2640 width=4) (actual time=0.188..21.496 rows=1599 loops=1)
   ->  HashAggregate  (cost=13.70..13.70 rows=8 width=4) (actual time=0.153..0.217 rows=11 loops=1)
         ->  Index Scan using Task_campaignId_key on task  (cost=0.00..13.68 rows=8 width=4) (actual time=0.026..0.082 rows=11 loops=1)
               Index Cond: (campaignid = 75)
   ->  Index Scan using taskid_taskinstance_key on taskinstance  (cost=0.00..2152.28 rows=563 width=8) (actual time=0.012..0.832 rows=145 loops=11)
         Index Cond: (taskinstance.taskid = "outer".id)
 Total runtime: 27.406 ms
(7 rows)

The weird thing is that on 8.2, I don't see any sequential scans  
taking place; it seems to be properly using the indexes.


If anyone has any ideas, I'd appreciate your thoughts.  This one has  
got me boggled.  If I can provide any more information that would  
helpful, please let me know.


Thanks for any light you could shed on my situation!

/kurt




Re: [PERFORM] Weird 8.2.4 performance

2007-06-06 Thread Mark Kirkwood

Kurt Overberg wrote:


Explain Outputs:

-- 8.2




   -  Bitmap Heap Scan on taskinstance  (cost=20.71..2143.26 rows=556 
width=8) (actual time=421.423..5655.745 rows=98 loops=9)

 Recheck Cond: (taskinstance.taskid = task.id)
 -  Bitmap Index Scan on taskid_taskinstance_key  
(cost=0.00..20.57 rows=556 width=0) (actual time=54.709..54.709 rows=196 
loops=9)



-- 8.0



   -  Index Scan using taskid_taskinstance_key on taskinstance  
(cost=0.00..2152.28 rows=563 width=8) (actual time=0.012..0.832 rows=145 
loops=11)



8.2 is deciding to use a bitmap index scan on taskid_taskinstance_key, 
which seems to be slower (!) than a plain old index scan that 8.0 is 
using. A dirty work around is to disable bitmap scans via:


SET enable_bitmapscan=off

but it is probably worthwhile to try to find out *why* the bitmap scan 
is 1) slow and 2) chosen at all given 1).


One thought that comes to mind - is work_mem smaller on your 8.2 system 
than the 8.0 one? (or in fact is it very small on both?). Also it might 
be interesting to see your non-default postgresql.conf settings for both 
systems.


Cheers

Mark



Re: [PERFORM] Weird 8.2.4 performance

2007-06-06 Thread Michael Glaesemann

On Jun 6, 2007, at 18:27 , Kurt Overberg wrote:

select id from taskinstance where taskid in (select id from task  
where campaignid = 75);


Now, I know this could (and should) be rewritten to not use the  
WHERE x IN () style, but this is actually a sub-query to a larger  
query.


Granted, it won't explain why this particular query is slower in 8.2,  
but it shouldn't be too hard to drop in something like


SELECT id
FROM taskinstance
NATURAL JOIN (
    SELECT id AS taskid, campaignid
    FROM task) t
WHERE campaignid = 75;

AIUI, the planner can sometimes rewrite IN as a join, but I don't  
know whether or not that is happening in this case. I'm guessing  
not, as I see nested loops in the plans. (I'm a novice at reading  
plans, so take this with at least a teaspoon of salt. :) )


if I run the query again, it gets successively faster (50,000ms ->  
6,000ms -> 27ms).  Is this normal?  If I change the campaignid from  
75 to another number, it jumps back to 50,000ms, which leads me to  
believe that postgresql is somehow caching the results of the query  
and not figuring out a better way to run the query.


As the query is repeated, the associated rows are probably already in  
memory, leading to the speedups you're seeing.



-- 8.2



 Recheck Cond: (taskinstance.taskid = task.id)
 -  Bitmap Index Scan on taskid_taskinstance_key   
(cost=0.00..20.57 rows=556 width=0) (actual time=54.709..54.709  
rows=196 loops=9)

   Index Cond: (taskinstance.taskid = task.id)




-- 8.0


   -  Index Scan using taskid_taskinstance_key on taskinstance   
(cost=0.00..2152.28 rows=563 width=8) (actual time=0.012..0.832  
rows=145 loops=11)

 Index Cond: (taskinstance.taskid = outer.id)


I see that the row estimates in both of the query plans are off a  
little. Perhaps increasing the statistics would help? Also, you can  
see that 8.2 is using bitmap scans, which aren't available in 8.0.  
Perhaps try setting enable_bitmapscan off and running the query again  
to see if there's a performance difference.
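
For example, a per-session experiment along those lines (using Kurt's query;
nothing here is permanent):

  SET enable_bitmapscan = off;
  EXPLAIN ANALYZE SELECT id FROM taskinstance
   WHERE taskid IN (SELECT id FROM task WHERE campaignid = 75);
  RESET enable_bitmapscan;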


The weird thing is that on 8.2, I don't see any sequential scans  
taking place, it seems to be properly using the indexes.


As an aside, whether the planner decides to use a sequential scan or  
an index has more to do with the particular query: indexes are not a  
guaranteed performance win.


Hope this helps a bit.

Michael Glaesemann
grzm seespotcode net


