[PERFORM] Tsearch2 really slower than ilike ?

2004-11-16 Thread Hervé Piedvache
Hi,

I'm completely disappointed with Tsearch2 ...

I have a table like this:
                                   Table "public.site"
    Column     |            Type             |                         Modifiers
---------------+-----------------------------+------------------------------------------------------------
 id_site       | integer                     | not null default nextval('public.site_id_site_seq'::text)
 site_name     | text                        |
 site_url      | text                        |
 url           | text                        |
 language      | text                        |
 datecrea      | date                        | default now()
 id_category   | integer                     |
 time_refresh  | integer                     |
 active        | integer                     |
 error         | integer                     |
 description   | text                        |
 version       | text                        |
 idx_site_name | tsvector                    |
 lastcheck     | date                        |
 lastupdate    | timestamp without time zone |
Indexes:
    "site_id_site_key" unique, btree (id_site)
    "ix_idx_site_name" gist (idx_site_name)
Triggers:
    tsvectorupdate_site_name BEFORE INSERT OR UPDATE ON site FOR EACH ROW
        EXECUTE PROCEDURE tsearch2('idx_site_name', 'site_name')

I have 183 956 records in the database ...

SELECT s.site_name, s.id_site, s.description, s.site_url,
       case when exists (select id_user
                           from user_choice u
                          where u.id_site = s.id_site
                            and u.id_user = 1) then 1
            else 0 end as bookmarked
  FROM site s
 WHERE s.idx_site_name @@ to_tsquery('atari');

Explain Analyze :
   QUERY PLAN
--
 Index Scan using ix_idx_site_name on site s  (cost=0.00..1202.12 rows=184 width=158) (actual time=4687.674..4698.422 rows=1 loops=1)
   Index Cond: (idx_site_name @@ '\'atari\''::tsquery)
   Filter: (idx_site_name @@ '\'atari\''::tsquery)
   SubPlan
     ->  Seq Scan on user_choice u  (cost=0.00..3.46 rows=1 width=4) (actual time=0.232..0.232 rows=0 loops=1)
           Filter: ((id_site = $0) AND (id_user = 1))
 Total runtime: 4698.608 ms

The first time I run the query I get a result in about 28 seconds.

SELECT s.site_name, s.id_site, s.description, s.site_url,
       case when exists (select id_user
                           from user_choice u
                          where u.id_site = s.id_site
                            and u.id_user = 1) then 1
            else 0 end as bookmarked
  FROM site_rss s
 WHERE s.site_name ilike '%atari%';

   QUERY PLAN

 Seq Scan on site_rss s  (cost=0.00..11863.16 rows=295 width=158) (actual time=17.414..791.937 rows=12 loops=1)
   Filter: (site_name ~~* '%atari%'::text)
   SubPlan
     ->  Seq Scan on user_choice u  (cost=0.00..3.46 rows=1 width=4) (actual time=0.222..0.222 rows=0 loops=12)
           Filter: ((id_site = $0) AND (id_user = 1))
 Total runtime: 792.099 ms

The first time I run this query I get a result in about 789 milliseconds!!???

I'm using PostgreSQL v7.4.6 on a dual Pentium III 933 MHz with 1 GB of RAM.

Any idea? For the moment I'm going back to the ilike solution ... but I
really thought Tsearch2 would be a better solution ...

Regards,
-- 
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902



Re: [PERFORM] Tsearch2 really slower than ilike ?

2004-11-16 Thread Michael Fuhr
On Tue, Nov 16, 2004 at 03:55:58PM +0100, Hervé Piedvache wrote:

 WHERE s.idx_site_name @@ to_tsquery('atari');

How much text does each site_name field contain?  From the field
name I'd guess only a few words.  Based on my own experience, if
the fields were documents containing thousands of words then I'd
expect tsearch2 to be faster than ILIKE by an order of magnitude
or more.
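
(A quick, hedged way to check that guess against the table from the original
post: the average and maximum length of the indexed field.)

SELECT avg(length(site_name)) AS avg_chars,
       max(length(site_name)) AS max_chars
  FROM site;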

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: [PERFORM] Tsearch2 really slower than ilike ?

2004-11-16 Thread Hervé Piedvache
Michael,

On Tuesday 16 November 2004 at 16:32, Michael Fuhr wrote:
 On Tue, Nov 16, 2004 at 03:55:58PM +0100, Hervé Piedvache wrote:
  WHERE s.idx_site_name @@ to_tsquery('atari');

 How much text does each site_name field contain?  From the field
 name I'd guess only a few words.  Based on my own experience, if
 the fields were documents containing thousands of words then I'd
 expect tsearch2 to be faster than ILIKE by an order of magnitude
 or more.

Yes, site_name holds company names or web site names ... so not many words in 
each record ... but I don't understand why more words would be more efficient 
than few words?? Sorry ...

Regards,
-- 
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902



Re: [PERFORM] Tsearch2 really slower than ilike ?

2004-11-16 Thread Joshua D. Drake

Yes, site_name holds company names or web site names ... so not many words in 
each record ... but I don't understand why more words would be more efficient 
than few words?? Sorry ...
 

Well, there are a couple of reasons, but the easiest one is index size.
A btree index (for ILIKE) is in general going to be much smaller than a GiST
index, and the smaller the index, the faster it is searched.
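
(For illustration only: the btree index below is hypothetical, added just for
the size comparison. The relative on-disk size of the two index types can be
read from pg_class.relpages, where each page is 8 kB by default.)

CREATE INDEX site_name_btree ON site (site_name);  -- hypothetical btree, for comparison only
VACUUM ANALYZE site;                               -- refresh relpages

SELECT relname, relpages, relpages * 8 AS approx_kb
  FROM pg_class
 WHERE relname IN ('site_name_btree', 'ix_idx_site_name');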

Sincerely,
Joshua D. Drake



--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL




Re: [PERFORM] Tsearch2 really slower than ilike ?

2004-11-16 Thread Hervé Piedvache
On Tuesday 16 November 2004 at 17:06, Joshua D. Drake wrote:
 Well I would be curious about what happens the second time you run the query.
 The first time is kind of a bad example because it has to push the index into RAM.

The second time it is really quicker, yes ... about 312 milliseconds ...
But each search I run after that takes about 3 or 4 seconds ...
So what can I do?

Regards,
-- 
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902



Re: [PERFORM] Tsearch2 really slower than ilike ?

2004-11-16 Thread Oleg Bartunov
Could you provide me with a dump of your table (just the id and tsvector
columns), so I could try it on my computer? Also, a plain query (simple and
clean) which demonstrates your problem would be preferred next time!
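
(One way to produce such a two-column dump, sketched here as an assumption,
using a scratch table named tt as in the test further down; the database name
is a placeholder:)

CREATE TABLE tt AS SELECT id_site, idx_site_name AS tt FROM site;
-- then, from the shell:
--   pg_dump -t tt -f tt.dump your_database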
   Oleg
Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: [PERFORM] Tsearch2 really slower than ilike ?

2004-11-16 Thread Oleg Bartunov
On Tue, 16 Nov 2004, Joshua D. Drake wrote:


Well, there are a couple of reasons, but the easiest one is index size.
A btree index (for ILIKE) is in general going to be much smaller than a GiST
index, and the smaller the index, the faster it is searched.
For single-word queries, @@ should have the same performance as ilike with the
index disabled, and it should be better for complex queries.
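
(An example of the kind of complex query meant here; the extra search terms
are invented for illustration. A single tsquery combines the terms, while
ILIKE needs one pattern test per term.)

-- one tsquery, evaluated against the GiST index:
SELECT site_name FROM site
 WHERE idx_site_name @@ to_tsquery('atari & (game | console)');

-- rough ILIKE equivalent: three separate pattern tests over the raw text
SELECT site_name FROM site
 WHERE site_name ILIKE '%atari%'
   AND (site_name ILIKE '%game%' OR site_name ILIKE '%console%');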



Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: [PERFORM] mis-estimation on data-warehouse aggregate creation

2004-11-16 Thread Kris Jurka


On Tue, 16 Nov 2004, F. Senault wrote:

 Let me guess...  You've never run analyze on your tables ?
 

No, I have.  I mentioned that I did in my email, but you can also tell by
the exactly correct guesses for some other plan steps:

->  Seq Scan on period  (cost=0.00..90.88 rows=3288 width=54) (actual time=0.118..12.126 rows=3288 loops=1)

Kris Jurka



[PERFORM] understanding 'explain analyse'

2004-11-16 Thread Alexandre Leclerc
Hello,

I build two SELECT queries, and in one I used COALESCE with a CASE,
and in the second one I used only CASE statements.
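
(For illustration only, with invented column names, the two equivalent forms
being compared might look like this:)

-- form 1: COALESCE wrapping a CASE
SELECT COALESCE(nickname,
                CASE WHEN is_company THEN company_name ELSE last_name END)
  FROM customer;

-- form 2: a single CASE that spells out the NULL test
SELECT CASE WHEN nickname IS NOT NULL THEN nickname
            WHEN is_company THEN company_name
            ELSE last_name
       END
  FROM customer;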

When analysing, I'm getting the exact same plan, except for the cost.
(For now I have so little data that the results are too fragmented.)

If the plans for both queries are exactly the same, should I assume
that the cost will also be the same?

Thanks for any help.
-- 
Alexandre Leclerc



Re: [PERFORM] Tsearch2 really slower than ilike ?

2004-11-16 Thread Oleg Bartunov
OK, I downloaded a dump of the table and here is what I found:
zz=# select count(*) from tt;
 count
--------
 183956
(1 row)

zz=# select * from stat('select tt from tt') order by ndoc desc, nentry desc, word limit 10;
     word     | ndoc  | nentry
--------------+-------+--------
 blog | 12710 |  12835
 weblog   |  4857 |   4859
 news |  4402 |   4594
 life |  4136 |   4160
 world|  1980 |   1986
 journal  |  1882 |   1883
 livejourn|  1737 |   1737
 thought  |  1669 |   1677
 web  |  1154 |   1161
 scotsman.com |  1138 |   1138
(10 rows)

zz=# explain analyze select tt from tt where tt @@  'blog';
  QUERY PLAN 
--
 Index Scan using tt_idx on tt  (cost=0.00..728.83 rows=184 width=32) (actual time=0.047..141.110 rows=12710 loops=1)
   Index Cond: (tt @@ '\'blog\''::tsquery)
   Filter: (tt @@ '\'blog\''::tsquery)
 Total runtime: 154.105 ms
(4 rows)

It's really fast! So I don't understand your problem.
I ran the query on my desktop machine, nothing special.

Oleg

Re: [PERFORM] mis-estimation on data-warehouse aggregate creation

2004-11-16 Thread Simon Riggs
On Tue, 2004-11-16 at 09:10, Kris Jurka wrote:
  By rewriting the JOIN 
 conditions to LEFT JOIN we force the planner to recognize that there will 
 be a match for every row in the sales table:
 

You realise that returns a different answer (or at least it potentially
does, depending upon your data)?

 ->  Hash Join  (cost=4.70..194.23 rows=1 width=12) (actual time=2.675..74.693 rows=3288 loops=1)
       Hash Cond: (("outer".monthnumber = "inner".monthnumber) AND ("outer".monthname = "inner".monthname) AND ("outer".year = "inner".year) AND ("outer".monthyear = "inner".monthyear) AND ("outer".quarter = "inner".quarter) AND ("outer".quarteryear = "inner".quarteryear))
       ->  Seq Scan on period  (cost=0.00..90.88 rows=3288 width=54) (actual time=0.118..12.126 rows=3288 loops=1)
       ->  Hash  (cost=3.08..3.08 rows=108 width=58) (actual time=1.658..1.658 rows=0 loops=1)
             ->  Seq Scan on shd_month  (cost=0.00..3.08 rows=108 width=58) (actual time=0.081..0.947 rows=108 loops=1)

ISTM your trouble starts at that Hash Join:
estimated rows=1, but actual rows=3288

The join condition has so many ANDed predicates that we assume that this
will reduce the selectivity considerably. It does not, and so you pay
the cost dearly later on.

In both plans, the trouble starts at this point.

If you pre-build tables that have only a single join column between the
full.oldids and shrunken.renumberedids then this will most likely work
correctly, since the planner will be able to correctly estimate the join
selectivity. i.e. put product.id onto shd_productline ahead of time, so
you can avoid the complex join.
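
(A hypothetical sketch of that idea: the surrogate-key column names below are
invented, while the join columns are taken from the plan above. Stamp the
dimension's key onto the other table once, so later queries join on a single
equality instead of six ANDed columns.)

ALTER TABLE period ADD COLUMN shd_month_id integer;

UPDATE period
   SET shd_month_id = shd_month.id          -- "id" assumed to be shd_month's key
  FROM shd_month
 WHERE period.monthnumber  = shd_month.monthnumber
   AND period.monthname    = shd_month.monthname
   AND period.year         = shd_month.year
   AND period.monthyear    = shd_month.monthyear
   AND period.quarter      = shd_month.quarter
   AND period.quarteryear  = shd_month.quarteryear;

-- later aggregates can then join with:
--   ... JOIN shd_month m ON m.id = period.shd_month_id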

Setting join_collapse_limit lower doesn't look like it would help, since
the plan already shows joining the sub-queries together first.

-- 
Best Regards, Simon Riggs




[PERFORM] memcached and PostgreSQL

2004-11-16 Thread Michael Adler
http://pugs.postgresql.org/sfpug/archives/21.html

I noticed that some of you left coasters were talking about memcached
and pgsql. I'm curious to know what was discussed.

In reading about memcached, it seems that many people are using it to
circumvent the scalability problems of MySQL (lack of MVCC). 

from their site:

<snip>
Shouldn't the database do this?

Regardless of what database you use (MS-SQL, Oracle, Postgres,
MysQL-InnoDB, etc..), there's a lot of overhead in implementing ACID
properties in a RDBMS, especially when disks are involved, which means
queries are going to block. For databases that aren't ACID-compliant
(like MySQL-MyISAM), that overhead doesn't exist, but reading threads
block on the writing threads. memcached never blocks. 
</snip>

So What does memcached offer pgsql users? It would still seem to offer
the benefit of a multi-machined cache.

-Mike 



[PERFORM] query plan question

2004-11-16 Thread David Parker
I have a query for which postgres is generating a different plan on different 
machines. The database schema is the same, the dataset is the same, the 
configuration is the same (e.g., pg_autovacuum running in both cases), both 
systems are Solaris 9. The main difference in the two systems is that one is 
sparc and the other is intel.

The query runs in about 40 ms on the intel box, but takes about 18 seconds on 
the sparc box. Now, the intel boxes we have are certainly faster, but I'm 
curious why the query plan might be different.

For the intel:

QUERY PLAN
 Unique  (cost=11.50..11.52 rows=2 width=131)
   ->  Sort  (cost=11.50..11.50 rows=2 width=131)
         Sort Key: up.prefix, s.name, s.tuid, s.foundryversion
         ->  Hash Join  (cost=10.42..11.49 rows=2 width=131)
               Hash Cond: ("outer".dbid = "inner".schema)
               ->  Seq Scan on schema s  (cost=0.00..1.02 rows=2 width=128)
               ->  Hash  (cost=10.41..10.41 rows=4 width=11)
                     ->  Nested Loop  (cost=0.00..10.41 rows=4 width=11)
                           ->  Nested Loop  (cost=0.00..2.14 rows=4 width=4)
                                 ->  Seq Scan on flow fl  (cost=0.00..0.00 rows=1 width=4)
                                       Filter: (servicetype = 646)
                                 ->  Index Scan using usage_flow_i on usage u  (cost=0.00..2.06 rows=6 width=8)
                                       Index Cond: (u.flow = "outer".dbid)
                           ->  Index Scan using usageparameter_usage_i on usageparameter up  (cost=0.00..2.06 rows=1 width=15)
                                 Index Cond: (up.usage = "outer".dbid)
                                 Filter: ((prefix)::text <> 'xsd'::text)

For the sparc:

QUERY PLAN
 Unique  (cost=10.81..10.83 rows=1 width=167)
   ->  Sort  (cost=10.81..10.82 rows=1 width=167)
         Sort Key: up.prefix, s.name, s.tuid, s.foundryversion
         ->  Nested Loop  (cost=9.75..10.80 rows=1 width=167)
               Join Filter: ("outer".flow = "inner".dbid)
               ->  Hash Join  (cost=9.75..10.79 rows=1 width=171)
                     Hash Cond: ("outer".dbid = "inner".schema)
                     ->  Seq Scan on schema s  (cost=0.00..1.02 rows=2 width=128)
                     ->  Hash  (cost=9.75..9.75 rows=1 width=51)
                           ->  Nested Loop  (cost=0.00..9.75 rows=1 width=51)
                                 Join Filter: ("inner".usage = "outer".dbid)
                                 ->  Index Scan using usage_flow_i on usage u  (cost=0.00..4.78 rows=1 width=8)
                                 ->  Index Scan using usageparameter_schema_i on usageparameter up  (cost=0.00..4.96 rows=1 width=51)
                                       Filter: ((prefix)::text <> 'xsd'::text)
               ->  Seq Scan on flow fl  (cost=0.00..0.00 rows=1 width=4)
                     Filter: (servicetype = 646)

I assume the problem with the second plan starts with doing a Nested Loop 
rather than a Hash Join at the 4th line of the plan, but I don't know why it 
would be different for the same schema, same dataset.

What factors go into the planner's decision to choose a nested loop over a hash 
join? Should I be looking at adjusting my runtime configuration on the sparc 
box somehow?
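
(One quick, hedged sanity check: since pg_autovacuum may have refreshed
statistics at different times on the two boxes, compare the planner's stored
table statistics and the relevant cost settings on both machines. The table
names come from the plans above; the settings are standard GUCs, shown here
only as things to compare, not as recommended values.)

SELECT relname, relpages, reltuples
  FROM pg_class
 WHERE relname IN ('usage', 'usageparameter', 'flow', 'schema');

SHOW random_page_cost;
SHOW effective_cache_size;
SHOW default_statistics_target;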

Thanks.

- DAP
--
David Parker    Tazz Networks    (401) 709-5130
 



Re: [PERFORM] query plan question

2004-11-16 Thread Russell Smith
On Wed, 17 Nov 2004 02:54 pm, you wrote:
Unique  (cost=11.50..11.52 rows=2 width=131)
Unique  (cost=10.81..10.83 rows=1 width=167)

The estimated costs are basically the same, about 10 units to the first row.
Can you supply EXPLAIN ANALYZE output so we can see what each plan is actually doing?

Russell Smith



[PERFORM] Table Partitions: To Inherit Or Not To Inherit

2004-11-16 Thread Don Drake
I've read the previous thread on the list regarding partitioning
mechanisms and I just wrote a plpgsql function to create the partition
tables (by date) as well as another function used to do the insert (it
determines which table will be inserted).
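
(A minimal sketch of the kind of insert-routing function described here; the
column list and the date column are invented for illustration. The partition
name is derived from the row's date and the insert is issued with EXECUTE.)

CREATE OR REPLACE FUNCTION insert_hourly_report_data(timestamp, integer)
RETURNS void AS '
DECLARE
    ts    ALIAS FOR $1;
    val   ALIAS FOR $2;
    part  text;
BEGIN
    -- e.g. hourly_report_data_2004_11_16
    part := ''hourly_report_data_'' || to_char(ts, ''YYYY_MM_DD'');
    EXECUTE ''INSERT INTO '' || part
         || '' (report_time, value) VALUES (''
         || quote_literal(ts::text) || '', '' || val::text || '')'';
    RETURN;
END;
' LANGUAGE plpgsql;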

The creation of the partition tables uses the inherits clause when
creating.  It creates an exact copy of the table it's inheriting from,
and adds the indexes since inherits doesn't do that for me.

CREATE TABLE hourly_report_data_2004_11_16 () INHERITS (hourly_report_data)

When I query hourly_report_data, the explain plan shows it querying all the
tables that inherit from it.  That's all great.

What's really the difference between this and creating separate tables
with the same column definition without the inherit, and then create a
view to merge them together?

Also, I've run into a snag: I have an hourly_detail table that has a foreign
key to hourly_report_data.  The inherit method above does not honor the
foreign key relationship to the child tables of hourly_report_data, so I
can't insert any data into the hourly_detail table because the constraint
fails.

The hourly_detail table is relatively tiny compared to the enormous
hourly_report_data table, so if I don't have to partition that one I
would rather not.  Any suggestions on this?

Thanks.

-Don

-- 
Donald Drake
President
Drake Consulting
http://www.drakeconsult.com/
312-560-1574



Re: [PERFORM] memcached and PostgreSQL

2004-11-16 Thread Josh Berkus
Michael,

 So What does memcached offer pgsql users? It would still seem to offer
 the benefit of a multi-machined cache.

Yes, and a very, very fast one too ... like, 120,000 operations per second.   
PostgreSQL can't match that because of the overhead of authentication, 
security, transaction visibility checking, etc.   

So memcached becomes a very good place to stick data that's read often but not 
updated often, or alternately data that changes often but is disposable.   An 
example of the former is a user+ACL list; an example of the latter is web 
session information ... or simple materialized views.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Table Partitions: To Inherit Or Not To Inherit

2004-11-16 Thread Josh Berkus
Don,

 What's really the difference between this and creating separate tables
 with the same column definition without the inherit, and then create a
 view to merge them together?

Easier syntax for queries.   If you created completely separate tables and 
UNIONed them together, you'd have to be constantly modifying a VIEW which 
tied the tables together.  With inheritance, you just do SELECT * FROM 
parent_table and it handles finding all the children for you.
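
(For contrast, the hand-maintained alternative would look roughly like this;
the first partition name is invented, and the view has to be recreated every
time a partition is added:)

CREATE OR REPLACE VIEW hourly_report_data_all AS
    SELECT * FROM hourly_report_data_2004_11_15
    UNION ALL
    SELECT * FROM hourly_report_data_2004_11_16;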

 Also, I've run into a snag in that I have a hourly_detail table, that
 has a foreign key to the hourly_report_data.  The inherit method above
 does not honor the foreign key relationship to the children table of
 hourly_report_data.  I can't insert any data into the hourly_detail
 table due to the constraint failing.

This is a known limitation of inherited tables, at least in current 
implementations.   I think it's on the TODO list.   For now, either live 
without the FKs, or implement them through custom triggers/rules.
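
(A hypothetical sketch of the trigger approach, with invented column names. A
plain SELECT against the parent automatically searches the inherited children,
which is exactly what the real foreign key fails to do.)

CREATE OR REPLACE FUNCTION hourly_detail_fk_check() RETURNS trigger AS '
BEGIN
    -- searches hourly_report_data and all of its inherited partitions
    PERFORM 1 FROM hourly_report_data WHERE id = NEW.hourly_report_id;
    IF NOT FOUND THEN
        RAISE EXCEPTION ''hourly_report_data row % not found'', NEW.hourly_report_id;
    END IF;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER hourly_detail_fk_check
    BEFORE INSERT OR UPDATE ON hourly_detail
    FOR EACH ROW EXECUTE PROCEDURE hourly_detail_fk_check();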

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] understanding 'explain analyse'

2004-11-16 Thread Josh Berkus
Alexandre,

 If the plans for both queries are exactly the same, should I assume
 that the cost will also be the same?

Nope.   A seq scan over 1,000,000,000 rows is going to cost a LOT more than a 
seq scan over 1000 rows, even though it's the same plan.

When you have the data sorted out, post explain analyzes and we'll take a shot 
at it.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] memcached and PostgreSQL

2004-11-16 Thread Troels Arvin
On Tue, 16 Nov 2004 21:47:54 -0800, Josh Berkus wrote:

 So memcached becomes a very good place to stick data that's read often but 
 not 
 updated often, or alternately data that changes often but is disposable.   An 
 example of the former is a user+ACL list; and example of the latter is web 
 session information ... or simple materialized views.

Has anyone tried at least two of

1. memcached
2. Tugela Cache (pretty much the same as memcached, I think)
3. Sharedance

In that case: Do you have any comparative remarks?


Links:

1: http://www.danga.com/memcached/

2: http://meta.wikimedia.org/wiki/Tugela_Cache
   http://cvs.sourceforge.net/viewcvs.py/wikipedia/tugelacache/

3: http://sharedance.pureftpd.org/

-- 
Greetings from Troels Arvin, Copenhagen, Denmark


