Re: [PERFORM] postgres eating CPU

2004-11-17 Thread Joshua D. Drake
Josh Berkus wrote:
in March there was an interesting discussion on the list with the
subject "postgres eating CPU on HP9000".
http://archives.postgresql.org/pgsql-performance/2004-03/msg00380.php
--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL




Re: [PERFORM] Analyzer is clueless

2004-11-17 Thread Tom Lane
>> I've seen cases where it seems the
>> planner doesn't think it'll be getting a unique value or a small set of
>> values even though the stats indicate that it should be.

A test case exhibiting the problem would be helpful.

regards, tom lane



Re: [PERFORM] Analyzer is clueless

2004-11-17 Thread Josh Berkus
Jim,

> Is there still a good reason to have the histogram stats so low? Should
> the default be changed to more like 100 at this point?

Low overhead.  This is actually a TODO for me for 8.1.   I need to find some 
test cases to justify setting a differential histogram target for indexed 
columns, e.g. 10 for most fields but 100/150/200 for indexed fields.

However, I got stalled on finding test cases and then ran out of time.
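In the meantime it has to be done by hand, column by column.  A minimal sketch 
(table and column names are made up for illustration):

ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 150;
ANALYZE orders;   -- the larger sample only takes effect at the next ANALYZE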

> Also, how extensively does the planner use n_distinct, null_frac,
> reltuples and the histogram to see what the odds are of finding a unique
> value or a low number of values? I've seen cases where it seems the
> planner doesn't think it'll be getting a unique value or a small set of
> values even though the stats indicate that it should be.
>
> One final question... would there be interest in a process that would
> dynamically update the histogram settings for tables based on how
> distinct/unique each field was?

Well, the process by which the analyzer decides that a field is unique could 
probably use some troubleshooting.   And we always, always could use 
suggestions/tests/help with the query planner.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Analyzer is clueless

2004-11-17 Thread Jim C. Nasby
On Wed, Nov 17, 2004 at 10:32:48PM +, Simon Riggs wrote:
> The main issue is that PostgreSQL's default histogram statistics setting
> is lower than that of other RDBMSs. This means that it is less able to
> discriminate between cases such as yours that are close to the edge.
> This is a trade-off between run-time of the ANALYZE command and the
> benefit it produces. As Joshua suggests, increasing the statistics
> target for this table will likely allow the optimizer to correctly
> determine the selectivity of the index and take the right path.

Is there still a good reason to have the histogram stats so low? Should
the default be changed to more like 100 at this point?

Also, how extensively does the planner use n_distinct, null_frac,
reltuples and the histogram to see what the odds are of finding a unique
value or a low number of values? I've seen cases where it seems the
planner doesn't think it'll be getting a unique value or a small set of
values even though the stats indicate that it should be.

One final question... would there be interest in a process that would
dynamically update the histogram settings for tables based on how
distinct/unique each field was?
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



Re: [PERFORM] postgres eating CPU

2004-11-17 Thread Tom Lane
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes:
> in March there was an interesting discussion on the list with the
> subject "postgres eating CPU on HP9000".
>http://archives.postgresql.org/pgsql-performance/2004-03/msg00380.php

Reviewing that, the problem is most likely that (a) they didn't have
max_fsm_pages set high enough to cover the database, and (b) they were
running 7.3.* which is prone to index bloat.
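The usual remedies would be along these lines (a sketch only; the numbers are 
illustrative, not tuned):

-- postgresql.conf (needs a postmaster restart): make the free space map
-- large enough to track every page with reclaimable space
--   max_fsm_pages = 200000
--   max_fsm_relations = 1000

-- and on 7.3, periodically rebuild bloated indexes by hand:
REINDEX TABLE bigtable;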

regards, tom lane



Re: [PERFORM] postgres eating CPU

2004-11-17 Thread Josh Berkus

> >>in March there was an interesting discussion on the list with the
> >>subject "postgres eating CPU on HP9000".

Aha, this one.   Yeah, I believe that they upgraded to 7.4 in order to deal 
with REINDEX issues.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] postgres eating CPU

2004-11-17 Thread [EMAIL PROTECTED]
Josh Berkus wrote:
in March there was an interesting discussion on the list with the
subject "postgres eating CPU on HP9000".

Link, please?

  http://archives.postgresql.org/pgsql-performance/2004-03/msg00380.php



Re: [PERFORM] Analyzer is clueless

2004-11-17 Thread Simon Riggs
On Thu, 2004-11-18 at 02:08, David Brown wrote:
> Statistics were run on each table before query execution. The random page 
> cost was lowered to 2, but as you can see, the estimated costs are wild 
> anyway.
> 
> As a comparison, MS SQL Server took less than 15 seconds, or 4 times faster.
> 
> MySQL (InnoDB) took 2 seconds, which is 30 times faster.
> 
> The query looks straightforward to me (it might be clearer with a subselect), 
> so what on earth is wrong?

The query is, as you say, straightforward.

You are clearly working with a query that is on the very edge of the
decision between using an index or not. 

The main issue is that PostgreSQL's default histogram statistics setting
is lower than that of other RDBMSs. This means that it is less able to
discriminate between cases such as yours that are close to the edge.
This is a trade-off between run-time of the ANALYZE command and the
benefit it produces. As Joshua suggests, increasing the statistics
target for this table will likely allow the optimizer to correctly
determine the selectivity of the index and take the right path.

If this is a general RDBMS comparison, you may wish to raise the system-wide
default_statistics_target to 80, or at least to something above the default of 10.

To improve this query, you may wish to extend the table's statistics
target using:

ALTER TABLE "drinv"
ALTER COLUMN "OrderDate" SET STATISTICS 100;

which should allow the planner to more accurately estimate statistics
and thereby select an index, if appropriate.
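Note that the larger target only takes effect once the statistics are rebuilt,
e.g. (a minimal sketch using the table from your post):

ANALYZE "drinv";
-- or, to raise the target for everything in this session before a full run:
SET default_statistics_target = 100;
ANALYZE;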

The doco has recently been changed with regard to effective_cache_size;
you don't mention what beta release level you're using. That is the only
planner parameter that takes cache size into account, so any other
changes would certainly have zero effect on this *plan*, though they might
still benefit execution time.

Please post EXPLAIN ANALYZE output for any further questions.

-- 
Best Regards, Simon Riggs




Re: [PERFORM] postgres eating CPU

2004-11-17 Thread Josh Berkus

> in March there was an interesting discussion on the list with the
> subject "postgres eating CPU on HP9000".

Link, please?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



[PERFORM] postgres eating CPU

2004-11-17 Thread [EMAIL PROTECTED]
Hi,
in March there was an interesting discussion on the list with the 
subject "postgres eating CPU on HP9000".

Now I'm seeing the same problem on a Dell dual-processor machine.
Anybody know if there was a solution?
Thanks
Piergiorgio


Re: [PERFORM] sort_mem affect on inserts?

2004-11-17 Thread Josh Berkus
David,

> I understand that the sort_mem conf setting affects queries with order by,
> etc., and the doc mentions that it is used in create index. Does sort_mem
> affect the updating of indexes, i.e., can the sort_mem setting affect the
> performance of inserts?

Only if the table has Foreign Keys whose lookup might require a large sort.
Otherwise, no.
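If that is the case, it can be raised for just the session doing the bulk work
rather than globally (a sketch; sort_mem is specified in KB):

SET sort_mem = 65536;    -- 64 MB for this session only
-- ... bulk INSERTs, CREATE INDEX, ALTER TABLE ... ADD FOREIGN KEY ...
RESET sort_mem;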

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



[PERFORM] sort_mem affect on inserts?

2004-11-17 Thread David Parker
I understand that the sort_mem conf setting affects queries with order by, 
etc., and the doc mentions that it is used in create index. Does sort_mem 
affect the updating of indexes, i.e., can the sort_mem setting affect the 
performance of inserts?

- DAP
--
David Parker    Tazz Networks    (401) 709-5130



Re: [PERFORM] memcached and PostgreSQL

2004-11-17 Thread Josh Berkus
Michael,

> Still, it seems like a convenient way to maintain cache coherency,
> assuming that your application doesn't already have a clean way to do
> that.

Precisely.  The big problem with memory caching is the cache getting out of 
sync with the database.   Updating the cache through database triggers helps 
ameliorate that.

However, our inability to pass messages with NOTIFY somewhat limits the 
utility of this solution.  Sean wants "on commit triggers", but there are some 
major issues to work out with that.  Passing messages with NOTIFY would be 
easier and almost as good.
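For what it's worth, a minimal invalidation sketch that works with today's
payload-less NOTIFY (table and channel names are invented; it assumes plpgsql
is installed, and the application LISTENs and flushes the matching memcached key):

CREATE OR REPLACE FUNCTION acl_cache_inval() RETURNS trigger AS '
BEGIN
    -- NOTIFY is delivered at commit and carries no payload in 7.4/8.0,
    -- so the channel name itself tells the listener what to flush.
    NOTIFY user_acl_changed;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER user_acl_cache_inval
    AFTER INSERT OR UPDATE OR DELETE ON user_acl
    FOR EACH ROW EXECUTE PROCEDURE acl_cache_inval();

-- client side: LISTEN user_acl_changed; on notification, delete or refresh
-- the user_acl entry in memcached.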

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Analyzer is clueless

2004-11-17 Thread Joshua D. Drake
Hello,
Have you tried increasing the statistics target for orderdate and 
rerunning analyze?

Sincerely,
Joshua D. Drake
David Brown wrote:
I'm doing some performance profiling with a simple two-table query:
SELECT L."ProductID", sum(L."Amount")
FROM "drinv" H
JOIN "drinvln" L ON L."OrderNo" = H."OrderNo"
WHERE
("OrderDate" between '2003-01-01' AND '2003-04-30')
GROUP BY L."ProductID"
drinv and drinvln have about 100,000 and 3,500,000 rows respectively. Actual 
data size in the large table is 500-600MB. OrderNo is indexed in both tables, 
as is OrderDate.
The environment is PGSQL 8 on Win2k with 512MB RAM (results are similar to 7.3 
from Mammoth). I've tried tweaking various conf parameters, but apart from 
using up memory, nothing seems to have had a tangible effect - the Analyzer 
doesn't seem to take resources into account like some of the doco suggests.
The date selection represents about 5% of the range. Here's the plan summaries:
Three months (2003-01-01 to 2003-03-30) = 1 second
HashAggregate  (cost=119365.53..119368.74 rows=642 width=26)
  ->  Nested Loop  (cost=0.00..118791.66 rows=114774 width=26)
->  Index Scan using "drinv_OrderDate" on drinv h  (cost=0.00..200.27 
rows=3142 width=8)
  Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" 
<= '2003-03-30'::date))
->  Index Scan using "drinvln_OrderNo" on drinvln l  (cost=0.00..28.73 
rows=721 width=34)
  Index Cond: (l."OrderNo" = "outer"."OrderNo")
Four months (2003-01-01 to 2003-04-30) = 60 seconds
HashAggregate  (cost=126110.53..126113.74 rows=642 width=26)
  ->  Hash Join  (cost=277.55..125344.88 rows=153130 width=26)
Hash Cond: ("outer"."OrderNo" = "inner"."OrderNo")
->  Seq Scan on drinvln l  (cost=0.00..106671.35 rows=3372935 width=34)
->  Hash  (cost=267.07..267.07 rows=4192 width=8)
  ->  Index Scan using "drinv_OrderDate" on drinv h  
(cost=0.00..267.07 rows=4192 width=8)
Index Cond: (("OrderDate" >= '2003-01-01'::date) AND 
("OrderDate" <= '2003-04-30'::date))
Four months (2003-01-01 to 2003-04-30) with Seq_scan disabled = 75 seconds
HashAggregate  (cost=130565.83..130569.04 rows=642 width=26)
  ->  Merge Join  (cost=519.29..129800.18 rows=153130 width=26)
Merge Cond: ("outer"."OrderNo" = "inner"."OrderNo")
->  Sort  (cost=519.29..529.77 rows=4192 width=8)
  Sort Key: h."OrderNo"
  ->  Index Scan using "drinv_OrderDate" on drinv h  
(cost=0.00..267.07 rows=4192 width=8)
Index Cond: (("OrderDate" >= '2003-01-01'::date) AND 
("OrderDate" <= '2003-04-30'::date))
->  Index Scan using "drinvln_OrderNo" on drinvln l  
(cost=0.00..119296.29 rows=3372935 width=34)
Statistics were run on each table before query execution. The random page cost 
was lowered to 2, but as you can see, the estimated costs are wild anyway.
As a comparison, MS SQL Server took less than 15 seconds, or 4 times faster.
MySQL (InnoDB) took 2 seconds, which is 30 times faster.
The query looks straightforward to me (it might be clearer with a subselect), 
so what on earth is wrong?

--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL




Re: [PERFORM] memcached and PostgreSQL

2004-11-17 Thread Michael Adler
On Wed, Nov 17, 2004 at 09:13:09AM -0800, Darcy Buskermolen wrote:
> On November 16, 2004 08:00 pm, Michael Adler wrote:
> > http://pugs.postgresql.org/sfpug/archives/21.html
> >
> > I noticed that some of you left coasters were talking about memcached
> > and pgsql. I'm curious to know what was discussed.
> 
> Have a look at the pdf presentation found on the following site:
> 
> http://people.freebsd.org/~seanc/pgmemcache/

Thanks for that.

That presentation was rather broad and the API seems rather general
purpose, but I wonder why you would really want to access the cache by
way of the DB? If one major point of memcache is to allocate RAM to a
low-overhead server instead of to the RDBMS's disk cache, why would
you add the overhead of the RDBMS to the process?  (This is a bit of a
straw man, but I'm just trying to flesh out the pros and cons.)

Still, it seems like a convenient way to maintain cache coherency,
assuming that your application doesn't already have a clean way to do
that.

(just my uninformed opinion, though...)

-Mike



Re: [PERFORM] Analyzer is clueless

2004-11-17 Thread Josh Berkus
Dan,

> I'm doing some performance profiling with a simple two-table query:

Please send EXPLAIN ANALYZE for each query, and not just EXPLAIN.   Thanks!
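(That is, just prefix each query - and note that EXPLAIN ANALYZE actually runs
it, so the slow variant will take the full 60 seconds:)

EXPLAIN ANALYZE
SELECT L."ProductID", sum(L."Amount")
FROM "drinv" H
JOIN "drinvln" L ON L."OrderNo" = H."OrderNo"
WHERE "OrderDate" between '2003-01-01' AND '2003-04-30'
GROUP BY L."ProductID";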

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] query plan question

2004-11-17 Thread Matthew T. O'Connor
Well, based on the autovacuum log that you attached, all of those tables 
are insert-only (at least during the time period included in the log).   
Is that correct?  If so, autovacuum will never do a vacuum (unless 
required by xid wraparound issues) on those tables.  So this doesn't 
appear to be an autovacuum problem.  I'm not sure about the missing 
pg_statistic entries; anyone else care to field that one?
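As a quick sanity check, it might be worth analyzing one of them by hand and
seeing whether rows then show up in the friendlier pg_stats view (a sketch):

ANALYZE "public"."usageparameter";
SELECT tablename, attname, null_frac, n_distinct
  FROM pg_stats
 WHERE schemaname = 'public' AND tablename = 'usageparameter';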

Matthew
David Parker wrote:
Thanks. The tables I'm concerned with are named: 'schema', 'usage',
'usageparameter', and 'flow'. It looks like autovacuum is performing
analyzes:
% grep "Performing: " logs/.db.tazz.vacuum.log
[2004-11-17 12:05:58 PM] Performing: ANALYZE
"public"."scriptlibrary_library"
[2004-11-17 12:15:59 PM] Performing: ANALYZE
"public"."scriptlibraryparm"
[2004-11-17 12:15:59 PM] Performing: ANALYZE "public"."usageparameter"
[2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."usageproperty"
[2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."route"
[2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."usageparameter"
[2004-11-17 12:21:00 PM] Performing: ANALYZE
"public"."scriptlibrary_library"
[2004-11-17 12:26:01 PM] Performing: ANALYZE "public"."usage"
[2004-11-17 12:26:01 PM] Performing: ANALYZE "public"."usageparameter"
[2004-11-17 12:31:04 PM] Performing: ANALYZE "public"."usageproperty"
[2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."route"
[2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."service_usage"
[2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."usageparameter"
But when I run the following:
select * from pg_statistic where starelid in 
(select oid from pg_class where relname in
('schema','usageparameter','flow','usage'))

it returns no records. Shouldn't it? It doesn't appear to be doing a
vacuum anywhere, which makes sense because none of these tables have
over the default threshold of 1000. Are there statistics which only get
generated by vacuum?
I've attached a gzip of the pg_autovacuum log file, with -d 3.
Thanks again.
- DAP
 

-Original Message-
From: Matthew T. O'Connor [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 17, 2004 11:41 AM
To: David Parker
Cc: Tom Lane; Jeff; Russell Smith; [EMAIL PROTECTED]
Subject: Re: [PERFORM] query plan question

David Parker wrote:
We're using postgresql 7.4.5. I've only recently put pg_autovacuum in
place as part of our installation, and I'm basically taking the
defaults. I doubt it's a problem with autovacuum itself, but rather
with my configuration of it. I have some reading to do, so any pointers
to existing autovacuum threads would be greatly appreciated!

Well the first thing to do is increase the verbosity of the
pg_autovacuum logging output.  If you use -d2 or higher, pg_autovacuum
will print out a lot of detail on what it thinks the thresholds are and
why it is or isn't performing vacuums and analyzes.  Attach some of the
log and I'll take a look at it.




Re: [PERFORM] Tsearch2 really slower than ilike ?

2004-11-17 Thread Oleg Bartunov
On Wed, 17 Nov 2004, Hervé Piedvache wrote:
Oleg,
Sorry but when I do your request I get :
# select id_site from site where idx_site_name @@  'livejourn';
ERROR:  type " " does not exist
no idea :) btw, what version of postgresql and OS are you running?
Could you try a minimal test - check the sql commands from the tsearch2 sources,
some basic queries from the tsearch2 documentation, tutorials.
btw, your query should look like
select id_site from site_rss where idx_site_name @@ 'livejourn';

How did you run your queries at all ? I mean your first message about 
poor tsearch2 performance.
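A smoke test along these lines would tell us whether the install itself is
sane (a sketch; it assumes the stock tsearch2 functions and the 'default'
configuration are present):

select set_curcfg('default');
select to_tsvector('the quick brown fox') @@ to_tsquery('fox');
-- should return 't'; an error here points at the tsearch2 setup, not the data.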

What is this ?
(private: I don't know what happened with my mail, but I do nothing special to
disturb the contents when I'm writing to you ...)
On Tuesday 16 November 2004 22:13, Oleg Bartunov wrote:
ok, I downloaded dump of table and here is what I found:
zz=# select count(*) from tt;
  count

  183956
(1 row)
zz=# select * from stat('select tt from tt') order by ndoc desc, nentry
desc,wo
rd limit 10;
  word | ndoc  | nentry
--+---+
  blog | 12710 |  12835
  weblog   |  4857 |   4859
  news |  4402 |   4594
  life |  4136 |   4160
  world|  1980 |   1986
  journal  |  1882 |   1883
  livejourn|  1737 |   1737
  thought  |  1669 |   1677
  web  |  1154 |   1161
  scotsman.com |  1138 |   1138
(10 rows)
zz=# explain analyze select tt from tt where tt @@  'blog';
   QUERY PLAN
---
--- Index Scan using tt_idx on tt
(cost=0.00..728.83 rows=184 width=32) (actual time=0.047..141.110
rows=12710 loops=1) Index Cond: (tt @@ '\'blog\''::tsquery)
Filter: (tt @@ '\'blog\''::tsquery)
  Total runtime: 154.105 ms
(4 rows)
It's really fast ! So, I don't understand your problem.
I run query on my desktop machine, nothing special.
Oleg
On Tue, 16 Nov 2004, Hervé Piedvache wrote:
Hi,
I'm completely disappointed with Tsearch2 ...
I have a table like this :
 Table "public.site"
   Column |Type |
Modifiers
---+-+---
 id_site   | integer
  | not null default
nextval('public.site_id_site_seq'::text)
site_name | text|
site_url  | text|
url| text|
language  | text|
datecrea  | date| default now()
id_category   | integer |
time_refresh  | integer |
active| integer |
error | integer |
description   | text|
version   | text|
idx_site_name | tsvector|
lastcheck | date|
lastupdate| timestamp without time zone |
Indexes:
   "site_id_site_key" unique, btree (id_site)
   "ix_idx_site_name" gist (idx_site_name)
Triggers:
   tsvectorupdate_site_name BEFORE INSERT OR UPDATE ON site FOR EACH ROW
EXECUTE PROCEDURE tsearch2('idx_site_name', 'site_name')
I have 183 956 records in the database ...
SELECT s.site_name, s.id_site, s.description, s.site_url,
   case when exists (select id_user
   from user_choice u
where u.id_site=s.id_site
   and u.id_user = 1)
then 1 else 0 end as bookmarked
  FROM site s
WHERE s.idx_site_name @@ to_tsquery('atari');
Explain Analyze :
  QUERY PLAN
-
- Index
Scan using ix_idx_site_name on site s  (cost=0.00..1202.12 rows=184
width=158) (actual time=4687.674..4698.422 rows=1 loops=1)
  Index Cond: (idx_site_name @@ '\'atari\''::tsquery)
  Filter: (idx_site_name @@ '\'atari\''::tsquery)
  SubPlan
->  Seq Scan on user_choice u  (cost=0.00..3.46 rows=1 width=4)
(actual time=0.232..0.232 rows=0 loops=1)
  Filter: ((id_site = $0) AND (id_user = 1))
Total runtime: 4698.608 ms
First time I run the request I have a result in about 28 seconds.
SELECT s.site_name, s.id_site, s.description, s.site_url,
   case when exists (select id_user
   from user_choice u
where u.id_site=s.id_site
   and u.id_user = 1)
then 1 else 0 end as bookmarked
  FROM site_rss s
WHERE s.site_name ilike '%atari%'
 

Re: [PERFORM] Tsearch2 really slower than ilike ?

2004-11-17 Thread Hervé Piedvache
Oleg,

Sorry but when I do your request I get :
# select id_site from site where idx_site_name @@  'livejourn';
ERROR:  type " " does not exist

What is this ?

(private: I don't know what happened with my mail, but I do nothing special to 
disturb the contents when I'm writing to you ...)

On Tuesday 16 November 2004 22:13, Oleg Bartunov wrote:
> ok, I downloaded dump of table and here is what I found:
>
> zz=# select count(*) from tt;
>   count
> 
>   183956
> (1 row)
>
> zz=# select * from stat('select tt from tt') order by ndoc desc, nentry
> desc,wo
> rd limit 10;
>   word | ndoc  | nentry
> --+---+
>   blog | 12710 |  12835
>   weblog   |  4857 |   4859
>   news |  4402 |   4594
>   life |  4136 |   4160
>   world|  1980 |   1986
>   journal  |  1882 |   1883
>   livejourn|  1737 |   1737
>   thought  |  1669 |   1677
>   web  |  1154 |   1161
>   scotsman.com |  1138 |   1138
> (10 rows)
>
> zz=# explain analyze select tt from tt where tt @@  'blog';
>QUERY PLAN
> ---
>--- Index Scan using tt_idx on tt 
> (cost=0.00..728.83 rows=184 width=32) (actual time=0.047..141.110
> rows=12710 loops=1) Index Cond: (tt @@ '\'blog\''::tsquery)
> Filter: (tt @@ '\'blog\''::tsquery)
>   Total runtime: 154.105 ms
> (4 rows)
>
> It's really fast ! So, I don't understand your problem.
> I run query on my desktop machine, nothing special.
>
>
>   Oleg
>
> On Tue, 16 Nov 2004, Hervé Piedvache wrote:
> > Hi,
> >
> > I'm completely disappointed with Tsearch2 ...
> >
> > I have a table like this :
> >  Table "public.site"
> >Column |Type |
> > Modifiers
> > ---+-+---
> > id_site   | integer  
> >   | not null default
> > nextval('public.site_id_site_seq'::text)
> > site_name | text|
> > site_url  | text|
> > url| text|
> > language  | text|
> > datecrea  | date| default now()
> > id_category   | integer |
> > time_refresh  | integer |
> > active| integer |
> > error | integer |
> > description   | text|
> > version   | text|
> > idx_site_name | tsvector|
> > lastcheck | date|
> > lastupdate| timestamp without time zone |
> > Indexes:
> >"site_id_site_key" unique, btree (id_site)
> >"ix_idx_site_name" gist (idx_site_name)
> > Triggers:
> >tsvectorupdate_site_name BEFORE INSERT OR UPDATE ON site FOR EACH ROW
> > EXECUTE PROCEDURE tsearch2('idx_site_name', 'site_name')
> >
> > I have 183 956 records in the database ...
> >
> > SELECT s.site_name, s.id_site, s.description, s.site_url,
> >case when exists (select id_user
> >from user_choice u
> > where u.id_site=s.id_site
> >and u.id_user = 1)
> > then 1 else 0 end as bookmarked
> >   FROM site s
> > WHERE s.idx_site_name @@ to_tsquery('atari');
> >
> > Explain Analyze :
> >   QUERY PLAN
> > -
> >- Index
> > Scan using ix_idx_site_name on site s  (cost=0.00..1202.12 rows=184
> > width=158) (actual time=4687.674..4698.422 rows=1 loops=1)
> >   Index Cond: (idx_site_name @@ '\'atari\''::tsquery)
> >   Filter: (idx_site_name @@ '\'atari\''::tsquery)
> >   SubPlan
> > ->  Seq Scan on user_choice u  (cost=0.00..3.46 rows=1 width=4)
> > (actual time=0.232..0.232 rows=0 loops=1)
> >   Filter: ((id_site = $0) AND (id_user = 1))
> > Total runtime: 4698.608 ms
> >
> > First time I run the request I have a result in about 28 seconds.
> >
> > SELECT s.site_name, s.id_site, s.description, s.site_url,
> >case when exists (select id_user
> >from user_choice u
> > where u.id_site=s.id_site
> >and u.id_user = 1)
> > then 1 else 0 end as bookmarked
> >   FROM site_rss s
> > WHERE s.site_name ilike '%atari%'
> >
> >   QUERY PLAN
> > ---

Re: [PERFORM] memcached and PostgreSQL

2004-11-17 Thread Darcy Buskermolen
On November 16, 2004 08:00 pm, Michael Adler wrote:
> http://pugs.postgresql.org/sfpug/archives/21.html
>
> I noticed that some of you left coasters were talking about memcached
> and pgsql. I'm curious to know what was discussed.
>
> In reading about memcached, it seems that many people are using it to
> circumvent the scalability problems of MySQL (lack of MVCC).
>
> from their site:
>
> 
> Shouldn't the database do this?
>
> Regardless of what database you use (MS-SQL, Oracle, Postgres,
> MysQL-InnoDB, etc..), there's a lot of overhead in implementing ACID
> properties in a RDBMS, especially when disks are involved, which means
> queries are going to block. For databases that aren't ACID-compliant
> (like MySQL-MyISAM), that overhead doesn't exist, but reading threads
> block on the writing threads. memcached never blocks.
> 
>
> So What does memcached offer pgsql users? It would still seem to offer
> the benefit of a multi-machined cache.

Have a look at the pdf presentation found on the following site:

http://people.freebsd.org/~seanc/pgmemcache/


>
> -Mike
>

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com



Re: [PERFORM] query plan question

2004-11-17 Thread Matthew T. O'Connor
David Parker wrote:
We're using postgresql 7.4.5. I've only recently put pg_autovacuum in
place as part of our installation, and I'm basically taking the
defaults. I doubt it's a problem with autovacuum itself, but rather with
my configuration of it. I have some reading to do, so any pointers to
existing autovacuum threads would be greatly appreciated!
Well the first thing to do is increase the verbosity of the 
pg_autovacuum logging output.  If you use -d2 or higher, pg_autovacuum 
will print out a lot of detail on what it thinks the thresholds are and 
why it is or isn't performing vacuums and analyzes.   Attach some of the 
log and I'll take a look at it.
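For example, roughly (option letters as I remember them from the contrib
README - please double-check them against the usage output of your build):

pg_autovacuum -D -d 3 -L /var/log/pg_autovacuum.log -U postgres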



Re: [PERFORM] memcached and PostgreSQL

2004-11-17 Thread Mike Rylander
On 17 Nov 2004 03:08:20 -0500, Greg Stark <[EMAIL PROTECTED]> wrote:
> Josh Berkus <[EMAIL PROTECTED]> writes:
> 
> > So memcached becomes a very good place to stick data that's read often but 
> > not
> > updated often, or alternately data that changes often but is disposable.   
> > An
> > example of the former is a user+ACL list; and example of the latter is web
> > session information ... or simple materialized views.
> 
> I would like very much to use something like memcached for a materialized view
> I have. The problem is that I have to join it against other tables.
> 
> I've thought about providing a SRF in postgres to read records out of
> memcached but I'm unclear whether it would really help at all.
> 
> Has anyone tried anything like this?

I haven't tried it yet, but I plan to.  An interesting case might be
to use plperlu to interface with memcached and store hashes in the
cache via some external process, like a CGI script.  Then just define
a TYPE for the perl SRF to return, and store the data as an array of
hashes with keys matching the TYPE.

A (perhaps useless) example could then be something like:

CREATE TYPE user_info AS ( sessionid TEXT,  userid INT, lastaccess
TIMESTAMP, lastrequest TEXT);

CREATE FUNCTION get_user_info_by_session ( TEXT) RETURNS SETOF user_info AS $$
  use Cache::Memcached;

  my $session = shift;

  # reuse one connection per backend (||= stores it back into %_SHARED);
  # note: the memcached port was elided by the list archive
  my $c = $_SHARED{memcached} ||= Cache::Memcached->new( {servers =>
['127.0.0.1:']} );

  my $user_info = $c->get('web_access_list');

  # $user_info looks like
  # [ {userid => 5, lastrequest => 'http://...', lastaccess => localtime(),
  #    sessionid => '123456789'}, { ...} ]
  # and is stored by a CGI.

  my @info = grep {$$_{sessionid} eq $session} @$user_info;

  return \@info;
$$ LANGUAGE 'plperlu';

SELECT u.username, f.lastrequest FROM users u,
get_user_info_by_session('123456789') f WHERE f.userid = u.userid;


Any thoughts? 

> 
> --
> greg
> 


-- 
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer



[PERFORM] Analyzer is clueless

2004-11-17 Thread David Brown
I'm doing some performance profiling with a simple two-table query:

SELECT L."ProductID", sum(L."Amount")
FROM "drinv" H
JOIN "drinvln" L ON L."OrderNo" = H."OrderNo"
WHERE
("OrderDate" between '2003-01-01' AND '2003-04-30')
GROUP BY L."ProductID"

drinv and drinvln have about 100,000 and 3,500,000 rows respectively. Actual 
data size in the large table is 500-600MB. OrderNo is indexed in both tables, 
as is OrderDate.

The environment is PGSQL 8 on Win2k with 512MB RAM (results are similar to 7.3 
from Mammoth). I've tried tweaking various conf parameters, but apart from 
using up memory, nothing seems to have had a tangible effect - the Analyzer 
doesn't seem to take resources into account like some of the doco suggests.

The date selection represents about 5% of the range. Here's the plan summaries:

Three months (2003-01-01 to 2003-03-30) = 1 second

HashAggregate  (cost=119365.53..119368.74 rows=642 width=26)
  ->  Nested Loop  (cost=0.00..118791.66 rows=114774 width=26)
->  Index Scan using "drinv_OrderDate" on drinv h  (cost=0.00..200.27 
rows=3142 width=8)
  Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" 
<= '2003-03-30'::date))
->  Index Scan using "drinvln_OrderNo" on drinvln l  (cost=0.00..28.73 
rows=721 width=34)
  Index Cond: (l."OrderNo" = "outer"."OrderNo")


Four months (2003-01-01 to 2003-04-30) = 60 seconds

HashAggregate  (cost=126110.53..126113.74 rows=642 width=26)
  ->  Hash Join  (cost=277.55..125344.88 rows=153130 width=26)
Hash Cond: ("outer"."OrderNo" = "inner"."OrderNo")
->  Seq Scan on drinvln l  (cost=0.00..106671.35 rows=3372935 width=34)
->  Hash  (cost=267.07..267.07 rows=4192 width=8)
  ->  Index Scan using "drinv_OrderDate" on drinv h  
(cost=0.00..267.07 rows=4192 width=8)
Index Cond: (("OrderDate" >= '2003-01-01'::date) AND 
("OrderDate" <= '2003-04-30'::date))


Four months (2003-01-01 to 2003-04-30) with Seq_scan disabled = 75 seconds


HashAggregate  (cost=130565.83..130569.04 rows=642 width=26)
  ->  Merge Join  (cost=519.29..129800.18 rows=153130 width=26)
Merge Cond: ("outer"."OrderNo" = "inner"."OrderNo")
->  Sort  (cost=519.29..529.77 rows=4192 width=8)
  Sort Key: h."OrderNo"
  ->  Index Scan using "drinv_OrderDate" on drinv h  
(cost=0.00..267.07 rows=4192 width=8)
Index Cond: (("OrderDate" >= '2003-01-01'::date) AND 
("OrderDate" <= '2003-04-30'::date))
->  Index Scan using "drinvln_OrderNo" on drinvln l  
(cost=0.00..119296.29 rows=3372935 width=34)

Statistics were run on each table before query execution. The random page cost 
was lowered to 2, but as you can see, the estimated costs are wild anyway.

As a comparison, MS SQL Server took less than 15 seconds, or 4 times faster.

MySQL (InnoDB) took 2 seconds, which is 30 times faster.

The query looks straightforward to me (it might be clearer with a subselect), 
so what on earth is wrong?



Re: [PERFORM] query plan question

2004-11-17 Thread David Parker
We're using postgresql 7.4.5. I've only recently put pg_autovacuum in
place as part of our installation, and I'm basically taking the
defaults. I doubt it's a problem with autovacuum itself, but rather with
my configuration of it. I have some reading to do, so any pointers to
existing autovacuum threads would be greatly appreciated!

Thanks.

- DAP

>-Original Message-
>From: Tom Lane [mailto:[EMAIL PROTECTED] 
>Sent: Wednesday, November 17, 2004 10:46 AM
>To: David Parker
>Cc: Jeff; Russell Smith; [EMAIL PROTECTED]
>Subject: Re: [PERFORM] query plan question 
>
>"David Parker" <[EMAIL PROTECTED]> writes:
>> So I guess I need to figure out why pg_autovacuum isn't analyzing 
>> those tables.
>
>Which autovacuum version are you using?  The early releases 
>had some nasty bugs that would allow it to skip tables 
>sometimes.  I think all the known problems are fixed as of 
>recent 7.4.x updates.
>
>   regards, tom lane
>



Re: [PERFORM] query plan question

2004-11-17 Thread Tom Lane
"David Parker" <[EMAIL PROTECTED]> writes:
> So I guess I need to figure out why pg_autovacuum isn't analyzing those
> tables.

Which autovacuum version are you using?  The early releases had some
nasty bugs that would allow it to skip tables sometimes.  I think all
the known problems are fixed as of recent 7.4.x updates.

regards, tom lane



Re: [PERFORM] query plan question

2004-11-17 Thread David Parker
Hmm, I'm really a beginner at this...

It turns out that the pg_statistic table in my good database has records
in it for the tables in the query, while the pg_statistic table in my
bad database has no records for those tables at all!

So I guess I need to figure out why pg_autovacuum isn't analyzing those
tables.

- DAP 

>-Original Message-
>From: David Parker 
>Sent: Wednesday, November 17, 2004 9:44 AM
>To: 'Jeff'
>Cc: Russell Smith; [EMAIL PROTECTED]
>Subject: RE: [PERFORM] query plan question
>
>I've got pg_autovacuum running on both platforms. I've 
>verified that the tables involved in the query have the same 
>number of rows on both databases.
>
>I'm not sure where to look to see how the stats might be 
>different. The "good" database's pg_statistic table has 24 
>more rows than that in the "bad" database, so there's 
>definitely a difference. The good database's pg_statistic has 
>rows for 2 extra tables, but they are not tables involved in 
>the query in question...
>
>So something must be up with stats, but can you tell me what 
>the most significant columns in the pg_statistic table are for 
>the planner making its decision? I'm sure this has been 
>discussed before, so if there's a thread you can point me to, 
>that would be great - I realize it's a big general question.
>
>Thanks for your time.
>
>- DAP
>
>>-Original Message-
>>From: Jeff [mailto:[EMAIL PROTECTED]
>>Sent: Wednesday, November 17, 2004 9:01 AM
>>To: David Parker
>>Cc: Russell Smith; [EMAIL PROTECTED]
>>Subject: Re: [PERFORM] query plan question
>>
>>
>>On Nov 17, 2004, at 7:32 AM, David Parker wrote:
>>
>>> Oh, I didn't realize that analyze gave that much more info. 
>>I've got a
>>> lot to learn about this tuning stuff ;-)
>>>
>>> I've attached the output. I see from the new output where the slow 
>>> query is taking its time (the nested loop at line 10), but I still 
>>> have no idea why this plan is getting chosen
>>>
>>
>>looks like your stats are incorrect on the sparc.
>>Did you forget to run vacuum analyze on it?
>>
>>also, do both db's have the same data loaded?
>>there are some very different numbers in terms of actual rows 
>floating 
>>around there...
>>
>>--
>>Jeff Trout <[EMAIL PROTECTED]>
>>http://www.jefftrout.com/
>>http://www.stuarthamm.net/
>>
>>



Re: [PERFORM] query plan question

2004-11-17 Thread David Parker
I've got pg_autovacuum running on both platforms. I've verified that the
tables involved in the query have the same number of rows on both
databases.

I'm not sure where to look to see how the stats might be different. The
"good" database's pg_statistic table has 24 more rows than that in the
"bad" database, so there's definitely a difference. The good database's
pg_statistic has rows for 2 extra tables, but they are not tables
involved in the query in question...

So something must be up with stats, but can you tell me what the most
significant columns in the pg_statistic table are for the planner making
its decision? I'm sure this has been discussed before, so if there's a
thread you can point me to, that would be great - I realize it's a big
general question.

Thanks for your time.

- DAP

>-Original Message-
>From: Jeff [mailto:[EMAIL PROTECTED] 
>Sent: Wednesday, November 17, 2004 9:01 AM
>To: David Parker
>Cc: Russell Smith; [EMAIL PROTECTED]
>Subject: Re: [PERFORM] query plan question
>
>
>On Nov 17, 2004, at 7:32 AM, David Parker wrote:
>
>> Oh, I didn't realize that analyze gave that much more info. 
>I've got a 
>> lot to learn about this tuning stuff ;-)
>>
>> I've attached the output. I see from the new output where the slow 
>> query is taking its time (the nested loop at line 10), but I still 
>> have no idea why this plan is getting chosen
>>
>
>looks like your stats are incorrect on the sparc.
>Did you forget to run vacuum analyze on it?
>
>also, do both db's have the same data loaded?
>there are some very different numbers in terms of actual rows floating 
>around there...
>
>--
>Jeff Trout <[EMAIL PROTECTED]>
>http://www.jefftrout.com/
>http://www.stuarthamm.net/
>
>



Re: [PERFORM] query plan question

2004-11-17 Thread Jeff
On Nov 17, 2004, at 7:32 AM, David Parker wrote:
Oh, I didn't realize that analyze gave that much more info. I've got a
lot to learn about this tuning stuff ;-)
I've attached the output. I see from the new output where the slow 
query
is taking its time (the nested loop at line 10), but I still have no
idea why this plan is getting chosen

looks like your stats are incorrect on the sparc.
Did you forget to run vacuum analyze on it?
also, do both db's have the same data loaded?
there are some very different numbers in terms of actual rows floating 
around there...
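A quick way to rule that out (a sketch) is to rebuild the stats by hand and
then compare what the planner thinks each table contains on the two boxes:

VACUUM ANALYZE;
SELECT relname, relpages, reltuples
  FROM pg_class
 WHERE relname IN ('schema', 'usage', 'usageparameter', 'flow');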

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/


Re: [PERFORM] query plan question

2004-11-17 Thread David Parker
>If they are the same and PostgreSQL are the same, are the 
>intel machines Xeons?

Yup, dual 3.06-GHz Intel Xeon Processors.

I'm not sure off the top of my head what the sparcs are exactly. We're
in the process of moving completely to intel, but we still have to
support our app on sparc, and we are seeing these weird differences...

- DAP



Re: [PERFORM] query plan question

2004-11-17 Thread David Parker
Oh, I didn't realize that analyze gave that much more info. I've got a
lot to learn about this tuning stuff ;-) 

I've attached the output. I see from the new output where the slow query
is taking its time (the nested loop at line 10), but I still have no
idea why this plan is getting chosen

Thanks!

- DAP

>-Original Message-
>From: [EMAIL PROTECTED] 
>[mailto:[EMAIL PROTECTED] On Behalf Of 
>Russell Smith
>Sent: Tuesday, November 16, 2004 11:36 PM
>To: [EMAIL PROTECTED]
>Subject: Re: [PERFORM] query plan question
>
>On Wed, 17 Nov 2004 02:54 pm, you wrote:
>> I have a query for which postgres is generating a different 
>plan on different machines. The database schema is the same, 
>the dataset is the same, the configuration is the same (e.g., 
>pg_autovacuum running in both cases), both systems are Solaris 
>9. The main difference in the two systems is that one is sparc 
>and the other is intel.
>> 
>> The query runs in about 40 ms on the intel box, but takes 
>about 18 seconds on the sparc box. Now, the intel boxes we 
>have are certainly faster, but I'm curious why the query plan 
>might be different.
>> 
>> For the intel:
>> 
>> QUERY PLAN
>> Unique  (cost=11.50..11.52 rows=2 width=131)
>>   ->  Sort  (cost=11.50..11.50 rows=2 width=131)
>> Sort Key: up.prefix, s.name, s.tuid, s.foundryversion
>> ->  Hash Join  (cost=10.42..11.49 rows=2 width=131)
>>   Hash Cond: ("outer".dbid = "inner"."schema")
>>   ->  Seq Scan on "schema" s  (cost=0.00..1.02 
>rows=2 width=128)
>>   ->  Hash  (cost=10.41..10.41 rows=4 width=11)
>> ->  Nested Loop  (cost=0.00..10.41 
>rows=4 width=11)
>>   ->  Nested Loop  (cost=0.00..2.14 
>rows=4 width=4)
>> ->  Seq Scan on flow fl  
>(cost=0.00..0.00 rows=1 width=4)
>>   Filter: (servicetype = 646)
>> ->  Index Scan using 
>usage_flow_i on "usage" u  (cost=0.00..2.06 rows=6 width=8)
>>   Index Cond: (u.flow = 
>"outer".dbid)
>>   ->  Index Scan using 
>usageparameter_usage_i on usageparameter up  (cost=0.00..2.06 
>rows=1 width=15)
>> Index Cond: (up."usage" = 
>"outer".dbid)
>> Filter: ((prefix)::text <> 
>> 'xsd'::text)
>> 
>> For the sparc:
>> 
>> QUERY PLAN
>> Unique  (cost=10.81..10.83 rows=1 width=167)
>>   ->  Sort  (cost=10.81..10.82 rows=1 width=167)
>> Sort Key: up.prefix, s.name, s.tuid, s.foundryversion
>> ->  Nested Loop  (cost=9.75..10.80 rows=1 width=167)
>>   Join Filter: ("outer".flow = "inner".dbid)
>>   ->  Hash Join  (cost=9.75..10.79 rows=1 width=171)
>> Hash Cond: ("outer".dbid = "inner"."schema")
>> ->  Seq Scan on "schema" s  
>(cost=0.00..1.02 rows=2 width=128)
>> ->  Hash  (cost=9.75..9.75 rows=1 width=51)
>>   ->  Nested Loop  (cost=0.00..9.75 
>rows=1 width=51)
>> Join Filter: 
>("inner"."usage" = "outer".dbid)
>> ->  Index Scan using 
>usage_flow_i on "usage" u  (cost=0.00..4.78 rows=1 width=8)
>> ->  Index Scan using 
>usageparameter_schema_i on usageparameter up  (cost=0.00..4.96 
>rows=1 width=51)
>>   Filter: 
>((prefix)::text <> 'xsd'::text)
>>   ->  Seq Scan on flow fl  (cost=0.00..0.00 
>rows=1 width=4)
>> Filter: (servicetype = 646)
>> 
>Unique  (cost=11.50..11.52 rows=2 width=131) Unique  
>(cost=10.81..10.83 rows=1 width=167)
>
>The estimations for the cost is basically the same, 10ms for 
>the first row.  Can you supply Explain analyze to see what 
>it's actually doing?
>
>Russell Smith
>
QUERY PLAN
"Unique  (cost=10.81..10.83 rows=1 width=167) (actual time=19390.684..19390.687 
rows=1 loops=1)"
"  ->  Sort  (cost=10.81..10.82 rows=1 width=167) (actual 
time=19390.678..19390.679 rows=1 loops=1)"
"Sort Key: up.prefix, s.name, s.tuid, s.foundryversion"
"->  Nested Loop  (cost=9.75..10.80 rows=1 width=167) (actual 
time=19377.051..19390.391 rows=1 loops=1)"
"  Join Filter: ("outer".flow = "inner".dbid)"
"  ->  Hash Join  (cost=9.75..10.79 rows=1 width=171) (actual 
time=19173.684..19181.827 rows=770 loops=1)"
"Hash Cond: ("outer".dbid = "inner"."schema")"
"->  Seq Scan on "schema" s  (cost=0.00..1.02 rows=2 
width=128) (actual time=0.240..0.693 rows=20 loops=1)"
"->  Hash  (cost=9.75..9.75 rows=1 width=51) (actual 
time=19173.354..19173.354 rows=0 loops=1)"
"

Re: [PERFORM] mis-estimation on data-warehouse aggregate creation

2004-11-17 Thread Kris Jurka


On Tue, 16 Nov 2004, Simon Riggs wrote:

> The join condition has so many ANDed predicates that we assume that this
> will reduce the selectivity considerably. It does not, and so you pay
> the cost dearly later on.
> 

Yes, that makes a lot of sense.  Without some incredibly good cross-column
statistics there is no way it could expect all of the rows to match.  
Thanks for the analysis.

Kris Jurka



Re: [PERFORM] memcached and PostgreSQL

2004-11-17 Thread Greg Stark
Josh Berkus <[EMAIL PROTECTED]> writes:

> So memcached becomes a very good place to stick data that's read often but 
> not 
> updated often, or alternately data that changes often but is disposable.   An 
> example of the former is a user+ACL list; and example of the latter is web 
> session information ... or simple materialized views.

I would like very much to use something like memcached for a materialized view
I have. The problem is that I have to join it against other tables.

I've thought about providing a SRF in postgres to read records out of
memcached but I'm unclear whether it would really help at all.

Has anyone tried anything like this?

-- 
greg

