Re: [PERFORM] Postgresql optimisation
-----Original Message-----
From: Denis BUCHER

> And each morning huge tables are DELETED and all data is INSERTed new
> from a script. (Well, "huge" is very relative, it's only 400'000 records)

If you are deleting ALL rows in the tables, then I would suggest using
TRUNCATE instead of DELETE. Truncate will be faster at deleting, and it
will not accumulate dead tuples. Also, if you switch to truncate then you
should ANALYSE the tables after you finish inserting. Note that VACUUM
ANALYSE is not necessary after a truncate/insert because there should be no
dead tuples to vacuum.

Dave

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
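The daily reload described above could be sketched like this (the table
name and load method are made up for illustration):

```sql
BEGIN;
TRUNCATE daily_data;                 -- instant, leaves no dead tuples behind
COPY daily_data FROM '/path/to/export.csv' WITH CSV;  -- or a series of INSERTs
COMMIT;

ANALYZE daily_data;  -- refresh planner statistics; no VACUUM needed here
```

Running ANALYZE after the load matters because TRUNCATE plus COPY leaves
the planner with statistics from the old contents (or none at all).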
Re: [PERFORM] Postgresql optimisation
From: Matthew Wakeling

> Perhaps reading the other replies in the thread before replying yourself
> might be advisable, because this previous reply directly contradicts you:
>
> On Wed, 28 Oct 2009, Kevin Grittner wrote:
>> I recommend VACUUM ANALYZE of the table(s) after this step. Without
>> that, the first query to read each tuple sets its hint bits and
>> rewrites it, causing a surprising delay at unpredictable times
>> (although heavier near the start of the day).
>
> There *is* a benefit of running VACUUM ANALYSE rather than just ANALYSE.
>
> Matthew

I did read the other replies first; I guess I just missed Kevin Grittner's
somehow. I noticed several people were worried the OP had problems with
bloat, which is why I suggested TRUNCATE if possible. That was my main
point.

I guess I made the other comment because I feel beginners with Postgres
quite often don't understand the difference between VACUUM and ANALYSE, and
for large tables an ANALYSE alone can take much less time. I didn't think
about hint bits because I've never noticed a big impact from them, but that
is probably just because of my particular situation. Now that it has been
pointed out to me, I agree it is good advice for the OP to use VACUUM
ANALYSE.

Dave
Re: [PERFORM] What is the role of #fsync and #synchronous_commit in configuration file?
From: keshav upadhyaya
Subject: [PERFORM] What is the role of #fsync and #synchronous_commit in configuration file?

> Hi,
> I want to improve the performance of inserting huge amounts of data into
> my table. I have only one index on the table.
>
> First question - I want to know the roles played by #fsync = on and
> #synchronous_commit = on. I want to understand in more detail what
> exactly happens if one turns them off. Is it dangerous to do this, as it
> will not sync the data on each commit?

The settings are described in the docs:
http://www.postgresql.org/docs/8.4/interactive/runtime-config-wal.html

If you turn fsync off, you risk data loss in case of power or hardware
failure.

Dave
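One point worth adding: unlike fsync, synchronous_commit can be turned off
for a single session, which can speed up a bulk load while only risking the
loss of the most recent commits after a crash (not database corruption).
A minimal sketch, with a made-up table name:

```sql
-- Affects only this session; other connections still commit synchronously.
SET synchronous_commit = off;

INSERT INTO my_table (id, payload) VALUES (1, 'a');  -- hypothetical table
COMMIT;  -- returns before the WAL is flushed to disk
```

This is often a safer first experiment than touching fsync in
postgresql.conf.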
Re: [PERFORM] High CPU load on Postgres Server during Peak times!!!!
From: Shiva Raman
Subject: Re: [PERFORM] High CPU load on Postgres Server during Peak times

> Andy Colson wrote:
>> Eww. I think that's bad. A connection that has a transaction open will
>> cause lots of row versions, which use up ram, and make it slower to
>> step through the table (even with an index). You really need to fix up
>> your code and make sure you commit transactions. (any statement
>> (select, insert, update) will start a new transaction that you need to
>> explicitly commit).
>
> With reference to this suggestion by Andy Colson, we checked the
> application code and found that only INSERT and UPDATE have COMMIT;
> SELECT has no commit. We are using a lot of Ajax Suggest in all the
> forms for fetching the data using SELECT statements, which are not
> explicitly committed. We have started updating the code on this.

You need a COMMIT for every BEGIN. If you just run a SELECT statement
without first beginning a transaction, then you should not end up with a
connection that is "idle in transaction". If you are beginning a
transaction, doing a select, and then not committing, then yes, that is a
bug.

Dave
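The problem pattern looks like this as a psql-style session (the table and
query are made up):

```sql
BEGIN;
SELECT name FROM customers WHERE name LIKE 'Sm%';  -- Ajax suggest lookup
-- No COMMIT/ROLLBACK here: the connection now sits "idle in transaction",
-- pinning old row versions and preventing VACUUM from reclaiming them.

-- The fix is to close the transaction as soon as the result is fetched:
COMMIT;  -- for a read-only statement, ROLLBACK works just as well
```

Most client libraries that open transactions implicitly have some form of
commit/rollback or autocommit setting that does this for you.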
Re: [PERFORM] select max() much slower than select min()
-----Original Message-----
From: Brian Cox
Subject: [PERFORM] select max() much slower than select min()

> seems like max() shouldn't take any longer than min() and certainly not
> 10 times as long. Any ideas on how to determine the max more quickly?

That is odd. It seems like max should actually have to scan fewer rows than
min should. It might still be bloat in the table, because unless you did
VACUUM FULL there could still be dead rows. A VACUUM VERBOSE would show
whether there is bloat or not.

Also, maybe you could try a two-column index like this:

create index test_index on ts_stats_transet_user_interval
    (ts_interval_start_time, ts_id);

Dave
Re: [PERFORM] Performance issue - 2 linux machines, identical configs, different performance
> We have two machines. Both running Linux Redhat, both running postgres
> 8.2.5. Both have nearly identical 125 GB databases. In fact we use PITR
> Recovery to replicate from one to the other.

I have to ask the obvious question. Do you regularly ANALYZE the machine
you replicate to?

Dave
Re: [PERFORM] Yet another slow nested loop
-----Original Message-----
From: Alexander Staubo

> -> Nested Loop (cost=0.00..5729774.95 rows=10420 width=116)
>      (actual time=262614.470..262614.470 rows=0 loops=1)
>      Join Filter: ((photos.taken_at > (event_instances.time + '-01:00:00'::interval))
>        AND (photos.taken_at < (event_instances.time + '07:00:00'::interval)))
>      -> Nested Loop (cost=0.00..2055574.35 rows=11869630 width=120)
>           (actual time=21.750..121838.012 rows=14013998 loops=1)

Do you have any of the other enable_* options set to false? What do you
have random_page_cost set to? I ask because I'm surprised to see Postgres
choose to loop when it knows it will have to loop 11 million times.

Dave
Re: [PERFORM] Unexpected query plan results
-----Original Message-----
From: Anne Rosset
Subject: Re: [PERFORM] Unexpected query plan results

> SELECT SUM(1) FROM item WHERE is_deleted = 'f';
>    sum
> ---------
>  1824592
> (1 row)
>
> SELECT SUM(1) FROM item WHERE folder_id = 'tracker3641';
>   sum
> --------
>  122412
> (1 row)
>
> SELECT SUM(1) FROM item WHERE folder_id = 'tracker3641'
>   AND is_deleted = 'f';
>  sum
> -----
>   71
> (1 row)
>
> SELECT SUM(1) FROM item WHERE folder_id = 'tracker3641'
>   AND is_deleted = 't';
>   sum
> --------
>  122341
> (1 row)
>
> > Something's not right here. If the whole table has only 468173 rows,
> > you can't have 1.8 million deleted rows where is_deleted = false.
> > ...Robert
>
> The item table has 2324829 rows. The artifact table has 468173 rows.
> Thanks,
> Anne

I'd been thinking about the sort, but I hadn't thought yet about whether
that index scan on item could be made faster. Could you post the table
definition of item, including the indexes on it?

Dave
Re: [PERFORM] Unexpected query plan results
From: Anne Rosset
Subject: [PERFORM] Unexpected query plan results

> Hi,
> We have one query which has a left join. If we run this query without
> the left join, it runs slower than with the left join.
> [snip]
> I am having a hard time understanding why the query runs faster with the
> left join.

It looks like the query plan for the query without the left join is less
than optimal. Adding the left join just seemed to shake things up enough
that Postgres picked a better plan.

The slow step in the query without the left join appears to be sorting the
result of a hash join so it can be used in a merge join:

-> Sort (cost=47640.91..47808.10 rows=66876 width=70)
     (actual time=4273.919..4401.387 rows=168715 loops=1)
     Sort Key: (artifact.id)::text
     -> Hash Join (cost=9271.96..42281.07 rows=66876 width=70)
          (actual time=124.119..794.667 rows=184378 loops=1)

The plan might be sped up by removing the sort or by making the sort
faster.

Postgres thinks the hash join will only produce 66,876 rows, but it
produces 184,378 rows. If it made a better estimate of the results of the
hash join, it might not choose this plan. I don't really know if there is a
way to improve the estimate on a join when the estimates of the inputs look
pretty good.

As a test, you might try disabling sorts by setting enable_sort to false,
then run the EXPLAIN ANALYZE again to see what you get.

You might be able to make the sort faster by increasing work_mem. What do
you have work_mem set to now, and what version of Postgres are you using?

Dave
Re: [PERFORM] Unexpected query plan results
From: Anne Rosset
Subject: Re: [PERFORM] Unexpected query plan results

> Thanks Dave. We are using postgresql-server-8.2.4-1PGDG and have
> work_mem set to 20MB. What value would you advise?
> thanks,
> Anne

work_mem is kind of tricky because the right setting depends on how much
RAM your machine has, whether the machine is dedicated to Postgres, and how
many simultaneous connections you have. If this is a test server, and not
used in production, you could just play around with the setting and see if
your query gets any faster.

Here are the docs on work_mem:
http://www.postgresql.org/docs/8.2/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY
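Since work_mem can be changed per session, the experiment doesn't require
editing postgresql.conf at all. A sketch (the value and the query are just
illustrations, not recommendations):

```sql
-- Only affects the current session; nothing is written to postgresql.conf.
SET work_mem = '64MB';

-- Re-run the slow query and check whether the Sort node now reports an
-- in-memory quicksort instead of an external (on-disk) merge.
EXPLAIN ANALYZE
SELECT a.id FROM artifact a JOIN item i ON i.id = a.id ORDER BY a.id;
```

If the sort fits in memory at some value, that value (times the expected
number of concurrent sorts) gives a rough upper bound for the global
setting.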
Re: [PERFORM] Any better plan for this query?..
> What I don't understand is the part where you are talking about disabling
> hash joins:
>
> * result: planner replaced hash join by merge join
> * execution time: 0.84ms !
> * NOTE: curiously the planner is expecting to execute this query in
>   0.29ms - so it's supposed from its logic to be faster, so why is this
>   plan not used from the beginning???...
>
> Sort (cost=4562.83..4568.66 rows=2329 width=176) (actual time=0.237..0.237 rows=20 loops=1)
>   Sort Key: h.horder
>   Sort Method: quicksort Memory: 30kB
>   -> Merge Join (cost=4345.89..4432.58 rows=2329 width=176) (actual time=0.065..0.216 rows=20 loops=1)
>        Merge Cond: (s.ref = h.ref_stat)
>        -> Index Scan using stat_ref_idx on stat s (cost=0.00..49.25 rows=1000 width=45) (actual time=0.018..0.089 rows=193 loops=1)
>        -> Sort (cost=4345.89..4351.72 rows=2329 width=135) (actual time=0.042..0.043 rows=20 loops=1)
>             Sort Key: h.ref_stat
>             Sort Method: quicksort Memory: 30kB
>             -> Index Scan using history_ref_idx on history h (cost=0.00..4215.64 rows=2329 width=135) (actual time=0.012..0.025 rows=20 loops=1)
>                  Index Cond: (ref_object = '01'::bpchar)
> Total runtime: 0.288 ms
> (12 rows)

The EXPLAIN ANALYZE ran the query in 0.288 ms. That is the actual time it
took to run the query on the server; it is not an estimate of the time. You
measured 0.84 ms to run the query, which seems to imply either a problem in
one of the timing methods, or that 66% of your query execution time is
spent sending the results to the client. I'm curious how you did your
execution time measurements.

Dave
Re: [PERFORM] Deleting millions of rows
-----Original Message-----
From: Brian Cox
Subject: [PERFORM] Deleting millions of rows

> I'm using 8.3.5. Table ts_defects has 48M rows. Through psql:
> delete from ts_defects;
> Result: out of memory/Can't allocate size: 32
> I then did 10 or so deletes to get rid of the rows. Afterwards, inserts
> into or queries on this table performed significantly slower. I tried a
> vacuum analyze, but this didn't help. To fix this, I dumped and restored
> the database.
>
> 1) why can't postgres delete all rows in a table if it has millions of
>    rows?
> 2) is there any other way to restore performance other than restoring
>    the database?
>
> Thanks, Brian

If you are deleting an entire table, then the TRUNCATE command is the way
to go. TRUNCATE is very fast and leaves no dead rows behind.

The problem with a normal delete is that the rows are not actually removed
from the file. Once the table is VACUUMed, the dead space is marked as
available to be reused, but plain VACUUM doesn't remove any space either. A
VACUUM FULL or CLUSTER will actually remove dead space, but they can take a
while to run. (I've heard CLUSTER is supposed to be faster than VACUUM
FULL.)

Another way is to create a new table with the same definition as the old
table, select the rows you want to keep into the new table, drop the old
table, and then rename the new table to have the old table's name.

Dave
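The copy-and-rename approach from the last paragraph, as a sketch (the
filter column is made up; indexes, constraints, and grants on the old
table have to be re-created on the new one by hand):

```sql
BEGIN;
CREATE TABLE ts_defects_new AS
    SELECT * FROM ts_defects
    WHERE closed = false;            -- hypothetical "rows to keep" filter
DROP TABLE ts_defects;
ALTER TABLE ts_defects_new RENAME TO ts_defects;
COMMIT;

ANALYZE ts_defects;                  -- the new table has no statistics yet
```

Because the new table is written from scratch, it contains no dead space,
which is what the dump/restore achieved the hard way.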
Re: [PERFORM] postgresql performance
-----Original Message-----
From: SPMLINGAM
Subject: [PERFORM] postgresql performance

> Dear Friends,
> I have a table with 50 lakhs records, the table has more than 10 fields,
> i have primary key, i have select query with count(*) without any
> condition, it takes 17 seconds.

17 seconds to scan 5 million records doesn't sound that bad to me.
Postgresql does not store a count of records, and so it has to actually
scan the table to count all the records. This was a design choice, because
select count(*) isn't usually used in a production system.

> I have another one query which will do joins with other small tables, it
> takes 47 seconds to give output, the result has 2 lakhs records. the
> indexing is not used. I have created one index with one field (which i
> used in this query, the field value has duplicates also).

You should post which version of Postgresql you are using, your table
definition, and the output of EXPLAIN ANALYZE run on your query.

If you have a lot of IO wait, you are most likely IO bound. When Postgresql
is using a lot of CPU, it is likely performing a sort or hashing.

Pulling a large number of rows out of an even larger table can be difficult
to do extremely quickly, but if you post the EXPLAIN ANALYZE output we
would know whether things could be improved or not.

Dave
Re: [PERFORM] Query only slow on first run
-----Original Message-----
From: tmp

> We have primarily two tables of interest here: questions (~100k rows)
> and posts (~400k rows). Each post refers to a question, but only the
> posts rows for which the corresponding question.status = 1 are relevant.
> This reduces the number of relevant question rows to about 10k.

Earlier you said only a small subset of questions have a status of 1, so I
assumed you meant something like 100, not 10k. :)

According to the EXPLAIN ANALYZE there are only 646 rows in posts which
match your criteria, so it does seem like scanning posts first might be the
right thing to do.
Re: [PERFORM] Query only slow on first run
-----Original Message-----
From: cluster

> > Probably by buying much faster disk hardware. Or buy more RAM, so that
> > the data can stay cached.
>
> So the only problem here is lack of RAM and/or disk speed?

I don't think you can reach that conclusion yet. Like everybody said, the
reason the query was faster the second time was that the disk pages were
cached in RAM, and pulling the data out of RAM is way faster than disk.

If I were you, I would try to optimize the query for when the disk pages
aren't in RAM. In order to test the query without having anything cached,
you need to clear out Postgres's shared buffers and the OS cache. That can
be tricky, but it may be as easy as running a big select on another table.

As for optimizing the query, I noticed that all three joins are done by
nested loops. I wonder if another join method would be faster. Have you
analyzed all the tables? You aren't disabling hash joins or merge joins,
are you? If you aren't, then as a test I would try disabling nested loops
by doing "set enable_nestloop=false" and see if the query is any faster for
you. If it is faster without nested loops, then you might need to look into
changing some settings.

Dave
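The test from the last paragraph, spelled out as a session (the query is a
hypothetical stand-in for the slow one; column names are guesses based on
the thread):

```sql
-- Session-local: discourages (does not forbid) nested-loop join plans.
SET enable_nestloop = false;

EXPLAIN ANALYZE
SELECT q.id, p.body
FROM questions q JOIN posts p ON p.question_id = q.id
WHERE q.status = 1;

RESET enable_nestloop;   -- back to the default when done testing
```

Because the setting only lasts for the session, nothing needs to be undone
on other connections.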
Re: [PERFORM] Query only slow on first run
-----Original Message-----
From: cluster

> If I disable the nested loops, the query becomes *much* slower.
>
> A thing that strikes me is the following. As you can see, I have the
> constraint: q.status = 1. Only a small subset of the data set has this
> status. I have an index on q.status, but for some reason this is not
> used. Instead the constraint is enforced with a Filter: (q.status = 1)
> in an index scan for the primary key on the q table. If the small subset
> having q.status = 1 could be isolated quickly using an index, I would
> expect the query to perform better. I just don't know why the planner
> doesn't use the index on q.status.

What version of Postgres are you using? Do you know what your
join_collapse_limit is set to?

You might be able to force it to scan for questions with a status of 1
first, to see if it helps, by changing the FROM clause to:

FROM posts p, question_tags qt,
     (SELECT * FROM questions WHERE status = 1 OFFSET 0) q

Dave
Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
-----Original Message-----
From: Ow Mun Heng
Subject: Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

> Even for Postgresql, nested loops are still evil and hamper performance.

I don't know about that. There are times when it is the right plan:

explain analyze select * from table1 t1 inner join table2 t2
    on t1.f_id = t2.id where t1.id = 'xyzzy';

 Nested Loop (cost=0.00..17.65 rows=1 width=344) (actual time=0.080..0.096 rows=1 loops=1)
   -> Index Scan using table1_pkey on table1 t (cost=0.00..9.18 rows=1 width=238) (actual time=0.044..0.048 rows=1 loops=1)
        Index Cond: ((id)::text = 'xyzzy'::text)
   -> Index Scan using table2_pkey on table2 i (cost=0.00..8.46 rows=1 width=106) (actual time=0.019..0.023 rows=1 loops=1)
        Index Cond: (t.f_id = i.id)
 Total runtime: 0.224 ms

set enable_nestloop=off;
SET

explain analyze select * from table1 t1 inner join table2 t2
    on t1.f_id = t2.id where t1.id = 'xyzzy';

 Hash Join (cost=9.18..72250.79 rows=1 width=344) (actual time=13493.572..15583.049 rows=1 loops=1)
   Hash Cond: (i.id = t.f_id)
   -> Seq Scan on table2 i (cost=0.00..61297.40 rows=2188840 width=106) (actual time=0.015..8278.347 rows=2188840 loops=1)
   -> Hash (cost=9.18..9.18 rows=1 width=238) (actual time=0.056..0.056 rows=1 loops=1)
        -> Index Scan using table1_pkey on table1 t (cost=0.00..9.18 rows=1 width=238) (actual time=0.040..0.045 rows=1 loops=1)
             Index Cond: ((id)::text = 'xyzzy'::text)
 Total runtime: 15583.212 ms

(I changed the table names, but everything else is real.)
Re: [PERFORM] Optimizing PostgreSQL for Windows
From: Christian Rengstl
Subject: [PERFORM] Optimizing PostgreSQL for Windows

> Hi list,
> I have the following query:
> select t.a1, t.a2 from table1 t inner join table2 s using(id) where
> t.pid='xyz' and s.chromosome=9 order by s.pos;
>
> -> Bitmap Heap Scan on table1 t (cost=388.25..27357.57 rows=22286 width=23) (actual time=112.595..32989.663 rows=22864 loops=1)
>      Recheck Cond: ((pid)::text = 'xyz'::text)
>      -> Bitmap Index Scan on idx_table1 (cost=0.00..382.67 rows=22286 width=0) (actual time=103.790..103.790 rows=22864 loops=1)
>           Index Cond: ((pid)::text = 'xyz'::text)

The bitmap heap scan on table1 seems very slow. What version of Postgres
are you using? There were performance enhancements in 8.1 and 8.2. What
kind of hard drive are you using? I would guess a single SATA drive would
give you better performance than that, but I don't know for sure.

Do you regularly vacuum the table? Not vacuuming enough can lead to tables
filled with dead rows, which can increase the amount of data needing to be
scanned considerably.

Dave
Re: [PERFORM] select count(*) performance (vacuum did not help)
-----Original Message-----
From: Gábor Farkas

> i see.
> will i achieve the same thing by simply dropping that table and
> re-creating it?

Yes. Or even easier (if you don't need the data anymore), you can use the
TRUNCATE command, which deletes everything in the table, including dead
rows.

Dave
Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!
From: smiley2211
Subject: Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!

> -> Seq Scan on encounters_questions_answers eqa
>      (cost=100000000.00..100017608.66 rows=464766 width=8)
>      (actual time=0.003..735.934 rows=464766 loops=7430)

It looks like enable_seqscan is set to false. For some reason that might
have worked on 7.4, but I would try turning it back on for 8.1. Sequential
scans aren't always bad; sometimes they are faster than index scans. I
would first try running the system with all the enable_* settings on.

If you can't turn on logging, it's going to be very hard to track down the
problem. The easiest way to track down a problem normally is to set
log_min_duration_statement to something like 2000ms. Then Postgres will log
all slow queries, and you can run EXPLAIN ANALYZE on the slow queries to
find the problem.

I think Carlos had a good idea when he asked about the encoding on the new
server vs the old. Does your application use the LIKE keyword to compare
text fields? If so, you might need to create indexes which use the
text_pattern_ops operator classes. With unicode, Postgres cannot use an
index scan for a query like

SELECT * FROM foo WHERE name LIKE 'Bob%'

unless there is an index like

CREATE INDEX name_index ON foo (name text_pattern_ops);

However, if you are not using LIKE queries, then this is not your problem.

More on operator classes:
http://www.postgresql.org/docs/8.1/interactive/indexes-opclass.html

Dave
Re: [PERFORM] Low CPU Usage
From: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Low CPU Usage

> I have no idea how to continue researching this problem. Now I'm going
> to do some network tests.

I would go back to the slow program and try to capture the slow queries in
the log file. Once you have some queries which are running slow, you can
run EXPLAIN ANALYZE on them to see what the bottleneck is.

It seems like you've found that pgAdmin is slow sending across the network,
but we don't know if that has anything to do with your original problems.

Just my 2 cents,
Dave
Re: [PERFORM] Query works when kludged, but would prefer best practice solution
I think Tom is talking about something like this:

explain select * from foo();
                         QUERY PLAN
------------------------------------------------------------
 Function Scan on foo (cost=0.00..12.50 rows=1000 width=50)

The planner is estimating the function will return 1000 rows.

explain select * from foo() where id > 0;
                         QUERY PLAN
------------------------------------------------------------
 Function Scan on foo (cost=0.00..15.00 rows=333 width=50)
   Filter: (id > 0)

In the second case I am asking for all ids greater than zero, but my ids
are all positive integers. The planner doesn't know that, so it assumes the
WHERE clause will decrease the number of results.

I would still say this is a kludge, and since you already found a kludge
that works, this may not help you at all.

Dave

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Carlo Stonebanks
Sent: Tuesday, September 18, 2007 1:29 AM
To: 'Tom Lane'; 'Merlin Moncure'
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query works when kludged, but would prefer best practice solution

Hi Tom,

Thanks for the suggestion - this concept is pretty new to me. Can you
expand a bit on the idea of how to place such a "dummy" constraint on a
function, and the conditions under which it affects the planner? Would this
require that constraint_exclusion be set on?

(When I go to sleep, I have a dream -- and in this dream Tom writes a
brilliant three-line code sample that makes it all clear to me, and I wake
up a PostgreSQL guru.) ;-)

Carlo

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: September 17, 2007 11:30 PM
To: Merlin Moncure
Cc: Carlo Stonebanks; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query works when kludged, but would prefer best practice solution

Merlin Moncure [EMAIL PROTECTED] writes:
> On 9/17/07, Carlo Stonebanks [EMAIL PROTECTED] wrote:
>> Please see the section marked as "PROBLEM" in "ORIGINAL QUERY" plan
>> below.
> This looks like it might be the problem Tom caught and rigged a solution
> to:
> http://people.planetpostgresql.org/dfetter/index.php?/archives/134-PostgreSQL-Weekly-News-September-03-2007.html
> (look for "band-aid")

No, 'fraid not; that was about misestimation of outer joins, and I see no
outer join here. What I do see is misestimation of a
set-returning-function's output:

-> Function Scan on zips_in_mile_range (cost=0.00..12.50 rows=1000
width=40) (actual time=149.850..149.920 rows=66 loops=1)

There's not any very nice way to improve that in existing releases :-(.
In 8.3 it will be possible to add a ROWS option to function definitions to
replace the default 1000-rows estimate with some other number, but that
still helps little if the number of result rows is widely variable.

As far as kluges go: rather than kluging conditions affecting unrelated
tables, maybe you could put in a dummy constraint on the function's output
--- ie, a condition you know is always true, but the planner won't know
that, and will scale down its result-rows estimate accordingly.

regards, tom lane
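The 8.3 ROWS option Tom mentions ends up looking like this (the function is
a made-up stand-in for zips_in_mile_range; the body is just
generate_series for illustration):

```sql
-- PostgreSQL 8.3+: the ROWS clause replaces the default 1000-row estimate
-- for a set-returning function.
CREATE FUNCTION small_set(n integer)
RETURNS SETOF integer
AS $$ SELECT generate_series(1, $1) $$
LANGUAGE sql
ROWS 50;   -- planner now assumes ~50 result rows instead of 1000

-- The Function Scan node in EXPLAIN should now show rows=50.
EXPLAIN SELECT * FROM small_set(40);
```

As Tom notes, a fixed estimate still helps little when the real row count
varies widely from call to call.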
Re: [PERFORM] update, truncate and vacuum
From: Scott Feldstein
Subject: [PERFORM] update, truncate and vacuum

> Hi,
> I have a couple questions about how update, truncate and vacuum would
> work together.
>
> 1) If I update a table foo (id int, value numeric(20, 6)) with
>    update foo set value = 100 where id = 1
>    would a vacuum be necessary after this type of operation, since the
>    updated value is a numeric? (as opposed to a sql type where its size
>    could potentially change, i.e. varchar)

Yes, a vacuum is still necessary. The type doesn't really matter: Postgres
effectively does a delete and insert on all updates.

> 2) After several updates/deletes to a table, if I truncate it, would it
>    be necessary to run vacuum in order to reclaim the space?

No, a vacuum is not necessary after a truncate, because the whole data file
is deleted once a truncate commits. There aren't any dead rows because
there aren't any rows.

Dave
Re: [PERFORM] test / live environment, major performance difference
From: Christo Du Preez
Sent: Tuesday, June 12, 2007 2:38 AM

> Where do I set the planner settings, or are you referring to settings in
> postgres.conf that may affect the planner?

Yes, I'm referring to settings in postgres.conf. I'm wondering if
enable_indexscan or something got turned off on the server for some reason.
Here is a description of those settings:
http://www.postgresql.org/docs/8.2/interactive/runtime-config-query.html

When you move data from the laptop to the server, I see that your script
correctly runs an ANALYZE after the load, but have you run ANALYZE on the
fast laptop lately? Hopefully running ANALYZE wouldn't make the planner
choose a worse plan on the laptop, but if we are trying to get things
consistent between the laptop and the server, that is something I would
try.

If the consistency problem really is a problem of the planner not using
index scans on the server, then, if you can, please post the table
definition for the table with a million rows and an EXPLAIN ANALYZE of a
query which selects a few rows from it.

Dave
Re: [PERFORM] How can fixed and variable width columns perform similarly?
I think the manual is implying that if you store a value like "Sid" in a
field either of type varchar(128) or of type text, there is no performance
difference. The manual is not saying that you get the same performance
storing a 500k text field as when you store the value "Sid".

Dave

-----Original Message-----
From: Siddharth Anand
Sent: Friday, April 27, 2007 10:32 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] How can fixed and variable width columns perform similarly?

Hi!
I read the link below and am puzzled by, or curious about, something.
http://www.postgresql.org/docs/8.1/interactive/datatype-character.html

The Tip below is intriguing:

"Tip: There are no performance differences between these three types, apart
from the increased storage size when using the blank-padded type. While
character(n) has performance advantages in some other database systems, it
has no such advantages in PostgreSQL. In most situations text or character
varying should be used instead."

How can a field that doesn't have a limit, like text, perform similarly to
char varying(128), for example? At some point, we need to write data to
disk. The more data that needs to be written, the longer the disk write
will take, especially when it requires finding free sectors to write to.

Another interesting quote from the same page is the following:

"Long values are also stored in background tables so they do not interfere
with rapid access to the shorter column values."

If the long values are stored in a separate table, on a different part of
the disk, doesn't this imply an extra disk seek? Won't it therefore take
longer?

Sid
Re: [PERFORM] not using indexes on large table
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeroen Kleijer

> The problem comes when I try to do a query without using a where clause,
> because by then it completely discards the indexes and does a complete
> table scan, which takes over half an hour! (40.710.725 rows, 1110258
> pages, 1715 seconds)
>
> I've tried several things but doing a query like:
> select distinct volume from project_access_times
> or
> select distinct qtree from project_access_times
> always results in a full sequential table scan, even after a 'vacuum'
> and 'vacuum analyze'.

To my knowledge, Postgres doesn't use indexes for DISTINCT queries or for
grouping.

Also, you are getting horrible IO performance. Our old slow test machine
can scan a table of 12 million rows in 100 seconds, and our production
server can do the same in 20 seconds. If possible, I would try running the
same thing on your local hard drive; that way you can see how much the
NetApp and NFS are slowing you down.

Although in the end, if you need very fast distinct queries, you will need
to maintain a separate table.

Dave
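One way to maintain that separate table is with a trigger that records each
new value as it arrives. A sketch (the side table and function names are
made up, and for brevity this ignores the race where two concurrent inserts
see the value as missing at the same time):

```sql
CREATE TABLE distinct_volumes (volume text PRIMARY KEY);

CREATE FUNCTION record_volume() RETURNS trigger AS $$
BEGIN
    -- Insert the value only if it is not already known.
    INSERT INTO distinct_volumes (volume)
    SELECT NEW.volume
    WHERE NOT EXISTS
        (SELECT 1 FROM distinct_volumes WHERE volume = NEW.volume);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER track_volume BEFORE INSERT ON project_access_times
    FOR EACH ROW EXECUTE PROCEDURE record_volume();

-- "SELECT volume FROM distinct_volumes" then replaces the slow
-- "SELECT DISTINCT volume FROM project_access_times".
```

The trade-off is a small cost on every insert in exchange for an instant
distinct-values lookup.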
Re: [PERFORM] Basic Q on superfluous primary keys
I think a database with all natural keys is unrealistic. For example if you have a table that refers to people, are you going to use their name as a primary key? Names change all the time due to things like marriage, divorce, or trouble with the law. We have tables with 20 million rows which reference back to a table of people, and if I used the person's name as key, it would be a major pain when somebody's name changes. Even if there is referential integrity, one person might be referred to by 25% of the 20 million rows, so the update would take quite a long time. Also the table will be filled with dead rows and the indexes will likely be bloated. If I want to clean that up, it will take a vacuum full or a cluster which will lock the whole table and run for hours. If I use a surrogate key, I can change their name in one row and be done with it. Just my 2 cents. Dave
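The surrogate-key approach described above could be sketched like this (hypothetical tables, for illustration only):

```sql
-- Surrogate key: an id that never changes, so a name change is one UPDATE.
CREATE TABLE person (
    person_id serial PRIMARY KEY,
    name      text NOT NULL
);

CREATE TABLE trade (
    trade_id  serial PRIMARY KEY,
    person_id integer NOT NULL REFERENCES person (person_id),
    amount    numeric(22,9) NOT NULL
);

-- A name change touches exactly one row, no matter how many of the
-- 20 million trade rows reference this person.
UPDATE person SET name = 'New Name' WHERE person_id = 42;
```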
Re: [PERFORM] Beginner Question
In your first post you said that the query is taking much longer than a second, and in your second post you say the performance is horrible, but explain analyze shows the query runs in 219 milliseconds, which doesn't seem too bad to me. I wonder if the slow part for you is returning all the rows to the client? How are you running this query? (JDBC, ODBC, C library?) Do you really need all the rows? Maybe you could use a cursor to page through the rows? Dave -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of s d Sent: Monday, April 09, 2007 7:46 PM To: Jan de Visser Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Beginner Question Hi Jan, Adding this Index slowed down things by a factor of 4. Also, the performance is so horrible (example below) that I am certain I am doing something wrong. Does the following explain give any ideas? Thanks =# EXPLAIN ANALYZE select * from word_association where (word1 ='the' or word2='the') and count > 10; QUERY PLAN ---------------------------------------------------------------- Bitmap Heap Scan on word_association (cost=250.86..7256.59 rows=4624 width=22) (actual time=13.461..211.568 rows=6601 loops=1) Recheck Cond: (((word1)::text = 'the'::text) OR ((word2)::text = 'the'::text)) Filter: (count > 10) -> BitmapOr (cost=250.86..250.86 rows=12243 width=0) (actual time=9.052..9.052 rows=0 loops=1) -> Bitmap Index Scan on word_association_index1_1 (cost=0.00..153.20 rows=7579 width=0) (actual time=5.786..5.786 rows=7232 loops=1) Index Cond: ((word1)::text = 'the'::text) -> Bitmap Index Scan on word_association_index2_1 (cost=0.00..95.34 rows=4664 width=0) (actual time=3.253..3.253 rows=4073 loops=1) Index Cond: ((word2)::text = 'the'::text) Total runtime: 219.987 ms (9 rows)
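The cursor suggestion above could look like this (query taken from the thread; the batch size of 1000 is an arbitrary assumption):

```sql
BEGIN;
DECLARE word_cur CURSOR FOR
    SELECT * FROM word_association
    WHERE (word1 = 'the' OR word2 = 'the') AND count > 10;

-- Pull rows in batches instead of materializing all 6601 at once.
FETCH 1000 FROM word_cur;
-- ...process the batch on the client, then fetch the next one...
FETCH 1000 FROM word_cur;

CLOSE word_cur;
COMMIT;
```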
Re: [PERFORM] Weird performance drop
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Vincenzo Romano Is there any workaround? In my opinion the later the query planner decisions are taken the more effective they can be. It could be an option for the function (body) to delay any query planner decision. I think a possible workaround is to use a plpgsql function and the EXECUTE statement. The docs will have more info. Dave
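A minimal sketch of the plpgsql/EXECUTE workaround, using the pre-8.4 FOR-IN-EXECUTE loop (the table `a_table` and its column are hypothetical):

```sql
CREATE OR REPLACE FUNCTION lookup(p text) RETURNS SETOF a_table AS $$
DECLARE
    r a_table%ROWTYPE;
BEGIN
    -- The query string is planned each time EXECUTE runs, so the
    -- planner sees the actual value of p instead of an opaque parameter.
    FOR r IN EXECUTE 'SELECT * FROM a_table WHERE a_column LIKE '
                     || quote_literal(p || '%') LOOP
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
```

The price of the delayed planning is that the query is re-planned on every call rather than once at function creation.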
Re: [PERFORM] scalablility problem
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Xiaoning Ding Hi all, When I run multiple TPC-H queries (DBT3) on postgresql, I found the system is not scalable. My machine has 8GB memory, and 4 Xeon Dual Core processors (8 cores in total). OS kernel is linux 2.6.9. Postgresql is 7.3.18. Is there any way you can upgrade to 8.2? There have been a lot of performance and scalability enhancements.
Re: [PERFORM] Weird performance drop
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Vincenzo Romano I thought that the query planner usually did a bad job on function bodies because they'd appear opaque to it. In this case it seems to me that the body is opaque only if I use the like operator. If you run explain on a query that looks like select * from a_table where a_column like 'foo%' (and you have the appropriate index) you will see that postgres rewrites the where clause as a_column >= 'foo' and a_column < 'fop'. I think your problem is that the query is planned when the function is created, and at that time postgres doesn't know the value you are comparing against when you use the like operator, so postgres can't rewrite the query using >= and <. The problem doesn't happen for plain equality because postgres doesn't need to know anything about what you are comparing against in order to use equality. Somebody else can correct me if I'm wrong. Dave
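The rewrite described above can be observed directly (hypothetical table; the exact plan text depends on version and locale, so the comment is illustrative rather than actual output):

```sql
CREATE TABLE a_table (a_column text);
CREATE INDEX a_table_idx ON a_table (a_column);

EXPLAIN SELECT * FROM a_table WHERE a_column LIKE 'foo%';
-- With a C locale (or a *_pattern_ops index) the plan shows the
-- prefix condition rewritten into a range, e.g.:
--   Index Cond: ((a_column >= 'foo') AND (a_column < 'fop'))
```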
Re: [PERFORM] Planner doing seqscan before indexed join
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dan Harris After some digging, I've found that the planner is choosing to apply a necessary seq scan to the table. Unfortunately, it's scanning the whole table, when it seems that it could have joined it to a smaller table first and reduce the amount of rows it would have to scan dramatically (70 million to about 5,000). Joining will reduce the amount of rows to scan for the filter, but performing the join is non-trivial. If postgres is going to join two tables together without applying any filter first then it will have to do a seqscan of one of the tables, and if it chooses the table with 5000 rows, then it will have to do 5000 index scans on a table with 70 million records. I don't know which way would be faster. I wonder if you could find a way to use an index to do the text filter. Maybe tsearch2? I haven't used anything like that myself, maybe someone else has more input.
Re: [PERFORM] Nested Loop
-Original Message- From: [EMAIL PROTECTED] On Behalf Of Gauri Kanekar Subject: Re: [PERFORM] Nested Loop join_collapse_limit = 1 # JOINs Is there a reason you have this set to 1? Postgres can't consider multiple join orders when you do that. I would try setting that back to the default and seeing if this query is any faster. Other than that it looked like the problems with the query might be bad estimates of rows. One is that postgres expects there to be 1 matching row from rd when there are actually 30. You might try increasing the statistics targets on rd.sd and rd.sdt, reanalyzing, and seeing if that helps. Also postgres expects the join of rd and rm to return about 27205 rows when it actually returns 10 million. I'm not sure what you can do about that. Maybe if Postgres gets a better estimate for rd it would then estimate the join better. Dave
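Concretely, the suggestions above might look like this (table rd and columns sd/sdt come from the thread; the target value 100 is an arbitrary assumption):

```sql
-- Give the planner finer-grained statistics on the join columns,
-- then re-gather them.
ALTER TABLE rd ALTER COLUMN sd  SET STATISTICS 100;
ALTER TABLE rd ALTER COLUMN sdt SET STATISTICS 100;
ANALYZE rd;

-- And let the planner consider different join orders again.
SET join_collapse_limit = 8;  -- the default
```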
Re: [PERFORM] Execution plan changed after upgrade from 7.3.9 to 8.2.3
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Subject: Re: [PERFORM] Execution plan changed after upgrade from 7.3.9 to 8.2.3 The following did not change anything in the execution plan ALTER TABLE lm05_t_tarif_panneau ALTER COLUMN lrg_min SET STATISTICS 1000 ALTER TABLE lm05_t_tarif_panneau ALTER COLUMN lrg_max SET STATISTICS 1000 ANALYZE lm05_t_tarif_panneau I was able to improve response time by creating indexes, but I would like to avoid changing the database structure because it is not maintained by ourselves, but by a third-party vendor. I would actually try increasing the statistics on table lm05_t_couleur_panneau columns ht_min, ht_max, cod_aspect, and cod_gamme_panneau, because I think the planner is thrown off by the sequential scan on lm05_t_couleur_panneau returning 280 rows when it expects 1. Maybe to start you could just SET default_statistics_target=1000, analyze everything, and see if that makes any difference. Dave
Re: [PERFORM] Execution plan changed after upgrade from 7.3.9 to 8.2.3
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Subject: Re: [PERFORM] Execution plan changed after upgrade from 7.3.9 to 8.2.3 Increasing the default_statistics_target to 1000 did not help. It just made the vacuum full analyze take longer to complete. Just FYI, when you change statistics you only need to run ANALYZE, not VACUUM ANALYZE, and definitely not VACUUM FULL ANALYZE. I don't know what else to suggest for this query since you can't change the SQL. I would talk to the vendor and ask them to add indexes if you know that helps. Dave
Re: [PERFORM] Bad Row Count Estimate on View with 8.2
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane I took another look and think I found the problem: 8.2's new code for flattening UNION ALL subqueries into append relations is failing to initialize all the fields of the appendrel, which confuses estimate_num_groups (and perhaps other places). I think this will fix it for you. I gave this a try on our test machine yesterday and it worked. The planner was estimating that the group by on the union would return about 300 rows which is very similar to what 8.1.2 thought. Actually it returned about 3000 rows, but still it is a good enough estimate to pick a plan which takes 100ms instead of a plan which takes 100 seconds. Thanks, Dave
Re: [PERFORM] Bad Row Count Estimate on View with 8.2
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane In fact, since there isn't any parent relation in a UNION, I'm not sure that this patch actually changed your results ... but I'm not seeing what else would've ... Thanks for looking into it. I thought I might actually test if it was the patch you mentioned which changed my results, but I haven't had time. Because you mentioned it was grouping on the results of a UNION ALL which was throwing off the row estimate, I changed my query from a UNION ALL/GROUP BY to a GROUP BY/FULL OUTER JOIN. The view runs a hair slower by itself, but the better estimate of rows makes it work much better for joining with. If anybody is curious, this is what I changed to:

SELECT coalesce(pos_set.fund_id, trade_set.fund_id) as fund_id,
       coalesce(pos_set.owner_trader_id, trade_set.owner_trader_id) as owner_trader_id,
       coalesce(pos_set.strategy_id, trade_set.strategy_id) as strategy_id,
       coalesce(pos_set.cf_account_id, trade_set.cf_account_id) as cf_account_id,
       coalesce(pos_set.instrument_id, trade_set.instrument_id) as instrument_id,
       coalesce(pos_set.pos, 0) + coalesce(trade_set.pos, 0) as pos,
       coalesce(pos_set.cost, 0) + coalesce(trade_set.cost, 0) as cost
FROM (
    SELECT om_position.fund_id, om_position.owner_trader_id, om_position.strategy_id,
           om_position.cf_account_id, om_position.instrument_id,
           om_position.pos, om_position.cost
    FROM om_position
    WHERE om_position.as_of_date = date(now())
) as pos_set
full outer join (
    SELECT om_trade.fund_id, om_trade.owner_trader_id, om_trade.strategy_id,
           om_trade.cf_account_id, om_trade.instrument_id,
           sum(om_trade.qty::numeric(22,9)) AS pos, sum(om_trade.cost) as cost
    FROM om_trade
    WHERE om_trade.process_state = 0 OR om_trade.process_state = 2
    GROUP BY om_trade.fund_id, om_trade.owner_trader_id, om_trade.strategy_id,
             om_trade.cf_account_id, om_trade.instrument_id
) as trade_set
ON pos_set.fund_id = trade_set.fund_id
   and pos_set.owner_trader_id = trade_set.owner_trader_id
   and pos_set.strategy_id = trade_set.strategy_id
   and pos_set.cf_account_id = trade_set.cf_account_id
   and pos_set.instrument_id = trade_set.instrument_id;
[PERFORM] Bad Row Count Estimate on View with 8.2
on om_trade_partial_process_state_index (cost=0.00..4.26 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1) Index Cond: (process_state = 2) Total runtime: 27.055 ms Thanks, Dave Dutcher Telluride Asset Management 952.653.6411
Re: [PERFORM] Problem with grouping, uses Sort and GroupAggregate, HashAggregate is better(?)
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rolf Østvik (HA/EXA) Have you tried set enable_sort=off with 8.1.2? I'm not sure if that will change anything because it has to do at least one sort. It's just a lot faster to do a hashagg + small sort than one big sort in this case. (I wonder if there should be an enable_groupagg?)
Re: [PERFORM] Large table performance
Have you run vacuum and analyze on the table? What version of Postgres are you running? What OS are you using? This looks like a straightforward query. With any database, the first time you run the query it's going to be slower because it actually has to read off disk. The second time it's faster because some or all of the data/indexes will be cached. However, 10 seconds sounds like a long time for pulling 10,000 records out of a table of 3 million. If you post an EXPLAIN ANALYZE, it might give us a clue. Dave -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark Dobbrow Sent: Friday, January 12, 2007 6:31 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] Large table performance Hello - I have a fairly large table (3 million records), and am fetching 10,000 non-contiguous records doing a simple select on an indexed column ie select grades from large_table where teacher_id = X This is a test database, so the number of records is always 10,000 and I have 300 different teacher ids. The problem is, sometimes fetching un-cached records takes 0.5 secs and sometimes (more often) it takes more like 10.0 seconds (fetching the same records for a given teacher_id a second time takes about 0.25 secs) Has anyone seen similar behavior or know what the solution might be? any help much appreciated, Mark ps. My shared_buffers is set at 5000 (kernel max), and work_mem=8192
Re: [PERFORM] Slow Query on Postgres 8.2
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane [ fools around with it for awhile... ] I think this is already fixed for 8.2.1. Note the costs of the two related index scans: I installed 8.2.1 this morning and it works much better. The query that was taking 3411.429ms on 8.2.0 now takes 9.3ms. Thanks for your help.
Re: [PERFORM] Slow Query on Postgres 8.2
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane Um ... what indexes has this table got exactly? It's very unclear what alternatives the planner is being faced with. Here is the table definition. Thanks.

Table public.om_transaction
     Column      |          Type          |               Modifiers
-----------------+------------------------+-----------------------------------------
 transaction_id  | character varying(20)  | not null default '0'::character varying
 type            | character varying(20)  | not null default ''::character varying
 fund_id         | character varying(10)  | not null default ''::character varying
 owner_trader_id | character varying(10)  | not null default ''::character varying
 strategy_id     | character varying(30)  | not null default ''::character varying
 instrument_id   | integer                | default 0
 cf_account_id   | integer                | not null default 0
 as_of_date      | date                   | not null default '0001-01-01'::date
 insert_date     | date                   | not null default '0001-01-01'::date
 amount          | numeric(22,9)          | not null default 0.0
 currency_id     | integer                | not null default 0
 process_state   | integer                | not null
 comment         | character varying(256) | default ''::character varying
Indexes:
    om_transaction_pkey PRIMARY KEY, btree (transaction_id)
    cf_account_id_om_transaction_index btree (cf_account_id)
    currency_id_om_transaction_index btree (currency_id)
    fund_id_om_transaction_index btree (fund_id)
    instrument_id_om_transaction_index btree (instrument_id)
    om_transaction_om_transaction_index btree (as_of_date, fund_id, strategy_id, owner_trader_id, cf_account_id, instrument_id, type)
    om_transaction_partial_process_state_index btree (process_state) WHERE process_state = 0
    owner_trader_id_om_transaction_index btree (owner_trader_id)
    strategy_id_om_transaction_index btree (strategy_id)
Foreign-key constraints:
    $1 FOREIGN KEY (owner_trader_id) REFERENCES om_trader(trader_id)
    $2 FOREIGN KEY (fund_id) REFERENCES om_fund(fund_id)
    $3 FOREIGN KEY (strategy_id) REFERENCES om_strategy(strategy_id)
    $4 FOREIGN KEY (cf_account_id) REFERENCES om_cf_account(id)
    $5 FOREIGN KEY (instrument_id) REFERENCES om_instrument(id)
    $6 FOREIGN KEY (currency_id) REFERENCES om_instrument(id)
Re: [PERFORM] Slow Query on Postgres 8.2
The source data is a little different. The fast query was on our production 8.1 server, and the other was a test 8.2 server with day-old data. The production server has like 3.84 million rows vs 3.83 million rows in test, so the statistics might be a little different, but I would figure the comparison is still valid. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Adam Rich Sent: Thursday, January 04, 2007 7:19 PM To: 'Dave Dutcher'; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Slow Query on Postgres 8.2 Dave, Is it me or are the two examples you attached returning different row counts? That means either the source data is different, or your queries are. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Dutcher Sent: Thursday, January 04, 2007 5:32 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] Slow Query on Postgres 8.2 Hello, I am looking at upgrading from 8.1.2 to 8.2.0, and I've found a query which runs a lot slower. Here is the query: select type, currency_id, instrument_id, sum(amount) as total_amount from om_transaction where strategy_id in ('BASKET1','BASKET2','BASKET3','BASKET4','BASKET5','BASKET6','BASKET7','BASKET8','BASKET9','BASKET10','BASKET11') and owner_trader_id in ('dave','sam','bob','tad','tim','harry','frank','bart','lisa','homer','marge','maggie','apu','milhouse','disco stu') and cf_account_id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,29) and as_of_date > '2006-12-04' and as_of_date <= '2006-12-05' group by type, currency_id, instrument_id; I changed the values in the in statements to fake ones, but it still takes over three seconds on 8.2, where 8.1 only takes 26 milliseconds. When I increase the number of values in the IN clauses, the query rapidly gets worse. I tried increasing my stats target to 1000 and analyzing, but that didn't help so I put that back to 10. While the query is running the CPU is at 100%.
Is there a more efficient way to write a query like this? I've attached the output from EXPLAIN ANALYZE in a file because it is somewhat large. Thanks, Dave Dutcher Telluride Asset Management 952.653.6411
Re: [PERFORM] Regex performance issue
-Original Message- From: [EMAIL PROTECTED] On Behalf Of Alexandru Coseru asterisk=> explain analyze SELECT * FROM destlist WHERE '0039051248787' ~ prefix AND id_ent='-2' AND dir=0 ORDER by length(prefix) DESC; QUERY PLAN ---------------------------------------------------------------- Sort (cost=7925.07..7925.15 rows=31 width=67) (actual time=857.715..857.716 rows=2 loops=1) Sort Key: length((prefix)::text) -> Bitmap Heap Scan on destlist (cost=60.16..7924.30 rows=31 width=67) (actual time=2.156..857.686 rows=2 loops=1) Recheck Cond: ((id_ent = -2) AND (dir = 0)) Filter: ('0039051248787'::text ~ (prefix)::text) -> Bitmap Index Scan on destlist_indx2 (cost=0.00..60.16 rows=6193 width=0) (actual time=1.961..1.961 rows=5205 loops=1) Index Cond: ((id_ent = -2) AND (dir = 0)) Total runtime: 857.804 ms (8 rows) mmumu btree (prefix varchar_pattern_ops) I'm surprised Postgres isn't using the index on prefix, seeing as the index uses the varchar_pattern_ops operator class. It could be that the index isn't selective enough, or is Postgres not able to use an index with POSIX regular expressions? The docs seem to say that it can, but I'd be curious to see what happens if you use LIKE instead of ~. Dave
Re: [PERFORM] Query plan for heavy SELECT with lite sub-SELECTs
-Original Message- From: [EMAIL PROTECTED] Nikolay Samokhvalov What should I do to make Postgres work properly in such cases (I have a lot of similar queries; surely, they are executed w/o seqscans, but the overall picture is the same - I see that starting from sub-selects dramatically decreases performance)? How about this: explain analyze select (select typname from pg_type where pg_type.oid=mainq.prorettype limit 1) from (select * from pg_proc offset 1500 limit 1) mainq; QUERY PLAN ----------------------------------------------------------------- Subquery Scan mainq (cost=50.99..56.85 rows=1 width=4) (actual time=13.646..13.659 rows=1 loops=1) -> Limit (cost=50.99..51.02 rows=1 width=310) (actual time=13.575..13.579 rows=1 loops=1) -> Seq Scan on pg_proc (cost=0.00..62.34 rows=1834 width=310) (actual time=0.014..7.297 rows=1501 loops=1) SubPlan -> Limit (cost=0.00..5.82 rows=1 width=64) (actual time=0.038..0.043 rows=1 loops=1) -> Index Scan using pg_type_oid_index on pg_type (cost=0.00..5.82 rows=1 width=64) (actual time=0.028..0.028 rows=1 loops=1) Index Cond: (oid = $0) Total runtime: 13.785 ms I would expect you to get closer to 2 ms on that query. My machine takes 13 ms to do just the seq scan of pg_proc. Dave
Re: [PERFORM] Vacuum and Memory Loss
Hello friends, I am responsible for maintaining a high volume website using postgresql 8.1.4. Given the amount of reads and writes, I vacuum full the server a few times a week around 1, 2 AM shutting down the site for a few minutes. The next day morning around 10 - 11 AM the server slows down to death. It used to be that the error 'Too many clients' would be recorded, until I increased the number of clients it can handle, and now it simply slows down to death having lots and lots of postmaster processes running: If you are saying that running the vacuum full helps your performance, then you want to make sure you are running plain vacuum and analyze frequently enough. If you have a database which has lots of update and delete statements, and you do not run vacuum regularly enough, you can end up with lots of dead blocks slowing down database scans. If you do lots of updates and deletes you should schedule vacuum and analyze more often, or you might want to look into running auto vacuum: http://www.postgresql.org/docs/8.1/interactive/maintenance.html#AUTOVACUUM If you aren't doing lots of updates and deletes, then maybe you just have a busy database. Lots of postmaster processes implies you have lots of clients connecting to your database. You can turn on stats_command_string and then check the pg_stat_activity table to see what these connections are doing. If they are running queries, you can try to optimize them. Try turning on logging of long running queries with log_min_duration_statement. Then use EXPLAIN ANALYZE to see why the query is slow and if anything can be done to speed it up.
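For 8.1, enabling autovacuum and slow-query logging as suggested above might look like this in postgresql.conf (the threshold values are illustrative assumptions):

```
# postgresql.conf (8.1)
stats_start_collector = on
stats_row_level = on        # row-level stats are required by autovacuum in 8.1
autovacuum = on

stats_command_string = on   # lets pg_stat_activity show the running query
log_min_duration_statement = 1000   # log statements taking over 1 second (ms)
```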
Re: [PERFORM] PostgreSQL Caching
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Adnan DURSUN I want to be able to read an execution plan when I look at it. So, is there any doc about how it should be read? You are asking how to read the output from EXPLAIN? This page is a good place to start: http://www.postgresql.org/docs/8.1/interactive/performance-tips.html
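As a quick illustration of reading a plan (hypothetical table and query; the commented plan is a sketch, not real output):

```sql
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- A plan might look like:
--   Index Scan using orders_customer_idx on orders
--       (cost=0.00..8.27 rows=1 width=60)
--       (actual time=0.031..0.033 rows=3 loops=1)
-- Read the most deeply indented nodes first; "cost"/"rows" before the
-- "actual" parenthesis are planner estimates, the "actual" figures are
-- measurements, and a large gap between estimated and actual rows
-- usually points at stale or missing statistics.
```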
Re: [PERFORM] simple case using index on windows but not on linux
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of lc_collate is C, as are all the other lc settings. I have run the analyze commands. Still the same. That is strange. I figured it had to be related to the locale and the LIKE operator. I'm not an expert on these locale issues, but I'd be curious to see if it would start using an index if you added an index like this: CREATE INDEX test_index ON t_order (c_number varchar_pattern_ops); Dave
Re: [PERFORM] Performace Optimization for Dummies
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Carlo Stonebanks Subject: [PERFORM] Performace Optimization for Dummies At this early stage in the project, we are initializing our portal's database with millions of rows of imported data in over 50 different flattened tables; each table's structure is unique to the data provider. This requires a pretty complex import program, because the data must be matched semantically, not literally. Even with all of the expression matching and fuzzy logic in the code, our performance statistics show that the program spends over 75% of its time in SQL queries looking for matching and/or duplicate data. The import is slow - and degrades as the tables grow. So your program first transforms the data and then inserts it? And it is the transforming process which is running select statements that is slow? If that is the case you could use duration logging to find the slow select statement, and then you could post an EXPLAIN ANALYZE of the select. One question off the top of my head is: are you using regular expressions for your fuzzy logic, and if so do your indexes have the right operator classes? (see http://www.postgresql.org/docs/8.1/static/indexes-opclass.html) Dave
Re: [PERFORM] Forcing the use of particular execution plans
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tim Truman Hi, I have the following query which has been running very slowly and after a lot of testing/trial and error I found an execution plan that ran the query in a fraction of the time (and then lost the statistics that produced it). What I wish to know is how to force the query to use the faster execution plan. It would be a bit easier to diagnose the problem if you posted EXPLAIN ANALYZE rather than just EXPLAIN. The two plans you posted looked very similar except for the order of the nested loop in subquery 1 and an index scan rather than a seq scan in subquery 2. My guess would be that the order of the nested loop is determined mostly by estimates of matching rows. If you ran an EXPLAIN ANALYZE you could tell if the planner is estimating correctly. If it is not, you could try increasing your statistics target and running ANALYZE. To make the planner prefer an index scan over a seq scan, I would first check the statistics again, and then you can try setting enable_seqscan to false (enable_seqscan is meant more for testing than production) or, you could try reducing random_page_cost, but you should test that against a range of queries before putting it in production. Dave
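The tuning steps above, written out as session-level experiments (the table and column names are hypothetical):

```sql
-- 1. Better statistics first.
ALTER TABLE big_table ALTER COLUMN join_col SET STATISTICS 100;
ANALYZE big_table;

-- 2. See whether the planner picks the index plan when seq scans
--    are penalized (testing only, not for production).
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM big_table WHERE join_col = 123;
SET enable_seqscan = on;

-- 3. Or nudge the cost model; test against many queries before
--    making this permanent.
SET random_page_cost = 2;  -- default is 4
```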
Re: [PERFORM] running benchmark test on a 50GB database
I would start by reading this web page: http://powerpostgresql.com/PerfList There are probably some other web pages out there with similar information, or you can check the mailing list archives for a lot of info. If those places don't help, then you should try to identify which queries are slow, and post an EXPLAIN ANALYZE of the slow queries along with the relevant schema info (i.e. table definitions and indexes). -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nuno Alves Sent: Wednesday, September 20, 2006 10:28 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] running benchmark test on a 50GB database Hi, I am running benchmark tests on a 50 GB postgresql database. I have the postgresql.conf with all parameters by default. In this configuration the database is very, very slow. Could you please tell me which is the best configuration? My system: Pentium D 3.0Ghz RAM: 1GB HD: 150GB SATA Thanks in advance, Nuno
Re: [PERFORM] High CPU Load
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jérôme BENOIS explain analyze select distinct INTEGER_VALUE,DATE_VALUE,EI_ID,VALUE_TYPE,FLOAT_VALUE,ID,TEXT_VALUE,CATEGORY_ID,STRING_VALUE,CATEGORYATTR_ID,NAME from ((( select distinct ei_id as EIID from mpng2_ei_attribute as reqin1 where reqin1.CATEGORYATTR_ID = 0 AND reqin1.TEXT_VALUE ilike '' and ei_id in ( select distinct ei_id as EIID from mpng2_ei_attribute as reqin2 where reqin2.CATEGORYATTR_ID = 0 AND reqin2.TEXT_VALUE ilike '' and ei_id in ( select distinct ei_id as EIID from mpng2_ei_attribute as reqin3 where reqin3.NAME = '' AND reqin3.STRING_VALUE = '' ) ) ) ) ) as req0 join mpng2_ei_attribute on req0.eiid = mpng2_ei_attribute.ei_id order by ei_id asc; That is a lot of DISTINCTs. Sorts are one thing that can really use up CPU. This query is doing lots of sorts, so it's not surprising the CPU usage is high. In the subqueries you have a couple of cases where you say ... in (select distinct ...). I don't think the distinct clause is necessary in that case. I'm not a hundred percent sure, but you might want to try removing them and see if the query results are the same and maybe the query will execute faster.
Re: [Fwd: Re: [PERFORM] Performance problem with Sarge compared
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Piñeiro TIP 6: explain analyze is your friend Well, yes, it is a friend, but as the select at the postgre Sarge version never finished I can't use an explain analyze. I show you the explain, with the hope that someone has any idea, but I think that this is almost indecipherable (if you want the Woody ones I can post the explain analyze). Thanks in advance. Does the machine run out of disk space every time? Is it possible to try the query on a different machine with more hard drive room? An explain analyze of the slow plan will be much more helpful than an explain, even if it's from a different machine. If it's generating a large temp file, it is another sign that the query is doing some kind of large cross product.
Re: [PERFORM] Performance problem with Sarge compared with Woody
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Piñeiro Subject: [PERFORM] Performance problem with Sarge compared with Woody

A week ago we migrated a Woody (PostgreSQL 7.2.1) server to Sarge (PostgreSQL 7.4.7). To migrate the database we used a dump, running pg_dump with these options: pg_dump -U username -c -F p -O -v -f filename DBname. We have a search that on Woody took about 1-2 minutes, but on Sarge it runs for about 2 hours and finally crashes, with a message about a temporary file and no more disk space (I have more than a GB of free disk space). Any idea?

The first question is: did you run ANALYZE on the new database after importing your data?
Re: [PERFORM] [PATCHES] Template0 age is increasing speedily.
I would expect that the age of template0 is increasing at the same rate as every other database in your cluster. Transaction IDs are global across all databases in the cluster, so as I understand it, executing a transaction in any database will increase the age of all databases by 1.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nimesh Satam Sent: Thursday, September 07, 2006 5:49 AM To: Chris Mair Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] [PATCHES] Template0 age is increasing speedily.

Hi, Postgres version used is 8.1.3. OS: Linux.

SELECT datname, age(datfrozenxid) FROM pg_database;

 postgres  |       1575
 xyz       | 1073743934
 template1 |       1632
 template0 |   61540256

This is the command I tried, with the above output, and the number is increasing pretty fast for template0. Please let me know if this is a problem. Regards, Nimesh.

On 9/7/06, Chris Mair [EMAIL PROTECTED] wrote: On Thu, 2006-09-07 at 16:01 +0530, Nimesh Satam wrote: I noticed that the age of template0 is increasing very rapidly. Can you please let me know how we can control this and what causes such problems. We also noticed that the database slows down heavily at a particular time. Can you suggest any tools which will help in diagnosing the root cause behind the data load.

Hi, first of all: there is no need to cross-post on 4 lists. If you have a performance problem, post on pgsql-performance. Second, please tell us which version of PostgreSQL on which operating system you're using. Diagnosing your problem might depend on which OS you use... Finally, explain what you mean by "the age of template0 is increasing very rapidly", you mean "the size is increasing"? Bye, Chris. -- Chris Mair http://www.1006.org
Re: [PERFORM] PostgreSQL performance issues
That's an interesting situation. Your CPUs are pegged, and you're hardly doing any IO. I wonder if there is some inefficient query, or if it's just very high query volume. Maybe you could try setting log_min_duration_statement to track down the slowest of the queries, then post the slow queries with an explain analyze to the list. Here is some info on setting up logging: http://www.postgresql.org/docs/8.1/interactive/runtime-config-logging.html Are your queries standard SQL, or do you call functions you wrote in PL/pgSQL or PL/Python or anything?
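A sketch of the logging setup suggested above; the threshold value is just an example, not a recommendation:

```sql
-- Log every statement that runs longer than 1000 ms.
-- Can be set per session for a quick test (superuser on 8.1),
-- or in postgresql.conf followed by a reload.
SET log_min_duration_statement = 1000;
-- -1 disables the feature; 0 logs every statement with its duration.
```

Once the slow statements show up in the server log, each one can be fed to EXPLAIN ANALYZE individually.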
Re: [PERFORM] Query tuning
It seems to me that what would work best is an index scan backward on the eventtime index. I don't see why that wouldn't work for you; maybe the planner is just estimating that the seq scan and sort is faster for some reason. What does EXPLAIN say if you use a small limit and offset like 10? Or what does EXPLAIN say if you first run set enable_seqscan=false; (If you get the same plan, then I wouldn't bother running EXPLAIN ANALYZE, but if you get a different plan I would run EXPLAIN ANALYZE to see if the new plan is any faster.)

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Subbiah, Stalin Sent: Wednesday, August 23, 2006 1:03 PM To: Chris Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Query tuning

I get the same plan after running vacuum analyze. Nope, I don't have indexes on objdomainid, objid and userdomainid. Only eventtime has one.

-Original Message- From: Chris [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 22, 2006 8:06 PM To: Subbiah, Stalin Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Query tuning

Subbiah, Stalin wrote: Actually these servers will be upgraded to 8.1.4 in a couple of months.

Even so, you could get some bad data in there. http://www.postgresql.org/docs/8.0/static/release.html. Go through the old release notes and you'll find various race conditions, crashes etc. Here you go with explain analyze.
# explain analyze SELECT * FROM EVENTLOG WHERE EVENTTIME > '07/23/06 16:00:00' AND EVENTTIME < '08/22/06 16:00:00' AND (OBJDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA' OR OBJID='tzRh39d0d91luNGT1weIUjLvFIcA' OR USERDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA') ORDER BY EVENTTIME DESC, SEQUENCENUM DESC LIMIT 500 OFFSET 500;

QUERY PLAN
Limit (cost=15583110.14..15583111.39 rows=500 width=327) (actual time=427771.568..427772.904 rows=500 loops=1)
  -> Sort (cost=15583108.89..15618188.88 rows=14031998 width=327) (actual time=427770.504..427771.894 rows=1000 loops=1)
       Sort Key: eventtime, sequencenum
       -> Seq Scan on eventlog (cost=0.00..2334535.17 rows=14031998 width=327) (actual time=10.370..190038.764 rows=7699388 loops=1)
            Filter: ((eventtime > '2006-07-23 16:00:00'::timestamp without time zone) AND (eventtime < '2006-08-22 16:00:00'::timestamp without time zone) AND (((objdomainid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((objid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((userdomainid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text)))
Total runtime: 437884.134 ms
(6 rows)

If you analyze the table and then run this again, what plan does it come back with? I can't read explain output properly, but I suspect (and I'm sure I'll be corrected if need be) that the sort step is way out of whack, and so is the seq scan, because the stats aren't up to date enough. Do you have an index on objdomainid, objid and userdomainid (one index per field)? I wonder if that would help much. -- Postgresql php tutorials http://www.designmagick.com/
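If the OP wants to try the backward index scan idea, a two-column index matching the ORDER BY would let the planner read just OFFSET + LIMIT rows from the end of the index instead of sorting ~7.7 million. A sketch; the index name is invented:

```sql
-- ORDER BY eventtime DESC, sequencenum DESC can be satisfied by
-- walking this index backward and stopping after 1000 rows:
CREATE INDEX eventlog_time_seq ON eventlog (eventtime, sequencenum);
ANALYZE eventlog;
```

Whether the planner actually picks the backward scan would still need to be verified with EXPLAIN.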
Re: [PERFORM] VACUUM FULL needed sometimes to prevent transaction ID wraparound?
I would guess that you are not running vacuumdb as a user with permission to vacuum the postgres or template1 databases. Try telling vacuumdb to log in as postgres or whatever your superuser account is called.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marinos Yannikos Sent: Tuesday, August 22, 2006 1:11 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] VACUUM FULL needed sometimes to prevent transaction ID wraparound?

Hello, we're looking into the reason why we are getting warnings about transaction ID wraparound despite a daily vacuumdb -qaz. Someone is claiming that VACUUM without FULL cannot reassign XIDs properly when max_fsm_pages was set too low (it says so here too, but this is rather old: http://www.varlena.com/GeneralBits/Tidbits/perf.html#maxfsmp). Is this true, or do we have a different issue here? We're using 8.1.3 with a database generated on 8.1.3 (i.e. not migrated from 7.x or anything like that). Thanks, Marinos
Re: [PERFORM] Big diference in response time (query plan question)
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Luiz K. Matsumura

Well, in this case the queries with LEFT OUTER join and with inner join return the same result set. I don't have sufficient knowledge to affirm it, but I suspect that if the query plan used for fk_clifor = 352 with the left outer join were applied to the first query (fk_clifor = 243 with left outer join), we would have a better total runtime. Is there some way to test this?

It looks like Postgres used a nested loop join for the fast query and a merge join for the slow query. I don't think the left join is causing any problems. On the slower query the cost estimate of the nested loop must have been higher than the cost estimate of the merge join because of more rows. You could try disabling merge joins with the command set enable_mergejoin=false. Then run the explain analyze again to see if it is faster. If it is faster without the merge join, then you could try to change your settings to make the planner prefer the nested loop. I'm not sure what the best way to do that is. Maybe you could try reducing random_page_cost, which should make index scans cheaper. Dave
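The test described above, spelled out as a session-local experiment; the SELECT is a stand-in for the slow LEFT JOIN query:

```sql
SET enable_mergejoin = false;  -- affects only this session
EXPLAIN ANALYZE
SELECT 1;                      -- substitute the slow query here
SET enable_mergejoin = true;   -- restore the default afterwards
```

Disabling planner methods this way is a diagnostic tool, not a production setting; if the nested loop wins, adjusting cost parameters such as random_page_cost is the more durable fix.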
Re: [PERFORM] Big diference in response time (query plan question)
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Luiz K. Matsumura

Where can I see the current random_page_cost value? Is there any hint about what value I should set? Thanks in advance. Luiz

On Linux, random_page_cost is set in the postgresql.conf file. You can see what it is set to by typing SHOW random_page_cost; This page has some guidelines on random_page_cost and other server settings: http://www.powerpostgresql.com/PerfList/ As it says on the page, make sure you test a variety of queries.
Re: [PERFORM] Speeding up query, Joining 55mil and 43mil records.
Could you post an explain analyze of the query? Just FYI, if you do an explain analyze of the insert statement, it will actually do the insert. If you don't want that, just post an explain analyze of the select part. To me it would be interesting to compare just the select parts of the query between Postgres and MSSQL. That way you would know whether your Postgres install is slower at the query or slower at the insert.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of nicky Sent: Wednesday, June 21, 2006 8:47 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] Speeding up query, Joining 55mil and 43mil records.

Hello people, I'm trying to solve what I feel is a performance/configuration/query error on my side. I'm fairly new to configuring PostgreSQL, so I might be completely wrong with my configuration. My database consists of 44 tables, about 20 GB. Two of those tables are big/huge. Table src.src_faktuur_verricht contains 43 million records (9 GB) and table src.src_faktuur_verrsec contains 55 million records (6 GB). Below is the slow query.
INSERT INTO rpt.rpt_verrichting (verrichting_id, verrichting_secid, fout_status, patientnr, verrichtingsdatum, locatie_code, afdeling_code, uitvoerder_code, aanvrager_code, verrichting_code, dbcnr, aantal_uitgevoerd, kostenplaats_code, vc_patientnr, vc_verrichting_code, vc_dbcnr)
SELECT t1.id, t0.secid, t1.status, t1.patientnr, t1.datum, t1.locatie, t1.afdeling, t1.uitvoerder, t1.aanvrager, t0.code, t1.casenr, t0.aantal, t0.kostplaats, null, null, null
FROM src.src_faktuur_verrsec t0
JOIN src.src_faktuur_verricht t1 ON t0.id = t1.id
WHERE substr(t0.code,1,2) not in ('14','15','16','17')
AND (substr(t0.correctie,4,1) <> '1' OR t0.correctie is null)
AND EXTRACT(YEAR from t1.datum) > 2004;

Output from explain:

Hash Join (cost=1328360.12..6167462.76 rows=7197568 width=118)
  Hash Cond: (("outer".id)::text = ("inner".id)::text)
  -> Seq Scan on src_faktuur_verrsec t0 (cost=0.00..2773789.90 rows=40902852 width=52)
       Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1, 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND ((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS NULL)))
  -> Hash (cost=1188102.97..1188102.97 rows=8942863 width=80)
       -> Bitmap Heap Scan on src_faktuur_verricht t1 (cost=62392.02..1188102.97 rows=8942863 width=80)
            Recheck Cond: (date_part('year'::text, datum) > 2004::double precision)
            -> Bitmap Index Scan on src_faktuur_verricht_idx1 (cost=0.00..62392.02 rows=8942863 width=0)
                 Index Cond: (date_part('year'::text, datum) > 2004::double precision)

The db server runs PostgreSQL 8.1.4 on FreeBSD 6.1-Stable, with 2 GB of RAM. It contains two SATA150 disks; one holds PostgreSQL and the rest of the operating system, and the other holds the pg_xlog directory.

Changed lines from my postgresql.conf file:
shared_buffers = 8192
temp_buffers = 4096
work_mem = 65536
maintenance_work_mem = 1048576
max_fsm_pages = 4
fsync = off
wal_buffers = 64
effective_cache_size = 174848

The query above takes around 42 minutes.
However, I also have a wimpy desktop machine with 1 GB of RAM running Windows with MSSQL 2000 (default installation), same database structure, same indexes, same query, etc., and it takes 17 minutes. The big difference makes me think that I've made an error in my PostgreSQL configuration. I just can't seem to figure it out. Could someone perhaps give me some pointers or advice? Thanks in advance. Nicky
Re: [PERFORM] OT - select + must have from - sql standard syntax?
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Greg Stark Sent: Tuesday, June 13, 2006 11:16 PM Subject: Re: [PERFORM] OT - select + must have from - sql standard syntax? [SNIP] Well you could always create a dual; it was always just a regular table. We used to joke about what would happen to Oracle if you inserted an extra row in it...

I've never used Oracle, so I don't understand why it's called dual when it only has one row? Shouldn't it be called single? :\ Dave
Re: [PERFORM] Posrgres speed problem
Do you run analyze on the production server regularly?

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ruben Rubio Rey Sent: Monday, June 12, 2006 9:39 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] Posrgres speed problem

Hi, I'm having a problem with postgres 8.1.3 on a Fedora Core 3 (kernel 2.6.9-1.667smp). I have two similar servers, one in production and another for testing purposes. The databases are equal (with a difference of some hours). On the testing server, an SQL statement takes around 1 sec. On the production server (low server load) it takes around 50 secs, and uses too many resources. Explain analyze causes too much load; I had to cancel it! Could it be a bug? Any ideas? Thanks in advance
Re: [PERFORM] Why date index is not used
Actually, it looks to me like the sorting is the slow part of this query. Maybe if you create an index on both kuupaev and kellaaeg it will make the sorting faster. Or maybe you could try increasing the server's work_mem. The sort will be much slower if the server can't do the whole thing in RAM.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tomas Vondra Sent: Thursday, June 08, 2006 2:20 PM To: Andrus Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Why date index is not used

More precisely, Postgres could use the index to speed up the sorting, but in this case the sorting is very fast (less than one second according to the output), so Postgres probably decided not to use the index because it would be slower. Btw. have you run ANALYZE on the table recently? What is the number of distinct values in the 'kuupaev' column? Tomas

Why does Postgres 8.1 not use the makse_kuupaev_idx index in the following query? How can I speed this query up?

explain analyze select * from makse order by kuupaev desc, kellaaeg desc limit 100

Limit (cost=62907.94..62908.19 rows=100 width=876) (actual time=33699.551..33701.001 rows=100 loops=1)
  -> Sort (cost=62907.94..63040.49 rows=53022 width=876) (actual time=33699.534..33700.129 rows=100 loops=1)
       Sort Key: kuupaev, kellaaeg
       -> Seq Scan on makse (cost=0.00..2717.22 rows=53022 width=876) (actual time=0.020..308.502 rows=53028 loops=1)
Total runtime: 37857.177 ms

CREATE TABLE makse (
  kuupaev date,
  kellaaeg char(6) NOT NULL DEFAULT ''::bpchar,
  guid char(36) NOT NULL,
  CONSTRAINT makse_pkey PRIMARY KEY (guid)
);

CREATE INDEX makse_kuupaev_idx ON makse USING btree (kuupaev);

Andrus.
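The two suggestions above as a sketch; the index name is invented, and the work_mem value is only an example (on 8.1 the unit is kilobytes):

```sql
-- An index on both sort columns lets a backward index scan
-- replace the sort for ORDER BY kuupaev DESC, kellaaeg DESC:
CREATE INDEX makse_kuupaev_kellaaeg_idx ON makse (kuupaev, kellaaeg);

-- Alternatively, give this session a bigger sort buffer:
SET work_mem = 65536;  -- 64 MB; example value only
```

With the two-column index in place, EXPLAIN should show an "Index Scan Backward" with no Sort node above it.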
Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?
What I do when I'm feeling lazy is execute a delete statement and then an insert. I only do it when I'm inserting/updating a very small number of rows, so I've never worried whether it's optimal for performance. Besides, I've heard that an update in postgres is similar in performance to a delete/insert.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of PFC Sent: Tuesday, May 30, 2006 5:35 PM To: Jonah H. Harris; Waldomiro Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?

PostgreSQL does not support MERGE at the moment, sorry. Issue an UPDATE, and watch the rowcount; if the rowcount is 0, issue an INSERT. Be prepared to retry if another transaction has inserted the row meanwhile, though. MERGE would be really useful.
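The update-then-insert pattern described above, as a sketch with an invented counters(id, hits) table; note the race the next reply warns about, since another transaction can insert between the two statements:

```sql
-- Hypothetical table: counters(id integer PRIMARY KEY, hits integer)
UPDATE counters SET hits = hits + 1 WHERE id = 42;
-- If the client-reported rowcount was 0, the row did not exist yet:
INSERT INTO counters (id, hits) VALUES (42, 1);
-- Be prepared to retry on a unique-violation error from the INSERT.
```

The delete-then-insert variant avoids checking the rowcount but always rewrites the row, so it generates a dead tuple even when nothing changed.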
Re: [PERFORM] Getting even more insert performance (250m+rows/day)
If you can live with possible database corruption, you could try turning fsync off. For example, if you could just reinsert the data on the off chance a hardware failure corrupts the database, you might get a decent improvement. Also, have you tried creating the index after you have inserted all your data? (Or maybe copy already disables the indexes while inserting?)

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniel J. Luke Sent: Wednesday, May 24, 2006 2:45 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] Getting even more insert performance (250m+rows/day)

I have a system that currently inserts ~250 million rows per day (I have about 10k more raw data than that, but I'm at the limit of my ability to get useful insert performance out of postgres). Things I've already done that have made a big difference: modified the postgresql.conf shared_buffers value, converted to COPY from individual insert statements, and changed BLCKSZ to 32768. I currently get ~35k/sec inserts on a table with one index (~70k/sec inserts if I don't have any indexes). The indexed field is basically a time_t (seconds since the epoch); autovacuum is running (or postgres would stop choosing to use the index). The other fields have relatively lower cardinality. Each day's worth of data gets inserted into its own table so that I can expire the data without too much effort (since drop table is much faster than running a delete and then vacuum). I would really like to be able to have 1 (or 2) more indexes on the table, since it takes a while for a sequential scan of 250 million rows to complete, but CPU time goes way up. In fact, it looks like I'm not currently IO bound, but CPU-bound. I think some sort of lazy-index generation (especially if it could be parallelized to use the other processors/cores that currently sit mostly idle) would be a solution. Is anyone working on something like this? Any other ideas?
Where should I look if I want to start to think about creating a new index that would work this way (or am I just crazy)? Thanks for any insight!

-- Daniel J. Luke | [EMAIL PROTECTED] | http://www.geeklair.net | Opinions expressed are mine and do not necessarily reflect the opinions of my employer.
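On the index question raised in the reply: COPY does not disable indexes, so for a one-shot bulk load it is often faster to drop the index, load, and rebuild. A sketch with invented table, index, and file names; for a table receiving continuous inserts all day, as in the OP's case, dropping the index may not be an option:

```sql
DROP INDEX rawdata_ts_idx;                    -- hypothetical index
COPY rawdata FROM '/tmp/batch.csv' WITH CSV;  -- hypothetical file
CREATE INDEX rawdata_ts_idx ON rawdata (ts);  -- rebuilt in one pass
ANALYZE rawdata;                              -- refresh statistics
```

Building the index once over the finished table is a single sort rather than millions of incremental btree insertions.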
Re: [PERFORM] Question about explain-command...
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Clemens Eisserer Sent: Wednesday, May 10, 2006 6:50 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] Question about explain-command...

What do the hash lines mean? Does that mean my query does not use the indices at all? Why are some table names and some column names surrounded by quotes? Are they treated as text columns? I have to admit that the tables are just filled with test data, so the analyzer may take a very simple path since almost no data is in...

For small tables, it is faster to do a sequential scan than an index scan. You probably don't have enough test data to make the planner choose an index scan. I don't think the quotes really mean anything; they are just used as delimiters. The hash lines mean your tables are being joined by hash joins. You should read this page for more info: http://www.postgresql.org/docs/8.1/interactive/performance-tips.html
Re: [PERFORM] in memory views
Are you using the Postgres JDBC driver, or a JDBC-ODBC bridge driver? The Postgres-specific driver is usually faster.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Thomas Vatter Sent: Wednesday, May 10, 2006 3:54 PM To: Scott Marlowe Cc: Tino Wildenhain; pgsql-performance@postgresql.org Subject: Re: [PERFORM] in memory views

Scott Marlowe wrote: On Wed, 2006-05-10 at 10:41, Thomas Vatter wrote: Scott Marlowe wrote: What happens if you do this by declaring it as a cursor and then fetching the first row?

I do executeQuery(), and on the resultSet I do next() and return one row, but wait, I have to review the logic in this area; I can tell you tomorrow.

A good short test is to run explain analyze on the query from the psql command line. If it shows an execution time of significantly less than what you get from your application, then it is likely that the real problem is that your application is receiving the whole result set via libpq and waiting for that. A cursor will solve that problem.

Yes, the difference between the psql command line and the application is 6 seconds to 40 seconds. It is exactly the step resultSet = executeQuery() that needs 40 seconds. I use next() as a cursor through the resultSet, but I fear this is not enough. Do I have to use createStatement(resultSetType, resultSetConcurrency) or prepareStatement(resultSetType, resultSetConcurrency) to achieve the cursor behaviour? regards, tom
Re: [PERFORM] Memory and/or cache issues?
For a standard config, most of the memory used by Postgres is the shared buffers. The shared buffers are a cache to store blocks read from the disk, so if you do a query, Postgres will allocate and fill the shared buffers up to the max amount you set in your postgresql.conf file. Postgres doesn't release that memory between queries because the point is to be able to pull data from RAM instead of the disk on the next query. Are you sure your settings in postgresql.conf are standard? What are your settings for shared_buffers and work_mem?

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of mcelroy, tim Sent: Friday, May 05, 2006 8:58 AM To: 'Tom Lane' Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Memory and/or cache issues?

Are you saying the kernel's disk cache may be getting whacked? No, I understand that PG should use as much memory as it can, and the system as well. The main problem here is that with almost all of the 8 GB of RAM 'in use', when I try to do a pg_dump or vacuumdb I run out of memory and the system crashes. I well understand that unused memory is not a good thing; it's just that when you have none and can't do the maintenance work, bad stuff happens. For example, I just created a benchdb on my DEV box with 1,000,000 tuples. As this ran, the memory in use jumped up 1 GB and it hasn't gone down. Once the PG process has finished its task, shouldn't it release the memory it used? Thanks, Tim

-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED]] Sent: Friday, May 05, 2006 9:44 AM To: mcelroy, tim Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Memory and/or cache issues?

"mcelroy, tim" [EMAIL PROTECTED] writes: I see this on all the postgres installations. No matter what I set the postgresql.conf settings to regarding memory allocation, once postgres starts up 95% of the memory on the box is used. Is there a way within Linux to 'see' what or who is actually using this memory?

Probably kernel disk cache.
Are you under the misimpression that unused memory is a good thing? If a Unix-ish system *isn't* showing near zero free memory under load, the kernel is wasting valuable resources. regards, tom lane
Re: [PERFORM] Lot'sa joins - performance tip-up, please?
-> Nested Loop (cost=0.00..176144.30 rows=57925 width=26) (actual time=68.322..529472.026 rows=57925 loops=1)
     -> Seq Scan on ticketing_codes_played (cost=0.00..863.25 rows=57925 width=8) (actual time=0.042..473.881 rows=57925 loops=1)
     -> Index Scan using ticketing_codes_pk on ticketing_codes (cost=0.00..3.01 rows=1 width=18) (actual time=9.102..9.108 rows=1 loops=57925)
          Index Cond: (ticketing_codes.code_id = "outer".code_id)
Total runtime: 542000.093 ms
(27 rows)

I'll be more than happy to provide any additional information that I may be able to gather. I'd be most happy if someone would scream something like "four joins, smells like a poor design", because the design is poor, but the system is in production and I have to bear with it.

It looks like the nested loop which joins ticketing_codes_played to ticketing_codes is the slow part. I'm curious how many rows are in the ticketing_codes table. Four or five joins does not seem like a lot to me, but it can be slow if you are joining big tables with other big tables.
Re: [PERFORM] query performance question
You are pulling a fair amount of data from the database and doing a lot of computation in the SQL. I'm not sure how fast this query could be expected to run, but I had one idea. If you've inserted and deleted a lot into this table, you will need to run vacuum occasionally. If you haven't been doing that, I would try a VACUUM FULL ANALYZE on the table. (That will take a lock on the table and prevent clients from reading data while it is running.)

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of gulsah Sent: Friday, April 28, 2006 6:31 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] query performance question

Hi, I have a performance problem with Postgresql version 8.1 installed on a Fedora Core release 4 (Stentz) with kernel version 2.6.11. The machine I am working on has 512 MB of RAM and a Pentium III 800 MHz CPU. I have only one table in the database, which consists of 256 columns and 1 rows. Each column is of float type and each row corresponds to a vector in my application. What I want to do is to compute the distance between a predefined vector in hand and the ones in the database. The computation proceeds according to the following pseudocode:

for (i = 1; i <= 256; i++) {
    distance += abs(x1_i - x2_i);
}

where x1_i denotes the i-th coordinate of the vector in hand and x2_i denotes the i-th coordinate of the vector in the database. The distance computation has to be done for all the vectors in the database by means of a query, and the result set should be sorted by the computed distances. When I implement the query and measure the time spent on it in an application, I see that the query takes more than 8 seconds, which is undesirable in my application. What I want to ask you all is: is this normal performance for a computer with the properties I have mentioned above?
Is there any solution in your mind to increase the performance of my query? To make it more understandable, I give the query for vectors of size 3, but in my case their size is 256.

select
  id as vectorid,
  abs(40.9546-x2_1) + abs(-72.9964-x2_2) + abs(53.5348-x2_3) as distance
from vectordb
order by distance

Thank you all for your help. -gulsah
Re: [PERFORM] planner not using index for like operator
If you are using a locale other than the C locale, you need to create the index with an operator class to get index scans with like. See here for details: http://www.postgresql.org/docs/8.1/interactive/indexes-opclass.html

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Sriram Dandapani Sent: Tuesday, April 25, 2006 12:08 PM To: Pgsql-Performance (E-mail) Subject: [PERFORM] planner not using index for like operator

For the query

Select col1 from table1 Where col1 like '172.%'

The table has 133 million unique IP addresses. Col1 is indexed. The optimizer is using a sequential scan. This is the explain analyze output:

Seq Scan on table1 (cost=0.00..2529284.80 rows=1 width=15) (actual time=307591.339..565251.775 rows=524288 loops=1)
  Filter: ((col1)::text ~~ '172.%'::text)
Total runtime: 565501.873 ms

The number of affected rows (500K) is a small fraction of the total row count.
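A sketch of the opclass fix for the OP's table, assuming col1 is a varchar (a text column would use text_pattern_ops instead); the index name is invented:

```sql
-- Lets left-anchored patterns like '172.%' use an index scan
-- even under a non-C locale:
CREATE INDEX table1_col1_like_idx ON table1 (col1 varchar_pattern_ops);

-- Note the pattern must be a quoted string literal:
SELECT col1 FROM table1 WHERE col1 LIKE '172.%';
```

The pattern-ops index only helps LIKE and regex matches; if ordinary `<`/`>` comparisons on col1 also need an index, the original locale-aware index has to be kept alongside it.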
Re: [PERFORM] Takes too long to fetch the data from database
I've never used a cursor in Postgres, but I don't think it will help you a lot. In theory cursors make it easier to do paging, but your main problem is that getting the first page is slow. A cursor isn't going to be any faster at getting the first page than OFFSET/LIMIT is. Did you try Bruno's suggestion of:

SELECT * FROM wan ORDER BY stime DESC OFFSET 0 LIMIT 50;

You should run an EXPLAIN ANALYZE on that query to see if it is using an index scan. Also, what version of Postgres are you using? You can run select version(); to check.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of soni de Sent: Thursday, April 20, 2006 11:42 PM To: Merlin Moncure Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Takes too long to fetch the data from database

I don't want to query exactly 81900 rows into a set. I just want to fetch 50 or 100 rows at a time, in decreasing order of stime (i.e. 50 or 100 rows starting from the last and going backward). If we fetch sequentially, there is also a problem fetching all the records (select * from wan where kname='pluto' order by stime): it takes more than 4-5 minutes. I tried it on the same table with more than 326054 records.

On 4/20/06, Merlin Moncure [EMAIL PROTECTED] wrote:

SELECT * FROM wan ORDER BY stime LIMIT 50 OFFSET 81900;

you need to try and solve the problem without using 'offset'. you could do:

BEGIN;
DECLARE crs CURSOR FOR SELECT * FROM wan ORDER BY stime;
FETCH ABSOLUTE 81900 IN crs;
FETCH 49 IN crs;
CLOSE crs;
COMMIT;

this may be a bit faster but will not solve the fundamental problem. the more interesting question is why you want to query exactly 81900 rows into a set. This type of thinking will always get you into trouble; absolute positioning will not really work in a true sql sense. if you are browsing a table sequentially, there are much better methods. merlin
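Another option besides cursors is keyset ("seek") paging, which avoids both the big OFFSET and the deep fetch by remembering where the previous page ended. A sketch; the timestamp literal is a placeholder, and an index on (kname, stime) would be needed for this to be fast:

```sql
-- First page:
SELECT * FROM wan WHERE kname = 'pluto'
ORDER BY stime DESC LIMIT 50;

-- Next page: pass in the smallest stime shown on the previous page:
SELECT * FROM wan WHERE kname = 'pluto' AND stime < '2006-04-01 00:00:00'
ORDER BY stime DESC LIMIT 50;
```

Unlike OFFSET, every page costs the same because the index scan starts directly at the boundary value instead of skipping over the rows before it.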
Re: [PERFORM] Little use of CPU ( 5%)
Maybe you could post the query and an EXPLAIN ANALYZE of the query. That would give more information for trying to decide what is wrong. So your question is basically why you get a slower read rate on this query than on other queries? If I had to guess, maybe it could be that you are scanning an index with a low correlation (the order of the records in the index is very different from the order of the records on the disk), causing your drives to do a lot of seeking. A possible fix for this might be to cluster the table on the index, but I would check out the explain analyze first to see which step is really the slow one. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of luchot Sent: Friday, April 21, 2006 4:33 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] Little use of CPU (< 5%) Hello, I have a problem of performance with a query. I use PostgreSQL 8.1.3. The distribution of Linux is Red Hat Enterprise Linux ES release 4 (Nahant Update 2) and the server is a dual-processor Xeon 2.4GHz with 1 GB of RAM, and the size of the database files is about 60 GB. The problem is that this query uses only a small percentage of the CPU, as seen with the top command: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3342 postgres 18 0 140m 134m 132m D 5.9 13.3 17:04.06 postmaster The vmstat command: procs ---memory-- ---swap-- -io --system-- cpu r b swpd free buff cache si so bi bo in cs us sy id wa 0 1 184 16804 38104 933516 0 0 3092 55 667 145 12 4 71 14 0 1 184 16528 38140 933480 0 0 2236 0 1206 388 2 1 50 47 0 1 184 15008 38188 935252 0 0 2688 92 1209 396 2 0 49 48 The config of PostgreSQL is: shared_buffers = 16384 (128 MB) work_mem = 65536 (64 MB) maintenance_work_mem = 98304 (96 MB) effective_cache_size = 84000 I think that the problem is there is too much %wait, caused by the really bad read rate (bi), which is only 3 MB/s.
It is this value I do not understand, because with other queries this rate is about 120 MB/s. I use SCSI disks and a hardware RAID 0 system. This is the query plan of the query: QUERY PLAN Aggregate (cost=24582205.20..24582205.22 rows=1 width=13) -> Nested Loop (cost=2.11..24582054.88 rows=60129 width=13) Join Filter: ("inner".l_quantity < (subplan)) -> Seq Scan on part (cost=0.00..238744.00 rows=6013 width=4) Filter: ((p_brand = 'Brand#51'::bpchar) AND (p_container = 'MED JAR'::bpchar)) -> Bitmap Heap Scan on lineitem (cost=2.11..126.18 rows=31 width=27) Recheck Cond: ("outer".p_partkey = lineitem.l_partkey) -> Bitmap Index Scan on id_partkey_lineitem (cost=0.00..2.11 rows=31 width=0) Index Cond: ("outer".p_partkey = lineitem.l_partkey) SubPlan -> Aggregate (cost=126.50..126.51 rows=1 width=10) -> Index Scan using id_partkey_lineitem on lineitem (cost=0.00..126.42 rows=31 width=10) Index Cond: (l_partkey = $0) (13 rows) The number of tuples in lineitem is 180,000,000. So my question is: what do I have to do to increase the read rate and improve the execution of the query? I add that the server is dedicated to PostgreSQL. Regards,
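One way to test the low-correlation guess from the reply above. This is a sketch; pg_stats and the 8.1-era CLUSTER syntax are standard, but whether correlation is actually the bottleneck here depends on the EXPLAIN ANALYZE timings:

```sql
-- Correlation near 0 for l_partkey would mean index order and heap
-- order disagree badly, so the index/bitmap scans cause heavy seeking.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'lineitem' AND attname = 'l_partkey';

-- Rewrite the table in index order (takes an exclusive lock and a
-- full rewrite, so run it in a maintenance window):
CLUSTER id_partkey_lineitem ON lineitem;
ANALYZE lineitem;
```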
Re: [PERFORM] Problem with LIKE-Performance
-Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Tarabas (Manuel Rorarius) Subject: [PERFORM] Problem with LIKE-Performance Hi! I am having trouble with like statements on one of my tables. It looks like you are getting a sequential scan instead of an index scan. What is your locale setting? As far as I know Postgres doesn't support using indexes with LIKE unless you are using the C locale. Also, in the future you only need to post EXPLAIN ANALYZE not EXPLAIN ANALYZE VERBOSE. Dave ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
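To check the setting in question (a standard command):

```sql
-- If this shows anything other than C (or POSIX), plain b-tree
-- indexes will not be used for LIKE prefix matches.
SHOW lc_collate;
```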
Re: [PERFORM] Query planner is using wrong index.
-----Original Message----- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Brian Herlihy Sent: Thursday, April 06, 2006 6:56 PM To: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Query planner is using wrong index. [Snip] I am really surprised that I have to go through such contortions just to use the primary key! This area of Postgres needs improvement. Of course you mentioned that you are using 7.4.7. You might want to try upgrading to 8.1.3. There have been a lot of improvements to performance since 7.4. I don't know if your specific problem was fixed, but it's worth a try. Also you might want to at least upgrade to 7.4.12 for the bug fixes.
Re: [PERFORM] [Solved] Slow performance on Windows .NET and OleDb
I use Npgsql, and the connection string I use is real simple: Server=192.168.0.36;Database=mydb;User Id=myuserid;Password=123456 Hope that helps, Dave -----Original Message----- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Greg Quinn Sent: Wednesday, March 29, 2006 11:57 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] [Solved] Slow performance on Windows .NET and OleDb This problem was caused by the OleDb driver. I used a 3rd party .NET provider and it worked, 8000 rows in just over 100ms! Can somebody send me a sample connection string for the PostgreSQL native .NET driver please? I'm battling to find a valid connection string. Thanks
Re: [PERFORM] simple join uses indexes, very slow
-----Original Message----- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of george young Sent: Monday, March 27, 2006 12:48 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] simple join uses indexes, very slow [Snip] Indexes: parameters_idx btree (run, opset_num, step_num, opset, opset_ver, step, step_ver, name, split, wafers) parameters_opset_idx btree (opset, step, name) parameters_step_idx btree (step, name) Have you tried creating some different indexes on parameters? I don't know if it should matter or not, but I would try some indexes like: (run, opset_num) //Without all the other columns (opset_num, run) //Backwards (opset_num) I don't really know Postgres internals all that well. It just seems to me that parameters_idx has a lot of columns this query is not interested in. I'd just be curious to see what happens.
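Spelled out as DDL, the experiment proposed above would look like this (the index names are made up for illustration):

```sql
-- Narrower indexes to compare against the ten-column parameters_idx.
CREATE INDEX parameters_run_opset_idx ON parameters (run, opset_num);
CREATE INDEX parameters_opset_run_idx ON parameters (opset_num, run);
CREATE INDEX parameters_opset_only_idx ON parameters (opset_num);
```

EXPLAIN ANALYZE on the original query after each CREATE INDEX would show which, if any, the planner picks up.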
Re: [PERFORM] simple join uses indexes, very slow
-----Original Message----- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Steinar H. Gunderson Sent: Tuesday, March 28, 2006 10:29 AM An index on (A,B,C) can be used for a query on (A,B) or (A), so it doesn't really matter. It isn't usable for a query on (B), (C) or (B,C), though. (The index rows will get bigger, of course, so you'll need more I/O if you want to scan large parts of it, but I guess that's beside the point.) I guess what I am really curious about is why the OP was getting an expensive sort when the planner tried a merge join? Most of the time was spent sorting the parameters table by opset_num even though opset_num is indexed. Isn't Postgres able to walk the index instead of sorting? I was wondering if maybe Postgres wasn't recognizing that it could just walk the index because the opset_num column isn't the first in the index.
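The prefix rule described above, in miniature (table and column names are hypothetical):

```sql
CREATE TABLE t (a int, b int, c int);
CREATE INDEX t_abc_idx ON t (a, b, c);

-- These can use t_abc_idx, because they constrain a leading prefix:
SELECT * FROM t WHERE a = 1;
SELECT * FROM t WHERE a = 1 AND b = 2;

-- These cannot use t_abc_idx for an index scan (no constraint on a):
SELECT * FROM t WHERE b = 2;
SELECT * FROM t WHERE b = 2 AND c = 3;
```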
Re: [PERFORM] simple join uses indexes, very slow
-----Original Message----- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Steinar H. Gunderson A merge join requires sorted inputs. Most of the time was spent sorting the parameters table by opset_num even though opset_num is indexed. Isn't Postgres able to walk the index instead of sorting? The time of an index scan vs. a sequential scan + sort depends on several factors, so it's not just a matter of walking the index whenever there is one. I was just looking this over again and I realized I misread the query plan. The slowest step was the Bitmap Heap Scan, not the sort. (The sort was relatively fast.)
Re: [PERFORM] Problem with query, server totally unresponsive
From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Jim C. Nasby Subject: Re: [PERFORM] Problem with query, server totally unresponsive On Thu, Mar 23, 2006 at 01:12:08PM +0100, Bendik Rognlien Johansen wrote: Hello, I have a big problem with one of my databases. When I run my query, after a few minutes, the postmaster shows 99% mem in top, and the server becomes totally unresponsive. You've got a bunch of sorts going on; could you be pushing the machine into swapping? I get this message when I try to cancel the query: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Did you send a kill of some kind to the backend? The machine has 2x Intel dual core processors (3GHz) and 2 gigs of RAM. Unless I missed some big news recently, no such CPU exists. Hyperthreading is absolutely not the same as dual core, and many people have found that it's best to disable hyperthreading on database servers. Maybe I'm confused by the marketing, but I think those CPUs do exist. According to Newegg, the Pentium D 830 and the Pentium D 930 are both dual-core Pentiums that run at 3GHz. It also specifically says these processors don't support hyper-threading, so I believe they really have two cores. Maybe you are thinking he was talking about a 3GHz Core Duo. http://www.newegg.com/Product/ProductList.asp?Category=34N=200034+50001157+1302820275+1051007392Submit=ENE Dave
Re: [PERFORM] import performance
[Snip] shared_buffers = 256 Make this higher too. If this is a dedicated machine with 512 MB of ram, set it to something like 125000. You may need to adjust shared memory settings for your operating system. See the manual for details. Whoa. Maybe I'm wrong, but isn't each buffer 8192 bytes? So you are suggesting that he set his shared buffers to a gigabyte on a machine with 512 MB of ram? Or was that just a miscalculation? Dave
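The arithmetic behind the objection: shared buffers are 8192-byte pages, so the suggested setting works out to roughly a gigabyte:

```sql
-- 125000 buffers x 8192 bytes = 1,024,000,000 bytes, about 976 MB,
-- nearly twice the 512 MB of RAM on the machine in question.
SELECT 125000 * 8192 AS bytes,
       125000 * 8192 / (1024 * 1024) AS megabytes;
```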
Re: [PERFORM] Help understanding indexes, explain, and optimizing
Actually I think LEFT OUTER JOIN is equivalent to LEFT JOIN. The Postgres manual says that the word OUTER is optional. Either way you get "...all rows in the qualified Cartesian product (i.e., all combined rows that pass its join condition), plus one copy of each row in the left-hand table for which there was no right-hand row that passed the join condition." It sounds like the original poster's problem was a less than optimal join order, and from what I understand Postgres can't reorder left joins. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Sent: Monday, March 06, 2006 6:40 PM To: i.v.r. Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Help understanding indexes, explain, and optimizing i.v.r. wrote: Hi everyone, [Snip] So I'm wondering what I'm doing wrong. I migrated this database from MySQL, and on there it ran pretty fast. Have you done an 'analyze' or 'vacuum analyze' over these tables? A left outer join gets *everything* from the second table: LEFT OUTER JOIN groups ON groups.id = locations.group_id LEFT OUTER JOIN schools ON schools.location_id = locations.id So they will load everything from groups and schools. Maybe they should be left joins, not left outer joins? -- Postgresql php tutorials http://www.designmagick.com/
Re: [PERFORM] joining two tables slow due to sequential scan
What version of postgres are you using? Can you post the output from EXPLAIN ANALYZE? -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tim Jones Sent: Friday, February 10, 2006 4:07 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] joining two tables slow due to sequential scan I am trying to join two tables and keep getting a sequential scan in the plan even though there is an index on the columns I am joining on. Basically this is the deal ... I have two tables with docid in them, which is what I am using for the join. ClinicalDocs ... (no primary key) though it does not help if I make docid the primary key docid integer (index) patientid integer (index) visitid integer (index) ... Documentversions docid integer (index) docversionnumber (index) docversionidentifier (primary key) It seems to do an index scan if I make docid the primary key. This is what occurs when I link on the patid from ClinicalDocs to the patient table. However I cannot make docid the primary key because it gets repeated depending on how many versions of a document I have. I have tried using a foreign key on documentversions with no success. In addition this query select * from documentversions join clinicaldocuments on documentversions.documentidentifier = clinicaldocuments.dssdocumentidentifier where documentversions.documentstatus = 'AC'; does an index scan, but if I change the order, e.g. select * from clinicaldocuments join documentversions on clinicaldocuments.dssdocumentidentifier = documentversions.documentidentifier where clinicaldocuments.patientidentifier = 123; does a sequential scan ... what I need is the bottom query; it is extremely slow ... Any ideas? Tim Jones Healthcare Project Manager Optio Software, Inc. (770) 576-3555
Re: [PERFORM] joining two tables slow due to sequential scan
OK, if I'm reading this correctly, it looks like the planner is choosing a sequential scan because it expects 48,000 rows for that patientidentifier, but it's actually only getting 3. The planner has the number of rows right for the sequential scan, so it seems like the stats are up to date. I would try increasing the stats for the patientidentifier column with 'alter table ... alter column ... set statistics ...' or increasing the default_statistics_target for the whole DB. Once you have changed the stats I believe you need to run analyze again. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tim Jones Sent: Friday, February 10, 2006 4:59 PM To: Tom Lane Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] joining two tables slow due to sequential scan ok here is the real db. the first query I had seems to make no sense, because it is only fast if I limit the rows, since almost all rows have status = 'AC'. second query: both tables have about 10 million rows and it takes a long time as you can see, but this person only has approx 160 total documents QUERY PLAN --- Hash Join (cost=84813.14..1510711.97 rows=48387 width=555) (actual time=83266.854..91166.315 rows=3 loops=1) Hash Cond: ("outer".documentidentifier = "inner".dssdocumentidentifier) -> Seq Scan on documentversions (cost=0.00..269141.98 rows=9677398 width=415) (actual time=0.056..49812.459 rows=9677398 loops=1) -> Hash (cost=83660.05..83660.05 rows=48036 width=140) (actual time=10.833..10.833 rows=3 loops=1) -> Bitmap Heap Scan on clinicaldocuments (cost=301.13..83660.05 rows=48036 width=140) (actual time=0.243..0.258 rows=3 loops=1) Recheck Cond: (patientidentifier = 690193) -> Bitmap Index Scan on ix_cdocpid (cost=0.00..301.13 rows=48036 width=0) (actual time=0.201..0.201 rows=3 loops=1) Index Cond: (patientidentifier = 690193) Total runtime: 91166.540 ms Tim Jones Healthcare Project Manager Optio Software, Inc.
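The suggested fix, spelled out against the table and column from the plan above (the target value of 1000 is just an example, and was the maximum in that era; re-running ANALYZE afterwards is required for the new statistics to take effect):

```sql
-- Raise the per-column statistics target, then re-analyze so the
-- planner sees a finer-grained histogram for patientidentifier.
ALTER TABLE clinicaldocuments
  ALTER COLUMN patientidentifier SET STATISTICS 1000;
ANALYZE clinicaldocuments;
```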
(770) 576-3555 -----Original Message----- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Friday, February 10, 2006 5:52 PM To: Tim Jones Cc: Scott Marlowe; Dave Dutcher; pgsql-performance@postgresql.org Subject: Re: [PERFORM] joining two tables slow due to sequential scan Tim Jones [EMAIL PROTECTED] writes: QUERY PLAN 'Hash Join (cost=899.83..4384.17 rows=482 width=1350) (actual time=0.203..0.203 rows=0 loops=1)' ... 'Total runtime: 0.392 ms' Hardly seems like evidence of a performance problem ... regards, tom lane
Re: [PERFORM] [PERFORMANCE] Stored Procedures
I don't think pgpool is what you need. If I understand pgpool correctly, pgpool lets you pool multiple postgres servers together. You are just looking for database connection pooling. A simple connection pool is basically just an application-wide list of connections. When a client needs a connection, you just request a connection from the pool. If there is an unused connection in the pool, it is given to the client and removed from the unused pool. If there is no unused connection in the pool, then a new connection is opened. When the client is done with it, the client releases it back into the pool. You can google for 'database connection pool' and you should find a bunch of stuff. It's probably a good idea to find one already written. If you write your own you have to make sure it can deal with things like dead connections, synchronization, and maximum numbers of open connections. Dave -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marcos Sent: Monday, January 23, 2006 7:27 AM To: Markus Schaber Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] [PERFORMANCE] Stored Procedures Hi Markus You really should consider using a connection pool (most web application servers provide pooling facilities) or some other means to keep the connection between several http requests. Yes. I'm looking for a connection pool. I found pgpool but don't yet understand how it works; I'll read more about it. Thanks Marcos
[PERFORM] Showing Column Statistics Number
Hi, I've looked around through the docs, but can't seem to find an answer to this. If I change a column's statistics with "ALTER TABLE ... ALTER COLUMN ... SET STATISTICS n", is there a way I can later go back and see what the number is for that column? I want to be able to tell which columns I've changed the statistics on, and which ones I haven't. Thanks, Dave
[PERFORM] Left Join Performance vs Inner Join Performance
Hello, I have an inner join query that runs fast, but when I change it to a left join the query runs 96 times slower. I wish I could always do an inner join, but there are rare times when there isn't data in the right-hand table. I could expect a small performance hit, but the difference is so large I figure I must be doing something wrong. What I think is strangest is how similar the two query plans are. Query (inner join version; just replace inner with left for the other version): select p.owner_trader_id, p.strategy_id, m.last, m.bid, m.ask from om_position p inner join om_instrument_mark m on m.instrument_id = p.instrument_id and m.data_source_id = 5 and m.date = '2005-02-03' where p.as_of_date = '2005-02-03' and p.fund_id = 'TRIDE' and p.owner_trader_id = 'tam4' and p.strategy_id = 'BASKET1' Query plan for inner join: Nested Loop (cost=0.00..176.99 rows=4 width=43) (actual time=0.234..14.182 rows=193 loops=1) -> Index Scan using as_of_date_om_position_index on om_position p (cost=0.00..68.26 rows=19 width=20) (actual time=0.171..5.210 rows=193 loops=1) Index Cond: (as_of_date = '2005-02-03'::date) Filter: (((fund_id)::text = 'TRIDE'::text) AND ((owner_trader_id)::text = 'tam4'::text) AND ((strategy_id)::text = 'BASKET1'::text)) -> Index Scan using om_instrument_mark_pkey on om_instrument_mark m (cost=0.00..5.71 rows=1 width=31) (actual time=0.028..0.032 rows=1 loops=193) Index Cond: ((m.instrument_id = "outer".instrument_id) AND (m.data_source_id = 5) AND (m.date = '2005-02-03'::date)) Total runtime: 14.890 ms Query plan for left join: Nested Loop Left Join (cost=0.00..7763.36 rows=19 width=43) (actual time=3.005..1346.308 rows=193 loops=1) -> Index Scan using as_of_date_om_position_index on om_position p (cost=0.00..68.26 rows=19 width=20) (actual time=0.064..6.654 rows=193 loops=1) Index Cond: (as_of_date = '2005-02-03'::date) Filter: (((fund_id)::text = 'TRIDE'::text) AND ((owner_trader_id)::text = 'tam4'::text) AND ((strategy_id)::text = 'BASKET1'::text)) ->
Index Scan using om_instrument_mark_pkey on om_instrument_mark m (cost=0.00..404.99 rows=1 width=31) (actual time=3.589..6.919 rows=1 loops=193) Index Cond: (m.instrument_id = outer.instrument_id) Filter: ((data_source_id = 5) AND (date = '2005-02-03'::date)) Total runtime: 1347.159 ms Table Definitions: CREATE TABLE om_position ( fund_id varchar(10) NOT NULL DEFAULT ''::character varying, owner_trader_id varchar(10) NOT NULL DEFAULT ''::character varying, strategy_id varchar(30) NOT NULL DEFAULT ''::character varying, instrument_id int4 NOT NULL DEFAULT 0, as_of_date date NOT NULL DEFAULT '0001-01-01'::date, pos numeric(22,9) NOT NULL DEFAULT 0.0, cf_account_id int4 NOT NULL DEFAULT 0, cost numeric(22,9) NOT NULL DEFAULT 0.0, CONSTRAINT om_position_pkey PRIMARY KEY (fund_id, owner_trader_id, strategy_id, cf_account_id, instrument_id, as_of_date), CONSTRAINT $1 FOREIGN KEY (strategy_id) REFERENCES om_strategy (strategy_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT $2 FOREIGN KEY (fund_id) REFERENCES om_fund (fund_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT $3 FOREIGN KEY (cf_account_id) REFERENCES om_cf_account (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT $4 FOREIGN KEY (owner_trader_id) REFERENCES om_trader (trader_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION ) WITH OIDS; CREATE INDEX as_of_date_om_position_index ON om_position USING btree (as_of_date); CREATE TABLE om_instrument_mark ( instrument_id int4 NOT NULL DEFAULT 0, data_source_id int4 NOT NULL DEFAULT 0, date date NOT NULL DEFAULT '0001-01-01'::date, last numeric(22,9) NOT NULL DEFAULT 0.0, bid numeric(22,9) NOT NULL DEFAULT 0.0, ask numeric(22,9) NOT NULL DEFAULT 0.0, comment varchar(150) NOT NULL DEFAULT ''::character varying, trader_id varchar(10) NOT NULL DEFAULT 'auto'::character varying, CONSTRAINT om_instrument_mark_pkey PRIMARY KEY (instrument_id, data_source_id, date), CONSTRAINT $1 FOREIGN KEY (instrument_id) 
REFERENCES om_instrument (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT $2 FOREIGN KEY (data_source_id) REFERENCES om_data_source (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT om_instrument_mark_trader_id_fkey FOREIGN KEY (trader_id) REFERENCES om_trader (trader_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION ) WITH OIDS; Thanks for any help