Re: [PERFORM] Sort performance on large tables
On Tue, 2005-11-08 at 00:05 -0700, Charlie Savage wrote:
> Setup: Dell Dimension 3000, Suse 10, 1GB RAM, PostgreSQL 8.1 RC 1.
>
> I want to extract data out of the file, with the most important values
> being stored in a column called tlid. The tlid field is an integer, and
> the values are 98% unique. There is a second column called ogc_fid
> which is unique (it is a serial field). I need to extract out unique
> TLIDs (it doesn't matter which duplicate I get rid of). To do this I am
> running this query:
>
> SELECT tlid, min(ogc_fid)
> FROM completechain
> GROUP BY tlid;
>
> The results from explain analyze are:
>
> GroupAggregate  (cost=10400373.80..11361807.88 rows=48071704 width=8)
>                 (actual time=7311682.715..8315746.835 rows=47599910 loops=1)
>   ->  Sort  (cost=10400373.80..10520553.06 rows=48071704 width=8)
>             (actual time=7311682.682..7972304.777 rows=48199165 loops=1)
>         Sort Key: tlid
>         ->  Seq Scan on completechain  (cost=0.00..2228584.04 rows=48071704 width=8)
>                                        (actual time=27.514..773245.046 rows=48199165 loops=1)
> Total runtime: 8486057.185 ms
>
> Should I expect results like this? I realize that the computer is quite
> low-end and is very IO-bound for this query, but I'm still surprised
> that the sort operation takes so long. Out of curiosity, I set up an
> Oracle database on the same machine with the same data and ran the same
> query. Oracle was over an order of magnitude faster. Looking at its
> query plan, it avoided the sort by using HASH GROUP BY. Does such a
> construct exist in PostgreSQL (I see only hash joins)?

PostgreSQL can do HashAggregates as well as GroupAggregates, just like Oracle. HashAggs avoid the sort phase, so they would improve performance considerably. The difference in performance you are getting is because of the different plan used. Did you specifically do anything to Oracle to help it get that plan, or was it a pure out-of-the-box install (or maybe even a "set this up for Data Warehousing" install)?

To get a HashAgg plan, you need to be able to fit all of the unique values in memory. That would be 98% of 48071704 rows, each 8+ bytes wide, giving a HashAgg memory sizing of over 375MB. You must allocate memory of the next power of two above the level you want, so we would need to allocate 512MB to work_mem before it would consider using a HashAgg.

Can you let us know how high you have to set work_mem before an EXPLAIN (not EXPLAIN ANALYZE) chooses the HashAgg plan?

Please be aware that publishing Oracle performance results is against the terms of their licence. We seek to be both fair and legitimate, especially within this public discussion forum.

Best Regards,
Simon Riggs

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message
can get through to the mailing list cleanly
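Simon's sizing arithmetic can be checked with a short script. This is only a sketch: the 8-byte entry width and the power-of-two rounding are assumptions taken straight from the message above, not measured PostgreSQL internals, and with a bare 8-byte entry the bound comes out a little lower than Simon's 375MB figure (which presumably includes per-entry overhead):

```python
# Rough HashAgg memory estimate for the query above.
# Assumptions (mine, not measured): ~98% of the 48,071,704 rows are
# distinct, and each hash entry costs at least 8 bytes; PostgreSQL's
# real per-entry overhead is higher, which is why Simon's figure is larger.
rows = 48071704
distinct = int(rows * 0.98)            # ~47.1 million unique tlid values
needed_mb = distinct * 8 / (1024 * 1024)

def next_power_of_two(n):
    """Smallest power of two >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

work_mem_mb = next_power_of_two(int(needed_mb) + 1)
print("needed ~= %.0fMB -> work_mem %dMB" % (needed_mb, work_mem_mb))
```

Either way the rounding lands on the same answer: work_mem would have to reach 512MB before the planner could consider the HashAgg.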
[PERFORM] Improving performance on multicolumn query
Hi, all!

I've been using PostgreSQL for a long time now, but today I hit a problem I couldn't solve properly - I hope some more experienced users here have some hints for me.

First, I'm using PostgreSQL 7.4.7 on a 2GHz machine with 1.5GB RAM, and I have a table with about 220 columns and 2 rows - the first five columns form a primary key (and a unique index).

Now my problem: I run a great many queries that fetch rows by primary key, reading about five different columns, but these are quite slow (about 10 queries per second; since I have some other databases that can do about 300 queries per second, I think this is slow):

transfer=# explain analyse SELECT * FROM test WHERE test_a=9091150001
  AND test_b=1 AND test_c=2 AND test_d=0 AND test_e=0;

Index Scan using test_idx on test  (cost=0.00..50.27 rows=1 width=1891)
                                   (actual time=0.161..0.167 rows=1 loops=1)
  Index Cond: (test_a = 9091150001::bigint)
  Filter: ((test_b = 1) AND (test_c = 2) AND (test_d = 0) AND (test_e = 0))

So, what can I do to speed things up? If I understand this output correctly, the planner uses my index (test_idx is the same as test_pkey, created along with the table), but only for the first column. Unfortunately I can't refactor these tables, as they were given to me.

Thanks for any hint!

Jan
Re: [PERFORM] Improving performance on multicolumn query
Jan Kesten wrote:
> First, I'm using postgresql 7.4.7 on a 2GHz machine having 1.5GByte RAM
> and I have a table with about 220 columns and 2 rows - and the first
> five columns build a primary key (and a unique index).
>
> transfer=# explain analyse SELECT * FROM test WHERE test_a=9091150001
>   AND test_b=1 AND test_c=2 AND test_d=0 AND test_e=0;
>
> Index Scan using test_idx on test  (cost=0.00..50.27 rows=1 width=1891)
>                                    (actual time=0.161..0.167 rows=1 loops=1)
>   Index Cond: (test_a = 9091150001::bigint)
>   Filter: ((test_b = 1) AND (test_c = 2) AND (test_d = 0) AND (test_e = 0))

This says it's taking less than a millisecond - which is almost certainly too fast to measure accurately anyway. Are you sure this query is the problem?

> So, what to do to speed things up? If I understand correctly this
> output, the planner uses my index (test_idx is the same as test_pkey
> created along with the table), but only for the first column.

1. Are all of test_a/b/c/d/e bigint rather than int?
2. Have you tried explicitly casting your query parameters?
   ...WHERE test_a=123::bigint AND test_b=456::bigint...

--
Richard Huxton
Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Improving performance on multicolumn query
On Wed, Nov 09, 2005 at 01:08:07PM +0100, Jan Kesten wrote:
> First, I'm using postgresql 7.4.7 on a 2GHz machine having 1.5GByte RAM
> and I have a table with about 220 columns and 2 rows - and the first
> five columns build a primary key (and a unique index).

I forgot this, but it should be mentioned: a primary key already acts as a unique index, so you don't need both.

/* Steinar */
--
Homepage: http://www.sesse.net/
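If test_idx really is an exact duplicate of the index behind test_pkey, the redundant one can simply be dropped. A sketch only - the names come from the thread, and you would want to confirm the duplication with `\d test` in psql first:

```sql
-- Hypothetical cleanup: the primary key constraint already maintains a
-- unique index (test_pkey), so a second identical index is pure overhead
-- on every INSERT and UPDATE.
DROP INDEX test_idx;
```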
Re: [PERFORM] Improving performance on multicolumn query
> transfer=# explain analyse SELECT * FROM test WHERE test_a=9091150001
>   AND test_b=1 AND test_c=2 AND test_d=0 AND test_e=0;
>
> Index Scan using test_idx on test  (cost=0.00..50.27 rows=1 width=1891)
>                                    (actual time=0.161..0.167 rows=1 loops=1)
>   Index Cond: (test_a = 9091150001::bigint)
>   Filter: ((test_b = 1) AND (test_c = 2) AND (test_d = 0) AND (test_e = 0))
>
> So, what to do to speed things up? If I understand correctly this
> output, the planner uses my index (test_idx is the same as test_pkey
> created along with the table), but only for the first column.

Hi Jan,

If you're using 7.4.x then the planner can't use the index for unquoted bigints. Try this:

SELECT * FROM test WHERE test_a='9091150001' AND test_b='1'
  AND test_c='2' AND test_d='0' AND test_e='0';

Chris

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Improving performance on multicolumn query
On Wed, Nov 09, 2005 at 01:08:07PM +0100, Jan Kesten wrote:
> Now my problem: I need really many queries of rows using its primary
> key and fetching about five different columns but these are quite slow
> (about 10 queries per second and as I have some other databases which
> can have about 300 queries per second I think this is slow):
>
> transfer=# explain analyse SELECT * FROM test WHERE test_a=9091150001
>   AND test_b=1 AND test_c=2 AND test_d=0 AND test_e=0;
>
> Index Scan using test_idx on test  (cost=0.00..50.27 rows=1 width=1891)
>                                    (actual time=0.161..0.167 rows=1 loops=1)
>   Index Cond: (test_a = 9091150001::bigint)
>   Filter: ((test_b = 1) AND (test_c = 2) AND (test_d = 0) AND (test_e = 0))

You don't post your table definitions (please do), but it looks like test_b, test_c, test_d and test_e might be bigints? If so, you may want to do explicit "AND test_b=1::bigint AND test_c=2::bigint" etc. -- 7.4 doesn't figure this out for you. (8.0 and higher does.)

/* Steinar */
--
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
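Putting this advice together, the full query with explicit casts would look like the following - a sketch using the column values from Jan's example, on the assumption that all five key columns really are bigint:

```sql
-- On 7.4, a bare integer literal does not match a bigint column for
-- index purposes; explicit ::bigint casts (or quoted literals) let the
-- planner use all five columns of the primary key index.
SELECT *
  FROM test
 WHERE test_a = 9091150001::bigint
   AND test_b = 1::bigint
   AND test_c = 2::bigint
   AND test_d = 0::bigint
   AND test_e = 0::bigint;
```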
Re: [PERFORM] Sort performance on large tables
Hi Simon,

Thanks for the response.

> PostgreSQL can do HashAggregates as well as GroupAggregates, just like
> Oracle. HashAggs avoid the sort phase, so would improve performance
> considerably. The difference in performance you are getting is because
> of the different plan used. Did you specifically do anything to Oracle
> to help it get that plan, or was it a pure out-of-the-box install (or
> maybe even a set this up for Data Warehousing install)?

It was an out-of-the-box plan with the standard database install option (it wasn't a Data Warehousing install).

> Can you let us know how high you have to set work_mem before an EXPLAIN
> (not EXPLAIN ANALYZE) chooses the HashAgg plan?

The planner picked a HashAggregate only when I set work_mem to 2097151 - which I gather is the maximum allowed value, according to a message returned from the server.

> Please be aware that publishing Oracle performance results is against
> the terms of their licence and we seek to be both fair and legitimate,
> especially within this public discussion forum.

Sorry, I didn't realize - I'll be more vague next time.

Charlie
Re: [PERFORM] Improving performance on multicolumn query
Hi all!

First, thanks for all the answers so far. :-)

> You don't post your table definitions (please do), but it looks like
> test_b, test_c, test_d and test_e might be bigints? If so, you may want
> to do explicit "AND test_b=1::bigint AND test_c=2::bigint" etc. --
> 7.4 doesn't figure this out for you. (8.0 and higher does.)

I didn't post the table definition, but you are all right: test_a to test_e are all bigint. I use JDBC to connect to this database and use a prepared statement for the queries, setting all parameters with the pst.setLong() method. Perhaps this could be the problem?

I'll try 'normal' statements with explicit typecasts, because as far as I can see the query itself is the problem (postgresql takes more than 98% CPU while running these statements), not other overhead (the network carries only 1-2% load). Queries against other tables (not as big - both rows and columns are far fewer) run quite fast with the same code.

So, thanks again - I'll try it and report back. :-) It can't stay this slow; I have some self-built databases with millions of rows and they run very fast - but they don't use bigint. ;-)

Cheers, Jan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
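If the bound bigint parameters are indeed what keeps 7.4 off the index, one commonly suggested workaround (an assumption on my part, not something verified in this thread) is to put the casts into the prepared statement text itself, so each setLong() value arrives already typed:

```sql
-- Hypothetical prepared-statement text for the JDBC code: the ?::bigint
-- casts declare the parameter types up front, so the 7.4 planner can
-- match all five columns of the (test_a..test_e) primary key index.
SELECT *
  FROM test
 WHERE test_a = ?::bigint
   AND test_b = ?::bigint
   AND test_c = ?::bigint
   AND test_d = ?::bigint
   AND test_e = ?::bigint;
```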
[PERFORM] Some help on buffers and other performance tricks
Hi all,

I've got PG 8.0 on Debian sarge set up, and I want to speed up performance on the system. The system will run PG, an Apache front-end on port 80, and Tomcat/Cocoon for the webapp. The webapp is not heavily used, so we can give maximum performance to the database.

The database has a lot of work to do; we upload files every day. The current server has 8 databases of around 1 million records each, and this will grow in the future. There's only one main table, with some smaller tables; 95% of the records are in that one table. A lot of updates are done on that table, affecting 10-20% of the records.

The system has 1GB of RAM. I could give 512MB to PG. The filesystem is ext2, mounted with the noatime option in fstab.

Could I get some suggestions on how to configure my buffers, WALs, etc.?

Met vriendelijke groeten, Bien à vous, Kind regards,

Yves Vindevogel
Implements
Mail: [EMAIL PROTECTED] - Mobile: +32 (478) 80 82 91
Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76
Web: http://www.implements.be

"First they ignore you. Then they laugh at you. Then they fight you. Then you win." - Mahatma Gandhi
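For a dedicated 8.0 box with 1GB of RAM and an update-heavy workload, a commonly cited starting point looks roughly like the fragment below. This is a sketch, not tuned advice: every number here is an assumption to be validated against the actual workload, and the free-space-map settings matter because of the heavy update traffic described above.

```sql
-- postgresql.conf sketch: PostgreSQL 8.0, 1GB RAM, ~512MB budget for PG
-- shared_buffers = 10000          -- ~80MB; 8.0-era advice keeps this modest
-- effective_cache_size = 65536    -- ~512MB; what the OS will cache for us
-- work_mem = 8192                 -- 8MB per sort/hash; raise per session if needed
-- maintenance_work_mem = 65536    -- 64MB for VACUUM and CREATE INDEX
-- checkpoint_segments = 16        -- update-heavy loads benefit from fewer checkpoints
-- max_fsm_pages = 200000          -- 10-20% of rows updated daily needs a larger FSM
```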
Re: [PERFORM] Some help on buffers and other performance tricks
0= Optimize your schema to be as tight as possible. Your goal is to give yourself the maximum chance that everything you want to work on is in RAM when you need it.
1= Upgrade your RAM to as much as you can possibly strain to afford. 4GB at least. It's that important.
2= If the _entire_ DB does not fit in RAM after upgrading your RAM, the next step is making sure your HD IO subsystem is adequate to your needs.
3= Read the various pg tuning docs that are available and Do The Right Thing.
4= If performance is still not acceptable, then it's time to drill down into what specific actions/queries are problems.

If you get to here and the entire DBMS is still not close to acceptable, your fundamental assumptions have to be re-examined.

Ron

-----Original Message-----
From: Yves Vindevogel [EMAIL PROTECTED]
Sent: Nov 9, 2005 3:11 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Some help on buffers and other performance tricks
Re: [PERFORM] Outer Join performance in PostgreSQL
Ashok Agrawal [EMAIL PROTECTED] writes:
> I noticed outer join is very very slow in postgresql as compared
> to Oracle.

I think the three things the people best able to help you are going to ask for are 1) what version of PostgreSQL, 2) what are the tables, and how many rows in each, and 3) output from 'explain analyze' rather than just 'explain'.

That said, I'm willing to take an amateurish stab at it even without that. In fact, I don't think the outer joins are the issue at all. I see that you're forcing a right join from ecms_certificate_types to ecms_cert_headers. This seems to be causing postgresql to think it must (unnecessarily) consider three quarters of a billion rows, which, if I'm reading right, seems to be producing the majority of the estimated cost:

Hash Join  (cost=1666049.74..18486619.37 rows=157735046 width=874)
  Hash Cond: ("outer".certificate_type_id = "inner".certificate_type_id)
  ->  Merge Right Join  (cost=1666048.13..11324159.05 rows=643816513 width=826)

In fact, looking at the fact that you're doing a COALESCE on a column from b, it seems to me that doing a right join from ecms_cert_headers to ecms_certificate_types is just wrong. It seems to me that that should be a left join as well. With that in mind, I would rewrite the whole FROM clause as:

FROM ecms_cert_headers a
  LEFT OUTER JOIN ecms_certificate_types b
    ON (a.certificate_type_id = b.certificate_type_id)
  LEFT OUTER JOIN taxpack_user c ON (a.created_by = c.emp_no)
  LEFT OUTER JOIN taxpack_user d ON (a.updated_by = d.emp_no)
  LEFT OUTER JOIN taxpack_user e ON (a.approved_by = e.emp_no)
WHERE a.dln_code = '17319'

It seems to me that this more reflects the intent of the data being retrieved, and I would also expect it to be a boatload faster. Assuming I've understood the intent correctly, I would guess that the difference is the result of the Oracle planner being able to eliminate the right join or something.

Mike

---(end of broadcast)---
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
Re: [PERFORM] Sort performance on large tables
...and on those notes, let me repeat my often-stated advice that a DB server should be configured with as much RAM as is feasible: 4GB or more strongly recommended. I'll add that the HW you are using for a DB server should be able to hold _at least_ 4GB of RAM (note that modern _laptops_ can hold 2GB; next year's are likely to be able to hold 4GB). I can't casually find specs on the D3000, but if it can't be upgraded to at least 4GB, you should be looking for new DB server HW.

At this writing, 4 1GB DIMMs (4GB) should set you back ~$300 or less, and 4 2GB DIMMs (8GB) should cost ~$600. As of now, very few mainboards support 4GB DIMMs, and I doubt the D3000 has such a mainboard; if you can use them, 4 4GB DIMMs (16GB) will currently set you back ~$1600-$2400.

Whichever way you do it, it's well worth the money to have at least 4GB of RAM in a DB server. It makes all kinds of problems just not exist.

Ron

-----Original Message-----
From: Simon Riggs [EMAIL PROTECTED]
Sent: Nov 9, 2005 4:35 AM
To: Charlie Savage [EMAIL PROTECTED], Luke Lonergan [EMAIL PROTECTED]
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Sort performance on large tables
[PERFORM] (View and SQL) VS plpgsql
Hello all,

I post this question here because I wasn't able to find an answer to it elsewhere; I hope someone can answer.

Abstract: the function at the end of this e-mail emulates two things. First it fills a record set with the needed columns from a table, plus two empty result columns, a min and a max. Those two columns are then filled by a second query on the same table that computes a MIN and a MAX using the index idx_archive_utctime. The function loops over the first record set and returns a SETOF record that is cast by the caller. The goal of this is to let the receiving application minimise its work by delivering the two matching rowsets already grouped together. We used to handle two result sets, but I am looking to improve performance, and at first glance this seems to speed up the process.

Questions:

1. How could this be done with a single combination of SQL and views?
2. In a case like this, does plpgsql really add significant overhead?
3. Performance difference? [I would need a working pure-SQL version to compare planner and EXPLAIN results.]

STUFF:

-- TABLE
CREATE TABLE archive_event (
  inst    int4 NOT NULL,
  cid     int8 NOT NULL,
  src     int8 NOT NULL,
  dst     int8 NOT NULL,
  bid     int8 NOT NULL,
  tid     int4 NOT NULL,
  utctime int4 NOT NULL,
  CONSTRAINT ids_archives_event_pkey PRIMARY KEY (inst, cid),
  CONSTRAINT ids_archives_event_cid_index UNIQUE (cid)
);

-- INDEXES
CREATE INDEX idx_archive_utctime ON archive_event USING btree (utctime);
CREATE INDEX idx_archive_src ON archive_event USING btree (src);
CREATE INDEX idx_archive_bid_tid ON archive_event USING btree (tid, bid);

-- FUNCTION
CREATE OR REPLACE FUNCTION console_get_source_rule_level_1()
RETURNS SETOF RECORD AS '
DECLARE
  one_record record;
  r_record record;
BEGIN
  FOR r_record IN
    SELECT count(cid) AS hits, src, bid, tid,
           NULL::int8 AS min_time, NULL::int8 AS max_time
      FROM archive_event
     WHERE inst = \'3\'
       AND (utctime BETWEEN \'111492\' AND \'1131512399\')
     GROUP BY src, bid, tid
  LOOP
    SELECT INTO one_record
           MIN(utctime) AS timestart, MAX(utctime) AS timestop
      FROM archive_event
     WHERE src = r_record.src
       AND bid = r_record.bid
       AND tid = r_record.tid
       AND inst = \'3\'
       AND (utctime BETWEEN \'111492\' AND \'1131512399\');
    r_record.min_time := one_record.timestart;
    r_record.max_time := one_record.timestop;
    RETURN NEXT r_record;
  END LOOP;
  RETURN;
END;
' LANGUAGE 'plpgsql' VOLATILE;

GRANT EXECUTE ON FUNCTION console_get_source_rule_level_1() TO console WITH GRANT OPTION;

-- FUNCTION CALLER
SELECT * FROM console_get_source_rule_level_1()
  AS (hits int8, src int8, bid int8, tid int4, min_time int8, max_time int8);

Eric Lauzon
[Recherche Développement]
Above Sécurité / Above Security
Tél : (450) 430-8166
Fax : (450) 430-1858
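Regarding question 1: since the per-group MIN/MAX query filters on exactly the same predicate and the same grouping columns as the outer loop, the whole function appears to collapse into a single aggregate query, with no view or plpgsql needed. A sketch only, reusing the literals from the function above (untested against the real data):

```sql
-- One pass over archive_event computes hits, min and max per group at
-- once, replacing the per-group MIN/MAX subquery the plpgsql loop issues.
SELECT count(cid)   AS hits,
       src, bid, tid,
       MIN(utctime) AS min_time,
       MAX(utctime) AS max_time
  FROM archive_event
 WHERE inst = 3
   AND utctime BETWEEN 111492 AND 1131512399
 GROUP BY src, bid, tid;
```

If this produces the same rows, EXPLAIN on it would also answer question 3 directly.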
Re: [PERFORM] Outer Join performance in PostgreSQL
On Wed, 9 Nov 2005, Ashok Agrawal wrote:
> I noticed outer join is very very slow in postgresql as compared
> to Oracle.
>
> SELECT a.dln_code, a.company_name,
>        to_char(a.certificate_date,'DD-MON-'),
>        to_char(a.certificate_type_id, '99'),
>        COALESCE(b.certificate_type_description,'None'),
>        a.description, a.blanket_single, a.certificate_status,
>        COALESCE(a.sun_legal_entity, 'None'),
>        COALESCE(a.other_entity_name, 'None'),
>        COALESCE(a.notes, 'None'), COALESCE(c.name, NULL),
>        COALESCE(to_char(a.created_date,'DD-MON-'), 'N/A'),
>        COALESCE(c.name, NULL),
>        COALESCE(to_char(a.updated_date,'DD-MON-'), 'N/A'),
>        COALESCE(e.name, NULL),
>        COALESCE(to_char(a.approved_date,'DD-MON-'), 'N/A')
>   FROM ecms_cert_headers a
>     LEFT OUTER JOIN taxpack_user c ON (a.created_by = c.emp_no)
>     LEFT OUTER JOIN taxpack_user d ON (a.updated_by = d.emp_no)
>     LEFT OUTER JOIN taxpack_user e ON (a.approved_by = e.emp_no)
>     INNER JOIN ecms_certificate_types b
>       ON (a.certificate_type_id = b.certificate_type_id)
>  WHERE a.dln_code = '17319'

I think in the above it's safe to do the inner join first, although PostgreSQL won't determine that currently, and that could have something to do with the difference in performance if Oracle did reorder the joins. If you were to run the query doing the INNER JOIN first, does that give the correct results and run more quickly in PostgreSQL? In either case, explain analyze output would be handy to find the actual times taken by the steps.
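The reordering being suggested can be written out explicitly. A sketch, with the long select list elided; whether the planner honours this textual join order depends on version and settings, so treat it as an experiment rather than a guaranteed fix:

```sql
-- Same query with the INNER JOIN written first, so the selective
-- a.dln_code filter and the small ecms_certificate_types table are
-- combined before the three outer joins fan out.
SELECT a.dln_code, a.company_name /* ... same select list as above ... */
  FROM ecms_cert_headers a
  INNER JOIN ecms_certificate_types b
    ON (a.certificate_type_id = b.certificate_type_id)
  LEFT OUTER JOIN taxpack_user c ON (a.created_by = c.emp_no)
  LEFT OUTER JOIN taxpack_user d ON (a.updated_by = d.emp_no)
  LEFT OUTER JOIN taxpack_user e ON (a.approved_by = e.emp_no)
 WHERE a.dln_code = '17319';
```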
Re: [PERFORM] Some help on buffers and other performance tricks
Ron Peacetree wrote:
> 0= Optimize your schema to be as tight as possible. Your goal is to
> give yourself the maximum chance that everything you want to work on is
> in RAM when you need it.
> 1= Upgrade your RAM to as much as you can possibly strain to afford.
> 4GB at least. It's that important.
> 2= If the _entire_ DB does not fit in RAM after upgrading your RAM, the
> next step is making sure your HD IO subsystem is adequate to your needs.
> 3= Read the various pg tuning docs that are available and Do The Right
> Thing.
> 4= If performance is still not acceptable, then it's time to drill down
> into what specific actions/queries are problems. If you get to here and
> the entire DBMS is still not close to acceptable, your fundamental
> assumptions have to be re-examined.

IMHO you should really be examining your queries _before_ you do any investment in hardware, because later those may prove unnecessary.

--
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [PERFORM] Some help on buffers and other performance tricks
Frank Wiles wrote:
> Obviously there are systems/datasets/quantities where this won't always
> work out best, but for the majority of systems out there complicating
> your schema, maxing your hardware, and THEN tuning is IMHO the wrong
> approach.

I wasn't suggesting complicating the schema -- I was actually thinking of systems where some queries are not using indexes, some queries are plain wrong, etc. Buying a very expensive RAID and then noticing that you just needed to create an index is going to make somebody feel at least somewhat stupid.

--
Alvaro Herrera    Valdivia, Chile    ICBM: S 39º 49' 17.7", W 73º 14' 26.8"
"And a voice from the chaos spoke to me and said: 'Smile and be happy, it could be worse.' And I smiled. And I was happy. And it got worse."
Re: [PERFORM] Some help on buffers and other performance tricks
On Wed, 9 Nov 2005 21:43:33 -0300, Alvaro Herrera [EMAIL PROTECTED] wrote:
> I wasn't suggesting to complicate the schema -- I was actually thinking
> in systems where some queries are not using indexes, some queries are
> plain wrong, etc. Buying a very expensive RAID and then noticing that
> you just needed to create an index, is going to make somebody feel at
> least somewhat stupid.

Sorry, I was referring to Ron's statement that the first step should be to "optimize your schema to be as tight as possible". But I agree: finding out you need an index after spending $$$ on extra hardware would be bad. Especially if you have to explain it to the person forking over the $$$! :)

-
Frank Wiles [EMAIL PROTECTED]
http://www.wiles.org
-
Re: [PERFORM] Some help on buffers and other performance tricks
On Wed, 9 Nov 2005 20:07:52 -0300, Alvaro Herrera [EMAIL PROTECTED] wrote:
> IMHO you should really be examining your queries _before_ you do any
> investment in hardware, because later those may prove unnecessary.

It all really depends on what you're doing. For some of the systems I run, 4GB of RAM is *WAY* overkill, RAID 1+0 is overkill, etc. In general I would slightly change the order of operations from:

1) Buy tons of RAM
2) Buy lots of disk I/O
3) Tune your conf
4) Examine your queries

to:

1) Tune your conf
2) Spend a few minutes examining your queries
3) Buy as much RAM as you can afford
4) Buy as much disk I/O as you can
5) Do in-depth tuning of your queries/conf

Personally I avoid planning my schema around my performance at the start. I just try to represent the data in a sensible, normalized way. While I'm sure I sub-consciously make decisions based on performance considerations early on, I don't do any major schema overhauls until I find I can't get the performance I need via tuning. Obviously there are systems/datasets/quantities where this won't always work out best, but for the majority of systems out there complicating your schema, maxing your hardware, and THEN tuning is IMHO the wrong approach.

-
Frank Wiles [EMAIL PROTECTED]
http://www.wiles.org
-
Re: [PERFORM] Some help on buffers and other performance tricks
The point, gentlemen, was that Good Architecture is King. That's what I was trying to emphasize by calling proper DB architecture step 0. All other things being equal (and they usually aren't; this sort of stuff is _very_ context dependent), the more of your critical schema that you can fit into RAM during normal operation, the better. ...and it all starts with proper DB design. Otherwise, you are quite right in stating that you risk wasting time, effort, and HW.

Ron

-----Original Message-----
From: Frank Wiles [EMAIL PROTECTED]
Sent: Nov 9, 2005 6:53 PM
To: Alvaro Herrera [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Some help on buffers and other performance tricks