Re: [PERFORM] Sort performance on large tables
On Tue, 2005-11-08 at 00:05 -0700, Charlie Savage wrote:
> Setup: Dell Dimension 3000, Suse 10, 1GB RAM, PostgreSQL 8.1 RC 1.
>
> I want to extract data out of the file, with the most important values
> being stored in a column called tlid. The tlid field is an integer, and
> the values are 98% unique. There is a second column called ogc_fid
> which is unique (it is a serial field). I need to extract out unique
> TLIDs (it doesn't matter which duplicate I get rid of). To do this I am
> running this query:
>
> SELECT tlid, min(ogc_fid)
> FROM completechain
> GROUP BY tlid;
>
> The results from explain analyze are:
>
> GroupAggregate  (cost=10400373.80..11361807.88 rows=48071704 width=8)
>                 (actual time=7311682.715..8315746.835 rows=47599910 loops=1)
>   ->  Sort  (cost=10400373.80..10520553.06 rows=48071704 width=8)
>             (actual time=7311682.682..7972304.777 rows=48199165 loops=1)
>         Sort Key: tlid
>         ->  Seq Scan on completechain  (cost=0.00..2228584.04 rows=48071704 width=8)
>                                        (actual time=27.514..773245.046 rows=48199165 loops=1)
> Total runtime: 8486057.185 ms
>
> Should I expect results like this? I realize that the computer is quite
> low-end and is very IO-bound for this query, but I'm still surprised
> that the sort operation takes so long. Out of curiosity, I set up an
> Oracle database on the same machine with the same data and ran the same
> query. Oracle was over an order of magnitude faster. Looking at its
> query plan, it avoided the sort by using HASH GROUP BY. Does such a
> construct exist in PostgreSQL (I see only hash joins)?

PostgreSQL can do HashAggregates as well as GroupAggregates, just like Oracle. HashAggs avoid the sort phase, so they would improve performance considerably. The difference in performance you are getting is because of the different plan used. Did you specifically do anything to Oracle to help it get that plan, or was it a pure out-of-the-box install (or maybe even a "set this up for Data Warehousing" install)?

To get a HashAgg plan, you need to be able to fit all of the unique values in memory. That would be 98% of 48071704 rows, each 8+ bytes wide, giving a HashAgg memory sizing of over 375MB. You must allocate memory of the next power of two above the level you want, so we would need to allocate 512MB to work_mem before it would consider using a HashAgg.

Can you let us know how high you have to set work_mem before an EXPLAIN (not EXPLAIN ANALYZE) chooses the HashAgg plan?

Please be aware that publishing Oracle performance results is against the terms of their licence. We seek to be both fair and legitimate, especially within this public discussion forum.

Best Regards,
Simon Riggs

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message
can get through to the mailing list cleanly
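Simon's sizing arithmetic can be checked with a short script. This is only a sketch: the 8-byte entry width and the power-of-two rounding are assumptions taken straight from the message above, not measured PostgreSQL internals, and with a bare 8-byte entry the bound comes out a little lower than Simon's 375MB figure (which presumably includes per-entry overhead):

```python
# Rough HashAgg memory estimate for the query above.
# Assumptions (mine, not measured): ~98% of the 48,071,704 rows are
# distinct, and each hash entry costs at least 8 bytes; PostgreSQL's
# real per-entry overhead is higher, which is why Simon's figure is larger.
rows = 48071704
distinct = int(rows * 0.98)            # ~47.1 million unique tlid values
needed_mb = distinct * 8 / (1024 * 1024)

def next_power_of_two(n):
    """Smallest power of two >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

work_mem_mb = next_power_of_two(int(needed_mb) + 1)
print("needed ~= %.0fMB -> work_mem %dMB" % (needed_mb, work_mem_mb))
```

Either way the rounding lands on the same answer: work_mem would have to reach 512MB before the planner could consider the HashAgg.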
[PERFORM] Improving performance on multicolumn query
Hi, all!

I've been using PostgreSQL for a long time now, but today I hit a problem I couldn't solve properly - I hope some more experienced users here have some hints for me.

First, I'm using PostgreSQL 7.4.7 on a 2GHz machine with 1.5GB RAM, and I have a table with about 220 columns and 2 rows - the first five columns form a primary key (and a unique index).

Now my problem: I run a great many queries that fetch rows by primary key, reading about five different columns, but these are quite slow (about 10 queries per second; since I have some other databases that can do about 300 queries per second, I think this is slow):

transfer=# explain analyse SELECT * FROM test WHERE test_a=9091150001
  AND test_b=1 AND test_c=2 AND test_d=0 AND test_e=0;

Index Scan using test_idx on test  (cost=0.00..50.27 rows=1 width=1891)
                                   (actual time=0.161..0.167 rows=1 loops=1)
  Index Cond: (test_a = 9091150001::bigint)
  Filter: ((test_b = 1) AND (test_c = 2) AND (test_d = 0) AND (test_e = 0))

So, what can I do to speed things up? If I understand this output correctly, the planner uses my index (test_idx is the same as test_pkey, created along with the table), but only for the first column. Unfortunately I can't refactor these tables, as they were given to me.

Thanks for any hint!

Jan
Re: [PERFORM] Improving performance on multicolumn query
Jan Kesten wrote:
> First, I'm using postgresql 7.4.7 on a 2GHz machine having 1.5GByte RAM
> and I have a table with about 220 columns and 2 rows - and the first
> five columns build a primary key (and a unique index).
>
> transfer=# explain analyse SELECT * FROM test WHERE test_a=9091150001
>   AND test_b=1 AND test_c=2 AND test_d=0 AND test_e=0;
>
> Index Scan using test_idx on test  (cost=0.00..50.27 rows=1 width=1891)
>                                    (actual time=0.161..0.167 rows=1 loops=1)
>   Index Cond: (test_a = 9091150001::bigint)
>   Filter: ((test_b = 1) AND (test_c = 2) AND (test_d = 0) AND (test_e = 0))

This says it's taking less than a millisecond - which is almost certainly too fast to measure accurately anyway. Are you sure this query is the problem?

> So, what to do to speed things up? If I understand correctly this
> output, the planner uses my index (test_idx is the same as test_pkey
> created along with the table), but only for the first column.

1. Are all of test_a/b/c/d/e bigint rather than int?
2. Have you tried explicitly casting your query parameters?
   ...WHERE test_a=123::bigint AND test_b=456::bigint...

--
Richard Huxton
Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Improving performance on multicolumn query
On Wed, Nov 09, 2005 at 01:08:07PM +0100, Jan Kesten wrote:
> First, I'm using postgresql 7.4.7 on a 2GHz machine having 1.5GByte RAM
> and I have a table with about 220 columns and 2 rows - and the first
> five columns build a primary key (and a unique index).

I forgot this, but it should be mentioned: a primary key already acts as a unique index, so you don't need both.

/* Steinar */
--
Homepage: http://www.sesse.net/
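If test_idx really is an exact duplicate of the index behind test_pkey, the redundant one can simply be dropped. A sketch only - the names come from the thread, and you would want to confirm the duplication with `\d test` in psql first:

```sql
-- Hypothetical cleanup: the primary key constraint already maintains a
-- unique index (test_pkey), so a second identical index is pure overhead
-- on every INSERT and UPDATE.
DROP INDEX test_idx;
```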
Re: [PERFORM] Improving performance on multicolumn query
> transfer=# explain analyse SELECT * FROM test WHERE test_a=9091150001
>   AND test_b=1 AND test_c=2 AND test_d=0 AND test_e=0;
>
> Index Scan using test_idx on test  (cost=0.00..50.27 rows=1 width=1891)
>                                    (actual time=0.161..0.167 rows=1 loops=1)
>   Index Cond: (test_a = 9091150001::bigint)
>   Filter: ((test_b = 1) AND (test_c = 2) AND (test_d = 0) AND (test_e = 0))
>
> So, what to do to speed things up? If I understand correctly this
> output, the planner uses my index (test_idx is the same as test_pkey
> created along with the table), but only for the first column.

Hi Jan,

If you're using 7.4.x then the planner can't use the index for unquoted bigints. Try this:

SELECT * FROM test WHERE test_a='9091150001' AND test_b='1'
  AND test_c='2' AND test_d='0' AND test_e='0';

Chris

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Improving performance on multicolumn query
On Wed, Nov 09, 2005 at 01:08:07PM +0100, Jan Kesten wrote:
> Now my problem: I need really many queries of rows using its primary
> key and fetching about five different columns but these are quite slow
> (about 10 queries per second and as I have some other databases which
> can have about 300 queries per second I think this is slow):
>
> transfer=# explain analyse SELECT * FROM test WHERE test_a=9091150001
>   AND test_b=1 AND test_c=2 AND test_d=0 AND test_e=0;
>
> Index Scan using test_idx on test  (cost=0.00..50.27 rows=1 width=1891)
>                                    (actual time=0.161..0.167 rows=1 loops=1)
>   Index Cond: (test_a = 9091150001::bigint)
>   Filter: ((test_b = 1) AND (test_c = 2) AND (test_d = 0) AND (test_e = 0))

You don't post your table definitions (please do), but it looks like test_b, test_c, test_d and test_e might be bigints? If so, you may want to do explicit "AND test_b=1::bigint AND test_c=2::bigint" etc. -- 7.4 doesn't figure this out for you. (8.0 and higher does.)

/* Steinar */
--
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
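Putting this advice together, the full query with explicit casts would look like the following - a sketch using the column values from Jan's example, on the assumption that all five key columns really are bigint:

```sql
-- On 7.4, a bare integer literal does not match a bigint column for
-- index purposes; explicit ::bigint casts (or quoted literals) let the
-- planner use all five columns of the primary key index.
SELECT *
  FROM test
 WHERE test_a = 9091150001::bigint
   AND test_b = 1::bigint
   AND test_c = 2::bigint
   AND test_d = 0::bigint
   AND test_e = 0::bigint;
```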
Re: [PERFORM] Sort performance on large tables
Hi Simon,

Thanks for the response.

> PostgreSQL can do HashAggregates as well as GroupAggregates, just like
> Oracle. HashAggs avoid the sort phase, so would improve performance
> considerably. The difference in performance you are getting is because
> of the different plan used. Did you specifically do anything to Oracle
> to help it get that plan, or was it a pure out-of-the-box install (or
> maybe even a set this up for Data Warehousing install)?

It was an out-of-the-box plan with the standard database install option (it wasn't a Data Warehousing install).

> Can you let us know how high you have to set work_mem before an EXPLAIN
> (not EXPLAIN ANALYZE) chooses the HashAgg plan?

The planner picked a HashAggregate only when I set work_mem to 2097151 - which I gather is the maximum allowed value, according to a message returned from the server.

> Please be aware that publishing Oracle performance results is against
> the terms of their licence and we seek to be both fair and legitimate,
> especially within this public discussion forum.

Sorry, I didn't realize - I'll be more vague next time.

Charlie
Re: [PERFORM] Improving performance on multicolumn query
Hi all!

First, thanks for all the answers so far. :-)

> You don't post your table definitions (please do), but it looks like
> test_b, test_c, test_d and test_e might be bigints? If so, you may want
> to do explicit "AND test_b=1::bigint AND test_c=2::bigint" etc. --
> 7.4 doesn't figure this out for you. (8.0 and higher does.)

I didn't post the table definition, but you are all right: test_a to test_e are all bigint. I use JDBC to connect to this database and use a prepared statement for the queries, setting all parameters with the pst.setLong() method. Perhaps this could be the problem?

I'll try 'normal' statements with explicit typecasts, because as far as I can see the query itself is the problem (postgresql takes more than 98% CPU while running these statements), not other overhead (the network carries only 1-2% load). Queries against other tables (not as big - both rows and columns are far fewer) run quite fast with the same code.

So, thanks again - I'll try it and report back. :-) It can't stay this slow; I have some self-built databases with millions of rows and they run very fast - but they don't use bigint. ;-)

Cheers, Jan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
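If the bound bigint parameters are indeed what keeps 7.4 off the index, one commonly suggested workaround (an assumption on my part, not something verified in this thread) is to put the casts into the prepared statement text itself, so each setLong() value arrives already typed:

```sql
-- Hypothetical prepared-statement text for the JDBC code: the ?::bigint
-- casts declare the parameter types up front, so the 7.4 planner can
-- match all five columns of the (test_a..test_e) primary key index.
SELECT *
  FROM test
 WHERE test_a = ?::bigint
   AND test_b = ?::bigint
   AND test_c = ?::bigint
   AND test_d = ?::bigint
   AND test_e = ?::bigint;
```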
[PERFORM] Some help on buffers and other performance tricks
Hi all,

I've got PG 8.0 on Debian sarge set up, and I want to speed up performance on the system. The system will run PG, an Apache front-end on port 80, and Tomcat/Cocoon for the webapp. The webapp is not heavily used, so we can give maximum performance to the database.

The database has a lot of work to do; we upload files every day. The current server has 8 databases of around 1 million records each, and this will grow in the future. There's only one main table, with some smaller tables; 95% of the records are in that one table. A lot of updates are done on that table, affecting 10-20% of the records.

The system has 1GB of RAM. I could give 512MB to PG. The filesystem is ext2, mounted with the noatime option in fstab.

Could I get some suggestions on how to configure my buffers, WALs, etc.?

Met vriendelijke groeten, Bien à vous, Kind regards,

Yves Vindevogel
Implements
Mail: [EMAIL PROTECTED] - Mobile: +32 (478) 80 82 91
Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76
Web: http://www.implements.be

"First they ignore you. Then they laugh at you. Then they fight you. Then you win." - Mahatma Gandhi
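For a dedicated 8.0 box with 1GB of RAM and an update-heavy workload, a commonly cited starting point looks roughly like the fragment below. This is a sketch, not tuned advice: every number here is an assumption to be validated against the actual workload, and the free-space-map settings matter because of the heavy update traffic described above.

```sql
-- postgresql.conf sketch: PostgreSQL 8.0, 1GB RAM, ~512MB budget for PG
-- shared_buffers = 10000          -- ~80MB; 8.0-era advice keeps this modest
-- effective_cache_size = 65536    -- ~512MB; what the OS will cache for us
-- work_mem = 8192                 -- 8MB per sort/hash; raise per session if needed
-- maintenance_work_mem = 65536    -- 64MB for VACUUM and CREATE INDEX
-- checkpoint_segments = 16        -- update-heavy loads benefit from fewer checkpoints
-- max_fsm_pages = 200000          -- 10-20% of rows updated daily needs a larger FSM
```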
Re: [PERFORM] Some help on buffers and other performance tricks
0= Optimize your schema to be as tight as possible. Your goal is to give yourself the maximum chance that everything you want to work on is in RAM when you need it.
1= Upgrade your RAM to as much as you can possibly strain to afford. 4GB at least. It's that important.
2= If the _entire_ DB does not fit in RAM after upgrading your RAM, the next step is making sure your HD IO subsystem is adequate to your needs.
3= Read the various pg tuning docs that are available and Do The Right Thing.
4= If performance is still not acceptable, then it's time to drill down into what specific actions/queries are problems.

If you get to here and the entire DBMS is still not close to acceptable, your fundamental assumptions have to be re-examined.

Ron

-----Original Message-----
From: Yves Vindevogel [EMAIL PROTECTED]
Sent: Nov 9, 2005 3:11 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Some help on buffers and other performance tricks
Re: [PERFORM] Outer Join performance in PostgreSQL
Ashok Agrawal [EMAIL PROTECTED] writes:
> I noticed outer join is very very slow in postgresql as compared
> to Oracle.

I think the three things the people best able to help you are going to ask for are 1) what version of PostgreSQL, 2) what are the tables, and how many rows in each, and 3) output from 'explain analyze' rather than just 'explain'.

That said, I'm willing to take an amateurish stab at it even without that. In fact, I don't think the outer joins are the issue at all. I see that you're forcing a right join from ecms_certificate_types to ecms_cert_headers. This seems to be causing postgresql to think it must (unnecessarily) consider three quarters of a billion rows, which, if I'm reading right, seems to be producing the majority of the estimated cost:

Hash Join  (cost=1666049.74..18486619.37 rows=157735046 width=874)
  Hash Cond: ("outer".certificate_type_id = "inner".certificate_type_id)
  ->  Merge Right Join  (cost=1666048.13..11324159.05 rows=643816513 width=826)

In fact, looking at the fact that you're doing a COALESCE on a column from b, it seems to me that doing a right join from ecms_cert_headers to ecms_certificate_types is just wrong. It seems to me that that should be a left join as well. With that in mind, I would rewrite the whole FROM clause as:

FROM ecms_cert_headers a
  LEFT OUTER JOIN ecms_certificate_types b
    ON (a.certificate_type_id = b.certificate_type_id)
  LEFT OUTER JOIN taxpack_user c ON (a.created_by = c.emp_no)
  LEFT OUTER JOIN taxpack_user d ON (a.updated_by = d.emp_no)
  LEFT OUTER JOIN taxpack_user e ON (a.approved_by = e.emp_no)
WHERE a.dln_code = '17319'

It seems to me that this more reflects the intent of the data being retrieved, and I would also expect it to be a boatload faster. Assuming I've understood the intent correctly, I would guess that the difference is the result of the Oracle planner being able to eliminate the right join or something.

Mike

---(end of broadcast)---
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
Re: [PERFORM] Sort performance on large tables
...and on those notes, let me repeat my often-stated advice that a DB server should be configured with as much RAM as is feasible: 4GB or more strongly recommended. I'll add that the HW you are using for a DB server should be able to hold _at least_ 4GB of RAM (note that modern _laptops_ can hold 2GB; next year's are likely to be able to hold 4GB). I can't casually find specs on the D3000, but if it can't be upgraded to at least 4GB, you should be looking for new DB server HW.

At this writing, 4 1GB DIMMs (4GB) should set you back ~$300 or less, and 4 2GB DIMMs (8GB) should cost ~$600. As of now, very few mainboards support 4GB DIMMs, and I doubt the D3000 has such a mainboard; if you can use them, 4 4GB DIMMs (16GB) will currently set you back ~$1600-$2400.

Whichever way you do it, it's well worth the money to have at least 4GB of RAM in a DB server. It makes all kinds of problems just not exist.

Ron

-----Original Message-----
From: Simon Riggs [EMAIL PROTECTED]
Sent: Nov 9, 2005 4:35 AM
To: Charlie Savage [EMAIL PROTECTED], Luke Lonergan [EMAIL PROTECTED]
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Sort performance on large tables
[PERFORM] (View and SQL) VS plpgsql
Hello all,

I post this question here because I wasn't able to find an answer to it elsewhere; I hope someone can answer.

Abstract: the function at the end of this e-mail emulates two things. First it fills a record set with the needed columns from a table, plus two empty result columns, a min and a max. Those two columns are then filled by a second query on the same table that computes a MIN and a MAX using the index idx_archive_utctime. The function loops over the first record set and returns a SETOF record that is cast by the caller. The goal of this is to let the receiving application minimise its work by delivering the two matching rowsets already grouped together. We used to handle two result sets, but I am looking to improve performance, and at first glance this seems to speed up the process.

Questions:

1. How could this be done with a single combination of SQL and views?
2. In a case like this, does plpgsql really add significant overhead?
3. Performance difference? [I would need a working pure-SQL version to compare planner and EXPLAIN results.]

STUFF:

-- TABLE
CREATE TABLE archive_event (
  inst    int4 NOT NULL,
  cid     int8 NOT NULL,
  src     int8 NOT NULL,
  dst     int8 NOT NULL,
  bid     int8 NOT NULL,
  tid     int4 NOT NULL,
  utctime int4 NOT NULL,
  CONSTRAINT ids_archives_event_pkey PRIMARY KEY (inst, cid),
  CONSTRAINT ids_archives_event_cid_index UNIQUE (cid)
);

-- INDEXES
CREATE INDEX idx_archive_utctime ON archive_event USING btree (utctime);
CREATE INDEX idx_archive_src ON archive_event USING btree (src);
CREATE INDEX idx_archive_bid_tid ON archive_event USING btree (tid, bid);

-- FUNCTION
CREATE OR REPLACE FUNCTION console_get_source_rule_level_1()
RETURNS SETOF RECORD AS '
DECLARE
  one_record record;
  r_record record;
BEGIN
  FOR r_record IN
    SELECT count(cid) AS hits, src, bid, tid,
           NULL::int8 AS min_time, NULL::int8 AS max_time
      FROM archive_event
     WHERE inst = \'3\'
       AND (utctime BETWEEN \'111492\' AND \'1131512399\')
     GROUP BY src, bid, tid
  LOOP
    SELECT INTO one_record
           MIN(utctime) AS timestart, MAX(utctime) AS timestop
      FROM archive_event
     WHERE src = r_record.src
       AND bid = r_record.bid
       AND tid = r_record.tid
       AND inst = \'3\'
       AND (utctime BETWEEN \'111492\' AND \'1131512399\');
    r_record.min_time := one_record.timestart;
    r_record.max_time := one_record.timestop;
    RETURN NEXT r_record;
  END LOOP;
  RETURN;
END;
' LANGUAGE 'plpgsql' VOLATILE;

GRANT EXECUTE ON FUNCTION console_get_source_rule_level_1() TO console WITH GRANT OPTION;

-- FUNCTION CALLER
SELECT * FROM console_get_source_rule_level_1()
  AS (hits int8, src int8, bid int8, tid int4, min_time int8, max_time int8);

Eric Lauzon
[Recherche Développement]
Above Sécurité / Above Security
Tél : (450) 430-8166
Fax : (450) 430-1858
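Regarding question 1: since the per-group MIN/MAX query filters on exactly the same predicate and the same grouping columns as the outer loop, the whole function appears to collapse into a single aggregate query, with no view or plpgsql needed. A sketch only, reusing the literals from the function above (untested against the real data):

```sql
-- One pass over archive_event computes hits, min and max per group at
-- once, replacing the per-group MIN/MAX subquery the plpgsql loop issues.
SELECT count(cid)   AS hits,
       src, bid, tid,
       MIN(utctime) AS min_time,
       MAX(utctime) AS max_time
  FROM archive_event
 WHERE inst = 3
   AND utctime BETWEEN 111492 AND 1131512399
 GROUP BY src, bid, tid;
```

If this produces the same rows, EXPLAIN on it would also answer question 3 directly.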
Re: [PERFORM] Outer Join performance in PostgreSQL
On Wed, 9 Nov 2005, Ashok Agrawal wrote:
> I noticed outer join is very very slow in postgresql as compared
> to Oracle.
>
> SELECT a.dln_code, a.company_name,
>        to_char(a.certificate_date,'DD-MON-'),
>        to_char(a.certificate_type_id, '99'),
>        COALESCE(b.certificate_type_description,'None'),
>        a.description, a.blanket_single, a.certificate_status,
>        COALESCE(a.sun_legal_entity, 'None'),
>        COALESCE(a.other_entity_name, 'None'),
>        COALESCE(a.notes, 'None'), COALESCE(c.name, NULL),
>        COALESCE(to_char(a.created_date,'DD-MON-'), 'N/A'),
>        COALESCE(c.name, NULL),
>        COALESCE(to_char(a.updated_date,'DD-MON-'), 'N/A'),
>        COALESCE(e.name, NULL),
>        COALESCE(to_char(a.approved_date,'DD-MON-'), 'N/A')
>   FROM ecms_cert_headers a
>     LEFT OUTER JOIN taxpack_user c ON (a.created_by = c.emp_no)
>     LEFT OUTER JOIN taxpack_user d ON (a.updated_by = d.emp_no)
>     LEFT OUTER JOIN taxpack_user e ON (a.approved_by = e.emp_no)
>     INNER JOIN ecms_certificate_types b
>       ON (a.certificate_type_id = b.certificate_type_id)
>  WHERE a.dln_code = '17319'

I think in the above it's safe to do the inner join first, although PostgreSQL won't determine that currently, and that could have something to do with the difference in performance if Oracle did reorder the joins. If you were to run the query doing the INNER JOIN first, does that give the correct results and run more quickly in PostgreSQL? In either case, explain analyze output would be handy to find the actual times taken by the steps.
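The reordering being suggested can be written out explicitly. A sketch, with the long select list elided; whether the planner honours this textual join order depends on version and settings, so treat it as an experiment rather than a guaranteed fix:

```sql
-- Same query with the INNER JOIN written first, so the selective
-- a.dln_code filter and the small ecms_certificate_types table are
-- combined before the three outer joins fan out.
SELECT a.dln_code, a.company_name /* ... same select list as above ... */
  FROM ecms_cert_headers a
  INNER JOIN ecms_certificate_types b
    ON (a.certificate_type_id = b.certificate_type_id)
  LEFT OUTER JOIN taxpack_user c ON (a.created_by = c.emp_no)
  LEFT OUTER JOIN taxpack_user d ON (a.updated_by = d.emp_no)
  LEFT OUTER JOIN taxpack_user e ON (a.approved_by = e.emp_no)
 WHERE a.dln_code = '17319';
```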
Re: [PERFORM] Some help on buffers and other performance tricks
Ron Peacetree wrote:
> 0= Optimize your schema to be as tight as possible. Your goal is to
> give yourself the maximum chance that everything you want to work on is
> in RAM when you need it.
> 1= Upgrade your RAM to as much as you can possibly strain to afford.
> 4GB at least. It's that important.
> 2= If the _entire_ DB does not fit in RAM after upgrading your RAM, the
> next step is making sure your HD IO subsystem is adequate to your needs.
> 3= Read the various pg tuning docs that are available and Do The Right
> Thing.
> 4= If performance is still not acceptable, then it's time to drill down
> into what specific actions/queries are problems. If you get to here and
> the entire DBMS is still not close to acceptable, your fundamental
> assumptions have to be re-examined.

IMHO you should really be examining your queries _before_ you do any investment in hardware, because later those may prove unnecessary.

--
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [PERFORM] Some help on buffers and other performance tricks
Frank Wiles wrote:
> Obviously there are systems/datasets/quantities where this won't always
> work out best, but for the majority of systems out there complicating
> your schema, maxing your hardware, and THEN tuning is IMHO the wrong
> approach.

I wasn't suggesting complicating the schema -- I was actually thinking of systems where some queries are not using indexes, some queries are plain wrong, etc. Buying a very expensive RAID and then noticing that you just needed to create an index is going to make somebody feel at least somewhat stupid.

--
Alvaro Herrera    Valdivia, Chile    ICBM: S 39º 49' 17.7", W 73º 14' 26.8"
"And a voice from the chaos spoke to me and said: 'Smile and be happy, it could be worse.' And I smiled. And I was happy. And it got worse."
Re: [PERFORM] Some help on buffers and other performance tricks
On Wed, 9 Nov 2005 21:43:33 -0300, Alvaro Herrera [EMAIL PROTECTED] wrote:
> I wasn't suggesting to complicate the schema -- I was actually thinking
> in systems where some queries are not using indexes, some queries are
> plain wrong, etc. Buying a very expensive RAID and then noticing that
> you just needed to create an index, is going to make somebody feel at
> least somewhat stupid.

Sorry, I was referring to Ron's statement that the first step should be to "optimize your schema to be as tight as possible". But I agree: finding out you need an index after spending $$$ on extra hardware would be bad. Especially if you have to explain it to the person forking over the $$$! :)

-
Frank Wiles [EMAIL PROTECTED]
http://www.wiles.org
-
Re: [PERFORM] Some help on buffers and other performance tricks
On Wed, 9 Nov 2005 20:07:52 -0300, Alvaro Herrera [EMAIL PROTECTED] wrote:
> IMHO you should really be examining your queries _before_ you do any
> investment in hardware, because later those may prove unnecessary.

It all really depends on what you're doing. For some of the systems I run, 4GB of RAM is *WAY* overkill, RAID 1+0 is overkill, etc. In general I would slightly change the order of operations from:

1) Buy tons of RAM
2) Buy lots of disk I/O
3) Tune your conf
4) Examine your queries

to:

1) Tune your conf
2) Spend a few minutes examining your queries
3) Buy as much RAM as you can afford
4) Buy as much disk I/O as you can
5) Do in-depth tuning of your queries/conf

Personally I avoid planning my schema around my performance at the start. I just try to represent the data in a sensible, normalized way. While I'm sure I sub-consciously make decisions based on performance considerations early on, I don't do any major schema overhauls until I find I can't get the performance I need via tuning. Obviously there are systems/datasets/quantities where this won't always work out best, but for the majority of systems out there complicating your schema, maxing your hardware, and THEN tuning is IMHO the wrong approach.

-
Frank Wiles [EMAIL PROTECTED]
http://www.wiles.org
-
Re: [PERFORM] Some help on buffers and other performance tricks
The point, gentlemen, was that Good Architecture is King. That's what I was trying to emphasize by calling proper DB architecture step 0. All other things being equal (and they usually aren't; this sort of stuff is _very_ context dependent), the more of your critical schema that you can fit into RAM during normal operation, the better. ...and it all starts with proper DB design. Otherwise, you are quite right in stating that you risk wasting time, effort, and HW.

Ron

-----Original Message-----
From: Frank Wiles [EMAIL PROTECTED]
Sent: Nov 9, 2005 6:53 PM
To: Alvaro Herrera [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Some help on buffers and other performance tricks