Re: [PERFORM] Postgresql works too slow
Steve Poe [EMAIL PROTECTED] writes: Try increasing your checkpoint_segments. In my example, our database restore took 75 minutes. After raising checkpoint_segments to 20, we cut it down to less than 30 minutes. Increasing maintenance_work_mem might help too ... or several other settings ... with no information about exactly *what* is slow, it's hard to say. regards, tom lane
Re: [PERFORM] Postgresql works too slow
Try increasing your checkpoint_segments. In my example, our database restore took 75 minutes. After raising checkpoint_segments to 20, we cut it down to less than 30 minutes. Increasing maintenance_work_mem might help too ... or several other settings ... with no information about exactly *what* is slow, it's hard to say. Try turning fsync = false for the duration of your reload. Chris
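Pulling the suggestions in this thread together, a minimal sketch of restore-time postgresql.conf settings, assuming an 8.0-era server (the values are illustrative, not from the thread, and fsync must be turned back on once the reload finishes):

    # restore-window settings -- illustrative values only
    checkpoint_segments = 20        # fewer, larger checkpoints during the bulk load
    maintenance_work_mem = 65536    # in kB; speeds up CREATE INDEX during restore
    fsync = false                   # only safe for a throwaway reload: a crash
                                    # during this window can corrupt the cluster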
Re: [PERFORM] How to improve db performance with $7K?
There's a problem with this strategy. You want battery-backed write caching for best performance safety. (I've tried IDE for WAL before with write caching off -- the DB got crippled whenever I had to copy files from/to the drive on the WAL partition -- I ended up just moving WAL back onto the same SCSI drive as the main DB.) That means in addition to a $$$ SCSI caching controller, you also need a $$$ SATA caching controller. From my glance at prices, advanced SATA controllers seem to cost nearly as much as their SCSI counterparts. This also looks to be the case for the drives themselves. Sure, you can get super-cheap 7200RPM SATA drives, but they absolutely suck for database work. Believe me, I gave it a try once -- ugh. The high-end WD 10K Raptors look pretty good though -- the benchmarks at StorageReview seem to put these drives at about 90% of SCSI 10Ks for both single-user and multi-user workloads. However, they're also priced like SCSIs -- here's what I found at Mwave (going through Pricewatch to find WD740GDs):

Seagate 7200 SATA -- 80GB  $59
WD 10K SATA       -- 72GB  $182
Seagate 10K U320  -- 72GB  $289

Using the above prices for a fixed budget for RAID-10, you could get:

SATA 7200 -- 680GB per $1000
SATA 10K  -- 200GB per $1000
SCSI 10K  -- 125GB per $1000

For a 99% read-only DB that requires lots of disk space (say something like Wikipedia or a blog host), using consumer-level SATA is probably OK. For anything else, I'd consider SATA 10K if (1) I do not need 15K RPM and (2) I don't have SCSI infrastructure already. Steve Poe wrote: If SATA drives don't have the ability to replace SCSI for multi-user Postgres apps, but you needed to save on cost (ALWAYS an issue), could/would you implement SATA for your logs (pg_xlog) and keep the rest on SCSI? Steve Poe Mohan, Ross wrote: I've been doing some reading up on this, trying to keep up here, and have found out that (experts, just yawn and cover your ears) 1) some SATA drives (just type II, I think?) have a Phase Zero implementation of Tagged Command Queueing (the special sauce for SCSI). 2) This SATA TCQ is called NCQ and I believe it basically allows the disk software itself to do the reordering (this is called simple in TCQ terminology). It does not yet allow the TCQ head of queue command, which lets the current tagged request go to the head of the queue and is a simple way of manifesting a high-priority request. 3) SATA drives are not yet multi-initiator? Largely because of 2 and 3, multi-initiator SCSI RAID'ed drives are likely to whomp SATA II drives for a while yet (read: a year or two) in multiuser Postgres applications. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Greg Stark Sent: Thursday, April 14, 2005 2:04 PM To: Kevin Brown Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] How to improve db performance with $7K? Kevin Brown [EMAIL PROTECTED] writes: Greg Stark wrote: I think you're being misled by analyzing the write case. Consider the read case. When a user process requests a block and that read makes its way down to the driver level, the driver can't just put it aside and wait until it's convenient. It has to go ahead and issue the read right away. Well, strictly speaking it doesn't *have* to. It could delay for a couple of milliseconds to see if other requests come in, and then issue the read if none do. If there are already other requests being fulfilled, then it'll schedule the request in question just like the rest. But then the cure is worse than the disease.
You're basically describing exactly what does happen anyway, only you're delaying more requests than necessary. That intervening time isn't really idle; it's filled with all the requests that were delayed during the previous large seek... Once the first request has been fulfilled, the driver can now schedule the rest of the queued-up requests in disk-layout order. I really don't see how this is any different between a system that has tagged queueing to the disks and one that doesn't. The only difference is where the queueing happens. And *when* it happens. Instead of being able to issue requests while a large seek is happening and having some of them satisfied, they have to wait until that seek is finished and get acted on during the next large seek. If my theory is correct then I would expect bandwidth to be essentially equivalent, but the latency on SATA drives to be increased by about 50% of the average seek time. I.e., while a busy SCSI drive can satisfy most requests in about 10ms, a busy SATA drive would satisfy most requests in 15ms. (Add to that that 10k RPM and 15k RPM SCSI drives have even lower seek times, and no such IDE/SATA drives exist...) In reality higher latency feeds into a system feedback loop causing your application to run slower, causing bandwidth demands to be lower as well. It's often hard to distinguish root causes from symptoms when optimizing.
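As a sanity check on William Yu's per-budget capacity figures above (my arithmetic, not from the thread), the 7200 rpm case works out as:

$$\frac{\$1000}{\$59/\text{drive}} \approx 17 \text{ drives}, \qquad 17 \times 80\,\mathrm{GB} \times \tfrac{1}{2}\ \text{(RAID-10 mirroring)} \approx 680\,\mathrm{GB}$$

The same arithmetic with the 10K prices gives roughly 200 GB (SATA) and 125 GB (SCSI) of mirrored capacity per $1000.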
Re: [PERFORM] immutable functions vs. join for lookups ?
On 4/15/05, Enrico Weigelt [EMAIL PROTECTED] wrote: a) SELECT items.a, items.b, ..., users.username FROM items, users WHERE items.uid = users.uid; c) CREATE FUNCTION id2username(oid) RETURNS text LANGUAGE 'SQL' IMMUTABLE AS 'SELECT username AS RESULT FROM users WHERE uid = $1'; You will be told that this function is not immutable but stable, and this is quite right. But consider such a function:

CREATE OR REPLACE FUNCTION id2username (oid int) RETURNS text AS $$
BEGIN
    IF oid = 0 THEN
        RETURN 'foo';
    ELSIF oid = 1 THEN
        RETURN 'bar';
    END IF;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

versus a lookup table with similar data. Logic suggests it should be faster than a table... It got me worried when I added RAISE WARNING 'Called'; after BEGIN and got lots of "Called" warnings when using this IMMUTABLE function in a select... The timings for ~6000 values in the aaa table (and two values in the lookup table) are below. For each case there is the query, the output of EXPLAIN ANALYZE, the time of the EXPLAIN ANALYZE, and the real time of the SELECT (without EXPLAIN ANALYZE):

a) simple select from temp table, and a lookup cost:

EXPLAIN ANALYZE SELECT n FROM aaa;
 Seq Scan on aaa  (cost=0.00..87.92 rows=5992 width=4) (actual time=0.011..24.849 rows=6144 loops=1)
 Total runtime: 51.881 ms
(2 rows)
Time: 52,882 ms
Real time: 16,261 ms

EXPLAIN ANALYZE SELECT id2username(n) FROM aaa LIMIT 2;
 Limit  (cost=0.00..0.03 rows=2 width=4) (actual time=0.111..0.150 rows=2 loops=1)
   ->  Seq Scan on aaa  (cost=0.00..104.80 rows=6144 width=4) (actual time=0.102..0.129 rows=2 loops=1)
 Total runtime: 0.224 ms
(3 rows)
Time: 1,308 ms
Real time: 1,380 ms

b) natural join with lookup table:

EXPLAIN ANALYZE SELECT username FROM aaa NATURAL JOIN lookup;
 Hash Join  (cost=2.45..155.09 rows=3476 width=32) (actual time=0.306..83.677 rows=6144 loops=1)
   Hash Cond: ("outer".n = "inner".n)
   ->  Seq Scan on aaa  (cost=0.00..87.92 rows=5992 width=4) (actual time=0.006..25.517 rows=6144 loops=1)
   ->  Hash  (cost=2.16..2.16 rows=116 width=36) (actual time=0.237..0.237 rows=0 loops=1)
         ->  Seq Scan on lookup  (cost=0.00..2.16 rows=116 width=36) (actual time=0.016..0.034 rows=2 loops=1)
 Total runtime: 107.378 ms
(6 rows)
Time: 109,040 ms
Real time: 25,364 ms

c) IMMUTABLE static lookup function:

EXPLAIN ANALYZE SELECT id2username(n) FROM aaa;
 Seq Scan on aaa  (cost=0.00..104.80 rows=6144 width=4) (actual time=0.089..116.397 rows=6144 loops=1)
 Total runtime: 143.800 ms
(2 rows)
Time: 144,869 ms
Real time: 102,428 ms

d) self-join with a function ;)

EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN aaa USING (n);
 Hash Left Join  (cost=506.82..688.42 rows=6144 width=36) (actual time=102.382..182.661 rows=6144 loops=1)
   Hash Cond: ("outer".n = "inner".n)
   ->  Seq Scan on aaa  (cost=0.00..89.44 rows=6144 width=4) (actual time=0.012..24.360 rows=6144 loops=1)
   ->  Hash  (cost=506.82..506.82 rows=2 width=36) (actual time=102.217..102.217 rows=0 loops=1)
         ->  Subquery Scan v_lookup  (cost=476.05..506.82 rows=2 width=36) (actual time=53.626..102.057 rows=2 loops=1)
               ->  Subquery Scan values  (cost=476.05..506.80 rows=2 width=4) (actual time=53.613..102.023 rows=2 loops=1)
                     ->  Unique  (cost=476.05..506.77 rows=2 width=4) (actual time=53.456..101.772 rows=2 loops=1)
                           ->  Sort  (cost=476.05..491.41 rows=6144 width=4) (actual time=53.440..76.710 rows=6144 loops=1)
                                 Sort Key: n
                                 ->  Seq Scan on aaa  (cost=0.00..89.44 rows=6144 width=4) (actual time=0.013..26.626 rows=6144 loops=1)
 Total runtime: 209.378 ms
(11 rows)
Time: 211,460 ms
Real time: 46,682 ms

...so this IMMUTABLE function is
twice as slow (~100 ms) as the query joining itself with a SELECT DISTINCT on an IMMUTABLE function (~50 ms), which in turn is twice as slow as a JOIN against the lookup table (~25 ms), and I feel this IMMUTABLE function could be around ~20 ms (the ~16 ms of the plain select, plus calling the function two times, plus substituting the values). Ah, and this is PostgreSQL 8.0.1 running under FreeBSD on an Intel(R) Celeron(R) CPU 2.40GHz (2400.10-MHz 686-class CPU). Regards, Dawid PS: I have a feeling that IMMUTABLE functions worked better in 7.4, yet I am unable to confirm this.
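For what it's worth, a minimal way to see what IMMUTABLE actually buys here (my sketch, reusing the id2username function above): the planner may pre-evaluate a call whose argument is a constant, but the marker is not a per-row memoization cache, which is consistent with the RAISE WARNING experiment.

EXPLAIN ANALYZE SELECT id2username(0);          -- constant argument: can be folded once at plan time
EXPLAIN ANALYZE SELECT id2username(n) FROM aaa; -- column argument: executed once per row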
Re: [PERFORM] immutable functions vs. join for lookups ?
d) self-join with a function ;) EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN aaa USING (n); That's pretty clever. It sure seems like the server was not caching the results of the function... maybe the server thought it was too small a table to bother? Merlin
FW: [PERFORM] speed of query?
Another odd thing is that when I tried turning off merge joins on the XP desktop, it took 32 secs to run compared to the 6 secs it was taking. On the Linux (4-proc box) it is now running in 3 secs with the merge joins turned off. Unfortunately it takes over 2 minutes to actually return the 160,000+ rows. I am guessing that is either network (I have gig cards on a LAN) or perhaps the ODBC driver (using pgAdmin III to do the select). I tried to run it in psql on the server but it was paging the output through more. If I do it and send the output to test.txt, will it run it all out so I can get a time? Does it display the time anywhere like in pgAdmin III?
FW: [PERFORM] speed of query?
What are the statistics for tbljobtitle.id and tbljobtitle.clientnum? I added default_statistics_target = 250 to the config and reloaded the database, if that is what you mean. --- how many distinct values of each? tbljobtitle.id: 6764 for all clients, 1018 for SAKS. tbljobtitle.clientnum: 237 distinct clientnums, just 1 for SAKS. And are the distributions skewed to a few popular values? There are 3903 distinct values for jobtitle. Not sure if I answered the questions; let me know if you need more info. It appears there are 1018 job titles in the table for SAKS and 6764 for all the clients. There can be more values, as the presentation layer can have more than one value for an id. SAKS is not using the presentation layer yet, as there are only 1018 distinct values, 1 for each id. Joel
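As an aside (my sketch, not from the thread): default_statistics_target only affects columns analyzed afterwards, so the statistics have to be regathered after changing it, and the target can also be raised per column. The table and column names below are the ones Joel mentions; the per-column targets are illustrative.

ANALYZE tbljobtitle;  -- regather stats after raising default_statistics_target
-- Or raise the target only where it matters:
ALTER TABLE tbljobtitle ALTER COLUMN id SET STATISTICS 250;
ALTER TABLE tbljobtitle ALTER COLUMN clientnum SET STATISTICS 250;
ANALYZE tbljobtitle;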
Re: [PERFORM] speed of query?
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Joel Fradkin Sent: 18 April 2005 14:02 To: PostgreSQL Perform Subject: FW: [PERFORM] speed of query? Another odd thing is that when I tried turning off merge joins on the XP desktop, it took 32 secs to run compared to the 6 secs it was taking. On the Linux (4-proc box) it is now running in 3 secs with the merge joins turned off. Unfortunately it takes over 2 minutes to actually return the 160,000+ rows. I am guessing that is either network (I have gig cards on a LAN) or perhaps the ODBC driver (using pgAdmin III to do the select). pgAdmin III uses libpq, not the ODBC driver. Regards, Dave
Re: [PERFORM] speed of query?
pgAdmin III uses libpq, not the ODBC driver. Sorry, I am not too aware of all the semantics. I guess the question is whether it is normal to take 2 mins to get 160K records, or is there something else I can do (I plan on limiting the query screens using limit and offset; I realize this will only be effective for the early part of the returned record set, but I believe they don't page through a bunch of records, they probably add search criteria). But for reporting I will need to return all the records, and this seems slow to me (but it might be in line with what I get now; I will have to do some benchmarking). The application is a mixture of .NET and ASP and will soon have Java. So I am using the .NET library for the .NET pages and the ODBC driver for the ASP pages. I did find that using a view for the location join sped up the query a great deal; I will have to see if there are other places I can use that thinking (instead of joining on the associate table and its dependants, I can just join on a view of that data, etc). Basically I have a view that does a join from location to the district, region and division tables. The old viwassoclist had those joined to the assoc table directly; I changed it to use the view I created, where the tables are joined to the location table, and in assoclist I just join to the location view. This really made a huge difference in speed.
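A sketch of the kind of pre-joined location view being described -- the table and column names here are guesses based on the names mentioned in the thread, not Joel's actual schema:

CREATE VIEW viwlocation AS
SELECT l.locationid,
       l.name  AS locationname,
       d.name  AS districtname,
       r.name  AS regionname,
       dv.name AS divisionname
FROM location l
JOIN district d  ON d.districtid  = l.districtid
JOIN region   r  ON r.regionid    = d.regionid
JOIN division dv ON dv.divisionid = r.divisionid;

-- viwassoclist can then join the associate table to this one view
-- instead of repeating the four-way join.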
Re: [PERFORM] immutable functions vs. join for lookups ?
On 4/18/05, Merlin Moncure [EMAIL PROTECTED] wrote: d) self-join with a function ;) EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN aaa USING (n); That's pretty clever. It sure seems like the server was not caching the results of the function... maybe the server thought it was too small a table to bother? Nah, I don't think so. With around 2,097,152 rows of 1s and 0s, the id2username() query takes 48 seconds. The self-join you've quoted above takes 32 seconds. SELECT n FROM aaa; takes 7 seconds. Thinking further... SELECT CASE n WHEN 0 THEN 'foo' WHEN 1 THEN 'bar' END FROM aaa; takes 9 seconds.

CREATE OR REPLACE FUNCTION id2un_case(oid int) RETURNS text AS $$
BEGIN
    RETURN CASE oid WHEN 0 THEN 'foo' WHEN 1 THEN 'bar' END;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

SELECT id2un_case(n) FROM aaa; ...takes 36 seconds. And to see how it depends on the flags used, SELECT count(id2un_case(n)) FROM aaa:

...id2un_case(n) IMMUTABLE takes 29900,114 ms
...id2un_case(n) IMMUTABLE STRICT takes 30187,958 ms
...id2un_case(n) STABLE takes 31457,560 ms
...id2un_case(n) takes 33545,178 ms
...id2un_case(n) VOLATILE takes 35150,920 ms

(and a count(CASE n WHEN ... END) FROM aaa takes 2564,188 ms). I understand that these measurements are not too accurate. They were done on an idle system, and the queries were run a couple of times (to make sure they're cached :)). I believe something is wrong: there is only a minor performance difference between IMMUTABLE, STABLE and even VOLATILE plpgsql... :( Oh, and doing things like ORDER BY n or WHERE n = 1 didn't help either... I still wonder whether it's only my case or is there really something wrong with these functions? Regards, Dawid
Re: FW: [PERFORM] speed of query?
On Mon, 18 Apr 2005, Joel Fradkin wrote: Another odd thing is that when I tried turning off merge joins on the XP desktop, it took 32 secs to run compared to the 6 secs it was taking. On the Linux (4-proc box) it is now running in 3 secs with the merge joins turned off. Unfortunately it takes over 2 minutes to actually return the 160,000+ rows. I am guessing that is either network (I have gig cards on a LAN) or perhaps the ODBC driver (using pgAdmin III to do the select). I tried to run it in psql on the server but it was paging the output through more. If I do it and send the output to test.txt, will it run it all out so I can get a time? Does it display the time anywhere like in pgAdmin III? Redirecting should turn the pager off. \timing will add a timing number after queries. If you want to not be bothered by the pager, you can turn it off with \pset pager off.
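A minimal psql session along those lines (the query is a placeholder, not Joel's actual select; the timing line should still appear on the terminal while the rows go to the file):

\timing
\pset pager off
\o test.txt
SELECT * FROM viwassoclist;   -- hypothetical query; rows are written to test.txt
\o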
[PERFORM] How to improve postgres performance
Hi all, Some months ago I posted a similar problem here and it was solved by running vacuumdb from time to time. When I started using Postgres I never ran vacuumdb; after 2 months I started running it once a week, then after a few weeks I tried once a day, and now twice a day. This weekend I started to use pg_autovacuum with default settings. I'm really worried, because it's not enough anymore and users are complaining about performance. But after running vacuumdb with the full option, everything starts to run better again, so I think the problem is not related to a specific query. What can I do to check what I should change to get more performance? Could I use VACUUM VERBOSE to check what is going on? If so, how? Almost all the time, even with users querying the server, the machine is 96%-100% idle. The discs are SCSI, FreeBSD 5.3, the size of the database is 1.1GB, max 30 connections and 10 concurrent connections. My server has 512MB RAM, and SHMMAX has been raised to 256MB. There are at most 1000 inserted/deleted/updated rows per day. These are my kernel params:

options SHMMAXPGS=65536
options SEMMNI=40
options SEMMNS=240
options SEMUME=40
options SEMMNU=120

Postgresql.conf non-default settings:

tcpip_socket = true
max_connections = 30
shared_buffers = 1024
sort_mem = 2048
vacuum_mem = 16384
wal_buffers = 16
checkpoint_segments = 5
effective_cache_size = 16384
random_page_cost = 2
stats_start_collector = true
stats_row_level = true

I follow most of the discussions in this group and have tried changing the parameters myself, but now I don't know what more to do to get better performance. Thanks a lot, Rodrigo Moreno
Re: [PERFORM] How to improve db performance with $7K?
William Yu [EMAIL PROTECTED] writes: Using the above prices for a fixed budget for RAID-10, you could get: SATA 7200 -- 680GB per $1000, SATA 10K -- 200GB per $1000, SCSI 10K -- 125GB per $1000. What a lot of these analyses miss is that cheaper == faster, because cheaper means you can buy more spindles for the same price. I'm assuming you picked equal-sized drives to compare, so that 200GB/$1000 for SATA is almost twice as many spindles as the 125GB/$1000. That means it would have almost double the bandwidth. And the 7200 RPM case would have more than 5x the bandwidth. While 10k RPM drives have lower seek times, and SCSI drives have a natural seek-time advantage, under load a RAID array with fewer spindles will start hitting contention sooner, which results in higher latency. If the controller works well, the larger SATA arrays above should be able to maintain their mediocre latency much better under load than the SCSI array with fewer drives would maintain its low-latency response time, despite its drives' lower average seek time. -- greg
Re: [PERFORM] How to improve db performance with $7K?
This is fundamentally untrue. A mirror is still a mirror. At most in a RAID 10 you can have two simultaneous seeks. You are always going to be limited by the seek time of your drives. It's a stripe, so you have to read from all members of the stripe to get data, requiring all drives to seek. There is no advantage in seek time from adding more drives. By adding more drives you can increase throughput, but the max throughput of the PCI-X bus isn't that high (I think around 400MB/sec). You can easily get this with a six- or seven-drive RAID 5, or a ten-drive RAID 10. At that point you start having to factor in the cost of a bigger chassis to hold more drives, which can be big bucks. Alex Turner netEconomist
[PERFORM] Sort and index
Hi, I thought that an index can be used for sorting. I'm a little confused about the following result:

create index OperationsName on Operations(cOperationName);
explain SELECT * FROM Operations ORDER BY cOperationName;
                              QUERY PLAN
-----------------------------------------------------------------------
 Sort  (cost=185.37..189.20 rows=1532 width=498)
   Sort Key: coperationname
   ->  Seq Scan on operations  (cost=0.00..104.32 rows=1532 width=498)
(3 rows)

Is this supposed to be so? Andrei
Re: [PERFORM] How to improve postgres performance
Rodrigo Moreno [EMAIL PROTECTED] writes: This weekend I started to use pg_autovacuum with default settings. I'm really worried, because it's not enough anymore and users are complaining about performance. But after running vacuumdb with the full option, everything starts to run better again, so I think the problem is not related to a specific query. It sounds like you may not have the FSM settings set large enough for your database. The default settings are only enough for a small DB (perhaps a few hundred meg). regards, tom lane
Re: [PERFORM] Sort and index
Thanks for the quick response. Andrei Dave Held wrote: -Original Message- From: Andrei Gaspar [mailto:[EMAIL PROTECTED] Sent: Monday, April 18, 2005 10:36 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] Sort and index I thought that an index can be used for sorting. I'm a little confused about the following result:

create index OperationsName on Operations(cOperationName);
explain SELECT * FROM Operations ORDER BY cOperationName;
                              QUERY PLAN
-----------------------------------------------------------------------
 Sort  (cost=185.37..189.20 rows=1532 width=498)
   Sort Key: coperationname
   ->  Seq Scan on operations  (cost=0.00..104.32 rows=1532 width=498)
(3 rows)

Is this supposed to be so? Since you are fetching the entire table, you are touching all the rows. If the query were to fetch the rows in index order, it would be seeking all over the table's tracks. By fetching in sequence order, it has a much better chance of fetching rows in a way that minimizes head seeks. Since disk I/O is generally 10-100x slower than RAM, the in-memory sort can be surprisingly slow and still beat indexed disk access. Of course, this is only true if the table can fit and be sorted entirely in memory (which, with 1500 rows, it probably can). __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129
Re: [PERFORM] How to improve postgres performance
Tom, how do I check whether the values are enough? Can the log generated by VACUUM VERBOSE help? The current values are: max_fsm_pages = 1048576, max_fsm_relations = 1000. Is this enough? Regards, Rodrigo
Re: [PERFORM] How to improve db performance with $7K?
[snip] Adding drives will not let you get lower response times than the average seek time on your drives*. But it will let you reach that response time more often. [snip] I believe your assertion is fundamentally flawed. Adding more drives will not let you reach that response time more often. All drives are required to fill every request in all RAID levels (except possibly 0+1, but that isn't used for enterprise applications). Most requests in OLTP require most of the request time to seek, not to read. Only in single large-block data transfers will you get any benefit from adding more drives, which is atypical in most database applications. For most database applications, the only way to increase transactions/sec is to decrease request service time, which is generally achieved with better seek times or a better controller card, or possibly spreading your database across multiple tablespaces on separate partitions. My assertion therefore is that simply adding more drives to an already competent* configuration is about as likely to increase your database effectiveness as Swiss cheese is to make your car run faster. Alex Turner netEconomist *The assertion here is that the DBA didn't simply configure all tables and xlog on a single 7200 RPM disk, but has separate physical drives for xlog and tablespace, at least on 10k drives.
Re: [PERFORM] Sort and index
On Mon, Apr 18, 2005 at 10:44:43AM -0500, Dave Held wrote: I thought that an index can be used for sorting. I'm a little confused about the following result:

create index OperationsName on Operations(cOperationName);
explain SELECT * FROM Operations ORDER BY cOperationName;
                              QUERY PLAN
-----------------------------------------------------------------------
 Sort  (cost=185.37..189.20 rows=1532 width=498)
   Sort Key: coperationname
   ->  Seq Scan on operations  (cost=0.00..104.32 rows=1532 width=498)
(3 rows)

Is this supposed to be so? Since you are fetching the entire table, you are touching all the rows. If the query were to fetch the rows in index order, it would be seeking all over the table's tracks. By fetching in sequence order, it has a much better chance of fetching rows in a way that minimizes head seeks. Since disk I/O is generally 10-100x slower than RAM, the in-memory sort can be surprisingly slow and still beat indexed disk access. Of course, this is only true if the table can fit and be sorted entirely in memory (which, with 1500 rows, it probably can). Out of curiosity, what are the results of the following queries? (Queries run twice to make sure time differences aren't due to caching.)

SET enable_seqscan TO on;
SET enable_indexscan TO off;
EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;
EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;

SET enable_seqscan TO off;
SET enable_indexscan TO on;
EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;
EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName;

SELECT version();

With 1500 rows of random data, I consistently see better performance with an index scan (about twice as fast as a sequence scan), and the planner uses an index scan if it has a choice (i.e., when enable_seqscan and enable_indexscan are both on). But my test case and postgresql.conf settings might be different enough from yours to account for different behavior. -- Michael Fuhr http://www.fuhr.org/~mfuhr/
Re: [PERFORM] How to improve db performance with $7K?
Hi, At 18:56 18/04/2005, Alex Turner wrote: All drives are required to fill every request in all RAID levels. No, this is definitely wrong. In many cases, most drives don't actually have the data requested, so how could they handle the request? When reading one random sector, only *one* drive out of N is ever used to service any given request, be it RAID 0, 1, 0+1, 1+0 or 5. When writing:

- in RAID 0, 1 drive
- in RAID 1, RAID 0+1 or 1+0, 2 drives
- in RAID 5, you need to read on all drives and write on 2.

Otherwise, what would be the point of RAID 0, 0+1 or 1+0? Jacques.
Re: [PERFORM] How to improve db performance with $7K?
Alex Turner wrote: [snip] Adding drives will not let you get lower response times than the average seek time on your drives*. But it will let you reach that response time more often. [snip] I believe your assertion is fundamentally flawed. Adding more drives will not let you reach that response time more often. All drives are required to fill every request in all RAID levels (except possibly 0+1, but that isn't used for enterprise applications). Most requests in OLTP require most of the request time to seek, not to read. Only in single large-block data transfers will you get any benefit from adding more drives, which is atypical in most database applications. For most database applications, the only way to increase transactions/sec is to decrease request service time, which is generally achieved with better seek times or a better controller card, or possibly spreading your database across multiple tablespaces on separate partitions. My assertion therefore is that simply adding more drives to an already competent* configuration is about as likely to increase your database effectiveness as Swiss cheese is to make your car run faster. Consider the case of a mirrored file system with a mostly-read() workload. Typical behavior is to use a round-robin method for issuing the read operations to each mirror in turn, but one can use other methods, like a geometric algorithm that will issue the reads to the drive with the head located closest to the desired track. Some systems have many mirrors of the data for exactly this behavior. In fact, one can carry this logic to the extreme and have one drive for every cylinder in the mirror, thus removing seek latencies completely. This extreme case would also remove the rotational latency, as the cylinder will be in the disk's read cache. :-) Of course, writing data would be a bit slow! I'm not sure I understand your assertion that all drives are required to fill every request in all RAID levels. After all, in mirrored reads only one mirror needs to read any given block of data, so I don't know what goal is achieved in making other mirrors read the same data. My assertion (based on ample personal experience) is that one can *always* get improved performance by adding more drives. Just limit the drives to use the first few cylinders so that the average seek time is greatly reduced, and concatenate the drives together. One can then build the usual RAID device out of these concatenated metadevices. Yes, one is wasting lots of disk space, but that's life. If your goal is performance, then you need to put your money on the table. The system will be somewhat unreliable because of the device count, additional SCSI buses, etc., but that too is life in the high-performance world. -- Alan
Re: [PERFORM] How to improve db performance with $7K?
Alex Turner wrote: [snip] Adding drives will not let you get lower response times than the average seek time on your drives*. But it will let you reach that response time more often. [snip] I believe your assertion is fundamentally flawed. Adding more drives will not let you reach that response time more often. All drives are required to fill every request in all RAID levels (except possibly 0+1, but that isn't used for enterprise applications). Actually 0+1 is the recommended configuration for postgres databases (both for xlog and for the bulk data), because the write speed of RAID 5 is quite poor. Hence your base assumption is not correct, and adding drives *does* help. Most requests in OLTP require most of the request time to seek, not to read. Only in single large-block data transfers will you get any benefit from adding more drives, which is atypical in most database applications. For most database applications, the only way to increase transactions/sec is to decrease request service time, which is generally achieved with better seek times or a better controller card, or possibly spreading your database across multiple tablespaces on separate partitions. This is probably true. However, if you are doing lots of concurrent connections, and things are properly spread across multiple spindles (using RAID 0+1, or possibly tablespaces across multiple raids), then each seek occurs on a separate drive, which allows them to occur at the same time rather than sequentially. Having 2 processes competing for seeking on the same drive is going to be worse than having them on separate drives. John =:-
Re: [PERFORM] How to improve db performance with $7K?
Hi, At 16:59 18/04/2005, Greg Stark wrote: William Yu [EMAIL PROTECTED] writes: Using the above prices for a fixed budget for RAID-10, you could get: SATA 7200 -- 680GB per $1000, SATA 10K -- 200GB per $1000, SCSI 10K -- 125GB per $1000. What a lot of these analyses miss is that cheaper == faster, because cheaper means you can buy more spindles for the same price. I'm assuming you picked equal-sized drives to compare, so that 200GB/$1000 for SATA is almost twice as many spindles as the 125GB/$1000. That means it would have almost double the bandwidth. And the 7200 RPM case would have more than 5x the bandwidth. While 10k RPM drives have lower seek times, and SCSI drives have a natural seek-time advantage, under load a RAID array with fewer spindles will start hitting contention sooner, which results in higher latency. If the controller works well, the larger SATA arrays above should be able to maintain their mediocre latency much better under load than the SCSI array with fewer drives would maintain its low-latency response time, despite its drives' lower average seek time. I would definitely agree. More factors in favor of more cheap drives:

- cheaper drives (7200 rpm) have larger platters (3.7" diameter against 2.6" or 3.3"). That means the outer tracks hold more data, and the same amount of data is held in a smaller area, which means fewer tracks, which means reduced seek times. You can roughly estimate the real average seek time as (average seek time over full disk * size of dataset / capacity of disk). And you actually need to physically seek less often too.

- more disks means less data per disk, which means the data is further concentrated on outer tracks, which means even lower seek times.

Also, what counts is indeed not so much the time it takes to do one single random seek, but the number of random seeks you can do per second. Hence, more disks means more seeks per second (if requests are evenly distributed among all disks, which a good stripe size should achieve). Not taking into account TCQ/NCQ or write-cache optimizations, the important parameter (random seeks per second) can be approximated as:

N * 1000 / (lat + seek * ds / (N * cap))

where:
N is the number of disks
lat is the average rotational latency in milliseconds (500/(rpm/60))
seek is the average seek over the full disk in milliseconds
ds is the dataset size
cap is the capacity of each disk

Using this formula and a variety of disks, counting only the disks themselves (no enclosures, controllers, rack space, power, maintenance...), trying to maximize the number of seeks/second for a fixed budget (1000 euros) with a dataset size of 100 GB makes SATA drives clear winners: you can get more than 4000 seeks/second (with 21 x 80GB disks) where SCSI cannot even make it to the 1400 seeks/second point (with 8 x 36GB disks). Results can vary quite a lot based on the dataset size, which illustrates the importance of staying on the edges of the disks. I'll try to make the analysis more complete by counting some of the overhead (obviously 21 drives has a lot of other implications!), but I believe SATA drives still win in theory. It would be interesting to actually compare this to real-world (or nearly-real-world) benchmarks to measure the effectiveness of features like TCQ/NCQ etc. Jacques.
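Plugging illustrative numbers into Jacques' formula for the 21-disk SATA case (my arithmetic; the 8.5 ms full-disk average seek is an assumption, not a figure from the thread):

$$\text{lat} = \frac{500}{7200/60} \approx 4.17\,\text{ms}, \qquad 21 \times \frac{1000}{4.17 + 8.5 \cdot \frac{100}{21 \times 80}} \approx 4500\ \text{seeks/s}$$

which is consistent with the "more than 4000 seeks/second" figure quoted above.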
Re: [PERFORM] How to improve db performance with $7K?
Alex, in the situation of the animal hospital server I oversee, their application is OLTP. Adding hard drives (6-8) does help performance. Benchmarks like pgbench and OSDB agree with it, but in reality users could not see a noticeable change. However, moving the top 5/10 tables and indexes to their own space made a greater impact. Someone who reads PostgreSQL 8.0 Performance Checklist is going to see that point #1, add more disks, is the key. How about adding a subpoint explaining when more disks isn't enough or applicable? I may be generalizing the complexity of tuning an OLTP application, but some clarity could help. Steve Poe
Re: Re: [PERFORM] How to improve postgres performance
That max_fsm_pages value is enough to cover 8GB, so it should work OK for a database disk footprint up to 10 or so GB. I don't know how many tables are in your installation, so I can't say if max_fsm_relations is high enough, but you can check that by looking at the tail end of the output of VACUUM VERBOSE. (Or just count 'em ;-)) The last count in VACUUM VERBOSE shows me 92 relations, and I know the lower value for max_fsm_relations is enough; maybe I'll change it to 500. Offhand these look reasonable, though, so if you are seeing database bloat over time it probably means you need to tweak your autovacuum settings. I'm not much of an autovacuum expert, but maybe someone else can help you there. I'll leave autovacuum running this week to see what happens. You might want to keep track of physical file sizes over a period of time and try to determine exactly where the bloat is happening. There are two heavily used, larger tables; I'll keep an eye on both. Thanks, Rodrigo Moreno
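For reference, a sketch of the check Tom describes (the exact wording of the summary varies across versions, so treat this as illustrative):

VACUUM VERBOSE;
-- The final INFO lines summarize free-space-map usage: how many relations
-- and pages the FSM is tracking versus how many it would need. If the
-- "needed" figures exceed max_fsm_relations / max_fsm_pages, raise those
-- settings (they only take effect at postmaster start).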
Re: [PERFORM] How to improve db performance with $7K?
Not true - the recommended RAID level is RAID 10, not RAID 0+1 (at least I would never recommend 1+0 for anything). RAID 10 and RAID 0+1 are _quite_ different. One gives you very good redundancy; the other is only slightly better than RAID 5, but operates faster in degraded mode (single drive). Alex Turner netEconomist
Re: [PERFORM] How to improve db performance with $7K?
I think the add more disks thing is really from the point of view that one disk isn't ever enough. You should really have at least four drives configured into two RAID 1s. Most DBAs will know this, but most average Joes won't. Alex Turner netEconomist
Re: [PERFORM] How to improve db performance with $7K?
Ok - well - I am partially wrong... If your stripe size is 64KB and you are reading 256KB worth of data, it will be spread across four drives, so you will need to read from four devices to get your 256KB of data (RAID 0 or 5 or 10), but if you are only reading 64KB of data, I guess you would only need to read from one disk. So my assertion that adding more drives doesn't help is pretty wrong... particularly with OLTP, because it's always dealing with blocks that are smaller than the stripe size. Alex Turner netEconomist
Re: [PERFORM] How to improve db performance with $7K?
So I wonder if one could take this stripe-size thing further and say that a larger stripe size is more likely to result in requests getting served in parallel across disks, which would lead to increased performance? Again, thanks to all the people on this list; I know that I have learnt a _hell_ of a lot since subscribing. Alex Turner netEconomist
Re: [PERFORM] How to improve db performance with $7K?
Jacques Caron [EMAIL PROTECTED] writes: When writing: in RAID 0, 1 drive; in RAID 1, RAID 0+1 or 1+0, 2 drives; in RAID 5, you need to read on all drives and write on 2. Actually, RAID 5 only really needs to read from two drives: the existing parity block and the block you're replacing. It just XORs the old block, the new block, and the existing parity block to generate the new parity block. -- greg
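Written out (standard RAID 5 small-write arithmetic, not specific to this thread): because XOR is its own inverse,

$$P_{\text{new}} = P_{\text{old}} \oplus D_{\text{old}} \oplus D_{\text{new}}$$

so a small write costs two reads (old data, old parity) and two writes (new data, new parity), regardless of the number of drives in the array.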
Re: [PERFORM] How to improve db performance with $7K?
Alex Turner wrote: Not true - the recommended RAID level is RAID 10, not RAID 0+1 (at least I would never recommend 1+0 for anything). Uhmm, I was under the impression that 1+0 was RAID 10 and that 0+1 is NOT RAID 10. Ref: http://www.acnc.com/raid.html Sincerely, Joshua D. Drake
Re: [PERFORM] How to improve db performance with $7K?
Hi, At 20:16 18/04/2005, Alex Turner wrote: So my assertion that adding more drives doesn't help is pretty wrong... particularly with OLTP, because it's always dealing with blocks that are smaller than the stripe size. When doing random seeks (which is what a database needs most of the time), the number of disks helps improve the number of seeks per second (which is the bottleneck in this case). When doing sequential reads, the number of disks helps improve total throughput (which is the bottleneck in that case). In short: it always helps :-) Jacques.
Re: [PERFORM] immutable functions vs. join for lookups ?
Tom Lane [EMAIL PROTECTED] writes: It would be interesting sometime to try to teach the planner about inlining SQL-language functions to become joins. That is, given

create function id2name(int) returns text as
  'select name from mytab where id = $1' language sql stable;

select uid, id2name(uid) from othertab where something;

I think that in principle this could automatically be converted to

select uid, name from othertab left join mytab on (uid = id) where something;

The inlining of the function is presumably a side issue. I have tons of queries that use subqueries in the select list for which the same behaviour would be appropriate. Things like

select uid, (select name from mytab where id = uid) as name from othertab ...

There are some pitfalls though, particularly that you'd have to be able to prove that the function's query couldn't return more than one row (else the join might produce more result rows than the original query). Or just have a special join type that has the desired behaviour in that case. I.e., pretend the query was really

SELECT * FROM othertab LEFT SINGLE JOIN mytab ...

where LEFT SINGLE JOIN is an imaginary syntax that doesn't actually have to exist in the parser, but exists in the planner/executor and behaves differently in the case of duplicate matches. Actually I could see such a syntax being useful directly too. -- greg
Re: [PERFORM] How to improve db performance with $7K?
Don't you think optimal stripe width would be a good question to research the binaries for? I'd think that drives the answer, largely. (uh oh, pun alert) E.g., Oracle issues I/O requests (this may have changed _just_ recently) in 64KB chunks, regardless of what you ask for. So when I did my striping (many moons ago, when the Earth was young...) I did it in 128KB widths, and set the Oracle multiblock read count accordingly. For Oracle, any stripe size under 64KB = stupid, anything much over 128K/256K = wasteful. I am eager to find out how PG handles all this. - Ross p.s. (Brooklyn thug accent) 'You want a database record? I gotcher record right here' http://en.wikipedia.org/wiki/Akashic_Records
Re: [PERFORM] How to improve db performance with $7K?
Hi, At 20:21 18/04/2005, Alex Turner wrote: So I wonder if one could take this stripe-size thing further and say that a larger stripe size is more likely to result in requests getting served in parallel across disks, which would lead to increased performance? Actually, it would be pretty much the opposite. The smaller the stripe size, the more evenly distributed the data is, and the more disks can be used to serve requests. If your stripe size is too large, many random accesses within one single file (whose size is smaller than the stripe size/number of disks) may all end up on the same disk, rather than being split across multiple disks (the extreme case being stripe size = total size of all disks, which means concatenation). If all accesses had the same cost (i.e. no seek time, only transfer time), the ideal would be to have a stripe size equal to the number of disks. But below a certain size, you're going to use multiple disks to serve one single request which would not have taken much more time from a single disk (reading even a large number of consecutive blocks within one cylinder does not take much more time than reading a single block), so you would add unnecessary seeks on a disk that could have served another request in the meantime. You should definitely not go below the filesystem block size or the database block size. There is an interesting discussion of the optimal stripe size in the vinum manpage on FreeBSD: http://www.freebsd.org/cgi/man.cgi?query=vinum&apropos=0&sektion=0&manpath=FreeBSD+5.3-RELEASE+and+Ports&format=html (look for Performance considerations, towards the end -- note however that some of the calculations are not entirely correct). Basically it says the optimal stripe size is somewhere between 256KB and 4MB, preferably an odd number, and that some hardware RAID controllers don't like big stripe sizes. YMMV, as always. Jacques.
Re: [PERFORM] How to improve db performance with $7K?
Mistyped... I meant 0+1 in the second instance :(

On 4/18/05, Joshua D. Drake [EMAIL PROTECTED] wrote: Alex Turner wrote: Not true - the recommended RAID level is RAID 10, not RAID 0+1 (at least I would never recommend 1+0 for anything). Uhmm, I was under the impression that 1+0 was RAID 10 and that 0+1 is NOT RAID 10. Ref: http://www.acnc.com/raid.html

Sincerely, Joshua D. Drake
Re: [PERFORM] Compressing WAL
Added to TODO:

* Compress WAL entries [wal]

I have also added this email to TODO.detail.

---

Simon Riggs wrote: On Sun, 2005-04-10 at 21:12 -0400, Bruce Momjian wrote: Jim C. Nasby wrote: Maybe better for -hackers, but here it goes anyway... Has anyone looked at compressing WALs before writing to disk? On a system generating a lot of WAL it seems there might be some gains to be had. WAL data could be compressed before going to disk, since today's machines are generally more I/O bound than CPU bound. And unlike the base tables, you generally don't need to read the WAL, so you don't really need to worry about not being able to quickly scan through the data without decompressing it. I have never heard anyone talk about it, but it seems useful.

I think compressing the page images written on first page modification since checkpoint would be a big win.

Well, it was discussed 2-3 years ago as part of the PITR preamble. You may be surprised to read that over... A summary of thoughts to date:

xlog.c's XLogInsert places backup blocks into the WAL buffers before insertion, so it is the right place to do this. It would be possible to do this before any LWLocks are taken, so it would not necessarily impair scalability.

Currently XLogInsert is a severe CPU bottleneck around the CRC calculation, as identified recently by Tom. Digging further, the code used seems to cause processor stalls on Intel CPUs, possibly responsible for much of the CPU time. Discussions to move to a 32-bit CRC would also be affected by this because of the byte-by-byte nature of the algorithm, whatever the length of the generating polynomial. PostgreSQL's CRC algorithm is the fastest BSD code available. Until improvement is made there, I would not investigate compression further. Some input from hardware tuning specialists is required...

The current LZW compression code uses a 4096 byte lookback size, so that would need to be modified to extend across a whole block. An alternative, suggested originally by Tom and rediscovered by me because I just don't read everybody's fine words in history, is to simply take out the free space in the middle of every heap block that consists of zeros.

Any solution in this area must take into account the variability of the size of free space in database blocks. Some databases have mostly full blocks, others vary. There would also be considerable variation in compressibility of blocks, especially since some blocks (e.g. TOAST) are likely to already be compressed. There'd need to be some testing done to see exactly the point where the costs of compression produce realisable benefits. So any solution must be able to cope with both compressed blocks and non-compressed blocks. My current thinking is that this could be achieved by using the spare fourth bit of the BkpBlocks portion of the XLog structure, so that either all included BkpBlocks are compressed or none of them are, and hope that allows benefit to shine through. Not thought about heap/index issues.

It is possible that an XLogWriter process could be used to assist in the CRC and compression calculations also, and a similar process used to assist decompression for recovery, in time.

I regret I do not currently have time to pursue further.

Best Regards, Simon Riggs

-- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
[PERFORM] Question on REINDEX
All,

A couple of questions regarding the REINDEX command. Running PostgreSQL 7.4.2 on Solaris.

1) When is it necessary to run REINDEX or drop/create an index? All I could really find in the docs is: In some situations it is worthwhile to rebuild indexes periodically with the REINDEX command. (There is also contrib/reindexdb which can reindex an entire database.) However, PostgreSQL 7.4 has substantially reduced the need for this activity compared to earlier releases.

What are these situations? We have a database with some large tables. Currently we reindex (actually drop/create) nightly. But as the tables have grown this has become prohibitively time-consuming. According to the above comment it may not be necessary at all.

2) If reindexing is necessary, how can this be done in a non-obtrusive way in a production environment? Our database is being updated constantly. REINDEX locks client apps out while in progress. Same with CREATE INDEX when we drop/create. The table can have over 10 million rows. Recreating the indexes seems to take hours. This is too long to lock the client apps out. Is there any other solution?

thanks, Bill
Re: [PERFORM] How to improve db performance with $7K?
On 4/18/05, Jacques Caron [EMAIL PROTECTED] wrote: Hi, At 20:21 18/04/2005, Alex Turner wrote: So I wonder if one could take this stripe size thing further and say that a larger stripe size is more likely to result in requests getting served parallelized across disks which would lead to increased performance? Actually, it would be pretty much the opposite. The smaller the stripe size, the more evenly distributed data is, and the more disks can be used to serve requests. If your stripe size is too large, many random accesses within one single file (whose size is smaller than the stripe size/number of disks) may all end up on the same disk, rather than being split across multiple disks (the extreme case being stripe size = total size of all disks, which means concatenation). If all accesses had the same cost (i.e. no seek time, only transfer time), the ideal would be to have a stripe size equal to the number of disks. [snip]

Ahh yes - but the critical distinction is this: the smaller the stripe size, the more disks will be used to serve _a_ request - which is bad for OLTP, because you want fewer disks per request so that you can have more requests per second, because the cost is mostly seek. If more than one disk has to seek to serve a single request, you are preventing that disk from serving a second request at the same time. To have more throughput in MB/sec, you want a smaller stripe size so that you have more disks serving a single request, allowing you to multiply by effective drives to get total bandwidth.

Because OLTP is made up of small reads and writes to a small number of different files, I would guess that you want those files split up across your RAID, but not so much that a single small read or write operation would traverse more than one disk. That would imply that your optimal stripe size is somewhere on the right side of the bell curve that represents your database read and write block count distribution. If on average the dbwriter never flushes less than 1MB to disk at a time, then I guess your best stripe size would be 1MB, but that seems very large to me.

So I think therefore that I may be contending the exact opposite of what you are postulating!

Alex Turner netEconomist
Re: [PERFORM] Question on REINDEX
Bill,

1) When is it necessary to run REINDEX or drop/create an index? All I could really find in the docs is:

If you need to VACUUM FULL, you need to REINDEX as well. For example, if you drop millions of rows from a table.

2) If reindexing is necessary, how can this be done in a non-obtrusive way in a production environment? Our database is being updated constantly. REINDEX locks client apps out while in progress. Same with CREATE INDEX when we drop/create. The table can have over 10 million rows. Recreating the indexes seems to take hours. This is too long to lock the client apps out. Is there any other solution?

Better to up your max_fsm_pages and do VACUUMs regularly and frequently so that you don't have to REINDEX at all.

-- --Josh Josh Berkus Aglio Database Solutions San Francisco
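A minimal sketch of that regimen (the settings are illustrative placeholders, not recommendations; if memory serves, on 7.4 the tail of a database-wide VACUUM VERBOSE reports how much of the free space map was actually needed):

    -- In postgresql.conf (placeholder values -- size to your own churn):
    --   max_fsm_pages     = 200000
    --   max_fsm_relations = 1000
    -- Then vacuum regularly and check the FSM summary it prints:
    VACUUM VERBOSE;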
Re: [PERFORM] Question on REINDEX
On Mon, Apr 18, 2005 at 12:21:42 -0700, Bill Chandler [EMAIL PROTECTED] wrote: Running PostgreSQL 7.4.2 on Solaris. 1) When is it necessary to run REINDEX or drop/create an index? All I could really find in the docs is: In some situations it is worthwhile to rebuild indexes periodically with the REINDEX command. (There is also contrib/reindexdb which can reindex an entire database.) However, PostgreSQL 7.4 has substantially reduced the need for this activity compared to earlier releases.

In pathological cases it is possible to have a lot of empty space on a lot of your index pages. Reindexing would change that to a smaller number. In earlier versions, I think it was possible to have completely empty pages, and this happened for patterns of use (new values monotonically increasing, oldest values deleted first) that were actually seen in practice.
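If you want to check whether a particular index suffers from this, one rough approach is to compare its page count before and after a rebuild (the index name here is hypothetical; relpages is only refreshed by VACUUM/ANALYZE, hence the extra steps):

    VACUUM ANALYZE mytable;
    SELECT relname, relpages FROM pg_class
     WHERE relname = 'mytable_idx';   -- note the size
    REINDEX INDEX mytable_idx;
    VACUUM ANALYZE mytable;
    SELECT relname, relpages FROM pg_class
     WHERE relname = 'mytable_idx';   -- a large drop suggests bloat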
Re: [PERFORM] Question on vacuumdb
On Mon, Apr 18, 2005 at 12:27:08 -0700, Bill Chandler [EMAIL PROTECTED] wrote: All, If I run the command vacuumdb mydb I understand that it does some disk space recovery (but not as much as vacuumdb --full mydb).

You are better off not using vacuum full unless some unusual event has bloated your database. By running normal vacuums often enough (and with a large enough fsm setting) your database should reach a steady-state size.

Question: if I run the command vacuumdb --analyze mydb does it still do the aforementioned disk space recovery AS WELL AS update query planning statistics? Or are those two completely separate operations requiring separate invocations of 'vacuumdb'?

It is better to do both with one command.
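For reference, vacuumdb --analyze does both jobs in a single pass over the tables; when connected to mydb it is equivalent to running:

    VACUUM ANALYZE;  -- reclaims dead space and updates planner statistics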
Re: [PERFORM] immutable functions vs. join for lookups ?
Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: It would be interesting sometime to try to teach the planner about inlining SQL-language functions to become joins. That is, given

The inlining of the function is presumably a side issue. I have tons of queries that use subqueries in the select list for which the same behaviour would be appropriate.

Yeah, I was actually thinking about a two-step process: inline the function to produce something equivalent to a handwritten scalar sub-SELECT, and then try to convert sub-SELECTs into joins.

regards, tom lane
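To illustrate the kind of rewrite being discussed, here is a scalar sub-SELECT and its hand-written join equivalent (toy tables a and b are hypothetical; the two forms match only when b.id is unique):

    -- Sub-SELECT form: executed once per row of a.
    SELECT a.id,
           (SELECT b.name FROM b WHERE b.id = a.b_id) AS name
    FROM a;

    -- Join form: the planner may use a hash or merge join instead.
    SELECT a.id, b.name
    FROM a LEFT JOIN b ON b.id = a.b_id;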
Re: [PERFORM] Question on REINDEX
Josh Berkus josh@agliodbs.com writes: 1) When is it necessary to run REINDEX or drop/create an index? All I could really find in the docs is: If you need to VACUUM FULL, you need to REINDEX as well. For example, if you drop millions of rows from a table.

That's probably a pretty good rule of thumb. It's worth noting that VACUUM FULL tends to actively bloat indexes, not reduce them in size, because it has to create new index entries for the rows it moves before it can delete the old ones. So if a VACUUM FULL moves many rows you are likely to see the indexes get bigger, not smaller.

Better to up your max_fsm_pages and do VACUUMs regularly and frequently so that you don't have to REINDEX at all.

Yes, definitely. Also consider using CLUSTER rather than VACUUM FULL when you need to clean up after massive deletions from a table. It's not any less intrusive in terms of locking, but it's often faster and it avoids the index bloat problem (since it effectively does a REINDEX).

regards, tom lane
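In 7.4 syntax that cleanup looks roughly like this (table and index names are hypothetical):

    -- After a massive delete, instead of VACUUM FULL + REINDEX:
    CLUSTER mytable_pkey ON mytable;  -- rewrites the table in index order
    ANALYZE mytable;                  -- refresh planner statistics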
Re: [PERFORM] How to improve db performance with $7K?
Kevin Brown wrote: Greg Stark wrote: I think you're being misled by analyzing the write case. Consider the read case. When a user process requests a block and that read makes its way down to the driver level, the driver can't just put it aside and wait until it's convenient. It has to go ahead and issue the read right away. Well, strictly speaking it doesn't *have* to. It could delay for a couple of milliseconds to see if other requests come in, and then issue the read if none do. If there are already other requests being fulfilled, then it'll schedule the request in question just like the rest.

The idea with SCSI or any command queuing is that you don't have to wait for another request to come in --- you can send the request as it arrives, then if another shows up, you send that too, and the drive optimizes the grouping at a later time, knowing what the drive is doing, rather than queueing in the kernel.

-- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [PERFORM] How to improve db performance with $7K?
Does it really matter at which end of the cable the queueing is done (assuming both ends know as much about drive geometry etc.)?

Alex Turner netEconomist

On 4/18/05, Bruce Momjian pgman@candle.pha.pa.us wrote: [...] The idea with SCSI or any command queuing is that you don't have to wait for another request to come in --- you can send the request as it arrives, then if another shows up, you send that too, and the drive optimizes the grouping at a later time, knowing what the drive is doing, rather than queueing in the kernel.
Re: [PERFORM] How to improve db performance with $7K?
On Mon, Apr 18, 2005 at 06:49:44PM -0400, Alex Turner wrote: Does it really matter at which end of the cable the queueing is done (assuming both ends know as much about drive geometry etc.)?

That is a pretty strong assumption, isn't it? Also, you seem to be assuming that the controller-disk protocol (some internal mechanism, unknown to mere mortals) is as powerful as the host-controller protocol (SATA, SCSI, etc).

I've lost track of whether this thread is about what is possible with current, in-market technology, or about what could in theory be possible [if you were to design open source disk controllers and disks.]

-- Alvaro Herrera ([EMAIL PROTECTED]) Strength does not lie in physical means; it resides in an indomitable will (Gandhi)
Re: [PERFORM] How to improve db performance with $7K?
On 4/14/05, Tom Lane [EMAIL PROTECTED] wrote: That's basically what it comes down to: SCSI lets the disk drive itself do the low-level I/O scheduling whereas the ATA spec prevents the drive from doing so (unless it cheats, ie, caches writes). Also, in SCSI it's possible for the drive to rearrange reads as well as writes --- which AFAICS is just not possible in ATA. (Maybe in the newest spec...) The reason this is so much more of a win than it was when ATA was designed is that in modern drives the kernel has very little clue about the physical geometry of the disk. Variable-size tracks, bad-block sparing, and stuff like that make for a very hard-to-predict mapping from linear sector addresses to actual disk locations. Combine that with the fact that the drive controller can be much smarter than it was twenty years ago, and you can see that the case for doing I/O scheduling in the kernel and not in the drive is pretty weak.

So if you all were going to choose between two hard drives where: drive A has capacity C and spins at 15K rpm, and drive B has capacity 2 x C and spins at 10K rpm, and all other features are the same, the price is the same and C is enough disk space, which would you choose?

I've noticed that on IDE drives, as the capacity increases the data density increases and there is a perceived (I've not measured it) performance increase. Would the increased data density of the higher capacity drive be of greater benefit than the faster spindle speed of drive A?

-- Matthew Nuzum www.bearfruit.org
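A back-of-envelope comparison with invented numbers: rotational latency is half a revolution, and sequential transfer rate scales roughly with linear density times RPM (2x capacity on the same platter count is about 1.4x linear density):

    -- Half-revolution rotational latency in ms:
    SELECT 0.5 / (15000.0 / 60) * 1000 AS lat_15k_ms,  -- ~2.0
           0.5 / (10000.0 / 60) * 1000 AS lat_10k_ms;  -- ~3.0
    -- Transfer: ~1.4 (density) * 10/15 (RPM) ~= 0.94x for drive B,
    -- i.e. roughly a wash on sequential reads, while drive A keeps
    -- its latency edge for seek-bound random I/O.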
[PERFORM] Foreign key slows down copy/insert
I am new to cross references between tables, and I am trying to understand how they impact performance. From reading the documentation I was under the impression that deferring foreign keys would yield about the same performance as dropping them before a copy and adding them after. However, I cannot see this in my test case.

I have a table A with an int column ID that references table B column ID. Table B has about 150k rows, and has an index on B.ID. When trying to copy 1 million rows into A, I get the following \timings:

1) drop FK, copy (200s), add FK (5s)
2) add FK deferrable initially deferred, copy (I aborted after 30min)
3) add FK deferrable initially deferred, begin, copy (200s), commit (I aborted after 30min)

How do I explain why test cases 2 and 3 do not come close to case 1? Am I missing something obvious? Since the database I am working on has many FKs, I would rather not have to drop/add them when I am loading large data sets.

If it would help I can write this out in a reproducible scenario. I am using postgresql 7.4.5 at the moment.

Sincerely,

-- Richard van den Berg, CISSP --- Trust Factory B.V. | www.dna-portal.net Bazarstraat 44a | www.trust-factory.com 2518AK The Hague | Phone: +31 70 3620684 The Netherlands | Fax : +31 70 3603009 ---
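For readers following along, the three cases sketched in SQL (table, file and constraint names are invented):

    -- Case 1: drop the FK, load, re-add it.
    ALTER TABLE a DROP CONSTRAINT a_id_fkey;
    COPY a FROM '/tmp/a.dat';
    ALTER TABLE a ADD CONSTRAINT a_id_fkey
        FOREIGN KEY (id) REFERENCES b (id);

    -- Cases 2 and 3: deferred FK; the per-row checks queue up
    -- and all fire at COMMIT.
    ALTER TABLE a ADD CONSTRAINT a_id_fkey
        FOREIGN KEY (id) REFERENCES b (id)
        DEFERRABLE INITIALLY DEFERRED;
    BEGIN;
    COPY a FROM '/tmp/a.dat';
    COMMIT;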
Re: [Fwd: Re: [PERFORM] Functionscan estimates]
Hmmm. My brain is being jostled and I'm confusing illustra-postgres, informix-postgres and postgresql. Some things had functions and some things had constants, and I do not remember which products had what combination. But probably how they are in postgresql, post-Hellerstein, is how I am remembering. I can find out for sure, given a little time, by querying old contacts. It would be best if I had a clear question to ask, though.

--elein

On Thu, Apr 14, 2005 at 02:58:09PM -0400, Alvaro Herrera wrote: On Thu, Apr 14, 2005 at 10:39:03AM -0700, elein wrote: All functions could have a cost associated with them, set by the writer of the function in order for the planner to reorder function calls. The Stonebraker airplane-level example was: select ... from ... where f(id) = 3 and expensive_image_function(img). The idea, of course, is to weight the expensive function so it is pushed to the end of the execution.

So there was only a constant cost associated with the function? No estimator function, for example?

-- Alvaro Herrera ([EMAIL PROTECTED]) If you have nothing to say, maybe you need just the right tool to help you not say it. (New York Times, about Microsoft PowerPoint)
Re: [PERFORM] Foreign key slows down copy/insert
Since the database I am working on has many FKs, I would rather not have to drop/add them when I am loading large data sets.

You may want to search the archives. IIRC, I saw a couple of posts in recent months about an update you can do to one of the system tables to disable the key checks and then re-enable them after you're done with the import.

Sincerely, Joshua D. Drake

-- Command Prompt, Inc., Your PostgreSQL solutions company. 503-667-4564 Custom programming, 24x7 support, managed services, and hosting Open Source Authors: plPHP, pgManage, Co-Authors: plPerlNG Reliable replication, Mammoth Replicator - http://www.commandprompt.com/
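The trick referred to is, if I recall correctly, zeroing the table's trigger count in pg_class so the FK's per-row triggers are skipped, then restoring it afterwards. This bypasses integrity checking entirely and needs superuser rights, so only use it on data you already trust (a 7.4-era sketch):

    -- Disable all triggers (including FK triggers) on table a:
    UPDATE pg_class SET reltriggers = 0 WHERE relname = 'a';

    COPY a FROM '/tmp/a.dat';

    -- Restore the real trigger count:
    UPDATE pg_class
       SET reltriggers = (SELECT count(*) FROM pg_trigger
                          WHERE pg_trigger.tgrelid = pg_class.oid)
     WHERE relname = 'a';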
Re: [PERFORM] refcurosr vs. setof
On Sun, Apr 17, 2005 at 10:05:29PM +0200, Rüdiger Herrmann wrote: I need to write several PL/pgSQL functions, all returning a result set which can be obtained by a single SELECT statement. For now the functions are called by a Java application. Both REFCURSOR and SETOF serve my purpose, but I was wondering if there is a performance difference between the two. The result set can become quite large.

Here's an excerpt from the Control Structures section of the PL/pgSQL documentation: The current implementation of RETURN NEXT for PL/pgSQL stores the entire result set before returning from the function, as discussed above. That means that if a PL/pgSQL function produces a very large result set, performance may be poor: data will be written to disk to avoid memory exhaustion, but the function itself will not return until the entire result set has been generated. ... Currently, the point at which data begins being written to disk is controlled by the work_mem configuration variable.

You might want to test both ways in typical and worst-case scenarios and see how each performs.

-- Michael Fuhr http://www.fuhr.org/~mfuhr/
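The two shapes being compared, as minimal PL/pgSQL sketches against a hypothetical table t (7.4-style quoted function bodies):

    -- SETOF / RETURN NEXT: the full result set is materialized
    -- before control returns to the caller.
    CREATE FUNCTION get_rows_setof() RETURNS SETOF t AS '
    DECLARE
        r t%ROWTYPE;
    BEGIN
        FOR r IN SELECT * FROM t LOOP
            RETURN NEXT r;
        END LOOP;
        RETURN;
    END;
    ' LANGUAGE plpgsql;

    -- REFCURSOR: rows are fetched on demand by the caller.
    CREATE FUNCTION get_rows_cursor() RETURNS refcursor AS '
    DECLARE
        c refcursor;
    BEGIN
        OPEN c FOR SELECT * FROM t;
        RETURN c;
    END;
    ' LANGUAGE plpgsql;

With the cursor version the client must stay inside a transaction and FETCH from the returned cursor name, which lets it consume a large result set in batches instead of waiting for the whole thing.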