Re: [PERFORM] memory question
What Scott said ... seconded, all of it. I'm running one 500GB database on a 64-bit, 8GB VMware virtual machine with 2 vcores, PG 8.3.9 with shared_buffers set to 2GB, and it works great. However, it's a modest workload: most of the database is archival for data mining, the working set for routine OLTP is pretty modest and easily fits in the 2GB, and it's back-ended onto a pretty decent EMC Clariion FibreChannel array. Not the typical case. For physical x86 servers, brand-name (e.g. Kingston) ECC memory is down to $25 per GB in 4GB DIMMs and $36 per GB in 8GB DIMMs. Dollars to doughnuts you have a server somewhere with 2GB or 4GB parts that can be pulled and replaced with double the density, et voila, an extra 16GB of RAM for about $500. Lots and lots of RAM is absolutely, positively a no-brainer when trying to make a DB go fast. If for no other reason than that people get all starry-eyed at GHz numbers, almost all computers tend to be CPU-heavy and RAM-light in their factory configs. I build a new little server for the house every 3-5 years, using desktop parts, and give it a mid-life upgrade with bigger drives and double the RAM density. Big banks running huge Oracle OLTP setups use the strategy of essentially keeping the whole thing in RAM; HP shifts a lot of Superdomes maxed out with 2TB of RAM into this market - and that RAM costs a lot more than $25 a gig ;-) Cheers, Dave
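For illustration, a minimal postgresql.conf sketch along the lines of that setup (the values are assumptions for an 8GB machine in the PG 8.3 era, not measured recommendations - test against your own workload):

    shared_buffers = 2GB            # roughly a quarter of RAM, as on the VM above
    effective_cache_size = 6GB      # what the OS is likely to cache on top of shared_buffers
    work_mem = 16MB                 # per-sort / per-hash; keep modest with many connections
    maintenance_work_mem = 256MB    # VACUUM, CREATE INDEX, etc.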
Re: [PERFORM] memory question
On Wed, 24 Mar 2010, Campbell, Lance wrote: I have 24 Gig of memory on my server... Our server manager seems to think that I have way too much memory. He thinks that we only need 5 Gig. Your organisation probably spent more money getting your server manager to investigate how much RAM you need, and scaring you about wasting resources, than it would cost to just slap 24GB in the machine. 24GB is the least amount of RAM I would consider putting in a new server nowadays. It's so cheap. Matthew -- Lord grant me patience, and I want it NOW!
[PERFORM] Why Wal_buffer is 64KB
Hi All, Can anybody clarify why wal_buffers defaults to 64kB, and what the advantages and disadvantages of increasing or decreasing it are? Regards Raghav
Re: [PERFORM] default_statistics_target
On Mon, Mar 22, 2010 at 6:19 PM, Carlo Stonebanks stonec.regis...@sympatico.ca wrote: Thanks for the insight. How much more of a server's resources will be consumed by an ANALYZE with default_statistics_target = 100? I don't think it will be much of a problem, especially since autovacuum will do only the tables that need it, and not all of them at the same time. But you can certainly try it. Before changing the global setting, try just changing it for one session with SET: \timing; ANALYZE some table; SET default_statistics_target = 100; ANALYZE the same table; \q ...Robert
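A sketch of that per-session comparison in psql (the table name is made up; the point is just to compare the two ANALYZE timings):

    \timing
    ANALYZE my_table;                      -- with the current default_statistics_target
    SET default_statistics_target = 100;   -- affects only this session
    ANALYZE my_table;                      -- compare this timing with the first one
    \q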
Re: [PERFORM] Why Wal_buffer is 64KB
On Thu, 2010-03-25 at 20:31 +0530, Tadipathri Raghu wrote: Hi All, Can anybody clarify why wal_buffers defaults to 64kB, and what the advantages and disadvantages of increasing or decreasing it are? This is addressed in the documentation: http://www.postgresql.org/docs/8.4/interactive/wal-configuration.html -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Why Wal_buffer is 64KB
On Thu, Mar 25, 2010 at 11:01 AM, Tadipathri Raghu traghu@gmail.com wrote: Hi All, Can anybody clarify why wal_buffers defaults to 64kB, and what the advantages and disadvantages of increasing or decreasing it are? It is 64kB just because by default we have low values for almost everything :) The advantage of increasing it is that if your average transaction is more than 64kB large, all WAL data will stay in memory until commit. Actually I think it should be large enough to accommodate more than one transaction, but I'm not sure about that one... I usually use 1MB for OLTP systems. -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
Re: [PERFORM] Why Wal_buffer is 64KB
If you do large transactions, which emit large quantities of xlog, be aware that while the previous xlog segment is being fsynced, no new writes happen to the next segment. If you use large wal_buffers (more than 16 MB), these buffers can absorb xlog data while the previous segment is being fsynced, which allows a higher throughput. However, large wal_buffers also mean the COMMIT of small transactions might find lots of data in the buffers that no one has written/synced yet, which isn't good. If you use dedicated spindle(s) for the xlog, you can set the walwriter to be extremely aggressive (write every 5 ms, for instance) and use fdatasync. This way, at almost every rotation of the disk, xlog gets written. I've found this configuration gives increased throughput while not compromising latency, but you need to test it for yourself; it depends on your whole system.
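A rough sketch of that kind of configuration (assuming a dedicated xlog disk; the numbers are starting points to test on your own hardware, not recommendations):

    wal_buffers = 16MB          # enough to absorb xlog while the previous segment is being fsynced
    wal_writer_delay = 5ms      # very aggressive background WAL writer
    wal_sync_method = fdatasync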
[PERFORM] tuning auto vacuum for highly active tables
Hi All, We have a postgres database in which a couple of tables get bloated due to heavy inserts and deletes. Autovacuum is running. My question is: how can I make autovacuum more aggressive? I am thinking of enabling the autovacuum_vacuum_cost_delay and autovacuum_vacuum_cost_limit parameters. Can anyone suggest how to calculate appropriate values for these parameters, and whether there are any side effects of enabling them? Any help will be highly appreciated. Thanks Paramjeet Kaur
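If you are on 8.4 or later, one option is to tune autovacuum per table via storage parameters rather than globally. A sketch (the table name and values are made up and would need tuning for the actual churn rate):

    -- vacuum after ~5% of the table changes instead of the default 20%,
    -- and throttle the vacuum less aggressively
    ALTER TABLE heavy_churn_table SET (
        autovacuum_vacuum_scale_factor = 0.05,
        autovacuum_vacuum_cost_delay   = 10,    -- milliseconds; 0 disables the cost-based delay
        autovacuum_vacuum_cost_limit   = 1000
    );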
Re: [PERFORM] mysql to postgresql, performance questions
On Wed, 2010-03-24 at 09:55 +0100, Yeb Havinga wrote: Greg Smith wrote: Tom Lane wrote: So has anyone looked at porting MythTV to PG? Periodically someone hacks together something that works; the last big effort I'm aware of was in 2006, and then it bit-rots away. I'm sure we'd get some user uptake on the result--MySQL corruption is one of the top ten causes of a MythTV system crashing. It would be the same with PG, unless the pg cluster configuration with mythtv would come with a properly configured WAL - I had corrupted tables (and a personal wiki entry - the other mysql database in my house) *only* when I sometimes took the risk of not shutting down the machine properly when e.g. the remote was missing. Pulling the plug should not corrupt a PostgreSQL database, unless it was using disks which lie about write caching. No need for a WAL replica for that. regards, Yeb Havinga -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Re: [PERFORM] mysql to postgresql, performance questions
Hannu Krosing wrote: Pulling the plug should not corrupt a postgreSQL database, unless it was using disks which lie about write caching. Didn't we recently put the old wives' tale of 'the disks lied' to bed, in favour of actually admitting that some well-known filesystems and software RAID systems have had trouble with their write barriers?
Re: [PERFORM] mysql to postgresql, performance questions
On Thu, Mar 25, 2010 at 2:04 PM, James Mansion ja...@mansionfamily.plus.com wrote: Hannu Krosing wrote: Pulling the plug should not corrupt a postgreSQL database, unless it was using disks which lie about write caching. Didn't we recently put the old wives' tale of 'the disks lied' to bed, in favour of actually admitting that some well-known filesystems and software RAID systems have had trouble with their write barriers? I believe so. It was determined to be a combination of several culprits, and only a few hard drives from back in the day apparently ever had this problem. Of course, now it seems that modern SSDs may lie about cache if they don't have a big enough capacitor to guarantee they can write out their internal cache, etc. The sad fact remains that many desktop / workstation systems lie, and quite a few servers as well, for whatever reason.
Re: [PERFORM] mysql to postgresql, performance questions
Hannu Krosing wrote: Pulling the plug should not corrupt a postgreSQL database, unless it was using disks which lie about write caching. Didn't we recently put the old wives' tale of 'the disks lied' to bed, in favour of actually admitting that some well-known filesystems and software RAID systems have had trouble with their write barriers? I put a cheap UPS on the home server (which uses software RAID) precisely because I don't really trust that stuff, and there is also the RAID5 write hole... and maybe the RAID1 write hole too... and installing a UPS takes less time than actually figuring out whether the system is power-loss-safe.
Re: [PERFORM] mysql to postgresql, performance questions
On Thu, Mar 25, 2010 at 2:29 PM, Pierre C li...@peufeu.com wrote: Hannu Krosing wrote: Pulling the plug should not corrupt a postgreSQL database, unless it was using disks which lie about write caching. Didn't we recently put the old wives' tale of 'the disks lied' to bed, in favour of actually admitting that some well-known filesystems and software RAID systems have had trouble with their write barriers? I put a cheap UPS on the home server (which uses software RAID) precisely because I don't really trust that stuff, and there is also the RAID5 write hole... and maybe the RAID1 write hole too... and installing a UPS takes less time than actually figuring out whether the system is power-loss-safe. Very true; a UPS might not cover every possible failure mode, but it sure takes care of an awful lot of the common ones.
[PERFORM] experiments in query optimization
Hi everyone, I've been trying to reduce both memory usage and runtime for a query. Comments/suggestions gratefully received. Details are at http://bulldog.duhs.duke.edu/~faheem/snppy/opt.pdf See particularly Section 1 - Background and Discussion. If you want a text version, see http://bulldog.duhs.duke.edu/~faheem/snppy/opt.tex For background see http://bulldog.duhs.duke.edu/~faheem/snppy/diag.pdf (text version http://bulldog.duhs.duke.edu/~faheem/snppy/diag.tex) and http://bulldog.duhs.duke.edu/~faheem/snppy/snppy.pdf Please CC any replies to me at the above email address. Thanks. Regards, Faheem.
Re: [PERFORM] mysql to postgresql, performance questions
Scott Marlowe wrote: On Thu, Mar 25, 2010 at 2:29 PM, Pierre C li...@peufeu.com wrote: Hannu Krosing wrote: Pulling the plug should not corrupt a postgreSQL database, unless it was using disks which lie about write caching. Didn't we recently put the old wives' tale of 'the disks lied' to bed, in favour of actually admitting that some well-known filesystems and software RAID systems have had trouble with their write barriers? I put a cheap UPS on the home server (which uses software RAID) precisely because I don't really trust that stuff, and there is also the RAID5 write hole... and maybe the RAID1 write hole too... and installing a UPS takes less time than actually figuring out whether the system is power-loss-safe. Very true; a UPS might not cover every possible failure mode, but it sure takes care of an awful lot of the common ones.

Yeah, but the original post was about mythtv boxes, which usually do not have UPSes. My suggestion about proper setup of the WAL was based on some experience of my own. What I did was probably the fastest path to corrupt database files: a diskless mythtv box that booted from the fileserver in the attic (with UPS, btw), but I was too lazy (after x days of lirc / xorg / ivtv / rtc / xmltv etc. work) to move the default-configured mysql database from the mythtv box (with the root filesystem and also mysql on the nfs mount) to a mysql running on the fileserver itself. On top of that I had nfs mounted async for speed. Really, after x days of configuration to get things running (my wife thinks it's hobby time but it really isn't), all that is on your mind is: it works well enough? Fine, iron out non-essential things when they pop up, and if the db becomes corrupt, I had database backups. In the end I had, a few times, a corrupt table that was always easily repaired with the mysqlcheck tool. Based on this experience I do not think that reliability alone will convince mythtv developers/users to switch to postgresql, and besides that, as a developer and user myself, it's always in a way funny to see how creative people can be in finding ways to not properly use (your) software ;-) regards, Yeb Havinga
Re: [PERFORM] Performance Tuning Large PL/PGSQL Stored Procedure
On Tue, Mar 23, 2010 at 5:00 PM, Eliot Gable egable+pgsql-performa...@gmail.com wrote: The complex type contains roughly 25 fields, mostly text, plus another 10 REFCURSORs.

How many rows min/max/avg are coming back in your refcursors? Are you using cursors in order to return multiple complex data structures (sets, etc.) in a single function call?

The application that calls the stored procedure was also written by me in C++ and uses asynchronous libpq API commands to execute a single SQL transaction which calls the stored procedure and also performs a FETCH ALL on all open cursors. It then returns all results into various structures. All rows of all cursors that are open are always used for every call to the stored procedure. The stored procedure implements various logic which determines which tables in the database to query and how to filter the results from those queries to return only the relevant information needed by the C++ application. Currently, in terms of optimization, I have taken the following approaches based on the following reasoning:

1. For all queries whose results need to return to the C++ application, I utilize cursors so that all results can be readied and generated by the stored procedure with just one call to the PostgreSQL backend. I accomplish this using asynchronous libpq API calls to issue a single transaction to the server. The first command after the BEGIN is a SELECT * FROM MyStoredProc(blah), which is then followed by FETCH ALL commands for each cursor that the stored procedure leaves open. I then follow up with multiple API calls to return the results and retrieve the rows from those results. This minimizes the amount of back-and-forth between my C++ application and the database backend.

1a. Incidentally, I am also using cursors for most queries inside the stored procedure that do not return results to the C++ application. I am unsure whether this incurs a performance penalty compared to doing, for example, a SELECT ... INTO (var1, var2, ...) within the body of the stored procedure. Instead of using SELECT ... INTO, I am using OPEN cursor_name; FETCH cursor_name INTO (var1, var2).

2. I have built indexes on all columns that are used in where clauses and joins.

3. I use lots of joins to pull in data from various tables (there are around 60 tables that are queried with each call to the stored procedure).

4. When performing joins, the first table listed is the one that returns the most context-specific results, which always also means that it has the most-specific and fewest number of relevant rows. I then join them in order of least number of result rows, with all inner joins preceding left outer joins.

5. Instead of using UNIONs and EXCEPT clauses, I use multiple WITH clauses to define several different query-specific views. I order them such that I can join additional tables in later WITH clauses to the views created

WITH clauses can make your queries much easier to read and yield great speedups if you need to access the table expression multiple times from other parts of the query. However, in some cases you can get into trouble, because a standard set of joins is going to give the planner the most flexibility in terms of query optimization.

previously in a way that minimizes the number of rows involved in the JOIN operations while still providing provably accurate result sets. The EXCEPT clauses are then replaced by also defining one view which contains a set of IDs that I want filtered from the final result set and using a WHERE id NOT IN (SELECT id FROM filtered_view). Typically, this approach leaves me with just one UNION of two previously defined views (the union is required

UNION is always an optimization target (did you mean UNION ALL?)

7. When I have a query I need to execute whose results will be used in several other queries, I currently open the cursor for that query using the FOR ... LOOP construct to retrieve all records in the result set and build a result array using the array_append() method. I then do an unnest(my_array)

Do not use array_append; always do array(select ...) whenever it is possible. When it isn't, rethink your problem until it is possible. The only exception is to use the array_agg aggregate if your problem really is an aggregation type of thing. As a matter of fact, any FOR ... LOOP is an optimization target, because a re-think will probably yield a query that does the same thing without paying for the loop.

For most of the joins, they simply join on foreign key IDs and no additional filtering criteria are used on their information. Only a handful of the joined tables bring in additional criteria by which to filter the result set. The approach used in 7 with cursors and building a result array which is then unnested has me worried in terms of performance. It seems to me there should be some better way to accomplish the same thing. The stored procedure does not perform
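A sketch of the array_append vs. array(select ...) point (names are invented, and result_ids integer[] / rec record are assumed to be declared in the function):

    -- slower: build the array one row at a time inside a loop
    result_ids := '{}';
    FOR rec IN SELECT id FROM some_table WHERE active LOOP
        result_ids := array_append(result_ids, rec.id);
    END LOOP;

    -- preferred: let a single SQL statement build the same array
    result_ids := ARRAY(SELECT id FROM some_table WHERE active);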
Re: [PERFORM] Why Wal_buffer is 64KB
Hi Pierre, First of all, I thank all for sharing the information on this issue. On Thu, Mar 25, 2010 at 11:44 PM, Pierre C li...@peufeu.com wrote: If you do large transactions, which emit large quantities of xlog, be aware that while the previous xlog segment is being fsynced, no new writes happen to the next segment. If you use large wal_buffers (more than 16 MB), these buffers can absorb xlog data while the previous segment is being fsynced, which allows a higher throughput. However, large wal_buffers also mean the COMMIT of small transactions might find lots of data in the buffers that no one has written/synced yet, which isn't good. If you use dedicated spindle(s) for the xlog, you can set the walwriter to be extremely aggressive (write every 5 ms, for instance) and use fdatasync. This way, at almost every rotation of the disk, xlog gets written. I've found this configuration gives increased throughput while not compromising latency, but you need to test it for yourself; it depends on your whole system.

Some small testing was done from my end. I created a test table with one row and did insertions into it (1,000,000 rows). I turned off fsync and synchronous_commit. I saw fast inserts if I do so, but if I turn them on, then there is latency.

With fsync / synchronous_commit on:

postgres=# explain analyze insert into test values(generate_series(1,100));
                                       QUERY PLAN
------------------------------------------------------------------------------------------
 Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.015..6293.674 rows=100 loops=1)
 Total runtime: 37406.012 ms
(2 rows)

With fsync / synchronous_commit off:

postgres=# explain analyze insert into test values(generate_series(1,100));
                                       QUERY PLAN
------------------------------------------------------------------------------------------
 Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.154..5801.584 rows=100 loops=1)
 Total runtime: 29378.626 ms
(2 rows)

What I would like to know here is what happens to the xlog files with respect to wal_buffers: will the xlog files be recycled, or will one more be created for this particular transaction? Could you explain: when wal_buffers is 64kB, which is very small, and everything ends up written to the xlog files anyway, what happens if we increase wal_buffers? Regards Raghav
Re: [PERFORM] Performance Tuning Large PL/PGSQL Stored Procedure
On Thu, Mar 25, 2010 at 10:00 PM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Mar 23, 2010 at 5:00 PM, Eliot Gable egable+pgsql-performa...@gmail.com wrote: The complex type contains roughly 25 fields, mostly text, plus another 10 REFCURSORs.

How many rows min/max/avg are coming back in your refcursors? Are you using cursors in order to return multiple complex data structures (sets, etc.) in a single function call?

I think the largest number of rows is around 40. Most are substantially smaller. However, most of them have about two dozen or more columns, and I have already shortened the list of columns to the minimum possible. The average number of rows is around 10, but the largest sets of rows also have the most columns. I'm using the cursors in order to obtain multiple complex data structures in a single function call.

The application that calls the stored procedure was also written by me in C++ and uses asynchronous libpq API commands to execute a single SQL transaction which calls the stored procedure and also performs a FETCH ALL on all open cursors. It then returns all results into various structures. All rows of all cursors that are open are always used for every call to the stored procedure. The stored procedure implements various logic which determines which tables in the database to query and how to filter the results from those queries to return only the relevant information needed by the C++ application. Currently, in terms of optimization, I have taken the following approaches based on the following reasoning:

1. For all queries whose results need to return to the C++ application, I utilize cursors so that all results can be readied and generated by the stored procedure with just one call to the PostgreSQL backend. I accomplish this using asynchronous libpq API calls to issue a single transaction to the server. The first command after the BEGIN is a SELECT * FROM MyStoredProc(blah), which is then followed by FETCH ALL commands for each cursor that the stored procedure leaves open. I then follow up with multiple API calls to return the results and retrieve the rows from those results. This minimizes the amount of back-and-forth between my C++ application and the database backend.

1a. Incidentally, I am also using cursors for most queries inside the stored procedure that do not return results to the C++ application. I am unsure whether this incurs a performance penalty compared to doing, for example, a SELECT ... INTO (var1, var2, ...) within the body of the stored procedure. Instead of using SELECT ... INTO, I am using OPEN cursor_name; FETCH cursor_name INTO (var1, var2).

2. I have built indexes on all columns that are used in where clauses and joins.

3. I use lots of joins to pull in data from various tables (there are around 60 tables that are queried with each call to the stored procedure).

4. When performing joins, the first table listed is the one that returns the most context-specific results, which always also means that it has the most-specific and fewest number of relevant rows. I then join them in order of least number of result rows, with all inner joins preceding left outer joins.

5. Instead of using UNIONs and EXCEPT clauses, I use multiple WITH clauses to define several different query-specific views. I order them such that I can join additional tables in later WITH clauses to the views created

WITH clauses can make your queries much easier to read and yield great speedups if you need to access the table expression multiple times from other parts of the query. However, in some cases you can get into trouble, because a standard set of joins is going to give the planner the most flexibility in terms of query optimization.

So far, every case I have converted to WITH clauses has resulted in more than double the speed (half the time required to perform the query). The main reason appears to be avoiding calculating JOIN conditions multiple times in different parts of the query due to the UNION and EXCEPT clauses.

previously in a way that minimizes the number of rows involved in the JOIN operations while still providing provably accurate result sets. The EXCEPT clauses are then replaced by also defining one view which contains a set of IDs that I want filtered from the final result set and using a WHERE id NOT IN (SELECT id FROM filtered_view). Typically, this approach leaves me with just one UNION of two previously defined views (the union is required

UNION is always an optimization target (did you mean UNION ALL?)

Thanks for the suggestion on UNION ALL; I indeed do not need elimination of duplicates, so UNION ALL is a better option.

7. When I have a query I need to execute whose results will be used in several other queries, I currently open the cursor for that query using
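A condensed sketch of the query shape being discussed, with the NOT IN filter view and UNION ALL (table and column names are invented for illustration):

    WITH relevant AS (
        SELECT id, payload FROM main_source WHERE customer_id = 42
    ),
    filtered AS (
        SELECT id FROM exclusions WHERE reason = 'superseded'
    )
    SELECT id, payload FROM relevant
    WHERE id NOT IN (SELECT id FROM filtered)
    UNION ALL
    SELECT id, payload FROM other_source WHERE customer_id = 42;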
[PERFORM] good effective_io_concurrency for FusionIO drives?
Hello, Wondering what's a good value for effective_io_concurrency when dealing with FusionIO drives... anyone have any experience with this? I know that SSDs vary from 10 channels to 30, and that 1 SSD is about as fast as a 4-drive RAID, but I can't seem to settle on a good value to use for effective_io_concurrency. Has anyone done any performance testing/metrics with the value? Any recommendations or thoughts? --Richard
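For what it's worth, effective_io_concurrency only influences prefetching for bitmap heap scans, so one way to experiment is to set it per session and time a bitmap-scan-heavy query at several values. A sketch (the table, column, and starting value are just placeholders):

    SET effective_io_concurrency = 10;   -- hypothetical starting point for a single FusionIO device
    EXPLAIN ANALYZE
    SELECT count(*) FROM some_big_table WHERE indexed_col BETWEEN 10000 AND 200000;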