[PERFORM] What work_mem does a query need?
Hi all,

In a previous post, Ron Peacetree suggested checking what work_mem a query needs. How can that be done?

Thanks all

--
Arnau
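One hedged way to measure it yourself (the query and the values below are placeholders): raise work_mem for your session only and re-run EXPLAIN ANALYZE. A sharp drop in runtime at some setting means the sort or hash had been spilling to temporary files below that point; you can also watch for temp files appearing under $PGDATA/base/<dboid>/pgsql_tmp while the query runs.

    SET work_mem = 4096;                       -- 4 MB (value in kB); session-local only
    EXPLAIN ANALYZE SELECT ... ORDER BY ...;   -- note the runtime
    SET work_mem = 65536;                      -- 64 MB
    EXPLAIN ANALYZE SELECT ... ORDER BY ...;   -- a big speedup suggests the sort now fits in memory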
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
Alvaro Herrera wrote:
> Ron wrote:
>> C= What file system are you using? Unlike BigDBMS, pg does not have
>> its own native one, so you have to choose the one that best suits
>> your needs. For update-heavy applications involving lots of small
>> updates, jfs and XFS should both be seriously considered.
>
> Actually it has been suggested that a combination of ext2 (for WAL)
> and ext3 (for data, with data journalling disabled) is a good
> performer. AFAIK you don't want the overhead of journalling for the
> WAL partition.

I'm curious as to why ext3 for data with journalling disabled? Would that not be the same as ext2?

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
On Tue, 2007-01-02 at 09:04 -0500, Geoffrey wrote:
> Alvaro Herrera wrote:
>> Actually it has been suggested that a combination of ext2 (for WAL)
>> and ext3 (for data, with data journalling disabled) is a good
>> performer. AFAIK you don't want the overhead of journalling for the
>> WAL partition.
>
> I'm curious as to why ext3 for data with journalling disabled? Would
> that not be the same as ext2?

I believe Alvaro was referring to ext3 with journalling enabled for metadata, but not for data. I also believe this is the standard ext3 configuration, but I could be wrong on that.

gnari
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
On 2 Jan 2007, at 14:54, Ragnar wrote:
> I believe Alvaro was referring to ext3 with journalling enabled for
> metadata, but not for data. I also believe this is the standard ext3
> configuration, but I could be wrong on that.

It doesn't really belong here, but ext3 has three journalling modes: data=journal (both data and metadata are journalled), data=ordered (only metadata is journalled, but data is written out before its metadata; this is the default), and data=writeback (metadata-only journalling, with no ordering guarantee). The performance difference between the ordered and metadata-only modes should be very small anyway.

--
Best regards,
Lars Heidieker
[EMAIL PROTECTED]
http://paradoxon.info

Mystical explanations. Mystical explanations are considered deep; the truth is that they are not even superficial. -- Friedrich Nietzsche
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
More specifically, you should set the noatime,data=writeback options in fstab on ext3 partitions for best performance. Correct?

> It doesn't really belong here, but ext3 has three journalling modes:
> data=journal (both data and metadata are journalled), data=ordered
> (only metadata is journalled, but data is written out before its
> metadata; this is the default), and data=writeback (metadata-only
> journalling, with no ordering guarantee). The performance difference
> between the ordered and metadata-only modes should be very small
> anyway.
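For a data partition that would look something like the fstab entry below; the device, mount point, and trailing dump/pass fields are placeholders for your own layout:

    /dev/sdb1  /var/lib/pgsql/data  ext3  noatime,data=writeback  1 2

The usual rationale is that PostgreSQL's own WAL already protects the data files, so giving up ext3's write-ordering guarantee on that partition is considered an acceptable trade for performance; a WAL-only partition doesn't need filesystem journalling at all, hence the ext2 suggestion above.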
Re: [PERFORM] Worse perfomance on 8.2.0 than on 7.4.14
Rolf Østvik [EMAIL PROTECTED] writes:
> If you (Tom) still want me to do the following steps then please tell
> me.

Please --- I'm still curious why the estimated cost changed so much from 7.4 to 8.2. I can believe a marginal change in cost leading to a plan switch, but comparing the total-cost numbers shows that 8.2 must think that indexscan is a whole lot more expensive than 7.4 did, which seems odd. For the most part 8.2 ought to think nestloop-with-inner-indexscan is cheaper than 7.4 did, because we now account for caching effects across repeated iterations of the inner scan.

regards, tom lane
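One way to surface the cost 8.2 is putting on the rejected indexscan plan (the query is a placeholder for the one under discussion):

    SET enable_seqscan = off;    -- heavily penalize the plan 8.2 now prefers
    EXPLAIN ANALYZE SELECT ...;  -- note the forced plan's estimated cost and actual time
    RESET enable_seqscan;

If the forced indexscan's actual runtime is far better than its estimate implies, that supports the suspicion that 8.2's cost model is overcharging it here.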
[PERFORM] Config parameters
I'm curious what parameters you guys typically *always* adjust on new PostgreSQL installs. I am working with a database that contains several large tables (10-20 million rows) and many smaller tables (hundreds of rows). My system has 2 GB of RAM currently, although I will be upping it to 4 GB soon. My motivation in asking this question is to make sure I'm not making a big configuration no-no by missing a parameter, and also for my own checklist of parameters I should almost always set when configuring a new install. The parameters that I almost always change when installing a new system are shared_buffers, max_fsm_pages, checkpoint_segments, and effective_cache_size. Are there any parameters missing that should always be changed when deploying to a decent server?
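For comparison, a minimal sketch of those four settings for a dedicated 2 GB box; every number here is an illustrative assumption to experiment from, not a recommendation:

    shared_buffers = 50000          # 8 kB pages, ~400 MB; often 10-25% of RAM
    effective_cache_size = 125000   # ~1 GB; a rough guess at the OS disk cache
    checkpoint_segments = 16        # more segments smooth bursts of bulk writes
    max_fsm_pages = 1000000         # should cover pages with reclaimable space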
Re: [PERFORM] Config parameters
Jeremy Haile wrote:
> I'm curious what parameters you guys typically *always* adjust on new
> PostgreSQL installs. The parameters that I almost always change when
> installing a new system are shared_buffers, max_fsm_pages,
> checkpoint_segments, and effective_cache_size.

Always: work_mem, maintenance_work_mem. Also consider temp_buffers and random_page_cost. A lot will depend on how much of the data you handle ends up cached.

--
Richard Huxton
Archonet Ltd
[PERFORM] Slow dump?
Hello, we recently migrated our system from 8.1.x to 8.2 and when running dumps have noticed an extreme decrease in speed where the dump is concerned (by more than a factor of 2). I was wondering if someone might offer some suggestions as to what may be causing the problem. How important are max_fsm_pages and max_fsm_relations to doing a dump? I was just looking over your config file and that's the only thing that jumped out at me as needing to be changed.

Machine info:
OS: Solaris 10
Sunfire X4100 XL
2x AMD Opteron Model 275 dual-core procs
8 GB of RAM

Pertinent postgres settings:
shared_buffers: 5
work_mem: 8192
maintenance_work_mem: 262144
max_stack_depth: 3048 (default)

There doesn't seem to be any other performance degradation while the dump is running (which I suppose is good). Any ideas?

--
erik jones [EMAIL PROTECTED]
software development
emma(r)
Re: [PERFORM] Config parameters
What is a decent default setting for work_mem and maintenance_work_mem, considering I am regularly querying tables that are tens of millions of rows and have 2-4 GB of RAM? Also - what is the best way to determine decent settings for temp_buffers and random_page_cost?

On Tue, 02 Jan 2007 16:34:19 +0000, Richard Huxton dev@archonet.com said:
> Jeremy Haile wrote:
>> I'm curious what parameters you guys typically *always* adjust on new
>> PostgreSQL installs. The parameters that I almost always change when
>> installing a new system are shared_buffers, max_fsm_pages,
>> checkpoint_segments, and effective_cache_size.
>
> Always: work_mem, maintenance_work_mem. Also consider temp_buffers and
> random_page_cost. A lot will depend on how much of the data you handle
> ends up cached.
>
> --
> Richard Huxton
> Archonet Ltd
Re: [PERFORM] Slow dump?
Erik Jones [EMAIL PROTECTED] writes:
> Hello, we recently migrated our system from 8.1.x to 8.2 and when
> running dumps have noticed an extreme decrease in speed where the dump
> is concerned (by more than a factor of 2).

That's odd. pg_dump is normally pretty much I/O bound, at least assuming your tables are sizable. The only way it wouldn't be is if you have a datatype with a very slow output converter. Have you looked into exactly which tables are slow to dump and what datatypes they contain? (Running pg_dump with log_min_duration_statement enabled would provide useful data about which steps take a long time, if you're not sure.)

regards, tom lane
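A hedged way to do that without editing postgresql.conf (the database name and threshold are placeholders): pg_dump is a regular libpq client, so it honors PGOPTIONS, and any dump-side statement slower than the threshold then lands in the server log.

    PGOPTIONS='-c log_min_duration_statement=1000' \
        pg_dump mydb > /dev/null    # log every statement that takes over 1000 ms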
Re: [PERFORM] Slow dump?
Tom Lane wrote:
> That's odd. pg_dump is normally pretty much I/O bound, at least
> assuming your tables are sizable. The only way it wouldn't be is if
> you have a datatype with a very slow output converter. Have you looked
> into exactly which tables are slow to dump and what datatypes they
> contain?

Well, all of our tables use pretty basic data types: integer (various sizes), text, varchar, boolean, and timestamps without time zone. In addition, other than not having a lot of our foreign keys in place, there have been no other schema changes since the migration.

--
erik jones [EMAIL PROTECTED]
software development
emma(r)
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
On Fri, 2006-12-29 at 07:52 -0500, Ron wrote:
> A= go through each query and see what work_mem needs to be for that
> query to be as RAM resident as possible. If you have enough RAM, set
> work_mem for that query that large. Remember that work_mem is =per
> query=, so queries running in parallel eat the sum of each of their
> work_mem's.

Just to clarify, from the docs on work_mem at http://www.postgresql.org/docs/current/static/runtime-config-resource.html :

"Specifies the amount of memory to be used by internal sort operations and hash tables before switching to temporary disk files. The value is specified in kilobytes, and defaults to 1024 kilobytes (1 MB). Note that for a complex query, several sort or hash operations might be running in parallel; each one will be allowed to use as much memory as this value specifies before it starts to put data into temporary files. Also, several running sessions could be doing such operations concurrently. So the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries."

Regards, Jeff Davis
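As a concrete sketch of the "several operations per query" point (the table and column names are hypothetical):

    SET work_mem = 32768;   -- 32 MB (in kB), for this session only
    EXPLAIN ANALYZE
    SELECT customer_id, count(*)
    FROM orders
    GROUP BY customer_id
    ORDER BY count(*) DESC; -- may run a hash aggregate plus a sort: up to 2 x work_mem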
Re: [PERFORM] Config parameters
Jeremy Haile wrote:
> What is a decent default setting for work_mem and maintenance_work_mem,
> considering I am regularly querying tables that are tens of millions of
> rows and have 2-4 GB of RAM?

Well, work_mem will depend on your query load. Queries that do a lot of sorting should benefit from increased work_mem. You only have limited RAM though, so it's a balancing act between memory used to cache disk and per-process sort memory. Note that work_mem is per sort, so a single query can use multiples of that amount. You can issue a SET to change the value for a session.

How you set maintenance_work_mem will depend on whether you vacuum continually (e.g. autovacuum) or at set times.

> Also - what is the best way to determine decent settings for
> temp_buffers and random_page_cost?

With all of these, testing I'm afraid. The only sure thing you can say is that random_page_cost should be 1 if your entire database fits in RAM.

--
Richard Huxton
Archonet Ltd
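For example, a session-local override before a manual vacuum (the table name is a placeholder):

    SET maintenance_work_mem = 262144;  -- 256 MB (in kB), for this session only
    VACUUM ANALYZE big_table;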
Re: [PERFORM] Config parameters
Thanks for the information! Are there any rule-of-thumb starting points for these values that you use when setting up servers? I'd at least like a starting point for testing different values.

For example, I'm sure setting a default work_mem of 100MB is usually overkill - but is 5MB usually a reasonable number? 20MB? My system does not have a huge number of concurrent users, but they are hitting large tables. I'm not sure what numbers people usually use here successfully.

For maintenance_work_mem, I turned off autovacuum to save on performance, but run a vacuum analyze once an hour. My current database characteristics are heavy insert (bulk inserts every 5 minutes) and a medium amount of selects on large, heavily indexed tables.

For temp_buffers - any rule-of-thumb starting point? What's the best way to evaluate if this number is adjusted correctly?

For random_page_cost - is the default of 4 pretty good for most drives? Do you usually bump it up to 3 on modern servers? I've usually done internal RAID setups, but the database I'm currently working on is hitting a SAN over fiber.

I realize that these values can vary a lot based on a variety of factors - but I'd love some more advice on what good rule-of-thumb starting points are for experimentation and how to evaluate whether the values are set correctly (in the case of temp_buffers and work_mem especially).

On Tue, 02 Jan 2007 18:49:54 +0000, Richard Huxton dev@archonet.com said:
> Well, work_mem will depend on your query load. Queries that do a lot
> of sorting should benefit from increased work_mem. [snip]
Re: [PERFORM] Config parameters
On Tue, 2007-01-02 at 13:19, Jeremy Haile wrote:
> Thanks for the information! Are there any rule-of-thumb starting
> points for these values that you use when setting up servers? I'd at
> least like a starting point for testing different values. [snip]

The setting for work_mem is very dependent on how many simultaneous connections you'll be processing at the same time, and how likely they are to be doing sorts. If you'll only ever have 5 connections to a database on a machine with a lot of memory, then setting it to 100M is probably fine. Keep in mind, the limit is PER SORT, not per query. An upper limit of about 25% of the machine's total memory is a good goal for how big to size work_mem.

So, on a 4 Gig machine you could divide 1G (25%) by the total possible connections, then again by the average number of sorts you'd expect per query / connection to get an idea.

Also, you can set it smaller than that by default, and for a given connection, set it on the fly when needed:

    CONNECT
    SET work_mem = 100;
    SELECT ...;
    DISCONNECT

And you run less risk of blowing out the machine with joe user's random query.

> For maintenance_work_mem, I turned off autovacuum to save on
> performance, but run a vacuum analyze once an hour. My current
> database characteristics are heavy insert (bulk inserts every 5
> minutes) and a medium amount of selects on large, heavily indexed
> tables.

Did you turn off stats collection as well? That's really the major performance issue with autovacuum, not autovacuum itself. Plus, if you've got a table that really needs vacuuming every 5 minutes to keep the database healthy, you may be working against yourself by turning off autovacuum. I.e. the cure may be worse than the disease. OTOH, if you don't delete / update often, then don't worry about it.

> For temp_buffers - any rule-of-thumb starting point? What's the best
> way to evaluate if this number is adjusted correctly?

Haven't researched temp_buffers at all.

> For random_page_cost - is the default of 4 pretty good for most
> drives? Do you usually bump it up to 3 on modern servers? I've usually
> done internal RAID setups, but the database I'm currently working on
> is hitting a SAN over fiber.

random_page_cost is the hardest to come up with a proper setting for. If you're hitting a RAID 10 with 40 disk drives or some other huge drive array, you might need to crank up random_page_cost to some very large number, as sequential accesses are often preferred there. I believe there were some posts by Luke Lonergan (sp) a while back where he had set random_page_cost to 20 or something even higher on a large system like that. On data sets that fit in memory, the cost nominally approaches 1. On smaller workgroup servers with a single mirror set for a drive subsystem and moderate to large data sets, I've found values of 1.4 to 3.0 to be reasonable, depending on the workload.

To see if the values are good or not, run a variety of your worst queries on the machine while varying the settings to see which run best. That will at least let you know if you're close. While you can't change buffers on the fly, you can change work_mem and random_page_cost on the fly, per connection, to see the change.
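Working that rule of thumb through with assumed numbers (every figure below is a placeholder for your own environment):

    total RAM                     4 GB
    work_mem budget (25%)         1 GB
    max_connections               50
    average sorts per query       2
    starting work_mem             1 GB / (50 * 2) = ~10 MB per sort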
Re: [PERFORM] Config parameters
> So, on a 4 Gig machine you could divide 1G (25%) by the total possible
> connections, then again by the average number of sorts you'd expect
> per query / connection to get an idea.

Thanks for the advice. I'll experiment with higher work_mem settings, as I am regularly doing sorts on large datasets. I imagine the default setting isn't very optimal in my case.

> Did you turn off stats collection as well? That's really the major
> performance issue with autovacuum, not autovacuum itself.

I did turn off stats collection. I'm not sure how much of a difference it makes, but I was trying to squeeze every ounce of performance out of the database.

> I.e. the cure may be worse than the disease. OTOH, if you don't delete
> / update often, then don't worry about it.

I hardly ever delete/update. I update regularly, but only on small tables, so it doesn't make as big of a difference. I do huge inserts, which is why turning off stats/autovacuum gives me some performance benefit. I usually only do deletes nightly in large batches, so autovacuuming/analyzing once an hour works fairly well.

> Haven't researched temp_buffers at all.

Do you usually change temp_buffers? Mine is currently at the default setting. I guess I could arbitrarily bump it up - but I'm not sure what the consequences would be or how to tell if it is set correctly.

> random_page_cost is the hardest to come up with a proper setting for.

This definitely sounds like the hardest to figure out (since it seems to be almost all trial-and-error). I'll play with some different values. This is only used by the query planner, right? How much of a performance difference does it usually make to tweak this number? (i.e. how much performance difference would someone usually expect when they find that 2.5 works better than 4?)

> While you can't change buffers on the fly, you can change work_mem and
> random_page_cost on the fly, per connection, to see the change.

Thanks for the advice. I was aware you could change work_mem on the fly, but didn't think about setting random_page_cost on the fly.
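A minimal sketch of that experiment (the query is a placeholder for one of your slow ones):

    SET random_page_cost = 2.5;  -- per-connection, planner-only; no restart needed
    EXPLAIN ANALYZE SELECT ...;  -- did the plan, and the runtime, change?
    RESET random_page_cost;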
[PERFORM] More 8.2 client issues (Was: Slow dump?)
Hmm... This gets stranger and stranger. When connecting to the database with the psql client in 8.2's bin directory and using commands such as \d, the client hangs, or takes an extremely long time. If we connect to the same 8.2 database with a psql client from 8.1.4, both remotely and locally, \d responds immediately. Could the issue be with the client programs somehow?

Note also that we did our migration over the xmas weekend by piping the dump straight into a restore command. We kicked it off Saturday (12-23-06) night and it had just reached the point of adding foreign keys the morning of the 26th. We stopped it there, wrote a script to go through and build the indexes (which finished in a timely manner), and have added just the foreign keys strictly necessary for our application's functionality (i.e. foreign keys set to cascade on update/delete, etc...).

-------- Original Message --------
Subject: Re: [PERFORM] Slow dump?
Date: Tue, 02 Jan 2007 11:40:18 -0600
From: Erik Jones [EMAIL PROTECTED]
To: Tom Lane [EMAIL PROTECTED]
CC: pgsql-performance@postgresql.org

[snip - quoted in full in the "Slow dump?" thread above]

--
erik jones [EMAIL PROTECTED]
software development
emma(r)
Re: [PERFORM] More 8.2 client issues (Was: Slow dump?)
Erik Jones wrote:
> Hmm... This gets stranger and stranger. When connecting to the
> database with the psql client in 8.2's bin directory and using
> commands such as \d, the client hangs, or takes an extremely long
> time. If we connect to the same 8.2 database with a psql client from
> 8.1.4, both remotely and locally, \d responds immediately. Could the
> issue be with the client programs somehow?

Couldn't it be some DNS problem that only affects the 8.2 client, I suppose?

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] More 8.2 client issues (Was: Slow dump?)
Richard Huxton wrote:
> Couldn't it be some DNS problem that only affects the 8.2 client, I
> suppose?

Hmm... I don't see how that would matter when the 8.2 client is being run locally.

--
erik jones [EMAIL PROTECTED]
software development
emma(r)
Re: [PERFORM] More 8.2 client issues (Was: Slow dump?)
Erik Jones [EMAIL PROTECTED] writes:
> Hmm... This gets stranger and stranger. When connecting to the
> database with the psql client in 8.2's bin directory and using
> commands such as \d, the client hangs, or takes an extremely long
> time.

Hangs at what point? During connection? Try strace'ing psql (or whatever the Solaris equivalent is) to see what it's doing.

regards, tom lane
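On Solaris 10 the equivalent is truss; a hedged sketch (the database name and output file are placeholders):

    truss -f -o psql.truss psql mydb   # -f follows forked children too
    # then run \d inside psql and look in psql.truss for where it stalls;
    # e.g. long waits in network-related calls would point at connectivity or DNS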