Re: [PERFORM] Tuning Postgres for Single connection use
On Mon, Apr 14, 2014 at 5:19 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Mon, Apr 14, 2014 at 2:46 PM, Nick Eubank <nickeub...@gmail.com> wrote:
>
>> Any rules of thumb for work_mem, maintenance_work_mem, shared_buffers,
>> etc. for a database that DOESN'T anticipate concurrent connections and
>> that is doing lots of aggregate functions on large tables? All the advice
>> I can find online on tuning (this
>> <http://wiki.postgresql.org/wiki/Performance_Optimization>, this
>> <http://media.revsys.com/talks/djangocon/2011/secrets-of-postgresql-performance.pdf>,
>> this <http://www.revsys.com/writings/postgresql-performance.html> etc.)
>> is written for people anticipating lots of concurrent connections.
>>
>> I'm a social scientist looking to use Postgres not as a database to be
>> shared by multiple users, but rather as my own tool for manipulating a
>> massive data set (I have 5 billion transaction records (600 GB in csv)
>> and want to pull out unique user pairs, estimate aggregates for
>> individual users, etc.). This also means almost no writing, except for
>> the creation of new tables based on selections from the main table.
>>
>> I'm on a Windows 8 VM with 16 GB RAM, a SCSI VMware HD, and 3 cores, if
>> that's important.
>
> I'd go with a small shared_buffers, like 128MB, and let the OS cache as
> much as possible. This minimizes the amount of double buffering.
>
> And set work_mem to about 6GB, then bump it up if that doesn't seem to
> cause problems. In the scenario you describe, it is probably no big deal
> if you guess too high. Monitor the process; if it starts to go nuts, just
> kill it and start again with a lower work_mem. If it is a single-user
> system, you can afford to be adventurous.
>
> If you need to build indexes, you should bump up maintenance_work_mem,
> but I would just do that in the local session, not system-wide.
>
> Cheers,
>
> Jeff

Quick followup, Jeff: it seems that I can't set work_mem above about 1 GB
(can't get to 2 GB -- when I update the config, the values just don't change
in SHOW ALL. Integer constraint?). Is there a workaround, or should I tweak
something else accordingly?

Thanks!

Nick

(Properly bottom posted this time?)
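To make the suggestion above concrete, here is a sketch of what it might look like as a postgresql.conf fragment; the values are Jeff's ballpark figures for a 16 GB single-user machine, not universal recommendations, and maintenance_work_mem is deliberately left at its default so it can be raised per-session before index builds instead:

```ini
# postgresql.conf -- sketch of the single-user tuning suggested above
shared_buffers = 128MB   # keep this small; let the OS page cache hold the data
work_mem = 6GB           # per-sort/per-hash budget; only safe with one connection
```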
Re: [PERFORM] Tuning Postgres for Single connection use
On Tue, Apr 15, 2014 at 9:12 AM, Nick Eubank <nickeub...@gmail.com> wrote:

> On Mon, Apr 14, 2014 at 5:19 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>
>> I'd go with a small shared_buffers, like 128MB, and let the OS cache as
>> much as possible. This minimizes the amount of double buffering.
>>
>> And set work_mem to about 6GB, then bump it up if that doesn't seem to
>> cause problems. In the scenario you describe, it is probably no big deal
>> if you guess too high. Monitor the process; if it starts to go nuts, just
>> kill it and start again with a lower work_mem. If it is a single-user
>> system, you can afford to be adventurous.
>>
>> If you need to build indexes, you should bump up maintenance_work_mem,
>> but I would just do that in the local session, not system-wide.
>>
>> Cheers,
>>
>> Jeff
>
> Quick followup, Jeff: it seems that I can't set work_mem above about 1 GB
> (can't get to 2 GB -- when I update the config, the values just don't
> change in SHOW ALL. Integer constraint?). Is there a workaround, or should
> I tweak something else accordingly?

What version are you using? What is the exact line you put in your config
file? Did you get any errors when using that config file? Are you sure you
actually reloaded the server, so that it reread the config file, rather
than just changing the file and then not applying the change?

I usually set work_mem within a psql connection, in which case you need to
quote the setting if you use units: set work_mem = '3GB';

But if you set it system-wide in the config file, the quotes should not be
needed.

> Thanks!
>
> Nick
>
> (Properly bottom posted this time?)

Looked good to me.

Cheers,

Jeff
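A quick way to verify that a session-level setting actually took effect is to read it back with SHOW immediately after setting it. A minimal sketch of the psql interaction Jeff describes, using his 3GB example value:

```sql
-- In a live session, units must be quoted; in postgresql.conf they need not be.
SET work_mem = '3GB';
SHOW work_mem;  -- reports the value actually in effect
```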
Re: [PERFORM] Tuning Postgres for Single connection use
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Nick Eubank
Sent: Tuesday, April 15, 2014 11:12 AM
To: Jeff Janes
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Tuning Postgres for Single connection use

> On Mon, Apr 14, 2014 at 5:19 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>
>> On Mon, Apr 14, 2014 at 2:46 PM, Nick Eubank <nickeub...@gmail.com> wrote:
>>
>>> Any rules of thumb for work_mem, maintenance_work_mem, shared_buffers,
>>> etc. for a database that DOESN'T anticipate concurrent connections and
>>> that is doing lots of aggregate functions on large tables? All the
>>> advice I can find online on tuning (this, this, this etc.) is written
>>> for people anticipating lots of concurrent connections.
>>>
>>> I'm a social scientist looking to use Postgres not as a database to be
>>> shared by multiple users, but rather as my own tool for manipulating a
>>> massive data set (I have 5 billion transaction records (600 GB in csv)
>>> and want to pull out unique user pairs, estimate aggregates for
>>> individual users, etc.). This also means almost no writing, except for
>>> the creation of new tables based on selections from the main table.
>>>
>>> I'm on a Windows 8 VM with 16 GB RAM, a SCSI VMware HD, and 3 cores, if
>>> that's important.
>>
>> I'd go with a small shared_buffers, like 128MB, and let the OS cache as
>> much as possible. This minimizes the amount of double buffering.
>>
>> And set work_mem to about 6GB, then bump it up if that doesn't seem to
>> cause problems. In the scenario you describe, it is probably no big deal
>> if you guess too high. Monitor the process; if it starts to go nuts, just
>> kill it and start again with a lower work_mem. If it is a single-user
>> system, you can afford to be adventurous.
>>
>> If you need to build indexes, you should bump up maintenance_work_mem,
>> but I would just do that in the local session, not system-wide.
>>
>> Cheers,
>>
>> Jeff
>
> Quick followup, Jeff: it seems that I can't set work_mem above about 1 GB
> (can't get to 2 GB -- when I update the config, the values just don't
> change in SHOW ALL. Integer constraint?). Is there a workaround, or should
> I tweak something else accordingly?
>
> Thanks!
>
> Nick
>
> (Properly bottom posted this time?)

[Schnabel, Robert D.]

Nick,

I asked the same question a while ago about work_mem on Windows. See this
thread:

http://www.postgresql.org/message-id/17895.1315869...@sss.pgh.pa.us

Bob

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Tuning Postgres for Single connection use
Jeff Janes <jeff.ja...@gmail.com> writes:

> On Tue, Apr 15, 2014 at 9:12 AM, Nick Eubank <nickeub...@gmail.com> wrote:
>
>> Quick followup, Jeff: it seems that I can't set work_mem above about 1 GB
>> (can't get to 2 GB -- when I update the config, the values just don't
>> change in SHOW ALL. Integer constraint?). Is there a workaround, or
>> should I tweak something else accordingly?
>
> What version are you using? What is the exact line you put in your config
> file? Did you get any errors when using that config file? Are you sure
> you actually reloaded the server, so that it reread the config file,
> rather than just changing the file and then not applying the change?
>
> I usually set work_mem within a psql connection, in which case you need
> to quote the setting if you use units: set work_mem = '3GB';

FWIW, it's generally considered a seriously *bad* idea to set work_mem as
high as 1GB in postgresql.conf: you're promising that each query running on
the server can use 1GB per sort or hash step. You probably don't have the
machine resources to honor that promise. (If you do, I'd like to have your
IT budget ;-)) Recommended practice is to keep the global setting
conservatively small, and bump it up locally in your session (with SET) for
individual queries that need the very large value.

But having said that, Postgres doesn't try to enforce any such practice.
My money is on what Jeff is evidently thinking: you forgot to do "pg_ctl
reload", or else the setting is too large for your platform, in which case
there should have been a complaint in the postmaster log. As noted
elsewhere, the limit for Windows is a hair under 2GB even if it's 64-bit
Windows.

			regards, tom lane
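The practice Tom recommends (a conservatively small global work_mem, raised only for the individual queries that need it) can be sketched like this; the table and column names are made up for illustration:

```sql
-- SET LOCAL confines the large value to this transaction, so the
-- conservative postgresql.conf setting stays in force everywhere else.
BEGIN;
SET LOCAL work_mem = '1GB';
SELECT user_a, user_b, count(*)   -- hypothetical unique-user-pair aggregate
FROM transactions
GROUP BY user_a, user_b;
COMMIT;
```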
Re: [PERFORM] Tuning Postgres for Single connection use
On 15/04/14 09:46, Nick Eubank wrote:

> Any rules of thumb for work_mem, maintenance_work_mem, shared_buffers,
> etc. for a database that DOESN'T anticipate concurrent connections and
> that is doing lots of aggregate functions on large tables? All the advice
> I can find online on tuning (this
> <http://wiki.postgresql.org/wiki/Performance_Optimization>, this
> <http://media.revsys.com/talks/djangocon/2011/secrets-of-postgresql-performance.pdf>,
> this <http://www.revsys.com/writings/postgresql-performance.html> etc.)
> is written for people anticipating lots of concurrent connections.
>
> I'm a social scientist looking to use Postgres not as a database to be
> shared by multiple users, but rather as my own tool for manipulating a
> massive data set (I have 5 billion transaction records (600 GB in csv)
> and want to pull out unique user pairs, estimate aggregates for
> individual users, etc.). This also means almost no writing, except for
> the creation of new tables based on selections from the main table.
>
> I'm on a Windows 8 VM with 16 GB RAM, a SCSI VMware HD, and 3 cores, if
> that's important.
>
> Thanks!

Well, for serious database work, I suggest upgrading to Linux -- you will
get better performance out of the same hardware and probably more scope for
tuning (a year or so ago, I noticed some tuning options did not apply to
Microsoft O/S's, but I don't recall the details -- these options may, or
may not, apply to your situation). Apart from anything else, your
processing will not be slowed down by having to run anti-virus software!

Note that in Linux you have a wide choice of distributions and desktop
environments: I chose Mate (http://mate-desktop.org), some people prefer
xfce (http://www.xfce.org), and I used to use GNOME 2.

Cheers,

Gavin
Re: [PERFORM] Tuning Postgres for Single connection use
Thanks, Gavin -- would LOVE to. Sadly I'm in a weird situation where my
hardware is not under my control, so I'm stuck making the best of what I
have. Next time though! :)

On Monday, April 14, 2014, Gavin Flower <gavinflo...@archidevsys.co.nz> wrote:

> On 15/04/14 09:46, Nick Eubank wrote:
>
>> Any rules of thumb for work_mem, maintenance_work_mem, shared_buffers,
>> etc. for a database that DOESN'T anticipate concurrent connections and
>> that is doing lots of aggregate functions on large tables? All the
>> advice I can find online on tuning (this
>> <http://wiki.postgresql.org/wiki/Performance_Optimization>, this
>> <http://media.revsys.com/talks/djangocon/2011/secrets-of-postgresql-performance.pdf>,
>> this <http://www.revsys.com/writings/postgresql-performance.html> etc.)
>> is written for people anticipating lots of concurrent connections.
>>
>> I'm a social scientist looking to use Postgres not as a database to be
>> shared by multiple users, but rather as my own tool for manipulating a
>> massive data set (I have 5 billion transaction records (600 GB in csv)
>> and want to pull out unique user pairs, estimate aggregates for
>> individual users, etc.). This also means almost no writing, except for
>> the creation of new tables based on selections from the main table.
>>
>> I'm on a Windows 8 VM with 16 GB RAM, a SCSI VMware HD, and 3 cores, if
>> that's important.
>>
>> Thanks!
>
> Well, for serious database work, I suggest upgrading to Linux -- you will
> get better performance out of the same hardware and probably more scope
> for tuning (a year or so ago, I noticed some tuning options did not apply
> to Microsoft O/S's, but I don't recall the details -- these options may,
> or may not, apply to your situation). Apart from anything else, your
> processing will not be slowed down by having to run anti-virus software!
>
> Note that in Linux you have a wide choice of distributions and desktop
> environments: I chose Mate (http://mate-desktop.org), some people prefer
> xfce (http://www.xfce.org), and I used to use GNOME 2.
>
> Cheers,
>
> Gavin
Re: [PERFORM] Tuning Postgres for Single connection use
On 04/14/2014 05:46 PM, Nick Eubank wrote:

> Any rules of thumb for work_mem, maintenance_work_mem, shared_buffers,
> etc. for a database that DOESN'T anticipate concurrent connections and
> that is doing lots of aggregate functions on large tables? All the advice
> I can find online on tuning (this
> <http://wiki.postgresql.org/wiki/Performance_Optimization>, this
> <http://media.revsys.com/talks/djangocon/2011/secrets-of-postgresql-performance.pdf>,
> this <http://www.revsys.com/writings/postgresql-performance.html> etc.)
> is written for people anticipating lots of concurrent connections.
>
> I'm a social scientist looking to use Postgres not as a database to be
> shared by multiple users, but rather as my own tool for manipulating a
> massive data set (I have 5 billion transaction records (600 GB in csv)
> and want to pull out unique user pairs, estimate aggregates for
> individual users, etc.). This also means almost no writing, except for
> the creation of new tables based on selections from the main table.
>
> I'm on a Windows 8 VM with 16 GB RAM, a SCSI VMware HD, and 3 cores, if
> that's important.

First up would probably be: don't run on Windows. shared_buffers above
512MB causes performance to degrade on Windows, while that threshold is
much higher on *nix.

cheers

andrew
Re: [PERFORM] Tuning Postgres for Single connection use
On Mon, Apr 14, 2014 at 2:46 PM, Nick Eubank <nickeub...@gmail.com> wrote:

> Any rules of thumb for work_mem, maintenance_work_mem, shared_buffers,
> etc. for a database that DOESN'T anticipate concurrent connections and
> that is doing lots of aggregate functions on large tables? All the advice
> I can find online on tuning (this
> <http://wiki.postgresql.org/wiki/Performance_Optimization>, this
> <http://media.revsys.com/talks/djangocon/2011/secrets-of-postgresql-performance.pdf>,
> this <http://www.revsys.com/writings/postgresql-performance.html> etc.)
> is written for people anticipating lots of concurrent connections.
>
> I'm a social scientist looking to use Postgres not as a database to be
> shared by multiple users, but rather as my own tool for manipulating a
> massive data set (I have 5 billion transaction records (600 GB in csv)
> and want to pull out unique user pairs, estimate aggregates for
> individual users, etc.). This also means almost no writing, except for
> the creation of new tables based on selections from the main table.
>
> I'm on a Windows 8 VM with 16 GB RAM, a SCSI VMware HD, and 3 cores, if
> that's important.

I'd go with a small shared_buffers, like 128MB, and let the OS cache as
much as possible. This minimizes the amount of double buffering.

And set work_mem to about 6GB, then bump it up if that doesn't seem to
cause problems. In the scenario you describe, it is probably no big deal if
you guess too high. Monitor the process; if it starts to go nuts, just kill
it and start again with a lower work_mem. If it is a single-user system,
you can afford to be adventurous.

If you need to build indexes, you should bump up maintenance_work_mem, but
I would just do that in the local session, not system-wide.

Cheers,

Jeff
Re: [PERFORM] Tuning Postgres for Single connection use
In this list, please bottom post! I've added potentially useful advice
below.

On 15/04/14 11:39, Nick Eubank wrote:

> Thanks, Gavin -- would LOVE to. Sadly I'm in a weird situation where my
> hardware is not under my control, so I'm stuck making the best of what I
> have. Next time though! :)
>
> On Monday, April 14, 2014, Gavin Flower <gavinflo...@archidevsys.co.nz> wrote:
>
>> On 15/04/14 09:46, Nick Eubank wrote:
>>
>>> Any rules of thumb for work_mem, maintenance_work_mem, shared_buffers,
>>> etc. for a database that DOESN'T anticipate concurrent connections and
>>> that is doing lots of aggregate functions on large tables? All the
>>> advice I can find online on tuning (this
>>> <http://wiki.postgresql.org/wiki/Performance_Optimization>, this
>>> <http://media.revsys.com/talks/djangocon/2011/secrets-of-postgresql-performance.pdf>,
>>> this <http://www.revsys.com/writings/postgresql-performance.html>
>>> etc.) is written for people anticipating lots of concurrent
>>> connections.
>>>
>>> I'm a social scientist looking to use Postgres not as a database to be
>>> shared by multiple users, but rather as my own tool for manipulating a
>>> massive data set (I have 5 billion transaction records (600 GB in csv)
>>> and want to pull out unique user pairs, estimate aggregates for
>>> individual users, etc.). This also means almost no writing, except for
>>> the creation of new tables based on selections from the main table.
>>>
>>> I'm on a Windows 8 VM with 16 GB RAM, a SCSI VMware HD, and 3 cores,
>>> if that's important.
>>>
>>> Thanks!
>>
>> Well, for serious database work, I suggest upgrading to Linux -- you
>> will get better performance out of the same hardware and probably more
>> scope for tuning (a year or so ago, I noticed some tuning options did
>> not apply to Microsoft O/S's, but I don't recall the details -- these
>> options may, or may not, apply to your situation). Apart from anything
>> else, your processing will not be slowed down by having to run
>> anti-virus software!
>>
>> Note that in Linux you have a wide choice of distributions and desktop
>> environments: I chose Mate (http://mate-desktop.org), some people prefer
>> xfce (http://www.xfce.org), and I used to use GNOME 2.
>>
>> Cheers,
>> Gavin

Yeah, I know the feeling! I have a client that uses MySQL (ugh!), but I
won't even bother mentioning PostgreSQL!

Hopefully, someone more knowledgeable will give you some good advice
specific to your O/S.

For tables that don't change, consider a packing density of 100%.

Take care in how you design your tables and the column types. Consider
carefully the queries you are likely to use, so you can design appropriate
indexes. Some advice will depend on the schema you plan to use and the type
of queries.

Cheers,

Gavin
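Gavin's "100% packing density" maps onto PostgreSQL's fillfactor storage parameter. Heap tables already default to a fillfactor of 100, so the knob worth touching for static data is the B-tree index fillfactor, which defaults to 90 to leave room for future inserts. A sketch, with a made-up table and column:

```sql
-- On a table that will never be updated, pack index pages completely full
-- instead of reserving the default 10% free space.
CREATE INDEX idx_txn_user ON transactions (user_id) WITH (fillfactor = 100);
```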
Re: [PERFORM] Tuning Postgres for Single connection use
Terrific -- thanks Gavin and Jeff! That's incredibly helpful for a n00b
like me!

On Mon, Apr 14, 2014 at 5:29 PM, Gavin Flower <gavinflo...@archidevsys.co.nz> wrote:

> In this list, please bottom post! I've added potentially useful advice
> below.
>
> On 15/04/14 11:39, Nick Eubank wrote:
>
>> Thanks, Gavin -- would LOVE to. Sadly I'm in a weird situation where my
>> hardware is not under my control, so I'm stuck making the best of what I
>> have. Next time though! :)
>>
>> On Monday, April 14, 2014, Gavin Flower <gavinflo...@archidevsys.co.nz> wrote:
>>
>>> On 15/04/14 09:46, Nick Eubank wrote:
>>>
>>>> Any rules of thumb for work_mem, maintenance_work_mem,
>>>> shared_buffers, etc. for a database that DOESN'T anticipate concurrent
>>>> connections and that is doing lots of aggregate functions on large
>>>> tables? All the advice I can find online on tuning (this
>>>> <http://wiki.postgresql.org/wiki/Performance_Optimization>, this
>>>> <http://media.revsys.com/talks/djangocon/2011/secrets-of-postgresql-performance.pdf>,
>>>> this <http://www.revsys.com/writings/postgresql-performance.html>
>>>> etc.) is written for people anticipating lots of concurrent
>>>> connections.
>>>>
>>>> I'm a social scientist looking to use Postgres not as a database to
>>>> be shared by multiple users, but rather as my own tool for
>>>> manipulating a massive data set (I have 5 billion transaction records
>>>> (600 GB in csv) and want to pull out unique user pairs, estimate
>>>> aggregates for individual users, etc.). This also means almost no
>>>> writing, except for the creation of new tables based on selections
>>>> from the main table.
>>>>
>>>> I'm on a Windows 8 VM with 16 GB RAM, a SCSI VMware HD, and 3 cores,
>>>> if that's important.
>>>>
>>>> Thanks!
>>>
>>> Well, for serious database work, I suggest upgrading to Linux -- you
>>> will get better performance out of the same hardware and probably more
>>> scope for tuning (a year or so ago, I noticed some tuning options did
>>> not apply to Microsoft O/S's, but I don't recall the details -- these
>>> options may, or may not, apply to your situation). Apart from anything
>>> else, your processing will not be slowed down by having to run
>>> anti-virus software!
>>>
>>> Note that in Linux you have a wide choice of distributions and desktop
>>> environments: I chose Mate (http://mate-desktop.org), some people
>>> prefer xfce (http://www.xfce.org), and I used to use GNOME 2.
>>>
>>> Cheers,
>>> Gavin
>
> Yeah, I know the feeling! I have a client that uses MySQL (ugh!), but I
> won't even bother mentioning PostgreSQL!
>
> Hopefully, someone more knowledgeable will give you some good advice
> specific to your O/S.
>
> For tables that don't change, consider a packing density of 100%.
>
> Take care in how you design your tables and the column types. Consider
> carefully the queries you are likely to use, so you can design
> appropriate indexes. Some advice will depend on the schema you plan to
> use and the type of queries.
>
> Cheers,
> Gavin