Re: Setting effective_io_concurrency in VM?
Hi,

On 2017-11-27 11:40:19 -0700, Scott Marlowe wrote:
> tl;dr: Only way to know is to benchmark it. I'd guess that somewhere
> between 10 and 20 is going to get the best throughput but that's just
> a guess. Benchmark it and let us know!

FWIW, for SSDs my previous experiments suggest that the sweet spot is more
likely to be an order of magnitude or two bigger. It obviously depends a
bit on your workload (including the size of scans and the level of
concurrency).

Greetings,

Andres Freund
Re: Setting effective_io_concurrency in VM?
On Mon, Nov 27, 2017 at 10:40 AM, Scott Marlowe wrote:
> Generally VMs are never going to be as fast as running on bare metal
> etc. You can adjust it and test it with something simple like pgbench
> with various settings for -c (concurrency) and see where it peaks etc
> with the setting. This will at least get you into the ball park.

None of the built-in workloads for pgbench cares a whit about
effective_io_concurrency. He would have to come up with some custom
transactions to exercise that feature. (Or use the tool people use to run
the TPC-H benchmark, rather than pgbench's built-in transactions.)

I think the best overall advice would be to configure it the same as you
would if it were not a VM. There may be cases where you diverge from that,
but I think each one would require extensive investigation and
experimentation, so it can't be turned into a rule of thumb.

Cheers,

Jeff
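[Editor's sketch: effective_io_concurrency only affects prefetching during
bitmap heap scans, so a custom pgbench script has to force one. The table,
column, and database names below are invented for illustration, and the
planner must actually choose a bitmap scan for the range for this to
exercise the setting.]

```shell
# Write a custom transaction that does a wide range scan over an indexed
# column; with a suitably large range the planner tends to pick a bitmap
# index scan followed by a bitmap heap scan, which is what prefetches.
cat > bitmap_scan.sql <<'EOF'
\set lo random(1, 10000000)
SELECT count(*) FROM t WHERE a BETWEEN :lo AND :lo + 50000;
EOF

# Run it against an existing database "bench"; assumes an index on t(a).
pgbench -n -c 8 -j 4 -T 60 -f bitmap_scan.sql bench
```

Verify with EXPLAIN that the query really uses a Bitmap Heap Scan before
trusting the numbers; a plain index scan ignores effective_io_concurrency.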
Re: Setting effective_io_concurrency in VM?
On Mon, Nov 27, 2017 at 3:44 PM, Andrew Kerber wrote:
> Whats the guest OS? I have been able to get Oracle to perform just as
> well on Virtuals as it does on Physicals. I suspect the settings are
> pretty similar.

Guest OS is CentOS 6 and CentOS 7, depending on which DB host we're
looking at. I'd be interested in learning for either case.

Don.

--
Don Seiler
www.seiler.us
Re: Setting effective_io_concurrency in VM?
On 27 Nov 2017 15:24, "Don Seiler" wrote:
> Good afternoon.
>
> We run Postgres (currently 9.2, upgrading to 9.6 shortly) in VMware ESX
> machines. We currently have effective_io_concurrency set to the default
> of 1. I'm told that the data volume is a RAID 6 with 14 data drives and
> 2 parity drives. I know that RAID 10 is recommended; I'm just working
> with what I've inherited for now (storage is high-end HP 3PAR, and HP
> recommended RAID 6 for best performance).
>
> Anyway, I'm wondering if, in a virtualized environment with a VM
> datastore, it makes sense to set effective_io_concurrency closer to the
> number of data drives?
>
> I'd also be interested in hearing how others have configured their
> PostgreSQL instances for VMs (if there's anything special to think
> about).

If the storage were exclusively for the Postgres box, I'd try
effective_io_concurrency somewhere between 8 and 12. Since it is probably
not, it will depend on the load the other VMs exert on the storage.
Assuming the storage isn't already stressed and you need the extra IOPS,
you could test values between 4 and 8.

You can of course be a lousy team player and have PG parallelize as much
as it can, but this will eventually annoy the storage or VMware manager,
which is never good, as they can limit your I/O throughput at the
virtualization or storage layers.

Cheers.
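[Editor's sketch of how values like those above could be trialled. The
query, table name, and database name are invented; note that ALTER SYSTEM
only exists from 9.4 on, so it would have to wait for the 9.6 upgrade.]

```shell
# effective_io_concurrency can be set per session, so a candidate value
# is easy to A/B test against a representative query before persisting.
psql -d bench -c "SET effective_io_concurrency = 8;
                  EXPLAIN (ANALYZE, BUFFERS)
                  SELECT count(*) FROM t WHERE a BETWEEN 1 AND 50000;"

# Persist the winning value (9.4+); a reload suffices, no restart needed.
psql -d postgres -c "ALTER SYSTEM SET effective_io_concurrency = 8"
psql -d postgres -c "SELECT pg_reload_conf()"
```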
Re: Setting effective_io_concurrency in VM?
On Mon, Nov 27, 2017 at 11:23 AM, Don Seiler wrote:
> Good afternoon.
>
> We run Postgres (currently 9.2, upgrading to 9.6 shortly) in VMware ESX
> machines. We currently have effective_io_concurrency set to the default
> of 1. I'm told that the data volume is a RAID 6 with 14 data drives and
> 2 parity drives. I know that RAID 10 is recommended, just working with
> what I've inherited for now (storage is high-end HP 3PAR and HP
> recommended RAID 6 for best performance).
>
> Anyway, I'm wondering if, in a virtualized environment with a VM
> datastore, it makes sense to set effective_io_concurrency closer to the
> number of data drives?
>
> I'd also be interested in hearing how others have configured their
> PostgreSQL instances for VMs (if there's anything special to think
> about).

Generally VMs are never going to be as fast as running on bare metal etc.
You can adjust it and test it with something simple like pgbench with
various settings for -c (concurrency) and see where it peaks etc with the
setting. This will at least get you into the ballpark.

A while back we needed fast machines with LOTS of storage (7 TB data
drives with 5 TB of data on them), and the only way to stuff that many
800 GB SSDs into a single machine was to use RAID-5 with a spare (I
lobbied for RAID-6 but was overridden, eh...). We were able to achieve
over 15k TPS in pgbench with a 400 GB data store on those boxes. The
secret was to turn off the cache in the RAID controller and crank up
effective_io_concurrency to something around 10 (not sure, it's been a
while).

tl;dr: Only way to know is to benchmark it. I'd guess that somewhere
between 10 and 20 is going to get the best throughput, but that's just a
guess. Benchmark it and let us know!
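[Editor's sketch of the concurrency sweep described above; the database
name and scale factor are arbitrary placeholders.]

```shell
# Build a test database (scale 400 is roughly a 6 GB data set), then
# sweep client counts and watch where throughput peaks.
createdb bench
pgbench -i -s 400 bench
for c in 4 8 16 32 64 128; do
    echo "clients=$c"
    pgbench -n -c "$c" -j 4 -T 120 bench | grep -E '^tps'
done
```

Repeat the sweep for each effective_io_concurrency candidate and compare
the tps lines.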
Re: Half billion records in one table? RDS
Jean Baro wrote:
> Hi there,
>
> We are creating a new DB which will behave most like a file system,
> I mean, there will be no complex queries or joins running in the DB.
> The idea is to grab the WHOLE set of messages for a particular user
> and then filter, order, combine or full text search in the function
> itself (AWS Lambda).
> The maximum number of messages is limited to 1,000 messages per user.
> So we expect Postgres to have an amazing performance for this scenario.
> [...]
>
> Sizing and worst case scenario:
>
> · 500MM messages in the main DB
> · 4K queries per second (by UserID) - max time of 500ms per query.
>   Simple SELECT, with no ORDER, WHERE or GROUP BY. Just grab all the
>   messages for a particular user. MAX 1000 messages per USER.
> · 1K inserts per second on average (so that in 1 hour we can insert
>   around 3MM messages)
> · 1K deletes per second on average (so that in 1 hour we can remove
>   around 3MM messages)
>
> My question is:
> Can we use any kind of compression for PostgreSQL which would result in
> reduced IO and disk size?
> We are not relying on any kind of table partitioning, is that the best
> approach for this scenario?
> Is PG on RDS capable of delivering this type of performance while
> requiring low maintenance?
> What about Auto Vacuum? Any suggestion how to optimize it for such a
> work load (we will insert and delete millions of rows every day).

It sounds like your JSON data, which are your chief concern, are not
processed inside the database. For that, the type "json" is best. Such
data are automatically stored in a compressed format if their size
exceeds 2 KB. The compression is not amazingly good, but it is fast.

If your application removes data by deleting them from the table,
partitioning won't help. It is useful if data get removed in bulk, e.g.
if you want to delete all of yesterday's data at once.
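[Editor's sketch of the storage scheme described above; all table, column,
and database names are invented. Plain "json" stores the text as-is, and
values larger than roughly 2 KB are compressed and moved out of line by
TOAST without any application involvement.]

```shell
psql -d messages_db <<'EOF'
CREATE TABLE messages (
    user_id    bigint      NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    body       json        NOT NULL
);

-- The single access path described: fetch all messages for one user.
CREATE INDEX ON messages (user_id);

-- pg_column_size() reports the on-disk (possibly compressed) size of a
-- value, so comparing it with the text length shows TOAST at work:
SELECT pg_column_size(body) AS stored, length(body::text) AS raw
FROM messages LIMIT 1;
EOF
```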
The workload does not sound amazingly large, so I'd expect PostgreSQL to
have no problems with it given decent storage and CPU power, but you'd
have to test that.

Tune autovacuum if it cannot keep up (i.e., if tables get bloated). The
first knob to twiddle is probably lowering
"autovacuum_vacuum_cost_delay". Autovacuum might be your biggest problem
(only guessing).

Yours,

Laurenz Albe
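[Editor's sketch of the kind of adjustment meant here; the values are
purely illustrative, and the comments give the 9.6 defaults.]

```shell
# Make autovacuum more aggressive for a high-churn delete/insert workload.
cat >> postgresql.conf <<'EOF'
autovacuum_vacuum_cost_delay = 5ms   # default 20ms; lower = vacuum works harder
autovacuum_vacuum_cost_limit = 2000  # default -1 (falls back to vacuum_cost_limit, 200)
autovacuum_naptime = 15s             # default 1min; check for bloated tables more often
EOF
pg_ctl reload
```

Monitor pg_stat_user_tables (n_dead_tup, last_autovacuum) to see whether
autovacuum is keeping up before and after the change.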