Re: Setting effective_io_concurrency in VM?

2017-11-27 Thread Andres Freund
Hi,

On 2017-11-27 11:40:19 -0700, Scott Marlowe wrote:
> tl;dr: Only way to know is to benchmark it. I'd guess that somewhere
> between 10 and 20 is going to get the best throughput but that's just
> a guess. Benchmark it and let us know!

FWIW, for SSDs my previous experiments suggest that the sweet spot is
more likely to be an order of magnitude or two bigger. Depends a bit on
your workload (including size of scans and concurrency) obviously.
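
Something like the following is how I'd probe it (untested sketch; the
table and column names are made up, and the setting only affects bitmap
heap scans, so check that the plan actually shows one):

    -- Try 16, 64, 256, ... and compare timings; the cap is 1000.
    SET effective_io_concurrency = 256;
    SET enable_seqscan = off;  -- nudge the planner toward a bitmap scan
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM test_table
    WHERE indexed_col BETWEEN 10000 AND 500000;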

Greetings,

Andres Freund



Re: Setting effective_io_concurrency in VM?

2017-11-27 Thread Jeff Janes
On Mon, Nov 27, 2017 at 10:40 AM, Scott Marlowe wrote:

>
> Generally, VMs are never going to be as fast as running on bare metal.
> You can adjust effective_io_concurrency and test it with something
> simple like pgbench, with various settings for -c (concurrency), and
> see where throughput peaks. This will at least get you into the ballpark.
>

None of the built-in workloads for pgbench cares a whit about
effective_io_concurrency.  He would have to come up with some custom
transactions to exercise that feature (or use the tool people use to run
the TPC-H benchmark, rather than pgbench's built-in transactions).
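
A custom script might look something like this (untested sketch: the file
name is made up, the random() syntax needs pgbench 9.6+, and it assumes a
scale factor around 100):

    -- bitmap_scan.sql, run as: pgbench -n -f bitmap_scan.sql -c 8 -T 60
    \set low random(1, 9000000)
    SET enable_indexscan = off;  -- steer the planner to a bitmap heap scan
    SELECT count(*) FROM pgbench_accounts
    WHERE aid BETWEEN :low AND :low + 50000;

Note that aid is almost perfectly correlated with heap order, so the
prefetch gain here will be modest; an index on a less correlated column
shows the effect better.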

I think the best overall advice would be to configure it the same as you
would if it were not a VM.  There may be cases where you diverge from that,
but I think each one would require extensive investigation and
experimentation, so it can't be turned into a rule of thumb.

Cheers,

Jeff


Re: Setting effective_io_concurrency in VM?

2017-11-27 Thread Don Seiler
On Mon, Nov 27, 2017 at 3:44 PM, Andrew Kerber wrote:

> What's the guest OS?  I have been able to get Oracle to perform just as
> well on virtuals as it does on physicals. I suspect the settings are
> pretty similar.
>

Guest OS is CentOS 6 and CentOS 7 depending on which DB host we're looking
at. I'd be interested in learning for either case.

Don.

-- 
Don Seiler
www.seiler.us


Re: Setting effective_io_concurrency in VM?

2017-11-27 Thread Fernando Hevia
On 27 Nov 2017 at 15:24, "Don Seiler"  wrote:

Good afternoon.

We run Postgres (currently 9.2, upgrading to 9.6 shortly) in VMware ESX
machines. We currently have effective_io_concurrency set to the default of
1. I'm told that the data volume is a RAID 6 with 14 data drives and 2
parity drives. I know that RAID10 is recommended, just working with what
I've inherited for now (storage is high-end HP 3Par and HP recommended RAID
6 for best performance).

Anyway, I'm wondering if, in a virtualized environment with a VM datastore,
it makes sense to set effective_io_concurrency closer to the number of data
drives?

I'd also be interested in hearing how others have configured their
PostgreSQL instances for VMs (if there's anything special to think about).



If the storage was exclusively for the Postgres box I'd try
effective_io_concurrency somewhere between 8 and 12. Since it is probably
not, it will depend on the load the other VMs exert on the storage.
Assuming the storage isn't already stressed and you need the extra IOPS,
you could test values between 4 and 8. You can of course be a lousy team
player and have PG parallelize as much as it can, but this will eventually
piss off the storage or VMware manager, which is never good, as they can
limit your IO throughput at the virtualization or storage layers.
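
For what it's worth, applying those numbers would look something like this
(the tablespace name is made up; ALTER SYSTEM needs 9.4+ and the
per-tablespace setting needs 9.6+):

    ALTER SYSTEM SET effective_io_concurrency = 8;
    SELECT pg_reload_conf();
    -- On 9.6+ the setting can also differ per tablespace, e.g. when
    -- volumes sit behind different storage:
    ALTER TABLESPACE fast_ssd SET (effective_io_concurrency = 4);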

Cheers.


Re: Setting effective_io_concurrency in VM?

2017-11-27 Thread Scott Marlowe
On Mon, Nov 27, 2017 at 11:23 AM, Don Seiler  wrote:
> Good afternoon.
>
> We run Postgres (currently 9.2, upgrading to 9.6 shortly) in VMware ESX
> machines. We currently have effective_io_concurrency set to the default of
> 1. I'm told that the data volume is a RAID 6 with 14 data drives and 2
> parity drives. I know that RAID10 is recommended, just working with what
> I've inherited for now (storage is high-end HP 3Par and HP recommended RAID
> 6 for best performance).
>
> Anyway, I'm wondering if, in a virtualized environment with a VM datastore,
> it makes sense to set effective_io_concurrency closer to the number of data
> drives?
>
> I'd also be interested in hearing how others have configured their
> PostgreSQL instances for VMs (if there's anything special to think about).

Generally, VMs are never going to be as fast as running on bare metal.
You can adjust effective_io_concurrency and test it with something
simple like pgbench, with various settings for -c (concurrency), and
see where throughput peaks. This will at least get you into the ballpark.
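
A sweep along those lines might look like this (hypothetical database name
mydb; compare the reported tps at each step):

    pgbench -c 4 -j 4 -T 120 mydb
    pgbench -c 8 -j 8 -T 120 mydb
    pgbench -c 16 -j 8 -T 120 mydb
    pgbench -c 32 -j 8 -T 120 mydb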

A while back we needed fast machines with LOTS of storage (7TB data
drives with 5TB of data on them) and the only way to stuff that many
800GB SSDs into a single machine was to use RAID-5 with a spare (I
lobbied for RAID-6 but was overridden, eh...). We were able to achieve
over 15k TPS in pgbench with a 400GB data store on those boxes. The
secret was to turn off the cache in the RAID controller and crank up
effective_io_concurrency to something around 10 (not sure, it's been a
while).

tl;dr: Only way to know is to benchmark it. I'd guess that somewhere
between 10 and 20 is going to get the best throughput but that's just
a guess. Benchmark it and let us know!



Re: Half billion records in one table? RDS

2017-11-27 Thread Laurenz Albe
Jean Baro wrote:
> Hi there,
> 
> We are creating a new DB which will behave much like a file system;
> I mean, there will be no complex queries or joins running in the DB.
> The idea is to grab the WHOLE set of messages for a particular user
> and then filter, order, combine or full-text search in the function
> itself (AWS Lambda).
> The maximum number of messages is limited to 1,000 per user,
> so we expect Postgres to perform very well in this scenario.
> 
[...]
>  
> Sizing and worst-case scenario:
>
> · 500MM messages in the main DB
> · 4K queries per second (by UserID), max time of 500ms per query.
>   Simple SELECTs with no ORDER, WHERE or GROUP BY; just grab all the
>   messages for a particular user, max 1,000 messages per user.
> · 1K inserts per second on average (so in 1 hour we can insert around
>   3MM messages)
> · 1K deletes per second on average (so in 1 hour we can remove around
>   3MM messages)
> 
> My questions are:
> Can we use any kind of compression for PostgreSQL which would result in
> reduced IO and disk size?
> We are not relying on any kind of table partitioning; is that the best
> approach for this scenario?
> Is PG on RDS capable of delivering this type of performance while
> requiring low maintenance?
> What about autovacuum? Any suggestions on how to optimize it for such a
> workload (we will insert and delete millions of rows every day)?

It sounds like your JSON data, which are your chief concern, are
not processed inside the database.  For that, the type "json" is best.
Such data are automatically stored in a compressed format if their
size exceeds 2KB. The compression is not amazingly good, but fast.
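
For illustration, a minimal layout along those lines (names are made up;
the compression and out-of-line storage happen automatically via TOAST as
a row approaches that threshold):

    CREATE TABLE messages (
        user_id bigint      NOT NULL,
        sent_at timestamptz NOT NULL,
        payload json        NOT NULL  -- TOAST-compressed when large
    );
    CREATE INDEX ON messages (user_id);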

If your application removes data by deleting them from the
table, partitioning won't help.  It is useful if data get removed
in bulk, e.g. if you want to delete all yesterday's data at once.
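
To illustrate the bulk case (hypothetical names, PostgreSQL 10 declarative
syntax; older versions can do the same with inheritance):

    CREATE TABLE messages_part (
        user_id bigint, sent_at timestamptz NOT NULL, payload json
    ) PARTITION BY RANGE (sent_at);
    CREATE TABLE messages_2017_11_26 PARTITION OF messages_part
        FOR VALUES FROM ('2017-11-26') TO ('2017-11-27');
    CREATE TABLE messages_2017_11_27 PARTITION OF messages_part
        FOR VALUES FROM ('2017-11-27') TO ('2017-11-28');
    -- Retiring a whole day is then one cheap command instead of
    -- millions of individual DELETEs:
    DROP TABLE messages_2017_11_26;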

The workload does not sound amazingly large, so I'd expect PostgreSQL
to have no problems with it given decent storage and CPU power,
but you'd have to test that.

Tune autovacuum if it cannot keep up (tables get bloated).
The first knob to twiddle is probably lowering "autovacuum_vacuum_cost_delay".
Autovacuum might be your biggest problem (only guessing).
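
A possible starting point, assuming a hot table called "messages" (note
that on RDS, cluster-wide settings go through a parameter group rather
than ALTER SYSTEM):

    -- Cluster-wide: let autovacuum pause less between cost-limited
    -- work batches (the default is 20ms on these releases):
    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
    SELECT pg_reload_conf();
    -- Or just for the one busy table:
    ALTER TABLE messages SET (autovacuum_vacuum_cost_delay = 2);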

Yours,
Laurenz Albe