Thank you a lot for your suggestions.

> Random points/suggestions:
> *) mdadm is the way to go.   I think you'll get bandwidth constrained on most 
> modern hba unless they are really crappy.  On reasonably modern hardware 
> storage is rarely the bottleneck anymore (which is a great place to be).  
> Fancy raid controllers may actually hurt performance -- they are obsolete 
> IMNSHO.
> 
> *) Small point, but you'll want to crank effective_io_concurrency (see: 
> https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH%3DHWh1WbLRsioe%3DmzRJTHwtr%3D2azsTdQ%40mail.gmail.com
>  
> <https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azs...@mail.gmail.com>).
>  It only affects certain kinds of queries, but when it works it really works. 
>    Those benchmarks were done on my crapbox dell workstation!

This is really impressive, thank you for sharing it.

> *) For very high transaction rates, you can get a lot of benefit from 
> disabling synchronous_commit if you are willing to accommodate the risk.  I 
> do not recommend disabling fsync unless you are prepared to regenerate the 
> entire database at any time.
> 
> *) Don't assume indefinite linear scaling as you increase storage capacity -- 
> the database itself can become the bottleneck, especially for writing.   To 
> improve write performance, classic optimization strategies of trying to 
> intelligently bundle writes around units of work still apply.  If you are 
> expecting high rates of write activity your engineering focus needs to be 
> here for sure (read scaling is comparatively pretty easy).

What do you mean with “units of work”?


> *) I would start doing your benchmarking with pgbench since that is going to 
> most closely reflect measured production performance.  

My final benchmark will be my application, it’s quite articulated and does also 
query parallelization using a custom splitter, so it will be hard to reproduce 
it using pgbench. At the moment I was just figuring out why my SSD weren’t 
performing as expected with comparable benchmarks found on hardware review 
websites (fio with 4k and 8k workloads). 


> If this is the case your stack performance is going to be based on data 
> structure design.  Make liberal use of:
> *) natural keys 
> *) constraint exclusion for partition selection
> *) BRIN index is amazing (if you can work into it's limitations)
> *) partial indexing
> *) covering indexes.  Don't forget to vacuum your partitions before you make 
> them live if you use them

The data definition is optimized for Postgres yet, but didn’t know about 
covering indexes. I read about BRIN but never tried them. Will do some testing.


> If your data is going to get really big and/or query activity is expected to 
> be high, keep an eye on your scale out strategy.   Going monolithic to 
> bootstrap your app is the right choice IMO but start thinking about the 
> longer term if you are expecting growth.  I'm starting to come out to the 
> perspective that lift/shift scaleout using postgres fdw without an insane 
> amount of app retooling could be a viable option by postgres 11/12 or so.  
> For my part I scaled out over asynchronous dblink which is a much more 
> maintenance heavy strategy (but works fabulous although I which you could 
> asynchronously connect).

Thank you for your hints 

 Pietro Pugni

Reply via email to