Thank you a lot for your suggestions. > Random points/suggestions: > *) mdadm is the way to go. I think you'll get bandwidth constrained on most > modern hba unless they are really crappy. On reasonably modern hardware > storage is rarely the bottleneck anymore (which is a great place to be). > Fancy raid controllers may actually hurt performance -- they are obsolete > IMNSHO. > > *) Small point, but you'll want to crank effective_io_concurrency (see: > https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH%3DHWh1WbLRsioe%3DmzRJTHwtr%3D2azsTdQ%40mail.gmail.com > > <https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azs...@mail.gmail.com>). > It only affects certain kinds of queries, but when it works it really works. > Those benchmarks were done on my crapbox dell workstation!
This is really impressive, thank you for sharing it. > *) For very high transaction rates, you can get a lot of benefit from > disabling synchronous_commit if you are willing to accommodate the risk. I > do not recommend disabling fsync unless you are prepared to regenerate the > entire database at any time. > > *) Don't assume indefinite linear scaling as you increase storage capacity -- > the database itself can become the bottleneck, especially for writing. To > improve write performance, classic optimization strategies of trying to > intelligently bundle writes around units of work still apply. If you are > expecting high rates of write activity your engineering focus needs to be > here for sure (read scaling is comparatively pretty easy). What do you mean with “units of work”? > *) I would start doing your benchmarking with pgbench since that is going to > most closely reflect measured production performance. My final benchmark will be my application, it’s quite articulated and does also query parallelization using a custom splitter, so it will be hard to reproduce it using pgbench. At the moment I was just figuring out why my SSD weren’t performing as expected with comparable benchmarks found on hardware review websites (fio with 4k and 8k workloads). > If this is the case your stack performance is going to be based on data > structure design. Make liberal use of: > *) natural keys > *) constraint exclusion for partition selection > *) BRIN index is amazing (if you can work into it's limitations) > *) partial indexing > *) covering indexes. Don't forget to vacuum your partitions before you make > them live if you use them The data definition is optimized for Postgres yet, but didn’t know about covering indexes. I read about BRIN but never tried them. Will do some testing. > If your data is going to get really big and/or query activity is expected to > be high, keep an eye on your scale out strategy. Going monolithic to > bootstrap your app is the right choice IMO but start thinking about the > longer term if you are expecting growth. I'm starting to come out to the > perspective that lift/shift scaleout using postgres fdw without an insane > amount of app retooling could be a viable option by postgres 11/12 or so. > For my part I scaled out over asynchronous dblink which is a much more > maintenance heavy strategy (but works fabulous although I which you could > asynchronously connect). Thank you for your hints Pietro Pugni