On Wed, Sep 28, 2016 at 11:27 AM, Stephen Frost <sfr...@snowman.net> wrote:
> Greg, > > * Greg Spiegelberg (gspiegelb...@gmail.com) wrote: > > Bigger buckets mean a wider possibility of response times. Some buckets > > may contain 140k records and some 100X more. > > Have you analyzed the depth of the btree indexes to see how many more > pages need to be read to handle finding a row in 140k records vs. 14M > records vs. 140M records? > > I suspect you'd find that the change in actual depth (meaning how many > pages have to actually be read to find the row you're looking for) isn't > very much and that your concern over the "wider possibility of response > times" isn't well founded. > > Stephen, Excellent feedback! Um, how does one look at tree depth in PostgreSQL? Oracle I know but have not done the same in PG. Pointers? > Since you have a hard-set 30ms maximum for query response time, I would > suggest you work out how long it takes to read a cold page from your I/O > subsystem and then you can work through exactly how many page reads > could be done in that 30ms (or perhaps 20ms, to allow for whatever > overhead there will be in the rest of the system and as a buffer) and > then work that back to how deep the index can be based on that many page > reads and then how many records are required to create an index of that > depth. Of course, the page from the heap will also need to be read and > there's a bit of additional work to be done, but the disk i/o for cold > pages is almost certainly where most time will be spent. > > I suspect you'll discover that millions of tables is a couple orders of > magnitude off of how many you'd need to keep the number of page reads > below the threshold you work out based on your I/O. > > Of course, you would need a consistent I/O subsystem, or at least one > where you know the maximum possible latency to pull a cold page. > > Lastly, you'll want to figure out how to handle system crash/restart if > this system requires a high uptime. I expect you'd want to have at > least one replica and a setup which allows you to flip traffic to it > very quickly to maintain the 30ms response times. > I'm replicating via messaging. PG replication is fine for smaller db's but I don't trust networks and PG upgrade intricacies complicate matters. -Greg