On Tue, Sep 27, 2016 at 9:42 AM, Mike Sofen <mso...@runbox.com> wrote:
> *From:* Mike Sofen *Sent:* Tuesday, September 27, 2016 8:10 AM > > *From:* Greg Spiegelberg *Sent:* Monday, September 26, 2016 7:25 AM > I've gotten more responses than anticipated and have answered some > questions and gotten some insight but my challenge again is what should I > capture along the way to prove or disprove this storage pattern? > Alternatives to the storage pattern aside, I need ideas to test rig, > capture metrics and suggestions to tune it. > > > > In the next 24 hours, I will be sending ~1 trillion records to the test > database. Because of time to set up, I'd rather have things set up > properly the first go. > > > > Thanks! > > -Greg > > --------------------- > > Greg, I ran another quick test on a wider table than you’ve described, but > this time with 80 million rows, with core counts, ram and ssd storage > similar to what you’d have on that AWS EC2 instance. This table had 7 > columns (3 integers, 3 text, 1 timestamptz) with an average width of 157 > chars, one btree index on the pk int column. Using explain analyze, I > picked one id value out of the 80m and ran a select * where id = x. It did > an index scan, had a planning time of 0.077ms, and an execution time of > 0.254 seconds. I ran the query for a variety of widely spaced values (so > the data was uncached) and the timing never changed. This has been > mirroring my general experience with PG – very fast reads on indexed > queries. > > > > Summary: I think your buckets can be WAY bigger than you are envisioning > for the simple table design you’ve described. I’m betting you can easily > do 500 million rows per bucket before approaching anything close to the > 30ms max query time. > > > > Mike Sofen (Synthetic Genomics) > > > > Totally typo’d the execution time: it was 0.254 MILLISECONDS, not > SECONDS. Thus my comment about going up 10x in bucket size instead of > appearing to be right at the limit. Sorry! > > > I figured. :) Haven't ruled it out but expectations of this implementation is to perform at worst 3X slower than memcache or Redis. Bigger buckets mean a wider possibility of response times. Some buckets may contain 140k records and some 100X more. -Greg