From: Greg Spiegelberg   Sent: Monday, September 26, 2016 7:25 AM
I've gotten more responses than anticipated, answered some questions, and gained 
some insight, but my challenge remains: what should I capture along the way to 
prove or disprove this storage pattern?  Alternatives to the storage pattern 
aside, I need ideas for the test rig, for which metrics to capture, and 
suggestions for tuning it.
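
For the metrics side, what I have in mind so far is something along these lines 
(assuming pg_stat_statements is available on the test box; the queries below are 
just a sketch and the limits are arbitrary):

  -- load the extension once per database (needs shared_preload_libraries = 'pg_stat_statements')
  CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

  -- zero the counters right before the run starts
  SELECT pg_stat_statements_reset();

  -- during/after the run: latency distribution per statement shape (9.5/9.6 column names)
  SELECT query, calls, mean_time, max_time, stddev_time,
         shared_blks_hit, shared_blks_read
    FROM pg_stat_statements
   ORDER BY max_time DESC
   LIMIT 20;

  -- per-table I/O: how much is really coming off disk vs. shared buffers
  SELECT relname, heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
    FROM pg_statio_user_tables
   ORDER BY heap_blks_read DESC
   LIMIT 20;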

 

In the next 24 hours, I will be sending ~1 trillion records to the test 
database.  Given the setup time involved, I'd rather have things configured 
properly on the first go.

 

Thanks!

-Greg 

---------------------

Greg, I ran another quick test on a wider table than you’ve described, but this 
time with 80 million rows, with core count, RAM, and SSD storage similar to 
what you’d have on that AWS EC2 instance.  This table had 7 columns (3 
integers, 3 text, 1 timestamptz) with an average row width of 157 bytes and one 
btree index on the integer pk column.  Using explain analyze, I picked one id 
value out of the 80m and ran a select * where id = x.  It did an index scan, 
with a planning time of 0.077 ms and an execution time of 0.254 ms.  I ran the 
query for a variety of widely spaced values (so the data was uncached) and the 
timing never changed.  This mirrors my general experience with PG – very fast 
reads on indexed queries.
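
For reference, the shape of that test was roughly the following (names here are 
made up, but the structure matches what's described above):

  -- 3 integers (one as the pk), 3 text, 1 timestamptz; the pk gives the btree index
  CREATE TABLE wide_test (
      id          integer PRIMARY KEY,
      a_int       integer,
      b_int       integer,
      a_txt       text,
      b_txt       text,
      c_txt       text,
      created_at  timestamptz
  );

  -- after loading ~80M rows, probe a single id and look at the plan and timing
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM wide_test WHERE id = 12345678;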

 

Summary:  I think your buckets can be WAY bigger than you are envisioning for 
the simple table design you’ve described.  I’m betting you can easily do 500 
million rows per bucket before approaching anything close to the 30ms max query 
time.
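
If you want to sanity-check that before committing to the full load, the same 
sketch scales up with generate_series (again, made-up names; building a bucket 
of this size will itself take a while):

  -- fill one synthetic bucket to the proposed size
  INSERT INTO wide_test
  SELECT g, g % 1000, g % 1000,
         md5(g::text), md5((g + 1)::text), md5((g + 2)::text),
         now()
    FROM generate_series(1, 500000000) AS g;

  -- \timing in psql shows the client-side round trip next to the server-side numbers
  \timing on
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM wide_test WHERE id = 314159265;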

 

Mike Sofen (Synthetic Genomics)
