Hi Craig, Fair point. For now, I mean "just fast" - which is 5-15 seconds, but I'd like to get it down to the 1-2 second range.
From the query I provided, I have approximately 30,000 unique keys (what I called primary_id) that I'm grouping by, and each key has a series of numerical values for each of the type_ids. I'm looking at averages, stddev, and other statistics across a few hundred type_ids (where agg.type_id in ...). The part of the query that varies is the user-specified type_ids, which makes it impossible to precalculate my statistics. (I've put a rough sketch of the query shape at the end of this message.)

I'd like this to eventually scale to a million unique keys and a thousand type_ids. For now, Postgres has been great for modeling the data, understanding where I hit performance bottlenecks, and providing a fast enough user interface. But I'm definitely starting to think about whether I can cache my data (with millions of keys and thousands of type_ids, the data might be too large), and whether to look into distributed databases (even though I can't precompute the stats, my queries are easily distributable across multiple processors, since each processor could take a batch of keys). I might even want to consider a column-oriented database - since my keys don't change often, I could potentially add new columns when there are new type_ids. I've been thinking of looking into memcached or hbase.

If you have any suggestions on which options I should explore, I'd greatly appreciate it. Sorry for veering off topic a bit from postgres.

thanks,
Anish

On Wed, Aug 17, 2011 at 10:32 PM, Craig Ringer <ring...@ringerc.id.au> wrote:

> On 18/08/2011 9:03 AM, Anish Kejariwal wrote:
>> Thanks for the help Pavel and Craig. I really appreciate it. I'm going
>> to try a couple of these different options (write a C function, use a SQL
>> function with case statements, and use plperl), so I can see which gives me
>> the realtime performance that I need, and works best for clean code in my
>> particular case.
>
> Do you really mean "realtime"? Or just "fast"?
>
> If you have strongly bounded latency requirements, any SQL-based,
> disk-based system is probably not for you. Especially not one that relies on
> a statistics-based query planner, caching, and periodic checkpoints. I'd be
> looking into in-memory databases designed for realtime environments where
> latency is critical.
>
> Hard realtime: If this system fails to respond within <x> milliseconds, all
> the time, every time, then something will go "smash" or "boom" expensively
> and unrecoverably.
>
> Soft realtime: If this system responds late, the late response is expensive
> or less useful. Frequent late responses are unacceptable but the occasional
> one might be endurable.
>
> Just needs to be fast: If it responds late, the user gets irritated because
> they're sitting and waiting for a response. Regular long stalls are
> unacceptable, but otherwise the user can put up with it. You're more
> concerned with average latency than maximum latency.
>
> --
> Craig Ringer
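
P.S. For reference, here's roughly the shape of the query I'm describing. This is a simplified sketch: the agg table, primary_id, and type_id names are from my earlier post, while the value column, the statistics shown, and the specific IN-list are just stand-ins.

    -- For each primary_id, aggregate the values of the user-selected type_ids.
    SELECT agg.primary_id,
           avg(agg.value)    AS avg_value,
           stddev(agg.value) AS stddev_value
    FROM   agg
    WHERE  agg.type_id IN (101, 102, 103)  -- user-specified list, a few hundred type_ids
    GROUP  BY agg.primary_id;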