Hi Craig,

Fair point.  For now, I mean "just fast": queries currently take 5-15
seconds, and I'd like to get that down to the 1-2 second range.

From the query I provided, I have approximately 30,000 unique keys (what I
called primary_id) that I'm grouping by, and each key has a series of
numerical values for each of the type_ids.  I'm looking at averages, stddev
and other statistics across a few hundred type_ids (where agg.type_id in
....).  The part of the query that varies is the user-specified list of
type_ids, which makes it impossible to precalculate my statistics.
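
To make the shape concrete, the query looks roughly like this (the table and
column names here are just illustrative, not my actual schema):

    SELECT agg.primary_id,
           avg(agg.value)    AS avg_value,
           stddev(agg.value) AS stddev_value
    FROM   agg
    WHERE  agg.type_id IN (101, 102, 103)  -- user-specified list; varies per request
    GROUP  BY agg.primary_id;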

I'd like this to eventually scale to a million unique keys and a thousand
type_ids.

For now, Postgres has been great for modeling the data, understanding where I
hit performance bottlenecks, and providing a fast enough user interface.  But
I'm definitely starting to think about whether I can cache my data (with
millions of keys and thousands of type_ids, the data might be too large),
and whether to look into distributed databases (even though I can't
precompute the stats, my queries are easily distributable across multiple
processors, since each processor could take a batch of keys - see the sketch
below).  I might even want to consider a column-oriented database - since my
keys don't change often, I could potentially add new columns when there are
new type_ids.
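
For example, each processor could run something like the following over its
own slice of keys (purely a sketch - the table name and key boundaries are
illustrative):

    -- One worker's batch of primary_ids; another worker takes the next range.
    SELECT agg.primary_id,
           avg(agg.value)    AS avg_value,
           stddev(agg.value) AS stddev_value
    FROM   agg
    WHERE  agg.type_id IN (101, 102, 103)
      AND  agg.primary_id BETWEEN 1 AND 10000
    GROUP  BY agg.primary_id;

The per-key results from each batch could then just be concatenated, since
each key appears in exactly one batch.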

I've been thinking of looking into memcached or HBase.  If you have any
suggestions on which options I should explore, I'd greatly appreciate it.

Sorry for veering off topic a bit from Postgres.

thanks,
Anish

On Wed, Aug 17, 2011 at 10:32 PM, Craig Ringer <ring...@ringerc.id.au> wrote:

> On 18/08/2011 9:03 AM, Anish Kejariwal wrote:
>
>> Thanks for the help Pavel and Craig.  I really appreciate it.  I'm going
>> to try a couple of these different options (write a c function, use a sql
>> function with case statements, and use plperl), so I can see which gives me
>> the realtime performance that I need, and works best for clean code in my
>> particular case.
>>
> Do you really mean "realtime"? Or just "fast"?
>
> If you have strongly bounded latency requirements, any SQL-based,
> disk-based system is probably not for you. Especially not one that relies on
> a statistics-based query planner, caching, and periodic checkpoints. I'd be
> looking into in-memory databases designed for realtime environments where
> latency is critical.
>
> Hard realtime: If this system fails to respond within <x> milliseconds, all
> the time, every time, then something will go "smash" or "boom" expensively
> and unrecoverably.
>
> Soft realtime: If this system responds late, the late response is expensive
> or less useful. Frequent late responses are unacceptable but the occasional
> one might be endurable.
>
> Just needs to be fast: If it responds late, the user gets irritated because
> they're sitting and waiting for a response. Regular long stalls are
> unacceptable, but otherwise the user can put up with it. You're more
> concerned with average latency than maximum latency.
>
> --
> Craig Ringer
>
