On Jun 18, 2013, at 12:23 PM, John R Pierce <pie...@hogranch.com> wrote:

> On 6/18/2013 12:17 PM, Steven Schlansker wrote:
>> 1) The common value is not known at schema definition time, and may change 
>> (very slowly) over time.
> 
> how could a value thats constant in 95% of the rows change, unless you added 
> 20 times more rows with a new value (and for a big portion of the time, no 
> value would meet your 95% criteria).

The table is a denormalized version of some packed data.  The packed data is 
constant, but the extractor code changes over time.  The value in question is a 
"extractor version used to create this row".

There is a periodic job that attempts to find batches of rows that have fields 
extracted by an old version of the extractor.  These rows are re-extracted from 
the packed data.

So, most of the time the vast majority of rows will have CURRENT_VERSION as 
their version, and a small percentage of rows will have a previous version.  
The job will select rows where extracted_version != CURRENT_VERSION.  If this 
query is not indexed, even doing a periodic check if any rows exist takes an 
absurd amount of time.

At some point, the code changes, and CURRENT_VERSION gets incremented.  Rows 
then slowly (over a period of days / weeks) get "upgraded" to the new current 
version, in batches of thousands.


This is what I mean by a very slowly changing mostly-constant value.

Hope that makes sense,
Steven




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to