Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2010-01-05 Thread Chetan Suttraway
Hi,

My suggestion is to keep two sets of histograms: one generated by running
ANALYZE, and another built dynamically from the entries logged by
insert/update/delete operations. I am not sure how difficult it is to read
such record details from the logs.

From the details mentioned here, my understanding is that the table data
(the timestamp column) grows incrementally, i.e. existing data is not
modified to any great extent and new data is merely appended to the old
data. In that case, the only work needed for analyze/statistics generation
is to merge the histogram of the newly added records into the old histogram.

If we can treat this case as similar to the merging of histograms in the
case of joins involving 2 tables, where histograms are generated for the
cartesian (result) node, then all we need to do is somehow generate a
temporary histogram for the new set of records and merge it with the old
histogram. The interesting column values from the new records could be read
from the logged entries. We should already have the histogram-merging code,
and I hope it would not be very costly to make these calls so as to feed the
planner.
(My further opinion is that we should calculate the cost of this histogram
generation and use it in costing in some way.)
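
(As a rough sketch of the temporary-histogram step -- the table, column, and
cutoff value are illustrative assumptions -- an equi-depth histogram over
just the rows appended since the last ANALYZE could be built with a window
function, and its bucket boundaries then merged into the stored
histogram_bounds:)

-- the literal timestamp stands in for the upper bound recorded at the last ANALYZE
SELECT bucket,
       max(created_at) AS bucket_upper_bound,
       count(*)        AS rows_in_bucket
FROM (
    SELECT created_at,
           ntile(10) OVER (ORDER BY created_at) AS bucket
    FROM events
    WHERE created_at > timestamp '2009-12-30 00:00'
) AS new_rows
GROUP BY bucket
ORDER BY bucket;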

We could also put a threshold on making this merge happen automatically:
only once the temporary histogram reaches some set threshold would it be
merged with the older histogram.

Please pass on your inputs.

Regards,
Chetan


On Wed, Dec 30, 2009 at 8:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Josh Berkus j...@agliodbs.com writes:
  My thoughts on dealing with this intelligently without a major change to
  statistics gathering went along these lines:

  1. add columns to pg_statistic to hold estimates of upper and lower
  bounds growth between analyzes.

 This seems like a fundamentally broken approach, first because time
 between analyzes is not even approximately a constant, and second
 because it assumes that we have a distance metric for all datatypes.
 (Note that convert_to_scalar does not assume that it can measure
 arbitrary distances, but only fractions *within* a histogram bucket;
 and even that is pretty shaky.)

 I don't have a better idea at the moment :-(

regards, tom lane




Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2010-01-05 Thread Robert Haas
On Tue, Jan 5, 2010 at 7:49 AM, Chetan Suttraway
chetan.suttra...@enterprisedb.com wrote:
 if we can treat this case as similar to that of merging of histograms in
 case of joins involving 2 tables and generating the histograms for the
 cartesian (result) node,

...which you can't, because it's totally different, so I think the
rest of this is a dead end.

...Robert



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2010-01-04 Thread Csaba Nagy
On Wed, 2009-12-30 at 17:16 +0100, Tom Lane wrote:
 I think the cleanest solution to this would be to make ANALYZE
 cheaper, perhaps by finding some way for it to work incrementally.

What if, when inserting/deleting a tuple, a random sample of them were
passed into an auto-analyze buffer?

Then a special process (the auto-analyze daemon) would process them and
update the statistics incrementally based on the new values found (which
might or might not be mathematically feasible).

The overhead for each backend process would be kept within limits by the
rate at which you randomly decide whether or not to send the change to the
analyze buffer.

The processing overhead would be kept within limits by the processing rate
of the auto-analyze process, which could be made to sleep periodically or
to span multiple processes (on multiprocessor systems).

If the buffer is full, you simply skip putting anything into it... so it
could also tune itself to a sustainable rate.
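
(A user-level approximation of the idea, just to make it concrete -- the
table, trigger, column, and the 1% rate are all illustrative assumptions: a
trigger forwards a random sample of inserted keys into a small side table
that a background job could fold into summary statistics.)

CREATE TABLE analyze_buffer (created_at timestamptz);

CREATE FUNCTION sample_insert() RETURNS trigger AS $$
BEGIN
    -- forward roughly 1% of inserted rows' key values to the buffer
    IF random() < 0.01 THEN
        INSERT INTO analyze_buffer VALUES (NEW.created_at);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_sample AFTER INSERT ON events
    FOR EACH ROW EXECUTE PROCEDURE sample_insert();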


Of course, as with all my other posts on hackers, this is mostly
hand-waving; I have no clue about the feasibility of any of this given the
current state of the code (which I haven't read -- I unfortunately find
myself hating to read C code beyond reason, and any C I've written so far
has amounted to copy-paste-modify).

Cheers,
Csaba.




Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2010-01-04 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 I've applied a patch to HEAD that does the above.  Can you test it to
 see how well it fixes your problem?

 Sure.  It'll take us a while to set up a test environment; the database
 is 1TB in size so converting it to 8.5 isn't quick.

Great.  When you have it set up, you might want to play with enabling
the mergejoinscansel change as well, and see if that is a net plus or
minus for you.

regards, tom lane



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2010-01-04 Thread Josh Berkus

 I've applied a patch to HEAD that does the above.  Can you test it to
 see how well it fixes your problem?

Sure.  It'll take us a while to set up a test environment; the database
is 1TB in size so converting it to 8.5 isn't quick.

Will report back.

--Josh




Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2010-01-04 Thread Josh Berkus

 Great.  When you have it set up, you might want to play with enabling
 the mergejoinscansel change as well, and see if that is a net plus or
 minus for you.

How would I *disable* it?

--Josh



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2010-01-04 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Great.  When you have it set up, you might want to play with enabling
 the mergejoinscansel change as well, and see if that is a net plus or
 minus for you.

 How would I *disable* it?

It's #ifdef NOT_USED in CVS at the moment.

regards, tom lane



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2010-01-03 Thread Tom Lane
I wrote:
 Actually, in the problematic cases, it's interesting to consider the
 following strategy: when scalarineqsel notices that it's being asked for
 a range estimate that's outside the current histogram bounds, first try
 to obtain the actual current max() or min() of the column value --- this
 is something we can get fairly cheaply if there's a btree index on the
 column.  If we can get it, plug it into the histogram, replacing the
 high or low bin boundary.  Then estimate as we currently do.  This would
 work reasonably well as long as re-analyzes happen at a time scale such
 that the histogram doesn't move much overall, ie, the number of
 insertions between analyzes isn't a lot compared to the number of rows
 per bin.  We'd have some linear-in-the-bin-size estimation error because
 the modified last or first bin actually contains more rows than other
 bins, but it would certainly work a lot better than it does now.

I've applied a patch to HEAD that does the above.  Can you test it to
see how well it fixes your problem?

Looking at the current uses of the histogram stats, there is another
place that could possibly benefit from this type of on-demand index
search: mergejoinscansel.  That code attempts to estimate how much of a
column actually needs to be scanned by a merge join, recognizing that a
merge will stop reading either input once the other is exhausted.
Having an accurate idea of the join keys' upper bounds is fairly
important here, since the supposed cost reduction from not scanning all
of a table disappears if there's even one large-valued key on the other
side of the join.  On the other hand, making use of index searches in
mergejoinscansel would put these index searches into the standard, very
heavily used join planning path, not into a corner case which is all
they are right now.  So I'm fairly nervous about the potential hit on
join planning time.  Is there anyone around who can do planner timing
measurements on complex join queries involving large tables?  If so,
please try CVS HEAD as-is and after enabling this bit in selfuncs.c:

/*
 * XXX It's very tempting to try to use the actual column min and max,
 * if we can get them relatively-cheaply with an index probe.  However,
 * since this function is called many times during join planning,
 * that could have unpleasant effects on planning speed.  Need more
 * investigation before enabling this.
 */
#ifdef NOT_USED
if (get_actual_variable_range(root, vardata, sortop, min, max))
return true;
#endif
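
(A minimal way to compare planner overhead before and after flipping that
#ifdef: time EXPLAIN from psql, since EXPLAIN plans the query without
executing it. The tables and join below are illustrative.)

\timing
EXPLAIN SELECT *
FROM orders o
JOIN order_lines l ON l.order_id = o.id
JOIN customers c ON c.id = o.customer_id
WHERE o.placed_at > now() - interval '1 day';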

regards, tom lane



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2010-01-01 Thread Simon Riggs
On Thu, 2009-12-31 at 21:29 +, Simon Riggs wrote:
 On Thu, 2009-12-31 at 15:18 -0500, Tom Lane wrote:
  Simon Riggs si...@2ndquadrant.com writes:
   Why not get both max() and min(), then rebase the histogram according to
   those values. That way the histogram can still move significantly and
   the technique will still work.
  
  Define rebase, keeping in mind that this has to work on datatypes that
  we don't have a distance metric for.
 
 Make it work differently according to whether we have, or not, just as
 we do elsewhere with stats. No point in limiting ourselves to the lowest
 common denominator, especially when the common case is integer keys and
 time datatypes.

This seemed obvious but I understand now that you meant we don't know
that from the datatype definition, so we can't do as I suggested, yet.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-31 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Actually, in the problematic cases, it's interesting to consider the
 following strategy: when scalarineqsel notices that it's being asked for
 a range estimate that's outside the current histogram bounds, first try
 to obtain the actual current max() or min() of the column value --- this
 is something we can get fairly cheaply if there's a btree index on the
 column.  If we can get it, plug it into the histogram, replacing the
 high or low bin boundary.  Then estimate as we currently do.  This would
 work reasonably well as long as re-analyzes happen at a time scale such
 that the histogram doesn't move much overall, ie, the number of
 insertions between analyzes isn't a lot compared to the number of rows
 per bin.  We'd have some linear-in-the-bin-size estimation error because
 the modified last or first bin actually contains more rows than other
 bins, but it would certainly work a lot better than it does now.

I know very little about statistics in general, but your proposal seems
straightforward enough for me to understand, and looks good: +1.

Regards,
-- 
dim



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-31 Thread Simon Riggs
On Wed, 2009-12-30 at 14:55 -0500, Tom Lane wrote:

 Actually, in the problematic cases, it's interesting to consider the
 following strategy: when scalarineqsel notices that it's being asked for
 a range estimate that's outside the current histogram bounds, first try
 to obtain the actual current max() or min() of the column value --- this
 is something we can get fairly cheaply if there's a btree index on the
 column.  If we can get it, plug it into the histogram, replacing the
 high or low bin boundary.  Then estimate as we currently do.  This would
 work reasonably well as long as re-analyzes happen at a time scale such
 that the histogram doesn't move much overall, ie, the number of
 insertions between analyzes isn't a lot compared to the number of rows
 per bin.  We'd have some linear-in-the-bin-size estimation error because
 the modified last or first bin actually contains more rows than other
 bins, but it would certainly work a lot better than it does now.

Histograms often move quickly, but they seldom change shape.

Why not get both max() and min(), then rebase the histogram according to
those values. That way the histogram can still move significantly and
the technique will still work.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-31 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 Why not get both max() and min(), then rebase the histogram according to
 those values. That way the histogram can still move significantly and
 the technique will still work.

Define rebase, keeping in mind that this has to work on datatypes that
we don't have a distance metric for.

regards, tom lane



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-31 Thread Simon Riggs
On Thu, 2009-12-31 at 15:18 -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  Why not get both max() and min(), then rebase the histogram according to
  those values. That way the histogram can still move significantly and
  the technique will still work.
 
 Define rebase, keeping in mind that this has to work on datatypes that
 we don't have a distance metric for.

Make it work differently according to whether we have, or not, just as
we do elsewhere with stats. No point in limiting ourselves to the lowest
common denominator, especially when the common case is integer keys and
time datatypes.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Josh Berkus j...@agliodbs.com writes:
 My thoughts on dealing with this intelligently without a major
 change to statistics gathering went along these lines:
 
 1. add columns to pg_statistic to hold estimates of upper and
 lower bounds growth between analyzes.
 
 This seems like a fundamentally broken approach
 
 I don't have a better idea at the moment :-(
 
It's been a while since I've been bitten by this issue -- the last
time was under Sybase.  The Sybase suggestion was to either add
dummy rows [YUCK!] to set the extreme bounds or to lie to the
optimizer by fudging the statistics after each generation.  Perhaps
we could do better by adding columns for high and low bounds to
pg_statistic.  These would not be set by ANALYZE, but
user-modifiable to cover exactly this problem?  NULL would mean
current behavior?
 
-Kevin



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 I don't have a better idea at the moment :-(
 
 It's been a while since I've been bitten by this issue -- the last
 time was under Sybase.  The Sybase suggestion was to either add
 dummy rows [YUCK!] to set the extreme bounds or to lie to the
 optimizer by fudging the statistics after each generation.  Perhaps
 we could do better by adding columns for high and low bounds to
 pg_statistic.  These would not be set by ANALYZE, but
 user-modifiable to cover exactly this problem?  NULL would mean
 current behavior?

Well, the problem Josh has got is exactly that a constant high bound
doesn't work.

What I'm wondering about is why he finds that re-running ANALYZE
isn't an acceptable solution.  It's supposed to be a reasonably
cheap thing to do.

I think the cleanest solution to this would be to make ANALYZE
cheaper, perhaps by finding some way for it to work incrementally.

regards, tom lane



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Joshua D. Drake
On Wed, 30 Dec 2009 11:16:45 -0500, Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 I don't have a better idea at the moment :-(
  
 It's been a while since I've been bitten by this issue -- the last
 time was under Sybase.  The Sybase suggestion was to either add
 dummy rows [YUCK!] to set the extreme bounds or to lie to the
 optimizer by fudging the statistics after each generation.  Perhaps
 we could do better by adding columns for high and low bounds to
 pg_statistic.  These would not be set by ANALYZE, but
 user-modifiable to cover exactly this problem?  NULL would mean
 current behavior?
 
 Well, the problem Josh has got is exactly that a constant high bound
 doesn't work.
 
 What I'm wondering about is why he finds that re-running ANALYZE
 isn't an acceptable solution.  It's supposed to be a reasonably
 cheap thing to do.

What makes ANALYZE cheap is two things:

1. It uses read-only bandwidth (for the most part), which is the kind of
bandwidth we have the most of
2. It doesn't take a lock that bothers anything

On the other hand ANALYZE also:

1. Uses lots of memory
2. Lots of processor
3. Can take a long time

We normally don't notice because most sets won't incur a penalty. We got a
customer who has a single table that is over 1TB in size... We notice.
Granted that is the extreme but it would only take a quarter of that size
(which is common) to start seeing issues.

 
 I think the cleanest solution to this would be to make ANALYZE
 cheaper, perhaps by finding some way for it to work incrementally.
 

That could be interesting. What about a running statistics set that has some
kind of threshold? What I mean is: we run our normal ANALYZE, but we can
mark a table HOT (yeah, bad term). If we mark a table HOT, statistics are
generated on the fly for the planner and updated every X interval, and
perhaps then written out at a checkpoint?

This is just off the top of my head.

JD


-- 
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 Well, the problem Josh has got is exactly that a constant high
 bound doesn't work.
 
I thought the problem was that the high bound in the statistics fell
too far below the actual high end in the data.  This tends (in my
experience) to be much more painful than an artificially extended
high end in the statistics.  (YMMV, of course.)
 
 What I'm wondering about is why he finds that re-running ANALYZE
 isn't an acceptable solution.  It's supposed to be a reasonably
 cheap thing to do.
 
Good point.  We haven't hit this problem in PostgreSQL precisely
because we can run ANALYZE often enough to prevent the skew from
becoming pathological.
 
 I think the cleanest solution to this would be to make ANALYZE
 cheaper, perhaps by finding some way for it to work incrementally.
 
Yeah, though as you say above, it'd be good to know why frequent
ANALYZE is a problem as it stands.
 
-Kevin



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Greg Smith

Joshua D. Drake wrote:

We normally don't notice because most sets won't incur a penalty. We got a
customer who has a single table that is over 1TB in size... We notice.
Granted that is the extreme but it would only take a quarter of that size
(which is common) to start seeing issues.
  


Right, and the only thing that makes this case less painful is that you 
don't really need the stats to be updated quite as often in situations 
with that much data.  If, say, your stats say there's 2B rows in the 
table but there's actually 2.5B, that's a big error, but unlikely to 
change the types of plans you get.  Once there's millions of distinct 
values it takes a big change for plans to shift, etc.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 Right, and the only thing that makes this case less painful is that you 
 don't really need the stats to be updated quite as often in situations 
 with that much data.  If, say, your stats say there's 2B rows in the 
 table but there's actually 2.5B, that's a big error, but unlikely to 
 change the types of plans you get.  Once there's millions of distinct 
 values it takes a big change for plans to shift, etc.

Normally, yeah.  I think Josh's problem is that he's got
performance-critical queries that are touching the moving edge of the
data set, and so the part of the stats that are relevant to them is
changing fast, even though in an overall sense the table contents might
not be changing much.

regards, tom lane



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote:
 
 If, say, your stats say there's 2B rows in the table but there's
 actually 2.5B, that's a big error, but unlikely to change the
 types of plans you get.  Once there's millions of distinct values
  it takes a big change for plans to shift, etc.
 
Well, the exception to that is if the stats say that your highest
value is x, and there are actually 500 million rows with values
greater than x, you can get some very bad plans for queries
requiring a range of values above x.
 
-Kevin



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Alvaro Herrera
Tom Lane escribió:
 Greg Smith g...@2ndquadrant.com writes:
  Right, and the only thing that makes this case less painful is that you 
  don't really need the stats to be updated quite as often in situations 
  with that much data.  If, say, your stats say there's 2B rows in the 
  table but there's actually 2.5B, that's a big error, but unlikely to 
  change the types of plans you get.  Once there's millions of distinct 
  values it takes a big change for plans to shift, etc.
 
 Normally, yeah.  I think Josh's problem is that he's got
 performance-critical queries that are touching the moving edge of the
 data set, and so the part of the stats that are relevant to them is
 changing fast, even though in an overall sense the table contents might
 not be changing much.

Maybe only tangentially related: if this was a setup partitioned by a
timestamp, it would be very useful to be able to analyze only the
current partition and have updated stats for the parent relation as
well.  However AFAICT with your proposed changes in this area this would
not work, right?  You'd need an analyze on the parent relation, which is
painful.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Tom Lane escribió:
 Normally, yeah.  I think Josh's problem is that he's got
 performance-critical queries that are touching the moving edge of the
 data set, and so the part of the stats that are relevant to them is
 changing fast, even though in an overall sense the table contents might
 not be changing much.

 Maybe only tangentially related: if this was a setup partitioned by a
 timestamp, it would be very useful to be able to analyze only the
 current partition and have updated stats for the parent relation as
 well.  However AFAICT with your proposed changes in this area this would
 not work, right?  You'd need an analyze on the parent relation, which is
 painful.

Yeah, I was just thinking about that myself.  The parent-level ANALYZE
would approximately double the work involved, assuming that your total
data set is large enough to max out the number of blocks sampled.
So it'd be painful but not catastrophic.  Maybe the way to think about
the incremental update problem is to find a way to let ANALYZE
calculate parent-relation stats from the stats of the individual
partitions.  Not that I know how to do that either, but at least it's
a clearly stated task.

regards, tom lane



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Greg Stark
On Wed, Dec 30, 2009 at 4:31 PM, Joshua D. Drake j...@commandprompt.com wrote:
 On the other hand ANALYZE also:

 1. Uses lots of memory
 2. Lots of processor
 3. Can take a long time

 We normally don't notice because most sets won't incur a penalty. We got a
 customer who has a single table that is over 1TB in size... We notice.
 Granted that is the extreme but it would only take a quarter of that size
 (which is common) to start seeing issues.

I'm a bit puzzled by people's repeated suggestion here that large
tables take a long time to analyze. The sample that ANALYZE takes to
generate statistics is not heavily influenced by the size of the
table. Your 1TB table should take basically the same amount of time as
a 1GB table or a 1MB table (if it wasn't already in cache).

Unless the reason why it's 1TB is that the columns are extremely wide
rather than that it has a lot of rows? Or unless you've raised the
statistics target in (a misguided*) belief that larger tables require
larger statistics targets to achieve the same level of accuracy. Or
unless when you say ANALYZE you're really running VACUUM ANALYZE.

[*] except for ndistinct estimates :(
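
(A concrete way to see this, with illustrative table and column names: the
number of rows sampled is driven by the per-column statistics target --
analyze.c samples on the order of 300 x the target -- not by the table size.)

SHOW default_statistics_target;     -- e.g. 10  => ~3000 sampled rows
ALTER TABLE big_table ALTER COLUMN created_at SET STATISTICS 100;
ANALYZE big_table;                  -- now ~30000 sampled rows, regardless of table size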

-- 
greg



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Joshua D. Drake
On Wed, 30 Dec 2009 18:42:38 +, Greg Stark gsst...@mit.edu wrote:

 I'm a bit puzzled by people's repeated suggestion here that large
 tables take a long time to analyze. The sample analyze takes to
 generate statistics is not heavily influenced by the size of the
 table. Your 1TB table should take basically the same amount of time as
 a 1GB table or a 1MB table (if it wasn't already in cache).

No. 

postgres=# analyze verbose test_one_million;
INFO:  analyzing public.test_one_million
INFO:  test_one_million: scanned 3000 of 4425 pages, containing 677950
live rows and 0 dead rows; 3000 rows in sample, 76 estimated total rows
ANALYZE
Time: 168.009 ms
postgres=# analyze verbose test_one_million;
INFO:  analyzing public.test_one_million
INFO:  test_one_million: scanned 3000 of 4425 pages, containing 677950
live rows and 0 dead rows; 3000 rows in sample, 76 estimated total rows
ANALYZE
Time: 104.006 ms
postgres=# analyze verbose test_ten_million;
INFO:  analyzing public.test_ten_million
INFO:  test_ten_million: scanned 3000 of 44248 pages, containing 678000
live rows and 0 dead rows; 3000 rows in sample, 1048 estimated total
rows
ANALYZE
Time: 20145.148 ms
postgres=# analyze verbose test_ten_million;
INFO:  analyzing public.test_ten_million
INFO:  test_ten_million: scanned 3000 of 44248 pages, containing 678000
live rows and 0 dead rows; 3000 rows in sample, 1048 estimated total
rows
ANALYZE
Time: 18481.053 ms
postgres=# analyze verbose test_ten_million;
INFO:  analyzing public.test_ten_million
INFO:  test_ten_million: scanned 3000 of 44248 pages, containing 678000
live rows and 0 dead rows; 3000 rows in sample, 1048 estimated total
rows
ANALYZE
Time: 17653.006 ms

The test_one_million table is very quick whether it's in cache or not. I
don't think the ten million is actually able to fit in cache (small box),
but either way, if you look at the on-disk number for the one million
(168 ms) versus the on-disk numbers for the ten million, they are vastly
different.

postgres=# select
pg_size_pretty(pg_total_relation_size('test_one_million'));
 pg_size_pretty
----------------
 35 MB
(1 row)

Time: 108.006 ms
postgres=# select
pg_size_pretty(pg_total_relation_size('test_ten_million'));
 pg_size_pretty
----------------
 346 MB
(1 row)


 
 Unless the reason why it's 1TB is that the columns are extremely wide
 rather than that it has a lot of rows?

I should have qualified, yes they are very wide.

JD

-- 
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Peter Eisentraut
On tis, 2009-12-29 at 22:08 -0500, Tom Lane wrote:
 This seems like a fundamentally broken approach, first because time
 between analyzes is not even approximately a constant, and second
 because it assumes that we have a distance metric for all datatypes.

Maybe you could compute a correlation between the column values and the
transaction numbers to recognize a continuously advancing column.  It
wouldn't tell you much about how fast they are advancing, but at least
the typical use cases of serial and current timestamp columns should
clearly stick out.  And then instead of assuming that a value beyond the
histogram bound doesn't exist, you assume for example the average
frequency, which should be pretty good for the serial and timestamp
cases.  (Next step: Fourier analysis ;-) )
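
(Something along these lines can already be eyeballed by hand; using xmin as
a stand-in for the transaction number is a hack that wraps around, and the
table and column names are illustrative.)

SELECT corr(extract(epoch FROM created_at), xmin::text::bigint) AS c
FROM events;
-- a result near 1.0 suggests a continuously advancing column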




Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On tis, 2009-12-29 at 22:08 -0500, Tom Lane wrote:
 This seems like a fundamentally broken approach, first because time
 between analyzes is not even approximately a constant, and second
 because it assumes that we have a distance metric for all datatypes.

 Maybe you could compute a correlation between the column values and the
 transaction numbers to recognize a continuously advancing column.  It
 wouldn't tell you much about how fast they are advancing, but at least
 the typical use cases of serial and current timestamp columns should
 clearly stick out.  And then instead of assuming that a value beyond the
 histogram bound doesn't exist, you assume for example the average
 frequency, which should be pretty good for the serial and timestamp
 cases.  (Next step: Fourier analysis ;-) )

Actually, the histogram hasn't got much of anything to do with estimates
of the number of occurrences of a single value.

Josh hasn't shown us his specific problem query, but I would bet that
it's roughly like WHERE update_time > now() - interval 'something',
that is, the estimate that's problematic is an inequality not an
equality.  When the range being asked for is outside the histogram
bounds, it really is rather difficult to come up with a reasonable
estimate --- you'd need a specific idea of how far outside the upper
bound it is, how fast the upper bound has been advancing, and how long
it's been since the last analyze.  (I find the last bit particularly
nasty, because it will mean that plans change even when nothing is
changing in the database.)

[ thinks for awhile ... ]

Actually, in the problematic cases, it's interesting to consider the
following strategy: when scalarineqsel notices that it's being asked for
a range estimate that's outside the current histogram bounds, first try
to obtain the actual current max() or min() of the column value --- this
is something we can get fairly cheaply if there's a btree index on the
column.  If we can get it, plug it into the histogram, replacing the
high or low bin boundary.  Then estimate as we currently do.  This would
work reasonably well as long as re-analyzes happen at a time scale such
that the histogram doesn't move much overall, ie, the number of
insertions between analyzes isn't a lot compared to the number of rows
per bin.  We'd have some linear-in-the-bin-size estimation error because
the modified last or first bin actually contains more rows than other
bins, but it would certainly work a lot better than it does now.
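
(What makes the probe fairly cheap is that a btree on the column lets the
planner satisfy min()/max() with a single index descent -- a LIMIT over a
forward or backward index scan -- instead of a seqscan. Table and index
names below are illustrative.)

CREATE INDEX events_created_at_idx ON events (created_at);
SELECT min(created_at) FROM events;   -- one forward index descent
SELECT max(created_at) FROM events;   -- one backward index descent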

regards, tom lane



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Chris Browne
j...@commandprompt.com (Joshua D. Drake) writes:
 On the other hand ANALYZE also:

 1. Uses lots of memory
 2. Lots of processor
 3. Can take a long time

  We normally don't notice because most sets won't incur a penalty. We got a
  customer who has a single table that is over 1TB in size... We notice.
  Granted that is the extreme but it would only take a quarter of that size
  (which is common) to start seeing issues.

I find it curious that ANALYZE *would* take a long time to run.

After all, its sampling strategy means that, barring having SET
STATISTICS to some ghastly high number, it shouldn't need to do
materially more work to analyze a 1TB table than is required to analyze
a 1GB table.

With the out-of-the-box (which may have changed without my notice ;-))
default of 10 bars in the histogram, it should search for 30K rows,
which, while not free, doesn't get enormously more expensive as tables
grow.
-- 
cbbrowne,@,gmail.com
http://linuxfinances.info/info/linuxdistributions.html
Rules  of  the  Evil  Overlord   #179.  I  will  not  outsource  core
functions. http://www.eviloverlord.com/



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Greg Stark
Well, that's interesting, because they claim to be doing exactly the same
amount of I/O in terms of pages.

In the first case it's reading 3/4 of the table so it's effectively doing a 
sequential scan. In the second case it's only scanning 7.5% so you would expect 
it to be slower but not that much slower.

If, as you say, the rows are very wide, then the other part of the equation
will be TOAST table I/O. I'm not sure what it would look like, but I bet
analyze isn't optimized to handle it well -- not much of postgres really
knows about TOAST. It'll be accessing the same number of TOAST records, but
out of a much bigger TOAST table.
--
greg


Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Tom Lane
Chris Browne cbbro...@acm.org writes:
 I find it curious that ANALYZE *would* take a long time to run.

 After all, its sampling strategy means that, barring having SET
 STATISTICS to some ghastly high number, it shouldn't need to do
 materially more work to analyze a 1TB table than is required to analyze
 a 1GB table.

Right.  The example JD quotes in this thread compares a 35MB table
to a 350MB one, and the difference is all about having crossed the
threshold of what would fit in his available RAM.  There isn't going
to be much difference in the ANALYZE time for big versus very big
tables.  (There might, however, be a difference in the quality of
the resulting stats :-()

regards, tom lane



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Greg Smith

Joshua D. Drake wrote:

postgres=# analyze verbose test_ten_million;
INFO:  analyzing public.test_ten_million
INFO:  test_ten_million: scanned 3000 of 44248 pages, containing 678000
live rows and 0 dead rows; 3000 rows in sample, 1048 estimated total
rows
ANALYZE
Time: 20145.148 ms
  


At ever larger table sizes, this would turn into 3000 random seeks
all over the drive, one at a time because there's no async I/O here to 
queue requests better than that for this access pattern.  Let's say they 
take 10ms each, not an unrealistic amount of time on current hardware.  
That's 30 seconds, best case, which is similar to what JD's example is 
showing even on a pretty small data set.  Under load it could easily 
take over a minute, hammering the disks the whole time, and in a TOAST 
situation you're doing even more work.  It's not outrageous and it 
doesn't scale linearly with table size, but it's not something you want 
to happen any more than you have to either--consider the poor client who 
is trying to get their work done while that is going on.


On smaller tables, you're both more likely to grab a useful next page 
via readahead, and to just have the data you need cached in RAM 
already.  There's a couple of shelves in the response time to finish 
ANALYZE as you exceed L1/L2 CPU cache size and RAM size, then it trails 
downward as the seeks get longer and longer once the data you need is 
spread further across the disk(s).  That the logical beginning of a 
drive is much faster than the logical end doesn't help either.  I should 
generate that graph again one day somewhere I can release it at...


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Craig Ringer

On 31/12/2009 12:33 AM, Kevin Grittner wrote:

Tom Lane t...@sss.pgh.pa.us wrote:


Well, the problem Josh has got is exactly that a constant high
bound doesn't work.


I thought the problem was that the high bound in the statistics fell
too far below the actual high end in the data.  This tends (in my
experience) to be much more painful than an artificially extended
high end in the statistics.  (YMMV, of course.)


What I'm wondering about is why he finds that re-running ANALYZE
isn't an acceptable solution.  It's supposed to be a reasonably
cheap thing to do.


Good point.  We haven't hit this problem in PostgreSQL precisely
because we can run ANALYZE often enough to prevent the skew from
becoming pathological.


While regular ANALYZE seems to be pretty good ... is it insane to 
suggest determining the min/max bounds of problem columns by looking at 
a btree index on the column in ANALYZE, instead of relying on random 
data sampling? An ANALYZE that didn't even have to scan the indexes but 
just look at the ends might be something that could be run much more 
frequently with less I/O and memory cost than a normal ANALYZE, just to 
selectively update key stats that are an issue for such continuously 
advancing columns.


--
Craig Ringer



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-30 Thread Craig Ringer



While regular ANALYZE seems to be pretty good ... is it insane to
suggest determining the min/max bounds of problem columns by looking at
a btree index on the column in ANALYZE, instead of relying on random
data sampling? An ANALYZE that didn't even have to scan the indexes but
just look at the ends might be something that could be run much more
frequently with less I/O and memory cost than a normal ANALYZE, just to
selectively update key stats that are an issue for such continuously
advancing columns.


... which Tom Lane already raised in a smarter way in a message I hadn't 
read yet. Sorry for the noise.


--
Craig Ringer



[HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-29 Thread Josh Berkus
All,

One of our clients is having query plan issues with a table with a
continuously advancing timestamp column (i.e. one with default now()).
The newest rows, which are the most in demand, are always estimated to
be fewer than they are, or even non-existent.  As a result, the user has
to analyze the table every hour ... and it's a very large table.

I've seen this in a lot of other databases, both with timestamp columns
and with SERIALs -- both of which are very common table structures.
From my reading of the planner code, the problem seems to be the
histogram bounds ... if a requested value is above the high bound, it's
assumed to be extremely uncommon or not to exist.  This leads to bad plans
if analyze hasn't been run very recently.
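
(A minimal reproduction of the pattern, with made-up table and column names:
after an ANALYZE, rows newer than the stored histogram's upper bound are
estimated as nearly nonexistent until the next ANALYZE.)

CREATE TABLE events (
    id         serial PRIMARY KEY,
    created_at timestamptz DEFAULT now(),
    payload    text
);
-- load a large batch of rows, then:
ANALYZE events;
-- keep inserting for a while, then ask for the newest data:
EXPLAIN SELECT * FROM events
WHERE created_at > now() - interval '1 hour';
-- The row estimate collapses toward 1, so the planner may choose a plan
-- that behaves badly once that hour actually contains many rows.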

My thoughts on dealing with this intelligently without a major change to
statistics gathering went along these lines:

1. add columns to pg_statistic to hold estimates of upper and lower
bounds growth between analyzes.

2. every time analyze is run, populate these columns with 1/2 of the
proportion of values above or below the previously stored bounds,
averaged with the existing value for the new columns.

3. use this factor instead of the existing algorithm to calculate the
row estimate for out-of-bounds values.

This is obviously a very rough idea, but I wanted to get feedback on the
general problem and my approach before going further with it.

Thanks!

--Josh Berkus



Re: [HACKERS] Thoughts on statistics for continuously advancing columns

2009-12-29 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 My thoughts on dealing with this intelligently without a major change to
  statistics gathering went along these lines:

 1. add columns to pg_statistic to hold estimates of upper and lower
 bounds growth between analyzes.

This seems like a fundamentally broken approach, first because time
between analyzes is not even approximately a constant, and second
because it assumes that we have a distance metric for all datatypes.
(Note that convert_to_scalar does not assume that it can measure
arbitrary distances, but only fractions *within* a histogram bucket;
and even that is pretty shaky.)

I don't have a better idea at the moment :-(

regards, tom lane
