Re: [PERFORM] One tuple per transaction

2005-03-18 Thread Hannu Krosing
On Sat, 2005-03-12 at 14:05 -0800, Josh Berkus wrote:
> Tambet,
>
> > In one of our applications we have a database function which
> > recalculates COGS (cost of goods sold) for a certain period. This
> > involves deleting a bunch of rows from one table, inserting them
> > again in the correct order, and updating them one by one (sometimes
> > one row twice) to reflect the current state. The problem is that
> > this generates an enormous number of tuples in that table.
>
> Sounds like you have an application design problem ...  how about
> rewriting your function so it's a little more sensible?

Also, you could at least use a temp table for the intermediate steps.
Since temp tables are not WAL-logged, that alone saves the WAL traffic.
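
Something along these lines shows the idea (a hedged sketch; the cogs
table, its period column, and the staging steps are made up, and the
actual per-row recalculation is elided):

  BEGIN;

  -- Writes to a temp table are not WAL-logged, so all the per-row
  -- churn happens off the real table.
  CREATE TEMP TABLE cogs_work ON COMMIT DROP AS
      SELECT * FROM cogs WHERE period = '2005-02';

  -- ... delete / re-insert / update rows in cogs_work here ...

  -- Write the final state back to the real table in one pass.
  DELETE FROM cogs WHERE period = '2005-02';
  INSERT INTO cogs SELECT * FROM cogs_work;

  COMMIT;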

-- 
Hannu Krosing [EMAIL PROTECTED]



Re: [PERFORM] multi-column index

2005-03-18 Thread Manfred Koizar
On Thu, 17 Mar 2005 23:48:30 -0800, Ron Mayer [EMAIL PROTECTED] wrote:
> Would this also help estimates in the case where values in a table
> are tightly clustered, though not in strictly ascending or descending
> order?

No, I was just expanding the existing notion of correlation from single
columns to index tuples.

> For example, address data has many fields that are related
> to each other (postal codes, cities, states/provinces).

This looks like a case for cross-column statistics, though you might not
have meant it as such.  I guess what you're talking about can also be
described with a single column.  In a list like

  3 3 ... 3 1 1 ... 1 7 7 ... 7 4 4 ... 4 ...

equal items are clustered together but the values are not correlated
to their positions.  This would require a whole new column
characteristic, something like the probability that we find the same
value in adjacent heap tuples, or the number of different values we can
expect on one heap page.  The latter might even be easy to compute
during ANALYSE.
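
Just to make that concrete: on a current server you can eyeball such a
number for a column by grouping on the heap page number hidden in ctid
(t and col are placeholder names; ANALYSE itself would derive this from
its sample rather than scanning the whole table):

  SELECT avg(nvals) AS distinct_values_per_heap_page
  FROM  (SELECT (ctid::text::point)[0] AS pageno,   -- heap page number
                count(DISTINCT col)    AS nvals
         FROM   t
         GROUP  BY 1) AS per_page;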

Servus
 Manfred



Re: [PERFORM] multi-column index

2005-03-18 Thread Manfred Koizar
On Thu, 17 Mar 2005 13:15:32 -0500, Tom Lane [EMAIL PROTECTED] wrote:
> I am coming around to the view that we really do need to calculate
> index-specific correlation numbers,

Correlation is a first step.  We might also want distribution
information like number of distinct index tuples and histograms.
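
For single columns ANALYZE gathers exactly that today: n_distinct,
histogram_bounds, and correlation are all visible in pg_stats.  The
index-tuple versions would be the same kind of numbers, computed over
whole index keys.  For instance (t is a placeholder table name):

  SELECT attname, n_distinct, correlation, histogram_bounds
  FROM   pg_stats
  WHERE  schemaname = 'public' AND tablename = 't';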

> Now, as to the actual mechanics of getting the numbers: the above link
> seems to imply reading the whole index in index order.

That turned out to be surprisingly easy to implement as a proof of
concept (no need to look at data values, no operator lookups, etc.).
As it's good enough for my use cases, I never bothered to change it.

> Which is a hugely expensive proposition for a big index,

Just a thought:  Could the gathering of the sample be integrated into
the bulk delete phase of VACUUM?  (I know, ANALYSE is not always
performed as an option to VACUUM, and VACUUM might not even have to
delete any index tuples.)

> We need a way to get the number from a small sample of pages.

I had better (or at least different) ideas at that time, like walking
down the tree, but somehow lost impetus :-(

> The idea I was toying with was to recalculate the index keys for the
> sample rows that ANALYZE already acquires, and then compare/sort
> those.

This seems to be the approach that fits most naturally into what we
have now.
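
As a rough illustration of the statistic itself (not of the ANALYZE
internals), on a reasonably modern server one could correlate index-key
order with heap order for a hypothetical index on (a, b) like this:

  SELECT corr(key_rank::float8, heap_rank::float8) AS index_correlation
  FROM  (SELECT row_number() OVER (ORDER BY a, b) AS key_rank,
                row_number() OVER (ORDER BY ctid) AS heap_rank
         FROM   t) AS s;

A value near 1 means index order matches physical order; a value near 0
means the table is unclustered with respect to that key.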

> This is moderately expensive CPU-wise though, and it's also not
> clear what compare/sort means for non-btree indexes.

Nothing.  We'd need some notion of clusteredness instead of
correlation.  Cf. my answer to Ron earlier in this thread.

BTW, the more I think about it, the more I come to the conclusion that
once the planner starts to account for clusteredness, random_page_cost
will have to be raised.

Servus
 Manfred
