Re: [HACKERS] Getting rid of cmin and cmax

2006-09-19 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 I'm thinking of removing cmin and cmax, and keeping that information in 
 backend-private memory instead.

I don't believe you can remove *both*.  What's been discussed is
removing one of them, by letting the field represent a lookup key for an
in-memory structure in the infrequent case that xmin and xmax are both
the current xact.  You solve the table size problem by only having one
entry for each unique cmin/cmax pair in use.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Getting rid of cmin and cmax

2006-09-19 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
  
I'm thinking of removing cmin and cmax, and keeping that information in 
backend-private memory instead.



I don't believe you can remove *both*.  What's been discussed is
removing one of them, by letting the field represent a lookup key for an
in-memory structure in the infrequent case that xmin and xmax are both
the current xact.  You solve the table size problem by only having one
entry for each unique cmin/cmax pair in use.
  


That's another possibility, but removing both cmin and cmax has also 
been discussed. It's also mentioned in the TODO item:


 One possible solution is to create a phantom cid which represents a 
cmin/cmax pair and is stored in local memory. *Another idea is to store 
both cmin and cmax only in local memory.*


Saving 4 bytes per tuple with the phantom cid is nice, but saving 8 
bytes (assuming we get rid of xvac in the future, or overlay it with 
xmin for example) is even better.


--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Getting rid of cmin and cmax

2006-09-19 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Saving 4 bytes per tuple with the phantom cid is nice, but saving 8 
 bytes (assuming we get rid of xvac in the future, or overlay it with 
 xmin for example) is even better.

xvac is not going away, so that argument is unconvincing, and I don't
believe you can avoid blowing out local memory if you have to remember
each tuple's cmin/cmax separately.  (Notice that Manfred gave up on his
patch for lack of a spill-to-disk mechanism.)

I'm also concerned about loss of debug traceability if these fields
disappear entirely from disk --- it's been handy more than once to be
able to tell where in a complex transaction something happened.

Lastly, at least on machines with 8-byte MAXALIGN, removing four more
bytes from heap headers would save nothing.  So I'm not excited about
going through enormous pushups to get rid of both fields, when a far
simpler and better-performing mechanism suffices to remove one.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Getting rid of cmin and cmax

2006-09-19 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 Heikki Linnakangas [EMAIL PROTECTED] writes:
 Saving 4 bytes per tuple with the phantom cid is nice, but saving 8 
 bytes (assuming we get rid of xvac in the future, or overlay it with 
 xmin for example) is even better.

 xvac is not going away, so that argument is unconvincing, 

The use case for vacuum full has been narrowing steadily over time. I'm not
sure there's much left of it these days. In every case where it comes up on
list people are inevitably just confused and don't need it. In the few cases
where it's actually suggested we invariably recommend one of the various
commands that rewrite the entire table instead.

The main fundamental problem with rewriting the entire table is that you may
not have enough storage to do so and I think there may be better ways of
avoiding that than always storing 4 bytes in every tuple of every table just
in case we want to run vacuum full one day.

 and I don't believe you can avoid blowing out local memory if you have to
 remember each tuple's cmin/cmax separately. (Notice that Manfred gave up on
 his patch for lack of a spill-to-disk mechanism.)

spill to disk would certainly be a requirement. In the cases where it's needed
the data has to be stored somewhere, it just doesn't have to go through WAL
and take up table space when no other backend will every be interested in it.

 I'm also concerned about loss of debug traceability if these fields
 disappear entirely from disk --- it's been handy more than once to be
 able to tell where in a complex transaction something happened.

That's an interesting thought. I'm not sure what scenario you're picturing
though. You would still have xmin/xmax so it when do you need to look at cmin
to know when something happened?

You could certainly imagine a guc setting that would force the spill to disk
to always even when the data isn't needed. It seems like a poor substitute for
some dedicated tracing mechanism targeted specifically at the info needed for
debugging.

 Lastly, at least on machines with 8-byte MAXALIGN, removing four more
 bytes from heap headers would save nothing.  

Well there still exist plenty of 4-byte MAXALIGN machines out there. But the
more serious problem with this argument is that it comes up repeatedly. If we
pass up 4 bytes here and 4 bytes there pretty soon we're talking about real
data. Well, at least 8 bytes.

Getting rid of cmin, cmax, xvac and in some cases xmin (as discussed at the
code sprint) leaves us in much better shape even though any one of those
doesn't necessarily buy us much.

 So I'm not excited about going through enormous pushups to get rid of both
 fields, when a far simpler and better-performing mechanism suffices to
 remove one.

Frankly the whole phantom commandid thing sounds more complicated. You *still*
need a local state data structure that *still* has to spill to disk and now
it's much harder to characterize how large it will grow since it depends on
arbitrary combinations of cmin and cmax.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Getting rid of cmin and cmax

2006-09-19 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Frankly the whole phantom commandid thing sounds more complicated. You *still*
 need a local state data structure that *still* has to spill to disk and now
 it's much harder to characterize how large it will grow since it depends on
 arbitrary combinations of cmin and cmax.

Yeah, but it requires only one entry when a command processes
arbitrarily large numbers of tuples, so in practice it's not going to
need to spill to disk.  What Heikki wants to do will require an entry in
local memory for *each tuple* modified by a transaction.  That will ruin
performance on a regular basis.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Getting rid of cmin and cmax

2006-09-19 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Frankly the whole phantom commandid thing sounds more complicated. You 
 *still*
 need a local state data structure that *still* has to spill to disk and now
 it's much harder to characterize how large it will grow since it depends on
 arbitrary combinations of cmin and cmax.

 Yeah, but it requires only one entry when a command processes
 arbitrarily large numbers of tuples, so in practice it's not going to
 need to spill to disk.  

Well there's a reason we support commandids up to 4 billion. One of the common
use cases of bulk loading data in a series of individual inserts would cause
such a structure to spill to disk. As would ISAM style programming that steps
through a large data set and updates rows one by one.

 What Heikki wants to do will require an entry in local memory for *each
 tuple* modified by a transaction. That will ruin performance on a regular
 basis.

Sure, but that's the same amount of data all those useless cmin/cmaxes are
taking up now, actually it's less, it's only 6 bytes instead of 8. Even
assuming no clever data structures compress it. And that data doesn't have to
be fsynced so it can sit in filesystem cache and get spooled out to disk
lazily. If you touch a million records in your transaction in one of the
peculiar situations that require keeping the data you're talking about a few
megs of cache sacrificed during that one operation versus extra i/o on every
operation.

I should probably let Heikki defend his idea though. I guess I was just
feeling argumentative. I'm know he's thought through the same things.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Getting rid of cmin and cmax

2006-09-19 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Well there's a reason we support commandids up to 4 billion. One of the common
 use cases of bulk loading data in a series of individual inserts would cause
 such a structure to spill to disk. As would ISAM style programming that steps
 through a large data set and updates rows one by one.

You're missing the point though, which is that no memory entry is needed
at all unless the same tuple has been both inserted and deleted in the
current transaction.  Bulk data loads will incur zero entries in this
scheme, whereas what Heikki has in mind will incur an entry per tuple.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Getting rid of cmin and cmax

2006-09-19 Thread Heikki Linnakangas

Tom Lane kirjoitti:

Gregory Stark [EMAIL PROTECTED] writes:
Frankly the whole phantom commandid thing sounds more complicated. 
You *still*
need a local state data structure that *still* has to spill to disk 
and now
it's much harder to characterize how large it will grow since it 
depends on

arbitrary combinations of cmin and cmax.


Yeah, but it requires only one entry when a command processes
arbitrarily large numbers of tuples, so in practice it's not going to
need to spill to disk. What Heikki wants to do will require an entry in
local memory for *each tuple* modified by a transaction. That will ruin
performance on a regular basis.


As I tried to say in the first post, I believe we can actually get away 
without an entry in local memory in typical scenarios, including bulk 
data loads. Bulk updates are probably the biggest problem, but I think 
we could handle even them just fine with the right data structure.


--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Getting rid of cmin and cmax

2006-09-19 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 As I tried to say in the first post, I believe we can actually get away 
 without an entry in local memory in typical scenarios, including bulk 
 data loads.

I didn't find that argument very credible, particularly not the part
that assumes we know what the oldest snapshot is.  I remain of the
opinion that this is going to be a large, complicated (ie buggy),
poorly performing mechanism to hypothetically someday save 4 bytes
that, even if we do save them, are just going to disappear into
alignment padding on most newer servers.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Getting rid of cmin and cmax

2006-09-19 Thread Heikki Linnakangas

Tom Lane kirjoitti:

I'm also concerned about loss of debug traceability if these fields
disappear entirely from disk --- it's been handy more than once to be
able to tell where in a complex transaction something happened.
  


Sure. We'll just have to try to compensate that with debug messages 
etc., whatever scheme we choose.



Lastly, at least on machines with 8-byte MAXALIGN, removing four more
bytes from heap headers would save nothing.  So I'm not excited about
going through enormous pushups to get rid of both fields, when a far
simpler and better-performing mechanism suffices to remove one.
  


It would be a win on 32-bit architectures. And there has been discussion of
storing at least some data types unaligned.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Getting rid of cmin and cmax

2006-09-19 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane kirjoitti:
 I'm also concerned about loss of debug traceability if these fields
 disappear entirely from disk --- it's been handy more than once to be
 able to tell where in a complex transaction something happened.

 Sure. We'll just have to try to compensate that with debug messages 
 etc., whatever scheme we choose.

I think you completely misunderstand the context in which I'm concerned
about that --- handwaving about better debug messages doesn't assuage
the concern.  In fact, since I wrote that message I've had another
example of what stored cmin is good for: a few minutes ago, in
connection with Marc Evan's issue here,
http://archives.postgresql.org/pgsql-general/2006-09/msg00741.php
we were able to eliminate a theory about an FK trigger having modified a
row after its insertion by observing that the stored row still had cmin
= 0.  I've made use of cmin data in many prior cases to help identify
what's what: in lots of real applications, the cmin value tells you
exactly which kind of transaction inserted or modified the row, because
different transactions have different numbers of steps.  If cmin
vanishes into transient storage then after-the-fact forensic analysis
will be severely handicapped.  No amount of debug messages will make
up for data that's not there anymore when you realize you need it.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Getting rid of cmin and cmax

2006-09-19 Thread Bruce Momjian
Tom Lane wrote:
 Gregory Stark [EMAIL PROTECTED] writes:
  Frankly the whole phantom commandid thing sounds more complicated. You 
  *still*
  need a local state data structure that *still* has to spill to disk and now
  it's much harder to characterize how large it will grow since it depends on
  arbitrary combinations of cmin and cmax.
 
 Yeah, but it requires only one entry when a command processes
 arbitrarily large numbers of tuples, so in practice it's not going to
 need to spill to disk.  What Heikki wants to do will require an entry in
 local memory for *each tuple* modified by a transaction.  That will ruin
 performance on a regular basis.

Agreed.  TODO has:

* Merge xmin/xmax/cmin/cmax back into three header fields

  Before subtransactions, there used to be only three fields needed to
  store these four values. This was possible because only the current
  transaction looks at the cmin/cmax values. If the current transaction
  created and expired the row the fields stored where xmin (same as
  xmax), cmin, cmax, and if the transaction was expiring a row from a
  another transaction, the fields stored were xmin (cmin was not
  needed), xmax, and cmax. Such a system worked because a transaction
  could only see rows from another completed transaction. However,
  subtransactions can see rows from outer transactions, and once the
  subtransaction completes, the outer transaction continues, requiring
  the storage of all four fields. With subtransactions, an outer
  transaction can create a row, a subtransaction expire it, and when the
  subtransaction completes, the outer transaction still has to have
  proper visibility of the row's cmin, for example, for cursors.

  One possible solution is to create a phantom cid which represents a
  cmin/cmax pair and is stored in local memory.  Another idea is to
  store both cmin and cmax only in local memory.

I do see both the phantom idea and the local memory for all cmin/cmax
values.  I believe the phantom idea has the most merit.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match