subject:"\[HACKERS\] Remove xmin and cmin from frozen tuples"

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-16 Thread Bruce Momjian


This is now in the TODO list:

* Merge xmin/xmax/cmin/cmax back into three header fields

  Before subtransactions, there used to be only three fields needed to
  store these four values. This was possible because only the current
  transaction looks at the cmin/cmax values. If the current transaction
  created and expired the row the fields stored where xmin (same as
  xmax), cmin, cmax, and if the transaction was expiring a row from a
  another transaction, the fields stored were xmin (cmin was not
  needed), xmax, and cmax. Such a system worked because a transaction
  could only see rows from another completed transaction. However,
  subtransactions can see rows from outer transactions, and once the
  subtransaction completes, the outer transaction continues, requiring
  the storage of all four fields. With subtransactions, an outer
  transaction can create a row, a subtransaction expire it, and when the
  subtransaction completes, the outer transaction still has to have
  proper visibility of the row's cmin, for example, for cursors.

  One possible solution is to create a phantom cid which represents a
  cmin/cmax pair and is stored in local memory.  Another idea is to
  store both cmin and cmax only in local memory.


---

Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  I'm curious to know how can you store the cmin/cmax pair completely out
  of the tuple.  It's easy to see how to store a single identifier in each
  tuple that would be an index to a structure in local memory.  However,
  to eliminate both you'd have to keep a list of all tuples you have
  created or obsoleted, with the cmin and cmax of each.  This seems like
  an awful amount of memory.
 
 Yeah.  I think a reasonable compromise scheme is to try to get down to
 three fields per tuple:
 
   xminsame as now
   xmaxsame as now
   cid/xvac
 
 xvac can share storage with the command ID info as long as VACUUM FULL
 never tries to move a tuple whose originating or deleting transaction
 is still running ... which is pretty much the same restriction we had
 before.
 
 For the command IDs, I am imagining:
 
 if created in current transaction: use cid to store cmin
 
 if deleted in current transaction: use cid to store cmax
 
 if both created and deleted in current transaction: cid is an index
 into an in-memory data structure that contains cmin and cmax.
 
 current transaction would have to have the loose definition that
 includes any subxact of the current top xact, but still, I think that
 the case where you need both fields is relatively uncommon.
 
 The in-memory data structure would only need to contain an entry for
 each distinct combination of cmin and cmax used in the current xact,
 so I think we could assume that it would never get unreasonably large.
 The entries would be created on demand much like we do for
 multixact ids (I guess you'd want a hash table to map requested
 cmin/cmax to an existing entry ID quickly).
 
   regards, tom lane
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-07 Thread Alvaro Herrera

On Wed, Sep 07, 2005 at 12:31:01AM -0500, Jim C. Nasby wrote:
 On Tue, Sep 06, 2005 at 07:02:20PM -0400, Tom Lane wrote:
  Jim C. Nasby [EMAIL PROTECTED] writes:
   If the 4 header fields in question were just normalized out, wouldn't
   all the semantics continue to work the same? All I'm envisioning is
   replacing them in each tuple with a pointer (vis_id) to another
   datastore that would be roughly equivalent to:
  
   CREATE TABLE visibility (
   vis_id  SERIAL,
   xminint,
   xmaxint,
   cminint,
   cmax_xmax   int
   )
  
   Of course you wouldn't use an actual table to do this, but hopefully
   this clarifies my idea.

 Well, like I said, I'm not envisioning using a table to store that info.
 Since we'd be storing 4 fixed-length fields, you wouldn't need to
 actually store vis_id per entry, just use the offset into the page.
 Assuming you use one 'slot' to store the id of the first set, you could
 store 511 tuples per page, which doesn't sound very bad.

I think this could be done with our SLRU mechanism, just like pg_clog,
pg_subtrans and pg_multixact do.  Whether it's really a good idea or
not, it's another story.  The problem is that you would be creating new
ones all the time, it would become a huge source of contention, and it
would use a lot of memory.

Anyway you are just moving the storage somewhere else -- instead of
having 4 fields in the tuple itself, you have one field which points
the same 4 fields elsewhere.  I don't see how is that a win; it's
actually worse because you have to do two lookups.  (And actually you
have just enlarged the storage requirements because you need to store
the vis_id twice.)

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
La realidad se compone de muchos sueños, todos ellos diferentes,
pero en cierto aspecto, parecidos... (Yo, hablando de sueños eróticos)

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-07 Thread Jim C. Nasby

On Wed, Sep 07, 2005 at 01:05:52PM -0400, Alvaro Herrera wrote:
 Anyway you are just moving the storage somewhere else -- instead of
 having 4 fields in the tuple itself, you have one field which points
 the same 4 fields elsewhere.  I don't see how is that a win; it's
 actually worse because you have to do two lookups.  (And actually you
 have just enlarged the storage requirements because you need to store
 the vis_id twice.)

It would only be of use if the table had few transactions in it; in
other words, if it was mostly read-only. For a true read-only table
there are other options people have suggested that are probably better.

BTW, this becomes even more attractive if vis_id is int2; in that case
you can keep the entire mapping in memory in ~1MB.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-07 Thread Alvaro Herrera

On Wed, Sep 07, 2005 at 01:20:27PM -0400, Tom Lane wrote:

 Anyway the fundamental insight has been completely lost here.  The
 original point was that cmin and cmax are only interesting within the
 originating transaction, and not to anyone else, and thus perhaps don't
 need to be kept in permanent storage; while xmin/xmax are different
 animals because they *are* of interest to other transactions.

I'm curious to know how can you store the cmin/cmax pair completely out
of the tuple.  It's easy to see how to store a single identifier in each
tuple that would be an index to a structure in local memory.  However,
to eliminate both you'd have to keep a list of all tuples you have
created or obsoleted, with the cmin and cmax of each.  This seems like
an awful amount of memory.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
I can't go to a restaurant and order food because I keep looking at the
fonts on the menu.  Five minutes later I realize that it's also talking
about food (Donald Knuth)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-07 Thread Tom Lane

Alvaro Herrera [EMAIL PROTECTED] writes:
 I'm curious to know how can you store the cmin/cmax pair completely out
 of the tuple.  It's easy to see how to store a single identifier in each
 tuple that would be an index to a structure in local memory.  However,
 to eliminate both you'd have to keep a list of all tuples you have
 created or obsoleted, with the cmin and cmax of each.  This seems like
 an awful amount of memory.

Yeah.  I think a reasonable compromise scheme is to try to get down to
three fields per tuple:

xminsame as now
xmaxsame as now
cid/xvac

xvac can share storage with the command ID info as long as VACUUM FULL
never tries to move a tuple whose originating or deleting transaction
is still running ... which is pretty much the same restriction we had
before.

For the command IDs, I am imagining:

if created in current transaction: use cid to store cmin

if deleted in current transaction: use cid to store cmax

if both created and deleted in current transaction: cid is an index
into an in-memory data structure that contains cmin and cmax.

current transaction would have to have the loose definition that
includes any subxact of the current top xact, but still, I think that
the case where you need both fields is relatively uncommon.

The in-memory data structure would only need to contain an entry for
each distinct combination of cmin and cmax used in the current xact,
so I think we could assume that it would never get unreasonably large.
The entries would be created on demand much like we do for
multixact ids (I guess you'd want a hash table to map requested
cmin/cmax to an existing entry ID quickly).

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-07 Thread Tom Lane

Alvaro Herrera [EMAIL PROTECTED] writes:
 I think this could be done with our SLRU mechanism, just like pg_clog,
 pg_subtrans and pg_multixact do.  Whether it's really a good idea or
 not, it's another story.  The problem is that you would be creating new
 ones all the time, it would become a huge source of contention, and it
 would use a lot of memory.

... and you couldn't expire the data in a reasonable period of time.
pg_subtrans and pg_multixact have only very short active ranges.
pg_clog is longer-lived, but at only 2 bits per transaction, we can
stand it.  16 bytes per tuple is a whole lot more data.

Anyway the fundamental insight has been completely lost here.  The
original point was that cmin and cmax are only interesting within the
originating transaction, and not to anyone else, and thus perhaps don't
need to be kept in permanent storage; while xmin/xmax are different
animals because they *are* of interest to other transactions.
The storage scheme Jim proposes takes no advantage of that whatever.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-06 Thread Jim C. Nasby

On Fri, Sep 02, 2005 at 03:51:15PM -0400, Bruce Momjian wrote:
 One possible solution is to create a phantom cid which represents a
 cmin/cmax pair and is stored in local memory.

If we're going to look at doing that I think it would also be good to
consider including xmin and xmax as well. This might require persisting
to disk, but for transactions that touch a number of tuples it could
potentially be a big win (imagine being able to shrink all 4 fields down
to a single int; a 45% space reduction).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-06 Thread Alvaro Herrera

On Tue, Sep 06, 2005 at 03:58:28PM -0500, Jim C. Nasby wrote:
 On Fri, Sep 02, 2005 at 03:51:15PM -0400, Bruce Momjian wrote:
One possible solution is to create a phantom cid which represents a
cmin/cmax pair and is stored in local memory.
 
 If we're going to look at doing that I think it would also be good to
 consider including xmin and xmax as well.

If you do that, you'll never be able to delete or update the tuple.


 This might require persisting to disk, but for transactions that touch
 a number of tuples it could potentially be a big win (imagine being
 able to shrink all 4 fields down to a single int; a 45% space
 reduction).

Yeah, I've heard about compression algorithms that managed to fit
megabytes of data in 8 bytes and even less.  They were very cool.  No
one managed to write decompression algorithms however.  Imagine a whole
data warehouse could be stored in a single disk block!!  I imagine the
development of decompressors was boycotted by SAN vendors and the like.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
Si un desconocido se acerca y te regala un CD de Ubuntu ...
 Eso es ...  Eau de Tux

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-06 Thread Jim C. Nasby

On Tue, Sep 06, 2005 at 05:37:06PM -0400, Alvaro Herrera wrote:
 On Tue, Sep 06, 2005 at 03:58:28PM -0500, Jim C. Nasby wrote:
  On Fri, Sep 02, 2005 at 03:51:15PM -0400, Bruce Momjian wrote:
   One possible solution is to create a phantom cid which represents a
   cmin/cmax pair and is stored in local memory.
  
  If we're going to look at doing that I think it would also be good to
  consider including xmin and xmax as well.
 
 If you do that, you'll never be able to delete or update the tuple.

My idea was to use an int to represent combinations of (c|x)(min|max),
probably on a per-table basis. Essentially, it would normalize these
values. I don't see how this would eliminate the ability to update or
delete.

Of course this would actually hurt on tables that had mostly single-row
operations, so it would have to be a table-creation option. I believe it
could substantially reduce the size of tables that don't see a lot of
transactions. It would be good just to have a quick and dirty patch just
to see if this is the case or not.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-06 Thread Tom Lane

Jim C. Nasby [EMAIL PROTECTED] writes:
 If we're going to look at doing that I think it would also be good to
 consider including xmin and xmax as well.
 
 If you do that, you'll never be able to delete or update the tuple.

 My idea was to use an int to represent combinations of (c|x)(min|max),
 probably on a per-table basis. Essentially, it would normalize these
 values. I don't see how this would eliminate the ability to update or
 delete.

How will other transactions know whether the tuple is good (yet) or not?
How will you recover if the backend that does know this crashes before
transaction end?  How will you lock tuples for update/delete?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-06 Thread Jim C. Nasby

On Tue, Sep 06, 2005 at 06:02:27PM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  If we're going to look at doing that I think it would also be good to
  consider including xmin and xmax as well.
  
  If you do that, you'll never be able to delete or update the tuple.
 
  My idea was to use an int to represent combinations of (c|x)(min|max),
  probably on a per-table basis. Essentially, it would normalize these
  values. I don't see how this would eliminate the ability to update or
  delete.
 
 How will other transactions know whether the tuple is good (yet) or not?
 How will you recover if the backend that does know this crashes before
 transaction end?  How will you lock tuples for update/delete?

If the 4 header fields in question were just normalized out, wouldn't
all the semantics continue to work the same? All I'm envisioning is
replacing them in each tuple with a pointer (vis_id) to another
datastore that would be roughly equivalent to:

CREATE TABLE visibility (
vis_id  SERIAL,
xminint,
xmaxint,
cminint,
cmax_xmax   int
)

Of course you wouldn't use an actual table to do this, but hopefully
this clarifies my idea. Any time the backend would update any of those
fields it would now insert a new row into visibility containing the
proper values and use vis_id in the tuples.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-06 Thread Tom Lane

Jim C. Nasby [EMAIL PROTECTED] writes:
 If the 4 header fields in question were just normalized out, wouldn't
 all the semantics continue to work the same? All I'm envisioning is
 replacing them in each tuple with a pointer (vis_id) to another
 datastore that would be roughly equivalent to:

 CREATE TABLE visibility (
 vis_id  SERIAL,
 xminint,
 xmaxint,
 cminint,
 cmax_xmax   int
 )

 Of course you wouldn't use an actual table to do this, but hopefully
 this clarifies my idea.

Let's see ... replace every tuple access with TWO tuple accesses
... yes, that should improve performance nicely.  And no, I'm not
sure the semantics are the same, particularly w.r.t. atomicity of
state changes.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-06 Thread Jim C. Nasby

On Tue, Sep 06, 2005 at 07:02:20PM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  If the 4 header fields in question were just normalized out, wouldn't
  all the semantics continue to work the same? All I'm envisioning is
  replacing them in each tuple with a pointer (vis_id) to another
  datastore that would be roughly equivalent to:
 
  CREATE TABLE visibility (
  vis_id  SERIAL,
  xminint,
  xmaxint,
  cminint,
  cmax_xmax   int
  )
 
  Of course you wouldn't use an actual table to do this, but hopefully
  this clarifies my idea.
 
 Let's see ... replace every tuple access with TWO tuple accesses
 ... yes, that should improve performance nicely.  And no, I'm not
 sure the semantics are the same, particularly w.r.t. atomicity of
 state changes.

Well, like I said, I'm not envisioning using a table to store that info.
Since we'd be storing 4 fixed-length fields, you wouldn't need to
actually store vis_id per entry, just use the offset into the page.
Assuming you use one 'slot' to store the id of the first set, you could
store 511 tuples per page, which doesn't sound very bad.

But this is why I'm hoping a quick patch could be done so that this
could be tested.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-03 Thread Manfred Koizar

On Fri, 2 Sep 2005 20:41:48 -0400 (EDT), Bruce Momjian
pgman@candle.pha.pa.us wrote:
 Once I had a patch based on 7.4 that stored cmin and cmax in
 backend-local memory.

Interesting idea, but how would you record the cmin/xmin values without
requiring unlimited memory?

That's exactly the reason for not sending it to -patches.  Without
spilling to disk this is just not ready for real life.  The problem is
that -- unlike other data structures that build up during a
transaction, e.g. trigger queues -- cmin/cmax lookup requires random
access, so we'd need some form of tree or hash.  Unfornunately I never
got beyond brainstorming :-(

BTW, is there anything else that'd need spilling to disk during long
transactions?

Servus
 Manfred


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-03 Thread Alvaro Herrera

On Sat, Sep 03, 2005 at 10:59:31AM +0200, Manfred Koizar wrote:
 On Fri, 2 Sep 2005 20:41:48 -0400 (EDT), Bruce Momjian
 pgman@candle.pha.pa.us wrote:
  Once I had a patch based on 7.4 that stored cmin and cmax in
  backend-local memory.
 
 Interesting idea, but how would you record the cmin/xmin values without
 requiring unlimited memory?
 
 That's exactly the reason for not sending it to -patches.  Without
 spilling to disk this is just not ready for real life.  The problem is
 that -- unlike other data structures that build up during a
 transaction, e.g. trigger queues -- cmin/cmax lookup requires random
 access, so we'd need some form of tree or hash.  Unfornunately I never
 got beyond brainstorming :-(
 
 BTW, is there anything else that'd need spilling to disk during long
 transactions?

Yes, the queue of pending deferred triggers.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
El hombre nunca sabe de lo que es capaz hasta que lo intenta (C. Dickens)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-03 Thread Bruce Momjian

Manfred Koizar wrote:
 On Fri, 2 Sep 2005 20:41:48 -0400 (EDT), Bruce Momjian
 pgman@candle.pha.pa.us wrote:
  Once I had a patch based on 7.4 that stored cmin and cmax in
  backend-local memory.
 
 Interesting idea, but how would you record the cmin/xmin values without
 requiring unlimited memory?
 
 That's exactly the reason for not sending it to -patches.  Without
 spilling to disk this is just not ready for real life.  The problem is
 that -- unlike other data structures that build up during a
 transaction, e.g. trigger queues -- cmin/cmax lookup requires random
 access, so we'd need some form of tree or hash.  Unfornunately I never
 got beyond brainstorming :-(

What we do for shared row locks is much more compact because we create a
phantom xid for combination of xids and store that in a hash.  I think
the problem with having cmin/cmax in local memory is that without a way
to stamp a row with some fake cid (as is suggested in the TODO item now)
there is really no way to _group_ rows together to store their values as
a single item in local memory, meaning we would have to store ctids or
something for each row, and that seems unscalable.

I have added your memory-only idea to the TODO item because it
highlights that cmin/cmax is never looked at outside the backend that is
modifying the row.

If there was a tuple header field we can use just while the row is being
created or expired we could use that, but I don't see one.  Could we
recalculated one of the fields after it is created/expired?  I don't
know.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Pailloncy Jean-Gerard


[skip]



happening in the bgwriter's inner loop.  Even more to the point, you
can't do such changes without getting a superexclusive lock on the  
page

(not only locked, but no one else has it pinned), which is a real
nonstarter for the bgwriter, both for performance and possible  
deadlock

issues.

Hi Tom,

I do not want to discuss in deep the place to do this job, it is  
really over my head.


But, you said you need a super-exclusive lock, and I wonder if a  
wait-free algorithm would be good for pg in a general manner.


I have given some references about it already on the list.
http://archives.postgresql.org/pgsql-hackers/2005-02/msg00263.php

Cordialement,
Jean-Gérard Pailloncy


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Bruce Momjian

Tom Lane wrote:
 Of course, it's fair to ask whether such a program would be any faster
 than binary-mode COPY by the time you got done ... or enough faster to
 justify your effort, anyway.
 
 THe only fundamental disadvantage that COPY labors under is having to
 write WAL records.  It might be interesting to do something similar to
 the recent hacks for CREATE TABLE AS, so that a COPY into a table just
 created in the current transaction would skip writing WAL and instead
 fsync the table at the end.

Added to TODO:

o Allow COPY into an empty table to skip WAL logging

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Bruce Momjian

Tom Lane wrote:
 ITAGAKI Takahiro [EMAIL PROTECTED] writes:
  I think it would be a waste to retain xmin and cmin for frozen tuples
  because their values represent only 'visible for all transactions'.
 
 True, but the hard part is getting rid of the storage for them.
 
  I wrote a makeshift patch to compress xmin and cmin (8bytes) to
  1-bit flag, using tuple overlaping.
  Is this idea worth trying?
 
 I think this is incredibly ugly :-(.  It eliminates a fairly basic
 assumption which is that items on a page don't overlap.  The space
 savings cannot be worth the loss in testability and reliability.
 To take just one problem, it is no longer possible to check an item
 offset for validity against pd_upper.  If we're going to do this,
 we need a more invasive patch that changes the structure of heaptuple
 headers in a more fundamental way, and avoids breaking the page layout
 representation.  (Something like the way Oids are now handled might
 work, although there are alignment issues to worry about, and it'd
 take more work on VACUUM's part to convert a tuple to frozen state.)
 
 I'm also less than enthused about using up our last infomask bit for
 a relatively unimportant purpose.  We might need that for something
 bigger someday... though I can't presently guess what.

Considering the cost/benefits, rather than doing some optimization for
long-lived tuples, I would like to see us merge the existing
xmin/xmax/cmin/cmax values back into three storage fields like we had in
7.4 and had to expand to a full four in 8.0 to support subtransactions.
The benefit is that every row would be reduced in size by 4 bytes or 14%
for all rows:

* Merge xmin/xmax/cmin/cmax back into three header fields

  Before subtransactions, there used to be only three fields needed to
  store these four values. This was possible because only the current
  transaction looks at the cmin/cmax values. If the current transaction
  created and expired the row the fields stored where xmin (same as
  xmax), cmin, cmax, and if the transaction was expiring a row from a
  another transaction, the fields stored were xmin (cmin was not
  needed), xmax, and cmax. Such a system worked because a transaction
  could only see rows from another completed transaction.

  However, subtransactions can see rows from outer transactions, and
  once the subtransaction completes, the outer transaction continues,
  requiring the storage of all four fields. With subtransactions, an
  outer transaction can create a row, a subtransaction expire it, and
  when the subtransaction completes, the outer transaction still has
  to have proper visibility of the row's cmin, for example, for 
  cursors.

  One possible solution is to create a phantom cid which represents a
  cmin/cmax pair and is stored in local memory.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Tom Lane

Bruce Momjian pgman@candle.pha.pa.us writes:
 Tom Lane wrote:
 THe only fundamental disadvantage that COPY labors under is having to
 write WAL records.  It might be interesting to do something similar to
 the recent hacks for CREATE TABLE AS, so that a COPY into a table just
 created in the current transaction would skip writing WAL and instead
 fsync the table at the end.

 Added to TODO:
 o Allow COPY into an empty table to skip WAL logging

It has to be a *new* table, not an *empty* table.  If it's already
visible to other xacts then somebody else could insert into it in
parallel with you, because COPY doesn't take an exclusive lock.
Contrariwise, it doesn't really matter (I think) if there are WAL-logged
records already in the table and COPY is adding more that aren't logged.
(You might have to force COPY to start adding the rows on freshly added
pages ... hmm ... all of a sudden I think we had this discussion already?
I for sure remember the fresh-pages trick from some other thread.)

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Tom Lane

Bruce Momjian pgman@candle.pha.pa.us writes:
 Considering the cost/benefits, rather than doing some optimization for
 long-lived tuples, I would like to see us merge the existing
 xmin/xmax/cmin/cmax values back into three storage fields like we had in
 7.4 and had to expand to a full four in 8.0 to support subtransactions.

There is another reason for trying to do that rather than the frozen-row
optimization, which is that to get it down to two visibility-related
fields, we'd have to find another representation for tuples that are
Datums in memory.  The current Datum representation overlays three int32
fields where the visibility fields are for a tuple on-disk.  This works
fine now, and would still work fine if we could revert to the 7.4
approach, but it doesn't play nicely with a scheme to remove 2 of the 4
fields.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Alvaro Herrera

On Fri, Sep 02, 2005 at 04:02:08PM -0400, Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  Tom Lane wrote:
  THe only fundamental disadvantage that COPY labors under is having to
  write WAL records.  It might be interesting to do something similar to
  the recent hacks for CREATE TABLE AS, so that a COPY into a table just
  created in the current transaction would skip writing WAL and instead
  fsync the table at the end.
 
  Added to TODO:
  o Allow COPY into an empty table to skip WAL logging
 
 It has to be a *new* table, not an *empty* table.  If it's already
 visible to other xacts then somebody else could insert into it in
 parallel with you, because COPY doesn't take an exclusive lock.

What about the indexes?  Logging one of the inserters and not the other
is certain to corrupt the whole thing.  (Logging index insertion but not
the heap itself is silly, but perhaps an easy way out is to disable the
feature for tables with indexes.)

 Contrariwise, it doesn't really matter (I think) if there are WAL-logged
 records already in the table and COPY is adding more that aren't logged.

Only if the page is locked in a fashion that the bulk loader can't
insert tuples into a page that the other transaction is using.  (Not
sure if this can happen in reality.)  Else we risk both inserting a
tuple in the same page, and on recovery finding out that somebody else
used the tuple slot.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
Los románticos son seres que mueren de deseos de vida

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Josh Berkus

Tom, Alvaro,

  It has to be a *new* table, not an *empty* table.  If it's already
  visible to other xacts then somebody else could insert into it in
  parallel with you, because COPY doesn't take an exclusive lock.

There's still major gains to be had, for ETL, in being able to disable 
logging on new tables/partitions.  *particularly* partitions.

 Contrariwise, it doesn't really matter (I think) if there are WAL-logged
 records already in the table and COPY is adding more that aren't logged.
 (You might have to force COPY to start adding the rows on freshly added
 pages ... hmm ... all of a sudden I think we had this discussion
 already? I for sure remember the fresh-pages trick from some other
 thread.)

Yes, and that's what shot the proposal down before.   But I don't think we 
devoted sufficient discussion to the new table case.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Josh Berkus

People:

 Bruce Momjian pgman@candle.pha.pa.us writes:
  Considering the cost/benefits, rather than doing some optimization for
  long-lived tuples, I would like to see us merge the existing
  xmin/xmax/cmin/cmax values back into three storage fields like we had
  in 7.4 and had to expand to a full four in 8.0 to support
  subtransactions.

Hmmm.   I personally don't see a whole lot of value in trimming 4 bytes per 
row off an archive table, particularly if the table would need to go 
through some kind of I/O intensive operation to do it.

Where I do see value is in enabling index-only access for frozen tables.  
That would be a *huge* gain, especially with bitmaps.   I think we've 
discussed this before,though.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Tom Lane

Alvaro Herrera [EMAIL PROTECTED] writes:
 On Fri, Sep 02, 2005 at 04:02:08PM -0400, Tom Lane wrote:
 It has to be a *new* table, not an *empty* table.  If it's already
 visible to other xacts then somebody else could insert into it in
 parallel with you, because COPY doesn't take an exclusive lock.

 What about the indexes?  Logging one of the inserters and not the other
 is certain to corrupt the whole thing.

Good point, but that fits in just fine with the restriction to
just-created tables.

 Contrariwise, it doesn't really matter (I think) if there are WAL-logged
 records already in the table and COPY is adding more that aren't logged.

 Only if the page is locked in a fashion that the bulk loader can't
 insert tuples into a page that the other transaction is using.

What other transaction?  The point I was making is that
BEGIN;
CREATE TABLE ...
INSERT ...
COPY ...
is still optimizable.  There isn't going to be anyone competing with
the COPY while it runs.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Alvaro Herrera

On Fri, Sep 02, 2005 at 04:27:59PM -0400, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:

  Contrariwise, it doesn't really matter (I think) if there are WAL-logged
  records already in the table and COPY is adding more that aren't logged.
 
  Only if the page is locked in a fashion that the bulk loader can't
  insert tuples into a page that the other transaction is using.
 
 What other transaction?  The point I was making is that
   BEGIN;
   CREATE TABLE ...
   INSERT ...
   COPY ...
 is still optimizable.  There isn't going to be anyone competing with
 the COPY while it runs.

Sure.  I was thinking that you were looking for a mechanism to relax the
other restriction.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
Dios hizo a Adán, pero fue Eva quien lo hizo hombre.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Alvaro Herrera

On Fri, Sep 02, 2005 at 01:30:58PM -0700, Josh Berkus wrote:

  Contrariwise, it doesn't really matter (I think) if there are WAL-logged
  records already in the table and COPY is adding more that aren't logged.
  (You might have to force COPY to start adding the rows on freshly added
  pages ... hmm ... all of a sudden I think we had this discussion
  already? I for sure remember the fresh-pages trick from some other
  thread.)
 
 Yes, and that's what shot the proposal down before.   But I don't think we 
 devoted sufficient discussion to the new table case.

If we are going to have real partitioning sometime soon, I don't think
the restriction is a problem.  You may have to load a whole partition
again, which may be faster than using logged COPY to an already-filled
partition.  The point is, it's not the whole table, just a partition.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
Acepta los honores y aplausos y perderás tu libertad

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Alvaro Herrera

On Fri, Sep 02, 2005 at 01:35:42PM -0700, Josh Berkus wrote:

  Bruce Momjian pgman@candle.pha.pa.us writes:
   Considering the cost/benefits, rather than doing some optimization for
   long-lived tuples, I would like to see us merge the existing
   xmin/xmax/cmin/cmax values back into three storage fields like we had
   in 7.4 and had to expand to a full four in 8.0 to support
   subtransactions.
 
 Hmmm.   I personally don't see a whole lot of value in trimming 4 bytes per 
 row off an archive table, particularly if the table would need to go 
 through some kind of I/O intensive operation to do it.

I think you are missing something.  These 4 bytes are not trimmed by an
I/O-intensive operation, they are not written in the first place.

Now, I agree for a very wide table those 4 bytes per tuple may not be a
lot.  But the optimization could be significant for not-wide (uh, sorry,
I don't remember the word) tables. 

 Where I do see value is in enabling index-only access for frozen tables.  
 That would be a *huge* gain, especially with bitmaps.   I think we've 
 discussed this before, though.

That's a completely different discussion.  Btree-organized heaps may
help you there.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today (Mary Gardiner)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Bruce Momjian

Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  On Fri, Sep 02, 2005 at 04:02:08PM -0400, Tom Lane wrote:
  It has to be a *new* table, not an *empty* table.  If it's already
  visible to other xacts then somebody else could insert into it in
  parallel with you, because COPY doesn't take an exclusive lock.
 
  What about the indexes?  Logging one of the inserters and not the other
  is certain to corrupt the whole thing.
 
 Good point, but that fits in just fine with the restriction to
 just-created tables.

Seem the newly created table could have an index, but we would skip
logging on that too and create a zero-length file on crash restore.

  Contrariwise, it doesn't really matter (I think) if there are WAL-logged
  records already in the table and COPY is adding more that aren't logged.
 
  Only if the page is locked in a fashion that the bulk loader can't
  insert tuples into a page that the other transaction is using.
 
 What other transaction?  The point I was making is that
   BEGIN;
   CREATE TABLE ...
   INSERT ...
   COPY ...
 is still optimizable.  There isn't going to be anyone competing with
 the COPY while it runs.

Updated TODO:

o Allow COPY on a newly-created table to skip WAL logging

  On crash recovery, the table involved in the COPY would
  have its heap and index files truncated.  One issue is
  that no other backend should be able to add to the table
  at the same time, which is something that is currently
  allowed.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Tom Lane

Alvaro Herrera [EMAIL PROTECTED] writes:
 On Fri, Sep 02, 2005 at 01:35:42PM -0700, Josh Berkus wrote:
 Where I do see value is in enabling index-only access for frozen tables.  
 That would be a *huge* gain, especially with bitmaps.   I think we've 
 discussed this before, though.

 That's a completely different discussion.  Btree-organized heaps may
 help you there.

There was some talk of using a spare bit in index entries to mark known
good index entries (xmin committed and less than GlobalXmin, and xmax
invalid) but the cost of maintaining such bits seems nontrivial.  In any
case I agree that's an independent issue.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Tom Lane

Bruce Momjian pgman@candle.pha.pa.us writes:
 Updated TODO:

 o Allow COPY on a newly-created table to skip WAL logging

   On crash recovery, the table involved in the COPY would
   have its heap and index files truncated.  One issue is
   that no other backend should be able to add to the table
   at the same time, which is something that is currently
   allowed.

This is simply wrong.  (1) a table created in the current transaction
isn't visible to anyone else, (2) the correct rollback state is for
it not to be there, rather than be there and empty.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  Updated TODO:
 
  o Allow COPY on a newly-created table to skip WAL logging
 
On crash recovery, the table involved in the COPY would
have its heap and index files truncated.  One issue is
that no other backend should be able to add to the table
at the same time, which is something that is currently
allowed.
 
 This is simply wrong.  (1) a table created in the current transaction
 isn't visible to anyone else, (2) the correct rollback state is for
 it not to be there, rather than be there and empty.

New text:

o Allow COPY on a newly-created table to skip WAL logging

  On crash recovery, the table involved in the COPY would
  removed or have its heap and index files truncated.  One
  issue is that no other backend should be able to add to
  the table at the same time, which is something that is
  currently allowed.

I think we can lock a zero-length table and do this optimization.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Alvaro Herrera

On Fri, Sep 02, 2005 at 05:18:09PM -0400, Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  Updated TODO:
 
  o Allow COPY on a newly-created table to skip WAL logging
 
On crash recovery, the table involved in the COPY would
have its heap and index files truncated.  One issue is
that no other backend should be able to add to the table
at the same time, which is something that is currently
allowed.
 
 This is simply wrong.  (1) a table created in the current transaction
 isn't visible to anyone else, (2) the correct rollback state is for
 it not to be there, rather than be there and empty.

As a related note:

I remember somebody mentioned some time ago that if you create a table
and then crash before ending the transaction, the tuple in pg_class is
no longer valid, but the file remains.  I think this will be a much
worse problem if we allow a table that's being COPY'ed to remain after a
crash.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
Oh, oh, las chicas galacianas, lo harán por las perlas,
¡Y las de Arrakis por el agua! Pero si buscas damas
Que se consuman como llamas, ¡Prueba una hija de Caladan! (Gurney Halleck)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Tom Lane

Alvaro Herrera [EMAIL PROTECTED] writes:
 I remember somebody mentioned some time ago that if you create a table
 and then crash before ending the transaction, the tuple in pg_class is
 no longer valid, but the file remains.

Right --- it will be removed on transaction rollback, but not if the
backend crashes first.  There was a patch submitted earlier this year to
try to clean out such files, but it got rejected (as too messy IIRC).
I think we still have a TODO item about it.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Manfred Koizar

On Fri, 2 Sep 2005 15:51:15 -0400 (EDT), Bruce Momjian
pgman@candle.pha.pa.us wrote:
   * Merge xmin/xmax/cmin/cmax back into three header fields

And don't forget xvac, please.

 Before subtransactions, there used to be only three fields needed to
 store these four values.

... five values.

 This was possible because only the current
 transaction looks at the cmin/cmax values.

Which is a reason to get rid of cmin/cmax in tuple headers entirely.
Once I had a patch based on 7.4 that stored cmin and cmax in
backend-local memory.  It passed make check and some volume tests, but
I felt it was not ready to be applied without any spill-to-disk
mechanism.  Development stalled when I tried to eliminate xvac as
well, which would have required deep cuts into VACUUM code :-(

Servus
 Manfred


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-02 Thread Bruce Momjian

Manfred Koizar wrote:
 On Fri, 2 Sep 2005 15:51:15 -0400 (EDT), Bruce Momjian
 pgman@candle.pha.pa.us wrote:
  * Merge xmin/xmax/cmin/cmax back into three header fields
 
 And don't forget xvac, please.
   
Before subtransactions, there used to be only three fields needed to
store these four values.
 
 ... five values.
 
  This was possible because only the current
  transaction looks at the cmin/cmax values.
 
 Which is a reason to get rid of cmin/cmax in tuple headers entirely.
 Once I had a patch based on 7.4 that stored cmin and cmax in
 backend-local memory.  It passed make check and some volume tests, but
 I felt it was not ready to be applied without any spill-to-disk
 mechanism.  Development stalled when I tried to eliminate xvac as
 well, which would have required deep cuts into VACUUM code :-(

Interesting idea, but how would you record the cmin/xmin values without
requiring unlimited memory?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-01 Thread Tom Lane

ITAGAKI Takahiro [EMAIL PROTECTED] writes:
 I agree. I think an good position of freezer is on bgwriter.
 My idea is:
   1. Just before bgwriter writes an dirty page in LRU order,
   2. Freeze tuples in the page and repair fragmentation.
   3. (Replace the fsm page that has least freespace.)
   4. Flush the page.

This is a bad idea.  The bgwriter isn't the place to be doing freezing,
because there is no reasonable way for it to guarantee that all old
tuples in a table (or any larger unit) have been frozen.  So you'd still
need VACUUM to ensure no wraparound.  Plus, you can't do such changes
without emitting an XLOG record, which is something we don't want
happening in the bgwriter's inner loop.  Even more to the point, you
can't do such changes without getting a superexclusive lock on the page
(not only locked, but no one else has it pinned), which is a real
nonstarter for the bgwriter, both for performance and possible deadlock
issues.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-01 Thread Alvaro Herrera

On Wed, Aug 31, 2005 at 09:14:42PM -0700, Josh Berkus wrote:

  One thing that comes to mind is that this makes somewhat easier to build
  a tool to write pre-built tables, for bulk-loading purposes.  You just
  construct the binary file with the HEAP_FROZEN bit set, and then attach
  the file to a dummy table.  (Then again, you can do it today, using a
  Xmin of FrozenTransactionId.  I wonder why the Bizgres people isn't
  advocating a tool to do that.  It is very hard to do with user-defined
  types, but for BI/DW you mostly don't need those, do you?)
 
 Hmmm ... can you expand on this a little?  We'd discussed frozen partitions 
 but hadn't thought to get around to them for a while, expecting the kind of 
 issues which Tom just raised.

What issues did he raise on this?

What I'm saying is that you can write a heap file, on which the tuples
would all have xmin=FrozenTransactionId, xmax=Invalid, and the
corresponding bits set in the infomask.  This ensures that no matter the
state of the server, you can plug the file in and all tuples will be
valid.

The only problem is figuring out how to lay the data in the tuples
themselves, w.r.t endianness and such.  This is platform-dependent, so
you have to write code to do it correctly.  In absence of user-defined
types, this should not be _too_ hard to do.  Of course, such a program
would in general also be Postgres-version-dependent.

Note that this is a very different business from skipping the Xmin and
Cmin from the tuple header -- in fact, there's no relation to that at
all.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
FOO MANE PADME HUM

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-01 Thread Tom Lane

Alvaro Herrera [EMAIL PROTECTED] writes:
 What I'm saying is that you can write a heap file, on which the tuples
 would all have xmin=FrozenTransactionId, xmax=Invalid, and the
 corresponding bits set in the infomask.  This ensures that no matter the
 state of the server, you can plug the file in and all tuples will be
 valid.

 The only problem is figuring out how to lay the data in the tuples
 themselves, w.r.t endianness and such.  This is platform-dependent, so
 you have to write code to do it correctly.  In absence of user-defined
 types, this should not be _too_ hard to do.  Of course, such a program
 would in general also be Postgres-version-dependent.

Of course, it's fair to ask whether such a program would be any faster
than binary-mode COPY by the time you got done ... or enough faster to
justify your effort, anyway.

THe only fundamental disadvantage that COPY labors under is having to
write WAL records.  It might be interesting to do something similar to
the recent hacks for CREATE TABLE AS, so that a COPY into a table just
created in the current transaction would skip writing WAL and instead
fsync the table at the end.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-01 Thread Simon Riggs

On Wed, 2005-08-31 at 22:25 -0400, Alvaro Herrera wrote:
 On Thu, Sep 01, 2005 at 10:45:44AM +0900, ITAGAKI Takahiro wrote:
 
 Hi,
 
  I think it would be a waste to retain xmin and cmin for frozen tuples
  because their values represent only 'visible for all transactions'.
  Additionally, most tuples in database can be frozen potentially.
 
 I think this is an interesting idea.  

Agreed, especially since it would avoid the need to vacuum altogether.

 I was thinking that when the tuple
 needs to be obsoleted it would need to grow to accomodate the Xmax, but
 you are not actually proposing to remove that, so it seems sensible.  In
 fact, it is perfectly reasonable to remove Xmin and Cmin, because after
 the tuple is frozen, the Xmin never changes again.

It's a good idea, but the Xmin is set to FrozenTransactionId, which is
how we know it is frozen, so how can we remove Xmin? The way to do this
is surely by using a row version id that is different for this format.

Getting 8 or 16 bytes per row back would be a very useful gain.

 Now, one thing of note is that you need to compress the page in order
 to actually be able to use the just-freed space.  VACUUM could do that,
 but maybe it would be better to do it on-line -- the freezing process is
 going to have to write the page regardless.  I wonder if with your patch
 the page is compressed on the same VACUUM execution that freezes the
 tuple?

Only if you do a FULL, which is currently incompatible with a FREEZE. 

There's no point in compressing a block if you can't also redistribute
rows between blocks to fill up the spaces, so another reason why it has
to be a FULL. Unless you do this at load time, which is why I guess you
mention

 One thing that comes to mind is that this makes somewhat easier to build
 a tool to write pre-built tables, for bulk-loading purposes.  You just
 construct the binary file with the HEAP_FROZEN bit set, and then attach
 the file to a dummy table.  (Then again, you can do it today, using a
 Xmin of FrozenTransactionId.  I wonder why the Bizgres people isn't
 advocating a tool to do that.  It is very hard to do with user-defined
 types, but for BI/DW you mostly don't need those, do you?)

Loading a table using COPY with frozen bits set was suggested in May, so
yeh... it was suggested. At that time it was rejected, since earlier
transactions would then be able to see rows they ought not be able to
see. Thinking some more about this, this is only the inverse situation
of a TRUNCATE. With truncate we remove tuples that ought to still be
visible to pre-existing transactions. So there shouldn't really be an
issue with loading pre-frozen tuples - as long as you accept the
consequences for row visibility. 

Externally writing blocks is possible, but it bypasses a lot of other
features. My current preference would be to have bulk_heap_insert()
function to add a whole page at a time rather than inserting rows one at
at a time. The main objective for a load is to make it disk bound; once
we've achieved that by some further tuning, writing an external file
would cost around the same as writing it internally from the DBMS.
Oracle (direct path loader) and Teradata (Fastload) load data in
complete blocks using a reduced code pathway, so I guess I was just
following on, but I'm genuinely open to further persuasion if there is a
better way.

Having a table marked as INSERT ONLY would allow us to save 8 bytes/row,
loading it pre-frozen (in some way) would save another 8 bytes/row and
allow us to permanently avoid VACUUMing the table. That would be even
better when we have per-table XID wrap avoidance.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-01 Thread Alvaro Herrera

On Thu, Sep 01, 2005 at 11:08:36AM -0400, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  What I'm saying is that you can write a heap file, on which the tuples
  would all have xmin=FrozenTransactionId, xmax=Invalid, and the
  corresponding bits set in the infomask.  This ensures that no matter the
  state of the server, you can plug the file in and all tuples will be
  valid.
 
  The only problem is figuring out how to lay the data in the tuples
  themselves, w.r.t endianness and such.  This is platform-dependent, so
  you have to write code to do it correctly.  In absence of user-defined
  types, this should not be _too_ hard to do.  Of course, such a program
  would in general also be Postgres-version-dependent.
 
 Of course, it's fair to ask whether such a program would be any faster
 than binary-mode COPY by the time you got done ... or enough faster to
 justify your effort, anyway.

It may not be faster generating the data in the first place, but you
don't have to vacuum the table, nor you are subject to hint bits
changing, resulting in more unnecessary I/O.

This can't be avoided with COPY, because there's always the chance that
it will fail partway through, so you can't write frozen tuples.  With an
external program, you can just dump the invalid line somewhere else and
continue with the rest.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
Just treat us the way you want to be treated + some extra allowance
 for ignorance.(Michael Brusser)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-01 Thread Josh Berkus

Alvaro,

 What issues did he raise on this?

On having no Xmin.

 What I'm saying is that you can write a heap file, on which the tuples
 would all have xmin=FrozenTransactionId, xmax=Invalid, and the
 corresponding bits set in the infomask.  This ensures that no matter the
 state of the server, you can plug the file in and all tuples will be
 valid.

 The only problem is figuring out how to lay the data in the tuples
 themselves, w.r.t endianness and such.  This is platform-dependent, so
 you have to write code to do it correctly.  In absence of user-defined
 types, this should not be _too_ hard to do.  Of course, such a program
 would in general also be Postgres-version-dependent.

So, bulk loading by file generation?   So the idea is that you would generate 
a properly formatted PostgreSQL table file, and then in one transaction 
create the table and attach it?

Seems like this would have the additional limitation of being useful only for 
loading new partitions/new tables.  However, it would have some significant 
advantages for bulk loading ... chiefly that the data page generation and 
associated computations could be done *off* the database server.   This might 
help considerably in getting around the 100mb/s data computation ceiling 
we're hitting ...

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-01 Thread Josh Berkus

Tom,

 THe only fundamental disadvantage that COPY labors under is having to
 write WAL records.  It might be interesting to do something similar to
 the recent hacks for CREATE TABLE AS, so that a COPY into a table just
 created in the current transaction would skip writing WAL and instead
 fsync the table at the end.

Yes, I thought we discussed doing this for empty tables -- it would be, per 
our tests, a +10% to +30% boost to COPY.

But there was some problem the patch?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-01 Thread Alvaro Herrera

On Thu, Sep 01, 2005 at 09:20:48AM -0700, Josh Berkus wrote:

  What I'm saying is that you can write a heap file, on which the tuples
  would all have xmin=FrozenTransactionId, xmax=Invalid, and the
  corresponding bits set in the infomask.  This ensures that no matter the
  state of the server, you can plug the file in and all tuples will be
  valid.
 
 So, bulk loading by file generation?   So the idea is that you would generate 
 a properly formatted PostgreSQL table file, and then in one transaction 
 create the table and attach it?

Exactly.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
Changing the world ... one keyboard at a time!
 (www.DVzine.org)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-01 Thread Alvaro Herrera

On Thu, Sep 01, 2005 at 04:34:09PM +0100, Simon Riggs wrote:
 On Wed, 2005-08-31 at 22:25 -0400, Alvaro Herrera wrote:

  I was thinking that when the tuple
  needs to be obsoleted it would need to grow to accomodate the Xmax, but
  you are not actually proposing to remove that, so it seems sensible.  In
  fact, it is perfectly reasonable to remove Xmin and Cmin, because after
  the tuple is frozen, the Xmin never changes again.
 
 It's a good idea, but the Xmin is set to FrozenTransactionId, which is
 how we know it is frozen, so how can we remove Xmin? The way to do this
 is surely by using a row version id that is different for this format.

Per Takahiro's patch, you don't need to set the Xmin to
FrozenTransactionId -- what you do instead is set a bit in the infomask.


  Now, one thing of note is that you need to compress the page in order
  to actually be able to use the just-freed space.  VACUUM could do that,
  but maybe it would be better to do it on-line -- the freezing process is
  going to have to write the page regardless.  I wonder if with your patch
  the page is compressed on the same VACUUM execution that freezes the
  tuple?
 
 Only if you do a FULL, which is currently incompatible with a FREEZE. 

Well, if we are going to mess with what FREEZE is doing, we can as well
make it compress the page.  Note that to compress the page you don't
need to touch the indexes.

I don't remember the exact reason why FULL is incompatible with FREEZE,
but AFAIR it's not fundamentally unsolvable (just very hard.)

 There's no point in compressing a block if you can't also redistribute
 rows between blocks to fill up the spaces, so another reason why it has
 to be a FULL.

That's a good point.

  One thing that comes to mind is that this makes somewhat easier to build
  a tool to write pre-built tables, for bulk-loading purposes.  You just
  construct the binary file with the HEAP_FROZEN bit set, and then attach
  the file to a dummy table.
 
 Loading a table using COPY with frozen bits set was suggested in May, so
 yeh... it was suggested.

I'm not proposing to use COPY for that.  It has loads of problems, which
is why the patch was rejected.  Using an external program is a different
matter.

 Externally writing blocks is possible, but it bypasses a lot of other
 features.

Like what?

I don't really care for this feature, mind you -- I was merely
mentioning the idea as it crossed my mind.

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
The problem with the facetime model is not just that it's demoralizing, but
that the people pretending to work interrupt the ones actually working.
   (Paul Graham)

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-01 Thread Tom Lane

Josh Berkus josh@agliodbs.com writes:
 THe only fundamental disadvantage that COPY labors under is having to
 write WAL records.  It might be interesting to do something similar to
 the recent hacks for CREATE TABLE AS, so that a COPY into a table just
 created in the current transaction would skip writing WAL and instead
 fsync the table at the end.

 Yes, I thought we discussed doing this for empty tables -- it would be, per 
 our tests, a +10% to +30% boost to COPY.

 But there was some problem the patch?

I have seen no such patch AFAIR.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-09-01 Thread Tom Lane

Alvaro Herrera [EMAIL PROTECTED] writes:
 On Thu, Sep 01, 2005 at 04:34:09PM +0100, Simon Riggs wrote:
 On Wed, 2005-08-31 at 22:25 -0400, Alvaro Herrera wrote:
 Now, one thing of note is that you need to compress the page in order
 to actually be able to use the just-freed space.  VACUUM could do that,
 but maybe it would be better to do it on-line -- the freezing process is
 going to have to write the page regardless.  I wonder if with your patch
 the page is compressed on the same VACUUM execution that freezes the
 tuple?
 
 Only if you do a FULL, which is currently incompatible with a FREEZE. 

 Well, if we are going to mess with what FREEZE is doing, we can as well
 make it compress the page.

Anyone looked at the code lately???

PageRepairFragmentation is part of any kind of vacuum.  As long as you
don't reassign tuple IDs (which it doesn't) there's no impact on indexes.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: Additional background daemon (was: [HACKERS] Remove xmin and cmin from frozen tuples)

2005-09-01 Thread Tom Lane

Jim C. Nasby [EMAIL PROTECTED] writes:
 On Thu, Sep 01, 2005 at 09:22:35AM -0400, Tom Lane wrote:
 This is a bad idea.  The bgwriter isn't the place to be doing freezing,

 So is this something that another daemon could handle?

Possibly, but I'd be inclined to think of it as autovacuum's problem.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

[HACKERS] Remove xmin and cmin from frozen tuples

2005-08-31 Thread ITAGAKI Takahiro

Hi Hackers,

I think it would be a waste to retain xmin and cmin for frozen tuples
because their values represent only 'visible for all transactions'.
Additionally, most tuples in database can be frozen potentially.

I wrote a makeshift patch to compress xmin and cmin (8bytes) to
1-bit flag, using tuple overlaping.

Is this idea worth trying?

Also, it will be useful to combine it and more aggressive freeze vacuum,
for example, a freezer integrated with bgwriter.



(The following is test of the attached patch)

* Test query
  1. create table test (a int);
  2. insert into test select * from generate_series(1, 10);
  3. update test set a = a where a % 100 = 0; # to force defrag
  4. select * from pgstattuple('test');
  5. vacuum freeze test;
  6. select * from pgstattuple('test');

* Results of pgstattuple

-[ before vacuum ]-+
table_len  | 3645440
tuple_count| 10
tuple_len  | 320
tuple_percent  | 87.78
dead_tuple_count   | 1000
dead_tuple_len | 32000
dead_tuple_percent | 0.88
free_space | 536
free_percent   | 0.01

-[ 8.1beta1 orig ]-+
table_len  | 3645440
tuple_count| 10
tuple_len  | 320
tuple_percent  | 87.78
dead_tuple_count   | 0
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 30772   -- about 32byte * 1000 (dead tuples)
free_percent   | 0.84

-[ patched ]---+
table_len  | 3645440
tuple_count| 10
tuple_len  | 320
tuple_percent  | 87.78
dead_tuple_count   | 0
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 823628  -- + 8byte * 10 (whole tuples)
free_percent   | 22.59

---
ITAGAKI Takahiro
NTT Cyber Space Laboratories


xmincut.patch
Description: Binary data

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-08-31 Thread Alvaro Herrera

On Thu, Sep 01, 2005 at 10:45:44AM +0900, ITAGAKI Takahiro wrote:

Hi,

 I think it would be a waste to retain xmin and cmin for frozen tuples
 because their values represent only 'visible for all transactions'.
 Additionally, most tuples in database can be frozen potentially.

I think this is an interesting idea.  I was thinking that when the tuple
needs to be obsoleted it would need to grow to accomodate the Xmax, but
you are not actually proposing to remove that, so it seems sensible.  In
fact, it is perfectly reasonable to remove Xmin and Cmin, because after
the tuple is frozen, the Xmin never changes again.

Now, one thing of note is that you need to compress the page in order
to actually be able to use the just-freed space.  VACUUM could do that,
but maybe it would be better to do it on-line -- the freezing process is
going to have to write the page regardless.  I wonder if with your patch
the page is compressed on the same VACUUM execution that freezes the
tuple?

One thing that comes to mind is that this makes somewhat easier to build
a tool to write pre-built tables, for bulk-loading purposes.  You just
construct the binary file with the HEAP_FROZEN bit set, and then attach
the file to a dummy table.  (Then again, you can do it today, using a
Xmin of FrozenTransactionId.  I wonder why the Bizgres people isn't
advocating a tool to do that.  It is very hard to do with user-defined
types, but for BI/DW you mostly don't need those, do you?)

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
Cuando no hay humildad las personas se degradan (A. Christie)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-08-31 Thread Tom Lane

ITAGAKI Takahiro [EMAIL PROTECTED] writes:
 I think it would be a waste to retain xmin and cmin for frozen tuples
 because their values represent only 'visible for all transactions'.

True, but the hard part is getting rid of the storage for them.

 I wrote a makeshift patch to compress xmin and cmin (8bytes) to
 1-bit flag, using tuple overlaping.
 Is this idea worth trying?

I think this is incredibly ugly :-(.  It eliminates a fairly basic
assumption which is that items on a page don't overlap.  The space
savings cannot be worth the loss in testability and reliability.
To take just one problem, it is no longer possible to check an item
offset for validity against pd_upper.  If we're going to do this,
we need a more invasive patch that changes the structure of heaptuple
headers in a more fundamental way, and avoids breaking the page layout
representation.  (Something like the way Oids are now handled might
work, although there are alignment issues to worry about, and it'd
take more work on VACUUM's part to convert a tuple to frozen state.)

I'm also less than enthused about using up our last infomask bit for
a relatively unimportant purpose.  We might need that for something
bigger someday... though I can't presently guess what.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-08-31 Thread Josh Berkus

Alvaro,

 One thing that comes to mind is that this makes somewhat easier to build
 a tool to write pre-built tables, for bulk-loading purposes.  You just
 construct the binary file with the HEAP_FROZEN bit set, and then attach
 the file to a dummy table.  (Then again, you can do it today, using a
 Xmin of FrozenTransactionId.  I wonder why the Bizgres people isn't
 advocating a tool to do that.  It is very hard to do with user-defined
 types, but for BI/DW you mostly don't need those, do you?)

Hmmm ... can you expand on this a little?  We'd discussed frozen partitions 
but hadn't thought to get around to them for a while, expecting the kind of 
issues which Tom just raised.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-08-31 Thread ITAGAKI Takahiro

Alvaro Herrera [EMAIL PROTECTED] wrote:

 Now, one thing of note is that you need to compress the page in order
 to actually be able to use the just-freed space.  VACUUM could do that,
 but maybe it would be better to do it on-line -- the freezing process is
 going to have to write the page regardless.

I agree. I think an good position of freezer is on bgwriter.
My idea is:
  1. Just before bgwriter writes an dirty page in LRU order,
  2. Freeze tuples in the page and repair fragmentation.
  3. (Replace the fsm page that has least freespace.)
  4. Flush the page.


 I wonder if with your patch
 the page is compressed on the same VACUUM execution that freezes the tuple?

Yes, defragmentation is performed after freezing, but the page has at least
one dead tuple. In current VACUUM implementation, pages that have no dead
tuples will not be defraged. So you cannot compress just after bulk-load.

---
ITAGAKI Takahiro
NTT Cyber Space Laboratories



---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Remove xmin and cmin from frozen tuples

2005-08-31 Thread ITAGAKI Takahiro

Tom Lane [EMAIL PROTECTED] wrote:

 I think this is incredibly ugly :-(.

Yes, I think so, too :-(My patch is product of the thought that
I don't want to modify codes widely. So if we want to do it more cool way,
lots of changes are needed as you said.


 I'm also less than enthused about using up our last infomask bit for
 a relatively unimportant purpose.  We might need that for something
 bigger someday... though I can't presently guess what.

I think it is not a problem, because the header still has rooms for several
bits. I assume that the combination of HEAP_XMIN_COMMITTED + HEAP_XMIN_INVALID
has currently no meaning, right? If so, HEAP_FROZEN can be assigned here.
Also, t_natts is currently 16-bits, but it can be cut to 11-bits
because MaxTupleAttributeNumber is 1664  2^11.

---
ITAGAKI Takahiro
NTT Cyber Space Laboratories



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

54 matches

Mail list logo