Re: [HACKERS] Remove xmin and cmin from frozen tuples
This is now in the TODO list: * Merge xmin/xmax/cmin/cmax back into three header fields Before subtransactions, there used to be only three fields needed to store these four values. This was possible because only the current transaction looks at the cmin/cmax values. If the current transaction created and expired the row the fields stored where xmin (same as xmax), cmin, cmax, and if the transaction was expiring a row from a another transaction, the fields stored were xmin (cmin was not needed), xmax, and cmax. Such a system worked because a transaction could only see rows from another completed transaction. However, subtransactions can see rows from outer transactions, and once the subtransaction completes, the outer transaction continues, requiring the storage of all four fields. With subtransactions, an outer transaction can create a row, a subtransaction expire it, and when the subtransaction completes, the outer transaction still has to have proper visibility of the row's cmin, for example, for cursors. One possible solution is to create a phantom cid which represents a cmin/cmax pair and is stored in local memory. Another idea is to store both cmin and cmax only in local memory. --- Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: I'm curious to know how can you store the cmin/cmax pair completely out of the tuple. It's easy to see how to store a single identifier in each tuple that would be an index to a structure in local memory. However, to eliminate both you'd have to keep a list of all tuples you have created or obsoleted, with the cmin and cmax of each. This seems like an awful amount of memory. Yeah. I think a reasonable compromise scheme is to try to get down to three fields per tuple: xminsame as now xmaxsame as now cid/xvac xvac can share storage with the command ID info as long as VACUUM FULL never tries to move a tuple whose originating or deleting transaction is still running ... which is pretty much the same restriction we had before. For the command IDs, I am imagining: if created in current transaction: use cid to store cmin if deleted in current transaction: use cid to store cmax if both created and deleted in current transaction: cid is an index into an in-memory data structure that contains cmin and cmax. current transaction would have to have the loose definition that includes any subxact of the current top xact, but still, I think that the case where you need both fields is relatively uncommon. The in-memory data structure would only need to contain an entry for each distinct combination of cmin and cmax used in the current xact, so I think we could assume that it would never get unreasonably large. The entries would be created on demand much like we do for multixact ids (I guess you'd want a hash table to map requested cmin/cmax to an existing entry ID quickly). regards, tom lane -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Wed, Sep 07, 2005 at 12:31:01AM -0500, Jim C. Nasby wrote: On Tue, Sep 06, 2005 at 07:02:20PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: If the 4 header fields in question were just normalized out, wouldn't all the semantics continue to work the same? All I'm envisioning is replacing them in each tuple with a pointer (vis_id) to another datastore that would be roughly equivalent to: CREATE TABLE visibility ( vis_id SERIAL, xminint, xmaxint, cminint, cmax_xmax int ) Of course you wouldn't use an actual table to do this, but hopefully this clarifies my idea. Well, like I said, I'm not envisioning using a table to store that info. Since we'd be storing 4 fixed-length fields, you wouldn't need to actually store vis_id per entry, just use the offset into the page. Assuming you use one 'slot' to store the id of the first set, you could store 511 tuples per page, which doesn't sound very bad. I think this could be done with our SLRU mechanism, just like pg_clog, pg_subtrans and pg_multixact do. Whether it's really a good idea or not, it's another story. The problem is that you would be creating new ones all the time, it would become a huge source of contention, and it would use a lot of memory. Anyway you are just moving the storage somewhere else -- instead of having 4 fields in the tuple itself, you have one field which points the same 4 fields elsewhere. I don't see how is that a win; it's actually worse because you have to do two lookups. (And actually you have just enlarged the storage requirements because you need to store the vis_id twice.) -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com La realidad se compone de muchos sueños, todos ellos diferentes, pero en cierto aspecto, parecidos... (Yo, hablando de sueños eróticos) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Wed, Sep 07, 2005 at 01:05:52PM -0400, Alvaro Herrera wrote: Anyway you are just moving the storage somewhere else -- instead of having 4 fields in the tuple itself, you have one field which points the same 4 fields elsewhere. I don't see how is that a win; it's actually worse because you have to do two lookups. (And actually you have just enlarged the storage requirements because you need to store the vis_id twice.) It would only be of use if the table had few transactions in it; in other words, if it was mostly read-only. For a true read-only table there are other options people have suggested that are probably better. BTW, this becomes even more attractive if vis_id is int2; in that case you can keep the entire mapping in memory in ~1MB. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Wed, Sep 07, 2005 at 01:20:27PM -0400, Tom Lane wrote: Anyway the fundamental insight has been completely lost here. The original point was that cmin and cmax are only interesting within the originating transaction, and not to anyone else, and thus perhaps don't need to be kept in permanent storage; while xmin/xmax are different animals because they *are* of interest to other transactions. I'm curious to know how can you store the cmin/cmax pair completely out of the tuple. It's easy to see how to store a single identifier in each tuple that would be an index to a structure in local memory. However, to eliminate both you'd have to keep a list of all tuples you have created or obsoleted, with the cmin and cmax of each. This seems like an awful amount of memory. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com I can't go to a restaurant and order food because I keep looking at the fonts on the menu. Five minutes later I realize that it's also talking about food (Donald Knuth) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Alvaro Herrera [EMAIL PROTECTED] writes: I'm curious to know how can you store the cmin/cmax pair completely out of the tuple. It's easy to see how to store a single identifier in each tuple that would be an index to a structure in local memory. However, to eliminate both you'd have to keep a list of all tuples you have created or obsoleted, with the cmin and cmax of each. This seems like an awful amount of memory. Yeah. I think a reasonable compromise scheme is to try to get down to three fields per tuple: xminsame as now xmaxsame as now cid/xvac xvac can share storage with the command ID info as long as VACUUM FULL never tries to move a tuple whose originating or deleting transaction is still running ... which is pretty much the same restriction we had before. For the command IDs, I am imagining: if created in current transaction: use cid to store cmin if deleted in current transaction: use cid to store cmax if both created and deleted in current transaction: cid is an index into an in-memory data structure that contains cmin and cmax. current transaction would have to have the loose definition that includes any subxact of the current top xact, but still, I think that the case where you need both fields is relatively uncommon. The in-memory data structure would only need to contain an entry for each distinct combination of cmin and cmax used in the current xact, so I think we could assume that it would never get unreasonably large. The entries would be created on demand much like we do for multixact ids (I guess you'd want a hash table to map requested cmin/cmax to an existing entry ID quickly). regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Alvaro Herrera [EMAIL PROTECTED] writes: I think this could be done with our SLRU mechanism, just like pg_clog, pg_subtrans and pg_multixact do. Whether it's really a good idea or not, it's another story. The problem is that you would be creating new ones all the time, it would become a huge source of contention, and it would use a lot of memory. ... and you couldn't expire the data in a reasonable period of time. pg_subtrans and pg_multixact have only very short active ranges. pg_clog is longer-lived, but at only 2 bits per transaction, we can stand it. 16 bytes per tuple is a whole lot more data. Anyway the fundamental insight has been completely lost here. The original point was that cmin and cmax are only interesting within the originating transaction, and not to anyone else, and thus perhaps don't need to be kept in permanent storage; while xmin/xmax are different animals because they *are* of interest to other transactions. The storage scheme Jim proposes takes no advantage of that whatever. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Fri, Sep 02, 2005 at 03:51:15PM -0400, Bruce Momjian wrote: One possible solution is to create a phantom cid which represents a cmin/cmax pair and is stored in local memory. If we're going to look at doing that I think it would also be good to consider including xmin and xmax as well. This might require persisting to disk, but for transactions that touch a number of tuples it could potentially be a big win (imagine being able to shrink all 4 fields down to a single int; a 45% space reduction). -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Tue, Sep 06, 2005 at 03:58:28PM -0500, Jim C. Nasby wrote: On Fri, Sep 02, 2005 at 03:51:15PM -0400, Bruce Momjian wrote: One possible solution is to create a phantom cid which represents a cmin/cmax pair and is stored in local memory. If we're going to look at doing that I think it would also be good to consider including xmin and xmax as well. If you do that, you'll never be able to delete or update the tuple. This might require persisting to disk, but for transactions that touch a number of tuples it could potentially be a big win (imagine being able to shrink all 4 fields down to a single int; a 45% space reduction). Yeah, I've heard about compression algorithms that managed to fit megabytes of data in 8 bytes and even less. They were very cool. No one managed to write decompression algorithms however. Imagine a whole data warehouse could be stored in a single disk block!! I imagine the development of decompressors was boycotted by SAN vendors and the like. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com Si un desconocido se acerca y te regala un CD de Ubuntu ... Eso es ... Eau de Tux ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Tue, Sep 06, 2005 at 05:37:06PM -0400, Alvaro Herrera wrote: On Tue, Sep 06, 2005 at 03:58:28PM -0500, Jim C. Nasby wrote: On Fri, Sep 02, 2005 at 03:51:15PM -0400, Bruce Momjian wrote: One possible solution is to create a phantom cid which represents a cmin/cmax pair and is stored in local memory. If we're going to look at doing that I think it would also be good to consider including xmin and xmax as well. If you do that, you'll never be able to delete or update the tuple. My idea was to use an int to represent combinations of (c|x)(min|max), probably on a per-table basis. Essentially, it would normalize these values. I don't see how this would eliminate the ability to update or delete. Of course this would actually hurt on tables that had mostly single-row operations, so it would have to be a table-creation option. I believe it could substantially reduce the size of tables that don't see a lot of transactions. It would be good just to have a quick and dirty patch just to see if this is the case or not. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Jim C. Nasby [EMAIL PROTECTED] writes: If we're going to look at doing that I think it would also be good to consider including xmin and xmax as well. If you do that, you'll never be able to delete or update the tuple. My idea was to use an int to represent combinations of (c|x)(min|max), probably on a per-table basis. Essentially, it would normalize these values. I don't see how this would eliminate the ability to update or delete. How will other transactions know whether the tuple is good (yet) or not? How will you recover if the backend that does know this crashes before transaction end? How will you lock tuples for update/delete? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Tue, Sep 06, 2005 at 06:02:27PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: If we're going to look at doing that I think it would also be good to consider including xmin and xmax as well. If you do that, you'll never be able to delete or update the tuple. My idea was to use an int to represent combinations of (c|x)(min|max), probably on a per-table basis. Essentially, it would normalize these values. I don't see how this would eliminate the ability to update or delete. How will other transactions know whether the tuple is good (yet) or not? How will you recover if the backend that does know this crashes before transaction end? How will you lock tuples for update/delete? If the 4 header fields in question were just normalized out, wouldn't all the semantics continue to work the same? All I'm envisioning is replacing them in each tuple with a pointer (vis_id) to another datastore that would be roughly equivalent to: CREATE TABLE visibility ( vis_id SERIAL, xminint, xmaxint, cminint, cmax_xmax int ) Of course you wouldn't use an actual table to do this, but hopefully this clarifies my idea. Any time the backend would update any of those fields it would now insert a new row into visibility containing the proper values and use vis_id in the tuples. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Jim C. Nasby [EMAIL PROTECTED] writes: If the 4 header fields in question were just normalized out, wouldn't all the semantics continue to work the same? All I'm envisioning is replacing them in each tuple with a pointer (vis_id) to another datastore that would be roughly equivalent to: CREATE TABLE visibility ( vis_id SERIAL, xminint, xmaxint, cminint, cmax_xmax int ) Of course you wouldn't use an actual table to do this, but hopefully this clarifies my idea. Let's see ... replace every tuple access with TWO tuple accesses ... yes, that should improve performance nicely. And no, I'm not sure the semantics are the same, particularly w.r.t. atomicity of state changes. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Tue, Sep 06, 2005 at 07:02:20PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: If the 4 header fields in question were just normalized out, wouldn't all the semantics continue to work the same? All I'm envisioning is replacing them in each tuple with a pointer (vis_id) to another datastore that would be roughly equivalent to: CREATE TABLE visibility ( vis_id SERIAL, xminint, xmaxint, cminint, cmax_xmax int ) Of course you wouldn't use an actual table to do this, but hopefully this clarifies my idea. Let's see ... replace every tuple access with TWO tuple accesses ... yes, that should improve performance nicely. And no, I'm not sure the semantics are the same, particularly w.r.t. atomicity of state changes. Well, like I said, I'm not envisioning using a table to store that info. Since we'd be storing 4 fixed-length fields, you wouldn't need to actually store vis_id per entry, just use the offset into the page. Assuming you use one 'slot' to store the id of the first set, you could store 511 tuples per page, which doesn't sound very bad. But this is why I'm hoping a quick patch could be done so that this could be tested. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Fri, 2 Sep 2005 20:41:48 -0400 (EDT), Bruce Momjian pgman@candle.pha.pa.us wrote: Once I had a patch based on 7.4 that stored cmin and cmax in backend-local memory. Interesting idea, but how would you record the cmin/xmin values without requiring unlimited memory? That's exactly the reason for not sending it to -patches. Without spilling to disk this is just not ready for real life. The problem is that -- unlike other data structures that build up during a transaction, e.g. trigger queues -- cmin/cmax lookup requires random access, so we'd need some form of tree or hash. Unfornunately I never got beyond brainstorming :-( BTW, is there anything else that'd need spilling to disk during long transactions? Servus Manfred ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Sat, Sep 03, 2005 at 10:59:31AM +0200, Manfred Koizar wrote: On Fri, 2 Sep 2005 20:41:48 -0400 (EDT), Bruce Momjian pgman@candle.pha.pa.us wrote: Once I had a patch based on 7.4 that stored cmin and cmax in backend-local memory. Interesting idea, but how would you record the cmin/xmin values without requiring unlimited memory? That's exactly the reason for not sending it to -patches. Without spilling to disk this is just not ready for real life. The problem is that -- unlike other data structures that build up during a transaction, e.g. trigger queues -- cmin/cmax lookup requires random access, so we'd need some form of tree or hash. Unfornunately I never got beyond brainstorming :-( BTW, is there anything else that'd need spilling to disk during long transactions? Yes, the queue of pending deferred triggers. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com El hombre nunca sabe de lo que es capaz hasta que lo intenta (C. Dickens) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Manfred Koizar wrote: On Fri, 2 Sep 2005 20:41:48 -0400 (EDT), Bruce Momjian pgman@candle.pha.pa.us wrote: Once I had a patch based on 7.4 that stored cmin and cmax in backend-local memory. Interesting idea, but how would you record the cmin/xmin values without requiring unlimited memory? That's exactly the reason for not sending it to -patches. Without spilling to disk this is just not ready for real life. The problem is that -- unlike other data structures that build up during a transaction, e.g. trigger queues -- cmin/cmax lookup requires random access, so we'd need some form of tree or hash. Unfornunately I never got beyond brainstorming :-( What we do for shared row locks is much more compact because we create a phantom xid for combination of xids and store that in a hash. I think the problem with having cmin/cmax in local memory is that without a way to stamp a row with some fake cid (as is suggested in the TODO item now) there is really no way to _group_ rows together to store their values as a single item in local memory, meaning we would have to store ctids or something for each row, and that seems unscalable. I have added your memory-only idea to the TODO item because it highlights that cmin/cmax is never looked at outside the backend that is modifying the row. If there was a tuple header field we can use just while the row is being created or expired we could use that, but I don't see one. Could we recalculated one of the fields after it is created/expired? I don't know. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Remove xmin and cmin from frozen tuples
[skip] happening in the bgwriter's inner loop. Even more to the point, you can't do such changes without getting a superexclusive lock on the page (not only locked, but no one else has it pinned), which is a real nonstarter for the bgwriter, both for performance and possible deadlock issues. Hi Tom, I do not want to discuss in deep the place to do this job, it is really over my head. But, you said you need a super-exclusive lock, and I wonder if a wait-free algorithm would be good for pg in a general manner. I have given some references about it already on the list. http://archives.postgresql.org/pgsql-hackers/2005-02/msg00263.php Cordialement, Jean-Gérard Pailloncy ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Tom Lane wrote: Of course, it's fair to ask whether such a program would be any faster than binary-mode COPY by the time you got done ... or enough faster to justify your effort, anyway. THe only fundamental disadvantage that COPY labors under is having to write WAL records. It might be interesting to do something similar to the recent hacks for CREATE TABLE AS, so that a COPY into a table just created in the current transaction would skip writing WAL and instead fsync the table at the end. Added to TODO: o Allow COPY into an empty table to skip WAL logging -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Tom Lane wrote: ITAGAKI Takahiro [EMAIL PROTECTED] writes: I think it would be a waste to retain xmin and cmin for frozen tuples because their values represent only 'visible for all transactions'. True, but the hard part is getting rid of the storage for them. I wrote a makeshift patch to compress xmin and cmin (8bytes) to 1-bit flag, using tuple overlaping. Is this idea worth trying? I think this is incredibly ugly :-(. It eliminates a fairly basic assumption which is that items on a page don't overlap. The space savings cannot be worth the loss in testability and reliability. To take just one problem, it is no longer possible to check an item offset for validity against pd_upper. If we're going to do this, we need a more invasive patch that changes the structure of heaptuple headers in a more fundamental way, and avoids breaking the page layout representation. (Something like the way Oids are now handled might work, although there are alignment issues to worry about, and it'd take more work on VACUUM's part to convert a tuple to frozen state.) I'm also less than enthused about using up our last infomask bit for a relatively unimportant purpose. We might need that for something bigger someday... though I can't presently guess what. Considering the cost/benefits, rather than doing some optimization for long-lived tuples, I would like to see us merge the existing xmin/xmax/cmin/cmax values back into three storage fields like we had in 7.4 and had to expand to a full four in 8.0 to support subtransactions. The benefit is that every row would be reduced in size by 4 bytes or 14% for all rows: * Merge xmin/xmax/cmin/cmax back into three header fields Before subtransactions, there used to be only three fields needed to store these four values. This was possible because only the current transaction looks at the cmin/cmax values. If the current transaction created and expired the row the fields stored where xmin (same as xmax), cmin, cmax, and if the transaction was expiring a row from a another transaction, the fields stored were xmin (cmin was not needed), xmax, and cmax. Such a system worked because a transaction could only see rows from another completed transaction. However, subtransactions can see rows from outer transactions, and once the subtransaction completes, the outer transaction continues, requiring the storage of all four fields. With subtransactions, an outer transaction can create a row, a subtransaction expire it, and when the subtransaction completes, the outer transaction still has to have proper visibility of the row's cmin, for example, for cursors. One possible solution is to create a phantom cid which represents a cmin/cmax pair and is stored in local memory. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: THe only fundamental disadvantage that COPY labors under is having to write WAL records. It might be interesting to do something similar to the recent hacks for CREATE TABLE AS, so that a COPY into a table just created in the current transaction would skip writing WAL and instead fsync the table at the end. Added to TODO: o Allow COPY into an empty table to skip WAL logging It has to be a *new* table, not an *empty* table. If it's already visible to other xacts then somebody else could insert into it in parallel with you, because COPY doesn't take an exclusive lock. Contrariwise, it doesn't really matter (I think) if there are WAL-logged records already in the table and COPY is adding more that aren't logged. (You might have to force COPY to start adding the rows on freshly added pages ... hmm ... all of a sudden I think we had this discussion already? I for sure remember the fresh-pages trick from some other thread.) regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Bruce Momjian pgman@candle.pha.pa.us writes: Considering the cost/benefits, rather than doing some optimization for long-lived tuples, I would like to see us merge the existing xmin/xmax/cmin/cmax values back into three storage fields like we had in 7.4 and had to expand to a full four in 8.0 to support subtransactions. There is another reason for trying to do that rather than the frozen-row optimization, which is that to get it down to two visibility-related fields, we'd have to find another representation for tuples that are Datums in memory. The current Datum representation overlays three int32 fields where the visibility fields are for a tuple on-disk. This works fine now, and would still work fine if we could revert to the 7.4 approach, but it doesn't play nicely with a scheme to remove 2 of the 4 fields. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Fri, Sep 02, 2005 at 04:02:08PM -0400, Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: THe only fundamental disadvantage that COPY labors under is having to write WAL records. It might be interesting to do something similar to the recent hacks for CREATE TABLE AS, so that a COPY into a table just created in the current transaction would skip writing WAL and instead fsync the table at the end. Added to TODO: o Allow COPY into an empty table to skip WAL logging It has to be a *new* table, not an *empty* table. If it's already visible to other xacts then somebody else could insert into it in parallel with you, because COPY doesn't take an exclusive lock. What about the indexes? Logging one of the inserters and not the other is certain to corrupt the whole thing. (Logging index insertion but not the heap itself is silly, but perhaps an easy way out is to disable the feature for tables with indexes.) Contrariwise, it doesn't really matter (I think) if there are WAL-logged records already in the table and COPY is adding more that aren't logged. Only if the page is locked in a fashion that the bulk loader can't insert tuples into a page that the other transaction is using. (Not sure if this can happen in reality.) Else we risk both inserting a tuple in the same page, and on recovery finding out that somebody else used the tuple slot. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com Los románticos son seres que mueren de deseos de vida ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Tom, Alvaro, It has to be a *new* table, not an *empty* table. If it's already visible to other xacts then somebody else could insert into it in parallel with you, because COPY doesn't take an exclusive lock. There's still major gains to be had, for ETL, in being able to disable logging on new tables/partitions. *particularly* partitions. Contrariwise, it doesn't really matter (I think) if there are WAL-logged records already in the table and COPY is adding more that aren't logged. (You might have to force COPY to start adding the rows on freshly added pages ... hmm ... all of a sudden I think we had this discussion already? I for sure remember the fresh-pages trick from some other thread.) Yes, and that's what shot the proposal down before. But I don't think we devoted sufficient discussion to the new table case. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Remove xmin and cmin from frozen tuples
People: Bruce Momjian pgman@candle.pha.pa.us writes: Considering the cost/benefits, rather than doing some optimization for long-lived tuples, I would like to see us merge the existing xmin/xmax/cmin/cmax values back into three storage fields like we had in 7.4 and had to expand to a full four in 8.0 to support subtransactions. Hmmm. I personally don't see a whole lot of value in trimming 4 bytes per row off an archive table, particularly if the table would need to go through some kind of I/O intensive operation to do it. Where I do see value is in enabling index-only access for frozen tables. That would be a *huge* gain, especially with bitmaps. I think we've discussed this before,though. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Alvaro Herrera [EMAIL PROTECTED] writes: On Fri, Sep 02, 2005 at 04:02:08PM -0400, Tom Lane wrote: It has to be a *new* table, not an *empty* table. If it's already visible to other xacts then somebody else could insert into it in parallel with you, because COPY doesn't take an exclusive lock. What about the indexes? Logging one of the inserters and not the other is certain to corrupt the whole thing. Good point, but that fits in just fine with the restriction to just-created tables. Contrariwise, it doesn't really matter (I think) if there are WAL-logged records already in the table and COPY is adding more that aren't logged. Only if the page is locked in a fashion that the bulk loader can't insert tuples into a page that the other transaction is using. What other transaction? The point I was making is that BEGIN; CREATE TABLE ... INSERT ... COPY ... is still optimizable. There isn't going to be anyone competing with the COPY while it runs. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Fri, Sep 02, 2005 at 04:27:59PM -0400, Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Contrariwise, it doesn't really matter (I think) if there are WAL-logged records already in the table and COPY is adding more that aren't logged. Only if the page is locked in a fashion that the bulk loader can't insert tuples into a page that the other transaction is using. What other transaction? The point I was making is that BEGIN; CREATE TABLE ... INSERT ... COPY ... is still optimizable. There isn't going to be anyone competing with the COPY while it runs. Sure. I was thinking that you were looking for a mechanism to relax the other restriction. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com Dios hizo a Adán, pero fue Eva quien lo hizo hombre. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Fri, Sep 02, 2005 at 01:30:58PM -0700, Josh Berkus wrote: Contrariwise, it doesn't really matter (I think) if there are WAL-logged records already in the table and COPY is adding more that aren't logged. (You might have to force COPY to start adding the rows on freshly added pages ... hmm ... all of a sudden I think we had this discussion already? I for sure remember the fresh-pages trick from some other thread.) Yes, and that's what shot the proposal down before. But I don't think we devoted sufficient discussion to the new table case. If we are going to have real partitioning sometime soon, I don't think the restriction is a problem. You may have to load a whole partition again, which may be faster than using logged COPY to an already-filled partition. The point is, it's not the whole table, just a partition. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com Acepta los honores y aplausos y perderás tu libertad ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Fri, Sep 02, 2005 at 01:35:42PM -0700, Josh Berkus wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Considering the cost/benefits, rather than doing some optimization for long-lived tuples, I would like to see us merge the existing xmin/xmax/cmin/cmax values back into three storage fields like we had in 7.4 and had to expand to a full four in 8.0 to support subtransactions. Hmmm. I personally don't see a whole lot of value in trimming 4 bytes per row off an archive table, particularly if the table would need to go through some kind of I/O intensive operation to do it. I think you are missing something. These 4 bytes are not trimmed by an I/O-intensive operation, they are not written in the first place. Now, I agree for a very wide table those 4 bytes per tuple may not be a lot. But the optimization could be significant for not-wide (uh, sorry, I don't remember the word) tables. Where I do see value is in enabling index-only access for frozen tables. That would be a *huge* gain, especially with bitmaps. I think we've discussed this before, though. That's a completely different discussion. Btree-organized heaps may help you there. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com Having your biases confirmed independently is how scientific progress is made, and hence made our great society what it is today (Mary Gardiner) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: On Fri, Sep 02, 2005 at 04:02:08PM -0400, Tom Lane wrote: It has to be a *new* table, not an *empty* table. If it's already visible to other xacts then somebody else could insert into it in parallel with you, because COPY doesn't take an exclusive lock. What about the indexes? Logging one of the inserters and not the other is certain to corrupt the whole thing. Good point, but that fits in just fine with the restriction to just-created tables. Seem the newly created table could have an index, but we would skip logging on that too and create a zero-length file on crash restore. Contrariwise, it doesn't really matter (I think) if there are WAL-logged records already in the table and COPY is adding more that aren't logged. Only if the page is locked in a fashion that the bulk loader can't insert tuples into a page that the other transaction is using. What other transaction? The point I was making is that BEGIN; CREATE TABLE ... INSERT ... COPY ... is still optimizable. There isn't going to be anyone competing with the COPY while it runs. Updated TODO: o Allow COPY on a newly-created table to skip WAL logging On crash recovery, the table involved in the COPY would have its heap and index files truncated. One issue is that no other backend should be able to add to the table at the same time, which is something that is currently allowed. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Alvaro Herrera [EMAIL PROTECTED] writes: On Fri, Sep 02, 2005 at 01:35:42PM -0700, Josh Berkus wrote: Where I do see value is in enabling index-only access for frozen tables. That would be a *huge* gain, especially with bitmaps. I think we've discussed this before, though. That's a completely different discussion. Btree-organized heaps may help you there. There was some talk of using a spare bit in index entries to mark known good index entries (xmin committed and less than GlobalXmin, and xmax invalid) but the cost of maintaining such bits seems nontrivial. In any case I agree that's an independent issue. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Bruce Momjian pgman@candle.pha.pa.us writes: Updated TODO: o Allow COPY on a newly-created table to skip WAL logging On crash recovery, the table involved in the COPY would have its heap and index files truncated. One issue is that no other backend should be able to add to the table at the same time, which is something that is currently allowed. This is simply wrong. (1) a table created in the current transaction isn't visible to anyone else, (2) the correct rollback state is for it not to be there, rather than be there and empty. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Updated TODO: o Allow COPY on a newly-created table to skip WAL logging On crash recovery, the table involved in the COPY would have its heap and index files truncated. One issue is that no other backend should be able to add to the table at the same time, which is something that is currently allowed. This is simply wrong. (1) a table created in the current transaction isn't visible to anyone else, (2) the correct rollback state is for it not to be there, rather than be there and empty. New text: o Allow COPY on a newly-created table to skip WAL logging On crash recovery, the table involved in the COPY would removed or have its heap and index files truncated. One issue is that no other backend should be able to add to the table at the same time, which is something that is currently allowed. I think we can lock a zero-length table and do this optimization. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Fri, Sep 02, 2005 at 05:18:09PM -0400, Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Updated TODO: o Allow COPY on a newly-created table to skip WAL logging On crash recovery, the table involved in the COPY would have its heap and index files truncated. One issue is that no other backend should be able to add to the table at the same time, which is something that is currently allowed. This is simply wrong. (1) a table created in the current transaction isn't visible to anyone else, (2) the correct rollback state is for it not to be there, rather than be there and empty. As a related note: I remember somebody mentioned some time ago that if you create a table and then crash before ending the transaction, the tuple in pg_class is no longer valid, but the file remains. I think this will be a much worse problem if we allow a table that's being COPY'ed to remain after a crash. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com Oh, oh, las chicas galacianas, lo harán por las perlas, ¡Y las de Arrakis por el agua! Pero si buscas damas Que se consuman como llamas, ¡Prueba una hija de Caladan! (Gurney Halleck) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Alvaro Herrera [EMAIL PROTECTED] writes: I remember somebody mentioned some time ago that if you create a table and then crash before ending the transaction, the tuple in pg_class is no longer valid, but the file remains. Right --- it will be removed on transaction rollback, but not if the backend crashes first. There was a patch submitted earlier this year to try to clean out such files, but it got rejected (as too messy IIRC). I think we still have a TODO item about it. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Fri, 2 Sep 2005 15:51:15 -0400 (EDT), Bruce Momjian pgman@candle.pha.pa.us wrote: * Merge xmin/xmax/cmin/cmax back into three header fields And don't forget xvac, please. Before subtransactions, there used to be only three fields needed to store these four values. ... five values. This was possible because only the current transaction looks at the cmin/cmax values. Which is a reason to get rid of cmin/cmax in tuple headers entirely. Once I had a patch based on 7.4 that stored cmin and cmax in backend-local memory. It passed make check and some volume tests, but I felt it was not ready to be applied without any spill-to-disk mechanism. Development stalled when I tried to eliminate xvac as well, which would have required deep cuts into VACUUM code :-( Servus Manfred ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Manfred Koizar wrote: On Fri, 2 Sep 2005 15:51:15 -0400 (EDT), Bruce Momjian pgman@candle.pha.pa.us wrote: * Merge xmin/xmax/cmin/cmax back into three header fields And don't forget xvac, please. Before subtransactions, there used to be only three fields needed to store these four values. ... five values. This was possible because only the current transaction looks at the cmin/cmax values. Which is a reason to get rid of cmin/cmax in tuple headers entirely. Once I had a patch based on 7.4 that stored cmin and cmax in backend-local memory. It passed make check and some volume tests, but I felt it was not ready to be applied without any spill-to-disk mechanism. Development stalled when I tried to eliminate xvac as well, which would have required deep cuts into VACUUM code :-( Interesting idea, but how would you record the cmin/xmin values without requiring unlimited memory? -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Remove xmin and cmin from frozen tuples
ITAGAKI Takahiro [EMAIL PROTECTED] writes: I agree. I think an good position of freezer is on bgwriter. My idea is: 1. Just before bgwriter writes an dirty page in LRU order, 2. Freeze tuples in the page and repair fragmentation. 3. (Replace the fsm page that has least freespace.) 4. Flush the page. This is a bad idea. The bgwriter isn't the place to be doing freezing, because there is no reasonable way for it to guarantee that all old tuples in a table (or any larger unit) have been frozen. So you'd still need VACUUM to ensure no wraparound. Plus, you can't do such changes without emitting an XLOG record, which is something we don't want happening in the bgwriter's inner loop. Even more to the point, you can't do such changes without getting a superexclusive lock on the page (not only locked, but no one else has it pinned), which is a real nonstarter for the bgwriter, both for performance and possible deadlock issues. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Wed, Aug 31, 2005 at 09:14:42PM -0700, Josh Berkus wrote: One thing that comes to mind is that this makes somewhat easier to build a tool to write pre-built tables, for bulk-loading purposes. You just construct the binary file with the HEAP_FROZEN bit set, and then attach the file to a dummy table. (Then again, you can do it today, using a Xmin of FrozenTransactionId. I wonder why the Bizgres people isn't advocating a tool to do that. It is very hard to do with user-defined types, but for BI/DW you mostly don't need those, do you?) Hmmm ... can you expand on this a little? We'd discussed frozen partitions but hadn't thought to get around to them for a while, expecting the kind of issues which Tom just raised. What issues did he raise on this? What I'm saying is that you can write a heap file, on which the tuples would all have xmin=FrozenTransactionId, xmax=Invalid, and the corresponding bits set in the infomask. This ensures that no matter the state of the server, you can plug the file in and all tuples will be valid. The only problem is figuring out how to lay the data in the tuples themselves, w.r.t endianness and such. This is platform-dependent, so you have to write code to do it correctly. In absence of user-defined types, this should not be _too_ hard to do. Of course, such a program would in general also be Postgres-version-dependent. Note that this is a very different business from skipping the Xmin and Cmin from the tuple header -- in fact, there's no relation to that at all. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com FOO MANE PADME HUM ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Alvaro Herrera [EMAIL PROTECTED] writes: What I'm saying is that you can write a heap file, on which the tuples would all have xmin=FrozenTransactionId, xmax=Invalid, and the corresponding bits set in the infomask. This ensures that no matter the state of the server, you can plug the file in and all tuples will be valid. The only problem is figuring out how to lay the data in the tuples themselves, w.r.t endianness and such. This is platform-dependent, so you have to write code to do it correctly. In absence of user-defined types, this should not be _too_ hard to do. Of course, such a program would in general also be Postgres-version-dependent. Of course, it's fair to ask whether such a program would be any faster than binary-mode COPY by the time you got done ... or enough faster to justify your effort, anyway. THe only fundamental disadvantage that COPY labors under is having to write WAL records. It might be interesting to do something similar to the recent hacks for CREATE TABLE AS, so that a COPY into a table just created in the current transaction would skip writing WAL and instead fsync the table at the end. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Wed, 2005-08-31 at 22:25 -0400, Alvaro Herrera wrote: On Thu, Sep 01, 2005 at 10:45:44AM +0900, ITAGAKI Takahiro wrote: Hi, I think it would be a waste to retain xmin and cmin for frozen tuples because their values represent only 'visible for all transactions'. Additionally, most tuples in database can be frozen potentially. I think this is an interesting idea. Agreed, especially since it would avoid the need to vacuum altogether. I was thinking that when the tuple needs to be obsoleted it would need to grow to accomodate the Xmax, but you are not actually proposing to remove that, so it seems sensible. In fact, it is perfectly reasonable to remove Xmin and Cmin, because after the tuple is frozen, the Xmin never changes again. It's a good idea, but the Xmin is set to FrozenTransactionId, which is how we know it is frozen, so how can we remove Xmin? The way to do this is surely by using a row version id that is different for this format. Getting 8 or 16 bytes per row back would be a very useful gain. Now, one thing of note is that you need to compress the page in order to actually be able to use the just-freed space. VACUUM could do that, but maybe it would be better to do it on-line -- the freezing process is going to have to write the page regardless. I wonder if with your patch the page is compressed on the same VACUUM execution that freezes the tuple? Only if you do a FULL, which is currently incompatible with a FREEZE. There's no point in compressing a block if you can't also redistribute rows between blocks to fill up the spaces, so another reason why it has to be a FULL. Unless you do this at load time, which is why I guess you mention One thing that comes to mind is that this makes somewhat easier to build a tool to write pre-built tables, for bulk-loading purposes. You just construct the binary file with the HEAP_FROZEN bit set, and then attach the file to a dummy table. (Then again, you can do it today, using a Xmin of FrozenTransactionId. I wonder why the Bizgres people isn't advocating a tool to do that. It is very hard to do with user-defined types, but for BI/DW you mostly don't need those, do you?) Loading a table using COPY with frozen bits set was suggested in May, so yeh... it was suggested. At that time it was rejected, since earlier transactions would then be able to see rows they ought not be able to see. Thinking some more about this, this is only the inverse situation of a TRUNCATE. With truncate we remove tuples that ought to still be visible to pre-existing transactions. So there shouldn't really be an issue with loading pre-frozen tuples - as long as you accept the consequences for row visibility. Externally writing blocks is possible, but it bypasses a lot of other features. My current preference would be to have bulk_heap_insert() function to add a whole page at a time rather than inserting rows one at at a time. The main objective for a load is to make it disk bound; once we've achieved that by some further tuning, writing an external file would cost around the same as writing it internally from the DBMS. Oracle (direct path loader) and Teradata (Fastload) load data in complete blocks using a reduced code pathway, so I guess I was just following on, but I'm genuinely open to further persuasion if there is a better way. Having a table marked as INSERT ONLY would allow us to save 8 bytes/row, loading it pre-frozen (in some way) would save another 8 bytes/row and allow us to permanently avoid VACUUMing the table. That would be even better when we have per-table XID wrap avoidance. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Thu, Sep 01, 2005 at 11:08:36AM -0400, Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: What I'm saying is that you can write a heap file, on which the tuples would all have xmin=FrozenTransactionId, xmax=Invalid, and the corresponding bits set in the infomask. This ensures that no matter the state of the server, you can plug the file in and all tuples will be valid. The only problem is figuring out how to lay the data in the tuples themselves, w.r.t endianness and such. This is platform-dependent, so you have to write code to do it correctly. In absence of user-defined types, this should not be _too_ hard to do. Of course, such a program would in general also be Postgres-version-dependent. Of course, it's fair to ask whether such a program would be any faster than binary-mode COPY by the time you got done ... or enough faster to justify your effort, anyway. It may not be faster generating the data in the first place, but you don't have to vacuum the table, nor you are subject to hint bits changing, resulting in more unnecessary I/O. This can't be avoided with COPY, because there's always the chance that it will fail partway through, so you can't write frozen tuples. With an external program, you can just dump the invalid line somewhere else and continue with the rest. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com Just treat us the way you want to be treated + some extra allowance for ignorance.(Michael Brusser) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Alvaro, What issues did he raise on this? On having no Xmin. What I'm saying is that you can write a heap file, on which the tuples would all have xmin=FrozenTransactionId, xmax=Invalid, and the corresponding bits set in the infomask. This ensures that no matter the state of the server, you can plug the file in and all tuples will be valid. The only problem is figuring out how to lay the data in the tuples themselves, w.r.t endianness and such. This is platform-dependent, so you have to write code to do it correctly. In absence of user-defined types, this should not be _too_ hard to do. Of course, such a program would in general also be Postgres-version-dependent. So, bulk loading by file generation? So the idea is that you would generate a properly formatted PostgreSQL table file, and then in one transaction create the table and attach it? Seems like this would have the additional limitation of being useful only for loading new partitions/new tables. However, it would have some significant advantages for bulk loading ... chiefly that the data page generation and associated computations could be done *off* the database server. This might help considerably in getting around the 100mb/s data computation ceiling we're hitting ... -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Tom, THe only fundamental disadvantage that COPY labors under is having to write WAL records. It might be interesting to do something similar to the recent hacks for CREATE TABLE AS, so that a COPY into a table just created in the current transaction would skip writing WAL and instead fsync the table at the end. Yes, I thought we discussed doing this for empty tables -- it would be, per our tests, a +10% to +30% boost to COPY. But there was some problem the patch? -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Thu, Sep 01, 2005 at 09:20:48AM -0700, Josh Berkus wrote: What I'm saying is that you can write a heap file, on which the tuples would all have xmin=FrozenTransactionId, xmax=Invalid, and the corresponding bits set in the infomask. This ensures that no matter the state of the server, you can plug the file in and all tuples will be valid. So, bulk loading by file generation? So the idea is that you would generate a properly formatted PostgreSQL table file, and then in one transaction create the table and attach it? Exactly. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com Changing the world ... one keyboard at a time! (www.DVzine.org) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Thu, Sep 01, 2005 at 04:34:09PM +0100, Simon Riggs wrote: On Wed, 2005-08-31 at 22:25 -0400, Alvaro Herrera wrote: I was thinking that when the tuple needs to be obsoleted it would need to grow to accomodate the Xmax, but you are not actually proposing to remove that, so it seems sensible. In fact, it is perfectly reasonable to remove Xmin and Cmin, because after the tuple is frozen, the Xmin never changes again. It's a good idea, but the Xmin is set to FrozenTransactionId, which is how we know it is frozen, so how can we remove Xmin? The way to do this is surely by using a row version id that is different for this format. Per Takahiro's patch, you don't need to set the Xmin to FrozenTransactionId -- what you do instead is set a bit in the infomask. Now, one thing of note is that you need to compress the page in order to actually be able to use the just-freed space. VACUUM could do that, but maybe it would be better to do it on-line -- the freezing process is going to have to write the page regardless. I wonder if with your patch the page is compressed on the same VACUUM execution that freezes the tuple? Only if you do a FULL, which is currently incompatible with a FREEZE. Well, if we are going to mess with what FREEZE is doing, we can as well make it compress the page. Note that to compress the page you don't need to touch the indexes. I don't remember the exact reason why FULL is incompatible with FREEZE, but AFAIR it's not fundamentally unsolvable (just very hard.) There's no point in compressing a block if you can't also redistribute rows between blocks to fill up the spaces, so another reason why it has to be a FULL. That's a good point. One thing that comes to mind is that this makes somewhat easier to build a tool to write pre-built tables, for bulk-loading purposes. You just construct the binary file with the HEAP_FROZEN bit set, and then attach the file to a dummy table. Loading a table using COPY with frozen bits set was suggested in May, so yeh... it was suggested. I'm not proposing to use COPY for that. It has loads of problems, which is why the patch was rejected. Using an external program is a different matter. Externally writing blocks is possible, but it bypasses a lot of other features. Like what? I don't really care for this feature, mind you -- I was merely mentioning the idea as it crossed my mind. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com The problem with the facetime model is not just that it's demoralizing, but that the people pretending to work interrupt the ones actually working. (Paul Graham) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Josh Berkus josh@agliodbs.com writes: THe only fundamental disadvantage that COPY labors under is having to write WAL records. It might be interesting to do something similar to the recent hacks for CREATE TABLE AS, so that a COPY into a table just created in the current transaction would skip writing WAL and instead fsync the table at the end. Yes, I thought we discussed doing this for empty tables -- it would be, per our tests, a +10% to +30% boost to COPY. But there was some problem the patch? I have seen no such patch AFAIR. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Alvaro Herrera [EMAIL PROTECTED] writes: On Thu, Sep 01, 2005 at 04:34:09PM +0100, Simon Riggs wrote: On Wed, 2005-08-31 at 22:25 -0400, Alvaro Herrera wrote: Now, one thing of note is that you need to compress the page in order to actually be able to use the just-freed space. VACUUM could do that, but maybe it would be better to do it on-line -- the freezing process is going to have to write the page regardless. I wonder if with your patch the page is compressed on the same VACUUM execution that freezes the tuple? Only if you do a FULL, which is currently incompatible with a FREEZE. Well, if we are going to mess with what FREEZE is doing, we can as well make it compress the page. Anyone looked at the code lately??? PageRepairFragmentation is part of any kind of vacuum. As long as you don't reassign tuple IDs (which it doesn't) there's no impact on indexes. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: Additional background daemon (was: [HACKERS] Remove xmin and cmin from frozen tuples)
Jim C. Nasby [EMAIL PROTECTED] writes: On Thu, Sep 01, 2005 at 09:22:35AM -0400, Tom Lane wrote: This is a bad idea. The bgwriter isn't the place to be doing freezing, So is this something that another daemon could handle? Possibly, but I'd be inclined to think of it as autovacuum's problem. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] Remove xmin and cmin from frozen tuples
Hi Hackers, I think it would be a waste to retain xmin and cmin for frozen tuples because their values represent only 'visible for all transactions'. Additionally, most tuples in database can be frozen potentially. I wrote a makeshift patch to compress xmin and cmin (8bytes) to 1-bit flag, using tuple overlaping. Is this idea worth trying? Also, it will be useful to combine it and more aggressive freeze vacuum, for example, a freezer integrated with bgwriter. (The following is test of the attached patch) * Test query 1. create table test (a int); 2. insert into test select * from generate_series(1, 10); 3. update test set a = a where a % 100 = 0; # to force defrag 4. select * from pgstattuple('test'); 5. vacuum freeze test; 6. select * from pgstattuple('test'); * Results of pgstattuple -[ before vacuum ]-+ table_len | 3645440 tuple_count| 10 tuple_len | 320 tuple_percent | 87.78 dead_tuple_count | 1000 dead_tuple_len | 32000 dead_tuple_percent | 0.88 free_space | 536 free_percent | 0.01 -[ 8.1beta1 orig ]-+ table_len | 3645440 tuple_count| 10 tuple_len | 320 tuple_percent | 87.78 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 30772 -- about 32byte * 1000 (dead tuples) free_percent | 0.84 -[ patched ]---+ table_len | 3645440 tuple_count| 10 tuple_len | 320 tuple_percent | 87.78 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 823628 -- + 8byte * 10 (whole tuples) free_percent | 22.59 --- ITAGAKI Takahiro NTT Cyber Space Laboratories xmincut.patch Description: Binary data ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Remove xmin and cmin from frozen tuples
On Thu, Sep 01, 2005 at 10:45:44AM +0900, ITAGAKI Takahiro wrote: Hi, I think it would be a waste to retain xmin and cmin for frozen tuples because their values represent only 'visible for all transactions'. Additionally, most tuples in database can be frozen potentially. I think this is an interesting idea. I was thinking that when the tuple needs to be obsoleted it would need to grow to accomodate the Xmax, but you are not actually proposing to remove that, so it seems sensible. In fact, it is perfectly reasonable to remove Xmin and Cmin, because after the tuple is frozen, the Xmin never changes again. Now, one thing of note is that you need to compress the page in order to actually be able to use the just-freed space. VACUUM could do that, but maybe it would be better to do it on-line -- the freezing process is going to have to write the page regardless. I wonder if with your patch the page is compressed on the same VACUUM execution that freezes the tuple? One thing that comes to mind is that this makes somewhat easier to build a tool to write pre-built tables, for bulk-loading purposes. You just construct the binary file with the HEAP_FROZEN bit set, and then attach the file to a dummy table. (Then again, you can do it today, using a Xmin of FrozenTransactionId. I wonder why the Bizgres people isn't advocating a tool to do that. It is very hard to do with user-defined types, but for BI/DW you mostly don't need those, do you?) -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com Cuando no hay humildad las personas se degradan (A. Christie) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Remove xmin and cmin from frozen tuples
ITAGAKI Takahiro [EMAIL PROTECTED] writes: I think it would be a waste to retain xmin and cmin for frozen tuples because their values represent only 'visible for all transactions'. True, but the hard part is getting rid of the storage for them. I wrote a makeshift patch to compress xmin and cmin (8bytes) to 1-bit flag, using tuple overlaping. Is this idea worth trying? I think this is incredibly ugly :-(. It eliminates a fairly basic assumption which is that items on a page don't overlap. The space savings cannot be worth the loss in testability and reliability. To take just one problem, it is no longer possible to check an item offset for validity against pd_upper. If we're going to do this, we need a more invasive patch that changes the structure of heaptuple headers in a more fundamental way, and avoids breaking the page layout representation. (Something like the way Oids are now handled might work, although there are alignment issues to worry about, and it'd take more work on VACUUM's part to convert a tuple to frozen state.) I'm also less than enthused about using up our last infomask bit for a relatively unimportant purpose. We might need that for something bigger someday... though I can't presently guess what. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Alvaro, One thing that comes to mind is that this makes somewhat easier to build a tool to write pre-built tables, for bulk-loading purposes. You just construct the binary file with the HEAP_FROZEN bit set, and then attach the file to a dummy table. (Then again, you can do it today, using a Xmin of FrozenTransactionId. I wonder why the Bizgres people isn't advocating a tool to do that. It is very hard to do with user-defined types, but for BI/DW you mostly don't need those, do you?) Hmmm ... can you expand on this a little? We'd discussed frozen partitions but hadn't thought to get around to them for a while, expecting the kind of issues which Tom just raised. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Alvaro Herrera [EMAIL PROTECTED] wrote: Now, one thing of note is that you need to compress the page in order to actually be able to use the just-freed space. VACUUM could do that, but maybe it would be better to do it on-line -- the freezing process is going to have to write the page regardless. I agree. I think an good position of freezer is on bgwriter. My idea is: 1. Just before bgwriter writes an dirty page in LRU order, 2. Freeze tuples in the page and repair fragmentation. 3. (Replace the fsm page that has least freespace.) 4. Flush the page. I wonder if with your patch the page is compressed on the same VACUUM execution that freezes the tuple? Yes, defragmentation is performed after freezing, but the page has at least one dead tuple. In current VACUUM implementation, pages that have no dead tuples will not be defraged. So you cannot compress just after bulk-load. --- ITAGAKI Takahiro NTT Cyber Space Laboratories ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Remove xmin and cmin from frozen tuples
Tom Lane [EMAIL PROTECTED] wrote: I think this is incredibly ugly :-(. Yes, I think so, too :-(My patch is product of the thought that I don't want to modify codes widely. So if we want to do it more cool way, lots of changes are needed as you said. I'm also less than enthused about using up our last infomask bit for a relatively unimportant purpose. We might need that for something bigger someday... though I can't presently guess what. I think it is not a problem, because the header still has rooms for several bits. I assume that the combination of HEAP_XMIN_COMMITTED + HEAP_XMIN_INVALID has currently no meaning, right? If so, HEAP_FROZEN can be assigned here. Also, t_natts is currently 16-bits, but it can be cut to 11-bits because MaxTupleAttributeNumber is 1664 2^11. --- ITAGAKI Takahiro NTT Cyber Space Laboratories ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster