Re: [PATCHES] COPY with no WAL, in certain circumstances
Patch withdrawn by author. --- Simon Riggs wrote: http://archives.postgresql.org/pgsql-hackers/2006-10/msg01172.php As discussed on -hackers, its possible to avoid writing any WAL at all for COPY in these circumstances: BEGIN; CREATE TABLE foo.. COPY foo... COMMIT; BEGIN; TRUNCATE foo.. COPY foo... COMMIT; The enclosed patch implements this, as discussed. There is no user interface to enable/disable, just as with CTAS and CREATE INDEX; no docs, just code comments. This plays nicely with the --single-transaction option in psql to allow fast restores/upgrades. YMMV but disk bound COPY will benefit greatly from this patch, some tests showing 100% gain. COPY is still *very* CPU intensive, so some tests have shown negligible benefit, fyi, but that isn't the typical case. Applies cleanly to CVS HEAD, passes make check. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com [ Attachment, skipping... ] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] COPY with no WAL, in certain circumstances
On Sat, 2007-01-06 at 21:32 -0500, Tom Lane wrote: Robert Treat [EMAIL PROTECTED] writes: On Saturday 06 January 2007 16:36, Simon Riggs wrote: snip BEGIN; CREATE TABLE foo... INSERT INTO foo--uses WAL COPY foo.. --no WAL INSERT INTO foo--uses WAL COPY foo.. --no WAL INSERT INTO foo--uses WAL COPY foo...--no WAL COMMIT; Is there some technical reason that the INSERT statements need to use WAL in these scenarios? First, there's enough other overhead to an INSERT that you'd not save much percentagewise. Second, not using WAL doesn't come for free: the cost is having to fsync the whole table afterwards. So it really only makes sense for commands that one can expect are writing pretty much all of the table. I could easily see it being a net loss for individual INSERTs. Agreed. We agreed that before, on the original design thread. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] COPY with no WAL, in certain circumstances
On Sat, 2007-01-06 at 21:18 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: The rule is: if the relfilenode for a table is new in this transaction (and therefore the whole things will be dropped at end-of-transaction) then *all* COPY commands are able to avoid writing WAL safely, if: - PITR is not enabled - there is no active portal (which could have been opened on an earlier commandid and could therefore see data prior to the switch to the new relfilenode). In those cases, *not* using WAL causes no problems at all, so sleep well without it. Uh ... what in the world has an active portal got to do with it? I think you've confused snapshot considerations with crash recovery. The patch sets HEAP_XMIN_COMMITTED on all of the rows loaded by COPY as well. So the active portal consideration does apply in this case. (We discussed about a year ago the idea of setting FrozenTransactionId, which I now agree wouldn't work, but setting the hint bits does work.). That is important, because otherwise the first person to read the newly loaded table has to re-write the whole table again; right now we ignore that cost as being associated with the original COPY, but from most users perspective it is. Its common practice to issue a select count(*) from table after its been loaded, so that later readers of the table don't suffer. Which makes me think we can still use the no-WAL optimisation, but just without setting HEAP_XMIN_COMMITTED when there is an active portal. (I should also mention that the creation of the relfilenode can happen in earlier committed subtransactions also. There is also a great big list of commands that throw implicit transactions, all of which cannot therefore be used with this optimisation either.) -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] COPY with no WAL, in certain circumstances
Simon Riggs [EMAIL PROTECTED] writes: On Sat, 2007-01-06 at 21:18 -0500, Tom Lane wrote: Uh ... what in the world has an active portal got to do with it? I think you've confused snapshot considerations with crash recovery. The patch sets HEAP_XMIN_COMMITTED on all of the rows loaded by COPY as well. I think you just talked yourself out of getting this patch applied. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] COPY with no WAL, in certain circumstances
On Sun, 2007-01-07 at 03:53 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Sat, 2007-01-06 at 21:18 -0500, Tom Lane wrote: Uh ... what in the world has an active portal got to do with it? I think you've confused snapshot considerations with crash recovery. The patch sets HEAP_XMIN_COMMITTED on all of the rows loaded by COPY as well. I think you just talked yourself out of getting this patch applied. Maybe; what would be your explanation? Do you have a failure case you know of? Perhaps if one exists, there is another route. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] COPY with no WAL, in certain circumstances
FYI, I am going need to add documentation in the COPY manual page or no one will know about this performance enhancement. --- Simon Riggs wrote: http://archives.postgresql.org/pgsql-hackers/2006-10/msg01172.php As discussed on -hackers, its possible to avoid writing any WAL at all for COPY in these circumstances: BEGIN; CREATE TABLE foo.. COPY foo... COMMIT; BEGIN; TRUNCATE foo.. COPY foo... COMMIT; The enclosed patch implements this, as discussed. There is no user interface to enable/disable, just as with CTAS and CREATE INDEX; no docs, just code comments. This plays nicely with the --single-transaction option in psql to allow fast restores/upgrades. YMMV but disk bound COPY will benefit greatly from this patch, some tests showing 100% gain. COPY is still *very* CPU intensive, so some tests have shown negligible benefit, fyi, but that isn't the typical case. Applies cleanly to CVS HEAD, passes make check. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com [ Attachment, skipping... ] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] COPY with no WAL, in certain circumstances
On Sat, 2007-01-06 at 11:05 -0500, Bruce Momjian wrote: FYI, I am going need to add documentation in the COPY manual page or no one will know about this performance enhancement. I have some questions: As discussed on -hackers, its possible to avoid writing any WAL at all for COPY in these circumstances: BEGIN; CREATE TABLE foo.. COPY foo... COMMIT; What if I do this? BEGIN; CREATE TABLE foo... INSERT INTO foo VALUES ('1'); COPY foo... COMMIT; ? E.g., what are the boundaries of ignoring the WAL? Joshua D. Drake BEGIN; TRUNCATE foo.. COPY foo... COMMIT; The enclosed patch implements this, as discussed. There is no user interface to enable/disable, just as with CTAS and CREATE INDEX; no docs, just code comments. This plays nicely with the --single-transaction option in psql to allow fast restores/upgrades. YMMV but disk bound COPY will benefit greatly from this patch, some tests showing 100% gain. COPY is still *very* CPU intensive, so some tests have shown negligible benefit, fyi, but that isn't the typical case. Applies cleanly to CVS HEAD, passes make check. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com [ Attachment, skipping... ] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] COPY with no WAL, in certain circumstances
Joshua D. Drake wrote: On Sat, 2007-01-06 at 11:05 -0500, Bruce Momjian wrote: FYI, I am going need to add documentation in the COPY manual page or no one will know about this performance enhancement. I have some questions: As discussed on -hackers, its possible to avoid writing any WAL at all for COPY in these circumstances: BEGIN; CREATE TABLE foo.. COPY foo... COMMIT; What if I do this? BEGIN; CREATE TABLE foo... INSERT INTO foo VALUES ('1'); COPY foo... COMMIT; On ABORT, the entire table disappears, as well as the INSERT, so I don't see any problem. I assume the INSERT is WAL logged. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] COPY with no WAL, in certain circumstances
BEGIN; CREATE TABLE foo... INSERT INTO foo VALUES ('1'); COPY foo... COMMIT; On ABORT, the entire table disappears, as well as the INSERT, so I don't see any problem. I assume the INSERT is WAL logged. No I don't see any problems, I am just trying to understand the boundaries. E.g., is there some weird limitation where if I have any values in the table before the copy (like the example above) that copy will go through WAL. Or in other words, does this patch mean that all COPY execution that is within a transaction will ignore WAL? Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] COPY with no WAL, in certain circumstances
Bruce Momjian [EMAIL PROTECTED] writes: FYI, I am going need to add documentation in the COPY manual page or no one will know about this performance enhancement. I don't think it belongs in COPY. What would make more sense is another item under the populating a database performance tips, suggesting that wrapping the restore into a single transaction is a good idea. We don't really want to be documenting this separately under COPY, CREATE INDEX, and everywhere else that might eventually optimize the case. Come to think of it, that page also fails to suggest that PITR logging shouldn't be on during bulk load. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] COPY with no WAL, in certain circumstances
Simon Riggs wrote: As discussed on -hackers, its possible to avoid writing any WAL at all for COPY in these circumstances: Cool. The enclosed patch implements this, as discussed. There is no user interface to enable/disable, just as with CTAS and CREATE INDEX; no docs, just code comments. IMHO, this deserves an GUC parameter (use_wal_in_copy?). Because a lot of people use COPY because it's faster than INSERT but expects that it will be in WAL. The default would be use_wal_in_copy = true. -- Euler Taveira de Oliveira http://www.timbira.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] COPY with no WAL, in certain circumstances
On Sat, 2007-01-06 at 16:41 -0200, Euler Taveira de Oliveira wrote: Simon Riggs wrote: As discussed on -hackers, its possible to avoid writing any WAL at all for COPY in these circumstances: Cool. The enclosed patch implements this, as discussed. There is no user interface to enable/disable, just as with CTAS and CREATE INDEX; no docs, just code comments. IMHO, this deserves an GUC parameter (use_wal_in_copy?). Because a lot of people use COPY because it's faster than INSERT but expects that it will be in WAL. The default would be use_wal_in_copy = true. That I don't think makes sense. A copy is an all or nothing option, if a copy fails in the middle the whole thing is rolled back. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] COPY with no WAL, in certain circumstances
Euler Taveira de Oliveira [EMAIL PROTECTED] writes: Simon Riggs wrote: The enclosed patch implements this, as discussed. There is no user interface to enable/disable, just as with CTAS and CREATE INDEX; no docs, just code comments. IMHO, this deserves an GUC parameter (use_wal_in_copy?). Why? The whole point is that it's automatic and transparent. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] COPY with no WAL, in certain circumstances
Joshua D. Drake wrote: BEGIN; CREATE TABLE foo... INSERT INTO foo VALUES ('1'); COPY foo... COMMIT; On ABORT, the entire table disappears, as well as the INSERT, so I don't see any problem. I assume the INSERT is WAL logged. No I don't see any problems, I am just trying to understand the boundaries. E.g., is there some weird limitation where if I have any values in the table before the copy (like the example above) that copy will go through WAL. Or in other words, does this patch mean that all COPY execution that is within a transaction will ignore WAL? Yes, because it is possible to do in all cases. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] COPY with no WAL, in certain circumstances
On Sat, 2007-01-06 at 15:24 -0500, Bruce Momjian wrote: Joshua D. Drake wrote: BEGIN; CREATE TABLE foo... INSERT INTO foo VALUES ('1'); COPY foo... COMMIT; On ABORT, the entire table disappears, as well as the INSERT, so I don't see any problem. I assume the INSERT is WAL logged. No I don't see any problems, I am just trying to understand the boundaries. E.g., is there some weird limitation where if I have any values in the table before the copy (like the example above) that copy will go through WAL. Or in other words, does this patch mean that all COPY execution that is within a transaction will ignore WAL? Yes, because it is possible to do in all cases. Very happy to add documentation where Tom suggested. Reason for no documentation was that CREATE INDEX and CREATE TABLE AS SELECT already use this optimisation, but to my knowledge neither was/is documented on those command pages. The rule is: if the relfilenode for a table is new in this transaction (and therefore the whole things will be dropped at end-of-transaction) then *all* COPY commands are able to avoid writing WAL safely, if: - PITR is not enabled - there is no active portal (which could have been opened on an earlier commandid and could therefore see data prior to the switch to the new relfilenode). In those cases, *not* using WAL causes no problems at all, so sleep well without it. So all of these work as shown BEGIN; COPY foo... --uses WAL TRUNCATE foo... COPY foo.. --no WAL COPY foo.. --no WAL COMMIT; BEGIN; CREATE TABLE foo... INSERT INTO foo --uses WAL COPY foo.. --no WAL INSERT INTO foo --uses WAL COPY foo.. --no WAL INSERT INTO foo --uses WAL COPY foo... --no WAL COMMIT; BEGIN; CREATE TABLE foo... AS SELECT --no WAL INSERT INTO foo --uses WAL COPY foo.. --no WAL COMMIT; BEGIN; DECLARE CURSOR cursor CREATE TABLE foo... COPY foo.. --uses WAL because active portal COPY foo.. --uses WAL because active portal CLOSE cursor COPY foo.. --no WAL COPY foo.. --no WAL COMMIT; psql --single-transaction -f mydb.pgdump Come to think of it, I should be able to use pg_current_xlog_insert_location() to come up with a test case. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] COPY with no WAL, in certain circumstances
Simon Riggs wrote: Or in other words, does this patch mean that all COPY execution that is within a transaction will ignore WAL? Yes, because it is possible to do in all cases. Very happy to add documentation where Tom suggested. Reason for no documentation was that CREATE INDEX and CREATE TABLE AS SELECT already use this optimisation, but to my knowledge neither was/is documented on those command pages. I wasn't aware those used the optimization. Seems they all should be documented somewhere. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] COPY with no WAL, in certain circumstances
Joshua D. Drake wrote: IMHO, this deserves an GUC parameter (use_wal_in_copy?). Because a lot of people use COPY because it's faster than INSERT but expects that it will be in WAL. The default would be use_wal_in_copy = true. That I don't think makes sense. A copy is an all or nothing option, if a copy fails in the middle the whole thing is rolled back. I was worried about PITR, but Simon answers my question: PITR enables so uses WAL. -- Euler Taveira de Oliveira http://www.timbira.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] COPY with no WAL, in certain circumstances
On Saturday 06 January 2007 16:40, Bruce Momjian wrote: Simon Riggs wrote: Or in other words, does this patch mean that all COPY execution that is within a transaction will ignore WAL? Yes, because it is possible to do in all cases. Very happy to add documentation where Tom suggested. Reason for no documentation was that CREATE INDEX and CREATE TABLE AS SELECT already use this optimisation, but to my knowledge neither was/is documented on those command pages. I wasn't aware those used the optimization. Seems they all should be documented somewhere. Might I suggest somewhere under chapter 27, with something akin to what we have for documenting lock levels and the different operations that use them. We document the reasons you want to avoid WAL and various operations in the database that do this automagically. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] COPY with no WAL, in certain circumstances
On Saturday 06 January 2007 16:36, Simon Riggs wrote: The rule is: if the relfilenode for a table is new in this transaction (and therefore the whole things will be dropped at end-of-transaction) then *all* COPY commands are able to avoid writing WAL safely, if: - PITR is not enabled - there is no active portal (which could have been opened on an earlier commandid and could therefore see data prior to the switch to the new relfilenode). In those cases, *not* using WAL causes no problems at all, so sleep well without it. snip BEGIN; CREATE TABLE foo... INSERT INTO foo --uses WAL COPY foo.. --no WAL INSERT INTO foo --uses WAL COPY foo.. --no WAL INSERT INTO foo --uses WAL COPY foo... --no WAL COMMIT; Is there some technical reason that the INSERT statements need to use WAL in these scenarios? ISTM that in the above scenario there are no cases where the INSERT statements are any more recoverable than the COPY statements. While there might not be much gain from bypassing WAL on a single insert, in bunches, or more importantly when doing INSERT INTO foo SELECT *, it could be a nice improvement as well. Am I overlooking something? -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] COPY with no WAL, in certain circumstances
Simon Riggs [EMAIL PROTECTED] writes: The rule is: if the relfilenode for a table is new in this transaction (and therefore the whole things will be dropped at end-of-transaction) then *all* COPY commands are able to avoid writing WAL safely, if: - PITR is not enabled - there is no active portal (which could have been opened on an earlier commandid and could therefore see data prior to the switch to the new relfilenode). In those cases, *not* using WAL causes no problems at all, so sleep well without it. Uh ... what in the world has an active portal got to do with it? I think you've confused snapshot considerations with crash recovery. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] COPY with no WAL, in certain circumstances
Bruce Momjian [EMAIL PROTECTED] writes: Simon Riggs wrote: Reason for no documentation was that CREATE INDEX and CREATE TABLE AS SELECT already use this optimisation, but to my knowledge neither was/is documented on those command pages. I wasn't aware those used the optimization. Seems they all should be documented somewhere. We don't document every single optimization in the system ... if we did, the docs would be as big as the source code and equally unreadable by non-programmers. I think it's a much better idea just to mention it one place and not try to enumerate exactly which commands have the optimization. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] COPY with no WAL, in certain circumstances
Robert Treat [EMAIL PROTECTED] writes: On Saturday 06 January 2007 16:36, Simon Riggs wrote: snip BEGIN; CREATE TABLE foo... INSERT INTO foo --uses WAL COPY foo.. --no WAL INSERT INTO foo --uses WAL COPY foo.. --no WAL INSERT INTO foo --uses WAL COPY foo... --no WAL COMMIT; Is there some technical reason that the INSERT statements need to use WAL in these scenarios? First, there's enough other overhead to an INSERT that you'd not save much percentagewise. Second, not using WAL doesn't come for free: the cost is having to fsync the whole table afterwards. So it really only makes sense for commands that one can expect are writing pretty much all of the table. I could easily see it being a net loss for individual INSERTs. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] COPY with no WAL, in certain circumstances
Is there some technical reason that the INSERT statements need to use WAL in these scenarios? First, there's enough other overhead to an INSERT that you'd not save much percentagewise. Second, not using WAL doesn't come for free: the cost is having to fsync the whole table afterwards. So it really only makes sense for commands that one can expect are writing pretty much all of the table. I could easily see it being a net loss for individual INSERTs. What about multi value inserts? Just curious. Joshua D. Drake regards, tom lane -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] COPY with no WAL, in certain circumstances
On Sat, 2007-01-06 at 22:09 -0500, Tom Lane wrote: Joshua D. Drake [EMAIL PROTECTED] writes: cost is having to fsync the whole table afterwards. So it really only makes sense for commands that one can expect are writing pretty much all of the table. I could easily see it being a net loss for individual INSERTs. What about multi value inserts? Just curious. I wouldn't want the system to assume that a multi-VALUES insert is writing most of the table. Would you? The thing is reasonable for inserting maybe a few hundred or few thousand rows at most, and that's still small in comparison to typical tables. Good point. :) Joshua D. Drake regards, tom lane -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate ---(end of broadcast)--- TIP 6: explain analyze is your friend