Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Thu, Apr 3, 2014 at 7:26 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 04/01/2014 08:39 PM, Heikki Linnakangas wrote: On 03/07/2014 05:36 AM, Tom Lane wrote: =?ISO-8859-1?Q?Fabr=EDzio_de_Royes_Mello?= fabriziome...@gmail.com writes: Do you think is difficult to implement ALTER TABLE ... SET UNLOGGED too? Thinking in a scope of one GSoC, of course. I think it's basically the same thing. You might hope to optimize it; but you have to create (rather than remove) an init fork, and there's no way to do that in exact sync with the commit. You just have to include that information with the commit WAL record, no? No-one's replied yet, but perhaps the worry is that after you've written the commit record, you have to go ahead with removing/creating the init fork, and that is seen as too risky. If a creat() or unlink() call fails, that will have to be a PANIC, and crash recovery will likewise have to PANIC if the forks still cannot be removed/created. Yeah, that's the concern. If I may digress for a moment, unlogged materialized views are not supported. This is because we have this facility where if a materialized view hasn't been populated yet, you get an error when you try to scan it. If we allowed unlogged materialized views, then they'd get reset to empty rather than to not-populated, because the not-populated status is stored in the catalog, not the filesystem. I still wish we'd never added the notion of populated in the first place, but Kevin felt it was essential, so we ended up here. Anyway, the idea that I had for fixing the unlogged materialized view case was to add a new 64-bit integer to the control file that gets bumped every time we start crash recovery, and which also gets recorded in pg_class. The value 0 would be reserved, and all pg_class entries for non-unlogged relations would store 0. For unlogged relations, we could check whether the value in pg_class equals the current value; if not, the relation should be viewed as not-populated. This is not too far from a solution from the problem we need to solve here. If we want to make an unlogged relation logged, we can go ahead and remove the init forks right away, knowing that the pg_class update changing relpersistence and this new value won't take effect until commit. If the system meanwhile crashes, a backend connected to the relevant database has enough state to recognize that the relation is in this in-between state. Before we can again use that relation, we need to rebuild the init fork and reset it. Of course, it's not too clear exactly how that state cleanup happens; as one option, we could just require the user to run a manual TRUNCATE. This would not be totally without precedent, because CREATE INDEX CONCURRENTLY leaves crap behind that the user has to reindex or drop, but it's clearly not ideal. Another option would be to try to make autovacuum put things right, but that of course might not happen right away. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 04/01/2014 08:58 PM, Andres Freund wrote: On 2014-04-01 12:56:04 -0500, Jim Nasby wrote: On 3/4/14, 8:50 AM, Andres Freund wrote: Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Maybe I'm missing something, but doesn't this actually involve writing the data twice? Once into WAL and again into the relation itself? Yes. But as I said, that's unavoidable for anything but wal_level=minimal. Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ... SET LOGGED. There's no fundamental reason you need to rewrite the heap, too. I understand that it might be difficult to do, because of the way the system catalogs work, but it's worthy goal. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote: On 04/01/2014 08:58 PM, Andres Freund wrote: On 2014-04-01 12:56:04 -0500, Jim Nasby wrote: On 3/4/14, 8:50 AM, Andres Freund wrote: Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Maybe I'm missing something, but doesn't this actually involve writing the data twice? Once into WAL and again into the relation itself? Yes. But as I said, that's unavoidable for anything but wal_level=minimal. Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ... SET LOGGED. There's no fundamental reason you need to rewrite the heap, too. I understand that it might be difficult to do, because of the way the system catalogs work, but it's worthy goal. I don't think that's realistic to achieve due to the issues described in http://archives.postgresql.org/message-id/CA%2BTgmob44LNwwU73N1aJsGQyzQ61SdhKJRC_89wCm0%2BaLg%3Dx2Q%40mail.gmail.com I don't think it's worthwile to make the feature much more complex, just to address this. perfect is the enemy of good and all that. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 04/03/2014 01:44 PM, Andres Freund wrote: On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote: On 04/01/2014 08:58 PM, Andres Freund wrote: On 2014-04-01 12:56:04 -0500, Jim Nasby wrote: On 3/4/14, 8:50 AM, Andres Freund wrote: Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Maybe I'm missing something, but doesn't this actually involve writing the data twice? Once into WAL and again into the relation itself? Yes. But as I said, that's unavoidable for anything but wal_level=minimal. Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ... SET LOGGED. There's no fundamental reason you need to rewrite the heap, too. I understand that it might be difficult to do, because of the way the system catalogs work, but it's worthy goal. I don't think that's realistic to achieve due to the issues described in http://archives.postgresql.org/message-id/CA%2BTgmob44LNwwU73N1aJsGQyzQ61SdhKJRC_89wCm0%2BaLg%3Dx2Q%40mail.gmail.com To which I replied here: http://www.postgresql.org/message-id/533af9d7.7010...@vmware.com. Please reply to that sub-thread with any problems you see. I might be missing something, but I really don't see any insurmountable problem here. I don't think it's worthwile to make the feature much more complex, just to address this. perfect is the enemy of good and all that. We should do the trivial implementation first, sure. But that ought to be trivial. Now is the time to discuss how to do the more optimal thing. If we can come up with a feasible design on that, Fabrizio will have time to do that as part of the GSoC. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 2014-04-01 20:39:35 +0300, Heikki Linnakangas wrote: On 03/07/2014 05:36 AM, Tom Lane wrote: =?ISO-8859-1?Q?Fabr=EDzio_de_Royes_Mello?= fabriziome...@gmail.com writes: Do you think is difficult to implement ALTER TABLE ... SET UNLOGGED too? Thinking in a scope of one GSoC, of course. I think it's basically the same thing. You might hope to optimize it; but you have to create (rather than remove) an init fork, and there's no way to do that in exact sync with the commit. You just have to include that information with the commit WAL record, no? Sure, it's possible to do that. But that seems like complicating generic paths more than I'd like for a minor feature. Especially as the unlinking of the files would need to happen somewhere in RecordTransactionCommit(). After the XLogFlush(), but before unsetting MyPgXact-delayChkpt. That's a crit section, right? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 04/01/2014 08:39 PM, Heikki Linnakangas wrote: On 03/07/2014 05:36 AM, Tom Lane wrote: =?ISO-8859-1?Q?Fabr=EDzio_de_Royes_Mello?= fabriziome...@gmail.com writes: Do you think is difficult to implement ALTER TABLE ... SET UNLOGGED too? Thinking in a scope of one GSoC, of course. I think it's basically the same thing. You might hope to optimize it; but you have to create (rather than remove) an init fork, and there's no way to do that in exact sync with the commit. You just have to include that information with the commit WAL record, no? No-one's replied yet, but perhaps the worry is that after you've written the commit record, you have to go ahead with removing/creating the init fork, and that is seen as too risky. If a creat() or unlink() call fails, that will have to be a PANIC, and crash recovery will likewise have to PANIC if the forks still cannot be removed/created. My first thought is that that seems ok. It's unlikely that an unlink() of a small file in the data directory would fail. Creation could be done with a temporary name first and renamed into place, to avoid running out of disk space in the critical section. If that's not acceptable, one idea off the top of my head is to somehow stamp the init forks when making an unlogged table logged, with the XID of the transcation. Crash recovery could then check the clog to see if the transaction committed, and ignore any init fork files belonging to committed transactions. (Same in reverse when making a logged table unlogged). Currently, we reset unlogged relations before replaying the WAL. That would have to be delayed until end of WAL replay, because otherwise we don't know if the transaction committed or not. Although if we go with the stamping approach, we could still reset unstamped files at the beginning of recovery. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 2014-04-03 14:26:50 +0300, Heikki Linnakangas wrote: On 04/01/2014 08:39 PM, Heikki Linnakangas wrote: On 03/07/2014 05:36 AM, Tom Lane wrote: =?ISO-8859-1?Q?Fabr=EDzio_de_Royes_Mello?= fabriziome...@gmail.com writes: Do you think is difficult to implement ALTER TABLE ... SET UNLOGGED too? Thinking in a scope of one GSoC, of course. I think it's basically the same thing. You might hope to optimize it; but you have to create (rather than remove) an init fork, and there's no way to do that in exact sync with the commit. You just have to include that information with the commit WAL record, no? No-one's replied yet That might be because it was a month after the initial discussion, and at least I'd temporarily lost track of the thread ;) , but perhaps the worry is that after you've written the commit record, you have to go ahead with removing/creating the init fork, and that is seen as too risky. If a creat() or unlink() call fails, that will have to be a PANIC, and crash recovery will likewise have to PANIC if the forks still cannot be removed/created. That's part of the worry, yes. It's also creeping code dealing with unlogged relations into a fairly critical place (RecordTransactionCommit()) where it really doesn't seem to belong. My first thought is that that seems ok. It's unlikely that an unlink() of a small file in the data directory would fail. Creation could be done with a temporary name first and renamed into place, to avoid running out of disk space in the critical section. I continue to feel that that's far too much impact for a minor feature. Even if it could be made work reliably, it'll be a fair amount of seldomly used infrastructure. If that's not acceptable, one idea off the top of my head is to somehow stamp the init forks when making an unlogged table logged, with the XID of the transcation. Crash recovery could then check the clog to see if the transaction committed, and ignore any init fork files belonging to committed transactions. (Same in reverse when making a logged table unlogged). I've thought about that - after all, the logical decoding stuff uses that trick in some places - but it has the grave disadvantage that it requires a full directory scan to fully remove a relation. That seems to be a heavy price. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote: On 04/01/2014 08:58 PM, Andres Freund wrote: On 2014-04-01 12:56:04 -0500, Jim Nasby wrote: On 3/4/14, 8:50 AM, Andres Freund wrote: Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Maybe I'm missing something, but doesn't this actually involve writing the data twice? Once into WAL and again into the relation itself? Yes. But as I said, that's unavoidable for anything but wal_level=minimal. Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ... SET LOGGED. There's no fundamental reason you need to rewrite the heap, too. As another point: What's the advantage of that? The amount of writes will be the same, no? It doesn't seem to be all that interesting that a second filenode exists temporarily? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 04/03/2014 02:41 PM, Andres Freund wrote: On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote: On 04/01/2014 08:58 PM, Andres Freund wrote: On 2014-04-01 12:56:04 -0500, Jim Nasby wrote: On 3/4/14, 8:50 AM, Andres Freund wrote: Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Maybe I'm missing something, but doesn't this actually involve writing the data twice? Once into WAL and again into the relation itself? Yes. But as I said, that's unavoidable for anything but wal_level=minimal. Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ... SET LOGGED. There's no fundamental reason you need to rewrite the heap, too. As another point: What's the advantage of that? The amount of writes will be the same, no? It doesn't seem to be all that interesting that a second filenode exists temporarily? Surely it's cheaper to read the whole relation and copy it to just WAL, than to read the whole relation and write it both the WAL and another file. (Maybe it's not worth the trouble to avoid it - but that depends on whether we come up with a good design..) - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 2014-04-03 15:02:27 +0300, Heikki Linnakangas wrote: On 04/03/2014 02:41 PM, Andres Freund wrote: On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote: On 04/01/2014 08:58 PM, Andres Freund wrote: On 2014-04-01 12:56:04 -0500, Jim Nasby wrote: On 3/4/14, 8:50 AM, Andres Freund wrote: Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Maybe I'm missing something, but doesn't this actually involve writing the data twice? Once into WAL and again into the relation itself? Yes. But as I said, that's unavoidable for anything but wal_level=minimal. Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ... SET LOGGED. There's no fundamental reason you need to rewrite the heap, too. As another point: What's the advantage of that? The amount of writes will be the same, no? It doesn't seem to be all that interesting that a second filenode exists temporarily? Surely it's cheaper to read the whole relation and copy it to just WAL, than to read the whole relation and write it both the WAL and another file. I have to admit I was thinking of the WAL replay case ;). But we'll actually have to write all dirty s_b, change the persistency tags and such anyway because there's no LSN interlock with checkpoints. That seems pretty ugly as well, and once again, avoidable by a rewrite. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
Heikki Linnakangas hlinnakan...@vmware.com writes: No-one's replied yet, but perhaps the worry is that after you've written the commit record, you have to go ahead with removing/creating the init fork, and that is seen as too risky. If a creat() or unlink() call fails, that will have to be a PANIC, and crash recovery will likewise have to PANIC if the forks still cannot be removed/created. My first thought is that that seems ok. No, it isn't. No filesystem operation should *ever* be thought to be guaranteed to succeed. I also concur with Andres' complaint that this feature is not worth adding complication to the core transaction commit path for. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Tue, Apr 1, 2014 at 1:40 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-04-01 13:37:57 -0300, Fabrízio de Royes Mello wrote: In the GSoC proposal page [1] I received some suggestions to strech goals: * ALTER TABLE name SET UNLOGGED. This is essentially the reverse of the core proposal, which is ALTER TABLE name SET LOGGED. Yes, I think that should definitely be included. It would be weird to have SET LOGGED but not SET UNLOGGED. Yes, that makes sense. * Allow unlogged indexes on logged tables. I don't think it's realistic to build the infrastructure necessary for that as part of gsoc. The reasons have been explained somewhere in this thread. * Implement ALTER TABLE name SET LOGGED without rewriting the whole table, when wal_level = minimal. Yea, maybe. * Allow unlogged materialized views. I don't think that's realistic either. Thanks Andres for your comments. Anyway I added this additional goals to proposal and as Heikki said: It's actually nice to have several separate goals like this, it means that if you finish the task faster than expected, you can move on to the next goal, and if one task takes longer than expected so that you don't finish all the work, we'll still have something useful. I hope you can help me in some way with this project too. :-) Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Fri, Mar 7, 2014 at 12:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: =?ISO-8859-1?Q?Fabr=EDzio_de_Royes_Mello?= fabriziome...@gmail.com writes: Do you think is difficult to implement ALTER TABLE ... SET UNLOGGED too? Thinking in a scope of one GSoC, of course. I think it's basically the same thing. You might hope to optimize it; but you have to create (rather than remove) an init fork, and there's no way to do that in exact sync with the commit. So for safety I think you have to copy the data into a new relfilenode. Hi all, In the GSoC proposal page [1] I received some suggestions to strech goals: * ALTER TABLE name SET UNLOGGED. This is essentially the reverse of the core proposal, which is ALTER TABLE name SET LOGGED. Yes, I think that should definitely be included. It would be weird to have SET LOGGED but not SET UNLOGGED. * Allow unlogged indexes on logged tables. * Implement ALTER TABLE name SET LOGGED without rewriting the whole table, when wal_level = minimal. * Allow unlogged materialized views. Comments? [1] http://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2014/fabriziomello/5629499534213120 -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://frabriziomello.blogspot.comhttp://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 2014-04-01 13:37:57 -0300, Fabrízio de Royes Mello wrote: In the GSoC proposal page [1] I received some suggestions to strech goals: * ALTER TABLE name SET UNLOGGED. This is essentially the reverse of the core proposal, which is ALTER TABLE name SET LOGGED. Yes, I think that should definitely be included. It would be weird to have SET LOGGED but not SET UNLOGGED. Yes, that makes sense. * Allow unlogged indexes on logged tables. I don't think it's realistic to build the infrastructure necessary for that as part of gsoc. The reasons have been explained somewhere in this thread. * Implement ALTER TABLE name SET LOGGED without rewriting the whole table, when wal_level = minimal. Yea, maybe. * Allow unlogged materialized views. I don't think that's realistic either. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 03/07/2014 05:36 AM, Tom Lane wrote: =?ISO-8859-1?Q?Fabr=EDzio_de_Royes_Mello?= fabriziome...@gmail.com writes: Do you think is difficult to implement ALTER TABLE ... SET UNLOGGED too? Thinking in a scope of one GSoC, of course. I think it's basically the same thing. You might hope to optimize it; but you have to create (rather than remove) an init fork, and there's no way to do that in exact sync with the commit. You just have to include that information with the commit WAL record, no? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 3/4/14, 8:50 AM, Andres Freund wrote: Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Maybe I'm missing something, but doesn't this actually involve writing the data twice? Once into WAL and again into the relation itself? -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 2014-04-01 12:56:04 -0500, Jim Nasby wrote: On 3/4/14, 8:50 AM, Andres Freund wrote: Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Maybe I'm missing something, but doesn't this actually involve writing the data twice? Once into WAL and again into the relation itself? Yes. But as I said, that's unavoidable for anything but wal_level=minimal. If somebody wants to put in the additional nontrivial work to make it work faster with wal_level=minimal, they can do so. But the other case is more general and needs to be done anyway. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Wed, Mar 5, 2014 at 7:42 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Tue, Mar 4, 2014 at 5:00 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Tue, Mar 4, 2014 at 3:29 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-04 12:54:02 -0500, Robert Haas wrote: On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-04 09:47:08 -0500, Robert Haas wrote: Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Yes, that would work. I've tended to view optimizing away the relfilenode copy as an indispensable part of this work, but that might be wrongheaded. It would certainly be a lot easier to make this happen if we didn't insist on that. I think it'd already much better than today's situation, and it's a required codepath for wal_level logical anyway. So even if somebody wants to make this work without the full copy for minimal, it'd still be a required codepath. So I am perfectly ok with a patch just adding that. Then is this a good idea for a GSoC project ? I don't know very well this internals, but I am willing to learn and I think the GSoC is a good opportunity. Any of you are willing to mentoring this project? I written the proposal to this feature, so I would like to know if someone can review. I think this isn't a good design. Per the discussion between Andres and I, I think that I think you should do is make ALTER TABLE .. SET LOGGED work just like VACUUM FULL, with the exception that it will set a different relpersistence for the new relfilenode. If you do it that way, this will be less efficient, but much simpler, and you might actually finish it in one summer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 6 March 2014 19:42, Robert Haas robertmh...@gmail.com wrote: On Wed, Mar 5, 2014 at 7:42 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Tue, Mar 4, 2014 at 5:00 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Tue, Mar 4, 2014 at 3:29 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-04 12:54:02 -0500, Robert Haas wrote: On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-04 09:47:08 -0500, Robert Haas wrote: Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Yes, that would work. I've tended to view optimizing away the relfilenode copy as an indispensable part of this work, but that might be wrongheaded. It would certainly be a lot easier to make this happen if we didn't insist on that. I think it'd already much better than today's situation, and it's a required codepath for wal_level logical anyway. So even if somebody wants to make this work without the full copy for minimal, it'd still be a required codepath. So I am perfectly ok with a patch just adding that. Then is this a good idea for a GSoC project ? I don't know very well this internals, but I am willing to learn and I think the GSoC is a good opportunity. Any of you are willing to mentoring this project? I written the proposal to this feature, so I would like to know if someone can review. I think this isn't a good design. Per the discussion between Andres and I, I think that I think you should do is make ALTER TABLE .. SET LOGGED work just like VACUUM FULL, with the exception that it will set a different relpersistence for the new relfilenode. If you do it that way, this will be less efficient, but much simpler, and you might actually finish it in one summer. Sounds like a plan. Would there be any stretch-goals for this work, or is there not really anything else that could be done? -- Thom
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com wrote: I think this isn't a good design. Per the discussion between Andres and I, I think that I think you should do is make ALTER TABLE .. SET LOGGED work just like VACUUM FULL, with the exception that it will set a different relpersistence for the new relfilenode. If you do it that way, this will be less efficient, but much simpler, and you might actually finish it in one summer. Do it like 'VACUUM FULL' for any wal_level? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Thu, Mar 6, 2014 at 2:52 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com wrote: I think this isn't a good design. Per the discussion between Andres and I, I think that I think you should do is make ALTER TABLE .. SET LOGGED work just like VACUUM FULL, with the exception that it will set a different relpersistence for the new relfilenode. If you do it that way, this will be less efficient, but much simpler, and you might actually finish it in one summer. Do it like 'VACUUM FULL' for any wal_level? Yep. Anything else appears to be a research problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Thu, Mar 6, 2014 at 5:04 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Mar 6, 2014 at 2:52 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com wrote: I think this isn't a good design. Per the discussion between Andres and I, I think that I think you should do is make ALTER TABLE .. SET LOGGED work just like VACUUM FULL, with the exception that it will set a different relpersistence for the new relfilenode. If you do it that way, this will be less efficient, but much simpler, and you might actually finish it in one summer. Do it like 'VACUUM FULL' for any wal_level? Yep. Anything else appears to be a research problem. I'll change the proposal. Thanks a lot! Grettings, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Thu, Mar 6, 2014 at 5:05 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Thu, Mar 6, 2014 at 5:04 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Mar 6, 2014 at 2:52 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com wrote: I think this isn't a good design. Per the discussion between Andres and I, I think that I think you should do is make ALTER TABLE .. SET LOGGED work just like VACUUM FULL, with the exception that it will set a different relpersistence for the new relfilenode. If you do it that way, this will be less efficient, but much simpler, and you might actually finish it in one summer. Do it like 'VACUUM FULL' for any wal_level? Yep. Anything else appears to be a research problem. I'll change the proposal. Thanks a lot! One last question. Do you think is difficult to implement ALTER TABLE ... SET UNLOGGED too? Thinking in a scope of one GSoC, of course. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Thu, Mar 6, 2014 at 5:04 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Mar 6, 2014 at 2:52 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com wrote: I think this isn't a good design. Per the discussion between Andres and I, I think that I think you should do is make ALTER TABLE .. SET LOGGED work just like VACUUM FULL, with the exception that it will set a different relpersistence for the new relfilenode. If you do it that way, this will be less efficient, but much simpler, and you might actually finish it in one summer. Do it like 'VACUUM FULL' for any wal_level? Yep. Anything else appears to be a research problem. Updated proposal: proposal ** Add to PostgreSQL the capacity to making an Unlogged table Logged ** Introduction This project will allow to change an unlogged table (that doesn't create transaction logs - WAL files) and it's dependencies to a logged table, in other words, a regular table that create WAL files. To make this happen we'll introduce a new SQL syntax: ALTER TABLE name SET LOGGED; Benefits to the PostgreSQL Community The unlogged tables feature was introduced by 9.1 version, and provide better write performance than regular tables (logged), but are not crash-safe. Their contents are automatically discarded (cleared) in a case of a server crash, and their contents do not propagate to replication slaves, either. With the capacity of turning an unlogged table in a logged table will allow us have the better of two features, in other words, we can use an unlogged table to run a bulk load a thousands of lines (ETL scripts) and get better performance, and then change it to a logged table to get durability of loaded data. Deliverables This project has just one deliverable at the end. The deliverable will be the implementation of the routines that transform an unlogged table to logged, using the same algorithm of the vacuum full, with the exception that it will set a different relpersistence for the new relfilenode. Project Schedule until May 19: * create a website to the project (wiki.postgresql.org) * create a public repository to the project (github.com/fabriziomello) * read what has already been discussed by the community about the project ( http://wiki.postgresql.org/wiki/Todo) * as already discussed in pgsql-hackers mailing list this feature will be implemented similar to vacuum full, with the exception that it will set a differente relpersistence for the new relfilenode * learn about some PostgreSQL internals: . grammar (src/backend/parser/gram.y) . vacuum full (src/backend/commands/[vacuum.c | cluster.c]) May 19 - June 23 * implementation of the first prototype: . change the grammar of PostgreSQL to support ALTER TABLE ... SET LOGGED . implement and/or adapt the routines to change an unlogged table to logged (similar to vacuum full) * write documentation and the test cases * submit this first prototype to the commitfest 2014/06 ( https://commitfest.postgresql.org/action/commitfest_view?id=22) June 23 - June 27 * review with the Mentor of the work done until now June 27 - August 18 * do the adjustments based on the community feedback during the commitfest 2014/06 * submit to the commitfest 2014/09 for final evaluation and maybe will be committed to 9.5 version (webpage don't created yet) August 18 - August 22 * final review with the Mentor of all work done. About the proponent Fabrízio de Royes Mello e-mail: fabriziome...@gmail.com twitter: @fabriziomello github: http://github.com/fabriziomello linkedin: http://linkedin.com/in/fabriziomello Currently I help people and teams to take the full potential of relational databases, especially PostgreSQL, helping teams to design the structure of the database (modeling), build physical architecture (database schema), programming (procedural languages), SQL (usage, tuning, best practices), optimization and orchestration of instances in production too. I perform a volunteer work for Brazilian Community of PostgreSQL (www.postgresql.org.br), supporting mailing lists, organizing events (pgbr.postgresql.org.br) and some admin tasks. And also I help a little the PostgreSQL Global Development Group (PGDG) in the implementation of some features and review of patches (git.postgresql.org). /proposal -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GSoC proposal - make an unlogged table logged
=?ISO-8859-1?Q?Fabr=EDzio_de_Royes_Mello?= fabriziome...@gmail.com writes: Do you think is difficult to implement ALTER TABLE ... SET UNLOGGED too? Thinking in a scope of one GSoC, of course. I think it's basically the same thing. You might hope to optimize it; but you have to create (rather than remove) an init fork, and there's no way to do that in exact sync with the commit. So for safety I think you have to copy the data into a new relfilenode. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Fri, Mar 7, 2014 at 12:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: =?ISO-8859-1?Q?Fabr=EDzio_de_Royes_Mello?= fabriziome...@gmail.com writes: Do you think is difficult to implement ALTER TABLE ... SET UNLOGGED too? Thinking in a scope of one GSoC, of course. I think it's basically the same thing. You might hope to optimize it; but you have to create (rather than remove) an init fork, and there's no way to do that in exact sync with the commit. So for safety I think you have to copy the data into a new relfilenode. Well, the same thing that 'vacuum full' does, but changing relpersistence to RELPERSISTENCE_UNLOGGED for the new relfilenode. Is this? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Tue, Mar 4, 2014 at 5:00 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Tue, Mar 4, 2014 at 3:29 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-04 12:54:02 -0500, Robert Haas wrote: On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-04 09:47:08 -0500, Robert Haas wrote: Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Yes, that would work. I've tended to view optimizing away the relfilenode copy as an indispensable part of this work, but that might be wrongheaded. It would certainly be a lot easier to make this happen if we didn't insist on that. I think it'd already much better than today's situation, and it's a required codepath for wal_level logical anyway. So even if somebody wants to make this work without the full copy for minimal, it'd still be a required codepath. So I am perfectly ok with a patch just adding that. Then is this a good idea for a GSoC project ? I don't know very well this internals, but I am willing to learn and I think the GSoC is a good opportunity. Any of you are willing to mentoring this project? Hi all, I written the proposal to this feature, so I would like to know if someone can review. proposal ** Add to PostgreSQL the capacity to making an Unlogged table Logged ** Introduction This project will allow to change an unlogged table (that doesn't create transaction logs - WAL files) and it's dependencies to a logged table, in other words, a regular table that create WAL files. To make this happen we'll introduce a new SQL syntax: ALTER TABLE name SET LOGGED; Benefits to the PostgreSQL Community The unlogged tables feature was introduced by 9.1 version, and provide better write performance than regular tables (logged), but are not crash-safe. Their contents are automatically discarded (cleared) in a case of a server crash, and their contents do not propagate to replication slaves, either. With the capacity of turning an unlogged table in a logged table will allow us have the better of two features, in other words, we can use an unlogged table to run a bulk load a thousands of lines (ETL scripts) and get better performance, and then change it to a logged table to get durability of loaded data. Deliverables This project will be splitted into 2 (two) deliverables: 1) Allow change an unlogged table to logged when wal_level = minimal (without propagate their contents to replication slaves) 2) Allow change an unlogged table to logged when wal_level != minimal (propagating their contents to replication slaves) Project Schedule until May 19: * create a website to the project (wiki.postgresql.org) * create a public repository to the project (github.com/fabriziomello) * read what has already been discussed by the community about the project ( http://wiki.postgresql.org/wiki/Todo) * discuss with the community the best design to the feature * learn about some PostgreSQL internals: . physical storage for relations (src/backend/catalog/storage.c) . transaction system (src/backend/access/transam/xact.c) . two-phase commit (src/backend/access/transam/twophase.c) . table commands (src/backend/commands/tablecmds.c) . grammar (src/backend/parser/gram.y) May 19 - June 23 * evaluate with the mentor and community if is a good start point use the already sent patch ( http://www.postgresql.org/message-id/263033.9223...@web29013.mail.ird.yahoo.com ) * implementation of the first deliverable: . change the grammar of PostgreSQL to support ALTER TABLE ... SET LOGGED . implement the routines to change an unlogged table to logged when wal_level = minimal * write documentation and the test cases * submit this first deliverable to the commitfest 2014/06 ( https://commitfest.postgresql.org/action/commitfest_view?id=22) June 23 - June 27 * review with the Mentor of the work done until now June 27 - August 18 * implementation of the second deliverable (wal_level != minimal) * write documentation and the test cases * submit this second deliverable to the commitfest 2014/09 (webpage don't created yet) August 18 - August 22 * final review with the Mentor of all work done. About the proponent Fabrízio de Royes Mello e-mail: fabriziome...@gmail.com twitter: @fabriziomello github: http://github.com/fabriziomello linkedin: http://linkedin.com/in/fabriziomello Currently I help people and teams to take the full potential of relational databases, especially PostgreSQL, helping teams to design the structure of the database (modeling), build physical architecture (database schema), programming (procedural languages), SQL (usage, tuning, best practices), optimization and orchestration of instances in production too. I perform a volunteer work for Brazilian Community of PostgreSQL (www.postgresql.org.br), supporting mailing
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Is the TODO item make an unlogged table logged [1] a good GSoC project? I'm pretty sure we found some problems in that design that we couldn't figure out how to solve. I don't have a pointer to the relevant -hackers discussion off-hand, but I think there was one. ISTR the discussion going something along the lines of we'd have to WAL log the entire table to do that, and if we have to do that, what's the point?. No, not really. The issue is more around what happens if we crash part way through. At crash recovery time, the system catalogs are not available, because the database isn't consistent yet and, anyway, the startup process can't be bound to a database, let alone every database that might contain unlogged tables. So the sentinel that's used to decide whether to flush the contents of a table or index is the presence or absence of an _init fork, which the startup process obviously can see just fine. The _init fork also tells us what to stick in the relation when we reset it; for a table, we can just reset to an empty file, but that's not legal for indexes, so the _init fork contains a pre-initialized empty index that we can just copy over. Now, to make an unlogged table logged, you've got to at some stage remove those _init forks. But this is not a transactional operation. If you remove the _init forks and then the transaction rolls back, you've left the system an inconsistent state. If you postpone the removal until commit time, then you have a problem if it fails, particularly if it works for the first file but fails for the second. And if you crash at any point before you've fsync'd the containing directory, you have no idea which files will still be on disk after a hard reboot. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 2014-03-04 09:47:08 -0500, Robert Haas wrote: On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Is the TODO item make an unlogged table logged [1] a good GSoC project? I'm pretty sure we found some problems in that design that we couldn't figure out how to solve. I don't have a pointer to the relevant -hackers discussion off-hand, but I think there was one. ISTR the discussion going something along the lines of we'd have to WAL log the entire table to do that, and if we have to do that, what's the point?. No, not really. The issue is more around what happens if we crash part way through. At crash recovery time, the system catalogs are not available, because the database isn't consistent yet and, anyway, the startup process can't be bound to a database, let alone every database that might contain unlogged tables. So the sentinel that's used to decide whether to flush the contents of a table or index is the presence or absence of an _init fork, which the startup process obviously can see just fine. The _init fork also tells us what to stick in the relation when we reset it; for a table, we can just reset to an empty file, but that's not legal for indexes, so the _init fork contains a pre-initialized empty index that we can just copy over. Now, to make an unlogged table logged, you've got to at some stage remove those _init forks. But this is not a transactional operation. If you remove the _init forks and then the transaction rolls back, you've left the system an inconsistent state. If you postpone the removal until commit time, then you have a problem if it fails, particularly if it works for the first file but fails for the second. And if you crash at any point before you've fsync'd the containing directory, you have no idea which files will still be on disk after a hard reboot. Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Tue, Mar 4, 2014 at 3:31 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-04 01:10:50 -0300, Fabrízio de Royes Mello wrote: Today I do something like that: 1) create unlogged table tmp_foo ... 2) populate 'tmp_foo' table (ETL scripts or whatever) 3) start transaction 4) lock table tmp_foo in access exclusive mode 5) update pg_class set relpersistence = 'p' where oid = 'tmp_foo':regclass 6) drop table foo; -- the old foo table 7) alter table tmp_foo rename to foo; 8) end transaction 9) run pg_repack in table 'foo' I know it's very ugly, but works... and works for standbys too... :-) No, it doesn't work. It just may happen to not fail loudly/visibly in some cases. You're absolutely risking corruption of this *and* other relations when doing so. Well this already works for some time, but you are correct, exists the risk of corruption! But in my case if all run without any interrupt the relation is switched to logged. I do some checks before and after, and if something happens with this process we cleanup everything and start from the beginning. Maybe I must run CLUSTER inside the transaction block after update pg_class and execute DROP and RENAME after, in a second phase. Maybe this way is more secure. Is it? If some crash occurs and PostgreSQL restart I check if the unlogged table 'tmp_foo' exists and then I drop it. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Tue, Mar 4, 2014 at 11:50 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-04 09:47:08 -0500, Robert Haas wrote: On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Is the TODO item make an unlogged table logged [1] a good GSoC project? I'm pretty sure we found some problems in that design that we couldn't figure out how to solve. I don't have a pointer to the relevant -hackers discussion off-hand, but I think there was one. ISTR the discussion going something along the lines of we'd have to WAL log the entire table to do that, and if we have to do that, what's the point?. No, not really. The issue is more around what happens if we crash part way through. At crash recovery time, the system catalogs are not available, because the database isn't consistent yet and, anyway, the startup process can't be bound to a database, let alone every database that might contain unlogged tables. So the sentinel that's used to decide whether to flush the contents of a table or index is the presence or absence of an _init fork, which the startup process obviously can see just fine. The _init fork also tells us what to stick in the relation when we reset it; for a table, we can just reset to an empty file, but that's not legal for indexes, so the _init fork contains a pre-initialized empty index that we can just copy over. Now, to make an unlogged table logged, you've got to at some stage remove those _init forks. But this is not a transactional operation. If you remove the _init forks and then the transaction rolls back, you've left the system an inconsistent state. If you postpone the removal until commit time, then you have a problem if it fails, particularly if it works for the first file but fails for the second. And if you crash at any point before you've fsync'd the containing directory, you have no idea which files will still be on disk after a hard reboot. Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Did you see this initial patch [1] from Leonardo Francalanci ? Regards, [1] http://www.postgresql.org/message-id/263033.9223...@web29013.mail.ird.yahoo.com -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-04 09:47:08 -0500, Robert Haas wrote: On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Is the TODO item make an unlogged table logged [1] a good GSoC project? I'm pretty sure we found some problems in that design that we couldn't figure out how to solve. I don't have a pointer to the relevant -hackers discussion off-hand, but I think there was one. ISTR the discussion going something along the lines of we'd have to WAL log the entire table to do that, and if we have to do that, what's the point?. No, not really. The issue is more around what happens if we crash part way through. At crash recovery time, the system catalogs are not available, because the database isn't consistent yet and, anyway, the startup process can't be bound to a database, let alone every database that might contain unlogged tables. So the sentinel that's used to decide whether to flush the contents of a table or index is the presence or absence of an _init fork, which the startup process obviously can see just fine. The _init fork also tells us what to stick in the relation when we reset it; for a table, we can just reset to an empty file, but that's not legal for indexes, so the _init fork contains a pre-initialized empty index that we can just copy over. Now, to make an unlogged table logged, you've got to at some stage remove those _init forks. But this is not a transactional operation. If you remove the _init forks and then the transaction rolls back, you've left the system an inconsistent state. If you postpone the removal until commit time, then you have a problem if it fails, particularly if it works for the first file but fails for the second. And if you crash at any point before you've fsync'd the containing directory, you have no idea which files will still be on disk after a hard reboot. Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Yes, that would work. I've tended to view optimizing away the relfilenode copy as an indispensable part of this work, but that might be wrongheaded. It would certainly be a lot easier to make this happen if we didn't insist on that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 2014-03-04 12:54:02 -0500, Robert Haas wrote: On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-04 09:47:08 -0500, Robert Haas wrote: Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Yes, that would work. I've tended to view optimizing away the relfilenode copy as an indispensable part of this work, but that might be wrongheaded. It would certainly be a lot easier to make this happen if we didn't insist on that. I think it'd already much better than today's situation, and it's a required codepath for wal_level logical anyway. So even if somebody wants to make this work without the full copy for minimal, it'd still be a required codepath. So I am perfectly ok with a patch just adding that. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Tue, Mar 4, 2014 at 3:29 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-04 12:54:02 -0500, Robert Haas wrote: On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-04 09:47:08 -0500, Robert Haas wrote: Can't that be solved by just creating the permanent relation in a new relfilenode? That's equivalent to a rewrite, yes, but we need to do that for anything but wal_level=minimal anyway. Yes, that would work. I've tended to view optimizing away the relfilenode copy as an indispensable part of this work, but that might be wrongheaded. It would certainly be a lot easier to make this happen if we didn't insist on that. I think it'd already much better than today's situation, and it's a required codepath for wal_level logical anyway. So even if somebody wants to make this work without the full copy for minimal, it'd still be a required codepath. So I am perfectly ok with a patch just adding that. Then is this a good idea for a GSoC project ? I don't know very well this internals, but I am willing to learn and I think the GSoC is a good opportunity. Any of you are willing to mentoring this project? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
[HACKERS] GSoC proposal - make an unlogged table logged
Hi all, Is the TODO item make an unlogged table logged [1] a good GSoC project? Regards, [1] http://www.postgresql.org/message-id/aanlktinenzbrxdcwohkqbba2bhubfy8_c5jwrxlod...@mail.gmail.com -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Is the TODO item make an unlogged table logged [1] a good GSoC project? I'm pretty sure we found some problems in that design that we couldn't figure out how to solve. I don't have a pointer to the relevant -hackers discussion off-hand, but I think there was one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
* Robert Haas (robertmh...@gmail.com) wrote: On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Is the TODO item make an unlogged table logged [1] a good GSoC project? I'm pretty sure we found some problems in that design that we couldn't figure out how to solve. I don't have a pointer to the relevant -hackers discussion off-hand, but I think there was one. ISTR the discussion going something along the lines of we'd have to WAL log the entire table to do that, and if we have to do that, what's the point?. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] GSoC proposal - make an unlogged table logged
Stephen Frost sfr...@snowman.net writes: * Robert Haas (robertmh...@gmail.com) wrote: On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Is the TODO item make an unlogged table logged [1] a good GSoC project? I'm pretty sure we found some problems in that design that we couldn't figure out how to solve. I don't have a pointer to the relevant -hackers discussion off-hand, but I think there was one. ISTR the discussion going something along the lines of we'd have to WAL log the entire table to do that, and if we have to do that, what's the point?. IIRC, the reason you'd have to do that is to make the table contents appear on slave servers. If you don't consider replication then it might seem easier. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 03/03/2014 05:22 PM, Tom Lane wrote: Stephen Frost sfr...@snowman.net writes: ... ISTR the discussion going something along the lines of we'd have to WAL log the entire table to do that, and if we have to do that, what's the point?. IIRC, the reason you'd have to do that is to make the table contents appear on slave servers. If you don't consider replication then it might seem easier. So switch on logging and then perform CLUSTER/VACUUM FULL ? Should this work, or is something extra needed ? Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 2014-03-03 12:08:26 -0500, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Is the TODO item make an unlogged table logged [1] a good GSoC project? I'm pretty sure we found some problems in that design that we couldn't figure out how to solve. I don't have a pointer to the relevant -hackers discussion off-hand, but I think there was one. ISTR the discussion going something along the lines of we'd have to WAL log the entire table to do that, and if we have to do that, what's the point?. I don't see that as a particularly problematic problem. The primary reason to want to convert a unlogged to a logged table probably is that it's easier to do so than to recreate the table + dependencies. Also the overhead of logging full pages will be noticeably smaller than the overhead of adding all rows individually, even if using heap_multi_insert(). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Mon, Mar 3, 2014 at 8:28 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Is the TODO item make an unlogged table logged [1] a good GSoC project? Another interesting project around unlogged tables would be to make it possible to have unlogged indexes on fully-logged tables. That is something that there was some discussion of before, that might be easier. FWIW, I don't think that TODO page is a very good resource for finding a starter project. Picking a good project is a skill in and of itself. A lot of that stuff is aspirational, either because it's difficult, or, more often, because it's difficult relative to the rewards, which can be quite low. To be honest, if I have what I imagine to be a great idea for a project, I don't put it on that page. Maybe I should, but I don't, and I don't think that is uncommon. This is not because I'm particularly guarded about sharing the information. Why do you think that hash indexes still aren't WAL-logged after all these years (a project that someone made noise about recently in relation to GSoC), even though that's generally considered to be a SMOP? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 2014-03-03 12:44:26 -0800, Peter Geoghegan wrote: On Mon, Mar 3, 2014 at 8:28 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Is the TODO item make an unlogged table logged [1] a good GSoC project? Another interesting project around unlogged tables would be to make it possible to have unlogged indexes on fully-logged tables. That is something that there was some discussion of before, that might be easier. I'd actually say it's harder because it requires modifying the catalog or transparently introducing hacks similar to what unlogged matviews are doing, to make sure the index is marked invalid after a crash restart. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Mon, Mar 3, 2014 at 2:40 PM, Hannu Krosing ha...@2ndquadrant.com wrote: On 03/03/2014 05:22 PM, Tom Lane wrote: Stephen Frost sfr...@snowman.net writes: ... ISTR the discussion going something along the lines of we'd have to WAL log the entire table to do that, and if we have to do that, what's the point?. IIRC, the reason you'd have to do that is to make the table contents appear on slave servers. If you don't consider replication then it might seem easier. So switch on logging and then perform CLUSTER/VACUUM FULL ? Should this work, or is something extra needed ? Today I do something like that: 1) create unlogged table tmp_foo ... 2) populate 'tmp_foo' table (ETL scripts or whatever) 3) start transaction 4) lock table tmp_foo in access exclusive mode 5) update pg_class set relpersistence = 'p' where oid = 'tmp_foo':regclass 6) drop table foo; -- the old foo table 7) alter table tmp_foo rename to foo; 8) end transaction 9) run pg_repack in table 'foo' I know it's very ugly, but works... and works for standbys too... :-) Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Mon, Mar 3, 2014 at 5:44 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Mar 3, 2014 at 8:28 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Is the TODO item make an unlogged table logged [1] a good GSoC project? Another interesting project around unlogged tables would be to make it possible to have unlogged indexes on fully-logged tables. That is something that there was some discussion of before, that might be easier. More easy than make an unlogged table logged ? FWIW, I don't think that TODO page is a very good resource for finding a starter project. Picking a good project is a skill in and of itself. A lot of that stuff is aspirational, either because it's difficult, or, more often, because it's difficult relative to the rewards, which can be quite low. To be honest, if I have what I imagine to be a great idea for a project, I don't put it on that page. Maybe I should, but I don't, and I don't think that is uncommon. This is not because I'm particularly guarded about sharing the information. Share your ideas, please! Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Mon, Mar 3, 2014 at 5:47 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-03 12:44:26 -0800, Peter Geoghegan wrote: On Mon, Mar 3, 2014 at 8:28 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Is the TODO item make an unlogged table logged [1] a good GSoC project? Another interesting project around unlogged tables would be to make it possible to have unlogged indexes on fully-logged tables. That is something that there was some discussion of before, that might be easier. I'd actually say it's harder because it requires modifying the catalog or transparently introducing hacks similar to what unlogged matviews are doing, to make sure the index is marked invalid after a crash restart. Unlogged matviews was disallowed [1]. [1] http://www.postgresql.org/message-id/e1uznrd-0008eh...@gemulon.postgresql.org -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On Mon, Mar 3, 2014 at 2:42 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-03-03 12:08:26 -0500, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Is the TODO item make an unlogged table logged [1] a good GSoC project? I'm pretty sure we found some problems in that design that we couldn't figure out how to solve. I don't have a pointer to the relevant -hackers discussion off-hand, but I think there was one. ISTR the discussion going something along the lines of we'd have to WAL log the entire table to do that, and if we have to do that, what's the point?. I don't see that as a particularly problematic problem. The primary reason to want to convert a unlogged to a logged table probably is that it's easier to do so than to recreate the table + dependencies. Also the overhead of logging full pages will be noticeably smaller than the overhead of adding all rows individually, even if using heap_multi_insert(). Do you know some similar in the source code? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GSoC proposal - make an unlogged table logged
On 2014-03-04 01:10:50 -0300, Fabrízio de Royes Mello wrote: Today I do something like that: 1) create unlogged table tmp_foo ... 2) populate 'tmp_foo' table (ETL scripts or whatever) 3) start transaction 4) lock table tmp_foo in access exclusive mode 5) update pg_class set relpersistence = 'p' where oid = 'tmp_foo':regclass 6) drop table foo; -- the old foo table 7) alter table tmp_foo rename to foo; 8) end transaction 9) run pg_repack in table 'foo' I know it's very ugly, but works... and works for standbys too... :-) No, it doesn't work. It just may happen to not fail loudly/visibly in some cases. You're absolutely risking corruption of this *and* other relations when doing so. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers