Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-04 Thread Robert Haas
On Thu, Apr 3, 2014 at 7:26 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 04/01/2014 08:39 PM, Heikki Linnakangas wrote:
 On 03/07/2014 05:36 AM, Tom Lane wrote:
Fabrízio de Royes Mello fabriziome...@gmail.com
 writes:
Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED
 too?
 Thinking in a scope of one GSoC, of course.

 I think it's basically the same thing.  You might hope to optimize it;
 but you have to create (rather than remove) an init fork, and there's
 no way to do that in exact sync with the commit.

 You just have to include that information with the commit WAL record, no?

 No-one's replied yet, but perhaps the worry is that after you've written the
 commit record, you have to go ahead with removing/creating the init fork,
 and that is seen as too risky. If a creat() or unlink() call fails, that
 will have to be a PANIC, and crash recovery will likewise have to PANIC if
 the forks still cannot be removed/created.

Yeah, that's the concern.

If I may digress for a moment, unlogged materialized views are not
supported.  This is because we have this facility where if a
materialized view hasn't been populated yet, you get an error when you
try to scan it.  If we allowed unlogged materialized views, then
they'd get reset to empty rather than to not-populated, because the
not-populated status is stored in the catalog, not the filesystem.  I
still wish we'd never added the notion of populated in the first
place, but Kevin felt it was essential, so we ended up here.
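
For illustration, the behaviour described above can be reproduced with SQL along
these lines (a sketch against a 9.3/9.4-era server; the commented outcomes and
error wording are paraphrased, not verbatim):

CREATE UNLOGGED MATERIALIZED VIEW bad_mv AS SELECT 1 AS x;
-- rejected: unlogged materialized views are not supported

CREATE MATERIALIZED VIEW mv AS SELECT 1 AS x WITH NO DATA;
-- created, but not populated

SELECT relname, relpersistence, relispopulated
  FROM pg_class WHERE relname = 'mv';
-- relispopulated is false; the flag lives in the catalog, not the filesystem

SELECT * FROM mv;
-- fails: the materialized view has not been populated

REFRESH MATERIALIZED VIEW mv;
-- populates it; scans now succeed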

Anyway, the idea that I had for fixing the unlogged materialized view
case was to add a new 64-bit integer to the control file that gets
bumped every time we start crash recovery, and which also gets
recorded in pg_class.  The value 0 would be reserved, and all pg_class
entries for non-unlogged relations would store 0.  For unlogged
relations, we could check whether the value in pg_class equals the
current value; if not, the relation should be viewed as not-populated.

This is not too far from a solution to the problem we need to solve
here.  If we want to make an unlogged relation logged, we can go ahead
and remove the init forks right away, knowing that the pg_class update
changing relpersistence and this new value won't take effect until
commit.  If the system meanwhile crashes, a backend connected to the
relevant database has enough state to recognize that the relation is
in this in-between state.  Before we can again use that relation, we
need to rebuild the init fork and reset it.  Of course, it's not too
clear exactly how that state cleanup happens; as one option, we could
just require the user to run a manual TRUNCATE.

This would not be totally without precedent, because CREATE INDEX
CONCURRENTLY leaves crap behind that the user has to reindex or drop,
but it's clearly not ideal.  Another option would be to try to make
autovacuum put things right, but that of course might not happen right
away.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Heikki Linnakangas

On 04/01/2014 08:58 PM, Andres Freund wrote:

On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:

On 3/4/14, 8:50 AM, Andres Freund wrote:

Can't that be solved by just creating the permanent relation in a new
relfilenode? That's equivalent to a rewrite, yes, but we need to do that
for anything but wal_level=minimal anyway.


Maybe I'm missing something, but doesn't this actually involve writing the data 
twice? Once into WAL and again into the relation itself?


Yes. But as I said, that's unavoidable for anything but
wal_level=minimal.


Ideally, you would *only* write the data to WAL, when you do ALTER TABLE 
... SET LOGGED. There's no fundamental reason you need to rewrite the 
heap, too. I understand that it might be difficult to do, because of the 
way the system catalogs work, but it's a worthy goal.


- Heikki




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Andres Freund
On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:
 On 04/01/2014 08:58 PM, Andres Freund wrote:
 On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:
 On 3/4/14, 8:50 AM, Andres Freund wrote:
 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.
 
 Maybe I'm missing something, but doesn't this actually involve writing the 
 data twice? Once into WAL and again into the relation itself?
 
 Yes. But as I said, that's unavoidable for anything but
 wal_level=minimal.
 
 Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
 SET LOGGED. There's no fundamental reason you need to rewrite the heap, too.
 I understand that it might be difficult to do, because of the way the system
 catalogs work, but it's a worthy goal.

I don't think that's realistic to achieve due to the issues described in
http://archives.postgresql.org/message-id/CA%2BTgmob44LNwwU73N1aJsGQyzQ61SdhKJRC_89wCm0%2BaLg%3Dx2Q%40mail.gmail.com

I don't think it's worthwhile to make the feature much more complex just
to address this. Perfect is the enemy of good, and all that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Heikki Linnakangas

On 04/03/2014 01:44 PM, Andres Freund wrote:

On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:

On 04/01/2014 08:58 PM, Andres Freund wrote:

On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:

On 3/4/14, 8:50 AM, Andres Freund wrote:

Can't that be solved by just creating the permanent relation in a new
relfilenode? That's equivalent to a rewrite, yes, but we need to do that
for anything but wal_level=minimal anyway.


Maybe I'm missing something, but doesn't this actually involve writing the data 
twice? Once into WAL and again into the relation itself?


Yes. But as I said, that's unavoidable for anything but
wal_level=minimal.


Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
SET LOGGED. There's no fundamental reason you need to rewrite the heap, too.
I understand that it might be difficult to do, because of the way the system
catalogs work, but it's a worthy goal.


I don't think that's realistic to achieve due to the issues described in
http://archives.postgresql.org/message-id/CA%2BTgmob44LNwwU73N1aJsGQyzQ61SdhKJRC_89wCm0%2BaLg%3Dx2Q%40mail.gmail.com


To which I replied here: 
http://www.postgresql.org/message-id/533af9d7.7010...@vmware.com. Please 
reply to that sub-thread with any problems you see. I might be missing 
something, but I really don't see any insurmountable problem here.



I don't think it's worthwhile to make the feature much more complex just
to address this. Perfect is the enemy of good, and all that.


We should do the trivial implementation first, sure. But that ought to 
be trivial. Now is the time to discuss how to do the more optimal thing. 
If we can come up with a feasible design on that, Fabrizio will have 
time to do that as part of the GSoC.


- Heikki




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Andres Freund
On 2014-04-01 20:39:35 +0300, Heikki Linnakangas wrote:
 On 03/07/2014 05:36 AM, Tom Lane wrote:
 Fabrízio de Royes Mello fabriziome...@gmail.com writes:
 Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED too?
 Thinking in a scope of one GSoC, of course.
 
 I think it's basically the same thing.  You might hope to optimize it;
 but you have to create (rather than remove) an init fork, and there's
 no way to do that in exact sync with the commit.
 
 You just have to include that information with the commit WAL record, no?

Sure, it's possible to do that. But that seems like complicating generic
paths more than I'd like for a minor feature. Especially as the
unlinking of the files would need to happen somewhere in
RecordTransactionCommit(). After the XLogFlush(), but before unsetting
MyPgXact->delayChkpt. That's a crit section, right?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Heikki Linnakangas

On 04/01/2014 08:39 PM, Heikki Linnakangas wrote:

On 03/07/2014 05:36 AM, Tom Lane wrote:

Fabrízio de Royes Mello fabriziome...@gmail.com writes:

Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED too?
Thinking in a scope of one GSoC, of course.


I think it's basically the same thing.  You might hope to optimize it;
but you have to create (rather than remove) an init fork, and there's
no way to do that in exact sync with the commit.


You just have to include that information with the commit WAL record, no?


No-one's replied yet, but perhaps the worry is that after you've written 
the commit record, you have to go ahead with removing/creating the init 
fork, and that is seen as too risky. If a creat() or unlink() call 
fails, that will have to be a PANIC, and crash recovery will likewise 
have to PANIC if the forks still cannot be removed/created.


My first thought is that that seems ok. It's unlikely that an unlink() 
of a small file in the data directory would fail. Creation could be done 
with a temporary name first and renamed into place, to avoid running out 
of disk space in the critical section.


If that's not acceptable, one idea off the top of my head is to somehow 
stamp the init forks when making an unlogged table logged, with the XID 
of the transaction. Crash recovery could then check the clog to see if
the transaction committed, and ignore any init fork files belonging to 
committed transactions. (Same in reverse when making a logged table 
unlogged).


Currently, we reset unlogged relations before replaying the WAL. That 
would have to be delayed until end of WAL replay, because otherwise we 
don't know if the transaction committed or not. Although if we go with 
the stamping approach, we could still reset unstamped files at the 
beginning of recovery.


- Heikki




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Andres Freund
On 2014-04-03 14:26:50 +0300, Heikki Linnakangas wrote:
 On 04/01/2014 08:39 PM, Heikki Linnakangas wrote:
 On 03/07/2014 05:36 AM, Tom Lane wrote:
 Fabrízio de Royes Mello fabriziome...@gmail.com writes:
 Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED too?
 Thinking in a scope of one GSoC, of course.
 
 I think it's basically the same thing.  You might hope to optimize it;
 but you have to create (rather than remove) an init fork, and there's
 no way to do that in exact sync with the commit.
 
 You just have to include that information with the commit WAL record, no?
 
 No-one's replied yet

That might be because it was a month after the initial discussion, and
at least I'd temporarily lost track of the thread ;)

 , but perhaps the worry is that after you've written the
 commit record, you have to go ahead with removing/creating the init fork,
 and that is seen as too risky. If a creat() or unlink() call fails, that
 will have to be a PANIC, and crash recovery will likewise have to PANIC if
 the forks still cannot be removed/created.

That's part of the worry, yes. It's also creeping code dealing with
unlogged relations into a fairly critical place
(RecordTransactionCommit()) where it really doesn't seem to belong.

 My first thought is that that seems ok. It's unlikely that an unlink() of a
 small file in the data directory would fail. Creation could be done with a
 temporary name first and renamed into place, to avoid running out of disk
 space in the critical section.

I continue to feel that that's far too much impact for a minor
feature. Even if it could be made to work reliably, it'll be a fair amount
of seldom-used infrastructure.

 If that's not acceptable, one idea off the top of my head is to somehow
 stamp the init forks when making an unlogged table logged, with the XID of
 the transaction. Crash recovery could then check the clog to see if the
 transaction committed, and ignore any init fork files belonging to committed
 transactions. (Same in reverse when making a logged table unlogged).

I've thought about that - after all, the logical decoding stuff uses
that trick in some places - but it has the grave disadvantage that it
requires a full directory scan to fully remove a relation. That seems to
be a heavy price.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Andres Freund
On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:
 On 04/01/2014 08:58 PM, Andres Freund wrote:
 On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:
 On 3/4/14, 8:50 AM, Andres Freund wrote:
 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.
 
 Maybe I'm missing something, but doesn't this actually involve writing the 
 data twice? Once into WAL and again into the relation itself?
 
 Yes. But as I said, that's unavoidable for anything but
 wal_level=minimal.
 
 Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
 SET LOGGED. There's no fundamental reason you need to rewrite the
 heap, too.

As another point: What's the advantage of that? The amount of writes
will be the same, no? It doesn't seem to be all that interesting that
a second filenode exists temporarily?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Heikki Linnakangas

On 04/03/2014 02:41 PM, Andres Freund wrote:

On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:

On 04/01/2014 08:58 PM, Andres Freund wrote:

On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:

On 3/4/14, 8:50 AM, Andres Freund wrote:

Can't that be solved by just creating the permanent relation in a new
relfilenode? That's equivalent to a rewrite, yes, but we need to do that
for anything but wal_level=minimal anyway.


Maybe I'm missing something, but doesn't this actually involve writing the data 
twice? Once into WAL and again into the relation itself?


Yes. But as I said, that's unavoidable for anything but
wal_level=minimal.


Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
SET LOGGED. There's no fundamental reason you need to rewrite the
heap, too.


As another point: What's the advantage of that? The amount of writes
will be the same, no? It doesn't seem to be all that interesting that
a second filenode exists temporarily?


Surely it's cheaper to read the whole relation and copy it to just WAL, 
than to read the whole relation and write it to both the WAL and another file.


(Maybe it's not worth the trouble to avoid it - but that depends on 
whether we come up with a good design..)


- Heikki




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Andres Freund
On 2014-04-03 15:02:27 +0300, Heikki Linnakangas wrote:
 On 04/03/2014 02:41 PM, Andres Freund wrote:
 On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:
 On 04/01/2014 08:58 PM, Andres Freund wrote:
 On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:
 On 3/4/14, 8:50 AM, Andres Freund wrote:
 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.
 
 Maybe I'm missing something, but doesn't this actually involve writing 
 the data twice? Once into WAL and again into the relation itself?
 
 Yes. But as I said, that's unavoidable for anything but
 wal_level=minimal.
 
 Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
 SET LOGGED. There's no fundamental reason you need to rewrite the
 heap, too.
 
 As another point: What's the advantage of that? The amount of writes
 will be the same, no? It doesn't seem to be all that interesting that
 a second filenode exists temporarily?
 
 Surely it's cheaper to read the whole relation and copy it to just WAL, than
 to read the whole relation and write it to both the WAL and another file.

I have to admit I was thinking of the WAL replay case ;). But we'll
actually have to write all dirty s_b, change the persistency tags and
such anyway because there's no LSN interlock with checkpoints. That
seems pretty ugly as well, and once again, avoidable by a rewrite.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 No-one's replied yet, but perhaps the worry is that after you've written 
 the commit record, you have to go ahead with removing/creating the init 
 fork, and that is seen as too risky. If a creat() or unlink() call 
 fails, that will have to be a PANIC, and crash recovery will likewise 
 have to PANIC if the forks still cannot be removed/created.

 My first thought is that that seems ok.

No, it isn't.  No filesystem operation should *ever* be thought to be
guaranteed to succeed.

I also concur with Andres' complaint that this feature is not worth
adding complication to the core transaction commit path for.

regards, tom lane




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-02 Thread Fabrízio de Royes Mello
On Tue, Apr 1, 2014 at 1:40 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-04-01 13:37:57 -0300, Fabrízio de Royes Mello wrote:
 In the GSoC proposal page [1] I received some suggestions for stretch
goals:
 
  * ALTER TABLE name SET UNLOGGED. This is essentially the reverse of
the
  core proposal, which is ALTER TABLE name SET LOGGED. Yes, I think that
  should definitely be included. It would be weird to have SET LOGGED but
not
  SET UNLOGGED.

 Yes, that makes sense.

  * Allow unlogged indexes on logged tables.

 I don't think it's realistic to build the infrastructure necessary for
 that as part of gsoc. The reasons have been explained somewhere in this
 thread.

  * Implement ALTER TABLE name SET LOGGED without rewriting the whole
  table, when wal_level = minimal.

 Yea, maybe.

  * Allow unlogged materialized views.

 I don't think that's realistic either.


Thanks Andres for your comments.

Anyway, I added these additional goals to the proposal and, as Heikki said:

It's actually nice to have several separate goals like this, it means that
if you finish the task faster than expected, you can move on to the next
goal, and if one task takes longer than expected so that you don't finish
all the work, we'll still have something useful.

I hope you can help me in some way with this project too. :-)

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-01 Thread Fabrízio de Royes Mello
On Fri, Mar 7, 2014 at 12:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Fabrízio de Royes Mello fabriziome...@gmail.com
 writes:
  Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED
 too?
  Thinking in a scope of one GSoC, of course.

 I think it's basically the same thing.  You might hope to optimize it;
 but you have to create (rather than remove) an init fork, and there's
 no way to do that in exact sync with the commit.  So for safety I think
 you have to copy the data into a new relfilenode.


Hi all,

In the GSoC proposal page [1] I received some suggestions for stretch goals:

* ALTER TABLE name SET UNLOGGED. This is essentially the reverse of the
core proposal, which is ALTER TABLE name SET LOGGED. Yes, I think that
should definitely be included. It would be weird to have SET LOGGED but not
SET UNLOGGED.

* Allow unlogged indexes on logged tables.

* Implement ALTER TABLE name SET LOGGED without rewriting the whole
table, when wal_level = minimal.

* Allow unlogged materialized views.

Comments?


[1]
http://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2014/fabriziomello/5629499534213120

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-01 Thread Andres Freund
On 2014-04-01 13:37:57 -0300, Fabrízio de Royes Mello wrote:
 In the GSoC proposal page [1] I received some suggestions for stretch goals:
 
 * ALTER TABLE name SET UNLOGGED. This is essentially the reverse of the
 core proposal, which is ALTER TABLE name SET LOGGED. Yes, I think that
 should definitely be included. It would be weird to have SET LOGGED but not
 SET UNLOGGED.

Yes, that makes sense.

 * Allow unlogged indexes on logged tables.

I don't think it's realistic to build the infrastructure necessary for
that as part of gsoc. The reasons have been explained somewhere in this
thread.

 * Implement ALTER TABLE name SET LOGGED without rewriting the whole
 table, when wal_level = minimal.

Yea, maybe.

 * Allow unlogged materialized views.

I don't think that's realistic either.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-01 Thread Heikki Linnakangas

On 03/07/2014 05:36 AM, Tom Lane wrote:

Fabrízio de Royes Mello fabriziome...@gmail.com writes:

Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED too?
Thinking in a scope of one GSoC, of course.


I think it's basically the same thing.  You might hope to optimize it;
but you have to create (rather than remove) an init fork, and there's
no way to do that in exact sync with the commit.


You just have to include that information with the commit WAL record, no?

- Heikki




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-01 Thread Jim Nasby

On 3/4/14, 8:50 AM, Andres Freund wrote:

Can't that be solved by just creating the permanent relation in a new
relfilenode? That's equivalent to a rewrite, yes, but we need to do that
for anything but wal_level=minimal anyway.


Maybe I'm missing something, but doesn't this actually involve writing the data 
twice? Once into WAL and again into the relation itself?
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-01 Thread Andres Freund
On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:
 On 3/4/14, 8:50 AM, Andres Freund wrote:
 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.
 
 Maybe I'm missing something, but doesn't this actually involve writing the 
 data twice? Once into WAL and again into the relation itself?

Yes. But as I said, that's unavoidable for anything but
wal_level=minimal. If somebody wants to put in the additional nontrivial
work to make it work faster with wal_level=minimal, they can do so. But
the other case is more general and needs to be done anyway.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-25 Thread Anastasia Lubennikova
2014-03-18 18:47 GMT+04:00 Robert Haas robertmh...@gmail.com


  If the fetch() is specified by the developer, then using it, algorithm
 can
  retrieve the data directly to output areas at this stage, without
 reference
  to the heap.

 This seems to be the crux of your proposal, but it seems vague: what
 exactly do you mean by retrieve the data directly to output areas?
  What data are you going to retrieve and where are you going to put it?


 I meant the Datum that is stored in GiST-tree nodes. Now gistgettuple() returns
xs_ctup.t_self (an item pointer). I'm going to add index-only scan
functionality: gistgettuple() will return the pointer and the Datum itself as
xs_itup. So the queue will contain both the pointer and the Datum. If
visibilitymap_test returns true then the Datum from xs_itup would be added into
the queue; otherwise the page must be scanned.

Another question to consider is: which operator classes do you
 anticipate that this will work for and which ones do you anticipate
 that it will not work for?  Any operator class that lossifies that
 input data before storing it in the index is presumably doomed, but
 which ones do that, and which do not?


About amcanreturn: I'm going to create a gistcanreturn() function that checks
whether fetch() is defined for all indexed columns.

And the last point of my project is to implement fetch() for the existing
opclasses based on GiST.

-- 
Best regards,
Lubennikova Anastasia


[HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Anastasia Lubennikova
Hello!
Here is the text of my proposal which I've applied to GSoC.
(and link
http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/lubennikovaav/5629499534213120)
Any suggestions and comments are welcome.

*Project name*

Support for index-only scans for GIST index



*Brief review*

Currently the GiST index doesn't have index-only scan functionality. Index-only
scans are a major performance feature added to Postgres 9.2. They allow
certain types of queries to be satisfied just by retrieving data from
indexes, and not from tables. This feature for B-trees (implemented since
PostgreSQL 9.2) makes certain types of queries significantly faster.
This is achieved by a reduction in the amount of I/O necessary to satisfy
queries. I think it will be useful to implement this feature for user-defined
data types that use a GiST index.
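
As a concrete illustration of the gap (a sketch against a 9.3-era server; the
table and index names are illustrative, and the plans shown in the comments are
only what you would typically get, since the planner's choice depends on
statistics and costs):

CREATE TABLE pts (id int, p point);
INSERT INTO pts SELECT g, point(g, g) FROM generate_series(1, 100000) g;
CREATE INDEX pts_id_btree ON pts (id);
CREATE INDEX pts_p_gist ON pts USING gist (p);
VACUUM ANALYZE pts;

EXPLAIN SELECT id FROM pts WHERE id < 100;
-- typically: Index Only Scan using pts_id_btree on pts

EXPLAIN SELECT p FROM pts WHERE p <@ box '((0,0),(100,100))';
-- typically an index or bitmap scan on pts_p_gist; the heap is still visited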



*Benefits to the PostgreSQL Community*

Faster GiST index search would be relevant for many PostgreSQL applications
(for example some GIS systems).


 *Quantifiable results*

Acceleration of GiST index search.


*Project details*

1. The GiST is a balanced tree structure like a B-tree, containing (key,
pointer) pairs. A GiST key is a member of a user-defined class, and
represents some property that is true of all data items reachable from the
pointer associated with the key. The GiST makes it possible to create
custom data types with indexed access methods and an extensible set of queries.

There are seven methods that an index operator class for GiST must provide,
and an eighth that is optional.

-consistent

-union

-compress

-decompress

-penalty

-picksplit

-equal

-distance (optional)

I'm going to create a new optional method, fetch(), in addition to the existing
ones. Thus a user can provide a method of retrieving data from the index without
loss. It will be used when performing search queries to speed up data
retrieval.



2. gistget algorithm (several parts omitted):

Check the key
gistindex_keytest() - does this index tuple satisfy the scan key(s)?

Scan all items on the GiST index page and insert them into the queue (or
directly to output areas)

plain scan

Heap tuple TIDs are returned into so->pageData[]

ordered scan

Heap tuple TIDs are pushed into individual search queue items

If the fetch() is specified by the developer, then using it, algorithm can
retrieve the data directly to output areas at this stage, without reference
to the heap.
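
(For reference, the two scan shapes above map onto queries like the following,
reusing the pts table and GiST index sketched earlier; the plans are indicative
only:)

-- plain scan: qualifying heap TIDs are collected page by page
EXPLAIN SELECT * FROM pts WHERE p <@ box '((0,0),(10,10))';

-- ordered scan: a KNN search using the optional distance method, where
-- candidates are pushed into the search queue ordered by distance
EXPLAIN SELECT * FROM pts ORDER BY p <-> point '(50,50)' LIMIT 10;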


3. Create a function gistcanreturn to check whether fetch() is specified by
the user.

Amcanreturn - Function to check whether index supports index-only scans, or
zero if none

There is the question of supporting index-only scans for multicolumn indexes.
Probably it would require extending the interface to add per-column
checking.

To solve this question I need the community's help.
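
(For context: in the 9.3/9.4-era catalogs each access method advertises its
amcanreturn support function, or zero, in pg_am; GiST has none yet, which is
what gistcanreturn would provide. A quick way to see the current state:)

SELECT amname, amcanreturn FROM pg_am ORDER BY amname;
-- btree reports btcanreturn; gist currently reports '-' (no support function)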


4. Add details for index only scans into gistcostestimate function



 *Links*

1) Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search
trees for database systems. - September, 1995.

2) http://www.sai.msu.su/~megera/postgres/gist/

3) PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST
Indexes.

-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Robert Haas
On Tue, Mar 18, 2014 at 9:12 AM, Anastasia Lubennikova
lubennikov...@gmail.com wrote:
 Support for index-only scans for GIST index

This is a cool idea, if it can be made to work.

 If the fetch() is specified by the developer, then using it, algorithm can
 retrieve the data directly to output areas at this stage, without reference
 to the heap.

This seems to be the crux of your proposal, but it seems vague: what
exactly do you mean by retrieve the data directly to output areas?
What data are you going to retrieve and where are you going to put it?

Another question to consider is: which operator classes do you
anticipate that this will work for and which ones do you anticipate
that it will not work for?  Any operator class that lossifies that
input data before storing it in the index is presumably doomed, but
which ones do that, and which do not?

Tom Lane previously proposed extending SP-GiST to support index-only
scans.  You might find that discussion worth reading, or perhaps
consider it as an alternative if GiST doesn't work out:

http://www.postgresql.org/message-id/10839.1323885...@sss.pgh.pa.us

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Tom Lane previously proposed extending SP-GiST to support index-only
 scans.  You might find that discussing worth reading, or perhaps
 consider it as an alternative if GiST doesn't work out:
 http://www.postgresql.org/message-id/10839.1323885...@sss.pgh.pa.us

That wasn't just a proposal, see commits
3695a555136a6d179cac8ae48d5f90171d5b30e9 and
92203624934095163f8b57b5b3d7bbd2645da2c8.  But yeah, that might be a
useful reference for what is likely to be involved with making GIST
do it.

regards, tom lane




Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Josh Berkus
Alexander,

Can you comment on the below proposal?  I'd like your opinion on how
difficult it will be.

On 03/18/2014 06:12 AM, Anastasia Lubennikova wrote:
 Hello!
 Here is the text of my proposal which I've applied to GSoC.
 (and link
 http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/lubennikovaav/5629499534213120)
 Any suggestions and comments are welcome.
 
 *Project name*
 
 Support for index-only scans for GIST index
 
 
 
 *Brief review*
 
 Currently the GiST index doesn't have index-only scan functionality. Index-only
 scans are a major performance feature added to Postgres 9.2. They allow
 certain types of queries to be satisfied just by retrieving data from
 indexes, and not from tables. This feature for B-trees (implemented since
 PostgreSQL 9.2) makes certain types of queries significantly faster.
 This is achieved by a reduction in the amount of I/O necessary to satisfy
 queries. I think it will be useful to implement this feature for user-defined
 data types that use a GiST index.
 
 
 
 *Benefits to the PostgreSQL Community*
 
 Faster GiST index search would be relevant for many PostgreSQL applications
 (for example some GIS systems).
 
 
  *Quantifiable results*
 
 Acceleration of GiST index search.
 
 
 *Project details*
 
 1. The GiST is a balanced tree structure like a B-tree, containing (key,
 pointer) pairs. A GiST key is a member of a user-defined class, and
 represents some property that is true of all data items reachable from the
 pointer associated with the key. The GiST makes it possible to create
 custom data types with indexed access methods and an extensible set of queries.
 
 There are seven methods that an index operator class for GiST must provide,
 and an eighth that is optional.
 
 -consistent
 
 -union
 
 -compress
 
 -decompress
 
 -penalty
 
 -picksplit
 
 -equal
 
 -distance (optional)
 
 I'm going to create a new optional method, fetch(), in addition to the existing
 ones. Thus a user can provide a method of retrieving data from the index without
 loss. It will be used when performing search queries to speed up data
 retrieval.
 
 
 
 2. gistget algorithm (several parts omitted):
 
 Check the key
 gistindex_keytest() - does this index tuple satisfy the scan key(s)?
 
 Scan all items on the GiST index page and insert them into the queue (or
 directly to output areas)
 
 plain scan
 
 Heap tuple TIDs are returned into so->pageData[]
 
 ordered scan
 
 Heap tuple TIDs are pushed into individual search queue items
 
 If the fetch() is specified by the developer, then using it, algorithm can
 retrieve the data directly to output areas at this stage, without reference
 to the heap.
 
 
 3. Create a function gistcanreturn to check whether fetch() is specified by
 the user.
 
 Amcanreturn - Function to check whether index supports index-only scans, or
 zero if none
 
 There is the question of supporting index-only scans for multicolumn indexes.
 Probably it would require extending the interface to add per-column
 checking.
 
 To solve this question I need the community's help.
 
 
 4. Add details for index only scans into gistcostestimate function
 
 
 
  *Links*
 
 1) Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search
 trees for database systems. - September, 1995.
 
 2) http://www.sai.msu.su/~megera/postgres/gist/
 
 3) PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST
 Indexes.
 


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Alexander Korotkov
On Tue, Mar 18, 2014 at 5:12 PM, Anastasia Lubennikova 
lubennikov...@gmail.com wrote:

 2. gistget algorithm (several parts omitted):

 Check the key
 gistindex_keytest() - does this index tuple satisfy the scan key(s)?

 Scan all items on the GiST index page and insert them into the queue (or
 directly to output areas)

 plain scan

 Heap tuple TIDs are returned into so->pageData[]

 ordered scan

 Heap tuple TIDs are pushed into individual search queue items

 If the fetch() is specified by the developer, then using it, algorithm can
 retrieve the data directly to output areas at this stage, without reference
 to the heap.


I think the following changes need to be made to the GiST code:
1) When the next consistent IndexTuple is found, extract the original Datums
using the fetch method.
2) Put those original Datums into the queue.
3) When returning the next ItemPointer from the queue, put the original Datums
into the IndexScanDesc (into xs_itup).

 3. Create function gistcanreturn to check whether fetch() is specified by
 user.

 Amcanreturn - Function to check whether index supports index-only scans,
 or zero if none

 There is the question of support index-only scans for multicolumn indexes.
 Probably it would require extend the interface to add separate columns
 checking.

 To solve this question I need the community's help.


 4. Add details for index only scans into gistcostestimate function


Also, another part of work to be mentioned is to implement fetch function
for all suitable opclasses.


With best regards,
Alexander Korotkov.


Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Alexander Korotkov
Josh,

Anastasia has already consulted with me in person. It is not a big proposal,
but for a newbie who is not familiar with the PostgreSQL code base, and
especially GiST, it seems fair enough.


With best regards,
Alexander Korotkov.

On Tue, Mar 18, 2014 at 9:16 PM, Josh Berkus j...@agliodbs.com wrote:

 Alexander,

 Can you comment on the below proposal?  I'd like your opinion on how
 difficult it will be.

 On 03/18/2014 06:12 AM, Anastasia Lubennikova wrote:
  Hello!
  Here is the text of my proposal which I've applied to GSoC.
  (and link
 
 http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/lubennikovaav/5629499534213120
 )
  Any suggestions and comments are welcome.
 
  *Project name*
 
  Support for index-only scans for GIST index
 
 
 
  *Brief review*
 
  Currently the GiST index doesn't have index-only scan functionality. Index-only
  scans are a major performance feature added to Postgres 9.2. They allow
  certain types of queries to be satisfied just by retrieving data from
  indexes, and not from tables. This feature for B-trees (implemented since
  PostgreSQL 9.2) makes certain types of queries significantly faster.
  This is achieved by a reduction in the amount of I/O necessary to satisfy
  queries. I think it will be useful to implement this feature for user-defined
  data types that use a GiST index.
 
 
 
  *Benefits to the PostgreSQL Community*
 
  Faster GiST index search would be relevant for many PostgreSQL applications
  (for example some GIS systems).
 
 
   *Quantifiable results*
 
  Acceleration of GiST index search.
 
 
  *Project details*
 
  1. The GiST is a balanced tree structure like a B-tree, containing (key,
  pointer) pairs. A GiST key is a member of a user-defined class, and
  represents some property that is true of all data items reachable from the
  pointer associated with the key. The GiST makes it possible to create
  custom data types with indexed access methods and an extensible set of
  queries.
 
  There are seven methods that an index operator class for GiST must
 provide,
  and an eighth that is optional.
 
  -consistent
 
  -union
 
  -compress
 
  -decompress
 
  -penalty
 
  -picksplit
 
  -equal
 
  -distance (optional)
 
  I'm going to create a new optional method, fetch(), in addition to the existing
  ones. Thus a user can provide a method of retrieving data from the index without
  loss. It will be used when performing search queries to speed up data
  retrieval.
 
 
 
  2. gistget algorithm (several parts omitted):
 
  Check the key
  gistindex_keytest() - does this index tuple satisfy the scan key(s)?
 
  Scan all items on the GiST index page and insert them into the queue (or
  directly to output areas)
 
  plain scan
 
  Heap tuple TIDs are returned into so->pageData[]
 
  ordered scan
 
  Heap tuple TIDs are pushed into individual search queue items
 
  If the fetch() is specified by the developer, then using it, algorithm
 can
  retrieve the data directly to output areas at this stage, without
 reference
  to the heap.
 
 
  3. Create a function gistcanreturn to check whether fetch() is specified by
  the user.
 
  Amcanreturn - Function to check whether index supports index-only scans,
 or
  zero if none
 
  There is the question of supporting index-only scans for multicolumn
  indexes. Probably it would require extending the interface to add
  per-column checking.
 
  To solve this question I need the community's help.
 
 
  4. Add details for index only scans into gistcostestimate function
 
 
 
   *Links*
 
  1) Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search
  trees for database systems. - September, 1995.
 
  2) http://www.sai.msu.su/~megera/postgres/gist/
 
  3) PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST
  Indexes.
 


 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com



Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Josh Berkus
On 03/18/2014 12:11 PM, Alexander Korotkov wrote:
 Josh,
 
 Anastasia has already consulted with me in person. It is not a big proposal,
 but for a newbie who is not familiar with the PostgreSQL code base, and
 especially GiST, it seems fair enough.
 

Thanks, that's what I wanted to know.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Robert Haas
On Wed, Mar 5, 2014 at 7:42 PM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 On Tue, Mar 4, 2014 at 5:00 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
 On Tue, Mar 4, 2014 at 3:29 PM, Andres Freund and...@2ndquadrant.com
 wrote:
 
  On 2014-03-04 12:54:02 -0500, Robert Haas wrote:
   On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com
   wrote:
On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
Can't that be solved by just creating the permanent relation in a
new
relfilenode? That's equivalent to a rewrite, yes, but we need to do
that
for anything but wal_level=minimal anyway.
  
   Yes, that would work.  I've tended to view optimizing away the
   relfilenode copy as an indispensable part of this work, but that might
   be wrongheaded.  It would certainly be a lot easier to make this
   happen if we didn't insist on that.
 
 I think it'd already be much better than today's situation, and it's a
 required codepath for wal_level > logical anyway. So even if somebody
  wants to make this work without the full copy for minimal, it'd still be
  a required codepath. So I am perfectly ok with a patch just adding that.
 

 Then is this a good idea for a GSoC project ?

 I don't know these internals very well, but I am willing to learn and I
 think the GSoC is a good opportunity.

 Any of you are willing to mentoring this project?


 I written the proposal to this feature, so I would like to know if someone
 can review.

I think this isn't a good design.  Per the discussion between Andres
and me, I think that what you should do is make ALTER TABLE .. SET
LOGGED work just like VACUUM FULL, with the exception that it will set
a different relpersistence for the new relfilenode.  If you do it that
way, this will be less efficient, but much simpler, and you might
actually finish it in one summer.
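
To make the suggestion concrete: the rewrite machinery being reused already
assigns a fresh relfilenode, and relpersistence is visible in pg_class. A sketch
of what the user would observe (the final statement is the proposed, not yet
existing command):

CREATE UNLOGGED TABLE demo (i int);
SELECT relfilenode, relpersistence FROM pg_class WHERE relname = 'demo';
-- relpersistence is 'u' (unlogged)

VACUUM FULL demo;   -- today: rewrites the table into a new relfilenode,
                    -- persistence unchanged

ALTER TABLE demo SET LOGGED;   -- proposed: the same kind of rewrite, but the
                               -- new relfilenode is created with relpersistence 'p'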

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Thom Brown
On 6 March 2014 19:42, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Mar 5, 2014 at 7:42 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  On Tue, Mar 4, 2014 at 5:00 PM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
  On Tue, Mar 4, 2014 at 3:29 PM, Andres Freund and...@2ndquadrant.com
  wrote:
  
   On 2014-03-04 12:54:02 -0500, Robert Haas wrote:
On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund 
 and...@2ndquadrant.com
wrote:
 On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
 Can't that be solved by just creating the permanent relation in a
 new
 relfilenode? That's equivalent to a rewrite, yes, but we need to
 do
 that
 for anything but wal_level=minimal anyway.
   
Yes, that would work.  I've tended to view optimizing away the
relfilenode copy as an indispensable part of this work, but that
 might
be wrongheaded.  It would certainly be a lot easier to make this
happen if we didn't insist on that.
  
  I think it'd already be much better than today's situation, and it's a
  required codepath for wal_level > logical anyway. So even if somebody
   wants to make this work without the full copy for minimal, it'd still
 be
   a required codepath. So I am perfectly ok with a patch just adding
 that.
  
 
  Then is this a good idea for a GSoC project ?
 
  I don't know these internals very well, but I am willing to learn and I
  think the GSoC is a good opportunity.
 
  Any of you are willing to mentoring this project?
 
 
  I written the proposal to this feature, so I would like to know if
 someone
  can review.

 I think this isn't a good design.  Per the discussion between Andres
 and me, I think that what you should do is make ALTER TABLE .. SET
 LOGGED work just like VACUUM FULL, with the exception that it will set
 a different relpersistence for the new relfilenode.  If you do it that
 way, this will be less efficient, but much simpler, and you might
 actually finish it in one summer.


Sounds like a plan.  Would there be any stretch-goals for this work, or is
there not really anything else that could be done?

-- 
Thom


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Fabrízio de Royes Mello
On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com wrote:


 I think this isn't a good design.  Per the discussion between Andres
 and me, I think that what you should do is make ALTER TABLE .. SET
 LOGGED work just like VACUUM FULL, with the exception that it will set
 a different relpersistence for the new relfilenode.  If you do it that
 way, this will be less efficient, but much simpler, and you might
 actually finish it in one summer.


Do it like 'VACUUM FULL' for any wal_level?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Robert Haas
On Thu, Mar 6, 2014 at 2:52 PM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com wrote:
 I think this isn't a good design.  Per the discussion between Andres
 and me, I think that what you should do is make ALTER TABLE .. SET
 LOGGED work just like VACUUM FULL, with the exception that it will set
 a different relpersistence for the new relfilenode.  If you do it that
 way, this will be less efficient, but much simpler, and you might
 actually finish it in one summer.


 Do it like 'VACUUM FULL' for any wal_level?

Yep.  Anything else appears to be a research problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Fabrízio de Royes Mello
On Thu, Mar 6, 2014 at 5:04 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Mar 6, 2014 at 2:52 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com
wrote:
  I think this isn't a good design.  Per the discussion between Andres
and me, I think that what you should do is make ALTER TABLE .. SET
  LOGGED work just like VACUUM FULL, with the exception that it will set
  a different relpersistence for the new relfilenode.  If you do it that
  way, this will be less efficient, but much simpler, and you might
  actually finish it in one summer.
 
 
  Do it like 'VACUUM FULL' for any wal_level?

 Yep.  Anything else appears to be a research problem.


I'll change the proposal. Thanks a lot!

Greetings,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Fabrízio de Royes Mello
On Thu, Mar 6, 2014 at 5:05 PM, Fabrízio de Royes Mello 
fabriziome...@gmail.com wrote:


 On Thu, Mar 6, 2014 at 5:04 PM, Robert Haas robertmh...@gmail.com wrote:
 
  On Thu, Mar 6, 2014 at 2:52 PM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
   On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com
wrote:
   I think this isn't a good design.  Per the discussion between Andres
  and me, I think that what you should do is make ALTER TABLE .. SET
   LOGGED work just like VACUUM FULL, with the exception that it will
set
   a different relpersistence for the new relfilenode.  If you do it
that
   way, this will be less efficient, but much simpler, and you might
   actually finish it in one summer.
  
  
   Do it like 'VACUUM FULL' for any wal_level?
 
  Yep.  Anything else appears to be a research problem.
 

 I'll change the proposal. Thanks a lot!


One last question.

Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED too?
Thinking in a scope of one GSoC, of course.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Fabrízio de Royes Mello
On Thu, Mar 6, 2014 at 5:04 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Mar 6, 2014 at 2:52 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com
wrote:
  I think this isn't a good design.  Per the discussion between Andres
 and me, I think that what you should do is make ALTER TABLE .. SET
  LOGGED work just like VACUUM FULL, with the exception that it will set
  a different relpersistence for the new relfilenode.  If you do it that
  way, this will be less efficient, but much simpler, and you might
  actually finish it in one summer.
 
 
  Do it like 'VACUUM FULL' for any wal_level?

 Yep.  Anything else appears to be a research problem.


Updated proposal:

proposal

** Add to PostgreSQL the capability of making an unlogged table logged **

Introduction

This project will make it possible to change an unlogged table (one that doesn't
create transaction logs - WAL files) and its dependencies into a logged table, in
other words, a regular table that creates WAL files. To make this happen
we'll introduce new SQL syntax:

ALTER TABLE name SET LOGGED;


Benefits to the PostgreSQL Community

The unlogged tables feature was introduced in version 9.1. Unlogged tables
provide better write performance than regular (logged) tables, but are not
crash-safe: their contents are automatically discarded (cleared) after a
server crash, and their contents do not propagate to replication standbys
either.
The ability to turn an unlogged table into a logged table gives us the best of
both worlds: we can use an unlogged table to run a bulk load of thousands of
rows (ETL scripts) and get better performance, and then change it to a logged
table to get durability for the loaded data.
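
A sketch of that workflow, using the proposed syntax (ALTER TABLE ... SET LOGGED
does not exist yet; the staging table and file path are illustrative):

CREATE UNLOGGED TABLE staging (id int, payload text);

COPY staging FROM '/tmp/bulk_data.csv' WITH (FORMAT csv);
-- fast: the data pages are not WAL-logged while the table is unlogged

-- transform / validate the data here ...

ALTER TABLE staging SET LOGGED;
-- proposed command: rewrite the table as a regular, crash-safe (logged) table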


Deliverables

This project has just one deliverable at the end: the implementation of the
routines that transform an unlogged table into a logged one, using the same
algorithm as vacuum full, with the exception that it will set a different
relpersistence for the new relfilenode.


Project Schedule

until May 19:
* create a website to the project (wiki.postgresql.org)
* create a public repository to the project (github.com/fabriziomello)
* read what has already been discussed by the community about the project (
http://wiki.postgresql.org/wiki/Todo)
* as already discussed on the pgsql-hackers mailing list, this feature will be
implemented similarly to vacuum full, with the exception that it will set a
different relpersistence for the new relfilenode
* learn about some PostgreSQL internals:
  . grammar (src/backend/parser/gram.y)
  . vacuum full (src/backend/commands/[vacuum.c | cluster.c])

May 19 - June 23
* implementation of the first prototype:
  . change the grammar of PostgreSQL to support ALTER TABLE ... SET LOGGED
  . implement and/or adapt the routines to change an unlogged table to
logged (similar to vacuum full)
* write documentation and the test cases
* submit this first prototype to the commitfest 2014/06 (
https://commitfest.postgresql.org/action/commitfest_view?id=22)

June 23 - June 27
* review with the Mentor of the work done until now

June 27 - August 18
* do the adjustments based on the community feedback during the commitfest
2014/06
* submit to the commitfest 2014/09 for final evaluation, where maybe it will be
committed to version 9.5 (webpage not created yet)

August 18 - August 22
* final review with the Mentor of all work done.


About the proponent

Fabrízio de Royes Mello
e-mail: fabriziome...@gmail.com
twitter: @fabriziomello
github: http://github.com/fabriziomello
linkedin: http://linkedin.com/in/fabriziomello

Currently I help people and teams get the full potential out of relational
databases, especially PostgreSQL: helping teams design the structure of
the database (modeling), build the physical architecture (database schema),
do programming (procedural languages) and SQL (usage, tuning, best practices),
and handle optimization and orchestration of production instances. I do
volunteer work for the Brazilian PostgreSQL community (www.postgresql.org.br),
supporting mailing lists, organizing events (pgbr.postgresql.org.br) and
doing some admin tasks. I also help the PostgreSQL Global Development Group
(PGDG) a little with the implementation of some features and the review of
patches (git.postgresql.org).

/proposal

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Tom Lane
Fabrízio de Royes Mello fabriziome...@gmail.com writes:
 Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED too?
 Thinking in the scope of one GSoC, of course.

I think it's basically the same thing.  You might hope to optimize it;
but you have to create (rather than remove) an init fork, and there's
no way to do that in exact sync with the commit.  So for safety I think
you have to copy the data into a new relfilenode.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Fabrízio de Royes Mello
On Fri, Mar 7, 2014 at 12:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Fabrízio de Royes Mello fabriziome...@gmail.com writes:
  Do you think is difficult to implement ALTER TABLE ... SET UNLOGGED
too?
  Thinking in a scope of one GSoC, of course.

 I think it's basically the same thing.  You might hope to optimize it;
 but you have to create (rather than remove) an init fork, and there's
 no way to do that in exact sync with the commit.  So for safety I think
 you have to copy the data into a new relfilenode.


Well, the same thing that 'vacuum full' does, but changing relpersistence
to RELPERSISTENCE_UNLOGGED for the new relfilenode. Is that it?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-05 Thread Fabrízio de Royes Mello
On Tue, Mar 4, 2014 at 5:00 PM, Fabrízio de Royes Mello 
fabriziome...@gmail.com wrote:


 On Tue, Mar 4, 2014 at 3:29 PM, Andres Freund and...@2ndquadrant.com
wrote:
 
  On 2014-03-04 12:54:02 -0500, Robert Haas wrote:
   On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com
wrote:
On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
Can't that be solved by just creating the permanent relation in a
new
relfilenode? That's equivalent to a rewrite, yes, but we need to do
that
for anything but wal_level=minimal anyway.
  
   Yes, that would work.  I've tended to view optimizing away the
   relfilenode copy as an indispensable part of this work, but that might
   be wrongheaded.  It would certainly be a lot easier to make this
   happen if we didn't insist on that.
 
  I think it'd already much better than today's situation, and it's a
  required codepath for wal_level  logical anyway. So even if somebody
  wants to make this work without the full copy for minimal, it'd still be
  a required codepath. So I am perfectly ok with a patch just adding that.
 

 Then is this a good idea for a GSoC project ?

 I don't know very well this internals, but I am willing to learn and I
think the GSoC is a good opportunity.

 Any of you are willing to mentoring this project?


Hi all,


I written the proposal to this feature, so I would like to know if someone
can review.

proposal

** Add to PostgreSQL the capacity to making an Unlogged table Logged **

Introduction

This project will allow changing an unlogged table (one that doesn't create
transaction logs - WAL files) and its dependencies to a logged table, in
other words, a regular table that creates WAL files. To make this happen
we'll introduce new SQL syntax:

ALTER TABLE name SET LOGGED;


Benefits to the PostgreSQL Community

The unlogged tables feature was introduced in version 9.1 and provides
better write performance than regular (logged) tables, but unlogged tables
are not crash-safe: their contents are automatically discarded (truncated)
after a server crash, and their contents do not propagate to replication
slaves either.
The ability to turn an unlogged table into a logged one gives us the best
of both features; in other words, we can use an unlogged table to run a
bulk load of thousands of rows (ETL scripts) and get better performance,
and then change it to a logged table to get durability for the loaded
data.


Deliverables

This project will be split into two deliverables:
1) Allow changing an unlogged table to logged when wal_level = minimal
(without propagating its contents to replication slaves)
2) Allow changing an unlogged table to logged when wal_level != minimal
(propagating its contents to replication slaves)


Project Schedule

until May 19:
* create a website for the project (wiki.postgresql.org)
* create a public repository for the project (github.com/fabriziomello)
* read what has already been discussed by the community about the project (
http://wiki.postgresql.org/wiki/Todo)
* discuss with the community the best design for the feature
* learn about some PostgreSQL internals:
  . physical storage for relations (src/backend/catalog/storage.c)
  . transaction system (src/backend/access/transam/xact.c)
  . two-phase commit (src/backend/access/transam/twophase.c)
  . table commands (src/backend/commands/tablecmds.c)
  . grammar (src/backend/parser/gram.y)

May 19 - June 23
* evaluate with the mentor and community whether the already submitted
patch (
http://www.postgresql.org/message-id/263033.9223...@web29013.mail.ird.yahoo.com
) is a good starting point
* implementation of the first deliverable:
  . change the grammar of PostgreSQL to support ALTER TABLE ... SET LOGGED
  . implement the routines to change an unlogged table to logged when
wal_level = minimal
* write documentation and the test cases
* submit this first deliverable to the commitfest 2014/06 (
https://commitfest.postgresql.org/action/commitfest_view?id=22)

June 23 - June 27
* review with the Mentor of the work done until now

June 27 - August 18
* implementation of the second deliverable (wal_level != minimal)
* write documentation and the test cases
* submit this second deliverable to commitfest 2014/09 (webpage not
created yet)

August 18 - August 22
* final review with the Mentor of all work done.


About the proponent

Fabrízio de Royes Mello
e-mail: fabriziome...@gmail.com
twitter: @fabriziomello
github: http://github.com/fabriziomello
linkedin: http://linkedin.com/in/fabriziomello

Currently I help people and teams get the full potential out of relational
databases, especially PostgreSQL: helping teams design the database
structure (modeling), build the physical architecture (database schema),
do programming (procedural languages) and SQL (usage, tuning, best
practices), and optimize and orchestrate instances in production. I do
volunteer work for the Brazilian PostgreSQL Community (www.postgresql.org.br),
supporting mailing 

Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-04 Thread Robert Haas
On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  Is the TODO item make an unlogged table logged [1] a good GSoC project?

 I'm pretty sure we found some problems in that design that we couldn't
 figure out how to solve.  I don't have a pointer to the relevant
 -hackers discussion off-hand, but I think there was one.

 ISTR the discussion going something along the lines of we'd have to WAL
 log the entire table to do that, and if we have to do that, what's the
 point?.

No, not really.  The issue is more around what happens if we crash
part way through.  At crash recovery time, the system catalogs are not
available, because the database isn't consistent yet and, anyway, the
startup process can't be bound to a database, let alone every database
that might contain unlogged tables.  So the sentinel that's used to
decide whether to flush the contents of a table or index is the
presence or absence of an _init fork, which the startup process
obviously can see just fine.  The _init fork also tells us what to
stick in the relation when we reset it; for a table, we can just reset
to an empty file, but that's not legal for indexes, so the _init fork
contains a pre-initialized empty index that we can just copy over.

Now, to make an unlogged table logged, you've got to at some stage
remove those _init forks.  But this is not a transactional operation.
If you remove the _init forks and then the transaction rolls back,
you've left the system in an inconsistent state.  If you postpone the
removal until commit time, then you have a problem if it fails,
particularly if it works for the first file but fails for the second.
And if you crash at any point before you've fsync'd the containing
directory, you have no idea which files will still be on disk after a
hard reboot.
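
For illustration, the forks in question are easy to see from SQL (the
table name below is made up); an unlogged table's init fork sits next to
its main fork with an "_init" suffix:

CREATE UNLOGGED TABLE t_unlogged (id int);
SELECT pg_relation_filepath('t_unlogged');
-- e.g. base/16384/16385, with base/16384/16385_init alongside it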

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-04 Thread Andres Freund
On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
 On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost sfr...@snowman.net wrote:
  * Robert Haas (robertmh...@gmail.com) wrote:
  On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
   Is the TODO item make an unlogged table logged [1] a good GSoC project?
 
  I'm pretty sure we found some problems in that design that we couldn't
  figure out how to solve.  I don't have a pointer to the relevant
  -hackers discussion off-hand, but I think there was one.
 
  ISTR the discussion going something along the lines of we'd have to WAL
  log the entire table to do that, and if we have to do that, what's the
  point?.
 
 No, not really.  The issue is more around what happens if we crash
 part way through.  At crash recovery time, the system catalogs are not
 available, because the database isn't consistent yet and, anyway, the
 startup process can't be bound to a database, let alone every database
 that might contain unlogged tables.  So the sentinel that's used to
 decide whether to flush the contents of a table or index is the
 presence or absence of an _init fork, which the startup process
 obviously can see just fine.  The _init fork also tells us what to
 stick in the relation when we reset it; for a table, we can just reset
 to an empty file, but that's not legal for indexes, so the _init fork
 contains a pre-initialized empty index that we can just copy over.
 
 Now, to make an unlogged table logged, you've got to at some stage
 remove those _init forks.  But this is not a transactional operation.
 If you remove the _init forks and then the transaction rolls back,
 you've left the system an inconsistent state.  If you postpone the
 removal until commit time, then you have a problem if it fails,
 particularly if it works for the first file but fails for the second.
 And if you crash at any point before you've fsync'd the containing
 directory, you have no idea which files will still be on disk after a
 hard reboot.

Can't that be solved by just creating the permanent relation in a new
relfilenode? That's equivalent to a rewrite, yes, but we need to do that
for anything but wal_level=minimal anyway.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-04 Thread Fabrízio de Royes Mello
On Tue, Mar 4, 2014 at 3:31 AM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-03-04 01:10:50 -0300, Fabrízio de Royes Mello wrote:
  Today I do something like that:
 
  1) create unlogged table tmp_foo ...
  2) populate 'tmp_foo' table (ETL scripts or whatever)
  3) start transaction
  4) lock table tmp_foo in access exclusive mode
  5) update pg_class set relpersistence = 'p' where oid =
'tmp_foo':regclass
  6) drop table foo; -- the old foo table
  7) alter table tmp_foo rename to foo;
  8) end transaction
  9) run pg_repack in table 'foo'
 
  I know it's very ugly, but works... and works for standbys too... :-)

 No, it doesn't work. It just may happen to not fail loudly/visibly in
 some cases. You're absolutely risking corruption of this *and* other
 relations when doing so.


Well, this has already worked for some time, but you are correct, there is
a risk of corruption!

But in my case, if everything runs without interruption, the relation is
switched to logged. I do some checks before and after, and if something
goes wrong with this process we clean everything up and start from the
beginning.

Maybe I should run CLUSTER inside the transaction block after updating
pg_class, and execute the DROP and RENAME afterwards, in a second phase.
Maybe that way is safer. Is it?

If a crash occurs and PostgreSQL restarts, I check whether the unlogged
table 'tmp_foo' still exists and then drop it.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-04 Thread Fabrízio de Royes Mello
On Tue, Mar 4, 2014 at 11:50 AM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
  On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost sfr...@snowman.net
wrote:
   * Robert Haas (robertmh...@gmail.com) wrote:
   On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
   fabriziome...@gmail.com wrote:
Is the TODO item make an unlogged table logged [1] a good GSoC
project?
  
   I'm pretty sure we found some problems in that design that we
couldn't
   figure out how to solve.  I don't have a pointer to the relevant
   -hackers discussion off-hand, but I think there was one.
  
   ISTR the discussion going something along the lines of we'd have to
WAL
   log the entire table to do that, and if we have to do that, what's the
   point?.
 
  No, not really.  The issue is more around what happens if we crash
  part way through.  At crash recovery time, the system catalogs are not
  available, because the database isn't consistent yet and, anyway, the
  startup process can't be bound to a database, let alone every database
  that might contain unlogged tables.  So the sentinel that's used to
  decide whether to flush the contents of a table or index is the
  presence or absence of an _init fork, which the startup process
  obviously can see just fine.  The _init fork also tells us what to
  stick in the relation when we reset it; for a table, we can just reset
  to an empty file, but that's not legal for indexes, so the _init fork
  contains a pre-initialized empty index that we can just copy over.
 
  Now, to make an unlogged table logged, you've got to at some stage
  remove those _init forks.  But this is not a transactional operation.
  If you remove the _init forks and then the transaction rolls back,
  you've left the system an inconsistent state.  If you postpone the
  removal until commit time, then you have a problem if it fails,
  particularly if it works for the first file but fails for the second.
  And if you crash at any point before you've fsync'd the containing
  directory, you have no idea which files will still be on disk after a
  hard reboot.

 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.


Did you see this initial patch [1] from Leonardo Francalanci?


Regards,

[1]
http://www.postgresql.org/message-id/263033.9223...@web29013.mail.ird.yahoo.com

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-04 Thread Robert Haas
On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
 On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost sfr...@snowman.net wrote:
  * Robert Haas (robertmh...@gmail.com) wrote:
  On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
   Is the TODO item make an unlogged table logged [1] a good GSoC 
   project?
 
  I'm pretty sure we found some problems in that design that we couldn't
  figure out how to solve.  I don't have a pointer to the relevant
  -hackers discussion off-hand, but I think there was one.
 
  ISTR the discussion going something along the lines of we'd have to WAL
  log the entire table to do that, and if we have to do that, what's the
  point?.

 No, not really.  The issue is more around what happens if we crash
 part way through.  At crash recovery time, the system catalogs are not
 available, because the database isn't consistent yet and, anyway, the
 startup process can't be bound to a database, let alone every database
 that might contain unlogged tables.  So the sentinel that's used to
 decide whether to flush the contents of a table or index is the
 presence or absence of an _init fork, which the startup process
 obviously can see just fine.  The _init fork also tells us what to
 stick in the relation when we reset it; for a table, we can just reset
 to an empty file, but that's not legal for indexes, so the _init fork
 contains a pre-initialized empty index that we can just copy over.

 Now, to make an unlogged table logged, you've got to at some stage
 remove those _init forks.  But this is not a transactional operation.
 If you remove the _init forks and then the transaction rolls back,
 you've left the system an inconsistent state.  If you postpone the
 removal until commit time, then you have a problem if it fails,
 particularly if it works for the first file but fails for the second.
 And if you crash at any point before you've fsync'd the containing
 directory, you have no idea which files will still be on disk after a
 hard reboot.

 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.

Yes, that would work.  I've tended to view optimizing away the
relfilenode copy as an indispensable part of this work, but that might
be wrongheaded.  It would certainly be a lot easier to make this
happen if we didn't insist on that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-04 Thread Andres Freund
On 2014-03-04 12:54:02 -0500, Robert Haas wrote:
 On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
  Can't that be solved by just creating the permanent relation in a new
  relfilenode? That's equivalent to a rewrite, yes, but we need to do that
  for anything but wal_level=minimal anyway.
 
 Yes, that would work.  I've tended to view optimizing away the
 relfilenode copy as an indispensable part of this work, but that might
 be wrongheaded.  It would certainly be a lot easier to make this
 happen if we didn't insist on that.

I think it'd already much better than today's situation, and it's a
required codepath for wal_level  logical anyway. So even if somebody
wants to make this work without the full copy for minimal, it'd still be
a required codepath. So I am perfectly ok with a patch just adding that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-04 Thread Fabrízio de Royes Mello
On Tue, Mar 4, 2014 at 3:29 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-03-04 12:54:02 -0500, Robert Haas wrote:
  On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com
wrote:
   On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
   Can't that be solved by just creating the permanent relation in a new
   relfilenode? That's equivalent to a rewrite, yes, but we need to do
that
   for anything but wal_level=minimal anyway.
 
  Yes, that would work.  I've tended to view optimizing away the
  relfilenode copy as an indispensable part of this work, but that might
  be wrongheaded.  It would certainly be a lot easier to make this
  happen if we didn't insist on that.

 I think it'd already much better than today's situation, and it's a
 required codepath for wal_level  logical anyway. So even if somebody
 wants to make this work without the full copy for minimal, it'd still be
 a required codepath. So I am perfectly ok with a patch just adding that.


Then, is this a good idea for a GSoC project?

I don't know these internals very well, but I am willing to learn, and I
think GSoC is a good opportunity.

Are any of you willing to mentor this project?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal

2014-03-03 Thread Albe Laurenz
 I'm applying for GSoC 2014 with Postgresql and would appreciate your comments 
 on my proposal
 (attached). I'm looking for technical corrections/comments and your opinions 
 on the project's
 viability. In particular, if the community has doubts about its usefulness, I 
 would start working on
 an extra proposal from https://wiki.postgresql.org/wiki/GSoC_2014, perhaps on 
 the RETURNING clause as
 a student named Karlik did last year.

I am sure that Simon had his reasons when he proposed
http://www.postgresql.org/message-id/CA+U5nMJGgJNt5VXqkR=crtdqxfmuyzwef23-fd5nusns+6n...@mail.gmail.com
but I cannot help asking some questions:

1) Why limit the feature to UTF8 strings?
   Shouldn't the technique work for all multibyte server encodings?

2) There is probably something that makes this necessary, but why should the
   decision about how TOAST is sliced be attached to the data type?
   My (probably naive) idea would be to add a new TOAST strategy (e.g. SLICED)
   to PLAIN, MAIN, EXTERNAL and EXTENDED.

The feature only makes sense for string data types, right?

Yours,
Laurenz Albe   

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Fabrízio de Royes Mello
Hi all,

Is the TODO item make an unlogged table logged [1] a good GSoC project?

Regards,

[1]
http://www.postgresql.org/message-id/aanlktinenzbrxdcwohkqbba2bhubfy8_c5jwrxlod...@mail.gmail.com

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Robert Haas
On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 Is the TODO item make an unlogged table logged [1] a good GSoC project?

I'm pretty sure we found some problems in that design that we couldn't
figure out how to solve.  I don't have a pointer to the relevant
-hackers discussion off-hand, but I think there was one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  Is the TODO item make an unlogged table logged [1] a good GSoC project?
 
 I'm pretty sure we found some problems in that design that we couldn't
 figure out how to solve.  I don't have a pointer to the relevant
 -hackers discussion off-hand, but I think there was one.

ISTR the discussion going something along the lines of we'd have to WAL
log the entire table to do that, and if we have to do that, what's the
point?.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
 Is the TODO item make an unlogged table logged [1] a good GSoC project?

 I'm pretty sure we found some problems in that design that we couldn't
 figure out how to solve.  I don't have a pointer to the relevant
 -hackers discussion off-hand, but I think there was one.

 ISTR the discussion going something along the lines of we'd have to WAL
 log the entire table to do that, and if we have to do that, what's the
 point?.

IIRC, the reason you'd have to do that is to make the table contents
appear on slave servers.  If you don't consider replication then it might
seem easier.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Hannu Krosing
On 03/03/2014 05:22 PM, Tom Lane wrote:
 Stephen Frost sfr...@snowman.net writes:
...
 ISTR the discussion going something along the lines of we'd have to WAL
 log the entire table to do that, and if we have to do that, what's the
 point?.
 IIRC, the reason you'd have to do that is to make the table contents
 appear on slave servers.  If you don't consider replication then it might
 seem easier.
So switch on logging and then perform CLUSTER/VACUUM FULL?

Should this work, or is something extra needed?

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Andres Freund
On 2014-03-03 12:08:26 -0500, Stephen Frost wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
  On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
   Is the TODO item make an unlogged table logged [1] a good GSoC project?
  
  I'm pretty sure we found some problems in that design that we couldn't
  figure out how to solve.  I don't have a pointer to the relevant
  -hackers discussion off-hand, but I think there was one.
 
 ISTR the discussion going something along the lines of we'd have to WAL
 log the entire table to do that, and if we have to do that, what's the
 point?.

I don't see that as a particularly problematic problem. The primary
reason to want to convert an unlogged table to a logged one probably is
that it's easier to do so than to recreate the table + dependencies. Also
the overhead of logging full pages will be noticeably smaller than the
overhead of adding all rows individually, even if using
heap_multi_insert().

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Peter Geoghegan
On Mon, Mar 3, 2014 at 8:28 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 Is the TODO item make an unlogged table logged [1] a good GSoC project?

Another interesting project around unlogged tables would be to make it
possible to have unlogged indexes on fully-logged tables. That is
something that there was some discussion of before, that might be
easier.

FWIW, I don't think that TODO page is a very good resource for finding
a starter project. Picking a good project is a skill in and of itself.
A lot of that stuff is aspirational, either because it's difficult,
or, more often, because it's difficult relative to the rewards, which
can be quite low. To be honest, if I have what I imagine to be a great
idea for a project, I don't put it on that page. Maybe I should, but I
don't, and I don't think that is uncommon. This is not because I'm
particularly guarded about sharing the information.

Why do you think that hash indexes still aren't WAL-logged after all
these years (a project that someone made noise about recently in
relation to GSoC), even though that's generally considered to be a
SMOP?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Andres Freund
On 2014-03-03 12:44:26 -0800, Peter Geoghegan wrote:
 On Mon, Mar 3, 2014 at 8:28 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  Is the TODO item make an unlogged table logged [1] a good GSoC project?
 
 Another interesting project around unlogged tables would be to make it
 possible to have unlogged indexes on fully-logged tables. That is
 something that there was some discussion of before, that might be
 easier.

I'd actually say it's harder because it requires modifying the catalog
or transparently introducing hacks similar to what unlogged matviews are
doing, to make sure the index is marked invalid after a crash restart.
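
Invalid indexes are already tracked in the catalog today; an index left
behind by a failed CREATE INDEX CONCURRENTLY, for example, shows up with:

SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid;

The hard part is getting that flag set reliably after a crash restart.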

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Fabrízio de Royes Mello
On Mon, Mar 3, 2014 at 2:40 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

 On 03/03/2014 05:22 PM, Tom Lane wrote:
  Stephen Frost sfr...@snowman.net writes:
 ...
  ISTR the discussion going something along the lines of we'd have to
WAL
  log the entire table to do that, and if we have to do that, what's the
  point?.
  IIRC, the reason you'd have to do that is to make the table contents
  appear on slave servers.  If you don't consider replication then it
might
  seem easier.
 So switch on logging and then perform CLUSTER/VACUUM FULL ?

 Should this work, or is something extra needed ?


Today I do something like this:

1) create unlogged table tmp_foo ...
2) populate 'tmp_foo' table (ETL scripts or whatever)
3) start transaction
4) lock table tmp_foo in access exclusive mode
5) update pg_class set relpersistence = 'p' where oid = 'tmp_foo'::regclass
6) drop table foo; -- the old foo table
7) alter table tmp_foo rename to foo;
8) end transaction
9) run pg_repack on table 'foo'

I know it's very ugly, but it works... and it works for standbys too... :-)

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Fabrízio de Royes Mello
On Mon, Mar 3, 2014 at 5:44 PM, Peter Geoghegan p...@heroku.com wrote:

 On Mon, Mar 3, 2014 at 8:28 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  Is the TODO item make an unlogged table logged [1] a good GSoC
project?

 Another interesting project around unlogged tables would be to make it
 possible to have unlogged indexes on fully-logged tables. That is
 something that there was some discussion of before, that might be
 easier.


Easier than making an unlogged table logged?


 FWIW, I don't think that TODO page is a very good resource for finding
 a starter project. Picking a good project is a skill in and of itself.
 A lot of that stuff is aspirational, either because it's difficult,
 or, more often, because it's difficult relative to the rewards, which
 can be quite low. To be honest, if I have what I imagine to be a great
 idea for a project, I don't put it on that page. Maybe I should, but I
 don't, and I don't think that is uncommon. This is not because I'm
 particularly guarded about sharing the information.


Share your ideas, please!

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Fabrízio de Royes Mello
On Mon, Mar 3, 2014 at 5:47 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-03-03 12:44:26 -0800, Peter Geoghegan wrote:
  On Mon, Mar 3, 2014 at 8:28 AM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
   Is the TODO item make an unlogged table logged [1] a good GSoC
project?
 
  Another interesting project around unlogged tables would be to make it
  possible to have unlogged indexes on fully-logged tables. That is
  something that there was some discussion of before, that might be
  easier.

 I'd actually say it's harder because it requires modifying the catalog
 or transparently introducing hacks similar to what unlogged matviews are
 doing, to make sure the index is marked invalid after a crash restart.


Unlogged matviews were disallowed [1].

[1]
http://www.postgresql.org/message-id/e1uznrd-0008eh...@gemulon.postgresql.org

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Fabrízio de Royes Mello
On Mon, Mar 3, 2014 at 2:42 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-03-03 12:08:26 -0500, Stephen Frost wrote:
  * Robert Haas (robertmh...@gmail.com) wrote:
   On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
   fabriziome...@gmail.com wrote:
Is the TODO item make an unlogged table logged [1] a good GSoC
project?
  
   I'm pretty sure we found some problems in that design that we couldn't
   figure out how to solve.  I don't have a pointer to the relevant
   -hackers discussion off-hand, but I think there was one.
 
  ISTR the discussion going something along the lines of we'd have to WAL
  log the entire table to do that, and if we have to do that, what's the
  point?.

 I don't see that as a particularly problematic problem. The primary
 reason to want to convert a unlogged to a logged table probably is that
 it's easier to do so than to recreate the table + dependencies. Also the
 overhead of logging full pages will be noticeably smaller than the
 overhead of adding all rows individually, even if using
 heap_multi_insert().


Do you know of something similar in the source code?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Andres Freund
On 2014-03-04 01:10:50 -0300, Fabrízio de Royes Mello wrote:
 Today I do something like that:
 
 1) create unlogged table tmp_foo ...
 2) populate 'tmp_foo' table (ETL scripts or whatever)
 3) start transaction
 4) lock table tmp_foo in access exclusive mode
 5) update pg_class set relpersistence = 'p' where oid = 'tmp_foo':regclass
 6) drop table foo; -- the old foo table
 7) alter table tmp_foo rename to foo;
 8) end transaction
 9) run pg_repack in table 'foo'
 
 I know it's very ugly, but works... and works for standbys too... :-)

No, it doesn't work. It just may happen to not fail loudly/visibly in
some cases. You're absolutely risking corruption of this *and* other
relations when doing so.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal

2014-02-28 Thread Florian Pflug
On Feb28, 2014, at 05:29 , Tan Tran tankimt...@gmail.com wrote:
 I'm applying for GSoC 2014 with Postgresql and would appreciate your comments
 on my proposal (attached).
 pg_gsoc2014_TanTran.pdf

First, please include your proposal as plain, inline text next time.
That makes it easier to quote the relevant parts when replying, and
also allows your mail to be indexed correctly by the mailing list
archive.

Regarding your proposal, I think you need to explain what exactly it
is you want to achieve in more detail.

 In particular, text and bytea are EXTERNAL by default, so that substring
 operations can seek straight to the exact slice (which is O(1)) instead
 of de-toasting the whole datum (which is O(file size)). Specifically,
 varlena.c’s text_substring(...) and bytea_substring(...) call
 DatumGetTextPSlice(...), which retrieves only the slice(s) at an
 easily-computed offset.
 
 ...
 
 1. First, I will optimize array element retrieval and UTF-8 substring
 retrieval. Both are straightforward, as they involve calculating slice
 numbers and using similar code to above.

I'm confused by that - text_substring *already* attempts to only fetch
the relevant slice in the case of UTF-8. It can't do so precisely - it
needs to use a conservative estimate - but I fail to see how that can
be avoided. Since UTF-8 maps a character to anything from 1 to 6 bytes,
you can't compute the byte offset of a given character index precisely.
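
As a concrete illustration of the user-visible facility under discussion
(table and column names made up): with EXTERNAL storage a substring can be
served from a few TOAST slices instead of detoasting the whole value.

CREATE TABLE docs (id int, body text);
ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTERNAL;  -- out-of-line, uncompressed
SELECT substring(body FROM 100000 FOR 50) FROM docs WHERE id = 1;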

You could store a constant number of *characters* per slice, instead of
a constant number of *bytes*, but due to the rather large worst-case of
6 bytes per character, that would increase the storage and access overhead
6 fold for languages which can largely be represented with 1 byte per
character. That's not going to go down well...

I haven't looked at how we currently handle arrays, but the problems
there are similar. For arrays containing variable-length types, you can't
compute the byte offset from the index. It's even worse than for varchar,
because the range of possible element lengths is much wider - one array
element might be only a few bytes long, while another may be 1kB or more...

 2. Second, I will implement a SPLITTER clause for the CREATE TYPE
 statement. As 1 proposes, one would define a type, for example:
   CREATE TYPE my_xml
 LIKE xml
 SPLITTER my_xml_splitter;

As far as I can tell, the idea is to allow a datatype to influence how
it's split into chunks for TOASTing so that functions can fetch only
the required slices more easily. To judge whether that is worthwhile or
not, you'd have to provide a concrete example of when such a facility
would be useful.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC Proposal - Caching query results in pgpool-II

2011-04-07 Thread Magnus Hagander
2011/4/7 Tatsuo Ishii is...@postgresql.org:
 In my understanding pqc is not designed to be working with pgpool.
 Thus if a user want to use both query cache and query dispatching,
 replication or failover etc. which are provided by pgpool, it seems
 it's not possible. For this purpose maybe user could *cascade* pqc and
 pgpool, but I'm not sure. Even if it's possible, it will bring huge
 performance penalty.

 Another point is cache invalidation. Masanori's proposal includes
 cache invalidation technique by looking at write queries, which is
 lacking in pqc in my understanding.

Probably. My question wasn't necessarily "hasn't this already been
done in pqc", but more "should this perhaps build on or integrate with pqc
in order not to duplicate effort". I think at the very least, any
overlap should be researched and identified - because if it can
integrate parts of pqc, or work with it, more effort can be spent on the
new parts rather than redoing something that's already been done.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC Proposal - Caching query results in pgpool-II

2011-04-07 Thread Selena Deckelmann
2011/4/5 Masanori Yamazaki myamazak...@gmail.com:
 Hello

  I am sending my proposal about Google Summer Of Code2011.
 It would be nice if you could give me your opinion.

Fantastic!  Please submit your proposal through the GSoC website:

http://www.google-melange.com/gsoc/profile/student/google/gsoc2011

-selena



-- 
http://chesnok.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC Proposal - Caching query results in pgpool-II

2011-04-06 Thread Magnus Hagander
How does this relate to the existing pqc project (
http://code.google.com/p/pqc/)? Seems the goals are fairly similar, and both
are based off pgpool?

/Magnus
 On Apr 6, 2011 2:10 AM, Masanori Yamazaki m.yamazak...@gmail.com wrote:
 Hello

 My name is Masanori Yamazaki. I am sending my proposal about
 Google Summer Of Code2011. It would be nice if you could give
 me your opinion.


 ・title

 Caching query results in pgpool-II


 ・Synopsis

 Pgpool-II has query caching functionality using storage provided by
 dedicated PostgreSQL (system database). This has several drawbacks
 however. 1)it is slow because it needs to access disk storage 2)it
 does not invalidate the cache automatically.

 This proposal tries to solve these problems.

 - To speed up the cache access, it will be placed on memory, rather
 than database. The memory will be put on shared memory or external
 memory services such as memcached so that the cache can be shared by
 multiple sessions. Old cache entries will be deleted by LRU manner.

 - The cache will be invalidated automatically upon the timing when the
 relevant tables are updated. Note that this is not always possible
 because the query result might come from multiple tables, views or
 even functions. In this case the cache will be invalidated by
 timeout(or they are not cached at all).

 - Fine tuning knobs need to be invented to control the cache behavior
 though they are not clear at this moment.


 ・Benefits to the PostgreSQL Community:


 Query caching will effectively enhance the performance of PostgreSQL
 and this project will contribute to increase the number of users of
 PostgreSQL, who need more high performance database systems.

 Note that implementing query cache in pgpool-II will bring merits not
 only to the latest version of PostgreSQL but to the previous releases
 of PostgreSQL.


 ・Project Schedule

 -April
 preparation

 -May 1 - May 22
 write a specification

 -May 23 - June 19
 coding

 -June 20 - July 22
 test

 -July 23 - August 12
 complete of coding and test, commit


 ・Personal Data and Biographical Information

 Name : Masanori Yamazaki
 Born : 23.1.1981
 School :Currently I learn contemporary philosophy, culture and literature
 at Waseda University in Japan.
 Coding :
 1.About five years job as web application programer(PHP, Java).
 2.I experienced projects used framework such as Symfony, Zend Framework,
 CakePHP, and Struts.
 3.I am interested in OSS and like coding.


 Regards


[HACKERS] GSoC Proposal - Caching query results in pgpool-II

2011-04-06 Thread Masanori Yamazaki
Hello

 I am sending my proposal about Google Summer Of Code2011.
It would be nice if you could give me your opinion.

・title

Caching query results in pgpool-II


・Synopsis

Pgpool-II has query caching functionality using storage provided by a
dedicated PostgreSQL instance (system database). This has several
drawbacks, however: 1) it is slow because it needs to access disk storage;
2) it does not invalidate the cache automatically.

This proposal tries to solve these problems.

- To speed up cache access, the cache will be placed in memory rather
  than in the database. The memory will be shared memory or an external
  memory service such as memcached, so that the cache can be shared by
  multiple sessions. Old cache entries will be deleted in an LRU manner.

- The cache will be invalidated automatically when the relevant tables
  are updated. Note that this is not always possible, because the query
  result might come from multiple tables, views or even functions. In
  that case the cache will be invalidated by timeout (or not cached at
  all).

- Fine-tuning knobs need to be invented to control the cache behavior,
  though they are not clear at this moment.


・Benefits to the PostgreSQL Community:


Query caching will effectively enhance the performance of PostgreSQL,
and this project will contribute to increasing the number of PostgreSQL
users who need higher-performance database systems.

Note that implementing a query cache in pgpool-II will bring benefits not
only to the latest version of PostgreSQL but also to previous releases
of PostgreSQL.


・Project Schedule

-April
 preparation

-May 1 - May 22
 write a specification

-May 23 - June 19
 coding

-June 20 - July 22
 test

-July 23 - August 12
 Completion of coding and testing, commit


・Personal Data and Biographical Information

 Name : Masanori Yamazaki
 Born : 23.1.1981
 School : I am currently studying contemporary philosophy, culture and
literature at Waseda University in Japan.
 Coding :
 1. About five years working as a web application programmer (PHP, Java).
 2. I have worked on projects using frameworks such as Symfony, Zend
Framework, CakePHP, and Struts.
 3. I am interested in OSS and like coding.


Regards


Re: [HACKERS] GSoC Proposal - Caching query results in pgpool-II

2011-04-06 Thread Tatsuo Ishii
I like this proposal. It would bring big benefits to both the
PostgreSQL and pgpool projects.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

 Hello
 
  I am sending my proposal about Google Summer Of Code2011.
 It would be nice if you could give me your opinion.
 
 ・title
 
 Caching query results in pgpool-II
 
 
 ・Synopsis
 
 Pgpool-II has query caching functionality using storage provided by
 dedicated PostgreSQL (system database). This has several drawbacks
 however. 1)it is slow because it needs to access disk storage 2)it
 does not invalidate the cache automatically.
 
 This proposal tries to solve these problems.
 
 - To speed up the cache access, it will be placed on memory, rather
   than database. The memory will be put on shared memory or external
   memory services such as memcached so that the cache can be shared by
   multiple sessions. Old cache entries will be deleted by LRU manner.
 
 - The cache will be invalidated automatically upon the timing when the
   relevant tables are updated. Note that this is not always possible
   because the query result might come from multiple tables, views or
   even functions. In this case the cache will be invalidated by
   timeout(or they are not cached at all).
 
 - Fine tuning knobs need to be invented to control the cache behavior
   though they are not clear at this moment.
 
 
 ・Benefits to the PostgreSQL Community:
 
 
 Query caching will effectively enhance the performance of PostgreSQL
 and this project will contribute to increase the number of users of
 PostgreSQL, who need more high performance database systems.
 
 Note that implementing query cache in pgpool-II will bring merits not
 only to the latest version of PostgreSQL but to the previous releases
 of PostgreSQL.
 
 
 ・Project Schedule
 
 -April
  preparation
 
 -May 1 - May 22
  write a specification
 
 -May 23 - June 19
  coding
 
 -June 20 - July 22
  test
 
 -July 23 - August 12
  Complete of coding and test, commit
 
 
 ・Personal Data and Biographical Information
 
  Name : Masanori Yamazaki
  Born : 23.1.1981
  School :Currently I learn contemporary philosophy, culture and literature
 at Waseda University in Japan.
  Coding :
  1.About five years job as web application programer(PHP, Java).
  2.I experienced projects used framework such as Symfony, Zend Framework,
 CakePHP, and Struts.
  3.I am interested in OSS and like coding.
 
 
 Regards

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC Proposal - Caching query results in pgpool-II

2011-04-06 Thread Tatsuo Ishii
In my understanding pqc is not designed to work with pgpool.
Thus if a user wants to use both the query cache and the query dispatching,
replication or failover etc. which are provided by pgpool, it seems
it's not possible. For this purpose maybe the user could *cascade* pqc and
pgpool, but I'm not sure. Even if it's possible, it will bring a huge
performance penalty.

Another point is cache invalidation. Masanori's proposal includes a
cache invalidation technique based on looking at write queries, which is
lacking in pqc in my understanding.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

 How does this relate to the existing pqc project (
 http://code.google.com/p/pqc/)? Seems the goals are fairly similar, and both
 are based off pgpool?
 
 /Magnus
  On Apr 6, 2011 2:10 AM, Masanori Yamazaki m.yamazak...@gmail.com wrote:
 Hello

 My name is Masanori Yamazaki. I am sending my proposal about
 Google Summer Of Code2011. It would be nice if you could give
 me your opinion.


 ・title

 Caching query results in pgpool-II


 ・Synopsis

 Pgpool-II has query caching functionality using storage provided by
 dedicated PostgreSQL (system database). This has several drawbacks
 however. 1)it is slow because it needs to access disk storage 2)it
 does not invalidate the cache automatically.

 This proposal tries to solve these problems.

 - To speed up the cache access, it will be placed on memory, rather
 than database. The memory will be put on shared memory or external
 memory services such as memcached so that the cache can be shared by
 multiple sessions. Old cache entries will be deleted by LRU manner.

 - The cache will be invalidated automatically upon the timing when the
 relevant tables are updated. Note that this is not always possible
 because the query result might come from multiple tables, views or
 even functions. In this case the cache will be invalidated by
 timeout(or they are not cached at all).

 - Fine tuning knobs need to be invented to control the cache behavior
 though they are not clear at this moment.


 ・Benefits to the PostgreSQL Community:


 Query caching will effectively enhance the performance of PostgreSQL
 and this project will contribute to increase the number of users of
 PostgreSQL, who need more high performance database systems.

 Note that implementing query cache in pgpool-II will bring merits not
 only to the latest version of PostgreSQL but to the previous releases
 of PostgreSQL.


 ・Project Schedule

 -April
 preparation

 -May 1 - May 22
 write a specification

 -May 23 - June 19
 coding

 -June 20 - July 22
 test

 -July 23 - August 12
 complete of coding and test, commit


 ・Personal Data and Biographical Information

 Name : Masanori Yamazaki
 Born : 23.1.1981
 School :Currently I learn contemporary philosophy, culture and literature
 at Waseda University in Japan.
 Coding :
 1.About five years job as web application programer(PHP, Java).
 2.I experienced projects used framework such as Symfony, Zend Framework,
 CakePHP, and Struts.
 3.I am interested in OSS and like coding.


 Regards

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal: Fast GiST index build

2011-04-05 Thread Alexander Korotkov
On Mon, Apr 4, 2011 at 8:50 PM, Robert Haas robertmh...@gmail.com wrote:

 OK.  Could you briefly describe the algorithm you propose to
 implement, bearing in mind that I haven't read the paper?


The technique can be very briefly described by the following rules.
M = number of index keys fitting in RAM;
B = number of index keys in one page;
1) Additional buffers of M/(2*B) pages each are attached to all nodes of
certain levels. Levels are selected with a step of floor(log_B(M/4B)); leaf
nodes don't contain buffers. I.e. nodes at levels i*floor(log_B(M/4B)),
i = 1,2,3,... contain buffers (numbering goes from bottom to top; level 0
contains the leaf nodes).
2) When an entry reaches a node with a buffer, it is placed into the buffer.
3) When a buffer overflows, it is emptied down into lower buffers or leaf
pages.
4) When a split occurs in a node with a buffer, this buffer splits into two
buffers using the penalty function.


With best regards,
Alexander Korotkov.


[HACKERS] GSoC Proposal - Caching query results in pgpool-II

2011-04-05 Thread Masanori Yamazaki
Hello

 My name is Masanori Yamazaki. I am sending my proposal about
Google Summer Of Code2011. It would be nice if you could give
me your opinion.


・title

Caching query results in pgpool-II


・Synopsis

Pgpool-II has query caching functionality using storage provided by a
dedicated PostgreSQL instance (system database). This has several
drawbacks, however: 1) it is slow because it needs to access disk storage;
2) it does not invalidate the cache automatically.

This proposal tries to solve these problems.

- To speed up cache access, the cache will be placed in memory rather
  than in the database. The memory will be shared memory or an external
  memory service such as memcached, so that the cache can be shared by
  multiple sessions. Old cache entries will be deleted in an LRU manner.

- The cache will be invalidated automatically when the relevant tables
  are updated. Note that this is not always possible, because the query
  result might come from multiple tables, views or even functions. In
  that case the cache will be invalidated by timeout (or not cached at
  all).

- Fine-tuning knobs need to be invented to control the cache behavior,
  though they are not clear at this moment.


・Benefits to the PostgreSQL Community:


Query caching will effectively enhance the performance of PostgreSQL,
and this project will contribute to increasing the number of PostgreSQL
users who need higher-performance database systems.

Note that implementing a query cache in pgpool-II will bring benefits not
only to the latest version of PostgreSQL but also to previous releases
of PostgreSQL.


・Project Schedule

-April
 preparation

-May 1 - May 22
 write a specification

-May 23 - June 19
 coding

-June 20 - July 22
 test

-July 23 - August 12
 completion of coding and testing, commit


・Personal Data and Biographical Information

 Name : Masanori Yamazaki
 Born : 23.1.1981
 School : I am currently studying contemporary philosophy, culture and
literature at Waseda University in Japan.
 Coding :
 1. About five years working as a web application programmer (PHP, Java).
 2. I have worked on projects using frameworks such as Symfony, Zend
Framework, CakePHP, and Struts.
 3. I am interested in OSS and like coding.


Regards


Re: [HACKERS] GSoC proposal: Fast GiST index build

2011-04-05 Thread Oleg Bartunov

Just to clarify the situation a bit. I noticed the buffer tree technique while
researching sp-gist and got the idea to use it to improve CREATE INDEX for GiST,
which is something we have wanted many times. Alexander is working on his thesis
and this project suits him and the community ideally. Since I and Teodor are very
busy at the moment, it's very important to have one more GiST developer
available, especially keeping in mind the energy and motivation of Alexander.
He has already made several contributions and I have no doubt his work will be
useful for us. So, I suggest supporting his work!


Oleg

On Wed, 6 Apr 2011, Alexander Korotkov wrote:


On Mon, Apr 4, 2011 at 8:50 PM, Robert Haas robertmh...@gmail.com wrote:


OK.  Could you briefly describe the algorithm you propose to
implement, bearing in mind that I haven't read the paper?



The technique can be very briefly described by the following rules.
M = number of index keys fitting in RAM;
B = number of index keys in one page;
1) Additional buffers of M/(2*B) pages each are attached to all nodes of some
levels. Levels are selected with step floor(log(M/4B, B)); leaf nodes don't
contain buffers. I.e. nodes at levels i*floor(log(M/4B, B)), i = 1,2,3,...
contain buffers (numbering goes from the bottom up, level 0 containing the
leaf nodes).
2) When an entry reaches a node with a buffer, it is placed into the buffer.
3) When a buffer overflows, its contents run down into lower buffers or leaf pages.
4) When a split occurs in a node with a buffer, that buffer is split into two
buffers using the penalty function.


With best regards,
Alexander Korotkov.



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GSoC proposal: Fast GiST index build

2011-04-04 Thread Alexander Korotkov
Hello!

Here is the text of my proposal, which I've recently submitted to GSoC and have
mentioned before in
http://archives.postgresql.org/message-id/AANLkTimqFRmFdrYaesnJB8H4BuJo3j1SBdR1qmv=k...@mail.gmail.com
Any comments are welcome.

*Project name*

Fast GiST index build

*Synopsis*

Currently GiST indexes don't have any bulk load functionality. A new index has
to be created by inserting entries one by one. This makes new index creation
relatively slow in comparison with btree and even GIN. There are various works
in computer science about bulk operations on R-trees. Since GiST is basically a
generalization of the R-tree, some of them seem to be applicable to GiST. In
[2], R-tree bulk operation techniques are presented. Despite some issues related
to GiST's differences from the R-tree, the techniques of this paper seem to be
applicable to GiST.

*Benefits to the PostgreSQL Community*

A faster GiST index build would be valuable for many PostgreSQL applications.
For example, some GIS systems suffer from index builds taking many hours or
even days.

*Quantifiable results*

Acceleration of GiST index build.

*Project Details*

Paper [2] presents R-tree bulk operation techniques which are, in general, an
application of the buffer tree [1] to the R-tree. The technique proposed in the
paper [2] can be very briefly described as follows. Additional buffers are
attached to nodes at some levels (levels are selected with some step). When an
entry falls into a node with a buffer, it is placed into the buffer. When a
buffer overflows, its contents run down into lower buffers or leaf pages.

The results of the paper [2] can have the following applications for PostgreSQL
GiST:

1) Fast GiST index creation. The proposed bulk insert technique can be most
straightforwardly applied to GiST index creation. Since the R-tree was
considered in the paper, the key length is fixed. In GiST we can deal with
varlena keys, which can lead to complications. For example, a page split can
occur while placing an index entry into the appropriate buffer. Another
difficulty with varlena keys is that the buffer size and the level step are
calculated from the number of entries per page. With varlena keys this number
is not constant. Since these calculations are used only to guarantee operation
performance, we can estimate the number of entries per page (using the average
key length of a sample of keys; a rough numeric sketch of this estimate appears
at the end of this section). The performance guarantees would then be weaker.
2) Bulk insert. Since the access method interface in PostgreSQL doesn't contain
a bulk insert function, bulk inserts are performed by inserting entries one by
one, and the access method doesn't know how many entries to expect. This
circumstance makes applying the bulk insert technique difficult for PostgreSQL
GiST. With the current access method interface, bulk insert could be
implemented using a technique like fastupdate in GIN. But note that such a
technique forces concurrent index searches to scan the buffers. Since the
buffer size is significant (the size of the lowest buffer is comparable with
the size of the whole subtree), it can cause a significant slowdown. Finding a
compromise between index build acceleration and concurrent search slowdown is a
separate research question, and it doesn't seem feasible to fit it into the
GSoC.
3) Bulk delete. The current bulk delete operation in PostgreSQL GiST is fast,
but it doesn't perform index restructuring, which can degrade the index during
bulk deletes. The bulk delete technique in the paper [2] focuses on
accelerating a bulk delete procedure that does perform index restructuring.
The problem with applying this approach to GiST is that the GiST
implementation doesn't provide any explicit guarantees about storage
utilization. If such guarantees exist, they are hidden in the picksplit access
method. Applying the bulk delete technique would require the GiST access
method to know about the storage utilization guarantees of the implementation.
In turn that requires a GiST operator class refactoring, and it doesn't seem
feasible to fit it into the GSoC.
Accordingly, in this GSoC project fast GiST index creation would be
implemented. The results of this implementation can help to find a way to
implement the additional capabilities noted above.
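As a rough numeric illustration of the entries-per-page estimate mentioned in
item 1 above (the constants are the usual defaults for an 8 kB PostgreSQL page:
a 24-byte page header, 4-byte line pointer and 8-byte index tuple header; the
average key length is a made-up sample value, and GiST special space and
alignment padding are glossed over):

    entries_per_page ~= (8192 - 24) / (4 + 8 + avg_key_len)
                     ~= 8168 / 28 ~= 290        with avg_key_len = 16 bytes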

*Inch-stones*

1) Solve architecture questions with the help of the community. In this step
I'm going to produce an approach to varlena key handling and discuss issues of
buffer handling.
2) First, approximate implementation. At this step the implementation might
not always work properly with varlena keys, and on some corner cases a
significant slowdown is possible.
3) Accurate implementation of varlena key handling and detailed testing of
corner cases. This step involves detailed consideration of the particular
corner-case test cases which might cause a dip in performance, and
implementing their solutions.
4) Final refactoring, documentation, testing and commit.

*Project Schedule*

until May 31

Solve architecture questions with the help of the community.

1 june - 30 june

First, approximate implementation.

1 july - 31 july

Implementation of 

Re: [HACKERS] GSoC proposal: Fast GiST index build

2011-04-04 Thread Robert Haas
On Mon, Apr 4, 2011 at 7:16 AM, Alexander Korotkov aekorot...@gmail.com wrote:
 Project name
 Fast GiST index build

Would/could/should this be implemented in a manner similar to the
existing GIN fast update feature?

It's occurred to me to wonder whether even btree indexes would benefit
from this type of optimization.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal: Fast GiST index build

2011-04-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Apr 4, 2011 at 7:16 AM, Alexander Korotkov aekorot...@gmail.com 
 wrote:
 Project name
 Fast GiST index build

 Would/could/should this be implemented in a manner similar to the
 existing GIN fast update feature?

Fast build and fast update tend to be two different problems ...

 It's occurred to me to wonder whether even btree indexes would benefit
 from this type of optimization.

GIN fast update is a win when you can optimize the insertion of multiple
occurrences of the same key.  There isn't really any corresponding
optimization possible in btree, AFAICS.  (Heikki did some work awhile
back on btrees with multiple TIDs per key, for low-cardinality tables,
which might conceivably admit of a similar optimization.  But I haven't
heard anything about that in a long time.  It wasn't real clear to me
where the win over GIN would be for that.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal: Fast GiST index build

2011-04-04 Thread Alexander Korotkov
On Mon, Apr 4, 2011 at 7:04 PM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Apr 4, 2011 at 7:16 AM, Alexander Korotkov aekorot...@gmail.com
 wrote:
  Project name
  Fast GiST index build

 Would/could/should this be implemented in a manner similar to the
 existing GIN fast update feature?


I've mentioned this problem in item #2 of the project details. In short, the
problem is with concurrent selects. The buffer size is significant, and scanning
the buffers in a concurrent select can cause a significant slowdown. Probably a
compromise can be achieved by using smaller buffers or something like that, but
that's a topic for separate research. It doesn't seem feasible for me to deliver
a production solution to this problem during GSoC.


With best regards,
Alexander Korotkov.


Re: [HACKERS] GSoC proposal: Fast GiST index build

2011-04-04 Thread Robert Haas
On Mon, Apr 4, 2011 at 12:46 PM, Alexander Korotkov
aekorot...@gmail.com wrote:
 On Mon, Apr 4, 2011 at 7:04 PM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Apr 4, 2011 at 7:16 AM, Alexander Korotkov aekorot...@gmail.com
 wrote:
  Project name
  Fast GiST index build

 Would/could/should this be implemented in a manner similar to the
 existing GIN fast update feature?

 I've mentioned this problem in item #2 of the project details. In short, the
 problem is with concurrent selects. The buffer size is significant, and scanning
 the buffers in a concurrent select can cause a significant slowdown. Probably a
 compromise can be achieved by using smaller buffers or something like that, but
 that's a topic for separate research. It doesn't seem feasible for me to deliver
 a production solution to this problem during GSoC.

OK.  Could you briefly describe the algorithm you propose to
implement, bearing in mind that I haven't read the paper?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-21 Thread Robert Haas
2010/4/20 Pavel baro...@seznam.cz:
 For now I know it is not committable in its current state, but for my thesis it
 is enough, and I know it will not be committable with this design at all. In the
 case of GSoC it will depend on how much time I am able to spend on it, whether I
 will consider some other design.

I am not sure about this, but I would think we would not want to
accept the project unless you intend to try to make it committable.  I
haven't looked at your actual code to see how much work I think that
would take.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-20 Thread Pavel

Greg Smith wrote:


pavelbaros wrote:
I am also waiting for approval for my repository named 
materialized_view on git.postgresql.org, so I could publish 
completed parts. 


Presuming that you're going to wander there and get assigned what 
looks like an official repo name for this project is a 
bit...optimistic.  I would recommend that you publish to something 
like github instead (you can fork http://github.com/postgres/postgres 
), and if the work looks good enough that it gets picked up by the 
community maybe you migrate it onto the main site eventually.  
git.postgresql.org is really not setup to be general hosting space for 
everyone who has a PostgreSQL related project; almost every repo on 
there belongs to someone who has already been a steady project 
contributor for a number of years.


Yes, you're right, I'm kind of a newbie in this kind of project and 
especially in PostgreSQL. But I think it is the best way to get into 
PostgreSQL. When I chose my bachelor thesis I did not know I could 
participate in GSoC or try to make it committable. Anyway I will make a repo 
on github, so everybody can look at it, as soon as possible.


http://github.com/pbaros/postgres
 
(Switching to boilerplate mode for a paragraph...) You have picked a 
PostgreSQL feature that is dramatically more difficult than it appears 
to be, and I wouldn't expect you'll actually finish even a fraction of 
your goals in a summer of work.  You're at least in plentiful 
company--most students do the same.  As a rule, if you see a feature 
on our TODO list that looks really useful and fun to work on, it's 
only still there because people have tried multiple times to build it 
completely but not managed to do so because it's harder than it 
appears.  This is certainly the case with materialized views.


You've outlined a reasonable way to build a prototype that does a 
limited implementation here.  The issue is what it will take to extend 
that into being production quality for the real-world uses of 
materialized views.  How useful your prototype is depends on how well 
it implements a subset of that in a way that will get used by the 
final design.


The main hidden complexity in this particular project relates to 
handling view refreshes.  The non-obvious problem is that when the 
view updates, you need something like a SQL MERGE to really handle 
that in a robust way that doesn't conflict with concurrent access to 
queries against the materialized view.  And work on MERGE support is 
itself blocked behind the fact that PostgreSQL doesn't have a good way 
to lock access to a key value that doesn't exist yet--what other 
databases call key range locking.  See the notes for Add SQL-standard 
MERGE/REPLACE/UPSERT command at http://wiki.postgresql.org/wiki/Todo 
for more information.


You can work around that to build a prototype by grabbing a full table 
lock on the materialized view when updating it, but that's not a 
production quality solution.  Solving that little detail is actually 
more work than the entire project you've outlined.  Your suggested 
implementation--In function CloseIntoRel executor swap relfilenode's 
of temp table and original table and finally delete temp table--is 
where the full table lock is going to end up at.  The exact use cases 
that need materialized views cannot handle a CLUSTER-style table 
recreation each time that needs an exclusive lock to switchover, so 
that whole part of your design is going to be a prototype that doesn't 
work at all like what needs to get built to make this feature 
committable.  It's also not a reasonable assumption that you have 
enough disk space to hold a second copy of the MV in a production system.


For now I know it is not committable in its current state, but for my thesis 
it is enough, and I know it will not be committable with this design at all. 
In the case of GSoC it will depend on how much time I am able to spend on it, 
whether I will consider some other design.




Once there's a good way to merge updates, how to efficiently generate 
them against the sort of large data sets that need materialized 
views--so you just write out the updates rather than a whole new 
copy--is itself a large project with a significant quantity of 
academic research to absorb before starting.  Dan Colish at Portland 
State has been playing around with prototypes for the specific problem 
of finding a good algorithm for view refreshing that is compatible 
with PostgreSQL's execution model.  He's already recognized the table 
lock issue here and for the moment is ignoring that part.  I don't 
have a good feel yet for how long the targeted update code will take 
to mature, but based on what I do know I suspect that little detail is 
also a larger effort than the entire scope you're envisioning.  
There's a reason why the MIT Press compendium Materialized Views: 
Techniques, Implementations, and Applications is over 600 pages 
long--I hope you've already started digging through that material.


I would like 

Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-13 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 There are basically 2 major parts for materialized views:

 A) Planner: Getting the query planner to swap in the MatView for part of
 a query automatically for query plan portions which the MatView supports;

 B) Maintenance: maintaining the MatView data according to the programmed
 scheme (synch, asynch, periodic).

 I do not believe it is possible to do both of the above in one summer.
 Of the two, (A) would be more useful since it is possible to manually
 implement (B) using triggers, queues and cron jobs today.

A patch that implements only (A) will be DOA.  The reason is that the
planner can *never* swap in a MatView on its own authority, unless it
can prove that this does not change the semantics of the query.  Which
it obviously will be unable to do unless there's a fully transparent
continuous-update scheme in place.

So the correct approach is to work on (B) first.  When and if we get to
a sufficiently transparent update implementation, we can think about
changing the planner.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Pavel Stehule
2010/4/12 Robert Haas robertmh...@gmail.com:
 On Sun, Apr 11, 2010 at 5:24 AM, Greg Smith g...@2ndquadrant.com wrote:
 From the rest of your comments, I'm comfortable that you're in sync with the
 not necessarily obvious risky spots here I wanted to raise awareness of.
  It's unreasonable to expect we'll have exactly the same priorities  here,
 and I doubt it's useful to debate how I perceive the merit of various
 development subsets here compared to yourself.  I don't think it's really
 important whether anyone agrees with me or not about exactly the value of a
 full table lock implementation.  The main thing I'm concerned about is just
 that it's noted as a known risky part, one that could end up blocking the
 project's ability to commit even a subset of the proposed patch here.

 I think that one of the things that we need to get our hands around is
 how we're going to distinguish the snapshot flavor of materialized
 view from the continuous update flavor.  By definition, the latter
 will only ever be supportable for a fairly restricted subset of all
 possible queries, and I am assuming that we will not want to decide
 what the behavior is going to be based on the query but rather based
 on what the user specifies.  Anything else seems like it would have
 the potential for severe POLA violations.  So we need to think now
 about how we'll distinguish between the two flavors.  I imagine some
 sort of syntactic marker would be appropriate; not sure what.

 Reading this thread, I'm starting to grow concerned that some people
 may feel that manually refreshed materialized views are not even worth
 bothering with, because (the argument goes) you could just use some
 table and write a function that updates it.  There's probably some
 truth to that, but I guess my thought is that it would have some value
 as a convenience feature; and eventually we might optimize it to the
 point where it would make more sense to use the built-in feature
 rather than rolling your own.  However, if we're going to have
 complaints that manually refreshed materialized views suck and we
 should only ever support materialized views to the extent that we can
 make them automatically update on-the-fly, then let's have those
 complaints now before someone spends several months of their life on
 the project only to be told that we don't want it.  Let's be clear: I
 think it's useful, but, if other people disagree, we need to iron that
 out now.

 ...Robert

I think manually refreshed materialized views make sense. It is
similar to replication - there were replication solutions like Slony, but for
some people the integrated replication in 9.0 is more important. Moreover,
manually refreshed (periodically refreshed) views can share a lot of
infrastructure with dynamically maintained views. I am sure that
dynamic materialized views are a bad task for GSoC - too large,
too complex. Manually refreshed views are adequate for two months of work,
and they make sense.

Regards
Pavel Stehule


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Robert Haas
On Mon, Apr 12, 2010 at 2:16 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 I am sure so
 dynamical materialised views is bad task for GSoC - it is too large,
 too complex. Manually refreshed views is adequate to two months work
 and it has sense.

That is my feeling also - though I fear that even the simplest
possible implementation of this feature may be a stretch.  Anyway we
agree: keep it simple.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Josh Berkus
On 4/9/10 1:36 PM, pavelbaros wrote:
 2) change rewriter
 - usually, a view is a relation with a defined rule; when rewriting, the rule
 is fired and the relation (view) is replaced by the definition of the view. If
 a relation does not have a rule, the planner and executor treat it as a physical
 table (relation). In the case of a materialized view we want to rewrite the
 select statement only when we are refreshing the MV. In other cases the rewriter
 should skip rewriting and pick up the physical relation - excluding situations
 where other rewrite rules not related to the MV definition are specified.

This was done (although not completed) against PostgreSQL 7.1 by
students in Georgia, USA, I believe.  It might be worthwhile looking at
their work if I can find it (if nowhere else, it should be in the ACM).

There are basically 2 major parts for materialized views:

A) Planner: Getting the query planner to swap in the MatView for part of
a query automatically for query plan portions which the MatView supports;

B) Maintenance: maintaining the MatView data according to the programmed
scheme (synch, asynch, periodic).

I do not believe it is possible to do both of the above in one summer.
Of the two, (A) would be more useful since it is possible to manually
implement (B) using triggers, queues and cron jobs today.
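For context, a minimal sketch of the kind of hand-rolled setup (B) refers to -
an aggregate kept up to date by ordinary triggers. All names are invented, and
concurrency corner cases (e.g. two sessions inserting the first row for the
same product at once) are glossed over:

    CREATE TABLE sales (product_id int, amount numeric);
    CREATE TABLE sales_summary (product_id int PRIMARY KEY, total numeric NOT NULL);

    -- initial population of the hand-rolled "materialized view"
    INSERT INTO sales_summary
        SELECT product_id, sum(amount) FROM sales GROUP BY product_id;

    CREATE OR REPLACE FUNCTION sales_summary_sync() RETURNS trigger AS $$
    BEGIN
        IF TG_OP IN ('INSERT', 'UPDATE') THEN
            UPDATE sales_summary SET total = total + NEW.amount
             WHERE product_id = NEW.product_id;
            IF NOT FOUND THEN
                INSERT INTO sales_summary VALUES (NEW.product_id, NEW.amount);
            END IF;
        END IF;
        IF TG_OP IN ('DELETE', 'UPDATE') THEN
            UPDATE sales_summary SET total = total - OLD.amount
             WHERE product_id = OLD.product_id;
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER sales_summary_trg
        AFTER INSERT OR UPDATE OR DELETE ON sales
        FOR EACH ROW EXECUTE PROCEDURE sales_summary_sync();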


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Robert Haas
On Mon, Apr 12, 2010 at 1:50 PM, Josh Berkus j...@agliodbs.com wrote:
 On 4/9/10 1:36 PM, pavelbaros wrote:
 2) change rewriter
 - usually, view is relation with defined rule and when rewriting, rule
 is fired and relation (view) is replaced by definition of view. If
 relation do not have rule, planner and executor behave to it as physical
 table (relation). In case of materialized view we want to rewrite select
 statement only in case when we refreshing MV. In other cases rewriter
 should skip rewriting and pick up physical relation. Exclude situation
 when other rewrite rules which are not related to MV definition are
 specified.

 This was done (although not completed) against PostgreSQL 7.1 by
 students in Georgia, USA, I believe.  It might be worthwhile looking at
 their work if I can find it (if nowhere else, it should be in the ACM).

 There are basically 2 major parts for materialized views:

 A) Planner: Getting the query planner to swap in the MatView for part of
 a query automatically for query plan portions which the MatView supports;

 B) Maintenance: maintaining the MatView data according to the programmed
 scheme (synch, asynch, periodic).

 I do not believe it is possible to do both of the above in one summer.
 Of the two, (A) would be more useful since it is possible to manually
 implement (B) using triggers, queues and cron jobs today.

I don't believe that it's possible to do EITHER of those things in one
summer.  I believe that a basic implementation that has NO bells and
whistles at all, as originally proposed, is going to be a Very Hard
Project.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Greg Smith

Josh Berkus wrote:

There are basically 2 major parts for materialized views:
A) Planner: Getting the query planner to swap in the MatView for part of
a query automatically for query plan portions which the MatView supports;
B) Maintenance: maintaining the MatView data according to the programmed
scheme (synch, asynch, periodic).
  


I've more often run into problems where it's perfectly fine to specify using 
the materialized view directly in the query, but keeping that view up to 
date usefully was the real problem.  The whole idea of getting a MV used 
automatically is valuable, but far down the roadmap as I see it.


Not everyone would agree of course, and your description does suggest a 
better way to organize a high-level summary though; here's a first cut:


1) Creation of materialized view
Current state:  using CREATE TABLE AS or similar mechanism, maintain 
manually
Optimal:  CREATE MATERIALIZED VIEW grammar, metadata to store MV data, 
dump/reload support


2) Updating materialized views
Current state:  periodically create new snapshots, or maintain using 
triggers
Optimal:  Built-in refresh via multiple strategies, with minimal locking so 
as to improve concurrent access


3) Using materialized views in the planner
Current state:  specify the manually created MV in queries that can use it
Optimal:  Automatically accelerate queries that could be satisfied by 
substituting available MVs


With (1) being what I think is the only GSoC sized subset here.

I'm not saying someone can't jump right into (3), using the current 
implementations for (1) and (2) that are floating around out there.  I 
just think it would end up wasting a fair amount of work on prototypes 
that don't work quite the same way as the eventual fully integrated 
version.  You certainly can start working on (3) without a fully fleshed 
out implementation of (2), I don't know that it makes sense to work on 
before (1) though.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Robert Haas
On Mon, Apr 12, 2010 at 3:43 PM, Greg Smith g...@2ndquadrant.com wrote:
 Josh Berkus wrote:

 There are basically 2 major parts for materialized views:
 A) Planner: Getting the query planner to swap in the MatView for part of
 a query automatically for query plan portions which the MatView supports;
 B) Maintenance: maintaining the MatView data according to the programmed
 scheme (synch, asynch, periodic).


 I've more often run into problems where it's perfectly fine to specify using the
 materialized view directly in the query, but keeping that view up to date
 usefully was the real problem.  The whole idea of getting a MV used
 automatically is valuable, but far down the roadmap as I see it.

 Not everyone would agree of course, and your description does suggest a
 better way to organize a high-level summary though; here's a first cut:

 1) Creation of materialized view
 Current state:  using CREATE TABLE AS or similar mechanism, maintain
 manually
 Optimal:  CREATE MATERIALIZED VIEW grammar, metadata to store MV data,
 dump/reload support

 2) Updating materialized views
 Current state:  periodically create new snapshots, or maintain using
 triggers
 Optimal:  Built-in refresh via multiple strategies, with minimal locking so as
 to improve concurrent access

 3) Using materialized views in the planner
 Current state:  specify the manually created MV in queries that can use it
 Optimal:  Automatically accelerate queries that could be satisfied by
 substituting available MVs

 With (1) being what I think is the only GSoC sized subset here.

 I'm not saying someone can't jump right into (3), using the current
 implementations for (1) and (2) that are floating around out there.  I just
 think it would end up wasting a fair amount of work on prototypes that don't
 work quite the same way as the eventual fully integrated version.  You
 certainly can start working on (3) without a fully fleshed out
 implementation of (2), I don't know that it makes sense to work on before
 (1) though.

Good summary.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Josh Berkus
Greg,

 I'm not saying someone can't jump right into (3), using the current
 implementations for (1) and (2) that are floating around out there.  I
 just think it would end up wasting a fair amount of work on prototypes
 that don't work quite the same way as the eventual fully integrated
 version.  You certainly can start working on (3) without a fully fleshed
 out implementation of (2), I don't know that it makes sense to work on
 before (1) though.

What would be the use case for (1) by itself?


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Greg Smith

Josh Berkus wrote:

What would be the use case for (1) by itself?
  


There isn't any use case for just working on the infrastructure, just 
like there's no use case for Syntax for partitioning on its own.  That's 
why people rarely work on that part of these problems--it's boring and 
produces no feature of value on its own.  I believe that in both cases, 
attempts to build the more complicated parts, ones that don't first 
address some of the core infrastructure first, will continue to produce 
only prototypes.


I don't want to see Materialized Views wander down the same path as 
partitioning, where lots of people produce fun parts patches, while 
ignoring the grunt work of things like production quality catalog 
support for the feature.  I think Pavel's proposal got that part right 
by starting with the grammar and executor setup trivia.  And Robert's 
comments about the details in that area it's easy to forget about hit 
the mark too.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Josh Berkus

 I don't want to see Materialized Views wander down the same path as
 partitioning, where lots of people produce fun parts patches, while
 ignoring the grunt work of things like production quality catalog
 support for the feature.  I think Pavel's proposal got that part right
 by starting with the grammar and executor setup trivia.  And Robert's
 comments about the details in that area it's easy to forget about hit
 the mark too.

Good point.  And GSoC may be one of the few times we can get people to
do that kind of work.  Other than Simon, of course.  ;-)

I just worry about any feature which doesn't get as far as a
user-visible implementation.  If someone doesn't do the rest of the
parts soon, such features tend to atrophy because nobody is using them.


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Greg Smith

Josh Berkus wrote:

I just worry about any feature which doesn't get as far as a
user-visible implementation.  If someone doesn't do the rest of the
parts soon, such features tend to atrophy because nobody is using them.
  


While they're limited, there are completely viable prototype quality 
implementations possible here without a large amount of work to get them 
started.  I'm not worried too much about this feature being unused.  As 
I was just reminded when assembling a page on the wiki about it:  
http://wiki.postgresql.org/wiki/Materalized_Views it's currently ranked 
#1--by a large margin--on the UserVoice feature request survey that 
Peter kicked off.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Greg Smith

Robert Haas wrote:

I also think that you're underestimating the number of problems that
will have to be solved to get this done.  It's going to take some
significant work - both design work and coding work - to figure out
how this should integrate into the rest of the system.  (What should
be the value of pg_class.relkind?  Where should the node
representation of the snapshot query be stored?  And did we handle all
of those OID dependencies correctly?)
  


I don't think I'm underestimating all that, but I suspect Pavel is by a 
considerable amount.  This is why I've been suggesting that a GSoC scope 
here might just be wrestling with this area of the problem for the whole 
summer--not even getting into updates beyond a completely trivial 
implementation, if any at all.  Things like handle OID dependencies 
are definitely not on the fun side of the development work that people 
tend to think about in advance.



Where I can see this possibly falling down (other than being just too
much work for a relative PostgreSQL novice to get it done in one
summer) is if there are concerns about it being incompatible with
incrementally-updated views.  I imagine that we're going to want to
eventually support both, so we need to make sure that this
implementation doesn't box us into a corner.


Exactly my concern; comitting this part without knowing how that's later 
going to fit into place strikes me the sort of the thing this project 
doesn't like to do.  The alternate approach of starting with the update 
machinery is less likely IMHO to get stuck wondering if there's a future 
blind spot coming or not, since you'd be building from the bottom up 
starting with the hardest parts.


From the rest of your comments, I'm comfortable that you're in sync 
with the not necessarily obvious risky spots here I wanted to raise 
awareness of.  It's unreasonable to expect we'll have exactly the same 
priorities  here, and I doubt it's useful to debate how I perceive the 
merit of various development subsets here compared to yourself.  I don't 
think it's really important whether anyone agrees with me or not about 
exactly the value of a full table lock implementation.  The main thing 
I'm concerned about is just that it's noted as a known risky part, one 
that could end up blocking the project's ability to commit even a subset 
of the proposed patch here.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Heikki Linnakangas
Robert Haas wrote:
 2010/4/10 Andrew Dunstan and...@dunslane.net:
 Heikki Linnakangas wrote:
 1. Keep the materialized view up-to-date when the base tables change.
 This can be further divided into many steps, you can begin by supporting
 automatic updates only on very simple views with e.g a single table and
 a where clause. Then extend that to support joins, aggregates,
 subqueries etc. Keeping it really limited, you could even require the
 user to write the required triggers himself.
 That last bit doesn't strike me as much of an advance. Isn't the whole point
 of this to automate it? Creating greedy materialized views is usually not
 terribly difficult now, but you do have to write the triggers.
 
 Yeah, I agree.

It doesn't accomplish anything interesting on its own. But if you do the
planner changes to automatically use the materialized view to satisfy
queries (item 2. in my previous email), it's useful.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Robert Haas
On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Robert Haas wrote:
 2010/4/10 Andrew Dunstan and...@dunslane.net:
 Heikki Linnakangas wrote:
 1. Keep the materialized view up-to-date when the base tables change.
 This can be further divided into many steps, you can begin by supporting
 automatic updates only on very simple views with e.g a single table and
 a where clause. Then extend that to support joins, aggregates,
 subqueries etc. Keeping it really limited, you could even require the
 user to write the required triggers himself.
 That last bit doesn't strike me as much of an advance. Isn't the whole point
 of this to automate it? Creating greedy materialized views is usually not
 terribly difficult now, but you do have to write the triggers.

 Yeah, I agree.

 It doesn't accomplish anything interesting on its own. But if you do the
 planner changes to automatically use the materialized view to satisfy
 queries (item 2. in my previous email), it's useful.

But you can't do that with a snapshot view, only a continuously updated one.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Florian G. Pflug

On 11.04.10 20:47 , Robert Haas wrote:

On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Robert Haas wrote:

2010/4/10 Andrew Dunstanand...@dunslane.net:

Heikki Linnakangas wrote:

1. Keep the materialized view up-to-date when the base tables
change. This can be further divided into many steps, you can
begin by supporting automatic updates only on very simple
views with e.g a single table and a where clause. Then extend
that to support joins, aggregates, subqueries etc. Keeping it
really limited, you could even require the user to write the
required triggers himself.

That last bit doesn't strike me as much of an advance. Isn't
the whole point of this to automate it? Creating greedy
materialized views is usually not terribly difficult now, but
you do have to write the triggers.


Yeah, I agree.


It doesn't accomplish anything interesting on its own. But if you
do the planner changes to automatically use the materialized view
to satisfy queries (item 2. in my previous email), it's useful.


But you can't do that with a snapshot view, only a continuously updated
one.


If continuous updates prove to be too hard initially, you could instead
update the view on select if it's outdated. Such a materialized view
would be a kind of inter-session cache for subselects.

The hard part would probably be to figure out how to decide whether the
view is outdated or not, and to deal with two concurrent transactions
trying to use an outdated view (and both trying to refresh it). What
makes the second problem hard is that you wouldn't want one of the
transactions to wait for the other to complete, because this is not how
SELECTs traditionally behave.
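A very rough sketch of that idea (all names and the advisory lock key are made
up): an ordinary table holds the view's contents, a staleness flag is maintained
elsewhere (e.g. by triggers on the base tables), and an advisory lock keeps two
sessions from refreshing at once - the session that loses simply keeps reading
the slightly stale data instead of waiting. Error handling is omitted:

    CREATE OR REPLACE FUNCTION refresh_mv_sales_if_stale() RETURNS void AS $$
    BEGIN
        IF NOT EXISTS (SELECT 1 FROM mv_meta
                       WHERE view_name = 'mv_sales' AND stale) THEN
            RETURN;                     -- already fresh, nothing to do
        END IF;
        IF NOT pg_try_advisory_lock(42) THEN
            RETURN;                     -- someone else is refreshing; use the old data
        END IF;
        DELETE FROM mv_sales;
        INSERT INTO mv_sales
            SELECT product_id, sum(amount) FROM sales GROUP BY product_id;
        UPDATE mv_meta SET stale = false, last_refresh = now()
         WHERE view_name = 'mv_sales';
        PERFORM pg_advisory_unlock(42);
    END;
    $$ LANGUAGE plpgsql;

    -- callers run SELECT refresh_mv_sales_if_stale(); before querying mv_sales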

best regards, Florian Pflug





Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Robert Haas
On Sun, Apr 11, 2010 at 10:13 PM, Florian G. Pflug f...@phlo.org wrote:
 If continuous updates prove to be too hard initially, you could instead
 update the view on select if it's outdated. Such a materialized view
 would be a kind of inter-session cache for subselects.

 The hard part would probably be to figure out how to decide whether the
 view is outdated or not, and to deal with two concurrent transactions
 trying to use an outdates view (and both trying to refresh it). What
 makes the second problem hard is that you wouldn't want one of the
 transactions to wait for the other to complete, because this is not how
 SELECTs traditionally behave.

Well, the proposed project is to create views that only get refreshed manually.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Robert Haas
On Sun, Apr 11, 2010 at 5:24 AM, Greg Smith g...@2ndquadrant.com wrote:
 From the rest of your comments, I'm comfortable that you're in sync with the
 not necessarily obvious risky spots here I wanted to raise awareness of.
  It's unreasonable to expect we'll have exactly the same priorities  here,
 and I doubt it's useful to debate how I perceive the merit of various
 development subsets here compared to yourself.  I don't think it's really
 important whether anyone agrees with me or not about exactly the value of a
 full table lock implementation.  The main thing I'm concerned about is just
 that it's noted as a known risky part, one that could end up blocking the
 project's ability to commit even a subset of the proposed patch here.

I think that one of the things that we need to get our hands around is
how we're going to distinguish the snapshot flavor of materialized
view from the continuous update flavor.  By definition, the latter
will only ever be supportable for a fairly restricted subset of all
possible queries, and I am assuming that we will not want to decide
what the behavior is going to be based on the query but rather based
on what the user specifies.  Anything else seems like it would have
the potential for severe POLA violations.  So we need to think now
about how we'll distinguish between the two flavors.  I imagine some
sort of syntactic marker would be appropriate; not sure what.
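Purely to make the question concrete - this is invented syntax, not an actual
proposal - the marker could be as simple as an extra clause at creation time:

    -- snapshot flavor, refreshed manually (invented syntax)
    CREATE MATERIALIZED VIEW mv_sales AS
        SELECT product_id, sum(amount) AS total FROM sales GROUP BY product_id;

    -- continuously maintained flavor (invented syntax)
    CREATE MATERIALIZED VIEW mv_sales_live WITH (continuous = true) AS
        SELECT product_id, sum(amount) AS total FROM sales GROUP BY product_id;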

Reading this thread, I'm starting to grow concerned that some people
may feel that manually refreshed materialized views are not even worth
bothering with, because (the argument goes) you could just use some
table and write a function that updates it.  There's probably some
truth to that, but I guess my thought is that it would have some value
as a convenience feature; and eventually we might optimize it to the
point where it would make more sense to use the built-in feature
rather than rolling your own.  However, if we're going to have
complaints that manually refreshed materialized views suck and we
should only ever support materialized views to the extent that we can
make them automatically update on-the-fly, then let's have those
complaints now before someone spends several months of their life on
the project only to be told that we don't want it.  Let's be clear: I
think it's useful, but, if other people disagree, we need to iron that
out now.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Heikki Linnakangas
Robert Haas wrote:
 On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Robert Haas wrote:
 2010/4/10 Andrew Dunstan and...@dunslane.net:
 Heikki Linnakangas wrote:
 1. Keep the materialized view up-to-date when the base tables change.
 This can be further divided into many steps, you can begin by supporting
 automatic updates only on very simple views with e.g a single table and
 a where clause. Then extend that to support joins, aggregates,
 subqueries etc. Keeping it really limited, you could even require the
 user to write the required triggers himself.
 That last bit doesn't strike me as much of an advance. Isn't the whole 
 point
 of this to automate it? Creating greedy materialized views is usually not
 terribly difficult now, but you do have to write the triggers.
 Yeah, I agree.
 It doesn't accomplish anything interesting on its own. But if you do the
 planner changes to automatically use the materialized view to satisfy
 queries (item 2. in my previous email), it's useful.
 
 But you can't do that with a snapshot view, only a continuously updated one.

A materialized view with manually-defined triggers to keep it up-to-date
is a continuously updated one.

Other DBMSs allow that with snapshot views too, you just don't get
totally up-to-date results, but I'm not sure we want to go there.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Heikki Linnakangas
Greg Smith wrote:
 The main hidden complexity in this particular project relates to
 handling view refreshes.  The non-obvious problem is that when the view
 updates, you need something like a SQL MERGE to really handle that in a
 robust way that doesn't conflict with concurrent access to queries
 against the materialized view.  And work on MERGE support is itself
 blocked behind the fact that PostgreSQL doesn't have a good way to lock
 access to a key value that doesn't exist yet--what other databases call
 key range locking.  See the notes for Add SQL-standard
 MERGE/REPLACE/UPSERT command at http://wiki.postgresql.org/wiki/Todo
 for more information.
 
 You can work around that to build a prototype by grabbing a full table
 lock on the materialized view when updating it, but that's not a
 production quality solution.

It would still be useful for many applications. And it would provide a
basis to extend later. You don't need to solve all problems at once, as
long as what you implement is a useful subset.

 Now, with all that said, that doesn't mean there's not a useful project
 for you buried in this mess.  The first two steps in your plan:
 
 1) create materialized view
 2) change rewriter
 
 Include building a prototype grammer, doing an initial executor
 implementation, and getting some sort of rewriter working.  That is
 potentially good groundwork to lay here.  I would suggest that you
 completely drop your step 3:
 
 3) create command that takes snapshot (refresh MV)
 
 Because you cannot built that in a way that will be useful (and by that
 I mean committable quality) until there's a better way to handle updates
 than writing a whole new table and grabbing a full relation lock to
 switch to it.  To do a good job just on the first two steps should take
 at least a whole summer anyway--there's a whole stack of background
 research needed I haven't seen anyone do yet, and that isn't on your
 plan yet.  There is a precedent for taking this approach.  After getting
 stalled trying to add the entirety of easy partitioning to PostgreSQL,
 the current scope has been scaled back to just trying to get the syntax
 and on-disk structure right, then finish off the implementation.  See
 http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how
 that's been broken into those two major chunks.

The good thing about this subject for GSoC is that it can be divided
into many small steps. There's two largely independent main parts:

1. Keep the materialized view up-to-date when the base tables change.
This can be further divided into many steps, you can begin by supporting
automatic updates only on very simple views with e.g a single table and
a where clause. Then extend that to support joins, aggregates,
subqueries etc. Keeping it really limited, you could even require the
user to write the required triggers himself.

2. Teach the planner to use materialized views automatically when a
query references the base tables. So if you issue the query SELECT *
FROM table WHERE foo > 10 AND bar = 10, and there's a materialized view
on SELECT * FROM table WHERE bar = 10, the planner can transform the
original query into SELECT * FROM materializedview WHERE foo > 10.
This largely depends on 1, although some DBMSs offer the option to use
manually refreshed materialized views too, knowing that they might not
be completely up-to-date.
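Spelled out against a made-up schema (writing the base table as some_table,
since "table" itself is a reserved word), the transformation would be:

    -- the materialized view, however it is kept up to date
    CREATE TABLE materializedview AS
        SELECT * FROM some_table WHERE bar = 10;

    -- original query
    SELECT * FROM some_table WHERE foo > 10 AND bar = 10;

    -- what the planner would transparently run instead
    SELECT * FROM materializedview WHERE foo > 10;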

There's a lot of room to choose which problems you want to tackle, which is
good for a summer-of-code project. Your proposal basically describes
doing 1, in a limited fashion where the view is not updated
automatically, but only when the DBA runs a command to refresh it. I'm
not sure if that's useful enough on its own, writing CREATE
MATERIALIZED VIEW ... SELECT ... doesn't seem any easier than just
writing CREATE TABLE AS.  But if you can do something about 2, or
even a very limited part of 1, keeping the view up-to-date
automatically, it becomes much more useful.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Andrew Dunstan



Heikki Linnakangas wrote:


1. Keep the materialized view up-to-date when the base tables change.
This can be further divided into many steps, you can begin by supporting
automatic updates only on very simple views with e.g a single table and
a where clause. Then extend that to support joins, aggregates,
subqueries etc. Keeping it really limited, you could even require the
user to write the required triggers himself.

  


That last bit doesn't strike me as much of an advance. Isn't the whole 
point of this to automate it? Creating greedy materialized views is 
usually not terribly difficult now, but you do have to write the triggers.


The other thing that could be interesting about this would be some 
scheme for lazy refresh that didn't involve re-extracting the whole data 
set.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Robert Haas
2010/4/10 Andrew Dunstan and...@dunslane.net:
 Heikki Linnakangas wrote:

 1. Keep the materialized view up-to-date when the base tables change.
 This can be further divided into many steps, you can begin by supporting
 automatic updates only on very simple views with e.g a single table and
 a where clause. Then extend that to support joins, aggregates,
 subqueries etc. Keeping it really limited, you could even require the
 user to write the required triggers himself.

 That last bit doesn't strike me as much of an advance. Isn't the whole point
 of this to automate it? Creating greedy materialized views is usually not
 terribly difficult now, but you do have to write the triggers.

Yeah, I agree.

 The other thing that could be interesting about this would be some scheme
 for lazy refresh that didn't involve re-extracting the whole data set.

One way to do this would be to infer a primary key for the result set
based on the input query.  But I think we don't really have the
infrastructure to do this right now, so not really a project for a
beginner.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Kevin Grittner
Greg Smith  wrote:
 
 And work on MERGE support is itself blocked behind the fact that
 PostgreSQL doesn't have a good way to lock access to a key value
 that doesn't exist yet--what other databases call key range
 locking.
 
The bulk of the serializable implementation WIP is work to implement
just this sort of locking.  There are already a couple possible
spin-off uses on the horizon based on the ability of these locks to
survive their initiating transactions and detect conflicting writes.
Both spinoffs involve somehow flagging a transaction as being one for
which the locks should be kept until further notice, and issuing a
notification when a conflicting write occurs.  That seems consistent
with the needs of materialized views, too.
 
It probably won't be solid in time to be useful for GSoC, but if
someone's looking to map out a plan for materialized views, I thought
this information might be germane.
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Greg Smith

Heikki Linnakangas wrote:

Your proposal basically describes
doing 1, in a limited fashion where the view is not updated
automatically, but only when the DBA runs a command to refresh it. I'm
not sure if that's useful enough on its own, writing CREATE
MATERIALIZED VIEW ... SELECT ... doesn't seem any easier than just
writing CREATE TABLE AS.  But if you can do something about 2, or
even a very limited part of 1, keeping the view up-to-date
automatically, it becomes much more useful.
  


You've hit upon the core issue here.  You can build materialized views 
right now using CREATE TABLE AS.  You can even update them by creating 
a new table the same way, with a new name, and doing the 
LOCK/RENAME/DROP shuffle--what people used to do for rewriting bloated 
tables before there was CLUSTER.  The first step in the proposal here is 
essentially syntax to give an easier UI for that.  That's an interesting 
step, but recognize that it doesn't actually provide anything you can't 
do already.
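Spelled out (table names invented), the do-it-yourself version being referred 
to looks roughly like this; the brief exclusive lock taken during the swap is 
exactly the limitation discussed elsewhere in this thread:

    -- initial build
    CREATE TABLE sales_summary AS
        SELECT product_id, sum(amount) AS total
        FROM sales GROUP BY product_id;

    -- refresh: rebuild into a new table, then swap it in
    BEGIN;
    CREATE TABLE sales_summary_new AS
        SELECT product_id, sum(amount) AS total
        FROM sales GROUP BY product_id;
    DROP TABLE sales_summary;
    ALTER TABLE sales_summary_new RENAME TO sales_summary;
    COMMIT;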


If you then note that doing any sort of incremental update to the view 
is a hard problem, and that a lot of the useful cases for materialized 
views involve tables where it's impractical to recreate the whole thing 
anyway, you'll inevitably find yourself deeply lost in the minutia of 
how to handle the updates.  It's really the core problem in building 
what people expect from a materialized view implementation in a serious 
database.  Chipping away at the other pieces around it doesn't move the 
feature that far forward, even if you get every single one of them 
except incremental updates finished, because everything else combined is 
still not that much work in comparison to the issues around updates.


There certainly are a fair number of subproblems you can break out of 
here.  I just think it's important to recognize that the path that leads 
to a useful GSoC project and the one that gives a production quality 
materialized view implementation may not have that much in common, and 
to manage expectations on both sides accordingly.  If Pavel thinks he's 
going to end up being able to say "I added materialized views to 
PostgreSQL" at the end of the summer, that's going to end in 
disappointment.  And if people think this project plan will lead to 
being able to claim "PostgreSQL now has this feature", that's also not 
going to go well.  If the scope is "add initial grammar and rewriting 
moving toward a future materialized view feature", with the underlying 
implementation noted as a stub prototype, that might work out OK.  This 
is why I likened it to the work on "Syntax for partitioning", which has 
a similarly focused subgoal structure.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Greg Smith

Robert Haas wrote:

It's not obvious to me
that a brief full-table lock wouldn't be acceptable for an initial
implementation.  Obviously it wouldn't be suitable for every use case
but since we're talking about manually refreshed views that was bound
to be true anyway.
  


There already is an initial implementation of sorts.  There are a couple 
of ways you can build these right now, so any new development has to 
look like it will end with good odds of being an improvement over what's 
already available before it's worth putting development resources into.


As a rough idea of what people want these for in the field, based on 
what I've seen requests for, imagine that someone has a 1TB table 
they're materializing a view on in order to get at least a 10:1, and 
hopefully close to a 100:1, speedup on viewing summary data.  Now, 
picture what happens if you have someone doing a sequential scan on the 
MV, which is still quite big, the updater process lines up to grab an 
exclusive lock when it's done, and now a second user wanting to read a 
single row quickly comes along behind it.  Given a full-table lock 
implementation, that scenario is unlikely to play out with the second 
user getting a fast response.  They'll likely sit in a lock queue for 
some potentially long period of time instead, waiting for the active seq 
scan to finish then the update to happen.  You have to build it that way 
or a steady stream of people reading could block out updates forever.
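
A sketch of that scenario, using a hypothetical summary table mv (a waiting
ACCESS EXCLUSIVE request also holds up later readers, since later conflicting
lock requests queue behind it):

-- Session 1: long-running report, holds ACCESS SHARE for the whole scan.
SELECT count(*) FROM mv;
-- Session 2: updater asks for the table lock and queues behind session 1.
LOCK TABLE mv IN ACCESS EXCLUSIVE MODE;
-- Session 3: a one-row lookup now queues behind session 2's pending
-- exclusive request, even though it only conflicts with the refresh.
SELECT * FROM mv WHERE id = 42;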


To be frank, that makes for a materialized view implementation of little 
value over what you can currently do as far as I'm concerned.  It might 
be interesting as a prototype, but that's not necessarily going to look 
like what's needed to do this for real at all.  I'm not a big fan of 
dumping work into projects when you can see exactly how it's going to 
fail before you even get started.  As I see it, if you know where it's 
going to fall down, you don't need to build a prototype as an exercise 
to show you how to build it--you should work on that part first instead.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Robert Haas
On Sat, Apr 10, 2010 at 11:40 PM, Greg Smith g...@2ndquadrant.com wrote:
 To be frank, that makes for a materialized view implementation of little
 value over what you can currently do as far as I'm concerned.  It might be
 interesting as a prototype, but that's not necessarily going to look like
 what's needed to do this for real at all.  I'm not a big fan of dumping work
 into projects when you can see exactly how it's going to fail before you
 even get started.  As I see it, if you know where it's going to fall down,
 you don't need to build a prototype as an exercise to show you how to build
 it--you should work on that part first instead.

Hopefully, you're already aware that I have enormous respect for your
opinions on a wide variety of topics; if not, let me publicly say that
I absolutely do.

Having said that, I disagree with your conclusions in this instance.
I see nothing but upside from this work.  It is vastly easier to write
a patch that builds on existing functionality than it is to write
something new from scratch.  If there's any value in having manually
refreshed materialized views, then having the simplest possible
implementation of what those can look like committed will make it far
easier to plan out next steps.  While the proposed implementation may
not solve a huge number of real-world problems, I think there's a good
argument that some people will get good use of it.  Not everyone has
1TB tables with continuous access patterns.  And, provided that it
doesn't conflict with anything we want to do in the future, being
useful to some people is a good enough reason to put it in.

I also think that you're underestimating the number of problems that
will have to be solved to get this done.  It's going to take some
significant work - both design work and coding work - to figure out
how this should integrate into the rest of the system.  (What should
be the value of pg_class.relkind?  Where should the node
representation of the snapshot query be stored?  And did we handle all
of those OID dependencies correctly?)

Where I can see this possibly falling down (other than being just too
much work for a relative PostgreSQL novice to get it done in one
summer) is if there are concerns about it being incompatible with
incrementally-updated views.  I imagine that we're going to want to
eventually support both, so we need to make sure that this
implementation doesn't box us into a corner.  But as far as snapshot
views go, complaining that the proposed locking is too strong doesn't
seem quite fair.  Fixing that, AFAICS, is a very hard project,
possibly involving significant planner support and an implementation
of MERGE, and I would much rather try to land a fundamentals patch
like this first and then deal with the gyrations that will be involved
in making this work than try to land the whole thing all at once.

Of course, if I'm missing something, and there's a SIMPLE way to get
materialized views that can be refreshed without a full-table lock,
that's another story altogether - maybe you have an idea?

Finally, even if we decided NOT to merge this patch because of the
limitations you mention (and right now that doesn't seem to be the
consensus), having this part of it completed as a starting point for
future work might be reason enough by itself.

In short: I think you may be letting the perfect be the enemy of the good.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-09 Thread pavelbaros

Hello,

I am sending my proposal for GSoC. Details are listed below. Please, if 
you have ideas or tips, or if you just want to share your opinion about my 
project, go ahead.


thanks,
Pavel Baros

Abstract:

It is an effort to implement snapshot materialized views (which are only 
updated when refreshed) in PostgreSQL. So far I have finished some parts of 
it and I am trying to publish my present work on git.postgresql.org



Benefits to the PostgreSQL Community

First of all, it would be best if my work is helpful to everybody 
who misses materialized views in PostgreSQL, because PostgreSQL still does 
not implement materialized views. In addition, MV is mentioned 
as a feature on the TODO list.



Deliverables

First of all, the goal at the end of my whole project is not only to write a 
bachelor's thesis, but to finish the work as a patch and, if possible, get 
the patch into the next PostgreSQL release, or keep the git repository 
current with the latest PostgreSQL version. I also have personal goals: to 
arouse interest in implementing MV in PostgreSQL, or at least to start a 
discussion about it.



Project Schedule

My work is going quite well; I am well on the way to finishing the main 
backend parts in a few weeks. After that I will write and run tests and 
implement the related changes to the PostgreSQL tools (psql, pg_dump, etc.). 
I am also waiting for approval of my repository named materialized_view on 
git.postgresql.org, so I can publish the completed parts. For now the next 
step will be to discuss the implementation on pgsql-hackers.



Bio

I am from the Czech Republic and I am studying at the Faculty of Electrical 
Engineering of the Czech Technical University in Prague (www.fel.cvut.cz/en/). 
My bachelor's thesis is based on this project, implementing MV in PostgreSQL.


I have had many different jobs. The best experience for me was 
working as a tester and software engineer in C/C++ and C# for 
Radiant Systems Inc. for more than a year. After that I worked as a web 
developer with Internet technologies (PHP, HTML, CSS, ...), where the 
goal was to build an internal system for an advertising agency. Most 
recently I worked as a Windows Mobile developer. Except for the first 
one, the other jobs lasted only a few months, mainly because they were 
temporary projects. For now I am looking for a part-time job, 
preferably something closer to database systems.



Implementation:  it can be divided into a few steps:

1) create materialized view
- modify grammar (parser): CREATE MATERIALIZED VIEW mv_name AS SELECT ...
- change the executor so that it creates a physical table defined by the 
select statement


2) change the rewriter
- Usually a view is a relation with a rule defined on it: when rewriting, the 
rule is fired and the relation (view) is replaced by the view's definition. 
If a relation has no rule, the planner and executor treat it as a physical 
table (relation). In the case of a materialized view we want to rewrite the 
select statement only when we are refreshing the MV. In all other cases the 
rewriter should skip the rewrite and pick up the physical relation. This 
excludes the situation where other rewrite rules, unrelated to the MV 
definition, are specified.


3) create a command that takes a snapshot (refreshes the MV)
- modify grammar: ALTER MATERIALIZED VIEW mv_name REFRESH;
- Taking a snapshot (refreshing) is similar to the SELECT INTO ... command, 
and I decided to follow the way that works. After parsing the query and 
before transformation, a tree representation of SELECT * INTO ... is created 
manually, with the flag IntoClause->isrefresh set to true, indicating that it 
is refreshing a materialized view. Everything then acts as it would for a 
regular SELECT INTO ... except for the functions OpenIntoRel() and 
CloseIntoRel(). In OpenIntoRel a temp table is created (without catalog 
entries) and set as the destination for the result of the select. In 
CloseIntoRel the executor swaps the relfilenodes of the temp table and the 
original table and finally deletes the temp table. The behavior of 
CloseIntoRel is inspired by the CLUSTER statement.



Contacts:  baro...@seznam.cz

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-09 Thread Kevin Grittner
pavelbaros baro...@seznam.cz wrote:
 
 I am also waiting for approval for my repository named
 materialized_view on git.postgresql.org
 
They seem to prefer that you get a repository under your name and
use materialized_view as a branch name.  See my account on
git.postgresql.org and its serializable branch for an example.
 
I learned by putting in a request similar to your pending one.
;-)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

