Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-04 Thread Robert Haas
On Thu, Apr 3, 2014 at 7:26 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 04/01/2014 08:39 PM, Heikki Linnakangas wrote:
 On 03/07/2014 05:36 AM, Tom Lane wrote:
Fabrízio de Royes Mello fabriziome...@gmail.com
 writes:
Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED
 too?
 Thinking in a scope of one GSoC, of course.

 I think it's basically the same thing.  You might hope to optimize it;
 but you have to create (rather than remove) an init fork, and there's
 no way to do that in exact sync with the commit.

 You just have to include that information with the commit WAL record, no?

 No-one's replied yet, but perhaps the worry is that after you've written the
 commit record, you have to go ahead with removing/creating the init fork,
 and that is seen as too risky. If a creat() or unlink() call fails, that
 will have to be a PANIC, and crash recovery will likewise have to PANIC if
 the forks still cannot be removed/created.

Yeah, that's the concern.

If I may digress for a moment, unlogged materialized views are not
supported.  This is because we have this facility where if a
materialized view hasn't been populated yet, you get an error when you
try to scan it.  If we allowed unlogged materialized views, then
they'd get reset to empty rather than to not-populated, because the
not-populated status is stored in the catalog, not the filesystem.  I
still wish we'd never added the notion of populated in the first
place, but Kevin felt it was essential, so we ended up here.
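
For illustration, the behaviour described above can be reproduced with SQL along
these lines (a sketch against a 9.3/9.4-era server; the commented outcomes and
error wording are paraphrased, not verbatim):

CREATE UNLOGGED MATERIALIZED VIEW bad_mv AS SELECT 1 AS x;
-- rejected: unlogged materialized views are not supported

CREATE MATERIALIZED VIEW mv AS SELECT 1 AS x WITH NO DATA;
-- created, but not populated

SELECT relname, relpersistence, relispopulated
  FROM pg_class WHERE relname = 'mv';
-- relispopulated is false; the flag lives in the catalog, not the filesystem

SELECT * FROM mv;
-- fails: the materialized view has not been populated

REFRESH MATERIALIZED VIEW mv;
-- populates it; scans now succeed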

Anyway, the idea that I had for fixing the unlogged materialized view
case was to add a new 64-bit integer to the control file that gets
bumped every time we start crash recovery, and which also gets
recorded in pg_class.  The value 0 would be reserved, and all pg_class
entries for non-unlogged relations would store 0.  For unlogged
relations, we could check whether the value in pg_class equals the
current value; if not, the relation should be viewed as not-populated.

This is not too far from a solution to the problem we need to solve
here.  If we want to make an unlogged relation logged, we can go ahead
and remove the init forks right away, knowing that the pg_class update
changing relpersistence and this new value won't take effect until
commit.  If the system meanwhile crashes, a backend connected to the
relevant database has enough state to recognize that the relation is
in this in-between state.  Before we can again use that relation, we
need to rebuild the init fork and reset it.  Of course, it's not too
clear exactly how that state cleanup happens; as one option, we could
just require the user to run a manual TRUNCATE.

This would not be totally without precedent, because CREATE INDEX
CONCURRENTLY leaves crap behind that the user has to reindex or drop,
but it's clearly not ideal.  Another option would be to try to make
autovacuum put things right, but that of course might not happen right
away.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Heikki Linnakangas

On 04/01/2014 08:58 PM, Andres Freund wrote:

On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:

On 3/4/14, 8:50 AM, Andres Freund wrote:

Can't that be solved by just creating the permanent relation in a new
relfilenode? That's equivalent to a rewrite, yes, but we need to do that
for anything but wal_level=minimal anyway.


Maybe I'm missing something, but doesn't this actually involve writing the data 
twice? Once into WAL and again into the relation itself?


Yes. But as I said, that's unavoidable for anything but
wal_level=minimal.


Ideally, you would *only* write the data to WAL, when you do ALTER TABLE 
... SET LOGGED. There's no fundamental reason you need to rewrite the 
heap, too. I understand that it might be difficult to do, because of the 
way the system catalogs work, but it's a worthy goal.


- Heikki




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Andres Freund
On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:
 On 04/01/2014 08:58 PM, Andres Freund wrote:
 On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:
 On 3/4/14, 8:50 AM, Andres Freund wrote:
 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.
 
 Maybe I'm missing something, but doesn't this actually involve writing the 
 data twice? Once into WAL and again into the relation itself?
 
 Yes. But as I said, that's unavoidable for anything but
 wal_level=minimal.
 
 Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
 SET LOGGED. There's no fundamental reason you need to rewrite the heap, too.
 I understand that it might be difficult to do, because of the way the system
 catalogs work, but it's a worthy goal.

I don't think that's realistic to achieve due to the issues described in
http://archives.postgresql.org/message-id/CA%2BTgmob44LNwwU73N1aJsGQyzQ61SdhKJRC_89wCm0%2BaLg%3Dx2Q%40mail.gmail.com

I don't think it's worthwhile to make the feature much more complex just
to address this. Perfect is the enemy of good, and all that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Heikki Linnakangas

On 04/03/2014 01:44 PM, Andres Freund wrote:

On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:

On 04/01/2014 08:58 PM, Andres Freund wrote:

On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:

On 3/4/14, 8:50 AM, Andres Freund wrote:

Can't that be solved by just creating the permanent relation in a new
relfilenode? That's equivalent to a rewrite, yes, but we need to do that
for anything but wal_level=minimal anyway.


Maybe I'm missing something, but doesn't this actually involve writing the data 
twice? Once into WAL and again into the relation itself?


Yes. But as I said, that's unavoidable for anything but
wal_level=minimal.


Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
SET LOGGED. There's no fundamental reason you need to rewrite the heap, too.
I understand that it might be difficult to do, because of the way the system
catalogs work, but it's a worthy goal.


I don't think that's realistic to achieve due to the issues described in
http://archives.postgresql.org/message-id/CA%2BTgmob44LNwwU73N1aJsGQyzQ61SdhKJRC_89wCm0%2BaLg%3Dx2Q%40mail.gmail.com


To which I replied here: 
http://www.postgresql.org/message-id/533af9d7.7010...@vmware.com. Please 
reply to that sub-thread with any problems you see. I might be missing 
something, but I really don't see any insurmountable problem here.



I don't think it's worthwhile to make the feature much more complex just
to address this. Perfect is the enemy of good, and all that.


We should do the trivial implementation first, sure. But that ought to 
be trivial. Now is the time to discuss how to do the more optimal thing. 
If we can come up with a feasible design on that, Fabrizio will have 
time to do that as part of the GSoC.


- Heikki




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Andres Freund
On 2014-04-01 20:39:35 +0300, Heikki Linnakangas wrote:
 On 03/07/2014 05:36 AM, Tom Lane wrote:
 Fabrízio de Royes Mello fabriziome...@gmail.com writes:
 Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED too?
 Thinking in a scope of one GSoC, of course.
 
 I think it's basically the same thing.  You might hope to optimize it;
 but you have to create (rather than remove) an init fork, and there's
 no way to do that in exact sync with the commit.
 
 You just have to include that information with the commit WAL record, no?

Sure, it's possible to do that. But that seems like complicating generic
paths more than I'd like for a minor feature. Especially as the
unlinking of the files would need to happen somewhere in
RecordTransactionCommit(). After the XLogFlush(), but before unsetting
MyPgXact->delayChkpt. That's a crit section, right?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Heikki Linnakangas

On 04/01/2014 08:39 PM, Heikki Linnakangas wrote:

On 03/07/2014 05:36 AM, Tom Lane wrote:

Fabrízio de Royes Mello fabriziome...@gmail.com writes:

Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED too?
Thinking in a scope of one GSoC, of course.


I think it's basically the same thing.  You might hope to optimize it;
but you have to create (rather than remove) an init fork, and there's
no way to do that in exact sync with the commit.


You just have to include that information with the commit WAL record, no?


No-one's replied yet, but perhaps the worry is that after you've written 
the commit record, you have to go ahead with removing/creating the init 
fork, and that is seen as too risky. If a creat() or unlink() call 
fails, that will have to be a PANIC, and crash recovery will likewise 
have to PANIC if the forks still cannot be removed/created.


My first thought is that that seems ok. It's unlikely that an unlink() 
of a small file in the data directory would fail. Creation could be done 
with a temporary name first and renamed into place, to avoid running out 
of disk space in the critical section.


If that's not acceptable, one idea off the top of my head is to somehow 
stamp the init forks when making an unlogged table logged, with the XID 
of the transaction. Crash recovery could then check the clog to see if
the transaction committed, and ignore any init fork files belonging to 
committed transactions. (Same in reverse when making a logged table 
unlogged).


Currently, we reset unlogged relations before replaying the WAL. That 
would have to be delayed until end of WAL replay, because otherwise we 
don't know if the transaction committed or not. Although if we go with 
the stamping approach, we could still reset unstamped files at the 
beginning of recovery.


- Heikki




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Andres Freund
On 2014-04-03 14:26:50 +0300, Heikki Linnakangas wrote:
 On 04/01/2014 08:39 PM, Heikki Linnakangas wrote:
 On 03/07/2014 05:36 AM, Tom Lane wrote:
 Fabrízio de Royes Mello fabriziome...@gmail.com writes:
 Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED too?
 Thinking in a scope of one GSoC, of course.
 
 I think it's basically the same thing.  You might hope to optimize it;
 but you have to create (rather than remove) an init fork, and there's
 no way to do that in exact sync with the commit.
 
 You just have to include that information with the commit WAL record, no?
 
 No-one's replied yet

That might be because it was a month after the initial discussion, and
at least I'd temporarily lost track of the thread ;)

 , but perhaps the worry is that after you've written the
 commit record, you have to go ahead with removing/creating the init fork,
 and that is seen as too risky. If a creat() or unlink() call fails, that
 will have to be a PANIC, and crash recovery will likewise have to PANIC if
 the forks still cannot be removed/created.

That's part of the worry, yes. It's also creeping code dealing with
unlogged relations into a fairly critical place
(RecordTransactionCommit()) where it really doesn't seem to belong.

 My first thought is that that seems ok. It's unlikely that an unlink() of a
 small file in the data directory would fail. Creation could be done with a
 temporary name first and renamed into place, to avoid running out of disk
 space in the critical section.

I continue to feel that that's far too much impact for a minor
feature. Even if it could be made to work reliably, it'll be a fair amount
of seldom-used infrastructure.

 If that's not acceptable, one idea off the top of my head is to somehow
 stamp the init forks when making an unlogged table logged, with the XID of
 the transaction. Crash recovery could then check the clog to see if the
 transaction committed, and ignore any init fork files belonging to committed
 transactions. (Same in reverse when making a logged table unlogged).

I've thought about that - after all, the logical decoding stuff uses
that trick in some places - but it has the grave disadvantage that it
requires a full directory scan to fully remove a relation. That seems to
be a heavy price.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Andres Freund
On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:
 On 04/01/2014 08:58 PM, Andres Freund wrote:
 On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:
 On 3/4/14, 8:50 AM, Andres Freund wrote:
 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.
 
 Maybe I'm missing something, but doesn't this actually involve writing the 
 data twice? Once into WAL and again into the relation itself?
 
 Yes. But as I said, that's unavoidable for anything but
 wal_level=minimal.
 
 Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
 SET LOGGED. There's no fundamental reason you need to rewrite the
 heap, too.

As another point: What's the advantage of that? The amount of writes
will be the same, no? It doesn't seem to be all that interesting that
a second filenode exists temporarily?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Heikki Linnakangas

On 04/03/2014 02:41 PM, Andres Freund wrote:

On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:

On 04/01/2014 08:58 PM, Andres Freund wrote:

On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:

On 3/4/14, 8:50 AM, Andres Freund wrote:

Can't that be solved by just creating the permanent relation in a new
relfilenode? That's equivalent to a rewrite, yes, but we need to do that
for anything but wal_level=minimal anyway.


Maybe I'm missing something, but doesn't this actually involve writing the data 
twice? Once into WAL and again into the relation itself?


Yes. But as I said, that's unavoidable for anything but
wal_level=minimal.


Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
SET LOGGED. There's no fundamental reason you need to rewrite the
heap, too.


As another point: What's the advantage of that? The amount of writes
will be the same, no? It doesn't seem to be all that interesting that
a second filenode exists temporarily?


Surely it's cheaper to read the whole relation and copy it to just WAL, 
than to read the whole relation and write it to both the WAL and another file.


(Maybe it's not worth the trouble to avoid it - but that depends on 
whether we come up with a good design..)


- Heikki




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Andres Freund
On 2014-04-03 15:02:27 +0300, Heikki Linnakangas wrote:
 On 04/03/2014 02:41 PM, Andres Freund wrote:
 On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:
 On 04/01/2014 08:58 PM, Andres Freund wrote:
 On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:
 On 3/4/14, 8:50 AM, Andres Freund wrote:
 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.
 
 Maybe I'm missing something, but doesn't this actually involve writing 
 the data twice? Once into WAL and again into the relation itself?
 
 Yes. But as I said, that's unavoidable for anything but
 wal_level=minimal.
 
 Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
 SET LOGGED. There's no fundamental reason you need to rewrite the
 heap, too.
 
 As another point: What's the advantage of that? The amount of writes
 will be the same, no? It doesn't seem to be all that interesting that
 a second filenode exists temporarily?
 
 Surely it's cheaper to read the whole relation and copy it to just WAL, than
 to read the whole relation and write it to both the WAL and another file.

I have to admit I was thinking of the WAL replay case ;). But we'll
actually have to write all dirty s_b, change the persistency tags and
such anyway because there's no LSN interlock with checkpoints. That
seems pretty ugly as well, and once again, avoidable by a rewrite.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-03 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 No-one's replied yet, but perhaps the worry is that after you've written 
 the commit record, you have to go ahead with removing/creating the init 
 fork, and that is seen as too risky. If a creat() or unlink() call 
 fails, that will have to be a PANIC, and crash recovery will likewise 
 have to PANIC if the forks still cannot be removed/created.

 My first thought is that that seems ok.

No, it isn't.  No filesystem operation should *ever* be thought to be
guaranteed to succeed.

I also concur with Andres' complaint that this feature is not worth
adding complication to the core transaction commit path for.

regards, tom lane




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-02 Thread Fabrízio de Royes Mello
On Tue, Apr 1, 2014 at 1:40 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-04-01 13:37:57 -0300, Fabrízio de Royes Mello wrote:
 In the GSoC proposal page [1] I received some suggestions for stretch
goals:
 
  * ALTER TABLE name SET UNLOGGED. This is essentially the reverse of
the
  core proposal, which is ALTER TABLE name SET LOGGED. Yes, I think that
  should definitely be included. It would be weird to have SET LOGGED but
not
  SET UNLOGGED.

 Yes, that makes sense.

  * Allow unlogged indexes on logged tables.

 I don't think it's realistic to build the infrastructure necessary for
 that as part of gsoc. The reasons have been explained somewhere in this
 thread.

  * Implement ALTER TABLE name SET LOGGED without rewriting the whole
  table, when wal_level = minimal.

 Yea, maybe.

  * Allow unlogged materialized views.

 I don't think that's realistic either.


Thanks Andres for your comments.

Anyway, I added these additional goals to the proposal and, as Heikki said:

It's actually nice to have several separate goals like this, it means that
if you finish the task faster than expected, you can move on to the next
goal, and if one task takes longer than expected so that you don't finish
all the work, we'll still have something useful.

I hope you can help me in some way with this project too. :-)

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-01 Thread Fabrízio de Royes Mello
On Fri, Mar 7, 2014 at 12:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Fabrízio de Royes Mello fabriziome...@gmail.com
 writes:
  Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED
 too?
  Thinking in a scope of one GSoC, of course.

 I think it's basically the same thing.  You might hope to optimize it;
 but you have to create (rather than remove) an init fork, and there's
 no way to do that in exact sync with the commit.  So for safety I think
 you have to copy the data into a new relfilenode.


Hi all,

In the GSoC proposal page [1] I received some suggestions for stretch goals:

* ALTER TABLE name SET UNLOGGED. This is essentially the reverse of the
core proposal, which is ALTER TABLE name SET LOGGED. Yes, I think that
should definitely be included. It would be weird to have SET LOGGED but not
SET UNLOGGED.

* Allow unlogged indexes on logged tables.

* Implement ALTER TABLE name SET LOGGED without rewriting the whole
table, when wal_level = minimal.

* Allow unlogged materialized views.

Comments?


[1]
http://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2014/fabriziomello/5629499534213120

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-01 Thread Andres Freund
On 2014-04-01 13:37:57 -0300, Fabrízio de Royes Mello wrote:
 In the GSoC proposal page [1] I received some suggestions for stretch goals:
 
 * ALTER TABLE name SET UNLOGGED. This is essentially the reverse of the
 core proposal, which is ALTER TABLE name SET LOGGED. Yes, I think that
 should definitely be included. It would be weird to have SET LOGGED but not
 SET UNLOGGED.

Yes, that makes sense.

 * Allow unlogged indexes on logged tables.

I don't think it's realistic to build the infrastructure necessary for
that as part of gsoc. The reasons have been explained somewhere in this
thread.

 * Implement ALTER TABLE name SET LOGGED without rewriting the whole
 table, when wal_level = minimal.

Yea, maybe.

 * Allow unlogged materialized views.

I don't think that's realistic either.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-01 Thread Heikki Linnakangas

On 03/07/2014 05:36 AM, Tom Lane wrote:

Fabrízio de Royes Mello fabriziome...@gmail.com writes:

Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED too?
Thinking in a scope of one GSoC, of course.


I think it's basically the same thing.  You might hope to optimize it;
but you have to create (rather than remove) an init fork, and there's
no way to do that in exact sync with the commit.


You just have to include that information with the commit WAL record, no?

- Heikki




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-01 Thread Jim Nasby

On 3/4/14, 8:50 AM, Andres Freund wrote:

Can't that be solved by just creating the permanent relation in a new
relfilenode? That's equivalent to a rewrite, yes, but we need to do that
for anything but wal_level=minimal anyway.


Maybe I'm missing something, but doesn't this actually involve writing the data 
twice? Once into WAL and again into the relation itself?
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-04-01 Thread Andres Freund
On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:
 On 3/4/14, 8:50 AM, Andres Freund wrote:
 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.
 
 Maybe I'm missing something, but doesn't this actually involve writing the 
 data twice? Once into WAL and again into the relation itself?

Yes. But as I said, that's unavoidable for anything but
wal_level=minimal. If somebody wants to put in the additional nontrivial
work to make it work faster with wal_level=minimal, they can do so. But
the other case is more general and needs to be done anyway.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-25 Thread Anastasia Lubennikova
2014-03-18 18:47 GMT+04:00 Robert Haas robertmh...@gmail.com


  If the fetch() is specified by the developer, then using it, algorithm
 can
  retrieve the data directly to output areas at this stage, without
 reference
  to the heap.

 This seems to be the crux of your proposal, but it seems vague: what
 exactly do you mean by retrieve the data directly to output areas?
  What data are you going to retrieve and where are you going to put it?


 I meant the Datum that is stored in GiST-tree nodes. Now gistgettuple() returns
xs_ctup.t_self (an item pointer). I'm going to add index-only scan
functionality: gistgettuple() will return the pointer and the Datum itself as
xs_itup. So the queue will contain both the pointer and the Datum. If
visibilitymap_test returns true then the Datum from xs_itup would be added into
the queue; otherwise the page must be scanned.

Another question to consider is: which operator classes do you
 anticipate that this will work for and which ones do you anticipate
 that it will not work for?  Any operator class that lossifies that
 input data before storing it in the index is presumably doomed, but
 which ones do that, and which do not?


About amcanreturn: I'm going to create a gistcanreturn() function that checks
whether fetch() is defined for all indexed columns.

And the last point of my project is to implement fetch() for the existing
opclasses based on GiST.

-- 
Best regards,
Lubennikova Anastasia


[HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Anastasia Lubennikova
Hello!
Here is the text of my proposal which I've applied to GSoC.
(and link
http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/lubennikovaav/5629499534213120)
Any suggestions and comments are welcome.

*Project name*

Support for index-only scans for GIST index



*Brief review*

Currently the GiST index doesn't have index-only scan functionality. Index-only
scans are a major performance feature added to Postgres 9.2. They allow
certain types of queries to be satisfied just by retrieving data from
indexes, and not from tables. This feature for B-trees (implemented since
PostgreSQL 9.2) makes certain types of queries significantly faster.
This is achieved by a reduction in the amount of I/O necessary to satisfy
queries. I think it will be useful to implement this feature for user-defined
data types that use a GiST index.
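
As a concrete illustration of the gap (a sketch against a 9.3-era server; the
table and index names are illustrative, and the plans shown in the comments are
only what you would typically get, since the planner's choice depends on
statistics and costs):

CREATE TABLE pts (id int, p point);
INSERT INTO pts SELECT g, point(g, g) FROM generate_series(1, 100000) g;
CREATE INDEX pts_id_btree ON pts (id);
CREATE INDEX pts_p_gist ON pts USING gist (p);
VACUUM ANALYZE pts;

EXPLAIN SELECT id FROM pts WHERE id < 100;
-- typically: Index Only Scan using pts_id_btree on pts

EXPLAIN SELECT p FROM pts WHERE p <@ box '((0,0),(100,100))';
-- typically an index or bitmap scan on pts_p_gist; the heap is still visited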



*Benefits to the PostgreSQL Community*

Faster GiST index search would be relevant for many PostgreSQL applications
(for example some GIS systems).


 *Quantifiable results*

Acceleration of GiST index search.


*Project details*

1. The GiST is a balanced tree structure like a B-tree, containing (key,
pointer) pairs. A GiST key is a member of a user-defined class, and
represents some property that is true of all data items reachable from the
pointer associated with the key. The GiST makes it possible to create
custom data types with indexed access methods and an extensible set of queries.

There are seven methods that an index operator class for GiST must provide,
and an eighth that is optional.

-consistent

-union

-compress

-decompress

-penalty

-picksplit

-equal

-distance (optional)

I'm going to create a new optional method, fetch(), in addition to the existing
ones. Thus a user can provide a method of retrieving data from the index without
loss. It will be used when performing search queries to speed up data
retrieval.



2. gistget algorithm (several parts omitted):

Check the key
gistindex_keytest() - does this index tuple satisfy the scan key(s)?

Scan all items on the GiST index page and insert them into the queue (or
directly to output areas)

plain scan

Heap tuple TIDs are returned into so->pageData[]

ordered scan

Heap tuple TIDs are pushed into individual search queue items

If the fetch() is specified by the developer, then using it, algorithm can
retrieve the data directly to output areas at this stage, without reference
to the heap.
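
(For reference, the two scan shapes above map onto queries like the following,
reusing the pts table and GiST index sketched earlier; the plans are indicative
only:)

-- plain scan: qualifying heap TIDs are collected page by page
EXPLAIN SELECT * FROM pts WHERE p <@ box '((0,0),(10,10))';

-- ordered scan: a KNN search using the optional distance method, where
-- candidates are pushed into the search queue ordered by distance
EXPLAIN SELECT * FROM pts ORDER BY p <-> point '(50,50)' LIMIT 10;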


3. Create a function gistcanreturn to check whether fetch() is specified by
the user.

Amcanreturn - Function to check whether index supports index-only scans, or
zero if none

There is the question of supporting index-only scans for multicolumn indexes.
Probably it would require extending the interface to add per-column
checking.

To solve this question I need the community's help.
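
(For context: in the 9.3/9.4-era catalogs each access method advertises its
amcanreturn support function, or zero, in pg_am; GiST has none yet, which is
what gistcanreturn would provide. A quick way to see the current state:)

SELECT amname, amcanreturn FROM pg_am ORDER BY amname;
-- btree reports btcanreturn; gist currently reports '-' (no support function)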


4. Add details for index only scans into gistcostestimate function



 *Links*

1) Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search
trees for database systems. - September, 1995.

2) http://www.sai.msu.su/~megera/postgres/gist/

3) PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST
Indexes.

-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Robert Haas
On Tue, Mar 18, 2014 at 9:12 AM, Anastasia Lubennikova
lubennikov...@gmail.com wrote:
 Support for index-only scans for GIST index

This is a cool idea, if it can be made to work.

 If the fetch() is specified by the developer, then using it, algorithm can
 retrieve the data directly to output areas at this stage, without reference
 to the heap.

This seems to be the crux of your proposal, but it seems vague: what
exactly do you mean by retrieve the data directly to output areas?
What data are you going to retrieve and where are you going to put it?

Another question to consider is: which operator classes do you
anticipate that this will work for and which ones do you anticipate
that it will not work for?  Any operator class that lossifies that
input data before storing it in the index is presumably doomed, but
which ones do that, and which do not?

Tom Lane previously proposed extending SP-GiST to support index-only
scans.  You might find that discussion worth reading, or perhaps
consider it as an alternative if GiST doesn't work out:

http://www.postgresql.org/message-id/10839.1323885...@sss.pgh.pa.us

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Tom Lane previously proposed extending SP-GiST to support index-only
 scans.  You might find that discussing worth reading, or perhaps
 consider it as an alternative if GiST doesn't work out:
 http://www.postgresql.org/message-id/10839.1323885...@sss.pgh.pa.us

That wasn't just a proposal, see commits
3695a555136a6d179cac8ae48d5f90171d5b30e9 and
92203624934095163f8b57b5b3d7bbd2645da2c8.  But yeah, that might be a
useful reference for what is likely to be involved with making GIST
do it.

regards, tom lane




Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Josh Berkus
Alexander,

Can you comment on the below proposal?  I'd like your opinion on how
difficult it will be.

On 03/18/2014 06:12 AM, Anastasia Lubennikova wrote:
 Hello!
 Here is the text of my proposal which I've applied to GSoC.
 (and link
 http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/lubennikovaav/5629499534213120)
 Any suggestions and comments are welcome.
 
 *Project name*
 
 Support for index-only scans for GIST index
 
 
 
 *Brief review*
 
 Currently the GiST index doesn't have index-only scan functionality. Index-only
 scans are a major performance feature added to Postgres 9.2. They allow
 certain types of queries to be satisfied just by retrieving data from
 indexes, and not from tables. This feature for B-trees (implemented since
 PostgreSQL 9.2) makes certain types of queries significantly faster.
 This is achieved by a reduction in the amount of I/O necessary to satisfy
 queries. I think it will be useful to implement this feature for user-defined
 data types that use a GiST index.
 
 
 
 *Benefits to the PostgreSQL Community*
 
 Faster GiST index search would be relevant for many PostgreSQL applications
 (for example some GIS systems).
 
 
  *Quantifiable results*
 
 Acceleration of GiST index search.
 
 
 *Project details*
 
 1. The GiST is a balanced tree structure like a B-tree, containing (key,
 pointer) pairs. A GiST key is a member of a user-defined class, and
 represents some property that is true of all data items reachable from the
 pointer associated with the key. The GiST makes it possible to create
 custom data types with indexed access methods and an extensible set of queries.
 
 There are seven methods that an index operator class for GiST must provide,
 and an eighth that is optional.
 
 -consistent
 
 -union
 
 -compress
 
 -decompress
 
 -penalty
 
 -picksplit
 
 -equal
 
 -distance (optional)
 
 I'm going to create a new optional method, fetch(), in addition to the existing
 ones. Thus a user can provide a method of retrieving data from the index without
 loss. It will be used when performing search queries to speed up data
 retrieval.
 
 
 
 2. gistget algorithm (several parts omitted):
 
 Check the key
 gistindex_keytest() - does this index tuple satisfy the scan key(s)?
 
 Scan all items on the GiST index page and insert them into the queue (or
 directly to output areas)
 
 plain scan
 
 Heap tuple TIDs are returned into so->pageData[]
 
 ordered scan
 
 Heap tuple TIDs are pushed into individual search queue items
 
 If the fetch() is specified by the developer, then using it, algorithm can
 retrieve the data directly to output areas at this stage, without reference
 to the heap.
 
 
 3. Create a function gistcanreturn to check whether fetch() is specified by
 the user.
 
 Amcanreturn - Function to check whether index supports index-only scans, or
 zero if none
 
 There is the question of supporting index-only scans for multicolumn indexes.
 Probably it would require extending the interface to add per-column
 checking.
 
 To solve this question I need the community's help.
 
 
 4. Add details for index only scans into gistcostestimate function
 
 
 
  *Links*
 
 1) Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search
 trees for database systems. - September, 1995.
 
 2) http://www.sai.msu.su/~megera/postgres/gist/
 
 3) PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST
 Indexes.
 


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Alexander Korotkov
On Tue, Mar 18, 2014 at 5:12 PM, Anastasia Lubennikova 
lubennikov...@gmail.com wrote:

 2. gistget algorithm (several parts omitted):

 Check the key
 gistindex_keytest() - does this index tuple satisfy the scan key(s)?

 Scan all items on the GiST index page and insert them into the queue (or
 directly to output areas)

 plain scan

 Heap tuple TIDs are returned into so->pageData[]

 ordered scan

 Heap tuple TIDs are pushed into individual search queue items

 If the fetch() is specified by the developer, then using it, algorithm can
 retrieve the data directly to output areas at this stage, without reference
 to the heap.


I think the following changes need to be made to the GiST code:
1) When the next consistent IndexTuple is found, extract the original Datums
using the fetch method.
2) Put those original Datums into the queue.
3) When returning the next ItemPointer from the queue, put the original Datums
into the IndexScanDesc (into xs_itup).

 3. Create function gistcanreturn to check whether fetch() is specified by
 user.

 Amcanreturn - Function to check whether index supports index-only scans,
 or zero if none

 There is the question of support index-only scans for multicolumn indexes.
 Probably it would require extend the interface to add separate columns
 checking.

 To solve this question I need the community's help.


 4. Add details for index only scans into gistcostestimate function


Also, another part of work to be mentioned is to implement fetch function
for all suitable opclasses.


With best regards,
Alexander Korotkov.


Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Alexander Korotkov
Josh,

Anastasia has already consulted with me in person. It is not a big proposal,
but for a newbie who is not familiar with the PostgreSQL code base, and
especially GiST, it seems fair enough.


With best regards,
Alexander Korotkov.

On Tue, Mar 18, 2014 at 9:16 PM, Josh Berkus j...@agliodbs.com wrote:

 Alexander,

 Can you comment on the below proposal?  I'd like your opinion on how
 difficult it will be.

 On 03/18/2014 06:12 AM, Anastasia Lubennikova wrote:
  Hello!
  Here is the text of my proposal which I've applied to GSoC.
  (and link
 
 http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/lubennikovaav/5629499534213120
 )
  Any suggestions and comments are welcome.
 
  *Project name*
 
  Support for index-only scans for GIST index
 
 
 
  *Brief review*
 
  Currently the GiST index doesn't have index-only scan functionality. Index-only
  scans are a major performance feature added to Postgres 9.2. They allow
  certain types of queries to be satisfied just by retrieving data from
  indexes, and not from tables. This feature for B-trees (implemented since
  PostgreSQL 9.2) makes certain types of queries significantly faster.
  This is achieved by a reduction in the amount of I/O necessary to satisfy
  queries. I think it will be useful to implement this feature for user-defined
  data types that use a GiST index.
 
 
 
  *Benefits to the PostgreSQL Community*
 
  Faster GiST index search would be relevant for many PostgreSQL applications
  (for example some GIS systems).
 
 
   *Quantifiable results*
 
  Acceleration of GiST index search.
 
 
  *Project details*
 
  1. The GiST is a balanced tree structure like a B-tree, containing (key,
  pointer) pairs. A GiST key is a member of a user-defined class, and
  represents some property that is true of all data items reachable from the
  pointer associated with the key. The GiST makes it possible to create
  custom data types with indexed access methods and an extensible set of
  queries.
 
  There are seven methods that an index operator class for GiST must
 provide,
  and an eighth that is optional.
 
  -consistent
 
  -union
 
  -compress
 
  -decompress
 
  -penalty
 
  -picksplit
 
  -equal
 
  -distance (optional)
 
  I'm going to create a new optional method, fetch(), in addition to the existing
  ones. Thus a user can provide a method of retrieving data from the index without
  loss. It will be used when performing search queries to speed up data
  retrieval.
 
 
 
  2. gistget algorithm (several parts omitted):
 
  Check the key
  gistindex_keytest() - does this index tuple satisfy the scan key(s)?
 
  Scan all items on the GiST index page and insert them into the queue (or
  directly to output areas)
 
  plain scan
 
  Heap tuple TIDs are returned into so->pageData[]
 
  ordered scan
 
  Heap tuple TIDs are pushed into individual search queue items
 
  If the fetch() is specified by the developer, then using it, algorithm
 can
  retrieve the data directly to output areas at this stage, without
 reference
  to the heap.
 
 
  3. Create a function gistcanreturn to check whether fetch() is specified by
  the user.
 
  Amcanreturn - Function to check whether index supports index-only scans,
 or
  zero if none
 
  There is the question of supporting index-only scans for multicolumn
  indexes. Probably it would require extending the interface to add
  per-column checking.
 
  To solve this question I need the community's help.
 
 
  4. Add details for index only scans into gistcostestimate function
 
 
 
   *Links*
 
  1) Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search
  trees for database systems. - September, 1995.
 
  2) http://www.sai.msu.su/~megera/postgres/gist/
 
  3) PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST
  Indexes.
 


 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com



Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Josh Berkus
On 03/18/2014 12:11 PM, Alexander Korotkov wrote:
 Josh,
 
 Anastasia has already consulted with me in person. It is not a big proposal,
 but for a newbie who is not familiar with the PostgreSQL code base, and
 especially GiST, it seems fair enough.
 

Thanks, that's what I wanted to know.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Robert Haas
On Wed, Mar 5, 2014 at 7:42 PM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 On Tue, Mar 4, 2014 at 5:00 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
 On Tue, Mar 4, 2014 at 3:29 PM, Andres Freund and...@2ndquadrant.com
 wrote:
 
  On 2014-03-04 12:54:02 -0500, Robert Haas wrote:
   On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com
   wrote:
On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
Can't that be solved by just creating the permanent relation in a
new
relfilenode? That's equivalent to a rewrite, yes, but we need to do
that
for anything but wal_level=minimal anyway.
  
   Yes, that would work.  I've tended to view optimizing away the
   relfilenode copy as an indispensable part of this work, but that might
   be wrongheaded.  It would certainly be a lot easier to make this
   happen if we didn't insist on that.
 
 I think it'd already be much better than today's situation, and it's a
 required codepath for wal_level > logical anyway. So even if somebody
  wants to make this work without the full copy for minimal, it'd still be
  a required codepath. So I am perfectly ok with a patch just adding that.
 

 Then is this a good idea for a GSoC project ?

 I don't know these internals very well, but I am willing to learn and I
 think the GSoC is a good opportunity.

 Any of you are willing to mentoring this project?


 I written the proposal to this feature, so I would like to know if someone
 can review.

I think this isn't a good design.  Per the discussion between Andres
and me, I think that what you should do is make ALTER TABLE .. SET
LOGGED work just like VACUUM FULL, with the exception that it will set
a different relpersistence for the new relfilenode.  If you do it that
way, this will be less efficient, but much simpler, and you might
actually finish it in one summer.
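
To make the suggestion concrete: the rewrite machinery being reused already
assigns a fresh relfilenode, and relpersistence is visible in pg_class. A sketch
of what the user would observe (the final statement is the proposed, not yet
existing command):

CREATE UNLOGGED TABLE demo (i int);
SELECT relfilenode, relpersistence FROM pg_class WHERE relname = 'demo';
-- relpersistence is 'u' (unlogged)

VACUUM FULL demo;   -- today: rewrites the table into a new relfilenode,
                    -- persistence unchanged

ALTER TABLE demo SET LOGGED;   -- proposed: the same kind of rewrite, but the
                               -- new relfilenode is created with relpersistence 'p'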

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Thom Brown
On 6 March 2014 19:42, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Mar 5, 2014 at 7:42 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  On Tue, Mar 4, 2014 at 5:00 PM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
  On Tue, Mar 4, 2014 at 3:29 PM, Andres Freund and...@2ndquadrant.com
  wrote:
  
   On 2014-03-04 12:54:02 -0500, Robert Haas wrote:
On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund 
 and...@2ndquadrant.com
wrote:
 On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
 Can't that be solved by just creating the permanent relation in a
 new
 relfilenode? That's equivalent to a rewrite, yes, but we need to
 do
 that
 for anything but wal_level=minimal anyway.
   
Yes, that would work.  I've tended to view optimizing away the
relfilenode copy as an indispensable part of this work, but that
 might
be wrongheaded.  It would certainly be a lot easier to make this
happen if we didn't insist on that.
  
  I think it'd already be much better than today's situation, and it's a
  required codepath for wal_level > logical anyway. So even if somebody
   wants to make this work without the full copy for minimal, it'd still
 be
   a required codepath. So I am perfectly ok with a patch just adding
 that.
  
 
  Then is this a good idea for a GSoC project ?
 
  I don't know these internals very well, but I am willing to learn and I
  think the GSoC is a good opportunity.
 
  Any of you are willing to mentoring this project?
 
 
  I written the proposal to this feature, so I would like to know if
 someone
  can review.

 I think this isn't a good design.  Per the discussion between Andres
 and me, I think that what you should do is make ALTER TABLE .. SET
 LOGGED work just like VACUUM FULL, with the exception that it will set
 a different relpersistence for the new relfilenode.  If you do it that
 way, this will be less efficient, but much simpler, and you might
 actually finish it in one summer.


Sounds like a plan.  Would there be any stretch-goals for this work, or is
there not really anything else that could be done?

-- 
Thom


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Fabrízio de Royes Mello
On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com wrote:


 I think this isn't a good design.  Per the discussion between Andres
 and me, I think that what you should do is make ALTER TABLE .. SET
 LOGGED work just like VACUUM FULL, with the exception that it will set
 a different relpersistence for the new relfilenode.  If you do it that
 way, this will be less efficient, but much simpler, and you might
 actually finish it in one summer.


Do it like 'VACUUM FULL' for any wal_level?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Robert Haas
On Thu, Mar 6, 2014 at 2:52 PM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com wrote:
 I think this isn't a good design.  Per the discussion between Andres
 and me, I think that what you should do is make ALTER TABLE .. SET
 LOGGED work just like VACUUM FULL, with the exception that it will set
 a different relpersistence for the new relfilenode.  If you do it that
 way, this will be less efficient, but much simpler, and you might
 actually finish it in one summer.


 Do it like 'VACUUM FULL' for any wal_level?

Yep.  Anything else appears to be a research problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Fabrízio de Royes Mello
On Thu, Mar 6, 2014 at 5:04 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Mar 6, 2014 at 2:52 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com
wrote:
  I think this isn't a good design.  Per the discussion between Andres
and me, I think that what you should do is make ALTER TABLE .. SET
  LOGGED work just like VACUUM FULL, with the exception that it will set
  a different relpersistence for the new relfilenode.  If you do it that
  way, this will be less efficient, but much simpler, and you might
  actually finish it in one summer.
 
 
  Do it like 'VACUUM FULL' for any wal_level?

 Yep.  Anything else appears to be a research problem.


I'll change the proposal. Thanks a lot!

Greetings,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Fabrízio de Royes Mello
On Thu, Mar 6, 2014 at 5:05 PM, Fabrízio de Royes Mello 
fabriziome...@gmail.com wrote:


 On Thu, Mar 6, 2014 at 5:04 PM, Robert Haas robertmh...@gmail.com wrote:
 
  On Thu, Mar 6, 2014 at 2:52 PM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
   On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com
wrote:
   I think this isn't a good design.  Per the discussion between Andres
  and me, I think that what you should do is make ALTER TABLE .. SET
   LOGGED work just like VACUUM FULL, with the exception that it will
set
   a different relpersistence for the new relfilenode.  If you do it
that
   way, this will be less efficient, but much simpler, and you might
   actually finish it in one summer.
  
  
   Do it like 'VACUUM FULL' for any wal_level?
 
  Yep.  Anything else appears to be a research problem.
 

 I'll change the proposal. Thanks a lot!


One last question.

Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED too?
Thinking in a scope of one GSoC, of course.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Fabrízio de Royes Mello
On Thu, Mar 6, 2014 at 5:04 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Mar 6, 2014 at 2:52 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  On Thu, Mar 6, 2014 at 4:42 PM, Robert Haas robertmh...@gmail.com
wrote:
  I think this isn't a good design.  Per the discussion between Andres
 and me, I think that what you should do is make ALTER TABLE .. SET
  LOGGED work just like VACUUM FULL, with the exception that it will set
  a different relpersistence for the new relfilenode.  If you do it that
  way, this will be less efficient, but much simpler, and you might
  actually finish it in one summer.
 
 
  Do it like 'VACUUM FULL' for any wal_level?

 Yep.  Anything else appears to be a research problem.


Updated proposal:

proposal

** Add to PostgreSQL the capability of making an unlogged table logged **

Introduction

This project will make it possible to change an unlogged table (one that doesn't
create transaction logs - WAL files) and its dependencies into a logged table, in
other words, a regular table that creates WAL files. To make this happen
we'll introduce new SQL syntax:

ALTER TABLE name SET LOGGED;


Benefits to the PostgreSQL Community

The unlogged tables feature was introduced in version 9.1. Unlogged tables
provide better write performance than regular (logged) tables, but are not
crash-safe: their contents are automatically discarded (cleared) after a
server crash, and their contents do not propagate to replication standbys
either.
The ability to turn an unlogged table into a logged table gives us the best of
both worlds: we can use an unlogged table to run a bulk load of thousands of
rows (ETL scripts) and get better performance, and then change it to a logged
table to get durability for the loaded data.
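
A sketch of that workflow, using the proposed syntax (ALTER TABLE ... SET LOGGED
does not exist yet; the staging table and file path are illustrative):

CREATE UNLOGGED TABLE staging (id int, payload text);

COPY staging FROM '/tmp/bulk_data.csv' WITH (FORMAT csv);
-- fast: the data pages are not WAL-logged while the table is unlogged

-- transform / validate the data here ...

ALTER TABLE staging SET LOGGED;
-- proposed command: rewrite the table as a regular, crash-safe (logged) table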


Deliverables

This project has just one deliverable at the end: the implementation of the
routines that transform an unlogged table into a logged one, using the same
algorithm as vacuum full, with the exception that it will set a different
relpersistence for the new relfilenode.


Project Schedule

until May 19:
* create a website to the project (wiki.postgresql.org)
* create a public repository to the project (github.com/fabriziomello)
* read what has already been discussed by the community about the project (
http://wiki.postgresql.org/wiki/Todo)
* as already discussed on the pgsql-hackers mailing list, this feature will be
implemented similarly to vacuum full, with the exception that it will set a
different relpersistence for the new relfilenode
* learn about some PostgreSQL internals:
  . grammar (src/backend/parser/gram.y)
  . vacuum full (src/backend/commands/[vacuum.c | cluster.c])

May 19 - June 23
* implementation of the first prototype:
  . change the grammar of PostgreSQL to support ALTER TABLE ... SET LOGGED
  . implement and/or adapt the routines to change an unlogged table to
logged (similar to vacuum full)
* write documentation and the test cases
* submit this first prototype to the commitfest 2014/06 (
https://commitfest.postgresql.org/action/commitfest_view?id=22)

June 23 - June 27
* review with the Mentor of the work done until now

June 27 - August 18
* do the adjustments based on the community feedback during the commitfest
2014/06
* submit to the commitfest 2014/09 for final evaluation, where maybe it will be
committed to version 9.5 (webpage not created yet)

August 18 - August 22
* final review with the Mentor of all work done.


About the proponent

Fabrízio de Royes Mello
e-mail: fabriziome...@gmail.com
twitter: @fabriziomello
github: http://github.com/fabriziomello
linkedin: http://linkedin.com/in/fabriziomello

Currently I help people and teams get the full potential out of relational
databases, especially PostgreSQL: helping teams design the structure of
the database (modeling), build the physical architecture (database schema),
do programming (procedural languages) and SQL (usage, tuning, best practices),
and handle optimization and orchestration of production instances. I do
volunteer work for the Brazilian PostgreSQL community (www.postgresql.org.br),
supporting mailing lists, organizing events (pgbr.postgresql.org.br) and
doing some admin tasks. I also help the PostgreSQL Global Development Group
(PGDG) a little with the implementation of some features and the review of
patches (git.postgresql.org).

/proposal

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Tom Lane
Fabrízio de Royes Mello fabriziome...@gmail.com writes:
 Do you think it is difficult to implement ALTER TABLE ... SET UNLOGGED too?
 Thinking in the scope of one GSoC, of course.

I think it's basically the same thing.  You might hope to optimize it;
but you have to create (rather than remove) an init fork, and there's
no way to do that in exact sync with the commit.  So for safety I think
you have to copy the data into a new relfilenode.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-06 Thread Fabrízio de Royes Mello
On Fri, Mar 7, 2014 at 12:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Fabrízio de Royes Mello fabriziome...@gmail.com writes:
  Do you think is difficult to implement ALTER TABLE ... SET UNLOGGED
too?
  Thinking in a scope of one GSoC, of course.

 I think it's basically the same thing.  You might hope to optimize it;
 but you have to create (rather than remove) an init fork, and there's
 no way to do that in exact sync with the commit.  So for safety I think
 you have to copy the data into a new relfilenode.


Well, the same thing that 'vacuum full' does, but changing relpersistence
to RELPERSISTENCE_UNLOGGED for the new relfilenode. Is that it?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-05 Thread Fabrízio de Royes Mello
On Tue, Mar 4, 2014 at 5:00 PM, Fabrízio de Royes Mello 
fabriziome...@gmail.com wrote:


 On Tue, Mar 4, 2014 at 3:29 PM, Andres Freund and...@2ndquadrant.com
wrote:
 
  On 2014-03-04 12:54:02 -0500, Robert Haas wrote:
   On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com
wrote:
On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
Can't that be solved by just creating the permanent relation in a
new
relfilenode? That's equivalent to a rewrite, yes, but we need to do
that
for anything but wal_level=minimal anyway.
  
   Yes, that would work.  I've tended to view optimizing away the
   relfilenode copy as an indispensable part of this work, but that might
   be wrongheaded.  It would certainly be a lot easier to make this
   happen if we didn't insist on that.
 
  I think it'd already much better than today's situation, and it's a
  required codepath for wal_level  logical anyway. So even if somebody
  wants to make this work without the full copy for minimal, it'd still be
  a required codepath. So I am perfectly ok with a patch just adding that.
 

 Then is this a good idea for a GSoC project ?

 I don't know very well this internals, but I am willing to learn and I
think the GSoC is a good opportunity.

 Any of you are willing to mentoring this project?


Hi all,


I written the proposal to this feature, so I would like to know if someone
can review.

proposal

** Add to PostgreSQL the capacity to making an Unlogged table Logged **

Introduction

This project will allow changing an unlogged table (one that doesn't create
transaction logs - WAL files) and its dependencies to a logged table, in
other words, a regular table that creates WAL files. To make this happen
we'll introduce new SQL syntax:

ALTER TABLE name SET LOGGED;


Benefits to the PostgreSQL Community

The unlogged tables feature was introduced in version 9.1 and provides
better write performance than regular (logged) tables, but unlogged tables
are not crash-safe: their contents are automatically discarded (truncated)
after a server crash, and their contents do not propagate to replication
slaves either.
The ability to turn an unlogged table into a logged one gives us the best
of both features; in other words, we can use an unlogged table to run a
bulk load of thousands of rows (ETL scripts) and get better performance,
and then change it to a logged table to get durability for the loaded
data.


Deliverables

This project will be split into two deliverables:
1) Allow changing an unlogged table to logged when wal_level = minimal
(without propagating its contents to replication slaves)
2) Allow changing an unlogged table to logged when wal_level != minimal
(propagating its contents to replication slaves)


Project Schedule

until May 19:
* create a website for the project (wiki.postgresql.org)
* create a public repository for the project (github.com/fabriziomello)
* read what has already been discussed by the community about the project (
http://wiki.postgresql.org/wiki/Todo)
* discuss with the community the best design for the feature
* learn about some PostgreSQL internals:
  . physical storage for relations (src/backend/catalog/storage.c)
  . transaction system (src/backend/access/transam/xact.c)
  . two-phase commit (src/backend/access/transam/twophase.c)
  . table commands (src/backend/commands/tablecmds.c)
  . grammar (src/backend/parser/gram.y)

May 19 - June 23
* evaluate with the mentor and community whether the already submitted
patch (
http://www.postgresql.org/message-id/263033.9223...@web29013.mail.ird.yahoo.com
) is a good starting point
* implementation of the first deliverable:
  . change the grammar of PostgreSQL to support ALTER TABLE ... SET LOGGED
  . implement the routines to change an unlogged table to logged when
wal_level = minimal
* write documentation and the test cases
* submit this first deliverable to the commitfest 2014/06 (
https://commitfest.postgresql.org/action/commitfest_view?id=22)

June 23 - June 27
* review with the Mentor of the work done until now

June 27 - August 18
* implementation of the second deliverable (wal_level != minimal)
* write documentation and the test cases
* submit this second deliverable to commitfest 2014/09 (webpage not
created yet)

August 18 - August 22
* final review with the Mentor of all work done.


About the proponent

Fabrízio de Royes Mello
e-mail: fabriziome...@gmail.com
twitter: @fabriziomello
github: http://github.com/fabriziomello
linkedin: http://linkedin.com/in/fabriziomello

Currently I help people and teams get the full potential out of relational
databases, especially PostgreSQL: helping teams design the database
structure (modeling), build the physical architecture (database schema),
do programming (procedural languages) and SQL (usage, tuning, best
practices), and optimize and orchestrate instances in production. I do
volunteer work for the Brazilian PostgreSQL Community (www.postgresql.org.br),
supporting mailing 

Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-04 Thread Robert Haas
On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  Is the TODO item make an unlogged table logged [1] a good GSoC project?

 I'm pretty sure we found some problems in that design that we couldn't
 figure out how to solve.  I don't have a pointer to the relevant
 -hackers discussion off-hand, but I think there was one.

 ISTR the discussion going something along the lines of we'd have to WAL
 log the entire table to do that, and if we have to do that, what's the
 point?.

No, not really.  The issue is more around what happens if we crash
part way through.  At crash recovery time, the system catalogs are not
available, because the database isn't consistent yet and, anyway, the
startup process can't be bound to a database, let alone every database
that might contain unlogged tables.  So the sentinel that's used to
decide whether to flush the contents of a table or index is the
presence or absence of an _init fork, which the startup process
obviously can see just fine.  The _init fork also tells us what to
stick in the relation when we reset it; for a table, we can just reset
to an empty file, but that's not legal for indexes, so the _init fork
contains a pre-initialized empty index that we can just copy over.

Now, to make an unlogged table logged, you've got to at some stage
remove those _init forks.  But this is not a transactional operation.
If you remove the _init forks and then the transaction rolls back,
you've left the system in an inconsistent state.  If you postpone the
removal until commit time, then you have a problem if it fails,
particularly if it works for the first file but fails for the second.
And if you crash at any point before you've fsync'd the containing
directory, you have no idea which files will still be on disk after a
hard reboot.
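
For illustration, the forks in question are easy to see from SQL (the
table name below is made up); an unlogged table's init fork sits next to
its main fork with an "_init" suffix:

CREATE UNLOGGED TABLE t_unlogged (id int);
SELECT pg_relation_filepath('t_unlogged');
-- e.g. base/16384/16385, with base/16384/16385_init alongside it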

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-04 Thread Andres Freund
On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
 On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost sfr...@snowman.net wrote:
  * Robert Haas (robertmh...@gmail.com) wrote:
  On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
   Is the TODO item make an unlogged table logged [1] a good GSoC project?
 
  I'm pretty sure we found some problems in that design that we couldn't
  figure out how to solve.  I don't have a pointer to the relevant
  -hackers discussion off-hand, but I think there was one.
 
  ISTR the discussion going something along the lines of we'd have to WAL
  log the entire table to do that, and if we have to do that, what's the
  point?.
 
 No, not really.  The issue is more around what happens if we crash
 part way through.  At crash recovery time, the system catalogs are not
 available, because the database isn't consistent yet and, anyway, the
 startup process can't be bound to a database, let alone every database
 that might contain unlogged tables.  So the sentinel that's used to
 decide whether to flush the contents of a table or index is the
 presence or absence of an _init fork, which the startup process
 obviously can see just fine.  The _init fork also tells us what to
 stick in the relation when we reset it; for a table, we can just reset
 to an empty file, but that's not legal for indexes, so the _init fork
 contains a pre-initialized empty index that we can just copy over.
 
 Now, to make an unlogged table logged, you've got to at some stage
 remove those _init forks.  But this is not a transactional operation.
 If you remove the _init forks and then the transaction rolls back,
 you've left the system an inconsistent state.  If you postpone the
 removal until commit time, then you have a problem if it fails,
 particularly if it works for the first file but fails for the second.
 And if you crash at any point before you've fsync'd the containing
 directory, you have no idea which files will still be on disk after a
 hard reboot.

Can't that be solved by just creating the permanent relation in a new
relfilenode? That's equivalent to a rewrite, yes, but we need to do that
for anything but wal_level=minimal anyway.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-04 Thread Fabrízio de Royes Mello
On Tue, Mar 4, 2014 at 3:31 AM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-03-04 01:10:50 -0300, Fabrízio de Royes Mello wrote:
  Today I do something like that:
 
  1) create unlogged table tmp_foo ...
  2) populate 'tmp_foo' table (ETL scripts or whatever)
  3) start transaction
  4) lock table tmp_foo in access exclusive mode
  5) update pg_class set relpersistence = 'p' where oid =
'tmp_foo':regclass
  6) drop table foo; -- the old foo table
  7) alter table tmp_foo rename to foo;
  8) end transaction
  9) run pg_repack in table 'foo'
 
  I know it's very ugly, but works... and works for standbys too... :-)

 No, it doesn't work. It just may happen to not fail loudly/visibly in
 some cases. You're absolutely risking corruption of this *and* other
 relations when doing so.


Well, this has already worked for some time, but you are correct, there is
a risk of corruption!

But in my case, if everything runs without interruption, the relation is
switched to logged. I do some checks before and after, and if something
goes wrong with this process we clean everything up and start from the
beginning.

Maybe I should run CLUSTER inside the transaction block after updating
pg_class, and execute the DROP and RENAME afterwards, in a second phase.
Maybe that way is safer. Is it?

If a crash occurs and PostgreSQL restarts, I check whether the unlogged
table 'tmp_foo' still exists and then drop it.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-04 Thread Fabrízio de Royes Mello
On Tue, Mar 4, 2014 at 11:50 AM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
  On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost sfr...@snowman.net
wrote:
   * Robert Haas (robertmh...@gmail.com) wrote:
   On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
   fabriziome...@gmail.com wrote:
Is the TODO item make an unlogged table logged [1] a good GSoC
project?
  
   I'm pretty sure we found some problems in that design that we
couldn't
   figure out how to solve.  I don't have a pointer to the relevant
   -hackers discussion off-hand, but I think there was one.
  
   ISTR the discussion going something along the lines of we'd have to
WAL
   log the entire table to do that, and if we have to do that, what's the
   point?.
 
  No, not really.  The issue is more around what happens if we crash
  part way through.  At crash recovery time, the system catalogs are not
  available, because the database isn't consistent yet and, anyway, the
  startup process can't be bound to a database, let alone every database
  that might contain unlogged tables.  So the sentinel that's used to
  decide whether to flush the contents of a table or index is the
  presence or absence of an _init fork, which the startup process
  obviously can see just fine.  The _init fork also tells us what to
  stick in the relation when we reset it; for a table, we can just reset
  to an empty file, but that's not legal for indexes, so the _init fork
  contains a pre-initialized empty index that we can just copy over.
 
  Now, to make an unlogged table logged, you've got to at some stage
  remove those _init forks.  But this is not a transactional operation.
  If you remove the _init forks and then the transaction rolls back,
  you've left the system an inconsistent state.  If you postpone the
  removal until commit time, then you have a problem if it fails,
  particularly if it works for the first file but fails for the second.
  And if you crash at any point before you've fsync'd the containing
  directory, you have no idea which files will still be on disk after a
  hard reboot.

 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.


Did you see this initial patch [1] from Leonardo Francalanci?


Regards,

[1]
http://www.postgresql.org/message-id/263033.9223...@web29013.mail.ird.yahoo.com

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-04 Thread Robert Haas
On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
 On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost sfr...@snowman.net wrote:
  * Robert Haas (robertmh...@gmail.com) wrote:
  On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
   Is the TODO item make an unlogged table logged [1] a good GSoC 
   project?
 
  I'm pretty sure we found some problems in that design that we couldn't
  figure out how to solve.  I don't have a pointer to the relevant
  -hackers discussion off-hand, but I think there was one.
 
  ISTR the discussion going something along the lines of we'd have to WAL
  log the entire table to do that, and if we have to do that, what's the
  point?.

 No, not really.  The issue is more around what happens if we crash
 part way through.  At crash recovery time, the system catalogs are not
 available, because the database isn't consistent yet and, anyway, the
 startup process can't be bound to a database, let alone every database
 that might contain unlogged tables.  So the sentinel that's used to
 decide whether to flush the contents of a table or index is the
 presence or absence of an _init fork, which the startup process
 obviously can see just fine.  The _init fork also tells us what to
 stick in the relation when we reset it; for a table, we can just reset
 to an empty file, but that's not legal for indexes, so the _init fork
 contains a pre-initialized empty index that we can just copy over.

 Now, to make an unlogged table logged, you've got to at some stage
 remove those _init forks.  But this is not a transactional operation.
 If you remove the _init forks and then the transaction rolls back,
 you've left the system an inconsistent state.  If you postpone the
 removal until commit time, then you have a problem if it fails,
 particularly if it works for the first file but fails for the second.
 And if you crash at any point before you've fsync'd the containing
 directory, you have no idea which files will still be on disk after a
 hard reboot.

 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.

Yes, that would work.  I've tended to view optimizing away the
relfilenode copy as an indispensable part of this work, but that might
be wrongheaded.  It would certainly be a lot easier to make this
happen if we didn't insist on that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-04 Thread Andres Freund
On 2014-03-04 12:54:02 -0500, Robert Haas wrote:
 On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
  Can't that be solved by just creating the permanent relation in a new
  relfilenode? That's equivalent to a rewrite, yes, but we need to do that
  for anything but wal_level=minimal anyway.
 
 Yes, that would work.  I've tended to view optimizing away the
 relfilenode copy as an indispensable part of this work, but that might
 be wrongheaded.  It would certainly be a lot easier to make this
 happen if we didn't insist on that.

I think it'd already much better than today's situation, and it's a
required codepath for wal_level  logical anyway. So even if somebody
wants to make this work without the full copy for minimal, it'd still be
a required codepath. So I am perfectly ok with a patch just adding that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-04 Thread Fabrízio de Royes Mello
On Tue, Mar 4, 2014 at 3:29 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-03-04 12:54:02 -0500, Robert Haas wrote:
  On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund and...@2ndquadrant.com
wrote:
   On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
   Can't that be solved by just creating the permanent relation in a new
   relfilenode? That's equivalent to a rewrite, yes, but we need to do
that
   for anything but wal_level=minimal anyway.
 
  Yes, that would work.  I've tended to view optimizing away the
  relfilenode copy as an indispensable part of this work, but that might
  be wrongheaded.  It would certainly be a lot easier to make this
  happen if we didn't insist on that.

 I think it'd already much better than today's situation, and it's a
 required codepath for wal_level  logical anyway. So even if somebody
 wants to make this work without the full copy for minimal, it'd still be
 a required codepath. So I am perfectly ok with a patch just adding that.


Then, is this a good idea for a GSoC project?

I don't know these internals very well, but I am willing to learn, and I
think GSoC is a good opportunity.

Are any of you willing to mentor this project?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal

2014-03-03 Thread Albe Laurenz
 I'm applying for GSoC 2014 with Postgresql and would appreciate your comments 
 on my proposal
 (attached). I'm looking for technical corrections/comments and your opinions 
 on the project's
 viability. In particular, if the community has doubts about its usefulness, I 
 would start working on
 an extra proposal from https://wiki.postgresql.org/wiki/GSoC_2014, perhaps on 
 the RETURNING clause as
 a student named Karlik did last year.

I am sure that Simon had his reasons when he proposed
http://www.postgresql.org/message-id/CA+U5nMJGgJNt5VXqkR=crtdqxfmuyzwef23-fd5nusns+6n...@mail.gmail.com
but I cannot help asking some questions:

1) Why limit the feature to UTF8 strings?
   Shouldn't the technique work for all multibyte server encodings?

2) There is probably something that makes this necessary, but why should the
   decision about how TOAST is sliced be attached to the data type?
   My (probably naive) idea would be to add a new TOAST strategy (e.g. SLICED)
   to PLAIN, MAIN, EXTERNAL and EXTENDED.

The feature only makes sense for string data types, right?

Yours,
Laurenz Albe   

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Fabrízio de Royes Mello
Hi all,

Is the TODO item make an unlogged table logged [1] a good GSoC project?

Regards,

[1]
http://www.postgresql.org/message-id/aanlktinenzbrxdcwohkqbba2bhubfy8_c5jwrxlod...@mail.gmail.com

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Robert Haas
On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 Is the TODO item make an unlogged table logged [1] a good GSoC project?

I'm pretty sure we found some problems in that design that we couldn't
figure out how to solve.  I don't have a pointer to the relevant
-hackers discussion off-hand, but I think there was one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  Is the TODO item make an unlogged table logged [1] a good GSoC project?
 
 I'm pretty sure we found some problems in that design that we couldn't
 figure out how to solve.  I don't have a pointer to the relevant
 -hackers discussion off-hand, but I think there was one.

ISTR the discussion going something along the lines of we'd have to WAL
log the entire table to do that, and if we have to do that, what's the
point?.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
 Is the TODO item make an unlogged table logged [1] a good GSoC project?

 I'm pretty sure we found some problems in that design that we couldn't
 figure out how to solve.  I don't have a pointer to the relevant
 -hackers discussion off-hand, but I think there was one.

 ISTR the discussion going something along the lines of we'd have to WAL
 log the entire table to do that, and if we have to do that, what's the
 point?.

IIRC, the reason you'd have to do that is to make the table contents
appear on slave servers.  If you don't consider replication then it might
seem easier.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Hannu Krosing
On 03/03/2014 05:22 PM, Tom Lane wrote:
 Stephen Frost sfr...@snowman.net writes:
...
 ISTR the discussion going something along the lines of we'd have to WAL
 log the entire table to do that, and if we have to do that, what's the
 point?.
 IIRC, the reason you'd have to do that is to make the table contents
 appear on slave servers.  If you don't consider replication then it might
 seem easier.
So switch on logging and then perform CLUSTER/VACUUM FULL?

Should this work, or is something extra needed?

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Andres Freund
On 2014-03-03 12:08:26 -0500, Stephen Frost wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
  On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
   Is the TODO item make an unlogged table logged [1] a good GSoC project?
  
  I'm pretty sure we found some problems in that design that we couldn't
  figure out how to solve.  I don't have a pointer to the relevant
  -hackers discussion off-hand, but I think there was one.
 
 ISTR the discussion going something along the lines of we'd have to WAL
 log the entire table to do that, and if we have to do that, what's the
 point?.

I don't see that as a particularly problematic problem. The primary
reason to want to convert an unlogged table to a logged one probably is
that it's easier to do so than to recreate the table + dependencies. Also
the overhead of logging full pages will be noticeably smaller than the
overhead of adding all rows individually, even if using
heap_multi_insert().

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Peter Geoghegan
On Mon, Mar 3, 2014 at 8:28 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 Is the TODO item make an unlogged table logged [1] a good GSoC project?

Another interesting project around unlogged tables would be to make it
possible to have unlogged indexes on fully-logged tables. That is
something that there was some discussion of before, that might be
easier.

FWIW, I don't think that TODO page is a very good resource for finding
a starter project. Picking a good project is a skill in and of itself.
A lot of that stuff is aspirational, either because it's difficult,
or, more often, because it's difficult relative to the rewards, which
can be quite low. To be honest, if I have what I imagine to be a great
idea for a project, I don't put it on that page. Maybe I should, but I
don't, and I don't think that is uncommon. This is not because I'm
particularly guarded about sharing the information.

Why do you think that hash indexes still aren't WAL-logged after all
these years (a project that someone made noise about recently in
relation to GSoC), even though that's generally considered to be a
SMOP?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Andres Freund
On 2014-03-03 12:44:26 -0800, Peter Geoghegan wrote:
 On Mon, Mar 3, 2014 at 8:28 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  Is the TODO item make an unlogged table logged [1] a good GSoC project?
 
 Another interesting project around unlogged tables would be to make it
 possible to have unlogged indexes on fully-logged tables. That is
 something that there was some discussion of before, that might be
 easier.

I'd actually say it's harder because it requires modifying the catalog
or transparently introducing hacks similar to what unlogged matviews are
doing, to make sure the index is marked invalid after a crash restart.
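
Invalid indexes are already tracked in the catalog today; an index left
behind by a failed CREATE INDEX CONCURRENTLY, for example, shows up with:

SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid;

The hard part is getting that flag set reliably after a crash restart.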

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Fabrízio de Royes Mello
On Mon, Mar 3, 2014 at 2:40 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

 On 03/03/2014 05:22 PM, Tom Lane wrote:
  Stephen Frost sfr...@snowman.net writes:
 ...
  ISTR the discussion going something along the lines of we'd have to
WAL
  log the entire table to do that, and if we have to do that, what's the
  point?.
  IIRC, the reason you'd have to do that is to make the table contents
  appear on slave servers.  If you don't consider replication then it
might
  seem easier.
 So switch on logging and then perform CLUSTER/VACUUM FULL ?

 Should this work, or is something extra needed ?


Today I do something like this:

1) create unlogged table tmp_foo ...
2) populate 'tmp_foo' table (ETL scripts or whatever)
3) start transaction
4) lock table tmp_foo in access exclusive mode
5) update pg_class set relpersistence = 'p' where oid = 'tmp_foo'::regclass
6) drop table foo; -- the old foo table
7) alter table tmp_foo rename to foo;
8) end transaction
9) run pg_repack on table 'foo'

I know it's very ugly, but it works... and it works for standbys too... :-)

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Fabrízio de Royes Mello
On Mon, Mar 3, 2014 at 5:44 PM, Peter Geoghegan p...@heroku.com wrote:

 On Mon, Mar 3, 2014 at 8:28 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  Is the TODO item make an unlogged table logged [1] a good GSoC
project?

 Another interesting project around unlogged tables would be to make it
 possible to have unlogged indexes on fully-logged tables. That is
 something that there was some discussion of before, that might be
 easier.


Easier than making an unlogged table logged?


 FWIW, I don't think that TODO page is a very good resource for finding
 a starter project. Picking a good project is a skill in and of itself.
 A lot of that stuff is aspirational, either because it's difficult,
 or, more often, because it's difficult relative to the rewards, which
 can be quite low. To be honest, if I have what I imagine to be a great
 idea for a project, I don't put it on that page. Maybe I should, but I
 don't, and I don't think that is uncommon. This is not because I'm
 particularly guarded about sharing the information.


Share your ideas, please!

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Fabrízio de Royes Mello
On Mon, Mar 3, 2014 at 5:47 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-03-03 12:44:26 -0800, Peter Geoghegan wrote:
  On Mon, Mar 3, 2014 at 8:28 AM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
   Is the TODO item make an unlogged table logged [1] a good GSoC
project?
 
  Another interesting project around unlogged tables would be to make it
  possible to have unlogged indexes on fully-logged tables. That is
  something that there was some discussion of before, that might be
  easier.

 I'd actually say it's harder because it requires modifying the catalog
 or transparently introducing hacks similar to what unlogged matviews are
 doing, to make sure the index is marked invalid after a crash restart.


Unlogged matviews were disallowed [1].

[1]
http://www.postgresql.org/message-id/e1uznrd-0008eh...@gemulon.postgresql.org

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Fabrízio de Royes Mello
On Mon, Mar 3, 2014 at 2:42 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-03-03 12:08:26 -0500, Stephen Frost wrote:
  * Robert Haas (robertmh...@gmail.com) wrote:
   On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
   fabriziome...@gmail.com wrote:
Is the TODO item make an unlogged table logged [1] a good GSoC
project?
  
   I'm pretty sure we found some problems in that design that we couldn't
   figure out how to solve.  I don't have a pointer to the relevant
   -hackers discussion off-hand, but I think there was one.
 
  ISTR the discussion going something along the lines of we'd have to WAL
  log the entire table to do that, and if we have to do that, what's the
  point?.

 I don't see that as a particularly problematic problem. The primary
 reason to want to convert a unlogged to a logged table probably is that
 it's easier to do so than to recreate the table + dependencies. Also the
 overhead of logging full pages will be noticeably smaller than the
 overhead of adding all rows individually, even if using
 heap_multi_insert().


Do you know of something similar in the source code?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Andres Freund
On 2014-03-04 01:10:50 -0300, Fabrízio de Royes Mello wrote:
 Today I do something like that:
 
 1) create unlogged table tmp_foo ...
 2) populate 'tmp_foo' table (ETL scripts or whatever)
 3) start transaction
 4) lock table tmp_foo in access exclusive mode
 5) update pg_class set relpersistence = 'p' where oid = 'tmp_foo':regclass
 6) drop table foo; -- the old foo table
 7) alter table tmp_foo rename to foo;
 8) end transaction
 9) run pg_repack in table 'foo'
 
 I know it's very ugly, but works... and works for standbys too... :-)

No, it doesn't work. It just may happen to not fail loudly/visibly in
some cases. You're absolutely risking corruption of this *and* other
relations when doing so.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal

2014-02-28 Thread Florian Pflug
On Feb28, 2014, at 05:29 , Tan Tran tankimt...@gmail.com wrote:
 I'm applying for GSoC 2014 with Postgresql and would appreciate your comments
 on my proposal (attached).
 pg_gsoc2014_TanTran.pdf

First, please include your proposal as plain, inline text next time.
That makes it easier to quote the relevant parts when replying, and
also allows your mail to be indexed correctly by the mailing list
archive.

Regarding your proposal, I think you need to explain what exactly it
is you want to achieve in more detail.

 In particular, text and bytea are EXTERNAL by default, so that substring
 operations can seek straight to the exact slice (which is O(1)) instead
 of de-toasting the whole datum (which is O(file size)). Specifically,
 varlena.c’s text_substring(...) and bytea_substring(...) call
 DatumGetTextPSlice(...), which retrieves only the slice(s) at an
 easily-computed offset.
 
 ...
 
 1. First, I will optimize array element retrieval and UTF-8 substring
 retrieval. Both are straightforward, as they involve calculating slice
 numbers and using similar code to above.

I'm confused by that - text_substring *already* attempts to only fetch
the relevant slice in the case of UTF-8. It can't do so precisely - it
needs to use a conservative estimate - but I fail to see how that can
be avoided. Since UTF-8 maps a character to anything from 1 to 6 bytes,
you can't compute the byte offset of a given character index precisely.
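
As a concrete illustration of the user-visible facility under discussion
(table and column names made up): with EXTERNAL storage a substring can be
served from a few TOAST slices instead of detoasting the whole value.

CREATE TABLE docs (id int, body text);
ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTERNAL;  -- out-of-line, uncompressed
SELECT substring(body FROM 100000 FOR 50) FROM docs WHERE id = 1;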

You could store a constant number of *characters* per slice, instead of
a constant number of *bytes*, but due to the rather large worst-case of
6 bytes per character, that would increase the storage and access overhead
6 fold for languages which can largely be represented with 1 byte per
character. That's not going to go down well...

I haven't looked at how we currently handle arrays, but the problems
there are similar. For arrays containing variable-length types, you can't
compute the byte offset from the index. It's even worse than for varchar,
because the range of possible element lengths is much wider - one array
element might be only a few bytes long, while another may be 1kB or more...

 2. Second, I will implement a SPLITTER clause for the CREATE TYPE
 statement. As 1 proposes, one would define a type, for example:
   CREATE TYPE my_xml
 LIKE xml
 SPLITTER my_xml_splitter;

As far as I can tell, the idea is to allow a datatype to influence how
it's split into chunks for TOASTing so that functions can fetch only
the required slices more easily. To judge whether that is worthwhile or
not, you'd have to provide a concrete example of when such a facility
would be useful.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC Proposal - Caching query results in pgpool-II

2011-04-07 Thread Magnus Hagander
2011/4/7 Tatsuo Ishii is...@postgresql.org:
 In my understanding pqc is not designed to be working with pgpool.
 Thus if a user want to use both query cache and query dispatching,
 replication or failover etc. which are provided by pgpool, it seems
 it's not possible. For this purpose maybe user could *cascade* pqc and
 pgpool, but I'm not sure. Even if it's possible, it will bring huge
 performance penalty.

 Another point is cache invalidation. Masanori's proposal includes
 cache invalidation technique by looking at write queries, which is
 lacking in pqc in my understanding.

Probably. My question wasn't necessarily "hasn't this already been
done in pqc", but more "should this perhaps build on or integrate with pqc
in order not to duplicate effort". I think at the very least, any
overlap should be researched and identified - because if it can
integrate parts of pqc, or work with it, more effort can be spent on the
new parts rather than redoing something that's already been done.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC Proposal - Caching query results in pgpool-II

2011-04-07 Thread Selena Deckelmann
2011/4/5 Masanori Yamazaki myamazak...@gmail.com:
 Hello

  I am sending my proposal about Google Summer Of Code2011.
 It would be nice if you could give me your opinion.

Fantastic!  Please submit your proposal through the GSoC website:

http://www.google-melange.com/gsoc/profile/student/google/gsoc2011

-selena



-- 
http://chesnok.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC Proposal - Caching query results in pgpool-II

2011-04-06 Thread Magnus Hagander
How does this relate to the existing pqc project (
http://code.google.com/p/pqc/)? Seems the goals are fairly similar, and both
are based off pgpool?

/Magnus
 On Apr 6, 2011 2:10 AM, Masanori Yamazaki m.yamazak...@gmail.com wrote:
 Hello

 My name is Masanori Yamazaki. I am sending my proposal about
 Google Summer Of Code2011. It would be nice if you could give
 me your opinion.


 ・title

 Caching query results in pgpool-II


 ・Synopsis

 Pgpool-II has query caching functionality using storage provided by
 dedicated PostgreSQL (system database). This has several drawbacks
 however. 1)it is slow because it needs to access disk storage 2)it
 does not invalidate the cache automatically.

 This proposal tries to solve these problems.

 - To speed up the cache access, it will be placed on memory, rather
 than database. The memory will be put on shared memory or external
 memory services such as memcached so that the cache can be shared by
 multiple sessions. Old cache entries will be deleted by LRU manner.

 - The cache will be invalidated automatically upon the timing when the
 relevant tables are updated. Note that this is not always possible
 because the query result might come from multiple tables, views or
 even functions. In this case the cache will be invalidated by
 timeout(or they are not cached at all).

 - Fine tuning knobs need to be invented to control the cache behavior
 though they are not clear at this moment.


 ・Benefits to the PostgreSQL Community:


 Query caching will effectively enhance the performance of PostgreSQL
 and this project will contribute to increase the number of users of
 PostgreSQL, who need more high performance database systems.

 Note that implementing query cache in pgpool-II will bring merits not
 only to the latest version of PostgreSQL but to the previous releases
 of PostgreSQL.


 ・Project Schedule

 -April
 preparation

 -May 1 - May 22
 write a specification

 -May 23 - June 19
 coding

 -June 20 - July 22
 test

 -July 23 - August 12
 complete of coding and test, commit


 ・Personal Data and Biographical Information

 Name : Masanori Yamazaki
 Born : 23.1.1981
 School :Currently I learn contemporary philosophy, culture and literature
 at Waseda University in Japan.
 Coding :
 1.About five years job as web application programer(PHP, Java).
 2.I experienced projects used framework such as Symfony, Zend Framework,
 CakePHP, and Struts.
 3.I am interested in OSS and like coding.


 Regards


[HACKERS] GSoC Proposal - Caching query results in pgpool-II

2011-04-06 Thread Masanori Yamazaki
Hello

 I am sending my proposal about Google Summer Of Code2011.
It would be nice if you could give me your opinion.

・title

Caching query results in pgpool-II


・Synopsis

Pgpool-II has query caching functionality using storage provided by a
dedicated PostgreSQL instance (system database). This has several
drawbacks, however: 1) it is slow because it needs to access disk storage;
2) it does not invalidate the cache automatically.

This proposal tries to solve these problems.

- To speed up cache access, the cache will be placed in memory rather
  than in the database. The memory will be shared memory or an external
  memory service such as memcached, so that the cache can be shared by
  multiple sessions. Old cache entries will be deleted in an LRU manner.

- The cache will be invalidated automatically when the relevant tables
  are updated. Note that this is not always possible, because the query
  result might come from multiple tables, views or even functions. In
  that case the cache will be invalidated by timeout (or not cached at
  all).

- Fine-tuning knobs need to be invented to control the cache behavior,
  though they are not clear at this moment.


・Benefits to the PostgreSQL Community:


Query caching will effectively enhance the performance of PostgreSQL,
and this project will contribute to increasing the number of PostgreSQL
users who need higher-performance database systems.

Note that implementing a query cache in pgpool-II will bring benefits not
only to the latest version of PostgreSQL but also to previous releases
of PostgreSQL.


・Project Schedule

-April
 preparation

-May 1 - May 22
 write a specification

-May 23 - June 19
 coding

-June 20 - July 22
 test

-July 23 - August 12
 Completion of coding and testing, commit


・Personal Data and Biographical Information

 Name : Masanori Yamazaki
 Born : 23.1.1981
 School : I am currently studying contemporary philosophy, culture and
literature at Waseda University in Japan.
 Coding :
 1. About five years working as a web application programmer (PHP, Java).
 2. I have worked on projects using frameworks such as Symfony, Zend
Framework, CakePHP, and Struts.
 3. I am interested in OSS and like coding.


Regards


Re: [HACKERS] GSoC Proposal - Caching query results in pgpool-II

2011-04-06 Thread Tatsuo Ishii
I like this proposal. It would bring big benefits to both the
PostgreSQL and pgpool projects.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

 Hello
 
  I am sending my proposal about Google Summer Of Code2011.
 It would be nice if you could give me your opinion.
 
 ・title
 
 Caching query results in pgpool-II
 
 
 ・Synopsis
 
 Pgpool-II has query caching functionality using storage provided by
 dedicated PostgreSQL (system database). This has several drawbacks
 however. 1)it is slow because it needs to access disk storage 2)it
 does not invalidate the cache automatically.
 
 This proposal tries to solve these problems.
 
 - To speed up the cache access, it will be placed on memory, rather
   than database. The memory will be put on shared memory or external
   memory services such as memcached so that the cache can be shared by
   multiple sessions. Old cache entries will be deleted by LRU manner.
 
 - The cache will be invalidated automatically upon the timing when the
   relevant tables are updated. Note that this is not always possible
   because the query result might come from multiple tables, views or
   even functions. In this case the cache will be invalidated by
   timeout(or they are not cached at all).
 
 - Fine tuning knobs need to be invented to control the cache behavior
   though they are not clear at this moment.
 
 
 ・Benefits to the PostgreSQL Community:
 
 
 Query caching will effectively enhance the performance of PostgreSQL
 and this project will contribute to increase the number of users of
 PostgreSQL, who need more high performance database systems.
 
 Note that implementing query cache in pgpool-II will bring merits not
 only to the latest version of PostgreSQL but to the previous releases
 of PostgreSQL.
 
 
 ・Project Schedule
 
 -April
  preparation
 
 -May 1 - May 22
  write a specification
 
 -May 23 - June 19
  coding
 
 -June 20 - July 22
  test
 
 -July 23 - August 12
  Complete of coding and test, commit
 
 
 ・Personal Data and Biographical Information
 
  Name : Masanori Yamazaki
  Born : 23.1.1981
  School :Currently I learn contemporary philosophy, culture and literature
 at Waseda University in Japan.
  Coding :
  1.About five years job as web application programer(PHP, Java).
  2.I experienced projects used framework such as Symfony, Zend Framework,
 CakePHP, and Struts.
  3.I am interested in OSS and like coding.
 
 
 Regards

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC Proposal - Caching query results in pgpool-II

2011-04-06 Thread Tatsuo Ishii
In my understanding pqc is not designed to work with pgpool.
Thus if a user wants to use both the query cache and the query dispatching,
replication or failover etc. which are provided by pgpool, it seems
it's not possible. For this purpose maybe the user could *cascade* pqc and
pgpool, but I'm not sure. Even if it's possible, it will bring a huge
performance penalty.

Another point is cache invalidation. Masanori's proposal includes a
cache invalidation technique based on looking at write queries, which is
lacking in pqc in my understanding.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

 How does this relate to the existing pqc project (
 http://code.google.com/p/pqc/)? Seems the goals are fairly similar, and both
 are based off pgpool?
 
 /Magnus
  On Apr 6, 2011 2:10 AM, Masanori Yamazaki m.yamazak...@gmail.com wrote:
 Hello

 My name is Masanori Yamazaki. I am sending my proposal about
 Google Summer Of Code2011. It would be nice if you could give
 me your opinion.


 ・title

 Caching query results in pgpool-II


 ・Synopsis

 Pgpool-II has query caching functionality using storage provided by
 dedicated PostgreSQL (system database). This has several drawbacks
 however. 1)it is slow because it needs to access disk storage 2)it
 does not invalidate the cache automatically.

 This proposal tries to solve these problems.

 - To speed up the cache access, it will be placed on memory, rather
 than database. The memory will be put on shared memory or external
 memory services such as memcached so that the cache can be shared by
 multiple sessions. Old cache entries will be deleted by LRU manner.

 - The cache will be invalidated automatically upon the timing when the
 relevant tables are updated. Note that this is not always possible
 because the query result might come from multiple tables, views or
 even functions. In this case the cache will be invalidated by
 timeout(or they are not cached at all).

 - Fine tuning knobs need to be invented to control the cache behavior
 though they are not clear at this moment.


 ・Benefits to the PostgreSQL Community:


 Query caching will effectively enhance the performance of PostgreSQL
 and this project will contribute to increase the number of users of
 PostgreSQL, who need more high performance database systems.

 Note that implementing query cache in pgpool-II will bring merits not
 only to the latest version of PostgreSQL but to the previous releases
 of PostgreSQL.


 ・Project Schedule

 -April
 preparation

 -May 1 - May 22
 write a specification

 -May 23 - June 19
 coding

 -June 20 - July 22
 test

 -July 23 - August 12
 complete of coding and test, commit


 ・Personal Data and Biographical Information

 Name : Masanori Yamazaki
 Born : 23.1.1981
 School :Currently I learn contemporary philosophy, culture and literature
 at Waseda University in Japan.
 Coding :
 1.About five years job as web application programer(PHP, Java).
 2.I experienced projects used framework such as Symfony, Zend Framework,
 CakePHP, and Struts.
 3.I am interested in OSS and like coding.


 Regards

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal: Fast GiST index build

2011-04-05 Thread Alexander Korotkov
On Mon, Apr 4, 2011 at 8:50 PM, Robert Haas robertmh...@gmail.com wrote:

 OK.  Could you briefly describe the algorithm you propose to
 implement, bearing in mind that I haven't read the paper?


The technique can be very briefly described by the following rules.
M = number of index keys fitting in RAM;
B = number of index keys in one page;
1) Additional buffers of M/(2*B) pages each are attached to all nodes of
certain levels. Levels are selected with a step of floor(log_B(M/4B)); leaf
nodes don't contain buffers. I.e. nodes at levels i*floor(log_B(M/4B)),
i = 1,2,3,... contain buffers (numbering goes from bottom to top; level 0
contains the leaf nodes).
2) When an entry reaches a node with a buffer, it is placed into the buffer.
3) When a buffer overflows, it is emptied down into lower buffers or leaf
pages.
4) When a split occurs in a node with a buffer, this buffer splits into two
buffers using the penalty function.


With best regards,
Alexander Korotkov.


[HACKERS] GSoC Proposal - Caching query results in pgpool-II

2011-04-05 Thread Masanori Yamazaki
Hello

 My name is Masanori Yamazaki. I am sending my proposal about
Google Summer Of Code2011. It would be nice if you could give
me your opinion.


・title

Caching query results in pgpool-II


・Synopsis

Pgpool-II has query caching functionality using storage provided by a
dedicated PostgreSQL instance (system database). This has several
drawbacks, however: 1) it is slow because it needs to access disk storage;
2) it does not invalidate the cache automatically.

This proposal tries to solve these problems.

- To speed up cache access, the cache will be placed in memory rather
  than in the database. The memory will be shared memory or an external
  memory service such as memcached, so that the cache can be shared by
  multiple sessions. Old cache entries will be deleted in an LRU manner.

- The cache will be invalidated automatically when the relevant tables
  are updated. Note that this is not always possible, because the query
  result might come from multiple tables, views or even functions. In
  that case the cache will be invalidated by timeout (or not cached at
  all).

- Fine-tuning knobs need to be invented to control the cache behavior,
  though they are not clear at this moment.


・Benefits to the PostgreSQL Community:


Query caching will effectively enhance the performance of PostgreSQL,
and this project will contribute to increasing the number of PostgreSQL
users who need higher-performance database systems.

Note that implementing a query cache in pgpool-II will bring benefits not
only to the latest version of PostgreSQL but also to previous releases
of PostgreSQL.


・Project Schedule

-April
 preparation

-May 1 - May 22
 write a specification

-May 23 - June 19
 coding

-June 20 - July 22
 test

-July 23 - August 12
 completion of coding and testing, commit


・Personal Data and Biographical Information

 Name : Masanori Yamazaki
 Born : 23.1.1981
 School : I am currently studying contemporary philosophy, culture and
literature at Waseda University in Japan.
 Coding :
 1. About five years working as a web application programmer (PHP, Java).
 2. I have worked on projects using frameworks such as Symfony, Zend
Framework, CakePHP, and Struts.
 3. I am interested in OSS and like coding.


Regards


Re: [HACKERS] GSoC proposal: Fast GiST index build

2011-04-05 Thread Oleg Bartunov

Just to clarify the situation a bit. I noticed the buffer tree technique while
researching sp-gist and got the idea to use it to improve CREATE INDEX for GiST,
which is something we have wanted many times. Alexander is working on his thesis
and this project suits him and the community ideally. Since I and Teodor are very
busy at the moment, it's very important to have one more GiST developer
available, especially keeping in mind the energy and motivation of Alexander.
He has already made several contributions and I have no doubt his work will be
useful for us. So, I suggest supporting his work!


Oleg

On Wed, 6 Apr 2011, Alexander Korotkov wrote:


On Mon, Apr 4, 2011 at 8:50 PM, Robert Haas robertmh...@gmail.com wrote:


OK.  Could you briefly describe the algorithm you propose to
implement, bearing in mind that I haven't read the paper?



The technique can be very briefly described by the following rules.
M = number of index keys fitting in RAM;
B = number of index keys in one page;
1) Additional buffers of M/(2*B) pages each are attached to all nodes of some
levels. Levels are selected with step floor(log(M/4B, B)); leaf nodes don't
contain buffers. I.e. nodes at levels i*floor(log(M/4B, B)), i = 1,2,3,...
contain buffers (numbering goes from the bottom up, level 0 containing the
leaf nodes).
2) When an entry reaches a node with a buffer, it is placed into the buffer.
3) When a buffer overflows, its contents run down into lower buffers or leaf pages.
4) When a split occurs in a node with a buffer, that buffer is split into two
buffers using the penalty function.


With best regards,
Alexander Korotkov.



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GSoC proposal: Fast GiST index build

2011-04-04 Thread Alexander Korotkov
Hello!

Here is the text of my proposal, which I've recently submitted to GSoC and have
mentioned before in
http://archives.postgresql.org/message-id/AANLkTimqFRmFdrYaesnJB8H4BuJo3j1SBdR1qmv=k...@mail.gmail.com
Any comments are welcome.

*Project name*

Fast GiST index build

*Synopsis*

Currently GiST indexes don't have any bulk load functionality. A new index has
to be created by inserting entries one by one. This makes new index creation
relatively slow in comparison with btree and even GIN. There are various works
in computer science about bulk operations on R-trees. Since GiST is basically a
generalization of the R-tree, some of them seem to be applicable to GiST. In
[2], R-tree bulk operation techniques are presented. Despite some issues related
to GiST's differences from the R-tree, the techniques of this paper seem to be
applicable to GiST.

*Benefits to the PostgreSQL Community*

A faster GiST index build would be valuable for many PostgreSQL applications.
For example, some GIS systems suffer from index builds taking many hours or
even days.

*Quantifiable results*

Acceleration of GiST index build.

*Project Details*

Paper [2] presents R-tree bulk operation techniques which are, in general, an
application of the buffer tree [1] to the R-tree. The technique proposed in the
paper [2] can be very briefly described as follows. Additional buffers are
attached to nodes at some levels (levels are selected with some step). When an
entry falls into a node with a buffer, it is placed into the buffer. When a
buffer overflows, its contents run down into lower buffers or leaf pages.

The results of the paper [2] can have the following applications for PostgreSQL
GiST:

1) Fast GiST index creation. The proposed bulk insert technique can be most
straightforwardly applied to GiST index creation. Since the R-tree was
considered in the paper, the key length is fixed. In GiST we can deal with
varlena keys, which can lead to complications. For example, a page split can
occur while placing an index entry into the appropriate buffer. Another
difficulty with varlena keys is that the buffer size and the level step are
calculated from the number of entries per page. With varlena keys this number
is not constant. Since these calculations are used only to guarantee operation
performance, we can estimate the number of entries per page (using the average
key length of a sample of keys; a rough numeric sketch of this estimate appears
at the end of this section). The performance guarantees would then be weaker.
2) Bulk insert. Since the access method interface in PostgreSQL doesn't contain
a bulk insert function, bulk inserts are performed by inserting entries one by
one, and the access method doesn't know how many entries to expect. This
circumstance makes applying the bulk insert technique difficult for PostgreSQL
GiST. With the current access method interface, bulk insert could be
implemented using a technique like fastupdate in GIN. But note that such a
technique forces concurrent index searches to scan the buffers. Since the
buffer size is significant (the size of the lowest buffer is comparable with
the size of the whole subtree), it can cause a significant slowdown. Finding a
compromise between index build acceleration and concurrent search slowdown is a
separate research question, and it doesn't seem feasible to fit it into the
GSoC.
3) Bulk delete. The current bulk delete operation in PostgreSQL GiST is fast,
but it doesn't perform index restructuring, which can degrade the index during
bulk deletes. The bulk delete technique in the paper [2] focuses on
accelerating a bulk delete procedure that does perform index restructuring.
The problem with applying this approach to GiST is that the GiST
implementation doesn't provide any explicit guarantees about storage
utilization. If such guarantees exist, they are hidden in the picksplit access
method. Applying the bulk delete technique would require the GiST access
method to know about the storage utilization guarantees of the implementation.
In turn that requires a GiST operator class refactoring, and it doesn't seem
feasible to fit it into the GSoC.
Accordingly, in this GSoC project fast GiST index creation would be
implemented. The results of this implementation can help to find a way to
implement the additional capabilities noted above.
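As a rough numeric illustration of the entries-per-page estimate mentioned in
item 1 above (the constants are the usual defaults for an 8 kB PostgreSQL page:
a 24-byte page header, 4-byte line pointer and 8-byte index tuple header; the
average key length is a made-up sample value, and GiST special space and
alignment padding are glossed over):

    entries_per_page ~= (8192 - 24) / (4 + 8 + avg_key_len)
                     ~= 8168 / 28 ~= 290        with avg_key_len = 16 bytes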

*Inch-stones*

1) Solve architecture questions with the help of the community. In this step
I'm going to produce an approach to varlena key handling and discuss issues of
buffer handling.
2) First, approximate implementation. At this step the implementation might
not always work properly with varlena keys, and on some corner cases a
significant slowdown is possible.
3) Accurate implementation of varlena key handling and detailed testing of
corner cases. This step involves detailed consideration of the particular
corner-case test cases which might cause a dip in performance, and
implementing their solutions.
4) Final refactoring, documentation, testing and commit.

*Project Schedule*

until May 31

Solve architecture questions with the help of the community.

1 june - 30 june

First, approximate implementation.

1 july - 31 july

Implementation of 

Re: [HACKERS] GSoC proposal: Fast GiST index build

2011-04-04 Thread Robert Haas
On Mon, Apr 4, 2011 at 7:16 AM, Alexander Korotkov aekorot...@gmail.com wrote:
 Project name
 Fast GiST index build

Would/could/should this be implemented in a manner similar to the
existing GIN fast update feature?

It's occurred to me to wonder whether even btree indexes would benefit
from this type of optimization.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal: Fast GiST index build

2011-04-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Apr 4, 2011 at 7:16 AM, Alexander Korotkov aekorot...@gmail.com 
 wrote:
 Project name
 Fast GiST index build

 Would/could/should this be implemented in a manner similar to the
 existing GIN fast update feature?

Fast build and fast update tend to be two different problems ...

 It's occurred to me to wonder whether even btree indexes would benefit
 from this type of optimization.

GIN fast update is a win when you can optimize the insertion of multiple
occurrences of the same key.  There isn't really any corresponding
optimization possible in btree, AFAICS.  (Heikki did some work awhile
back on btrees with multiple TIDs per key, for low-cardinality tables,
which might conceivably admit of a similar optimization.  But I haven't
heard anything about that in a long time.  It wasn't real clear to me
where the win over GIN would be for that.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal: Fast GiST index build

2011-04-04 Thread Alexander Korotkov
On Mon, Apr 4, 2011 at 7:04 PM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Apr 4, 2011 at 7:16 AM, Alexander Korotkov aekorot...@gmail.com
 wrote:
  Project name
  Fast GiST index build

 Would/could/should this be implemented in a manner similar to the
 existing GIN fast update feature?


I've mentioned this problem in item #2 of the project details. In short, the
problem is with concurrent selects. The buffer size is significant, and scanning
the buffers in a concurrent select can cause a significant slowdown. Probably a
compromise can be achieved by using smaller buffers or something like that, but
that's a topic for separate research. It doesn't seem feasible for me to deliver
a production solution to this problem during GSoC.


With best regards,
Alexander Korotkov.


Re: [HACKERS] GSoC proposal: Fast GiST index build

2011-04-04 Thread Robert Haas
On Mon, Apr 4, 2011 at 12:46 PM, Alexander Korotkov
aekorot...@gmail.com wrote:
 On Mon, Apr 4, 2011 at 7:04 PM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Apr 4, 2011 at 7:16 AM, Alexander Korotkov aekorot...@gmail.com
 wrote:
  Project name
  Fast GiST index build

 Would/could/should this be implemented in a manner similar to the
 existing GIN fast update feature?

 I've mentioned this problem in item #2 of the project details. In short, the
 problem is with concurrent selects. The buffer size is significant, and scanning
 the buffers in a concurrent select can cause a significant slowdown. Probably a
 compromise can be achieved by using smaller buffers or something like that, but
 that's a topic for separate research. It doesn't seem feasible for me to deliver
 a production solution to this problem during GSoC.

OK.  Could you briefly describe the algorithm you propose to
implement, bearing in mind that I haven't read the paper?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-21 Thread Robert Haas
2010/4/20 Pavel baro...@seznam.cz:
 For now I know it is not committable in its current state, but for my thesis it
 is enough, and I know it will not be committable with this design at all. In the
 case of GSoC it will depend on how much time I am able to spend on it, whether I
 will consider some other design.

I am not sure about this, but I would think we would not want to
accept the project unless you intend to try to make it committable.  I
haven't looked at your actual code to see how much work I think that
would take.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-20 Thread Pavel

Greg Smith wrote:


pavelbaros wrote:
I am also waiting for approval for my repository named 
materialized_view on git.postgresql.org, so I could publish 
completed parts. 


Presuming that you're going to wander there and get assigned what 
looks like an official repo name for this project is a 
bit...optimistic.  I would recommend that you publish to something 
like github instead (you can fork http://github.com/postgres/postgres 
), and if the work looks good enough that it gets picked up by the 
community maybe you migrate it onto the main site eventually.  
git.postgresql.org is really not setup to be general hosting space for 
everyone who has a PostgreSQL related project; almost every repo on 
there belongs to someone who has already been a steady project 
contributor for a number of years.


Yes, you're right, I'm kind of a newbie in this kind of project and 
especially in PostgreSQL. But I think it is the best way to get into 
PostgreSQL. When I chose my bachelor thesis I did not know I could 
participate in GSoC or try to make it committable. Anyway I will make a repo 
on github, so everybody can look at it, as soon as possible.


http://github.com/pbaros/postgres
 
(Switching to boilerplate mode for a paragraph...) You have picked a 
PostgreSQL feature that is dramatically more difficult than it appears 
to be, and I wouldn't expect you'll actually finish even a fraction of 
your goals in a summer of work.  You're at least in plentiful 
company--most students do the same.  As a rule, if you see a feature 
on our TODO list that looks really useful and fun to work on, it's 
only still there because people have tried multiple times to build it 
completely but not managed to do so because it's harder than it 
appears.  This is certainly the case with materialized views.


You've outlined a reasonable way to build a prototype that does a 
limited implementation here.  The issue is what it will take to extend 
that into being production quality for the real-world uses of 
materialized views.  How useful your prototype is depends on how well 
it implements a subset of that in a way that will get used by the 
final design.


The main hidden complexity in this particular project relates to 
handling view refreshes.  The non-obvious problem is that when the 
view updates, you need something like a SQL MERGE to really handle 
that in a robust way that doesn't conflict with concurrent access to 
queries against the materialized view.  And work on MERGE support is 
itself blocked behind the fact that PostgreSQL doesn't have a good way 
to lock access to a key value that doesn't exist yet--what other 
databases call key range locking.  See the notes for Add SQL-standard 
MERGE/REPLACE/UPSERT command at http://wiki.postgresql.org/wiki/Todo 
for more information.


You can work around that to build a prototype by grabbing a full table 
lock on the materialized view when updating it, but that's not a 
production quality solution.  Solving that little detail is actually 
more work than the entire project you've outlined.  Your suggested 
implementation--In function CloseIntoRel executor swap relfilenode's 
of temp table and original table and finally delete temp table--is 
where the full table lock is going to end up at.  The exact use cases 
that need materialized views cannot handle a CLUSTER-style table 
recreation each time that needs an exclusive lock to switchover, so 
that whole part of your design is going to be a prototype that doesn't 
work at all like what needs to get built to make this feature 
committable.  It's also not a reasonable assumption that you have 
enough disk space to hold a second copy of the MV in a production system.


For now I know it is not committable in its current state, but for my thesis 
it is enough, and I know it will not be committable with this design at all. 
In the case of GSoC it will depend on how much time I am able to spend on it, 
whether I will consider some other design.




Once there's a good way to merge updates, how to efficiently generate 
them against the sort of large data sets that need materialized 
views--so you just write out the updates rather than a whole new 
copy--is itself a large project with a significant quantity of 
academic research to absorb before starting.  Dan Colish at Portland 
State has been playing around with prototypes for the specific problem 
of finding a good algorithm for view refreshing that is compatible 
with PostgreSQL's execution model.  He's already recognized the table 
lock issue here and for the moment is ignoring that part.  I don't 
have a good feel yet for how long the targeted update code will take 
to mature, but based on what I do know I suspect that little detail is 
also a larger effort than the entire scope you're envisioning.  
There's a reason why the MIT Press compendium Materialized Views: 
Techniques, Implementations, and Applications is over 600 pages 
long--I hope you've already started digging through that material.


I would like 

Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-13 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 There are basically 2 major parts for materialized views:

 A) Planner: Getting the query planner to swap in the MatView for part of
 a query automatically for query plan portions which the MatView supports;

 B) Maintenance: maintaining the MatView data according to the programmed
 scheme (synch, asynch, periodic).

 I do not believe it is possible to do both of the above in one summer.
 Of the two, (A) would be more useful since it is possible to manually
 implement (B) using triggers, queues and cron jobs today.

A patch that implements only (A) will be DOA.  The reason is that the
planner can *never* swap in a MatView on its own authority, unless it
can prove that this does not change the semantics of the query.  Which
it obviously will be unable to do unless there's a fully transparent
continuous-update scheme in place.

So the correct approach is to work on (B) first.  When and if we get to
a sufficiently transparent update implementation, we can think about
changing the planner.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Pavel Stehule
2010/4/12 Robert Haas robertmh...@gmail.com:
 On Sun, Apr 11, 2010 at 5:24 AM, Greg Smith g...@2ndquadrant.com wrote:
 From the rest of your comments, I'm comfortable that you're in sync with the
 not necessarily obvious risky spots here I wanted to raise awareness of.
  It's unreasonable to expect we'll have exactly the same priorities  here,
 and I doubt it's useful to debate how I perceive the merit of various
 development subsets here compared to yourself.  I don't think it's really
 important whether anyone agrees with me or not about exactly the value of a
 full table lock implementation.  The main thing I'm concerned about is just
 that it's noted as a known risky part, one that could end up blocking the
 project's ability to commit even a subset of the proposed patch here.

 I think that one of the things that we need to get our hands around is
 how we're going to distinguish the snapshot flavor of materialized
 view from the continuous update flavor.  By definition, the latter
 will only ever be supportable for a fairly restricted subset of all
 possible queries, and I am assuming that we will not want to decide
 what the behavior is going to be based on the query but rather based
 on what the user specifies.  Anything else seems like it would have
 the potential for severe POLA violations.  So we need to think now
 about how we'll distinguish between the two flavors.  I imagine some
 sort of syntactic marker would be appropriate; not sure what.

 Reading this thread, I'm starting to grow concerned that some people
 may feel that manually refreshed materialized views are not even worth
 bothering with, because (the argument goes) you could just use some
 table and write a function that updates it.  There's probably some
 truth to that, but I guess my thought is that it would have some value
 as a convenience feature; and eventually we might optimize it to the
 point where it would make more sense to use the built-in feature
 rather than rolling your own.  However, if we're going to have
 complaints that manually refreshed materialized views suck and we
 should only ever support materialized views to the extent that we can
 make them automatically update on-the-fly, then let's have those
 complaints now before someone spends several months of their life on
 the project only to be told that we don't want it.  Let's be clear: I
 think it's useful, but, if other people disagree, we need to iron that
 out now.

 ...Robert

I think manually refreshed materialized views make sense. It is
similar to replication - there were replication solutions like Slony, but for
some people the integrated replication in 9.0 is more important. Moreover,
manually refreshed (periodically refreshed) views can share a lot of
infrastructure with dynamically maintained views. I am sure that
dynamic materialized views are a bad task for GSoC - too large,
too complex. Manually refreshed views are adequate for two months of work,
and they make sense.

Regards
Pavel Stehule


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Robert Haas
On Mon, Apr 12, 2010 at 2:16 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 I am sure so
 dynamical materialised views is bad task for GSoC - it is too large,
 too complex. Manually refreshed views is adequate to two months work
 and it has sense.

That is my feeling also - though I fear that even the simplest
possible implementation of this feature may be a stretch.  Anyway we
agree: keep it simple.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Josh Berkus
On 4/9/10 1:36 PM, pavelbaros wrote:
 2) change rewriter
 - usually, a view is a relation with a defined rule; when rewriting, the rule
 is fired and the relation (view) is replaced by the definition of the view. If
 a relation does not have a rule, the planner and executor treat it as a physical
 table (relation). In the case of a materialized view we want to rewrite the
 select statement only when we are refreshing the MV. In other cases the rewriter
 should skip rewriting and pick up the physical relation - excluding situations
 where other rewrite rules not related to the MV definition are specified.

This was done (although not completed) against PostgreSQL 7.1 by
students in Georgia, USA, I believe.  It might be worthwhile looking at
their work if I can find it (if nowhere else, it should be in the ACM).

There are basically 2 major parts for materialized views:

A) Planner: Getting the query planner to swap in the MatView for part of
a query automatically for query plan portions which the MatView supports;

B) Maintenance: maintaining the MatView data according to the programmed
scheme (synch, asynch, periodic).

I do not believe it is possible to do both of the above in one summer.
Of the two, (A) would be more useful since it is possible to manually
implement (B) using triggers, queues and cron jobs today.
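For context, a minimal sketch of the kind of hand-rolled setup (B) refers to -
an aggregate kept up to date by ordinary triggers. All names are invented, and
concurrency corner cases (e.g. two sessions inserting the first row for the
same product at once) are glossed over:

    CREATE TABLE sales (product_id int, amount numeric);
    CREATE TABLE sales_summary (product_id int PRIMARY KEY, total numeric NOT NULL);

    -- initial population of the hand-rolled "materialized view"
    INSERT INTO sales_summary
        SELECT product_id, sum(amount) FROM sales GROUP BY product_id;

    CREATE OR REPLACE FUNCTION sales_summary_sync() RETURNS trigger AS $$
    BEGIN
        IF TG_OP IN ('INSERT', 'UPDATE') THEN
            UPDATE sales_summary SET total = total + NEW.amount
             WHERE product_id = NEW.product_id;
            IF NOT FOUND THEN
                INSERT INTO sales_summary VALUES (NEW.product_id, NEW.amount);
            END IF;
        END IF;
        IF TG_OP IN ('DELETE', 'UPDATE') THEN
            UPDATE sales_summary SET total = total - OLD.amount
             WHERE product_id = OLD.product_id;
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER sales_summary_trg
        AFTER INSERT OR UPDATE OR DELETE ON sales
        FOR EACH ROW EXECUTE PROCEDURE sales_summary_sync();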


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Robert Haas
On Mon, Apr 12, 2010 at 1:50 PM, Josh Berkus j...@agliodbs.com wrote:
 On 4/9/10 1:36 PM, pavelbaros wrote:
 2) change rewriter
 - usually, view is relation with defined rule and when rewriting, rule
 is fired and relation (view) is replaced by definition of view. If
 relation do not have rule, planner and executor behave to it as physical
 table (relation). In case of materialized view we want to rewrite select
 statement only in case when we refreshing MV. In other cases rewriter
 should skip rewriting and pick up physical relation. Exclude situation
 when other rewrite rules which are not related to MV definition are
 specified.

 This was done (although not completed) against PostgreSQL 7.1 by
 students in Georgia, USA, I believe.  It might be worthwhile looking at
 their work if I can find it (if nowhere else, it should be in the ACM).

 There are basically 2 major parts for materialized views:

 A) Planner: Getting the query planner to swap in the MatView for part of
 a query automatically for query plan portions which the MatView supports;

 B) Maintenance: maintaining the MatView data according to the programmed
 scheme (synch, asynch, periodic).

 I do not believe it is possible to do both of the above in one summer.
 Of the two, (A) would be more useful since it is possible to manually
 implement (B) using triggers, queues and cron jobs today.

I don't believe that it's possible to do EITHER of those things in one
summer.  I believe that a basic implementation that has NO bells and
whistles at all, as originally proposed, is going to be a Very Hard
Project.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Greg Smith

Josh Berkus wrote:

There are basically 2 major parts for materialized views:
A) Planner: Getting the query planner to swap in the MatView for part of
a query automatically for query plan portions which the MatView supports;
B) Maintenance: maintaining the MatView data according to the programmed
scheme (synch, asynch, periodic).
  


I've more often run into problems where it's perfectly fine to specify using 
the materialized view directly in the query, but keeping that view up to 
date usefully was the real problem.  The whole idea of getting a MV used 
automatically is valuable, but far down the roadmap as I see it.


Not everyone would agree of course, and your description does suggest a 
better way to organize a high-level summary though; here's a first cut:


1) Creation of materialized view
Current state:  using CREATE TABLE AS or similar mechanism, maintain 
manually
Optimal:  CREATE MATERIALIZED VIEW grammar, metadata to store MV data, 
dump/reload support


2) Updating materialized views
Current state:  periodically create new snapshots, or maintain using 
triggers
Optimal:  Built-in refresh via multiple strategies, with minimal locking so 
as to improve concurrent access


3) Using materialized views in the planner
Current state:  specify the manually created MV in queries that can use it
Optimal:  Automatically accelerate queries that could be satisfied by 
substituting available MVs


With (1) being what I think is the only GSoC sized subset here.

I'm not saying someone can't jump right into (3), using the current 
implementations for (1) and (2) that are floating around out there.  I 
just think it would end up wasting a fair amount of work on prototypes 
that don't work quite the same way as the eventual fully integrated 
version.  You certainly can start working on (3) without a fully fleshed 
out implementation of (2), I don't know that it makes sense to work on 
before (1) though.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Robert Haas
On Mon, Apr 12, 2010 at 3:43 PM, Greg Smith g...@2ndquadrant.com wrote:
 Josh Berkus wrote:

 There are basically 2 major parts for materialized views:
 A) Planner: Getting the query planner to swap in the MatView for part of
 a query automatically for query plan portions which the MatView supports;
 B) Maintenance: maintaining the MatView data according to the programmed
 scheme (synch, asynch, periodic).


 I've more often run into problems where it's perfectly fine to specify using the
 materialized view directly in the query, but keeping that view up to date
 usefully was the real problem.  The whole idea of getting a MV used
 automatically is valuable, but far down the roadmap as I see it.

 Not everyone would agree of course, and your description does suggest a
 better way to organize a high-level summary though; here's a first cut:

 1) Creation of materialized view
 Current state:  using CREATE TABLE AS or similar mechanism, maintain
 manually
 Optimal:  CREATE MATERIALIZED VIEW grammar, metadata to store MV data,
 dump/reload support

 2) Updating materialized views
 Current state:  periodically create new snapshots, or maintain using
 triggers
 Optimal:  Built-in refresh via multiple strategies, with minimal locking so as
 to improve concurrent access

 3) Using materialized views in the planner
 Current state:  specify the manually created MV in queries that can use it
 Optimal:  Automatically accelerate queries that could be satisfied by
 substituting available MVs

 With (1) being what I think is the only GSoC sized subset here.

 I'm not saying someone can't jump right into (3), using the current
 implementations for (1) and (2) that are floating around out there.  I just
 think it would end up wasting a fair amount of work on prototypes that don't
 work quite the same way as the eventual fully integrated version.  You
 certainly can start working on (3) without a fully fleshed out
 implementation of (2), I don't know that it makes sense to work on before
 (1) though.

Good summary.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Josh Berkus
Greg,

 I'm not saying someone can't jump right into (3), using the current
 implementations for (1) and (2) that are floating around out there.  I
 just think it would end up wasting a fair amount of work on prototypes
 that don't work quite the same way as the eventual fully integrated
 version.  You certainly can start working on (3) without a fully fleshed
 out implementation of (2), I don't know that it makes sense to work on
 before (1) though.

What would be the use case for (1) by itself?


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Greg Smith

Josh Berkus wrote:

What would be the use case for (1) by itself?
  


There isn't any use case for just working on the infrastructure, just 
like there's no use case for Syntax for partitioning on its own.  That's 
why people rarely work on that part of these problems--it's boring and 
produces no feature of value on its own.  I believe that in both cases, 
attempts to build the more complicated parts, ones that don't first 
address some of the core infrastructure first, will continue to produce 
only prototypes.


I don't want to see Materialized Views wander down the same path as 
partitioning, where lots of people produce fun parts patches, while 
ignoring the grunt work of things like production quality catalog 
support for the feature.  I think Pavel's proposal got that part right 
by starting with the grammar and executor setup trivia.  And Robert's 
comments about the details in that area it's easy to forget about hit 
the mark too.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Josh Berkus

 I don't want to see Materialized Views wander down the same path as
 partitioning, where lots of people produce fun parts patches, while
 ignoring the grunt work of things like production quality catalog
 support for the feature.  I think Pavel's proposal got that part right
 by starting with the grammar and executor setup trivia.  And Robert's
 comments about the details in that area it's easy to forget about hit
 the mark too.

Good point.  And GSoC may be one of the few times we can get people to
do that kind of work.  Other than Simon, of course.  ;-)

I just worry about any feature which doesn't get as far as a
user-visible implementation.  If someone doesn't do the rest of the
parts soon, such features tend to atrophy because nobody is using them.


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-12 Thread Greg Smith

Josh Berkus wrote:

I just worry about any feature which doesn't get as far as a
user-visible implementation.  If someone doesn't do the rest of the
parts soon, such features tend to atrophy because nobody is using them.
  


While they're limited, there are completely viable prototype quality 
implementations possible here without a large amount of work to get them 
started.  I'm not worried too much about this feature being unused.  As 
I was just reminded when assembling a page on the wiki about it:  
http://wiki.postgresql.org/wiki/Materalized_Views it's currently ranked 
#1--by a large margin--on the UserVoice feature request survey that 
Peter kicked off.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Greg Smith

Robert Haas wrote:

I also think that you're underestimating the number of problems that
will have to be solved to get this done.  It's going to take some
significant work - both design work and coding work - to figure out
how this should integrate into the rest of the system.  (What should
be the value of pg_class.relkind?  Where should the node
representation of the snapshot query be stored?  And did we handle all
of those OID dependencies correctly?)
  


I don't think I'm underestimating all that, but I suspect Pavel is by a 
considerable amount.  This is why I've been suggesting that a GSoC scope 
here might just be wrestling with this area of the problem for the whole 
summer--not even getting into updates beyond a completely trivial 
implementation, if any at all.  Things like handle OID dependencies 
are definitely not on the fun side of the development work that people 
tend to think about in advance.



Where I can see this possibly falling down (other than being just too
much work for a relative PostgreSQL novice to get it done in one
summer) is if there are concerns about it being incompatible with
incrementally-updated views.  I imagine that we're going to want to
eventually support both, so we need to make sure that this
implementation doesn't box us into a corner.


Exactly my concern; comitting this part without knowing how that's later 
going to fit into place strikes me the sort of the thing this project 
doesn't like to do.  The alternate approach of starting with the update 
machinery is less likely IMHO to get stuck wondering if there's a future 
blind spot coming or not, since you'd be building from the bottom up 
starting with the hardest parts.


From the rest of your comments, I'm comfortable that you're in sync 
with the not necessarily obvious risky spots here I wanted to raise 
awareness of.  It's unreasonable to expect we'll have exactly the same 
priorities  here, and I doubt it's useful to debate how I perceive the 
merit of various development subsets here compared to yourself.  I don't 
think it's really important whether anyone agrees with me or not about 
exactly the value of a full table lock implementation.  The main thing 
I'm concerned about is just that it's noted as a known risky part, one 
that could end up blocking the project's ability to commit even a subset 
of the proposed patch here.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Heikki Linnakangas
Robert Haas wrote:
 2010/4/10 Andrew Dunstan and...@dunslane.net:
 Heikki Linnakangas wrote:
 1. Keep the materialized view up-to-date when the base tables change.
 This can be further divided into many steps, you can begin by supporting
 automatic updates only on very simple views with e.g a single table and
 a where clause. Then extend that to support joins, aggregates,
 subqueries etc. Keeping it really limited, you could even require the
 user to write the required triggers himself.
 That last bit doesn't strike me as much of an advance. Isn't the whole point
 of this to automate it? Creating greedy materialized views is usually not
 terribly difficult now, but you do have to write the triggers.
 
 Yeah, I agree.

It doesn't accomplish anything interesting on its own. But if you do the
planner changes to automatically use the materialized view to satisfy
queries (item 2. in my previous email), it's useful.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Robert Haas
On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Robert Haas wrote:
 2010/4/10 Andrew Dunstan and...@dunslane.net:
 Heikki Linnakangas wrote:
 1. Keep the materialized view up-to-date when the base tables change.
 This can be further divided into many steps, you can begin by supporting
 automatic updates only on very simple views with e.g a single table and
 a where clause. Then extend that to support joins, aggregates,
 subqueries etc. Keeping it really limited, you could even require the
 user to write the required triggers himself.
 That last bit doesn't strike me as much of an advance. Isn't the whole point
 of this to automate it? Creating greedy materialized views is usually not
 terribly difficult now, but you do have to write the triggers.

 Yeah, I agree.

 It doesn't accomplish anything interesting on its own. But if you do the
 planner changes to automatically use the materialized view to satisfy
 queries (item 2. in my previous email), it's useful.

But you can't do that with a snapshot view, only a continuously updated one.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Florian G. Pflug

On 11.04.10 20:47 , Robert Haas wrote:

On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Robert Haas wrote:

2010/4/10 Andrew Dunstanand...@dunslane.net:

Heikki Linnakangas wrote:

1. Keep the materialized view up-to-date when the base tables
change. This can be further divided into many steps, you can
begin by supporting automatic updates only on very simple
views with e.g a single table and a where clause. Then extend
that to support joins, aggregates, subqueries etc. Keeping it
really limited, you could even require the user to write the
required triggers himself.

That last bit doesn't strike me as much of an advance. Isn't
the whole point of this to automate it? Creating greedy
materialized views is usually not terribly difficult now, but
you do have to write the triggers.


Yeah, I agree.


It doesn't accomplish anything interesting on its own. But if you
do the planner changes to automatically use the materialized view
to satisfy queries (item 2. in my previous email), it's useful.


But you can't do that with a snapshot view, only a continuously updated
one.


If continuous updates prove to be too hard initially, you could instead
update the view on select if it's outdated. Such a materialized view
would be a kind of inter-session cache for subselects.

The hard part would probably be to figure out how to decide whether the
view is outdated or not, and to deal with two concurrent transactions
trying to use an outdated view (and both trying to refresh it). What
makes the second problem hard is that you wouldn't want one of the
transactions to wait for the other to complete, because this is not how
SELECTs traditionally behave.
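A very rough sketch of that idea (all names and the advisory lock key are made
up): an ordinary table holds the view's contents, a staleness flag is maintained
elsewhere (e.g. by triggers on the base tables), and an advisory lock keeps two
sessions from refreshing at once - the session that loses simply keeps reading
the slightly stale data instead of waiting. Error handling is omitted:

    CREATE OR REPLACE FUNCTION refresh_mv_sales_if_stale() RETURNS void AS $$
    BEGIN
        IF NOT EXISTS (SELECT 1 FROM mv_meta
                       WHERE view_name = 'mv_sales' AND stale) THEN
            RETURN;                     -- already fresh, nothing to do
        END IF;
        IF NOT pg_try_advisory_lock(42) THEN
            RETURN;                     -- someone else is refreshing; use the old data
        END IF;
        DELETE FROM mv_sales;
        INSERT INTO mv_sales
            SELECT product_id, sum(amount) FROM sales GROUP BY product_id;
        UPDATE mv_meta SET stale = false, last_refresh = now()
         WHERE view_name = 'mv_sales';
        PERFORM pg_advisory_unlock(42);
    END;
    $$ LANGUAGE plpgsql;

    -- callers run SELECT refresh_mv_sales_if_stale(); before querying mv_sales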

best regards, Florian Pflug





Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Robert Haas
On Sun, Apr 11, 2010 at 10:13 PM, Florian G. Pflug f...@phlo.org wrote:
 If continuous updates prove to be too hard initially, you could instead
 update the view on select if it's outdated. Such a materialized view
 would be a kind of inter-session cache for subselects.

 The hard part would probably be to figure out how to decide whether the
 view is outdated or not, and to deal with two concurrent transactions
 trying to use an outdates view (and both trying to refresh it). What
 makes the second problem hard is that you wouldn't want one of the
 transactions to wait for the other to complete, because this is not how
 SELECTs traditionally behave.

Well, the proposed project is to create views that only get refreshed manually.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Robert Haas
On Sun, Apr 11, 2010 at 5:24 AM, Greg Smith g...@2ndquadrant.com wrote:
 From the rest of your comments, I'm comfortable that you're in sync with the
 not necessarily obvious risky spots here I wanted to raise awareness of.
  It's unreasonable to expect we'll have exactly the same priorities  here,
 and I doubt it's useful to debate how I perceive the merit of various
 development subsets here compared to yourself.  I don't think it's really
 important whether anyone agrees with me or not about exactly the value of a
 full table lock implementation.  The main thing I'm concerned about is just
 that it's noted as a known risky part, one that could end up blocking the
 project's ability to commit even a subset of the proposed patch here.

I think that one of the things that we need to get our hands around is
how we're going to distinguish the snapshot flavor of materialized
view from the continuous update flavor.  By definition, the latter
will only ever be supportable for a fairly restricted subset of all
possible queries, and I am assuming that we will not want to decide
what the behavior is going to be based on the query but rather based
on what the user specifies.  Anything else seems like it would have
the potential for severe POLA violations.  So we need to think now
about how we'll distinguish between the two flavors.  I imagine some
sort of syntactic marker would be appropriate; not sure what.
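Purely to make the question concrete - this is invented syntax, not an actual
proposal - the marker could be as simple as an extra clause at creation time:

    -- snapshot flavor, refreshed manually (invented syntax)
    CREATE MATERIALIZED VIEW mv_sales AS
        SELECT product_id, sum(amount) AS total FROM sales GROUP BY product_id;

    -- continuously maintained flavor (invented syntax)
    CREATE MATERIALIZED VIEW mv_sales_live WITH (continuous = true) AS
        SELECT product_id, sum(amount) AS total FROM sales GROUP BY product_id;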

Reading this thread, I'm starting to grow concerned that some people
may feel that manually refreshed materialized views are not even worth
bothering with, because (the argument goes) you could just use some
table and write a function that updates it.  There's probably some
truth to that, but I guess my thought is that it would have some value
as a convenience feature; and eventually we might optimize it to the
point where it would make more sense to use the built-in feature
rather than rolling your own.  However, if we're going to have
complaints that manually refreshed materialized views suck and we
should only ever support materialized views to the extent that we can
make them automatically update on-the-fly, then let's have those
complaints now before someone spends several months of their life on
the project only to be told that we don't want it.  Let's be clear: I
think it's useful, but, if other people disagree, we need to iron that
out now.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-11 Thread Heikki Linnakangas
Robert Haas wrote:
 On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Robert Haas wrote:
 2010/4/10 Andrew Dunstan and...@dunslane.net:
 Heikki Linnakangas wrote:
 1. Keep the materialized view up-to-date when the base tables change.
 This can be further divided into many steps, you can begin by supporting
 automatic updates only on very simple views with e.g a single table and
 a where clause. Then extend that to support joins, aggregates,
 subqueries etc. Keeping it really limited, you could even require the
 user to write the required triggers himself.
 That last bit doesn't strike me as much of an advance. Isn't the whole 
 point
 of this to automate it? Creating greedy materialized views is usually not
 terribly difficult now, but you do have to write the triggers.
 Yeah, I agree.
 It doesn't accomplish anything interesting on its own. But if you do the
 planner changes to automatically use the materialized view to satisfy
 queries (item 2. in my previous email), it's useful.
 
 But you can't do that with a snapshot view, only a continuously updated one.

A materialized view with manually-defined triggers to keep it up-to-date
is a continuously updated one.

Other DBMSs allow that with snapshot views too, you just don't get
totally up-to-date results, but I'm not sure we want to go there.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Heikki Linnakangas
Greg Smith wrote:
 The main hidden complexity in this particular project relates to
 handling view refreshes.  The non-obvious problem is that when the view
 updates, you need something like a SQL MERGE to really handle that in a
 robust way that doesn't conflict with concurrent access to queries
 against the materialized view.  And work on MERGE support is itself
 blocked behind the fact that PostgreSQL doesn't have a good way to lock
 access to a key value that doesn't exist yet--what other databases call
 key range locking.  See the notes for Add SQL-standard
 MERGE/REPLACE/UPSERT command at http://wiki.postgresql.org/wiki/Todo
 for more information.
 
 You can work around that to build a prototype by grabbing a full table
 lock on the materialized view when updating it, but that's not a
 production quality solution.

It would still be useful for many applications. And it would provide a
basis to extend later. You don't need to solve all problems at once, as
long as what you implement is a useful subset.

 Now, with all that said, that doesn't mean there's not a useful project
 for you buried in this mess.  The first two steps in your plan:
 
 1) create materialized view
 2) change rewriter
 
 Include building a prototype grammer, doing an initial executor
 implementation, and getting some sort of rewriter working.  That is
 potentially good groundwork to lay here.  I would suggest that you
 completely drop your step 3:
 
 3) create command that takes snapshot (refresh MV)
 
 Because you cannot built that in a way that will be useful (and by that
 I mean committable quality) until there's a better way to handle updates
 than writing a whole new table and grabbing a full relation lock to
 switch to it.  To do a good job just on the first two steps should take
 at least a whole summer anyway--there's a whole stack of background
 research needed I haven't seen anyone do yet, and that isn't on your
 plan yet.  There is a precedent for taking this approach.  After getting
 stalled trying to add the entirety of easy partitioning to PostgreSQL,
 the current scope has been scaled back to just trying to get the syntax
 and on-disk structure right, then finish off the implementation.  See
 http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how
 that's been broken into those two major chunks.

The good thing about this subject for GSoC is that it can be divided
into many small steps. There's two largely independent main parts:

1. Keep the materialized view up-to-date when the base tables change.
This can be further divided into many steps, you can begin by supporting
automatic updates only on very simple views with e.g a single table and
a where clause. Then extend that to support joins, aggregates,
subqueries etc. Keeping it really limited, you could even require the
user to write the required triggers himself.

2. Teach the planner to use materialized views automatically when a
query references the base tables. So if you issue the query SELECT *
FROM table WHERE foo > 10 AND bar = 10, and there's a materialized view
on SELECT * FROM table WHERE bar = 10, the planner can transform the
original query into SELECT * FROM materializedview WHERE foo > 10.
This largely depends on 1, although some DBMSs offer the option to use
manually refreshed materialized views too, knowing that they might not
be completely up-to-date.
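Spelled out against a made-up schema (writing the base table as some_table,
since "table" itself is a reserved word), the transformation would be:

    -- the materialized view, however it is kept up to date
    CREATE TABLE materializedview AS
        SELECT * FROM some_table WHERE bar = 10;

    -- original query
    SELECT * FROM some_table WHERE foo > 10 AND bar = 10;

    -- what the planner would transparently run instead
    SELECT * FROM materializedview WHERE foo > 10;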

There's a lot of room to choose which problems you want to tackle, which is
good for a summer-of-code project. Your proposal basically describes
doing 1, in a limited fashion where the view is not updated
automatically, but only when the DBA runs a command to refresh it. I'm
not sure if that's useful enough on its own, writing CREATE
MATERIALIZED VIEW ... SELECT ... doesn't seem any easier than just
writing CREATE TABLE AS.  But if you can do something about 2, or
even a very limited part of 1, keeping the view up-to-date
automatically, it becomes much more useful.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Andrew Dunstan



Heikki Linnakangas wrote:


1. Keep the materialized view up-to-date when the base tables change.
This can be further divided into many steps, you can begin by supporting
automatic updates only on very simple views with e.g a single table and
a where clause. Then extend that to support joins, aggregates,
subqueries etc. Keeping it really limited, you could even require the
user to write the required triggers himself.

  


That last bit doesn't strike me as much of an advance. Isn't the whole 
point of this to automate it? Creating greedy materialized views is 
usually not terribly difficult now, but you do have to write the triggers.


The other thing that could be interesting about this would be some 
scheme for lazy refresh that didn't involve re-extracting the whole data 
set.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Robert Haas
2010/4/10 Andrew Dunstan and...@dunslane.net:
 Heikki Linnakangas wrote:

 1. Keep the materialized view up-to-date when the base tables change.
 This can be further divided into many steps, you can begin by supporting
 automatic updates only on very simple views with e.g a single table and
 a where clause. Then extend that to support joins, aggregates,
 subqueries etc. Keeping it really limited, you could even require the
 user to write the required triggers himself.

 That last bit doesn't strike me as much of an advance. Isn't the whole point
 of this to automate it? Creating greedy materialized views is usually not
 terribly difficult now, but you do have to write the triggers.

Yeah, I agree.

 The other thing that could be interesting about this would be some scheme
 for lazy refresh that didn't involve re-extracting the whole data set.

One way to do this would be to infer a primary key for the result set
based on the input query.  But I think we don't really have the
infrastructure to do this right now, so not really a project for a
beginner.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Kevin Grittner
Greg Smith  wrote:
 
 And work on MERGE support is itself blocked behind the fact that
 PostgreSQL doesn't have a good way to lock access to a key value
 that doesn't exist yet--what other databases call key range
 locking.
 
The bulk of the serializable implementation WIP is work to implement
just this sort of locking.  There are already a couple possible
spin-off uses on the horizon based on the ability of these locks to
survive their initiating transactions and detect conflicting writes.
Both spinoffs involve somehow flagging a transaction as being one for
which the locks should be kept until further notice, and issuing a
notification when a conflicting write occurs.  That seems consistent
with the needs of materialized views, too.
 
It probably won't be solid in time to be useful for GSoC, but if
someone's looking to map out a plan for materialized views, I thought
this information might be germane.
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Greg Smith

Heikki Linnakangas wrote:

Your proposal basically describes
doing 1, in a limited fashion where the view is not updated
automatically, but only when the DBA runs a command to refresh it. I'm
not sure if that's useful enough on its own, writing CREATE
MATERIALIZED VIEW ... SELECT ... doesn't seem any easier than just
writing CREATE TABLE AS.  But if you can do something about 2, or
even a very limited part of 1, keeping the view up-to-date
automatically, it becomes much more useful.
  


You've hit upon the core issue here.  You can build materialized views 
right now using CREATE TABLE AS.  You can even update them by creating 
a new table the same way, with a new name, and doing the 
LOCK/RENAME/DROP shuffle--what people used to do for rewriting bloated 
tables before there was CLUSTER.  The first step in the proposal here is 
essentially syntax to give an easier UI for that.  That's an interesting 
step, but recognize that it doesn't actually provide anything you can't 
do already.
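Spelled out (table names invented), the do-it-yourself version being referred 
to looks roughly like this; the brief exclusive lock taken during the swap is 
exactly the limitation discussed elsewhere in this thread:

    -- initial build
    CREATE TABLE sales_summary AS
        SELECT product_id, sum(amount) AS total
        FROM sales GROUP BY product_id;

    -- refresh: rebuild into a new table, then swap it in
    BEGIN;
    CREATE TABLE sales_summary_new AS
        SELECT product_id, sum(amount) AS total
        FROM sales GROUP BY product_id;
    DROP TABLE sales_summary;
    ALTER TABLE sales_summary_new RENAME TO sales_summary;
    COMMIT;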


If you then note that doing any sort of incremental update to the view 
is a hard problem, and that a lot of the useful cases for materialized 
views involve tables where it's impractical to recreate the whole thing 
anyway, you'll inevitably find yourself deeply lost in the minutia of 
how to handle the updates.  It's really the core problem in building 
what people expect from a materialized view implementation in a serious 
database.  Chipping away at the other pieces around it doesn't move the 
feature that far forward, even if you get every single one of them 
except incremental updates finished, because everything else combined is 
still not that much work in comparison to the issues around updates.


There certainly are a fair number of subproblems you can break out of 
here.  I just think it's important to recognize that the path that leads 
to a useful GSoC project and the one that gives a production quality 
materialized view implementation may not have that much in common, and 
to manage expectations on both sides accordingly.  If Pavel thinks he's 
going to end up being able to say "I added materialized views to 
PostgreSQL" at the end of the summer, that's going to end in 
disappointment.  And if people think this project plan will lead to 
being able to claim "PostgreSQL now has this feature", that's also not 
going to go well.  If the scope is "add initial grammar and rewriting 
moving toward a future materialized view feature", with the underlying 
implementation noted as a stub prototype, that might work out OK.  This 
is why I likened it to the work on "Syntax for partitioning", which has 
a similarly focused subgoal structure.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Greg Smith

Robert Haas wrote:

It's not obvious to me
that a brief full-table lock wouldn't be acceptable for an initial
implementation.  Obviously it wouldn't be suitable for every use case
but since we're talking about manually refreshed views that was bound
to be true anyway.
  


There already is an initial implementation of sorts.  There are a couple 
of ways you can build these right now, so any new development has to 
look like it will end with good odds of being an improvement over what's 
already available before it's worth putting development resources into.


As a rough idea of what people want these for in the field, based on 
what I've seen requests for, imagine that someone has a 1TB table 
they're materializing a view on in order to get at least a 10:1, and 
hopefully close to a 100:1, speedup on viewing summary data.  Now, 
picture what happens if you have someone doing a sequential scan on the 
MV, which is still quite big, the updater process lines up to grab an 
exclusive lock when it's done, and now a second user wanting to read a 
single row quickly comes along behind it.  Given a full-table lock 
implementation, that scenario is unlikely to play out with the second 
user getting a fast response.  They'll likely sit in a lock queue for 
some potentially long period of time instead, waiting for the active seq 
scan to finish then the update to happen.  You have to build it that way 
or a steady stream of people reading could block out updates forever.
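
A sketch of that scenario, using a hypothetical summary table mv (a waiting
ACCESS EXCLUSIVE request also holds up later readers, since later conflicting
lock requests queue behind it):

-- Session 1: long-running report, holds ACCESS SHARE for the whole scan.
SELECT count(*) FROM mv;
-- Session 2: updater asks for the table lock and queues behind session 1.
LOCK TABLE mv IN ACCESS EXCLUSIVE MODE;
-- Session 3: a one-row lookup now queues behind session 2's pending
-- exclusive request, even though it only conflicts with the refresh.
SELECT * FROM mv WHERE id = 42;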


To be frank, that makes for a materialized view implementation of little 
value over what you can currently do as far as I'm concerned.  It might 
be interesting as a prototype, but that's not necessarily going to look 
like what's needed to do this for real at all.  I'm not a big fan of 
dumping work into projects when you can see exactly how it's going to 
fail before you even get started.  As I see it, if you know where it's 
going to fall down, you don't need to build a prototype as an exercise 
to show you how to build it--you should work on that part first instead.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-10 Thread Robert Haas
On Sat, Apr 10, 2010 at 11:40 PM, Greg Smith g...@2ndquadrant.com wrote:
 To be frank, that makes for a materialized view implementation of little
 value over what you can currently do as far as I'm concerned.  It might be
 interesting as a prototype, but that's not necessarily going to look like
 what's needed to do this for real at all.  I'm not a big fan of dumping work
 into projects when you can see exactly how it's going to fail before you
 even get started.  As I see it, if you know where it's going to fall down,
 you don't need to build a prototype as an exercise to show you how to build
 it--you should work on that part first instead.

Hopefully, you're already aware that I have enormous respect for your
opinions on a wide variety of topics; if not, let me publicly say that
I absolutely do.

Having said that, I disagree with your conclusions in this instance.
I see nothing but upside from this work.  It is vastly easier to write
a patch that builds on existing functionality than it is to write
something new from scratch.  If there's any value in having manually
refreshed materialized views, then having the simplest possible
implementation of what those can look like committed will make it far
easier to plan out next steps.  While the proposed implementation may
not solve a huge number of real-world problems, I think there's a good
argument that some people will get good use of it.  Not everyone has
1TB tables with continuous access patterns.  And, provided that it
doesn't conflict with anything we want to do in the future, being
useful to some people is a good enough reason to put it in.

I also think that you're underestimating the number of problems that
will have to be solved to get this done.  It's going to take some
significant work - both design work and coding work - to figure out
how this should integrate into the rest of the system.  (What should
be the value of pg_class.relkind?  Where should the node
representation of the snapshot query be stored?  And did we handle all
of those OID dependencies correctly?)

Where I can see this possibly falling down (other than being just too
much work for a relative PostgreSQL novice to get it done in one
summer) is if there are concerns about it being incompatible with
incrementally-updated views.  I imagine that we're going to want to
eventually support both, so we need to make sure that this
implementation doesn't box us into a corner.  But as far as snapshot
views go, complaining that the proposed locking is too strong doesn't
seem quite fair.  Fixing that, AFAICS, is a very hard project,
possibly involving significant planner support and an implementation
of MERGE, and I would much rather try to land a fundamentals patch
like this first and then deal with the gyrations that will be involved
in making this work than try to land the whole thing all at once.

Of course, if I'm missing something, and there's a SIMPLE way to get
materialized views that can be refreshed without a full-table lock,
that's another story altogether - maybe you have an idea?

Finally, even if we decided NOT to merge this patch because of the
limitations you mention (and right now that doesn't seem to be the
consensus), having this part of it completed as a starting point for
future work might be reason enough by itself.

In short: I think you may be letting the perfect be the enemy of the good.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-09 Thread pavelbaros

Hello,

I am sending my proposal for GSoC. Details are listed below. Please, if 
you have ideas or tips, or if you just want to share your opinion about my 
project, go ahead.


thanks,
Pavel Baros

Abstract:

It is an effort to implement snapshot materialized views (which are only 
updated when refreshed) in PostgreSQL. So far I have finished some parts of 
it and I am trying to publish my present work on git.postgresql.org



Benefits to the PostgreSQL Community

First of all, it would be best if my work is helpful to everybody 
who misses materialized views in PostgreSQL, because PostgreSQL still does 
not implement materialized views. In addition, MV is mentioned 
as a feature on the TODO list.



Deliverables

First of all, the goal at the end of my whole project is not only to write a 
bachelor's thesis, but to finish the work as a patch and, if possible, get 
the patch into the next PostgreSQL release, or keep the git repository 
current with the latest PostgreSQL version. I also have personal goals: to 
arouse interest in implementing MV in PostgreSQL, or at least to start a 
discussion about it.



Project Schedule

My work is going quite well; I am well on the way to finishing the main 
backend parts in a few weeks. After that I will write and run tests and 
implement the related changes to the PostgreSQL tools (psql, pg_dump, etc.). 
I am also waiting for approval of my repository named materialized_view on 
git.postgresql.org, so I can publish the completed parts. For now the next 
step will be to discuss the implementation on pgsql-hackers.



Bio

I am from the Czech Republic and I am studying at the Faculty of Electrical 
Engineering of the Czech Technical University in Prague (www.fel.cvut.cz/en/). 
My bachelor's thesis is based on this project, implementing MV in PostgreSQL.


I have had many different jobs. The best experience for me was 
working as a tester and software engineer in C/C++ and C# for 
Radiant Systems Inc. for more than a year. After that I worked as a web 
developer with Internet technologies (PHP, HTML, CSS, ...), where the 
goal was to build an internal system for an advertising agency. Most 
recently I worked as a Windows Mobile developer. Except for the first 
one, the other jobs lasted only a few months, mainly because they were 
temporary projects. For now I am looking for a part-time job, 
preferably something closer to database systems.



Implementation:  it can be divided into a few steps:

1) create materialized view
- modify grammar (parser): CREATE MATERIALIZED VIEW mv_name AS SELECT ...
- change the executor so that it creates a physical table defined by the 
select statement


2) change the rewriter
- Usually a view is a relation with a rule defined on it: when rewriting, the 
rule is fired and the relation (view) is replaced by the view's definition. 
If a relation has no rule, the planner and executor treat it as a physical 
table (relation). In the case of a materialized view we want to rewrite the 
select statement only when we are refreshing the MV. In all other cases the 
rewriter should skip the rewrite and pick up the physical relation. This 
excludes the situation where other rewrite rules, unrelated to the MV 
definition, are specified.


3) create a command that takes a snapshot (refreshes the MV)
- modify grammar: ALTER MATERIALIZED VIEW mv_name REFRESH;
- Taking a snapshot (refreshing) is similar to the SELECT INTO ... command, 
and I decided to follow the way that works. After parsing the query and 
before transformation, a tree representation of SELECT * INTO ... is created 
manually, with the flag IntoClause->isrefresh set to true, indicating that it 
is refreshing a materialized view. Everything then acts as it would for a 
regular SELECT INTO ... except for the functions OpenIntoRel() and 
CloseIntoRel(). In OpenIntoRel a temp table is created (without catalog 
entries) and set as the destination for the result of the select. In 
CloseIntoRel the executor swaps the relfilenodes of the temp table and the 
original table and finally deletes the temp table. The behavior of 
CloseIntoRel is inspired by the CLUSTER statement.



Contacts:  baro...@seznam.cz

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

2010-04-09 Thread Kevin Grittner
pavelbaros baro...@seznam.cz wrote:
 
 I am also waiting for approval for my repository named
 materialized_view on git.postgresql.org
 
They seem to prefer that you get a repository under your name and
use materialized_view as a branch name.  See my account on
git.postgresql.org and its serializable branch for an example.
 
I learned by putting in a request similar to your pending one.
;-)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

