Re: [PERFORM] unlogged tables

2015-04-13 Thread Matheus de Oliveira
On Mon, Apr 13, 2015 at 4:31 PM, dgabriel gabriel.do...@gmail.com wrote:

 In the event of a normal shutdown, we can flush all the writes to disk
 so we know all the data has been written, so there is no need to truncate.

 Isn't it possible to periodically flush data to disk, and in case of a crash
 have Postgres load only the data that existed at the last flush? The periodic
 flush could be configurable, for example every 30 minutes or after x rows
 updated/inserted.


There is no such facility implemented for UNLOGGED TABLEs. That could be a
feature request though.
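For context, the behavior in question applies to any table created with
UNLOGGED; a minimal sketch (table name made up):

    CREATE UNLOGGED TABLE page_hits (
        page_id bigint NOT NULL,
        hits    bigint NOT NULL DEFAULT 0
    );
    -- No WAL is written for changes to this table, which is why it is fast,
    -- and also why it is truncated to empty during crash recovery.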

Best regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


Re: [PERFORM] unlogged tables

2015-04-13 Thread dgabriel
In the event of a normal shutdown, we can flush all the writes to disk 
so we know all the data has been written, so there is no need to truncate.

Isn't it possible to periodically flush data to disk, and in case of a crash
have Postgres load only the data that existed at the last flush? The periodic
flush could be configurable, for example every 30 minutes or after x rows
updated/inserted.





Re: [PERFORM] unlogged tables

2015-04-13 Thread David G. Johnston
On Monday, April 13, 2015, Matheus de Oliveira matioli.math...@gmail.com
wrote:


 On Mon, Apr 13, 2015 at 4:31 PM, dgabriel gabriel.do...@gmail.com wrote:

 In the event of a normal shutdown, we can flush all the writes to disk
 so we know all the data has been written, so there is no need to
 truncate.

 Isn't it possible to periodically flush data to disk, and in case of a crash
 have Postgres load only the data that existed at the last flush? The periodic
 flush could be configurable, for example every 30 minutes or after x rows
 updated/inserted.


 There is no such facility implemented for UNLOGGED TABLEs. That could be a
 feature request though.


Well, that is half right anyway.  UNLOGGED tables obey checkpoints just
like any other table.  The missing feature is an option to leave them
restored to the last checkpoint.  Instead, not knowing whether there were
changes since the last checkpoint, the system truncates the relation.
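Note that checkpoints already run on a schedule, so the "every 30 minutes"
part of the request is essentially what these postgresql.conf settings
already control (values illustrative):

    checkpoint_timeout = 30min          # force a checkpoint at least this often
    checkpoint_completion_target = 0.9  # spread the checkpoint I/O over the interval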

What use case is there for a behavior where the last checkpoint's data is
left in the relation upon restarting, without knowing whether other data
could have been written subsequently?

David J.


Re: [PERFORM] unlogged tables

2015-04-13 Thread dgabriel
That would be a very useful feature. I don't care if I lose 1-2 hours of
data. I know we could have some cron jobs to dump the table periodically,
but the table could be big, so this operation could be expensive. Also I
have to detect when Postgres crashes, and I have no idea how to detect if
Postgres crashed. Then I somehow have to attach a script at Postgres start
to restore the dumps... the dump solution is very complicated and
unreliable. A periodic flush feature would be amazing!
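(By the dump workaround I mean roughly the following, with paths and table
names made up:

    # crontab entry: dump the table every 30 minutes
    */30 * * * *  pg_dump -Fc -t my_unlogged_table mydb > /backups/my_unlogged_table.dump

    # after a crash, restore it with something like
    pg_restore --clean -d mydb /backups/my_unlogged_table.dump

...plus whatever glue detects the crash and decides when to run the restore.)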

What is the procedure for a feature request on Postgres? GitHub?

Thanks!





Re: [PERFORM] unlogged tables

2015-04-13 Thread Matheus de Oliveira
On Mon, Apr 13, 2015 at 5:30 PM, Yves Dorfsman y...@zioup.com wrote:


 In my experience Postgres is very aggressive in getting rid of unlogged
 tables; it gets rid of them even after shutdowns that seem perfectly fine
 (no crash). A lot of people are surprised by this.


Shutdowns in fast or smart mode do not get rid of unlogged tables.
But if you do an immediate shutdown, then it does, and I don't see why
people are surprised by it, as you probably shouldn't be using immediate
mode under normal circumstances.
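To illustrate (data directory path made up):

    pg_ctl -D /var/lib/postgresql/data stop -m fast       # clean shutdown; unlogged tables survive
    pg_ctl -D /var/lib/postgresql/data stop -m immediate  # aborts like a crash; unlogged tables are
                                                          # truncated during the next startup's recovery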

Best regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


Re: [PERFORM] unlogged tables

2015-04-13 Thread Yves Dorfsman
On 2015-04-13 14:16, dgabriel wrote:
 That would be a very useful feature.

I agree, unlogged tables would be a lot more useful if they didn't
disappear on restart.


 could be expensive. Also I have to detect when Postgres crashes, and I have
 no idea how to detect if Postgres crashed. Then I somehow have to attach a
 script at Postgres start to restore the dumps... the dump solution is very
 complicated and unreliable. A periodic flush feature would be amazing!

In my experience Postgres is very aggressive in getting rid of unlogged
tables; it gets rid of them even after shutdowns that seem perfectly fine
(no crash). A lot of people are surprised by this.

-- 
http://yves.zioup.com
gpg: 4096R/32B0F416





Re: [PERFORM] unlogged tables

2015-04-13 Thread Jim Nasby

On 4/13/15 3:49 PM, David G. Johnston wrote:

On Monday, April 13, 2015, Matheus de Oliveira matioli.math...@gmail.com wrote:
On Mon, Apr 13, 2015 at 4:31 PM, dgabriel gabriel.do...@gmail.com wrote:

In the event of a normal shutdown, we can flush all the writes to disk
so we know all the data has been written, so there is no need to truncate.

Isn't it possible to periodically flush data to disk, and in case of a crash
have Postgres load only the data that existed at the last flush? The periodic
flush could be configurable, for example every 30 minutes or after x rows
updated/inserted.

There is no such facility implemented for UNLOGGED TABLEs. That
could be a feature request though.

Well, that is half right anyway.  UNLOGGED tables obey checkpoints just
like any other table.  The missing feature is an option to leave them
restored to the last checkpoint.  Instead, not knowing whether there
were changes since the last checkpoint, the system truncates the relation.

What use case is there for a behavior where the last checkpoint's data is
left in the relation upon restarting, without knowing whether other data
could have been written subsequently?


Yeah, this is not something that would be very easy to accomplish, 
because a buffer can get evicted and written to disk at any point. It 
wouldn't be too hard to read every unlogged table during recovery and 
see if there are any pages that were written after the last checkpoint, 
but that obviously won't be very fast.


Actually, I suppose we could dedicate a fork for unlogged tables and use 
that to record the newest LSN of any page that's been written out. But 
if you have much of any write activity on the table that's probably 
going to be completely useless.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [PERFORM] unlogged tables

2015-04-13 Thread Alvaro Herrera
Jim Nasby wrote:

 Yeah, this is not something that would be very easy to accomplish, because a
 buffer can get evicted and written to disk at any point. It wouldn't be too
 hard to read every unlogged table during recovery and see if there are any
 pages that were written after the last checkpoint, but that obviously won't
 be very fast.

If you consider only tables, then yeah perhaps this is easy to
accomplish (not really convinced myself).  But if you consider indexes,
things are not so easy anymore.


In the thread from 2011 (which this started as a reply to) the OP was
doing frequent UPDATEs to keep track of counts of something.  I think
that would be better served by using INSERTs of deltas and periodic
accumulation of grouped values, as suggested in
http://www.postgresql.org/message-id/20150305211601.gw3...@alvh.no-ip.org
This has actually been suggested many times over the years.
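A minimal sketch of that pattern (table and column names hypothetical;
concurrency and the first appearance of a key are glossed over):

    CREATE TABLE hit_deltas (page_id bigint NOT NULL, delta bigint NOT NULL);
    CREATE TABLE hit_counts (page_id bigint PRIMARY KEY, hits bigint NOT NULL DEFAULT 0);

    -- hot path: append a delta instead of updating a contended row
    INSERT INTO hit_deltas (page_id, delta) VALUES (42, 1);

    -- periodic accumulation (e.g. from cron), all in one transaction
    BEGIN;
    UPDATE hit_counts c
       SET hits = c.hits + d.total
      FROM (SELECT page_id, sum(delta) AS total
              FROM hit_deltas GROUP BY page_id) d
     WHERE c.page_id = d.page_id;
    DELETE FROM hit_deltas;
    COMMIT;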

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] unlogged tables

2015-04-13 Thread David G. Johnston
On Mon, Apr 13, 2015 at 4:49 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Mon, Apr 13, 2015 at 1:49 PM, David G. Johnston 
 david.g.johns...@gmail.com wrote:

 On Monday, April 13, 2015, Matheus de Oliveira matioli.math...@gmail.com
 wrote:


 On Mon, Apr 13, 2015 at 4:31 PM, dgabriel gabriel.do...@gmail.com
 wrote:

 In the event of a normal shutdown, we can flush all the writes to disk
 so we know all the data has been written, so there is no need to
 truncate.

 Isn't it possible to periodically flush data to disk, and in case of a crash
 have Postgres load only the data that existed at the last flush? The periodic
 flush could be configurable, for example every 30 minutes or after x rows
 updated/inserted.


 There is no such facility implemented for UNLOGGED TABLEs. That could be
 a feature request though.


 One way would be to lock dirty buffers from unlogged relations into
 shared_buffers (which hardly seems like a good thing) until the start of a
 super-checkpoint and then write them all out as fast as possible (which
 kind of defeats checkpoint_completion_target).  And then if the crash
 happened during a super-checkpoint, the data would still be inconsistent
 and need to be truncated.




 Well, that is half right anyway.  UNLOGGED tables obey checkpoints just
 like any other table.


 Do they?  I thought they only obeyed shutdown checkpoints, not online
 checkpoints.  I do remember some changes around this area, but none that
 completely reverted that logic.


I vaguely recall that conversation now... I'm not positive on the exact
mechanics here and, as it pertains to the OP, the difference you describe
is immaterial, since in either case the status quo mandates an
all-or-nothing approach to an unlogged table's contents.





 The missing feature is an option to leave them restored to the last
 checkpoint. Instead, not knowing whether there were changes since the last
 checkpoint, the system truncates the relation.

 What use case is there for a behavior where the last checkpoint's data is
 left in the relation upon restarting, without knowing whether other data
 could have been written subsequently?


 I would like a way to have unlogged tables be available on a replica
 provided that no changes were made to them between the pg_basebackup and
 the recovery point.


 My use case is that I mark certain read-only-after-bulk-loading tables as
 unlogged solely to avoid blowing out the log archive during the loading
 phase and refresh phase.  This is stuff like vendor catalogs, NCBI
 datasets, ChEMBL datasets, etc, which can simply be re-derived from the
 reference.  It would be nice if these were still available (without having
 to repeat the ETL) after crashes provided they were not written to since a
 checkpoint, and available on cloned test servers without having to repeat
 the ETL on those as well.



My gut reaction is that those should be in their own clusters and accessed
via postgres_fdw...
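Roughly along these lines (server, host, and table names made up):

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER refdata FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'refdata-host', dbname 'refdata');

    CREATE USER MAPPING FOR CURRENT_USER SERVER refdata
        OPTIONS (user 'app_reader', password 'secret');

    CREATE FOREIGN TABLE vendor_catalog (
        item_id   bigint,
        item_name text
    ) SERVER refdata OPTIONS (schema_name 'public', table_name 'vendor_catalog');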

That particular use-case would probably best be served with a separate
replication channel which pushes data files from the primary to the slaves
and allows for the slave to basically rewrite its existing table by
pointing to the newly supplied version.  Some kind of CREATE STATIC TABLE
and PUSH STATIC TABLE TO {all | replica name} command combo...though
ideally with less manual intervention...

David J.


Re: [PERFORM] unlogged tables

2015-04-13 Thread Jim Nasby

On 4/13/15 4:13 PM, Alvaro Herrera wrote:

Jim Nasby wrote:


Yeah, this is not something that would be very easy to accomplish, because a
buffer can get evicted and written to disk at any point. It wouldn't be too
hard to read every unlogged table during recovery and see if there are any
pages that were written after the last checkpoint, but that obviously won't
be very fast.


If you consider only tables, then yeah perhaps this is easy to
accomplish (not really convinced myself).  But if you consider indexes,
things are not so easy anymore.


Are indexes not guaranteed to have LSNs? I thought they basically
followed the same WAL-before-data rules as heap pages.


Though, if you have an index that doesn't support logging (like hash) 
you're still hosed...



In the thread from 2011 (which this started as a reply to) the OP was


I don't keep PGSQL emails from that far back... ;)


doing frequent UPDATEs to keep track of counts of something.  I think
that would be better served by using INSERTs of deltas and periodic
accumulation of grouped values, as suggested in
http://www.postgresql.org/message-id/20150305211601.gw3...@alvh.no-ip.org
This has actually been suggested many times over the years.


What I was suggesting certainly wouldn't help you if you were getting 
any serious amount of changes to the count.


I am wondering though what the bottleneck in HEAD is with doing an 
UPDATE instead of an INSERT, at least where unlogged would help 
significantly. I didn't think we logged all that much more for an 
UPDATE. Heck, with HOT you might even be able to log less.
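A crude way to compare would be to measure the WAL generated by each
statement from psql (table name made up):

    SELECT pg_current_xlog_location() AS before_lsn \gset
    UPDATE hit_counts SET hits = hits + 1 WHERE page_id = 42;
    SELECT pg_xlog_location_diff(pg_current_xlog_location(), :'before_lsn') AS wal_bytes;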

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [PERFORM] unlogged tables

2015-04-13 Thread Jeff Janes
On Mon, Apr 13, 2015 at 1:49 PM, David G. Johnston 
david.g.johns...@gmail.com wrote:

 On Monday, April 13, 2015, Matheus de Oliveira matioli.math...@gmail.com
 wrote:


 On Mon, Apr 13, 2015 at 4:31 PM, dgabriel gabriel.do...@gmail.com
 wrote:

 In the event of a normal shutdown, we can flush all the writes to disk
 so we know all the data has been written, so there is no need to
 truncate.

 Isn't it possible to periodically flush data to disk, and in case of a crash
 have Postgres load only the data that existed at the last flush? The periodic
 flush could be configurable, for example every 30 minutes or after x rows
 updated/inserted.


 There is no such facility implemented for UNLOGGED TABLEs. That could be
 a feature request though.


One way would be to lock dirty buffers from unlogged relations into
shared_buffers (which hardly seems like a good thing) until the start of a
super-checkpoint and then write them all out as fast as possible (which
kind of defeats checkpoint_completion_target).  And then if the crash
happened during a super-checkpoint, the data would still be inconsistent
and need to be truncated.




 Well, that is half right anyway.  UNLOGGED tables obey checkpoints just
 like any other table.


Do they?  I thought they only obeyed shutdown checkpoints, not online
checkpoints.  I do remember some changes around this area, but none that
completely reverted that logic.



 The missing feature is an option to leave them restored to the last
 checkpoint. Instead, not knowing whether there were changes since the last
 checkpoint, the system truncates the relation.

 What use case is there for a behavior where the last checkpoint's data is
 left in the relation upon restarting, without knowing whether other data
 could have been written subsequently?


I would like a way to have unlogged tables be available on a replica
provided that no changes were made to them between the pg_basebackup and
the recovery point.

My use case is that I mark certain read-only-after-bulk-loading tables as
unlogged solely to avoid blowing out the log archive during the loading
phase and refresh phase.  This is stuff like vendor catalogs, NCBI
datasets, ChEMBL datasets, etc, which can simply be re-derived from the
reference.  It would be nice if these were still available (without having
to repeat the ETL) after crashes provided they were not written to since a
checkpoint, and available on cloned test servers without having to repeat
the ETL on those as well.
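The loading pattern I mean is roughly this (dataset, path, and column names
made up):

    CREATE UNLOGGED TABLE chembl_compounds (
        compound_id bigint PRIMARY KEY,
        smiles      text NOT NULL
    );
    COPY chembl_compounds FROM '/staging/chembl_compounds.tsv';
    CREATE INDEX ON chembl_compounds (smiles);
    -- fast, and nothing lands in the WAL archive; but one crash and it is gone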

As for "maybe it's corrupt, maybe it isn't, but let's keep it anyway",
yeah, I have little use for that.

Cheers,

Jeff


Re: [PERFORM] unlogged tables

2015-04-13 Thread Jim Nasby

On 4/13/15 7:32 PM, David G. Johnston wrote:

The missing feature is an option to leave them restored to the last
checkpoint.  Instead, not knowing whether there were changes
since the last checkpoint, the system truncates the relation.

What use case is there for a behavior where the last checkpoint's
data is left in the relation upon restarting, without knowing
whether other data could have been written subsequently?


I would like a way to have unlogged tables be available on a replica
provided that no changes were made to them between the pg_basebackup
and the recovery point.


My use case is that I mark certain read-only-after-bulk-loading
tables as unlogged solely to avoid blowing out the log archive
during the loading phase and refresh phase.  This is stuff like
vendor catalogs, NCBI datasets, ChEMBL datasets, etc, which can
simply be re-derived from the reference.  It would be nice if these
were still available (without having to repeat the ETL) after
crashes provided they were not written to since a checkpoint, and
available on cloned test servers without having to repeat the ETL on
those as well.


My gut reaction is that those should be in their own clusters and
accessed via postgres_fdw...


Likely to produce really crappy plans if the tables are of any real size...


That particular use-case would probably best be served with a separate
replication channel which pushes data files from the primary to the
slaves and allows for the slave to basically rewrite its existing
table by pointing to the newly supplied version.  Some kind of CREATE
STATIC TABLE and PUSH STATIC TABLE TO {all | replica name} command
combo...though ideally with less manual intervention...


You still have the same problem of knowing if someone has scribbled on 
the data since the last checkpoint.


There's been recent discussion of adding support for read-only tables. 
If we had those, we might be able to support something like...


INSERT INTO unlogged;
ALTER TABLE unlogged SET READ ONLY;
CHECKPOINT;
/* take backup */

This should be safe as long as we WAL log changes to read-only status 
(which presumably we would).


How much work that would entail though, I don't know.

Ultimately you still have to get the data over to the other machine 
anyway. ISTM it'd be a LOT more useful to look at ways to make the WAL 
logging of bulk inserts (and especially COPY into a known empty table) a 
lot more efficient.
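For what it's worth, one such optimization already exists when WAL archiving
and streaming replication are off (wal_level = minimal): if the table is
created or truncated in the same transaction that loads it, the COPY can
skip WAL entirely, e.g. (table and path made up):

    BEGIN;
    TRUNCATE staging_table;                      -- or CREATE TABLE in this transaction
    COPY staging_table FROM '/staging/data.tsv';
    COMMIT;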

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [PERFORM] unlogged tables

2015-04-13 Thread David G. Johnston
On Mon, Apr 13, 2015 at 7:45 PM, Jim Nasby jim.na...@bluetreble.com wrote:

 On 4/13/15 7:32 PM, David G. Johnston wrote:

  That particular use-case would probably best be served with a separate
 replication channel which pushes data files from the primary to the
 slaves and allows for the slave to basically rewrite its existing
 table by pointing to the newly supplied version.  Some kind of CREATE
 STATIC TABLE and PUSH STATIC TABLE TO {all | replica name} command
 combo...though ideally with less manual intervention...


 You still have the same problem of knowing if someone has scribbled on the
 data since the last checkpoint.


That seems like an automation concern though... the more limited idea was
to simply have a means for a table to exist on the master and allow the
user to cause an exact copy of that table to appear on a replica via direct
data transfer (i.e., without needing to create a backup/dump).  If the
table already exists on the replica, the existing version remains as-is
until the new table is fully pushed, and then a filenode pointer update
happens.  If changes are made to the master, the two tables will remain
diverged until a new push occurs.

I imagine this same idea could be handled externally to the database, though
I don't know enough to comment on the specific technical merits of each.


 There's been recent discussion of adding support for read-only tables. If
 we had those, we might be able to support something like...

 INSERT INTO unlogged;
 ALTER TABLE unlogged SET READ ONLY;
 CHECKPOINT;
 /* take backup */

 This should be safe as long as we WAL log changes to read-only status
 (which presumably we would).

 How much work that would entail though, I don't know.

 Ultimately you still have to get the data over to the other machine
 anyway. ISTM it'd be a LOT more useful to look at ways to make the WAL
 logging of bulk inserts (and especially COPY into a known empty table) a
 lot more efficient.


Jeff Janes makes a comment about wanting "...to avoid blowing out the log
archive...", which I also don't quite follow...

WAL does seem to be designed to solve a different problem than what is
described here - lots of small changes versus few large changes.  Improving
WAL to move the size at which small becomes large is a win, but another
channel designed for few large changes may be less complex to implement.
The current work on logical replication likely has merit here as well, but
my familiarity with that technology is fairly limited.

David J.