Re: [PERFORM] Unlogged tables

2017-08-10 Thread George Neuner



On 8/10/2017 1:29 AM, l...@laurent-hasson.com wrote:


Finally, my true question was whether Postgres would support something 
like WORM tables with the performance benefits of UNLOGGED, but not the 
inconvenience of automatic truncation.




If you can live with the limitations, one other thing you might try is 
storing WORM data in the filesystem and accessing it via file_fdw.

https://www.postgresql.org/docs/current/static/file-fdw.html

There are a lot of downsides to this:  file_fdw tables are read-only, so 
you have to update the external file through some other means.  Also, 
I've never used file_fdw, so I'm not sure whether you can create indexes 
on the tables - and even if you can, you would need to manually recreate 
the indexes periodically because Postgresql won't see your updates.
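
For illustration, a minimal file_fdw setup might look like the sketch below 
(the server name, column list, and file path are invented for the example):

    CREATE EXTENSION file_fdw;
    CREATE SERVER worm_files FOREIGN DATA WRAPPER file_fdw;
    CREATE FOREIGN TABLE worm_facts_ext (
        id      bigint,
        payload text
    ) SERVER worm_files
      OPTIONS (filename '/srv/data/worm_facts.csv', format 'csv', header 'true');

Each query reads the file as it exists at execution time, so rewriting the 
CSV through some external process is how the data gets "updated".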


George



Re: [PERFORM] Unlogged tables

2017-08-10 Thread George Neuner



On 8/10/2017 1:29 AM, l...@laurent-hasson.com wrote:
Hello George... I know about not doing top posting but was emailing 
from my phone, and just recently moved to Android. I think I am still 
not configured right.


Somewhat orthogonal, but any particular reason why top posts == bad, 
or just convention?


The standard joke reply is:

   Because it messes up the order in which people normally read.
   > Why is top-posting such a bad thing?
   >> Top-posting.
   >>> What is the most annoying thing in e-mail?



It is just convention, but with a good reason:  most posts in groups are 
part of a discussion, and it's hard to follow a discussion when replies 
are far from the comment or question that provoked them. The convention 
for discussion is "interleaved" style.


The email top posting convention serves a different purpose: to preserve 
a record of the communication.  Polite people often use a mix of 
styles:  copying the [latest portion of the] quoted message to the top 
and replying to it inline (as with a discussion).


see https://en.wikipedia.org/wiki/Posting_style


Then too, there is the issue of editing.  With an email, typically only 
a handful of people will receive it.  With a public group or mailing 
list, all of the participants  - perhaps thousands - will receive the 
post.  When lots of people in a popular discussion quote the entire 
message, it quickly grows to an unwieldy size and eventually will be 
rejected by the servers.


The polite thing when replying is to edit the original message to 
include just information relevant to your reply, and then reply inline.  
Leave archiving of the discussion to the servers.



I will try a few scenarios and report back. I do not believe I have 
long checkpoint intervals, and I do not believe the Windows machine shuts down 
faster than 'normal'.


Your problem still may be related to the shutdown delay.

The way it works is: Windows sends a shutdown message to the service, 
and the service replies with an estimate of how long it will take to 
stop.  Until the service terminates, Windows waits and periodically 
polls the service asking for its progress.  Windows continues to wait 
until the service process either terminates, or until the system 
configured "drop-dead" timeout occurs, at which time Windows forcibly 
kills the service and continues with the shutdown.


The problem is that Postgresql is not a single process: pg_ctl spawns a 
bunch of children.  Looking further at the source, I believe pg_ctl is 
waiting for the children to terminate before stopping itself - but it is 
NOT responding to Windows progress messages, so Windows has no idea 
whether it is making headway or needs more time to complete.


Windows has no idea that those other processes are connected to the 
Postgresql service, so if it times out and kills pg_ctl, it assumes it 
is done with Postgresql.  The other processes then may be killed whether 
or not they are finished.



Finally, my true question was whether Postgres would support something 
like WORM tables with the performance benefits of UNLOGGED, but not the 
inconvenience of automatic truncation.


I saw some of the other responses re: that issue.

As I mentioned previously, an unlogged table will be truncated on 
startup if it is dirty - i.e., there were any updates that haven't 
survived at least one checkpoint.  The only thing you could try to do is 
force a checkpoint immediately following an unlogged table write.  But 
that is expensive performance-wise and is not encouraged.


https://www.postgresql.org/docs/current/static/sql-checkpoint.html
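
A sketch of what that would look like (the table name and file path are made 
up; note that CHECKPOINT requires superuser privileges):

    -- bulk-load the unlogged table, then immediately force a checkpoint
    COPY worm_facts FROM '/srv/load/worm_facts.csv' WITH (FORMAT csv, HEADER true);
    CHECKPOINT;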

George



Re: [PERFORM] Unlogged tables

2017-08-09 Thread l...@laurent-hasson.com


Sent from my BlackBerry - the most secure mobile device
From: gneun...@comcast.net
Sent: August 9, 2017 14:52
To: l...@laurent-hasson.com
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Unlogged tables


Please don't top post.

On 8/9/2017 2:30 PM, l...@laurent-hasson.com wrote:
> On 8/9/2017 2:17 PM, gneun...@comcast.net wrote:

>> On Wed, 9 Aug 2017 09:14:48 -0700, Jeff Janes <jeff.ja...@gmail.com> wrote:

>> Why doesn't the Windows scheduled shutdown signal postgres to shutdown
>> cleanly and wait for it to do so?  That is what is supposed to happen.

> Windows *does* signal shutdown (and sleep and hibernate and wakeup).
> pg_ctl can catch these signals only when running as a service ... it
> will not catch any system signals when run as an application.

Ok, I am not sure. I run Postgres as a service, and when my Windows rebooted 
after a patch, UNLOGGED tables were cleaned... maybe the patch process in 
Windows messed something up, I don't know.

Hmm.  Do you have checkpoint intervals set very long?  Or do you have the 
Windows shutdown delay(s) set short?

Data in unlogged tables persists only AFTER a checkpoint ... if the tables had 
been written to and were "dirty", and the system went down before the shutdown 
checkpoint (or before the shutdown checkpoint completed), then the tables would 
be truncated at the next startup.


Service control in Windows is very different from Unix/Linux, and Windows is 
not completely POSIX compatible.  I develop software for Windows and Linux, but 
I only use Postgresql.  Postgresql was written originally for Unix and it is 
possible that the Windows version is not doing something quite right.

I took a quick glance at the source for pg_ctl:  SERVICE_CONTROL_SHUTDOWN and 
SERVICE_CONTROL_STOP both just set a shared event to notify the writer 
processes to terminate.  Offhand I don't see where pg_ctl - running as a 
service - is waiting for the writer processes to actually terminate (it does 
wait if run from the command line).  It's possible that your system shut down 
too quickly and the WAL writer was killed instead of terminating cleanly.


Just FYI, re: Postgresql as a user application.

Windows doesn't send *signals* (ala Unix) at all ... it is message based.  The 
control messages are different for applications and services - e.g., 
WM_SHUTDOWN is sent to applications, SERVICE_CONTROL_SHUTDOWN is sent to 
services.  In order for an application to catch a message, it must create a 
window.

pg_ctl is a command line program which does not create any windows (in any 
mode).  It was designed so that it can run as a service, but when run as a 
user application it cannot receive any system messages.  The user *must* 
manually stop a running database cluster before shutting down or sleeping.

George


Hello George... I know about not doing top posting but was emailing from my 
phone, and just recently moved to Android. I think I am still not configured 
right.

Somewhat orthogonal, but any particular reason why top posts == bad, or just 
convention?

I will try a few scenarios and report back. I do not believe I have long 
checkpoint intervals, and I do not believe the Windows machine shuts down faster 
than 'normal'.

Finally, my true question was whether Postgres would support something like 
WORM tables with the performance benefits of UNLOGGED, but not the 
inconvenience of automatic truncation.

Thanks.


Re: [PERFORM] Unlogged tables

2017-08-09 Thread George Neuner

Please don't top post.

On 8/9/2017 2:30 PM, l...@laurent-hasson.com wrote:

> On 8/9/2017 2:17 PM, gneun...@comcast.net wrote:

>> On Wed, 9 Aug 2017 09:14:48 -0700, Jeff Janes  wrote:

>> Why doesn't the Windows scheduled shutdown signal postgres to shutdown
>> cleanly and wait for it to do so?  That is what is supposed to happen.

> Windows *does* signal shutdown (and sleep and hibernate and wakeup).
> pg_ctl can catch these signals only when running as a service ... it
> will not catch any system signals when run as an application.

Ok, I am not sure. I run Postgres as a service, and when my Windows 
rebooted after a patch, UNLOGGED tables were cleaned... maybe the 
patch process in Windows messed something up, I don't know.


Hmm.  Do you have checkpoint intervals set very long?  Or do you have 
the Windows shutdown delay(s) set short?


Data in unlogged tables persists only AFTER a checkpoint ... if the 
tables had been written to and were "dirty", and the system went down 
before the shutdown checkpoint (or before the shutdown checkpoint 
completed), then the tables would be truncated at the next startup.



Service control in Windows is very different from Unix/Linux, and 
Windows is not completely POSIX compatible.  I develop software for 
Windows and Linux, but I only use Postgresql.  Postgresql was written 
originally for Unix and it is possible that the Windows version is not 
doing something quite right.


I took a quick glance at the source for pg_ctl: SERVICE_CONTROL_SHUTDOWN 
and SERVICE_CONTROL_STOP both just set a shared event to notify the 
writer processes to terminate.  Offhand I don't see where pg_ctl - 
running as a service - is waiting for the writer processes to actually 
terminate (it does wait if run from the command line).  It's possible 
that your system shut down too quickly and the WAL writer was killed 
instead of terminating cleanly.



Just FYI, re: Postgresql as a user application.

Windows doesn't send *signals* (ala Unix) at all ... it is message 
based.  The control messages are different for applications and services 
- e.g., WM_SHUTDOWN is sent to applications, SERVICE_CONTROL_SHUTDOWN is 
sent to services.  In order for an application to catch a message, it 
must create a window.


pg_ctl is a command line program which does not create any windows (in 
any mode).  It was designed so that it can run as a service, but when 
run as a user application it cannot receive any system messages.  
The user *must* manually stop a running database cluster before shutting 
down or sleeping.


George



Re: [PERFORM] Unlogged tables

2017-08-09 Thread l...@laurent-hasson.com
Ok, I am not sure. I run Postgres as a service, and when my Windows rebooted 
after a patch, UNLOGGED tables were cleaned... maybe the patch process in 
Windows messed something up, I don't know.

From: gneun...@comcast.net
Sent: August 9, 2017 13:17
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Unlogged tables


On Wed, 9 Aug 2017 09:14:48 -0700, Jeff Janes <jeff.ja...@gmail.com> wrote:

 >Why doesn't the Windows scheduled shutdown signal postgres to shutdown
 >cleanly and wait for it to do so?  That is what is supposed to happen.

Windows *does* signal shutdown (and sleep and hibernate and wakeup).
pg_ctl can catch these signals only when running as a service ... it
will not catch any system signals when run as an application.

George


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Unlogged tables

2017-08-09 Thread George Neuner

On Wed, 9 Aug 2017 09:14:48 -0700, Jeff Janes  wrote:

>Why doesn't the Windows scheduled shutdown signal postgres to shutdown
>cleanly and wait for it to do so?  That is what is supposed to happen.

Windows *does* signal shutdown (and sleep and hibernate and wakeup).  
pg_ctl can catch these signals only when running as a service ... it 
will not catch any system signals when run as an application.


George


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Unlogged tables

2017-08-09 Thread Jeff Janes
On Tue, Aug 8, 2017 at 8:20 PM, l...@laurent-hasson.com <
l...@laurent-hasson.com> wrote:

> Hello,
>
>
> We have a fairly large static dataset that we load into Postgres. We made
> the tables UNLOGGED and saw a pretty significant performance improvement
> for the loading. This was all fantastic until the server crashed and we
> were surprised to see during a follow up demo that the data had
> disappeared... Of course, it's all our fault for not understanding the
> implications of UNLOGGED properly.
>
>
> However, our scenario is truly a set of tables with 100's of millions of
> rows that are effectively WORMs: we write them once only, and then only
> read from them afterwards. As such, they could not be possibly corrupted
> post-load (i think) during a server crash (short of physical disk
> defects...).
>

Yes, this is a feature many people have wanted.   You'd have to somehow
mark the unlogged table as immutable and then do a checkpoint, after which
it would no longer need to be truncated after a crash.  Alternatively, it
could be done automatically where the system would somehow know which
unlogged tables were possibly touched since the last successful checkpoint,
and truncate only those.  But, no one has implemented such a thing.

>
> I'd like to have the performance improvement during an initial batch
> insert, and then make sure the table remains after "unclean" shutdowns,
> which, as you might have it, includes a regular Windows server shut down
> during patching for example.
>

Why doesn't the Windows scheduled shutdown signal postgres to shutdown
cleanly and wait for it to do so?  That is what is supposed to happen.


> So unlogged tables in practice are pretty flimsy. I tried to ALTER ... SET
> LOGGED, but that takes a VERY long time and pretty much negates the initial
> performance boost of loading into an unlogged table.
>

Are you using streaming or wal logging?

Cheers,

Jeff


Re: [PERFORM] Unlogged tables

2017-08-09 Thread Stephen Frost
David, all,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Wed, Aug 9, 2017 at 3:39 AM, Michael Paquier 
> wrote:
> 
> > This triggers a table rewrite and makes sure that all the data gets
> > WAL-logged. The cost to pay for durability.

That's not entirely accurate - there are certain cases where we don't
have to WAL-log the data, in fact we've got a specific optimization to
avoid WAL logging when it isn't necessary (see
src/backend/commands/copy.c:2392 or so), and the data will still be
durable once the transaction commits.  There are limitations there
though, of course, but it sounds like those are ones the OP may be happy
to live with in this case.

> > > Is there a way to get my cake and eat it too?
> >
> > Not completely. Making data durable will have a cost at the end, but
> > you can leverage it.
>
> Aren't you over-playing the role of the WAL in providing durability?  An
> unlogged table remains intact after a clean shutdown and so is "durable" if
> one considers the primary "permanence" aspect of the word.

In database terms, however, durable is intended to be in the face of a
crash and not just a clean shutdown, otherwise we wouldn't need to bother
with this whole WAL thing at all.

> The trade-off the OP wishes for is "lose crash-safety to gain write-once
> (to the data files) performance".  Seeming having this on a per-table basis
> would be part of the desirability.  It sounds like OP would be willing to
> place the table into "read only" mode in order to ensure this - which is
> something that is not presently possible.  I could envision that putting an
> unlogged table into read-only mode would cause the system to ensure that
> the data files are fully populated and then set a flag in the catalog that
> informs the crash recovery process to go ahead and omit truncating that
> particular unlogged table since the data files are known to be accurate.

This does sound like a pretty interesting idea, though not really
necessary unless OP has a mix of data that needs to be WAL-log'd and
data that doesn't.

What I believe OP is really looking for here, specifically, is using
wal_level = minimal while creating the table (or truncating it) within
the same transaction as the data load is done.  That will avoid having
the table's contents written into the WAL, and PG will treat it as a
regular table post-commit, meaning that it won't be truncated on a
database crash.
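
A sketch of that pattern (the table and file names are invented; it assumes 
wal_level = minimal, which in turn requires max_wal_senders = 0):

    BEGIN;
    CREATE TABLE worm_facts (id bigint, payload text);  -- or TRUNCATE an existing table
    COPY worm_facts FROM '/srv/load/worm_facts.csv' WITH (FORMAT csv);
    COMMIT;
    -- with wal_level = minimal, the COPY above skips WAL for the rows, yet
    -- worm_facts is an ordinary logged table after COMMIT and is not
    -- truncated after a crash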

Thanks!

Stephen




Re: [PERFORM] Unlogged tables

2017-08-09 Thread David G. Johnston
On Wed, Aug 9, 2017 at 3:39 AM, Michael Paquier 
wrote:

> This triggers a table rewrite and makes sure that all the data gets
> WAL-logged. The cost to pay for durability.
>
> > Is there a way to get my cake and eat it too?
>
> Not completely. Making data durable will have a cost at the end, but
> you can leverage it.
>
>
Aren't you over-playing the role of the WAL in providing durability?  An
unlogged table remains intact after a clean shutdown and so is "durable" if
one considers the primary "permanence" aspect of the word.

The trade-off the OP wishes for is "lose crash-safety to gain write-once
(to the data files) performance".  Seeming having this on a per-table basis
would be part of the desirability.  It sounds like OP would be willing to
place the table into "read only" mode in order to ensure this - which is
something that is not presently possible.  I could envision that putting an
unlogged table into read-only mode would cause the system to ensure that
the data files are fully populated and then set a flag in the catalog that
informs the crash recovery process to go ahead and omit truncating that
particular unlogged table since the data files are known to be accurate.

David J.


Re: [PERFORM] Unlogged tables

2017-08-09 Thread Michael Paquier
On Wed, Aug 9, 2017 at 5:20 AM, l...@laurent-hasson.com
 wrote:
> We have a fairly large static dataset that we load into Postgres. We made
> the tables UNLOGGED and saw a pretty significant performance improvement for
> the loading. This was all fantastic until the server crashed and we were
> surprised to see during a follow up demo that the data had disappeared... Of
> course, it's all our fault for not understanding the implications of
> UNLOGGED properly.

This is documented.

> However, our scenario is truly a set of tables with 100's of millions of
> rows that are effectively WORMs: we write them once only, and then only read
> from them afterwards. As such, they could not be possibly corrupted
> post-load (i think) during a server crash (short of physical disk
> defects...).
>
> I'd like to have the performance improvement during an initial batch insert,
> and then make sure the table remains after "unclean" shutdowns, which, as
> you might have it, includes a regular Windows server shut down during
> patching for example. So unlogged tables in practice are pretty flimsy.

All the data that you want to keep needs to be durable anyway, so you
will need to WAL-log it, and full page writes of those relation pages
will need to be created at least once. After you get past the
checkpoint the data will still be around. If you want to improve the
performance of the one-time load, there are a couple of tricks, like
switching wal_level to minimal, preferring COPY over multi-value
INSERT, and batching a lot of them in the same transaction. Of course
you can as well increase max_wal_size to trigger fewer checkpoints, or
use synchronous_commit = off to reduce fsync costs.
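
Roughly, and with made-up values (changing wal_level needs a server restart, 
and minimal also requires max_wal_senders = 0):

    ALTER SYSTEM SET wal_level = 'minimal';
    ALTER SYSTEM SET max_wal_size = '20GB';        -- fewer checkpoints during the load
    ALTER SYSTEM SET synchronous_commit = 'off';   -- cheaper commits, small loss window
    SELECT pg_reload_conf();   -- picks up the last two; wal_level waits for a restart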

> I tried to ALTER ... SET LOGGED, but that takes a VERY long time and pretty
> much negates the initial performance boost of loading into an unlogged
> table.

This triggers a table rewrite and makes sure that all the data gets
WAL-logged. The cost to pay for durability.

> Is there a way to get my cake and eat it too?

Not completely. Making data durable will have a cost at the end, but
you can leverage it.
-- 
Michael


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Unlogged tables

2017-08-08 Thread l...@laurent-hasson.com
Hello,


We have a fairly large static dataset that we load into Postgres. We made the 
tables UNLOGGED and saw a pretty significant performance improvement for the 
loading. This was all fantastic until the server crashed and we were surprised 
to see during a follow up demo that the data had disappeared... Of course, it's 
all our fault for not understanding the implications of UNLOGGED properly.


However, our scenario is truly a set of tables with 100's of millions of rows 
that are effectively WORMs: we write them once only, and then only read from 
them afterwards. As such, they could not possibly be corrupted post-load (I 
think) during a server crash (short of physical disk defects...).


I'd like to have the performance improvement during an initial batch insert, and 
then make sure the table remains after "unclean" shutdowns, which, as you might 
have it, includes a regular Windows server shut down during patching for 
example. So unlogged tables in practice are pretty flimsy. I tried to ALTER ... 
SET LOGGED, but that takes a VERY long time and pretty much negates the initial 
performance boost of loading into an unlogged table.


Is there a way to get my cake and eat it too?


Thank you,

Laurent Hasson





Re: [PERFORM] unlogged tables

2015-04-14 Thread Jeff Janes
On Tue, Apr 14, 2015 at 8:41 AM, Yves Dorfsman y...@zioup.com wrote:

 On 2015-04-13 17:49, Jeff Janes wrote:
 
  One way would be to lock dirty buffers from unlogged relations into
  shared_buffers (which hardly seems like a good thing) until the start of a
  super-checkpoint and then write them all out as fast as possible (which kind
  of defeats checkpoint_completion_target).  And then if the crash happened
  during a super-checkpoint, the data would still be inconsistent and need to be
  truncated.
 

 What do you call a super-checkpoint?


A hypothetical checkpoint which includes writing and flushing pages of
unlogged tables.

Presumably you wouldn't want every checkpoint to do this, because if done
the way I described the super-checkpoint is a vulnerable period.  Crashes
that happen during it would result in truncation of the unlogged relation.
Since that is the very thing we want to avoid, you would want to make these
vulnerable periods rare.

Cheers,

Jeff


Re: [PERFORM] unlogged tables

2015-04-14 Thread Jeff Janes
On Mon, Apr 13, 2015 at 8:28 PM, David G. Johnston 
david.g.johns...@gmail.com wrote:

 On Mon, Apr 13, 2015 at 7:45 PM, Jim Nasby jim.na...@bluetreble.com
 wrote:



 There's been recent discussion of adding support for read-only tables. If
 we had those, we might be able to support something like...

 INSERT INTO unlogged;
 ALTER TABLE unlogged SET READ ONLY;
 CHECKPOINT;
 /* take backup */

 This should be safe as long as we WAL log changes to read-only status
 (which presumably we would).

 How much work that would entail though, I don't know.


Right.  I've been keeping an eye on that discussion with the same
intention.  The big question is how, during recovery, does it know what
state the table was in without being able to read from the system
catalogs?  Perhaps it would be the checkpointer's duty at the end of the
checkpoint to remove the init fork for unlogged relations which were turned
to read only before that checkpoint started.



 Ultimately you still have to get the data over to the other machine
 anyway. ISTM it'd be a LOT more useful to look at ways to make the WAL
 logging of bulk inserts (and especially COPY into a known empty table) a
 lot more efficient.


 ​Jeff Janes makes a comment about wanting ...to avoid blowing out the
 log archive...; which I also don't quite follow...


I think the WAL logging of bulk COPY is pretty space-efficient already,
provided it is not indexed at the time of the COPY.  But no amount of
efficiency improvement is going to make them small enough for me to want to
keep the WAL logs around beyond the next base backup.

What I would really want is a way to make two separate WAL streams; changes
to this set of tables go to the "keep forever, for PITR" stream, and
changes to this other set of tables go to the "keep until pg_basebackup is
next run" stream.  Of course you couldn't have fk constraints between the
two different sets of tables.

Having to get the data over to the other machine doesn't bother me, it is
just a question of how to do it without permanently intermingling it with
WAL logs which I want to keep forever.

The FDW would be a good option, except the overhead (both execution
overhead and the overhead of poor plans) seems to be too large.  I haven't
explored it as much as I would like.

Cheers,

Jeff


Re: [PERFORM] unlogged tables

2015-04-14 Thread dgabriel
David G Johnston wrote
 Well, that is half right anyway.  UNLOGGED tables obey checkpoints just
 like any other table.  The missing feature is an option to leave it restored
 to the last checkpoint.  Instead, not knowing whether there were changes
 since the last checkpoint, the system truncated the relation.
 
 What use case is there for a behavior where the last checkpoint data is
 left on the relation upon restarting - not knowing whether it was possible
 that other data could have been written subsequently?

If it is possible to restore the table to its last-checkpoint state, that will be
more than enough. I don't care about the changes since the last checkpoint; I am
willing to lose those changes.  There are use cases where it is acceptable to
lose some data, for example in a cache system - it is not a big issue if we
lose some cached data.



--
View this message in context: 
http://postgresql.nabble.com/unlogged-tables-tp4985453p5845650.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2015-04-14 Thread Yves Dorfsman
On 2015-04-13 17:49, Jeff Janes wrote:
 
 One way would be to lock dirty buffers from unlogged relations into
 shared_buffers (which hardly seems like a good thing) until the start of a
 super-checkpoint and then write them all out as fast as possible (which kind
 of defeats checkpoint_completion_target).  And then if the crash happened
 during a super-checkpoint, the data would still be inconsistent and need to be
 truncated.
  

What do you call a super-checkpoint?

-- 
http://yves.zioup.com
gpg: 4096R/32B0F416



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2015-04-14 Thread Jim Nasby

On 4/14/15 10:56 AM, dgabriel wrote:

David G Johnston wrote

Well, that is half right anyway.  UNLOGGED tables obey checkpoints just
like any other table.  The missing feature is an option to leave it restored
to the last checkpoint.  Instead, not knowing whether there were changes
since the last checkpoint, the system truncated the relation.

What use case is there for a behavior where the last checkpoint data is
left on the relation upon restarting - not knowing whether it was possible
that other data could have been written subsequently?


If it is possible to restore the table to its last-checkpoint state, that will be
more than enough. I don't care about the changes since the last checkpoint; I am
willing to lose those changes.  There are use cases where it is acceptable to
lose some data, for example in a cache system - it is not a big issue if we
lose some cached data.


It is not. Unless you ensure that data is written to WAL (on disk) 
BEFORE it is written to the data pages, you will probably have 
corruption after a crash, and have no way to prevent or possibly even 
detect the corruption.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2015-04-13 Thread Matheus de Oliveira
On Mon, Apr 13, 2015 at 4:31 PM, dgabriel gabriel.do...@gmail.com wrote:

 In the event of a normal shutdown, we can flush all the writes to disk
 so we know all the data has been written, so there is no need to truncate.

 Isn't it possible to periodically flush data to disk and, in case of a crash,
 have Postgres load only the data that existed at the last flush? The periodic
 flush could be configurable, for example every 30 minutes or after x rows
 updated/inserted.


There is no such facility implemented for UNLOGGED TABLEs. That could be a
feature request though.

Best regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


Re: [PERFORM] unlogged tables

2015-04-13 Thread dgabriel
In the event of a normal shutdown, we can flush all the writes to disk 
so we know all the data has been written, so there is no need to truncate.

Isn't it possible to periodically flush data to disk and, in case of a crash,
have Postgres load only the data that existed at the last flush? The periodic
flush could be configurable, for example every 30 minutes or after x rows
updated/inserted.



--
View this message in context: 
http://postgresql.nabble.com/unlogged-tables-tp4985453p5845576.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2015-04-13 Thread David G. Johnston
On Monday, April 13, 2015, Matheus de Oliveira matioli.math...@gmail.com
wrote:


  On Mon, Apr 13, 2015 at 4:31 PM, dgabriel gabriel.do...@gmail.com wrote:

 In the event of a normal shutdown, we can flush all the writes to disk
 so we know all the data has been written, so there is no need to
 truncate.

  Isn't it possible to periodically flush data to disk and, in case of a crash,
  have Postgres load only the data that existed at the last flush? The periodic
  flush could be configurable, for example every 30 minutes or after x rows
  updated/inserted.


 There is no such facility implemented for UNLOGGED TABLEs. That could be a
 feature request though.


Well, that is half right anyway.  UNLOGGED tables obey checkpoints just
like any other table.  The missing feature is an option to leave it restored
to the last checkpoint.  Instead, not knowing whether there were changes since
the last checkpoint, the system truncated the relation.

What use case is there for a behavior where the last checkpoint data is left
on the relation upon restarting - not knowing whether it was possible that
other data could have been written subsequently?

David J.


Re: [PERFORM] unlogged tables

2015-04-13 Thread dgabriel
That will be a very useful feature. I don't care if I lose 1-2 hours of data.
I know we could have some cron jobs to dump the table periodically, but the
table could be big, so this operation could be expensive. Also, I have to
detect when Postgres crashes, and I have no idea how I can detect if Postgres
crashed. Then I have to somehow attach a script at Postgres start to restore
the dumps... the dump solution is very complicated and unreliable. A periodic
flush feature would be amazing!

What is the procedure for a feature request on Postgres - GitHub?

Thanks!



--
View this message in context: 
http://postgresql.nabble.com/unlogged-tables-tp4985453p5845580.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2015-04-13 Thread Matheus de Oliveira
On Mon, Apr 13, 2015 at 5:30 PM, Yves Dorfsman y...@zioup.com wrote:


 In my experience postgres is very aggressive in getting rid of unlogged
 tables, it does get rid of them from shutdowns that seem perfectly fine (no
 crash). A lot of people get surprised by this.


Shutdowns in fast or smart modes do not get rid of unlogged tables.
But if you do immediate, then it does, and I don't see why people get
surprised by it, as you probably shouldn't be using immediate mode in
normal circumstances.

Best regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


Re: [PERFORM] unlogged tables

2015-04-13 Thread Yves Dorfsman
On 2015-04-13 14:16, dgabriel wrote:
 That will be a very useful feature.

I agree, unlogged tables would be a lot more useful if they didn't disappear
on re-start.


 could be expensive. Also, I have to detect when Postgres crashes, and I have no
 idea how I can detect if Postgres crashed. Then I have to somehow attach a
 script at Postgres start to restore the dumps... the dump solution is very
 complicated and unreliable. A periodic flush feature would be amazing!

In my experience postgres is very aggressive in getting rid of unlogged
tables, it does get rid of them from shutdowns that seem perfectly fine (no
crash). A lot of people get surprised by this.

-- 
http://yves.zioup.com
gpg: 4096R/32B0F416



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2015-04-13 Thread Jim Nasby

On 4/13/15 3:49 PM, David G. Johnston wrote:

On Monday, April 13, 2015, Matheus de Oliveira matioli.math...@gmail.com wrote:
On Mon, Apr 13, 2015 at 4:31 PM, dgabriel gabriel.do...@gmail.com wrote:

In the event of a normal shutdown, we can flush all the writes to disk
so we know all the data has been written, so there is no need to truncate.

Isn't it possible to periodically flush data to disk and, in case of a crash,
have Postgres load only the data that existed at the last flush? The periodic
flush could be configurable, for example every 30 minutes or after x rows
updated/inserted.

There is no such facility implemented for UNLOGGED TABLEs. That
could be a feature request though.

Well, that is half right anyway.  UNLOGGED tables obey checkpoints just
like any other table.  The missing feature is an option to
leave it restored to the last checkpoint.  Instead, not knowing whether there
were changes since the last checkpoint, the system truncated the relation.

What use case is there for a behavior where the last checkpoint data is
left on the relation upon restarting - not knowing whether it was
possible that other data could have been written subsequently?


Yeah, this is not something that would be very easy to accomplish, 
because a buffer can get evicted and written to disk at any point. It 
wouldn't be too hard to read every unlogged table during recovery and 
see if there are any pages that were written after the last checkpoint, 
but that obviously won't be very fast.


Actually, I suppose we could dedicate a fork for unlogged tables and use 
that to record the newest LSN of any page that's been written out. But 
if you have much of any write activity on the table that's probably 
going to be completely useless.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2015-04-13 Thread Alvaro Herrera
Jim Nasby wrote:

 Yeah, this is not something that would be very easy to accomplish, because a
 buffer can get evicted and written to disk at any point. It wouldn't be too
 hard to read every unlogged table during recovery and see if there are any
 pages that were written after the last checkpoint, but that obviously won't
 be very fast.

If you consider only tables, then yeah perhaps this is easy to
accomplish (not really convinced myself).  But if you consider indexes,
things are not so easy anymore.


In the thread from 2011 (which this started as a reply to) the OP was
doing frequent UPDATEs to keep track of counts of something.  I think
that would be better served by using INSERTs of deltas and periodic
accumulation of grouped values, as suggested in
http://www.postgresql.org/message-id/20150305211601.gw3...@alvh.no-ip.org
This has actually been suggested many times over the years.
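
A sketch of that insert-deltas-then-aggregate pattern (the table names are 
hypothetical, and it assumes every counter already has a row in the summary 
table):

    -- cheap append-only writes instead of contended UPDATEs
    INSERT INTO counter_deltas (counter_id, delta) VALUES (42, 1);

    -- periodic job: fold the accumulated deltas into the summary table
    BEGIN;
    WITH moved AS (
        DELETE FROM counter_deltas RETURNING counter_id, delta
    )
    UPDATE counters c
       SET value = c.value + d.total
      FROM (SELECT counter_id, sum(delta) AS total
              FROM moved GROUP BY counter_id) d
     WHERE c.counter_id = d.counter_id;
    COMMIT;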

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2015-04-13 Thread David G. Johnston
On Mon, Apr 13, 2015 at 4:49 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Mon, Apr 13, 2015 at 1:49 PM, David G. Johnston 
 david.g.johns...@gmail.com wrote:

 On Monday, April 13, 2015, Matheus de Oliveira matioli.math...@gmail.com
 wrote:


 On Mon, Apr 13, 2015 at 4:31 PM, dgabriel gabriel.do...@gmail.com
 wrote:

 In the event of a normal shutdown, we can flush all the writes to disk
 so we know all the data has been written, so there is no need to
 truncate.

  Isn't it possible to periodically flush data to disk and, in case of a crash,
  have Postgres load only the data that existed at the last flush? The periodic
  flush could be configurable, for example every 30 minutes or after x rows
  updated/inserted.


 There is no such facility implemented for UNLOGGED TABLEs. That could be
 a feature request though.


 One way would be to lock dirty buffers from unlogged relations into
 shared_buffers (which hardly seems like a good thing) until the start of a
 super-checkpoint and then write them all out as fast as possible (which
 kind of defeats checkpoint_completion_target).  And then if the crash
 happened during a super-checkpoint, the data would still be inconsistent
 and need to be truncated.




 Well, that is half right anyway.  UNLOGGED tables obey checkpoints just
 like any other table.


 Do they?  I thought they only obeyed shutdown checkpoints, not online
 checkpoints.  I do remember some changes around this area, but none that
 completely reverted that logic.


​I vaguely recall that conversation now...I'm not positive on the exact
mechanics here and, as it pertains to the OP, the difference you describe
is immaterial since in either case the status quo mandates an all or
nothing approach to an unlogged table's contents.​





 The missing feature is an option to leave it restored to the last checkpoint.
 Instead, not knowing whether there were changes since the last checkpoint,
 the system truncated the relation.

 What use case is there for a behavior where the last checkpoint data is
 left on the relation upon restarting - not knowing whether it was possible
 that other data could have been written subsequently?


 I would like a way to have unlogged tables be available on a replica
 provided that no changes were made to them between the pg_basebackup and
 the recovery point.


 My use case is that I mark certain read-only-after-bulk-loading tables as
 unlogged solely to avoid blowing out the log archive during the loading
 phase and refresh phase.  This is stuff like vendor catalogs, NCBI
 datasets, ChEMBL datasets, etc, which can simply be re-derived from the
 reference.  It would be nice if these were still available (without having
 to repeat the ETL) after crashes provided they were not written to since a
 checkpoint, and available on cloned test servers without having to repeat
 the ETL on those as well.



​My gut reaction is that those should be in their own clusters and accessed
via postgres_fdw...

That particular use-case would probably best be served with a separate
replication channel which pushes data files from the primary to the slaves
and allows for the slave to basically rewrite its existing table by
pointing to the newly supplied version.  Some kind of CREATE STATIC TABLE
and PUSH STATIC TABLE TO {all | replica name} command combo...though
ideally with less manual intervention...

David J.​


Re: [PERFORM] unlogged tables

2015-04-13 Thread Jim Nasby

On 4/13/15 4:13 PM, Alvaro Herrera wrote:

Jim Nasby wrote:


Yeah, this is not something that would be very easy to accomplish, because a
buffer can get evicted and written to disk at any point. It wouldn't be too
hard to read every unlogged table during recovery and see if there are any
pages that were written after the last checkpoint, but that obviously won't
be very fast.


If you consider only tables, then yeah perhaps this is easy to
accomplish (not really convinced myself).  But if you consider indexes,
things are not so easy anymore.


Are indexes not guaranteed to have LSNs? I thought they basically 
followed the same write rules as heap pages in regard to WAL first.


Though, if you have an index that doesn't support logging (like hash) 
you're still hosed...



In the thread from 2011 (which this started as a reply to) the OP was


I don't keep PGSQL emails from that far back... ;)


doing frequent UPDATEs to keep track of counts of something.  I think
that would be better served by using INSERTs of deltas and periodic
accumulation of grouped values, as suggested in
http://www.postgresql.org/message-id/20150305211601.gw3...@alvh.no-ip.org
This has actually been suggested many times over the years.


What I was suggesting certainly wouldn't help you if you were getting 
any serious amount of changes to the count.


I am wondering though what the bottleneck in HEAD is with doing an 
UPDATE instead of an INSERT, at least where unlogged would help 
significantly. I didn't think we logged all that much more for an 
UPDATE. Heck, with HOT you might even be able to log less.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2015-04-13 Thread Jeff Janes
On Mon, Apr 13, 2015 at 1:49 PM, David G. Johnston 
david.g.johns...@gmail.com wrote:

 On Monday, April 13, 2015, Matheus de Oliveira matioli.math...@gmail.com
 wrote:


 On Mon, Apr 13, 2015 at 4:31 PM, dgabriel gabriel.do...@gmail.com
 wrote:

 In the event of a normal shutdown, we can flush all the writes to disk
 so we know all the data has been written, so there is no need to
 truncate.

  Isn't it possible to periodically flush data to disk and, in case of a crash,
  have Postgres load only the data that existed at the last flush? The periodic
  flush could be configurable, for example every 30 minutes or after x rows
  updated/inserted.


 There is no such facility implemented for UNLOGGED TABLEs. That could be
 a feature request though.


One way would be to lock dirty buffers from unlogged relations into
shared_buffers (which hardly seems like a good thing) until the start of a
super-checkpoint and then write them all out as fast as possible (which
kind of defeats checkpoint_completion_target).  And then if the crash
happened during a super-checkpoint, the data would still be inconsistent
and need to be truncated.




 Well, that is half right anyway.  UNLOGGED tables obey checkpoints just
 like any other table.


Do they?  I thought they only obeyed shutdown checkpoints, not online
checkpoints.  I do remember some changes around this area, but none that
completely reverted that logic.



  The missing feature is an option to leave it restored to the last checkpoint.
  Instead, not knowing whether there were changes since the last checkpoint,
  the system truncated the relation.

  What use case is there for a behavior where the last checkpoint data is
  left on the relation upon restarting - not knowing whether it was possible
  that other data could have been written subsequently?


I would like a way to have unlogged tables be available on a replica
provided that no changes were made to them between the pg_basebackup and
the recovery point.

My use case is that I mark certain read-only-after-bulk-loading tables as
unlogged solely to avoid blowing out the log archive during the loading
phase and refresh phase.  This is stuff like vendor catalogs, NCBI
datasets, ChEMBL datasets, etc, which can simply be re-derived from the
reference.  It would be nice if these were still available (without having
to repeat the ETL) after crashes provided they were not written to since a
checkpoint, and available on cloned test servers without having to repeat
the ETL on those as well.

As for "maybe it's corrupt, maybe it isn't, but let's keep them anyway",
yeah, I have little use for that.

Cheers,

Jeff


Re: [PERFORM] unlogged tables

2015-04-13 Thread Jim Nasby

On 4/13/15 7:32 PM, David G. Johnston wrote:

The missing feature is an option to leave it restored to the last
checkpoint.  Instead, not knowing whether there were changes
since the last checkpoint, the system truncated the relation.

What use case is there for a behavior where the last checkpoint
data is left on the relation upon restarting - not knowing
whether it was possible that other data could have been written
subsequently?


I would like a way to have unlogged tables be available on a replica
provided that no changes were made to them between the pg_basebackup
and the recovery point.


My use case is that I mark certain read-only-after-bulk-loading
tables as unlogged solely to avoid blowing out the log archive
during the loading phase and refresh phase.  This is stuff like
vendor catalogs, NCBI datasets, ChEMBL datasets, etc, which can
simply be re-derived from the reference.  It would be nice if these
were still available (without having to repeat the ETL) after
crashes provided they were not written to since a checkpoint, and
available on cloned test servers without having to repeat the ETL on
those as well.


​My gut reaction is that those should be in their own clusters and
accessed via postgres_fdw...


Likely to produce really crappy plans if the tables are of any real size...


That particular use-case would probably best be served with a separate
replication channel which pushes data files from the primary to the
slaves and allows for the slave to basically rewrite its existing
table by pointing to the newly supplied version.  Some kind of CREATE
STATIC TABLE and PUSH STATIC TABLE TO {all | replica name} command
combo...though ideally with less manual intervention...


You still have the same problem of knowing if someone has scribbled on 
the data since the last checkpoint.


There's been recent discussion of adding support for read-only tables. 
If we had those, we might be able to support something like...


INSERT INTO unlogged;
ALTER TABLE unlogged SET READ ONLY;
CHECKPOINT;
/* take backup */

This should be safe as long as we WAL log changes to read-only status 
(which presumably we would).


How much work that would entail though, I don't know.

Ultimately you still have to get the data over to the other machine 
anyway. ISTM it'd be a LOT more useful to look at ways to make the WAL 
logging of bulk inserts (and especially COPY into a known empty table) a 
lot more efficient.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2015-04-13 Thread David G. Johnston
On Mon, Apr 13, 2015 at 7:45 PM, Jim Nasby jim.na...@bluetreble.com wrote:

 On 4/13/15 7:32 PM, David G. Johnston wrote:​

  That particular use-case would probably best be served with a separate
 replication channel which pushes data files from the primary to the
 slaves and allows for the slave to basically rewrite its existing
 table by pointing to the newly supplied version.  Some kind of CREATE
 STATIC TABLE and PUSH STATIC TABLE TO {all | replica name} command
 combo...though ideally with less manual intervention...


 You still have the same problem of knowing if someone has scribbled on the
 data since the last checkpoint.


​That seems like an automation concern though...the more limited idea was
to simply have a means for a table to exist on the master and allow the
user to cause an exact copy of that table to appear on a replica via direct
data transfer (i.e., without the need to create a backup/dump).  If the table
already exists on the replica, the existing version remains as-is until the
new table is fully pushed and then a filenode pointer update happens.  If
changes are made to the master the two tables will remain diverged until a
new push occurs.​

I imagine this same idea could be handled externally to the database, though
I don't know enough to comment on the specific technical merits of each.


 There's been recent discussion of adding support for read-only tables. If
 we had those, we might be able to support something like...

 INSERT INTO unlogged;
 ALTER TABLE unlogged SET READ ONLY;
 CHECKPOINT;
 /* take backup */

 This should be safe as long as we WAL log changes to read-only status
 (which presumably we would).

 How much work that would entail though, I don't know.

 Ultimately you still have to get the data over to the other machine
 anyway. ISTM it'd be a LOT more useful to look at ways to make the WAL
 logging of bulk inserts (and especially COPY into a known empty table) a
  lot more efficient.


Jeff Janes makes a comment about wanting "...to avoid blowing out the log
archive..."; which I also don't quite follow...

WAL does seem to be designed to solve a different problem than what is
described here - lots of small changes versus a few large changes.  Improving
WAL to move the size at which "small" becomes "large" is a win, but another
channel designed for a few large changes may be less complex to implement.
The current work in logical replication likely has merit here as well but
my familiarity with that technology is fairly limited.

David J.


Re: [PERFORM] unlogged tables

2011-12-04 Thread Pierre C
My table is a statistics counters table, so I can live with a partial data
loss, but not with a full data loss because many counters are weekly and
monthly.

An unlogged table can increase speed - this table has about 1.6 million
updates per hour - but an unlogged table with a chance of losing all the
information on a crash is not a good idea for this.


You could use an unlogged table for hourly updates, and periodically
accumulate those counters into a (logged) daily/weekly table...


The hourly table could be rebuilt by examining only 1 hour's worth of
data, so it isn't too much of a problem if it's lost. The other tables
would get far fewer updates.
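
A sketch of that layout (names are invented; it assumes the weekly rows 
already exist and ignores concurrent writers for brevity):

    CREATE UNLOGGED TABLE counters_hourly (counter_id int PRIMARY KEY, hits bigint);

    -- run once an hour: fold the unlogged counters into the durable weekly table
    BEGIN;
    UPDATE counters_weekly w
       SET hits = w.hits + h.hits
      FROM counters_hourly h
     WHERE w.counter_id = h.counter_id
       AND w.week = date_trunc('week', now());
    TRUNCATE counters_hourly;
    COMMIT;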


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2011-11-15 Thread Josh Berkus

 An unlogged table can increase speed, this table has about 1.6
 million updates per hour, but unlogged with a chance of losing
 all information on a crash is not a good idea for this.
  
 pg_dump -t 'tablename' from a cron job?  (Make sure to rotate dump
 file names, maybe with day of week or some such.)

Or just use CREATE TABLE ... AS to copy the table every hour to a second,
backup table.  Then it would be much easier to script an automated restore of
the data.
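
For example (hypothetical table names):

    -- hourly cron job: keep a plain, logged snapshot of the unlogged table
    DROP TABLE IF EXISTS counters_backup;
    CREATE TABLE counters_backup AS SELECT * FROM counters;

    -- after a crash, repopulate the (now empty) unlogged table from the snapshot:
    -- INSERT INTO counters SELECT * FROM counters_backup;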

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2011-11-14 Thread Sergey Konoplev
Hi,

On 12 November 2011 00:18, Stephen Frost sfr...@snowman.net wrote:
 In a crash, unlogged tables are automatically truncated.

BTW I wonder why they are truncated in a crash case?

-- 
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray...@gmail.com Skype: gray-hemp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] unlogged tables

2011-11-14 Thread Richard Huxton

On 14/11/11 08:10, Sergey Konoplev wrote:

Hi,

On 12 November 2011 00:18, Stephen Frost sfr...@snowman.net wrote:

In a crash, unlogged tables are automatically truncated.


BTW I wonder what for they are truncated in a crash case?


Because they bypass the transaction-log (WAL), hence unlogged.

There's no way to know whether there were partial updates applied when 
the system restarts.


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2011-11-14 Thread Sergey Konoplev
On 14 November 2011 12:58, Richard Huxton d...@archonet.com wrote:
 Because they bypass the transaction-log (WAL), hence unlogged.
 There's no way to know whether there were partial updates applied when the
 system restarts.

I probably did not understand the meaning of truncate correctly. Does it
truncate all the records of the table, or only several recent records?


 --
  Richard Huxton
  Archonet Ltd




-- 
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray...@gmail.com Skype: gray-hemp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2011-11-14 Thread Richard Huxton

On 14/11/11 10:08, Sergey Konoplev wrote:

On 14 November 2011 12:58, Richard Huxton d...@archonet.com wrote:

Because they bypass the transaction-log (WAL), hence unlogged.
There's no way to know whether there were partial updates applied when the
system restarts.


I probably did not understand the meaning of truncate correctly. Does it
truncate all the records of the table, or only several recent records?


All.

Let's say you were doing something like UPDATE unlogged_table SET x=1 
WHERE y=2. If a crash occurs during this command, there's no guarantee 
that the affected disk pages were all updated. Worse, a single page 
might be partially updated or even have rubbish in it (depending on the 
nature of the crash).


Without the WAL there's no way to check whether the table is good or 
not, or even to know what the last updates were. So - the only safe 
thing to do is truncate the unlogged tables.


In the event of a normal shutdown, we can flush all the writes to disk 
so we know all the data has been written, so there is no need to truncate.


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2011-11-14 Thread Sergey Konoplev
On 14 November 2011 14:17, Richard Huxton d...@archonet.com wrote:
 On 14/11/11 10:08, Sergey Konoplev wrote:

 On 14 November 2011 12:58, Richard Huxton d...@archonet.com wrote:
 Let's say you were doing something like UPDATE unlogged_table SET x=1 WHERE
 y=2. If a crash occurs during this command, there's no guarantee that the
 affected disk pages were all updated. Worse, a single page might be
 partially updated or even have rubbish in it (depending on the nature of the
 crash).

 Without the WAL there's no way to check whether the table is good or not, or
 even to know what the last updates were. So - the only safe thing to do is
 truncate the unlogged tables.

 In the event of a normal shutdown, we can flush all the writes to disk so we
 know all the data has been written, so there is no need to truncate.

Thank you for the explanation. Now I understand it.


 --
  Richard Huxton
  Archonet Ltd




-- 
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray...@gmail.com Skype: gray-hemp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2011-11-14 Thread Anibal David Acosta
Maybe an option like "Recover from file" would be useful.
So, for example, daily some process does a COPY of the entire table to a file.

In case of a crash, Postgres recovers the content from the file.

:)



-----Original Message-----
From: Sergey Konoplev [mailto:gray...@gmail.com]
Sent: Monday, November 14, 2011 07:39 AM
To: Richard Huxton
CC: Stephen Frost; Anibal David Acosta; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] unlogged tables

On 14 November 2011 14:17, Richard Huxton d...@archonet.com wrote:
 On 14/11/11 10:08, Sergey Konoplev wrote:

 On 14 November 2011 12:58, Richard Huxton d...@archonet.com wrote:
 Let's say you were doing something like UPDATE unlogged_table SET x=1 
 WHERE y=2. If a crash occurs during this command, there's no 
 guarantee that the affected disk pages were all updated. Worse, a 
 single page might be partially updated or even have rubbish in it 
 (depending on the nature of the crash).

 Without the WAL there's no way to check whether the table is good or 
 not, or even to know what the last updates were. So - the only safe 
 thing to do is truncate the unlogged tables.

 In the event of a normal shutdown, we can flush all the writes to disk 
 so we know all the data has been written, so there is no need to truncate.

Thank you for the explanation. Now I understand it.


 --
  Richard Huxton
  Archonet Ltd




--
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray...@gmail.com Skype: gray-hemp


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2011-11-14 Thread Kevin Grittner
Anibal David Acosta a...@devshock.com wrote:
 
 Maybe an option like "Recover from file" would be useful.
 So, for example, daily some process does a COPY of the entire table to a
 file.
 
 In case of a crash, Postgres recovers the content from the file.
 
If you need to recover file contents on a crash, then an unlogged
table is probably not the right choice.  There is always
asynchronous commit.
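
For instance, asynchronous commit can be enabled for just the counter
updates (table and column names invented):

    BEGIN;
    SET LOCAL synchronous_commit = off;  -- this transaction may be lost in a crash,
                                         -- but the table itself is not corrupted
    UPDATE counters SET hits = hits + 1 WHERE counter_id = 42;
    COMMIT;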
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2011-11-14 Thread Anibal David Acosta
I am doing asynchronous commit, but sometimes I think that there are so many
things in an insert/update transaction for a table that does not hold very
important information.

My table is a statistics counters table, so I can live with a partial data
loss, but not with a full data loss because many counters are weekly and
monthly.

An unlogged table can increase speed - this table has about 1.6 million
updates per hour - but an unlogged table with a chance of losing all the
information on a crash is not a good idea for this.

Anyway, thanks Kevin!







-----Original Message-----
From: Kevin Grittner [mailto:kevin.gritt...@wicourts.gov]
Sent: Monday, November 14, 2011 02:27 PM
To: 'Richard Huxton'; Anibal David Acosta; 'Sergey Konoplev'
CC: pgsql-performance@postgresql.org; 'Stephen Frost'
Subject: Re: [PERFORM] unlogged tables

Anibal David Acosta a...@devshock.com wrote:
 
 Maybe an option like "Recover from file" would be useful. So, for 
 example, daily some process does a COPY of the entire table to a file.
 
 In case of a crash, Postgres recovers the content from the file.
 
If you need to recover file contents on a crash, then an unlogged table is
probably not the right choice.  There is always asynchronous commit.
 
-Kevin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2011-11-14 Thread Kevin Grittner
Anibal David Acosta a...@devshock.com wrote:
 
 I am doing asynchronous commit, but sometimes I think that there
 are so many things in an insert/update transaction, for a table
 that does not hold very important information.
 
 My table is a statistics counters table, so I can live with a
 partial data loss, but not with a full data loss because many
 counters are weekly and monthly.
 
 An unlogged table can increase speed, this table has about 1.6
 million updates per hour, but unlogged with a chance of losing
 all information on a crash is not a good idea for this.
 
pg_dump -t 'tablename' from a cron job?  (Make sure to rotate dump
file names, maybe with day of week or some such.)
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] unlogged tables

2011-11-11 Thread Anibal David Acosta
Hello, just for clarification.

 

Unlogged tables are not memory tables, are they?

 

If we stop the Postgres server (normal stop) and start it again, all information in
unlogged tables still remains?

 

So, can I expect data loss only in case of a crash, power failure or OS
crash, right?

 

In case of a crash, is it possible that data corruption happens in unlogged
tables?

 

For performance purposes, can I use async commit and unlogged tables?

 

 

Thanks!

 

 

 

 



Re: [PERFORM] unlogged tables

2011-11-11 Thread Stephen Frost
* Anibal David Acosta (a...@devshock.com) wrote:
 Unlogged tables are not memory tables, are they?

Unlogged tables are not memory tables.

 If we stop the Postgres server (normal stop) and start it again, all information in
 unlogged tables still remains?

Yes.

 So, can I expect data loss only in case of a crash, power failure or OS
 crash, right?

Yes.

 In case of a crash, is it possible that data corruption happens in unlogged
 tables?

In a crash, unlogged tables are automatically truncated.

 For performance purposes, can I use async commit and unlogged tables?

I'm not aware of any issues (beyond those already documented for async
commit..) with having async commit and unlogged tables.

Thanks,

Stephen

