Re: [HACKERS] restoration after crash slowness, any way to improve?

2016-09-01 Thread Jeff Janes
On Wed, Aug 31, 2016 at 6:26 PM, Joshua D. Drake wrote:

> -hackers,
>
> So this is more of a spitballing thread than anything. As I understand
> it, if we have a long-running transaction or a large number of WAL logs and
> we crash, we then have to restore those logs on restart to the last known
> good transaction. No problem.
>

It only has to replay from the start of the last successful checkpoint.  It
doesn't matter whether there was a long-running transaction or not.  If a
long transaction spans many checkpoints, replay still only has to go back
to the start of the last successful checkpoint.  Maybe you just had
checkpoint_segments or max_wal_size set way too high, assuming
checkpoint_timeout would always kick in first and be the limiting factor,
but then your long-running transaction invalidated that assumption?
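
For reference, a rough sketch of checking and bounding that replay distance
(values purely illustrative; max_wal_size replaced checkpoint_segments in
9.5, and ALTER SYSTEM needs a superuser):

  -- Inspect the settings that bound how much WAL crash recovery must replay.
  SHOW checkpoint_timeout;
  SHOW max_wal_size;   -- 9.5+; earlier releases use checkpoint_segments

  -- Illustrative values: cap WAL between checkpoints so replay stays short.
  ALTER SYSTEM SET max_wal_size = '4GB';
  ALTER SYSTEM SET checkpoint_timeout = '5min';
  SELECT pg_reload_conf();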


> I recently ran a very long transaction. I was building up a large number
> of rows into a two-column table to test index performance. I ended up
> having to kill the connection and thus the transaction after I realized I
> had an extra zero in my generate_series(). (Side note: amazing the
> difference a single zero can make ;)). When coming back up, I watched the
> machine and I was averaging anywhere from 60 MB/s to 97 MB/s on writes.


Was it IO limited?

Killing a session/connection/transaction should not take down the entire
server, so there should be no recovery taking place in the first place.
Are you sure you are seeing recovery, and not just the vacuuming of the
aborted tuples?
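
A quick way to check the vacuuming theory, once the server is accepting
connections again (a sketch; autovacuum workers report what they are doing
in pg_stat_activity's query column):

  -- Autovacuum workers show up with queries like 'autovacuum: VACUUM ...';
  -- crash recovery, by contrast, runs before any connections are accepted.
  SELECT pid, backend_start, query
  FROM pg_stat_activity
  WHERE query LIKE 'autovacuum:%';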



> However, since I know this machine can get well over 400 MB/s when using
> multiple connections, I can't help but wonder if there is anything we can do
> to make restoration more efficient without sacrificing the purpose of what
> it is doing?
>
> Can we have multiple readers pull transaction logs into shared_buffers (on
> recovery only), sort the good transactions and then push them back to the
> walwriter or bgwriter to the pages?
>

I don't see how that could work.  Crash recovery is physical redo: whether
a page is consistent or not is orthogonal to whether the transactions on
that page have committed or aborted, so there is no set of "good
transactions" to sort out at that stage.

There are two possibilities that I've considered though for long-running
PITR, which could also apply to crash recovery, and which I think have been
discussed here before.  One is to have a leading recovery process which
would identify pages which will be recovered from a FPI, and send word back
to the lagging process not to bother applying incremental WAL to those
pages.  The other would be for a leading process to asynchronously read
into memory (either FS cache or shared_buffers) pages which it sees the
lagging process will need to write to.

In the first case, you would want the leading process to be leading by a
lot, so that it has the broadest scope to detect FPI.  Basically you would
want it to read all the way to the end of the replay, provided it had
enough memory to store the list of FPI pages.  For the second one, you
would not want it to run so far ahead that the pages it read in would get
pushed out again before the lagging process got to them.  Controlling how
far ahead that should be seems like it would be hard.

Cheers,

Jeff


[HACKERS] restoration after crash slowness, any way to improve?

2016-08-31 Thread Joshua D. Drake

-hackers,

So this is more of a spitballing thread than anything. As I understand
it, if we have a long-running transaction or a large number of WAL logs
and we crash, we then have to restore those logs on restart to the last
known good transaction. No problem.


I recently ran a very long transaction. I was building up a large number
of rows into a two-column table to test index performance. I ended up
having to kill the connection and thus the transaction after I realized
I had an extra zero in my generate_series(). (Side note: amazing the
difference a single zero can make ;)). When coming back up, I watched
the machine and I was averaging anywhere from 60 MB/s to 97 MB/s on writes.
That in itself isn't that bad for a single thread and a single SSD,
doing what it is doing.
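
(A hypothetical sketch of that kind of test, with invented names and row
counts, just to make the scenario concrete:)

  -- Hypothetical reconstruction only; the real table and counts differ.
  CREATE TABLE idx_test (id bigint, val text);

  -- Intended 100 million rows; the stray zero makes it a billion, all
  -- generated inside one long transaction.
  INSERT INTO idx_test
  SELECT i, md5(i::text)
  FROM generate_series(1, 1000000000) AS s(i);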


However, since I know this machine can get well over 400 MB/s when using
multiple connections, I can't help but wonder if there is anything we can
do to make restoration more efficient without sacrificing the purpose of
what it is doing?


Can we have multiple readers pull transaction logs into shared_buffers 
(on recovery only), sort the good transactions and then push them back 
to the walwriter or bgwriter to the pages?


Anyway, like I said, spitballing and I thought I would start the thread.

Sincerely,

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.

