On Tue, 2010-05-18 at 17:06 -0400, Heikki Linnakangas wrote:
> On 17/05/10 04:40, Simon Riggs wrote:
> > On Sun, 2010-05-16 at 16:53 +0100, Simon Riggs wrote:
> >>>
> >>> Attached patch rearranges the walsender loops slightly to fix the above.
> >>> XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE /
> >>> 2) in one round and returns to the main loop after that even if there's
> >>> unsent WAL, and the main loop no longer sleeps if there's unsent WAL.
> >>> That way the main loop gets to respond to signals quickly, and we also
> >>> get to update the shared memory status and PS display more often when
> >>> there's a lot of catching up to do.
> >>>
> >>> Comments
> >>
> >> 8MB at a time still seems like a large batch to me.
> >>
> >> libpq is going to send it in smaller chunks anyway, so I don't see the
> >> importance of trying to keep the batch too large. It just introduces
> >> delay into the sending process. We should be sending chunks that matches
> >> libpq better.
> >
> > More to the point the logic will fail if XLOG_BLCKSZ>  PQ_BUFFER_SIZE
> > because it will send partial pages.
> 
> I don't see a failure. We rely on not splitting WAL records across 
> messages, but we're talking about libpq-level CopyData messages, not TCP 
> messages.

OK

> > Having MAX_SEND_SIZE>  PQ_BUFFER_SIZE is pointless, as libpq currently
> > stands.
> 
> Well, it does affect the size of the read() in walsender, and I'm sure 
> there's some overhead in setting the ps display and the other small 
> stuff we do once per message. But you're probably right that we could 
> easily make MAX_SEND_SIZE much smaller with no noticeable affect on 
> performance, while making walsender more responsive to signals. I'll 
> decrease it to, say, 512 kB.

I'm pretty certain we don't need to set the ps display once per message.
ps doesn't need an update 5 times per second on average.

There's no reason that the buffer size we use for XLogRead() should be
the same as the send buffer, if you're worried about that. My point is
that pq_putmessage contains internal flushes so at the libpq level you
gain nothing by big batches. The read() will be buffered anyway with
readahead so not sure what the issue is. We'll have to do this for sync
rep anyway, so what's the big deal? Just do it now, once. Do we really
want 9.1 code to differ here?

-- 
 Simon Riggs           www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to