On Sat, Feb 18, 2012 at 12:36 AM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote: > Attached is a new version, fixing that, and off-by-one bug you pointed out > in the slot wraparound handling. I also moved code around a bit, I think > this new division of labor between the XLogInsert subroutines is more > readable.
This patch includes not only xlog scaling improvement but also other ones. I think it's better to extract them as separate patches and commit them first. If we do so, the main patch would become more readable. For example, I think that the followings can be extracted as a separate patch. (1) Make walwriter try to initialize as many of the no-longer-needed WAL buffers for future use as we can. (2) Refactor the "update full_page_writes code". (3) Get rid of XLogCtl->Write.LogwrtResult and XLogCtl->Insert.LogwrtResult. (4) Call TRACE_POSTGRESQL_XLOG_SWITCH() even if the xlog switch has no work to do. Others? I'm not sure if (3) makes sense. In current master, those two shared variables are used to reduce the contention of XLogCtl->info_lck and WALWriteLock. You think they have no effect on reducing the lock contention? In some places, the spinlock "insertpos_lck" is taken while another spinlock "info_lck" is being held. Is this OK? What if unfortunately inner spinlock takes long to be taken? + * An xlog-switch record consumes all the remaining space on the + * WAL segment. We have already reserved it for us, but we still need + * to make sure it's been allocated and zeroed in the WAL buffers so + * that when the caller (or someone else) does XLogWrite(), it can + * really write out all the zeros. Why do we need to write out all the remaining space with zeros? In current master, we don't do that. A recovery code ignores the data following XLOG_SWITCH record, so I don't think that's required. + /* XXX: before this patch, TRACE_POSTGRESQL_XLOG_SWITCH was not called + * if the xlog switch had no work to do, ie. if we were already at the + * beginning of a new XLOG segment. You can check if RecPtr points to + * beginning of a segment if you want to keep the distinction. + */ + TRACE_POSTGRESQL_XLOG_SWITCH(); If so, RecPtr (or the flag indicating whether the xlog switch has no work to do) should be given to TRACE_POSTGRESQL_XLOG_SWITCH() as an argument? The followings are trivial comments: + * insertion, but ẃhile we're at it, advance lastslot as much as we Typo: 'ẃ' should be 'w' In XLogRecPtrToBufIdx() and XLogRecEndPtrToBufIdx(), why don't you use XLogFileSize instead of XLogSegsPerFile * XLogSegSize? There are some source code comments which still refer to WALInsertLock. It's cleaner if pg_start_backup_callback() uses Insert instead of XLogCtl->Insert, like do_pg_start_backup(), do_pg_stop_backup() and do_pg_abort_backup() do. + freespace = XLOG_BLCKSZ - EndRecPtr.xrecoff % XLOG_BLCKSZ; Though this is not incorrect, it's better to use EndOfLog instead of EndRecPtr, like near code does. while (extraWaits-- > 0) PGSemaphoreUnlock(&MyProc->sem); This should be executed also in WaitForXLogInsertionSlotToBecomeFree()? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers