Re: [HACKERS] Idea for fixing the Windows fsync problem

Magnus Hagander Tue, 16 Jan 2007 11:38:39 -0800

Tom Lane wrote:
> I just had a thought about fixing those Windows "permission denied"
> problems.  The case that we believe we understand is where the bgwriter
> is trying to execute a previously-logged fsync request against a table
> file that is pending delete --- that is, actually has been unlink()'d,
> but some other process is holding an open file reference to it.  The
> problem is only for fsync, not for write(), because the table drop
> sequence always invalidates every shared buffer for the table before
> trying to unlink it.
> 
> So: maybe the solution is to add a step to the drop sequence, namely
> revoking any pending fsync request, before unlink.  This would not only
> clean up the Windows issue, it'd also let us remove the current hack in
> md.c to not complain about an ENOENT failure (which is really hardly any
> safer than ignoring EACCES would be, if you want to be honest about it).


Sounds good so far :-)


> The problem is that the ForwardFsyncRequest() mechanism is asynchronous:
> currently, a backend could see pending fsync requests that are still in
> the shared-memory queue, but there's no way to tell whether the bgwriter
> has already absorbed some requests into its private memory.  How can a
> backend tell the bgwriter to forget about it, and then delay until it
> can be sure that the bgwriter won't try it later?
> 
> We could have backends put "revoke fsync" requests into the shared queue
> and then sleep until they see the queue has been drained ... but there's
> not a convenient way to implement that delay, and I hardly want to just
> "sleep and retry" during every table drop.  It'd probably take at least
> one more LWLock, and noticeably more complicated ForwardFsyncRequest()
> logic, to make this work.
> 
> Thoughts?  Is this a reasonable solution path, or is it likely to be a
> waste of time?  We know that there are causes of "permission denied"
> that are not explained by the pending-delete problem.

Do we need to actually wait for it? Does the backend need to know when
it's done? If it fires off the "discard" request, then it's up to the
bgwriter to see it in time, no?

Perhaps we could have the bgwriter check the queue only *if* it gets the
ENOENT/EACCES error, looking for pending drop requests on that file?
Or maybe that's even more complex?  (I confess I haven't looked at the
code..)
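
To make the idea a bit more concrete, here is a rough, self-contained sketch. It is not taken from md.c or the bgwriter code: the request kinds, the queue layout, and the names enqueue_request(), forget_request_pending() and fsync_result_ok() are all made up for illustration, and a real version would have to hold the appropriate lock while scanning shared memory. The backend just fires off a "forget" request, and the bgwriter only consults the queue when fsync fails with ENOENT or EACCES.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request kinds; not the actual PostgreSQL definitions. */
typedef enum { REQ_FSYNC, REQ_FORGET } ReqKind;

typedef struct {
    ReqKind kind;
    char    path[64];
} Request;

#define QUEUE_MAX 16

/* Stand-in for the shared-memory request queue (no locking shown). */
static Request shared_queue[QUEUE_MAX];
static size_t  shared_len = 0;

/* Backend side: append a request and return immediately (fire and forget). */
void enqueue_request(ReqKind kind, const char *path)
{
    Request *req;

    assert(shared_len < QUEUE_MAX);
    req = &shared_queue[shared_len++];
    req->kind = kind;
    strncpy(req->path, path, sizeof(req->path) - 1);
    req->path[sizeof(req->path) - 1] = '\0';
}

/* bgwriter side: is a "forget" request for this file still in the queue? */
bool forget_request_pending(const char *path)
{
    for (size_t i = 0; i < shared_len; i++) {
        if (shared_queue[i].kind == REQ_FORGET &&
            strcmp(shared_queue[i].path, path) == 0)
            return true;
    }
    return false;
}

/*
 * bgwriter side: judge the outcome of an fsync attempt.
 * fsync_errno is 0 on success, otherwise the errno of the failed call.
 * Returns true if the bgwriter may carry on, false if the error is real.
 */
bool fsync_result_ok(const char *path, int fsync_errno)
{
    if (fsync_errno == 0)
        return true;

    /* Only on these two errors do we look for a drop of the same file. */
    if ((fsync_errno == ENOENT || fsync_errno == EACCES) &&
        forget_request_pending(path))
        return true;    /* file was dropped, so the failure is expected */

    return false;
}
```

The point of this arrangement is that the backend never has to wait: the extra queue scan only happens on the error path, so the normal fsync case costs nothing more than it does today.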

//Magnus
