Not sure if anyone else has experienced the below. Postgres configs can be
shared if needed. We have a few database servers running on 7.12 and 9
We recently noticed some problems with timeouts on some postgres database
servers. The machines don't appear to be heavily loaded, although they are
On Fri, Feb 19, 2021 at 08:47:16AM -0500, Greg Troxel wrote:
> I see our man page addresses this with FDISKSYNC. It sounds like you
> aren't proposing to change this (makes sense), but there's the pesky
> issue of errors within the disk when writing from cache to media.
> Perhaps those are
On Fri, Feb 19, 2021 at 08:33:03AM -0500, Greg Troxel wrote:
> Maybe I'm way off in space, but I'd like to see us be careful about
>
> 1) operating system has a successful return from a write transaction to
> a disk controller (perhaps via a controller that has a write-back
> cache)
> On Feb 19, 2021, at 5:33 AM, Greg Troxel wrote:
>
> I thought NCQ was supposed to give acks for actual writing, but allow
> them to be perhaps ordered and multiple in flight, so that one could use
> that instead of the big-hammer inscrutable writeback cache.
Certainly in the universe of
> On Feb 18, 2021, at 5:43 PM, David Holland wrote:
>
> And currently there's a problem that the only way to flush the
> underlying hardware-level caches is to call fsync_range and pass
> FDISKSYNC. This might be POSIX (is it? man page doesn't say so) but it
> doesn't necessarily seem helpful
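
For anyone following along, here is a rough sketch of the call being
discussed: write some data, then ask for the hardware-level cache to be
flushed with fsync_range and FDISKSYNC. The helper names flush_to_media()
and write_durably() are made up for illustration and do not come from the
thread. The fallback to plain fsync() where FDISKSYNC is not defined is my
own assumption, and as noted elsewhere in the thread, a disk can still
report success for data that is not actually on the media.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: push everything written to fd as far toward the
 * media as the platform lets us ask for. Returns 0 on success, -1 on error.
 */
static int
flush_to_media(int fd)
{
#if defined(FFILESYNC) && defined(FDISKSYNC)
	/*
	 * NetBSD: sync data and metadata, then flush the disk's own cache.
	 * start = 0 and length = 0 cover the whole file.
	 */
	return fsync_range(fd, FFILESYNC | FDISKSYNC, 0, 0);
#else
	/*
	 * Assumption: without FDISKSYNC, fsync() is the closest portable
	 * call. Whether it reaches the media depends on the platform.
	 */
	return fsync(fd);
#endif
}

/*
 * Hypothetical helper: write len bytes, handling short writes, then flush.
 * Returns 0 only if both the writes and the flush reported success.
 */
static int
write_durably(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n < 0)
			return -1;	/* errno is set by write() */
		buf += n;
		len -= (size_t)n;
	}
	return flush_to_media(fd);
}
```

A caller would do something like write_durably(fd, rec, strlen(rec)) and
treat a -1 return as "the data may not be stable". What should then happen
to the unwritten data is exactly the open question raised in this thread.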
Greg Troxel writes:
> 1) operating system has a successful return from a write transaction to
> a disk controller (perhaps via a controller that has a write-back
> cache)
>
> 2) operating system has been told by the controller that the write has
> actually completed to stable storage
David Holland writes:
> > > everything that process wrote is on disk,
> >
> > That is probably unattainable, since I've seen it plausibly asserted
> > that some disks lie, reporting that writes are on the media when this
> > is not actually true.
>
> Indeed. What I meant to say is that
On Fri, Feb 19, 2021 at 01:43:07AM +0000, David Holland wrote:
> [...]
>
> (9) We need a model for what happens to the unwritten data. Throwing
> it away is clearly wrong (some may recall a furor a couple years ago
> when it was discovered that Linux did this) but retrying and likely
> failing on