timeouts connecting to pgsql database

2021-02-19 Thread Derrick Lobo
Not sure if anyone else has experienced the below; postgres configs can be shared if needed. We have a few database servers running on 7.12 and 9. We recently noticed some problems with timeouts on some postgres database servers. The machines don't appear to be heavily loaded, although they are

Re: fsync error reporting

2021-02-19 Thread David Holland
On Fri, Feb 19, 2021 at 08:47:16AM -0500, Greg Troxel wrote: > I see our man page addresses this with FDISKSYNC. It sounds like you > aren't proposing to change this (makes sense), but there's the pesky > issue of errors within the disk when writing from cache to media. > Perhaps those are

Re: fsync error reporting

2021-02-19 Thread David Holland
On Fri, Feb 19, 2021 at 08:33:03AM -0500, Greg Troxel wrote: > Maybe I'm way off in space, but I'd like to see us be careful about > > 1) operating system has a successful return from a write transaction to > a disk controller (perhaps via a controller that has a write-back > cache)

Re: fsync error reporting

2021-02-19 Thread Jason Thorpe
> On Feb 19, 2021, at 5:33 AM, Greg Troxel wrote: > > I thought NCQ was supposed to give acks for actual writing, but allow > them to be perhaps ordered and multiple in flight, so that one could use > that instead of the big-hammer inscrutable writeback cache. Certainly in the universe of

Re: fsync error reporting

2021-02-19 Thread Jason Thorpe
> On Feb 18, 2021, at 5:43 PM, David Holland wrote: > > And currently there's a problem that the only way to flush the > underlying hardware-level caches is to call fsync_range and pass > FDISKSYNC. This might be POSIX (is it? man page doesn't say so) but it > doesn't necessarily seem helpful

Re: fsync error reporting

2021-02-19 Thread Greg Troxel
Greg Troxel writes: > 1) operating system has a successful return from a write transaction to > a disk controller (perhaps via a controller that has a write-back > cache) > > 2) operating system has been told by the controller that the write has > actually completed to stable storage

Re: fsync error reporting

2021-02-19 Thread Greg Troxel
David Holland writes: > > > everything that process wrote is on disk, > > > > That is probably unattainable, since I've seen it plausibly asserted > > that some disks lie, reporting that writes are on the media when this > > is not actually true. > > Indeed. What I meant to say is that

Re: fsync error reporting

2021-02-19 Thread tlaronde
On Fri, Feb 19, 2021 at 01:43:07AM +, David Holland wrote: > [...] > > (9) We need a model for what happens to the unwritten data. Throwing > it away is clearly wrong (some may recall a furor a couple years ago > when it was discovered that Linux did this) but retrying and likely > failing on