On Sat, May 08, 2021 at 04:57:54PM +1200, Thomas Munro wrote:
> On Sat, May 8, 2021 at 2:30 AM Tom Lane wrote:
> > May 07 03:31:39 gcc202 kernel: sunvdc: vdc_tx_trigger() failure, err=-11
>
> That's -EAGAIN (assuming errnos match x86) and I guess it indicates
> that VDC_MAX_RETRIES is exceeded
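Reading err=-11 as EAGAIN relies on the Linux convention that kernel-internal functions report failure as a negated errno value. Below is a minimal C sketch of decoding such a value, assuming a Linux target. `kernel_err_name` and `kernel_err_text` are hypothetical helpers. The numeric mapping comes from the `<errno.h>` of the machine the code is compiled on, which is why the "assuming errnos match x86" caveat matters.

```c
#include <errno.h>
#include <string.h>

/* Linux kernel-internal functions report failure as a negated errno,
 * so a logged "err=-11" means errno 11.  Numeric errno values can be
 * architecture specific; this maps them using the <errno.h> of the
 * platform the program is compiled for. */
const char *kernel_err_name(int err)
{
    switch (-err) {
    case EAGAIN: return "EAGAIN";
    case EIO:    return "EIO";
    case ENOSPC: return "ENOSPC";
    default:     return "unknown";
    }
}

/* Human-readable text for the same code, as strerror() reports it. */
const char *kernel_err_text(int err)
{
    return strerror(-err);
}
```

On Linux/x86, `kernel_err_name(-11)` returns "EAGAIN".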
On Sat, May 8, 2021 at 2:30 AM Tom Lane wrote:
> May 07 03:31:39 gcc202 kernel: sunvdc: vdc_tx_trigger() failure, err=-11
That's -EAGAIN (assuming errnos match x86) and I guess it indicates
that VDC_MAX_RETRIES is exceeded here:
On Fri, May 07, 2021 at 10:18:14PM -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2021-05-07 17:14:18 -0700, Noah Misch wrote:
> >> Having a flaky buildfarm member is bad news. I'll LD_PRELOAD the attached to
> >> prevent fsync from reaching the kernel. Hopefully, that will make
Andres Freund writes:
> On 2021-05-07 17:14:18 -0700, Noah Misch wrote:
>> Having a flaky buildfarm member is bad news. I'll LD_PRELOAD the attached to
>> prevent fsync from reaching the kernel. Hopefully, that will make the
>> hardware-or-kernel trouble unreachable. (Changing
Hi,
On 2021-05-07 17:14:18 -0700, Noah Misch wrote:
> Having a flaky buildfarm member is bad news. I'll LD_PRELOAD the attached to
> prevent fsync from reaching the kernel. Hopefully, that will make the
> hardware-or-kernel trouble unreachable. (Changing 008_fsm_truncation.pl
> wouldn't avoid
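The attachment Noah mentions is not reproduced in this listing. As a rough illustration only, the following sketch shows an LD_PRELOAD shim that keeps fsync() and fdatasync() calls from reaching the kernel. The file name `nofsync.c` is hypothetical, and this is not the attached file.

```c
/* nofsync.c: hypothetical LD_PRELOAD shim (NOT the file attached to the
 * original mail).  When preloaded, the dynamic linker resolves fsync()
 * and fdatasync() to these definitions before libc's, so callers get a
 * success return without the system call being issued.  Durability is
 * lost; intended only for throwaway test machines.
 *
 * Build: gcc -shared -fPIC -o nofsync.so nofsync.c
 * Use:   LD_PRELOAD=/path/to/nofsync.so <command>
 */
#define _XOPEN_SOURCE 700   /* so <unistd.h> declares both prototypes */
#include <unistd.h>

int fsync(int fd)
{
    (void) fd;      /* descriptor ignored; nothing is flushed */
    return 0;       /* report success */
}

int fdatasync(int fd)
{
    (void) fd;
    return 0;
}
```

This only intercepts calls that go through the dynamically linked libc symbols. Statically linked binaries, raw syscall(2) invocations, and writes to files opened with O_SYNC or O_DSYNC are unaffected.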
On Fri, May 07, 2021 at 04:42:46PM +1200, Thomas Munro wrote:
> Oh, and I see that 13 has 9989d37d "Remove XLogFileNameP() from the
> tree" to fix this exact problem.
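The listing does not show the diff of 9989d37d. Our reading of the assertion quoted in the thread is that XLogFileNameP() returned a palloc'd string, so building the file name for the PANIC message needed heap memory while inside a critical section. The allocation-free alternative is to format the name into a caller-supplied stack buffer. The sketch below shows that approach, assuming PostgreSQL's 24-hex-digit WAL file name layout (timeline, then the segment number split by segments per 4 GB unit). `wal_file_name` and `WAL_FNAME_LEN` are hypothetical names, not PostgreSQL's.

```c
#include <stdint.h>
#include <stdio.h>

#define WAL_FNAME_LEN 25   /* 24 hex digits + NUL (hypothetical constant) */

/* Format the WAL segment file name for (timeline, segment number) into
 * buf.  Only the caller's buffer is written, with no heap allocation,
 * so this is usable while building an error message inside a critical
 * section.  seg_size is the WAL segment size in bytes; it is assumed
 * to be a power of two no larger than 1 GB (16 MB by default). */
void wal_file_name(char buf[WAL_FNAME_LEN], uint32_t tli,
                   uint64_t segno, uint64_t seg_size)
{
    /* Number of segments per 4 GB unit. */
    uint64_t segs_per_id = UINT64_C(0x100000000) / seg_size;

    snprintf(buf, WAL_FNAME_LEN, "%08X%08X%08X",
             (unsigned) tli,
             (unsigned) (segno / segs_per_id),
             (unsigned) (segno % segs_per_id));
}
```

With 16 MB segments, timeline 1 and segment 256 produce "000000010000000100000000".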
I don't see that we'd be able to get a redesign of this area safe
enough for a backpatch, but perhaps we (I?) had better put some
On Fri, May 07, 2021 at 04:30:00PM -0400, Tom Lane wrote:
> I can certainly see an argument for running some buildfarm animals
> with fsync on (for all tests). I don't see a reason for forcing
> them all to run some tests that way; and if I were going to do that,
> I doubt that
On Fri, May 07, 2021 at 01:18:19PM -0400, Tom Lane wrote:
> Realizing that 9989d37d prevents the assertion failure, I went
> to see if thorntail had shown EIO failures without assertions.
> Looking back 180 days, I found these:
>
> sysname | branch | snapshot | stage
Andres Freund writes:
> Isn't this a good reason to have at least some tests run with fsync=on?
Why?
I can certainly see an argument for running some buildfarm animals
with fsync on (for all tests). I don't see a reason for forcing
them all to run some tests that way; and if I were going to do
Hi,
On 2021-05-07 10:29:58 -0400, Tom Lane wrote:
> I wrote:
> > 1. No wonder we could not reproduce it anywhere else. I've warned
> > the cfarm admins that their machine may be having hardware issues.
>
> I heard back from the machine's admin. The time of the crash I observed
> matches
On 5/7/21 11:27 AM, Andrew Dunstan wrote:
> On 5/7/21 12:38 AM, Andres Freund wrote:
>> Hi,
>>
>> On 2021-05-07 00:30:11 -0400, Tom Lane wrote:
>>> Andres Freund writes:
On 2021-05-06 21:43:32 -0400, Tom Lane wrote:
> That I'm not sure about. gdb is certainly installed, and thorntail
I wrote:
> Thomas Munro writes:
>> Oh, and I see that 13 has 9989d37d "Remove XLogFileNameP() from the
>> tree" to fix this exact problem.
> Hah, so that maybe explains why thorntail has only shown this in
> the v12 branch. Should we consider back-patching that?
Realizing that 9989d37d
On 5/7/21 12:38 AM, Andres Freund wrote:
> Hi,
>
> On 2021-05-07 00:30:11 -0400, Tom Lane wrote:
>> Andres Freund writes:
>>> On 2021-05-06 21:43:32 -0400, Tom Lane wrote:
That I'm not sure about. gdb is certainly installed, and thorntail is
visibly running the current buildfarm
I wrote:
> 1. No wonder we could not reproduce it anywhere else. I've warned
> the cfarm admins that their machine may be having hardware issues.
I heard back from the machine's admin. The time of the crash I observed
matches exactly to these events in the kernel log:
May 07 03:31:39 gcc202
Thomas Munro writes:
> On Fri, May 7, 2021 at 1:43 PM Tom Lane wrote:
>> The interesting part of this is frame 6, which points here:
> Oh, and I see that 13 has 9989d37d "Remove XLogFileNameP() from the
> tree" to fix this exact problem.
Hah, so that maybe explains why thorntail has only shown
On Fri, May 7, 2021 at 1:43 PM Tom Lane wrote:
> The interesting part of this is frame 6, which points here:
>
>     case SYNC_METHOD_FDATASYNC:
>         if (pg_fdatasync(fd) != 0)
>             ereport(PANIC,
>                     (errcode_for_file_access(),
>
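Frame 6 is the fdatasync branch of the WAL flush code, which treats a failed flush as a PANIC. The following standalone sketch shows that shape, assuming POSIX fsync()/fdatasync(). `sync_method`, `wal_sync`, and `wal_sync_or_die` are hypothetical names, and abort() stands in for ereport(PANIC). The failure path formats its message into a stack buffer and takes the file name from the caller, so it does not allocate.

```c
#define _XOPEN_SOURCE 700   /* for fsync()/fdatasync() prototypes */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef enum { SYNC_FSYNC, SYNC_FDATASYNC } sync_method;

/* Flush fd with the chosen method; returns 0 or the errno value. */
int wal_sync(int fd, sync_method method)
{
    int rc;

    switch (method) {
    case SYNC_FSYNC:
        rc = fsync(fd);
        break;
    case SYNC_FDATASYNC:
        rc = fdatasync(fd);
        break;
    default:
        return EINVAL;
    }
    return rc == 0 ? 0 : errno;
}

/* Treat a failed flush as fatal, in the spirit of ereport(PANIC),
 * rather than retrying.  fname is caller-provided (e.g. a stack
 * buffer) and the message is built in a local array, so this path
 * does not call malloc. */
void wal_sync_or_die(int fd, sync_method method, const char *fname)
{
    int err = wal_sync(fd, method);

    if (err != 0) {
        char msg[256];

        snprintf(msg, sizeof msg, "PANIC: could not %s file \"%s\": %s\n",
                 method == SYNC_FSYNC ? "fsync" : "fdatasync",
                 fname, strerror(err));
        fputs(msg, stderr);
        abort();
    }
}
```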
Hi,
On 2021-05-07 00:30:11 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2021-05-06 21:43:32 -0400, Tom Lane wrote:
> >> That I'm not sure about. gdb is certainly installed, and thorntail is
> >> visibly running the current buildfarm client and is configured with the
> >> correct
Andres Freund writes:
> On 2021-05-06 21:43:32 -0400, Tom Lane wrote:
>> That I'm not sure about. gdb is certainly installed, and thorntail is
>> visibly running the current buildfarm client and is configured with the
>> correct core_file_glob, and I can report that the crash did leave a 'core'
Hi,
On 2021-05-06 21:43:32 -0400, Tom Lane wrote:
> 2. We evidently need to put a bit more effort into this error
> reporting logic. More generally, I wonder how we could audit
> the code for similar hazards elsewhere, because I bet there are
> some. (Or ... could it be sane to run functions
On Thu, May 06, 2021 at 09:43:32PM -0400, Tom Lane wrote:
> 2. We evidently need to put a bit more effort into this error
> reporting logic. More generally, I wonder how we could audit
> the code for similar hazards elsewhere, because I bet there are
> some. (Or ... could it be sane to run
Thomas Munro writes:
> While looking for something else, I noticed thorntail has failed twice
> like this, on REL_12_STABLE:
> TRAP: FailedAssertion("!(CritSectionCount == 0 ||
> (context)->allowInCritSection)", File:
>
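The failed assertion is PostgreSQL refusing a memory allocation while a critical section is active, unless the memory context has been explicitly marked as allowed. As we understand it, the point is that an out-of-memory failure inside a critical section would escalate to a PANIC. The toy model below illustrates the rule only. `crit_section_count`, `toy_context`, `toy_alloc_allowed`, and `toy_alloc` are hypothetical names that mirror the condition in the TRAP message, not PostgreSQL's actual memory-context code. It uses the C assert() macro, so the check disappears under NDEBUG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for CritSectionCount and a memory context. */
static int crit_section_count = 0;

typedef struct toy_context {
    bool allow_in_crit_section;
} toy_context;

void toy_start_crit_section(void) { crit_section_count++; }

void toy_end_crit_section(void)
{
    assert(crit_section_count > 0);
    crit_section_count--;
}

/* The condition from the TRAP line: allocating is fine outside a
 * critical section, or in a context explicitly permitted inside one. */
bool toy_alloc_allowed(const toy_context *ctx)
{
    return crit_section_count == 0 || ctx->allow_in_crit_section;
}

/* Enforce the rule the way an assert-enabled build would:
 * violating it aborts the process. */
void *toy_alloc(toy_context *ctx, size_t size)
{
    assert(toy_alloc_allowed(ctx));
    return malloc(size);
}
```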
Hi,
While looking for something else, I noticed thorntail has failed twice
like this, on REL_12_STABLE:
TRAP: FailedAssertion("!(CritSectionCount == 0 ||
(context)->allowInCritSection)", File: