Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-30 Thread Alvaro Herrera
Alvaro Herrera wrote: > Alvaro Herrera wrote: > > > Before pushing, I'll give a look to the regular autovacuum path to see > > if it needs a similar fix. > > Reading that one, my conclusion is that it doesn't have the same problem > because the strings are allocated in AutovacuumMemCxt which is

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-24 Thread Alvaro Herrera
Alvaro Herrera wrote: > Before pushing, I'll give a look to the regular autovacuum path to see > if it needs a similar fix. Reading that one, my conclusion is that it doesn't have the same problem because the strings are allocated in AutovacuumMemCxt which is not reset by error recovery. This

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-23 Thread Alvaro Herrera
Tom Lane wrote: > What I'm suspicious of as the actual bug cause is the comment in > perform_work_item about how we need to be sure that we're allocating these > strings in a long-lived context. If, in fact, they were allocated in some > context that could get reset during the PG_TRY

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Tom Lane
Alvaro Herrera writes: > And the previous code crashes in 45 minutes? That's solid enough for > me; I'll clean up the patch and push in the next few days. I think what > you have now should be sufficient for the time being for your production > system. I'm still of the

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Justin Pryzby
On Wed, Oct 18, 2017 at 07:22:27PM +0200, Alvaro Herrera wrote: > Do you still have those core dumps? If so, would you please verify the > database that autovacuum was running in? Just open each with gdb (using > the original postgres binary, not the one you just installed) and do > "print

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Alvaro Herrera
Justin Pryzby wrote: > On Wed, Oct 18, 2017 at 06:54:09PM +0200, Alvaro Herrera wrote: > > And the previous code crashes in 45 minutes? That's solid enough for > > me; I'll clean up the patch and push in the next few days. I think what > > you have now should be sufficient for the time being

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Justin Pryzby
On Wed, Oct 18, 2017 at 06:54:09PM +0200, Alvaro Herrera wrote: > Justin Pryzby wrote: > > > No crashes in ~28hr. It occurs to me that it's a weaker test due to not > > preserving most compilation options. > > And the previous code crashes in 45 minutes? That's solid enough for > me; I'll

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Alvaro Herrera
Justin Pryzby wrote: > No crashes in ~28hr. It occurs to me that it's a weaker test due to not > preserving most compilation options. And the previous code crashes in 45 minutes? That's solid enough for me; I'll clean up the patch and push in the next few days. I think what you have now

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Justin Pryzby
On Tue, Oct 17, 2017 at 09:07:40AM -0500, Justin Pryzby wrote: > On Tue, Oct 17, 2017 at 09:34:24AM -0400, Tom Lane wrote: > > Justin Pryzby writes: > > > On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote: > > >> Anyway, can you give this patch a try? > > > > The

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Tom Lane
Alvaro Herrera writes: > cur_datname here seems corrupted -- it points halfway into cur_nspname, > which is also a corrupt value. Yeah. > And I think that's because we're not > checking that the namespace OID is a valid value before calling > get_namespace_name on it.

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Craig Ringer
On 17 October 2017 at 22:39, Tom Lane wrote: > Justin Pryzby writes: >> On Tue, Oct 17, 2017 at 09:34:24AM -0400, Tom Lane wrote: >>> So: where did you get the existing binaries? If it's from some vendor >>> packaging system, what you should do is fetch

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Alvaro Herrera
Justin Pryzby wrote: > I'm happy to try the patch, but in case it makes any difference, we have few > DBs/schemas: I don't expect that it does.

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Tom Lane
Justin Pryzby writes: > On Tue, Oct 17, 2017 at 09:34:24AM -0400, Tom Lane wrote: >> So: where did you get the existing binaries? If it's from some vendor >> packaging system, what you should do is fetch the package source, add >> the patch to the probably-nonempty set of

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Justin Pryzby
On Tue, Oct 17, 2017 at 09:34:24AM -0400, Tom Lane wrote: > Justin Pryzby writes: > > On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote: > >> Anyway, can you give this patch a try? > > > I've only compiled postgres once before and this is a production environment >

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Tom Lane
Justin Pryzby writes: > On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote: >> Anyway, can you give this patch a try? > I've only compiled postgres once before and this is a production environment > (although nothing so important that the crashes are a serious

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Tomas Vondra
On 10/17/2017 02:29 PM, Justin Pryzby wrote: > On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote: >> Anyway, can you give this patch a try? > > I've only compiled postgres once before and this is a production environment > (although nothing so important that the crashes are a serious

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Justin Pryzby
On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote: > Justin Pryzby wrote: > > > #1 0x006a52e9 in perform_work_item (workitem=0x7f8ad1f94824) at > > autovacuum.c:2676 > > cur_datname = 0x298c740 "no 1 :vartype 1184 :vartypmod -1 > > :varcollid 0 :varlevelsup 0

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Justin Pryzby
On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote: > Anyway, can you give this patch a try? I've only compiled postgres once before and this is a production environment (although nothing so important that the crashes are a serious concern either). Is it reasonable to wget the postgres

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Alvaro Herrera
Justin Pryzby wrote: > #1 0x006a52e9 in perform_work_item (workitem=0x7f8ad1f94824) at > autovacuum.c:2676 > cur_datname = 0x298c740 "no 1 :vartype 1184 :vartypmod -1 :varcollid > 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 146} {CONST :consttype > 1184 :consttypmod -1

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Alvaro Herrera
Justin Pryzby wrote: > On Sun, Oct 15, 2017 at 02:44:58PM +0200, Tomas Vondra wrote: > > Thanks, but I'm not sure that'll help, at this point. We already know > > what happened (corrupted memory), we don't know "how". And core files > > are mostly just "snapshots" so are not very useful in

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-17 Thread Justin Pryzby
On Sun, Oct 15, 2017 at 02:44:58PM +0200, Tomas Vondra wrote: > Thanks, but I'm not sure that'll help, at this point. We already know > what happened (corrupted memory), we don't know "how". And core files > are mostly just "snapshots" so are not very useful in answering that :-( Is there

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-15 Thread Justin Pryzby
On Sat, Oct 14, 2017 at 08:56:56PM -0500, Justin Pryzby wrote: > On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote: > > > Also notice the vacuum process was interrupted, same as yesterday (thank > > > goodness for full logs). Our INSERT script is using Python > > >

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-15 Thread Tomas Vondra
Hi, On 10/15/2017 03:56 AM, Justin Pryzby wrote: > On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote: ... >> It's a bit difficult to guess what went wrong from this backtrace. For >> me gdb typically prints a bunch of lines immediately before the frames, >> explaining what went wrong

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-14 Thread Justin Pryzby
On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote: > > Also notice the vacuum process was interrupted, same as yesterday (thank > > goodness for full logs). Our INSERT script is using Python > > multiprocessing.Pool() with "maxtasksperchild=1", which I think means we > > load > > one

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-14 Thread Tomas Vondra
Hi, On 10/15/2017 12:42 AM, Justin Pryzby wrote: > On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote: >> I don't have any reason to believe there's a memory issue on the server, so I >> suppose this is just a "heads up" to early adopters until/in case it happens >> again and I can at

Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-14 Thread Justin Pryzby
On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote: > I don't have any reason to believe there's a memory issue on the server, so I > suppose this is just a "heads up" to early adopters until/in case it happens > again and I can at least provide a stack trace. I'm back; find stacktrace

[HACKERS] SIGSEGV in BRIN autosummarize

2017-10-13 Thread Justin Pryzby
I upgraded one of our customers to PG10 Tuesday night, and Wednesday replaced a BTREE index with a BRIN index (WITH autosummarize). Today I see: < 2017-10-13 17:22:47.839 -04 >LOG: server process (PID 32127) was terminated by signal 11: Segmentation fault < 2017-10-13 17:22:47.839 -04 >DETAIL: