Re: double free in ExecHashJoin, 9.6.12

2019-07-26 Thread Merlin Moncure
On Wed, Jul 24, 2019 at 11:01 PM Thomas Munro wrote:
>
> On Thu, Jul 25, 2019 at 2:39 AM Merlin Moncure wrote:
> > The server is generally running pretty well and is high volume.  This
> > query is not new and runs at medium volume.  The database restarted in
> > about 4 seconds with no damage, fast enough that we didn't even trip
> > alarms (I noticed this while troubleshooting another issue).  We are a
> > couple of bug-fix releases behind, but I didn't see anything obvious in
> > the release notes suggesting a resolved issue.  Does anyone have any
> > ideas?  Thanks in advance.
>
> > postgres: rms ysconfig 10.33.190.21(36788) SELECT(ExecHashJoin+0x5a2)[0x5e2d32]
>
> Hi Merlin,
>
> Where's the binary from (exact package name, if installed with a
> package)?  Not sure if this is going to help, but is there any chance
> you could disassemble that function so we can try to see what it's
> doing at that offset?  For example on Debian if you have
> postgresql-9.6 and postgresql-9.6-dbg installed you could run "gdb
> /usr/lib/postgresql/9.6/bin/postgres" and then "disassemble
> ExecHashJoin".  The code at "<+1442>" (0x5a2) is presumably calling
> free or some other libc thing (though I'm surprised not to see an
> intervening palloc thing).

Thanks -- great suggestion.  I'll report back with any interesting findings.

merlin




Re: double free in ExecHashJoin, 9.6.12

2019-07-24 Thread Thomas Munro
On Thu, Jul 25, 2019 at 2:39 AM Merlin Moncure wrote:
> The server is generally running pretty well and is high volume.  This
> query is not new and runs at medium volume.  The database restarted in
> about 4 seconds with no damage, fast enough that we didn't even trip
> alarms (I noticed this while troubleshooting another issue).  We are a
> couple of bug-fix releases behind, but I didn't see anything obvious in
> the release notes suggesting a resolved issue.  Does anyone have any
> ideas?  Thanks in advance.

> postgres: rms ysconfig 10.33.190.21(36788) SELECT(ExecHashJoin+0x5a2)[0x5e2d32]

Hi Merlin,

Where's the binary from (exact package name, if installed with a
package)?  Not sure if this is going to help, but is there any chance
you could disassemble that function so we can try to see what it's
doing at that offset?  For example on Debian if you have
postgresql-9.6 and postgresql-9.6-dbg installed you could run "gdb
/usr/lib/postgresql/9.6/bin/postgres" and then "disassemble
ExecHashJoin".  The code at "<+1442>" (0x5a2) is presumably calling
free or some other libc thing (though I'm surprised not to see an
intervening palloc thing).
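
To make the offset arithmetic above concrete, here is a minimal Python
sketch that pulls the symbol, offset, and address out of a glibc
backtrace frame and converts the hex offset to the decimal form gdb
prints in "disassemble" output.  The helper name parse_frame and the
regular expression are illustrative assumptions, not part of any
PostgreSQL or glibc tooling, and the computed function start assumes
the printed address is simply symbol start plus offset.

```python
import re

# Matches the "(symbol+0xOFFSET)[0xADDRESS]" tail of a glibc backtrace frame.
# Frames without a symbol name (e.g. "SELECT[0x6e5607]") will not match.
FRAME_RE = re.compile(
    r"\((?P<symbol>\w+)\+0x(?P<offset>[0-9a-fA-F]+)\)"
    r"\[0x(?P<address>[0-9a-fA-F]+)\]"
)


def parse_frame(line):
    """Return symbol, offset, address and inferred function start, or None."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    offset = int(match.group("offset"), 16)
    address = int(match.group("address"), 16)
    return {
        "symbol": match.group("symbol"),
        "offset": offset,                    # decimal, as shown by gdb as <+N>
        "address": address,
        "function_start": address - offset,  # assumes address = start + offset
    }


frame = parse_frame(
    "postgres: rms ysconfig 10.33.190.21(36788) "
    "SELECT(ExecHashJoin+0x5a2)[0x5e2d32]"
)
print("%s <+%d> starts at 0x%x"
      % (frame["symbol"], frame["offset"], frame["function_start"]))
```

The decimal offset is the value to look for in the disassembly listing;
frames that carry no symbol name are returned as None rather than guessed.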

--
Thomas Munro
https://enterprisedb.com




double free in ExecHashJoin, 9.6.12

2019-07-24 Thread Merlin Moncure
The server is generally running pretty well and is high volume.  This
query is not new and runs at medium volume.  The database restarted in
about 4 seconds with no damage, fast enough that we didn't even trip
alarms (I noticed this while troubleshooting another issue).  We are a
couple of bug-fix releases behind, but I didn't see anything obvious in
the release notes suggesting a resolved issue.  Does anyone have any
ideas?  Thanks in advance.

merlin

*** glibc detected *** postgres: rms ysconfig 10.33.190.21(36788) SELECT: double free or corruption (!prev): 0x01fb2140 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x75dee)[0x7f4fde053dee]
/lib64/libc.so.6(+0x78c80)[0x7f4fde056c80]
postgres: rms ysconfig 10.33.190.21(36788) SELECT(ExecHashJoin+0x5a2)[0x5e2d32]
postgres: rms ysconfig 10.33.190.21(36788) SELECT(ExecProcNode+0x208)[0x5cf728]
postgres: rms ysconfig 10.33.190.21(36788) SELECT(standard_ExecutorRun+0x18a)[0x5cd1ca]
postgres: rms ysconfig 10.33.190.21(36788) SELECT[0x6e5607]
postgres: rms ysconfig 10.33.190.21(36788) SELECT(PortalRun+0x188)[0x6e67d8]
postgres: rms ysconfig 10.33.190.21(36788) SELECT[0x6e2af3]
postgres: rms ysconfig 10.33.190.21(36788) SELECT(PostgresMain+0x75a)[0x6e456a]
postgres: rms ysconfig 10.33.190.21(36788) SELECT(PostmasterMain+0x1875)[0x6840b5]
postgres: rms ysconfig 10.33.190.21(36788) SELECT(main+0x7a8)[0x60b528]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f4fddffcd1d]
postgres: rms ysconfig 10.33.190.21(36788) SELECT[0x46c589]

2019-07-23 09:41:41 CDT[:@]:LOG:  server process (PID 18057) was terminated by signal 6: Aborted
2019-07-23 09:41:41 CDT[:@]:DETAIL:  Failed process was running:
SELECT JR.job_id as jobId, JR.job_execution_id as jobResultId,
       JR.created as lastRunDate, JR.status as status,
       JR.status_message as statusMessage, JR.output_format as outputFormat,
       JS.schedule_name as scheduleName, JS.job_name as reportName,
       JS.created_by as scheduledBy, JS.product as source
FROM (SELECT JR.job_id, MAX(JR.created) AS MaxCreated
      FROM job_schedule JS
      JOIN job_result JR ON JR.job_id=JS.job_id
      WHERE (lower(JS.recepients) like lower($1)
             OR lower(JS.created_by) = lower($2))
      GROUP BY JR.job_id) TMP
JOIN job_result JR ON JR.job_id = TMP.job_id AND JR.created = TMP.MaxCreated
JOIN job_schedule JS ON JS.job_id = JR.job_id AND JS.job_type='CRON'

merlin