Re: dsa_allocate() faliure

2019-02-12 Thread Thomas Munro
On Mon, Feb 11, 2019 at 10:33 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > So I'll wait until after the release, and we'll have
> > to live with the bug for another 3 months.
>
> Check.  Please hold off committing until you see the release tags
> appear, probably late Tuesday my time / Wednesday noonish yours.

Pushed.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: dsa_allocate() faliure

2019-02-10 Thread Tom Lane
Thomas Munro  writes:
> On Mon, Feb 11, 2019 at 10:33 AM Tom Lane  wrote:
>> I observe from
>> https://coverage.postgresql.org/src/backend/utils/mmgr/freepage.c.gcov.html
>> that the edge cases in this function aren't too well exercised by
>> our regression tests, meaning that the buildfarm might not prove
>> much either way about the correctness of this patch.  That is one
>> factor pushing me to think we shouldn't risk it.  But, taking a
>> longer view, is that something that's practical to improve?

> Yeah.  This is a nice example of code that really deserves unit tests
> written in C.  Could be good motivation to build the infrastructure I
> mentioned here:
> https://www.postgresql.org/message-id/flat/CAEepm%3D2heu%2B5zwB65jWap3XY-UP6PpJZiKLQRSV2UQH9BmVRXQ%40mail.gmail.com

Meh.  I think if you hold out for that, you're going to be waiting a
long time.  I was thinking more along the lines of making a test API
in src/test/modules/, akin to what we've got for predtest or rbtree.

regards, tom lane



Re: dsa_allocate() faliure

2019-02-10 Thread Thomas Munro
On Mon, Feb 11, 2019 at 10:33 AM Tom Lane  wrote:
> I observe from
>
> https://coverage.postgresql.org/src/backend/utils/mmgr/freepage.c.gcov.html
>
> that the edge cases in this function aren't too well exercised by
> our regression tests, meaning that the buildfarm might not prove
> much either way about the correctness of this patch.  That is one
> factor pushing me to think we shouldn't risk it.  But, taking a
> longer view, is that something that's practical to improve?

Yeah.  This is a nice example of code that really deserves unit tests
written in C.  Could be good motivation to build the infrastructure I
mentioned here:

https://www.postgresql.org/message-id/flat/CAEepm%3D2heu%2B5zwB65jWap3XY-UP6PpJZiKLQRSV2UQH9BmVRXQ%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com



Re: dsa_allocate() faliure

2019-02-10 Thread Thomas Munro
On Mon, Feb 11, 2019 at 11:02 AM Justin Pryzby  wrote:
> On Mon, Feb 11, 2019 at 09:45:07AM +1100, Thomas Munro wrote:
> > Ouch.  Yeah, that'd do it and matches the evidence.  With this change,
> > I couldn't reproduce the problem after 90 minutes with a test case
> > that otherwise hits it within a couple of minutes.
> ...
> > Note that this patch addresses the error "dsa_allocate could not find
> > %zu free pages".  (The error "dsa_area could not attach to segment" is
> > something else and apparently rarer.)
>
> "could not attach" is the error reported early this morning while
> stress-testing this patch with queued_alters queries in loops, so that's
> consistent with your understanding.  And I guess it preceded getting stuck on
> lock; although I don't how long between the first happened and the second, I'm
> guess not long and perhaps immedidately; since the rest of the processes were
> all stuck as in bug#15585 rather than ERRORing once every few minutes.
>
> I mentioned that "could not attach to segment" occurs in either the leader or
> a parallel worker, and most of the time causes only an ERROR and doesn't wedge
> all future parallel workers.  Maybe the bug#15585 "wedged" state only occurs
> after some pattern of leader+worker failures (?).  I've just triggered
> bug#15585 again, but if there's a pattern, I don't see it.
>
> Please let me know whether you're able to reproduce the "not attach" bug using
> simultaneous loops around the queued_alters query; it's easy here.

I haven't ever managed to reproduce that one yet.  It's great you have
a reliable repro...  Let's discuss it on the #15585 thread.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: dsa_allocate() faliure

2019-02-10 Thread Justin Pryzby
On Mon, Feb 11, 2019 at 09:45:07AM +1100, Thomas Munro wrote:
> Ouch.  Yeah, that'd do it and matches the evidence.  With this change,
> I couldn't reproduce the problem after 90 minutes with a test case
> that otherwise hits it within a couple of minutes.
...
> Note that this patch addresses the error "dsa_allocate could not find
> %zu free pages".  (The error "dsa_area could not attach to segment" is
> something else and apparently rarer.)

"could not attach" is the error reported early this morning while
stress-testing this patch with queued_alters queries in loops, so that's
consistent with your understanding.  And I guess it preceded getting stuck on
lock; although I don't how long between the first happened and the second, I'm
guess not long and perhaps immedidately; since the rest of the processes were
all stuck as in bug#15585 rather than ERRORing once every few minutes.

I mentioned that "could not attach to segment" occurs in either the leader or
a parallel worker, and most of the time causes only an ERROR and doesn't wedge
all future parallel workers.  Maybe the bug#15585 "wedged" state only occurs
after some pattern of leader+worker failures (?).  I've just triggered
bug#15585 again, but if there's a pattern, I don't see it.

Please let me know whether you're able to reproduce the "not attach" bug using
simultaneous loops around the queued_alters query; it's easy here.

Justin



Re: dsa_allocate() faliure

2019-02-10 Thread Tom Lane
Thomas Munro  writes:
> This brings us to a difficult choice: we're about to cut a new
> release, and this could in theory be included.  Even though the fix is
> quite convincing, it doesn't seem wise to change such complicated code
> at the last minute, and I know from an off-list chat that that is also
> Robert's view.

Yeah ... at this point we're just too close to the release deadline,
I'm afraid, even though the fix *looks* pretty safe.  Not worth the risk
given that this seems to be a low-probability bug.

I observe from

https://coverage.postgresql.org/src/backend/utils/mmgr/freepage.c.gcov.html

that the edge cases in this function aren't too well exercised by
our regression tests, meaning that the buildfarm might not prove
much either way about the correctness of this patch.  That is one
factor pushing me to think we shouldn't risk it.  But, taking a
longer view, is that something that's practical to improve?

> So I'll wait until after the release, and we'll have
> to live with the bug for another 3 months.

Check.  Please hold off committing until you see the release tags
appear, probably late Tuesday my time / Wednesday noonish yours.

regards, tom lane



Re: dsa_allocate() faliure

2019-02-10 Thread Thomas Munro
On Sun, Feb 10, 2019 at 5:41 PM Robert Haas  wrote:
> On Sun, Feb 10, 2019 at 2:37 AM Thomas Munro
>  wrote:
> > But at first glance it shouldn't be allocating pages, because it just
> > does consolidation to try to convert to singleton format, and then it
> > does recycle list cleanup using soft=true so that no allocation of
> > btree pages should occur.
>
> I think I see what's happening.  At the moment the problem occurs,
> there is no btree - there is only a singleton range.  So
> FreePageManagerInternal() takes the fpm->btree_depth == 0 branch and
> then ends up in the section with the comment  /* Not contiguous; we
> need to initialize the btree. */.  And that section, sadly, does not
> respect the 'soft' flag, so kaboom.  Something like the attached might
> fix it.

Ouch.  Yeah, that'd do it and matches the evidence.  With this change,
I couldn't reproduce the problem after 90 minutes with a test case
that otherwise hits it within a couple of minutes.

Here's a patch with a commit message explaining the change.

It also removes an obsolete comment, which is in fact related.  The
comment refers to an output parameter internal_pages_used, which must
have been used to report this exact phenomenon in an earlier
development version.  But there is no such parameter in the committed
version, and instead there is the soft flag to prevent internal
allocation.  I have no view on which approach is best, but yeah, if
we're using a soft flag, it has to work reliably.

This brings us to a difficult choice: we're about to cut a new
release, and this could in theory be included.  Even though the fix is
quite convincing, it doesn't seem wise to change such complicated code
at the last minute, and I know from an off-list chat that that is also
Robert's view.  So I'll wait until after the release, and we'll have
to live with the bug for another 3 months.

Note that this patch addresses the error "dsa_allocate could not find
%zu free pages".  (The error "dsa_area could not attach to segment" is
something else and apparently rarer.)

> Boy, I love FreePageManagerDump!

Yeah.  And I love reproducible bugs.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Fix-freepage.c-bug-that-causes-rare-dsa_allocate-fai.patch
Description: Binary data


Re: dsa_allocate() faliure

2019-02-10 Thread Justin Pryzby
On Sun, Feb 10, 2019 at 07:11:22PM +0300, Sergei Kornilov wrote:
> > I ran overnight with this patch, but all parallel processes ended up stuck
> > in the style of bug#15585.  So that's either not the root cause, or there's
> > a 2nd issue.
> 
> Maybe I missed something in this discussion, but you can reproduce bug#15585?
> How? With this testcase:
> https://www.postgresql.org/message-id/CAEepm%3D1MvOE-Sfv1chudx5KEmw_qHYrj8F9Og_WmdBRhXSQ%2B%2Bw%40mail.gmail.com
> ?

By running the queued_alters query multiple times in a loop:
https://www.postgresql.org/message-id/20181231221734.GB25379%40telsasoft.com

I'm able to trigger dsa "ERROR"s with that query on a newly initdb cluster with
only that table.  But I think some servers are more likely to hit it than
others.

I've only tripped on 15585 twice, and only while trying to trigger other DSA
bugs (the working hypothesis is that that bug is a secondary issue which
happens after hitting some other bug).  And not consistently or quickly.

Justin



Re: dsa_allocate() faliure

2019-02-10 Thread Sergei Kornilov
Hi

> I ran overnight with this patch, but all parallel processes ended up stuck in
> the style of bug#15585. So that's either not the root cause, or there's a 2nd
> issue.

Maybe I missed something in this discussion, but you can reproduce bug#15585?
How? With this testcase:
https://www.postgresql.org/message-id/CAEepm%3D1MvOE-Sfv1chudx5KEmw_qHYrj8F9Og_WmdBRhXSQ%2B%2Bw%40mail.gmail.com
?

regards, Sergei



Re: dsa_allocate() faliure

2019-02-10 Thread Justin Pryzby
On Sun, Feb 10, 2019 at 12:10:52PM +0530, Robert Haas wrote:
> I think I see what's happening.  At the moment the problem occurs,
> there is no btree - there is only a singleton range.  So
> FreePageManagerInternal() takes the fpm->btree_depth == 0 branch and
> then ends up in the section with the comment  /* Not contiguous; we
> need to initialize the btree. */.  And that section, sadly, does not
> respect the 'soft' flag, so kaboom.  Something like the attached might
> fix it.

I ran overnight with this patch, but all parallel processes ended up stuck in
the style of bug#15585.  So that's either not the root cause, or there's a 2nd
issue.

https://www.postgresql.org/message-id/flat/15585-324ff6a93a18da46%40postgresql.org

Justin



Re: dsa_allocate() faliure

2019-02-09 Thread Robert Haas
On Sun, Feb 10, 2019 at 2:37 AM Thomas Munro
 wrote:
> ... but why would it do that?  I can reproduce cases where (for
> example) FreePageManagerPutInternal() returns 179, and then
> FreePageManagerLargestContiguous() returns 179, but then after
> FreePageBtreeCleanup() it returns 178.  At that point FreePageDump()
> says:
>
> btree depth 1:
>   77@0 l: 27(1) 78(178)
> freelists:
>   1: 27
>   129: 78(178)
>
> But at first glance it shouldn't be allocating pages, because it just
> does consolidation to try to convert to singleton format, and then it
> does recycle list cleanup using soft=true so that no allocation of
> btree pages should occur.

I think I see what's happening.  At the moment the problem occurs,
there is no btree - there is only a singleton range.  So
FreePageManagerInternal() takes the fpm->btree_depth == 0 branch and
then ends up in the section with the comment  /* Not contiguous; we
need to initialize the btree. */.  And that section, sadly, does not
respect the 'soft' flag, so kaboom.  Something like the attached might
fix it.

Boy, I love FreePageManagerDump!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


dsa-allocate-btree-init-soft.patch
Description: Binary data


Re: dsa_allocate() faliure

2019-02-09 Thread Robert Haas
On Sun, Feb 10, 2019 at 1:55 AM Thomas Munro
 wrote:
> Bleugh.  Yeah.  What I said before wasn't quite right.  The value
> returned by FreePageManagerPutInternal() is actually correct at the
> moment it is returned, but it ceases to be correct immediately
> afterwards if the following call to FreePageBtreeCleanup() happens to
> reduce the size of that particular span.  The problem is that we
> clobber fpm->contiguous_pages with the earlier (and by now incorrect)
> value that we were holding in a local variable.

Yeah, I had similar bugs to that during the initial development work I
did on freepage.c, and that's why I got rid of some lazy recomputation
thing that I had tried at some point.  The version that got committed
brought that back again, but possibly it's got the same kind of
problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: dsa_allocate() faliure

2019-02-09 Thread Thomas Munro
On Sun, Feb 10, 2019 at 7:24 AM Thomas Munro
 wrote:
> On Sat, Feb 9, 2019 at 9:21 PM Robert Haas  wrote:
> > On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro
> >  wrote:
> > > Sometimes FreeManagerPutInternal() returns a
> > > number-of-contiguous-pages-created-by-this-insertion that is too large
> > > by one. [...]
> >
> > I spent a long time thinking about this and staring at code this
> > afternoon, but I didn't really come up with much of anything useful.
> > It seems like a strange failure mode, because
> > FreePageManagerPutInternal() normally just  returns its third argument
> > unmodified. [...]
>
> Bleugh.  Yeah.  What I said before wasn't quite right.  The value
> returned by FreePageManagerPutInternal() is actually correct at the
> moment it is returned, but it ceases to be correct immediately
> afterwards if the following call to FreePageBtreeCleanup() happens to
> reduce the size of that particular span.

... but why would it do that?  I can reproduce cases where (for
example) FreePageManagerPutInternal() returns 179, and then
FreePageManagerLargestContiguous() returns 179, but then after
FreePageBtreeCleanup() it returns 178.  At that point FreePageDump()
says:

btree depth 1:
  77@0 l: 27(1) 78(178)
freelists:
  1: 27
  129: 78(178)

But at first glance it shouldn't be allocating pages, because it just
does consolidation to try to convert to singleton format, and then it
does recycle list cleanup using soft=true so that no allocation of
btree pages should occur.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: dsa_allocate() faliure

2019-02-09 Thread Thomas Munro
On Sat, Feb 9, 2019 at 9:21 PM Robert Haas  wrote:
> On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro
>  wrote:
> > Sometimes FreeManagerPutInternal() returns a
> > number-of-contiguous-pages-created-by-this-insertion that is too large
> > by one. [...]
>
> I spent a long time thinking about this and staring at code this
> afternoon, but I didn't really come up with much of anything useful.
> It seems like a strange failure mode, because
> FreePageManagerPutInternal() normally just  returns its third argument
> unmodified. [...]

Bleugh.  Yeah.  What I said before wasn't quite right.  The value
returned by FreePageManagerPutInternal() is actually correct at the
moment it is returned, but it ceases to be correct immediately
afterwards if the following call to FreePageBtreeCleanup() happens to
reduce the size of that particular span.  The problem is that we
clobber fpm->contiguous_pages with the earlier (and by now incorrect)
value that we were holding in a local variable.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: dsa_allocate() faliure

2019-02-09 Thread Robert Haas
On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro
 wrote:
> Sometimes FreeManagerPutInternal() returns a
> number-of-contiguous-pages-created-by-this-insertion that is too large
> by one.  If this happens to be a new max-number-of-contiguous-pages,
> it causes trouble some arbitrary time later because the max is wrong
> and this FPM cannot satisfy a request that large, and it may not be
> recomputed for some time because the incorrect value prevents
> recomputation.  Not sure yet if this is due to the lazy computation
> logic or a plain old fence-post error in the btree consolidation code
> or something else.

I spent a long time thinking about this and staring at code this
afternoon, but I didn't really come up with much of anything useful.
It seems like a strange failure mode, because
FreePageManagerPutInternal() normally just returns its third argument
unmodified. The only cases where anything else happens are the ones
where we're able to consolidate the returned span with a preceding or
following span, and I'm scratching my head as to how that logic could
be wrong, especially since it also has some Assert() statements that
seem like they would detect the kinds of inconsistencies that would
lead to trouble.  For example, if we somehow ended up with two spans
that (improperly) overlapped, we'd trip an Assert().  And if that
didn't happen -- because we're not in an Assert-enabled build -- the
code is written so that it only relies on the npages value of the last
of the consolidated spans, so an error in the npages value of one of
the earlier spans would just get fixed up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: dsa_allocate() faliure

2019-02-07 Thread Thomas Munro
On Fri, Feb 8, 2019 at 4:49 AM Thomas Munro
 wrote:
> I don't have the answer yet but I have some progress: I finally
> reproduced the "could not find %d free pages" error by running lots of
> concurrent parallel queries.  Will investigate.

Sometimes FreeManagerPutInternal() returns a
number-of-contiguous-pages-created-by-this-insertion that is too large
by one.  If this happens to be a new max-number-of-contiguous-pages,
it causes trouble some arbitrary time later because the max is wrong
and this FPM cannot satisfy a request that large, and it may not be
recomputed for some time because the incorrect value prevents
recomputation.  Not sure yet if this is due to the lazy computation
logic or a plain old fence-post error in the btree consolidation code
or something else.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: dsa_allocate() faliure

2019-02-07 Thread Thomas Munro
On Thu, Feb 7, 2019 at 9:10 PM Jakub Glapa  wrote:
> > Do you have query logging enabled ?  If not, could you consider it on at
> > least one of those servers ?  I'm interested to know what ELSE is running
> > at the time that query failed.
>
> Ok, I have configured that and will enable it in the time window when the
> errors usually occur.  I'll report as soon as I have something.

I don't have the answer yet but I have some progress: I finally
reproduced the "could not find %d free pages" error by running lots of
concurrent parallel queries.  Will investigate.

Set up:

create table foo (p int, a int, b int) partition by list (p);
create table foo_1 partition of foo for values in (1);
create table foo_2 partition of foo for values in (2);
create table foo_3 partition of foo for values in (3);
alter table foo_1 set (parallel_workers = 4);
alter table foo_2 set (parallel_workers = 4);
alter table foo_3 set (parallel_workers = 4);
insert into foo
select generate_series(1, 1000)::int % 3 + 1,
       generate_series(1, 1000)::int % 50,
       generate_series(1, 1000)::int % 50;
create index on foo_1(a);
create index on foo_2(a);
create index on foo_3(a);
create index on foo_1(b);
create index on foo_2(b);
create index on foo_3(b);
analyze;

Then I ran three copies of:

#!/bin/sh
(
  echo "set max_parallel_workers_per_gather = 4;"
  for I in `seq 1 10`; do
echo "explain analyze select count(*) from foo where a between 5
and 6 or b between 5 and 6;"
  done
) | psql postgres

-- 
Thomas Munro
http://www.enterprisedb.com



Re: dsa_allocate() faliure

2019-02-07 Thread Jakub Glapa
> Do you have query logging enabled ?  If not, could you consider it on at
> least one of those servers ?  I'm interested to know what ELSE is running
> at the time that query failed.

Ok, I have configured that and will enable it in the time window when the
errors usually occur.  I'll report as soon as I have something.


--
regards,
pozdrawiam,
Jakub Glapa


On Thu, Feb 7, 2019 at 12:21 AM Justin Pryzby  wrote:

> Moving to -hackers, hopefully it doesn't confuse the list scripts too much.
>
> On Mon, Feb 04, 2019 at 08:52:17AM +0100, Jakub Glapa wrote:
> > I see the error showing up every night on 2 different servers. But it's a
> > bit of a heisenbug because If I go there now it won't be reproducible.
>
> Do you have query logging enabled ?  If not, could you consider it on at
> least one of those servers ?  I'm interested to know what ELSE is running
> at the time that query failed.
>
> Perhaps you could enable query logging JUST for the interval of time that
> the server usually errors ?  The CSV logs can be imported to postgres for
> analysis.
> You might do something like SELECT
> left(message,99),COUNT(1),max(session_id) FROM postgres_log WHERE log_time
> BETWEEN .. AND .. GROUP BY 1 ORDER BY 2;
> And just maybe there'd be a query there that only runs once per day which
> would allow reproducing the error at will.  Or a utility command like
> vacuum.
>
> I think ideally you'd set:
>
> log_statement= all
> log_min_messages = info
> log_destination  = 'stderr,csvlog'
> # stderr isn't important for this purpose, but I keep it set to capture
> crash messages, too
>
> You should set these to something that works well at your site:
>
> log_rotation_age= '2min'
> log_rotation_size   = '32MB'
>
> I would normally set these, and I don't see any reason why you wouldn't set
> them too:
>
> log_checkpoints = on
> log_lock_waits  = on
> log_temp_files  = on
> log_min_error_statement = notice
> log_temp_files  = 0
> log_min_duration_statement  = '9sec'
> log_autovacuum_min_duration = '999sec'
>
> And I would set these too but maybe you'd prefer to do something else:
>
> log_directory   = /var/log/postgresql
> log_file_mode   = 0640
> log_filename= postgresql-%Y-%m-%d_%H%M%S.log
>
> Justin
>


Re: dsa_allocate() faliure

2019-02-06 Thread Justin Pryzby
Moving to -hackers, hopefully it doesn't confuse the list scripts too much.

On Mon, Feb 04, 2019 at 08:52:17AM +0100, Jakub Glapa wrote:
> I see the error showing up every night on 2 different servers. But it's a
> bit of a heisenbug because If I go there now it won't be reproducible.

Do you have query logging enabled ?  If not, could you consider it on at least
one of those servers ?  I'm interested to know what ELSE is running at the time
that query failed.  

Perhaps you could enable query logging JUST for the interval of time that the
server usually errors ?  The CSV logs can be imported to postgres for analysis.
You might do something like SELECT left(message,99),COUNT(1),max(session_id) 
FROM postgres_log WHERE log_time BETWEEN .. AND .. GROUP BY 1 ORDER BY 2;
And just maybe there'd be a query there that only runs once per day which would
allow reproducing the error at will.  Or a utility command like vacuum.

I think ideally you'd set:

log_statement= all
log_min_messages = info
log_destination  = 'stderr,csvlog'
# stderr isn't important for this purpose, but I keep it set to capture crash 
messages, too

You should set these to something that works well at your site:

log_rotation_age= '2min'
log_rotation_size   = '32MB'

I would normally set these, and I don't see any reason why you wouldn't set
them too:

log_checkpoints = on
log_lock_waits  = on
log_temp_files  = on
log_min_error_statement = notice
log_temp_files  = 0
log_min_duration_statement  = '9sec'
log_autovacuum_min_duration = '999sec'

And I would set these too but maybe you'd prefer to do something else:

log_directory   = /var/log/postgresql
log_file_mode   = 0640
log_filename= postgresql-%Y-%m-%d_%H%M%S.log

Justin



Re: dsa_allocate() faliure

2018-12-02 Thread Thomas Munro
On Sat, Dec 1, 2018 at 9:46 AM Justin Pryzby  wrote:
> elog(FATAL,
>  "dsa_allocate could not find %zu free pages", npages);
> +   abort();

If anyone can reproduce this problem with a debugger, it'd be
interesting to see the output of dsa_dump(area), and
FreePageManagerDump(segment_map->fpm).  This error condition means
that get_best_segment() selected a segment from a segment bin that
holds segments with a certain minimum number of contiguous free pages
>= the requested number npages, but then FreePageManagerGet() found
that it didn't have npages of contiguous free memory after all when it
consulted the segment's btree of free space.  Possible explanations
include: the segment bin lists are somehow messed up, the FPM in the
segment was corrupted by someone scribbling on free pages (which hold
the btree), the btree was corrupted by an incorrect sequence of
allocate/free calls (for example double frees, or allocating from one
area and freeing to another), or freepage.c failing to track its
largest size correctly.

There is a macro FPM_EXTRA_ASSERTS that can be defined to double-check
the largest contiguous page tracking.  I have also been wondering
about a debug mode that would mprotect(PROT_READ) free pages when they
aren't being modified to detect unexpected writes, which should work
on systems that have 4k pages.

One thing I noticed is that it is failing on a "large" allocation,
where we go straight to the btree of 4k pages, but the equivalent code
where we allocate a superblock for "small" allocations doesn't report
the same kind of FATAL this-can't-happen error, it just fails the
allocation via the regular error path without explanation.  I also
spotted a path that doesn't respect the DSA_ALLOC_NO_OOM flag (you get
a null pointer instead of an error).  I should fix those
inconsistencies (draft patch attached), but those are incidental
problems AFAIK.

-- 
Thomas Munro
http://www.enterprisedb.com


fix-dsa-area-handling.patch
Description: Binary data


Re: dsa_allocate() faliure

2018-11-30 Thread Justin Pryzby
On Fri, Nov 30, 2018 at 08:20:49PM +0100, Jakub Glapa wrote:
> In the last days I've been monitoring: no segfault occurred, but the
> dsa_allocate error did.
> I'm starting to doubt whether the segfault I found in dmesg was actually
> related.

The dmesg looks like a real crash, not just OOM.  You can hopefully find the
timestamp of the segfaults in /var/log/syslog, and compare with postgres logs
if they go back far enough.  All the postgres processes except the parent
would've been restarted at that time.

> I've grepped the postgres log for dsa_allocate:
> Why do the messages occur sometimes as FATAL and sometimes as ERROR?

I believe it may depend on whether it happens in a parallel worker or the leader.

You may get more log detail if you enable CSV logging (although unfortunately
as I recall it doesn't indicate it's a parallel worker).

You could force it to dump core if you recompile postgres with an assert() (see
patch below).

You could build a .deb by running dpkg-buildpackage -rfakeroot or similar (I
haven't done this in a while), or you could compile, install, and launch
debugging binaries from your homedir (or similar).

You'd want to compile the same version (git checkout REL_10_6) and with the
proper configure flags, perhaps starting with:
./configure --with-libxml --with-libxslt --enable-debug \
  --prefix=$HOME/src/postgresql.bin --enable-cassert && time make && make install

Be careful if you have extensions installed that they still work.

Justin

--- a/src/backend/utils/mmgr/dsa.c
+++ b/src/backend/utils/mmgr/dsa.c
@@ -727,4 +727,7 @@ dsa_allocate_extended(dsa_area *area, size_t size, int flags)
 	if (!FreePageManagerGet(segment_map->fpm, npages, &first_page))
+	{
 		elog(FATAL,
 			 "dsa_allocate could not find %zu free pages", npages);
+		abort();
+	}




Re: dsa_allocate() faliure

2018-11-30 Thread Jakub Glapa
Hi, just a small update.
I've configured the OS for taking crash dumps on Ubuntu 16.04 with the
following (maybe somebody will find it helpful):
I've added LimitCORE=infinity to /lib/systemd/system/postgresql@.service
under the [Service] section.
I've reloaded the service config with sudo systemctl daemon-reload
Changed the core pattern with: echo /var/lib/postgresql/core.%p.sig%s.%ts |
sudo tee -a /proc/sys/kernel/core_pattern
I had tested it with kill -ABRT <pid of backend> and it behaved correctly.  A
crash dump was written.

In the last days I've been monitoring: no segfault occurred, but the
dsa_allocate error did.
I'm starting to doubt whether the segfault I found in dmesg was actually
related.

I've grepped the postgres log for dsa_allocate:
Why do the messages occur sometimes as FATAL and sometimes as ERROR?

2018-11-29 07:59:06 CET::@:[20584]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-29 07:59:06 CET:127.0.0.1(40846):user@db:[19507]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 09:04:13 CET::@:[27341]: FATAL:  dsa_allocate could not find 13
free pages
2018-11-30 09:04:13 CET:127.0.0.1(41782):user@db:[25417]: ERROR:
dsa_allocate could not find 13 free pages
2018-11-30 09:28:38 CET::@:[30215]: FATAL:  dsa_allocate could not find 4
free pages
2018-11-30 09:28:38 CET:127.0.0.1(45980):user@db:[29924]: ERROR:
dsa_allocate could not find 4 free pages
2018-11-30 16:37:16 CET::@:[14385]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-30 16:37:16 CET::@:[14375]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-30 16:37:16 CET:212.186.105.45(55004):user@db:[14386]: FATAL:
dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54964):user@db:[14379]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54916):user@db:[14370]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 16:45:11 CET:212.186.105.45(55356):user@db:[14555]: FATAL:
dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET::@:[15359]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-30 16:49:13 CET::@:[15363]: FATAL:  dsa_allocate could not find 7
free pages
2018-11-30 16:49:13 CET:212.186.105.45(54964):user@db:[14379]: FATAL:
dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(54916):user@db:[14370]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(55842):user@db:[14815]: ERROR:
dsa_allocate could not find 7 free pages
2018-11-30 16:56:11 CET:212.186.105.45(57076):user@db:[15638]: FATAL:
dsa_allocate could not find 7 free pages


There are quite a few errors from today, but I was launching the problematic
query in parallel from 2-3 sessions.
Sometimes it was breaking, sometimes not.
I couldn't find any pattern.
The workload on this db is not really constant, rather bursting.

--
regards,
Jakub Glapa


On Tue, Nov 27, 2018 at 9:03 AM Thomas Munro 
wrote:

> On Tue, Nov 27, 2018 at 4:00 PM Thomas Munro
>  wrote:
> > Hmm.  I will see if I can come up with a many-partition torture test
> > reproducer for this.
>
> No luck.  I suppose one theory that could link both failure modes
> would be a buffer overrun, where in the non-shared case it trashes a
> pointer that is later dereferenced, and in the shared case it writes
> past the end of allocated 4KB pages and corrupts the intrusive btree
> that lives in spare pages to track available space.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>


Re: dsa_allocate() faliure

2018-11-27 Thread Thomas Munro
On Tue, Nov 27, 2018 at 4:00 PM Thomas Munro
 wrote:
> Hmm.  I will see if I can come up with a many-partition torture test
> reproducer for this.

No luck.  I suppose one theory that could link both failure modes
would be a buffer overrun, where in the non-shared case it trashes a
pointer that is later dereferenced, and in the shared case it writes
past the end of allocated 4KB pages and corrupts the intrusive btree
that lives in spare pages to track available space.

--
Thomas Munro
http://www.enterprisedb.com



Re: dsa_allocate() faliure

2018-11-26 Thread Thomas Munro
On Tue, Nov 27, 2018 at 7:45 AM Alvaro Herrera  wrote:
> On 2018-Nov-26, Jakub Glapa wrote:
> > Justin thanks for the information!
> > I'm running Ubuntu 16.04.
> > I'll try to prepare for the next crash.
> > Couldn't find anything this time.
>
> As I recall, the apport stuff in Ubuntu is terrible ... I've seen it
> take 40 minutes to write the crash dump to disk, during which the
> database was "down".  I don't know why it is so slow (it's a rather
> silly python script that apparently processes the core dump one byte at
> a time, and you can imagine that with a few gigabytes of shared memory
> that takes a while).  Anyway my recommendation was to *remove* that
> stuff from the server and make sure the core file is saved by normal
> means.

Thanks for CC-ing me.  I didn't see this thread earlier because I'm
not subscribed to -performance.  Let's move it over to -hackers since
it looks like it's going to be a debugging exercise.  So, reading
through the thread[1], I think there might be two independent problems
here:

1.  Jakub has a many-partition Parallel Bitmap Heap Scan query that
segfaults when run with max_parallel_workers = 0.  That sounds
suspiciously like an instance of a class of bug we've run into before.
We planned a parallel query, but were unable to launch one due to lack
of DSM slots or process slots, so we run the parallel plan in a kind
of degraded non-parallel mode that needs to cope with various pointers
into shared memory being NULL.  A back trace from a core file should
hopefully make it very obvious what's going on.

2.  The same query when run in real parallel query mode occasionally
reaches an error "dsa_allocate could not find 7 free pages", which
should not happen.  This is on 10.6, so it has the commit "Fix
segment_bins corruption in dsa.c.".

Hmm.  I will see if I can come up with a many-partition torture test
reproducer for this.

[1] 
https://www.postgresql.org/message-id/flat/CAJk1zg10iCNsxFvQ4pgKe1B0rdjNG9iELA7AzLXjXnQm5T%3DKzQ%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com