Re: dsa_allocate() faliure
On Mon, Feb 11, 2019 at 10:33 AM Tom Lane wrote:
> Thomas Munro writes:
> > So I'll wait until after the release, and we'll have
> > to live with the bug for another 3 months.
>
> Check. Please hold off committing until you see the release tags
> appear, probably late Tuesday my time / Wednesday noonish yours.

Pushed.

--
Thomas Munro
http://www.enterprisedb.com
Re: dsa_allocate() faliure
Thomas Munro writes:
> On Mon, Feb 11, 2019 at 10:33 AM Tom Lane wrote:
>> I observe from
>> https://coverage.postgresql.org/src/backend/utils/mmgr/freepage.c.gcov.html
>> that the edge cases in this function aren't too well exercised by
>> our regression tests, meaning that the buildfarm might not prove
>> much either way about the correctness of this patch. That is one
>> factor pushing me to think we shouldn't risk it. But, taking a
>> longer view, is that something that's practical to improve?

> Yeah. This is a nice example of code that really deserves unit tests
> written in C. Could be good motivation to build the infrastructure I
> mentioned here:
> https://www.postgresql.org/message-id/flat/CAEepm%3D2heu%2B5zwB65jWap3XY-UP6PpJZiKLQRSV2UQH9BmVRXQ%40mail.gmail.com

Meh. I think if you hold out for that, you're going to be waiting a
long time. I was thinking more along the lines of making a test API in
src/test/modules/, akin to what we've got for predtest or rbtree.

regards, tom lane
Re: dsa_allocate() faliure
On Mon, Feb 11, 2019 at 10:33 AM Tom Lane wrote:
> I observe from
> https://coverage.postgresql.org/src/backend/utils/mmgr/freepage.c.gcov.html
> that the edge cases in this function aren't too well exercised by
> our regression tests, meaning that the buildfarm might not prove
> much either way about the correctness of this patch. That is one
> factor pushing me to think we shouldn't risk it. But, taking a
> longer view, is that something that's practical to improve?

Yeah. This is a nice example of code that really deserves unit tests
written in C. Could be good motivation to build the infrastructure I
mentioned here:
https://www.postgresql.org/message-id/flat/CAEepm%3D2heu%2B5zwB65jWap3XY-UP6PpJZiKLQRSV2UQH9BmVRXQ%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
Re: dsa_allocate() faliure
On Mon, Feb 11, 2019 at 11:02 AM Justin Pryzby wrote:
> On Mon, Feb 11, 2019 at 09:45:07AM +1100, Thomas Munro wrote:
> > Ouch. Yeah, that'd do it and matches the evidence. With this change,
> > I couldn't reproduce the problem after 90 minutes with a test case
> > that otherwise hits it within a couple of minutes.
> ...
> > Note that this patch addresses the error "dsa_allocate could not find
> > %zu free pages". (The error "dsa_area could not attach to segment" is
> > something else and apparently rarer.)
>
> "could not attach" is the error reported early this morning while
> stress-testing this patch with queued_alters queries in loops, so that's
> consistent with your understanding. And I guess it preceded getting stuck
> on a lock; I don't know how long passed between the first and the second,
> but I'd guess not long, perhaps no time at all, since the rest of the
> processes were all stuck as in bug#15585 rather than ERRORing once every
> few minutes.
>
> I mentioned that "could not attach to segment" occurs in the leader and/or
> a parallel worker. Most of the time it causes an ERROR only, and doesn't
> wedge all future parallel workers. Maybe the bug#15585 "wedged" state only
> occurs after some pattern of leader+worker failures (?). I've just
> triggered bug#15585 again, but if there's a pattern, I don't see it.
>
> Please let me know whether you're able to reproduce the "not attach" bug
> using simultaneous loops around the queued_alters query; it's easy here.

I haven't ever managed to reproduce that one yet. It's great you have a
reliable repro... Let's discuss it on the #15585 thread.

--
Thomas Munro
http://www.enterprisedb.com
Re: dsa_allocate() faliure
On Mon, Feb 11, 2019 at 09:45:07AM +1100, Thomas Munro wrote:
> Ouch. Yeah, that'd do it and matches the evidence. With this change,
> I couldn't reproduce the problem after 90 minutes with a test case
> that otherwise hits it within a couple of minutes.
...
> Note that this patch addresses the error "dsa_allocate could not find
> %zu free pages". (The error "dsa_area could not attach to segment" is
> something else and apparently rarer.)

"could not attach" is the error reported early this morning while
stress-testing this patch with queued_alters queries in loops, so that's
consistent with your understanding. And I guess it preceded getting stuck
on a lock; I don't know how long passed between the first and the second,
but I'd guess not long, perhaps no time at all, since the rest of the
processes were all stuck as in bug#15585 rather than ERRORing once every
few minutes.

I mentioned that "could not attach to segment" occurs in the leader and/or
a parallel worker. Most of the time it causes an ERROR only, and doesn't
wedge all future parallel workers. Maybe the bug#15585 "wedged" state only
occurs after some pattern of leader+worker failures (?). I've just
triggered bug#15585 again, but if there's a pattern, I don't see it.

Please let me know whether you're able to reproduce the "not attach" bug
using simultaneous loops around the queued_alters query; it's easy here.

Justin
Re: dsa_allocate() faliure
Thomas Munro writes:
> This brings us to a difficult choice: we're about to cut a new
> release, and this could in theory be included. Even though the fix is
> quite convincing, it doesn't seem wise to change such complicated code
> at the last minute, and I know from an off-list chat that that is also
> Robert's view.

Yeah ... at this point we're just too close to the release deadline,
I'm afraid, even though the fix *looks* pretty safe. Not worth the
risk given that this seems to be a low-probability bug.

I observe from
https://coverage.postgresql.org/src/backend/utils/mmgr/freepage.c.gcov.html
that the edge cases in this function aren't too well exercised by
our regression tests, meaning that the buildfarm might not prove
much either way about the correctness of this patch. That is one
factor pushing me to think we shouldn't risk it. But, taking a
longer view, is that something that's practical to improve?

> So I'll wait until after the release, and we'll have
> to live with the bug for another 3 months.

Check. Please hold off committing until you see the release tags
appear, probably late Tuesday my time / Wednesday noonish yours.

regards, tom lane
Re: dsa_allocate() faliure
On Sun, Feb 10, 2019 at 5:41 PM Robert Haas wrote:
> On Sun, Feb 10, 2019 at 2:37 AM Thomas Munro wrote:
> > But at first glance it shouldn't be allocating pages, because it just
> > does consolidation to try to convert to singleton format, and then it
> > does recycle list cleanup using soft=true so that no allocation of
> > btree pages should occur.
>
> I think I see what's happening. At the moment the problem occurs,
> there is no btree - there is only a singleton range. So
> FreePageManagerPutInternal() takes the fpm->btree_depth == 0 branch and
> then ends up in the section with the comment /* Not contiguous; we
> need to initialize the btree. */. And that section, sadly, does not
> respect the 'soft' flag, so kaboom. Something like the attached might
> fix it.

Ouch. Yeah, that'd do it and matches the evidence. With this change,
I couldn't reproduce the problem after 90 minutes with a test case
that otherwise hits it within a couple of minutes.

Here's a patch with a commit message explaining the change. It also
removes an obsolete comment, which is in fact related. The comment
refers to an output parameter internal_pages_used, which must have been
used to report this exact phenomenon in an earlier development version.
But there is no such parameter in the committed version, and instead
there is the soft flag to prevent internal allocation. I have no view
on which approach is best, but yeah, if we're using a soft flag, it has
to work reliably.

This brings us to a difficult choice: we're about to cut a new release,
and this could in theory be included. Even though the fix is quite
convincing, it doesn't seem wise to change such complicated code at the
last minute, and I know from an off-list chat that that is also
Robert's view. So I'll wait until after the release, and we'll have to
live with the bug for another 3 months.

Note that this patch addresses the error "dsa_allocate could not find
%zu free pages". (The error "dsa_area could not attach to segment" is
something else and apparently rarer.)

> Boy, I love FreePageManagerDump!

Yeah. And I love reproducible bugs.

--
Thomas Munro
http://www.enterprisedb.com

0001-Fix-freepage.c-bug-that-causes-rare-dsa_allocate-fai.patch
Description: Binary data
Re: dsa_allocate() faliure
On Sun, Feb 10, 2019 at 07:11:22PM +0300, Sergei Kornilov wrote:
> > I ran overnight with this patch, but all parallel processes ended up
> > stuck in the style of bug#15585. So that's either not the root cause,
> > or there's a 2nd issue.
>
> Maybe I missed something in this discussion, but you can reproduce
> bug#15585? How? With this testcase:
> https://www.postgresql.org/message-id/CAEepm%3D1MvOE-Sfv1chudx5KEmw_qHYrj8F9Og_WmdBRhXSQ%2B%2Bw%40mail.gmail.com
> ?

By running the queued_alters query multiple times in a loop:
https://www.postgresql.org/message-id/20181231221734.GB25379%40telsasoft.com

I'm able to trigger dsa "ERROR"s with that query on a newly initdb'd
cluster with only that table. But I think some servers are more likely
to hit it than others.

I've only tripped on 15585 twice, and only while trying to trigger
other DSA bugs (the working hypothesis is that that bug is a secondary
issue which happens after hitting some other bug). And not consistently
or quickly.

Justin
Re: dsa_allocate() faliure
Hi

> I ran overnight with this patch, but all parallel processes ended up
> stuck in the style of bug#15585. So that's either not the root cause,
> or there's a 2nd issue.

Maybe I missed something in this discussion, but you can reproduce
bug#15585? How? With this testcase:
https://www.postgresql.org/message-id/CAEepm%3D1MvOE-Sfv1chudx5KEmw_qHYrj8F9Og_WmdBRhXSQ%2B%2Bw%40mail.gmail.com
?

regards, Sergei
Re: dsa_allocate() faliure
On Sun, Feb 10, 2019 at 12:10:52PM +0530, Robert Haas wrote:
> I think I see what's happening. At the moment the problem occurs,
> there is no btree - there is only a singleton range. So
> FreePageManagerPutInternal() takes the fpm->btree_depth == 0 branch
> and then ends up in the section with the comment /* Not contiguous; we
> need to initialize the btree. */. And that section, sadly, does not
> respect the 'soft' flag, so kaboom. Something like the attached might
> fix it.

I ran overnight with this patch, but all parallel processes ended up
stuck in the style of bug#15585. So that's either not the root cause,
or there's a 2nd issue.

https://www.postgresql.org/message-id/flat/15585-324ff6a93a18da46%40postgresql.org

Justin
Re: dsa_allocate() faliure
On Sun, Feb 10, 2019 at 2:37 AM Thomas Munro wrote:
> ... but why would it do that? I can reproduce cases where (for
> example) FreePageManagerPutInternal() returns 179, and then
> FreePageManagerLargestContiguous() returns 179, but then after
> FreePageBtreeCleanup() it returns 178. At that point FreePageDump()
> says:
>
> btree depth 1:
>   77@0 l: 27(1) 78(178)
> freelists:
>   1: 27
>   129: 78(178)
>
> But at first glance it shouldn't be allocating pages, because it just
> does consolidation to try to convert to singleton format, and then it
> does recycle list cleanup using soft=true so that no allocation of
> btree pages should occur.

I think I see what's happening. At the moment the problem occurs,
there is no btree - there is only a singleton range. So
FreePageManagerPutInternal() takes the fpm->btree_depth == 0 branch and
then ends up in the section with the comment /* Not contiguous; we need
to initialize the btree. */. And that section, sadly, does not respect
the 'soft' flag, so kaboom. Something like the attached might fix it.

Boy, I love FreePageManagerDump!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

dsa-allocate-btree-init-soft.patch
Description: Binary data
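To make the shape of Robert's fix concrete, here is a minimal standalone sketch (toy types and names, not the committed patch or the real freepage.c): the branch that initializes the btree must honor the 'soft' flag by reporting failure instead of allocating metadata pages, just as the recycle-list cleanup path already expects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for FreePageManager state (hypothetical fields). */
typedef struct
{
	int			btree_depth;	/* 0 means singleton format, no btree yet */
	int			spare_pages;	/* pages available for btree metadata */
} SketchFpm;

/*
 * Toy stand-in for the put-internal path.  Returns true and sets
 * *contiguous_pages on success; returns false when the insertion would
 * require initializing the btree and soft is set -- the check that the
 * buggy branch was missing.
 */
static bool
sketch_put_internal(SketchFpm *fpm, size_t npages, bool soft,
					size_t *contiguous_pages)
{
	if (fpm->btree_depth == 0)
	{
		/* Not contiguous; we need to initialize the btree. */
		if (soft)
			return false;		/* soft mode: refuse rather than allocate */
		fpm->spare_pages--;		/* hard mode: consume a page for the root */
		fpm->btree_depth = 1;
	}
	*contiguous_pages = npages;
	return true;
}
```

A soft put against a singleton-format manager now declines cleanly, leaving the state untouched, while a hard put is still allowed to spend a page on the btree root.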
Re: dsa_allocate() faliure
On Sun, Feb 10, 2019 at 1:55 AM Thomas Munro wrote:
> Bleugh. Yeah. What I said before wasn't quite right. The value
> returned by FreePageManagerPutInternal() is actually correct at the
> moment it is returned, but it ceases to be correct immediately
> afterwards if the following call to FreePageBtreeCleanup() happens to
> reduce the size of that particular span. The problem is that we
> clobber fpm->contiguous_pages with the earlier (and by now incorrect)
> value that we were holding in a local variable.

Yeah, I had similar bugs to that during the initial development work I
did on freepage.c, and that's why I got rid of some lazy recomputation
thing that I had tried at some point. The version that got committed
brought that back again, but possibly it's got the same kind of
problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: dsa_allocate() faliure
On Sun, Feb 10, 2019 at 7:24 AM Thomas Munro wrote:
> On Sat, Feb 9, 2019 at 9:21 PM Robert Haas wrote:
> > On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro wrote:
> > > Sometimes FreePageManagerPutInternal() returns a
> > > number-of-contiguous-pages-created-by-this-insertion that is too
> > > large by one. [...]
> >
> > I spent a long time thinking about this and staring at code this
> > afternoon, but I didn't really come up with much of anything useful.
> > It seems like a strange failure mode, because
> > FreePageManagerPutInternal() normally just returns its third argument
> > unmodified. [...]
>
> Bleugh. Yeah. What I said before wasn't quite right. The value
> returned by FreePageManagerPutInternal() is actually correct at the
> moment it is returned, but it ceases to be correct immediately
> afterwards if the following call to FreePageBtreeCleanup() happens to
> reduce the size of that particular span.

... but why would it do that? I can reproduce cases where (for
example) FreePageManagerPutInternal() returns 179, and then
FreePageManagerLargestContiguous() returns 179, but then after
FreePageBtreeCleanup() it returns 178. At that point FreePageDump()
says:

btree depth 1:
  77@0 l: 27(1) 78(178)
freelists:
  1: 27
  129: 78(178)

But at first glance it shouldn't be allocating pages, because it just
does consolidation to try to convert to singleton format, and then it
does recycle list cleanup using soft=true so that no allocation of
btree pages should occur.

--
Thomas Munro
http://www.enterprisedb.com
Re: dsa_allocate() faliure
On Sat, Feb 9, 2019 at 9:21 PM Robert Haas wrote:
> On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro wrote:
> > Sometimes FreePageManagerPutInternal() returns a
> > number-of-contiguous-pages-created-by-this-insertion that is too
> > large by one. [...]
>
> I spent a long time thinking about this and staring at code this
> afternoon, but I didn't really come up with much of anything useful.
> It seems like a strange failure mode, because
> FreePageManagerPutInternal() normally just returns its third argument
> unmodified. [...]

Bleugh. Yeah. What I said before wasn't quite right. The value
returned by FreePageManagerPutInternal() is actually correct at the
moment it is returned, but it ceases to be correct immediately
afterwards if the following call to FreePageBtreeCleanup() happens to
reduce the size of that particular span. The problem is that we
clobber fpm->contiguous_pages with the earlier (and by now incorrect)
value that we were holding in a local variable.

--
Thomas Munro
http://www.enterprisedb.com
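The clobbering sequence Thomas describes can be shown with a toy C model (hypothetical names; the real freepage.c keeps this state in a btree, which is elided here): the put returns a value that is correct at that instant, the cleanup then shrinks the span, and the caller overwrites the cached maximum with the stale local value, so a later request sized to the cache selects this manager and then fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy free-page manager: models only the cached largest-run value. */
typedef struct
{
	size_t		largest_cached;	/* what callers consult, like fpm->contiguous_pages */
	size_t		largest_actual;	/* ground truth held in the (elided) btree */
} ToyFpm;

/* Correct at the moment of return, like FreePageManagerPutInternal(). */
static size_t
toy_put_internal(ToyFpm *fpm, size_t contiguous_pages)
{
	fpm->largest_actual = contiguous_pages;
	return contiguous_pages;
}

/* Models FreePageBtreeCleanup() consuming a page from that span. */
static void
toy_btree_cleanup(ToyFpm *fpm)
{
	fpm->largest_actual--;		/* span shrinks, e.g. 179 -> 178 */
}

/* Models FreePageManagerGet(): the cache decides whether we even try. */
static bool
toy_get(ToyFpm *fpm, size_t npages)
{
	if (npages > fpm->largest_cached)
		return false;			/* manager not selected at all */
	/* selected, but the truth may disagree: the "could not find" case */
	return npages <= fpm->largest_actual;
}
```

Running the buggy sequence (put, cleanup, then assign the stale local into the cache) reproduces the shape of the failure: the manager advertises 179 contiguous pages, a request for 179 is routed to it, and the request fails.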
Re: dsa_allocate() faliure
On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro wrote:
> Sometimes FreePageManagerPutInternal() returns a
> number-of-contiguous-pages-created-by-this-insertion that is too large
> by one. If this happens to be a new max-number-of-contiguous-pages,
> it causes trouble some arbitrary time later because the max is wrong
> and this FPM cannot satisfy a request that large, and it may not be
> recomputed for some time because the incorrect value prevents
> recomputation. Not sure yet if this is due to the lazy computation
> logic or a plain old fence-post error in the btree consolidation code
> or something else.

I spent a long time thinking about this and staring at code this
afternoon, but I didn't really come up with much of anything useful.
It seems like a strange failure mode, because
FreePageManagerPutInternal() normally just returns its third argument
unmodified. The only cases where anything else happens are the ones
where we're able to consolidate the returned span with a preceding or
following span, and I'm scratching my head as to how that logic could
be wrong, especially since it also has some Assert() statements that
seem like they would detect the kinds of inconsistencies that would
lead to trouble. For example, if we somehow ended up with two spans
that (improperly) overlapped, we'd trip an Assert(). And if that
didn't happen -- because we're not in an Assert-enabled build -- the
code is written so that it only relies on the npages value of the last
of the consolidated spans, so an error in the npages value of one of
the earlier spans would just get fixed up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
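Robert's description of the consolidation behavior can be sketched as a simplified model (hypothetical names and a flat span table instead of the real btree; merging with both neighbors at once is elided): a put normally returns its size unmodified, and only when the new span abuts an existing free span does the returned contiguous size grow, which is the only place an off-by-one in the return value could originate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One free span: first page number and length in pages. */
typedef struct
{
	size_t		first_page;
	size_t		npages;
	bool		in_use;			/* slot occupied? */
} Span;

#define MAX_SPANS 8

/*
 * Insert [first_page, first_page + npages) into the span table,
 * consolidating with an adjacent preceding or following span.
 * Returns the resulting contiguous size, which the caller caches.
 */
static size_t
put_span(Span *spans, size_t first_page, size_t npages)
{
	for (int i = 0; i < MAX_SPANS; i++)
	{
		if (!spans[i].in_use)
			continue;
		if (spans[i].first_page + spans[i].npages == first_page)
		{
			/* preceding span ends where we start: extend it */
			spans[i].npages += npages;
			return spans[i].npages;
		}
		if (first_page + npages == spans[i].first_page)
		{
			/* following span starts where we end: pull it back */
			spans[i].first_page = first_page;
			spans[i].npages += npages;
			return spans[i].npages;
		}
	}
	for (int i = 0; i < MAX_SPANS; i++)
	{
		if (!spans[i].in_use)
		{
			spans[i] = (Span) {first_page, npages, true};
			return npages;		/* the common case: returned unmodified */
		}
	}
	assert(false);				/* toy table full; real code grows a btree */
	return 0;
}
```

In this model the returned size is exactly the length of the merged span, so any discrepancy between the return value and the span table would point at the merge arithmetic, which is where the thread's suspicion falls.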
Re: dsa_allocate() faliure
On Fri, Feb 8, 2019 at 4:49 AM Thomas Munro wrote:
> I don't have the answer yet but I have some progress: I finally
> reproduced the "could not find %d free pages" error by running lots of
> concurrent parallel queries. Will investigate.

Sometimes FreePageManagerPutInternal() returns a
number-of-contiguous-pages-created-by-this-insertion that is too large
by one. If this happens to be a new max-number-of-contiguous-pages, it
causes trouble some arbitrary time later because the max is wrong and
this FPM cannot satisfy a request that large, and it may not be
recomputed for some time because the incorrect value prevents
recomputation. Not sure yet if this is due to the lazy computation
logic or a plain old fence-post error in the btree consolidation code
or something else.

--
Thomas Munro
http://www.enterprisedb.com
Re: dsa_allocate() faliure
On Thu, Feb 7, 2019 at 9:10 PM Jakub Glapa wrote:
> > Do you have query logging enabled ? If not, could you consider it on
> > at least one of those servers ? I'm interested to know what ELSE is
> > running at the time that query failed.
>
> Ok, I have configured that and will enable in the time window when the
> errors usually occur. I'll report as soon as I have something.

I don't have the answer yet but I have some progress: I finally
reproduced the "could not find %d free pages" error by running lots of
concurrent parallel queries. Will investigate.

Set up:

create table foo (p int, a int, b int) partition by list (p);
create table foo_1 partition of foo for values in (1);
create table foo_2 partition of foo for values in (2);
create table foo_3 partition of foo for values in (3);
alter table foo_1 set (parallel_workers = 4);
alter table foo_2 set (parallel_workers = 4);
alter table foo_3 set (parallel_workers = 4);
insert into foo
  select generate_series(1, 1000)::int % 3 + 1,
         generate_series(1, 1000)::int % 50,
         generate_series(1, 1000)::int % 50;
create index on foo_1(a);
create index on foo_2(a);
create index on foo_3(a);
create index on foo_1(b);
create index on foo_2(b);
create index on foo_3(b);
analyze;

Then I ran three copies of:

#!/bin/sh
(
  echo "set max_parallel_workers_per_gather = 4;"
  for I in `seq 1 10`; do
    echo "explain analyze select count(*) from foo where a between 5 and 6 or b between 5 and 6;"
  done
) | psql postgres

--
Thomas Munro
http://www.enterprisedb.com
Re: dsa_allocate() faliure
> Do you have query logging enabled ? If not, could you consider it on
> at least one of those servers ? I'm interested to know what ELSE is
> running at the time that query failed.

Ok, I have configured that and will enable it in the time window when
the errors usually occur. I'll report as soon as I have something.

--
regards,
Jakub Glapa

On Thu, Feb 7, 2019 at 12:21 AM Justin Pryzby wrote:
> Moving to -hackers, hopefully it doesn't confuse the list scripts too
> much.
>
> On Mon, Feb 04, 2019 at 08:52:17AM +0100, Jakub Glapa wrote:
> > I see the error showing up every night on 2 different servers. But
> > it's a bit of a heisenbug because if I go there now it won't be
> > reproducible.
>
> Do you have query logging enabled ? If not, could you consider it on
> at least one of those servers ? I'm interested to know what ELSE is
> running at the time that query failed.
>
> Perhaps you could enable query logging JUST for the interval of time
> that the server usually errors ? The CSV logs can be imported to
> postgres for analysis. You might do something like:
> SELECT left(message,99), COUNT(1), max(session_id) FROM postgres_log
> WHERE log_time BETWEEN .. AND .. GROUP BY 1 ORDER BY 2;
> And just maybe there'd be a query there that only runs once per day
> which would allow reproducing the error at will. Or a utility command
> like vacuum..
>
> I think ideally you'd set:
>
> log_statement = all
> log_min_messages = info
> log_destination = 'stderr,csvlog'
> # stderr isn't important for this purpose, but I keep it set to
> # capture crash messages, too
>
> You should set these to something that works well at your site:
>
> log_rotation_age = '2min'
> log_rotation_size = '32MB'
>
> I would normally set these, and I don't see any reason why you
> wouldn't set them too:
>
> log_checkpoints = on
> log_lock_waits = on
> log_temp_files = 0
> log_min_error_statement = notice
> log_min_duration_statement = '9sec'
> log_autovacuum_min_duration = '999sec'
>
> And I would set these too but maybe you'd prefer to do something else:
>
> log_directory = '/var/log/postgresql'
> log_file_mode = 0640
> log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
>
> Justin
Re: dsa_allocate() faliure
Moving to -hackers, hopefully it doesn't confuse the list scripts too
much.

On Mon, Feb 04, 2019 at 08:52:17AM +0100, Jakub Glapa wrote:
> I see the error showing up every night on 2 different servers. But
> it's a bit of a heisenbug because if I go there now it won't be
> reproducible.

Do you have query logging enabled ? If not, could you consider it on at
least one of those servers ? I'm interested to know what ELSE is
running at the time that query failed.

Perhaps you could enable query logging JUST for the interval of time
that the server usually errors ? The CSV logs can be imported to
postgres for analysis. You might do something like:

SELECT left(message,99), COUNT(1), max(session_id)
FROM postgres_log
WHERE log_time BETWEEN .. AND ..
GROUP BY 1 ORDER BY 2;

And just maybe there'd be a query there that only runs once per day
which would allow reproducing the error at will. Or a utility command
like vacuum..

I think ideally you'd set:

log_statement = all
log_min_messages = info
log_destination = 'stderr,csvlog'
# stderr isn't important for this purpose, but I keep it set to capture
# crash messages, too

You should set these to something that works well at your site:

log_rotation_age = '2min'
log_rotation_size = '32MB'

I would normally set these, and I don't see any reason why you wouldn't
set them too:

log_checkpoints = on
log_lock_waits = on
log_temp_files = 0
log_min_error_statement = notice
log_min_duration_statement = '9sec'
log_autovacuum_min_duration = '999sec'

And I would set these too but maybe you'd prefer to do something else:

log_directory = '/var/log/postgresql'
log_file_mode = 0640
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'

Justin
Re: dsa_allocate() faliure
On Sat, Dec 1, 2018 at 9:46 AM Justin Pryzby wrote:
> 		elog(FATAL,
> 			 "dsa_allocate could not find %zu free pages", npages);
> +		abort();

If anyone can reproduce this problem with a debugger, it'd be
interesting to see the output of dsa_dump(area) and
FreePageManagerDump(segment_map->fpm). This error condition means that
get_best_segment() selected a segment from a segment bin that holds
segments with a certain minimum number of contiguous free pages >= the
requested number npages, but then FreePageManagerGet() found that it
didn't have npages of contiguous free memory after all when it
consulted the segment's btree of free space.

Possible explanations include: the segment bin lists are somehow messed
up; the FPM in the segment was corrupted by someone scribbling on free
pages (which hold the btree); the btree was corrupted by an incorrect
sequence of allocate/free calls (for example double frees, or
allocating from one area and freeing to another); or freepage.c fails
to track its largest contiguous size correctly.

There is a macro FPM_EXTRA_ASSERTS that can be defined to double-check
the largest-contiguous-page tracking. I have also been wondering about
a debug mode that would mprotect(PROT_READ) free pages when they aren't
being modified to detect unexpected writes, which should work on
systems that have 4k pages.

One thing I noticed is that it is failing on a "large" allocation,
where we go straight to the btree of 4k pages, but the equivalent code
where we allocate a superblock for "small" allocations doesn't report
the same kind of FATAL this-can't-happen error; it just fails the
allocation via the regular error path without explanation. I also
spotted a path that doesn't respect the DSA_ALLOC_NO_OOM flag (you get
a null pointer instead of an error). I should fix those
inconsistencies (draft patch attached), but those are incidental
problems AFAIK.

--
Thomas Munro
http://www.enterprisedb.com

fix-dsa-area-handling.patch
Description: Binary data
Re: dsa_allocate() faliure
On Fri, Nov 30, 2018 at 08:20:49PM +0100, Jakub Glapa wrote:
> In the last days I've been monitoring; no segfault occurred but the
> dsa_allocate error did.
> I'm starting to doubt if the segfault I've found in dmesg was actually
> related.

The dmesg looks like a real crash, not just OOM. You can hopefully find
the timestamp of the segfaults in /var/log/syslog, and compare with the
postgres logs if they go back far enough. All the postgres processes
except the parent would've been restarted at that time.

> I've grepped the postgres log for dsa_allocate:
> Why do the messages occur sometimes as FATAL and sometimes as ERROR?

I believe it may depend on whether it happens in a parallel worker or
the leader. You may get more log detail if you enable CSV logging
(although unfortunately, as I recall, it doesn't indicate that a
process is a parallel worker).

You could force it to dump core if you recompile postgres with an
abort() call (see patch below). You could build a .deb by running
dpkg-buildpackage -rfakeroot or similar (I haven't done this in a
while), or you could compile, install, and launch debugging binaries
from your home directory (or similar). You'd want to compile the same
version (git checkout REL_10_6) and with the proper configure flags,
perhaps starting with:

./configure --with-libxml --with-libxslt --enable-debug --prefix=$HOME/src/postgresql.bin --enable-cassert && time make && make install

Be careful if you have extensions installed that they still work.

Justin

--- a/src/backend/utils/mmgr/dsa.c
+++ b/src/backend/utils/mmgr/dsa.c
@@ -727,4 +727,7 @@ dsa_allocate_extended(dsa_area *area, size_t size, int flags)
 	if (!FreePageManagerGet(segment_map->fpm, npages, &first_page))
+	{
 		elog(FATAL,
 			 "dsa_allocate could not find %zu free pages", npages);
+		abort();
+	}
Re: dsa_allocate() faliure
Hi, just a small update. I've configured the OS to take crash dumps on
Ubuntu 16.04 with the following (maybe somebody will find it helpful):

- I added LimitCORE=infinity to /lib/systemd/system/postgresql@.service
  under the [Service] section.
- I reloaded the service config with: sudo systemctl daemon-reload
- I changed the core pattern with:
  echo /var/lib/postgresql/core.%p.sig%s.%ts | sudo tee /proc/sys/kernel/core_pattern

I had tested it with kill -ABRT <pid of backend> and it behaved
correctly. A crash dump was written.

In the last days I've been monitoring; no segfault occurred but the
dsa_allocate error did. I'm starting to doubt if the segfault I've
found in dmesg was actually related.

I've grepped the postgres log for dsa_allocate. Why do the messages
occur sometimes as FATAL and sometimes as ERROR?

2018-11-29 07:59:06 CET::@:[20584]: FATAL: dsa_allocate could not find 7 free pages
2018-11-29 07:59:06 CET:127.0.0.1(40846):user@db:[19507]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 09:04:13 CET::@:[27341]: FATAL: dsa_allocate could not find 13 free pages
2018-11-30 09:04:13 CET:127.0.0.1(41782):user@db:[25417]: ERROR: dsa_allocate could not find 13 free pages
2018-11-30 09:28:38 CET::@:[30215]: FATAL: dsa_allocate could not find 4 free pages
2018-11-30 09:28:38 CET:127.0.0.1(45980):user@db:[29924]: ERROR: dsa_allocate could not find 4 free pages
2018-11-30 16:37:16 CET::@:[14385]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET::@:[14375]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(55004):user@db:[14386]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54964):user@db:[14379]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 16:37:16 CET:212.186.105.45(54916):user@db:[14370]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 16:45:11 CET:212.186.105.45(55356):user@db:[14555]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET::@:[15359]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET::@:[15363]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(54964):user@db:[14379]: FATAL: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(54916):user@db:[14370]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 16:49:13 CET:212.186.105.45(55842):user@db:[14815]: ERROR: dsa_allocate could not find 7 free pages
2018-11-30 16:56:11 CET:212.186.105.45(57076):user@db:[15638]: FATAL: dsa_allocate could not find 7 free pages

There's quite a lot of errors from today, but I was launching the
problematic query in parallel from 2-3 sessions. Sometimes it was
breaking, sometimes not. Couldn't find any pattern. The workload on
this db is not really constant, rather bursting.

--
regards,
Jakub Glapa

On Tue, Nov 27, 2018 at 9:03 AM Thomas Munro wrote:
> On Tue, Nov 27, 2018 at 4:00 PM Thomas Munro wrote:
> > Hmm. I will see if I can come up with a many-partition torture test
> > reproducer for this.
>
> No luck. I suppose one theory that could link both failure modes would
> be a buffer overrun, where in the non-shared case it trashes a pointer
> that is later dereferenced, and in the shared case it writes past the
> end of allocated 4KB pages and corrupts the intrusive btree that lives
> in spare pages to track available space.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
Re: dsa_allocate() faliure
On Tue, Nov 27, 2018 at 4:00 PM Thomas Munro wrote:
> Hmm. I will see if I can come up with a many-partition torture test
> reproducer for this.

No luck. I suppose one theory that could link both failure modes would
be a buffer overrun, where in the non-shared case it trashes a pointer
that is later dereferenced, and in the shared case it writes past the
end of allocated 4KB pages and corrupts the intrusive btree that lives
in spare pages to track available space.

--
Thomas Munro
http://www.enterprisedb.com
Re: dsa_allocate() faliure
On Tue, Nov 27, 2018 at 7:45 AM Alvaro Herrera wrote:
> On 2018-Nov-26, Jakub Glapa wrote:
> > Justin thanks for the information!
> > I'm running Ubuntu 16.04.
> > I'll try to prepare for the next crash.
> > Couldn't find anything this time.
>
> As I recall, the apport stuff in Ubuntu is terrible ... I've seen it
> take 40 minutes to write the crash dump to disk, during which the
> database was "down". I don't know why it is so slow (it's a rather
> silly python script that apparently processes the core dump one byte
> at a time, and you can imagine that with a few gigabytes of shared
> memory that takes a while). Anyway my recommendation was to *remove*
> that stuff from the server and make sure the core file is saved by
> normal means.

Thanks for CC-ing me. I didn't see this thread earlier because I'm not
subscribed to -performance. Let's move it over to -hackers since it
looks like it's going to be a debugging exercise.

So, reading through the thread[1], I think there might be two
independent problems here:

1. Jakub has a many-partition Parallel Bitmap Heap Scan query that
segfaults when run with max_parallel_workers = 0. That sounds
suspiciously like an instance of a class of bug we've run into before:
we planned a parallel query, but were unable to launch one due to lack
of DSM slots or process slots, so we run the parallel plan in a kind of
degraded non-parallel mode that needs to cope with various pointers
into shared memory being NULL. A backtrace from a core file should
hopefully make it very obvious what's going on.

2. The same query, when run in real parallel query mode, occasionally
reaches the error "dsa_allocate could not find 7 free pages", which
should not happen. This is on 10.6, so it has the commit "Fix
segment_bins corruption in dsa.c.". Hmm. I will see if I can come up
with a many-partition torture test reproducer for this.

[1] https://www.postgresql.org/message-id/flat/CAJk1zg10iCNsxFvQ4pgKe1B0rdjNG9iELA7AzLXjXnQm5T%3DKzQ%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com