Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
On Fri, Jun 3, 2016 at 11:13 PM, Robert Haas wrote:
>> I think the workers should stop processing tuples after the tuple queues got
>> detached.  This will not only handle above situation gracefully, but will
>> allow to speed up the queries where Limit clause is present on top of Gather
>> node.  Patch for the same is attached with mail (this was part of original
>> parallel seq scan patch, but was not applied and the reason as far as I
>> remember was we thought such an optimization might not be required for
>> initial version).
>
> This is very likely a good idea, but...
>
>> Another approach to fix this issue could be to reset mqh_partial_bytes and
>> mqh_length_word_complete in shm_mq_sendv in case of SHM_MQ_DETACHED.  These
>> are currently reset only in case of success.
>
> ...I think we should do this too

Patches for both of these things are now committed.  Barring problems
with those commits, I believe this resolves this open item.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
On Sat, Jun 4, 2016 at 2:57 AM, Amit Kapila wrote:
> In the above change, you are first adding bytes_written and then doing the
> SHM_MQ_DETACHED check, whereas other place the check is done first which
> seems to be right.  I think it doesn't matter either way, but it is better
> to be consistent.  Also isn't it better to set mqh_length_word_complete as
> false as next time this API is called, it should first try to write length
> into buffer.

OK, committed after (I hope) fixing those issues.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
On Sat, Jun 4, 2016 at 8:43 AM, Robert Haas wrote:
> On Thu, May 26, 2016 at 5:57 AM, Amit Kapila wrote:
> >
> > I am able to reproduce the assertion (it occurs once in two to three times,
> > but always at same place) you have reported upthread with the above query.
> > It seems to me, issue here is that while workers are writing tuples in the
> > tuple queue, the master backend has detached from the queues.  The reason
> > why master backend has detached from tuple queues is because of Limit
> > clause, basically after processing required tuples as specified by Limit
> > clause, it calls shutdown of nodes in below part of code:
>
> I can't reproduce this assertion failure on master.  I tried running
> 'make installcheck' and then running this query repeatedly in the
> regression database with and without
> parallel_setup_cost=parallel_tuple_cost=0, and got nowhere.  Does that
> work for you, or do you have some other set of steps?
>

I have tried by populating the pg_statistic table after running make
installcheck.  The way to populate pg_statistic is to create a lot of
tables and insert a few rows in each table, as mentioned at the end of
the mail upthread:
https://www.postgresql.org/message-id/CAA4eK1KOKGqmz9bGu%2BZ42qhRwMbm4R5rfnqsLCNqFs9j14jzEA%40mail.gmail.com

Today, I again tried reproducing it using the same steps, but could not
reproduce it.  This is a timing issue and difficult to reproduce; last
time also I spent quite some time to reproduce it.  I think we can fix
the issue as per the analysis done by me last time and then let Andreas
run his tests to see if he sees the issue again.

> > I think the workers should stop processing tuples after the tuple queues got
> > detached.  This will not only handle above situation gracefully, but will
> > allow to speed up the queries where Limit clause is present on top of Gather
> > node.
> > Patch for the same is attached with mail (this was part of original
> > parallel seq scan patch, but was not applied and the reason as far as I
> > remember was we thought such an optimization might not be required for
> > initial version).
>
> This is very likely a good idea, but...
>
> > Another approach to fix this issue could be to reset mqh_partial_bytes and
> > mqh_length_word_complete in shm_mq_sendv in case of SHM_MQ_DETACHED.  These
> > are currently reset only in case of success.
>
> ...I think we should do this too, because it's intended that calling
> shm_mq_sendv again after it previously returned SHM_MQ_DETACHED should
> again return SHM_MQ_DETACHED, not fail an assertion.

  res = shm_mq_send_bytes(mqh, j, tmpbuf, nowait, &bytes_written);
  mqh->mqh_partial_bytes += bytes_written;
+ if (res == SHM_MQ_DETACHED)
+ {
+     mqh->mqh_partial_bytes = 0;
+     return res;
+ }

In the above change, you are first adding bytes_written and then doing
the SHM_MQ_DETACHED check, whereas in the other place the check is done
first, which seems to be right.  I think it doesn't matter either way,
but it is better to be consistent.  Also, isn't it better to set
mqh_length_word_complete to false, as the next time this API is called
it should first try to write the length into the buffer?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
On Thu, May 26, 2016 at 5:57 AM, Amit Kapila wrote:
> On Tue, May 24, 2016 at 6:36 PM, Andreas Seltenreich wrote:
>>
>> Each of the sent plans was collected when a worker dumped core due to
>> the failed assertion.  More core dumps than plans were actually
>> observed, since with this failed assertion, multiple workers usually
>> trip on and dump core simultaneously.
>>
>> The following query corresponds to plan2:
>>
>> --8<---------------cut here---------------start------------->8---
>> select
>>   pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0,
>>   subq_0.c3 as c1, subq_0.c1 as c2, 31 as c3, 18 as c4,
>>   (select unique1 from public.bprime limit 1 offset 9) as c5,
>>   subq_0.c2 as c6
>> from
>>   (select ref_0.tablename as c0, ref_0.inherited as c1,
>>           ref_0.histogram_bounds as c2, 100 as c3
>>    from
>>      pg_catalog.pg_stats as ref_0
>>    where 49 is not NULL limit 55) as subq_0
>> where true
>> limit 58;
>> --8<---------------cut here---------------end--------------->8---
>>
>
> I am able to reproduce the assertion (it occurs once in two to three times,
> but always at same place) you have reported upthread with the above query.
> It seems to me, issue here is that while workers are writing tuples in the
> tuple queue, the master backend has detached from the queues.  The reason
> why master backend has detached from tuple queues is because of Limit
> clause, basically after processing required tuples as specified by Limit
> clause, it calls shutdown of nodes in below part of code:

I can't reproduce this assertion failure on master.  I tried running
'make installcheck' and then running this query repeatedly in the
regression database with and without
parallel_setup_cost=parallel_tuple_cost=0, and got nowhere.  Does that
work for you, or do you have some other set of steps?

> I think the workers should stop processing tuples after the tuple queues got
> detached.  This will not only handle above situation gracefully, but will
> allow to speed up the queries where Limit clause is present on top of Gather
> node.
> Patch for the same is attached with mail (this was part of original
> parallel seq scan patch, but was not applied and the reason as far as I
> remember was we thought such an optimization might not be required for
> initial version).

This is very likely a good idea, but...

> Another approach to fix this issue could be to reset mqh_partial_bytes and
> mqh_length_word_complete in shm_mq_sendv in case of SHM_MQ_DETACHED.  These
> are currently reset only in case of success.

...I think we should do this too, because it's intended that calling
shm_mq_sendv again after it previously returned SHM_MQ_DETACHED should
again return SHM_MQ_DETACHED, not fail an assertion.

Can you see whether the attached patch fixes this for you?

(Status update for Noah: I will provide another update regarding this
issue no later than Monday COB, US time.  I assume that Amit will have
responded by then, and it should hopefully be clear what the next step
is at that point.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

dont-fail-mq-assert-v1.patch
Description: binary/octet-stream
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
On Thu, May 26, 2016 at 03:27:40PM +0530, Amit Kapila wrote:
> I think the workers should stop processing tuples after the tuple queues
> got detached.  This will not only handle above situation gracefully, but
> will allow to speed up the queries where Limit clause is present on top of
> Gather node.  Patch for the same is attached with mail (this was part of
> original parallel seq scan patch, but was not applied and the reason as far
> as I remember was we thought such an optimization might not be required for
> initial version).
>
> Another approach to fix this issue could be to reset mqh_partial_bytes and
> mqh_length_word_complete in shm_mq_sendv in case of SHM_MQ_DETACHED.
> These are currently reset only in case of success.

[Action required within 72 hours.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.
Robert, since you committed the patch believed to have created it, you
own this open item.  If some other commit is more relevant or if this
does not belong as a 9.6 open item, please let us know.  Otherwise,
please observe the policy on open item ownership[1] and send a status
update within 72 hours of this message.  Include a date for your
subsequent status update.  Testers may discover new open items at any
time, and I want to plan to get them all fixed well in advance of
shipping 9.6rc1.  Consequently, I will appreciate your efforts toward
speedy resolution.  Thanks.

[1] http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
On Tue, May 24, 2016 at 6:36 PM, Andreas Seltenreich wrote:
>
> Each of the sent plans was collected when a worker dumped core due to
> the failed assertion.  More core dumps than plans were actually
> observed, since with this failed assertion, multiple workers usually
> trip on and dump core simultaneously.
>
> The following query corresponds to plan2:
>
> --8<---------------cut here---------------start------------->8---
> select
>   pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0,
>   subq_0.c3 as c1, subq_0.c1 as c2, 31 as c3, 18 as c4,
>   (select unique1 from public.bprime limit 1 offset 9) as c5,
>   subq_0.c2 as c6
> from
>   (select ref_0.tablename as c0, ref_0.inherited as c1,
>           ref_0.histogram_bounds as c2, 100 as c3
>    from
>      pg_catalog.pg_stats as ref_0
>    where 49 is not NULL limit 55) as subq_0
> where true
> limit 58;
> --8<---------------cut here---------------end--------------->8---
>

I am able to reproduce the assertion (it occurs once in two to three
times, but always at the same place) you have reported upthread with the
above query.  It seems to me the issue here is that while the workers
are writing tuples into the tuple queue, the master backend has detached
from the queues.  The reason the master backend has detached from the
tuple queues is the Limit clause: after processing the number of tuples
specified by the Limit clause, it shuts down the nodes in the below part
of the code:

ExecutePlan()
{
..
    if (TupIsNull(slot))
    {
        /* Allow nodes to release or shut down resources. */
        (void) ExecShutdownNode(planstate);
        break;
    }
..
}

Now, it is quite possible that the worker has written part of its data,
after which the queue got detached.  The callers of shm_mq
(tqueueReceiveSlot/shm_mq_send) don't handle SHM_MQ_DETACHED, due to
which they keep sending more data (the next tuple), which leads to the
assertion in the code below:

shm_mq_sendv()
{
..
    /* Write the actual data bytes into the buffer. */
    Assert(mqh->mqh_partial_bytes <= nbytes);
..
}

I think the workers should stop processing tuples after the tuple
queues got detached.
This will not only handle the above situation gracefully, but will also
allow speeding up queries where a Limit clause is present on top of a
Gather node.  A patch for the same is attached to this mail (this was
part of the original parallel seq scan patch, but was not applied; the
reason, as far as I remember, was that we thought such an optimization
might not be required for the initial version).

Another approach to fix this issue could be to reset mqh_partial_bytes
and mqh_length_word_complete in shm_mq_sendv in case of SHM_MQ_DETACHED.
These are currently reset only in case of success.

I have added this issue to the list of PostgreSQL 9.6 open items @
https://wiki.postgresql.org/wiki/PostgreSQL_9.6_Open_Items

The steps to reproduce it on the regression database are:

1. We need to create enough rows in pg_statistic, so that a parallel
   plan can be selected.
2. Use the below procedure to ensure sufficient rows get created in
   pg_statistic:

CREATE OR REPLACE FUNCTION populate_pg_stat() RETURNS int AS
$BODY$
DECLARE
   count int;
BEGIN
   FOR count IN 1 .. 5 LOOP
      Execute 'Create table t' || count || '(c1 int)';
      Execute 'insert into t' || count || ' values (generate_series(1,10))';
      Execute 'Analyze t' || count;
   End Loop;
   Return 1;
END
$BODY$
LANGUAGE plpgsql;

3. set parallel_tuple_cost = 0
   set parallel_setup_cost = 0

   and execute the query:

Explain Analyze
select
  pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0,
  subq_0.c3 as c1, subq_0.c1 as c2, 31 as c3, 18 as c4,
  (select unique1 from public.bprime limit 1 offset 9) as c5,
  subq_0.c2 as c6
from
  (select ref_0.tablename as c0, ref_0.inherited as c1,
          ref_0.histogram_bounds as c2, 100 as c3
   from
     pg_catalog.pg_stats as ref_0
   where 49 is not NULL limit 55) as subq_0
where true
limit 58;

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

stop_processing_tuples_detached_queue_v1.patch
Description: Binary data
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
Amit Kapila writes:

> On Mon, May 23, 2016 at 4:48 PM, Andreas Seltenreich wrote:
>> plan6 corresponds to this query:
>>
> Are you sure that the core dumps you are seeing are due to plan6?

Each of the plans sent was harvested from a controlling process when the
above assertion failed in its workers.  I do not know whether the plans
themselves really are at fault, as most of the collected plans look ok
to me.  The backtrace in the controlling process always looks like the
one reported.

(Except when the coredumping took so long as to trigger a
statement_timeout in the still-running master.  There are no
plans/queries available in this case, as the state is no longer
available in an aborted transaction.)

> I have tried to generate a parallel plan for above query and it seems to me
> that after applying the patches
> (avoid_restricted_clause_below_gather_v1.patch and
> prohibit_parallel_clause_below_rel_v1.patch), the plan it generates
> doesn't have subplan below gather node [1].
> Without patch avoid_restricted_clause_below_gather_v1.patch, it will allow
> to push subplan below gather node, so I think either there is some other
> plan (query) due to which you are seeing core dumps or the above two
> patches haven't been applied before testing.

According to my notes, the patches were applied in the instance that
crashed.  Given that I no longer see the other variants of the crashes
the patches fix, and that the probability of this failed assertion per
random query is reduced by about a factor of 20 compared to testing with
the patches not applied, I'm pretty certain that this is not a
bookkeeping error on my part.

> Is it possible that core dump is due to plan2 or some other similar
> plan (I am not sure at this stage about the cause of the problem you
> are seeing, but if due to some reason PARAM_EXEC params are pushed
> below gather, then such a plan might not work)?  If you think plan
> other than plan6 can cause such a problem, then can you share the
> query for plan2?

Each of the sent plans was collected when a worker dumped core due to
the failed assertion.  More core dumps than plans were actually
observed, since with this failed assertion, multiple workers usually
trip on and dump core simultaneously.

The following query corresponds to plan2:

--8<---------------cut here---------------start------------->8---
select
  pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0,
  subq_0.c3 as c1, subq_0.c1 as c2, 31 as c3, 18 as c4,
  (select unique1 from public.bprime limit 1 offset 9) as c5,
  subq_0.c2 as c6
from
  (select ref_0.tablename as c0, ref_0.inherited as c1,
          ref_0.histogram_bounds as c2, 100 as c3
   from
     pg_catalog.pg_stats as ref_0
   where 49 is not NULL limit 55) as subq_0
where true
limit 58;
--8<---------------cut here---------------end--------------->8---

regards,
Andreas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
On Mon, May 23, 2016 at 4:48 PM, Andreas Seltenreich wrote:
>
> Amit Kapila writes:
>
> > Earlier problems were due to the reason that some unsafe/restricted
> > expressions were pushed below Gather node as part of target list whereas in
> > the plan6, it seems some unsafe node is pushed below Gather node.  It will
> > be helpful if you can share the offending query?
>
> plan6 corresponds to this query:
>
> select
>   pg_catalog.anyarray_out(
>     cast((select most_common_vals from pg_catalog.pg_stats limit 1 offset 41)
>          as anyarray)) as c0
> from
>   public.quad_point_tbl as ref_0 where ref_0.p ~= ref_0.p;
>

Are you sure that the core dumps you are seeing are due to plan6?  I
have tried to generate a parallel plan for the above query, and it seems
to me that after applying the patches
(avoid_restricted_clause_below_gather_v1.patch and
prohibit_parallel_clause_below_rel_v1.patch), the plan it generates
doesn't have a subplan below the gather node [1].  Without patch
avoid_restricted_clause_below_gather_v1.patch, it will allow pushing a
subplan below the gather node, so I think either there is some other
plan (query) due to which you are seeing core dumps, or the above two
patches haven't been applied before testing.

Is it possible that the core dump is due to plan2 or some other similar
plan (I am not sure at this stage about the cause of the problem you are
seeing, but if for some reason PARAM_EXEC params are pushed below
gather, then such a plan might not work)?  If you think a plan other
than plan6 can cause such a problem, then can you share the query for
plan2?

[1]
                                 QUERY PLAN
 Gather  (cost=543.59..5041.17 rows=1761 width=32)
   Output: anyarray_out($0)
   Workers Planned: 1
   InitPlan 1 (returns $0)
     ->  Limit  (cost=491.56..543.59 rows=1 width=32)
           Output: pg_stats.most_common_vals
           ->  Subquery Scan on pg_stats  (cost=127.30..491.56 rows=7 width=32)
                 Output: pg_stats.most_common_vals
                 ->  Hash Join  (cost=127.30..491.49 rows=7 width=401)
                       Output: NULL::name, NULL::name, NULL::name, NULL::boolean, NULL::real, NULL::integer, NULL::real, CASE WHEN (s.stakind1 = 1) THEN s.stavalues1 WHEN (s.stakind2 = 1) THEN s.stavalues2 WHEN (s.stakind3 = 1) THEN s.stavalues3 WHEN (s.stakind4 = 1) THEN s.stavalues4 WHEN (s.stakind5 = 1) THEN s.stavalues5 ELSE NULL::anyarray END, NULL::real[], NULL::anyarray, NULL::real, NULL::anyarray, NULL::real[], NULL::real[]
                       Hash Cond: ((a.attrelid = c.oid) AND (a.attnum = s.staattnum))
                       Join Filter: has_column_privilege(c.oid, a.attnum, 'select'::text)
                       ->  Seq Scan on pg_catalog.pg_attribute a  (cost=0.00..318.11 rows=6097 width=6)
                             Output: a.attrelid, a.attname, a.atttypid, a.attstattarget, a.attlen, a.attnum, a.attndims, a.attcacheoff, a.atttypmod, a.attbyval, a.attstorage, a.attalign, a.attnotnull, a.atthasdef, a.attisdropped, a.attislocal, a.attinhcount, a.attcollation, a.attacl, a.attoptions, a.attfdwoptions
                             Filter: (NOT a.attisdropped)
                       ->  Hash  (cost=117.59..117.59 rows=647 width=180)
                             Output: s.stakind1, s.stavalues1, s.stakind2, s.stavalues2, s.stakind3, s.stavalues3, s.stakind4, s.stavalues4, s.stakind5, s.stavalues5, s.starelid, s.staattnum, c.oid
                             ->  Hash Join  (cost=57.21..117.59 rows=647 width=180)
                                   Output: s.stakind1, s.stavalues1, s.stakind2, s.stavalues2, s.stakind3, s.stavalues3, s.stakind4, s.stavalues4, s.stakind5, s.stavalues5, s.starelid, s.staattnum, c.oid
                                   Hash Cond: (s.starelid = c.oid)
                                   ->  Seq Scan on pg_catalog.pg_statistic s  (cost=0.00..51.48 rows=648 width=176)
                                         Output: s.starelid, s.staattnum, s.stainherit, s.stanullfrac, s.stawidth, s.stadistinct, s.stakind1, s.stakind2, s.stakind3, s.stakind4, s.stakind5, s.staop1, s.staop2, s.staop3, s.staop4, s.staop5, s.stanumbers1, s.stanumbers2, s.stanumbers3, s.stanumbers4, s.stanumbers5, s.stavalues1, s.stavalues2, s.stavalues3, s.stavalues4, s.stavalues5
                                   ->  Hash  (cost=45.61..45.61 rows=928 width=8)
                                         Output: c.oid, c.relnamespace
                                         ->  Seq Scan on pg_catalog.pg_class
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
Amit Kapila writes:

> Earlier problems were due to the reason that some unsafe/restricted
> expressions were pushed below Gather node as part of target list whereas in
> the plan6, it seems some unsafe node is pushed below Gather node.  It will
> be helpful if you can share the offending query?

plan6 corresponds to this query:

select
  pg_catalog.anyarray_out(
    cast((select most_common_vals from pg_catalog.pg_stats limit 1 offset 41)
         as anyarray)) as c0
from
  public.quad_point_tbl as ref_0 where ref_0.p ~= ref_0.p;

regards,
Andreas
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
On Mon, May 23, 2016 at 2:19 PM, Andreas Seltenreich wrote:
>
> I wrote:
> >> There's another class of parallel worker core dumps when testing master
> >> with sqlsmith.  In these cases, the following assertion fails for all
> >> workers simultaneously:
> >>
> >> TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File: "shm_mq.c", Line: 386)
> >
> > I no longer observe these after applying these two patches by Amit
> > Kapila
>
> I spoke too soon: These still occur with the patches applied, but with
> much lower probability (one core dump per 20e6 random queries instead
> of 1e6).
>
> Most of the collected plans look inconspicuous to me now, except for one
> that again had a subplan below a gather node (plan6).
>

This problem looks different from the previous problems you have
reported.  Earlier problems were due to the reason that some
unsafe/restricted expressions were pushed below the Gather node as part
of the target list, whereas in plan6, it seems some unsafe node is
pushed below the Gather node.  It will be helpful if you can share the
offending query.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
I wrote:
>> There's another class of parallel worker core dumps when testing master
>> with sqlsmith.  In these cases, the following assertion fails for all
>> workers simultaneously:
>>
>> TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File:
>> "shm_mq.c", Line: 386)
>
> I no longer observe these after applying these two patches by Amit
> Kapila

I spoke too soon: These still occur with the patches applied, but with
much lower probability (one core dump per 20e6 random queries instead
of 1e6).

Most of the collected plans look inconspicuous to me now, except for one
that again had a subplan below a gather node (plan6).

Tarball of all collected plans attached.

regards,
Andreas

plans.tar.gz
Description: application/gzip
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
I wrote:
> There's another class of parallel worker core dumps when testing master
> with sqlsmith.  In these cases, the following assertion fails for all
> workers simultaneously:
>
> TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File:
> "shm_mq.c", Line: 386)

I no longer observe these after applying these two patches by Amit
Kapila:

avoid_restricted_clause_below_gather_v1.patch
Message-ID:
[HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
There's another class of parallel worker core dumps when testing master
with sqlsmith.  In these cases, the following assertion fails for all
workers simultaneously:

TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File: "shm_mq.c", Line: 386)

The backtrace of the controlling process is always in
ExecShutdownGatherWorkers.  The queries always work fine on re-running,
so I guess there is some race condition on worker shutdown?  Backtraces
below.

regards
andreas

Core was generated by `postgres: bgworker: parallel worker for PID 30525   '.
Program terminated with signal SIGABRT, Aborted.
#0  0x7f5a3df91067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x7f5a3df91067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f5a3df92448 in __GI_abort () at abort.c:89
#2  0x007eabe1 in ExceptionalCondition (conditionName=conditionName@entry=0x984e10 "!(mqh->mqh_partial_bytes <= nbytes)", errorType=errorType@entry=0x82a75d "FailedAssertion", fileName=fileName@entry=0x984b8c "shm_mq.c", lineNumber=lineNumber@entry=386) at assert.c:54
#3  0x006d8042 in shm_mq_sendv (mqh=0x25f17b8, iov=iov@entry=0x7ffc6352af00, iovcnt=iovcnt@entry=1, nowait=) at shm_mq.c:386
#4  0x006d807d in shm_mq_send (mqh=, nbytes=, data=, nowait=) at shm_mq.c:327
#5  0x005d96b9 in ExecutePlan (dest=0x25f1850, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, use_parallel_mode=, planstate=0x2612da8, estate=0x2612658) at execMain.c:1596
#6  standard_ExecutorRun (queryDesc=0x261a660, direction=, count=0) at execMain.c:338
#7  0x005dc7cf in ParallelQueryMain (seg=, toc=0x7f5a3ea6c000) at execParallel.c:735
#8  0x004e617b in ParallelWorkerMain (main_arg=) at parallel.c:1035
#9  0x00683862 in StartBackgroundWorker () at bgworker.c:726
#10 0x0068e9a2 in do_start_bgworker (rw=0x2590760) at postmaster.c:5531
#11 maybe_start_bgworker () at postmaster.c:5706
#12 0x0046cbba in ServerLoop () at postmaster.c:1762
#13 0x0069081e in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x256d580) at postmaster.c:1298
#14 0x0046d80d in main (argc=4, argv=0x256d580) at main.c:228
(gdb) attach 30525
0x7f5a3e044e33 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81
81      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x7f5a3e044e33 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1  0x006d1b4e in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffc6352aec0, cur_timeout=-1, set=0x44251c0) at latch.c:981
#2  WaitEventSetWait (set=set@entry=0x44251c0, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffc6352aec0, nevents=nevents@entry=1) at latch.c:935
#3  0x006d1f96 in WaitLatchOrSocket (latch=0x7f5a3d898494, wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=timeout@entry=-1) at latch.c:347
#4  0x006d205d in WaitLatch (latch=, wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=-1) at latch.c:302
#5  0x004e6d64 in WaitForParallelWorkersToFinish (pcxt=0x442d4e8) at parallel.c:537
#6  0x005dcf84 in ExecParallelFinish (pei=0x441cab8) at execParallel.c:541
#7  0x005eeead in ExecShutdownGatherWorkers (node=node@entry=0x3e3a070) at nodeGather.c:416
#8  0x005ef389 in ExecShutdownGather (node=0x3e3a070) at nodeGather.c:430
#9  0x005dd03d in ExecShutdownNode (node=0x3e3a070) at execProcnode.c:807
#10 0x0061ad73 in planstate_tree_walker (planstate=0x3e361a8, walker=0x5dd010 , context=0x0) at nodeFuncs.c:3442
#11 0x0061ad73 in planstate_tree_walker (planstate=0xf323c30, walker=0x5dd010 , context=0x0) at nodeFuncs.c:3442
#12 0x0061ad73 in planstate_tree_walker (planstate=0xf323960, walker=0x5dd010 , context=0x0) at nodeFuncs.c:3442
#13 0x005d96da in ExecutePlan (dest=0xb826868, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, use_parallel_mode=, planstate=0xf323960, estate=0xf322b28) at execMain.c:1576
#14 standard_ExecutorRun (queryDesc=0xddca888, direction=, count=0) at execMain.c:338
#15 0x006f6e88 in PortalRunSelect (portal=portal@entry=0x258ccc8, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0xb826868) at pquery.c:946
#16 0x006f83ae in PortalRun (portal=0x258ccc8, count=9223372036854775807, isTopLevel=, dest=0xb826868, altdest=0xb826868, completionTag=0x7ffc6352b3d0 "") at pquery.c:787
#17 0x006f5c63 in exec_simple_query (query_string=) at postgres.c:1094
#18 PostgresMain (argc=39374024, argv=0x25ed130, dbname=0x256e480 "regression", username=0x25ed308 "0\321^\002") at postgres.c:4059
#19 0x0046c8b2 in BackendRun (port=0x25935d0) at postmaster.c:4258
#20