Re: [HACKERS] Must be owner to truncate?
Bruce Momjian schrob: Stephen Frost wrote: -- Start of PGP signed section. * Jim C. Nasby ([EMAIL PROTECTED]) wrote: On Thu, Jul 07, 2005 at 01:48:59PM -0400, Tom Lane wrote: I don't really agree with the viewpoint that truncate is just a quick DELETE, and so I do not agree that DELETE permissions should be enough to let you do a TRUNCATE. What about adding a truncate permission? I would find it useful, as it seems would others. That would be acceptable for me as well. I'd prefer it just work off delete, but as long as I can grant truncate to someone w/o giving them ownership rights on the table I'd be happy. Added to TODO: * Add TRUNCATE permission Currently only the owner can TRUNCATE a table because triggers are not called, and the table is locked in exclusive mode. Is anyone working on this yet? I looked at the code involved, and it seems there are just a couple of lines needed, some regression test and documentation updates, and most importantly, tab-completion updates. However, a question arose quickly: According to the standard, revoking INSERT, UPDATE and DELETE after GRANT ALL PRIVILEGES would leave the relation read-only, but with the TRUNCATE privilege lying around, this would no longer be true for PostgreSQL. Would this open a security hole or is it okay as far as extensions to the standard go? regards, Andreas -- ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] localization problem (and solution)
Tom Lane writes: I looked into this a bit more, and it seems the issue is that libperl will do setlocale(LC_ALL, ); the first time any locale-related Perl function is invoked. To defend ourselves against that, we'd have to set more environment variables than just LC_COLLATE and LC_CTYPE. What I'm thinking about is: * during startup, putenv(LC_ALL=C) and unsetenv any other LC_ variables that may be lurking, except LC_MESSAGES. * copy LC_COLLATE and LC_CTYPE into the environment when we get them from pg_control, as Manuel suggested. I'm afraid having LC_ALL in the environment at this time would still do the wrong thing on setlocale(LC_ALL, ); since a LC_ALL environment variable overrides the other categories. Maybe setting LANG instead would be a better choice? regards, Andreas -- ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] psql readline win32
Magnus Hagander writes: 2) Should we ship a file of standard bindings. We're not going to get it complete, but we could get some of the most common ones in europe at least (in sweden, this would for example include [EMAIL PROTECTED]|). Which would help people a lot. Yes we should, at least for Windows - put it in share along with other samples, maybe. That was my second question on that - where to put it, and how to find it. IMHO an elegant solution would be to expose readline's macro/bind mechanism with a psql command, e.g. like bash's bind does. This way one could put the macros into the system-wide psqlrc. regards, Andreas ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PG Extensions: Must be statically linked?
Mark Dilger writes: I have had the same concern, though never any hard evidence of a problem. If the C++ functions are wrapped with extern C, and all exceptions caught (perhaps converted into error numbers which are then returned from the wrapper functions to the plain-C calling functions), are there any remaining known problems? I have often considered making a C++ allocator which wrapped palloc and pfree, so that I could then use the STL within the backend... Has anyone tried this? I did some experiments on a C++ language handler last year (including an allocator and a class loader to spare the extern Cs/name mangling): http://archives.postgresql.org/pgsql-general/2005-10/msg01570.php news:[EMAIL PROTECTED] The remaining issue is the impedance mismatch between longjmp()ing and exceptions. regards, Andreas -- ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PL/pgSQL 'i = i + 1' Syntax
Mark Dilger schrob: Tom Lane wrote: No it isn't. The plpgsql scanner treats := and = as *the same token*. They can be interchanged freely. This has nothing to do with the case of modifying a loop variable in particular. I disagree. If the scanner treated them the same, then if i := 1 then ... would work, but it doesn't. The := is rejected in a conditional. Try the following code if you don't believe me: You're confusing the PL/pgSQL scanner with the SQL scanner. Expressions in PL/pgSQL are handed down to the SQL parser. regards, andreas ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] GIN stuck in loop during PITR
I'm just experimenting a bit with GIN, and it is occasionally getting stuck looping in findParents() during WAL replay. The attached patch seems to fix it. I also had to set ptr-off as advertised in the comment above the function to avoid triggering assertions. GIN isn't fully transparent to me yet, so it is quite likely that I am missing something... regards, andreas Index: ginbtree.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/gin/ginbtree.c,v retrieving revision 1.1 diff -c -r1.1 ginbtree.c *** ginbtree.c 2 May 2006 11:28:54 - 1.1 --- ginbtree.c 25 May 2006 18:12:13 - *** *** 202,208 for(;;) { buffer = ReadBuffer(btree-index, blkno); LockBuffer(buffer, GIN_EXCLUSIVE); ! page = BufferGetPage(root-buffer); if ( GinPageIsLeaf(page) ) elog(ERROR, Lost path); --- 202,208 for(;;) { buffer = ReadBuffer(btree-index, blkno); LockBuffer(buffer, GIN_EXCLUSIVE); ! page = BufferGetPage(buffer); if ( GinPageIsLeaf(page) ) elog(ERROR, Lost path); *** *** 224,229 --- 224,230 ptr-blkno = blkno; ptr-buffer = buffer; ptr-parent = root; /* it's may be wrong, but in next call we will correct */ + ptr-off = offset; stack-parent = ptr; return; } ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] GIN stuck in loop during PITR
Andreas Seltenreich schrob: Teodor Sigaev schrob: Thanks a lot, applied. Can you describe test suite? It may be useful for test more... Here's a shell script that triggers the bug when I revert the patch. Just tried the script on HEAD, and it was triggering an assertion. I guess it is because we are still returning InvalidOffsetNumber in the trivial case (looks like a typo to me). I've attached a patch. regards, andreas Index: ginbtree.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/gin/ginbtree.c,v retrieving revision 1.2 diff -c -r1.2 ginbtree.c *** ginbtree.c 26 May 2006 08:01:17 - 1.2 --- ginbtree.c 26 May 2006 20:09:45 - *** *** 189,195 Assert( !GinPageIsLeaf(page) ); /* check trivial case */ ! if ( (root-off != btree-findChildPtr(btree, page, stack-blkno, InvalidOffsetNumber)) != InvalidBuffer ) { stack-parent = root; return; } --- 189,195 Assert( !GinPageIsLeaf(page) ); /* check trivial case */ ! if ( (root-off = btree-findChildPtr(btree, page, stack-blkno, InvalidOffsetNumber)) != InvalidOffsetNumber ) { stack-parent = root; return; } ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] backup + restore fails
Holger Schoenen writes: ERROR: invalid byte sequence for encoding UTF8: 0xe46973 Command was: -- [...] -- Started on 2006-09-15 14:56:51 Westeuropäische Normalzeit The same problem was recently reported on the pgsql-de-allgemein list. Would just avoiding %Z in Win32's strftime be an acceptable solution? elog.c is already doing this, however because of the length of the zone names, not the localization problem. The attached patch is completely untested because I don't have access to a win32 box. regards, andreas Index: src/bin/pg_dump/pg_backup_archiver.c === RCS file: /projects/cvsroot/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v retrieving revision 1.137 diff -c -r1.137 pg_backup_archiver.c *** src/bin/pg_dump/pg_backup_archiver.c14 Oct 2006 23:07:22 - 1.137 --- src/bin/pg_dump/pg_backup_archiver.c20 Oct 2006 18:59:11 - *** *** 2780,2785 { charbuf[256]; ! if (strftime(buf, 256, %Y-%m-%d %H:%M:%S %Z, localtime(tim)) != 0) ahprintf(AH, -- %s %s\n\n, msg, buf); } --- 2780,2793 { charbuf[256]; ! if (strftime(buf, 256, !/* Win32 timezone names are long and localized and ! * can interfere with utf-8 dumps */ ! #ifndef WIN32 !%Y-%m-%d %H:%M:%S %Z, ! #else !%Y-%m-%d %H:%M:%S, ! #endif !localtime(tim)) != 0) ahprintf(AH, -- %s %s\n\n, msg, buf); } Index: src/bin/pg_dump/pg_dumpall.c === RCS file: /projects/cvsroot/pgsql/src/bin/pg_dump/pg_dumpall.c,v retrieving revision 1.84 diff -c -r1.84 pg_dumpall.c *** src/bin/pg_dump/pg_dumpall.c7 Oct 2006 20:59:05 - 1.84 --- src/bin/pg_dump/pg_dumpall.c20 Oct 2006 18:59:12 - *** *** 1320,1325 charbuf[256]; time_t now = time(NULL); ! if (strftime(buf, 256, %Y-%m-%d %H:%M:%S %Z, localtime(now)) != 0) printf(-- %s %s\n\n, msg, buf); } --- 1320,1333 charbuf[256]; time_t now = time(NULL); ! if (strftime(buf, 256, !/* Win32 timezone names are long and localized and ! * can interfere with utf-8 dumps */ ! #ifndef WIN32 !%Y-%m-%d %H:%M:%S %Z, ! #else !%Y-%m-%d %H:%M:%S, ! #endif !localtime(now)) != 0) printf(-- %s %s\n\n, msg, buf); } ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Writing triggers in C++
Florian G. Pflug writes: Maybe we should create some wiki page or pgfoundry project that collects all glue code, tipps and tricks that people invented to glue C++ into the postgres backend. If it can be made to work, sure; in techdocs. I was thinking that two pairs of macros, PG_BEGIN_CPP, PG_END_CPP and PG_CPP_BEGIN_BACKEND, PG_CPP_END_BACKEND should be able to take care of the exception handling issues. You'd need to wrap any code-block that calls postgres functions that might do an elog(ERROR) inside PG_CPP_BEGIN_BACKEND, PG_CPP_END_BACKEND. Vice versa, any block of c++ code that is called from the backend would need to start with PG_BEGIN_CPP, and end with PG_END_CPP. I've made positive experiences with such a setup, although I've spared the PG_BEGIN_CPP/PG_END_CPP by doing the exception conversion in a C++ language handler that instantiates functors using the portable class loading technique described in this paper: http://www.s11n.net/papers/classloading_cpp.html I'd be glad to help out on a pgfoundry project to make C++ a better citizen for extending postgres. regards, andreas ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PATCH] Add error handling to byteaout.
Alvaro Herrera writes: Why not just use an unsigned 64 bit variable? Also, perhaps palloc_huge() avoids the whole problem in the first place ... Ja, that crossed my mind too, but the current limit is already far beyond anything that is usually configured for per-backend memory use, so I dismissed it. though it might only move the issue around, if you cannot ship the longer-than-1GB resulting escaped value. For example, when client and server encodings do not match: ,[ mbutils.c ] | result = palloc(len * MAX_CONVERSION_GROWTH + 1); ` This results in the fun fact that the maximum size for bytea values that are guaranteed to be pg_dumpable regardless of encoding/escaping settings is lower than 64MB. One thing that would also mitigate the problem is supporting a more efficient output format. For example, there's already means for base64-encoding in the backend: self=# select c, length(encode(mkbytea(28),c)) from (values ('hex'),('base64')) as v(c); c| length +--- hex| 536870912 base64 | 362623337 (2 rows) Maybe it is reasonable to make it available as another option for use in bytea_output? regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add error handling to byteaout.
Piotr Stefaniak writes: s/int/Size/ doesn't fix anything on 32-bit machines. Postgres requires twos-complement representation, so that the assumption that signed integer types wrap around on overflow can be safely made. Thanks for the clarification! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Add error handling to byteaout.
Hi, when dealing with bytea values that are below the 1GB limit, but too large to be escaped, the error messages emitted are not very helpful for users. Especially if they appear in an unrelated context such as during pg_dump. I've attached a patch that adds ereport()ing that would have prevented some confusion. Example: ,[ master ] | ase=# select mkbytea(29); | ERROR: invalid memory alloc request size 1073741827 | ase=# set bytea_output to escape; | ase=# select mkbytea(29); | ERROR: invalid memory alloc request size 18446744071562067969 ` The scary one is due to an integer overflow the attached patch also fixes. I don't see any security implications though as it's only the sign bit that is affected. ,[ with patch applied ] | ase=# set bytea_output to 'escape'; | ase=# select mkbytea(29); | ERROR: escaped bytea value would be too big | DETAIL: Value would require 2147483649 bytes. | HINT: Use a different bytea_output setting or binary methods such as COPY BINARY. ` regards, Andreas create function mkbytea(power int, part bytea default '\x00') returns bytea as $$select case when power0 then mkbytea(power-1,part||part) else part end;$$ language sql; From f62a101a690fc9251c4c2de9c87323cedd0e9a54 Mon Sep 17 00:00:00 2001 From: Andreas Seltenreich andreas.seltenre...@credativ.de Date: Mon, 1 Jun 2015 16:17:21 +0200 Subject: [PATCH] Add error handling to byteaout. Emit a comprehensible error message when escaping fails due to MaxAllocSize instead of waiting for palloc to fail. Also use size_t for size computations to prevent integer overflow in BYTEA_OUTPUT_ESCAPE branch. --- src/backend/utils/adt/varlena.c | 28 +--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index 779729d..ec2594e 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -347,8 +347,21 @@ byteaout(PG_FUNCTION_ARGS) if (bytea_output == BYTEA_OUTPUT_HEX) { + Size len; /* Print hex format */ - rp = result = palloc(VARSIZE_ANY_EXHDR(vlena) * 2 + 2 + 1); + len = (Size)VARSIZE_ANY_EXHDR(vlena) * 2 + 2 + 1; + + if (!AllocSizeIsValid(len)) { + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), +errmsg(escaped bytea value would be too big), +errdetail(Value would require %zu bytes., + len), +errhint(Use a different bytea_output setting or + binary methods such as COPY BINARY.))); + } + + rp = result = palloc(len); *rp++ = '\\'; *rp++ = 'x'; rp += hex_encode(VARDATA_ANY(vlena), VARSIZE_ANY_EXHDR(vlena), rp); @@ -357,8 +370,8 @@ byteaout(PG_FUNCTION_ARGS) { /* Print traditional escaped format */ char *vp; - int len; - int i; + Sizelen; + Sizei; len = 1;/* empty string has 1 char */ vp = VARDATA_ANY(vlena); @@ -371,6 +384,15 @@ byteaout(PG_FUNCTION_ARGS) else len++; } + if (!AllocSizeIsValid(len)) { + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), +errmsg(escaped bytea value would be too big), +errdetail(Value would require %zu bytes., + len), +errhint(Use a different bytea_output setting or + binary methods such as COPY BINARY.))); + } rp = result = (char *) palloc(len); vp = VARDATA_ANY(vlena); for (i = VARSIZE_ANY_EXHDR(vlena); i != 0; i--, vp++) -- 2.1.4 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add error handling to byteaout.
Tom Lane t...@sss.pgh.pa.us writes: Andreas Seltenreich andreas.seltenre...@credativ.de writes: The scary one is due to an integer overflow the attached patch also fixes. s/int/Size/ doesn't fix anything on 32-bit machines. Well, it changes the signedness of the computation on 32-bit, and in combination with the fact that len is always smaller than 2^32, but may exceed 2^31-1, the change avoids the dependency on the undefined behavior of signed integer overflows in C on 32-bit as well. But I admit that this might be a rather academic point... Anyway, my motivation for the patch was the improved error reporting. Is the drive-by type change a problem here? regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Failed assertion in joinrels.c
Hi, sqlsmith triggered the following assertion in master (c188204). TRAP: FailedAssertion(!(!bms_overlap(joinrelids, sjinfo-min_lefthand)), File: joinrels.c, Line: 500) As usual, the query is against the regression database. It is rather unwieldy… I wonder if I should stop working on new grammar rules and instead work on some post-processing that prunes the AST as much as possible while maintaining the failure mode. regards, andreas select subq_647409.c0 as c0, subq_647409.c0 as c1 from public.customer as rel_4116461 left join public.clstr_tst_s as rel_4116555 left join information_schema.columns as rel_4116556 on (rel_4116555.rf_a = rel_4116556.ordinal_position ) right join pg_catalog.pg_roles as rel_4116557 on (rel_4116556.maximum_cardinality = rel_4116557.rolconnlimit ) on (rel_4116461.passwd = rel_4116557.rolpassword ) left join (select subq_647410.c8 as c0 from public.char_tbl as rel_4116611, lateral (select rel_4116612.name as c0, rel_4116612.comment as c1, rel_4116612.nslots as c2, rel_4116612.comment as c3, rel_4116612.nslots as c4, rel_4116612.nslots as c5, rel_4116612.comment as c6, rel_4116612.comment as c7, rel_4116612.nslots as c8, rel_4116612.nslots as c9, rel_4116612.nslots as c10 from public.hub as rel_4116612 where rel_4116612.comment ~=~ rel_4116612.comment fetch first 116 rows only) as subq_647410 where (subq_647410.c7 !~~* subq_647410.c7) or ((subq_647410.c3 = subq_647410.c3) and ((subq_647410.c7 ~* subq_647410.c6) and (subq_647410.c7 @@ subq_647410.c7))) fetch first 152 rows only) as subq_647409 inner join public.int4_tbl as rel_4116661 inner join public.shoe as rel_4116662 on (rel_4116661.f1 = rel_4116662.sh_avail ) inner join public.rtest_vview3 as rel_4116663 on (rel_4116661.f1 = rel_4116663.a ) on (subq_647409.c0 = rel_4116662.sh_avail ) on (rel_4116555.b = rel_4116661.f1 ) where ((rel_4116557.rolvaliduntil is NULL) or (rel_4116663.b !~ rel_4116461.name)) or (rel_4116661.f1 is not NULL) fetch first 80 rows only; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] ERROR: failed to build any %d-way joins
Hi, there's a 1/1e6 chance that a sqlsmith query on the regression db of master (c124cef) fails with ERROR: failed to build any {4..8}-way joins They all appear to work fine on REL9_5_STABLE. sample query: select 1 from information_schema.collations as rel_60113935 inner join information_schema.sql_sizing_profiles as rel_60113936 on (rel_60113935.pad_attribute = rel_60113936.sizing_name ) inner join information_schema.foreign_tables as rel_60113937 on (rel_60113936.profile_id = rel_60113937.foreign_table_catalog ) right join information_schema.constraint_table_usage as rel_60113938 on (rel_60113935.collation_schema = rel_60113938.table_catalog ) left join public.btree_tall_tbl as rel_60113939 on (rel_60113936.required_value = rel_60113939.id ) where rel_60113938.table_name is not NULL; regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] ERROR: failed to build any %d-way joins
I wrote: They all appear to work fine on REL9_5_STABLE. oops, that branch was slightly stale. Updating it with the latest planner changes (8dccf03..0853388), the queries fail there as well. regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] subplan variable reference / unassigned NestLoopParams
Tom Lane writes: Andreas Seltenreich seltenre...@gmx.de writes: Tom Lane writes: Well, I certainly think all of these represent bugs: 3 | ERROR: plan should not reference subplan's variable 2 | ERROR: failed to assign all NestLoopParams to plan nodes These appear to be related. The following query produces the former, but if you replace the very last reference of provider with the literal 'bar', it raises the latter error. Fixed that, thanks for the test case! I haven't seen the former since your commit, but there recently were some instances of the latter. The attached queries all throw the error against master at 8752bbb. regards, Andreas -- ERROR: failed to assign all NestLoopParams to plan nodes select subq_19608926.c0 as c0, rel_124211605.objname as c1 from (select rel_124211602.numeric_precision as c0 from information_schema.sequences as rel_124211602 where EXISTS ( select rel_124211603.unique_constraint_name as c0 from information_schema.referential_constraints as rel_124211603 where rel_124211603.unique_constraint_schema is not NULL)) as subq_19608926 left join public.tab1 as rel_124211604 on (subq_19608926.c0 = rel_124211604.a ) right join pg_catalog.pg_seclabels as rel_124211605 right join public.phone as rel_124211606 on (rel_124211605.objtype = rel_124211606.comment ) on (rel_124211604.b = rel_124211605.objtype ) inner join public.bt_i4_heap as rel_124211729 inner join public.bt_i4_heap as rel_124211730 on (rel_124211729.random = rel_124211730.seqno ) on (subq_19608926.c0 = rel_124211730.seqno ) where rel_124211606.comment = rel_124211604.b; -- ERROR: failed to assign all NestLoopParams to plan nodes select subq_53656269.c0 as c0 from (select rel_339945676.id3c as c0 from public.rule_and_refint_t3 as rel_339945676 where rel_339945676.data !~~ rel_339945676.data) as subq_53656269 inner join public.dropcolumn as rel_339945677 on (subq_53656269.c0 = rel_339945677.b ) inner join public.bt_name_heap as rel_339945678 left join public.rtest_order2 as rel_339945705 inner join information_schema.sequences as rel_339945706 on (rel_339945705.a = rel_339945706.numeric_precision ) inner join public.num_result as rel_339945707 on (rel_339945705.b = rel_339945707.id1 ) on (rel_339945678.random = rel_339945706.numeric_precision ) on (rel_339945677.b = rel_339945706.numeric_precision ) where rel_339945678.seqno ~* rel_339945705.c fetch first 45 rows only; -- ERROR: failed to assign all NestLoopParams to plan nodes select rel_273437910.name as c0, rel_273437908.sequence_catalog as c1, rel_273437865.b as c2, rel_273437910.location as c3, rel_273437864.id2c as c4, rel_273437908.start_value as c5 from public.rule_and_refint_t2 as rel_273437864 inner join public.ruletest_tbl as rel_273437865 on (rel_273437864.id2c = rel_273437865.a ) inner join information_schema.sequences as rel_273437908 inner join public.rules_log as rel_273437909 on (rel_273437908.numeric_precision_radix = rel_273437909.f1 ) right join public.emp as rel_273437910 on (rel_273437909.tag = rel_273437910.name ) on (rel_273437865.a = rel_273437908.numeric_precision ) where rel_273437909.tag !~* rel_273437909.tag fetch first 110 rows only; -- ERROR: failed to assign all NestLoopParams to plan nodes select rel_156464410.minimum_value as c0 from public.rule_and_refint_t1 as rel_156464330 inner join public.rules_src as rel_156464331 on (rel_156464330.id1b = rel_156464331.f1 ) inner join public.main_table as rel_156464401 inner join pg_catalog.pg_file_settings as rel_156464402 left join public.person as rel_156464409 inner join information_schema.sequences as rel_156464410 on (rel_156464409.age = rel_156464410.numeric_precision ) on (rel_156464402.sourceline = rel_156464409.age ) on (rel_156464401.a = rel_156464402.sourceline ) on (rel_156464331.f1 = rel_156464410.numeric_precision ) where rel_156464402.sourcefile @@ rel_156464409.name fetch first 155 rows only; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Tom Lane writes: Well, I certainly think all of these represent bugs: [...] thanks for priorizing them. I'll try to digest them somewhat before posting. This one's pretty darn odd, because 2619 is pg_statistic and not an index at all: 4 | ERROR: cache lookup failed for index 2619 This is actually the one from the README :-). Quoting to spare media discontinuity: --8---cut here---start-8--- Taking a closer look at it reveals that it happens when you query a certain catalog view like this: self=# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL; FEHLER: cache lookup failed for index 2619 This is because the planner then puts pg_get_indexdef(oid) in a context where it sees non-index-oids, which causes it to croak: QUERY PLAN Hash Join (cost=17.60..30.65 rows=9 width=4) Hash Cond: (i.oid = x.indexrelid) - Seq Scan on pg_class i (cost=0.00..12.52 rows=114 width=8) Filter: ((pg_get_indexdef(oid) IS NOT NULL) AND (relkind = 'i'::char)) - Hash (cost=17.31..17.31 rows=23 width=4) - Hash Join (cost=12.52..17.31 rows=23 width=4) Hash Cond: (x.indrelid = c.oid) - Seq Scan on pg_index x (cost=0.00..4.13 rows=113 width=8) - Hash (cost=11.76..11.76 rows=61 width=8) - Seq Scan on pg_class c (cost=0.00..11.76 rows=61 width=8) Filter: (relkind = ANY ('{r,m}'::char[])) --8---cut here---end---8--- thanks, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] subplan variable reference / unassigned NestLoopParams (was: [sqlsmith] Failed assertion in joinrels.c)
Tom Lane writes: Well, I certainly think all of these represent bugs: 3 | ERROR: plan should not reference subplan's variable 2 | ERROR: failed to assign all NestLoopParams to plan nodes These appear to be related. The following query produces the former, but if you replace the very last reference of provider with the literal 'bar', it raises the latter error. select 1 from pg_catalog.pg_shseclabel as rel_09 inner join public.rtest_view2 as rel_32 left join pg_catalog.pg_roles as rel_33 on (rel_32.a = rel_33.rolconnlimit ) on (rel_09.provider = rel_33.rolpassword ) left join pg_catalog.pg_user as rel_35 on (rel_33.rolconfig = rel_35.useconfig ) where ( ((rel_09.provider ~~ 'foo') and (rel_35.usename ~* rel_09.provider))); ,[ FWIW: git bisect run ] | first bad commit: [e83bb10d6dcf05a666d4ada00d9788c7974ad378] | Adjust definition of cheapest_total_path to work better with LATERAL. ` regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Tom Lane writes: On 08/01/2015 05:59 PM, Tom Lane wrote: Well, I certainly think all of these represent bugs: 1 | ERROR: could not find pathkey item to sort Hmm ... I see no error with these queries as of today's HEAD or back-branch tips. I surmise that this was triggered by one of the other recently-fixed bugs, though the connection isn't obvious offhand. I still see this error in master as of b8cbe43, but the queries are indeed a pita to reproduce. The one below is the only one so far that is robust against running ANALYZE on the regression db, and also reproduces when I run it as an EXTRA_TEST with make check. regards, Andreas select rel_217088662.a as c0, rel_217088554.a as c1, rel_217088662.b as c2, subq_34235266.c0 as c3, rel_217088660.id2 as c4, rel_217088660.id2 as c5 from public.clstr_tst as rel_217088554 inner join (select rel_217088628.a as c0 from public.rtest_vview3 as rel_217088628 where (rel_217088628.b !~ rel_217088628.b) and rel_217088628.b ~~* rel_217088628.b) or (rel_217088628.b ~* rel_217088628.b)) or (rel_217088628.b rel_217088628.b)) or (rel_217088628.b = rel_217088628.b))) as subq_34235266 inner join public.num_exp_mul as rel_217088660 inner join public.onek2 as rel_217088661 on (rel_217088660.id1 = rel_217088661.unique1 ) on (subq_34235266.c0 = rel_217088660.id1 ) inner join public.main_table as rel_217088662 on (rel_217088661.unique2 = rel_217088662.a ) on (rel_217088554.b = rel_217088660.id1 ) where rel_217088554.d = rel_217088554.c fetch first 94 rows only; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Peter Geoghegan writes: On Fri, Jul 31, 2015 at 5:56 PM, Andreas Seltenreich seltenre...@gmx.de wrote: sqlsmith triggered the following assertion in master (c188204). Thanks for writing sqlsmith. It seems like a great tool. I wonder, are you just running the tool with assertions enabled when PostgreSQL is built? Right. I have to admit my testing setup is still more tailored towards testing sqlsmith than postgres. If so, it might make sense to make various problems more readily detected. As you may know, Clang has a pretty decent option called AddressSanitizer that can detect memory errors as they occur with an overhead that is not excessive. I didn't known this clang feature yet, thanks for pointing it out. I considered running some instances under valgrind to detect these, but the performance penalty seemed not worth it. One might use the following configure arguments when building PostgreSQL to use AddressSanitizer: ./configure CC=clang CFLAGS='-O1 -g -fsanitize=address -fno-omit-frame-pointer -fno-optimize-sibling-calls' --enable-cassert A quick attempt to sneak these in made my ansible playbooks unhappy due to make check failures and other generated noise. I'll try to have an instance with the AddressSanitizer active soon though. Of course, it remains to be seen if this pays for itself. Apparently the tool has about a 2x overhead [1]. I'm really not sure that you'll find any more bugs this way, but it's certainly possible that you'll find a lot more. Given your success in finding bugs without using AddressSanitizer, introducing it may be premature. Piotr also suggested on IRC to run coverage tests w/ sqlsmith. This could yield valuable hints in which direction to extend sqlsmith's grammar. Thanks, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Tom Lane writes: What concerns me more is that what you're finding is only cases that trip an assertion sanity check. It seems likely that you're also managing to trigger other bugs with less drastic consequences, such as could not devise a query plan failures or just plain wrong answers. Ja, some of these are logged as well[1], but most of them are really as undrastic as can get, and I was afraid reporting them would be more of a nuisance. I analysed a couple of the cache lookup failures, and they all had a similar severreness than the example in the README[2]. The operator ones I analysed seem due to intentionally broken operators in the regression db. The NestLoopParams and subplan reference one sound interesting though… I'm not sure how we could identify wrong answers automatically :-( Csmith isn't doing this either. They discuss differential testing though in their papers, i.e., comparing the results of different products. Maybe a simple metric like numbers of rows returned might already be valuable for correctness checks. I also thought about doing some sampling on the data and simulating relational operations and check for witness tuples, but it is probably not appropriate to start implementing a mini-rdbms on the client side. but it might be worth checking for XX000 SQLSTATE responses, since generally that should be a can't-happen case. (Or if it can happen, we need to change the errcode.) The sqlstate is currently missing in the reports because libpqxx is not putting it in it's exceptions :-/. regards, Andreas Footnotes: [1] smith=# select * from report24h; count | error ---+-- 43831 | ERROR: unsupported XML feature 39496 | ERROR: invalid regular expression: quantifier operand invalid 27261 | ERROR: canceling statement due to statement timeout 21386 | ERROR: operator does not exist: point = point 8580 | ERROR: cannot compare arrays of different element types 5019 | ERROR: invalid regular expression: brackets [] not balanced 4646 | ERROR: could not determine which collation to use for string comparison 2583 | ERROR: invalid regular expression: nfa has too many states 2248 | ERROR: operator does not exist: xml = xml 1198 | ERROR: operator does not exist: polygon = polygon 1171 | ERROR: cache lookup failed for index 16862 677 | ERROR: invalid regular expression: parentheses () not balanced 172 | ERROR: cache lookup failed for index 257148 84 | ERROR: could not find member 1(34520,34520) of opfamily 1976 55 | ERROR: missing support function 1(34516,34516) in opfamily 1976 42 | ERROR: operator does not exist: city_budget = city_budget 13 | ERROR: could not find commutator for operator 34538 10 | ERROR: could not identify a comparison function for type xid 4 | Connection to database failed 4 | ERROR: cache lookup failed for index 2619 3 | ERROR: plan should not reference subplan's variable 2 | ERROR: cache lookup failed for index 12322 2 | ERROR: failed to assign all NestLoopParams to plan nodes 2 | ERROR: invalid regular expression: invalid character range 1 | ERROR: could not find pathkey item to sort (25 rows) Time: 1158,990 ms [2] https://github.com/anse1/sqlsmith/blob/master/README.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] ERROR: too late to create a new PlaceHolderInfo
Hi, on master at 36d9345, the attached queries raised too late to create a new PlaceHolderInfo. regards, Andreas select subq_218206.c0 as c0 from (select rel_1375794.sl_name as c0, coalesce(rel_1375793.f2, rel_1375793.f2) as c1 from public.subselect_tbl as rel_1375793 inner join public.shoe_ready as rel_1375794 on (rel_1375793.f1 = rel_1375794.sh_avail ) where 34 21 fetch first 99 rows only) as subq_218205, lateral (select rel_1375795.b as c0 from public.rtest_t5 as rel_1375795 where rel_1375795.b coalesce(rel_1375795.b, rel_1375795.b) fetch first 122 rows only) as subq_218206 where EXISTS ( select coalesce(rel_1375802.tablename, rel_1375802.schemaname) as c0 from (select rel_1375801.b as c0, rel_1375801.b as c1, rel_1375801.b as c2, rel_1375801.a as c3 from public.rtest_t9 as rel_1375801 where rel_1375801.b ~~ rel_1375801.b fetch first 180 rows only) as subq_218208 left join pg_catalog.pg_policies as rel_1375802 on (subq_218208.c2 = rel_1375802.cmd ) where rel_1375802.qual !~* rel_1375802.cmd fetch first 118 rows only) fetch first 154 rows only; select coalesce(subq_734225.c1, subq_734225.c1) as c0, rel_4650803.srvversion as c1 from (select rel_4650756.name as c0, subq_734226.c0 as c1 from public.street as rel_4650756, lateral (select rel_4650757.conproc as c0 from pg_catalog.pg_conversion as rel_4650757 where rel_4650757.conowner = rel_4650757.connamespace fetch first 52 rows only) as subq_734226 where rel_4650756.name ~~* coalesce(rel_4650756.cname, rel_4650756.cname)) as subq_734225 right join (select rel_4650790.thepath as c0, rel_4650790.name as c1 from public.road as rel_4650790 where rel_4650790.thepath rel_4650790.thepath) as subq_734232 inner join pg_catalog.pg_policies as rel_4650802 right join pg_catalog.pg_foreign_server as rel_4650803 on (rel_4650802.tablename = rel_4650803.srvname ) inner join pg_catalog.pg_roles as rel_4650804 on (rel_4650803.srvtype = rel_4650804.rolpassword ) on (subq_734232.c1 = rel_4650802.cmd ) on (subq_734225.c0 = rel_4650803.srvtype ) where (((rel_4650804.rolname !~ subq_734225.c0) and (((rel_4650802.tablename is NULL) or (rel_4650802.qual !~* subq_734225.c0)) and (subq_734225.c0 ~ subq_734225.c0))) and ((rel_4650802.policyname is not NULL) or (rel_4650802.roles is NULL))) and (rel_4650802.with_check ~~ subq_734225.c0) fetch first 63 rows only; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Failed assertion in analyzejoins.c
Hi, this one was in today's sqlsmith harvest. It triggers an assertion in current master (c030dc4). regards, Andreas -- TRAP: FailedAssertion(!(!bms_is_empty(phinfo-ph_eval_at)), File: analyzejoins.c, Line: 474) select rel_141618057.srvfdw as c0, rel_141618057.srvversion as c1 from public.tv as rel_141617979 right join pg_catalog.pg_foreign_server as rel_141618057 left join pg_catalog.pg_roles as rel_141618058 on (rel_141618057.srvname = rel_141618058.rolname ) on (rel_141617979.type = rel_141618058.rolpassword ) where rel_141618057.srvfdw is not NULL; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c
Hi, when running my random query generator contraption[1] against the regression database of 9.5 or master, it occasionally triggers one of the following three assertions. Someone more knowledgeable might want to take a look at them... -- FailedAssertion(!(outer_rel-rows 0), File: indxpath.c, Line: 1911) -- sample query: select rel1925354.loid as c0, rel1925353.version as c1 from (select rel1925352.aa as c0, rel1925352.aa as c1 from public.b as rel1925352 where (rel1925352.bb is NULL) and (rel1925352.bb rel1925352.bb)) as subq_303136 inner join pg_catalog.pg_stat_ssl as rel1925353 on (subq_303136.c0 = rel1925353.pid ) right join pg_catalog.pg_largeobject as rel1925354 on (subq_303136.c0 = rel1925354.pageno ) where (rel1925353.clientdn !~ rel1925353.clientdn) and (rel1925353.cipher = rel1925353.clientdn); ,[ git bisect ] | first bad commit: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve | predtest.c's ability to reason about operator expressions. ` -- FailedAssertion(!(!bms_is_empty(phinfo-ph_eval_at)), File: placeholder.c, Line: 109) -- sample query: select rel1600276.viewowner as c0, rel1600274.maxwritten_clean as c1, rel1600275.n_tup_hot_upd as c2 from pg_catalog.pg_stat_bgwriter as rel1600274 inner join pg_catalog.pg_stat_xact_all_tables as rel1600275 on (rel1600274.maxwritten_clean = rel1600275.seq_scan ) right join pg_catalog.pg_views as rel1600276 right join pg_catalog.pg_operator as rel1600277 on (rel1600276.viewname = rel1600277.oprname ) on (rel1600275.relname = rel1600277.oprname ) where 3 is not NULL; ,[ git bisect ] | first bad commit: [f4abd0241de20d5d6a79b84992b9e88603d44134] Support | flattening of empty-FROM subqueries and one-row VALUES tables. ` -- FailedAssertion(!(key-sk_flags 0x0080), File: brin_minmax.c, Line: 177) -- sample query: select rel167978.namecol as c0 from information_schema.parameters as rel167972 left join public.student as rel167977 inner join public.brintest as rel167978 on (rel167977.age = rel167978.int4col ) on (rel167972.interval_precision = rel167977.age ) where rel167977.name rel167977.name; regards, andreas Footnotes: [1] https://github.com/anse1/sqlsmith -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c
Tom Lane writes: Andreas Seltenreich seltenre...@gmx.de writes: when running my random query generator contraption[1] against the regression database of 9.5 or master, it occasionally triggers one of the following three assertions. I've fixed the first two of these --- thanks for the report! I let sqlsmith run during the night, and it did no longer trigger the first two. During roughly a million random queries it triggered the already mentioned brin one 10 times, but there was also one instance of this new one in the log: TRAP: FailedAssertion(!(join_clause_is_movable_into(rinfo, joinrel-relids, join_and_req)), File: relnode.c, Line: 987) LOG: server process (PID 12851) was terminated by signal 6: Aborted DETAIL: Failed process was running: select rel65543066.tmplname as c0, rel65543064.umuser as c1 from public.dept as rel65543059 inner join pg_catalog.pg_user_mappings as rel65543064 left join pg_catalog.pg_enum as rel65543065 on (rel65543064.srvname = rel65543065.enumlabel ) inner join pg_catalog.pg_ts_template as rel65543066 on (rel65543065.enumtypid = rel65543066.tmplnamespace ) on (rel65543059.dname = rel65543064.srvname ) where ((rel65543059.mgrname = rel65543059.mgrname) and (rel65543064.usename = rel65543066.tmplname)) and (rel65543059.mgrname ~~ rel65543059.mgrname) fetch first 128 rows only; ,[ git bisect ] | first bad commit: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve | predtest.c's ability to reason about operator expressions. ` I'm a bit confused about this aspect of your report though, because in my hands that example fails clear back to 9.2. It doesn't seem to require the predtest.c improvement to expose the fault. Hmm, I actually used a different, uglier query to trigger this assertion for the bisection run. I'll attach it[1] along with the complete git bisect log[2]. regards, andreas Footnotes: [1] select subq_717608.c3 as c0, rel4551421.inhrelid as c1, rel4551421.inhrelid as c2, subq_717608.c3 as c3 from information_schema.foreign_tables as rel4551363 right join public.hash_f8_heap as rel4551366 inner join pg_catalog.pg_constraint as rel4551419 inner join (select rel4551420.bb as c0, rel4551420.aa as c1, rel4551420.aa as c2, rel4551420.aa as c3 from public.b as rel4551420 where ( rel4551420.bbrel4551420.bb ) and ( rel4551420.bbrel4551420.bb ) ) as subq_717608 on (rel4551419.coninhcount = subq_717608.c1 ) left join pg_catalog.pg_inherits as rel4551421 on (subq_717608.c1 = rel4551421.inhseqno ) on (rel4551366.seqno = subq_717608.c1 ) on (rel4551363.foreign_table_schema = rel4551419.conname ) where ( ( rel4551419.contypidrel4551419.connamespace ) and ( rel4551419.connamespace=rel4551419.conrelid ) ) and ( rel4551421.inhparentrel4551419.contypid ) fetch first 9 rows only ; [2] git bisect start # bad: [3b5a89c4820fb11c337838c1ad71e8e93f2937d1] Fix resource leak pointed out by Coverity. git bisect bad 3b5a89c4820fb11c337838c1ad71e8e93f2937d1 # good: [e6df2e1be6330660ba4d81daa726ae4a71535aa9] Stamp 9.4beta1. git bisect good e6df2e1be6330660ba4d81daa726ae4a71535aa9 # bad: [68e66923ff629c324e219090860dc9e0e0a6f5d6] Add missing volatile qualifier. git bisect bad 68e66923ff629c324e219090860dc9e0e0a6f5d6 # bad: [a16bac36eca8158cbf78987e95376f610095f792] Remove dependency on wsock32.lib in favor of ws2_32 git bisect bad a16bac36eca8158cbf78987e95376f610095f792 # good: [55d5b3c08279b487cfa44d4b6e6eea67a0af89e4] Remove unnecessary output expressions from unflattened subqueries. git bisect good 55d5b3c08279b487cfa44d4b6e6eea67a0af89e4 # bad: [1cbc9480106241aaa8db112331e93d0a265b6db0] Check interrupts during logical decoding more frequently. git bisect bad 1cbc9480106241aaa8db112331e93d0a265b6db0 # bad: [686f362bee126e50280bcd3b35807b02f18a8966] Fix contrib/pg_upgrade/test.sh for $PWD containing spaces. git bisect bad 686f362bee126e50280bcd3b35807b02f18a8966 # bad: [be76a6d39e2832d4b88c0e1cc381aa44a7f86881] Secure Unix-domain sockets of make check temporary clusters. git bisect bad be76a6d39e2832d4b88c0e1cc381aa44a7f86881 # good: [6554656ea2043c5bb877b427237dc5ddd7c5e5c8] Improve tuplestore's error messages for I/O failures. git bisect good 6554656ea2043c5bb877b427237dc5ddd7c5e5c8 # bad: [a7205d81573cb0c979f2d463a1d9edb6f97c94aa] Adjust 9.4 release notes. git bisect bad a7205d81573cb0c979f2d463a1d9edb6f97c94aa # bad: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve predtest.c's ability to reason about operator expressions. git bisect bad 3f8c23c4d31d4a0e801041733deb2c7cfa577b32 # good: [c81e63d85f0c2c39d3fdfd8b95fc1ead6fdcb89f] Fix pg_restore's processing of old-style BLOB COMMENTS data. git bisect good c81e63d85f0c2c39d3fdfd8b95fc1ead6fdcb89f # first bad commit: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve predtest.c's ability to reason about operator
Re: [HACKERS] 9.3.9 and pg_multixact corruption
Thomas Munro writes: > In various places we have int pageno = offset / (uint32) 1636, expanded > from this macro (which calls the offset an xid): It appears to depend on the context it is expanded in, as some of the code must have gotten the segment number right: ,[ ls -sh pg_multixact/members/ ] | 256K 97E0 | [...] | 256K A03B | 24K A03C = -5FC4 |0 5FC4 ` > I don't really see how any uint32 value could produce such a pageno via > that macro. Even if called in an environment where (xid) is accidentally > an int, the int / unsigned expression would convert it to unsigned first > (unless (xid) is a bigger type like int64_t: by the rules of int promotion > you'd get signed division in that case, hmm...). But it's always called > with a MultiXactOffset AKA uint32 variable. I managed disassemble RecordNewMultiXact from the core dump using a cross-binutils, and it reveals that the compiler[1] appears to have indeed generated a signed division here. I'm attaching a piece of C code that does the same computation as the assembly (I think), as well as the disassembly itself. regards, Andreas Footnotes: [1] Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18, 64-bit #include #include #include uint32_t offset2page(uint32_t offset) { uint64_t l0, i3, i5, i4, l6_1, l6_2, o2, o4, o1, o3; l0 = offset; i3 = 0x5fc3e800ULL; i5 = i3 ^ -375; i4 = (int32_t)l0; l6_1 = i4 * i5; o2 = l6_1 >> 32; o4 = l0 + o2; o1 = ((int32_t)o4) >> 10; o3 = ((int32_t)l0) >> 31; l6_2 = o1 - o3; return l6_2; } int main(int argc, char *argv[]) { uint32_t page = offset2page(atol(argv[1])); printf("page: %d\n", page); printf("segment: %04X\n", (int32_t)page/32); return 0; } RecordNewMultiXact: 100112be8: 07 00 04 01 sethi %hi(0x100400), %g3 100112bec: 9d e3 bf 30 save %sp, -208, %sp 100112bf0: 82 10 e2 bd or %g3, 0x2bd, %g1 100112bf4: 90 10 20 0e mov 0xe, %o0 100112bf8: a5 36 20 00 srl %i0, 0, %l2 100112bfc: 9f 28 70 0c sllx %g1, 0xc, %o7 100112c00: a2 0c a7 ff and %l2, 0x7ff, %l1 100112c04: b9 3e a0 00 sra %i2, 0, %i4 100112c08: b4 03 ed d8 add %o7, 0xdd8, %i2 100112c0c: b1 36 60 00 srl %i1, 0, %i0 100112c10: f0 23 a8 af st %i0, [ %sp + 0x8af ] 100112c14: b2 10 00 1b mov %i3, %i1 100112c18: 40 08 1d 66 call 0x10031a1b0 ; LWLockAcquire 100112c1c: 92 10 20 00 clr %o1 100112c20: 93 34 a0 0b srl %l2, 0xb, %o1 100112c24: 90 10 00 1a mov %i2, %o0 100112c28: 97 34 a0 00 srl %l2, 0, %o3 100112c2c: 7f ff f6 bd call 0x100110720 ; SimpleLruReadPage 100112c30: 94 10 20 01 mov 1, %o2 100112c34: fa 5e a0 00 ldx [ %i2 ], %i5 100112c38: ab 3a 20 00 sra %o0, 0, %l5 100112c3c: 96 10 20 01 mov 1, %o3 100112c40: a9 2d 70 03 sllx %l5, 3, %l4 100112c44: e0 5f 60 08 ldx [ %i5 + 8 ], %l0 100112c48: a7 3c 60 00 sra %l1, 0, %l3 100112c4c: ad 2c f0 02 sllx %l3, 2, %l6 100112c50: da 5c 00 14 ldx [ %l0 + %l4 ], %o5 100112c54: f0 23 40 16 st %i0, [ %o5 + %l6 ] 100112c58: d8 5e a0 00 ldx [ %i2 ], %o4 100112c5c: d4 5b 20 18 ldx [ %o4 + 0x18 ], %o2 100112c60: d6 2a 80 15 stb %o3, [ %o2 + %l5 ] 100112c64: 40 08 1f 2d call 0x10031a918 ; LWLockRelease 100112c68: 90 10 20 0e mov 0xe, %o0 100112c6c: 90 10 20 0f mov 0xf, %o0 100112c70: 40 08 1d 50 call 0x10031a1b0 ; LWLockAcquire 100112c74: 92 10 20 00 clr %o1 100112c78: 80 a7 20 00 cmp %i4, 0 100112c7c: 04 40 00 7c ble,pn %icc, 0x100112e6c 100112c80: 90 07 3f ff add %i4, -1, %o0 100112c84: e0 03 a8 af ld [ %sp + 0x8af ], %l0 100112c88: 37 17 f0 fa sethi %hi(0x5fc3e800), %i3 100112c8c: 86 10 3f ff mov -1, %g3 100112c90: ba 1e fe 89 xor %i3, -375, %i5 100112c94: d0 23 a8 b7 st %o0, [ %sp + 0x8b7 ] 100112c98: a2 10 26 64 mov 0x664, %l1 100112c9c: c0 23 a8 b3 clr [ %sp + 0x8b3 ] 100112ca0: 37 10 1e 0b sethi %hi(0x40782c00), %i3 100112ca4: b9 3c 20 00 sra %l0, 0, %i4 100112ca8: ac 4f 00 1d mulx %i4, %i5, %l6 100112cac: 95 35 b0 20 srlx %l6, 0x20, %o2 100112cb0: ba 10 26 63 mov 0x663, %i5 100112cb4: 98 04 00 0a add %l0, %o2, %o4 100112cb8: b8 10 20 01 mov 1, %i4 100112cbc: 93 3b 20 0a sra %o4, 0xa, %o1 100112cc0: 97 3c 20 1f sra %l0, 0x1f, %o3 100112cc4: ac 22 40 0b sub %o1, %o3, %l6 100112cc8: 89 2d a0 04 sll %l6, 4, %g4 100112ccc: 82 01 00 16 add %g4, %l6, %g1 100112cd0: b1 28 60 02 sll %g1, 2, %i0 100112cd4: 84 26 00 01 sub %i0, %g1, %g2 100112cd8: f0 03 a8 af ld [ %sp + 0x8af ], %i0 100112cdc: 9f 28 a0 03 sll %g2, 3, %o7 100112ce0: 90 05 80 0f add %l6,
[HACKERS] RemoveLocalLock pfree'ing NULL when out-of-memory
Hi, a memory-starved instance of sqlsmith just caught RemoveLocalLock pfree'ing a NULL in locallock->lockOwners. I think what happened is that it was called to clean up after LockAcquireExtended's MemoryContextAlloc failed. The content of errordata seems consistent with this. Caught in master as of 85eda7e (sorry about the bogus hashes in earlier reports, I had some cruft on my local branch). regards, Andreas FailedAssertion("!(pointer != ((void *)0))", File: "mcxt.c", Line: 1002) #3 0x007e1c80 in pfree (pointer=) at mcxt.c:1002 #4 0x006bdd24 in RemoveLocalLock (locallock=locallock@entry=0x3a90d68) at lock.c:1225 #5 0x006c1ceb in LockReleaseAll (lockmethodid=lockmethodid@entry=1, allLocks=1 '\001') at lock.c:2083 #6 0x006c3274 in ProcReleaseLocks (isCommit=isCommit@entry=0 '\000') at proc.c:752 #7 0x007e3700 in ResourceOwnerReleaseInternal (owner=owner@entry=0x208b488, phase=phase@entry=RESOURCE_RELEASE_LOCKS, isCommit=isCommit@entry=0 '\000', isTopLevel=isTopLevel@entry=1 '\001') at resowner.c:307 #8 0x007e381f in ResourceOwnerRelease (owner=0x208b488, phase=phase@entry=RESOURCE_RELEASE_LOCKS, isCommit=isCommit@entry=0 '\000', isTopLevel=isTopLevel@entry=1 '\001') at resowner.c:212 #9 0x004e903b in AbortTransaction () at xact.c:2557 #10 0x004e98ad in AbortCurrentTransaction () at xact.c:3003 #11 0x006d45a3 in PostgresMain (argc=1, argv=0x202e638, dbname=0x202e610 "regression", username=0x202e5f0 "smith") at postgres.c:3856 #12 0x00466964 in BackendRun (port=0x204e080) at postmaster.c:4204 #13 BackendStartup (port=0x204e080) at postmaster.c:3880 #14 ServerLoop () at postmaster.c:1683 #15 0x0067867e in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x202d600) at postmaster.c:1292 #16 0x0046774d in main (argc=3, argv=0x202d600) at main.c:223 (gdb) p errordata[0] $4 = {elevel = 20, output_to_server = 0 '\000', output_to_client = 1 '\001', show_funcname = 0 '\000', hide_stmt = 0 '\000', hide_ctx = 0 '\000', filename = 0x9a1f80 "mcxt.c", lineno = 769, funcname = 0x9a24b0 <__func__.5880> "MemoryContextAlloc", domain = 0x9350f6 "postgres-9.6", context_domain = 0x9350f6 "postgres-9.6", sqlerrcode = 8389, message = 0x296d020 "out of memory", detail = 0x296cfe8 "Failed on request of size 128.", detail_log = 0x0, hint = 0x0, context = 0x0, schema_name = 0x0, table_name = 0x0, column_name = 0x0, datatype_name = 0x0, constraint_name = 0x0, cursorpos = 0, internalpos = 0, internalquery = 0x0, saved_errno = 12, assoc_context = 0x296a7a8} -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
[ adding Bjorn Munch to Cc ] Jim Nasby writes: > On 9/20/15 9:23 AM, Christoph Berg wrote: >> a short update here: the customer updated the compiler to a newer >> version, is now compiling using -O2 instead of -O3, and the code >> generated now looks sane, so this turned out to be a compiler issue. >> (Though it's unclear if the upgrade fixed it, or the different -O >> level.) > > Do we officially not support anything > -O2? If so it'd be nice if > configure threw at least a warning (if not an error that you had to > explicitly over-ride). At least the solaris binaries distributed via postgresql.org[1] have been compiled with -xO3 according to pg_config. And their code for multixact.c looks inconspicuous. To recap the data points: | compiler | flags | multixact.o | |---+---+-| | Sun C 5.12 SunOS_sparc Patch 148917-07 2013/10/18 | -xO3 | bad | | Sun C 5.13 SunOS_Sparc 2014/10/20 | -xO2 | good| | Sun C 5.8 Patch 121015-04 2007/01/10 | -xO3 | good| regards, Andreas Footnotes: [1] http://www.postgresql.org/download/solaris/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3.9 and pg_multixact corruption
Alvaro Herrera writes: > Jim Nasby wrote: >> Do we officially not support anything > -O2? If so it'd be nice if configure >> threw at least a warning (if not an error that you had to explicitly >> over-ride). > > Keep in mind this is Sun OS C -- not one of the most popular compilers > in the world. I don't know what you suggest: have a test program that > configure runs and detects whether the compiler does the wrong thing? > It doesn't seem a sane idea to maintain test cases for all known > compiler bugs ... I think the intention was to make configure complain if there's a -O > 2 in CFLAGS. OTOH, a unit test for multixact.c that exercises the code including wraparounds sounds like a desirable thing regardless of the fact that it could have caught this miscompilation earlier than 6 months into production. regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
I wrote: > Tom Lane writes: >> Andreas Seltenreich <seltenre...@gmx.de> writes: >>> I've added new grammar rules to sqlsmith and improved some older ones. >>> This was rewarded with a return of "failed to generate plan" errors. >> >> I believe I've dealt with these cases now. Thanks for the report! > > I no longer see "failed to build any n-way joins" after pulling, but > there are still instances of "could not devise a query plan". Samples below. sorry, I spoke too soon: nine of the former have been logged through the night. I'm attaching a larger set of sample queries this time in case that there are still multiple causes for the observed errors. regards, Andreas -- select query||';' from error e -- where (msg ~~ 'ERROR: failed to build any%') -- and e.t > now() - interval '8 hours' -- order by length(query) asc; select ref_0.collation_schema as c0 from (select sample_0.is_supported as c0 from information_schema.sql_packages as sample_0 tablesample system (0.8) where 25 > 15 fetch first 60 rows only) as subq_0 right join information_schema.collations as ref_0 on (subq_0.c0 = ref_0.collation_catalog ), lateral (select ref_1.f2 as c0, ref_1.f1 as c1, ref_0.collation_name as c2, ref_0.pad_attribute as c3, subq_0.c0 as c4, subq_0.c0 as c5, subq_0.c0 as c6 from public.func_index_heap as ref_1 where ref_1.f2 ~ ref_1.f1 fetch first 170 rows only) as subq_1, lateral (select sample_8.b as c0, subq_1.c1 as c1, sample_8.a as c2 from public.clstr_tst as sample_8 tablesample system (8.8) left join public.f_star as sample_9 tablesample bernoulli (7.5) on (sample_8.a = sample_9.aa ) left join public.rules_log as sample_10 tablesample bernoulli (7) on (sample_8.b = sample_10.f1 ) where sample_10.tag <> sample_8.c fetch first 126 rows only) as subq_2 where (ref_0.collation_name is NULL) or (subq_2.c1 !~ subq_2.c1) fetch first 56 rows only; select subq_16.c4 as c0 from public.rule_and_refint_t1 as sample_18 tablesample system (9.7) left join (select sample_20.xx as c0, sample_20.xx as c1 from public.inhf as sample_20 tablesample bernoulli (1) where sample_20.xx is not NULL) as subq_11 inner join public.main_table as sample_25 tablesample system (1.2) inner join (select sample_26.b as c0, subq_15.c1 as c1, subq_15.c0 as c2, 32 as c3, subq_15.c0 as c4 from public.dropcolumn as sample_26 tablesample system (7.6) , lateral (select sample_27.t as c0, sample_26.b as c1 from public.radix_text_tbl as sample_27 tablesample system (4.6) where (sample_26.b is not NULL) and (sample_27.t ~~ sample_27.t)) as subq_15 where subq_15.c0 !~ subq_15.c0) as subq_16 on (sample_25.b = subq_16.c0 ) on (subq_11.c0 = subq_16.c2 ) on (sample_18.id1a = sample_25.a ), lateral (select subq_16.c3 as c0, subq_17.c1 as c1, sample_18.id1a as c2, coalesce(subq_16.c0, subq_16.c1) as c3 from public.rtest_vcomp as ref_21, lateral (select sample_28.a as c0, ref_21.size_in_cm as c1, subq_11.c0 as c2, sample_18.id1a as c3 from public.tab1 as sample_28 tablesample system (8.2) where subq_11.c1 <= sample_28.b fetch first 111 rows only) as subq_17 where ref_21.size_in_cm is NULL fetch first 101 rows only) as subq_18 where subq_11.c0 ~>=~ subq_11.c1 fetch first 94 rows only; select subq_33.c0 as c0, subq_33.c0 as c1 from (select sample_95.y as c0, sample_95.z as c1 from public.check2_tbl as sample_95 tablesample system (5.1) where sample_95.y = sample_95.y) as subq_28 right join pg_catalog.pg_user as ref_101 on (subq_28.c0 = ref_101.passwd ) left join pg_catalog.pg_opfamily as sample_96 tablesample bernoulli (9.8) on (ref_101.usesysid = sample_96.opfmethod ), lateral (select subq_31.c0 as c0, sample_97.e as c1, sample_97.e as c2 from public.dropcolumnchild as sample_97 tablesample system (2.4) , lateral (select subq_30.c0 as c0 from public.random_tbl as sample_98 tablesample system (8.4) , lateral (select subq_28.c1 as c0,
[HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
Hi, I've added new grammar rules to sqlsmith and improved some older ones. This was rewarded with a return of "failed to generate plan" errors. The failing queries all contain a lateral subquery. The shortest of the failing queries are below. They were run against the regression db of master as of db07236. regards, Andreas smith=# select msg, query from error where (firstline(msg) ~~ 'ERROR: failed to build any%' or firstline(msg) ~~ 'ERROR: could not devise a query plan%') and t > now() - interval '1 day' order by length(query) asc limit 3; ERROR: failed to build any 8-way joins select ref_96.foreign_table_schema as c0, sample_87.is_supported as c1 from information_schema.sql_packages as sample_87 tablesample system (0.2) right join information_schema._pg_foreign_tables as ref_96 on (sample_87.feature_id = ref_96.foreign_table_catalog ), lateral (select sample_87.is_verified_by as c0, ref_97.indexed_col as c1, coalesce(sample_87.feature_id, ref_96.foreign_server_name) as c2, 4 as c3 from public.comment_test as ref_97 where ref_97.id ~>~ ref_97.indexed_col fetch first 73 rows only) as subq_33 where ref_96.foreign_table_name ~~ subq_33.c1 ERROR: could not devise a query plan for the given query select subq_43.c0 as c0 from (select ref_181.installed as c0 from pg_catalog.pg_available_extension_versions as ref_181, lateral (select ref_181.name as c0, ref_181.installed as c1 from pg_catalog.pg_conversion as ref_182 where ref_182.conname ~~* ref_181.version fetch first 98 rows only) as subq_42 where (subq_42.c0 is not NULL) or (subq_42.c1 is NULL)) as subq_43 right join pg_catalog.pg_language as sample_177 tablesample system (2.8) on (subq_43.c0 = sample_177.lanispl ) where sample_177.lanowner < sample_177.lanvalidator ERROR: failed to build any 5-way joins select ref_239.id2 as c0, 40 as c1, ref_239.id2 as c2, ref_238.aa as c3 from public.tt5 as sample_289 tablesample system (8.1) inner join information_schema.element_types as ref_237 on (sample_289.x = ref_237.character_maximum_length ) left join public.b as ref_238 on (ref_237.character_maximum_length = ref_238.aa ) left join public.num_exp_mul as ref_239 on (ref_237.numeric_precision_radix = ref_239.id1 ), lateral (select sample_290.b as c0, sample_289.y as c1, ref_239.id2 as c2 from public.rtest_t8 as sample_290 tablesample bernoulli (4.6) where (sample_290.b > ref_238.bb) and (sample_289.y > ref_239.expected) fetch first 91 rows only) as subq_64 where (subq_64.c1 > sample_289.y) and (sample_289.y = ref_239.expected) fetch first 133 rows only -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
Tom Lane writes: > Andreas Seltenreich <seltenre...@gmx.de> writes: >> I've added new grammar rules to sqlsmith and improved some older ones. >> This was rewarded with a return of "failed to generate plan" errors. > > I believe I've dealt with these cases now. Thanks for the report! I no longer see "failed to build any n-way joins" after pulling, but there are still instances of "could not devise a query plan". Samples below. regards, Andreas select ref_1.aa as c0, subq_1.c1 as c1, coalesce(ref_1.class, ref_1.class) as c2, subq_1.c0 as c3 from (select subq_0.c1 as c0, coalesce(sample_0.a, sample_1.i) as c1 from public.rtest_t9 as sample_0 tablesample bernoulli (5.6) inner join public.iportaltest as sample_1 tablesample bernoulli (9.8) on (sample_0.a = sample_1.i ), lateral (select sample_1.d as c0, ref_0.a as c1, sample_1.p as c2, ref_0.a as c3, ref_0.a as c4, sample_0.b as c5, sample_1.i as c6 from public.rtest_view2 as ref_0 where sample_0.b = sample_0.b fetch first 93 rows only) as subq_0 where sample_0.b ~<=~ sample_0.b) as subq_1 right join public.e_star as ref_1 on (subq_1.c0 = ref_1.aa ) where ref_1.cc < ref_1.cc fetch first 59 rows only; select sample_69.tmpllibrary as c0, coalesce(sample_69.tmplname, sample_69.tmplname) as c1, subq_33.c0 as c2 from (select coalesce(ref_53.provider, sample_68.typdefault) as c0 from pg_catalog.pg_type as sample_68 tablesample bernoulli (6.9) inner join pg_catalog.pg_shseclabel as ref_53 on (sample_68.typowner = ref_53.objoid ), lateral (select sample_68.typcategory as c0, ref_54.speaker as c1, ref_54.speaker as c2 from public.test_range_excl as ref_54 where (ref_53.label >= ref_53.provider) and (ref_53.label !~* ref_53.provider) fetch first 143 rows only) as subq_32 where ref_53.label ~>~ ref_53.label) as subq_33 right join pg_catalog.pg_pltemplate as sample_69 tablesample bernoulli (9.8) on (subq_33.c0 = sample_69.tmplhandler ) where sample_69.tmplvalidator ~ subq_33.c0 fetch first 131 rows only; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
Tom Lane writes: > [2. transitive-lateral-fixes-1.patch] I was about to write that sqlsmith likes the patch, but after more than 10^8 ok queries the attached ones were generated. regards, Andreas post-patch-errors.sql Description: application/sql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
Tom Lane writes: > [2. transitive-lateral-fixes-2.patch ] > [2. remove-lateraljoininfo-2.patch ] They seem to have fixed the issue for good now. No errors have been logged for 2e8 queries since applying the first patch. (The second one was applied later and didn't get as much exposure.) I guess that means I have to go back to extending the grammar again :-). regards Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
Peter Geoghegan writes: > On Sun, Dec 6, 2015 at 9:52 AM, Andreas Seltenreich <seltenre...@gmx.de> > wrote: >> I've added new grammar rules to sqlsmith and improved some older ones. > > Could you possibly teach sqlsmith about INSERT ... ON CONFLICT DO > UPDATE/IGNORE? I think that that could be very helpful, especially if > it could be done in advance of any stable release of 9.5. In summary, it can't be added ad-hoc, but might still happen in advance of the release of 9.5. Adding upsert needs significiant effort because historically, non-boolean value expression productions yield a random type. This is not a problem for generating queries, but it is for inserts. Also, sqlsmith can at the moment only generate sensible value expressions from column references. Generating a proper upsert would require supporting type-constraining of productions as well as adding productions for pulling values out of thin air (e.g., generating atomic value subselects or calling argumentless functions). regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
David Fetter writes: > On Mon, Dec 14, 2015 at 03:06:18PM +0900, Michael Paquier wrote: >> On Sun, Dec 13, 2015 at 10:14 AM, Andreas Seltenreich wrote: >> > https://github.com/anse1/sqlsmith >> >> I am in awe regarding this stuff, which has caught many bugs >> already, it is a bit sad that it is released under the GPL license >> preventing a potential integration into PostgreSQL core to >> strengthen the test infrastructure, > > I suspect that a polite request to the Andreas that he change to a > PostgreSQL-compatible license like one of (TPL, BSD2, MIT) might > handle this part. It probably would, but I never thought core integration would be a desirable thing. Like Csmith, SQLsmith is intended to be product-agnostic. That's not yet the case, but it's still on the roadmap. Further, the license shouldn't interfere with institutionalizing sqlsmith somewhere to automatically send mails on failed assertions or first sight of an error message. Or providing a web interface around the logging database of an instance where one can drill down on logged errors/queries. >> and it is even sadder to see a direct dependency with libpqxx :( > > I suspect this part is a SMOP, but I'm not quite sure what S might > constitute in this case. sqlsmith uses three connections in total: 1. Connection for sending the generated queries to the DUT 2. Connection for retrieving the schema from the DUT 3. Logging connection 1 is trivial to change. 1+2 are intendend to be pluggable for supporting other products. Switching 1 to libpq is even on the todo list, as libpqxx doesn't allow access to the SQLSTATE (There is a four year old bug report about this by Andres). 2. Is more effort to change to libpq, as actual data is passed through that connection. 3. Was intended to stay libpqxx-only even when testing other products. Btw, I'm glad C++11 was no longer considered a blocker this thime :-) regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
Greg Stark writes: > There may be other errors that would be surprising for Tom or Robert. What > I did with the string argument fuzzer was printed counts for each sqlstate > for the errors and watched for errors that only occurred occasionally or > didn't make sense to me. > > Also, do you have any timeouts? I currently set statement_timeout to 1s to avoid wasting time letting postgres crunch numbers. Less than 0.5% of the queries run into this timeout. > Do you have any stats on how long these queries are taking to plan? > What's the longest query to plan you've found? No, I'm currently not logging timing spects. The schema I let the instances log into is part of the repo[1]. > Do you have coverage data for the corpus? I do have some older numbers for line coverage from before the recent grammar extension: | revision | overall | parser | |--+-+| | a4c1989 |26.0 | 20.4 | regards, Andreas Footnotes: [1] https://github.com/anse1/sqlsmith/blob/master/log.sql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
Tom Lane writes: > Merlin Moncurewrites: >> Aside from the functional issues, could your changes result in >> performance regressions? [...] > It's a little bit harder to gauge the impact on planner speed. The > transitive closure calculation could be expensive in a query with many > lateral references, but that doesn't seem likely to be common; and anyway > we'll buy back some of that cost due to simpler tests later. I'm > optimistic that we'll come out ahead in HEAD/9.5 after the removal > of LateralJoinInfo setup. It might be roughly a wash in the back > branches. On the empirical side: I see a speedup of 0.4% in testing speed with the patch applied. It could very well be me venting the room one additional time during the second session, resulting in the CPUs spending more time in their opportunistic frequency range or something. regards, Andreas smith=# select extract('day' from t), avg(generated/extract(epoch from updated - t)) as tps from instance natural join stat where generated > 100 and hostname not in('dwagon','slugbug') -- these do stuff beside sqlsmith and t > now() - interval '3 days' group by 1 order by 1; date_part | tps ---+-- 8 | 55.494181110456 9 | 55.6902316869404 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
Greg Stark writes: > On Sat, Dec 12, 2015 at 8:30 PM, Andreas Seltenreich <seltenre...@gmx.de> > wrote: > When you hit the timeout is this implemented in your fuzzer or using > statement_timeout? If the former, can you add a statement_timeout of > just short of the timeout in the fuzzer and find cases where the > planner might not be calling CHECK_FOR_INTERRUPTS frequently enough? It's the latter. I don't think I can add a client-side timeout into sqlsmith elegantly. IMHO it's better to write another test tool that just re-runs the queries that were logged with a timeout by sqlsmith and investigates their timeout-behavior more closely. >> I do have some older numbers for line coverage from before the recent >> grammar extension: > > If you have a corpus of queries in a simple format it would be pretty > convenient to add them in a regression test and then run make coverage > to get html reports. Hmm, I thought I found a workflow that would yield sqlsmith's coverage without integrating it into the regession tests. This is what I did: make install initdb /tmp/gcov pg_ctl -D /tmp/gcov start make installcheck pg_ctl -D /tmp/gcov stop make coverage-clean pg_ctl -D /tmp/gcov start sqlsmith --target='dbname=regression' --max-queries=1 pg_ctl -D /tmp/gcov stop make coverage-html It seems to yield a pure sqlsmith-only coverage report, as a "make coverage-html" before the "make coverage-clean" yields a report with much higher score. Maybe there are drawbacks to the workflow you are suggesting? I just re-did it with the current sqlsmith code, and it's up by 25% compared to the latest tested revision: | revision | overall | parser | |--+-+| | a4c1989 |26.0 | 20.4 | | ee099e6 |33.8 | 25.8 | I also put the report here, in case someone wants to look at certain details, or make suggestions into what directions to best extend the grammar to increase coverage. http://ansel.ydns.eu/~andreas/coverage/ http://ansel.ydns.eu/~andreas/gcov.tar.xz > Did you publish the source already? I haven't been following all > along, sorry if these are all answered questions. It's not had a proper release yet, but the code is available via github in all its rapid-prototypesque glory: https://github.com/anse1/sqlsmith regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade in 9.5 broken for adminpack
Bruce Momjian writes: > On Thu, Apr 16, 2015 at 11:29:07PM -0700, Jeff Janes wrote: >> Of course after sending that it became obvious. The C function is not >> getting >> called because the SQL function is marked as being strict, yet is called with >> NULL arguments. >> >> Trivial patch attached to unset strict flag in pg_proc.h. >> >> But CATALOG_VERSION_NO probably needs another bump as well. > > Patch applied and catversion bumped. Thanks. Shouldn't there be some validation of arguments now that the function is no longer marked strict? Currently, unprivileged users can crash the server calling binary_upgrade_create_empty_extension with null arguments. Found using sqlsmith. regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failing assertions in spgtextproc.c
Tom Lane writes: > Andreas Seltenreich <seltenre...@gmx.de> writes: >> TRAP: FailedAssertion([...], File: "spgtextproc.c", Line: 424) >> TRAP: FailedAssertion([...], File: "spgtextproc.c", Line: 564) > > Can you show us the definition of the index that's causing this, > and some samples of the data you're putting in it? Here's a recipe for triggering the former: create table t(c text); create index on t using spgist(c); insert into t select '' from generate_series(1,1); set enable_seqscan to off; select count(1) from t; I still think it's just the assertions being too strict. regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Add STRICT to some regression test C functions.
Hi, some of the C functions in the regression test DB readily crash when passing NULL input values. The regression tests themselves do not pass NULL values to them, but when the regression database is used as a basis for fuzz testing, they cause a lot of noise. Maybe someone can sneak this patch in? Thanks, Andreas >From 2711471d48c2e58809c2f4617d36352c5903bbd9 Mon Sep 17 00:00:00 2001 From: Andreas Seltenreich <seltenre...@gmx.de> Date: Sun, 3 Jan 2016 19:05:06 +0100 Subject: [PATCH] Add STRICT to some regression test C functions. These functions readily crash when passing NULL input values. The regression tests themselves do not pass NULL values to them, but when the regression database is used as a basis for fuzz testing, they cause a lot of noise. --- src/test/regress/input/create_function_2.source | 10 +- src/test/regress/output/create_function_2.source | 10 +- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/src/test/regress/input/create_function_2.source b/src/test/regress/input/create_function_2.source index 1b013ae..486803d 100644 --- a/src/test/regress/input/create_function_2.source +++ b/src/test/regress/input/create_function_2.source @@ -74,27 +74,27 @@ CREATE FUNCTION user_relns() CREATE FUNCTION pt_in_widget(point, widget) RETURNS bool AS '@libdir@/regress@DLSUFFIX@' - LANGUAGE C; + LANGUAGE C STRICT; CREATE FUNCTION overpaid(emp) RETURNS bool AS '@libdir@/regress@DLSUFFIX@' - LANGUAGE C; + LANGUAGE C STRICT; CREATE FUNCTION boxarea(box) RETURNS float8 AS '@libdir@/regress@DLSUFFIX@' - LANGUAGE C; + LANGUAGE C STRICT; CREATE FUNCTION interpt_pp(path, path) RETURNS point AS '@libdir@/regress@DLSUFFIX@' - LANGUAGE C; + LANGUAGE C STRICT; CREATE FUNCTION reverse_name(name) RETURNS name AS '@libdir@/regress@DLSUFFIX@' - LANGUAGE C; + LANGUAGE C STRICT; CREATE FUNCTION oldstyle_length(int4, text) RETURNS int4 diff --git a/src/test/regress/output/create_function_2.source b/src/test/regress/output/create_function_2.source index 98e1c29..bdfc5be 100644 --- a/src/test/regress/output/create_function_2.source +++ b/src/test/regress/output/create_function_2.source @@ -58,23 +58,23 @@ CREATE FUNCTION user_relns() CREATE FUNCTION pt_in_widget(point, widget) RETURNS bool AS '@libdir@/regress@DLSUFFIX@' - LANGUAGE C; + LANGUAGE C STRICT; CREATE FUNCTION overpaid(emp) RETURNS bool AS '@libdir@/regress@DLSUFFIX@' - LANGUAGE C; + LANGUAGE C STRICT; CREATE FUNCTION boxarea(box) RETURNS float8 AS '@libdir@/regress@DLSUFFIX@' - LANGUAGE C; + LANGUAGE C STRICT; CREATE FUNCTION interpt_pp(path, path) RETURNS point AS '@libdir@/regress@DLSUFFIX@' - LANGUAGE C; + LANGUAGE C STRICT; CREATE FUNCTION reverse_name(name) RETURNS name AS '@libdir@/regress@DLSUFFIX@' - LANGUAGE C; + LANGUAGE C STRICT; CREATE FUNCTION oldstyle_length(int4, text) RETURNS int4 AS '@libdir@/regress@DLSUFFIX@' -- 2.1.4 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Failing assertions in spgtextproc.c
I do see two assertions in spgtextproc.c fail on occasion when testing with sqlsmith: TRAP: FailedAssertion([...], File: "spgtextproc.c", Line: 424) TRAP: FailedAssertion([...], File: "spgtextproc.c", Line: 564) I can't reproduce it reliably but looking at the coredumps, the failing part of the expression is always in->level == 0 && DatumGetPointer(in->reconstructedValue) == NULL In all of the dumps I looked at, in->reconstructedValue contains a zero-length text instead of the asserted NULL, and the tuples fed to leaf_consistent()/inner_consistent() look like the one below. , | (gdb) p *in | $1 = {scankeys = 0x60a3ee0, nkeys = 1, reconstructedValue = 101373680, level = 0, | returnData = 1 '\001', allTheSame = 1 '\001', hasPrefix = 0 '\000', prefixDatum = 0, nNodes = 8, | nodeLabels = 0x37b6768} | (gdb) x ((text *)in->reconstructedValue)->vl_len_ | 0x60ad6f0:0x0010 | (gdb) p *(text *)in->scankeys[0]->sk_argument | $2 = {vl_len_ = "0\000\000", vl_dat = 0x855950c "sqlsmith~", '\177' , "\020 "} | (gdb) p in->nodeLabels[0] | $3 = 65535 ` Maybe these assertions are just too strict? I don't see the code misbehaving when relaxing them to reconstrValue != NULL && VARSIZE_ANY_EXHDR(reconstrValue) == in->level || in->level == 0 regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failing assertions in spgtextproc.c
Peter Geoghegan writes: > Can you do this?: > > (gdb) p debug_query_string output below. Since sqlsmith ist no longer restricted to read-only statements, the chances for reproduction are low :-/. select pg_catalog.pg_stat_get_buf_written_backend() as c0, subq_1.c0 as c1, subq_1.c0 as c2, subq_1.c0 as c3 from (select (select ordinal_position from information_schema.parameters limit 1 offset 12) as c0, ref_2.t as c1 from public.radix_text_tbl as ref_2 inner join pg_catalog.pg_stat_activity as ref_3 on (ref_2.t = ref_3.application_name ) where ref_2.t @@ cast(coalesce(ref_2.t, ref_3.client_hostname) as text) limit 111) as subq_1, lateral (select subq_1.c0 as c0, subq_2.c2 as c1, 56 as c2, cast(coalesce(cast(coalesce((select pop from public.real_city limit 1 offset 34) , subq_1.c0) as integer), subq_2.c0) as integer) as c3, 74 as c4, (select unique1 from public.onek2 limit 1 offset 17) as c5 from (select (select ordinal_position from information_schema.parameters limit 1 offset 27) as c0, sample_2.umoptions as c1, sample_2.umserver as c2 from pg_catalog.pg_user_mapping as sample_2 tablesample system (6.2) where 49 is NULL) as subq_2 where cast(coalesce(subq_1.c0, subq_2.c0) as integer) is not NULL limit 105) as subq_3 where ((select x from public.tt0 limit 1 offset 12) <= subq_3.c0) and (subq_3.c4 <= 30); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Failed assertion in postgres_fdw/deparse.c:1116
Creating some foreign tables via postgres_fdw in the regression db of master as of de33af8, sqlsmith triggers the following assertion: TRAP: FailedAssertion("!(const Node*)(var))->type) == T_Var))", File: "deparse.c", Line: 1116) gdb says var is holding a T_PlaceHolderVar instead. In a build without assertions, it leads to an error later: ERROR: cache lookup failed for type 0 Recipe: --8<---cut here---start->8--- create extension postgres_fdw; create server myself foreign data wrapper postgres_fdw; create schema fdw_postgres; create user mapping for public server myself options (user :'USER'); import foreign schema public from server myself into fdw_postgres; select subq_0.c0 as c0 from (select 31 as c0 from fdw_postgres.a as ref_0 where 93 >= ref_0.aa) as subq_0 right join fdw_postgres.rtest_vview5 as ref_1 on (subq_0.c0 = ref_1.a ) where 92 = subq_0.c0; --8<---cut here---end--->8--- regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Don't generate parallel paths for rels with parallel-restricted
Amit Kapila writes: > I have moved it to CLOSE_WAIT state because we have derived our > queries to reproduce the problem based on original report[1]. If next > run of sqlsmith doesn't show any problem in this context then we will > move it to resolved. I don't have access to my testing horse power this weekend so I can report on tuesday at the earliest. Unless someone else feels like running sqlsmith… regards, Andreas -- SQLsmith error of the day: time zone "Bruce Momjian" not recognized. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] PANIC: failed to add BRIN tuple
Alvaro Herrera writes: > How long does it take for you to reproduce this panic in the unpatched > code? Very long, I'm afraid. I only observed it once, and according to the logging database, about 4e9 random queries were generated since testing with 9.5 code. I could probably speed it up by creating lots of additional BRIN indexes in the regression database, and by compiling a sqlsmith that generates update statements only. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ORDER/GROUP BY expression not found in targetlist
Tom Lane writes: > Andreas Seltenreich <seltenre...@gmx.de> writes: >> Peter Geoghegan writes: >>> It's surprising that SQL Smith didn't catch something with such simple >>> steps to reproduce. > >> I removed distinct relatively early because it causes a large part of >> queries to fail due to it not finding an equality operator it likes. It >> seems to be more picky about the equality operator than, say, joins. >> I'm sure it has a good reason to do so? > > It's looking for an operator that is known to be semantically equality, > by virtue of being the equality member of a btree or hash opclass. > Type path has no such opclass unfortunately. As do lots of data types in the regression db while still having an operator providing semantic equivalence. I was hoping for someone to question that status quo. Naively I'd say an equivalence flag is missing in the catalog that is independent of opclasses. regards Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] PANIC: failed to add BRIN tuple
I wrote: > Alvaro Herrera writes: >> How long does it take for you to reproduce this panic in the unpatched >> code? > > I could probably speed it up by creating lots of additional BRIN indexes > in the regression database, and by compiling a sqlsmith that generates > update statements only. This actually worked. Sqlsmith triggered the BRIN panic twice in 80e6 queries (vs. onece in 4e9 before). I pushed the modified version on the branch "modify-heavy". Re-fuzzing now with your patch applied. andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Should pg_export_snapshot() and currtid() be tagged parallel-unsafe?
Digging through the sqlsmith logging db, I noticed the following errors: ERROR: cannot update SecondarySnapshot during a parallel operation ERROR: cannot assign XIDs during a parallel operation Queries raising the first one always contain calls to currtid() or currtid2(). Queries raising the second one always contain a call to pg_export_snapshot(). Patch to tag them as unsafe attached. regards, Andreas diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index f33c3ff..6a65e77 100644 --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h @@ -1347,9 +1347,9 @@ DATA(insert OID = 1291 ( suppress_redundant_updates_trigger PGNSP PGUID 12 1 0 DESCR("trigger to suppress updates when new and old records match"); DATA(insert OID = 1292 ( tideq PGNSP PGUID 12 1 0 0 0 f f f t t f i s 2 0 16 "27 27" _null_ _null_ _null_ _null_ _null_ tideq _null_ _null_ _null_ )); -DATA(insert OID = 1293 ( currtid PGNSP PGUID 12 1 0 0 0 f f f f t f v s 2 0 27 "26 27" _null_ _null_ _null_ _null_ _null_ currtid_byreloid _null_ _null_ _null_ )); +DATA(insert OID = 1293 ( currtid PGNSP PGUID 12 1 0 0 0 f f f f t f v u 2 0 27 "26 27" _null_ _null_ _null_ _null_ _null_ currtid_byreloid _null_ _null_ _null_ )); DESCR("latest tid of a tuple"); -DATA(insert OID = 1294 ( currtid2 PGNSP PGUID 12 1 0 0 0 f f f f t f v s 2 0 27 "25 27" _null_ _null_ _null_ _null_ _null_ currtid_byrelname _null_ _null_ _null_ )); +DATA(insert OID = 1294 ( currtid2 PGNSP PGUID 12 1 0 0 0 f f f f t f v u 2 0 27 "25 27" _null_ _null_ _null_ _null_ _null_ currtid_byrelname _null_ _null_ _null_ )); DESCR("latest tid of a tuple"); DATA(insert OID = 1265 ( tidne PGNSP PGUID 12 1 0 0 0 f f f t t f i s 2 0 16 "27 27" _null_ _null_ _null_ _null_ _null_ tidne _null_ _null_ _null_ )); DATA(insert OID = 2790 ( tidgt PGNSP PGUID 12 1 0 0 0 f f f t t f i s 2 0 16 "27 27" _null_ _null_ _null_ _null_ _null_ tidgt _null_ _null_ _null_ )); @@ -3135,7 +3135,7 @@ DESCR("xlog filename, given an xlog location"); DATA(insert OID = 3165 ( pg_xlog_location_diff PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 1700 "3220 3220" _null_ _null_ _null_ _null_ _null_ pg_xlog_location_diff _null_ _null_ _null_ )); DESCR("difference in bytes, given two xlog locations"); -DATA(insert OID = 3809 ( pg_export_snapshot PGNSP PGUID 12 1 0 0 0 f f f f t f v r 0 0 25 "" _null_ _null_ _null_ _null_ _null_ pg_export_snapshot _null_ _null_ _null_ )); +DATA(insert OID = 3809 ( pg_export_snapshot PGNSP PGUID 12 1 0 0 0 f f f f t f v u 0 0 25 "" _null_ _null_ _null_ _null_ _null_ pg_export_snapshot _null_ _null_ _null_ )); DESCR("export a snapshot"); DATA(insert OID = 3810 ( pg_is_in_recovery PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ pg_is_in_recovery _null_ _null_ _null_ )); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Don't generate parallel paths for rels with parallel-restricted
Amit Kapila writes: > Right, so I have moved "Failed assertion in parallel worker > (ExecInitSubPlan)" item to CLOSE_WAIT state as I don't think there is any > known pending issue in that item. I have moved it to CLOSE_WAIT state > because we have derived our queries to reproduce the problem based on > original report[1]. If next run of sqlsmith doesn't show any problem in > this context then we will move it to resolved. It ran for about 100e6 queries by now without tripping on any parallel worker related assertions. I tested with min_parallel_relation_size_v1.patch applied and set to 64kB to have more exposure during testing. regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ORDER/GROUP BY expression not found in targetlist
Peter Geoghegan writes: > On Wed, May 25, 2016 at 7:12 PM, Andres Freundwrote: >> >> =# CREATE TABLE twocol(col01 int, col02 int); >> =# SELECT DISTINCT col01, col02, col01 FROM twocol ; >> ERROR: XX000: ORDER/GROUP BY expression not found in targetlist >> LOCATION: get_sortgroupref_tle, tlist.c:341 >> >> which appears to be a 9.6 regression, presumable fallout from the path >> restructuring. > > It's surprising that SQL Smith didn't catch something with such simple > steps to reproduce. I removed distinct relatively early because it causes a large part of queries to fail due to it not finding an equality operator it likes. It seems to be more picky about the equality operator than, say, joins. I'm sure it has a good reason to do so? regression=> select distinct f1 from path_tbl; ERROR: could not identify an equality operator for type path LINE 1: select distinct f1 from path_tbl; regression=> \do = -[ RECORD 38 ]-+ Schema | pg_catalog Name | = Left arg type | path Right arg type | path Result type| boolean Description| equal -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] PANIC: failed to add BRIN tuple
Alvaro Herrera writes: > If you can re-run sqlsmith and see if you can find different bugs, I'd > appreciate it. [...] > [2. text/x-diff; brincrash-2.patch] BRIN is inconspicuous since applying this patch. All coredumps I see now are either due to the parallel worker shutdown issue or acl.c's text/name confusion, both reported earlier. regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] OOM crash in plpgsql_extra_checks_check_hook
Just had a parallel worker of a memory-starved instance of sqlsmith crash. plpgsql_extra_checks_check_hook forgot to check the result of its malloc call here: Core was generated by `postgres: bgworker: parallel worker for PID 5905 '. Program terminated with signal SIGSEGV, Segmentation fault. #0 plpgsql_extra_checks_check_hook (newvalue=, extra=0x7fff7fe31a58, source=) at pl_handler.c:113 113 *myextra = extrachecks; (gdb) bt #0 plpgsql_extra_checks_check_hook (newvalue=, extra=0x7fff7fe31a58, source=) at pl_handler.c:113 #1 0x0080173f in call_string_check_hook (newval=0x7fff7fe31a50, extra=, source=, elevel=15, conf=, conf=) at guc.c:9779 #2 0x008029b8 in InitializeOneGUCOption (gconf=0x4) at guc.c:4546 #3 0x00804dbc in define_custom_variable (variable=0x2cb6ef0) at guc.c:7466 #4 0x00805862 in DefineCustomStringVariable (name=name@entry=0x7f803cbfe011 "plpgsql.extra_warnings", short_desc=short_desc@entry=0x7f803cbfe1f8 "List of programming constructs that should produce a warning.", long_desc=long_desc@entry=0x0, valueAddr=valueAddr@entry=0x7f803ce070d8 , bootValue=bootValue@entry=0x7f803cbfdf78 "none", context=context@entry=PGC_USERSET, flags=1, check_hook=0x7f803cbe9700 , assign_hook=0x7f803cbe96e0 , show_hook=0x0) at guc.c:7733 #5 0x7f803cbe99ea in _PG_init () at pl_handler.c:173 #6 0x007f1bcb in internal_load_library (libname=libname@entry=0x7f8040cee14d ) at dfmgr.c:276 #7 0x007f2738 in RestoreLibraryState (start_address=0x7f8040cee14d ) at dfmgr.c:741 #8 0x004e61c0 in ParallelWorkerMain (main_arg=) at parallel.c:985 #9 0x00684072 in StartBackgroundWorker () at bgworker.c:726 #10 0x0068f142 in do_start_bgworker (rw=0x2cb5230) at postmaster.c:5535 #11 maybe_start_bgworker () at postmaster.c:5709 #12 0x0068fb96 in sigusr1_handler (postgres_signal_arg=) at postmaster.c:4971 #13 #14 0x7f8040091ac3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81 #15 0x0046c31f in ServerLoop () at postmaster.c:1657 #16 0x00690fc7 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x2c8c620) at postmaster.c:1301 #17 0x0046d96d in main (argc=4, argv=0x2c8c620) at main.c:228 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Improve spinlock inline assembly for x86.
Hi, I'm currently experimenting with just-in-time compilation using libfirm. While discussing issues with its developers, it was pointed out to me that our spinlock inline assembly is less than optimal. Attached is a patch that addresses this. , | Remove the LOCK prefix from the XCHG instruction. Locking is implicit | with XCHG and the prefix wastes a byte. Also remove the "cc" register | from the clobber list as the XCHG instruction does not modify any flags. | | Reported by Christoph Mallon. ` regards, Andreas >From c836b4f3e0b60d070481d4061e6fe0ffbe488495 Mon Sep 17 00:00:00 2001 From: Andreas Seltenreich <seltenre...@gmx.de> Date: Sun, 17 Jan 2016 11:51:53 +0100 Subject: [PATCH] Improve spinlock inline assembly for x86. Remove the LOCK prefix from the XCHG instruction. Locking is implicit with XCHG and the prefix wastes a byte. Also remove the "cc" register from the clobber list as the xchg instruction does not modify any flags. Reported by Christoph Mallon. --- src/include/storage/s_lock.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h index 8b240cd..933bb76 100644 --- a/src/include/storage/s_lock.h +++ b/src/include/storage/s_lock.h @@ -158,7 +158,6 @@ tas(volatile slock_t *lock) __asm__ __volatile__( " cmpb $0,%1 \n" " jne 1f \n" - " lock \n" " xchgb %0,%1 \n" "1: \n" : "+q"(_res), "+m"(*lock) @@ -226,11 +225,10 @@ tas(volatile slock_t *lock) register slock_t _res = 1; __asm__ __volatile__( - " lock \n" " xchgb %0,%1 \n" : "+q"(_res), "+m"(*lock) : /* no inputs */ -: "memory", "cc"); +: "memory"); return (int) _res; } -- 2.1.4 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader
Hi, tonight's sqlsmith run yielded another core dump: TRAP: FailedAssertion("!(MyProc->lockGroupLeader == ((void *)0))", File: "proc.c", Line: 1787) I couldn't identifiy a query for it though: debug_query_string is empty. Additionally, the offending query was not reported in the error context as it typically is for non-parallel executor crashes. regards, Andreas GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1 Core was generated by `postgres: bgworker: parallel worker for PID 4706 '. Program terminated with signal SIGABRT, Aborted. #0 0x7ff1bda16067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: Datei oder Verzeichnis nicht gefunden. #0 0x7ff1bda16067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x7ff1bda17448 in __GI_abort () at abort.c:89 #2 0x007eaa11 in ExceptionalCondition (conditionName=conditionName@entry=0x988318 "!(MyProc->lockGroupLeader == ((void *)0))", errorType=errorType@entry=0x82a45d "FailedAssertion", fileName=fileName@entry=0x8760e5 "proc.c", lineNumber=lineNumber@entry=1787) at assert.c:54 #3 0x006e3e7b in BecomeLockGroupLeader () at proc.c:1787 #4 0x004e6a59 in LaunchParallelWorkers (pcxt=pcxt@entry=0x1db05c8) at parallel.c:437 #5 0x005ef2d7 in ExecGather (node=node@entry=0x1d9d0b8) at nodeGather.c:168 #6 0x005dd788 in ExecProcNode (node=node@entry=0x1d9d0b8) at execProcnode.c:515 #7 0x005d999f in ExecutePlan (dest=0x1d7d310, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, use_parallel_mode=, planstate=0x1d9d0b8, estate=0x1d9c858) at execMain.c:1567 #8 standard_ExecutorRun (queryDesc=0x1db0080, direction=, count=0) at execMain.c:338 #9 0x005dcb3f in ParallelQueryMain (seg=, toc=0x7ff1be507000) at execParallel.c:716 #10 0x004e608b in ParallelWorkerMain (main_arg=) at parallel.c:1033 #11 0x00683a42 in StartBackgroundWorker () at bgworker.c:726 #12 0x0068eb82 in do_start_bgworker (rw=0x1d24ec0) at postmaster.c:5531 #13 maybe_start_bgworker () at postmaster.c:5706 #14 0x0046c993 in ServerLoop () at postmaster.c:1762 #15 0x006909fe in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1cfa560) at postmaster.c:1298 #16 0x0046d5ed in main (argc=3, argv=0x1cfa560) at main.c:228 (gdb) bt full #3 0x006e3e7b in BecomeLockGroupLeader () at proc.c:1787 leader_lwlock = #4 0x004e6a59 in LaunchParallelWorkers (pcxt=pcxt@entry=0x1db05c8) at parallel.c:437 oldcontext = 0x1d9ced0 worker = { bgw_name = "\220\a\333\001\000\000\000\000\370\340L\276\361\177\000\000\370\373\025\264\361\177\000\000P\a\333\001\000\000\000\000X\310\331\001\000\000\000\000\004\000\000\000\000\000\000\000\310\005\333\001\000\000\000\000\233\222J\000\000\000\000", bgw_flags = 1, bgw_start_time = BgWorkerStart_PostmasterStart, bgw_restart_time = 1, bgw_main = 0x0, bgw_library_name = "\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000u\222J\000\000\000\000\000\350\336\331\001\000\000\000\000\360\260a\000\000\000\000\000\200\315]\000\000\000\000\000\300\330\367\217\375\177\000\000h\205\333\001\000\000\000", bgw_function_name = "`\346\327\001\000\000\000\000\004\000\000\000\000\000\000\000\360\260a\000\000\000\000\000\200\315]\000\000\000\000\000\300\330\367\217\375\177\000\000h\317\331\001\000\000\000\000\210\325\327\001\000\000\000\000\004\000\000\000\000\000\000", bgw_main_arg = 6402288, bgw_extra = "X\310\331\001\000\000\000\000\000\000\000\000\000\000\000\000\060\233\333\001\000\000\000\000\001\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000l\321]\000\000\000\000\000\000\000\000\000\000\000\000\000H-\334\001\000\000\000\000h\317\331\001\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\274\341M\276\361\177\000\000\310\005\333\001\000\000\000\000\004\000\000\000\000\000\000\000\310\005\333\001\000\000\000\000\000\000\000\000\000\000\000", bgw_notify_pid = 4 } i = any_registrations_failed = 0 '\000' #5 0x005ef2d7 in ExecGather (node=node@entry=0x1d9d0b8) at nodeGather.c:168 pcxt = 0x1db05c8 estate = gather = 0x1d7d440 fslot = 0x1d9ced0 i = resultSlot = isDone = ExprSingleResult econtext = #6 0x005dd788 in ExecProcNode (node=node@entry=0x1d9d0b8) at execProcnode.c:515 result = __func__ = "ExecProcNode" #7 0x005d999f in ExecutePlan (dest=0x1d7d310, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, use_parallel_mode=, planstate=0x1d9d0b8, estate=0x1d9c858) at execMain.c:1567 slot = current_tuple_count = 0 #8 standard_ExecutorRun (queryDesc=0x1db0080, direction=,
Re: [HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader
Alvaro Herrera writes: > Amit Kapila wrote: >> It will be helpful if you can find the offending query or plan >> corresponding to it? > > So I suppose the PID of the process starting the workers should be in > the stack somewhere. Ja, it's right on the top, but long gone by now… > With that one should be able to attach to that process and get another > stack trace. I'm curious on whether you would need to have started > the server with "postgres -T" This sounds like it should work to capture more context when the Assertion fails the next time. I have to purge the catalogs a bit though to avoid stopping early on boring core dumps. Most of them are currently caused by acl.c using text for syscache lookups and triggering an NAMEDATALEN assertion. E.g.: select has_language_privilege('smithsmithsmithsmithsmithsmithsmithsmithsmithsmithsmithsmithsmith', smith'); thanks, andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader
Simon Riggs writes: > It's good that the input is fuzzed, but there needs to be a way to re-run > identical fuzzing or a way to backtrack to find what broke. Not much point > finding bugs we can't identify later. sqlsmith is deterministic and allows re-generating a sequence of random queries with the --seed argument. Finding a testing methodology that ensures a repeatable server-side is a harder problem though. One would have to avoid touching any kind of concurrency, disable autovacuum, autoanalyze and invoke explicit analyzes/vacuums in concert with query generation. Further, one would have to avoid any kind of concurrency while testing. Even then, 1% of the queries run into a statement_timeout due to randomly generated excessive cross joins. If a timeout just barely happens, it might not do so on the repeated run and the deterministic state is gone from then on. I'm afraid this list is not complete yet. I didn't think the effort of creating this kind of clean-room testing was worth it. If reports of failed assertions with backtrace without a recipe to reproduce them are a nuisance, I'll avoid them in the future. regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader
Amit Kapila writes: > On Fri, Apr 29, 2016 at 7:15 PM, Tom Lanewrote: >> but it might be worth copying over the full query from the parent >> side. > > That would amount to couple of extra cycles considering we need to do it > for each worker, but OTOH it might be a useful debugging information in the > cases as reported in this thread. Maybe only do it in assertion-enabled builds when performance is an issue? regards, andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader
Amit Kapila writes: > On Sat, Apr 30, 2016 at 5:58 AM, Andreas Seltenreich <seltenre...@gmx.de> > wrote: >> This sounds like it should work to capture more context when the >> Assertion fails the next time. > > Sounds good. So can we assume that you will try to get us the new report > with more information? Ja. I do have a busy weekend+week ahead though, so no holding of breath. regards Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Crash in apply_projection_to_path
Hi, the following query against the regression database crashes master as of 23b09e15. select 1 from depth0 inner join depth1 on (depth0.c = depth1.c) where depth0.c @@ depth1.c limit 1; regards, Andreas Program terminated with signal SIGSEGV, Segmentation fault. #0 create_projection_path (root=root@entry=0x6918e60, rel=0x7f7f7f7f7f7f7f7f, subpath=0x69d6de8, target=target@entry=0x69d7428) at pathnode.c:2160 2160pathnode->path.parallel_safe = rel->consider_parallel && #0 create_projection_path (root=root@entry=0x6918e60, rel=0x7f7f7f7f7f7f7f7f, subpath=0x69d6de8, target=target@entry=0x69d7428) at pathnode.c:2160 #1 0x0067841e in apply_projection_to_path (root=0x6918e60, rel=0x69d5e18, path=0x69d6ee0, target=0x69d7428) at pathnode.c:2251 #2 0x0065ff20 in grouping_planner (root=0x22e1850, root@entry=0x6918e60, inheritance_update=56 '8', inheritance_update@entry=0 '\000', tuple_fraction=2615.2579411764709, tuple_fraction@entry=0) at planner.c:1737 #3 0x00661f44 in subquery_planner (glob=glob@entry=0x6918dc8, parse=parse@entry=0x235e3b0, parent_root=parent_root@entry=0x0, hasRecursion=hasRecursion@entry=0 '\000', tuple_fraction=tuple_fraction@entry=0) at planner.c:758 #4 0x00662e2b in standard_planner (parse=0x235e3b0, cursorOptions=256, boundParams=0x0) at planner.c:307 #5 0x006f40ed in pg_plan_query (querytree=0x235e3b0, cursorOptions=256, boundParams=0x0) at postgres.c:798 #6 0x006f41e4 in pg_plan_queries (querytrees=, cursorOptions=256, boundParams=0x0) at postgres.c:857 #7 0x006f5c93 in exec_simple_query (query_string=) at postgres.c:1022 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
I wrote: > There's another class of parallel worker core dumps when testing master > with sqlsmith. In these cases, the following assertion fails for all > workers simulataneously: > > TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File: > "shm_mq.c", Line: 386) I no longer observe these after applying these two patches by Amit Kapila: avoid_restricted_clause_below_gather_v1.patch Message-ID:
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)
Amit Kapila writes: > avoid_restricted_clause_below_gather_v1.patch > prohibit_parallel_clause_below_rel_v1.patch I didn't observe any parallel worker related coredumps since applying these. The same amount of testing done before applying them yielded about a dozend. Dilip Kumar writes: > So now its clear that because of sub query pullup, we may get expression in > targetlist while creating single table path list. So we need to avoid > parallel plan if it contains expression. This sounds like a rather heavy restriction though… regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
I wrote: >> There's another class of parallel worker core dumps when testing master >> with sqlsmith. In these cases, the following assertion fails for all >> workers simulataneously: >> >> TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File: >> "shm_mq.c", Line: 386) > > I no longer observe these after applying these two patches by Amit > Kapila I spoke too soon: These still occur with the patches applied, but with much lower probability. (one core dump per 20e6 random queries instead of 1e6). Most of the collected plans look inconspicuous to me now, except for one that again had a subplan below a gather node (plan6). Tarball of all collected plans attached. regards, Andreas plans.tar.gz Description: application/gzip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
Amit Kapila writes: > Earlier problems were due to the reason that some unsafe/restricted > expressions were pushed below Gather node as part of target list whereas in > the plan6, it seems some unsafe node is pushed below Gather node. It will > be helpful if you can share the offending query? plan6 corresponds to this query: select pg_catalog.anyarray_out( cast((select most_common_vals from pg_catalog.pg_stats limit 1 offset 41) as anyarray)) as c0 from public.quad_point_tbl as ref_0 where ref_0.p ~= ref_0.p; regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
Amit Kapila writes: > On Mon, May 23, 2016 at 4:48 PM, Andreas Seltenreich <seltenre...@gmx.de> > wrote: >> plan6 corresponds to this query: >> > Are you sure that the core dumps you are seeing are due to plan6? Each of the plans sent was harvested from a controlling process when the above assertion failed in its workers. I do not know whether the plans themselves really are at fault, as most of the collected plans look ok to me. The backtrace in the controlling process always look like the one reported. (Except when the coredumping took so long as to trigger a statement_timeout in the still-running master. There are no plans/queries available in this case, as the the state is no longer available in an aborted transaction.) > I have tried to generate a parallel plan for above query and it seems to me > that > after applying the patches (avoid_restricted_clause_below_gather_v1.patch > and prohibit_parallel_clause_below_rel_v1.patch), the plan it generates > doesn't have subplan below gather node [1]. > Without patch avoid_restricted_clause_below_gather_v1.patch, it will allow to > push > subplan below gather node, so I think either there is some other plan > (query) due to which you are seeing core dumps or the above two patches > haven't been applied before testing. According to my notes, the patches were applied in the instance that crashed. The fact that I do not see the other variants of the crashes the patches fix anymore, and the probability for this failed assertion per random query is reduced by about a factor of 20 in contrast to testing with the patches not applied, I'm pretty certain that this is not a bookkeeping error on my part. > Is it possible that core dump is due to plan2 or some other similar > plan (I am not sure at this stage about the cause of the problem you > are seeing, but if due to some reason PARAM_EXEC params are pushed > below gather, then such a plan might not work)? If you think plan > other than plan6 can cause such a problem, then can you share the > query for plan2? Each of the sent plans was collected when a worker dumped core due to the failed assertion. More core dumps than plans were actually observed, since with this failed assertion, multiple workers usually trip on and dump core simultaneously. The following query corresponds to plan2: --8<---cut here---start->8--- select pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0, subq_0.c3 as c1, subq_0.c1 as c2, 31 as c3, 18 as c4, (select unique1 from public.bprime limit 1 offset 9) as c5, subq_0.c2 as c6 from (select ref_0.tablename as c0, ref_0.inherited as c1, ref_0.histogram_bounds as c2, 100 as c3 from pg_catalog.pg_stats as ref_0 where 49 is not NULL limit 55) as subq_0 where true limit 58; --8<---cut here---end--->8--- regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Failed assertions on parallel worker shutdown
There's another class of parallel worker core dumps when testing master with sqlsmith. In these cases, the following assertion fails for all workers simulataneously: TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File: "shm_mq.c", Line: 386) The backtraces of the controlling process is always in ExecShutdownGatherWorkers. The queries always work fine on re-running, so I guess there is some race condition on worker shutdown? Backtraces below. regards andreas Core was generated by `postgres: bgworker: parallel worker for PID 30525 '. Program terminated with signal SIGABRT, Aborted. #0 0x7f5a3df91067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 0x7f5a3df91067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x7f5a3df92448 in __GI_abort () at abort.c:89 #2 0x007eabe1 in ExceptionalCondition (conditionName=conditionName@entry=0x984e10 "!(mqh->mqh_partial_bytes <= nbytes)", errorType=errorType@entry=0x82a75d "FailedAssertion", fileName=fileName@entry=0x984b8c "shm_mq.c", lineNumber=lineNumber@entry=386) at assert.c:54 #3 0x006d8042 in shm_mq_sendv (mqh=0x25f17b8, iov=iov@entry=0x7ffc6352af00, iovcnt=iovcnt@entry=1, nowait=) at shm_mq.c:386 #4 0x006d807d in shm_mq_send (mqh=, nbytes=, data=, nowait=) at shm_mq.c:327 #5 0x005d96b9 in ExecutePlan (dest=0x25f1850, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, use_parallel_mode=, planstate=0x2612da8, estate=0x2612658) at execMain.c:1596 #6 standard_ExecutorRun (queryDesc=0x261a660, direction=, count=0) at execMain.c:338 #7 0x005dc7cf in ParallelQueryMain (seg=, toc=0x7f5a3ea6c000) at execParallel.c:735 #8 0x004e617b in ParallelWorkerMain (main_arg=) at parallel.c:1035 #9 0x00683862 in StartBackgroundWorker () at bgworker.c:726 #10 0x0068e9a2 in do_start_bgworker (rw=0x2590760) at postmaster.c:5531 #11 maybe_start_bgworker () at postmaster.c:5706 #12 0x0046cbba in ServerLoop () at postmaster.c:1762 #13 0x0069081e in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x256d580) at postmaster.c:1298 #14 0x0046d80d in main (argc=4, argv=0x256d580) at main.c:228 (gdb) attach 30525 0x7f5a3e044e33 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81 81 ../sysdeps/unix/syscall-template.S: No such file or directory. (gdb) bt #0 0x7f5a3e044e33 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81 #1 0x006d1b4e in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffc6352aec0, cur_timeout=-1, set=0x44251c0) at latch.c:981 #2 WaitEventSetWait (set=set@entry=0x44251c0, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffc6352aec0, nevents=nevents@entry=1) at latch.c:935 #3 0x006d1f96 in WaitLatchOrSocket (latch=0x7f5a3d898494, wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=timeout@entry=-1) at latch.c:347 #4 0x006d205d in WaitLatch (latch=, wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=-1) at latch.c:302 #5 0x004e6d64 in WaitForParallelWorkersToFinish (pcxt=0x442d4e8) at parallel.c:537 #6 0x005dcf84 in ExecParallelFinish (pei=0x441cab8) at execParallel.c:541 #7 0x005eeead in ExecShutdownGatherWorkers (node=node@entry=0x3e3a070) at nodeGather.c:416 #8 0x005ef389 in ExecShutdownGather (node=0x3e3a070) at nodeGather.c:430 #9 0x005dd03d in ExecShutdownNode (node=0x3e3a070) at execProcnode.c:807 #10 0x0061ad73 in planstate_tree_walker (planstate=0x3e361a8, walker=0x5dd010 , context=0x0) at nodeFuncs.c:3442 #11 0x0061ad73 in planstate_tree_walker (planstate=0xf323c30, walker=0x5dd010 , context=0x0) at nodeFuncs.c:3442 #12 0x0061ad73 in planstate_tree_walker (planstate=0xf323960, walker=0x5dd010 , context=0x0) at nodeFuncs.c:3442 #13 0x005d96da in ExecutePlan (dest=0xb826868, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, use_parallel_mode=, planstate=0xf323960, estate=0xf322b28) at execMain.c:1576 #14 standard_ExecutorRun (queryDesc=0xddca888, direction=, count=0) at execMain.c:338 #15 0x006f6e88 in PortalRunSelect (portal=portal@entry=0x258ccc8, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0xb826868) at pquery.c:946 #16 0x006f83ae in PortalRun (portal=0x258ccc8, count=9223372036854775807, isTopLevel=, dest=0xb826868, altdest=0xb826868, completionTag=0x7ffc6352b3d0 "") at pquery.c:787 #17 0x006f5c63 in exec_simple_query (query_string=) at postgres.c:1094 #18 PostgresMain (argc=39374024, argv=0x25ed130, dbname=0x256e480 "regression", username=0x25ed308 "0\321^\002") at postgres.c:4059 #19 0x0046c8b2 in BackendRun (port=0x25935d0) at postmaster.c:4258 #20
[HACKERS] [sqlsmith] PANIC: failed to add BRIN tuple
There was one instance of this PANIC when testing with the regression db of master at 50e5315. , | WARNING: specified item offset is too large | PANIC: failed to add BRIN tuple | server closed the connection unexpectedly ` It is reproducible with the query below on this instance only. I've put the data directory (20MB) here: http://ansel.ydns.eu/~andreas/brincrash.tar.xz The instance was running on Debian Jessie amd64. Query and Backtrace below. regards, Andreas --8<---cut here---start->8--- update public.brintest set byteacol = null, charcol = public.brintest.charcol, int2col = null, int4col = public.brintest.int4col, textcol = public.brintest.textcol, oidcol = cast(coalesce(cast(coalesce(null, public.brintest.oidcol) as oid), pg_catalog.pg_my_temp_schema()) as oid), tidcol = public.brintest.tidcol, float8col = public.brintest.float8col, macaddrcol = null, cidrcol = public.brintest.cidrcol, datecol = public.brintest.datecol, timecol = public.brintest.timecol, timestamptzcol = pg_catalog.clock_timestamp(), intervalcol = public.brintest.intervalcol, timetzcol = public.brintest.timetzcol, bitcol = public.brintest.bitcol, varbitcol = public.brintest.varbitcol, uuidcol = null returning public.brintest.byteacol as c0; --8<---cut here---end--->8--- Core was generated by `postgres: smith regression [local] UPDATE '. Program terminated with signal SIGABRT, Aborted. #0 0x7fd2cda67067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 0x7fd2cda67067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x7fd2cda68448 in __GI_abort () at abort.c:89 #2 0x007ec969 in errfinish (dummy=dummy@entry=0) at elog.c:557 #3 0x007f011c in elog_finish (elevel=elevel@entry=20, fmt=fmt@entry=0x82ca8f "failed to add BRIN tuple") at elog.c:1378 #4 0x00470618 in brin_doupdate (idxrel=0x101f4c0, pagesPerRange=1, revmap=0x10d20e50, heapBlk=8, oldbuf=2878, oldoff=9, origtup=0x10d864a8, origsz=6144, newtup=0x5328a88, newsz=6144, samepage=1 '\001') at brin_pageops.c:184 #5 0x0046e5bb in brininsert (idxRel=0x101f4c0, values=0x211b, nulls=0x6 , heaptid=0x, heapRel=0x7fd2ce6fd700, checkUnique=UNIQUE_CHECK_NO) at brin.c:244 #6 0x005d887f in ExecInsertIndexTuples (slot=0xe92a560, tupleid=0x10d21084, estate=0x9ed8a68, noDupErr=0 '\000', specConflict=0x0, arbiterIndexes=0x0) at execIndexing.c:383 #7 0x005f74d5 in ExecUpdate (tupleid=0x7ffe11ea74a0, oldtuple=0x211b, slot=0xe92a560, planSlot=0x, epqstate=0x7fd2ce6fd700, estate=0x9ed8a68, canSetTag=1 '\001') at nodeModifyTable.c:1015 #8 0x005f7b6c in ExecModifyTable (node=0x9ed8d28) at nodeModifyTable.c:1501 #9 0x005dd5d8 in ExecProcNode (node=node@entry=0x9ed8d28) at execProcnode.c:396 #10 0x005d962f in ExecutePlan (dest=0xde86040, direction=, numberTuples=0, sendTuples=, operation=CMD_UPDATE, use_parallel_mode=, planstate=0x9ed8d28, estate=0x9ed8a68) at execMain.c:1567 #11 standard_ExecutorRun (queryDesc=0xde860d8, direction=, count=0) at execMain.c:338 #12 0x006f74c9 in ProcessQuery (plan=, sourceText=0xd74e88 "update public.brintest[...]", params=0x0, dest=0xde86040, completionTag=0x7ffe11ea7670 "") at pquery.c:185 #13 0x006f775f in PortalRunMulti (portal=portal@entry=0xde8abf0, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0xde86040, altdest=0xc96680 , completionTag=completionTag@entry=0x7ffe11ea7670 "") at pquery.c:1267 #14 0x006f7a0c in FillPortalStore (portal=portal@entry=0xde8abf0, isTopLevel=isTopLevel@entry=1 '\001') at pquery.c:1044 #15 0x006f845d in PortalRun (portal=0xde8abf0, count=9223372036854775807, isTopLevel=, dest=0x9ee76b8, altdest=0x9ee76b8, completionTag=0x7ffe11ea7a20 "") at pquery.c:782 #16 0x006f5c63 in exec_simple_query (query_string=) at postgres.c:1094 #17 PostgresMain (argc=233352176, argv=0xe8ad358, dbname=0xcf7508 "regression", username=0xe8ad3b0 "Xӊ\016") at postgres.c:4059 #18 0x0046c8b2 in BackendRun (port=0xd1c580) at postmaster.c:4258 #19 BackendStartup (port=0xd1c580) at postmaster.c:3932 #20 ServerLoop () at postmaster.c:1690 #21 0x0069081e in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0xcf64f0) at postmaster.c:1298 #22 0x0046d80d in main (argc=4, argv=0xcf64f0) at main.c:228 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Just-in-time compiling things (was: asynchronous and vectorized execution)
Konstantin Knizhnik writes: > Latest information from ISP RAS guys: them have made good progress > since February: them have rewritten most of methods of Scan, Aggregate > and Join to LLVM API. Is their work available somewhere? I'm experimenting in that area as well, although I'm using libFirm instead of LLVM. I wonder what their motivation to rewrite backend code in LLVM IR was, since I am following the approach of keeping the IR around when compiling the vanilla postgres C code, possibly inlining it during JIT and then doing optimizations on this IR. That way the logic doesn't have to be duplicated. regrads Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader
> Amit Kapila writes: >> Sounds good. So can we assume that you will try to get us the new report >> with more information? I don't see these crashes anymore in c1543a8. By the amount of fuzzing done it should have happened a dozen times, so it's highly likely something in 23b09e15..c1543a8 fixed it. regards, andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)
Hi, when fuzz testing master as of c1543a8, parallel workers trigger the following assertion in ExecInitSubPlan every couple hours. TRAP: FailedAssertion("!(list != ((List *) ((void *)0)))", File: "list.c", Line: 390) Sample backtraces of a worker and leader below, plan of leader attached. The collected queries don't seem to reproduce it. Curiously, running explain on them on the failed instance after crash recovery never shows any gather nodes… regards, andreas Core was generated by `postgres: bgworker: parallel worker for PID 28062 '. Program terminated with signal SIGABRT, Aborted. (gdb) bt #3 0x0061bad2 in list_nth_cell (list=0x0, n=) at list.c:390 #4 0x0061bb26 in list_nth (list=, n=) at list.c:413 #5 0x005fe566 in ExecInitSubPlan (subplan=subplan@entry=0x1522a08, parent=parent@entry=0x1538188) at nodeSubplan.c:705 #6 0x005e3b54 in ExecInitExpr (node=0x1522a08, parent=parent@entry=0x1538188) at execQual.c:4724 #7 0x005e415c in ExecInitExpr (node=, parent=parent@entry=0x1538188) at execQual.c:5164 #8 0x005ff3fc in ExecInitAlternativeSubPlan (asplan=asplan@entry=0x1522060, parent=parent@entry=0x1538188) at nodeSubplan.c:1185 #9 0x005e35c4 in ExecInitExpr (node=0x1522060, parent=parent@entry=0x1538188) at execQual.c:4740 #10 0x005e3f8a in ExecInitExpr (node=0x1522978, parent=parent@entry=0x1538188) at execQual.c:4845 #11 0x005e415c in ExecInitExpr (node=, parent=parent@entry=0x1538188) at execQual.c:5164 #12 0x005e3687 in ExecInitExpr (node=0x1522920, parent=parent@entry=0x1538188) at execQual.c:4648 #13 0x005e330f in ExecInitExpr (node=0x15228c8, parent=parent@entry=0x1538188) at execQual.c:5132 #14 0x005e326f in ExecInitExpr (node=0x1522870, parent=parent@entry=0x1538188) at execQual.c:5152 #15 0x005e415c in ExecInitExpr (node=, parent=parent@entry=0x1538188) at execQual.c:5164 #16 0x005fbb62 in ExecInitSeqScan (node=0x1522728, estate=0x15379b8, eflags=16) at nodeSeqscan.c:192 #17 0x005dd567 in ExecInitNode (node=0x1522728, estate=estate@entry=0x15379b8, eflags=eflags@entry=16) at execProcnode.c:192 #18 0x005f12a5 in ExecInitHashJoin (node=0x1522530, estate=0x15379b8, eflags=16) at nodeHashjoin.c:489 #19 0x005dd497 in ExecInitNode (node=node@entry=0x1522530, estate=estate@entry=0x15379b8, eflags=eflags@entry=16) at execProcnode.c:275 #20 0x005dae6c in InitPlan (eflags=16, queryDesc=) at execMain.c:959 #21 standard_ExecutorStart (queryDesc=, eflags=16) at execMain.c:238 #22 0x005dcac4 in ParallelQueryMain (seg=, toc=0x7f442d27b000) at execParallel.c:729 #23 0x004e631b in ParallelWorkerMain (main_arg=) at parallel.c:1033 #24 0x00683af2 in StartBackgroundWorker () at bgworker.c:726 #25 0x0068ec32 in do_start_bgworker (rw=0x14c4a20) at postmaster.c:5531 #26 maybe_start_bgworker () at postmaster.c:5706 #27 0x0068f686 in sigusr1_handler (postgres_signal_arg=) at postmaster.c:4967 #28 #29 0x7f442c839ac3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81 #30 0x0046c144 in ServerLoop () at postmaster.c:1654 #31 0x00690aae in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x14a1560) at postmaster.c:1298 #32 0x0046d78d in main (argc=4, argv=0x14a1560) at main.c:228 (gdb) frame 5 #5 0x005fe566 in ExecInitSubPlan (subplan=subplan@entry=0x1522a08, parent=parent@entry=0x1538188) at nodeSubplan.c:705 (gdb) list 704 /* Link the SubPlanState to already-initialized subplan */ 705 sstate->planstate = (PlanState *) list_nth(estate->es_subplanstates, 706subplan->plan_id - 1); (gdb) attach 28062 Attaching to program: /home/smith/postgres/inst/master/bin/postgres, process 28062 (gdb) bt #0 0x7f442c840e33 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81 #1 0x006d1dde in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7fffdedd75a0, cur_timeout=-1, set=0xe3eedb8) at latch.c:981 #2 WaitEventSetWait (set=set@entry=0xe3eedb8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7fffdedd75a0, nevents=nevents@entry=1) at latch.c:935 #3 0x006d2226 in WaitLatchOrSocket (latch=0x7f442c0d9644, wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1, timeout@entry=0) at latch.c:347 #4 0x006d22ed in WaitLatch (latch=, wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0) at latch.c:302 #5 0x005ef4e3 in gather_readnext (gatherstate=0xe3d7ce8) at nodeGather.c:384 #6 gather_getnext (gatherstate=0xe3d7ce8) at nodeGather.c:283 #7 ExecGather (node=node@entry=0xe3d7ce8) at nodeGather.c:229 #8 0x005dd728 in ExecProcNode (node=node@entry=0xe3d7ce8) at execProcnode.c:515 #9 0x005f995c in ExecNestLoop (node=node@entry=0x10ef9d90) at nodeNestloop.c:174 #10
Re: [HACKERS] [sqlsmith] Failed assertion in BecomeLockGroupLeader
Alvaro Herrera writes: > Robert Haas wrote: >> On Thu, May 5, 2016 at 4:11 PM, Andreas Seltenreich <seltenre...@gmx.de> >> wrote: >> > I don't see these crashes anymore in c1543a8. By the amount of fuzzing >> > done it should have happened a dozen times, so it's highly likely >> > something in 23b09e15..c1543a8 fixed it. >> >> Hmm, I'd guess c45bf5751b6338488bd79ce777210285531da373 to be the most >> likely candidate. > > I thought so too, but then that patch change things in the planner side, > but it seems to me that the reported crash is in the executor, unless I'm > misreading. Tom had a theory in Message-ID: <12751.1461937...@sss.pgh.pa.us> on how the planner bug could cause the executor crash. regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] PANIC: failed to add BRIN tuple
I wrote: > Re-fuzzing now with your patch applied. This so far yielded three BRIN core dumps on different machines with the same backtraces. Crash recovery fails in these cases. I've put the data directory here (before recovery): http://ansel.ydns.eu/~andreas/brincrash2-spidew.tar.xz (9.1M) Corresponding backtraces of the backend and startup core files below. regards, Andreas Core was generated by `postgres: smith brintest [local] UPDATE '. Program terminated with signal SIGABRT, Aborted. #0 0x7f5a49a9c067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 0x7f5a49a9c067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x7f5a49a9d448 in __GI_abort () at abort.c:89 #2 0x007ec979 in errfinish (dummy=dummy@entry=0) at elog.c:557 #3 0x007f012c in elog_finish (elevel=elevel@entry=20, fmt=fmt@entry=0x989cf0 "incorrect index offsets supplied") at elog.c:1378 #4 0x006eda2f in PageIndexDeleteNoCompact (page=page@entry=0x7f5a48c6f200 "Y", itemnos=itemnos@entry=0x7ffe6d0d0abc, nitems=nitems@entry=1) at bufpage.c:1011 #5 0x00470119 in brin_doupdate (idxrel=0x1c5a530, pagesPerRange=1, revmap=0x91a7d90, heapBlk=2166, oldbuf=3782, oldoff=2, origtup=0x719db80, origsz=3656, newtup=0x754e188, newsz=3656, samepage=1 '\001') at brin_pageops.c:181 #6 0x0046e5db in brininsert (idxRel=0x1c5a530, values=0x6591, nulls=0x6 , heaptid=0x, heapRel=0x7f5a4a72f700, checkUnique=UNIQUE_CHECK_NO) at brin.c:244 #7 0x005d888f in ExecInsertIndexTuples (slot=0x91a4870, tupleid=0x91a7df4, estate=0x91b9b38, noDupErr=0 '\000', specConflict=0x0, arbiterIndexes=0x0) at execIndexing.c:383 #8 0x005f74e5 in ExecUpdate (tupleid=0x7ffe6d0d0ed0, oldtuple=0x6591, slot=0x91a4870, planSlot=0x, epqstate=0x7f5a4a72f700, estate=0x91b9b38, canSetTag=1 '\001') at nodeModifyTable.c:1015 #9 0x005f7b7c in ExecModifyTable (node=0x71861c0) at nodeModifyTable.c:1501 #10 0x005dd5e8 in ExecProcNode (node=node@entry=0x71861c0) at execProcnode.c:396 #11 0x005d963f in ExecutePlan (dest=0x17bb870, direction=, numberTuples=0, sendTuples=, operation=CMD_UPDATE, use_parallel_mode=, planstate=0x71861c0, estate=0x91b9b38) at execMain.c:1567 #12 standard_ExecutorRun (queryDesc=0x17bb908, direction=, count=0) at execMain.c:338 #13 0x006f74d9 in ProcessQuery (plan=, sourceText=0x1670e18 "update public.brintest set \n charcol = null, \n namecol = pg_catalog.name(cast(null as character varying)\n), \n int8col = null, \n int2col = public.brintest.int2col, \n textcol = null, \n float4col = null, \n float8col = cast(coalesce(null,\n\npublic.brintest.float8col) as double precision), \n inetcol = public.brintest.inetcol, \n cidrcol = public.brintest.cidrcol, \n bpcharcol = null, \n datecol = (select datecol from public.brintest limit 1 offset 25)\n, \n timecol = (select timecol from public.brintest limit 1 offset 42)\n, \n timetzcol = (select timetzcol from public.brintest limit 1 offset 3)\n, \n numericcol = (select numericcol from public.brintest limit 1 offset 32)\n, \n uuidcol = public.brintest.uuidcol\nreturning \n public.brintest.datecol as c0;", params=0x0, dest=0x17bb870, completionTag=0x7ffe6d0d10a0 "") at pquery.c:185 #14 0x006f776f in PortalRunMulti (portal=portal@entry=0x1611b68, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x17bb870, altdest=0xc96680 , completionTag=completionTag@entry=0x7ffe6d0d10a0 "") at pquery.c:1267 #15 0x006f7a1c in FillPortalStore (portal=portal@entry=0x1611b68, isTopLevel=isTopLevel@entry=1 '\001') at pquery.c:1044 #16 0x006f846d in PortalRun (portal=0x1611b68, count=9223372036854775807, isTopLevel=, dest=0x756c870, altdest=0x756c870, completionTag=0x7ffe6d0d1450 "") at pquery.c:782 #17 0x006f5c73 in exec_simple_query (query_string=) at postgres.c:1094 #18 PostgresMain (argc=23141224, argv=0x2cab518, dbname=0x15f3578 "brintest", username=0x2cab570 "\030\265\312\002") at postgres.c:4059 #19 0x0046c8d2 in BackendRun (port=0x1618880) at postmaster.c:4258 #20 BackendStartup (port=0x1618880) at postmaster.c:3932 #21 ServerLoop () at postmaster.c:1690 #22 0x006907fe in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x15f2560) at postmaster.c:1298 #23 0x0046d82d in main (argc=4, argv=0x15f2560) at main.c:228 (gdb) frame 4 #4 0x006eda2f in PageIndexDeleteNoCompact (page=page@entry=0x7f5a48c6f200 "Y", itemnos=itemnos@entry=0x7ffe6d0d0abc, nitems=nitems@entry=1) at bufpage.c:1011 1011elog(ERROR, "incorrect index offsets supplied"); (gdb) list 1006} 1007} 1008 1009/* this will catch invalid or out-of-order itemnos[] */ 1010
Re: [HACKERS] GiST index build versus NaN coordinates
Tom Lane writes: > More generally, this example makes me fearful that NaN coordinates in > geometric values are likely to cause all sorts of issues. It's too late > to disallow them, probably, but I wonder how can we identify other bugs > of this ilk. Sounds like some fuzz testing with nan/infinity is in order. sqlsmith doesn't generate any float literals, but it calls functions to satisfy its need for values of specific types. Adding suitable functions[1] to the regression db, I made the following observations: The infinite loop from the bug report was triggered. Further, two previously unseen errors are logged: ERROR: timestamp cannot be NaN ERROR: getQuadrant: impossible case The first is porbably as boring as it gets, the second one is from the getQuadrant() in spgquadtreeproc.c. Curiously, the getQuadrant()s in geo_spgist.c and rangetypes_spgist.c do not have such a check. I guess the boxes will just end up in an undefined position in the index for these. regards Andreas Footnotes: [1] create function smith_double_inf() returns float as $$select 'infinity'::float$$ language sql immutable; create function smith_double_ninf() returns float as $$select '-infinity'::float$$ language sql immutable; create function smith_double_nan() returns float as $$select 'nan'::float$$ language sql immutable; create function smith_real_nan() returns real as $$select 'nan'::real$$ language sql immutable; create function smith_real_inf() returns real as $$select 'infinity'::real$$ language sql immutable; create function smith_real_ninf() returns real as $$select '-infinity'::real$$ language sql immutable; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issue in pg_catalog.pg_indexes view definition
Tom Lane writes: > Dilip Kumarwrites: >> So I think changing the view definition and calling this function on >> indexrelid will remove the error. So I think >> correct fix is to change view definition, as I proposed in above patch. [...] > We've dealt with similar issues in places like pg_relation_size() by > making the functions return NULL instead of throwing an error for an > unmatched argument OID. Note that Michael Paquier sent a patch implementing this in another thread: https://www.postgresql.org/message-id/cab7npqtxf5dtxjezb7xkjvowxx8d_2atxmtu3psnkhcwt_j...@mail.gmail.com regards, andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving executor performance
Andres Freund writes: > Having expression evaluation and slot deforming as a series of simple > sequential steps, instead of complex recursive calls, would also make it > fairly straightforward to optionally just-in-time compile those. I don't think that JIT becomes easier by this change. Constructing the IR for LLVM, libFirm or any other JIT library from expression trees is straightforward already. It's probably more of a nuisance for those that already have some code/design on JIT-compiling expressions (vitessedb, ISP RAS, yours truly) I like your patch for all the other reasons stated though! regards Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving executor performance
Andres Freund writes: > The problem is that the previous form has a lot of ad-hoc analysis > strewn in. The interesting part is getting rid of all that. That's what > the new ExecInitExpr2() does. The target form can be both evaluated more > efficiently in the dispatch manner in the patch, and quite simply > converted to a JIT - without duplicating the analysis code. I did write > a small ad-hoc x86 jit, and it was really quite straightforward this > way. Ja, I see the advantage when doing ad-hoc-JIT compilation. > What did you do with JIT and expression evaluation? You basically just > replaced the toplevel ExprState note with a different evalfunc, pointing > into your code? That's the plan, yes. I'm sorry there's no publishable code yet on the the postgres side of things. Using libFirm[1], the plan is to. 1. Automatically generate Firm-IR for the static C code around expression evaluation as well operators in the system catalog. 2. Construct IR for expression trees (essentially all the function calls the executor would do). 3. Push libFirm's optimize button. At this stage, most of the dispatching goes away by inlining the calls including functions from the catalog implementing operators. 4. Generate code and replace the toplevel expression node with a funcall node. I did implement this recipe with a toy Forth interpreter to see whether libFirm was up to the job (Nobody's done JIT with libFirm before). The results were favorable[2]. Currently, a student at credativ is working on applying these techniques to postgres. regards, Andreas Footnotes: [1] http://libfirm.org/ [2] https://korte.credativ.com/~ase/firm-postgres-jit-forth.pdf -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving executor performance
Andres Freund writes: > On 2016-07-14 23:03:10 +0200, Andreas Seltenreich wrote: >> That's the plan, yes. I'm sorry there's no publishable code yet on the >> the postgres side of things. Using libFirm[1], the plan is to. > > Why libfirm? - It has a more modern IR than LLVM (they're catching up though) - It's very lightweight, compiles faster than LLVM's ./configure run - I do have lots of experience with it and a committer bit > It seems to only have x86 and sparc backends, and no windows support? Ack, it's mostly used in research projects, that's why the number of supported ISAs is small. It's enough to answer the burning question what speedup is to expected by jit-compiling things in the backend though. Also, if this thing actually takes off, adding more backends is something that is easier with libFirm than LLVM, IMHO. >> 1. Automatically generate Firm-IR for the static C code around >>expression evaluation as well operators in the system catalog. > >> 2. Construct IR for expression trees (essentially all the function calls >>the executor would do). > > But that essentially means redoing most of execQual's current code in IR > - or do you want to do that via 1) above? Manually re-doing backend logic in IR is a can of worms I do not want to open. This would guarantee bugs and result in a maintenance nightmare, so doing 1) for the code is the only option when something turns out to be a bottleneck IMHO. > As long as the preparation code (which is currently intermixed with > the execution phase) isn't separated, that means pulling essentially > the whole backend given that we do catalog lookups and everything > during that phase. Right, the catalog lookups need to be done before JIT-compilation to allow inlining operators. >> Currently, a student at credativ is working on applying these >> techniques to postgres. > > Are you planning to support this to postgres proper? The goal is to publish it as an extension that sneaks into planner_hook. I think BSD-licensing is ok as long as libfirm (LGPL) is kept as an external dependency. regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Hi, the following statement triggers an assertion in tsearch: select ts_delete(array_to_tsvector('{smith,smith,smith}'::text[]), '{smith,smith}'::text[]); -- TRAP: FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511) regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
Hi, testing with sqlsmith shows that the following assertion doesn't hold: FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200) The triggering statements always contain a call to pg_start_backup with the third argument 'true', i.e. it's trying to start an exlusive backup. I didn't manage to put together a stand-alone testcase yet. regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GiST index build versus NaN coordinates
I wrote: > Sounds like some fuzz testing with nan/infinity is in order. related fallout: close_ps returns a NULL pointer with NaNs around: select close_ps('(nan,nan)', '(nan,nan),(nan,nan)'); -- TRAP: FailedAssertion("!(result != ((void *)0))", File: "geo_ops.c", Line: 2860) regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Crash in GetOldestSnapshot()
Hi, since updating master from c93d873..fc509cd, I see crashes in GetOldestSnapshot() on update/delete returning statements. I reduced the triggering statements down to this: update clstr_tst set d = d returning d; Backtrace below. regards, Andreas Program received signal SIGSEGV, Segmentation fault. (gdb) bt #0 GetOldestSnapshot () at snapmgr.c:422 #1 0x004b8279 in init_toast_snapshot (toast_snapshot=0x7ffcd824b010) at tuptoaster.c:2314 #2 0x004b83bc in toast_fetch_datum (attr=) at tuptoaster.c:1869 #3 0x004b9ab5 in heap_tuple_untoast_attr (attr=0x18226c8) at tuptoaster.c:179 #4 0x007f71ad in pg_detoast_datum_packed (datum=) at fmgr.c:2266 #5 0x007cfc12 in text_to_cstring (t=0x18226c8) at varlena.c:186 #6 0x007f5735 in FunctionCall1Coll (flinfo=flinfo@entry=0x18221c0, collation=collation@entry=0, arg1=arg1@entry=25306824) at fmgr.c:1297 #7 0x007f68ee in OutputFunctionCall (flinfo=0x18221c0, val=25306824) at fmgr.c:1946 #8 0x00478bc1 in printtup (slot=0x1821f80, self=0x181ce48) at printtup.c:359 #9 0x006f9c8e in RunFromStore (portal=portal@entry=0x177cbf8, direction=direction@entry=ForwardScanDirection, count=count@entry=0, dest=0x181ce48) at pquery.c:1117 #10 0x006f9d52 in PortalRunSelect (portal=portal@entry=0x177cbf8, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x181ce48) at pquery.c:942 #11 0x006fb41e in PortalRun (portal=portal@entry=0x177cbf8, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x181ce48, altdest=altdest@entry=0x181ce48, completionTag=completionTag@entry=0x7ffcd824b920 "") at pquery.c:787 #12 0x006f822b in exec_simple_query (query_string=0x17db878 "update clstr_tst set d = d returning d;") at postgres.c:1094 #13 PostgresMain (argc=, argv=argv@entry=0x1781ce0, dbname=0x1781b40 "regression", username=) at postgres.c:4074 #14 0x0046c9bd in BackendRun (port=0x1786920) at postmaster.c:4262 #15 BackendStartup (port=0x1786920) at postmaster.c:3936 #16 ServerLoop () at postmaster.c:1693 #17 0x00693044 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x175d5f0) at postmaster.c:1301 #18 0x0046dd26 in main (argc=3, argv=0x175d5f0) at main.c:228 (gdb) list 417 418 if (OldestActiveSnapshot != NULL) 419 ActiveLSN = OldestActiveSnapshot->as_snap->lsn; 420 421 if (XLogRecPtrIsInvalid(RegisteredLSN) || RegisteredLSN > ActiveLSN) 422 return OldestActiveSnapshot->as_snap; 423 424 return OldestRegisteredSnapshot; 425 } 426 (gdb) bt full #0 GetOldestSnapshot () at snapmgr.c:422 OldestRegisteredSnapshot = RegisteredLSN = ActiveLSN = #1 0x004b8279 in init_toast_snapshot (toast_snapshot=0x7ffcd824b010) at tuptoaster.c:2314 snapshot = #2 0x004b83bc in toast_fetch_datum (attr=) at tuptoaster.c:1869 toastrel = 0x7f8b4ca88920 toastidxs = 0x18447c8 toastkey = { sk_flags = 0, sk_attno = 1, sk_strategy = 3, sk_subtype = 0, sk_collation = 100, sk_func = { fn_addr = 0x77c490 , fn_oid = 184, fn_nargs = 2, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2 '\002', fn_extra = 0x0, fn_mcxt = 0x18282a8, fn_expr = 0x0 }, sk_argument = 34491 } toastscan = ttup = toasttupDesc = 0x7f8b4ca88c50 result = 0x18422d8 toast_pointer = ressize = 5735 residx = nextidx = 0 numchunks = 3 chunk = isnull = chunkdata = chunksize = num_indexes = 1 validIndex = 0 SnapshotToast = { satisfies = 0x112, xmin = 3626283536, xmax = 32764, xip = 0xf8ac628, xcnt = 5221870, subxip = 0x0, subxcnt = 0, suboverflowed = 0 '\000', takenDuringRecovery = 0 '\000', copied = 0 '\000', curcid = 14, speculativeToken = 0, active_count = 260753304, regd_count = 0, ph_node = { first_child = 0xf8ac680, next_sibling = 0xa400112, prev_or_parent = 0x0 }, whenTaken = 274, lsn = 0 } __func__ = "toast_fetch_datum" #3 0x004b9ab5 in heap_tuple_untoast_attr (attr=0x18226c8) at tuptoaster.c:179 No locals. #4 0x007f71ad in pg_detoast_datum_packed (datum=) at fmgr.c:2266 No locals. #5 0x007cfc12 in text_to_cstring (t=0x18226c8) at varlena.c:186 tunpacked = result = #6 0x007f5735 in FunctionCall1Coll (flinfo=flinfo@entry=0x18221c0,
Re: [HACKERS] [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)
Michael Paquier writes: > Andreas, with the patch attached is the assertion still triggered? > [2. text/x-diff; base-backup-crash-v2.patch] I didn't observe the crashes since applying this patch. There should have been about five by the amount of fuzzing done. regards Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Crash in pg_get_viewdef_name_ext()
Hi, sqlsmith just triggered a crash in pg_get_viewdef_name_ext(). Looks like commit 976b24fb4 failed to update this caller of pg_get_viewdef_worker(). Backtrace below. Patch attached. regards, Andreas Program terminated with signal SIGSEGV, Segmentation fault. (gdb) bt #0 strlen () at ../sysdeps/x86_64/strlen.S:106 #1 0x007cdf09 in cstring_to_text (s=s@entry=0x0) at varlena.c:152 #2 0x007a3409 in string_to_text (str=0x0) at ruleutils.c:10083 #3 pg_get_viewdef_name_ext (fcinfo=) at ruleutils.c:681 #4 0x005dfae2 in ExecMakeFunctionResultNoSets (fcache=0x403ed80, econtext=0x3fb0eb8, isNull=0x403e0a1 "", isDone=) at execQual.c:2041 [...] diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index d68ca7a..3c3fce4 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -671,6 +671,7 @@ pg_get_viewdef_name_ext(PG_FUNCTION_ARGS) int prettyFlags; RangeVar *viewrel; Oid viewoid; + char *res; prettyFlags = pretty ? PRETTYFLAG_PAREN | PRETTYFLAG_INDENT : PRETTYFLAG_INDENT; @@ -678,7 +679,12 @@ pg_get_viewdef_name_ext(PG_FUNCTION_ARGS) viewrel = makeRangeVarFromNameList(textToQualifiedNameList(viewname)); viewoid = RangeVarGetRelid(viewrel, NoLock, false); - PG_RETURN_TEXT_P(string_to_text(pg_get_viewdef_worker(viewoid, prettyFlags, WRAP_COLUMN_DEFAULT))); + res = pg_get_viewdef_worker(viewoid, prettyFlags, WRAP_COLUMN_DEFAULT); + + if (res == NULL) + PG_RETURN_NULL(); + + PG_RETURN_TEXT_P(string_to_text(res)); } /* -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] crashes in RestoreSnapshot on hot standby
Running sqlsmith on a streaming slave (master as of f8c5855) is inconspicuous as long as the master is idle. As soon as I start it on the master as well, the standby frequently crashes in RestoreSnapshot. It doesn't seem to be specific to the queries, as they don't trigger a crash when re-run. Backtraces always look like the ones below. regards, Andreas BEGIN BACKTRACE OF CORE FILE ./slave/postgres.9826@.core ON doombat Core was generated by `postgres: smith regression [local] SELECT '. Program terminated with signal SIGSEGV, Segmentation fault. #0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:167 167 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Datei oder Verzeichnis nicht gefunden. (gdb) bt #0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:167 #1 0x00822032 in RestoreSnapshot (start_address=start_address@entry=0x7f2701d5a110 ) at snapmgr.c:2020 #2 0x004a934a in heap_beginscan_parallel (relation=0x2060a90, parallel_scan=parallel_scan@entry=0x7f2701d5a0f8) at heapam.c:1657 #3 0x005fbedf in ExecSeqScanInitializeDSM (node=0x1f5b470, pcxt=0x221af88) at nodeSeqscan.c:327 #4 0x005dd0ad in ExecParallelInitializeDSM (planstate=planstate@entry=0x1f5b470, d=d@entry=0x7ffd4ba200d0) at execParallel.c:245 #5 0x005dd425 in ExecInitParallelPlan (planstate=0x1f5b470, estate=estate@entry=0x1f5ab28, nworkers=2) at execParallel.c:477 #6 0x005ef4a4 in ExecGather (node=node@entry=0x1f5b048) at nodeGather.c:159 #7 0x005dda48 in ExecProcNode (node=node@entry=0x1f5b048) at execProcnode.c:515 #8 0x005f4b30 in ExecLimit (node=node@entry=0x1f5acd0) at nodeLimit.c:91 #9 0x005dd9d8 in ExecProcNode (node=node@entry=0x1f5acd0) at execProcnode.c:531 #10 0x005fef7c in ExecSetParamPlan (node=, econtext=0x1f5c138) at nodeSubplan.c:999 #11 0x005e28b5 in ExecEvalParamExec (exprstate=, econtext=, isNull=0x22045b0 "", isDone=) at execQual.c:1135 #12 0x005deb6d in ExecMakeFunctionResultNoSets (fcache=0x2204200, econtext=0x1f5c138, isNull=0x2203d98 "", isDone=) at execQual.c:2015 #13 0x005de29a in ExecEvalCoalesce (coalesceExpr=, econtext=0x1f5c138, isNull=0x2203d98 "", isDone=) at execQual.c:3446 #14 0x005deb6d in ExecMakeFunctionResultNoSets (fcache=0x22039e8, econtext=0x1f5c138, isNull=0x7ffd4ba203df "", isDone=) at execQual.c:2015 #15 0x005e4939 in ExecQual (qual=, econtext=econtext@entry=0x1f5c138, resultForNull=resultForNull@entry=0 '\000') at execQual.c:5269 #16 0x005faef1 in ExecResult (node=node@entry=0x1f5c020) at nodeResult.c:82 #17 0x005ddbf8 in ExecProcNode (node=node@entry=0x1f5c020) at execProcnode.c:392 #18 0x005d9c1f in ExecutePlan (dest=0x1ebb7d0, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, use_parallel_mode=, planstate=0x1f5c020, estate=0x1f5ab28) at execMain.c:1567 #19 standard_ExecutorRun (queryDesc=0x1f5a718, direction=, count=0) at execMain.c:338 #20 0x006f7238 in PortalRunSelect (portal=portal@entry=0x1d13be8, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x1ebb7d0) at pquery.c:946 #21 0x006f875e in PortalRun (portal=0x1d13be8, count=9223372036854775807, isTopLevel=, dest=0x1ebb7d0, altdest=0x1ebb7d0, completionTag=0x7ffd4ba20840 "") at pquery.c:787 #22 0x006f6003 in exec_simple_query (query_string=) at postgres.c:1094 #23 PostgresMain (argc=30489576, argv=0x1ecfb08, dbname=0x1cf5a00 "regression", username=0x1ecfc20 "\b\373\354\001") at postgres.c:4074 #24 0x0046ca67 in BackendRun (port=0x1d17b50) at postmaster.c:4262 #25 BackendStartup (port=0x1d17b50) at postmaster.c:3936 #26 ServerLoop () at postmaster.c:1693 #27 0x00690ab7 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1cf45e0) at postmaster.c:1301 #28 0x0046d9cd in main (argc=3, argv=0x1cf45e0) at main.c:228 (gdb) p debug_query_string $1 = 0x1d68a78 "select \n sample_0.j as c0\nfrom \n public.testjsonb as sample_0 tablesample system (8) \nwhere cast(coalesce(pg_catalog.char_length(\n cast((select comment from public.room limit 1 offset 20)\n as text)),\npg_catalog.pg_trigger_depth()) as integer) <> 3" BEGIN BACKTRACE OF CORE FILE ./slave/postgres.8104@.core ON marbit Core was generated by `postgres: bgworker: parallel worker for PID 2610 '. Program terminated with signal SIGSEGV, Segmentation fault. #0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:167 167 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Datei oder Verzeichnis nicht gefunden. (gdb) bt #0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:167 #1 0x00822032 in RestoreSnapshot (start_address=0x7f7b1ee4fa58 ) at snapmgr.c:2020 #2
Re: [HACKERS] [sqlsmith] crashes in RestoreSnapshot on hot standby
Amit Kapila writes: > On Fri, Jul 1, 2016 at 9:38 AM, Thomas Munro> wrote: >> Or maybe just like this? >> >> - snapshot->subxip = snapshot->xip + serialized_snapshot->xcnt; >> + snapshot->subxip = ((TransactionId *) (snapshot + 1)) + >> + serialized_snapshot->xcnt; >> > > This way it looks better to me. Thanks for the patch. I no longer see these crashes when testing with the patch applied. thanks, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] ERROR: plan should not reference subplan's variable
Updating master from f8c5855..1bdae16, sqlsmith triggers "failed to generate plan" errors again. Below is the smallest query logged so far. regards, Andreas -- ERROR: plan should not reference subplan's variable set force_parallel_mode = 'on'; set max_parallel_workers_per_gather = '1'; explain WITH jennifer_0 AS (select (select b from public.rtest_v1 limit 1 offset 5) as c0, pg_catalog.pg_current_xlog_location() as c1, sample_0.a as c2, sample_0.a as c3 from public.rtest_view4 as sample_0 tablesample system (5.9) where cast(null as bigint) = pg_catalog.hashinet( cast((select client_addr from pg_catalog.pg_stat_activity limit 1 offset 35) as inet)) limit 76) select ref_0.sl_name as c0 from public.shoelace as ref_0 where (cast(null as anyrange) < cast(null as anyrange)) and (EXISTS ( select 39 as c0 from jennifer_0 as ref_1 where cast(null as real) = cast(null as real) limit 81)) limit 96; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving RLS planning
Tom Lane writes: > Thanks for reviewing --- I'll do something with that test case and > push it. sqlsmith doesn't seem to like 215b43cdc: select 1 from information_schema.usage_privileges where information_schema._pg_keysequal( (select null::smallint[]), '{16,25,23}'); -- TRAP: FailedAssertion("!(!and_clause((Node *) clause))", File: "restrictinfo.c", Line: 81) regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Failed assertion in numeric aggregate
Hi, updating master from be7f7ee..39b691f, the following assertion is triggered frequently by sqlsmith: TRAP: BadArgument("!(((context) != ((void *)0) && (const Node*)((context)))->type) == T_AllocSetContext", File: "mcxt.c", Line: 1010) Digging in the coredumps, it looks like set_var_from_num() is invoked on an uninitialized NumericVar. Sample gdb session below. Below is also one of the generated queries that eventually triggers it for me when invoked a dozen times or so. regards, Andreas --8<---cut here---start->8--- select subq_0.c0 as c0, subq_0.c0 as c1, 5 as c2, (select pg_catalog.min(class) from public.f_star) as c3 from (select sample_2.cc as c0 from public.shoelace_arrive as ref_0 inner join public.hub as sample_1 right join public.e_star as sample_2 on (sample_1.name = sample_2.class ) on (ref_0.arr_name = sample_2.class ) limit 63) as subq_0 where ((subq_0.c0 is not NULL) and ((select pg_catalog.var_pop(enumsortorder) from pg_catalog.pg_enum) is not NULL)) and (((select pg_catalog.var_samp(random) from public.bt_txt_heap) is NULL) or ((select m from public.money_data limit 1 offset 1) <> (select pg_catalog.min(salary) from public.rtest_empmass) )); --8<---cut here---end--->8--- (gdb) bt #0 0x7ff011f221c8 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54 #1 0x7ff011f2364a in __GI_abort () at abort.c:89 #2 0x007ef1b1 in ExceptionalCondition (conditionName=conditionName@entry=0x9d26c8 "!(((context) != ((void *)0) && (const Node*)((context)))->type) == T_AllocSetContext", errorType=errorType@entry=0x835c25 "BadArgument", fileName=fileName@entry=0x9d2640 "mcxt.c", lineNumber=lineNumber@entry=1010) at assert.c:54 #3 0x00813561 in pfree (pointer=) at mcxt.c:1010 #4 0x00773169 in alloc_var (var=var@entry=0x7ffe3a6d18d0, ndigits=ndigits@entry=6) at numeric.c:5387 #5 0x00774230 in set_var_from_num (num=0x1e49180, dest=0x7ffe3a6d18d0) at numeric.c:5608 #6 0x0077be2c in numeric_poly_deserialize (fcinfo=) at numeric.c:4196 #7 0x005ec48c in combine_aggregates (aggstate=0x1e255d8, pergroup=) at nodeAgg.c:986 #8 0x005edcc5 in agg_retrieve_direct (aggstate=0x1e255d8) at nodeAgg.c:2095 #9 ExecAgg (node=node@entry=0x1e255d8) at nodeAgg.c:1837 #10 0x005e0078 in ExecProcNode (node=node@entry=0x1e255d8) at execProcnode.c:503 #11 0x0060173c in ExecSetParamPlan (node=, econtext=0x1e2e710) at nodeSubplan.c:995 #12 0x005e4f75 in ExecEvalParamExec (exprstate=, econtext=, isNull=0x7ffe3a6d1b3f "", isDone=) at execQual.c:1140 #13 0x005e14c6 in ExecEvalNullTest (nstate=0x1e2ec50, econtext=0x1e2e710, isNull=0x7ffe3a6d1b3f "", isDone=0x0) at execQual.c:3902 #14 0x005e0656 in ExecEvalOr (orExpr=, econtext=0x1e2e710, isNull=0x7ffe3a6d1b3f "", isDone=) at execQual.c:2809 #15 0x005e7089 in ExecQual (qual=, econtext=econtext@entry=0x1e2e710, resultForNull=resultForNull@entry=0 '\000') at execQual.c:5379 #16 0x005fd6b1 in ExecResult (node=node@entry=0x1e2e5f8) at nodeResult.c:82 #17 0x005e01f8 in ExecProcNode (node=node@entry=0x1e2e5f8) at execProcnode.c:392 #18 0x005dc27e in ExecutePlan (dest=0x7ff0129e22b0, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, use_parallel_mode=, planstate=0x1e2e5f8, estate=0x1e1aba8) at execMain.c:1567 #19 standard_ExecutorRun (queryDesc=0x1d563b8, direction=, count=0) at execMain.c:338 #20 0x006faad8 in PortalRunSelect (portal=portal@entry=0x1def878, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x7ff0129e22b0) at pquery.c:948 #21 0x006fc04e in PortalRun (portal=portal@entry=0x1def878, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x7ff0129e22b0, altdest=altdest@entry=0x7ff0129e22b0, completionTag=completionTag@entry=0x7ffe3a6d1fa0 "") at pquery.c:789 #22 0x006f8deb in exec_simple_query (query_string=0x1dc2e58 "select \n subq_0.c0 as c0, \n subq_0.c0 as c1, \n 5 as c2, \n (select pg_catalog.min(class) from public.f_star)\n as c3\nfrom \n (select \n sample_2.cc as c0\n from \npublic.shoel"...) at postgres.c:1094 #23 PostgresMain (argc=, argv=argv@entry=0x1d64730, dbname=0x1d64590 "regression", username=) at postgres.c:4070 #24 0x0046cf81 in BackendRun (port=0x1d4ffd0) at postmaster.c:4260 #25 BackendStartup (port=0x1d4ffd0) at postmaster.c:3934 #26 ServerLoop () at postmaster.c:1691 #27 0x00693634 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1d2c5d0) at postmaster.c:1299 #28 0x0046e0d6 in main (argc=3, argv=0x1d2c5d0) at main.c:228 (gdb) frame 5 #5 0x00774230 in
[HACKERS] [sqlsmith] Missing CHECK_FOR_INTERRUPTS in tsquery_rewrite
Hi, testing with sqlsmith yielded an uncancellable backend hogging CPU time. Gdb showed it was busy in findeq() of tsquery_rewrite.c. This function appears to have exponential complexity wrt. the size of the involved tsqueries. The following query runs for 12s on my machine with no way to cancel it and incrementing the length of the first argument by 1 doubles this time. select ts_rewrite( (select string_agg(i::text, '&')::tsquery from generate_series(1,32) g(i)), (select string_agg(i::text, '&')::tsquery from generate_series(1,19) g(i)), 'foo'); The attached patch adds a CHECK_FOR_INTERRUPTS to make it cancellable. regards, Andreas >From d9910a96c9bd73c16e29ecaa0577945d5e1c091c Mon Sep 17 00:00:00 2001 From: Andreas Seltenreich <seltenre...@gmx.de> Date: Sun, 30 Oct 2016 03:25:55 +0100 Subject: [PATCH] Add CHECK_FOR_INTERRUPTS in tsquery_rewrite loop. The loop in findeq() appears to have exponential complexity and runtime becomes excessive for more than about 30 tokens in the tsvectors. Add a CHECK_FOR_INTERRUPTS to make it cancellable. --- src/backend/utils/adt/tsquery_rewrite.c | 4 1 file changed, 4 insertions(+) diff --git a/src/backend/utils/adt/tsquery_rewrite.c b/src/backend/utils/adt/tsquery_rewrite.c index 28f328d..ef6444f 100644 --- a/src/backend/utils/adt/tsquery_rewrite.c +++ b/src/backend/utils/adt/tsquery_rewrite.c @@ -95,6 +95,10 @@ findeq(QTNode *node, QTNode *ex, QTNode *subs, bool *isfind) do { +/* This loop is rather heavyweight, it better be + * cancellable. */ +CHECK_FOR_INTERRUPTS(); + tnode->sign = 0; for (i = 0; i < ex->nchild; i++) { -- 2.9.3 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Failed assertion in TS_phrase_execute
Hi, the query below triggers an assertion in TS_phrase_execute. Testing was done on master at dbdfd11. regards, Andreas -- TRAP: FailedAssertion("!(curitem->qoperator.oper == 4)", File: "tsvector_op.c", Line: 1432) select 'moscow' @@ ts_rewrite('moscow', 'moscow', ts_rewrite( tsquery_phrase('moscow','moscow'), 'moscow', $$ 'sanct' & 'peter'$$)); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan
Hi, Ashutosh Sharma writes: >> the following query appears to reliably crash parallel workers on master >> as of 0832f2d. > As suggested, I have tried to reproduce this issue on *0832f2d* commit but > could not reproduce it. as Tom pointed out earlier, I had secretly set parallel_setup_cost and parallel_tuple_cost to 0. I assumed these were irrelevant when force_parallel_mode is on. I'll do less assuming and more testing on a vanilla install on future reports. Sorry for the inconvenience, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
Hi, just caught another InitPlan below Gather with the recent patches in (master as of 4cc6a3f). Recipe below. regards, andreas set max_parallel_workers_per_gather = 2; set min_parallel_relation_size = 0; set parallel_setup_cost = 0; set parallel_tuple_cost = 0; explain select 1 from public.quad_point_tbl as ref_0, lateral (select ref_0.p as c3, sample_0.d as c5 from public.nv_child_2010 as sample_0 left join public.mvtest_tvv as ref_1 on ('x'< (select contype from pg_catalog.pg_constraint limit 1)) limit 82) as subq_0; --QUERY PLAN -- -- Gather (cost=0.19..13727.52 rows=902246 width=4) --Workers Planned: 2 ---> Nested Loop (cost=0.19..13727.52 rows=902246 width=4) -- -> Parallel Seq Scan on quad_point_tbl ref_0 (cost=0.00..105.85 rows=4585 width=16) -- -> Limit (cost=0.19..1.33 rows=82 width=20) --InitPlan 1 (returns $0) -- -> Limit (cost=0.00..0.19 rows=1 width=1) ---> Gather (cost=0.00..10.22 rows=54 width=1) -- Workers Planned: 2 -- -> Parallel Seq Scan on pg_constraint (cost=0.00..10.22 rows=22 width=1) ---> Seq Scan on nv_child_2010 sample_0 (cost=0.00..35.50 rows=2550 width=20) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Crash on GUC serialization
Hi, sqlsmith just made a GUC that tricks the serialization code into dereferencing a nullpointer. Here's a recipe: --8<---cut here---start->8--- set min_parallel_relation_size to 0; set max_parallel_workers_per_gather to 2; set force_parallel_mode to on; begin; select set_config('foo.baz', null, b) from (values (false), (true)) g(b); commit; select 1; --8<---cut here---end--->8--- regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Crash on GUC serialization
Michael Paquier writes: > [2. text/plain; fix-guc-string-eval.patch] I'm afraid taking care of the length computation is not sufficient. ISTM like it'll still try to serialize the NULL pointer later on in serialize_variable: ,[ guc.c:9108 ] | case PGC_STRING: | { | struct config_string *conf = (struct config_string *) gconf; | do_serialize(destptr, maxbytes, "%s", *conf->variable); ` -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Parallel worker crash on seqscan
Hi, the following query appears to reliably crash parallel workers on master as of 0832f2d. --8<---cut here---start->8--- set max_parallel_workers_per_gather to 2; set force_parallel_mode to 1; select subq.context from pg_settings, lateral (select context from pg_opclass limit 1) as subq limit 1; --8<---cut here---end--->8--- Backtrace of a worker below. regards, Andreas Core was generated by `postgres: bgworker: parallel worker for PID 27448'. Program terminated with signal SIGSEGV, Segmentation fault. #0 MakeExpandedObjectReadOnlyInternal (d=0) at expandeddatum.c:100 100 if (!VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(d))) (gdb) bt #0 MakeExpandedObjectReadOnlyInternal (d=0) at expandeddatum.c:100 #1 0x563b0c9a4e98 in ExecTargetList (tupdesc=, isDone=0x7ffdd20400ac, itemIsDone=0x563b0e6a8b50, isnull=0x563b0e6a8ae0 "", values=0x563b0e6a8ac0, econtext=0x563b0e6a7db8, targetlist=0x563b0e6a8498) at execQual.c:5491 #2 ExecProject (projInfo=projInfo@entry=0x563b0e6a8368, isDone=isDone@entry=0x7ffdd20400ac) at execQual.c:5710 #3 0x563b0c9a514f in ExecScan (node=node@entry=0x563b0e6a8038, accessMtd=accessMtd@entry=0x563b0c9bc910 , recheckMtd=recheckMtd@entry=0x563b0c9bc900 ) at execScan.c:220 #4 0x563b0c9bc9c3 in ExecSeqScan (node=node@entry=0x563b0e6a8038) at nodeSeqscan.c:127 #5 0x563b0c99d6e8 in ExecProcNode (node=node@entry=0x563b0e6a8038) at execProcnode.c:419 #6 0x563b0c9995be in ExecutePlan (dest=0x563b0e67da40, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, use_parallel_mode=, planstate=0x563b0e6a8038, estate=0x563b0e6a77f8) at execMain.c:1567 #7 standard_ExecutorRun (queryDesc=0x563b0e6a2828, direction=, count=0) at execMain.c:338 #8 0x563b0c99c911 in ParallelQueryMain (seg=, toc=0x7f55173aa000) at execParallel.c:745 #9 0x563b0c898b02 in ParallelWorkerMain (main_arg=) at parallel.c:1108 #10 0x563b0ca49cad in StartBackgroundWorker () at bgworker.c:726 #11 0x563b0ca55770 in do_start_bgworker (rw=0x563b0e621080) at postmaster.c:5535 #12 maybe_start_bgworker () at postmaster.c:5710 #13 0x563b0ca56238 in sigusr1_handler (postgres_signal_arg=) at postmaster.c:4959 #14 #15 0x7f5516788293 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84 #16 0x563b0c818249 in ServerLoop () at postmaster.c:1665 #17 0x563b0ca577e2 in PostmasterMain (argc=3, argv=0x563b0e5fa490) at postmaster.c:1309 #18 0x563b0c819f6d in main (argc=3, argv=0x563b0e5fa490) at main.c:228 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
Hi, the query below triggers a parallel worker assertion for me when run on the regression database of master as of 0832f2d. The plan sports a couple of InitPlan nodes below Gather. regards, Andreas Gather (cost=1.64..84.29 rows=128 width=4) Workers Planned: 1 Single Copy: true -> Limit (cost=1.64..84.29 rows=128 width=4) -> Subquery Scan on subq_0 (cost=1.64..451.06 rows=696 width=4) Filter: (subq_0.c6 IS NOT NULL) -> Nested Loop Left Join (cost=1.64..444.07 rows=699 width=145) Join Filter: (sample_0.aa = sample_1.pageno) InitPlan 4 (returns $3) -> Result (cost=1.21..5.36 rows=15 width=0) One-Time Filter: ($0 AND ($1 = $2)) InitPlan 1 (returns $0) -> Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false InitPlan 2 (returns $1) -> Limit (cost=0.35..0.52 rows=1 width=4) -> Gather (cost=0.00..1.04 rows=6 width=4) Workers Planned: 1 -> Parallel Seq Scan on reltime_tbl (cost=0.00..1.04 rows=4 width=4) InitPlan 3 (returns $2) -> Limit (cost=0.52..0.69 rows=1 width=4) -> Gather (cost=0.00..1.04 rows=6 width=4) Workers Planned: 1 -> Parallel Seq Scan on reltime_tbl reltime_tbl_1 (cost=0.00..1.04 rows=4 width=4) -> Sample Scan on pg_foreign_data_wrapper sample_2 (cost=1.21..5.36 rows=15 width=0) Sampling: system ('3.1'::real) -> Nested Loop (cost=0.15..382.85 rows=699 width=4) -> Sample Scan on pg_largeobject sample_1 (cost=0.00..209.03 rows=699 width=8) Sampling: bernoulli ('2.9'::real) Filter: (pageno IS NOT NULL) -> Index Only Scan using pg_foreign_table_relid_index on pg_foreign_table ref_0 (cost=0.15..0.24 rows=1 width=4) Index Cond: (ftrelid = sample_1.loid) -> Materialize (cost=0.00..16.06 rows=4 width=4) -> Append (cost=0.00..16.04 rows=4 width=4) -> Sample Scan on a sample_0 (cost=0.00..4.01 rows=1 width=4) Sampling: system ('5'::real) -> Sample Scan on b sample_0_1 (cost=0.00..4.01 rows=1 width=4) Sampling: system ('5'::real) -> Sample Scan on c sample_0_2 (cost=0.00..4.01 rows=1 width=4) Sampling: system ('5'::real) -> Sample Scan on d sample_0_3 (cost=0.00..4.01 rows=1 width=4) Sampling: system ('5'::real) --8<---cut here---start->8--- set force_parallel_mode to on; set max_parallel_workers_per_gather to 2; select 91 as c0 from (select (select pfname from public.pslot limit 1 offset 3) as c0, ref_1.grandtot as c1, (select pg_catalog.min(procost) from pg_catalog.pg_proc) as c2, ref_0.ftoptions as c3, ref_1.grandtot as c4, sample_1.loid as c5, pg_catalog.pg_rotate_logfile() as c6, (select random from public.hash_i4_heap limit 1 offset 5) as c7, sample_1.loid as c8 from public.a as sample_0 tablesample system (5) right join pg_catalog.pg_largeobject as sample_1 tablesample bernoulli (2.9) inner join pg_catalog.pg_foreign_table as ref_0 on (sample_1.loid = ref_0.ftrelid ) on (sample_0.aa = sample_1.pageno ) left join public.mvtest_tvv as ref_1 on (EXISTS ( select sample_2.fdwoptions as c0, sample_2.fdwhandler as c1, (select during from public.test_range_excl limit 1 offset 89) as c2 from pg_catalog.pg_foreign_data_wrapper as sample_2 tablesample system (3.1) where (EXISTS ( select sample_3.b as c0, (select grantee from information_schema.udt_privileges limit 1 offset 4) as c1, sample_3.b as c2, sample_3.rf_a as c3, sample_3.b as c4,
[HACKERS] [sqlsmith] Infinite recursion in bitshift
Hi, sqlsmith just found another crasher: select bit '1' >> (-2^31)::int; This is due to an integer overflow in bitshiftright()/bitshiftleft() leading to them recursively calling each other. Patch attached. regards, Andreas >From cfdc425f75da268e1c2af08f936c59f34b69e577 Mon Sep 17 00:00:00 2001 From: Andreas Seltenreich <seltenre...@gmx.de> Date: Fri, 14 Oct 2016 20:52:52 +0200 Subject: [PATCH] Fix possible infinite recursion on bitshift. bitshiftright() and bitshiftleft() would recursively call each other infinitely when the user passed a MIN_INT for the shift amount due to an integer overflow. This patch reduces a negative shift amount to -VARBITMAXLEN to avoid overflow. --- src/backend/utils/adt/varbit.c | 12 1 file changed, 12 insertions(+) diff --git a/src/backend/utils/adt/varbit.c b/src/backend/utils/adt/varbit.c index 75e6a46..d8ecfb6 100644 --- a/src/backend/utils/adt/varbit.c +++ b/src/backend/utils/adt/varbit.c @@ -1387,9 +1387,15 @@ bitshiftleft(PG_FUNCTION_ARGS) /* Negative shift is a shift to the right */ if (shft < 0) + { + /* Protect against overflow */ + if (shft < -VARBITMAXLEN) + shft = -VARBITMAXLEN; + PG_RETURN_DATUM(DirectFunctionCall2(bitshiftright, VarBitPGetDatum(arg), Int32GetDatum(-shft))); + } result = (VarBit *) palloc(VARSIZE(arg)); SET_VARSIZE(result, VARSIZE(arg)); @@ -1447,9 +1453,15 @@ bitshiftright(PG_FUNCTION_ARGS) /* Negative shift is a shift to the left */ if (shft < 0) + { + /* Protect against overflow */ + if (shft < -VARBITMAXLEN) + shft = -VARBITMAXLEN; + PG_RETURN_DATUM(DirectFunctionCall2(bitshiftleft, VarBitPGetDatum(arg), Int32GetDatum(-shft))); + } result = (VarBit *) palloc(VARSIZE(arg)); SET_VARSIZE(result, VARSIZE(arg)); -- 2.9.3 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Infinite recursion in bitshift
Tom Lane writes: >> This is due to an integer overflow in bitshiftright()/bitshiftleft() >> leading to them recursively calling each other. Patch attached. > > Seems sane, though I wonder if it'd be better to use -INT_MAX rather > than -VARBITMAXLEN. I am undecided between those two. -INT_MAX might be a more precise fix for the problem, but the extra distance to the danger zone was kind of soothing :-). regards, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers