Re: [HACKERS] SQL procedures
On Wed, Nov 8, 2017 at 11:03 AM, Peter Eisentraut wrote:
> On 11/8/17 11:11, Merlin Moncure wrote:
>> On Wed, Nov 8, 2017 at 9:13 AM, Peter Eisentraut wrote:
>>> I have already submitted a separate patch that addresses these questions.
>>
>> Maybe I'm obtuse, but I'm not seeing it? I'm very interested in the
>> general approach to transaction management; if you've described it in
>> the patch I'll read it there. Thanks for doing this.
>
> https://www.postgresql.org/message-id/178d3380-0fae-2982-00d6-c43100bc8...@2ndquadrant.com

All right, thanks. So,

*) Are you sure you want to go the SPI route? 'sql' language (non-SPI)
procedures might be simpler from an implementation standpoint and would
not need any language adjustments.

*) Is it possible to jump into SPI without having a snapshot already set
up? For example, if I wanted to set the isolation level in a procedure,
would I be impacted by this error?

ERROR: SET TRANSACTION ISOLATION LEVEL must be called before any query

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
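To make the second question concrete, here is a sketch of the scenario
(procedure syntax per the patch under discussion; the procedure name and
body are illustrative, not taken from the patch):

```sql
-- Illustrative sketch only: if SPI has already taken a snapshot by the
-- time the procedure body runs, the SET here would presumably fail with
-- the error quoted above.
CREATE PROCEDURE set_iso() LANGUAGE plpgsql AS $$
BEGIN
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
END;
$$;

CALL set_iso();
-- anticipated failure mode:
--   ERROR: SET TRANSACTION ISOLATION LEVEL must be called before any query
```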
Re: [HACKERS] SQL procedures
On Wed, Nov 8, 2017 at 9:13 AM, Peter Eisentraut wrote:
> I have already submitted a separate patch that addresses these questions.

Maybe I'm obtuse, but I'm not seeing it? I'm very interested in the
general approach to transaction management; if you've described it in
the patch I'll read it there. Thanks for doing this.

merlin
Re: [HACKERS] SQL procedures
On Tue, Oct 31, 2017 at 12:23 PM, Peter Eisentraut wrote:
> - Transaction control in procedure bodies

This feature is really key, since it enables via SQL lots of things that
are not possible without external coding, including:

*) very long running processes in a single routine
*) transaction isolation control inside the procedure (currently the
   client application has to declare this)
*) certain error handling cases that otherwise require client side support
*) simple in-database threading
*) simple construction of daemon scripts (yeah, you can use a bgworker
   for this, but a pure SQL daemon with a cron heartbeat hook is hard to
   beat for simplicity)

I do wonder how transaction control could be added later. The last time
I (lightly) looked at this, I was starting to think that working
transaction control into the SPI interface was the wrong approach;
pl/pgsql would have to adopt a very different set of behaviors depending
on whether it was called from a function or a procedure. If you
restricted the language choice to pure SQL, you could work around this
problem; SPI languages would be totally abstracted from those
considerations, and you could always call an arbitrary-language function
if you needed to. SQL has no flow control, but I'm not too concerned
about that.

merlin
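For illustration, a "pure SQL daemon" along the lines described might
look something like this, assuming procedures eventually gain in-body
transaction control (the COMMIT-in-loop syntax is an assumption about
the eventual design, and work_queue is a hypothetical table):

```sql
CREATE PROCEDURE process_queue() LANGUAGE plpgsql AS $$
BEGIN
  LOOP
    -- consume a batch of work
    DELETE FROM work_queue
    WHERE id IN (SELECT id FROM work_queue ORDER BY id LIMIT 100);
    COMMIT;              -- hypothetical: end the transaction mid-procedure
    PERFORM pg_sleep(1); -- heartbeat
  END LOOP;
END;
$$;
```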
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
On Tue, Sep 19, 2017 at 1:37 PM, Robert Haas wrote:
> On Tue, Sep 19, 2017 at 12:45 PM, Pavel Stehule wrote:
>>> You can already set a GUC with function scope. I'm not getting your
>>> point.
>>
>> yes, it is true. But implementation of #option is limited to PLpgSQL - so
>> there is not any too much questions - GUC is global - there is lot of
>> points:
>>
>> * what is correct impact on PREPARE
>> * what is correct impact on EXECUTE
>> * what should be done if this GUC is changed ..
>
> For better or for worse, as a project we've settled on GUCs as a way
> to control behavior. I think it makes more sense to try to apply that
> option to new behaviors we want to control than to invent some new
> system.

This seems very sensible. We also have infrastructure at the SQL level
(SET) to manage GUCs. Tom upthread (for pretty good reasons) discussed
extending SET to pl/pgsql-specific scoping, but TBH I'm struggling as to
why we need to implement new syntax for this; the only thing missing is
being able to scope SET statements to a code block, FWICT.

merlin
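For reference, the function-scoped GUC facility Robert mentions already
exists via CREATE FUNCTION's SET clause; a minimal example:

```sql
-- work_mem is set on entry to f() and automatically restored to its
-- previous value when f() returns, even on error
CREATE FUNCTION f() RETURNS bigint
LANGUAGE sql
SET work_mem = '256MB'
AS $$ SELECT count(*) FROM generate_series(1, 1000) $$;
```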
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
On Fri, Sep 8, 2017 at 2:48 PM, Pavel Stehule wrote:
>
> 2017-09-08 21:21 GMT+02:00 Daniel Gustafsson :
>>
>> > On 08 Sep 2017, at 19:14, Simon Riggs wrote:
>> >
>> > On 6 September 2017 at 07:43, Robert Haas wrote:
>> >
>> >> LET custom_plan_tries = 0 IN SELECT ...
>> >
>> > Tom has pointed me at this proposal, since on another thread I asked
>> > for something very similar. (No need to reprise that discussion, but I
>> > wanted prepared queries to be able to do SET work_mem = X; SELECT).
>> > This idea looks a good way forward to me.
>> >
>> > Since we're all in roughly the same place, I'd like to propose that we
>> > proceed with the following syntax... whether or not this precisely
>> > solves OP's issue on this thread.
>> >
>> > 1. Allow SET to set multiple parameters...
>> > SET guc1 = x, guc2 = y
>> > This looks fairly straightforward
>> >
>> > 2. Allow a SET to apply only for a single statement
>> > SET guc1 = x, guc2 = y FOR stmt
>> > e.g. SET max_parallel_workers = 4 FOR SELECT count(*) FROM bigtable
>> > Internally a GUC setting already exists for a single use, via
>> > GUC_ACTION_SAVE, so we just need to invoke it.
>>
>> This syntax proposal makes sense, +1. My immediate thought was that the
>> per-statement GUCs were sort of like options, and most options in our
>> syntax are enclosed with (), like: SET (guc1 = x, guc2 = y) FOR SELECT ..;
>
> we newer support this syntax in combination with SET keyword
>
> see - CREATE FUNCTION command
>
> personally I prefer syntax without FOR keyword - because following keyword
> must be reserved keyword
>
> SET x = .., y = .. SELECT ... ;

This seems pretty ugly from a syntax perspective. We already have 'SET
LOCAL', which scopes the setting to the current transaction. How about
SET BLOCK, which would apply until you've left the current statement
block?

merlin
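For comparison, the existing transaction-scoped form next to the
proposed statement-scoped form (the FOR syntax is the proposal under
discussion, not implemented; bigtable is a placeholder name):

```sql
-- today: scoped to the enclosing transaction
BEGIN;
SET LOCAL max_parallel_workers = 4;  -- reverts at COMMIT/ROLLBACK
SELECT count(*) FROM bigtable;
COMMIT;

-- proposed: scoped to a single statement (illustrative syntax only)
-- SET max_parallel_workers = 4 FOR SELECT count(*) FROM bigtable;
```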
Re: [HACKERS] WIP: Aggregation push-down
On Thu, Aug 17, 2017 at 10:22 AM, Antonin Houska wrote:
> Antonin Houska wrote:
> output type. For other aggregates (like avg()) the remote nodes will have to
> return the transient state value in an appropriate form (maybe bytea type),
> which does not depend on PG version.

Hm, that seems like an awful lot of work (a new version-agnostic
serialization format) for very little benefit (version-independent type
serialization for remote aggregate pushdown). How about forcing the
version to be the same for the feature to be used?

merlin
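The avg() case can be seen in the catalogs: the transition state is not
the aggregate's result type, which is why a remote node can't simply
return a finished value (query sketch; exact output varies by version):

```sql
SELECT aggtransfn, aggtranstype::regtype, aggfinalfn
FROM pg_aggregate
WHERE aggfnoid = 'avg(int4)'::regprocedure;
-- the transition state is a bigint array carrying the running sum and
-- count; only the final function turns that pair into the numeric result
```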
Re: [HACKERS] Add Roman numeral conversion to to_number
On Mon, Aug 14, 2017 at 2:48 PM, Peter Eisentraut wrote:
> On 8/3/17 13:45, Robert Haas wrote:
>> On Thu, Aug 3, 2017 at 9:25 AM, Oliver Ford wrote:
>>> Adds to the to_number() function the ability to convert Roman numerals
>>> to a number. This feature is on the formatting.c TODO list. It is not
>>> currently implemented in either Oracle, MSSQL or MySQL so gives
>>> PostgreSQL an edge :-)
>>
>> I kind of put my head in my hands when I saw this. I'm not really
>> sure it's worth complicating the code for something that has so little
>> practical utility, but maybe other people will feel differently.
>
> I can't get excited about it. to_number() and such usually mirror the
> Oracle implementation, so having something that is explicitly not in
> Oracle goes a bit against its mission.
>
> One of the more interesting features of to_number/to_char is that it has
> a bunch of facilities for formatting decimal points, leading/trailing
> zeros, filling in spaces and signs, and so on. None of that applies
> naturally to Roman numerals, so there isn't a strong case for including
> that into these functions, when a separate function or module could do.

Well, doesn't that also apply to scientific notation? 'RN' is documented
as an accepted formatting string, and nowhere does it mention that it
only works for output. So we ought to either fix it or at least document
that it does not work. It's nothing but a curio obviously, but it's kind
of cool IMO.

merlin
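For context, the asymmetry being discussed (behavior as of this thread;
the to_number() direction is the part the patch proposes to add):

```sql
-- output direction works today; 'RN' right-justifies the result,
-- hence the trim():
SELECT trim(to_char(2017, 'RN'));  -- MMXVII

-- input direction is what the patch adds; without it, 'RN' in
-- to_number() does not produce a usable result:
-- SELECT to_number('MMXVII', 'RN');
```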
Re: [HACKERS] emergency outage requiring database restart
On Thu, Aug 10, 2017 at 12:01 PM, Ants Aasma wrote:
> On Wed, Jan 18, 2017 at 4:33 PM, Merlin Moncure wrote:
>> On Wed, Jan 18, 2017 at 4:11 AM, Ants Aasma wrote:
>>> On Wed, Jan 4, 2017 at 5:36 PM, Merlin Moncure wrote:
>>>> Still getting checksum failures. Over the last 30 days, I see the
>>>> following. Since enabling checksums FWICT none of the damage is
>>>> permanent and rolls back with the transaction. So creepy!
>>>
>>> The checksums still only differ in least significant digits which
>>> pretty much means that there is a block number mismatch. So if you
>>> rule out filesystem not doing its job correctly and transposing
>>> blocks, it could be something else that is resulting in blocks getting
>>> read from a location that happens to differ by a small multiple of
>>> page size. Maybe somebody is racily mucking with table fd's between
>>> seeking and reading. That would explain the issue disappearing after a
>>> retry.
>>>
>>> Maybe you can arrange for the RelFileNode and block number to be
>>> logged for the checksum failures and check what the actual checksums
>>> are in data files surrounding the failed page. If the requested block
>>> number contains something completely else, but the page that follows
>>> contains the expected checksum value, then it would support this
>>> theory.
>>
>> will do. Main challenge is getting hand compiled server to swap in
>> so that libdir continues to work. Getting access to the server is
>> difficult as is getting a maintenance window. I'll post back ASAP.
>
> As a new datapoint, we just had a customer with an issue that I think
> might be related. The issue was reasonably repeatable by running a
> report on the standby system. Issue manifested itself by first "could
> not open relation" and/or "column is not in index" errors, followed a
> few minutes later by a PANIC from startup process due to "specified
> item offset is too large", "invalid max offset number" or "page X of
> relation base/16384/1259 is uninitialized". I took a look at the xlog
> dump and it was completely fine. For instance in the "specified item
> offset is too large" case there was a INSERT_LEAF redo record
> inserting the preceding offset just a couple hundred kilobytes back.
> Restarting the server sometimes successfully applied the offending
> WAL, sometimes it failed with other corruption errors. The offending
> relations were always pg_class or pg_class_oid_index. Replacing plsh
> functions with dummy plpgsql functions made the problem go away,
> reintroducing plsh functions made it reappear.

Fantastic. I was never able to attempt to apply the O_CLOEXEC patch (see
upthread) due to the fact that access to the system is highly limited
and compiling a replacement binary was a bit of a headache. IIRC this
was the best theory on the table as to the underlying cause and we ought
to try that first, right?

Reminder: I was able to completely eliminate all damage (but had to
handle the occasional unexpected rollback) via enabling checksums.

merlin
Re: [HACKERS] 10 beta docs: different replication solutions
On Sun, Jul 30, 2017 at 8:34 PM, Steve Singer wrote:
>
> We don't seem to describe logical replication on
>
> https://www.postgresql.org/docs/10/static/different-replication-solutions.html
>
> The attached patch adds a section.

This is a good catch. Two quick observations:

1) Super pedantic point. I don't like the 'repl.' abbreviation in the
'most common implementation' both for the existing hs/sr and for the
newly added logical.

2) This lingo:

+     Logical replication allows the data changes from individual tables
+     to be replicated. Logical replication doesn't require a particular server
+     to be designated as a master or a slave but allows data to flow in multiple
+     directions. For more information on logical replication, see .

is good, but I would revise it just a bit to emphasize the subscription
nature of logical replication, to link the concepts expressed strongly
in the main section. For example:

    Logical replication allows the data changes [remove: "from individual
    tables to be replicated"] to be published to subscriber nodes. Data
    can flow in any direction between nodes on a per-table basis; there
    is no concept of a master server. Conflict resolution must be handled
    completely by the application. For more information on...

What do you think?

merlin
Re: [HACKERS] JSONB - JSONB operator feature request
On Tue, Jul 18, 2017 at 12:49 PM, David Fetter wrote:
> On Tue, Jul 18, 2017 at 01:36:32PM +0200, david.tu...@linuxbox.cz wrote:
>> Hi,
>>
>> some users and me used hstore - hstore for example storing only changed
>> rows in trigger like:
>>
>> hsore(NEW) - hstore(OLD)
>>
>> There isn't same operator/function in JSON/JSONB. We can only remove keys
>> from JSONB, but not equal key-value pairs. Is there any chance to have
>> same feature with JSON/JSONB in postgres core?
>
> Here's one slightly modified from
> http://coussej.github.io/2016/05/24/A-Minus-Operator-For-PostgreSQLs-JSONB/
>
> CREATE OR REPLACE FUNCTION jsonb_minus ( arg1 jsonb, arg2 jsonb )
> RETURNS jsonb
> LANGUAGE sql
> AS $$
> SELECT
>     COALESCE(json_object_agg(
>         key,
>         CASE
>             -- if the value is an object and the value of the second argument is
>             -- not null, we do a recursion
>             WHEN jsonb_typeof(value) = 'object' AND arg2 -> key IS NOT NULL
>             THEN jsonb_minus(value, arg2 -> key)
>             -- for all the other types, we just return the value
>             ELSE value
>         END
>     ), '{}')::jsonb
> FROM
>     jsonb_each(arg1)
> WHERE
>     arg1 -> key IS DISTINCT FROM arg2 -> key
> $$;
>
> CREATE OPERATOR - (
>     PROCEDURE = jsonb_minus,
>     LEFTARG   = jsonb,
>     RIGHTARG  = jsonb
> );
>
> I suspect that there's a faster way to do the jsonb_minus function
> internally.

Yes, please! I also sorely miss the hstore 'slice' function, which is
very similar. The main remaining disadvantage with jsonb WRT hstore is
that you can't do simple retransformations that these operations allow
for. Too often you end up doing multiple '->' operations against the
same object followed by a rebundling, which is a real performance
killer. I understand that there are more edge cases due to the flexible
json structure, but I'd be quite happy returning NULL or erroring when
you can't arrive at a sensible extraction.
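To illustrate what the quoted jsonb_minus sketch does (assuming the
function as defined above): pairs equal in both arguments are dropped,
and nested objects are diffed recursively.

```sql
SELECT jsonb_minus('{"a": 1, "b": 2, "c": {"x": 1, "y": 2}}'::jsonb,
                   '{"a": 1, "c": {"x": 1}}'::jsonb);
-- => {"b": 2, "c": {"y": 2}}
```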
merlin
Re: [BUGS] [HACKERS] Segmentation fault in libpq
On Thu, Jun 29, 2017 at 9:12 AM, Tom Lane wrote:
> Merlin Moncure writes:
>> On Thu, Jun 29, 2017 at 8:23 AM, Michal Novotny wrote:
>>> Could you please help me based on information provided above?
>
>> You might want to run your code through some analysis tools (for
>> example, valgrind).
>
> valgrind is not a perfect tool for finding that kind of problem,
> especially if you can't reproduce the crash reliably; but at least
> valgrind is readily available and easy to use, so you might as
> well start there and see if it finds anything. If you have access
> to any sort of static analysis tool (eg, Coverity), that might be
> more likely to help. Or you could fall back on manual code
> auditing, if the program isn't very big.

The clang static analyzer is another good tool to check out:
https://clang-analyzer.llvm.org/

merlin
Re: [HACKERS] Segmentation fault in libpq
On Thu, Jun 29, 2017 at 8:23 AM, Michal Novotny wrote: > Hi, > > comments inline ... > > > > On 06/29/2017 03:08 PM, Merlin Moncure wrote: >> >> On Thu, Jun 29, 2017 at 4:01 AM, Michal Novotny >> wrote: >>> >>> Hi all, >>> >>> we've developed an application using libpq to access a table in the PgSQL >>> database but we're sometimes experiencing segmentation fault on >>> resetPQExpBuffer() function of libpq called from PQexecParams() with >>> prepared query. >>> >>> PostgreSQL version is 9.6.3 and the backtrace is: >>> >>> Core was generated by `/usr/ti/bin/status-monitor2 -m >>> /usr/lib64/status-monitor2/modules'. >>> Program terminated with signal 11, Segmentation fault. >>> #0 resetPQExpBuffer (str=str@entry=0x9f4a28) at pqexpbuffer.c:152 >>> 152 str->data[0] = '\0'; >>> >>> Thread 1 (Thread 0x7fdf68de3840 (LWP 3525)): >>> #0 resetPQExpBuffer (str=str@entry=0x9f4a28) at pqexpbuffer.c:152 >>> No locals. >>> #1 0x7fdf66e0333d in PQsendQueryStart (conn=conn@entry=0x9f46d0) at >>> fe-exec.c:1371 >>> No locals. >>> #2 0x7fdf66e044b9 in PQsendQueryParams (conn=conn@entry=0x9f46d0, >>> command=command@entry=0x409a98 "SELECT min, hour, day, month, dow, >>> sensor, >>> module, params, priority, rt_due FROM sm.cron WHERE sensor = $1 ORDER BY >>> priority DESC", nParams=nParams@entry=1, paramTypes=paramTypes@entry=0x0, >>> paramValues=paramValues@entry=0xa2b7b0, >>> paramLengths=paramLengths@entry=0x0, >>> paramFormats=paramFormats@entry=0x0, resultFormat=resultFormat@entry=0) >>> at >>> fe-exec.c:1192 >>> No locals. >>> #3 0x7fdf66e0552b in PQexecParams (conn=0x9f46d0, command=0x409a98 >>> "SELECT min, hour, day, month, dow, sensor, module, params, priority, >>> rt_due >>> FROM sm.cron WHERE sensor = $1 ORDER BY priority DESC", nParams=1, >>> paramTypes=0x0, paramValues=0xa2b7b0, paramLengths=0x0, paramFormats=0x0, >>> resultFormat=0) at fe-exec.c:1871 >>> No locals. >>> >>> Unfortunately we didn't have more information from the crash, at least >>> for >>> now. 
>>> >>> Is this a known issue and can you help me with this one? >> >> Is your application written in C? We would need to completely rule >> out your code (say, by double freeing result or something else nasty) >> before assuming problem was withing libpq itself, particularly in this >> area of the code. How reproducible is the problem? >> >> merlin > > > The application is written in plain C. The issue is it happens just > sometimes - sometimes it happens and sometimes it doesn't. Once it happens > it causes the application crash but as it's systemd unit with > Restart=on-failure flag it's automatically being restarted. > > What's being done is: > 1) Ensure connection already exists and create a new one if it doesn't exist > yet > 2) Run PQexecParams() with specified $params that has $params_cnt elements: > > res = PQexecParams(conn, prepared_query, params_cnt, NULL, (const char > **)params, NULL, NULL, 0); > > 3) Check for result and report error and exit if "PQresultStatus(res) != > PGRES_TUPLES_OK" > 4) Do some processing with the result > 5) Clear result using PQclear() > > It usually works fine but sometimes it's crashing and I don't know how to > investigate further. > > Could you please help me based on information provided above? You might want to run your code through some analysis tools (for example, valgrind). Short of that, to get help here you need to post the code for review. How big is your application? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Segmentation fault in libpq
On Thu, Jun 29, 2017 at 4:01 AM, Michal Novotny wrote: > Hi all, > > we've developed an application using libpq to access a table in the PgSQL > database but we're sometimes experiencing segmentation fault on > resetPQExpBuffer() function of libpq called from PQexecParams() with > prepared query. > > PostgreSQL version is 9.6.3 and the backtrace is: > > Core was generated by `/usr/ti/bin/status-monitor2 -m > /usr/lib64/status-monitor2/modules'. > Program terminated with signal 11, Segmentation fault. > #0 resetPQExpBuffer (str=str@entry=0x9f4a28) at pqexpbuffer.c:152 > 152 str->data[0] = '\0'; > > Thread 1 (Thread 0x7fdf68de3840 (LWP 3525)): > #0 resetPQExpBuffer (str=str@entry=0x9f4a28) at pqexpbuffer.c:152 > No locals. > #1 0x7fdf66e0333d in PQsendQueryStart (conn=conn@entry=0x9f46d0) at > fe-exec.c:1371 > No locals. > #2 0x7fdf66e044b9 in PQsendQueryParams (conn=conn@entry=0x9f46d0, > command=command@entry=0x409a98 "SELECT min, hour, day, month, dow, sensor, > module, params, priority, rt_due FROM sm.cron WHERE sensor = $1 ORDER BY > priority DESC", nParams=nParams@entry=1, paramTypes=paramTypes@entry=0x0, > paramValues=paramValues@entry=0xa2b7b0, paramLengths=paramLengths@entry=0x0, > paramFormats=paramFormats@entry=0x0, resultFormat=resultFormat@entry=0) at > fe-exec.c:1192 > No locals. > #3 0x7fdf66e0552b in PQexecParams (conn=0x9f46d0, command=0x409a98 > "SELECT min, hour, day, month, dow, sensor, module, params, priority, rt_due > FROM sm.cron WHERE sensor = $1 ORDER BY priority DESC", nParams=1, > paramTypes=0x0, paramValues=0xa2b7b0, paramLengths=0x0, paramFormats=0x0, > resultFormat=0) at fe-exec.c:1871 > No locals. > > Unfortunately we didn't have more information from the crash, at least for > now. > > Is this a known issue and can you help me with this one? Is your application written in C? 
We would need to completely rule out your code (say, by double freeing a
result or something else nasty) before assuming the problem is within
libpq itself, particularly in this area of the code. How reproducible is
the problem?

merlin
Re: [HACKERS] lag(bigint,int,int), etc?
On Tue, Jun 27, 2017 at 10:12 AM, Colin 't Hart wrote:
> On 27 Jun 2017, at 17:06, Merlin Moncure wrote:
>>
>>> On Tue, Jun 27, 2017 at 10:01 AM, Colin 't Hart wrote:
>>> Hi,
>>>
>>> The following rather contrived example illustrates that lag(), lead()
>>> (and probably other functions) can't automatically cast an integer to
>>> a bigint:
>>>
>>> select lag(sum,1,0) over () from (select sum(generate_series) over
>>> (order by generate_series) from generate_series(1,10)) x;
>>> ERROR: function lag(bigint, integer, integer) does not exist
>>> LINE 1: select lag(sum,1,0) over () from (select sum(generate_series...
>>>                ^
>>> HINT: No function matches the given name and argument types. You
>>> might need to add explicit type casts.
>>>
>>> I guess this is because the lag() and lead() functions take any type,
>>> and hence the default must be of the same type.
>>> This had me stumped for a few while until I realised that the types
>>> were different.
>>>
>>> Would there be any way to implement an automatic conversion?
>>>
>>> On the off-chance that this is actually a bug, this is on 9.6.3, but
>>> it also occurs on 9.3.17
>>
>> Why not cast the arguments? The first and the third argument have to
>> be the same, and the second argument is always int.
>>
>> merlin
>
> I know that I can cast. I'm wondering if it would be possible/desirable to
> implement automatic casting. Automatic casting works already for functions
> defined to take bigint and you pass in an integer. But not for these
> functions that take any type.

Right. If you've got 2+ types being passed for 'any', which type should
win? It's ambiguous, so the type rules for 'any'-taking functions are
stricter than for regular functions. Casting behaviors are more complex
than they look on the surface, and changes to make them more flexible
are typically difficult to make work.
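For the archives, the cast that resolves the original example (the first
and third arguments must resolve to the same type, and sum() over an
integer column yields bigint):

```sql
select lag(sum, 1, 0::bigint) over ()
from (select sum(generate_series) over (order by generate_series)
      from generate_series(1,10)) x;
```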
merlin
Re: [HACKERS] lag(bigint,int,int), etc?
On Tue, Jun 27, 2017 at 10:01 AM, Colin 't Hart wrote:
> Hi,
>
> The following rather contrived example illustrates that lag(), lead()
> (and probably other functions) can't automatically cast an integer to
> a bigint:
>
> select lag(sum,1,0) over () from (select sum(generate_series) over
> (order by generate_series) from generate_series(1,10)) x;
> ERROR: function lag(bigint, integer, integer) does not exist
> LINE 1: select lag(sum,1,0) over () from (select sum(generate_series...
>                ^
> HINT: No function matches the given name and argument types. You
> might need to add explicit type casts.
>
> I guess this is because the lag() and lead() functions take any type,
> and hence the default must be of the same type.
> This had me stumped for a few while until I realised that the types
> were different.
>
> Would there be any way to implement an automatic conversion?
>
> On the off-chance that this is actually a bug, this is on 9.6.3, but
> it also occurs on 9.3.17

Why not cast the arguments? The first and the third argument have to
be the same, and the second argument is always int.

merlin
Re: [HACKERS] postgresql transactons not fully isolated
On Tue, Jun 20, 2017 at 2:58 PM, Merlin Moncure wrote:
> On Tue, Jun 20, 2017 at 2:34 PM, David G. Johnston wrote:
>> On Tue, Jun 20, 2017 at 12:22 PM, Chapman Flack wrote:
>>> I get the reported result (DELETE 0 and a table containing 2 and 3)
>>> in both 'read committed' and 'read uncommitted'.
>>
>> Practically speaking those are a single transaction isolation mode.
>>
>> https://www.postgresql.org/docs/10/static/transaction-iso.html
>>
>> I think Merlin has mis-read the article he linked to. The example
>> being used there never claims to be done under serialization and seems
>> to describe an example of the perils of relying on the default
>> isolation level.
>
> oops -- could be operator error :-)

Yep, I made the rookie mistake of setting the transaction isolation
level outside the transaction block (so the setting immediately
evaporated), rather than changing the default. Sorry for the noise;
serialization failures are raised, and that is acceptable behavior per
spec AIUI.

merlin
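The pitfall, spelled out: outside a transaction block the statement
applies only to its own implicit single-statement transaction and is
immediately discarded.

```sql
-- evaporates immediately: applies only to the implicit transaction
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- correct placements:
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;  -- inside the block
SELECT 1;
COMMIT;

SET default_transaction_isolation = 'serializable';  -- or change the default
```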
Re: [HACKERS] postgresql transactons not fully isolated
On Tue, Jun 20, 2017 at 2:34 PM, David G. Johnston wrote:
> On Tue, Jun 20, 2017 at 12:22 PM, Chapman Flack wrote:
>> I get the reported result (DELETE 0 and a table containing 2 and 3)
>> in both 'read committed' and 'read uncommitted'.
>
> Practically speaking those are a single transaction isolation mode.
>
> https://www.postgresql.org/docs/10/static/transaction-iso.html
>
> I think Merlin has mis-read the article he linked to. The example
> being used there never claims to be done under serialization and seems
> to describe an example of the perils of relying on the default
> isolation level.

oops -- could be operator error :-)

merlin
[HACKERS] postgresql transactons not fully isolated
Michael Malis, via http://malisper.me/postgres-transactions-arent-fully-isolated/,
has determined that postgresql transactions are not fully isolated even
when using the serializable isolation level. If I prep a table, ints, via:

postgres=# create table ints (n int);
CREATE TABLE
postgres=# insert into ints values (1);
INSERT 0 1
postgres=# insert into ints values (2);
INSERT 0 1

and then run two concurrent serializable isolation mode transactions
like this:

T1: BEGIN
T1: UPDATE ints SET n = n + 1;
T2: BEGIN
T2: DELETE FROM ints where n = 2;  -- blocks
T1: COMMIT;  -- T2 frees
T2: SELECT * FROM ints;  -- both rows 2 and 3 visible
T2: COMMIT;

My understanding is that for serializable transactions, the result is
correct as long as you can play back the transactions in either order,
one after another, when they overlap, and get that result. This is
clearly not the case here, since when played in either order you'd end
up with one row. I guess the failure occurs because there is some kind
of separation between when the row is initially looked up and when the
deletion is qualified. This is likely not a problem in practice, but is
Michael right in his assessment that we are not precisely following the
spec?

merlin
Re: [HACKERS] CTE inlining
On Fri, May 12, 2017 at 3:39 PM, Bruce Momjian wrote:
> On Tue, May 9, 2017 at 05:14:19PM -0400, Tom Lane wrote:
>> Ilya Shkuratov writes:
>> > Ok, it seems that most people in discussion are agree that removing
>> > optimization fence is a right thing to do.
>> > Nonetheless I still hoping to discuss the algorithm and its implementation.
>>
>> Yeah, so far we've mainly discussed whether to do that and how to control
>> it, not what the actual results would be.
>
> To summarize, it seems we have two options if we want to add fence
> control to CTEs:
>
> 1. add INLINE to disable the CTE fence
> 2. add MATERIALIZE to enable the CTE fence
>
> or some other keywords. I think most people prefer #2 because:
>
> * most users writing queries prefer #2

Yeah, I think there was rough consensus on this point. I think it's fair
to assume that most (or at least a majority of) queries will benefit
from inlining; people would want to opt out of, rather than opt in to,
generally good optimization strategies. This will hit some people today,
but it is not really a backwards compatibility issue, since performance
is generally not fairly described as a compatibility criterion. If this
feature ships we ought to warn people in the release notes though.

merlin
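Under option #2, opting back in to today's fencing behavior would look
roughly like this (syntax as discussed in the thread, not final; the
table and function names are placeholders):

```sql
WITH w AS MATERIALIZED (          -- keep the fence once unmarked CTEs inline
  SELECT * FROM t WHERE expensive_func(x)
)
SELECT * FROM w;
```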
Re: [HACKERS] PG 10 release notes
On Mon, May 1, 2017 at 7:02 AM, Robert Haas wrote:
> On Tue, Apr 25, 2017 at 11:01 AM, Bruce Momjian wrote:
>> I didn't think logical decoding was really more than a proof-of-concept
>> until now.
>
> /me searches for jaw on floor.
>
> I would not in any way refer to logical decoding as being only a proof
> of concept, even before logical replication.

That's fair, but I think I understand what Bruce was going for here.
Data point: GitHub third-party modules are generally not approved for
deployment in my organization, so logical decoding from a production
perspective does not exist (for me) until 10.0. Point being, an
important threshold has been crossed.

merlin
Re: [HACKERS] CTE inlining
On Wed, May 3, 2017 at 12:33 PM, Alvaro Herrera wrote:
> David Fetter wrote:
>
>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>> Either we support it forever somehow, or we put it on a deprecation
>> schedule immediately and expect to be answering questions about it for
>> years after it's been removed.
>>
>> -1 for the GUC.
>
> Absolutely.
>
> So ISTM we have three choices:
>
> 1) we switch unmarked CTEs as inlineable by default in pg11. What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.

+1 for option 1. This change will be welcome for a large number of
queries, but forced materialization is a real need and I use it often.
This comes off as a very reasonable compromise in my opinion, unless it
requires major coding gymnastics to implement.

merlin
Re: [HACKERS] CTE inlining
On Tue, May 2, 2017 at 12:05 PM, Tomas Vondra wrote: > On 5/2/17 6:34 PM, David Fetter wrote: >> >> On Tue, May 02, 2017 at 02:40:55PM +0200, Andreas Karlsson wrote: >>> >>> On 05/02/2017 04:38 AM, Craig Ringer wrote: On 1 May 2017 at 22:26, Andreas Karlsson wrote: > >>> >>> >>> ... >>> >>> I see some alternatives, none of them perfect. >>> >>> 1. Just remove the optimization fence and let people add OFFSET 0 to >>> their >>> queries if they want an optimization fence. This lets us keep pretending >>> that we do not have query hints (and therefore do not have to formalize >>> any >>> syntax for them) while still allowing people to add optimization fences. >> >> >> +1 >> >> I get that people with gigantic PostgreSQL installations with >> stringent performance requirements sometimes need to do odd things to >> squeeze out the last few percentage points of performance. As the >> people (well, at least the people close to the ground) at these >> organizations are fully aware, performance optimizations are extremely >> volatile with respect to new versions of software, whether it's >> PostgreSQL, Oracle, the Linux kernel, or what have you. They expect >> this, and they have processes in place to handle it. If they don't, >> it's pilot error. >> >> We should not be penalizing all our other users to maintain the >> fiction that people can treat performance optimizations as a "fire and >> forget" matter. >> > > Agreed. > >>> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an >>> explicit optimization fence. This will for the first time add official >>> support for a query hint in the syntax which is a quite big precedent. >> >> >> Yep. It's one we should think very carefully before we introduce. >> > > I think it's a mistake to see this as an introduction of query hits. > > Firstly, it's a question whether it qualifies as a hint. I wouldn't call it > a hint, but let's assume there is a definition of query hints that includes > WITH MATERIALIZED. 
> > More importantly, however, this is not introducing anything new. It's just a > different name for the current "WITH" semantics, and you can achieve the > same behavior by "OFFSET 0". And people are already using these as hints, so > I fail to see how this introduces anything new. > > In fact, if you see the optimization fence as an implicit query hint, this > actually *removes* a hint (although most users are unaware of that behavior > and use it unintentionally). +1 down the line. More to the point, for several years now we (or at least I, but I'm not the only one) have been advocating for the usage of CTEs to avoid the undocumented and bizarre OFFSET 0 trick. Jerking this out from under users without giving a simple mechanism to get the same behavior minus a major query rewrite is blatantly user hostile. I can't believe we're even contemplating it. Also, a GUC is not a solution, for pretty obvious reasons I think. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
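[For readers outside the thread, the two fence idioms being contrasted look like this; the table and predicate names are made up:]

```sql
-- the undocumented OFFSET 0 trick: the subquery fence keeps the
-- planner from pushing the outer predicate down into the subquery
SELECT *
FROM (SELECT * FROM t WHERE slow_pred(x) OFFSET 0) s
WHERE s.y = 1;

-- the equivalent fenced CTE (the behavior this thread debates changing)
WITH s AS (SELECT * FROM t WHERE slow_pred(x))
SELECT * FROM s WHERE y = 1;
```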
Re: [HACKERS] CTE inlining
On Sun, Apr 30, 2017 at 6:21 PM, Andres Freund wrote: > On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote: >> why we cannot to introduce GUC option - enable_cteoptfence ? > > Doesn't really solve the issue, and we've generally shied away from GUCs > that influence behaviour after a few bad experiences. What if you want > one CTE inlined, but another one not? Yeah. Are we absolutely opposed to SQL syntax against WITH that allows or disallows fencing? For example: WITH [MATERIALIZED]. Pushing people to OFFSET 0 is a giant step backwards IMO, and as an implementation detail it is also subject to change. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
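[A sketch of what such syntax might look like; at the time of this thread this was only a proposal, and the keyword placement is illustrative only:]

```sql
-- as of this thread: the CTE is always an optimization fence
WITH x AS (SELECT ...)
SELECT * FROM x;

-- proposed: unmarked CTEs become inlineable, with an explicit
-- opt-in fence for queries that rely on materialization
WITH MATERIALIZED x AS (SELECT ...)
SELECT * FROM x;
```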
Re: [HACKERS] [BUGS] Crash observed during the start of the Postgres process
On Tue, Apr 25, 2017 at 8:44 AM, K S, Sandhya (Nokia - IN/Bangalore) wrote: > Hello, > > Did you get a chance to take a look into the issue? > > Please consider it with high priority. We will be awaiting your inputs. This email is heavily cross posted, which is obnoxious. Can you paste the relevant log snippet? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Triggers and logical replication (10devel)
On Fri, Apr 21, 2017 at 5:08 AM, Egor Rogov wrote: > Hello, > It seams that tiggers don't fire on subscriber's tables during logical > replication. Is it a bug? Reading the documentation (which is TBH a bit hard to follow) it appears that it is expected behavior. https://www.postgresql.org/docs/devel/static/logical-replication-architecture.html#logical-replication-snapshot states: "The apply process on the subscriber database always runs with session_replication_role set to replica, which produces the usual effects on triggers and constraints." https://www.postgresql.org/docs/devel/static/sql-altertable.html states: "The trigger firing mechanism is also affected by the configuration variable session_replication_role. Simply enabled triggers will fire when the replication role is “origin” (the default) or “local”." merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
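[Concretely, a trigger can be made to fire on the subscriber despite the apply worker's replica role by marking it accordingly; the table and trigger names below are examples:]

```sql
-- ENABLE ALWAYS triggers fire even when
-- session_replication_role = 'replica' (the apply worker's setting)
ALTER TABLE my_table ENABLE ALWAYS TRIGGER my_trigger;

-- ENABLE REPLICA triggers fire *only* in replica mode:
-- ALTER TABLE my_table ENABLE REPLICA TRIGGER my_trigger;
```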
Re: [HACKERS] Ongoing issues with representation of empty arrays
On Mon, Apr 10, 2017 at 11:17 PM, Andrew Gierth wrote: >> "Tom" == Tom Lane writes: > > >> First is contrib/intarray, _AGAIN_ (see past bugs such as #7730): > >> ... > >> I plan to fix this one properly, unless anyone has any objections. > > Tom> Just to clarify, what do you think is "properly"? > > I would say, that any time an intarray function returns an empty result > it should be the standard 0-dimensional representation that every other > array operation uses. The intarray functions all seem already able to > take such values as inputs. Also there should be regression tests for > this (none of intarray's existing tests have any empty arrays at all). Are there any impacts outside of intarray? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance issue with postgres9.6
On Fri, Apr 7, 2017 at 12:54 PM, Tom Lane wrote: > Tomas Vondra writes: >> On 04/07/2017 06:31 PM, Merlin Moncure wrote: >>> I think your math is off. Looking at your attachments, planning time >>> is 0.056ms, not 0.56ms. This is in no way relevant to performance on >>> the order of your measured TPS. How are you measuring TPS? > >> Not sure where did you get the 0.056ms? > > I don't see that either, but: > >> What I see is this in the 9.3 explains: >> Total runtime: 0.246 ms >> and this in those from 9.6: >> Planning time: 0.396 ms >> Execution time: 0.181 ms >> That is roughly 0.25ms vs. 0.6ms (0.4+0.2), as reported by Prakash. > > 9.3's EXPLAIN did not measure planning time at all. The "Total runtime" > it reports corresponds to "Execution time" in the newer version. So > these numbers indicate that 9.6 is significantly *faster*, not slower, > than 9.3, at least so far as execution of this one example is concerned. > > The OP may well be having some performance issue with 9.6, but the > presented material completely fails to demonstrate it. This smells like a problem with the test execution environment itself. OP (if on Linux), try: pgbench -n -f <(echo "select * from subscriber where s_id = 100") -c 4 -T 10 ...where pgbench is run from the database server (if pgbench is not in the default path, you may have to qualify it). This should give an apples-to-apples comparison, or at least rule out certain environmental considerations like the network stack. If your client is running on Windows, one place to look is the TCP stack. In my experience TCP configuration issues are much more common on Windows. Any reasonably modern hardware can handle thousands and thousands of transactions per second for a simple indexed select. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance issue with postgres9.6
On Fri, Apr 7, 2017 at 5:16 AM, Prakash Itnal wrote: > Hello, > > We currently use psotgres 9.3 in our products. Recently we upgraded to > postgres 9.6. But with 9.6 we have seen a drastic reduction in throughput. > After analyzing carefully I found that "planner time" in 9.6 is very high. > Below are the details: > > Scenario: > 1 Create a table with 10 rows. > 2 Execute simple query: select * from subscriber where s_id = 100; > 3 No update/delete/insert; tried vacuum, full vacuum; by default we enable > auto-vacuum > > 9.3: Avg of "Total runtime" : 0.24ms [actual throughput: 650 TPS] > 9.6: Avg of Total time: 0.56ms (Avg of "Planning time" : 0.38ms + Avg of > "Execution time" : 0.18ms) [actual throughput: 80 TPS] I think your math is off. Looking at your attachments, planning time is 0.056ms, not 0.56ms. This is in no way relevant to performance on the order of your measured TPS. How are you measuring TPS? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq Alternate Row Processor
On Mon, Feb 13, 2017 at 8:46 AM, Kyle Gearhart wrote: > On 2/9/17 7:15 PM, Jim Nasby wrote: >> Can you run a trace to see where all the time is going in the single row >> case? I don't see an obvious time-suck with a quick look through the code. >> It'd be interesting to see how things change if you eliminate the filler >> column from the SELECT. > > Traces are attached, these are with callgrind. > > profile_nofiller.txt: single row without filler column > profile_filler.txt: single row with filler column > profile_filler_callback.txt: callback with filler column > > pqResultAlloc looks to hit malloc pretty hard. The callback reduces all of > that to a single malloc for each row. Couldn't that be optimized, say, by preserving malloc'd memory when in single row mode and recycling it? (IIRC during the single row mode discussion this optimization was voted down). A barebones callback mode ISTM is a complete departure from the classic PGresult interface. This code is pretty unpleasant IMO: acct->abalance = *((int*)PQgetvalue(res, 0, i)); acct->abalance = __bswap_32(acct->abalance); Your code is faster but foists a lot of the work on the user, so it's kind of cheating in a way (although very carefully written applications might be able to benefit). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea on how to simplify comparing two sets
On Tue, Feb 7, 2017 at 9:46 PM, Joel Jacobson wrote: > On Tue, Feb 7, 2017 at 4:58 PM, Tom Lane wrote: >> Joel Jacobson writes: >>> Currently there is no simple way to check if two sets are equal. >> >> Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2 >> and SELECT set2 EXCEPT SELECT set1 are both empty? > > Yes, that's one way, but it's ugly as you have to repeat yourself and > write both sets two times. > Not an issue for small queries, but if you have two big queries stored > in a .sql file, > you would have to modify both places for each query and always make > sure they are identical. A CTE might help: WITH lhs AS (something complex), rhs AS (something complex) SELECT COUNT(*) = 0 AS good FROM ( (SELECT * FROM lhs EXCEPT SELECT * FROM rhs) UNION ALL (SELECT * FROM rhs EXCEPT SELECT * FROM lhs) ) q; This isn't the most efficient solution, but is easily abstracted into dynamic SQL (meaning, you could pass both queries as arguments to a checker function). Another, similar approach is to abstract the query behind a view, which ISTM is a practice you are underutilizing based on your comments :-). If I were in a hurry and the dataset was enormous I would probably dump both queries identically ordered to a .csv, and do: diff left.csv right.csv | head -1 in bash or something like that. Not sure if the utility of a bidirectional EXCEPT is enough to justify adding custom syntax for that approach. I use the 'double EXCEPT' tactic fairly often and understand the need, but the bar for non-standard syntax is pretty high (and has been getting higher over the years, I think). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
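[The dynamic SQL abstraction mentioned above might look something like this; the function name sets_equal and the assumption that both arguments are complete SELECT statements are mine:]

```sql
-- A sketch of a checker function.  The query texts are interpolated
-- verbatim, so this is only suitable for trusted input.
CREATE OR REPLACE FUNCTION sets_equal(q1 text, q2 text)
RETURNS boolean AS $$
DECLARE
  n bigint;
BEGIN
  EXECUTE format(
    'SELECT count(*) FROM ((%s EXCEPT %s) UNION ALL (%s EXCEPT %s)) q',
    q1, q2, q2, q1)
  INTO n;
  RETURN n = 0;
END;
$$ LANGUAGE plpgsql;

-- SELECT sets_equal('SELECT generate_series(1,3)',
--                   'SELECT generate_series(1,3)');  -- true
```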
Re: [HACKERS] Checksums by default?
On Mon, Jan 23, 2017 at 8:07 PM, Tom Lane wrote: > Peter Geoghegan writes: >> I thought that checksums went in in part because we thought that there >> was some chance that they'd find bugs in Postgres. > > Not really. AFAICS the only point is to catch storage-system malfeasance. > > It's barely possible that checksumming would help detect cases where > we'd written data meant for block A into block B, but I don't rate > that as being significantly more probable than bugs in the checksum > code itself. Also, if that case did happen, the checksum code might > "detect" it in some sense, but it would be remarkably unhelpful at > identifying the actual cause. Hm, but at least in some cases wouldn't it protect people from further damage? End user data damage ought to be prevented at all costs IMO. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On Sat, Jan 21, 2017 at 12:35 PM, Tom Lane wrote: > Andres Freund writes: >> Sure, it might be easy, but we don't have it. Personally I think >> checksums just aren't even ready for prime time. If we had: >> - ability to switch on/off at runtime (early patches for that have IIRC >> been posted) >> - *builtin* tooling to check checksums for everything >> - *builtin* tooling to compute checksums after changing setting >> - configurable background sweeps for checksums > > Yeah, and there's a bunch of usability tooling that we don't have, > centered around "what do you do after you get a checksum error?". > AFAIK there's no way to check or clear such an error; but without > such tools, I'm afraid that checksums are as much of a foot-gun > as a benefit. I see your point here, but they sure saved my ass with that pl/sh issue. So I'm inclined to lightly disagree; there are good arguments either way. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Wed, Jan 18, 2017 at 4:11 AM, Ants Aasma wrote: > On Wed, Jan 4, 2017 at 5:36 PM, Merlin Moncure wrote: >> Still getting checksum failures. Over the last 30 days, I see the >> following. Since enabling checksums FWICT none of the damage is >> permanent and rolls back with the transaction. So creepy! > > The checksums still only differ in least significant digits which > pretty much means that there is a block number mismatch. So if you > rule out filesystem not doing its job correctly and transposing > blocks, it could be something else that is resulting in blocks getting > read from a location that happens to differ by a small multiple of > page size. Maybe somebody is racily mucking with table fd's between > seeking and reading. That would explain the issue disappearing after a > retry. > > Maybe you can arrange for the RelFileNode and block number to be > logged for the checksum failures and check what the actual checksums > are in data files surrounding the failed page. If the requested block > number contains something completely else, but the page that follows > contains the expected checksum value, then it would support this > theory. will do. Main challenge is getting hand compiled server to swap in so that libdir continues to work. Getting access to the server is difficult as is getting a maintenance window. I'll post back ASAP. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CORRESPONDING clause design
On Tue, Jan 17, 2017 at 12:37 AM, Surafel Temsgen wrote: > I am new here and I really want to contribute, I have read same resource > that help understanding database system and postgresql. I would like to > start implementing sql syntax corresponding by clause because I believe > implementing sql syntax gives an opportunity to familiarize many part of > postgresql source code. Previous implementation is here and have an issue on > explain query and break cases on unlabeled NULLs > To repeat what a corresponding by clause means > Corresponding clause either contains a BY(...) clause or not. If it > doesn't have a BY(...) clause the usage is as follows. This is great stuff. Does the syntax only apply to UNION? I would imagine it would also apply to INTERSECT/EXCEPT? What about UNION ALL? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
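[For context, CORRESPONDING matches set-operation columns by name rather than by position; roughly as follows. This is the SQL-standard syntax under discussion, not something stock PostgreSQL accepts:]

```sql
-- pairs columns by name; only columns present on both sides survive
SELECT a, b, c FROM t1
UNION CORRESPONDING
SELECT c, a FROM t2;
-- behaves like: SELECT a, c FROM t1 UNION SELECT a, c FROM t2

-- CORRESPONDING BY restricts the match to a listed subset
SELECT a, b, c FROM t1
UNION CORRESPONDING BY (a)
SELECT c, a FROM t2;
```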
Re: [HACKERS] Retiring from the Core Team
On Wed, Jan 11, 2017 at 6:29 PM, Josh Berkus wrote: > Hackers: > > You will have noticed that I haven't been very active for the past year. > My new work on Linux containers and Kubernetes has been even more > absorbing than I anticipated, and I just haven't had a lot of time for > PostgreSQL work. > > For that reason, as of today, I am stepping down from the PostgreSQL > Core Team. Thanks for all your hard work. FWIW, your blog posts, 'Primary Keyvil' are some of my favorite of all time! merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] merging some features from plpgsql2 project
On Wed, Jan 11, 2017 at 2:57 PM, Robert Haas wrote: > - The E'' syntax and the standard_conforming_strings GUC were added in > PostgreSQL 8.0. The only legal value of standard_conforming_strings > was "false". > > - In PostgreSQL 8.1, it became possible to set > standard_conforming_strings to "true", but the default was still > "false". > > - In PostgreSQL 9.1, the default was changed to "true". > > So there were 6 major releases from the time the GUC was added and 5 from > the time it became mutable before the default was flipped. We've now > had 5 more since the default was changed to "true". (No, it's not > time to remove the GUC yet. At least not in my opinion.) > > One thing that made changing standard_conforming_strings particularly > painful was that it had knock-on effects on many language-specific > drivers not maintained by the core project (or just plain not > maintained). I don't think the language changes being proposed here > for PL/pgsql would have the same kind of impact, but some of them > would make it significantly harder to migrate to PostgreSQL from > Oracle, which some people might see as an anti-goal (as per other > nearby threads on making that easier). I don't think it's a simple matter of waiting N or N+M releases (although I certainly did appreciate that we did it regardless). It comes down to this: there's just no way to release changes that break a lot of code without breaking a lot of code. It's all about acknowledging that and judging it acceptable against the benefits you get. For posterity, with respect to conforming strings, SQL injection is an absolute scourge of the computing world, so on balance we did the right thing. Having said that, it's always good to do the math, and the calculation is primarily an economic one, I think. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
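[To illustrate what the GUC changes, simplified; the exact warning behavior also depends on escape_string_warning:]

```sql
SET standard_conforming_strings = on;   -- the default since 9.1
SELECT 'a\nb';   -- backslash is an ordinary character: a\nb
SELECT E'a\nb';  -- E'' escape syntax: 'a', newline, 'b'

SET standard_conforming_strings = off;  -- the pre-9.1 behavior
SELECT 'a\nb';   -- backslash escapes are processed here too
```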
Re: [HACKERS] merging some features from plpgsql2 project
On Wed, Jan 11, 2017 at 11:11 AM, Peter Eisentraut wrote: > The current syntax was chosen because it is SQL-compatible. Adding > redundant syntax to save a few characters without any new functionality > (performance, resource usage, safety, etc.) is a weak argument in the > overall scheme of things. Yeah -- exactly. The few minor things that are not 100% SQL compatible I find to be major headaches. Incompatible usage of INTO for example. This thread has been going on for quite some time now and is starting to become somewhat circular. Perhaps we ought to organize the various ideas and pain points presented here into a wiki page along with conclusions, noting the cases where no solution compatible with the current syntax exists. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] merging some features from plpgsql2 project
On Tue, Jan 10, 2017 at 7:44 AM, Marko Tiikkaja wrote: > On Tue, Jan 10, 2017 at 2:26 PM, Peter Eisentraut > wrote: >> >> It's not like PL/pgSQL is the king of brevity. > > > This is essentially saying "PL/PgSQL isn't perfect, so we shouldn't try and > make it better". I hear this argument a lot, and as long as people keep > rejecting improvements for this reason they can keep saying it. It's a > self-fulfilling prophecy. Agreed. But adding language features, especially syntactical ones, demands prudence; there is good reason to limit keywords like that. What about pgsql.rows, pgsql.found, pgsql.sqlerrm, etc. as automatic variables (I think this was suggested upthread)? Conflicts with existing structures are of course an issue, but I bet it could be worked out. I also kinda disagree on the brevity point, or at least would like to add some color. SQL is verbose in the sense of "let's make everything an English-language sentence" but incredibly terse relative to other language implementations of the same task. Embedded SQL tends to be uniformly clumsy due to all of the extra handling of errors, parameterization, etc. This is why we write plpgsql naturally. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
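[For comparison, the information such pgsql.* automatic variables would expose is reached today roughly like this; a PL/pgSQL fragment, not a complete function, with n assumed declared as bigint and foo as an existing table:]

```sql
UPDATE foo SET x = 1 WHERE id = 42;
GET DIAGNOSTICS n = ROW_COUNT;   -- the hypothetical pgsql.rows
IF NOT FOUND THEN                -- the hypothetical pgsql.found
  RAISE NOTICE 'no row with id 42';
END IF;

BEGIN
  PERFORM 1 / 0;
EXCEPTION WHEN OTHERS THEN
  RAISE NOTICE '%', SQLERRM;     -- the hypothetical pgsql.sqlerrm
END;
```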
Re: [HACKERS] merging some features from plpgsql2 project
On Sun, Jan 8, 2017 at 2:52 AM, Joel Jacobson wrote: > On Sat, Jan 7, 2017 at 8:56 PM, Pavel Stehule wrote: >> >> Jim, Marko, Joel - is there a place, features where we can find a partial >> agreement? If it is, then we can move our view there. > > I have decided I definitively want a new language, and I'm willing to > pay for it. well, as they say, "money talks" :-D. > Hopefully the community will join forces and contribute with ideas and > code, but with or without you or the rest of the community, plpgsql2 > is going to happen. > Call it pltrustly or plpgsql2, I don't care. I just care about ending > my suffering from being forced writing plpgsql every day. It sucks, > and I'm going to end it. Curious, are you mainly troubled by the 'INTO STRICT' family of problems? Or something else? Pavel has scored some points with PRAGMA syntax, and ISTM that does not require a compatibility break. > And please kill all these GUCs ideas. The best thing with PostgreSQL > is the natural expected behaviour of the default configuration. > Contrary to MySQL where you have to enable lots and lots of > configuration options just to get a behaviour you expect as a novice > user. I think there is a lot of support for this point of view. Jim is a notable outlier here, but for the most part we don't do language behavior changes with a GUC. > It's much better to just come together and agree on whatever we have > learned during the last 15 years of PL/pgSQL1, and sample all ideas > during a year maybe, and decide what to put into PL/pgSQL2. To make it > useful, we should aim to not break compatibility for _most_ code, but > accept some necessary rewrites of functions with deprecated > anti-patterns. Agreed: if you want to break compatibility, pushing a new language is a better way than a GUC. If you get consensus on this, having both languages supported side by side for a while (maybe 4-5 releases) is the way to go, with the old language eventually frozen and moved to an extension. 
But this is a lot of work and aggravation; are you *sure* you can only get what you want with a full compatibility break? With respect to your company developers specifically? I'm genuinely curious if you've taken a good look at pl/v8 and why you've determined it's not suitable to move forward with. It's got a different set of headaches, but is really fast, and I sometimes wonder whether some alternative preprocessing layer (like CoffeeScript, but geared towards SQL) could have some long-term promise. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] merging some features from plpgsql2 project
On Thu, Jan 5, 2017 at 11:03 AM, Robert Haas wrote: > Now, that's not to say we should never break backward compatibility. > Sometimes we should. I think the problem with PL/pgsql is that many > of the compatibility breaks that people want are likely to lead to > subtle misbehavior rather than outright failure, or are not easy to > spot via a cursory look ("hmm, could that SELECT query ever return > more than one row?"). The core issue is that developers tend to be very poor at estimating the impacts of changes; they look at things through the lens of the "new". Professional software development is quite expensive, and framework- (I'll lump the database and its various built-in features under that term) level changes are essentially throwing out some portion of our users' investments. Even fairly innocent compatibility breaks can have major downstream impacts on our users, and it's always much worse than expected. For example, nobody thought that changing the bytea text encoding format to hex would have corrupted our users' data, but it did. TBH, the discussion should shift away from specific issues of compatibility and towards a specific set of standards and policies around how to do it and what kinds of technical justifications need to be made in advance. Security problems, for example, could be argued as a valid reason to break user code, as could poor adherence to the SQL standard where it is in turn blocking other work. Minus those kinds of considerations it's really just not worth doing, and there's no tricky strategy like playing with version numbers that can game that rule. A formal deprecation policy might be a good start. The C language really should be considered the gold standard here. Changes did have to be made, like getting rid of the notoriously broken and insecure gets(), but they were made very, very slowly and unobtrusively. 
(I do think lpad should accept "any" FWIW) :-D merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] merging some features from plpgsql2 project
On Wed, Jan 4, 2017 at 1:49 PM, Pavel Stehule wrote: > >> >> so some possible design can be: >> >> DECLARE >> PRAGMA UNCACHED_PLANS; >> BEGIN >> SELECT ... INTO ; >> SELECT ... INTO ; >> END; >> >> This respects Ada and PL/SQL style - probably easy implementation >> >> Regards >> >> Pavel > > > some examples based on Ada doc > > FUNCTION xxx RETURN int AS > PRAGMA yyy -- pragma has function scope > BEGIN > > FUNCTION xxx RETURN int AS > BEGIN > DECLARE > PRAGMA yyy -- pragma has block scope ok, sub-block makes sense over statement level IMO. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
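[Translated into a PL/pgSQL-flavored sketch; this is entirely hypothetical syntax, nothing here is implemented:]

```sql
CREATE FUNCTION f() RETURNS int AS $$
DECLARE
  a int; b int;
BEGIN
  SELECT 1 INTO a;            -- planned and cached as usual
  DECLARE
    PRAGMA UNCACHED_PLANS;    -- scoped to this sub-block only
  BEGIN
    SELECT 2 INTO b;          -- would be planned fresh each execution
  END;
  RETURN a + b;
END;
$$ LANGUAGE plpgsql;
```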
Re: [HACKERS] merging some features from plpgsql2 project
On Tue, Jan 3, 2017 at 2:15 PM, Pavel Stehule wrote: > > > 2017-01-03 20:54 GMT+01:00 Merlin Moncure : >> >> On Tue, Jan 3, 2017 at 9:58 AM, Pavel Stehule >> wrote: >> > 2017-01-03 16:23 GMT+01:00 Merlin Moncure : >> >> *) Would also like to have a FINALLY block >> > >> > What you can do there? >> >> This is syntax sugar so you don't need second begin/end/exception >> block or duplicated code. It separates error handling from cleanup. >> >> BEGIN >> PERFORM dblink_connect(... >> >> EXCEPTION WHEN OTHERS THEN >> >> FINALLY >> PERFORM dblink_disconnect(... >> END; > > > Does know somebody this pattern from Ada or PL/SQL? I guess probably not. It's a standard pattern in modern EH languages (for example, https://msdn.microsoft.com/en-us/library/dszsf989.aspx). >> >> >> *) Some user visible mechanic other than forcing SQL through EXECUTE >> >> to be able to control plan caching would be useful. >> > >> > fully agree. >> > >> > Have you some ideas? >> > >> > What about plpgsql option (function scope) -- WITHOUT-PLAN-CACHE - any >> > non >> > trivial plans will not be cached - and evaluated as parametrized query >> > only. >> >> I have slight preference for syntax marker for each query, similar to >> INTO. Maybe 'UNCACHED'? > > > I am not clean opinion - the statement level is nice, but what readability? > > SELECT UNCACHED t.a, t.b FROM INTO a,b; Yeah -- this is pretty ugly admittedly. Maybe control directive is ok, as long as you can set it mid function? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Tue, Jan 3, 2017 at 1:05 PM, Peter Eisentraut wrote: > On 11/7/16 5:31 PM, Merlin Moncure wrote: >> Regardless, it seems like you might be on to something, and I'm >> inclined to patch your change, test it, and roll it out to production. >> If it helps or at least narrows the problem down, we ought to give it >> consideration for inclusion (unless someone else can think of a good >> reason not to do that, heh!). > > Any results yet? Not yet unfortunately. I compiled the server with the change, but was not able to get $libdir working so that I could just do a binary swap over my pgdg compiled package. If anyone has some pointers on how to do that, I'd appreciate it. Still getting checksum failures. Over the last 30 days, I see the following. Since enabling checksums FWICT none of the damage is permanent and rolls back with the transaction. So creepy! [root@rcdylsdbmpf001 pg_log]# cat *.log | grep "page verification failed" 2016-12-05 10:17:48 CST [postgres@castaging]: WARNING: page verification failed, calculated checksum 61797 but expected 61798 2016-12-05 11:15:31 CST [postgres@castaging]: WARNING: page verification failed, calculated checksum 37750 but expected 37749 2016-12-05 11:15:58 CST [postgres@castaging]: WARNING: page verification failed, calculated checksum 44483 but expected 44482 2016-12-05 11:16:33 CST [postgres@castaging]: WARNING: page verification failed, calculated checksum 58926 but expected 58925 2016-12-05 11:17:08 CST [postgres@castaging]: WARNING: page verification failed, calculated checksum 38527 but expected 38528 2016-12-05 11:18:34 CST [postgres@castaging]: WARNING: page verification failed, calculated checksum 61932 but expected 61933 2016-12-05 11:18:55 CST [postgres@castaging]: WARNING: page verification failed, calculated checksum 23757 but expected 23758 2016-12-05 12:13:48 CST [rms@mpf2]: WARNING: page verification failed, calculated checksum 44192 but expected 44225 at character 417 2016-12-08 14:18:37 CST [postgres@castaging]: 
WARNING: page verification failed, calculated checksum 36083 but expected 36082 2016-12-08 15:52:31 CST [postgres@castaging]: WARNING: page verification failed, calculated checksum 63414 but expected 63415 at character 1096 2016-12-09 09:12:21 CST [postgres@castaging]: WARNING: page verification failed, calculated checksum 25781 but expected 25780 2016-12-09 09:13:20 CST [postgres@castaging]: WARNING: page verification failed, calculated checksum 63043 but expected 63044 at character 4230 2016-12-12 08:57:45 CST [postgres@castaging]: WARNING: page verification failed, calculated checksum 31775 but expected 31771 2016-12-13 09:47:11 CST [postgres@castaging]: WARNING: page verification failed, calculated checksum 40802 but expected 40806 2016-12-15 12:49:04 CST [rms@mpf2]: WARNING: page verification failed, calculated checksum 11625 but expected 11592 at character 417 2016-12-15 12:51:08 CST [rms@mpf2]: WARNING: page verification failed, calculated checksum 51017 but expected 51018 2016-12-15 12:52:36 CST [rms@mpf2]: WARNING: page verification failed, calculated checksum 51017 but expected 51018 at character 417 2016-12-16 12:16:31 CST [rms@mpf2]: WARNING: page verification failed, calculated checksum 23580 but expected 23576 2016-12-20 13:59:33 CST [postgres@castaging]: WARNING: page verification failed, calculated checksum 45273 but expected 45285 2016-12-20 14:00:22 CST [postgres@castaging]: WARNING: page verification failed, calculated checksum 10524 but expected 10525 note second database 'mpf2'. This is a new development. 
Example of query that is jacking things is this:

2016-12-15 12:51:08 CST [rms@mpf2]: WARNING: page verification failed, calculated checksum 51017 but expected 51018
2016-12-15 12:51:08 CST [rms@mpf2]: CONTEXT: SQL statement "
COPY
(
  SELECT 'DELETE FROM tblNAptCommonSample WHERE ReportPeriod = 201612;'
  UNION ALL
  SELECT format(
    'INSERT INTO tblNAptCommonSample('
    'ReportPeriod, Period, AdjustmentType, PlanType, MSA, MSASubMkt, '
    'Sample, Occupancy, OccupancyChange, AverageRent, AverageRentChange, RentSF, '
    'RentSFChange)'
    'VALUES('
    '%s, %s, ''%s'', ''%s'', ''%s'', %s,'
    '%s, %s, %s, %s, %s, %s,'
    '%s)',
    ReportPeriod, Period, AdjustmentType, PlanType, MSA, MSASubMkt,
    c(Sample), c(Occupancy), c(OccupancyChange), c(AverageRent),
    c(AverageRentChange), c(RentSF), c(RentSFChange))
  FROM tblNAptCommonSample
  WHERE Period = 201612 AND MSA != '5610'
  UNION ALL
  SELECT 'go'
) TO '/tmp/tblnaptcommonsample.sql';
"
PL/pgSQL function writempf1history(integer) line 75 at EXECUTE

or this:

2016-12-15 12:52:36 CST [r
Re: [HACKERS] merging some features from plpgsql2 project
On Tue, Jan 3, 2017 at 9:58 AM, Pavel Stehule wrote:
> 2017-01-03 16:23 GMT+01:00 Merlin Moncure :
>> So -1 to strict mode, unless we can make a case why this can't be done
>> as part of checking/validation.
>
> Can be plpgsq.extra_errors and plpgsql.extra_warnings solution?
>
> I am thinking so there is a space for improvement (in extra_* usage)

extra_warnings seems OK at the GUC level. However, it's bad to have a
body of code fail to compile based on a GUC; check_function_bodies, for
example, is a complete hack and should be avoided if at all possible
IMO. There is a very good informal rule that GUCs should not impact
behavior (minus some special cases like timeouts). Good examples of
failure to follow this rule are mysql and php. Maybe settings at the
level of an extension could be OK, but I'm skeptical. Good languages
are clear without needing extra context.

> Do you know plpgsql_check https://github.com/okbob/plpgsql_check ?

Yes. This is good design and should be the model for core work (if
any). In my ideal world, this would be part of pgxn, with the pgxn
client installed in core. For plpgsql to enter the modern era we need
standardized packaging and deployment like cran, npm, etc.

>> Other random points:
>> *) Another major pain point is swapping in the input variables for
>> debugging purposes. Something that emits a script based on a set of
>> arguments would be wonderful.
>
> ???

Often, for debugging of complicated cases, I'm starting from errors in
the database log with the function name and argument values. Sometimes
I find myself pasting a pl/pgsql function into a text editor and
replacing input variables with known values.

>> *) Would also like to have a FINALLY block
>
> What you can do there?

This is syntax sugar so you don't need a second begin/end/exception
block or duplicated code. It separates error handling from cleanup:

BEGIN
  PERFORM dblink_connect(...
EXCEPTION WHEN OTHERS THEN
FINALLY
  PERFORM dblink_disconnect(...
END;

>> *) Some user visible mechanic other than forcing SQL through EXECUTE
>> to be able to control plan caching would be useful.
>
> fully agree.
>
> Have you some ideas?
>
> What about plpgsql option (function scope) -- WITHOUT-PLAN-CACHE - any non
> trivial plans will not be cached - and evaluated as parametrized query only.

I have a slight preference for a syntax marker on each query, similar
to INTO. Maybe 'UNCACHED'?

On Tue, Jan 3, 2017 at 10:57 AM, Jim Nasby wrote:
> Or just fix the issue, provide the backwards compatability GUCs and move on.

I really don't think this will fly. I'm not buying your argument (at
all) that compatibility breaks have been cleanly done in the past, at
least not in the modern era. In any event, marginal language
improvements are not a good justification for doing it. And yes, the
continual monkeying around with column names in pg_stat_activity is a
major hassle. For heaven's sake, can we just add new columns and/or
create a new view?

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] merging some features from plpgsql2 project
On Tue, Dec 27, 2016 at 10:09 PM, Jim Nasby wrote:
> On 12/27/16 4:56 PM, Merlin Moncure wrote:
>>
>> On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule wrote:
>>>
>>> First I describe my initial position. I am strongly against introduction
>>> "new" language - plpgsql2 or new plpgsql, or any else. The trust of
>>> developers to us is important and introduction of any not compatible or
>>> different feature has to have really big reason. PostgreSQL is
>>> conservative
>>> environment, and PLpgSQL should not be a exception. More - I have not any
>
> Which is why this is an external fork of plpgsql.
>
> ** The real problem is that we have no mechanism for allowing a PL's
> language/syntax/API to move forward without massive backwards compatibility
> problems. **

Just got back from break :-). Have some thoughts on this.

Backwards compatibility is really a fundamental problem. There's
really no solution to it other than to try to avoid using syntax to
solve problems. It should be obvious to everyone that plpgsql cannot
withstand a compatibility break. Another language could be offered as
an alternative in core (say, pl/psm or pl/v8), but pl/pgsql has to
support old code. Some really out-there features could maybe be
removed (in particular, using = for assignment), but not much. But I
guess we're stuck with the status quo.

I think we ought to avoid language features that influence the
behavior (performance is OK) of the code (and that includes throwing
errors). That's a freight train headed towards javascript's 'strict'
mode, which is a thinly disguised language fork. #option and pragma
type syntaxes are trying to cheat the language -- hardly anyone uses
them and it's a tricky way to try to make the language into something
other than it is. C does it right -- dubious code is raised as
warnings and it's up to the end user to determine which warnings are
interesting and likely to be an error.
So, rather than hacking the language to control the throwing of errors
and such, there should be some ability to validate the function heavily
and verify suspicious use of INTO or other dubious things (unused
variables, masked assignments, etc.). The validation output could even
be a set-returning function. So -1 to strict mode, unless we can make
a case why this can't be done as part of checking/validation.

Other random points:

*) Another major pain point is swapping in the input variables for
debugging purposes. Something that emits a script based on a set of
arguments would be wonderful.

*) Would also like to have a FINALLY block

*) A mechanic to manually print out a stack trace for debugging
purposes would be helpful.

*) COPY not being able to accept arguments as variables (in particular
the filename) is a major headache

*) Upon error, we ought to print a few lines of context on either side
of the error. It's not very fun to try to figure out where the error is
when you are working off of 'syntax error near "FROM"', etc. This is a
major problem throughout the platform IMO.

*) Some user-visible mechanic other than forcing SQL through EXECUTE
to be able to control plan caching would be useful.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] merging some features from plpgsql2 project
On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule wrote:
> Hi
>
> I reread ideas described on page https://github.com/trustly/plpgsql2
>
> Some points are well and can be benefit for PlpgSQL.
>
> First I describe my initial position. I am strongly against introduction
> "new" language - plpgsql2 or new plpgsql, or any else. The trust of
> developers to us is important and introduction of any not compatible or
> different feature has to have really big reason. PostgreSQL is conservative
> environment, and PLpgSQL should not be a exception. More - I have not any
> information from my customers, colleagues about missing features in this
> language. If there is some gaps, then it is in outer environment - IDE,
> deployment, testing,

Breaking language compatibility is a really big deal. There have to be
a lot of benefits to justify the effort, and you have to make the
translation from plpgsql1 to plpgsql2 really simple. You have made
some good points on the rationale, but not nearly enough to justify an
implementation fork. So basically I agree.

Having said that, if you don't mind, I'd like to run with the topic
(which I'm loosely interpreting as, "Things I'd like to do in
SQL/PLPGSQL and can't").

The #1 problem with plpgsql, in my point of view, is that the language
and grammar are not supersets of SQL. A lot of PLPGSQL keywords
(EXECUTE, BEGIN, INTO, END) have meanings incompatible with our SQL
implementation. IMNSHO, SQL ought to give the same behavior inside or
outside of plpgsql. It doesn't, and this is one of the reasons why
plpgsql may not be a good candidate for a stored procedure
implementation.

The #2 problem with plpgsql is that after function entry it's too late
to do things like set the transaction isolation level and change
certain kinds of variables (like statement_timeout). This is very
obnoxious: I can't wrap the database in an API 100%; the application
has to manage things that really should be controlled in SQL.

The #3 problem with plpgsql is the complete lack of inlining.
Inlining function calls in postgres is a black art even for very
trivial cases. This makes it hard for us to write quick things and in
the worst case causes endless duplication of simple expressions.

In short, I guess the issue is that we don't have stored procedures,
and I don't see an easy path to getting there with the current
language. There are a lot of other little annoyances, but most of them
can be solved without a compatibility break.

It would be pretty neat if the postgres SQL implementation could
directly incorporate limited flow control and command execution. For
example:

CREATE PROCEDURE my_proc(Done OUT BOOL) RETURNS BOOL AS
$$
  BEGIN;
  SET transaction_isolation = 'serializable';
  SELECT some_plpgsql_func_returning_bool();
  COMMIT;
$$;

CALL my_proc() UNTIL Done;

Key points here are:

*) my_proc is in native SQL (not plpgsql), and runs outside of a snapshot

*) CALL is the invocation into the stored procedure. I extended it in
similar fashion to pl/sql's CALL
(https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_4008.htm)
but anything will do for syntax as long as you get arbitrary control of
procedure lifetime external to snapshot and transaction

*) the simple addition of UNTIL gets us out of the debate over the best
'stored procedure language'. Keeping things to pure SQL really
simplifies things since we already have statement parsing at the tcop
level. We just need some special handling for CALL.

*) In my usage of plpgsql, maybe 80% of database cases are covered
purely in the language, but maybe 20% of cases need support from the
application, typically where threading and transaction management are
involved. With the above it would be more like 95% covered, and if you
extended CALL to something like:

CALL my_proc() IN BACKGROUND UNTIL Done;

..where "IN BACKGROUND" moved execution to a background worker, one
could do just about everything in SQL for tasks that do nothing but
read and write to the database, which today need significant support
from an outside language (primarily bash for me).
With respect to stuff you mentioned, like smarter handling of INTO, are you really sure you need to break compatibility for that? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb problematic operators
On Sun, Dec 11, 2016 at 10:59 PM, Craig Ringer wrote:
> PgJDBC allows you to write ??, which is ugly, but tolerable, since the
> JDBC spec doesn't have an escape syntax for it.

This is the core problem; *JDBC* is busted. SQL reserves words but not
punctuation marks, so any assumption by client-side code that
characters are not going to be interpreted by the server is going to
cause problems. As noted earlier, ":" is equally problematic, as that
is hibernate's parameter marker, and hibernate is probably in even
greater usage than naked JDBC in the java community.

Imagine trying to embed, say, perl in java, reserving the very same
punctuation marks, and then complaining to the perl community that
their language is broken due to usage overlap... that's what's
happening here. If we really wanted to fix this, maybe the right way
to think about the problem is a highly reduced character set and a
pre-processor or an extension.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
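The `??` escape Craig mentions is simple to implement client-side. A toy scanner in Python (illustrative only, not PgJDBC's actual code; a real driver must also skip string literals and comments, and bind parameters properly instead of substituting text) can treat `??` as a literal `?` passed through to the server and a lone `?` as a parameter marker:

```python
def bind_placeholders(sql: str, params: list) -> str:
    """Toy placeholder scanner: '?' marks a parameter, '??' is an escaped
    literal '?' (e.g. the jsonb existence operator) sent through verbatim."""
    out = []
    i = pi = 0
    while i < len(sql):
        if sql[i] == '?':
            if i + 1 < len(sql) and sql[i + 1] == '?':
                out.append('?')            # '??' -> one literal '?' operator
                i += 2
                continue
            out.append(str(params[pi]))    # lone '?' -> next parameter value
            pi += 1
            i += 1
        else:
            out.append(sql[i])
            i += 1
    return ''.join(out)

print(bind_placeholders("SELECT j ?? 'k' FROM t WHERE id = ?", [42]))
# → SELECT j ? 'k' FROM t WHERE id = 42
```

The same doubling convention is how SQL itself escapes `'` inside string literals, which is the point of the complaint: the escape belongs in the client layer that reserved the character.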
Re: [HACKERS] jsonb problematic operators
On Fri, Dec 9, 2016 at 5:50 AM, Jordan Gigov wrote:
> There is this problem with the jsonb operators "? text" "?| text[]"
> and "?& text[]" that the question mark is typically used for prepared
> statement parameters in the most used abstraction APIs in Java and
> PHP.
>
> This really needs an alternative. Something like "HAS text", "HAS
> ANY(text[])" and "HAS ALL(text[])" same as regular array usage. It
> probably should be another word that has less chance of becoming a
> conflict with another operator in future SQL specifications, but
> that's for you to decide.
>
> It's not a good idea to expect everyone else to make for workarounds
> for problems you choose to create.

You are griping in the wrong place. "Everyone else" has reserved
characters for its own use that were not its to reserve, without
providing a clean escaping mechanism -- hibernate does this, for
example, reserving ':', which is used in many places within SQL.
Typically, when you embed special characters in strings designed to be
processed by something else, you allow for that character to be
escaped. In the computer science world we generally call this escaping
strings, and it is a very common and well understood practice. For
some odd reason, however, the authors of various java frameworks seem
to be impervious to the utility of the concept.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Wed, Nov 2, 2016 at 10:45 AM, Oskari Saarenmaa wrote:
> 26.10.2016, 21:34, Andres Freund kirjoitti:
>>
>> Any chance that plsh or the script it executes does anything with the file
>> descriptors it inherits? That'd certainly be one way to get into odd
>> corruption issues.
>>
>> We really should use O_CLOEXEC for the majority of our file handles.
>
> Attached a patch to always use O_CLOEXEC in BasicOpenFile if we're not using
> EXEC_BACKEND. It'd be nice to not expose all fds to most pl-languages
> either, but I guess there's no easy solution to that without forcibly
> closing all fds whenever any functions are called.

FYI, this is not my first run-in with strange behavior; see this thread
(not necessarily worth reading):
https://www.postgresql.org/message-id/CAHyXU0x5mW-SbSuUBEshzumOaN7JPUWa7Ejza68HE-KY0Nq7Kg%40mail.gmail.com

I had a similar set of starting conditions that resulted in very
strange behavior (but not data corruption AFAICT); the problem
mysteriously disappeared when I fixed some bugs that would cause the
routine to concurrently do the same operation.

I would like to point out that I use both pl/sh (and, via it, sqsh)
quite heavily, so these problems are not necessarily the norm.
Regardless, it seems like you might be on to something, and I'm
inclined to patch your change, test it, and roll it out to production.
If it helps, or at least narrows the problem down, we ought to give it
consideration for inclusion (unless someone else can think of a good
reason not to do that, heh!).

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
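The hazard Oskari's patch addresses can be demonstrated outside postgres. A minimal sketch (Python standing in for the backend; not postgres code) showing that a file descriptor left inheritable survives exec() into a child process, while one marked close-on-exec does not. Python's `inheritable` flag on `os.dup2` is its spelling of the `O_CLOEXEC` behavior discussed above; fds 100/101 are arbitrary numbers chosen to avoid colliding with the child's standard fds:

```python
import os
import subprocess
import sys

fd = os.open(os.devnull, os.O_WRONLY)
os.dup2(fd, 100)                      # inheritable (no close-on-exec): leaks to children
os.dup2(fd, 101, inheritable=False)   # close-on-exec, as O_CLOEXEC would arrange

# Child process probes whether each fd number is still open after exec.
probe = (
    "import os\n"
    "def alive(fd):\n"
    " try: os.fstat(fd); return True\n"
    " except OSError: return False\n"
    "print(alive(100), alive(101))\n"
)
out = subprocess.run([sys.executable, "-c", probe], close_fds=False,
                     capture_output=True, text=True).stdout.strip()
print(out)  # the inheritable fd survives exec; the close-on-exec one does not
```

In the plsh scenario, every such surviving descriptor is a relation file or WAL segment that the shell script (and anything it execs, like sqsh) could scribble on if its own error handling is busted.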
Re: [HACKERS] emergency outage requiring database restart
On Tue, Nov 1, 2016 at 8:56 AM, Tom Lane wrote: > Merlin Moncure writes: >> On Mon, Oct 31, 2016 at 10:32 AM, Oskari Saarenmaa wrote: >>> Your production system's postgres backends probably have a lot more open >>> files associated with them than the simple test case does. Since Postgres >>> likes to keep files open as long as possible and only closes them when you >>> need to free up fds to open new files, it's possible that your production >>> backends have almost all allowed fds used when you execute your pl/sh >>> function. >>> >>> If that's the case, the sqsh process that's executed may not have enough fds >>> to do what it wanted to do and because of busted error handling could end up >>> writing to fds that were opened by Postgres and point to $PGDATA files. > >> Does that apply? the mechanics are a sqsh function that basically does: >> cat foo.sql | sqsh >> pipe redirection opens a new process, right? > > Yeah, but I doubt that either level of the shell would attempt to close > inherited file handles. > > The real problem with Oskari's theory is that it requires not merely > busted, but positively brain-dead error handling in the shell and/or > sqsh, ie ignoring open() failures altogether. That seems kind of > unlikely. Still, I suspect he might be onto something --- there must > be some reason you can reproduce the issue in production and not in > your test bed, and number-of-open-files is as good a theory as I've > heard. > > Maybe the issue is not with open() failures, but with the resulting > FD numbers being much larger than sqsh is expecting. It would be > weird to try to store an FD in something narrower than int, but > I could see a use of select() being unprepared for large FDs. > Still, it's hard to translate that idea into scribbling on the > wrong file... Looking at the sqsh code, nothing really stands out. It's highly developed and all obvious errors are checked. 
There certainly could be a freak bug in there (or in libfreetds, which
sqsh links to) doing the damage, though. In the meantime I'll continue
to try to work up a reliable reproduction. This particular routine
only gets called in batches on a quarterly basis, so things have
settled down.

Just a thought: could COPY be tricked into writing into the wrong file
descriptor? For example, if a file was killed with rm -rf and the
fd-pressured backend reopened the fd immediately?

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Mon, Oct 31, 2016 at 10:32 AM, Oskari Saarenmaa wrote:
> 27.10.2016, 21:53, Merlin Moncure kirjoitti:
>>
>> As noted earlier, I was not able to reproduce the issue with
>> crashme.sh, which was:
>>
>> NUM_FORKS=16
>> do_parallel psql -p 5432 -c"select PushMarketSample('1740')" castaging_test
>> do_parallel psql -p 5432 -c"select PushMarketSample('4400')" castaging_test
>> do_parallel psql -p 5432 -c"select PushMarketSample('2160')" castaging_test
>> do_parallel psql -p 5432 -c"select PushMarketSample('6680')" castaging_test
>>
>> (do_parallel is simple wrapper to executing the command in parallel up
>> to NUM_FORKS). This is on the same server and cluster as above.
>> This kind of suggests that either
>> A) there is some concurrent activity from another process that is
>> tripping the issue
>> or
>> B) there is something particular to the session invoking the function
>> that is participating in the problem. As the application is
>> structured, a single threaded node.js app is issuing the query that is
>> high traffic and long lived. It's still running in fact and I'm kind
>> of tempted to find some downtime to see if I can still reproduce via
>> the UI.
>
> Your production system's postgres backends probably have a lot more open
> files associated with them than the simple test case does. Since Postgres
> likes to keep files open as long as possible and only closes them when you
> need to free up fds to open new files, it's possible that your production
> backends have almost all allowed fds used when you execute your pl/sh
> function.
>
> If that's the case, the sqsh process that's executed may not have enough fds
> to do what it wanted to do and because of busted error handling could end up
> writing to fds that were opened by Postgres and point to $PGDATA files.

Does that apply? The mechanics are a sqsh function that basically does:

cat foo.sql | sqsh

Pipe redirection opens a new process, right?
merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
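The do_parallel wrapper itself isn't shown in the thread; a hypothetical bash sketch of such a wrapper (the name, structure, and NUM_FORKS default are assumptions based on the description "simple wrapper to executing the command in parallel up to NUM_FORKS") might look like:

```shell
#!/bin/bash
# Hypothetical reconstruction, not the script from the thread: launch
# NUM_FORKS background copies of the given command, then wait for all.
NUM_FORKS=${NUM_FORKS:-16}

do_parallel() {
    local i
    for ((i = 0; i < NUM_FORKS; i++)); do
        "$@" &          # fork one concurrent copy of the command
    done
    wait                # block until every background copy has exited
}

do_parallel echo "select PushMarketSample('1740')"
```

Each `&` fork inherits the parent shell's open descriptors, which is why this kind of harness is relevant to the fd-inheritance theory being tested.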
Re: [HACKERS] emergency outage requiring database restart
On Fri, Oct 28, 2016 at 3:16 PM, Jim Nasby wrote:
> On 10/28/16 8:23 AM, Merlin Moncure wrote:
>>
>> On Thu, Oct 27, 2016 at 6:39 PM, Greg Stark wrote:
>>>
>>> On Thu, Oct 27, 2016 at 9:53 PM, Merlin Moncure wrote:
>>>>
>>>> I think we can rule out faulty storage
>>>
>>> Nobody ever expects the faulty storage
>
> LOL
>
>> Believe me, I know. But the evidence points elsewhere in this case;
>> this is clearly application driven.
>
> FWIW, just because it's triggered by specific application behavior doesn't
> mean there isn't a storage bug. That's what makes data corruption bugs such
> a joy to figure out.
>
> BTW, if you haven't already, I would reset all your storage related options
> and GUCs to safe defaults... plain old FSYNC, no cute journal / FS / mount
> options, etc. Maybe this is related to the app, but the most helpful thing
> right now is to find some kind of safe config so you can start bisecting.

Upthread, you might have noticed that I already did that. Here is the
other evidence:

*) server running fine for 5+ years
*) other database on same cluster not impacted, with 10x the write activity
*) no interesting logs reported in /var/log/messages, dmesg, etc.
*) san fabric turns over petabytes/day with no corruption. 100+
postgres clusters, 1000+ sql server clusters (and that's not production)
*) storage/network teams have been through everything. nothing
interesting/unusual to report
*) we have an infrequently run routine (posted upthread) that, when
run, crashed the database within minutes
*) after turning on checksums, 30% of invocations of the routine
resulted in a checksum error
*) the problem re-occurred after a dump/restore and full cluster rebuild
*) the checksum error caused the routine to roll back. FWICT this
prevented the damage
*) everything is fine now that the routine is not being run anymore

You can come up with your conclusion; I've come up with mine. The only
frustrating thing here is that I can't reproduce outside of the
production environment.
If this database goes down I have 30 people sitting around so I can't take downtime lightly. > I would also consider alternatives to plsh, just to rule it out if nothing > else. I'd certainly look at some way to get sqsh out of the loop (again, > just to get something that doesn't crash). First idea that comes to mind is > a stand-alone shell script that watches a named pipe for a filename; when it > gets that file it runs it with sqsh and does something to signal completion. I do a lot of etl to/from sql server and it's all sqsh based. If I can figure out how to reproduce in a better way, I'll zero in on the problem in about 10 minutes. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Thu, Oct 27, 2016 at 6:39 PM, Greg Stark wrote: > On Thu, Oct 27, 2016 at 9:53 PM, Merlin Moncure wrote: >> I think we can rule out faulty storage > > Nobody ever expects the faulty storage Believe me, I know. But the evidence points elsewhere in this case; this is clearly application driven. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Thu, Oct 27, 2016 at 2:31 AM, Ants Aasma wrote: > On Wed, Oct 26, 2016 at 8:43 PM, Merlin Moncure wrote: >> /var/lib/pgsql/9.5/data/pg_log/postgresql-26.log | grep "page >> verification" >> 2016-10-26 11:26:42 CDT [postgres@castaging]: WARNING: page >> verification failed, calculated checksum 37251 but expected 37244 >> 2016-10-26 11:27:55 CDT [postgres@castaging]: WARNING: page >> verification failed, calculated checksum 37249 but expected 37244 >> 2016-10-26 12:16:44 CDT [postgres@castaging]: WARNING: page >> verification failed, calculated checksum 44363 but expected 44364 >> 2016-10-26 12:18:58 CDT [postgres@castaging]: WARNING: page >> verification failed, calculated checksum 49525 but expected 49539 >> 2016-10-26 12:19:12 CDT [postgres@castaging]: WARNING: page >> verification failed, calculated checksum 37345 but expected 37340 > > The checksum values are improbably close. The checksum algorithm has > decently good mixing of all bits in the page. Having the first byte > match in 5 checksums makes this 1:2^40 improbable. What is not mixed > in properly is the block number, it only gets xor'ed before packing > the value into 16bits using modulo 0x. So I'm pretty sure > different block numbers were used for writing out and reading in the > page. Either the blocknum gets corrupted between calculating the > checksum and writing the page out (unlikely given the proximity), or > the pages are somehow getting transposed in the storage. I think we can rule out faulty storage, and I'm reasonably sure nothing is writing to the database cluster except postgres itself. So far I have not executed an emergency dump/restore on this database, nor have I received any other log entries suggesting problems of any kind. Meaning, FWICT, the database is completely healthy notwithstanding the reported errors. This suggests (but does not prove) that the syscatalog damaging activity happened during the commission of the attached function and healed itself during the rollback. 
Prior to turning on checksums, the transaction did *not* roll back and
the damage was allowed to stand and probably accumulated.

As noted earlier, I was not able to reproduce the issue with
crashme.sh, which was:

NUM_FORKS=16
do_parallel psql -p 5432 -c"select PushMarketSample('1740')" castaging_test
do_parallel psql -p 5432 -c"select PushMarketSample('4400')" castaging_test
do_parallel psql -p 5432 -c"select PushMarketSample('2160')" castaging_test
do_parallel psql -p 5432 -c"select PushMarketSample('6680')" castaging_test

(do_parallel is a simple wrapper that executes the command in parallel,
up to NUM_FORKS copies.) This is on the same server and cluster as
above. This kind of suggests that either

A) there is some concurrent activity from another process that is
tripping the issue, or
B) there is something particular to the session invoking the function
that is participating in the problem.

As the application is structured, a single-threaded node.js app is
issuing the query that is high-traffic and long-lived. It's still
running, in fact, and I'm kind of tempted to find some downtime to see
if I can still reproduce via the UI.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
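Ants' transposition theory above can be illustrated with a toy model (illustrative only; PostgreSQL's real checksum mixes the page contents quite differently, though it does xor the block number in late and fold the result to 16 bits modulo 65535). If only the xor'ed-in block number differs between write and verify, the two checksums differ in low bits only, matching the "calculated 61797 but expected 61798" pattern in the logs; genuinely corrupted page data would make them differ randomly across all 16 bits:

```python
def toy_page_checksum(mixed_page_hash: int, blkno: int) -> int:
    # Toy model: a well-mixed hash of the page contents, with the block
    # number xor'ed in just before folding to 16 bits. Not PostgreSQL's
    # actual mixing function.
    return ((mixed_page_hash ^ blkno) % 65535) + 1

mixed = 0x9E3779B97F4A7C15   # stands in for a fully mixed page-content hash

for blkno in (100, 2048, 30000):
    written = toy_page_checksum(mixed, blkno)       # checksum stored on write
    misread = toy_page_checksum(mixed, blkno + 1)   # verified under a neighboring block number
    print(blkno, written, misread, abs(written - misread))
```

For even block numbers the neighboring block differs only in the lowest bit, so the checksums come out adjacent, which is exactly the improbable closeness observed in the log.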
Re: [HACKERS] emergency outage requiring database restart
On Wed, Oct 26, 2016 at 3:06 PM, Jim Nasby wrote:
> On 10/26/16 2:25 PM, Merlin Moncure wrote:
>>
>> I don't think that's the case. sqsh is a psql-like utility. it
>> writes to stdout and stderr only which is captured by plsh and sent.
>> In this context shexec only wraps rm -f 'file' where 'file' is a file
>> previously created with COPY in the same transaction.
>
> How do you know that? It could potentially be doing anything it wanted with
> file handles. Removing the exec might "solve" the problem here, assuming
> that the forked process doesn't still inherit all open FH's.
>
> In a nutshell, running arbitrary shell code (let alone arbitrary
> executables) in a pl that's running inside the Postgres backend sounds
> insanely risky to me.

Well, it is insanely risky, but the code in this context is not
arbitrary; it's only:

PERFORM shexec('rm -f ' || _OutputFile);

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Wed, Oct 26, 2016 at 2:12 PM, Andres Freund wrote:
> On 2016-10-26 13:49:12 -0500, Merlin Moncure wrote:
>> On Wed, Oct 26, 2016 at 1:45 PM, Andres Freund wrote:
>> >
>> > On October 26, 2016 9:38:49 PM GMT+03:00, Merlin Moncure wrote:
>> >>On Wed, Oct 26, 2016 at 1:34 PM, Andres Freund wrote:
>> >>> Any chance that plsh or the script it executes does anything with the
>> >>file descriptors it inherits? That'd certainly be one way to get into odd
>> >>corruption issues.
>> >>
>> >>not sure. it's pretty small -- see
>> >>https://github.com/petere/plsh/blob/master/plsh.c
>> >
>> > Afaics that could also be in your script, not just plsh. The latter
>> > doesn't seem to close all file handles above stderr, which means that all
>> > handles for relations etc would be open in your script. If you e.g. do
>> > any unusual redirections (2>&17 or such), that could end badly. But I'm
>> > just on my phone, in a taxi without seatbelts, at 60mph, so I didn't look
>> > carefully.
>>
>> gotcha :-). see above:
>>
>> *) sqshf:
>> #!/bin/bash
>> cat \
>>   $2 \
>>   | eval "sqsh $1 -L'datetime=%Y-%m-%d %H:%M:%S.%u' -G 7.0"
>>
>> echo "Success"
>>
>> *) shexec:
>> #!/bin/bash
>>
>> eval $1
>>
>> FWICT that's all that's happening here with respect to pl/sh.
>
> My point is that that doesn't mean anything. Whatever sqsh is, or
> whatever $1 eval's to (and $2 for that matter), could access the
> filehandles the backend has opened.

I don't think that's the case. sqsh is a psql-like utility; it writes
to stdout and stderr only, which is captured by plsh and sent back. In
this context shexec only wraps rm -f 'file', where 'file' is a file
previously created with COPY in the same transaction.

In a test environment, spamming this routine with 16 threads, I was not
able to reproduce. Still hunting...

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Wed, Oct 26, 2016 at 1:45 PM, Andres Freund wrote:
>
> On October 26, 2016 9:38:49 PM GMT+03:00, Merlin Moncure wrote:
>>On Wed, Oct 26, 2016 at 1:34 PM, Andres Freund wrote:
>>> Any chance that plsh or the script it executes does anything with the
>>file descriptors it inherits? That'd certainly be one way to get into odd
>>corruption issues.
>>
>>not sure. it's pretty small -- see
>>https://github.com/petere/plsh/blob/master/plsh.c
>
> Afaics that could also be in your script, not just plsh. The latter doesn't
> seem to close all file handles above stderr, which means that all handles
> for relations etc would be open in your script. If you e.g. do any unusual
> redirections (2>&17 or such), that could end badly. But I'm just on my
> phone, in a taxi without seatbelts, at 60mph, so I didn't look carefully.

gotcha :-). see above:

*) sqshf:
#!/bin/bash
cat \
  $2 \
  | eval "sqsh $1 -L'datetime=%Y-%m-%d %H:%M:%S.%u' -G 7.0"

echo "Success"

*) shexec:
#!/bin/bash

eval $1

FWICT that's all that's happening here with respect to pl/sh. I'm
almost done with the reproduction environment, and if I get the issue I
should be able to zero in on the problem. One final thing is that
concurrency is a contributing factor.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Wed, Oct 26, 2016 at 1:34 PM, Andres Freund wrote: > Any chance that plsh or the script it executes does anything with the file > descriptors it inherits? That'd certainly be one way to get into odd corruption > issues. not sure. it's pretty small -- see https://github.com/petere/plsh/blob/master/plsh.c merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Wed, Oct 26, 2016 at 1:09 PM, Tom Lane wrote: > Merlin Moncure writes: >> *) I've now strongly correlated this routine with the damage. > > Hmm. Do you have any way to replace the non-core calls with something > else? The "shexec('rm -f ' || _OutputFile)" bits could presumably be > converted to use contrib/adminpack's pg_file_unlink(), or an equivalent > one-liner in plperlu or plpythonu. I don't know what the sqshf > incantation does, though. > > This wouldn't be about fixing it so much as narrowing down where the > problem is. Will do. Before you spend a lot of time, let me get full confirmation that this is the source of the issue and I'll bake a reproduction script (there is still a possibility that something *else* caused it and by happenstance we caught it here). For posterity, sqshf does: CREATE OR REPLACE FUNCTION sqshf( ConnectionString TEXT, FileName TEXT) RETURNS TEXT AS $$#!/bin/bash cat \ $2 \ | eval "sqsh $1 -L'datetime=%Y-%m-%d %H:%M:%S.%u' -G 7.0" echo "Success" $$ LANGUAGE plsh; sqsh being the FreeTDS-based Linux console client for SQL Server. So basically not much. shexec does: CREATE OR REPLACE FUNCTION shexec(_cmd TEXT) RETURNS TEXT AS $$#!/bin/bash eval $1 $$ LANGUAGE plsh; :-) merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Wed, Oct 26, 2016 at 12:43 PM, Merlin Moncure wrote: > On Wed, Oct 26, 2016 at 11:35 AM, Merlin Moncure wrote: >> On Tue, Oct 25, 2016 at 3:08 PM, Merlin Moncure wrote: >>> Confirmation of problem re-occurrence will come in a few days. I'm >>> much more likely to believe a 6+sigma occurrence (storage, freak bug, >>> etc) should it prove the problem goes away post rebuild. >> >> ok, no major reported outage yet, but just got: >> >> 2016-10-26 11:27:55 CDT [postgres@castaging]: ERROR: invalid page in >> block 12 of relation base/203883/1259 *) I've now strongly correlated this routine with the damage. [root@rcdylsdbmpf001 ~]# cat /var/lib/pgsql/9.5/data/pg_log/postgresql-26.log | grep -i pushmarketsample | head -5 2016-10-26 11:26:27 CDT [postgres@castaging]: LOG: execute : SELECT PushMarketSample($1::TEXT) AS published 2016-10-26 11:26:40 CDT [postgres@castaging]: LOG: execute : SELECT PushMarketSample($1::TEXT) AS published PL/pgSQL function pushmarketsample(text,date,integer) line 103 at SQL statement PL/pgSQL function pushmarketsample(text,date,integer) line 103 at SQL statement 2016-10-26 11:26:42 CDT [postgres@castaging]: STATEMENT: SELECT PushMarketSample($1::TEXT) AS published *) First invocation was 11:26:27 CDT *) Second invocation was 11:26:40 and gave checksum error (as noted earlier 11:26:42) *) Routine attached (if interested) My next step is to set up a test environment and jam this routine aggressively to see what happens. merlin /* * Send sample for a market to the pubs database. It will delete existing * sample and replace it with whatever is determined to be the best available * sample for each apartment in the market. * * XXX: assume 'asof' defines the period for which we are replacing data. * Should that assumption be invalidated, we will need to accept an argument * overriding the period. * * XXX: AsOf controls the extraction date but if a property's 'good' sample * is after asof, no sample is pulled for the property. 
*/ SELECT DropFunction('PushMarketSample'); CREATE OR REPLACE FUNCTION PushMarketSample( _MarketID TEXT, _AsOf DATE DEFAULT now()::DATE, _MaxSampleDays INT DEFAULT 90, Result OUT TEXT) RETURNS TEXT AS $$ DECLARE _OutputFile TEXT; BEGIN _OutputFile := format('/tmp/apartment_sample_%s_%s.sql', _MarketId, to_char(_AsOf, 'MMDD')); DROP TABLE IF EXISTS AptSample; CREATE TEMP TABLE AptSample ON COMMIT DROP AS SELECT ApartmentId, FloorplanNumber, msasubmkt AS SubmarketId, (SELECT Period FROM DataEntryPeriod) AS Sampled, Vacant, DiscountedLowRent, DiscountedHighRent, LowRent, HighRent, LowDiscountRate, HighDiscountRate, Remarks, /* Is there a spot for remarks in MPF? */ TotalUnits, UnitCount, IncludeInAggregations, Rent, Occupancy FROM vw_ApartmentSample aq WHERE Current AND Sampled::DATE BETWEEN (_AsOf - _MaxSampleDays) AND _AsOf AND _MarketID = msa AND DataSource = 'Survey'; /* XXX: write out the sql script because we are stupidly prohibited from * using pl/pgsql variables embedded into sql statements when using COPY. 
*/ DROP TABLE IF EXISTS Script; CREATE TEMP TABLE Script ON COMMIT DROP AS SELECT 'begin transaction' UNION ALL SELECT format( 'DELETE tblAptDet ' 'FROM tblAptDet d ' 'JOIN tblApt a ON a.id = d.id ' 'WHERE ' ' a.msa = %s ' ' AND d.Period = %s', quote_literal(_MarketID), (SELECT Period FROM DataEntryPeriod)) UNION ALL SELECT format( 'INSERT INTO tblaptdet(' 'Id, PlanNumber, Period, Vacant, DLoRate, DHiRate, ' 'FLoRate, FHiRate, LoDiscount, HiDiscount, Remarks, Quantity, ' 'UpDtoHistory) ' 'VALUES(' '%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', quote_nullable(ApartmentId), quote_nullable(FloorplanNumber), quote_nullable(Sampled), quote_nullable(Vacant), quote_nullable(DiscountedLowRent), quote_nullable(DiscountedHighRent), quote_nullable(LowRent), quote_nullable(HighRent), quote_nullable(LowDiscountRate), quote_nullable(HighDiscountRate), quote_nullable(Remarks), quote_nullable(UnitCount), CASE WHEN IncludeInAggregations THEN 1 ELSE 0 END) FROM AptSample UNION ALL SELECT 'commit transaction' UNION ALL SELECT 'go'; EXECUTE 'COPY Script TO ' || quote_literal(_OutputFile); /* push in the new sample! */ SELECT sqshf(ConnectTo('MPFUI'), _OutputFile) INTO Result; PERFORM shexec('rm -
Re: [HACKERS] emergency outage requiring database restart
On Wed, Oct 26, 2016 at 11:35 AM, Merlin Moncure wrote: > On Tue, Oct 25, 2016 at 3:08 PM, Merlin Moncure wrote: >> Confirmation of problem re-occurrence will come in a few days.I'm >> much more likely to believe 6+sigma occurrence (storage, freak bug, >> etc) should it prove the problem goes away post rebuild. > > ok, no major reported outage yet, but just got: > > 2016-10-26 11:27:55 CDT [postgres@castaging]: ERROR: invalid page in > block 12 of relation base/203883/1259 *) still on 9.5.2 :( not in control of the package update process *) getting more of these: [root@rcdylsdbmpf001 data]# cat /var/lib/pgsql/9.5/data_checksum/pg_log/postgresql-26.log | grep "invalid page" 2016-10-26 11:26:42 CDT [postgres@castaging]: ERROR: invalid page in block 10 of relation base/203883/1259 2016-10-26 11:27:55 CDT [postgres@castaging]: ERROR: invalid page in block 12 of relation base/203883/1259 2016-10-26 12:16:44 CDT [postgres@castaging]: ERROR: invalid page in block 13 of relation base/203883/1259 2016-10-26 12:18:58 CDT [postgres@castaging]: ERROR: invalid page in block 15 of relation base/203883/1259 2016-10-26 12:19:12 CDT [postgres@castaging]: ERROR: invalid page in block 7 of relation base/203883/2662 castaging=# select relname from pg_class where oid in(1259, 2662); relname pg_class_oid_index pg_class *) only one database in the cluster is damaged. I do not suspect storage at this point *) Currently I can execute pg_dump with no problems. So far impact is not severe but soon I have to do dump/restore *) age(xmin) from pg_class gives all reasonable values (approx 757k) that aren't 2147483647 *) All cases of statement executing getting this error in a pl/pgsql routine that does the following: BEGIN ... DROP TABLE IF EXISTS foo; CREATE TEMP TABLE foo ON COMMIT DROP AS ... SELECT ... INTO Result; DROP TABLE IF EXISTS bar; CREATE TEMP TABLE bar ON COMMIT DROP AS ... 
PERFORM ; EXCEPTION WHEN OTHERS THEN PERFORM ; RAISE; END; *) page verification miss is numerically close [root@rcdylsdbmpf001 ~]# cat /var/lib/pgsql/9.5/data/pg_log/postgresql-26.log | grep "page verification" 2016-10-26 11:26:42 CDT [postgres@castaging]: WARNING: page verification failed, calculated checksum 37251 but expected 37244 2016-10-26 11:27:55 CDT [postgres@castaging]: WARNING: page verification failed, calculated checksum 37249 but expected 37244 2016-10-26 12:16:44 CDT [postgres@castaging]: WARNING: page verification failed, calculated checksum 44363 but expected 44364 2016-10-26 12:18:58 CDT [postgres@castaging]: WARNING: page verification failed, calculated checksum 49525 but expected 49539 2016-10-26 12:19:12 CDT [postgres@castaging]: WARNING: page verification failed, calculated checksum 37345 but expected 37340 *) sample log leading into the first error (all queries are logged) 2016-10-26 11:26:40 CDT [postgres@castaging]: LOG: execute : SELECT PushMarketSample($1::TEXT) AS published 2016-10-26 11:26:40 CDT [postgres@castaging]: DETAIL: parameters: $1 = '8840' 2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG: statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ 2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG: statement: SAVEPOINT s2 2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG: execute : DECLARE c1 CURSOR FOR SELECT period FROM public.dataentryperiod 2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG: statement: FETCH 100 FROM c1 2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG: statement: CLOSE c1 2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG: execute : DECLARE c2 CURSOR FOR SELECT period FROM public.dataentryperiod 2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG: statement: FETCH 100 FROM c2 2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG: statement: CLOSE c2 2016-10-26 11:26:41 CDT [rms@mpf2]: ERROR: relation "tblagent" does not exist at character 15 2016-10-26 11:26:41 CDT [rms@mpf2]: STATEMENT: select * from tblagent 2016-10-26 11:26:41 CDT [rms@mpf2]: LOG: 
execute : SELECT CURRENT_SCHEMA() 2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG: execute : DECLARE c3 CURSOR FOR SELECT period FROM public.dataentryperiod 2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG: statement: FETCH 100 FROM c3 2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG: statement: CLOSE c3 2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG: execute : DECLARE c4 CURSOR FOR SELECT period FROM public.dataentryperiod 2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG: statement: FETCH 100 FROM c4 2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG: statement: CLOSE c4 2016-10-26 11:26:42 CDT [postgres@castaging]: WARNING: page verification failed, calculated checksum 37251 but expected 37244 2016-10-26 11:26:42 CDT [postgres@castaging]: CONTEXT: SQL statement "CREATE TEMP TABLE ApartmentPublishBySubmkt ON COMMIT DROP AS SELECT ApartmentId, To
Re: [HACKERS] emergency outage requiring database restart
On Tue, Oct 25, 2016 at 3:08 PM, Merlin Moncure wrote: > Confirmation of problem re-occurrence will come in a few days. I'm > much more likely to believe a 6+sigma occurrence (storage, freak bug, > etc) should it prove the problem goes away post rebuild. ok, no major reported outage yet, but just got: 2016-10-26 11:27:55 CDT [postgres@castaging]: ERROR: invalid page in block 12 of relation base/203883/1259 merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Tue, Oct 25, 2016 at 2:31 PM, Tom Lane wrote: > Merlin Moncure writes: >> What if the subsequent data loss was in fact a symptom of the first >> outage? Is it in theory possible for data to appear visible but then be >> eaten up as the transactions making the data visible get voided out by >> some other mechanism? I had to pull a quick restart the first time and >> everything looked ok -- or so I thought. What I think was actually >> happening is that data started to slip into the void. It's like >> randomly sys catalogs were dropping off. I bet other data was, too. I >> can pull older backups and verify that. It's as if some creeping xmin >> was snuffing everything out. > > Might be interesting to look at age(xmin) in a few different system > catalogs. I think you can ignore entries with age = 2147483647; > those should be frozen rows. But if you see entries with very large > ages that are not that, it'd be suspicious. nothing really stands out. The damage did re-occur after a dump/restore -- not sure about a cluster level rebuild. No problems previous to that. This suggests that, if this theory holds, the damage would have had to have been under the database level -- perhaps in clog. Maybe hint bits and clog did not agree as to commit or delete status, for example. clog has plenty of history leading past the problem barrier: -rwx-- 1 postgres postgres 256K Jul 10 16:21 0000 -rwx-- 1 postgres postgres 256K Jul 21 12:39 0001 -rwx-- 1 postgres postgres 256K Jul 21 13:19 0002 -rwx-- 1 postgres postgres 256K Jul 21 13:59 0003 Confirmation of problem re-occurrence will come in a few days. I'm much more likely to believe a 6+sigma occurrence (storage, freak bug, etc) should it prove the problem goes away post rebuild. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
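Incidentally, the 256K segment size in that listing follows directly from the clog layout: 2 commit-status bits per transaction, 8kB pages, and 32 pages per SLRU segment. The xid-to-segment arithmetic can be sketched in shell (constants are the stock defaults; adjust if BLCKSZ differs):

```shell
#!/bin/bash
# Map a transaction id to the pg_clog segment file and byte offset holding
# its 2-bit commit status. Defaults: 2 bits/xact, 8kB page, 32 pages per
# segment => 1048576 transactions per 256kB segment file.
xid=${1:-1234567}
xacts_per_byte=4                                 # 8 bits / 2 bits per xact
xacts_per_segment=$((8192 * xacts_per_byte * 32))
printf 'xid %s -> pg_clog segment %04X, byte offset %d\n' \
  "$xid" $((xid / xacts_per_segment)) $(( (xid % xacts_per_segment) / xacts_per_byte ))
```

So an xid a bit past one million lands in segment 0001, consistent with the second file in the listing above.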
Re: [HACKERS] emergency outage requiring database restart
On Tue, Oct 25, 2016 at 12:57 PM, Alvaro Herrera wrote: > Merlin Moncure wrote: > >> After last night, I rebuilt the cluster, turning on checksums, turning >> on synchronous commit (it was off) and added a standby replica. This >> should help narrow the problem down should it re-occur; if storage is >> bad (note, other database on same machine is doing 10x write activity >> and is fine) or something is scribbling on shared memory (my guess >> here) then checksums should be popped, right? > > Not really sure about that. As I recall we compute the CRC on the > buffer's way out, based on the then-current contents, so if something > scribbles on the buffer while it's waiting to be evicted, the CRC > computation would include the new (corrupted) bytes rather than the > original ones -- see FlushBuffer. Huh. I have a new theory on this. Dealing with the reconstituted database, I'm finding more things -- functions and such -- that are simply gone and had to be rebuilt; they escaped notice as they were not in primary code paths. Recall that the original outage manifested as queries getting stuck, possibly on a spinlock (we don't know for sure). After that, things started to randomly disappear, possibly from system catalogs (but I now need to go back and verify older data, I think). There were three autovac processes running. What if the subsequent data loss was in fact a symptom of the first outage? Is it in theory possible for data to appear visible but then be eaten up as the transactions making the data visible get voided out by some other mechanism? I had to pull a quick restart the first time and everything looked ok -- or so I thought. What I think was actually happening is that data started to slip into the void. It's like randomly sys catalogs were dropping off. I bet other data was, too. I can pull older backups and verify that. It's as if some creeping xmin was snuffing everything out. 
The confirmation of this should be obvious -- if that's indeed the case, the backed-up and restored cluster should no longer exhibit data loss. Given that I was getting that every 1-2 days, we should be able to figure that out pretty soon. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
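Alvaro's point -- the checksum is computed from whatever is in the buffer at eviction time -- can be modeled with a toy sketch (cksum stands in for the real page checksum algorithm; the file and its contents are made up):

```shell
#!/bin/bash
# Toy model of FlushBuffer: the page checksum is computed on the way out,
# so a scribble that lands while the page is still in shared memory gets
# checksummed along with it and verifies cleanly on the next read.
page=$(mktemp)
buffer="good page contents"
buffer="gXod page contents"                              # scribble before eviction
stored=$(printf '%s' "$buffer" | cksum | cut -d' ' -f1)  # CRC computed on the way out
printf '%s' "$buffer" > "$page"                          # page hits disk already corrupt
reread=$(cksum < "$page" | cut -d' ' -f1)
[ "$stored" = "$reread" ] && echo "verification passes despite corruption"
rm -f "$page"
```

In other words, page checksums catch pages altered after the write (or bad storage), not corruption that happens in shared buffers before eviction.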
Re: [HACKERS] emergency outage requiring database restart
On Mon, Oct 24, 2016 at 9:18 PM, Alvaro Herrera wrote: > Merlin Moncure wrote: >> On Mon, Oct 24, 2016 at 6:01 PM, Merlin Moncure wrote: > >> > Corruption struck again. >> > This time got another case of view busted -- attempting to create >> > gives missing 'type' error. >> >> Call it a hunch -- I think the problem is in pl/sh. > > I've heard that before. well, yeah, previously I had an issue where the database crashed during a heavy concurrent pl/sh-based load. However the problems went away when I refactored the code. Anyways, I looked at the code and couldn't see anything obviously wrong, so who knows? All I know is my production database is exploding continuously and I'm looking for answers. The only other extension in heavy use on this server is postgres_fdw. The other database on the cluster is fine, which kind of suggests we are not facing clog or WAL type problems. After last night, I rebuilt the cluster, turning on checksums, turning on synchronous commit (it was off), and adding a standby replica. This should help narrow the problem down should it re-occur; if storage is bad (note, the other database on the same machine is doing 10x the write activity and is fine) or something is scribbling on shared memory (my guess here) then checksums should be popped, right? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Mon, Oct 24, 2016 at 6:01 PM, Merlin Moncure wrote: > On Thu, Oct 20, 2016 at 1:52 PM, Merlin Moncure wrote: >> On Wed, Oct 19, 2016 at 2:39 PM, Merlin Moncure wrote: >>> On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian wrote: >>>> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote: >>>>> > Yeah. Believe me -- I know the drill. Most or all the damage seemed >>>>> > to be to the system catalogs with at least two critical tables dropped >>>>> > or inaccessible in some fashion. A lot of the OIDs seemed to be >>>>> > pointing at the wrong thing. Couple more datapoints here. >>>>> > >>>>> > *) This database is OLTP, doing ~ 20 tps avg (but very bursty) >>>>> > *) Another database on the same cluster was not impacted. However >>>>> > it's more olap style and may not have been written to during the >>>>> > outage >>>>> > >>>>> > Now, this infrastructure running this system is running maybe 100ish >>>>> > postgres clusters and maybe 1000ish sql server instances with >>>>> > approximately zero unexplained data corruption issues in the 5 years >>>>> > I've been here. Having said that, this definitely smells and feels >>>>> > like something on the infrastructure side. I'll follow up if I have >>>>> > any useful info. >>>>> >>>>> After a thorough investigation I now have credible evidence the source >>>>> of the damage did not originate from the database itself. >>>>> Specifically, this database is mounted on the same volume as the >>>>> operating system (I know, I know) and something non database driven >>>>> sucked up disk space very rapidly and exhausted the volume -- fast >>>>> enough that sar didn't pick it up. Oh well :-) -- thanks for the help >>>> >>>> However, disk space exhaustion should not lead to corruption unless the >>>> underlying layers lied in some way. >>> >>> I agree -- however I'm sufficiently separated from the things doing >>> the things that I can't verify that in any real way. 
In the meantime >>> I'm going to take standard precautions (enable checksums/dedicated >>> volume/replication). Low disk space also does not explain the bizarre >>> outage I had last friday. >> >> ok, data corruption struck again. This time disk space is ruled out, >> and access to the database is completely denied: >> postgres=# \c castaging >> WARNING: leaking still-referenced relcache entry for >> "pg_index_indexrelid_index" > > Corruption struck again. > This time got another case of view busted -- attempting to create > gives missing 'type' error. Call it a hunch -- I think the problem is in pl/sh. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Thu, Oct 20, 2016 at 1:52 PM, Merlin Moncure wrote: > On Wed, Oct 19, 2016 at 2:39 PM, Merlin Moncure wrote: >> On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian wrote: >>> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote: >>>> > Yeah. Believe me -- I know the drill. Most or all the damage seemed >>>> > to be to the system catalogs with at least two critical tables dropped >>>> > or inaccessible in some fashion. A lot of the OIDs seemed to be >>>> > pointing at the wrong thing. Couple more datapoints here. >>>> > >>>> > *) This database is OLTP, doing ~ 20 tps avg (but very bursty) >>>> > *) Another database on the same cluster was not impacted. However >>>> > it's more olap style and may not have been written to during the >>>> > outage >>>> > >>>> > Now, this infrastructure running this system is running maybe 100ish >>>> > postgres clusters and maybe 1000ish sql server instances with >>>> > approximately zero unexplained data corruption issues in the 5 years >>>> > I've been here. Having said that, this definitely smells and feels >>>> > like something on the infrastructure side. I'll follow up if I have >>>> > any useful info. >>>> >>>> After a thorough investigation I now have credible evidence the source >>>> of the damage did not originate from the database itself. >>>> Specifically, this database is mounted on the same volume as the >>>> operating system (I know, I know) and something non database driven >>>> sucked up disk space very rapidly and exhausted the volume -- fast >>>> enough that sar didn't pick it up. Oh well :-) -- thanks for the help >>> >>> However, disk space exhaustion should not lead to corruption unless the >>> underlying layers lied in some way. >> >> I agree -- however I'm sufficiently separated from the things doing >> the things that I can't verify that in any real way. In the meantime >> I'm going to take standard precautions (enable checksums/dedicated >> volume/replication). 
Low disk space also does not explain the bizarre >> outage I had last friday. > > ok, data corruption struck again. This time disk space is ruled out, > and access to the database is completely denied: > postgres=# \c castaging > WARNING: leaking still-referenced relcache entry for > "pg_index_indexrelid_index" Corruption struck again. This time got another case of view busted -- attempting to create gives missing 'type' error. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Fri, Oct 21, 2016 at 1:37 PM, Merlin Moncure wrote: > On Fri, Oct 21, 2016 at 8:03 AM, Kevin Grittner wrote: >> On Tue, Oct 18, 2016 at 8:45 AM, Merlin Moncure wrote: >> >>> Most or all the damage seemed to be to the system catalogs with >>> at least two critical tables dropped or inaccessible in some >>> fashion. A lot of the OIDs seemed to be pointing at the wrong >>> thing. >> >> While the oid in pg_class often matches the filename, that is not >> true after some operations (like CLUSTER or VACUUM FULL). It is >> the relfilenode column that is the definitive link to the file. > > no such operations happened. In the first instance at least one table > dropped from the system catalogs. I have a hunch that the heap is > fine (supported by the size of the database on disk). At this > precise moment I'm restoring the database to another fileserver in > order to do some forensic analysis, also in the hopes of getting the > second database online in order to expedite recovery. > > ah -- done. :-) deleting the init file didn't help, but starting up > single user allowed the start up to gracefully fail with a FATAL cache > lookup. OK, I have some good (very- in the specific case of yours truly) news to report. Doing a filesystem-level copy to a test server, I was able to swap the relfilenode file of one of the critical tables into place over the relfilenode of the stored backup. Not knowing which file to copy from, I figured out the source node by judging the size and using the 'strings' utility. Data recovery for that table at least appears to be 100%. For those following along, this simple process is only likely to work easily if the table contains only system types; no user types, enums, composites, etc, since those get a new ID with each restore. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
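The size-and-strings hunt for the source file can be sketched like so (the data directory path and the search string are placeholder assumptions, not values from this incident):

```shell
#!/bin/bash
# Scan a database directory for candidate relfilenode files containing a
# string known to appear in the lost table's rows; compare the size of any
# hit against expectations before swapping a file into place.
datadir=${1:-/var/lib/pgsql/9.5/data/base/203883}
needle=${2:-some_known_row_value}
find "$datadir" -maxdepth 1 -type f 2>/dev/null | while read -r f; do
    if strings "$f" 2>/dev/null | grep -q "$needle"; then
        ls -l "$f"   # candidate source relfilenode
    fi
done
```

With the server stopped, a matching file can then be copied over the restored table's relfilenode, as described above.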
Re: [HACKERS] emergency outage requiring database restart
On Fri, Oct 21, 2016 at 8:03 AM, Kevin Grittner wrote: > On Tue, Oct 18, 2016 at 8:45 AM, Merlin Moncure wrote: > >> Most or all the damage seemed to be to the system catalogs with >> at least two critical tables dropped or inaccessible in some >> fashion. A lot of the OIDs seemed to be pointing at the wrong >> thing. > > While the oid in pg_class often matches the filename, that is not > true after some operations (like CLUSTER or VACUUM FULL). It is > the relfilenode column that is the definitive link to the file. no such operations happened. In the first instance at least one table dropped from the system catalogs. I have a hunch that the heap is fine (supported by the size of the database on disk). At this precise moment I'm restoring the database to another fileserver in order to do some forensic analysis, also in the hopes of getting the second database online in order to expedite recovery. ah -- done. :-) deleting the init file didn't help, but starting up single user allowed the start up to gracefully fail with a FATAL cache lookup. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Thu, Oct 20, 2016 at 3:16 PM, Alvaro Herrera wrote: > Merlin Moncure wrote: > >> single user mode dumps core :( >> >> bash-4.1$ postgres --single -D /var/lib/pgsql/9.5/data castaging >> LOG: 0: could not change directory to "/root": Permission denied >> LOCATION: resolve_symlinks, exec.c:293 >> Segmentation fault (core dumped) >> >> Core was generated by `postgres --single -D /var/lib/pgsql/9.5/data >> castaging'. >> Program terminated with signal 11, Segmentation fault. >> #0 0x00797d6f in ?? () >> Missing separate debuginfos, use: debuginfo-install >> postgresql95-server-9.5.2-1PGDG.rhel6.x86_64 >> (gdb) bt >> #0 0x00797d6f in ?? () >> #1 0x0079acf1 in RelationCacheInitializePhase3 () >> #2 0x007b35c5 in InitPostgres () >> #3 0x006b9b53 in PostgresMain () >> #4 0x005f30fb in main () > > Maybe > rm global/pg_internal.init > and try again? Will do when I can; for now I had to do an emergency restore + some unfun data reconstruction from the query log. Notably there is a much larger database in the same cluster which is undamaged. This server is new to production usage, maybe 2 months. Here are the contents of pg_extension: plpgsql dblink hstore postgres_fdw plsh * not used pg_trgm * not used plr * not used tablefunc * not used adminpack * not used plpythonu * not used postgis * not used postgis_topology * not used Short term plan is to separate the database into its own cluster, install replication and checksums. All queries to this database are logged. 
Here is the contents of the log leading into and after the the crash: oct 17 crash: 2016-10-17 12:12:24 CDT [rms@castaging]: DETAIL: parameters: $1 = '21121', $2 = '8', $3 = '2016-10-13', $4 = NULL, $5 = NULL, $6 = NULL, $7 = NULL, $8 = NULL, $9 = NULL, $10 = NULL, $11 = 't', $12 2016-10-17 12:12:24 CDT [rms@castaging]: LOG: execute : SELECT NULL AS PROCEDURE_CAT, n.nspname AS PROCEDURE_SCHEM, p.proname AS PROCEDURE_NAME, NULL, NULL, NULL, d.description AS REMARKS 2016-10-17 12:12:24 CDT [rms@castaging]: LOG: execute : SELECT n.nspname,p.proname,p.prorettype,p.proargtypes, t.typtype,t.typrelid , p.proargnames, p.proargmodes, p.proallargtypes , p.o 2016-10-17 12:12:24 CDT [rms@castaging]: LOG: execute : select * from checkin($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12) as result 2016-10-17 12:12:24 CDT [rms@castaging]: DETAIL: parameters: $1 = '114333', $2 = 'rrosillo', $3 = 'CALLER', $4 = 'Survey', $5 = 'Happy', $6 = 'Callback', $7 = 'OTHER', $8 = '2016-10-18 01:05:00', 2016-10-17 12:12:24 CDT [rms@castaging]: LOG: execute S_3: COMMIT 2016-10-17 12:12:25 CDT [@]: ERROR: could not open relation with OID 1203933 <-- first sign of damage 2016-10-17 12:12:25 CDT [@]: CONTEXT: automatic analyze of table "castaging.public.apartment" oct 20 crash: 2016-10-20 12:46:38 CDT [postgres@castaging]: LOG: statement: SELECT CallsByUser() AS byuser 2016-10-20 12:46:40 CDT [postgres@castaging]: LOG: statement: SELECT CallCenterOverviewJSON() AS overview 2016-10-20 12:46:41 CDT [postgres@castaging]: LOG: statement: SELECT CallCenterUserTrackingJSON() AS tracking 2016-10-20 12:46:41 CDT [postgres@castaging]: LOG: statement: SELECT MarketOverviewJSON() AS market 2016-10-20 12:46:42 CDT [postgres@castaging]: LOG: execute : SELECT SubMarketOverviewJSON($1::TEXT) AS submkt 2016-10-20 12:46:42 CDT [postgres@castaging]: DETAIL: parameters: $1 = '640' 2016-10-20 12:46:44 CDT [postgres@castaging]: LOG: statement: SELECT CallsByUser() AS byuser 2016-10-20 12:46:46 CDT 
[postgres@castaging]: LOG: statement: SELECT CallCenterOverviewJSON() AS overview 2016-10-20 12:46:47 CDT [postgres@castaging]: LOG: statement: SELECT CallCenterUserTrackingJSON() AS tracking 2016-10-20 12:46:47 CDT [postgres@castaging]: ERROR: "pg_description_o_c_o_index" is an index <-- first sign of damage 2016-10-20 12:46:47 CDT [postgres@castaging]: CONTEXT: SQL function "callcenterusertrackingjson" during startup 2016-10-20 12:46:47 CDT [postgres@castaging]: STATEMENT: SELECT CallCenterUserTrackingJSON() AS tracking 2016-10-20 12:46:47 CDT [postgres@castaging]: WARNING: leaking still-referenced relcache entry for "pg_class_oid_index" CallCenterUserTrackingJSON() and friends are not particularly interesting except that they are making use of json_agg(). They were also called basically all day long at 5-second intervals. I guess this isn't saying very much, but I'm starting to smell a rat here. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Thu, Oct 20, 2016 at 2:07 PM, Tom Lane wrote: > Merlin Moncure writes: >> single user mode dumps core :( > > You've got a mess there :-( > >> Missing separate debuginfos, use: debuginfo-install >> postgresql95-server-9.5.2-1PGDG.rhel6.x86_64 > > This backtrace would likely be much more informative if you did the above. can't; don't have the package unfortunately. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Thu, Oct 20, 2016 at 1:52 PM, Merlin Moncure wrote: > On Wed, Oct 19, 2016 at 2:39 PM, Merlin Moncure wrote: >> On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian wrote: >>> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote: >>>> > Yeah. Believe me -- I know the drill. Most or all the damage seemed >>>> > to be to the system catalogs with at least two critical tables dropped >>>> > or inaccessible in some fashion. A lot of the OIDs seemed to be >>>> > pointing at the wrong thing. Couple more datapoints here. >>>> > >>>> > *) This database is OLTP, doing ~ 20 tps avg (but very bursty) >>>> > *) Another database on the same cluster was not impacted. However >>>> > it's more olap style and may not have been written to during the >>>> > outage >>>> > >>>> > Now, this infrastructure running this system is running maybe 100ish >>>> > postgres clusters and maybe 1000ish sql server instances with >>>> > approximately zero unexplained data corruption issues in the 5 years >>>> > I've been here. Having said that, this definitely smells and feels >>>> > like something on the infrastructure side. I'll follow up if I have >>>> > any useful info. >>>> >>>> After a thorough investigation I now have credible evidence the source >>>> of the damage did not originate from the database itself. >>>> Specifically, this database is mounted on the same volume as the >>>> operating system (I know, I know) and something non database driven >>>> sucked up disk space very rapidly and exhausted the volume -- fast >>>> enough that sar didn't pick it up. Oh well :-) -- thanks for the help >>> >>> However, disk space exhaustion should not lead to corruption unless the >>> underlying layers lied in some way. >> >> I agree -- however I'm sufficiently separated from the things doing >> the things that I can't verify that in any real way. In the meantime >> I'm going to take standard precautions (enable checksums/dedicated >> volume/replication). 
Low disk space also does not explain the bizarre >> outage I had last friday. > > ok, data corruption struck again. This time disk space is ruled out, > and access to the database is completely denied: > postgres=# \c castaging > WARNING: leaking still-referenced relcache entry for > "pg_index_indexrelid_index"

single user mode dumps core :(

bash-4.1$ postgres --single -D /var/lib/pgsql/9.5/data castaging
LOG: 0: could not change directory to "/root": Permission denied
LOCATION: resolve_symlinks, exec.c:293
Segmentation fault (core dumped)

Core was generated by `postgres --single -D /var/lib/pgsql/9.5/data castaging'.
Program terminated with signal 11, Segmentation fault.
#0 0x00797d6f in ?? ()
Missing separate debuginfos, use: debuginfo-install postgresql95-server-9.5.2-1PGDG.rhel6.x86_64
(gdb) bt
#0 0x00797d6f in ?? ()
#1 0x0079acf1 in RelationCacheInitializePhase3 ()
#2 0x007b35c5 in InitPostgres ()
#3 0x006b9b53 in PostgresMain ()
#4 0x005f30fb in main ()
(gdb)

merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Wed, Oct 19, 2016 at 2:39 PM, Merlin Moncure wrote: > On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian wrote: >> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote: >>> > Yeah. Believe me -- I know the drill. Most or all the damage seemed >>> > to be to the system catalogs with at least two critical tables dropped >>> > or inaccessible in some fashion. A lot of the OIDs seemed to be >>> > pointing at the wrong thing. Couple more datapoints here. >>> > >>> > *) This database is OLTP, doing ~ 20 tps avg (but very bursty) >>> > *) Another database on the same cluster was not impacted. However >>> > it's more olap style and may not have been written to during the >>> > outage >>> > >>> > Now, this infrastructure running this system is running maybe 100ish >>> > postgres clusters and maybe 1000ish sql server instances with >>> > approximately zero unexplained data corruption issues in the 5 years >>> > I've been here. Having said that, this definitely smells and feels >>> > like something on the infrastructure side. I'll follow up if I have >>> > any useful info. >>> >>> After a thorough investigation I now have credible evidence the source >>> of the damage did not originate from the database itself. >>> Specifically, this database is mounted on the same volume as the >>> operating system (I know, I know) and something non database driven >>> sucked up disk space very rapidly and exhausted the volume -- fast >>> enough that sar didn't pick it up. Oh well :-) -- thanks for the help >> >> However, disk space exhaustion should not lead to corruption unless the >> underlying layers lied in some way. > > I agree -- however I'm sufficiently separated from the things doing > the things that I can't verify that in any real way. In the meantime > I'm going to take standard precautions (enable checksums/dedicated > volume/replication). Low disk space also does not explain the bizarre > outage I had last friday. ok, data corruption struck again. 
This time disk space is ruled out, and access to the database is completely denied:

postgres=# \c castaging
WARNING: leaking still-referenced relcache entry for "pg_index_indexrelid_index"

merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian wrote: > On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote: >> > Yeah. Believe me -- I know the drill. Most or all the damage seemed >> > to be to the system catalogs with at least two critical tables dropped >> > or inaccessible in some fashion. A lot of the OIDs seemed to be >> > pointing at the wrong thing. Couple more datapoints here. >> > >> > *) This database is OLTP, doing ~ 20 tps avg (but very bursty) >> > *) Another database on the same cluster was not impacted. However >> > it's more olap style and may not have been written to during the >> > outage >> > >> > Now, this infrastructure running this system is running maybe 100ish >> > postgres clusters and maybe 1000ish sql server instances with >> > approximately zero unexplained data corruption issues in the 5 years >> > I've been here. Having said that, this definitely smells and feels >> > like something on the infrastructure side. I'll follow up if I have >> > any useful info. >> >> After a thorough investigation I now have credible evidence the source >> of the damage did not originate from the database itself. >> Specifically, this database is mounted on the same volume as the >> operating system (I know, I know) and something non database driven >> sucked up disk space very rapidly and exhausted the volume -- fast >> enough that sar didn't pick it up. Oh well :-) -- thanks for the help > > However, disk space exhaustion should not lead to corruption unless the > underlying layers lied in some way. I agree -- however I'm sufficiently separated from the things doing the things that I can't verify that in any real way. In the meantime I'm going to take standard precautions (enable checksums/dedicated volume/replication). Low disk space also does not explain the bizarre outage I had last friday. 
merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Tue, Oct 18, 2016 at 8:45 AM, Merlin Moncure wrote: > On Mon, Oct 17, 2016 at 2:04 PM, Alvaro Herrera > wrote: >> Merlin Moncure wrote: >> >>> castaging=# CREATE OR REPLACE VIEW vw_ApartmentSample AS >>> castaging-# SELECT ... >>> ERROR: 42809: "pg_cast_oid_index" is an index >>> LINE 11: FROM ApartmentSample s >>> ^ >>> LOCATION: heap_openrv_extended, heapam.c:1304 >>> >>> should I be restoring from backups? >> >> It's pretty clear to me that you've got catalog corruption here. You >> can try to fix things manually as they emerge, but that sounds like a >> fool's errand. > > Yeah. Believe me -- I know the drill. Most or all the damage seemed > to be to the system catalogs with at least two critical tables dropped > or inaccessible in some fashion. A lot of the OIDs seemed to be > pointing at the wrong thing. Couple more datapoints here. > > *) This database is OLTP, doing ~ 20 tps avg (but very bursty) > *) Another database on the same cluster was not impacted. However > it's more olap style and may not have been written to during the > outage > > Now, this infrastructure running this system is running maybe 100ish > postgres clusters and maybe 1000ish sql server instances with > approximately zero unexplained data corruption issues in the 5 years > I've been here. Having said that, this definitely smells and feels > like something on the infrastructure side. I'll follow up if I have > any useful info. After a thorough investigation I now have credible evidence the source of the damage did not originate from the database itself. Specifically, this database is mounted on the same volume as the operating system (I know, I know) and something non database driven sucked up disk space very rapidly and exhausted the volume -- fast enough that sar didn't pick it up. Oh well :-) -- thanks for the help merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Mon, Oct 17, 2016 at 2:04 PM, Alvaro Herrera wrote: > Merlin Moncure wrote: > >> castaging=# CREATE OR REPLACE VIEW vw_ApartmentSample AS >> castaging-# SELECT ... >> ERROR: 42809: "pg_cast_oid_index" is an index >> LINE 11: FROM ApartmentSample s >> ^ >> LOCATION: heap_openrv_extended, heapam.c:1304 >> >> should I be restoring from backups? > > It's pretty clear to me that you've got catalog corruption here. You > can try to fix things manually as they emerge, but that sounds like a > fool's errand. Yeah. Believe me -- I know the drill. Most or all the damage seemed to be to the system catalogs with at least two critical tables dropped or inaccessible in some fashion. A lot of the OIDs seemed to be pointing at the wrong thing. Couple more datapoints here. *) This database is OLTP, doing ~ 20 tps avg (but very bursty) *) Another database on the same cluster was not impacted. However it's more olap style and may not have been written to during the outage Now, this infrastructure running this system is running maybe 100ish postgres clusters and maybe 1000ish sql server instances with approximately zero unexplained data corruption issues in the 5 years I've been here. Having said that, this definitely smells and feels like something on the infrastructure side. I'll follow up if I have any useful info. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Mon, Oct 17, 2016 at 2:04 PM, Alvaro Herrera wrote: > Merlin Moncure wrote: > >> castaging=# CREATE OR REPLACE VIEW vw_ApartmentSample AS >> castaging-# SELECT ... >> ERROR: 42809: "pg_cast_oid_index" is an index >> LINE 11: FROM ApartmentSample s >> ^ >> LOCATION: heap_openrv_extended, heapam.c:1304 >> >> should I be restoring from backups? > > It's pretty clear to me that you've got catalog corruption here. You > can try to fix things manually as they emerge, but that sounds like a > fool's errand. agreed. current plan is to restore from backups, and recover as much data as I can. Also doing bugfix release and going to enable checksums. We had several good backups since the previous outage so it's not clear the events are related but after months of smooth operation I find that coincidence highly suspicious. As always, we need to suspect hardware problems but I'm highly abstracted from them -- using esx + san. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Mon, Oct 17, 2016 at 1:39 PM, Merlin Moncure wrote: > On Thu, Oct 13, 2016 at 4:13 PM, Tom Lane wrote: >> Merlin Moncure writes: >>> Today I had an emergency production outage on a server. >>> ... >>> Adding all this up it smells like processes were getting stuck on a >>> spinlock. >> >> Maybe. If it happens again, probably the most useful debug data would >> be stack traces from some of the busy processes. > > Another odd datapoint on this server. Things were running pretty good > but an application crashed on a missing view. Trying to recreate the > view, I got: > > CREATE OR REPLACE VIEW vw_ApartmentQueueLastGood AS > SELECT ... > > ERROR: type "vw_apartmentqueuelastgood" already exists > HINT: A relation has an associated type of the same name, so you must > use a name that doesn't conflict with any existing type. > > ...which was pretty strange. I had to manually delete the pg_type > record in order to create the view. I'm getting more reports of > 'could not open relation with oid=X' errors so I could be facing data > corruption :(. castaging=# CREATE OR REPLACE VIEW vw_ApartmentSample AS castaging-# SELECT ... ERROR: 42809: "pg_cast_oid_index" is an index LINE 11: FROM ApartmentSample s ^ LOCATION: heap_openrv_extended, heapam.c:1304 should I be restoring from backups? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Thu, Oct 13, 2016 at 4:13 PM, Tom Lane wrote: > Merlin Moncure writes: >> Today I had an emergency production outage on a server. >> ... >> Adding all this up it smells like processes were getting stuck on a spinlock. > > Maybe. If it happens again, probably the most useful debug data would > be stack traces from some of the busy processes. Another odd datapoint on this server. Things were running pretty good but an application crashed on a missing view. Trying to recreate the view, I got: CREATE OR REPLACE VIEW vw_ApartmentQueueLastGood AS SELECT ... ERROR: type "vw_apartmentqueuelastgood" already exists HINT: A relation has an associated type of the same name, so you must use a name that doesn't conflict with any existing type. ...which was pretty strange. I had to manually delete the pg_type record in order to create the view. I'm getting more reports of 'could not open relation with oid=X' errors so I could be facing data corruption :(. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY as a set returning function
On Fri, Sep 30, 2016 at 9:56 PM, Tom Lane wrote: > Craig Ringer writes: >> On 1 Oct. 2016 05:20, "Tom Lane" wrote: >>> I think the last of those suggestions has come up before. It has the >>> large advantage that you don't have to remember a different syntax for >>> copy-as-a-function. > >> That sounds fantastic. It'd help this copy variant retain feature parity >> with normal copy. And it'd bring us closer to being able to FETCH in >> non-queries. > > On second thought, though, this couldn't exactly duplicate the existing > COPY syntax, because COPY relies heavily on the rowtype of the named > target table to tell it what it's copying. You'd need some new syntax > to provide the list of column names and types, which puts a bit of > a hole in the "syntax we already know" argument. A SRF-returning-record > would have a leg up on that, because we do have existing syntax for > defining the concrete rowtype that any particular call returns. One big disadvantage of SRF-returning-record syntax is that functions are basically unwrappable with generic wrappers sans major gymnastics such as dynamically generating the query and executing it. This is a major disadvantage relative to the null::type hack we use in the populate_record style functions and perhaps ought to make this (SRF-returning-record syntax) style of use discouraged for useful library functions. If there were a way to handle wrapping I'd withdraw this minor objection (this has come up in dblink discussions a few times). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Infinite recursion in bitshift
On Fri, Oct 14, 2016 at 3:31 PM, Tom Lane wrote: > Andreas Seltenreich writes: >> Tom Lane writes: >>> Seems sane, though I wonder if it'd be better to use -INT_MAX rather >>> than -VARBITMAXLEN. > >> I am undecided between those two. -INT_MAX might be a more precise fix >> for the problem, but the extra distance to the danger zone was kind of >> soothing :-). > > Yeah, might as well use the tighter limit. > > Poking around in varbit.c, I noticed some other places that were assuming > that a typmod couldn't exceed VARBITMAXLEN. anybit_typmodin() enforces > that, but there are places where a user can shove in an arbitrary integer, > eg > > regression=# select "bit"(42, 2147483647); > ERROR: invalid memory alloc request size 18446744073441116169 > > I fixed those too and pushed it. Thanks for the report! Curious -- are there real world scenarios where this would happen? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] emergency outage requiring database restart
Today I had an emergency production outage on a server. This particular server was running 9.5.2. The symptoms were interesting so I thought I'd report. Here is what I saw:

*) User CPU was pegged 100%
*) Queries reading data would block and not respond to cancel or terminate
*) pg_stat_activity reported no waiting queries (but worked fine otherwise).

Adding all this up it smells like processes were getting stuck on a spinlock. Connections quickly got eaten up and the situation was desperately urgent, so I punted and did an immediate restart and things came back normally. I had a console to the database and did manage to grab the contents of pg_stat_activity, and noticed several trivial queries were running normally (according to pg_stat_activity) but were otherwise stuck. Attempting to run one of them myself, I noted the query got stuck and did not cancel. I was in a terrible rush but am casting around for stuff to grab in case that happens again -- 'perf top' would be a natural choice I guess. Three autovacuum processes were running. Obviously going to do a bugfix upgrade but was wondering if anybody has seen anything like this. This particular server was upgraded to 9.5 somewhat recently but ran on 9.2 for years with no issues. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size
On Wed, Oct 12, 2016 at 5:18 PM, Tom Lane wrote: > Merlin Moncure writes: >> ISTM all this silliness is pretty much unique to linux anyways. >> Instead of reading the filesystem, what about doing test map and test >> unmap? > > And if mmap succeeds and munmap fails, you'll recover how exactly? > > If this API were less badly designed, we'd not be having this problem > in the first place ... I was thinking we could 'guess' sizes in a power-of-two loop in case the obvious unmap didn't work, finally aborting if no guess worked. :-). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size
On Wed, Oct 12, 2016 at 5:10 PM, Tom Lane wrote: > Alvaro Herrera writes: >> Tom Lane wrote: >>> According to >>> https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt >>> looking into /proc/meminfo is the longer-standing API and thus is >>> likely to work on more kernel versions. Also, if you look into >>> /sys then you are going to see multiple possible values and it's >>> not clear how to choose the right one. > >> I'm not sure that this is the best rationale. In my system there are >> 2MB and 1GB huge page sizes; in systems with lots of memory (let's say 8 >> GB of shared memory is requested) it seems a clear winner to allocate 8 >> 1GB hugepages than 4096 2MB hugepages because the page table is so much >> smaller. The /proc interface only shows the 2MB page size, so if we go >> that route we'd not be getting the full benefit of the feature. > > And you'll tell mmap() which one to do how exactly? I haven't found > anything explaining how applications get to choose which page size applies > to their request. The kernel document says that /proc/meminfo reflects > the "default" size, and I'd assume that that's what we'll get from mmap. hm. for (recent) linux, I see:

    MAP_HUGE_2MB, MAP_HUGE_1GB (since Linux 3.8)
        Used in conjunction with MAP_HUGETLB to select alternative hugetlb page sizes (respectively, 2 MB and 1 GB) on systems that support multiple hugetlb page sizes.

More generally, the desired huge page size can be configured by encoding the base-2 logarithm of the desired page size in the six bits at the offset MAP_HUGE_SHIFT. (A value of zero in this bit field provides the default huge page size; the default huge page size can be discovered via the Hugepagesize field exposed by /proc/meminfo.)
Thus, the above two constants are defined as:

#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)

The range of huge page sizes that are supported by the system can be discovered by listing the subdirectories in /sys/kernel/mm/hugepages.

via: http://man7.org/linux/man-pages/man2/mmap.2.html#NOTES

ISTM all this silliness is pretty much unique to linux anyways. Instead of reading the filesystem, what about doing test map and test unmap? We could zero in on the page size for default I think with some probing of known possible values. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autonomous transactions
On Tue, Oct 11, 2016 at 10:06 AM, Petr Jelinek wrote: > On 10/10/16 16:44, Merlin Moncure wrote: >> On Thu, Oct 6, 2016 at 3:53 PM, Simon Riggs wrote: >>> On 6 October 2016 at 21:27, Robert Haas wrote: >>>> I think we should implement background transactions and call them >>>> background transactions. That allows us to expose additional >>>> functionality which is useful, like the ability to kick something off >>>> and check back later for the results. There's no reason to call it >>>> background transactions and also call it autonomous transactions: one >>>> feature doesn't need two names. >>> >>> I'm happy to also invoke it via an alternate mechanism or API, so that >>> it can continue to be used even if the above mechanism changes. >>> >>> We have no need to wait for the perfect solution, even assuming we >>> would ever agree that just one exists. >> >> -1 on implementing both autonomous and background transactions. This >> will confuse everyone. > > I personally care much more about having background transactions than > autonomous ones (as I only ever had use-cases for the background ones) > so don't agree there. All right. But would you agree then that AT should at least emulate competing implementations? A major advantage of bgworkers is possibly supporting concurrent activity, and maybe the syntax could be directed more toward moving in that direction rather than copying Oracle style (PRAGMA etc.), particularly if the locking rules are substantially different. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autonomous transactions
On Thu, Oct 6, 2016 at 3:53 PM, Simon Riggs wrote: > On 6 October 2016 at 21:27, Robert Haas wrote: >> I think we should implement background transactions and call them >> background transactions. That allows us to expose additional >> functionality which is useful, like the ability to kick something off >> and check back later for the results. There's no reason to call it >> background transactions and also call it autonomous transactions: one >> feature doesn't need two names. > > I'm happy to also invoke it via an alternate mechanism or API, so that > it can continue to be used even if the above mechanism changes. > > We have no need to wait for the perfect solution, even assuming we > would ever agree that just one exists. -1 on implementing both autonomous and background transactions. This will confuse everyone. The lingo here is not so important, I think. What *is* important is defining how the locking and execution rules should work; the implementation should flow from that. Those rules should be informed by competing implementations and how well they work. +1 for any solution that makes migration from other solutions to postgres easier. bgworkers should be considered if you want things to run in parallel. Reading the proposal (note, I may have missed it) it isn't clear to me if you can have the parent and AT run a query at the same time. Should this (parallel execution) be a design goal, then that's the end of the story. However I don't think it is, TBH. ISTM the expectation is single-threaded behavior with finer-grained control of commits. If we're not 100% clear on this point one way or the other then things are a bit premature. Maybe we are clear and I missed something?
One major advantage of the non-bgworker serialized execution approach is that certain classes of deadlock are easier to detect or do not exist, since there is only one execution state; AIUI it's impossible for two transaction states to be simultaneously waiting, assuming the pl/pgsql instructions are not run in parallel, with one exception: the AT trying to acquire a lock exclusively held by the master. If the AT blocks on the parent it ought to be O(1) and instant to detect that and roll it back, given the right supporting infrastructure in the lock manager. It also makes sharing execution state much easier, especially the parts that look like "I'm waiting here until the other guy finishes", since there's only one "guy". How will advisory locks work? I think they'd block with bgworkers and not block with non-bgworkers. What about other session-based stuff like prepared statements? Expectations around those cases ought to clarify the implementation. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we lost Uber as a user
On Wed, Aug 17, 2016 at 5:18 PM, Jim Nasby wrote: > On 8/17/16 2:51 PM, Simon Riggs wrote: >> On 17 August 2016 at 12:19, Greg Stark wrote: >>> Yes, this is exactly what it should be doing and exactly why it's >>> useful. Physical replication accurately replicates the data from the >>> master including "corruption" whereas a logical replication system >>> will not, causing divergence and possible issues during a failover. >> >> >> Yay! Completely agree. >> >> Physical replication, as used by DRBD and all other block-level HA >> solutions, and also used by other databases, such as Oracle. >> >> Corruption on the master would often cause errors that would prevent >> writes and therefore those changes wouldn't even be made, let alone be >> replicated. > > > My experience has been that you discover corruption after it's already > safely on disk, and more than once I've been able to recover by using data > on a londiste replica. > > As I said originally, it's critical to understand the different solutions > and the pros and cons of each. There is no magic bullet. Data point: in the half-dozen or so cases where I've experienced corruption on replicated systems, in all cases but one the standby was clean. The 'unclean' case was actually an 8.2 warm standby; the source of the corruption was a very significant bug where prepared statements would write back corrupted data if the table definitions changed under the statement (fixed in 8.3). In that particular case the corruption was, very unfortunately, quite widespread and passed directly along to the standby server. This bug nearly cost us a user as well, although not nearly so famous as uber :-). In the few modern cases I've seen, I've not been able to trace it back to any bug in postgres (in particular multixact was ruled out) and I've chalked it up to media or (more likely I think) filesystem problems in the face of a -9 reset.
merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we lost Uber as a user
On Thu, Jul 28, 2016 at 8:16 AM, pgwhatever wrote: > Statement-Based replication has a lot of problems with it like indeterminate > UDFs. Here is a link to see them all: > https://dev.mysql.com/doc/refman/5.7/en/replication-sbr-rbr.html#replication-sbr-rbr-sbr-disadvantages Sure. It's also incredibly efficient with respect to bandwidth -- so, if your application was engineered to work around those problems it's a huge win. They could have used pgpool, but I guess the fix was already in. Taking a step back, from the outside, it looks like uber:

*) has a very thick middleware, very thin database with respect to logic and complexity
*) has a very high priority on quick and cheap (in terms of bandwidth) replication
*) has decided the database needs to be interchangeable
*) is not afraid to make weak or erroneous technical justifications as a basis of stack selection (the futex vs ipc argument I felt was particularly awful -- it ignored the fact we use spinlocks)

The very fact that they swapped it out so easily suggests that they were not utilizing the database as they could have, and a different technical team might have come to a different result. Postgres is a very general system and rewards deep knowledge such that it can outperform even specialty systems in the hands of a capable developer (for example, myself). I'm just now hammering in the final coffin nails that will get solr swapped out for jsonb-backed postgres. I guess it's fair to say that they felt mysql is closer to what they felt a database should do out of the box. That's disappointing, but life moves on. The takeaways are:

*) people like different choices of replication mechanics -- statement level sucks a lot of the time, but not all the time
*) hs/sr simplicity of configuration and operation is a big issue. it's continually gotten better and still needs to
*) bad QC can cost you customers. how much regression coverage do we have of hs/sr?
*) postgres may not be the ideal choice for those who want a thin and simple database merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we lost Uber as a user
On Tue, Jul 26, 2016 at 5:07 PM, Tom Lane wrote: > Josh Berkus writes: >> To explain this in concrete terms, which the blog post does not: > >> 1. Create a small table, but one with enough rows that indexes make >> sense (say 50,000 rows). > >> 2. Make this table used in JOINs all over your database. > >> 3. To support these JOINs, index most of the columns in the small table. > >> 4. Now, update that small table 500 times per second. > >> That's a recipe for runaway table bloat; VACUUM can't do much because >> there's always some minutes-old transaction hanging around (and SNAPSHOT >> TOO OLD doesn't really help, we're talking about minutes here), and >> because of all of the indexes HOT isn't effective. > > Hm, I'm not following why this is a disaster. OK, you have circa 100% > turnover of the table in the lifespan of the slower transactions, but I'd > still expect vacuuming to be able to hold the bloat to some small integer > multiple of the minimum possible table size. (And if the table is small, > that's still small.) I suppose really long transactions (pg_dump?) could > be pretty disastrous, but there are ways around that, like doing pg_dump > on a slave. > > Or in short, this seems like an annoyance, not a time-for-a-new-database > kind of problem. Well, the real annoyance as I understand it is the raw volume of bytes of WAL traffic a single update of a field can cause. They switched to statement level replication(!). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug with plpgsql handling of NULL argument of compound type
On Fri, Jul 22, 2016 at 1:39 PM, David G. Johnston wrote: > On Fri, Jul 22, 2016 at 2:13 PM, Tom Lane wrote: >> >> There is a rather squishy question as to whether NULL::composite_type >> should be semantically equivalent to ROW(NULL,NULL,...)::composite_type. >> If it is, then the SELECT should have failed before even getting into the >> plpgsql function, because ROW(NULL,NULL) is surely not a valid value of >> type c. The SQL standard seems to believe that these things *are* >> equivalent (at least, that was how we read the spec for IS [NOT] NULL). >> > > I dislike that they are considered equal in various circumstances but if > that's what we are guided toward, c'est la vie. Not sure we are guided there. Currently we follow the spec specifically with the IS NULL operator but not in other cases. For example:

postgres=# select row(null, null) is null;
 ?column?
──────────
 t

postgres=# select coalesce(row(null, null), row(1,1));
 coalesce
──────────
 (,)

These seem not to agree (at all), since I'm pretty sure the spec defines coalesce in terms of IS NULL. The basic problem we have is that in postgres the record variable is a distinct thing from its contents, and the spec does not treat it that way. For my part, I think the spec is totally out to lunch on this point, but we've been stuck with the status quo for quite some time now -- there's been no compelling narrative that suggests how things should be changed and to what. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO with a large amount of statements get stuck with high memory consumption
On Mon, Jul 18, 2016 at 8:59 AM, Jan Wieck wrote: > > > On Mon, Jul 18, 2016 at 9:43 AM, Tom Lane wrote: >> >> Merlin Moncure writes: >> > BTW, while the fix does address the cleanup performance issue, it's >> > still the case that anonymous code blocks burn up lots of resident >> > memory (my 315k example I tested with ate around 8gb IIRC) when run >> > like this. My question is, if the pl/pgsql code block is anonymous >> > and not in some kind of a loop, why bother caching the plan at all? >> >> Nobody got around to it. Also, as you note, it's not as simple as >> "don't cache if in a DO block". You'd need to track whether you were >> inside any sort of looping construct. Depending on how difficult >> that turned out to be, it might add overhead to regular functions >> that we don't want. > > Agreed. And from the structures themselves it is not really easy to detect > if inside of a loop, the toplevel, while, for and if all use the same > statement > block and call exec_stmts(), which in turn calls exec_stmt() for each > element in that list. It is not impossible to add a flag, set at PL compile > time, to that element and check it every time, the statement is executed. > But such a change definitely needs more testing and probably won't > qualify for backpatching. Right. Note, I'm not arguing for backpatch here, just some open speculation and some evidence that we still have a problem (although not nearly as nasty a one -- the pre-patch behavior of not responding to cancel was very dangerous and is now solved). Hm, maybe, instead of trying to figure out if we're in a loop, set a 'called' flag on each statement and only cache the plan when the statement is executed a second time (if that's easier, dunno). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DO with a large amount of statements get stuck with high memory consumption
On Sat, Jul 16, 2016 at 2:47 PM, Jan Wieck wrote: > On Tue, Jul 12, 2016 at 3:29 PM, Merlin Moncure wrote: >> >> I've noticed that pl/pgsql functions/do commands do not behave well >> when the statement resolves and frees memory. To be clear:
>>
>> FOR i in 1..100
>> LOOP
>>   INSERT INTO foo VALUES (i);
>> END LOOP;
>>
>> ...runs just fine while
>>
>> BEGIN
>>   INSERT INTO foo VALUES (1);
>>   INSERT INTO foo VALUES (2);
>>   ...
>>   INSERT INTO foo VALUES (100);
>> END;
>
> This sounds very much like what led to commit
> 25c539233044c235e97fd7c9dc600fb5f08fe065.
>
> It seems that patch was only applied to master and never backpatched to 9.5
> or earlier.

You're right; thanks (my bad for missing that). For those following along, the case that turned this up was:

DO ...;

where the insertion step was a large number of standalone insert statements. (Temp table creation isn't necessary to turn up this bug, but it's a common pattern when sending batch updates to a server.) The workaround I recommend would be to do this:

do $d$
begin
  create function doit() returns void as $$
  $$ language sql;
  perform doit();
end;
$d$;

BTW, while the fix does address the cleanup performance issue, it's still the case that anonymous code blocks burn up lots of resident memory (my 315k example I tested with ate around 8gb IIRC) when run like this. My question is: if the pl/pgsql code block is anonymous and not in some kind of a loop, why bother caching the plan at all? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] DO with a large amount of statements get stuck with high memory consumption
I've noticed that pl/pgsql functions/do commands do not behave well when the statement resolves and frees memory. To be clear:

FOR i in 1..100
LOOP
  INSERT INTO foo VALUES (i);
END LOOP;

...runs just fine, while

BEGIN
  INSERT INTO foo VALUES (1);
  INSERT INTO foo VALUES (2);
  ...
  INSERT INTO foo VALUES (100);
END;

(for the curious, create a script yourself via

copy (
  select 'do $$begin create temp table foo(i int);'
  union all
  select format('insert into foo values (%s);', i)
    from generate_series(1,100) i
  union all
  select 'raise notice ''abandon all hope!''; end; $$;'
) to '/tmp/breakit.sql';

) ...will consume amounts of resident memory proportional to the number of statements and eventually crash the server. The problem is obvious: each statement causes a plan to get created and the server gets stuck in a loop where SPI_freeplan() is called repeatedly. Everything is working as designed, I guess, but when this happens it's really unpleasant: the query is uncancellable and unterminatable -- not good. A pg_ctl kill ABRT will do the trick, but I was quite astonished to see Linux take a few minutes to clean up the mess (!) on a somewhat pokey virtualized server with lots of memory. With even as little as ten thousand statements the cleanup time far exceeded the runtime of the statement block. I guess the key takeaway here is, "don't do that"; pl/pgsql aggressively generates plans and turns out to be a poor choice for bulk loading because of all the plan caching. Having said that, I can't help but wonder if there should be a (perhaps user-configurable) limit to the number of SPI plans a single function call can acquire, on the basis that you are going to smack into very poor behaviors in the memory subsystem. Stepping back, I can't help but wonder what the value of all the plan caching going on is at all for statement blocks. Loops might comprise a notable exception, noted. 
I'd humbly submit though that (relative to functions) it's much more likely to want to do something like insert a lot of statements, where it's impossible to utilize any cached plans. This is not an academic gripe -- I just exploded production :-D. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
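For anyone hitting the same wall: the simplest way around it is to not generate one statement per row at all. A sketch (set-based equivalents of the breakit.sql pattern; these plan once no matter how many rows they touch):

```sql
-- one statement, one plan, any number of rows:
create temp table foo(i int);
insert into foo
select g from generate_series(1, 100000) g;

-- or, when the values come from the client, one multi-row VALUES list
-- per batch instead of one INSERT per value:
insert into foo values (1), (2), (3);  -- ...and so on
```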
Re: [HACKERS] IMPORT FOREIGN SCHEMA can't be run in in pl/pgsql due to INTO
On Mon, Jul 11, 2016 at 3:09 PM, Tom Lane wrote: > Merlin Moncure writes: >> Currently pl/pgsql interprets the mandatory INTO of IMPORT FOREIGN >> SCHEMA as INTO variable. > > Ugh, that's definitely a bug. > >> I estimate this to be minor oversight in >> pl/pgsql parsing with respect to the introduction of this statement. > > While we can certainly hack it by something along the lines of not > recognizing INTO when the first token was IMPORT, the whole thing > seems awfully messy and fragile. And it will certainly break again > the next time somebody decides that INTO is le mot juste in some new > SQL command. I wish we could think of a safer, more future-proof > solution. I have no idea what that would be, though, short of > deprecating INTO altogether. This is a natural consequence of having two almost-but-not-quite-the-same grammars handling the same shared language. There is a similar set of annoyances compiling C with a C++ compiler, as we all know. In a perfect world, SQL procedural extensions would be a proper superset and we'd have *one* grammar handling everything. Among other niceties this would make moving forward with stored procedures a much simpler discussion. Well, c'est la vie :-D. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
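To illustrate the clash (server and schema names are invented; a sketch, not a tested recipe). The statement's own grammar requires INTO, but pl/pgsql's scanner grabs the INTO first:

```sql
do $$
begin
  -- pl/pgsql intercepts INTO here and tries to treat "local_schema" as
  -- a variable to receive a query result, rather than passing the whole
  -- statement through to the main SQL grammar:
  import foreign schema remote_schema
    from server my_server
    into local_schema;
end;
$$;
```

Until the parsing is fixed, routing the statement through EXECUTE should sidestep the problem, since the command text is opaque to pl/pgsql's INTO handling:

```sql
do $$
begin
  execute 'import foreign schema remote_schema from server my_server into local_schema';
end;
$$;
```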
[HACKERS] IMPORT FOREIGN SCHEMA can't be run in in pl/pgsql due to INTO
Currently pl/pgsql interprets the mandatory INTO of IMPORT FOREIGN SCHEMA as INTO variable. I estimate this to be minor oversight in pl/pgsql parsing with respect to the introduction of this statement. Assuming it's easily fixed, would a patch to fix pl/pgsql parsing be accepted? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MVCC overheads
On Thu, Jul 7, 2016 at 11:45 AM, Pete Stevenson wrote: > Hi postgresql hackers - > > I would like to find some analysis (published work, blog posts) on the > overheads affiliated with the guarantees provided by MVCC isolation. More > specifically, assuming the current workload is CPU bound (as opposed to IO) > what is the CPU overhead of generating the WAL, the overhead of version > checking and version creation, and of garbage collecting old and unnecessary > versions? For what it’s worth, I am working on a research project where it is > envisioned that some of this work can be offloaded. That's going to be hard to measure. First, what you didn't say is, 'with respect to what?'. You mention WAL for example. WAL is more of a crash safety mechanism than anything and it's not really fair to include it in an analysis of 'MVCC overhead', or at least not completely. One thing that MVCC *does* objectively cause is bloat, although you can still get bloat without MVCC if you (for example) delete rows or rewrite rows such that they can't fit in their old slot. MVCC definitely incurs some runtime overhead to check visibility but the amount of overhead is highly dependent on the specific workload. Postgres 'hint bits' reduce the cost to near zero for many workloads but in other workloads they are expensive to maintain and cause a lot of extra traffic. One nice feature about not having to worry about visibility is that you can read data directly out of the index. We have some workarounds to deal with that ('all visible bit') but again the amount of benefit from that strategy is going to be very situation specific. Stepping back, the overhead of MVCC in postgres (and probably other systems too) has been continually reduced over the years -- the really nasty parts have been relegated to background cleanup processing. That processing is pretty sequential and the 'i/o bottleneck' is finally getting solved on cheap storage pushing things back into the cpu space. 
In summary, I think the future of MVCC and transactional systems is very bright, and the data management systems that discard transactional safety in order to get some short-term performance gains are, uh, not so bright. Transactions are essential in systems where data integrity matters. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
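If you want to actually measure the bloat side of the MVCC overhead discussed above, a starting point (assuming the pgstattuple contrib module is installed; the table name is a placeholder):

```sql
-- dead-tuple counts as tracked by the stats collector:
select relname, n_live_tup, n_dead_tup
from pg_stat_user_tables
order by n_dead_tup desc;

-- physical tuple-level bloat via contrib/pgstattuple:
create extension if not exists pgstattuple;
select * from pgstattuple('some_table');
```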
Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
On Fri, Jul 1, 2016 at 11:45 AM, Alvaro Herrera wrote: > Merlin Moncure wrote: > >> It's pretty easy to craft a query where you're on the winning side, >> but what's the worst case of doing two pass...is constant folding a >> non trivial fraction of planning time? > > One thing that has been suggested is to re-examine the plan after > planning is done, and if execution time is estimated to be large (FSVO), > then run a second planning pass with more expensive optimizations > enabled to try and find better plans. The guiding principle would be to > continue to very quickly find good enough plans for > frequent/small/simple queries, but spend more planning effort on more > complex ones where execution is likely to take much longer than planning > time. > > So doing constant-folding twice would be enabled for the second planning > pass. I like this idea. Maybe a GUC controlling the cost based cutoff (with 0 meaning, "assume the worst and plan the hard way first"). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
On Fri, Jul 1, 2016 at 10:27 AM, Robert Haas wrote: > On Fri, Jul 1, 2016 at 10:20 AM, Merlin Moncure wrote: >> Yeah. Also, even if you could parse out those cases, it's major >> optimization fence. Consider if you have an ORDER BY clause here: >> >> SELECT FROM foo WHERE a OR b ORDER BY c; >> >> ... by pushing inside a union, you're going to be in trouble in real >> world cases. That's just a mess and it would add a lot of runtime >> analysis of the alternative paths. It's hard for me to believe >> rewriting is easier and simpler than rewriting 'false OR x' to 'x'. I >> also thing that constant folding strategies are going to render much >> more sensible output to EXPLAIN. > > I don't think that it's easier and simpler and didn't intend to say > otherwise. I do think that I've run across LOTS of queries over the > years where rewriting OR using UNION ALL was a lot faster, and I think > that case is more likely to occur in practice than FALSE OR WHATEVER. > But, I'm just throwing out opinions to see what sticks here; I'm not > deeply invested in this. Sure (I didn't mean to put you in that position, just thinking out loud). The problem with UNION ALL is that it's only safe to do so when you know for sure that both sides of the partition are non-overlapping. The author of the query often knows this going in, but for the planner it's not so simple to figure out, except perhaps in a subset of cases. UNION sans ALL is probably a dead end on performance grounds. This hinges on Tom's earlier statements, "Much of the value of doing constant-folding would disappear if we ran it before subquery pullup + join simplification, because in non-stupidly-written queries those are what expose the expression simplification opportunities." and, especially, "We could run it twice but that seems certain to be a dead loser most of the time." 
It's pretty easy to craft a query where you're on the winning side, but what's the worst case of doing two passes... is constant folding a non-trivial fraction of planning time? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
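To make the non-overlap problem above concrete, a minimal sketch (invented table; the two predicates deliberately overlap on the same row):

```sql
create temp table t as select 1 as a, 1 as b;

-- the OR form returns the row once:
select * from t where a = 1 or b = 1;  -- 1 row

-- the naive UNION ALL rewrite returns it twice, because the row
-- satisfies both branches:
select * from t where a = 1
union all
select * from t where b = 1;           -- 2 rows

-- plain UNION would dedupe, but it also dedupes rows that were
-- legitimately duplicated in the base table, so it isn't a faithful
-- rewrite either (and it pays for the sort/hash on top)
```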
Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
On Fri, Jul 1, 2016 at 9:11 AM, Tom Lane wrote: > Robert Haas writes: >> On Fri, Jul 1, 2016 at 9:52 AM, Tom Lane wrote: >>> Maybe, but neither UNION nor UNION ALL would duplicate the semantics >>> of OR, so there's some handwaving here that I missed. > >> SELECT * FROM foo WHERE a = 5 OR a = 4 >> isn't equivalent to >> SELECT * FROM foo WHERE a = 5 >> UNION >> SELECT * FROM foo WHERE a = 4 >> ? > > It probably is, but you're assuming that "a" appears in the list of > columns being unioned. If you make that just "SELECT b FROM ..." > then the latter form gets rid of duplicate b values where the first > doesn't. On the other hand, UNION ALL might introduce duplicates > not present in the OR query's result.

Yeah. Also, even if you could parse out those cases, it's a major optimization fence. Consider if you have an ORDER BY clause here:

SELECT FROM foo WHERE a OR b ORDER BY c;

... by pushing inside a union, you're going to be in trouble in real-world cases. That's just a mess and it would add a lot of runtime analysis of the alternative paths. It's hard for me to believe rewriting is easier and simpler than rewriting 'false OR x' to 'x'. I also think that constant folding strategies are going to render much more sensible output to EXPLAIN.

FYI, the query is something along the lines of

SELECT * FROM foo
WHERE
  ('a' = 'a' AND EXISTS ...)
  OR ('a' = 'b' AND EXISTS ...)
  OR ('a' = 'c' AND EXISTS ...)

...where the left side of the equality is a parameterized 'filter mode' flag. That way the query can introduce filtering behaviors without doing dynamic acrobatics. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
Hello hackers, Observe the following test case (apologies if this is a well understood problem):

create temp table foo as select generate_series(1,100) id;
create index on foo(id);
create temp table bar as select id, id % 10 = 0 as good from generate_series(1,100) id;
create index on bar(good);
analyze foo;
analyze bar;

explain analyze select * from foo
  where false or exists (select 1 from bar where good and foo.id = bar.id); -- A

explain analyze select * from foo
  where exists (select 1 from bar where good and foo.id = bar.id); -- B

These queries are trivially verified as identical but give very different plans. A gives

                                 QUERY PLAN
─────────────────────────────────────────────────────────────────────────────
 Seq Scan on foo  (cost=0.00..4459425.00 rows=50 width=4) (actual time=13.299..130.271 rows=10 loops=1)
   Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
   Rows Removed by Filter: 90
   SubPlan 1
     ->  Index Scan using bar_good_idx on bar  (cost=0.42..4.45 rows=1 width=0) (never executed)
           Index Cond: (good = true)
           Filter: (good AND (foo.id = id))
   SubPlan 2
     ->  Index Scan using bar_good_idx on bar bar_1  (cost=0.42..4.44 rows=1 width=4) (actual time=0.024..0.055 rows=10 loops=1)
           Index Cond: (good = true)
           Filter: good
 Planning time: 0.103 ms
 Execution time: 130.312 ms

B gives

                                 QUERY PLAN
─────────────────────────────────────────────────────────────────────────────
 Nested Loop  (cost=4.87..12.91 rows=1 width=4) (actual time=0.075..0.161 rows=10 loops=1)
   ->  HashAggregate  (cost=4.45..4.46 rows=1 width=4) (actual time=0.058..0.060 rows=10 loops=1)
         Group Key: bar.id
         ->  Index Scan using bar_good_idx on bar  (cost=0.42..4.44 rows=1 width=4) (actual time=0.018..0.045 rows=10 loops=1)
               Index Cond: (good = true)
               Filter: good
   ->  Index Only Scan using foo_id_idx on foo  (cost=0.42..8.44 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=10)
         Index Cond: (id = bar.id)
         Heap Fetches: 10
 Planning time: 0.193 ms
 Execution time: 0.187 ms

This is a general problem with OR expressions, while AND expressions will generally pass the optimization through. 
The 'old school' optimization approach is to rewrite the OR expressions to UNION ALL, but this can have unpleasant downstream effects on the query in real-world scenarios. The question is: can the one-time filter logic be expanded such that the first query can functionally be rewritten into the second one? This type of query happens a lot when trying to mix multiple different filtering expressions (a 'filter mode' if you will) in a single query based on a user-supplied switch. Food for thought. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
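Until/unless the planner learns to fold these, one workaround for the 'filter mode' case is to move the dispatch out of the WHERE clause entirely, so the planner only ever sees the live EXISTS branch. A sketch against the tables above (the function name, the 'b' branch, and the baz table are invented for illustration):

```sql
create function find_foo(mode text) returns setof foo as $$
begin
  case mode
    when 'a' then
      return query select * from foo
        where exists (select 1 from bar where good and foo.id = bar.id);
    when 'b' then
      return query select * from foo
        where exists (select 1 from baz where foo.id = baz.id);
  end case;
end;
$$ language plpgsql;

select * from find_foo('a');
```

Each branch gets planned on its own, so query A's "alternatives" subplan machinery never comes into play; the tradeoff is maintaining one query per mode instead of one query with a switch.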