Re: [HACKERS] SQL procedures

2017-11-08 Thread Merlin Moncure
On Wed, Nov 8, 2017 at 11:03 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 11/8/17 11:11, Merlin Moncure wrote:
>> On Wed, Nov 8, 2017 at 9:13 AM, Peter Eisentraut
>> <peter.eisentr...@2ndquadrant.com> wrote:
>>> I have already submitted a separate patch that addresses these questions.
>>
Maybe I'm obtuse, but I'm not seeing it.  I'm very interested in the
general approach to transaction management; if you've described it in
the patch I'll read it there.  Thanks for doing this.
>
> https://www.postgresql.org/message-id/178d3380-0fae-2982-00d6-c43100bc8...@2ndquadrant.com

All right, thanks.  So,
*) Are you sure you want to go the SPI route?  'sql' language
(non-SPI) procedures might be simpler from an implementation
standpoint and would not need any language adjustments.

*) Is it possible to jump into SPI without having a snapshot already
set up?  For example, if I wanted to set the isolation level in a
procedure, would I run into this error?
ERROR:  SET TRANSACTION ISOLATION LEVEL must be called before any query
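
To make the question concrete, here is a hypothetical sketch (the
CREATE PROCEDURE syntax assumes the proposed patch; whether the SET
succeeds is exactly the open question):

```sql
-- Hypothetical sketch; syntax assumes the proposed procedure patch.
CREATE PROCEDURE set_iso_test()
LANGUAGE plpgsql
AS $$
BEGIN
  -- If SPI has already taken a snapshot by the time we get here,
  -- this would fail with the error quoted above.
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
END;
$$;

CALL set_iso_test();
```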

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL procedures

2017-11-08 Thread Merlin Moncure
On Wed, Nov 8, 2017 at 9:13 AM, Peter Eisentraut
 wrote:
> I have already submitted a separate patch that addresses these questions.

Maybe I'm obtuse, but I'm not seeing it.  I'm very interested in the
general approach to transaction management; if you've described it in
the patch I'll read it there.  Thanks for doing this.

merlin




Re: [HACKERS] SQL procedures

2017-11-08 Thread Merlin Moncure
On Tue, Oct 31, 2017 at 12:23 PM, Peter Eisentraut
 wrote:
> - Transaction control in procedure bodies

This feature is really key, since it enables via SQL lots of things
that are not possible without external coding, including:
*) very long-running processes in a single routine
*) transaction isolation control inside the procedure (currently the
client app has to declare this)
*) certain error handling cases that otherwise require client-side support
*) simple in-database threading
*) simple construction of daemon scripts (yeah, you can use a bgworker
for this, but a pure SQL daemon with a cron heartbeat hook is hard to
beat for simplicity)
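
As a hypothetical sketch of the first two points (assuming in-procedure
COMMIT eventually works; table and procedure names are made up):

```sql
-- Sketch only: assumes procedures with in-body transaction control.
-- Deletes old rows in batches, committing as it goes, so no single
-- transaction runs for hours.
CREATE PROCEDURE purge_old_events()
LANGUAGE plpgsql
AS $$
BEGIN
  LOOP
    DELETE FROM events
     WHERE ctid IN (SELECT ctid FROM events
                     WHERE created < now() - interval '90 days'
                     LIMIT 10000);
    EXIT WHEN NOT FOUND;
    COMMIT;  -- the key new capability under discussion
  END LOOP;
END;
$$;
```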

I do wonder how transaction control could be added later.

The last time I (lightly) looked at this, I was starting to think that
working transaction control into the SPI interface was the wrong
approach; pl/pgsql would have to adopt a very different set of
behaviors depending on whether it was called from a function or a
procedure.  If you restricted the language choice to pure SQL, you
could work around this problem; SPI languages would be totally
abstracted from those considerations, and you could always call a
function in an arbitrary language if you needed to.  SQL has no flow
control, but I'm not too concerned about that.

merlin




Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-09-19 Thread Merlin Moncure
On Tue, Sep 19, 2017 at 1:37 PM, Robert Haas  wrote:
> On Tue, Sep 19, 2017 at 12:45 PM, Pavel Stehule  
> wrote:
>>> You can already set a GUC with function scope.  I'm not getting your
>>> point.
>>
>> yes, it is true. But implementation of #option is limited to PLpgSQL - so
>> there is not any too much questions - GUC is global - there is lot of
>> points:
>>
>> * what is correct impact on PREPARE
>> * what is correct impact on EXECUTE
>> * what should be done if this GUC is changed ..
>
> For better or for worse, as a project we've settled on GUCs as a way
> to control behavior.  I think it makes more sense to try to apply that
> option to new behaviors we want to control than to invent some new
> system.

This seems very sensible.

We also have infrastructure at the SQL level (SET) to manage GUCs.
Tom upthread (for pretty good reasons) suggested extending SET to
pl/pgsql-specific scoping, but TBH I'm struggling to see why we need
to implement new syntax for this; the only thing missing is being able
to scope SET statements to a code block, FWICT.
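
For reference, the existing function-scope SET infrastructure looks
like this (plain SQL; only the function name is made up):

```sql
-- work_mem is set on function entry and restored on exit,
-- which is essentially per-block scoping of a GUC.
CREATE FUNCTION heavy_query(n int) RETURNS bigint
LANGUAGE sql
SET work_mem = '256MB'
AS $$
  SELECT count(*) FROM generate_series(1, n)
$$;
```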

merlin




Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-09-08 Thread Merlin Moncure
On Fri, Sep 8, 2017 at 2:48 PM, Pavel Stehule  wrote:
>
>
> 2017-09-08 21:21 GMT+02:00 Daniel Gustafsson :
>>
>> > On 08 Sep 2017, at 19:14, Simon Riggs  wrote:
>> >
>> > On 6 September 2017 at 07:43, Robert Haas  wrote:
>> >
>> >> LET custom_plan_tries = 0 IN SELECT ...
>> >
>> > Tom has pointed me at this proposal, since on another thread I asked
>> > for something very similar. (No need to reprise that discussion, but I
>> > wanted prepared queries to be able to do SET work_mem = X; SELECT).
>> > This idea looks a good way forward to me.
>> >
>> > Since we're all in roughly the same place, I'd like to propose that we
>> > proceed with the following syntax... whether or not this precisely
>> > solves OP's issue on this thread.
>> >
>> > 1. Allow SET to set multiple parameters...
>> > SET guc1 = x, guc2 = y
>> > This looks fairly straightforward
>> >
>> > 2. Allow a SET to apply only for a single statement
>> > SET guc1 = x, guc2 = y FOR stmt
>> > e.g. SET max_parallel_workers = 4 FOR SELECT count(*) FROM bigtable
>> > Internally a GUC setting already exists for a single use, via
>> > GUC_ACTION_SAVE, so we just need to invoke it.
>>
>> This syntax proposal makes sense, +1.  My immediate thought was that the
>> per-statement GUCs were sort of like options, and most options in our
>> syntax
>> are enclosed with (), like: SET (guc1 = x, guc2 = y) FOR SELECT ..;
>
> we never supported this syntax in combination with the SET keyword
>
> see the CREATE FUNCTION command
>
> personally I prefer the syntax without the FOR keyword, because the
> following keyword would have to be a reserved keyword
>
> SET x = .., y = .. SELECT ... ;

This seems pretty ugly from a syntax perspective.

We already have 'SET LOCAL', which scopes the setting to the current
transaction.  How about SET BLOCK, which would apply until you've left
the current statement block?
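
To illustrate the distinction (SET LOCAL is real today; SET BLOCK is
purely hypothetical):

```sql
BEGIN;
SET LOCAL work_mem = '512MB';  -- reverts at COMMIT/ROLLBACK
SELECT count(*) FROM bigtable;
COMMIT;                        -- work_mem is back to its prior value

-- The hypothetical variant would instead scope the setting to a
-- single statement block:
-- SET BLOCK work_mem = '512MB';
```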

merlin




Re: [HACKERS] WIP: Aggregation push-down

2017-09-07 Thread Merlin Moncure
On Thu, Aug 17, 2017 at 10:22 AM, Antonin Houska  wrote:
> Antonin Houska  wrote:
> output type. For other aggregates (like avg()) the remote nodes will have to
> return the transient state value in an appropriate form (maybe bytea type),
> which does not depend on PG version.

Hm, that seems like an awful lot of work (a new version-agnostic
serialization format) for very little benefit (version-independent
transient-state serialization for remote aggregate pushdown).  How
about requiring the versions to match for the feature to be used?

merlin




Re: [HACKERS] Add Roman numeral conversion to to_number

2017-08-14 Thread Merlin Moncure
On Mon, Aug 14, 2017 at 2:48 PM, Peter Eisentraut
 wrote:
> On 8/3/17 13:45, Robert Haas wrote:
>> On Thu, Aug 3, 2017 at 9:25 AM, Oliver Ford  wrote:
>>> Adds to the to_number() function the ability to convert Roman numerals
>>> to a number. This feature is on the formatting.c TODO list. It is not
>>> currently implemented in either Oracle, MSSQL or MySQL so gives
>>> PostgreSQL an edge :-)
>> I kind of put my head in my hands when I saw this.  I'm not really
>> sure it's worth complicating the code for something that has so little
>> practical utility, but maybe other people will feel differently.
>
> I can't get excited about it.  to_number() and such usually mirror the
> Oracle implementation, so having something that is explicitly not in
> Oracle goes a bit against its mission.
>
> One of the more interesting features of to_number/to_char is that it has
> a bunch of facilities for formatting decimal points, leading/trailing
> zeros, filling in spaces and signs, and so on.  None of that applies
> naturally to Roman numerals, so there isn't a strong case for including
> that into these functions, when a separate function or module could do.

Well, doesn't that also apply to scientific notation?

'RN' is documented as an accepted formatting string, and nowhere does
the documentation say that it only works for output.  So we ought to
either fix it or at least document that it does not work for input.
It's nothing but a curio obviously, but it's kind of cool IMO.
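
The asymmetry is easy to demonstrate; the output direction works
today, while the input direction is what's at issue (a sketch, not run
against any particular version):

```sql
SELECT to_char(485, 'RN');          -- works: formats as a Roman numeral
SELECT to_number('CDLXXXV', 'RN');  -- fails today, despite RN being documented
```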

merlin




Re: [HACKERS] emergency outage requiring database restart

2017-08-10 Thread Merlin Moncure
On Thu, Aug 10, 2017 at 12:01 PM, Ants Aasma <ants.aa...@eesti.ee> wrote:
> On Wed, Jan 18, 2017 at 4:33 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> On Wed, Jan 18, 2017 at 4:11 AM, Ants Aasma <ants.aa...@eesti.ee> wrote:
>>> On Wed, Jan 4, 2017 at 5:36 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>>>> Still getting checksum failures.   Over the last 30 days, I see the
>>>> following.  Since enabling checksums FWICT none of the damage is
>>>> permanent and rolls back with the transaction.   So creepy!
>>>
>>> The checksums still only differ in least significant digits which
>>> pretty much means that there is a block number mismatch. So if you
>>> rule out filesystem not doing its job correctly and transposing
>>> blocks, it could be something else that is resulting in blocks getting
>>> read from a location that happens to differ by a small multiple of
>>> page size. Maybe somebody is racily mucking with table fd's between
>>> seeking and reading. That would explain the issue disappearing after a
>>> retry.
>>>
>>> Maybe you can arrange for the RelFileNode and block number to be
>>> logged for the checksum failures and check what the actual checksums
>>> are in data files surrounding the failed page. If the requested block
>>> number contains something completely else, but the page that follows
>>> contains the expected checksum value, then it would support this
>>> theory.
>>
>> will do.   Main challenge is getting hand compiled server to swap in
>> so that libdir continues to work.  Getting access to the server is
>> difficult as is getting a maintenance window.  I'll post back ASAP.
>
> As a new datapoint, we just had a customer with an issue that I think
> might be related. The issue was reasonably repeatable by running a
> report on the standby system. Issue manifested itself by first "could
> not open relation" and/or "column is not in index" errors, followed a
> few minutes later by a PANIC from startup process due to "specified
> item offset is too large", "invalid max offset number" or "page X of
> relation base/16384/1259 is uninitialized". I took a look at the xlog
> dump and it was completely fine. For instance in the "specified item
> offset is too large" case there was a INSERT_LEAF redo record
> inserting the preceding offset just a couple hundred kilobytes back.
> Restarting the server sometimes successfully applied the offending
> WAL, sometimes it failed with other corruption errors. The offending
> relations were always pg_class or pg_class_oid_index. Replacing plsh
> functions with dummy plpgsql functions made the problem go away,
> reintroducing plsh functions made it reappear.

Fantastic.  I was never able to apply the O_CLOEXEC patch (see
upthread) because access to the system is highly limited and compiling
a replacement binary was a bit of a headache.  IIRC that was the best
theory on the table as to the underlying cause, so we ought to try it
first, right?

Reminder: I was able to completely eliminate all damage (though I had
to handle the occasional unexpected rollback) by enabling checksums.

merlin




Re: [HACKERS] 10 beta docs: different replication solutions

2017-07-31 Thread Merlin Moncure
On Sun, Jul 30, 2017 at 8:34 PM, Steve Singer  wrote:
>
> We don't seem to describe logical replication on
>
> https://www.postgresql.org/docs/10/static/different-replication-solutions.html
>
> The attached patch adds a section.

This is a good catch.  Two quick observations:

1) Super-pedantic point: I don't like the 'repl.' abbreviation in the
'most common implementation' text, either for the existing hs/sr entry
or for the newly added logical one.

2) This lingo:
+ Logical replication allows the data changes from individual tables
+ to be replicated. Logical replication doesn't require a particular server
+ to be designated as a master or a slave but allows data to flow
in multiple
+ directions. For more information on logical replication, see
.

This is good, but I would revise it just a bit to emphasize the
subscription nature of logical replication, linking back to the
concepts expressed in the main section.  For example:

Logical replication allows the data changes [remove: "from individual
tables to be replicated"] to be published to subscriber nodes.  Data
can flow in any direction between nodes on a per-table basis; there is
no concept of a master server.  Conflict resolution must be handled
completely by the application.  For more information on...

what do you think?
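
For reference, a minimal sketch of the publish/subscribe flow that
wording alludes to (object and host names are made up):

```sql
-- On the publishing node:
CREATE PUBLICATION mypub FOR TABLE accounts;

-- On the subscribing node:
CREATE SUBSCRIPTION mysub
  CONNECTION 'host=pubhost dbname=app'
  PUBLICATION mypub;
```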

merlin




Re: [HACKERS] JSONB - JSONB operator feature request

2017-07-19 Thread Merlin Moncure
On Tue, Jul 18, 2017 at 12:49 PM, David Fetter  wrote:
> On Tue, Jul 18, 2017 at 01:36:32PM +0200, david.tu...@linuxbox.cz wrote:
>> Hi,
>>
>> some users and me used hstore - hstore for example storing only changed
>> rows in trigger like:
>>
>> hsore(NEW) - hstore(OLD)
>>
>> There isn't same operator/function in JSON/JSONB. We can only remove keys
>> from JSONB, but not equal key-value pairs. Is there any chance to have
>> same feature with JSON/JSONB in postgres core?
>
> Here's one slightly modified from 
> http://coussej.github.io/2016/05/24/A-Minus-Operator-For-PostgreSQLs-JSONB/
>
> CREATE OR REPLACE FUNCTION jsonb_minus ( arg1 jsonb, arg2 jsonb )
> RETURNS jsonb
> LANGUAGE sql
> AS $$
> SELECT
>     COALESCE(json_object_agg(
>         key,
>         CASE
>             -- if the value is an object and the value of the second
>             -- argument is not null, we do a recursion
>             WHEN jsonb_typeof(value) = 'object' AND arg2 -> key IS NOT NULL
>                 THEN jsonb_minus(value, arg2 -> key)
>             -- for all the other types, we just return the value
>             ELSE value
>         END
>     ), '{}')::jsonb
> FROM
>     jsonb_each(arg1)
> WHERE
>     arg1 -> key IS DISTINCT FROM arg2 -> key
> $$;
>
> CREATE OPERATOR - (
> PROCEDURE = jsonb_minus,
> LEFTARG   = jsonb,
> RIGHTARG  = jsonb
> );
>
> I suspect that there's a faster way to do the jsonb_minus function
> internally.

yes, please!  I also sorely miss the hstore 'slice' function, which is
very similar.  The main remaining disadvantage of jsonb relative to
hstore is that you can't do the simple retransformations these
operations allow.  Too often you end up doing multiple '->' operations
against the same object followed by a rebundling, which is a real
performance killer.

I understand that there are more edge cases due to the flexible json
structure, but I'd be quite happy returning NULL or raising an error
when you can't arrive at a sensible extraction.
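
A sketch of the trigger idiom this would enable, mirroring the old
hstore pattern (assumes the '-' operator from David's example above;
table and function names are made up):

```sql
-- Hypothetical audit trigger storing only the changed key/value pairs.
-- Relies on the jsonb_minus '-' (jsonb, jsonb) operator defined upthread.
CREATE FUNCTION audit_changes() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
  INSERT INTO audit_log (tbl, changed)
  VALUES (TG_TABLE_NAME, to_jsonb(NEW) - to_jsonb(OLD));
  RETURN NEW;
END;
$$;
```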

merlin




Re: [BUGS] [HACKERS] Segmentation fault in libpq

2017-06-29 Thread Merlin Moncure
On Thu, Jun 29, 2017 at 9:12 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmonc...@gmail.com> writes:
>> On Thu, Jun 29, 2017 at 8:23 AM, Michal Novotny
>> <michal.novo...@greycortex.com> wrote:
>>> Could you please help me based on information provided above?
>
>> You might want to run your code through some analysis tools (for
>> example, valgrind).
>
> valgrind is not a perfect tool for finding that kind of problem,
> especially if you can't reproduce the crash reliably; but at least
> valgrind is readily available and easy to use, so you might as
> well start there and see if it finds anything.  If you have access
> to any sort of static analysis tool (eg, Coverity), that might be
> more likely to help.  Or you could fall back on manual code
> auditing, if the program isn't very big.

The clang static analyzer is another good tool to check out:

https://clang-analyzer.llvm.org/

merlin




Re: [HACKERS] Segmentation fault in libpq

2017-06-29 Thread Merlin Moncure
On Thu, Jun 29, 2017 at 8:23 AM, Michal Novotny
<michal.novo...@greycortex.com> wrote:
> Hi,
>
> comments inline ...
>
>
>
> On 06/29/2017 03:08 PM, Merlin Moncure wrote:
>>
>> On Thu, Jun 29, 2017 at 4:01 AM, Michal Novotny
>> <michal.novo...@greycortex.com> wrote:
>>>
>>> Hi all,
>>>
>>> we've developed an application using libpq to access a table in the PgSQL
>>> database but we're sometimes experiencing segmentation fault on
>>> resetPQExpBuffer() function of libpq called from PQexecParams() with
>>> prepared query.
>>>
>>> PostgreSQL version is 9.6.3 and the backtrace is:
>>>
>>> Core was generated by `/usr/ti/bin/status-monitor2 -m
>>> /usr/lib64/status-monitor2/modules'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  resetPQExpBuffer (str=str@entry=0x9f4a28) at pqexpbuffer.c:152
>>> 152 str->data[0] = '\0';
>>>
>>> Thread 1 (Thread 0x7fdf68de3840 (LWP 3525)):
>>> #0  resetPQExpBuffer (str=str@entry=0x9f4a28) at pqexpbuffer.c:152
>>> No locals.
>>> #1  0x7fdf66e0333d in PQsendQueryStart (conn=conn@entry=0x9f46d0) at
>>> fe-exec.c:1371
>>> No locals.
>>> #2  0x7fdf66e044b9 in PQsendQueryParams (conn=conn@entry=0x9f46d0,
>>> command=command@entry=0x409a98 "SELECT min, hour, day, month, dow,
>>> sensor,
>>> module, params, priority, rt_due FROM sm.cron WHERE sensor = $1 ORDER BY
>>> priority DESC", nParams=nParams@entry=1, paramTypes=paramTypes@entry=0x0,
>>> paramValues=paramValues@entry=0xa2b7b0,
>>> paramLengths=paramLengths@entry=0x0,
>>> paramFormats=paramFormats@entry=0x0, resultFormat=resultFormat@entry=0)
>>> at
>>> fe-exec.c:1192
>>> No locals.
>>> #3  0x7fdf66e0552b in PQexecParams (conn=0x9f46d0, command=0x409a98
>>> "SELECT min, hour, day, month, dow, sensor, module, params, priority,
>>> rt_due
>>> FROM sm.cron WHERE sensor = $1 ORDER BY priority DESC", nParams=1,
>>> paramTypes=0x0, paramValues=0xa2b7b0, paramLengths=0x0, paramFormats=0x0,
>>> resultFormat=0) at fe-exec.c:1871
>>> No locals.
>>>
>>> Unfortunately we didn't have more information from the crash, at least
>>> for
>>> now.
>>>
>>> Is this a known issue and can you help me with this one?
>>
>> Is your application written in C?  We would need to completely rule
>> out your code (say, a double-freed result or something else nasty)
>> before assuming the problem is within libpq itself, particularly in
>> this area of the code.  How reproducible is the problem?
>>
>> merlin
>
>
> The application is written in plain C. The issue is it happens just
> sometimes - sometimes it happens and sometimes it doesn't.  Once it happens
> it causes the application crash but as it's systemd unit with
> Restart=on-failure flag it's automatically being restarted.
>
> What's being done is:
> 1) Ensure connection already exists and create a new one if it doesn't exist
> yet
> 2) Run PQexecParams() with specified $params that has $params_cnt elements:
>
> res = PQexecParams(conn, prepared_query, params_cnt, NULL, (const char
> **)params, NULL, NULL, 0);
>
> 3) Check for result and report error and exit if "PQresultStatus(res) !=
> PGRES_TUPLES_OK"
> 4) Do some processing with the result
> 5) Clear result using PQclear()
>
> It usually works fine but sometimes it's crashing and I don't know how to
> investigate further.
>
> Could you please help me based on information provided above?

You might want to run your code through some analysis tools (for
example, valgrind).  Short of that, to get help here you need to post
the code for review. How big is your application?

merlin




Re: [HACKERS] Segmentation fault in libpq

2017-06-29 Thread Merlin Moncure
On Thu, Jun 29, 2017 at 4:01 AM, Michal Novotny
 wrote:
> Hi all,
>
> we've developed an application using libpq to access a table in the PgSQL
> database but we're sometimes experiencing segmentation fault on
> resetPQExpBuffer() function of libpq called from PQexecParams() with
> prepared query.
>
> PostgreSQL version is 9.6.3 and the backtrace is:
>
> Core was generated by `/usr/ti/bin/status-monitor2 -m
> /usr/lib64/status-monitor2/modules'.
> Program terminated with signal 11, Segmentation fault.
> #0  resetPQExpBuffer (str=str@entry=0x9f4a28) at pqexpbuffer.c:152
> 152 str->data[0] = '\0';
>
> Thread 1 (Thread 0x7fdf68de3840 (LWP 3525)):
> #0  resetPQExpBuffer (str=str@entry=0x9f4a28) at pqexpbuffer.c:152
> No locals.
> #1  0x7fdf66e0333d in PQsendQueryStart (conn=conn@entry=0x9f46d0) at
> fe-exec.c:1371
> No locals.
> #2  0x7fdf66e044b9 in PQsendQueryParams (conn=conn@entry=0x9f46d0,
> command=command@entry=0x409a98 "SELECT min, hour, day, month, dow, sensor,
> module, params, priority, rt_due FROM sm.cron WHERE sensor = $1 ORDER BY
> priority DESC", nParams=nParams@entry=1, paramTypes=paramTypes@entry=0x0,
> paramValues=paramValues@entry=0xa2b7b0, paramLengths=paramLengths@entry=0x0,
> paramFormats=paramFormats@entry=0x0, resultFormat=resultFormat@entry=0) at
> fe-exec.c:1192
> No locals.
> #3  0x7fdf66e0552b in PQexecParams (conn=0x9f46d0, command=0x409a98
> "SELECT min, hour, day, month, dow, sensor, module, params, priority, rt_due
> FROM sm.cron WHERE sensor = $1 ORDER BY priority DESC", nParams=1,
> paramTypes=0x0, paramValues=0xa2b7b0, paramLengths=0x0, paramFormats=0x0,
> resultFormat=0) at fe-exec.c:1871
> No locals.
>
> Unfortunately we didn't have more information from the crash, at least for
> now.
>
> Is this a known issue and can you help me with this one?

Is your application written in C?  We would need to completely rule
out your code (say, a double-freed result or something else nasty)
before assuming the problem is within libpq itself, particularly in
this area of the code.  How reproducible is the problem?

merlin




Re: [HACKERS] lag(bigint,int,int), etc?

2017-06-27 Thread Merlin Moncure
On Tue, Jun 27, 2017 at 10:12 AM, Colin 't Hart <colinth...@gmail.com> wrote:
> On 27 Jun 2017, at 17:06, Merlin Moncure <mmonc...@gmail.com> wrote:
>>
>>> On Tue, Jun 27, 2017 at 10:01 AM, Colin 't Hart <colinth...@gmail.com> 
>>> wrote:
>>> Hi,
>>>
>>> The following rather contrived example illustrates that lag(), lead()
>>> (and probably other functions) can't automatically cast an integer to
>>> a bigint:
>>>
>>> select lag(sum,1,0) over () from (select sum(generate_series) over
>>> (order by generate_series) from generate_series(1,10)) x;
>>> ERROR:  function lag(bigint, integer, integer) does not exist
>>> LINE 1: select lag(sum,1,0) over () from (select sum(generate_series...
>>>   ^
>>> HINT:  No function matches the given name and argument types. You
>>> might need to add explicit type casts.
>>>
>>>
>>> I guess this is because the lag() and lead() functions take any type,
>>> and hence the default must be of the same type.
>>> This had me stumped for a few while until I realised that the types
>>> were different.
>>>
>>> Would there be any way to implement an automatic conversion?
>>>
>>> On the off-chance that this is actually a bug, this is on 9.6.3, but
>>> it also occurs on 9.3.17
>>
>> Why not cast the arguments?  The first and the third argument have to
>> be the same, and the second argument is always int.
>>
>> merlin
>
> I know that I can cast. I'm wondering if it would be possible/desirable to 
> implement automatic casting. Automatic casting works already for functions 
> defined to take bigint and you pass in an integer. But not for these 
> functions that take any type.

Right.  If you've got two or more types being passed for 'any', which
type should win?  It's ambiguous, so the type resolution rules for
'any'-taking functions are stricter than for regular functions.
Casting behaviors are more complex than they look on the surface, and
changes to make them more flexible are typically difficult to get
right.

merlin




Re: [HACKERS] lag(bigint,int,int), etc?

2017-06-27 Thread Merlin Moncure
On Tue, Jun 27, 2017 at 10:01 AM, Colin 't Hart  wrote:
> Hi,
>
> The following rather contrived example illustrates that lag(), lead()
> (and probably other functions) can't automatically cast an integer to
> a bigint:
>
> select lag(sum,1,0) over () from (select sum(generate_series) over
> (order by generate_series) from generate_series(1,10)) x;
> ERROR:  function lag(bigint, integer, integer) does not exist
> LINE 1: select lag(sum,1,0) over () from (select sum(generate_series...
>^
> HINT:  No function matches the given name and argument types. You
> might need to add explicit type casts.
>
>
> I guess this is because the lag() and lead() functions take any type,
> and hence the default must be of the same type.
> This had me stumped for a few while until I realised that the types
> were different.
>
> Would there be any way to implement an automatic conversion?
>
> On the off-chance that this is actually a bug, this is on 9.6.3, but
> it also occurs on 9.3.17

Why not cast the arguments?  The first and the third argument have to
be the same, and the second argument is always int.
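
Concretely, casting the default so the first and third arguments agree
makes the original example work:

```sql
-- sum(int) over a window yields bigint, so the default must be bigint too.
SELECT lag(sum, 1, 0::bigint) OVER ()
FROM (SELECT sum(generate_series) OVER (ORDER BY generate_series)
      FROM generate_series(1, 10)) x;
```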

merlin




Re: [HACKERS] postgresql transactons not fully isolated

2017-06-21 Thread Merlin Moncure
On Tue, Jun 20, 2017 at 2:58 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Tue, Jun 20, 2017 at 2:34 PM, David G. Johnston
> <david.g.johns...@gmail.com> wrote:
>> On Tue, Jun 20, 2017 at 12:22 PM, Chapman Flack <c...@anastigmatix.net> 
>> wrote:
>>> I get the reported result (DELETE 0 and a table containing 2 and 3)
>>> in both 'read committed' and 'read uncommitted'.
>>
>> Practically speaking those are a single transaction isolation mode.
>>
>> https://www.postgresql.org/docs/10/static/transaction-iso.html
>>
>> I think Merlin has mis-read the article he linked to.  The example
>> being used there never claims to be done under serialization and seems
>> to describe an example of the perils of relying on the default
>> isolation level.
>
> oops -- could be operator error :-)


yep, I made the rookie mistake of setting the transaction isolation
level with a plain SET (which immediately evaporated since it wasn't
bracketed by the transaction) rather than setting the default.  Sorry
for the noise; serialization failures are raised, and that is
acceptable behavior per spec AIUI.

merlin




Re: [HACKERS] postgresql transactons not fully isolated

2017-06-20 Thread Merlin Moncure
On Tue, Jun 20, 2017 at 2:34 PM, David G. Johnston
 wrote:
> On Tue, Jun 20, 2017 at 12:22 PM, Chapman Flack  wrote:
>> I get the reported result (DELETE 0 and a table containing 2 and 3)
>> in both 'read committed' and 'read uncommitted'.
>
> Practically speaking those are a single transaction isolation mode.
>
> https://www.postgresql.org/docs/10/static/transaction-iso.html
>
> I think Merlin has mis-read the article he linked to.  The example
> being used there never claims to be done under serialization and seems
> to describe an example of the perils of relying on the default
> isolation level.

oops -- could be operator error :-)

merlin




[HACKERS] postgresql transactons not fully isolated

2017-06-20 Thread Merlin Moncure
Michael Malis, via
http://malisper.me/postgres-transactions-arent-fully-isolated/, has
determined that postgresql transactions are not fully isolated even
when using the serializable isolation level.

If I prep a table, ints via:
postgres=# create table ints (n int);
CREATE TABLE
postgres=# insert into ints values (1);
INSERT 0 1
postgres=# insert into ints values (2);
INSERT 0 1

and then run two concurrent transactions in serializable isolation
mode like this:
T1: BEGIN
T1: UPDATE ints SET n = n + 1;
T2: BEGIN
T2: DELETE FROM ints where n = 2; -- blocks
T1: COMMIT; -- T2 frees
T2: SELECT * FROM ints;  -- both rows 2 and 3 visible
T2: COMMIT:


My understanding is that for serializable transactions, the result is
correct only if you can replay the transactions serially, one after
another in either order, and get the same result.  That is clearly not
the case here, since played in either order you'd end up with one row.

I guess the failure occurs because there is some kind of separation
between when the row is initially looked up and when the deletion is
qualified.  This is likely not a problem in practice, but is Michael
right in his assessment that we are not precisely following the spec?
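
(A note for anyone replaying this: the level has to be set inside each
transaction block, e.g. as sketched below; done that way, one of the
two transactions fails with a serialization error rather than
producing the anomalous result.)

```sql
-- Session 1                              -- Session 2
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE ints SET n = n + 1;
                                          BEGIN ISOLATION LEVEL SERIALIZABLE;
                                          DELETE FROM ints WHERE n = 2; -- blocks
COMMIT;
                                          -- fails: could not serialize access
```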

merlin




Re: [HACKERS] CTE inlining

2017-05-12 Thread Merlin Moncure
On Fri, May 12, 2017 at 3:39 PM, Bruce Momjian  wrote:
> On Tue, May  9, 2017 at 05:14:19PM -0400, Tom Lane wrote:
>> Ilya Shkuratov  writes:
>> > Ok, it seems that most people in discussion are agree that removing 
>> > optimization
>> > fence is a right thing to do.
>> > Nonetheless I still hoping to discuss the algorithm and its implementation.
>>
>> Yeah, so far we've mainly discussed whether to do that and how to control
>> it, not what the actual results would be.
>
> To summarize, it seems we have two options if we want to add fence
> control to CTEs:
>
> 1.  add INLINE to disable the CTE fence
> 2.  add MATERIALIZE to enable the CTE fence
>
> or some other keywords.  I think most people prefer #2 because:
>
> *  most users writing queries prefer #2

Yeah, I think there was rough consensus on this point.  I think it's
fair to assume that most (or at least a majority) of queries will
benefit from inlining, and that people would want to opt out of, rather
than opt in to, generally good optimization strategies.   This will hit
some people today, but it is not a backwards compatibility issue,
since performance is not really fairly described as a compatibility
criterion.  If this feature lands we ought to warn people in the
release notes though.
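A hedged example of the kind of query that benefits (hypothetical table; under the current fence the CTE is evaluated in full before the outer qual applies):

```sql
CREATE TABLE big (id int PRIMARY KEY, payload text);

WITH c AS (
  SELECT id, payload FROM big   -- fenced: full scan, then filter
)
SELECT * FROM c WHERE id = 42;

-- with inlining, the planner could treat this as the equivalent of:
SELECT id, payload FROM big WHERE id = 42;  -- index scan on the primary key
```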

merlin




Re: [HACKERS] PG 10 release notes

2017-05-04 Thread Merlin Moncure
On Mon, May 1, 2017 at 7:02 AM, Robert Haas  wrote:
> On Tue, Apr 25, 2017 at 11:01 AM, Bruce Momjian  wrote:
>> I didn't think logical decoding was really more than a proof-of-concept
>> until now.
>
> /me searches for jaw on floor.
>
> I would not in any way refer to logical decoding as being only a proof
> of concept, even before logical replication.

That's fair, but I think I understand what Bruce was going for here.
Data point: github third party modules are generally not approved for
deployment in my organization so logical decoding from a production
perspective does not exist (for me) until 10.0.  Point being, an
important threshold has been crossed.

merlin




Re: [HACKERS] CTE inlining

2017-05-03 Thread Merlin Moncure
On Wed, May 3, 2017 at 12:33 PM, Alvaro Herrera
 wrote:
> David Fetter wrote:
>
>> When we add a "temporary" GUC, we're taking on a gigantic burden.
>> Either we support it forever somehow, or we put it on a deprecation
>> schedule immediately and expect to be answering questions about it for
>> years after it's been removed.
>>
>> -1 for the GUC.
>
> Absolutely.
>
> So ISTM we have three choices:
>
> 1) we switch unmarked CTEs as inlineable by default in pg11.  What seems
> likely to happen for a user that upgrades to pg11 is that 5 out of 10
> CTE-using queries are going to become faster than with pg10, and they
> are going to be happy; 4 out of five are going to see no difference, but
> they didn't have to do anything about it; and the remaining query is
> going to become slower, either indistinguishably so (in which case they
> don't care and they remain happy because of the other improvements) or
> notably so, in which case they can easily figure where to add the
> MATERIALIZED option and regain the original performance.

+1 for option 1.  This change will be welcome for a large number of
queries, but forced materialization is a real need and I use it often.
This comes off as a very reasonable compromise in my opinion unless it
requires major coding gymnastics to implement.

merlin




Re: [HACKERS] CTE inlining

2017-05-02 Thread Merlin Moncure
On Tue, May 2, 2017 at 12:05 PM, Tomas Vondra
 wrote:
> On 5/2/17 6:34 PM, David Fetter wrote:
>>
>> On Tue, May 02, 2017 at 02:40:55PM +0200, Andreas Karlsson wrote:
>>>
>>> On 05/02/2017 04:38 AM, Craig Ringer wrote:

 On 1 May 2017 at 22:26, Andreas Karlsson  wrote:
>
>>>
>>>
>>> ...
>>>
>>> I see some alternatives, none of them perfect.
>>>
>>> 1. Just remove the optimization fence and let people add OFFSET 0 to
>>> their
>>> queries if they want an optimization fence. This lets us keep pretending
>>> that we do not have query hints (and therefore do not have to formalize
>>> any
>>> syntax for them) while still allowing people to add optimization fences.
>>
>>
>> +1
>>
>> I get that people with gigantic PostgreSQL installations with
>> stringent performance requirements sometimes need to do odd things to
>> squeeze out the last few percentage points of performance.  As the
>> people (well, at least the people close to the ground) at these
>> organizations are fully aware, performance optimizations are extremely
>> volatile with respect to new versions of software, whether it's
>> PostgreSQL, Oracle, the Linux kernel, or what have you.  They expect
>> this, and they have processes in place to handle it.  If they don't,
>> it's pilot error.
>>
>> We should not be penalizing all our other users to maintain the
>> fiction that people can treat performance optimizations as a "fire and
>> forget" matter.
>>
>
> Agreed.
>
>>> 2. Add a decorator for WITH (e.g. "WITH MATERIALIZED x (...") to add an
>>> explicit optimization fence. This will for the first time add official
>>> support for a query hint in the syntax which is a quite big precedent.
>>
>>
>> Yep.  It's one we should think very carefully before we introduce.
>>
>
> I think it's a mistake to see this as an introduction of query hits.
>
> Firstly, it's a question whether it qualifies as a hint. I wouldn't call it
> a hint, but let's assume there is a definition of query hints that includes
> WITH MATERIALIZED.
>
> More importantly, however, this is not introducing anything new. It's just a
> different name for the current "WITH" semantics, and you can achieve the
> same behavior by "OFFSET 0". And people are already using these as hints, so
> I fail to see how this introduces anything new.
>
> In fact, if you see the optimization fence as an implicit query hint, this
> actually *removes* a hint (although most users are unaware of that behavior
> and use it unintentionally).

+1 down the line.  More to the point, for several years now we (or
at least I, but I'm not the only one) have been advocating the use
of CTEs to avoid the undocumented and bizarre OFFSET 0 trick.
Jerking this out from under users without giving a simple mechanism
to get the same behavior, short of a major query rewrite, is blatantly
user hostile.  I can't believe we're even contemplating it.   Also, a
GUC is not a solution, for pretty obvious reasons I think.
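For reference, the OFFSET 0 trick being discussed looks like this (a sketch; expensive_fn is a hypothetical function, and OFFSET 0 is undocumented planner behavior, not a documented feature):

```sql
-- OFFSET 0 prevents the planner from pulling the subquery up,
-- so the inner query is evaluated as written, before the outer qual
SELECT *
FROM (SELECT id, expensive_fn(id) AS v FROM big OFFSET 0) s
WHERE s.v > 0;
```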

merlin




Re: [HACKERS] CTE inlining

2017-05-02 Thread Merlin Moncure
On Sun, Apr 30, 2017 at 6:21 PM, Andres Freund  wrote:
> On 2017-04-30 07:19:21 +0200, Pavel Stehule wrote:
>> why we cannot to introduce GUC option - enable_cteoptfence ?
>
> Doesn't really solve the issue, and we've generally shied away from GUCs
> that influence behaviour after a few bad experiences.  What if you want
> one CTE inlined, but another one not?

Yeah.  Are we absolutely opposed to SQL syntax against WITH that
allows or disallows fencing?   for example,

WITH [MATERIALIZED]

Pushing people to OFFSET 0 is a giant step backwards IMO, and as an
implementation detail it is also subject to change.
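A sketch of what the decorated syntax might look like (hypothetical at the time of this thread; PostgreSQL 12 later adopted essentially this spelling as WITH ... AS [NOT] MATERIALIZED):

```sql
WITH c AS MATERIALIZED (        -- fence kept explicitly
  SELECT id, payload FROM big
)
SELECT * FROM c WHERE id = 42;  -- c is computed once, then filtered
```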

merlin




Re: [HACKERS] [BUGS] Crash observed during the start of the Postgres process

2017-04-25 Thread Merlin Moncure
On Tue, Apr 25, 2017 at 8:44 AM, K S, Sandhya (Nokia - IN/Bangalore)
 wrote:
> Hello,
>
> Did you get a chance to take a look into the issue?
>
> Please consider it with high priority. We will be awaiting your inputs.

This email is heavily cross posted, which is obnoxious.  Can you paste
the relevant log snippet?

merlin




Re: [HACKERS] Triggers and logical replication (10devel)

2017-04-21 Thread Merlin Moncure
On Fri, Apr 21, 2017 at 5:08 AM, Egor Rogov  wrote:
> Hello,
> It seems that triggers don't fire on subscriber's tables during logical
> replication. Is it a bug?

Reading the documentation (which is TBH a bit hard to follow) it
appears that it is expected behavior.

https://www.postgresql.org/docs/devel/static/logical-replication-architecture.html#logical-replication-snapshot
states:

"The apply process on the subscriber database always runs with
session_replication_role set to replica, which produces the usual
effects on triggers and constraints."

https://www.postgresql.org/docs/devel/static/sql-altertable.html states:

"The trigger firing mechanism is also affected by the configuration
variable session_replication_role. Simply enabled triggers will fire
when the replication role is “origin” (the default) or “local”."
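Putting those two together: for a trigger to fire during logical replication apply, it has to be enabled for the replica role, e.g. (a sketch, assuming a subscriber table with an existing trigger):

```sql
-- fire only when session_replication_role = 'replica' (the apply worker)
ALTER TABLE mytable ENABLE REPLICA TRIGGER my_trigger;

-- or fire regardless of the replication role
ALTER TABLE mytable ENABLE ALWAYS TRIGGER my_trigger;
```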

merlin




Re: [HACKERS] Ongoing issues with representation of empty arrays

2017-04-19 Thread Merlin Moncure
On Mon, Apr 10, 2017 at 11:17 PM, Andrew Gierth
 wrote:
>> "Tom" == Tom Lane  writes:
>
>  >> First is contrib/intarray, _AGAIN_ (see past bugs such as #7730):
>  >> ...
>  >> I plan to fix this one properly, unless anyone has any objections.
>
>  Tom> Just to clarify, what do you think is "properly"?
>
> I would say, that any time an intarray function returns an empty result
> it should be the standard 0-dimensional representation that every other
> array operation uses.  The intarray functions all seem already able to
> take such values as inputs.  Also there should be regression tests for
> this (none of intarray's existing tests have any empty arrays at all).

Are there any impacts outside of intarray?

merlin




Re: [HACKERS] Performance issue with postgres9.6

2017-04-07 Thread Merlin Moncure
On Fri, Apr 7, 2017 at 12:54 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
>> On 04/07/2017 06:31 PM, Merlin Moncure wrote:
>>> I think your math is off.  Looking at your attachments, planning time
>>> is 0.056ms, not 0.56ms.  This is in no way relevant to performance on
>>> the order of your measured TPS.   How are you measuring TPS?
>
>> Not sure where did you get the 0.056ms?
>
> I don't see that either, but:
>
>> What I see is this in the 9.3 explains:
>>   Total runtime: 0.246 ms
>> and this in those from 9.6:
>>   Planning time: 0.396 ms
>>   Execution time: 0.181 ms
>> That is roughly 0.25ms vs. 0.6ms (0.4+0.2), as reported by Prakash.
>
> 9.3's EXPLAIN did not measure planning time at all.  The "Total runtime"
> it reports corresponds to "Execution time" in the newer version.  So
> these numbers indicate that 9.6 is significantly *faster*, not slower,
> than 9.3, at least so far as execution of this one example is concerned.
>
> The OP may well be having some performance issue with 9.6, but the
> presented material completely fails to demonstrate it.

This smells like a problem with the test execution environment itself.
OP (if on linux), try:

pgbench -n -f <(echo "select * from subscriber where s_id = 100") -c 4 -T 10

...where pgbench is run from the database server (if pgbench is not in
the default path, you may have to qualify it).  This should give
apples to apples comparison, or at least rule out certain
environmental considerations like the network stack.

If your client is running on Windows, one place to look is the TCP
stack.  In my experience TCP configuration issues are much more common
on Windows.  Any reasonably modern hardware can handle thousands and
thousands of transactions per second for a simple indexed select.

merlin




Re: [HACKERS] Performance issue with postgres9.6

2017-04-07 Thread Merlin Moncure
On Fri, Apr 7, 2017 at 5:16 AM, Prakash Itnal  wrote:
> Hello,
>
> We currently use psotgres 9.3 in our products. Recently we upgraded to
> postgres 9.6. But with 9.6 we have seen a drastic reduction in throughput.
> After analyzing carefully I found that "planner time" in 9.6 is very high.
> Below are the details:
>
> Scenario:
> 1 Create a table with 10 rows.
> 2 Execute simple query: select * from subscriber where s_id = 100;
> 3 No update/delete/insert; tried vacuum, full vacuum; by default we enable
> auto-vacuum
>
> 9.3: Avg of "Total runtime" : 0.24ms [actual throughput: 650 TPS]
> 9.6: Avg of Total time: 0.56ms (Avg of "Planning time" : 0.38ms + Avg of
> "Execution time" : 0.18ms) [actual throughput: 80 TPS]

I think your math is off.  Looking at your attachments, planning time
is 0.056ms, not 0.56ms.  This is in no way relevant to performance on
the order of your measured TPS.   How are you measuring TPS?

merlin




Re: [HACKERS] libpq Alternate Row Processor

2017-02-13 Thread Merlin Moncure
On Mon, Feb 13, 2017 at 8:46 AM, Kyle Gearhart
 wrote:
> On 2/9/17 7:15 PM, Jim Nasby wrote:
>> Can you run a trace to see where all the time is going in the single row 
>> case? I don't see an obvious time-suck with a quick look through the code. 
>> It'd be interesting to see how things change if you eliminate the filler 
>> column from the SELECT.
>
> Traces are attached, these are with callgrind.
>
> profile_nofiller.txt: single row without filler column
> profile_filler.txt: single row with filler column
> profile_filler_callback.txt: callback with filler column
>
> pqResultAlloc looks to hit malloc pretty hard.  The callback reduces all of 
> that to a single malloc for each row.

Couldn't that be optimized, say, by preserving malloc'd memory when in
single row mode and recycling it?  (IIRC during the single row mode
discussion this optimization was voted down).

A barebones callback mode ISTM is a complete departure from the
classic PGresult interface.  This code is pretty unpleasant IMO:
acct->abalance = *((int*)PQgetvalue(res, 0, i));
acct->abalance = __bswap_32(acct->abalance);

Your code is faster but foists a lot of the work on the user, so it's
kind of cheating in a way (although very carefully written
applications might be able to benefit).

merlin




Re: [HACKERS] Idea on how to simplify comparing two sets

2017-02-07 Thread Merlin Moncure
On Tue, Feb 7, 2017 at 9:46 PM, Joel Jacobson  wrote:
> On Tue, Feb 7, 2017 at 4:58 PM, Tom Lane  wrote:
>> Joel Jacobson  writes:
>>> Currently there is no simple way to check if two sets are equal.
>>
>> Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
>> and SELECT set2 EXCEPT SELECT set1 are both empty?
>
> Yes, that's one way, but it's ugly as you have to repeat yourself and
> write both sets two times.
> Not an issue for small queries, but if you have two big queries stored
> in a .sql file,
> you would have to modify both places for each query and always make
> sure they are identical.

A CTE might help:

WITH left AS (something complex),
right AS (something complex)
SELECT COUNT(*) = 0 AS good FROM
(
  SELECT * FROM left EXCEPT SELECT * FROM right
  UNION ALL
  SELECT * FROM right EXCEPT SELECT * FROM left
) q;

This isn't the most efficient solution, but is easily abstracted into
dynamic SQL (meaning, you could pass both queries as arguments to a
checker function).  Another, similar approach is to abstract the query
behind a view which ISTM is a practice you are underutilizing based on
your comments :-).

If I were in a hurry and the dataset was enormous I would probably
dump both queries identically ordered to a .csv, and do:
diff left.csv right.csv | head -1

in bash or something like that.  Not sure if the utility of a
bidirectional EXCEPT is enough to justify adding custom syntax for
that approach.  I use the 'double EXCEPT' tactic fairly often and
understand the need, but the bar for non-standard syntax is pretty
high (and has been getting higher over the years, I think).
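The checker-function idea sketched out (hypothetical helper; EXCEPT ALL is used so duplicate rows are counted, and both query texts must produce union-compatible rows):

```sql
CREATE FUNCTION sets_equal(q1 text, q2 text) RETURNS boolean AS $$
DECLARE
  n bigint;
BEGIN
  EXECUTE format(
    'SELECT count(*) FROM (((%s) EXCEPT ALL (%s)) UNION ALL ((%s) EXCEPT ALL (%s))) q',
    q1, q2, q2, q1)
  INTO n;
  RETURN n = 0;
END;
$$ LANGUAGE plpgsql;

-- usage:
--   SELECT sets_equal('SELECT * FROM left_tbl', 'SELECT * FROM right_tbl');
```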

merlin




Re: [HACKERS] Checksums by default?

2017-01-23 Thread Merlin Moncure
On Mon, Jan 23, 2017 at 8:07 PM, Tom Lane  wrote:
> Peter Geoghegan  writes:
>> I thought that checksums went in in part because we thought that there
>> was some chance that they'd find bugs in Postgres.
>
> Not really.  AFAICS the only point is to catch storage-system malfeasance.
>
> It's barely possible that checksumming would help detect cases where
> we'd written data meant for block A into block B, but I don't rate
> that as being significantly more probable than bugs in the checksum
> code itself.  Also, if that case did happen, the checksum code might
> "detect" it in some sense, but it would be remarkably unhelpful at
> identifying the actual cause.

Hm, but at least in some cases wouldn't it protect people from further
damage?  End user data damage ought to be prevented at all costs IMO.

merlin




Re: [HACKERS] Checksums by default?

2017-01-23 Thread Merlin Moncure
On Sat, Jan 21, 2017 at 12:35 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> Sure, it might be easy, but we don't have it.  Personally I think
>> checksums just aren't even ready for prime time. If we had:
>> - ability to switch on/off at runtime (early patches for that have IIRC
>>   been posted)
>> - *builtin* tooling to check checksums for everything
>> - *builtin* tooling to compute checksums after changing setting
>> - configurable background sweeps for checksums
>
> Yeah, and there's a bunch of usability tooling that we don't have,
> centered around "what do you do after you get a checksum error?".
> AFAIK there's no way to check or clear such an error; but without
> such tools, I'm afraid that checksums are as much of a foot-gun
> as a benefit.

I see your point here, but they sure saved my ass with that pl/sh
issue.  So I'm inclined to lightly disagree; there are good arguments
either way.

merlin




Re: [HACKERS] emergency outage requiring database restart

2017-01-18 Thread Merlin Moncure
On Wed, Jan 18, 2017 at 4:11 AM, Ants Aasma <ants.aa...@eesti.ee> wrote:
> On Wed, Jan 4, 2017 at 5:36 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> Still getting checksum failures.   Over the last 30 days, I see the
>> following.  Since enabling checksums FWICT none of the damage is
>> permanent and rolls back with the transaction.   So creepy!
>
> The checksums still only differ in least significant digits which
> pretty much means that there is a block number mismatch. So if you
> rule out filesystem not doing its job correctly and transposing
> blocks, it could be something else that is resulting in blocks getting
> read from a location that happens to differ by a small multiple of
> page size. Maybe somebody is racily mucking with table fd's between
> seeking and reading. That would explain the issue disappearing after a
> retry.
>
> Maybe you can arrange for the RelFileNode and block number to be
> logged for the checksum failures and check what the actual checksums
> are in data files surrounding the failed page. If the requested block
> number contains something completely else, but the page that follows
> contains the expected checksum value, then it would support this
> theory.

will do.   The main challenge is getting a hand-compiled server to
swap in so that libdir continues to work.  Getting access to the
server is difficult, as is getting a maintenance window.  I'll post
back ASAP.

merlin




Re: [HACKERS] New CORRESPONDING clause design

2017-01-17 Thread Merlin Moncure
On Tue, Jan 17, 2017 at 12:37 AM, Surafel Temsgen  wrote:
> I am new here and I really want to contribute, I have read same resource
> that help understanding database system and postgresql. I would like to
> start implementing sql syntax corresponding by clause because I believe
> implementing sql syntax gives an opportunity to familiarize  many part of
> postgresql source code. Previous implementation is here and have an issue on
> explain query and break cases on unlabeled NULLs
> To repeat what a corresponding by clause means
> Corresponding clause either contains a BY(...) clause or not. If it
> doesn't have a BY(...) clause the usage is as follows.

This is great stuff. Does the syntax only apply to UNION?  I would
imagine it would also apply to INTERSECT/EXCEPT?  What about UNION
ALL?
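For reference, the standard semantics under discussion (CORRESPONDING is SQL-standard syntax not supported by PostgreSQL at the time of this thread; tables are hypothetical):

```sql
SELECT a, b, c FROM t1
UNION CORRESPONDING
SELECT c, a FROM t2;

-- matches columns by name rather than position; behaves like:
SELECT a, c FROM t1 UNION SELECT a, c FROM t2;
```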

merlin




Re: [HACKERS] Retiring from the Core Team

2017-01-12 Thread Merlin Moncure
On Wed, Jan 11, 2017 at 6:29 PM, Josh Berkus  wrote:
> Hackers:
>
> You will have noticed that I haven't been very active for the past year.
>  My new work on Linux containers and Kubernetes has been even more
> absorbing than I anticipated, and I just haven't had a lot of time for
> PostgreSQL work.
>
> For that reason, as of today, I am stepping down from the PostgreSQL
> Core Team.

Thanks for all your hard work.  FWIW, your blog posts, 'Primary
Keyvil' are some of my favorite of all time!

merlin




Re: [HACKERS] merging some features from plpgsql2 project

2017-01-11 Thread Merlin Moncure
On Wed, Jan 11, 2017 at 2:57 PM, Robert Haas  wrote:
> - The E'' syntax and the standard_conforming_strings GUC were added in
> PostgreSQL 8.0.  The only legal value of standard_conforming_strings
> was "false".
>
> - In PostgreSQL 8.1, it became possible to set
> standard_conforming_strings to "true", but the default was still
> "false".
>
> - In PostgreSQL 9.1, the default was changed to "true".
>
> So there 6 major release from the time the GUC was added and 5 from
> the time it became mutable before the default was flipped.   We've now
> had 5 more since the default was changed to "true".  (No, it's not
> time to remove the GUC yet.  At least not in my opinion.)
>
> One thing that made changing standard_conforming_strings particularly
> painful was that it had knock-on effects on many language-specific
> drivers not maintained by the core project (or just plain not
> maintained).  I don't think the language changes being proposed here
> for PL/pgsql would have the same kind of impact, but some of them
> would make it significantly harder to migrate to PostgreSQL from
> Oracle, which some people might see as an anti-goal (as per other
> nearby threads on making that easier).

I don't think it's a simple matter of waiting N or N+M releases
(although I certainly did appreciate that we did it regardless).  It
comes down to this: there's just no way to release changes that break
a lot of code without breaking a lot of code.  It's all about
acknowledging that and judging it acceptable against the benefits you
get.   For posterity, with respect to conforming strings: SQL
injection is an absolute scourge of the computing world, so on balance
we did the right thing.  Having said that, it's always good to do the
math, and the calculation is primarily an economic one, I think.

merlin




Re: [HACKERS] merging some features from plpgsql2 project

2017-01-11 Thread Merlin Moncure
On Wed, Jan 11, 2017 at 11:11 AM, Peter Eisentraut
 wrote:
> The current syntax was chosen because it is SQL-compatible.  Adding
> redundant syntax to save a few characters without any new functionality
> (performance, resource usage, safety, etc.) is a weak argument in the
> overall scheme of things.

Yeah -- exactly.  The few minor things that are not 100% SQL
compatible I find to be major headaches.  Incompatible usage of INTO
for example.

This thread has been going on for quite some time now and is starting
to become somewhat circular.   Perhaps we ought to organize the
various ideas and pain points presented into a wiki along with
conclusions, noting the cases where no solution compatible with the
current syntax exists.

merlin




Re: [HACKERS] merging some features from plpgsql2 project

2017-01-11 Thread Merlin Moncure
On Tue, Jan 10, 2017 at 7:44 AM, Marko Tiikkaja  wrote:
> On Tue, Jan 10, 2017 at 2:26 PM, Peter Eisentraut
>  wrote:
>>
>> It's not like PL/pgSQL is the king of brevity.
>
>
> This is essentially saying "PL/PgSQL isn't perfect, so we shouldn't try and
> make it better".  I hear this argument a lot, and as long as people keep
> rejecting improvements for this reason they can keep saying it.  It's a
> self-fulfilling prophecy.

Agreed.  But adding language features, especially syntactical ones,
demands prudence; there is good reason to limit keywords like that.
What about:
pgsql.rows
pgsql.found
pgsql.sqlerrm
etc
as automatic variables (I think this was suggested upthread).
Conflicts with existing structures is of course an issue but I bet it
could be worked out.
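For comparison, today's spellings of those values in PL/pgSQL (a sketch; the pgsql.* names are the hypothetical automatic variables from this thread, and foo is a hypothetical table):

```sql
DO $$
DECLARE
  n bigint;
BEGIN
  UPDATE foo SET x = x;
  GET DIAGNOSTICS n = ROW_COUNT;   -- would become pgsql.rows
  IF NOT FOUND THEN                -- would become pgsql.found
    RAISE NOTICE 'no rows';
  END IF;
EXCEPTION WHEN OTHERS THEN
  RAISE NOTICE '%', SQLERRM;       -- would become pgsql.sqlerrm
END;
$$;
```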

I also kinda disagree on the brevity point, or at least would like to
add some color.  SQL is verbose in the sense of "let's make everything
an english language sentence" but incredibly terse relative to other
language implementations of the same task.   Embedded SQL tends to be
uniformly clumsy due to all of the extra handling of errors,
parameterization, etc.  This is why we write plpgsql naturally.

merlin




Re: [HACKERS] merging some features from plpgsql2 project

2017-01-09 Thread Merlin Moncure
On Sun, Jan 8, 2017 at 2:52 AM, Joel Jacobson  wrote:
> On Sat, Jan 7, 2017 at 8:56 PM, Pavel Stehule  wrote:
>>
>> Jim, Marko, Joel - is there a place, features where we can find a partial 
>> agreement? If it is, then we can move our view there.
>
> I have decided I definitively want a new language, and I'm willing to
> pay for it.

well, as they say, "money talks" :-D.

> Hopefully the community will join forces and contribute with ideas and
> code, but with or without you or the rest of the community, plpgsql2
> is going to happen.
> Call it pltrustly or plpgsql2, I don't care. I just care about ending
> my suffering from being forced writing plpgsql every day. It sucks,
> and I'm going to end it.

Curious, are you mainly troubled by the 'INTO STRICT' family of
problems? Or something else?  Pavel has scored some points with PRAGMA
syntax, and ISTM that does not require a compatibility break.

> And please kill all these GUCs ideas. The best thing with PostgreSQL
> is the natural expected behaviour of the default configuration.
> Contrary to MySQL where you have to enable lots and lots of
> configuration options just to get a behaviour you expect as a novice
> user.

I think there is a lot of support for this point of view.  Jim is
notable outlier here, but for the most part we don't do language
behavior changes with GUC.

> It's much better to just come together and agree on whatever we have
> learned during the last 15 years of PL/pgSQL1, and sample all ideas
> during a year maybe, and decide what to put into PL/pgSQL2. To make it
> useful, we should aim to not break compatibility for _most_ code, but
> accept some necessary rewrites of functions with deprecated
> anti-patterns.

Agreed: If you want to break compatibility, pushing a new language is
the better way than GUC.   If you got consensus on this, having both
languages supported side by side for a while (maybe 4-5 releases) is
the way to go, with the old language eventually frozen and moved to an
extension.  But this is a lot of work and aggravation; are you *sure*
you can only get what you want with a full compatibility break?

With respect to your company developers specifically?  I'm genuinely
curious if you've taken a good look at pl/v8 and why you've determined
it's not suitable to move forward with.  It's got a different set of
headaches, but is really fast, and sometimes wonder if with some
alternative preprocessing (like coffeescript but geared towards SQL)
could have some long term promise.

merlin




Re: [HACKERS] merging some features from plpgsql2 project

2017-01-05 Thread Merlin Moncure
On Thu, Jan 5, 2017 at 11:03 AM, Robert Haas  wrote:
> Now, that's not to say we should never break backward compatibility.
> Sometimes we should.  I think the problem with PL/pgsql is that many
> of the compatibility breaks that people want are likely to lead to
> subtle misbehavior rather than outright failure, or are not easy to
> spot via a cursory look ("hmm, could that SELECT query ever return
> more than one row?").

The core issue is that developers tend to be very poor at estimating
the impacts of changes; they look at things through the lens of the "new".
Professional software development is quite expensive and framework-
(I'll lump the database and its various built-in features under that
term) level changes are essentially throwing out some portion of our
user's investments.  Even fairly innocent compatibility breaks can
have major downstream impacts on our users and it's always much worse
than expected.  For example, nobody thought that changing the bytea
text encoding format to hex would have corrupted our user's data, but
it did.

TBH, the discussion should shift away from specific issues on
compatibility and towards a specific set of standards and policies
around how to do it and what kinds of technical justifications need to
be made in advance.  Security problems for example could be argued as
a valid reason to break user code, or poor adherence to the the SQL
standard which are in turn blocking other content.  Minus those kinds
of considerations it's really just not worth doing, and there's no
tricky strategy like playing with version numbers that can game that
rule.  A formal deprecation policy might be a good start.

The C language really should be considered the gold standard here.
Changes did have to be made, like getting rid of the notoriously
broken and insecure gets(), but they were made very, very slowly and
unobtrusively.

(I do think lpad should accept "any" FWIW) :-D

merlin




Re: [HACKERS] merging some features from plpgsql2 project

2017-01-04 Thread Merlin Moncure
On Wed, Jan 4, 2017 at 1:49 PM, Pavel Stehule  wrote:
>
>>
>> so some possible design can be:
>>
>> DECLARE
>>   PRAGMA UNCACHED_PLANS;
>> BEGIN
>>   SELECT ... INTO ;
>>   SELECT ... INTO ;
>> END;
>>
>> This respects Ada and PL/SQL style - probably easy implementation
>>
>> Regards
>>
>> Pavel
>
>
> some examples based on Ada doc
>
> FUNCTION xxx RETURN int AS
>   PRAGMA yyy -- pragma has function scope
> BEGIN
>
> FUNCTION xxx RETURN int AS
> BEGIN
>   DECLARE
> PRAGMA yyy -- pragma has block scope

ok, sub-block makes sense over statement level IMO.

merlin




Re: [HACKERS] merging some features from plpgsql2 project

2017-01-04 Thread Merlin Moncure
On Tue, Jan 3, 2017 at 2:15 PM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>
>
> 2017-01-03 20:54 GMT+01:00 Merlin Moncure <mmonc...@gmail.com>:
>>
>> On Tue, Jan 3, 2017 at 9:58 AM, Pavel Stehule <pavel.steh...@gmail.com>
>> wrote:
>> > 2017-01-03 16:23 GMT+01:00 Merlin Moncure <mmonc...@gmail.com>:
>> >> *) Would also like to have a FINALLY block
>> >
>> > What you can do there?
>>
>> This is syntax sugar so you don't need second begin/end/exception
>> block or duplicated code.  It separates error handling from cleanup.
>>
>> BEGIN
>>   PERFORM dblink_connect(...
>>   
>> EXCEPTION WHEN OTHERS THEN
>>   
>> FINALLY
>>   PERFORM dblink_disconnect(...
>> END;
>
>
> Does know somebody this pattern from Ada or PL/SQL?

I guess probably not.  It's a standard pattern in modern EH languages
(for example, https://msdn.microsoft.com/en-us/library/dszsf989.aspx).
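For readers unfamiliar with the pattern: the proposal mirrors try/except/finally semantics, where cleanup runs on every exit path whether or not the error handler fires. A minimal Python sketch of those semantics (all names here are illustrative only, not part of any proposal):

```python
def run_with_cleanup(steps):
    # Mirrors the proposed BEGIN / EXCEPTION / FINALLY shape:
    # the cleanup clause runs whether or not the body raises.
    log = []
    try:
        log.append("connect")        # e.g. PERFORM dblink_connect(...
        for step in steps:
            step(log)
    except Exception:
        log.append("handled")        # EXCEPTION WHEN OTHERS THEN ...
    finally:
        log.append("disconnect")     # FINALLY: PERFORM dblink_disconnect(...
    return log
```

With no failing step the log is connect/disconnect; with a raising step the handler fires and the cleanup still runs, with no duplicated disconnect code.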

>>
>> >> *) Some user visible mechanic other than forcing SQL through EXECUTE
>> >> to be able to control plan caching would be useful.
>> >
>> > fully agree.
>> >
>> > Have you some ideas?
>> >
>> > What about plpgsql option (function scope) -- WITHOUT-PLAN-CACHE - any
>> > non
>> > trivial plans will not be cached - and evaluated as parametrized query
>> > only.
>>
>> I have slight preference for syntax marker for each query, similar to
>> INTO.  Maybe 'UNCACHED'?
>
>
> I am not clean opinion - the statement level is nice, but what readability?
>
> SELECT UNCACHED t.a, t.b FROM INTO a,b;

Yeah -- this is pretty ugly, admittedly.  Maybe a control directive is
ok, as long as you can set it mid-function?

merlin




Re: [HACKERS] emergency outage requiring database restart

2017-01-04 Thread Merlin Moncure
On Tue, Jan 3, 2017 at 1:05 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 11/7/16 5:31 PM, Merlin Moncure wrote:
>> Regardless, it seems like you might be on to something, and I'm
>> inclined to patch your change, test it, and roll it out to production.
>> If it helps or at least narrows the problem down, we ought to give it
>> consideration for inclusion (unless someone else can think of a good
>> reason not to do that, heh!).
>
> Any results yet?

Not yet, unfortunately.  I compiled the server with the change, but was
not able to get $libdir working so that I could just do a binary swap
over my pgdg compiled package.  If anyone has some pointers on how to
do that, I'd appreciate it.

Still getting checksum failures.   Over the last 30 days, I see the
following.  Since enabling checksums FWICT none of the damage is
permanent and rolls back with the transaction.   So creepy!

[root@rcdylsdbmpf001 pg_log]# cat *.log | grep "page verification failed"
2016-12-05 10:17:48 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 61797 but expected 61798
2016-12-05 11:15:31 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 37750 but expected 37749
2016-12-05 11:15:58 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 44483 but expected 44482
2016-12-05 11:16:33 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 58926 but expected 58925
2016-12-05 11:17:08 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 38527 but expected 38528
2016-12-05 11:18:34 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 61932 but expected 61933
2016-12-05 11:18:55 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 23757 but expected 23758
2016-12-05 12:13:48 CST [rms@mpf2]: WARNING:  page verification
failed, calculated checksum 44192 but expected 44225 at character 417
2016-12-08 14:18:37 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 36083 but expected 36082
2016-12-08 15:52:31 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 63414 but expected 63415 at
character 1096
2016-12-09 09:12:21 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 25781 but expected 25780
2016-12-09 09:13:20 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 63043 but expected 63044 at
character 4230
2016-12-12 08:57:45 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 31775 but expected 31771
2016-12-13 09:47:11 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 40802 but expected 40806
2016-12-15 12:49:04 CST [rms@mpf2]: WARNING:  page verification
failed, calculated checksum 11625 but expected 11592 at character 417
2016-12-15 12:51:08 CST [rms@mpf2]: WARNING:  page verification
failed, calculated checksum 51017 but expected 51018
2016-12-15 12:52:36 CST [rms@mpf2]: WARNING:  page verification
failed, calculated checksum 51017 but expected 51018 at character 417
2016-12-16 12:16:31 CST [rms@mpf2]: WARNING:  page verification
failed, calculated checksum 23580 but expected 23576
2016-12-20 13:59:33 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 45273 but expected 45285
2016-12-20 14:00:22 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 10524 but expected 10525

Note the second database, 'mpf2'.  This is a new development.  An example
of a query that is jacking things up is this:
2016-12-15 12:51:08 CST [rms@mpf2]: WARNING:  page verification
failed, calculated checksum 51017 but expected 51018
2016-12-15 12:51:08 CST [rms@mpf2]: CONTEXT:  SQL statement "
  COPY (
SELECT 'DELETE FROM tblNAptCommonSample WHERE ReportPeriod = 201612;'
UNION ALL
SELECT format(
'INSERT INTO tblNAptCommonSample('
  'ReportPeriod, Period, AdjustmentType, PlanType, MSA, MSASubMkt, '
  'Sample, Occupancy, OccupancyChange, AverageRent,
AverageRentChange, RentSF, '
  'RentSFChange)'
'VALUES('
  '%s, %s, ''%s'', ''%s'', ''%s'', %s,'
  '%s, %s, %s, %s, %s, %s,'
  '%s)',
  ReportPeriod, Period, AdjustmentType, PlanType, MSA, MSASubMkt,
  c(Sample), c(Occupancy), c(OccupancyChange), c(AverageRent),
c(AverageRentChange), c(RentSF),
  c(RentSFChange))
FROM tblNAptCommonSample
  WHERE Period = 201612
  AND MSA != '5610'
UNION ALL
  SELECT 'go'
  ) TO '/tmp/tblnaptcommonsample.sql';
"
PL/pgSQL function writempf1history(integer) line 75 at EXECUTE


or this:
2016-12-15 12:52:36 CST [rms@mpf2]: WARNING:  page verification
failed, calculated checksum 51017 but expected 51018 at character 417
2016-12-15 12:52:36 CST [rms@mpf2]: QUERY:
  COPY (
SE

Re: [HACKERS] merging some features from plpgsql2 project

2017-01-03 Thread Merlin Moncure
On Tue, Jan 3, 2017 at 9:58 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> 2017-01-03 16:23 GMT+01:00 Merlin Moncure <mmonc...@gmail.com>:
>> So -1 to strict mode, unless we can make a case why this can't be done
>> as part of checking/validation.
>
> Can be plpgsq.extra_errors and plpgsql.extra_warnings solution?
>
> I am thinking so there is a space for improvement (in extra_* usage)

extra_warnings seems ok at the GUC level.  However, it's bad to have a
body of code fail to compile based on a GUC.  check_function_bodies, for
example, is a complete hack and should be avoided if at all possible
IMO.  There is a very good informal rule that GUCs should not impact
behavior (minus some special cases like timeouts).   Good examples of
failure to follow this rule are mysql and php.

Maybe settings at level of extension could be ok, but I'm skeptical.
Good languages are clear without needing extra context.

> Do you know plpgsql_check https://github.com/okbob/plpgsql_check ?

Yes.  This is good design and should be a model for core work (if any).
In my ideal world, this would be part of pgxn, and the pgxn client
would be installed in core.   For plpgsql to enter the modern era we
need standardized packaging and deployment like cran, npm, etc.

>> Other random points:
>> *) Another major pain point is swapping in the input variables for
>> debugging purposes.  Something that emits a script based on a set of
>> arguments would be wonderful.
>
> ???

Often, for debugging of complicated cases, I'm starting from errors in
the database log with the function name and argument values.  Sometimes
I find myself pasting a pl/pgsql function into a text editor and
replacing input variables with known values.

>>
>> *) Would also like to have a FINALLY block
>
> What you can do there?

This is syntax sugar so you don't need second begin/end/exception
block or duplicated code.  It separates error handling from cleanup.

BEGIN
  PERFORM dblink_connect(...
  
EXCEPTION WHEN OTHERS THEN
  
FINALLY
  PERFORM dblink_disconnect(...
END;

>> *) Some user visible mechanic other than forcing SQL through EXECUTE
>> to be able to control plan caching would be useful.
>
> fully agree.
>
> Have you some ideas?
>
> What about plpgsql option (function scope) -- WITHOUT-PLAN-CACHE - any non
> trivial plans will not be cached - and evaluated as parametrized query only.

I have slight preference for syntax marker for each query, similar to
INTO.  Maybe 'UNCACHED'?
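To make the idea concrete, here is a toy model of what a per-statement UNCACHED marker would buy, assuming it simply bypasses a plan cache keyed on query text (illustrative Python, not anything resembling the plpgsql internals):

```python
class PlanCache:
    """Toy model: 'plans' keyed by query text; a statement marked
    UNCACHED is planned fresh on every execution instead of reused."""

    def __init__(self):
        self._plans = {}
        self.plans_built = 0

    def _build(self, sql):
        # Stand-in for real planning work.
        self.plans_built += 1
        return ("generic-plan", sql, self.plans_built)

    def get_plan(self, sql, uncached=False):
        if uncached:                   # the proposed per-statement marker
            return self._build(sql)
        if sql not in self._plans:     # normal plpgsql behavior: cache
            self._plans[sql] = self._build(sql)
        return self._plans[sql]
```

The statement-level flag keeps the cached/uncached decision next to each query rather than at function or GUC scope.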

On Tue, Jan 3, 2017 at 10:57 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> Or just fix the issue, provide the backwards compatability GUCs and move on.

I really don't think this will fly.  I'm not buying your argument (at
all) that compatibility breaks have been cleanly done in the past, at
least not in the modern era.  In any event, marginal language
improvements are not a good justification for doing it.   And yes, the
continual monkeying around with column names in pg_stat_activity is a
major hassle.  For heaven's sake, can we just add new columns and/or
create a new view?

merlin




Re: [HACKERS] merging some features from plpgsql2 project

2017-01-03 Thread Merlin Moncure
On Tue, Dec 27, 2016 at 10:09 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> On 12/27/16 4:56 PM, Merlin Moncure wrote:
>>
>> On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule <pavel.steh...@gmail.com>
>> wrote:
>>>
>>> First I describe my initial position. I am strongly against introduction
>>> "new" language - plpgsql2 or new plpgsql, or any else. The trust of
>>> developers to us is important and introduction of any not compatible or
>>> different feature has to have really big reason. PostgreSQL is
>>> conservative
>>> environment, and PLpgSQL should not be a exception. More - I have not any
>
>
> Which is why this is an external fork of plpgsql.
>
> ** The real problem is that we have no mechanism for allowing a PL's
> language/syntax/API to move forward without massive backwards compatibility
> problems. **

Just got back from break :-).  Have some thoughts on this.  Backwards
compatibility is really a fundamental problem.  There's really no
solution to it other than to try and avoid using syntax to solve
problems.  It should be obvious to everyone that plpgsql cannot
withstand a compatibility break.  Another language could be offered as
an alternative in core (say, pl/psm or pl/v8), but pl/pgsql has to
support old code.   Some really out-there features could maybe be
redacted (in particular, using = for assignment), but not much.
But I guess we're stuck with the status quo.

I think we ought to avoid language features that influence the
behavior of the code (performance is ok), and that includes throwing
errors.  That's a freight train headed towards javascript's 'strict'
mode, which is a thinly disguised language fork.  #option and pragma
type syntaxes are trying to cheat the language -- hardly anyone uses
them, and it's a tricky way to try and make the language into something
other than it is.

C does it right -- dubious code is raised as warnings, and it's up to
the end user to determine which warnings are interesting and likely to
be an error.  So, rather than hacking the language to control the
throwing of errors and such, there should be some ability to validate
the function heavily and flag suspicious use of INTO or other dubious
things (unused variables, masked assignments, etc).  The validation
output could even be a set returning function.

So -1 to strict mode, unless we can make a case why this can't be done
as part of checking/validation.

Other random points:
*) Another major pain point is swapping in the input variables for
debugging purposes.  Something that emits a script based on a set of
arguments would be wonderful.

*) Would also like to have a FINALLY block

*) A mechanic to manually print out a stack trace for debugging
purposes would be helpful.

*) COPY not being able to accept arguments as variables (in particular
the filename) is a major headache

*) Upon error, we ought to print a few lines of context on either side
of the error.  Not very fun to try and figure out where the error is
when you are working off of 'syntax error near "FROM"' etc.  This is a
major problem throughout the platform IMO.

*) Some user visible mechanic other than forcing SQL through EXECUTE
to be able to control plan caching would be useful.

merlin




Re: [HACKERS] merging some features from plpgsql2 project

2016-12-27 Thread Merlin Moncure
On Tue, Dec 27, 2016 at 1:54 AM, Pavel Stehule  wrote:
> Hi
>
> I reread ideas described on page https://github.com/trustly/plpgsql2
>
> Some points are well and can be benefit for PlpgSQL.
>
> First I describe my initial position. I am strongly against introduction
> "new" language - plpgsql2 or new plpgsql, or any else. The trust of
> developers to us is important and introduction of any not compatible or
> different feature has to have really big reason. PostgreSQL is conservative
> environment, and PLpgSQL should not be a exception. More - I have not any
> information from my customers, colleagues about missing features in this
> language.  If there is some gaps, then it is in outer environment - IDE,
> deployment, testing,

Breaking language compatibility is a really big deal.  There has to be
a lot of benefits to the effort and you have to make translation from
plpgsql1 to plpgsql2 really simple.  You have made some good points on
the rationale, but not nearly enough to justify an implementation fork.  So
basically I agree.  Having said that, if you don't mind, I'd like to
run with the topic (which I'm loosely interpreting as, "Things I'd
like to do in SQL/PLPGSQL and can't").

#1 problem with plpgsql from my point of view is that the language and
grammar are not supersets of sql.  A lot of PLPGSQL keywords (EXECUTE,
BEGIN, INTO, END) have incompatible meanings with our SQL
implementation.  IMNSHO, SQL ought to give the same behavior inside or
outside of plpgsql.  It doesn't, and this is one of the reasons why
plpgsql may not be a good candidate for stored procedure
implementation.

#2 problem with plpgsql is that after function entry it's too late to do
things like set the transaction isolation level and change certain kinds
of variables (like statement_timeout).  This is very obnoxious; I
can't wrap the database in an API 100%, and the application has to manage
things that really should be controlled in SQL.

#3 problem with plpgsql is the complete lack of inlining.  Inlining
function calls in postgres is a black art even for very trivial cases.
This makes it hard for us to write quick things and in the worst case
causes endless duplication of simple expressions.

In short I guess the issue is that we don't have stored procedures and
I don't see an easy path to getting there with the current language.
There are a lot of other little annoyances but most of them can be
solved without a compatibility break.

It would be pretty neat if postgres SQL implementation could directly
incorporate limited flow control and command execution.  For example,
CREATE my_proc(Done OUT BOOL) RETURNS BOOL AS
$$
  BEGIN;
  SET transaction_isolation = 'serializable';
  SELECT some_plpgsql_func_returning_bool();
  COMMIT;
$$;
CALL my_proc() UNTIL Done;

Key points here are:
*) my_proc is in native SQL (not plpgsql), and runs outside of a snapshot
*) CALL is invocation into a stored procedure.  I extended it in similar
fashion to pl/sql CALL
(https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_4008.htm)
but anything will do for syntax as long as you get arbitrary control
of procedure lifetime external to snapshot and transaction
*) simple addition of UNTIL gets us out of the debate for best 'stored
procedure language'.   Keeping things to pure SQL really simplifies
things since we already have statement parsing at tcop level.  We just
need some special handling for CALL.
*) In my usage of plpgsql maybe 80% of database cases are covered
purely in the language, but maybe 20% of cases need support from the
application, typically where threading and transaction management are
involved.  With the above it would be more like 95% would be covered,
and if you extended CALL to something like:

CALL my_proc() IN BACKGROUND UNTIL Done;

..where "IN BACKGROUND" moves execution to a background worker, one
could do just about everything in SQL for tasks that do nothing but
read and write to the database but that today need significant support
from an outside language (primarily bash for me).
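The loop semantics of the proposed UNTIL clause can be sketched driver-side, assuming each invocation runs in its own transaction and the procedure's Done OUT parameter signals completion (hypothetical names throughout):

```python
def call_until(proc, max_calls=10000):
    # Emulates the proposed `CALL my_proc() UNTIL Done`: invoke the
    # procedure repeatedly, each call notionally its own transaction,
    # until its Done OUT parameter comes back true.
    for n in range(1, max_calls + 1):
        done = proc()          # BEGIN; ...; COMMIT; returns Done
        if done:
            return n           # number of transactions it took
    raise RuntimeError("procedure never reported Done")
```

A batch-processing procedure that consumes a work queue chunk-by-chunk would return false until the queue drains, committing each chunk independently.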

With respect to stuff you mentioned, like smarter handling of INTO,
are you really sure you need to break compatibility for that?

merlin




Re: [HACKERS] jsonb problematic operators

2016-12-12 Thread Merlin Moncure
On Sun, Dec 11, 2016 at 10:59 PM, Craig Ringer  wrote:
> PgJDBC allows you to write ??, which is ugly, but tolerable, since the
> JDBC spec doesn't have an escape syntax for it.

This is the core problem; *JDBC* is busted.  SQL reserves words but
not punctuation marks, so any assumption by client-side code that
characters will not be interpreted by the server is going to cause
problems.  As noted earlier, ":" is equally problematic, as that is
hibernate's parameter marker, and hibernate is probably in even
greater usage than naked JDBC in the java community.

Imagine trying to embed, say, perl, in java and reserving the very
same punctuation marks and then complaining to the perl community that
their language is broken due to usage overlap...that's what's
happening here.  If we really wanted to fix this, maybe the right way
to think about the problem is a highly reduced character set and a
pre-processor or an extension.
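A pre-processor along those lines is straightforward to sketch: treat `?` as the client-side parameter marker and `??` as an escape producing a literal `?` for the server, which is the PgJDBC convention mentioned upthread. A toy Python version that ignores string literals and comments for brevity:

```python
def rewrite_markers(sql):
    # '?'  -> numbered server-side parameter ($1, $2, ...)
    # '??' -> a single literal '?' passed through to the server,
    #         e.g. for the jsonb existence operator.
    # Toy scanner: real code must also skip quoted strings/comments.
    out, params, i = [], 0, 0
    while i < len(sql):
        if sql[i] == "?":
            if sql[i + 1 : i + 2] == "?":
                out.append("?")              # escaped literal '?'
                i += 2
                continue
            params += 1
            out.append("$%d" % params)       # parameter marker
            i += 1
            continue
        out.append(sql[i])
        i += 1
    return "".join(out), params
```

The point is only that escaping resolves the collision without the server reserving anything: the client owns `?`, and `??` gets it back.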

merlin




Re: [HACKERS] jsonb problematic operators

2016-12-09 Thread Merlin Moncure
On Fri, Dec 9, 2016 at 5:50 AM, Jordan Gigov  wrote:
> There is this problem with the jsonb operators "? text" "?| text[]"
> and "?& text[]" that the question mark is typically used for prepared
> statement parameters in the most used abstraction APIs in Java and
> PHP.
>
> This really needs an alternative. Something like "HAS text", "HAS
> ANY(text[])" and "HAS ALL(text[])" same as regular array usage. It
> probably should be another word that has less chance of becoming a
> conflict with another operator in future SQL specifications, but
> that's for you to decide.
>
> It's not a good idea to expect everyone else to make for workarounds
> for problems you choose to create.

You are griping in the wrong place.  "Everyone else" has reserved
characters for its own use without providing a clean escaping
mechanism -- hibernate does this, for example, reserving ':', which is
used in many places within SQL.

Typically when you embed special characters in strings designed to be
processed by something else, you provide a way for that character to be
expressed directly.  In the computer science world we generally call
this escaping strings, and it is a very common and well understood
practice.  For some odd reason, however, the authors of various java
frameworks seem to be impervious to the utility of the concept.

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-11-07 Thread Merlin Moncure
On Wed, Nov 2, 2016 at 10:45 AM, Oskari Saarenmaa  wrote:
> 26.10.2016, 21:34, Andres Freund kirjoitti:
>>
>> Any chance that plsh or the script it executes does anything with the file
>> descriptors it inherits? That'd certainly be one way to get into odd corruption
>> issues.
>>
> We really should use O_CLOEXEC for the majority of our file
> handles.
>
>
> Attached a patch to always use O_CLOEXEC in BasicOpenFile if we're not using
> EXEC_BACKEND.  It'd be nice to not expose all fds to most pl-languages
> either, but I guess there's no easy solution to that without forcibly
> closing all fds whenever any functions are called.

FYI, this is not my first run-in with strange behavior, on this thread
(not necessarily worth reading);
https://www.postgresql.org/message-id/CAHyXU0x5mW-SbSuUBEshzumOaN7JPUWa7Ejza68HE-KY0Nq7Kg%40mail.gmail.com

I had a similar set of starting conditions that resulted in very
strange behavior (but not data corruption AFAICT) -- the problem
mysteriously disappeared when I fixed some bugs that would cause the
routine to concurrently do the same operation.  I would like to point
out that I use both pl/sh (and via it, sqsh) very heavily, so these
problems are not necessarily the norm.

Regardless, it seems like you might be on to something, and I'm
inclined to patch your change, test it, and roll it out to production.
If it helps or at least narrows the problem down, we ought to give it
consideration for inclusion (unless someone else can think of a good
reason not to do that, heh!).

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-11-01 Thread Merlin Moncure
On Tue, Nov 1, 2016 at 8:56 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmonc...@gmail.com> writes:
>> On Mon, Oct 31, 2016 at 10:32 AM, Oskari Saarenmaa <o...@ohmu.fi> wrote:
>>> Your production system's postgres backends probably have a lot more open
>>> files associated with them than the simple test case does.  Since Postgres
>>> likes to keep files open as long as possible and only closes them when you
>>> need to free up fds to open new files, it's possible that your production
>>> backends have almost all allowed fds used when you execute your pl/sh
>>> function.
>>>
>>> If that's the case, the sqsh process that's executed may not have enough fds
>>> to do what it wanted to do and because of busted error handling could end up
>>> writing to fds that were opened by Postgres and point to $PGDATA files.
>
>> Does that apply?  the mechanics are a sqsh function that basically does:
>> cat foo.sql  | sqsh 
>> pipe redirection opens a new process, right?
>
> Yeah, but I doubt that either level of the shell would attempt to close
> inherited file handles.
>
> The real problem with Oskari's theory is that it requires not merely
> busted, but positively brain-dead error handling in the shell and/or
> sqsh, ie ignoring open() failures altogether.  That seems kind of
> unlikely.  Still, I suspect he might be onto something --- there must
> be some reason you can reproduce the issue in production and not in
> your test bed, and number-of-open-files is as good a theory as I've
> heard.
>
> Maybe the issue is not with open() failures, but with the resulting
> FD numbers being much larger than sqsh is expecting.  It would be
> weird to try to store an FD in something narrower than int, but
> I could see a use of select() being unprepared for large FDs.
> Still, it's hard to translate that idea into scribbling on the
> wrong file...

Looking at the sqsh code, nothing really stands out.  It's highly
developed and all obvious errors are checked.  There certainly could
be a freak bug in there (or in libfreetds, which sqsh links to) doing
the damage, though.  In the meantime I'll continue to try to work up a
reliable reproduction.  This particular routine only gets called in
batches on a quarterly basis, so things have settled down.

Just a thought: could COPY be tricked into writing into the wrong file
descriptor?  For example, if a file was killed with rm -rf and the
fd-pressured backend reopened the fd immediately?

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-11-01 Thread Merlin Moncure
On Mon, Oct 31, 2016 at 10:32 AM, Oskari Saarenmaa <o...@ohmu.fi> wrote:
> 27.10.2016, 21:53, Merlin Moncure kirjoitti:
>>
>> As noted earlier, I was not able to reproduce the issue with
>> crashme.sh, which was:
>>
>> NUM_FORKS=16
>> do_parallel psql -p 5432  -c"select PushMarketSample('1740')"
>> castaging_test
>> do_parallel psql -p 5432  -c"select PushMarketSample('4400')"
>> castaging_test
>> do_parallel psql -p 5432  -c"select PushMarketSample('2160')"
>> castaging_test
>> do_parallel psql -p 5432  -c"select PushMarketSample('6680')"
>> castaging_test
>> 
>>
>> (do_parallel is simple wrapper to executing the command in parallel up
>> to NUM_FORKS).   This is on the same server and cluster as above.
>> This kind of suggests that either
>> A) there is some concurrent activity from another process that is
>> tripping the issue
>> or
>> B) there is something particular to the session invoking the function
>> that is participating in the problem.  As the application is
>> structured, a single threaded node.js app is issuing the query that is
>> high traffic and long lived.  It's still running in fact and I'm kind
>> of tempted to find some downtime to see if I can still reproduce via
>> the UI.
>
> Your production system's postgres backends probably have a lot more open
> files associated with them than the simple test case does.  Since Postgres
> likes to keep files open as long as possible and only closes them when you
> need to free up fds to open new files, it's possible that your production
> backends have almost all allowed fds used when you execute your pl/sh
> function.
>
> If that's the case, the sqsh process that's executed may not have enough fds
> to do what it wanted to do and because of busted error handling could end up
> writing to fds that were opened by Postgres and point to $PGDATA files.

Does that apply?  The mechanics are a sqsh function that basically does:
cat foo.sql  | sqsh 

pipe redirection opens a new process, right?

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-28 Thread Merlin Moncure
On Fri, Oct 28, 2016 at 3:16 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> On 10/28/16 8:23 AM, Merlin Moncure wrote:
>>
>> On Thu, Oct 27, 2016 at 6:39 PM, Greg Stark <st...@mit.edu> wrote:
>>>
>>> On Thu, Oct 27, 2016 at 9:53 PM, Merlin Moncure <mmonc...@gmail.com>
>>> wrote:
>>>>
>>>> I think we can rule out faulty storage
>>>
>>>
>>> Nobody ever expects the faulty storage
>
>
> LOL
>
>> Believe me, I know.  But the evidence points elsewhere in this case;
>> this is clearly application driven.
>
>
> FWIW, just because it's triggered by specific application behavior doesn't
> mean there isn't a storage bug. That's what makes data corruption bugs such
> a joy to figure out.
>
> BTW, if you haven't already, I would reset all your storage related options
> and GUCs to safe defaults... plain old FSYNC, no cute journal / FS / mount
> options, etc. Maybe this is related to the app, but the most helpful thing
> right now is to find some kind of safe config so you can start bisecting.

Upthread, you might have noticed that I already did that.   Here is
the other evidence:
*) server running fine for 5+ years
*) other database on same cluster not impacted with 10x write activity
*) no interesting logs reported in  /var/log/messages, dmesg, etc
*) san fabric turns over petabytes/day with no corruption. 100+
postgres clusters, 1000+ sql server clusters (and that's not
production)
*) storage/network teams have been through everything. nothing
interesting/unusual to report
*) we have an infrequently run routine (posted upthread) that, when
run, crashed the database within minutes
*) after turning on checksums, 30% of invocations of routine resulted
in checksum error
*) problem re-occurred after dump-restore and full cluster rebuild
*) checksum error caused routine rollback.  FWICT this prevented the damage
*) everything is fine now that routine is not being run anymore

You can come up with your own conclusion; I've come up with mine.  The
only frustrating thing here is that I can't reproduce outside of the
production environment.  If this database goes down I have 30 people
sitting around, so I can't take downtime lightly.

> I would also consider alternatives to plsh, just to rule it out if nothing
> else. I'd certainly look at some way to get sqsh out of the loop (again,
> just to get something that doesn't crash). First idea that comes to mind is
> a stand-alone shell script that watches a named pipe for a filename; when it
> gets that file it runs it with sqsh and does something to signal completion.

I do a lot of etl to/from sql server and it's all sqsh based.  If I
can figure out how to reproduce this in a better way, I'll zero in on
the problem in about 10 minutes.

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-28 Thread Merlin Moncure
On Thu, Oct 27, 2016 at 6:39 PM, Greg Stark <st...@mit.edu> wrote:
> On Thu, Oct 27, 2016 at 9:53 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> I think we can rule out faulty storage
>
> Nobody ever expects the faulty storage

Believe me, I know.  But the evidence points elsewhere in this case;
this is clearly application driven.

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-27 Thread Merlin Moncure
On Thu, Oct 27, 2016 at 2:31 AM, Ants Aasma <ants.aa...@eesti.ee> wrote:
> On Wed, Oct 26, 2016 at 8:43 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> /var/lib/pgsql/9.5/data/pg_log/postgresql-26.log | grep "page
>> verification"
>> 2016-10-26 11:26:42 CDT [postgres@castaging]: WARNING:  page
>> verification failed, calculated checksum 37251 but expected 37244
>> 2016-10-26 11:27:55 CDT [postgres@castaging]: WARNING:  page
>> verification failed, calculated checksum 37249 but expected 37244
>> 2016-10-26 12:16:44 CDT [postgres@castaging]: WARNING:  page
>> verification failed, calculated checksum 44363 but expected 44364
>> 2016-10-26 12:18:58 CDT [postgres@castaging]: WARNING:  page
>> verification failed, calculated checksum 49525 but expected 49539
>> 2016-10-26 12:19:12 CDT [postgres@castaging]: WARNING:  page
>> verification failed, calculated checksum 37345 but expected 37340
>
> The checksum values are improbably close. The checksum algorithm has
> decently good mixing of all bits in the page. Having the first byte
> match in 5 checksums makes this 1:2^40 improbable. What is not mixed
> in properly is the block number, it only gets xor'ed before packing
> the value into 16bits using modulo 0xFFFF. So I'm pretty sure
> different block numbers were used for writing out and reading in the
> page. Either the blocknum gets corrupted between calculating the
> checksum and writing the page out (unlikely given the proximity), or
> the pages are somehow getting transposed in the storage.

I think we can rule out faulty storage, and I'm reasonably sure
nothing is writing to the database cluster except postgres itself.

So far I have not executed an emergency dump/restore on this database,
nor have I received any other log entries suggesting problems of any
kind.  Meaning, FWICT, the database is completely healthy
notwithstanding the reported errors.

This suggests (but does not prove) that the syscatalog-damaging
activity happened during execution of the attached function and
healed itself during the rollback.  Prior to turning on checksums,
the transaction did *not* roll back, so the damage was allowed to
stand and probably accumulated.
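
Ants's point above is easy to see in miniature: the block number is folded
into the checksum only at the final packing step, so identical page contents
written under nearby block numbers land on nearby 16-bit values.  A sketch of
that final step (the raw hash value here is hypothetical):

```python
# Sketch of the final packing step of PostgreSQL's page checksum
# (src/include/storage/checksum_impl.h).  The 32-bit hash of the page
# contents is computed first; the block number is only XORed in at the
# end, before reduction to 16 bits.
def pack_checksum(raw_hash: int, blkno: int) -> int:
    return ((raw_hash ^ blkno) % 65535) + 1

# Hypothetical hash of identical page contents written twice:
raw = 0x9B3C1A40
a = pack_checksum(raw, 10)   # e.g. the page written as block 10 ...
b = pack_checksum(raw, 12)   # ... but read back as block 12
# The two 16-bit checksums differ only where the XORed block numbers
# differ, so they land improbably close together -- exactly the
# pattern in the log above.
```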

As noted earlier, I was not able to reproduce the issue with
crashme.sh, which was:

NUM_FORKS=16
do_parallel psql -p 5432  -c"select PushMarketSample('1740')" castaging_test
do_parallel psql -p 5432  -c"select PushMarketSample('4400')" castaging_test
do_parallel psql -p 5432  -c"select PushMarketSample('2160')" castaging_test
do_parallel psql -p 5432  -c"select PushMarketSample('6680')" castaging_test


(do_parallel is a simple wrapper that executes the command in parallel,
up to NUM_FORKS copies).   This is on the same server and cluster as above.
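
do_parallel itself isn't shown; a minimal stand-in, assuming it does nothing
more than launch up to NUM_FORKS copies of a command and wait for the exit
codes, might look like:

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor

NUM_FORKS = 16

def do_parallel(cmd: str, copies: int = NUM_FORKS) -> list:
    """Run `copies` concurrent instances of a shell command; collect exit codes."""
    argv = shlex.split(cmd)
    with ThreadPoolExecutor(max_workers=copies) as pool:
        futures = [pool.submit(subprocess.run, argv) for _ in range(copies)]
        return [f.result().returncode for f in futures]
```
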
This kind of suggests that either:
A) there is some concurrent activity from another process that is
tripping the issue, or
B) there is something particular to the session invoking the function
that is participating in the problem.  As the application is
structured, a single-threaded node.js app issues the query, which is
high-traffic and long-lived.  It's still running, in fact, and I'm
tempted to find some downtime to see if I can still reproduce via
the UI.

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-26 Thread Merlin Moncure
On Wed, Oct 26, 2016 at 3:06 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> On 10/26/16 2:25 PM, Merlin Moncure wrote:
>>
>> I don't think that's the case.  sqsh is a psql-like utility.  It
>> writes to stdout and stderr only, which is captured by plsh and sent.
>> In this context shexec only wraps rm -f 'file' where 'file' is a file
>> previously created with COPY in the same transaction.
>
> How do you know that? It could potentially be doing anything it wanted with
> file handles. Removing the exec might "solve" the problem here, assuming
> that the forked process doesn't still inherit all open FH's.
>
> In a nutshell, running arbitrary shell code (let alone arbitrary
> executables) in a pl that's running inside the Postgres backend sounds
> insanely risky to me.

Well, it is insanely risky, but the code in this context is not
arbitrary; it's only:
PERFORM shexec('rm -f ' || _OutputFile);

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-26 Thread Merlin Moncure
On Wed, Oct 26, 2016 at 2:12 PM, Andres Freund <and...@anarazel.de> wrote:
> On 2016-10-26 13:49:12 -0500, Merlin Moncure wrote:
>> On Wed, Oct 26, 2016 at 1:45 PM, Andres Freund <and...@anarazel.de> wrote:
>> >
>> >
>> > On October 26, 2016 9:38:49 PM GMT+03:00, Merlin Moncure 
>> > <mmonc...@gmail.com> wrote:
>> >>On Wed, Oct 26, 2016 at 1:34 PM, Andres Freund <and...@anarazel.de>
>> >>wrote:
>> >>> Any chance that plsh or the script it executes does anything with the
>> >>file descriptors it inherits? That'd certainly one way to get into odd
>> >>corruption issues.
>> >>
>> >>not sure.  it's pretty small -- see
>> >>https://github.com/petere/plsh/blob/master/plsh.c
>> >
>> > Afaics that could also be in your script, not just plsh. The latter doesn't
>> > seem to close all file handles above stderr, which means that all handles 
>> > for relations etc would be open in your script.  If you e.g. do any unusual
>> > redirections (2>&17 or such), that could end badly.   But I'm just on my 
>> > phone, in a taxi without seatbelts, at 60mph, so I didn't look carefully.
>>
>> gotcha :-).  see above:
>> *) sqshf:
>> #!/bin/bash
>> cat \
>>   $2 \
>>   | eval "sqsh $1 -L'datetime=%Y-%m-%d %H:%M:%S.%u' -G 7.0"
>>
>> echo "Success"
>>
>> *) shexec:
>> #!/bin/bash
>>
>> eval $1
>>
>> FWICT that's all that's happening here with respect to pl/sh.
>
> My point is that that doesn't mean anything. Whatever sqsh is, or
> whatever $1 eval's to (and $2 for that matter), could access the
> filehandles the backend has opened.

I don't think that's the case.  sqsh is a psql-like utility.  It
writes to stdout and stderr only, which is captured by plsh and sent.
In this context shexec only wraps rm -f 'file' where 'file' is a file
previously created with COPY in the same transaction.

In a test environment, spamming this routine with 16 threads, I was not
able to reproduce it.  Still hunting...

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-26 Thread Merlin Moncure
On Wed, Oct 26, 2016 at 1:45 PM, Andres Freund <and...@anarazel.de> wrote:
>
>
> On October 26, 2016 9:38:49 PM GMT+03:00, Merlin Moncure <mmonc...@gmail.com> 
> wrote:
>>On Wed, Oct 26, 2016 at 1:34 PM, Andres Freund <and...@anarazel.de>
>>wrote:
>>> Any chance that plsh or the script it executes does anything with the
>>file descriptors it inherits? That'd certainly one way to get into odd
>>corruption issues.
>>
>>not sure.  it's pretty small -- see
>>https://github.com/petere/plsh/blob/master/plsh.c
>
> Afaics that could also be in your script, not just plsh. The latter doesn't
> seem to close all file handles above stderr, which means that all handles for 
> relations etc would be open in your script.  If you e.g. do any unusual
> redirections (2>&17 or such), that could end badly.   But I'm just on my 
> phone, in a taxi without seatbelts, at 60mph, so I didn't look carefully.

gotcha :-).  see above:
*) sqshf:
#!/bin/bash
cat \
  $2 \
  | eval "sqsh $1 -L'datetime=%Y-%m-%d %H:%M:%S.%u' -G 7.0"

echo "Success"

*) shexec:
#!/bin/bash

eval $1

FWICT that's all that's happening here with respect to pl/sh. I'm
almost done with reproduction environment and if i get the issue I
should be able to zero in on the problem.  One final thing is that
concurrency is a contributing factor.
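
Andres's worry is checkable: a script forked from the backend inherits every
descriptor the backend has open unless it is marked close-on-exec.  A
hypothetical Linux-only probe (and defensive close) that a pl/sh-style
script could run, sketched in Python for brevity:

```python
import os

def inherited_fds():
    """Descriptors open in this process (Linux exposes them in /proc/self/fd)."""
    return sorted(int(fd) for fd in os.listdir("/proc/self/fd"))

def close_above_stderr(limit: int = 1024):
    # Defensively close everything above fd 2, so a stray redirection
    # like 2>&17 cannot write into a relation file the backend left open.
    os.closerange(3, limit)
```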

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-26 Thread Merlin Moncure
On Wed, Oct 26, 2016 at 1:34 PM, Andres Freund  wrote:
> Any chance that plsh or the script it executes does anything with the file 
> descriptors it inherits? That'd certainly one way to get into odd corruption 
> issues.

not sure.  it's pretty small -- see
https://github.com/petere/plsh/blob/master/plsh.c

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-26 Thread Merlin Moncure
On Wed, Oct 26, 2016 at 1:09 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmonc...@gmail.com> writes:
>> *) I've now strongly correlated this routine with the damage.
>
> Hmm.  Do you have any way to replace the non-core calls with something
> else?  The "shexec('rm -f ' || _OutputFile)" bits could presumably be
> converted to use contrib/adminpack's pg_file_unlink(), or an equivalent
> one-liner in plperlu or plpythonu.  I don't know what the sqshf
> incantation does, though.
>
> This wouldn't be about fixing it so much as narrowing down where the
> problem is.

Will do. Before you spend a lot of time, let me get full confirmation
that this is the source of the issue and I'll bake a reproduction
script (there is still a possibility that something *else* caused it
and by happenstance we caught it here).

For posterity,  sqshf does:

CREATE OR REPLACE FUNCTION sqshf(
  ConnectionString TEXT,
  FileName TEXT) RETURNS TEXT AS
$$#!/bin/bash

cat \
  $2 \
  | eval "sqsh $1 -L'datetime=%Y-%m-%d %H:%M:%S.%u' -G 7.0"

echo "Success"

$$ LANGUAGE plsh;

sqsh being the FreeTDS-based Linux console client for SQL Server.  So
basically not much.

shexec does:

CREATE OR REPLACE FUNCTION shexec(_cmd TEXT) RETURNS TEXT AS
$$#!/bin/bash

eval $1

$$ LANGUAGE plsh;

:-)

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-26 Thread Merlin Moncure
On Wed, Oct 26, 2016 at 12:43 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Wed, Oct 26, 2016 at 11:35 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> On Tue, Oct 25, 2016 at 3:08 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>>> Confirmation of problem re-occurrence will come in a few days.  I'm
>>> much more likely to believe 6+sigma occurrence (storage, freak bug,
>>> etc) should it prove the problem goes away post rebuild.
>>
>> ok, no major reported outage yet, but just got:
>>
>> 2016-10-26 11:27:55 CDT [postgres@castaging]: ERROR:  invalid page in
>> block 12 of relation base/203883/1259

*) I've now strongly correlated this routine with the damage.

[root@rcdylsdbmpf001 ~]# cat
/var/lib/pgsql/9.5/data/pg_log/postgresql-26.log  | grep -i
pushmarketsample | head -5
2016-10-26 11:26:27 CDT [postgres@castaging]: LOG:  execute :
SELECT PushMarketSample($1::TEXT) AS published
2016-10-26 11:26:40 CDT [postgres@castaging]: LOG:  execute :
SELECT PushMarketSample($1::TEXT) AS published
PL/pgSQL function pushmarketsample(text,date,integer) line 103 at SQL statement
PL/pgSQL function pushmarketsample(text,date,integer) line 103 at SQL statement
2016-10-26 11:26:42 CDT [postgres@castaging]: STATEMENT:  SELECT
PushMarketSample($1::TEXT) AS published

*) First invocation was 11:26:27 CDT

*) Second invocation was 11:26:40 and gave checksum error (as noted
earlier 11:26:42)

*) Routine attached (if interested)

My next step is to set up a test environment and hammer this routine
aggressively to see what happens.

merlin
/*
 * Send sample for a market to the pubs database.  It will delete existing 
 * sample and replace it with whatever is determined to be the best available
 * sample for each apartment in the market.
 *
 * XXX: assume 'asof' defines the period for which we are replacing data. 
 * Should that assumption be invalidated, we will need to accept an argument
 * overriding the period.
 *
 * XXX: AsOf controls the extraction date, but if a property's 'good' sample
 * is after asof, no sample is pulled for the property.
 */
SELECT DropFunction('PushMarketSample');

CREATE OR REPLACE FUNCTION PushMarketSample(
  _MarketID TEXT,
  _AsOf DATE DEFAULT now()::DATE,
  _MaxSampleDays INT DEFAULT 90,
  Result OUT TEXT) RETURNS TEXT AS
$$
DECLARE
  _OutputFile TEXT;
BEGIN
  _OutputFile := format('/tmp/apartment_sample_%s_%s.sql',
_MarketId,
to_char(_AsOf, 'MMDD'));

  DROP TABLE IF EXISTS AptSample;

  CREATE TEMP TABLE AptSample ON COMMIT DROP AS 
SELECT 
  ApartmentId,
  FloorplanNumber,
  msasubmkt AS SubmarketId,
  (SELECT Period FROM DataEntryPeriod) AS Sampled,
  Vacant,
  DiscountedLowRent,
  DiscountedHighRent,
  LowRent,
  HighRent,
  LowDiscountRate,
  HighDiscountRate,
  Remarks, /* Is there a spot for remarks in MPF? */
  TotalUnits,
  UnitCount,
  IncludeInAggregations,
  Rent,
  Occupancy
FROM vw_ApartmentSample aq
WHERE 
  Current
  AND Sampled::DATE BETWEEN (_AsOf - _MaxSampleDays) AND _AsOf
  AND _MarketID = msa
  AND DataSource = 'Survey';

  /* XXX: write out the sql script because we are stupidly prohibited from 
   * using pl/pgsql variables embedded into sql statements when using COPY.
   */
  DROP TABLE IF EXISTS Script;
  CREATE TEMP TABLE Script ON COMMIT DROP AS 
SELECT 'begin transaction'
UNION ALL SELECT 
  format(
'DELETE tblAptDet '
'FROM tblAptDet d '
'JOIN tblApt a ON a.id = d.id '
'WHERE '
'  a.msa = %s '
'  AND d.Period = %s',
quote_literal(_MarketID),
(SELECT Period FROM DataEntryPeriod))
UNION ALL SELECT format(
  'INSERT INTO tblaptdet('
'Id, PlanNumber, Period, Vacant, DLoRate, DHiRate, '
'FLoRate, FHiRate, LoDiscount, HiDiscount, Remarks, Quantity, '
'UpDtoHistory) '
  'VALUES('
'%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)',
quote_nullable(ApartmentId),
quote_nullable(FloorplanNumber),
quote_nullable(Sampled),
quote_nullable(Vacant),
quote_nullable(DiscountedLowRent),
quote_nullable(DiscountedHighRent),
quote_nullable(LowRent),
quote_nullable(HighRent),
quote_nullable(LowDiscountRate),
quote_nullable(HighDiscountRate),
quote_nullable(Remarks),
quote_nullable(UnitCount),
CASE WHEN IncludeInAggregations THEN 1 ELSE 0 END)
FROM AptSample
UNION ALL SELECT 'commit transaction'
UNION ALL SELECT 'go';

  EXECUTE 'COPY Script TO ' || quote_literal(_OutputFile);

  /* push in the new sample! */
  SELECT sqshf(ConnectTo('MPFUI'), _OutputFile) INTO Result;

  PERFORM shexec('rm -f ' || _OutputFile);

  DELETE FROM MarketPublishTracking WHERE 
MarketId = _MarketID
AND Period =(SELECT Period FROM DataEntryPeriod);

  DELETE FROM Submarket

Re: [HACKERS] emergency outage requiring database restart

2016-10-26 Thread Merlin Moncure
On Wed, Oct 26, 2016 at 11:35 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Tue, Oct 25, 2016 at 3:08 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> Confirmation of problem re-occurrence will come in a few days.  I'm
>> much more likely to believe 6+sigma occurrence (storage, freak bug,
>> etc) should it prove the problem goes away post rebuild.
>
> ok, no major reported outage yet, but just got:
>
> 2016-10-26 11:27:55 CDT [postgres@castaging]: ERROR:  invalid page in
> block 12 of relation base/203883/1259


*) still on 9.5.2 :(  not in control of the package update process

*) getting more of these:
[root@rcdylsdbmpf001 data]# cat
/var/lib/pgsql/9.5/data_checksum/pg_log/postgresql-26.log  | grep
"invalid page"
2016-10-26 11:26:42 CDT [postgres@castaging]: ERROR:  invalid page in
block 10 of relation base/203883/1259
2016-10-26 11:27:55 CDT [postgres@castaging]: ERROR:  invalid page in
block 12 of relation base/203883/1259
2016-10-26 12:16:44 CDT [postgres@castaging]: ERROR:  invalid page in
block 13 of relation base/203883/1259
2016-10-26 12:18:58 CDT [postgres@castaging]: ERROR:  invalid page in
block 15 of relation base/203883/1259
2016-10-26 12:19:12 CDT [postgres@castaging]: ERROR:  invalid page in
block 7 of relation base/203883/2662

castaging=# select relname from pg_class where oid in(1259, 2662);
      relname
--------------------
 pg_class_oid_index
 pg_class

*) only one database in the cluster is damaged.   I do not suspect
storage at this point

*) Currently I can execute pg_dump with no problems.   So far impact
is not severe but soon I have to do dump/restore

*) age(xmin) from pg_class gives all reasonable values (approx 757k)
that aren't 2147483647

*) All cases of the error occur in statements executing inside a pl/pgsql
routine that does the following:
BEGIN
  ...

  DROP TABLE IF EXISTS foo;
  CREATE TEMP TABLE foo ON COMMIT DROP AS ...
  SELECT  ... INTO Result;

  DROP TABLE IF EXISTS bar;
  CREATE TEMP TABLE bar ON COMMIT DROP AS ...

  PERFORM ...;

EXCEPTION
  WHEN OTHERS THEN
    PERFORM ...;
    RAISE;
END;

*) page verification miss is numerically close
[root@rcdylsdbmpf001 ~]# cat
/var/lib/pgsql/9.5/data/pg_log/postgresql-26.log | grep "page
verification"
2016-10-26 11:26:42 CDT [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 37251 but expected 37244
2016-10-26 11:27:55 CDT [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 37249 but expected 37244
2016-10-26 12:16:44 CDT [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 44363 but expected 44364
2016-10-26 12:18:58 CDT [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 49525 but expected 49539
2016-10-26 12:19:12 CDT [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 37345 but expected 37340

*) sample log leading into the first error (all queries are logged)
2016-10-26 11:26:40 CDT [postgres@castaging]: LOG:  execute :
SELECT PushMarketSample($1::TEXT) AS published
2016-10-26 11:26:40 CDT [postgres@castaging]: DETAIL:  parameters: $1 = '8840'
2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG:  statement: START
TRANSACTION ISOLATION LEVEL REPEATABLE READ
2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG:  statement: SAVEPOINT s2
2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG:  execute :
DECLARE c1 CURSOR FOR
SELECT period FROM public.dataentryperiod
2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG:  statement: FETCH 100 FROM c1
2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG:  statement: CLOSE c1
2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG:  execute :
DECLARE c2 CURSOR FOR
SELECT period FROM public.dataentryperiod
2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG:  statement: FETCH 100 FROM c2
2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG:  statement: CLOSE c2
2016-10-26 11:26:41 CDT [rms@mpf2]: ERROR:  relation "tblagent" does
not exist at character 15
2016-10-26 11:26:41 CDT [rms@mpf2]: STATEMENT:  select * from tblagent
2016-10-26 11:26:41 CDT [rms@mpf2]: LOG:  execute : SELECT
CURRENT_SCHEMA()
2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG:  execute :
DECLARE c3 CURSOR FOR
SELECT period FROM public.dataentryperiod
2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG:  statement: FETCH 100 FROM c3
2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG:  statement: CLOSE c3
2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG:  execute :
DECLARE c4 CURSOR FOR
SELECT period FROM public.dataentryperiod
2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG:  statement: FETCH 100 FROM c4
2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG:  statement: CLOSE c4
2016-10-26 11:26:42 CDT [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 37251 but expected 37244
2016-10-26 11:26:42 CDT [postgres@castaging]: CONTEXT:  SQL statement
"CREATE TEMP TABLE ApartmentPublishBySubmkt ON COMMIT DROP AS
  SELECT
  

Re: [HACKERS] emergency outage requiring database restart

2016-10-26 Thread Merlin Moncure
On Tue, Oct 25, 2016 at 3:08 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> Confirmation of problem re-occurrence will come in a few days.  I'm
> much more likely to believe 6+sigma occurrence (storage, freak bug,
> etc) should it prove the problem goes away post rebuild.

ok, no major reported outage yet, but just got:

2016-10-26 11:27:55 CDT [postgres@castaging]: ERROR:  invalid page in
block 12 of relation base/203883/1259

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-25 Thread Merlin Moncure
On Tue, Oct 25, 2016 at 2:31 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmonc...@gmail.com> writes:
>> What if the subsequent dataloss was in fact a symptom of the first
>> outage?  Is it in theory possible for data to appear visible but then be
>> eaten up as the transactions making the data visible get voided out by
>> some other mechanic?  I had to pull a quick restart the first time and
>> everything looked ok -- or so I thought.   What I think was actually
>> happening is that data started to slip into the void.   It's like
>> randomly sys catalogs were dropping off. I bet other data was, too.  I
>> can pull older backups and verify that.  It's as if some creeping xmin
>> was snuffing everything out.
>
> Might be interesting to look at age(xmin) in a few different system
> catalogs.  I think you can ignore entries with age = 2147483647;
> those should be frozen rows.  But if you see entries with very large
> ages that are not that, it'd be suspicious.

nothing really stands out.

The damage did re-occur after a dump/restore -- not sure about a
cluster-level rebuild.  No problems prior to that.  This suggests
that, if this theory holds, the damage would have to have been below
the database level -- perhaps in clog.  Maybe hint bits and clog did
not agree on commit or abort status, for example.  clog has plenty
of history leading past the problem barrier:
-rwx-- 1 postgres postgres 256K Jul 10 16:21 
-rwx-- 1 postgres postgres 256K Jul 21 12:39 0001
-rwx-- 1 postgres postgres 256K Jul 21 13:19 0002
-rwx-- 1 postgres postgres 256K Jul 21 13:59 0003
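
For scale, each 256K clog segment above covers roughly a million
transactions: 2 status bits per xid, so 4 xids per byte, with 8K pages and
32 pages per segment.  A sketch of locating a given xid's status bits,
assuming the default 8K block size:

```python
BLCKSZ = 8192
CLOG_XACTS_PER_BYTE = 4                              # 2 status bits per xid
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE   # 32768
SLRU_PAGES_PER_SEGMENT = 32                          # 32 * 8K = 256K files

def clog_location(xid: int):
    """(segment file name, byte offset within it, bit shift) for a xid."""
    page = xid // CLOG_XACTS_PER_PAGE
    segment = page // SLRU_PAGES_PER_SEGMENT
    byte_in_seg = (page % SLRU_PAGES_PER_SEGMENT) * BLCKSZ \
        + (xid % CLOG_XACTS_PER_PAGE) // CLOG_XACTS_PER_BYTE
    shift = (xid % CLOG_XACTS_PER_BYTE) * 2
    return "%04X" % segment, byte_in_seg, shift

# The ~757k xid ages seen in pg_class all land in the first 256K segment.
```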


Confirmation of problem re-occurrence will come in a few days.  I'm
much more likely to believe 6+sigma occurrence (storage, freak bug,
etc) should it prove the problem goes away post rebuild.

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-25 Thread Merlin Moncure
On Tue, Oct 25, 2016 at 12:57 PM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:
> Merlin Moncure wrote:
>
>> After last night, I rebuilt the cluster, turning on checksums, turning
>> on synchronous commit (it was off) and added a standby replica.  This
>> should help narrow the problem down should it re-occur; if storage is
>> bad (note, other database on same machine is doing 10x write activity
>> and is fine) or something is scribbling on shared memory (my guess
>> here)  then checksums should be popped, right?
>
> Not really sure about that.  As I recall we compute the CRC on the
> buffer's way out, based on the then-current contents, so if something
> scribbles on the buffer while it's waiting to be evicted, the CRC
> computation would include the new (corrupted) bytes rather than the
> original ones -- see FlushBuffer.

Huh. I have a new theory on this.  Dealing with the reconstituted
database, I'm finding more things -- functions and such -- that are
simply gone and had to be rebuilt; they escaped notice as they were
not in primary code paths.  Recall that the original outage
manifested as queries getting stuck, possibly on a spinlock (we don't
know for sure).  After that, things started to randomly disappear,
possibly from system catalogs (but now need to go back and verify
older data, I think).  There were three autovac processes running.

What if the subsequent dataloss was in fact a symptom of the first
outage?  Is it in theory possible for data to appear visible but then be
eaten up as the transactions making the data visible get voided out by
some other mechanic?  I had to pull a quick restart the first time and
everything looked ok -- or so I thought.   What I think was actually
happening is that data started to slip into the void.   It's like
randomly sys catalogs were dropping off. I bet other data was, too.  I
can pull older backups and verify that.  It's as if some creeping xmin
was snuffing everything out.

The confirmation of this should be obvious -- if that's indeed the
case, the backup and restored cluster should no longer present data
loss. Given that I was getting that every 1-2 days, we should be able
to figure that out pretty soon.
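
A creeping xmin would show up as implausible age(xmin) values, since xid
comparison is circular 32-bit arithmetic; a simplified sketch that ignores
frozen rows and the reserved special xids:

```python
def xid_age(current_xid: int, xmin: int) -> int:
    """Roughly what age(xmin) reports: circular distance from xmin forward
    to the current xid, in 32-bit transaction-id space."""
    return (current_xid - xmin) % 2**32

# Rows whose xmin sits "in the future" of the current xid show enormous ages:
assert xid_age(1000, 900) == 100                 # a recent, sane row
assert xid_age(1000, 2000) == 2**32 - 1000       # suspicious near-wraparound
```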

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-25 Thread Merlin Moncure
On Mon, Oct 24, 2016 at 9:18 PM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:
> Merlin Moncure wrote:
>> On Mon, Oct 24, 2016 at 6:01 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>
>> > Corruption struck again.
>> > This time got another case of view busted -- attempting to create
>> > gives missing 'type' error.
>>
>> Call it a hunch -- I think the problem is in pl/sh.
>
> I've heard that before.

well, yeah, previously I had an issue where the database crashed
during a heavy concurrent pl/sh based load.   However the problems
went away when I refactored the code.   Anyways, I looked at the code
and couldn't see anything obviously wrong so who knows?  All I know is
my production database is exploding continuously and I'm looking for
answers.  The only other extension in heavy use on this servers is
postgres_fdw.

The other database on the cluster is fine, which kind of suggests we
are not facing clog or WAL type problems.

After last night, I rebuilt the cluster, turning on checksums, turning
on synchronous commit (it was off) and added a standby replica.  This
should help narrow the problem down should it re-occur; if storage is
bad (note, other database on same machine is doing 10x write activity
and is fine) or something is scribbling on shared memory (my guess
here)  then checksums should be popped, right?

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-24 Thread Merlin Moncure
On Mon, Oct 24, 2016 at 6:01 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Thu, Oct 20, 2016 at 1:52 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> On Wed, Oct 19, 2016 at 2:39 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>>> On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian <br...@momjian.us> wrote:
>>>> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote:
>>>>> > Yeah.  Believe me -- I know the drill.  Most or all the damage seemed
>>>>> > to be to the system catalogs with at least two critical tables dropped
>>>>> > or inaccessible in some fashion.  A lot of the OIDs seemed to be
>>>>> > pointing at the wrong thing.  Couple more datapoints here.
>>>>> >
>>>>> > *) This database is OLTP, doing ~ 20 tps avg (but very bursty)
>>>>> > *) Another database on the same cluster was not impacted.  However
>>>>> > it's more olap style and may not have been written to during the
>>>>> > outage
>>>>> >
>>>>> > Now, this infrastructure running this system is running maybe 100ish
>>>>> > postgres clusters and maybe 1000ish sql server instances with
>>>>> > approximately zero unexplained data corruption issues in the 5 years
>>>>> > I've been here.  Having said that, this definitely smells and feels
>>>>> > like something on the infrastructure side.  I'll follow up if I have
>>>>> > any useful info.
>>>>>
>>>>> After a thorough investigation I now have credible evidence the source
>>>>> of the damage did not originate from the database itself.
>>>>> Specifically, this database is mounted on the same volume as the
>>>>> operating system (I know, I know) and something non database driven
>>>>> sucked up disk space very rapidly and exhausted the volume -- fast
>>>>> enough that sar didn't pick it up.  Oh well :-) -- thanks for the help
>>>>
>>>> However, disk space exhaustion should not lead to corruption unless the
>>>> underlying layers lied in some way.
>>>
>>> I agree -- however I'm sufficiently separated from the things doing
>>> the things that I can't verify that in any real way.   In the meantime
>>> I'm going to take standard precautions (enable checksums/dedicated
>>> volume/replication).  Low disk space also does not explain the bizarre
>>> outage I had last friday.
>>
>> ok, data corruption struck again.  This time disk space is ruled out,
>> and access to the database is completely denied:
>> postgres=# \c castaging
>> WARNING:  leaking still-referenced relcache entry for
>> "pg_index_indexrelid_index"
>
> Corruption struck again.
> This time got another case of view busted -- attempting to create
> gives missing 'type' error.

Call it a hunch -- I think the problem is in pl/sh.

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-24 Thread Merlin Moncure
On Thu, Oct 20, 2016 at 1:52 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Wed, Oct 19, 2016 at 2:39 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian <br...@momjian.us> wrote:
>>> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote:
>>>> > Yeah.  Believe me -- I know the drill.  Most or all the damage seemed
>>>> > to be to the system catalogs with at least two critical tables dropped
>>>> > or inaccessible in some fashion.  A lot of the OIDs seemed to be
>>>> > pointing at the wrong thing.  Couple more datapoints here.
>>>> >
>>>> > *) This database is OLTP, doing ~ 20 tps avg (but very bursty)
>>>> > *) Another database on the same cluster was not impacted.  However
>>>> > it's more olap style and may not have been written to during the
>>>> > outage
>>>> >
>>>> > Now, this infrastructure running this system is running maybe 100ish
>>>> > postgres clusters and maybe 1000ish sql server instances with
>>>> > approximately zero unexplained data corruption issues in the 5 years
>>>> > I've been here.  Having said that, this definitely smells and feels
>>>> > like something on the infrastructure side.  I'll follow up if I have
>>>> > any useful info.
>>>>
>>>> After a thorough investigation I now have credible evidence the source
>>>> of the damage did not originate from the database itself.
>>>> Specifically, this database is mounted on the same volume as the
>>>> operating system (I know, I know) and something non database driven
>>>> sucked up disk space very rapidly and exhausted the volume -- fast
>>>> enough that sar didn't pick it up.  Oh well :-) -- thanks for the help
>>>
>>> However, disk space exhaustion should not lead to corruption unless the
>>> underlying layers lied in some way.
>>
>> I agree -- however I'm sufficiently separated from the things doing
>> the things that I can't verify that in any real way.   In the meantime
>> I'm going to take standard precautions (enable checksums/dedicated
>> volume/replication).  Low disk space also does not explain the bizarre
>> outage I had last friday.
>
> ok, data corruption struck again.  This time disk space is ruled out,
> and access to the database is completely denied:
> postgres=# \c castaging
> WARNING:  leaking still-referenced relcache entry for
> "pg_index_indexrelid_index"

Corruption struck again.
This time got another case of view busted -- attempting to create
gives missing 'type' error.

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-21 Thread Merlin Moncure
On Fri, Oct 21, 2016 at 1:37 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Fri, Oct 21, 2016 at 8:03 AM, Kevin Grittner <kgri...@gmail.com> wrote:
>> On Tue, Oct 18, 2016 at 8:45 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
>>
>>> Most or all the damage seemed to be to the system catalogs with
>>> at least two critical tables dropped or inaccessible in some
>>> fashion.  A lot of the OIDs seemed to be pointing at the wrong
>>> thing.
>>
>> While the oid in pg_class often matches the filename, that is not
>> true after some operations (like CLUSTER or VACUUM FULL).  It is
>> the relfilenode column that is the definitive link to the file.
>
> no such operations happened.  In the first instance at least one table
> dropped from the system catalogs.   I have a hunch that the heap is
> fine (supported by the size of the database on disk).   At this
> precise moment I'm restoring the database to another fileserver in
> order to do some forensic analysis, also in the hopes of getting the
> second database online in order to expedite recovery.
>
> ah -- done. :-)  deleting the init file didn't help, but starting up
> single user allowed the start up to gracefully fail with a FATAL cache
> lookup.

OK, I have some good (very, in the specific case of yours truly) news
to report.  Doing a filesystem-level copy to a test server, I was able
to swap the relfilenode of one of the critical tables over the
relfilenode of the restored backup.  Not knowing which file to copy
from, I identified the source node by judging the size and using the
'strings' utility.  Data recovery for that table at least appears to
be 100%.

For those following along, this simple process is only likely to work
easily if the table contains only system types; no user types, enums,
composites, etc., since those are assigned new OIDs on each restore.
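
For anyone attempting a similar rescue while the catalog is still
readable, the oid-to-file mapping can be read straight from pg_class
(the table name here is hypothetical); with a corrupted catalog you're
back to judging file sizes and running 'strings' as above:

```sql
-- Map a relation to its data file.  It is relfilenode, not oid, that
-- names the file on disk; the two diverge after CLUSTER / VACUUM FULL.
SELECT oid, relfilenode,
       pg_relation_filepath(oid) AS file_path
FROM pg_class
WHERE relname = 'some_critical_table';
```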

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-21 Thread Merlin Moncure
On Fri, Oct 21, 2016 at 8:03 AM, Kevin Grittner <kgri...@gmail.com> wrote:
> On Tue, Oct 18, 2016 at 8:45 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
>
>> Most or all the damage seemed to be to the system catalogs with
>> at least two critical tables dropped or inaccessible in some
>> fashion.  A lot of the OIDs seemed to be pointing at the wrong
>> thing.
>
> While the oid in pg_class often matches the filename, that is not
> true after some operations (like CLUSTER or VACUUM FULL).  It is
> the relfilenode column that is the definitive link to the file.

No such operations happened.  In the first instance at least one table
was dropped from the system catalogs.  I have a hunch that the heap is
fine (supported by the size of the database on disk).  At this precise
moment I'm restoring the database to another fileserver in order to do
some forensic analysis, also in the hopes of getting the second
database online to expedite recovery.

ah -- done. :-)  Deleting the init file didn't help, but starting up
single-user allowed startup to fail gracefully with a FATAL cache
lookup error.

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-20 Thread Merlin Moncure
On Thu, Oct 20, 2016 at 3:16 PM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:
> Merlin Moncure wrote:
>
>> single user mode dumps core :(
>>
>> bash-4.1$ postgres --single -D /var/lib/pgsql/9.5/data castaging
>> LOG:  0: could not change directory to "/root": Permission denied
>> LOCATION:  resolve_symlinks, exec.c:293
>> Segmentation fault (core dumped)
>>
>> Core was generated by `postgres --single -D /var/lib/pgsql/9.5/data 
>> castaging'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x00797d6f in ?? ()
>> Missing separate debuginfos, use: debuginfo-install
>> postgresql95-server-9.5.2-1PGDG.rhel6.x86_64
>> (gdb) bt
>> #0  0x00797d6f in ?? ()
>> #1  0x0079acf1 in RelationCacheInitializePhase3 ()
>> #2  0x007b35c5 in InitPostgres ()
>> #3  0x006b9b53 in PostgresMain ()
>> #4  0x005f30fb in main ()
>
> Maybe
>   rm global/pg_internal.init
> and try again?

Will do when I can; I had to do an emergency restore plus some unfun
data reconstruction from the query log.

Notably, there is a much larger database in the same cluster which is
undamaged.  This server is new to production usage, maybe 2 months.

Here is contents of pg_extension
 plpgsql
 dblink
 hstore
 postgres_fdw
 plsh * not used
 pg_trgm * not used
 plr * not used
 tablefunc * not used
 adminpack * not used
 plpythonu * not used
 postgis * not used
 postgis_topology * not used

Short term plan is to separate the database into its own cluster and
enable replication and checksums.  All queries to this database are
logged.  Here are the contents of the log leading into and after the
crash:

oct 17 crash:
2016-10-17 12:12:24 CDT [rms@castaging]: DETAIL:  parameters: $1 =
'21121', $2 = '8', $3 = '2016-10-13', $4 = NULL, $5 = NULL, $6 = NULL,
$7 = NULL, $8 = NULL, $9 = NULL, $10 = NULL, $11 = 't', $12
2016-10-17 12:12:24 CDT [rms@castaging]: LOG:  execute :
SELECT NULL AS PROCEDURE_CAT, n.nspname AS PROCEDURE_SCHEM, p.proname
AS PROCEDURE_NAME, NULL, NULL, NULL, d.description AS REMARKS
2016-10-17 12:12:24 CDT [rms@castaging]: LOG:  execute :
SELECT n.nspname,p.proname,p.prorettype,p.proargtypes,
t.typtype,t.typrelid , p.proargnames, p.proargmodes, p.proallargtypes
, p.o
2016-10-17 12:12:24 CDT [rms@castaging]: LOG:  execute :
select * from checkin($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11,
$12) as result
2016-10-17 12:12:24 CDT [rms@castaging]: DETAIL:  parameters: $1 =
'114333', $2 = 'rrosillo', $3 = 'CALLER', $4 = 'Survey', $5 = 'Happy',
$6 = 'Callback', $7 = 'OTHER', $8 = '2016-10-18 01:05:00',
2016-10-17 12:12:24 CDT [rms@castaging]: LOG:  execute S_3: COMMIT
2016-10-17 12:12:25 CDT [@]: ERROR:  could not open relation with OID
1203933 <-- first sign of damage
2016-10-17 12:12:25 CDT [@]: CONTEXT:  automatic analyze of table
"castaging.public.apartment"

oct 20 crash:
2016-10-20 12:46:38 CDT [postgres@castaging]: LOG:  statement: SELECT
CallsByUser() AS byuser
2016-10-20 12:46:40 CDT [postgres@castaging]: LOG:  statement: SELECT
CallCenterOverviewJSON() AS overview
2016-10-20 12:46:41 CDT [postgres@castaging]: LOG:  statement: SELECT
CallCenterUserTrackingJSON() AS tracking
2016-10-20 12:46:41 CDT [postgres@castaging]: LOG:  statement: SELECT
MarketOverviewJSON() AS market
2016-10-20 12:46:42 CDT [postgres@castaging]: LOG:  execute :
SELECT SubMarketOverviewJSON($1::TEXT) AS submkt
2016-10-20 12:46:42 CDT [postgres@castaging]: DETAIL:  parameters: $1 = '640'
2016-10-20 12:46:44 CDT [postgres@castaging]: LOG:  statement: SELECT
CallsByUser() AS byuser
2016-10-20 12:46:46 CDT [postgres@castaging]: LOG:  statement: SELECT
CallCenterOverviewJSON() AS overview
2016-10-20 12:46:47 CDT [postgres@castaging]: LOG:  statement: SELECT
CallCenterUserTrackingJSON() AS tracking
2016-10-20 12:46:47 CDT [postgres@castaging]: ERROR:
"pg_description_o_c_o_index" is an index <-- first sign of damage
2016-10-20 12:46:47 CDT [postgres@castaging]: CONTEXT:  SQL function
"callcenterusertrackingjson" during startup
2016-10-20 12:46:47 CDT [postgres@castaging]: STATEMENT:  SELECT
CallCenterUserTrackingJSON() AS tracking
2016-10-20 12:46:47 CDT [postgres@castaging]: WARNING:  leaking
still-referenced relcache entry for "pg_class_oid_index"

CallCenterUserTrackingJSON() and friends are not particularly
interesting except that they make use of json_agg().  They were also
called basically all day long at 5-second intervals.  I guess this
isn't saying very much, but I'm starting to smell a rat here.

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-20 Thread Merlin Moncure
On Thu, Oct 20, 2016 at 2:07 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmonc...@gmail.com> writes:
>> single user mode dumps core :(
>
> You've got a mess there :-(
>
>> Missing separate debuginfos, use: debuginfo-install
>> postgresql95-server-9.5.2-1PGDG.rhel6.x86_64
>
> This backtrace would likely be much more informative if you did the above.

can't; don't have the package unfortunately.

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-20 Thread Merlin Moncure
On Thu, Oct 20, 2016 at 1:52 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Wed, Oct 19, 2016 at 2:39 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian <br...@momjian.us> wrote:
>>> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote:
>>>> > Yeah.  Believe me -- I know the drill.  Most or all the damage seemed
>>>> > to be to the system catalogs with at least two critical tables dropped
>>>> > or inaccessible in some fashion.  A lot of the OIDs seemed to be
>>>> > pointing at the wrong thing.  Couple more datapoints here.
>>>> >
>>>> > *) This database is OLTP, doing ~ 20 tps avg (but very bursty)
>>>> > *) Another database on the same cluster was not impacted.  However
>>>> > it's more olap style and may not have been written to during the
>>>> > outage
>>>> >
>>>> > Now, this infrastructure running this system is running maybe 100ish
>>>> > postgres clusters and maybe 1000ish sql server instances with
>>>> > approximately zero unexplained data corruption issues in the 5 years
>>>> > I've been here.  Having said that, this definitely smells and feels
>>>> > like something on the infrastructure side.  I'll follow up if I have
>>>> > any useful info.
>>>>
>>>> After a thorough investigation I now have credible evidence the source
>>>> of the damage did not originate from the database itself.
>>>> Specifically, this database is mounted on the same volume as the
>>>> operating system (I know, I know) and something non database driven
>>>> sucked up disk space very rapidly and exhausted the volume -- fast
>>>> enough that sar didn't pick it up.  Oh well :-) -- thanks for the help
>>>
>>> However, disk space exhaustion should not lead to corruption unless the
>>> underlying layers lied in some way.
>>
>> I agree -- however I'm sufficiently separated from the things doing
>> the things that I can't verify that in any real way.   In the meantime
>> I'm going to take standard precautions (enable checksums/dedicated
>> volume/replication).  Low disk space also does not explain the bizarre
>> outage I had last friday.
>
> ok, data corruption struck again.  This time disk space is ruled out,
> and access to the database is completely denied:
> postgres=# \c castaging
> WARNING:  leaking still-referenced relcache entry for
> "pg_index_indexrelid_index"

single user mode dumps core :(

bash-4.1$ postgres --single -D /var/lib/pgsql/9.5/data castaging
LOG:  0: could not change directory to "/root": Permission denied
LOCATION:  resolve_symlinks, exec.c:293
Segmentation fault (core dumped)

Core was generated by `postgres --single -D /var/lib/pgsql/9.5/data castaging'.
Program terminated with signal 11, Segmentation fault.
#0  0x00797d6f in ?? ()
Missing separate debuginfos, use: debuginfo-install
postgresql95-server-9.5.2-1PGDG.rhel6.x86_64
(gdb) bt
#0  0x00797d6f in ?? ()
#1  0x0079acf1 in RelationCacheInitializePhase3 ()
#2  0x007b35c5 in InitPostgres ()
#3  0x006b9b53 in PostgresMain ()
#4  0x005f30fb in main ()
(gdb)


merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-20 Thread Merlin Moncure
On Wed, Oct 19, 2016 at 2:39 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian <br...@momjian.us> wrote:
>> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote:
>>> > Yeah.  Believe me -- I know the drill.  Most or all the damage seemed
>>> > to be to the system catalogs with at least two critical tables dropped
>>> > or inaccessible in some fashion.  A lot of the OIDs seemed to be
>>> > pointing at the wrong thing.  Couple more datapoints here.
>>> >
>>> > *) This database is OLTP, doing ~ 20 tps avg (but very bursty)
>>> > *) Another database on the same cluster was not impacted.  However
>>> > it's more olap style and may not have been written to during the
>>> > outage
>>> >
>>> > Now, this infrastructure running this system is running maybe 100ish
>>> > postgres clusters and maybe 1000ish sql server instances with
>>> > approximately zero unexplained data corruption issues in the 5 years
>>> > I've been here.  Having said that, this definitely smells and feels
>>> > like something on the infrastructure side.  I'll follow up if I have
>>> > any useful info.
>>>
>>> After a thorough investigation I now have credible evidence the source
>>> of the damage did not originate from the database itself.
>>> Specifically, this database is mounted on the same volume as the
>>> operating system (I know, I know) and something non database driven
>>> sucked up disk space very rapidly and exhausted the volume -- fast
>>> enough that sar didn't pick it up.  Oh well :-) -- thanks for the help
>>
>> However, disk space exhaustion should not lead to corruption unless the
>> underlying layers lied in some way.
>
> I agree -- however I'm sufficiently separated from the things doing
> the things that I can't verify that in any real way.   In the meantime
> I'm going to take standard precautions (enable checksums/dedicated
> volume/replication).  Low disk space also does not explain the bizarre
> outage I had last friday.

ok, data corruption struck again.  This time disk space is ruled out,
and access to the database is completely denied:
postgres=# \c castaging
WARNING:  leaking still-referenced relcache entry for
"pg_index_indexrelid_index"

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-19 Thread Merlin Moncure
On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian <br...@momjian.us> wrote:
> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote:
>> > Yeah.  Believe me -- I know the drill.  Most or all the damage seemed
>> > to be to the system catalogs with at least two critical tables dropped
>> > or inaccessible in some fashion.  A lot of the OIDs seemed to be
>> > pointing at the wrong thing.  Couple more datapoints here.
>> >
>> > *) This database is OLTP, doing ~ 20 tps avg (but very bursty)
>> > *) Another database on the same cluster was not impacted.  However
>> > it's more olap style and may not have been written to during the
>> > outage
>> >
>> > Now, this infrastructure running this system is running maybe 100ish
>> > postgres clusters and maybe 1000ish sql server instances with
>> > approximately zero unexplained data corruption issues in the 5 years
>> > I've been here.  Having said that, this definitely smells and feels
>> > like something on the infrastructure side.  I'll follow up if I have
>> > any useful info.
>>
>> After a thorough investigation I now have credible evidence the source
>> of the damage did not originate from the database itself.
>> Specifically, this database is mounted on the same volume as the
>> operating system (I know, I know) and something non database driven
>> sucked up disk space very rapidly and exhausted the volume -- fast
>> enough that sar didn't pick it up.  Oh well :-) -- thanks for the help
>
> However, disk space exhaustion should not lead to corruption unless the
> underlying layers lied in some way.

I agree -- however I'm sufficiently separated from the underlying
infrastructure that I can't verify that in any real way.  In the
meantime I'm going to take standard precautions (enable checksums,
dedicated volume, replication).  Low disk space also does not explain
the bizarre outage I had last Friday.

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-19 Thread Merlin Moncure
On Tue, Oct 18, 2016 at 8:45 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Mon, Oct 17, 2016 at 2:04 PM, Alvaro Herrera
> <alvhe...@2ndquadrant.com> wrote:
>> Merlin Moncure wrote:
>>
>>> castaging=# CREATE OR REPLACE VIEW vw_ApartmentSample AS
>>> castaging-#   SELECT ...
>>> ERROR:  42809: "pg_cast_oid_index" is an index
>>> LINE 11:   FROM ApartmentSample s
>>> ^
>>> LOCATION:  heap_openrv_extended, heapam.c:1304
>>>
>>> should I be restoring from backups?
>>
>> It's pretty clear to me that you've got catalog corruption here.  You
>> can try to fix things manually as they emerge, but that sounds like a
>> fool's errand.
>
> Yeah.  Believe me -- I know the drill.  Most or all the damage seemed
> to be to the system catalogs with at least two critical tables dropped
> or inaccessible in some fashion.  A lot of the OIDs seemed to be
> pointing at the wrong thing.  Couple more datapoints here.
>
> *) This database is OLTP, doing ~ 20 tps avg (but very bursty)
> *) Another database on the same cluster was not impacted.  However
> it's more olap style and may not have been written to during the
> outage
>
> Now, this infrastructure running this system is running maybe 100ish
> postgres clusters and maybe 1000ish sql server instances with
> approximately zero unexplained data corruption issues in the 5 years
> I've been here.  Having said that, this definitely smells and feels
> like something on the infrastructure side.  I'll follow up if I have
> any useful info.

After a thorough investigation I now have credible evidence the source
of the damage did not originate from the database itself.
Specifically, this database is mounted on the same volume as the
operating system (I know, I know) and something non database driven
sucked up disk space very rapidly and exhausted the volume -- fast
enough that sar didn't pick it up.  Oh well :-) -- thanks for the help

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-18 Thread Merlin Moncure
On Mon, Oct 17, 2016 at 2:04 PM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:
> Merlin Moncure wrote:
>
>> castaging=# CREATE OR REPLACE VIEW vw_ApartmentSample AS
>> castaging-#   SELECT ...
>> ERROR:  42809: "pg_cast_oid_index" is an index
>> LINE 11:   FROM ApartmentSample s
>> ^
>> LOCATION:  heap_openrv_extended, heapam.c:1304
>>
>> should I be restoring from backups?
>
> It's pretty clear to me that you've got catalog corruption here.  You
> can try to fix things manually as they emerge, but that sounds like a
> fool's errand.

Yeah.  Believe me -- I know the drill.  Most or all the damage seemed
to be to the system catalogs with at least two critical tables dropped
or inaccessible in some fashion.  A lot of the OIDs seemed to be
pointing at the wrong thing.  Couple more datapoints here.

*) This database is OLTP, doing ~ 20 tps avg (but very bursty)
*) Another database on the same cluster was not impacted.  However
it's more olap style and may not have been written to during the
outage

Now, the infrastructure running this system hosts maybe 100ish
postgres clusters and maybe 1000ish SQL Server instances, with
approximately zero unexplained data corruption issues in the 5 years
I've been here.  Having said that, this definitely smells and feels
like something on the infrastructure side.  I'll follow up if I have
any useful info.

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-17 Thread Merlin Moncure
On Mon, Oct 17, 2016 at 2:04 PM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:
> Merlin Moncure wrote:
>
>> castaging=# CREATE OR REPLACE VIEW vw_ApartmentSample AS
>> castaging-#   SELECT ...
>> ERROR:  42809: "pg_cast_oid_index" is an index
>> LINE 11:   FROM ApartmentSample s
>> ^
>> LOCATION:  heap_openrv_extended, heapam.c:1304
>>
>> should I be restoring from backups?
>
> It's pretty clear to me that you've got catalog corruption here.  You
> can try to fix things manually as they emerge, but that sounds like a
> fool's errand.

Agreed.  Current plan is to restore from backups and recover as much
data as I can.  Also doing a bugfix release and going to enable
checksums.

We had several good backups since the previous outage, so it's not
clear the events are related, but after months of smooth operation I
find the coincidence highly suspicious.  As always, we need to suspect
hardware problems, but I'm highly abstracted from them -- using ESX +
SAN.

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-17 Thread Merlin Moncure
On Mon, Oct 17, 2016 at 1:39 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Thu, Oct 13, 2016 at 4:13 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Merlin Moncure <mmonc...@gmail.com> writes:
>>> Today I had an emergency production outage on a server.
>>> ...
>>> Adding all this up it smells like processes were getting stuck on a 
>>> spinlock.
>>
>> Maybe.  If it happens again, probably the most useful debug data would
>> be stack traces from some of the busy processes.
>
> Another odd datapoint on this server. Things were running pretty good
> but an application crashed on a missing view.  Trying to recreate the
> view, I got:
>
> CREATE OR REPLACE VIEW vw_ApartmentQueueLastGood AS
>   SELECT ...
>
> ERROR:  type "vw_apartmentqueuelastgood" already exists
> HINT:  A relation has an associated type of the same name, so you must
> use a name that doesn't conflict with any existing type.
>
> ...which was pretty strange.  I had to manually delete the pg_type
> record in order to create the view.   I'm getting more reports of
> 'could not open relation with oid=X' errors so I could be facing data
> corruption :(.

castaging=# CREATE OR REPLACE VIEW vw_ApartmentSample AS
castaging-#   SELECT ...
ERROR:  42809: "pg_cast_oid_index" is an index
LINE 11:   FROM ApartmentSample s
^
LOCATION:  heap_openrv_extended, heapam.c:1304

should I be restoring from backups?

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-17 Thread Merlin Moncure
On Thu, Oct 13, 2016 at 4:13 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmonc...@gmail.com> writes:
>> Today I had an emergency production outage on a server.
>> ...
>> Adding all this up it smells like processes were getting stuck on a spinlock.
>
> Maybe.  If it happens again, probably the most useful debug data would
> be stack traces from some of the busy processes.

Another odd datapoint on this server: things were running pretty well,
but an application crashed on a missing view.  Trying to recreate the
view, I got:

CREATE OR REPLACE VIEW vw_ApartmentQueueLastGood AS
  SELECT ...

ERROR:  type "vw_apartmentqueuelastgood" already exists
HINT:  A relation has an associated type of the same name, so you must
use a name that doesn't conflict with any existing type.

...which was pretty strange.  I had to manually delete the pg_type
record in order to create the view.  I'm also getting more reports of
'could not open relation with oid=X' errors, so I could be facing data
corruption :(.
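
For anyone hitting the same error, a catalog query along these lines
(view name taken from the error above) should show whether an orphaned
pg_type row is shadowing the view name:

```sql
-- A healthy view's row type has typrelid pointing at a live pg_class
-- entry; an orphaned type left over from a lost view does not.
SELECT t.oid, t.typname, t.typrelid,
       (c.oid IS NOT NULL) AS has_matching_relation
FROM pg_type t
LEFT JOIN pg_class c ON c.oid = t.typrelid
WHERE t.typname = 'vw_apartmentqueuelastgood';
```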

merlin




Re: [HACKERS] COPY as a set returning function

2016-10-17 Thread Merlin Moncure
On Fri, Sep 30, 2016 at 9:56 PM, Tom Lane  wrote:
> Craig Ringer  writes:
>> On 1 Oct. 2016 05:20, "Tom Lane"  wrote:
>>> I think the last of those suggestions has come up before.  It has the
>>> large advantage that you don't have to remember a different syntax for
>>> copy-as-a-function.
>
>> That sounds fantastic. It'd help this copy variant retain festure parity
>> with normal copy. And it'd bring us closer to being able to FETCH in non
>> queries.
>
> On second thought, though, this couldn't exactly duplicate the existing
> COPY syntax, because COPY relies heavily on the rowtype of the named
> target table to tell it what it's copying.  You'd need some new syntax
> to provide the list of column names and types, which puts a bit of
> a hole in the "syntax we already know" argument.  A SRF-returning-record
> would have a leg up on that, because we do have existing syntax for
> defining the concrete rowtype that any particular call returns.

One big disadvantage of the SRF-returning-record syntax is that such
functions are basically unwrappable with generic wrappers without
major gymnastics such as dynamically generating the query and
executing it.  This is a major disadvantage relative to the null::type
hack we use in the populate_record style functions, and it perhaps
ought to make this style discouraged for useful library functions.  If
there were a way to handle wrapping I'd withdraw this minor objection
(this has come up in dblink discussions a few times).
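
To illustrate the contrast (connection and type names are made up):

```sql
-- SRF-returning-record: the caller must spell out the result columns
-- at every call site, so a generic wrapper can't pass them through.
SELECT * FROM dblink('myconn', 'SELECT 1, ''x''') AS t(a int, b text);

-- populate_record style: the row type rides along as a null::type
-- argument, so a wrapper can forward it without knowing the columns.
SELECT * FROM json_populate_record(null::mytype, '{"a": 1, "b": "x"}');
```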

merlin




Re: [HACKERS] [sqlsmith] Infinite recursion in bitshift

2016-10-14 Thread Merlin Moncure
On Fri, Oct 14, 2016 at 3:31 PM, Tom Lane  wrote:
> Andreas Seltenreich  writes:
>> Tom Lane writes:
>>> Seems sane, though I wonder if it'd be better to use -INT_MAX rather
>>> than -VARBITMAXLEN.
>
>> I am undecided between those two.  -INT_MAX might be a more precise fix
>> for the problem, but the extra distance to the danger zone was kind of
>> soothing :-).
>
> Yeah, might as well use the tighter limit.
>
> Poking around in varbit.c, I noticed some other places that were assuming
> that a typmod couldn't exceed VARBITMAXLEN.  anybit_typmodin() enforces
> that, but there are places where a user can shove in an arbitrary integer,
> eg
>
> regression=# select "bit"(42, 2147483647);
> ERROR:  invalid memory alloc request size 18446744073441116169
>
> I fixed those too and pushed it.  Thanks for the report!

Curious -- are there real-world scenarios where this would happen?

merlin




[HACKERS] emergency outage requiring database restart

2016-10-13 Thread Merlin Moncure
Today I had an emergency production outage on a server.  This
particular server was running 9.5.2.   The symptoms were interesting
so I thought I'd report.  Here is what I saw:

*) User CPU was pegged 100%
*) Queries reading data would block and not respond to cancel or terminate
*) pg_stat_activity reported no waiting queries (but worked fine otherwise).

Adding all this up it smells like processes were getting stuck on a spinlock.

Connections quickly got eaten up and the situation was desperately
urgent, so I punted and did an immediate restart and things came back
normally.  I had a console to the database and did manage to grab the
contents of pg_stat_activity, and noticed several trivial queries
running normally (according to pg_stat_activity) but otherwise stuck.
Attempting to run one of them myself, I noted the query got stuck and
did not cancel.  I was in a terrible rush but am casting around for
stuff to grab in case that happens again -- 'perf top' would be a
natural choice I guess.

Three autovacuum processes were running.  Obviously going to do bugfix
upgrade but was wondering if anybody has seen anything like this.
This particular server was upgraded to 9.5 somewhat recently but ran
on 9.2 for years with no issues.

merlin




Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size

2016-10-12 Thread Merlin Moncure
On Wed, Oct 12, 2016 at 5:18 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmonc...@gmail.com> writes:
>> ISTM all this silliness is pretty much unique to linux anyways.
>> Instead of reading the filesystem, what about doing test map and test
>> unmap?
>
> And if mmap succeeds and munmap fails, you'll recover how exactly?
>
> If this API were less badly designed, we'd not be having this problem
> in the first place ...

I was thinking to 'guess' in a power-of-two loop in case the obvious
unmap didn't work, finally aborting if no guess worked.  :-)

merlin




Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size

2016-10-12 Thread Merlin Moncure
On Wed, Oct 12, 2016 at 5:10 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> According to
>>> https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
>>> looking into /proc/meminfo is the longer-standing API and thus is
>>> likely to work on more kernel versions.  Also, if you look into
>>> /sys then you are going to see multiple possible values and it's
>>> not clear how to choose the right one.
>
>> I'm not sure that this is the best rationale.  In my system there are
>> 2MB and 1GB huge page sizes; in systems with lots of memory (let's say 8
>> GB of shared memory is requested) it seems a clear winner to allocate 8
>> 1GB hugepages than 4096 2MB hugepages because the page table is so much
>> smaller.  The /proc interface only shows the 2MB page size, so if we go
>> that route we'd not be getting the full benefit of the feature.
>
> And you'll tell mmap() which one to do how exactly?  I haven't found
> anything explaining how applications get to choose which page size applies
> to their request.  The kernel document says that /proc/meminfo reflects
> the "default" size, and I'd assume that that's what we'll get from mmap.

hm. for (recent) linux, I see:

   MAP_HUGE_2MB, MAP_HUGE_1GB (since Linux 3.8)
  Used in conjunction with MAP_HUGETLB to select alternative
  hugetlb page sizes (respectively, 2 MB and 1 GB) on systems
  that support multiple hugetlb page sizes.

  More generally, the desired huge page size can be configured
  by encoding the base-2 logarithm of the desired page size in
  the six bits at the offset MAP_HUGE_SHIFT.  (A value of zero
  in this bit field provides the default huge page size; the
  default huge page size can be discovered via the Hugepagesize
  field exposed by /proc/meminfo.)  Thus, the above two
  constants are defined as:

  #define MAP_HUGE_2MB(21 << MAP_HUGE_SHIFT)
  #define MAP_HUGE_1GB(30 << MAP_HUGE_SHIFT)

  The range of huge page sizes that are supported by the system
  can be discovered by listing the subdirectories in
  /sys/kernel/mm/hugepages.


via: http://man7.org/linux/man-pages/man2/mmap.2.html#NOTES

ISTM all this silliness is pretty much unique to Linux anyway.
Instead of reading the filesystem, what about doing a test map and
test unmap?  We could zero in on the default page size, I think, with
some probing of known possible values.

merlin




Re: [HACKERS] autonomous transactions

2016-10-11 Thread Merlin Moncure
On Tue, Oct 11, 2016 at 10:06 AM, Petr Jelinek <p...@2ndquadrant.com> wrote:
> On 10/10/16 16:44, Merlin Moncure wrote:
>> On Thu, Oct 6, 2016 at 3:53 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
>>> On 6 October 2016 at 21:27, Robert Haas <robertmh...@gmail.com> wrote:
>>>> I think we should implement background transactions and call them
>>>> background transactions.  That allows us to expose additional
>>>> functionality which is useful, like the ability to kick something off
>>>> and check back later for the results.  There's no reason to call it
>>>> background transactions and also call it autonomous transactions: one
>>>> feature doesn't need two names.
>>>
>>> I'm happy to also invoke it via an alternate mechanism or API, so that
>>> it can continue to be used even if the above mechanism changes.
>>>
>>> We have no need to wait for the perfect solution, even assuming we
>>> would ever agree that just one exists.
>>
>> -1 on implementing both autonomous and background transactions.  This
>> will confuse everyone.
>
> I personally care much more about having background transactions than
> autonomous ones (as I only ever had use-cases for the background ones)
> so don't agree there.

All right.  But would you agree then that AT should at least emulate
competing implementations?  A major advantage of bgworkers is possibly
supporting concurrent activity, and maybe the syntax could be directed
more toward moving in that direction rather than copying oracle style
(PRAGMA etc), particularly if the locking rules are substantially
different.

merlin




Re: [HACKERS] autonomous transactions

2016-10-10 Thread Merlin Moncure
On Thu, Oct 6, 2016 at 3:53 PM, Simon Riggs  wrote:
> On 6 October 2016 at 21:27, Robert Haas  wrote:
>> I think we should implement background transactions and call them
>> background transactions.  That allows us to expose additional
>> functionality which is useful, like the ability to kick something off
>> and check back later for the results.  There's no reason to call it
>> background transactions and also call it autonomous transactions: one
>> feature doesn't need two names.
>
> I'm happy to also invoke it via an alternate mechanism or API, so that
> it can continue to be used even if the above mechanism changes.
>
> We have no need to wait for the perfect solution, even assuming we
> would ever agree that just one exists.

-1 on implementing both autonomous and background transactions.  This
will confuse everyone.

The lingo here is not so important, I think.  What *is* important is
defining how the locking and execution rules should work; the
implementation should flow from that.  Those rules should be informed
by competing implementations and how well they work.  +1 for any
solution that makes migration from other solutions to postgres easier.

bgworkers should be considered if you want things to run in parallel.
Reading the proposal (note, I may have missed it) it isn't clear to me
if you can have the parent and AT run a query at the same time.
Should this (parallel execution) be a design goal, then that's the end
of the story.

However I don't think it is, TBH.  ISTM the expectation is single
threaded behavior with finer grained control of commits.  If we're
not 100% clear on this point one way or the other then things are a
bit premature.  Maybe we are clear and I missed something?

One major advantage of the non-bgworker serialized execution approach
is that certain classes of deadlock are easier to detect or do not
exist, since there is only one execution state; AIUI it's impossible
for two transaction states to be simultaneously waiting (assuming the
pl/pgsql instructions are not run in parallel) with one exception, and
that is the AT trying to acquire a lock exclusively held by the
master.  If the AT blocks on the parent it ought to be O(1) and
instant to detect that and roll it back with the right supporting
infrastructure in the lock manager.  It also makes sharing execution
state much easier, especially the parts that look like, "I'm waiting
here until the other guy finishes", since there's only one "guy".

How will advisory locks work? I think they'd block with bgworkers and
not block with non-bgworkers.  What about other session based stuff
like prepared statements?  Expectations around those cases ought to
clarify the implementation.
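For instance, session-level advisory locks are re-entrant within a
session, so the two models would diverge right away (a sketch, not
tested against any AT patch):

```sql
-- Session 1 (the parent):
SELECT pg_advisory_lock(42);   -- acquired
SELECT pg_advisory_lock(42);   -- same session: succeeds immediately

-- Session 2 (roughly what a bgworker-based AT would amount to):
SELECT pg_advisory_lock(42);   -- blocks until session 1 releases
```

A non-bgworker AT running in the parent's session would presumably
take the "succeeds immediately" path.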

merlin




Re: [HACKERS] Why we lost Uber as a user

2016-08-19 Thread Merlin Moncure
On Wed, Aug 17, 2016 at 5:18 PM, Jim Nasby  wrote:
> On 8/17/16 2:51 PM, Simon Riggs wrote:
>> On 17 August 2016 at 12:19, Greg Stark  wrote:
>>> Yes, this is exactly what it should be doing and exactly why it's
>>> useful. Physical replication accurately replicates the data from the
>>> master including "corruption" whereas a logical replication system
>>> will not, causing divergence and possible issues during a failover.
>>
>>
>> Yay! Completely agree.
>>
>> Physical replication, as used by DRBD and all other block-level HA
>> solutions, and also used by other databases, such as Oracle.
>>
>> Corruption on the master would often cause errors that would prevent
>> writes and therefore those changes wouldn't even be made, let alone be
>> replicated.
>
>
> My experience has been that you discover corruption after it's already
> safely on disk, and more than once I've been able to recover by using data
> on a londiste replica.
>
> As I said originally, it's critical to understand the different solutions
> and the pros and cons of each. There is no magic bullet.

Data point: in the half-dozen or so cases of corruption I've seen on
replicated systems, in all cases but one the standby was clean.  The
'unclean' case was actually an 8.2 warm standby; the source of the
corruption was a very significant bug where prepared statements would
write back corrupted data if the table definitions changed under the
statement (fixed in 8.3).  In that particular case the corruption was
very unfortunately quite widespread and passed directly along to the
standby server.  This bug nearly cost us a user as well, although not
nearly so famous as uber :-).

In the few modern cases I've seen I've not been able to trace it back
to any bug in postgres (in particular multixact was ruled out) and
I've chalked it up to media or (more likely I think) filesystem
problems in the face of a -9 reset.

merlin




Re: [HACKERS] Why we lost Uber as a user

2016-07-28 Thread Merlin Moncure
On Thu, Jul 28, 2016 at 8:16 AM, pgwhatever  wrote:
> Statement-Based replication has a lot of problems with it like indeterminate
> UDFs.  Here is a link to see them all:
> https://dev.mysql.com/doc/refman/5.7/en/replication-sbr-rbr.html#replication-sbr-rbr-sbr-disadvantages

Sure.  It's also incredibly efficient with respect to bandwidth -- so,
if your application was engineered to work around those problems,
it's a huge win.  They could have used pgpool, but I guess the fix was
already in.

Taking a step back, from the outside, it looks like uber:
*) has a very thick middleware, very thin database with respect to
logic and complexity
*) has a very high priority on quick and cheap (in terms of bandwidth)
replication
*) has decided the database needs to be interchangeable
*) is not afraid to make weak or erroneous technical justifications as
a basis of stack selection (the futex vs ipc argument I felt was
particularly awful -- it ignored the fact we use spinlocks)

The very fact that they swapped it out so easily suggests that they
were not utilizing the database as they could have, and a different
technical team might have come to a different result.   Postgres is a
very general system and rewards deep knowledge such that it can
outperform even specialty systems in the hands of a capable developer
(for example, myself).  I'm just now hammering in the final coffin
nails that will get solr swapped out for jsonb backed postgres.

I guess it's fair to say that they felt mysql is closer to what they
felt a database should do out of the box.  That's disappointing, but
life moves on.  The takeaways are:

*) people like different choices of replication mechanics -- statement
level sucks a lot of the time, but not all the time
*) hs/sr simplicity of configuration and operation is a big issue.
it's continually gotten better and still needs to
*) bad QC can cost you customers.   how much regression coverage do we
have of hs/sr?
*) postgres may not be the ideal choice for those who want a thin and
simple database

merlin




Re: [HACKERS] Why we lost Uber as a user

2016-07-27 Thread Merlin Moncure
On Tue, Jul 26, 2016 at 5:07 PM, Tom Lane  wrote:
> Josh Berkus  writes:
>> To explain this in concrete terms, which the blog post does not:
>
>> 1. Create a small table, but one with enough rows that indexes make
>> sense (say 50,000 rows).
>
>> 2. Make this table used in JOINs all over your database.
>
>> 3. To support these JOINs, index most of the columns in the small table.
>
>> 4. Now, update that small table 500 times per second.
>
>> That's a recipe for runaway table bloat; VACUUM can't do much because
>> there's always some minutes-old transaction hanging around (and SNAPSHOT
>> TOO OLD doesn't really help, we're talking about minutes here), and
>> because of all of the indexes HOT isn't effective.
>
> Hm, I'm not following why this is a disaster.  OK, you have circa 100%
> turnover of the table in the lifespan of the slower transactions, but I'd
> still expect vacuuming to be able to hold the bloat to some small integer
> multiple of the minimum possible table size.  (And if the table is small,
> that's still small.)  I suppose really long transactions (pg_dump?) could
> be pretty disastrous, but there are ways around that, like doing pg_dump
> on a slave.
>
> Or in short, this seems like an annoyance, not a time-for-a-new-database
> kind of problem.

Well, the real annoyance as I understand it is the raw volume of bytes
of WAL traffic a single update of a field can cause.  They switched to
statement level replication(!).

merlin




Re: [HACKERS] Bug with plpgsql handling of NULL argument of compound type

2016-07-22 Thread Merlin Moncure
On Fri, Jul 22, 2016 at 1:39 PM, David G. Johnston
 wrote:
> On Fri, Jul 22, 2016 at 2:13 PM, Tom Lane  wrote:
>>
>> There is a rather squishy question as to whether NULL::composite_type
>> should be semantically equivalent to ROW(NULL,NULL,...)::composite_type.
>> If it is, then the SELECT should have failed before even getting into the
>> plpgsql function, because ROW(NULL,NULL) is surely not a valid value of
>> type c.  The SQL standard seems to believe that these things *are*
>> equivalent (at least, that was how we read the spec for IS [NOT] NULL).
>>
>
> I dislike that they are considered equal in various circumstances but if
> that's we are guided toward c'est la vie.

Not sure we are guided there.  Currently we follow the spec
specifically with the IS NULL operator, but not in other cases.  For
example:
postgres=# select row(null, null) is null;
 ?column?
──
 t

postgres=# select coalesce(row(null, null), row(1,1));
 coalesce
──
 (,)

These seem not to agree (at all), since I'm pretty sure the spec defines
coalesce in terms of IS NULL.
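For reference, the spec defines COALESCE(v1, v2) as CASE WHEN v1 IS
NOT NULL THEN v1 ELSE v2 END, and hand-expanding it that way shows the
divergence directly (behavior on a current release, as I read it):

```sql
-- coalesce() expanded per the spec's CASE definition:
select case when row(null, null) is not null
            then row(null, null)
            else row(1, 1)
       end;
-- yields (1,1): the IS NOT NULL test is false, so the else arm wins,
-- whereas coalesce(row(null, null), row(1, 1)) above yields (,)
```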

The basic problem we have is that in postgres the record variable is a
distinct thing from its contents and the spec does not treat it that
was. For my part, I think the spec is totally out to lunch on this
point but we've been stuck with the status quo for quite some time now
-- there's been no compelling narrative that suggests how things
should be changed and to what.

merlin




Re: [HACKERS] DO with a large amount of statements get stuck with high memory consumption

2016-07-18 Thread Merlin Moncure
On Mon, Jul 18, 2016 at 8:59 AM, Jan Wieck <j...@wi3ck.info> wrote:
>
>
> On Mon, Jul 18, 2016 at 9:43 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>
>> Merlin Moncure <mmonc...@gmail.com> writes:
>> > BTW, while the fix does address the cleanup performance issue, it's
>> > still the case that anonymous code blocks burn up lots of resident
>> > memory (my 315k example I tested with ate around 8gb IIRC) when run
>> > like this.  My question is, if the pl/pgsql code block is anonymous
>> > and not in some kind of a loop, why bother caching the plan at all?
>>
>> Nobody got around to it.  Also, as you note, it's not as simple as
>> "don't cache if in a DO block".  You'd need to track whether you were
>> inside any sort of looping construct.  Depending on how difficult
>> that turned out to be, it might add overhead to regular functions
>> that we don't want.
>
> Agreed. And from the structures themselves it is not really easy to detect
> if inside of a loop, the toplevel, while, for and if all use the same
> statement
> block and call exec_stmts(), which in turn calls exec_stmt() for  each
> element in that list. It is not impossible to add a flag, set at PL compile
> time, to that element and check it every time, the statement is executed.
> But such a change definitely needs more testing and probably won't
> qualify for backpatching.

Right.  Note, not arguing for backpatch here -- just some open
speculation, and some evidence that we still have a problem (although
not nearly as nasty a one; the pre-patch behavior of not responding to
cancel was very dangerous and is now solved).

Hm, maybe, instead of trying to figure out if we're in a loop, set a
'called' flag with each statement and only cache the plan when the
statement is touched a second time.  (If that's easier, dunno).

merlin




Re: [HACKERS] DO with a large amount of statements get stuck with high memory consumption

2016-07-18 Thread Merlin Moncure
On Sat, Jul 16, 2016 at 2:47 PM, Jan Wieck <j...@wi3ck.info> wrote:
> On Tue, Jul 12, 2016 at 3:29 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>>
>> I've noticed that pl/pgsql functions/do commands do not behave well
>> when the statement resolves and frees memory.   To be clear:
>>
>> FOR i in 1..100
>> LOOP
>>   INSERT INTO foo VALUES (i);
>> END LOOP;
>>
>> ...runs just fine while
>>
>> BEGIN
>>   INSERT INTO foo VALUES (1);
>>   INSERT INTO foo VALUES (2);
>>   ...
>>   INSERT INTO foo VALUES (100);
>> END;
>
>
> This sounds very much like what led to commit
> 25c539233044c235e97fd7c9dc600fb5f08fe065.
>
> It seems that patch was only applied to master and never backpatched to 9.5
> or earlier.

You're right; thanks (my bad for missing that).  For those following
along, the case that turned this up was:
DO $$
BEGIN
  CREATE TEMP TABLE foo (i int);
  INSERT INTO foo VALUES (1);
  INSERT INTO foo VALUES (2);
  ...;
END $$;

Where the insertion step was a large number of standalone insert
statements.

(temp table creation isn't necessary to turn up this bug, but it's a
common pattern when sending batch updates to a server).

For those following along, the workaround I recommend would be to do this:

do $d$
begin

create function doit() returns void as
$$
  <statements>
$$ language sql;
perform doit();
end;
$d$;

BTW, while the fix does address the cleanup performance issue, it's
still the case that anonymous code blocks burn up lots of resident
memory (my 315k example I tested with ate around 8gb IIRC) when run
like this.  My question is, if the pl/pgsql code block is anonymous
and not in some kind of a loop, why bother caching the plan at all?

merlin




[HACKERS] DO with a large amount of statements get stuck with high memory consumption

2016-07-12 Thread Merlin Moncure
I've noticed that pl/pgsql functions/do commands do not behave well
when the statement resolves and frees memory.   To be clear:

FOR i in 1..100
LOOP
  INSERT INTO foo VALUES (i);
END LOOP;

...runs just fine while

BEGIN
  INSERT INTO foo VALUES (1);
  INSERT INTO foo VALUES (2);
  ...
  INSERT INTO foo VALUES (100);
END;

(for the curious, create a script yourself via
copy (
  select
'do $$begin create temp table foo(i int);'
  union all select
format('insert into foo values (%s);', i) from generate_series(1,100) i
  union all select 'raise notice ''abandon all hope!''; end; $$;'
) to '/tmp/breakit.sql';

...this will consume resident memory proportional to the number
of statements and eventually crash the server.  The problem is obvious;
each statement causes a plan to get created and the server gets stuck
in a loop where SPI_freeplan() is called repeatedly.  Everything is
working as designed I guess, but when this happens it's really
unpleasant: the query is uncancellable and unterminatable, nicht gut.
A pg_ctl kill ABRT <pid> will do the trick, but I was quite astonished
to see linux take a few minutes to clean up the mess (!) on a somewhat
pokey virtualized server with lots of memory.  With even as little as
ten thousand statements the cleanup time far exceed the runtime of the
statement block.

I guess the key takeaway here is, "don't do that"; pl/pgsql
aggressively generates plans and turns out to be a poor choice for
bulk loading because of all the plan caching.   Having said that, I
can't help but wonder if there should be a (perhaps user-configurable)
limit to the number of SPI plans a single function call should be able
to acquire, on the basis that you are going to smack into very poor
behaviors in the memory subsystem.

Stepping back, I can't help but wonder what the value of all the plan
caching going on is at all for statement blocks.  Loops might comprise
a notable exception, noted.  I'd humbly submit though that (relative
to functions) it's much more likely that you'd want to do something
like insert a lot of statements where it's impossible to utilize any
cached plans.

This is not an academic gripe -- I just exploded production :-D.

merlin




Re: [HACKERS] IMPORT FOREIGN SCHEMA can't be run in in pl/pgsql due to INTO

2016-07-12 Thread Merlin Moncure
On Mon, Jul 11, 2016 at 3:09 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmonc...@gmail.com> writes:
>> Currently pl/pgsql interprets the mandatory INTO of IMPORT FOREIGN
>> SCHEMA as INTO variable.
>
> Ugh, that's definitely a bug.
>
>> I estimate this to be minor oversight in
>> pl/pgsql parsing with respect to the introduction of this statement.
>
> While we can certainly hack it by something along the lines of not
> recognizing INTO when the first token was IMPORT, the whole thing
> seems awfully messy and fragile.  And it will certainly break again
> the next time somebody decides that INTO is le mot juste in some new
> SQL command.  I wish we could think of a safer, more future-proof
> solution.  I have no idea what that would be, though, short of
> deprecating INTO altogether.

This is a natural consequence of having two
almost-but-not-quite-the-same grammars handling the same shared
language.  There are a similar set of annoyances compiling C with a
C++ compiler as we all know.  In a perfect world, SQL procedural
extensions would be a proper superset and we'd have *one* grammar
handling everything.  Among other niceties this would make moving
forward with stored procedures a much simpler discussion.  Well, C'est
la vie :-D.

merlin




[HACKERS] IMPORT FOREIGN SCHEMA can't be run in in pl/pgsql due to INTO

2016-07-11 Thread Merlin Moncure
Currently pl/pgsql interprets the mandatory INTO of IMPORT FOREIGN
SCHEMA as INTO variable.  I estimate this to be minor oversight in
pl/pgsql parsing with respect to the introduction of this statement.
Assuming it's easily fixed, would a patch to fix pl/pgsql parsing be
accepted?

merlin




Re: [HACKERS] MVCC overheads

2016-07-08 Thread Merlin Moncure
On Thu, Jul 7, 2016 at 11:45 AM, Pete Stevenson
 wrote:
> Hi postgresql hackers -
>
> I would like to find some analysis (published work, blog posts) on the 
> overheads affiliated with the guarantees provided by MVCC isolation. More 
> specifically, assuming the current workload is CPU bound (as opposed to IO) 
> what is the CPU overhead of generating the WAL, the overhead of version 
> checking and version creation, and of garbage collecting old and unnecessary 
> versions? For what it’s worth, I am working on a research project where it is 
> envisioned that some of this work can be offloaded.

That's going to be hard to measure.   First, what you didn't say is,
'with respect to what?'. You mention WAL for example.  WAL is more of
a crash safety mechanism than anything and it's not really fair to
include it in an analysis of 'MVCC overhead', or at least not
completely.  One thing that MVCC *does* objectively cause is bloat,
although you can still get bloat without MVCC if you (for example)
delete rows or rewrite rows such that they can't fit in their old
slot.

MVCC definitely incurs some runtime overhead to check visibility but
the amount of overhead is highly dependent on the specific workload.
Postgres 'hint bits' reduce the cost to near zero for many workloads
but in other workloads they are expensive to maintain and cause a lot
of extra traffic.   One nice feature about not having to worry about
visibility is that you can read data directly out of the index.  We
have some workarounds to deal with that ('all visible bit') but again
the amount of benefit from that strategy is going to be very situation
specific.
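A quick way to see the 'all visible' machinery at work (a sketch,
assuming a reasonably current release with index-only scans; table
name and sizes are arbitrary):

```sql
create table t as select g as id from generate_series(1, 100000) g;
create index on t (id);
vacuum t;   -- sets the all-visible bits in the visibility map

-- "Heap Fetches: 0" here would mean no per-row visibility checks were
-- needed; without the vacuum, every matching row costs a heap visit.
explain (analyze, buffers) select id from t where id between 1 and 100;
```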

Stepping back, the overhead of MVCC in postgres (and probably other
systems too) has been continually reduced over the years -- the really
nasty parts have been relegated to background cleanup processing.
That processing is pretty sequential and the 'i/o bottleneck' is
finally getting solved on cheap storage pushing things back into the
cpu space.

In summary, I think the future of MVCC and transactional systems is
very bright, and the future of data management systems that discard
transactional safety in order to get some short term performance gains
is, uh, not so bright.  Transactions are essential in systems where
data integrity matters.

merlin




Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-06 Thread Merlin Moncure
On Fri, Jul 1, 2016 at 11:45 AM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:
> Merlin Moncure wrote:
>
>> It's pretty easy to craft a query where you're on the winning side,
>> but what's the worst case of doing two passes... is constant folding a
>> non-trivial fraction of planning time?
>
> One thing that has been suggested is to re-examine the plan after
> planning is done, and if execution time is estimated to be large (FSVO),
> then run a second planning pass with more expensive optimizations
> enabled to try and find better plans.  The guiding principle would be to
> continue to very quickly find good enough plans for
> frequent/small/simple queries, but spend more planning effort on more
> complex ones where execution is likely to take much longer than planning
> time.
>
> So doing constant-folding twice would be enabled for the second planning
> pass.

I like this idea.  Maybe a GUC controlling the cost based cutoff (with
0 meaning, "assume the worst and plan the hard way first").

merlin




Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-01 Thread Merlin Moncure
On Fri, Jul 1, 2016 at 10:27 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jul 1, 2016 at 10:20 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
>> Yeah.  Also, even if you could parse out those cases, it's a major
>> optimization fence.  Consider if you have an ORDER BY clause here:
>>
>> SELECT FROM foo WHERE a OR b ORDER BY c;
>>
>> ... by pushing inside a union, you're going to be in trouble in real
>> world cases.  That's just a mess and it would add a lot of runtime
>> analysis of the alternative paths.  It's hard for me to believe
>> rewriting is easier and simpler than rewriting 'false OR x' to 'x'.  I
>> also think that constant folding strategies are going to render much
>
> I don't think that it's easier and simpler and didn't intend to say
> otherwise.  I do think that I've run across LOTS of queries over the
> years where rewriting OR using UNION ALL was a lot faster, and I think
> that case is more likely to occur in practice than FALSE OR WHATEVER.
> But, I'm just throwing out opinions to see what sticks here; I'm not
> deeply invested in this.

Sure (I didn't mean to put you in that position, just thinking out
loud).  The problem with UNION ALL is that it's only safe when you
know for sure that both sides of the partition are non-overlapping.
The author of the query often knows this going in, but it's not so
simple for the planner to figure out in many cases -- maybe there's a
subset of cases where it can.  UNION sans ALL is probably a dead end
on performance grounds.
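For illustration, the classic hand rewrite, valid only because the
second branch explicitly excludes the first (table and column names
are arbitrary):

```sql
-- Equivalent to: select * from foo where a or b order by c
select * from foo where a
union all
select * from foo where b and a is not true  -- guarantees disjoint branches
order by c;
```

Note the 'a is not true' rather than 'not a': a null in a would
otherwise let a row slip out of both branches, which is exactly the
kind of subtlety the planner would have to prove for itself.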

This hinges on Tom's earlier statements, "Much of
the value of doing constant-folding would disappear if we ran it before
subquery pullup + join simplification, because in non-stupidly-written
queries those are what expose the expression simplification opportunities."

and, especially, "We could run it twice but that seems certain to be a
dead loser most of
the time."

It's pretty easy to craft a query where you're on the winning side,
but what's the worst case of doing two passes... is constant folding a
non-trivial fraction of planning time?

merlin




Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-01 Thread Merlin Moncure
On Fri, Jul 1, 2016 at 9:11 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Jul 1, 2016 at 9:52 AM, Tom Lane  wrote:
>>> Maybe, but neither UNION nor UNION ALL would duplicate the semantics
>>> of OR, so there's some handwaving here that I missed.
>
>> SELECT * FROM foo WHERE a = 5 OR a = 4
>> isn't equivalent to
>> SELECT * FROM foo WHERE a = 5
>> UNION
>> SELECT * FROM foo WHERE a = 4
>> ?
>
> It probably is, but you're assuming that "a" appears in the list of
> columns being unioned.  If you make that just "SELECT b FROM ..."
> then the latter form gets rid of duplicate b values where the first
> doesn't.  On the other hand, UNION ALL might introduce duplicates
> not present in the OR query's result.

Yeah.  Also, even if you could parse out those cases, it's a major
optimization fence.  Consider if you have an ORDER BY clause here:

SELECT FROM foo WHERE a OR b ORDER BY c;

... by pushing inside a union, you're going to be in trouble in real
world cases.  That's just a mess and it would add a lot of runtime
analysis of the alternative paths.  It's hard for me to believe
rewriting is easier and simpler than rewriting 'false OR x' to 'x'.  I
also think that constant folding strategies are going to render much
more sensible output to EXPLAIN.

FYI, the query is something along the lines of:
SELECT * FROM foo
WHERE
  ('a' = 'a' AND EXISTS ...)
  OR ('a' = 'b' AND EXISTS ...)
  OR ('a' = 'c' AND EXISTS ...)

...where the left side of the equality is a parameterized 'filter
mode' flag.  That way the query can introduce filtering behaviors
without doing dynamic acrobatics.

merlin




[HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-06-21 Thread Merlin Moncure
Hello hackers,

Observe the following test case (apologies if this is a well
understood problem):

create temp table foo as select generate_series(1,100) id;
create index on foo(id);

create temp table bar as select id, id % 10 = 0 as good from
generate_series(1,100) id;
create index on bar(good);

analyze foo;
analyze bar;

explain analyze select * from foo where false or exists (select 1 from
bar where good and foo.id = bar.id);  -- A
explain analyze select * from foo where exists (select 1 from bar
where good and foo.id = bar.id);  -- B

These queries are trivially verified as identical but give very different plans.
A gives
  QUERY PLAN
─
 Seq Scan on foo  (cost=0.00..4459425.00 rows=50 width=4) (actual
time=13.299..130.271 rows=10 loops=1)
   Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
   Rows Removed by Filter: 90
   SubPlan 1
 ->  Index Scan using bar_good_idx on bar  (cost=0.42..4.45 rows=1
width=0) (never executed)
   Index Cond: (good = true)
   Filter: (good AND (foo.id = id))
   SubPlan 2
 ->  Index Scan using bar_good_idx on bar bar_1  (cost=0.42..4.44
rows=1 width=4) (actual time=0.024..0.055 rows=10 loops=1)
   Index Cond: (good = true)
   Filter: good
 Planning time: 0.103 ms
 Execution time: 130.312 ms

B gives
  QUERY PLAN
───
 Nested Loop  (cost=4.87..12.91 rows=1 width=4) (actual
time=0.075..0.161 rows=10 loops=1)
   ->  HashAggregate  (cost=4.45..4.46 rows=1 width=4) (actual
time=0.058..0.060 rows=10 loops=1)
 Group Key: bar.id
 ->  Index Scan using bar_good_idx on bar  (cost=0.42..4.44
rows=1 width=4) (actual time=0.018..0.045 rows=10 loops=1)
   Index Cond: (good = true)
   Filter: good
   ->  Index Only Scan using foo_id_idx on foo  (cost=0.42..8.44
rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=10)
 Index Cond: (id = bar.id)
 Heap Fetches: 10
 Planning time: 0.193 ms
 Execution time: 0.187 ms

This is a general problem with OR expressions, while AND expressions
will generally pass the optimization through.   The 'old school'
optimization approach is to rewrite the OR expressions to UNION ALL,
but this can have unpleasant downstream effects on the query in real
world scenarios.  The question is: can the one time filter logic be
expanded such that the first query can functionally be rewritten into
the second one?  This type of query happens a lot when trying to mix
multiple different filtering expressions (a 'filter mode' if you will)
in a single query based on a user-supplied switch.  Food for thought.

merlin


