Re: [HACKERS] pgbench more operators & functions

2016-04-04 Thread David G. Johnston
On Sun, Apr 3, 2016 at 10:18 PM, Simon Riggs  wrote:

> On 4 April 2016 at 01:14, Michael Paquier 
> wrote:
>
>
>> I'd say why not.
>>
>
> I'd say "why not wait?". Minor, non-urgent patches will definitely go
> nowhere for a long time, so it gains nobody to submit now.
>
> Submitting patches during freeze has been discouraged for many years, so
> asking a long term contributor to avoid sending multiple minor patches is
> in line with that.
>
>
The main downside I see is the CF manager having more items to manage.
The committers should be able to prioritize, and so seeing the other items,
while maybe not ideal (though they should be in a future CF period so they
shouldn't be too visible), doesn't seem that bad.  What it does allow is
for lurkers or potential reviewers and developers to see what is being (has
been) worked on by others in the community.  That kind of visibility seems
like it should be desired - since proving that nobody benefits from it
being published seems a bit of a stretch.  But maybe I'm just not
close enough to the problems it causes - which ideally could be mitigated
in some form other than asking people to hold off making work public.

The other downside would be the human tendency to want to look at, comment
on, and/or work on these more minor items when people should be working on
more important things.  That, though, seems like the opposite of saying
"non-urgent patches will definitely go nowhere for a long time" and
probably instills a level of parental involvement that is not necessarily
the community's role.

David J.


Re: [HACKERS] Nested funtion

2016-03-27 Thread David G. Johnston
On Sun, Mar 27, 2016 at 9:14 PM, Sridhar N Bamandlapally <
sridhar@gmail.com> wrote:

> Hi
>
> Is there any way to create nested function?
>
> oracle to postgres migration required super function variable reference
> into nested function without nested function parameter
>
> Oracle sample:
> ---
> create or replace function f1(n number) return number
> is
> vs number:=1;
> function nf1(m number) return number is
> begin
> return vs + m + n;
> end;
> begin
> return nf1(2);
> end;
> /
>
> run:
> 
> SQL> select f1(9) from dual;
>
>      F1(9)
> ----------
>         12
>
>
PostgreSQL's PL/pgSQL language doesn't grok closures.  You are welcome to
write "f1" in a language that does.  Perl and Python are built-in and many
others are available.  I assume at least one of them can do this.

David J.


Re: [HACKERS] Draft release notes for next week's releases

2016-03-27 Thread David G. Johnston
On Sun, Mar 27, 2016 at 8:43 PM, Peter Geoghegan  wrote:

> On Sat, Mar 26, 2016 at 4:34 PM, Tom Lane  wrote:
> > Probably the most discussion-worthy item is whether we can say
> > anything more about the strxfrm mess.  Should we make a wiki
> > page about that and have the release note item link to it?
>
> I think that there is an argument against doing so, which is that
> right now, all we have to offer on that are weasel words. However, I'm
> still in favor of a Wiki page, because I would not be at all surprised
> if our understanding of this problem evolved, and we were able to
> offer better answers in several weeks. Realistically, it will probably
> take at least that long before affected users even start to think
> about this.
>

One question to debate is whether publishing a list of "known" failures
(collated from the program runs lots of people performed) would do more
harm than good.
Personally I'd rather see a list of known failures and evaluate my
situation objectively (i.e., large index but no reported problem on my
combination of locale and platform).  I understand that a lack of evidence
is not proof that I am unaffected at this stage in the game.  Having
something I can execute on my server to try and verify behavior -
irrespective of the correctness of the indexes themselves - would be
welcomed.

David J.


Re: [HACKERS] syntax sugar for conditional check

2016-04-01 Thread David G. Johnston
On Fri, Apr 1, 2016 at 3:22 PM, Jim Nasby  wrote:

> On 4/1/16 1:08 PM, Tom Lane wrote:
>
>> Jim Nasby  writes:
>>
>>> Rather than this, I think an exclusive-or operator would be a lot more
>>> useful. The only difficulty I run into with CHECK constaints is when I
>>> want to ensure that only ONE condition is true.
>>>
>>
>> "bool != bool" works as XOR.  If you need "exactly one of N" you could
>> do something like "(cond1::int + cond2::int + ...) = 1".  We could
>> wrap some syntactic sugar around either of these, but it's not clear
>> to me that it'd be any more useful than a custom SQL function.
>>
>
> It would prevent having to re-create that function every time... :)


And it would nicely complement our recent addition of
num_nonnulls(variadic "any").

David J.


Re: [HACKERS] raw output from copy

2016-04-04 Thread David G. Johnston
On Fri, Apr 1, 2016 at 8:42 AM, Daniel Verite 
wrote:

> Andrew Dunstan wrote:
>
> > If someone can make a good case that this is going to be of
> > general use I'll happily go along, but I haven't seen one so far.
>
> About COPY FROM with a raw format, for instance just yesterday
> there was this user question on stackoverflow:
> http://stackoverflow.com/questions/36317237
>
> which essentially is: how to import contents from a file without any
> particular interpretation of any character?
>
> With the patch discussed in this thread, a user can do
> \copy table(textcol) from /path/to/file (format raw)
>

What is needed to solve this specific use-case is a way to specify "QUOTE
NONE" instead of the default for whatever format is being hijacked:

COPY file_content FROM '/tmp/textfile.txt' WITH (FORMAT csv, QUOTE
E'');

becomes

COPY file_content FROM '/tmp/textfile.txt' WITH (FORMAT csv, QUOTE NONE);

Or maybe: "WITH (FORMAT single_column)"

Though maybe that doesn't extend well to unencoded binary data...which
seems like it can be considered a separate problem from reliably importing
an entire file into a single row+column in a table.
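As an aside, a workaround sometimes suggested for the verbatim-import case is to hijack the CSV options with control bytes that cannot occur in a text file. A hedged sketch (the choice of bytes is an assumption, and note this still yields one row per input line rather than one row for the whole file):

```sql
COPY file_content FROM '/tmp/textfile.txt'
    WITH (FORMAT csv, QUOTE E'\x01', DELIMITER E'\x02');
```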

David J.


Re: [HACKERS] Request - repeat value of \pset title during \watch interations

2016-03-21 Thread David G. Johnston
On Monday, March 21, 2016, Tom Lane <t...@sss.pgh.pa.us> wrote:

> "David G. Johnston" <david.g.johns...@gmail.com <javascript:;>> writes:
> > I'll admit it's awkward because it's abbreviated but if someone enters
> > \watch 5 and then sees (5s) in the title I think they can put two and two
> > together.
>
> Where I find this to be awkward is that the format is randomly different
> between the user-title and no-user-title cases.
>
> What about just discarding the old format entirely, and printing one of
> these two things:
>
> Timestamp (every Ns)
>
> User Given Title  Timestamp (every Ns)
>
>
This works for me.

David J.


Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text

2016-04-29 Thread David G. Johnston
On Fri, Apr 29, 2016 at 7:15 AM, Merlin Moncure  wrote:

> Andrew mentions several solutions.  I like them all except I would
> prefer not to introduce a GUC for controlling the output format.  I do
> not think it's a good idea to set the expectation that clients can
> rely on text out byte for byte for any type including json.
>
+1

I agree on the GUC point and on the general desirability of making jsonb
output not include insignificant whitespace.

There seem to be enough coders who agree with this principle: could one of
you please write a patch and start a new thread specifically for this
change?  If we go that route, the need for the subject of this thread
becomes moot.
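For comparison, most JSON serializers treat minimized output as a one-line option; Python's json module, for instance, drops the insignificant whitespace via its separators argument:

```python
import json

doc = {"a": 1, "b": [1, 2, 3]}

pretty = json.dumps(doc, indent=2)                # whitespace-heavy form
compact = json.dumps(doc, separators=(",", ":"))  # no insignificant whitespace

print(compact)  # {"a":1,"b":[1,2,3]}
```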

Thanks!

David J.


Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text

2016-04-27 Thread David G. Johnston
On Sun, Apr 24, 2016 at 3:02 PM, Sehrope Sarkuni  wrote:

> It'd be nice to have a stable text representation of a jsonb value with
> minimal whitespace. The latter would also save a few bytes per record in
> text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).
>
>
> Attached is a *very* work in progress patch that adds a
> jsonb_compact(jsonb)::text function. It generates a text representation
> without extra whitespace but does not yet try to enforce a stable order of
> the properties within a jsonb value.
>
This doesn't impact backups unless you do away with the function and change
the type's text output function.  In that scenario the function is no
longer necessary.

David J.


Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text

2016-04-27 Thread David G. Johnston
On Sun, Apr 24, 2016 at 3:02 PM, Sehrope Sarkuni  wrote:

> Attached is a *very* work in progress patch that adds a
> jsonb_compact(jsonb)::text function. It generates a text representation
> without extra whitespace but does not yet try to enforce a stable order of
> the properties within a jsonb value.
>

I think that having a jsonb_compact function that complements the existing
jsonb_pretty function is a well-scoped and acceptable feature.  I do not
believe that it should also take on the role of canonicalization.

I'd suggest that any discussions regarding stability of jsonb output be
given its own thread.

That topic also seems separate from a discussion on how to implement a
binary transport protocol for jsonb.

Lastly, there is the question of whether we change our default text
representation so that users utilizing COPY get the added benefit of a
maximally minimized text representation.

As an aside on the last topic, has there ever been consideration of a way
to specify a serialization function to use for a given type (or maybe
column) in a COPY command?

Something like: COPY [...] WITH (jsonb USING jsonb_compact)

I'm thinking this would be hard and undesirable given the role COPY plays
and the limited intelligence it has, in order to maximize its efficiency in
fulfilling that role.

Backups get compressed already so bandwidth seems the bigger goal there.
Otherwise I'd say that we lack any kind of overwhelming evidence that
making such a change would be warranted.

While these are definitely related topics, it doesn't seem like any are
prerequisites for the others.  I think this thread is going to become hard
to follow and trail off if it continues to try to address all of these
topics at random as people see fit to reply.  And it will quickly become
hard for anyone to jump in and understand the topics at hand.

David J.


Re: [HACKERS] Accidentally parallel unsafe functions

2016-04-29 Thread David G. Johnston
On Fri, Apr 29, 2016 at 4:19 PM, Tom Lane  wrote:

> Alvaro Herrera  writes:
> > Andreas Karlsson wrote:
> >> I am currently looking into adding the correct parallel options to all
> >> functions in the extensions and I noticed that some built-in functions
> seems
> >> to have been marked as unsafe by accident. The main culprit is
> >> system_views.sql which redefines these functions and removes the
> parallel
> >> safe flag.
>
> > Surely CREATE OR REPLACE should keep whatever the flag was, rather than
> > ovewrite it with a bogus value if not specified?  In other words IMO the
> > CREATE OR REPLACE code needs changing, not system_views.sql.
>
> Absolutely not!  The definition of CREATE OR REPLACE is that at the end,
> the state of the object is predictable from only what the command says.
> This is not open for renegotiation.
>

To wit:

http://www.postgresql.org/docs/current/static/sql-createfunction.html

"""
When CREATE OR REPLACE FUNCTION is used to replace an existing function,
the ownership and permissions of the function do not change. All other
function properties are assigned the values specified or implied in the
command. You must own the function to replace it (this includes being a
member of the owning role).
"""

I'd interpret "specified or implied in the command" to mean exactly what
Tom said - and it applies to "all [other] function properties".

The ownership bit is a nice side effect since you can use a superuser to
replace existing functions that are already owned by another user.  That is
the only implied dynamic of function creation that exists within the CREATE
FUNCTION command - everything else is contained within the CREATE
FUNCTION DDL.
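The system_views.sql hazard follows directly from that reading; a hedged sketch against 9.6 catalogs (the function name is hypothetical; proparallel is the pg_proc column behind the PARALLEL flag):

```sql
CREATE FUNCTION add_one(i int) RETURNS int
    LANGUAGE sql PARALLEL SAFE AS 'SELECT i + 1';

-- Re-issuing the definition without PARALLEL SAFE resets the property to
-- its default, per "assigned the values specified or implied in the command".
CREATE OR REPLACE FUNCTION add_one(i int) RETURNS int
    LANGUAGE sql AS 'SELECT i + 1';

SELECT proparallel FROM pg_proc WHERE proname = 'add_one';  -- 'u' (unsafe)
```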
David J.


Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text

2016-04-28 Thread David G. Johnston
On Thu, Apr 28, 2016 at 10:00 AM, Ryan Pedela 
wrote:

> On Tue, Apr 26, 2016 at 10:49 AM, Stephen Frost 
> wrote:
>
>> * Ryan Pedela (rped...@datalanche.com) wrote:
>> > On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni 
>> wrote:
>> > > The default text representation of jsonb adds whitespace in between
>> > > key/value pairs (after the colon ":") and after successive properties
>> > > (after the comma ","):
>>
>> [...]
>>
>> > > It'd be nice to have a stable text representation of a jsonb value
>> with
>> > > minimal whitespace. The latter would also save a few bytes per record
>> in
>> > > text output formats, on the wire, and in backups (ex: COPY ... TO
>> STDOUT).
>> >
>> > +1
>> >
>> > I cannot comment on the patch itself, but I welcome jsonb_compact() or
>> some
>> > way to get JSON with no inserted whitespace.
>>
>> As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
>> compact JSON format to reduce the amount of traffic over the wire or to
>> do things with on the client side, we should probably come up with a
>> binary format, rather than just hack out the whitespace.  It's not like
>> representing numbers using ASCII characters is terribly efficient
>> either.
>
>
> Why build a Ferrari when a skateboard would suffice? Besides, that doesn't
> help one of the most common cases for JSONB: REST APIs.
>

I'm agreeing with this sentiment.  This isn't an either-or situation, so
argue the white-space removal on its own merits.  The fact that we might
implement a binary representation in the future doesn't, for me, influence
whether we make this white-space change now.


>
> Now that PG fully supports JSON, a user can use PG to construct the JSON
> payload of a REST API request. Then the web server would simply be a
> pass-through for the JSON payload. I personally have this use case, it is
> not hypothetical.
>
> However currently, a user must parse the JSON string from PG and
> re-stringify it to minimize the whitespace. Given that HTTP is text-based,
> removing all extraneous whitespace is the best way to compress it, and on
> top of that you can do gzip compression.
>

Can you clarify what you mean by "and on top of that you can do gzip
compression"?
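My reading of the claim, sketched in Python to make the stacking concrete (the document and sizes are illustrative only):

```python
import gzip
import json

doc = {"name": "example", "values": list(range(50))}

pretty = json.dumps(doc, indent=2).encode()
compact = json.dumps(doc, separators=(",", ":")).encode()

# Whitespace removal shrinks the payload before any compression...
print(len(compact) < len(pretty))  # True

# ...and gzip then applies on top of either form
# (e.g. HTTP Content-Encoding: gzip).
gz = gzip.compress(compact)
print(json.loads(gzip.decompress(gz)) == doc)  # True
```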

> Unless you are suggesting that the binary format is just a gzipped version
> of the minimized text format, I don't see how a binary format helps at all
> in the REST API case.
>
>
No, I'm pretty sure you still end up with uncompressed text in the
application layer.


> In addition, every JSON implementation I have ever seen fully minimizes
> JSON by default. PG appears to deviate from standard practice for no
> apparent reason.
>

Sorry to nit-pick but that's called convention - the standard is likely
silent on this point.  And it's not like this was done in a vacuum - why is
this only coming up now and not, say, during the beta?  Is it surprising
that this seemingly easy-to-overlook dynamic was not explicitly addressed
by the author and reviewer of the patch, especially when implementation of
said feature consisted of a lot more things of greater import and impact?

David J.


[HACKERS] Incomplete description for \t in psql documentation - should mention caption

2016-04-21 Thread David G. Johnston
http://www.postgresql.org/docs/9.5/interactive/app-psql.html

"""
\t
Toggles the display of output column name headings and row count footer.
This command is equivalent to \pset tuples_only and is provided for
convenience.
"""

Experience says that a table caption (i.e., \C) is also suppressed when the
option is used.

The documentation should be changed, though I'd argue that there is some
merit for still showing the caption.

The exact combination I wanted to try and make work was:
\C 'Table Caption'
\x \t
SELECT * FROM tbl LIMIT 1

The desired output would have been:

Table Caption
col1   | 1
col2   | 2
col3   | 3

This is also a bit misleading: \t suppresses column name headings -
unless viewed in expanded mode, in which case it suppresses the "ROW#"
block heading but leaves the column name headings (which just happen to be
placed in rows) in place.  This is indeed the desired behavior; it just
isn't precisely documented.

So:

\t
Toggles display of the row count footer.  In normal mode also suppresses
the column name headings.  In expanded mode suppresses the Record# block
header instead.  Also suppresses any caption that is set for the table.

I'm not sure how willing someone is to work these mechanics out - or the
desire to make them conditional on expanded versus table mode.  My
immediate needs can be readily solved by adding an additional column to my
output so that "type | Record Type" is the first column of the expanded
output.  Or just live with the redundant "-[ RECORD 1 ]-" header.

David J.


Re: [HACKERS] [PATCH] Add EXPLAIN (ALL) shorthand

2016-05-19 Thread David G. Johnston
On Thursday, May 19, 2016, David Christensen  wrote:

>
> > On May 19, 2016, at 3:17 PM, Евгений Шишкин wrote:
> >
> >
> >> On 19 May 2016, at 22:59, Tom Lane wrote:
> >>
> >> David Christensen writes:
> >>> This simple patch adds “ALL” as an EXPLAIN option as shorthand for
> “EXPLAIN (ANALYZE, VERBOSE, COSTS, TIMING, BUFFERS)” for usability.
> >>
> >> I'm not sure this is well thought out.  It would mean for example that
> >> we could never implement EXPLAIN options that are mutually exclusive
> >> ... at least, not without having to redefine ALL as
> all-except-something.
> >> Non-boolean options would be problematic as well.
> >>
> >
> > Maybe EVERYTHING would be ok.
> > But it is kinda long word to type.
>
> If it’s just a terminology issue, what about EXPLAIN (*); already a
> precedent with SELECT * to mean “everything”.  (MAX?
> LIKE_I’M_5?) Let the bikeshedding begin!


+1


> In any case, I think a shorthand for “give me the most possible detail
> without me having to lookup/type/remember the options” is a good tool.
>

EXPLAIN THIS
EXPLAIN BETTER

EXPLAIN ABCTV (might need permission to document this variation though)

The latter has some resemblance to short-form options on command lines.

The bigger question is do we want to solve this in the server or let it be
a client concern and stick some usability enhancements into psql?  Maybe
even put it in psql first then migrate to the server after some field
experience.

The middle road is probably something like the following:

We could set up a GUC for "default_explain_options" that the user could set,
or have set for them via ALTER ROLE.  Maybe we'd want to include a new
option "CLEAN" or "NONE" that tells the system to skip those defaults and
use only the options explicitly specified in the SQL command.  Basically
an envvar like many command-line apps use in lieu of an .rc file, but with a
simpler way to disable it than setting it to nothing.
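For what it's worth, psql's variable interpolation can already approximate the client-side route today; a sketch (the variable name is mine):

```sql
-- In ~/.psqlrc or interactively:
\set ea 'EXPLAIN (ANALYZE, VERBOSE, COSTS, TIMING, BUFFERS)'

-- Then prefix any query with the variable:
:ea SELECT count(*) FROM pg_class;
```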

David J.


Re: [HACKERS] Calling json_* functions with JSONB data

2016-05-23 Thread David G. Johnston
On Mon, May 23, 2016 at 12:55 PM, Peter van Hardenberg  wrote:

> Hi there,
>
> I noticed it was very easy to accidentally call the json_* form of JSON
> manipulation functions with jsonb data as input. This is pretty
> sub-optimal, since it involves rendering the jsonb then reparsing it and
> calling the json_* form of the function.
>
> Fortunately, this seems quite easy to resolve by taking advantage of our
> ability to add json_*(jsonb) form of the functions.
>
> I talked this over with Andrew who had no objections and suggested I float
> it on the list before writing a patch. Looks pretty straightforward, just a
> few new data rows in pg_proc.h.
>
> Anyone have any concerns or suggestions?
>
>
Please provide an example of what you are talking about.

SELECT json_array_length('[1,2]'::jsonb)
ERROR: function json_array_length(jsonb) does not exist

-- The function name is "jsonb_array_length"; and there is no implicit cast
between the two.
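For completeness, the spellings that do work in 9.5 (the json/jsonb casts exist, but only as explicit casts):

```sql
SELECT jsonb_array_length('[1,2]'::jsonb);       -- 2, the jsonb_* twin
SELECT json_array_length('[1,2]'::jsonb::json);  -- 2, via an explicit cast
```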

David J.


Re: [HACKERS] Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls

2016-05-23 Thread David G. Johnston
On Mon, May 23, 2016 at 12:01 PM, Jeff Davis <pg...@j-davis.com> wrote:

> On Fri, May 20, 2016 at 1:41 PM, David G. Johnston
> <david.g.johns...@gmail.com> wrote:
> > How does the relatively new FILTER clause play into this, if at all?
>
> My interpretation of the standard is that FILTER is not allowable for
> a window function, and IGNORE|RESPECT NULLS is not allowable for an
> ordinary aggregate.
>
> So if we support IGNORE|RESPECT NULLS for anything other than a window
> function, we have to come up with our own semantics.
>
> > We already have "STRICT" for deciding whether a function processes nulls.
> > Wouldn't this need to exist on the "CREATE AGGREGATE"
>
> STRICT defines behavior at DDL time. I was suggesting that we might
> want a DDL-time flag to indicate whether a function can make use of
> the query-time IGNORE|RESPECT NULLS option. In other words, most
> functions wouldn't know what to do with IGNORE|RESPECT NULLS, but
> perhaps some would if we allowed them the option.
>
> Perhaps I didn't understand your point?
>

The "this" in the quote doesn't refer to STRICT but rather to the
non-existent feature that we are talking about.

I am trying to make the point that the DDL syntax for "RESPECT|IGNORE
NULLS" should be on the "CREATE AGGREGATE" page and not the "CREATE
FUNCTION" page.

As far as a plain function cares its only responsibility with respect to
NULLs is defined by the STRICT property.  As this behavior only manifests
in aggregate situations the corresponding property should be defined there
as well.

David J.


Re: [HACKERS] Changed SRF in targetlist handling

2016-05-23 Thread David G. Johnston
On Mon, May 23, 2016 at 1:44 PM, David Fetter  wrote:

> On Mon, May 23, 2016 at 01:36:57PM -0400, Tom Lane wrote:
> > David Fetter  writes:
> > > On Mon, May 23, 2016 at 01:10:29PM -0400, Tom Lane wrote:
> > >> This seems a bridge too far to me.  It's just way too common to do
> > >> "select generate_series(1,n)".  We could tell people they have to
> > >> rewrite to "select * from generate_series(1,n)", but it would be far
> > >> more polite to do that for them.
> >
> > > How about making "TABLE generate_series(1,n)" work?  It's even
> > > shorter in exchange for some cognitive load.
> >
> > No thanks --- the word after TABLE ought to be a table name, not some
> > arbitrary expression.  That's way too much mess to save one keystroke.
>
> It's not just about saving a keystroke.  This change would go with
> removing the ability to do SRFs in the target list of a SELECT
> query.
>

If you want to make an argument for doing this regardless of the
target-list SRF change, by all means do so - but it does absolutely nothing
to mitigate the breakage that would result if we choose this path.

David J.
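For readers following along, the rewrite under discussion is:

```sql
-- SRF in the target list (the form being debated):
SELECT generate_series(1, 3);

-- Equivalent FROM-clause form, which would remain supported:
SELECT * FROM generate_series(1, 3);
```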


Re: [HACKERS] Changed SRF in targetlist handling

2016-05-23 Thread David G. Johnston
On Mon, May 23, 2016 at 4:24 PM, Alvaro Herrera 
wrote:

> Tom Lane wrote:
> > Joe Conway  writes:
>
> > > I'll also note that, unless I missed something, we also have to
> consider
> > > that the capability to pipeline results is still only available in the
> > > target list.
> >
> > Yes, we would definitely want to improve nodeFunctionscan.c to perform
> > better for ValuePerCall SRFs.  But that has value independently of this.
>
> Ah, so that's what "pipeline results" mean!  I hadn't gotten that.  I
> agree; Abhijit had a patch or a plan for this, a long time ago ...
>
>
Is this sidebar strictly an implementation detail, not user-visible?

David J.


Re: [HACKERS] Changed SRF in targetlist handling

2016-05-23 Thread David G. Johnston
On Mon, May 23, 2016 at 4:05 PM, David Fetter  wrote:

> On Mon, May 23, 2016 at 02:39:54PM -0500, Merlin Moncure wrote:
> > On Mon, May 23, 2016 at 2:13 PM, David Fetter  wrote:
> > > On Mon, May 23, 2016 at 01:28:11PM -0500, Merlin Moncure wrote:
> > >> On Mon, May 23, 2016 at 12:10 PM, Tom Lane  wrote:
> > >> > Andres Freund  writes:
> > >> >> discussing executor performance with a number of people at pgcon,
> > >> >> several hackers - me included - complained about the additional
> > >> >> complexity, both code and runtime, required to handle SRFs in the
> target
> > >> >> list.
> > >> >
> > >> > Yeah, this has been an annoyance for a long time.
> > >> >
> > >> >> One idea I circulated was to fix that by interjecting a special
> executor
> > >> >> node to process SRF containing targetlists (reusing Result
> possibly?).
> > >> >> That'd allow to remove the isDone argument from
> ExecEval*/ExecProject*
> > >> >> and get rid of ps_TupFromTlist which is fairly ugly.
> > >> >
> > >> > Would that not lead to, in effect, duplicating all of execQual.c?
> The new
> > >> > executor node would still have to be prepared to process all
> expression
> > >> > node types.
> > >> >
> > >> >> Robert suggested - IIRC mentioning previous on-list discussion - to
> > >> >> instead rewrite targetlist SRFs into lateral joins. My gut feeling
> is
> > >> >> that that'd be a larger undertaking, with significant semantics
> changes.
> > >> >
> > >> > Yes, this was discussed on-list awhile back (I see David found a
> reference
> > >> > already).  I think it's feasible, although we'd first have to agree
> > >> > whether we want to remain bug-compatible with the old
> > >> > least-common-multiple-of-the-periods behavior.  I would vote for
> not,
> > >> > but it's certainly a debatable thing.
> > >>
> > >> +1 on removing LCM.
> > >
> > > As a green field project, that would make total sense.  As a thing
> > > decades in, it's not clear to me that that would break less stuff or
> > > break it worse than simply disallowing SRFs in the target list, which
> > > has been rejected on bugward-compatibility grounds.  I suspect it
> > > would be even worse because disallowing SRFs in target lists would at
> > > least be obvious and localized when it broke code.
> >
> > If I'm reading this correctly, it sounds to me like you are making the
> > case that removing target list SRF completely would somehow cause less
> > breakage than say, rewriting it to a LATERAL based implementation for
> > example.
>
> Yes.
>
> Making SRFs in target lists throw an error is a thing that will be
> pretty straightforward to deal with in extant code bases, whatever
> size of pain in the neck it might be.  The line of code that caused
> the error would be very clear, and the fix would be very obvious.
>
> Making their behavior different in some way that throws no warnings is
> guaranteed to cause subtle and hard to track bugs in extant code
> bases.


I'm advocating that any presently allowed SRF-in-target-list that remains
supported execute using the same semantics it has today.  In all other
cases, including LCM, if the present behavior is undesirable to maintain we
throw an error.  I'd hope that such an error can be written so as to name
the offending function or functions.

If the user of a complex query doesn't want to expend the effort to locate
the specific instance of SRF that is in violation they will still have the
option to rewrite all of their uses in that particular query.

David J.


Re: [HACKERS] Changed SRF in targetlist handling

2016-05-23 Thread David G. Johnston
On Mon, May 23, 2016 at 4:15 PM, Tom Lane  wrote:

> Merlin Moncure  writes:
> > On Mon, May 23, 2016 at 2:13 PM, David Fetter  wrote:
> >> On Mon, May 23, 2016 at 01:28:11PM -0500, Merlin Moncure wrote:
> >>> +1 on removing LCM.
>
> >> As a green field project, that would make total sense.  As a thing
> >> decades in, it's not clear to me that that would break less stuff or
> >> break it worse than simply disallowing SRFs in the target list, which
> >> has been rejected on bugward-compatibility grounds.  I suspect it
> >> would be even worse because disallowing SRFs in target lists would at
> >> least be obvious and localized when it broke code.
>
> > If I'm reading this correctly, it sounds to me like you are making the
> > case that removing target list SRF completely would somehow cause less
> > breakage than say, rewriting it to a LATERAL based implementation for
> > example.  With more than a little forbearance, let's just say I don't
> > agree.
>
> We should consider single and multiple SRFs in a targetlist as distinct
> use-cases; only the latter has got weird properties.
>
> There are several things we could potentially do with multiple SRFs in
> the same targetlist.  In increasing order of backwards compatibility and
> effort required:
>
> 1. Throw error if there's more than one SRF.
>
> 2. Rewrite into LATERAL ROWS FROM (srf1(), srf2(), ...).  This would
> have the same behavior as before if the SRFs all return the same number
> of rows, and otherwise would behave differently.
>
> 3. Rewrite into some other construct that still ends up as a FunctionScan
> RTE node, but has the old LCM behavior if the SRFs produce different
> numbers of rows.  (Perhaps we would not need to expose this construct
> as something directly SQL-visible.)
>
> It's certainly arguable that the common use-cases for SRF-in-tlist
> don't have more than one SRF per tlist, and thus that implementing #1
> would be an appropriate amount of effort.  It's worth noting also that
> the LCM behavior has been repeatedly reported as a bug, and therefore
> that if we do #3 we'll be expending very substantial effort to be
> literally bug-compatible with ancient behavior that no one in the
> current development group thinks is well-designed.  As far as #2 goes,
> it would have the advantage that code depending on the same-number-of-
> rows case would continue to work as before.  David has a point that it
> would silently break application code that's actually depending on the
> LCM behavior, but how much of that is there likely to be, really?
>
> [ reflects a bit... ]  I guess there is room for an option 2-and-a-half:
>
> 2.5. Rewrite into LATERAL ROWS FROM (srf1(), srf2(), ...), but decorate
> the FunctionScan RTE to tell the executor to throw an error if the SRFs
> don't all return the same number of rows, rather than silently
> null-padding.  This would have the same behavior as before for the sane
> case, and would be very not-silent about cases where apps actually invoked
> the LCM behavior.  Again, we wouldn't necessarily have to expose such an
> option at the SQL level.  (Though it strikes me that such a restriction
> could have value in its own right, analogous to the STRICT options that
> we've invented in some other places to allow insisting on the expected
> numbers of rows being returned.  ROWS FROM STRICT (srf1(), srf2(), ...),
> anybody?)
>

I'd let the engineers decide between 1, 2.5, and 3 - but if we accomplish
our goals while implementing #3, I'd say that would be the best outcome for
the community as a whole.

We don't have the luxury of providing a safe-usage mode where people
writing new queries get the error but pre-existing queries are considered
OK.  We will have to rely upon education and deal with the occasional bug
report but our long-time customers, even if only a minority would be
affected, will appreciate the effort taken to not break code that has been
working for a long time.

The minority is likely small enough to at least make options 1 and 2.5
viable, though I'd think we should make an effort to avoid #1.
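For anyone unfamiliar with the LCM behavior under debate: when target-list SRFs return different numbers of rows, the old executor cycles each until the output reaches the least common multiple of their lengths. A plain-Python sketch of those semantics:

```python
from math import gcd

def lcm_rows(a, b):
    # Mimics the historic behavior of e.g.
    #   SELECT generate_series(1,2), generate_series(1,3);
    # each list cycles until lcm(len(a), len(b)) rows are produced.
    n = len(a) * len(b) // gcd(len(a), len(b))
    return [(a[i % len(a)], b[i % len(b)]) for i in range(n)]

rows = lcm_rows([1, 2], [1, 2, 3])
print(len(rows))   # 6
print(rows[0])     # (1, 1)
print(rows[-1])    # (2, 3)
```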

David J.


Re: [HACKERS] [BUGS] BUG #14155: bloom index error with unlogged table

2016-05-24 Thread David G. Johnston
Moving my griping to -hackers only

On Tue, May 24, 2016 at 8:08 PM, Tom Lane  wrote:

> dig...@126.com writes:
> > postgres=# create unlogged table u_tbl (id int);
> > CREATE TABLE
> > postgres=# create index idx_u_tbl on u_tbl using bloom (id);
> > ERROR:  index "idx_u_tbl" already contains data
>
> Yeah, it looks like nobody ever tested bloom's unlogged-index support;
> it doesn't work or even come very close to working.  Will fix, thanks
> for the report!
>

​I'll tack on my own gripe here, just because.

It doesn't give me a lot of confidence in what was committed when the
summary sentence for the module says:

"
bloom is a module which implements an index access method. It comes as an
example of custom access methods and generic WAL records usage. But it is
also useful in itself.
"


Honestly, as a user I couldn't care less that bloom is "an example custom
access method".  I want to know what it does and that it does so reliably,
and has an easy-to-use interface.  I complained earlier about its lack of
direct support for the boolean type.  Teodor's response on the thread
wasn't particularly encouraging:

https://www.postgresql.org/message-id/5718a59d.4090...@sigaev.ru

I also see that the following -hacker thread didn't get resolved:

https://www.postgresql.org/message-id/cakfquwykrepeselfwb0b85dat466lely8ao-okpwaqpwtmg...@mail.gmail.com

I would not be surprised to see additional problems crop up in the module.
Tom's characterization above just reinforces that.

David J.


Re: [HACKERS] Calling json_* functions with JSONB data

2016-05-23 Thread David G. Johnston
On Mon, May 23, 2016 at 4:37 PM, Tom Lane  wrote:

> Peter van Hardenberg  writes:
> > Great question, Marko. If you can point me towards an example I'll take a
> > look, but I'll proceed with the current understanding and suggestions and
> > see what people have to say.
>
> I believe Marko's just complaining about the case for unknown-type
> arguments, for example:
>
> regression=# select json_array_length('[1,2,3]');
>  json_array_length
> ---
>  3
> (1 row)
>
> The parser has no trouble resolving this because there is only one
> json_array_length(); but if there were two, it would fail to make a
> determination of which one you meant.
>
> AFAICS the only way to fix that would be to introduce some preference
> between the two types.  For example, we could move both 'json' and 'jsonb'
> into their own typcategory ('J' is unused...) and then mark 'jsonb' as
> the preferred type in that category.  This would require a fair amount of
> experimentation to determine if it upsets any cases that work conveniently
> today; but right offhand I don't see any fatal problems with such an idea.
>
> regards, tom lane
>

I guess the relevant point in the documentation is the parenthetical
sentence:

(The processing functions consider the last value as the operative one.)

http://www.postgresql.org/docs/9.5/static/datatype-json.html

That normalizes the behaviors of jsonb and json as they pass through one
of these functions.  Though only the duplicate-key case is noted, which
means the white-space (immaterial) and key-order (potentially material)
behaviors still differ; the latter coming through a function unscathed is
not something that users should be relying upon.  Specifically I'm thinking
of the behavior that "json_each(...)" would exhibit.
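A sketch of the duplicate-key behavior in question (per the linked page: json stores the input text verbatim, while jsonb normalizes on input):

```sql
-- json keeps the input exactly as written, duplicates and all
SELECT '{"a":1,"a":2}'::json;    -- {"a":1,"a":2}

-- jsonb keeps only the last duplicate and may reorder keys
SELECT '{"a":1,"a":2}'::jsonb;   -- {"a": 2}

-- a processing function treats the last value as operative even for json
SELECT '{"a":1,"a":2}'::json -> 'a';   -- 2
```

Whether json_each(...) follows that last-value rule or emits both pairs is exactly the open question above.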

David J.


Re: [HACKERS] Changed SRF in targetlist handling

2016-05-23 Thread David G. Johnston
On Mon, May 23, 2016 at 4:42 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> "David G. Johnston" <david.g.johns...@gmail.com> writes:
> > On Mon, May 23, 2016 at 4:24 PM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> > wrote:
> >> Ah, so that's what "pipeline results" mean!  I hadn't gotten that.  I
> >> agree; Abhijit had a patch or a plan for this, a long time ago ...
>
> > ​Is this sidebar strictly an implementation detail, not user visible?
>
> Hmm.  It could be visible in the sense that the execution of multiple
> functions in one ROWS FROM() construct could be interleaved, while
> (I think) the current implementation runs each one to completion
> serially.  But if you're writing code that assumes that, I think you
> should not be very surprised when we break it.  In any case, that
> would not affect the proposed translation for SRFs-in-tlist, since
> those have that behavior today.
>

Thanks

Sounds like "zipper results" would be a better term for it...but, yes, if
that's the general context it falls on the implementation side from my
perspective.

But then I don't get Joe's point - if it's an implementation detail, why
should it matter if rewriting the SRF-in-tlist to be laterals changes
execution from a serial to an interleaved implementation?  Plus, Joe's
claim: "the capability to pipeline results is still only available in the
target list", and yours above are at odds, since you claim the rewritten
behavior is the same today.  Is there a disconnect in knowledge or are you
talking about different things?

​David J.​


Re: [HACKERS] Calling json_* functions with JSONB data

2016-05-23 Thread David G. Johnston
On Mon, May 23, 2016 at 5:38 PM, Jim Nasby  wrote:

> On 5/23/16 11:55 AM, Peter van Hardenberg wrote:
>
>> Fortunately, this seems quite easy to resolve by taking advantage of our
>> ability to add json_*(jsonb) form of the functions.
>>
>
> Another issue no one has mentioned is functions that return JSON/JSONB.
> IMO those should NOT be overloaded, because that will make it very easy to
> accidentally change from one type to the other without meaning to.


​Actually, by definition they cannot be overloaded.  A function's signature
is derived from its input types only.

http://www.postgresql.org/docs/devel/static/sql-createfunction.html

​"""
​The name of the new function must not match any existing function with the
same input argument types in the same schema. However, functions of
different argument types can share a name (this is called overloading).
"""

Admittedly the absence of "output" is not emphasized but overloading in
(most?) languages (small sample size for personal knowledge) is subject to
the same restriction.
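A minimal demonstration of that rule (hypothetical function name f; the second statement fails even though the return types differ):

```sql
CREATE FUNCTION f(x int) RETURNS json
  LANGUAGE sql AS $$ SELECT '{}'::json $$;

-- Same name and same input types means the same signature,
-- regardless of the return type:
CREATE FUNCTION f(x int) RETURNS jsonb
  LANGUAGE sql AS $$ SELECT '{}'::jsonb $$;
-- ERROR:  function "f" already exists with same argument types
```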

David J.
​


Re: [HACKERS] Changed SRF in targetlist handling

2016-05-22 Thread David G. Johnston
tl;dr

Semantic changes to SRF-in-target-list processing are undesirable when they
are all but deprecated.

I'd accept a refactoring that trades a performance gain for unaffected
queries against a reasonable performance hit to those afflicted.

Preamble...

Most recent thread that I can recall seeing on the topic - and where I
believe the rewrite idea was first presented.

http://www.postgresql.org/message-id/flat/25750.1458767...@sss.pgh.pa.us#25750.1458767...@sss.pgh.pa.us

On Sun, May 22, 2016 at 8:53 PM, Andres Freund  wrote:

> Hi,
>
> discussing executor performance with a number of people at pgcon,
> several hackers - me included - complained about the additional
> complexity, both code and runtime, required to handle SRFs in the target
> list.
>
> One idea I circulated was to fix that by interjecting a special executor
> node to process SRF containing targetlists (reusing Result possibly?).
> That'd allow to remove the isDone argument from ExecEval*/ExecProject*
> and get rid of ps_TupFromTlist which is fairly ugly.
>

​Conceptually I'm all for minimizing the impact on queries of this form.
It seems to be the most likely to get written and committed and the least
likely to cause unforeseen issues.
​


> Robert suggested - IIRC mentioning previous on-list discussion - to
> instead rewrite targetlist SRFs into lateral joins. My gut feeling is
> that that'd be a larger undertaking, with significant semantics changes.
>
​[...]​

> If we accept bigger semantical changes, I'm inclined to instead just get
> rid of targetlist SRFs in total; they're really weird and not needed
> anymore.
>

I cannot see these, in isolation, being a good option.  Nonetheless, I
don't think any semantic change should happen until 9.2 is no longer
supported.  I'd be inclined to take a similar approach as with
standard_conforming_strings (minus the execution guc, just the warning one)
with whatever after-the-fact learning taken into account.

It's worth considering query rewrite and making the construct forbidden as
a joint goal.

For something like a canonical version of this, especially for
composite-returning SRFs:

WITH func_call AS (
    SELECT func(tbl.col)
    FROM tbl
)
SELECT (func_call.func).*
FROM func_call;

If we can rewrite the CTE portion into a lateral - with the exact same
semantics (specifically, returning the single-column composite) - then in
the rewritten query the select-list SRF would no longer be present and no
error would be thrown.

For situations where a rewrite cannot be made to behave properly we leave
the construct alone and let the query raise an error.

In considering what I just wrote I'm not particularly enamored with
it...hence my overall conclusion.  I can't say I hate it, and after
re-reading the aforementioned thread I'm inclined to like it for cases
where, for instance, we are susceptible to an LCM evaluation.

David J.


Re: [HACKERS] Adding an alternate syntax for Phrase Search

2016-05-22 Thread David G. Johnston
On Sun, May 22, 2016 at 6:53 PM, Teodor Sigaev  wrote:

>
> to_tsquery(' Berkus & "PostgreSQL Version 10.0" ')
>>
>> ... would be equivalent to:
>>
>> to_tsquery(' Berkus & ( PostgreSQL <-> version <-> 10.0 )')
>>
>
> select to_tsquery('Berkus') && phraseto_tsquery('PostgreSQL Version 10.0');
> does it as you wish


Sure, but I imagine (not having used it myself) that in cases involving
user input said text is treated somewhat holistically, and it wouldn't be
all that easy, or desirable, to choose between the two forms at runtime.

David J.​


Re: [HACKERS] Adding an alternate syntax for Phrase Search

2016-05-22 Thread David G. Johnston
On Sun, May 22, 2016 at 3:00 PM, Thom Brown  wrote:

> On 22 May 2016 at 18:52, Josh berkus  wrote:
> > Folks,
> >
> > This came up at pgCon.
> >
> > The 'word <-> word <-> word' syntax for phrase search is not
> > developer-friendly.  While we need the <-> operator for SQL and for the
> > sophisticated cases, it would be really good to support an alternate
> > syntax for the simplest case of "words next to each other".  My proposal
> > is enclosing the phrase in double-quotes, which would be intuitive to
> > users and familiar from search engines.  Thus:
> >
> > to_tsquery(' Berkus & "PostgreSQL Version 10.0" ')
> >
> > ... would be equivalent to:
> >
> > to_tsquery(' Berkus & ( PostgreSQL <-> version <-> 10.0 )')
> >
> > I realize we're already in beta, but pgCon was actually the first time I
> > saw the new syntax.  I think if we don't do this now, we'll be doing it
> > for 10.0.
>
> I think it's way too late for that.  I don't see a problem with
> including it for 10.0, but when the feature freeze has long passed and
> we also have our first beta out, it's no longer a matter of changing
> the design or additional functionality, unless there's something that
> absolutely requires modification.  This isn't that.


​Particularly in light of our annual major release cycle we need to be open
to usability recommendations during Beta 1 (at minimum).  Not everyone with
intelligence, insight, and meaningful uses for our product and features
follows -hackers and compiles from source to try things out during
development.  We should encourage these others to at least voice their
opinions on the new features.

It's not like we get inundated with these kinds of requests.  Let it remain
mostly a resource concern.  If a few people can agree on desirability and
get a patch written, reviewed, and ready-for-commit before the next beta
release then the release committee, with input from the community, can be
the final arbiter of whether to back-patch it into 9.6 or keep it for 10.0.

I'd like to think that features are the "top-level capabilities" that we
introduce - this is a sub-component of the "phrase search" feature.
Component freeze should occur no earlier than after the second packaged
release.  I'd generally rather have feature freeze earlier and use the
added time for component work and additional general testing if keeping on
the yearly cycle doesn't allow for both.  But, I'm tending to think that we
are that tightly constrained generally.

David J.


Re: [HACKERS] [PATCH] Add EXPLAIN (ALL) shorthand

2016-05-21 Thread David G. Johnston
On Sat, May 21, 2016 at 10:32 AM, Euler Taveira 
wrote:

> On 20-05-2016 20:34, Robert Haas wrote:
> > Hmm, my experience is different.  I use EXPLAIN (ANALYZE, VERBOSE) a
> > lot, but EXPLAIN (ANALYZE, BUFFERS) only rarely.  I wonder if a GUC is
> > the way to go.
> >
> I wouldn't like a command output controlled by GUC. EXPLAIN is used a
> lot in bug/performance reports.


And most of the time the choice of options is totally arbitrary, based upon
the mood and experience of the user, so what does it matter if they saved a
few keystrokes and set the GUC in the .psqlrc file?

I'm predicting users that will have
> trouble while using EXPLAIN if someone change the suggested GUC. It also
> breaks clients/applications that parse EXPLAIN.


​Pretty much the same argument as above.​

I would not expect a DBA to set this value globally - but shame on them if
they do.  I'd expect either ALTER ROLE or SET usage, in .psqlrc if
applicable, to be the dominant usage for setting the value to a non-empty
string.  There is UI to consider here but I don't see any fundamental
problems.
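The intended usage pattern would look something like the following (the GUC name explain_default_options is purely hypothetical, invented here for illustration):

```sql
-- Per-role default; leaves other users' bug-report EXPLAIN output alone
ALTER ROLE analyst SET explain_default_options = 'analyze, verbose';

-- Or per-session, e.g. issued from .psqlrc
SET explain_default_options = 'analyze, buffers';
```

A plain `EXPLAIN query;` would then behave as if those options had been typed out each time.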

I don't want another
> bytea_output. However, if someone wants to suggest turning on/off some
> option defaults, I'm all ears.
>

I don't see the resemblance.  There was, and is, no way to change the
output format of bytea within the SELECT SQL statement, only via the GUC.
But with explain, everything you could do with the GUC can be, and already
is, done by people simply tossing or excluding options on the EXPLAIN SQL
statement they write.

​All this does is allow the user to setup their preferred defaults in said
GUC so they don't have to keep typing them in over and over again.  If the
GUC results in a broken output in whatever context you care about the user
doing it manually would have the same result.

David J.


Re: [HACKERS] Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls

2016-05-20 Thread David G. Johnston
Just doing a drive-by...

On Fri, May 20, 2016 at 3:50 PM, Jeff Davis  wrote:

> Old thread link:
>
> http://www.postgresql.org/message-id/CA+=vxna5_n1q5q5okxc0aqnndbo2ru6gvw+86wk+onsunjd...@mail.gmail.com
>
> On Thu, Apr 14, 2016 at 1:29 PM, Stephen Frost  wrote:
> > Jeff
> >
> > (Reviving an old thread for 2014...)

>
> > Would you have time to work on this for 9.7..?  I came across a
> > real-world use case for exactly this capability and was sorely
> > disappointed to discover we didn't support it even though there had been
> > discussion for years on it, which quite a few interested parties.
> >


> First, I think the syntax is still implemented in a bad way. Right now
> it's part of the OVER clause, and the IGNORE NULLS gets put into the
> frame options. It doesn't match the way the spec defines the grammar,
> and I don't see how it really makes sense that it's a part of the
> frame options or the window object at all.


​How does the relatively new FILTER clause play into this, if at all?

> I think we need catalog support to say
> whether a function honors IGNORE|RESPECT NULLS or not, which means we
> also need support in CREATE FUNCTION.
>

We already have "STRICT" for deciding whether a function processes nulls.
Wouldn't this need to exist on "CREATE AGGREGATE"?

Rhetorical question: I presume we are going to punt on the issue, since it
is non-standard, but what is supposed to happen with a window invocation
that ignores nulls but has a non-strict function that returns a non-null on
null input?
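For reference, STRICT today only controls whether the function body runs at all on null input - a different knob than IGNORE NULLS, which is about which input rows a window function considers:

```sql
-- STRICT: the body is never invoked for a null argument; result is null
CREATE FUNCTION add_one(i int) RETURNS int
  LANGUAGE sql STRICT AS $$ SELECT i + 1 $$;

SELECT add_one(NULL);  -- null, without executing the body
```

A non-strict function, by contrast, is free to return a non-null result on null input - which is what makes the rhetorical question above awkward.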

David J.


Re: [HACKERS] Does Type Have = Operator?

2016-05-11 Thread David G. Johnston
On Wed, May 11, 2016 at 9:54 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, May 10, 2016 at 9:16 PM, David G. Johnston
> <david.g.johns...@gmail.com> wrote:
> > Brute force: you'd have to query pg_amop and note the absence of a row
> with
> > a btree (maybe hash too...) family strategy 3 (1 for hash) [equality]
> where
> > the left and right types are the same and match the type in question.
>
> The core system uses this kind of thing to find equality operators in
> a number of cases.
>
> We often assume that the operator which implements equality for the
> type's default btree operator class is the canonical one for some
> purpose.  Ditto for the default hash operator class.
>

​Yeah, the user-facing documentation covers it pretty deeply if not in one
central location.

But apparently the core system also uses the fact that "=", if present, is
an equality operator and, less so, that no other operator is expected to be
used for equality.

I suspect that such an expectation is not enforced though - e.g., someone
could define "==" to mean equality if they so choose (the lesser
property).  It's hard to imagine defining "=" to mean something different
in logic, though, without intentionally trying to be cryptic.

David J.


Re: [HACKERS] Does Type Have = Operator?

2016-05-10 Thread David G. Johnston
On Tuesday, May 10, 2016, David E. Wheeler  wrote:

>
> This makes sense, of course, and I could fix it by comparing text values
> instead of json values when the values are JSON. But of course the lack of
> a = operator is not limited to JSON. So I’m wondering if there’s an
> interface at the SQL level to tell me whether a type has an = operator?
> That way I could always use text values in those situations.
>
>
http://www.postgresql.org/docs/9.5/interactive/catalog-pg-amop.html
http://www.postgresql.org/docs/9.5/interactive/xindex.html

Brute force: you'd have to query pg_amop and note the absence of a row with
a btree (maybe hash too...) family strategy 3 (1 for hash)
[equality] where the left and right types are the same and match the type
in question.

There is likely more to it - though absence is pretty much a given, I'd be
concerned about false negatives due to ignoring other factors like
"amoppurpose".
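The brute-force check might look like this (a sketch; subject to the caveats above, and it only inspects btree strategy 3):

```sql
-- Is there a same-type btree equality operator for a given type (json here)?
SELECT amop.amopopr::regoperator
FROM pg_amop amop
JOIN pg_am am ON am.oid = amop.amopmethod
WHERE am.amname = 'btree'
  AND amop.amopstrategy = 3                -- btree equality
  AND amop.amoplefttype  = 'json'::regtype
  AND amop.amoprighttype = 'json'::regtype;
-- zero rows back suggests the type has no btree equality operator
```

Swapping in `am.amname = 'hash'` and `amopstrategy = 1` covers the hash case mentioned above.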

In theory you should be able to trade off convenience for correctness by
calling:

to_regoperator('=(type,type)')

But I've never tried it and it assumes that = is the equality operator and
that its presence is sufficient.  I'm also guessing on the text type name
syntax.

http://www.postgresql.org/docs/9.5/interactive/functions-info.html

This option is a young one from what I remember.

David J.


Re: [HACKERS] Accurate list of Keywords / Datatypes?

2016-05-10 Thread David G. Johnston
On Saturday, May 7, 2016, Euler Taveira  wrote:

> On 07-05-2016 22:53, Robins Tharakan wrote:
> > Should I be looking somewhere else? Parse keywords from Git Source file
> > (if so where)? Parse PG Documentation?
> >
> src/include/parser/kwlist.h


http://www.postgresql.org/docs/9.5/interactive/functions-info.html

SELECT * FROM pg_get_keywords();

I don't know how the docs, this function, and the source code relate to
each other.

David J.


Re: [HACKERS] Does Type Have = Operator?

2016-05-10 Thread David G. Johnston
On Tuesday, May 10, 2016, Euler Taveira  wrote:
>
> Also, IS DISTINCT FROM is an alias for = operator per standard IIRC.
>

Technically "is not distinct from" would be more correct.  "Alias" implies
exact equivalence, while in the presence of nulls the two behave differently.

"is distinct from" ~ "<>" which is the canonical form (alias) for "!="

http://www.postgresql.org/docs/9.5/interactive/functions-comparison.html
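The null-handling difference is easy to demonstrate:

```sql
SELECT NULL = NULL;                     -- null (unknown), not true
SELECT NULL IS NOT DISTINCT FROM NULL;  -- true
SELECT NULL IS DISTINCT FROM NULL;      -- false
SELECT 1 <> 2, 1 != 2;                  -- true, true ("!=" is the alias)
```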

David J.


Re: [HACKERS] parallel.c is not marked as test covered

2016-05-11 Thread David G. Johnston
On Wed, May 11, 2016 at 10:38 AM, Robert Haas  wrote:

> On Wed, May 11, 2016 at 12:34 AM, Tom Lane  wrote:
> >> Hmm, that is strange.  I would have expected that to stuff a Gather on
> >> top of the Aggregate.  I wonder why it's not doing that.
> >
> > The reason is that create_plain_partial_paths() contains a hard-wired
> > decision not to generate any partial paths for relations smaller than
> > 1000 blocks, which means that no partial path will ever be generated
> > for *any* relation in the standard regression tests, force_parallel_mode
> > or no.
>
> Well that's an interesting theory, except that you've completely
> missed the point of force_parallel_mode.  force_parallel_mode pushes a
> special Gather node on top of any plan that is not already parallel in
> some way but which is parallel-safe.   That special Gather node runs
> only in the worker, not the leader, and always uses just one worker.
>

What happens when there are no workers available due to
max_worker_processes already being assigned?

Related question, if max_parallel_degree is >1 and "the requested number of
workers may not actually be available at runtime" is true, does the degree
of parallelism minimize at 1 worker + leader or will the leader simply run
the query by itself?

David J.


Re: [HACKERS] Parameters don't work in FETCH NEXT clause?

2016-05-17 Thread David G. Johnston
On Tue, May 17, 2016 at 12:15 PM, Shay Rojansky  wrote:

> Apologies, as usual I didn't read the docs carefully enough.
>
> On Tue, May 17, 2016 at 7:13 PM, Tom Lane  wrote:
>
>> Shay Rojansky  writes:
>> > A user of mine just raised a strange issue... While it is possible to
>> use a
>> > parameter in a LIMIT clause, PostgreSQL does not seem to allow using
>> one in
>> > a FETCH NEXT clause. In other words, while the following works:
>> > SELECT 1 LIMIT $1;
>> > The following generates a syntax error:
>> > SELECT 1 FETCH NEXT $1 ROWS ONLY;
>> > Since LIMIT and FETCH NEXT are supposed to be equivalent this behavior
>> is
>> > odd.
>>
>> Per the SELECT reference page:
>>
>> SQL:2008 introduced a different syntax to achieve the same result,
>> which PostgreSQL also supports. It is:
>>
>> OFFSET start { ROW | ROWS }
>> FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
>>
>> In this syntax, to write anything except a simple integer constant for
>> start or count, you must write parentheses around it.
>>
>> The comments about this in gram.y are informative:
>>
>>  * Allowing full expressions without parentheses causes various parsing
>>  * problems with the trailing ROW/ROWS key words.  SQL only calls for
>>  * constants, so we allow the rest only with parentheses.  If omitted,
>>  * default to 1.
>>
>
​Would something like this be valid?

OFFSET { start_literal | ( start_expression ) } { ROW | ROWS }
FETCH { FIRST | NEXT} [ count_literal | ( count_expression ) ] { ROW | ROWS
} ONLY

Leaving the mandatory parentheses detail to the description, while
adequate, seems insufficient - especially when a normal LIMIT expression is
not so restricted.
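Concretely (a sketch using prepared statements to supply the parameter):

```sql
PREPARE p1 AS SELECT 1 LIMIT $1;                   -- ok: LIMIT takes any expression
PREPARE p2 AS SELECT 1 FETCH NEXT ($1) ROWS ONLY;  -- ok: parenthesized
PREPARE p3 AS SELECT 1 FETCH NEXT $1 ROWS ONLY;    -- ERROR: syntax error
```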

And don't you think the section header would be more accurately named:

Limit, Offset & Fetch Clauses

The nuance regarding "different standard syntax" is unknown to the reader
who first looks at the syntax and sees three different lines, one for each
clause, and then scrolls down looking at headers until they find the
section for the clause they are interested in.  That FETCH is an alias for
LIMIT is not something that I immediately understood - though to be honest
I don't think I comprehended the presence of FETCH on a SELECT query at all
and thought it only pertained to cursors.

David J.


Re: [HACKERS] Keeping CURRENT_DATE and similar constructs in original format

2016-05-12 Thread David G. Johnston
On Thursday, May 12, 2016, Tom Lane <t...@sss.pgh.pa.us> wrote:

> "David G. Johnston" <david.g.johns...@gmail.com> writes:
> > On Thursday, May 12, 2016, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> (I'm not particularly in love with the node type name
> >> ValueFunction; anybody got a better idea?)
>
> > SQL99DateTimeFunction (or roughly whenever they were introduced)?
>
> Some of them aren't datetime-related, though.  I thought about
> NiladicFunction but it seemed maybe too technical.
>
>
The time ones taking precision confuse things a bit but my first reaction
was positive.  It is readily grepable.  I'd rather have that over
ValueFunction.

David J.


Re: [HACKERS] 10.0

2016-05-13 Thread David G. Johnston
On Friday, May 13, 2016, Tom Lane wrote:

> Robert Haas writes:
> > On Fri, May 13, 2016 at 2:49 PM, Tom Lane wrote:
>
> > If we don't want to stick with the current practice of debating when
> > to bump the same digit, then let's agree that 10.0 will follow 9.6 and
> > after that we'll bump the first digit after X.4, as we did with 7.X
> > and 8.X.
>
> It was absolute, utter chance that the 7.x and 8.x series both ended
> with .4.  I see no reason to turn that into some kind of standard,
> especially when it didn't work like that for 6.x or 9.x.
>

The underlying premise, for me, of choosing .4 or .5 was that presently we
discontinue support after 5 years/releases.  A new .0 would come out just
as we discontinue the previous .0.

As an added idea, if switching to major.patch, one option would be to make
10.0.x but then go to 11.x.  Though that's somewhat easier cognitively, it
would probably be harder for developers than just ripping off the bandaid.

David J.


Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-05-13 Thread David G. Johnston
On Fri, May 13, 2016 at 6:05 PM, Joshua D. Drake 
wrote:

> On 05/13/2016 01:42 PM, Josh berkus wrote:
>
>> On 05/13/2016 01:04 PM, Joshua D. Drake wrote:
>>
>>> On 05/13/2016 12:03 PM, Josh berkus wrote:
>>>
 On 05/13/2016 11:48 AM, Robert Haas wrote:

> On Fri, May 13, 2016 at 12:12 PM, Joshua D. Drake
>  wrote:
>

>>> Anyway, all of this is a moot point, because nobody has the power to
 tell the various companies what to do.  We're just lucky that everyone
 is still committed to writing stuff which adds to PostgreSQL.

>>>
>>> Lucky? No. We earned it. We earned it through years and years of hard
>>> work. Should we be thankful? Absolutely. Should we be grateful that we
>>> have such a powerful and engaged commercial contribution base? 100%.
>>>
>>
>> Lucky.  Sure there was work and personal integrity involved, but like
>> any success story, there was luck.
>>
>> But we've also been fortunate in not spawning hostile-but-popular forks
>> by people who left the project, and that none of the companies who
>> created hostile forks were very successful with them, and that nobody
>> has seriously tried using lawyers to control/ruin the project.
>>
>
> I can't get behind you on this. Everything you have said above has to do
> with the hard work and integrity of the people in this project. It isn't
> luck or divine intervention.
>
>
>> And, most importantly, we've been lucky that a lot of competing projects
>> have self-immolated instead of being successful and brain-draining our
>> contributors (MySQL, ANTS, MonetDB, etc.)
>>
>>
> Actually there are people that have been drained out, I won't name them
> but it is pretty easy to figure out who they are. The people that are left,
> especially the long timers are here because of their integrity and
> attachment to the project.
>
> This project builds good people, and the good people build a good project.
>
> I am not going to buy into a luck story for something I and many others
> have invested decades of their life into.


I'm not sure how this ended up a philosophical/religious argument, but
given that we are simply lucky that a meteor didn't wipe out our species
twenty thousand years ago, our present reality is a combination of
determination and luck.  Trying to look at shorter-term slices and assign
weights to how much luck influenced something and how much was
determination seems like something best done over drinks.

In the vein of "praise publicly, scold privately", if anyone here thinks
that someone else should be acting differently I'd suggest sending them a
private message to that effect.  More useful would be to point out on these
lists things you find are done well.

David J.


Re: [HACKERS] 10.0

2016-05-13 Thread David G. Johnston
On Fri, May 13, 2016 at 10:18 PM, Mark Dilger 
wrote:

>
> My main concern is that a commitment to never, ever break backwards
> compatibility is a commitment to obsolescence.


You started this sub-thread with:

"If I understand correctly..."

I'm not sure that you do...

Our scheme is, in your terms, basically:

<major>.micro

where <major> is a decimal (e.g., 9.6).

You cannot reason about the whole and fraction portions of the decimal
independently.

When <major> changes, backward compatibility can be broken - with respect
to both API and implementation.

It therefore makes sense to
> reserve room in the numbering scheme to be clear and honest about when
> backwards compatibility has been broken.  The major number is the normal
> place to do that.


​I'm not convinced there is enough risk here to compromise the present in
order to accommodate some unknown ​scenario that may never even come to
pass.

David J.


Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0

2016-05-13 Thread David G. Johnston
On Fri, May 13, 2016 at 12:12 PM, Joshua D. Drake 
wrote:

> On 05/13/2016 07:42 AM, Robert Haas wrote:
>
>> On Sat, Apr 30, 2016 at 8:46 PM, Joshua Drake 
>> wrote:
>>
>>> Oh, absolutely. I was just pointing out how a lot of companies are
>>> hoarding
>>> talent internally for no productive purpose.
>>>
>>
>> Wow, really?
>>
>
​[...]
​


>
>
>> Let's respect people and companies for what they do contribute, rather
>> than labeling it as not good enough.
>>
>>
> There was no disrespect intended. I was trying to push forth an idea that
> multi-company team collaboration is better for the community than single
> company team collaboration. I will stand by that assertion.


​Yeah, that failed...

I find it much easier to debate the assertion "multi-company team
collaboration is better ..." than debating ​whether "companies are
hoarding...".

I'll disagree - such a blanket statement is nearly worthless.  Three
one-person companies working together would not automatically produce
better outputs than a single company dedicating 3 people to an endeavor.
But even that retort requires me to cheat because I'm able to draw a
conclusion about the fallacy of generalization without needing to be any
more precise as to my concept of "better".

When it comes to the first statement nothing can be done to avoid the fact
that no frame-of-reference has been established for "productive purpose".

Contrary to my earlier point the nature of our setup makes the whole "scold
privately" aspect difficult - and maybe this approach works better than I
can imagine.  Though the down-thread from the quoted post does seem to
indicate that considerable non-lethal friendly-fire is one of its
byproducts.

David J.


Re: [HACKERS] 10.0

2016-05-13 Thread David G. Johnston
On Fri, May 13, 2016 at 6:44 PM, Michael Banck <mba...@debian.org> wrote:

> On Fri, May 13, 2016 at 05:31:00PM -0400, David G. Johnston wrote:
> > The underlying premise, for me, of choosing .4 or .5  was that presently
> we
> > discontinue support after 5 years/releases.  A new .0 would come out just
> > as we discontinue the previous .0
>
> Well maybe the 5 year support cycle would be nice to encode, but how is
> .0 different from .1 then?  You make sound like .0 would be similar to
> Ubuntu's LTS which is not the case, unless you want to propose that only
> .0 releases get 5 years and not the in-betweens? That'd be a shame.
>

The opinion seems to be that major.0 is some kind of magic incantation in
the broader world of users...so it's basically "since the last time we
incremented major this is what we've done - 80% of which is new to you only
if you've lived under a rock for the past four years".

David J.
​


Re: [HACKERS] 10.0

2016-05-13 Thread David G. Johnston
On Fri, May 13, 2016 at 7:32 PM, Mark Dilger 
wrote:

>
> Any project that starts inflating its numbering scheme sends a message to
> users of the form, "hey, we've just been taken over by marketing people,
> and
> software quality will go down from now on."
>

Tom brought up my own thoughts on the rest - but, really, this is a
cynical way of looking at things.  Whether marketing has say over the
exact version number assigned to a release should have, and here likely
does have, almost zero influence on the quality of the code that went into
said release.  This entire discussion and timeline is proof of exactly
that.  We have the option to market version 10.0.0 if we so choose, but
during the entire past year -core hasn't cared whether the release they
are working toward is eventually numbered 9.6.0 or 10.0.0 - and I posit
that nothing would have changed had it been decided a year ago that we
were going to release 10.0.0.

I'll accept that addressing the broader community's confusion regarding our
version numbering scheme is a problem worth addressing.  Therefore I
support moving to a simple "version.patch" system for next year's release.
Having accepted the importance of the broader community's opinions, changing
to 10 now, after we've released beta1, would be counter-productive.

Personally, having come in around the 8.2 release I didn't, and quite
honestly still don't, care why the decision was made to release 9.0 instead
of 8.5.  Mainly because the feature oriented decisions to increment seems
to be largely focused on the DBA - which is it not my primary concern.  For
me, the inclusion of window functions would have been enough to warrant a
major version bump - and CTEs as well.  Both of those went into 8.4.  So
not only is it getting harder to gain critical mass in a single release
(which is good - it means we are not settling for the easy stuff and are
properly breaking down harder features into small releasables) but the
stuff we are keying on are likely of secondary importance to a large number
of users.

David J.


[HACKERS] Keeping CURRENT_DATE and similar constructs in original format

2016-05-12 Thread David G. Johnston
On Thursday, May 12, 2016, Tom Lane wrote:

>
> So what I've wanted to do for some time is invent a new expression node
> type that represents any one of these functions and can be reverse-listed
> in the same format that the input had.  The attached proposed patch does
> that.  (I'm not particularly in love with the node type name
> ValueFunction; anybody got a better idea?)
>
>
SQL99DateTimeFunction (or roughly whenever they were introduced)?

I agree with the premise.  I took notice of it recently in explain output
on these lists using current_date.  That example read like
('now'::cstring)::date which was really odd since I was at least expecting
text as the intermediate cast...
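A hedged illustration of the deparse oddity described above (the exact cast chain shown comes from the thread; it may differ by version):

```sql
-- Before the proposed node type, CURRENT_DATE is folded at parse time
-- into a cast expression, so a view using it reverse-lists oddly:
CREATE VIEW today AS SELECT CURRENT_DATE AS d;

-- In the thread's example, the view column then displayed as
--     ('now'::cstring)::date
-- rather than the CURRENT_DATE the user actually wrote.
```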

David J.


Re: [HACKERS] Html parsing and inline elements

2016-04-29 Thread David G. Johnston
On Fri, Apr 29, 2016 at 1:47 PM, Bruce Momjian  wrote:

> On Wed, Apr 13, 2016 at 12:57:19PM -0300, Marcelo Zabani wrote:
> > Hi, Tom,
> >
> > You're right, I don't think one can argue that the default parser should
> know
> > HTML.
> > How about your suggestion of there being an HTML parser, is it feasible?
> I ask
> > this because I think that a lot of people store HTML documents these
> days, and
> > although there probably aren't lots of HTML with words written along
> multiple
> > inline elements, it would certainly be nice to have a proper parser for
> these
> > use cases.
> >
> > What do you think?
>
> It sounds useful.
>

​It sounds like an external project/extension...

David J.
​


Re: [HACKERS] Rename max_parallel_degree?

2016-05-02 Thread David G. Johnston
On Sun, Apr 24, 2016 at 8:01 PM, Robert Haas  wrote:

>
> Of course, we could make this value 1-based rather than 0-based, as
> Peter Geoghegan suggested a while back.  But as I think I said at the
> time, I think that's more misleading than helpful.  The leader
> participates in the parallel plan, but typically does far less of the
> work beneath the Gather node than the other nodes involved in the
> query, often almost none.  In short, the leader is special.
> Pretending that it's just another process involved in the parallel
> group isn't doing anyone a favor.
>

​Does this apply to the extent that a value of 1 is likely worse than 0
since the leader is now tasked with accumulating but there is only one
process actually working to provide the leader data?

I'm inclined to accept max_parallel_workers where a value of 0 means no
parallelism and the non-zero counts indicate the number of workers in
addition to the required leader.

Though that does suggest "additional" as a valid option.  Something like
"max_additional_workers".  Just how overloaded is the term "worker"?  If
worker is understood to mean "a process which implements execution of [part
of] a query plan" the word additional leaves no ambiguity as to where the
leader is accounted for.

​It does significantly reduce grep-ability though :(

​max_additional_parallel_workers...

David J.


Re: [HACKERS] Rename max_parallel_degree?

2016-05-02 Thread David G. Johnston
On Mon, May 2, 2016 at 12:14 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Mon, May 2, 2016 at 2:44 PM, David G. Johnston
> <david.g.johns...@gmail.com> wrote:
> > Does this apply to the extent that a value of 1 is likely worse than 0
> since
> > the leader is now tasked with accumulating but there is only one process
> > actually working to provide the leader data?
>
> I don't know what that means, but it doesn't work like that.  If the
> worker can't generate data fast enough, the leader will also run the
> parallel portion of the plan.  So 1 is unlikely to be worse than 0; in
> fact it's often a lot better.
>
> > I'm inclined to accept max_parallel_workers where a value of 0 means no
> > parallelism and the non-zero counts indicate the number of workers in
> > addition to the required leader.
>
> That's how it works now.
>

​
​Ok, that's basically what I was trying to figure out.  Whether the leader
will take on the role of "worker" at lower levels of "parallelism" since if
it was dedicated to only aggregating data from other workers it would be a
net loss if only one other worker existed.
​


>
> > Though that does suggest "additional" as a valid option.  Something like
> > "max_additional_workers".  Just how overloaded is the term "worker".  If
> > worker is understood to mean "a process which implements execution of
> [part
> > of] a query plan" the word additional leaves no ambiguity as to where the
> > leader is accounted for.
> >
> > It does significantly reduce grep-ability though :(
> >
> > max_additional_parallel_workers...
>
> I don't think that it's likely to be very clear what "additional"
> refers to in this context.
>
>
​I'm not sure how "what" could be interpreted as anything other than
"parallel worker"​...which I presumed is recognized by the user otherwise
"max_parallel_workers" itself exhibits the same problem.

I could see how code usage could be annoying, having to always "+1" the
value to get total number of potential workers, but the UI, for me, removes
any question of of whether the leader is counted in the set of "parallel
workers".

David J.


Re: [HACKERS] full table delete query

2016-05-03 Thread David G. Johnston
On Tue, May 3, 2016 at 5:51 AM, hari.prasath 
wrote:

> Hi all,
>   How postgresql handles full table delete in terms of loading the
> full table in these scenarios
>
> consider one big table(tablename: bigtable)
> and the query will be >> delete from bigtable;
>
> 1)which doesn't have any foreign table reference with any other tables
>
> 2)And when this table is referenced by other table
>
>
You should at least consider whether you can use TRUNCATE, especially in #1.

An actual delete has to modify every page for the table so it can mark
every row as having been deleted.  I don't think it needs to load TOAST
data but am uncertain.  I reasonably confident all non-TOASTED data will
end up in buffers.

References would depend on CASCADE behavior but in a restrict mode only FK
resolution triggers will be involved.  In most well-designed scenarios
indexes are then used instead of the corresponding triggers.  So less data
but still likely every row will be read in.
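A sketch of the two scenarios discussed above (the table name comes from the question; behavior notes are hedged summaries, not benchmarks):

```sql
-- Scenario 1: no foreign keys reference bigtable.
TRUNCATE bigtable;      -- reclaims storage immediately; no per-row work
DELETE FROM bigtable;   -- touches every heap page to mark each row deleted

-- Scenario 2: other tables reference bigtable.
-- Plain TRUNCATE is refused; CASCADE also truncates the referencing
-- tables, which is usually not what you want:
TRUNCATE bigtable CASCADE;
-- DELETE instead fires the FK enforcement triggers row by row (or
-- follows any ON DELETE CASCADE/SET NULL actions on the referencers):
DELETE FROM bigtable;
```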

David J.​


Re: [HACKERS] Pg_stop_backup process does not run - Backup Intervals

2016-05-03 Thread David G. Johnston
On Mon, May 2, 2016 at 12:03 PM, Rodrigo Cavalcante 
wrote:

> Hi,
>
> On alternate days my backup is failing, by the pg_stop_backup process ()
> does not perform or quit.
>
> Version PostgreSQL: 9.1.6
>

​Reporting unusual behavior while running a years-old point release is
unlikely to be productive.  No one wants to debug something that very well
may have been fixed in the meantime.

And, as Robert said, you need to provide considerably more detail than you
have if you wish to get help diagnosing your situation.  And I too would
recommend you not "roll your own" in this area unless you are trying to
learn, at a lower-level, how these things work.

David J.
​


Re: [HACKERS] Make PG's "NOT NULL"s and attnotnull ("is_nullable") conform to SQL-2011

2016-05-04 Thread David G. Johnston
On Monday, February 8, 2016, Vitaly Burovoy 
wrote:

>
> 12. At the same time in (subcl. 4.13) mentioned there can be "at least
> one NNC" (may be via inheritance?).
>
>
This is a bit hard to reason about given that our implementation of
inheritance is non-standard.

Are we close to the standard semantics with regard to this particular
dynamic?

David J.


Re: [HACKERS] Make PG's "NOT NULL"s and attnotnull ("is_nullable") conform to SQL-2011

2016-05-03 Thread David G. Johnston
Quick flyby here...

On Tuesday, May 3, 2016, Tom Lane  wrote:

> Vitaly Burovoy writes:
> > On 4/27/16, Alvaro Herrera wrote:
> >> Point 2 is where things differ from what I remember; my (possibly
> >> flawed) understanding was that there's no difference between those
> >> things.  Many (maybe all) of the things from this point on are probably
> >> fallout from that one change.
>
> > It is just mentioning that CHECK constraints have influence on
> > nullability characteristic, but it differs from NNC.
> > NNC creates CHECK constraint, but not vice versa. You can create
> > several CHECK "col IS NOT NULL" constraints, but only one NNC (several
> > ones by inheritance only?). And DROP NOT NULL should drop only those
> > CHECK that is linked with NNC (and inherited), but no more (full
> > explanation is in my initial letter).


Either it's one, or it's not...


> This seems to me to be a most curious reading of the standard.
> SQL:2011 11.4  syntax rule 17a says
>
>  If a  is specified that contains
>  the  NOT NULL, then it is equivalent to the
>  following :
>
> CND CHECK ( C IS NOT NULL ) CA
>
> As a rule, when the SQL spec says "equivalent", they do not mean "it's
> sort of like this", they mean the effects are indistinguishable.  In
> particular, I see nothing whatsoever saying that you're not allowed to
> write more than one per column.


Does it define how DROP NOT NULL is supposed to behave?

I agree that the behavior of a column NNC is identical to a similar
constraint defined on the table: but if drop not null doesn't impact table
constraints then the concept of perfect equality is already lost.
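To make the two forms under discussion concrete, a small sketch (the CHECK constraint name is the default one PostgreSQL generates; per SQL:2011 rule 17a the first form is defined as "equivalent" to the second):

```sql
CREATE TABLE t1 (c int NOT NULL);               -- column NNC; flagged in attnotnull
CREATE TABLE t2 (c int CHECK (c IS NOT NULL));  -- ordinary named table constraint

ALTER TABLE t1 ALTER COLUMN c DROP NOT NULL;    -- drops the NNC
ALTER TABLE t2 DROP CONSTRAINT t2_c_check;      -- the CHECK must be dropped by name
```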


> So I don't like the proposal to add an attnotnullid column to
> pg_attribute.  What we'd talked about earlier was converting attnotnull
> into, effectively, a hint flag saying that there's at least one NOT NULL
> constraint attached to the column.


>
Have we considered making it a table constraint and giving it a name?  We
already handle that case without difficulty.

Not looking for a detailed explanation.

David J.


Re: [HACKERS] force_parallel_mode uniqueness

2016-05-08 Thread David G. Johnston
On Sun, May 8, 2016 at 10:56 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Sat, May 7, 2016 at 11:42 PM, David G. Johnston
> <david.g.johns...@gmail.com> wrote:
> > All of the other planner GUCs are basically, {on, off, special} with on
> or
> > special the default as appropriate for the feature - since most/all
> features
> > default to enabled.  While I get that the expected usage is to set this
> to
> > off (which really leaves parallel mode in its default on behavior) and
> then
> > reduce the parallel workers to zero to disable that runs contrary to all
> of
> > the other switches listed alongside force_parallel_mode.
> > constraint_exclusion seems like something to be emulated here.
>
> I am not really sure what you are suggesting here.  If you're saying
> that you don't like the ordering regress > on > off, because there are
> other GUCs where the intermediate values are all between "on" and
> "off", then I think that's silly.  We should name and order the
> options based on what makes sense, not based on what made sense for
> other options.  Note that if you think there are no other GUCs which
> have a value greater than "on", see also
> synchronous_commit=remote_apply.
>

​I was thinking more along the lines that it should be called:

parallel_mode (enum)

It would default to "on"

"off" would turn it off (instead of having to set parallel_degree to 0)

And it would have additional enum values for:

"always" - basically what on means in the current setup
"regress" - the same as the current regress.​
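Under that proposal, usage would look like the following. This is purely hypothetical: `parallel_mode` is not an actual GUC, and the sketch exists only to contrast the proposed enum with the real `force_parallel_mode`/`max_parallel_degree` pair.

```sql
-- Hypothetical GUC, not real PostgreSQL syntax:
SET parallel_mode = off;      -- no parallel plans at all
SET parallel_mode = on;       -- default: planner decides, like other planner GUCs
SET parallel_mode = always;   -- force a Gather node even when not profitable
SET parallel_mode = regress;  -- as 'always', with regression-friendly EXPLAIN output
```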


> > Also, all of the other geoq options get placed here yet max parallel
> degree
> > is in an entirely different section.
>
> max_parallel_degree has nothing to do with GEQO, so I don't know that
> the location of "other" GEQO options has much to do with anything.  It
> also has nothing to do with force_parallel_mode, which is what this
> email was about until you abruptly switched topics.
>

​I was simply trying to indicate that the various settings that configure
geqo are present on this page.  max_parallel_degree is likewise a setting
that configures parallel_mode but it isn't on this page.​


> > I'm a bit torn on this point though
> > since it does fit nicely in asynchronous behavior.  I think the next
> thought
> > finds the happy middle.
>
> We could put max_parallel_degree under "other planner options" rather
> than "asynchronous behavior".  However, I wonder what behavior people
> will want for parallel operations that are not queries.  For example,
> suppose we have parallel CREATE INDEX.  Should the number of workers
> for that operation also be controlled by max_parallel_degree?  If yes,
> then this shouldn't be a query planner option, because CREATE INDEX is
> not a query.
>

​Like I said, it isn't clear-cut.  But at the moment it is just for queries
- it could be moved if it gets dual purposed as you describe.


> > If nothing else this option should include a link to max_parallel_degree
> and
> > vice-versa.  Noting how to disable the feature in this section, if the
> guc
> > semantics are not changed, would be good too.  That note would likely
> > suffice to establish the linking term to parallel degree.  Something can
> be
> > devised, even if just a see also, to link back.
>
> It's probably worth mentioning under force_parallel_mode that it will
> have no effect if parallel query is disabled by the
> max_parallel_degree setting.  But it is completely unnecessary IMHO
> for max_parallel_degree to link to force_parallel_mode.  Most people
> should not be using force_parallel_mode; it is there for testing
> whether functions are correctly labeled as to parallel safety and
> that's it.
>

So this particular capability is unique and as such it warrants offering a
"force" mode that none of the other planner configuration GUCs on this page
have.  It's clear that this is how it was intended, but as a casual reader of
the section its uniqueness stood out - and maybe that is for the better.

I guess part of the misunderstanding is simply that you have a lot more
plans for this feature than are currently implemented but I am reading the
documentation only knowing about those parts that are.

David J.


Re: [HACKERS] First-draft release notes for next week's back-branch releases

2016-05-08 Thread David G. Johnston
On Sunday, May 8, 2016, Tom Lane <t...@sss.pgh.pa.us> wrote:

> [ I think you meant to attach this to the other thread, but anyway... ]


This is where the link to the online version was; reading the sgml
and/or compiling ends up being a bit more than I wanted to do to review
these.


>
> "David G. Johnston" <david.g.johns...@gmail.com> writes:
> > "...replacement_sort_tuples, which see for further details." needs
> > rewording.
>
> Hmm, "which see" is perfectly good English to my knowledge, and I'm not
> sure that other possible ways of wording this would be less awkward.
>

Removing it doesn't seem like a bad choice...The user should realize the
relevant preceding linked guc is where they should look for more details -
pointing it out to them seems verbose.  But the meaning is clear regardless
of familiarity.


> > Is it worth mentioning the deprecation of exclusive backups in the notes
> > introducing non-exclusive ones?
>
> It's not clear to me that we're actually deprecating them; there did not
> seem to be consensus on that.


>
Then section 24.3.3 needs fixing. The second paragraph explicitly states it
is deprecated.

http://www.postgresql.org/docs/devel/static/continuous-archiving.html

David J.


Re: [HACKERS] SET ROLE and reserved roles

2016-04-15 Thread David G. Johnston
On Fri, Apr 15, 2016 at 8:56 AM, Stephen Frost  wrote:

> Robert,
>
> * Robert Haas (robertmh...@gmail.com) wrote:
> > On Wed, Apr 13, 2016 at 6:53 PM, Stephen Frost 
> wrote:
> > > Requiring that SET ROLE be allowed will mean that many more paths must
> > > be checked and adjusted, such as in all of the CreateObject statements
> > > and potentionally many other paths that I'm not thinking of here, not
> > > the least of which are all of the *existing* checks near
> > > get_rolespec_oid/tuple which will require additional checks for the
> > > CURRENT_USER case to see if the current user is a default role.
> >
> > I don't get it.  I think that these new roles should work just like
> > any other roles, except for existing at initdb time.  I don't see why
> > they would require checking or adjusting any code-paths at all.  It
> > would presumably have made the patch you committed smaller and
> > simpler.  The only thing you'd need to do is (approximately) not dump
> > them.
>
> Perhaps it would be helpful to return to the initial goal of these
> default roles.
>
> We've identified certain privileges which are currently superuser-only
> and we would like to be able to be GRANT those to non-superuser roles.
> Where our GRANT system is able to provide the granularity desired, we
> have simply removed the superuser checks, set up appropriate default
> values and, through the "pg_dump dump catalog ACLs" patch, allowed
> administrators to make the changes to those objects via the
> 'GRANT privilege ON object TO role' system.
>
> For those cases where our GRANT system is unable to provide the
> granularity desired and it isn't sensible to extend the GRANT system to
> cover the exact granularity we wish to provide, we needed to come up
> with an alternative approach.  That approach is the use of special
> default roles, where we can allow exactly the level of granularity
> desired and give administrators a way to grant those privileges to
> users.
>

​I didn't fully comprehend this before, but now I understand why additional
checks beyond simple "has inherited the necessary grant" are needed.  The
checks being performed are not for itemized granted capabilities.

> Further, there seems to be no particular use-case which is satisfied by
> allowing SET ROLE'ing to the default roles or for those roles to own
> objects; indeed, it seems more likely to simply spark confusion and
> ultimately may prevent administrators from making use of this system for
> granting privileges as they are unable to GRANT just the specific
> privilege they wish to.
>
>
​And I'm having trouble imagining what kind of corner-case could result from
not allowing a role to assume ownership of a role it is a member of.  While
we do have the capability to require SET ROLE, it is optional: any code
paths dealing with grants have to assume that the current role receives
permission via inheritance - and all we are doing here is ensuring that is
the situation.

It now occurs to me that perhaps we can improve on this situation in the
> future by formally supporting a role attribute that is "do not allow SET
> ROLE to this user" and then changing the default roles to have that
> attribute set (and disallowing users from changing it).  That's just
> additional flexibility over what the current patch does, however, but
> perhaps it helps illustrate that there are cases where only the
> privileges of the role are intended to be GRANT'd and not the ability to
> SET ROLE to the role or to create objects as that role.
>

​That is where my first response was heading but it was definitely scope
creep so I didn't bring it up.  Basically, an "INHERITONLY" ​modifier in
addition to INHERIT and NOINHERIT.
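The two existing settings, plus the proposed third one, might look like this. Role names are illustrative, and `INHERITONLY` is not real syntax - it is the hypothetical attribute being discussed:

```sql
-- Today's attributes apply to the *member* role:
CREATE ROLE auditor NOINHERIT;  -- must SET ROLE to use privileges of roles it is in
CREATE ROLE batch INHERIT;      -- uses inherited privileges directly; SET ROLE also works

-- The proposed attribute would apply to the *granted* role instead
-- (hypothetical, not implemented):
-- CREATE ROLE locked INHERITONLY;  -- members inherit its privileges,
--                                     but SET ROLE locked is refused
```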

I had stated that having a role that neither provides inheritance nor
changing-to is pointless but I am now unsure if that is true.  It would
depend upon whether a LOGIN role is one that is considered having been SET
ROLE to or not.  If not, then a "LOGINONLY" combination would work such
that a role with that combination would only have whatever grants are given
to it and is not allowed to be changed to by a different role - i.e., it
could only be used by someone logging into the system specifically as that
role.  It likewise complements "NOLOGIN + LOGIN".

It wouldn't directly preclude said role from itself SET ROLE'ing to a
different role though.​

And, for all that, I do realize I'm using these terms somewhat contrary to
reality since INHERIT is done on the receiver and not the giver.  The
wording would have to change to reflect the POV of the giver.  That is,
INHERITONLY should be something done to a role that prevents it from being
SET TO as opposed to NOINHERIT which forces a role to use SET TO to access
the permissions of all roles in which it is a member.

​David J.
​


Re: [HACKERS] [patch] \crosstabview documentation

2016-04-13 Thread David G. Johnston
On Wed, Apr 13, 2016 at 12:29 PM, Tom Lane  wrote:

> Christoph Berg  writes:
> > Another thing about \crosstabview:
>
> > # select 1,2 \crosstabview
> > The query must return at least two columns to be shown in crosstab
>
> > s/two/three/, I guess.
>
> Yeah, I noticed that.  See
> http://www.postgresql.org/message-id/10276.1460569...@sss.pgh.pa.us
>
So I guess:

"crosstabview with only 2 output columns"

https://wiki.postgresql.org/wiki/Crosstabview
(last section on that page)

never got implemented

David J.

​


Re: [HACKERS] Query Procedures

2016-04-21 Thread David G. Johnston
On Thu, Apr 21, 2016 at 8:26 AM, Andrea Adami  wrote:

>
> Hello, i'm a developer from italy and i need to make a query to get the
> list of stored procedures with its signature.
> Basically I would like to get the same list pgAdmin shows under the node
> functions of the database tree on left panel, with the query: SELECT
> p.oid AS oid, p.proname AS name, p.proargtypes FROM pg_proc p. I get the
> list of procedures with the column "proargtypes" that contains the list
> of parameters's type.
> Each digit of proargtypes column refers to the corresponding tuple in
> pg_type So far all is fine, the problems arise when i want convert these
> numbers to strings as the previously mentioned pgAdmin did: 1 ) in typname
> column (of pg_type table) some time, I did not understand why , there is a
> space in front of the name , better: a space is displayed but there is not
> a space because a trim do not throw away . 2 ) I can not understand what
> colums tell me if the type of data in question is an array ( to display it
> with ' [ ] ' appended to the name ) Someone is kind enough to put me on the
> right track ?
>
> p.s.
> the function: pg_catalog.pg_get_function_arguments(p.oid) show what i need
> but after the parameter name, what i want is a list of parameter's datatype
> (the signature) without the parameter's name
>
>
​Using text as an example the system considers the type "_text" to be its
array form.  Are you sure that what you are thinking is a leading space is
not this leading underscore?
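A sketch covering both points - the internal array naming, and a way to get just the argument-type list without parameter names (hedged; checked against 9.x-era catalogs):

```sql
-- The array type of text is internally named "_text"; format_type()
-- renders it the way users expect:
SELECT typname, format_type(oid, NULL) AS display_name, typelem
FROM pg_type
WHERE typname IN ('text', '_text');
-- "_text" shows a nonzero typelem and displays as "text[]"

-- For a signature made only of type names, oidvectortypes() expands
-- proargtypes directly:
SELECT p.oid, p.proname,
       pg_catalog.oidvectortypes(p.proargtypes) AS arg_types
FROM pg_proc p;
```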

David J.
​


Re: [HACKERS] sign function with INTERVAL?

2016-04-13 Thread David G. Johnston
On Wed, Apr 13, 2016 at 3:48 PM, Daniel Lenski  wrote:

> On Wed, Apr 13, 2016 at 12:35 PM, Tom Lane  wrote:
> > Jim Nasby  writes:
> >> Actually, after looking at the code for interval_lt, all that needs to
> >> happen to add this support is to expose interval_cmp_internal() as a
> >> strict function. It already does exactly what you want.
> >
> > interval_cmp() is already SQL-accessible.
>
> Thanks! The interval_cmp() function does not appear in the 9.5 docs.
> http://www.postgresql.org/docs/9.5/static/functions-datetime.html
>
> On Wed, Apr 13, 2016 at 11:54 AM, Gianni Ceccarelli
>  wrote:
> > I'm not sure that "positive time interval" is a thing. Witness:
> >
> > (snip)
> >
> >  dakkar=> select date '2016-03-01' + interval '1 month - 30 days';
> >  ┌─┐
> >  │  ?column?   │
> >  ├─┤
> >  │ 2016-03-02 00:00:00 │
> >  └─┘
> >  (1 row)
> >
> > when used this way, it looks positive, but
> >
> >  dakkar=> select date '2016-02-01' + interval '1 month - 30 days';
> >  ┌─┐
> >  │  ?column?   │
> >  ├─┤
> >  │ 2016-01-31 00:00:00 │
> >  └─┘
> >  (1 row)
> >
> > when used this way, it looks negative.
> >
> > So I suspect the reason SIGN() is not defined for intervals is that
> > it cannot be made to work in the general case.
>
> I hadn't considered this case of an interval like '1 month - 30 days',
> which could be either positive or negative depending on the starting
> date to which it is added.
>
> interval_cmp's handling of this case seems surprising to me. If I've
> got this right, it assumes that (interval '1 month' == interval '30
> days') exactly:
>
> http://doxygen.postgresql.org/backend_2utils_2adt_2timestamp_8c_source.html#l02515
>
>
​I was trying to figure out how the months/year fit in here - or whether
years are derived from days (i.e., 365 instead of 360)...​

The anchored section of code only shows stand-alone conversion factors for
days->hours and months-days


> Do I have that right? I'm having trouble envisioning an application
> that would ever generate intervals that contain months and days
> without opposite signs, but it's useful to know that such a corner
> case could exist.
>

Yes.
​I want the date that is 1 month and 14 days (2 weeks) from now...

For added fun the SQL standard apparently disallows mixed signs (according
to our docs)

>​According to the SQL standard all fields of an interval value must have
the same sign,
>so a leading negative sign applies to all fields; for example the negative
sign in
>the interval literal '-1 2:03:04' applies to both the days and
hour/minute/second parts.

​http://www.postgresql.org/docs/current/static/datatype-datetime.html​


> Given this behavior, the only 100% reliable way to check whether an
> interval is forward, backwards, or zero would be to first add, and
> then subtract, the starting point:
>
> postgres=# select interval_cmp( (date '2016-02-01' + interval '1 month
> - 30 days') - date '2016-02-01', interval '0' );
>  interval_cmp
> --
>-1
> (1 row)
>
> postgres=# select interval_cmp( (date '2016-04-01' + interval '1 month
> - 30 days') - date '2016-04-01', interval '0' );
>  interval_cmp
> --
> 0
> (1 row)
>
>
​Yes, the dual nature of an interval - i.e., an assumed conversion factor
(1 month = 30 days) when dealt with independently, but a conversion factor
based on reality (February typically has 28 days) when applied to a date -
makes working with it complicated.
There is no way you could write an operator that successfully handles the
latter situation since you cannot write a custom ternary operator that would
take two intervals and a date.  That said we already have rules that allow
us to canonicalize an interval so any form of two-interval comparison can
be performed: but those results become invalidated if one has to apply the
interval to a date.

In short, adding this feature would make it much easier for the
inexperienced to use intervals unsafely without realizing it.  It is
possible to write custom functions that do exactly what is needed based
upon the usage of intervals within the system under observation.  Doability
combined with ignorance hazard means that the status-quo seems acceptable.

​I guess it would be nice to expose our conversion factors in such a way
that a user can readily get the number of seconds represented by a given
interval when considered without respect to a starting date.  But since
most uses of interval are related to dates it seems likely that comparing
intervals by comparing the dates resulting from their application is the
most reliable.
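The conversion factors are in fact already exposed through extract(epoch from ...), which assumes 1 month = 30 days and 1 day = 86400 seconds; a context-free "sign" can be built on it, subject to exactly the caveats discussed above:

```sql
-- Context-free sign, using the fixed conversion factors:
SELECT sign(extract(epoch from interval '1 month - 30 days'));  -- 0
SELECT sign(extract(epoch from interval '1 month - 29 days'));  -- positive

-- But the same interval can move a real date in either direction:
SELECT date '2016-02-01' + interval '1 month - 30 days';  -- 2016-01-31 (backward)
SELECT date '2016-03-01' + interval '1 month - 30 days';  -- 2016-03-02 (forward)
```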

N.B. consider too that the signs are not the whole of it.  Intervals allow
for the word "ago" to be specified.

David J.


Re: [HACKERS] SET ROLE and reserved roles

2016-04-13 Thread David G. Johnston
On Wed, Apr 13, 2016 at 3:53 PM, Stephen Frost  wrote:

> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Stephen Frost  writes:
> > > On Wednesday, April 13, 2016, Tom Lane  wrote:
> > >> If you want to prevent that, I think it needs to be done somewhere
> else
> > >> than here.  What about "ALTER OWNER TO pg_signal_backend", for
> instance?
> >
> > > Checks are included in that code path to disallow it.
> >
> > If there are such low-level checks, why do we need to disallow SET ROLE?
> > It seems like the wrong place to be inserting such defenses.
>
> The low-level checks were placed all along the paths which used
> get_rolespace_oid/tuple.  SET ROLE is the only place where a user could
> change to another role and then do things which result in objects being
> owned by that role without going through one of those paths.
>
> Requiring that SET ROLE be allowed will mean that many more paths must
> be checked and adjusted, such as in all of the CreateObject statements
> and potentionally many other paths that I'm not thinking of here, not
> the least of which are all of the *existing* checks near
> get_rolespec_oid/tuple which will require additional checks for the
> CURRENT_USER case to see if the current user is a default role.
>
> > >> But perhaps more to the point, why is it a strange corner case for one
> > >> of these roles to own objects?
> >
> > > If the default roles can own objects and otherwise be used for other
> > > purposes then we can never, ever, be rid of any which we create.
> >
> > This argument seems quite bogus to me, because granted privileges are
> > pretty darn close to being objects.  Certainly, if you have some
> > "GRANT pg_signal_backend TO joe_user" commands in a pg_dump script,
> > those will fail for lack of the role just as surely as ALTER OWNER
> > commands would.  So we would need some kind of special hack in pg_dump
> > either way, unless we make it incumbent on users to clean up before
> > upgrading (which doesn't seem out of the question to me ...)
>
> I don't think we would have any concerns with a hack in pg_dump to
> remove those GRANTs if that default role goes away.  There's certainly
> no way we're going to do that for tables which have data in them,
> however.  Therefore, the user would have to address any such cases
> before being able to an upgrade, and I don't see why we want to allow
> such a case.
>
> As for present-vs-future cruft, we can't put this genie back in the
> bottle once we allow a default role to own objects, which is why I felt
> it was prudent to address that concern from the get-go.
>
> > I think you're replacing hypothetical future cruft with actual present
> > cruft, and it doesn't seem like a very good tradeoff.
>
> If we don't care about default roles being able to own objects, then we
> can remove the check in SET ROLE and simply accept that we can never
> remove those roles without user intervention.  There's a number of other
> checks in the tree which we can debate (should a default role be allowed
> to have a USER MAPPING?  What about additional privileges beyond those
> we define for it?  Perhaps DEFAULT privileges?).  If we're going to
> allow default roles to own objects, it seems like we could just about
> remove all those other checks also.
>
> If we don't want default roles to be able to own objects, but we do want
> users to be able to SET ROLE to them, then there's a whole slew of
> *additional* checks which have to be added, which is surely a
> net-increase in code.
>
> I'm happy to do my best to implement the concensus opinion, just wanted
> to be clear on what the options are.  If I could get responses regarding
> everyone's preferences on the above, then we can get to what the
> desired behavior should be and I'll implement it.
>

From what I've read here I'm thinking Stephen has the right idea.

Let's be conservative in what we allow with these new roles and let
experience guide us as to whether we need to open things up more - or just
fix oversights.

We have chosen to give up "defense in depth" here, since if by some other
means the value of current_user gets set to one of these roles, having the
sole protection point at SET ROLE will seem like a bad choice.  Aside from
that, it allows for fewer changes in code that chooses to rely on that
assertion instead of ensuring that it is true.

If we are wrong, the obvious work-around is that all roles that would be
granted a given pg_* role would instead be granted a normal role which
itself is granted the pg_* role.  Any of those roles would then be able to
emulate SET ROLE pg_* by instead switching to the normal role.
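A sketch of that work-around in SQL (the role and user names here are
hypothetical, invented for illustration):

```sql
-- Hypothetical work-around: instead of granting the default role
-- pg_signal_backend directly to end users, interpose a normal role.
CREATE ROLE signal_backend_users NOLOGIN;
GRANT pg_signal_backend TO signal_backend_users;
GRANT signal_backend_users TO joe_user;

-- joe_user can now emulate "SET ROLE pg_signal_backend" with:
SET ROLE signal_backend_users;
```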

I don't think we'd be inclined to make these login roles so most external
tools are likely going to simply rely on the fact that whatever user they
are told to login with already has the necessary grants setup.

Stephen's entire response is enough for me to want to require a
justification for why

Re: [HACKERS] 9.6 and fsync=off

2016-04-28 Thread David G. Johnston
On Thursday, April 28, 2016, Simon Riggs  wrote:

> On 27 April 2016 at 17:04, Craig Ringer wrote:
>
>> On 27 April 2016 at 21:44, Tom Lane wrote:
>>
>>> Petr Jelinek writes:
>>> > +1 (Abhijit's wording with data loss changed to data corruption)
>>>
>>> I'd suggest something like
>>>
>>> #fsync = on # flush data to disk for crash
>>> safety
>>> # (turning this off can cause
>>> # unrecoverable data corruption!)
>>>
>>>
>> Looks good.
>>
>> The docs on fsync are already good, it's just a matter of making people
>> think twice and actually look at them.
>>
>
> If fsync=off and you turn it on, does it fsync anything at that point?
>
> Or does it mean only that future fsyncs will occur?
>
>
http://www.postgresql.org/docs/current/static/runtime-config-wal.html

4th paragraph in the fsync section.

David J.


Re: [HACKERS] between not propated into a simple equality join

2016-05-09 Thread David G. Johnston
On Mon, May 9, 2016 at 8:53 AM, Benedikt Grundmann <
bgrundm...@janestreet.com> wrote:

> We just run into a very simple query that the planner does much worse on
> than we thought it would (in production the table in question is ~ 100
> GB).  It surprised us given the planner is generally quite good, so I
> thought I share our surprise
>
> Setup:
>
> postgres_prod@proddb_testing=# select version();[1]
> version
>
>
> 
>  PostgreSQL 9.2.16 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
> 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
> (1 row)
>
> Time: 69.246 ms
>
> postgres_prod@proddb_testing=# create table toy_data3 (the_date date, i
> int);
> CREATE TABLE
> Time: 67.096 ms
> postgres_prod@proddb_testing=# insert into toy_data3
>
>   (select current_date-(s.idx/1000), s.idx from generate_series(1,100)
> as s(idx));
> INSERT 0 100
> Time: 1617.483 ms
> postgres_prod@proddb_testing=# create index toy_data_date3 on
> toy_data3(the_date);
> CREATE INDEX
> Time: 660.166 ms
> postgres_prod@proddb_testing=# analyze toy_data3;
> ANALYZE
> Time: 294.984 ms
>
> The bad behavior:
>
> postgres_prod@proddb_testing=# explain analyze
>   select * from (
>select td1.the_date, td1.i
> from toy_data3 td1, toy_data3 td2  where td1.the_date = td2.the_date
> and td1.i = td2.i
>   ) foo
>   where the_date between current_date and current_date;
>QUERY
> PLAN
>
> ───
>  Hash Join  (cost=55.49..21980.50 rows=1 width=8) (actual
> time=0.336..179.374 rows=999 loops=1)
>Hash Cond: ((td2.the_date = td1.the_date) AND (td2.i = td1.i))
>->  Seq Scan on toy_data3 td2  (cost=0.00..14425.00 rows=100
> width=8) (actual time=0.007..72.510 rows=100 lo
>->  Hash  (cost=40.44..40.44 rows=1003 width=8) (actual
> time=0.321..0.321 rows=999 loops=1)
>  Buckets: 1024  Batches: 1  Memory Usage: 40kB
>  ->  Index Scan using toy_data_date3 on toy_data3 td1
>  (cost=0.01..40.44 rows=1003 width=8) (actual time=0.018.
>Index Cond: ((the_date >= ('now'::cstring)::date) AND
> (the_date <= ('now'::cstring)::date))
>  Total runtime: 179.440 ms
> (8 rows)
>
> Time: 246.094 ms
>
> Notice the seq scan on td2 (shown in red in the original message).  Which
> is sad because one would like it to realize that it could propagate the
> index constraint onto td2 - that is, do the index scan (shown in green) on
> both sides of the join.
>
>
FWIW

This is my plan result:
version
PostgreSQL 9.5.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
4.8.2-19ubuntu1) 4.8.2, 64-bit
All default settings

using "BETWEEN"

 QUERY PLAN
Nested Loop  (cost=0.86..48.91 rows=1 width=8) (actual time=0.042..168.512
rows=999 loops=1)
  ->  Index Scan using toy_data_date3 on toy_data3 td1  (cost=0.43..8.46
rows=1 width=8) (actual time=0.022..1.388 rows=999 loops=1)
Index Cond: ((the_date >= ('now'::cstring)::date) AND (the_date <=
('now'::cstring)::date))
  ->  Index Scan using toy_data_date3 on toy_data3 td2  (cost=0.42..40.44
rows=1 width=8) (actual time=0.078..0.160 rows=1 loops=999)
Index Cond: (the_date = td1.the_date)
Filter: (td1.i = i)
Rows Removed by Filter: 998
Planning time: 0.353 ms
Execution time: 169.692 ms

using "="

QUERY PLAN
Hash Join  (cost=49.89..90.46 rows=1 width=8) (actual time=2.320..5.652
rows=999 loops=1)
  Hash Cond: (td1.i = td2.i)
  ->  Index Scan using toy_data_date3 on toy_data3 td1  (cost=0.43..37.37
rows=967 width=8) (actual time=0.014..1.168 rows=999 loops=1)
Index Cond: (the_date = ('now'::cstring)::date)
  ->  Hash  (cost=37.37..37.37 rows=967 width=8) (actual time=2.292..2.292
rows=999 loops=1)
Buckets: 1024  Batches: 1  Memory Usage: 48kB
->  Index Scan using toy_data_date3 on toy_data3 td2
 (cost=0.43..37.37 rows=967 width=8) (actual time=0.008..1.183 rows=999
loops=1)
  Index Cond: (the_date = ('now'::cstring)::date)
Planning time: 0.326 ms
Execution time: 6.673 ms

I was hoping to be able to say more but alas cannot find the words.

I'm surprised by the estimate of 1 row for the td1 index scan in my 9.5
query - and also that the 9.2 query would choose a sequential-scan hash
join over what seems to be a superior index-scan nested loop on a fraction
of the table.

The fact that the BETWEEN doesn't get transitively applied to td2 through
the td1 = td2 condition surprises me less...though whether the limitation
is due to theory or implementation I do not know.

I do suspect that at least part of the issue is that the computation of
"the_date" makes the two columns highly correlated while the planner
assumes independence.
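FWIW, restating the range predicate on both join inputs by hand (a
workaround sketch against the toy_data3 setup above) should let the
planner use the index on both sides, since only equalities are derived
transitively:

```sql
-- Workaround sketch: manually propagate the range predicate to td2.
EXPLAIN ANALYZE
SELECT td1.the_date, td1.i
FROM toy_data3 td1
JOIN toy_data3 td2
  ON td1.the_date = td2.the_date AND td1.i = td2.i
WHERE td1.the_date BETWEEN current_date AND current_date
  AND td2.the_date BETWEEN current_date AND current_date;
```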

David J.


Re: [HACKERS] "pg_xxx" role name restriction not applied to bootstrap superuser?

2016-05-07 Thread David G. Johnston
On Saturday, May 7, 2016, Tom Lane  wrote:

> Stephen Frost writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> ... but I'm left with a policy question: should initdb disallow
> >> bootstrap superuser names like "pg_xxx"?
>
> > On the whole, I'd vote to treat the bootstrap user as a normal role and
> > therefore have the same restriction in place for that user also.
>
> If we're going to enforce such a restriction, I think it would be
> a good thing for it to be in place in beta1.
>
>
I don't fathom a good reason to treat only the bootstrap user differently.
I'd second guess prohibiting pg_ generally instead of only the specific
system roles in use in a given release.  Having beta1 go out with full
restrictions will at least maximize the chance of getting complaints and
insight into how prevalent the prefix is in the wild.
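For illustration, under a blanket restriction any user-created role with
the prefix would be rejected, not just the specific built-in role names
(the exact error wording here is my guess, not verified output):

```sql
-- With a general pg_ prefix reservation in place:
CREATE ROLE pg_myapp_admin;
-- ERROR:  role name "pg_myapp_admin" is reserved
```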

David J.


Re: [HACKERS] First-draft release notes for next week's back-branch releases

2016-05-07 Thread David G. Johnston
On Friday, May 6, 2016, Tom Lane  wrote:

> If you're not tired of reviewing release notes (I'm sure getting a bit
> tired of writing them), see
>
>
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=eb7de00ac2d282263541ece849ec71e2809e9467
>
> guaibasaurus should have 'em up on the web in an hour or so, too, at
>
> http://www.postgresql.org/docs/devel/static/release-9-5-3.html
>
>
"...replacement_sort_tuples, which see for further details." needs
rewording.

For some reason I had trouble comprehending the couple of paragraphs on
index-only scans on partial indexes.  Got it after a few reads.  Seems like
it's almost too detailed.

"Partial indexes can be used for index only scans in some circumstances.
See section for details."  If there isn't a section to point to there
should be - people want to know how to get IOS and aren't going to read
release notes to figure it out.

Are the pg_stat_activity changes breaking changes?  If so, it's not clear
from the notes.

I'll +1 the elsewhere mentioned confusion adding a pg_config view vis-a-vis
pg_settings.  Adding (or using) the word "compile" would be advisable.

The GUC for the number of standby servers that must acknowledge should be
named in the notes and linked to the main docs.  "An additional syntax has
been added to synchronous_standby_names to accommodate the number of
standby servers that must acknowledge a commit."

Is it worth mentioning the deprecation of exclusive backups in the notes
introducing non-exclusive ones?

Read the rest and nothing stood out - though I guess I'd advise myself or
the next person to read up from the bottom so fresh eyes read the lower
stuff first.

David J.


[HACKERS] force_parallel_mode uniqueness

2016-05-07 Thread David G. Johnston
My take below is that of a user reading our documentation and our projected
consistency via that document.

All of the other planner GUCs are basically {on, off, special}, with on or
special the default as appropriate for the feature - since most/all
features default to enabled.  While I get that the expected usage is to set
this to off (which really leaves parallel mode in its default on behavior)
and then reduce the parallel workers to zero to disable it, that runs
contrary to all of the other switches listed alongside force_parallel_mode.
constraint_exclusion seems like something to be emulated here.

Also, all of the other geqo options get placed here yet
max_parallel_degree is in an entirely different section.  I'm a bit torn
on this point though since it does fit nicely in asynchronous behavior.  I
think the next thought finds the happy middle.

If nothing else this option should include a link to max_parallel_degree
and vice-versa.  Noting how to disable the feature in this section, if the
GUC semantics are not changed, would be good too.  That note would likely
suffice to establish the linking term to parallel degree.  Something can be
devised, even if just a see-also, to link back.
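For reference, disabling the feature under the 9.6-era GUC names discussed
here would look like the following (semantics as I understand them at this
point in the development cycle):

```sql
-- Zero workers disables parallel query; force_parallel_mode stays at its
-- default, so the planner is not forced into parallel plans for testing.
SET max_parallel_degree = 0;
SET force_parallel_mode = off;
```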

David J.


Re: [HACKERS] Speaking of breaking compatibility...standard_conforming_strings

2016-05-24 Thread David G. Johnston
On Tue, May 24, 2016 at 3:57 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> "David G. Johnston" <david.g.johns...@gmail.com> writes:
> > I just noticed this comment in scan.l:
> > /*
> >  * GUC variables.  This is a DIRECT violation of the warning given at the
> >  * head of gram.y, ie flex/bison code must not depend on any GUC
> variables;
> >  * as such, changing their values can induce very unintuitive behavior.
> >  * But we shall have to live with it as a short-term thing until the
> switch
> >  * to SQL-standard string syntax is complete.
> >  */
> > int backslash_quote = BACKSLASH_QUOTE_SAFE_ENCODING;
> > bool escape_string_warning = true;
> > bool standard_conforming_strings = true;
>
> > I'm not exactly sure what else needs to happen to remove these forbidden
> > GUCs and if we are not prepared to do this now when will we ever be...
>
> Dunno, are you prepared to bet that nobody is turning off
> standard_conforming_strings anymore?
>
> In any case, we keep adding new violations of this rule (cf
> operator_precedence_warning) so I have little hope that it will ever be
> completely clean.
>

​I tend to hold the same position.  I'd probably update the last sentence
of the comment to reflect that reality.

"We use them here due to the user-facing capability to change the parsing
rules of SQL-standard string literals​."

The switch is not likely to ever be "complete", and if it is, not likely
within whatever period the future reader might consider "short-term".

David J.


Re: [HACKERS] Speaking of breaking compatibility...standard_conforming_strings

2016-05-24 Thread David G. Johnston
On Tue, May 24, 2016 at 4:07 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Tue, May 24, 2016 at 3:57 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> "David G. Johnston" <david.g.johns...@gmail.com> writes:
>> > I just noticed this comment in scan.l:
>> > /*
>> >  * GUC variables.  This is a DIRECT violation of the warning given at
>> the
>> >  * head of gram.y, ie flex/bison code must not depend on any GUC
>> variables;
>> >  * as such, changing their values can induce very unintuitive behavior.
>> >  * But we shall have to live with it as a short-term thing until the
>> switch
>> >  * to SQL-standard string syntax is complete.
>> >  */
>> > int backslash_quote = BACKSLASH_QUOTE_SAFE_ENCODING;
>> > bool escape_string_warning = true;
>> > bool standard_conforming_strings = true;
>>
>> > I'm not exactly sure what else needs to happen to remove these forbidden
>> > GUCs and if we are not prepared to do this now when will we ever be...
>>
>> Dunno, are you prepared to bet that nobody is turning off
>> standard_conforming_strings anymore?
>>
>> In any case, we keep adding new violations of this rule (cf
>> operator_precedence_warning) so I have little hope that it will ever be
>> completely clean.
>>
>
> ​I tend to hold the same position.  I'd probably update the last sentence
> of the comment to reflect that reality.
>
> "We use them here due to the user-facing capability to change the parsing
> rules of SQL-standard string literals​."
>
> The switch is not likely to ever be "complete" and if so not likely in
> whatever period the future reader might consider "short-term".
>
>
FWIW I'm not intending to dig any deeper in this area of the codebase.  I
was actually trying to find "gram.y" via a GitHub search (OT - I'm finding
that to be not the most usable tool...need to get better at the git CLI)
and ended up seeing scan.l, so I figured I'd read a few lines and got hit
with that.  I was trying to formulate an opinion on the "USING opclass"
thread...decided that I'd take a pass on that at this time.

David J.


[HACKERS] Speaking of breaking compatibility...standard_conforming_strings

2016-05-24 Thread David G. Johnston
I just noticed this comment in scan.l:

/*
 * GUC variables.  This is a DIRECT violation of the warning given at the
 * head of gram.y, ie flex/bison code must not depend on any GUC variables;
 * as such, changing their values can induce very unintuitive behavior.
 * But we shall have to live with it as a short-term thing until the switch
 * to SQL-standard string syntax is complete.
 */
int backslash_quote = BACKSLASH_QUOTE_SAFE_ENCODING;
bool escape_string_warning = true;
bool standard_conforming_strings = true;

I'm not exactly sure what else needs to happen to remove these forbidden
GUCs and if we are not prepared to do this now when will we ever be...
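For context, the user-visible behavior these GUCs control:

```sql
SET standard_conforming_strings = on;   -- the default since 9.1
SELECT 'a\nb';   -- backslash is an ordinary character: a\nb

SET standard_conforming_strings = off;  -- legacy behavior
SELECT 'a\nb';   -- backslash starts an escape sequence
                 -- (escape_string_warning may emit a warning here)
```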

David J.


Re: [HACKERS] pg_restore & search_path, COPY failed for table "mytable": ERROR: function myinnerfunction(integer) does not exist

2016-07-21 Thread David G. Johnston
On Thu, Jul 21, 2016 at 1:57 PM, Jean-Pierre Pelletier <
jppellet...@e-djuster.com> wrote:

>
> I'm puzzled as to how search_path should be used.
> Should all references be schema qualified inside functions body ?
>

Pretty much...you can also do:

CREATE FUNCTION funcname() RETURNS ...
LANGUAGE ...
SET search_path TO 'other_schemas_needed_by_this_function'
AS $$
[...]
$$;

You don't have to specify the schema the function is going to reside
in...but there is exposure if you don't.

> Or is search_path safe except in the body of functions used in index or
> constraints?
>

​pg_dump/pg_restore tends to be very conservative in setting search_path.
I'd say you are safe if you can successfully dump/restore and unsafe if you
cannot.

Cross-schema dependencies can be problematic, and if you are not willing
to test that your omissions are immaterial I'd say you should take the
paranoid route and schema-prefix everything - either explicitly or by
taking advantage of attribute-setting options for functions.

Views, materialized and otherwise, are other areas commonly affected by lax
schema specifications.

David J.


Re: [HACKERS] Bug with plpgsql handling of NULL argument of compound type

2016-07-22 Thread David G. Johnston
On Fri, Jul 22, 2016 at 2:13 PM, Tom Lane  wrote:

> There is a rather squishy question as to whether NULL::composite_type
> should be semantically equivalent to ROW(NULL,NULL,...)::composite_type.
> If it is, then the SELECT should have failed before even getting into the
> plpgsql function, because ROW(NULL,NULL) is surely not a valid value of
> type c.  The SQL standard seems to believe that these things *are*
> equivalent (at least, that was how we read the spec for IS [NOT] NULL).
>
>
I dislike that they are considered equal in various circumstances, but if
that's where we are guided, c'est la vie.

> FWIW, there is a very good argument that any not-null restriction on a
> datatype (as opposed to a stored column) is broken by design.  How do
> you expect a LEFT JOIN to a table with such a column to work?

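Tom's LEFT JOIN point can be seen with a sketch like this (the names are
made up for illustration):

```sql
-- A domain declared NOT NULL still surfaces as NULL on the nullable
-- side of an outer join - the restriction cannot be enforced there.
CREATE DOMAIN d_notnull AS int NOT NULL;
CREATE TABLE t (id int, v d_notnull);
INSERT INTO t VALUES (1, 10);

SELECT x.id, t.v
FROM (VALUES (1), (2)) AS x(id)
LEFT JOIN t ON t.id = x.id;
-- the row for id = 2 has v = NULL despite the domain's NOT NULL
```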

​As described - except "NULL::composite_type" isn't atomic.

I/we would like: If you have a non-null value of this composite type then
the first field of the composite must itself be non-null.​

> We
> certainly are not going to enforce the not-nullness in that context,
> and that leads to the thought that maybe we should just deny the validity
> of such restrictions across the board.
>
>
Soft or hard, we should do this.  Being allowed to set NOT NULL on a
domain with these traps built into the architecture is just asking for
angry users when their data model fails to be usable throughout their
application.  The only thing we can offer is that we will respect NOT NULL
when persisting a record to storage.

David J.


Re: [HACKERS] Bug with plpgsql handling of NULL argument of compound type

2016-07-22 Thread David G. Johnston
On Fri, Jul 22, 2016 at 3:04 PM, Merlin Moncure <mmonc...@gmail.com> wrote:

> On Fri, Jul 22, 2016 at 1:39 PM, David G. Johnston
> <david.g.johns...@gmail.com> wrote:
> > On Fri, Jul 22, 2016 at 2:13 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >>
> >> There is a rather squishy question as to whether NULL::composite_type
> >> should be semantically equivalent to ROW(NULL,NULL,...)::composite_type.
> >> If it is, then the SELECT should have failed before even getting into
> the
> >> plpgsql function, because ROW(NULL,NULL) is surely not a valid value of
> >> type c.  The SQL standard seems to believe that these things *are*
> >> equivalent (at least, that was how we read the spec for IS [NOT] NULL).
> >>
> >
> > I dislike that they are considered equal in various circumstances but if
> > that's we are guided toward c'est la vie.
>
> Not sure we are guided there.  Currently we follow the spec
> specifically with the IS NULL operator but not in other cases. For
> example.
> postgres=# select row(null, null) is null;
>  ?column?
> ──
>  t
>

​[...]


> ​
>
> The basic problem we have is that in postgres the record variable is a
> distinct thing from its contents and the spec does not treat it that
> was. For my part, I think the spec is totally out to lunch on this
> point but we've been stuck with the status quo for quite some time now
> -- there's been no compelling narrative that suggests how things
> should be changed and to what.
>

​In short,

1) We should discourage/remove the NOT NULL aspect of DOMAINs.

2) If one wishes to implement composite types defining not null components
it should
2a) be done on the CREATE TYPE statement
2b) involve behind-the-scenes transformation of row(null, null)::ctype to
null::ctype and null::ctype should not be validated, ever


​I cannot speak to the standard nor the entirety of our implementation in
this area, but...​

​I don't personally have a problem with (conceptually, not actual
evaluations):

select row(null, null) is null --> true
select null is null --> true
select null = row(null, null) --> false (at least with respect to
implementation)

IS NULL and equality are two different things.  That both constructs
evaluate to null but are not implementation equivalent, while maybe a bit
ugly, doesn't offend me.  I'd just consider "row(null, null) is null" to be
a special case circumstance required by the spec and move on.

Then, forcing "null::composite" to be evaluated like "row(null,
null)::composite" ​can be considered incorrect.


​If anything, ROW(null, null)::ctype should hard transformed to
"null::ctype" but not the other way around.  Only after an attempt to
transform row(null, null) is performed should the type constraints be
applied to those values not able to be transformed.

That all said I'm still disinclined to suggest/allow people to add NOT NULL
constraints to DOMAINs.  All other types in the system are basically
validated using the rule: "if there is a non-null value of this type ensure
that said value conforms".  As domains are types they should conform to
this policy.  A composite type is a container for other types.  The
container itself should be allowed to have its own rules - in much the same
way as a table does [1].

My concern regarding the above is that the row/isnull behavior is all
defined around the composite type yet the notnull constraint is attached to
the DOMAIN and I dislike that disconnect.  Having the NOT NULL on the
composite type and only having it applied after any value of the form
row(null, null)::ctype is reduced to null::ctype - a form in which all
subfield constraints are ignored - would be closer to my liking.  It also
avoids other problems related to DOMAINs but not requiring their use.

David J.

[1] I see a problem here if one were to change the behavior of explicit
composite types.  w.r.t. tables the NOT NULL constraints is not part of the
implicitly created type but if we allow an explicitly declared composite to
use NOT NULL and don't default the implicit types the disconnect could be
confusing.


Re: [HACKERS] Proposal: revert behavior of IS NULL on row types

2016-07-22 Thread David G. Johnston
On Fri, Jul 22, 2016 at 7:01 PM, Andrew Gierth 
wrote:

> In light of the fact that it is an endless cause of bugs both in pg and
> potentially to applications, I propose that we cease attempting to
> conform to the spec's definition of IS NULL in favour of the following
> rules:
>
> 1. x IS NULL  is true if and only if x has the null value (isnull set).
>

I don't have a problem conforming to "ROW(NULL, NULL) IS NULL" being
true...if you somehow get hold of something in that form, which your other
points address.


> 2. x IS NOT NULL  if and only if  NOT (x IS NULL)
>

​I would rather prohibit "IS NOT NULL" altogether.​  If one needs to test
"NOT (x IS NULL)" they can write it that way.

3. ROW() and other row constructors never return the null value.
>

I think I get this (though if they return row(null, null) I'd say there is
no difference as far as the user is concerned)...


> Whole-row vars when constructed never contain the null value.
>

...but what does this mean in end-user terms?​


> 4. Columns or variables of composite type can (if not declared NOT NULL)
> contain the null value (isnull set) which is distinct from an
> all-columns-null value.
>

Is this just about the validation of the component types; which seems only
to be realized via DOMAINs?  If not I don't follow how this applies or is
different from what we do today.


> 5. COALESCE(x,y) continues to return y if and only if x is the null
> value. (We currently violate the spec here.)
>

I would concur - especially if in your referenced example
COALESCE((null,1),(2,null)) indeed would have to return (2,null).

My comment to #1 implies that I think COALESCE((null,null),(2,null)) should
return (2,null)...I am OK with that.  Operationally (null,null) should be
indistinguishable from the null value.  It mostly is today and we should
identify and fix those areas where they are different - not work to make
them more distinguishable.


>
> (X. Optionally, consider adding new predicates:
>
>   x IS ALL NULL
>   x IS NOT ALL NULL
>   x IS ALL NOT NULL
>   x IS NOT ALL NOT NULL
>
> which would examine the fields of x non-recursively.)
>
>
Not sure regarding recursion here, but I'd much rather work out a way to
fit this into the existing ANY syntax:

NULL IS ANY(x) -- definitely needs some bike-shedding though...

​This presupposes that ROW(null, null) and null are indistinguishable
operationally which makes the "ALL" form unnecessary; and ANY = NOT(ALL)

David J.


Re: [HACKERS] Proposal: revert behavior of IS NULL on row types

2016-07-22 Thread David G. Johnston
On Fri, Jul 22, 2016 at 8:04 PM, Andrew Gierth <and...@tao11.riddles.org.uk>
wrote:

> >>>>> "David" == David G Johnston <david.g.johns...@gmail.com> writes:
>
>  >> 2. x IS NOT NULL  if and only if  NOT (x IS NULL)
>
>  David> ​I would rather prohibit "IS NOT NULL" altogether.​ If one needs
>  David> to test "NOT (x IS NULL)" they can write it that way.
>
> Prohibiting IS NOT NULL is not on the cards; it's very widely used.
>
>
​Yet changing how it behaves, invisibly, is?  I'm tending to agree that
status-quo is preferable to either option but if you say change is
acceptable I'd say we should do it visibly.

Allowing syntax that is widely used but implementing it differently than
how it is being used seems worse than telling people said syntax is
problematic and we've chosen to avoid the issue altogether.


>  >> Whole-row vars when constructed never contain the null value.
>
>  David> ...but what does this mean in end-user terms?​
>
> It means for example that this query:
>
>   select y from x left join y on (x.id=y.id) where y is null;
>
> would always return 0 rows.
>
>
​Ok, so I'm pretty sure I disagree on this one too.

David J.


Re: [HACKERS] Proposal: revert behavior of IS NULL on row types

2016-07-22 Thread David G. Johnston
On Friday, July 22, 2016, Andrew Gierth <and...@tao11.riddles.org.uk> wrote:

> >>>>> "David" == David G Johnston <david.g.johns...@gmail.com
> <javascript:;>> writes:
>
>  >> Prohibiting IS NOT NULL is not on the cards; it's very widely used.
>
>  David> ​Yet changing how it behaves, invisibly, is?
>
> Did you mean prohibiting it only for composite-type args? It's obviously
> widely used for non-composite args.
>
> I would expect that >95% of cases where someone has written (x IS NOT
> NULL) for x being a composite type, it's actually a bug and that NOT (x
> IS NULL) was intended.
>
>
Yeah, it would need to be targeted there.  I agree with the numbers and the
sentiment but it's still allowed and defined behavior which changing
invisibly is disconcerting.

David J.


Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-07-29 Thread David G. Johnston
On Fri, Jul 29, 2016 at 8:18 PM, Bruce Momjian  wrote:

> On Tue, Jul 12, 2016 at 01:36:38PM +, thomas.ber...@1und1.de wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference:  14244
> > Logged by:  Thomas Berger
> > Email address:  thomas.ber...@1und1.de
> > PostgreSQL version: 9.5.3
> > Operating system:   any
> > Description:
> >
> > pg_size_pretty uses the suffix "kB" (kilobyte, 10^3 byte), but the
> returned
> value is "KB", or "KiB" (kibibyte, 2^10 byte). This is misleading and
> > should be fixed. See also https://en.wikipedia.org/wiki/Kibibyte
> >
> > =# select pg_size_pretty(1024000::bigint);
> >  pg_size_pretty
> > 
> >  1000 kB
>
> (Thread moved to hackers.)
>
> Yes, we have redefined kB, and documented its use in postgresql.conf and
> pg_size_pretty(), but it does not match any recognized standard.
>

​After bouncing on this for a bit I'm inclined to mark the bug itself
"won't fix" but introduce a "to_binary_iso" function (I'm hopeful a better
name will emerge...) that will output a number using ISO binary suffixes.
I would document this under 9.8 "data type formatting functions" instead of
within system functions.

pg_size_pretty output can continue to serve a defined role - as input to
GUC variables and for backward compatibility.  Add a note near its
definition pointing to "to_binary_iso" for a standard-conforming output
string.

David J.


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-01 Thread David G. Johnston
On Mon, Aug 1, 2016 at 10:19 AM, Tom Lane  wrote:

> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Building on the has-property approach Andrew suggested, I wonder if
> >> we need something like pg_index_column_has_property(indexoid, colno,
> >> propertyname) with properties like "sortable", "desc", "nulls first".
>
> > Right, this makes sense to me.  The point which I was trying to get at
> > above is that we should be able to replace most of what is provided in
> > pg_get_indexdef() by using this function to rebuild the CREATE INDEX
> > command- again, similar to how we build a CREATE TABLE command rather
> > than simply provide a 'pg_get_tabledef()'.
>
> As far as I understood Andrew's use case, he was specifically *not*
> interested in a complete representation of an index definition, but
> rather about whether it had certain properties that would be of
> interest to query-constructing applications.
>
>
+1

I guess it might matter whether the query being constructed is CREATE
INDEX or SELECT.

The latter seems to be more useful.  The former should probably be
something the server provides as a whole when requested.

David J.


Re: [HACKERS] Proposal: revert behavior of IS NULL on row types

2016-07-26 Thread David G. Johnston
On Tue, Jul 26, 2016 at 11:10 AM, Tom Lane  wrote:

>
> 3. Andrew also revived the bug #7808 thread in which it was complained
> that ExecMakeTableFunctionResult should not fail on null results from
> functions returning SETOF composite.  That's related only in that the
> proposed fix is to translate a plain NULL into ROW(NULL,NULL,...).
> I do not much like the implementation details of his proposed patch,
> but I think making that translation is probably better than failing
> altogether.  Given the infrequency of field complaints, my recommendation
> here is to fix it in HEAD but not back-patch.
>

​Andrew's response also introduces an area for documentation improvement.

The concept embodied by "NULL" in the operator "IS [NOT] NULL" is distinct
from the concept embodied by "NULL" in the operator "IS [NOT] DISTINCT
FROM".

In short, the former smooths out the differences between composite and
non-composite types while the latter maintains their differences.  While a
bit confusing, I don't see that there is much to be done about it - aside
from making the distinction clearer at:

​https://www.postgresql.org/docs/devel/static/functions-comparison.html

Does the spec support or refute this distinction in treatment?

CREATE TYPE twocol AS (col1 text, col2 int);
CREATE TYPE twocolalt AS (col1 text, col2 int);

SELECT
row(null, null)::twocol IS NULL,
null::twocol IS NULL,
null::twocol IS NOT DISTINCT FROM row(null, null)::twocol

It's at least worth considering whether to note that when comparing two
composite values using IS DISTINCT FROM, the comparison is unaware of
the composite type itself and instead performs an iterative comparison of
each pair of columns.

SELECT row(null, null)::twocol IS NOT DISTINCT FROM row(null,
null)::twocolalt

This is likely to matter little in practice given the low odds of someone
accidentally comparing two physically identical but semantically different
composite types.
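The distinction above can be modeled in a few lines; the sketch below is in Python purely for illustration (tuples stand in for composite values and None for NULL; the helper names are invented, not PostgreSQL APIs):

```python
# Sketch: model the two SQL operators on composite values, using Python
# tuples for rows and None for NULL.  Helper names are invented.

def row_is_null(row):
    # ROW(...) IS NULL is true only when every field is NULL; it "smooths
    # out" the composite-ness of the value.
    return all(field is None for field in row)

def row_not_distinct(a, b):
    # IS NOT DISTINCT FROM compares column by column, treating a pair of
    # NULLs as equal; the composite type itself is not consulted.
    return len(a) == len(b) and all(
        x == y or (x is None and y is None) for x, y in zip(a, b))

print(row_is_null((None, None)))                     # True
print(row_is_null((None, 1)))                        # False
print(row_not_distinct((None, None), (None, None)))  # True
```

This also makes the "unaware of the composite type" point concrete: two tuples of different declared types but identical contents compare as not distinct.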

David J.






Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-07-30 Thread David G. Johnston
On Sat, Jul 30, 2016 at 10:35 AM, Tom Lane  wrote:

> Greg Stark  writes:
> > I think Bruce's summary is a bit revisionist.
>
> I would say it's a tempest in a teapot.
>
> What I think we should do is accept "kb" and the rest case-insensitively,
> print them all in all-upper-case always, and tell standards pedants
> to get lost.  The idea of introducing either a GUC or new function names
> is just silly; it will cause far more confusion and user code breakage
> than will result from just leaving well enough alone.
>

I wouldn't mind fixing case sensitivity in the process...but I don't think
we need to touch the GUC infrastructure at all.

For a product that has a reasonably high regard for the SQL standard I'd
like to at least keep an open mind about other relevant standards - and if
accommodation is as simple as writing a new function I'd see no reason to
reject such a patch.  pg_size_pretty never did seem like a good name for a
function with its behavior...let's be open to accepting an improved version
without a pg_ prefix.

We could even avoid a whole new function and add an "iB" template pattern
to the to_char function - although I'm not sure that wouldn't be more
confusing than helpful in practice.
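The SI-versus-IEC distinction under discussion boils down to the choice of base (1000 vs. 1024) and suffix set ("kB" vs. "KiB"); a minimal sketch in Python, with an invented function name:

```python
# Sketch of SI (base-1000, "kB") vs. IEC (base-1024, "KiB") prettifying.
# The function name and rounding behavior are illustrative assumptions,
# not pg_size_pretty's actual algorithm.

def size_pretty(n, iec=True):
    units = ["B", "KiB", "MiB", "GiB", "TiB"] if iec else ["B", "kB", "MB", "GB", "TB"]
    base = 1024 if iec else 1000
    value = float(n)
    for unit in units:
        # Stop when the value fits under one more division, or we run out
        # of suffixes.
        if abs(value) < base or unit == units[-1]:
            return "%.0f %s" % (value, unit)
        value /= base

print(size_pretty(2048))             # 2 KiB
print(size_pretty(2000, iec=False))  # 2 kB
```

An "iB" to_char template pattern would essentially be choosing between these two unit tables at format time.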

David J.


Re: [HACKERS] Fwd: [BUGS] BUG #14247: COMMENT is restored on wrong database

2016-08-02 Thread David G. Johnston
Moving to -hackers since this is getting into details

Bug Report # 14247

On Tue, Aug 2, 2016 at 4:58 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> "David G. Johnston" <david.g.johns...@gmail.com> writes:
> > Do you have an opinion on this following?
>
> I think the real problem in this area is that the division of labor
> we have between pg_dump and pg_dumpall isn't very good for real-world
> use cases.


I won't disagree here.


>   I'm not at all sure what a better answer would look like.
> But I'd rather see us take two steps back and consider the issue
> holistically instead of trying to band-aid individual misbehaviors.
>
> The fact that pg_dump is emitting COMMENT ON DATABASE at all is
> fundamentally wrong given the existing division-of-labor decisions,
> namely that pg_dump is responsible for objects within a database
> not for database-level properties.


​But I have to disagree with this in the presence of the --create flag.


> It's entirely possible that some feature like COMMENT ON CURRENT DATABASE
> would be a necessary component of a holistic solution,
> [ various other ON CURRENT commands elided ]
>

In reading the code for the public schema bug I never fully got my head
around how dump/restore interacts with multiple databases.  Specifically,
in RestoreArchive, why we basically continue instead of breaking once we've
encountered the "DATABASE" entry and decided that we need to drop the DB due
to (createDB && dropSchema).

OTOH, seeing the code for the public schema exception I'm inclined to
think that adding something like the following just after the just-patched
public schema block would work:

/* If the user didn't request that a new database be created skip emitting
 * commands targeting the current database as they may not have the same
 * name as the database being restored into.
 *
 * XXX This too is ugly but the treatment of globals is not presently
 * amenable to pretty solutions
 */
if (!ropt->createDB)
{
	/* this is quite possibly too broad a check...I'm always concerned
	 * about false positives in cases like these. */
	if (strcmp(te->desc, "DATABASE") == 0)
		return;
}

Though reading it now, that seems a bit like throwing the baby out with the
bath water; then again, if they are not requesting that a database be created
and they happen to have a database of the same name already in place, it
would be unexpected for it not to have been properly configured.

Assuming I'm not missing something here it seems like an easy and
internally consistent hack/fix for a rare but real concern.  However, since
the existing code doesn't provoke an error it doesn't seem like something
that should be back-patched (I'm not convinced either way though).

David J.


Re: [HACKERS] New version numbering practices

2016-08-01 Thread David G. Johnston
On Mon, Aug 1, 2016 at 1:41 PM, Tom Lane  wrote:

> Over the past couple of months I have already found myself
> writing "10.0" or "9.7^H^H^H10" to make it clear that I meant the next
> release version, because just "10" seemed too ambiguous.


I thought that was just (and maybe some instances were) humor regarding
the general indecisiveness on the issue.


>   Maybe I'm
> worried about nothing and the ambiguity mostly came from our not having
> settled the two-or-three-part-version-number question, but I'm not sure.
>

I think this dynamic will sort itself out.
I suspect I'll end up using 10.x somewhat frequently though I'm mostly on
the lists.  I suspect the choice will be dependent on context and channel.

David J.


Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON

2016-08-11 Thread David G. Johnston
On Thu, Aug 11, 2016 at 2:11 AM, Venkata Balaji N  wrote:

>
> ​[...]
>  committing all the previously open transactions
> ​[...]
>

"All"?  There can only ever be at most one open transaction at any given
time...

I don't have a fundamental issue with saying "when turning auto-commit on
you are also requesting that the open transaction, if there is one, is
committed immediately."  I'm more inclined to think an error is the correct
solution - or to respond in a way conditional to the present usage
(interactive vs. script).  I disagree with  Robert's unsubstantiated belief
regarding ON_ERROR_STOP and think that it captures the relevant user-intent
for this behavior as well.

David J.


Re: [HACKERS] New version numbering practices

2016-08-03 Thread David G. Johnston
On Wed, Aug 3, 2016 at 1:20 PM, Tom Lane  wrote:

> Greg Stark  writes:
> > Right now git-describe --tags on a random revision between 9.4
> > and 9.5 will print something like REL9_4_BETA1-1973-g85c25fd or
> > something like REL9_5_BETA2-33-g55a2cc8 if it happens to be after a
> > beta. It's really hard to tell what release the revision you're on is
> > actually between from that.
>
> That command is kinda useless AFAICT :-(
>

​Mostly as a function of a lack of definition as to what it wants to show.
It would be good to at least ensure that the shared commit between master and
a release branch is tagged on master.

git describe --tags REL9_6_BETA1~1 should show REL9_5_0 (or, e.g.,
REL9_5_GOLIVE if we cannot reasonably put the 9.5.0 tag on master) and not
REL9_5_ALPHA1-*

David J.


Re: [HACKERS] New version numbering practices

2016-08-03 Thread David G. Johnston
On Wed, Aug 3, 2016 at 12:51 PM, Greg Stark  wrote:

> On Tue, Aug 2, 2016 at 2:57 PM, Tom Lane  wrote:
> > I thought we'd pretty much done that cleanup during the cvs->git
> > conversion?
>
> I guess I'm talking about tags. I'm not clear on the distinction
> between tags and branches names in git.
>

​Ignoring git for a moment our setup is that each major version (9.6) gets
a branch when it is finally released.  For each minor release the last
commit on each branch that is included in the release is tagged with both
the branch/major-version AND the patch/minor-version.

I'm sure the internet can provide a better overview of the differences,
within git, between tags and branches.  One way to look at it, though, is
that tags are explicit labels pointing to commits whereas branches are
implicit labels.

When you do:  git checkout branch you are asking for whatever HEAD - for
that branch - points to.  Committing to a branch causes a new commit to be
created and then HEAD - for that branch - to be moved.  So you are, by
default, dealing with the implicit HEAD label within the branch "namespace".

David J.


Re: [HACKERS] max_parallel_degree > 0 for 9.6 beta

2016-08-04 Thread David G. Johnston
On Thu, Aug 4, 2016 at 9:25 AM, Robert Haas  wrote:

> On Thu, Aug 4, 2016 at 12:56 AM, Tom Lane  wrote:
> > Noah Misch  writes:
> >> On Wed, Apr 20, 2016 at 02:28:15PM -0400, Tom Lane wrote:
> >>> +1, but let's put an entry on the 9.6 open-items page to remind us to
> >>> make that decision at the right time.
> >
> >> It's that time.  Do we restore the max_parallel_workers_per_gather=0
> default,
> >> or is enabling this by default the right thing after all?
> >
> > At this point I'd have to vote against enabling by default in 9.6.  The
> > fact that in the past week we've found bugs as bad as e1a93dd6a does not
> > give me a warm fuzzy feeling about the parallel-query code being ready
> > for prime time.
> >
> > Of course the question is how do we ever get to that point if we chicken
> > out with enabling it by default now.  Maybe we could keep it turned on
> > in HEAD.
>
> However, we have a long track
> record of being cautious about things like this, so I would be OK with
> the idea of disabling this in the 9.6 branch and leaving it turned on
> in master, with the hope of shipping it enabled-by-default in v10.


My initial reaction was +1 but now I'm leaning toward enabled by default.

Those who would upgrade to 9.6 within a year of its release are most
likely, process- and personality-wise, to be those for whom the risks and
rewards of new features are part of their everyday routine.

If indeed we release 10.0 with it enabled by default we should be confident
in its stability at that point in the 9.6.x branch.

David J.​


Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON

2016-08-12 Thread David G. Johnston
On Fri, Aug 12, 2016 at 3:03 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Thu, Aug 11, 2016 at 8:34 AM, David G. Johnston
> <david.g.johns...@gmail.com> wrote:
> > I don't have a fundamental issue with saying "when turning auto-commit on
> > you are also requesting that the open transaction, if there is one, is
> > committed immediately."  I'm more inclined to think an error is the
> correct
> > solution - or to respond in a way conditional to the present usage
> > (interactive vs. script).  I disagree with  Robert's unsubstantiated
> belief
> > regarding ON_ERROR_STOP and think that it captures the relevant
> user-intent
> > for this behavior as well.
>
> I'll substantiate my belief by referring to you for the documentation
> for ON_ERROR_STOP, which says:
>
> "By default, command processing continues after an error. When this
> variable is set to on, processing will instead stop immediately. In
> interactive mode, psql will return to the command prompt; otherwise,
> psql will exit, returning error code 3 to distinguish this case from
> fatal error conditions, which are reported using error code 1. In
> either case, any currently running scripts (the top-level script, if
> any, and any other scripts which it may have in invoked) will be
> terminated immediately. If the top-level command string contained
> multiple SQL commands, processing will stop with the current command."
>
> In every existing case, ON_ERROR_STOP affects whether we continue to
> process further commands after an error has already occurred.  Your
> proposal would involve changing things so that the value ON_ERROR_STOP
> affects not only *how errors are handled* but *whether they happen in
> the first place* -- but only in this one really specific case, and no
> others.
>
> This isn't really an arguable point, even if you want to try to
> pretend otherwise.  ON_ERROR_STOP should affect whether we stop on
> error, not whether we generate an error in the first place.  The clue is
> in the name.
>
>
Changing AUTOCOMMIT to ON while in a transaction is a psql error - period.

If ON_ERROR_STOP is on we stop.  This meets the current semantics for
ON_ERROR_STOP.

With ON_ERROR_STOP off psql is going to continue on with the next command.
I'd suggest changing things so that psql can, depending upon the error,
invoke additional commands to bring the system into a known good state
before the next user command is executed.  In the case of "\set AUTOCOMMIT
on" this additional command would be COMMIT.  We can still report the error
before continuing on - so the error is still generated in the first place.

​Allowing command-specific responses to errors when faced with script
continuation would be a change.  I do not think it would be a bad one.  Am
I stretching a bit here?  Sure.  Is it worth stretching to avoid adding
more knobs to the system?  Maybe.

I'll admit I haven't tried to find fault with the idea (or discover better
alternatives) nor how it would look in implementation.  As a user, though,
it would make sense if the system behaved in this way.  That only
AUTOCOMMIT needs this capability at the moment doesn't bother me.  I'm also
fine with making it an error and moving on - but if you want to accommodate
both possibilities ​this seems like a cleaner solution than yet another
environment variable that a user would need to consider.

David J.


Re: [HACKERS] unexpected psql "feature"

2016-07-13 Thread David G. Johnston
On Wed, Jul 13, 2016 at 5:44 PM, Fabien COELHO  wrote:

>
>> I do not think changing this is appropriate.  All you are likely to
>> accomplish is breaking code that does what its author wanted.
>>
>
> Hmmm... My 0.02€: Currently this feature is NOT documented, so somehow it
> is not supported, and relying on it seems risky, as it is really a side
> effect of the current implementation. If it becomes documented, it could be
> made to behave sanely at the same time...


​To me it has sane and well-defined behavior - if maybe rarely useful.

Why would you choose to execute "SELECT 1 \; SELECT 2;" instead of "SELECT
1; SELECT 2;"​ in a setup where the behavior of both strings is identical?
Or, rather, how would they differ?

David J.


Re: [HACKERS] unexpected psql "feature"

2016-07-13 Thread David G. Johnston
On Wed, Jul 13, 2016 at 5:44 PM, Fabien COELHO  wrote:

>
> Hello Tom,
>
> Although "\;" behavior is not documented, I would have expected both
>>> results to be shown one after the other, or having a an error, but not a
>>> quiet discard.
>>>
>>
>> See the documentation for PQexec(): all but the last query result is
>> discarded.
>>
>
> Sure. That is developer-level answer to "why", although it does not really
> say why the developer chose PQexex over PQsendQuery. At the user-level the
> behavior is still pretty surprising.


Let's try putting it this way...

As a psql user I want some way to choose whether I send my query via
"PQexec" or "PQsendQuery".  I'm not sure why the "PQexec" access point is
undocumented but this "\;" syntax, vis-a-vis ";" provides me that choice.

David J.


Re: [HACKERS] pgbench - allow to store select results into variables

2016-07-13 Thread David G. Johnston
On Wed, Jul 13, 2016 at 4:02 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > On Sat, Jul 9, 2016 at 7:52 AM, Fabien COELHO 
> wrote:
> >> If someone thinks that "gset" is a good idea for pgbench, which I
> don't, it
> >> could be implemented. I think that an "into" feature, like PL/pgSQL &
> ECPG,
> >> makes more sense for scripting.
>
> > I agree: I like \into.
>
> > But:
>
> >> SELECT 1, 2 \; SELECT 3;
> >> \into one two three
>
> > I think that's pretty weird.
>
> Yeah, that's seriously nasty action-at-a-distance in my view.  I'd be okay
> with
>
> SELECT 1, 2 \into one two
> SELECT 3 \into three
>
> but I do not think that a metacommand on a following line should
> retroactively affect the execution of a prior command, much less commands
> before the last one.


You need a test and a definition for:

SELECT 1, 2;
SELECT 3;
\into x, y, z

It should fail - "too many variables" - right?
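The arity rule being asked about is easy to state in code; a Python sketch (invented names, standing in for whatever pgbench's \into would actually do) where only the last result set of a compound statement survives and the variable count must match its column count:

```python
# Sketch: bind one result row to a list of variable names, as an \into-like
# meta-command would.  Names and error wording are illustrative assumptions.

def bind_into(row, names):
    # With "\;" only the last SELECT's result is returned to the client,
    # so a trailing \into has a single row to bind.  Binding should fail
    # when the counts disagree.
    if len(names) != len(row):
        raise ValueError("too many variables" if len(names) > len(row)
                         else "not enough variables")
    return dict(zip(names, row))

print(bind_into((3,), ("three",)))  # {'three': 3}
try:
    bind_into((3,), ("x", "y", "z"))  # SELECT 3 yields one column
except ValueError as e:
    print(e)                          # too many variables
```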

David J.


Re: [HACKERS] unexpected psql "feature"

2016-07-13 Thread David G. Johnston
On Wed, Jul 13, 2016 at 4:47 PM, Fabien COELHO  wrote:

>
> I would suggest that:
>  - the \; psql feature should be documented somewhere
>

agreed


>  - all results should be shown, not just the last one
>

disagree

# select 1 ; select 2 ;
 ?column?
----------
        1
(1 row)

 ?column?
----------
        2
(1 row)


Having

# select 1 \; select 2 ;

result in identical behavior seems undesirable.  At least now if you want
to discard all intermediate work and just show the last statement you can
do so without going to any great lengths.  If you really want both results
don't use "\;".  This makes even more sense when the earlier statements are
DML instead of SELECT.

David J.




Re: [HACKERS] Odd error when using UNION and COLLATE

2016-07-20 Thread David G. Johnston
On Wed, Jul 20, 2016 at 5:38 PM, Bruce Momjian  wrote:

> I think the 'ORDER BY x COLLATE "C"' is being parsed as an a_expr, and
> we don't allow a_expr in a UNION.  Perhaps we are too strict here, but I
> can't tell.
>

ORDER BY 1 COLLATE "C" is indeed an expression - the number no longer
refers to a column position but is instead a constant.  The presence or
absence of UNION doesn't factor into things here - the expression itself is
useless on its face.

This one is a bit different in cause but I suspect it is working as well as
can be expected.

SELECT 'a-c' AS x UNION ALL SELECT 'ab' AS x ORDER BY x COLLATE "C";

David J.


Re: [HACKERS] to_date_valid()

2016-07-04 Thread David G. Johnston
On Mon, Jul 4, 2016 at 8:39 PM, Andreas 'ads' Scherbaum <
adsm...@wars-nicht.de> wrote:

> On 04.07.2016 18:37, Pavel Stehule wrote:
>
>>
>> I don't know if the name "strict" is best, but the name "validate" is
>> not good too. Current to_date does some validations too.
>>
>
> Obviously not enough, because it allows invalid dates. I'd say that the
> current to_date() merely validates the input format for string parsing, and
> that the date is in range. But there is not much validation on the date
> itself.
>
> So the name can't be "strict" because of the conflict with "NULL"
> handling, and you don't like "valid" - what other options do you offer?


We don't have to change the name...we could do something like how regular
expressions work - like (?i) - and just add a new modifier code.

'~-MI-DD' --that's a leading tilde, could be anything - even something
like "HM-MI-DD" for "historical mode"

​Also, see this thread of a few weeks ago for related material:

https://www.postgresql.org/message-id/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.com

It seems that fixing it is back on the table, possibly even for 9.6 since
this is such a hideous bug - one that closely resembles a cockroach ;)

WRT the 2014 "reject out-of-range dates" thread, I'm kinda surprised
that we didn't just set the date to be the minimum or maximum allowable
date back in 9.2 instead of rejecting it...

I'd be more inclined to add another argument if it were basically an enum.

to_date(text, format, format_style)

This at least opens the door for future enhancements that are associated
with named styles.  I could imagine an "exact" format that also allows for
something like regexp character classes so that one can write:
 "[:SEP:]MM[:SEP:]DD" to keep exact matches but accommodate variations
on what people type as a separator e.g. [.-/]

format_style=>historical would invoke today's behaviors
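For a sense of what the "strict"/"valid" style would reject, compare with a parser that refuses calendar-impossible inputs outright; Python's strptime is shown purely as an analogy (the wrapper name is invented, and this is not to_date's actual behavior, which rolls such dates over instead):

```python
from datetime import datetime

def to_date_strict(s, fmt="%Y-%m-%d"):
    # A strict parse rejects calendar-impossible inputs such as Feb 31 -
    # the behavior proposed for to_date_valid()/format_style=>strict.
    return datetime.strptime(s, fmt).date()

print(to_date_strict("2016-02-29"))  # 2016-02-29 (valid leap day)
try:
    to_date_strict("2016-02-31")
except ValueError:
    print("rejected")                # rejected
```

A format_style enum would then choose between this rejecting behavior and today's lenient ("historical") one.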

David J.


Re: [HACKERS] Typo Patch

2016-07-05 Thread David G. Johnston
On Tue, Jul 5, 2016 at 11:54 AM, Robert Haas  wrote:

> On Sat, Jul 2, 2016 at 12:17 PM, CharSyam  wrote:
> > I fixed typos. and attached patch for this.
> > Thanks.
> >
> > I only changed comments only in src/backend/utils/adt/tsvector_op.c
>
> Well, that's definitely a typo.  However, typo or no typo, it's hard
> for me to figure out with any certainty what that sentence actually
> means.
>

Given that "curitem->qoperator.distance" is constant.

Also, both loops go from low to high position (over the same scale) with
the inner loop stopping as soon as it moves beyond the position of the
outer loop - namely, the left/inner item is always positioned to the
left of the right/outer operand within the document being searched.

Regardless of whether there is a match at this meeting point increasing the
outer loop's position will not cause any of the previously checked (at the
lower positions) inner loop items to match where they did not before.  I
say this but I'm concerned that for sufficiently large values of
curitem->qoperator.distance a given left operand could match multiple right
operands since the distance is a maximum extent.

Thus, in the case of a match the current Lpos needs to be checked again
during the subsequent iteration of the outer loop.

The "else if (Rpos - Lpos < distance) break" block though is oddly
commented/written since distance is a maximum - shouldn't this be treated
as a match as well?

Since, for sufficiently large values of "distance", a single left operand
could cause multiple right operands to be matched when the less-than
condition matches we need to leave Lpos alone and try the next Rpos against
it.  For the greater than (default) and the equal cases we can skip
rechecking the current Lpos.

The comment in question can be removed - we're not actually resetting
anything here.  The use of LposStart is not really needed - just make sure
to leave Lpos in the correct position (i.e. optional increment on all
paths) and the inner while loop will do the correct thing.

It seems a bit odd to be keying off of the RIGHT operand and returning its
position when left-to-right languages would consider the position of the
leading word to be reported.

Code Comment Suggestion:

For each physically positioned right-side operand iterate over each
instance of the left-side operand to locate one within the specified
distance.  As soon as a match is found move on to the next right-operand and
continue searching starting with the last checked left-side operand.  Note
that for an exact distance match the current left-side operand can be
skipped over.

For some graphical imagery consider how a Slinky operates.  Hold the left
side, move the right side, release the left and let it collapse; repeat.
In this case, though, the collapsing can be stopped mid-stream since the
distance check has an inequality.
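To make the loop structure concrete, here is the same two-pointer scan sketched in Python (an illustration only, not the C code itself; exact-distance semantics as in the patch below, and both position lists assumed sorted strictly ascending as tsvector positions are):

```python
# Two-pointer scan over sorted word-position lists: for each right-operand
# position r, resume scanning left-operand positions from where the
# previous r left off - the "Slinky" motion described above.

def phrase_matches(lpos, rpos, distance):
    out = []
    start = 0
    for r in rpos:
        j = start
        while j < len(lpos):
            gap = r - lpos[j]
            if gap == distance:
                out.append(r)
                start = j + 1  # a matched l cannot also match a later, larger r
                break
            elif gap > distance:
                j += 1         # l too far behind r; permanently skippable
                start = j
            else:
                break          # gap < distance (or l >= r): keep l for the next r
    return out

print(phrase_matches([1, 5], [3, 7], 2))  # [3, 7]
print(phrase_matches([1], [2, 3], 2))     # [3]
```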

The following shouldn't be taken as an actual patch submission but rather a
codification of the above.

diff --git a/src/backend/utils/adt/tsvector_op.c
b/src/backend/utils/adt/tsvector_op.c
index 242b7e1..fefaca5 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -1375,7 +1375,6 @@ TS_phrase_execute(QueryItem *curitem,
  ExecPhraseData Ldata = {0, false, NULL},
  Rdata = {0, false, NULL};
  WordEntryPos *Lpos,
-   *LposStart,
*Rpos,
*pos_iter = NULL;

@@ -1423,15 +1422,17 @@ TS_phrase_execute(QueryItem *curitem,
  * ExecPhraseData->data can point to the tsvector's WordEntryPosVector
  */

+ /*
+  * For each physically positioned right-side operand iterate over each
+  * instance of the left-side operand to locate one within the specified
+  * distance.  As soon as a match is found move onto the next right-operand
+  * and continue searching starting with the last checked left-side
operand.
+  * Note that for an exact distance match the current left-side operand
+  * can be skipped over.
+  */
  Rpos = Rdata.pos;
- LposStart = Ldata.pos;
  while (Rpos < Rdata.pos + Rdata.npos)
  {
- /*
- * We need to check all possible distances, so reset Lpos
- * to guranteed not yet satisfied position.
- */
- Lpos = LposStart;
  while (Lpos < Ldata.pos + Ldata.npos)
  {
  if (WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) ==
@@ -1449,7 +1450,7 @@ TS_phrase_execute(QueryItem *curitem,
  * could not satisfy distance for any other right
  * position
  */
- LposStart = Lpos + 1;
+ Lpos++;
  break;
  }
  else
@@ -1462,16 +1463,26 @@ TS_phrase_execute(QueryItem *curitem,
  }

  }
- else if (WEP_GETPOS(*Rpos) <= WEP_GETPOS(*Lpos) ||
- WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) <
+ else if (WEP_GETPOS(*Rpos) <= WEP_GETPOS(*Lpos))
+ {
+ /*
+ * No Increment - we are beyond the current right operand
+ * so its possible this left operand could match the
next right
+ * operand.
+ */
+ break;
+ }
+ else if (WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) <
  curitem->qoperator.distance)
  {
+ /* 

Re: [HACKERS] to_date_valid()

2016-07-05 Thread David G. Johnston
On Tue, Jul 5, 2016 at 5:22 AM, Andreas 'ads' Scherbaum <
adsm...@wars-nicht.de> wrote:

> On 05.07.2016 04:33, David G. Johnston wrote:
>
>> On Mon, Jul 4, 2016 at 8:39 PM, Andreas 'ads' Scherbaum
>> <adsm...@wars-nicht.de <mailto:adsm...@wars-nicht.de>>wrote:
>>
>> On 04.07.2016 18:37, Pavel Stehule wrote:
>>
>>
>> I don't know if the name "strict" is best, but the name
>> "validate" is
>> not good too. Current to_date does some validations too.
>>
>>
>> Obviously not enough, because it allows invalid dates. I'd say that
>> the current to_date() merely validates the input format for string
>> parsing, and that the date is in range. But there is not much
>> validation on the date itself.
>>
>> So the name can't be "strict" because of the conflict with "NULL"
>> handling, and you don't like "valid" - what other options do you
>> offer?
>>
>>
>> ​We don't have to change the name...we could do something like how
>> RegularExpressions work - like (?i) - and just add  a new modifier ​code.
>>
>> ​'~-MI-DD' --that's a leading tilde, could be anything - even
>> something like "HM-MI-DD" for "historical mode"
>>
>
> Where to_timestamp() already uses HH for the hour? If you add another "H",
> that surely is confusing.


​I don't really try to pick final names for things until the idea has taken
hold...


>
>
>
> It seems that fixing it is back on the table, possibly even for 9.6
>> since this is such a hideous bug - one that closely resembles a cockroach
>> ;)
>>
>
> 9.6 is already in Beta, people are testing their applications against it.
> This would be a huge break, plus an API change - something you don't add in
> a Beta.
>
>
​Surely these beta testers would test against the RC before putting it into
production so I don't see an issue.  I tend to agree generally but the
point of beta is to find bugs and solicit suggestions for improvement to
features.  Having found a bug it doesn't make sense to avoid patching our
current unstable release.  This applies more so because of our annual
release cycle.  On the topic of whether this becomes an exception to the
rule I'm largely ambivalent.

David J.


Re: [HACKERS] Fwd: [BUGS] BUG #14247: COMMENT is restored on wrong database

2016-08-04 Thread David G. Johnston
On Thu, Aug 4, 2016 at 4:50 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Robert Haas <robertmh...@gmail.com> writes:
> > On Tue, Aug 2, 2016 at 5:42 PM, David G. Johnston
> > <david.g.johns...@gmail.com> wrote:
> > The fact that pg_dump is emitting COMMENT ON DATABASE at all is
> > fundamentally wrong given the existing division-of-labor decisions,
> > namely that pg_dump is responsible for objects within a database
> > not for database-level properties.
>
> > I think a while back somebody had the idea of making COMMENT ON
> > CURRENT_DATABASE or COMMENT ON CURRENT DATABASE work, which seems like
> > an elegant solution to me.  Of course, I just work here.
>
> I'm fairly annoyed at David for having selectively quoted from private
> email in a public forum, but that was one of the points I touched on
> in material that he cut.



> The point I tried to make to him is that
> possibly COMMENT ON CURRENT DATABASE is a portion of a holistic solution,
> but it's only a portion.


I should have asked first and ​I'll take the heat for choosing to re-post
publicly but I kept this aspect of your recommendation precisely because
that was indeed your position.

TL>> It's entirely possible that some feature like COMMENT ON CURRENT
DATABASE
TL>> would be a necessary component of a holistic solution, [ various
other ON CURRENT commands elided ]
I'm all for an elegant solution here though at some point having a working
solution now beats waiting for someone to willingly dive more deeply into
pg_dump.  I too seem to recall previous proposals for COMMENT ON CURRENT
DATABASE yet here we are...

​David J.​


[HACKERS] Add hint for people who place EXECUTE USING arguments in parentheses (in plpgsql)

2016-08-05 Thread David G. Johnston
Basically,

diff --git a/src/backend/parser/parse_param.c
b/src/backend/parser/parse_param.c
index b402843..97064fc 100644
--- a/src/backend/parser/parse_param.c
+++ b/src/backend/parser/parse_param.c
@@ -108,6 +108,9 @@ fixed_paramref_hook(ParseState *pstate, ParamRef *pref)
  ereport(ERROR,
  (errcode(ERRCODE_UNDEFINED_PARAMETER),
  errmsg("there is no parameter $%d", paramno),
+ pstate->numParams == 1 && < It is of pseudo-type record >
+ ? errhint("%s", _("did you incorrectly enclose USING arguments in
parentheses?"))
+ : 0,
  parser_errposition(pstate, pref->location)));

  param = makeNode(Param);


I didn't spend too much time trying to figure out how to test that a
parameter is composite/record typed...

I think the chance of a false positive on SQL EXECUTE is going to be very
small...but I chose here mostly out of ease - a fix in pl/pgsql would be
more correct.

David J.


[HACKERS] psql: Missing option to print current buffer to current output channel (i.e., \qprint)

2016-08-04 Thread David G. Johnston
"\echo" has "\qecho" - I'm basically looking for a "\qprint"

"\write" doesn't apply

The following doesn't work either...

#bash (stock Ubuntu 14.04)
psql --set=query_in_var='SELECT version();' > /dev/null <<'SQL'
\o /tmp/psql-test.txt
\set ECHO queries
--still ends up on stdout, hence the redirect to /dev/null to avoid it
showing on screen along with the cat output below
:query_in_var
\o
SQL
cat /tmp/psql-test.txt

Can anyone offer a reason not to add "\qprint"?

David J.


Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON

2016-08-08 Thread David G. Johnston
On Mon, Aug 8, 2016 at 11:02 AM, Robert Haas  wrote:

> On Mon, Aug 8, 2016 at 10:10 AM, Rahila Syed 
> wrote:
> > Thank you for inputs everyone.
> >
> > The opinions on this thread can be classified into following
> > 1. Commit
> > 2. Rollback
> > 3. Error
> > 4. Warning
> >
> > As per opinion upthread, issuing implicit commit immediately after
> switching
> > autocommit to ON, can be unsafe if it was not desired.  While I agree
> that
> > its difficult to judge users intention here, but if we were to base it on
> > some assumption, the closest would be implicit COMMIT in my
> opinion.There is
> > higher likelihood of a user being happy with issuing a commit when
> setting
> > autocommit ON than a transaction being rolled back.  Also there are quite
> > some interfaces which provide this.
> >
> > As mentioned upthread, issuing a warning on switching back to autocommit
> > will not be effective inside a script. It won't allow subsequent
> commands to
> > be committed as set autocommit to ON is not committed. Scripts will have
> to
> > be rerun with changes which will impact user friendliness.
> >
> > While I agree that issuing an ERROR and rolling back the transaction
> ranks
> > higher in safe behaviour, it is not as common (according to instances
> stated
> > upthread) as immediately committing any open transaction when switching
> back
> > to autocommit.
>
> I think I like the option of having psql issue an error.  On the
> server side, the transaction would still be open, but the user would
> receive a psql error message and the autocommit setting would not be
> changed.  So the user could type COMMIT or ROLLBACK manually and then
> retry changing the value of the setting.
>
> Alternatively, I also think it would be sensible to issue an immediate
> COMMIT when the autocommit setting is changed from off to on.  That
> was my first reaction.
>
> Aborting the server-side transaction - with or without notice -
> doesn't seem very reasonable.
>
>
Best of both worlds?

IF (ON_ERROR_STOP == 1)
THEN (treat this like any other error)
ELSE (send COMMIT; to server)
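In C-ish form (hypothetical names; a sketch of the proposal, not actual psql code) that decision table would be:

```c
#include <stdbool.h>

typedef enum { DO_NOTHING, RAISE_ERROR, SEND_COMMIT } AutocommitAction;

/*
 * Hypothetical decision table for \set AUTOCOMMIT ON: if a transaction
 * is open, error out under ON_ERROR_STOP, otherwise commit the open
 * transaction before flipping the setting.
 */
AutocommitAction
autocommit_on_action(bool txn_open, bool on_error_stop)
{
	if (!txn_open)
		return DO_NOTHING;		/* nothing pending; just flip the setting */
	return on_error_stop ? RAISE_ERROR : SEND_COMMIT;
}
```

The appeal is that scripts running under ON_ERROR_STOP fail fast while interactive sessions get the likely-intended commit.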

David J.


[HACKERS] phrase search TS_phrase_execute code readability patch

2016-08-09 Thread David G. Johnston
Hackers,

The attached attempts to make comprehension of the code in
"TS_phrase_execute" a bit easier.  I posted similar on the "typo patch"
thread of July 2nd/5th, but my comments there reflected my misunderstanding
of the distance operator being exact as opposed to the expected
less-than-or-equal.

I had to make one assumption for the attached - that
"curitem->qoperator.distance" is always >= 0.  In the presence of that
assumption the need for the OR goes away.

I don't follow why LposStart is needed so I removed it...

Not compiled or in any way tested...but it's not major surgery either -
mostly code comments to make following the logic easier.

David J.
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index ad5a254..215f58f 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -1474,14 +1474,14 @@ TS_phrase_execute(QueryItem *curitem,
 		 */
 
 		Rpos = Rdata.pos;
-		LposStart = Ldata.pos;
 		while (Rpos < Rdata.pos + Rdata.npos)
 		{
 			/*
-			 * We need to check all possible distances, so reset Lpos to
-			 * guaranteed not yet satisfied position.
+			 * For each physically positioned right-side operand iterate over each
+			 * instance of the left-side operand to locate one at exactly the specified
+			 * distance.  As soon as a match is found move onto the next right-operand
+			 * and continue searching starting with the next left-operand.
 			 */
-			Lpos = LposStart;
 			while (Lpos < Ldata.pos + Ldata.npos)
 			{
 				if (WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) ==
@@ -1490,16 +1490,16 @@ TS_phrase_execute(QueryItem *curitem,
 					/* MATCH! */
 					if (data)
 					{
-						/* Store position for upper phrase operator */
+						/* Store the position of the right-operand */
 						*pos_iter = WEP_GETPOS(*Rpos);
 						pos_iter++;
 
 						/*
-						 * Set left start position to next, because current
-						 * one could not satisfy distance for any other right
-						 * position
+						 * Move onto the next right-operand, and also the next
+						 * left-operand as the current one cannot be a match
+						 * for any other.
 						 */
-						LposStart = Lpos + 1;
+						Lpos++;
 						break;
 					}
 					else
@@ -1512,18 +1512,23 @@ TS_phrase_execute(QueryItem *curitem,
 				}
 
 			}
-			else if (WEP_GETPOS(*Rpos) <= WEP_GETPOS(*Lpos) ||
-					 WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) <
+			else if (WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) <
 					 curitem->qoperator.distance)
 			{
 				/*
-				 * Go to the next Rpos, because Lpos is ahead or on less
-				 * distance than required by current operator
+				 * We are now within the required distance so
+				 * continue onto the next right-operand and retry
+				 * the current left-operand to see if the added
+				 * distance causes it to match.
 				 */
 				break;
 
 			}
 
+			/*
+			 * The current left-operand is too far away so
+			 * try the next one.
+			 */
 			Lpos++;
}
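For anyone skimming the diff, the advance rules described in the new comments can be sketched standalone (simplified int positions instead of WordEntryPos arrays; an illustration of the logic, not the actual PostgreSQL code):

```c
#include <stddef.h>

/*
 * Count (left, right) position pairs exactly `distance` apart, using the
 * advance rules the revised comments describe: a matched left-operand is
 * never revisited; a left-operand that is within the distance is retried
 * against the next right-operand; one that is too far away is skipped.
 * Position arrays are assumed sorted ascending, as in a tsvector.
 */
int
count_phrase_matches(const int *lpos, size_t nl,
					 const int *rpos, size_t nr, int distance)
{
	size_t		li = 0;
	int			matches = 0;

	for (size_t ri = 0; ri < nr; ri++)
	{
		while (li < nl)
		{
			int			gap = rpos[ri] - lpos[li];

			if (gap == distance)
			{
				matches++;
				li++;			/* this left can satisfy no other right */
				break;
			}
			if (gap < distance)
				break;			/* within distance: retry with next right */
			li++;				/* too far: try the next left */
		}
	}
	return matches;
}
```

With lpos = {1, 5}, rpos = {2, 6} and distance 1 both pairs match; each operand position is visited at most once, which is the point of dropping LposStart.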
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange behavior of some volatile function like random(), nextval()

2016-06-29 Thread David G. Johnston
More specifically...
On Wed, Jun 29, 2016 at 7:34 AM, Michael Paquier 
wrote:

> On Wed, Jun 29, 2016 at 7:43 PM, Alex Ignatov 
> wrote:
> > Hello!
> >
> > Got some strange behavior of random() function:
> >
> > postgres=# select (select random() ) from generate_series(1,10) as i;
> >   random
> > ---
> >  0.831577288918197
> > [...]
> > (10 rows)
>
> I recall that this is treated as an implicit LATERAL, meaning that
> random() is calculated only once.
>

A non-correlated (i.e., does not refer to outer variables) subquery placed
into the target-list need only have its value computed once - so that is
what happens.  The fact that a volatile function can return different
values given the same arguments doesn't mean much when the function is only
ever called a single time.


> > postgres=# select (select random()+i*0 ) from generate_series(1,10) as i;
> >   ?column?
> > 
> >0.97471913928166
> > [...]
> > (10 rows)
>
> But not that. So those results do not surprise me.
>
>
A correlated subquery, on the other hand, has to be called once for every
row and is evaluated within the context supplied by said row.  Each time
random is called it returns a new value.

Section 4.2.11 (9.6 docs)
https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SQL-SYNTAX-SCALAR-SUBQUERIES

Maybe this could be worded better but the first part talks about a single
execution while "any one execution" is mentioned in reference to "the
surrounding query".

I do think that defining "correlated" and "non-correlated" subqueries
within this section would be worthwhile.

David J.


[HACKERS] TLC for EXPLAIN ANALYZE (parallel query and loops)

2016-07-01 Thread David G. Johnston
https://www.postgresql.org/docs/9.6/static/using-explain.html

Existing...

14.1.2 Explain Analyze
[...]
"""
In some query plans, it is possible for a subplan node to be executed more
than once. For example, the inner index scan will be executed once per
outer row in the above nested-loop plan. In such cases, the loops value
reports the total number of executions of the node, and the actual time and
rows values shown are averages per-execution. This is done to make the
numbers comparable with the way that the cost estimates are shown. Multiply
by the loops value to get the total time actually spent in the node. In the
above example, we spent a total of 0.220 milliseconds executing the index
scans on tenk2.
"""

Additional topics to cover here (somewhere in this region)...

Queries executed in parallel workers also result in an increase in
the number of loops.  The total loop count will be equal to the number of
workers used, plus 1 if the leader contributed to retrieving rows.  Note
that, presently, the leader's contributions are not detailed and can only
be imputed from the total for the node and the detail of the workers.
[other detail goes here - the whole block could be placed subsequent to the
inheritance example].


[Possibly make a shorter form of this into a note...]
A nested-loop execution will often result in exactly 1 row being returned
per loop.  In the parallel case, however, and especially when performing
parallel sequential scans with a highly-restrictive filter, it is possible
that few rows are returned.  For instance, a parallel sequential scan on a
unique value will return a single row but might, including the leader, use
3 scans/loops to perform the work.  In this case the average value per loop
would be 0.333+ - which is rounded down to zero since rows is expressed as
an integer.  In any case when loops > 1 it can be necessary (though not
always sufficient) to examine the parent node to discover the total number
of records returned by the child node.
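To make the rounding concern concrete, here is a toy version of the arithmetic (an illustration of the averaging described above, not EXPLAIN's actual implementation):

```c
/*
 * EXPLAIN ANALYZE reports per-loop averages; the displayed integer rows
 * value is the node's total divided by its loop count, with the
 * fractional part lost.  Multiplying the displayed value back by loops
 * therefore recovers only an approximate total.
 */
long
displayed_rows(long total_rows, long loops)
{
	return total_rows / loops;	/* integer division: 1 row over 3 loops displays as 0 */
}

long
recovered_total(long displayed, long loops)
{
	return displayed * loops;
}
```

displayed_rows(1, 3) comes out 0 - the unique-value parallel scan case above - and multiplying a truncated average back by loops only approximates the true total, hence "not always sufficient".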

I'm sure I have some level of imprecision here but hopefully this is enough
to start.

David J.


Re: [HACKERS] Actuall row count of Parallel Seq Scan in EXPLAIN ANALYZE .

2016-07-01 Thread David G. Johnston
On Mon, Jun 20, 2016 at 2:54 AM, Masahiko Sawada 
wrote:

> >> QUERY PLAN
> >>
> >>
> -
> >>  Finalize Aggregate  (cost=217018.55..217018.56 rows=1 width=8)
> >> (actual time=2640.015..2640.015 rows=1 loops=1)
> >>Output: count(*)
> >>->  Gather  (cost=217018.33..217018.54 rows=2 width=8) (actual
> >> time=2639.064..2640.002 rows=3 loops=1)
> >>  Output: (PARTIAL count(*))
> >>  Workers Planned: 2
> >>  Workers Launched: 2
> >>  ->  Partial Aggregate  (cost=216018.33..216018.34 rows=1
> >> width=8) (actual time=2632.714..2632.715 rows=1 loops=3)
> >>Output: PARTIAL count(*)
> >>Worker 0: actual time=2632.583..2632.584 rows=1 loops=1
> >>Worker 1: actual time=2627.517..2627.517 rows=1 loops=1
> >>->  Parallel Seq Scan on public.pgbench_accounts
> >> (cost=0.00..205601.67 rows=417 width=0) (actual
> >> time=0.042..1685.542 rows=333 loops=3)
> >>  Worker 0: actual time=0.033..1657.486 rows=3457968
> >> loops=1
> >>  Worker 1: actual time=0.039..1702.979 rows=3741069
> >> loops=1
>


> postgres(1)=# explain analyze verbose select * from pgbench_accounts
> where aid = 10;
>  QUERY PLAN
>
> 
>  Gather  (cost=1000.00..217018.43 rows=1 width=97) (actual
> time=0.541..2094.773 rows=1 loops=1)
>Output: aid, bid, abalance, filler
>Workers Planned: 2
>Workers Launched: 2
>->  Parallel Seq Scan on public.pgbench_accounts
> (cost=0.00..216018.34 rows=0 width=97) (actual time=1390.109..2088.103
> rows=0 loops=3)
>  Output: aid, bid, abalance, filler
>  Filter: (pgbench_accounts.aid = 10)
>  Rows Removed by Filter: 333
>  Worker 0: actual time=2082.681..2082.681 rows=0 loops=1
>  Worker 1: actual time=2087.532..2087.532 rows=0 loops=1
>  Planning time: 0.126 ms
>  Execution time: 2095.564 ms
> (12 rows)
>
>
The below discrepancy bothered me but it finally dawned on me that this is
exactly what Robert's complaint is about.  The 1 row returned by the leader
is averaged across three parallel workers and shows as zero in the "actual"
count for the PSS in the second plan.

Actual rows in parens:

First Plan:

Finalize Agg. (1) <- Gather (3) <- Partial Agg (1x3 = 3) { "Leader" (1) +
Worker0 (1) + Worker1 (1) } <- ParallelSS (3.33Mx3) { "Leader" (~2M) +
 Worker0 (~4M) + Worker1 (~4M) }

Second Plan:

Gather (1) <- ParallelSS (0.33, rounded down :( x 3) { "Leader" (0) +
Worker0 (0) + Worker1 (0) }

(rhetorical question, see my first paragraph)
Why, in the first plan, does the PSS node have a total count of 10M which
are fed to and aggregated by the Partial Agg node parent, while in the
second plan the PSS node shows zero actual rows yet its parent has a row
count of 1?

I'm inclined to go with Robert on this - reporting averages seems
counter-productive.  Similar information can be had by reporting totals and
loop counts and letting the reader do the division if needed.

As a middle position if the resultant integer average rows value is 0 can
we output 

Re: [HACKERS] Column COMMENTs in CREATE TABLE?

2016-07-02 Thread David G. Johnston
On Sat, Jul 2, 2016 at 12:55 PM, Marko Tiikkaja  wrote:

>
> What I would prefer is something like this:
>
> CREATE TABLE foo(
>   f1 int NOT NULL COMMENT
> 'the first field',
>   f2 int NOT NULL COMMENT
> 'the second field',
> ...
> );
>
> which would ensure the comments are both next to the field definition
> they're documenting and that they make it all the way to the database. I
> looked into the biggest products, and MySQL supports this syntax.  I
> couldn't find any similar syntax in any other product.
>
>
+1 for the idea - though restricting it to columns would not be ideal.


CREATE TABLE name
COMMENT IS
'Table Comment Here'
(
col1 serial COMMENT IS 'Place comment here'
);

David J.


Re: [HACKERS] Column COMMENTs in CREATE TABLE?

2016-07-02 Thread David G. Johnston
On Sat, Jul 2, 2016 at 8:31 PM, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:

>
>
> Em sábado, 2 de julho de 2016, David G. Johnston <
> david.g.johns...@gmail.com> escreveu:
>
>> On Sat, Jul 2, 2016 at 12:55 PM, Marko Tiikkaja <ma...@joh.to> wrote:
>>
>>>
>>> What I would prefer is something like this:
>>>
>>> CREATE TABLE foo(
>>>   f1 int NOT NULL COMMENT
>>> 'the first field',
>>>   f2 int NOT NULL COMMENT
>>> 'the second field',
>>> ...
>>> );
>>>
>>> which would ensure the comments are both next to the field definition
>>> they're documenting and that they make it all the way to the database. I
>>> looked into the biggest products, and MySQL supports this syntax.  I
>>> couldn't find any similar syntax in any other product.
>>>
>>>
>> +1 for the idea - though restricting it to columns would not be ideal.
>>
>>
>> CREATE TABLE name
>> COMMENT IS
>> 'Table Comment Here'
>> (
>> col1 serial COMMENT IS 'Place comment here'
>> );
>>
>>
> And what about the other CREATE statements? IMHO if we follow this path
> then we should add COMMENT to all CREATE statements and perhaps also to
> ALTER. Of course in a set of small patches to make the reviewers life
> easier.
>
>
I should have made it clear I didn't expect TABLE to be the only object
but rather was using it as an example of how we could/should do this
generally for top-level objects (e.g., table) and sub-objects (e.g.,
column).

David J.


Re: [HACKERS] 10.0

2016-06-20 Thread David G. Johnston
On Mon, Jun 20, 2016 at 5:08 PM, Robert Haas  wrote:

> On Mon, Jun 20, 2016 at 4:53 PM, Joshua D. Drake 
> wrote:
> > Or we could adopt the very reasonable and practical policy of:
> >
> > The current versioning scheme isn't broke, so we aren't going to fix it.
>
> Yeah, no kidding.  We had a perfectly good consensus to keep this at
> 9.6 on pgsql-advocacy, and then later we had a revised consensus to
> retitle it to 10.0,


If -advocacy had changed their mind before beta1 was tagged this may have
played out a bit differently...maybe.  In any case 9.6 was a foregone
conclusion given -advocacy's timeline and because of its independence from
-hackers (Tom, the tagger, specifically).

but as soon as the discussion came over to
> pgsql-hackers nothing would do but that we relitigate the whole thing
> ignoring the previous discussion because it wasn't on pgsql-hackers.
> Why -hackers is the right place to decide on the marketing version
> number rather than -advocacy went unexplained, of course.


I had the same thought.  Yet no one even tried to move the discussion back
there.  And at this point PGCon was so close that I personally figured it
would be discussed.  It was, and an announcement was made that a decision
was reached.  No one voiced an objection, so those not at PGCon (myself
included) figured, for better or worse, we're going to version 10
(expected) and, to address the expressed desire to reduce the public
confusion surrounding our major-major-minor version scheme, we would
instead switch to a more traditional major-minor scheme (acceptable).

  Now we have
> a new consensus, at least the third if not the fourth or fifth, about
> what to do on this topic, and since Tom likes this outcome better he'd
> like to stop discussion right here.


This portion of the thread was mostly a technical concern - how do we
display the version in human-readable and machine-readable formats.  I'd
agree with your earlier idea that if you really want to open up the
decision between 10.n.x and 10.x you should start a new thread on
-advocacy.  Tally up those at PGCon as a starting point and let others
toss in their lot from there.  At this point I'm only seeing rehashing of
the same arguments.  Let other people make their opinions, informed or
otherwise, known if you feel that the group at PGCon is not a fair
sampling.


> A two-part version numbering
> scheme may or may not be for the best, but the idea that we're making
> that decision in any principled way, or that the consensus on this new
> system is any more valid than any of the previous consensus, doesn't
> ring true to me.  The idea that this discussion is not fixing any real
> problem, though -- that rings true.
>
>
You've hijacked the meaning of "this" here.  Most of "this" thread ended up
explaining the difference between "version()" and
current_setting('server_version_num').  That needs to be addressed unless
we revert back to the status quo.  That isn't this thread, though.  This
thread assumes we are going to 10.0 and semantically losing the middle
digit.

I'd find it odd to argue against 10.0 on the basis that we'd be displaying
10.0.x in the output of version().  That seems like such an inconsequential
detail in the larger scheme of things.  Either you accept 10.0 and we pick
a human readable output or you don't and the decision doesn't come into
play.  I'm happy with how things are (+0.5 for we should output 10.0.x for
version()) so I'm stopping here.

David J.

