Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text

2016-05-02 Thread Shulgin, Oleksandr
On Sun, May 1, 2016 at 3:22 AM, Andrew Dunstan  wrote:

>
> On 04/29/2016 06:11 PM, Merlin Moncure wrote:
>
> This is a simple matter of removing spaces in the occasional C string
>> literal in the serialization routines and adding a json_pretty
>> function.
>>
>
> I spent a few hours on this. See <
> https://bitbucket.org/adunstan/pgdevel/commits/branch/jsonformat> for WIP
> - there are three commits. No regression tests yet for the two new
> functions (json_squash and json_pretty). Otherwise fairly complete.
> Removing whitespace generation was pretty simple for both json and jsonb.
>

Looks good, thank you!

It would make sense IMO to rename FormatState's `indent' field as `pretty':
it's being used to add whitespace between the punctuation, not only at the
start of a line.  I'd also move the "if (indent)" check out of
add_indent(): just don't call it if no indent is needed.

I'll try to play with the patch to produce some regression tests for the
new functions.

--
Alex


Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text

2016-04-29 Thread Shulgin, Oleksandr
On Fri, Apr 29, 2016 at 3:18 PM, Andrew Dunstan  wrote:

>
> On 04/28/2016 04:29 PM, Alvaro Herrera wrote:
>
>>
>> Actually we did have someone come up with a patch to "normalize" how
>> JSON stuff was output, because our code seems to do it in three
>> different, inconsistent ways.  And our response was for them to get the
>> heck outta here, because we're so much in love with our current
>> practice that we don't need to refactor the three implementations into a
>> single one.
>>
>
> That's a pretty bad mischaracterization of the discussion. What was
> proposed was broken, as I pointed out to the OP, and then again later to
> you when you asked about it. What he wanted to achieve simply couldn't be
> done the way he was trying to achieve it.
>

Yeah, the original request was pretty invalid, but when I proposed to
resubmit just the normalization of whitespace, nobody showed enthusiasm
for the idea either:


http://www.postgresql.org/message-id/cacaco5qkoiz-00jf6w2uf0pst05qrekq9uzssybl0m9fgkd...@mail.gmail.com

Regarding the present proposal:
>
> I wouldn't necessarily be opposed to us having one or more of the
> following:
>
> a) suppressing whitespace addition in all json generation and text output,
> possibly governed by a GUC setting so we could maintain behaviour
> compatibility if required
>

I'm not thrilled about a GUC that would silently break stuff.  That being
said, if someone's code depends on the exact placement of whitespace in
the JSON text, it's pretty broken already and it's just a matter of time
before they hit an issue there.


> b) a function to remove whitespace from json values, but otherwise
> preserve things like key order
> c) a function to pretty-print json similar to the output from jsonb, but
> again preserving key order
> d) a function to reorder keys in json so they were sorted according to the
> relevant collation.
>
> None of these things except possibly the last should be terribly difficult
> to do.
>

My vote goes to remove all optional whitespace by default and have a single
function to prettify it.  Key reordering can then be handled with an
optional parameter to such prettifying function.

It would probably make sense to model this function after Python's
"dump-to-JSON-string" function:
https://docs.python.org/2/library/json.html#json.dumps, with optional
parameters for sorting the keys, indentation size and punctuation.  This
way all the prettiness enhancements could be contained in a single function,
without the need for a generalized interface used in many places.
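For illustration, here is how Python's json.dumps exposes those three knobs
in a single call; this is just a sketch of the suggested interface shape,
not PostgreSQL code:

```python
import json

doc = {"b": 1, "a": [1, 2, {"c": None}]}

# Compact output: no optional whitespace at all, keys sorted.
compact = json.dumps(doc, separators=(",", ":"), sort_keys=True)
print(compact)  # {"a":[1,2,{"c":null}],"b":1}

# The same data prettified: sorted keys plus 2-space indentation.
pretty = json.dumps(doc, indent=2, sort_keys=True)
print(pretty)
```

All the formatting variants live behind optional parameters of one
function, which is exactly the containment property argued for above.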

How about that?

--
Alex


Re: [HACKERS] Wire protocol compression

2016-04-21 Thread Shulgin, Oleksandr
On Thu, Apr 21, 2016 at 3:17 PM, Aleksander Alekseev <
a.aleks...@postgrespro.ru> wrote:

> > > or on Linux TCP/IP stack level.
> > >
> >
> > Yes, but if you want to have both compression and encryption it is
> > crucial to apply compression *before* encryption and I don't see how
> > this can happen with this approach.
>
> If I'm not mistaken IPSec gives you both compression and encryption.
> Just as an example.
>

True, but that setup (as well as any other hypothetical TCP/IP-stack-level
solution) would require network tweaking on both the server and the clients.
With protocol-level compression you can avoid that altogether, and
encryption is already solved with the use of TLS.

--
Alex


Re: [HACKERS] Wire protocol compression

2016-04-21 Thread Shulgin, Oleksandr
On Thu, Apr 21, 2016 at 3:04 PM, Aleksander Alekseev <
a.aleks...@postgrespro.ru> wrote:

> > I guess since the usual answer for compression was "use what SSL
> > provides you for free", it's rather unlikely that someone bothered to
> > make a proxy just for that purpose, and really, a proxy is just
> > another moving part in your setup: not everyone will be thrilled to
> > add that.
>
> It just doesn't sound like a feature that should be implemented
> separately for every single application that uses TCP. Granted TCP proxy
> is not the most convenient way to solve a task. Maybe it could be
> implemented in OpenVPN


Which is another moving part with its own setup and maintenance overhead.


> or on Linux TCP/IP stack level.
>

Yes, but if you want to have both compression and encryption it is crucial
to apply compression *before* encryption and I don't see how this can
happen with this approach.
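The reason for that ordering is easy to demonstrate: a good cipher's output
looks like random bytes, which leaves a compressor nothing to exploit.  A
small sketch, with ciphertext merely modeled as random data:

```python
import os
import zlib

# Repetitive traffic, as protocol messages tend to be.
plain = b"SELECT * FROM accounts WHERE id = $1;\n" * 200

# Compress first, then encrypt: the compressor sees all the redundancy.
compressed = zlib.compress(plain)

# Encrypt first, then compress: the ciphertext (modeled here as random
# bytes of the same length) has no redundancy left to squeeze out, so
# "compressing" it only adds framing overhead.
ciphertext = os.urandom(len(plain))
compressed_ciphertext = zlib.compress(ciphertext)

print(len(plain), len(compressed), len(compressed_ciphertext))
```

With compress-then-encrypt the payload shrinks dramatically; the other way
around it actually grows slightly.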

--
Alex


Re: [HACKERS] Wire protocol compression

2016-04-21 Thread Shulgin, Oleksandr
On Thu, Apr 21, 2016 at 11:07 AM, Aleksander Alekseev <
a.aleks...@postgrespro.ru> wrote:

> > Does it make sense to you guys to discuss compression outside of TLS?
> > There are potentially huge bandwidth savings which could benefit both
> > WAN and non-WAN scenarios, and decoupling this problem from TLS would
> > make it both accessible to everyone (assuming PostgreSQL clients
> > follow). It would be a protocol change though.
>
> I personally don't think it's something that should be implemented in
> PostgreSQL core. As a third-party TCP-proxy (on both client and server
> sides) with gzip/lz4 support perhaps. I'll be not surprised if it turns
> out that such projects already exist.
>

Hm, did you see this recent discussion on -hackers:
http://www.postgresql.org/message-id/flat/CAMkU=1zt9cjaQjVYAmywcP9iyxMJxFBUaVeB1eiaqBP=gej...@mail.gmail.com#CAMkU=1zt9cjaQjVYAmywcP9iyxMJxFBUaVeB1eiaqBP=gej...@mail.gmail.com
?

I guess since the usual answer for compression was "use what SSL provides
you for free", it's rather unlikely that someone bothered to make a proxy
just for that purpose, and really, a proxy is just another moving part in
your setup: not everyone will be thrilled to add that.

--
Alex


Re: [HACKERS] PQsendQuery+PQgetResult+PQsetSingleRowMode limitations and support

2016-04-11 Thread Shulgin, Oleksandr
On Mon, Apr 11, 2016 at 7:15 PM, Karl O. Pinc  wrote:
>
> Should I submit a regression test or something to ensure
> that this usage is officially supported?  (A grep for
> PQsetSingleRowMode in src/test/ finds no hits.)
> Can I assume because it's documented it'll continue to work?

Pretty much.

> > Not sure about the part
> > where you call PQsetSingleRowMode() again after seeing
> > PGRES_TUPLES_OK: doesn't look to me like you need or want to do
> > that.  You should only call it immediately after PQsendQuery().
>
> You're quite right.  All but the first PQsetSingleRowMode()
> calls fail.
>
> This seems unfortunate.   What if I submit several SQL statements
> with one PQsendQuery() call and I only want some of the statements
> executed in single row mode?

I would assume that if you know for which of the statements you want
single-row mode, then you can just as well submit them as separate
PQsendQuery() calls.

> I'm not sure what the use case
> would be but it seems sad that PQsetSingleRowMode() is per
> libpq call and not per sql statement.

It is per query, where query == "argument to PQsendQuery()" :-)

> When the docs here say "query" what they really mean is "set of
> statements submitted in a single libpq call".

Which is more or less the same thing; I'm not sure the extended
explanation you suggest makes it less confusing.

--
Regards,
Alex


Re: [HACKERS] PQsendQuery+PQgetResult+PQsetSingleRowMode limitations and support

2016-04-11 Thread Shulgin, Oleksandr
On Fri, Apr 1, 2016 at 7:53 PM, Karl O. Pinc <k...@meme.com> wrote:
>
> On Fri, 1 Apr 2016 05:57:33 +0200
> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> wrote:
>
> > On Apr 1, 2016 02:57, "Karl O. Pinc" <k...@meme.com> wrote:
> > >
> > > I assume there are no questions about supporting a
> > > similar functionality only without PQsetSingleRowMode,
> > > as follows:
> >
> > Sorry, but I don't see what is your actual question here?
>
> The question is whether or not the functionality of the first
> script is supported.  I ask since Bruce was surprised to see
> this working and questioned whether PG was intended to behave
> this way.

Well, according to the docs it should work, though I don't recall if I have
ever really tried it.  Not sure about the part where you call
PQsetSingleRowMode() again after seeing PGRES_TUPLES_OK: doesn't look to me
like you need or want to do that.  You should only call it immediately
after PQsendQuery().

> > Both code examples are going to compile and work, AFAICS. The
> > difference is that the latter will try to fetch the whole result set
> > into client's memory before returning you a PGresult.
>
> Thanks for the clarification.  For some reason I recently
> got it into my head that the libpq buffering was on the server side,
> which is really strange since I long ago determined it was
> client side.

There are also a number of cases where the caching will happen on the
server side: using ORDER BY without an index available to fetch the records
in the required order is the most obvious one.

Less obvious is when you have a set-returning function and use it like
"SELECT * FROM srffunc()": this will cause the intermediate result to be
materialized in a tuple store on the server side before it is streamed
to the client.  On the other hand, if you use the same function as "SELECT
srffunc()" you are going to get the same results streamed to the client.
I've seen this a number of times already, and it doesn't look like a
fundamental limitation of the execution engine to me, rather an
implementation deficiency.

Another plausible approach to getting the results row by row is invoking the
COPY protocol with the query: "COPY (SELECT ...) TO STDOUT".  This way you
lose the type information of course, but it still might be appropriate for
some use cases.
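For completeness, data coming back from COPY (SELECT ...) TO STDOUT arrives
in COPY text format: one line per row, tab-separated columns, NULL spelled
\N, and backslash escapes for tabs, newlines and backslashes.  A minimal
client-side decoder might look like this (a hypothetical helper sketch, not
part of libpq):

```python
def decode_copy_line(line):
    """Decode one row of COPY TO STDOUT text format into a list of values."""
    def decode_field(field):
        if field == "\\N":  # COPY's spelling of NULL
            return None
        out = []
        chars = iter(field)
        for ch in chars:
            if ch == "\\":
                # Undo the backslash escapes COPY text format applies.
                esc = next(chars, "")
                out.append({"t": "\t", "n": "\n", "\\": "\\"}.get(esc, esc))
            else:
                out.append(ch)
        return "".join(out)

    return [decode_field(f) for f in line.rstrip("\n").split("\t")]

print(decode_copy_line("1\talice\t\\N\n"))  # ['1', 'alice', None]
```

This only restores text values, of course; mapping them back to typed
values is exactly what you give up by going through COPY.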

--
Regards,
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-04 Thread Shulgin, Oleksandr
On Apr 5, 2016 00:31, "Tom Lane"  wrote:
>
> Alex Shulgin  writes:
> > On Mon, Apr 4, 2016 at 1:06 AM, Tom Lane  wrote:
> >> I'm inclined to
> >> revert the aspect of 3d3bf62f3 that made us work from "d" (the observed
> >> number of distinct values in the sample) rather than stadistinct (the
> >> extrapolated estimate for the table).  On reflection I think that
that's
> >> inconsistent with the theory behind the old MCV-cutoff rule.  It
wouldn't
> >> matter if we were going to replace the cutoff rule with something else,
> >> but it's beginning to sound like that won't happen for 9.6.
>
> > Please feel free to do what you think is in the best interest of the
> > people preparing 9.6 for the freeze.  I'm not all that familiar with the
> > process, but I guess reverting this early might save some head-scratching
> > if some interesting interactions of this change combined with some others
> > are going to show up.
>
> I've reverted that bit; so we still have the improvements associated with
> ignoring nulls, but nothing else at the moment.  I'll set this commitfest
> item back to Waiting on Author, just in case you are able to make some
> more progress before the end of the week.

OK, though it's unlikely that I'll get productive again before next week.
Maybe someone who has also been following this thread wants to step in?

Thanks.
--
Alex


Re: [HACKERS] SSL indicator in psql prompt

2016-04-04 Thread Shulgin, Oleksandr
On Apr 4, 2016 17:54, "Robert Haas"  wrote:
>
> On Fri, Apr 1, 2016 at 10:15 AM, Tom Lane  wrote:
> > Peter Eisentraut  writes:
> > >> I like how browsers show a little lock in the address bar depending on
> > >> whether SSL is in use.  This could be useful in psql as well.  Here is
> > >> a prototype patch.
> > >> Comments?
> >
> > -1 on the hard-coded UTF8, even with the encoding check (which I don't
> > think is terribly trustworthy).  How about defining it in a way that
> > lets/makes the user provide the character(s) to print?
>
> I think you have been trolled.  Note the date.

Are you trying to say that this feature is in your opinion useless?

Even if that's an April Fools patch, I don't think it is entirely out of
scope. :-)

--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-02 Thread Shulgin, Oleksandr
On Apr 2, 2016 18:38, "Tom Lane" <t...@sss.pgh.pa.us> wrote:
>
> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
> > On Apr 1, 2016 23:14, "Tom Lane" <t...@sss.pgh.pa.us> wrote:
> >> Haven't looked at 0002 yet.
>
> > [crosses fingers] hope you'll have a chance to do that before feature
> > freeze for 9.6
>
> I studied this patch for awhile after rebasing it onto yesterday's
> commits.

Fantastic! I could not hope for a better reply :-)

> I did not like the fact that the compute_scalar_stats logic
> would allow absolutely anything into the MCV list once num_hist falls
> below 2. I think it's important that we continue to reject values that
> are only seen once in the sample, because there's no very good reason to
> think that they are MCVs and not just infrequent values that by luck
> appeared in the sample.

In my understanding we only put a value in the track list if we've seen it
at least twice, no?

> However, after I rearranged the tests there so
> that "if (num_hist >= 2)" only controlled whether to apply the 1/K limit,
> one of the regression tests started to fail:

Uh-oh.

> there's a place in
> rowsecurity.sql that expects that if a column contains nothing but several
> instances of a single value, that value will be recorded as a lone MCV.
> Now this isn't a particularly essential thing for that test, but it still
> seems like a good property for ANALYZE to have.

No objection here.

> The reason it's failing,
> of course, is that the test as written cannot possibly accept the last
> (or only) value.

Yeah, this I would expect from such a change.

> Before I noticed the regression failure, I'd been thinking that maybe it'd
> be better if the decision rule were not "at least 100+x% of the average
> frequency of this value and later ones", but "at least 100+x% of the
> average frequency of values after this one".

Hm, sounds pretty similar to what I wanted to achieve, but better
formalized.

> With that formulation, we're
> not constrained as to the range of x.  Now, if there are *no* values after
> this one, then this way needs an explicit special case in order not to
> compute 0/0; but the preceding para shows that we need a special case for
> the last value anyway.
>
> So, attached is a patch rewritten along these lines.  I used 50% rather
> than 25% as the new cutoff percentage --- obviously it should be higher
> in this formulation than before, but I have no idea if that particular
> number is good or we should use something else.  Also, the rule for the
> last value is "at least 1% of the non-null samples".  That's a pure guess
> as well.
>
> I do not have any good corpuses of data to try this on.  Can folks who
> have been following this thread try it on their data and see how it
> does?  Also please try some other multipliers besides 1.5, so we can
> get a feeling for where that cutoff should be placed.

Expect me to run it on my pet db early next week. :-)
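For anyone else who wants to experiment along, the proposed decision rule
can be sketched stand-alone like this; the function name, the defaults
(1.5 and 1%), and the simplified input (a list of per-value sample counts)
are illustrative only, not the patch's actual code:

```python
def select_mcvs(value_counts, nonnull_samples, multiplier=1.5, last_min_frac=0.01):
    """Keep a value as an MCV if its sample count is at least `multiplier`
    times the average count of the values after it; the last remaining
    value instead needs at least `last_min_frac` of the non-null sample."""
    counts = sorted(value_counts, reverse=True)
    mcvs = []
    for i, count in enumerate(counts):
        rest = counts[i + 1:]
        if rest:
            if count < multiplier * (sum(rest) / len(rest)):
                break
        elif count < last_min_frac * nonnull_samples:
            # Special case for the last (or only) value, avoiding 0/0.
            break
        mcvs.append(count)
    return mcvs

# Two clearly dominant values pass; the near-uniform tail is cut off.
print(select_mcvs([500, 400, 10, 9, 8, 7], nonnull_samples=1000))  # [500, 400]

# A column holding one repeated value still yields a lone MCV, matching
# the rowsecurity.sql expectation discussed above.
print(select_mcvs([300], nonnull_samples=1000))  # [300]
```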

Many thanks!
--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-04-01 Thread Shulgin, Oleksandr
On Apr 1, 2016 23:14, "Tom Lane" <t...@sss.pgh.pa.us> wrote:
>
> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
> > Alright.  I'm attaching the latest version of this patch split in two
> > parts: the first one is NULLs-related bugfix and the second is the
> > "improvement" part, which applies on top of the first one.
>
> I've applied the first of these patches,

Great news, thank you!

> broken into two parts first
> because it seemed like there were two issues and second because Tomas
> deserved primary credit for one part, ie realizing we were using the
> Haas-Stokes formula wrong.
>
> As for the other part, I committed it with one non-cosmetic change:
> I do not think it is right to omit "too wide" values when considering
> the threshold for MCVs.  As submitted, the patch was inconsistent on
> that point anyway since it did it differently in compute_distinct_stats
> and compute_scalar_stats.  But the larger picture here is that we define
> the MCV population to exclude nulls, so it's reasonable to consider a
> value as an MCV even if it's greatly outnumbered by nulls.  There is
> no such exclusion for "too wide" values; those things are just an
> implementation limitation in analyze.c, not something that is part of
> the pg_statistic definition.  If there are a lot of "too wide" values
> in the sample, we don't know whether any of them are duplicates, but
> we do know that the frequencies of the normal-width values have to be
> discounted appropriately.

Okay.

> Haven't looked at 0002 yet.

[crosses fingers] hope you'll have a chance to do that before feature
freeze for 9.6…

--
Alex


Re: [HACKERS] SSL indicator in psql prompt

2016-04-01 Thread Shulgin, Oleksandr
On Fri, Apr 1, 2016 at 2:52 PM, Peter Eisentraut  wrote:

> I like how browsers show a little lock in the address bar depending on
> whether SSL is in use.  This could be useful in psql as well.  Here is a
> prototype patch.
>
> Example:
>
> Put this in .psqlrc:
>
> \set PROMPT1 '%s%/%R%# '
>
> $ psql test
> psql (9.6devel)
> Type "help" for help.
>
> test=#
>
> Without SSL:
>
> test=#
>
> Comments?
>

Sounds reasonable.  What is the meaning of the latter symbol?

--
Alex


Re: [HACKERS] PQsendQuery+PQgetResult+PQsetSingleRowMode limitations and support

2016-03-31 Thread Shulgin, Oleksandr
On Apr 1, 2016 02:57, "Karl O. Pinc"  wrote:
>
> I assume there are no questions about supporting a
> similar functionality only without PQsetSingleRowMode,
> as follows:

Sorry, but I don't see what is your actual question here?

Both code examples are going to compile and work, AFAICS. The difference is
that the latter will try to fetch the whole result set into client's memory
before returning you a PGresult.

--
Alex


Re: [HACKERS] unexpected result from to_tsvector

2016-03-30 Thread Shulgin, Oleksandr
On Wed, Mar 30, 2016 at 10:17 AM, Artur Zakirov <a.zaki...@postgrespro.ru>
wrote:

> On 29.03.2016 19:17, Shulgin, Oleksandr wrote:
>
>>
>> Hm, indeed.  Unfortunately, it is not quite easy to find "the" new RFC:
>> quite a number of correcting and extending RFCs have been issued over
>> the last (almost) 30 years, which is not that surprising...
>>
>> Are we going to do something about it?  Is it likely that
>> relaxing/changing the rules on our side will break any possible
>> workarounds that people might have employed to make the search work like
>> they want it to work?
>>
>
> Do you mean here workarounds to recognize such values as 't...@123-reg.ro'
> as an email address? Actually I do not see any workarounds except a patch
> to PostgreSQL.
>

No, more like disallowing '_' in the host/domain names.  Anyway, that is
pure speculation on my part.

> By the way, Teodor committed the patch yesterday.


I've seen that after posting my reply to the list ;-)

--
Alex


Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-03-29 Thread Shulgin, Oleksandr
On Tue, Mar 15, 2016 at 4:44 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> On Mon, Mar 14, 2016 at 7:55 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
>> > What I dislike about this POC is all the disruption in libpq, to be
>> > honest.
>>
>> Yeah, I don't much like that either.  But I don't think we can avoid
>> some refactoring there; as designed, conversion of an error message into
>> user-visible form is too tightly tied to receipt of the message.
>>
>
> True.  Attached is a v2 which addresses all of the points raised earlier I
> believe.
>
> We still extract the message building part of code from
> pqGetErrorNotice3(), but the user-facing change is much more sane now: just
> added a new function PQresultVerboseErrorMessage().
>

I wonder if this sort of change has any chance to be back-patched?  If not,
it would be really nice to have any sort of review soonish.

Thank you.
--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-29 Thread Shulgin, Oleksandr
On Tue, Mar 29, 2016 at 6:24 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
> > I've just seen that this patch doesn't have a reviewer assigned
> anymore...
>
> I took my name off it because I was busy with other things and didn't
> want to discourage other people from reviewing it meanwhile.


I wanted to write that this should not be stopping anyone else from
attempting a review, but then learned that you have removed your name. ;-)


>   I do hope
> to get to it eventually but there's a lot of stuff on my to-do list.
>

Completely understood.

Thank you.
--
Alex


Re: [HACKERS] unexpected result from to_tsvector

2016-03-29 Thread Shulgin, Oleksandr
On Sun, Mar 20, 2016 at 3:42 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
> > On Mar 20, 2016 01:09, "Dmitrii Golub" <dmitrii.go...@gmail.com> wrote:
> >> Alex, actually subdomain can start with digit,
>
> > Not according to the RFC you have linked to.
>
> The powers-that-be relaxed that some time ago; I assume there's a newer
> RFC.  For instance, "163.com" is a real domain:
>
> $ dig 163.com
> ...
> ;; QUESTION SECTION:
> ;163.com.   IN  A
>

Hm, indeed.  Unfortunately, it is not quite easy to find "the" new RFC:
quite a number of correcting and extending RFCs have been issued over the
last (almost) 30 years, which is not that surprising...

Are we going to do something about it?  Is it likely that relaxing/changing
the rules on our side will break any possible workarounds that people might
have employed to make the search work like they want it to work?

--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-29 Thread Shulgin, Oleksandr
On Tue, Mar 15, 2016 at 4:47 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> On Wed, Mar 9, 2016 at 5:28 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
>> > Yes, I now recall that my actual concern was that sample_cnt may
>> > calculate to 0 due to the latest condition above, but that also implies
>> > track_cnt == 0, and then we have a for loop there which will not run at
>> > all due to this, so I figured we can avoid calculating avgcount and
>> > running the loop altogether with that check.  I'm not opposed to
>> > changing the condition if that makes the code easier to understand (or
>> > dropping it altogether if calculating 0/0 is believed to be harmless
>> > anyway).
>>
>> Avoiding intentional zero divides is good.  It might happen to work
>> conveniently on your machine, but I wouldn't think it's portable.
>>
>
> Tom,
>
> Thank you for volunteering to review this patch!
>
> Are you waiting on me to produce an updated version with more comments
> about NULL-handling in the distinct estimator, or do you have something
> cooking already?
>

I've just seen that this patch doesn't have a reviewer assigned anymore...

I would welcome any review.  If we don't commit even the first part
(bugfix) now, is it going to be 9.7-only material?..

--
Alex


Re: [HACKERS] unexpected result from to_tsvector

2016-03-20 Thread Shulgin, Oleksandr
On Mar 20, 2016 01:09, "Dmitrii Golub" <dmitrii.go...@gmail.com> wrote:
>
> 2016-03-14 16:22 GMT+03:00 Shulgin, Oleksandr <oleksandr.shul...@zalando.de>:
>>
>> In fact, the 123-yyy.zzz domain is not valid either according to the RFC
>> (subdomain can't start with a digit), but since we already allow it,
>> should we not allow 123_yyy.zzz to be recognized as a Host?  Then why not
>> recognize aaa@123_yyy.zzz as an email address?
>>
>> Another option is to prohibit underscore in recognized host names, but
>> this has more breakage potential IMO.
>>
>
> Alex, actually subdomain can start with digit,

Not according to the RFC you have linked to.

> try it.

What do you mean? Try it with ts_debug()? I already did, you could see me
referring to this example above: 123-yyy.zzz

--
Alex


Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2016-03-19 Thread Shulgin, Oleksandr
On Fri, Mar 18, 2016 at 7:53 AM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:
>
> On Thu, Mar 17, 2016 at 6:56 PM, Shulgin, Oleksandr
> <oleksandr.shul...@zalando.de> wrote:
> >
> > You mean change context name and correct the comment?  I didn't suggest
> > to change the function name.
>
> Changed the context name and the comment only.

Check.

+} lookup_hba_line_context;
^ but why TAB here?

> >> > Still remains an issue of representing special keywords in database
> >> > and user_name fields, but there was no consensus about that though.
> >>
> >> How about adding keyword_database and keyword_user columns to listing
> >> out the keywords.  These columns will be populated only when the hba
> >> line contains any keywords.
> >
> >
> > Hm... that could work too.
>
> Here I attached patch with the added two keyword columns.

+ if (!tok->quoted && strcmp(tok->string, "all") == 0)

token_is_keyword(tok, "all") ?

> During the testing with different IP comparison methods like 'samehost' or
> 'samenet', the address details are not displayed. Is there any need of
> showing the IP compare method also?

Do you mean return "samehost/samenet/all" in yet another keyword_address
out parameter or something like that?

--
Alex


Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2016-03-19 Thread Shulgin, Oleksandr
On Thu, Mar 17, 2016 at 2:12 AM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:

> On Wed, Mar 16, 2016 at 9:49 PM, Shulgin, Oleksandr
> <oleksandr.shul...@zalando.de> wrote:
> >
> > Some comments:
> >
> > +/* Context to use with hba_line_callback function. */
> > +typedef struct
> > +{
> > +   MemoryContext memcxt;
> > +   TupleDesc   tupdesc;
> > +   Tuplestorestate *tuple_store;
> > +}  HbalineContext;
> >
> > Rather "with *lookup_hba_line_callback*", as hba_line_callback() is a
> > generic one.
>
> Fine. I will change the function and context names.
>

You mean change context name and correct the comment?  I didn't suggest to
change the function name.

> Still remains an issue of representing special keywords in database and
> > user_name fields, but there was no consensus about that though.
>
> How about adding keyword_database and keyword_user columns to listing
> out the keywords.  These columns will be populated only when the hba line
> contains any keywords.
>

Hm... that could work too.

--
Alex


Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2016-03-16 Thread Shulgin, Oleksandr
On Tue, Mar 15, 2016 at 7:23 PM, David Steele  wrote:

> On 3/3/16 12:16 AM, Haribabu Kommi wrote:
> > On Fri, Feb 5, 2016 at 2:29 PM, Haribabu Kommi 
> wrote:
> >>
> >> This patch needs to be applied on top discard_hba_and_ident_cxt patch
> >> that is posted earlier.
> >
> > Here I attached a re-based patch to the latest head with inclusion of
> > discard_hba_ident_cxt patch for easier review as a single patch.
>
> Alex, Scott, do you have an idea of when you'll be able to review this
> new version?
>

The new version applies with some fuzziness to the current master and
compiles cleanly.

Some comments:

+/* Context to use with hba_line_callback function. */
+typedef struct
+{
+   MemoryContext memcxt;
+   TupleDesc   tupdesc;
+   Tuplestorestate *tuple_store;
+}  HbalineContext;

Rather "with *lookup_hba_line_callback*", as hba_line_callback() is a
generic one.

+ line_number |  mode   | type  | database | user_name |  address  |     netmask     | hostname | method | options |          reason
+-------------+---------+-------+----------+-----------+-----------+-----------------+----------+--------+---------+--------------------------
+          84 | skipped | local | {all}    | {all}     |           |                 |          | trust  | {}      | connection type mismatch
+          86 | skipped | host  | {all}    | {all}     | 127.0.0.1 | 255.255.255.255 |          | trust  | {}      | IP address mismatch
+          88 | matched | host  | {all}    | {all}     | ::1       | :::::::         |          | trust  | {}      |

Hm... now I'm not sure if we really need the "mode" column.  It should be
clear that we skipped every line that had a non-NULL "reason".  I guess we
could remove "mode" and rename "reason" to "skip_reason"?

Still remains an issue of representing special keywords in database and
user_name fields, but there was no consensus about that though.

--
Alex


Re: [HACKERS] Soliciting Feedback on Improving Server-Side Programming Documentation

2016-03-15 Thread Shulgin, Oleksandr
On Tue, Mar 15, 2016 at 6:02 PM, Corey Huinker 
wrote:

> Over the past few months, I've been familiarizing myself with postgres
> server side programming in C.
>
> My attempts to educate myself were slow and halting. The existing server
> side programming documentation has some examples, but those examples didn't
> show me how to do what I wanted to do, and my research-via-google was highly
> circular, almost always pointing back to the documentation I had already
> found lacking, or a copy of it.
>
> Most of what I have learned I have culled from asking people on IRC, or
> bugging people I've met through user groups and PgConf. In all cases,
> people have been extremely helpful. However, this method is inefficient,
> because we're using two people's time, one of whom has to tolerate my
> incessant questions and slow learning pace.
>
> Furthermore, the helpful suggestions I received boiled down to:
> 1. The function/macro/var you're looking for is PG_FOO, git grep PG_FOO
> 2. Look in blah.c which does something like what you're trying to do
> 3. The comments in blah.h do a good job of listing and explaining this
> macro or that
>

There's also a good deal of README files in the source tree, so I would add:

4. find src -name 'README*'


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-15 Thread Shulgin, Oleksandr
On Wed, Mar 9, 2016 at 5:28 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
> > Yes, I now recall that my actual concern was that sample_cnt may
> > calculate to 0 due to the latest condition above, but that also implies
> > track_cnt == 0, and then we have a for loop there which will not run at
> > all due to this, so I figured we can avoid calculating avgcount and
> > running the loop altogether with that check.  I'm not opposed to
> > changing the condition if that makes the code easier to understand (or
> > dropping it altogether if calculating 0/0 is believed to be harmless
> > anyway).
>
> Avoiding intentional zero divides is good.  It might happen to work
> conveniently on your machine, but I wouldn't think it's portable.
>

Tom,

Thank you for volunteering to review this patch!

Are you waiting on me to produce an updated version with more comments
about NULL-handling in the distinct estimator, or do you have something
cooking already?

--
Regards,
Alex


Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-03-15 Thread Shulgin, Oleksandr
On Mon, Mar 14, 2016 at 7:55 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
> > What I dislike about this POC is all the disruption in libpq, to be
> > honest.
>
> Yeah, I don't much like that either.  But I don't think we can avoid
> some refactoring there; as designed, conversion of an error message into
> user-visible form is too tightly tied to receipt of the message.
>

True.  Attached is a v2 which addresses all of the points raised earlier I
believe.

We still extract the message building part of code from
pqGetErrorNotice3(), but the user-facing change is much more sane now: just
added a new function PQresultVerboseErrorMessage().

> > It would be much neater if we could form the verbose message every
> > time and let the client decide where to cut it.  Maybe a bit "too clever"
> > would be to put a \0 char between short message and its verbose
> > continuation.  The client could then reach the verbose part like this
> > (assuming that libpq did put a verbose part there): msg + strlen(msg) +
> > 1.
>
> Blech :-(
>

:-)  That would not work either way, I've just noticed that SQLSTATE is put
at the start of the message in verbose mode, but not by default.

> Thinking about it, though, it seems to me that we could get away with
> always performing both styles of conversion and sticking both strings
> into the PGresult.  That would eat a little more space but not much.
> Then we just need to add API to let the application get at both forms.
>

This is what the v2 basically implements, now complete with help,
tab-complete and documentation changes.  I don't think we can add a usual
simple regression test here reliably, due to the LOCATION field, which might
be subject to unexpected line number changes.  But do we really need one?

--
Regards,
Alex
From d8d6a65da57d8e14eac4f614d31b19d52f8176d9 Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Date: Wed, 6 Jan 2016 14:58:17 +0100
Subject: [PATCH] Add \errverbose psql command and support in libpq

For every error we build both the default and verbose error message,
then store both in PGresult.  The client can then retrieve the verbose
message using the new exported function PGresultVerboseErrorMessage().

In psql we store the verbose error message in pset (if any) for display
when requested by the user with \errverbose.
---
 doc/src/sgml/libpq.sgml |  40 -
 src/bin/psql/command.c  |   9 ++
 src/bin/psql/common.c   |   5 ++
 src/bin/psql/help.c |   1 +
 src/bin/psql/settings.h |   2 +
 src/bin/psql/startup.c  |   1 +
 src/bin/psql/tab-complete.c |   2 +-
 src/interfaces/libpq/exports.txt|   1 +
 src/interfaces/libpq/fe-exec.c  |  29 ++-
 src/interfaces/libpq/fe-protocol3.c | 166 +---
 src/interfaces/libpq/libpq-fe.h |   1 +
 src/interfaces/libpq/libpq-int.h|   3 +-
 12 files changed, 181 insertions(+), 79 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 2328d8f..8b9544f 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -2691,6 +2691,38 @@ char *PQresultErrorMessage(const PGresult *res);
   
  
 
+ 
+  
+   PQresultVerboseErrorMessage
+   
+PQresultVerboseErrorMessage
+   
+  
+
+  
+   
+Returns the most verbose error message associated with the
+command, or an empty string if there was no error.
+
+char *PQresultVerboseErrorMessage(const PGresult *res);
+
+If there was an error, the returned string will include a trailing
+newline.  The caller should not free the result directly. It will
+be freed when the associated PGresult handle is
+passed to PQclear.
+   
+
+   
+ If error verbosity is already set to the maximum available level,
+ this will return exactly the same error message
+ as PQresultErrorMessage. Otherwise it can be
+ useful to retrieve a more detailed error message without running
+ the failed query again (while increasing the error verbosity
+ level).
+   
+  
+ 
+
  
   PQresultErrorFieldPQresultErrorField
   
@@ -5579,9 +5611,11 @@ PGVerbosity PQsetErrorVerbosity(PGconn *conn, PGVerbosity verbosity);
   this will normally fit on a single line.  The default mode produces
   messages that include the above plus any detail, hint, or context
   fields (these might span multiple lines).  The VERBOSE
-  mode includes all available fields.  Changing the verbosity does not
-  affect the messages available from already-existing
-  PGresult objects, only subsequently-created ones.
+  mode includes all available fields.  It is still possible to
+  retriev

Re: [HACKERS] unexpected result from to_tsvector

2016-03-14 Thread Shulgin, Oleksandr
On Mon, Mar 7, 2016 at 10:46 PM, Artur Zakirov 
wrote:

> Hello,
>
> On 07.03.2016 23:55, Dmitrii Golub wrote:
>
>>
>>
>> Hello,
>>
>> Should we added tests for this case?
>>
>
> I think we should. I have added tests for teo...@123-stack.net and
> 1...@stack.net emails.
>
>
>> 123_reg.ro  is not a valid domain name, because of
>> symbol "_"
>>
>> https://tools.ietf.org/html/rfc1035 page 8.
>>
>> Dmitrii Golub
>>
>
> Thank you for the information. Fixed.


Hm...  now that doesn't look all that consistent to me (after applying the
patch):

=# select ts_debug('simple', 'a...@123-yyy.zzz');
 ts_debug
---
 (email,"Email address",a...@123-yyy.zzz,{simple},simple,{a...@123-yyy.zzz})
(1 row)

But:

=# select ts_debug('simple', 'aaa@123_yyy.zzz');
ts_debug
-
 (asciiword,"Word, all ASCII",aaa,{simple},simple,{aaa})
 (blank,"Space symbols",@,{},,)
 (uint,"Unsigned integer",123,{simple},simple,{123})
 (blank,"Space symbols",_,{},,)
 (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
(5 rows)

One can also see that if we only keep the domain name, the result is
similar:

=# select ts_debug('simple', '123-yyy.zzz');
   ts_debug
---
 (host,Host,123-yyy.zzz,{simple},simple,{123-yyy.zzz})
(1 row)

=# select ts_debug('simple', '123_yyy.zzz');
  ts_debug
-
 (uint,"Unsigned integer",123,{simple},simple,{123})
 (blank,"Space symbols",_,{},,)
 (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
(3 rows)

But, this only has to do with 123 being recognized as a number, not with
the underscore:

=# select ts_debug('simple', 'abc_yyy.zzz');
   ts_debug
---
 (host,Host,abc_yyy.zzz,{simple},simple,{abc_yyy.zzz})
(1 row)

=# select ts_debug('simple', '1abc_yyy.zzz');
   ts_debug
---
 (host,Host,1abc_yyy.zzz,{simple},simple,{1abc_yyy.zzz})
(1 row)

In fact, the 123-yyy.zzz domain is not valid either according to the RFC
(subdomain can't start with a digit), but since we already allow it, should
we not allow 123_yyy.zzz to be recognized as a Host?  Then why not
recognize aaa@123_yyy.zzz as an email address?

Another option is to prohibit underscore in recognized host names, but this
has more breakage potential IMO.

--
Alex


Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-03-14 Thread Shulgin, Oleksandr
On Wed, Feb 10, 2016 at 12:33 AM, Daniel Verite <dan...@manitou-mail.org>
wrote:

>     Shulgin, Oleksandr wrote:
>
> > Most importantly, I'd like to learn of better options than storing the
> > whole last_result in psql's pset structure.
>
> I guess that you could, each time a query fails, gather silently the
> result of \errverbose, store it in a buffer, discard the PGresult,
> and in case the user does \errverbose before running another query,
> output what was in that buffer.
>

That's a neat idea.  I also think that we could store the last PGresult
only when the query actually fails and discard it otherwise: a PGresult
holding only the error doesn't occupy too much space.

What I dislike about this POC is all the disruption in libpq, to be
honest.  It would be much neater if we could form the verbose message every
time and let the client decide where to cut it.  Maybe a bit "too clever"
would be to put a \0 char between short message and its verbose
continuation.  The client could then reach the verbose part like this
(assuming that libpq did put a verbose part there): msg + strlen(msg) + 1.

--
Alex


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-03-11 Thread Shulgin, Oleksandr
On Thu, Mar 10, 2016 at 9:05 PM, Tom Lane  wrote:

> Gilles Darold  writes:
> > Then, should I have to use an alternate file to store the information or
> > implement a bidirectional communication with the syslogger?
>
> I'd just define a new single-purpose file $PGDATA/log_file_name
> or some such.
>

Would it make sense to have it as a symlink instead?

--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-09 Thread Shulgin, Oleksandr
On Wed, Mar 9, 2016 at 1:33 PM, Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote:

> Hi,
>
> On Wed, 2016-03-09 at 11:23 +0100, Shulgin, Oleksandr wrote:
> > On Tue, Mar 8, 2016 at 8:16 PM, Alvaro Herrera
> > <alvhe...@2ndquadrant.com> wrote:
> >
> > Also, I can't quite figure out why the "else" now in line 2131
> > is now "else if track_cnt != 0".  What happens if track_cnt is
> > zero?
> > The comment above the "if" block doesn't provide any guidance.
> >
> >
> > It is there only to avoid potentially dividing zero by zero when
> > calculating avgcount (which will not be used after that anyway).  I
> > agree it deserves a comment.
>
> That definitely deserves a comment. It's not immediately clear why
> (track_cnt != 0) would prevent division by zero in the code. The only
> way such error could happen is if ndistinct==0, because that's the
> actual denominator. Which means this
>
> ndistinct = ndistinct * sample_cnt
>
> would have to evaluate to 0. But ndistinct==0 can't happen as we're in
> the (nonnull_cnt > 0) branch, and that guarantees (standistinct != 0).
>
> Thus the only possibility seems to be (nonnull_cnt==toowide_cnt). Why
> not to use this condition instead?
>

Yes, I now recall that my actual concern was that sample_cnt may calculate
to 0 due to the latest condition above, but that also implies track_cnt ==
0, and then we have a for loop there which will not run at all due to this,
so I figured we can avoid calculating avgcount and running the loop
altogether with that check.  I'm not opposed to changing the condition if
that makes the code easier to understand (or dropping it altogether if
calculating 0/0 is believed to be harmless anyway).

--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-09 Thread Shulgin, Oleksandr
On Tue, Mar 8, 2016 at 8:16 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Shulgin, Oleksandr wrote:
>
> > Alright.  I'm attaching the latest version of this patch split in two
> > parts: the first one is NULLs-related bugfix and the second is the
> > "improvement" part, which applies on top of the first one.
>
> I went over patch 0001 and it seems pretty reasonable.  It's missing
> some comment updates -- at least the large comments that talk about Duj1
> should be modified to indicate why the code is now subtracting the null
> count.


Good point.


> Also, I can't quite figure out why the "else" now in line 2131
> is now "else if track_cnt != 0".  What happens if track_cnt is zero?
> The comment above the "if" block doesn't provide any guidance.
>

It is there only to avoid potentially dividing zero by zero when
calculating avgcount (which will not be used after that anyway).  I agree
it deserves a comment.

Thank you!
--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-09 Thread Shulgin, Oleksandr
On Tue, Mar 8, 2016 at 9:10 PM, Joel Jacobson <j...@trustly.com> wrote:

> On Wed, Mar 9, 2016 at 1:25 AM, Shulgin, Oleksandr
> <oleksandr.shul...@zalando.de> wrote:
> > Thank you for spending your time to run these :-)
>
> n/p, it took like 30 seconds :-)
>

Great!  I'm glad to hear it was as easy to use as I hoped for :-)

> > I don't want to be asking for too much here, but is there a chance you
> > could
> > try the effects of the proposed patch on an offline copy of your
> > database?
>
> Yes, I think that should be possible.
>
> > Do you envision or maybe have experienced problems with query plans
> > referring to the columns that are near the top of the above hist_ratio
> > report?  In other words: what are the practical implications for you with
> > the values being duplicated rather badly throughout the histogram like in
> > the example you showed?
>
> I don't know much about the internals of query planner,
> I just read the "57.1. Row Estimation Examples" to get a basic
> understanding.
>
> If I understand it correctly, if the histogram_bounds contains a lot
> of duplicated values,
> then the row estimation will be inaccurate, which in turn will trick
> the query planner
> into a sub-optimal plan?
>

Yes, basically it should matter the most for the equality comparison
operator, such that an MCV entry would provide a more accurate selectivity
estimate (and the histogram is not used at all in this case anyway).  For
the "less/greater-than" comparison both MCV list and histogram are used, so
the drawback of having repeated values in the histogram, in my
understanding is the same: less accurate selectivity estimates for the
values that could fall precisely into a bin which didn't make it into the
histogram.

> We've had some problems lately with the query planner, or actually we've
> always
> had them but never noticed them nor cared about them, but now during peak
> times
> we've had short periods where we haven't been able to fully cope
> with the traffic.
>
> I tracked down the most self_time-consuming functions and quickly saw
> how to optimize them.
> Many of them where on the form:
> SELECT .. FROM SomeBigTable WHERE Col1 = [some dynamic value] AND Col2
> = [some constant value] AND Col3 = [some other constant value]
> The number of rows matching the WHERE clause were very tiny, perfect
> match for a partial index:
> CREATE INDEX .. ON SomeBigTable USING btree (Col1) WHERE Col2 = [some
> constant value] AND Col3 = [some other constant value];
>
> Even though the new partial index matched the query perfectly, the
> query planner didn't want to use it. Instead it continued to use some
> other sub-optimal index.
>
> The only way to force it to use the correct index was to use the
> "+0"-trick which I recently learned from one of my colleagues:
> SELECT .. FROM SomeBigTable WHERE Col1 = [some dynamic value] AND
> Col2+0 = [some constant value] AND Col3+0 = [some other constant
> value]
> CREATE INDEX .. ON SomeBigTable USING btree (Col1) WHERE Col2+0 =
> [some constant value] AND Col3+0 = [some other constant value];
>
> By adding +0 to the columns, the query planner will as I understand it
> be extremely motivated to use the correct index, as otherwise it would
> have to do a seq scan on the entire big table, which would be very
> costly.
>
> I'm glad the trick worked, now the system is fast again.
>
> We're still on 9.1, so maybe these problems will go away once we upgrade
> to 9.5.
>

Hm... sounds like a planner bug to me.  I'm not exceptionally aware of the
changes in partial index handling that were made after 9.1, though grepping
the commit log for "partial index" produces a number of hits after the date
of 9.1 release.

I don't know if these problems I described can be fixed by your patch,
> but I wanted to share this story since I know our systems (Trustly's
> and Zalando's) are quite similar in design,
> so maybe you have experienced something similar.
>

I would not expect this type of problem to be affected by the patch in any
way, though maybe I'm missing the complete picture here.

Also, I'm not aware of similar problems in our systems, but I can ask
around. :-)

Thank you.
--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-08 Thread Shulgin, Oleksandr
On Tue, Mar 8, 2016 at 3:36 PM, Joel Jacobson  wrote:

> Hi Alex,
>
> Thanks for excellent research.
>

Joel,

Thank you for spending your time to run these :-)

> I've run your queries against Trustly's production database and I can
> confirm your findings, the results are similar:
>
> WITH ...
> SELECT count(1),
>min(hist_ratio)::real,
>avg(hist_ratio)::real,
>max(hist_ratio)::real,
>stddev(hist_ratio)::real
>   FROM stats2
>  WHERE histogram_bounds IS NOT NULL;
>
> -[ RECORD 1 ]
> count  | 2814
> min| 0.193548
> avg| 0.927357
> max| 1
> stddev | 0.164134
>
>
> WHERE distinct_hist < num_hist
> -[ RECORD 1 ]
> count  | 624
> min| 0.193548
> avg| 0.672407
> max| 0.990099
> stddev | 0.194901
>
>
> WITH ..
> SELECT schemaname ||'.'|| tablename ||'.'|| attname || (CASE inherited
> WHEN TRUE THEN ' (inherited)' ELSE '' END) AS columnname,
>n_distinct, null_frac,
>num_mcv, most_common_vals, most_common_freqs,
>mcv_frac, (mcv_frac / (1 - null_frac))::real AS nonnull_mcv_frac,
>distinct_hist, num_hist, hist_ratio,
>histogram_bounds
>   FROM stats2
>  ORDER BY hist_ratio
>  LIMIT 1;
>
> -[ RECORD 1 ]-+-
> columnname| public.x.y
> n_distinct| 103
> null_frac | 0
> num_mcv   | 10
> most_common_vals  | {0,1,2,3,4,5,6,7,8,9}
> most_common_freqs |
>
> {0.4765,0.141733,0.1073,0.0830667,0.0559667,0.037,0.0251,0.0188,0.0141,0.0113667}
> mcv_frac  | 0.971267
> nonnull_mcv_frac  | 0.971267
> distinct_hist | 18
> num_hist  | 93
> hist_ratio| 0.193548387096774
> histogram_bounds  |
>
> {10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,14,14,14,14,14,15,15,15,15,16,16,16,16,21,23,5074,5437,5830,6049,6496,7046,7784,14629,21285}
>

I don't want to be asking for too much here, but is there a chance you
could try the effects of the proposed patch on an offline copy of your
database?

Do you envision or maybe have experienced problems with query plans
referring to the columns that are near the top of the above hist_ratio
report?  In other words: what are the practical implications for you with
the values being duplicated rather badly throughout the histogram like in
the example you showed?

Thank you!
--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-08 Thread Shulgin, Oleksandr
On Mon, Mar 7, 2016 at 6:02 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Mon, Mar 7, 2016 at 3:17 AM, Shulgin, Oleksandr
> <oleksandr.shul...@zalando.de> wrote:
> >
> > They might get that different plan when they upgrade to the latest major
> > version anyway.  Is it set somewhere that minor version upgrades should
> > never affect the planner?  I doubt so.
>
> People with meticulous standards are expected to re-validate their
> application, including plans and performance, before doing major
> version updates into production. They can continue to use a *fully
> patched* server from a previous major release while they do that.
>
> This is not the case for minor version updates.  We do not want to put
> people in the position where getting a security or corruption-risk
> update forces them to also accept changes which may destroy the
> performance of their system.
>
> I don't know if it is set out somewhere else, but there are many
> examples in this list of us declining to back-patch performance bug
> fixes which might negatively impact some users.  The only times we
> have done it that I can think of are when there is almost no
> conceivable way it could have a meaningful negative effect, or if the
> bug was tied in with security or stability bugs that needed to be
> fixed anyway and couldn't be separated.
>

The necessity to perform security upgrades is indeed a valid argument
against back-patching this, since this is not a bug that causes incorrect
results or data corruption, etc.

Thank you all for the thoughtful replies!
--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-07 Thread Shulgin, Oleksandr
On Fri, Mar 4, 2016 at 7:27 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Thu, Mar 3, 2016 at 2:48 AM, Shulgin, Oleksandr
> <oleksandr.shul...@zalando.de> wrote:
> > On Wed, Mar 2, 2016 at 7:33 PM, Alvaro Herrera <alvhe...@2ndquadrant.com
> >
> > wrote:
> >> Shulgin, Oleksandr wrote:
> >>
> >> > Alright.  I'm attaching the latest version of this patch split in two
> >> > parts: the first one is NULLs-related bugfix and the second is the
> >> > "improvement" part, which applies on top of the first one.
> >>
> >> So is this null-related bugfix supposed to be backpatched?  (I assume
> >> it's not because it's very likely to change existing plans).
> >
> > For the good, because cardinality estimations will be more accurate in
> these
> > cases, so yes I would expect it to be back-patchable.
>
> -1.  I think the cost of changing existing query plans in back
> branches is too high.  The people who get a better plan never thank
> us, but the people who (by bad luck) get a worse plan always complain.
>

They might get that different plan when they upgrade to the latest major
version anyway.  Is it set somewhere that minor version upgrades should
never affect the planner?  I doubt so.

--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-02 Thread Shulgin, Oleksandr
On Wed, Mar 2, 2016 at 7:33 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
wrote:

> Shulgin, Oleksandr wrote:
>
> > Alright.  I'm attaching the latest version of this patch split in two
> > parts: the first one is NULLs-related bugfix and the second is the
> > "improvement" part, which applies on top of the first one.
>
> So is this null-related bugfix supposed to be backpatched?  (I assume
> it's not because it's very likely to change existing plans).
>

For the good, because cardinality estimations will be more accurate in
these cases, so yes I would expect it to be back-patchable.

-- 
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-02 Thread Shulgin, Oleksandr
On Wed, Mar 2, 2016 at 5:46 PM, David Steele <da...@pgmasters.net> wrote:

> On 3/2/16 11:10 AM, Shulgin, Oleksandr wrote:
> > On Wed, Feb 24, 2016 at 12:30 AM, Tomas Vondra
> > <tomas.von...@2ndquadrant.com <mailto:tomas.von...@2ndquadrant.com>>
> wrote:
> >
> > I think it'd be useful not to have all the changes in one lump, but
> > structure this as a patch series with related changes in separate
> > chunks. I doubt we'd like to mix the changes in a single commit, and
> > it makes the reviews and reasoning easier. So those NULL-handling
> > fixes should be in one patch, the MCV patches in another one.
> >
> >
> > OK, such a split would make sense to me.  Though, I'm a bit late as the
> > commitfest is already closed to new patches, I guess asking the CF
> > manager to split this might work (assuming I produce the patch files)?
>
> If the patch is broken into two files that gives the review/committer
> more options but I don't think it requires another CF entry.
>

Alright.  I'm attaching the latest version of this patch split in two
parts: the first one is NULLs-related bugfix and the second is the
"improvement" part, which applies on top of the first one.

--
Alex
From a6b1cb866f9374cdc893e9a318959eccaa5bfbc9 Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Date: Wed, 2 Mar 2016 18:18:36 +0100
Subject: [PATCH 1/2] Account for NULLs in ANALYZE more strictly

Previously the ndistinct and avgcount calculation (for MCV list) could
be affected greatly by high fraction of NULLs in the sample.  Account
for that by subtracting the number of NULLs we've seen from the total
sample size explicitly.

At the same time, values that are considered "too wide" are accounted
for in ndistinct, but removed from sample size for MCV list
calculation.  In compute_distinct_stats() we need to do that manually,
in compute_scalar_stats() the value_cnt is already holding the number
of non-null, not too-wide values.
---
 src/backend/commands/analyze.c | 42 --
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8a5f07c..f05b496 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -2085,17 +2085,21 @@ compute_distinct_stats(VacAttrStatsP stats,
 		denom,
 		stadistinct;
 
-			numer = (double) samplerows *(double) d;
+			double		samplerows_nonnull = samplerows - null_cnt;
+			double		totalrows_nonnull
+			= totalrows * (1.0 - stats->stanullfrac);
 
-			denom = (double) (samplerows - f1) +
-(double) f1 *(double) samplerows / totalrows;
+			numer = samplerows_nonnull * (double) d;
+
+			denom = (samplerows_nonnull - f1) +
+(double) f1 * samplerows_nonnull / totalrows_nonnull;
 
 			stadistinct = numer / denom;
 			/* Clamp to sane range in case of roundoff error */
 			if (stadistinct < (double) d)
 stadistinct = (double) d;
-			if (stadistinct > totalrows)
-stadistinct = totalrows;
+			if (stadistinct > totalrows_nonnull)
+stadistinct = totalrows_nonnull;
 			stats->stadistinct = floor(stadistinct + 0.5);
 		}
 
@@ -2124,16 +2128,17 @@ compute_distinct_stats(VacAttrStatsP stats,
 			/* Track list includes all values seen, and all will fit */
 			num_mcv = track_cnt;
 		}
-		else
+		else if (track_cnt != 0)
 		{
+			int			sample_cnt = nonnull_cnt - toowide_cnt;
 			double		ndistinct = stats->stadistinct;
 			double		avgcount,
 		mincount;
 
 			if (ndistinct < 0)
-ndistinct = -ndistinct * totalrows;
+ndistinct = -ndistinct * sample_cnt;
 			/* estimate # of occurrences in sample of a typical value */
-			avgcount = (double) samplerows / ndistinct;
+			avgcount = (double) sample_cnt / ndistinct;
 			/* set minimum threshold count to store a value */
 			mincount = avgcount * 1.25;
 			if (mincount < 2)
@@ -2434,17 +2439,21 @@ compute_scalar_stats(VacAttrStatsP stats,
 		denom,
 		stadistinct;
 
-			numer = (double) samplerows *(double) d;
+			double		samplerows_nonnull = samplerows - null_cnt;
+			double		totalrows_nonnull
+			= totalrows * (1.0 - stats->stanullfrac);
+
+			numer = samplerows_nonnull * (double) d;
 
-			denom = (double) (samplerows - f1) +
-(double) f1 *(double) samplerows / totalrows;
+			denom = (samplerows_nonnull - f1) +
+(double) f1 * samplerows_nonnull / totalrows_nonnull;
 
 			stadistinct = numer / denom;
 			/* Clamp to sane range in case of roundoff error */
 			if (stadistinct < (double) d)
 stadistinct = (double) d;
-			if (stadistinct > totalrows)
-stadistinct = totalrows;
+			if (stadistinct > totalrows_nonnull)
+stadistinct = totalrows_nonnull;
 			stats->stadistinct = floor(stadistinct + 0.5);
 		}
 
@@ -2480,21 +2489,18 @@ compute_scalar_stats(Va

Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-02 Thread Shulgin, Oleksandr
On Wed, Feb 24, 2016 at 12:30 AM, Tomas Vondra <tomas.von...@2ndquadrant.com
> wrote:

> Hi,
>
> On 02/08/2016 03:01 PM, Shulgin, Oleksandr wrote:
> >
> ...
>
>>
>> I've incorporated this fix into the v2 of my patch, I think it is
>> related closely enough.  Also, added corresponding changes to
>> compute_distinct_stats(), which doesn't produce a histogram.
>>
>
> I think it'd be useful not to have all the changes in one lump, but
> structure this as a patch series with related changes in separate chunks. I
> doubt we'd like to mix the changes in a single commit, and it makes the
> reviews and reasoning easier. So those NULL-handling fixes should be in one
> patch, the MCV patches in another one.


OK, such a split would make sense to me.  Though, I'm a bit late as the
commitfest is already closed to new patches, I guess asking the CF manager
to split this might work (assuming I produce the patch files)?

--
Alex


Re: [HACKERS] [PATH] Correct negative/zero year in to_date/to_timestamp

2016-02-26 Thread Shulgin, Oleksandr
On Fri, Feb 26, 2016 at 3:24 PM, Ivan Kartyshov 
wrote:

> The following review has been posted through the commitfest application:
>


> make installcheck-world:  tested, failed
> Implements feature:   tested, failed
> Spec compliant:   tested, failed
> Documentation:tested, failed
>
> Applied this patch, it works well, does what is expected correctly, code
> style is maintained. Regression tests passed OK. No documentation or
> comments.
>

Why does it say "tested, failed" for all points above there? ;-)

--
Alex


Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-02-09 Thread Shulgin, Oleksandr
On Mon, Feb 8, 2016 at 5:24 PM, Daniel Verite <dan...@manitou-mail.org>
wrote:

>     Shulgin, Oleksandr wrote:
>
> > Added to the Open commitfest: https://commitfest.postgresql.org/9/475/
>
> Here's a review. Note that the patch tested and submitted
> is not the initial one in the thread, so it doesn't exactly
> match  $subject now.
>

I've edited the CF entry title to read: Add \errverbose to psql (and
support in libpq)

What's tested here is a client-side approach, suggested by Tom
> upthread as an alternative, and implemented by Oleksandr in
> 0001-POC-errverbose-in-psql.patch
>
> The patch has two parts: one part in libpq exposing a new function
> named PQresultBuildErrorMessage, and a second part implementing an
> \errverbose command in psql, essentially displaying the result of the
> function.
> The separation makes sense if we consider that other clients might benefit
> from the function, or that libpq is a better place than psql to maintain
> this in the future, as the list of error fields available in a PGresult
> might grow.
> OTOH maybe adding a new libpq function just for that is overkill, but this
> is subjective.
>
> psql part
> =
> Compiles and works as intended.
> For instance with \set VERBOSITY default, an FK violation produces:
>
>   # insert into table2 values(10);
>   ERROR:  insert or update on table "table2" violates foreign key
> constraint
> "table2_id_tbl1_fkey"
>   DETAIL:  Key (id_tbl1)=(10) is not present in table "table1".
>
> Then \errverbose just displays the verbose form of the error:
>   # \errverbose
> ERROR:  23503: insert or update on table "table2" violates foreign
>   key constraint "table2_id_tbl1_fkey"
> DETAIL:  Key (id_tbl1)=(10) is not present in table "table1".
> SCHEMA NAME:  public
> TABLE NAME:  table2
> CONSTRAINT NAME:  table2_id_tbl1_fkey
> LOCATION:  ri_ReportViolation, ri_triggers.c:3326
>
> - make check passes
> - valgrind test is OK (no obvious leak after using the command).
>
> Missing bits:
> - the command is not mentioned in the help (\? command, see help.c)
> - it should be added to tab completions (see tab-complete.c)
> - it needs to be described in the SGML documentation
>
> libpq part
> ==
> The patch implements and exports a new PQresultBuildErrorMessage()
> function.
>
> AFAICS its purpose is to produce a result similar to what
> PQresultErrorMessage() would have produced, if PQERRORS_VERBOSE
> was the verbosity in effect at execution time.
>
> My comments:
>
> - the name of the function does not really hint at what it does.
> Maybe something like PQresultVerboseErrorMessage() would be more
> explicit?
>

Not exactly: the verbosity setting might or might not be in effect when
this function is called.  Another external part of the state that might
affect the result is the conn->show_context field.

I would expect the new function to have the same interface than
> PQresultErrorMessage(), but it's not the case.
>
> - it takes a PGconn* and PGresult* as input parameters, but
> PQresultErrorMessage takes only a PGresult* as input.
> It's not clear why PGconn* is necessary.
>

This is because PQresultErrorMessage() returns a stored error message:
res->errMsg (or "").

- it returns a pointer to a strdup'ed() buffer, which
> has to be freed separately by the caller. That differs from
> PQresultErrorMessage() which keeps this managed inside the
> PGresult struct.
>

For the same reason: this function actually produces a new error message,
as opposed to returning a stored one.

- if protocol version < 3, an error message is returned. It's not
> clear to the caller that this error is emitted by the function itself
> rather than the query. Besides, are we sure it's necessary?
> Maybe the function could just produce an output with whatever
> error fields it has, even minimally with older protocol versions,
> rather than failing?
>

Hm, probably we could just copy the message from conn->errorMessage, in
case of protocol v2, but I don't see a point in passing PGresult to the
func or naming it PQresult in the case of v2.

- if the fixed error message is kept, it should pass through
> libpq_gettext() for translation.
>

Good point.

- a description of the function should be added to the SGML doc.
> There's a sentence in PQsetErrorVerbosity() that says, currently:
>
>   "Changing the verbosity does not affect the messages available from
>already-existing PGresult objects, only subsequently-created ones."
>
> As it's precisely the point of that new function, that bit could
> be altered to refer to it.
>

I didn't touch the documentation specifically, because this is just a
proof-of-concept, but it's good that you notice this detail.  Most
importantly, I'd like to learn of better options than storing the whole
last_result in psql's pset structure.

Thanks for the review!

--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-02-08 Thread Shulgin, Oleksandr
On Mon, Jan 25, 2016 at 5:11 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:
>
> On Sat, Jan 23, 2016 at 11:22 AM, Tomas Vondra <
tomas.von...@2ndquadrant.com> wrote:
>>
>>
>> Overall, I think this is really about deciding when to cut-off the MCV,
so that it does not grow needlessly large - as Robert pointed out, the
larger the list, the more expensive the estimation (and thus planning).
>>
>> So even if we could fit the whole sample into the MCV list (i.e. we
believe we've seen all the values and we can fit them into the MCV list),
it may not make sense to do so. The ultimate goal is to estimate
conditions, and if we can do that reasonably even after cutting of the
least frequent values from the MCV list, then why not?
>>
>> From this point of view, the analysis concentrates deals just with the
ANALYZE part and does not discuss the estimation counter-part at all.
>
>
> True, this aspect still needs verification.  As stated, my primary
motivation was to improve the plan stability for relatively short MCV lists.
>
> Longer MCV lists might be a different story, but see "Increasing stats
target" section of the original mail: increasing the target doesn't give
quite the expected results with unpatched code either.

To address this concern I've run my queries again on the same dataset, now
focusing on how the number of MCV items changes with the patched code
(using the CTEs from my original mail):

WITH ...

SELECT count(1),
   min(num_mcv)::real,
   avg(num_mcv)::real,
   max(num_mcv)::real,
   stddev(num_mcv)::real

  FROM stats2

 WHERE num_mcv IS NOT NULL;

(ORIGINAL)
count  | 27452
min| 1
avg| 32.7115
max| 100
stddev | 40.6927

(PATCHED)
count  | 27527
min| 1
avg| 38.4341
max| 100
stddev | 43.3596

A significant portion of the MCV lists is occupying all 100 slots available
with the default statistics target, so it is also interesting to look at the
stats that have "underfilled" MCV lists (by changing the condition of the
WHERE clause to read "num_mcv < 100"):

(<100 ORIGINAL)
count  | 20980
min| 1
avg| 11.9541
max| 99
stddev | 18.4132

(<100 PATCHED)
count  | 19329
min| 1
avg| 12.3222
max| 99
stddev | 19.6959

As one can see, with the patched code the average length of MCV lists
doesn't change all that dramatically, while at the same time exposing all
the improvements described in the original mail.

>> After fixing the estimator to consider fraction of NULLs, the estimates
look like this:
>>
>> statistics target |   master  |  patched
>> ------------------+-----------+----------
>>               100 |      1302 |     5356
>>              1000 |      6022 |     6791
>>
>> So this seems to significantly improve the ndistinct estimate (patch
attached).
>
>
> Hm... this looks correct.  And compute_distinct_stats() needs the same
treatment, obviously.

I've incorporated this fix into the v2 of my patch, I think it is related
closely enough.  Also, added corresponding changes to
compute_distinct_stats(), which doesn't produce a histogram.

I'm adding this to the next CommitFest.  Further reviews are very much
appreciated!

--
Alex
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 070df29..cbf3538 100644
*** a/src/backend/commands/analyze.c
--- b/src/backend/commands/analyze.c
***
*** 2079,2095 
  		denom,
  		stadistinct;
  
! 			numer = (double) samplerows *(double) d;
  
! 			denom = (double) (samplerows - f1) +
! (double) f1 *(double) samplerows / totalrows;
  
  			stadistinct = numer / denom;
  			/* Clamp to sane range in case of roundoff error */
  			if (stadistinct < (double) d)
  stadistinct = (double) d;
! 			if (stadistinct > totalrows)
! stadistinct = totalrows;
  			stats->stadistinct = floor(stadistinct + 0.5);
  		}
  
--- 2079,2099 
  		denom,
  		stadistinct;
  
! 			double		samplerows_nonnull = samplerows - null_cnt;
! 			double		totalrows_nonnull
! 			= totalrows * (1.0 - stats->stanullfrac);
  
! 			numer = samplerows_nonnull * (double) d;
! 
! 			denom = (samplerows_nonnull - f1) +
! (double) f1 * samplerows_nonnull / totalrows_nonnull;
  
  			stadistinct = numer / denom;
  			/* Clamp to sane range in case of roundoff error */
  			if (stadistinct < (double) d)
  stadistinct = (double) d;
! 			if (stadistinct > totalrows_nonnull)
! stadistinct = totalrows_nonnull;
  			stats->stadistinct = floor(stadistinct + 0.5);
  		}
  
***
*** 2120,2146 
  		}
  		else
  		{
  			double		ndistinct = stats->stadistinct;
- 			double		avgcount,
- 		mincount;
  
  			if (ndistinct < 0)
! ndistinct = -ndistinct * totalrows;
! 			/* estimate # of occurrences in sample of a typical value */
! 			avgcou

Re: [HACKERS] Trivial doc fix in logicaldecoding.sgml

2016-01-28 Thread Shulgin, Oleksandr
On Thu, Jan 28, 2016 at 5:55 AM, Fujii Masao <masao.fu...@gmail.com> wrote:

> On Wed, Jan 27, 2016 at 7:34 PM, Shulgin, Oleksandr
> <oleksandr.shul...@zalando.de> wrote:
> > Hi,
> >
> > Please find attached a simple copy-paste fix for CREATE_REPLICATION_SLOT
> > syntax.
>
> We should change also START_REPLICATION SLOT syntax document as follows?
>
> -  START_REPLICATION SLOT
> slot_name LOGICAL
> options
> +  START_REPLICATION SLOT
> slot_name LOGICAL
> XXX/XXX
> (options)
>

If a committer thinks so, I don't object.  Though this one is rather
a detail for which the reader is already referred to protocol-replication,
while my fix was about a factual error.

--
Alex


Re: [HACKERS] Trivial doc fix in logicaldecoding.sgml

2016-01-28 Thread Shulgin, Oleksandr
On Thu, Jan 28, 2016 at 9:42 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
>
>
> On 28 January 2016 at 16:36, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:
>>
>> On Thu, Jan 28, 2016 at 5:55 AM, Fujii Masao <masao.fu...@gmail.com>
wrote:
>>>
>>>
>>> We should change also START_REPLICATION SLOT syntax document as follows?
>>>
>>> -  START_REPLICATION SLOT
>>> slot_name LOGICAL
>>> options
>>> +  START_REPLICATION SLOT
>>> slot_name LOGICAL
>>> XXX/XXX
>>> (options)
>>
>>
>> If a committer thinks so, I don't object.  Though this one is
rather a detail for which the reader is already referred to
protocol-replication, while my fix was about a factual error.
>>
>
> I think it should be changed. I've already had people confused by this.
>
> Either that or remove the synopsis entirely, changing it to
>
> START_REPLICATION SLOT 
>
> and linking to the protocol docs. Which might be better.

I think it still makes sense to keep the LOGICAL, but hide the rest of the
details behind that ellipsis, so:

  START_REPLICATION SLOT slot_name LOGICAL ...

Updated patch attached.

--
Alex
From 11b61d04cd16b9577759edff15e68f8ba9c4828e Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Date: Wed, 27 Jan 2016 11:27:35 +0100
Subject: [PATCH] Fix protocol commands description in logicaldecoding.sgml

---
 doc/src/sgml/logicaldecoding.sgml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 1ae5eb6..e841348 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -280,7 +280,7 @@ $ pg_recvlogical -d postgres --slot test --drop-slot
 The commands
 
  
-  CREATE_REPLICATION_SLOT slot_name LOGICAL options
+  CREATE_REPLICATION_SLOT slot_name LOGICAL output_plugin
  
 
  
@@ -288,7 +288,7 @@ $ pg_recvlogical -d postgres --slot test --drop-slot
  
 
  
-  START_REPLICATION SLOT slot_name LOGICAL options
+  START_REPLICATION SLOT slot_name LOGICAL ...
  
 
 are used to create, drop, and stream changes from a replication
-- 
2.5.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Trivial doc fix in logicaldecoding.sgml

2016-01-27 Thread Shulgin, Oleksandr
Hi,

Please find attached a simple copy-paste fix for CREATE_REPLICATION_SLOT
syntax.

--
Alex
From 05119485a473febe8ffd95103fd7774bc31ee079 Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin 
Date: Wed, 27 Jan 2016 11:27:35 +0100
Subject: [PATCH] Fix CREATE_REPLICATION_SLOT syntax in logicaldecoding.sgml

---
 doc/src/sgml/logicaldecoding.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 1ae5eb6..926637b 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -280,7 +280,7 @@ $ pg_recvlogical -d postgres --slot test --drop-slot
 The commands
 
  
-  CREATE_REPLICATION_SLOT slot_name LOGICAL options
+  CREATE_REPLICATION_SLOT slot_name LOGICAL output_plugin
  
 
  
-- 
2.5.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-01-27 Thread Shulgin, Oleksandr
On Wed, Jan 27, 2016 at 10:59 AM, Artur Zakirov 
wrote:

> Hello.
>
> When a user try to create a text search dictionary for the russian
> language on Mac OS then called the following error message:
>
>   CREATE EXTENSION hunspell_ru_ru;
> + ERROR:  invalid byte sequence for encoding "UTF8": 0xd1
> + CONTEXT:  line 341 of configuration file
> "/Users/stas/code/postgrespro2/tmp_install/Users/stas/code/postgrespro2/install/share/tsearch_data/ru_ru.affix":
> "SFX Y   хаться шутсяхаться
>
> Russian dictionary was downloaded from
> http://extensions.openoffice.org/en/project/slovari-dlya-russkogo-yazyka-dictionaries-russian
> Affix and dictionary files was extracted from the archive and converted to
> UTF-8. Also a converted dictionary can be downloaded from
> https://github.com/select-artur/hunspell_dicts/tree/master/ru_ru


Not sure why the file uses the "SET KOI8-R" directive then?

This behavior occurs on:
> - Mac OS X 10.10 Yosemite and Mac OS X 10.11 El Capitan.
> - latest PostgreSQL version from git and PostgreSQL 9.5 (probably also on
> 9.4.5).
>
> There is also the test to reproduce this bug in the attachment.
>

What error message do you get with this test program?  (I don't get any,
but I'm not on Mac OS.)
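For context, 0xd1 is a lead byte of a two-byte UTF-8 sequence (the range used by Cyrillic), so the error above means the byte following it was not a valid continuation byte — e.g. because the line was truncated, or because the file is really in a single-byte encoding such as KOI8-R. A minimal check (illustration only, not the server's verifier):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * True if buf starts with a complete, well-formed two-byte UTF-8
 * sequence (lead byte 0xc2..0xdf followed by a continuation byte).
 */
static bool
valid_two_byte_utf8(const unsigned char *buf, size_t len)
{
	return len >= 2 &&
		buf[0] >= 0xc2 && buf[0] <= 0xdf &&
		(buf[1] & 0xc0) == 0x80;
}
```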

--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-01-25 Thread Shulgin, Oleksandr
On Sat, Jan 23, 2016 at 11:22 AM, Tomas Vondra  wrote:

> Hi,
>
> On 01/20/2016 10:49 PM, Alvaro Herrera wrote:
>
>>
>> Tom, are you reviewing this for the current commitfest?
>>
>
> While I'm not the right Tom, I've been looking the the patch recently, so
> let me post the review here ...
>

Thank you for the review!

2) mincount = 1.25 * avgcount
> -----------------------------
>
> While I share the dislike of arbitrary constants (like the 1.25 here), I
> do think we better keep this, otherwise we can just get rid of the mincount
> entirely I think - the most frequent value will always be above the
> (remaining) average count, making the threshold pointless.
>

Correct.

It might have impact in the original code, but in the new one it's quite
> useless (see the next point), unless I'm missing some detail.
>
>
> 3) modifying avgcount threshold inside the loop
> -----------------------------------------------
>
> The comment was extended with this statement:
>
>  * We also decrease ndistinct in the process such that going forward
>  * it refers to the number of distinct values left for the histogram.
>
> and indeed, that's what's happening - at the end of each loop, we do this:
>
> /* Narrow our view of samples left for the histogram */
> sample_cnt -= track[i].count;
> ndistinct--;
>
> but that immediately lowers the avgcount, as that's re-evaluated within
> the same loop
>
> avgcount = (double) sample_cnt / (double) ndistinct;
>
> which means it's continuously decreasing and lowering the threshold,
> although that's partially mitigated by keeping the 1.25 coefficient.
>

I was going to write "not necessarily lowering", but this is actually
accurate.  The following holds due to track[i].count > avgcount (=
sample_cnt / ndistinct):

  sample_cnt      sample_cnt - track[i].count
  ----------  >  -----------------------------
   ndistinct           ndistinct - 1

It however makes reasoning about the algorithm much more complicated.
>

Unfortunately, yes.

4) for (i = 0; /* i < num_mcv */; i++)
> --------------------------------------
>
> The way the loop is coded seems rather awkward, I guess. Not only there's
> an unexpected comment in the "for" clause, but the condition also says this
>
> /* Another way to say "while (i < num_mcv)" */
> if (i >= num_mcv)
> break;
>
> Why not to write it as a while loop, then? Possibly including the
> (num_hist >= 2) condition, like this:
>
> while ((i < num_mcv) && (num_hist >= 2))
> {
> ...
> }
>
> In any case, the loop would deserve a comment explaining why we think
> computing the thresholds like this makes sense.
>

This is partially explained by a comment inside the loop:

! for (i = 0; /* i < num_mcv */; i++)
  {
! /*
! * We have to put this before the loop condition, otherwise
! * we'll have to repeat this code before the loop and after
! * decreasing ndistinct.
! */
! num_hist = ndistinct;
! if (num_hist > num_bins)
! num_hist = num_bins + 1;

I guess this is a case where code duplication can be traded for more
apparent control flow, i.e:

!  num_hist = ndistinct;
!  if (num_hist > num_bins)
!  num_hist = num_bins + 1;

!  for (i = 0; i < num_mcv && num_hist >= 2; i++)
   {
...
+ /* Narrow our view of samples left for the histogram */
+ sample_cnt -= track[i].count;
+ ndistinct--;
+
+  num_hist = ndistinct;
+  if (num_hist > num_bins)
+  num_hist = num_bins + 1;
   }
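The same control flow, written out as a compilable sketch (a deliberate simplification — select_mcv_entries() and the plain int counts are made up for illustration; the real code operates on ScalarMCVItem in analyze.c):

```c
/*
 * Sketch of the MCV cut-off loop restructured as a while loop.
 * track[] holds per-value sample counts, sorted descending; the
 * return value is how many entries end up in the MCV list.
 */
static int
select_mcv_entries(const int *track, int track_cnt, int num_mcv,
				   int num_bins, int sample_cnt, int ndistinct)
{
	int			i = 0;
	int			num_hist = (ndistinct > num_bins) ? num_bins + 1 : ndistinct;

	while (i < num_mcv && i < track_cnt && num_hist >= 2)
	{
		double		avgcount = (double) sample_cnt / (double) ndistinct;
		double		mincount = 1.25 * avgcount; /* coefficient discussed above */

		if ((double) track[i] <= mincount)
			break;

		/* Narrow our view of samples left for the histogram */
		sample_cnt -= track[i];
		ndistinct--;
		num_hist = (ndistinct > num_bins) ? num_bins + 1 : ndistinct;
		i++;
	}
	return i;
}
```

Note how a perfectly uniform distribution never clears the 1.25 * avgcount threshold, so nothing gets into the MCV list, which is the desired behavior.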

Summary
> -------
>
> Overall, I think this is really about deciding when to cut-off the MCV, so
> that it does not grow needlessly large - as Robert pointed out, the larger
> the list, the more expensive the estimation (and thus planning).
>
> So even if we could fit the whole sample into the MCV list (i.e. we
> believe we've seen all the values and we can fit them into the MCV list),
> it may not make sense to do so. The ultimate goal is to estimate
> conditions, and if we can do that reasonably even after cutting of the
> least frequent values from the MCV list, then why not?
>
> From this point of view, the analysis concentrates deals just with the
> ANALYZE part and does not discuss the estimation counter-part at all.
>

True, this aspect still needs verification.  As stated, my primary
motivation was to improve the plan stability for relatively short MCV lists.

Longer MCV lists might be a different story, but see "Increasing stats
target" section of the original mail: increasing the target doesn't give
quite the expected results with unpatched code either.

5) ndistinct estimation vs. NULLs
> ---------------------------------
>
> While looking at the patch, I started realizing whether we're actually
> handling NULLs correctly when estimating ndistinct. Because that part also
> uses samplerows directly and entirely ignores NULLs, as it does this:
>
> numer = (double) samplerows *(double) d;
>
> denom = (double) (samplerows - f1) +
> (double) f1 *(double) samplerows / totalrows;
>
> ...
> if 

Re: [HACKERS] Inconsistent error handling in START_REPLICATION command

2016-01-21 Thread Shulgin, Oleksandr
On Thu, Jan 21, 2016 at 3:25 PM, Robert Haas  wrote:

> On Wed, Jan 20, 2016 at 2:28 AM, Craig Ringer 
> wrote:
> > It enters COPY BOTH mode before it invokes the startup callback. The
> client
> > has no way to unilaterally terminate COPY BOTH mode and return to the
> normal
> > walsender protocol. The server doesn't do it when there's an ERROR. So
> the
> > only option the client has for recovery is to disconnect and reconnect.
>
> This isn't my understanding of how the protocol works, and it's not
> what the documentation says either.
>
> Per http://www.postgresql.org/docs/current/static/protocol-flow.html:
>
> ===
> In the event of a backend-detected error during copy-both mode, the
> backend will issue an ErrorResponse message, discard frontend messages
> until a Sync message is received, and then issue ReadyForQuery and
> return to normal processing. The frontend should treat receipt of
> ErrorResponse as terminating the copy in both directions; no CopyDone
> should be sent in this case.
> ===
>
> So it's true that the client can't unilaterally terminate COPY BOTH
> mode; it can only send CopyDone.  But an error on the server side
> should do so.
>

Hm, you're right.  Even though the server sends the COPY_BOTH message before
the plugin startup, an attempt by the client to actually read the data
messages results in ErrorResponse handled on the client, then the client
can re-submit the corrected START_REPLICATION command and continue without
the need to reconnect.  This cannot be actually tested in psql, but I can
verify the behavior with my python scripts.

Then I don't have a preference for the early error reporting in this case.
If the current behavior potentially allows for more flexible error
reporting, I'm for keeping it.

--
Alex


Re: [HACKERS] Stream consistent snapshot via a logical decoding plugin as a series of INSERTs

2016-01-20 Thread Shulgin, Oleksandr
On Wed, Jan 20, 2016 at 9:26 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:

> On 20 January 2016 at 15:50, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>
> That'd be nice, but probably not totally necessary for streaming
> relations. It doesn't really need the slot at all. Or shouldn't, I think.
> Though it might be easiest to allow it to acquire the slot just for
> convenience and shared code.
>

Yes, before looking at the code I thought I could do without a slot, but
dependency on MyReplicationSlot being set in a number of places has forced
me to actually acquire it, in order to keep the changes more contained.

--
Alex


Re: [HACKERS] Stream consistent snapshot via a logical decoding plugin as a series of INSERTs

2016-01-19 Thread Shulgin, Oleksandr
On Fri, Jan 15, 2016 at 5:31 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

>
> POC patch attached.  Findings:
>
> 1) Needs an actual slot for all the decode machinery to work (code depends
> on MyReplicationSlot being set).
> 2) Requires a core patch.
> 3) Currently only supports textual output, adding binary is trivial.
>
>
> Acquiring a slot means this cannot be run in parallel from multiple
> backends.  Any ideas on how to overcome this (except for opening multiple
> slots with the same LSN)?
> To obtain a consistent snapshot, the client still needs to take care of
> preserving and setting transaction snapshot properly.
>

Testing revealed a number of problems with memory handling in this code, a
corrected v2 is attached.

Completely another problem is proper handling of SPI stack and releasing
the replication slot.  The latter I'd like to avoid dealing with, because
at the moment it's not possible to stream a number of relations in parallel
using this POC function, so I'd rather move in a direction of not acquiring
the replication slot at all.

The SPI problem manifests itself if I place a LIMIT on top of the query:

# SELECT pg_logical_slot_stream_relation('slot1', 'pg_catalog', 'pg_class')
LIMIT 5;
WARNING:  relcache reference leak: relation "pg_class" not closed
WARNING:  transaction left non-empty SPI stack
HINT:  Check for missing "SPI_finish" calls.

I wonder if there is a way to install some sort of cleanup handler that
will be called by the executor?

--
Alex
From 83c2c754066f43111d0f21ff088cc5503e910aab Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Date: Fri, 15 Jan 2016 17:30:04 +0100
Subject: [PATCH] POC: pg_logical_slot_stream_relation

---
 src/backend/catalog/system_views.sql|   9 +
 src/backend/replication/logical/logicalfuncs.c  | 355 +---
 src/backend/replication/logical/reorderbuffer.c |   6 +-
 src/include/catalog/pg_proc.h   |   2 +
 src/include/replication/logicalfuncs.h  |   1 +
 src/include/replication/reorderbuffer.h |   3 +
 6 files changed, 333 insertions(+), 43 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 923fe58..5431b61 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -941,6 +941,15 @@ LANGUAGE INTERNAL
 VOLATILE ROWS 1000 COST 1000
 AS 'pg_logical_slot_peek_binary_changes';
 
+CREATE OR REPLACE FUNCTION pg_logical_slot_stream_relation(
+IN slot_name name, IN relnamespace name, IN relname name, IN nochildren bool DEFAULT FALSE,
+VARIADIC options text[] DEFAULT '{}',
+OUT data text)
+RETURNS SETOF TEXT
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000
+AS 'pg_logical_slot_stream_relation';
+
 CREATE OR REPLACE FUNCTION pg_create_physical_replication_slot(
 IN slot_name name, IN immediately_reserve boolean DEFAULT false,
 OUT slot_name name, OUT xlog_position pg_lsn)
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 562c8f6..c1605de 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -21,12 +21,18 @@
 #include "funcapi.h"
 #include "miscadmin.h"
 
+#include "access/htup_details.h"
 #include "access/xlog_internal.h"
 
+#include "executor/spi.h"
+
+#include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 
 #include "nodes/makefuncs.h"
 
+#include "lib/stringinfo.h"
+
 #include "mb/pg_wchar.h"
 
 #include "utils/array.h"
@@ -40,6 +46,7 @@
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/logicalfuncs.h"
+#include "replication/reorderbuffer.h"
 
 #include "storage/fd.h"
 
@@ -50,6 +57,12 @@ typedef struct DecodingOutputState
 	TupleDesc	tupdesc;
 	bool		binary_output;
 	int64		returned_rows;
+
+	/* for pg_logical_stream_relation */
+	MemoryContext	context;
+	Relation		rel;
+	Portal			cursor;
+	TupleTableSlot *tupslot;
 } DecodingOutputState;
 
 /*
@@ -270,6 +283,53 @@ logical_read_local_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr,
 	return count;
 }
 
+static List *
+deconstruct_options_array(ArrayType *arr)
+{
+	Size		ndim;
+	List	   *options = NIL;
+
+	ndim = ARR_NDIM(arr);
+	if (ndim > 1)
+	{
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("array must be one-dimensional")));
+	}
+	else if (array_contains_nulls(arr))
+	{
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("array must not contain nulls")));
+	}
+	else if (ndim == 1)
+	{
+		int			nelems;
+		Datum	   *datum_opts;
+		int			i;
+
+		Assert(ARR_ELEMTYPE(arr) == TEXTOID);
+
+		

Re: [HACKERS] Stream consistent snapshot via a logical decoding plugin as a series of INSERTs

2016-01-19 Thread Shulgin, Oleksandr
On Wed, Jan 20, 2016 at 7:57 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:

> On 15 January 2016 at 16:30, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>
>
>> I'd like to propose generic functions (probably in an extension, or in
>> core if not possible otherwise) to facilitate streaming existing data from
>> the database *in the same format* that one would get if these would be the
>> changes decoded by a logical decoding plugin.
>>
>
> So effectively produce synthetic logical decoding callbacks to run a bunch
> of fake INSERTs, presumably with a fake xid etc?
>

Exactly.


> The idea is to use a snapshot returned from CREATE_REPLICATION_SLOT
>> command of the replication protocol to get a consistent snapshot of the
>> database, then start listening to new changes on the slot.
>>
>
> My impression is that you want to avoid the current step of "synchronize
> database initial contents" when using logical decoding for replication.
>

Yes, but...


> But I guess you're looking to then populate that empty schema in-band via
> logical decoding, rather than having to do a --data-only dump or use COPY.
> Right?
>
> That won't help you for schema; presumably you'd still do a pg_dump
> --schema-only | pg_restore for that.
>
> Just like when restoring a --data-only dump or using COPY you'd have to
> disable FKs during sync, but that's pretty much unavoidable.
>

All of this implies another *postgres* database on the receiving side,
which is not necessarily the case for my research.

The way this initial export phase is implemented there is by providing a
>> SQL-callable set returning function which is using SPI to run "SELECT *
>> FROM mytable" behind the scenes and runs the resulting tuples through the
>> INSERT callback of the logical decoding plugin, which lives in the same
>> loadable module as this SQL function.
>>
>
> o_O
>
> What about the reorder buffer, the logical decoding memory context, etc?
>

As shown by the POC patch, it is rather straightforward to achieve.

Bottled Water logical decoding plugin uses binary protocol based on Avro
>> data serialization library.  As an experiment I was adding support for JSON
>> output format to it, and for that I had to re-implement the aforementioned
>> SRF to export initial data to convert tuples to JSON instead.
>>
>
> Have you taken a look at what's been done with pglogical and
> pglogical_output?
>
> We've got extensible protocol support there, and if Avro offers compelling
> benefits over the current binary serialization I'm certainly interested in
> hearing about it.
>

This is what I'm going to benchmark.  With the generic function I can just
create two slots: one for pglogical and another one for BottledWater/Avro
and see which one performs better when forced to stream some TB worth of
INSERTs through the change callback.

What do you say?
>>
>
> Interesting idea. As outlined I think it sounds pretty fragile though; I
> really, really don't like the idea of lying to the insert callback by
> passing it a fake insert with (presumably) fake reorder buffer txn, etc.
>

Fair enough.  However, for performance testing it might not be that bad,
even if none of that lands in the actual API.

What we've done in pglogical is take a --schema-only dump then, on the
> downstream, attach to the exported snapshot and use COPY ... TO STDOUT over
> a libpq connection to the upstream feed that to COPY ... FROM STDIN on
> another libpq connection to "ourselves", i.e. the downstream. Unless Petr
> changed it to use COPY innards directly on the downstream; I know he talked
> about it but haven't checked if he did. Anyway, either way it's not pretty
> and requires a sideband non-replication connection to sync initial state.
> The upside is that it can be relatively easily parallelized for faster sync
> using multiple connections.
>

I've also measured that to have a baseline for comparing it to decoding
performance.

To what extent are you setting up a true logical decoding context here?
>

It is done in exactly the same way pg_logical_slot_get/peek_changes() do.


> Where does the xact info come from? The commit record? etc.
>

palloc0()


> You're presumably not forming a reorder buffer then decoding it since it
> could create a massive tempfile on disk, so are you just dummying this info
> up?
>

In my experience, it doesn't.  We know it's going to be a "committed xact",
so we don't really need to queue the changes up before we see a "commit"
record.


> Or hoping the plugin won't look at it?
>

Pretty much. :-)

The functionality is good and I think that for the SQL level you'd have to
> use SET TRANSACTION SNAPSH

Re: [HACKERS] More stable query plans via more predictable column statistics

2016-01-18 Thread Shulgin, Oleksandr
On Wed, Dec 2, 2015 at 10:20 AM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> On Tue, Dec 1, 2015 at 7:00 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
>> > This post summarizes a few weeks of research of ANALYZE statistics
>> > distribution on one of our bigger production databases with some
>> real-world
>> > data and proposes a patch to rectify some of the oddities observed.
>>
>> Please add this to the 2016-01 commitfest ...
>>
>
> Added: https://commitfest.postgresql.org/8/434/
>

It would be great if some folks could find a moment to run the queries I
was showing on their data to confirm (or refute) my findings, or to
contribute to the picture in general.

As I was saying, the queries were designed in such a way that even an
unprivileged user can run them (the results will be limited to the stats
data available to that user, obviously; and for custom-tailored statstarget
one still needs superuser to join the pg_statistic table directly).  Also,
on the scale of ~30k attribute statistics records, the queries take only a
few seconds to finish.

Cheers!
--
Alex


[HACKERS] Trivial fixes for some IDENTIFICATION comment lines

2016-01-18 Thread Shulgin, Oleksandr
Hello,

I've noticed that in src/backend/replication/logical/reorderbuffer.c, the
IDENTIFICATION comment line is incorrect:

 * IDENTIFICATION
 *  src/backend/replication/reorderbuffer.c

By using a simple find+grep command I can see this is also the case for the
following files:

$ find src -name \*.c -o -name \*.h | xargs grep -A1 IDENTIFICATION | grep
-v -E 'IDENTIFICATION|--' | grep -v '^\(src/.*\.[ch]\)-\s*\*\s*\1\s*$'

src/include/utils/evtcache.h- *	  src/backend/utils/cache/evtcache.c
src/backend/utils/cache/relfilenodemap.c- *	  src/backend/utils/cache/relfilenode.c
src/backend/utils/adt/version.c- *
src/backend/storage/ipc/dsm_impl.c- *	  src/backend/storage/ipc/dsm.c
src/backend/replication/logical/reorderbuffer.c- *	  src/backend/replication/reorderbuffer.c
src/backend/replication/logical/snapbuild.c- *	  src/backend/replication/snapbuild.c
src/backend/replication/logical/logicalfuncs.c- *	  src/backend/replication/logicalfuncs.c
src/backend/commands/dropcmds.c- *	  src/backend/catalog/dropcmds.c

The one wtih version.c is a false positive: there's just an extra blank
line in the comment.

A patch to fix the above is attached.

Another minor thing is that if there is a convention for the order of
Copyright, NOTES and IDENTIFICATION, then it is not followed strictly.
Compare e.g. reorderbuffer.c vs. snapbuild.c.

--
Alex
From 4e71c026be7d1a75710fe89177d0dd37d8d84421 Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin 
Date: Mon, 18 Jan 2016 12:49:32 +0100
Subject: [PATCH] Assorted IDENTIFICATION comment line fixes.

---
 src/backend/commands/dropcmds.c | 2 +-
 src/backend/replication/logical/logicalfuncs.c  | 2 +-
 src/backend/replication/logical/reorderbuffer.c | 2 +-
 src/backend/replication/logical/snapbuild.c | 2 +-
 src/backend/storage/ipc/dsm_impl.c  | 2 +-
 src/backend/utils/cache/relfilenodemap.c| 2 +-
 src/include/utils/evtcache.h| 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index 522027a..20b42fe 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- *	  src/backend/catalog/dropcmds.c
+ *	  src/backend/commands/dropcmds.c
  *
  *-
  */
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 56e47e4..562c8f6 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -9,7 +9,7 @@
  * Copyright (c) 2012-2016, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
- *	  src/backend/replication/logicalfuncs.c
+ *	  src/backend/replication/logical/logicalfuncs.c
  *-
  */
 
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 78acced..7402f20 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- *	  src/backend/replication/reorderbuffer.c
+ *	  src/backend/replication/logical/reorderbuffer.c
  *
  * NOTES
  *	  This module gets handed individual pieces of transactions in the order
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 97c1ad4..1dd302b 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -99,7 +99,7 @@
  * Copyright (c) 2012-2016, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
- *	  src/backend/replication/snapbuild.c
+ *	  src/backend/replication/logical/snapbuild.c
  *
  *-
  */
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 5d7b46f..297cfdd 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -41,7 +41,7 @@
  *
  *
  * IDENTIFICATION
- *	  src/backend/storage/ipc/dsm.c
+ *	  src/backend/storage/ipc/dsm_impl.c
  *
  *-
  */
diff --git a/src/backend/utils/cache/relfilenodemap.c b/src/backend/utils/cache/relfilenodemap.c
index 894dacb..bef4d5b 100644
--- a/src/backend/utils/cache/relfilenodemap.c
+++ b/src/backend/utils/cache/relfilenodemap.c
@@ -7,7 +7,7 @@
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  * IDENTIFICATION
- *	  src/backend/utils/cache/relfilenode.c
+ *	  src/backend/utils/cache/relfilenodemap.c
  *
  *-
  */
diff --git a/src/include/utils/evtcache.h b/src/include/utils/evtcache.h
index 7c8da74..76d2519 100644
--- 

Re: [HACKERS] Stream consistent snapshot via a logical decoding plugin as a series of INSERTs

2016-01-15 Thread Shulgin, Oleksandr
On Fri, Jan 15, 2016 at 11:08 AM, Simon Riggs <si...@2ndquadrant.com> wrote:

> On 15 January 2016 at 08:30, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>
>
>> I'd like to propose generic functions (probably in an extension, or in
>> core if not possible otherwise) to facilitate streaming existing data from
>> the database *in the same format* that one would get if these would be the
>> changes decoded by a logical decoding plugin.
>>
>> The idea is to use a snapshot returned from CREATE_REPLICATION_SLOT
>> command of the replication protocol to get a consistent snapshot of the
>> database, then start listening to new changes on the slot.
>>
>
> It sounds like this is already possible.
>

Totally, that's how it was supposed to be used anyway.  What is missing IMO
is retrieving the initial snapshot in the same format in which the later
changes will arrive.

--
Alex


Re: [HACKERS] Stream consistent snapshot via a logical decoding plugin as a series of INSERTs

2016-01-15 Thread Shulgin, Oleksandr
On Fri, Jan 15, 2016 at 12:09 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> On Fri, Jan 15, 2016 at 11:08 AM, Simon Riggs <si...@2ndquadrant.com>
> wrote:
>
>> On 15 January 2016 at 08:30, Shulgin, Oleksandr <
>> oleksandr.shul...@zalando.de> wrote:
>>
>>
>>> I'd like to propose generic functions (probably in an extension, or in
>>> core if not possible otherwise) to facilitate streaming existing data from
>>> the database *in the same format* that one would get if these would be the
>>> changes decoded by a logical decoding plugin.
>>>
>>> The idea is to use a snapshot returned from CREATE_REPLICATION_SLOT
>>> command of the replication protocol to get a consistent snapshot of the
>>> database, then start listening to new changes on the slot.
>>>
>>
>> It sounds like this is already possible.
>>
>
> Totally, that's how it was supposed to be used anyway.  What is missing
> IMO is retrieving the initial snapshot in the same format in which the
> later changes will arrive.
>

POC patch attached.  Findings:

1) Needs an actual slot for all the decode machinery to work (code depends
on MyReplicationSlot being set).
2) Requires a core patch.
3) Currently only supports textual output, adding binary is trivial.


Acquiring a slot means this cannot be run in parallel from multiple
backends.  Any ideas on how to overcome this (except for opening multiple
slots with the same LSN)?
To obtain a consistent snapshot, the client still needs to take care of
preserving and setting transaction snapshot properly.

--
Alex
From f800b8c387eb17f4eb005b38b78585f1f165b0d3 Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Date: Fri, 15 Jan 2016 17:30:04 +0100
Subject: [PATCH] POC: pg_logical_slot_stream_relation

---
 src/backend/catalog/system_views.sql|   9 +
 src/backend/replication/logical/logicalfuncs.c  | 337 +---
 src/backend/replication/logical/reorderbuffer.c |   6 +-
 src/include/catalog/pg_proc.h   |   2 +
 src/include/replication/logicalfuncs.h  |   1 +
 src/include/replication/reorderbuffer.h |   3 +
 6 files changed, 315 insertions(+), 43 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 923fe58..5431b61 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -941,6 +941,15 @@ LANGUAGE INTERNAL
 VOLATILE ROWS 1000 COST 1000
 AS 'pg_logical_slot_peek_binary_changes';
 
+CREATE OR REPLACE FUNCTION pg_logical_slot_stream_relation(
+IN slot_name name, IN relnamespace name, IN relname name, IN nochildren bool DEFAULT FALSE,
+VARIADIC options text[] DEFAULT '{}',
+OUT data text)
+RETURNS SETOF TEXT
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000
+AS 'pg_logical_slot_stream_relation';
+
 CREATE OR REPLACE FUNCTION pg_create_physical_replication_slot(
 IN slot_name name, IN immediately_reserve boolean DEFAULT false,
 OUT slot_name name, OUT xlog_position pg_lsn)
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 56e47e4..bc62784 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -21,12 +21,18 @@
 #include "funcapi.h"
 #include "miscadmin.h"
 
+#include "access/htup_details.h"
 #include "access/xlog_internal.h"
 
+#include "executor/spi.h"
+
+#include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 
 #include "nodes/makefuncs.h"
 
+#include "lib/stringinfo.h"
+
 #include "mb/pg_wchar.h"
 
 #include "utils/array.h"
@@ -40,6 +46,7 @@
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/logicalfuncs.h"
+#include "replication/reorderbuffer.h"
 
 #include "storage/fd.h"
 
@@ -50,6 +57,11 @@ typedef struct DecodingOutputState
 	TupleDesc	tupdesc;
 	bool		binary_output;
 	int64		returned_rows;
+
+	/* for pg_logical_stream_relation */
+	Relation		rel;
+	Portal			cursor;
+	TupleTableSlot *tupslot;
 } DecodingOutputState;
 
 /*
@@ -270,6 +282,53 @@ logical_read_local_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr,
 	return count;
 }
 
+static List *
+deconstruct_options_array(ArrayType *arr)
+{
+	Size		ndim;
+	List	   *options = NIL;
+
+	ndim = ARR_NDIM(arr);
+	if (ndim > 1)
+	{
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("array must be one-dimensional")));
+	}
+	else if (array_contains_nulls(arr))
+	{
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("array must not contain nulls")));
+	}
+	else if (ndim == 1)
+	{
+		

[HACKERS] Stream consistent snapshot via a logical decoding plugin as a series of INSERTs

2016-01-15 Thread Shulgin, Oleksandr
Hello,

I'd like to propose generic functions (probably in an extension, or in core
if not possible otherwise) to facilitate streaming existing data from the
database *in the same format* that one would get if these would be the
changes decoded by a logical decoding plugin.

The idea is to use a snapshot returned from CREATE_REPLICATION_SLOT command
of the replication protocol to get a consistent snapshot of the database,
then start listening to new changes on the slot.

One of the implementations of this approach is "Bottled Water":
https://github.com/confluentinc/bottledwater-pg

The way this initial export phase is implemented there is by providing a
SQL-callable set returning function which is using SPI to run "SELECT *
FROM mytable" behind the scenes and runs the resulting tuples through the
INSERT callback of the logical decoding plugin, which lives in the same
loadable module as this SQL function.

Bottled Water logical decoding plugin uses binary protocol based on Avro
data serialization library.  As an experiment I was adding support for JSON
output format to it, and for that I had to re-implement the aforementioned
initial-data-export SRF so that it converts tuples to JSON instead.

Now I'd like to compare performance impact of using JSON vs. Avro vs.
binary format of pglogical_output and for that a missing part is something
that would stream the existing data in pglogical's format.  Instead of
writing one more implementation of the export function, this time for
pglogical_output, I'd rather use a generic function that accepts a relation
name, logical decoding plugin name and a set of startup options for the
plugin, then pretends that we're decoding a stream of INSERTs on a slot (no
actual slot is needed for that, but setting transaction snapshot beforehand
is something to be done by the client).

In SQL and C pseudo-code:

CREATE FUNCTION /*pg_catalog.?*/ pg_logical_stream_relation(
relnamespace text,
relname text,
plugin_name text,
nochildren boolean DEFAULT FALSE,
VARIADIC options text[] DEFAULT '{}'::text[]
) RETURNS SETOF text
AS '...', 'pg_logical_stream_relation' LANGUAGE C VOLATILE;

CREATE FUNCTION /*pg_catalog.?*/ pg_logical_stream_relation_binary(
relnamespace text,
relname text,
plugin_name text,
nochildren boolean DEFAULT FALSE,
VARIADIC options text[] DEFAULT '{}'::text[]
) RETURNS SETOF bytea
AS '...', 'pg_logical_stream_relation_binary' LANGUAGE C VOLATILE;

-- usage:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '-N';

SELECT *
  FROM pg_logical_stream_relation('myschema', 'mytable', 'test_decoding',
  nochildren := FALSE, ...)


Datum
pg_logical_stream_relation(PG_FUNCTION_ARGS)
{
if (SRF_IS_FIRSTCALL())
{
/* create decoding context */ /* starts the plugin up */
/* emit BEGIN */
}
/*
  seq scan
  => emit series of INSERTs
*/
/* emit COMMIT */
/* free decoding context */ /* shuts the plugin down */
}

What do you say?

--
Alex


Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-01-07 Thread Shulgin, Oleksandr
On Wed, Jan 6, 2016 at 3:02 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

>
> Please find attached a POC patch, using \errverbose for the command name.
> Unfortunately, I didn't see a good way to contain the change in psql only
> and had to change libpq, adding new interface PQresultBuildErrorMessage().
> Also, what I don't like very much is that we need to keep track of the last
> result from last SendQuery() in psql's pset.  So in my view this is sort of
> a dirty hack that works nonetheless.
>

Added to the Open commitfest: https://commitfest.postgresql.org/9/475/


Re: [HACKERS] Inconsistent error handling in START_REPLICATION command

2016-01-06 Thread Shulgin, Oleksandr
On Tue, Jan 5, 2016 at 11:35 AM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> On Tue, Jan 5, 2016 at 10:39 AM, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>
>>
>> I didn't look in the code yet, but if someone knows off top of the head
>> the reason to this behavior, I'd be glad to learn it.
>>
>
> Looks like the attached patch fixes it for me:
>

Should I add it to the current commitfest?  It seems to me the patch is
trivial and unambiguous, so that might be overkill...

Thanks.
--
Alex


Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-01-06 Thread Shulgin, Oleksandr
On Wed, Jan 6, 2016 at 5:06 AM, Jim Nasby  wrote:

> On 1/5/16 9:16 PM, Tom Lane wrote:
>
>> Jim Nasby  writes:
>>
>>> FWIW, I suspect very few people know about the verbosity setting (I
>>> didn't until a few months ago...) Maybe psql should hint about it the
>>> first time an error is reported in a session.
>>>
>>
>> Actually, what'd be really handy IMO is something to regurgitate the
>> most recent error in verbose mode, without making a permanent session
>> state change.  Something like
>>
>> regression=# insert into bar values(1);
>> ERROR:  insert or update on table "bar" violates foreign key constraint
>> "bar_f1_fkey"
>> DETAIL:  Key (f1)=(1) is not present in table "foo".
>> regression=# \saywhat
>> ERROR:  23503: insert or update on table "bar" violates foreign key
>> constraint "bar_f1_fkey"
>> DETAIL:  Key (f1)=(1) is not present in table "foo".
>> SCHEMA NAME:  public
>> TABLE NAME:  bar
>> CONSTRAINT NAME:  bar_f1_fkey
>> LOCATION:  ri_ReportViolation, ri_triggers.c:3326
>> regression=#
>>
>> Not sure how hard that would be to do within psql's current structure.
>>
>
> At first glance, it looks like it just means changing AcceptResult() to
> use PQresultErrorField in addition to PQresultErrorMessage, and stuffing
> the results somewhere. And of course adding \saywhat to the psql parser,
> but maybe someone more versed in psql code can verify that.
>
> If it is that simple, looks like another good beginner hacker task. :)


Sorry, I couldn't resist it: I was too excited to learn such option
existed. :-)

Please find attached a POC patch, using \errverbose for the command name.
Unfortunately, I didn't see a good way to contain the change in psql only
and had to change libpq, adding new interface PQresultBuildErrorMessage().
Also, what I don't like very much is that we need to keep track of the last
result from last SendQuery() in psql's pset.  So in my view this is sort of
a dirty hack that works nonetheless.

Cheers!
--
Alex
From 402866a6899791e7f410ed747e3b0018eed717e0 Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin 
Date: Wed, 6 Jan 2016 14:58:17 +0100
Subject: [PATCH] POC: \errverbose in psql

---
 src/bin/psql/command.c  |  20 ++
 src/bin/psql/common.c   |   6 +-
 src/bin/psql/settings.h |   1 +
 src/bin/psql/startup.c  |   1 +
 src/interfaces/libpq/exports.txt|   1 +
 src/interfaces/libpq/fe-exec.c  |  21 ++
 src/interfaces/libpq/fe-protocol3.c | 131 +++-
 src/interfaces/libpq/libpq-fe.h |   1 +
 src/interfaces/libpq/libpq-int.h|   2 +
 9 files changed, 122 insertions(+), 62 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 9750a5b..9ae5ae5 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -822,6 +822,24 @@ exec_command(const char *cmd,
 		}
 	}
 
+	/* \errverbose -- display verbose error message from last result */
+	else if (strcmp(cmd, "errverbose") == 0)
+	{
+		char	   *errmsg;
+
+		if (pset.last_result && PQresultStatus(pset.last_result) == PGRES_FATAL_ERROR)
+		{
+			/* increase error verbosity for a moment */
+			PQsetErrorVerbosity(pset.db, PQERRORS_VERBOSE);
+			errmsg = PQresultBuildErrorMessage(pset.db, pset.last_result);
+			PQsetErrorVerbosity(pset.db, pset.verbosity);
+
+			psql_error("%s", errmsg);
+
+			PQfreemem(errmsg);
+		}
+	}
+
 	/* \f -- change field separator */
 	else if (strcmp(cmd, "f") == 0)
 	{
@@ -1865,6 +1883,8 @@ do_connect(char *dbname, char *user, char *host, char *port)
 			{
 PQfinish(o_conn);
 pset.db = NULL;
+PQclear(pset.last_result);
+pset.last_result = NULL;
 			}
 		}
 
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 2cb2e9b..85658e6 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -315,6 +315,8 @@ CheckConnection(void)
 			psql_error("Failed.\n");
 			PQfinish(pset.db);
 			pset.db = NULL;
+			PQclear(pset.last_result);
+			pset.last_result = NULL;
 			ResetCancelConn();
 			UnsyncVariables();
 		}
@@ -1154,7 +1156,9 @@ SendQuery(const char *query)
 		}
 	}
 
-	PQclear(results);
+	/* XXX: can we have PQclear that would free everything except errfields? */
+	PQclear(pset.last_result);
+	pset.last_result = results;
 
 	/* Possible microtiming output */
 	if (pset.timing)
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 20a6470..8b88fbb 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -80,6 +80,7 @@ enum trivalue
 typedef struct _psqlSettings
 {
 	PGconn	   *db;/* connection to backend */
+	PGresult   *last_result;	/* last result from running a query, if any */
 	int			encoding;		/* client_encoding */
 	FILE	   *queryFout;		/* where to send the query results */
 	bool		queryFoutPipe;	/* queryFout is from a popen() */
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index 6916f6f..7586ee81 100644
--- 

[HACKERS] Inconsistent error handling in START_REPLICATION command

2016-01-05 Thread Shulgin, Oleksandr
Hackers,

It looks like there's an inconsistency in error handling during
START_REPLICATION command of replication protocol:

$ psql postgres://localhost/psycopg2test?replication=database
psql (9.6devel)
Type "help" for help.

psycopg2test=# IDENTIFY_SYSTEM;
  systemid   | timeline |  xlogpos  |dbname
-+--+---+--
 6235978519197579707 |1 | 0/2CE0F78 | psycopg2test
(1 row)

psycopg2test=# START_REPLICATION SLOT "TEST1" LOGICAL 0/0 ("invalid"
"value");
ERROR:  syntax error

1) syntax errors are reported and client can retry with corrected command:

psycopg2test=# START_REPLICATION SLOT "TEST1" LOGICAL 0/0 ("invalid"
'value');
ERROR:  replication slot name "TEST1" contains invalid character
HINT:  Replication slot names may only contain lower case letters, numbers,
and the underscore character.

2) further errors are reported and we can retry:

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid"
'value');
ERROR:  replication slot "test1" does not exist

psycopg2test=# CREATE_REPLICATION_SLOT "test1" LOGICAL "test_decoding";
 slot_name | consistent_point | snapshot_name | output_plugin
---+--+---+---
 test1 | 0/2CF5340| 088C-1| test_decoding
(1 row)

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid"
'value');
unexpected PQresultStatus: 8

The last command results in the following output sent to the server log:

ERROR:  option "invalid" = "value" is unknown
CONTEXT:  slot "test1", output plugin "test_decoding", in the startup
callback

But the client has no way to learn about the error, nor can it restart with
correct one (the server has entered COPY protocol mode):

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0;
PQexec not allowed during COPY BOTH

I recall Craig Ringer mentioning this issue in the context of the
pglogical_output plugin, but I thought that had something to do with the
startup packet.  The behavior above doesn't strike me as very consistent:
we should be able to detect and report errors during output plugin startup
and let the client retry with the corrected command, as we do for syntax or
other errors.

I didn't look in the code yet, but if someone knows the reason for this
behavior off the top of their head, I'd be glad to learn it.

Cheers.
-- 
*Oleksandr "Alex" Shulgin*
*Database Engineer*

Mobile: +49 160 84-90-639
Email: oleksandr.shul...@zalando.de


Re: [HACKERS] Inconsistent error handling in START_REPLICATION command

2016-01-05 Thread Shulgin, Oleksandr
On Tue, Jan 5, 2016 at 10:39 AM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> Hackers,
>
> It looks like there's an inconsistency in error handling during
> START_REPLICATION command of replication protocol:
>
> $ psql postgres://localhost/psycopg2test?replication=database
> psql (9.6devel)
> Type "help" for help.
>
> psycopg2test=# IDENTIFY_SYSTEM;
>   systemid   | timeline |  xlogpos  |dbname
> -+--+---+--
>  6235978519197579707 |1 | 0/2CE0F78 | psycopg2test
> (1 row)
>
> psycopg2test=# START_REPLICATION SLOT "TEST1" LOGICAL 0/0 ("invalid"
> "value");
> ERROR:  syntax error
>
> 1) syntax errors are reported and client can retry with corrected command:
>
> psycopg2test=# START_REPLICATION SLOT "TEST1" LOGICAL 0/0 ("invalid"
> 'value');
> ERROR:  replication slot name "TEST1" contains invalid character
> HINT:  Replication slot names may only contain lower case letters,
> numbers, and the underscore character.
>
> 2) further errors are reported and we can retry:
>
> psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid"
> 'value');
> ERROR:  replication slot "test1" does not exist
>
> psycopg2test=# CREATE_REPLICATION_SLOT "test1" LOGICAL "test_decoding";
>  slot_name | consistent_point | snapshot_name | output_plugin
> ---+--+---+---
>  test1 | 0/2CF5340| 088C-1| test_decoding
> (1 row)
>
> psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid"
> 'value');
> unexpected PQresultStatus: 8
>
> The last command results in the following output sent to the server log:
>
> ERROR:  option "invalid" = "value" is unknown
> CONTEXT:  slot "test1", output plugin "test_decoding", in the startup
> callback
>
> But the client has no way to learn about the error, nor can it restart
> with correct one (the server has entered COPY protocol mode):
>
> psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0;
> PQexec not allowed during COPY BOTH
>
> I recall Craig Ringer mentioning this issue in context of the
> pglogical_output plugin, but I thought that was something to do with the
> startup packet.  The behavior above doesn't strike me as very consistent,
> we should be able to detect and report errors during output plugin startup
> and let the client retry with the corrected command as we do for syntax or
> other errors.
>
> I didn't look in the code yet, but if someone knows off top of the head
> the reason to this behavior, I'd be glad to learn it.
>

Looks like the attached patch fixes it for me:

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid"
'value');
ERROR:  option "invalid" = "value" is unknown
CONTEXT:  slot "test1", output plugin "test_decoding", in the startup
callback

psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0;
unexpected PQresultStatus: 8

There was a similar problem with PHYSICAL replication--if
the requested start LSN is in the future, the error is not reported to the
client and the connection is unusable:

psycopg2test=# START_REPLICATION SLOT "test2" PHYSICAL 1FF0/0;
unexpected PQresultStatus: 8

psycopg2test=# START_REPLICATION SLOT "test2" PHYSICAL 0/0;
PQexec not allowed during COPY BOTH

After the patch:

psycopg2test=# START_REPLICATION SLOT "test2" PHYSICAL 1FF0/0;
ERROR:  requested starting point 1FF0/0 is ahead of the WAL flush position
of this server 0/2DE6C28

psycopg2test=# START_REPLICATION SLOT "test2" PHYSICAL 0/0;
unexpected PQresultStatus: 8


Actually, would it make sense to move the block that sends CopyBothResponse
message to the WalSndLoop() instead?  I think you can't get any closer than
that...

--
Alex
From 4ebfa71a1d4e4d43415ad9c2a8142f45444313ec Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Date: Tue, 5 Jan 2016 11:34:37 +0100
Subject: [PATCH] Move CopyBothResponse closer to WalSndLoop()

This is required in order for any initialization errors to be properly
reported to the client and to avoid entering COPY BOTH mode
prematurely (the client will not be able to correct the error after
that even if it could learn about it somehow).
---
 src/backend/replication/walsender.c | 39 -
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index c03e045..3b83db7 100644
--- a/src/backend/replication/walsender.c
+++ b/src/

Re: [HACKERS] \x auto and EXPLAIN

2016-01-04 Thread Shulgin, Oleksandr
On Sun, Jan 3, 2016 at 6:43 PM, Tom Lane  wrote:

> Andreas Karlsson  writes:
> > psql's "\x auto" is a nice feature, but it is made much less useful in
> > my opinion due to the expanded output format making query plans
> > unreadable (and query plans often end up using expanded display due to
> > their width). I think we should never use the expanded format for
> > EXPLAIN output in the "\x auto" mode, since even when the wrapped format
> > is used the query plans are very hard to read.
>
> > I see two ways to fix this.
>
> > 1) Never use expanded display for the case where there is only one
> > column. There seems to me like there is little value in using expanded
> > display for when you only have one column, but I may be missing some use
> > case here.
>
> > 2) Explicitly detect (for example based on the headers) that the result
> > is a query plan and if so disable expanded display.
>
> The second of these seems pretty bletcherous --- for one thing, it might
> fall foul of localization attempts.  However, I could see the argument
> for not using expanded mode for any single-column output.
>

+1 to option #1; I sympathize with this as an annoyance that can be easily
fixed.

--
Alex


Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-01-04 Thread Shulgin, Oleksandr
On Wed, Dec 30, 2015 at 8:28 PM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

>
>
> 2015-12-30 17:33 GMT+01:00 Robert Haas <robertmh...@gmail.com>:
>
>> On Mon, Dec 28, 2015 at 8:45 AM, Shulgin, Oleksandr
>> <oleksandr.shul...@zalando.de> wrote:
>> > I didn't check out earlier versions of this patch, but the latest one
>> still
>> > changes pg_size_pretty() to emit PB suffix.
>> >
>> > I don't think it is worth it to throw a number of changes together like
>> > that.  We should focus on adding pg_size_bytes() first and make it
>> > compatible with both pg_size_pretty() and existing GUC units: that is
>> > support suffixes up to TB and make sure they have the meaning of powers
>> of
>> > 2^10, not 10^3.  Re-using the table present in guc.c would be a plus.
>> >
>> > Next, we could think about adding handling of PB suffix on input and
>> output,
>> > but I don't see a big problem if that is emitted as 1024TB or the user
>> has
>> > to specify it as 1024TB in a GUC or argument to pg_size_bytes(): an
>> minor
>> > inconvenience only.
>>
>> +1 to everything in this email.
>>
>
> so I removed support for PB and SI units. Now the
> memory_unit_conversion_table is shared.
>

Looks better, thanks.

I'm not sure why the need to touch the regression test for pg_size_pretty():

!               10.5 | 10.5 bytes   | -10.5 bytes
!             1000.5 | 1000.5 bytes | -1000.5 bytes
!          1000000.5 | 977 kB       | -977 kB
!       1000000000.5 | 954 MB       | -954 MB
!    1000000000000.5 | 931 GB       | -931 GB
! 1000000000000000.5 | 909 TB       | -909 TB

A nitpick, this loop:

+ while (*cp)
+ {
+ if ((isdigit(*cp) || *cp == '.') && ndigits < MAX_DIGITS)
+ digits[ndigits++] = *cp++;
+ else
+ break;
+ }

would be a bit easier to parse if spelled as:

+ while (*cp && (isdigit(*cp) || *cp == '.') && ndigits < MAX_DIGITS)
+ digits[ndigits++] = *cp++;

On the other hand, this seems to truncate the digits silently:

+ digits[ndigits] = '\0';

I don't think we want that, e.g:

postgres=# select pg_size_bytes('9223372036854775807.9');
ERROR:  invalid unit "9"
HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".

I think making a mutable copy of the input string and truncating it before
passing to numeric_in() would make more sense--no need to hard-code
MAX_DIGITS.  The same goes for hard-coding MAX_UNIT_LEN, e.g. compare the
following two outputs:

postgres=# select pg_size_bytes('1 KiB');
ERROR:  invalid unit "KiB"
HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".

postgres=# select pg_size_bytes('1024 bytes');
ERROR:  invalid format

I believe we should see a similar error message and a hint in the latter
case.  (No, I don't think we should add support for 'bytes' as a unit, not
even for "compatibility" with pg_size_pretty()--for one, I don't think it
would be wise to expect pg_size_bytes() to be able to deparse *every*
possible output produced by pg_size_pretty() as it's purpose is
human-readable display; also, pg_size_pretty() can easily produce output
that doesn't fit into bigint type, or is just negative)

Code comments and doc change need proof-reading by a native English
speaker, which I am not.

--
Alex


Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-01-04 Thread Shulgin, Oleksandr
On Mon, Jan 4, 2016 at 6:03 PM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:
>
> 2016-01-04 17:48 GMT+01:00 Shulgin, Oleksandr <
oleksandr.shul...@zalando.de>:
>>
>> On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas <robertmh...@gmail.com>
wrote:
>>>
>>> On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:
>>> > [ new patch ]
>>>
>>> + case '-':
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>> +  errmsg("size cannot be negative")));
>>>
>>> Why not?  I bet if you copy any - sign to the buffer, this will Just
Work.
>>
>>
>> I'm also inclined on dropping that explicit check for empty string below
and let numeric_in() error out on that.  Does this look OK, or can it
confuse someone:
>>
>> postgres=# select pg_size_bytes('');
>> ERROR:  invalid input syntax for type numeric: ""
>>
>> ?
>>
>>> + if ( conv->base_unit == GUC_UNIT_KB &&
>>
>>
>> Between "(" and "conv->..." I believe.
>
>
> both fixed

Hm...

> + switch (*strptr)
> + {
> + /* ignore plus symbol */
> + case '+':
> + case '-':
> + *bufptr++ = *strptr++;
> + break;
> + }

Well, to that extent you don't need any special checks; I'm just not sure
the reported error message isn't misleading if we let numeric_in() handle
all the errors.  At least it can cope with leading spaces, +/- and
empty input quite well.

--
Alex


Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-01-04 Thread Shulgin, Oleksandr
On Mon, Jan 4, 2016 at 6:14 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Mon, Jan 4, 2016 at 11:48 AM, Shulgin, Oleksandr
> >
> > postgres=# select pg_size_bytes('');
> > ERROR:  invalid input syntax for type numeric: ""
>
> I think that's a pretty bad error message.  I mean, the user is
> calling a function that takes text as an input data type.  So, where's
> numeric involved?
>

Is there a way to force CONTEXT output in the reported error?  I guess that
could help.

I'm also kind of wondering what the intended use case for this
> function is.  Why do we want it?  Do we want it?
>

As suggested above, a use case could be the following:

SELECT relname FROM pg_class WHERE pg_relation_size(oid) >
pg_size_bytes('100 GB');

I think it's neat and useful.

--
Alex


Re: [HACKERS] Re: pglogical_output - a general purpose logical decoding output plugin

2016-01-04 Thread Shulgin, Oleksandr
On Sun, Jan 3, 2016 at 7:21 PM, Tomasz Rybak  wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:tested, failed
>
> Applies cleanly on current master
> (b416c0bb622ce5d33fdbec3bbce00451132f10ec).
>
> Builds without any problems on current Debian unstable (am64 arch, GCC
> 5.3.1-4, glibc 2.21-6)
>

A make from an external build dir fails on install, suggested fix:

install: all
	$(MKDIR_P) '$(DESTDIR)$(includedir)'/pglogical_output
-	$(INSTALL_DATA) pglogical_output/hooks.h '$(DESTDIR)$(includedir)'/pglogical_output
+	$(INSTALL_DATA) $(top_srcdir)/contrib/pglogical_output/pglogical_output/hooks.h '$(DESTDIR)$(includedir)'/pglogical_output

--
Alex


Re: [HACKERS] pglogical - logical replication contrib module

2016-01-04 Thread Shulgin, Oleksandr
On Fri, Jan 1, 2016 at 12:34 AM, Petr Jelinek  wrote:

> Hi,
>
> I'd like to submit the replication solution which is based on the
> pglogical_output [1] module (which is obviously needed for this to compile).
>

Hi,

Impressive stuff!

Apparently this depends on a newer, yet-to-be-published version of the
pglogical_output patch:

.../contrib/pglogical/pglogical_hooks.c: In function
‘pglogical_row_filter_hook’:
.../contrib/pglogical/pglogical_hooks.c:173:35: error: ‘struct
PGLogicalRowFilterArgs’ has no member named ‘change’
HeapTuple  tup = _args->change->data.tp.newtuple->tuple;
   ^

It currently doesn't do multi-master or automatic DDL. I think DDL should
> be relatively easy if somebody finishes the deparse extension as the
> infrastructure for replicating arbitrary commands is present in this patch.
>

I wish could find the time to get back to this patch.  I didn't check it in
quite a while...

+PGconn *
+pglogical_connect(const char *connstring, const char *connname)
+{
+ PGconn   *conn;
+ StringInfoData dsn;
+
+ initStringInfo(&dsn);
+ appendStringInfo(&dsn,
+ "%s fallback_application_name='%s'",
+ connstring, connname);
+
+ conn = PQconnectdb(dsn.data);

This is prone to errors when connstring is specified in URI format.  A
workaround is provided in this commit for
walreceiver: b3fc6727ce54a16ae9227bcccfebfa028ac5b16f

--
Alex


Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-01-04 Thread Shulgin, Oleksandr
On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas  wrote:

> On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule 
> wrote:
> > [ new patch ]
>
> + case '-':
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +  errmsg("size cannot be negative")));
>
> Why not?  I bet if you copy any - sign to the buffer, this will Just Work.
>

I'm also inclined to drop that explicit check for the empty string below
and let numeric_in() error out on it.  Does this look OK, or can it
confuse someone:

postgres=# select pg_size_bytes('');
ERROR:  invalid input syntax for type numeric: ""

?

+ if ( conv->base_unit == GUC_UNIT_KB &&
>

There is an extra space between "(" and "conv->...", I believe.

--
Alex


Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-30 Thread Shulgin, Oleksandr
On Wed, Dec 30, 2015 at 4:31 AM, Haribabu Kommi 
wrote:

>
> Adding quotes to pg_hba_lookup function makes it different from others.
> The issues regarding the same is already discussed in [1].
>
> select a.database[1], b.datname from
> pg_hba_lookup('postgres','kommih','::1')
> as a, pg_database as b where a.database[1]
> = b.datname;
>
> The queries like above are not possible with quoted output. It is very
> rare that the
> pg_hba_lookup function used in join operations, but still it is better
> to provide
> data without quotes. so I reverted these changes in the attached latest
> patch.
>

That's a good point.  I wonder if, instead of re-introducing quotes, we
could somehow make the unquoted keywords that have special meaning stand
out, e.g.:

database  | {$sameuser}
user_name | {$all}

That should make it obvious which of the values are placeholders and
doesn't interfere with joining the database or user catalogs (while I would
call "sameuser" a very unlikely name for a database, "all" might not be
that unlikely a name for a user; e.g. someone named "Albert L. Lucky"
could use it as a login name).

--
Alex


Re: [HACKERS] custom function for converting human readable sizes to bytes

2015-12-30 Thread Shulgin, Oleksandr
On Tue, Dec 29, 2015 at 7:15 PM, Pavel Stehule 
wrote:

>
>> I didn't check out earlier versions of this patch, but the latest one
>> still changes pg_size_pretty() to emit PB suffix.
>>
>> I don't think it is worth it to throw a number of changes together like
>> that.  We should focus on adding pg_size_bytes() first and make it
>> compatible with both pg_size_pretty() and existing GUC units: that is
>> support suffixes up to TB and make sure they have the meaning of powers of
>> 2^10, not 10^3.  Re-using the table present in guc.c would be a plus.
>>
>> Next, we could think about adding handling of PB suffix on input and
>> output, but I don't see a big problem if that is emitted as 1024TB or the
>> user has to specify it as 1024TB in a GUC or argument to pg_size_bytes():
>> a minor inconvenience only.
>>
>
> Last version still support BP in pg_size_pretty. It isn't big change. PB
> isn't issue.
>
> We have to do significant decision - should to support SI units in
> pg_size_bytes? We cannot to change it later. There is disagreement for SI
> units in pg_size_pretty, so SI units in pg_size_bytes can be inconsistent.
>

There is no way at this point to add support for SI units in a consistent
and backwards-compatible manner: both GUC and pg_size_pretty() use SI
suffixes (kB, MB, GB, TB) with the meaning of 2^(10*n) (KiB, MiB, GiB,
TiB).  But given that the size relates to memory or disk space, it is quite
customary *not* to use SI units, so I don't see a point in adding those.

--
Alex


Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-29 Thread Shulgin, Oleksandr
On Tue, Dec 29, 2015 at 4:15 AM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:

> On Mon, Dec 28, 2015 at 9:09 PM, Shulgin, Oleksandr
> <oleksandr.shul...@zalando.de> wrote:
> >
> > Still this requires a revert of the memory context handling commit for
> > load_hba() and load_ident().  I think you can get around the problem by
> > changing these functions to work with CurrentMemoryContext and set it
> > explicitly to the newly allocated PostmasterContext in
> > PerformAuthentication().  In your function you could then create a
> temporary
> > context to be discarded before leaving the function.
>
> Thanks for the review. I didn't understand your point clearly.
>
> In the attached patch, load_hba uses PostmasterContext if it is present,
> otherwise CurretMemoryContext. PostmasterContext is present only
> in the backend start phase.
>
> > I still think you should not try to re-implement check_hba(), but extend
> > this function with means to report line skip reasons as per your
> > requirements.  Having an optional callback function might be a good fit
> (a
> > possible use case is logging the reasons line by line).
>
> check_hba function is enhanced to fill the hba line details with
> reason for mismatch.
> In check_hba function whenever a mismatch is found, the fill_hbaline
> function is
> called to frame the tuple and inserted into tuple store.
>

This is close enough, but what I actually meant by "a callback" is more or
less like the attached version.

While at it, I've also added some trivial code to preserve keyword quoting
in database and user fields, as well as added netmask output parameter;
also documentation is extended a little.

The biggest question for me is the proper handling of memory contexts for
HBA and ident files data.  I think it makes sense to release them
explicitly because with the current state of affairs, we have dangling
pointers in parsed_{hba,ident}_{context,lines} after release of
PostmasterContext.  The detailed comment in postgres.c
around MemoryContextDelete(PostmasterContext); suggests that it's not easy
already to keep track of the all things that might be affected by this
cleanup step:

/*
* If the PostmasterContext is still around, recycle the space; we don't
* need it anymore after InitPostgres completes.  Note this does not trash
* *MyProcPort, because ConnCreate() allocated that space with malloc()
* ... else we'd need to copy the Port data first.  Also, subsidiary data
* such as the username isn't lost either; see ProcessStartupPacket().
*/

Not sure if we need any advanced machinery here like some sort of cleanup
hooks list?  For now I've added discard_{hba,ident}() functions and call
them explicitly where appropriate.

--
Alex
From 56b584c25b952d6855d2a3d77eb40755d982efb9 Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Date: Tue, 29 Dec 2015 14:51:27 +0100
Subject: [PATCH] WIP: pg_hba_lookup()

---
 doc/src/sgml/func.sgml   |  39 +++
 src/backend/catalog/system_views.sql |   9 +
 src/backend/libpq/hba.c  | 663 ++-
 src/backend/utils/init/postinit.c|  26 +-
 src/include/catalog/pg_proc.h|   2 +
 src/include/libpq/hba.h  |   4 +
 src/include/utils/builtins.h |   3 +
 7 files changed, 720 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index e08bf60..2594dd5
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT collation for ('foo' COLLATE "de_
*** 16576,16581 
--- 16576,16594 
 text
 set parameter and return new value

+   
+
+ 
+  pg_hba_lookup
+ 
+ pg_hba_lookup(database text,
+ user_name text
+ [, address text]
+ [, ssl_inuse text)
+
+record
+scan pg_hba.conf and return examined entries up to a matching one
+   
   
  
 
*** SELECT set_config('log_statement_stats',
*** 16633,16638 
--- 16646,16677 
  
 
  
+
+ pg_hba_lookup returns a set of records containing the
+ line number, mode, type, database, user_name, address, netmask, hostname,
+ method, options and skip reason. For example, to debug problems with user
+ kommih trying to connect to a database postgres
+ from IPv6-address ::1, one can issue a following query:
+ 
+ postgres=# SELECT * FROM pg_hba_lookup('postgres', 'kommih', '::1');
+  line_number |  mode   | type  | database | user_name |  address  | netmask | hostname | method | options |  reason  
+ -+-+---+--+---+

Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-28 Thread Shulgin, Oleksandr
On Thu, Dec 24, 2015 at 5:16 AM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:

> On Thu, Dec 24, 2015 at 2:37 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
> >> 1. Have you considered re-loading the HBA file upon call to this
> function
> >> in a local context instead of keeping it in the backends memory?
> >
> > Aside from the security questions, please consider that this feature
> should
> > work similarly to the current implementation of the pg_file_settings
> view,
> > namely it tells you about what is *currently* in the on-disk files, not
> > necessarily what is the active setting in the postmaster's memory.
> > A backend could not be entirely sure about the postmaster's state anyway;
> > and even if it could be, one of the major applications for features like
> > this is testing manual changes to the files before you SIGHUP the
> > postmaster.  So re-reading the files on each usage is a Good Thing, IMO,
> > even if it sounds inefficient.
> >
> >> 2. I also wonder why JSONB arrays for database/user instead of TEXT[]?
> >
> > Yes, that seems rather random to me too.
>
> Here I attached updated patch with the following changes,
> - Local loading of HBA file to show the authentication data
> - Changed database and user types are text[]
>

Still this requires a revert of the memory context handling commit for
load_hba() and load_ident().  I think you can get around the problem by
changing these functions to work with CurrentMemoryContext and set it
explicitly to the newly allocated PostmasterContext in
PerformAuthentication().  In your function you could then create a
temporary context to be discarded before leaving the function.

I still think you should not try to re-implement check_hba(), but extend
this function with means to report line skip reasons as per your
requirements.  Having an optional callback function might be a good fit (a
possible use case is logging the reasons line by line).

Thank you.
--
Alex


Re: [HACKERS] custom function for converting human readable sizes to bytes

2015-12-28 Thread Shulgin, Oleksandr
On Tue, Dec 22, 2015 at 10:45 AM, Pavel Stehule 
wrote:

> Hi
>
> 2015-12-21 16:11 GMT+01:00 Robert Haas :
>
>> On Sun, Dec 20, 2015 at 4:54 AM, Pavel Stehule 
>> wrote:
>> > new update:
>> >
>> > 1. unit searching is case insensitive
>> >
>> > 2. initial support for binary byte prefixes - KiB, MiB, ..  (IEC
>> standard),
>> > change behave for SI units
>> >
>> > Second point is much more complex then it is looking - if pg_size_bytes
>> > should be consistent with pg_size_pretty.
>> >
>> > The current pg_size_pretty and transformations in guc.c are based on
>> JEDEC
>> > standard. Using this standard for GUC has sense - using it for object
>> sizes
>> > is probably unhappy.
>> >
>> > I tried to fix (and enhance) pg_size_pretty - now reports correct
>> units, and
>> > via second parameter it allows to specify base: 2 (binary, IEC  -
>> default)
>> > or 10 (SI).
>>
>> -1 from me.  I don't think we should muck with the way pg_size_pretty
>> works.
>>
>
> new update - I reverted changes in pg_size_pretty
>

Hi,

I didn't check out earlier versions of this patch, but the latest one still
changes pg_size_pretty() to emit PB suffix.

I don't think it is worth it to throw a number of changes together like
that.  We should focus on adding pg_size_bytes() first and make it
compatible with both pg_size_pretty() and existing GUC units: that is
support suffixes up to TB and make sure they have the meaning of powers of
2^10, not 10^3.  Re-using the table present in guc.c would be a plus.

Next, we could think about adding handling of PB suffix on input and
output, but I don't see a big problem if that is emitted as 1024TB or the
user has to specify it as 1024TB in a GUC or argument to pg_size_bytes():
a minor inconvenience only.
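To make the intended unit semantics concrete, here is a standalone C sketch
of such a parser (names and structure are mine for illustration, not the
actual patch): kB/MB/GB/TB are interpreted as powers of 1024, matching
pg_size_pretty() and the GUC memory units, and matching is case-insensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>            /* strcasecmp */

/* Illustrative stand-in for the proposed pg_size_bytes() parsing rules. */
static int64_t
size_bytes(const char *input)
{
    static const struct { const char *suffix; int64_t mult; } units[] = {
        {"bytes", INT64_C(1)},
        {"kB", INT64_C(1) << 10},
        {"MB", INT64_C(1) << 20},
        {"GB", INT64_C(1) << 30},
        {"TB", INT64_C(1) << 40},
    };
    char       *end;
    double      num = strtod(input, &end);

    while (isspace((unsigned char) *end))
        end++;
    if (*end == '\0')
        return (int64_t) num;   /* a bare number means bytes */
    for (size_t i = 0; i < sizeof(units) / sizeof(units[0]); i++)
        if (strcasecmp(end, units[i].suffix) == 0)
            return (int64_t) (num * (double) units[i].mult);
    abort();                    /* unknown suffix; real code would ereport() */
}
```

So 'SELECT pg_size_bytes(''100 GB'')' would yield 100 * 2^30 bytes,
consistent with what pg_relation_size() reports.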

--
Alex


Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-23 Thread Shulgin, Oleksandr
On Wed, Dec 16, 2015 at 9:33 AM, Haribabu Kommi 
wrote:

>
> Function is changed to accept default values.
>
> Apart from the above, added a local memory context to allocate the memory
> required for forming tuple for each line. This context resets for every
> hba line
> to avoid consuming unnecessary memory for scenarios of huge pg_hba.conf
> files.
>
> In the revert_hba_context_release_in_backend patch, apart from reverting
> the commit - 1e24cf64. In tokenize_file function, changed the new context
> allocation from CurrentMemoryContext instead of TopMemoryContext.
>
> Patch apply process:
> 1. revert_hba_context_release_in_backend_2.patch
> 2. pg_hba_lookup_poc_v7.patch
>

Hello,

1. Have you considered re-loading the HBA file upon call to this function
in a local context instead of keeping it in the backends memory?  I do not
expect that the revert of 1e24cf645d24aab3ea39a9d259897fd0cae4e4b6 would be
accepted, as the commit message refers to potential security problems with
keeping this data in backend memory:

... This saves a
probably-usually-negligible amount of space per running backend.  It
also
avoids leaving potentially-security-sensitive data lying around in
memory
in processes that don't need it.  You'd have to be unusually paranoid to
think that that amounts to a live security bug, so I've not gone so far
as
to forcibly zero the memory; but there surely isn't a good reason to
keep
this data around.

2. I also wonder why JSONB arrays for database/user instead of TEXT[]?

3. What happens with special keywords for database column like
sameuser/samerole/samegroup and for special values in the user column?

4. Would it be possible to also include the raw unparsed line from the HBA
file?  Just the line number is probably enough when you have access to the
host, but to show the results to someone else you might need to copy the
raw line manually.  Not a big deal anyway.

5. Some tests demonstrating possible output would be really nice to have.

Cheers!
--
Alex


Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-23 Thread Shulgin, Oleksandr
On Wed, Dec 23, 2015 at 12:56 PM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:

> On Wed, Dec 23, 2015 at 8:54 PM, Shulgin, Oleksandr
> <oleksandr.shul...@zalando.de> wrote:
> >
> > 1. Have you considered re-loading the HBA file upon call to this
> function in
> > a local context instead of keeping it in the backends memory?  I do not
> > expect that the revert of 1e24cf645d24aab3ea39a9d259897fd0cae4e4b6 would
> be
> > accepted, as the commit message refers to potential security problems
> with
> > keeping this data in backend memory:
> >
> > ... This saves a
> > probably-usually-negligible amount of space per running backend.  It
> > also
> > avoids leaving potentially-security-sensitive data lying around in
> > memory
> > in processes that don't need it.  You'd have to be unusually
> paranoid to
> > think that that amounts to a live security bug, so I've not gone so
> far
> > as
> > to forcibly zero the memory; but there surely isn't a good reason to
> > keep
> > this data around.
>
> Yes, it is possible to load the file locally whenever the lookup
> function is called.
> Only thing i am considering is performance impact because of huge file load
> whenever the function is called.
>

Not sure why you call it huge.  Anyway, since this is going to be used to
debug connection problems, I would expect the performance impact to be
insignificant.

On the other hand, re-loading the HBA file in the function will enable the
DBA to test whether a proposed change to the HBA file fixes the client's
problem without making the server reload the config.

> 3. What happens with special keywords for database column like
> > sameuser/samerole/samegroup and for special values in the user column?
>
> There is no special handling for the keywords in this approach. Based on
> the
> inputs to the function, it checks for the matched line in all hba lines.
>
> For example, if a configuration line contains 'all' for database and user
> names,
> then if user gives any database name and user name this line will be
> matched
> and returned.
>

Now I wonder why you are trying to re-implement the checks found in hba.c's
check_hba() instead of extending that function to provide textual reason(s)
for skipping/rejection?

--
Alex


Re: [HACKERS] psql - -dry-run option

2015-12-18 Thread Shulgin, Oleksandr
On Thu, Dec 17, 2015 at 9:13 PM, Tom Lane  wrote:

>
> Whether we really need a feature like that isn't clear though; it's not
> like it's hard to test things that way now.  Stick in a BEGIN with no
> COMMIT, you're there.  The problem only comes in if you start expecting
> the behavior to be bulletproof.  Maybe I'm being too pessimistic about
> what people would believe a --dry-run switch to be good for ... but
> I doubt it.
>

I'm on the same line: BEGIN/ROLLBACK requires trivial effort and a
--dry-run option might give a false sense of security, but it cannot
possibly rollback side-effects of user functions which modify filesystem or
interact with the outside world in some other way.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-12-17 Thread Shulgin, Oleksandr
On Wed, Dec 16, 2015 at 8:39 PM, Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote:

> Hi,
>
> On 12/01/2015 10:34 AM, Shulgin, Oleksandr wrote:
>
>>
>> I have the plans to make something from this on top of
>> pg_stat_statements and auto_explain, as I've mentioned last time.  The
>> next iteration will be based on the two latest patches above, so it
>> still makes sense to review them.
>>
>> As for EXPLAIN ANALYZE support, it will require changes to core, but
>> this can be done separately.
>>
>
> I'm re-reading the thread, and I have to admit I'm utterly confused what
> is the current plan, what features it's supposed to provide and whether it
> will solve the use case I'm most interested in. Oleksandr, could you please
> post a summary explaining that?
>
> My use case for this functionality is debugging of long-running queries,
> particularly getting EXPLAIN ANALYZE for them. In such cases I either can't
> run the EXPLAIN ANALYZE manually because it will either run forever (just
> like the query) and may not be the same (e.g. due to recent ANALYZE). So we
> need to extract the data from the process executing the query.
>
> I'm not essentially opposed to doing this in an extension (better an
> extension than nothing), but I don't quite see how you could to do that
> from pg_stat_statements or auto_explain. AFAIK both extensions merely use
> hooks before/after the executor, and therefore can't do anything in between
> (while the query is actually running).
>
> Perhaps you don't intend to solve this particular use case? Which use
> cases are you aiming to solve, then? Could you explain?
>

Hi Tomas.

Thanks for your interest in this patch!

My motivation is the same as your use case: given a long-running query,
being able to look inside that exact query run by that exact backend.

I admit the evolution of ideas in this patch can be very confusing: we were
trying a number of different approaches, w/o thinking deeply on the
implications, just to have a proof of concept.

Maybe all we need to do is add another hook somewhere in the executor, and
> push the explain analyze into pg_stat_statements once in a while, entirely
> eliminating the need for inter-process communication (signals, queues, ...).
>
> I'm pretty sure we don't need this for "short" queries, because in those
> cases we have other means to get the explain analyze (e.g. running the
> query again or auto_explain). So I can imagine having a rather high
> threshold (say, 60 seconds or even more), and we'd only push the explain
> analyze after crossing it. And then we'd only update it once in a while -
> say, again every 60 seconds.
>
> Of course, this might be configurable by two GUCs:
>
>pg_stat_statements.explain_analyze_threshold = 60  # -1 is "off"
>pg_stat_statements.explain_analyze_refresh = 60
>
> FWIW I'd still prefer having "EXPLAIN ANALYZE" in core, but better this
> than nothing.
>

Yes, this is how the pg_stat_statements idea comes into play: even if we
implement support for online EXPLAIN ANALYZE, enabling it for every query
is overkill, but if you don't enable it from the start of the query, there
is no way to enable it later on, as the query has already progressed.  So
in order to know for which queries it makes sense to enable this feature,
we need to know the query's expected run time, and pg_stat_statements
seems like a natural place to obtain this information and/or trigger the
behavior.

I'm also all for simplification of the underlying communication mechanism:
shared memory queues are neat, but not necessarily the best way to handle
it.  As for the use of signals: I believe this was a red herring; the code
will be much less fragile if the backend running the query publishes the
intermediary EXPLAIN ANALYZE reports itself, for other backends to read.

The EXPLAIN (w/o ANALYZE) we can do completely as an extension: no core
support required.  To enable ANALYZE it will require a little hacking
around Instrumentation methods: otherwise the Explain functions just crash
when run in the middle of the query.

Hope that makes it clear.

--
Alex


Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-17 Thread Shulgin, Oleksandr
On Tue, Dec 15, 2015 at 11:30 PM, Tomas Vondra  wrote:

>
> Attached is a spreadsheet with results for various work_mem values, and
> also with a smaller data set (just 30M rows in the fact table), which
> easily fits into memory. Yet it shows similar gains, shaving off ~40% in
> the best case, suggesting that this is not just thanks to reduction of I/O
> when forcing the temp files to disk.


A neat idea!  Have you possibly tried to also collect statistics about
actual false-positive rates and filter allocation sizes in every of the
collected data points?

--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2015-12-07 Thread Shulgin, Oleksandr
On Fri, Dec 4, 2015 at 6:48 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Dec 1, 2015 at 10:21 AM, Shulgin, Oleksandr
> <oleksandr.shul...@zalando.de> wrote:
> >
> > What I have found is that in a significant percentage of instances, when
> a
> > duplicate sample value is *not* put into the MCV list, it does produce
> > duplicates in the histogram_bounds, so it looks like the MCV cut-off
> happens
> > too early, even though we have enough space for more values in the MCV
> list.
> >
> > In the extreme cases I've found completely empty MCV lists and histograms
> > full of duplicates at the same time, with only about 20% of distinct
> values
> > in the histogram (as it turns out, this happens due to high fraction of
> > NULLs in the sample).
>
> Wow, this is very interesting work.  Using values_cnt rather than
> samplerows to compute avgcount seems like a clear improvement.  It
> doesn't make any sense to raise the threshold for creating an MCV
> based on the presence of additional nulls or too-wide values in the
> table.  I bet compute_distinct_stats needs a similar fix.


Yes, and there's also the magic 1.25 multiplier in that code.  I think it
would make sense to agree first on how exactly the patch for
compute_scalar_stats() should look like, then port the relevant bits to
compute_distinct_stats().
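To illustrate with invented numbers why the basis of the threshold matters
(a rough model of the cut-off, not the exact analyze.c code):

```c
#include <assert.h>

/* Rough model of the MCV cut-off: a sample value is kept as an MCV only
 * if its count exceeds about 1.25 * (rows / ndistinct).  The question is
 * whether "rows" should be all sample rows (current behavior) or only the
 * non-null, not-too-wide values, i.e. values_cnt (proposed). */
static double
mcv_min_count(double rows, double ndistinct)
{
    return 1.25 * rows / ndistinct;
}
```

With 30,000 sample rows of which 80% are NULL (values_cnt = 6,000) and 100
distinct non-null values, the samplerows basis demands a count above 375
while the values_cnt basis demands only 75, so a value seen 200 times in
the sample is pushed out of the MCV list (and into the histogram, as a
duplicate) under the current rule.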


> But for
> plan stability considerations, I'd say we should back-patch this all
> the way, but those considerations might mitigate for a more restrained
> approach.  Still, maybe we should try to sneak at least this much into
> 9.5 RSN, because I have to think this is going to help people with
> mostly-NULL (or mostly-really-wide) columns.
>

I'm not sure.  Likely people would have complained or found this out on
their own if they were seriously affected.

What I would be interested is people running the queries I've shown on
their data to see if there are any interesting/unexpected patterns.

As far as the rest of the fix, your code seems to remove the handling
> for ndistinct < 0.  That seems unlikely to be correct, although it's
> possible that I am missing something.


The difference here is that ndistinct at this scope in the original code
did hide a variable from an outer scope.  That one could be < 0, but in my
code there is no inner-scope ndistinct; we are referring to the outer-scope
variable, which cannot be < 0.


> Aside from that, the rest of
> this seems like a policy change, and I'm not totally sure off-hand
> whether it's the right policy.  Having more MCVs can increase planning
> time noticeably, and the point of the existing cutoff is to prevent us
> from choosing MCVs that aren't actually "C".  I think this change
> significantly undermines those protections.  It seems to me that it
> might be useful to evaluate the effects of this part of the patch
> separately from the samplerows -> values_cnt change.
>

Yes, that's why I was wondering if a frequency cut-off approach might be
helpful here.  At the least, I'm going to have a deeper look at the array
type's typanalyze implementation.

--
Alex


Re: [HACKERS] Logical replication and multimaster

2015-12-03 Thread Shulgin, Oleksandr
On Thu, Dec 3, 2015 at 8:34 AM, Craig Ringer  wrote:

> On 3 December 2015 at 14:54, konstantin knizhnik <
> k.knizh...@postgrespro.ru> wrote:
>
>>
>> Are there some principle problems with it? In BDR it was handled in
>> alternative way, using executor callback. It will be much easier if DDL can
>> be replicated in the same way as normal SQL statements.
>>
>
> It can't. I wish it could.
>

That reminds me of that DDL deparsing patch I was trying to revive a while
ago.  Strangely, I cannot find it in any of the commit fests.  Will add it.

--
Alex


Re: [HACKERS] More stable query plans via more predictable column statistics

2015-12-02 Thread Shulgin, Oleksandr
On Tue, Dec 1, 2015 at 7:00 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> "Shulgin, Oleksandr" <oleksandr.shul...@zalando.de> writes:
> > This post summarizes a few weeks of research of ANALYZE statistics
> > distribution on one of our bigger production databases with some
> real-world
> > data and proposes a patch to rectify some of the oddities observed.
>
> Please add this to the 2016-01 commitfest ...
>

Added: https://commitfest.postgresql.org/8/434/


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-12-01 Thread Shulgin, Oleksandr
On Tue, Dec 1, 2015 at 12:04 AM, Simon Riggs  wrote:

> On 30 November 2015 at 22:27, Julien Rouhaud 
> wrote:
>
>
>> I registered as reviewer on this, but after reading the whole thread for
>> the second time, it's still not clear to me if the last two submitted
>> patches (0001-Add-auto_explain.publish_plans.patch and
>> 0002-Add-SHM-table-of-contents-to-the-explain-DSM.patch) are still
>> needed reviews, since multiple refactoring ideas and objections have
>> been raised since.
>>
>
> I looked into it more deeply and decided partial EXPLAINs aren't very
> useful yet.
>
> I've got some thoughts around that which I'll publish when I have more
> time. I suggest this is a good idea, just needs some serious
> committer-love, so we should bounce this for now.
>

I have the plans to make something from this on top of pg_stat_statements
and auto_explain, as I've mentioned last time.  The next iteration will be
based on the two latest patches above, so it still makes sense to review
them.

As for EXPLAIN ANALYZE support, it will require changes to core, but this
can be done separately.

--
Alex


[HACKERS] More stable query plans via more predictable column statistics

2015-12-01 Thread Shulgin, Oleksandr
Hi Hackers!

This post summarizes a few weeks of research of ANALYZE statistics
distribution on one of our bigger production databases with some real-world
data and proposes a patch to rectify some of the oddities observed.


Introduction


We have observed that for certain data sets the distribution of samples
between most_common_vals and histogram_bounds can be unstable: so that it
may change dramatically with the next ANALYZE run, thus leading to
radically different plans.

I was revisiting the following performance thread and I've found some
interesting details about statistics in our environment:


http://www.postgresql.org/message-id/flat/CAMkU=1zxynmn11yl8g7agf7k5u4zhvjn0dqcc_eco1qs49u...@mail.gmail.com#CAMkU=1zxynmn11yl8g7agf7k5u4zhvjn0dqcc_eco1qs49u...@mail.gmail.com

My initial interest was in evaluating whether the distribution of samples
could be made more predictable and less dependent on luck, thus leading to
more stable execution plans.


Unexpected findings
===

What I have found is that in a significant percentage of instances, when a
duplicate sample value is *not* put into the MCV list, it does produce
duplicates in the histogram_bounds, so it looks like the MCV cut-off
happens too early, even though we have enough space for more values in the
MCV list.

In the extreme cases I've found completely empty MCV lists and histograms
full of duplicates at the same time, with only about 20% of distinct values
in the histogram (as it turns out, this happens due to high fraction of
NULLs in the sample).


Data set and approach
=

In order to obtain these insights into distribution of statistics samples
on one of our bigger databases (~5 TB, 2,300+ tables, 31,000+ individual
attributes) I've built some queries which all start with the following CTEs:

WITH stats1 AS (
SELECT *,
   current_setting('default_statistics_target')::int stats_target,

   array_length(most_common_vals,1) AS num_mcv,
   (SELECT SUM(f) FROM UNNEST(most_common_freqs) AS f) AS mcv_frac,

   array_length(histogram_bounds,1) AS num_hist,
   (SELECT COUNT(DISTINCT h)
  FROM UNNEST(histogram_bounds::text::text[]) AS h) AS
distinct_hist

   FROM pg_stats
  WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
),
stats2 AS (
SELECT *,
   distinct_hist::float/num_hist AS hist_ratio
  FROM stats1
)

The idea here is to collect the number of distinct values in the histogram
bounds vs. the total number of bounds.  One of the reasons why there might
be duplicates in the histogram is a fully occupied MCV list, so we collect
the number of MCVs as well, in order to compare it with the stats_target.

These queries assume that all columns use the default statistics target,
which was actually the case with the database where I was testing this.
 (It is straightforward to include per-column stats target in the picture,
but to do that efficiently, one will have to copy over the definition of
pg_stats view in order to access pg_attribute.attstattarget, and also will
have to run this as superuser to access pg_statistic.  I wanted to keep the
queries simple to make it easier for other interested people to run these
queries in their environment, which quite likely excludes superuser
access.  The more general query is also included[3].)
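For the curious, here is a rough sketch of how the per-column target can be
picked up from pg_attribute (attstattarget is -1 when the column relies on
the default; it's joining this against pg_statistic for the raw stats that
requires superuser):

```sql
-- Sketch only: effective statistics target per column.
SELECT n.nspname, c.relname, a.attname,
       coalesce(nullif(a.attstattarget, -1),
                current_setting('default_statistics_target')::int) AS stats_target
  FROM pg_attribute a
  JOIN pg_class c ON c.oid = a.attrelid
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE a.attnum > 0 AND NOT a.attisdropped;
```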


With the CTEs shown above it was easy to assess the following
"meta-statistics":

WITH ...
SELECT count(1),
       min(hist_ratio)::real,
       avg(hist_ratio)::real,
       max(hist_ratio)::real,
       stddev(hist_ratio)::real
  FROM stats2
 WHERE histogram_bounds IS NOT NULL;

-[ RECORD 1 ]
count  | 18980
min| 0.181818
avg| 0.939942
max| 1
stddev | 0.143189

That doesn't look too bad, but the min value is already pretty fishy.  If I
run the same query limited to non-fully-unique histograms, with "WHERE
distinct_hist < num_hist" instead, the picture is a bit different:

-[ RECORD 1 ]
count  | 3724
min| 0.181818
avg| 0.693903
max| 0.990099
stddev | 0.170845

It shows that about 20% of all analyzed columns that have a histogram
(3724/18980), also have some duplicates in it, and that they have only
about 70% of distinct sample values on average.
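Spelled out in full, this second aggregate query differs from the first one
only in its WHERE clause:

```sql
WITH ... -- same CTEs as above
SELECT count(1),
       min(hist_ratio)::real,
       avg(hist_ratio)::real,
       max(hist_ratio)::real,
       stddev(hist_ratio)::real
  FROM stats2
 WHERE distinct_hist < num_hist;
```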


Select statistics examples
==========================

Apart from mere aggregates it is interesting to look at some specific
MCV/histogram examples.  The following query aims to reconstruct the values
of certain variables found in the compute_scalar_stats() code of analyze.c
(again, with the same CTEs as above):

WITH ...
SELECT schemaname ||'.'|| tablename ||'.'|| attname ||
       (CASE inherited WHEN TRUE THEN ' (inherited)' ELSE '' END) AS columnname,
       n_distinct, null_frac,
       num_mcv, most_common_vals, most_common_freqs,
       mcv_frac, (mcv_frac / (1 - null_frac))::real AS nonnull_mcv_frac,
       distinct_hist, num_hist, hist_ratio,
       histogram_bounds
  FROM 

Re: [HACKERS] Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)

2015-11-30 Thread Shulgin, Oleksandr
On Wed, Nov 25, 2015 at 9:13 AM, Lukas Fittl  wrote:

> On Mon, Nov 23, 2015 at 11:53 PM, Peter Geoghegan  wrote:
>
>> One specific justification he gave for not using pg_stat_statements was:
>>
>> "Doesn’t merge bind vars in IN()" (See slide #11)
>>
>> I wonder:
>>
>> * How do other people feel about this? Personally, I've seen enough
>> problems of this kind in the field that "slippery slope" arguments
>> against this don't seem very compelling.
>>
>
> As someone who runs a little monitoring service thats solely based on
> pg_stat_statements data, ignoring IN list length would certainly be a good
> change.
>
> We currently do this in post-processing, together with other data cleanup
> (e.g. ignoring the length of a VALUES list in an INSERT statement).
>
> Given the fact that pgss data is normalized & you don't know which plan
> was chosen, I don't think distinguishing based on the length of the list
> helps anyone really.
>
> I see pg_stat_statements as a high-level overview of which queries have
> run, and which ones you might want to look into closer using e.g.
> auto_explain.
>

I still have the plans to try to marry pg_stat_statements and auto_explain
for the next iteration of "online query plans" extension I was proposing a
few months ago, and the first thing I was going to look into is rectifying
this problem with IN() jumbling.  So, have a +1 from me.

--
Alex


Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-09 Thread Shulgin, Oleksandr
On Mon, Nov 9, 2015 at 11:08 AM,  wrote:

>
>   - cost parameter calibration: random_page_cost = 92.89
>

This demands some explanation and raises question of value of seq_page_cost
parameter -- I don't see anything about it your mail.

--
Alex


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-02 Thread Shulgin, Oleksandr
On Mon, Nov 2, 2015 at 1:17 PM, Craig Ringer  wrote:

> Hi all
>
> I'd like to submit pglogical_output for inclusion in the 9.6 series as
> a contrib.


Yay, that looks pretty advanced! :-)

Still digesting...

--
Alex


Re: [HACKERS] extend pgbench expressions with functions

2015-10-30 Thread Shulgin, Oleksandr
On Fri, Sep 18, 2015 at 10:21 AM, Fabien COELHO  wrote:

>
> Hello Kyotaro-san,
>
> My description should have been obscure. Indeed the call tree is
>> finite for *sane* expression node. But it makes infinit call for
>> a value of expr->etype unknown by both evalDouble and
>> evalInt.
>>
>
> Such issue would be detected if the function is actually tested, hopefully
> this should be the case... :-)
>
> However I agree that relying implicitely on the "default" case is not very
> good practice, so I updated the code in the attached v11 to fail
> explicitely on such errors.
>
> I also attached a small test script, which exercises most (all?) functions:
>
>   ./pgbench -f functions.sql -t 1
>

A short review from me:

1. Patch applies cleanly on current HEAD.
2. It compiles without errors or warnings.
3. The attached test case can be executed w/o symptoms of any problem and
it produces meaningful results.

Should we not allow for functions taking 0 arguments?  Since we're already
into some math here, how about pi()? ;-)

I understand requiring at least 1 arg simplifies the code a bit, but right
now it reports a syntax error for "random()", while it correctly reports an
unexpected number of arguments for "random(1,2,3)".  We would need another
check for min() and max(), which expect >=1 arguments, but that is easy to
add.

I would also argue that we should rename "random" to "rand" here to avoid
confusion with the familiar SQL function "random()" that doesn't take
arguments.
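To make the arity points concrete, here is roughly what such a script could
look like under the patch (pi() is only hypothetical at this point):

```
\set r random(1, 10)    -- two arguments: works
\set e random()         -- currently reported as a syntax error, not as wrong arity
\set m max(:r, 5, 7)    -- min()/max() would need an "at least one argument" check
\set pi pi()            -- hypothetical zero-argument function
```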

--
Alex


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-27 Thread Shulgin, Oleksandr
On Mon, Oct 26, 2015 at 10:02 PM, Christopher Browne 
wrote:

>
> On 26 October 2015 at 16:25, Peter Eisentraut  wrote:
>
>> On 10/14/15 6:41 AM, Victor Wagner wrote:
>> > 1. It is allowed to specify several hosts in the connect string, either
>> > in URL-style (separated by comma) or in param=value form (several host
>> > parameters).
>>
>> I'm not fond of having URLs that are not valid URLs according to the
>> applicable standards.  Because then they can't be parsed or composed by
>> standard libraries.
>>
>> Also, this assumes that all the components other than host and port are
>> the same.  Earlier there was a discussion about why the ports would ever
>> need to be different.  Well, why can't the database names be different?
>>  I could have use for that.
>>
>> I think you should just accept multiple URLs.
>>
>
> I'd give a "+1" on this...
>
> As an area of new behaviour, I don't see a big problem with declining to
> support every wee bit of libpq configuration, and instead requiring the
> use of URLs.
>
> Trying to put "multiplicities" into each parameter (and then considering
> it at the pg_service level, too) is WAY more complicated, and for a
> feature where it seems to me that it is pretty reasonable to have a
> series of fully qualified URLs.
>
> Specifying several URLs should be easier to understand, easier to
> test, easier to code, and easier to keep from blowing up badly.
>

Setting aside all other concerns, have a +1 from me on that too.

--
Alex


Re: [HACKERS] pg_basebackup and replication slots

2015-10-26 Thread Shulgin, Oleksandr
On Mon, Oct 26, 2015 at 12:58 PM, Joshua D. Drake 
wrote:

> Hello,
>
> The fact that pg_basebackup doesn't use replicaiton slots, is that a
> technical limitation or just a, "we need a patch"?
>

I believe it does, but only in master so far, not even in 9.5:
http://www.postgresql.org/docs/devel/static/app-pgbasebackup.html

--
Alex


Re: [HACKERS] SuperUser check in pg_stat_statements

2015-10-19 Thread Shulgin, Oleksandr
On Mon, Oct 19, 2015 at 5:15 PM, rajan  wrote:

> Hello,
>
> When monitoring using pg_stat_satements I see that postgres by default
> conceals queries executed by other users from the user who is selecting the
> pg_stat_statements view.
>
> I have edited the pg_stat_statements.c by disabling the superuser check
> function so that all queries will be visible to all users.
>

Well, you could see that's by design.  What problem are you trying to solve
with that?


> Can this be posted as a patch to postgresql?
>

This is not going to be accepted.

--
Alex


Re: [HACKERS] How to import PostgreSQL 9.2.4 dump to PostgreSQL 9.4.5?

2015-10-15 Thread Shulgin, Oleksandr
On Thu, Oct 15, 2015 at 11:27 AM, FattahRozzaq  wrote:

> Hi guys,
>
> I'm running some test.
>
>
> However, I'm stuck in restoring a PostgreSQL 9.2.4 dump to PostgreSQL
> 9.4.5 database.
> I'm doing the backup using pg_dump version PostgreSQL 9.2.4 in text
> file (SQL dump).
> I'm trying to restore it to PostgreSQL 9.4.5 usng psql.
>
>
> The restoration error message was as below:
>  psql:room.sql:41695: invalid command \n
> Query buffer reset (cleared).
> psql:room.sql:41696: invalid command \n
> Query buffer reset (cleared).
>
>
> The SQL statement in the file as below:
> COPY room (id, user_id, partner_id, rm_cr_ts, rm_up_ts, rm_sv_ts,
> value, last_dlvrd_to_user, last_dlvrd_to_prtnr, last_seen_by_user,
> last_seen_by_prtnr, num_unseen_by_user, num_unseen_by_prtnr,
> last_message_id, last_message_ts, legacy_id, legacy_last_message_id,
> inserted_at, updated_at) FROM stdin;
> 711675  1391103 2657167 -1429528649798556   1429528649798556
>  1429528649798556created 0   0   0   0   1
>   0   197334511429528649798556AUzWjCFTbRHQVpj5SYnD
>AUzWjCR43gMouVUpyttw2015-09-12 18:56:49 2015-09-12 18:56:49
>

> 19  40868   1191438 14267677738225381426767773822538
>  1426767773822538closed  1426754122000
> 1426754122002   0   1   \N  0
> 550a8a4a73656d2e26162015-09-12 14:05:32
> 2015-09-12 14:05:32
>

Not sure how that could happen, but this second line has only 18 fields
instead of the expected 19.  The first one appears to be correct.

> I create the room table as below:
> CREATE TABLE room (
> id integer NOT NULL,
> user_id character varying(255),
> partner_id character varying(255),
> rm_cr_ts bigint,
> rm_up_ts bigint,
> rm_sv_ts bigint,
> value character varying(255),
> last_dlvrd_to_user bigint,
> last_dlvrd_to_prtnr bigint,
> last_seen_by_user bigint,
> last_seen_by_prtnr bigint,
> num_unseen_by_user integer,
> num_unseen_by_prtnr integer,
> last_message_id integer,
> last_message_ts bigint,
> legacy_id text,
> legacy_last_message_id text,
> inserted_at timestamp without time zone NOT NULL,
> updated_at timestamp without time zone NOT NULL
> );
>
>
> All the servers are Debian 7.8 (Wheezy) with hardware:
> - CPU 24 cores
> - RAM 128GB
> - SSD 128GB
>
>
> Have anyone try this before?
> How to do it properly?
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-10-15 Thread Shulgin, Oleksandr
On Tue, Sep 29, 2015 at 7:52 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

>
> This is not a change of the direction, but rather of the approach.
> Hitting a process with a signal and hoping it will produce a meaningful
> response in all circumstances without disrupting its current task was way
> too naive.  I'd really want to make this work with ANALYZE, just not as the
> first step.  I believe this approach can be extended to include
> instrumentation support (obviously we will not be able to contain this in
> the auto_explain module).
>

I was thinking about this and what seems to be the biggest problem is when
to actually turn the feature on.  It seems unlikely that someone will want
to enable it unconditionally.  Enabling per-backend also doesn't seem to be
a good approach because you don't know if the next query you'd like to look
at is going to run in this exact backend.

What might actually be usable is poking pg_stat_statements for the queryid
to decide if we need to do explain (and possibly analyze).  We could make it
so that the explain is produced for any query that is known to run longer
than a certain configurable threshold on average (or we can provide for
enabling this explicitly per entry in pg_stat_statements).  Then the
interested client can go and run pg_explain_backend() on the pid.
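For illustration, the client flow could look roughly like this (the join on
query text is a crude placeholder, and pg_explain_backend() is the function
proposed earlier in this thread):

```sql
-- Find backends whose current statement historically averages over 1s.
SELECT a.pid, s.queryid, s.mean_time
  FROM pg_stat_activity a
  JOIN pg_stat_statements s ON s.query = a.query   -- placeholder matching
 WHERE s.mean_time > 1000;                         -- milliseconds, arbitrary

-- Then ask a specific backend for the plan it is currently executing.
SELECT * FROM pg_explain_backend(12345);           -- pid taken from above
```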

If we also tracked the plans' total costs, we could make some predictions:
if the same query comes along and is planned with a total cost exceeding the
recorded average by a configurable percentage, we enable explain.

Does this make sense to you?  Does this make a good argument for merging
pg_stat_statements and auto_explain into core?

--
Alex


Re: [HACKERS] Database schema diff

2015-10-14 Thread Shulgin, Oleksandr
On Tue, Oct 13, 2015 at 5:48 PM, Michal Novotny <
michal.novo...@trustport.com> wrote:

> Hi guys,
>
> I would like to ask you whether is there any tool to be able to compare
> database schemas ideally no matter what the column order is or to dump
> database table with ascending order of all database columns.
>
> For example, if I have table (called table) in schema A and in schema B
> (the time difference between is 1 week) and I would like to verify the
> column names/types matches but the order is different, i.e.:
>
> Schema A (2015-10-01) |  Schema B (2015-10-07)
>   |
> id int|  id int
> name varchar(64)  |  name varchar(64)
> text text |  description text
> description text  |  text text
>
> Is there any tool to compare and (even in case above) return that both
> tables match? Something like pgdiff or something?
>
> This should work for all schemas, tables, functions, triggers and all
> the schema components?
>

I've used pg_dump --split for this purpose a number of times (it requires
patching pg_dump[1]).

The idea is to produce the two database's schema dumps split into
individual files per database object, then run diff -r against the schema
folders.  This worked really well for my purposes.

This will, however, report differences in column order, but I'm not really
sure why you would like to ignore that.

--
Alex

[1]
http://www.postgresql.org/message-id/AANLkTikLHA2x6U=q-t0j0ys78txhfmdtyxjfsrsrc...@mail.gmail.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-14 Thread Shulgin, Oleksandr
On Wed, Oct 14, 2015 at 12:41 PM, Victor Wagner  wrote:

> On 2015.08.18 at 07:18:50 +0300, Victor Wagner wrote:
>
> > Rationale
> > =
> >
> > Since introduction of the WAL-based replication into the PostgreSQL, it
> is
> > possible to create high-availability and load-balancing clusters.
> >
> > However, there is no support for failover in the client libraries. So,
> only
> > way to provide transparent for client application failover is IP address
> > migration. This approach has some limitation, i.e. it requires that
> > master and backup servers reside in the same subnet or may not be
> > feasible for other reasons.
> >
> > Commercial RDBMS, such as Oracle, employ more flexible approach. They
> > allow to specify multiple servers in the connect string, so if primary
> > server is not available, client library tries to connect to other ones.
> >
> > This approach allows to use geographically distributed failover clusters
> > and also is a cheap way to implement load-balancing (which is not
> > possible with IP address migration).
> >
>
> Attached patch which implements client library failover and
> loadbalancing as was described in the proposal
> <20150818041850.ga5...@wagner.pp.ru>.
>
> This patch implements following fuctionality:
>
> 1. It is allowed to specify several hosts in the connect string, either
> in URL-style (separated by comma) or in param=value form (several host
> parameters).
>

So what exactly would happen with this command: PGHOST=host1 psql -h host2

Or this one: PGHOST=host1 psql host=host2

What about service files?

2. Each host parameter can be accompanied with port specifier separated
> by colon. Port, specified such way takes precedence of port parameter or
> default port for this particular host only.
>
> 3. There is hostorder parameter with two possible valies 'sequential'
> and 'random' (default sequential). 'parallel' hostorder described in the
> proposal is not yet implemented in this version of patch.
>
> 4. In the URL-style connect string parameter loadBalanceHosts=true is
> considered equal to 'hostorder=random' for compatibility with jdbc.
>
> 5. Added new parameter readonly=boolean. If this parameter is false (the
> default) upon successful connection check is performed that backend is
> not in the recovery state. If so, connection is not considered usable
> and next host is tried.
>
> 6. Added new parameter falover_timeout. If no usable host is found and
> this parameter is specified, hosts are retried cyclically until this
> timeout expires. It gives some time for claster management software to
> promote one of standbys to master if master fails. By default there is
> no retries.
>
> Some implementation notes:
>
> 1. Field  pghost in the PGconn structure now stores comma separated list
> of hosts, which is parsed in the connectDBStart. So, expected results of
> some tests in src/interfaces/libpq/test are changed.
>

 trying postgresql://[::1]:12345/db
-dbname='db' host='::1' port='12345' (inet)
+dbname='db' host='[::1]:12345' (inet)

Such a change doesn't look really nice to me.


> 2. Url with colon  but no port number after the host no more considered
> valid.
>

We could live with that, but is there a good reason for a
backward-compatibility break in this case?

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-29 Thread Shulgin, Oleksandr
On Tue, Sep 29, 2015 at 12:28 AM, Jim Nasby <jim.na...@bluetreble.com>
wrote:

> On 9/18/15 5:05 AM, Shulgin, Oleksandr wrote:
>
>>
>> So this has to be the responsibility of the reply sending backend in the
>> end: to create and release the DSM *at some point*.
>>
>
> What's wrong with just releasing it at the end of the statement? When the
> statement is done there's no point to reading it asynchronously anymore.


That was only one of the problems with the signals-based design, which had
other, more significant issues as well.  The current approach does exactly
that: allocate the segment before running the plan, release it at the end of
the statement.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-29 Thread Shulgin, Oleksandr
On Tue, Sep 29, 2015 at 12:57 PM, Andres Freund <and...@anarazel.de> wrote:

> On 2015-09-25 19:13:13 +0200, Shulgin, Oleksandr wrote:
> > the auto_explain contrib module.  I now propose the most simple thing
> > possible in my opinion: a new GUC option for auto_explain.  It will make
> > every backend, in which the module is loaded via *_preload_libraries
> > mechanism or using CREATE EXTENSION/LOAD commands, to actively publish
> the
> > plans of queries in dynamic shared memory as long as
> > auto_explain.publish_plans is turned on.
>
> Wait. You're proposing that every query does this unconditionally? For
> every single query? That sounds prohibitively expensive to me.
>

Only if the extension is loaded AND the option is turned on.  Likely you
don't want to do this for an OLTP database, but for other types of load it
might make sense.  Also, I think it should be possible to toggle this on a
per-process basis.

Finally, we can put in a cost threshold, so that plans only get published if
they have total_cost >= publish_plan_cost_threshold.
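In postgresql.conf terms this would amount to something like (the second GUC
name is just the proposal from this message):

```
shared_preload_libraries = 'auto_explain'
auto_explain.publish_plans = on
auto_explain.publish_plan_cost_threshold = 100000   # publish only plans at least this costly
```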

Also, do you mean expensive in terms of CPU or the dynamic shared memory?

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-29 Thread Shulgin, Oleksandr
On Tue, Sep 29, 2015 at 7:01 PM, Simon Riggs <si...@2ndquadrant.com> wrote:

> On 25 September 2015 at 12:13, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>
>
>> I now believe that use of ProcessInterrupts() in the recently proposed
>> design as well as manipulation of proc latch due to use of shared memory
>> queue are major blockers.
>>
>> In order to simplify things up, I've taken a step back and looked again
>> at the auto_explain contrib module.  I now propose the most simple thing
>> possible in my opinion: a new GUC option for auto_explain.  It will make
>> every backend, in which the module is loaded via *_preload_libraries
>> mechanism or using CREATE EXTENSION/LOAD commands, to actively publish the
>> plans of queries in dynamic shared memory as long as
>> auto_explain.publish_plans is turned on.
>>
>> The greatest part for me, is that this approach doesn't require handling
>> of signals and is isolated in an external module, so it can be readily used
>> with the current server versions, no need to wait for >= 9.6.
>>
>
> This is a major change of direction, so the thread title no longer applies
> at all.
>
> The requirement is to be able to see the output of EXPLAIN ANALYZE for a
> running process. Ideally, we would dump the diagnostic output for any
> process that runs longer than a specific timeout, so we can decide whether
> to kill it, or just save that for later diagnostics.
>
> I'm interested in helping the original goal happen. Dumping an EXPLAIN,
> without ANALYZE info, is not at all interesting since it has no value for
> immediate action or post-facto diagnosis, sorry to say - making it happen
> for every backend just makes it even less useful.
>

This is not a change of the direction, but rather of the approach.  Hitting
a process with a signal and hoping it will produce a meaningful response in
all circumstances without disrupting its current task was way too naive.
I'd really want to make this work with ANALYZE, just not as the first
step.  I believe this approach can be extended to include instrumentation
support (obviously we will not be able to contain this in the auto_explain
module).

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-29 Thread Shulgin, Oleksandr
On Tue, Sep 29, 2015 at 8:34 PM, Simon Riggs <si...@2ndquadrant.com> wrote:

> On 29 September 2015 at 12:52, Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de> wrote:
>
>
>> Hitting a process with a signal and hoping it will produce a meaningful
>> response in all circumstances without disrupting its current task was way
>> too naive.
>>
>
> Hmm, I would have to disagree, sorry. For me the problem was dynamically
> allocating everything at the time the signal is received and getting into
> problems when that caused errors.
>

What I mean is that we need to move the actual EXPLAIN run out of
ProcessInterrupts().  It can be still fine to trigger the communication
with a signal.

* INIT - Allocate N areas of memory for use by queries, which can be
> expanded/contracted as needed. Keep a freelist of structures.
> * OBSERVER - When requested, gain exclusive access to a diagnostic area,
> then allocate the designated process to that area, then send a signal
> * QUERY - When signal received dump an EXPLAIN ANALYZE to the allocated
> diagnostic area, (set flag to show complete, set latch on observer)
> * OBSERVER - process data in diagnostic area and then release area for use
> by next observation
>
> If the EXPLAIN ANALYZE doesn't fit into the diagnostic chunk, LOG it as a
> problem and copy data only up to the size defined. Any other ERRORs that
> are caused by this process cause it to fail normally.
>

Do you envision problems if we do this with a newly allocated DSM every time
instead of a pre-allocated area?  This would have to reverse the workflow,
because only the QUERY side knows the required segment size:

OBSERVER - sends a signal and waits for its proc latch to be set
QUERY - when signal is received allocates a DSM just big enough to fit the
EXPLAIN plan, then locates the OBSERVER(s) and sets its latch (or their
latches)

The EXPLAIN plan should already be produced somewhere in the executor, to
avoid calling into explain.c from ProcessInterrupts().

That allows the observer to be another backend, or it allows the query
> process to perform self-observation based upon a timeout (e.g. >1 hour) or
> a row limit (e.g. when an optimizer estimate is seen to be badly wrong).
>

Do you think there is one single best place in the executor code where such
a check could be added?  I have very little idea about that.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Shulgin, Oleksandr
On Mon, Sep 28, 2015 at 7:09 PM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

>
> 2015-09-28 12:37 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>>
>>> I didn't propose too different solution. There is only one difference -
>>> sharing DSM for smaller data. It is similar to using usual shared memory.
>>>
>>
>> Does this mean implementing some sort of allocator on top of the shared
>> memory segment?  If so, how are you going to prevent fragmentation?
>>
>
> yes, simple memory allocator is necessary in this case. But it should be
> really simple - you can allocate only fixed size blocks - 10KB, 100KB and
> 1MB from separate buffers. So the fragmentation is not possible.
>

Maybe we're talking about completely different ideas, but I don't see how
fixing the block sizes helps to fight fragmentation in particular.  Can you
sketch a simple example?  E.g. 400 backends share the common segment and all
of them want to publish a plan of ~1kB each.  Now what?

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Shulgin, Oleksandr
On Fri, Sep 25, 2015 at 7:13 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

>
> Some implementation details:
>
> For every backend that might be running (MaxBackends) we reserve a
> dsm_handle slot in the addins shared memory.  When the new option is turned
> on, the ExecutorRun hook will produce a plan in whatever format is
> specified by the auto_explain.log_format, allocate a DSM segment, copy the
> plan into the segment and finally store the DSM handle into its own slot.
> No locking is required around this because every backend writes to its slot
> exclusively, no other backend can be writing into it concurrently.  In the
> ExecutorFinish hook we invalidate the backend's slot by setting the handle
> to 0 and deallocate the DSM segment.
>

And this patch adds back the use of table of contents on the DSM.
Benefits: magic number validation; communication of the actual plan size.

--
Alex
From 9294995e200fca2abdccf0825ef852f26df1f4a2 Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
Date: Mon, 28 Sep 2015 16:20:42 +0200
Subject: [PATCH 2/2] Add SHM table of contents to the explain DSM

By doing so we achieve two things:

1) verify the actual contents of the segment we've attached to, by
enforcing the magic number check;

2) transmit the actual size of the payload, so the receiving side can
allocate and parse in process-local memory, detaching from the DSM
early.
---
 contrib/auto_explain/auto_explain.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 51 insertions(+), 8 deletions(-)

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 0fa123f..ac6b750 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -23,6 +23,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/shmem.h"
+#include "storage/shm_toc.h"
 #include "utils/guc.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
@@ -48,6 +49,7 @@ static const struct config_enum_entry format_options[] = {
 	{NULL, 0, false}
 };
 
+#define PG_EXPLAIN_BACKEND_MAGIC	0x79fb2449
 
 /* Shared memory structs */
 static dsm_handle *published_plan_handles = NULL;
@@ -455,18 +457,40 @@ explain_query(QueryDesc *queryDesc)
 static void
 explain_publish_plan(QueryDesc *queryDesc)
 {
-	StringInfo		str = explain_query(queryDesc);
+	StringInfo		str;
+	Size			plan_size;
+	shm_toc_estimator	toc_estimator;
+	Size				seg_size;
 	dsm_segment	   *dsm;
+	shm_toc		   *toc = NULL;
 
 	attach_published_plan_handles();
 
-	dsm = dsm_create(str->len + 1, DSM_CREATE_NULL_IF_MAXSEGMENTS);
+	str = explain_query(queryDesc);
+	plan_size = str->len + 1;
+
+	shm_toc_initialize_estimator(&toc_estimator);
+	shm_toc_estimate_keys(&toc_estimator, 2);	/* size, plan */
+	shm_toc_estimate_chunk(&toc_estimator, sizeof(Size));
+	shm_toc_estimate_chunk(&toc_estimator, plan_size);
+	seg_size = shm_toc_estimate(&toc_estimator);
+
+	dsm = dsm_create(seg_size, DSM_CREATE_NULL_IF_MAXSEGMENTS);
 	if (dsm)
 	{
-		char	   *buffer = (char *) dsm_segment_address(dsm);
+		Size	   *size_ptr;
+		char	   *buffer;
 		dsm_handle	handle;
 
-		memcpy(buffer, str->data, str->len + 1);
+		toc = shm_toc_create(PG_EXPLAIN_BACKEND_MAGIC, dsm_segment_address(dsm), seg_size);
+
+		size_ptr = (Size *) shm_toc_allocate(toc, sizeof(Size));
+		*size_ptr = plan_size;
+		shm_toc_insert(toc, 0, size_ptr);
+
+		buffer = (char *) shm_toc_allocate(toc, plan_size);
+		memcpy(buffer, str->data, plan_size);
+		shm_toc_insert(toc, 1, buffer);
 
 		MyPublishedPlanSegment = dsm;
 
@@ -570,9 +594,30 @@ pg_explain_backend(PG_FUNCTION_ARGS)
 	{
 		PG_TRY();
 		{
-			const char	   *p = (const char *) dsm_segment_address(seg);
+			shm_toc		   *toc;
+			Size			plan_size;
+			char		   *buffer;
+			const char	   *p;
+
+			toc = shm_toc_attach(PG_EXPLAIN_BACKEND_MAGIC, dsm_segment_address(seg));
+			if (toc == NULL)
+				ereport(ERROR,
+						(errmsg("bad magic in dynamic memory segment")));
+
+			plan_size = *((Size *) shm_toc_lookup(toc, 0));
+
+			buffer = palloc(plan_size);
+			memcpy(buffer, shm_toc_lookup(toc, 1), plan_size);
+
+			/* we can detach DSM already, but only if it's not our own one */
+			if (handle)
+			{
+				handle = 0;		/* avoid possible extra detach in PG_CATCH */
+				dsm_detach(seg);
+			}
 
 			/* break into tuples on newline */
+			p = buffer;
 			while (*p)
 			{
 const char *q = strchr(p, '\n');
@@ -592,9 +637,7 @@ pg_explain_backend(PG_FUNCTION_ARGS)
 				p += len + 1;
 			}
 
-			/* detach DSM, but only if it's not our own one */
-			if (handle)
-				dsm_detach(seg);
+			pfree(buffer);
 		}
 		PG_CATCH();
 		{
-- 
2.1.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Shulgin, Oleksandr
On Mon, Sep 28, 2015 at 12:05 PM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

>
> 2015-09-28 12:01 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>> On Sun, Sep 27, 2015 at 8:05 AM, Pavel Stehule <pavel.steh...@gmail.com>
>> wrote:
>>
>> the preparing of content before execution is interesting idea, that can
>>> be used more. The almost queries and plans are not too big, so when the
>>> size of content is not too big - less than 1MB, then can be used one DSM
>>> for all backends.
>>>
>>
>>
>>> When size of content is bigger than limit, then DSM will be allocated
>>> specially for this content. The pointer to DSM and offset can be stored in
>>> requested process slot. The reading and writing to requested slot should be
>>> protected by spinlock, but it should block only two related processes for
>>> short time (copy memory).
>>>
>>
>> Sorry, I don't think this will fly.
>>
>> The whole idea is that a backend publishes the plan of a query just
>> before running it and it doesn't care which other backend(s) might be
>> reading it, how many times and in which order.  The only required locking
>> (implicit) is contained in the code for dsm_attach/detach().
>>
>
> I didn't propose too different solution. There is only one difference -
> sharing DSM for smaller data. It is similar to using usual shared memory.
>

Does this mean implementing some sort of allocator on top of the shared
memory segment?  If so, how are you going to prevent fragmentation?

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Shulgin, Oleksandr
On Sun, Sep 27, 2015 at 8:05 AM, Pavel Stehule 
wrote:

the preparing of content before execution is interesting idea, that can be
> used more. The almost queries and plans are not too big, so when the size
> of content is not too big - less than 1MB, then can be used one DSM for all
> backends.
>


> When size of content is bigger than limit, then DSM will be allocated
> specially for this content. The pointer to DSM and offset can be stored in
> requested process slot. The reading and writing to requested slot should be
> protected by spinlock, but it should block only two related processes for
> short time (copy memory).
>

Sorry, I don't think this will fly.

The whole idea is that a backend publishes the plan of a query just before
running it and it doesn't care which other backend(s) might be reading it,
how many times and in which order.  The only required locking (implicit) is
contained in the code for dsm_attach/detach().


> Other possibility is showing the size of content in requested process
> slot. Then the requester can preallocate enough size of shared memory. But
> this doesn't solve a issues related to waiting requester for content. So
> first variant is pretty simple, and should be preferred. The disadvantage
> is clear - it can enforce maybe significant slowdown of fast queries.
>

Both of these approaches have just too many synchronization problems, IMO.

--
Alex

