Re: [HACKERS] Re: [BUGS] BUG #14821: idle_in_transaction_session_timeout sometimes gets ignored when statement timeout is pending

2017-10-11 Thread Lukas Fittl
On Wed, Oct 11, 2017 at 2:11 PM Andres Freund <and...@anarazel.de> wrote:

> I've pushed this. Thanks for the report & fix!
>

Excellent, thanks!

Best,
Lukas
-- 
Lukas Fittl


[HACKERS] Re: [BUGS] BUG #14821: idle_in_transaction_session_timeout sometimes gets ignored when statement timeout is pending

2017-09-20 Thread Lukas Fittl
Hi,

As per the bug report at
https://www.postgresql.org/message-id/20170921010956.17345.61461%40wrigleys.postgresql.org
it seems that the query cancellation holdoff logic in ProcessInterrupts is
a bit overly aggressive in keeping other interrupts from running.

In particular I've seen an issue in the wild where
idle_in_transaction_session_timeout did not get triggered because
the HOLD_CANCEL_INTERRUPTS() in SocketBackend wraps around a pq_getbyte()
call, and so ProcessInterrupts doesn't do anything when it gets called
because the query cancel holdoff counter is positive.

Andres suggested the following re-ordering of the logic on -bugs:


> On Wed, Sep 20, 2017 at 6:29 PM, Andres Freund <and...@anarazel.de> wrote:
>>
>>
>> if (QueryCancelPending && QueryCancelHoldoffCount != 0)
>> {
>> /* rearm */
>> }
>> else if (QueryCancelPending)
>> {
>> /* handle interrupt */
>> }
>>
>
Which is implemented in the attached patch.

Unless someone wants to pick this up right away, I'll register it in the
next commitfest tomorrow.

Best,
Lukas

-- 
Lukas Fittl


0001-Only-skip-query-cancel-itself-when-query-cancel-hold.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements

2017-03-27 Thread Lukas Fittl
On Mon, Mar 27, 2017 at 8:24 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Pushed with minor adjustments.
>

Excellent - thanks for your review and all the discussion here!


> The main non-cosmetic thing I did was to replace the floor(log10())
> business with plain constant "10" as I suggested before.  That's
> what we do in other places --- see int4out for an example --- and
> frankly I did not feel that a small space savings in a transient
> string buffer was worth the intellectual effort to verify whether
> that calculation was correct or not, never mind whatever runtime
> cycles it would take.  I don't believe the argument that it's safer
> your way: if you had an off-by-one thinko in the calculation, or even
> just roundoff error in the log10() call, it could result in an actual
> reachable buffer overrun, because there's no safety margin.
>

Makes sense, guess I was overthinking this one.

Best,
Lukas

-- 
Lukas Fittl

Skype: lfittl
Phone: +1 415 321 0630 <(415)%20321-0630>


Re: [HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements

2017-03-22 Thread Lukas Fittl
On Sat, Mar 18, 2017 at 12:46 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> So it turns out this discussion just reinvented the alternative that
> Lukas had in his 0002 proposal.  Are there any remaining objections
> to proceeding with that approach?
>

Thanks for reviewing - updated patch attached, comments below.


> In a quick read of the patch, I note that it falsifies and fails to
> update the header comment for generate_normalized_query:
>
>  * *query_len_p contains the input string length, and is updated with
>  * the result string length (which cannot be longer) on exit.
>
> It doesn't look like the calling code depends (any more?) on the
> assumption that the string doesn't get longer, but it would be good
> to double-check that.
>

Fixed.


> I'd just add, say, "10 * clocations_count" to the allocation length.
> We can afford to waste a few bytes on this transient copy of the query
> text.
>

I've changed this, although I've kept this somewhat dynamic, to avoid
having an accidental overrun in edge cases:

1 + floor(log10(jstate->clocations_count +
jstate->highest_extern_param_id));


> The documentation is overly vague about what the parameter symbols are,
> in particular failing to explain that their numbers start from after
> the last $n occurring in the original query text.
>

Updated the docs to clarify this.


> It seems like a good idea to have a regression test case demonstrating
> that, too.
>

I already had a separate patch that adds more regression tests (),
including one for prepared statements in the initial email.

Merged together now into one patch to keep it simple, and added another
constant value to the prepared statement test case. I've also included
a commit message in the patch file, if helpful.

Thanks,
Lukas

-- 
Lukas Fittl


0002_pgss_mask_with_incrementing_params.v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements

2017-03-06 Thread Lukas Fittl
On Mon, Mar 6, 2017 at 9:36 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Sat, Mar 4, 2017 at 1:52 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> > In my opinion, we expose query id (and dbid, and userid) as the
> > canonical identifier for each pg_stat_statements entry, and have done
> > so for some time. That's the stable API -- not query text. I'm aware
> > of cases where query text was used as an identifier, but that ended up
> > being hashed anyway.
> >
> > Query text is just for human consumption.
>
> Lukas evidently thinks otherwise, based on the original post.
>

I actually agree with Peter that the queryid+userid+dbid is the canonical
identifier,
not the query text.

There is however value in parsing the query, e.g. to find out which
statement
type something is, or to determine which table names a query references
(assuming one knows the search_path) programatically.

It is for that latter reason I'm interested in parsing the query, and
avoiding the
ambiguity that ? carries, since its also an operator.

Based on some hackery, I've previously built a little example script that
filters pg_stat_statements output: https://github.com/lfittl/pg_qtop#usage

This script currently breaks in complex cases of ? operators, since the
pg_stat_statements query text is ambiguous.


> > I'd be in favor of a change
> > that makes it easier to copy and paste a query, to run EXPLAIN and so
> > on. Lukas probably realizes that there are no guarantees that the
> > query text that appears in pg_stat_statements will even appear as
> > normalized in all cases. The "sticky entry" stuff is intended to
> > maximize the chances of that happening, but it's still generally quite
> > possible (e.g. pg_stat_statements never swaps constants in a query
> > like "SELECT 5, pg_stat_statements_reset()"). This means that we
> > cannot really say that this buys us a machine-readable query text
> > format, at least not without adding some fairly messy caveats.
>
> Well, Lukas's original suggestion of using $n for a placeholder would
> do that, unless there's already a $n with the same numerical value,
> but Andres's proposal to use $-n or $:n would not.
>

Yes, and I do think that making it easier to run EXPLAIN would be the
primary user-visible benefit in core.

I'd be happy to add a docs section showing how to use this, if there is
some consensus that its worth pursuing this direction.

Thanks for all the comments, appreciate the discussion.

Best,
Lukas

-- 
Lukas Fittl


Re: [HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements

2017-03-01 Thread Lukas Fittl
On Wed, Mar 1, 2017 at 6:51 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:
>
> Hmm, I think this could confuse people into thinking that the queries
> displayed were in fact prepared queries.
>
> Maybe we could gather some more ideas.
>

I think thats a reasonable concern - the main benefit of $1 is that its
already designated as something that can replace a constant, and still be
read by the Postgres parser.

Is there any other character that has the same properties?

I'll also note that Greg Stark mentioned in [0] that "There's another
feature pg_stat_statements *really* needs. A way to convert a jumbled
statement into one that can be prepared easily. The use of ? instead of :1
:2 etc makes this a mechanical but annoying process."

Using $1, $2, etc. for jumbling statements would give us that for free, no
additional effort needed.

[0] https://www.postgresql.org/message-id/CAM-w4HNOeNW6pY_1%
3DLp1aJGMmZ_R6S8JHjqvJMv8-%3DOf3q1q0w%40mail.gmail.com

Best,
Lukas

-- 
Lukas Fittl


[HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements

2017-02-28 Thread Lukas Fittl
Hi,

Currently pg_stat_statements replaces constant values with ? characters.
I've seen this be a problem on multiple occasions, in particular since it
conflicts with the use of ? as an operator.

I'd like to propose changing the replacement character from ? to instead be
a parameter (like $1).

My main motiviation is to aid external tools that parse pg_stat_statements
output (like [0]), and currently have to resort to complex parser patches
to distinguish operators from replacement characters.

First of all, attached 0001_pgss_additional_regression_tests.v1.patch which
increases regression test coverage for a few scenarios relevant to this
discussion.

Then, there is two variants I've prepared, only one of the two to be
committed:

A) 0002_pgss_mask_with_incrementing_params.v1.patch: Replace constants with
$1, $2, $3 (etc) in the normalized query, whilst respecting any existing
params in the counting

B) 0003_pgss_mask_with_zero_param.v1.patch: Replace constants with $0 in
the normalized query

Ideally I'd see A turn into something we can commit, but it involves a bit
more computation to get the parameter numbers right. The benefit is that we
could use PREPARE/EXECUTE with pg_stat_statements output.

B is intentionally simple, and would address the primary issue at hand
(distinguishing from the ? operator).

I'm also adding this patch to the commitfest starting tomorrow.

Best,
Lukas

[0] https://github.com/lfittl/pg_query#parsing-a-normalized-query

-- 
Lukas Fittl


0001_pgss_additional_regression_tests.v1.patch
Description: Binary data


0002_pgss_mask_with_incrementing_params.v1.patch
Description: Binary data


0003_pgss_mask_with_zero_param.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements and non default search_path

2016-10-16 Thread Lukas Fittl
On Sat, Oct 15, 2016 at 11:29 PM, Julien Rouhaud <julien.rouh...@dalibo.com>
wrote:
>
> > BTW, after thinking about it some more, I don't see how storing the
> > search_path would help at all... it's not like you can do anything with
> > it unless you have a huge chunk of the parser available.
> >
>
> My use case is not really to know the fully qualified name of each
> identifier, but being able to optimize a problematic query found with
> pg_stat_statements.  I can already "unjumble" it automatically, but the
> search_path is needed to be able to get an explain or just execute the
> query.
>

I'd also find having the search_path available to be a helpful benefit -
I've run into the same problem as Julien where two identical queries
couldn't be identified correctly because of the missing search_path.

In particular this situation might happen if you shard different tenants
using schemas (one schema for each tenant), and use the search_path to set
the current tenant.

In my own setup I combine pg_stat_statements with a slightly hacked up copy
of the Postgres parser, so having the correct search_path + query text
available is enough to find the objects a query references (most of the
time, there are a few other edge cases).

My assumption thus far has been that adding another field like this might
not be considered because of performance considerations.

Can somebody chime in if it would be feasible to store this in the
out-of-band query text file, and whether a patch for this would be
considered acceptable?

Best,
Lukas

-- 
Lukas Fittl

Skype: lfittl
Phone: +1 415 321 0630


Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-10 Thread Lukas Fittl
On Wed, Aug 10, 2016 at 4:24 PM, Kisung Kim <ks...@bitnine.net> wrote:
>
> When I used the index bloating estimation script in
> https://github.com/ioguix/pgsql-bloat-estimation,
> the result is as follows:
>

Regardless of the issue at hand, it might make sense to verify these
statistics using pgstattuple - those bloat estimation queries can be wildly
off at times.

See also https://www.postgresql.org/docs/9.5/static/pgstattuple.html as
well as https://www.keithf4.com/checking-for-postgresql-bloat/

Best,
Lukas

-- 
Lukas Fittl

Skype: lfittl
Phone: +1 415 321 0630


Re: [HACKERS] Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)

2015-11-25 Thread Lukas Fittl
On Mon, Nov 23, 2015 at 11:53 PM, Peter Geoghegan <p...@heroku.com> wrote:

> One specific justification he gave for not using pg_stat_statements was:
>
> "Doesn’t merge bind vars in IN()" (See slide #11)
>
> I wonder:
>
> * How do other people feel about this? Personally, I've seen enough
> problems of this kind in the field that "slippery slope" arguments
> against this don't seem very compelling.
>

As someone who runs a little monitoring service thats solely based on
pg_stat_statements data, ignoring IN list length would certainly be a good
change.

We currently do this in post-processing, together with other data cleanup
(e.g. ignoring the length of a VALUES list in an INSERT statement).

Given the fact that pgss data is normalized & you don't know which plan was
chosen, I don't think distinguishing based on the length of the list helps
anyone really.

I see pg_stat_statements as a high-level overview of which queries have
run, and which ones you might want to look into closer using e.g.
auto_explain.

Best,
Lukas

-- 
Lukas Fittl

Skype: lfittl
Phone: +1 415 321 0630


Re: [HACKERS] SuperUser check in pg_stat_statements

2015-10-20 Thread Lukas Fittl
Rajan,

I'll reply off-list since this isn't the right discussion for -hackers.

Best,
Lukas

On Tue, Oct 20, 2015 at 7:02 AM, rajan <vgmon...@gmail.com> wrote:

> Hey Lukas,
>
> Thanks. Able to see the queries from all users. Can you explain the
> monitoring.get_stat_statements()?
>
>
>
> --
> View this message in context:
> http://postgresql.nabble.com/SuperUser-check-in-pg-stat-statements-tp5870589p5870733.html
> Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Lukas Fittl

Skype: lfittl
Phone: +1 415 321 0630


Re: [HACKERS] SuperUser check in pg_stat_statements

2015-10-19 Thread Lukas Fittl
On Mon, Oct 19, 2015 at 3:12 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:

> On 10/19/15 3:48 PM, rajan wrote:
>
>> Thanks Stephen and Shulgin for your response.
>>
>> Will go through the patch and will try to solve my problem using that.
>>
>> My scenario is that i need to have an user who cannot be a super user but
>> a
>> monitor user, who will be able to see all the queries executed by all
>> users.
>>
>
> You can set that up today by defining a view on top of pg_stat_statements
> (or maybe it needs a SECDEF SRF... been a while since I've done it).


You can solve this using a security definer method created by a superuser,
see

https://gist.github.com/lfittl/9ee78ac200e4e7ebe33d

for a full example.

-- 
Lukas Fittl

Skype: lfittl
Phone: +1 415 321 0630


Re: [HACKERS] reparsing query

2015-04-15 Thread Lukas Fittl
On Wed, Apr 15, 2015 at 8:39 PM, Qingqing Zhou zhouqq.postg...@gmail.com
wrote:

 It is not difficult to output parsed query in some tool readable
 format but it comes with a maintain overhead: once tools rely on it,
 we have to conform to some schema continuously, like the xml/xmlns. Do
 we want to take this? Depends on how far the tools can go with this
 exposed information.

 We actually already have explain command in json format and tools
 drawing/analyzing query plan rely on it. Will that work for your
 scenario?


Sure, that would work - but as I understand adding an explicit SQL command
(or function) is not something that would ever be merged into Postgres
core. Especially since that command's output could easily change across
versions.

My thought was more along the lines of making something like raw_parser +
nodeToString available through an extension, but with a JSON output format.

Note that an important detail in the monitoring case is that you don't
necessarily have the statement's underlying relations available (since you
might work with statistics data on a different machine).

Best,
Lukas

-- 
Lukas Fittl

Skype: lfittl
Phone: +1 415 321 0630


Re: [HACKERS] reparsing query

2015-04-15 Thread Lukas Fittl
On Wed, Apr 15, 2015 at 2:43 PM, Qingqing Zhou zhouqq.postg...@gmail.com
wrote:

 Is this a proposal to have a better formatted (JSON etc)
 debug_print_parse results?


I've run into the need for this as well for monitoring purposes, my
solution was to compile the needed object files into a library (see [0]).

It'd be interesting to explore if there is some way to make this less
hack-ish, and enable tools to parse queries in a better way. Essentially
what is needed is some way to reliably translate SQL into an AST-like
output, from an outside tool, whilst reusing the current PostgreSQL parser.

I believe the recent work on deparsing DDL statements relates to some of
that (i.e. might be re-usable for this purpose, through an extension), if
I'm understanding correctly.

[0]: https://github.com/pganalyze/pg_query

Best,
Lukas

-- 
Lukas Fittl

Skype: lfittl
Phone: +1 415 321 0630


Re: [HACKERS] List of table names of a DB

2015-01-08 Thread Lukas Fittl
On Fri, Jan 9, 2015 at 7:14 AM, Deepak S in.live...@live.in wrote:

 Sorry, it's not about querying. I am implementing an invalidation
 mechanism for Postgres Query Cache as part of my masters project. In order
 to this, I need to store details(like name) of each table the query uses.
 In essence, I need to store the table names of the cached queries.

 Initially, I thought of writing a code that could extract the table names
 but later discovered that it is a gargantuan task as I shall have to
 include around 600 production rules as was hinted in a Stackoverflow
 Exchange post. Hence, I thought of getting hold of the data structure used
 for storing table names of a DB but I couldn't get it.


For prototyping you might also find https://github.com/pganalyze/pg_query
useful.

Its a Ruby-based library that makes the Postgres parser easier to access
from the outside, getting a list of tables from a query is trivial - but if
you need the oids you'll have to do it like pgpool does.

(feel free to ping me off-list about this)

Best,

-- 
Lukas Fittl

Skype: lfittl
Phone: +43 6991 2770651