Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c

2016-08-09 Thread Dilip Kumar
On Wed, Aug 10, 2016 at 10:04 AM, Dilip Kumar  wrote:
> This seems better. After checking other places, I found that for an
> invalid type we use ERRCODE_UNDEFINED_OBJECT and for invalid
> functions we use ERRCODE_UNDEFINED_FUNCTION, so I have done it the
> same way.
>
> Updated patch attached.

I found some more places where we can get a similar error and updated
them in my v3 patch.

I don't claim that it fixes all errors of this kind, but at least it
fixes the error reported by sqlsmith plus some additional ones I found
in the same area.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


cache_lookup_failure_v3.patch
Description: Binary data



Re: [HACKERS] PL/Python adding support for multi-dimensional arrays

2016-08-09 Thread Pavel Stehule
Hi

2016-08-03 13:54 GMT+02:00 Alexey Grishchenko :

> On Wed, Aug 3, 2016 at 12:49 PM, Alexey Grishchenko <
> agrishche...@pivotal.io> wrote:
>
>> Hi
>>
>> Current implementation of PL/Python does not allow the use of
>> multi-dimensional arrays, for both input and output parameters. This forces
>> end users to introduce workarounds like casting arrays to text before
>> passing them to the functions and parsing them after, which is an
>> error-prone approach
>>
>> This patch adds support for multi-dimensional arrays as both input and
>> output parameters for PL/Python functions. The number of dimensions
>> supported is limited by Postgres MAXDIM macrovariable, by default equal to
>> 6. Both input and output multi-dimensional arrays should have fixed
>> dimension sizes, i.e. 2-d arrays should represent MxN matrix, 3-d arrays
>> represent MxNxK cube, etc.
>>
>> This patch does not support multi-dimensional arrays of composite types,
>> as composite types in Python might be represented as iterators and there is
>> no obvious way to find out when the nested array stops and composite type
>> structure starts. For example, if we have a composite type of (int, text),
>> we can try to return "[ [ [1,'a'], [2,'b'] ], [ [3,'c'], [4,'d'] ] ]", and
>> it is hard to find out that the first two lists are lists, and the third
>> one represents structure. Things are getting even more complex when you
>> have arrays as members of composite type. This is why I think this
>> limitation is reasonable.
>>
>> Given the function:
>>
>> CREATE FUNCTION test_type_conversion_array_int4(x int4[]) RETURNS int4[]
>> AS $$
>> plpy.info(x, type(x))
>> return x
>> $$ LANGUAGE plpythonu;
>>
>> Before patch:
>>
>> # SELECT * FROM test_type_conversion_array_int4(ARRAY[[1,2,3],[4,5,6]]);
>> ERROR:  cannot convert multidimensional array to Python list
>> DETAIL:  PL/Python only supports one-dimensional arrays.
>> CONTEXT:  PL/Python function "test_type_conversion_array_int4"
>>
>>
>> After patch:
>>
>> # SELECT * FROM test_type_conversion_array_int4(ARRAY[[1,2,3],[4,5,6]]);
>> INFO:  ([[1, 2, 3], [4, 5, 6]], )
>>  test_type_conversion_array_int4
>> -
>>  {{1,2,3},{4,5,6}}
>> (1 row)
>>
>>
>> --
>> Best regards,
>> Alexey Grishchenko
>>
>
> Also this patch incorporates the fix for
> https://www.postgresql.org/message-id/CAH38_tkwA5qgLV8zPN1OpPzhtkNKQb30n3xq-2NR9jUfv3qwHA%40mail.gmail.com,
> as they touch the same piece of code - array manipulation in PL/Python
>
>
I am sending a review of this patch:

1. The implemented functionality is a clear benefit - passing multi-dimensional
arrays, and noticeably faster passing of bigger arrays
2. I was able to use this patch cleanly, without any errors or warnings
3. There are no errors or warnings
4. All tests passed - I tested with Python 2.7 and Python 3.5
5. The code is well commented and clean
6. For this new functionality the documentation is not necessary

7. I would like to see more regression tests, in both directions (Python <->
Postgres), for more than two dimensions

My only objection is that there are not enough regression tests - after fixing
that, this patch will be ready for committers.
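
Something along these lines, for example (only a sketch - the exact expected
INFO output would have to be verified against the patch):

CREATE FUNCTION test_type_conversion_array_int4_3d(x int4[]) RETURNS int4[]
AS $$
plpy.info(x, type(x))
return x
$$ LANGUAGE plpythonu;

-- 2x2x2 cube, round-tripped through Python and back
SELECT * FROM test_type_conversion_array_int4_3d(
    ARRAY[[[1,2],[3,4]],[[5,6],[7,8]]]);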

Good work, Alexey

Thank you

Regards

Pavel


> --
> Best regards,
> Alexey Grishchenko
>


Re: [HACKERS] Small issues in syncrep.c

2016-08-09 Thread Michael Paquier
On Tue, Aug 9, 2016 at 5:34 PM, Julien Rouhaud
 wrote:
> Since 14e8803f1, it's not necessary to acquire SyncRepLock to see
> up-to-date data. But it looks like this commit didn't update all the
> comments around MyProc->syncRepState, which still mention retrieving the
> value with and without the lock.  Also, I think there's a now-unneeded
> test that tries to retrieve syncRepState again.
>
> Patch attached to fix both small issues, present since 9.5.

You could directly check MyProc->syncRepState and remove syncRepState.
Could you add it to the next commit fest? I don't think this will get
into 9.6 as this is an optimization.
-- 
Michael




Re: [HACKERS] Wait events monitoring future development

2016-08-09 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> Let's put this in perspective: there's tons of companies that spend thousands
> of dollars per month extra by running un-tuned systems in cloud environments.
> I almost called that "waste" but in reality it should be a simple business
> question: is it worth more to the company to spend resources on reducing
> the AWS bill or rolling out new features?
> It's something that can be estimated and a rational business decision made.
> 
> Where things become completely *irrational* is when a developer reads
> something like "plpgsql blocks with an EXCEPTION handler are more expensive"
> and they freak out and spend a bunch of time trying to avoid them, without
> even the faintest idea of what that overhead actually is.
> More important, they haven't the faintest idea of what that overhead costs
> the company, vs what it costs the company for them to spend an extra hour
> trying to avoid the EXCEPTION (and probably introducing code that's far
> more bug-prone in the process).
> 
> So in reality, the only people likely to notice even something as large
> as a 10% hit are those that were already close to maxing out their hardware
> anyway.
> 
> The downside to leaving stuff like this off by default is users won't
> remember it's there when they need it. At best, that means they spend more
> time debugging something than they need to. At worse, it means they suffer
> a production outage for longer than they need to, and that can easily exceed
> many months/years worth of the extra cost from the monitoring overhead.

I rather like this way of positive thinking.  It is better to think of 
event monitoring as a positive feature for (daily) proactive improvement, 
not only as a debugging feature, which carries a negative image.  For example, 
pgAdmin4 could display the 10 most time-consuming events and their solutions.  Say the 
DBA initially places the database and WAL on the same volume.  As the system 
grows and the write workload increases, the DBA can get a suggestion from 
pgAdmin4 to prepare for the system growth by placing WAL on another 
volume to reduce WALWriteLock wait events.  This is not debugging, but 
proactive monitoring.


> > As another idea, we can stand on the middle ground.  Interestingly, MySQL
> > also enables their event monitoring (Performance Schema) by default, but
> > not all events are collected.  I guess highly encountered events are not
> > collected by default to minimize the overhead.
> 
> That's what we currently do with several track_* and log_*_stats GUCs,
> several of which I forgot even existed until just now. Since there's question
> over the actual overhead maybe that's a prudent approach for now, but I
> think we should be striving to enable these things ASAP.

Agreed.  And as Bruce said, it may be better to be able to disable collection 
of some events that have a visible impact on performance.

Regards
Takayuki Tsunakawa




Re: [HACKERS] multivariate statistics (v19)

2016-08-09 Thread Michael Paquier
On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra
 wrote:
> 1) enriching the query tree with multivariate statistics info
>
> Right now all the stuff related to multivariate statistics estimation
> happens in clausesel.c - matching condition to statistics, selection of
> statistics to use (if there are multiple usable stats), etc. So pretty much
> all this info is internal to clausesel.c and does not get outside.

This does not seem bad to me at first sight, but...

> I'm starting to think that some of the steps (matching quals to stats,
> selection of stats) should happen in a "preprocess" step before the actual
> estimation, storing the information (which stats to use, etc.) in a new type
> of node in the query tree - something like RestrictInfo.
>
> I believe this needs to happen sometime after deconstruct_jointree() as that
> builds RestrictInfos nodes, and looking at planmain.c, right after
> extract_restriction_or_clauses seems about right. Haven't tried, though.
>
> This would move all the "statistics selection" logic from clausesel.c,
> separating it from the "actual estimation" and simplifying the code.
>
> But more importantly, I think we'll need to show some of the data in EXPLAIN
> output. With per-column statistics it's fairly straightforward to determine
> which statistics are used and how. But with multivariate stats things are
> often more complicated - there may be multiple candidate statistics (e.g.
> histograms covering different subsets of the conditions), it's possible to
> apply them in different orders, etc.
>
> But EXPLAIN can't show the info if it's ephemeral and available only within
> clausesel.c (and thrown away after the estimation).

This gives a good reason not to do all of that in clausesel.c; it would be
really cool to be able to get some information regarding the stats
used with a simple EXPLAIN.

> 2) combining multiple statistics
>
> I think the ability to combine multivariate statistics (covering different
> subsets of conditions) is important and useful, but I'm starting to think
> that the current implementation may not be the correct one (which is why I
> haven't written the SGML docs about this part of the patch series yet).
>
> Assume there's a table "t" with 3 columns (a, b, c), and that we're
> estimating query:
>
>SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3
>
> but that we only have two statistics (a,b) and (b,c). The current patch does
> about this:
>
>P(a=1,b=2,c=3) = P(a=1,b=2) * P(c=3|b=2)
>
> i.e. it estimates the first two conditions using (a,b), and then estimates
> (c=3) using (b,c) with "b=2" as a condition. Now, this is very efficient,
> but it only works as long as the query contains conditions "connecting" the
> two statistics. So if we remove the "b=2" condition from the query, this
> stops working.

This is trying to make the algorithm smarter than the user, which is
something I'd think we could live without. In this case statistics on
(a,c) or (a,b,c) are missing. And what if the user does not want to
make use of stats for (a,c) because he only defined (a,b) and (b,c)?
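
To make the scenario concrete, it maps to something like this (a sketch; the
WITH option name is a guess based on this patch series, and the table and
statistics names are made up):

CREATE TABLE t (a int, b int, c int);

-- only these two statistics exist; there is no (a,c) or (a,b,c) statistic
CREATE STATISTICS s_ab ON t (a, b) WITH (histogram);
CREATE STATISTICS s_bc ON t (b, c) WITH (histogram);

-- estimated as P(a=1,b=2) from s_ab times P(c=3 | b=2) from s_bc
SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;

-- drop the b=2 condition and the two statistics can no longer be chained
SELECT * FROM t WHERE a = 1 AND c = 3;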

Patch 0001: there have been comments about that before, and you have
put the checks on RestrictInfo in a couple of variables of
pull_varnos_walker, so nothing to say from here.

Patch 0002:
+  
+   CREATE STATISTICS will create a new multivariate
+   statistics on the table. The statistics will be created in the in the
+   current database. The statistics will be owned by the user issuing
+   the command.
+  
s/in the in the/in the/.

+  
+   Create table t1 with two functionally dependent columns, i.e.
+   knowledge of a value in the first column is sufficient for detemining the
+   value in the other column. Then functional dependencies are built on those
+   columns:
s/detemining/determining/

+  
+   If a schema name is given (for example, CREATE STATISTICS
+   myschema.mystat ...) then the statistics is created in the specified
+   schema.  Otherwise it is created in the current schema.  The name of
+   the table must be distinct from the name of any other statistics in the
+   same schema.
+  
I would just assume that a statistics object is located in the schema of the
relation it depends on. So it may be better to just do this:
- Register the OID of the table a statistics depends on but not the schema.
- Give up on those query extensions related to the schema.
- Allow the same statistics name to be used for multiple tables.
- Just fail if a statistics name is being reused on the table again.
It may be better to complain about that even if the column list is
different.
- Register the dependency between the statistics and the table.

+ALTER STATISTICS name
OWNER TO { new_owner |
CURRENT_USER | SESSION_USER }
Along the same lines, is OWNER TO really necessary? I would have assumed
that if a user is able to query the set of columns related to a
statistics object, he should have access to it.

=# create statistics aa_a_b3 on aam (a, b) with (dependencies);
ERROR:  

Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c

2016-08-09 Thread Dilip Kumar
On Tue, Aug 9, 2016 at 6:49 PM, Amit Kapila  wrote:
> Okay, then how about ERRCODE_UNDEFINED_OBJECT?  It seems to be used in
> almost similar cases.

This seems better, after checking at other places I found that for
invalid type we are using ERRCODE_UNDEFINED_OBJECT and for invalid
functions we are using ERRCODE_UNDEFINED_FUNCTION. So I have done the
same way.

Updated patch attached.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


cache_lookup_failure_v2.patch
Description: Binary data



Re: [HACKERS] dsm_unpin_segment

2016-08-09 Thread Thomas Munro
On Wed, Aug 10, 2016 at 2:43 PM, Jim Nasby  wrote:
> On 8/9/16 6:14 PM, Thomas Munro wrote:
>> They can't be static; they need to be in shared memory, because we also
>> want to protect against two *different* backends pinning it.
>
> Right, this would strictly protect from it happening within a single
> backend. Perhaps it's pointless for pin/unpin, but it seems like it would be
> a good thing to have for attach/detach.

Double attach already gets you:

elog(ERROR, "can't attach the same segment more than once");

Double detach is conceptually like double free, and in fact actually
leads to a double pfree of the backend-local dsm_segment object.  It
doesn't seem like the kind of thing it's easy or reasonable to try to
defend against, since you have no space left in which to store the
state you need to detect double-frees after you've done it once!

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-09 Thread Pavan Deolasee
On Tue, Aug 9, 2016 at 12:06 AM, Claudio Freire 
wrote:

> On Mon, Aug 8, 2016 at 2:52 PM, Pavan Deolasee 
> wrote:
>
> > Some heuristics and limits on amount of work done to detect duplicate
> index
> > entries will help too.
>
> I think I prefer a more thorough approach.
>
> Increment/decrement may get very complicated with custom opclasses,
> for instance. A column-change bitmap won't know how to handle
> funcional indexes, etc.
>
> What I intend to try, is modify btree to allow efficient search of a
> key-ctid pair, by adding the ctid to the sort order.


Yes, that's a good medium term plan. And this is kind of independent from
the core idea. So I'll go ahead and write a patch that implements the core
idea and some basic optimizations. We can then try different approaches
such as tracking changed columns, tracking increment/decrement and teaching
btree to skip duplicate (key, CTID) entries.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] dsm_unpin_segment

2016-08-09 Thread Jim Nasby

On 8/9/16 6:14 PM, Thomas Munro wrote:

>> Could a couple of static variables be used to ensure multiple pin/unpin and
>> attach/detach calls throw an assert() (or become a no-op if asserts are
>> disabled)? It would be nice if we could protect users from this.
>
> They can't be static; they need to be in shared memory, because we also
> want to protect against two *different* backends pinning it.

Right, this would strictly protect from it happening within a single
backend. Perhaps it's pointless for pin/unpin, but it seems like it
would be a good thing to have for attach/detach.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-09 Thread Jim Nasby

On 8/9/16 6:44 PM, Claudio Freire wrote:

> Since we can lookup all occurrences of k1=a index=0 and k2=a index=0,
> and in fact we probably did so already as part of the update logic


That's a change from what currently happens, right?

The reason I think that's important is that dropping the assumption that 
we can't safely re-find index entries from the heap opens up other 
optimizations, ones that should be significantly simpler to implement. 
The most obvious example being getting rid of full index scans in 
vacuum. While that won't help with write amplification, it would reduce 
the cost of vacuum enormously. Orders of magnitude wouldn't surprise me 
in the least.


If that's indeed a prerequisite to WARM it would be great to get that 
groundwork laid early so others could work on other optimizations it 
would enable.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-09 Thread Jim Nasby

On 8/9/16 6:46 PM, Claudio Freire wrote:

On Tue, Aug 9, 2016 at 8:44 PM, Claudio Freire  wrote:

index  0   1   2   3   4
k1  ..   a   a   a
k2  ..   b   a   a
i1   ^
i2   ^ ^
lp   u   u   r3
hot*



Sorry, that should read:

index  0   1   2   3   4
k1  ..   a   a   a
k2  ..   b   a   a
i1 ^
i2 ^
lp   u   u   r3
hot*



FWIW, your ascii-art is unfortunately getting bungled somewhere I think :/.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] handling unconvertible error messages

2016-08-09 Thread Kyotaro HORIGUCHI
Hello,

(I've recovered the lost Cc recipients so far)

At Mon, 8 Aug 2016 12:52:11 +0300, Victor Wagner  wrote in 
<20160808125211.1361c...@fafnir.local.vm>
> On Mon, 08 Aug 2016 18:28:57 +0900 (Tokyo Standard Time)
> Kyotaro HORIGUCHI  wrote:
> > 
> > I don't see charset compatibility to be easily detectable,
> 
> In the worst case we can hardcode explicit compatibility table.

We could have lists of languages compatible with certain
language-bound encodings - for example, LATIN1 (ISO/IEC 8859-1),
according to Wikipedia
(https://en.wikipedia.org/wiki/ISO/IEC_8859-1).

According to that list, we might have the following compatibility
list of locales, maybe without regions:

{{"UTF8", "LATIN1"}, "af", "sq", "eu", "da", "en", "fo", "en"}... and so on.

The biggest problem with this is that at least *I* cannot confirm the
validity of the list, both regarding how completely LATIN1 covers
all the languages in the list and whether any coverable language
has been omitted. Nonetheless, we could use such a list if we
accept the possible imperfection, which would eventually result in
either the original error (conversion failure) or an excess
fallback for possibly convertible languages; unfortunately the
latter would be unacceptable for table data.

> There is limited set of languages, which have translated error messages,
> and limited (albeit wide) set of encodings, supported by PostgreSQL. So

Yes, we could have a negative list of combinations already known to be
incompatible.

{{"UTF8", "LATIN1"}, "ru", .. er.. what else?}

ISO 639-1 seems to have about 190 languages, and most of them are
apparently incompatible with the LATIN1 encoding. It doesn't seem
good to me to have a haphazardly made negative list.

> it is possible to define complete list of encodings, compatible with
> some translation. And fall back to untranslated messages if client
> encoding is not in this list.
> 
> > because locale (or character set) is not a matter of PostgreSQL
> > (except for some encodings bound to one particular character
> > set)... So the conversion-fallback might be a only available
> > solution.
> 
> Conversion fallback may be a solution for data. For NLS-messages I think
> it is better to fall back to English (untranslated) messages than use of
> transliteration or something alike.

I suppose that 'fallback' means "have a try, then use English if that
fails", so I think it is suitable for messages rather than for
data, and it doesn't need any a priori information about
compatibility. It seems to me that PostgreSQL refuses to ignore
or conceal conversion errors and to return broken or unwanted byte
sequences for data.  Things are different for error messages: it
is preferable for them to be readable in some form rather than totally abandoned.

> I think that for now we can assume that the best effort is already done
> for the data, and think how to improve situation with messages.

Is there any source from which to learn the compatibility of any combination
of language and encoding? Maybe we need a solid ground for such a list.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-09 Thread Bruce Momjian
On Tue, Aug  9, 2016 at 06:19:57PM -0500, Jim Nasby wrote:
> On 8/8/16 3:19 PM, Bruce Momjian wrote:
> >>What will help, and something I haven't yet applied any thoughts, is when we
> >>> can turn WARM chains back to HOT by removing stale index entries.
> >I can't see how we can ever do that because we have multiple indexes
> >pointing to the chain, and keys that might be duplicated if we switched
> >to HOT.  Seems only VACUUM can fix that.
> 
> Are these changes still predicated on being able to re-find all index
> entries by key value? If so, that makes incremental vacuums practical,
> perhaps eliminating a lot of these issues.

No, the increment/decrement doesn't require btree lookups.

>  > We can't use the bits LP_REDIRECT lp_len because we need to create WARM
>  > chains before pruning, and I don't think walking the pre-pruned chain 
>  > is
>  > worth it.  (As I understand HOT, LP_REDIRECT is only created during
>  > pruning.)
> >>>
> >>> That's correct. But lp_len provides us some place to stash information 
> >>> from
> >>> heap tuples when they are pruned.
> >Right.  However, I see storing information at prune time as only useful
> >if you are willing to scan the chain, and frankly, I have given up on
> >chain scanning (with column comparisons) as being too expensive for
> >its limited value.
> 
> What if some of this work happened asynchronously? I'm thinking something
> that runs through shared_buffers in front of bgwriter.

Well, we prune when we need the space --- that is the big value of page
pruning: it is fast and can be done when you need it.  VACUUM is all
about background cleanup.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Notice lock waits

2016-08-09 Thread Jim Nasby

On 8/5/16 12:00 PM, Jeff Janes wrote:

> So I created a new guc, notice_lock_waits, which acts like
> log_lock_waits but sends the message as NOTICE so it will show up on
> interactive connections like psql.


I would strongly prefer that this accept a log level instead of being 
hard-coded to NOTICE. The reason is that I find the NOTICE chatter from 
many DDL commands to be completely worthless (looking at you %TYPE), so 
I normally set client_min_messages to WARNING in DDL scripts. I can work 
on that patch; would it essentially be a matter of changing 
notice_lock_waits to int lock_wait_level?
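
Roughly, the difference from a psql session would be the following (both GUC
names here are proposals from this thread, not existing settings):

-- Jeff's proposal: boolean, always reports at NOTICE
SET notice_lock_waits = on;

-- level-based variant: DDL scripts that run with client_min_messages set
-- to warning could still see lock-wait reports
SET client_min_messages = warning;
SET lock_wait_level = warning;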

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-09 Thread Claudio Freire
On Tue, Aug 9, 2016 at 8:44 PM, Claudio Freire  wrote:
> index  0   1   2   3   4
> k1  ..   a   a   a
> k2  ..   b   a   a
> i1   ^
> i2   ^ ^
> lp   u   u   r3
> hot*


Sorry, that should read:

index  0   1   2   3   4
k1  ..   a   a   a
k2  ..   b   a   a
i1 ^
i2 ^
lp   u   u   r3
hot*




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-09 Thread Claudio Freire
On Tue, Aug 9, 2016 at 8:19 PM, Jim Nasby  wrote:
> On 8/8/16 3:19 PM, Bruce Momjian wrote:
>>>
>>> What will help, and something I haven't yet applied any thoughts, is when
>>> we
>>> > can turn WARM chains back to HOT by removing stale index entries.
>>
>> I can't see how we can ever do that because we have multiple indexes
>> pointing to the chain, and keys that might be duplicated if we switched
>> to HOT.  Seems only VACUUM can fix that.
>
>
> Are these changes still predicated on being able to re-find all index
> entries by key value? If so, that makes incremental vacuums practical,
> perhaps eliminating a lot of these issues.
>
 > > We can't use the bits LP_REDIRECT lp_len because we need to create
 > > WARM
 > > chains before pruning, and I don't think walking the pre-pruned
 > > chain is
 > > worth it.  (As I understand HOT, LP_REDIRECT is only created during
 > > pruning.)
>>>
>>> >
>>> > That's correct. But lp_len provides us some place to stash information
>>> > from
>>> > heap tuples when they are pruned.
>>
>> Right.  However, I see storing information at prune time as only useful
>> if you are willing to scan the chain, and frankly, I have given up on
>> chain scanning (with column comparisons) as being too expensive for
>> its limited value.
>
>
> What if some of this work happened asynchronously? I'm thinking something
> that runs through shared_buffers in front of bgwriter.

If one can find key-ctid pairs efficiently in the index, this can be
done during WARM pruning (ie: during updates, when we're already doing
the index lookups anyway).

Suppose you have the following chain:

index  0   1   2   3   4
k1  a   a   a   a   a
k2  a   a   b   a   a
i1   ^
i2   ^^
hot  **

If versions 0-2 die, pruning can free 1 (it's HOT), and leave
redirects in 0 and 2:

index  0   1   2   3   4
k1  a.   a   a   a
k2  a.   b   a   a
i1   ^
i2   ^ ^
lp  r2   u   r3
hot*

Since we can lookup all occurrences of k1=a index=0 and k2=a index=0,
and in fact we probably did so already as part of the update logic, we
can remove the first redirect by pointing indexes to 2:

index  0   1   2   3   4
k1  ..   a   a   a
k2  ..   b   a   a
i1   ^
i2   ^ ^
lp   u   u   r3
hot*

So WARM pruning would have to happen just prior to adding a WARM tuple
to be able to reuse all the index lookup work to do the index pruning
with the writing.

The indexam interface will get considerably more complex in order to
do this, however. So perhaps it would be ok to do 2 independent
lookups instead? I'm undecided yet.

But one way or the other, pruning can happen early and incrementally.




Re: [HACKERS] Wait events monitoring future development

2016-08-09 Thread Jim Nasby

On 8/8/16 11:07 PM, Tsunakawa, Takayuki wrote:

> From: pgsql-hackers-ow...@postgresql.org

>> If you want to know why people are against enabling this monitoring by
>> default, above is the reason.  What percentage of people do you think would
>> be willing to take a 10% performance penalty for monitoring like this?  I
>> would bet very few, but the argument above doesn't seem to address the fact
>> it is a small percentage.
>>
>> In fact, the argument above goes even farther, saying that we should enable
>> it all the time because people will be unwilling to enable it on their own.
>> I have to question the value of the information if users are not willing
>> to enable it.  And the solution proposed is to force the 10% default overhead
>> on everyone, whether they are currently doing debugging, whether they will
>> ever do this level of debugging, because people will be too scared to enable
>> it.  (Yes, I think Oracle took this
>> approach.)



Let's put this in perspective: there's tons of companies that spend 
thousands of dollars per month extra by running un-tuned systems in 
cloud environments. I almost called that "waste" but in reality it 
should be a simple business question: is it worth more to the company to 
spend resources on reducing the AWS bill or rolling out new features? 
It's something that can be estimated and a rational business decision made.


Where things become completely *irrational* is when a developer reads 
something like "plpgsql blocks with an EXCEPTION handler are more 
expensive" and they freak out and spend a bunch of time trying to avoid 
them, without even the faintest idea of what that overhead actually is. 
More important, they haven't the faintest idea of what that overhead 
costs the company, vs what it costs the company for them to spend an 
extra hour trying to avoid the EXCEPTION (and probably introducing code 
that's far more bug-prone in the process).


So in reality, the only people likely to notice even something as large 
as a 10% hit are those that were already close to maxing out their 
hardware anyway.


The downside to leaving stuff like this off by default is users won't 
remember it's there when they need it. At best, that means they spend 
more time debugging something than they need to. At worse, it means they 
suffer a production outage for longer than they need to, and that can 
easily exceed many months/years worth of the extra cost from the 
monitoring overhead.



>> We can talk about this feature all we want, but if we are not willing to
>> be realistic in how much performance penalty the _average_ user is willing
>> to lose to have this monitoring, I fear we will make little progress on
>> this feature.

> OK, 10% was an overstatement.  Anyway, as Amit said, we can discuss the default
> value based on the performance evaluation before release.
>
> As another idea, we can stand on the middle ground.  Interestingly, MySQL also
> enables their event monitoring (Performance Schema) by default, but not all
> events are collected.  I guess highly encountered events are not collected by
> default to minimize the overhead.


That's what we currently do with several track_* and log_*_stats GUCs, 
several of which I forgot even existed until just now. Since there's 
question over the actual overhead maybe that's a prudent approach for 
now, but I think we should be striving to enable these things ASAP.
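
For reference, a few of the existing knobs in that category (defaults quoted
from memory, worth double-checking):

SHOW track_io_timing;        -- off by default
SHOW track_functions;        -- 'none' by default
SHOW log_statement_stats;    -- off by default
SHOW log_executor_stats;     -- off by default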

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] dsm_unpin_segment

2016-08-09 Thread Thomas Munro
On Wed, Aug 10, 2016 at 10:38 AM, Jim Nasby  wrote:
> On 8/9/16 1:01 PM, Robert Haas wrote:
>>
>> However, I don't see the need for a full-blown request
>> counter here; we've had this API for several releases now and to my
>> knowledge nobody has complained about the fact that you aren't
>> supposed to call dsm_pin_segment() multiple times for the same
>> segment.
>
>
> Could a couple of static variables be used to ensure multiple pin/unpin and
> attach/detach calls throw an assert() (or become a no-op if asserts are
> disabled)? It would be nice if we could protect users from this.

They can't be static; they need to be in shared memory, because we also
want to protect against two *different* backends pinning it.  The v2
patch protects users from this, by adding 'pinned' flag to the
per-segment control object that is visible everywhere, and then doing:

+   if (dsm_control->item[seg->control_slot].pinned)
+   elog(ERROR, "cannot pin a segment that is already pinned");

... and:

+   if (!dsm_control->item[i].pinned)
+   elog(ERROR, "cannot unpin a segment that is not pinned");

Those checks could arguably be assertions rather than errors, but I
don't think that pin/unpin operations will ever be high frequency
enough for it to be worth avoiding those instructions in production
builds.  The whole reason for pinning segments is likely to be able to
reuse them repeatedly, so nailing it down is likely something you'd
want to do immediately after creating it.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-09 Thread Jim Nasby

On 8/8/16 3:19 PM, Bruce Momjian wrote:

>> What will help, and something I haven't yet applied any thoughts, is when we
>> can turn WARM chains back to HOT by removing stale index entries.
>
> I can't see how we can ever do that because we have multiple indexes
> pointing to the chain, and keys that might be duplicated if we switched
> to HOT.  Seems only VACUUM can fix that.

Are these changes still predicated on being able to re-find all index
entries by key value? If so, that makes incremental vacuums practical,
perhaps eliminating a lot of these issues.

>>> We can't use the bits LP_REDIRECT lp_len because we need to create WARM
>>> chains before pruning, and I don't think walking the pre-pruned chain is
>>> worth it.  (As I understand HOT, LP_REDIRECT is only created during
>>> pruning.)
>>
>> That's correct. But lp_len provides us some place to stash information from
>> heap tuples when they are pruned.
>
> Right.  However, I see storing information at prune time as only useful
> if you are willing to scan the chain, and frankly, I have given up on
> chain scanning (with column comparisons) as being too expensive for
> its limited value.

What if some of this work happened asynchronously? I'm thinking
something that runs through shared_buffers in front of bgwriter.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] dsm_unpin_segment

2016-08-09 Thread Jim Nasby

On 8/9/16 1:01 PM, Robert Haas wrote:

> However, I don't see the need for a full-blown request
> counter here; we've had this API for several releases now and to my
> knowledge nobody has complained about the fact that you aren't
> supposed to call dsm_pin_segment() multiple times for the same
> segment.


Could a couple of static variables be used to ensure multiple pin/unpin 
and attach/detach calls throw an assert() (or become a no-op if asserts 
are disabled)? It would be nice if we could protect users from this.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] Slowness of extended protocol

2016-08-09 Thread Vladimir Sitnikov
Robert Haas:

> but for some reason you can't use prepared statements, for example because
> the queries are dynamically generated and you have to keep sending a
> different query text every time.  That case is analogous to -M
> extended, not -M prepared.  And -M extended is well-known to be SLOWER
> than -M simple.

I do not buy that "dynamically generated queries defeat server-prepared
usage" argument. It is just not true (see below).

Do you mean "in language X, where X != Java, it is impossible to implement a
query cache"?
That is just ridiculous.

At the end of the day, there will be a finite number of hot queries that
are important.
Here's relevant pgjdbc commit: https://github.com/pgjdbc/pgjdbc/pull/319
It works completely transparently to the application, and it does use
server-prepared statements even though the application builds "brand new" SQL
text every time.

It is not something theoretical, but it is something that is already
implemented and battle-tested. The application does build SQL texts based
on the names of tables and columns that are shown in the browser, and
pgjdbc uses its query cache (to map user SQL to a backend statement name), thus
it benefits from server-prepared statements automatically.

Not a single line change was required at the application side.

Am I missing something?
I cannot imagine a real-life case where an application throws 10,000+ UNIQUE
SQL texts per second at the database.
Cases like "where id=1", "where id=2", "where id=3" do not count, as they
should be written with bind variables, so they all represent a single SQL text
like "where id=$1".

Robert>you have to keep sending a different query text every time

Do you agree that the major part would be some hot queries, while the rest
would be much less frequently used (e.g. one-time queries)?

In OLTP applications the number of queries is high, and almost all the
queries are reused; server-prepared statements come to the rescue here, and
the "protocol optimization" would not be noticeable.

In DWH applications the queries might be unique, but the number of
queries is much lower, so the "protocol optimization" would be invisible,
as the query planning/processing time would be much higher than the gain
from the "protocol optimization".

Vladimir


Re: [HACKERS] per-statement-level INSTEAD OF triggers

2016-08-09 Thread Tom Lane
Robert Haas  writes:
> On Mon, Aug 8, 2016 at 4:40 AM, Yugo Nagata  wrote:
>> I'm asking out of curiosity: does anyone know why we don't have
>> per-statement-level INSTEAD OF triggers? I looked into the
>> standard SQL (20xx draft), but I can't find a restriction
>> such that INSTEAD OF triggers must be row-level. Are there
>> any technical difficulties, or other reasons for the current
>> implementation?

> I think one problem is that the trigger wouldn't have any way of
> knowing the specifics of what the user was trying to do.  It would
> just know the type of operation (INSERT, UPDATE, or DELETE).  I guess
> that could be useful to someone, but not all that useful.

It might be more useful after we get the infrastructure that Kevin's been
working on to allow collecting all the updates into a tuplestore that
could be passed to a statement-level trigger.  Right now I tend to agree
that there's little point.

regards, tom lane




Re: [HACKERS] per-statement-level INSTEAD OF triggers

2016-08-09 Thread Robert Haas
On Mon, Aug 8, 2016 at 4:40 AM, Yugo Nagata  wrote:
> I'm asking out of curiosity: does anyone know why we don't have
> per-statement-level INSTEAD OF triggers? I looked into the
> standard SQL (20xx draft), but I can't find a restriction
> such that INSTEAD OF triggers must be row-level. Are there
> any technical difficulties, or other reasons for the current
> implementation?

I think one problem is that the trigger wouldn't have any way of
knowing the specifics of what the user was trying to do.  It would
just know the type of operation (INSERT, UPDATE, or DELETE).  I guess
that could be useful to someone, but not all that useful.
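
For what it's worth, here is the shape of what works today versus what is
being asked about (a sketch; the exact error text may differ):

CREATE TABLE t (id int);
CREATE VIEW v AS SELECT id FROM t;

CREATE FUNCTION v_ins() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO t VALUES (NEW.id);
    RETURN NEW;
END $$;

-- row-level INSTEAD OF triggers on views are supported
CREATE TRIGGER v_ins_row INSTEAD OF INSERT ON v
    FOR EACH ROW EXECUTE PROCEDURE v_ins();

-- the statement-level variant under discussion is currently rejected:
--   CREATE TRIGGER v_ins_stmt INSTEAD OF INSERT ON v
--       FOR EACH STATEMENT EXECUTE PROCEDURE v_ins();
--   ERROR:  INSTEAD OF triggers must be FOR EACH ROW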

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical Replication WIP

2016-08-09 Thread Alvaro Herrera
Petr Jelinek wrote:
> On 09/08/16 10:13, Craig Ringer wrote:

> >The only argument I can see for not using bgworkers is for the
> >supervisor worker. It's a singleton that launches the per-database
> >workers, and arguably is a job that the postmaster could do better. The
> >current design there stems from its origins as an extension. Maybe
> >worker management could be simplified a bit as a result. I'd really
> >rather not invent yet another new and mostly duplicate category of
> >custom workers to achieve that though.
> 
> It is simplified compared to pglogical (there are only 2 worker types, not 3).
> I don't think it's the postmaster's job to scan catalogs, however, so it can't
> really start workers for logical replication. I actually modeled it more
> after autovacuum (using bgworkers though) than the original extension.

Yeah, it's a very bad idea to put postmaster on this task.  We should
definitely stay away from that.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical Replication WIP

2016-08-09 Thread Alvaro Herrera
Petr Jelinek wrote:
> On 09/08/16 12:16, Craig Ringer wrote:

> >Right. I think that's probably the direction we should be going
> >eventually. Personally I don't think such a change should block the
> >logical replication work from proceeding with bgworkers, though.
> 
> Yeah, that's why I added a local max GUC that just handles the logical worker
> limit within max_worker_processes. I didn't want to also write a generic
> framework for managing the max workers using tags or something as part of
> this; it's big enough as it is, and we can always move the limit to a more
> generic place once we have it.

Parallel query does exactly that: the workers are allocated from the
bgworkers array, and if you want more, it's on you to increase that
limit (it doesn't even have the GUC for a maximum).  As far as logical
replication and parallel query are concerned, that's fine.  We can
improve this later, if it proves to be a problem.

I think there are far more pressing matters to review.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-09 Thread Robert Haas
On Mon, Aug 8, 2016 at 12:22 AM, Etsuro Fujita
 wrote:
>>> I noticed that currently the core doesn't show any information on the
>>> target
>>> relations involved in a foreign/custom join in EXPLAIN, by itself.
>> I think that's a feature, not a bug.
> I agree with you.  I'd leave that for 10.0.

I don't want to change it at all, neither in 10 or any later version.

>> I disagree with that.  Currently, when we say that something is a join
>> (Merge Join, Hash Join, Nested Loop) we mean that the executor is
>> performing a join, but that's not the case here.  The executor is
>> performing a scan.  The remote side, we suppose, is performing a join
>> for us, but we are not performing a join: we are performing a scan.
>> So I think the fact that it shows up in the plan as "Foreign Scan" is
>> exactly right.  We are scanning some foreign thing, and that thing may
>> internally be doing other stuff, like joins and aggregates, but all
>> we're doing is scanning it.
>
> Hmm.  One thing I'm concerned about would be the case where direct
> modification is implemented by using GetForeignUpperPaths, not
> PlanDirectModify.  In that case, the way things are now, we would have
> "Foreign Scan" followed by an INSERT/UPDATE/DELETE query, but that seems odd
> to me.

I don't think there's really any problem there.  But if there is,
let's solve it when someone submits that patch, not now.

> One thing we need to do to leave that as is would be to fix a bug that I
> pointed out upthread.  Let me explain that again.  The EXPLAIN command
> selects relation aliases to be used in printing a query so that each
> selected alias is unique, but postgres_fdw doesn't consider that uniqueness.
> Here is an example:
>
> postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1, ft2
> t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2 where t1.a
> = t2.a) as t(t1a, t2a);
>  QUERY PLAN
> 
>  Unique  (cost=204.12..204.13 rows=2 width=8)
>Output: t1.a, t2.a
>->  Sort  (cost=204.12..204.12 rows=2 width=8)
>  Output: t1.a, t2.a
>  Sort Key: t1.a, t2.a
>  ->  Append  (cost=100.00..204.11 rows=2 width=8)
>->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
>  Output: t1.a, t2.a
>  Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
>  Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1 INNER
> JOIN public.t2 r2 ON (((r1.a = r2.a
>->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
>  Output: t1_1.a, t2_1.a
>  Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
>  Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1 INNER
> JOIN public.t2 r2 ON (((r1.a = r2.a
> (14 rows)
>
> The relation aliases in the Relations line in the second Foreign Scan, t1
> and t2 for ft1 and ft2, are not unique; they should be t1_1 and t2_1
> (compare the aliases in the Relations line with ones in the Output line
> directly above that, created by core).  The reason for that is because
> postgres_fdw has created the Relations info by using rte->eref->aliasname as
> relation aliases as is at path-creation time. Probably it would be a little
> bit too early for postgers_fdw to do that.  Couldn't postgres_fdw create
> that info after query planning, for example, during ExplainForeignScan?

Yes, it seems what we are doing now is not consistent with what
happens for plain tables; that should probably be fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Slowness of extended protocol

2016-08-09 Thread Robert Haas
On Tue, Aug 9, 2016 at 4:50 AM, Vladimir Sitnikov
 wrote:
> I've tried pgbench -M prepared, and it is way faster than pgbench -M simple.

That's true, but it's also testing something completely different from
what Shay is concerned about.  -M prepared creates a prepared
statement once, and then executes it many times.  That is, as you say,
faster.  But what Shay is concerned about is the case where you are
using the extended query protocol to send out-of-line parameters but
for some reason you can't use prepared statements, for example because
the queries are dynamically generated and you have to keep sending a
different query text every time.  That case is analogous to -M
extended, not -M prepared.  And -M extended is well-known to be SLOWER
than -M simple.  Here's a quick test on my laptop demonstrating this:

[rhaas ~]$ pgbench -M simple -S -T 10
starting vacuum...end.
transaction type: 
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 119244
latency average: 0.084 ms
tps = 11919.440321 (including connections establishing)
tps = 11923.229139 (excluding connections establishing)
[rhaas ~]$ pgbench -M prepared -S -T 10
starting vacuum...end.
transaction type: 
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 192100
latency average: 0.052 ms
tps = 19210.341944 (including connections establishing)
tps = 19214.820999 (excluding connections establishing)
[rhaas ~]$ pgbench -M extended -S -T 10
starting vacuum...end.
transaction type: 
scaling factor: 1
query mode: extended
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 104275
latency average: 0.096 ms
tps = 10425.510813 (including connections establishing)
tps = 10427.725239 (excluding connections establishing)

Shay is not worried about the people who are getting 19.2k TPS instead
of 11.9k TPS.  Those people are already happy.  He is worried about
the people who are getting 10.4k TPS instead of 11.9k TPS.  People who
can't use prepared statements because their query text varies - and I
have personally written multiple web applications that have exactly
that profile - suffer a big penalty if they choose to use the extended
query protocol to pass parameters.  Here, it's about a 13% performance
loss.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] phrase search TS_phrase_execute code readability patch

2016-08-09 Thread David G. Johnston
Hackers,

The attached attempts to make the code in "TS_phrase_execute" easier to
comprehend.  I posted something similar on the "typo patch" thread of July
2nd/5th, but my comments there reflected my misunderstanding of the distance
operator: it matches an exact distance rather than the less-than-or-equal
behavior I expected.

I had to make one assumption for the attached - that
"curitem->qoperator.distance" is always >= 0.  Given that assumption, the
need for the OR goes away.

I don't follow why LposStart is needed, so I removed it...

Not compiled or in any way tested... but it's not major surgery either -
mostly code comments to make following the logic easier.

David J.
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index ad5a254..215f58f 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -1474,14 +1474,14 @@ TS_phrase_execute(QueryItem *curitem,
          */
 
         Rpos = Rdata.pos;
-        LposStart = Ldata.pos;
         while (Rpos < Rdata.pos + Rdata.npos)
         {
             /*
-             * We need to check all possible distances, so reset Lpos to
-             * guaranteed not yet satisfied position.
+             * For each physically positioned right-side operand iterate over each
+             * instance of the left-side operand to locate one at exactly the specified
+             * distance.  As soon as a match is found move onto the next right-operand
+             * and continue searching starting with the next left-operand.
              */
-            Lpos = LposStart;
             while (Lpos < Ldata.pos + Ldata.npos)
             {
                 if (WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) ==
@@ -1490,16 +1490,16 @@ TS_phrase_execute(QueryItem *curitem,
                     /* MATCH! */
                     if (data)
                     {
-                        /* Store position for upper phrase operator */
+                        /* Store the position of the right-operand */
                         *pos_iter = WEP_GETPOS(*Rpos);
                         pos_iter++;
 
                         /*
-                         * Set left start position to next, because current
-                         * one could not satisfy distance for any other right
-                         * position
+                         * Move onto the next right-operand, and also the next
+                         * left-operand as the current one cannot be a match
+                         * for any other.
                          */
-                        LposStart = Lpos + 1;
+                        Lpos++;
                         break;
                     }
                     else
@@ -1512,18 +1512,23 @@ TS_phrase_execute(QueryItem *curitem,
                     }
 
                 }
-                else if (WEP_GETPOS(*Rpos) <= WEP_GETPOS(*Lpos) ||
-                         WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) <
+                else if (WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) <
                          curitem->qoperator.distance)
                 {
                     /*
-                     * Go to the next Rpos, because Lpos is ahead or on less
-                     * distance than required by current operator
+                     * We are now within the required distance so
+                     * continue onto the next right-operand and retry
+                     * the current left-operand to see if the added
+                     * distance causes it to match.
                      */
                     break;
 
                 }
 
+                /*
+                 * The current left-operand is too far away so
+                 * try the next one.
+                 */
                 Lpos++;
             }
 



Re: [HACKERS] Slowness of extended protocol

2016-08-09 Thread Shay Rojansky
On Tue, Aug 9, 2016 at 3:42 PM, Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Shay>But here's the more important general point. We're driver
> developers, not application developers. I don't really know what
> performance is "just fine" for each of my users, and what is not worth
> optimizing further. Users may follow best practices, or they may not for
> various reasons.
>
> Of course we cannot benchmark all the existing applications, however we
> should at least try to use "close to production" benchmarks.
>
> Let me recap: even "select 1" shows a clear advantage for reusing
> server-prepared statements.
> My machine shows the following results for a "select 1" pgbench:
> simple: 38K ops/sec (~26us/op)
> extended: 32K ops/sec (~31us/op)
> prepared: 47K ops/sec (~21us/op)
>
> Note: reusing server-prepared statements shaves 10us (out of 31us), while a
> "brand new ParseBindExecDeallocate" message would not be able to perform
> better than 26us/op (that is 5us worse than the prepared one). So it makes
> much more sense to invest time in "server-prepared statement reuse" at the
> client side and "improving Bind/Exec performance" at the backend side.
>
> For more complex queries the gap (prepared vs simple) would be much bigger,
> since parse/validate/plan for a complex query is a much more expensive
> operation than it is for "select 1".
>

You seem to be misunderstanding the fundamental point here. Nobody is
saying that prepared statements aren't a huge performance booster - they
are. I recommend them to anyone who asks. But you're saying "let's not
optimize anything else", whereas there are many programs out there *not*
using prepared statements for various reasons (e.g. pgbouncer, or simply an
existing codebase). If your opinion is that nothing should be done for
these users, fine - nobody's forcing you to do anything. I simply don't see
why *not* optimize the very widely-used single-statement execution path.


> Note: I do not mean "look, prepared always win". I mean: "if your
> environment does not allow reuse of prepared statements for some reason,
> you lose huge amount of time on re-parsing the queries, and it is worth
> fixing that obvious issue first".
>

Maybe it's worth fixing it, maybe not - that's going to depend on the
application. Some applications may be huge/legacy and hard to change,
others may depend on something like pgbouncer which doesn't allow it. Other
drivers out there probably don't persist prepared statements across
close/open, making prepared statements useless for short-lived scenarios.
Does the Python driver persist prepared statements? Does the Go driver do
so? If not the single-execution flow is very relevant for optimization.


> Shay>I don't see how reusing SQL text affects security in any way.
>
> Reusing SQL text makes application more secure as "build SQL on the fly"
> is prone to SQL injection security issues.
> So reusing SQL text makes application more secure and it enables
> server-prepared statements that improve performance considerably. It is a
> win-win.
>

We've all understood that server-prepared statements are good for
performance. But they aren't more or less vulnerable to SQL injection -
developers can just as well concatenate user-provided strings into a
prepared SQL text.

[deleting more comments trying to convince that prepared statements are
great for performance, which they are]

Shay>Again, in a world where prepared statements aren't persisted across
> connections (e.g. pgbouncer)
>
> pgbouncer does not properly support named statements, and that is
> pgbouncer's issue.
>
> Here's the issue for the pgbouncer project:
> https://github.com/pgbouncer/pgbouncer/issues/126#issuecomment-200900171
> The response from pgbouncer team is "all the protocol bits are there, it
> is just implementation from pgbouncer that is missing".
>
> By the way: I do not use pgbouncer, thus there's not much interest for me
> to invest time in fixing pgbouncer's issues.
>

Um, OK... So you're not at all bothered by the fact that the (probably)
most popular PostgreSQL connection pool is incompatible with your argument?
I'm trying to think about actual users and the actual software they use, so
pgbouncer is very relevant.

Shay>Any scenario where you open a relatively short-lived connection and
> execute something once is problematic - imagine a simple web service which
> needs to insert a single record into a table.
>
> I would assume the application does not use random strings for table names
> (and columns/aliases), thus it would result in typical SQL text reuse, thus
> it should trigger "server-side statement prepare" logic. In other words, that
> kind of application does not need the "new ParseBindExecDeallocate message
> we are talking about".
>

I wouldn't assume anything. Maybe the application does want to change
column names (which can't be parameterized). Maybe it needs the extended
protocol simply because it wants to do binary encoding, which isn't
possible with the simple protocol 

Re: [HACKERS] 9.6 phrase search distance specification

2016-08-09 Thread Ryan Pedela
On Tue, Aug 9, 2016 at 12:59 PM, Ryan Pedela  wrote:

>
>
> Thanks,
>
> Ryan Pedela
> Datalanche CEO, founder
> www.datalanche.com
>
> On Tue, Aug 9, 2016 at 11:58 AM, Tom Lane  wrote:
>
>> Bruce Momjian  writes:
>> > Does anyone know why the phrase distance "<3>" was changed from "at most
>> > three tokens away" to "exactly three tokens away"?
>>
>> So that it would correctly support phraseto_tsquery's use of the operator
>> to represent omitted words (stopwords) in a phrase.
>>
>> I think there's probably some use in also providing an operator that does
>> "at most this many tokens away", but Oleg/Teodor were evidently less
>> excited, because they didn't take the time to do it.
>>
>> The thread where this change was discussed is
>>
>> https://www.postgresql.org/message-id/flat/c19fcfec308e6ccd952cdde9e648b505%40mail.gmail.com
>>
>> see particularly
>>
>> https://www.postgresql.org/message-id/11252.1465422251%40sss.pgh.pa.us
>
>
>  I would say that it is worth it to have a "phrase slop" operator (Apache
> Lucene terminology). Proximity search is extremely useful for improving
> relevance and phrase slop is one of the tools to achieve that.
>
>
Sorry for the position of my signature

Ryan


Re: [HACKERS] 9.6 phrase search distance specification

2016-08-09 Thread Ryan Pedela
Thanks,

Ryan Pedela
Datalanche CEO, founder
www.datalanche.com

On Tue, Aug 9, 2016 at 11:58 AM, Tom Lane  wrote:

> Bruce Momjian  writes:
> > Does anyone know why the phrase distance "<3>" was changed from "at most
> > three tokens away" to "exactly three tokens away"?
>
> So that it would correctly support phraseto_tsquery's use of the operator
> to represent omitted words (stopwords) in a phrase.
>
> I think there's probably some use in also providing an operator that does
> "at most this many tokens away", but Oleg/Teodor were evidently less
> excited, because they didn't take the time to do it.
>
> The thread where this change was discussed is
>
> https://www.postgresql.org/message-id/flat/c19fcfec308e6ccd952cdde9e648b505%40mail.gmail.com
>
> see particularly
>
> https://www.postgresql.org/message-id/11252.1465422251%40sss.pgh.pa.us


 I would say that it is worth it to have a "phrase slop" operator (Apache
Lucene terminology). Proximity search is extremely useful for improving
relevance and phrase slop is one of the tools to achieve that.
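
Until such an operator exists, "at most N" can be emulated by OR-ing the exact
distances. A quick sketch against the 9.6 semantics (toy data, 'simple'
configuration; not from the original mail):

  -- true: 'b' is exactly 3 positions after 'a', and the OR accepts 1, 2 or 3
  SELECT to_tsvector('simple', 'a x x b') @@
         (to_tsquery('simple', 'a <1> b') ||
          to_tsquery('simple', 'a <2> b') ||
          to_tsquery('simple', 'a <3> b'));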


Re: [HACKERS] 9.6 phrase search distance specification

2016-08-09 Thread Bruce Momjian
On Tue, Aug  9, 2016 at 01:58:25PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > Does anyone know why the phrase distance "<3>" was changed from "at most
> > three tokens away" to "exactly three tokens away"?
> 
> So that it would correctly support phraseto_tsquery's use of the operator
> to represent omitted words (stopwords) in a phrase.
> 
> I think there's probably some use in also providing an operator that does
> "at most this many tokens away", but Oleg/Teodor were evidently less
> excited, because they didn't take the time to do it.
> 
> The thread where this change was discussed is
> 
> https://www.postgresql.org/message-id/flat/c19fcfec308e6ccd952cdde9e648b505%40mail.gmail.com
> 
> see particularly
> 
> https://www.postgresql.org/message-id/11252.1465422251%40sss.pgh.pa.us

Ah, I know it was discussed somewhere.  Thanks, the phraseto_tsquery
tie-in was what I forgot.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] dsm_unpin_segment

2016-08-09 Thread Robert Haas
On Mon, Aug 8, 2016 at 8:53 PM, Tom Lane  wrote:
> Thomas Munro  writes:
>> Yeah, I was considering unbalanced pin/unpin requests to be a
>> programming error.  To be more defensive about that, how about I add a
>> boolean 'pinned' to dsm_control_item, and elog(ERROR, ...) if it's not
>> in the expected state when you try to pin or unpin?
>
> Well, what you have there is a one-bit-wide pin request counter.
> I do not see why that's better than an actual counter, but if that's
> what you want to do, fine.
>
> The larger picture here is that Robert is exhibiting a touching but
> unfounded faith that extensions using this feature will contain zero bugs.

That's an overstatement of my position.  I think it is quite likely
that extensions using this feature will have bugs, because essentially
all code has bugs, but whether they are likely to have the specific bug
of unpinning a segment that is already unpinned is not quite so clear.
That's not to say I object to Thomas's v2 patch, which will catch that
mistake if it happens.  Defensive programming never killed anybody, as
far as I know.  However, I don't see the need for a full-blown request
counter here; we've had this API for several releases now and to my
knowledge nobody has complained about the fact that you aren't
supposed to call dsm_pin_segment() multiple times for the same
segment.  Therefore, I think the evidence supports the contention that
it's not broken and doesn't need to be fixed.  If we do decide it
needs to be fixed, I think that's material for a separate patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 9.6 phrase search distance specification

2016-08-09 Thread Tom Lane
Bruce Momjian  writes:
> Does anyone know why the phrase distance "<3>" was changed from "at most
> three tokens away" to "exactly three tokens away"?

So that it would correctly support phraseto_tsquery's use of the operator
to represent omitted words (stopwords) in a phrase.
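
For illustration (a sketch, not part of the original mail), the exact distance
is what lets a dropped stopword still occupy its position:

  SELECT phraseto_tsquery('english', 'The Fat Rats');
  -- 'fat' <-> 'rat'
  SELECT phraseto_tsquery('english', 'The Fat and Rats');
  -- 'fat' <2> 'rat'   ("and" is dropped but still counts toward the distance)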

I think there's probably some use in also providing an operator that does
"at most this many tokens away", but Oleg/Teodor were evidently less
excited, because they didn't take the time to do it.

The thread where this change was discussed is

https://www.postgresql.org/message-id/flat/c19fcfec308e6ccd952cdde9e648b505%40mail.gmail.com

see particularly

https://www.postgresql.org/message-id/11252.1465422251%40sss.pgh.pa.us

regards, tom lane




Re: [HACKERS] patch: xmltable - proof concept

2016-08-09 Thread Pavel Stehule
2016-08-09 19:30 GMT+02:00 Alvaro Herrera :

> Pavel Stehule wrote:
>
> > postgres=# SELECT  xmltable.*
> > postgres-#FROM (SELECT data FROM xmldata) x,
> > postgres-# LATERAL xmltable('/ROWS/ROW'
> > postgres(#  PASSING data
> > postgres(#  COLUMNS id int PATH '@id',
> > postgres(#   country_name text PATH
> > 'COUNTRY_NAME',
> > postgres(#   country_id text PATH
> > 'COUNTRY_ID',
> > postgres(#   region_id int PATH
> 'REGION_ID',
> > postgres(#   size float PATH 'SIZE',
> > postgres(#   unit text PATH 'SIZE/@unit',
> > postgres(#   premier_name text PATH
> > 'PREMIER_NAME' DEFAULT 'not specified');
> > ┌┬──┬┬───┬──┬──┬
> ───┐
> > │ id │ country_name │ country_id │ region_id │ size │ unit │
> premier_name  │
> > ╞╪══╪╪═══╪══╪══╪
> ═══╡
> > │  1 │ Australia│ AU │ 3 │¤ │ ¤│ not
> specified │
> > │  2 │ China│ CN │ 3 │¤ │ ¤│ not
> specified │
> > │  3 │ HongKong │ HK │ 3 │¤ │ ¤│ not
> specified │
> > │  4 │ India│ IN │ 3 │¤ │ ¤│ not
> specified │
> > │  5 │ Japan│ JP │ 3 │¤ │ ¤│ Sinzo Abe
>│
> > │  6 │ Singapore│ SG │ 3 │  791 │ km   │ not
> specified │
> > └┴──┴┴───┴──┴──┴
> ───┘
> > (6 rows)
>
> Nice work!
>

Thank you

Pavel


>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


[HACKERS] 9.6 phrase search distance specification

2016-08-09 Thread Bruce Momjian
Does anyone know why the phrase distance "<3>" was changed from "at most
three tokens away" to "exactly three tokens away"?  I looked at the
thread at:

https://www.postgresql.org/message-id/flat/33828354.WrrSMviC7Y%40abook

and didn't see the answer.  I assume if you are looking for "<3>" you
would want "<2>" matches and "<1>" matches as well.
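
For reference, with the 9.6 semantics the operator is exact, so closer matches
do not satisfy it. A quick illustration, not part of the original mail:

  SELECT to_tsvector('simple', 'a b c')   @@ to_tsquery('simple', 'a <3> c');  -- false, distance 2
  SELECT to_tsvector('simple', 'a b x c') @@ to_tsquery('simple', 'a <3> c');  -- true, distance 3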

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] patch: xmltable - proof concept

2016-08-09 Thread Alvaro Herrera
Pavel Stehule wrote:

> postgres=# SELECT  xmltable.*
> postgres-#FROM (SELECT data FROM xmldata) x,
> postgres-# LATERAL xmltable('/ROWS/ROW'
> postgres(#  PASSING data
> postgres(#  COLUMNS id int PATH '@id',
> postgres(#   country_name text PATH
> 'COUNTRY_NAME',
> postgres(#   country_id text PATH
> 'COUNTRY_ID',
> postgres(#   region_id int PATH 'REGION_ID',
> postgres(#   size float PATH 'SIZE',
> postgres(#   unit text PATH 'SIZE/@unit',
> postgres(#   premier_name text PATH
> 'PREMIER_NAME' DEFAULT 'not specified');
> ┌────┬──────────────┬────────────┬───────────┬──────┬──────┬───────────────┐
> │ id │ country_name │ country_id │ region_id │ size │ unit │ premier_name  │
> ╞════╪══════════════╪════════════╪═══════════╪══════╪══════╪═══════════════╡
> │  1 │ Australia    │ AU         │         3 │    ¤ │ ¤    │ not specified │
> │  2 │ China        │ CN         │         3 │    ¤ │ ¤    │ not specified │
> │  3 │ HongKong     │ HK         │         3 │    ¤ │ ¤    │ not specified │
> │  4 │ India        │ IN         │         3 │    ¤ │ ¤    │ not specified │
> │  5 │ Japan        │ JP         │         3 │    ¤ │ ¤    │ Sinzo Abe     │
> │  6 │ Singapore    │ SG         │         3 │  791 │ km   │ not specified │
> └────┴──────────────┴────────────┴───────────┴──────┴──────┴───────────────┘
> (6 rows)

Nice work!

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] patch: xmltable - proof concept

2016-08-09 Thread Pavel Stehule
Hi

2016-08-07 11:15 GMT+02:00 Pavel Stehule :

> Hi
>
> I am sending an initial implementation of the xmltable function:
>
> The code is not clean yet, but it does almost all of the expected work. The
> usage is simple. It is fast - 16K entries in 400ms.
>
> I invite any help with documentation and testing.
>
> A fully ANSI/SQL- or Oracle-compatible implementation is not possible due to
> the limits of libxml2, but for typical usage it should work well. It doesn't
> need any new reserved keyword, so there should not be hard barriers to
> accepting it (when this work is complete).
>
> Example:
>
> postgres=# SELECT * FROM xmldata;
> ┌──┐
> │   data   │
> ╞══╡
> │   ↵│
> │ ↵│
> │   AU   ↵│
> │   Australia↵│
> │   3  ↵│
> │   ↵│
> │ ↵│
> │   CN   ↵│
> │   China↵│
> │   3  ↵│
> │   ↵│
> │ ↵│
> │   HK   ↵│
> │   HongKong ↵│
> │   3  ↵│
> │   ↵│
> │ ↵│
> │   IN   ↵│
> │   India↵│
> │   3  ↵│
> │   ↵│
> │ ↵│
> │   JP   ↵│
> │   Japan↵│
> │   3Sinzo Abe↵│
> │   ↵│
> │ ↵│
> │   SG   ↵│
> │   Singapore↵│
> │   3791↵│
> │   ↵│
> │   │
> └──┘
> (1 row)
> postgres=# SELECT  xmltable.*
> postgres-#FROM (SELECT data FROM xmldata) x,
> postgres-# LATERAL xmltable('/ROWS/ROW'
> postgres(#  PASSING data
> postgres(#  COLUMNS id int PATH '@id',
> postgres(#   country_name text PATH
> 'COUNTRY_NAME',
> postgres(#   country_id text PATH
> 'COUNTRY_ID',
> postgres(#   region_id int PATH
> 'REGION_ID',
> postgres(#   size float PATH 'SIZE',
> postgres(#   unit text PATH 'SIZE/@unit',
> postgres(#   premier_name text PATH
> 'PREMIER_NAME' DEFAULT 'not specified');
> ┌────┬──────────────┬────────────┬───────────┬──────┬──────┬───────────────┐
> │ id │ country_name │ country_id │ region_id │ size │ unit │ premier_name  │
> ╞════╪══════════════╪════════════╪═══════════╪══════╪══════╪═══════════════╡
> │  1 │ Australia    │ AU         │         3 │    ¤ │ ¤    │ not specified │
> │  2 │ China        │ CN         │         3 │    ¤ │ ¤    │ not specified │
> │  3 │ HongKong     │ HK         │         3 │    ¤ │ ¤    │ not specified │
> │  4 │ India        │ IN         │         3 │    ¤ │ ¤    │ not specified │
> │  5 │ Japan        │ JP         │         3 │    ¤ │ ¤    │ Sinzo Abe     │
> │  6 │ Singapore    │ SG         │         3 │  791 │ km   │ not specified │
> └────┴──────────────┴────────────┴───────────┴──────┴──────┴───────────────┘
> (6 rows)
>
> Regards
>
>
I am sending an updated version - the code is not better, but the full
functionality is implemented.

* xmlnamespaces,
* default xmlnamespace,
* ordinality column,
* NOT NULL constraint,
* mode without explicitly defined columns.

Lots of bugs were fixed - it is ready for some playing.

Tests, comments, notes, and comparisons with other databases are welcome. Some
behavior is dictated by what libxml2 can do - so only XPath is supported.
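
A rough sketch of how the new options might be combined, modeled on the
SQL-standard XMLTABLE syntax (column names are invented and the exact syntax
of this WIP patch may differ):

  SELECT xmltable.*
    FROM (SELECT data FROM xmldata) x,
         LATERAL xmltable('/ROWS/ROW'
                          PASSING data
                          COLUMNS seq FOR ORDINALITY,
                                  country_id text PATH 'COUNTRY_ID' NOT NULL,
                                  country_name text PATH 'COUNTRY_NAME');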

Regards

Pavel


> Pavel
>
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 69bf65d..a20c576 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -58,7 +58,6 @@
 #include "utils/typcache.h"
 #include "utils/xml.h"
 
-
 /* static function decls */
 static Datum ExecEvalArrayRef(ArrayRefExprState *astate,
  ExprContext 

Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

2016-08-09 Thread Andrew Borodin
Here is a new version of the patch; it now includes the recommendations from
Anastasia Lubennikova.


I've investigated the anomalous index size decrease. The most probable
explanation turned out to be true.
The cube extension, like some others, uses Guttman's polynomial-time split
algorithm. It is known to generate "needle-like" MBBs (MBRs) for
sorted data due to imbalanced splits (splitting 100 tuples as 98 to
2). Due to the previous throw-to-the-end behavior of GiST this imbalance
was further amplified (most inserts were going to the bigger part
after a split). So GiST inserts were extremely slow for sorted data.
There is no need to do exact sorting to trigger this behavior.
This behavior can be defused by implementing the small-m restriction in
the split (AFAIR this is described in the original R-tree paper from '84),
which prescribes splitting into parts no smaller than m, where m
is around 20% of a page's capacity (in number of tuples). After application
of this patch, performance for sorted data is roughly the same as
performance for randomized data.
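
For concreteness, a minimal sketch of the sorted-insert workload this targets
(assuming the cube extension; not the exact benchmark from the thread):

  CREATE EXTENSION IF NOT EXISTS cube;
  CREATE TABLE pts (c cube);
  CREATE INDEX pts_gist_idx ON pts USING gist (c);
  -- monotonically increasing data provokes the imbalanced 98/2 splits
  INSERT INTO pts SELECT cube(i::float8) FROM generate_series(1, 1000000) AS i;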

If the data is randomized, this effect of Guttman's poly-time split does not
kick in; the test from the start of the thread shows no statistically
significant improvement from the patch.
To demonstrate the performance increase in the randomized case I've composed a
modified test: https://gist.github.com/x4m/856b2e1a5db745f8265c76b9c195f2e1
Five runs of this test show the following:
Before patch
Insert Time AVG 78 seconds STD 9.5
After patch
Insert Time AVG 68 seconds STD 7.7
This is a marginal but statistically significant performance improvement.

For sorted data performance is improved by a factor of 3.

Best regards, Andrey Borodin, Octonica & Ural Federal University.


PageIndexTupleOverwrite v7.patch
Description: Binary data



Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-09 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera  writes:

 >> One idea is utils/adt/misc.c.  Or we could make a new file under
 >> utils/adt/ though I'm not very sure what to name it.  amaccess.c?
 >> catutils.c?  If there's only ever likely to be one or two functions
 >> of this ilk, maybe a new file is overkill and we should just use
 >> misc.c.

 Alvaro> I like the idea of a new file; I have a hunch that it will
 Alvaro> grow, given that we're expanding in this area, and perhaps we
 Alvaro> can find some existing stuff to relocate there in the future.
 Alvaro> I don't think a small file is a problem, anyway.

 Alvaro> How about amfuncs.c?  Maybe it can live in catalog/ instead of
 Alvaro> utils/adt?

Well, the existing patch used access/index/amapi.c for the AM capability
functions. There may be some merit in keeping everything together - I
asked because it didn't seem at first glance that the index column
property function belonged there, but on second thought there's some
overlap in that in future, if indoptions ever acquires any AM-specific
flags, it may be necessary for pg_index_column_has_property to call into
an AM-specific function.

So, here are some options:

1. Put everything in access/index/amapi.c

2. Move either all of access/index/amapi.c, or just the SQL-callable
   part of it (amvalidate), to utils/adt/amfuncs.c and put new stuff in
   there

3. put pg_index[am]_has_capability in access/index/amapi.c and
   pg_index_column_has_property in utils/adt/misc.c

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Set log_line_prefix and application name in test drivers

2016-08-09 Thread Tom Lane
Peter Eisentraut  writes:
> Here is a small patch that sets log_line_prefix and application name in
> pg_regress and the TAP tests, to make analyzing the server log output
> easier.

How would this interact with the buildfarm's existing policies
on setting log_line_prefix?

regards, tom lane




[HACKERS] Set log_line_prefix and application name in test drivers

2016-08-09 Thread Peter Eisentraut
Here is a small patch that sets log_line_prefix and application name in
pg_regress and the TAP tests, to make analyzing the server log output
easier.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 0595672e9272a53162eda315c15835c26fdb6d10 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 9 Aug 2016 11:56:49 -0400
Subject: [PATCH] Set log_line_prefix and application name in test drivers

Before pg_regress runs psql, set the application name to the test name.
Similarly, set the application name to the test file name in the TAP
tests.  Also, set a default log_line_prefix that shows the application
name, as well as the PID and a time stamp.

That way, the server log output can be correlated to the test input
files, making debugging a bit easier.
---
 src/test/perl/PostgresNode.pm  | 1 +
 src/test/perl/TestLib.pm   | 2 ++
 src/test/regress/pg_regress.c  | 1 +
 src/test/regress/pg_regress_main.c | 7 +++
 4 files changed, 11 insertions(+)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index e629373..3f2ca91 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -402,6 +402,7 @@ sub init
 	open my $conf, ">>$pgdata/postgresql.conf";
 	print $conf "\n# Added by PostgresNode.pm\n";
 	print $conf "fsync = off\n";
+	print $conf "log_line_prefix = '%t [%p]: [%l] %qapp=%a '\n";
 	print $conf "log_statement = all\n";
 	print $conf "port = $port\n";
 
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 649fd82..27fcc78 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -60,6 +60,8 @@ BEGIN
 	delete $ENV{PGPORT};
 	delete $ENV{PGHOST};
 
+	$ENV{PGAPPNAME} = $0;
+
 	# Must be set early
 	$windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys';
 }
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 574f5b8..d5a8e16 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2247,6 +2247,7 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 		fputs("\n# Configuration added by pg_regress\n\n", pg_conf);
 		fputs("log_autovacuum_min_duration = 0\n", pg_conf);
 		fputs("log_checkpoints = on\n", pg_conf);
+		fputs("log_line_prefix = '%t [%p]: [%l] %qapp=%a '\n", pg_conf);
 		fputs("log_lock_waits = on\n", pg_conf);
 		fputs("log_temp_files = 128kB\n", pg_conf);
 		fputs("max_prepared_transactions = 2\n", pg_conf);
diff --git a/src/test/regress/pg_regress_main.c b/src/test/regress/pg_regress_main.c
index d9591c0..2733635 100644
--- a/src/test/regress/pg_regress_main.c
+++ b/src/test/regress/pg_regress_main.c
@@ -34,6 +34,7 @@ psql_start_test(const char *testname,
 	char		expectfile[MAXPGPATH];
 	char		psql_cmd[MAXPGPATH * 3];
 	size_t		offset = 0;
+	char	   *appnameenv;
 
 	/*
 	 * Look for files in the output dir first, consistent with a vpath search.
@@ -63,6 +64,9 @@ psql_start_test(const char *testname,
 		offset += snprintf(psql_cmd + offset, sizeof(psql_cmd) - offset,
 		   "%s ", launcher);
 
+	appnameenv = psprintf("PGAPPNAME=pg_regress/%s", testname);
+	putenv(appnameenv);
+
 	snprintf(psql_cmd + offset, sizeof(psql_cmd) - offset,
 			 "\"%s%spsql\" -X -a -q -d \"%s\" < \"%s\" > \"%s\" 2>&1",
 			 bindir ? bindir : "",
@@ -80,6 +84,9 @@ psql_start_test(const char *testname,
 		exit(2);
 	}
 
+	unsetenv("PGAPPNAME");
+	free(appnameenv);
+
 	return pid;
 }
 
-- 
2.9.2




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-09 Thread Alvaro Herrera
Tom Lane wrote:
> Andrew Gierth  writes:
> > Where'd be a good place to put that function? ruleutils? catalog/index.c ?
> 
> > (ruleutils is way too big already)
> 
> Agreed.  catalog/index.c is not a place that implements SQL-visible
> functions, so I don't like that either.
> 
> One idea is utils/adt/misc.c.  Or we could make a new file under
> utils/adt/ though I'm not very sure what to name it.  amaccess.c?
> catutils.c?  If there's only ever likely to be one or two functions
> of this ilk, maybe a new file is overkill and we should just use misc.c.

I like the idea of a new file; I have a hunch that it will grow, given
that we're expanding in this area, and perhaps we can find some existing
stuff to relocate there in the future.  I don't think a small file is a
problem, anyway.

How about amfuncs.c?  Maybe it can live in catalog/ instead of
utils/adt?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Slowness of extended protocol

2016-08-09 Thread Vladimir Sitnikov
Shay>But here's the more important general point. We're driver developers,
not application developers. I don't really know what performance is "just
fine" for each of my users, and what is not worth optimizing further. Users
may follow best practices, or they may not for various reasons.

Of course we cannot benchmark all the existing applications, however we
should at lest try to use "close to production" benchmarks.

Let me recap: even "select 1" shows clear advantage of reusing
server-prepared statements.
My machine shows the following results for "select 1 pgbench":
simple: 38K ops/sec (~26us/op)
extended: 32K ops/sec (~31us/op)
prepared: 47K ops/sec (~21us/op)

Note: reusing server-prepared statements shaves 10us (out of 31us), while
"brand new ParseBindExecDeallocate" message would not able to perform
better than 26us/op (that is 5 us worse than the prepared one). So it makes
much more sense investing time in "server-prepared statement reuse" at the
client side and "improving Bind/Exec performance" at the backend side.

For more complex queries the gap (prepared vs simple) would be much bigger
since parse/validate/plan for a complex query is much harder operation than
the one for "select 1"
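
At the SQL level the reuse being discussed looks like this (a toy example;
drivers do the equivalent over the wire with named statements):

  PREPARE sel_one AS SELECT 1;   -- parsed and analyzed once, at PREPARE time
  EXECUTE sel_one;               -- each execution only binds and runs it
  EXECUTE sel_one;
  DEALLOCATE sel_one;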

Note: I do not mean "look, prepared always win". I mean: "if your
environment does not allow reuse of prepared statements for some reason,
you lose huge amount of time on re-parsing the queries, and it is worth
fixing that obvious issue first".

Shay>I don't see how reusing SQL text affects security in any way.

Reusing SQL text makes application more secure as "build SQL on the fly" is
prone to SQL injection security issues.
So reusing SQL text makes application more secure and it enables
server-prepared statements that improve performance considerably. It is a
win-win.

Shay>a new feature in the Npgsql dev branch which allows prepared
statements to be persisted across open/close on pooled connections

Do you have some benchmark results for "reusing server-prepared statements
across open/close on pooled"? I would expect that feature to be a great win.

Once again, I'd like to focus on real-life cases, not artificial ones.

For example, the major part of my performance fixes to pgjdbc were made as
a part of improving my java application that was suffering from performance
issues when talking to PostgreSQL.
For instance, there were queries that took 20ms to plan and 0.2ms to
execute (the query is like where id=? but the SQL text was more involved).
As transparent server-side statement preparation was implemented on the pgjdbc
side, it shaved off those 20ms by eliminating Parse messages on the hot path.

In other words, it was not just "lets optimize pgjdbc". It was driven by
the need to optimize the client application, and the profiling results were
pointing to pgjdbc issues.

Shay>Again, in a world where prepared statements aren't persisted across
connections (e.g. pgbouncer)

pgbouncer does not properly support named statements, and that is
pgbouncer's issue.

Here's the issue for pgbouncer project:
https://github.com/pgbouncer/pgbouncer/issues/126#issuecomment-200900171
The response from pgbouncer team is "all the protocol bits are there, it is
just implementation from pgbouncer that is missing".

By the way: I do not use pgbouncer, thus there's not much interest for me to
invest time in fixing pgbouncer's issues.


Shay>Any scenario where you open a relatively short-lived connection and
execute something once is problematic - imagine a simple web service which
needs to insert a single record into a table.

I would assume the application does not use random strings for table names
(and columns/aliases), thus it would result in typical SQL text reuse, thus
it should trigger "server-side statement prepare" logic. In other words, that
kind of application does not need the "new ParseBindExecDeallocate message
we are talking about".

In other words, if an application is using "select name from objects where
id=$1" kind of queries, the driver should be using extended protocol
(Bind/Exec) behind the scenes if it aims to get high performance.

Vladimir


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-09 Thread Tom Lane
Andrew Gierth  writes:
> Where'd be a good place to put that function? ruleutils? catalog/index.c ?

> (ruleutils is way too big already)

Agreed.  catalog/index.c is not a place that implements SQL-visible
functions, so I don't like that either.

One idea is utils/adt/misc.c.  Or we could make a new file under
utils/adt/ though I'm not very sure what to name it.  amaccess.c?
catutils.c?  If there's only ever likely to be one or two functions
of this ilk, maybe a new file is overkill and we should just use misc.c.

regards, tom lane




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-09 Thread Andrew Gierth
> "Kevin" == Kevin Grittner  writes:

 >>> Building on the has-property approach Andrew suggested, I wonder if
 >>> we need something like pg_index_column_has_property(indexoid, colno,
 >>> propertyname) with properties like "sortable", "desc", "nulls first".
 >> 
 >> This seems simple enough, on the surface.  Why not run with this design?

 Kevin> Andrew's patch, plus this, covers everything I can think of.

Where'd be a good place to put that function? ruleutils? catalog/index.c ?

(ruleutils is way too big already)
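
For reference, the call shape proposed upthread would look roughly like this
(purely hypothetical - nothing like it is implemented yet):

  -- hypothetical: index oid, column number, property name
  SELECT pg_index_column_has_property('some_index'::regclass, 1, 'nulls first');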

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c

2016-08-09 Thread Amit Kapila
On Mon, Aug 8, 2016 at 6:00 PM, Dilip Kumar  wrote:
> On Mon, Aug 8, 2016 at 5:23 PM, Amit Kapila  wrote:
>>
>>
>> Did you consider using ERRCODE_UNDEFINED_COLUMN with error messages
>> like: "type %u does not exist" or "type id %u does not exist"? Errorcode
>> ERRCODE_UNDEFINED_COLUMN seems to be used for syscache lookup failure
>> cases at certain places in code.
>
>
> But I think, OBJECT will be more appropriate than COLUMN at this place.
>

Okay, then how about ERRCODE_UNDEFINED_OBJECT?  It seems to be used in
almost similar cases.

> However I can change the error message to "type id %u does not exist" if this
> seems better?
>

I think that is better.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Slowness of extended protocol

2016-08-09 Thread Shay Rojansky
On Tue, Aug 9, 2016 at 8:50 AM, Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Shay>There are many scenarios where connections are very short-lived
> (think about webapps where a pooled connection is allocated per-request and
> reset in between)
>
> Why the connection is reset in between in the first place?
> In pgjdbc we do not reset per-connection statement cache, thus we easily
> reuse named statements for pooled connections.
>

A DISCARD ALL is sent when the connection is returned to the pool, to
prevent state leakage etc. You make a valid comment though - there's
already a new feature in the Npgsql dev branch which allows prepared
statements to be persisted across open/close on pooled connections; it's
interesting to learn that this is standard behavior in pgjdbc. At least up to
now, the logic was not to implicitly keep holding server-side resources
across a pooled connection close, because these may become a drain on the
server etc.

More important, unless I'm mistaken pgbouncer also sends DISCARD ALL to
clean the connection state, as will other pooling solutions. That
unfortunately means that you can't always depend on prepared statements to
persist after closing the connection.
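
To make the effect concrete (a minimal illustration, not from the original
mail):

  PREPARE get_answer AS SELECT 42;
  EXECUTE get_answer;   -- works
  DISCARD ALL;          -- what a pooler typically issues on connection reset
  EXECUTE get_answer;   -- ERROR:  prepared statement "get_answer" does not exist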


> Shay>There are also many scenarios where you're not necessarily going to
> send the same query multiple times in a single connection lifespan, so
> preparing is again out of the question.
>

> Can you list at least one scenario of that kind, so we can code it into
> pgbench (or alike) and validate "simple vs prepared" performance?
>

Again, in a world where prepared statements aren't persisted across
connections (e.g. pgbouncer), this scenario is extremely common. Any
scenario where you open a relatively short-lived connection and execute
something once is problematic - imagine a simple web service which needs to
insert a single record into a table.

Shay>And more generally, there's no reason for a basic, non-prepared
> execution to be slower than it can be.
>
> That's too generic. If the performance for "end-to-end cases" is just
> fine, then it is not worth optimizing further. Typical application best
> practice is to reuse SQL text (for both security and performance point of
> views), so in typical applications I've seen, query text was reused, thus
> it naturally was handled by server-prepared logic.
>

I don't see how reusing SQL text affects security in any way.

But here's the more important general point. We're driver developers, not
application developers. I don't really know what performance is "just fine"
for each of my users, and what is not worth optimizing further. Users may
follow best practices, or they may not for various reasons. They may be
porting code over from SqlServer, where prepare is almost never used
(because SqlServer caches plans implicitly), or they may simply not be very
good programmers. The API for executing non-prepared statements is there,
we support it and PostgreSQL supports it - it just happens to not be very
fast. Of course we can say "screw everyone not preparing statements", but
that doesn't seem like a very good way to treat users. Especially since the
fix isn't that hard.

Let me highlight another direction: current execution of server-prepared
> statement requires some copying of "parse tree" (or whatever). I bet it
> would be much better investing in removal of that copying rather than
> investing into "make one-time queries faster" thing. If we could make
> "Exec" processing faster, it would immediately improve tons of applications.
>

I don't understand what exactly you're proposing here, but if you have some
unrelated optimization that would speed up prepared statements, by all
means that's great. It's just unrelated to this thread.


> Shay>Of course we can choose a different query to benchmark instead of
> SELECT 1 - feel free to propose one (or several).
>
> I've tried pgbench -M prepared, and it is way faster than pgbench -M
> simple.
>
> Once again: all cases I have in mind would benefit from reusing
> server-prepared statements. In other words, after some warmup the
> appication would use just Bind-Execute-Sync kind of messages, and it would
> completely avoid Parse/Describe ones.
>

Of course that's the ideal scenario. It's just not the *only* scenario for
all users - they may either not have prepared statements persisting across
open/close as detailed above, or their code may simply not be preparing
statements at the moment. Why not help them out for free?

Shay>FYI in Npgsql specifically describe isn't used to get any knowledge
> about parameters - users must populate the correct parameters or query
> execution fails.
>
> I think the main reason to describe for pgjdbc is to get result oids.
> pgjdbc is not "full binary", thus it has to be careful which fields it
> requests in binary format.
> That indeed slows down "unknown queries", but as the query gets reused,
> pgjdbc switches to server-prepared execution, and eliminates parse-describe
> overheads 

Re: [HACKERS] Wait events monitoring future development

2016-08-09 Thread Bruce Momjian
On Tue, Aug  9, 2016 at 04:17:28AM +, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-ow...@postgresql.org
> > I used to think of that this kind of features should be enabled by default,
> > because when I was working at the previous company, I had only few features
> > to understand what is happening inside PostgreSQL by observing production
> > databases. I needed those features enabled in the production databases when
> > I was called.
> > 
> > However, now I have another opinion. When we release the next major release
> > saying 10.0 with the wait monitoring, many people will start their benchmark
> > test with a configuration with *the default values*, and if they see some
> > performance decrease, for example around 10%, they will be talking about
> > it as the performance decrease in PostgreSQL 10.0. It means PostgreSQL will
> > be facing difficult reputation.
> > 
> > So, I agree with the features should be disabled by default for a while.
> 
> I understand your feeling well.  This is a difficult decision.  Let's hope 
> for trivial overhead.

I think the goal is that some internal tracking can be enabled by
default and some internal or external tool can be turned on and off to
get more fine-grained statistics about the event durations.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Logical Replication WIP

2016-08-09 Thread Petr Jelinek

On 09/08/16 12:16, Craig Ringer wrote:

On 9 August 2016 at 17:28, Masahiko Sawada > wrote:


> Sure, you can go deeper down the rabbit hole here and say that we need to
> add bgworker "categories" with reserved pools of worker slots for each
> category. But do we really need that?

If we change these processes to bgworker, we can categorize them into
two, auxiliary process(check pointer and  wal sender etc) and other
worker process.
And max_worker_processes controls the latter.


Right. I think that's probably the direction we should be going
eventually. Personally I don't think such a change should block the
logical replication work from proceeding with bgworkers, though. It's
been delayed a long time, a lot of people want it, and I think we need
to focus on meeting the core requirements not getting too sidetracked on
minor points.

Of course, everyone's idea of what's core and what's a minor sidetrack
differs ;)



Yeah, that's why I added a local max GUC that just handles the logical
worker limit within max_worker_processes. I didn't want to also
write a generic framework for managing the max workers using tags or
something as part of this; it's big enough as it is, and we can always
move the limit to a more generic place once we have one.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical Replication WIP

2016-08-09 Thread Craig Ringer
On 9 August 2016 at 17:28, Masahiko Sawada  wrote:


> > Sure, you can go deeper down the rabbit hole here and say that we need to
> > add bgworker "categories" with reserved pools of worker slots for each
> > category. But do we really need that?
>
> If we change these processes to bgworker, we can categorize them into
> two, auxiliary process(check pointer and  wal sender etc) and other
> worker process.
> And max_worker_processes controls the latter.


Right. I think that's probably the direction we should be going eventually.
Personally I don't think such a change should block the logical replication
work from proceeding with bgworkers, though. It's been delayed a long time,
a lot of people want it, and I think we need to focus on meeting the core
requirements not getting too sidetracked on minor points.

Of course, everyone's idea of what's core and what's a minor sidetrack
differs ;)


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Logical Replication WIP

2016-08-09 Thread Masahiko Sawada
On Tue, Aug 9, 2016 at 5:13 PM, Craig Ringer  wrote:
> On 9 August 2016 at 15:59, Masahiko Sawada  wrote:
>
>>
>>
>> The logical replication launcher process and the apply process are
>> implemented as a bgworker. Isn't better to have them as an auxiliary
>> process like checkpointer, wal writer?
>
>
> I don't think so. The checkpointer, walwriter, autovacuum, etc predate
> bgworkers. I strongly suspect that if they were to be implemented now they'd
> use bgworkers.
>
> Now, perhaps we want a new bgworker "kind" for system workers or some other
> minor tweaks. But basically I think bgworkers are exactly what we should be
> using here.

I understood. Thanks!

>>
>> IMO the number of logical replication connections should not be
>> limited by max_worker_processes.
>
>
> Well, they *are* worker processes... but I take your point, that that
> setting has been "number of bgworkers the user can run" and it might not be
> expected that logical replication would use the same space.
>
> max_worker_processes isn't just a limit, it controls how many shmem slots
> we allocate.
>
> I guess we could have a separate max_logical_workers or something, but I'm
> inclined to think that adds complexity without really making things any
> nicer. We'd just add them together to decide how many shmem slots to
> allocate and we'd have to keep track of how many slots were used by which
> types of backend. Or create a near-duplicate of the bgworker facility for
> logical rep.
>
> Sure, you can go deeper down the rabbit hole here and say that we need to
> add bgworker "categories" with reserved pools of worker slots for each
> category. But do we really need that?

If we change these processes to bgworker, we can categorize them into
two, auxiliary process(check pointer and  wal sender etc) and other
worker process.
And max_worker_processes controls the latter.

> max_connections includes everything, both system and user backends. It's not
> like we don't do this elsewhere. It's at worst a mild wart.
>
> The only argument I can see for not using bgworkers is for the supervisor
> worker. It's a singleton that launches the per-database workers, and
> arguably is a job that the postmaster could do better. The current design
> there stems from its origins as an extension. Maybe worker management could
> be simplified a bit as a result. I'd really rather not invent yet another
> new and mostly duplicate category of custom workers to achieve that though.
>
>

Regards,

--
Masahiko Sawada




Re: [HACKERS] Logical Replication WIP

2016-08-09 Thread Petr Jelinek

On 09/08/16 10:13, Craig Ringer wrote:

On 9 August 2016 at 15:59, Masahiko Sawada > wrote:



The logical replication launcher process and the apply process are
implemented as a bgworker. Isn't better to have them as an auxiliary
process like checkpointer, wal writer?


I don't think so. The checkpointer, walwriter, autovacuum, etc predate
bgworkers. I strongly suspect that if they were to be implemented now
they'd use bgworkers.

Now, perhaps we want a new bgworker "kind" for system workers or some
other minor tweaks. But basically I think bgworkers are exactly what we
should be using here.



Agreed.



IMO the number of logical replication connections should not be
limited by max_worker_processes.


Well, they *are* worker processes... but I take your point, that that
setting has been "number of bgworkers the user can run" and it might not
be expected that logical replication would use the same space.


Again agree, I think we should ultimately go towards what PeterE 
suggested in 
https://www.postgresql.org/message-id/a2fffd92-6e59-a4eb-dd85-c5865ebca...@2ndquadrant.com




The only argument I can see for not using bgworkers is for the
supervisor worker. It's a singleton that launches the per-database
workers, and arguably is a job that the postmaster could do better. The
current design there stems from its origins as an extension. Maybe
worker management could be simplified a bit as a result. I'd really
rather not invent yet another new and mostly duplicate category of
custom workers to achieve that though.



It is simplified compared to pglogical (there are only 2 worker types, not
3). I don't think it's the job of the postmaster to scan catalogs, however, so
it can't really start workers for logical replication. I actually modeled
it more after autovacuum (using bgworkers though) than the original
extension.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical Replication WIP

2016-08-09 Thread Petr Jelinek

On 09/08/16 09:59, Masahiko Sawada wrote:

On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote:

as promised here is WIP version of logical replication patch.




Thank you for working on this!


Thanks for looking!



I've applied these patches to current HEAD, but got the following error.

libpqwalreceiver.c:48: error: redefinition of typedef ‘WalReceiverConnHandle’
../../../../src/include/replication/walreceiver.h:137: note: previous
declaration of ‘WalReceiverConnHandle’ was here
make[2]: *** [libpqwalreceiver.o] Error 1
make[1]: *** [install-backend/replication/libpqwalreceiver-recurse] Error 2
make: *** [install-src-recurse] Error 2

After fixing this issue with the attached patch, I used logical replication a little.
Some random comments and questions.



Interesting, my compiler doesn't have a problem with this. Will investigate.


The logical replication launcher process and the apply process are
implemented as a bgworker. Isn't better to have them as an auxiliary
process like checkpointer, wal writer?
IMO the number of logical replication connections should not be
limited by max_worker_processes.



What Craig said reflects my rationale for doing this pretty well.


We need to set the publication up with at least the CREATE PUBLICATION and
ALTER PUBLICATION commands.
Can we make it possible for CREATE PUBLICATION to define tables as well?
For example,
CREATE PUBLICATION mypub [ TABLE table_name, ...] [WITH options]


Agreed, that just didn't make it into the first cut sent to -hackers. We've
also been thinking of having a special ALL TABLES parameter there that
would encompass the whole database.
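
A hypothetical sketch of how that might look (following the syntax proposed
above; nothing here is implemented yet and the final grammar may differ):

  CREATE PUBLICATION mypub TABLE users, departments;
  CREATE PUBLICATION allpub ALL TABLES;   -- the "whole database" variant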



--
This patch can not drop the subscription.

=# drop subscription sub;
ERROR:  unrecognized object class: 6102



Yeah, that's because of patch 0006; I didn't finish all the
dependency tracking for the pg_subscription_rel catalog that it adds
(which is why I called it a PoC). I expect to have this working in the next
version (there is still quite a bit of polish work needed in general).


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Slowness of extended protocol

2016-08-09 Thread Vladimir Sitnikov
Shay>There are many scenarios where connections are very short-lived (think
about webapps where a pooled connection is allocated per-request and reset
in between)

Why the connection is reset in between in the first place?
In pgjdbc we do not reset per-connection statement cache, thus we easily
reuse named statements for pooled connections.

Shay>and the extra roundtrip that preparing entails is too much.

When server-prepared statement gets reused, neither parse neither describe
are used.

Shay>There are also many scenarios where you're not necessarily going to
send the same query multiple times in a single connection lifespan, so
preparing is again out of the question.

Can you list at least one scenario of that kind, so we can code it into
pgbench (or alike) and validate "simple vs prepared" performance?


Shay>And more generally, there's no reason for a basic, non-prepared
execution to be slower than it can be.

That's too generic. If the performance for "end-to-end cases" is just fine,
then it is not worth optimizing further. Typical application best practice
is to reuse SQL text (for both security and performance point of views), so
in typical applications I've seen, query text was reused, thus it naturally
was handled by server-prepared logic.

Let me highlight another direction: current execution of server-prepared
statement requires some copying of "parse tree" (or whatever). I bet it
would be much better investing in removal of that copying rather than
investing into "make one-time queries faster" thing. If we could make
"Exec" processing faster, it would immediately improve tons of applications.


Shay>Of course we can choose a different query to benchmark instead of
SELECT 1 - feel free to propose one (or several).

I've tried pgbench -M prepared, and it is way faster than pgbench -M simple.

Once again: all cases I have in mind would benefit from reusing
server-prepared statements. In other words, after some warmup the
appication would use just Bind-Execute-Sync kind of messages, and it would
completely avoid Parse/Describe ones.

If a statement is indeed "one-time" statement, then I do not care much how
long it would take to execute.

Shay>FYI in Npgsql specifically describe isn't used to get any knowledge
about parameters - users must populate the correct parameters or query
execution fails.

I think the main reason to describe for pgjdbc is to get result oids.
pgjdbc is not "full binary", thus it has to be careful which fields it
requests in binary format.
That indeed slows down "unknown queries", but as the query gets reused,
pgjdbc switches to server-prepared execution, and eliminates parse-describe
overheads completely.

Vladimir


[HACKERS] Small issues in syncrep.c

2016-08-09 Thread Julien Rouhaud
Hello,

Since 14e8803f1, it's not necessary to acquire SyncRepLock to see up-to-date
data. But it looks like this commit didn't update all the
comments around MyProc->syncRepState, which still mention retrieving the
value with and without the lock.  Also, there's what I think is a now-unneeded
test that retrieves syncRepState a second time.

Patch attached to fix both small issues, present since 9.5.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 67249d8..c3e11b8 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -195,17 +195,12 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		ResetLatch(MyLatch);
 
 		/*
-		 * Try checking the state without the lock first.  There's no
-		 * guarantee that we'll read the most up-to-date value, so if it looks
-		 * like we're still waiting, recheck while holding the lock.  But if
-		 * it looks like we're done, we must really be done, because once
-		 * walsender changes the state to SYNC_REP_WAIT_COMPLETE, it will
-		 * never update it again, so we can't be seeing a stale value in that
-		 * case.
+		 * Acquiring the lock is not needed, the latch ensures proper barriers.
+		 * If it looks like we're done, we must really be done, because once
+		 * walsender changes the state to SYNC_REP_WAIT_COMPLETE, it will never
+		 * update it again, so we can't be seeing a stale value in that case.
 		 */
 		syncRepState = MyProc->syncRepState;
-		if (syncRepState == SYNC_REP_WAITING)
-			syncRepState = MyProc->syncRepState;
 		if (syncRepState == SYNC_REP_WAIT_COMPLETE)
 			break;
 



Re: [HACKERS] Logical Replication WIP

2016-08-09 Thread Michael Paquier
On Tue, Aug 9, 2016 at 5:13 PM, Craig Ringer  wrote:
> On 9 August 2016 at 15:59, Masahiko Sawada  wrote:
>> The logical replication launcher process and the apply process are
>> implemented as a bgworker. Isn't better to have them as an auxiliary
>> process like checkpointer, wal writer?
>
> I don't think so. The checkpointer, walwriter, autovacuum, etc predate
> bgworkers. I strongly suspect that if they were to be implemented now they'd
> use bgworkers.

+1. We could always get them now under the umbrella of the bgworker
infrastructure if this cleans up some code duplication.
-- 
Michael




Re: [HACKERS] Logical Replication WIP

2016-08-09 Thread Craig Ringer
On 9 August 2016 at 15:59, Masahiko Sawada  wrote:


>
> The logical replication launcher process and the apply process are
> implemented as a bgworker. Isn't better to have them as an auxiliary
> process like checkpointer, wal writer?
>

I don't think so. The checkpointer, walwriter, autovacuum, etc predate
bgworkers. I strongly suspect that if they were to be implemented now
they'd use bgworkers.

Now, perhaps we want a new bgworker "kind" for system workers or some other
minor tweaks. But basically I think bgworkers are exactly what we should be
using here.


> IMO the number of logical replication connections should not be
> limited by max_worker_processes.
>

Well, they *are* worker processes... but I take your point, that that
setting has been "number of bgworkers the user can run" and it might not be
expected that logical replication would use the same space.

max_worker_processes isn't just a limit, it controls how many shmem slots
we allocate.

I guess we could have a separate max_logical_workers or something, but I'm
inclined to think that adds complexity without really making things any
nicer. We'd just add them together to decide how many shmem slots to
allocate and we'd have to keep track of how many slots were used by which
types of backend. Or create a near-duplicate of the bgworker facility for
logical rep.

Sure, you can go deeper down the rabbit hole here and say that we need to
add bgworker "categories" with reserved pools of worker slots for each
category. But do we really need that?

max_connections includes everything, both system and user backends. It's
not like we don't do this elsewhere. It's at worst a mild wart.

The only argument I can see for not using bgworkers is for the supervisor
worker. It's a singleton that launches the per-database workers, and
arguably is a job that the postmaster could do better. The current design
there stems from its origins as an extension. Maybe worker management could
be simplified a bit as a result. I'd really rather not invent yet another
new and mostly duplicate category of custom workers to achieve that though.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Slowness of extended protocol

2016-08-09 Thread Shay Rojansky
On Mon, Aug 8, 2016 at 6:44 PM, Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:
>
> The problem with "empty statement name" is statements with empty name can
>> be reused (for instance, for batch insert executions), so the server side
>> has to do a defensive copy (it cannot predict how many times this unnamed
>> statement will be used).
>>
>
That seems right.

Also, part of the point here is to reduce the number of protocol messages
>> needed in order to send a parameterized query - not to have to do
>> Parse/Describe/Bind/Execute/Sync - since part of the slowdown comes from
>> that (although I'm not sure how much). Your proposal keeps the 5 messages.
>>
>
> As my benchmarks show, notable overhead is due to "defensive copying of
> the execution plan". So I would measure first, and only then would claim
> where the overhead is.
>

> Some more profiling is required to tell which part is a main time consumer.
>

Tom also pointed to the caching as the main culprit, although there seems
to be some message-related overhead as well. It seems that things like
process title changes may be easily fixable - do we really need a process
title change on every message (as opposed to, say, Execute messages only)?
This profiling and optimization effort can happen in parallel to the
discussion on what to do with the execution plan caching.


> Technically speaking, I would prefer to have a more real-life looking
> example instead of SELECT 1.
> Do you have something in mind?
> For instance, for more complex queries, "Parse/Plan" could cost much more
> than we shave by adding "a special non-cached statement name" or by
> reducing "5 messages into 1".
>
> There's a side problem: the Describe message requires a full roundtrip, since
> there are cases when the client needs to know how to encode parameters. An
> additional roundtrip hurts much more than just an additional message that
> is pipelined (e.g. sent in the same TCP packet).
>

This is true, but there doesn't seem to be anything we can do about it - if
your usage relies on describe to get information on parameters (as opposed
to results), you're stuck with an extra roundtrip no matter what. So it
seems you have to use the extended protocol anyway.

FYI in Npgsql specifically describe isn't used to get any knowledge about
parameters - users must populate the correct parameters or query execution
fails.


> Note: it is quite easy to invent a name that is not yet used in the wild,
>>> so it is safe.
>>>
>>
>> That's problematic: how do you know what's being used in the wild and
>> what isn't? The protocol has a specification; it's very problematic to get
>> up one day and change it retroactively. But again, the empty statement
>> seems to already be there for that.
>>
>
> The empty statement has different semantics, and it is widely used.
> For instance, pgjdbc uses unnamed statements a lot.
> On the other hand, a statement name like "!pq@#!@#42" is rather safe to use
> as a special case.
> Note: statement names are not typically created by humans (a statement name
> is not SQL), and very few PG clients support named statements.
>

IMHO this simply isn't the kind of thing one does in a serious protocol of
a widely-used product; others seem to agree on this.

> Sir, any new SQL keyword is what you call "retroactively defining
> special semantics".
> It's obvious that very few current clients use named
> server-prepared statements.
> Statement names are not something that is provided by the end-user in a
> web page, so it is not rocket science to come up with a
> statement name that is both short and "never ever used in the wild".

The difference is that before the new SQL keyword is added, trying to use
it results in an error. What you're proposing is taking something that
already works in one way and changing its behavior.

> Shay, can you come up with a real-life use case where that "I claim the
> statement will be used only once" hint would indeed improve performance?
> Or, to put it another way: "do you have a real-life case when the simple
> protocol is faster than the extended protocol with statement reuse"?

> I do have a couple of Java applications, and it turns out there's a huge
> win from reusing server-prepared statements.
> There's a problem that "the generic plan after the 5th execution might be much
> worse than a specific one", however such statements are not frequent,
> and I just put hints into the SQL (limit 0, +0, CTE, that kind of thing).

I maintain Npgsql, which is a general-purpose database driver and not a
specific application. The general .NET database API (ADO.NET), like most
(all?) DB APIs, allows users to send a simple statement to the database
(ExecuteReader, ExecuteScalar, ExecuteNonQuery). Every time a user uses
these APIs without preparing, they pay a performance penalty because the
extended protocol has more overhead than the simple one.
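
To make the comparison concrete, here is what the three client-side paths
look like through libpq (illustration only - Npgsql implements the protocol
itself rather than going through libpq, and error handling is omitted):

#include <libpq-fe.h>

static void
run_examples(PGconn *conn)
{
	const char *paramValues[1] = {"42"};
	PGresult   *res;

	/* Simple protocol: a single Query message, no out-of-line parameters. */
	res = PQexec(conn, "SELECT 42");
	PQclear(res);

	/*
	 * Extended protocol with the unnamed statement:
	 * Parse/Bind/Describe/Execute/Sync pipelined into one round trip,
	 * but the parsed statement is not kept for reuse.
	 */
	res = PQexecParams(conn, "SELECT $1::int4",
					   1, NULL, paramValues, NULL, NULL, 0);
	PQclear(res);

	/* Named statement: Parse once, then Bind/Execute cheaply on every reuse. */
	res = PQprepare(conn, "q1", "SELECT $1::int4", 1, NULL);
	PQclear(res);
	res = PQexecPrepared(conn, "q1", 1, paramValues, NULL, NULL, 0);
	PQclear(res);
}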

Obviously smart use of prepared statements is a great idea, but it doesn't
work everywhere. There are many scenarios where 

Re: [HACKERS] Logical Replication WIP

2016-08-09 Thread Masahiko Sawada
On Sat, Aug 6, 2016 at 2:04 AM, Simon Riggs  wrote:
> On 5 August 2016 at 16:22, Andres Freund  wrote:
>> On 2016-08-05 17:00:13 +0200, Petr Jelinek wrote:
>>> as promised here is WIP version of logical replication patch.
>>
>> Yay!
>
> Yay2
>

Thank you for working on this!

I've applied these patches to current HEAD, but got the following error.

libpqwalreceiver.c:48: error: redefinition of typedef ‘WalReceiverConnHandle’
../../../../src/include/replication/walreceiver.h:137: note: previous
declaration of ‘WalReceiverConnHandle’ was here
make[2]: *** [libpqwalreceiver.o] Error 1
make[1]: *** [install-backend/replication/libpqwalreceiver-recurse] Error 2
make: *** [install-src-recurse] Error 2

After fixing this issue with the attached patch, I used logical replication a little.
Some random comments and questions.

The logical replication launcher process and the apply process are
implemented as bgworkers. Isn't it better to have them as auxiliary
processes like the checkpointer and wal writer?
IMO the number of logical replication connections should not be
limited by max_worker_processes.

--
We need to set a publication up with at least the CREATE PUBLICATION and
ALTER PUBLICATION commands.
Can we make it possible to define tables in CREATE PUBLICATION as well?
For example,
CREATE PUBLICATION mypub [ TABLE table_name, ...] [WITH options]

--
This patch cannot drop a subscription:

=# drop subscription sub;
ERROR:  unrecognized object class: 6102

-- 
+/*-
+ *
+ * proto.c
+ * logical replication protocol functions
+ *
+ * Copyright (c) 2015, PostgreSQL Global Development Group
+ *

The copyright years in the added files are out of date.

And this patch has some whitespace problems.
Please run "git show --check" or "git diff origin/master --check"

Regards,

--
Masahiko Sawada
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 94648c7..e4aaba4 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -40,12 +40,12 @@
 
 PG_MODULE_MAGIC;
 
-typedef struct WalReceiverConnHandle {
+struct WalReceiverConnHandle {
 	/* Current connection to the primary, if any */
 	PGconn *streamConn;
 	/* Buffer for currently read records */
 	char   *recvBuf;
-} WalReceiverConnHandle;
+};
 
 PGDLLEXPORT WalReceiverConnHandle *_PG_walreceirver_conn_init(WalReceiverConnAPI *wrcapi);
 



Re: [HACKERS] Declarative partitioning

2016-08-09 Thread Ashutosh Bapat
What strikes me as odd about these patches is that RelOptInfo has remained
unmodified. For a base partitioned table, I would expect it to be marked as
partitioned, perhaps also indicating the partitioning scheme. Instead of that,
I see that the code deals directly with PartitionDesc and PartitionKey, which
are part of Relation, and uses other structures like PartitionKeyExecInfo. I
don't have any specific observation as to why we need such information in
RelOptInfo, but the lack of it makes me uncomfortable. It could be because the
inheritance code does not require any mark in RelOptInfo and your patch
re-uses the inheritance code. I am not sure.

On Tue, Aug 9, 2016 at 9:17 AM, Amit Langote 
wrote:

> On 2016/08/09 6:02, Robert Haas wrote:
> > On Mon, Aug 8, 2016 at 1:40 AM, Amit Langote
> >  wrote:
> >>> +1, if we could do it. It will need a change in the way Amit's patch stores
> >>> partitioning scheme in PartitionDesc.
> >>
> >> Okay, I will try to implement this in the next version of the patch.
> >>
> >> One thing that comes to mind is what if a user wants to apply hash
> >> operator class equality to a list-partitioned key by specifying a hash
> >> operator class for the corresponding column.  In that case, we would not
> >> have the ordering procedure with a hash operator class, hence any
> >> ordering-based optimization becomes impossible to implement.  The current
> >> patch rejects a column for the partition key if its type does not have a btree
> >> operator class for both range and list methods, so this issue doesn't
> >> exist, however it could be seen as a limitation.
> >
> > Yes, I think you should expect careful scrutiny of that issue.  It
> > seems clear to me that range partitioning requires a btree opclass,
> > that hash partitioning requires a hash opclass, and that list
> > partitioning requires at least one of those things.  It would probably
> > be reasonable to pick one or the other and insist that list
> > partitioning always requires exactly that, but I can't see how it's
> > reasonable to insist that you must have both types of opclass for any
> > type of partitioning.
>
> So because we intend to implement optimizations for list partition
> metadata that presuppose the existence of a corresponding btree operator class,
> we should just always require the user to specify one (or error out if the user
> doesn't specify one and a default one doesn't exist).  That way, we explicitly
> do not support specifying a hash equality operator for list partitioning.
>
> >> So, we have 3 choices for the internal representation of list partitions:
> >>
> >> Choice 1 (the current approach):  Load them in the same order as they are
> >> found in the partition catalog:
> >>
> >> Table 1: p1 {'b', 'f'}, p2 {'c', 'd'}, p3 {'a', 'e'}
> >> Table 2: p1 {'c', 'd'}, p2 {'a', 'e'}, p3 {'b', 'f'}
> >>
> >> In this case, mismatch on the first list would make the two tables
> >> incompatibly partitioned, whereas they really aren't incompatible.
> >
> > Such a limitation seems clearly unacceptable.  We absolutely must be
> > able to match up compatible partitioning schemes without getting
> > confused by little details like the order of the partitions.
>
> Agreed.  Will change my patch to adopt the below method.
>
> >> Choice 2: Representation with 2 arrays:
> >>
> >> Table 1: ['a', 'b', 'c', 'd', 'e', 'f'], [3, 1, 2, 2, 3, 1]
> >> Table 2: ['a', 'b', 'c', 'd', 'e', 'f'], [2, 3, 1, 1, 2, 3]
> >>
> >> It still doesn't help the case of pairwise joins because it's hard to tell
> >> which value belongs to which partition (the 2nd array carries the original
> >> partition numbers).  Although it might still work for tuple-routing.
> >
> > It's very good for tuple routing.  It can also be used to match up
> > partitions for pairwise joins.  Compare the first arrays.  If they are
> > unequal, stop.  Else, compare the second arrays, incrementally
> > building a mapping between them and returning false if the mapping
> > turns out to be non-bijective.  For example, in this case, we look at
> > index 0 and decide that 3 -> 2.  We look at index 1 and decide 1 -> 3.
> > We look at index 2 and decide 2 -> 1.  We look at index 3 and find
> > that we already have a mapping for 2, but it's compatible because we
> > need 2 -> 1 and that's what is already there.  Similarly for the
> > remaining entries.  This is really a pretty easy loop to write and it
> > should run very quickly.
>
> I see, it does make sense to try to implement this way.
>
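
That loop would look roughly like this in C (a sketch only, not from the
patch; the equality-function plumbing, the array layout and the function
name are assumptions, and partition numbers are taken to be zero-based):

#include "postgres.h"
#include "fmgr.h"

static bool
list_partitions_match(FmgrInfo *eqfn,
					  Datum *vals1, int *partidx1,
					  Datum *vals2, int *partidx2,
					  int nvals, int nparts)
{
	int		   *fwd;
	int		   *rev;
	int			i;
	bool		result = true;

	/* The sorted bound-value arrays must be identical. */
	for (i = 0; i < nvals; i++)
		if (!DatumGetBool(FunctionCall2(eqfn, vals1[i], vals2[i])))
			return false;

	/* Build the partition-number mapping; fail if it isn't bijective. */
	fwd = palloc(nparts * sizeof(int));
	rev = palloc(nparts * sizeof(int));
	for (i = 0; i < nparts; i++)
		fwd[i] = rev[i] = -1;

	for (i = 0; i < nvals; i++)
	{
		int			p1 = partidx1[i];
		int			p2 = partidx2[i];

		if (fwd[p1] == -1 && rev[p2] == -1)
		{
			fwd[p1] = p2;		/* first time we see this pair */
			rev[p2] = p1;
		}
		else if (fwd[p1] != p2 || rev[p2] != p1)
		{
			result = false;		/* conflicting mapping */
			break;
		}
	}

	pfree(fwd);
	pfree(rev);
	return result;
}
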
> >> Choice 3: Order all lists' elements for each list individually and then
> >> order the lists themselves on their first values:
> >>
> >> Table 1: p3 {'a', 'e'}, p2 {'b', 'f'}, p1 {'c', 'd'}
> >> Table 2: p2 {'a', 'e'}, p1 {'b', 'f'}, p3 {'c', 'd'}
> >>
> >> This representation makes pairing partitions for pairwise joining
> >> convenient but for tuple-routing we still need to visit each partition in
> >> the worst case.
> >
> > I think this is clearly not good