Re: [HACKERS] Declarative partitioning

2016-08-07 Thread Amit Langote
On 2016/08/05 21:38, Ashutosh Bapat wrote:
>>> Consider lists ('e', 'i', 'f'), ('h', 'd','m') and ('l', 'b', 'a') for a
>>> list partitioned table. I am suggesting that we arrange them as
>>> ('a','b','l'), ('d', 'h', 'm') and ('e', 'f', 'i'). If the given row
>>> (either for comparison or for inserting) has value 'c', we will search for
>>> it in ('a','b','l') but will eliminate the other two partitions, since the
>>> second partition's lowest value is higher than 'c' and the lowest values
>>> of the rest of the partitions are higher than that of the second
>>> partition. Without this order among the partitions, we will have to
>>> compare the lowest values of all the partitions.
>>
>> I would think that for that case what we'd want to do is create two
>> lists.  One looks like this:
>>
>> [ 'a', 'b', 'd', 'e', 'f', 'h', 'i', 'l', 'm' ]
>>
>> The other looks like this:
>>
>> [3, 3, 2, 1, 1, 2, 1, 1, 3, 2]
>>
> 
> small correction; there's an extra 1 here. Every partition in the example
> has only three values.
> 
> 
>>
>> Given a particular value, bsearch the first list.  If the value is not
>> found, it's not in any partition.  If it is found, then go look up the
>> same array index in the second list; that's the containing partition.
>>
> 
> +1, if we could do it. It will need a change in the way Amit's patch stores
> the partitioning scheme in PartitionDesc.

Okay, I will try to implement this in the next version of the patch.

One thing that comes to mind is: what if a user wants hash operator class
equality applied to a list partition key, by specifying a hash operator
class for the corresponding column?  In that case, we would not have an
ordering procedure (hash operator classes provide none), so any
ordering-based optimization would become impossible to implement.  The
current patch rejects a column as partition key if its type does not have a
btree operator class, for both the range and list methods, so this issue
doesn't arise; however, that could be seen as a limitation.

> This way specifications {('e', 'i', 'f'), ('h', 'd','m') and ('l', 'b',
> 'a')} and {('h', 'd','m'), ('e', 'i', 'f') and ('l', 'b', 'a')}, which are
> essentially the same, are represented in different ways. It makes matching
> partitions for partition-wise join a bit tedious. We have to make sure that
> the first array matches for both the joining relations and then make sure
> that all the values belonging to the same partition for one table also
> belong to the same partition in the other table. That requires some more
> complex logic for matching subsets of lists for partition-wise join.

So, we have 3 choices for the internal representation of list partitions:

Choice 1 (the current approach):  Load them in the same order as they are
found in the partition catalog:

Table 1: p1 {'b', 'f'}, p2 {'c', 'd'}, p3 {'a', 'e'}
Table 2: p1 {'c', 'd'}, p2 {'a', 'e'}, p3 {'b', 'f'}

In this case, a mismatch on the first list would make the two tables appear
incompatibly partitioned, whereas they really aren't incompatible.

Choice 2: Representation with 2 arrays:

Table 1: ['a', 'b', 'c', 'd', 'e', 'f'], [3, 1, 2, 2, 3, 1]
Table 2: ['a', 'b', 'c', 'd', 'e', 'f'], [2, 3, 1, 1, 2, 3]

It still doesn't help the case of pairwise joins, because it's hard to tell
which value belongs to which partition (the 2nd array carries the original
partition numbers), although it might still work for tuple routing.

Choice 3: Order all lists' elements for each list individually and then
order the lists themselves on their first values:

Table 1: p3 {'a', 'e'}, p1 {'b', 'f'}, p2 {'c', 'd'}
Table 2: p2 {'a', 'e'}, p3 {'b', 'f'}, p1 {'c', 'd'}

This representation makes pairing partitions for pairwise joining
convenient but for tuple-routing we still need to visit each partition in
the worst case.

> At least for straightforward partitioned table matching it helps to have
> both these arrays look the same independent of the user specification. From
> that point of view, the partitions should be ordered by their lowest or
> highest list values, and the second array then gives, for each value, the
> index of its partition in the ordered set. For both the specifications
> above, the lists will look like
> 
> [ 'a', 'b', 'd', 'e', 'f', 'h', 'i', 'l', 'm' ]
> [1, 1, 2, 3, 3, 2, 3, 1, 2]

IIUC, this seems like a combination of 2 and 3 above:

So, we have the following list partitions (as read from the catalog)

Table 1: p1 {'b', 'f'}, p2 {'c', 'd'}, p3 {'a', 'e'}
Table 2: p1 {'c', 'd'}, p2 {'a', 'e'}, p3 {'b', 'f'}

By applying the method of 3:

Table 1: p3 {'a', 'e'}, p1 {'b', 'f'}, p2 {'c', 'd'}
Table 2: p2 {'a', 'e'}, p3 {'b', 'f'}, p1 {'c', 'd'}

Then applying 2:

Table 1: ['a', 'b', 'c', 'd', 'e', 'f'], [1, 2, 3, 3, 1, 2], [3, 1, 2]
Table 2: ['a', 'b', 'c', 'd', 'e', 'f'], [1, 2, 3, 3, 1, 2], [2, 3, 1]

This is a user-specification-independent representation wherein the
partition numbers in the 2nd array are based on the canonical representation
(ordered lists).  To check pairwise join compatibility, simply compare the
first two arrays.  As to which partitions pair with which, the third array
maps each canonical position back to the original partition number.
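
To make the routing side of this concrete, below is a minimal standalone
sketch of the bsearch-based lookup described above (illustrative names and
hard-coded string keys; this is not code from the patch):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* All accepted values, sorted, plus a parallel array mapping each value
 * to its containing partition (here, canonical partition numbers). */
static const char *part_values[] = {"a", "b", "c", "d", "e", "f"};
static const int   part_index[]  = {1, 2, 3, 3, 1, 2};

static int
cmp_value(const void *key, const void *elem)
{
    return strcmp((const char *) key, *(const char *const *) elem);
}

/* Return the partition containing "value", or -1 if no partition does. */
static int
route_list_value(const char *value)
{
    const char **hit = bsearch(value, part_values,
                               sizeof(part_values) / sizeof(*part_values),
                               sizeof(*part_values), cmp_value);

    return hit ? part_index[hit - part_values] : -1;
}

int
main(void)
{
    printf("'d' -> partition %d\n", route_list_value("d"));  /* 3 */
    printf("'g' -> partition %d\n", route_list_value("g"));  /* -1 */
    return 0;
}
```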

Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-07 Thread Alvaro Herrera
Tom Lane wrote:

> Building on the has-property approach Andrew suggested, I wonder if
> we need something like pg_index_column_has_property(indexoid, colno,
> propertyname) with properties like "sortable", "desc", "nulls first".

This seems simple enough, on the surface.  Why not run with this design?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-07 Thread Etsuro Fujita

On 2016/08/05 21:47, Robert Haas wrote:

On Tue, Jul 26, 2016 at 11:20 PM, Etsuro Fujita
 wrote:

I noticed that currently the core doesn't, by itself, show any information
in EXPLAIN about the target relations involved in a foreign/custom join.



I think that's a feature, not a bug.


I agree with you.  I'd leave that for 10.0.


postgres_fdw shows the target relations in the Relations line, as shown
above, but I think that the core should show such information independently
of FDWs; in the above example replace "Foreign Scan" with "Foreign Join on
public.ft1 t1, public.ft2 t2".



I disagree with that.  Currently, when we say that something is a join
(Merge Join, Hash Join, Nested Loop) we mean that the executor is
performing a join, but that's not the case here.  The executor is
performing a scan.  The remote side, we suppose, is performing a join
for us, but we are not performing a join: we are performing a scan.
So I think the fact that it shows up in the plan as "Foreign Scan" is
exactly right.  We are scanning some foreign thing, and that thing may
internally be doing other stuff, like joins and aggregates, but all
we're doing is scanning it.


Hmm.  One thing I'm concerned about is the case where direct 
modification is implemented by using GetForeignUpperPaths, not 
PlanDirectModify.  In that case, the way things are now, we would have 
"Foreign Scan" followed by an INSERT/UPDATE/DELETE query, but that seems 
odd to me.



Also, I don't really see the point of moving this from postgres_fdw to
core.  If, at some point in time, there are many FDWs that implement
sophisticated pushdown operations and we figure out that they are all
duplicating the code to do the EXPLAIN printout, and they're all
printing basically the same thing, perhaps not in an entirely
consistent way, then we could try to unify all of them into one
implementation in core.  But that's certainly not where we are right
now.  I don't see any harm at all in leaving this under the control of
the FDW, and in fact, I think it's better.  Neither the postgres_fdw
format nor what you want to replace it with are so unambiguously
awesome that some other FDW author might not come up with something
better.

I think we should leave this the way it is.


One thing we need to do to leave that as is would be to fix a bug that I 
pointed out upthread.  Let me explain about that again.  The EXPLAIN 
command selects relation aliases to be used in printing a query so that 
each selected alias is unique, but postgres_fdw doesn't take that 
uniqueness into account.  Here is an example:


postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1, 
ft2 t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2 
where t1.a = t2.a) as t(t1a, t2a);

                                 QUERY PLAN
-----------------------------------------------------------------------------
 Unique  (cost=204.12..204.13 rows=2 width=8)
   Output: t1.a, t2.a
   ->  Sort  (cost=204.12..204.12 rows=2 width=8)
         Output: t1.a, t2.a
         Sort Key: t1.a, t2.a
         ->  Append  (cost=100.00..204.11 rows=2 width=8)
               ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
                     Output: t1.a, t2.a
                     Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                     Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1 INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
               ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
                     Output: t1_1.a, t2_1.a
                     Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
                     Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1 INNER JOIN public.t2 r2 ON (((r1.a = r2.a))))
(14 rows)

The relation aliases in the Relations line of the second Foreign Scan, 
t1 and t2 for ft1 and ft2, are not unique; they should be t1_1 and t2_1 
(compare the aliases in the Relations line with the ones in the Output line 
directly above it, created by core).  The reason is that postgres_fdw 
creates the Relations info at path-creation time, using 
rte->eref->aliasname as-is for the relation aliases; that is probably a 
little too early.  Couldn't postgres_fdw create that info after query 
planning, for example during ExplainForeignScan?


Best regards,
Etsuro Fujita






Re: [HACKERS] garbage in xml regress test

2016-08-07 Thread Pavel Stehule
2016-08-08 5:44 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > 2016-08-07 23:16 GMT+02:00 Tom Lane :
> >> Probably your copy of libxml produces the results in xml_2.out instead.
>
> > no - it is in expected/xml.out
>
> No, the output you're producing is in results/xml.out, but if the
> regression test harness isn't complaining about it, it's because it
> matches one of the available expected/xml*.out files.  Which would
> be xml_2.out, looks like.
>

Understood.

Thank you

Pavel


>
> regards, tom lane
>


Re: [HACKERS] garbage in xml regress test

2016-08-07 Thread Tom Lane
Pavel Stehule  writes:
> 2016-08-07 23:16 GMT+02:00 Tom Lane :
>> Probably your copy of libxml produces the results in xml_2.out instead.

> no - it is in expected/xml.out

No, the output you're producing is in results/xml.out, but if the
regression test harness isn't complaining about it, it's because it
matches one of the available expected/xml*.out files.  Which would
be xml_2.out, looks like.

regards, tom lane




Re: [HACKERS] garbage in xml regress test

2016-08-07 Thread Pavel Stehule
2016-08-07 23:16 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > When I checked the result of the xml test I found some strange lines in
> > the expected/xml.out file
> > It is a little bit strange - the regress test reports ok, but the diff is not empty
>
> Probably your copy of libxml produces the results in xml_2.out instead.
>

no - it is in expected/xml.out

https://github.com/postgres/postgres/blob/master/src/test/regress/expected/xml.out


LINE 1: INSERT INTO xmltest VALUES (3, '


Re: [HACKERS] Detecting skipped data from logical slots (data silently skipped)

2016-08-07 Thread Craig Ringer
On 5 August 2016 at 14:07, Andres Freund  wrote:

> > > The simplest fix would be to require downstreams to flush their
> > > replication origin when they get a hot standby feedback message, before
> > > they send a reply with confirmation. That could be somewhat painful for
> > > performance, but can be alleviated somewhat by waiting for the
> > > downstream postgres to get around to doing a flush anyway and only
> > > forcing it if we're getting close to the walsender timeout. That's
> > > pretty much what BDR and pglogical do when applying transactions to
> > > avoid having to do a disk flush for each and every applied xact. Then
> > > we change START_REPLICATION ... LOGICAL so it ERRORs if you ask for a
> > > too-old LSN rather than silently ignoring it.
> >
> > That's basically just proposing to revert this broken optimization,
> > IIUC, and instead just try not to flush too often on the standby.
>
> The effect of the optimization is *massive* if you are replicating a
> less active database, or a less active subset of a database, in a
> cluster with lots of other activity. I don't think that can just be
> disregarded, to protect against something with plenty of other failure
> scenarios.


Right. Though if we flush lazily I'm surprised the effect is that big;
then again, you're the one who did the work and knows the significance of it.

All I'm trying to say is that I think the current behaviour is too
dangerous. It doesn't just lead to failure, but to easy, undetectable, silent
failure when users perform common and simple tasks like taking a snapshot
or filesystem-level pg_start_backup() copy of a DB. The only reason it
can't happen for pg_basebackup too is that we omit slots during
pg_basebackup. That inconsistency between snapshot/fs-level and
pg_basebackup is unfortunate but understandable.

So I'm not saying "this whole idea must go". I'm saying I think it's too
permissive and needs to be stricter about what it's allowed to skip, so we
can differentiate between "nothing interesting here" and "um, I think
someone else consumed data I needed, I'd better bail out now". I've never
been comfortable with the skipping behaviour and found it confusing right
from the start, but now I have definite cases where it can cause silent
inconsistencies, and I really think it needs to be modified.

Robert's point that we could keep track of the skippable range is IMO a
good one. An extra slot attribute with the last LSN that resulted in the
output plugin doing a write to the client would be sufficient, at least at
this point. To anticipate future needs where we might want to allow output
plugins to ignore some things, control could be handed to the output plugin
by allowing it to also make a function call for the position to be
explicitly advanced even if it performs no writes.

That way we can safely skip ahead if the client asks us for an LSN equal to
or after the last real data we sent them, but refuse to skip if we sent
them data after the LSN they're asking for.
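
As a rough illustration, the stricter check could reduce to something like
the following sketch; last_write_lsn is the assumed new slot attribute (it
does not exist today), and the types are simplified stand-ins for the
backend's:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

typedef struct SlotState
{
    XLogRecPtr confirmed_flush; /* furthest position confirmed by client */
    XLogRecPtr last_write_lsn;  /* assumed: last LSN at which the output
                                 * plugin actually wrote to the client */
} SlotState;

/*
 * Decide whether START_REPLICATION at start_lsn can be served safely.
 * Skipping ahead is fine if nothing was ever sent to the client in the
 * skipped range; otherwise we would silently lose data and must error.
 */
static bool
start_pos_is_safe(const SlotState *slot, XLogRecPtr start_lsn)
{
    if (start_lsn >= slot->confirmed_flush)
        return true;            /* at or ahead of the confirmed position */

    /* Behind the confirmed position: safe only if the requested LSN is
     * not older than the last LSN that produced a real write. */
    return start_lsn >= slot->last_write_lsn;
}
```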

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-07 Thread Noah Misch
On Sun, Aug 07, 2016 at 07:19:39PM -0400, Tom Lane wrote:
> Had the complaint been raised sooner, maybe there would've been time
> to get a well-thought-out API into 9.6.  The fact that it wasn't raised
> till more than 6 months after we committed the pg_am changes, and more
> than 2 months after 9.6beta1 was released, makes me feel that it's not
> all that critical a problem.
> 
> Having said all that, it is unfortunate that 9.6 is going to go out
> without any good solution to this need.  But as Robert already pointed
> out, trying to fix it now would force delaying 9.6rc1 by several weeks
> (and that's assuming that it doesn't take very long to get consensus
> on a solution).  There's not, AFAICT, desire on the part of the release
> team to do that.  We'd like to ship 9.6 on time for a change.

I agree with all that.




Re: [HACKERS] Slowness of extended protocol

2016-08-07 Thread Robert Haas
On Sun, Aug 7, 2016 at 7:46 PM, Shay Rojansky  wrote:
> We could call this "protocol 3.1" since it doesn't break backwards
> compatibility (no incompatible server-initiated message changes, but it does
> include a feature that won't be supported by servers which only support 3.0).
> This could be a sort of "semantic versioning" for the protocol - optional
> new client-initiated features are a minor version bump, others are a major
> version bump...

I wouldn't try to do that; we've done nothing similar in past
instances where we've added new protocol or sub-protocol messages,
which has happened at least for COPY BOTH mode within recent memory.
See d3d414696f39e2b57072fab3dd4fa11e465be4ed.

> This new client-initiated message would be similar to query, except that it
> would include the parameter and result-related fields from Bind. The
> responses would be identical to the responses for the Query message.
>
> Does this make sense?

I think so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_ctl promote wait

2016-08-07 Thread Michael Paquier
On Sat, Aug 6, 2016 at 10:43 AM, Peter Eisentraut
 wrote:
> On 8/5/16 12:14 AM, Michael Paquier wrote:
>> In do_stop, this patch makes the wait happen for a maximum of
>> wait_seconds * 2, once when getting the control file information, and
>> once when waiting for the server to shut down.
>
> That's not how I read it.  get_controlfile() will decrease the
> wait_seconds argument by how much wait time it has used.  The wait for
> shutdown will then only use as many seconds as are left.

Ah, right. The reference to wait_seconds gets decremented.

>> This is not a good
>> idea, and the idea of putting a wait argument in get_controlfile does
>> not seem like a good interface to me. I'd rather see get_controlfile be
>> extended with a flag saying no_error_on_failure and keep the wait
>> logic within pg_ctl.
>
> I guess we could write a wrapper function in pg_ctl that encapsulated
> the wait logic.

That's what I would do.
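
For illustration, such a wrapper might look roughly like this (a sketch
with assumed names, and assuming get_controlfile() reverts to a simple
return-NULL-on-failure contract; not the actual patch):

```c
/*
 * Hypothetical wrapper inside pg_ctl.c, keeping the wait loop out of
 * get_controlfile() itself.  progname and pg_usleep() are as in the
 * existing code.
 */
static ControlFileData *
get_controlfile_with_wait(const char *datadir, int wait_seconds)
{
    ControlFileData *control;

    for (int i = 0; i < wait_seconds; i++)
    {
        control = get_controlfile(datadir, progname);
        if (control != NULL)
            return control;
        pg_usleep(1000000);     /* one second between attempts */
    }
    return NULL;
}
```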
-- 
Michael




Re: [HACKERS] [RFC] Change the default of update_process_title to off

2016-08-07 Thread Tsunakawa, Takayuki
From: David Rowley [mailto:david.row...@2ndquadrant.com]
> But perhaps it's better written like:
> 
> + This value defaults to "off" on Windows platforms due to the
> platform's significant overhead for updating the process title.

Thank you, I copied this.  But I wrote off without the quotes, because other 
places in config.sgml don't enclose on/off in quotes.


> From: Robert Haas [mailto:robertmh...@gmail.com]
> On Fri, Aug 5, 2016 at 12:19 PM, Jeff Janes  wrote:
> > Shouldn't we change the code which initdb uses to create the example
> > postgresql.conf, so that it shows the commented out value as being
> > 'off', when initdb is run on Windows?
> 
> +1.

Good idea.  Done.

Regards
Takayuki Tsunakawa



Attachment: update_process_title_off_on_win_v2.patch



Re: [HACKERS] Refactoring of heapam code.

2016-08-07 Thread Michael Paquier
On Sat, Aug 6, 2016 at 2:56 AM, Alvaro Herrera  wrote:
> Anastasia Lubennikova wrote:
>> So there are a couple of patches. They do not cover all the mentioned
>> problems, but I'd like to get feedback before continuing.
>
> I agree that we could improve things in this general area, but I do not
> endorse any particular changes you propose in these patches; I haven't
> reviewed your patches.

Well, I am interested in the topic, and I just had a look at the proposed patches.

+ /*
+ * Split PageGetItem into set of different macros
+ * in order to make code more readable.
+ */
+#define PageGetItemHeap(page, itemId) \
+( \
+   AssertMacro(PageIsValid(page)), \
+   AssertMacro(ItemIdHasStorage(itemId)), \
+   (HeapTupleHeader)(((char *)(page)) + ItemIdGetOffset(itemId)) \
+)
Placing macros that are specific to heap and indexes into bufpage.h is
not such a good idea. I'd suggest having the heap ones in
heapam.h, and the index ones in a more generic place, such as
src/access/common, as already mentioned by Alvaro.

+   onpage_tup = PageGetItemHeapHeaderOnly(page, newitemid);
Just PageGetItemHeapHeader would be fine.

@@ -2324,7 +2087,6 @@ FreeBulkInsertState(BulkInsertState bistate)
pfree(bistate);
 }

-
 /*
  * heap_insert - insert tuple into a heap
Those patches have some noise.

Patch 0002 is doing two things: reorganizing the order of the routines
in heapam.c and moving some routines from heapam.c to hio.c. Those could
be two separate patches.

If the final result is to be able to extend custom AMs for tables, we
should have a structure like src/access/common/index.c,
src/access/common/table.c and src/access/common/relation.c, for
example, and have headers dedicated to them. But yes, there is
definitely a lot of room for refactoring as we are aiming, at least I
guess so, at having a wider variety of code paths for access methods.
-- 
Michael




Re: [HACKERS] Slowness of extended protocol

2016-08-07 Thread Shay Rojansky
On Sun, Aug 7, 2016 at 6:11 PM, Robert Haas  wrote:

> > I'm glad reducing the overhead of out-of-line parameters seems like an
> > important goal. FWIW, if, as Vladimir seems to suggest, it's possible to
> > bring down the overhead of the v3 extended protocol to somewhere near the
> > simple protocol, that would obviously be a better solution - it would
> > mean things work faster here and now, rather than waiting for the v4
> > protocol. I have no idea if that's possible though, I'll see if I can
> > spend some time on understanding where the slowdown comes from.
>
> That having been said, I don't really see a reason why we couldn't
> introduce a new protocol message for this without bumping the protocol
> version.  Clients who don't know about the new message type just won't
> use it; nothing breaks.  Clients who do know about the new message
> need to be careful not to send it to older servers, but since the
> server reports its version to the client before the first opportunity
> to send queries, that shouldn't be too hard.  We could add a new
> interface to libpq that uses the new protocol message on new servers
> and falls back to the existing extended protocol on older servers.  In
> general, it seems to me that we only need to bump the protocol version
> if there will be server-initiated communication that is incompatible
> with existing clients.  Anything that the client can choose to
> initiate (or not) based on the server version should be OK.


That sounds right to me. As you say, the server version is sent early in
the startup phase, before any queries are sent to the backend, so frontends
know which server they're communicating with.

We could call this "protocol 3.1" since it doesn't break backwards
compatibility (no incompatible server-initiated message changes, but it
does include a feature that won't be supported by servers which only
support 3.0). This could be a sort of "semantic versioning" for the protocol
- optional new client-initiated features are a minor version bump, others
are a major version bump...

This new client-initiated message would be similar to query, except that it
would include the parameter and result-related fields from Bind. The
responses would be identical to the responses for the Query message.

Does this make sense?


Re: [HACKERS] [sqlsmith] Crash in pg_get_viewdef_name_ext()

2016-08-07 Thread Michael Paquier
On Mon, Aug 8, 2016 at 6:57 AM, Tom Lane  wrote:
> Andreas Seltenreich  writes:
>> sqlsmith just triggered a crash in pg_get_viewdef_name_ext().  Looks
>> like commit 976b24fb4 failed to update this caller of
>> pg_get_viewdef_worker().  Backtrace below.  Patch attached.
>
> Pushed, thanks.
>
> (For the record, you can trigger this by passing a name that is
> a valid relation, but not a view.)

Oops. Thanks for the fix!
-- 
Michael




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-07 Thread Tom Lane
Robert Haas  writes:
> On Sat, Aug 6, 2016 at 8:00 AM, Andrew Gierth
>  wrote:
>> Anyway, what I haven't seen in this thread is any implementable
>> counter-proposal other than the "just hardcode the name 'btree'"
>> response that was given in the JDBC thread, which I don't consider
>> acceptable in any sense. Is 9.6 going to go out like this or is action
>> going to be taken before rc1?

> Well, at this point, I think 9.6 is going to go out like this, unless
> Tom is willing to do something today.  Multiple people have expressed
> clear support for adding something along the lines you've suggested, I
> too am in favor, and I think it's unfortunate that Tom didn't do
> something about it before now.

FWIW, this thread started on 25-July, less than two weeks ago.  I've been
fully engaged in either stabilizing 9.6 or doing necessary back-branch
maintenance since then (including, I might add, putting in full days both
days this weekend).  There has not been time to work on this request, and
as far as I've seen from the thread we are not very close to a consensus
on what the API details should be anyway.

Had the complaint been raised sooner, maybe there would've been time
to get a well-thought-out API into 9.6.  The fact that it wasn't raised
till more than 6 months after we committed the pg_am changes, and more
than 2 months after 9.6beta1 was released, makes me feel that it's not
all that critical a problem.

Having said all that, it is unfortunate that 9.6 is going to go out
without any good solution to this need.  But as Robert already pointed
out, trying to fix it now would force delaying 9.6rc1 by several weeks
(and that's assuming that it doesn't take very long to get consensus
on a solution).  There's not, AFAICT, desire on the part of the release
team to do that.  We'd like to ship 9.6 on time for a change.

regards, tom lane




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-07 Thread Robert Haas
On Sun, Aug 7, 2016 at 4:09 PM, Greg Sabino Mullane  wrote:
> Robert Haas wrote:
>> But I'm neither willing to commit a patch to fix this the day before rc1
>> nor to argue that the whole release cycle should be put back by
>> several weeks on account of this issue.
>
> Seriously? First, not sure why this would put the whole release cycle
> back by 'several weeks'.

It's a combination of three things.  First, it is unhelpful to users
and to the project and burdensome to packagers to do lots of releases
in quick succession.  If we issue beta4 this week, we should really
wait about 3 weeks before we consider issuing beta5 or rc1.
Otherwise, we're going to have little time to get any meaningful
feedback on the release, and possibly annoy some of our all-volunteer
packaging team.  Second, we are doing a set of minor releases this
week and those minor releases will include some security fixes.  We do
not want to do minor releases for the back branches without releasing
a new version of 9.6, because that leaves people who are trying to
help with beta-testing for 9.6 running code with disclosed security
vulnerabilities.  Third, I don't think it's appropriate to have a
catversion bump between rc1 and final; rc1 should be very, very close
to final; if it's not, we weren't really ready for rc1.

Because of point #2, we have to release either 9.6beta4 or 9.6rc1 this
week.  If we pick 9.6rc1, then because of point #3, we cannot make
this change before final.  If we pick 9.6beta4, then because of point
#1, we cannot reasonably release 9.6rc1 for about another 3 weeks.

These issues have been discussed by the release team, which includes
the core team and the pool of active committers, and this is the
consensus.  Of course, you may not agree.

> Second, this is removing functionality, so what
> are apps supposed to do - have a three-choice case in the code to handle
> pg_am for < 9.6, do some ugly parsing for 9.6, and use the new functions
> when 10.0 comes out?! This issue was raised on July 25th, and the OP has
> gone out of his way to present the case and provide patches. It's hardly
> fair to discard it now.

I understand your concern and I wish things had come out differently
than they have with respect to this issue.  However, I do not agree
that rushing a patch into the tree less than 24 hours before rc1 is
likely to improve our chances of a stable, on-time release, and I also
do not agree that this one issue is so important that it justifies
holding up the final release by another three weeks.  You may feel
differently and that is fine, but I do not think that my position is
unreasonable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [sqlsmith] Crash in GetOldestSnapshot()

2016-08-07 Thread Robert Haas
On Sun, Aug 7, 2016 at 5:46 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sun, Aug 7, 2016 at 2:03 PM, Tom Lane  wrote:
>>> What I suggested just now in <2850.1470592...@sss.pgh.pa.us> might
>>> be implementable with a couple hours' work, though.  Do you have a
>>> reason to think it'd be insufficient?
>
>> No - if you can implement that quickly, I think it sounds like a great idea.
>
> Pushed.

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [sqlsmith] Crash in pg_get_viewdef_name_ext()

2016-08-07 Thread Tom Lane
Andreas Seltenreich  writes:
> sqlsmith just triggered a crash in pg_get_viewdef_name_ext().  Looks
> like commit 976b24fb4 failed to update this caller of
> pg_get_viewdef_worker().  Backtrace below.  Patch attached.

Pushed, thanks.

(For the record, you can trigger this by passing a name that is
a valid relation, but not a view.)

regards, tom lane




Re: [HACKERS] [sqlsmith] Crash in GetOldestSnapshot()

2016-08-07 Thread Tom Lane
Robert Haas  writes:
> On Sun, Aug 7, 2016 at 2:03 PM, Tom Lane  wrote:
>> What I suggested just now in <2850.1470592...@sss.pgh.pa.us> might
>> be implementable with a couple hours' work, though.  Do you have a
>> reason to think it'd be insufficient?

> No - if you can implement that quickly, I think it sounds like a great idea.

Pushed.

regards, tom lane




Re: [HACKERS] garbage in xml regress test

2016-08-07 Thread Tom Lane
Pavel Stehule  writes:
> When I checked the result of the xml test I found some strange lines in
> the expected/xml.out file
> It is a little bit strange - the regress test reports ok, but the diff is not empty

Probably your copy of libxml produces the results in xml_2.out instead.

regards, tom lane




[HACKERS] [sqlsmith] Crash in pg_get_viewdef_name_ext()

2016-08-07 Thread Andreas Seltenreich
Hi,

sqlsmith just triggered a crash in pg_get_viewdef_name_ext().  Looks
like commit 976b24fb4 failed to update this caller of
pg_get_viewdef_worker().  Backtrace below.  Patch attached.

regards,
Andreas

Program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x007cdf09 in cstring_to_text (s=s@entry=0x0) at varlena.c:152
#2  0x007a3409 in string_to_text (str=0x0) at ruleutils.c:10083
#3  pg_get_viewdef_name_ext (fcinfo=) at ruleutils.c:681
#4  0x005dfae2 in ExecMakeFunctionResultNoSets (fcache=0x403ed80, econtext=0x3fb0eb8, isNull=0x403e0a1 "", isDone=) at execQual.c:2041
[...]

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index d68ca7a..3c3fce4 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -671,6 +671,7 @@ pg_get_viewdef_name_ext(PG_FUNCTION_ARGS)
 	int			prettyFlags;
 	RangeVar   *viewrel;
 	Oid			viewoid;
+	char		*res;
 
 	prettyFlags = pretty ? PRETTYFLAG_PAREN | PRETTYFLAG_INDENT : PRETTYFLAG_INDENT;
 
@@ -678,7 +679,12 @@ pg_get_viewdef_name_ext(PG_FUNCTION_ARGS)
 	viewrel = makeRangeVarFromNameList(textToQualifiedNameList(viewname));
 	viewoid = RangeVarGetRelid(viewrel, NoLock, false);
 
-	PG_RETURN_TEXT_P(string_to_text(pg_get_viewdef_worker(viewoid, prettyFlags, WRAP_COLUMN_DEFAULT)));
+	res = pg_get_viewdef_worker(viewoid, prettyFlags, WRAP_COLUMN_DEFAULT);
+
+	if (res == NULL)
+		PG_RETURN_NULL();
+
+	PG_RETURN_TEXT_P(string_to_text(res));
 }
 
 /*



Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-07 Thread Greg Sabino Mullane



Robert Haas wrote:
> But I'm neither willing to commit a patch to fix this the day before rc1 
> nor to argue that the whole release cycle should be put back by 
> several weeks on account of this issue.

Seriously? First, not sure why this would put the whole release cycle 
back by 'several weeks'. Second, this is removing functionality, so what 
are apps supposed to do - have a three-choice case in the code to handle 
pg_am for < 9.6, do some ugly parsing for 9.6, and use the new functions 
when 10.0 comes out?! This issue was raised on July 25th, and the OP has 
gone out of his way to present the case and provide patches. It's hardly 
fair to discard it now.

-- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201608071606
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8






[HACKERS] garbage in xml regress test

2016-08-07 Thread Pavel Stehule
Hi

When I checked the result of the xml test I found some strange lines in
the expected/xml.out file.

It is a little bit strange - the regress test reports ok, but the diff is not empty:

[pavel@nemesis regress]$ diff results/xml.out expected/xml.out
11a12,13
>^
63a66,67
> ^
222a227,228
> &
> ^
228a235,236
> 
>^
246a255,256
> 
> ^
247a258,259
> 
> ^
256a269,270
>
>^
273a288,289
> &
>   ^
279a296,297
> 
>^
297a316,317
> 
> ^

Regards

Pavel


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-07 Thread Tom Lane
Robert Haas  writes:
> I think the whole idea of a fast temporary table is that there are no
> catalog entries.  If there are no catalog entries, then dependencies
> are not visible.  If there ARE catalog entries, to what do they refer?
>  Without a pg_class entry for the table, there's no table OID upon
> which to depend.

TBH, I think that the chances of such a design getting committed are
not distinguishable from zero.  Tables have to have OIDs; there is just
too much code that assumes that.  And I seriously doubt that it will
work (for any large value of "work") without catalog entries.

regards, tom lane




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-07 Thread Robert Haas
On Sat, Aug 6, 2016 at 4:05 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>> For example, suppose I create a fast temporary table and then I create a
>> functional index on the fast temporary table that uses some SQL function
>> defined in pg_proc.
> Just to clarify, did you mean something like this?
> ```
> create fast temp table fasttab(x int, s text);
> create or replace function test_function_for_index(t text) returns text as
> $$
> begin
> return lower(t);
> end;
> $$ language plpgsql immutable;
> create index fasttab_s_idx on fasttab (test_function_for_index(s));
> drop function test_function_for_index(t text);
> ```
> As far as I understand, dependencies should protect us in the fast-table
> case too, because everything is visible as in the regular case, isn't it?

I think the whole idea of a fast temporary table is that there are no
catalog entries.  If there are no catalog entries, then dependencies
are not visible.  If there ARE catalog entries, to what do they refer?
 Without a pg_class entry for the table, there's no table OID upon
which to depend.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-07 Thread Robert Haas
On Sat, Aug 6, 2016 at 8:00 AM, Andrew Gierth
 wrote:
> Anyway, what I haven't seen in this thread is any implementable
> counter-proposal other than the "just hardcode the name 'btree'"
> response that was given in the JDBC thread, which I don't consider
> acceptable in any sense. Is 9.6 going to go out like this or is action
> going to be taken before rc1?

Well, at this point, I think 9.6 is going to go out like this, unless
Tom is willing to do something today.  Multiple people have expressed
clear support for adding something along the lines you've suggested, I
too am in favor, and I think it's unfortunate that Tom didn't do
something about it before now.  But I'm neither willing to commit a
patch to fix this the day before rc1 nor to argue that the whole release
cycle should be put back by several weeks on account of this issue.
Once we open the tree for 10, I'm willing to pick this up if nobody
else has gotten to it before then.

I realize that's probably not the answer you were hoping for, and I'm
sorry about that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [sqlsmith] Crash in GetOldestSnapshot()

2016-08-07 Thread Tom Lane
Robert Haas  writes:
> On Sun, Aug 7, 2016 at 2:03 PM, Tom Lane  wrote:
>> What I suggested just now in <2850.1470592...@sss.pgh.pa.us> might
>> be implementable with a couple hours' work, though.  Do you have a
>> reason to think it'd be insufficient?

> No - if you can implement that quickly, I think it sounds like a great idea.

I'll look into it.

regards, tom lane




Re: [HACKERS] Slowness of extended protocol

2016-08-07 Thread Robert Haas
On Fri, Aug 5, 2016 at 8:07 PM, Shay Rojansky  wrote:
>>> I really don't get what's problematic with posting a message on a
>>> mailing list about a potential performance issue, to try to get
>>> people's reactions, without diving into profiling right away. I'm not
>>> a PostgreSQL developer, have other urgent things to do and don't even
>>> spend most of my programming time in C.
>>
>> There's absolutely nothing wrong with that.  I find your questions
>> helpful and interesting and I hope you will keep asking them.  I think
>> that they are a valuable contribution to the discussion on this list.
>
> Thanks for the positive feedback Robert.
>
> I'm glad reducing the overhead of out-of-line parameters seems like an
> important goal. FWIW, if, as Vladimir seems to suggest, it's possible to
> bring down the overhead of the v3 extended protocol to somewhere near the
> simple protocol, that would obviously be a better solution - it would mean
> things work faster here and now, rather than waiting for the v4 protocol. I
> have no idea if that's possible though, I'll see if I can spend some time
> on understanding where the slowdown comes from.

I think that's a fine thing to work on, but I don't hold out a lot of
hope.  If we ask the question "can you reduce the overhead vs. the
status quo?" I will wager that the answer is "yes".  But if you ask
the question "can you make it as efficient as we could be given a
protocol change?" I will wager that the answer is "no".

That having been said, I don't really see a reason why we couldn't
introduce a new protocol message for this without bumping the protocol
version.  Clients who don't know about the new message type just won't
use it; nothing breaks.  Clients who do know about the new message
need to be careful not to send it to older servers, but since the
server reports its version to the client before the first opportunity
to send queries, that shouldn't be too hard.  We could add a new
interface to libpq that uses the new protocol message on new servers
and falls back to the existing extended protocol on older servers.  In
general, it seems to me that we only need to bump the protocol version
if there will be server-initiated communication that is incompatible
with existing clients.  Anything that the client can choose to
initiate (or not) based on the server version should be OK.
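
In client terms, the gating described above can be a one-liner;
PQserverVersion() is existing libpq API, while the version cutoff and the
notion of a "new query message" are assumptions for illustration:

```c
#include <stdbool.h>
#include <libpq-fe.h>

/*
 * The server reports its version during connection startup, before the
 * first query, so the client can decide per-connection whether the
 * hypothetical new message is usable.
 */
static bool
can_use_new_query_message(PGconn *conn)
{
    return PQserverVersion(conn) >= 100000;     /* assume "10.0 or later" */
}
```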

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [RFC] Change the default of update_process_title to off

2016-08-07 Thread Robert Haas
On Fri, Aug 5, 2016 at 12:19 PM, Jeff Janes  wrote:
> On Fri, Aug 5, 2016 at 3:25 AM, Tsunakawa, Takayuki
>  wrote:
>>> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
>>> Yeah, I think I agree.  It would be bad to disable it by default on Unix,
>>> because ps(1) is a very standard tool there, but the same argument doesn't
>>> hold for Windows.
>>
>> It seems that we could reach a consensus.  The patch is attached.  I'll add 
>> this to the next CommitFest.
>>
>>> Another route to a solution would be to find a cheaper way to update the
>>> process title on Windows ... has anyone looked for alternatives?
>>
>> I couldn't find an alternative solution after asking some Windows support 
>> staff.
>>
>> Regards
>> Takayuki Tsunakawa
>
> Shouldn't we change the code which initdb uses to create the example
> postgresql.conf, so that it shows the commented out value as being
> 'off', when initdb is run on Windows?

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [sqlsmith] Crash in GetOldestSnapshot()

2016-08-07 Thread Robert Haas
On Sun, Aug 7, 2016 at 2:03 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> So I think in the short term what we should do about this is just fix
>> it so it doesn't crash.
>
> Well, we clearly need to fix GetOldestSnapshot so it won't crash,
> but I do not think that having RETURNING queries randomly returning
> "ERROR: no known snapshots" is acceptable even for a beta release.
> If we aren't prepared to do something about that before Monday,
> I think we need to revert 3e2f3c2e until we do have a fix for it.
>
> What I suggested just now in <2850.1470592...@sss.pgh.pa.us> might
> be implementable with a couple hours' work, though.  Do you have a
> reason to think it'd be insufficient?

No - if you can implement that quickly, I think it sounds like a great idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [sqlsmith] Crash in GetOldestSnapshot()

2016-08-07 Thread Robert Haas
On Sun, Aug 7, 2016 at 1:57 PM, Tom Lane  wrote:
> Andrew Gierth  writes:
>> In a similar case in the past involving holdable cursors, the solution
>> was to detoast _before_ storing in the tuplestore (see
>> PersistHoldablePortal). I guess the question now is, under what
>> circumstances is it now allowable to detoast a datum with no active
>> snapshot? (Wouldn't it be necessary in such a case to know what the
>> oldest snapshot ever used in the transaction was?)
>
> After looking at this a bit, I think probably the appropriate solution
> is to register the snapshot that was used by the query and store it as
> a property of the Portal, releasing it when the Portal is destroyed.
> Essentially, this views possession of a relevant snapshot as a resource
> that is required to make toast dereferences safe.

Hmm, good idea.

> I think there has been a bug here for awhile.  Consider a committed-dead
> row with some associated toast data, and suppose the query's snapshot
> was the last one that could see that row.  Once we destroy the query's
> snapshot, there is nothing preventing a concurrent VACUUM from removing
> the dead row and the toast data.

Yeah: as you see, I came to the same conclusion.

> When the RETURNING code was originally
> written, I think this was safe enough, because the bookkeeping that
> determined when VACUUM could remove data was based on transactions'
> advertised xmins, and those did not move once set for the life of the
> transaction.  So dereferencing a toast pointer you'd fetched was safe
> for the rest of the transaction.  But when we changed over to
> oldest-snapshot-based xmin advertisement, and made it so that a
> transaction holding no snapshots advertised no xmin, we created a hazard
> for data held in portals.

But I missed this aspect of it.

> In this view of things, flattening toast pointers in "held" tuplestores
> is still necessary, but it's because their protective snapshot is going
> away not because the transaction as a whole is going away.  But as long
> as we hold onto the relevant snapshot, we don't have to do that for
> portals used intra-transaction.
>
> It's interesting to think about whether we could let snapshots outlive
> transactions and thereby not need to flatten "held" tuplestores either.
> I kinda doubt it's a good idea because of the potential bad effects
> for vacuum not being able to remove dead rows for a long time; but
> it seems at least possible to do it, in this world-view.

EnterpriseDB has had complaints from customers about the cost of
flattening TOAST pointers when tuplestores are held, so I think
providing an option to skip the flattening (at the risk of increased
bloat) would be a good idea even if we chose not to change the default
behavior.  It's worth noting that the ability to set
old_snapshot_threshold serves as a way of limiting the damage that can
be caused by the open snapshot, so that optional behavior might be
more appealing now than heretofore.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [sqlsmith] Crash in GetOldestSnapshot()

2016-08-07 Thread Tom Lane
Robert Haas  writes:
> So I think in the short term what we should do about this is just fix
> it so it doesn't crash.

Well, we clearly need to fix GetOldestSnapshot so it won't crash,
but I do not think that having RETURNING queries randomly returning
"ERROR: no known snapshots" is acceptable even for a beta release.
If we aren't prepared to do something about that before Monday,
I think we need to revert 3e2f3c2e until we do have a fix for it.

What I suggested just now in <2850.1470592...@sss.pgh.pa.us> might
be implementable with a couple hours' work, though.  Do you have a
reason to think it'd be insufficient?

regards, tom lane




Re: [HACKERS] [sqlsmith] Crash in GetOldestSnapshot()

2016-08-07 Thread Robert Haas
On Sat, Aug 6, 2016 at 9:00 AM, Andrew Gierth
 wrote:
> Hmm.
>
> So this happens because RETURNING queries run to completion immediately
> and populate a tuplestore with the results, and the portal then fetches
> from the tuplestore to send to the destination. The assumption is that
> the tuplestore output can be processed without needing a snapshot, which
> obviously is not true now if it contains toasted data.
>
> In a similar case in the past involving holdable cursors, the solution
> was to detoast _before_ storing in the tuplestore (see
> PersistHoldablePortal). I guess the question now is, under what
> circumstances is it now allowable to detoast a datum with no active
> snapshot? (Wouldn't it be necessary in such a case to know what the
> oldest snapshot ever used in the transaction was?)

Yes, I think you're right.  Suppose we leave "snapshot too old" to one
side; assume that feature is disabled.  If the backend fetches the
heap tuples without de-TOAST-ing, releases its snapshot (presumably
resetting xmin), and then goes into the tank for a while, those tuples
could be pruned.  When the backend wakes up again and tries to de-TOAST,
we would get the exact sort of heap:toast disagreement that we set
out to prevent here.  That's not likely to occur because in most cases
the number of tuples returned will be small, and VACUUM is quite
unlikely to remove them before we de-TOAST.  But this report makes me
suspect it can happen (I have not tested).

With "snapshot too old" enabled, it becomes much more likely.  The
offending prune operation can happen at any time after our snapshot
times out and before we de-TOAST, rather than needing to happen after
the query ends and before we de-TOAST.

So I think in the short term what we should do about this is just fix
it so it doesn't crash.  In the longer term, we might want to think a
bit more carefully about the way we handle de-TOASTing overall; we've
had a number of different bugs that are all rooted in failure to think
carefully about the fact that the query's snapshot needs to live at
least as long as any TOAST pointers that we might want to de-TOAST
later (3f8c8e3c61cef5729980ee4372ec159862a979f1,
ec543db77b6b72f24d0a637c4a4a419cf8311d0b).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [sqlsmith] Crash in GetOldestSnapshot()

2016-08-07 Thread Tom Lane
Andrew Gierth  writes:
> In a similar case in the past involving holdable cursors, the solution
> was to detoast _before_ storing in the tuplestore (see
> PersistHoldablePortal). I guess the question now is, under what
> circumstances is it now allowable to detoast a datum with no active
> snapshot? (Wouldn't it be necessary in such a case to know what the
> oldest snapshot ever used in the transaction was?)

After looking at this a bit, I think probably the appropriate solution
is to register the snapshot that was used by the query and store it as
a property of the Portal, releasing it when the Portal is destroyed.
Essentially, this views possession of a relevant snapshot as a resource
that is required to make toast dereferences safe.
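
A sketch of what that might look like; RegisterSnapshotOnOwner() and
UnregisterSnapshotFromOwner() are existing snapmgr functions, but the
holdSnapshot field and the exact hook points are assumed here, not taken
from a committed patch:

```c
/* At query completion, once the portal's tuplestore has been filled:
 * keep the query's snapshot registered for the life of the portal. */
portal->holdSnapshot = RegisterSnapshotOnOwner(snapshot,
                                               portal->resowner);

/* ...and in portal cleanup, release it again: */
if (portal->holdSnapshot != NULL)
{
    UnregisterSnapshotFromOwner(portal->holdSnapshot,
                                portal->resowner);
    portal->holdSnapshot = NULL;
}
```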

I think there has been a bug here for awhile.  Consider a committed-dead
row with some associated toast data, and suppose the query's snapshot
was the last one that could see that row.  Once we destroy the query's
snapshot, there is nothing preventing a concurrent VACUUM from removing
the dead row and the toast data.  When the RETURNING code was originally
written, I think this was safe enough, because the bookkeeping that
determined when VACUUM could remove data was based on transactions'
advertised xmins, and those did not move once set for the life of the
transaction.  So dereferencing a toast pointer you'd fetched was safe
for the rest of the transaction.  But when we changed over to
oldest-snapshot-based xmin advertisement, and made it so that a
transaction holding no snapshots advertised no xmin, we created a hazard
for data held in portals.

In this view of things, flattening toast pointers in "held" tuplestores
is still necessary, but it's because their protective snapshot is going
away not because the transaction as a whole is going away.  But as long
as we hold onto the relevant snapshot, we don't have to do that for
portals used intra-transaction.

It's interesting to think about whether we could let snapshots outlive
transactions and thereby not need to flatten "held" tuplestores either.
I kinda doubt it's a good idea because of the potential bad effects
for vacuum not being able to remove dead rows for a long time; but
it seems at least possible to do it, in this world-view.

regards, tom lane




Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-07 Thread Bruce Momjian
On Sun, Aug  7, 2016 at 10:49:45AM -0400, Bruce Momjian wrote:
> OK, crazy idea time --- what if we only do WARM chain additions when all
> indexed values are increasing (with NULLs higher than all values)?  (If
> a key is always-increasing, it can't match a previous value in the
> chain.)  That avoids the problem of having to check the WARM chain,
> except for the previous tuple, and the problem of pruning removing
> changed rows.  It avoids having to check the index for matching key/ctid
> values, and it prevents CREATE INDEX from having to index WARM chain
> values.
> 
> Any decreasing value would cause a normal tuple be created.

Actually, when we add the first WARM tuple, we can mark the HOT/WARM
chain as either all-incrementing or all-decrementing.  We would need a
bit to indicate that.

Also, it would be possible for keys involved in multi-key indexes to not
match the direction of the chain as long as keys earlier in the index
matched, e.g. key (1,5,6) would be less than (2,1,1) since 1 < 2, even
though 5 > 1.  I am not sure it would be worth detecting this.
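
For reference, the comparison behind that example is plain lexicographic
ordering, along these lines (illustrative only: integer keys, with NULLs
and per-column sort direction ignored):

```c
/*
 * Compare two multi-column keys column by column: the first unequal
 * column decides, so (1,5,6) < (2,1,1) because 1 < 2.
 */
static int
compare_keys(const int *a, const int *b, int ncols)
{
    for (int i = 0; i < ncols; i++)
    {
        if (a[i] != b[i])
            return (a[i] < b[i]) ? -1 : 1;
    }
    return 0;                   /* all columns equal */
}
```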

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-07 Thread Tom Lane
Dean Rasheed  writes:
> On 5 August 2016 at 21:48, Tom Lane  wrote:
>> OK, thanks.  What shall we do about Andreas' request to back-patch this?
>> I'm personally willing to do it, but there is the old bugaboo of "maybe
>> it will destabilize a plan that someone is happy with".

> My inclination would be to back-patch it because arguably it's a
> bug-fix -- at the very least the old behaviour didn't match the docs
> for stadistinct:

Yeah.  I suspect that situations like this are pretty rare, or we'd have
recognized the problem sooner.  And at least for Andreas, it'd be fixing
a real problem.  I'll apply the back-patch unless I hear objections in
the next couple of hours.

> Additionally, I think that example is misleading because it's only
> really true if there are no null values in the column. Perhaps it
> would help to have a more explicit example to illustrate how nulls
> affect stadistinct, for example:

Good idea, will do.

regards, tom lane


Re: [HACKERS] Consolidate 'unique array values' logic into a reusable function?

2016-08-07 Thread Tom Lane
Thomas Munro  writes:
> Here's a sketch patch that creates a function array_unique which takes
> the same arguments as qsort or qsort_arg and returns the new length.

Hmm ... I'd be against using this in backend/regex/, because I still
have hopes of converting that to a standalone library someday (and
in any case it needs to stay compatible with Tcl's copy of the code).
But otherwise this seems like a reasonable proposal.

As for the function name, maybe "qunique()" to go with "qsort()"?
I'm not thrilled with "array_unique" because that sounds like it
is meant for Postgres' array data types.
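
For what it's worth, a minimal sketch of what such a qunique() could
look like, assuming the qsort()-style signature Thomas describes
(illustrative, not the submitted patch):

    #include <stddef.h>
    #include <string.h>

    /*
     * Remove adjacent duplicates from an already-sorted array in place,
     * returning the new element count.  Argument convention mirrors qsort().
     */
    static size_t
    qunique(void *array, size_t elements, size_t width,
            int (*compare) (const void *, const void *))
    {
        char   *bytes = (char *) array;
        size_t  i,
                j;

        if (elements <= 1)
            return elements;

        for (i = 1, j = 0; i < elements; i++)
        {
            /* Keep element i only if it differs from the last kept element. */
            if (compare(bytes + i * width, bytes + j * width) != 0 &&
                ++j != i)
                memcpy(bytes + j * width, bytes + i * width, width);
        }
        return j + 1;
    }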

regards, tom lane


Re: [HACKERS] Heap WARM Tuples - Design Draft

2016-08-07 Thread Bruce Momjian
On Sat, Aug  6, 2016 at 10:51:21AM -0400, Bruce Momjian wrote:
> > If we need to find an efficient way to convert WARM chains back to
> > HOT, which will happen soon when the old index tuple retires, the
> > system can attain a stable state, not for all but many use cases.
> 
> I don't see how that is possible, except perhaps by vacuum.
> 
> OK, now I am less certain.

OK, crazy idea time --- what if we only do WARM chain additions when all
indexed values are increasing (with NULLs higher than all values)?  (If
a key is always-increasing, it can't match a previous value in the
chain.)  That avoids the problem of having to check the WARM chain,
except for the previous tuple, and the problem of pruning removing
changed rows.  It avoids having to check the index for matching key/ctid
values, and it prevents CREATE INDEX from having to index WARM chain
values.

Any decreasing value would cause a normal tuple to be created.

Sure, it is a limitation, but it removes almost all of the overhead of
WARM updates.  In fact, the key comparison of old/new tuples will
return -1, 0, or 1, and from that you know whether the new key is
unchanged or increasing.  I assume we already do this comparison to
determine if we can do a HOT update.
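
In pseudo-C, the decision would look something like this (names are
invented for illustration, not from any patch):

    typedef enum
    {
        UPDATE_HOT,             /* no indexed key changed */
        UPDATE_WARM,            /* keys changed, but only upward */
        UPDATE_NORMAL           /* some key decreased: new index entries */
    } UpdateKind;

    static UpdateKind
    choose_update_kind(int cmp_old_new)     /* -1, 0, or 1 */
    {
        if (cmp_old_new == 0)
            return UPDATE_HOT;      /* same keys: plain HOT update */
        if (cmp_old_new < 0)
            return UPDATE_WARM;     /* strictly increasing: WARM allowed */
        return UPDATE_NORMAL;       /* decreasing: create a normal tuple */
    }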

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


[HACKERS] Wait events monitoring future development

2016-08-07 Thread Ilya Kosmodemiansky
Hi,

I've summarized the Wait events monitoring discussion at this year's
Developer Unconference in Ottawa on the wiki:

https://wiki.postgresql.org/wiki/PgCon_2016_Developer_Unconference/Wait_events_monitoring


(Thanks to Alexander Korotkov for patiently pushing me to finally get
this done.)

If you attended, feel free to point out anything I missed; I will put
it on the wiki too.

Wait event monitoring once again looks stuck on its way through community
approval, in spite of the huge progress made in that direction last year.
The importance of the topic is beyond discussion now: if you talk to any
PostgreSQL person about implementing such a tool in Postgres and the
person does not get excited, you are probably talking to a full-time
PostgreSQL developer ;-) Obviously it needs a better design, of both the
user interface and the implementation, and perhaps this is why full-time
developers are still sceptical.

In order to move forward, IMHO we need at least the following steps,
which can be done in parallel:

1. Further requirements need to be collected from DBAs.

   If you are a PostgreSQL DBA with Oracle experience who uses perf for
troubleshooting Postgres, you are an ideal person to share your
experience, but everyone is welcome.

2. Further pg_wait_sampling performance testing is needed, in different
environments.

   According to its developers the overhead is small, but many people
suspect it could be much more significant for intensive workloads.
Obviously, it is not an easy task to test, because you need to put code
of doubtful production-readiness into mission-critical production
systems for such tests.
   The result should make it clear whether this design is acceptable, or
whether it should be abandoned in favour of less-invasive solutions.

Any thoughts?

Best regards,
Ilya

-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


[HACKERS] patch: xmltable - proof concept

2016-08-07 Thread Pavel Stehule
Hi

I am sending an initial implementation of the xmltable function:

The code is not clean yet, but it does most of the expected work. The
usage is simple. It is fast: 16K entries in 400 ms.

I invite any help with documentation and testing.

A fully ANSI SQL or Oracle-compatible implementation is not possible due
to the limits of libxml2, but for typical usage it should work well. It
doesn't need any new reserved keywords, so there should be no hard
barriers to acceptance (once this work is complete).

Example:

postgres=# SELECT * FROM xmldata;
┌────────────────────────────────────────────────────────────────────┐
│                                data                                │
╞════════════════════════════════════════════════════════════════════╡
│ <ROWS>                                                            ↵│
│   <ROW id="1">                                                    ↵│
│     <COUNTRY_ID>AU</COUNTRY_ID>                                   ↵│
│     <COUNTRY_NAME>Australia</COUNTRY_NAME>                        ↵│
│     <REGION_ID>3</REGION_ID>                                      ↵│
│   </ROW>                                                          ↵│
│   <ROW id="2">                                                    ↵│
│     <COUNTRY_ID>CN</COUNTRY_ID>                                   ↵│
│     <COUNTRY_NAME>China</COUNTRY_NAME>                            ↵│
│     <REGION_ID>3</REGION_ID>                                      ↵│
│   </ROW>                                                          ↵│
│   <ROW id="3">                                                    ↵│
│     <COUNTRY_ID>HK</COUNTRY_ID>                                   ↵│
│     <COUNTRY_NAME>HongKong</COUNTRY_NAME>                         ↵│
│     <REGION_ID>3</REGION_ID>                                      ↵│
│   </ROW>                                                          ↵│
│   <ROW id="4">                                                    ↵│
│     <COUNTRY_ID>IN</COUNTRY_ID>                                   ↵│
│     <COUNTRY_NAME>India</COUNTRY_NAME>                            ↵│
│     <REGION_ID>3</REGION_ID>                                      ↵│
│   </ROW>                                                          ↵│
│   <ROW id="5">                                                    ↵│
│     <COUNTRY_ID>JP</COUNTRY_ID>                                   ↵│
│     <COUNTRY_NAME>Japan</COUNTRY_NAME>                            ↵│
│     <REGION_ID>3</REGION_ID><PREMIER_NAME>Sinzo Abe</PREMIER_NAME>↵│
│   </ROW>                                                          ↵│
│   <ROW id="6">                                                    ↵│
│     <COUNTRY_ID>SG</COUNTRY_ID>                                   ↵│
│     <COUNTRY_NAME>Singapore</COUNTRY_NAME>                        ↵│
│     <REGION_ID>3</REGION_ID><SIZE unit="km">791</SIZE>            ↵│
│   </ROW>                                                          ↵│
│ </ROWS>                                                            │
└────────────────────────────────────────────────────────────────────┘
(1 row)
postgres=# SELECT xmltable.*
postgres-#   FROM (SELECT data FROM xmldata) x,
postgres-#        LATERAL xmltable('/ROWS/ROW'
postgres(#                        PASSING data
postgres(#                        COLUMNS id int PATH '@id',
postgres(#                                country_name text PATH 'COUNTRY_NAME',
postgres(#                                country_id text PATH 'COUNTRY_ID',
postgres(#                                region_id int PATH 'REGION_ID',
postgres(#                                size float PATH 'SIZE',
postgres(#                                unit text PATH 'SIZE/@unit',
postgres(#                                premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');
┌────┬──────────────┬────────────┬───────────┬──────┬──────┬───────────────┐
│ id │ country_name │ country_id │ region_id │ size │ unit │ premier_name  │
╞════╪══════════════╪════════════╪═══════════╪══════╪══════╪═══════════════╡
│  1 │ Australia    │ AU         │         3 │    ¤ │ ¤    │ not specified │
│  2 │ China        │ CN         │         3 │    ¤ │ ¤    │ not specified │
│  3 │ HongKong     │ HK         │         3 │    ¤ │ ¤    │ not specified │
│  4 │ India        │ IN         │         3 │    ¤ │ ¤    │ not specified │
│  5 │ Japan        │ JP         │         3 │    ¤ │ ¤    │ Sinzo Abe     │
│  6 │ Singapore    │ SG         │         3 │  791 │ km   │ not specified │
└────┴──────────────┴────────────┴───────────┴──────┴──────┴───────────────┘
(6 rows)

Regards

Pavel
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 69bf65d..326fa62 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -58,7 +58,6 @@
 #include "utils/typcache.h"
 #include "utils/xml.h"
 
-
 /* static function decls */
 static Datum ExecEvalArrayRef(ArrayRefExprState *astate,
  ExprContext *econtext,
@@ -149,6 +148,8 @@ static Datum ExecEvalMinMax(MinMaxExprState *minmaxExpr,
 			   bool *isNull, ExprDoneCond *isDone);
 static Datum ExecEvalXml(XmlExprState *xmlExpr, ExprContext *econtext,
 			bool *isNull, ExprDoneCond *isDone);
+static Datum ExecEvalXmlTable(XmlTableState *xmltable, ExprContext *econtext,
+			bool *isnull, ExprDoneCond *isDone);
 static Datum ExecEvalNullIf(FuncExprState *nullIfExpr,
 			   ExprContext *econtext,
 			   bool *isNull, ExprDoneCond *isDone);
@@ -2073,6 +2074,7 @@ ExecMakeTableFunctionResult(ExprState *funcexpr,
 	MemoryContext oldcontext;
 	bool		direct_function_call;
 	bool		first_time = true;
+	bool		xmltable_call;
 
 	callerContext = 

Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-07 Thread Andreas Joseph Krogh
On Sunday, 7 August 2016 at 09:01:40, Dean Rasheed
<dean.a.rash...@gmail.com> wrote:
On 5 August 2016 at 21:48, Tom Lane  wrote:
 > OK, thanks.  What shall we do about Andreas' request to back-patch this?
 > I'm personally willing to do it, but there is the old bugaboo of "maybe
 > it will destabilize a plan that someone is happy with".
 >

 My inclination would be to back-patch it because arguably it's a
 bug-fix -- at the very least the old behaviour didn't match the docs
 for stadistinct:
 [snip]
 
 
Will this then make it into the soon-to-be-released 9.5.4?
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com


Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

2016-08-07 Thread Dean Rasheed
On 5 August 2016 at 21:48, Tom Lane  wrote:
> OK, thanks.  What shall we do about Andreas' request to back-patch this?
> I'm personally willing to do it, but there is the old bugaboo of "maybe
> it will destabilize a plan that someone is happy with".
>

My inclination would be to back-patch it because arguably it's a
bug-fix -- at the very least the old behaviour didn't match the docs
for stadistinct:

  The number of distinct nonnull data values in the column.
  A value greater than zero is the actual number of distinct values.
  A value less than zero is the negative of a multiplier for the number
  of rows in the table; for example, a column in which values appear about
  twice on the average could be represented by
  stadistinct = -0.5.

Additionally, I think that example is misleading because it's only
really true if there are no null values in the column. Perhaps it
would help to have a more explicit example to illustrate how nulls
affect stadistinct, for example:

  ... for example, a column in which about 80% of the values are nonnull
  and each nonnull value appears about twice on average could be
  represented by stadistinct = -0.4.
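
(Spelling out the arithmetic behind that figure: with N rows, 0.8*N are
nonnull; at two appearances per value that is 0.4*N distinct nonnull
values, so stadistinct = -(0.4*N)/N = -0.4.)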

Regards,
Dean

