date:20160615

Re: [HACKERS] forcing a rebuild of the visibility map

2016-06-15 Thread Masahiko Sawada

On Thu, Jun 16, 2016 at 12:06 PM, Robert Haas  wrote:
> [ Changing subject line in the hopes of keeping separate issues in
> separate threads. ]
>
> On Mon, Jun 6, 2016 at 11:00 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> I'm intuitively sympathetic to the idea that we should have an option
>>> for this, but I can't figure out in what case we'd actually tell
>>> anyone to use it.  It would be useful for the kinds of bugs listed
>>> above to have VACUUM (rebuild_vm) to blow away the VM fork and rebuild
>>> it, but that's different semantics than what we proposed for VACUUM
>>> (even_frozen_pages).  And I'd be sort of inclined to handle that case
>>> by providing some other way to remove VM forks (like a new function in
>>> the pg_visibilitymap contrib module, maybe?) and then just tell people
>>> to run regular VACUUM afterwards, rather than putting the actual VM
>>> fork removal into VACUUM.
>>
>> There's a lot to be said for that approach.  If we do it, I'd be a bit
>> inclined to offer an option to blow away the FSM as well.
>
> After having been scared out of my mind by the discovery of
> longstanding breakage in heap_update[1], it occurred to me that this
> is an excellent example of a case in which the option for which Andres
> was agitating - specifically forcing VACUUM to visit ever single page
> in the heap - would be useful.  Even if there's functionality
> available to blow away the visibility map forks, you might not want to
> just go remove them all and re-vacuum everything, because while you
> were rebuilding the VM forks, index-only scans would stop being
> index-only, and that might cause an operational problem.  Being able
> to simply force every page to be visited is better.  Patch for that
> attached.  I decided to call the option disable_page_skipping rather
> than even_frozen_pages because it should also force visiting pages
> that are all-visible but not all-frozen.  (I was sorely tempted to go
> with the competing proposal of calling it VACUUM (THEWHOLEDAMNTHING)
> but I refrained.)
>
> However, I also think that the approach described above - providing a
> way to excise VM forks specifically - is useful.  Patch for that
> attached, too.  It turns out that can't be done without either adding
> a new WAL record type or extending an existing one; I chose to add a
> "flags" argument to XLOG_SMGR_TRUNCATE, specifying the forks to be
> truncated.  Since this will require bumping the XLOG version, if we're
> going to do it, and I think we should, it would be good to try to get
> it done before beta2 rather than closer to release.  However, I don't
> want to commit this without some review, so please review this.
> Actually, please review both patches.[2]  The same core support could
> be used to add an option to truncate the FSM, but I didn't write a
> patch for that because I'm incredibly { busy | lazy }.
>

Option name DISABLE_PAGE_SKIPPING is good to me.
I'm still working on this, but here are some review comments.

---
+CREATE FUNCTION pg_truncate_visibility_map(regclass)
+RETURNS void
+AS 'MODULE_PATHNAME', 'pg_truncate_visibility_map'
+LANGUAGE C STRICT
+PARALLEL UNSAFE;  -- let's not make this any more dangerous
+

"REVOKE ALL ON FUNCTION pg_truncate_visibility_map(regclass) FROM
PUBLIC;" is missing in pg_visibility/pg_visibility--1.0--1.1.sql.

---
I think that VACUUM with VERBOSE option can show the information for
how many frozen pages we skipped like autovacuum does. Thought?
Patch attached.

Regards,

--
Masahiko Sawada
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 7a67fa5..ddbb835 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -1329,6 +1329,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	"Skipped %u pages due to buffer pins.\n",
 	vacrelstats->pinskipped_pages),
 	 vacrelstats->pinskipped_pages);
+	appendStringInfo(&buf, _("Skipped %u frozen pages.\n"),
+	 vacrelstats->frozenskipped_pages);
 	appendStringInfo(&buf, ngettext("%u page is entirely empty.\n",
 	"%u pages are entirely empty.\n",
 	empty_pages),

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Prevent ALTER TABLE DROP NOT NULL on child tables if parent column has it

2016-06-15 Thread Vitaly Burovoy

On 6/15/16, Michael Paquier  wrote:
> On Thu, Jun 16, 2016 at 12:10 PM, Vitaly Burovoy
>  wrote:
>> In the initial letter[1] I posted a digest from the SQL-2011 standard
>> and a conclusion as a design of a new patch.
>> Now I have more free time and I'm hacking it that way. The new patch
>> is at the very early stage, full of WIPs and TODOs. I hope it'll be
>> ready to be shown in a month (may be two).
>
> I have just read both your patch and the one of Alvaro, but yours does
> not touch pg_constraint in any way. Isn't that unexpected?

The consensus was reached to use CHECK constraints instead of new type
of constrains.
Alvaro made attempt[1] to write PoC in 2012 but it failed to apply on
top of master (and after solving conflicts led to core dumps) in Jan
2016.

I just rebased Alvaro's one to understand how he wanted to solve issue
and to run tests and queries. After all I sent rebased working patch
for anyone who wants to read it and try it without core dumps.

I have not published my patch for NOT NULLs yet.

Alvaro, the correct path[2] in the second message[3] of the thread.
What's wrong in it (I got the source in the previous[1] thread)?

[1] 
https://www.postgresql.org/message-id/flat/1343682669-sup-2...@alvh.no-ip.org
[2] 
https://www.postgresql.org/message-id/attachment/41886/catalog-notnull-2-c477e84_cleaned.patch
[3] 
https://www.postgresql.org/message-id/CAKOSWNnXbOY4pEiwN9wePOx8J%2BB44yTj40BQ8RVo-eWkdiGDFQ%40mail.gmail.com
-- 
Best regards,
Vitaly Burovoy

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Prevent ALTER TABLE DROP NOT NULL on child tables if parent column has it

2016-06-15 Thread Michael Paquier

On Thu, Jun 16, 2016 at 1:27 PM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>> On Wed, Jun 15, 2016 at 10:34 PM, Tom Lane  wrote:
>
>> > IIRC, Alvaro posted a WIP patch for that awhile back.  Not sure what the
>> > current state is.
>>
>> Are you talking about that?
>> https://www.postgresql.org/message-id/20110707213401.GA27098%40alvh.no-ip.org
>> This is not a small patch :)
>>
>> Alvaro, others, any opinions?
>
> I haven't looked at this in a while.  I suppose I should get it in shape
> sometime.  Unless you would like to work on it?

Well, I could, but there is another big patch I am getting into shape
for the next CF (Ahem. scram. Ahem), so I am going to stay focused on
this one to have fuel for it during the CF. But I'm fine reviewing the
patch for the DROP NOT NULL.

> Note that Vitaly Burovoy rebased it, but I think the rebase is wrong.
> https://www.postgresql.org/message-id/cakoswnkn6hsyatuys8xzxzrcr-kl1okhs5-b9qd9bf1rad3...@mail.gmail.com

Yes, it is definitely wrong.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Typo in parallel comment of heap_delete()

2016-06-15 Thread Thomas Munro

On Tue, Jun 7, 2016 at 12:00 AM, Robert Haas  wrote:
> On Sun, Jun 5, 2016 at 4:39 PM, Jim Nasby  wrote:
>> I'm pretty sure this is a typo...
>
> Sure is.  Thanks.

The same typo appears in heap_update.

PS Far be it from me, but postgres_fdw.c seems to have a stray
conditional form where I would expect a present subjunctive:  "...
lest the new path *would* kick ..."
/me ducks

-- 
Thomas Munro
http://www.enterprisedb.com

lest.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Prevent ALTER TABLE DROP NOT NULL on child tables if parent column has it

2016-06-15 Thread Alvaro Herrera

Michael Paquier wrote:
> On Wed, Jun 15, 2016 at 10:34 PM, Tom Lane  wrote:

> > IIRC, Alvaro posted a WIP patch for that awhile back.  Not sure what the
> > current state is.
> 
> Are you talking about that?
> https://www.postgresql.org/message-id/20110707213401.GA27098%40alvh.no-ip.org
> This is not a small patch :)
> 
> Alvaro, others, any opinions?

I haven't looked at this in a while.  I suppose I should get it in shape
sometime.  Unless you would like to work on it?

Note that Vitaly Burovoy rebased it, but I think the rebase is wrong.
https://www.postgresql.org/message-id/cakoswnkn6hsyatuys8xzxzrcr-kl1okhs5-b9qd9bf1rad3...@mail.gmail.com

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Prevent ALTER TABLE DROP NOT NULL on child tables if parent column has it

2016-06-15 Thread Michael Paquier

On Thu, Jun 16, 2016 at 12:10 PM, Vitaly Burovoy
 wrote:
> In the initial letter[1] I posted a digest from the SQL-2011 standard
> and a conclusion as a design of a new patch.
> Now I have more free time and I'm hacking it that way. The new patch
> is at the very early stage, full of WIPs and TODOs. I hope it'll be
> ready to be shown in a month (may be two).

I have just read both your patch and the one of Alvaro, but yours does
not touch pg_constraint in any way. Isn't that unexpected?

> But it already forbids dropping NOT NULLs if they were set as result
> of inheritance.

Okay, I'll drop any proposal on my side in this case. Looking forward
to seeing what you got for the first commit fest of 10.0.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-15 Thread Robert Haas

On Tue, Jun 14, 2016 at 12:18 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Mon, Jun 13, 2016 at 9:37 PM, Tom Lane  wrote:
>>> ...  I got a core dump in the window.sql test:
>>> which I think may be another manifestation of the failure-to-apply-proper-
>>> pathtarget issue we're looking at in this thread.  Or maybe it's just
>>> an unjustified assumption in make_partialgroup_input_target that the
>>> input path must always have some sortgrouprefs assigned.
>
>> I don't see this problem after your recent commit
>> - 89d53515e53ea080029894939118365b647489b3.  Are you still getting it?
>
> No.  I am suspicious that there's still a bug there, ie we're looking at
> the wrong pathtarget; but the regression test doesn't actually crash.
> That might only be because we don't choose the parallelized path.

OK, so there are quite a number of separate things here:

1. The case originally reported by Thomas Munro still fails.  To fix
that, we probably need to apply scanjoin_target to each partial path.
But we can only do that if it's parallel-safe.  It seems like what we
want is something like this: (1) During scan/join planning, somehow
skip calling generate_gather_paths for the topmost scan/join rel as we
do to all the others.  (2) If scanjoin_target is not parallel-safe,
build a path for the scan/join phase that applies a Gather node to the
cheapest path and does projection at the Gather node.  Then forget all
the partial paths so we can't do any bogus upper planning.  (3) If
scanjoin_target is parallel-safe, replace the list of partial paths
for the topmost scan/join rel with a new list where scanjoin_target
has been applied to each one.  I haven't tested this so I might be
totally off-base about what's actually required here...

2. In https://www.postgresql.org/message-id/15695.1465827...@sss.pgh.pa.us
Tom raised the issue that we need some test cases covering this area.

3. In https://www.postgresql.org/message-id/25521.1465837...@sss.pgh.pa.us
Tom proposed adding a GUC to set the minimum relation size required
for consideration of parallelism.

4. In https://www.postgresql.org/message-id/1597.1465846...@sss.pgh.pa.us
Tom raised the question of whether there is a danger that
MinMaxAggPath might not be parallel-safe.

5. In https://www.postgresql.org/message-id/20391.1465850...@sss.pgh.pa.us
Tom proposed a patch to fix an apparent confusion on my part between
subplans and subqueries.

6. In 
https://www.postgresql.org/message-id/ca+tgmozwjb9eaibj-meeaq913wkkoz4aq7vqncfrdls9wyh...@mail.gmail.com
I discussed the need to fix the way consider_parallel is set for upper
rels, and in 
https://www.postgresql.org/message-id/22832.1465854...@sss.pgh.pa.us
Tom observed that, once that work is done, we can get rid of the
wholePlanParallelSafe flag.

I don't think it's remotely realistic to get all of this fixed in time
for beta2; I think we'll be lucky if we can get #1 fixed.  But we'd
better track all of these issues so that we can crank through them
later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Prevent ALTER TABLE DROP NOT NULL on child tables if parent column has it

2016-06-15 Thread Vitaly Burovoy

On 6/15/16, Tom Lane  wrote:
> Michael Paquier  writes:
>> To put it short, it should not be possible to drop a NOT NULL
>> constraint on a child relation when its parent table is using it, this
>> should only be possible from the parent. Attached is a patch handling
>> this problem by adding a new function in pg_inherits.c to fetch the
>> list of parent relations for a given relation OID, and did some
>> refactoring to stick with what is done when scanning child relations.
>
> This doesn't sound like the right approach; in particular, it won't really
> help for deciding whether to propagate a DROP NOT NULL on a parent rel to
> its children.  What we've discussed in the past is to store NOT NULL
> constraints in pg_constraint, much like CHECK constraints are already, and
> use use-count logic identical to the CHECK case to keep track of whether
> NOT NULL constraints are inherited or not.  My feeling is that we'd keep
> the pg_attribute.attnotnull field and continue to drive actual enforcement
> off that, but it would just reflect a summary of the pg_constraint state.
>
> IIRC, Alvaro posted a WIP patch for that awhile back.  Not sure what the
> current state is.
>
>   regards, tom lane

The last thread about NOT NULLs as constraints is accessible by the link[1].
I rebased[2] Alvaro's patch to the actual master at that time, but I
have not repeated it since then.

In the initial letter[1] I posted a digest from the SQL-2011 standard
and a conclusion as a design of a new patch.
Now I have more free time and I'm hacking it that way. The new patch
is at the very early stage, full of WIPs and TODOs. I hope it'll be
ready to be shown in a month (may be two).

But it already forbids dropping NOT NULLs if they were set as result
of inheritance.


[1] 
https://www.postgresql.org/message-id/flat/CAKOSWNkN6HSyatuys8xZxzRCR-KL1OkHS5-b9qd9bf1Rad3PLA%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/attachment/41886/catalog-notnull-2-c477e84_cleaned.patch

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Prevent ALTER TABLE DROP NOT NULL on child tables if parent column has it

2016-06-15 Thread Michael Paquier

On Wed, Jun 15, 2016 at 10:34 PM, Tom Lane  wrote:
> Michael Paquier  writes:
> This doesn't sound like the right approach; in particular, it won't really
> help for deciding whether to propagate a DROP NOT NULL on a parent rel to
> its children.  What we've discussed in the past is to store NOT NULL
> constraints in pg_constraint, much like CHECK constraints are already, and
> use use-count logic identical to the CHECK case to keep track of whether
> NOT NULL constraints are inherited or not.  My feeling is that we'd keep
> the pg_attribute.attnotnull field and continue to drive actual enforcement
> off that, but it would just reflect a summary of the pg_constraint state.

OK, I see. Hm, by storing this information I would actually think that
we want to drop this attnotnull so as we don't need to bother about
updating pg_attribute through the whole tree when dropping a NOT NULL
constraint on the parent, and we do not actually need to store this
information in two different places..

I would also rather do nothing for the DDL interface regarding for
example the possibility to change the constraint names for domains and
tables to keep things simple. A patch of this caliber would be
complicated enough if a catalog switch is done.

> IIRC, Alvaro posted a WIP patch for that awhile back.  Not sure what the
> current state is.

Are you talking about that?
https://www.postgresql.org/message-id/20110707213401.GA27098%40alvh.no-ip.org
This is not a small patch :)

Alvaro, others, any opinions?
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] forcing a rebuild of the visibility map

2016-06-15 Thread Robert Haas

[ Changing subject line in the hopes of keeping separate issues in
separate threads. ]

On Mon, Jun 6, 2016 at 11:00 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> I'm intuitively sympathetic to the idea that we should have an option
>> for this, but I can't figure out in what case we'd actually tell
>> anyone to use it.  It would be useful for the kinds of bugs listed
>> above to have VACUUM (rebuild_vm) to blow away the VM fork and rebuild
>> it, but that's different semantics than what we proposed for VACUUM
>> (even_frozen_pages).  And I'd be sort of inclined to handle that case
>> by providing some other way to remove VM forks (like a new function in
>> the pg_visibilitymap contrib module, maybe?) and then just tell people
>> to run regular VACUUM afterwards, rather than putting the actual VM
>> fork removal into VACUUM.
>
> There's a lot to be said for that approach.  If we do it, I'd be a bit
> inclined to offer an option to blow away the FSM as well.

After having been scared out of my mind by the discovery of
longstanding breakage in heap_update[1], it occurred to me that this
is an excellent example of a case in which the option for which Andres
was agitating - specifically forcing VACUUM to visit ever single page
in the heap - would be useful.  Even if there's functionality
available to blow away the visibility map forks, you might not want to
just go remove them all and re-vacuum everything, because while you
were rebuilding the VM forks, index-only scans would stop being
index-only, and that might cause an operational problem.  Being able
to simply force every page to be visited is better.  Patch for that
attached.  I decided to call the option disable_page_skipping rather
than even_frozen_pages because it should also force visiting pages
that are all-visible but not all-frozen.  (I was sorely tempted to go
with the competing proposal of calling it VACUUM (THEWHOLEDAMNTHING)
but I refrained.)

However, I also think that the approach described above - providing a
way to excise VM forks specifically - is useful.  Patch for that
attached, too.  It turns out that can't be done without either adding
a new WAL record type or extending an existing one; I chose to add a
"flags" argument to XLOG_SMGR_TRUNCATE, specifying the forks to be
truncated.  Since this will require bumping the XLOG version, if we're
going to do it, and I think we should, it would be good to try to get
it done before beta2 rather than closer to release.  However, I don't
want to commit this without some review, so please review this.
Actually, please review both patches.[2]  The same core support could
be used to add an option to truncate the FSM, but I didn't write a
patch for that because I'm incredibly { busy | lazy }.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] 
https://www.postgresql.org/message-id/ca+tgmoz-1txcsxwcp3i-kmo5dnb-6p-odqw7c-n_q3pzzy1...@mail.gmail.com
[2] This request is directed at the list generally, not at any
specific individual ... well, OK, I mean you.  Yeah, you!
From 010e99b403ec733d50c71a7d4ef646b1b446ef07 Mon Sep 17 00:00:00 2001
From: Robert Haas 
Date: Wed, 15 Jun 2016 22:52:58 -0400
Subject: [PATCH 2/2] Add VACUUM (DISABLE_PAGE_SKIPPING) for emergencies.

---
 doc/src/sgml/ref/vacuum.sgml | 21 -
 src/backend/commands/vacuum.c|  9 
 src/backend/commands/vacuumlazy.c| 87 
 src/backend/parser/gram.y| 10 +
 src/include/nodes/parsenodes.h   |  3 +-
 src/test/regress/expected/vacuum.out |  1 +
 src/test/regress/sql/vacuum.sql  |  2 +
 7 files changed, 92 insertions(+), 41 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 19fd748..dee1c5b 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation

-VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE } [, ...] ) ] [ table_name [ (column_name [, ...] ) ] ]
+VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ table_name [ (column_name [, ...] ) ] ]
 VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ table_name ]
 VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ table_name [ (column_name [, ...] ) ] ]

@@ -130,6 +130,25 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ 

+DISABLE_PAGE_SKIPPING
+
+ 
+  Normally, VACUUM will skip pages based on the visibility map.  Pages where
+  all tuples are known to be frozen can always be skipped, and those
+  where all tuples are known to be visible to all transactions may be
+  skipped except when performing an aggressive vacuum.  Furthermore,
+  except when performing an aggressive vacuum, some pages may be skipped
+  in order to avoid waiting for other sessions to finish using them.
+  This option disables all page-skipping behavior, and is intended to
+  be used only the contents of the visi

Re: [HACKERS] parallel.c is not marked as test covered

2016-06-15 Thread Robert Haas

On Wed, Jun 15, 2016 at 10:48 PM, Amit Kapila  wrote:
> exec_stmt_execsql() is used to execute SQL statements insider plpgsql which
> includes dml statements as well, so probably you wanted to play safe by not
> allowing parallel option from that place.  However, I think there shouldn't
> be a problem in using CURSOR_OPT_PARALLEL_OK from this place as we have a
> check in standard_planner which will take care of whether to choose parallel
> mode or not for a particular statement.  If you want, I can do more detailed
> analysis and prepare a patch.

That would be great.  I have a vague recollection that that function
might be called in some contexts where execution can be suspended,
which would be no good for parallel query.  But that might be wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] parallel.c is not marked as test covered

2016-06-15 Thread Amit Kapila

On Thu, Jun 16, 2016 at 12:46 AM, Robert Haas  wrote:
>
> On Wed, Jun 15, 2016 at 5:23 AM, Amit Kapila 
wrote:
> >> > Considering above analysis is correct, we have below options:
> >> > a. Modify the test such that it actually generates an error and to
hide
> >> > the
> >> > context, we can exception block and raise some generic error.
> >> > b. Modify the test such that it actually generates an error and to
hide
> >> > the
> >> > context, we can use force_parallel_mode = regress;
> >>
> >> Either of those sounds okay.  No need to raise a generic error; one can
> >> raise
> >> SQLERRM to keep the main message and not the context.  I lean toward
(a)
> >> so we
> >> have nonzero test coverage of force_parallel_mode=on.
> >
> > Patch implementing option (a) attached with this mail.
>
> OK, committed.

Thanks.

>I also changed "select" to "perform" per your
> analysis.

oops, it seems I have forgotten to make that change in patch.

>
>   I wonder if we need to revisit the choices I made inside
> PL/pgsql and see why CURSOR_OPT_PARALLEL_OK is not being set here.
>

exec_stmt_execsql() is used to execute SQL statements insider plpgsql which
includes dml statements as well, so probably you wanted to play safe by not
allowing parallel option from that place.  However, I think there shouldn't
be a problem in using CURSOR_OPT_PARALLEL_OK from this place as we have a
check in standard_planner which will take care of whether to choose
parallel mode or not for a particular statement.  If you want, I can do
more detailed analysis and prepare a patch.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Alvaro Herrera

Kevin Grittner wrote:

> test=# -- connection 1
> analyze verbose t1;  -- when run after threshold, STO error occurs
> INFO:  analyzing "public.t1"
> INFO:  "t1": scanned 45 of 45 pages, containing 8999 live rows and
> 1001 dead rows; 8999 rows in sample, 8999 estimated total rows
> ERROR:  snapshot too old
> CONTEXT:  SQL statement "SELECT (i * (select c1 from t2))"
> PL/pgSQL function mysq(integer) line 3 at RETURN
> 
> Is there some other behavior which would be preferred?

The fact that the ERROR is being thrown seems okay to me.  I was a bit
surprised that the second INFO line is shown, but there's a simple
explanation: we first acquire the sample rows (using
acquire_sample_rows) and only after that's done we try to compute the
stats from them.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Andres Freund

On 2016-06-15 20:22:24 -0500, Kevin Grittner wrote:
> I'm not clear where you see this as being in any way different with
> STO.  Above it seemed that you saw this as an issue related to
> ANALYZE.  If there is not early pruning for the table being
> analyzed, nothing is at all different.  If there is early pruning
> the rows are not seen and there could be no detoasting.  If there
> is a function that lies about IMMUTABLE and reads from a table, it
> either functions as before or throws a STO error on page access
> (long before any detoasting).  Am I missing something?

I'm not sure, I might be missing something myself. Given the frequency
of confusion of all senior hackers involved in this discussion...

I previously was thinking of this in the context of ANALYZE, but I now
think it's a bigger problem (and might not really affect ANALYZE
itself).

The simplest version of the scenario I'm concerned about is that a tuple
in a tuple is *not* vacuumed, even though it's elegible to be removed
due to STO. If that tuple has toasted columns, it could be the that the
toast table was independently vacuumed (autovac considers main/toast
tables separately, or the horizon could change between the two heap
scans, or pins could prevent vacuuming of one page, or ...).  Toast
accesses via tuptoaster.c currently don't perform
TestForOldSnapshot_impl(), because they use SnapshotToastData, not
SnapshotMVCC.

That seems to means two things:

1) You might get 'missing chunk number ...' errors on access to toasted
   columns. Misleading error, but ok.

2) Because the tuple has been pruned from the toast table, it's possible
   that the toast oid/va_valueid is reused, because now there's no
   conflict with GetNewOidWithIndex() anymore. In that case the
   toast_fetch_datum() might return a tuple from another column & type
   (all columns in a table share the same toast table), which could lead
   to almost arbitrary problems.  That's not super likely to happen, but
   could have quite severe consequences once it starts.

It seems the easiest way to fix this would be to make
TestForOldSnapshot() "accept" SnapshotToast as well.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner

On Wed, Jun 15, 2016 at 5:34 PM, Andres Freund  wrote:
> On 2016-06-15 16:58:25 -0500, Kevin Grittner wrote:
>> On Wed, Jun 15, 2016 at 3:25 PM, Andres Freund  wrote:
>>> On 2016-06-15 14:24:58 -0500, Kevin Grittner wrote:
 On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund  wrote:

> We might fetch a toast tuple which
> since have been re-purposed for a datum of a different type.

 How would that happen?
>>>
>>> Autovac vacuums toast and heap tables independently. Once a toast datum
>>> isn't used anymore, the oid used can be reused (because it doesn't
>>> conflict via GetNewOidWithIndex() anymore. If analyze then detoasts a
>>> datum, which hasn't been removed, the contents of that toast id, might
>>> actually be for something different.
>>
>> What prevents that from happening now, without STO?
>
> Afaics we shouldn't ever look (i.e. detoast) at a "dead for everyone"
> tuple in autovacuum (or anywhere else). There's one minor exception to
> that, and that's enum datums in indexes, which is why we currently have
> weird transactional requirements for them.  I'm not entirely sure this
> can be hit, but it's worth checking.

I'm not clear where you see this as being in any way different with
STO.  Above it seemed that you saw this as an issue related to
ANALYZE.  If there is not early pruning for the table being
analyzed, nothing is at all different.  If there is early pruning
the rows are not seen and there could be no detoasting.  If there
is a function that lies about IMMUTABLE and reads from a table, it
either functions as before or throws a STO error on page access
(long before any detoasting).  Am I missing something?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increase message string buffer size of watch command of psql

2016-06-15 Thread Ioseph Kim

Thanks, I agree that strftime() is better then asctime().

regards, Ioseph

2016년 06월 16일 08:37에 Tom Lane 이(가) 쓴 글:
> Alvaro Herrera  writes:
>> +1 to strftime("%c").  If we wanted to provide additional flexibility we
>> could have a \pset option to change the strftime format string to
>> something other than %c, but I don't think there's enough demand to
>> justify it.
> Agreed, that seems like something for later (or never).  Pushed that way.
>
> I also widened the buffer a bit in the back branches, to address the
> original complaint.
>
>   regards, tom lane



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increase message string buffer size of watch command of psql

2016-06-15 Thread Tom Lane

Alvaro Herrera  writes:
> +1 to strftime("%c").  If we wanted to provide additional flexibility we
> could have a \pset option to change the strftime format string to
> something other than %c, but I don't think there's enough demand to
> justify it.

Agreed, that seems like something for later (or never).  Pushed that way.

I also widened the buffer a bit in the back branches, to address the
original complaint.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] MultiXactId error after upgrade to 9.3.4

2016-06-15 Thread Alvaro Herrera

Stephen Frost wrote:
> Greetings,
> 
>   Looks like we might not be entirely out of the woods yet regarding
>   MultiXactId's.  After doing an upgrade from 9.2.6 to 9.3.4, we saw the
>   following:
> 
>   ERROR:  MultiXactId 6849409 has not been created yet -- apparent wraparound
> 
>   The table contents can be select'd out and match the pre-upgrade
>   backup, but any attempt to VACUUM / VACUUM FULL / CLUSTER fails,
>   unsurprisingly.

I finally figured what is going on here, though I don't yet have a
patch.

This has been reported a number of times:

https://www.postgresql.org/message-id/20140330040029.GY4582%40tamriel.snowman.net
https://www.postgresql.org/message-id/538F3D70.6080902%40publicrelay.com
https://www.postgresql.org/message-id/556439CF.7070109%40pscs.co.uk
https://www.postgresql.org/message-id/20160614173150.GA443784@alvherre.pgsql
https://www.postgresql.org/message-id/20160615203829.5798.4...@wrigleys.postgresql.org

We theorised that we were missing some place that was failing to pass
the "allow_old" flag to GetMultiXactIdMembers; and since we couldn't
find any and the problem was worked around simply (by doing SELECT FOR
UPDATE or equivalent on the affected tuples), there was no further
research.  (The allow_old flag is passed for tuples that match an
infomask bit pattern that can only come from tuples locked in 9.2 and
prior, i.e. one that is never set by 9.3ff).

Yesterday I had to deal with it and quickly found what is going wrong:
the problem is that 9.2 and earlier it was acceptable (and common) to
leave tuples with very old multixacts in xmax, even after multixact
counter wraparound.  When one such value was found in a live tuple,
GetMultiXactIdMembers() would notice that it was out of range and simply
return "no members", at which point heap_update and siblings would
consider the tuple as not locked and move on.

When pg_upgrading a database containing tuples marked like that, the new
code would error out, because during 9.3 multixact we considered that it
was dangerous to silently allow tuples to be marked by values we didn't
keep track of, so we made it an error instead, per
https://www.postgresql.org/message-id/20111204122027.GA10035%40tornado.leadboat.com
Some cases are allowed to be downgraded to DEBUG, when allow_old is
true.

I think that was a good choice in general so that possibly-data-eating
bugs could be reported, but there's a problem in the specific case of
tuples carried over by pg_upgrade whose Multixact is "further in the
future" compared to the nextMultiXactId counter.  I think it's pretty
clear that we should let that error be downgraded to DEBUG too, like the
other checks.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH v12] GSSAPI encryption support

2016-06-15 Thread Robbie Harwood

Michael Paquier  writes:

> On Thu, Apr 7, 2016 at 8:20 AM, Tom Lane  wrote:
>> Robbie Harwood  writes:
>>> Tom Lane  writes:
>>>
 Wait a second.  So the initial connection-request packet is
 necessarily unencrypted under this scheme?
>>
>>> Yes, by necessity.  The username must be sent in the clear, even if
>>> only as part of the GSSAPI handshake (i.e., the GSSAPI username will
>>> appear in plantext in the GSSAPI blobs which are otherwise
>>> encrypted).  GSSAPI performs authentication before it can start
>>> encryption.
>>
>> Ugh.  I had thought we were putting work into this because it
>> represented something we could recommend as best practice, but now
>> you're telling me that it's always going to be inferior to what we
>> have already.
>
> It does not seem necessary to have an equivalent of
> pqsecure_open_client, just some extra handling in fe-connect.c to set
> up the initial context with a proper message handling... Not that
> direct anyway. So should the patch be marked as returned with feedback
> at this stage?

I think in order to satisfy Tom's (valid) concern, there does need to be
a separate handshake - i.e., GSSAPI support in pqsecure_open_client().

If I were to continue as I have been - using the plaintext connection
and auth negotiation path - then at the time of startup the client has
no way of knowing whether to send connection parameters or not.
Personally, I would be in favor of not frontloading these connection
parameters over insecure connections, but it is my impression that the
project does not want to go this way (which is fine).

The way I'm seeing this, when a connection comes in, we take the 'G'
character for GSSAPI much as for SSL.  At that time, we need to perform
an *authentication* handshake (because GSSAPI will not do encryption
before authenticating).  I expect to use a consistent format for all
GSSAPI packets - four bytes for length, and a payload.  (I would prefer
tagging them, but previously preference for not doing this has been
expressed.)

Once GSSAPI authentication is complete, the normal handshake process can
be tunneled through a GSSAPI encryption layer, as is done with TLS.  The
server will need to retain some of the earlier authentication data
(e.g., to check that the presented user-name matches GSSAPI
credentials), but there will be no authentication packets exchanged
(more specifically, it will resemble the anonymous case).  Authorization
will be checked as normal, and we then proceed in the usual fashion, all
over the GSSAPI tunnel.

On the server, I'll need to implement `hostgss` (by analogy to
`hostssl`), and we'll want to lock authentication on those connections
to GSSAPI-only.  Clients will explicitly probe for GSSAPI support as
they do for TLS support (I look forward to the bikeshed on the order of
these) and should have a parameter to require said support.  One thing
I'm not clear on is what our behavior should be when the user doesn't
explicitly request GSSAPI and doesn't have a ccache - do we prompt?
Skip probing?  I'm not sure what the best option there is.

Before I implement this design, does anyone have any additional concerns
or feedback on it?

Thanks,
--Robbie

signature.asc
Description: PGP signature

Re: [HACKERS] New design for FK-based join selectivity estimation

2016-06-15 Thread Tom Lane

Tomas Vondra  writes:
> Attached is a reworked patch, mostly following the new design proposal 
> from this thread.

I've whacked this around quite a bit and am now reasonably happy with it.
The issue of planning speed hit should be pretty much gone, although I've
not done testing to prove that.  I've rearranged the actual selectivity
calculation some too, because tests I did here did not look very good for
anything but the plain-inner-join case.  It may be that more work is
needed there, but at least there's reasonable commentary about what we're
doing and why.

I have not adopted the plan of ignoring single-column FKs.  While it would
only take a couple more lines to do so, I think the argument that there
will be a material planning speed hit no longer has much force.  And I see
a couple of arguments in favor of allowing this code to trigger on
single-column FKs.  First, it should work well even when pg_statistic data
is missing or out of date, which would be an improvement; and second, we
are likely not going to get much beta testing of the behavior if we
restrict it to multi-col FKs.  So I think we should ship it like this for
beta even if we end up adding a filter against single-column FKs for
release.

Comments and testing appreciated.

regards, tom lane

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 08ed990..8e5b33c 100644
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
***
*** 27,32 
--- 27,33 
  #include "nodes/plannodes.h"
  #include "nodes/relation.h"
  #include "utils/datum.h"
+ #include "utils/rel.h"
  
  
  /*
*** _copyValue(const Value *from)
*** 4252,4257 
--- 4253,4276 
  	return newnode;
  }
  
+ 
+ static ForeignKeyCacheInfo *
+ _copyForeignKeyCacheInfo(const ForeignKeyCacheInfo *from)
+ {
+ 	ForeignKeyCacheInfo *newnode = makeNode(ForeignKeyCacheInfo);
+ 
+ 	COPY_SCALAR_FIELD(conrelid);
+ 	COPY_SCALAR_FIELD(confrelid);
+ 	COPY_SCALAR_FIELD(nkeys);
+ 	/* COPY_SCALAR_FIELD might work for these, but let's not assume that */
+ 	memcpy(newnode->conkey, from->conkey, sizeof(newnode->conkey));
+ 	memcpy(newnode->confkey, from->confkey, sizeof(newnode->confkey));
+ 	memcpy(newnode->conpfeqop, from->conpfeqop, sizeof(newnode->conpfeqop));
+ 
+ 	return newnode;
+ }
+ 
+ 
  /*
   * copyObject
   *
*** copyObject(const void *from)
*** 5050,5055 
--- 5069,5081 
  			retval = _copyRoleSpec(from);
  			break;
  
+ 			/*
+ 			 * MISCELLANEOUS NODES
+ 			 */
+ 		case T_ForeignKeyCacheInfo:
+ 			retval = _copyForeignKeyCacheInfo(from);
+ 			break;
+ 
  		default:
  			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(from));
  			retval = 0;			/* keep compiler quiet */
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index c7b4153..eaaa370 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
***
*** 30,35 
--- 30,36 
  #include "nodes/plannodes.h"
  #include "nodes/relation.h"
  #include "utils/datum.h"
+ #include "utils/rel.h"
  
  
  /*
*** _outPlannerInfo(StringInfo str, const Pl
*** 2048,2053 
--- 2049,2055 
  	WRITE_NODE_FIELD(append_rel_list);
  	WRITE_NODE_FIELD(rowMarks);
  	WRITE_NODE_FIELD(placeholder_list);
+ 	WRITE_NODE_FIELD(fkey_list);
  	WRITE_NODE_FIELD(query_pathkeys);
  	WRITE_NODE_FIELD(group_pathkeys);
  	WRITE_NODE_FIELD(window_pathkeys);
*** _outIndexOptInfo(StringInfo str, const I
*** 2138,2143 
--- 2140,2174 
  }
  
  static void
+ _outForeignKeyOptInfo(StringInfo str, const ForeignKeyOptInfo *node)
+ {
+ 	int			i;
+ 
+ 	WRITE_NODE_TYPE("FOREIGNKEYOPTINFO");
+ 
+ 	WRITE_UINT_FIELD(con_relid);
+ 	WRITE_UINT_FIELD(ref_relid);
+ 	WRITE_INT_FIELD(nkeys);
+ 	appendStringInfoString(str, " :conkey");
+ 	for (i = 0; i < node->nkeys; i++)
+ 		appendStringInfo(str, " %d", node->conkey[i]);
+ 	appendStringInfoString(str, " :confkey");
+ 	for (i = 0; i < node->nkeys; i++)
+ 		appendStringInfo(str, " %d", node->confkey[i]);
+ 	appendStringInfoString(str, " :conpfeqop");
+ 	for (i = 0; i < node->nkeys; i++)
+ 		appendStringInfo(str, " %u", node->conpfeqop[i]);
+ 	WRITE_INT_FIELD(nmatched);
+ 	/* for compactness, just print the number of matches per column: */
+ 	appendStringInfoString(str, " :eclass");
+ 	for (i = 0; i < node->nkeys; i++)
+ 		appendStringInfo(str, " %d", (node->eclass[i] != NULL));
+ 	appendStringInfoString(str, " :rinfos");
+ 	for (i = 0; i < node->nkeys; i++)
+ 		appendStringInfo(str, " %d", list_length(node->rinfos[i]));
+ }
+ 
+ static void
  _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
  {
  	/*
*** _outConstraint(StringInfo str, const Con
*** 3207,3212 
--- 3238,3264 
  	}
  }
  
+ static void
+ _outForeignKeyCacheInfo(StringInfo str, const ForeignKeyCacheInfo *node)
+ {
+ 	int			i;
+ 
+ 	WRITE_NODE_TYPE("FOREIGNKEYCACHEINFO");
+ 
+ 	WRITE_OID_FIELD(conrelid);
+ 	WRITE_OID_FIELD

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Andres Freund

On 2016-06-15 16:58:25 -0500, Kevin Grittner wrote:
> On Wed, Jun 15, 2016 at 3:25 PM, Andres Freund  wrote:
> > On 2016-06-15 14:24:58 -0500, Kevin Grittner wrote:
> >> On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund  wrote:
> >>
> >> > We might fetch a toast tuple which
> >> > since have been re-purposed for a datum of a different type.
> >>
> >> How would that happen?
> >
> > Autovac vacuums toast and heap tables independently. Once a toast datum
> > isn't used anymore, the oid used can be reused (because it doesn't
> > conflict via GetNewOidWithIndex() anymore. If analyze then detoasts a
> > datum, which hasn't been removed, the contents of that toast id, might
> > actually be for something different.
> 
> What prevents that from happening now, without STO?

Afaics we shouldn't ever look (i.e. detoast) at a "dead for everyone"
tuple in autovacuum (or anywhere else). There's one minor exception to
that, and that's enum datums in indexes, which is why we currently have
weird transactional requirements for them.  I'm not entirely sure this
can be hit, but it's worth checking.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner

On Wed, Jun 15, 2016 at 2:40 PM, Alvaro Herrera
 wrote:
> Kevin Grittner wrote:
>> On Wed, Jun 15, 2016 at 1:59 PM, Alvaro Herrera  
>> wrote:
>
>> > We actually go quite some lengths to support this case, even when it's
>> > the opinion of many that we shouldn't.  For example VACUUM doesn't try
>> > to find index entries using the values in each deleted tuple; instead we
>> > remember the TIDs and then scan the indexes (possibly many times) to
>> > find entries that match those TIDs -- which is much slower.  Yet we do
>> > it this way to protect the case that somebody is doing the
>> > not-really-IMMUTABLE function.
>> >
>> > In other words, I don't think we consider the position you argued as
>> > acceptable.
>>
>> What are you saying is unacceptable, and what behavior would be
>> acceptable instead?
>
> The answer "we don't support the situation where you have an index using
> an IMMUTABLE function that isn't actually immutable" is not acceptable.
> The acceptable solution would be a design that doesn't have that
> property as a requisite.
>
> I think having various executor(/heapam) checks that raise errors when
> queries are executed from within ANALYZE is acceptable.

Here is an example of a test case showing that:

-- connection 1
drop table if exists t1;
create table t1 (c1 int not null);
drop table if exists t2;
create table t2 (c1 int not null);
insert into t1 select generate_series(1, 1);
drop function mysq(i int);
create function mysq(i int)
  returns int
  language plpgsql
  immutable
as $mysq$
begin
  return (i * (select c1 from t2));
end
$mysq$;
insert into t2 values (1);
create index t1_c1sq on t1 ((mysq(c1)));
begin transaction isolation level repeatable read;
select 1;

-- connection 2
vacuum analyze verbose t1;
delete from t1 where c1 between 1000 and 1999;
delete from t1 where c1 = 8000;
update t2 set c1 = 1;

-- connection 1
analyze verbose t1;  -- when run after threshold, STO error occurs

The tail end of that, running the analyze once immediately and once
after the threshold is:

test=# -- connection 1
test=# analyze verbose t1;  -- when run after threshold, STO error occurs
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 45 of 45 pages, containing 8999 live rows and
1001 dead rows; 8999 rows in sample, 8999 estimated total rows
ANALYZE
test=# -- connection 1
analyze verbose t1;  -- when run after threshold, STO error occurs
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 45 of 45 pages, containing 8999 live rows and
1001 dead rows; 8999 rows in sample, 8999 estimated total rows
ERROR:  snapshot too old
CONTEXT:  SQL statement "SELECT (i * (select c1 from t2))"
PL/pgSQL function mysq(integer) line 3 at RETURN

Is there some other behavior which would be preferred?

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner

On Wed, Jun 15, 2016 at 3:25 PM, Andres Freund  wrote:
> On 2016-06-15 14:24:58 -0500, Kevin Grittner wrote:
>> On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund  wrote:
>>
>> > We might fetch a toast tuple which
>> > since have been re-purposed for a datum of a different type.
>>
>> How would that happen?
>
> Autovac vacuums toast and heap tables independently. Once a toast datum
> isn't used anymore, the oid used can be reused (because it doesn't
> conflict via GetNewOidWithIndex() anymore. If analyze then detoasts a
> datum, which hasn't been removed, the contents of that toast id, might
> actually be for something different.

What prevents that from happening now, without STO?

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] parallel.c is not marked as test covered

2016-06-15 Thread Robert Haas

On Wed, Jun 15, 2016 at 4:59 PM, Peter Eisentraut
 wrote:
> On 6/14/16 12:37 PM, Robert Haas wrote:
>> ERROR: can't generate random numbers because you haven't specified a seed
>>
>> ...to which the user will reply, "oh yes I did; in fact I ran SELECT
>> magic_setseed(42) just before I ran the offending query!".  They'll
>> then contact an expert (hopefully) who will very possibly spend a lot
>> of time troubleshooting the wrong thing.  If the message says:
>>
>> ERROR: can't generate random numbers because you haven't specified a seed
>> CONTEXT: parallel worker, PID 62413
>>
>> ...then the expert has a very good chance of guessing what has gone
>> wrong right away.
>
> My updated opinion is to keep the context but remove the PID.  I can see
> that as this feature is being introduced, we will want to know that
> something is happening inside a parallel worker.  Maybe in the future we'll
> have it all figured out and won't need the context anymore.  But the PID is
> not useful by the time you see the message.  The server log should contain a
> trace of the PIDs if you need to do deep digging.

I'm comfortable with that.  Feel free to make it so, unless you want
me to do it for some reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] parallel.c is not marked as test covered

2016-06-15 Thread Peter Eisentraut


On 6/14/16 12:37 PM, Robert Haas wrote:

ERROR: can't generate random numbers because you haven't specified a seed

...to which the user will reply, "oh yes I did; in fact I ran SELECT
magic_setseed(42) just before I ran the offending query!".  They'll
then contact an expert (hopefully) who will very possibly spend a lot
of time troubleshooting the wrong thing.  If the message says:

ERROR: can't generate random numbers because you haven't specified a seed
CONTEXT: parallel worker, PID 62413

...then the expert has a very good chance of guessing what has gone
wrong right away.


My updated opinion is to keep the context but remove the PID.  I can see 
that as this feature is being introduced, we will want to know that 
something is happening inside a parallel worker.  Maybe in the future 
we'll have it all figured out and won't need the context anymore.  But 
the PID is not useful by the time you see the message.  The server log 
should contain a trace of the PIDs if you need to do deep digging.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] increase message string buffer size of watch command of psql

2016-06-15 Thread Alvaro Herrera

Tom Lane wrote:
> I wrote:

> > Well, we did part of that, but it's still using asctime().  Should we
> > change that to strftime(..."%c"...) to be less English-centric?
> > (The result seems to be the same in C locale.  pg_controldata has done
> > it that way for a long time, with few complaints.)  If we want to do so,
> > now would be the time, since 9.6 already whacked around the format
> > of \watch output.
> 
> I take it from the vast silence that nobody particularly cares one way
> or the other.  On reflection I think that this would be a good change
> to make, so I'll go do so unless I hear complaints soon.

+1 to strftime("%c").  If we wanted to provide additional flexibility we
could have a \pset option to change the strftime format string to
something other than %c, but I don't think there's enough demand to
justify it.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Andres Freund

On 2016-06-15 14:24:58 -0500, Kevin Grittner wrote:
> On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund  wrote:
> 
> > We might fetch a toast tuple which
> > since have been re-purposed for a datum of a different type.
> 
> How would that happen?

Autovac vacuums toast and heap tables independently. Once a toast datum
isn't used anymore, the oid used can be reused (because it doesn't
conflict via GetNewOidWithIndex() anymore. If analyze then detoasts a
datum, which hasn't been removed, the contents of that toast id, might
actually be for something different.

That's not super likely to happen (given how rare oid wraparounds
usually are), but it appears to be possible.

Regards,

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread David G. Johnston

On Wed, Jun 15, 2016 at 3:40 PM, Alvaro Herrera 
wrote:

> Kevin Grittner wrote:
> > On Wed, Jun 15, 2016 at 1:59 PM, Alvaro Herrera
> >  wrote:
>
> > > We actually go quite some lengths to support this case, even when it's
> > > the opinion of many that we shouldn't.  For example VACUUM doesn't try
> > > to find index entries using the values in each deleted tuple; instead
> we
> > > remember the TIDs and then scan the indexes (possibly many times) to
> > > find entries that match those TIDs -- which is much slower.  Yet we do
> > > it this way to protect the case that somebody is doing the
> > > not-really-IMMUTABLE function.
> > >
> > > In other words, I don't think we consider the position you argued as
> > > acceptable.
> >
> > What are you saying is unacceptable, and what behavior would be
> > acceptable instead?
>
> The answer "we don't support the situation where you have an index using
> an IMMUTABLE function that isn't actually immutable" is not acceptable.
> The acceptable solution would be a design that doesn't have that
> property as a requisite.
>

Yes, a much better solution would be for PostgreSQL to examine the body of
every function and determine on its own the proper volatility - or lacking
that to "sandbox" (for lack of a better term) function execution so it
simply cannot do things that conflict with its user specified marking.  But
the prevailing opinion on this list is that such an effort is not worthy of
resources and that "let the user keep both pieces"

is the more expedient policy.  That this patch is being defended using
that argument is consistent to policy and thus quite acceptable.

The technical details here are just beyond my reach ATM but I think
Robert's meta-points are spot on.  Though to be fair we are  changing a
fundamental assumption underlying how the system and transactions operate -
the amount of code whose assumptions are now being stressed is non-trivial;
and for a feature that will generally have less use in production - and
likely in much higher-stakes arenas - having a professionally hostile
approach will help to ensure that what is released has been thoroughly
vetted.

These edge cases should be thought of, discussed, and ideally documented
somewhere so that future coders can see and understand that said edges have
been considered even if the answer is: "well, we don't blow up and at worse
have some slightly off statistics, that seems fine".

David J.

Re: [HACKERS] WIP: Data at rest encryption

2016-06-15 Thread PostgreSQL - Hans-Jürgen Schönig

On 06/14/2016 09:59 PM, Jim Nasby wrote:

On 6/12/16 2:13 AM, Ants Aasma wrote:

On Fri, Jun 10, 2016 at 5:23 AM, Haribabu Kommi
 wrote:

> 1. Instead of doing the entire database files encryption, how about
> providing user an option to protect only some particular tables that
> wants the encryption at table/tablespace level. This not only 
provides

> an option to the user, it reduces the performance impact on tables
> that doesn't need any encryption. The problem with this approach
> is that every xlog record needs to validate to handle the encryption
> /decryption, instead of at page level.

Is there a real need for this? The customers I have talked to want to
encrypt the whole database and my goal is to make the feature fast
enough to make that feasible for pretty much everyone. I guess
switching encryption off per table would be feasible, but the key
setup would still need to be done at server startup. Per record
encryption would result in some additional information leakage though.
Overall I thought it would not be worth it, but I'm willing to have my
mind changed on this.

I actually design with this in mind. Tables that contain sensitive 
info go into designated schemas, partly so that you can blanket move 
all of those to an encrypted tablespace (or safer would be to move 
things not in those schemas to an unencrypted tablespace). Since that 
can be done with an encrypted filesystem maybe that's good enough. 
(It's not really clear to me what this buys us over an encrypted FS, 
other than a feature comparison checkmark...)

the reason why this is needed is actually very simple: security 
guidelines and legal requirements ...
we have dealt with a couple of companies recently, who explicitly 
demanded PostgreSQL level encryption in a transparent way to fulfill 
some internal or legal requirements. this is especially true for 
financial stuff. and yes, sure ... you can do a lot of stuff with 
filesystem encryption.
the core idea of this entire thing is however to have a counterpart on 
the database level. if you don't have the key you cannot start the 
instance and if you happen to get access to the filesystem you are still 
not able to fire up the DB.

as it said: requirements by ever bigger companies.

as far as benchmarking is concerned: i did a quick test yesterday (not 
with the final AES implementation yet) and i got pretty good results. 
with a reasonably well cached database in a typical application I expect 
to loose around 10-20%. if everything fits in memory there is 0 loss of 
course. the worst I got with the standard AES (no hardware support used 
yet) I lost around 45% or so. but this requires a value as low as 32 MB 
of shared buffers or so.

many thanks,

hans

--
Hans-Jürgen Schönig
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Robert Haas

On Wed, Jun 15, 2016 at 2:59 PM, Alvaro Herrera
 wrote:
> Kevin Grittner wrote:
>> On Wed, Jun 15, 2016 at 1:50 PM, Robert Haas  wrote:
>>
>> > The expression index case is the one to worry about; if there is a
>> > problem, that's where it is.  What bothers me is that a function used
>> > in an expression index could do anything at all - it can read any
>> > table in the database.
>>
>> It *can*, but then you are lying to the database when you call it
>> IMMUTABLE.  Such an index can easily become corrupted through
>> normal DML.  Without DML the ANALYZE has no problem.  So you seem
>> to be concerned that if someone is lying to the database engine to
>> force it accept a function as IMMUTABLE when it actually isn't, and
>> then updating the referenced rows (which is very likely to render
>> the index corrupted), that statistics might also become stale.
>
> We actually go quite some lengths to support this case, even when it's
> the opinion of many that we shouldn't.  For example VACUUM doesn't try
> to find index entries using the values in each deleted tuple; instead we
> remember the TIDs and then scan the indexes (possibly many times) to
> find entries that match those TIDs -- which is much slower.  Yet we do
> it this way to protect the case that somebody is doing the
> not-really-IMMUTABLE function.
>
> In other words, I don't think we consider the position you argued as
> acceptable.

Well, I actually don't think there's a giant problem here.  I'm just
trying to follow the chain of the argument to its (illogical)
conclusion.  I mean, if ANALYZE itself fails to see a tuple subjected
to early pruning, that should be fine.  And if some query run by a
supposedly-but-not-actually immutable function errors out because
snapshot_too_old is set, that also seems more or less fine.  The
statistics might not get updated, but oh well: either make your
supposedly-immutable function actually immutable, or else turn off
snapshot_too_old, or else live with the fact that ANALYZE will fail
some percentage of the time.  Supporting people who cheat and do
things that technically aren't allowed is one thing; saying that every
new feature must never have any downsides for such people is something
else.  If we took the latter approach, parallel query would be right
out, because you sure can break things by mislabeling functions as
PARALLEL SAFE.  I *do* think it *must* be possible to get an ANALYZE
to do something funky - either error out, or give wrong answers - if
you set up a strange enough set of circumstances, but I don't think
that's necessarily something we need to do anything about.

I think this whole discussion of snapshot too old has provoked far too
many bitter emails -- on all sides.  I entirely believe that there are
legitimate reasons to have concerns about this feature, and I entirely
suspect that it has problems we haven't found yet, and I also entirely
believe that there will be some future bugs that stem from this
feature that we would not have had otherwise.  I think it is entirely
legitimate to have concerns about those things.  On the other hand, I
*also* believe that Kevin is a pretty careful guy and that he's done
his best to make this patch safe and that he did not just go off and
commit something half-baked without due reflection.  We have to expect
that if people who are committers don't get much review of their
patches, they will eventually commit them anyway.  The "I can't
believe you committed this" reactions seem out of line to me.  This
feature is not perfect.  Nor is it the worst thing anybody's ever
committed.  But it's definitely provoked more ire than most.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Choosing the cheapest optimizer cost

2016-06-15 Thread Bruce Momjian

Right now, the optimizer chooses the path with the cheapest cost.

However, do we take into account the behavior of the plan in handling
mis-estimated row counts?  For example, if a path has a log(n) behavior
for changes in the row count, and another plan that is slightly cheaper
has a log(n^2) behavior, should we choose the former, knowing the the
row counts are often inaccurate?

I suppose one approach would be to track not only the path costs, but
the handling of mis-estimated, and account for that in the final path
choice?  Do we already do this by giving less stable plans higher costs?
Does that have the same effect?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Alvaro Herrera

Kevin Grittner wrote:
> On Wed, Jun 15, 2016 at 1:59 PM, Alvaro Herrera
>  wrote:

> > We actually go quite some lengths to support this case, even when it's
> > the opinion of many that we shouldn't.  For example VACUUM doesn't try
> > to find index entries using the values in each deleted tuple; instead we
> > remember the TIDs and then scan the indexes (possibly many times) to
> > find entries that match those TIDs -- which is much slower.  Yet we do
> > it this way to protect the case that somebody is doing the
> > not-really-IMMUTABLE function.
> >
> > In other words, I don't think we consider the position you argued as
> > acceptable.
> 
> What are you saying is unacceptable, and what behavior would be
> acceptable instead?

The answer "we don't support the situation where you have an index using
an IMMUTABLE function that isn't actually immutable" is not acceptable.
The acceptable solution would be a design that doesn't have that
property as a requisite.

I think having various executor(/heapam) checks that raise errors when
queries are executed from within ANALYZE is acceptable.  I don't know
about the TOAST related angle Andres just raised.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner

On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund  wrote:

> We might fetch a toast tuple which
> since have been re-purposed for a datum of a different type.

How would that happen?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Andres Freund

On 2016-06-15 14:50:46 -0400, Robert Haas wrote:
> On Wed, Jun 15, 2016 at 2:45 PM, Alvaro Herrera
>  wrote:
> > Robert Haas wrote:
> >> On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner 
> >> wrote:
> >> > The test I showed creates a situation which (to ANALYZE) is
> >> > identical to what you describe -- ANALYZE sees a page with an LSN
> >> > recent enough that it could have been (and actually has been)
> >> > pruned.  Why would it be better for the ANALYZE to fail than to
> >> > complete?
> >>
> >> As I understand it, the reason we need to sometimes give "ERROR:
> >> snapshot too old" after early pruning is because we might otherwise
> >> give the wrong answer.
> >
> > So what constitutes "the wrong answer"?  A regular transaction reading a
> > page and not finding a tuple that should have been there but was
> > removed, is a serious problem and should be aborted.  For ANALYZE it may
> > not matter a great deal.  Some very old tuple that might have been
> > chosen for the sample is not there; a different tuple is chosen instead,
> > so the stats might be slightly difference.  No big deal.
> >
> > Maybe it is possible to get into trouble if you're taking a sample for
> > an expression index.
> 
> The expression index case is the one to worry about; if there is a
> problem, that's where it is.  What bothers me is that a function used
> in an expression index could do anything at all - it can read any
> table in the database.

Isn't that also a problem around fetching toast tuples? As we don't
TestForOldSnapshot_impl() for toast, We might fetch a toast tuple which
since have been re-purposed for a datum of a different type. Which can
have arbitrarily bad consequences afaics.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] parallel.c is not marked as test covered

2016-06-15 Thread Robert Haas

On Wed, Jun 15, 2016 at 5:23 AM, Amit Kapila  wrote:
>> > Considering above analysis is correct, we have below options:
>> > a. Modify the test such that it actually generates an error and to hide
>> > the
>> > context, we can exception block and raise some generic error.
>> > b. Modify the test such that it actually generates an error and to hide
>> > the
>> > context, we can use force_parallel_mode = regress;
>>
>> Either of those sounds okay.  No need to raise a generic error; one can
>> raise
>> SQLERRM to keep the main message and not the context.  I lean toward (a)
>> so we
>> have nonzero test coverage of force_parallel_mode=on.
>
> Patch implementing option (a) attached with this mail.

OK, committed.  I also changed "select" to "perform" per your
analysis.  I wonder if we need to revisit the choices I made inside
PL/pgsql and see why CURSOR_OPT_PARALLEL_OK is not being set here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner

On Wed, Jun 15, 2016 at 1:59 PM, Alvaro Herrera
 wrote:
> Kevin Grittner wrote:
>> On Wed, Jun 15, 2016 at 1:50 PM, Robert Haas  wrote:
>>
>>> The expression index case is the one to worry about; if there is a
>>> problem, that's where it is.  What bothers me is that a function used
>>> in an expression index could do anything at all - it can read any
>>> table in the database.
>>
>> It *can*, but then you are lying to the database when you call it
>> IMMUTABLE.  Such an index can easily become corrupted through
>> normal DML.  Without DML the ANALYZE has no problem.  So you seem
>> to be concerned that if someone is lying to the database engine to
>> force it accept a function as IMMUTABLE when it actually isn't, and
>> then updating the referenced rows (which is very likely to render
>> the index corrupted), that statistics might also become stale.
>
> We actually go quite some lengths to support this case, even when it's
> the opinion of many that we shouldn't.  For example VACUUM doesn't try
> to find index entries using the values in each deleted tuple; instead we
> remember the TIDs and then scan the indexes (possibly many times) to
> find entries that match those TIDs -- which is much slower.  Yet we do
> it this way to protect the case that somebody is doing the
> not-really-IMMUTABLE function.
>
> In other words, I don't think we consider the position you argued as
> acceptable.

What are you saying is unacceptable, and what behavior would be
acceptable instead?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner

On Wed, Jun 15, 2016 at 1:54 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:

>> The expression index case is the one to worry about; if there is a
>> problem, that's where it is.  What bothers me is that a function used
>> in an expression index could do anything at all - it can read any
>> table in the database.
>
> Hmm, but if it does that, then the code that actually implements that
> query would run the STO checks, right?  The analyze code doesn't need
> to.

Right.  In the described case, you would get a "snapshot too old"
error inside the expression which is trying to generate the index
value.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Alvaro Herrera

Kevin Grittner wrote:
> On Wed, Jun 15, 2016 at 1:50 PM, Robert Haas  wrote:
> 
> > The expression index case is the one to worry about; if there is a
> > problem, that's where it is.  What bothers me is that a function used
> > in an expression index could do anything at all - it can read any
> > table in the database.
> 
> It *can*, but then you are lying to the database when you call it
> IMMUTABLE.  Such an index can easily become corrupted through
> normal DML.  Without DML the ANALYZE has no problem.  So you seem
> to be concerned that if someone is lying to the database engine to
> force it accept a function as IMMUTABLE when it actually isn't, and
> then updating the referenced rows (which is very likely to render
> the index corrupted), that statistics might also become stale.

We actually go quite some lengths to support this case, even when it's
the opinion of many that we shouldn't.  For example VACUUM doesn't try
to find index entries using the values in each deleted tuple; instead we
remember the TIDs and then scan the indexes (possibly many times) to
find entries that match those TIDs -- which is much slower.  Yet we do
it this way to protect the case that somebody is doing the
not-really-IMMUTABLE function.

In other words, I don't think we consider the position you argued as
acceptable.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner

On Wed, Jun 15, 2016 at 1:50 PM, Robert Haas  wrote:

> The expression index case is the one to worry about; if there is a
> problem, that's where it is.  What bothers me is that a function used
> in an expression index could do anything at all - it can read any
> table in the database.

It *can*, but then you are lying to the database when you call it
IMMUTABLE.  Such an index can easily become corrupted through
normal DML.  Without DML the ANALYZE has no problem.  So you seem
to be concerned that if someone is lying to the database engine to
force it accept a function as IMMUTABLE when it actually isn't, and
then updating the referenced rows (which is very likely to render
the index corrupted), that statistics might also become stale.

They might.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Alvaro Herrera

Robert Haas wrote:
> On Wed, Jun 15, 2016 at 2:45 PM, Alvaro Herrera
>  wrote:

> > Maybe it is possible to get into trouble if you're taking a sample for
> > an expression index.
> 
> The expression index case is the one to worry about; if there is a
> problem, that's where it is.  What bothers me is that a function used
> in an expression index could do anything at all - it can read any
> table in the database.

Hmm, but if it does that, then the code that actually implements that
query would run the STO checks, right?  The analyze code doesn't need
to.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Robert Haas

On Wed, Jun 15, 2016 at 2:45 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner 
>> wrote:
>> > The test I showed creates a situation which (to ANALYZE) is
>> > identical to what you describe -- ANALYZE sees a page with an LSN
>> > recent enough that it could have been (and actually has been)
>> > pruned.  Why would it be better for the ANALYZE to fail than to
>> > complete?
>>
>> As I understand it, the reason we need to sometimes give "ERROR:
>> snapshot too old" after early pruning is because we might otherwise
>> give the wrong answer.
>
> So what constitutes "the wrong answer"?  A regular transaction reading a
> page and not finding a tuple that should have been there but was
> removed, is a serious problem and should be aborted.  For ANALYZE it may
> not matter a great deal.  Some very old tuple that might have been
> chosen for the sample is not there; a different tuple is chosen instead,
> so the stats might be slightly difference.  No big deal.
>
> Maybe it is possible to get into trouble if you're taking a sample for
> an expression index.

The expression index case is the one to worry about; if there is a
problem, that's where it is.  What bothers me is that a function used
in an expression index could do anything at all - it can read any
table in the database.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner

On Wed, Jun 15, 2016 at 1:45 PM, Alvaro Herrera
 wrote:

> Maybe it is possible to get into trouble if you're taking a sample for
> an expression index.

Maybe -- if you are using a function for an index expression which
does an explicit SELECT against some database table rather than
only using values from the row itself.  In such a case you would
have had to mark a function as IMMUTABLE which depends on table
contents.  I say you get to keep both pieces.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Alvaro Herrera

Robert Haas wrote:
> On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner 
> wrote:

> > The test I showed creates a situation which (to ANALYZE) is
> > identical to what you describe -- ANALYZE sees a page with an LSN
> > recent enough that it could have been (and actually has been)
> > pruned.  Why would it be better for the ANALYZE to fail than to
> > complete?
> 
> As I understand it, the reason we need to sometimes give "ERROR:
> snapshot too old" after early pruning is because we might otherwise
> give the wrong answer.

So what constitutes "the wrong answer"?  A regular transaction reading a
page and not finding a tuple that should have been there but was
removed, is a serious problem and should be aborted.  For ANALYZE it may
not matter a great deal.  Some very old tuple that might have been
chosen for the sample is not there; a different tuple is chosen instead,
so the stats might be slightly difference.  No big deal.

Maybe it is possible to get into trouble if you're taking a sample for
an expression index.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner

On Wed, Jun 15, 2016 at 1:29 PM, Robert Haas  wrote:
> On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner 
> wrote:>> So what happens in this scenario:
>>> 1. ANALYZE runs really slowly - maybe the user-defined function it's
>>> running for the expression index is extremely long-running.
>>> 2. Eventually, the snapshot for ANALYZE is older than the configured
>>> value of snapshot_too_old.
>>> 3. Then, ANALYZE selects a page with an LSN new enough that it might
>>> have been pruned.
>>>
>>> Presumably, the ANALYZE ought to error out in this scenario, just as
>>> it would in any other situation where an old snapshot sees a new page.
>>> No?
>>
>> The test I showed creates a situation which (to ANALYZE) is
>> identical to what you describe -- ANALYZE sees a page with an LSN
>> recent enough that it could have been (and actually has been)
>> pruned.  Why would it be better for the ANALYZE to fail than to
>> complete?
>
> As I understand it, the reason we need to sometimes give "ERROR:
> snapshot too old" after early pruning is because we might otherwise
> give the wrong answer.
>
> Maybe I'm confused.

In the scenario that you describe, ANALYZE is coming up with
statistics to use in query planning, and the numbers are not
expected to always be 100% accurate.  I can think of conditions
which might prevail when seeing the recent LSN.

(1)  The recent LSN is from a cause having nothing to do with the
STO feature, like DML.  As things stand, the behavior is the same
as without the patch -- the rows are counted just the same as
always.  If we did as you suggest, we instead would abort ANALYZE
and have stale statistics.

(2)  The recent LSN is related to STO pruning.  The dead rows are
gone forever, and will not be counted.  This seems more correct
than counting them, even if it were possible, and also superior to
aborting the ANALYZE and leaving stale statistics.

Of course, a subset of (1) is the case that the rows can be
early-pruned at the next opportunity.  In such a case ANALYZE is
still counting them according to the rules that we had before the
snapshot too old feature.  If someone wants to argue that those
rules are wrong, that seems like material for a separate patch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Robert Haas

On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner 
wrote:>> So what happens in this scenario:
>> 1. ANALYZE runs really slowly - maybe the user-defined function it's
>> running for the expression index is extremely long-running.
>> 2. Eventually, the snapshot for ANALYZE is older than the configured
>> value of snapshot_too_old.
>> 3. Then, ANALYZE selects a page with an LSN new enough that it might
>> have been pruned.
>>
>> Presumably, the ANALYZE ought to error out in this scenario, just as
>> it would in any other situation where an old snapshot sees a new page.
>> No?
>
> The test I showed creates a situation which (to ANALYZE) is
> identical to what you describe -- ANALYZE sees a page with an LSN
> recent enough that it could have been (and actually has been)
> pruned.  Why would it be better for the ANALYZE to fail than to
> complete?

As I understand it, the reason we need to sometimes give "ERROR:
snapshot too old" after early pruning is because we might otherwise
give the wrong answer.

Maybe I'm confused.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Rename max_parallel_degree?

2016-06-15 Thread Julien Rouhaud

On 15/06/2016 18:14, Julien Rouhaud wrote:
> On 15/06/2016 17:49, Robert Haas wrote:
>> On Tue, Jun 14, 2016 at 7:10 AM, Julien Rouhaud
>>  wrote:
 I don't entirely like the new logic in
 RegisterDynamicBackgroundWorker.
>>>
 I wonder if we can't drive this off
 of a couple of counters, instead of having the process registering the
 background worker iterate over every slot. [...]

>>
>> I think we should go that way.  Some day we might try to make the
>> process of finding a free slot more efficient than it is today; I'd
>> rather not double down on linear search.
>>
> 
> Ok.
> 
>> Are you going to update this patch?
>>
> 
> Sure, I'll post a new patch ASAP.
> 

Attached v4 implements the design you suggested, I hope everything's ok.

I didn't do anything for the statistics part, because I'm not sure
having the counters separately is useful (instead of just having the
current number of parallel workers), and if it's useful I'd find strange
to have these counters get reset at restart instead of being stored and
accumulated as other stats, and that's look too much for 9.6 material.
I'd be happy to work on this though.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e0e5a1e..c661c7a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2018,6 +2018,23 @@ include_dir 'conf.d'

   
 
+  
+   max_parallel_workers (integer)
+   
+max_parallel_workers configuration 
parameter
+   
+   
+   
+
+ Sets the maximum number of workers that can be launched at the same
+ time for the whole server.  This parameter allows the administrator to
+ reserve background worker slots for for third part dynamic background
+ workers.  The default value is 4.  Setting this value to 0 disables
+ parallel query execution.
+
+   
+  
+
   
backend_flush_after (integer)

diff --git a/src/backend/access/transam/parallel.c 
b/src/backend/access/transam/parallel.c
index ab5ef25..b429474 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -453,7 +453,8 @@ LaunchParallelWorkers(ParallelContext *pcxt)
snprintf(worker.bgw_name, BGW_MAXLEN, "parallel worker for PID %d",
 MyProcPid);
worker.bgw_flags =
-   BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+   BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION
+   | BGWORKER_IS_PARALLEL_WORKER;
worker.bgw_start_time = BgWorkerStart_ConsistentState;
worker.bgw_restart_time = BGW_NEVER_RESTART;
worker.bgw_main = ParallelWorkerMain;
diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index cc8ba61..2bcd86b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -719,9 +719,11 @@ create_plain_partial_paths(PlannerInfo *root, RelOptInfo 
*rel)
}
 
/*
-* In no case use more than max_parallel_workers_per_gather workers.
+* In no case use more than max_parallel_workers or
+* max_parallel_workers_per_gather workers.
 */
-   parallel_workers = Min(parallel_workers, 
max_parallel_workers_per_gather);
+   parallel_workers = Min(max_parallel_workers, Min(parallel_workers,
+   max_parallel_workers_per_gather));
 
/* If any limit was set to zero, the user doesn't want a parallel scan. 
*/
if (parallel_workers <= 0)
diff --git a/src/backend/optimizer/path/costsize.c 
b/src/backend/optimizer/path/costsize.c
index e7f63f4..fd62126 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -113,6 +113,7 @@ int effective_cache_size = 
DEFAULT_EFFECTIVE_CACHE_SIZE;
 
 Cost   disable_cost = 1.0e10;
 
+intmax_parallel_workers = 4;
 intmax_parallel_workers_per_gather = 2;
 
 bool   enable_seqscan = true;
diff --git a/src/backend/optimizer/plan/planner.c 
b/src/backend/optimizer/plan/planner.c
index 07b925e..66e65c8 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -246,8 +246,9 @@ standard_planner(Query *parse, int cursorOptions, 
ParamListInfo boundParams)
IsUnderPostmaster && dynamic_shared_memory_type != 
DSM_IMPL_NONE &&
parse->commandType == CMD_SELECT && !parse->hasModifyingCTE &&
parse->utilityStmt == NULL && max_parallel_workers_per_gather > 
0 &&
-   !IsParallelWorker() && !IsolationIsSerializable() &&
-   !has_parallel_hazard((Node *) parse, true);
+   max_parallel_workers > 0 && !IsParallelWorker() &&
+   !IsolationIsSerializable() && !has_parallel_hazard((Node *) 
parse,

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner

On Wed, Jun 15, 2016 at 10:46 AM, Robert Haas  wrote:
> On Sat, Jun 11, 2016 at 11:29 AM, Kevin Grittner  wrote:
>> I have reviewed the code and run tests to try to find something
>> here which could be considered a bug, without finding any problem.
>> When reading pages for the random sample for ANALYZE (or
>> auto-analyze) there is not an age check; so ANALYZE completes
>> without error, keeping statistics up-to-date.
>>
>> There really is no difference in behavior except in the case that:
>>
>> (1)  old_snapshot_threshold >= 0 to enable the "snapshot too old"
>>feature, and
>> (2)  there were tuples that were dead as the result of completed
>>transactions, and
>> (3)  those tuples became older than the threshold, and
>> (4)  those tuples were pruned or vacuumed away, and
>> (5)  an ANALYZE process would have read those dead tuples had they
>>not been removed.
>>
>> In such a case the irrevocably dead, permanently removed tuples are
>> not counted in the statistics.  I have trouble seeing a better
>> outcome than that.  Among my tests, I specifically checked for an
>> ANALYZE of a table having an index on an expression, using an old
>> snapshot:
>>
>> -- connection 1
>> drop table if exists t1;
>> create table t1 (c1 int not null);
>> drop table if exists t2;
>> create table t2 (c1 int not null);
>> insert into t1 select generate_series(1, 1);
>> drop function mysq(i int);
>> create function mysq(i int)
>>   returns int
>>   language plpgsql
>>   immutable
>> as $mysq$
>> begin
>>   return (i * i);
>> end
>> $mysq$;
>> create index t1_c1sq on t1 ((mysq(c1)));
>> begin transaction isolation level repeatable read;
>> select 1;
>>
>> -- connection 2
>> vacuum analyze verbose t1;
>> delete from t1 where c1 between 1000 and 1999;
>> delete from t1 where c1 = 8000;
>> insert into t2 values (1);
>> select pg_sleep_for('2min');
>> vacuum verbose t1;  -- repeat if necessary to see the dead rows
>> disappear
>>
>> -- connection 1
>> analyze verbose t1;
>>
>> This runs to completion, as I would want and expect.
>>
>> I am closing this item on the "PostgreSQL 9.6 Open Items" page.  If
>> anyone feels that I've missed something, please provide a test to
>> show the problem, or a clear description of the problem and how you
>> feel behavior should be different.
>
> So what happens in this scenario:
>
> 1. ANALYZE runs really slowly - maybe the user-defined function it's
> running for the expression index is extremely long-running.
> 2. Eventually, the snapshot for ANALYZE is older than the configured
> value of snapshot_too_old.
> 3. Then, ANALYZE selects a page with an LSN new enough that it might
> have been pruned.
>
> Presumably, the ANALYZE ought to error out in this scenario, just as
> it would in any other situation where an old snapshot sees a new page.
> No?

The test I showed creates a situation which (to ANALYZE) is
identical to what you describe -- ANALYZE sees a page with an LSN
recent enough that it could have been (and actually has been)
pruned.  Why would it be better for the ANALYZE to fail than to
complete?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_isready features

2016-06-15 Thread Jimmy

yup, it does for (1)

:-)





On Wed, Jun 15, 2016 at 9:53 AM Imre Samu  wrote:

> >Why I need it? There is tiny window between postgres being ready to
> accept connections
> >and final scripts which create initial user/database.
> >Ideally having option to say "postgres is ready after specific user can
> login to specific database" would be ideal.
>
> temporary - the official docker-postgres solution is not OK?
> see :
> https://github.com/docker-library/postgres/blob/master/9.5/docker-entrypoint.sh#L50
>
> *# internal start of server in order to allow set-up using psql-client *
> *# does not listen on external TCP/IP and waits until start finishes*
> *gosu postgres pg_ctl -D "$PGDATA" -o "-c listen_addresses='localhost'" *
>
> *.*
>
>
> more info: https://github.com/docker-library/postgres/issues/146
>
>
> 2016-06-15 18:09 GMT+02:00 Jimmy :
>
>> Not sure if this was discussed in the past and decided it does not belong
>> to pg_isready utility
>>
>> I am using pg_isready in dockerized development environment as part of
>> docker-compose. Simply say I have POSTGRES container and APP container. I
>> use pg_isready inside app container and wait till postgres is ready
>> to accept connections before app starts.
>>
>> There are two features which would make this a bit smoother and better.
>>
>>
>> *1. enforce login*
>> This could be optional and turned off by default. Basically if user
>> supplies username/database as part of pg_isready call and the login fails
>> (for whatever reason), pg_isready should fail.
>>
>> Why I need it? There is tiny window between postgres being ready to
>> accept connections and final scripts which create initial user/database.
>> Ideally having option to say "postgres is ready after specific user can
>> login to specific database" would be ideal. Again, this can be optional and
>> turned off by default.
>>
>>
>> *2. retry*
>> This is something I can do via unix script, but ideally it would be nice
>> if there would be retry mechanism. For example if I say retry=30 then
>> pg_isready would try 30x with x seconds pause in between and fail only
>> after 30 retries. This could use default retry=0 (current behavior)
>>
>>
>> thanks a lot!
>>
>>
>>
>

Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-15 Thread Teodor Sigaev


We need to reach a consensus here, since there is no way to say "I don't know".
I inclined to agree with you, that returning false is better in such a
case.That will
indicate user to the source of problem.


Here is a patch, now phrase operation returns false if there is not postion 
information. If this behavior looks more reasonable, I'll commit that.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


phrase_no_fallback.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_isready features

2016-06-15 Thread Imre Samu

>Why I need it? There is tiny window between postgres being ready to accept
connections
>and final scripts which create initial user/database.
>Ideally having option to say "postgres is ready after specific user can
login to specific database" would be ideal.

temporary - the official docker-postgres solution is not OK?
see :
https://github.com/docker-library/postgres/blob/master/9.5/docker-entrypoint.sh#L50

*# internal start of server in order to allow set-up using psql-client *
*# does not listen on external TCP/IP and waits until start finishes*
*gosu postgres pg_ctl -D "$PGDATA" -o "-c listen_addresses='localhost'" *

*.*


more info: https://github.com/docker-library/postgres/issues/146


2016-06-15 18:09 GMT+02:00 Jimmy :

> Not sure if this was discussed in the past and decided it does not belong
> to pg_isready utility
>
> I am using pg_isready in dockerized development environment as part of
> docker-compose. Simply say I have POSTGRES container and APP container. I
> use pg_isready inside app container and wait till postgres is ready
> to accept connections before app starts.
>
> There are two features which would make this a bit smoother and better.
>
>
> *1. enforce login*
> This could be optional and turned off by default. Basically if user
> supplies username/database as part of pg_isready call and the login fails
> (for whatever reason), pg_isready should fail.
>
> Why I need it? There is tiny window between postgres being ready to accept
> connections and final scripts which create initial user/database. Ideally
> having option to say "postgres is ready after specific user can login to
> specific database" would be ideal. Again, this can be optional and turned
> off by default.
>
>
> *2. retry*
> This is something I can do via unix script, but ideally it would be nice
> if there would be retry mechanism. For example if I say retry=30 then
> pg_isready would try 30x with x seconds pause in between and fail only
> after 30 retries. This could use default retry=0 (current behavior)
>
>
> thanks a lot!
>
>
>

Re: [HACKERS] pg_isready features

2016-06-15 Thread Joshua D. Drake


On 06/15/2016 09:30 AM, David G. Johnston wrote:

On Wed, Jun 15, 2016 at 12:09 PM, Jimmy 



*2. retry*
This is something I can do via unix script, but ideally it would be
nice if there would be retry mechanism. For example if I say
retry=30 then pg_isready would try 30x with x seconds pause in
between and fail only after 30 retries. This could use default
retry=0 (current behavior)


And the value of this instead of setting a timeout 30x higher is?


Simplicity.

JD

 


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_isready features

2016-06-15 Thread Jimmy

(1) I can (and do) use psql, pg_isready seems lighter and since it already
supports credentials and DB name, I see very little harm for pg_isready to
say back whether user logged IN or not using these credentials.


(2) timeout is not the same - timeout does not retry, its a simple timeout
in case connection hangs, try it






On Wed, Jun 15, 2016 at 9:30 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Wed, Jun 15, 2016 at 12:09 PM, Jimmy  wrote:
>
>> Not sure if this was discussed in the past and decided it does not belong
>> to pg_isready utility
>>
>> I am using pg_isready in dockerized development environment as part of
>> docker-compose. Simply say I have POSTGRES container and APP container. I
>> use pg_isready inside app container and wait till postgres is ready
>> to accept connections before app starts.
>>
>> There are two features which would make this a bit smoother and better.
>>
>>
>> *1. enforce login*
>> This could be optional and turned off by default. Basically if user
>> supplies username/database as part of pg_isready call and the login fails
>> (for whatever reason), pg_isready should fail.
>>
>> Why I need it? There is tiny window between postgres being ready to
>> accept connections and final scripts which create initial user/database.
>> Ideally having option to say "postgres is ready after specific user can
>> login to specific database" would be ideal. Again, this can be optional and
>> turned off by default.
>>
>>
> It shouldn't have to enforce login if there is a way for the server to
> communicate ready or not ready for login without requiring credentials to
> actually be supplied.  I guess it would be more effort and invasive.  Are
> you trying to avoid psql here?
>
>
>>
>> *2. retry*
>> This is something I can do via unix script, but ideally it would be nice
>> if there would be retry mechanism. For example if I say retry=30 then
>> pg_isready would try 30x with x seconds pause in between and fail only
>> after 30 retries. This could use default retry=0 (current behavior)
>>
>>
> And the value of this instead of setting a timeout 30x higher is?
> 
>
>

Re: [HACKERS] pg_isready features

2016-06-15 Thread David G. Johnston

On Wed, Jun 15, 2016 at 12:09 PM, Jimmy  wrote:

> Not sure if this was discussed in the past and decided it does not belong
> to pg_isready utility
>
> I am using pg_isready in dockerized development environment as part of
> docker-compose. Simply say I have POSTGRES container and APP container. I
> use pg_isready inside app container and wait till postgres is ready
> to accept connections before app starts.
>
> There are two features which would make this a bit smoother and better.
>
>
> *1. enforce login*
> This could be optional and turned off by default. Basically if user
> supplies username/database as part of pg_isready call and the login fails
> (for whatever reason), pg_isready should fail.
>
> Why I need it? There is tiny window between postgres being ready to accept
> connections and final scripts which create initial user/database. Ideally
> having option to say "postgres is ready after specific user can login to
> specific database" would be ideal. Again, this can be optional and turned
> off by default.
>
>
It shouldn't have to enforce login if there is a way for the server to
communicate ready or not ready for login without requiring credentials to
actually be supplied.  I guess it would be more effort and invasive.  Are
you trying to avoid psql here?


>
> *2. retry*
> This is something I can do via unix script, but ideally it would be nice
> if there would be retry mechanism. For example if I say retry=30 then
> pg_isready would try 30x with x seconds pause in between and fail only
> after 30 retries. This could use default retry=0 (current behavior)
>
>
And the value of this instead of setting a timeout 30x higher is?

Re: [HACKERS] An extra error for client disconnection on Windows

2016-06-15 Thread Robert Haas

On Thu, Jun 2, 2016 at 4:51 AM, Kyotaro HORIGUCHI
 wrote:
> After a process termination without PQfinish() of a client,
> server emits the following log message not seen on Linux boxes.
>
>> LOG:  could not receive data from client: An existing connection was 
>> forcibly closed by the remote host.
>
> This is because pgwin32_recv reuturns an error ECONNRESET for the
> situation instead of returning non-error EOF as recv(2) does.
>
> This patch translates WSAECONNRESET of WSARecv to an EOF so that
> pgwin32_recv behaves the same way with Linux.
>
> The attached patch does this.

Please add this to the next CommitFest so it gets reviewed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Rename max_parallel_degree?

2016-06-15 Thread Julien Rouhaud

On 15/06/2016 17:49, Robert Haas wrote:
> On Tue, Jun 14, 2016 at 7:10 AM, Julien Rouhaud
>  wrote:
>>> I don't entirely like the new logic in
>>> RegisterDynamicBackgroundWorker.
>>
>>> I wonder if we can't drive this off
>>> of a couple of counters, instead of having the process registering the
>>> background worker iterate over every slot. [...]
>>>
> 
> I think we should go that way.  Some day we might try to make the
> process of finding a free slot more efficient than it is today; I'd
> rather not double down on linear search.
> 

Ok.

> Are you going to update this patch?
> 

Sure, I'll post a new patch ASAP.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] pg_isready features

2016-06-15 Thread Jimmy

Not sure if this was discussed in the past and decided it does not belong
to pg_isready utility

I am using pg_isready in dockerized development environment as part of
docker-compose. Simply say I have POSTGRES container and APP container. I
use pg_isready inside app container and wait till postgres is ready
to accept connections before app starts.

There are two features which would make this a bit smoother and better.


*1. enforce login*
This could be optional and turned off by default. Basically if user
supplies username/database as part of pg_isready call and the login fails
(for whatever reason), pg_isready should fail.

Why I need it? There is tiny window between postgres being ready to accept
connections and final scripts which create initial user/database. Ideally
having option to say "postgres is ready after specific user can login to
specific database" would be ideal. Again, this can be optional and turned
off by default.


*2. retry*
This is something I can do via unix script, but ideally it would be nice if
there would be retry mechanism. For example if I say retry=30 then
pg_isready would try 30x with x seconds pause in between and fail only
after 30 retries. This could use default retry=0 (current behavior)


thanks a lot!

Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-15 Thread Teodor Sigaev


what's about word with several infinitives



select to_tsvector('en', 'leavings');
   to_tsvector

  'leave':1 'leavings':1
(1 row)



select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
  ?column?
--
  t
(1 row)


Second example is not correct:

select phraseto_tsquery('en', 'leavings')
will produce 'leave | leavings'

and

select phraseto_tsquery('en', 'leavings cats')
will produce 'leave <-> cat | leavings <-> cat'

which seems correct and we don't need special threating of <0>.


This brings up something else that I am not very sold on: to wit,
do we really want the "less than or equal" distance behavior at all?
The documentation gives the example that
phraseto_tsquery('cat ate some rats')
produces
( 'cat' <-> 'ate' ) <2> 'rat'
because "some" is a stopword.  However, that pattern will also match
"cat ate rats", which seems surprising and unexpected to me; certainly
it would surprise a user who did not realize that "some" is a stopword.

So I think there's a reasonable case for decreeing that  should only
match lexemes *exactly* N apart.  If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.


Agree, seems that's easy to change. I thought that I saw an issue with 
hyphenated word but, fortunately, I forget that hyphenated words don't share a 
position:

# select to_tsvector('foo-bar');
 to_tsvector
-
 'bar':3 'foo':2 'foo-bar':1
# select phraseto_tsquery('foo-bar');
 phraseto_tsquery
---
 ( 'foo-bar' <-> 'foo' ) <-> 'bar'
and
# select to_tsvector('foo-bar') @@ phraseto_tsquery('foo-bar');
 ?column?
--
 t


Patch is attached

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


phrase_exact_distance.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 10.0

2016-06-15 Thread Jim Nasby

On 6/15/16 9:04 AM, Merlin Moncure wrote:

We could stand to be more explicit here.  The docs for version()
>> > indicated
>> > the server_version_num should be used for "machine processing".

On a somewhat related note, any objection to adding server_version_num 
to pg_config? It's common to need to know what version you're handling 
in a Makefile, and today that's pretty ugly (especially when something 
is stamped as beta, since it breaks assumptions about numeric).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Rename max_parallel_degree?

2016-06-15 Thread Robert Haas

On Tue, Jun 14, 2016 at 7:10 AM, Julien Rouhaud
 wrote:
>> I don't entirely like the new logic in
>> RegisterDynamicBackgroundWorker.
>
> I'm not that happy with it too. We can avoid iterating over every slots
> if the feature isn't activated though (max_parallel_workers >=
> max_worker_processes).
>
>> I wonder if we can't drive this off
>> of a couple of counters, instead of having the process registering the
>> background worker iterate over every slot.  Suppose we add two
>> counters to BackgroundWorkerArray, parallel_register_count and
>> parallel_terminate_count.  Whenever a backend successfully registers a
>> parallel worker, it increments parallel_register_count.  Whenever the
>> postmaster marks a parallel wokrer slot as no longer in use, it
>> increments parallel_terminate_count.  Then, the number of active
>> parallel workers is just parallel_register_count -
>> parallel_terminate_count.  (We can't have the postmaster and the
>> backends share the same counter, because then it would need locking,
>> and the postmaster can't try to take spinlocks - can't even use
>> atomics, because those might be emulated using spinlocks.)
>>
>
> I wanted to maintain counters at first, but it seemed more invasive, and
> I thought that the max_parallel_worker would be ueful in environnements
> where there're lots of parallel workers and dynamic workers used, so
> finding a free slot would require iterating over most of the slots most
> of the time anyway.  I'm of course also ok with maintaining counters.

I think we should go that way.  Some day we might try to make the
process of finding a free slot more efficient than it is today; I'd
rather not double down on linear search.

Are you going to update this patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Robert Haas

On Sat, Jun 11, 2016 at 11:29 AM, Kevin Grittner  wrote:
> I have reviewed the code and run tests to try to find something
> here which could be considered a bug, without finding any problem.
> When reading pages for the random sample for ANALYZE (or
> auto-analyze) there is not an age check; so ANALYZE completes
> without error, keeping statistics up-to-date.
>
> There really is no difference in behavior except in the case that:
>
> (1)  old_snapshot_threshold >= 0 to enable the "snapshot too old"
>feature, and
> (2)  there were tuples that were dead as the result of completed
>transactions, and
> (3)  those tuples became older than the threshold, and
> (4)  those tuples were pruned or vacuumed away, and
> (5)  an ANALYZE process would have read those dead tuples had they
>not been removed.
>
> In such a case the irrevocably dead, permanently removed tuples are
> not counted in the statistics.  I have trouble seeing a better
> outcome than that.  Among my tests, I specifically checked for an
> ANALYZE of a table having an index on an expression, using an old
> snapshot:
>
> -- connection 1
> drop table if exists t1;
> create table t1 (c1 int not null);
> drop table if exists t2;
> create table t2 (c1 int not null);
> insert into t1 select generate_series(1, 1);
> drop function mysq(i int);
> create function mysq(i int)
>   returns int
>   language plpgsql
>   immutable
> as $mysq$
> begin
>   return (i * i);
> end
> $mysq$;
> create index t1_c1sq on t1 ((mysq(c1)));
> begin transaction isolation level repeatable read;
> select 1;
>
> -- connection 2
> vacuum analyze verbose t1;
> delete from t1 where c1 between 1000 and 1999;
> delete from t1 where c1 = 8000;
> insert into t2 values (1);
> select pg_sleep_for('2min');
> vacuum verbose t1;  -- repeat if necessary to see the dead rows
> disappear
>
> -- connection 1
> analyze verbose t1;
>
> This runs to completion, as I would want and expect.
>
> I am closing this item on the "PostgreSQL 9.6 Open Items" page.  If
> anyone feels that I've missed something, please provide a test to
> show the problem, or a clear description of the problem and how you
> feel behavior should be different.

So what happens in this scenario:

1. ANALYZE runs really slowly - maybe the user-defined function it's
running for the expression index is extremely long-running.
2. Eventually, the snapshot for ANALYZE is older than the configured
value of snapshot_too_old.
3. Then, ANALYZE selects a page with an LSN new enough that it might
have been pruned.

Presumably, the ANALYZE ought to error out in this scenario, just as
it would in any other situation where an old snapshot sees a new page.
No?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should pg_export_snapshot() and currtid() be tagged parallel-unsafe?

2016-06-15 Thread Robert Haas

On Tue, Jun 14, 2016 at 5:01 PM, Andreas Seltenreich  wrote:
> Digging through the sqlsmith logging db, I noticed the following errors:
>
> ERROR:  cannot update SecondarySnapshot during a parallel operation
> ERROR:  cannot assign XIDs during a parallel operation
>
> Queries raising the first one always contain calls to currtid() or
> currtid2().  Queries raising the second one always contain a call to
> pg_export_snapshot().  Patch to tag them as unsafe attached.

Thanks, committed.

None of these are exercised by 'make check', or this would have been
caught sooner.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-15 Thread Noah Misch

On Wed, Jun 15, 2016 at 03:02:15PM +0300, Teodor Sigaev wrote:
> On Wed, Jun 15, 2016 at 02:54:33AM -0400, Noah Misch wrote:
> > On Mon, Jun 13, 2016 at 10:44:06PM -0400, Noah Misch wrote:
> > > On Fri, Jun 10, 2016 at 03:10:40AM -0400, Noah Misch wrote:
> > > > [Action required within 72 hours.  This is a generic notification.]
> > > > 
> > > > The above-described topic is currently a PostgreSQL 9.6 open item.  
> > > > Teodor,
> > > > since you committed the patch believed to have created it, you own this 
> > > > open
> > > > item.  If some other commit is more relevant or if this does not belong 
> > > > as a
> > > > 9.6 open item, please let us know.  Otherwise, please observe the 
> > > > policy on
> > > > open item ownership[1] and send a status update within 72 hours of this
> > > > message.  Include a date for your subsequent status update.  Testers may
> > > > discover new open items at any time, and I want to plan to get them all 
> > > > fixed
> > > > well in advance of shipping 9.6rc1.  Consequently, I will appreciate 
> > > > your
> > > > efforts toward speedy resolution.  Thanks.
> > > > 
> > > > [1] 
> > > > http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
> > > 
> > > This PostgreSQL 9.6 open item is past due for your status update.  Kindly 
> > > send
> > > a status update within 24 hours, and include a date for your subsequent 
> > > status
> > > update.  Refer to the policy on open item ownership:
> > > http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
> > 
> >IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 9.6 open item is long past due
> >for your status update.  Please reacquaint yourself with the policy on open
> >item ownership[1] and then reply immediately.  If I do not hear from you by
> >2016-06-16 07:00 UTC, I will transfer this item to release management team
> >ownership without further notice.
> >
> >[1] 
> >http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
> 
> I'm working on it right now.

That is good news, but it is not a valid status update.  In particular, it
does not specify a date for your next update.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reviewing freeze map code

2016-06-15 Thread Robert Haas

On Wed, Jun 15, 2016 at 9:59 AM, Amit Kapila  wrote:
>> That just kicks the can down the road.  Then you have PD_ALL_VISIBLE
>> clear but the VM bit is still set.
>
> I mean to say clear both as we are doing currently in code:
> if (PageIsAllVisible(BufferGetPage(buffer)))
> {
> all_visible_cleared = true;
> PageClearAllVisible(BufferGetPage(buffer));
> visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
> vmbuffer);
> }

Sure, but without emitting a WAL record, that's just broken.  You
could have the heap page get flushed to disk and the VM page not get
flushed to disk, and then crash, and now you have the classic VM
corruption scenario.

>>   And you still haven't WAL-logged
>> anything.
>
> Yeah, I think WAL requirement is more difficult to meet and I think
> releasing the lock on buffer before writing WAL could lead to flush of such
> a buffer before WAL.
>
> I feel this is an existing-bug and should go to Older Bugs Section in open
> items page.

It does seem to be an existing bug, but the freeze map makes the
problem more serious, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reviewing freeze map code

2016-06-15 Thread Robert Haas

On Wed, Jun 15, 2016 at 10:03 AM, Masahiko Sawada  wrote:
>> I'm not sure what to do about this: this part of the heap_update()
>> logic has been like this forever, and I assume that if it were easy to
>> refactor this away, somebody would have done it by now.
>
> How about changing collect_corrupt_items to acquire
> AccessExclusiveLock for safely checking?

Well, that would make it a lot less likely for
pg_check_{visible,frozen} to detect the bug in heap_update(), but it
wouldn't fix the bug in heap_update().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel safety tagging of extension functions

2016-06-15 Thread Andreas Karlsson


On 06/14/2016 07:51 PM, Robert Haas wrote:

On Tue, Jun 14, 2016 at 6:37 AM, Andreas Karlsson  wrote:

I have rebased all my patches on the current master now (and skipped the
extensions I previously listed).


Thanks, this is really helpful.  It was starting to get hard to keep
track of what hadn't been applied yet.  I decided to prioritize
getting committed the patches where the extension version had already
been bumped by 749a787c5b25ae33b3d4da0ef12aa05214aa73c7, so I've now
committed the patches for cube, hstore, intarray, ltree, pg_trgm, and
seg.  However, in pg_trgm, I changed some of the functions that you
had marked as PARALLEL RESTRICTED to be PARALLEL SAFE, because I
didn't see any reason why they needed to be PARALLEL RESTRICTED.  It's
OK for a parallel-safe function to depend on GUC values, because those
are synchronized from the leader to all worker processes.  Random
global variables won't necessarily be kept in sync, but anything
controlled by the GUC mechanism will be.  If there's some other reason
you think those functions aren't parallel-safe, please let me know.


Nah, this is a leftover from before I realized that GUCs are safe. I 
thought I went through all the code and fixed this misunderstanding. 
Thanks for spotting this.



I'm still in favor of rejecting the adminpack patch.  To me, that just
seems like attaching a larger magazine to the gun pointed at your
foot.  I can't deny that in a strict sense those functions are
parallel-safe, but I think they are better left alone.


Making them parallel restricted should prevent them from being a 
footgun, but I also do not see any huge benefit from doing so (while 
making them safe seems dangerous). I do not care either way.


Andreas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reviewing freeze map code

2016-06-15 Thread Masahiko Sawada

On Wed, Jun 15, 2016 at 9:56 PM, Robert Haas  wrote:
> On Wed, Jun 15, 2016 at 2:41 AM, Thomas Munro
>  wrote:
>> I spent some time chasing down the exact circumstances.  I suspect
>> that there may be an interlocking problem in heap_update.  Using the
>> line numbers from cae1c788 [1], I see the following interaction
>> between the VACUUM, UPDATE and SELECT (pg_check_visible) backends, all
>> in reference to the same block number:
>>
>>   [VACUUM] sets all visible bit
>>
>>   [UPDATE] heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data, 
>> xmax_old_tuple);
>>   [UPDATE] heapam.c:3938 LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>>
>>   [SELECT] LockBuffer(buffer, BUFFER_LOCK_SHARE);
>>   [SELECT] observes VM_ALL_VISIBLE as true
>>   [SELECT] observes tuple in HEAPTUPLE_DELETE_IN_PROGRESS state
>>   [SELECT] barfs
>>
>>   [UPDATE] heapam.c:4116 visibilitymap_clear(...)
>
> Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2,
> and CTID without logging anything or clearing the all-visible flag and
> then releases the lock on the heap page to go do some more work that
> might even ERROR out.  Only if that other work all goes OK do we
> relock the page and perform the WAL-logged actions.
>
> That doesn't seem like a good idea even in existing releases, because
> you've taken a tuple on an all-visible page and made it not
> all-visible, and you've made a page modification that is not
> necessarily atomic without logging it.  This is is particularly bad in
> 9.6, because if that page is also all-frozen then XMAX will eventually
> be pointing into space and VACUUM will never visit the page to
> re-freeze it the way it would have done in earlier releases.  However,
> even in older releases, I think there's a remote possibility of data
> corruption.  Backend #1 makes these changes to the page, releases the
> lock, and errors out.  Backend #2 writes the page to the OS.  DBA
> takes a hot backup, tearing the page in the middle of XMAX.  Oops.
>
> I'm not sure what to do about this: this part of the heap_update()
> logic has been like this forever, and I assume that if it were easy to
> refactor this away, somebody would have done it by now.
>

How about changing collect_corrupt_items to acquire
AccessExclusiveLock for safely checking?

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 10.0

2016-06-15 Thread Merlin Moncure

On Wed, Jun 15, 2016 at 8:59 AM, David G. Johnston
 wrote:
> On Wed, Jun 15, 2016 at 9:38 AM, Merlin Moncure  wrote:
>>
>> On Tue, Jun 14, 2016 at 5:48 PM, David G. Johnston
>>  wrote:
>> > On Tue, Jun 14, 2016 at 5:58 PM, Merlin Moncure 
>> > wrote:
>> >>
>> >> On Tue, Jun 14, 2016 at 4:13 PM, Jim Nasby 
>> >> wrote:
>> >> > On 6/14/16 3:56 PM, Tom Lane wrote:
>> >> >>
>> >> >> Jim Nasby  writes:
>> >> >>>
>> >> >>> On 6/14/16 3:01 PM, Robert Haas wrote:
>> >> 
>> >>  This seems kind of silly, because anybody who is writing code that
>> >>  might have to run against an existing version of the database
>> >>  won't
>> >>  be
>> >>  able to use it.  The one thing that absolutely has to be
>> >>  cross-version
>> >>  is the method of determining which version you're running against.
>> >> >>
>> >> >>
>> >> >>> We're talking about a function that doesn't currently exist anyway.
>> >> >>
>> >> >>
>> >> >> Huh?  We're talking about PQserverVersion(), comparisons to
>> >> >> PG_VERSION_NUM,
>> >> >> and related APIs.  Those most certainly exist now, and trying to
>> >> >> supplant
>> >> >> them seems like a giant waste of time.
>> >> >>
>> >> >> On the other hand, parsing fields out of version() mechanically has
>> >> >> been
>> >> >> deprecated for as long as those other APIs have existed (which is
>> >> >> since
>> >> >> 8.0 or so).  version() is only meant for human consumption, so I see
>> >> >> no
>> >> >> reason it shouldn't just start returning "10.0", "10.1", etc.  If
>> >> >> that
>> >> >> breaks anyone's code, well, they should have switched to one of the
>> >> >> easier methods years ago.
>> >> >
>> >> >
>> >> > The original post was:
>> >> >
>> >> >>   IF substring(version() FROM $q$([0-9]+\.[0-9]+)$q$)::NUMERIC >=
>> >> >> 9.3
>> >> >
>> >> > and \df *version* on my HEAD doesn't show any other options.
>> >>
>> >> Right.  It's the only way to handle things on the SQL level well,
>> >> that, and pg_settings approaches.  In other words, there is no easier
>> >> way.  I think it's pretty reasonable to assume there's a lot more code
>> >> interfacing with the database from SQL than from C.
>> >
>> >
>> > We could stand to be more explicit here.  The docs for version()
>> > indicated
>> > the server_version_num should be used for "machine processing".
>>
>> whoop! I didn't know about that setting.  I guess it dismantles a lot
>> of the case for more aggressive action.  That said, maybe it's a good
>> idea to construct the versioning change so that 10.x releases have a
>> server_version_num > 9.x releases and leave the other functions alone
>> (assuming that wasn't already the plan).
>>
>> > Option E: Give the DBA control.  If they know they have one or more
>> > mis-behaving applications but it is out their control to patch the code
>> > to
>> > work properly they can change this supposedly human-readable output to
>> > conform the historical x.y.z format.  This would disabled by default.
>> > Humans can easily interpret both versions so please save the comment
>> > about
>> > not having GUCs that influence user-visible behavior.  If your argument
>> > for
>> > changing the format outright is "its for human consumption only" then
>> > apparently this output should not be considered important enough to
>> > adhere
>> > to that rule.  Non-humans depending on its format are subject to our, or
>> > the
>> > DBA's, whims.
>>
>> Nah -- my argument could be restated as "I wasn't aware of the machine
>> variant of the version #".  Do you think it's a good idea to have the
>> machine version number be 10 for version 10.0?  What would 10.1
>> be?  100100 or 11?
>
>
> Sleeping on it I too came to the conclusion that the GUC was largely an
> undesirable option.
>
> IIRC the plan is to have the machine version behave as if the middle number
> is present and always zero.  It would be (the?) one place that the
> historical behavior remains visible but it is impossible to have a totally
> clean break.
>
> Tom's comment back on May 14th (different thread) was that we'd basically
> redefine the minor portion to be 4-digits instead of 2.

That makes sense -- I'm good then.  Thanks for engaging

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 10.0

2016-06-15 Thread David G. Johnston

On Wed, Jun 15, 2016 at 9:38 AM, Merlin Moncure  wrote:

> On Tue, Jun 14, 2016 at 5:48 PM, David G. Johnston
>  wrote:
> > On Tue, Jun 14, 2016 at 5:58 PM, Merlin Moncure 
> wrote:
> >>
> >> On Tue, Jun 14, 2016 at 4:13 PM, Jim Nasby 
> >> wrote:
> >> > On 6/14/16 3:56 PM, Tom Lane wrote:
> >> >>
> >> >> Jim Nasby  writes:
> >> >>>
> >> >>> On 6/14/16 3:01 PM, Robert Haas wrote:
> >> 
> >>  This seems kind of silly, because anybody who is writing code that
> >>  might have to run against an existing version of the database won't
> >>  be
> >>  able to use it.  The one thing that absolutely has to be
> >>  cross-version
> >>  is the method of determining which version you're running against.
> >> >>
> >> >>
> >> >>> We're talking about a function that doesn't currently exist anyway.
> >> >>
> >> >>
> >> >> Huh?  We're talking about PQserverVersion(), comparisons to
> >> >> PG_VERSION_NUM,
> >> >> and related APIs.  Those most certainly exist now, and trying to
> >> >> supplant
> >> >> them seems like a giant waste of time.
> >> >>
> >> >> On the other hand, parsing fields out of version() mechanically has
> >> >> been
> >> >> deprecated for as long as those other APIs have existed (which is
> since
> >> >> 8.0 or so).  version() is only meant for human consumption, so I see
> no
> >> >> reason it shouldn't just start returning "10.0", "10.1", etc.  If
> that
> >> >> breaks anyone's code, well, they should have switched to one of the
> >> >> easier methods years ago.
> >> >
> >> >
> >> > The original post was:
> >> >
> >> >>   IF substring(version() FROM $q$([0-9]+\.[0-9]+)$q$)::NUMERIC >= 9.3
> >> >
> >> > and \df *version* on my HEAD doesn't show any other options.
> >>
> >> Right.  It's the only way to handle things on the SQL level well,
> >> that, and pg_settings approaches.  In other words, there is no easier
> >> way.  I think it's pretty reasonable to assume there's a lot more code
> >> interfacing with the database from SQL than from C.
> >
> >
> > We could stand to be more explicit here.  The docs for version()
> indicated
> > the server_version_num should be used for "machine processing".
>
> whoop! I didn't know about that setting.  I guess it dismantles a lot
> of the case for more aggressive action.  That said, maybe it's a good
> idea to construct the versioning change so that 10.x releases have a
> server_version_num > 9.x releases and leave the other functions alone
> (assuming that wasn't already the plan).
>
> > Option E: Give the DBA control.  If they know they have one or more
> > mis-behaving applications but it is out their control to patch the code
> to
> > work properly they can change this supposedly human-readable output to
> > conform the historical x.y.z format.  This would disabled by default.
> > Humans can easily interpret both versions so please save the comment
> about
> > not having GUCs that influence user-visible behavior.  If your argument
> for
> > changing the format outright is "its for human consumption only" then
> > apparently this output should not be considered important enough to
> adhere
> > to that rule.  Non-humans depending on its format are subject to our, or
> the
> > DBA's, whims.
>
> Nah -- my argument could be restated as "I wasn't aware of the machine
> variant of the version #".  Do you think it's a good idea to have the
> machine version number be 10 for version 10.0?  What would 10.1
> be?  100100 or 11?


Sleeping on it I too came to the conclusion that the GUC was largely an
undesirable option.

IIRC the plan is to have the machine version behave as if the middle number
is present and always zero.  It would be (the?) one place that the
historical behavior remains visible but it is impossible to have a totally
clean break.



Tom's comment back on May 14th (different thread) was that we'd basically
redefine the minor portion to be 4-digits instead of 2.

David J.

Re: [HACKERS] Reviewing freeze map code

2016-06-15 Thread Amit Kapila

On Wed, Jun 15, 2016 at 7:13 PM, Robert Haas  wrote:
>
> On Wed, Jun 15, 2016 at 9:43 AM, Amit Kapila 
wrote:
> > On Wed, Jun 15, 2016 at 6:26 PM, Robert Haas 
wrote:
> >> On Wed, Jun 15, 2016 at 2:41 AM, Thomas Munro
> >>  wrote:
> >> > I spent some time chasing down the exact circumstances.  I suspect
> >> > that there may be an interlocking problem in heap_update.  Using the
> >> > line numbers from cae1c788 [1], I see the following interaction
> >> > between the VACUUM, UPDATE and SELECT (pg_check_visible) backends,
all
> >> > in reference to the same block number:
> >> >
> >> >   [VACUUM] sets all visible bit
> >> >
> >> >   [UPDATE] heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data,
> >> > xmax_old_tuple);
> >> >   [UPDATE] heapam.c:3938 LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
> >> >
> >> >   [SELECT] LockBuffer(buffer, BUFFER_LOCK_SHARE);
> >> >   [SELECT] observes VM_ALL_VISIBLE as true
> >> >   [SELECT] observes tuple in HEAPTUPLE_DELETE_IN_PROGRESS state
> >> >   [SELECT] barfs
> >> >
> >> >   [UPDATE] heapam.c:4116 visibilitymap_clear(...)
> >>
> >> Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2,
> >> and CTID without logging anything or clearing the all-visible flag and
> >> then releases the lock on the heap page to go do some more work that
> >> might even ERROR out.
> >
> > Can't we clear the all-visible flag before releasing the lock?  We can
use
> > logic of already_marked as it is currently used in code to clear it just
> > once.
>
> That just kicks the can down the road.  Then you have PD_ALL_VISIBLE
> clear but the VM bit is still set.

I mean to say clear both as we are doing currently in code:
if (PageIsAllVisible(BufferGetPage(buffer)))
{
all_visible_cleared = true;
PageClearAllVisible(BufferGetPage(buffer));
visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
vmbuffer);
}

>
>   And you still haven't WAL-logged
> anything.
>

Yeah, I think WAL requirement is more difficult to meet and I think
releasing the lock on buffer before writing WAL could lead to flush of such
a buffer before WAL.

I feel this is an existing-bug and should go to Older Bugs Section in open
items page.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Reviewing freeze map code

2016-06-15 Thread Robert Haas

On Wed, Jun 15, 2016 at 9:43 AM, Amit Kapila  wrote:
> On Wed, Jun 15, 2016 at 6:26 PM, Robert Haas  wrote:
>> On Wed, Jun 15, 2016 at 2:41 AM, Thomas Munro
>>  wrote:
>> > I spent some time chasing down the exact circumstances.  I suspect
>> > that there may be an interlocking problem in heap_update.  Using the
>> > line numbers from cae1c788 [1], I see the following interaction
>> > between the VACUUM, UPDATE and SELECT (pg_check_visible) backends, all
>> > in reference to the same block number:
>> >
>> >   [VACUUM] sets all visible bit
>> >
>> >   [UPDATE] heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data,
>> > xmax_old_tuple);
>> >   [UPDATE] heapam.c:3938 LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>> >
>> >   [SELECT] LockBuffer(buffer, BUFFER_LOCK_SHARE);
>> >   [SELECT] observes VM_ALL_VISIBLE as true
>> >   [SELECT] observes tuple in HEAPTUPLE_DELETE_IN_PROGRESS state
>> >   [SELECT] barfs
>> >
>> >   [UPDATE] heapam.c:4116 visibilitymap_clear(...)
>>
>> Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2,
>> and CTID without logging anything or clearing the all-visible flag and
>> then releases the lock on the heap page to go do some more work that
>> might even ERROR out.
>
> Can't we clear the all-visible flag before releasing the lock?  We can use
> logic of already_marked as it is currently used in code to clear it just
> once.

That just kicks the can down the road.  Then you have PD_ALL_VISIBLE
clear but the VM bit is still set.  And you still haven't WAL-logged
anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Reviewing freeze map code

2016-06-15 Thread Amit Kapila

On Wed, Jun 15, 2016 at 6:26 PM, Robert Haas  wrote:
>
> On Wed, Jun 15, 2016 at 2:41 AM, Thomas Munro
>  wrote:
> > I spent some time chasing down the exact circumstances.  I suspect
> > that there may be an interlocking problem in heap_update.  Using the
> > line numbers from cae1c788 [1], I see the following interaction
> > between the VACUUM, UPDATE and SELECT (pg_check_visible) backends, all
> > in reference to the same block number:
> >
> >   [VACUUM] sets all visible bit
> >
> >   [UPDATE] heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data,
xmax_old_tuple);
> >   [UPDATE] heapam.c:3938 LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
> >
> >   [SELECT] LockBuffer(buffer, BUFFER_LOCK_SHARE);
> >   [SELECT] observes VM_ALL_VISIBLE as true
> >   [SELECT] observes tuple in HEAPTUPLE_DELETE_IN_PROGRESS state
> >   [SELECT] barfs
> >
> >   [UPDATE] heapam.c:4116 visibilitymap_clear(...)
>
> Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2,
> and CTID without logging anything or clearing the all-visible flag and
> then releases the lock on the heap page to go do some more work that
> might even ERROR out.
>

Can't we clear the all-visible flag before releasing the lock?  We can use
logic of already_marked as it is currently used in code to clear it just
once.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] pg_dump vs. idle_in_transaction_session_timeout

2016-06-15 Thread Tom Lane

Bernd Helmle  writes:
> Currently pg_dump doesn't turn off idle_in_transaction_session_timeout.

Definitely an oversight, since it turns off other timeout settings.

> Okay, the window of failure here is very narrow (on my machine it breaks
> with an insane setting of 1ms only),

Probably if you were using parallel dump or restore, it could be a lot
more fragile, since an individual worker session might have nothing to
do for some period of time.

Will fix if no one beats me to it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 10.0

2016-06-15 Thread Merlin Moncure

On Tue, Jun 14, 2016 at 5:48 PM, David G. Johnston
 wrote:
> On Tue, Jun 14, 2016 at 5:58 PM, Merlin Moncure  wrote:
>>
>> On Tue, Jun 14, 2016 at 4:13 PM, Jim Nasby 
>> wrote:
>> > On 6/14/16 3:56 PM, Tom Lane wrote:
>> >>
>> >> Jim Nasby  writes:
>> >>>
>> >>> On 6/14/16 3:01 PM, Robert Haas wrote:
>> 
>>  This seems kind of silly, because anybody who is writing code that
>>  might have to run against an existing version of the database won't
>>  be
>>  able to use it.  The one thing that absolutely has to be
>>  cross-version
>>  is the method of determining which version you're running against.
>> >>
>> >>
>> >>> We're talking about a function that doesn't currently exist anyway.
>> >>
>> >>
>> >> Huh?  We're talking about PQserverVersion(), comparisons to
>> >> PG_VERSION_NUM,
>> >> and related APIs.  Those most certainly exist now, and trying to
>> >> supplant
>> >> them seems like a giant waste of time.
>> >>
>> >> On the other hand, parsing fields out of version() mechanically has
>> >> been
>> >> deprecated for as long as those other APIs have existed (which is since
>> >> 8.0 or so).  version() is only meant for human consumption, so I see no
>> >> reason it shouldn't just start returning "10.0", "10.1", etc.  If that
>> >> breaks anyone's code, well, they should have switched to one of the
>> >> easier methods years ago.
>> >
>> >
>> > The original post was:
>> >
>> >>   IF substring(version() FROM $q$([0-9]+\.[0-9]+)$q$)::NUMERIC >= 9.3
>> >
>> > and \df *version* on my HEAD doesn't show any other options.
>>
>> Right.  It's the only way to handle things on the SQL level well,
>> that, and pg_settings approaches.  In other words, there is no easier
>> way.  I think it's pretty reasonable to assume there's a lot more code
>> interfacing with the database from SQL than from C.
>
>
> We could stand to be more explicit here.  The docs for version() indicated
> the server_version_num should be used for "machine processing".

whoop! I didn't know about that setting.  I guess it dismantles a lot
of the case for more aggressive action.  That said, maybe it's a good
idea to construct the versioning change so that 10.x releases have a
server_version_num > 9.x releases and leave the other functions alone
(assuming that wasn't already the plan).

> Option E: Give the DBA control.  If they know they have one or more
> mis-behaving applications but it is out their control to patch the code to
> work properly they can change this supposedly human-readable output to
> conform the historical x.y.z format.  This would disabled by default.
> Humans can easily interpret both versions so please save the comment about
> not having GUCs that influence user-visible behavior.  If your argument for
> changing the format outright is "its for human consumption only" then
> apparently this output should not be considered important enough to adhere
> to that rule.  Non-humans depending on its format are subject to our, or the
> DBA's, whims.

Nah -- my argument could be restated as "I wasn't aware of the machine
variant of the version #".  Do you think it's a good idea to have the
machine version number be 10 for version 10.0?  What would 10.1
be?  100100 or 11?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Prevent ALTER TABLE DROP NOT NULL on child tables if parent column has it

2016-06-15 Thread Tom Lane

Michael Paquier  writes:
> To put it short, it should not be possible to drop a NOT NULL
> constraint on a child relation when its parent table is using it, this
> should only be possible from the parent. Attached is a patch handling
> this problem by adding a new function in pg_inherits.c to fetch the
> list of parent relations for a given relation OID, and did some
> refactoring to stick with what is done when scanning child relations.

This doesn't sound like the right approach; in particular, it won't really
help for deciding whether to propagate a DROP NOT NULL on a parent rel to
its children.  What we've discussed in the past is to store NOT NULL
constraints in pg_constraint, much like CHECK constraints are already, and
use use-count logic identical to the CHECK case to keep track of whether
NOT NULL constraints are inherited or not.  My feeling is that we'd keep
the pg_attribute.attnotnull field and continue to drive actual enforcement
off that, but it would just reflect a summary of the pg_constraint state.

IIRC, Alvaro posted a WIP patch for that awhile back.  Not sure what the
current state is.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-15 Thread Amit Kapila

On Tue, Jun 14, 2016 at 4:48 PM, Amit Kapila 
wrote:

> On Mon, Jun 13, 2016 at 8:11 PM, Tom Lane  wrote:
>
> >  I do
> > not share your confidence that using apply_projection_to_path within
> > create_grouping_paths is free of such a hazard.
> >
>
> Fair enough, let me try to explain the problem and someways to solve it in
> some more detail.  The main thing that got missed by me in the patch
> related to commit-04ae11f62 is that the partial path list of rel also needs
> to have a scanjoin_target. I was under assumption that
> create_grouping_paths will take care of assigning appropriate Path targets
> for the parallel paths generated by it.  If we see, create_grouping_paths()
> do take care of adding the appropriate path targets for the paths generated
> by that function.  However, it doesn't do anything for partial paths.   The
> patch sent by me yesterday [1] which was trying to assign
> partial_grouping_target to partial paths won't be the right fix, because
> (a) partial_grouping_target includes Aggregates (refer
> make_partialgroup_input_target) which we don't want for partial paths; (b)
> it is formed using grouping_target passed in function
> create_grouping_paths() whereas we need the path target formed from
> final_target for partial paths (as partial paths are scanjoin relations)
>  as is done for main path list (in grouping_planner(),  /* Forcibly apply
> that target to all the Paths for the scan/join rel.*/).   Now, I think we
> have following ways to solve it:
>
> (a) check whether the scanjoin_target contains any parallel-restricted
> clause before applying the same to partial path list in grouping_planner.
> However it could lead to duplicate checks in some cases for
> parallel-restricted clause, once in apply_projection_to_path() for main
> pathlist and then again before applying to partial paths.  I think we can
> avoid that by having an additional parameter in apply_projection_to_path()
> which can indicate if the check for parallel-safety is done inside that
> function and what is the result of same.
>
> (b) generate the appropriate scanjoin_target for partial path list in
> create_grouping_paths using final_target. However I think this might lead
> to some duplicate code in create_grouping_paths() as we might have to some
> thing similar to what we have done in grouping_planner for generating such
> a target.  I think if we want we can pass scanjoin_target to
> create_grouping_paths() as well.   Again, we have to check for
> parallel-safety for scanjoin_target before applying it to partial paths,
> but we need to do that only when grouped_rel is considered parallel-safe.
>
>
One more idea could be to have a flag (say parallel_safe) in PathTarget and
set it once we have ensured that the expressions in target doesn't contain
any parallel-restricted entry.  In
create_pathtarget()/set_pathtarget_cost_width(),  if the parallelmodeOK
flag is set, then we can evaluate target expressions for
parallel-restricted expressions and set the flag accordingly.  Now, this
has the benefit that we don't need to evaluate the expressions of targets
for parallel-restricted clauses again and again.  I think this way if the
flag is set once for final_target in grouping_planner, we don't need to
evaluate it again for other targets (grouping_target, scanjoin_target,
etc.) as those are derived from final_target.  Similarly, I think we need
to set ReplOptInfo->reltarget and others as required.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Reviewing freeze map code

2016-06-15 Thread Robert Haas

On Wed, Jun 15, 2016 at 2:41 AM, Thomas Munro
 wrote:
> I spent some time chasing down the exact circumstances.  I suspect
> that there may be an interlocking problem in heap_update.  Using the
> line numbers from cae1c788 [1], I see the following interaction
> between the VACUUM, UPDATE and SELECT (pg_check_visible) backends, all
> in reference to the same block number:
>
>   [VACUUM] sets all visible bit
>
>   [UPDATE] heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data, 
> xmax_old_tuple);
>   [UPDATE] heapam.c:3938 LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>
>   [SELECT] LockBuffer(buffer, BUFFER_LOCK_SHARE);
>   [SELECT] observes VM_ALL_VISIBLE as true
>   [SELECT] observes tuple in HEAPTUPLE_DELETE_IN_PROGRESS state
>   [SELECT] barfs
>
>   [UPDATE] heapam.c:4116 visibilitymap_clear(...)

Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2,
and CTID without logging anything or clearing the all-visible flag and
then releases the lock on the heap page to go do some more work that
might even ERROR out.  Only if that other work all goes OK do we
relock the page and perform the WAL-logged actions.

That doesn't seem like a good idea even in existing releases, because
you've taken a tuple on an all-visible page and made it not
all-visible, and you've made a page modification that is not
necessarily atomic without logging it.  This is is particularly bad in
9.6, because if that page is also all-frozen then XMAX will eventually
be pointing into space and VACUUM will never visit the page to
re-freeze it the way it would have done in earlier releases.  However,
even in older releases, I think there's a remote possibility of data
corruption.  Backend #1 makes these changes to the page, releases the
lock, and errors out.  Backend #2 writes the page to the OS.  DBA
takes a hot backup, tearing the page in the middle of XMAX.  Oops.

I'm not sure what to do about this: this part of the heap_update()
logic has been like this forever, and I assume that if it were easy to
refactor this away, somebody would have done it by now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-15 Thread Teodor Sigaev


IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 9.6 open item is long past due
for your status update.  Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately.  If I do not hear from you by
2016-06-16 07:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] 
http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com


I'm working on it right now.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Bug in to_timestamp().

2016-06-15 Thread amul sul

On Monday, 13 June 2016 9:57 PM, Robert Haas  wrote:


>I think a space in the format string should skip a whitespace
>character in the input string, but not a non-whitespace character.

+1, enough the benefit is we are giving correct output. 
 
PFA patch proposing this fix.

Regards,
Amul Sul. 


0001-RM37358-space-in-the-format-string-should-skip-a-whi.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] pg_dump vs. idle_in_transaction_session_timeout

2016-06-15 Thread Bernd Helmle

Currently pg_dump doesn't turn off idle_in_transaction_session_timeout.

Okay, the window of failure here is very narrow (on my machine it breaks
with an insane setting of 1ms only), but for the sake of reliable backups
and protection against over motivated DBA it looks better to me to turn
that off, no?

--
 
Thanks

Bernd


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] parallel.c is not marked as test covered

2016-06-15 Thread Amit Kapila

On Wed, Jun 15, 2016 at 12:07 PM, Noah Misch  wrote:
>
> On Wed, Jun 15, 2016 at 11:50:33AM +0530, Amit Kapila wrote:
> > In short, this test doesn't serve it's purpose which is to generate an
> > error from worker.
>
> That's bad.  Thanks for figuring out the problem.
>
> > do $$begin
> >   Perform stringu1::int2 from tenk1 where unique1 = 1;
> > end$$;
> >
> > ERROR:  invalid input syntax for integer: "BA"
> > CONTEXT:  parallel worker, PID 4460
> > SQL statement "SELECT stringu1::int2 from tenk1 where unique1 = 1"
> > PL/pgSQL function inline_code_block line 2 at PERFORM
> >
> > Considering above analysis is correct, we have below options:
> > a. Modify the test such that it actually generates an error and to hide
the
> > context, we can exception block and raise some generic error.
> > b. Modify the test such that it actually generates an error and to hide
the
> > context, we can use force_parallel_mode = regress;
>
> Either of those sounds okay.  No need to raise a generic error; one can
raise
> SQLERRM to keep the main message and not the context.  I lean toward (a)
so we
> have nonzero test coverage of force_parallel_mode=on.

Patch implementing option (a) attached with this mail.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_parallel_query_test_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

2016-06-15 Thread Kyotaro HORIGUCHI

Hello,

At Tue, 14 Jun 2016 21:24:58 +0900, Michael Paquier  
wrote in 
> On Tue, Jun 14, 2016 at 8:31 PM, Kyotaro HORIGUCHI
>  wrote:
> >> +# Take a second backup of the standby while the master is offline.
> >> +$node_master->stop;
> >> +$node_standby_1->backup('my_backup_2');
> >> +$node_master->start;
> >
> > I'm not sure that adding the test case for a particular bug like
> > this is appropriate. But it would be acceptable because it
> > doesn't take long time and it is separate from standard checks.
> 
> We already take a backup from a standby when master is connected, it
> should not cost much in terms of time.

Agreed.

> > It seems to me that we could more agressively advance the
> > minRecoveryPoint (but must not let it go too far..), but it is
> > right for it to aim a bit smaller than the ideal location.
> 
> It may be risky to propose such a change for a backpatch. Anyway, in
> any case there is no guarantee that when using the low-level backup
> routines pg_start/stop_backup with a custom backup method the minimum
> recovery point will be correct.. pg_basebackup does that a bit more
> carefully if I recall correctly (too lazy to look at the code now :)).

Yes, as written in another mail, minRecoveryPoint seems to be
correctly upadtaed as its definition. The problem is that only
redo point of the last restartpoint is updated, even if no update
happened. But just inhibiting the update of redo point for the
case in turn causes an annoying repetition of restartpoints.

Addition to the TAP test above, the following SQL commands on the
master causes the same problem without disconnection.

=# checkpoint;
=# select pg_switch_xlog();

This is somewhat artifical but the same situation could be made
also in the nature. The exact condition for this is replaying a
checkpoint record having no buffer modification since the
preceding checkpoint in the previous WAL segments.

Returning to the discussion on which way to choose, we agreed
that using replayEndRecPtr instead of minRecoveryPoint in
do_pg_stop_backup(), as the patch attached to the following
message.

https://www.postgresql.org/message-id/cab7npqtv5gmkqcndofgtgqoqxz2xlz4rrw247oqojzztvy6...@mail.gmail.com

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Prevent ALTER TABLE DROP NOT NULL on child tables if parent column has it

2016-06-15 Thread Michael Paquier

Hi all,

A couple of months back the $subject has been mentioned, though nobody
actually wrote a patch to prevent that:
http://www.postgresql.org/message-id/21633.1448383...@sss.pgh.pa.us
So here is one..

To put it short, it should not be possible to drop a NOT NULL
constraint on a child relation when its parent table is using it, this
should only be possible from the parent. Attached is a patch handling
this problem by adding a new function in pg_inherits.c to fetch the
list of parent relations for a given relation OID, and did some
refactoring to stick with what is done when scanning child relations.
And here is what this patch can do:
=# create table parent (a int not null);
CREATE TABLE
=# create table child (a int not null);
CREATE TABLE
=# alter table child inherit parent ;
ALTER TABLE
=# alter table child alter COLUMN a drop not null;  -- would work on HEAD
ERROR:  42P16: cannot drop NOT NULL constraint for attribute "a"
DETAIL:  The same column in parent relation "parent" is marked NOT NULL
LOCATION:  ATExecDropNotNull, tablecmds.c:5281
=# alter table parent  alter COLUMN a drop not null; -- works on parent
ALTER TABLE
=# \d child
 Table "public.child"
 Column |  Type   | Modifiers
+-+---
 a  | integer |
Inherits: parent

I have added a new index to pg_inherits, so that's not something that
could be back-patched, still it would be good to fix this weird
behavior on HEAD. I am adding that to the next CF.
Regards,
-- 
Michael


child-drop-not-null-v1.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2016-06-15 Thread Ashutosh Bapat

Amit Langote is working on supporting declarative partitioning in
PostgreSQL [1]. I have started working on supporting partition-wise join.
This mail describes very high level design and some insight into the
performance improvements.

An equi-join between two partitioned tables can be broken down into
pair-wise join between their partitions. This technique is called
partition-wise join. Partition-wise joins between similarly partitioned
tables with equi-join condition can be efficient because [2]
1. Each provably non-empty partition-wise join smaller. All such joins
collectively might be more efficient than the join between their parent.
2. Such joins are able to exploit properties of partitions like indexes,
their storage etc.
3. An N-way partition-wise join may have different efficient join orders
compared to the efficient join order between the parent tables.

A partition-wise join is processed in following stages [2], [3].
1. Applicability testing: This phase checks if the join conditions match
the partitioning scheme. A partition-wise join is efficient if there is an
equi-join on the partition keys. E.g. join between tables R and S
partitioned by columns a and b resp. can be broken down into partition-wise
joins if there exists a join condition is R.a = S.b. Or in other words the
number of provably non-empty partition-wise joins is O(N) where N is the
number of partitions.

2. Matching: This phase determines which joins between the partitions of R
and S can potentially produce tuples in the join and prunes empty joins
between partitions.

3. Clustering: This phase aims at reducing the number of partition-wise
joins by clubbing together partitions from joining relations. E.g. clubbing
multiple partitions from either of the partitioned relations which can join
to a single partition from the other partitioned relation.

4. Path/plan creation: This phase creates multiple paths for each
partition-wise join. It also creates Append path/s representing the union
of partition-wise joins.

The work here focuses on a subset of use-cases discussed in [2]. It only
considers partition-wise join for join between similarly partitioned tables
with same number of partitions with same properties, thus producing at most
as many partition-wise joins as there are partitions. It should be possible
to apply partition-wise join technique (with some special handling for
OUTER joins) if both relations have some extra partitions with
non-overlapping partition conditions, apart from the matching partitions.
But I am not planning to implement this optimization in the first cut.

The attached patch is a POC implementation of partition-wise join. It is is
based on the set of patches posted on 23rd May 2016 by Amit Langote for
declarative partitioning. The patch gives an idea about the approach used.
It has several TODOs, which I am working on.

Attached is a script with output which measures potential performance
improvement because of partition-wise join. The script uses a GUC
enable_partition_wise_join to disable/enable this feature for performance
measurement. The scripts measures performance improvement of a join between
two tables partitioned by range on integer column. Each table contains 50K
rows. Each table has an integer and a varchar column. It shows around
10-15% reduction in execution time when partition-wise join is used.
Accompanied with parallel query and FDWs, it opens up avenues for further
improvements for joins between partitioned tables.

[1]. https://www.postgresql.org/message-id/55d3093c.5010...@lab.ntt.co.jp
[2]. https://users.cs.duke.edu/~shivnath/papers/sigmod295-herodotou.pdf
[3]. https://users.cs.duke.edu/~shivnath/tmp/paqo_draft.pdf

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


partitioned_join.out
Description: Binary data
\set num_samples 100
drop table t1 cascade;
drop table t2 cascade;
create table t1 (a int, b varchar) partition by range(a);
create table t1_p1 partition of t1 for values start (1) end (1) inclusive;
create table t1_p2 partition of t1 for values start (10001) end (2) inclusive;
create table t1_p3 partition of t1 for values start (20001) end (3) inclusive;
create table t1_p4 partition of t1 for values start (30001) end (4) inclusive;
create table t1_p5 partition of t1 for values start (40001) end (5) inclusive;
insert into t1 select i, to_char(i, 'FM00') from generate_series(1, 5) i;
create table t2 (a int, b varchar) partition by range(a);
create table t2_p1 partition of t2 for values start (1) end (1) inclusive;
create table t2_p2 partition of t2 for values start (10001) end (2) inclusive;
create table t2_p3 partition of t2 for values start (20001) end (3) inclusive;
create table t2_p4 partition of t2 for values start (30001) end (4) inclusive;
create table t2_p5 partition of t2 for values start (40001) end (5) inclusive;
insert into t2 select i, to_char(i, 'FM00') from generate_series(1, 5

Re: [HACKERS] parallel.c is not marked as test covered

2016-06-15 Thread Amit Kapila

On Wed, Jun 15, 2016 at 12:07 PM, Noah Misch  wrote:
>
> On Wed, Jun 15, 2016 at 11:50:33AM +0530, Amit Kapila wrote:
> > do $$begin
> >   Perform stringu1::int2 from tenk1 where unique1 = 1;
> > end$$;
> >
> > ERROR:  invalid input syntax for integer: "BA"
> > CONTEXT:  parallel worker, PID 4460
> > SQL statement "SELECT stringu1::int2 from tenk1 where unique1 = 1"
> > PL/pgSQL function inline_code_block line 2 at PERFORM
> >
> > Considering above analysis is correct, we have below options:
> > a. Modify the test such that it actually generates an error and to hide
the
> > context, we can exception block and raise some generic error.
> > b. Modify the test such that it actually generates an error and to hide
the
> > context, we can use force_parallel_mode = regress;
>
> Either of those sounds okay.  No need to raise a generic error; one can
raise
> SQLERRM to keep the main message and not the context.  I lean toward (a)
so we
> have nonzero test coverage of force_parallel_mode=on.
>

Do you mean to say nonzero test coverage of force_parallel_mode=on for
error paths?  I see that for force_parallel_mode=on, we have another test
in select_parallel.sql

set force_parallel_mode=1;

explain (costs off)

  select stringu1::int2 from tenk1 where unique1 = 1;



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

84 matches

Mail list logo