Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-09 Thread Fabien COELHO


Hello Corey,


Changes in this patch:

- invalid boolean expression on \if or \elif is treated as if the script 
had a bad \command, so it either stops the script (ON_ERROR_STOP, script 
mode), or just gives the ParseVariableBool error and continues.


- All interactive "barks" removed except for [...]

- remaining error messages are made terser: [...]


Patch applies, make check ok, psql tap test ok.

Yep. At least the code is significantly simpler.

There is a stray trailing space at the end of one line in the perl script.

Shouldn't there be some documentation changes to reflect the behavior on 
errors? A precise paragraph about that would be welcome, IMHO.


In particular, given the somewhat riskier "ignore and keep going" behavior 
after a syntax error on \if in a script, I suggest adding some advice that 
ON_ERROR_STOP had better be activated in scripts which use \if.
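
For instance, the documentation could suggest a defensive pattern like the 
following (a sketch only; the variable and table names are illustrative, 
not from the patch):

```sql
-- Stop the whole script on the first error, including a bad \if
-- expression, instead of silently continuing down the wrong branch.
\set ON_ERROR_STOP on

SELECT count(*) > 0 AS has_data FROM my_table \gset
\if :has_data
    \echo 'table has rows'
\else
    \echo 'table is empty'
\endif
```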


Given that there is no more barking, having some prompt indication that 
the code is inside a conditional branch becomes more important, so ISTM 
that there should be some plan to add it.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Documentation improvements for partitioning

2017-02-09 Thread Amit Langote
Hi Corey,

On 2017/02/09 6:14, Corey Huinker wrote:
> On Fri, Feb 3, 2017 at 4:15 AM, Amit Langote 
> wrote:
> 
>> Here are some patches to improve the documentation about partitioned
>> tables:
> 
> Patch applies.
>
> Overall this looks really good. It goes a long way towards stating some of
> the things I had to learn through experimentation.

Thanks a lot for taking a look at it.

> I had to read a really long way into the patch before finding a blurb that
> I felt wasn't completely clear:
> 
> +
> + 
> +  INSERT statements with ON CONFLICT
> +  clause are currently not allowed on partitioned tables, that is,
> +  cause error when specified.
> + 
> 
> 
> Here's some other tries at saying the same thing, none of which are
> completely satisfying:
> 
> ...ON CONFLICT clause are currently not allowed on partitioned tables and
> will cause an error?
> ...ON CONFLICT clause are currently not allowed on partitioned tables and
> will instead cause an error?
> ...ON CONFLICT clause will currently cause an error if used on a
> partitioned table?

The last one sounds better.

> As far as additional issues to cover, this bit:
> 
> + 
> +  
> +   One cannot drop a NOT NULL constraint on a
> +   partition's column, if the constraint is present in the parent
> table.
> +  
> + 
> 
> Maybe we should add something about how one would go about dropping a NOT
> NULL constraint (parent first then partitions?)

Dropping it on the parent will cause it to be dropped on the partitions as
well.  About your point whether we should add a note about how to go about
dropping it in the partition, it seems to me it would be out of place
here; it's just saying that dropping NOT NULL constraint has a different
behavior with partitioned tables than regular inheritance.  That note most
likely belongs in the ALTER TABLE reference page in the DROP NOT NULL
description, so I created a patch for that (patch 0004 of the attached patches).
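
A minimal sketch of the behavior being described (hypothetical table names):

```sql
CREATE TABLE parent (a int NOT NULL) PARTITION BY RANGE (a);
CREATE TABLE child PARTITION OF parent FOR VALUES FROM (1) TO (10);

-- Not allowed: the constraint is inherited from the parent.
ALTER TABLE child ALTER COLUMN a DROP NOT NULL;

-- Allowed: dropping it on the parent drops it on the partitions as well.
ALTER TABLE parent ALTER COLUMN a DROP NOT NULL;
```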

> In reviewing this patch, do all our target formats make word spacing
> irrelevant? i.e. is there any point in looking at the number of spaces
> after a period, etc?

It seems to be a convention in the sources to include 2 spaces after a
period, which I just try to follow (both in the code comments and SGML).
I don't see that spaces are relevant as far as how the targets such as
HTML are rendered.

> A final note, because I'm really familiar with partitioning on Postgres and
> other databases, documentation which is clear to me might not be to someone
> less familiar with partitioning. Maybe we want another reviewer for that?

More eyeballs will only help make this better.

Thanks,
Amit
>From 3074d1986afe0f3ce4710f38517f2e1929ff4c48 Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 26 Jan 2017 18:57:55 +0900
Subject: [PATCH 1/4] Improve CREATE TABLE documentation of partitioning

---
 doc/src/sgml/ref/create_table.sgml | 103 ++---
 1 file changed, 96 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 58f8bf6d6a..5596250aef 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -85,8 +85,8 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 
 and partition_bound_spec is:
 
-{ IN ( expression [, ...] ) |
-  FROM ( { expression | UNBOUNDED } [, ...] ) TO ( { expression | UNBOUNDED } [, ...] ) }
+{ IN ( { bound_literal | NULL } [, ...] ) |
+  FROM ( { bound_literal | UNBOUNDED } [, ...] ) TO ( { bound_literal | UNBOUNDED } [, ...] ) }
 
 index_parameters in UNIQUE, PRIMARY KEY, and EXCLUDE constraints are:
 
@@ -261,6 +261,44 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
   any existing partition of that parent.
  
 
+ 
+  
+   Each of the values specified in the partition bound specification is
+   a literal, NULL, or UNBOUNDED.
+   A literal is either a numeric constant or a string constant that can be
+   automatically coerced to the corresponding partition key column's type.
+  
+
+  
+   When creating a range partition, the lower bound specified with
+   FROM is an inclusive bound, whereas the upper bound
+   specified with TO is an exclusive bound.  That is,
+   the values specified in the FROM list are accepted
+   values of the corresponding partition key columns in a given partition,
+   whereas those in the TO list are not.  To be precise,
+   this applies only to the first of the partition key columns for which
+   the corresponding values in the FROM and
+   TO lists are not equal.  All rows in a given
+   partition contain the same values for all preceding columns, equal to
+   those specified in FROM and TO
+   lists.  On the other hand, any subsequent columns are insignificant
+   as far as implicit 

Re: [HACKERS] contrib modules and relkind check

2017-02-09 Thread Amit Langote
On 2017/02/10 14:32, Michael Paquier wrote:
> On Thu, Feb 9, 2017 at 7:21 AM, Corey Huinker  wrote:

Thanks Corey and Michael for the reviews!

>> 1. should have these tests named in core functions, like maybe:
>>
>> relation_has_visibility(Relation)
>> relation_has_storage(Relation)
>> and/or corresponding void functions check_relation_has_visibility() and
>> check_relation_has_storage() which would raise the appropriate error message
>> when the boolean test fails.
>> Then again, this might be overkill since we don't add new relkinds very
>> often.
> 
> The visibility checks are localized in pg_visibility.c and the storage
> checks in pgstatindex.c, so yes we could have macros in those files.
> Or even better: just have a sanity check routine as the error messages
> are the same everywhere.

If I understand correctly what's being proposed here, tablecmds.c has
something called ATWrongRelkindError() which sounds (kind of) similar.
It's called by ATSimplePermissions() whenever it finds out that relkind of
the table specified in a given ALTER TABLE command is not in the "allowed
relkind targets" for the command.  Different ALTER TABLE commands call
ATSimplePermissions() with an argument that specifies the "allowed relkind
targets" for each command.   I'm not saying that it should be used
elsewhere, but if we do invent a new centralized routine for relkind
checks, it would look similar.

>> 2. Is it better stylistically to have an AND-ed list of != tests, or a
>> negated list of OR-ed equality tests, and should the one style be changed to
>> conform to the other?

I changed the patch so that all newly added checks use an AND-ed list of
!= tests.

>> No new regression tests. I think we should create a dummy partitioned table
>> for each contrib and show that it fails.
> 
> Yep, good point. That's easy enough to add.

I added tests with a partitioned table to pageinspect's page.sql and
pgstattuple.sql.  I don't see any tests for pg_visibility module, maybe I
should add?

That said, I felt a bit uncomfortable with the way the error message looks, for
example:

+ -- check that using any of these functions with a partitioned table would fail
+ create table test_partitioned (a int) partition by range (a);
+ select pg_relpages('test_partitioned');
+ ERROR:  "test_partitioned" is not a table, index, materialized view, sequence, or TOAST table

test_partitioned IS a table but just the kind without storage.  This is
not the only place, however, where we do not separate the check for
partitioned tables from other unsupported relkinds in the respective
contexts.  I'm not sure whether separating them would be a better idea.

> By the way, partition tables create a file on disk but they should
> not... Amit, I think you are working on a patch for that as well?
> That's tweaking heap_create() to unlist partitioned tables and then be
> sure that other code paths don't try to look at the parent partitioned
> table on disk.

Yep, I just sent a message titled "Partitioned tables and relfilenode".

Thanks,
Amit
>From 03621e49ca9ecf1546a03aeba0ffc6c15fa20c78 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 24 Jan 2017 13:22:17 +0900
Subject: [PATCH] Add relkind checks to certain contrib modules

Contrib functions such as pg_visibility and pg_relpages are only
applicable to certain relkinds; error out otherwise.

Also, RELKIND_PARTITIONED_TABLE was not added to the pageinspect's and
pgstattuple's list of unsupported relkinds.  Add a test for the same.
---
 contrib/pageinspect/expected/page.out|  5 
 contrib/pageinspect/rawpage.c|  5 
 contrib/pageinspect/sql/page.sql |  5 
 contrib/pg_visibility/pg_visibility.c| 28 ++
 contrib/pgstattuple/expected/pgstattuple.out |  5 
 contrib/pgstattuple/pgstatindex.c| 44 
 contrib/pgstattuple/pgstattuple.c|  3 ++
 contrib/pgstattuple/sql/pgstattuple.sql  |  5 
 8 files changed, 100 insertions(+)

diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 13964cd878..85b183db4d 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -71,3 +71,8 @@ SELECT * FROM fsm_page_contents(get_raw_page('test1', 'fsm', 0));
 (1 row)
 
 DROP TABLE test1;
+-- check that using any of these functions with a partitioned table would fail
+create table test_partitioned (a int) partition by range (a);
+select get_raw_page('test_partitioned', 0);
+ERROR:  cannot get raw page from partitioned table "test_partitioned"
+drop table test_partitioned;
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index 102f360c6f..1ccc3ff320 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -123,6 +123,11 @@ get_raw_page_internal(text *relname, ForkNumber forknum, BlockNumber blkno)
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("cannot get 

Re: [HACKERS] Partitioned tables and relfilenode

2017-02-09 Thread Amit Langote
On 2017/02/10 15:58, Simon Riggs wrote:
> On 10 February 2017 at 06:19, Amit Langote
>  wrote:
>> The new partitioned tables do not contain any data by themselves.  Any
>> data inserted into a partitioned table is routed to and stored in one of
>> its partitions.  In fact, it is impossible to insert *any* data before a
>> partition (to be precise, a leaf partition) is created.
> 
> Where is all of this documented? All I see at the moment is old docs.

Last week I sent documentation patches in a separate message titled
"Documentation improvements for partitioning" [1].  One of the patches
adds a separate section in the DDL chapter describing partitioned tables.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/bf9ebbb3-fb1e-3206-b67c-e7a803a747d9%40lab.ntt.co.jp






Re: [HACKERS] Partitioned tables and relfilenode

2017-02-09 Thread Simon Riggs
On 10 February 2017 at 06:19, Amit Langote
 wrote:
> The new partitioned tables do not contain any data by themselves.  Any
> data inserted into a partitioned table is routed to and stored in one of
> its partitions.  In fact, it is impossible to insert *any* data before a
> partition (to be precise, a leaf partition) is created.

Where is all of this documented? All I see at the moment is old docs.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Partitioned tables and relfilenode

2017-02-09 Thread Amit Langote
The new partitioned tables do not contain any data by themselves.  Any
data inserted into a partitioned table is routed to and stored in one of
its partitions.  In fact, it is impossible to insert *any* data before a
partition (to be precise, a leaf partition) is created.  It seems wasteful
then to allocate physical storage (files) for partitioned tables.  If we
do not allocate the storage, then we must make sure that the right thing
happens when a command that is intended to manipulate a table's storage
encounters a partitioned table, the "right thing" here being that the
command's code either throws an error or warning (in some cases) if the
specified table is a partitioned table or ignores any partitioned tables
when it reads the list of relations to process from pg_class.  Commands
that need to be taught about this are vacuum, analyze, truncate, and alter
table.  Specifically:

- In case of vacuum, specifying a partitioned table causes a warning

- In case of analyze, we do not throw an error or warning but simply
  avoid calling do_analyze_rel() *non-recursively*.  Further in
  acquire_inherited_sample_rows(), any partitioned tables in the list
  returned by find_all_inheritors() are skipped.

- In case of truncate, only the part which manipulates table's physical
  storage is skipped for partitioned tables.

- ATRewriteTables() skips on the AlteredTableInfo entries for partitioned
  tables, because there is nothing to be done.

- Since we cannot create indexes on partitioned tables anyway, there is
  no need to handle cluster and reindex (they throw a meaningful error
  already due to the lack of indexes.)
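
In other words, the intended outcomes could be exercised like this 
(hypothetical table name; the comments state the behavior expected under 
these patches, not current behavior):

```sql
CREATE TABLE parted (a int) PARTITION BY RANGE (a);

VACUUM parted;    -- expected: a warning, since there is no storage to vacuum
ANALYZE parted;   -- expected: no error; only the inheritance tree is sampled
TRUNCATE parted;  -- expected: recurses to partitions, skipping the parent's
                  -- (nonexistent) storage
```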

Patches 0001 and 0002 of the attached implement the above part.  0001
teaches the above mentioned commands to do the "right thing" as described
above and 0002 teaches heap_create() and heap_create_with_catalog() to not
create any physical storage (none of the forks) for partitioned tables.

Then comes 0003, which concerns inheritance planning.  In a regular
inheritance set (ie, the inheritance set corresponding to an inheritance
hierarchy whose root is a regular table), the inheritance parents are
included in their role as child members, because they might contribute
rows to the final result.  So AppendRelInfo's are created for each such
table by the planner prep phase, which the later planning steps use to
create a scan plan for those tables as the Append's child plans.
Currently, the partitioned tables are also processed by the optimizer as
inheritance sets.  Partitioned table inheritance parents do not own any
storage, so we *must* not create scan plans for them.  So we do not need
to process them as child members of the inheritance set.  0003 teaches
expand_inherited_rtentry() to not add partitioned tables as child members.
 Also, since the root partitioned table RTE is no longer added to the
Append list as the 1st child member, inheritance_planner() cannot assume
that it can install the 1st child RTE as the nominalRelation of a given
ModifyTable node; instead, the original root parent table RTE is installed
as the nominalRelation.


Together the above patches implement the first item listed in "Some
notes:" part of an email [1] on the original declarative partitioning
thread, which says:

"We should try to teach the executor never to scan the parent. That's
never necessary with this system, and it might add significant overhead.
We should also try to get rid of the idea of the parent having storage
(i.e. a relfilenode)."

Thoughts, comments?

Thanks,
Amit

[1]
https://postgr.es/m/CA%2BTgmob6DJEQBd%3DqH2wZG1onYZ9R1Sji3q%2BGqCRUqQi%2Bug28Rw%40mail.gmail.com
>From 1cc123a2f01066ababdc971a3ac9aafe1028be74 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 6 Feb 2017 18:03:28 +0900
Subject: [PATCH 1/3] Partitioned tables are empty themselves

So, there is not much point in trying to do things to them that need
accessing files (a later commit will make it so that there is no file
at all to access.)  Things that needed attention are: vacuum, analyze,
truncate, ATRewriteTables.
---
 src/backend/commands/analyze.c   | 39 +++
 src/backend/commands/tablecmds.c | 15 ---
 src/backend/commands/vacuum.c| 14 +-
 3 files changed, 48 insertions(+), 20 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index ed3acb1673..4dc5bd48f8 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -201,8 +201,7 @@ analyze_rel(Oid relid, RangeVar *relation, int options,
 	 * locked the relation.
 	 */
 	if (onerel->rd_rel->relkind == RELKIND_RELATION ||
-		onerel->rd_rel->relkind == RELKIND_MATVIEW ||
-		onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		onerel->rd_rel->relkind == RELKIND_MATVIEW)
 	{
 		/* Regular table, so we'll use the regular row acquisition function */
 		acquirefunc = acquire_sample_rows;
@@ -234,7 +233,11 @@ analyze_rel(Oid relid, RangeVar 

Re: [HACKERS] Improve OR conditions on joined columns (common star schema problem)

2017-02-09 Thread Claudio Freire
On Thu, Feb 9, 2017 at 9:50 PM, Jim Nasby  wrote:
> WHERE t1 IN ('a','b') OR t2 IN ('c','d')
>
> into
>
> WHERE f1 IN (1,2) OR f2 IN (3,4)
>
> (assuming a,b,c,d maps to 1,2,3,4)
>
> BTW, there's an important caveat here: users generally do NOT want duplicate
> rows from the fact table if the dimension table results aren't unique. I
> thought my array solution was equivalent to what the JOINs would do in that
> case but that's actually wrong. The array solution does provide the behavior
> users generally want here though. JOIN is the easiest tool to pick up for
> this, so it's what people gravitate to, but I suspect most users would be
> happier with a construct that worked like the array trick does, but was
> easier to accomplish.
>
> I wonder if any other databases have come up with non-standard syntax to do
> this.

What I've been doing is applying those transforms (tn -> fn) in application
code. While it's a chore, the improvement in plans is usually well
worth the trouble.

If there's a FK between fact and dimension tables, you can be certain
the transform will yield equivalent results, because you'll be certain
the joins don't duplicate rows.

So the transform should be rather general and useful.

If you have a join of the form:

a join b on a.f1 = b.id

where a.f1 has an FK referencing b.id, and any filter X on b, you can
turn the plan into:

with b_ids as (select id from b where X)
...
a join b on a.f1 = b.id and a.f1 in (select id from b_ids)

In order to be useful, the expected row count from b_ids should be rather small.
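
Concretely, the sketch above might look like this with hypothetical 
fact/dim tables:

```sql
-- Assumes fact.f1 has a FK referencing dim.id, so the join cannot
-- duplicate fact rows and the rewrite preserves results.
WITH b_ids AS (
    SELECT id FROM dim WHERE label IN ('a', 'b')  -- the filter "X"
)
SELECT f.*
FROM fact f
JOIN dim b ON f.f1 = b.id
WHERE f.f1 IN (SELECT id FROM b_ids);
```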




Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow

2017-02-09 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > The question of removing the pre-role, deprecated, views of pg_user,
> > pg_group and pg_shadow has come up again.
> 
> > I figured a new thread was in order, however, to allow others to weigh
> > in on it.
> 
> > Note that these views have not been consistently maintained and have
> > ended up including some role attributes from recent versions (eg:
> > bypassrls) but were missed when others were added (eg: createrole).
> > There are properly maintained and cared for role-based versions of all
> > of these views, which are pg_roles, pg_auth_members, and pg_authid,
> > respectively.
> 
> Umm ... what exactly is the argument that those views are really better,
> and are not just destined to become legacy views in their turn?

The pg_auth_members view includes more information about the role
relationships (who the grantor is, if the admin option has been granted)
and it also frames the relationships in the multi-level directed acyclic
graph structure that truly represents what role membership means.  The
pg_roles view includes *all* of the role attributes, not just an
arbitrary (and entirely unintentional, as far as I can tell) subset of
them.

> > As we move forward with the other many changes in PG10, it seems like a
> > good time to remove these inconsistent and ancient views that were
> > introduced when roles were added in 2005.
> 
> This sounds like "v10 is a great time to break stuff", which we've
> already agreed is not project policy.

How about "it's not a bad time to break stuff."

> If there's a positive reason why these old views are impeding progress,
> then let's remove 'em, but I don't think you've presented one.  What
> exactly will it hurt to leave them there?

They're misleading by having an arbitrary subset of the role attributes
and implying that the role relationships are simpler than they actually
are.  Frankly, they're also not being consistently maintained based on
any proper policy, which I find quite objectionable.  I'll admit that I
hold PG to a pretty high standard when it comes to such things, but
that's simply because I have a very vested interest in having users feel
that PG is a solid, well designed, and well maintained database system.
These half-forgotten warts detract from that, even when they're warts
that I wrote in the first place.  12-year-ago me simply didn't have the
experience that now-me does, or the foresight to predict that these
views would still be hanging around at this point.

Of course, we could fix these issues: we could add the grantor to the
pg_groups view, and perhaps even change it to be an acyclic directed
graph structure, and we could add the role attributes to pg_user and
pg_shadow which are missing, but at that point all we're really doing,
it seems to me, is providing synonyms for the existing canonical views,
and that hardly seems useful.
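
For example, the discrepancy mentioned above is easy to see from the 
catalogs (pg_user gained a usebypassrls column but has no column at all 
for createrole):

```sql
-- The maintained view carries the full set of role attributes:
SELECT rolname, rolcreaterole, rolbypassrls FROM pg_roles;

-- The legacy view picked up bypassrls but never createrole:
SELECT usename, usebypassrls FROM pg_user;
```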

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Declarative partitioning - another take

2017-02-09 Thread Amit Langote
On 2017/02/09 15:22, amul sul wrote:
> About 0001-Check-partition-strategy-in-ATExecDropNotNull.patch,
> following test is already covered in alter_table.sql @ Line # 1918,
> instead of this kindly add test for list_partition:
> 
>  77 +-- cannot drop NOT NULL constraint of a range partition key column
>  78 +ALTER TABLE range_parted ALTER a DROP NOT NULL;
>  79 +

Thanks for the review!  You're right.  Updated patch attached.

Thanks,
Amit
>From a4335048e92462fb55fa6db0d830c7c066cd7011 Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 10 Feb 2017 14:52:18 +0900
Subject: [PATCH] Check partition strategy in ATExecDropNotNull

We should prevent dropping the NOT NULL constraint on a column only
if the column is in the *range* partition key (as the error message
also says).  Add a regression test while at it.

Reported by: Amul Sul
Patch by: Amit Langote
Report: https://postgr.es/m/CAAJ_b95g5AgkpJKbLajAt8ovKub874-B9X08PiOqHvVfMO2mLw%40mail.gmail.com
---
 src/backend/commands/tablecmds.c  | 22 +-
 src/test/regress/expected/alter_table.out |  4 
 src/test/regress/sql/alter_table.sql  |  5 +
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 37a4c4a3d6..f33aa70da6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5593,18 +5593,22 @@ ATExecDropNotNull(Relation rel, const char *colName, LOCKMODE lockmode)
 	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 	{
 		PartitionKey key = RelationGetPartitionKey(rel);
-		int			partnatts = get_partition_natts(key),
-	i;
 
-		for (i = 0; i < partnatts; i++)
+		if (get_partition_strategy(key) == PARTITION_STRATEGY_RANGE)
 		{
-			AttrNumber	partattnum = get_partition_col_attnum(key, i);
+			int			partnatts = get_partition_natts(key),
+		i;
 
-			if (partattnum == attnum)
-ereport(ERROR,
-		(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
-		 errmsg("column \"%s\" is in range partition key",
-colName)));
+			for (i = 0; i < partnatts; i++)
+			{
+AttrNumber	partattnum = get_partition_col_attnum(key, i);
+
+if (partattnum == attnum)
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+			 errmsg("column \"%s\" is in range partition key",
+	colName)));
+			}
 		}
 	}
 
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index d8e7b61294..b0e80a7788 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3029,6 +3029,10 @@ ERROR:  cannot alter type of column referenced in partition key expression
 -- cannot drop NOT NULL on columns in the range partition key
 ALTER TABLE partitioned ALTER COLUMN a DROP NOT NULL;
 ERROR:  column "a" is in range partition key
+-- it's fine however to drop one on the list partition key column
+CREATE TABLE list_partitioned (a int not null) partition by list (a);
+ALTER TABLE list_partitioned ALTER a DROP NOT NULL;
+DROP TABLE list_partitioned;
 -- partitioned table cannot participate in regular inheritance
 CREATE TABLE foo (
 	a int,
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 1f551ec53c..7513769359 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -1917,6 +1917,11 @@ ALTER TABLE partitioned ALTER COLUMN b TYPE char(5);
 -- cannot drop NOT NULL on columns in the range partition key
 ALTER TABLE partitioned ALTER COLUMN a DROP NOT NULL;
 
+-- it's fine however to drop one on the list partition key column
+CREATE TABLE list_partitioned (a int not null) partition by list (a);
+ALTER TABLE list_partitioned ALTER a DROP NOT NULL;
+DROP TABLE list_partitioned;
+
 -- partitioned table cannot participate in regular inheritance
 CREATE TABLE foo (
 	a int,
-- 
2.11.0




Re: [HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow

2017-02-09 Thread Tom Lane
Stephen Frost  writes:
> The question of removing the pre-role, deprecated, views of pg_user,
> pg_group and pg_shadow has come up again.

> I figured a new thread was in order, however, to allow others to weigh
> in on it.

> Note that these views have not been consistently maintained and have
> ended up including some role attributes from recent versions (eg:
> bypassrls) but were missed when others were added (eg: createrole).
> There are properly maintained and cared for role-based versions of all
> of these views, which are pg_roles, pg_auth_members, and pg_authid,
> respectively.

Umm ... what exactly is the argument that those views are really better,
and are not just destined to become legacy views in their turn?

> As we move forward with the other many changes in PG10, it seems like a
> good time to remove these inconsistent and ancient views that were
> introduced when roles were added in 2005.

This sounds like "v10 is a great time to break stuff", which we've
already agreed is not project policy.

If there's a positive reason why these old views are impeding progress,
then let's remove 'em, but I don't think you've presented one.  What
exactly will it hurt to leave them there?

regards, tom lane




Re: [HACKERS] contrib modules and relkind check

2017-02-09 Thread Michael Paquier
On Thu, Feb 9, 2017 at 7:21 AM, Corey Huinker  wrote:
> Is this still needing a reviewer?

Useful input is always welcome.

> Code is quite clear. It does raise two questions:
>
> 1. should have these tests named in core functions, like maybe:
>
> relation_has_visibility(Relation)
> relation_has_storage(Relation)
> and/or corresponding void functions check_relation_has_visibility() and
> check_relation_has_storage() which would raise the appropriate error message
> when the boolean test fails.
> Then again, this might be overkill since we don't add new relkinds very
> often.

The visibility checks are localized in pg_visibility.c and the storage
checks in pgstatindex.c, so yes we could have macros in those files.
Or even better: just have a sanity check routine as the error messages
are the same everywhere.

> 2. Is it better stylistically to have an AND-ed list of != tests, or a
> negated list of OR-ed equality tests, and should the one style be changed to
> conform to the other?
>
> No new regression tests. I think we should create a dummy partitioned table
> for each contrib and show that it fails.

Yep, good point. That's easy enough to add.

By the way, partition tables create a file on disk but they should
not... Amit, I think you are working on a patch for that as well?
That's tweaking heap_create() to unlist partitioned tables and then be
sure that other code paths don't try to look at the parent partitioned
table on disk.
-- 
Michael




Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-09 Thread Corey Huinker
On Thu, Feb 9, 2017 at 4:43 PM, Erik Rijkers  wrote:

> On 2017-02-09 22:15, Tom Lane wrote:
>
>> Corey Huinker  writes:
>>
>
> The feature now (at patch v10) lets you break off with Ctrl-C anywhere.
> I like it now much more.
>
> The main thing I still dislike somewhat about the patch is the verbose
> output. To be honest I would prefer to just remove /all/ the interactive
> output.
>
> I would vote to just make it remain silent if there is no error.   (and if
> there is an error, issue a message and exit)
>
> thanks,
>
> Erik Rijkers
>

Changes in this patch:
- invalid boolean expression on \if or \elif is treated as if the script
had a bad \command, so it either stops the script (ON_ERROR_STOP, script
mode), or just gives the ParseVariableBool error and continues.

- All interactive "barks" removed except for
"command ignored. use \endif or Ctrl-C to exit current branch" when the
user types a non-branching \command in a false branch
"query ignored. use \endif or Ctrl-C to exit current branch" when the
user types a query in a false branch
"\if: escaped" when the user presses Ctrl-C and escapes a branch.

- remaining error messages are made terser:
 \elif: cannot occur after \else
 \elif: no matching \if
 \else: cannot occur after \else
 \else: no matching \if
 \endif: no matching \if
 found EOF before closing \endif(s)
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index ae58708..c0ba4c4 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2035,6 +2035,67 @@ hello 10
 
   
 
+  
+\if expr
+\elif expr
+\else
+\endif
+
+
+This group of commands implements nestable conditional blocks, like
+this:
+
+
+SELECT
+EXISTS(SELECT 1 FROM customer WHERE customer_id = 123) as is_customer,
+EXISTS(SELECT 1 FROM employee WHERE employee_id = 456) as is_employee
+\gset
+\if :is_customer
+SELECT * FROM customer WHERE customer_id = 123;
+\elif :is_employee
+\echo 'is not a customer but is an employee'
+SELECT * FROM employee WHERE employee_id = 456;
+\else
+\if yes
+\echo 'not a customer or employee'
+\else
+\echo 'this should never print'
+\endif
+\endif
+
+
+Conditional blocks must begin with a \if and end
+with an \endif, and the pairs must be found in
+the same source file. If an EOF is reached on the main file or an
+\include-ed file before all local
\if-\endif pairs are matched, then
+psql will raise an error.
+
+
+A conditional block can have any number of 
+\elif clauses, which may optionally be followed by a
+single \else clause.
+
+
+The \if and \elif commands
+read the rest of the line and evaluate it as a boolean expression.
Currently, expressions are limited to a single unquoted string,
which is evaluated like other on/off options, so the valid values
are any unambiguous case-insensitive matches for one of:
true, false, 1,
0, on, off,
yes, no.  So
t, T, and tR
will all match true.
+
+
+Lines within false branches are not evaluated in any way: queries are
+not sent to the server, non-conditional commands are not evaluated but
+bluntly ignored, nested if-expressions in such branches are also not
+evaluated but are tallied to check for proper nesting.
+
+
+  
 
   
 \ir or \include_relative 
filename
diff --git a/src/bin/psql/Makefile b/src/bin/psql/Makefile
index c53733f..7a418c6 100644
--- a/src/bin/psql/Makefile
+++ b/src/bin/psql/Makefile
@@ -61,8 +61,16 @@ uninstall:
 
 clean distclean:
rm -f psql$(X) $(OBJS) lex.backup
+   rm -rf tmp_check
 
 # files removed here are supposed to be in the distribution tarball,
 # so do not clean them in the clean/distclean rules
 maintainer-clean: distclean
rm -f sql_help.h sql_help.c psqlscanslash.c
+
+
+check:
+   $(prove_check)
+
+installcheck:
+   $(prove_installcheck)
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index f17f610..390bccd 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -49,6 +49,7 @@
 #include "psqlscanslash.h"
 #include "settings.h"
 #include "variables.h"
+#include "fe_utils/psqlscan_int.h"
 
 /*
  * Editable database object types.
@@ -132,7 +133,7 @@ HandleSlashCmds(PsqlScanState scan_state,
status = PSQL_CMD_ERROR;
}
 
-   if (status != PSQL_CMD_ERROR)
+   if (status != PSQL_CMD_ERROR && pset.active_branch)
{
/* eat any remaining arguments after a valid command */
/* note we suppress evaluation of backticks here */
@@ -194,6 +195,30 @@ 

Re: [HACKERS] possibility to specify template database for pg_regress

2017-02-09 Thread Michael Paquier
On Thu, Feb 9, 2017 at 5:13 AM, Pavel Stehule  wrote:
> here is a patch

Thanks.

-   for (sl = dblist; sl; sl = sl->next)
-   create_database(sl->str);
+   if (templatelist != NULL)
+   {
+   _stringlist *tl;
+
+   for (sl = dblist, tl = templatelist; sl; sl = sl->next, tl = tl->next)
+   {
+   if (tl != NULL)
+   create_database(sl->str, tl->str);
+   else
+   {
+   fprintf(stderr, _("%s: the template list is shorter than database list\n"),
+   progname);
+   exit(2);
+   }
+   }
+   }
+   else
+   for (sl = dblist; sl; sl = sl->next)
+   create_database(sl->str, "template0");
There is one problem here: if the template list is shorter than the
database list, some databases get created before pg_regress complains,
leaving the instance in a half-way state. I think that you had better
do any sanity checks before creating or even dropping existing
databases.
-- 
Michael




Re: [HACKERS] Provide list of subscriptions and publications in psql's completion

2017-02-09 Thread Michael Paquier
On Tue, Feb 7, 2017 at 6:35 AM, Peter Eisentraut
 wrote:
> On 2/6/17 10:54 AM, Fujii Masao wrote:
>> Yes, that's an option. And, if we add dbid to pg_stat_subscription,
>> I'm tempted to add all the pg_subscription's columns except subconninfo
>> into pg_stat_subscription. Since subconninfo may contain security-sensitive
>> information, it should be excluded. But it seems useful to expose other
>> columns. Then, if we expose all the columns except subconninfo, maybe
>> it's better to just revoke subconninfo column on pg_subscription instead of
>> all columns. Thought?
>
> I think previous practice is to provide a view such as pg_subscriptions
> that contains all the nonprivileged information.

OK, I think that I see the point you are coming at:
pg_stat_get_subscription (or stat tables) should not be used for
psql's tab completion. So gathering all things discussed, we have:
- No tab completion for publications.
- Fix the meta-commands.
- Addition of a view pg_subscriptions with all the non-sensitive data.
(- Or really extend pg_stat_subscriptions with the database ID and use
it for tab completion?)
Am I missing something?
-- 
Michael




Re: [HACKERS] pg_basebackup -R

2017-02-09 Thread Michael Paquier
On Thu, Feb 9, 2017 at 8:17 PM, Amit Kapila  wrote:
> +1.  Sounds sensible thing to do.

Hm. It seems to me that PGPASSFILE still needs to be treated as an
exception because it is set to $HOME/.pgpass without any value set in
PQconninfoOption->compiled and it depends on the environment. Similar
rules apply to fallback_application_name, dbname and replication as
well, so they would need to be kept as checked on an individual basis.

Now it is true that pg_basebackup -R enforces the value set for a
parameter in the created string if its environment variable is set.
Bypassing those values would potentially break applications that rely
on the existing behavior.

In short, I'd like to think that we should just filter out those two
parameters by name and call it a day. Or introduce an idea of value
set for the environment by adding some kind of tracking flag in
PQconninfoOption? Though I am not sure that it is worth complicating
libpq to just generate recovery.conf in pg_basebackup.
-- 
Michael




[HACKERS] Removal of deprecated views pg_user, pg_group, pg_shadow

2017-02-09 Thread Stephen Frost
Greetings,

The question of removing the pre-role, deprecated, views of pg_user,
pg_group and pg_shadow has come up again.

I figured a new thread was in order, however, to allow others to weigh
in on it.

Note that these views have not been consistently maintained and have
ended up including some role attributes from recent versions (eg:
bypassrls) but were missed when others were added (eg: createrole).
There are properly maintained and cared for role-based versions of all
of these views, which are pg_roles, pg_auth_members, and pg_authid,
respectively.

As we move forward with the other many changes in PG10, it seems like a
good time to remove these inconsistent and ancient views that were
introduced when roles were added in 2005.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Stephen Frost
* Josh Berkus (j...@berkus.org) wrote:
> On 02/09/2017 05:19 PM, Michael Paquier wrote:
> > On Fri, Feb 10, 2017 at 10:16 AM, Stephen Frost  wrote:
> >>> As someone mentioned, forcing a user to install an extension makes
> >>> the deprecation visible. Another option would be to have the backend
> >>> spit out a WARNING the first time you access anything that's
> >>> deprecated. Both of those are pertinent reminders to people that
> >>> they need to change their tools.
> >>
> >> Ugh.  Please, no.  Hacking up the backend to recognize that a given
> >> query is referring to a deprecated view and then throwing a warning on
> >> it is just plain ugly.
> >>
> >> Let's go one step further, and throw an ERROR if someone tries to query
> >> these views instead.
> > 
> > FWIW, I am of the opinion to just nuke them as the "sort of"
> > deprecation period has been very long. Applications should have
> > switched to pg_authid and pg_roles long ago already.
> 
> We will definitely break a lot of client code by removing these -- I
> know that, deprecated or not, a lot of infrequently-updated
> driver/orm/GUI code still refers to pg_shadow/pg_user.

Any that we're aware of, I'd suggest we reach out to the maintainers of
and let them know.

> I think Postgres 10 is the right time to break that code (I mean, we
> have to do it someday, and we're already telling people about breakage
> in 10), but be aware that there will be shouting and carrying on.

Agreed.

> -1 on a warning.  Very little code today which references the deprecated
> code is interactive, so who's going to see the warning?

Indeed, I was thinking of this also.  The warnings would just end up in
the server logs, amongst tons of often not-terribly-interesting
information that far too many users ignore already, and those that don't
probably actually read the release notes.

Thanks!

Stephen




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Josh Berkus
On 02/09/2017 05:19 PM, Michael Paquier wrote:
> On Fri, Feb 10, 2017 at 10:16 AM, Stephen Frost  wrote:
>>> As someone mentioned, forcing a user to install an extension makes
>>> the deprecation visible. Another option would be to have the backend
>>> spit out a WARNING the first time you access anything that's
>>> deprecated. Both of those are pertinent reminders to people that
>>> they need to change their tools.
>>
>> Ugh.  Please, no.  Hacking up the backend to recognize that a given
>> query is referring to a deprecated view and then throwing a warning on
>> it is just plain ugly.
>>
>> Let's go one step further, and throw an ERROR if someone tries to query
>> these views instead.
> 
> FWIW, I am of the opinion to just nuke them as the "sort of"
> deprecation period has been very long. Applications should have
> switched to pg_authid and pg_roles long ago already.
> 

We will definitely break a lot of client code by removing these -- I
know that, deprecated or not, a lot of infrequently-updated
driver/orm/GUI code still refers to pg_shadow/pg_user.

I think Postgres 10 is the right time to break that code (I mean, we
have to do it someday, and we're already telling people about breakage
in 10), but be aware that there will be shouting and carrying on.

-1 on a warning.  Very little code today which references the deprecated
code is interactive, so who's going to see the warning?

-- 
Josh Berkus
Containers & Databases Oh My!




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Andres Freund
On 2017-02-09 20:02:54 -0500, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2017-02-09 19:19:21 -0500, Stephen Frost wrote:
> > > I'd love to nuke pg_shadow and all the other
> > > not-really-maintained backwards-compat things from when roles were
> > > added too.
> > 
> > Not sure if it's worth the work to rip out and such, but I'm mildly
> > supportive of this one too.  Depends a bit on what all the other things
> > are ;)
> 
> Reviewing 7762619e95272974f90a38d8d85aafbe0e94add5 where roles were
> added, I find:
> 
> pg_user   - use pg_roles instead, which actually includes all of the role
> attributes, unlike pg_user

Hm, I presume this is the most used one.


> pg_group  - use pg_auth_members instead, which includes the info about
> the admin option and the grantor

then this.

> pg_shadow - use pg_authid instead, which, again, actually includes all
> of the role attributes, unlike pg_shadow.

That's probably fine.


I'm fine with dropping now, alternatively we could, and that seems like
it'd institute a good practice, name them to be removed in 10+1 in the
10 release notes. "Upcoming removal of deprecated features" or such. And
schedule stuff for that regularly.  Like e.g. dropping psql support for
< 9.0 (randomly chosen version), pg_dump support for very old versions,
etc, ...
While not everyone will be saved by that (by virtue of not reading /
reacting) it helps those that actually read the notes.  Obviously
there'd still be some incompatibilities that do not go through that
mechanism.


> I don't think we should remove things like CREATE USER, that's a
> perfectly reasonable and maintained interface, unlike the above views,
> which missed out on things like the 'createrole' role attribute.

Yea, that'd be a bad plan.


Greetings,

Andres Freund




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Michael Paquier
On Fri, Feb 10, 2017 at 10:16 AM, Stephen Frost  wrote:
>> As someone mentioned, forcing a user to install an extension makes
>> the deprecation visible. Another option would be to have the backend
>> spit out a WARNING the first time you access anything that's
>> deprecated. Both of those are pertinent reminders to people that
>> they need to change their tools.
>
> Ugh.  Please, no.  Hacking up the backend to recognize that a given
> query is referring to a deprecated view and then throwing a warning on
> it is just plain ugly.
>
> Let's go one step further, and throw an ERROR if someone tries to query
> these views instead.

FWIW, I am of the opinion to just nuke them as the "sort of"
deprecation period has been very long. Applications should have
switched to pg_authid and pg_roles long ago already.
-- 
Michael




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Stephen Frost
Jim,

* Jim Nasby (jim.na...@bluetreble.com) wrote:
> On 2/9/17 6:37 PM, Andres Freund wrote:
> >>I'd love to nuke pg_shadow and all the other
> >>not-really-maintained backwards-compat things from when roles were
> >>added too.
> >Not sure if it's worth the work to rip out and such, but I'm mildly
> >supportive of this one too.  Depends a bit on what all the other things
> >are ;)
> 
> The problem with pg_shadow is unless you specifically looked at it
> in the docs after 8.1, you had no idea it was deprecated. I don't
> really think of it as deprecated.

It's not even maintained properly; I hardly see how it could be
anything but deprecated, and the docs certainly are the right place, if
anywhere, to say that something is deprecated.

> As someone mentioned, forcing a user to install an extension makes
> the deprecation visible. Another option would be to have the backend
> spit out a WARNING the first time you access anything that's
> deprecated. Both of those are pertinent reminders to people that
> they need to change their tools.

Ugh.  Please, no.  Hacking up the backend to recognize that a given
query is referring to a deprecated view and then throwing a warning on
it is just plain ugly.

Let's go one step further, and throw an ERROR if someone tries to query
these views instead.

Thanks!

Stephen




Re: [HACKERS] WAL consistency check facility

2017-02-09 Thread Michael Paquier
On Thu, Feb 9, 2017 at 5:56 AM, Robert Haas  wrote:
> On Wed, Feb 8, 2017 at 1:25 AM, Kuntal Ghosh  
> wrote:
>> Thank you Robert for the review. Please find the updated patch in the
>> attachment.
>
> I have committed this patch after fairly extensive revisions:

Cool. I had finally a look at what has been committed in a507b869.
Running regression tests with all RMGRs enabled, a single installcheck
generates 7GB of WAL. Woah.

> * Rewrote the documentation to give some idea what the underlying
> mechanism of operation of the feature is, so that users who choose to
> enable this will hopefully have some understanding of what they've
> turned on.

Thanks, those look good to me.

> * Renamed "char *page" arguments to "char *pagedata" and "Page page",
> because tempPage doesn't seem to be to be any better a name than
> page_norm.

> * Removed assertion in checkXLogConsistency that consistency checking
> is enabled for the RM; that's trivially false if
> wal_consistency_checking is not the same on the master and the
> standby.  (Note that quite apart from the issue of whether this
> function should exist at all, adding it to a header file after the
> closing #endif guard is certainly not right.)

I recall doing those two things the same way as in the commit. Not
sure at which point they have been re-introduced.

> * Changed checkXLogConsistency to skip pages whose LSN is newer than
> that of the record.  Without this, if you shut down recovery and
> restart it, it complains of inconsistent pages and dies.  (I'm not
> sure this is the only scenario that needs to be covered; it would be
> smart to do more testing of restarting the standby.)

Good point.

> -- a WAL consistency failure causes your standby to die a hard death.
> (Maybe there should be a way to suppress consistency checking on the
> standby -- but I think not just by requiring wal_consistency_checking
> on both ends.  Or maybe we should just downgrade the FATAL to WARNING;
> blowing up the standby irrevocably seems like poor behavior.)

Having a FATAL is useful for buildfarm members, that would show up in
red. Having a switch to generate a warning would be useful for live
deployments, I agree. Now I think that we need as well a few things:
- A recovery test to run regression tests with a standby behind.
- Extend the TAP tests so it is possible to fill in postgresql.conf
with custom variables.
- Have the buildfarm client run recovery tests!
I am fine to write those patches.

> I also bumped XLOG_PAGE_MAGIC (which is normally done by the
> committer, not the patch author, so I wasn't expecting that to be in
> the patch as submitted).

Here are a couple of things I have noticed while looking at the code.

+ * Portions Copyright (c) 2016, PostgreSQL Global Development Group
s/2016/2017/ in bufmask.c and bufmask.h.

+   if (ItemIdIsNormal(iid))
+   {
+
+   HeapTupleHeader page_htup = (HeapTupleHeader) page_item;
Unnecessary newline here.

+* Read the contents from the backup copy, stored in WAL record and
+* store it in a temporary page. There is not need to allocate a new
+* page here, a local buffer is fine to hold its contents and a mask
+* can be directly applied on it.
s/not need/no need/.

In checkXLogConsistency(), FPWs that have the flag BKPIMAGE_APPLY set
will still be checked, resulting in a FPW being compared to itself. I
think that those had better be bypassed.

Please find attached a patch with those fixes.
-- 
Michael


consistency-checks-fix.patch
Description: Binary data



Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2017-02-09 Thread Peter Geoghegan
On Thu, Feb 9, 2017 at 2:47 PM, Peter Geoghegan  wrote:
>> which isn't an issue here, but reinforces my point about the (badly
>> documented) assumption that we don't release locks on user relations
>> early.
>
> You are right about the substantive issue, I assume, but I have a hard
> time agreeing with the idea that it's even badly documented when there
> are at least 10 counter-examples of blithely doing this. In any case,
> I find it very confusing when you talk about things as established
> practice/coding standards when they're very clearly not consistently
> adhered to at all. You should at least say "let's not make a bad
> situation any worse", or something, so that I don't need to spend 10
> minutes pulling my hair out.

BTW, aren't there cases where a PARALLEL SAFE function that needs to
acquire locks on some arbitrary relation not referenced in the query
can get locks on its own, which may only last as long as the parallel
worker's lifetime? This could happen *despite* the fact that the
author of the function may have imagined that callers could not
release relation level locks early (i.e., by not releasing them
directly, and so correctly following this "badly documented
assumption").

It seems like the existence of PARALLEL RESTRICTED is primarily
motivated by making stuff that definitely needs the locks to last
until xact end being sure that that happens -- the docs say so. This
seems wholly inconsistent with the idea that you're not supposed to
let that happen under any circumstances. I find all this highly
confusing. Have I missed some further subtlety that applies with
PARALLEL RESTRICTED?

-- 
Peter Geoghegan




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Stephen Frost
* Andres Freund (and...@anarazel.de) wrote:
> On 2017-02-09 19:19:21 -0500, Stephen Frost wrote:
> > I'd love to nuke pg_shadow and all the other
> > not-really-maintained backwards-compat things from when roles were
> > added too.
> 
> Not sure if it's worth the work to rip out and such, but I'm mildly
> supportive of this one too.  Depends a bit on what all the other things
> are ;)

Reviewing 7762619e95272974f90a38d8d85aafbe0e94add5 where roles were
added, I find:

pg_user   - use pg_roles instead, which actually includes all of the role
attributes, unlike pg_user

pg_shadow - use pg_authid instead, which, again, actually includes all
of the role attributes, unlike pg_shadow.

pg_group  - use pg_auth_members instead, which includes the info about
the admin option and the grantor

I don't think we should remove things like CREATE USER, that's a
perfectly reasonable and maintained interface, unlike the above views,
which missed out on things like the 'createrole' role attribute.

Thanks!

Stephen




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Jim Nasby

On 2/9/17 6:37 PM, Andres Freund wrote:

Anybody who still wants tsearch2 can go get it from an old version, or
somebody can maintain a fork on github.

Works for me.

+1


+1


I'd love to nuke pg_shadow and all the other
not-really-maintained backwards-compat things from when roles were
added too.

Not sure if it's worth the work to rip out and such, but I'm mildly
supportive of this one too.  Depends a bit on what all the other things
are ;)


The problem with pg_shadow is unless you specifically looked at it in 
the docs after 8.1, you had no idea it was deprecated. I don't really 
think of it as deprecated.


As someone mentioned, forcing a user to install an extension makes the 
deprecation visible. Another option would be to have the backend spit 
out a WARNING the first time you access anything that's deprecated. Both 
of those are pertinent reminders to people that they need to change 
their tools.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] Improve OR conditions on joined columns (common star schema problem)

2017-02-09 Thread Jim Nasby

On 2/8/17 5:54 PM, Tom Lane wrote:

Although ... actually, that may not be the bottleneck for what you're
after.  The issue here is not so much discovering a clever plan for
a union as realizing that the query could be cast as a union in the
first place.


Right; their current workaround is to manually write a UNION. That's 
*significantly* faster than the OR.



Maybe it'd be better to imagine this as something closer to planagg.c,
that is it knows how to apply a specific high-level optimization that
might or might not be a win, so it builds a path describing that and sees
if it looks cheaper than a path done the normal way.  The fact that
we could even build a path describing a union is something that wasn't
there before 9.6, but maybe there's enough infrastructure for that now.


It's encouraging that there's at least a template to follow there... If 
there is missing infrastructure, is it likely to be major? My uninformed 
guess would be that the pathification was the major hurdle.



There's another transform using arrays that's possible as well (see
attached example); I believe that would work regardless of uniqueness.


That doesn't look terribly promising for automated application.
And I think it's really dependent on the exact shape of the OR
clause, which is an unpleasant limitation.  Considering the amount


Not sure what you mean by shape; do you mean whether the OR conditions 
are rels vs Consts or Vars?



of work this will take to do at all, you'd want it to be pretty
general I think.  I'm imagining something that would look for an
OR in which each clause referenced only one rel, then if it can
identify a way to re-unique-ify the result, split into a UNION
with an arm for each rel used in the OR.  The nature of the
conditions in each OR arm don't seem to matter ... though probably
you'd have to insist on there not being volatile conditions anywhere
in the query, because they'd get evaluated too many times.


In my experience, the common use case here is a large fact table that 
joins to a number of dimension tables. If you can filter the large fact 
table down to a fairly small size, those joins don't matter much. But if 
a big chunk of your filter comes from the joins, you're in trouble.


UNION isn't really a great way to solve this problem because you still 
end up doing a full join just to run the filter, but really all that you 
need are the values from the dimension table(s) that you need for the 
filter. IOW, change


WHERE t1 IN ('a','b') OR t2 IN ('c','d')

into

WHERE f1 IN (1,2) OR f2 IN (3,4)

(assuming a,b,c,d maps to 1,2,3,4)

BTW, there's an important caveat here: users generally do NOT want 
duplicate rows from the fact table if the dimension table results aren't 
unique. I thought my array solution was equivalent to what the JOINs 
would do in that case but that's actually wrong. The array solution does 
provide the behavior users generally want here though. JOIN is the 
easiest tool to pick up for this, so it's what people gravitate to, but 
I suspect most users would be happier with a construct that worked like 
the array trick does, but was easier to accomplish.


I wonder if any other databases have come up with non-standard syntax to 
do this.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Andres Freund
On 2017-02-09 19:19:21 -0500, Stephen Frost wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
> > On Thu, Feb 9, 2017 at 4:24 PM, Tom Lane  wrote:
> > > Also, our experience with contrib/tsearch2 suggests that the extension
> > > shouldn't be part of contrib, because we have zero track record of getting
> > > rid of stuff in contrib, no matter how dead it is.
> > 
> > Let's nuke tsearch2 to remove this adverse precedent, and then add the
> > new thing.
> > 
> > Anybody who still wants tsearch2 can go get it from an old version, or
> > somebody can maintain a fork on github.
> 
> Works for me.

+1


> I'd love to nuke pg_shadow and all the other
> not-really-maintained backwards-compat things from when roles were
> added too.

Not sure if it's worth the work to rip out and such, but I'm mildly
supportive of this one too.  Depends a bit on what all the other things
are ;)

Andres




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Feb 9, 2017 at 4:24 PM, Tom Lane  wrote:
> > Also, our experience with contrib/tsearch2 suggests that the extension
> > shouldn't be part of contrib, because we have zero track record of getting
> > rid of stuff in contrib, no matter how dead it is.
> 
> Let's nuke tsearch2 to remove this adverse precedent, and then add the
> new thing.
> 
> Anybody who still wants tsearch2 can go get it from an old version, or
> somebody can maintain a fork on github.

Works for me.  I'd love to nuke pg_shadow and all the other
not-really-maintained backwards-compat things from when roles were
added too.

Thanks!

Stephen




Re: [HACKERS] libpq Alternate Row Processor

2017-02-09 Thread Jim Nasby

On 2/8/17 5:11 PM, Kyle Gearhart wrote:

Overall, wall clock improves 24%.  User time elapsed is a 430% improvement.  
About half the time is spent waiting on the IO with the callback.  With the 
regular pqRowProcessor only about 16% of the time is spent waiting on IO.


To wit...

            real    user    sys
single row  0.214   0.131   0.048
callback    0.161   0.030   0.051

Those are averaged over 11 runs.

Can you run a trace to see where all the time is going in the single row 
case? I don't see an obvious time-suck with a quick look through the 
code. It'd be interesting to see how things change if you eliminate the 
filler column from the SELECT.


Also, the backend should be buffering ~8kb of data before handing that 
to the socket. If that's more than the kernel can buffer I'd expect a 
serious performance hit.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-09 Thread Peter Geoghegan
On Thu, Feb 9, 2017 at 2:31 PM, Robert Haas  wrote:
> You might think about plugging that hole by moving the registry of
> on-destroy functions into the segment itself and making it a shared
> resource.  But ASLR breaks that, especially for loadable modules.  You
> could try to fix that problem, in turn, by storing arguments that can
> later be passed to load_external_function() instead of a function
> pointer per se.  But that sounds pretty fragile because some other
> backend might not try to load the module until after it's attached the
> DSM segment and it might then fail because loading the module runs
> _PG_init() which can throw errors.   Maybe you can think of a way to
> plug that hole too but you're way over your $8 budget by this
> point.

At the risk of stating the obvious, ISTM that the right way to do
this, at a high level, is to err on the side of unneeded extra
unlink() calls, not leaking files. And, to make the window for problem
("remaining hole that you haven't quite managed to plug") practically
indistinguishable from no hole at all, in a way that's kind of baked
into the API.

It's not like we currently throw an error when there is a problem with
deleting temp files that are no longer needed on resource manager
cleanup. We simply log the fact that it happened, and limp on.
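To make that "log and limp on" convention concrete, here is a minimal plain-C sketch of an idempotent, non-fatal cleanup helper. The function name and the fprintf-based logging are invented for illustration only; real backend code would go through elog():

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Remove a temp file, treating ENOENT as success so that redundant
 * unlink() calls are harmless (idempotent), and merely logging any
 * other failure instead of raising an error.
 */
static void
remove_temp_file(const char *path)
{
    if (unlink(path) < 0 && errno != ENOENT)
        fprintf(stderr, "could not remove file \"%s\": %s\n",
                path, strerror(errno));
}
```

Because ENOENT is swallowed, erring on the side of an extra unlink() call costs nothing.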

I attach my V8. This does not yet do anything with on_dsm_detach().
I've run out of time to work on it this week, and am starting a new
job next week at VMware, which I'll need time to settle into. So I'm
posting this now, since you can still very much see the direction I'm
going in, and can give me any feedback that you have. If anyone wants
to show me how it's done by building on this, and finishing what I have
off, be my guest. The new stuff probably isn't quite as polished as I
would prefer, but time grows short, so I won't withhold it.

Changes:

* Implements refcount thing, albeit in a way that leaves a small
window for double unlink() calls if there is an error during the small
window in which there is worker/leader co-ownership of a BufFile (just
add an "elog(ERROR)" just before leader-as-worker Tuplesort state is
ended within _bt_leafbuild() to see what I mean). This implies that
background workers can be reclaimed once the leader needs to start its
final on-the-fly merge, which is nice. As an example of how that's
nice, this change makes maintenance_work_mem a budget that we more
strictly adhere to.

* Fixes bitrot caused by recent logtape.c bugfix in master branch.

* No local segment is created during unification unless and until one
is required. (In practice, for current use of BufFile infrastructure,
no "local" segment is ever created, even if we force a randomAccess
case using one of the testing GUCs from 0002-* -- we'd have to use
another GUC to *also* force there to be no reclamation.)

* Better testing. As I just mentioned, we can now force logtape.c to
not reclaim blocks, so you make new local segments as part of a
unified BufFile, which have different considerations from a resource
management point of view. Despite being part of the same "unified"
BufFile from the leader's perspective, it behaves like a local
segment, so it definitely seems like a good idea to have test coverage
for this, at least during development. (I have a pretty rough test
suite that I'm using; development of this patch has been somewhat test
driven.)

* Better encapsulation of BufFile stuff. I am even closer to the ideal
of this whole sharing mechanism being a fairly generic BufFile thing
that logtape.c piggy-backs on without having special knowledge of the
mechanism. It's still true that the mechanism (sharing/unification) is
written principally with logtape.c in mind, but that's just because of
its performance characteristics. Nothing to do with the interface.

* Worked through items raised by Thomas in his 2017-01-30 mail to this thread.

 Secondly, I might not want to be constrained by a
 fixed-sized DSM segment to hold my SharedBufFile objects... there are
 cases where I need to share a number of batch files that is unknown
 at the start of execution time when the DSM segment is sized (I'll
 write about that shortly on the Parallel Shared Hash thread).  Maybe I
 can find a way to get rid of that requirement.  Or maybe it could
 support DSA memory too, but I don't think it's possible to use
 on_dsm_detach-based cleanup routines that refer to DSA memory because
 by the time any given DSM segment's detach hook runs, there's no
 telling which other DSM segments have been detached already, so the
 DSA area may already have partially vanished; some other kind of hook
 that runs earlier would be needed...
>>>
>>> Again, wrench.

I like the wrench analogy too, FWIW.

>> My problem here is that I don't know how many batches I'll finish up
>> creating.  In general that's OK because I can hold onto them as
>> private BufFiles owned by participants with 

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-09 Thread Thomas Munro
On Fri, Feb 10, 2017 at 11:31 AM, Robert Haas  wrote:
> On Thu, Feb 9, 2017 at 5:09 PM, Thomas Munro
>  wrote:
>> I agree that the pinned segment case doesn't matter right now, I just
>> wanted to point it out.  I like your $10 wrench analogy, but maybe it
>> could be argued that adding a dsm_on_destroy() callback mechanism is
>> not only better than adding another refcount to track that other
>> refcount, but also a steal at only $8.
>
> If it's that simple, it might be worth doing, but I bet it's not.  One
> problem is that there's a race condition: there will inevitably be a
> period of time after you've called dsm_attach() and before you've
> attached to the specific data structure that we're talking about here.
> So suppose the last guy who actually knows about this data structure
> dies horribly and doesn't clean up because the DSM isn't being
> destroyed; moments later, you die horribly before reaching the code
> where you attach to this data structure.  Oops.

Right, I mentioned this problem earlier ("and also register the
on_dsm_detach callback, before any chance that an error might be
thrown (is that difficult?); failure to do so could result in file
leaks").

Here's my thought process... please tell me where I'm going wrong:

I have been assuming that it's not enough to just deal with this when
the leader detaches on the theory that other participants will always
detach first: that probably isn't true in some error cases, and could
contribute to spurious racy errors where other workers complain about
disappearing files if the leader somehow shuts down and cleans up
while a worker is still running.  Therefore we need *some* kind of
refcounting, whether it's a new kind or a new mechanism based on the
existing kind.
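For what it's worth, stripped of all the DSM machinery the refcounting idea reduces to something like this toy C sketch. All identifiers here are invented; in the real thing the struct would live in the DSM segment, the count would be updated under a spinlock, and the detach side would run from an on_dsm_detach() callback:

```c
#include <stdbool.h>

/*
 * Toy model of the refcount (not PostgreSQL code): every backend that
 * attaches to the shared state bumps the count; whichever detaching
 * backend drops it to zero knows it is the last one out and is
 * therefore responsible for unlinking the shared temp files.
 */
typedef struct SharedFileSetState
{
    int  refcnt;          /* number of currently-attached backends */
    bool files_deleted;   /* has the last detacher cleaned up? */
} SharedFileSetState;

static void
fileset_attach(SharedFileSetState *state)
{
    state->refcnt++;
}

static void
fileset_detach(SharedFileSetState *state)
{
    if (--state->refcnt == 0)
        state->files_deleted = true;    /* last out: unlink files here */
}
```

The point of the sketch is only that cleanup responsibility falls on whoever happens to detach last, which is why every attacher must also "attach" to this object.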

I have also been assuming that we don't want to teach dsm.c directly
about this stuff; it shouldn't need to know about other modules, so we
don't want it talking to buffile.c directly and managing a special
table of files; instead we want a system of callbacks.  Therefore
client code needs to do something after attaching to the segment in
each backend.

It doesn't matter whether we use an on_dsm_detach() callback and
manage our own refcount to infer that destruction is imminent, or a
new on_dsm_destroy() callback which tells us so explicitly: both ways
we'll need to make sure that anyone who attaches to the segment also
"attaches" to this shared BufFile manager object inside it, because
any backend might turn out to be the one that is last to detach.

That brings us to the race you mentioned.  Isn't it sufficient to say
that you aren't allowed to do anything that might throw in between
attaching to the segment and attaching to the SharedBufFileManager
that it contains?

Up until two minutes ago I assumed that policy would leave only two
possibilities: you attach to the DSM segment and attach to the
SharedBufFileManager successfully or you attach to the DSM segment and
then die horribly (but not throw) and the postmaster restarts the
whole cluster and blows all temp files away with RemovePgTempFiles().
But I see now in the comment of that function that crash-induced
restarts don't call that because "someone might want to examine the
temp files for debugging purposes".  Given that policy for regular
private BufFiles, I don't see why that shouldn't apply equally to
shared files: after a crash restart, you may have some junk files that
won't be cleaned up until your next clean restart, whether they were
private or shared BufFiles.

> You might think about plugging that hole by moving the registry of
> on-destroy functions into the segment itself and making it a shared
> resource.  But ASLR breaks that, especially for loadable modules.  You
> could try to fix that problem, in turn, by storing arguments that can
> later be passed to load_external_function() instead of a function
> pointer per se.  But that sounds pretty fragile because some other
> backend might not try to load the module until after it's attached the
> DSM segment and it might then fail because loading the module runs
> _PG_init() which can throw errors.   Maybe you can think of a way to
> plug that hole too but you're way over your $8 budget by this
> point.

Agreed, those approaches seem like non-starters.

>> My problem here is that I don't know how many batches I'll finish up
>> creating.  [...]
>
> I thought the idea was that the structure we're talking about here
> owns all the files, up to 2 from a leader that wandered off plus up to
> 2 for each worker.  Last process standing removes them.  Or are you
> saying each worker only needs 2 files but the leader needs a
> potentially unbounded number?

Yes, potentially unbounded in rare case.  If we plan for N batches,
and then run out of work_mem because our estimates were just wrong or
the distribution of keys is sufficiently skewed, we'll run
HashIncreaseNumBatches, and that could happen more than once.  I 

Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2017-02-09 Thread Peter Geoghegan
On Thu, Feb 9, 2017 at 2:32 PM, Andres Freund  wrote:
> Except that the proposed names aren't remotely like that... ;).
>
> And I proposed documenting named parameters, and
> check_btree(performing_check_requiring_exclusive_locks => true) is just
> about as expressive.

I cannot think of an example, offhand, of where it isn't self-evident
what relation level locks will be acquired when a statement is
prepared. I guess you could say that VACUUM truncation is a bit like
that, but it's not the same IMV.

Maybe there is an easy compromise here. We can add an argument with a
default, "level". It's an "enum" of the style of pg_prewarm (i.e.,
it's actually a text argument). The default is 'medium' or something
like that. Maybe we don't add any other flags for now, or maybe we add
some generic ones that don't actually changed the behavior. Existing
names are retained.

Doesn't that address your concern about an inflexible API? I really
dislike the idea of having the lock strength be determined on the
basis of a function arguments value.

>> I don't agree with that.  The reason we keep relation locks until the
>> end of the transaction is to avoid surprising application code, not
>> because the system can't tolerate it.  But here it's more likely that
>> retaining the lock will surprise the user.

> There were other issues than this, but right off the top of my head I
> only remember:
> double
> IndexBuildHeapRangeScan(Relation heapRelation,
> ..
>         /*
>          * Since caller should hold ShareLock or better, normally
>          * the only way to see this is if it was inserted earlier
>          * in our own transaction.  However, it can happen in
>          * system catalogs, since we tend to release write lock
>          * before commit there.  Give a warning if neither case
>          * applies.
>          */
>         xwait = HeapTupleHeaderGetXmin(heapTuple->t_data);
>         if (!TransactionIdIsCurrentTransactionId(xwait))
>         {
>             if (!is_system_catalog)
>                 elog(WARNING, "concurrent insert in progress within table \"%s\"",
>                      RelationGetRelationName(heapRelation));
>
> which isn't an issue here, but reinforces my point about the (badly
> documented) assumption that we don't release locks on user relations
> early.

You are right about the substantive issue, I assume, but I have a hard
time agreeing with the idea that it's even badly documented when there
are at least 10 counter-examples of blithely doing this. In any case,
I find it very confusing when you talk about things as established
practice/coding standards when they're very clearly not consistently
adhered to at all. You should at least say "let's not make a bad
situation any worse", or something, so that I don't need to spend 10
minutes pulling my hair out.

-- 
Peter Geoghegan




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2017-02-09 Thread Andres Freund
On 2017-02-09 17:12:58 -0500, Robert Haas wrote:
> On Thu, Feb 9, 2017 at 4:57 PM, Andres Freund  wrote:
> > Meh.  I don't think that's a serious problem, nor is it without precedent
> > (say same toplevel DDL with differing lock levels depending on
> > options). Nor do the function names confer that any more efficiently
> > than parameters.
> 
> Really?  check_btree(blah, true) vs. check_btree(blah, false) is no
> more or less clear than check_btree_for_plausible_errors(blah) vs.
> check_btree_in_unreasonable_detail(blah)?

Except that the proposed names aren't remotely like that... ;).

And I proposed documenting named parameters, and
check_btree(performing_check_requiring_exclusive_locks => true) is just
about as expressive.


> > Again, some parts of the code doing something bad isn't a good argument
> > for doing it again. Releasing locks early is a bad pattern, because backend
> > code isn't generally safe against it, we have special code-paths for
> > catalog tables to support it for those.
> 
> I don't agree with that.  The reason we keep relation locks until the
> end of the transaction is to avoid surprising application code, not
> because the system can't tolerate it.  But here it's more likely that
> retaining the lock will surprise the user.

It's both.  A bunch of code paths rely on early release only happening on
catalogs. E.g., and that's just the first thing that comes to mind,
cache invalidations which really only are guaranteed to be delivered and
visible for other cache accesses, if the lock is only released after
the end of the transaction.  Object accesses like relation_open() accept
invalidations *after* acquiring the lock. Combined with lock releases
only happening *after* transaction commits that guarantees an up2date
view of the cache; but if others release the lock earlier and have cache
invalidations, that doesn't work anymore.  Now you can argue that it's
possibly harmless here because there's presumably no way this can ever
trigger cache invals - and you're probably right - but this isn't the
only place with such an assumption, and you'd definitely have to argue *why*
it's right.

There were other issues than this, but right off the top of my head I
only remember:
double
IndexBuildHeapRangeScan(Relation heapRelation,
..
        /*
         * Since caller should hold ShareLock or better, normally
         * the only way to see this is if it was inserted earlier
         * in our own transaction.  However, it can happen in
         * system catalogs, since we tend to release write lock
         * before commit there.  Give a warning if neither case
         * applies.
         */
        xwait = HeapTupleHeaderGetXmin(heapTuple->t_data);
        if (!TransactionIdIsCurrentTransactionId(xwait))
        {
            if (!is_system_catalog)
                elog(WARNING, "concurrent insert in progress within table \"%s\"",
                     RelationGetRelationName(heapRelation));

which isn't an issue here, but reinforces my point about the (badly
documented) assumption that we don't release locks on user relations
early.

Andres




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-09 Thread Robert Haas
On Thu, Feb 9, 2017 at 5:09 PM, Thomas Munro
 wrote:
> I agree that the pinned segment case doesn't matter right now, I just
> wanted to point it out.  I like your $10 wrench analogy, but maybe it
> could be argued that adding a dsm_on_destroy() callback mechanism is
> not only better than adding another refcount to track that other
> refcount, but also a steal at only $8.

If it's that simple, it might be worth doing, but I bet it's not.  One
problem is that there's a race condition: there will inevitably be a
period of time after you've called dsm_attach() and before you've
attached to the specific data structure that we're talking about here.
So suppose the last guy who actually knows about this data structure
dies horribly and doesn't clean up because the DSM isn't being
destroyed; moments later, you die horribly before reaching the code
where you attach to this data structure.  Oops.

You might think about plugging that hole by moving the registry of
on-destroy functions into the segment itself and making it a shared
resource.  But ASLR breaks that, especially for loadable modules.  You
could try to fix that problem, in turn, by storing arguments that can
later be passed to load_external_function() instead of a function
pointer per se.  But that sounds pretty fragile because some other
backend might not try to load the module until after it's attached the
DSM segment and it might then fail because loading the module runs
_PG_init() which can throw errors.   Maybe you can think of a way to
plug that hole too but you're way over your $8 budget by this
point.

>>> Secondly, I might not want to be constrained by a
>>> fixed-sized DSM segment to hold my SharedBufFile objects... there are
>>> cases where I need to share a number of batch files that is unknown
>>> at the start of execution time when the DSM segment is sized (I'll
>>> write about that shortly on the Parallel Shared Hash thread).  Maybe I
>>> can find a way to get rid of that requirement.  Or maybe it could
>>> support DSA memory too, but I don't think it's possible to use
>>> on_dsm_detach-based cleanup routines that refer to DSA memory because
>>> by the time any given DSM segment's detach hook runs, there's no
>>> telling which other DSM segments have been detached already, so the
>>> DSA area may already have partially vanished; some other kind of hook
>>> that runs earlier would be needed...
>>
>> Again, wrench.
>
> My problem here is that I don't know how many batches I'll finish up
> creating.  In general that's OK because I can hold onto them as
> private BufFiles owned by participants with the existing cleanup
> mechanism, and then share them just before they need to be shared (ie
> when we switch to processing the next batch so they need to be
> readable by all).  Now I only ever share one inner and one outer batch
> file per participant at a time, and then I explicitly delete them at a
> time that I know to be safe and before I need to share a new file that
> would involve recycling the slot, and I'm relying on DSM segment scope
> cleanup only to handle error paths.  That means that in general I
> only need space for 2 * P shared BufFiles at a time.  But there is a
> problem case: when the leader needs to exit early, it needs to be able
> to transfer ownership of any files it has created, which could be more
> than we planned for, and then not participate any further in the hash
> join, so it can't participate in the on-demand sharing scheme.

I thought the idea was that the structure we're talking about here
owns all the files, up to 2 from a leader that wandered off plus up to
2 for each worker.  Last process standing removes them.  Or are you
saying each worker only needs 2 files but the leader needs a
potentially unbounded number?

> Perhaps we can find a way to describe a variable number of BufFiles
> (ie batches) in a fixed space by making sure the filenames are
> constructed in a way that lets us just have to say how many there are.

That could be done.

> Then the next problem is that for each BufFile we have to know how
> many 1GB segments there are to unlink (files named foo, foo.1, foo.2,
> ...), which Peter's code currently captures by publishing the file
> size in the descriptor... but if a fixed size object must describe N
> BufFiles, where can I put the size of each one?  Maybe I could put it
> in a header of the file itself (yuck!), or maybe I could decide that I
> don't care what the size is, I'll simply unlink "foo", then "foo.1",
> then "foo.2", ... until I get ENOENT.

There's nothing wrong with that algorithm as far as I'm concerned.
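For concreteness, that unlink-until-ENOENT loop could look roughly like this. The function name, the fixed-size path buffer, and the return convention are invented for illustration, not taken from the patch:

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Given only a BufFile's base name, remove its 1GB segment files
 * foo, foo.1, foo.2, ... without knowing in advance how many exist.
 * Stop at the first ENOENT (end of the segment chain).  Returns the
 * number of segments removed.
 */
static int
remove_buffile_segments(const char *base)
{
    char path[1024];
    int  segno = 0;

    for (;;)
    {
        if (segno == 0)
            snprintf(path, sizeof(path), "%s", base);
        else
            snprintf(path, sizeof(path), "%s.%d", base, segno);

        if (unlink(path) < 0)
        {
            /* ENOENT: ran off the end of the chain.  Real code would
             * log any other errno and limp on rather than error out. */
            break;
        }
        segno++;
    }
    return segno;
}
```

This is why a fixed-size descriptor doesn't need to record per-file sizes: the base name alone is enough to find every segment.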

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2017-02-09 Thread Robert Haas
On Thu, Feb 9, 2017 at 4:57 PM, Andres Freund  wrote:
> On 2017-02-09 13:29:42 -0800, Peter Geoghegan wrote:
>> > I'd really like to have something like relation_check(), index_check()
>> > which calls the correct functions for the relevant index types (and
>> > possibly warns about !nbtree indexes not being checked?).
>>
>> I feel that that's something for an in-core facility that exposes this
>> through DDL. One good reason to not do that just yet is that I'm not
>> completely comfortable with what a uniform interface looks like here.
>> And frankly, I really just want to get this out of the way, to prove
>> the general concept of verification tooling. amcheck v1 is a good
>> start, but much work remains. I think that no one will take the idea
>> completely seriously until it is usable as a core extension. Let's get
>> the ball rolling.
>
> I'm not convinced at all by that argument.  If we want something in
> core, having this in contrib doesn't help much.  Nor do I see much
> point in explicit syntax.

I don't see any need for this to be in core, nor do I think we need
explicit syntax.  But given that, I also don't think that we need the
generalized infrastructure you proposed in the quoted section above.

>> I strongly disagree. I would agree if the lock strength were the same
>> in both cases, but it isn't. Varying the strength of heavyweight lock
>> taken based on an argument to a function is a massive foot-gun IMV.
>
> Meh.  I don't think that's a serious problem, nor is it without precedent
> (say same toplevel DDL with differing lock levels depending on
> options). Nor do the function names confer that any more efficiently
> than parameters.

Really?  check_btree(blah, true) vs. check_btree(blah, false) is no
more or less clear than check_btree_for_plausible_errors(blah) vs.
check_btree_in_unreasonable_detail(blah)?

>> I do think that that's the logical way to split out functionality. I
>> don't think that there is likely to be a problem with adding stuff in
>> the future. It'll either be something that we can do in passing
>> anyway, in which case we can just add it to the existing checks
>> harmlessly. Or, it will be something like a tool that verifies the
>> heap is consistent with a given index, which is vastly more expensive
>> in terms of CPU and I/O costs, and yet is probably no worse than
>> bt_index_check() in terms of lock strength (AccessShareLock).
>
> But you quite possibly want to do all three in one pass - and that'll be
> ugly with your API.  With your API I'll not be able to do the parent
> check and the heap check at once.  Lest we add a fourth function that does
> all of it.

This however seems like a good point.

> Again, some parts of the code doing something bad isn't a good argument
> for doing it again. Releasing locks early is a bad pattern, because backend
> code isn't generally safe against it, we have special code-paths for
> catalog tables to support it for those.

I don't agree with that.  The reason we keep relation locks until the
end of the transaction is to avoid surprising application code, not
because the system can't tolerate it.  But here it's more likely that
retaining the lock will surprise the user.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-09 Thread Thomas Munro
On Fri, Feb 10, 2017 at 9:51 AM, Robert Haas  wrote:
> On Wed, Feb 8, 2017 at 5:36 AM, Thomas Munro
>  wrote:
>> Thinking a bit harder about this, I suppose there could be a kind of
>> object called a SharedBufFileManager [... description of that ...].
>
> I think this is approximately reasonable, but I think it could be made
> simpler by having fewer separate objects.  Let's assume the leader can
> put an upper bound on the number of shared BufFiles at the time it's
> sizing the DSM segment (i.e. before InitializeParallelDSM).  Then it
> can allocate a big ol' array with a header indicating the array size
> and each element containing enough space to identify the relevant
> details of 1 shared BufFile.  Now you don't need to do any allocations
> later on, and you don't need a linked list.  You just loop over the
> array and do what needs doing.

Makes sense.
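A rough sketch of that fixed-size layout, with all identifiers invented here for illustration (the real slot contents would carry whatever is needed to identify a shared BufFile, not just a name):

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FILE_NAME 64

/* One array element: enough information to identify one shared BufFile. */
typedef struct SharedBufFileSlot
{
    bool in_use;
    char name[MAX_FILE_NAME];   /* base name; segment suffixes are implicit */
} SharedBufFileSlot;

/*
 * The "big ol' array with a header": sized once, before the DSM segment
 * is created, so no later allocation and no linked list is needed.
 * Cleanup just loops over nslots entries and unlinks whatever is in_use.
 */
typedef struct SharedBufFileRegistry
{
    int               nslots;   /* fixed at segment creation time */
    SharedBufFileSlot slots[];  /* flexible array member */
} SharedBufFileRegistry;

/* Bytes to reserve in the DSM segment for a registry with n slots. */
static size_t
shared_buffile_registry_size(int n)
{
    return offsetof(SharedBufFileRegistry, slots)
        + (size_t) n * sizeof(SharedBufFileSlot);
}
```

The size function is the piece the leader would call while sizing the segment, i.e. before InitializeParallelDSM.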

>> There are a couple of problems with the above though.  Firstly, doing
>> reference counting in DSM segment on-detach hooks is really a way to
>> figure out when the DSM segment is about to be destroyed by keeping a
>> separate refcount in sync with the DSM segment's refcount, but it
>> doesn't account for pinned DSM segments.  It's not your use-case or
>> mine currently, but someone might want a DSM segment to live even when
>> it's not attached anywhere, to be reattached later.  If we're trying
>> to use DSM segment lifetime as a scope, we'd be ignoring this detail.
>> Perhaps instead of adding our own refcount we need a new kind of hook
>> on_dsm_destroy.
>
> I think it's good enough to plan for current needs now.  It's not
> impossible to change this stuff later, but we need something that
> works robustly right now without being too invasive.  Inventing whole
> new system concepts because of stuff we might someday want to do isn't
> a good idea because we may easily guess wrong about what direction
> we'll want to go in the future.  This is more like building a wrench
> than a 747: a 747 needs to be extensible and reconfigurable and
> upgradable because it costs $350 million.   A wrench costs $10 at
> Walmart and if it turns out we bought the wrong one, we can just throw
> it out and get a different one later.

I agree that the pinned segment case doesn't matter right now, I just
wanted to point it out.  I like your $10 wrench analogy, but maybe it
could be argued that adding a dsm_on_destroy() callback mechanism is
not only better than adding another refcount to track that other
refcount, but also a steal at only $8.

>> Secondly, I might not want to be constrained by a
>> fixed-sized DSM segment to hold my SharedBufFile objects... there are
>> cases where I need to share a number of batch files that is unknown
>> at the start of execution time when the DSM segment is sized (I'll
>> write about that shortly on the Parallel Shared Hash thread).  Maybe I
>> can find a way to get rid of that requirement.  Or maybe it could
>> support DSA memory too, but I don't think it's possible to use
>> on_dsm_detach-based cleanup routines that refer to DSA memory because
>> by the time any given DSM segment's detach hook runs, there's no
>> telling which other DSM segments have been detached already, so the
>> DSA area may already have partially vanished; some other kind of hook
>> that runs earlier would be needed...
>
> Again, wrench.

My problem here is that I don't know how many batches I'll finish up
creating.  In general that's OK because I can hold onto them as
private BufFiles owned by participants with the existing cleanup
mechanism, and then share them just before they need to be shared (ie
when we switch to processing the next batch so they need to be
readable by all).  Now I only ever share one inner and one outer batch
file per participant at a time, and then I explicitly delete them at a
time that I know to be safe and before I need to share a new file that
would involve recycling the slot, and I'm relying on DSM segment scope
cleanup only to handle error paths.  That means that in general I
only need space for 2 * P shared BufFiles at a time.  But there is a
problem case: when the leader needs to exit early, it needs to be able
to transfer ownership of any files it has created, which could be more
than we planned for, and then not participate any further in the hash
join, so it can't participate in the on-demand sharing scheme.

Perhaps we can find a way to describe a variable number of BufFiles
(ie batches) in a fixed space by making sure the filenames are
constructed in a way that lets us just have to say how many there are.
Then the next problem is that for each BufFile we have to know how
many 1GB segments there are to unlink (files named foo, foo.1, foo.2,
...), which Peter's code currently captures by publishing the file
size in the descriptor... but if a fixed size object must describe N
BufFiles, where can I put the size of each one?  Maybe I could put it
in a header of the 

Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2017-02-09 Thread Andres Freund
On 2017-02-09 13:29:42 -0800, Peter Geoghegan wrote:
> > I'd really like to have something like relation_check(), index_check()
> > which calls the correct functions for the relevant index types (and
> > possibly warns about !nbtree indexes not being checked?).
> 
> I feel that that's something for an in-core facility that exposes this
> through DDL. One good reason to not do that just yet is that I'm not
> completely comfortable with what a uniform interface looks like here.
> And frankly, I really just want to get this out of the way, to prove
> the general concept of verification tooling. amcheck v1 is a good
> start, but much work remains. I think that no one will take the idea
> completely seriously until it is usable as a core extension. Let's get
> the ball rolling.

I'm not convinced at all by that argument.  If we want something in
core, having this in contrib doesn't help much.  Nor do I see much
point in explicit syntax.


> >> +#define CHECK_SUPERUSER() { \
> >> + if (!superuser()) \
> >> + ereport(ERROR, \
> >> + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), \
> >> +  (errmsg("must be superuser to use verification functions"; }
> >
> > Please make this either a function or just copy - the macro doesn't buy
> > us anything except weirder looking syntax.
> 
> It's actually from pageinspect, but okay.

We have a lot of dodgy stuff in places ;)


> >> +PG_FUNCTION_INFO_V1(bt_index_check);
> >> +PG_FUNCTION_INFO_V1(bt_index_parent_check);
> >
> > I think it'd be good to avoid adding a lot of functions for each
> > additional type of check we can think of.  I think adding a named 'bool
> > check_parents = false' argument or such would be better.  Then we can
> > simply add more and more checks subsequently, and we can have
> > 'bool all_checks = false' so people can trivially check everything,
> > instead of having to add new checks in each new function.
> 
> I strongly disagree. I would agree if the lock strength were the same
> in both cases, but it isn't. Varying the strength of heavyweight lock
> taken based on an argument to a function is a massive foot-gun IMV.

Meh.  I don't think that's a serious problem, nor is it without precedent
(say same toplevel DDL with differing lock levels depending on
options). Nor do the function names confer that any more efficiently
than parameters.


> I do think that that's the logical way to split out functionality. I
> don't think that there is likely to be a problem with adding stuff in
> the future. It'll either be something that we can do in passing
> anyway, in which case we can just add it to the existing checks
> harmlessly. Or, it will be something like a tool that verifies the
> heap is consistent with a given index, which is vastly more expensive
> in terms of CPU and I/O costs, and yet is probably no worse than
> bt_index_check() in terms of lock strength (AccessShareLock).

But you quite possibly want to do all three in one pass - and that'll be
ugly with your API.  With your API I'll not be able to do the parent
check and the heap check at once.  Lest we add a fourth function that does
all of it.


> My thinking here is that users really ought to use bt_index_check() in
> almost all cases, not bt_index_parent_check(), which is more of a tool
> for hackers that are developing new B-Tree features. The distinction
> is blurry, in fact, but the lock strength requirements of
> bt_index_parent_check() make it vastly less useful.

A quite reasonable pattern is to do this verification on standbys that
you can either bring up separately and/or just temporarily pause.  Or
just on backups before declaring them valid.


> >> +/*
> >> + * bt_index_check(index regclass)
> >> + *
> >> + * Verify integrity of B-Tree index.
> >> + *
> >> + * Only acquires AccessShareLock on index relation.  Does not consider
> >> + * invariants that exist between parent/child pages.
> >> + */
> 
> > I'm *VERY* strongly against releasing locks on user relations early.
> 
> Why? pageinspect does this.

Again, some parts of the code doing something bad isn't a good argument
for doing again. Releasing locks early is a bad pattern, because backend
code isn't generally safe against it, we have special code-paths for
catalog tables to support it for those.


> >> + /*
>> +  * We must lock table before index to avoid deadlocks.  However, if the
>> +  * passed indrelid isn't an index then IndexGetRelation() will fail.
>> +  * Rather than emitting a not-very-helpful error message, postpone
>> +  * complaining, expecting that the is-it-an-index test below will fail.
> >> +  */
> >> + heapid = IndexGetRelation(indrelid, true);
> >> + if (OidIsValid(heapid))
> >> + heaprel = heap_open(heapid, ShareLock);
> >> + else
> >> + heaprel = NULL;
> >> +
> >> + /*
> >> +  * Open the target index relations 

Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-09 Thread Erik Rijkers

On 2017-02-09 22:15, Tom Lane wrote:

Corey Huinker  writes:


The feature now (at patch v10) lets you break off with Ctrl-C
anywhere.  I like it much more now.


The main thing I still dislike somewhat about the patch is the verbose 
output. To be honest I would prefer to just remove /all/ the interactive 
output.


I would vote to just make it remain silent if there is no error (and
if there is an error, issue a message and exit).


thanks,

Erik Rijkers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Robert Haas
On Thu, Feb 9, 2017 at 4:24 PM, Tom Lane  wrote:
> Also, our experience with contrib/tsearch2 suggests that the extension
> shouldn't be part of contrib, because we have zero track record of getting
> rid of stuff in contrib, no matter how dead it is.

Let's nuke tsearch2 to remove this adverse precedent, and then add the
new thing.

Anybody who still wants tsearch2 can go get it from an old version, or
somebody can maintain a fork on github.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-09 Thread Robert Haas
On Thu, Feb 9, 2017 at 4:15 PM, Tom Lane  wrote:
> Basically, I think you need to start removing complexity (in the sense of
> special cases), not adding more.  I think Robert was saying the same
> thing, though possibly I shouldn't put words in his mouth.

Yeah, I was definitely going in that direction, whatever the details.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-09 Thread Andres Freund
Hi,

On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote:
>  src/backend/utils/mmgr/Makefile   |   2 +-
>  src/backend/utils/mmgr/aset.c | 115 +
>  src/backend/utils/mmgr/memdebug.c | 131 ++
>  src/include/utils/memdebug.h  |  22 +++
>  4 files changed, 156 insertions(+), 114 deletions(-)
>  create mode 100644 src/backend/utils/mmgr/memdebug.c

I'm a bit loath to move these to a .c file - won't this likely make
these debugging tools even slower?  Seems better to put some of them
in a header as static inlines (not randomize, but the rest).


> From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001
> From: Tomas Vondra 
> Date: Wed, 30 Nov 2016 15:36:23 +0100
> Subject: [PATCH 2/3] slab allocator
> 
> ---
>  src/backend/replication/logical/reorderbuffer.c |  82 +--
>  src/backend/utils/mmgr/Makefile |   2 +-
>  src/backend/utils/mmgr/slab.c   | 803 
>  src/include/nodes/memnodes.h|   2 +-
>  src/include/nodes/nodes.h   |   1 +
>  src/include/replication/reorderbuffer.h |  15 +-
>  src/include/utils/memutils.h|   9 +

I'd like to see the reorderbuffer changes split into a separate commit
from the slab allocator introduction.




> +/*-
> + *
> + * slab.c
> + * SLAB allocator definitions.
> + *
> + * SLAB is a custom MemoryContext implementation designed for cases of
> + * equally-sized objects.
> + *
> + *
> + * Portions Copyright (c) 2016, PostgreSQL Global Development Group

Bump, before a committer forgets it.


> + * IDENTIFICATION
> + * src/backend/utils/mmgr/slab.c
> + *
> + *
> + *   The constant allocation size allows significant simplification and various
> + *   optimizations. Firstly, we can get rid of the doubling and carve the blocks
> + *   into chunks of exactly the right size (plus alignment), not wasting memory.

Getting rid of it relative to what? I'd try to phrase it so these
comments stand on their own.


> + *   Each block includes a simple bitmap tracking which chunks are used/free.
> + *   This makes it trivial to check if all chunks on the block are free, and
> + *   eventually free the whole block (which is almost impossible with a global
> + *   freelist of chunks, storing chunks from all blocks).

Why is checking a potentially somewhat long-ish bitmap better than a
simple counter, or a "linked list" of "next free chunk-number" or such
(where free chunks simply contain the id of the subsequent chunk)?
Using a list instead of a bitmap would also make it possible to get
'lifo' behaviour, which is good for cache efficiency.  A simple
chunk-number based singly linked list would only imply a minimum
allocation size of 4 - that seems perfectly reasonable?


> + *   At the context level, we use 'freelist' to track blocks ordered by number
> + *   of free chunks, starting with blocks having a single allocated chunk, and
> + *   with completely full blocks on the tail.

Why that way round? Filling chunks up as much as possible is good for
cache and TLB efficiency, and allows for earlier de-allocation of
partially used blocks?  Oh, I see you do that in the next comment,
but it still leaves me wondering.

Also, is this actually a list? It's more an array of lists, right?
I.e. it should be named freelists?

Thirdly, isn't that approach going to result in a quite long freelists
array, when you have small items and a decent blocksize? That seems like
a fairly reasonable thing to do?


> + *   This also allows various optimizations - for example when searching for
> + *   a free chunk, the allocator reuses space from the most full blocks first,
> + *   in the hope that some of the less full blocks will get completely empty
> + *   (and returned back to the OS).

Might be worth mentioning tlb/cache efficiency too.


> + *   For each block, we maintain a pointer to the first free chunk - this is quite
> + *   cheap and allows us to skip all the preceding used chunks, eliminating
> + *   a significant number of lookups in many common usage patterns. In the worst
> + *   case this performs as if the pointer was not maintained.

Hm, so that'd be eliminated if we maintained a linked list of chunks (by
chunk number) and a free_chunk_cnt or such.


> +
> +#include "postgres.h"
> +
> +#include "utils/memdebug.h"
> +#include "utils/memutils.h"
> +#include "lib/ilist.h"

Move ilist up, above memdebug, so the list is alphabetically ordered.


> +/*
> + * SlabPointer
> + *   Aligned pointer which may be a member of an allocation set.
> + */
> +typedef void *SlabPointer;
> +typedef SlabContext *Slab;

I personally won't commit this with pointer-hiding typedefs.  If
somebody else does, I can live with it, but for me it's bad enough taste
that I won't.



Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2017-02-09 Thread Peter Geoghegan
On Thu, Feb 9, 2017 at 12:05 AM, Andres Freund  wrote:
> reindex_index isn't necessarily comparable, because
> RangeVarCallbackForReindexIndex() is, on first blush at least, careful
> enough about the locking, and reindex_relation only gets the index list
> after building the index:
> /*
>  * Find and lock index, and check permissions on table; use callback to
>  * obtain lock on table first, to avoid deadlock hazard.  The lock level
>  * used here must match the index lock obtained in reindex_index().
>  */
>
> I think you ought to do something similar.

Okay. Looking into it, but see my response to your later remarks below.

> I'd really like to have something like relation_check(), index_check()
> which calls the correct functions for the relevan index types (and
> possibly warns about !nbtree indexes not being checked?).

I feel that that's something for an in-core facility that exposes this
through DDL. One good reason to not do that just yet is that I'm not
completely comfortable with what a uniform interface looks like here.
And frankly, I really just want to get this out of the way, to prove
the general concept of verification tooling. amcheck v1 is a good
start, but much work remains. I think that no one will take the idea
completely seriously until it is usable as a core extension. Let's get
the ball rolling.

>> +#define CHECK_SUPERUSER() { \
>> + if (!superuser()) \
>> + ereport(ERROR, \
>> + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), \
>> +  (errmsg("must be superuser to use verification functions")))); }
>
> Please make this either a function or just copy - the macro doesn't buy
> us anything except weirder looking syntax.

It's actually from pageinspect, but okay.
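
The function form is indeed a one-liner; a stand-alone sketch (with stand-in stubs for superuser() and ereport(), which are backend internals) might read:

```c
#include <assert.h>

/* stand-ins for backend facilities, for illustration only */
static int	superuser_flag = 0;
static int	last_error = 0;

static int
superuser(void)
{
	return superuser_flag;
}

static void
report_insufficient_privilege(void)
{
	/* real code would ereport(ERROR, ...) here */
	last_error = 1;
}

/* function replacing the CHECK_SUPERUSER() macro */
static void
check_superuser(void)
{
	if (!superuser())
		report_insufficient_privilege();
}
```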

>> +#define CORRUPTION   ERROR
>> +#define CONCERN  DEBUG1
>> +#define POSITION DEBUG2
>> +#endif
>
> Please remove this block and use the normal elevels.

Okay.

>> +/*
>> + * State associated with verifying a B-Tree index
>> + */
>> +typedef struct BtreeCheckState
>> +{
>> + /*
>> +  * Unchanging state, established at start of verification:
>> +  *
>> +  * rel: B-Tree Index Relation
>> +  * exclusivelock:   ExclusiveLock held on rel; else AccessShareLock
>> +  * checkstrategy:   Buffer access strategy
>> +  * targetcontext:   Target page memory context
>> +  */
>> + Relationrel;
>> + boolexclusivelock;
>> + BufferAccessStrategycheckstrategy;
>> + MemoryContext   targetcontext;
>
> Can you move the field comments to the individual fields? Separating
> them makes it more likely to get out of date (obviously keep the
> header).  I'd also rename 'exclusivelock' to 'exclusivelylocked' or
> such.

Okay.

>> + /*
>> +  * Mutable state, for verification of particular page:
>> +  *
>> +  * target:  Main target page
>> +  * targetblock: Main target page's block number
>> +  * targetlsn:   Main target page's LSN
>
> Same here.

Okay.

>> +PG_FUNCTION_INFO_V1(bt_index_check);
>> +PG_FUNCTION_INFO_V1(bt_index_parent_check);
>
> I think it'd be good to avoid adding a lot of functions for each
> additional type of check we can think of.  I think adding a named 'bool
> check_parents = false' argument or such would be better.  Then we can
> simply add more and more checks subsequently, and we can have
> 'bool all_checks = false' so people can trivially check everything,
> instead of having to add new checks in each new function.

I strongly disagree. I would agree if the lock strength were the same
in both cases, but it isn't. Varying the strength of heavyweight lock
taken based on an argument to a function is a massive foot-gun IMV. I
do think that that's the logical way to split out functionality. I
don't think that there is likely to be a problem with adding stuff in
the future. It'll either be something that we can do in passing
anyway, in which case we can just add it to the existing checks
harmlessly. Or, it will be something like a tool that verifies the
heap is consistent with a given index, which is vastly more expensive
in terms of CPU and I/O costs, and yet is probably no worse than
bt_index_check() in terms of lock strength (AccessShareLock).

My thinking here is that users really ought to use bt_index_check() in
almost all cases, not bt_index_parent_check(), which is more of a tool
for hackers that are developing new B-Tree features. The distinction
is blurry, in fact, but the lock strength requirements of
bt_index_parent_check() make it vastly less useful. Especially
considering that it seems fairly unlikely that it will ever catch a
problem that bt_index_check() would not have. bt_index_parent_check()

Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Tom Lane
Stephen Frost  writes:
> * Josh Berkus (j...@berkus.org) wrote:
>> If we clearly mark the old function names as deprecated aliases, client
>> tools will gradually move to the new names.

> No, they won't.  They haven't.  Look at pg_shadow- it was clearly marked
> as deprecated in *8.1*.

Back in 8.1 we didn't have extensions.  Now that we do, I think the
terms of discussion are a bit different.  In particular, if we relegate
the aliases to an extension, the pain of getting/installing that extension
will provide a forcing function encouraging users to fix their code so
they don't need it anymore.

Also, our experience with contrib/tsearch2 suggests that the extension
shouldn't be part of contrib, because we have zero track record of getting
rid of stuff in contrib, no matter how dead it is.

So I'm entirely happy with somebody who feels a need for this developing
an extension that we don't ship with the core system, and maintaining it
for as long as they continue to feel the need for it.  We couldn't stop
that from happening anyway.  But let's not buy into maintaining it as
part of the core distribution.  We don't have a mechanism for getting
rid of stuff once it's in the distro.

regards, tom lane




Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-09 Thread Tom Lane
Corey Huinker  writes:
> On Thu, Feb 9, 2017 at 3:13 PM, Tom Lane  wrote:
>> IMO, an erroneous backslash command should have no effect, period.

> One way around this is to make the small change: commands with invalid
> expressions are ignored in interactive mode.

> Another way around it would be to ignore branching commands in interactive
> mode altogether and give a message like "branching commands not supported
> in interactive mode".

Uh, neither of those seem to be responding to my point.  There is no case
in psql where a command with an invalid argument does something beyond
throwing an error.  I do not think that \if is the place to start.

Having it act differently in interactive and noninteractive modes is an
even worse idea.  AFAICS, the only real value of using \if interactively
is to test out something you are about to copy into a script.  If we go
that route we're destroying the ability to test that way.

Basically, I think you need to start removing complexity (in the sense of
special cases), not adding more.  I think Robert was saying the same
thing, though possibly I shouldn't put words in his mouth.

regards, tom lane




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Andres Freund
On 2017-02-09 13:03:41 -0800, Josh Berkus wrote:
> Counter-argument: moving the directory is going to break many tools
> anyway, so why bother with function aliases?

There's not actually that many tools affected by renaming pg_xlog,
i.e. there are tools that aren't affected by the rename at all, but do
call *xlog* functions.




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Stephen Frost
* Josh Berkus (j...@berkus.org) wrote:
> On 02/09/2017 12:53 PM, Stephen Frost wrote:
> > Waiting 10+ years doesn't make the pain go away when it comes to
> > removing things like that.
> 
> Sure it does.  That's two whole generations of client tools.  For
> example, at that point, pgAdmin3 won't reliably run on any supported
> platform, so it won't be a problem if we break it.

That's not true if OpenSCG has their way as they're continuing to
maintain it, from my understanding.

And, I'm sorry, but counting on projects to essentially die off to be
the point where we can drop certain bits of backwards-compatibility just
isn't a winning strategy.

> If we clearly mark the old function names as deprecated aliases, client
> tools will gradually move to the new names.

No, they won't.  They haven't.  Look at pg_shadow- it was clearly marked
as deprecated in *8.1*.

> Counter-argument: moving the directory is going to break many tools
> anyway, so why bother with function aliases?

You're going to have to explain exactly the argument you're making
there, because I don't see the point you're trying to get at with that
question.

Thanks!

Stephen




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread David Steele
On 2/9/17 4:03 PM, Josh Berkus wrote:

> Counter-argument: moving the directory is going to break many tools
> anyway, so why bother with function aliases?

+1.

-- 
-David
da...@pgmasters.net




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Stephen Frost
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2017-02-09 15:53:47 -0500, Stephen Frost wrote:
> > * Josh Berkus (j...@berkus.org) wrote:
> > > What I'm voting against is the idea that we'll have aliases in core, but
> > > remove them in two releases.  Either that's unrealistic, or it's just
> > > prolonging the pain.
> > 
> > Waiting 10+ years doesn't make the pain go away when it comes to
> > removing things like that.
> 
> That's ridiculous. I think we can fairly argue whether backward compat
> is worth it, but claiming that migrations to something new aren't easier
> if there's a number of versions that support both the old and the new
> names/syntax/whatnot is obviously wrong.

We do provide a number of years of overlap- 5 years in fact.  I believe
that's an entirely good thing and gives our users a chance to manage
their upgrade paths from one major release to the next.

Increasing that overlap to 10 or 15 years, however, doesn't make a
hill-of-beans worth of difference, in my opinion.  Either they're
tracking the releases that we're doing and making changes and adapting
as we change things, or they aren't and won't ever unless they're forced
to, instead preferring to make us carry the burden of maintenance on the
backwards-compat pieces.

Thanks!

Stephen




Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-09 Thread Corey Huinker
On Thu, Feb 9, 2017 at 3:13 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > I (still) think this is a bad design.  Even if you've got all the
> > messages just right as things stand today, some new feature that comes
> > along in the future can change things so that they're not right any
> > more, and nobody's going to relish maintaining this.
>
> FWIW, I tend to agree that this is way overboard in terms of the amount of
> complexity going into the messages.  Also, I do not like what seems to
> be happening here:
>
> >> $ psql test < test2.sql -v ON_ERROR_STOP=0
> >> unrecognized value "error" for "\if ": boolean expected
> >> new \if is invalid, ignoring commands until next \endif
>
> IMO, an erroneous backslash command should have no effect, period.
> "It failed but we'll treat it as if it were valid" is a rathole
> I don't want to descend into.  It's particularly bad in interactive
> mode, because the most natural thing to do is correct your spelling
> and issue the command again --- but if psql already decided to do
> something on the strength of the mistaken command, that doesn't work,
> and you'll have to do something or other to unwind the unwanted
> control state before you can get back to what you meant to do.
>
> regards, tom lane
>

One way around this is to make the small change: commands with invalid
expressions are ignored in interactive mode.

Another way around it would be to ignore branching commands in interactive
mode altogether and give a message like "branching commands not supported
in interactive mode". That'd get rid of a lot of complexity right there. I
for one wouldn't miss it. The only use I saw for it was debugging a script,
and in that case the user can do their own branching via selective
copy/paste.

Do either of those sound appealing?


Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Josh Berkus
On 02/09/2017 12:53 PM, Stephen Frost wrote:
> * Josh Berkus (j...@berkus.org) wrote:
>> On 02/09/2017 12:42 PM, Stephen Frost wrote:
>>> * Josh Berkus (j...@berkus.org) wrote:
>>>> On 02/09/2017 11:08 AM, Tom Lane wrote:
>>>>> Agreed, let's just get it done.
>>>>>
>>>>> Although this doesn't really settle whether we ought to do 3a (with
>>>>> backwards-compatibility function aliases in core) or 3b (without 'em).
>>>>> Do people want to re-vote, understanding that those are the remaining
>>>>> choices?
>>>>
>>>> Does 3a) mean keeping the aliases more-or-less forever?
>>>>
>>>> If not, I vote for 3b.  If we're going to need to break stuff, let's
>>>> just do it.
>>>>
>>>> If we can keep the aliases for 6-10 years, then I see no reason not to
>>>> have them (3a).  They're not exactly likely to conflict with user-chosen
>>>> names.
>>>
>>> When we remove pg_shadow, then I'll be willing to agree that maybe we
>>> can start having things in PG for a couple releases that are just for
>>> backwards-compatibility and will actually be removed later.
>>>
>>> History has shown that's next to impossible, however.
>>
>> That's why I said 6-10 years.  If we're doing 3a, realistically we're
>> supporting it until PostgreSQL 16, at least, and more likely 20.  I'm OK
>> with that.
> 
> Uh, to be clear, I think it's an entirely bad thing that we've had those
> views and various other cruft hang around for over 10 years.
> 
> And removing them today will probably still have people crying about how
> pgAdmin3 and other things still use them.
> 
>> What I'm voting against is the idea that we'll have aliases in core, but
>> remove them in two releases.  Either that's unrealistic, or it's just
>> prolonging the pain.
> 
> Waiting 10+ years doesn't make the pain go away when it comes to
> removing things like that.

Sure it does.  That's two whole generations of client tools.  For
example, at that point, pgAdmin3 won't reliably run on any supported
platform, so it won't be a problem if we break it.

If we clearly mark the old function names as deprecated aliases, client
tools will gradually move to the new names.

Counter-argument: moving the directory is going to break many tools
anyway, so why bother with function aliases?

-- 
Josh Berkus
Containers & Databases Oh My!




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Andres Freund
On 2017-02-09 15:53:47 -0500, Stephen Frost wrote:
> * Josh Berkus (j...@berkus.org) wrote:
> > What I'm voting against is the idea that we'll have aliases in core, but
> > remove them in two releases.  Either that's unrealistic, or it's just
> > prolonging the pain.
> 
> Waiting 10+ years doesn't make the pain go away when it comes to
> removing things like that.

That's ridiculous. I think we can fairly argue whether backward compat
is worth it, but claiming that migrations to something new aren't easier
if there's a number of versions that support both the old and the new
names/syntax/whatnot is obviously wrong.




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Stephen Frost
* Josh Berkus (j...@berkus.org) wrote:
> On 02/09/2017 12:42 PM, Stephen Frost wrote:
> > * Josh Berkus (j...@berkus.org) wrote:
> >> On 02/09/2017 11:08 AM, Tom Lane wrote:
> >>> Agreed, let's just get it done.
> >>>
> >>> Although this doesn't really settle whether we ought to do 3a (with
> >>> backwards-compatibility function aliases in core) or 3b (without 'em).
> >>> Do people want to re-vote, understanding that those are the remaining
> >>> choices?
> >>
> >> Does 3a) mean keeping the aliases more-or-less forever?
> >>
> >> If not, I vote for 3b.  If we're going to need to break stuff, let's
> >> just do it.
> >>
> >> If we can keep the aliases for 6-10 years, then I see no reason not to
> >> have them (3a).  They're not exactly likely to conflict with user-chosen
> >> names.
> > 
> > When we remove pg_shadow, then I'll be willing to agree that maybe we
> > can start having things in PG for a couple releases that are just for
> > backwards-compatibility and will actually be removed later.
> > 
> > History has shown that's next to impossible, however.
> 
> That's why I said 6-10 years.  If we're doing 3a, realistically we're
> supporting it until PostgreSQL 16, at least, and more likely 20.  I'm OK
> with that.

Uh, to be clear, I think it's an entirely bad thing that we've had those
views and various other cruft hang around for over 10 years.

And removing them today will probably still have people crying about how
pgAdmin3 and other things still use them.

> What I'm voting against is the idea that we'll have aliases in core, but
> remove them in two releases.  Either that's unrealistic, or it's just
> prolonging the pain.

Waiting 10+ years doesn't make the pain go away when it comes to
removing things like that.

Let's not go there.

Thanks!

Stephen




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-09 Thread Robert Haas
On Wed, Feb 8, 2017 at 5:36 AM, Thomas Munro
 wrote:
> Thinking a bit harder about this, I suppose there could be a kind of
> object called a SharedBufFileManager (insert better name) which you
> can store in a DSM segment.  The leader backend that initialises a DSM
> segment containing one of these would then call a constructor function
> that sets an internal refcount to 1 and registers an on_dsm_detach
> callback for its on-detach function.  All worker backends that attach
> to the DSM segment would need to call an attach function for the
> SharedBufFileManager to increment a refcount and also register the
> on_dsm_detach callback, before any chance that an error might be
> thrown (is that difficult?); failure to do so could result in file
> leaks.  Then, when a BufFile is to be shared (AKA exported, made
> unifiable), a SharedBufFile object can be initialised somewhere in the
> same DSM segment and registered with the SharedBufFileManager.
> Internally all registered SharedBufFile objects would be linked
> together using offsets from the start of the DSM segment for link
> pointers.  Now when SharedBufFileManager's on-detach function runs, it
> decrements the refcount in the SharedBufFileManager, and if that
> reaches zero then it runs a destructor that spins through the list of
> SharedBufFile objects deleting files that haven't already been deleted
> explicitly.

I think this is approximately reasonable, but I think it could be made
simpler by having fewer separate objects.  Let's assume the leader can
put an upper bound on the number of shared BufFiles at the time it's
sizing the DSM segment (i.e. before InitializeParallelDSM).  Then it
can allocate a big ol' array with a header indicating the array size
and each element containing enough space to identify the relevant
details of 1 shared BufFile.  Now you don't need to do any allocations
later on, and you don't need a linked list.  You just loop over the
array and do what needs doing.
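
A toy flat-array version of that idea (everything here - SharedFileSet, the slot layout, the return value of the detach hook - is a made-up stand-in for the real DSM machinery):

```c
/*
 * Sketch: a fixed-size array of shared temp-file slots, sized up front
 * and stored in the shared segment; the last backend to detach loops
 * over the array deleting any files still registered.  Illustrative only.
 */
#include <assert.h>
#include <stdio.h>

#define MAX_SHARED_FILES 8

typedef struct SharedFileSlot
{
	int			in_use;			/* slot holds a live temp file? */
	char		path[64];		/* identifying details of the file */
} SharedFileSlot;

typedef struct SharedFileSet
{
	int			refcount;		/* number of attached backends */
	int			nslots;			/* array size, fixed at segment creation */
	SharedFileSlot slots[MAX_SHARED_FILES];
} SharedFileSet;

/* register a file in the first free slot */
static void
sharedfileset_register(SharedFileSet *set, const char *path)
{
	for (int i = 0; i < set->nslots; i++)
		if (!set->slots[i].in_use)
		{
			set->slots[i].in_use = 1;
			snprintf(set->slots[i].path, sizeof(set->slots[i].path), "%s", path);
			return;
		}
}

/* on-detach callback: last backend out cleans up every registered file */
static int
sharedfileset_detach(SharedFileSet *set)
{
	int			removed = 0;

	if (--set->refcount > 0)
		return 0;				/* others still attached, nothing to do */
	for (int i = 0; i < set->nslots; i++)
		if (set->slots[i].in_use)
		{
			/* real code would unlink(set->slots[i].path) here */
			set->slots[i].in_use = 0;
			removed++;
		}
	return removed;
}
```

Because the array is sized before the segment is created, no later allocation or linked list is needed; cleanup is a single loop over the slots.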

> There are a couple of problems with the above though.  Firstly, doing
> reference counting in DSM segment on-detach hooks is really a way to
> figure out when the DSM segment is about to be destroyed by keeping a
> separate refcount in sync with the DSM segment's refcount, but it
> doesn't account for pinned DSM segments.  It's not your use-case or
> mine currently, but someone might want a DSM segment to live even when
> it's not attached anywhere, to be reattached later.  If we're trying
> to use DSM segment lifetime as a scope, we'd be ignoring this detail.
> Perhaps instead of adding our own refcount we need a new kind of hook
> on_dsm_destroy.

I think it's good enough to plan for current needs now.  It's not
impossible to change this stuff later, but we need something that
works robustly right now without being too invasive.  Inventing whole
new system concepts because of stuff we might someday want to do isn't
a good idea because we may easily guess wrong about what direction
we'll want to go in the future.  This is more like building a wrench
than a 747: a 747 needs to be extensible and reconfigurable and
upgradable because it costs $350 million.   A wrench costs $10 at
Walmart and if it turns out we bought the wrong one, we can just throw
it out and get a different one later.

> Secondly, I might not want to be constrained by a
> fixed-sized DSM segment to hold my SharedBufFile objects... there are
> cases where I need to shared a number of batch files that is unknown
> at the start of execution time when the DSM segment is sized (I'll
> write about that shortly on the Parallel Shared Hash thread).  Maybe I
> can find a way to get rid of that requirement.  Or maybe it could
> support DSA memory too, but I don't think it's possible to use
> on_dsm_detach-based cleanup routines that refer to DSA memory because
> by the time any given DSM segment's detach hook runs, there's no
> telling which other DSM segments have been detached already, so the
> DSA area may already have partially vanished; some other kind of hook
> that runs earlier would be needed...

Again, wrench.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Josh Berkus
On 02/09/2017 12:42 PM, Stephen Frost wrote:
> * Josh Berkus (j...@berkus.org) wrote:
>> On 02/09/2017 11:08 AM, Tom Lane wrote:
>>> Agreed, let's just get it done.
>>>
>>> Although this doesn't really settle whether we ought to do 3a (with
>>> backwards-compatibility function aliases in core) or 3b (without 'em).
>>> Do people want to re-vote, understanding that those are the remaining
>>> choices?
>>
>> Does 3a) mean keeping the aliases more-or-less forever?
>>
>> If not, I vote for 3b.  If we're going to need to break stuff, let's
>> just do it.
>>
>> If we can keep the aliases for 6-10 years, then I see no reason not to
>> have them (3a).  They're not exactly likely to conflict with user-chosen
>> names.
> 
> When we remove pg_shadow, then I'll be willing to agree that maybe we
> can start having things in PG for a couple releases that are just for
> backwards-compatibility and will actually be removed later.
> 
> History has shown that's next to impossible, however.

That's why I said 6-10 years.  If we're doing 3a, realistically we're
supporting it until PostgreSQL 16, at least, and more likely 20.  I'm OK
with that.

What I'm voting against is the idea that we'll have aliases in core, but
remove them in two releases.  Either that's unrealistic, or it's just
prolonging the pain.

-- 
Josh Berkus
Containers & Databases Oh My!




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Stephen Frost
* Josh Berkus (j...@berkus.org) wrote:
> On 02/09/2017 11:08 AM, Tom Lane wrote:
> > Agreed, let's just get it done.
> > 
> > Although this doesn't really settle whether we ought to do 3a (with
> > backwards-compatibility function aliases in core) or 3b (without 'em).
> > Do people want to re-vote, understanding that those are the remaining
> > choices?
> 
> Does 3a) mean keeping the aliases more-or-less forever?
> 
> If not, I vote for 3b.  If we're going to need to break stuff, let's
> just do it.
> 
> If we can keep the aliases for 6-10 years, then I see no reason not to
> have them (3a).  They're not exactly likely to conflict with user-chosen
> names.

When we remove pg_shadow, then I'll be willing to agree that maybe we
can start having things in PG for a couple releases that are just for
backwards-compatibility and will actually be removed later.

History has shown that's next to impossible, however.

Thanks!

Stephen




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Josh Berkus
On 02/09/2017 11:08 AM, Tom Lane wrote:
> Agreed, let's just get it done.
> 
> Although this doesn't really settle whether we ought to do 3a (with
> backwards-compatibility function aliases in core) or 3b (without 'em).
> Do people want to re-vote, understanding that those are the remaining
> choices?

Does 3a) mean keeping the aliases more-or-less forever?

If not, I vote for 3b.  If we're going to need to break stuff, let's
just do it.

If we can keep the aliases for 6-10 years, then I see no reason not to
have them (3a).  They're not exactly likely to conflict with user-chosen
names.

-- 
Josh Berkus
Containers & Databases Oh My!




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Kevin Grittner
On Thu, Feb 9, 2017 at 1:44 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Feb 9, 2017 at 2:08 PM, Tom Lane  wrote:
>>> Although this doesn't really settle whether we ought to do 3a (with
>>> backwards-compatibility function aliases in core) or 3b (without 'em).
>>> Do people want to re-vote, understanding that those are the remaining
>>> choices?
>
>> I prefer (3c) put them in an extension and let people that need 'em
>> install 'em, but not have them available by default.
>
> As far as the core code is concerned, 3b and 3c are the same thing.

I think so, too, if we're talking about an extension in core.

My vote is for 3b.  If someone wants to write the alias extension and
make it available outside of core, fine -- though they don't need anyone's
vote to do so.

-- 
Kevin Grittner




Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-09 Thread Tom Lane
Robert Haas  writes:
> I (still) think this is a bad design.  Even if you've got all the
> messages just right as things stand today, some new feature that comes
> along in the future can change things so that they're not right any
> more, and nobody's going to relish maintaining this.

FWIW, I tend to agree that this is way overboard in terms of the amount of
complexity going into the messages.  Also, I do not like what seems to
be happening here:

>> $ psql test < test2.sql -v ON_ERROR_STOP=0
>> unrecognized value "error" for "\if ": boolean expected
>> new \if is invalid, ignoring commands until next \endif

IMO, an erroneous backslash command should have no effect, period.
"It failed but we'll treat it as if it were valid" is a rathole
I don't want to descend into.  It's particularly bad in interactive
mode, because the most natural thing to do is correct your spelling
and issue the command again --- but if psql already decided to do
something on the strength of the mistaken command, that doesn't work,
and you'll have to do something or other to unwind the unwanted
control state before you can get back to what you meant to do.

regards, tom lane




Re: [HACKERS] Parallel bitmap heap scan

2017-02-09 Thread Robert Haas
On Wed, Feb 8, 2017 at 10:59 AM, Robert Haas  wrote:
> Looks good to me.  If nobody has further ideas here, I'll push this
> and your previous patch tomorrow.

Done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-02-09 Thread Robert Haas
On Mon, Feb 6, 2017 at 5:49 PM, Corey Huinker  wrote:
> I suppressed the endif-balance checking in cases where we're in an
> already-failed situation.
> In cases where there was a boolean parsing failure, and ON_ERROR_STOP is on,
> the error message no longer speaks of a future which the session does not
> have. I could have left the ParseVariableBool() message as the only one, but
> wasn't sure that that was enough of an error message on its own.
> Added the test case to the existing tap tests. Incidentally, the tap tests
> aren't presently fooled into thinking they're interactive.
>
> $ cat test2.sql
> \if error
> \echo NO
> \endif
> \echo NOPE
> $ psql test < test2.sql -v ON_ERROR_STOP=0
> unrecognized value "error" for "\if ": boolean expected
> new \if is invalid, ignoring commands until next \endif
> NOPE
> $ psql test < test2.sql -v ON_ERROR_STOP=1
> unrecognized value "error" for "\if ": boolean expected
> new \if is invalid.

I (still) think this is a bad design.  Even if you've got all the
messages just right as things stand today, some new feature that comes
along in the future can change things so that they're not right any
more, and nobody's going to relish maintaining this.  This sort of
thing seems fine to me:

new \if is invalid

But then further breaking it down by things like whether
ON_ERROR_STOP=1 is set, or breaking down the \endif output depending
on the surrounding context, seems terrifyingly complex to me.

Mind you, I'm not planning to commit this patch anyway, so feel free
to ignore me, but if I were planning to commit it, I would not commit
it with that level of chattiness.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2017-02-09 Thread Andres Freund
On 2017-02-09 14:48:18 -0500, Robert Haas wrote:
> On Thu, Feb 9, 2017 at 3:05 AM, Andres Freund  wrote:
> >> +#define CHECK_SUPERUSER() { \
> >> +	if (!superuser()) \
> >> +		ereport(ERROR, \
> >> +				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), \
> >> +				 (errmsg("must be superuser to use verification functions")))); }
> >
> > Please make this either a function or just copy - the macro doesn't buy
> > us anything except weirder looking syntax.
>
> Better yet, remove the check altogether and just revoke access from
> public.  Then the default policy is superuser-only, but the DBA can
> grant access to who they wish.

That works for me as well.




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2017-02-09 Thread Robert Haas
On Thu, Feb 9, 2017 at 3:05 AM, Andres Freund  wrote:
>> +#define CHECK_SUPERUSER() { \
>> +	if (!superuser()) \
>> +		ereport(ERROR, \
>> +				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), \
>> +				 (errmsg("must be superuser to use verification functions")))); }
>
> Please make this either a function or just copy - the macro doesn't buy
> us anything except weirder looking syntax.

Better yet, remove the check altogether and just revoke access from
public.  Then the default policy is superuser-only, but the DBA can
grant access to who they wish.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Latch reset ordering bug in condition_variable.c

2017-02-09 Thread Robert Haas
On Thu, Feb 9, 2017 at 6:01 AM, Thomas Munro
 wrote:
> ConditionVariablePrepareToSleep() has a race that can leave you
> hanging, introduced by me in the v4 patch.  The problem is that that
> function resets our latch after adding our process to the wakeup list.
> With the right timing, the following sequence can happen:
>
> 1.  ConditionVariablePrepareToSleep() adds us to the wakeup list.
> 2.  Some other process calls ConditionVariableSignal().  It removes us
> from the wakeup list and sets our latch.
> 3.  ConditionVariablePrepareToSleep() resets our latch.
> 4.  We enter (or continue) our predicate loop.  Our exit condition
> happens not to be true yet, so we call ConditionVariableSleep().
> 5.  ConditionVariableSleep() never returns because WaitEventSet()
> blocks.  Our latch is not set, yet we are no longer in the wakeup list
> so ConditionVariableSignal() will never set it.
>
> We should reset the latch first.  Then there is no way to reach
> ConditionVariableSleep() with neither a set latch nor an entry in the
> wakeup queue.
>
> See attached.  Thoughts?

Oops.  Committed.
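The corrected ordering can be sketched with a toy model (a simplified Python stand-in for the C implementation; the Latch and ConditionVariable classes below are illustrative assumptions, not the actual PostgreSQL APIs):

```python
import threading

class Latch:
    """Minimal stand-in for a process latch: a resettable wakeup flag."""
    def __init__(self):
        self._event = threading.Event()
    def set(self):
        self._event.set()
    def reset(self):
        self._event.clear()
    def wait(self, timeout=None):
        return self._event.wait(timeout)

class ConditionVariable:
    """Sketch of a latch-based condition variable with the fixed ordering."""
    def __init__(self):
        self._lock = threading.Lock()
        self._wakeup = []  # latches of processes waiting to be signaled
    def prepare_to_sleep(self, latch):
        # The fix: reset our latch BEFORE joining the wakeup list.  Any
        # signal() that removes us from the list after this point leaves
        # our latch set, so a later sleep cannot block forever.
        latch.reset()
        with self._lock:
            self._wakeup.append(latch)
    def signal(self):
        with self._lock:
            if self._wakeup:
                self._wakeup.pop(0).set()

# The interleaving from the report, modeled inline: prepare, then a
# signal arriving before we sleep, then the sleep itself.
cv, latch = ConditionVariable(), Latch()
cv.prepare_to_sleep(latch)
cv.signal()                      # signal lands between prepare and sleep
woke = latch.wait(timeout=1.0)   # the "sleep": wakes immediately, no hang
```

With the buggy ordering (append first, reset after), the signal between steps would be consumed by the reset and the final wait would block until timeout.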

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Tom Lane
Robert Haas  writes:
> On Thu, Feb 9, 2017 at 2:08 PM, Tom Lane  wrote:
>> Although this doesn't really settle whether we ought to do 3a (with
>> backwards-compatibility function aliases in core) or 3b (without 'em).
>> Do people want to re-vote, understanding that those are the remaining
>> choices?

> I prefer (3c) put them in an extension and let people that need 'em
> install 'em, but not have them available by default.

As far as the core code is concerned, 3b and 3c are the same thing.
IOW, somebody can write the extension later.

regards, tom lane




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Robert Haas
On Thu, Feb 9, 2017 at 2:08 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Feb 6, 2017 at 12:29 PM, Magnus Hagander  wrote:
 Here is what I have, 6 votes clearly stated:
 1. Rename nothing: Daniel,
 2. Rename directory only: Andres
 3. Rename everything: Stephen, Vladimir, David S, Michael P (with
 aliases for functions, I could live without at this point...)
>
>> [ vote-counting ]
>
>> Therefore, I plan to go ahead and do #3.  Somebody's probably going to
>> jump in now with another opinion but I think this thread's gone on
>> long enough.
>
> Agreed, let's just get it done.
>
> Although this doesn't really settle whether we ought to do 3a (with
> backwards-compatibility function aliases in core) or 3b (without 'em).
> Do people want to re-vote, understanding that those are the remaining
> choices?

I prefer (3c) put them in an extension and let people that need 'em
install 'em, but not have them available by default.  If the only
choices are (3a) and (3b) then I guess I pick (3b), but I think an
extension doesn't cost much and will ease the migration pain for users
so I'm in favor of it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pageinspect: Hash index support

2017-02-09 Thread Robert Haas
On Thu, Feb 9, 2017 at 1:11 PM, Ashutosh Sharma  wrote:
>> I think you should just tighten up the sanity checking in the existing
>> function _hash_ovflblkno_to_bitno rather than duplicating the code.  I
>> don't think it's called often enough for one extra (cheap) test to be
>> an issue.  (You should change the elog in that function to an ereport,
>> too, since it's going to be a user-facing error message now.)
>
> okay, I have taken care of above two points in the attached patch. Thanks.

Alright, committed with a little further hacking.  That patch rejected
using hash_bitmap_info() on primary bucket pages and the metapage, but
not on bitmap pages, which seems weird.  So I fixed that and pushed
this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Andres Freund
On 2017-02-09 14:08:14 -0500, Tom Lane wrote:
> Robert Haas  writes:
> > On Mon, Feb 6, 2017 at 12:29 PM, Magnus Hagander  
> > wrote:
> >>> Here is what I have, 6 votes clearly stated:
> >>> 1. Rename nothing: Daniel,
> >>> 2. Rename directory only: Andres
> >>> 3. Rename everything: Stephen, Vladimir, David S, Michael P (with
> >>> aliases for functions, I could live without at this point...)
> 
> > [ vote-counting ]
> 
> > Therefore, I plan to go ahead and do #3.  Somebody's probably going to
> > jump in now with another opinion but I think this thread's gone on
> > long enough.
> 
> Agreed, let's just get it done.
> 
> Although this doesn't really settle whether we ought to do 3a (with
> backwards-compatibility function aliases in core) or 3b (without 'em).
> Do people want to re-vote, understanding that those are the remaining
> choices?

3a)




Re: [HACKERS] How to debug the wire protocol?

2017-02-09 Thread Rui Pacheco
The answers are great, thanks!

> On 9 Feb 2017, at 20:26, Andres Freund  wrote:
> 
> On 2017-02-09 20:17:33 +0100, Rui Pacheco wrote:
>> I’ve been anxiously waiting for that reply but got nothing. I even mailed 
>> the mailing list manager again with no reply. And when I click on that link 
>> I get a server error: "An internal server error occurred.”
> 
> The server error has already been fixed, please reload...





Re: [HACKERS] How to debug the wire protocol?

2017-02-09 Thread Andres Freund
On 2017-02-09 20:17:33 +0100, Rui Pacheco wrote:
> I’ve been anxiously waiting for that reply but got nothing. I even mailed the 
> mailing list manager again with no reply. And when I click on that link I get 
> a server error: "An internal server error occurred.”

The server error has already been fixed, please reload...




Re: [HACKERS] How to debug the wire protocol?

2017-02-09 Thread Rui Pacheco
I’ve been anxiously waiting for that reply but got nothing. I even mailed the 
mailing list manager again with no reply. And when I click on that link I get a 
server error: "An internal server error occurred.”

But thanks, this helps me focus on the problem.

> On 9 Feb 2017, at 20:15, Tom Lane  wrote:
> 
> Rui Pacheco  writes:
>> I’ve sent a similar email to the general mailing list but got no reply.
> 
> Uh, multiple people answered you, see
> https://www.postgresql.org/message-id/flat/22DB2E08-6329-4EF8-B3E5-C0A7728A067B%40gmail.com
> 
>> Is there a way to see how Postgres parses the message? I’ve set logging to 
>> DEBUG5 but the only thing I can see in the logs is, among other things, the 
>> message
> 
>> ERROR:  invalid string in message
>> FATAL:  invalid frontend message type 63
> 
> That seems to comport with the theory I suggested before, that you were
> sending an incorrect message length word --- although this additional
> evidence indicates that the sent length is too short, not too long,
> which is what I'd been thinking.  Or possibly you left out required
> field(s) of the message.  Anyway it's expecting a null-terminated string
> and not finding one in the remaining message data.
> 
>   regards, tom lane



Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread David G. Johnston
On Thu, Feb 9, 2017 at 12:08 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > On Mon, Feb 6, 2017 at 12:29 PM, Magnus Hagander 
> wrote:
> >>> Here is what I have, 6 votes clearly stated:
> >>> 1. Rename nothing: Daniel,
> >>> 2. Rename directory only: Andres
> >>> 3. Rename everything: Stephen, Vladimir, David S, Michael P (with
> >>> aliases for functions, I could live without at this point...)
>
> > [ vote-counting ]
>
> > Therefore, I plan to go ahead and do #3.  Somebody's probably going to
> > jump in now with another opinion but I think this thread's gone on
> > long enough.
>
> Agreed, let's just get it done.
>
> Although this doesn't really settle whether we ought to do 3a (with
> backwards-compatibility function aliases in core) or 3b (without 'em).
> Do people want to re-vote, understanding that those are the remaining
> choices?
>

​I wouldn't oppose:

CREATE EXTENSION give_me_my_xlog_back;

but my prior thoughts lead me toward not including such functions in the
bootstrap catalog.

David J.


Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread David Steele
On 2/9/17 2:14 PM, David G. Johnston wrote:
> On Thu, Feb 9, 2017 at 12:08 PM, Tom Lane wrote:
> 
> Robert Haas writes:
> > On Mon, Feb 6, 2017 at 12:29 PM, Magnus Hagander wrote:
> >>> Here is what I have, 6 votes clearly stated:
> >>> 1. Rename nothing: Daniel,
> >>> 2. Rename directory only: Andres
> >>> 3. Rename everything: Stephen, Vladimir, David S, Michael P (with
> >>> aliases for functions, I could live without at this point...)
> 
> > [ vote-counting ]
> 
> > Therefore, I plan to go ahead and do #3.  Somebody's probably going to
> > jump in now with another opinion but I think this thread's gone on
> > long enough.
> 
> Agreed, let's just get it done.
> 
> Although this doesn't really settle whether we ought to do 3a (with
> backwards-compatibility function aliases in core) or 3b (without 'em).
> Do people want to re-vote, understanding that those are the remaining
> choices?

-1 on aliases in core (so to be clear my vote is for 3b).

> ​I wouldn't oppose:
> 
> CREATE EXTENSION give_me_my_xlog_back;
> 
> but my prior thoughts lead me toward not including such functions in the
> bootstrap catalog.

I'm not very excited about an extension, can we just provide a link to a
script in the release notes, or simply note that wrappers can be created?

-- 
-David
da...@pgmasters.net




Re: [HACKERS] How to debug the wire protocol?

2017-02-09 Thread Tom Lane
Rui Pacheco  writes:
> I’ve sent a similar email to the general mailing list but got no reply.

Uh, multiple people answered you, see
https://www.postgresql.org/message-id/flat/22DB2E08-6329-4EF8-B3E5-C0A7728A067B%40gmail.com

> Is there a way to see how Postgres parses the message? I’ve set logging to 
> DEBUG5 but the only thing I can see in the logs is, among other things, the 
> message

> ERROR:  invalid string in message
> FATAL:  invalid frontend message type 63

That seems to comport with the theory I suggested before, that you were
sending an incorrect message length word --- although this additional
evidence indicates that the sent length is too short, not too long,
which is what I'd been thinking.  Or possibly you left out required
field(s) of the message.  Anyway it's expecting a null-terminated string
and not finding one in the remaining message data.

regards, tom lane
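As a concrete illustration of the framing Tom describes, a minimal Parse ('P') message builder might look like this (a sketch assuming the documented v3 wire protocol: the int32 length word counts everything after the type byte, including itself, and each string must be NUL-terminated):

```python
import struct

def parse_message(stmt_name: str, query: str, param_oids=()) -> bytes:
    """Build a frontend Parse ('P') message per the v3 protocol sketch."""
    body = (
        stmt_name.encode() + b"\x00" +        # statement name, NUL-terminated
        query.encode() + b"\x00" +            # query string, NUL-terminated
        struct.pack("!h", len(param_oids)) +  # int16 parameter-type count
        b"".join(struct.pack("!i", oid) for oid in param_oids)
    )
    # Length word covers itself plus the body, but NOT the 'P' type byte.
    return b"P" + struct.pack("!i", len(body) + 4) + body

msg = parse_message("", "SELECT 1")
length = struct.unpack("!i", msg[1:5])[0]
# A length word that is too short leaves the server hunting for a NUL
# terminator in truncated data ("invalid string in message"), and the
# leftover bytes are then misread as the next message's type byte.
```

The check `length == len(msg) - 1` (total size minus the type byte) is the invariant a client implementation needs to maintain for every message it sends.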




Re: [HACKERS] How to debug the wire protocol?

2017-02-09 Thread Pavel Stehule
2017-02-09 20:12 GMT+01:00 Rui Pacheco :

> Not sure this solves my problem. I’d like to debug how the server is
> failing to parse my message, not what it looks like on the wire. I am the
> one forming the message after all.
>
> My apologies if I missed something.
>
>
ok

no problem

Regards

Pavel

> On 9 Feb 2017, at 20:05, Pavel Stehule  wrote:
>
> Hi
>
> 2017-02-09 20:01 GMT+01:00 Rui Pacheco :
>
>> Hello,
>>
>> I’ve sent a similar email to the general mailing list but got no reply.
>>
>> I’m trying to write my own implementation of the wire protocol. I’m stuck
>> at the Parse message where I send it to the server but always get the
>> following error: “Could not parse statement invalid string in message”
>>
>> Is there a way to see how Postgres parses the message? I’ve set logging
>> to DEBUG5 but the only thing I can see in the logs is, among other things,
>> the message
>>
>> ERROR:  invalid string in message
>> FATAL:  invalid frontend message type 63
>>
>> What is exactly wrong? Is there a way to log the parsing of the message,
>> or should I create my own build of postgres with log statements where I
>> need them?
>>
>
>  https://github.com/dalibo/pgshark
>
> Regards
>
> Pavel
>
>>
>> Many thanks,
>>
>
>
>


Re: [HACKERS] How to debug the wire protocol?

2017-02-09 Thread Rui Pacheco
Not sure this solves my problem. I’d like to debug how the server is failing to 
parse my message, not what it looks like on the wire. I am the one forming the 
message after all.

My apologies if I missed something.

> On 9 Feb 2017, at 20:05, Pavel Stehule  wrote:
> 
> Hi
> 
> 2017-02-09 20:01 GMT+01:00 Rui Pacheco:
> Hello,
> 
> I’ve sent a similar email to the general mailing list but got no reply.
> 
> I’m trying to write my own implementation of the wire protocol. I’m stuck at 
> the Parse message where I send it to the server but always get the following 
> error: “Could not parse statement invalid string in message”
> 
> Is there a way to see how Postgres parses the message? I’ve set logging to 
> DEBUG5 but the only thing I can see in the logs is, among other things, the 
> message
> 
> ERROR:  invalid string in message
> FATAL:  invalid frontend message type 63
> 
> What is exactly wrong? Is there a way to log the parsing of the message, or 
> should I create my own build of postgres with log statements where I need 
> them?
> 
>  https://github.com/dalibo/pgshark 
> 
> Regards
> 
> Pavel
> 
> Many thanks,
> 


Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Tom Lane
Robert Haas  writes:
> On Mon, Feb 6, 2017 at 12:29 PM, Magnus Hagander  wrote:
>>> Here is what I have, 6 votes clearly stated:
>>> 1. Rename nothing: Daniel,
>>> 2. Rename directory only: Andres
>>> 3. Rename everything: Stephen, Vladimir, David S, Michael P (with
>>> aliases for functions, I could live without at this point...)

> [ vote-counting ]

> Therefore, I plan to go ahead and do #3.  Somebody's probably going to
> jump in now with another opinion but I think this thread's gone on
> long enough.

Agreed, let's just get it done.

Although this doesn't really settle whether we ought to do 3a (with
backwards-compatibility function aliases in core) or 3b (without 'em).
Do people want to re-vote, understanding that those are the remaining
choices?

regards, tom lane




Re: [HACKERS] How to debug the wire protocol?

2017-02-09 Thread Pavel Stehule
Hi

2017-02-09 20:01 GMT+01:00 Rui Pacheco :

> Hello,
>
> I’ve sent a similar email to the general mailing list but got no reply.
>
> I’m trying to write my own implementation of the wire protocol. I’m stuck
> at the Parse message where I send it to the server but always get the
> following error: “Could not parse statement invalid string in message”
>
> Is there a way to see how Postgres parses the message? I’ve set logging to
> DEBUG5 but the only thing I can see in the logs is, among other things, the
> message
>
> ERROR:  invalid string in message
> FATAL:  invalid frontend message type 63
>
> What is exactly wrong? Is there a way to log the parsing of the message,
> or should I create my own build of postgres with log statements where I
> need them?
>

 https://github.com/dalibo/pgshark

Regards

Pavel

>
> Many thanks,
>
>


[HACKERS] How to debug the wire protocol?

2017-02-09 Thread Rui Pacheco
Hello,

I’ve sent a similar email to the general mailing list but got no reply.

I’m trying to write my own implementation of the wire protocol. I’m stuck at 
the Parse message where I send it to the server but always get the following 
error: “Could not parse statement invalid string in message”

Is there a way to see how Postgres parses the message? I’ve set logging to 
DEBUG5 but the only thing I can see in the logs is, among other things, the 
message

ERROR:  invalid string in message
FATAL:  invalid frontend message type 63

What is exactly wrong? Is there a way to log the parsing of the message, or 
should I create my own build of postgres with log statements where I need them?

Many thanks,



Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-02-09 Thread Robert Haas
On Mon, Feb 6, 2017 at 12:29 PM, Magnus Hagander  wrote:
>> Here is what I have, 6 votes clearly stated:
>> 1. Rename nothing: Daniel,
>> 2. Rename directory only: Andres
>> 3. Rename everything: Stephen, Vladimir, David S, Michael P (with
>> aliases for functions, I could live without at this point...)
>
> Put my vote down for 2.

I think there is a very strong consensus for going either forward and
renaming everything or going backward and renaming nothing.  That
position has been endorsed by me, David Johnston, Tom Lane, Stephen
Frost, Kevin Grittner, Vladimir Rusinov, David Steele, probably
Michael Paquier, and possibly JD.  The only people who have explicitly
voted against that position are Andres and Magnus, who prefer renaming
only the directory.  I think that's approximately a 7-2 vote in favor
of not leaving things as they are (#2).

The vote on whether to go forward (#3) or backward (#1) is closer.
All of the people mentioned above as wanting consistency - except for
JD whose actual vote wasn't entirely clear - indicated a preference
for #3 over #1, but a number of them prefer it only weakly.  Moreover,
Fujii Masao and Daniel Verite prefer #1.  But I still think that the
vote is in favor of #3.  There are 7 clear votes for that position and
no more than 2 votes for any other position.  Even regarding every
vote that isn't for #3 as a vote for #1, which is probably not
entirely accurate, it's still 7-4 in favor of #3.

Looking back at older emails before things came quite so sharply into
focus, I found various other opinions.  But I don't think they change
the overall picture very much.  Cynthia Shang seemed to favor a more
limited renaming, but her point was that we have lots of internal
stuff that uses the xlog terminology, which isn't quite the same
question as whether the user-visible stuff should all match.  David
Fetter favored not adding aliases when we did the renaming, but didn't
clearly spell out that he favored the renaming.  Similarly, Euler
Taveira favored aliases in an extension, but likewise didn't clearly
spell out his position on the renaming itself.  (I would tend to count
those as votes in favor of the renaming itself but you could argue
that.)  Bruce Momjian wanted to go forward to keep things clean.
Simon Riggs wanted to leave things as they are, but the reason given
was not so much about the merits of the issue but about not wanting to
spend more time on it.  You can come up with different counts
depending on exactly how you interpret what all of those people said,
but not even the least favorable allocation of those votes ends up
with anything other than #3 as the most popular option.

Therefore, I plan to go ahead and do #3.  Somebody's probably going to
jump in now with another opinion but I think this thread's gone on
long enough.  We're going to take some backward-compatibility pain
here as a result of these changes and some people are going to be
unhappy about that, but I think we've allowed enough time for people
to weigh in with opinions and this seems to be where we're at.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pageinspect: Hash index support

2017-02-09 Thread Ashutosh Sharma
> I think you should just tighten up the sanity checking in the existing
> function _hash_ovflblkno_to_bitno rather than duplicating the code.  I
> don't think it's called often enough for one extra (cheap) test to be
> an issue.  (You should change the elog in that function to an ereport,
> too, since it's going to be a user-facing error message now.)

okay, I have taken care of above two points in the attached patch. Thanks.


simplify_hash_bitmap_info_v3.patch
Description: application/download



Re: [HACKERS] Parameterization of partial path

2017-02-09 Thread Robert Haas
On Thu, Feb 9, 2017 at 12:36 PM, Antonin Houska  wrote:
> When looking at try_partial_hashjoin_path and try_partial_nestloop_path
> functions I'm failing to understand the comment "Parameterized partial paths
> are not supported.".
>
> It's clear to me that GatherPath can't have parameters because repeated
> execution of parallel plan with adjusted parameters is not supported.

Actually, I think in theory that is fine.  You'd start up and shut
down workers for every execution, which is likely to stink in terms of
performance, but it should work OK.  The only problem is that it'll
only work if you pass the values of the parameters down to the worker
processes, which the code currently doesn't. Bind parameters sent by
the user are passed down, but there's no provision to pass down
parameters populated at execution time.  Amit has written code for
that, though, and it's been posted here.  It just hasn't gotten into
the tree yet for one reason or another.

> But the
> fact that generic partial path has parameters should not be a problem if those
> parameters are satisfied below the nearest GatherPath node higher in the plan
> tree. Am I missing something about the concept?

Yes, I think you're missing a key point.  A parameterized path would
normally be used for a query like small LJ (big1 IJ big2 ON big1.x =
big2.x) ON big1.y = small.y.  The plan might look like this:

Nested Loop Left Join
-> Seq Scan on small
-> Nested Loop Join
  -> Index Scan on big1
  -> Index Scan on big2

In essence, we're repeating the join between big1 and big2 for every
row in small.  That seems like a bad strategy until you realize that
each join will scan only a tiny fraction of each of those tables.  If
you couldn't pass a parameter down from the scan of small to the scans
on big1 and big2, you'd end up with a plan that scans one or both of
those tables in their entirety.  Ouch.

Now, this plan can be parallelized just fine.  The sequential scan on
small can be replaced with a Parallel Seq Scan.  That works fine, and
the planner is already capable of generating such plans.  However, at
no point in that do you get a parameterized *partial* path.  You
generate regular old parameterized paths for big1 and big2 and join
then to produce a parameterized path for (big1 big2), and then you
join that via a nested loop with the non-parameterized partial path
for small, and you get a partial but unparameterized path for (small
big1 big2).  Then you push a Gather node on top and you're done.

Suppose we did generate a partial plan for (big1 big2).  It would look
like this:

Nested Loop Join
-> Parallel Index Scan on big1
-> Index Scan on big2

Now, how could you join that in a meaningful way to small so as to
come up with anything sensible?   You really can't.  Consider a plan
like this:

Gather
-> Nested Loop Left Join
  -> Seq Scan on small
  -> Nested Loop Join
-> Partial Index Scan on big1
-> Index Scan on big2

That's clearly nonsense.  The partial index scan is supposed to return
a subset of the result rows to each worker, but there's no reason the
workers have to be basing their work on the same row from 'small'.
Which of their values would the parallel index scan supposedly be
using?  This isn't a matter of some implementation detail that we're
currently missing; such a plan is just inherently nonsensical.

> Actually this check
>
> if (!bms_is_subset(inner_paramrels, outer_path->parent->relids))
>return;
>
> in try_partial_nestloop_path only rejects special case where the inner path
> references relations outside the join, but still allows the outer path to have
> parameters outside.

Right.  We build a partial join path by joining a partial path on the
outer side with a non-partial path on the inner side.  If the join is
a nested loop, the inner side can be parameterized, but all of those
parameters have to be provided by the outer side; if not, we get the
nonsensical situation illustrated above.

> As for try_partial_hashjoin_path, the comment "If the inner path is
> parameterized ..." seems to be copy & pasted from try_partial_nestloop_path,
> but I think it does not fit hash join. From hash_inner_and_outer I judge that
> neither side of hash join can be parameterized by the other:
>
> /*
>  * If either cheapest-total path is parameterized by the other rel, we
>  * can't use a hashjoin.  (There's no use looking for alternative
>  * input paths, since these should already be the least-parameterized
>  * available paths.)
>  */
> if (PATH_PARAM_BY_REL(cheapest_total_outer, innerrel) ||
> PATH_PARAM_BY_REL(cheapest_total_inner, outerrel))
> return;
>
> Shouldn't try_partial_hashjoin_path and try_partial_nestloop_path do just the
> same checks that try_hashjoin_path and try_nestloop_path do respectively?

No, because there is no point in producing parameterized partial paths
for the reasons mentioned above.  I think you're right that the
comment in try_partial_hashjoin_path is not quite right.  

Re: [HACKERS] ICU integration

2017-02-09 Thread Peter Geoghegan
On Thu, Feb 9, 2017 at 7:58 AM, Peter Eisentraut
 wrote:
> I think I have this sorted out.  What I don't understand, however, is
> why varstr_abbrev_convert() makes a point of looping until the result of
> strxfrm() fits into the output buffer (buf2), when we only need 8 bytes,
> and we throw away the rest later.  Wouldn't it work to just request 8 bytes?

Maybe. We do that because strxfrm() is not required by the standard to
produce well defined contents for the buffer when the return value
indicates that it didn't fit entirely. This is a standard idiom, I
think.

> If there is a problem with just requesting 8 bytes, then I'm wondering
> how this would affect the ICU code branch.

This must be fine with ICU's ucol_nextSortKeyPart(), because it is
designed for the express purpose of producing only a few bytes of the
final blob at a time.

-- 
Peter Geoghegan




[HACKERS] Parameterization of partial path

2017-02-09 Thread Antonin Houska
When looking at try_partial_hashjoin_path and try_partial_nestloop_path
functions I'm failing to understand the comment "Parameterized partial paths
are not supported.".

It's clear to me that GatherPath can't have parameters because repeated
execution of parallel plan with adjusted parameters is not supported. But the
fact that generic partial path has parameters should not be a problem if those
parameters are satisfied below the nearest GatherPath node higher in the plan
tree. Am I missing something about the concept?

Actually this check

if (!bms_is_subset(inner_paramrels, outer_path->parent->relids))
   return;

in try_partial_nestloop_path only rejects the special case where the inner path
references relations outside the join, but still allows the outer path to have
parameters from outside the join.

As for try_partial_hashjoin_path, the comment "If the inner path is
parameterized ..." seems to be copy & pasted from try_partial_nestloop_path,
but I think it does not fit hash join. From hash_inner_and_outer I judge that
neither side of hash join can be parameterized by the other:

/*
 * If either cheapest-total path is parameterized by the other rel, we
 * can't use a hashjoin.  (There's no use looking for alternative
 * input paths, since these should already be the least-parameterized
 * available paths.)
 */
if (PATH_PARAM_BY_REL(cheapest_total_outer, innerrel) ||
PATH_PARAM_BY_REL(cheapest_total_inner, outerrel))
return;

Shouldn't try_partial_hashjoin_path and try_partial_nestloop_path do just the
same checks that try_hashjoin_path and try_nestloop_path do respectively?

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-02-09 Thread Robert Haas
On Thu, Jan 12, 2017 at 10:23 PM, Amit Kapila  wrote:
> On Fri, Jan 13, 2017 at 1:04 AM, Jesper Pedersen
>  wrote:
>> On 12/27/2016 01:58 AM, Amit Kapila wrote:
>>> After recent commit's 7819ba1e and 25216c98, this patch requires a
>>> rebase.  Attached is the rebased patch.
>>
>> This needs a rebase after commit e898437.
>
> Attached find the rebased patch.

Well, I've managed to break this again by committing more things.

I think it would be helpful if you broke this into a series of
preliminary refactoring patches and then a final patch that actually
adds WAL-logging. The things that look like preliminary refactoring to
me are:

- Adding _hash_pgaddmultitup and using it in various places.
- Adding and freeing overflow pages has been extensively reworked.
- Similarly, there is some refactoring of how bitmap pages get initialized.
- Index initialization has been rejiggered significantly.
- Bucket splits have been rejiggered.

Individually those changes don't look all that troublesome to review,
but altogether it's quite a lot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pageinspect: Hash index support

2017-02-09 Thread Robert Haas
On Thu, Feb 9, 2017 at 8:56 AM, Ashutosh Sharma  wrote:
>> If you want to verify that the supplied page number is an overflow
>> page before returning the bit, I think you should do that calculation
>> based on the metapage.  There's enough information in hashm_spares to
>> figure out which pages are primary bucket pages as opposed to overflow
>> pages, and there's enough information in hashm_mapp to identify which
>> pages are bitmap pages, and the metapage is always page 0.  If you
>> take that approach, then you can check a million bitmap bits reading
>> only the relevant bitmap pages plus the metapage, which is a LOT less
>> I/O than reading a million index pages.
>
> Thanks for the inputs. Attached is the patch modified as per your suggestions.

I think you should just tighten up the sanity checking in the existing
function _hash_ovflblkno_to_bitno rather than duplicating the code.  I
don't think it's called often enough for one extra (cheap) test to be
an issue.  (You should change the elog in that function to an ereport,
too, since it's going to be a user-facing error message now.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ICU integration

2017-02-09 Thread Peter Eisentraut
On 1/24/17 12:45 PM, Peter Eisentraut wrote:
> On 1/9/17 3:45 PM, Peter Geoghegan wrote:
>> * I think it's worth looking into ucol_nextSortKeyPart(), and using
>> that as an alternative to ucol_getSortKey(). It doesn't seem any
>> harder, and when I tested it it was clearly faster. (I think that
>> ucol_nextSortKeyPart() is more or less intended to be used for
>> abbreviated keys.)
> I will try to look into that.

I think I have this sorted out.  What I don't understand, however, is
why varstr_abbrev_convert() makes a point of looping until the result of
strxfrm() fits into the output buffer (buf2), when we only need 8 bytes,
and we throw away the rest later.  Wouldn't it work to just request 8 bytes?

If there is a problem with just requesting 8 bytes, then I'm wondering
how this would affect the ICU code branch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patch: Avoid precision error in to_timestamp().

2017-02-09 Thread Erik Nordström
Tom,

You are of course right that you cannot add precision that was not there to
begin with. My patch does nothing for input values that cannot be
represented accurately to begin with. However, that was not my main point.

The idea with the calculation is this: When you remove the integral/seconds
part of the value before converting to integral microseconds, you are
creating a number that can be represented with a float at higher accuracy
compared to the original value (i.e., simply because there are less digits
in the mantissa/significand). Thus when doing 0.312311172485352 *
USECS_PER_SEC, you suffer less precision-related errors with regards to
microseconds than when doing 14864803242.312311172485352 * USECS_PER_SEC as
in the original code. In other words, with this approach there are fewer
cases when 1 ULP is bigger than 1 microsecond.
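
The effect is reproducible outside the server with any IEEE-754 binary64
arithmetic. Here is a minimal sketch in Python (standing in for the C float8
code, with round() playing the role of rint()):

```python
USECS_PER_SEC = 1_000_000
seconds = 14864803242.312311   # nearest double is ...42.312311172485352

# Original code path: scale the whole value, then round.  At this
# magnitude the product exceeds 2^53, so the low microsecond digit
# is already lost before rounding.
direct = round(seconds * USECS_PER_SEC)

# Split approach: remove the integral seconds first, then scale only
# the small fractional part, which a double represents more finely.
ts_seconds = int(seconds)
ts_micros = round((seconds - ts_seconds) * USECS_PER_SEC)

print(direct % USECS_PER_SEC)   # 312312 -- off by one microsecond
print(ts_micros)                # 312311 -- matches the input
```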

FWIW, this is the output from latest postgres HEAD, which includes your
rint() patch:

postgres=# select to_timestamp(14864803242.312311);
 to_timestamp
---
 2441-01-17 09:00:42.312312+01
(1 row)

And this is the output with my patch:

postgres=# select to_timestamp(14864803242.312311);
 to_timestamp
---
 2441-01-17 09:00:42.312311+01
(1 row)


I don't know why you would get different output (that would be worrying in
itself).

In any case, I would prefer a to_timestamp(numeric), although I see no
reason not to make the float8 version as accurate as possible anyway.

-Erik



On Thu, Feb 9, 2017 at 3:56 PM, Tom Lane  wrote:

> Erik Nordström  writes:
> > Thanks for the insightful feedback. You are right, the patch does suffer
> > from overflow (and other possible issues when I think about it). Using
> > rint(), as you suggest, helps in my original example case, but there are
> > still cases when the output is not what you would expect. For instance,
> > converting the Unix time 14864803242.312311 gives back the timestamp
> > "2441-01-17 09:00:42.312312+01", even if using rint() (see below).
>
> Hm, that particular case works for me using the patch I committed
> yesterday (which just added a rint() call to the existing code).
> I'm a bit skeptical of the version you propose here because it seems
> mighty prone to subtractive cancellation.  You're basically computing
> x - int(x) which can't add any precision that wasn't there before.
> Looking at successive microsecond values in this range:
>
> regression=# select 14864803242.312310::float8 - 14864803242;
>  ?column?
> ---
>  0.312309265136719
> (1 row)
>
> regression=# select 14864803242.312311::float8 - 14864803242;
>  ?column?
> ---
>  0.312311172485352
> (1 row)
>
> regression=# select 14864803242.312312::float8 - 14864803242;
>  ?column?
> ---
>  0.312311172485352
> (1 row)
>
> regression=# select 14864803242.312313::float8 - 14864803242;
>  ?column?
> ---
>  0.312313079833984
> (1 row)
>
> Basically, 1 ULP in this range is more than 1 microsecond, so there are
> going to be places where you cannot get the right answer.  Reformulating
> the computation will just shift the errors to nearby values.  float8
> simply doesn't have enough bits to represent this many microseconds since
> 1970 exactly, and the problem's only going to get worse the further out
> you look.
>
> I think we might be better advised to add to_timestamp(numeric) alongside
> to_timestamp(float8).  There's plenty of precedent for that (e.g, exp(),
> ln()) so I would not expect problems with ambiguous function calls.
> It'd be slower though ...
>
> regards, tom lane
>


Re: [HACKERS] Patch: Avoid precision error in to_timestamp().

2017-02-09 Thread Erik Nordström
Hi Tom, and others,

Thanks for the insightful feedback. You are right, the patch does suffer
from overflow (and other possible issues when I think about it). Using
rint(), as you suggest, helps in my original example case, but there are
still cases when the output is not what you would expect. For instance,
converting the Unix time 14864803242.312311 gives back the timestamp
"2441-01-17 09:00:42.312312+01", even if using rint() (see below).

I guess precision-related errors are unavoidable when working with floating
point numbers (especially large ones). But I am wondering if it is not
desirable to avoid (or reduce) errors at least in the case when the input
value can be accurately represented to the microsecond by a float (i.e.,
when rounded to microsecond, as in printf)? For example, splitting up the
float into second and microsecond integers might help reduce errors,
especially with large numbers. Something like this:

ts_seconds = (int64)seconds;
ts_microseconds = (int64)rint((seconds - ts_seconds) * USECS_PER_SEC);
result = (ts_seconds - epoch_diff_seconds) * USECS_PER_SEC +
ts_microseconds;

Note that stripping off the full seconds before scaling the microsecond
fractional part will keep it more accurate, and it should solve the problem
for the case above, including overflow.

Or, maybe I am overthinking this and the only proper solution is to provide
a to_timestamp() that takes a microsecond integer? This certainly wouldn't
hurt in any case and makes sense since the timestamp is itself an integer
internally.

Anyway, I am attaching an updated version of the path. Below are some
example outputs (including min and max allowed input) from original
Postgres, with your rint() fix, and the included patch:

Original postgres:

test=# select to_timestamp(9224318015999);
  to_timestamp
-
 294277-01-01 00:59:58.999552+01
(1 row)

test=# select to_timestamp(-210866803200);
  to_timestamp
-
 4714-11-24 01:12:12+01:12:12 BC
(1 row)

test=# select to_timestamp(1486480324.236538);
 to_timestamp
---
 2017-02-07 16:12:04.236537+01
(1 row)

test=# select to_timestamp(14864803242.312311);
 to_timestamp
---
 2441-01-17 09:00:42.312312+01
(1 row)


With rint():

test=# select to_timestamp(9224318015999);
  to_timestamp
-
 294277-01-01 00:59:58.999552+01
(1 row)

test=# select to_timestamp(-210866803200);
  to_timestamp
-
 4714-11-24 01:12:12+01:12:12 BC
(1 row)

test=# select to_timestamp(1486480324.236538);
 to_timestamp
---
 2017-02-07 16:12:04.236538+01
(1 row)

test=# select to_timestamp(14864803242.312311);
 to_timestamp
---
 2441-01-17 09:00:42.312312+01
(1 row)


Included patch:

test=# select to_timestamp(9224318015999);
   to_timestamp
--
 294277-01-01 00:59:59+01
(1 row)

test=# select to_timestamp(-210866803200);
  to_timestamp
-
 4714-11-24 01:12:12+01:12:12 BC
(1 row)

test=# select to_timestamp(1486480324.236538);
 to_timestamp
---
 2017-02-07 16:12:04.236538+01
(1 row)

test=# select to_timestamp(14864803242.312311);
 to_timestamp
---
 2441-01-17 09:00:42.312311+01
(1 row)


--Erik


On Wed, Feb 8, 2017 at 9:52 PM, Tom Lane  wrote:

> I wrote:
> > I wonder if we could make things better just by using rint() rather than
> > a naive cast-to-integer.  The cast will truncate not round, and I think
> > that might be what's mostly biting you.  Does this help for you?
>
> > #ifdef HAVE_INT64_TIMESTAMP
> > - result = seconds * USECS_PER_SEC;
> > + result = rint(seconds * USECS_PER_SEC);
> > #else
>
> I poked around looking for possible similar issues elsewhere, and found
> that a substantial majority of the datetime-related code already uses
> rint() when trying to go from float to int representations.  As far as
> I can find, this function and make_interval() are the only ones that
> fail to do so.  So I'm now thinking that this is a clear oversight,
> and both those places need to be patched to use rint().
>
> regards, tom lane
>


to_timestamp_fix_2.patch
Description: Binary data



Re: [HACKERS] Patch: Avoid precision error in to_timestamp().

2017-02-09 Thread Tom Lane
Erik Nordström  writes:
> Thanks for the insightful feedback. You are right, the patch does suffer
> from overflow (and other possible issues when I think about it). Using
> rint(), as you suggest, helps in my original example case, but there are
> still cases when the output is not what you would expect. For instance,
> converting the Unix time 14864803242.312311 gives back the timestamp
> "2441-01-17 09:00:42.312312+01", even if using rint() (see below).

Hm, that particular case works for me using the patch I committed
yesterday (which just added a rint() call to the existing code).
I'm a bit skeptical of the version you propose here because it seems
mighty prone to subtractive cancellation.  You're basically computing
x - int(x) which can't add any precision that wasn't there before.
Looking at successive microsecond values in this range:

regression=# select 14864803242.312310::float8 - 14864803242;
 ?column?  
---
 0.312309265136719
(1 row)

regression=# select 14864803242.312311::float8 - 14864803242;
 ?column?  
---
 0.312311172485352
(1 row)

regression=# select 14864803242.312312::float8 - 14864803242;
 ?column?  
---
 0.312311172485352
(1 row)

regression=# select 14864803242.312313::float8 - 14864803242;
 ?column?  
---
 0.312313079833984
(1 row)

Basically, 1 ULP in this range is more than 1 microsecond, so there are
going to be places where you cannot get the right answer.  Reformulating
the computation will just shift the errors to nearby values.  float8
simply doesn't have enough bits to represent this many microseconds since
1970 exactly, and the problem's only going to get worse the further out
you look.
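
The size of 1 ULP in this range is easy to verify (math.ulp reports the
spacing of binary64 doubles, the same representation as float8):

```python
import math

x = 14864803242.312311

# x lies between 2^33 and 2^34, so adjacent doubles here are
# 2^(33-52) = 2^-19 seconds apart -- almost two microseconds.
print(math.ulp(x))           # 1.9073486328125e-06
print(math.ulp(x) > 1e-6)    # True
```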

I think we might be better advised to add to_timestamp(numeric) alongside
to_timestamp(float8).  There's plenty of precedent for that (e.g, exp(),
ln()) so I would not expect problems with ambiguous function calls.
It'd be slower though ...

regards, tom lane




Re: [HACKERS] One-shot expanded output in psql using \gx

2017-02-09 Thread David Fetter
On Thu, Feb 09, 2017 at 11:12:00AM +0100, Christoph Berg wrote:
> Re: David Fetter 2017-02-08 <20170208151214.ga8...@fetter.org>
> > Would you be open to saving the next person some work by doing
> > something similar to how \d is done, namely looking for an 'x'
> > modifier after g without regard to how far after?  As of this writing,
> > the \d version starts at line 398 in master.
> 
> I think that ship has sailed, given there's already \gset.
> Supporting \g[optionset] next to it *now*, given no one knows how
> "optionset" is going to evolve seems like premature optimization.

Good point.  Thanks for looking it over.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] pageinspect: Hash index support

2017-02-09 Thread Ashutosh Sharma
On Wed, Feb 8, 2017 at 11:26 PM, Robert Haas  wrote:
> On Wed, Feb 8, 2017 at 11:58 AM, Ashutosh Sharma  
> wrote:
>>> And then, instead, you need to add some code to set bit based on the
>>> bitmap page, like what you have:
>>>
>>> +mapbuf = _hash_getbuf(indexRel, bitmapblkno, HASH_READ, 
>>> LH_BITMAP_PAGE);
>>> +mappage = BufferGetPage(mapbuf);
>>> +freep = HashPageGetBitmap(mappage);
>>> +
>>> +if (ISSET(freep, bitmapbit))
>>> +bit = 1;
>>>
>>> Except I would write that last line as...
>>>
>>> bit = ISSET(freep, bitmapbit) != 0
>>>
>>> ...instead of using an if statement.
>>>
>>> I'm sort of confused as to why the idea of not reading the underlying
>>> page is so hard to understand.
>>
>> It was not so hard to understand your point. The only reason for
>> reading overflow page is to ensure that we are passing an overflow
>> block as input to '_hash_ovflblkno_to_bitno(metap, (BlockNumber)
>> ovflblkno)'. I had removed the code for reading an overflow page
>> assuming that _hash_ovflblkno_to_bitno() will throw an error if we
>> pass the block number of a page other than an overflow page, but it doesn't
>> seem to guarantee that it will return a value only for an overflow
>> page. Here are my observations:
>>
>> postgres=# select hash_page_type(get_raw_page('con_hash_index', 65));
>>  hash_page_type
>> 
>>  bucket
>> (1 row)
>>
>> postgres=# SELECT * FROM
>> hash_bitmap_info(get_raw_page('con_hash_index', 0), 'con_hash_index',
>> 65);
>>  bitmapblkno | bitmapbit | bitstatus
>> -+---+---
>>   33 | 0 | t
>> (1 row)
>>
>> postgres=# select hash_page_type(get_raw_page('con_hash_index', 64));
>>  hash_page_type
>> 
>>  bucket
>> (1 row)
>>
>> postgres=# SELECT * FROM
>> hash_bitmap_info(get_raw_page('con_hash_index', 0), 'con_hash_index',
>> 64);
>> ERROR:  invalid overflow block number 64
>>
>> postgres=# select hash_page_type(get_raw_page('con_hash_index', 63));
>>  hash_page_type
>> 
>>  bucket
>> (1 row)
>>
>> postgres=# SELECT * FROM
>> hash_bitmap_info(get_raw_page('con_hash_index', 0), 'con_hash_index',
>> 63);
>> ERROR:  invalid overflow block number 63
>>
>> postgres=# select hash_page_type(get_raw_page('con_hash_index', 33));
>>  hash_page_type
>> 
>>  bitmap
>> (1 row)
>>
>> postgres=# SELECT * FROM
>> hash_bitmap_info(get_raw_page('con_hash_index', 0), 'con_hash_index',
>> 33);
>>  bitmapblkno | bitmapbit | bitstatus
>> -+---+---
>>   33 | 0 | t
>> (1 row)
>
> Right, I understand.  But if the caller cares about that, they can use
> hash_page_type() in order to find out whether the result of
> hash_bitmap_info() will be meaningful.  The problem with the way
> you've done it is that if someone wants to check the status of a
> million bitmap bits, that currently requires reading a million pages
> (8GB) whereas if you don't read the underlying page it requires
> reading only enough pages to contain a million bitmap bits (~128kB).
> That's a big difference.
>
> If you want to verify that the supplied page number is an overflow
> page before returning the bit, I think you should do that calculation
> based on the metapage.  There's enough information in hashm_spares to
> figure out which pages are primary bucket pages as opposed to overflow
> pages, and there's enough information in hashm_mapp to identify which
> pages are bitmap pages, and the metapage is always page 0.  If you
> take that approach, then you can check a million bitmap bits reading
> only the relevant bitmap pages plus the metapage, which is a LOT less
> I/O than reading a million index pages.
>

Thanks for the inputs. Attached is the patch modified as per your suggestions.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


simplify_hash_bitmap_info_v2.patch
Description: application/download



Re: [HACKERS] logical decoding of two-phase transactions

2017-02-09 Thread Stas Kelvich

> On 31 Jan 2017, at 12:22, Craig Ringer  wrote:
> 
> Personally I don't think lack of access to the GID justifies blocking 2PC 
> logical decoding. It can be added separately. But it'd be nice to have 
> especially if it's cheap.

Agreed.

> On 2 Feb 2017, at 00:35, Craig Ringer  wrote:
> 
> Stas was concerned about what happens in logical decoding if we crash between 
> PREPARE TRANSACTION and COMMIT PREPARED. But we'll always go back and decode 
> the whole txn again anyway so it doesn't matter.

Not exactly. It seems that in previous discussions we were not on the same 
page, probably due to unclear arguments by me.

From my point of view there are no problems (or at least no new problems 
compared to ordinary 2PC) with preparing transactions on slave servers with 
something like “#{xid}#{node_id}” instead of the GID, if the issuing node is 
the coordinator of that transaction. In case of failure, restart, or crash we 
have the same options for deciding what to do with uncommitted transactions.

My concern is about the situation with an external coordinator. That scenario is 
quite important for users of Postgres native 2PC, notably J2EE users.  Suppose 
the user (or their framework) issues “prepare transaction ‘mytxname’;” to servers 
with ordinary synchronous physical replication. If the master crashes and a 
replica is promoted, the user can reconnect to it and commit/abort that 
transaction using their GID. It is unclear to me how to achieve the same 
behaviour with logical replication of 2PC without the GID in the commit record. 
If we prepare with “#{xid}#{node_id}” on acceptor nodes, then if the donor node 
crashes we lose the mapping between the user’s GID and our internal GID; 
conversely, we can prepare with the user’s GID on the acceptors, but then we 
will not know that GID on the donor when decoding the commit (by the time 
decoding happens, all in-memory state is already gone and we cannot map our 
xid back to the GID).

I performed some tests to understand the real impact on WAL size. I compared 
postgres master with wal_level = logical against a patched postgres where GIDs 
are stored inside the commit record too, after 3M 2PC transactions, testing 
with 194-byte and 6-byte GIDs. (The GID max size is 200 bytes.)

-master, 6-byte GID after 3M transaction: pg_current_xlog_location = 0/9572CB28
-patched, 6-byte GID after 3M transaction: pg_current_xlog_location = 0/96C442E0

so with 6-byte GIDs the difference in WAL size is less than 1%

-master, 194-byte GID after 3M transaction: pg_current_xlog_location = 
0/B7501578
-patched, 194-byte GID after 3M transaction: pg_current_xlog_location = 
0/D8B43E28

and with 194-byte GIDs the difference in WAL size is about 18%

So using big GIDs (as J2EE does) can cause notable WAL bloat, while small 
GIDs are almost unnoticeable.
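
For reference, these percentages can be recomputed from the reported
pg_current_xlog_location values (assuming both runs started writing WAL from
the same position, so the ratio of the two locations reflects the ratio of
WAL volume):

```python
def lsn(s):
    """Parse an 'X/Y' WAL location into a byte position."""
    hi, lo = s.split('/')
    return (int(hi, 16) << 32) | int(lo, 16)

ratio_small = lsn('0/96C442E0') / lsn('0/9572CB28')   # 6-byte GIDs
ratio_large = lsn('0/D8B43E28') / lsn('0/B7501578')   # 194-byte GIDs

print(round((ratio_small - 1) * 100, 1))   # 0.9  -- under 1%
print(round((ratio_large - 1) * 100, 1))   # 18.2 -- about 18%
```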

Maybe we can introduce a configuration option track_commit_gid, by analogy with 
track_commit_timestamp, and make that behaviour optional? Any objections to that?

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] pg_basebackup -R

2017-02-09 Thread Amit Kapila
On Wed, Feb 8, 2017 at 11:38 PM, Robert Haas  wrote:
> I just tried out pg_basebackup -R and got the following recovery.conf file:
>
> standby_mode = 'on'
> primary_conninfo = 'user=rhaas passfile=''/home/rhaas/.pgpass''
> port=5432 sslmode=disable sslcompression=1 target_session_attrs=any'
>
> This seems fairly random to me.  pg_basebackup explains:
>
>  * Do not emit this setting if: - the setting is 
> "replication",
>  * "dbname" or "fallback_application_name", since these would 
> be
>  * overridden by the libpqwalreceiver module anyway. -
> not set or
>  * empty.
>
> ...which looks like it got clubbed by pgindent, but anyway I'm not
> sure that's the best algorithm.  passfile and target_session_attrs are
> both new in v10 and have non-empty default values, yet neither of them
> seems like something that you necessarily want in your
> automatically-created recovery.conf file.  It seems like it would be
> more correct to leave out parameters that are set to the default
> value, rather than those that are set to an empty value.
>

+1.  Sounds like a sensible thing to do.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Latch reset ordering bug in condition_variable.c

2017-02-09 Thread Thomas Munro
Hi,

ConditionVariablePrepareToSleep() has a race that can leave you
hanging, introduced by me in the v4 patch.  The problem is that that
function resets our latch after adding our process to the wakeup list.
With the right timing, the following sequence can happen:

1.  ConditionVariablePrepareToSleep() adds us to the wakeup list.
2.  Some other process calls ConditionVariableSignal().  It removes us
from the wakeup list and sets our latch.
3.  ConditionVariablePrepareToSleep() resets our latch.
4.  We enter (or continue) our predicate loop.  Our exit condition
happens not to be true yet, so we call ConditionVariableSleep().
5.  ConditionVariableSleep() never returns because WaitEventSetWait()
blocks.  Our latch is not set, yet we are no longer in the wakeup list,
so ConditionVariableSignal() will never set it.

We should reset the latch first.  Then there is no way to reach
ConditionVariableSleep() with neither a set latch nor an entry in the
wakeup queue.

See attached.  Thoughts?
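The interleaving can be modeled with a toy state machine (a hypothetical illustration, not the real latch/condition-variable code), showing why resetting the latch before joining the wakeup list closes the race:

```c
#include <stdbool.h>

/* Toy model of one backend's wakeup state. */
typedef struct
{
	bool		latch_set;
	bool		in_wakeup_list;
} ToyBackend;

/* Model of ConditionVariableSignal() from another process: remove the
 * backend from the wakeup list and set its latch. */
static void
toy_signal(ToyBackend *b)
{
	if (b->in_wakeup_list)
	{
		b->in_wakeup_list = false;
		b->latch_set = true;
	}
}

/* Buggy ordering: enqueue first, reset the latch afterwards.  A signal
 * delivered between the two steps is lost -- afterwards the backend is
 * neither queued nor has a set latch, so it sleeps forever. */
bool
buggy_prepare_then_signal(void)
{
	ToyBackend	b = {false, false};

	b.in_wakeup_list = true;	/* step 1: add to wakeup list */
	toy_signal(&b);				/* concurrent ConditionVariableSignal() */
	b.latch_set = false;		/* step 3: reset latch -- wakeup lost */
	return b.latch_set || b.in_wakeup_list; /* can we still wake up? */
}

/* Fixed ordering: reset the latch before enqueueing, so any signal that
 * arrives later leaves either the latch set or the queue entry intact. */
bool
fixed_prepare_then_signal(void)
{
	ToyBackend	b = {false, false};

	b.latch_set = false;		/* step 1: reset latch */
	b.in_wakeup_list = true;	/* step 2: add to wakeup list */
	toy_signal(&b);				/* concurrent signal now sets the latch */
	return b.latch_set || b.in_wakeup_list;
}
```

In the buggy ordering the backend ends up with no latch and no queue entry; in the fixed ordering the invariant "set latch or queued" always holds on entry to the sleep loop.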

-- 
Thomas Munro
http://www.enterprisedb.com


fix-condition-variable-race.patch
Description: Binary data



Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-09 Thread Amit Kapila
On Thu, Feb 9, 2017 at 12:36 PM, Peter Geoghegan  wrote:
> On Tue, Feb 7, 2017 at 1:31 AM, Mithun Cy  wrote:
>> SEGFAULT was the coding mistake I have called the C-language function
>> directly without initializing the functioncallinfo. Thanks for
>> raising. Below patch fixes same.
>
> It would be nice if this had an option to preload only internal B-tree
> pages into shared_buffers.
>

Sure, but I think it won't directly fit into the current functionality
of the patch, which loads the blocks that were dumped from shared buffers
before the server was stopped.  One way to extend it could be to have
the dump step emit just the btree internal pages, or something like
that.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Parallel Index Scans

2017-02-09 Thread Amit Kapila
On Thu, Feb 9, 2017 at 12:08 PM, Peter Geoghegan  wrote:
> On Wed, Feb 8, 2017 at 10:33 PM, Amit Kapila  wrote:
>> I had some offlist discussion with Robert about the above point and we
>> feel that keeping only heap pages for parallel computation might not
>> be future proof as for parallel index only scans there might not be
>> any heap pages.  So, it is better to use separate GUC for parallel
>> index (only) scans.  We can have two guc's
>> min_parallel_table_scan_size (8MB) and min_parallel_index_scan_size
>> (512kB) for computing parallel scans.  The parallel sequential scan
>> and parallel bitmap heap scans can use min_parallel_table_scan_size as
>> a threshold to compute parallel workers as we are doing now.  For
>> parallel index scans, both min_parallel_table_scan_size and
>> min_parallel_index_scan_size can be used for threshold;  We can
>> compute parallel workers both based on heap_pages to be scanned and
>> index_pages to be scanned and then keep the minimum of those.  This
>> will help us to engage parallel index scans when the index pages are
>> lower than threshold but there are many heap pages to be scanned and
>> will also allow keeping a maximum cap on the number of workers based
>> on index scan size.
>
> What about parallel CREATE INDEX? The patch currently uses
> min_parallel_relation_size as an input into the optimizer's custom
> cost model. I had wondered if that made sense. Note that another such
> input is the projected size of the final index.
>

If the projected index size is available, then I think CREATE INDEX can
also use a somewhat similar formula, capping the maximum number of
workers based on the size of the index.  However, I am not sure if the
threshold values of the GUCs chosen for scans are realistic for a CREATE
INDEX operation.
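The proposed two-threshold heuristic could be sketched roughly like this (an illustration assuming the usual one-worker-per-tripling scaling; the function names and constants are hypothetical, not committed code): compute a worker count from the heap side and the index side separately, then take the minimum.

```c
/* Hypothetical sketch: derive a worker count from a page count and a
 * minimum-scan-size threshold -- one worker at the threshold, one more
 * for each tripling of the relation size. */
static int
workers_for(long pages, long min_scan_pages)
{
	int			workers;
	long		threshold;

	if (pages < min_scan_pages)
		return 0;				/* too small to parallelize */

	workers = 1;
	threshold = min_scan_pages * 3;
	while (pages >= threshold)
	{
		workers++;
		threshold *= 3;
	}
	return workers;
}

/* Combine both thresholds: the smaller of the heap-based and
 * index-based counts caps the number of workers. */
int
parallel_index_scan_workers(long heap_pages, long index_pages,
							long min_table_pages, long min_index_pages)
{
	int			heap_workers = workers_for(heap_pages, min_table_pages);
	int			index_workers = workers_for(index_pages, min_index_pages);

	return heap_workers < index_workers ? heap_workers : index_workers;
}
```

For example, with min_parallel_table_scan_size at 1024 pages (8MB of 8kB pages) and min_parallel_index_scan_size at 64 pages (512kB), a scan touching 9216 heap pages but only 64 index pages is capped at one worker by the index side, while either side falling below its threshold disables parallelism entirely.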


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] One-shot expanded output in psql using \gx

2017-02-09 Thread Christoph Berg
Re: David Fetter 2017-02-08 <20170208151214.ga8...@fetter.org>
> Would you be open to saving the next person some work by doing
> something similar to how \d is done, namely looking for an 'x'
> modifier after g without regard to how far after?  As of this writing,
> the \d version starts at line 398 in master.

I think that ship has sailed, given there's already \gset.
Supporting \g[optionset] next to it *now*, given that no one knows how
"optionset" is going to evolve, seems like premature optimization.

Mit freundlichen Grüßen,
Christoph Berg
-- 
Senior Berater, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE




Re: [HACKERS] pg_bsd_indent: implement -lps ("leave preprocessor space")

2017-02-09 Thread Antonin Houska
Andres Freund  wrote:

> On 2017-02-07 23:30:44 -0500, Tom Lane wrote:
> > Piotr Stefaniak  writes:
> > > this is a patch that Andres asked me for. It makes pg_bsd_indent leave
> > > preprocessor space alone, as in this example:
> > 
> > > #if 0
> > > #  if 0
> > > # if 0
> > > #   error
> > > # endif
> > > #  endif
> > > #else
> > > #  line 7
> > > #endif
> 
> For context: I'd asked Piotr how difficult it'd be to add this.
> 
> > Um ... but the point of pgindent is to standardize spacing, not to let
> > people invent their own style. If you wanted to have a discussion about
> > whether pgindent should force preprocessor directives to look like the
> > above, we could talk about that.  But I do not want to be reading code that
> > looks like the above in one place and code that does not ten lines away.
> 
> I don't think that's something easily done in an automatic manner,
> because you'd obviously not want to indent everything within include
> guards, for example.  I don't really buy the danger of large
> divergences in code nearby - this seems mostly useful when writing
> somewhat more complicated macros, and I don't think they'll be added
> that frequently in existing files.
> 
> I do think this makes the nesting for #ifdefs a *lot* more readable, and
> we have plenty of cases where it's currently really hard to discern which
> "branch" one is currently reading.  Allowing an opt-in to the "newer"
> formatting in places where it makes sense seems reasonable to me.

As an alternative approach, the formatting could be implemented in elisp and
added to src/tools/editors/emacs.samples.  In particular, I mean one function
to make the code human-readable and another to turn it back into the concise
style.  The user would only call the functions on a text selection (region),
as opposed to the whole file.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2017-02-09 Thread Andres Freund
Hi,

On 2016-11-22 10:56:07 -0800, Peter Geoghegan wrote:
> Andres said: "Theoretically we should recheck that the oids still
> match, theoretically the locking here allows the index->table mapping
> to change". I don't know how to act on this feedback, since comparable
> index + heap locking code should have the same problem, but doesn't
> consider it. How is this any different to reindex_index()?

reindex_index isn't necessarily comparable, because
RangeVarCallbackForReindexIndex() is, on first blush at least, careful
enough about the locking, and reindex_relation only gets the index list
after building the index:
/*
 * Find and lock index, and check permissions on table; use callback to
 * obtain lock on table first, to avoid deadlock hazard.  The lock level
 * used here must match the index lock obtained in reindex_index().
 */

I think you ought to do something similar.


I'd really like to have something like relation_check() and index_check(),
which call the correct functions for the relevant index types (and
possibly warn about !nbtree indexes not being checked?).


> @@ -0,0 +1,1227 @@
> +/*-
> + *
> + * verify_nbtree.c
> + *   Verifies the integrity of nbtree indexes based on invariants.
> + *
> + * For B-Tree indexes, verification includes the invariant that each page in
> + * the target index has items in logical order as reported by an insertion
> + * scankey (the insertion scankey sort-wise NULL semantics are needed for
> + * verification).

I'd appreciate some rephrasing of this paragraph. "includes the
invariant that" isn't that easy to understand for an ESL person.


> + *
> + * Copyright (c) 2016, PostgreSQL Global Development Group

... ;)


> +#define CHECK_SUPERUSER() { \
> +	if (!superuser()) \
> +		ereport(ERROR, \
> +				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), \
> +				 (errmsg("must be superuser to use verification functions")))); }

Please make this either a function or just copy - the macro doesn't buy
us anything except weirder looking syntax.


> +/*
> + * Callers to verification functions should never receive a false positive
> + * indication of corruption.  Therefore, when using amcheck functions for
> + * stress testing, it may be useful to temporarily change the CORRUPTION
> + * elevel to PANIC, to immediately halt the server in the event of detecting
> + * an invariant condition violation.  This may preserve more information
> + * about the nature of the underlying problem.  Note that modifying the
> + * CORRUPTION constant to be an elevel < ERROR is not well tested.
> + */
> +#ifdef NOT_USED
> +#define CORRUPTION   PANIC
> +#define CONCERN  WARNING
> +#define POSITION NOTICE
> +#else
> +#define CORRUPTION   ERROR
> +#define CONCERN  DEBUG1
> +#define POSITION DEBUG2
> +#endif

Please remove this block and use the normal elevels.



> +/*
> + * State associated with verifying a B-Tree index
> + */
> +typedef struct BtreeCheckState
> +{
> + /*
> +  * Unchanging state, established at start of verification:
> +  *
> +  * rel: B-Tree Index Relation
> +  * exclusivelock:   ExclusiveLock held on rel; else AccessShareLock
> +  * checkstrategy:   Buffer access strategy
> +  * targetcontext:   Target page memory context
> +  */
> + Relationrel;
> + boolexclusivelock;
> + BufferAccessStrategycheckstrategy;
> + MemoryContext   targetcontext;

Can you move the field comments to the individual fields?  Separating
them makes them more likely to get out of date (obviously keep the
header).  I'd also rename 'exclusivelock' to 'exclusivelylocked' or
such.


> + /*
> +  * Mutable state, for verification of particular page:
> +  *
> +  * target:  Main target page
> +  * targetblock: Main target page's block number
> +  * targetlsn:   Main target page's LSN

Same here.


> +PG_FUNCTION_INFO_V1(bt_index_check);
> +PG_FUNCTION_INFO_V1(bt_index_parent_check);

I think it'd be good to avoid adding a new function for each
additional type of check we can think of.  I think adding a named 'bool
check_parents = false' argument or such would be better.  Then we can
simply add more and more checks subsequently, and we can have
'bool all_checks = false' so people can trivially check everything,
instead of having to add a new function for each new check.
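The shape of that interface might look like this (names and flag values purely hypothetical), mapping boolean arguments onto an internal bitmask so new checks can be added without new SQL-callable functions:

```c
#include <stdbool.h>

/* Hypothetical check flags for a single amcheck entry point. */
#define BT_CHECK_ORDER		0x01	/* items in order on each page */
#define BT_CHECK_PARENTS	0x02	/* parent/child invariants */
#define BT_CHECK_ALL		(BT_CHECK_ORDER | BT_CHECK_PARENTS)

/* Hypothetical sketch: translate the named boolean arguments of one
 * SQL function into the set of checks to run.  Adding a new check only
 * means adding a flag and an argument, not a new function. */
int
bt_index_check_flags(bool check_parents, bool all_checks)
{
	int			flags = BT_CHECK_ORDER; /* always performed */

	if (all_checks)
		return BT_CHECK_ALL;
	if (check_parents)
		flags |= BT_CHECK_PARENTS;
	return flags;
}
```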


> +/*
> + * bt_index_check(index regclass)
> + *
> + * Verify integrity of B-Tree index.
> + *
> + * Only acquires AccessShareLock on index relation.  Does not consider
> + * invariants that exist between parent/child pages.
> + */
>