Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-03-12 Thread Ashutosh Sharma
Hi,

I had a look at this patch and would like to share some review
comments that require the author's attention.

1) The comment for page_checksum() needs to be corrected. It seems to
have been copied from page_header() and not edited further.

/*
 * page_header
 *
 * Allows inspection of page header fields of a raw page
 */

PG_FUNCTION_INFO_V1(page_checksum);

Datum
page_checksum(PG_FUNCTION_ARGS)

2) It seems like you have chosen the wrong datatype for page_checksum. I
get a negative checksum value when running the query below. I think the
return type of the SQL function page_checksum should be 'integer'
instead of 'smallint'.

postgres=# SELECT page_checksum(get_raw_page('pg_class', 0), 100);
 page_checksum
---
-19532
(1 row)

The above problem also exists in the case of page_header; I see a
similar negative value there as well.

postgres=# SELECT page_header(get_raw_page('pg_class', 0));
 page_header
-
 (0/2891538,-28949,1,220,7216,8192,8192,4,0)
(1 row)
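
For what it's worth, until the return type is changed, the value can at
least be displayed as an unsigned 16-bit number at the SQL level. A quick
sketch, assuming the patch's current smallint signature:

-- reinterpret the signed smallint checksum as an unsigned 16-bit value
SELECT page_checksum(get_raw_page('pg_class', 0), 100)::integer & 65535
   AS checksum_unsigned;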


3) Any reason for not marking bt_page_items or page_checksum as PARALLEL SAFE?
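
For reference, the current markings can be checked directly in the catalog
once the extension is installed ('s' = safe, 'r' = restricted, 'u' = unsafe):

SELECT proname, proparallel
FROM pg_proc
WHERE proname IN ('page_header', 'page_checksum', 'bt_page_items');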

4) Error messages in the new bt_page_items are not consistent with the
old bt_page_items. For example, if I pass the meta page block number as
input to bt_page_items, the error messages thrown by the new and old
bt_page_items differ.

postgres=# SELECT * FROM bt_page_items('btree_index', 0) LIMIT 1;
ERROR:  block 0 is a meta page

postgres=# SELECT * FROM bt_page_items(get_raw_page('btree_index', 0)) LIMIT 1;
ERROR:  block is a meta page


postgres=# SELECT * FROM bt_page_items(get_raw_page('btree_index',
1024)) LIMIT 1;
ERROR:  block number 1024 is out of range for relation "btree_index"


postgres=# SELECT * FROM bt_page_items('btree_index', 1024) LIMIT 1;
ERROR:  block number out of range

5) Code duplication in bt_page_items() and bt_page_items_bytea() needs
to be handled.
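
One way to handle it, along the lines of Peter's suggestion quoted below,
would be a thin SQL-level wrapper over the bytea variant. A rough sketch
only -- the OUT-column list is assumed to match the existing bt_page_items
definition:

CREATE OR REPLACE FUNCTION bt_page_items(relname text, blkno int4,
OUT itemoffset smallint, OUT ctid tid, OUT itemlen smallint,
OUT nulls bool, OUT vars bool, OUT data text)
RETURNS SETOF record
AS $$ SELECT * FROM bt_page_items(get_raw_page($1, $2)) $$
LANGUAGE SQL STRICT;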

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Tue, Mar 7, 2017 at 9:39 PM, Peter Eisentraut
 wrote:
> On 3/6/17 16:33, Tomas Vondra wrote:
>>> I think it would be better not to maintain so much duplicate code
>>> between bt_page_items(text, int) and bt_page_items(bytea).  How about
>>> just redefining bt_page_items(text, int) as an SQL-language function
>>> calling bt_page_items(get_raw_page($1, $2))?
>>>
>>
>> Maybe. We can also probably share the code at the C level, so I'll look
>> into that.
>
> I think SQL would be easier, but either way some reduction in
> duplication would be good.
>
>>> For page_checksum(), the checksums are internally unsigned, so maybe
>>> return type int on the SQL level, so that there is no confusion about
>>> the sign?
>>>
>>
>> That ship already sailed, I'm afraid. We already have checksum in the
>> page_header() output, and it's defined as smallint there. So using int
>> here would be inconsistent.
>
> OK, no worries then.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-12 Thread Ashutosh Sharma
Hi,

>
> I've assigned to review this patch.
> At first, I'd like to notice that I like idea and general design.
> Secondly, patch set don't apply cleanly to master.  Please, rebase it.


Thanks for showing your interest in this patch. Please note that this
patch depends on the patches for 'Write Ahead Logging in hash index'
[1] and 'Microvacuum support in hash index' [2]. Hence, until those two
patches become stable I may have to keep rebasing this patch. However,
I will try to share the updated patch with you asap.

>
>
> On Tue, Feb 14, 2017 at 8:27 AM, Ashutosh Sharma  
> wrote:
>>
>> 1) 0001-Rewrite-hash-index-scans-to-work-a-page-at-a-time.patch: this
>> patch rewrites the hash index scan module to work in page-at-a-time
>> mode. It basically introduces two new functions-- _hash_readpage() and
>> _hash_saveitem(). The former is used to load all the qualifying tuples
>> from a target bucket or overflow page into an items array. The latter
>> one is used by _hash_readpage to save all the qualifying tuples found
>> in a page into an items array. Apart from that, this patch basically
>> cleans up _hash_first(), _hash_next() and hashgettuple().
>
>
> I see that the forward and backward scan cases of _hash_readpage contain
> a lot of code duplication.
> Could you please refactor this function to reduce the duplication?

Sure, I will try to avoid the code duplication as much as possible.

>
> Also, I wonder if you have a special idea behind inserting data in test.sql 
> by 1002 separate SQL statements.
> INSERT INTO con_hash_index_table (keycol) SELECT a FROM GENERATE_SERIES(1, 
> 1000) a;
>
> You can achieve the same result by execution of single SQL statement.
> INSERT INTO con_hash_index_table (keycol) SELECT (a - 1) % 1000 + 1 FROM 
> GENERATE_SERIES(1, 1002000) a;
> Unless you have some special idea of doing this in 1002 separate transactions.


There is no reason for having so many INSERT statements in the test.sql
file. I think it would be better to replace them with a single SQL
statement. Thanks.

[1]- 
https://www.postgresql.org/message-id/CAA4eK1KibVzgVETVay0%2BsiVEgzaXnP5R21BdWiK9kg9wx2E40Q%40mail.gmail.com
[2]- 
https://www.postgresql.org/message-id/CAE9k0PkRSyzx8dOnokEpUi2A-RFZK72WN0h9DEMv_ut9q6bPRw%40mail.gmail.com


--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in get_partition_for_tuple

2017-03-12 Thread Amit Langote
Hi Jeevan,

On 2017/03/13 14:31, Jeevan Ladhe wrote:
> Hi Amit,
> 
> I was able to reproduce the crash, and with the attached patch the crash
> goes
> away. Also, "make check-world" passes clean.
> 
> Patch looks good to me.

Thanks for the review.

> However, in the following comment in your test:
> 
> -- check routing error through a list partitioned table when they key is
> null
> 
> I think you want to say:
> 
> -- check routing error through a list partitioned table when the key is null

You're right, fixed that in the attached updated patch.

Thanks,
Amit
>From dbdb0f8f5205a3abd4691c00febf196b853df43f Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 10 Mar 2017 11:53:41 +0900
Subject: [PATCH] Fix a bug in get_partition_for_tuple

Handle the NULL partition key more carefully in the case of list
partitioning.
---
 src/backend/catalog/partition.c  | 10 +++---
 src/test/regress/expected/insert.out |  7 +++
 src/test/regress/sql/insert.sql  |  6 ++
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index e01ef864f0..a5ba7448eb 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1726,10 +1726,14 @@ get_partition_for_tuple(PartitionDispatch *pd,
 		errmsg("range partition key of row contains null")));
 		}
 
-		if (partdesc->boundinfo->has_null && isnull[0])
-			/* Tuple maps to the null-accepting list partition */
+		/*
+		 * A null partition key is only acceptable if null-accepting list
+		 * partition exists.
+		 */
+		cur_index = -1;
+		if (isnull[0] && partdesc->boundinfo->has_null)
 			cur_index = partdesc->boundinfo->null_index;
-		else
+		else if (!isnull[0])
 		{
 			/* Else bsearch in partdesc->boundinfo */
 			bool		equal = false;
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 116854e142..7fafa98212 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -365,6 +365,13 @@ DETAIL:  Failing row contains (1, 2).
 insert into mlparted1 (a, b) values (2, 3);
 ERROR:  new row for relation "mlparted11" violates partition constraint
 DETAIL:  Failing row contains (3, 2).
+-- check routing error through a list partitioned table when the key is null
+create table lparted_nonullpart (a int, b char) partition by list (b);
+create table lparted_nonullpart_a partition of lparted_nonullpart for values in ('a');
+insert into lparted_nonullpart values (1);
+ERROR:  no partition of relation "lparted_nonullpart" found for row
+DETAIL:  Partition key of the failing row contains (b) = (null).
+drop table lparted_nonullpart;
 -- check that RETURNING works correctly with tuple-routing
 alter table mlparted drop constraint check_b;
 create table mlparted12 partition of mlparted1 for values from (5) to (10);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index c56c3c22f8..f9c00705a2 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -226,6 +226,12 @@ insert into mlparted values (1, 2);
 -- selected by tuple-routing
 insert into mlparted1 (a, b) values (2, 3);
 
+-- check routing error through a list partitioned table when the key is null
+create table lparted_nonullpart (a int, b char) partition by list (b);
+create table lparted_nonullpart_a partition of lparted_nonullpart for values in ('a');
+insert into lparted_nonullpart values (1);
+drop table lparted_nonullpart;
+
 -- check that RETURNING works correctly with tuple-routing
 alter table mlparted drop constraint check_b;
 create table mlparted12 partition of mlparted1 for values from (5) to (10);
-- 
2.11.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in get_partition_for_tuple

2017-03-12 Thread Jeevan Ladhe
Hi Amit,

I was able to reproduce the crash, and with the attached patch the crash
goes
away. Also, "make check-world" passes clean.

Patch looks good to me. However, in the following comment in your test:

-- check routing error through a list partitioned table when they key is
null

I think you want to say:

-- check routing error through a list partitioned table when the key is null


Thanks,
Jeevan Ladhe


On Fri, Mar 10, 2017 at 8:26 AM, Amit Langote  wrote:

> Just observed a crash due to thinko in the logic that handles NULL
> partition key.  Absence of null-accepting partition in this case should
> have caused an error, instead the current code proceeds with comparison
> resulting in crash.
>
> create table p (a int, b char) partition by list (b);
> create table p1 partition of p for values in ('a');
> insert into p values (1);   -- crashes
>
> Attached patch fixes that and adds a test.
>
> Thanks,
> Amit
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] wait events for disk I/O

2017-03-12 Thread Rajkumar Raghuwanshi
On Thu, Mar 9, 2017 at 10:54 AM, Rushabh Lathia
 wrote:
> Thanks Rajkumar for performing tests on this patch.
>
> Yes, I also noticed similar results in my testing. Additionally, I
> sometimes noticed the ReadSLRUPage event on my system.
>
> I also ran the reindex database command and noticed the below IO events.
>
> SyncImmedRelation,
> WriteDataBlock
> WriteBuffile,
> WriteXLog,
> ReadDataBlock

I have tried generating some more events by running pgbench on a
master/slave configuration, and was able to see three more events:
WriteInitXLogFile
SyncInitXLogFile
ReadXLog


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [NOVICE] Why is there a doubtful copyObject call in add_vars_to_targetlist

2017-03-12 Thread Neha Khatri
Sure, understood.

Regards,
Neha


Re: [HACKERS] [NOVICE] Why is there a doubtful copyObject call in add_vars_to_targetlist

2017-03-12 Thread Tom Lane
Neha Khatri  writes:
> Then, should it be alright to remove the doubt itself?

It's a perfectly legitimate comment describing a potential optimization.
There are lots of other similar comments that might or might not ever get
addressed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [NOVICE] Why is there a doubtful copyObject call in add_vars_to_targetlist

2017-03-12 Thread Neha Khatri
On Mon, Mar 13, 2017 at 3:52 PM, Tom Lane  wrote:

> David Rowley  writes:
> > On 13 March 2017 at 14:22, Neha Khatri  wrote:
> >> This copyObject still exists in the current code. So I was wondering if the
> >> comment question still holds good and why the question is there in the first
> >> place. To make a new Var object, copyObject seems to be the right choice,
> >> then why the doubt?
>
> > The doubt is about whether copyObject() is required at all. The other
> > option is to simply reference the same object without making a copy.
>
> Right.  Note that the code that 5efe3121 replaced effectively made a new
> Var object using makeVar.  The new code makes a new Var object using
> copyObject, so there's no actual behavioral change in that fragment, just
> fewer lines of code.  But it's fair to wonder whether it wouldn't be safe
> just to link to the existing Var object.  This is tied up in the planner's
> general willingness to scribble on its input data structures, so that
> linking to a pre-existing object is vulnerable to some unrelated bit of
> code deciding to scribble on that object.  Ideally that wouldn't happen
> ... but cleaning it up looks like a mighty tedious bit of janitorial work,
> with uncertain payoff.  So it hasn't happened in the last twenty years
> and I'm not prepared to bet that it ever will.
>

Then, should it be alright to remove the doubt itself?

Regards,
Neha


Re: [HACKERS] asynchronous execution

2017-03-12 Thread Corey Huinker
>
>
> I think it will, because Append itself has been made async-capable by one
> of the patches and UNION ALL uses Append.  But as mentioned above, only
> the postgres_fdw foreign tables will be able to utilize this for now.
>
>
Ok, I'll re-run my test from a few weeks back and see if anything has
changed.


Re: [HACKERS] [NOVICE] Why is there a doubtful copyObject call in add_vars_to_targetlist

2017-03-12 Thread Tom Lane
David Rowley  writes:
> On 13 March 2017 at 14:22, Neha Khatri  wrote:
>> This copyObject still exists in the current code. So I was wondering if the
>> comment question still holds good and why the question is there in the first
>> place. To make a new Var object, copyObject seems to be the right choice,
>> then why the doubt?

> The doubt is about whether copyObject() is required at all. The other
> option is to simply reference the same object without making a copy.

Right.  Note that the code that 5efe3121 replaced effectively made a new
Var object using makeVar.  The new code makes a new Var object using
copyObject, so there's no actual behavioral change in that fragment, just
fewer lines of code.  But it's fair to wonder whether it wouldn't be safe
just to link to the existing Var object.  This is tied up in the planner's
general willingness to scribble on its input data structures, so that
linking to a pre-existing object is vulnerable to some unrelated bit of
code deciding to scribble on that object.  Ideally that wouldn't happen
... but cleaning it up looks like a mighty tedious bit of janitorial work,
with uncertain payoff.  So it hasn't happened in the last twenty years
and I'm not prepared to bet that it ever will.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Need a builtin way to run all tests faster manner

2017-03-12 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-11 22:14:07 -0500, Tom Lane wrote:
>> This looks generally sane to me, although I'm not very happy about folding
>> the "$(MKDIR_P) output_iso" call into pg_isolation_regress_check --- that
>> seems weird and unlike the way it's done for the regular regression test
>> case.

> Yea, not super happy about that either - alternatively we could fold it
> into pg_regress.

Yeah, teaching pg_regress to auto-create the --temp-instance directory
seems perfectly sane from here.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow interrupts on waiting standby

2017-03-12 Thread Michael Paquier
On Wed, Mar 8, 2017 at 5:45 AM, Peter Eisentraut
 wrote:
> On 1/30/17 20:34, Michael Paquier wrote:
>> Both things are fixed in the new version attached. I have added as
>> well this patch to the next commit fest:
>> https://commitfest.postgresql.org/13/977/
>
> I'm not clear now what this patch is intending to accomplish, seeing
> that the original issue has already been fixed.

This makes ResolveRecoveryConflictWithVirtualXIDs() more responsive if
the process is signaled, as the wait in pg_usleep can go up to 1s if
things remain stuck for a long time.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-03-12 Thread Andreas Karlsson

On 03/13/2017 03:11 AM, Andreas Karlsson wrote:

I also fixed the code to properly support triggers.


And by "support triggers" I actually meant fixing the support for moving 
the foreign keys to the new index.


Andreas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-03-12 Thread Andreas Karlsson

On 03/02/2017 03:10 AM, Michael Paquier wrote:

On Wed, Mar 1, 2017 at 2:21 AM, Andreas Karlsson  wrote:
+/*
+ * Copy contraint flags for old index. This is safe because the old index
+ * guaranteed uniquness.
+ */
+newIndexForm->indisprimary = oldIndexForm->indisprimary;
+oldIndexForm->indisprimary = false;
+newIndexForm->indisexclusion = oldIndexForm->indisexclusion;
+oldIndexForm->indisexclusion = false;
[...]
+deleteDependencyRecordsForClass(RelationRelationId, newIndexOid,
+RelationRelationId, DEPENDENCY_AUTO);
+deleteDependencyRecordsForClass(RelationRelationId, oldIndexOid,
+ConstraintRelationId,
DEPENDENCY_INTERNAL);
+
+// TODO: pg_depend for old index?


Spotted one of my TODO comments there, so I have attached a patch where I 
have cleaned up that function. I also fixed the code to properly 
support triggers.



There is a lot of mumbo-jumbo in the patch to create the exact same
index definition as the original one being reindexed, and that's a
huge maintenance burden for the future. You can blame me for that in
the current patch. I am wondering if it would not just be better to
generate a CREATE INDEX query string and then use the SPI to create
the index, and also do the following extensions at SQL level:
- Add a sort of WITH NO DATA clause where the index is created, so the
index is created empty, and is marked invalid and not ready.
- Extend pg_get_indexdef_string() with an optional parameter to
enforce the index name to something else, most likely it should be
extended with the WITH NO DATA/INVALID clause, which should just be a
storage parameter by the way.
By doing something like that what the REINDEX CONCURRENTLY code path
should just be careful about is that it chooses an index name that
avoids any conflicts.


Hm, I am not sure how much that would help since a lot of the mumbo-jumbo 
is by necessity in the step where we move the constraints over from the 
old index to the new one.


Andreas
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 306def4a15..ca1aeca65f 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -923,7 +923,8 @@ ERROR:  could not serialize access due to read/write dependencies among transact
 
 
  Acquired by VACUUM (without FULL),
- ANALYZE, CREATE INDEX CONCURRENTLY, and
+ ANALYZE, CREATE INDEX CONCURRENTLY,
+ REINDEX CONCURRENTLY,
  ALTER TABLE VALIDATE and other
  ALTER TABLE variants (for full details see
  ).
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 3908ade37b..3449c0af73 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name
+REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
 
  
 
@@ -68,9 +68,12 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } CONCURRENTLY option failed, leaving
   an invalid index. Such indexes are useless but it can be
   convenient to use REINDEX to rebuild them. Note that
-  REINDEX will not perform a concurrent build. To build the
-  index without interfering with production you should drop the index and
-  reissue the CREATE INDEX CONCURRENTLY command.
+  REINDEX will perform a concurrent build if 
+  CONCURRENTLY is specified. To build the index without interfering
+  with production you should drop the index and reissue either the
+  CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY
+  command. Indexes of toast relations can be rebuilt with REINDEX
+  CONCURRENTLY.
  
 
 
@@ -152,6 +155,21 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } 
 

+CONCURRENTLY
+
+ 
+  When this option is used, PostgreSQL will rebuild the
+  index without taking any locks that prevent concurrent inserts,
+  updates, or deletes on the table; whereas a standard reindex build
+  locks out writes (but not reads) on the table until it's done.
+  There are several caveats to be aware of when using this option
+   see .
+ 
+
+   
+
+   
 VERBOSE
 
  
@@ -231,6 +249,172 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } 
 
+  
+   Rebuilding Indexes Concurrently
+
+   
+   index
+   rebuilding concurrently
+   
+
+   
+Rebuilding an index can interfere with regular operation of a database.
+Normally PostgreSQL locks the table whose index is rebuilt
+against writes and performs the entire index build with a single scan of the
+table. Other transactions can still read the table, but if they try to
+insert, update, or delete rows in the table they will block until the
+index rebuild is 

Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-03-12 Thread Thomas Munro
On Fri, Mar 10, 2017 at 1:49 AM, Ivan Kartyshov
 wrote:
> Here I attached rebased patch waitlsn_10dev_v3 (core feature)
> I will leave the choice of implementation (core/contrib) to the discretion
> of the community.
>
> Will be glad to hear your suggestion about syntax, code and patch.

Hi Ivan,

Here is some feedback based on a first read-through of the v4 patch.
I haven't tested it yet.

First, I'll restate my view of the syntax-vs-function question:  I
think an fmgr function is the wrong place to do this, because it
doesn't work for our 2 higher isolation levels as mentioned.  Most
people probably use READ COMMITTED most of the time so the extension
version you've proposed is certainly useful for many people and I like
it, but I will vote against inclusion in core of any feature that
doesn't consider all three of our isolation levels, especially if
there is no way to extend support to other levels later.  I don't want
PostgreSQL to keep adding features that eventually force everyone to
use READ COMMITTED because they want to use feature X, Y or Z.

Maybe someone can think of a clever way for an extension to insert a
wait for a user-supplied LSN *before* acquiring a snapshot so it can
work for the higher levels, or maybe the hooks should go into core
PostgreSQL so that the extension can exist as an external project not
requiring a patched PostgreSQL installation, or maybe this should be
done with new core syntax that extends transaction commands.  Do other
people have views on this?

+ * Portions Copyright (c) 2012-2017, PostgresPro Global Development Group

This should say PostgreSQL.

+wl_lsn_updated_hook(void)
+{
+uint i;
+/*
+ * After update lastReplayedEndRecPtr set Latches in SHMEM array
+ */
+if (counter_waitlsn % count_waitlsn == 0
+|| TimestampDifferenceExceeds(time_waitlsn, GetCurrentTimestamp(), interval_waitlsn))
+{

Doesn't this mean that if you are waiting for LSN 1234, and the
primary sends that LSN and then doesn't send anything for another
hour, a standby waiting in pg_waitlsn is quite likely to skip that
update (either because of count_waitlsn or interval_waitlsn), and then
finish up waiting for a very long time?

I'm not sure if this is a good idea, but it's an idea:  You could keep
your update skipping logic, but make sure you always catch the
important case where recovery hits the end of WAL, by invoking your
callback from WaitForWALToBecomeAvailable.  It could have a boolean
parameter that means 'don't skip this one!'.  In other words, it's OK
to skip updates, but not if you know there is no more WAL available to
apply (since you have no idea how long it will be for more to arrive).

Calling GetCurrentTimestamp() at high frequency (after every WAL
record is replayed) may be a bad idea.  It's a slow system call on
some operating systems.  Can we use an existing timestamp for free,
like recoveryLastXTime?  Remembering that the motivation for using
this feature is to wait for *whole transactions* to be replayed and
become visible, there is no sensible reason to wait for random WAL
positions inside a transaction, so if you used that then you would
effectively skip all non-COMMIT records and also skip some COMMIT
records that are coming down the pipe too fast.

+static void
+wl_own_latch(void)
+{
+SpinLockAcquire(&state->l_arr[MyBackendId].slock);
+OwnLatch(&state->l_arr[MyBackendId].latch);
+is_latch_owned = true;
+
+if (state->backend_maxid < MyBackendId)
+state->backend_maxid = MyBackendId;
+
+state->l_arr[MyBackendId].pid = MyProcPid;
+SpinLockRelease(&state->l_arr[MyBackendId].slock);
+}

What is the point of using extra latches for this?  Why not just use
the standard latch?  Syncrep and many other things do that.  I'm not
actually sure if there is ever a reason to create more latches in
regular backends.  SIGUSR1 will be delivered and set the main latch
anyway.

There are cases of specialised latches in the system, like the wal
receiver latch, and I'm reliably informed that that solves problems
like delivering a wakeup message without having to know which backend
is currently the wal receiver (a problem we might consider solving
today with a condition variable?)  I don't think anything like that
applies here.

+for (i = 0; i <= state->backend_maxid; i++)
+{
+SpinLockAcquire(&state->l_arr[i].slock);
+if (state->l_arr[i].pid != 0)
+SetLatch(&state->l_arr[i].latch);
+SpinLockRelease(&state->l_arr[i].slock);
+}

Once we get through the update-skipping logic above, we hit this loop
which acquires  spinlocks for every backend one after another and sets
the latches of every backend, no matter whether they are waiting for
the LSN that has been applied.  Assuming we go with this
scan-the-whole-array approach, why not include the LSN waited for in
the array slots, so that we can avoid disturbing processes that are
waiting for a later LSN?

Could you talk a bit about 

Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan [take-2]

2017-03-12 Thread Kouhei Kaigai
> Hello,
> 
> On Wed, March 1, 2017 7:21 pm, Kouhei Kaigai wrote:
> >> I've looked at the patch, and as I'm not that familiar with the
> >> pg-sourcecode, customs and so on, this isn't a review, more like food
> >> for thought and all should be taken with a grain of salt. :)
> >>
> >> So here are a few questions and remarks:
> >>
> >> >+ double  limit_tuples = -1.0;
> >>
> >> Surely the limit cannot be fractional, and must be an integer. So
> >> wouldn't it be better the same type as say:
> >>
> >> >+ if (root->limit_tuples >= 0.0 &&
> >>
> >> Than you could also compare with ">= 0", not ">= 0.0".
> >>
> > The above variable represents the "estimated" number of rows at the
> > planning stage, not execution time.
> > You may be able to see Path structure has "rows" field declared as
> > double type. It makes sense to consider stupid paths during planning,
> > even if it is eventually rejected. For example, if a cross join with
> > two large tables appear during planning, 64bit integer will make
> > overflow easily.
> 
> Hm, ok. Not related to your patch, just curious: Is there a mechanism in
> place that automatically rejects plans where the limit would overflow the
> double to uint64 conversation? Or is this more of a "there would be hopefully
> a plan with a better limit so we do not use the bad one"?
> 
> Would it possible to force a plan where such overflow would occur?
>
We have no such mechanism, and little need for one.
The estimated number of rows at planning time is stored in the plan_rows
field of the Plan structure as FP64. Once the plan tree is constructed, the
estimated number of rows does not affect execution. (Some plans might use it
to estimate resource consumption at execution time.)
On the other hand, the actual number of rows processed is saved in the
instrument field of the PlanState structure. It is counted up from zero,
one row at a time, so people will want to replace their hardware long before
a uint64 overflow: if 1B rows are processed per second, the overflow would
not happen until around the 26th century(!).
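
A rough back-of-the-envelope check of that claim (plain arithmetic, nothing
patch-specific):

-- seconds and years until a uint64 row counter overflows at 1e9 rows/sec
SELECT 2^64 / 1e9 AS seconds_to_overflow,
   (2^64 / 1e9) / (365.25 * 24 * 3600) AS years_to_overflow;  -- several centuries from now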

> >> And this comment might be better "were we already called?"
> >>
> >> >+ boolrs_started; /* are we already
> >> called? */
> >>
> > Other variables in ResultState uses present form, like:
> >
> > +   boolrs_started; /* are we already called? */
> > boolrs_done;/* are we done? */
> > boolrs_checkqual;   /* do we need to check the qual? */
> >  } ResultState;
> 
> Yes, I noted that, but still "are" and "called" and "already" don't read
> well together for me:
> 
>   are - present form
>   called - past form like "were we called?", or "are we called bob?" an
> ongoing process
>   already - it has started
> 
> So "are we already called" reads like someone is waiting for being called.
> 
> Maybe to mirror the comment on "rs_done":
> 
>   /* have we started yet? */
> 
> Also, maybe it's easier for the comment to describe what is happening in
> the code because of the flag, not just to the flag itself:
> 
>   /* To do things once when we are called */
> 
> Anyway, it is a minor point and don't let me distract you from your work,
> I do like the feature and the patch :)
>
Fixed to "have we started yet?"


PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei 



passdown-limit-fdw.v6.patch
Description: passdown-limit-fdw.v6.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] asynchronous execution

2017-03-12 Thread Amit Langote
On 2017/03/11 8:19, Corey Huinker wrote:
> 
> On Thu, Feb 23, 2017 at 6:59 AM, Kyotaro HORIGUCHI
> >
> wrote:
> 
> 9e43e87
> 
> 
> Patch fails on current master, but correctly applies to 9e43e87. Thanks
> for including the commit id.
> 
> Regression tests pass.
> 
> As with my last attempt at reviewing this patch, I'm confused about what
> kind of queries can take advantage of this patch. Is it only cases where a
> local table has multiple inherited foreign table children?

IIUC, Horiguchi-san's patch adds asynchronous capability to ForeignScans
driven by postgres_fdw (after building some relevant infrastructure
first).  The same might be added to other Scan nodes (and probably other
nodes as well) eventually, so that more queries will benefit from
asynchronous execution.  It may just be that ForeignScans benefit more
from asynchronous execution than other Scan types.

> Will it work
> with queries where two foreign tables are referenced and combined with a
> UNION ALL?

I think it will, because Append itself has been made async-capable by one
of the patches and UNION ALL uses Append.  But as mentioned above, only
the postgres_fdw foreign tables will be able to utilize this for now.
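
As a concrete example of the kind of query that should be able to benefit
(the foreign table names below are made up purely for illustration):

-- Both branches scan postgres_fdw foreign tables; the UNION ALL becomes an
-- Append, which the patch set makes async-capable, so the two foreign scans
-- can be in flight concurrently.
SELECT count(*)
FROM (SELECT id FROM ft_remote_a
  UNION ALL
  SELECT id FROM ft_remote_b) AS u;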

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] scram and \password

2017-03-12 Thread Michael Paquier
On Mon, Mar 13, 2017 at 12:48 AM, Joe Conway  wrote:
> On 03/11/2017 11:07 PM, Michael Paquier wrote:
>> Having a RLS on pg_authid to allow a user to look at its own password
>> type is an idea.
>
> Given that that is not likely at this stage of the dev cycle, what about
> a special purpose SQL function that returns the password type for the
> current user?

OK, so what about doing the following then:
1) Create pg_role_password_type(oid), taking a role OID in input and
returning the type.
2) Extend PQencryptPassword with a method, and document. Plaintext is forbidden.
3) Extend \password in psql with a similar -method=scram|md5 argument,
and forbid the use of "plain" format.
After thinking about it, extending PQencryptPassword() this way avoids
future breakage: it keeps us from falling into the trap of having
password_encryption set to scram on the server and in pg_hba.conf while
PQencryptPassword() keeps enforcing md5.
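
For illustration only, usage of the proposed function could look like the
following; neither the function nor its exact name exists yet, and
'someuser' is just a placeholder:

-- hypothetical: would return e.g. 'md5' or 'scram' for the role's stored verifier
SELECT pg_role_password_type('someuser'::regrole);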

> Or is it a security issue of some sort to allow a user to
> know their own password type?

Don't think so. Any user has access to the value of
password_encryption, so one can guess what the format of a hashed
password will be.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-03-12 Thread Craig Ringer
On 11 March 2017 at 00:33, Petr Jelinek  wrote:
> On 09/03/17 18:48, Peter Eisentraut wrote:
>> On 3/6/17 05:27, Petr Jelinek wrote:
>>> And lastly I changed the automagic around exporting, not exporting and
>>> using the snapshot produced by CREATE_REPLICATION_SLOT into explicit
>>> parameter for the CREATE_REPLICATION_SLOT command (and added simple
>>> framework for adding more of those if needed in the future).
>>
>> It might be useful to make this into a separate patch, for clarity.
>>
>
> Okay here it is with this part split out. The first patch also help with
> Craig's logical decoding on standby so it definitely makes sense to be
> split.

Greatly appreciated.

Committing this in chunks makes sense anyway.

I've attached a slightly updated version that makes pg_recvlogical skip
snapshot export. The second patch is unchanged; use the copy from the
immediately prior mail.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 02af255a84736ac2783705055d6e998e476359af Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Thu, 9 Mar 2017 14:20:28 +0100
Subject: [PATCH 1/4] Add option to control snapshot export to
 CREATE_REPLICATION_SLOT

We used to export snapshots unconditionally in CREATE_REPLICATION_SLOT
in the replication protocol, but several upcoming patches want more control
over what happens with the slot.

This means that when we allow creation of replication slots on standbys, which
cannot export snapshots because they cannot allocate new XIDs, we don't have to
silently omit the snapshot creation.

It also allows clients like pg_recvlogical, which neither need nor can use the
exported snapshot, to suppress its creation. Since snapshot exporting can fail
this improves reliability.
---
 doc/src/sgml/logicaldecoding.sgml  | 13 +++--
 doc/src/sgml/protocol.sgml | 16 +-
 src/backend/commands/subscriptioncmds.c|  6 ++-
 .../libpqwalreceiver/libpqwalreceiver.c| 15 --
 src/backend/replication/repl_gram.y| 43 
 src/backend/replication/repl_scanner.l |  2 +
 src/backend/replication/walsender.c| 58 --
 src/bin/pg_basebackup/streamutil.c |  5 ++
 src/include/nodes/replnodes.h  |  2 +-
 src/include/replication/walreceiver.h  |  6 +--
 10 files changed, 140 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 03c2c69..2b7d6e9 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -268,11 +268,12 @@ $ pg_recvlogical -d postgres --slot test --drop-slot
 

 
-   
+   
 Exported Snapshots
 
- When a new replication slot is created using the streaming replication interface,
- a snapshot is exported
+ When a new replication
+ slot is created using the streaming replication interface, a snapshot
+ is exported
  (see ), which will show
  exactly the state of the database after which all changes will be
  included in the change stream. This can be used to create a new replica by
@@ -282,6 +283,12 @@ $ pg_recvlogical -d postgres --slot test --drop-slot
  database's state at that point in time, which afterwards can be updated
  using the slot's contents without losing any changes.
 
+
+ Creation of a snapshot is not always possible - in particular, it will
+ fail when connected to a hot standby. Applications that do not require
+ snapshot export may suppress it with the NOEXPORT_SNAPSHOT
+ option.
+

   
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 3d6e8ee..95603d3 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1487,7 +1487,7 @@ The commands accepted in walsender mode are:
   
 
   
-   CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [ RESERVE_WAL ] | LOGICAL output_plugin }
+   CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [ RESERVE_WAL ] | LOGICAL output_plugin [ EXPORT_SNAPSHOT | NOEXPORT_SNAPSHOT ] }
  CREATE_REPLICATION_SLOT
 
 
@@ -1538,6 +1538,20 @@ The commands accepted in walsender mode are:
 

   
+  
+   EXPORT_SNAPSHOT
+   NOEXPORT_SNAPSHOT
+   
+
+ Decides what to do with the snapshot created during logical slot
+ initialization. EXPORT_SNAPSHOT, which is the
+ default, will export the snapshot for use in other sessions. This
+ option can't be used inside a transaction. The
+ NOEXPORT_SNAPSHOT will just use the snapshot for logical
+ decoding as normal but won't do anything else with it.
+
+   
+  
  
 
  
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c

Re: [HACKERS] Need a builtin way to run all tests faster manner

2017-03-12 Thread Andres Freund
On 2017-03-11 22:14:07 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-11 11:48:31 -0800, Andres Freund wrote:
> >> I think that'd be a good plan.  We probably should also keep --outputdir
> >> seperate (which test_decoding/Makefile does, but
> >> pg_isolation_regress_check doesn't)?
> 
> > Here's a patch doing that (based on yours).  I'd be kind of inclined to
> > set --outputdir for !isolation tests too; possibly even move tmp_check
> > below output_iso/ output_regress/ or such - but that seems like it
> > potentially could cause some disagreement...
> 
> This looks generally sane to me, although I'm not very happy about folding
> the "$(MKDIR_P) output_iso" call into pg_isolation_regress_check --- that
> seems weird and unlike the way it's done for the regular regression test
> case.

Yea, not super happy about that either - alternatively we could fold it
into pg_regress.  But given the way that prove_checks works, I thought
it'd not be too ugly comparatively.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] scram and \password

2017-03-12 Thread Michael Paquier
On Mon, Mar 13, 2017 at 9:15 AM, Robert Haas  wrote:
> On Fri, Mar 10, 2017 at 5:43 PM, Michael Paquier
>  wrote:
>> On Sat, Mar 11, 2017 at 2:53 AM, Jeff Janes  wrote:
>>> Should the \password tool in psql inspect password_encryption and act on it
>>> being 'scram'?
>>
>> Not sure if it is wise to change the default for this release.
>
> Seems like an odd way to phrase it.  Aren't we talking about making a
> feature that worked in previous releases continue to work?

Considering how fresh scram is, it seems clear to me that we do not
want to simply switch the default value of password_encryption and the
default behavior of PQencryptPassword() and \password over to scram; we
need something else. Actually, if we change nothing, default deployments
of Postgres using md5 would still have PQencryptPassword() and \password
working properly.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-03-12 Thread Craig Ringer
On 11 March 2017 at 14:32, Craig Ringer  wrote:

> I'll extract this part of the patch so it can be looked at separately,
> it'll be clearer that way.

Apparently I thought that last time too since I already posted it
split up. Ahem. Working on too many different things at once.

Last-posted patches apply fine, no need for an update.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-12 Thread Craig Ringer
On 13 March 2017 at 10:56, Craig Ringer  wrote:
> On 7 March 2017 at 21:08, Simon Riggs  wrote:
>
>> Patch 4 committed. Few others need rebase.
>
> Since this patch series and initial data copy for logical replication
> both add a facility for suppressing initial snapshot export on a
> logical slot, I've dropped patch 0003 (make snapshot export optional on
> logical slot creation) in favour of Petr's similar patch.
>
> I will duplicate it in this patch series for ease of application. (The
> version here is slightly extended over Petr's so I'll re-post the
> modified version on the logical replication initial data copy thread
> too).
>
> The main thing I want to direct attention to for Simon, as committer,
> is the xlog'ing of VACUUM's xid threshold before we advance it and
> start removing tuples. This is necessary for the standby to know
> whether a given replication slot is safe to use and fail with conflict
> with recovery if it is not, or if it becomes unsafe due to master
> vacuum activity. Note that we can _not_ use the various vacuum records
> for this because we don't know which are catalogs and which aren't;
> we'd have to add a separate "is catalog" field to each vacuum xlog
> record, which is undesirable. The downstream can't look up whether
> it's a catalog or not because it doesn't have relcache/syscache access
> during decoding.
>
> This change might look a bit similar to the vac_truncate_clog change
> in the txid_status patch, but it isn't really related. The txid_status
> change is about knowing when we can safely look up xids in clog and
> preventing a race with clog truncation. This change is about knowing
> when we can know all catalog tuples for a given xid will still be in
> the heap, not vacuumed away. Both are about making sure standbys know
> more about the state of the system in a low-cost way, though.
>
> WaitForMasterCatalogXminReservation(...) in logical.c is also worth
> looking more closely at.

I should also note that because the TAP tests currently take a long
time, I recommend skipping the tests for this patch by default and
running them only when actually touching logical decoding.

I'm looking at ways to make them faster, but they're inevitably going
to take a while until we can get hot standby feedback replies in
place, including cascading support. Which I have as WIP, but won't
make this release.

Changing the test import to

 use Test::More skip_all => "disabled by default, too slow";

will be sufficient.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-12 Thread Peter Geoghegan
On Sun, Mar 12, 2017 at 3:05 PM, Peter Geoghegan  wrote:
> There is still an open item here, though: The leader-as-worker
> Tuplesortstate, a special case, can still leak files.

I phrased this badly. What I mean is that there can be instances where
temp files are left on disk following a failure such as palloc() OOM;
no backend ends up doing an unlink() iff a leader-as-worker
Tuplesortstate was used and we get unlucky. I did not mean a leak of
virtual or real file descriptors, which would see Postgres print a
refcount leak warning from resowner.c. Naturally, these "leaked" files
will eventually be deleted by the next restart of the server at the
latest, within RemovePgTempFiles(). Note also that a duplicate
unlink() (with annoying LOG message) is impossible under any
circumstances with V9, regardless of whether or not a leader-as-worker
Tuplesort state is involved.

Anyway, I was sure that I needed to completely nail this down in order
to be consistent with existing guarantees, but another look at
OpenTemporaryFile() makes me doubt that. ResourceOwnerEnlargeFiles()
is called, which itself uses palloc(), which can of course fail. There
are remarks over that function within resowner.c about OOM:

/*
 * Make sure there is room for at least one more entry in a ResourceOwner's
 * files reference array.
 *
 * This is separate from actually inserting an entry because if we run out
 * of memory, it's critical to do so *before* acquiring the resource.
 */
void
ResourceOwnerEnlargeFiles(ResourceOwner owner)
{
...
}

But this happens after OpenTemporaryFileInTablespace() has already
returned. Taking care to allocate memory up-front here is motivated by
keeping the vFD cache entry and current resource owner in perfect
agreement about the FD_XACT_TEMPORARY-ness of a file, and that's it.
It's *not* true that there is a broader sense in which
OpenTemporaryFile() is atomic, which for some reason I previously
believed to be the case.

So, I haven't failed to prevent an outcome that wasn't already
possible. It doesn't seem like it would be that hard to fix this, and
then have the parallel tuplesort patch live up to that new higher
standard. But, it's possible that Tom or maybe someone else would
consider that a bad idea, for roughly the same reason that we don't
call RemovePgTempFiles() for *crash*-induced restarts, as mentioned by
Thomas upthread:

 * NOTE: we could, but don't, call this during a post-backend-crash restart
 * cycle.  The argument for not doing it is that someone might want to examine
 * the temp files for debugging purposes.  This does however mean that
 * OpenTemporaryFile had better allow for collision with an existing temp
 * file name.
 */
void
RemovePgTempFiles(void)
{
...
}

Note that I did put some thought into making sure OpenTemporaryFile()
does the right thing with collisions with existing temp files. So,
maybe the right thing is to do nothing at all. I don't have strong
feelings either way on this question.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [NOVICE] Why is there a doubtful copyObject call in add_vars_to_targetlist

2017-03-12 Thread David Rowley
(Redirecting to Hackers, since Novice is not the correct place for this
question)

On 13 March 2017 at 14:22, Neha Khatri  wrote:

> Hi,
>
> I was debugging that when does the function _copyVar get invoked, and the
> first hit for that was in the add_vars_to_targetlist. There I happened to
> see the following comment:
>
> /* XXX is copyObject necessary here? */
>
> Further digging showed that this copyObject got added in the commit
> 5efe3121:
>
> +   /* XXX is copyObject necessary here? */
> + rel->targetlist = lappend(rel->targetlist,
> +   create_tl_element((Var *) copyObject(var),
> + length(rel->targetlist) +
> 1));
>
> This copyObject still exists in the current code. So I was wondering if the
> comment question still holds good and why the question is there in the first
> place. To make a new Var object, copyObject seems to be the right choice,
> then why the doubt?
>

The doubt is about whether copyObject() is required at all. The other
option is to simply reference the same object without making a copy.

The danger of not making a copy is that any changes made to the object
would naturally affect everything that references it. It would seem the
comment and the copyObject() are still there because nobody is confident
enough that the copy is unnecessary to go and remove it; that, weighed
against the fact that removing it likely wouldn't buy much performance,
is probably why it's still there.

Probably if someone came up with a realistic enough case to prove that it
was worth removing, then someone might take the time to go and check whether
it was safe to remove. There's a good chance that won't happen until then,
given that nobody's bothered in almost 18 years.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-12 Thread Craig Ringer
On 13 March 2017 at 08:54, Vaishnavi Prabakaran
 wrote:

> Before going with this fix, I would like you to consider the option of
> asking batch processing users(via documentation) to set single-row mode
> before calling PQgetResult().
> Either way we need to fix the documentation part, letting users know how
> they can activate single-row mode while using batch processing.
> Let me know your thoughts.

Thanks for looking into this, it's clear that I didn't cover the combo
of single-row-mode and batch mode adequately.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-03-12 Thread David Rowley
On 2 March 2017 at 04:40, Alvaro Herrera  wrote:
> Alvaro Herrera wrote:
>
>> I have added this patch to the commitfest, which I've been intending to
>> get in for a long time.  I'll be submitting an updated patch, if needed.
>
> Here is Emre's patch rebased to current sources.

Looking over this now, and have noticed a couple of things:

1.

+ Assert(nnumbers == 1);

I think it's a bad idea to Assert() this. The stat tuple can come from
a plugin which could do anything. It seems like if we need to be certain
of that, then it should be an elog(ERROR), maybe mentioning that we
expected a 1-element array but got  elements.

2.

+ Assert(varCorrelation >= 0 && varCorrelation <= 1);

same goes for that. I don't think we want to Assert() that either.
elog(ERROR) seems better, or perhaps we should fall back on the old
estimation method in this case?

The Asserted range also seems to contradict the range mentioned in
pg_statistic.h:

/*
 * A "correlation" slot describes the correlation between the physical order
 * of table tuples and the ordering of data values of this column, as seen
 * by the "<" operator identified by staop.  (As with the histogram, more
 * than one entry could theoretically appear.) stavalues is not used and
 * should be NULL.  stanumbers contains a single entry, the correlation
 * coefficient between the sequence of data values and the sequence of
 * their actual tuple positions.  The coefficient ranges from +1 to -1.
 */
#define STATISTIC_KIND_CORRELATION 3

3.

+ numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);

should numBlocks be named numRanges? After all we call the option
"pages_per_range".

4.

+ * correlated table is copied 4 times, the correlation would be 0.25,
+ * although the index would be almost as good as the version on the

What's meant by "table is copied 4 times" ?

As best as I can work out, it means.

INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;

but I'm uncertain, and it's not very clear to me what it means.

5.

+ blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;

I missed the comment that explains why we divide by two here.

6.

Might want to consider not applying this estimate when the statistics
don't contain any STATISTIC_KIND_CORRELATION array.

In this case the estimation is still applied the same way, only the
*indexCorrelation is 0.

Consider:

CREATE TABLE r2 (values INT NOT NULL);
INSERT INTO r2 VALUES(1);
ANALYZE r2;
\x on
SELECT * FROM pg_statistic WHERE starelid = (SELECT oid FROM pg_class
WHERE relname = 'r2');

Notice the lack of any stakind being set to 3.

7.

+ blockProportion = (double) BrinGetPagesPerRange(indexRel) /
+ baserel->pages;

Perhaps the blockProportion should also be clamped to the number of
pages in the relation. Even a 1 page relation is estimated to have 128
pages with the default pages_per_range, by the method used in the
patch.

blockProportion = Max(blockProportion, baserel->relpages);

during my test the selectivity was set to 64.5, then clamped to 1 by
CLAMP_PROBABILITY(). This does not seem very nice.

Good news is, I'm seeing much better plans coming out in cases when
the index is unlikely to help. So +1 for fixing this issue.

Will Emre be around to make the required changes to the patch? I see
it's been a while since it was originally posted.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-12 Thread Vaishnavi Prabakaran
On Sat, Mar 11, 2017 at 12:52 AM, Daniel Verite 
wrote:

>   Hi,
>
> I notice that PQsetSingleRowMode() doesn't work when getting batch results.
>
> The function is documented as:
> " int PQsetSingleRowMode(PGconn *conn);
>
>   This function can only be called immediately after PQsendQuery or one
>   of its sibling functions, before any other operation on the connection
>   such as PQconsumeInput or PQgetResult"
>
> But PQbatchQueueProcess() unconditionally clears conn->singleRowMode,
> so whatever it was when sending the query gets lost, and besides
> other queries might have been submitted in the meantime.
>
> Also if trying to set that mode when fetching like this
>
>  while (PQbatchQueueProcess(conn)) {
>r = PQsetSingleRowMode(conn);
>if (r!=1) {
>   fprintf(stderr, "PQsetSingleRowMode() failed");
>}
>..
>
> it might work the first time, but on the next iterations, conn->asyncStatus
> might be PGASYNC_READY, which is a failure condition for
> PQsetSingleRowMode(), so that won't do.
>


Thanks for investigating the problem. Could you kindly explain which
"next iteration" you mean here? I don't see any problem with the
following sequence of calls: PQbatchQueueProcess(), PQsetSingleRowMode(),
PQgetResult(). Am I missing anything?
Please note that it is a MUST to call PQgetResult() immediately after
PQbatchQueueProcess(), as documented.



>
> ISTM that the simplest fix would be that when in batch mode,
> PQsetSingleRowMode() should register that the last submitted query
> does request that mode, and when later QbatchQueueProcess dequeues
> the batch entry for the corresponding query, this flag would be popped off
> and set as the current mode.
>
> Please find attached the suggested fix, against the v5 of the patch.
>

Before going with this fix, I would like you to consider the option of
asking batch-processing users (via documentation) to set single-row mode
before calling PQgetResult().
Either way we need to fix the documentation, letting users know how
they can activate single-row mode while using batch processing.
Let me know your thoughts.

Best Regards,
Vaishnavi,
Fujitsu Australia.


Re: [HACKERS] scram and \password

2017-03-12 Thread Robert Haas
On Fri, Mar 10, 2017 at 5:43 PM, Michael Paquier
 wrote:
> On Sat, Mar 11, 2017 at 2:53 AM, Jeff Janes  wrote:
>> Should the \password tool in psql inspect password_encryption and act on it
>> being 'scram'?
>
> Not sure if it is wise to change the default for this release.

Seems like an odd way to phrase it.  Aren't we talking about making a
feature that worked in previous releases continue to work?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-03-12 Thread Robert Haas
On Fri, Mar 10, 2017 at 2:17 PM, Peter Eisentraut
 wrote:
> On 3/2/17 21:40, Robert Haas wrote:
>> On the point mentioned above, I
>> don't think adding a partition should move tuples, necessarily; seems
>> like it would be good enough - maybe better - for it to fail if there
>> are any that would need to be moved.
>
> ISTM that the use cases of various combinations of adding a default
> partition, adding another partition after it, removing the default
> partition, clearing out the default partition in order to add more
> nondefault partitions, and so on, need to be more clearly spelled out
> for each partitioning type.  We also need to consider that pg_dump and
> pg_upgrade need to be able to reproduce all those states.  Seems to be a
> bit of work still ...

This patch is only targeting list partitioning.  It is not entirely
clear that the concept makes sense for range partitioning; you can
already define a partition from the end of the last partition up to
infinity, or from minus-infinity up to the starting point of the first
partition.  The only thing a default range partition would let you do
is have a single partition cover all of the ranges that would
otherwise be unassigned, which might not even be something we want.
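
For instance, a range setup that already covers both open ends might look
like this (a rough sketch; the MINVALUE/MAXVALUE spelling is assumed here
and may differ in the current development tree):

CREATE TABLE r (a int) PARTITION BY RANGE (a);
CREATE TABLE r_low  PARTITION OF r FOR VALUES FROM (MINVALUE) TO (0);
CREATE TABLE r_mid  PARTITION OF r FOR VALUES FROM (0) TO (100);
CREATE TABLE r_high PARTITION OF r FOR VALUES FROM (100) TO (MAXVALUE);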

I don't know how complete the patch is, but the specification seems
clear enough.  If you have partitions for 1, 3, and 5, you get
partition constraints of (a = 1), (a = 3), and (a = 5).  If you add a
default partition, you get a constraint of (a != 1 and a != 3 and a !=
5).  If you then add a partition for 7, the default partition's
constraint becomes (a != 1 and a != 3 and a != 5 and a != 7).  The
partition must be revalidated at that point for conflicting rows,
which we can either try to move to the new partition, or just error
out if there are any, depending on what we decide we want to do.  I
don't think any of that requires any special handling for either
pg_dump or pg_upgrade; it all just falls out of getting the
partitioning constraints correct and consistently enforcing them, just
as for any other partition.
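
As a rough SQL sketch of that specification (the exact DEFAULT syntax shown
here is illustrative and the patch may spell it differently; names are made
up):

CREATE TABLE t (a int) PARTITION BY LIST (a);
CREATE TABLE t1 PARTITION OF t FOR VALUES IN (1);
CREATE TABLE t3 PARTITION OF t FOR VALUES IN (3);
CREATE TABLE t5 PARTITION OF t FOR VALUES IN (5);
CREATE TABLE t_default PARTITION OF t DEFAULT;
-- implied constraint on t_default: (a != 1 AND a != 3 AND a != 5)
CREATE TABLE t7 PARTITION OF t FOR VALUES IN (7);
-- t_default's constraint becomes (a != 1 AND a != 3 AND a != 5 AND a != 7),
-- after revalidating t_default for rows that would now belong in t7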

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-03-12 Thread Venkata B Nagothi
On Tue, Jan 17, 2017 at 9:36 PM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hello. I added pgsql-hackers.
>
> This occurs also on git master and back to 9.4.
>
> At Fri, 13 Jan 2017 08:47:06 -0600, Jonathon Nelson 
> wrote in  gmail.com>
> > On Mon, Nov 28, 2016 at 1:39 PM, Jonathon Nelson 
> wrote:
> > > First, the postgresql configuration differs only minimally from the
> stock
> > > config:
> > >
> > > Assume wal_keep_segments = 0.
> > > Assume the use of physical replication slots.
> > > Assume one master, one standby.
> > >
> > > Lastly, we have observed the behavior "in the wild" at least twice and
> in
> > > the lab a dozen or so times.
> > >
> > > EXAMPLE #1 (logs are from the replica):
> > >
> > > user=,db=,app=,client= DEBUG:  creating and filling new WAL file
> > > user=,db=,app=,client= DEBUG:  done creating and filling new WAL file
> > > user=,db=,app=,client= DEBUG:  sending write 6/8B00 flush
> 6/8A00
> > > apply 5/748425A0
> > > user=,db=,app=,client= DEBUG:  sending write 6/8B00 flush
> 6/8B00
> > > apply 5/74843020
> > > 
> > > user=,db=,app=,client= DEBUG:  postmaster received signal 2
> > > user=,db=,app=,client= LOG:  received fast shutdown request
> > > user=,db=,app=,client= LOG:  aborting any active transactions
> > >
> > > And, upon restart:
> > >
> > > user=,db=,app=,client= LOG:  restartpoint starting: xlog
> > > user=,db=,app=,client= DEBUG:  updated min recovery point to
> 6/67002390 on
> > > timeline 1
> > > user=,db=,app=,client= DEBUG:  performing replication slot checkpoint
> > > user=,db=,app=,client= DEBUG:  updated min recovery point to
> 6/671768C0 on
> > > timeline 1
> > > user=,db=,app=,client= CONTEXT:  writing block 589 of relation
> > > base/13294/16501
> > > user=,db=,app=,client= LOG:  invalid magic number  in log segment
> > > 00010006008B, offset 0
> > > user=,db=,app=,client= DEBUG:  switched WAL source from archive to
> stream
> > > after failure
> > > user=,db=,app=,client= LOG:  started streaming WAL from primary at
> > > 6/8A00 on timeline 1
> > > user=,db=,app=,client= FATAL:  could not receive data from WAL stream:
> > > ERROR:  requested WAL segment 00010006008A has already been
> > > removed
>
> I managed to reproduce this. A little tweak as the first patch
> lets the standby to suicide as soon as walreceiver sees a
> contrecord at the beginning of a segment.
>
> - M(aster): createdb as a master with wal_keep_segments = 0
> (default), min_log_messages = debug2
> - M: Create a physical repslot.
> - S(tandby): Setup a standby database.
> - S: Edit recovery.conf to use the replication slot above then
>  start it.
> - S: touch /tmp/hoge
> - M: Run pgbench ...
> - S: After a while, the standby stops.
>   > LOG:   STOP THE SERVER
>
> - M: Stop pgbench.
> - M: Do 'checkpoint;' twice.
> - S: rm /tmp/hoge
> - S: Fails to catch up with the following error.
>
>   > FATAL:  could not receive data from WAL stream: ERROR:  requested WAL
> segment 0001002B has already been removed
>
>
I have been testing / reviewing the latest patch
"0001-Fix-a-bug-of-physical-replication-slot.patch" and I think I might
need some more clarification on this.

Before applying the patch, I tried reproducing the above error -

- I had master->standby in streaming replication
- Took the backup of master
   - with a low max_wal_size and wal_keep_segments = 0
- Configured standby with recovery.conf
- Created replication slot on master
- Configured the replication slot on standby and started the standby
- I got the below error

   >> 2017-03-10 11:58:15.704 AEDT [478] LOG:  invalid record length at
0/F2000140: wanted 24, got 0
   >> 2017-03-10 11:58:15.706 AEDT [481] LOG:  started streaming WAL from
primary at 0/F200 on timeline 1
   >> 2017-03-10 11:58:15.706 AEDT [481] FATAL:  could not receive data
from WAL stream: ERROR:  requested WAL segment 000100F2 has
already been removed

and I noticed that the file "000100F2" was removed
from the master. This can be easily reproduced, and it occurs
irrespective of whether replication slots are configured.

As long as the file "000100F2" is available on the master,
the standby continues to stream WALs without any issues.

some more details -

Contents of the file  "000100F2" on standby before
pg_stop_backup()

rmgr: Standby  len (rec/tot): 24/50, tx: 0, lsn: 0/F228, prev 0/F198, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
rmgr: Standby  len (rec/tot): 24/50, tx: 0, lsn: 0/F260, prev 0/F228, desc: RUNNING_XACTS nextXid 638 latestCompletedXid 637 oldestRunningXid 638
rmgr: XLOG     len (rec/tot): 80/106, tx: 0, lsn: 0/F298, prev 0/F260, desc: CHECKPOINT_ONLINE

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-12 Thread Peter Geoghegan
On Thu, Feb 16, 2017 at 8:45 AM, Peter Geoghegan  wrote:
>> I do not think there should be any reason why we can't get the
>> resource accounting exactly correct here.  If a single backend manages
>> to remove every temporary file that it creates exactly once (and
>> that's currently true, modulo system crashes), a group of cooperating
>> backends ought to be able to manage to remove every temporary file
>> that any of them create exactly once (again, modulo system crashes).
>
> I believe that we are fully in agreement here. In particular, I think
> it's bad that there is an API that says "caller shouldn't throw an
> elog error between these two points", and that will be fixed before
> too long. I just think that it's worth acknowledging a certain nuance.

I attach my V9 of the patch. I came up some stuff for the design of
resource management that I think meets every design goal that we have
for shared/unified BufFiles:

* Avoids both resource leaks, and spurious double-freeing of resources
(e.g., a second unlink() for a file from a different process) when
there are errors. The latter problem was possible before, a known
issue with V8 of the patch. I believe that this revision avoids these
problems in a way that is *absolutely bulletproof* in the face of
arbitrary failures (e.g., palloc() failure) in any process at any
time. Although, be warned that there is a remaining open item
concerning resource management in the leader-as-worker case, which I
go into below.

There are now what you might call "critical sections" in one function.
That is, there are points where we cannot throw an error (without a
BEGIN_CRIT_SECTION()!), but those are entirely confined to unification
code within the leader, where we can be completely sure that no error
can be raised. The leader can even fail before some but not all of a
particular worker's segments are in its local resource manager, and we
still do the right thing. I've been testing this by adding code that
randomly throws errors at points interspersed throughout worker and
leader unification hand-off points. I then leave this stress-test
build to run for a few hours, while monitoring for leaked files and
spurious fd.c reports of double-unlink() and similar issues. Test
builds change LOG to PANIC within several places in fd.c, while
MAX_PHYSICAL_FILESIZE was reduced from 1GiB to BLCKSZ.

All of these guarantees are made without any special care from caller
to buffile.c. The only V9 change to tuplesort.c or logtape.c in this
general area is that they have to pass a dynamic shared memory segment
to buffile.c, so that it can register a new callback. That's it. This
may be of particular interest to Thomas. All complexity is confined to
buffile.c.

* No expansion in the use of shared memory to manage resources.
BufFile refcount is still per-worker. The role of local resource
managers is unchanged.

* Additional complexity over and above ordinary BufFile resource
management is confined to the leader process and its on_dsm_detach()
callback. Only the leader registers a callback. Of course, refcount
management within BufFileClose() can still take place in workers, but
that isn't something that we rely on (that's only for non-error
paths). In general, worker processes mostly have resource managers
managing their temp file segments as a thing that has nothing to do
with BufFiles (BufFiles are still not owned by resowner.c/fd.c --
they're blissfully unaware of all of this stuff).

* In general, unified BufFiles can still be treated in exactly the
same way as conventional BufFiles, and things just work, without any
special cases being exercised internally.

There is still an open item here, though: The leader-as-worker
Tuplesortstate, a special case, can still leak files. So,
stress-testing will only show the patch to be completely robust
against resource leaks when nbtsort.c is modified to enable
FORCE_SINGLE_WORKER testing. Despite the name FORCE_SINGLE_WORKER, you
can also modify that file to force there to be arbitrary-many workers
requested (just change "requested = 1" to something else). The
leader-as-worker problem is avoided because we don't have the leader
participating as a worker this way, which would otherwise present
issues for resowner.c that I haven't got around to fixing just yet. It
isn't hard to imagine why this is -- one backend with two FDs for
certain fd.c temp segments is just going to cause problems for
resowner.c without additional special care. Didn't seem worth blocking
on that. I want to prove that my general approach is workable. That
problem is confined to one backend's resource manager when it is the
leader participating as a worker. It is not a refcount problem. The
simplest solution here would be to ban the leader-as-worker case by
contract. Alternatively, we could pass fd.c segments from the
leader-as-worker Tuplesortstate's BufFile to the leader
Tuplesortstate's BufFile without opening or closing anything. This
way, there will be no 

Re: [HACKERS] possible encoding issues with libxml2 functions

2017-03-12 Thread Pavel Stehule
2017-03-12 21:57 GMT+01:00 Noah Misch :

> On Sun, Mar 12, 2017 at 08:36:58PM +0100, Pavel Stehule wrote:
> > 2017-03-12 0:56 GMT+01:00 Noah Misch :
> > > On Mon, Feb 20, 2017 at 07:48:18PM +0100, Pavel Stehule wrote:
> > > > There are possible two fixes
> > > >
> > > > a) clean decl on input - the encoding info can be removed from decl
> part
> > > >
> > > > b) use xml_out_internal everywhere before transformation to
> > > > xmlChar. pg_xmlCharStrndup can be good candidate.
> > >
> > > I'd prefer (a) if the xml type were a new feature, because no good can
> > > come of
> > > storing an encoding in each xml field when we know the actual encoding
> is
> > > the
> > > database encoding.  However, if you implemented (a), we'd still see
> > > untreated
> > > values brought over via pg_upgrade.  Therefore, I would try (b)
> first.  I
> > > suspect the intent of xml_parse() was to implement (b); it will be
> > > interesting
> > > to see your test case that malfunctions.
> > >
> >
> > I looked there again and I found so this issue is related to xpath
> function
> > only
> >
> > Functions based on xml_parse are working without problems. xpath_internal
> > uses own direct xmlCtxtReadMemory without correct encoding sanitation.
> >
> > so fix is pretty simple
>
> Please add a test case.
>

It needs an application - currently there is no possibility to import an XML
document via the recv API :(

I wrote a pgimportdoc utility, but it is not part of core


>
> > --- a/src/backend/utils/adt/xml.c
> > +++ b/src/backend/utils/adt/xml.c
> > @@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype
> *data,
> > ArrayType *namespaces,
> > ns_count = 0;
> > }
> >
> > -   datastr = VARDATA(data);
> > -   len = VARSIZE(data) - VARHDRSZ;
> > +   datastr = xml_out_internal(data, 0);
>
> Why not use xml_parse() instead of calling xmlCtxtReadMemory() directly?
> The
> answer is probably in the archives, because someone understood the problem
> enough to document "Some XML-related functions may not work at all on
> non-ASCII data when the server encoding is not UTF-8. This is known to be
> an
> issue for xpath() in particular."


There are probably two possible issues:

1. What I touched - the recv function converts to the database encoding, but
the document's declared encoding is not updated.

2. There is no possibility to encode from the document encoding to the
database encoding.


> > +   len = strlen(datastr);
> > +
> > xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
> > +
>
> The two lines of functional change don't create a cause for more newlines,
> so
> don't add these two newlines.
>

ok


Re: [HACKERS] delta relations in AFTER triggers

2017-03-12 Thread Thomas Munro
On Fri, Mar 10, 2017 at 11:48 AM, Kevin Grittner  wrote:
> On Tue, Mar 7, 2017 at 6:28 PM, Kevin Grittner  wrote:
>
>> New patch attached.
>
> And bit-rotted less than 24 hours later by fcec6caa.
>
> New patch attached just to fix bit-rot.
>
> That conflicting patch might be a candidate to merge into the new
> Ephemeral Named Relation provided by my patch, for more flexibility
> and extensibility...

Thanks.  I found a new way to break it: run the trigger function so
that the plan is cached by plpgsql, then ALTER TABLE incompatibly,
then run the trigger function again.  See attached.

On Wed, Mar 8, 2017 at 1:28 PM, Kevin Grittner  wrote:
>>> I was looking for omissions that would cause some kind of statements
>>> to miss out on ENRs arbitrarily.  It seemed to me that
>>> parse_analyze_varparams should take a QueryEnvironment, mirroring
>>> parse_analyze, so that PrepareQuery could pass it in.  Otherwise,
>>> PREPARE wouldn't see ENRs.  Is there any reason why SPI_prepare should
>>> see them but PREPARE not?
>>
>> Any thoughts about that?
>
> Do you see any way to test that code, or would it be dead code there
> "just in case" we later decided to do something that needed it?  I'm
> not a big fan of the latter.  I've had to spend too much time
> maintaining and/or ripping out code that fits that description.

I guess you could test it by reaching PREPARE and EXECUTE via dynamic
SQL inside a plpgsql function (ie EXECUTE 'EXECUTE ...').
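
A rough sketch of that kind of test, using made-up names (whether the
transition table is actually visible to PREPARE there is exactly the open
question):

CREATE OR REPLACE FUNCTION enr_prepare_test() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- reach PREPARE/EXECUTE through dynamic SQL, referencing the new-table ENR
  EXECUTE 'PREPARE enr_stmt AS SELECT count(*) FROM newtab';
  EXECUTE 'EXECUTE enr_stmt';
  EXECUTE 'DEALLOCATE enr_stmt';
  RETURN NULL;
END;
$$;

CREATE TRIGGER enr_prepare_trigger
  AFTER UPDATE ON some_table
  REFERENCING NEW TABLE AS newtab
  FOR EACH STATEMENT EXECUTE PROCEDURE enr_prepare_test();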

Really I was just trying to be thorough and examine every path into
the parser and analyser to make sure they all supported the new
QueryEnvironment argument.  When I found that the PREPARE path didn't,
my first thought was that there may be PLs that wouldn't be able to
take advantage of plan reuse any other way, but I see that all the
built-in PLs expose SPI_prepare, so that isn't a problem for them.

You're probably right that it's not actually very useful.  We've
recorded this obscure omission in the archives.

> Miscellanea:
>
> Do you suppose we should have all PLs that are part of the base
> distro covered?

I vote for doing that in Postgres 11.  My pl/python patch[1] may be a
useful starting point, but I haven't submitted it in this CF and
nobody has shown up with pl/tcl or pl/perl versions.

> What is necessary to indicate an additional SQL feature covered?

I assume you're talking about information_schema.sql_features, and I
see you've created a new thread to talk about that.  I'm not sure
about that, but a couple of thoughts occurred to me when looking for
references to transition tables in an old draft standard I have.
These are both cases where properties of the subject table should
probably also affect access to the derived transition tables:

* What privilege implications are there for transition tables?  I'm
wondering about column and row level privileges; for example, if you
can't see a column in the subject table, I'm guessing you shouldn't be
allowed to see it in the transition table either, but I'm not sure.

* In future we could consider teaching it about functional
dependencies as required by the spec; if you can SELECT id, name FROM
<the subject table> GROUP BY id, I believe you should be able to SELECT
id, name FROM <the transition table> GROUP BY id, but currently you can't.
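
Using the names from the attached script, a minimal illustration (the
primary key on id is what makes name functionally dependent on it):

-- allowed today against the subject table, because id is its primary key:
SELECT id, name FROM hoge GROUP BY id;

-- the spec would extend the same functional-dependency reasoning to the
-- transition table inside the trigger, but currently this is rejected:
-- SELECT id, name FROM i GROUP BY id;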

[1] 
https://www.postgresql.org/message-id/CAEepm=3wvmpmz3bkftk2kcnd9kr7hxpz2skj8sfzx_vsute...@mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com
DROP TABLE IF EXISTS hoge;

CREATE TABLE hoge
(
  id int primary key,
  name text
);

CREATE OR REPLACE FUNCTION hoge_upd_func()
  RETURNS TRIGGER
  LANGUAGE plpgsql
AS $$
BEGIN
  RAISE WARNING 'old table = %, new table = %', (SELECT string_agg(id || '=' || 
name, ',') FROM d), (SELECT string_agg(id || '=' || name, ',') FROM i);
  RETURN NULL;
END;
$$;

CREATE TRIGGER hoge_upd_trigger
  AFTER UPDATE ON hoge
  REFERENCING OLD TABLE AS d NEW TABLE AS i
  FOR EACH STATEMENT EXECUTE PROCEDURE hoge_upd_func();

insert into hoge values (1, '1'), (2, '2'), (3, '3');
update hoge set name = name || name;

-- now change 'name' to an integer to see what happens...
alter table hoge alter column name type int using name::integer;
update hoge set name = (name::text || name::text)::integer;

-- at this point we get an error message:
-- ERROR:  attribute 2 has wrong type
-- DETAIL:  Table has type integer, but query expects text.

-- That error ^ can be cleared by recreating the function hoge_upd_func.

-- now drop column 'name' 
alter table hoge drop column name;
update hoge set id = id;

-- segfault!
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] possible encoding issues with libxml2 functions

2017-03-12 Thread Noah Misch
On Sun, Mar 12, 2017 at 08:36:58PM +0100, Pavel Stehule wrote:
> 2017-03-12 0:56 GMT+01:00 Noah Misch :
> > On Mon, Feb 20, 2017 at 07:48:18PM +0100, Pavel Stehule wrote:
> > > There are possible two fixes
> > >
> > > a) clean decl on input - the encoding info can be removed from decl part
> > >
> > > b) use xml_out_internal everywhere before transformation to
> > > xmlChar. pg_xmlCharStrndup can be good candidate.
> >
> > I'd prefer (a) if the xml type were a new feature, because no good can
> > come of
> > storing an encoding in each xml field when we know the actual encoding is
> > the
> > database encoding.  However, if you implemented (a), we'd still see
> > untreated
> > values brought over via pg_upgrade.  Therefore, I would try (b) first.  I
> > suspect the intent of xml_parse() was to implement (b); it will be
> > interesting
> > to see your test case that malfunctions.
> >
> 
> I looked there again and I found so this issue is related to xpath function
> only
> 
> Functions based on xml_parse are working without problems. xpath_internal
> uses own direct xmlCtxtReadMemory without correct encoding sanitation.
> 
> so fix is pretty simple

Please add a test case.

> --- a/src/backend/utils/adt/xml.c
> +++ b/src/backend/utils/adt/xml.c
> @@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data,
> ArrayType *namespaces,
> ns_count = 0;
> }
> 
> -   datastr = VARDATA(data);
> -   len = VARSIZE(data) - VARHDRSZ;
> +   datastr = xml_out_internal(data, 0);

Why not use xml_parse() instead of calling xmlCtxtReadMemory() directly?  The
answer is probably in the archives, because someone understood the problem
enough to document "Some XML-related functions may not work at all on
non-ASCII data when the server encoding is not UTF-8. This is known to be an
issue for xpath() in particular."

> +   len = strlen(datastr);
> +
> xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
> +

The two lines of functional change don't create a cause for more newlines, so
don't add these two newlines.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS][REVIEW] macaddr 64 bit (EUI-64) datatype support

2017-03-12 Thread Stephen Frost
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Stephen Frost (sfr...@snowman.net) wrote:
> > * Haribabu Kommi (kommi.harib...@gmail.com) wrote:
> > > On Wed, Feb 1, 2017 at 6:27 AM, Vitaly Burovoy  
> > > wrote:
> > > > The new status of this patch is: Ready for Committer
> > > 
> > > Thanks for the review.
> > 
> > I've started taking a look at this with an eye towards committing it
> > soon.
> 
> I've spent a good bit of time going over this, possibly even more than
> it was worth, but hopefully we'll see people making use of this data
> type with PG10 and as more IPv6 deployment happens.

And, naturally, re-reading the email as it hit the list made me realize
that the documentation/error-message incorrectly said "3rd and 4th"
bytes were being set to FF and FE, when it's actually the 4th and 5th
byte.  The code was correct, just the documentation and the error
message had the wrong numbers.  The commit message is also correct.

As a reminder to myself, this will also need a catversion bump when it
gets committed, of course.

Updated patch attached.

Thanks!

Stephen
From 5b470010cacdc32bec521b1d57ad9f60c7bd638a Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Thu, 9 Mar 2017 17:47:28 -0500
Subject: [PATCH] Add support for EUI-64 MAC addresses as macaddr8

This adds in support for EUI-64 MAC addresses by adding a new data type
called 'macaddr8' (using our usual convention of indicating the number
of bytes stored).

This was largely a copy-and-paste from the macaddr data type, with
appropriate adjustments for having 8 bytes instead of 6 and adding
support for converting a provided EUI-48 (6 byte format) to the EUI-64
format.  Conversion from EUI-48 to EUI-64 inserts FFFE as the 4th and
5th bytes but does not perform the IPv6 modified EUI-64 action of
flipping the 7th bit, but we add a function to perform that specific
action for the user as it may be commonly done by users who wish to
calculate their IPv6 address based on their network prefix and 48-bit
MAC address.

Author: Haribabu Kommi, with a good bit of rework of macaddr8_in by me.
Reviewed by: Vitaly Burovoy, Kuntal Ghosh

Discussion: https://postgr.es/m/CAJrrPGcUi8ZH+KkK+=tctnq+efkecehtmu_yo1mvx8hsk_g...@mail.gmail.com
---
 contrib/btree_gin/Makefile  |   4 +-
 contrib/btree_gin/btree_gin--1.0--1.1.sql   |  35 ++
 contrib/btree_gin/btree_gin.c   |  10 +
 contrib/btree_gin/btree_gin.control |   2 +-
 contrib/btree_gin/expected/macaddr8.out |  51 +++
 contrib/btree_gin/sql/macaddr8.sql  |  22 ++
 contrib/btree_gist/Makefile |  11 +-
 contrib/btree_gist/btree_gist--1.3--1.4.sql |  64 
 contrib/btree_gist/btree_gist.control   |   2 +-
 contrib/btree_gist/btree_gist.h |   1 +
 contrib/btree_gist/btree_macaddr8.c | 200 ++
 contrib/btree_gist/expected/macaddr8.out|  89 +
 contrib/btree_gist/sql/macaddr8.sql |  37 ++
 doc/src/sgml/brin.sgml  |  11 +
 doc/src/sgml/btree-gin.sgml |   3 +-
 doc/src/sgml/btree-gist.sgml|   4 +-
 doc/src/sgml/datatype.sgml  |  83 +
 doc/src/sgml/func.sgml  |  56 +++
 src/backend/utils/adt/Makefile  |   2 +-
 src/backend/utils/adt/mac.c |  13 +-
 src/backend/utils/adt/mac8.c| 560 
 src/backend/utils/adt/network.c |  10 +
 src/backend/utils/adt/selfuncs.c|   1 +
 src/include/catalog/pg_amop.h   |  18 +
 src/include/catalog/pg_amproc.h |   7 +
 src/include/catalog/pg_cast.h   |   6 +
 src/include/catalog/pg_opclass.h|   3 +
 src/include/catalog/pg_operator.h   |  23 +-
 src/include/catalog/pg_opfamily.h   |   3 +
 src/include/catalog/pg_proc.h   |  37 +-
 src/include/catalog/pg_type.h   |   4 +
 src/include/utils/inet.h|  22 ++
 src/test/regress/expected/macaddr8.out  | 355 ++
 src/test/regress/expected/opr_sanity.out|   6 +
 src/test/regress/parallel_schedule  |   2 +-
 src/test/regress/serial_schedule|   1 +
 src/test/regress/sql/macaddr8.sql   |  89 +
 37 files changed, 1827 insertions(+), 20 deletions(-)
 create mode 100644 contrib/btree_gin/btree_gin--1.0--1.1.sql
 create mode 100644 contrib/btree_gin/expected/macaddr8.out
 create mode 100644 contrib/btree_gin/sql/macaddr8.sql
 create mode 100644 contrib/btree_gist/btree_gist--1.3--1.4.sql
 create mode 100644 contrib/btree_gist/btree_macaddr8.c
 create mode 100644 contrib/btree_gist/expected/macaddr8.out
 create mode 100644 contrib/btree_gist/sql/macaddr8.sql
 create mode 100644 src/backend/utils/adt/mac8.c
 create mode 100644 src/test/regress/expected/macaddr8.out
 create mode 100644 src/test/regress/sql/macaddr8.sql

diff --git 

[HACKERS] bugfix: xpath encoding issue

2017-03-12 Thread Pavel Stehule
Hi

When I tested the XMLTABLE function I found a bug in the XPATH function -
xpath_internal.

There the xmltype value is not correctly encoded to xmlChar due to possibly
invalid encoding info in the header. This can happen when the XML was loaded
with the recv function and does not have UTF8 encoding.

The functions based on xml_parse work well.

The fix is simple

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index f81cf489d2..89aae48cb3 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data,
ArrayType *namespaces,
ns_count = 0;
}

-   datastr = VARDATA(data);
-   len = VARSIZE(data) - VARHDRSZ;
+   datastr = xml_out_internal(data, 0);
+   len = strlen(datastr);
+
xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
+
if (xpath_len == 0)
ereport(ERROR,
(errcode(ERRCODE_DATA_EXCEPTION),

Regards

Pavel


Re: [HACKERS][REVIEW] macaddr 64 bit (EUI-64) datatype support

2017-03-12 Thread Stephen Frost
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Haribabu Kommi (kommi.harib...@gmail.com) wrote:
> > On Wed, Feb 1, 2017 at 6:27 AM, Vitaly Burovoy  
> > wrote:
> > > The new status of this patch is: Ready for Committer
> > 
> > Thanks for the review.
> 
> I've started taking a look at this with an eye towards committing it
> soon.

I've spent a good bit of time going over this, possibly even more than
it was worth, but hopefully we'll see people making use of this data
type with PG10 and as more IPv6 deployment happens.

Of particular note, I rewrote macaddr8_in to not use sscanf().
sscanf(), at least on my system, would accept negative values even for
'%2x', leading to slightly odd errors with certain inputs, including
with our existing macaddr type:

=# select '00-203040506'::macaddr;
ERROR:  invalid octet value in "macaddr" value: "00-203040506"
LINE 1: select '00-203040506'::macaddr;

With my rewrite, the macaddr8 type will throw a clearer (in  my view, at
least) error:

=# select '00-203040506'::macaddr8;
ERROR:  invalid input syntax for type macaddr8: "00-203040506"
LINE 1: select '00-203040506'::macaddr8;

One other point is that the previously disallowed format with just two
colons ('0800:2b01:0203') is now allowed.  Given that both the two dot
format ('0800.2b01.0203') and the two dash format ('0800-2b01-0203')
were accepted, this seemed alright to me.  Is there really a good reason
to disallow the two colon format?
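
For reference, with the attached patch all three of these spellings should
parse to the same value (the dot and dash forms were already accepted, the
two-colon one is the newly allowed form):

SELECT '0800.2b01.0203'::macaddr8;
SELECT '0800-2b01-0203'::macaddr8;
SELECT '0800:2b01:0203'::macaddr8;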

I didn't change how macaddr works as it doesn't appear to produce any
outright incorrect behavior as-is (just slightly odd error messages) and
some users might be expecting the current errors.  I don't hold that
position very strongly, however, and I have little doubt that the new
macaddr8_in() is faster than using sscanf(), so that might be reason to
consider rewriting macaddr_in in a similar fashion (or having a generic
set of functions to handle both).  I considered using the functions we
already use for bytea, but they don't quite match up to the expectations
for MAC addresses (in particular, we shouldn't accept random whitespace
in the middle of a MAC address).  Perhaps we could modify those
functions to be parameterized in a way to support how a MAC address
should look, but it's not all that much code to be reason enough to put
a lot of effort towards that, in my view at least.  This also reduces
the risk that bugs get introduced which break existing behavior too.

I also thought about what we expect the usage of macaddr8 to be and
realized that we should really have a function to help go from EUI-48 to
the IPv6 Modified EUI-64 format, since users will almost certainly want
to do exactly that.  As such, I added a macaddr8_set7bit() function
which will perform the EUI-64 -> Modified EUI-64 change (which is just
setting the 7th bit) and added associated documentation and reasoning
for why that function exists.
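
For example, assuming the conversion and function described above:

SELECT '08:00:2b:01:02:03'::macaddr8;
-- 08:00:2b:ff:fe:01:02:03   (FF FE inserted as the 4th and 5th bytes)

SELECT macaddr8_set7bit('08:00:2b:01:02:03'::macaddr8);
-- 0a:00:2b:ff:fe:01:02:03   (7th bit set, i.e. the IPv6 modified EUI-64 form)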

In any case, it would be great to get some additional review of this, in
particular of my modifications to macaddr8_in() and if anyone has any
thoughts regarding the added macaddr8_set7bit() function.  I'm going to
take a break from it for a couple days to see if there's any additional
comments and then go over it again myself.

Barring issues, I'll commit the attached later this week.

Thanks!

Stephen
From 239affc0e5b09964029f075c5370556c56de005d Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Thu, 9 Mar 2017 17:47:28 -0500
Subject: [PATCH] Add support for EUI-64 MAC addresses as macaddr8

This adds in support for EUI-64 MAC addresses by adding a new data type
called 'macaddr8' (using our usual convention of indicating the number
of bytes stored).

This was largely a copy-and-paste from the macaddr data type, with
appropriate adjustments for having 8 bytes instead of 6 and adding
support for converting a provided EUI-48 (6 byte format) to the EUI-64
format.  Conversion from EUI-48 to EUI-64 adds FFFE as the 4th and 5th
bytes but does not perform the IPv6 modified EUI-64 action of flipping
the 7th bit, but we add a function to perform that specific action for
the user as it may be commonly done by users who wish to calculate their
IPv6 address based on their network prefix and 48-bit MAC address.

Author: Haribabu Kommi, with a good bit of rework of macaddr8_in by me.
Reviewed by: Vitaly Burovoy, Kuntal Ghosh

Discussion: https://postgr.es/m/CAJrrPGcUi8ZH+KkK+=tctnq+efkecehtmu_yo1mvx8hsk_g...@mail.gmail.com
---
 contrib/btree_gin/Makefile  |   4 +-
 contrib/btree_gin/btree_gin--1.0--1.1.sql   |  35 ++
 contrib/btree_gin/btree_gin.c   |  10 +
 contrib/btree_gin/btree_gin.control |   2 +-
 contrib/btree_gin/expected/macaddr8.out |  51 +++
 contrib/btree_gin/sql/macaddr8.sql  |  22 ++
 contrib/btree_gist/Makefile |  11 +-
 contrib/btree_gist/btree_gist--1.3--1.4.sql |  64 
 contrib/btree_gist/btree_gist.control   |   2 +-
 

Re: [HACKERS] possible encoding issues with libxml2 functions

2017-03-12 Thread Pavel Stehule
2017-03-12 0:56 GMT+01:00 Noah Misch :

> On Mon, Feb 20, 2017 at 07:48:18PM +0100, Pavel Stehule wrote:
> > Today I played with xml_recv function and with xml processing functions.
> >
> > xml_recv function ensures correct encoding from document encoding to
> server
> > encoding. But the decl section holds original encoding info - that should
> > be obsolete after encoding. Sometimes we solve this issue by removing
> decl
> > section - see the xml_out function.
> >
> > Sometimes we don't do it - lot of functions uses direct conversion from
> > xmltype to xmlChar.
>
> > There are possible two fixes
> >
> > a) clean decl on input - the encoding info can be removed from decl part
> >
> > b) use xml_out_internal everywhere before transformation to
> > xmlChar. pg_xmlCharStrndup can be good candidate.
>
> I'd prefer (a) if the xml type were a new feature, because no good can
> come of
> storing an encoding in each xml field when we know the actual encoding is
> the
> database encoding.  However, if you implemented (a), we'd still see
> untreated
> values brought over via pg_upgrade.  Therefore, I would try (b) first.  I
> suspect the intent of xml_parse() was to implement (b); it will be
> interesting
> to see your test case that malfunctions.
>

I looked there again and I found that this issue is related to the xpath
function only.

Functions based on xml_parse are working without problems. xpath_internal
uses its own direct xmlCtxtReadMemory call without correct encoding sanitation.

So the fix is pretty simple:

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index f81cf489d2..89aae48cb3 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data,
ArrayType *namespaces,
ns_count = 0;
}

-   datastr = VARDATA(data);
-   len = VARSIZE(data) - VARHDRSZ;
+   datastr = xml_out_internal(data, 0);
+   len = strlen(datastr);
+
xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
+
if (xpath_len == 0)
ereport(ERROR,
(errcode(ERRCODE_DATA_EXCEPTION),

Regards

Pavel


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-12 Thread Tom Lane
"David G. Johnston"  writes:
> There are only four commands and a finite number of usage permutations.
> Enumerating and figuring out the proper behavior for each should be done.

> Thus - If the expressions are bad they are considered false but the block
> is created

> If the flow-control command is bad the system will tell the user why and
> how to get back to a valid state - the entire machine state goes INVALID
> until a corrective command is encountered.

> For instance:

> \if
> \else
> \elif
> warning: elif block cannot occur directly within an \else block.  either
> start a new \if, \endif the current scope, or type \else to continue
> entering commands into the existing else block.  no expression evaluation
> has occurred.
> \echo 'c'
> warning: command ignored in broken \if block scope - see prior correction
> options

This is looking a whole lot like the overcomplicated error reporting that
we already considered and rejected.  I think it's sufficient to print
something like "\elif is not allowed to follow \else; command ignored"
and not change state.  We're not really helping anybody by going into
an "invalid machine state" AFAICS, and having such a thing complicates
the mental model more than I'd like.

A different way of looking at this problem, which will seem like overkill
right now but would absolutely not be once you consider looping, is that
what should happen when we see \if is that we do nothing but absorb text
until we see the matching \endif.  At that point we could bitch and throw
everything away if, say, there's \elif after \else, or anything else you
want to regard as a "compile time error".  Otherwise we start execution,
and from there on it probably has to behave as we've been discussing.
But this'd be pretty unfriendly from an interactive standpoint, and I'm
not really convinced that it makes for significantly better error
reporting.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-12 Thread Jan Michálek
2017-03-10 9:43 GMT+01:00 Jan Michálek :

>
>
> 2017-03-09 20:10 GMT+01:00 Peter Eisentraut  com>:
>
>> This is looking pretty neat.  I played around with it a bit.  There are
>> a couple of edge cases that you need to address, I think.
>>
>
> Thanks, original code is very synoptical and well prepared for adding
> new formats.
>
>
>>
>> - Does not support \x
>>
>
> I know, I don't know if \x makes sense in this case. I will look at how it
> is done in other formats like html. I think that it should work in the
> sense that a table generated as rst should give similar output after
> processing to the output of the html format.
>
>
I prepared something like this (I have not prepared a diff yet, I need to
make some other changes first).
There are a few things I need to do. The first problem is bold column names;
I should do it in the same fashion as "RECORD", but I need to do some
research about the column length.
A bigger problem is with the tab indent; the rst processor doesn't handle
this in this case.

jelen=# execute q \g | xclip
+---------+-------------------------------------------------------------+
| **RECORD 1**                                                          |
+---------+-------------------------------------------------------------+
| column1 | Elephant, kangaroo,                                         |
|         | squirrel, gorilla                                           |
+---------+-------------------------------------------------------------+
| column2 | 121                                                         |
+---------+-------------------------------------------------------------+
| column3 | 1.0035971223021583                                          |
+---------+-------------------------------------------------------------+
| column4 | 0.                                                          |
+---------+-------------------------------------------------------------+
| column5 | Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello |
+---------+-------------------------------------------------------------+
| **RECORD 2**                                                          |
+---------+-------------------------------------------------------------+
| column1 | goat, rhinoceros,                                           |
|         | monkey, ape                                                 |
+---------+-------------------------------------------------------------+
| column2 | 11121                                                       |
+---------+-------------------------------------------------------------+
| column3 | 1.0007824726134585                                          |
+---------+-------------------------------------------------------------+
| column4 | 5.                                                          |
+---------+-------------------------------------------------------------+
| column5 | xx xx xx xx xx xx xx xx xx xx                               |
+---------+-------------------------------------------------------------+
| **RECORD 3**                                                          |
+---------+-------------------------------------------------------------+
| column1 | donkey, cow, horse, tit,                                    |
|         | eagle, whale,                                               |
|         | aligator,                                                   |
|         | pelican,                                                    |
|         | grasshoper                                                  |
|         | pig                                                         |
|         | bat                                                         |
+---------+-------------------------------------------------------------+
| column2 | 14351                                                       |
+---------+-------------------------------------------------------------+
| column3 | 50.3877551020408163                                         |
+---------+-------------------------------------------------------------+
| column4 | 345.11                                                      |
+---------+-------------------------------------------------------------+
| column5 | yy yy yy yy yy yy yy yy yy yy                               |
+---------+-------------------------------------------------------------+





>
>> - When \pset format is rst, then \pset linestyle also shows up as
>>   "rst".  That is wrong.  Same for markdown.

Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-12 Thread David G. Johnston
On Sun, Mar 12, 2017 at 10:24 AM, Tom Lane  wrote:

>
> One point here is that we need to distinguish problems in the expression,
> which could arise from changing variable values, from some other types of
> mistakes like \elif with no preceding \if.  When you see something like
> that you pretty much have to treat it as a no-op; but I don't think that's
> a problem for scripting usage.
>

One of my discarded write-ups from last night made a point that we don't
really distinguish between run-time and compile-time errors - possibly
because we haven't had to until now.

If we detect what would be considered a compile-time error (\elif after
\else for instance) we could treat anything that isn't a conditional
meta-command as a no-op with a warning (and exit in stop-script mode).

There are only four commands and a finite number of usage permutations.
Enumerating and figuring out the proper behavior for each should be done.

Thus - If the expressions are bad they are considered false but the block
is created

If the flow-control command is bad the system will tell the user why and
how to get back to a valid state - the entire machine state goes INVALID
until a corrective command is encountered.

For instance:

\if
\else
\elif
warning: elif block cannot occur directly within an \else block.  either
start a new \if, \endif the current scope, or type \else to continue
entering commands into the existing else block.  no expression evaluation
has occurred.
\echo 'c'
warning: command ignored in broken \if block scope - see prior correction
options

Yes, that's wordy, but if that was shown the user would be able to
recognize their situation and be able to get back to their desired state.

Figuring out what the valid correction commands are for each invalid state,
and where the user is left, is tedious but mechanical.

So we'd need an INVALID state as well as the existing IGNORE state.

\endif would always work - but take you up one nesting level

The user shouldn't need to memorize the invalid state rules.  While we
could document them and point the reader there having them inline seems
preferable.

>
> We could imagine resolving this tension by treating failed \if expressions
> differently in interactive and noninteractive cases.  But I fear that cure
> would be worse than the disease.
>

I don't think this becomes necessary - we should distinguish the error
types the same in both modes.


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-12 Thread Corey Huinker
>
>
> (1) document that \if-related commands MUST be on their own
> line (i.e. like cpp #if directives?).
>

I have no opinion on whether \if-related commands must be on their own
line, though I coded as if that were the case.

I want to point out that the goal down the road is to allow rudimentary
expressions beyond just 'will this string cast to boolean true'.

For example, in the earlier thread "Undefined psql variables", I proposed a
slash command that would test if a named psql var were defined, and if not
then assign it a value.

Tom suggested leveraging if-then infrastructure like this

 \if not defined(x)
  \set x y
 \fi

Which would be great. I ask that whatever we decide in terms of how much
more input we read to digest the expression allow for constructs like the
one above.


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-12 Thread Tom Lane
Corey Huinker  writes:
> Reading this, I started to wonder "so how did I get that impression?" and I
> found this from Feb 9:

> IMO, an erroneous backslash command should have no effect, period.
> "It failed but we'll treat it as if it were valid" is a rathole
> I don't want to descend into.  It's particularly bad in interactive
> mode, because the most natural thing to do is correct your spelling
> and issue the command again --- but if psql already decided to do
> something on the strength of the mistaken command, that doesn't work,
> and you'll have to do something or other to unwind the unwanted
> control state before you can get back to what you meant to do.

Yeah, it's not the greatest thing for interactive usage, but as we
already discussed, this feature needs to be optimized for scripting not
interaction --- and even a bit of thought shows that the current behavior
is disastrous for scripting.  If your only suggestion for getting sane
behavior in a script is "set ON_ERROR_STOP", you've failed to provide
useful error handling.

One point here is that we need to distinguish problems in the expression,
which could arise from changing variable values, from some other types of
mistakes like \elif with no preceding \if.  When you see something like
that you pretty much have to treat it as a no-op; but I don't think that's
a problem for scripting usage.

We could imagine resolving this tension by treating failed \if expressions
differently in interactive and noninteractive cases.  But I fear that cure
would be worse than the disease.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-12 Thread Corey Huinker
On Sun, Mar 12, 2017 at 1:52 AM, David G. Johnston <
david.g.johns...@gmail.com> wrote:
>
> Oddly, Corey was using you as support for this position...though without
an actual quote:
>
> """

Reading this, I started to wonder "so how did I get that impression?" and I
found this from Feb 9:

IMO, an erroneous backslash command should have no effect, period.
"It failed but we'll treat it as if it were valid" is a rathole
I don't want to descend into.  It's particularly bad in interactive
mode, because the most natural thing to do is correct your spelling
and issue the command again --- but if psql already decided to do
something on the strength of the mistaken command, that doesn't work,
and you'll have to do something or other to unwind the unwanted
control state before you can get back to what you meant to do.


Re: [HACKERS] scram and \password

2017-03-12 Thread Joe Conway
On 03/11/2017 11:07 PM, Michael Paquier wrote:
> Having a RLS on pg_authid to allow a user to look at its own password
> type is an idea.

Given that that is not likely at this stage of the dev cycle, what about
a special purpose SQL function that returns the password type for the
current user? Or is it a security issue of some sort to allow a user to
know their own password type?
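
Something along these lines, perhaps - a rough sketch only (pg_authid is
readable only by superusers, hence SECURITY DEFINER, and the exact scram
verifier prefix depends on the final on-disk format):

CREATE FUNCTION current_password_type() RETURNS text
LANGUAGE sql SECURITY DEFINER AS $$
  SELECT CASE
           WHEN rolpassword IS NULL               THEN 'none'
           WHEN rolpassword LIKE 'md5%'           THEN 'md5'
           WHEN rolpassword LIKE 'scram-sha-256%' THEN 'scram-sha-256'
           ELSE 'plain'
         END
  FROM pg_authid
  WHERE rolname = current_user;
$$;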

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-12 Thread Jan Michálek
I find there is a problem with the tab indent in rst; it looks like lines
should be aligned to the left in some cases.

2017-03-10 9:43 GMT+01:00 Jan Michálek :

>
>
> 2017-03-09 20:10 GMT+01:00 Peter Eisentraut  com>:
>
>> This is looking pretty neat.  I played around with it a bit.  There are
>> a couple of edge cases that you need to address, I think.
>>
>
> Thanks, original code is very synoptical and well prepared for adding
> new formats.
>
>
>>
>> - Does not support \x
>>
>
> I know, I don't know if \x makes sense in this case. I will look at how it
> is done in other formats like html. I think that it should work in the
> sense that a table generated as rst should give similar output after
> processing to the output of the html format.
>
>
>>
>> - When \pset format is rst, then \pset linestyle also shows up as
>>   "rst".  That is wrong.  Same for markdown.
>>
>
> I will look on this.
>
>
>>
>> - Broken output in tuples_only (\t) mode. (rst and markdown)
>>
>
> Similar to \x, I'm not certain what it should return. I will look at what
> the html format returns. Or I can use it in markdown for the nice vs expanded
> format.
>
>
>>
>> - rst: Do something about \pset title; the way it currently shows up
>>   appears to be invalid; could use ".. table:: title" directive
>>
>
> OK, it shouldn't be a problem to alter this.
>
>
>>
>> - markdown: Extra blank line between table and footer.
>>
>
> It is because markdown needs an empty line after the table if the row count
> is presented.
>
>
>>
>> - markdown: We should document or comment somewhere exactly which of the
>>   various markdown table formats this is supposed to produce.  (Pandoc
>>   pipe_tables?)
>>
>
> I use format that was similar to aligned format and ascii linestyle,
> because it allows me to use existing features. I should look over more
> table styles in markdown.
>
>
>>
>> - markdown: Table title needs to be after the table, like
>>
>> Table: title
>>
>> I will change this.
>
>
>> - markdown: Needs to escape | characters in cell contents.  (Not
>>   needed for rst.)  More escaping might be needed.
>>
>
> This can be a problem because of aligning; I will look into this, it is the
> same problem as replacing newlines with <br> for markdown.
>
> Have Nice day
>
> Jan
>
>
>>
>> --
>> Peter Eisentraut  http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>
>
> --
> Jelen
> Starší čeledín datovýho chlíva
>



-- 
Jelen
Starší čeledín datovýho chlíva


Re: [HACKERS] Parallel Append implementation

2017-03-12 Thread Tels
Moin,

On Sat, March 11, 2017 11:29 pm, Robert Haas wrote:
> On Fri, Mar 10, 2017 at 6:01 AM, Tels 
> wrote:
>> Just a question for me to understand the implementation details vs. the
>> strategy:
>>
>> Have you considered how the scheduling decision might impact performance
>> due to "inter-plan parallelism vs. in-plan parallelism"?
>>
>> So what would be the scheduling strategy? And should there be a fixed
>> one
>> or user-influencable? And what could be good ones?
>>
>> A simple example:
>>
>> E.g. if we have 5 subplans, and each can have at most 5 workers and we
>> have 5 workers overall.
>>
>> So, do we:
>>
>>   Assign 5 workers to plan 1. Let it finish.
>>   Then assign 5 workers to plan 2. Let it finish.
>>   and so on
>>
>> or:
>>
>>   Assign 1 workers to each plan until no workers are left?
>
> Currently, we do the first of those, but I'm pretty sure the second is
> way better.  For example, suppose each subplan has a startup cost.  If
> you have all the workers pile on each plan in turn, every worker pays
> the startup cost for every subplan.  If you spread them out, then
> subplans can get finished without being visited by all workers, and
> then the other workers never pay those costs.  Moreover, you reduce
> contention for spinlocks, condition variables, etc.  It's not
> impossible to imagine a scenario where having all workers pile on one
> subplan at a time works out better: for example, suppose you have a
> table with lots of partitions all of which are on the same disk, and
> it's actually one physical spinning disk, not an SSD or a disk array
> or anything, and the query is completely I/O-bound.  Well, it could
> be, in that scenario, that spreading out the workers is going to turn
> sequential I/O into random I/O and that might be terrible.  In most
> cases, though, I think you're going to be better off.  If the
> partitions are on different spindles or if there's some slack I/O
> capacity for prefetching, you're going to come out ahead, maybe way
> ahead.  If you come out behind, then you're evidently totally I/O
> bound and have no capacity for I/O parallelism; in that scenario, you
> should probably just turn parallel query off altogether, because
> you're not going to benefit from it.

I agree with the proposition that both strategies can work well, or not,
depending on system-setup, the tables and data layout. I'd be a bit more
worried about turning it into the "random-io-case", but that's still just
a feeling and guesswork.

So which one will be better seems speculative, hence the question for
benchmarking different strategies.

So, I'd like to see the scheduler be in a single place, maybe a
function that gets called with the number of currently running workers,
the max. number of workers to be expected, the new worker, the list of
plans still to do, and then schedules that single worker to one of these
plans by strategy X.

That would make it easier to swap out X for Y and see how it fares,
wouldn't it?


However, I don't think the patch needs to select the optimal strategy
right from the beginning (if that even exists, maybe it's a mixed
strategy); even "not so optimal" parallelism will be better than doing
everything sequentially.

Best regards,

Tels


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue about the streaming replication

2017-03-12 Thread Jinhua Luo
Thanks for your information. Now I know about the pg_rewind tool.

But why does PG not recover from the divergence automatically?

There are at least two options, by analogy to what "git merge" does:

a) the old master finds out and rewinds itself to the common base with
the new master in the WAL history before applying new WAL segments from
the new master.

b) maybe with logical replication, two masters could merge their data;
if a conflict exists, use the new version. That means data committed on
the old master could survive the crash, which may be important for some
businesses.

pg_rewind helps to do option a. But how about option b?


2017-03-12 19:20 GMT+08:00 Michael Paquier :
> On Sun, Mar 12, 2017 at 5:24 PM, Jinhua Luo  wrote:
>> I think this diverge scenario is common, because it's likely the
>> master would crash due to some hardware issue (e.g. power off) which
>> would cause some committed transaction has not yet synced to slave,
>> while the slave would be promoted to new master and accepts new
>> transactions, then how to recover the old master? Moreover, how to
>> recover the data on old master which is missing on new master?
>
> pg_rewind (https://www.postgresql.org/docs/9.6/static/app-pgrewind.html)
> has been designed with exactly this scenario in mind, aka recycling a
> past master as a slave to a promoted node. Have you looked at it? What
> you are trying to do is much likely going to create corruptions on
> your systems, so I am not surprised that you see inconsistency
> failures, what you are seeing is pretty much the tip of the iceberg.
> --
> Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue about the streaming replication

2017-03-12 Thread Michael Paquier
On Sun, Mar 12, 2017 at 5:24 PM, Jinhua Luo  wrote:
> I think this diverge scenario is common, because it's likely the
> master would crash due to some hardware issue (e.g. power off) which
> would cause some committed transaction has not yet synced to slave,
> while the slave would be promoted to new master and accepts new
> transactions, then how to recover the old master? Moreover, how to
> recover the data on old master which is missing on new master?

pg_rewind (https://www.postgresql.org/docs/9.6/static/app-pgrewind.html)
has been designed with exactly this scenario in mind, aka recycling a
past master as a slave to a promoted node. Have you looked at it? What
you are trying to do is very likely going to create corruption on
your systems, so I am not surprised that you see inconsistency
failures; what you are seeing is pretty much the tip of the iceberg.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] issue about the streaming replication

2017-03-12 Thread Jinhua Luo
Hi,

I made a test to see how PostgreSQL handles the replication divergence problem:

a) set up two pg clusters, A and B
b) run A as master, B as slave, using streaming replication
c) insert some data into table foobar on A, and shut down the network
between A and B in the meantime, which ends up with some data
missing on B
d) A crashes
e) promote B as the new master, insert some data into table foobar on B
f) now the data on A and B diverge

When I restart A as the new slave, it reports the below error in the log:
record with incorrect prev-link

And worse, when I shut down B and promote A as master again, it
fails to start up:
LOG:  database system was shut down in recovery
FATAL:  invalid memory alloc request size 2281725952

What is this error, and why does it occur?

I think this divergence scenario is common, because it's likely the
master would crash due to some hardware issue (e.g. power off), which
would cause some committed transactions to not yet be synced to the slave,
while the slave would be promoted to new master and accept new
transactions. So how does one recover the old master? Moreover, how does
one recover the data on the old master which is missing on the new master?

Regards,
Jinhua Luo


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-12 Thread Fabien COELHO


Hello Tom,


> * Daniel Verite previously pointed out the desirability of disabling
> variable expansion while skipping script.  That doesn't seem to be here,


ISTM that it is still there, except for \elif conditions, which are
currently always checked.


  fabien=# \if false
  fabien@#   \echo `echo BAD`
command ignored, use \endif or Ctrl-C to exit current branch.
  fabien@# \else
  fabien=#   \echo `echo OK`
OK
  fabien=# \endif


> IIRC, I objected to putting knowledge of ConditionalStack into the
> shared psqlscan.l lexer, and I still think that would be a bad idea; but
> we need some way to get the lexer to shut that off. Probably the best
> way is to add a passthrough "void *" argument that would let the
> get_variable callback function mechanize the rule about not expanding in
> a false branch.


Hmmm. I see this as a roundabout way of providing the stack knowledge
without actually giving the stack... but it seems that would work, so why not.
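
For illustration, the shape being suggested might look something like the
self-contained sketch below (the struct and function names are invented for
the example and are NOT the actual psqlscan.l / ConditionalStack API; the
point is only that the lexer hands an opaque pointer back to the callback,
which can then decline to expand variables in a false branch):

  #include <stdio.h>
  #include <string.h>
  #include <stdbool.h>

  /* hypothetical stand-ins for the conditional stack and callback struct */
  typedef struct
  {
      bool        active;     /* is the current \if branch being executed? */
  } DemoConditionalStack;

  typedef struct
  {
      /* variable-lookup callback, now given an opaque passthrough pointer */
      const char *(*get_variable) (const char *varname, void *passthrough);
  } DemoScanCallbacks;

  static const char *
  demo_get_variable(const char *varname, void *passthrough)
  {
      DemoConditionalStack *cstack = (DemoConditionalStack *) passthrough;

      /* in a false branch, pretend the variable is unknown */
      if (cstack != NULL && !cstack->active)
          return NULL;
      if (strcmp(varname, "foo") == 0)
          return "42";
      return NULL;
  }

  int
  main(void)
  {
      DemoConditionalStack cstack = {false};
      DemoScanCallbacks callbacks = {demo_get_variable};
      const char *v;

      v = callbacks.get_variable("foo", &cstack);
      printf("false branch:  :foo -> %s\n", v ? v : "(not expanded)");

      cstack.active = true;
      v = callbacks.get_variable("foo", &cstack);
      printf("active branch: :foo -> %s\n", v ? v : "(not expanded)");
      return 0;
  }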



> * Whether or not you think it's important not to expand skipped variables,
> I think that it's critical that skipped backtick expressions not be
> executed.



> \if something
> \elif `expr :var1 + :var2 = :var3`
> \endif



> I think it's essential that expr not be called if the first if-condition
> succeeded.


This was the behavior at some point, but it was changed because we
understood the requirement to be that boolean errors should be detected
and the resulting command simply ignored. I'm really fine with having
the old behavior back.
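
To restate the requested behavior with a concrete script: in a file such
as the one below, the backticked expr must never be executed, because the
\if branch is taken (a sketch of the intended semantics with made-up
variable values, not necessarily of what the current patch does):

  \set var1 1
  \set var2 2
  \set var3 3
  \if true
    \echo first branch taken, expr is never invoked
  \elif `expr :var1 + :var2 = :var3`
    \echo never reached
  \endif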



> * The documentation says that an \if or \elif expression extends to the
> end of the line, but actually the code is just eating one OT_NORMAL
> argument.  That means it's OK to do this: [...]



> More generally, I do not think that the approach of having exec_command
> simply fall out immediately when in a false branch is going to work,
> because it ignores the fact that different backslash commands have
> different argument parsing rules.  Some will eat the rest of the line and
> some won't.  I'm afraid that it might be necessary to remove that code
> block and add a test to every single backslash command that decides
> whether to actually perform its action after it's consumed its arguments.
> That would be tedious :-(.


Indeed.

IMO the very versatile lexing conventions of backslash commands, or rather
their outright lack of consistency, make it hard to get something sane out
of this, especially combined with the "do not evaluate in a false branch"
requirement.


As a simple way out, I suggest that we:

(1) document that \if-related commands MUST be on their own
line (like cpp #if directives).

(2) check that this is indeed the case whenever an \if-related
command is detected.

(3) call it a feature if someone does not follow the rule and gets
strange behavior as a result, as below:


regression=# \if 0
regression@# \echo foo \endif
  command ignored, use \endif or Ctrl-C to exit current branch.
  (notice we're not out of the conditional)




> * I'm not on board with having a bad expression result in failing
> the \if or \elif altogether.


This was understood as a requirement in previous versions, which did not
fail. I do agree that it seems better to keep the conditional structure
intact on errors, at least for script usage.


> It was stated several times upthread that that should be processed as
> though the result were "false", and I agree with that.


I'm fine with that, if everyone could agree before Corey spends more time 
on this...
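
Under that rule an invalid expression would simply behave as "false" and
the conditional structure would stay intact, i.e. something along these
lines (a sketch of the agreed-upon semantics, not of what the current
patch prints):

  \if this_is_not_a_boolean
    \echo not printed: the invalid expression is treated as false
  \else
    \echo printed instead
  \endif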


> [...] We might as well replace the recommendation to use ON_ERROR_STOP
> with a forced abort() for an invalid expression value, because trying to
> continue a script with this behavior is entirely useless.


Hmmm. Maybe your remark is rhetorical. That could work for scripting use,
but in interactive mode coldly aborting on syntax errors is not very nice
for the user.


--
Fabien.

