Re: [HACKERS] ICU integration

2017-03-23 Thread Jeff Janes
On Thu, Mar 23, 2017 at 12:34 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 3/23/17 05:34, Andreas Karlsson wrote:
> > I am fine with this version of the patch. The issues I have with it,
> > which I mentioned earlier in this thread, seem to be issues with ICU
> > rather than with this patch. For example there seems to be no way for
> > ICU to validate the syntax of the BCP 47 locales (or ICU's old format).
> > But I think we will just have to accept the weirdness of how ICU handles
> > locales.
> >
> > I think this patch is ready to be committed.
> >
> > Found a typo in the documentation:
> >
> > "The inspect the currently available locales" should be "To inspect the
> > currently available locales".
>
> Committed.
>

This has broken the C locale, and the build farm.

if (pg_database_encoding_max_length() > 1 || locale->provider ==
COLLPROVIDER_ICU)

segfaults because locale is null.  (locale_is_c is true)

Cheers,

Jeff


Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> Is there a performance test case where this patch should shine
 Mark> brightest?  I'd like to load a schema with lots of data, and run
 Mark> a grouping sets query, both before and after applying the patch,
 Mark> to see what the performance advantage is.

The area which I think is most important for performance is the handling
of small cubes; without this patch, a 2d cube needs 2 full sorts, a 3d
one needs 3, and a 4d one needs 6. In many real-world data sets these
would all hash entirely in memory.

So here's a very simple example (deliberately using integers for
grouping to minimize the advantage; grouping by text columns in a non-C
locale would show a much greater speedup for the patch):

create table sales (
  id serial,
  product_id integer,
  store_id integer,
  customer_id integer,
  qty integer);

-- random integer function
create function d(integer) returns integer language sql
 as $f$ select floor(random()*$1)::integer + 1; $f$;

-- 10 million random rows
insert into sales (product_id,store_id,customer_id,qty)
  select d(20), d(6), d(10), d(100) from generate_series(1,1000);

-- example 2d cube:
select product_id, store_id, count(*), sum(qty)
  from sales
 group by cube(product_id, store_id);

-- example 3d cube:
select product_id, store_id, customer_id, count(*), sum(qty)
  from sales
 group by cube(product_id, store_id, customer_id);

-- example 4d cube with a computed column:
select product_id, store_id, customer_id, (qty / 10), count(*), sum(qty)
  from sales
 group by cube(product_id, store_id, customer_id, (qty / 10));

On my machine, the 2d cube is about 3.6 seconds with the patch, and
about 8 seconds without it; the 4d is about 18 seconds with the patch
and about 32 seconds without it (all with work_mem=1GB, compiled with
-O2 and assertions off).

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] createlang/droplang deprecated

2017-03-23 Thread Euler Taveira
2017-03-18 12:29 GMT-03:00 Tom Lane :

>
> But createuser/dropuser are a real problem, because they certainly could
> be mistaken for system-level utilities.


I proposed something along those lines [1] to fix this historical mistake
but we didn't reach a consensus. createuser/dropuser could be a candidate
to removal because it is easily replaced by psql -c "command here" like
Simon said. If we go to this road, other binaries (that are just a wrapper
around an SQL command) could be removed too (such as createdb, dropdb,
clusterdb and reindexdb).


[1]
https://www.postgresql.org/message-id/bdd1adb1-c26d-ad1f-2f15-cc5205606...@timbira.com.br


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-23 Thread Jan Michálek
2017-03-23 17:26 GMT+01:00 Pierre Ducroquet :

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:tested, passed
>
> Hi
>
> This is my first review (Magnus said in his presentation in PGDay Paris
> that volunteers should just come and help, so here I am), so please notify
> me for any mistake I do when using the review tools...
>
> The feature seems to work as expected, but I don't claim to be a markdown
> and rst expert.
> Some minor issues with the code itself :
> - some indentation issues (documentation and code itself with mix between
> space based and tab based indentation) and a few trailing spaces in code
> - typographic issues in the documentation :
>   - "The html, asciidoc, latex, latex-longtable, troff-ms, and markdown
> and rst formats" ==> duplicated and
>   - "Sets the output format to one of unaligned, aligned, wrapped, html,
> asciidoc, latex (uses tabular), latex-longtable, rst, markdown, or
> troff-ms." ==> extra comma at the end of the list
> - the comment " dont add line after last row, because line is added after
> every row" is misleading, it should warn that it's only for rst
> - there is a block of commented out code left
> - in the print_aligned_vertical function, there is a mix between
> "cont->opt->format == PRINT_RST" and "format == _rst" and I don't see
> any obvious reason for that
> - the documentation doesn't mention (but ok, it's kind of obvious) that
> the linestyle option will not work with rst and markdown
>
> Thanks !
>

Thanks
I will work on it this weekend. I need to adapt it to current master and i
will do some indentation issues with this.
I need to add \x to markdown format and some things about title from older
posts there.

Nice Day Je;



>
> The new status of this patch is: Waiting on Author
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Jelen
Starší čeledín datovýho chlíva


Re: [HACKERS] ICU integration

2017-03-23 Thread Peter Eisentraut
On 3/23/17 16:45, Thomas Munro wrote:
> On Fri, Mar 24, 2017 at 8:34 AM, Peter Eisentraut
>  wrote:
>> Committed.
> 
> On a macOS system using MacPorts, I have "icu" installed.  MacPorts is
> a package manager that installs stuff under /opt/local.  I have
> /opt/local/bin in $PATH so that pkg-config can be found.  I run
> ./configure with --with-icu but without mentioning any special paths
> and it says:
> 
> checking whether to build with ICU support... yes
> checking for pkg-config... /opt/local/bin/pkg-config
> checking pkg-config is at least version 0.9.0... yes
> checking for icu-uc icu-i18n... yes
> 
> ... but then building fails, because there are no headers in the search path:

Fixed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-23 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 3/22/17 17:33, David Steele wrote:
> > and I doubt that most tool writers would be quick to 
> > add support for a feature that very few people (if any) use.
> 
> I'm not one of those tool writers, although I have written my share of
> DBA scripts over the years, but I wonder why those tools would really
> care.  They are handed files with predetermined names to archive, and
> for restore files with predetermined names are requested back from them.
>  What else do they need?  If something is missing that requires them to
> parse file names, then maybe that should be added.

PG backup technology has come a fair ways from that simple
characterization of it. :)

The backup tools need to also get the LSN from the pg_stop_backup and
verify that they have the WAL file associated with that LSN.  They also
need to make sure that they have all of the WAL files between the
starting LSN and the ending LSN.  Doing that requires understanding how
the files are named to make sure there aren't any missing.

David will probably point out other reasons that the backup tools need
to understand the file naming, but those are ones I know of off-hand.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Tom Lane
Stylistic thought ... I am wondering if it wouldn't be a good idea
to replace EEOP_CASE_WHEN_STEP, EEOP_CASE_THEN_STEP, EEOP_COALESCE,
and EEOP_ARRAYREF_CHECKINPUT with instructions defined in a less
usage-dependent way as

EEOP_JUMP   unconditional jump
EEOP_JUMP_IF_NULL   jump if step result is null
EEOP_JUMP_IF_NOT_NULL   jump if step result isn't null
EEOP_JUMP_IF_NOT_TRUE   jump if step result isn't TRUE

One could imagine later filling out this set with the other BoolTest
condition types, but that seems to be all we need right now.

These are basically just renamings of the step types that exist now,
although EEOP_ARRAYREF_CHECKINPUT would have to drop its not-very-
necessary Assert(!op->d.arrayref.state->isassignment).  Well, I guess
I should say that they're renamings of the semantics that I have
for these steps in my working copy; for instance, I got rid of
casewhen.value/casewhen.isnull in favor of letting CASE WHEN expressions
evaluate into the CASE's final output variable.

At least to me, I think the compiling code would be more readable
this way.  I find WHEN_STEP and THEN_STEP a bit odd because they are
emitted after, not before, the expressions you'd think they control.
ARRAYREF_CHECKINPUT is pretty vaguely named, too.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Michael Banck
Hi,

On Thu, Mar 23, 2017 at 09:00:16AM -0400, Peter Eisentraut wrote:
> On 3/21/17 21:38, Peter Eisentraut wrote:
> > This patch is looking pretty good to me, modulo the failing pg_dump tests.
> > 
> > Attached is a fixup patch.  I have mainly updated some comments and
> > variable naming for (my) clarity.  No functional changes.
> 
> Committed all that.

Maybe I'm doing something wrong, but I'm getting a segfault trying to
start logical replication since that patch went in.

I've blown away my build tree and started over with this minimal
example, any obvious mistakes here?  Quoting the publisher/subscriber
names doesn't seem to change things:

$ initdb --pgdata=data_pg1
$ sed -i -e 's/^#wal_level.=.replica/wal_level = logical/' 
data_pg1/postgresql.conf
$ pg_ctl --pgdata=data_pg1 -l pg1.log start
$ pg_basebackup --pgdata=data_pg2
$ sed -i -e 's/^#port.=.5432/port = 5433/' data_pg2/postgresql.conf
$ pg_ctl --pgdata=data_pg2 -l pg2.log start
$ psql -c "CREATE PUBLICATION pub1;"
$ psql --port=5433 -c "CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres' 
PUBLICATION pub1;"

Backtrace:

Program received signal SIGSEGV, Segmentation fault.
ExecSetSlotDescriptor (slot=slot@entry=0xfc4d38, tupdesc=tupdesc@entry=0x0)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/executor/execTuples.c:269
269 PinTupleDesc(tupdesc);
(gdb) bt
#0  ExecSetSlotDescriptor (slot=slot@entry=0xfc4d38, tupdesc=tupdesc@entry=0x0)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/executor/execTuples.c:269
#1  0x005dc4fc in MakeSingleTupleTableSlot (tupdesc=0x0)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/executor/execTuples.c:203
#2  0x005a16ff in fetch_table_list (wrconn=wrconn@entry=0xc0030001,
publications=publications@entry=0xffb448)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/commands/subscriptioncmds.c:996
#3  0x005a1efa in CreateSubscription (stmt=0x101cd40, 
isTopLevel=)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/commands/subscriptioncmds.c:396
#4  0x006ecf96 in ProcessUtilitySlow (pstate=0x0, 
pstate@entry=0xfc4818, pstmt=0x0,
pstmt@entry=0x101d080, queryString=0xf9d248 ,
queryString@entry=0x101c0b8 "CREATE SUBSCRIPTION \"sub1\" CONNECTION 
'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);", context=(unknown: 
16534992), context@entry=PROCESS_UTILITY_TOPLEVEL,
params=0x7ffd7e6263d0, params@entry=0x0,
completionTag=0x45 ,
completionTag@entry=0x7ffd7e626b00 "", dest=)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/utility.c:1612
#5  0x006ec4e9 in standard_ProcessUtility (pstmt=0x101d080,
queryString=0x101c0b8 "CREATE SUBSCRIPTION \"sub1\" CONNECTION 
'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);", 
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x101d160,
completionTag=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/utility.c:922
#6  0x006e9f5f in PortalRunUtility (portal=0xfbb9d8, pstmt=0x101d080,
isTopLevel=, setHoldSnapshot=, 
dest=,
completionTag=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/pquery.c:1165
#7  0x006ea977 in PortalRunMulti (portal=portal@entry=0xfbb9d8,
isTopLevel=isTopLevel@entry=1 '\001', 
setHoldSnapshot=setHoldSnapshot@entry=0 '\000',
dest=dest@entry=0x101d160, altdest=altdest@entry=0x101d160,
completionTag=completionTag@entry=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/pquery.c:1315
#8  0x006eb4cb in PortalRun (portal=portal@entry=0xfbb9d8,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001',
dest=dest@entry=0x101d160, altdest=altdest@entry=0x101d160,
completionTag=completionTag@entry=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/pquery.c:788
#9  0x006e7868 in exec_simple_query (
query_string=0x101c0b8 "CREATE SUBSCRIPTION \"sub1\" CONNECTION 
'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/postgres.c:1101
#10 0x006e94a9 in PostgresMain (argc=1, argv=0x101c0b8, dbname=0xfc8478 
"postgres",
username=0xfc8458 "postgres")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/postgres.c:4069
#11 0x004770ad in BackendRun (port=0xfc2970)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/postmaster/postmaster.c:4317
#12 BackendStartup (port=0xfc2970)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/postmaster/postmaster.c:3989
#13 ServerLoop ()
at 

Re: [HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.

2017-03-23 Thread Teodor Sigaev

Hi, the patch looks good except why do you remove initialization of 
is_throttled?
Suppose, just a typo?

 --- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1967,7 +1967,6 @@ top:
st->listen = false;
st->sleeping = false;
st->throttling = false;
-   st->is_throttled = false;
memset(st->prepared, 0, sizeof(st->prepared));
}


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump, pg_dumpall and data durability

2017-03-23 Thread Michael Paquier
On Wed, Mar 22, 2017 at 7:00 AM, Michael Paquier
 wrote:
> On Wed, Mar 22, 2017 at 6:24 AM, Andrew Dunstan
>  wrote:
>> This is really a pretty small patch all things considered, and pretty
>> low-risk (although I haven;t been threough the code in fine detail yet).
>> In the end I'm persuaded by Andres' point that there's actually no
>> practical alternative way to make sure the data is actually synced to disk.
>>
>> If nobody else wants to pick it up I will, unless there is a strong
>> objection.
>
> Thanks!

Thanks Andrew, I can see that this has been committed as 96a7128b.

I also saw that:
https://www.postgresql.org/message-id/75e1b6ff-c3d5-9a26-e38b-3cb22a099...@2ndquadrant.com
I'll send a patch in a bit for the regression tests.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Thomas Munro
On Thu, Mar 23, 2017 at 10:50 PM, Simon Riggs  wrote:
>> Second thoughts... I'll just make LagTrackerWrite externally
>> available, so a plugin can send anything it wants to the tracker.
>> Which means I'm explicitly removing the "logical replication support"
>> from this patch.
>
> Done.
>
> Here's the patch I'm looking to commit, with some docs and minor code
> changes as discussed.

Thank you for committing this.  Time-based replication lag tracking
seems to be a regular topic on mailing lists and IRC, so I hope that
this will provide what people are looking for and not simply replace
that discussion with a new discussion about what lag really means :-)

Many thanks to Simon and Fujii-san for convincing me to move the
buffer to the sender (which now seems so obviously better), to
Fujii-san for the idea of tracking write and flush lag too, and to
Abhijit, Sawada-san, Ian, Craig and Robert for valuable feedback.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Potential data loss of 2PC files

2017-03-23 Thread Michael Paquier
On Fri, Mar 24, 2017 at 5:08 AM, Teodor Sigaev  wrote:
> Hmm, it doesn't work (but appplies) on current HEAD:
> [...]
> Data page checksums are disabled.
>
> fixing permissions on existing directory /spool/pg_data ... ok
> creating subdirectories ... ok
> selecting default max_connections ... 100
> selecting default shared_buffers ... 128MB
> selecting dynamic shared memory implementation ... posix
> creating configuration files ... ok
> running bootstrap script ... 2017-03-23 23:07:14.705 MSK [40286] FATAL:
> could not open file "pg_clog": No such file or directory
> child process exited with exit code 1
> initdb: data directory "/spool/pg_data" not removed at user's request

And the renaming of pg_clog to pg_xact is also my fault. Attached is
an updated patch.
-- 
Michael
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 2d33510930..7a007a6ba5 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -577,6 +577,13 @@ ShutdownCLOG(void)
 	/* Flush dirty CLOG pages to disk */
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(false);
 	SimpleLruFlush(ClogCtl, false);
+
+	/*
+	 * fsync pg_xact to ensure that any files flushed previously are durably
+	 * on disk.
+	 */
+	fsync_fname("pg_xact", true);
+
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(false);
 }
 
@@ -589,6 +596,13 @@ CheckPointCLOG(void)
 	/* Flush dirty CLOG pages to disk */
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(true);
 	SimpleLruFlush(ClogCtl, true);
+
+	/*
+	 * fsync pg_xact to ensure that any files flushed previously are durably
+	 * on disk.
+	 */
+	fsync_fname("pg_xact", true);
+
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(true);
 }
 
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 8e1df6e0ea..03ffa20908 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -746,6 +746,12 @@ ShutdownCommitTs(void)
 {
 	/* Flush dirty CommitTs pages to disk */
 	SimpleLruFlush(CommitTsCtl, false);
+
+	/*
+	 * fsync pg_commit_ts to ensure that any files flushed previously are durably
+	 * on disk.
+	 */
+	fsync_fname("pg_commit_ts", true);
 }
 
 /*
@@ -756,6 +762,12 @@ CheckPointCommitTs(void)
 {
 	/* Flush dirty CommitTs pages to disk */
 	SimpleLruFlush(CommitTsCtl, true);
+
+	/*
+	 * fsync pg_commit_ts to ensure that any files flushed previously are durably
+	 * on disk.
+	 */
+	fsync_fname("pg_commit_ts", true);
 }
 
 /*
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 4b4999fd7b..831693 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1650,6 +1650,14 @@ CheckPointTwoPhase(XLogRecPtr redo_horizon)
 	}
 	LWLockRelease(TwoPhaseStateLock);
 
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  Two-phase files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created
+	 * previously since the last checkpoint.
+	 */
+	fsync_fname(TWOPHASE_DIR, true);
+
 	TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_DONE();
 
 	if (log_checkpoints && serialized_xacts > 0)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b99ded5df6..11b1929c94 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3469,7 +3469,7 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 	if (!find_free)
 	{
 		/* Force installation: get rid of any pre-existing segment file */
-		unlink(path);
+		durable_unlink(path, DEBUG1);
 	}
 	else
 	{
@@ -4020,16 +4020,13 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 	  path)));
 			return;
 		}
-		rc = unlink(newpath);
+		rc = durable_unlink(newpath, LOG);
 #else
-		rc = unlink(path);
+		rc = durable_unlink(path, LOG);
 #endif
 		if (rc != 0)
 		{
-			ereport(LOG,
-	(errcode_for_file_access(),
-			   errmsg("could not remove old transaction log file \"%s\": %m",
-	  path)));
+			/* Message already logged by durable_unlink() */
 			return;
 		}
 		CheckpointStats.ckpt_segs_removed++;
@@ -10752,17 +10749,13 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 		(errcode_for_file_access(),
 		 errmsg("could not read file \"%s\": %m",
 BACKUP_LABEL_FILE)));
-			if (unlink(BACKUP_LABEL_FILE) != 0)
-ereport(ERROR,
-		(errcode_for_file_access(),
-		 errmsg("could not remove file \"%s\": %m",
-BACKUP_LABEL_FILE)));
+			durable_unlink(BACKUP_LABEL_FILE, ERROR);
 
 			/*
 			 * Remove tablespace_map file if present, it is created only if there
 			 * are tablespaces.
 			 */
-			unlink(TABLESPACE_MAP);
+			durable_unlink(TABLESPACE_MAP, DEBUG1);
 		}
 		PG_END_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) BoolGetDatum(exclusive));
 	}
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c

Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 Andres> We usually cast the result of palloc.

 >> Rough count in the backend has ~400 without casts to ~1350 with, so
 >> this doesn't seem to have been consistently enforced.

 Andres> Yea, but we're still trying.

Well, a lot of the uncasted ones are in fairly new code, from quite a
number of different committers.

So if this is a big deal, why don't we already have

#define palloc_array(etype,ecount) (((etype) *) palloc((ecount) * 
sizeof(etype)))
#define palloc_object(otype) (((otype) *) palloc(sizeof(otype)))

or something of that ilk?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-23 Thread Peter Eisentraut
On 3/22/17 17:33, David Steele wrote:
> I think if we don't change the default size it's very unlikely I would 
> use alternate WAL segment sizes or recommend that anyone else does, at 
> least in v10.
> 
> I simply don't think it would get the level of testing required to be 
> production worthy

I think we could tweak the test harnesses to run all the tests with
different segment sizes.  That would get pretty good coverage.

More generally, the methodology that we should not add an option unless
we also change the default because the option would otherwise not get
enough testing is a bit dubious.

> and I doubt that most tool writers would be quick to 
> add support for a feature that very few people (if any) use.

I'm not one of those tool writers, although I have written my share of
DBA scripts over the years, but I wonder why those tools would really
care.  They are handed files with predetermined names to archive, and
for restore files with predetermined names are requested back from them.
 What else do they need?  If something is missing that requires them to
parse file names, then maybe that should be added.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU integration

2017-03-23 Thread Thomas Munro
On Fri, Mar 24, 2017 at 8:34 AM, Peter Eisentraut
 wrote:
> Committed.

On a macOS system using MacPorts, I have "icu" installed.  MacPorts is
a package manager that installs stuff under /opt/local.  I have
/opt/local/bin in $PATH so that pkg-config can be found.  I run
./configure with --with-icu but without mentioning any special paths
and it says:

checking whether to build with ICU support... yes
checking for pkg-config... /opt/local/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for icu-uc icu-i18n... yes

... but then building fails, because there are no headers in the search path:

gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -Wno-unused-command-line-argument -g -O2 -Wall -Werror
-I../../../src/include/snowball
-I../../../src/include/snowball/libstemmer -I../../../src/include   -c
-o dict_snowball.o dict_snowball.c -MMD -MP -MF .deps/dict_snowball.Po
...
../../../src/include/utils/pg_locale.h:19:10: fatal error:
'unicode/ucol.h' file not found

But pkg-config does know where to find those headers:

$ pkg-config --cflags icu-i18n
-I/opt/local/include

... and it's not wrong:

$ ls /opt/local/include/unicode/ucol.h
/opt/local/include/unicode/ucol.h

So I think there may be a bug in the configure script: I think it
should be putting pkg-config's --cflags output into one of the
CFLAGS-like variables.

Or am I misunderstanding what pkg-config is supposed to be doing for us here?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Petr Jelinek
Hi,

On 23/03/17 23:06, Michael Banck wrote:
> Hi,
> 
> On Thu, Mar 23, 2017 at 09:00:16AM -0400, Peter Eisentraut wrote:
>> On 3/21/17 21:38, Peter Eisentraut wrote:
>>> This patch is looking pretty good to me, modulo the failing pg_dump tests.
>>>
>>> Attached is a fixup patch.  I have mainly updated some comments and
>>> variable naming for (my) clarity.  No functional changes.
>>
>> Committed all that.
> 
> Maybe I'm doing something wrong, but I'm getting a segfault trying to
> start logical replication since that patch went in.
> 

Thanks for report. Looks like we don't handle correctly empty result set
when fetching table list. Will post fix tomorrow.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-23 Thread Mithun Cy
Hi Amit please find the new patch

On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila  wrote:
> It is not only about above calculation, but also what the patch is
> doing in function _hash_get_tbuckets().  By the way function name also
> seems unclear (mainly *tbuckets* in the name).

Fixed I have introduced some macros for readability and added more
comments to explain why some calculations are mad. Please let me know
if you think more improvements are needed.

>spelling.
>/forth/fourth
>/at time/at a time
Thanks fixed.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


expand_hashbucket_efficiently_04.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Potential data loss of 2PC files

2017-03-23 Thread Teodor Sigaev

Hmm, it doesn't work (but appplies) on current HEAD:

% uname -a
FreeBSD *** 11.0-RELEASE-p8 FreeBSD 11.0-RELEASE-p8 #0 r315651: Tue Mar 21 
02:44:23 MSK 2017 teodor@***:/usr/obj/usr/src/sys/XOR  amd64

% pg_config --configure
'--enable-depend' '--enable-cassert' '--enable-debug' '--enable-tap-tests' 
'--with-perl' 'CC=clang' 'CFLAGS=-O0'

% rm -rf /spool/pg_data/*
% initdb -n -E UTF8 --locale ru_RU.UTF-8 --lc-messages C --lc-monetary C 
--lc-numeric C --lc-time C -D /spool/pg_data

Running in no-clean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "teodor".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  ru_RU.UTF-8
  CTYPE:ru_RU.UTF-8
  MESSAGES: C
  MONETARY: C
  NUMERIC:  C
  TIME: C
The default text search configuration will be set to "russian".

Data page checksums are disabled.

fixing permissions on existing directory /spool/pg_data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... 2017-03-23 23:07:14.705 MSK [40286] FATAL:  could 
not open file "pg_clog": No such file or directory

child process exited with exit code 1
initdb: data directory "/spool/pg_data" not removed at user's request


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] few fts functions for jsonb

2017-03-23 Thread Andrew Dunstan


On 03/21/2017 06:28 PM, Dmitry Dolgov wrote:
> > On 21 March 2017 at 03:03, Andrew Dunstan
>  > wrote:
> >
> > However, I think it should probably be broken up into a couple of
> pieces -
> > one for the generic json/jsonb transforms infrastructure (which probably
> > needs some more comments) and one for the FTS functions that will
> use it.
>
> Sure, here are two patches with separated functionality and a bit more
> commentaries for the transform functions.

I'm not through looking at this. However, here are a few preliminary
comments

  * we might need to rationalize the header locations a bit
  * iterate_json(b) and transform_json(b) are a bit too generally named.
Really what they do is iterate over or transform string values in
the json(b). They ignore / preserve the structure, keys, and
non-string scalar values in the json(b). A general iterate or
transform function would be called in effect with a stream of all
the elements in the json, not just scalar strings.
  * Unless I'm missing something the iterate_json(b)_values return value
is ignored. Instead of returning the state it looks to me like it
should return nothing and be declared as void instead of void *
  * transform_jsonb and transform_json are somewhat asymmetrical. The
latter should probably return a text* instead of a StringInfo, to be
consistent with the former.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-23 Thread Ashutosh Sharma
Hi All,

I have tried to test 'group_update_clog_v11.1.patch' shared upthread by
Amit on a high end machine. I have tested the patch with various savepoints
in my test script. The machine details along with test scripts and the test
results are shown below,

Machine details:

24 sockets, 192 CPU(s)
RAM - 500GB

test script:


\set aid random (1,3000)
\set tid random (1,3000)

BEGIN;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s1;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
SAVEPOINT s2;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s3;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
SAVEPOINT s4;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s5;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
END;

Non-default parameters
==
max_connections = 200
shared_buffers=8GB
min_wal_size=10GB
max_wal_size=15GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
checkpoint_timeout=900
synchronous_commit=off


pgbench -M prepared -c $thread -j $thread -T $time_for_reading postgres -f
~/test_script.sql

where, time_for_reading = 10 mins

Test Results:
=

With 3 savepoints
=

CLIENT COUNT TPS (HEAD) TPS (PATCH) % IMPROVEMENT
128 50275 53704 6.82048732
64 62860 66561 5.887686923
8 18464 18752 1.559792028

With 5 savepoints
=

CLIENT COUNT TPS (HEAD) TPS (PATCH) % IMPROVEMENT
128 46559 47715 2.482871196
64 52306 52082 -0.4282491492
8 12289 12852 4.581332899


With 7 savepoints
=

CLIENT COUNT TPS (HEAD) TPS (PATCH) % IMPROVEMENT
128 41367 41500 0.3215123166
64 42996 41473 -3.542189971
8 9665 9657 -0.0827728919

With 10 savepoints
==

CLIENT COUNT TPS (HEAD) TPS (PATCH) % IMPROVEMENT
128 34513 34597 0.24338655
64 32581 32035 -1.67582
8 7293 7622 4.511175099
*Conclusion:*
As seen from the test results mentioned above, there is some performance
improvement with 3 SP(s), with 5 SP(s) the results with patch is slightly
better than HEAD, with 7 and 10 SP(s) we do see regression with patch.
Therefore, I think the threshold value of 4 for number of subtransactions
considered in the patch looks fine to me.


--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Tue, Mar 21, 2017 at 6:19 PM, Amit Kapila 
wrote:

> On Mon, Mar 20, 2017 at 8:27 AM, Robert Haas 
> wrote:
> > On Fri, Mar 17, 2017 at 2:30 AM, Amit Kapila 
> wrote:
> >>> I was wondering about doing an explicit test: if the XID being
> >>> committed matches the one in the PGPROC, and nsubxids matches, and the
> >>> actual list of XIDs matches, then apply the optimization.  That could
> >>> replace the logic that you've proposed to exclude non-commit cases,
> >>> gxact cases, etc. and it seems fundamentally safer.  But it might be a
> >>> more expensive test, too, so I'm not sure.
> >>
> >> I think if the number of subxids is very small let us say under 5 or
> >> so, then such a check might not matter, but otherwise it could be
> >> expensive.
> >
> > We could find out by testing it.  We could also restrict the
> > optimization to cases with just a few subxids, because if you've got a
> > large number of subxids this optimization probably isn't buying much
> > anyway.
> >
>
> Yes, and I have modified the patch to compare xids and subxids for
> group update.  In the initial short tests (with few client counts), it
> seems like till 3 savepoints we can win and 10 savepoints onwards
> there is some regression or at the very least there doesn't appear to
> be any benefit.  We need more tests to identify what is the safe
> number, but I thought it is better to share the patch to see if we
> agree on the changes because if not, then the whole testing needs to
> be repeated.  Let me know what do you think about attached?
>
>
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] Fix for grammatical error in code-comment

2017-03-23 Thread Magnus Hagander
On Wed, Mar 22, 2017 at 2:47 PM, Emil Iggland  wrote:

> I found a grammatical error in one of the code comments.
>
> The comment is made regarding the run-time of the algorithm, and
> references big-O performance.
> However the comment text uses "effective", instead of the probably more
> correct "efficient".
>
> The original comment reads as though the code would not do its job
> correctly, rather than just not very quickly.
>

Applied, thanks.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] createlang/droplang deprecated

2017-03-23 Thread Magnus Hagander
On Mon, Mar 20, 2017 at 1:37 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 3/18/17 09:00, Peter Eisentraut wrote:
> > I just noticed that createlang and droplang have been listed as
> > deprecated since PG 9.1.
> >
> > Do we dare remove them?
>
> Patch
>
>
+1


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] ICU integration

2017-03-23 Thread Andreas Karlsson
I am fine with this version of the patch. The issues I have with it, 
which I mentioned earlier in this thread, seem to be issues with ICU 
rather than with this patch. For example there seems to be no way for 
ICU to validate the syntax of the BCP 47 locales (or ICU's old format). 
But I think we will just have to accept the weirdness of how ICU handles 
locales.


I think this patch is ready to be committed.

Found a typo in the documentation:

"The inspect the currently available locales" should be "To inspect the 
currently available locales".


Andreas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Simon Riggs
> Second thoughts... I'll just make LagTrackerWrite externally
> available, so a plugin can send anything it wants to the tracker.
> Which means I'm explicitly removing the "logical replication support"
> from this patch.

Done.

Here's the patch I'm looking to commit, with some docs and minor code
changes as discussed.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


replication-lag-v7sr1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 3:02 PM, Amit Kapila 
wrote:

>
>
> That sounds like you are dodging the actual problem.  I mean you can
> put that same PageIsFull() check in master code as well and then you
> will most probably again see the same regression.


Well I don't see it that way. There was a specific concern about a specific
workload that WARM might regress. I think this change addresses that. Sure
if you pick that one piece, put it in master first and then compare against
rest of the WARM code, you will see a regression. But I thought what we
were worried is WARM causing regression to some existing user, who might
see her workload running 10% slower, which this change mitigates.


> Also, I think if we
> test at fillfactor 80 or 75 (which is not unrealistic considering an
> update-intensive workload), then we might again see regression.


Yeah, we might, but it will be lesser than before, may be 2% instead of
10%. And by doing this we are further narrowing an already narrow test
case. I think we need to see things in totality and weigh in costs-benefits
trade offs. There are numbers for very common workloads, where WARM may
provide 20, 30 or even more than 100% improvements.

Andres and Alvaro already have other ideas to address this problem even
further. And as I said, we can pass-in index specific information and make
that routine bail-out even earlier. We need to accept that WARM will need
to do more attr checks than master, especially when there are more than 1
indexes on the table,  and sometimes those checks will go waste. I am ok if
we want to provide table-specific knob to disable WARM, but not sure if
others would like that idea.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-23 Thread Thomas Munro
Hi,

Here is a new patch series responding to feedback from Peter and Andres:

1.  Support pgstat_report_tempfile and log_temp_files, which I had
overlooked as Peter pointed out.

2.  Use a patch format that is acceptable to git am, per complaint
off-list from Andres.  (Not actually made with git format-patch; I
need to learn some more git-fu, but they now apply cleanly with git
am).

On Thu, Mar 23, 2017 at 12:55 PM, Peter Geoghegan  wrote:
> I took a quick look at your V9 today. This is by no means a
> comprehensive review of 0008-hj-shared-buf-file-v9.patch, but it's
> what I can manage right now.

Thanks.  I really appreciate your patience with the resource
management stuff I had failed to think through.

> ...
>
> However, I notice that the place that this happens instead,
> PathNameDelete(), does not repeat the fd.c step of doing a final
> stat(), and using the stats for a pgstat_report_tempfile(). So, a new
> pgstat_report_tempfile() call is simply missing. However, the more
> fundamental issue in my mind is: How can you fix that? Where would it
> go if you had it?

You're right.  I may be missing something here (again), but it does
seem straightforward to implement because we always delete each file
that really exists exactly once (and sometimes we also try to delete
files that don't exist due to imprecise meta-data, but that isn't
harmful and we know when that turns out to be the case).

> If you do the obvious thing of just placing that before the new
> unlink() within PathNameDelete(), on the theory that that needs parity
> with the fd.c stuff, that has non-obvious implications. Does the
> pgstat_report_tempfile() call need to happen when going through this
> path, for example?:
>
>> +/*
>> + * Destroy a shared BufFile early.  Files are normally cleaned up
>> + * automatically when all participants detach, but it might be useful to
>> + * reclaim disk space sooner than that.  The caller asserts that no backends
>> + * will attempt to read from this file again and that only one backend will
>> + * destroy it.
>> + */
>> +void
>> +SharedBufFileDestroy(SharedBufFileSet *set, int partition, int participant)
>> +{

Yes, I think it should definitely go into
PathNameDeleteTemporaryFile() (formerly PathNameDelete()).

> The theory with the unlink()'ing() function PathNameDelete(), I
> gather, is that it doesn't matter if it happens to be called more than
> once, say from a worker and then in an error handling path in the
> leader or whatever. Do I have that right?

Yes, it may be called for a file that doesn't exist either because it
never existed, or because it has already been deleted.  To recap,
there are two reasons it needs to tolerate attempts to delete files
that aren't there:

1.  To be able to delete the fd.c files backing a BufFile given only a
BufFileTag.  We don't know how many segment files there are, but we
know how to build the prefix of the filename so we try to delete
[prefix].0, [prefix].1, [prefix].2 ... until we get ENOENT and
terminate.  I think this sort of thing would be more questionable for
durable storage backing a database object, but for temporary files I
can't think of a problem with it.

2.  SharedBufFileSet doesn't actually know how many partitions exist,
it just knows the *range* of partition numbers (because of its
conflicting fixed space and increasable partitions requirements).
>From that information it can loop building BufFileTags for all backing
files that *might* exist, and in practice they usually do because we
don't tend to have a 'sparse' range of partitions.

The error handling path isn't a special case: whoever is the last to
detach from the DSM segment will delete all the files, whether that
results from an error or not.  Now someone might call
SharedBufFileDestroy() to delete files sooner, but that can't happen
at the same time as a detach cleanup (the caller is still attached).

As a small optimisation avoiding a bunch of pointless unlink syscalls,
I shrink the SharedBufFileSet range if you happen to delete explicitly
with a partition number at the extremities of the range, and it so
happens that Parallel Hash Join explicitly deletes them in partition
order as the join runs, so in practice the range is empty by the time
SharedBufFileSet's cleanup runs and there is nothing to do, unless an
error occurs.

> Obviously the concern I have about that is that any stat() call you
> might add for the benefit of a new pgstat_report_tempfile() call,
> needed to keep parity with fd.c, now has a risk of double counting in
> error paths, if I'm not mistaken. We do need to do that accounting in
> the event of error, just as we do when there is no error, at least if
> current stats collector behavior is to be preserved. How can you
> determine which duplicate call here is the duplicate? In other words,
> how can you figure out which one is not supposed to
> pgstat_report_tempfile()? If the size of temp files in each worker is
> unknowable to the 

Re: [HACKERS] Multiple false-positive warnings from Valgrind

2017-03-23 Thread Michael Paquier
On Tue, Mar 21, 2017 at 10:57 PM, Aleksander Alekseev
 wrote:
> Recently I've decided to run PostgreSQL under Valgrind according to wiki
> description [1]. Lots of warnings are generated [2] but it is my
> understanding that all of them are false-positive. For instance I've
> found these two reports particularly interesting:
>
> ```
> ==00:00:40:40.161 7677== Use of uninitialised value of size 8
> ==00:00:40:40.161 7677==at 0xA15FF5: pg_b64_encode (base64.c:68)
> ==00:00:40:40.161 7677==by 0x6FFE85: scram_build_verifier 
> (auth-scram.c:348)
> ==00:00:40:40.161 7677==by 0x6F3F76: encrypt_password (crypt.c:171)
> ==00:00:40:40.161 7677==by 0x68F40C: CreateRole (user.c:403)
> ==00:00:40:40.161 7677==by 0x85D53A: standard_ProcessUtility 
> (utility.c:716)
> ==00:00:40:40.161 7677==by 0x85CCC7: ProcessUtility (utility.c:353)
> ==00:00:40:40.161 7677==by 0x85BD22: PortalRunUtility (pquery.c:1165)
> ==00:00:40:40.161 7677==by 0x85BF20: PortalRunMulti (pquery.c:1308)
> ==00:00:40:40.161 7677==by 0x85B4A0: PortalRun (pquery.c:788)
> ==00:00:40:40.161 7677==by 0x855672: exec_simple_query (postgres.c:1101)
> ==00:00:40:40.161 7677==by 0x8597BB: PostgresMain (postgres.c:4066)
> ==00:00:40:40.161 7677==by 0x7C6322: BackendRun (postmaster.c:4317)
> ==00:00:40:40.161 7677==  Uninitialised value was created by a stack 
> allocation
> ==00:00:40:40.161 7677==at 0x6FFDB7: scram_build_verifier 
> (auth-scram.c:328)

I can see those warnings as well after calling a code path of
scram_build_verifier(), and I have a hard time seeing that as nothing
else than a false positive as you do. All those warnings go away if
you just initialize just do MemSet(salt, 0, SCRAM_SALT_LEN) before
calling pg_backend_random() but this data is filled in with
RAND_bytes() afterwards (if built with openssl). The estimated lengths
of the encoding are also correct. I don't see immediately what's wrong
here, this deserves a second look...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v1] Add and report the new "in_hot_standby" GUC pseudo-variable.

2017-03-23 Thread Magnus Hagander
On Wed, Mar 22, 2017 at 9:00 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 3/22/17 14:09, Robert Haas wrote:
> >> The opposite means primary.  I can flip the GUC name to "is_primary", if
> >> that's clearer.
> > Hmm, I don't find that clearer.  "hot standby" has a very specific
> > meaning; "primary" isn't vague, but I would say it's less specific.
>
> The problem I have is that there is already a GUC named "hot_standby",
> which determines whether an instance is in hot (as opposed to warm?)
> mode if it is a standby.  This is proposing to add a setting named
> "in_hot_standby" which says nothing about the hotness, but something
> about the standbyness.  Note that these are all in the same namespace.
>
> I think we could use "in_recovery", which would be consistent with
> existing naming.
>

One thing we might want to consider around this -- in 10 we have
target_session_attrs=read-write (since
721f7bd3cbccaf8c07cad2707826b83f84694832), which will issue a SHOW
transaction_read_only on the connection.

We should probably consider if there is some way we can implement these two
things the same way. If we're inventing a new variable that gets pushed on
each connection, perhaps we can use that one and avoid the SHOW command? Is
the read-write thing really interesting in the aspect of the general case,
or is it more about detectinv readonly standbys as well? Or to flip that,
would sending the transaction_read_only parameter be enough for the usecase
in this thread, without having to invent a new variable?

(I haven't thought it through all the way, but figured I should mention the
thought as I'm working through my email backlog.)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Amit Kapila
On Thu, Mar 23, 2017 at 12:19 AM, Pavan Deolasee
 wrote:
>
> Ok, no problem. I did some tests on AWS i2.xlarge instance (4 vCPU, 30GB
> RAM, attached SSD) and results are shown below. But I think it is important
> to get independent validation from your side too, just to ensure I am not
> making any mistake in measurement. I've attached naively put together
> scripts which I used to run the benchmark. If you find them useful, please
> adjust the paths and run on your machine.
>
> I reverted back to UNLOGGED table because with WAL the results looked very
> weird (as posted earlier) even when I was taking a CHECKPOINT before each
> set and had set max_wal_size and checkpoint_timeout high enough to avoid any
> checkpoint during the run. Anyways, that's a matter of separate
> investigation and not related to this patch.
>
> I did two kinds of tests.
> a) update last column of the index
> b) update second column of the index
>
> v19 does considerably better than even master for the last column update
> case and pretty much inline for the second column update test. The reason is
> very clear because v19 determines early in the cycle that the buffer is
> already full and there is very little chance of doing a HOT update on the
> page. In that case, it does not check any columns for modification.
>

That sounds like you are dodging the actual problem.  I mean you can
put that same PageIsFull() check in master code as well and then you
will most probably again see the same regression.  Also, I think if we
test at fillfactor 80 or 75 (which is not unrealistic considering an
update-intensive workload), then we might again see regression.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Thomas Munro
On Thu, Mar 23, 2017 at 10:50 PM, Simon Riggs  wrote:
>> Second thoughts... I'll just make LagTrackerWrite externally
>> available, so a plugin can send anything it wants to the tracker.
>> Which means I'm explicitly removing the "logical replication support"
>> from this patch.
>
> Done.
>
> Here's the patch I'm looking to commit, with some docs and minor code
> changes as discussed.

Giving LagTrackerWrite external linkage seems sensible, assuming there
is a reasonable way for logical replication to decide when to call it.

+system. Monitoring systems should choose whether the represent this
+as missing data, zero or continue to display the last known value.

s/whether the/whether to/

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-23 Thread Ashutosh Sharma
On Thu, Mar 23, 2017 at 9:17 AM, Amit Kapila  wrote:
>
> On Thu, Mar 23, 2017 at 8:43 AM, Amit Kapila  wrote:
> > On Wed, Mar 22, 2017 at 3:39 PM, Ashutosh Sharma  
> > wrote:
> >> Hi,
> >>
> >> On Wed, Mar 22, 2017 at 8:41 AM, Amit Kapila  
> >> wrote:
> >>
> >> To fix this, I think we should pass 'REGBUF_KEEP_DATA' while
> >> registering the buffer. Something like this,
> >>
> >> -   XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
> >> +   XLogRegisterBuffer(0, buf, REGBUF_STANDARD |
> >> REGBUF_KEEP_DATA);
> >>
> >> Attached is the patch that fixes this issue.
> >>
> >
> > I think this will work, but not sure if there is a merit to deviate
> > from what btree does to handle this case.   One thing I find slightly
> > awkward in hash_xlog_vacuum_get_latestRemovedXid() is that you are
> > using a number of tuples registered as part of fixed data
> > (xl_hash_vacuum_one_page) to traverse the data registered as buf data.
> > I think it will be better if we register offsets also in fixed part of
> > data as we are doing btree case.

Agreed. I have made the changes accordingly. Please check attached v2 patch.

>
> >
> >
>
> Also another small point in this regard, do we need two separate
> variables to track number of deleted items in below code?  I think one
> variable is sufficient.
>
> _hash_vacuum_one_page()
> {
> ..
> deletable[ndeletable++] = offnum;
> tuples_removed += 1;--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
> ..
> }
>

Yes, I think 'ndeletable' alone should be fine.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index de7522e..d9ac42c 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -957,8 +957,6 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 	OffsetNumber	hoffnum;
 	TransactionId	latestRemovedXid = InvalidTransactionId;
 	int		i;
-	char *ptr;
-	Size len;
 
 	xlrec = (xl_hash_vacuum_one_page *) XLogRecGetData(record);
 
@@ -977,12 +975,20 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 		return latestRemovedXid;
 
 	/*
+	 * Check if WAL replay has reached a consistent database state. If not,
+	 * we must PANIC. See the definition of btree_xlog_delete_get_latestRemovedXid
+	 * for more details.
+	 */
+	if (!reachedConsistency)
+		elog(PANIC, "hash_xlog_vacuum_get_latestRemovedXid: cannot operate with inconsistent data");
+
+	/*
 	 * Get index page.  If the DB is consistent, this should not fail, nor
 	 * should any of the heap page fetches below.  If one does, we return
 	 * InvalidTransactionId to cancel all HS transactions.  That's probably
 	 * overkill, but it's safe, and certainly better than panicking here.
 	 */
-	XLogRecGetBlockTag(record, 1, , NULL, );
+	XLogRecGetBlockTag(record, 0, , NULL, );
 	ibuffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
 
 	if (!BufferIsValid(ibuffer))
@@ -994,9 +1000,7 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 	 * Loop through the deleted index items to obtain the TransactionId from
 	 * the heap items they point to.
 	 */
-	ptr = XLogRecGetBlockData(record, 1, );
-
-	unused = (OffsetNumber *) ptr;
+	unused = (OffsetNumber *) ((char *) xlrec + SizeOfHashVacuumOnePage);
 
 	for (i = 0; i < xlrec->ntuples; i++)
 	{
@@ -1121,23 +1125,15 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
 
 	if (action == BLK_NEEDS_REDO)
 	{
-		char *ptr;
-		Size len;
-
-		ptr = XLogRecGetBlockData(record, 0, );
-
 		page = (Page) BufferGetPage(buffer);
 
-		if (len > 0)
+		if (XLogRecGetDataLen(record) > SizeOfHashVacuumOnePage)
 		{
 			OffsetNumber *unused;
-			OffsetNumber *unend;
 
-			unused = (OffsetNumber *) ptr;
-			unend = (OffsetNumber *) ((char *) ptr + len);
+			unused = (OffsetNumber *) ((char *) xldata + SizeOfHashVacuumOnePage);
 
-			if ((unend - unused) > 0)
-PageIndexMultiDelete(page, unused, unend - unused);
+			PageIndexMultiDelete(page, unused, xldata->ntuples);
 		}
 
 		/*
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 8640e85..8699b5b 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -344,7 +344,6 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
 	Page	page = BufferGetPage(buf);
 	HashPageOpaque	pageopaque;
 	HashMetaPage	metap;
-	double tuples_removed = 0;
 
 	/* Scan each tuple in page to see if it is marked as LP_DEAD */
 	maxoff = PageGetMaxOffsetNumber(page);
@@ -355,10 +354,7 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
 		ItemId	itemId = PageGetItemId(page, offnum);
 
 		if (ItemIdIsDead(itemId))
-		{
 			deletable[ndeletable++] = offnum;
-			tuples_removed += 1;
-		}
 	}
 
 	if (ndeletable > 0)
@@ -386,7 

Re: [HACKERS] identity columns

2017-03-23 Thread Vitaly Burovoy
On 3/22/17, Peter Eisentraut  wrote:
> On 3/22/17 03:59, Vitaly Burovoy wrote:
>> Column's IDENTITY behavior is very similar to a DEFAULT one. We write
>> "SET DEFAULT" and don't care whether it was set before or not, because
>> we can't have many of them for a single column. Why should we do that
>> for IDENTITY?
>
> One indication is that the SQL standard requires that DROP IDENTITY only
> succeed if the column is currently an identity column.  That is
> different from how DEFAULT works.

I think we'll end up with "DROP IDENTITY IF EXISTS" to avoid raising
an exception and "ADD OR SET" if your grammar remains.

> Another difference is that there is no such thing as "no default",
> because in absence of an explicit default, it is NULL.  So all you are
> doing with SET DEFAULT or DROP DEFAULT is changing the default.  You are
> not actually adding or removing it.

Right. From that PoV IDENTITY also changes a default value: "SET (ADD
... AS?) IDENTITY" works as setting a default to "nextval(...)"
whereas "DROP IDENTITY" works as setting it back to NULL.

> Therefore, the final effect of SET DEFAULT is the same no matter whether
> another default was there before or not.  For ADD/SET IDENTITY, you get
> different behaviors.  For example:
>
> ADD .. AS IDENTITY (START 2)
>
> creates a new sequence that starts at 2 and uses default parameters
> otherwise.  But
>
> SET (START 2)
>
> alters the start parameter of an existing sequence.  So depending on
> whether you already have an identity sequence, these commands do
> completely different things.

If you use "SET START 2" to a non-identity columns, you should get an
exception (no information about an identity type: neither "by default"
nor "always"). The correct example is:
ADD GENERATED BY DEFAULT AS IDENTITY (START 2)
and
SET GENERATED BY DEFAULT SET START 2

Note that creating a sequence is an internal machinery hidden from users.
Try to see from user's PoV: the goal is to have a column with an
autoincrement. If it is already autoincremented, no reason to create a
sequence (it is already present) and no reason to restart with 2
(there can be rows with such numbers).
"... SET START 2" is for the next "RESTART" DDL, and if a user insists
to start with 2, it is still possible:

SET GENERATED BY DEFAULT SET START 2 RESTART 2


I still think that introducing "ADD" for a property which can not be
used more than once (compare with "ADD CHECK": you can use it with the
same expression multiple times) is not a good idea.

I think there should be a consensus in the community for a grammar.

> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-03-23 Thread Andreas Karlsson

On 03/21/2017 08:02 AM, Haribabu Kommi wrote:

Solution -1) Just ignore dumping these CREATE DATABASE
commands and provide the user information in the documentation
to create "postgres" and "template1" database in the target in case
if they don't exist. If this kind of cases are very rare.

Solution-2) Add a new command line option/some other settings
to indicate the pg_dump execution is from pg_dumpall and follow
the current refactored behavior, otherwise follow the earlier pg_dump
behavior in handling CREATE DATABASE commands for "postgres"
and "template1" databases.


I am leaning towards (2) since I feel having pg_dump act differently 
depending on the name of the database is a quite surprising behavior. It 
makes more sense to let a tool like pg_dumpall handle logic like that.



2.  In dumpDatabases function before calling the runPgDump command,
Before refactoring, it used to connect to the database and dump
"SET default_transaction_read_only = off;" to prevent some accidental
overwrite of the target.

I fixed it in the attached patch by removing the connection and dumping
the set command.

Does it needs the similar approach of solution-2) in previous problem and
handle dumping the "SET default_transaction_read_only = off;" whenever
the CREATE DATABASE and \connect command is issued?


Hm, that is a bit annoying. I do not think we want to change any 
behavior here, either of pg_dump or pg_dumpall, but I also do not like 
having to add two new flags to pg_dump (one for including the ALTER 
DATABASE commands but not CREATE DATABASE, and another flag for 
default_transaction_read_only) or a special flag similar to 
--binary-upgrade.


None of these options seem optimal to me, and I do not have any strong 
preference other than that we should avoid breaking pg_dump or changing 
behavior not related to the database attributes.


Andreas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise aggregation/grouping

2017-03-23 Thread Jeevan Chalke
On Tue, Mar 21, 2017 at 1:47 PM, Antonin Houska  wrote:

> Jeevan Chalke  wrote:
>
> > Declarative partitioning is supported in PostgreSQL 10 and work is
> already in
> > progress to support partition-wise joins. Here is a proposal for
> partition-wise
> > aggregation/grouping. Our initial performance measurement has shown 7
> times
> > performance when partitions are on foreign servers and approximately 15%
> when
> > partitions are local.
> >
> > Partition-wise aggregation/grouping computes aggregates for each
> partition
> > separately. If the group clause contains the partition key, all the rows
> > belonging to a given group come from one partition, thus allowing
> aggregates
> > to be computed completely for each partition. Otherwise, partial
> aggregates
> > computed for each partition are combined across the partitions to
> produce the
> > final aggregates. This technique improves performance because:
>
> > i. When partitions are located on foreign server, we can push down the
> > aggregate to the foreign server.
>
> > ii. If hash table for each partition fits in memory, but that for the
> whole
> > relation does not, each partition-wise aggregate can use an in-memory
> hash
> > table.
>
> > iii. Aggregation at the level of partitions can exploit properties of
> > partitions like indexes, their storage etc.
>
> I suspect this overlaps with
>
> https://www.postgresql.org/message-id/29111.1483984605%40localhost
>
> I'm working on the next version of the patch, which will be able to
> aggregate
> the result of both base relation scans and joins. I'm trying hard to make
> the
> next version available before an urgent vacation that I'll have to take at
> random date between today and early April. I suggest that we coordinate the
> effort, it's lot of work in any case.
>

IIUC, it seems that you are trying to push down the aggregation into the
joining relations. So basically you are converting
Agg -> Join -> {scan1, scan2} into
FinalAgg -> Join -> {PartialAgg -> scan1, PartialAgg -> scan2}.
In addition to that your patch pushes aggregates on base rel to its
children,
if any.

Where as what I propose here is pushing down aggregation below the append
node keeping join/scan as is. So basically I am converting
Agg -> Append-> Join -> {scan1, scan2} into
Append -> Agg -> Join -> {scan1, scan2}.
This will require partition-wise join as posted in [1].
But I am planning to make this work for partitioned relations and not for
generic inheritance.

I treat these two as separate strategies/paths to be consider while
planning.

Our work will overlap when we are pushing down the aggregate on partitioned
base relation to its children/partitions.

I think you should continue working on pushing down aggregate onto the
joins/scans where as I will continue my work on pushing down aggregates to
partitions (joins as well as single table). Once we are done with these
task,
then we may need to find a way to integrate them.

[1]
https://www.postgresql.org/message-id/flat/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=ead...@mail.gmail.com#CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=ead...@mail.gmail.com



>
> --
> Antonin Houska
> Cybertec Schönig & Schönig GmbH
> Gröhrmühlgasse 26
> A-2700 Wiener Neustadt
> Web: http://www.postgresql-support.de, http://www.cybertec.at
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Logical decoding on standby

2017-03-23 Thread Craig Ringer
On 23 March 2017 at 00:13, Simon Riggs  wrote:
> On 22 March 2017 at 08:53, Craig Ringer  wrote:
>
>> I'm splitting up the rest of the decoding on standby patch set with
>> the goal of getting minimal functionality for creating and managing
>> slots on standbys in, so we can maintain slots on standbys and use
>> them when the standby is promoted to master.
>>
>> The first, to send catalog_xmin separately to the global xmin on
>> hot_standby_feedback and store it in the upstream physical slot's
>> catalog_xmin, is attached.
>>
>> These are extracted directly from the logical decoding on standby
>> patch, with comments by Petr and Andres made re the relevant code
>> addressed.
>
> I've reduced your two patches back to one with a smaller blast radius.
>
> I'll commit this tomorrow morning, barring objections.

This needs rebasing on top of

commit af4b1a0869bd3bb52e5f662e4491554b7f611489
Author: Simon Riggs 
Date:   Wed Mar 22 16:51:01 2017 +

Refactor GetOldestXmin() to use flags

Replace ignoreVacuum parameter with more flexible flags.

Author: Eiji Seki
Review: Haribabu Kommi


That patch landed up using PROCARRAY flags directly as flags to
GetOldestXmin, so it doesn't make much sense to add a flag like
PROCARRAY_REPLICATION_SLOTS . There won't be any corresponding PROC_
flag for PGXACT->vacuumFlags, replication slot xmin and catalog_xmin
are global state not tracked in individual proc entries.

Rather than add some kind of "PROC_RESERVED" flag in proc.h that would
never be used and only exist to reserve a bit for use for
PROCARRAY_REPLICATION_SLOTS, which we'd use a flag to GetOldestXmin, I
added a new argument to GetOldestXmin like the prior patch did.

If preferred I can instead add

proc.h:

#define PROC_RESERVED 0x20

procarray.h:

#define PROCARRAY_REPLICATION_SLOTS 0x20

and then test for (flags & PROCARRAY_REPLICATION_SLOTS)

but that's kind of ugly to say the least, I'd rather just add another argument.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From ffa43fae35857dbff0efe83ef199df165d887d97 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 12:29:13 +0800
Subject: [PATCH] Report catalog_xmin separately to xmin in hot standby
 feedback

The catalog_xmin of slots on a standby was reported as part of the standby's
xmin, causing the master's xmin to be held down. This could cause considerable
unnecessary bloat on the master.

Instead, report catalog_xmin as a separate field in hot_standby_feedback. If
the upstream walsender is using a physical replication slot, store the
catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a
slot and has only a PGPROC entry behaviour doesn't change, as we store the
combined xmin and catalog_xmin in the PGPROC entry.

There's no backward compatibility concern here, as nothing except another
postgres instance of the same major version has any business sending hot
standby feedback and it's only used on the physical replication protocol.
---
 contrib/pg_visibility/pg_visibility.c  |   6 +-
 contrib/pgstattuple/pgstatapprox.c |   2 +-
 doc/src/sgml/protocol.sgml |  33 ++-
 src/backend/access/transam/xlog.c  |   4 +-
 src/backend/catalog/index.c|   3 +-
 src/backend/commands/analyze.c |   2 +-
 src/backend/commands/vacuum.c  |   5 +-
 src/backend/replication/walreceiver.c  |  44 +++--
 src/backend/replication/walsender.c| 110 +++--
 src/backend/storage/ipc/procarray.c|  11 ++-
 src/include/storage/procarray.h|   2 +-
 .../recovery/t/010_logical_decoding_timelines.pl   |  38 ++-
 12 files changed, 198 insertions(+), 62 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index ee3936e..2db5762 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -557,7 +557,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 	if (all_visible)
 	{
 		/* Don't pass rel; that will fail in recovery. */
-		OldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM);
+		OldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM, false);
 	}
 
 	rel = relation_open(relid, AccessShareLock);
@@ -674,7 +674,9 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
  * a buffer lock. And this shouldn't happen often, so it's
  * worth being careful so as to avoid false positives.
  */
-RecomputedOldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM);
+RecomputedOldestXmin = GetOldestXmin(NULL,
+	 PROCARRAY_FLAGS_VACUUM,
+	 false);
 
 if (!TransactionIdPrecedes(OldestXmin, 

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-23 Thread Michael Paquier
On Thu, Mar 23, 2017 at 8:19 PM, Kuntal Ghosh
 wrote:
> Hence, to be consistent with others, bgworker processes can be
> initialized from BackgroundWorkerInitializeConnectionBy[Oid].

Yeah, I am fine with that. Thanks for the updated versions.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bug/oversight in TestLib.pm and PostgresNode.pm

2017-03-23 Thread Erik Rijkers

On 2017-03-23 03:28, Michael Paquier wrote:

On Thu, Mar 23, 2017 at 12:51 AM, Erik Rijkers  wrote:
While trying to test pgbench's stderr (looking for 'creating tables' 
in
output of the initialisation step)  I ran into these two bugs (or 
perhaps

better 'oversights').


+   if (defined $expected_stderr) {
+   like($stderr, $expected_stderr, "$test_name: stderr matches");
+   }
+   else {
is($stderr, '', "$test_name: no stderr");
-   like($stdout, $expected_stdout, "$test_name: matches");
+   }
To simplify that you could as well set expected_output to be an empty
string, and just use like() instead of is(), saving this if/else.


(I'll assume you meant '$expected_stderr' (not 'expected_output'))

That would be nice but with that, other tests start complaining: 
"doesn't look like a regex to me"


To avoid that, I uglified your version back to:

+   like($stderr, (defined $expected_stderr ? $expected_stderr : 
qr{}),

+   "$test_name: stderr matches");

I did it like that in the attached patch 
(0001-testlib-like-stderr.diff).



The other (PostgresNode.pm.diff) is unchanged.

make check-world without error.


Thanks,

Erik Rijkers

--- src/test/perl/TestLib.pm.orig	2017-03-23 08:11:16.034410936 +0100
+++ src/test/perl/TestLib.pm	2017-03-23 08:12:33.154132124 +0100
@@ -289,13 +289,14 @@
 
 sub command_like
 {
-	my ($cmd, $expected_stdout, $test_name) = @_;
+	my ($cmd, $expected_stdout, $test_name, $expected_stderr) = @_;
 	my ($stdout, $stderr);
 	print("# Running: " . join(" ", @{$cmd}) . "\n");
 	my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
 	ok($result, "$test_name: exit code 0");
-	is($stderr, '', "$test_name: no stderr");
-	like($stdout, $expected_stdout, "$test_name: matches");
+	like($stderr, (defined $expected_stderr ? $expected_stderr : qr{}),
+	"$test_name: stderr matches");
+	like($stdout, $expected_stdout, "$test_name: stdout matches");
 }
 
 sub command_fails_like
--- src/test/perl/PostgresNode.pm.orig	2017-03-22 15:58:58.690052999 +0100
+++ src/test/perl/PostgresNode.pm	2017-03-22 15:49:38.422777312 +0100
@@ -1283,6 +1283,23 @@
 
 =pod
 
+=item $node->command_fails_like(...) - TestLib::command_fails_like with our PGPORT
+
+See command_ok(...)
+
+=cut
+
+sub command_fails_like
+{
+	my $self = shift;
+
+	local $ENV{PGPORT} = $self->port;
+
+	TestLib::command_fails_like(@_);
+}
+
+=pod
+
 =item $node->issues_sql_like(cmd, expected_sql, test_name)
 
 Run a command on the node, then verify that $expected_sql appears in the

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-23 Thread Craig Ringer
On 23 March 2017 at 16:07, Craig Ringer  wrote:

> If preferred I can instead add
>
> proc.h:
>
> #define PROC_RESERVED 0x20
>
> procarray.h:
>
> #define PROCARRAY_REPLICATION_SLOTS 0x20
>
> and then test for (flags & PROCARRAY_REPLICATION_SLOTS)

Attached done that way.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 2e887ee19c9c1bae442b9f0682169f9b0c61268a Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 12:29:13 +0800
Subject: [PATCH] Report catalog_xmin separately to xmin in hot standby
 feedback

The catalog_xmin of slots on a standby was reported as part of the standby's
xmin, causing the master's xmin to be held down. This could cause considerable
unnecessary bloat on the master.

Instead, report catalog_xmin as a separate field in hot_standby_feedback. If
the upstream walsender is using a physical replication slot, store the
catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a
slot and has only a PGPROC entry behaviour doesn't change, as we store the
combined xmin and catalog_xmin in the PGPROC entry.

There's no backward compatibility concern here, as nothing except another
postgres instance of the same major version has any business sending hot
standby feedback and it's only used on the physical replication protocol.
---
 doc/src/sgml/protocol.sgml |  33 ++-
 src/backend/replication/walreceiver.c  |  45 +++--
 src/backend/replication/walsender.c| 110 +++--
 src/backend/storage/ipc/procarray.c|  12 ++-
 src/include/storage/proc.h |   5 +
 src/include/storage/procarray.h|  11 +++
 .../recovery/t/010_logical_decoding_timelines.pl   |  38 ++-
 7 files changed, 202 insertions(+), 52 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 244e381..d8786f0 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1911,10 +1911,11 @@ The commands accepted in walsender mode are:
   
   
   
-  The standby's current xmin. This may be 0, if the standby is
-  sending notification that Hot Standby feedback will no longer
-  be sent on this connection. Later non-zero messages may
-  reinitiate the feedback mechanism.
+  The standby's current global xmin, excluding the catalog_xmin from any
+  replication slots. If both this value and the following
+  catalog_xmin are 0 this is treated as a notification that Hot Standby
+  feedback will no longer be sent on this connection. Later non-zero
+  messages may reinitiate the feedback mechanism.
   
   
   
@@ -1924,7 +1925,29 @@ The commands accepted in walsender mode are:
   
   
   
-  The standby's current epoch.
+  The epoch of the global xmin xid on the standby.
+  
+  
+  
+  
+  
+  Int32
+  
+  
+  
+  The lowest catalog_xmin of any replication slots on the standby. Set to 0
+  if no catalog_xmin exists on the standby or if hot standby feedback is being
+  disabled.
+  
+  
+  
+  
+  
+  Int32
+  
+  
+  
+  The epoch of the catalog_xmin xid on the standby.
   
   
   
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 31c567b..0f22f17 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1175,8 +1175,8 @@ XLogWalRcvSendHSFeedback(bool immed)
 {
 	TimestampTz now;
 	TransactionId nextXid;
-	uint32		nextEpoch;
-	TransactionId xmin;
+	uint32		xmin_epoch, catalog_xmin_epoch;
+	TransactionId xmin, catalog_xmin;
 	static TimestampTz sendTime = 0;
 	/* initially true so we always send at least one feedback message */
 	static bool master_has_standby_xmin = true;
@@ -1221,29 +1221,56 @@ XLogWalRcvSendHSFeedback(bool immed)
 	 * everything else has been checked.
 	 */
 	if (hot_standby_feedback)
-		xmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_DEFAULT);
+	{
+		TransactionId slot_xmin;
+
+		/*
+		 * Usually GetOldestXmin() would include the global replication slot
+		 * xmin and catalog_xmin in its calculations, but we don't want to hold
+		 * upstream back from vacuuming normal user table tuples just because
+		 * they're within the catalog_xmin horizon of logical replication slots
+		 * on this standby, so we ignore slot xmin and catalog_xmin GetOldestXmin
+		 * then deal with them ourselves.
+		 */
+		xmin = GetOldestXmin(NULL,
+			 PROCARRAY_FLAGS_DEFAULT|PROCARRAY_SLOTS_XMIN);
+
+		ProcArrayGetReplicationSlotXmin(_xmin, _xmin);
+
+		if (TransactionIdIsValid(slot_xmin) &&
+			TransactionIdPrecedes(slot_xmin, xmin))
+			xmin = slot_xmin;
+	}
 	else
+	{
 		xmin = 

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-23 Thread Kuntal Ghosh
On Wed, Mar 22, 2017 at 9:54 PM, Robert Haas  wrote:
> On Wed, Mar 22, 2017 at 12:20 PM, Robert Haas  wrote:
>> On Wed, Mar 22, 2017 at 1:31 AM, Michael Paquier
>>  wrote:
>>> Okay, switched as ready for committer. One note for the committer
>>> though: keeping the calls of pgstat_bestart() out of
>>> BackgroundWorkerInitializeConnection() and
>>> BackgroundWorkerInitializeConnectionByOid() keeps users the
>>> possibility to not have backends connected to the database show up in
>>> pg_stat_activity. This may matter for some users (cloud deployment for
>>> example). I am as well in favor in keeping the work of those routines
>>> minimal, without touching at pgstat.
>>
>> I think that's just inviting bugs of omission, in both core and
>> extension code.  I think it'd be much better to do this in a
>> centralized place.
>
> I mean, your argument boils down to "somebody might want to
> deliberately hide things from pg_stat_activity".  But that's not
> really a mode we support in general, and supporting it only for
> certain cases doesn't seem like something that this patch should be
> about.  We could add an option to BackgroundWorkerInitializeConnection
> and BackgroundWorkerInitializeConnectionByOid to suppress it, if it's
> something that somebody wants, but actually I'd be more inclined to
> think that everybody (who has a shared memory connection) should go
> into the machinery and then security-filtering should be left to some
> higher-level facility that can make policy decisions rather than being
> hard-coded in the individual modules.
>
> But I'm slightly confused as to how this even arises.  Background
> workers already show up in pg_stat_activity output, or at least I sure
> think they do.  So why does this patch need to make any change to that
> case at all?
When we initialize a process through InitPostgres, the control returns
from the method at different locations based on the process's type(as
soon as its relevant information is initialized).  Following is the
order of returning a process from InitPostgres:

IsAutoVacuumLauncherProcess
walsender not connected to a DB
bgworker not connected to a DB
Other processes using InitPostgres

Before the patch, not all the processes are shown in pg_stat_activity.
Hence, only at two locations in InitPostgres, we need to call
pgstat_bestart() to initialize the entry for the respective process.
But, since we're increasing the types of a process shown in
pg_stat_activity, it seems to be reasonable to move the
pgstat_bestart() call to a proc's main entry point. I've followed the
same approach for auxiliary processes as well.

AutovacuumLauncher - AutoVacLauncherMain()
bgwriter BackgroundWriterMain()
checkpointer CheckpointerMain()
startup StartupProcessMain()
walwriter WalWriterMain()
walreceiver WalReceiverMain()
walsender InitWalSender()

Hence, to be consistent with others, bgworker processes can be
initialized from BackgroundWorkerInitializeConnectionBy[Oid].

I've attached the updated patches which reflect the above change. PFA.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Infra-to-expose-all-backend-processes-in-pg_stat_get.patch
Description: binary/octet-stream


0002-Expose-stats-for-all-backends.patch
Description: binary/octet-stream


0003-Add-backend_type-column-in-pg_stat_get_activity.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Simon Riggs
On 23 March 2017 at 06:42, Simon Riggs  wrote:
> On 23 March 2017 at 01:02, Thomas Munro  wrote:
>
>> Thanks!  Please find attached v7, which includes a note we can point
>> at when someone asks why it doesn't show 00:00:00, as requested.
>
> Thanks.
>
> Now I look harder the handling for logical lag seems like it would be
> problematic in many cases. It's up to the plugin whether it sends
> anything at all, so we should make a LagTrackerWrite() call only if
> the plugin sends something. Otherwise the lag tracker will just slow
> down logical replication.
>
> What I think we should do is add an LSN onto LogicalDecodingContext to
> represent the last LSN sent by the plugin, if any.
>
> If that advances after the call to LogicalDecodingProcessRecord() then
> we know we just sent a message and we can track that with
> LagTrackerWrite().
>
> So we make it the plugin's responsibility to maintain this LSN
> correctly, if at all. (It may decide not to)
>
> In English that means the plugin will update the LSN after each
> Commit, and since we reply only on commit this will provide a series
> of measurements we can use. It will still give a saw-tooth, but its
> better than flooding the LagTracker with false measurements.
>
> I think it seems easier to add that as a minor cleanup/open item after
> this commit.

Second thoughts... I'll just make LagTrackerWrite externally
available, so a plugin can send anything it wants to the tracker.
Which means I'm explicitly removing the "logical replication support"
from this patch.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
Thanks Amit. v19 addresses some of the comments below.

On Thu, Mar 23, 2017 at 10:28 AM, Amit Kapila 
wrote:

> On Wed, Mar 22, 2017 at 4:06 PM, Amit Kapila 
> wrote:
> > On Tue, Mar 21, 2017 at 6:47 PM, Pavan Deolasee
> >  wrote:
> >>
> >>>
> >>
> >> Please find attached rebased patches.
> >>
> >
> > Few comments on 0005_warm_updates_v18.patch:
> >
>
> Few more comments on 0005_warm_updates_v18.patch:
> 1.
> @@ -234,6 +241,25 @@ index_beginscan(Relation heapRelation,
>   scan->heapRelation = heapRelation;
>   scan->xs_snapshot = snapshot;
>
> + /*
> + * If the index supports recheck,
> make sure that index tuple is saved
> + * during index scans. Also build and cache IndexInfo which is used by
> + * amrecheck routine.
> + *
> + * XXX Ideally, we should look at
> all indexes on the table and check if
> + * WARM is at all supported on the base table. If WARM is not supported
> + * then we don't need to do any recheck.
> RelationGetIndexAttrBitmap() does
> + * do that and sets rd_supportswarm after looking at all indexes. But we
> + * don't know if the function was called earlier in the
> session when we're
> + * here. We can't call it now because there exists a risk of causing
> + * deadlock.
> + */
> + if (indexRelation->rd_amroutine->amrecheck)
> + {
> +scan->xs_want_itup = true;
> + scan->indexInfo = BuildIndexInfo(indexRelation);
> + }
> +
>
> Don't we need to do this rechecking during parallel scans?  Also what
> about bitmap heap scans?
>
>
Yes, we need to handle parallel scans. Bitmap scans are not a problem
because it can never return the same TID twice. I fixed this though by
moving this inside index_beginscan_internal.


> 2.
> +++ b/src/backend/access/nbtree/nbtinsert.c
> -
>  typedef struct
>
> Above change is not require.
>
>
Sure. Fixed.


> 3.
> +_bt_clear_items(Page page, OffsetNumber *clearitemnos, uint16 nclearitems)
> +void _hash_clear_items(Page page, OffsetNumber *clearitemnos,
> +   uint16 nclearitems)
>
> Both the above functions look exactly same, isn't it better to have a
> single function like page_clear_items?  If you want separation for
> different index types, then we can have one common function which can
> be called from different index types.
>
>
Yes, makes sense. Moved that to bufpage.c. The reason why I originally had
index-specific versions because I started by putting WARM flag in
IndexTuple header. But since hash index does not have the bit free, moved
everything to TID bit-flag. I still left index-specific wrappers, but they
just call PageIndexClearWarmTuples.


> 4.
> - if (callback(htup, callback_state))
> + flags = ItemPointerGetFlags(>t_tid);
> + is_warm = ((flags & BTREE_INDEX_WARM_POINTER) != 0);
> +
> + if (is_warm)
> + stats->num_warm_pointers++;
> + else
> + stats->num_clear_pointers++;
> +
> + result = callback(htup, is_warm, callback_state);
> + if (result == IBDCR_DELETE)
> + {
> + if (is_warm)
> + stats->warm_pointers_removed++;
> + else
> + stats->clear_pointers_removed++;
>
> The patch looks to be inconsistent in collecting stats for btree and
> hash.  I don't see above stats are getting updated in hash index code.
>
>
Fixed. The hashbucketcleanup signature is just getting a bit too long. May
be we should move some of these counters in a structure and pass that
around. Not done here though.


> 5.
> +btrecheck(Relation indexRel, IndexInfo *indexInfo, IndexTuple indexTuple,
> + Relation heapRel, HeapTuple heapTuple)
> {
> ..
> + if (!datumIsEqual(values[i - 1], indxvalue, att->attbyval,
> + att->attlen))
> ..
> }
>
> Will this work if the index is using non-default collation?
>
>
Not sure I understand that. Can you please elaborate?


> 6.
> +++ b/src/backend/access/nbtree/nbtxlog.c
> @@ -390,83 +390,9 @@ btree_xlog_vacuum(XLogReaderState *record)
> -#ifdef UNUSED
>   xl_btree_vacuum *xlrec = (xl_btree_vacuum *) XLogRecGetData(record);
>
>   /*
> - * This section of code is thought to be no longer needed, after analysis
> - * of the calling paths. It is retained to allow the code to be reinstated
> - * if a flaw is revealed in that thinking.
> - *
> ..
>
> Why does this patch need to remove the above code under #ifdef UNUSED
>
>
Yeah, it isn't strictly necessary. But that dead code was coming in the way
and hence I decided to strip it out. I can put it back if it's an issue or
remove that as a separate commit first.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] createlang/droplang deprecated

2017-03-23 Thread Daniel Gustafsson
> On 20 Mar 2017, at 01:37, Peter Eisentraut  
> wrote:
> 
> On 3/18/17 09:00, Peter Eisentraut wrote:
>> I just noticed that createlang and droplang have been listed as
>> deprecated since PG 9.1.
>> 
>> Do we dare remove them?
> 
> Patch

LGTM, +1

cheers ./daniel

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Monitoring roles patch

2017-03-23 Thread Dave Page
On Wed, Mar 22, 2017 at 3:46 PM, Dave Page  wrote:
> On Wed, Mar 22, 2017 at 1:15 PM, Stephen Frost  wrote:
>>
>> I did specifically ask for explicit roles to be made to enable such
>> capability and that the pg_monitor role be GRANT'd those roles instead
>> of hacking the pg_monitor OID into those checks, but it seems like
>> that's not been done yet.
>
> Yeah, sorry - I missed that for pg_stat_activity. I'll update the patch.

Updated patch attached. This changes pg_read_all_gucs to
pg_read_all_settings, and adds pg_read_all_stats which it grants to
pg_monitor. pg_read_all_stats has full access to the pg_stat_ views
(as pg_monitor did previously), and is used in the various contrib
modules in place of pg_monitor.

Thanks.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/contrib/pg_buffercache/Makefile b/contrib/pg_buffercache/Makefile
index 497dbeb229..18f7a87452 100644
--- a/contrib/pg_buffercache/Makefile
+++ b/contrib/pg_buffercache/Makefile
@@ -4,8 +4,9 @@ MODULE_big = pg_buffercache
 OBJS = pg_buffercache_pages.o $(WIN32RES)
 
 EXTENSION = pg_buffercache
-DATA = pg_buffercache--1.2.sql pg_buffercache--1.1--1.2.sql \
-   pg_buffercache--1.0--1.1.sql pg_buffercache--unpackaged--1.0.sql
+DATA = pg_buffercache--1.2.sql pg_buffercache--1.2--1.3.sql \
+   pg_buffercache--1.1--1.2.sql pg_buffercache--1.0--1.1.sql \
+   pg_buffercache--unpackaged--1.0.sql
 PGFILEDESC = "pg_buffercache - monitoring of shared buffer cache in real-time"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_buffercache/pg_buffercache.control 
b/contrib/pg_buffercache/pg_buffercache.control
index a4d664f3fa..8c060ae9ab 100644
--- a/contrib/pg_buffercache/pg_buffercache.control
+++ b/contrib/pg_buffercache/pg_buffercache.control
@@ -1,5 +1,5 @@
 # pg_buffercache extension
 comment = 'examine the shared buffer cache'
-default_version = '1.2'
+default_version = '1.3'
 module_pathname = '$libdir/pg_buffercache'
 relocatable = true
diff --git a/contrib/pg_freespacemap/Makefile b/contrib/pg_freespacemap/Makefile
index 7bc0e9555d..0a2f000ec6 100644
--- a/contrib/pg_freespacemap/Makefile
+++ b/contrib/pg_freespacemap/Makefile
@@ -4,8 +4,8 @@ MODULE_big = pg_freespacemap
 OBJS = pg_freespacemap.o $(WIN32RES)
 
 EXTENSION = pg_freespacemap
-DATA = pg_freespacemap--1.1.sql pg_freespacemap--1.0--1.1.sql \
-   pg_freespacemap--unpackaged--1.0.sql
+DATA = pg_freespacemap--1.1.sql pg_freespacemap--1.1--1.2.sql \
+   pg_freespacemap--1.0--1.1.sql pg_freespacemap--unpackaged--1.0.sql
 PGFILEDESC = "pg_freespacemap - monitoring of free space map"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_freespacemap/pg_freespacemap.control 
b/contrib/pg_freespacemap/pg_freespacemap.control
index 764db30d18..ac8fc5050a 100644
--- a/contrib/pg_freespacemap/pg_freespacemap.control
+++ b/contrib/pg_freespacemap/pg_freespacemap.control
@@ -1,5 +1,5 @@
 # pg_freespacemap extension
 comment = 'examine the free space map (FSM)'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_freespacemap'
 relocatable = true
diff --git a/contrib/pg_stat_statements/Makefile 
b/contrib/pg_stat_statements/Makefile
index 298951a5f5..39b368b70e 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -4,9 +4,10 @@ MODULE_big = pg_stat_statements
 OBJS = pg_stat_statements.o $(WIN32RES)
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.3--1.4.sql \
-   pg_stat_statements--1.2--1.3.sql pg_stat_statements--1.1--1.2.sql \
-   pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql
+DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.4--1.5.sql \
+   pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
+   pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql \
+   pg_stat_statements--unpackaged--1.0.sql
 PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"
 
 LDFLAGS_SL += $(filter -lm, $(LIBS))
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
b/contrib/pg_stat_statements/pg_stat_statements.c
index 221ac98d4a..cec94d5896 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -62,6 +62,7 @@
 #include 
 
 #include "access/hash.h"
+#include "catalog/pg_authid.h"
 #include "executor/instrument.h"
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
@@ -1386,7 +1387,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
MemoryContext per_query_ctx;
MemoryContext oldcontext;
Oid userid = GetUserId();
-   boolis_superuser = superuser();
+   boolis_superuser = false;
char   *qbuffer = NULL;
Sizeqbuffer_size = 0;
Sizeextent = 0;
@@ -1394,6 +1395,9 @@ 

Re: [HACKERS] Parallel Append implementation

2017-03-23 Thread Amit Khandekar
On 23 March 2017 at 05:55, Robert Haas  wrote:
> On Wed, Mar 22, 2017 at 4:49 AM, Amit Khandekar  
> wrote:
>> Attached is the updated patch that handles the changes for all the
>> comments except the cost changes part. Details about the specific
>> changes are after the cost-related points discussed below.
>>
>> For non-partial paths, I was checking following 3 options :
>>
>> Option 1. Just take the sum of total non-partial child costs and
>> divide it by number of workers. It seems to be getting close to the
>> actual cost.
>
> If the costs for all children are about equal, then that works fine.
> But when they are very unequal, then it's highly misleading.
>
>> Option 2. Calculate exact cost by an algorithm which I mentioned
>> before, which is pasted below for reference :
>> Per­subpath cost : 20 16 10 8 3 1, with 3 workers.
>> After 10 time units (this is minimum of first 3 i.e. 20, 16, 10), the
>> times remaining are :
>> 10  6  0 8 3 1
>> After 6 units (minimum of 10, 06, 08), the times remaining are :
>> 4  0  0 2 3 1
>> After 2 units (minimum of 4, 2, 3), the times remaining are :
>>  2  0  0 0 1 1
>> After 1 units (minimum of 2, 1, 1), the times remaining are :
>>  1  0  0 0 0 0
>> After 1 units (minimum of 1, 0 , 0), the times remaining are :
>>  0  0  0 0 0 0
>> Now add up above time chunks : 10 + 6 + 2 + 1 + 1 = 20
>

> This gives the same answer as what I was proposing

Ah I see.

> but I believe it's more complicated to compute.
Yes a bit, particularly because in my algorithm, I would have to do
'n' subtractions each time, in case of 'n' workers. But it looked more
natural because it follows exactly the way we manually calculate.

> The way my proposal would work in this
> case is that we would start with an array C[3] (since there are three
> workers], with all entries 0.  Logically C[i] represents the amount of
> work to be performed by worker i.  We add each path in turn to the
> worker whose array entry is currently smallest; in the case of a tie,
> just pick the first such entry.
>
> So in your example we do this:
>
> C[0] += 20;
> C[1] += 16;
> C[2] += 10;
> /* C[2] is smaller than C[0] or C[1] at this point, so we add the next
> path to C[2] */
> C[2] += 8;
> /* after the previous line, C[1] is now the smallest, so add to that
> entry next */
> C[1] += 3;
> /* now we've got C[0] = 20, C[1] = 19, C[2] = 18, so add to C[2] */
> C[2] += 1;
> /* final result: C[0] = 20, C[1] = 19, C[2] = 19 */
>
> Now we just take the highest entry that appears in any array, which in
> this case is C[0], as the total cost.

Wow. The way your final result exactly tallies with my algorithm
result is very interesting. This looks like some maths or computer
science theory that I am not aware.

I am currently coding the algorithm using your method. Meanwhile
attached is a patch that takes care of your other comments, details of
which are below...

>
> In my previous review, I said that you should "define and document a
> new builtin tranche ID"; you did the first but not the second.  See
> the table in monitoring.sgml.

Yeah, I tried to search how TBM did in the source, but I guess I
didn't correctly run "git grep" commands, so the results did not have
monitoring.sgml, so I thought may be you mean something else by
"document".

Added changes in monitoring.sgml now.

>
> Definition of exec_append_goto_next_plan should have a line break
> after the return type, per usual PostgreSQL style rules.

Oops. Done.

>
> - * initialize to scan first subplan
> + * In case it's a sequential Append, initialize to scan first subplan.
>
> This comment is confusing because the code is executed whether it's
> parallel or not.  I think it might be better to write something like
> "initialize to scan first subplan (but note that we'll override this
> later in the case of a parallel append)"
Done.

>
>  /*
> + * Check if we are already finished plans from parallel append. This
> + * can happen if all the subplans are finished when this worker
> + * has not even started returning tuples.
> + */
> +if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
> +return ExecClearTuple(node->ps.ps_ResultTupleSlot);
>
> There seems to be no reason why this couldn't be hoisted out of the
> loop.  Actually, I think Ashutosh pointed this out before, but I
> didn't understand at that time what his point was.  Looking back, I
> see that he also pointed out that the as_padesc test isn't necessary,
> which is also true.

I am assuming both yours and Ashutosh's concern is that this check
will be executed for *each* tuple returned, and which needs to be
avoided. Actually, just moving it out of the loop is not going to
solve the runs-for-each-tuple issue. It still will execute for each
tuple. But after a thought, now I agree this can be taken out of loop
anyways, but, not for solving the per-tuple issue, but because it need
not be 

Re: [HACKERS] pgbench - allow to store select results into variables

2017-03-23 Thread Rafia Sabih
On Thu, Mar 16, 2017 at 12:45 AM, Fabien COELHO  wrote:
>
> Hello Rafia,
>
>> I was reviewing v7 of this patch, to start with I found following white
>> space errors when applying with git apply,
>> /home/edb/Desktop/patches/others/pgbench-into-7.patch:66: trailing
>> whitespace.
>
>
> Yep.
>
> I do not know why "git apply" sometimes complains. All is fine for me both
> with "git apply" and "patch".
>
> Last time it was because my mailer uses text/x-diff for the mime type, as
> define by the system in "/etc/mime.types", which some mailer then interpret
> as a license to change eol-style when saving, resulting in this kind of
> behavior. Could you tell your mailer just to save the file as is?
>
>> Apart from that, on executing SELECT 1 AS a \gset \set i debug(:a) SELECT
>> 2
>> AS a \gcset SELECT 3; given in your provided script gset-1.sql. it is
>> giving error Invalid command \gcset.
>
>
> Are you sure that you are using the compiled pgbench, not a previously
> installed one?
>
>   bin/pgbench> pgbench -t 1 -f SQL/gset-1.sql
> SQL/gset-1.sql:1: invalid command in command "gset"
> \gset
>
>   bin/pgbench> ./pgbench -t 1 -f SQL/gset-1.sql
> starting vacuum...end.
> debug(script=0,command=2): int 1
> debug(script=0,command=4): int 2
> ...
>
>> Not sure what is the intention of this script anyway?
>
>
> The intention is to test that gset & gcset work as expected in various
> settings, especially with combined queries (\;) the right result must be
> extracted in the sequence.
>
>> Also, instead of so many different files for error why don't you combine
>> it into one.
>
>
> Because a pgbench scripts stops on the first error, and I wanted to test
> what happens with several kind of errors.
>
if (my_command->argc > 2)
+ syntax_error(source, lineno, my_command->line, my_command->argv[0],
+ "at most on argument expected", NULL, -1);

I suppose you mean 'one' argument here.
Apart from that indentation is not correct as per pgindent, please check.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Mark Kirkwood

On 24/03/17 02:00, Peter Eisentraut wrote:

On 3/21/17 21:38, Peter Eisentraut wrote:

This patch is looking pretty good to me, modulo the failing pg_dump tests.

Attached is a fixup patch.  I have mainly updated some comments and
variable naming for (my) clarity.  No functional changes.

Committed all that.



Testing now this patch is in, I'm unable to create a subscription:

(master)

bench=# CREATE PUBLICATION pgbench
FOR TABLE pgbench_accounts , pgbench_branches,
  pgbench_tellers, pgbench_history
WITH (PUBLISH INSERT, PUBLISH UPDATE, PUBLISH DELETE);

(slave)

bench=# CREATE SUBSCRIPTION pgbench
CONNECTION 'port=5447 user=postgres dbname=bench'
PUBLICATION pgbench
WITH (COPY DATA);
ERROR:  duplicate key value violates unique constraint 
"pg_subscription_rel_srrelid_srsubid_index"

DETAIL:  Key (srrelid, srsubid)=(0, 16389) already exists.

This is a pair of freshly initdb'ed instances, the master has a size 100 
pgbench schema.


I'm guessing this is a different bug from the segfault also reported


regards


Mark




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2017-03-23 Thread Thomas Munro
On Tue, Mar 14, 2017 at 7:51 AM, Kevin Grittner  wrote:
> On Sun, Mar 12, 2017 at 4:08 PM, Thomas Munro
>  wrote:
>> I found a new way to break it: run the trigger function so
>> that the plan is cached by plpgsql, then ALTER TABLE incompatibly,
>> then run the trigger function again.  See attached.
>
> [...]
>
> I expected that existing mechanisms would have forced re-planning of
> a trigger function if the table the function was attached to was
> altered.  Either that was a bit "optimistic", or the old TupleDesc
> is used for the new plan.  Will track down which it is, and fix it.

When PlanCacheRelCallback runs, I don't think it understands that
these named tuplestore RangeTblEntry objects are dependent on the
subject table.  Could that be fixed like this?

@@ -2571,6 +2582,9 @@ extract_query_dependencies_walker(Node *node,
PlannerInfo *context)
if (rte->rtekind == RTE_RELATION)
context->glob->relationOids =

lappend_oid(context->glob->relationOids, rte->relid);
+   else if (rte->rtekind == RTE_NAMEDTUPLESTORE)
+   context->glob->relationOids =
+
lappend_oid(context->glob->relationOids, [subject table's OID]);
}

> I'll post a new patch once I figure out the dropped column on the
> cached function plan.

If that's fixed and the permissions question can be waved away by
saying it's the same as the per-row situation, my only other comment
would be a bikeshed issue: Enr isn't a great name for a struct.

Very keen to see this feature in PostgreSQL 10!

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Andres Freund
On 2017-03-23 21:58:03 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-23 21:26:19 -0400, Tom Lane wrote:
> >> Hmm, I see ... but that only works in the cases where the caller of
> >> ExecBuildProjectionInfo supplied a source slot, and a lot of 'em
> >> don't.
> 
> > Right, the old and new code comment on that:
> 
> >  * inputDesc can be NULL, but if it is not, we check to see whether simple
> >  * Vars in the tlist match the descriptor.  It is important to provide
> >  * inputDesc for relation-scan plan nodes, as a cross check that the 
> > relation
> >  * hasn't been changed since the plan was made.  At higher levels of a plan,
> >  * there is no need to recheck.
> 
> Ah, I'd forgotten the assumption that we only need to check this at scan
> level.
> 
> >> I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST
> >> step types.  I could take it back out, but I wonder if it wouldn't be
> >> smarter to keep it and remove the restriction in ExecBuildProjectionInfo.
> >> Or maybe we could have ExecBuildProjectionInfo emit either
> >> ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove
> >> the reference safe.
> 
> > I think it's probably ok to just leave the check in, and remove those
> > comments, and simplify the isSimpleVar stuff to only check if
> >  IsA(tle->expr, Var) && ((Var *) tle->expr)->varattno > 0)
> 
> Not sure.  It's a pretty fair amount of duplicative code, once you finish
> dealing with all the ExecJustFoo functions in addition to the main code
> paths.  At this point I'm inclined to take it back out and improve the
> comments around ExecBuildProjectionInfo.

I'm ok with both.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] error handling in RegisterBackgroundWorker

2017-03-23 Thread Peter Eisentraut
On 2/15/17 12:11, Robert Haas wrote:
> On Wed, Feb 15, 2017 at 11:30 AM, Peter Eisentraut
>  wrote:
>> If RegisterBackgroundWorker() (the non-dynamic kind that is only
>> loadable from shared_preload_libraries) fails to register the worker, it
>> writes a log message and proceeds, ignoring the registration request.  I
>> think that is a mistake, it should be a hard error.  The only way in
>> practice to fix the problem is to change shared_preload_libraries or
>> max_worker_processes, both requiring a restart anyway, so proceeding
>> without the worker is not useful.
> 
> I guess the question is whether people will prefer to have the
> database start up and be missing the worker, or to have it not start.
> As you point out, the former is likely to result in an eventual
> restart, but the latter may lead to a longer period of downtime RIGHT
> NOW.  People tend to really hate things that make the database not
> start, so I'm not sure what's best here.

Any other thoughts on this?  Seems like a potential usability issue.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Python: Add cursor and execute methods to plan object

2017-03-23 Thread Jim Nasby

On 2/25/17 10:27 AM, Peter Eisentraut wrote:

So I'm also wondering here which style people prefer so
I can implement it there.


I think the more OO style is definitely better. I expect it would 
simplify the code as well.

--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Craig Ringer
On 24 March 2017 at 05:39, Thomas Munro  wrote:

> Fujii-san for the idea of tracking write and flush lag too

You mentioned wishing that logical replication would update sent lag
as the decoding position.

It appears to do just that already; see the references to restart_lsn
in StartLogicalReplication, and the update of sentPtr in
XLogSendLogical .

It's a bit misleading, since it hasn't *sent* anything, it buffers it
until commit. But it's useful.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-03-23 Thread Peter Eisentraut
On 3/17/17 18:35, Tomas Vondra wrote:
> On 03/17/2017 05:23 PM, Peter Eisentraut wrote:
>> I'm struggling to find a good way to share code between
>> bt_page_items(text, int4) and bt_page_items(bytea).
>>
>> If we do it via the SQL route, as I had suggested, it makes the
>> extension non-relocatable, and it will also create a bit of a mess
>> during upgrades.
>>
>> If doing it in C, it will be a bit tricky to pass the SRF context
>> around.  There is no "DirectFunctionCall within SRF context", AFAICT.
> 
> Not sure what it has to do with DirectFunctionCall? You want to call the 
> bytea variant from the existing one? Wouldn't it be easier to simply 
> define a static function with the shared parts, and pass around the 
> fctx/fcinfo? Not quite pretty, but should work.

Perhaps what was added in

would actually work here.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Sync pg_dump and pg_dumpall output

2017-03-23 Thread Michael Paquier
On Thu, Mar 23, 2017 at 1:10 AM, Stephen Frost  wrote:
> Andrew,
>
> * Andrew Dunstan (andrew.duns...@2ndquadrant.com) wrote:
>> On 03/22/2017 11:39 AM, Stephen Frost wrote:
>> > * Andrew Dunstan (and...@dunslane.net) wrote:
>> >> Sync pg_dump and pg_dumpall output
>> > This probably should have adjusted all callers of pg_dump in the
>> > regression tests to use the --no-sync option, otherwise we'll end up
>> > spending possibly a good bit of time calling fsync() during the
>> > regression tests unnecessairly.
>>
>> All of them? The imnpact is not likely to be huge in most cases
>> (possibly different on Windows). On crake, the bin-check stage actually
>> took less time after the change than before, so I suspect that the
>> impact will be pretty small.
>
> Well, perhaps not all, but I'd think --no-sync would be better to have
> in most cases.  We turn off fsync for all of the TAP tests and all
> initdb calls, so it seems like we should here too.  Perhaps it won't be
> much overhead on an unloaded system, but not all of the buildfarm
> animals seem to be on unloaded systems, nor are they particularly fast
> in general or have fast i/o..

Agreed.

>> Still I agree that we should have tests for both cases.
>
> Perhaps, though if I recall correctly, we've basically had zero calls
> for fsync() until this.  If we don't feel like we need to test that in
> the backend then it seems a bit silly to feel like we need it for
> pg_dump's regression coverage.
>
> That said, perhaps the right answer is to have a couple tests for both
> the backend and for pg_dump which do exercise the fsync-enabled paths.

And agreed.

The patch attached uses --no-sync in most places for pg_dump, except
in 4 places of 002_pg_dump.pl to stress as well the sync code path.
-- 
Michael


test-dump-nosync.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow pg_dumpall to work without pg_authid

2017-03-23 Thread Peter Eisentraut
On 3/21/17 23:34, Tom Lane wrote:
> Peter Eisentraut  writes:
>> No answer.  Can we remove this chunk?
> 
>>> +   if (no_role_passwords && binary_upgrade)
> 
> Perhaps, but why?  ISTM that trying to run pg_upgrade as non-superuser
> is a nonstarter for a number of reasons, while if you're superuser you
> do not need --no-role-passwords.

Well, this code was added, apparently without reason.  We don't need to
actively prohibit option combinations just because they are unusual.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Monitoring roles patch

2017-03-23 Thread Peter Eisentraut
On 3/22/17 09:17, Stephen Frost wrote:
>> If we do it via GRANTs instead, then users can easily extend it.
> The intent here is that users will *also* be able to do it via GRANTs if
> they wish to.

But why not do it with GRANTs in the first place then?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Petr Jelinek
On 24/03/17 00:14, Mark Kirkwood wrote:
> On 24/03/17 02:00, Peter Eisentraut wrote:
>> On 3/21/17 21:38, Peter Eisentraut wrote:
>>> This patch is looking pretty good to me, modulo the failing pg_dump
>>> tests.
>>>
>>> Attached is a fixup patch.  I have mainly updated some comments and
>>> variable naming for (my) clarity.  No functional changes.
>> Committed all that.
>>
> 
> Testing now this patch is in, I'm unable to create a subscription:
> 
> (master)
> 
> bench=# CREATE PUBLICATION pgbench
> FOR TABLE pgbench_accounts , pgbench_branches,
>   pgbench_tellers, pgbench_history
> WITH (PUBLISH INSERT, PUBLISH UPDATE, PUBLISH DELETE);
> 
> (slave)
> 
> bench=# CREATE SUBSCRIPTION pgbench
> CONNECTION 'port=5447 user=postgres dbname=bench'
> PUBLICATION pgbench
> WITH (COPY DATA);
> ERROR:  duplicate key value violates unique constraint
> "pg_subscription_rel_srrelid_srsubid_index"
> DETAIL:  Key (srrelid, srsubid)=(0, 16389) already exists.
> 
> This is a pair of freshly initdb'ed instances, the master has a size 100
> pgbench schema.
> 
> I'm guessing this is a different bug from the segfault also reported
> 

Yes, I also forgot to check if the table actually exists on subscriber
when fetching them in CREATE SUBSCRIPTION (we have check during
replication but not there).

Attached patches should fix both issues.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
>From cdb2590e8b42d0dec4c00d8ae7a6affa2bb41372 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Fri, 24 Mar 2017 00:24:47 +0100
Subject: [PATCH 2/2] Check that published table exists on subscriber

---
 src/backend/commands/subscriptioncmds.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 8f201b2..ad11610 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -417,6 +417,11 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 Oid			relid;
 
 relid = RangeVarGetRelid(rv, AccessShareLock, true);
+if (relid == InvalidOid)
+	ereport(ERROR,
+			(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+			 errmsg("published table \"%s.%s\" not found on subscriber",
+	rv->schemaname, rv->relname)));
 
 SetSubscriptionRelState(subid, relid, table_state,
 		InvalidXLogRecPtr);
@@ -513,6 +518,12 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
 		Oid			relid;
 
 		relid = RangeVarGetRelid(rv, AccessShareLock, false);
+		if (relid == InvalidOid)
+			ereport(ERROR,
+	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+	 errmsg("published table \"%s.%s\" not found on subscriber",
+			rv->schemaname, rv->relname)));
+
 		pubrel_local_oids[off++] = relid;
 
 		if (!bsearch(, subrel_local_oids,
-- 
2.7.4

>From e492ead4b8036f03b4624093d64280aa6ea23129 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Thu, 23 Mar 2017 23:39:25 +0100
Subject: [PATCH 1/2] Always return tupleslot and tupledesc from libpqrcv_exec

---
 src/backend/replication/libpqwalreceiver/libpqwalreceiver.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 4dd8eef..9d7bb25 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -803,10 +803,6 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 	MemoryContext	rowcontext;
 	MemoryContext	oldcontext;
 
-	/* No point in doing anything here if there were no tuples returned. */
-	if (PQntuples(pgres) == 0)
-		return;
-
 	/* Make sure we got expected number of fields. */
 	if (nfields != nRetTypes)
 		ereport(ERROR,
@@ -824,6 +820,10 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 		   PQfname(pgres, coln), retTypes[coln], -1, 0);
 	attinmeta = TupleDescGetAttInMetadata(walres->tupledesc);
 
+	/* No point in doing more here if there were no tuples returned. */
+	if (PQntuples(pgres) == 0)
+		return;
+
 	/* Create temporary context for local allocations. */
 	rowcontext = AllocSetContextCreate(CurrentMemoryContext,
 	   "libpqrcv query result context",
-- 
2.7.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-23 20:36:32 -0400, Tom Lane wrote:
>> The problem here is that once we drop the buffer pin, any pointers we may
>> have into on-disk data are dangling pointers --- we're at risk of some
>> other backend taking away that shared buffer.  (So I'm afraid that the
>> commentary there is overly optimistic.)  So even though those pointers
>> may still be there beyond tts_nvalid, subsequent references to them are
>> very dangerous.

> This applies to the code in master as well, no?  An ExecEvalScalarVar()
> followed by an ExecEvalWholeRowVar() would have precisely the same
> effect?

Yeah.  The other order would be safe, because ExecEvalScalarVar would do
slot_getattr which would re-extract the value from the newly materialized
tuple.  But there definitely seems to be a hazard for the order you
mentioned.

> Do we need to do anything about this in the back-branches,
> given how unlikely this is going to be in practice?

Probably not.  As I mentioned, I think this may be only theoretical rather
than real, if you believe that buffer pins would only be associated with
slots holding references to regular tuples.  And even if it's not
theoretical, the odds of seeing a failure in the field seem pretty tiny
given that a just-released buffer shouldn't be subject to recycling for
a fair while.  But I don't want to leave it like this going forward.

>> So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't
>> clobber the state of the slot.

> That seems like a good plan.

Yeah.  I have the stopgap code in my working copy, and will look at
refactoring the tuptoaster code for better performance later.

>> Also, while trying to test the above scenario, I realized that the patch
>> as submitted was being way too cavalier about where it was applying 
>> CheckVarSlotCompatibility and so on.  The ASSIGN_FOO_VAR steps, for
>> instance, had no protection at all.

> I don't think that's true - the assign checks had copied the code from
> the old ExecBuildProjectionInfo, setting isSimpleVar iff
> (!attr->attisdropped && variable->vartype == attr->atttypid) - we can
> check that for projections in contrast to normal expressions because we
> already know the slot.

Hmm, I see ... but that only works in the cases where the caller of
ExecBuildProjectionInfo supplied a source slot, and a lot of 'em don't.
As the code stands, we are unable to use ASSIGN_FOO_VAR in quite a lot
of places, including everywhere above the relation scan level.

I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST
step types.  I could take it back out, but I wonder if it wouldn't be
smarter to keep it and remove the restriction in ExecBuildProjectionInfo.
Or maybe we could have ExecBuildProjectionInfo emit either
ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove
the reference safe.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Andres Freund
On 2017-03-23 21:26:19 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-23 20:36:32 -0400, Tom Lane wrote:
> >> The problem here is that once we drop the buffer pin, any pointers we may
> >> have into on-disk data are dangling pointers --- we're at risk of some
> >> other backend taking away that shared buffer.  (So I'm afraid that the
> >> commentary there is overly optimistic.)  So even though those pointers
> >> may still be there beyond tts_nvalid, subsequent references to them are
> >> very dangerous.
> 
> > This applies to the code in master as well, no?  An ExecEvalScalarVar()
> > followed by an ExecEvalWholeRowVar() would have precisely the same
> > effect?
> 
> Yeah.  The other order would be safe, because ExecEvalScalarVar would do
> slot_getattr which would re-extract the value from the newly materialized
> tuple.  But there definitely seems to be a hazard for the order you
> mentioned.
> 
> > Do we need to do anything about this in the back-branches,
> > given how unlikely this is going to be in practice?
> 
> Probably not.  As I mentioned, I think this may be only theoretical rather
> than real, if you believe that buffer pins would only be associated with
> slots holding references to regular tuples.  And even if it's not
> theoretical, the odds of seeing a failure in the field seem pretty tiny
> given that a just-released buffer shouldn't be subject to recycling for
> a fair while.  But I don't want to leave it like this going forward.

Ok.


> >> Also, while trying to test the above scenario, I realized that the patch
> >> as submitted was being way too cavalier about where it was applying 
> >> CheckVarSlotCompatibility and so on.  The ASSIGN_FOO_VAR steps, for
> >> instance, had no protection at all.
> 
> > I don't think that's true - the assign checks had copied the code from
> > the old ExecBuildProjectionInfo, setting isSimpleVar iff
> > (!attr->attisdropped && variable->vartype == attr->atttypid) - we can
> > check that for projections in contrast to normal expressions because we
> > already know the slot.
> 
> Hmm, I see ... but that only works in the cases where the caller of
> ExecBuildProjectionInfo supplied a source slot, and a lot of 'em
> don't.

Right, the old and new code comment on that:

 * inputDesc can be NULL, but if it is not, we check to see whether simple
 * Vars in the tlist match the descriptor.  It is important to provide
 * inputDesc for relation-scan plan nodes, as a cross check that the relation
 * hasn't been changed since the plan was made.  At higher levels of a plan,
 * there is no need to recheck.

and that seems like reasonable to me?  That said, I think we can remove
that assumption, by checking once.


> As the code stands, we are unable to use ASSIGN_FOO_VAR in quite a lot
> of places, including everywhere above the relation scan level.

Hm? If inputDesc isn't given we just, before and after, do:
if (!inputDesc)
isSimpleVar = true; /* can't check 
type, assume OK */



> I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST
> step types.  I could take it back out, but I wonder if it wouldn't be
> smarter to keep it and remove the restriction in ExecBuildProjectionInfo.
> Or maybe we could have ExecBuildProjectionInfo emit either
> ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove
> the reference safe.

I think it's probably ok to just leave the check in, and remove those
comments, and simplify the isSimpleVar stuff to only check if
 IsA(tle->expr, Var) && ((Var *) tle->expr)->varattno > 0)

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No more libedit?! - openssl plans to switch to APL2

2017-03-23 Thread Andres Freund
On 2017-03-24 02:31:47 +0100, Andreas Karlsson wrote:
> On 08/01/2015 05:14 PM, Andres Freund wrote:
> > According to https://www.openssl.org/blog/blog/2015/08/01/cla/ openssl
> > is planning to relicense to the apache license 2.0. While APL2 is not
> > compatible with GLP2 it *is* compatible with GPL3.
> 
> Great! This means that the Debian packages will eventually be able to drop
> their LD_PRELOAD hack, which never worked perfectly due to compiling against
> libedit or libreadline header resulting in different binaries.

Well, relicensing is hard. It's far from guaranteed to succeed...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect and hash indexes

2017-03-23 Thread Ashutosh Sharma
On Fri, Mar 24, 2017 at 9:21 AM, Amit Kapila  wrote:
> On Thu, Mar 23, 2017 at 6:46 PM, Ashutosh Sharma  
> wrote:
>>>
>>> Oh, okay, but my main objection was that we should not check hash page
>>> type (hasho_flag) without ensuring whether it is a hash page.  Can you
>>> try to adjust the above code so that this check can be moved after
>>> hasho_page_id check?
>>
>> Yes, I got your point. I have done that but then i had to remove the
>> check for PageIsEmpty(). Anyways, I think PageIsEmpty() condition will
>> only be true for one page in entire hash index table and can be
>> ignored. If you wish, I could mention about it in the documentation.
>>
>
> Yeah, I think it is worth adding a Note in docs, but we can do that
> separately if required.
>
>>>
 To avoid
 this, at the start of verify_hash_page function itself if we recognise
 page as UNUSED page we return immediately.

>
> 2.
> + /* Check if it is an empty hash page. */
> + if (PageIsEmpty(page))
> + ereport(ERROR,
> + (errcode(ERRCODE_INDEX_CORRUPTED),
> + errmsg("index table contains empty page")));
>
>
> Do we want to give a separate message for EMPTY and NEW pages?  Isn't
> it better that the same error message can be given for both of them as
> from user perspective there is not much difference between both the
> messages?

 I think we should show separate message because they are two different
 type of pages. In the sense like, one is initialised whereas other is
 completely zero.

>>>
>>> I understand your point, but not sure if it makes any difference to user.
>>>
>>
>> okay, I have now anyways removed the check for PageIsEmpty. Please
>> refer to the attached '0002
>> allow_pageinspect_handle_UNUSED_hash_pages.patch'
>>
>
> +
> if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
> ereport(ERROR,
>
> spurious white space.
>
>>
>> Also, I have attached
>> '0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patch' that
>> handles your comment mentioned in [1].
>>
>
> In general, we have to initialize prevblock with max_bucket, but here
> it is okay, because we anyway initialize it after this page is
> allocated as overflow page.
>
> Your patches look good to me except for small white space change.

Thanks for reviewing my patch. I have removed the extra white space.
Attached are both the patches.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 86eb711ea80b495c9d2f5228558682d0e95c41eb Mon Sep 17 00:00:00 2001
From: ashu 
Date: Thu, 23 Mar 2017 16:47:17 +0530
Subject: [PATCH] Mark freed overflow page as UNUSED pagetype v2

---
 src/backend/access/hash/hash_xlog.c |  9 +
 src/backend/access/hash/hashovfl.c  | 13 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d9ac42c..2ccaf46 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -697,11 +697,20 @@ hash_xlog_squeeze_page(XLogReaderState *record)
 	if (XLogReadBufferForRedo(record, 2, ) == BLK_NEEDS_REDO)
 	{
 		Page		ovflpage;
+		HashPageOpaque ovflopaque;
 
 		ovflpage = BufferGetPage(ovflbuf);
 
 		_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
 
+		ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+		ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+		ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+		ovflopaque->hasho_bucket = -1;
+		ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+		ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 		PageSetLSN(ovflpage, lsn);
 		MarkBufferDirty(ovflbuf);
 	}
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index a3cae21..d3eaee0 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -591,9 +591,20 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	/*
 	 * Initialize the freed overflow page.  Just zeroing the page won't work,
 	 * because WAL replay routines expect pages to be initialized. See
-	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended.
+	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended. Also, mark
+	 * the page as UNUSED type and retain it's page id. This allows the tools
+	 * like pageinspect to consider it as a hash page.
 	 */
 	_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
+
+	ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+	ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+	ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+	ovflopaque->hasho_bucket = -1;
+	ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+	ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 	MarkBufferDirty(ovflbuf);
 
 	if (BufferIsValid(prevbuf))
-- 
1.8.3.1

diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c
index 812a03f..6ba8847 

[HACKERS] comment/security label for publication/subscription

2017-03-23 Thread Peter Eisentraut
Here is a patch to add COMMENT support for publications and subscriptions.

On a similar issue, do we need SECURITY LABEL support for those?  Does
that make sense?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Add-COMMENT-support-for-publications-and-subscriptio.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-23 Thread Peter Eisentraut
On 3/23/17 21:47, Jeff Janes wrote:
> I have a pg_restore which predicts the file 5 files ahead of the one it
> was asked for, and initiates a pre-fetch and decompression of it. Then
> it delivers the file it was asked for, either by pulling it out of the
> pre-staging area set up by the N-5th invocation, or by going directly to
> the archive to get it.  This speeds up play-back dramatically when the
> files are stored compressed and non-local.

Yeah, some better support for prefetching would be necessary to avoid
having to have any knowledge of the file naming.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Andres Freund
On 2017-03-23 20:36:32 -0400, Tom Lane wrote:
> So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't
> clobber the state of the slot.  Right at the moment, the only way to do
> that seems to be to do this instead of ExecFetchSlotTupleDatum:
> 
> tuple = ExecCopySlotTuple(slot);
> dtuple = (HeapTupleHeader)
> DatumGetPointer(heap_copy_tuple_as_datum(tuple,
>  slot->tts_tupleDescriptor));
> heap_freetuple(tuple);

Hm.  One disadvantage would be that repeated whole-row references to the
same table would be a bit slower, because we'd repeatedly form a tuple
from a virtual one - but I have a hard time coming up with a scenario
where that'd matter.  I'd suspect that in the end it'd probably even
have a *positive* performance impact, because right now the next scalar
access will have to deform the whole tuple again, and that seems like a
lot more likely scenario.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] free space map and visibility map

2017-03-23 Thread Masahiko Sawada
On Fri, Mar 24, 2017 at 11:01 AM, Kyotaro HORIGUCHI
 wrote:
> At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada  
> wrote in 
>> On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas  wrote:
>> > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes  wrote:
>> >> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which then
>> >> can't leave the block as all visible or all frozen).  I think the issue is
>> >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this correctly,
>> >> that neither of those ever update the FSM, regardless of FPI?
>> >
>> > Yes, updates to the FSM are never logged.  Forcing replay of
>> > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
>> >
>>
>> I think I was missing something. I imaged your situation is that FPI
>> is replayed during crash recovery after the crashed server vacuums the
>> page and marked it as all-frozen. But this situation is also resolved
>> by that solution.
>
> # HEAP2_CLEAN is issued in lazy_vacuum_page
>
> It will work but I'm not sure it is right direction for
> HEAP2_FREEZE_PAGE to touch FSM.
>
> As Masahiko said, the situation must be created by HEAP2_VISIBLE
> without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
> think only the latter can happen. The comment in heap_xlog_clean
> below is right generally but if a page filled with tuples becomes
> almost empty and freezable by this cleanup, a problematic
> situation like this occurs.
>
>> /*
>>  * Update the FSM as well.
>>  *
>>  * XXX: Don't do this if the page was restored from full page image. We
>>  * don't bother to update the FSM in that case, it doesn't need to be
>>  * totally accurate anyway.
>>  */
>> if (action == BLK_NEEDS_REDO)
>>   XLogRecordPageWithFreeSpace(rnode, blkno, freespace);
>
> HEAP_INSERT/HEAP2_MULTI_INSERT/UPDATE does the similar. All of
> these reduces freespace but HEAP2_CLEAN increases. HEAP2_CLEAN
> occurs infrequently than the three. So I suppose HEAP2_CLEAN may
> always update FSM.
>
> Even if the page is not frozen, the similar situation is made
> with just ALL_VISIBLE. Without any updates on the page, freespace
> information for the page won't be corrected until the next
> freezing(or 'aggressive') vacuum occurs.
>
> From this point of view, HEAP2_FREEZE_PAGE is not responsible for
> updating FSM. But if we see that always updating FSM on
> HEAP2_CLEAN is too much, HEAP2_FREEZE_PAGE would be the next way
> to go.
>
> (I don't understand the reason for skipping updating FSM only for
>  FPI. This seems introduced by f8f42279)
>

This code is introduced by e9816533e39be464227b748ee5eeb3d9f688cd76
and discussion is here[1].
ISTM that this code is implemented based on that all page will be
vacuumed eventually. But now that we have freeze map and the pages
could never be vacuum, it would be worth to consider that behavior
again.

[1] 
https://www.postgresql.org/message-id/flat/49072021.7010801%40enterprisedb.com#49072021.7010...@enterprisedb.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Andres Freund
Hi,m

On 2017-03-23 17:40:55 -0400, Tom Lane wrote:
> Stylistic thought ... I am wondering if it wouldn't be a good idea
> to replace EEOP_CASE_WHEN_STEP, EEOP_CASE_THEN_STEP, EEOP_COALESCE,
> and EEOP_ARRAYREF_CHECKINPUT with instructions defined in a less
> usage-dependent way as
> 
>   EEOP_JUMP   unconditional jump
>   EEOP_JUMP_IF_NULL   jump if step result is null
>   EEOP_JUMP_IF_NOT_NULL   jump if step result isn't null
>   EEOP_JUMP_IF_NOT_TRUE   jump if step result isn't TRUE
> 
> One could imagine later filling out this set with the other BoolTest
> condition types, but that seems to be all we need right now.

Hm, no arguments against, but I'm also not particularly excited about
the change.


> These are basically just renamings of the step types that exist now,
> although EEOP_ARRAYREF_CHECKINPUT would have to drop its not-very-
> necessary Assert(!op->d.arrayref.state->isassignment).

I won't shed a tear about that assert's removal.


> Well, I guess I should say that they're renamings of the semantics
> that I have for these steps in my working copy; for instance, I got
> rid of casewhen.value/casewhen.isnull in favor of letting CASE WHEN
> expressions evaluate into the CASE's final output variable.

That sounds like a sensible change (in the abstract, I obviously haven't
seen your working copy).


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-03-23 Thread Amit Langote
On 2017/03/23 23:47, Amit Langote wrote:
> On Thu, Mar 23, 2017 at 11:27 PM, Maksim Milyutin
>  wrote:
>> Hi!
>>
>> I have noticed that there is scheduled unlinking of nonexistent physical
>> storage under partitioned table when we execute DROP TABLE statement on this
>> partitioned table. Though this action doesn't generate any error under
>> typical behavior of postgres because the error of storage's lack is caught
>> through if-statement [1] I think it is not safe.
>>
>> My patch fixes this issue.
>>
>> 1. src/backend/storage/smgr/md.c:1385
> 
> Good catch, will incorporate that in the main patch.

And here is the updated patch.

Thanks,
Amit
>From aed0685b6bfb99a301e674221fb393cc4e660461 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 24 Jan 2017 14:22:34 +0900
Subject: [PATCH] Do not allocate storage for partitioned tables.

Currently, it is not possible to insert any data into a partitioned
table.  So they're empty at all times, which means it is wasteful to
allocate relfilenode and related storage objects.

Also documented the fact that specifying storage options for partitioned
tables is ineffective at the moment.
---
 doc/src/sgml/ref/create_table.sgml |  6 ++
 src/backend/catalog/heap.c | 17 ++---
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 9ed25c05da..7ac55b17fa 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1040,6 +1040,12 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 will use the table's parameter value.

 
+   
+Note that specifying the following parameters for partitioned tables has
+no effect at the moment.  You must specify them for individual leaf
+partitions if necessary.
+   
+

 

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 41c0056556..2f5090b183 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -291,6 +291,7 @@ heap_create(const char *relname,
 		case RELKIND_VIEW:
 		case RELKIND_COMPOSITE_TYPE:
 		case RELKIND_FOREIGN_TABLE:
+		case RELKIND_PARTITIONED_TABLE:
 			create_storage = false;
 
 			/*
@@ -1345,14 +1346,13 @@ heap_create_with_catalog(const char *relname,
 	if (oncommit != ONCOMMIT_NOOP)
 		register_on_commit_action(relid, oncommit);
 
-	if (relpersistence == RELPERSISTENCE_UNLOGGED)
-	{
-		Assert(relkind == RELKIND_RELATION || relkind == RELKIND_MATVIEW ||
-			   relkind == RELKIND_TOASTVALUE ||
-			   relkind == RELKIND_PARTITIONED_TABLE);
-
+	/*
+	 * We do not want to create any storage objects for a partitioned
+	 * table, including the init fork.
+	 */
+	if (relpersistence == RELPERSISTENCE_UNLOGGED &&
+		relkind != RELKIND_PARTITIONED_TABLE)
 		heap_create_init_fork(new_rel_desc);
-	}
 
 	/*
 	 * ok, the relation has been cataloged, so close our relations and return
@@ -1376,6 +1376,9 @@ heap_create_with_catalog(const char *relname,
 void
 heap_create_init_fork(Relation rel)
 {
+	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
+		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
+		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
 	RelationOpenSmgr(rel);
 	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
 	log_smgrcreate(>rd_smgr->smgr_rnode.node, INIT_FORKNUM);
-- 
2.11.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_dump --sequence-data option

2017-03-23 Thread Peter Eisentraut
At the conclusion of
, pg_upgrade was
changed to upgrade sequences "logically".  We initially did that by
adding a pg_dump option --sequence-data that would dump sequence data
(setval calls) in spite of --schema-only.  Later, that option was
removed as a separate option and made automatic in --binary-upgrade,
because we didn't think it was useful independently.

For logical replication, sequence data is not replicated at the moment.
So it could be useful in some cases to have a simple way to dump just
the sequence data.  So I think it would be worth adding that option back.

Comments?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-23 Thread Peter Eisentraut
On 3/23/17 16:58, Stephen Frost wrote:
> The backup tools need to also get the LSN from the pg_stop_backup and
> verify that they have the WAL file associated with that LSN.

There is a function for that.

> They also
> need to make sure that they have all of the WAL files between the
> starting LSN and the ending LSN.  Doing that requires understanding how
> the files are named to make sure there aren't any missing.

There is not a function for that, but there could be one.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 Andres> a) cast result of lfirst/lnext/whatnot.

Again, what we need here is something like

#define lfirst_node(_type_, l) (castNode(_type_, lfirst(l)))

etc.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-23 21:26:19 -0400, Tom Lane wrote:
>> Hmm, I see ... but that only works in the cases where the caller of
>> ExecBuildProjectionInfo supplied a source slot, and a lot of 'em
>> don't.

> Right, the old and new code comment on that:

>  * inputDesc can be NULL, but if it is not, we check to see whether simple
>  * Vars in the tlist match the descriptor.  It is important to provide
>  * inputDesc for relation-scan plan nodes, as a cross check that the relation
>  * hasn't been changed since the plan was made.  At higher levels of a plan,
>  * there is no need to recheck.

Ah, I'd forgotten the assumption that we only need to check this at scan
level.

>> I'd already put in the infrastructure to add ASSIGN_FOO_VAR_FIRST
>> step types.  I could take it back out, but I wonder if it wouldn't be
>> smarter to keep it and remove the restriction in ExecBuildProjectionInfo.
>> Or maybe we could have ExecBuildProjectionInfo emit either
>> ASSIGN_FOO_VAR_FIRST or ASSIGN_FOO_VAR depending on whether it can prove
>> the reference safe.

> I think it's probably ok to just leave the check in, and remove those
> comments, and simplify the isSimpleVar stuff to only check if
>  IsA(tle->expr, Var) && ((Var *) tle->expr)->varattno > 0)

Not sure.  It's a pretty fair amount of duplicative code, once you finish
dealing with all the ExecJustFoo functions in addition to the main code
paths.  At this point I'm inclined to take it back out and improve the
comments around ExecBuildProjectionInfo.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we create a new roadmap page for development?

2017-03-23 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> On Tue, Mar 21, 2017 at 2:36 AM, Tsunakawa, Takayuki
>  wrote:
> > Should I create a page for PostgreSQL 11 likewise?  Or, do you want a
> more stable page named "PostgreSQL Roadmap" instead of "PostgreSQL11
> Roadmap" and attach the target version to each feature as in "Feature X
> (V11)", so that we can represent more longer-term goals?
> 
> I think a page that's just called PostgreSQL Roadmap will get out of date
> pretty quickly.  Ultimately it'll end up like the TODO list, which is not
> really a list of things TO DO.  The advantage of the version-specific pages
> is that old information kind of ages its way out of the system.

Maybe so.  I created a page for PostgreSQL 11 and add a link to our roadmap:

https://wiki.postgresql.org/wiki/PostgreSQL11_Roadmap

Please add your roadmaps when you can.


> > And, why don't we add a link to the above roadmap page in the "Development
> information" page?
> >
> > Development information
> > https://wiki.postgresql.org/wiki/Development_information
> 
> I'm sure nobody will object to you doing that.  It's a wiki, and intended
> to be edited.

Thanks, done. Also, I moved the existing the link to "Development projects" 
from " Past PgCon Developer Meeting Notes" to the new header "Roadmaps and 
Projects", because the development projects page doesn't appear to fit the 
PGCON notes.

Regards
Takayuki


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect and hash indexes

2017-03-23 Thread Ashutosh Sharma
On Fri, Mar 24, 2017 at 9:46 AM, Ashutosh Sharma  wrote:
> On Fri, Mar 24, 2017 at 9:21 AM, Amit Kapila  wrote:
>> On Thu, Mar 23, 2017 at 6:46 PM, Ashutosh Sharma  
>> wrote:

 Oh, okay, but my main objection was that we should not check hash page
 type (hasho_flag) without ensuring whether it is a hash page.  Can you
 try to adjust the above code so that this check can be moved after
 hasho_page_id check?
>>>
>>> Yes, I got your point. I have done that but then i had to remove the
>>> check for PageIsEmpty(). Anyways, I think PageIsEmpty() condition will
>>> only be true for one page in entire hash index table and can be
>>> ignored. If you wish, I could mention about it in the documentation.
>>>
>>
>> Yeah, I think it is worth adding a Note in docs, but we can do that
>> separately if required.
>>

> To avoid
> this, at the start of verify_hash_page function itself if we recognise
> page as UNUSED page we return immediately.
>
>>
>> 2.
>> + /* Check if it is an empty hash page. */
>> + if (PageIsEmpty(page))
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INDEX_CORRUPTED),
>> + errmsg("index table contains empty page")));
>>
>>
>> Do we want to give a separate message for EMPTY and NEW pages?  Isn't
>> it better that the same error message can be given for both of them as
>> from user perspective there is not much difference between both the
>> messages?
>
> I think we should show separate message because they are two different
> type of pages. In the sense like, one is initialised whereas other is
> completely zero.
>

 I understand your point, but not sure if it makes any difference to user.

>>>
>>> okay, I have now anyways removed the check for PageIsEmpty. Please
>>> refer to the attached '0002
>>> allow_pageinspect_handle_UNUSED_hash_pages.patch'
>>>
>>
>> +
>> if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
>> ereport(ERROR,
>>
>> spurious white space.
>>
>>>
>>> Also, I have attached
>>> '0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patch' that
>>> handles your comment mentioned in [1].
>>>
>>
>> In general, we have to initialize prevblock with max_bucket, but here
>> it is okay, because we anyway initialize it after this page is
>> allocated as overflow page.
>>
>> Your patches look good to me except for small white space change.
>
> Thanks for reviewing my patch. I have removed the extra white space.
> Attached are both the patches.

Sorry, I have mistakenly attached wrong patch. Here are the correct
set of patches.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 86eb711ea80b495c9d2f5228558682d0e95c41eb Mon Sep 17 00:00:00 2001
From: ashu 
Date: Thu, 23 Mar 2017 16:47:17 +0530
Subject: [PATCH] Mark freed overflow page as UNUSED pagetype v2

---
 src/backend/access/hash/hash_xlog.c |  9 +
 src/backend/access/hash/hashovfl.c  | 13 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d9ac42c..2ccaf46 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -697,11 +697,20 @@ hash_xlog_squeeze_page(XLogReaderState *record)
 	if (XLogReadBufferForRedo(record, 2, ) == BLK_NEEDS_REDO)
 	{
 		Page		ovflpage;
+		HashPageOpaque ovflopaque;
 
 		ovflpage = BufferGetPage(ovflbuf);
 
 		_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
 
+		ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+		ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+		ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+		ovflopaque->hasho_bucket = -1;
+		ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+		ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 		PageSetLSN(ovflpage, lsn);
 		MarkBufferDirty(ovflbuf);
 	}
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index a3cae21..d3eaee0 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -591,9 +591,20 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	/*
 	 * Initialize the freed overflow page.  Just zeroing the page won't work,
 	 * because WAL replay routines expect pages to be initialized. See
-	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended.
+	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended. Also, mark
+	 * the page as UNUSED type and retain it's page id. This allows the tools
+	 * like pageinspect to consider it as a hash page.
 	 */
 	_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
+
+	ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+	ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+	ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+	ovflopaque->hasho_bucket = -1;
+	ovflopaque->hasho_flag = 

Re: [HACKERS] delta relations in AFTER triggers

2017-03-23 Thread Thomas Munro
On Fri, Mar 24, 2017 at 1:14 PM, Thomas Munro
 wrote:
> If that's fixed and the permissions question can be waved away by
> saying it's the same as the per-row situation, my only other comment
> would be a bikeshed issue: Enr isn't a great name for a struct.

One more thought: should this be allowed?

postgres=# create table mytab (i int) partition by list (i);
CREATE TABLE
postgres=# create table mytab1 partition of mytab for values in (42);
CREATE TABLE
postgres=# create function my_trigger_function() returns trigger as $$
begin end; $$ language plpgsql;
CREATE FUNCTION
postgres=# create trigger my_trigger after update on mytab referencing
old table as my_old for each statement execute procedure
my_trigger_function();
CREATE TRIGGER

I haven't really looked into the interaction of triggers and the new
partition feature very much but it looks like the intention is that
you shouldn't be allowed to do something that would need access to the
actual row data from a trigger that is attached to the top-level
partition:

postgres=# create trigger my_trigger before update on mytab for each
row execute procedure my_trigger_function();
ERROR:  "mytab" is a partitioned table
DETAIL:  Partitioned tables cannot have ROW triggers.

By the same logic, I guess that we shouldn't allow transition table
triggers to be attached to the top level partitioned table, because it
can't really work.

You can attach ROW triggers to the concrete tables that hold real
data, which makes sense because they actually have data to capture and
feed to the trigger function:

postgres=# create trigger my_trigger before update on mytab1 for each
row execute procedure my_trigger_function();
CREATE TRIGGER

Perhaps the moral equivalent should be possible for statement triggers
with transition tables, and that already works with your patch as far
as I know.  So I think your patch probably just needs to reject them
on partitioned tables.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Andres Freund
On 2017-03-23 20:36:32 -0400, Tom Lane wrote:
> I found a rather nasty bug :-( ... the comment in EEOP_INNER_VAR_FIRST about
> 
> +/*
> + * Can't assert tts_nvalid, as wholerow var evaluation or such
> + * could have materialized the slot - but the contents are still
> + * valid :/
> + */
> +Assert(op->d.var.attnum >= 0);
> 
> is actually averting its eyes from a potentially serious problem.  If we
> go through ExecEvalWholeRowVar, which calls ExecFetchSlotTupleDatum, and
> ExecFetchSlotTuple decides it has to materialize the tuple, then
> ExecMaterializeSlot does this:
> 
> /*
>  * Drop the pin on the referenced buffer, if there is one.
>  */
> if (BufferIsValid(slot->tts_buffer))
> ReleaseBuffer(slot->tts_buffer);
> 
> slot->tts_buffer = InvalidBuffer;
> 
> /*
>  * Mark extracted state invalid.  This is important because the slot is
>  * not supposed to depend any more on the previous external data; we
>  * mustn't leave any dangling pass-by-reference datums in tts_values.
>  * However, we have not actually invalidated any such datums, if there
>  * happen to be any previously fetched from the slot.  (Note in particular
>  * that we have not pfree'd tts_mintuple, if there is one.)
>  */
> slot->tts_nvalid = 0;
> 
> 
> The problem here is that once we drop the buffer pin, any pointers we may
> have into on-disk data are dangling pointers --- we're at risk of some
> other backend taking away that shared buffer.  (So I'm afraid that the
> commentary there is overly optimistic.)  So even though those pointers
> may still be there beyond tts_nvalid, subsequent references to them are
> very dangerous.

This applies to the code in master as well, no?  An ExecEvalScalarVar()
followed by an ExecEvalWholeRowVar() would have precisely the same
effect?  Do we need to do anything about this in the back-branches,
given how unlikely this is going to be in practice?


> So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't
> clobber the state of the slot.

That seems like a good plan.


> Also, while trying to test the above scenario, I realized that the patch
> as submitted was being way too cavalier about where it was applying 
> CheckVarSlotCompatibility and so on.  The ASSIGN_FOO_VAR steps, for
> instance, had no protection at all.

I don't think that's true - the assign checks had copied the code from
the old ExecBuildProjectionInfo, setting isSimpleVar iff
(!attr->attisdropped && variable->vartype == attr->atttypid) - we can
check that for projections in contrast to normal expressions because we
already know the slot.  The relevant comment for that, from before the
patch, is:
 * inputDesc.  (Note: if there is a type mismatch then ExecEvalScalarVar
 * will probably throw an error at runtime, but we leave that to it.)
 */


> I hope to have a fully reviewed patch to pass back to you tomorrow.
> Or Saturday at the latest.

Cool.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU integration

2017-03-23 Thread Craig Ringer
On 24 March 2017 at 04:54, Peter Eisentraut
 wrote:

> Fixed.

Congratulations on getting this done. It's great work, and it'll make
a whole class of potential bugs and platform portability warts go away
if widely adopted.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No more libedit?! - openssl plans to switch to APL2

2017-03-23 Thread Andreas Karlsson

On 08/01/2015 05:14 PM, Andres Freund wrote:

According to https://www.openssl.org/blog/blog/2015/08/01/cla/ openssl
is planning to relicense to the apache license 2.0. While APL2 is not
compatible with GLP2 it *is* compatible with GPL3.


Great! This means that the Debian packages will eventually be able to 
drop their LD_PRELOAD hack, which never worked perfectly due to 
compiling against libedit or libreadline header resulting in different 
binaries.


Andreas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect and hash indexes

2017-03-23 Thread Amit Kapila
On Thu, Mar 23, 2017 at 6:46 PM, Ashutosh Sharma  wrote:
>>
>> Oh, okay, but my main objection was that we should not check hash page
>> type (hasho_flag) without ensuring whether it is a hash page.  Can you
>> try to adjust the above code so that this check can be moved after
>> hasho_page_id check?
>
> Yes, I got your point. I have done that but then i had to remove the
> check for PageIsEmpty(). Anyways, I think PageIsEmpty() condition will
> only be true for one page in entire hash index table and can be
> ignored. If you wish, I could mention about it in the documentation.
>

Yeah, I think it is worth adding a Note in docs, but we can do that
separately if required.

>>
>>> To avoid
>>> this, at the start of verify_hash_page function itself if we recognise
>>> page as UNUSED page we return immediately.
>>>

 2.
 + /* Check if it is an empty hash page. */
 + if (PageIsEmpty(page))
 + ereport(ERROR,
 + (errcode(ERRCODE_INDEX_CORRUPTED),
 + errmsg("index table contains empty page")));


 Do we want to give a separate message for EMPTY and NEW pages?  Isn't
 it better that the same error message can be given for both of them as
 from user perspective there is not much difference between both the
 messages?
>>>
>>> I think we should show separate message because they are two different
>>> type of pages. In the sense like, one is initialised whereas other is
>>> completely zero.
>>>
>>
>> I understand your point, but not sure if it makes any difference to user.
>>
>
> okay, I have now anyways removed the check for PageIsEmpty. Please
> refer to the attached '0002
> allow_pageinspect_handle_UNUSED_hash_pages.patch'
>

+
if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
ereport(ERROR,

spurious white space.

>
> Also, I have attached
> '0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patch' that
> handles your comment mentioned in [1].
>

In general, we have to initialize prevblock with max_bucket, but here
it is okay, because we anyway initialize it after this page is
allocated as overflow page.

Your patches look good to me except for small white space change.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Privilege checks on array coercions

2017-03-23 Thread Jim Nasby

On 3/23/17 12:37 PM, Andres Freund wrote:

On 2017-03-23 15:26:51 -0400, Tom Lane wrote:

There is a test in privileges.sql (currently lines 589-625 in
privileges.out) that seems to be dependent on the fact that the
ArrayCoerceExpr logic doesn't check for EXECUTE privilege on the
per-element type coercion function if it's dealing with a NULL input
array.

...

Does anyone want to defend this
privileges test case as testing for some behavior that users expect?


Not me - that seems quite sensible to change.


I'd even argue that existing behavior is a bug.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backend crash on non-exclusive backup cancel

2017-03-23 Thread Michael Paquier
On Fri, Mar 24, 2017 at 1:45 AM, Teodor Sigaev  wrote:
> I believe patch looks good and it's ready to commit.

Thanks for the review!

> As I understand, it fixes bug introduced by
> commit 7117685461af50f50c03f43e6a622284c8d54694
> Date:   Tue Apr 5 20:03:49 2016 +0200
>
> Implement backup API functions for non-exclusive backups

Indeed.

> And, suppose, it should be backpatched to 9.6?

Yes, that's where non-exclusive backups have been introduced.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Tom Lane
I found a rather nasty bug :-( ... the comment in EEOP_INNER_VAR_FIRST about

+/*
+ * Can't assert tts_nvalid, as wholerow var evaluation or such
+ * could have materialized the slot - but the contents are still
+ * valid :/
+ */
+Assert(op->d.var.attnum >= 0);

is actually averting its eyes from a potentially serious problem.  If we
go through ExecEvalWholeRowVar, which calls ExecFetchSlotTupleDatum, and
ExecFetchSlotTuple decides it has to materialize the tuple, then
ExecMaterializeSlot does this:

/*
 * Drop the pin on the referenced buffer, if there is one.
 */
if (BufferIsValid(slot->tts_buffer))
ReleaseBuffer(slot->tts_buffer);

slot->tts_buffer = InvalidBuffer;

/*
 * Mark extracted state invalid.  This is important because the slot is
 * not supposed to depend any more on the previous external data; we
 * mustn't leave any dangling pass-by-reference datums in tts_values.
 * However, we have not actually invalidated any such datums, if there
 * happen to be any previously fetched from the slot.  (Note in particular
 * that we have not pfree'd tts_mintuple, if there is one.)
 */
slot->tts_nvalid = 0;


The problem here is that once we drop the buffer pin, any pointers we may
have into on-disk data are dangling pointers --- we're at risk of some
other backend taking away that shared buffer.  (So I'm afraid that the
commentary there is overly optimistic.)  So even though those pointers
may still be there beyond tts_nvalid, subsequent references to them are
very dangerous.

I think that it's pretty hard to hit this in practice, maybe impossible,
because the normal case for an "on-disk" tuple is that
TTS_HAS_PHYSICAL_TUPLE is true, so that ExecFetchSlotTuple won't change
the state of the slot.  If we have a virtual tuple that has to be
materialized, then by that very token it won't have a buffer pin to drop.
But I find this fragile as heck, and the aforesaid patch comment surely
isn't adequately documenting the safety considerations.  Also, if there
ever were a live bug here, reproducing it would be damn hard because of
the low probability that a just-unpinned buffer would get replaced any
time soon.  (Hm, I wonder whether the buffer cache needs something
analogous to the syscaches' CLOBBER_CACHE_ALWAYS behavior...)

Besides which, I really really don't like the lack of an "attnum <
tts_nvalid" assertion there; that's just obviously failing to check for
very simple bugs, such as getting the FETCHSOME steps wrong.

So I think that we have got to fix ExecEvalWholeRowVar so that it doesn't
clobber the state of the slot.  Right at the moment, the only way to do
that seems to be to do this instead of ExecFetchSlotTupleDatum:

tuple = ExecCopySlotTuple(slot);
dtuple = (HeapTupleHeader)
DatumGetPointer(heap_copy_tuple_as_datum(tuple,
 slot->tts_tupleDescriptor));
heap_freetuple(tuple);

That's kind of annoying because of the double copying involved, but
tuptoaster.c doesn't expose any functionality for this except
heap_copy_tuple_as_datum().  I figure we can improve it later --- it looks
like we can refactor heap_copy_tuple_as_datum to expose a function that
reads from Datum/isnull arrays, and then call it on the slot's
tts_values/tts_isnull arrays.  Seems like it might be a good idea to
think a bit harder about the TupleTableSlot APIs, too, and reduce the
number of cases where execTuples exposes destructive changes to the
state of a slot.

Also, while trying to test the above scenario, I realized that the patch
as submitted was being way too cavalier about where it was applying 
CheckVarSlotCompatibility and so on.  The ASSIGN_FOO_VAR steps, for
instance, had no protection at all.  Think I have that all fixed up
though.

I hope to have a fully reviewed patch to pass back to you tomorrow.
Or Saturday at the latest.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No more libedit?! - openssl plans to switch to APL2

2017-03-23 Thread Andres Freund
On 2015-08-01 17:14:10 +0200, Andres Freund wrote:
> According to https://www.openssl.org/blog/blog/2015/08/01/cla/ openssl
> is planning to relicense to the apache license 2.0. While APL2 is not
> compatible with GLP2 it *is* compatible with GPL3.

Just 5 minutes later, some progress on that front:

https://www.openssl.org/blog/blog/2017/03/20/license/

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-23 Thread Jeff Janes
On Thu, Mar 23, 2017 at 1:45 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 3/22/17 17:33, David Steele wrote:
>
> > and I doubt that most tool writers would be quick to
> > add support for a feature that very few people (if any) use.
>
> I'm not one of those tool writers, although I have written my share of
> DBA scripts over the years, but I wonder why those tools would really
> care.  They are handed files with predetermined names to archive, and
> for restore files with predetermined names are requested back from them.
>  What else do they need?  If something is missing that requires them to
> parse file names, then maybe that should be added.
>


I have a pg_restore which predicts the file 5 files ahead of the one it was
asked for, and initiates a pre-fetch and decompression of it. Then it
delivers the file it was asked for, either by pulling it out of the
pre-staging area set up by the N-5th invocation, or by going directly to
the archive to get it.  This speeds up play-back dramatically when the
files are stored compressed and non-local.

That is why I need to know how the files are numbered.  I don't think that
that makes much of a difference, though.  Any change is going to break
that, no matter which change.  Then I'll fix it.

If we are going to break it, I'd prefer to just do away with the 'segment'
thing altogether.  You have timelines, and you have files.  That's it.

Cheers,

Jeff


Re: [HACKERS] free space map and visibility map

2017-03-23 Thread Kyotaro HORIGUCHI
At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada  
wrote in 
> On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas  wrote:
> > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes  wrote:
> >> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which then
> >> can't leave the block as all visible or all frozen).  I think the issue is
> >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this correctly,
> >> that neither of those ever update the FSM, regardless of FPI?
> >
> > Yes, updates to the FSM are never logged.  Forcing replay of
> > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
> >
> 
> I think I was missing something. I imaged your situation is that FPI
> is replayed during crash recovery after the crashed server vacuums the
> page and marked it as all-frozen. But this situation is also resolved
> by that solution.

# HEAP2_CLEAN is issued in lazy_vacuum_page

It will work but I'm not sure it is right direction for
HEAP2_FREEZE_PAGE to touch FSM.

As Masahiko said, the situation must be created by HEAP2_VISIBLE
without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
think only the latter can happen. The comment in heap_xlog_clean
below is right generally but if a page filled with tuples becomes
almost empty and freezable by this cleanup, a problematic
situation like this occurs.

> /*
>  * Update the FSM as well.
>  *
>  * XXX: Don't do this if the page was restored from full page image. We
>  * don't bother to update the FSM in that case, it doesn't need to be
>  * totally accurate anyway.
>  */
> if (action == BLK_NEEDS_REDO)
>   XLogRecordPageWithFreeSpace(rnode, blkno, freespace);

HEAP_INSERT/HEAP2_MULTI_INSERT/UPDATE does the similar. All of
these reduces freespace but HEAP2_CLEAN increases. HEAP2_CLEAN
occurs infrequently than the three. So I suppose HEAP2_CLEAN may
always update FSM.

Even if the page is not frozen, the similar situation is made
with just ALL_VISIBLE. Without any updates on the page, freespace
information for the page won't be corrected until the next
freezing(or 'aggressive') vacuum occurs.

>From this point of view, HEAP2_FREEZE_PAGE is not responsible for
updating FSM. But if we see that always updating FSM on
HEAP2_CLEAN is too much, HEAP2_FREEZE_PAGE would be the next way
to go.

(I don't understand the reason for skipping updating FSM only for
 FPI. This seems introduced by f8f42279)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [pgsql-www] [HACKERS] Small issue in online devel documentation build

2017-03-23 Thread Peter Eisentraut
I think the fix belongs into the web site CSS, so there is nothing to
commit into PostgreSQL here.  I will close the commit fest entry, but I
have added a section to the open items list so we keep track of it.
(https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items#Documentation_tool_chain)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

2017-03-23 Thread Kuntal Ghosh
Hello,
In Windows, if one needs to take a dump in plain text format (this is
the default option, or can be specified using -Fp) with some level of
compression (-Z[0-9]), an output file has to
be specified. Otherwise, if the output is redirected to stdout, it'll
create a corrupted dump (cmd is set to ASCII mode, so it'll put
carriage returns in the file).

I'm referring the following part of pg_dump code:

/*
 * On Windows, we need to use binary mode to read/write non-text archive
 * formats.  Force stdin/stdout into binary mode if that is what we are
 * using.
 */
#ifdef WIN32
if (fmt != archNull &&
(AH->fSpec == NULL || strcmp(AH->fSpec, "") == 0))
{
if (mode == archModeWrite)
setmode(fileno(stdout), O_BINARY);
else
setmode(fileno(stdin), O_BINARY);
}
#endif

For plain-text format, fmt is set to archNull. In that case, the
binary mode will not be forced(I think). To fix this, I've attached a
patch which adds one extra check in the 'if condition' to check the
compression level. PFA.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Fix-pg_dump-for-Windows-cmd.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] identity columns

2017-03-23 Thread Peter Eisentraut
On 3/23/17 06:09, Vitaly Burovoy wrote:
> I think we'll end up with "DROP IDENTITY IF EXISTS" to avoid raising
> an exception and "ADD OR SET" if your grammar remains.

That sounds reasonable to me.

> Right. From that PoV IDENTITY also changes a default value: "SET (ADD
> ... AS?) IDENTITY" works as setting a default to "nextval(...)"
> whereas "DROP IDENTITY" works as setting it back to NULL.

But dropping and re-adding an identity destroys state, so it's not quite
the same.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] create_unique_path and GEQO

2017-03-23 Thread Ashutosh Bapat
On Wed, Mar 22, 2017 at 3:43 PM, Ashutosh Bapat
 wrote:
> Hi,
> In create_unique_path() there's comment
> /*
>  * We must ensure path struct and subsidiary data are allocated in main
>  * planning context; otherwise GEQO memory management causes trouble.
>  */
> oldcontext = MemoryContextSwitchTo(root->planner_cxt);
>
> pathnode = makeNode(UniquePath);
>
> This means that when GEQO resets the memory context, the RelOptInfo
> for which this path is created and may be set to cheapest_unique_path
> goes away, the unique path lingers on in the planner context.
> Shouldn't we instead allocate the path in the same context as the
> RelOptInfo similar to mark_dummy_rel()?
>

tried this change as attached patch. I ran make installcheck with
geqo_threshold = 2. Only join.sql failed several plans changed, which
is expected. There was one difference related to changed output order
but that's because of the changed plan.

Adding this to the commitfest.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pg_upath_geqo.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-23 Thread Amit Kapila
On Thu, Mar 23, 2017 at 4:26 PM, Ashutosh Sharma  wrote:
> On Thu, Mar 23, 2017 at 9:17 AM, Amit Kapila  wrote:
>>
>> On Thu, Mar 23, 2017 at 8:43 AM, Amit Kapila  wrote:
>> >
>> > I think this will work, but not sure if there is a merit to deviate
>> > from what btree does to handle this case.   One thing I find slightly
>> > awkward in hash_xlog_vacuum_get_latestRemovedXid() is that you are
>> > using a number of tuples registered as part of fixed data
>> > (xl_hash_vacuum_one_page) to traverse the data registered as buf data.
>> > I think it will be better if we register offsets also in fixed part of
>> > data as we are doing btree case.
>
> Agreed. I have made the changes accordingly. Please check attached v2 patch.
>

Changes look good to me.   I think you can modify the comments in
structure xl_hash_vacuum_one_page to mention "TARGET OFFSET NUMBERS
FOLLOW AT THE END"

>>
>> >
>> >
>>
>> Also another small point in this regard, do we need two separate
>> variables to track number of deleted items in below code?  I think one
>> variable is sufficient.
>>
>> _hash_vacuum_one_page()
>> {
>> ..
>> deletable[ndeletable++] = offnum;
>> tuples_removed += 1;--
> With Regards,
> Ashutosh Sharma
> EnterpriseDB:http://www.enterprisedb.com
>> ..
>> }
>>
>
> Yes, I think 'ndeletable' alone should be fine.
>

I think it would have been probably okay to use *int* for ntuples as
that matches with what you are actually assigning in the function.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Andres Freund
Hi,

On 2017-03-23 13:14:59 -0400, Tom Lane wrote:
> Looking at some of the coding choices you made here, I see that you're
> making a hard assumption that called functions never scribble on their
> fcinfo->arg/argnull arrays.  I do not believe that we've ever had such
> an assumption before.

I think we did that before, e.g. ExecEvalScalarArrayOp(). Think there's
others too.


> Are we comfortable with that?  If so, we'd better document it.

I think it's ok, but we indeed should document it. I recall a note
somewhere... Can't find it anywhere however, might have misremembered a
note about pass-by-ref arguments.  fmgr/README? A note in
FunctionCallInfoData's definition?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel "return query" is no good

2017-03-23 Thread Alvaro Herrera
Robert Haas wrote:

> I guess the downside of back-patching this is that it could cause a
> plan change for somebody which ends up being worse.  On the whole,
> serial execution of queries intended to be run in parallel isn't
> likely to work out well, but it's always possible somebody has a cases
> where it happens to be winning, and this could break it.  So maybe I
> should do this only in master?  Thoughts?

I think that the chances of someone depending on a parallel plan running
serially by accident which is better than the non-parallel plan, are
pretty slim.

+1 for back-patching.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-23 Thread Robert Haas
On Thu, Mar 23, 2017 at 12:41 AM, Rafia Sabih
 wrote:
> Agree, done.

OK, committed execute-once-v3.patch after some further study.  See
also 
https://www.postgresql.org/message-id/CA%2BTgmoZ_ZuH%2BauEeeWnmtorPsgc_SmP%2BXWbDsJ%2BcWvWBSjNwDQ%40mail.gmail.com
which resulted from that study.

Regarding pl_parallel_opt_support_v3.patch, the change to
init_execution_state looks good.  Whether or not it's safe to use
parallel query doesn't depend on whether the function is marked
volatile, so the current code is just silly (and, arguably,
readonly_func is misnamed).  The changes to spi.c also seem fine; both
of those functions execute the plan to completion and don't allow
cursor operations, so we're good.

The changes to the plpgsql code don't look so good to me.  The change
to exec_stmt_return_query fixes the same bug that I mentioned in the
email linked above, but only half of it -- it corrects the RETURN
QUERY EXECUTE case but not the RETURN QUERY case.  And it's clearly a
separate change; that part is a bug fix, not an enhancement.  Some of
the other changes depend on whether we're in a trigger, which seems
irrelevant to whether we can use parallelism. Even if the outer query
is doing writes, we can still use parallelism for queries inside the
trigger function if warranted.  It's probably a rare case to have
queries inside a trigger that are expensive enough to justify such
handling but I don't see the value of putting in special-case logic to
prevent it.

I suspect that code fails to achieve its goals anyway.  At the top of
exec_eval_expr(), you call exec_prepare_plan() and unconditionally
pass CURSOR_OPT_PARALLEL_OK, so when that function returns, expr->plan
might now be a parallel plan.  If we reach the call to
exec_run_select() further down in that function, and if we happen to
pass false, it's not going to matter, because exec_run_select() is
going to find the plan already initialized.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-23 Thread Ashutosh Sharma
On Thu, Mar 23, 2017 at 8:29 PM, Jesper Pedersen
 wrote:
> Hi,
>
> On 03/22/2017 09:32 AM, Ashutosh Sharma wrote:
>>
>> Done. Please refer to the attached v2 version of patch.
>>
>
> Thanks.
>
 1) 0001-Rewrite-hash-index-scans-to-work-a-page-at-a-time.patch: this
 patch rewrites the hash index scan module to work in page-at-a-time
 mode. It basically introduces two new functions-- _hash_readpage() and
 _hash_saveitem(). The former is used to load all the qualifying tuples
 from a target bucket or overflow page into an items array. The latter
 one is used by _hash_readpage to save all the qualifying tuples found
 in a page into an items array. Apart from that, this patch bascially
 cleans _hash_first(), _hash_next and hashgettuple().

>
> 0001v2:
>
> In hashgettuple() you can remove the 'currItem' and 'offnum' from the 'else'
> part, and do the assignment inside
>
>   if (so->numKilled < MaxIndexTuplesPerPage)
>
> instead.
>

Done. Please have a look into the attached v3 patch.

>
> No new comments for 0002 and 0003.

okay. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 4e953c35da2274165b00d763500b83e0f3f9e2a9 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Thu, 23 Mar 2017 23:36:05 +0530
Subject: [PATCH] Rewrite hash index scans to work a page at a timev3

Patch by Ashutosh Sharma
---
 src/backend/access/hash/README   |   9 +-
 src/backend/access/hash/hash.c   | 121 +++--
 src/backend/access/hash/hashpage.c   |  14 +-
 src/backend/access/hash/hashsearch.c | 330 ++-
 src/backend/access/hash/hashutil.c   |  23 ++-
 src/include/access/hash.h|  44 +
 6 files changed, 385 insertions(+), 156 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 1541438..f0a7bdf 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -243,10 +243,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+release pin and content lock (but if it is primary bucket page retain
+it's pin till the end of scan)
+get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..8c28fbd 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,23 @@ bool
 hashgettuple(IndexScanDesc scan, ScanDirection dir)
 {
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
-	Relation	rel = scan->indexRelation;
-	Buffer		buf;
-	Page		page;
 	OffsetNumber offnum;
-	ItemPointer current;
 	bool		res;
+	HashScanPosItem	*currItem;
 
 	/* Hash indexes are always lossy since we store only the hash code */
 	scan->xs_recheck = true;
 
 	/*
-	 * We hold pin but not lock on current buffer while outside the hash AM.
-	 * Reacquire the read lock here.
-	 */
-	if (BufferIsValid(so->hashso_curbuf))
-		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
-
-	/*
 	 * If we've already initialized this scan, we can just advance it in the
 	 * appropriate direction.  If we haven't done so yet, we call a routine to
 	 * get the first item in the scan.
 	 */
-	current = &(so->hashso_curpos);
-	if (ItemPointerIsValid(current))
+	if (!HashScanPosIsValid(so->currPos))
+		res = _hash_first(scan, dir);
+	else
 	{
 		/*
-		 * An insertion into the current index page could have happened while
-		 * we didn't have read lock on it.  Re-find our position by looking
-		 * for the TID we previously returned.  (Because we hold a pin on the
-		 * primary bucket page, no deletions or splits could have occurred;
-		 * therefore we can expect that the TID still exists in the current
-		 * index page, at an offset >= where we were.)
-		 */
-		OffsetNumber maxoffnum;
-
-		buf = so->hashso_curbuf;
-		Assert(BufferIsValid(buf));
-		page = BufferGetPage(buf);
-
-		/*
-		 * We don't need test for old snapshot here as the current buffer is
-		 * pinned, so vacuum can't clean the page.
-		 */
-		maxoffnum = PageGetMaxOffsetNumber(page);
-		for (offnum = ItemPointerGetOffsetNumber(current);
-			 offnum <= maxoffnum;
-			 offnum = OffsetNumberNext(offnum))
-		{
-			IndexTuple	itup;
-
-			itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
-			if (ItemPointerEquals(&(so->hashso_heappos), &(itup->t_tid)))
-break;
-		}
-		if (offnum > maxoffnum)
-			elog(ERROR, "failed to re-find scan position within index \"%s\"",
- RelationGetRelationName(rel));
-		

Re: [HACKERS] parallel "return query" is no good

2017-03-23 Thread Joshua D. Drake

On 03/23/2017 10:03 AM, Robert Haas wrote:

On Thu, Mar 23, 2017 at 12:50 PM, Robert Haas  wrote:

Commit 7aea8e4f2daa4b39ca9d1309a0c4aadb0f7ed81b allowed a parallel
plan to be generated when for a RETURN QUERY or RETURN QUERY EXECUTE
statement in a PL/pgsql block.  As it turns out, the analysis that led
to this decision was totally wrong-headed, because the plan will
always be executed using SPI_cursor_fetch(portal, true, 50), which
will cause ExecutePlan() to get invoked with a count of 50, which will
cause it to run the parallel plan serially, without workers.
Therefore, passing CURSOR_OPT_PARALLEL_OK is a bad idea here; all it
can do is cause us to pick a parallel plan that's slow when executed
serially instead of the best serial plan.

The attached patch fixes it.  I plan to commit this and back-patch it
to 9.6, barring objections or better ideas.


I guess the downside of back-patching this is that it could cause a
plan change for somebody which ends up being worse.  On the whole,
serial execution of queries intended to be run in parallel isn't
likely to work out well, but it's always possible somebody has a cases
where it happens to be winning, and this could break it.  So maybe I
should do this only in master?  Thoughts?


I think the greater good of a fix applies here. +1 to 9.6.






--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.

2017-03-23 Thread Ashutosh Sharma
Hi,

On Tue, Feb 7, 2017 at 9:23 AM, Robert Haas  wrote:
> On Mon, Feb 6, 2017 at 10:40 PM, Amit Kapila  wrote:
>>> Maybe we should call them "unused pages".
>>
>> +1.  If we consider some more names for that column then probably one
>> alternative could be "empty pages".
>
> Yeah, but I think "unused" might be better.  Because a page could be
> in use (as an overflow page or primary bucket page) and still be
> empty.
>

Based on the earlier discussions, I have prepared a patch that would
allow pgstathashindex() to show the number of unused pages in hash
index. Please find the attached patch for the same. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From e3b59fa85f16d6d15be5360e85b7faf63e8683a9 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Thu, 23 Mar 2017 23:02:26 +0530
Subject: [PATCH] Allow pgstathashindex to show unused pages v1

---
 contrib/pgstattuple/expected/pgstattuple.out  | 12 ++--
 contrib/pgstattuple/pgstatindex.c | 19 ---
 contrib/pgstattuple/pgstattuple--1.4--1.5.sql |  1 +
 doc/src/sgml/pgstattuple.sgml | 17 -
 4 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/contrib/pgstattuple/expected/pgstattuple.out b/contrib/pgstattuple/expected/pgstattuple.out
index 2c3515b..1f1ff46 100644
--- a/contrib/pgstattuple/expected/pgstattuple.out
+++ b/contrib/pgstattuple/expected/pgstattuple.out
@@ -132,9 +132,9 @@ select * from pgstatginindex('test_ginidx');
 
 create index test_hashidx on test using hash (b);
 select * from pgstathashindex('test_hashidx');
- version | bucket_pages | overflow_pages | bitmap_pages | zero_pages | live_items | dead_items | free_percent 
--+--++--++++--
-   2 |4 |  0 |1 |  0 |  0 |  0 |  100
+ version | bucket_pages | overflow_pages | bitmap_pages | zero_pages | unused_pages | live_items | dead_items | free_percent 
+-+--++--++--+++--
+   2 |4 |  0 |1 |  0 |0 |  0 |  0 |  100
 (1 row)
 
 -- these should error with the wrong type
@@ -233,9 +233,9 @@ select pgstatindex('test_partition_idx');
 (1 row)
 
 select pgstathashindex('test_partition_hash_idx');
-   pgstathashindex   
--
- (2,8,0,1,0,0,0,100)
+pgstathashindex
+---
+ (2,8,0,1,0,0,0,0,100)
 (1 row)
 
 drop table test_partitioned;
diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index d448e9e..6fc41d6 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -120,6 +120,7 @@ typedef struct HashIndexStat
 	BlockNumber overflow_pages;
 	BlockNumber bitmap_pages;
 	BlockNumber zero_pages;
+	BlockNumber unused_pages;
 
 	int64	live_items;
 	int64	dead_items;
@@ -588,8 +589,8 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	BufferAccessStrategy bstrategy;
 	HeapTuple	tuple;
 	TupleDesc	tupleDesc;
-	Datum		values[8];
-	bool		nulls[8];
+	Datum		values[9];
+	bool		nulls[9];
 	Buffer		metabuf;
 	HashMetaPage	metap;
 	float8		free_percent;
@@ -667,6 +668,8 @@ pgstathashindex(PG_FUNCTION_ARGS)
 			}
 			else if (opaque->hasho_flag & LH_BITMAP_PAGE)
 stats.bitmap_pages++;
+			else if (PageIsEmpty(page))
+stats.unused_pages++;
 			else
 ereport(ERROR,
 		(errcode(ERRCODE_INDEX_CORRUPTED),
@@ -680,8 +683,9 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	/* Done accessing the index */
 	index_close(rel, AccessShareLock);
 
-	/* Count zero pages as free space. */
-	stats.free_space += stats.zero_pages * stats.space_per_page;
+	/* Count zero and unused pages as free space. */
+	stats.free_space += (stats.zero_pages + stats.unused_pages) *
+		 stats.space_per_page;
 
 	/*
 	 * Total space available for tuples excludes the metapage and the bitmap
@@ -711,9 +715,10 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	values[2] = Int64GetDatum((int64) stats.overflow_pages);
 	values[3] = Int64GetDatum((int64) stats.bitmap_pages);
 	values[4] = Int64GetDatum((int64) stats.zero_pages);
-	values[5] = Int64GetDatum(stats.live_items);
-	values[6] = Int64GetDatum(stats.dead_items);
-	values[7] = Float8GetDatum(free_percent);
+	values[5] = Int64GetDatum((int64) stats.unused_pages);
+	values[6] = Int64GetDatum(stats.live_items);
+	values[7] = Int64GetDatum(stats.dead_items);
+	values[8] = Float8GetDatum(free_percent);
 	tuple = heap_form_tuple(tupleDesc, values, nulls);
 
 	PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
diff --git a/contrib/pgstattuple/pgstattuple--1.4--1.5.sql b/contrib/pgstattuple/pgstattuple--1.4--1.5.sql
index 84e112e..eda719a 100644
--- a/contrib/pgstattuple/pgstattuple--1.4--1.5.sql
+++ 

Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> You define DiscreteKnapsack to take integer weights and double
 Mark> values, and perform the usual Dynamic Programming algorithm to
 Mark> solve.  But the only place you call this, you pass in NULL for
 Mark> the values, indicating that all the values are equal.  I'm
 Mark> confused why in this degenerate case you bother with the DP at
 Mark> all.  Can't you just load the knapsack from lightest to heaviest
 Mark> after sorting, reducing the runtime complexity to O(nlogn)?  It's
 Mark> been a long day.  Sorry if I am missing something obvious.

That's actually a very good question.

(Though in passing I should point out that the runtime cost is going to
be negligible in all practical cases. Even an 8-way cube (256 grouping
sets) has only 70 rollups, and a 4-way cube has only 6; the limit of
4096 grouping sets was only chosen because it was a nice round number
that was larger than comparable limits in other database products. Other
stuff in the grouping sets code has worse time bounds; the
bipartite-match code used to minimize the number of rollups is
potentially O(n^2.5) in the number of grouping sets.)

Thinking about it, it's not at all obvious what we should be optimizing
for here in the absence of accurate sort costs. Right now what matters
is saving as many sort steps as possible, since that's a saving likely
to be many orders of magnitude greater than the differences between two
sorts. The one heuristic that might be useful is to prefer the smaller
estimated size if other factors are equal, just on memory usage grounds,
but even that seems a bit tenuous.

I'm inclined to leave things as they are in the current patch, and maybe
revisit it during beta if we get any relevant feedback from people
actually using grouping sets?

 Mark> I'm guessing you intend to extend the code at some later date to
 Mark> have different values for items.  Is that right?

Well, as I mentioned in a reply to Andres, we probably should eventually
optimize for greatest reduction in cost, and the code as it stands would
allow that to be added easily, but that would require having more
accurate cost info than we have now. cost_sort doesn't take into
consideration the number or types of sort columns or the costs of their
comparison functions, so all sorts of the same input data are costed
equally.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-03-23 Thread Robert Haas
On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer  wrote:
> Changes made per discussion.

Committed 0001.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Mithun Cy
Hi Pavan,
On Thu, Mar 23, 2017 at 12:19 AM, Pavan Deolasee
 wrote:
> Ok, no problem. I did some tests on AWS i2.xlarge instance (4 vCPU, 30GB
> RAM, attached SSD) and results are shown below. But I think it is important
> to get independent validation from your side too, just to ensure I am not
> making any mistake in measurement. I've attached naively put together
> scripts which I used to run the benchmark. If you find them useful, please
> adjust the paths and run on your machine.

I did a similar test appears. Your v19 looks fine to me, it does not
cause any regression, On the other hand, I also ran tests reducing
table fillfactor to 80 there I can see a small regression 2-3% in
average when updating col2 and on updating col9 again I do not see any
regression.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


WARM_test_02.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-23 Thread Jesper Pedersen

Hi,

On 03/23/2017 02:11 PM, Ashutosh Sharma wrote:

On Thu, Mar 23, 2017 at 8:29 PM, Jesper Pedersen
 wrote:

0001v2:

In hashgettuple() you can remove the 'currItem' and 'offnum' from the 'else'
part, and do the assignment inside

  if (so->numKilled < MaxIndexTuplesPerPage)

instead.



Done. Please have a look into the attached v3 patch.



No new comments for 0002 and 0003.


okay. Thanks.



I'll keep the entry in 'Needs Review' if Alexander, or others, want to 
add their feedback.


(Best to post the entire patch series each time)

Best regards,
 Jesper



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] createlang/droplang deprecated

2017-03-23 Thread Peter Eisentraut
On 3/23/17 06:41, Daniel Gustafsson wrote:
>> On 20 Mar 2017, at 01:37, Peter Eisentraut 
>>  wrote:
>>
>> On 3/18/17 09:00, Peter Eisentraut wrote:
>>> I just noticed that createlang and droplang have been listed as
>>> deprecated since PG 9.1.
>>>
>>> Do we dare remove them?
>>
>> Patch
> 
> LGTM, +1

Committed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 4:08 PM, Pavan Deolasee 
wrote:

>
>
> On Thu, Mar 23, 2017 at 3:02 PM, Amit Kapila 
> wrote:
>
>>
>>
>> That sounds like you are dodging the actual problem.  I mean you can
>> put that same PageIsFull() check in master code as well and then you
>> will most probably again see the same regression.
>
>
> Well I don't see it that way. There was a specific concern about a
> specific workload that WARM might regress. I think this change addresses
> that. Sure if you pick that one piece, put it in master first and then
> compare against rest of the WARM code, you will see a regression.
>

BTW the PageIsFull() check may not help as much in master as it does with
WARM. In master we anyways bail out early after couple of column checks. In
master it may help to reduce the 10% drop that we see while updating last
index column, but if we compare master and WARM with the patch applied,
regression should be quite nominal.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


  1   2   >