Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-15 Thread vinayak


On 2017/03/16 14:46, Haribabu Kommi wrote:



On Thu, Mar 16, 2017 at 4:15 PM, vinayak wrote:


On 2017/03/16 10:34, Haribabu Kommi wrote:


Updated patch attached.

The patch looks good to me.

Thanks for the review.

How about renaming the view to "pg_stat_walwriter"?


With the name "walwriter" instead of "walwrites", the user may
think that this view displays walwriter process statistics,
whereas it actually shows the WAL write activity of the whole
instance. For this reason, I went with the name "walwrites".

Understood. Thanks.


The columns of view :
backend_writes -> backend_wal_writes
writes         -> background_wal_writes
write_blocks   -> wal_write_blocks
write_time     -> wal_write_time
sync_time      -> wal_sync_time


As the view name already contains WAL, I am not sure
whether it is required to include WAL in every column name.
I am fine with changing it if others share the opinion of
adding WAL to the column names.


Ok.

Regards,
Vinayak Pokale
NTT Open Source Software Center


Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-15 Thread Haribabu Kommi
On Thu, Mar 16, 2017 at 4:15 PM, vinayak 
wrote:
>
> On 2017/03/16 10:34, Haribabu Kommi wrote:
>
>
> Updated patch attached.
>
> The patch looks good to me.
>

Thanks for the review.

How about renaming the view to "pg_stat_walwriter"?
>

With the name "walwriter" instead of "walwrites", the user may
think that this view displays walwriter process statistics,
whereas it actually shows the WAL write activity of the whole
instance. For this reason, I went with the name "walwrites".


> The columns of view :
> backend_writes -> backend_wal_writes
> writes         -> background_wal_writes
> write_blocks   -> wal_write_blocks
> write_time     -> wal_write_time
> sync_time      -> wal_sync_time
>

As the view name already contains WAL, I am not sure
whether it is required to include WAL in every column name.
I am fine with changing it if others share the opinion of
adding WAL to the column names.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] identity columns

2017-03-15 Thread Vitaly Burovoy
On 3/15/17, Peter Eisentraut  wrote:
> Vitaly, will you be able to review this again?
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/

I apologize for a delay. Yes, I'm going to do it by Sunday.

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Microvacuum support for Hash Index

2017-03-15 Thread Amit Kapila
On Wed, Mar 15, 2017 at 9:23 PM, Ashutosh Sharma  wrote:
>
>>
>> Few other comments:
>> 1.
>> + if (ndeletable > 0)
>> + {
>> + /* No ereport(ERROR) until changes are logged */
>> + START_CRIT_SECTION();
>> +
>> + PageIndexMultiDelete(page, deletable, ndeletable);
>> +
>> + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
>> + pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
>>
>> You clearing this flag while logging the action, but same is not taken
>> care during replay. Any reasons?
>
> That's because we conditionally WAL-log this flag status, and when
> we do so, we take an FPI of the page.
>

Sure, but we are not clearing it conditionally.  I am not sure how,
after recovery, it will be cleared if it gets set during normal
operation.  Moreover, btree already clears a similar flag during
replay (refer btree_xlog_delete).
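
To make that concrete, matching btree's behaviour would presumably mean
something like the below in the replay routine for the delete record (a
rough sketch only; the page/lsn/buffer variable names are assumed from
the usual redo-routine conventions):

pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;	/* clear hint on replay */

PageSetLSN(page, lsn);
MarkBufferDirty(buffer);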


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-03-15 Thread Vinayak Pokale
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

I have tested the latest patch and it looks good to me,
so I marked it "Ready for committer".
Anyway, it would be great if anyone could also have a look at the patches and 
send comments.

The new status of this patch is: Ready for Committer



Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-15 Thread vinayak


On 2017/03/16 10:34, Haribabu Kommi wrote:


Updated patch attached.

The patch looks good to me.

How about renaming the view to "pg_stat_walwriter"?
The columns of view :
backend_writes -> backend_wal_writes
writes         -> background_wal_writes
write_blocks   -> wal_write_blocks
write_time     -> wal_write_time
sync_time      -> wal_sync_time

Regards,
Vinayak Pokale
NTT Open Source Software Center



[HACKERS] Split conditions on relations

2017-03-15 Thread Jim Nasby
I've got a customer that is running a pretty expensive function as part 
of a WHERE clause. With or without the function, the table the function 
references is the innermost of a series of nested loops. Without the 
function things are very fast, but adding the function increases the 
cost of the index scan on that table by a factor of ~80x. It also 
falsely skews the row estimate further down, causing a bad shift to 
materialization in another part of the query, but that's a different 
problem. Wrapping the majority of the query in an OFFSET 0 with the 
function call on the outside makes things fast again.
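
For anyone unfamiliar with that workaround, the rewritten query has
roughly this shape (a hypothetical sketch; the table, column and
function names are made up):

SELECT *
FROM (SELECT t.*
      FROM t
      JOIN u ON u.t_id = t.id
      OFFSET 0) sub               -- OFFSET 0 acts as an optimization fence
WHERE expensive_fn(sub.col);      -- applied after the join, not in the scan

The planner cannot flatten the subquery past the OFFSET clause, so the
expensive qual stays at the outer level instead of being pushed down
into the innermost index scan.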


It'd be nice if function execution could be delayed to a higher level of 
a query based on the cost.

--
Jim Nasby, Chief Data Architect, OpenSCG
http://OpenSCG.com




Re: [HACKERS] multivariate statistics (v25)

2017-03-15 Thread Alvaro Herrera
David Rowley wrote:

> + k = -1;
> + while ((k = bms_next_member(attnums, k)) >= 0)
> + {
> + bool attr_found = false;
> + for (i = 0; i < info->stakeys->dim1; i++)
> + {
> + if (info->stakeys->values[i] == k)
> + {
> + attr_found = true;
> + break;
> + }
> + }
> +
> + /* found attribute not covered by this ndistinct stats, skip */
> + if (!attr_found)
> + {
> + matches = false;
> + break;
> + }
> + }
> 
> Would it be better just to stuff info->stakeys->values into a bitmapset and
> check its a subset of attnums? It would mean allocating memory in the loop,
> so maybe you think otherwise, but in that case maybe StatisticExtInfo
> should store the bitmapset?

Yeah, I think StatisticExtInfo should have a bitmapset, not an
int2vector.
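
A minimal sketch of that check, with the Bitmapset built on the fly for
illustration (storing it in StatisticExtInfo instead would let us build
it once at plan time):

Bitmapset  *stakeys = NULL;
int			i;

/* convert the int2vector into a Bitmapset */
for (i = 0; i < info->stakeys->dim1; i++)
	stakeys = bms_add_member(stakeys, info->stakeys->values[i]);

/* all clause attnums must be covered by this ndistinct stats */
if (!bms_is_subset(attnums, stakeys))
	matches = false;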

> + appendPQExpBuffer(&buf, "(dependencies)");
> 
> I think it's better practice to use appendPQExpBufferStr() when there's no
> formatting. It'll perform marginally better, which might not be important
> here, but it sets a better example for people to follow when performance is
> more critical.

FWIW this should have said "(ndistinct)" anyway :-)

> +   change the definition of a extended statistics
> 
> "a" should be "an", Also is statistics plural here. It's commonly mixed up
> in the patch. I think it needs standardised. I personally think if you're
> speaking of a single pg_statatic_ext row, then it should be singular. Yet,
> I'm aware you're using plural for the CREATE STATISTICS command, to me that
> feels a bit like: CREATE TABLES mytable ();  am I somehow thinking wrongly
> somehow here?

This was discussed upthread as I recall.  This is what Merriam-Webster says on
the topic:

statistic
1   :  a single term or datum in a collection of statistics
2 a :  a quantity (as the mean of a sample) that is computed from a sample;
   specifically :  estimate 3b
  b :  a random variable that takes on the possible values of a statistic

statistics
1   :  a branch of mathematics dealing with the collection, analysis,
   interpretation, and presentation of masses of numerical data
2   :  a collection of quantitative data

Now, I think there's room to say that a single object created by the new CREATE
STATISTICS is really the latter, not the former.  I find it very weird
that a single one of these objects is named in the plural form, though, and
it looks odd all over the place.  I would rather use the term
"statistics object", and then we can continue using the singular.

> +   If a schema name is given (for example, CREATE STATISTICS
> +   myschema.mystat ...) then the statistics is created in the specified
> +   schema.  Otherwise it is created in the current schema.  The name of
> 
> What's created in the current schema? I thought this was just for naming?

Well, "created in a schema" means that the object is named after that
schema.  So both are the same thing.  Is this unclear in some way?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Proposal: GetOldestXminExtend for ignoring arbitrary vacuum flags

2017-03-15 Thread Haribabu Kommi
On Fri, Feb 24, 2017 at 3:17 PM, Seki, Eiji 
wrote:

>
> Thank you for your comments.
>
> I reflected these comments to the attached patch. And I renamed IGNORE_XXX
> flags to PROCARRAY_XXX flags.


I checked the latest patch and I have some comments.

+static int
+ConvertProcarrayFlagToProcFlag(int flags)

I feel this function is not needed if we maintain the same flag values
for both PROC_XXX and PROCARRAY_XXX, with comments at both declaration
sites to make sure that anyone modifying the flag values updates them
in both places. The flag values rarely change, so this seems acceptable.
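
Something like the below would make the invariant visible at both
declaration sites (a sketch only; the PROCARRAY name is the one
suggested further down):

/* in proc.h (existing flag) */
#define PROC_IN_VACUUM		0x02	/* NB: keep value in sync with
									 * PROCARRAY_VACUUM_FLAG */

/* in procarray.h (new flag; must carry the same value) */
#define PROCARRAY_VACUUM_FLAG	0x02	/* NB: keep value in sync with
										 * PROC_IN_VACUUM */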

+ * Now, flags is used only for ignoring backends with some flags.

How about writing something like below.

The flags are used to ignore the backends in calculation when any of the
corresponding flags is set.

+#define PROCARRAY_A_FLAG_VACUUM

How about changing the flag name to PROCARRAY_VACUUM_FLAG
(similar changes to other declarations)


+/* Use these flags in GetOldestXmin as "flags" */

Add some comments here to explain how to use these flags.

+#define PROCARRAY_FLAGS_DEFAULT

same here also as PROCARRAY_DEFAULT_FLAGS
(similar changes to other declarations)


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-15 Thread Amit Kapila
On Wed, Mar 15, 2017 at 7:50 PM, Robert Haas  wrote:
> On Wed, Mar 15, 2017 at 9:18 AM, Stephen Frost  wrote:
>>> I think we have sufficient comments in code especially on top of
>>> function _hash_alloc_buckets().
>>
>> I don't see any comments regarding how we have to be sure to handle
>> an out-of-space case properly in the middle of a file because we've made
>> it sparse.
>>
>> I do see that mdwrite() should handle an out-of-disk-space case, though
>> that just makes me wonder what's different here compared to normal
>> relations: why is a sparse WAL'd hash index not an issue, when we
>> can't handle a normal relation being sparse?
>
> I agree.  I think that what hash indexes are doing here is
> inconsistent with what we do for btree indexes and the heap.  And I
> don't think it would be bad to fix that.  We could, of course, go the
> other way and do what Tom is suggesting - insist that everybody's got
> to be prepared for sparse files, but I would view that as something of
> a U-turn.  I think hash indexes are like this because nobody's really
> worried about hash indexes because they haven't been crash-safe or
> performant.  Now that we've made them crash safe and are on the way to
> making them performant, fixing other things is, as Tom already said, a
> lot more interesting.
>
> Now, that having been said, I'm not sure it's a good idea to tinker
> with the behavior for v10.  We could change the new-splitpoint code so
> that it loops over all the pages in the new splitpoint and zeroes them
> all, instead of just the last one.
>

Yeah, but I am slightly afraid that, apart from consuming too much
space, it will make the operation a lot slower when we have to perform
the allocation step.  Another possibility is to do that when we
actually use/allocate the bucket.  As of now, _hash_getnewbuf assumes
that the block is pre-existing and avoids the actual read/extend by
using RBM_ZERO_AND_LOCK mode.   I think we can change it so that it
forces a new allocation (if the block doesn't exist) on need.  I agree
this is somewhat more change than what you have proposed to circumvent
the sparse-file behavior, but it may not be a ton more work if we
decide to fix it, and it has the advantage of allocating the space on
disk only when actually needed.
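
A rough sketch of that idea (names assumed; fork/locking details and
error handling elided):

/* force physical allocation of the bucket's block on first use,
 * instead of relying on the sparse file left by _hash_alloc_buckets() */
while (RelationGetNumberOfBlocks(rel) <= blkno)
{
	Buffer		tmpbuf = ReadBufferExtended(rel, MAIN_FORKNUM, P_NEW,
											RBM_NORMAL, NULL);

	ReleaseBuffer(tmpbuf);		/* we only need the block to exist */
}
buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno,
						 RBM_ZERO_AND_LOCK, NULL);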

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Andreas Karlsson

Hi,

I got a test failure with this version of the patch in postgres_fdw. 
It looks to me like it was caused by a typo in the source code, which is 
fixed in the attached patch.


After applying this patch check-world passes.

Andreas
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 1763be5cf0..2ba5a2ea69 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -42,7 +42,6 @@ static void TidListCreate(TidScanState *tidstate);
 static int	itemptr_comparator(const void *a, const void *b);
 static TupleTableSlot *TidNext(TidScanState *node);
 
-
 /*
  * Compute the list of TIDs to be visited, by evaluating the expressions
  * for them.
@@ -101,6 +100,7 @@ TidListCreate(TidScanState *tidstate)
 			else if (IsCTIDVar(arg2))
 				exprstate = ExecInitExpr((Expr *) linitial(fex->args),
 										 &tidstate->ss.ps);
+			else
 				elog(ERROR, "could not identify CTID variable");
 
 			itemptr = (ItemPointer)



Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-15 Thread Andres Freund
On 2017-03-15 20:28:33 -0700, Andres Freund wrote:
> Hi,
> 
> I just unstuck a bunch of my buildfarm animals.  That triggered some
> spurious failures (on piculet, calliphoridae, mylodon), but also one
> that doesn't really look like that:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-03-16%2002%3A40%3A03
> 
> with the pertinent point being:
> 
> == stack trace: 
> pgsql.build/src/test/regress/tmp_check/data/core ==
> [New LWP 1894]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `postgres: bgworker: logical replication launcher'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x55e265bff5e3 in ?? ()
> #0  0x55e265bff5e3 in ?? ()
> #1  0x55d3ccabed0d in StartBackgroundWorker () at 
> /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/bgworker.c:792
> #2  0x55d3ccacf4fc in SubPostmasterMain (argc=3, argv=0x55d3cdbb71c0) at 
> /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:4878
> #3  0x55d3cca443ea in main (argc=3, argv=0x55d3cdbb71c0) at 
> /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/src/backend/main/main.c:205
> 
> it's possible that me killing things and upgrading caused this, but
> given this is a backend running EXEC_BACKEND, I'm a bit suspicious that
> it's more than that.  The machine is a bit backed up at the moment, so
> it'll probably be a while till it's at that animal/branch again,
> otherwise I'd not have mentioned this.

For some reason it ran again pretty soon. And I'm afraid it's indeed an
issue:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-03-16%2003%3A30%3A02

- Andres




[HACKERS] logical replication launcher crash on buildfarm

2017-03-15 Thread Andres Freund
Hi,

I just unstuck a bunch of my buildfarm animals.  That triggered some
spurious failures (on piculet, calliphoridae, mylodon), but also one
that doesn't really look like that:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-03-16%2002%3A40%3A03

with the pertinent point being:

== stack trace: 
pgsql.build/src/test/regress/tmp_check/data/core ==
[New LWP 1894]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: bgworker: logical replication launcher'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x55e265bff5e3 in ?? ()
#0  0x55e265bff5e3 in ?? ()
#1  0x55d3ccabed0d in StartBackgroundWorker () at 
/home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/bgworker.c:792
#2  0x55d3ccacf4fc in SubPostmasterMain (argc=3, argv=0x55d3cdbb71c0) at 
/home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:4878
#3  0x55d3cca443ea in main (argc=3, argv=0x55d3cdbb71c0) at 
/home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/src/backend/main/main.c:205

it's possible that me killing things and upgrading caused this, but
given this is a backend running EXEC_BACKEND, I'm a bit suspicious that
it's more than that.  The machine is a bit backed up at the moment, so
it'll probably be a while till it's at that animal/branch again,
otherwise I'd not have mentioned this.

Greetings,

Andres Freund




Re: [HACKERS][REVIEW] macaddr 64 bit (EUI-64) datatype support

2017-03-15 Thread Stephen Frost
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> I see a new warning due to, presumably, this:
> /home/andres/src/postgresql/src/backend/utils/adt/mac8.c: In function 
> ‘hex2_to_uchar’:
> /home/andres/src/postgresql/src/backend/utils/adt/mac8.c:71:23: warning: 
> comparison is always false due to limited range of data type [-Wtype-limits]
>   if (*ptr < 0 || *ptr > 127)

Pushed a fix for this (which also does a bit of other tidying too).

Thanks!

Stephen




Re: [HACKERS] Changing references of password encryption to hashing

2017-03-15 Thread Bruce Momjian
On Mon, Mar 13, 2017 at 04:48:21PM +0800, Craig Ringer wrote:
> On 12 March 2017 at 06:51, Joe Conway  wrote:
> 
> > My opinion is that the user visible aspects of this should be deprecated
> > and correct syntax provided. But perhaps that is overkill.
> 
> FWIW, in my experience, pretty much nobody understands the pretty
> tangled behaviour of "WITH [ENCRYPTED] PASSWORD", you have to
> understand the fact table of:
> 
> * ENCRYPTED, UNENCRYPTED or neither set
> * password_encryption GUC on or off
> * password begins / doesn't begin with fixed string 'md5'
> 
> to fully know what will happen.
> 
> Then of course, you have to understand how all this interacts with
> pg_hba.conf's 'password' and 'md5' options.
> 
> It's a right mess. Since our catalogs don't keep track of the hash
> separately to the password text and use prefixes instead, and since we
> need compatibility for dumps, it's hard to do a great deal about
> though.

With SCRAM coming in PG 10, is there anything we can do to clean this up
for PG 10?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Microvacuum support for Hash Index

2017-03-15 Thread Ashutosh Sharma
On Mar 16, 2017 7:49 AM, "Robert Haas"  wrote:

On Wed, Mar 15, 2017 at 4:31 PM, Robert Haas  wrote:
> On Wed, Mar 15, 2017 at 3:54 PM, Ashutosh Sharma 
wrote:
>> Changed as per suggestions. Attached v9 patch. Thanks.
>
> Wow, when do you sleep?   Will have a look.

Committed with a few corrections.


Thanks Robert for the commit. Thank you Amit and Jesper for reviewing this
patch.

With Regards,
Ashutosh Sharma


Re: [HACKERS] \h tab-completion

2017-03-15 Thread Andreas Karlsson

On 03/01/2017 02:47 PM, Peter Eisentraut wrote:

Instead of creating another copy of list_ALTER, let's use the
words_after_create list and write a version of
create_command_generator/drop_command_generator.


Good idea. Here is a patch with that.

Andreas

commit 7d691929f5814da87bb8a532e7dcfa2bd1dc9f15
Author: Andreas Karlsson 
Date:   Fri Feb 3 13:05:48 2017 +0100

Add completion for \help DROP|ALTER

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index e8458e939e..3df7636c5b 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -982,10 +982,11 @@ typedef struct
 
 #define THING_NO_CREATE		(1 << 0)	/* should not show up after CREATE */
 #define THING_NO_DROP		(1 << 1)	/* should not show up after DROP */
-#define THING_NO_SHOW		(THING_NO_CREATE | THING_NO_DROP)
+#define THING_NO_ALTER		(1 << 2)	/* should not show up after ALTER */
+#define THING_NO_SHOW		(THING_NO_CREATE | THING_NO_DROP | THING_NO_ALTER)
 
 static const pgsql_thing_t words_after_create[] = {
-	{"ACCESS METHOD", NULL, NULL},
+	{"ACCESS METHOD", NULL, NULL, THING_NO_ALTER},
 	{"AGGREGATE", NULL, _for_list_of_aggregates},
 	{"CAST", NULL, NULL},		/* Casts have complex structures for names, so
  * skip it */
@@ -999,6 +1000,7 @@ static const pgsql_thing_t words_after_create[] = {
 	{"CONVERSION", "SELECT pg_catalog.quote_ident(conname) FROM pg_catalog.pg_conversion WHERE substring(pg_catalog.quote_ident(conname),1,%d)='%s'"},
 	{"DATABASE", Query_for_list_of_databases},
 	{"DICTIONARY", Query_for_list_of_ts_dictionaries, NULL, THING_NO_SHOW},
+	{"DEFAULT PRIVILEGES", NULL, NULL, THING_NO_CREATE | THING_NO_DROP},
 	{"DOMAIN", NULL, _for_list_of_domains},
 	{"EVENT TRIGGER", NULL, NULL},
 	{"EXTENSION", Query_for_list_of_extensions},
@@ -1006,12 +1008,13 @@ static const pgsql_thing_t words_after_create[] = {
 	{"FOREIGN TABLE", NULL, NULL},
 	{"FUNCTION", NULL, _for_list_of_functions},
 	{"GROUP", Query_for_list_of_roles},
-	{"LANGUAGE", Query_for_list_of_languages},
 	{"INDEX", NULL, _for_list_of_indexes},
+	{"LANGUAGE", Query_for_list_of_languages, NULL, THING_NO_ALTER},
+	{"LARGE OBJECT", NULL, NULL, THING_NO_CREATE | THING_NO_DROP},
 	{"MATERIALIZED VIEW", NULL, _for_list_of_matviews},
 	{"OPERATOR", NULL, NULL},	/* Querying for this is probably not such a
  * good idea. */
-	{"OWNED", NULL, NULL, THING_NO_CREATE},		/* for DROP OWNED BY ... */
+	{"OWNED", NULL, NULL, THING_NO_CREATE | THING_NO_ALTER},		/* for DROP OWNED BY ... */
 	{"PARSER", Query_for_list_of_ts_parsers, NULL, THING_NO_SHOW},
 	{"POLICY", NULL, NULL},
 	{"PUBLICATION", Query_for_list_of_publications},
@@ -1021,15 +1024,16 @@ static const pgsql_thing_t words_after_create[] = {
 	{"SEQUENCE", NULL, _for_list_of_sequences},
 	{"SERVER", Query_for_list_of_servers},
 	{"SUBSCRIPTION", Query_for_list_of_subscriptions},
+	{"SYSTEM", NULL, NULL, THING_NO_CREATE | THING_NO_DROP},
 	{"TABLE", NULL, _for_list_of_tables},
 	{"TABLESPACE", Query_for_list_of_tablespaces},
-	{"TEMP", NULL, NULL, THING_NO_DROP},		/* for CREATE TEMP TABLE ... */
+	{"TEMP", NULL, NULL, THING_NO_DROP | THING_NO_ALTER},		/* for CREATE TEMP TABLE ... */
 	{"TEMPLATE", Query_for_list_of_ts_templates, NULL, THING_NO_SHOW},
 	{"TEXT SEARCH", NULL, NULL},
 	{"TRIGGER", "SELECT pg_catalog.quote_ident(tgname) FROM pg_catalog.pg_trigger WHERE substring(pg_catalog.quote_ident(tgname),1,%d)='%s' AND NOT tgisinternal"},
 	{"TYPE", NULL, _for_list_of_datatypes},
-	{"UNIQUE", NULL, NULL, THING_NO_DROP},		/* for CREATE UNIQUE INDEX ... */
-	{"UNLOGGED", NULL, NULL, THING_NO_DROP},	/* for CREATE UNLOGGED TABLE
+	{"UNIQUE", NULL, NULL, THING_NO_DROP | THING_NO_ALTER},		/* for CREATE UNIQUE INDEX ... */
+	{"UNLOGGED", NULL, NULL, THING_NO_DROP | THING_NO_ALTER},	/* for CREATE UNLOGGED TABLE
  * ... */
 	{"USER", Query_for_list_of_roles},
 	{"USER MAPPING FOR", NULL, NULL},
@@ -1042,6 +1046,7 @@ static const pgsql_thing_t words_after_create[] = {
 static char **psql_completion(const char *text, int start, int end);
 static char *create_command_generator(const char *text, int state);
 static char *drop_command_generator(const char *text, int state);
+static char *alter_command_generator(const char *text, int state);
 static char *complete_from_query(const char *text, int state);
 static char *complete_from_schema_query(const char *text, int state);
 static char *_complete_from_query(int is_schema_query,
@@ -1316,6 +1321,17 @@ psql_completion(const char *text, int start, int end)
 	(previous_words_count >= 2 && \
 	 word_matches_cs(p1, prev_wd) && \
 	 word_matches_cs(p2, prev2_wd))
+#define TailMatchesCS3(p3, p2, p1) \
+	(previous_words_count >= 3 && \
+	 word_matches_cs(p1, prev_wd) && \
+	 word_matches_cs(p2, prev2_wd) && \
+	 word_matches_cs(p3, prev3_wd))
+#define TailMatchesCS4(p4, p3, p2, p1) \
+	(previous_words_count >= 4 && \
+	 word_matches_cs(p1, prev_wd) && \
+	 word_matches_cs(p2, prev2_wd) && \
+	 word_matches_cs(p4, prev4_wd))

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-15 Thread Vaishnavi Prabakaran
On Tue, Mar 14, 2017 at 5:50 PM, Vaishnavi Prabakaran <
vaishnaviprabaka...@gmail.com> wrote:

>
>
> On Tue, Mar 14, 2017 at 4:19 AM, Daniel Verite 
> wrote:
>
>>
>> I mean the next iteration of the above while statement. Referring
>> to the doc, that would be the "next batch entry":
>>
>>   " To get the result of the first batch entry the client must call
>>PQbatchQueueProcess. It must then call PQgetResult and handle the
>>results until PQgetResult returns null (or would return null if
>>called). The result from the next batch entry may then be retrieved
>>using PQbatchQueueProcess and the cycle repeated"
>>
>> Attached is a bare-bones testcase showing the problem.
>> As it is, it works, retrieving results for three "SELECT 1"
>> in the same batch.  Now in order to use the single-row
>> fetch mode, consider adding this:
>>
>> r = PQsetSingleRowMode(conn);
>> if (r!=1) {
>>   fprintf(stderr, "PQsetSingleRowMode() failed for i=%d\n", i);
>> }
>>
>> When inserted after the call to PQbatchQueueProcess,
>> which is what I understand you're saying works for you,
>> it fails for me when starting to get the results of the 2nd query
>> and after.
>>
>>
>
> Thanks for explaining the issue in detail and yes, I do see the issue
> using your attached test file.
>
> And I think the problem is with the "parseInput" call at the end of
> PQbatchQueueProcess().
> I don't see the need for "parseInput" to cover the scope of
> PQbatchQueueProcess(); we do it in PQgetResult() anyway.
> This function call changes the async status of the connection to
> READY (following the command-complete message from the backend), so
> eventually PQsetSingleRowMode() fails.  So I have attached an
> alternative fix for this issue.
>
> Please share your thoughts.
>
> I would also like to hear Craig's opinion on it before applying this fix
> to the original patch, just to make sure am not missing anything here.
>
>
Attached is the code patch with the above fix applied; I am moving the
CF status to "needs review".
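
For reference, the per-entry consumption loop then looks roughly like
this (a sketch against the API proposed in the patch; error handling
trimmed):

/* one iteration per batch entry */
while (PQbatchQueueProcess(conn))
{
	PGresult   *res;

	/* with the fix, this no longer fails on the 2nd and later entries */
	if (PQsetSingleRowMode(conn) != 1)
		fprintf(stderr, "PQsetSingleRowMode() failed\n");

	/* PGRES_SINGLE_TUPLE results arrive here, then the final status */
	while ((res = PQgetResult(conn)) != NULL)
		PQclear(res);
}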

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.


0001-Pipelining-batch-support-for-libpq-code-v6.patch
Description: Binary data


0002-Pipelining-batch-support-for-libpq-test-v3.patch
Description: Binary data



Re: [HACKERS] [COMMITTERS] pgsql: Remove objname/objargs split for referring to objects

2017-03-15 Thread Michael Paquier
On Thu, Mar 16, 2017 at 12:18 AM, Alvaro Herrera
 wrote:
> Peter Eisentraut wrote:
>> Remove objname/objargs split for referring to objects
>
> I don't know if this is the guilty commit, but I'm now getting these
> compiler warnings:
>
>
> /pgsql/source/master/src/backend/catalog/objectaddress.c: In function 
> 'get_object_address':
> /pgsql/source/master/src/backend/catalog/objectaddress.c:1624:33: warning: 
> 'typenames[1]' may be used uninitialized in this function 
> [-Wmaybe-uninitialized]
>ereport(ERROR,
>  ^
> /pgsql/source/master/src/backend/catalog/objectaddress.c:1578:8: note: 
> 'typenames[1]' was declared here
>   char*typenames[2];
> ^
> /pgsql/source/master/src/backend/catalog/objectaddress.c:1624:33: warning: 
> 'typenames[0]' may be used uninitialized in this function 
> [-Wmaybe-uninitialized]
>ereport(ERROR,
>  ^
> /pgsql/source/master/src/backend/catalog/objectaddress.c:1578:8: note: 
> 'typenames[0]' was declared here
>   char*typenames[2];
> ^

What are you using as CFLAGS? As both typenames should normally be
set, what about initializing those fields with NULL and adding an
assertion like the attached?
-- 
Michael
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 3a7f049247..52df94439d 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -1552,7 +1552,7 @@ get_object_address_opf_member(ObjectType objtype,
 	ObjectAddress address;
 	ListCell   *cell;
 	List	   *copy;
-	char	   *typenames[2];
+	char	   *typenames[2] = {NULL, NULL};
 	Oid			typeoids[2];
 	int			membernum;
 	int			i;
@@ -1581,6 +1581,8 @@ get_object_address_opf_member(ObjectType objtype,
 			break;
 	}
 
+	Assert(typenames[0] != NULL && typenames[1] != NULL);
+
 	switch (objtype)
 	{
 		case OBJECT_AMOP:



Re: [HACKERS] Microvacuum support for Hash Index

2017-03-15 Thread Andres Freund
On 2017-03-15 16:31:11 -0400, Robert Haas wrote:
> On Wed, Mar 15, 2017 at 3:54 PM, Ashutosh Sharma  
> wrote:
> > Changed as per suggestions. Attached v9 patch. Thanks.
> 
> Wow, when do you sleep?

I think that applies to a bunch of people, including yourself ;)




Re: [HACKERS] Microvacuum support for Hash Index

2017-03-15 Thread Robert Haas
On Wed, Mar 15, 2017 at 4:31 PM, Robert Haas  wrote:
> On Wed, Mar 15, 2017 at 3:54 PM, Ashutosh Sharma  
> wrote:
>> Changed as per suggestions. Attached v9 patch. Thanks.
>
> Wow, when do you sleep?   Will have a look.

Committed with a few corrections.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Upgrading postmaster's log messages about bind/listen errors

2017-03-15 Thread Bruce Momjian
On Tue, Mar 14, 2017 at 12:50:09PM -0400, Tom Lane wrote:
> I wrote:
> >> I think that what would actually be of some use nowadays is a LOG-level
> >> message emitted if the wraparound *isn't* activated immediately at start.
> >> But otherwise, we should follow the rule that silence is golden.
> 
> > Concretely, how about the attached?  It preserves the original
> > "protections are now enabled" message at LOG level, but emits it only
> > when oldestOffsetKnown becomes true *after* startup.  Meanwhile, if
> > oldestOffsetKnown is still not true at the conclusion of TrimMultiXact,
> > then it emits a new LOG message about "protections are not active".
> 
> I realized that the second of these is not necessary because it's
> redundant with the message about "MultiXact member wraparound protections
> are disabled because oldest checkpointed MultiXact %u does not exist on
> disk".  Pushed without that.

Gee, I kind of like the new messages:

LOG:  listening on IPv4 address "127.0.0.1", port 5432
LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] logical replication access control patches

2017-03-15 Thread Peter Eisentraut
New patch set based on the discussions.  I have dropped the PUBLICATION
privilege patch.  The patches are also reordered a bit in approximate
decreasing priority order.

0001 Refine rules for altering publication owner

kind of a bug fix

0002 Change logical replication pg_hba.conf use

This was touched upon in the discussion at

and seems to have been viewed favorably there.

0003 Add USAGE privilege for publications

a way to control who can subscribe to a publication

0004 Add subscription apply worker privilege checks

This is a prerequisite for the next one (or one like it).

0005 Add CREATE SUBSCRIPTION privilege on databases

Need a way to determine which user can create subscriptions.  The
presented approach made sense to me, but maybe there are other ideas.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 676e6e26f1b0500907c0cd810969bb015ee548d1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 13 Feb 2017 08:57:45 -0500
Subject: [PATCH v2 1/5] Refine rules for altering publication owner

Previously, the new owner had to be a superuser.  The new rules are more
refined, similar to those for other objects.
---
 doc/src/sgml/ref/alter_publication.sgml   |  7 +--
 src/backend/commands/publicationcmds.c| 34 ++-
 src/test/regress/expected/publication.out |  8 
 src/test/regress/sql/publication.sql  |  4 
 4 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index 47d83b80be..776661bbeb 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -48,8 +48,11 @@ Description
   
 
   
-   To alter the owner, you must also be a direct or indirect member of the
-   new owning role. The new owner has to be a superuser
+   To alter the owner, you must also be a direct or indirect member of the new
+   owning role. The new owner must have CREATE privilege on
+   the database.  Also, the new owner of a FOR ALL TABLES
+   publication must be a superuser.  However, a superuser can change the
+   ownership of a publication while circumventing these restrictions.
   
 
   
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 04f83e0a2e..d69e39dc5b 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -670,17 +670,31 @@ AlterPublicationOwner_internal(Relation rel, HeapTuple tup, Oid newOwnerId)
 	if (form->pubowner == newOwnerId)
 		return;
 
-	if (!pg_publication_ownercheck(HeapTupleGetOid(tup), GetUserId()))
-		aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_PUBLICATION,
-	   NameStr(form->pubname));
+	if (!superuser())
+	{
+		AclResult	aclresult;
 
-	/* New owner must be a superuser */
-	if (!superuser_arg(newOwnerId))
-		ereport(ERROR,
-				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-				 errmsg("permission denied to change owner of publication \"%s\"",
-						NameStr(form->pubname)),
-				 errhint("The owner of a publication must be a superuser.")));
+		/* Must be owner */
+		if (!pg_publication_ownercheck(HeapTupleGetOid(tup), GetUserId()))
+			aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_PUBLICATION,
+		   NameStr(form->pubname));
+
+		/* Must be able to become new owner */
+		check_is_member_of_role(GetUserId(), newOwnerId);
+
+		/* New owner must have CREATE privilege on database */
+		aclresult = pg_database_aclcheck(MyDatabaseId, newOwnerId, ACL_CREATE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, ACL_KIND_DATABASE,
+		   get_database_name(MyDatabaseId));
+
+		if (form->puballtables && !superuser_arg(newOwnerId))
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+					 errmsg("permission denied to change owner of publication \"%s\"",
+							NameStr(form->pubname)),
+					 errhint("The owner of a FOR ALL TABLES publication must be a superuser.")));
+	}
 
 	form->pubowner = newOwnerId;
 	CatalogTupleUpdate(rel, &tup->t_self, tup);
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index 7c4834b213..5a7c0edf7d 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -182,6 +182,14 @@ ALTER PUBLICATION testpub_default RENAME TO testpub_foo;
 
 -- rename back to keep the rest simple
 ALTER PUBLICATION testpub_foo RENAME TO testpub_default;
+ALTER PUBLICATION testpub_default OWNER TO regress_publication_user2;
+\dRp testpub_default
+   List of publications
+  Name   |   Owner   | Inserts | Updates | Deletes 
+-+---+-+-+-
+ testpub_default | regress_publication_user2 | t   | t  

Re: [HACKERS] Backend crash on non-exclusive backup cancel

2017-03-15 Thread Michael Paquier
On Thu, Mar 16, 2017 at 12:46 AM, Andres Freund  wrote:
> Hi,
>
> On 2017-03-15 15:28:03 +0900, Michael Paquier wrote:
>> diff --git a/src/backend/access/transam/xlog.c 
>> b/src/backend/access/transam/xlog.c
>> index 64335f909e..eaf8e32fe1 100644
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> --- a/src/backend/access/transam/xlogfuncs.c
>> +++ b/src/backend/access/transam/xlogfuncs.c
>> diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
>> index 104ee7dd5e..d4abf94862 100644
>> --- a/src/include/access/xlog.h
>> +++ b/src/include/access/xlog.h
>> @@ -288,8 +288,26 @@ extern void assign_max_wal_size(int newval, void 
>> *extra);
>>  extern void assign_checkpoint_completion_target(double newval, void *extra);
>
> This seems like something easy enough to exercise in a tap test, could
> we get one?

The first problem needs a cancel request sent while pg_stop_backup()
is running, and the second needs a session held open to observe the
inconsistent state of the session lock.  Both concepts are foreign to
the TAP tests, which cannot run queries asynchronously or keep
sessions alive.
-- 
Michael




Re: [HACKERS] Patch to improve performance of replay of AccessExclusiveLock

2017-03-15 Thread David Rowley
On 16 March 2017 at 13:31, Simon Riggs  wrote:

> On 8 March 2017 at 10:02, David Rowley 
> wrote:
> > On 8 March 2017 at 01:13, Simon Riggs  wrote:
> >> Don't understand this. I'm talking about setting a flag on
> >> commit/abort WAL records, like the attached.
> >
> > There's nothing setting a flag in the attached. I only see the
> > checking of the flag.
>
> Yeh, it wasn't a full patch, just a way marker to the summit for you.
>
> >> We just need to track info so we can set the flag at EOXact and we're
> done.
> >
> > Do you have an idea where to store that information? I'd assume we'd
> > want to store something during LogAccessExclusiveLocks() and check
> > that in XactLogCommitRecord() and XactLogAbortRecord(). I don't see
> > anything existing which might help with that today. Do you think I
> > should introduce some new global variable for that? Or do you propose
> > calling GetRunningTransactionLocks() again while generating the
> > Commit/Abort record?
>
> Seemed easier to write it than explain further. Please see what you think.


Thanks. I had been looking for some struct to store the flag in. I'd not
considered just adding a new global variable.

I was in fact just working on this again earlier, and I had come up with
the attached.

I was just trying to verify if it was going to do the correct thing in
regards to subtransactions and I got a little confused at the comments at
the top of StandbyAcquireAccessExclusiveLock():

 * We record the lock against the top-level xid, rather than individual
 * subtransaction xids. This means AccessExclusiveLocks held by aborted
 * subtransactions are not released as early as possible on standbys.

if this is true, and it seems to be per LogAccessExclusiveLock(),
does StandbyReleaseLockTree() need to release the locks for sub
transactions too?

This code:

for (i = 0; i < nsubxids; i++)
StandbyReleaseLocks(subxids[i]);

FWIW I tested the attached and saw with the test cases I showed earlier
that StandbyReleaseLockTree() was down to 0.74% in the perf report. Likely
the gain will be even better if I ran a few more CPUs on small simple
transactions which are not taking Access Exclusive locks, so I agree this
is a better way forward than what I had in my original patch. Your patch
will have the same effect, so much better than the 1.74% I saw in my perf
report for the 1024-bucket hash table.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


mark_xact_ael.patch
Description: Binary data



Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-15 Thread Robert Haas
So I am looking at this part of 0008:

+   /*
+    * Do not copy parent_rinfo and child_rinfos because 1. they create a
+    * circular dependency between child and parent RestrictInfo 2. dropping
+    * those links just means that we lose some memory optimizations. 3.
+    * There is a possibility that the child and parent RestrictInfos
+    * themselves may have got copied and thus the old links may no longer
+    * be valid. The caller may set up those links itself, if needed.
+    */

I don't think that it's very clear whether or not this is safe.  I
experimented with making _copyRestrictInfo PANIC, which,
interestingly, does not affect the core regression tests at all, but
does trip on this bit from the postgres_fdw tests:

-- subquery using stable function (can't be sent to remote)
PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN
(SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) =
'1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st2(10, 20);

I'm not sure why this particular case is affected when so many others
are not, and the comment doesn't help me very much in figuring it out.

Why do we need this cache in the RestrictInfo, anyway?  Aside from the
comment above, I looked at the comment in the RestrictInfo struct, and
I looked at the comment in build_child_restrictinfo, and I looked at
the comment in build_child_clauses, and I looked at the place where
build_child_clauses is called in set_append_rel_size, and none of
those places explain why we need this cache.  I would assume we'd need
a separate translation of the RestrictInfo for every separate
child-join, so how does the cache help?

Maybe the answer is that build_child_clauses() is also called from
try_partition_wise_join() and add_paths_to_child_joinrel(), and those
three call sights all end up producing the same set of translated
RestrictInfos.  But if that's the case, somehow it seems like we ought
to be producing these in one place where we can get convenient access
to them from each child join, rather than having to search through
this cache to find it.  It's a pretty inefficient cache: it takes O(n)
time to search it, I think, where n is the number of partitions.  And
you do O(n) searches.  So it's an O(n^2) algorithm, which is a little
unfortunate.  Can't we affix the translated RestrictInfos someplace
where they can be found more efficiently?

Yet another thing that the comments don't explain is why the existing
adjust_appendrel_attrs call needs to be replaced with
build_child_clauses.

So I feel, overall, that the point of all of this is not explained well at all.

...Robert




Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-15 Thread Haribabu Kommi
On Thu, Mar 16, 2017 at 9:55 AM, Julien Rouhaud 
wrote:

> On Wed, Feb 15, 2017 at 02:53:44PM +1100, Haribabu Kommi wrote:
> > Here I attached patch that implements the view.
> > I will add this patch to next commitfest.
>
> Hello,
>
> I just reviewed the patch.
>

Thanks for the review.

> First, there are some whitespace issues that make git-apply complain (at
> least lines 218 and 396).
>

Removed.

Patch is rather straightforward and works as expected, doc compiles without
> issue.
>
> I only found some minor issues:
>
> +  One row only, showing statistics about the
> +   wal writing activity. See
>
> +  Number of wal writes that are carried out by the backend
> process
>
>
> WAL should be uppercase (and for some more occurences).
>

Fixed.


> +  
> +  Number of wal writes that are carried out by background workers
> such as checkpointer,
> +  writer and walwriter.
>
> I guess you meant background processes?
>

Yes, it is background processes. Updated.


> >+  This field data will be populated only when the track_io_timing
> GUC is enabled
> (and similar occurences)
>
> track_io_timing should link to 
> instead of mentioning GUC.
>

Updated accordingly.

I think you also need to update the track_io_timing description in
> sgml/config.sgml to mention this new view.
>

Added the reference of pg_stat_walwrites in the GUC description.


>
> +   else
> +   {
> +   LocalWalWritesStats.m_wal_total_write_time = 0;
> +   }
> (and similar ones)
>
> The brackets seem unnecessary.
>

Corrected.

Updated patch attached.

Regards,
Hari Babu
Fujitsu Australia


pg_stat_walwrites_view_2.patch
Description: Binary data



Re: [HACKERS] Speedup twophase transactions

2017-03-15 Thread Michael Paquier
On Wed, Mar 15, 2017 at 4:48 PM, Nikhil Sontakke
 wrote:
>> bool        valid;   /* TRUE if PGPROC entry is in proc array */
>> bool        ondisk;  /* TRUE if prepare state file is on disk */
>> +   bool        inredo;  /* TRUE if entry was added via xlog_redo */
>> We could have a set of flags here, that's the 3rd boolean of the
>> structure used for a status.
>
> This is more of a cleanup and does not need to be part of this patch. This
> can be a follow-on cleanup patch.

OK, that's fine for me. This patch is complicated enough anyway.

After some time thinking about it, I have finally put my finger on
what was itching me about this patch, and the answer is here:

+ *  Replay of twophase records happens by the following rules:
+ *
+ *  * On PREPARE redo we add the transaction to TwoPhaseState->prepXacts.
+ *    We set gxact->inredo to true for such entries.
+ *
+ *  * On Checkpoint we iterate through TwoPhaseState->prepXacts entries
+ *    that have gxact->inredo set and are behind the redo_horizon. We
+ *    save them to disk and also set gxact->ondisk to true.
+ *
+ *  * On COMMIT/ABORT we delete the entry from TwoPhaseState->prepXacts.
+ *    If gxact->ondisk is true, we delete the corresponding entry from
+ *    the disk as well.
+ *
+ *  * RecoverPreparedTransactions(), StandbyRecoverPreparedTransactions()
+ *    and PrescanPreparedTransactions() have been modified to go through
+ *    gxact->inredo entries that have not made it to disk yet.

It seems to me that there should be an initial scan of pg_twophase at
the beginning of recovery, discarding on the way with a WARNING
entries that are older than the checkpoint redo horizon. This should
fill in shmem entries using something close to PrepareRedoAdd(), and
mark those entries as inredo. Then, at the end of recovery,
PrescanPreparedTransactions does not need to look at the entries in
pg_twophase. And that's the case as well of
RecoverPreparedTransaction(). I think that you could get the patch
much simplified this way, as any 2PC data can be fetched directly from
WAL segments and there is no need to rely on scans of pg_twophase;
those are replaced by scans of the entries in TwoPhaseState.

> I also managed to do some perf testing.
>
> Modified Stas' earlier scripts slightly:
>
> \set naccounts 10 * :scale
> \set from_aid random(1, :naccounts)
> \set to_aid random(1, :naccounts)
> \set delta random(1, 100)
> \set scale :scale+1
> BEGIN;
> UPDATE pgbench_accounts SET abalance = abalance - :delta WHERE aid =
> :from_aid;
> UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid =
> :to_aid;
> PREPARE TRANSACTION ':client_id.:scale';
> COMMIT PREPARED ':client_id.:scale';
>
> Created a base backup with scale factor 125 on an AWS t2.large instance. Set
> up archiving and did a 20 minute run with the above script saving the WALs
> in the archive.
>
> Then used recovery.conf to point to this WAL location and used the base
> backup to recover.
>
> With this patch applied: 20s
> Without patch: Stopped measuring after 5 minutes ;-)

And that's really nice.
-- 
Michael




Re: [HACKERS][REVIEW] macaddr 64 bit (EUI-64) datatype support

2017-03-15 Thread Stephen Frost
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> I see a new warning due to, presumably, this:
> /home/andres/src/postgresql/src/backend/utils/adt/mac8.c: In function 
> ‘hex2_to_uchar’:
> /home/andres/src/postgresql/src/backend/utils/adt/mac8.c:71:23: warning: 
> comparison is always false due to limited range of data type [-Wtype-limits]
>   if (*ptr < 0 || *ptr > 127)
>^

Ah, yeah, I suppose I can drop that half of the check.

Will push a fix soon.

Thanks!

Stephen




Re: [HACKERS][REVIEW] macaddr 64 bit (EUI-64) datatype support

2017-03-15 Thread Andres Freund
On 2017-03-15 11:20:53 -0400, Stephen Frost wrote:
> Greetings Hari Babu,
> 
> * Haribabu Kommi (kommi.harib...@gmail.com) wrote:
> > On Mon, Mar 13, 2017 at 6:52 AM, Stephen Frost  wrote:
> > > And, naturally, re-reading the email as it hit the list made me realize
> > > that the documentation/error-message incorrectly said "3rd and 4th"
> > > bytes were being set to FF and FE, when it's actually the 4th and 5th
> > > byte.  The code was correct, just the documentation and the error
> > > message had the wrong numbers.  The commit message is also correct.
> > 
> > Thanks for the review and corrections.
> > 
> > I found some small corrections.
> > 
> > s/4rd/4th/g -- Typo correction.
> > 
> > + Input is accepted in the following formats:
> > 
> > As we are supporting many different input variants, and all combinations
> > are not listed, so how about changing the above statement as below.
> > 
> > "Following are the some of the input formats that are accepted:"
> 
> Good points, I incorporated them along with a bit of additional
> information in the documentation as to what we do actually support.
> 
> > I didn't find any other problems during testing and review. The patch
> > is fine.
> 
> Great!  I've committed this now.  If you see anything additional or
> other changes which should be made, please let me know.
> 
> ... and bumped catversion after (thanks for the reminder, Robert!).

I see a new warning due to, presumably, this:
/home/andres/src/postgresql/src/backend/utils/adt/mac8.c: In function 
‘hex2_to_uchar’:
/home/andres/src/postgresql/src/backend/utils/adt/mac8.c:71:23: warning: 
comparison is always false due to limited range of data type [-Wtype-limits]
  if (*ptr < 0 || *ptr > 127)
   ^
Andres




Re: [HACKERS][REVIEW] macaddr 64 bit (EUI-64) datatype support

2017-03-15 Thread Haribabu Kommi
On Thu, Mar 16, 2017 at 2:20 AM, Stephen Frost  wrote:

> Greetings Hari Babu,
>
> * Haribabu Kommi (kommi.harib...@gmail.com) wrote:
> > On Mon, Mar 13, 2017 at 6:52 AM, Stephen Frost 
> wrote:
> > > And, naturally, re-reading the email as it hit the list made me realize
> > > that the documentation/error-message incorrectly said "3rd and 4th"
> > > bytes were being set to FF and FE, when it's actually the 4th and 5th
> > > byte.  The code was correct, just the documentation and the error
> > > message had the wrong numbers.  The commit message is also correct.
> >
> > Thanks for the review and corrections.
> >
> > I found some small corrections.
> >
> > s/4rd/4th/g -- Typo correction.
> >
> > + Input is accepted in the following formats:
> >
> > As we are supporting many different input variants, and all combinations
> > are not listed, so how about changing the above statement as below.
> >
> > "Following are the some of the input formats that are accepted:"
>
> Good points, I incorporated them along with a bit of additional
> information in the documentation as to what we do actually support.
>
> > I didn't find any other problems during testing and review. The patch
> > is fine.
>
> Great!  I've committed this now.  If you see anything additional or
> other changes which should be made, please let me know.
>
> ... and bumped catversion after (thanks for the reminder, Robert!).
>

Thanks for the review and changes.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] utility commands benefiting from parallel plan

2017-03-15 Thread Haribabu Kommi
On Tue, Feb 28, 2017 at 12:48 PM, Haribabu Kommi 
wrote:

>
>
> On Sat, Feb 25, 2017 at 3:21 AM, Robert Haas 
> wrote:
>
>> On Fri, Feb 24, 2017 at 11:43 AM, Haribabu Kommi
>>  wrote:
>> > Here I attached an implementation patch that allows
>> > utility statements that have queries underneath such as
>> > CREATE TABLE AS, CREATE MATERIALIZED VIEW
>> > and REFRESH commands to benefit from parallel plan.
>> >
>> > These write operations are not performed concurrently by
>> > parallel workers, but the underlying query used by these
>> > operations is eligible for a parallel plan.
>> >
>> > Currently the write operations are implemented for the
>> > tuple dest types DestIntoRel and DestTransientRel.
>> >
>> > Currently I am evaluating other write operations that can
>> > benefit with parallelism without side effects in enabling them.
>> >
>> > comments?
>>
>> I think a lot more work than this will be needed.  See:
>>
>> https://www.postgresql.org/message-id/CA+TgmoZC5ft_t9uQWSO5_
>> 1vU6H8oVyD=zyuLvRnJqTN==fv...@mail.gmail.com
>>
>> ...and the discussion which followed.
>>
>
>
> Thanks for the link.
> Yes, it needs more work to support parallelism even for
> queries that involved in write operations like INSERT,
> DELETE and UPDATE commands.
>

This patch is marked as "returned with feedback" in the ongoing
commitfest.

The proposed DML write-operations patch has a good number of
limitations (triggers etc.), but the utility write-operations patch
is, in my view, in good shape to start supporting write operations.
This is useful for materialized views when refreshing their data.

Do you find any problems or missing pieces in supporting parallel
plans for utility commands with the attached updated patch?  Or
should we rather support all write operations at once?

Regards,
Hari Babu
Fujitsu Australia


0001_utility_write_using_parallel_2.patch
Description: Binary data



Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Andres Freund
Hi,

On 2017-03-15 20:09:03 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > [ new patches ]
>
> I've started to look at 0004, and the first conclusion I've come to
> is that it's *direly* short of documentation.  To the point that I'd
> vote against committing it if something isn't done about that.

Yea, I asked for input about what's hard to understand and what's not -
I stared at this for a *lot* of time, and it all kinda looks easy-ish
now.  I'm more than willing to expand on the below, and other pieces.


> As  an example, it's quite unclear how ExprEvalSteps acquire correct
> resnull/resvalue pointers, and the algorithm for that seems nontrivial.
> It doesn't help any that the arguments of ExecInitExprRec are entirely
> undocumented.

Generally whatever wants the result of a (sub-)expression passes in the
desired resvalue/resnull.  E.g. when doing a function call the
individual arguments are each prepared for evaluation using
ExecInitExprRec() and resvalue/resnull are pointing into fcinfo's
arg/nulls[i].
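
To make that concrete, the argument-preparation loop for a function
call looks roughly like this (a simplified sketch, not the exact patch
code; func, parent and state stand in for the real locals):

    int         argno = 0;
    ListCell   *lc;

    foreach(lc, func->args)
    {
        /* each argument's result lands directly in fcinfo's arrays */
        ExecInitExprRec((Expr *) lfirst(lc), parent, state,
                        &fcinfo->arg[argno], &fcinfo->argnull[argno]);
        argno++;
    }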


> I think it would be worth creating a README file giving an overview
> of how all of this patch is supposed to work.  You also need to do a
> whole lot more work on the function-level comments.

Ok.


> A specific thing I noticed in the particular area of
> what-gets-returned-where is this bit in EEOP_SCALARARRAYOP setup:
>
> +/*
> + * Evaluate array argument into our return value, overwrite
> + * with comparison results afterwards.
> + */
> +ExecInitExprRec((Expr *) lsecond(opexpr->args), parent, 
> state,
> +resv, resnull);
>
> That scares me quite a bit, because it smells exactly like the sort of
> premature optimization that bit us on the rear in CVE-2016-5423 (cf commit
> f0c7b789a).

I don't think there's a danger similar to f0c7b789a here, because the
"caller" (i.e. the node that needs the expression's result) expects
resvalue/null to be overwritten. It'll e.g. be the value "slot" of one
arm (is there a better name for one part of a boolean expression?) of a
boolean expression.


> What's it really buying us to overwrite the return value
> early rather than storing into the fcinfo's second argument slot?

That'd work just as well.


> (The memory of that CVE is part of what's prompting me to demand a clear
> explanation of the algorithm for deciding where steps return their
> results.  Even if this particular code is safe, somebody is going to do
> something unsafe in future if there's not a documented rule to follow.)

I don't think there's a danger here, but I think you more generally have
a point.


> Another thing that ties into the do-I-understand-this-at-all question
> is this bit:
>
> +EEO_CASE(EEOP_BOOL_AND_STEP_FIRST)
> +{
> +*op->d.boolexpr.anynull = false;
> +
> +/*
> + * Fallthrough (can't be last - ANDs have two arguments at 
> least).
> + */
> +}
> +
> +EEO_CASE(EEOP_BOOL_AND_STEP)
>
> It seems like this is missing an "op++;" before falling through.  If it
> isn't, because really BOOL_AND_STEP_FIRST is defined as clearing anynull
> and then also doing a regular BOOL_AND_STEP, then the comment seems rather
> off point.

It's intended to fall through this way, i.e. the difference between
_FIRST and not is just that only the former clears anynull.  What the
comment is about, admittedly too cryptically, is that the _LAST step
that then evaluates anynull cannot be the same step as
EEOP_BOOL_AND_STEP_FIRST, because bool AND/OR always has at least two
"arms".  Will expand / move.


> BTW, it sure seems like ExecInitExprRec and related code ought to set
> off all sorts of valgrind complaints?  It's not being careful at all
> to ensure that all fields of the "scratch" record get initialized before
> we memcpy it to someplace.

It worked not long ago - valgrind's replacement memcpy() doesn't trigger
undefined memory warnings, it just copies the "definedness" of each byte
(or bit?).  But your point gives me an idea: It seems like a good idea
to VALGRIND_MAKE_MEM_UNDEFINED() the "scratch" step at some convenient
places, so the definedness of individual operations is more useful.
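
Something minimal along these lines in ExecInitExprRec, say (sketch
only, assuming utils/memdebug.h is available):

    ExprEvalStep scratch;

    /* poison the scratch step so stale fields from a prior step get flagged */
    VALGRIND_MAKE_MEM_UNDEFINED(&scratch, sizeof(scratch));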


Thanks for the look!


- Andres




Re: [HACKERS] Measuring replay lag

2017-03-15 Thread Simon Riggs
On 16 March 2017 at 08:02, Thomas Munro  wrote:

> I agree that these states exist, but we disagree on what 'lag' really
> means, or, rather, which of several plausible definitions would be the
> most useful here.
>
> My proposal is that the *_lag columns should always report how long it
> took for recently written, flushed and applied WAL to be written,
> flushed and applied (and for the primary to know about it).  By this
> definition, sent LSN = applied LSN is not a special case: we simply
> report how long that LSN took to be written, flushed and applied.
>
> Your proposal is that the *_lag columns should report how far in the
> past the standby is at each of the three stages with respect to the
> current end of WAL.  By this definition when sent LSN = applied LSN we
> are currently in the 'A' state meaning 'caught up' and should show
> 00:00:00.

I accept your proposal for how we handle these, on condition that you
write up some docs that explain the subtle difference between the two,
so we can just show people the URL. That needs to explain clearly the
difference in an impartial way between "what is the most recent lag
measurement" and "how long until we are caught up" as possible
intrepretations of these values. Thanks.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patch to improve performance of replay of AccessExclusiveLock

2017-03-15 Thread Simon Riggs
On 8 March 2017 at 10:02, David Rowley  wrote:
> On 8 March 2017 at 01:13, Simon Riggs  wrote:
>> Don't understand this. I'm talking about setting a flag on
>> commit/abort WAL records, like the attached.
>
> There's nothing setting a flag in the attached. I only see the
> checking of the flag.

Yeh, it wasn't a full patch, just a way marker to the summit for you.

>> We just need to track info so we can set the flag at EOXact and we're done.
>
> Do you have an idea where to store that information? I'd assume we'd
> want to store something during LogAccessExclusiveLocks() and check
> that in XactLogCommitRecord() and XactLogAbortRecord(). I don't see
> anything existing which might help with that today. Do you think I
> should introduce some new global variable for that? Or do you propose
> calling GetRunningTransactionLocks() again while generating the
> Commit/Abort record?

Seemed easier to write it than explain further. Please see what you think.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


hs_skip_overhead_AEL_on_standby.v1.patch
Description: Binary data



Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Andres Freund
On 2017-03-15 18:48:28 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-15 18:16:57 -0400, Tom Lane wrote:
> >> [ scratches head... ]  What deforming logic do you think I'm proposing
> >> removing?
> 
> > I thought you were suggesting that we don't do the get_last_attnums (and
> > inlined version in the isSimpleVar case) at execution time anymore,
> > instead relying on logic in the planner to know how much to deform ahead
> > of time.  Then we'd do slot_getsomeattrs in the appropriate places.  But
> > I understood you suggesting to do so only in scan nodes - which doesn't
> > seem sufficient, due to the use of materialized / minimal tuples in
> > other types of nodes.  Did I misunderstand?
> 
> We would need to do it anywhere that we might be projecting from a
> materialized tuple, I suppose.  From the planner's standpoint it would be
> about as easy to do this for all plan nodes as only selected ones.

Yea.


> Anyway the core point here seems to be that skipping ExecProject misses a
> bet because the underlying tuple doesn't get disassembled.  A quick-hack
> way of seeing if this helps might be to do slot_getallattrs in the places
> where we skip ExecProject.
> I'm not sure we'd want that as a permanent
> solution, because it's possible that it extracts columns not actually
> needed, but it should at least move the hotspot in your example case.

Hm, that could hurt pretty badly if we actually only access the first
few columns.

I wonder if we shouldn't essentially get rid of all slot_getattr()
calls, and replace them with one slot_getsomeattrs() and then direct
tts_values/nulls access.  Where we compute the column number is then
essentially a separate discussion.
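
I.e. turn the per-column pattern into something like this (sketch;
attnos, nkeys and max_attno are placeholders for whatever the call
site knows):

    /* before: one slot_getattr() per column */
    for (i = 0; i < nkeys; i++)
        values[i] = slot_getattr(slot, attnos[i], &isnull[i]);

    /* after: deform once up to the widest column, then cheap array access */
    slot_getsomeattrs(slot, max_attno);
    for (i = 0; i < nkeys; i++)
    {
        values[i] = slot->tts_values[attnos[i] - 1];
        isnull[i] = slot->tts_isnull[attnos[i] - 1];
    }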


Looking around most of them seem to access multiple columns, and several
of them probably can't conveniently done via the planner:

src/backend/catalog/partition.c
1635:   datum = slot_getattr(slot, keycol, );

accesses all the partitioned columns and has a convenient location to
compute the column to deform to (RelationGetPartitionDispatchInfo).


src/backend/catalog/index.c
1797:   iDatum = slot_getattr(slot, keycol, );

accesses all the indexed columns and has a convenient location to
compute the column to deform to (BuildIndexInfo).


src/backend/utils/adt/orderedsetaggs.c
1196:   Datum   d = slot_getattr(slot, nargs + 1, );
1359:   Datum   d = slot_getattr(slot, nargs + 1, );

no benefit.


src/backend/executor/nodeMergeAppend.c
246:datum1 = slot_getattr(s1, attno, );
247:datum2 = slot_getattr(s2, attno, );

accesses all columns in the sort key, convenient place to prepare
(ExecInitMergeAppend).


src/backend/executor/execQual.c
594: * caught inside slot_getattr).  What we have to check for here is the
600: * Note: we allow a reference to a dropped attribute.  slot_getattr will
637:return slot_getattr(slot, attnum, isNull);
676:return slot_getattr(slot, attnum, isNull);
1681:   return slot_getattr(fcache->funcResultSlot, 1, 
isNull);

Already converted in proposed expression evaluation patch.


src/backend/executor/nodeSubplan.c
367:dvalue = slot_getattr(slot, 1, );
395:prmdata->value = slot_getattr(slot, col, 
&(prmdata->isnull));
565:prmdata->value = slot_getattr(slot, col,
1015:   dvalue = slot_getattr(slot, 1, );

Accesses all params, convenient location to compute column number
(ExecInitSubPlan).


src/backend/executor/execCurrent.c
186:/* Use slot_getattr to catch any possible mistakes */
188:
DatumGetObjectId(slot_getattr(scanstate->ss_ScanTupleSlot,
193:
DatumGetPointer(slot_getattr(scanstate->ss_ScanTupleSlot,

Accesses only system columns.


src/backend/executor/nodeSetOp.c
107:flag = DatumGetInt32(slot_getattr(inputslot,

Not an actual column.


src/backend/executor/nodeGatherMerge.c
678:datum1 = slot_getattr(s1, attno, );
679:datum2 = slot_getattr(s2, attno, );

Same as mergeAppend (huh, why did we copy this code), i.e. beneficial.


src/backend/executor/functions.c
970:value = slot_getattr(slot, 1, &(fcinfo->isnull));

no benefit.


src/backend/executor/execJunk.c
253:return slot_getattr(slot, attno, isNull);

no benefit.


src/backend/executor/nodeNestloop.c
136:prm->value = slot_getattr(outerTupleSlot,

accesses all future param values, convenient place to compute column
number.


src/backend/executor/execGrouping.c
100:attr1 = slot_getattr(slot1, att, );
102:attr2 = slot_getattr(slot2, att, );
170:attr1 = slot_getattr(slot1, att, );
175:attr2 = slot_getattr(slot2, att, );
501:attr = slot_getattr(slot, att, );

accesses all grouped-on values, but only some have a convenient place to
compute column number.



Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Tom Lane
Andres Freund  writes:
> [ new patches ]

I've started to look at 0004, and the first conclusion I've come to
is that it's *direly* short of documentation.  To the point that I'd
vote against committing it if something isn't done about that.  As
an example, it's quite unclear how ExprEvalSteps acquire correct
resnull/resvalue pointers, and the algorithm for that seems nontrivial.
It doesn't help any that the arguments of ExecInitExprRec are entirely
undocumented.

I think it would be worth creating a README file giving an overview
of how all of this patch is supposed to work.  You also need to do a
whole lot more work on the function-level comments.

A specific thing I noticed in the particular area of
what-gets-returned-where is this bit in EEOP_SCALARARRAYOP setup:

+/*
+ * Evaluate array argument into our return value, overwrite
+ * with comparison results afterwards.
+ */
+ExecInitExprRec((Expr *) lsecond(opexpr->args), parent, state,
+resv, resnull);

That scares me quite a bit, because it smells exactly like the sort of
premature optimization that bit us on the rear in CVE-2016-5423 (cf commit
f0c7b789a).  What's it really buying us to overwrite the return value
early rather than storing into the fcinfo's second argument slot?
(The memory of that CVE is part of what's prompting me to demand a clear
explanation of the algorithm for deciding where steps return their
results.  Even if this particular code is safe, somebody is going to do
something unsafe in future if there's not a documented rule to follow.)

Another thing that ties into the do-I-understand-this-at-all question
is this bit:

+EEO_CASE(EEOP_BOOL_AND_STEP_FIRST)
+{
+*op->d.boolexpr.anynull = false;
+
+/*
+ * Fallthrough (can't be last - ANDs have two arguments at least).
+ */
+}
+
+EEO_CASE(EEOP_BOOL_AND_STEP)

It seems like this is missing an "op++;" before falling through.  If it
isn't, because really BOOL_AND_STEP_FIRST is defined as clearing anynull
and then also doing a regular BOOL_AND_STEP, then the comment seems rather
off point.  It should be more like "Fall through to do regular AND step
processing as well".  The place where the comment would be on point
is probably over here:

+case AND_EXPR:
+/*
+ * ANDs have at least two arguments, so that
+ * no step needs to be both FIRST and LAST.
+ */
+Assert(list_length(boolexpr->args) >= 2);
+
+if (off == 0)
+scratch.opcode = EEOP_BOOL_AND_STEP_FIRST;
+else if (off + 1 == nargs)
+scratch.opcode = EEOP_BOOL_AND_STEP_LAST;
+else
+scratch.opcode = EEOP_BOOL_AND_STEP;
+break;

although I think the Assert ought to be examining nargs not
list_length(boolexpr->args) so that it has some visible connection to the
code after it.  (This all applies to OR processing as well, of course.)

BTW, it sure seems like ExecInitExprRec and related code ought to set
off all sorts of valgrind complaints?  It's not being careful at all
to ensure that all fields of the "scratch" record get initialized before
we memcpy it to someplace.

regards, tom lane




Re: [HACKERS] Measuring replay lag

2017-03-15 Thread Thomas Munro
On Thu, Mar 16, 2017 at 12:07 PM, Simon Riggs  wrote:
> There are two ways of knowing the lag: 1) by measurement/sampling,
> which is the main way this patch approaches this, 2) by direct
> observation the LSNs match. Both are equally valid ways of
> establishing knowledge. Strangely (2) is the only one of those that is
> actually precise and yet you say it is bogus. It is actually the
> measurements which are approximations of the actual state.
>
> The reality is that the lag can change discontinuously between zero
> and non-zero. I don't think we should hide that from people.
>
> I suspect that your "entirely bogus" feeling comes from the point that
> we actually have 3 states, one of which has unknown lag.
>
> A) "Currently caught-up"
> WALSender LSN == WALReceiver LSN (info type (1))
> At this point the current lag is known precisely to be zero.
>
> B) "Work outstanding, no reply yet"
> Immediately after where WALSenderLSN > WALReceiverLSN, yet we haven't
> yet received new reply
> We expect to stay in this state for however long it takes to receive a
> reply, which could be wal_receiver_status_interval or longer if the
> lag is greater. At this point we have no measurement of what the lag
> is. We could reply NULL since we don't know. We could reply with the
> last measured lag when we were last in state C, but if the new reply
> was delayed for more than that we'd need to reply that the lag is at
> least as high as the delay since last time we left state A.
>
> C) "Continuous flow"
> WALSenderLSN > WALReceiverLSN and we have received a reply
> (measurement, info type (2))
> This is the main case. Easy-ish!
>
> So I think we need to first agree that A and B states exist and how to
> report lag in each state.

I agree that these states exist, but we disagree on what 'lag' really
means, or, rather, which of several plausible definitions would be the
most useful here.

My proposal is that the *_lag columns should always report how long it
took for recently written, flushed and applied WAL to be written,
flushed and applied (and for the primary to know about it).  By this
definition, sent LSN = applied LSN is not a special case: we simply
report how long that LSN took to be written, flushed and applied.

Your proposal is that the *_lag columns should report how far in the
past the standby is at each of the three stages with respect to the
current end of WAL.  By this definition when sent LSN = applied LSN we
are currently in the 'A' state meaning 'caught up' and should show
00:00:00.

Here are two reasons I prefer my definition:

* you can trivially convert from my definition to yours on the basis
of existing information: CASE WHEN sent_location = replay_location
THEN '00:00:00'::interval ELSE replay_lag END, but there is no way to
get from your definition to mine

* lag numbers reported using my definition tell you how long each of
the synchronous replication levels take, but with your definition they
only do that if you catch them during times when they aren't showing
the special case 00:00:00; a fast standby running any workload other
than a benchmark is often going to show all-caught-up 00:00:00 so the
new columns will be useless for that purpose

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Dilip Kumar
On Thu, Mar 16, 2017 at 5:02 AM, Dilip Kumar  wrote:
> After above fix, I am not able to reproduce. Can you give me the
> backtrace of the crash location or the dump?
>
> I am trying on the below commit
>
> commit c5832346625af4193b1242e57e7d13e66a220b38
> Author: Stephen Frost 
> Date:   Wed Mar 15 11:19:39 2017 -0400
>
> + 
> https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch
> + fix_tbm_empty.patch

Forgot to mention after fix I am seeing this output.

postgres=# explain analyze select * from only r2 where i = 10;
  QUERY PLAN
---
 Gather  (cost=2880.56..9251.98 rows=1 width=4) (actual
time=3.857..3.857 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Bitmap Heap Scan on r2  (cost=1880.56..8251.88 rows=1
width=4) (actual time=0.043..0.043 rows=0 loops=3)
 Recheck Cond: (i = 10)
 ->  Bitmap Index Scan on r2_i_idx  (cost=0.00..1880.56
rows=373694 width=0) (actual time=0.052..0.052 rows=0 loops=1)
   Index Cond: (i = 10)
 Planning time: 0.111 ms
 Execution time: 4.449 ms
(9 rows)

postgres=# select * from only r2 where i = 10;
 i
---
(0 rows)

Are you getting the crash with the same test case?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Dilip Kumar
On Thu, Mar 16, 2017 at 12:56 AM, Emre Hasegeli  wrote:
>> Please verify the fix.
>
> The same test with both of the patches applied still crashes for me.
After above fix, I am not able to reproduce. Can you give me the
backtrace of the crash location or the dump?

I am trying on the below commit

commit c5832346625af4193b1242e57e7d13e66a220b38
Author: Stephen Frost 
Date:   Wed Mar 15 11:19:39 2017 -0400

+ 
https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch
+ fix_tbm_empty.patch


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Measuring replay lag

2017-03-15 Thread Simon Riggs
On 14 March 2017 at 07:39, Thomas Munro  wrote:
> Hi,
>
> Please see separate replies to Simon and Craig below.
>
> On Sun, Mar 5, 2017 at 8:38 PM, Simon Riggs  wrote:
>> On 1 March 2017 at 10:47, Thomas Munro  wrote:
>>> I do see why a new user trying this feature for the first time might
>>> expect it to show a lag of 0 just as soon as sent LSN =
>>> write/flush/apply LSN or something like that, but after some
>>> reflection I suspect that it isn't useful information, and it would be
>>> smoke and mirrors rather than real data.
>>
>> Perhaps I am misunderstanding the way it works.
>>
>> If the last time WAL was generated the lag was 14 secs, then nothing
>> occurs for 2 hours aftwards AND all changes have been successfully
>> applied then it should not continue to show 14 secs for the next 2
>> hours.
>>
>> IMHO the lag time should drop to zero in a reasonable time and stay at
>> zero for those 2 hours because there is no current lag.
>>
>> If we want to show historical lag data, I'm supportive of the idea,
>> but we must report an accurate current value when the system is busy
>> and when the system is quiet.
>
> Ok, I thought about this for a bit and have a new idea that I hope
> will be more acceptable.  Here are the approaches considered:
>
> 1.  Show the last measured lag times on a completely idle system until
> such time as the standby eventually processes more lag, as I had it in
> the v5 patch.  You don't like that and I admit that it is not really
> satisfying (even though I know that real Postgres systems always
> generate more WAL fairly soon even without user sessions, it's not
> great that it depends on an unknown future event to clear the old
> data).
>
> 2.  Recognise when the last reported write/flush/apply LSN from the
> standby == end of WAL on the sending server, and show lag times of
> 00:00:00 in all three columns.  I consider this entirely bogus: it's
> not an actual measurement that was ever made, and on an active system
> it would flip-flop between real measurements and the artificial
> 00:00:00.  I do not like this.

There are two ways of knowing the lag: 1) by measurement/sampling,
which is the main way this patch approaches this, 2) by direct
observation the LSNs match. Both are equally valid ways of
establishing knowledge. Strangely (2) is the only one of those that is
actually precise and yet you say it is bogus. It is actually the
measurements which are approximations of the actual state.

The reality is that the lag can change discontinuously between zero
and non-zero. I don't think we should hide that from people.

I suspect that your "entirely bogus" feeling comes from the point that
we actually have 3 states, one of which has unknown lag.

A) "Currently caught-up"
WALSender LSN == WALReceiver LSN (info type (1))
At this point the current lag is known precisely to be zero.

B) "Work outstanding, no reply yet"
Immediately after where WALSenderLSN > WALReceiverLSN, yet we haven't
yet received new reply
We expect to stay in this state for however long it takes to receive a
reply, which could be wal_receiver_status_interval or longer if the
lag is greater. At this point we have no measurement of what the lag
is. We could reply NULL since we don't know. We could reply with the
last measured lag when we were last in state C, but if the new reply
was delayed for more than that we'd need to reply that the lag is at
least as high as the delay since last time we left state A.

C) "Continuous flow"
WALSenderLSN > WALReceiverLSN and we have received a reply
(measurement, info type (2))
This is the main case. Easy-ish!

So I think we need to first agree that A and B states exist and how to
report lag in each state.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-15 Thread Julien Rouhaud
On Wed, Feb 15, 2017 at 02:53:44PM +1100, Haribabu Kommi wrote:
> Here I attached patch that implements the view.
> I will add this patch to next commitfest.

Hello,

I just reviewed the patch.

First, there are some whitespace issues that make git-apply complain (at
least lines 218 and 396).

Patch is rather straightforward and works as expected, doc compiles without
issue.

I only found some minor issues:

+  One row only, showing statistics about the
+   wal writing activity. See

+  Number of wal writes that are carried out by the backend 
process


WAL should be uppercase (and for some more occurrences).

+  
+  Number of wal writes that are carried out by background workers such as 
checkpointer,
+  writer and walwriter.

I guess you meant background processes?

>+  This field data will be populated only when the track_io_timing GUC is 
>enabled
(and similar occurrences)

track_io_timing should link to  instead of
mentioning the GUC.

I think you also need to update the track_io_timing description in
sgml/config.sgml to mention this new view.


+   else
+   {
+   LocalWalWritesStats.m_wal_total_write_time = 0;
+   }
(and similar ones)

The brackets seem unnecessary.
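
i.e. just:

    else
        LocalWalWritesStats.m_wal_total_write_time = 0;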

I marked the commitfest entry as waiting on author.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-15 18:16:57 -0400, Tom Lane wrote:
>> [ scratches head... ]  What deforming logic do you think I'm proposing
>> removing?

> I thought you were suggesting that we don't do the get_last_attnums (and
> inlined version in the isSimpleVar case) at execution time anymore,
> instead relying on logic in the planner to know how much to deform ahead
> of time.  Then we'd do slot_getsomeattrs in the appropriate places.  But
> I understood you suggesting to do so only in scan nodes - which doesn't
> seem sufficient, due to the use of materialized / minimal tuples in
> other types of nodes.  Did I misunderstand?

We would need to do it anywhere that we might be projecting from a
materialized tuple, I suppose.  From the planner's standpoint it would be
about as easy to do this for all plan nodes as only selected ones.

Anyway the core point here seems to be that skipping ExecProject misses a
bet because the underlying tuple doesn't get disassembled.  A quick-hack
way of seeing if this helps might be to do slot_getallattrs in the places
where we skip ExecProject.  I'm not sure we'd want that as a permanent
solution, because it's possible that it extracts columns not actually
needed, but it should at least move the hotspot in your example case.
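
An untested sketch of that quick hack, at the point where a scan node
skips projection:

    if (!projInfo)
    {
        /* no projection step will deform the tuple for us, so do it here */
        slot_getallattrs(slot);
        return slot;
    }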

regards, tom lane




Re: [HACKERS] Measuring replay lag

2017-03-15 Thread Simon Riggs
On 14 March 2017 at 07:39, Thomas Munro  wrote:
>
> On Mon, Mar 6, 2017 at 3:22 AM, Craig Ringer  wrote:
>> On 5 March 2017 at 15:31, Simon Riggs  wrote:
>>> What we want from this patch is something that works for both, as much
>>> as that is possible.
>>
>> If it shows a sawtooth pattern for flush lag, that's good, because
>> it's truthful. We can only flush after we replay commit, therefore lag
>> is always going to be sawtooth, with tooth size approximating xact
>> size and the baseline lag trend representing any sustained increase or
>> decrease in lag over time.
>>
>> This would be extremely valuable to have.
>
> Thanks for your detailed explanation Craig.  (I also had a chat with
> Craig about this off-list.)  Based on your feedback, I've added
> support for reporting lag from logical replication, warts and all.
>
> Just a thought:  perhaps logical replication could consider
> occasionally reporting a 'write' position based on decoded WAL written
> to reorder buffers (rather than just reporting the apply LSN as write
> LSN at commit time)?  I think that would be interesting information in
> its own right, but would also provide more opportunities to
> interpolate the flush/apply sawtooth for large transactions.
>
> Please find a new version attached.

My summary is that with logical the values only change at commit time.
With a stream of small transactions there shouldn't be any noticeable
sawtooth.

Please put in a substantive comment, rather than just "See explanation
in XLogSendPhysical" cos that's clearly not enough. Please write docs
so I can commit this.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Andres Freund
On 2017-03-15 18:16:57 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-15 17:33:46 -0400, Tom Lane wrote:
> >> We could make the planner mark each table scan node with the highest
> >> column number that the plan will access, and use that to drive a
> >> slot_getsomeattrs call in advance of any access to tuple contents.
> 
> > probably isn't sufficient - we build non-virtual tuples in a good number
> > of places (sorts, tuplestore using stuff like nodeMaterial, nodeHash.c
> > output, ...).  I suspect it'd have measurable negative consequences if
> > we removed the deforming logic for all expressions/projections above
> > such nodes.  I guess we could just do such a logic for every Plan node?
> 
> [ scratches head... ]  What deforming logic do you think I'm proposing
> removing?

I thought you were suggesting that we don't do the get_last_attnums (and
inlined version in the isSimpleVar case) at execution time anymore,
instead relying on logic in the planner to know how much to deform ahead
of time.  Then we'd do slot_getsomeattrs in the appropriate places.  But
I understood you suggesting to do so only in scan nodes - which doesn't
seem sufficient, due to the use of materialized / minimal tuples in
other types of nodes.  Did I misunderstand?

- Andres




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-15 17:33:46 -0400, Tom Lane wrote:
>> We could make the planner mark each table scan node with the highest
>> column number that the plan will access, and use that to drive a
>> slot_getsomeattrs call in advance of any access to tuple contents.

> probably isn't sufficient - we build non-virtual tuples in a good number
> of places (sorts, tuplestore using stuff like nodeMaterial, nodeHash.c
> output, ...).  I suspect it'd have measurable negative consequences if
> we removed the deforming logic for all expressions/projections above
> such nodes.  I guess we could just do such a logic for every Plan node?

[ scratches head... ]  What deforming logic do you think I'm proposing
removing?

regards, tom lane




Re: [HACKERS] scram and \password

2017-03-15 Thread Michael Paquier
On Thu, Mar 16, 2017 at 6:46 AM, Joe Conway  wrote:
> On 03/15/2017 08:48 AM, Michael Paquier wrote:
>> I have been hacking my way through this thing, and making
>> scram_build_verifier requires a bit more refactoring than I
>> first thought:
>> - stored and server keys are hex-encoded using a backend-only routine.
>> I think that those should be instead base64-encoded using what has
>> already been moved in src/common/.
>
> Or possibly make the hex routines available in the front end as well
> instead?

Yeah, but keeping in mind that src/common/ should be kept as small as
possible, the base64 switch made more sense (hex_encode is really
small, I know).
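
For reference, encoding one of the keys with the src/common/ routines
would look about like this (sketch; SCRAM_KEY_LEN is from
scram-common.h, stored_key is the raw key buffer):

    char   *encoded;
    int     encoded_len;

    /* allocate for the base64 text plus a terminating NUL */
    encoded = palloc(pg_b64_enc_len(SCRAM_KEY_LEN) + 1);
    encoded_len = pg_b64_encode((const char *) stored_key,
                                SCRAM_KEY_LEN, encoded);
    encoded[encoded_len] = '\0';
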
-- 
Michael




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Andres Freund
On 2017-03-15 17:33:46 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-15 16:07:14 -0400, Tom Lane wrote:
> >> As for ExecHashGetHashValue, it's most likely going to be working from
> >> virtual tuples passed up to the join, which won't benefit from
> >> predetermination of the last column to be accessed.  The
> >> tuple-deconstruction would have happened while projecting in the scan
> >> node below.
> 
> > I think the physical tuple stuff commonly thwarts that argument?  On
> > master for tpch's Q5 you can e.g. see the following profile (master):
> 
> Hmmm ... I think you're mistaken in fingering the physical-tuple
> optimization per se, but maybe skipping ExecProject at the scan level
> would cause this result?

I think those are often related (i.e. we replace a smaller targetlist
with a "physical" one, which then allows to skip ExecProject()).


> I've thought for some time that it was dumb to have the executor
> reverse-engineering this info at plan startup anyway.

Yea, it'd be good if this (and some similar tasks like building interim
tuple descriptors) could be moved to the planner.  But:

> We could make the planner mark each table scan node with the highest
> column number that the plan will access, and use that to drive a
> slot_getsomeattrs call in advance of any access to tuple contents.

probably isn't sufficient - we build non-virtual tuples in a good number
of places (sorts, tuplestore using stuff like nodeMaterial, nodeHash.c
output, ...).  I suspect it'd have measurable negative consequences if
we removed the deforming logic for all expressions/projections above
such nodes.  I guess we could just do such a logic for every Plan node?

- Andres




Re: [HACKERS] scram and \password

2017-03-15 Thread Joe Conway
On 03/15/2017 08:48 AM, Michael Paquier wrote:
> I have been hacking my way through this thing, and making
> scram_build_verifier requires a bit more refactoring than I
> first thought:
> - stored and server keys are hex-encoded using a backend-only routine.
> I think that those should be instead base64-encoded using what has
> already been moved in src/common/.

Or possibly make the hex routines available in the front end as well
instead?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-15 16:07:14 -0400, Tom Lane wrote:
>> As for ExecHashGetHashValue, it's most likely going to be working from
>> virtual tuples passed up to the join, which won't benefit from
>> predetermination of the last column to be accessed.  The
>> tuple-deconstruction would have happened while projecting in the scan
>> node below.

> I think the physical tuple stuff commonly thwarts that argument?  On
> master for tpch's Q5 you can e.g. see the following profile (master):

Hmmm ... I think you're mistaken in fingering the physical-tuple
optimization per se, but maybe skipping ExecProject at the scan level
would cause this result?

I've thought for some time that it was dumb to have the executor
reverse-engineering this info at plan startup anyway.  We could make the
planner mark each table scan node with the highest column number that the
plan will access, and use that to drive a slot_getsomeattrs call in
advance of any access to tuple contents.

regards, tom lane




Re: [HACKERS] parallelize queries containing initplans

2017-03-15 Thread Kuntal Ghosh
On Tue, Mar 14, 2017 at 3:20 PM, Amit Kapila  wrote:
> Based on that idea, I have modified the patch such that it will
> compute the set of InitPlan params that are required below the Gather
> node and store them as a bitmap of InitPlan params at the Gather node.
> During set_plan_references, we can find the intersection of the external
> parameters that are required at the Gather node or nodes below it with
> the InitPlans that are passed from the same or an above query level.
> Once the set of InitPlan params is established, we evaluate them (if
> not already evaluated) before executing the Gather node and then pass
> the computed values to each of the workers.  To identify whether a
> particular param is parallel-safe or not, we check whether its paramid
> exists in the InitPlans at the same or an above query level.  We don't
> allow generating a gather path if there are InitPlans at some query
> level below the current query level, as those plans could be
> parallel-unsafe or undirect correlated plans.

I would like to mention different test scenarios with InitPlans that
we've considered while developing and testing the patch.

An InitPlan is a subselect that doesn't reference anything from its
immediate outer query level; it returns a param value. For example,
consider the following plan:

  QUERY PLAN
--
 Seq Scan on t1
   Filter: (k = $0)
   allParams: $0
   InitPlan 1 (returns $0)
     ->  Aggregate
           ->  Seq Scan on t3
In this case, the InitPlan is evaluated once, when the filter is
checked for the first time. For subsequent checks, we need not
evaluate the InitPlan again since we already have the value. In our
approach, we parallelize the sequential scan by inserting a Gather
node on top of the parallel sequential scan node. At the Gather node,
we evaluate the InitPlan before spawning the workers and pass the
value to the workers using dynamic shared memory. This yields the
following plan:
QUERY PLAN
---
 Gather
   Workers Planned: 2
   Params Evaluated: $0
   InitPlan 1 (returns $0)
     ->  Aggregate
           ->  Seq Scan on t3
   ->  Parallel Seq Scan on t1
         Filter: (k = $0)
As Amit mentioned up in the thread, at a Gather node, we evaluate only
those InitPlans that are attached to this query level or any higher
one and are used under the Gather node. extParam at a node includes
the InitPlan params that should be passed from an outer node. I've
attached a patch to show extParams and allParams for each node. Here
is the output with that patch:
QUERY PLAN
---
 Gather
   Workers Planned: 2
   Params Evaluated: $0
   allParams: $0
   InitPlan 1 (returns $0)
     ->  Finalize Aggregate
           ->  Gather
                 Workers Planned: 2
                 ->  Partial Aggregate
                       ->  Parallel Seq Scan on t3
   ->  Parallel Seq Scan on t1
         Filter: (k = $0)
         allParams: $0
         extParams: $0
In this case, $0 is included in the extParam of the parallel sequential
scan, and the InitPlan corresponding to this param is attached to the
same query level that contains the Gather node. Hence, we evaluate $0 at
the Gather node and pass it to the workers.

But generating a plan like this requires marking an InitPlan param as
parallel_safe. We can't mark all params as parallel_safe because of
correlated subselects. Hence, in max_parallel_hazard_walker, the only
params marked safe are InitPlan params from the current or an outer
query level. An InitPlan param from an inner query level isn't marked
safe, since we can't evaluate such a param at any Gather node above the
current node (where the param is used). As mentioned by Amit, we also
don't allow generating a gather path if there are InitPlans at some
query level below the current query level, as those plans could be
parallel-unsafe or undirect correlated plans.
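
The shape of that check might be roughly as follows (hypothetical
sketch; safe_param_ids stands for whatever the patch tracks for
InitPlan params from the current and outer query levels):

    /* in max_parallel_hazard_walker() */
    if (IsA(node, Param))
    {
        Param  *param = (Param *) node;

        if (param->paramkind != PARAM_EXEC ||
            !list_member_int(context->safe_param_ids, param->paramid))
        {
            /* not a known-safe InitPlan param: parallel restricted */
            if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
                return true;
        }
        return false;           /* nothing to recurse into */
    }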

I've attached a script file and its output containing several
scenarios relevant to InitPlans. I've also attached the patch for
displaying extParam and allParam at each node. This patch can be
applied on top of pq_pushdown_initplan_v3.patch. Please find the
attachments.


> This restricts some of the cases for parallelism like when initplans
> are below gather node, but the patch looks better. We can open up
> those cases if required in a separate patch.
+1. Unfortunately, this patch doesn't enable parallelism for all
possible cases with InitPlans. Our objective is to keep things simple
and clean. Still, TPC-H q22 runs 2.5~3 times faster with this patch.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


script.out
Description: Binary data


script.sql
Description: application/sql


0002-Show-extParams-and-allParams-in-explain.patch
Description: binary/octet-stream


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Andres Freund
On 2017-03-15 16:07:14 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-15 15:41:22 -0400, Tom Lane wrote:
> >> Color me dubious.  Which specific other places have you got in mind, and
> >> do they have expression trees at hand that would tell them which columns
> >> they really need to pull out?
>
> > I was thinking of execGrouping.c's execTuplesMatch(),
> > TupleHashTableHash() (and unequal, but doubt that matters
> > performancewise).  There's also nodeHash.c's ExecHashGetHashValue(), but I
> > think that'd possibly be better fixed differently.
>
> The execGrouping.c functions don't have access to an expression tree
> instructing them which columns to pull out of the tuple, so I fail to see
> how get_last_attnums() would be of any use to them.

I presume most of the callers do.  We'd have to change the API somewhat,
unless we just have a small loop in execTuplesMatch() determining the
biggest column index (which might be worthwhile / acceptable).
TupleHashTableHash() should be able to have that pre-computed in
BuildTupleHashTable().  Might be more viable to go that way.
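
E.g. for execTuplesMatch() the small loop could be as simple as
(sketch):

    AttrNumber  maxatt = 0;
    int         i;

    /* find the widest column referenced, then deform both slots in one go */
    for (i = 0; i < numCols; i++)
        maxatt = Max(maxatt, matchColIdx[i]);
    slot_getsomeattrs(slot1, maxatt);
    slot_getsomeattrs(slot2, maxatt);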


> As for ExecHashGetHashValue, it's most likely going to be working from
> virtual tuples passed up to the join, which won't benefit from
> predetermination of the last column to be accessed.  The
> tuple-deconstruction would have happened while projecting in the scan
> node below.

I think the physical tuple stuff commonly thwarts that argument?  On
master for tpch's Q5 you can e.g. see the following profile (master):

+   29.38%  postgres  postgres  [.] ExecScanHashBucket
+   16.72%  postgres  postgres  [.] slot_getattr
+5.51%  postgres  postgres  [.] heap_getnext
-5.50%  postgres  postgres  [.] slot_deform_tuple
   - 98.07% slot_deform_tuple
  - 85.98% slot_getattr
 - 96.59% ExecHashGetHashValue
- ExecHashJoin
   - ExecProcNode
  + 85.12% ExecHashJoin
  + 14.88% MultiExecHash
 + 3.41% ExecMakeFunctionResultNoSets
  + 14.02% slot_getsomeattrs
   + 1.58% ExecEvalScalarVarFast

I.e. nearly all calls for slot_deform_tuple are from slot_getattrs in
ExecHashGetHashValue().  And nearly all the time in slot_getattr is
spent on code only executed for actual tuples:

   │   if (tuple == NULL)  /* internal 
error */
  0.18 │ test   %rax,%rax
   │   ↓ je 223
   │*
   │* (We have to check this separately because of various 
inheritance and
   │* table-alteration scenarios: the tuple could be either 
longer or shorter
   │* than the tupdesc.)
   │*/
   │   tup = tuple->t_data;
  0.47 │ mov0x10(%rax),%rsi
   │   if (attnum > HeapTupleHeaderGetNatts(tup))
 75.42 │ movzwl 0x12(%rsi),%eax
  0.70 │ and$0x7ff,%eax
  0.47 │ cmp%eax,%ebx
   │   ↓ jg e8

- Andres




Re: [HACKERS] 2017-03 Commitfest Midterm

2017-03-15 Thread David Steele
On 3/15/17 4:38 PM, Alvaro Herrera wrote:
> David Steele wrote:
>> We are now halfway through the 2017-03 CF.  Here's the breakdown:
>>
>> Needs review: 83 (-45)
>> Waiting on Author: 36 (+10)
>> Ready for Committer: 19 (-6)
>> Total: 208 (+28)
>>
>> It's interesting that there are 28 more patches than there were on the
>> 1st.  Either I had a copy-paste-o or the patches are multiplying on
>> their own.
> 
> Adding those three numbers, you get 138.  The 208 also includes
> Committed and the other closed states, which you didn't count the first
> time around I think.

Thanks, Álvaro. That's it exactly.  I forgot I had created my own total
to exclude actions that happened before the CF started.

-- 
-David
da...@pgmasters.net




Re: [HACKERS] 2017-03 Commitfest Midterm

2017-03-15 Thread Alvaro Herrera
David Steele wrote:
> We are now halfway through the 2017-03 CF.  Here's the breakdown:
> 
> Needs review: 83 (-45)
> Waiting on Author: 36 (+10)
> Ready for Committer: 19 (-6)
> Total: 208 (+28)
> 
> It's interesting that there are 28 more patches than there were on the
> 1st.  Either I had a copy-paste-o or the patches are multiplying on
> their own.

Adding those three numbers, you get 138.  The 208 also includes
Committed and the other closed states, which you didn't count the first
time around I think.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] 2017-03 Commitfest Midterm

2017-03-15 Thread David Steele
We are now halfway through the 2017-03 CF.  Here's the breakdown:

Needs review: 83 (-45)
Waiting on Author: 36 (+10)
Ready for Committer: 19 (-6)
Total: 208 (+28)

It's interesting that there are 28 more patches than there were on the
1st.  Either I had a copy-paste-o or the patches are multiplying on
their own.

If you are a patch author and your patch is in the "Waiting for Author"
state, please be sure to provide an updated patch as soon as possible.
Next week we will need to start making some hard choices about what is
possible for v10 and what is not.

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Microvacuum support for Hash Index

2017-03-15 Thread Robert Haas
On Wed, Mar 15, 2017 at 3:54 PM, Ashutosh Sharma  wrote:
> Changed as per suggestions. Attached v9 patch. Thanks.

Wow, when do you sleep?   Will have a look.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Odd listen_addresses behavior

2017-03-15 Thread Joshua D. Drake

On 03/15/2017 12:57 PM, Tom Lane wrote:


I'm suspicious that you have an override of listen_addresses somewhere ---
for instance, the "-i" postmaster command line switch effectively is
--listen-addresses='*'.


Yep postgresql.auto.conf.

Thanks, sorry for the noise.

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-15 15:41:22 -0400, Tom Lane wrote:
>> Color me dubious.  Which specific other places have you got in mind, and
>> do they have expression trees at hand that would tell them which columns
>> they really need to pull out?

> I was thinking of execGrouping.c's execTuplesMatch(),
> TupleHashTableHash() (and unequal, but doubt that matters
> performancewise).  There's also nodeHash.c's ExecHashGetHashValue(), but I
> think that'd possibly be better fixed differently.

The execGrouping.c functions don't have access to an expression tree
instructing them which columns to pull out of the tuple, so I fail to see
how get_last_attnums() would be of any use to them.  As for
ExecHashGetHashValue, it's most likely going to be working from virtual
tuples passed up to the join, which won't benefit from predetermination
of the last column to be accessed.  The tuple-deconstruction would have
happened while projecting in the scan node below.

regards, tom lane




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Andres Freund
On 2017-03-15 15:41:22 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On March 15, 2017 12:33:28 PM PDT, Tom Lane  wrote:
> >> So for my money, you should drop 0002 altogether and just have 0004
> >> remove get_last_attnums() from execUtils.c and stick it into
> >> execExpr.c.  I suppose you still need the LastAttnumInfo API change
> >> so as to decouple it from ProjectionInfo, but that's minor.
> 
> > I think it's quite useful in other places too, we do repeated slot-getattrs 
> > in a bunch of places in the executor, and it's noticeable performance wise. 
> 
> Color me dubious.  Which specific other places have you got in mind, and
> do they have expression trees at hand that would tell them which columns
> they really need to pull out?

I was thinking of execGrouping.c's execTuplesMatch(),
TupleHashTableHash() (and unequal, but doubt that matters
performancewise).  There's also nodeHash.c's ExecHashGetHashValue(), but I
think that'd possibly be better fixed differently.

Greetings,

Andres Freund




Re: [HACKERS] Odd listen_addresses behavior

2017-03-15 Thread Tom Lane
"Joshua D. Drake"  writes:
> jd@jd-wks:~/snap/postgresql96/common$ grep listen_addresses 
> data/postgresql.conf
> listen_addresses = '192*' # what IP address(es) to listen on;

> -- I wasn't actually expecting the above to work. I was just testing.

Fails as expected for me:

$ postgres --listen-addresses='192*'
2017-03-15 15:50:11.024 EDT [3852] LOG:  could not translate host name "192*", 
service "5432" to address: Name or service not known
2017-03-15 15:50:11.024 EDT [3852] WARNING:  could not create listen socket for 
"192*"
2017-03-15 15:50:11.024 EDT [3852] FATAL:  could not create any TCP/IP sockets
2017-03-15 15:50:11.024 EDT [3852] LOG:  database system is shut down


> postgres=# show listen_addresses ;
>   listen_addresses
> --
>   *
> (1 row)

I'm suspicious that you have an override of listen_addresses somewhere ---
for instance, the "-i" postmaster command line switch effectively is
--listen-addresses='*'.  Even if you had a version of getnameinfo() that
failed to complain about '192*', that would not cause the recorded value
of the string GUC to silently transmogrify into something else.  You might
look into pg_settings to see where it says that value of listen_addresses
came from.

regards, tom lane




Re: [HACKERS] Microvacuum support for Hash Index

2017-03-15 Thread Ashutosh Sharma
> +action = XLogReadBufferForRedo(record, 0, );
> +
> +if (!IsBufferCleanupOK(buffer))
> +elog(PANIC, "hash_xlog_vacuum_one_page: failed to acquire
> cleanup lock");
>
> That could fail, I think, because of a pin from a Hot Standby backend.
> You want to call XLogReadBufferForRedoExtended() with a third argument
> of true.

Yes, there is a possibility that a new backend may start on the standby
after we kill the conflicting backends. If the new backend has a pin on
the buffer which the startup process is trying to read, then
'IsBufferCleanupOK' will fail, thereby causing the startup process to
PANIC.
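
So the redo code would become something like this (sketch, per your
suggestion):

    /*
     * Request a cleanup lock, so that replay waits for any standby
     * backend that still has the buffer pinned instead of PANICing.
     */
    action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
                                           true, &buffer);
    if (action == BLK_NEEDS_REDO)
    {
        /* ... remove the dead tuples ... */
    }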

> Come to think of it, shouldn't hash_xlog_split_allocate_page
> be changed the same way?

No; the reason is that we are allocating a new bucket page on the
standby, so there can't be any backend holding a pin on a page that is
yet to be initialized.

>
> +/*
> + * Let us mark the page as clean if vacuum removes the DEAD 
> tuples
> + * from an index page. We do this by clearing
> LH_PAGE_HAS_DEAD_TUPLES
> + * flag.
> + */
>
> Maybe add: Clearing this flag is just a hint; replay won't redo this.

Added. Please check the attached v9 patch.

>
> + * Hash Index records that are marked as LP_DEAD and being removed during
> + * hash index tuple insertion can conflict with standby queries.You might
>
> The word Index shouldn't be capitalized here.  There should be a space
> before "You".

Corrected.

>
> The formatting of this comment is oddly narrow:
>
> + * _hash_vacuum_one_page - vacuum just one index page.
> + * Try to remove LP_DEAD items from the given page.  We
> + * must acquire cleanup lock on the page being modified
> + * before calling this function.
>
> I'd add a blank line after the first line and reflow the rest to be
> something more like 75 characters.  pgindent evidently doesn't think
> this needs reformatting, but it's oddly narrow.

Corrected.

>
> I suggest changing the header comment of
> hash_xlog_vacuum_get_latestRemovedXid like this:
>
> + * Get the latestRemovedXid from the heap pages pointed at by the index
> + * tuples being deleted.  See also btree_xlog_delete_get_latestRemovedXid,
> + * on which this function is based.
>
> This is looking good.

Changed as per suggestions. Attached v9 patch. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


microvacuum_hash_index_v9.patch
Description: application/download



[HACKERS] Odd listen_addresses behavior

2017-03-15 Thread Joshua D. Drake

-hackers,

I found this today:

jd@jd-wks:~/snap/postgresql96/common/data$ 
/snap/postgresql96/19/usr/bin/pg_ctl -D data stop

pg_ctl: directory "data" does not exist

jd@jd-wks:~/snap/postgresql96/common/data$ cd ..

jd@jd-wks:~/snap/postgresql96/common$ 
/snap/postgresql96/19/usr/bin/pg_ctl -D data stop

waiting for server to shut down...LOG:  received fast shutdown request
.LOG:  aborting any active transactions
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down
 done
server stopped

jd@jd-wks:~/snap/postgresql96/common$ grep listen_addresses 
data/postgresql.conf

listen_addresses = '192*'   # what IP address(es) to listen on;

-- I wasn't actually expecting the above to work. I was just testing.


jd@jd-wks:~/snap/postgresql96/common$ 
/snap/postgresql96/19/usr/bin/pg_ctl -D data start

server starting

jd@jd-wks:~/snap/postgresql96/common$ postgresql96.psql -U jd -h 
localhost postgres

psql (9.6.2)
Type "help" for help.

postgres=# show listen_addresses ;
 listen_addresses
--
 *
(1 row)


I grant that this is obscure but perhaps we should do something about it?

Thanks,

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-15 Thread Pavan Deolasee
On Tue, Mar 14, 2017 at 7:16 PM, Alvaro Herrera 
wrote:

> Pavan Deolasee wrote:
> > On Tue, Mar 14, 2017 at 7:17 AM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> > wrote:
>
> > > I have already commented about the executor involvement in btrecheck();
> > > that doesn't seem good.  I previously suggested to pass the EState down
> > > from caller, but that's not a great idea either since you still need to
> > > do the actual FormIndexDatum.  I now think that a workable option would
> > > be to compute the values/isnulls arrays so that btrecheck gets them
> > > already computed.
> >
> > I agree with your complaint about modularity violation. What I am unclear
> > is how passing values/isnulls array will fix that. The way code is
> > structured currently, recheck routines are called by index_fetch_heap().
> So
> > if we try to compute values/isnulls in that function, we'll still need
> > access EState, which AFAIU will lead to similar violation. Or am I
> > mis-reading your idea?
>
> You're right, it's still a problem.  (Honestly, I think the whole idea
> of trying to compute a fake index tuple starting from a just-read heap
> tuple is a problem in itself;


Why do you think so?


> I just wonder if there's a way to do the
> recheck that doesn't involve such a thing.)
>

I couldn't find a better way without a lot of complex infrastructure. Even
though we now have ability to mark index pointers and we know that a given
pointer either points to the pre-WARM chain or post-WARM chain, this does
not solve the case when an index does not receive a new entry. In that
case, both pre-WARM and post-WARM tuples are reachable via the same old
index pointer. The only way we could deal with this is to mark index
pointers as "common", "pre-warm" and "post-warm". But that would require us
to update the old pointer's state from "common" to "pre-warm" for the index
whose keys are being updated. May be it's doable, but might be more complex
than the current approach.


>
> > I wonder if we should instead invent something similar to IndexRecheck(),
> > but instead of running ExecQual(), this new routine will compare the
> index
> > values by the given HeapTuple against given IndexTuple. ISTM that for
> this
> > to work we'll need to modify all callers of index_getnext() and teach
> them
> > to invoke the AM specific recheck method if xs_tuple_recheck flag is set
> to
> > true by index_getnext().
>
> Yeah, grumble, that idea does sound intrusive, but perhaps it's
> workable.  What about bitmap indexscans?  AFAICS we already have a
> recheck there natively, so we only need to mark the page as lossy, which
> we're already doing anyway.
>

Yeah, bitmap indexscans should be ok. We need recheck logic only to avoid
duplicate scans and since a TID can only occur once in the bitmap, there is
no risk for duplicate results.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Tom Lane
Andres Freund  writes:
> On March 15, 2017 12:33:28 PM PDT, Tom Lane  wrote:
>> So for my money, you should drop 0002 altogether and just have 0004
>> remove get_last_attnums() from execUtils.c and stick it into
>> execExpr.c.  I suppose you still need the LastAttnumInfo API change
>> so as to decouple it from ProjectionInfo, but that's minor.

> I think it's quite useful in other places too, we do repeated slot-getattrs 
> in a bunch of places in the executor, and it's noticeable performance wise. 

Color me dubious.  Which specific other places have you got in mind, and
do they have expression trees at hand that would tell them which columns
they really need to pull out?
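
For concreteness, a minimal sketch of the pattern under discussion --
walk an expression once to find the highest attribute number it
references, then deform the slot in one call instead of repeated
slot_getattr() lookups.  The walker below is illustrative only; the
patch itself tracks this in a LastAttnumInfo struct rather than a bare
int:

    static bool
    last_attnum_walker(Node *node, int *last_attnum)
    {
        if (node == NULL)
            return false;
        if (IsA(node, Var))
        {
            Var *var = (Var *) node;

            if (var->varattno > *last_attnum)
                *last_attnum = var->varattno;
            return false;
        }
        return expression_tree_walker(node, last_attnum_walker,
                                      (void *) last_attnum);
    }

    /* ... then, at execution time: one bulk deform instead of many */
    slot_getsomeattrs(slot, last_attnum);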

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Andres Freund


On March 15, 2017 12:33:28 PM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> [ latest patches ]
>
>I looked through 0002-Make-get_last_attnums-more-generic.patch.

>So for my money, you should drop 0002 altogether and just have 0004
>remove get_last_attnums() from execUtils.c and stick it into
>execExpr.c.  I suppose you still need the LastAttnumInfo API change
>so as to decouple it from ProjectionInfo, but that's minor.

I think it's quite useful in other places too, we do repeated slot-getattrs in 
a bunch of places in the executor, and it's noticeable performance wise. 

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-15 Thread Tom Lane
Andres Freund  writes:
> [ latest patches ]

I looked through 0002-Make-get_last_attnums-more-generic.patch.
Although it seems relatively unobjectionable on its own, I'm not
convinced that it's really useful to try to split it out like this.
I see that 0004 removes the only call of ExecGetLastAttnums (the
one in ExecBuildProjectionInfo) and then adds a single call in
ExecInitExprSlots which is in execExpr.c.  To me, the only reason
ExecGetLastAttnums nee get_last_attnums is in execUtils.c in the first
place is that it is a private subroutine of ExecBuildProjectionInfo.
After these changes, it might as well be a private subroutine of
ExecInitExprSlots.  I'm suspicious of turning it into a globally
accessible function as you've done here, because I doubt that it is of
global use --- in particular, the fact that it doesn't deal specially
with INDEX_VAR Vars seems rather specific to this one use-case.

So for my money, you should drop 0002 altogether and just have 0004
remove get_last_attnums() from execUtils.c and stick it into
execExpr.c.  I suppose you still need the LastAttnumInfo API change
so as to decouple it from ProjectionInfo, but that's minor.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Emre Hasegeli
> Please verify the fix.

The same test with both of the patches applied still crashes for me.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow to store select results into variables

2017-03-15 Thread Fabien COELHO


Hello Rafia,


I was reviewing v7 of this patch; to start with, I found the following
whitespace errors when applying with git apply:
/home/edb/Desktop/patches/others/pgbench-into-7.patch:66: trailing
whitespace.


Yep.

I do not know why "git apply" sometimes complains. All is fine for me both 
with "git apply" and "patch".


Last time it was because my mailer uses text/x-diff for the mime type, as
defined by the system in "/etc/mime.types", which some mailers then
interpret as a license to change eol-style when saving, resulting in this
kind of behavior. Could you tell your mailer just to save the file as is?



Apart from that, on executing "SELECT 1 AS a \gset \set i debug(:a) SELECT 2
AS a \gcset SELECT 3;" as given in your provided script gset-1.sql, it is
giving the error: invalid command \gcset.


Are you sure that you are using the compiled pgbench, not a previously 
installed one?


  bin/pgbench> pgbench -t 1 -f SQL/gset-1.sql
SQL/gset-1.sql:1: invalid command in command "gset"
\gset

  bin/pgbench> ./pgbench -t 1 -f SQL/gset-1.sql
starting vacuum...end.
debug(script=0,command=2): int 1
debug(script=0,command=4): int 2
...


Not sure what the intention of this script is, anyway?


The intention is to test that gset & gcset work as expected in various 
settings, especially with combined queries (\;) the right result must be 
extracted in the sequence.
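
To make the intent concrete, a small sketch of the semantics as I read
them (the authoritative behavior is the patch's, not this sketch):

    SELECT 1 AS a \gset
    \set i debug(:a)
    SELECT 2 AS a \gcset SELECT 3;
    \set j debug(:a)

That is, \gcset both separates combined queries (like \;) and marks the
sub-query it terminates for capture, which matches the "int 1" then
"int 2" debug output shown above.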


Also, instead of so many different files for errors, why don't you
combine them into one?


Because a pgbench script stops on the first error, and I wanted to test 
what happens with several kinds of errors.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-15 Thread Robert Haas
On Wed, Mar 15, 2017 at 8:55 AM, Ashutosh Bapat
 wrote:
> Sorry. That was added by my patch to refactor
> set_append_rel_pathlist(). I have added a patch in the series to
> remove that line.

It's not worth an extra commit just to change what isn't broken.
Let's just leave it alone.

>> Very sad.  I guess if we had parallel append available, we could maybe
>> dodge this problem, but for now I suppose we're stuck with it.
>
> Really sad. Is there a way to look at the relation (without any
> partial paths yet) and see whether the relation will have partial
> paths or not. Even if we don't have actual partial paths but know that
> there will be at least one added in the future, we will be able to fix
> this problem.

I don't think so.  If we know that rel->consider_parallel will end up
true for a plain table, we should always get a parallel sequential
scan path at least, but if there are foreign tables involved, then
nothing is guaranteed.

>> partition_wise_plan_weight may be useful for testing, but I don't
>> think it should be present in the final patch.
>
> partition_join test needs it so that it can work with smaller dataset
> and complete faster. For smaller data sets the partition-wise join
> paths come out to be costlier than other kinds and are never chosen.
> By setting partition_wise_plan_weight I can force partition-wise join
> to be chosen. An alternate solution would be to use
> sample_partition_fraction = 1.0, but then we will never test delayed
> planning for unsampled child-joins. I also think that users will find
> partition_wise_plan_weight useful when estimates based on samples are
> unrealistic. Obviously, in a longer run we should be able to provide
> better estimates.

I still don't like it -- we have no other similar knob.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-15 Thread Robert Haas
On Wed, Mar 15, 2017 at 8:49 AM, Ashutosh Bapat
 wrote:
>> Of course, that supposes that 0009 can manage to postpone creating
>> non-sampled child joinrels until create_partition_join_plan(), which
>> it currently doesn't.
>
> Right. We need the child-join's RelOptInfos to estimate sizes, so that
> we could sample the largest ones. So postponing it looks difficult.

You have a point.

>> In fact, unless I'm missing something, 0009
>> hasn't been even slightly adapted to take advantage of the
>> infrastructure in 0001; it doesn't seem to reset the path_cxt or
>> anything.  That seems like a fairly major omission.
>
> The path_cxt reset introduced by 0001 recycles memory used by all the
> paths, including paths created for the children. But that happens only
> after all the planning has completed. I thought that's what we
> discussed to be done. We could create a separate path context for
> every top-level child-join.

I don't think we need to create a new context for each top-level
child-join, but I think we should create a context to be used across
all top-level child-joins and then reset it after planning each one.
I thought the whole point here was that NOT doing that caused the
memory usage for partitionwise join to get out of control.  Am I
confused?
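
A rough sketch of that scheme -- one context shared across all
top-level child-joins, reset after each one is planned (the list and
helper names are hypothetical, and this assumes the finished plan is
copied out of the context before the reset):

    MemoryContext child_cxt = AllocSetContextCreate(CurrentMemoryContext,
                                                    "child-join planning",
                                                    ALLOCSET_DEFAULT_SIZES);
    ListCell   *lc;

    foreach(lc, top_level_child_joins)      /* hypothetical list */
    {
        MemoryContext oldcxt = MemoryContextSwitchTo(child_cxt);

        plan_one_child_join(lfirst(lc));    /* hypothetical helper */
        MemoryContextSwitchTo(oldcxt);
        MemoryContextReset(child_cxt);      /* recycle this child's paths */
    }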

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Leftover invalidation handling in RemoveRelations

2017-03-15 Thread Andres Freund
Hi,

On 2017-03-15 14:46:22 -0400, Robert Haas wrote:
> On Wed, Mar 15, 2017 at 2:40 PM, Andres Freund  wrote:
> Yeah, I don't think that would hurt anything.
> 
> (I'm not sure it'll help anything either - the overhead of an extra
> AcceptInvalidationMessages() call is quite minimal - but, as you say,
> maybe it's worth doing just to discourage future code authors from
> including unnecessary fluff.)

I don't think there's an actual runtime advantage either - but it's
indeed confusing for others, because it doesn't square with what's
needed.  It's not like the AcceptInvalidationMessages() would actually
make things race-free if used without RangeVarGetRelidExtended()...

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Leftover invalidation handling in RemoveRelations

2017-03-15 Thread Robert Haas
On Wed, Mar 15, 2017 at 2:40 PM, Andres Freund  wrote:
> reviewing some citus code copied from postgres I noticed that
> RemoveRelations() has the following bit:
>
> /*
>  * These next few steps are a great deal like relation_openrv, but we
>  * don't bother building a relcache entry since we don't need it.
>  *
>  * Check for shared-cache-inval messages before trying to access the
>  * relation.  This is needed to cover the case where the name
>  * identifies a rel that has been dropped and recreated since the
>  * start of our transaction: if we don't flush the old syscache entry,
>  * then we'll latch onto that entry and suffer an error later.
>  */
> AcceptInvalidationMessages();
>
> /* Look up the appropriate relation using namespace search. */
> state.relkind = relkind;
> state.heapOid = InvalidOid;
> state.concurrent = drop->concurrent;
> relOid = RangeVarGetRelidExtended(rel, lockmode, true,
>                                   false,
>                                   RangeVarCallbackForDropRelation,
>                                   (void *) &state);
>
> which doesn't seem to make sense - RangeVarGetRelidExtended does
> invalidation handling on it's own.
>
> Looks like this was left there in the course of
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ad36c4e44c8b513f6155656e1b7a8d26715bb94
>
> ISTM AcceptInvalidationMessages() and preceding comment should just be
> removed?

Yeah, I don't think that would hurt anything.

(I'm not sure it'll help anything either - the overhead of an extra
AcceptInvalidationMessages() call is quite minimal - but, as you say,
maybe it's worth doing just to discourage future code authors from
including unnecessary fluff.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Leftover invalidation handling in RemoveRelations

2017-03-15 Thread Andres Freund
Hi,

reviewing some citus code copied from postgres I noticed that
RemoveRelations() has the following bit:

	/*
	 * These next few steps are a great deal like relation_openrv, but we
	 * don't bother building a relcache entry since we don't need it.
	 *
	 * Check for shared-cache-inval messages before trying to access the
	 * relation.  This is needed to cover the case where the name
	 * identifies a rel that has been dropped and recreated since the
	 * start of our transaction: if we don't flush the old syscache entry,
	 * then we'll latch onto that entry and suffer an error later.
	 */
	AcceptInvalidationMessages();

	/* Look up the appropriate relation using namespace search. */
	state.relkind = relkind;
	state.heapOid = InvalidOid;
	state.concurrent = drop->concurrent;
	relOid = RangeVarGetRelidExtended(rel, lockmode, true,
									  false,
									  RangeVarCallbackForDropRelation,
									  (void *) &state);

which doesn't seem to make sense - RangeVarGetRelidExtended does
invalidation handling on it's own.

Looks like this was left there in the course of
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2ad36c4e44c8b513f6155656e1b7a8d26715bb94

ISTM AcceptInvalidationMessages() and preceding comment should just be
removed?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Microvacuum support for Hash Index

2017-03-15 Thread Robert Haas
On Wed, Mar 15, 2017 at 2:10 PM, Ashutosh Sharma  wrote:
> Okay, I have done the changes as suggested by you. Please refer to the
> attached v8 patch. Thanks.

Cool, but this doesn't look right:

+action = XLogReadBufferForRedo(record, 0, &buffer);
+
+if (!IsBufferCleanupOK(buffer))
+elog(PANIC, "hash_xlog_vacuum_one_page: failed to acquire cleanup lock");

That could fail, I think, because of a pin from a Hot Standby backend.
You want to call XLogReadBufferForRedoExtended() with its fourth argument
(get_cleanup_lock) set to true. Come to think of it, shouldn't hash_xlog_split_allocate_page
be changed the same way?
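
For reference, the call shape that amounts to (get_cleanup_lock is the
fourth parameter of that function):

    action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
                                           true,    /* get_cleanup_lock */
                                           &buffer);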

+/*
+ * Let us mark the page as clean if vacuum removes the DEAD tuples
+ * from an index page. We do this by clearing LH_PAGE_HAS_DEAD_TUPLES
+ * flag.
+ */

Maybe add: Clearing this flag is just a hint; replay won't redo this.

+ * Hash Index records that are marked as LP_DEAD and being removed during
+ * hash index tuple insertion can conflict with standby queries.You might

The word Index shouldn't be capitalized here.  There should be a space
before "You".

The formatting of this comment is oddly narrow:

+ * _hash_vacuum_one_page - vacuum just one index page.
+ * Try to remove LP_DEAD items from the given page.  We
+ * must acquire cleanup lock on the page being modified
+ * before calling this function.

I'd add a blank line after the first line and reflow the rest to be
something more like 75 characters.  pgindent evidently doesn't think
this needs reformatting, but it's oddly narrow.

I suggest changing the header comment of
hash_xlog_vacuum_get_latestRemovedXid like this:

+ * Get the latestRemovedXid from the heap pages pointed at by the index
+ * tuples being deleted.  See also btree_xlog_delete_get_latestRemovedXid,
+ * on which this function is based.

This is looking good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Microvacuum support for Hash Index

2017-03-15 Thread Ashutosh Sharma
On Wed, Mar 15, 2017 at 9:28 PM, Robert Haas  wrote:
> On Wed, Mar 15, 2017 at 11:37 AM, Ashutosh Sharma  
> wrote:
>>> +/* Get RelfileNode from relation OID */
>>> +rel = relation_open(htup->t_tableOid, NoLock);
>>> +rnode = rel->rd_node;
>>> +relation_close(rel, NoLock);
>>>  itup->t_tid = htup->t_self;
>>> -_hash_doinsert(index, itup);
>>> +_hash_doinsert(index, itup, rnode);
>>>
>>> This is an awfully low-level place to be doing something like this.
>>> I'm not sure exactly where this should be happening, but not in the
>>> per-tuple callback.
>>
>> Okay, Now I have done this inside _hash_doinsert() instead of callback
>> function. Please have a look into the attached v7 patch.
>
> In the btree case, the heap relation isn't re-opened from anywhere in
> the btree code.  I think we should try to do the same thing here.  If
> we add an argument for the heap relation to _hash_doinsert(),
> hashinsert() can easily pass it down; it's already got that value
> available.  There are two other calls to _hash_doinsert:
>
> 1. _h_indexbuild() calls _hash_doinsert().  It's called only from
> hashbuild(), which has the heap relation available.  So we can just
> add that as an extra argument to _h_indexbuild() and then from there
> pass it to _hash_doinsert.
>
> 2. hashbuildCallback calls _hash_doinsert().  Its sixth argument is a
> HashBuildState which is set up by hashbuild(), which has the heap
> relation available.  So we can just add an extra member to the
> HashBuildState and have hashbuild() set it before calling
> IndexBuildHeapScan.  hashbuildCallback can then fish it out of the
> HashBuildState and pass it to _hash_doinsert().

Okay, I have done the changes as suggested by you. Please refer to the
attached v8 patch. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


microvacuum_hash_index_v8.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Defaulting psql to ON_ERROR_ROLLBACK=interactive

2017-03-15 Thread Joshua Yanovski
Just chiming in: I rely heavily on the current default behavior
because it prevents large statements pasted into psql that cause
errors in transactions from partially running; had the default been
different, I would have caused production outages.

On Wed, Mar 15, 2017 at 8:42 AM, Pavel Stehule  wrote:
>
>
> 2017-03-15 16:38 GMT+01:00 Robert Haas :
>>
>> On Wed, Mar 15, 2017 at 2:29 AM, Peter van Hardenberg  wrote:
>> > Ads and I were talking over breakfast about usability issues and he
>> > mentioned transaction cancellation during interactive sessions as a
>> > serious
>> > pain point.
>> >
>> > I suggest we update the default of ON_ERROR_ROLLBACK to interactive for
>> > 10.0.
>> >
>> > The last discussion I could find about this subject was in 2011 and
>> > while
>> > there was concern about setting the default to "on" (as this would
>> > tamper
>> > with the expected behaviour of scripts), I don't see any identification
>> > of a
>> > problem that would be caused by setting it to "interactive" by default.
>>
>> Well, then you'd get one behavior when you use psql interactively, and
>> another behavior when you use it from a script.  And if you used a
>> client other than psql the behavior would be different from psql.
>> Plus, it's kinda surprising to have a client that, by default, is
>> sending secret commands to the server that you don't know about.  And
>> it's a backward-incompatible change against previous releases.  I
>> don't think any of that makes this the worst idea ever, but on balance
>> I still think it's better to just recommend to people that they
>> configure their .psqlrc with this setting if they want the behavior.
>>
>> In short, -1 from me.
>
>
> I agree with Robert. I prefer some doc, web page "after install steps".
>
> Pavel
>
>>
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-15 Thread Kevin Grittner
On Wed, Mar 15, 2017 at 11:35 AM, Mengxing Liu
 wrote:

>> On a NUMA machine It is not at all unusual to see bifurcated results
>> -- with each run coming in very close to one number or a second
>> number, often at about a 50/50 rate, with no numbers falling
>> anywhere else.  This seems to be based on where the processes and
>> memory allocations happen to land.
>>
>
> Do you mean that for a NUMA machine, there usually exist two
> different performance results?
> Just two? Neither three nor four?

In my personal experience, I have often seen two timings that each
run randomly matched; I have not seen nor heard of more, but that
doesn't mean it can't happen.  ;-)

> First, I will compile and install PostgreSQL myself and try
> the profiling tools (perf or oprofile).

perf is newer, and generally better if you can use it.  Don't try to
use either on HP hardware -- the BIOS uses some of the same hardware
registers that other manufacturers leave for use of profilers; an HP
machine is likely to freeze or reboot if you try to run either of
those profilers under load.
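
For anyone following along, a minimal system-wide perf workflow looks
roughly like this (the 60-second sleep just bounds the sampling window;
run it while the benchmark is active):

    $ perf record -a -g -- sleep 60
    $ perf report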

> Then I will run one or two benchmarks using different configs,
> where I may need your help to ensure that my tests are close to
> real-world conditions.

Yeah, we should talk about OS and PostgreSQL configuration before
you run any benchmarks.  Neither tends to come configured as I would
run a production system.

> PS: Disabling NUMA in the BIOS means that a CPU can use its own memory
> controller when accessing local memory, to reduce hops.

NUMA means that each CPU chip directly controls some of the RAM
(possibly with other, non-CPU controllers for some RAM).  The
question is whether the BIOS or the OS controls the memory
allocation.  The OS looks at what processes are on what cores and
tries to use "nearby" memory for allocations.  This can be pessimal
if the amount of RAM that is under contention is less than the size
of one memory segment, since all CPU chips need to ask the one
managing that RAM for each access.  In such a case, you actually get
best performance using a cpuset which just uses one CPU package and
the memory segments directly managed by that CPU package.  Without
the cpuset you may actually see better performance for this workload
by letting the BIOS interleave allocations, which spreads the RAM
allocations around to memory managed by all CPUs, and no one CPU
becomes the bottleneck.  The access is still not uniform, but you
dodge a bottleneck -- albeit less efficiently than using a custom
cpuset.
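
As an illustration of that cpuset approach, something along these lines
(node 0 is an arbitrary choice here; pick the package whose memory
segments you want the allocations to stay on):

    $ numactl --cpunodebind=0 --membind=0 pg_ctl -D $PGDATA start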

> On the contrary, "enable" means UMA.

No.  Think about this: regardless of this BIOS setting each RAM
package is directly connected to one CPU package, which functions as
its memory controller.  Most of the RAM used by PostgreSQL is for
disk buffers -- shared buffers in shared memory and OS cache.
Picture processes running on different CPU packages accessing a
single particular shared buffer.  Also picture processes running on
different CPU packages using the same spinlock at the same time.  No
BIOS setting can avoid the communications and data transfer among
the 8 CPU packages, and the locking of the cache lines.

> Therefore, I think Tony is right, we should disable this setting.
>
> I got the information from here.
> http://frankdenneman.nl/2010/12/28/node-interleaving-enable-or-disable/

Ah, that explains it.  There is no such thing as "disabling NUMA" --
you can have the BIOS force interleaving, or you can have the BIOS
leave the NUMA memory assignment to the OS.  I was assuming that by
"disabling NUMA" you meant to have BIOS control it through
interleaving.  You meant the opposite -- disabling the BIOS override
of OS NUMA control.  I agree that we should leave NUMA scheduling to
the OS. There are, however, some non-standard OS configuration
options that allow NUMA to behave better with PostgreSQL than the
defaults allow.  We will need to tune a little.

The author of that article seems to be assuming that the usage will
be with applications like word processing, spreadsheets, or browsers
-- where the OS can place all the related processes on a single CPU
package and all (or nearly all) memory allocations can be made from
associated memory -- yielding a fairly uniform and fast access when
the BIOS override is disabled.  On a database product which wants to
use all the cores and almost all of the memory, with heavy
contention on shared memory, the situation is very different.
Shared resource access time is going to be non-uniform no matter
what you do.  The difference is that the OS can still make *process
local* allocations from nearby memory segments, while BIOS cannot.

--
Kevin Grittner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Improve isolation tests infrastructure.

2017-03-15 Thread Andres Freund
Hi Peter, All,

On 2017-03-14 23:10:19 +, Andres Freund wrote:
> Improve isolation tests infrastructure.

There's:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=frogmouth&dt=2017-03-15%2012%3A32%3A45
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=narwhal&dt=2017-03-15%2005%3A00%3A01

I suspect that's because previously src/test/regress/GNUMakefile's
installcheck provided --bindir='$(bindir)', but
$pg_isolation_regress_installcheck) doesn't (and didn't before the
changes).

ISTM that the only reason that this previously didn't cause issues on
these machines is that neither actually *runs* any of the isolation
checks in other directories.  I.e. they'd have failed before this, too,
if they ran tests using pg_isolation_regress_installcheck.

To me it seems like an oversight in the previous definition of
pg_isolation_regress_installcheck.  Looks like that was introduced in
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=dcae5faccab64776376d354decda0017c648bb53

Peter, I guess it wasn't intentional that you added --bindir to
pg_regress_installcheck but not pg_isolation_regress_installcheck?

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Need a builtin way to run all tests faster manner

2017-03-15 Thread Tom Lane
Peter Eisentraut  writes:
> make check-world -O -j6 PROVE_FLAGS=-j6
> 2 min 34 seconds
> Nice!

One thing I've noticed with parallel check-world is that it's possible
for a sub-job to fail and for the resulting complaint from make to scroll
offscreen before everything stops, leaving all the visible output being
from another sub-job that completed just fine.  If you don't look VERY
closely, or check $?, you can easily be misled into thinking the tests
all passed.

I think it'd be a good idea to fix check-world so that there's a positive
printout at the end of a successful run, along the lines of the

+@echo "All of PostgreSQL successfully made. Ready to install."

we have for the "all" target.
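
Something along these lines in the top-level GNUmakefile, perhaps (the
sub-make below is a placeholder for the existing recursive rules; the
trailing echo is the only new part):

    check-world:
        $(MAKE) -C src/test check       # placeholder for existing rules
        @echo "All of PostgreSQL successfully tested. Ready to commit."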

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Dilip Kumar
On Wed, Mar 15, 2017 at 10:21 PM, Emre Hasegeli  wrote:
>> hasegeli=# create table r2 as select (random() * 3)::int as i from 
>> generate_series(1, 100);
>> SELECT 100
>> hasegeli=# create index on r2 using brin (i);
>> CREATE INDEX
>> hasegeli=# analyze r2;
>> ANALYZE
>> hasegeli=# explain select * from only r2 where i = 10;
>>  QUERY PLAN
>> -
>>  Gather  (cost=2867.50..9225.32 rows=1 width=4)
>>Workers Planned: 2
>>->  Parallel Bitmap Heap Scan on r2  (cost=1867.50..8225.22 rows=1 
>> width=4)
>>  Recheck Cond: (i = 10)
>>  ->  Bitmap Index Scan on r2_i_idx  (cost=0.00..1867.50 rows=371082 
>> width=0)
>>Index Cond: (i = 10)
>> (6 rows)
>>
>> hasegeli=# select * from only r2 where i = 10;

I am able to reproduce the bug, and the attached patch fixes it.
The problem is that I am not handling the TBM_EMPTY state properly.  I
remember that while reviewing the patch Robert mentioned that we might
need to handle TBM_EMPTY, and I said that since we are not handling it
in non-parallel mode we don't need to handle it here either.  But I
was wrong.  So the problem is that if the state is not TBM_HASH, the
code directly assumes TBM_ONE_PAGE, which is completely wrong.  I have
fixed that, and also fixed other similar locations.
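
The shape of the fix, roughly (status names as in tidbitmap.c; control
flow simplified): every consumer of the shared state has to distinguish
all three cases instead of treating "not TBM_HASH" as TBM_ONE_PAGE.

    if (tbm->status == TBM_HASH)
    {
        /* iterate over the page-table hash */
    }
    else if (tbm->status == TBM_ONE_PAGE)
    {
        /* single page entry stored inline */
    }
    else
    {
        Assert(tbm->status == TBM_EMPTY);
        /* nothing to emit */
    }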

Please verify the fix.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


fix_tbm_empty.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication access control patches

2017-03-15 Thread Peter Eisentraut
On 3/14/17 14:49, Petr Jelinek wrote:
> Not what I mean - owner should be able to publish table. If you are
> granted role of the owner you can do what owner can no?

I didn't actually know that ownership worked that way.  You can grant
the role of an owner to someone, and then that someone has ownership
rights.  So that should satisfy a pretty good range of use cases for who
can publish what tables.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_ls_waldir() & pg_ls_logdir()

2017-03-15 Thread Robert Haas
On Mon, Feb 20, 2017 at 6:21 AM, Dave Page  wrote:
> Patch includes the code and doc updates.

Review:

+strftime(mtime, 25, "%Y-%m-%d %H:%M:%S %Z",
localtime(&(attrib.st_ctime)));
+const int n = snprintf(NULL, 0, "%lld", attrib.st_size);
+char size[n+1];
+snprintf(size, n+1, "%lld", attrib.st_size);

We don't allow variable declarations in mid-block.  You've been
programming in C++ for too long!
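
For illustration, a C89-friendly shape of that block (buffer sizes and
the use of palloc are my assumptions, not the patch's code; this also
avoids the variable-length array, which we don't use either):

    char        mtime[32];
    char       *size;
    int         n;

    strftime(mtime, sizeof(mtime), "%Y-%m-%d %H:%M:%S %Z",
             localtime(&(attrib.st_ctime)));
    n = snprintf(NULL, 0, "%lld", (long long) attrib.st_size);
    size = palloc(n + 1);
    snprintf(size, n + 1, "%lld", (long long) attrib.st_size);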

The documentation should be updated to say that access to
pg_ls_logdir() and pg_ls_waldir() can be granted via the permissions
system (see the paragraph above the table you updated).

The four existing functions in the documentation table each have a
corresponding paragraph below the table, but the two new functions you
added don't.

+1 for the concept.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Need a builtin way to run all tests faster manner

2017-03-15 Thread Peter Eisentraut
make check-world -O -j6 PROVE_FLAGS=-j6

2 min 34 seconds

Nice!

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication access control patches

2017-03-15 Thread Peter Eisentraut
On 3/14/17 15:37, Petr Jelinek wrote:
> Yeah that's rather hard to say in front. Maybe safest action would be to
> give the permission to owners in 10 and revisit special privilege in 11
> based on feedback?

I'm fine with that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication access control patches

2017-03-15 Thread Peter Eisentraut
On 3/14/17 15:05, Stephen Frost wrote:
> Another approach to solving my concern would be to only allow the
> publishing of tables by non-owner users who have table-level SELECT
> rights

An early version of the logical replication patch set did that.  But the
problem is that this way someone with SELECT privilege could include a
table without replica identity index into a publication and thereby
prevent updates to the table.  There might be ways to tweak things to
make that work better, but right now it works that way.
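
To illustrate the hazard with the SELECT-privilege approach:

    CREATE TABLE t (a int);              -- no replica identity index
    CREATE PUBLICATION p FOR TABLE t;    -- imagine mere SELECT allowed this
    UPDATE t SET a = 1;                  -- now fails for lack of a
                                         -- replica identity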

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] identity columns

2017-03-15 Thread Peter Eisentraut
Vitaly, will you be able to review this again?


On 2/28/17 21:23, Peter Eisentraut wrote:
> New patch that fixes everything.  ;-)  Besides hopefully addressing all
> your issues, this version also no longer requires separate privileges on
> the internal sequence, which was the last outstanding functional issue
> that I am aware of.  This version also no longer stores a default entry
> in pg_attrdef.  Instead, the rewriter makes up the default expression on
> the fly.  This makes some things much simpler.
> 
> On 1/4/17 19:34, Vitaly Burovoy wrote:
>> 1. The fact COPY ignores GENERATED ALWAYS constraint (treats as
>> GENERATED BY DEFAULT) should be mentioned as well as rules.
> 
> fixed (documentation added)
> 
> What do you mean for rules?
> 
>> 2. Usually error message for identical columns (with LIKE clause)
>> looks like this:
>> test=# CREATE TABLE def(i serial, n1 int NOT NULL, n2 int);
>> CREATE TABLE
>> test=# CREATE TABLE test(i serial, LIKE def INCLUDING ALL);
>> ERROR:  column "i" specified more than once
>>
>> but for generated columns it is disorienting enough:
>> test=# CREATE TABLE idnt(i int GENERATED BY DEFAULT AS IDENTITY);
>> CREATE TABLE
>> test=# CREATE TABLE test(i int GENERATED BY DEFAULT AS IDENTITY, LIKE
>> idnt INCLUDING ALL);
>> ERROR:  relation "test_i_seq" already exists
> 
> Yeah, this is kind of hard to fix though, because the sequence names
> are decided during parse analysis, when we don't see that the same
> sequence name is also used by another new column.  We could maybe
> patch around it in an ugly way, but it doesn't seem worth it for this
> marginal case.
> 
>> 3. Strange error (not about absence of a column; but see pp.5 and 8):
>> test=# ALTER TABLE idnt ALTER COLUMN o ADD GENERATED ALWAYS AS IDENTITY;
>> ERROR:  identity column type must be smallint, integer, or bigint
> 
> What's wrong with that?
> 
>> 4. Altering type leads to an error:
>> test=# ALTER TABLE idnt ALTER COLUMN i TYPE bigint;
>> ERROR:  cannot alter data type of identity column "i"
>>
>> it is understandable for non-integers. But why do you forbid changing
>> to the supported types?
> 
> fixed (is now allowed)
> 
>> 5. Attempt to make a column be identity fails:
>> test=# ALTER TABLE idnt ALTER COLUMN n1 SET GENERATED BY DEFAULT;
>> ERROR:  column "n1" of relation "idnt" is not an identity column
>>
>> As I understand from the Spec, chapter 11.20 General Rule 2)b) speaks
>> about the final state of a column without mentioning the initial
>> state.
>> Therefore even if the column has the initial state "not generated",
>> after the command it changes the state to either "GENERATED ALWAYS" or
>> "GENERATED BY DEFAULT".
> 
> 11.12 <alter column definition> states that "If <set identity column
> specification> or <alter identity column specification> is specified, then C
> shall be an identity column."  So this clause is only to alter an
> existing identity column, not make a new one.
> 
>> 6. The docs mention a syntax:
>> ALTER [ COLUMN ] column_name { SET GENERATED { ALWAYS |
>> BY DEFAULT } | SET sequence_option | RESET
>> } [...]
>>
>> but the command fails:
>> test=# ALTER TABLE idnt ALTER COLUMN i RESET;
>> ERROR:  syntax error at or near ";"
>> LINE 1: ALTER TABLE idnt ALTER COLUMN i RESET;
> 
> This works for me.  Check again please.
> 
>> 7. (the same line of the doc)
>> usually ellipsis appears in inside parenthesis with clauses can be
>> repeated, in other words it should be written as:
>> ALTER [ COLUMN ] column_name ( { SET GENERATED { ALWAYS |
>> BY DEFAULT } | sequence_option } [...] )
> 
> But there are no parentheses in that syntax.  I think the syntax
> synopsis as written is correct.
> 
>> 8. To avoid unnecessary syntax "ALTER COLUMN ... ADD GENERATED ..."
>> and use existing syntax "ALTER COLUMN ... SET GENERATED ..." you can
>> just make the "SET" word be optional as a PG's extension. I.e. instead
>> of:
>> test=# ALTER TABLE idnt ALTER i SET GENERATED ALWAYS SET INCREMENT BY 4;
>>
>> you can write:
>> test=# ALTER TABLE idnt ALTER i SET GENERATED ALWAYS INCREMENT BY 4;
>> test=# ALTER TABLE idnt ALTER n1 SET GENERATED ALWAYS INCREMENT BY 4;
>> -- which sets identity constraint - see p.5
>>
>> which is very similar to your extended syntax:
>> test=# ALTER TABLE idnt ALTER n1 ADD GENERATED ALWAYS AS IDENTITY
>> (INCREMENT BY 4);
> 
> See under 5 that that is not correct.
> 
>> 9. The command fails:
>> test=# ALTER TABLE idnt ALTER n2 ADD GENERATED ALWAYS AS IDENTITY;
>> ERROR:  column "n2" of relation "idnt" must be declared NOT NULL
>> before identity can be added
>>
>> whereas the Spec does not contains a requirement for a column to be a
>> NOT NULLABLE.
>> You can implicitly set a column as NOT NULL (as the "serial" macro
>> does), but not require it later.
> 
> The spec requires that an identity column is NOT NULL.
> 
>> Moreover you can get a NULLABLE identity column by:
>> test=# ALTER TABLE idnt ALTER COLUMN i DROP NOT NULL;
>> ALTER TABLE
>> test=# \d idnt
>>   Table "public.idnt"
>>  Column |  Type   | Collation | Nullable |   

Re: [HACKERS] [POC] hash partitioning

2017-03-15 Thread Robert Haas
On Wed, Mar 15, 2017 at 12:39 PM, David Steele  wrote:
> Agreed.  Perhaps both types of syntax should be supported, one that is
> friendly to users and one that is precise for dump tools and those who
> care get in the weeds.

Eventually, sure.  For the first version, I want to skip the friendly
syntax and just add the necessary syntax.  That makes it easier to
make sure that pg_dump and everything are working the way you want.
Range and list partitioning could potentially grow convenience syntax
around partition creation, too, but that wasn't essential for the
first patch, so we cut it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] background sessions

2017-03-15 Thread Robert Haas
On Wed, Mar 15, 2017 at 6:43 AM, Pavel Stehule  wrote:
> I don't understand - CHECK_FOR_INTERRUPTS called from executor implicitly.

True.  So there shouldn't be any problem here.  I'm confused as can be
about what you want changed.

Some review of the patch itself:

+pq_redirect_to_shm_mq(session->seg, session->command_qh);
+pq_beginmessage(&msg, 'X');
+pq_endmessage(&msg);
+pq_stop_redirect_to_shm_mq();

pq_redirect_to_shm_mq() wasn't really designed to be used this way;
it's designed for use by the worker, not the process that launched it.
If an error occurs while output is redirected, bad things will happen.
I think it would be better to find a way of sending that message to
the queue without doing this.

Also, I suspect this is deadlock-prone.  If we get stuck trying to
send a message to the background session while the queue is full, and
at the same time the session is stuck trying to send us a long error
message, we will have an undetected deadlock.  That's why
pg_background() puts the string being passed to the worker into the
DSM segment in its entirety, rather than sending it through a shm_mq.

+elog(ERROR, "no T before D");

That's not much of an error message, even for an elog.

+elog(ERROR, "already received a T message");

Nor that.

None of these functions have function header comments.  Or much in the
way of internal comments.  For example:

+case '1':
+break;
+case 'E':
+rethrow_errornotice();
+break;

That's really not clear without more commentary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Emre Hasegeli
> With my test case, I could not crash even with this patch applied.
> Can you provide your test case?

Yes:

> hasegeli=# create table r2 as select (random() * 3)::int as i from 
> generate_series(1, 100);
> SELECT 100
> hasegeli=# create index on r2 using brin (i);
> CREATE INDEX
> hasegeli=# analyze r2;
> ANALYZE
> hasegeli=# explain select * from only r2 where i = 10;
>  QUERY PLAN
> -
>  Gather  (cost=2867.50..9225.32 rows=1 width=4)
>Workers Planned: 2
>->  Parallel Bitmap Heap Scan on r2  (cost=1867.50..8225.22 rows=1 width=4)
>  Recheck Cond: (i = 10)
>  ->  Bitmap Index Scan on r2_i_idx  (cost=0.00..1867.50 rows=371082 
> width=0)
>Index Cond: (i = 10)
> (6 rows)
>
> hasegeli=# select * from only r2 where i = 10;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-15 Thread Tom Lane
Andrew Dunstan  writes:
> On 03/03/2017 11:11 PM, Tom Lane wrote:
>> Yeah, I was wondering if this is just exposing a pre-existing bug.
>> However, the "normal" path operates by repeatedly invoking PQconnectPoll
>> (cf. connectDBComplete) so it's not immediately obvious how such a bug
>> would've escaped detection.

> (After a long period of fruitless empirical testing I turned to the code)
> Maybe I'm missing something, but connectDBComplete() handles a return of
> PGRESS_POLLING_OK as a success while connectDBStart() seems not to. I
> don't find anywhere in our code other than libpqwalreceiver that
> actually uses that interface, so it's not surprising if it's now
> failing. So my bet is it is indeed a long-standing bug.

Meh ... that argument doesn't hold water, because the old code here called
PQconnectdbParams which is just PQconnectStartParams then
connectDBComplete.  So the problem cannot be in connectDBStart; that's
common to both paths.  It has to be some discrepancy between what
connectDBComplete does and what the new loop in libpqwalreceiver is doing.

The original loop coding in 1e8a85009 was not very close to the documented
spec for PQconnectPoll at all, and while e434ad39a made it closer, it's
still not really the same: connectDBComplete doesn't call PQconnectPoll
until the socket is known read-ready or write-ready.  The walreceiver loop
does not guarantee that, but would make an additional call after any
random other wakeup.  It's not very clear why bowerbird, and only
bowerbird, would be seeing such wakeups --- but I'm having a really hard
time seeing any other explanation for the change in behavior.  (I wonder
whether bowerbird is telling us that WaitLatchOrSocket can sometimes
return prematurely on Windows.)
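
For comparison, the documented polling contract looks roughly like this
(error handling elided; the wait helpers are placeholders for whatever
readiness primitive the caller uses):

    PostgresPollingStatusType st = PGRES_POLLING_WRITING;

    while (st != PGRES_POLLING_OK && st != PGRES_POLLING_FAILED)
    {
        if (st == PGRES_POLLING_READING)
            wait_until_readable(PQsocket(conn));    /* placeholder */
        else
            wait_until_writable(PQsocket(conn));    /* placeholder */
        st = PQconnectPoll(conn);
    }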

I'm also pretty sure that the ResetLatch call is in the wrong place which
could lead to missed wakeups, though that's the opposite of the immediate
problem.

I'll try correcting these things and we'll see if it gets any better.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Dilip Kumar
On Wed, Mar 15, 2017 at 10:02 PM, Emre Hasegeli  wrote:
> I was testing with the brin correlation patch [1] applied.  I cannot
> crash it without the patch either.  I am sorry for not testing it
> before.  The patch makes the BRIN selectivity estimation function access
> more information.
>
> [1] 
> https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch

With my test case, I could not crash even with this patch applied.
Can you provide your test case?
(table, index, data, query)


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-03-15 Thread David Steele
On 3/15/17 12:25 PM, Robert Haas wrote:
> On Tue, Mar 14, 2017 at 10:08 AM, David Steele  wrote:
>> This patch is marked as POC and after a read-through I agree that's
>> exactly what it is.
> 
> Just out of curiosity, were you looking at Nagata-san's patch, or Amul's?

Both - what I was looking for was some kind of reconciliation between
the two patches and I didn't find that.  It seemed from the thread that
Yugo intended to pull Amul's changes/idea into his patch.

>> As such, I'm not sure it belongs in the last
>> commitfest.  Furthermore, there has not been any activity or a new patch
>> in a while and we are halfway through the CF.
>>
>> Please post an explanation for the delay and a schedule for the new
>> patch.  If no patch or explanation is posted by 2017-03-17 AoE I will
>> mark this submission "Returned with Feedback".
> 
> Regrettably, I do think it's too late to squeeze hash partitioning
> into v10, but I plan to try to get something committed for v11.  

It would certainly be a nice feature to have.

> I was
> heavily involved in the design of Amul's patch, and I think that
> design solves several problems that would be an issue for us if we did
> as Nagata-san is proposing.  For example, he proposed this:
> 
>  CREATE TABLE h1 PARTITION OF h;
>  CREATE TABLE h2 PARTITION OF h;
>  CREATE TABLE h3 PARTITION OF h;
> 
> That looks OK if you are thinking of typing this in interactively, but
> if you're doing a pg_dump, maybe with --binary-upgrade, you don't want
> the meaning of a series of nearly-identical SQL commands to depend on
> the dump ordering.  You want it to be explicit in the SQL command
> which partition is which, and Amul's patch solves that problem.

OK, it wasn't clear to me that this was the case because of the stated
user-unfriendliness.

>  Also,
> Nagata-san's proposal doesn't provide any way to increase the number
> of partitions later, and Amul's approach gives you some options there.
> I'm not sure those options are as good as we'd like them to be, and if
> not then we may need to revise the approach, but I'm pretty sure
> having no strategy at all for changing the partition count is not good
> enough.

Agreed.  Perhaps both types of syntax should be supported, one that is
friendly to users and one that is precise for dump tools and those who
care get in the weeds.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-15 Thread Mengxing Liu



> -Original Messages-
> From: "Kevin Grittner" 
> Sent Time: 2017-03-15 23:20:07 (Wednesday)
> To: DEV_OPS 
> Cc: "Mengxing Liu" , 
> "pgsql-hackers@postgresql.org" 
> Subject: Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from 
> rw-conflict tracking in serializable transactions
> 
> On Tue, Mar 14, 2017 at 3:45 PM, Kevin Grittner  wrote:
> >> On 3/14/17 17:34, Mengxing Liu wrote:
> >>> There are several alternative benchmarks. Tony suggested that we
> >>> should use TPC-E and TPC-DS.
> >
> > More benchmarks is better, all other things being equal.  Keep in
> > mind that good benchmarking practice with PostgreSQL generally
> > requires a lot of setup time (so that we're starting from the exact
> > same conditions for every run), a lot of run time (so that the
> > effects of vacuuming, bloat, and page splitting all comes into play,
> > like it would in the real world), and a lot of repetitions of each
> > run (to account for variation).  In particular, on a NUMA machine it
> > is not at all unusual to see bifurcated
> 
> Sorry I didn't finish that sentence.
> 
> On a NUMA machine It is not at all unusual to see bifurcated results
> -- with each run coming in very close to one number or a second
> number, often at about a 50/50 rate, with no numbers falling
> anywhere else.  This seems to be based on where the processes and
> memory allocations happen to land.
> 

Do you mean that for a NUMA machine, there usually exist two different
performance results?
Just two? Neither three nor four?

Anyway, I think I should first get familiar with PostgreSQL and the tools
you recommended to me.
Then I will try to understand it better, to make our discussion more
efficient.

Recently, I have been busy preparing for the presentation at ASPLOS'17 and
other lab work.
But I promise I can finish all preparation work before May. Here is my
working plan:
First, I will compile and install PostgreSQL myself and try the profiling
tools (perf or oprofile).
Then I will run one or two benchmarks using different configs, where I may
need your help to ensure that my tests are close to real-world conditions.
 
PS: Disabling NUMA in the BIOS means that a CPU can use its own memory
controller when accessing local memory, to reduce hops.
On the contrary, "enable" means UMA. Therefore, I think Tony is right, we 
should disable this setting.
I got the information from here. 
http://frankdenneman.nl/2010/12/28/node-interleaving-enable-or-disable/

Regards.

--
Mengxing Liu


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Emre Hasegeli
> This can crash at line:414, if either tuple is invalid memory(but I
> think it's not because we have already accessed this memory in above
> if check) or dtup is invalid (this is also not possible because
> brin_new_memtuple has already accessed this).

I was testing with the brin correlation patch [1] applied.  I cannot
crash it without the patch either.  I am sorry for not testing it
before.  The patch makes the BRIN selectivity estimation function access
more information.

[1] 
https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-03-15 Thread Robert Haas
On Tue, Mar 14, 2017 at 10:08 AM, David Steele  wrote:
> This patch is marked as POC and after a read-through I agree that's
> exactly what it is.

Just out of curiosity, were you looking at Nagata-san's patch, or Amul's?

> As such, I'm not sure it belongs in the last
> commitfest.  Furthermore, there has not been any activity or a new patch
> in a while and we are halfway through the CF.
>
> Please post an explanation for the delay and a schedule for the new
> patch.  If no patch or explanation is posted by 2017-03-17 AoE I will
> mark this submission "Returned with Feedback".

Regrettably, I do think it's too late to squeeze hash partitioning
into v10, but I plan to try to get something committed for v11.  I was
heavily involved in the design of Amul's patch, and I think that
design solves several problems that would be an issue for us if we did
as Nagata-san is proposing.  For example, he proposed this:

 CREATE TABLE h1 PARTITION OF h;
 CREATE TABLE h2 PARTITION OF h;
 CREATE TABLE h3 PARTITION OF h;

That looks OK if you are thinking of typing this in interactively, but
if you're doing a pg_dump, maybe with --binary-upgrade, you don't want
the meaning of a series of nearly-identical SQL commands to depend on
the dump ordering.  You want it to be explicit in the SQL command
which partition is which, and Amul's patch solves that problem.  Also,
Nagata-san's proposal doesn't provide any way to increase the number
of partitions later, and Amul's approach gives you some options there.
I'm not sure those options are as good as we'd like them to be, and if
not then we may need to revise the approach, but I'm pretty sure
having no strategy at all for changing the partition count is not good
enough.
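
Roughly, the explicit form looks like this (keyword details were still
in flux in Amul's patch at the time, so treat this as a sketch):

    CREATE TABLE h (a int) PARTITION BY HASH (a);
    CREATE TABLE h1 PARTITION OF h FOR VALUES WITH (MODULUS 3, REMAINDER 0);
    CREATE TABLE h2 PARTITION OF h FOR VALUES WITH (MODULUS 3, REMAINDER 1);
    CREATE TABLE h3 PARTITION OF h FOR VALUES WITH (MODULUS 3, REMAINDER 2);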

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Dilip Kumar
On Wed, Mar 15, 2017 at 8:11 PM, Emre Hasegeli  wrote:
>> * thread #1: tid = 0x5045a8f, 0x00010ae44558 
>> postgres`brin_deform_tuple(brdesc=0x7fea3c86a3a8, 
>> tuple=0x7fea3c891040) + 40 at brin_tuple.c:414, queue = 
>> 'com.apple.main-thread', stop reason = signal SIGUSR1
>>  * frame #0: 0x00010ae44558 
>> postgres`brin_deform_tuple(brdesc=0x7fea3c86a3a8, 
>> tuple=0x7fea3c891040) + 40 at brin_tuple.c:414 [opt]
>>frame #1: 0x00010ae4000c 
>> postgres`bringetbitmap(scan=0x7fea3c875c20, tbm=) + 428 at 
>> brin.c:398 [opt]
>>frame #2: 0x00010ae9b451 
>> postgres`index_getbitmap(scan=0x7fea3c875c20, bitmap=) + 65 
>> at indexam.c:726 [opt]
>>frame #3: 0x00010b0035a9 
>> postgres`MultiExecBitmapIndexScan(node=) + 233 at 
>> nodeBitmapIndexscan.c:91 [opt]
>>frame #4: 0x00010b002840 postgres`BitmapHeapNext(node=) 
>> + 400 at nodeBitmapHeapscan.c:143 [opt]

Analyzing the call stack further, it seems this is not the exact call
stack where it crashed, because if you look at the code in
brin_deform_tuple (line 414):

brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple)
{
    dtup = brin_new_memtuple(brdesc);

    if (BrinTupleIsPlaceholder(tuple))
        dtup->bt_placeholder = true;
    dtup->bt_blkno = tuple->bt_blkno;   --> line 414

This can crash at line 414 only if either tuple points to invalid
memory (but I think it doesn't, because we have already accessed that
memory in the if check above) or dtup is invalid (which is also not
possible, because brin_new_memtuple has already accessed it).

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Suppress Clang 3.9 warnings

2017-03-15 Thread Tom Lane
Noah Misch  writes:
> On Wed, Mar 15, 2017 at 10:57:15AM -0400, Tom Lane wrote:
>> Seems like the correct solution is to
>> absorb that fix, either by updating to a newer autoconf release or by
>> carrying our own version of AC_CHECK_DECLS until they come out with one.

> As you mention upthread, that Autoconf commit is still newer than every
> Autoconf release.  (Last release was 58 months ago.)  Altering configure.ac to
> work around the bug would be reasonable, but it feels heavy relative to the
> benefit of suppressing some warnings.

It does seem like rather a lot of work, but I think it's preferable to
hacking up the coding in port.h.  Mainly because we could booby-trap the
substitute AC_CHECK_DECLS to make sure we revert it whenever autoconf 2.70
does materialize (a check on m4_PACKAGE_VERSION, like the one at
configure.in line 22, ought to do the trick); whereas I do not think
we'd remember to de-kluge port.h if we kluge around it there.

I'm fine with leaving it alone, too.

regards, tom lane




[HACKERS] Re: new set of psql patches for loading (saving) data from (to) text, binary files

2017-03-15 Thread Stephen Frost
Pavel,

I started looking through this to see if it might be ready to commit,
and I don't believe it is.  Below are my comments about the first patch;
I didn't get to the point of looking at the others yet, since this one
had issues.

* Pavel Stehule (pavel.steh...@gmail.com) wrote:
> 2017-01-09 17:24 GMT+01:00 Jason O'Donnell :
> > gstore/gbstore:

I don't see the point of 'gstore': how is that usefully different from
just using '\g'?  Also, the comments around these are inconsistent:
some say they can only be used with a filename, others say it can be a
filename or a pipe+command.

There's a whitespace-only hunk that shouldn't be included.

I don't agree with the single-column/single-row restriction on these.  I
can certainly see a case where someone might want to, say, dump out a
bunch of binary integers into a file for later processing.

The tab-completion for 'gstore' wasn't correct (you didn't include the
double-backslash).  The patch also has conflicts against current master
now.

My thinking about moving this forward would be to simplify it to just
'\gb', which would pull the data from the server side in binary format
and dump it out to the filename or command given; a hypothetical usage
is sketched below.  If there's a new patch with those changes, I'll try
to find time to look at it.
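
For instance, a purely illustrative session, assuming the '\gb'
semantics proposed above (the command does not exist in psql today):

 -- hypothetical: fetch the result in binary format, write it to a file
 SELECT i FROM generate_series(1, 5) AS s(i) \gb /tmp/ints.bin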

I would also recommend going through a detailed review of the other
patches before rebasing and re-submitting them.  In particular, make
sure that the comments are correct and consistent, and that there are
comments where there should be (generally speaking, whole functions
should have at least some comments in the body, not just the function
header comment).

Lastly, I'd suggest creating a 'psql.source' file for the regression
tests instead of just throwing things into 'misc.source'; a rough sketch
follows.  Seems like we should have more psql-related testing anyway,
and dumping everything into 'misc.source' really isn't a good idea.
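
As a rough sketch (the file path and the @abs_builddir@ substitution
are assumptions modeled on existing .source tests such as copy.source,
and '\gb' is the command proposed above):

 -- src/test/regress/input/psql.source (hypothetical)
 -- exercise the proposed \gb with a file target
 SELECT 42 AS answer \gb @abs_builddir@/results/psql_gb.bin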

Thanks!

Stephen




Re: [HACKERS] [PATCH] Suppress Clang 3.9 warnings

2017-03-15 Thread Noah Misch
On Wed, Mar 15, 2017 at 10:57:15AM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Mon, Mar 13, 2017 at 06:35:53PM +0300, Aleksander Alekseev wrote:
> >> + * Unfortunately in case of strlcat and strlcpy we can't trust tests
> >> + * executed by Autotools if Clang > 3.6 is used.
> 
> > This is wrong on platforms that do have strlcpy() in libc.
> 
> Didn't you submit a patch to upstream autoconf awhile ago to fix the
> AC_CHECK_DECLS test for this?

Yes.

> Seems like the correct solution is to
> absorb that fix, either by updating to a newer autoconf release or by
> carrying our own version of AC_CHECK_DECLS until they come out with one.

As you mention upthread, that Autoconf commit is still newer than every
Autoconf release.  (Last release was 58 months ago.)  Altering configure.ac to
work around the bug would be reasonable, but it feels heavy relative to the
benefit of suppressing some warnings.




Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Dilip Kumar
On Wed, Mar 15, 2017 at 8:51 PM, Dilip Kumar  wrote:
>> I can try to provide a test case, if that wouldn't be enough to spot
>> the problem.
>
> Thanks for reporting, I am looking into this.  Meanwhile, if you can
> provide the reproducible test case then locating the issue will be
> faster.

After multiple attempts with different datasets, I am unable to
reproduce the issue.

I tried with the test case below:
create table t(a int, b varchar);
insert into t values(generate_series(1,1000), repeat('x', 100));
insert into t values(generate_series(1,1), repeat('x', 100));
create index idx on t using brin(a);
postgres=# analyze t;
ANALYZE
postgres=# explain analyze select * from t where a>6;

                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=580794.52..3059826.52 rows=110414922 width=105) (actual time=92.324..91853.716 rows=110425971 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Bitmap Heap Scan on t  (cost=579794.52..3058826.52 rows=46006218 width=105) (actual time=65.651..62023.020 rows=36808657 loops=3)
         Recheck Cond: (a > 6)
         Rows Removed by Index Recheck: 4
         Heap Blocks: lossy=204401
         ->  Bitmap Index Scan on idx  (cost=0.00..552190.79 rows=110425920 width=0) (actual time=88.215..88.215 rows=1904 loops=1)
               Index Cond: (a > 6)
 Planning time: 1.116 ms
 Execution time: 96176.881 ms
(11 rows)

Is it possible for you to provide a reproducible test case?  I also
applied the patch given upthread [1] but still could not reproduce the
crash.


[1] 
https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



