Re: [HACKERS] Parallel bitmap heap scan

2017-02-19 Thread Dilip Kumar
On Sun, Feb 19, 2017 at 10:34 PM, Robert Haas  wrote:
> Yeah, but it looks like ExecReScanGather gets rid of the workers, but
> reuses the existing DSM.  I'm not quite sure what happens to the DSA.
> It looks like it probably just hangs around from the previous
> iteration, which means that any allocations will also hang around.

Yes, right, they hang around. But during a rescan
(ExecReScanBitmapHeapScan) we can free all of these DSA pointers, can't
we? That means that before reallocating the DSA pointers we would
already have gotten rid of the old ones.
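For what it's worth, the pattern being described — free the previous iteration's allocations at the top of the rescan before building new ones — can be sketched stand-alone. Plain malloc/free stands in for DSA allocation here, and the names are hypothetical, not the actual executor code:

```c
#include <stdlib.h>

/* Toy stand-in for per-scan shared state; a plain heap pointer models
 * what would be a dsa_pointer in the real code. */
typedef struct ScanSharedState
{
	int	   *shared_bitmap;	/* stands in for the DSA-allocated bitmap */
	size_t	nwords;
} ScanSharedState;

/* On rescan, free the previous iteration's allocation before
 * reallocating, so nothing from the old scan hangs around. */
static void
rescan_shared_state(ScanSharedState *state, size_t nwords)
{
	free(state->shared_bitmap); /* get rid of the old one first */
	state->shared_bitmap = calloc(nwords, sizeof(int));
	state->nwords = nwords;
}
```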


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Reply: [HACKERS] Adding new output parameter of pg_stat_statements to identify operation of the query. (Internet mail)

2017-02-19 Thread 李跃森
Yes, it seems the pg_stat_sql function can fit the specific need of
collecting the tags of queries. However, the new function cannot return the
other per-query values at the same time, such as block number info, run time,
and so on. Returning these values at the same time is very important.
So I think pg_stat_statements is still needed when monitoring a database
system.

From: pgsql-hackers-ow...@postgresql.org [pgsql-hackers-ow...@postgresql.org] on behalf of
Tsunakawa, Takayuki [tsunakawa.ta...@jp.fujitsu.com]
Sent: 2017-02-20 8:34
To: 'husttrip...@vip.sina.com'; pgsql-hackers
Subject: Re: [HACKERS] Adding new output parameter of pg_stat_statements to identify
operation of the query. (Internet mail)

From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of
> husttrip...@vip.sina.com
>  When using pg_stat_statements to collect the running SQL of PG, we
> find it hard for our program to get the exact operation type of the SQL,
> such as SELECT, DELETE, UPDATE, INSERT, and so on.
> So we modified the source code of pg_stat_statements and added another
> output parameter to tell us the operation type.
> Of course some applications know their operation type, but for us and many
> public databases, doing this is hard.
> The only way is to reparse the SQL, which is obviously too expensive for a
> monitoring or diagnosis system.
> We have done the job and are willing to post a patch.
> I sent one through my work mail, but it seems that my mail didn't reach
> the maillist, so I try again by using my personal mail account.

A view for counting the number of executions per operation type is being 
developed for PostgreSQL 10, which is expected to be released this year.

https://commitfest.postgresql.org/13/790/

Would this fit your need?  If not, what's the benefit of getting the operation 
type via pg_stat_statements?

Regards
Takayuki Tsunakawa







Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-19 Thread Andres Freund
On 2017-02-19 10:49:29 -0500, Tom Lane wrote:
> Robert Haas  writes:
> > On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane  wrote:
> >> Thoughts?  Should we double down on trying to make this work according
> >> to the "all integer timestamps" protocol specs, or cut our losses and
> >> change the specs?
> 
> > I vote for doubling down.  It's bad enough that we have so many
> > internal details that depend on this setting; letting that cascade
> > into the wire protocol seems like it's just letting the chaos spread
> > farther and wider.
> 
> How do you figure that it's not embedded in the wire protocol already?
> Not only the replicated data for a timestamp column, but also the
> client-visible binary I/O format, depend on this.  I think having some
> parts of the protocol use a different timestamp format than other parts
> is simply weird, and as this exercise has shown, it's bug-prone as all
> get out.

I don't think it's that closely tied together atm. Things like
pg_basebackup, pg_receivexlog etc should work, without having to match
timestamp storage.  Logical replication, unless your output plugin dumps
data in binary / "raw" output, also works just fine across the timestamp
divide.

It doesn't sound that hard to add a SystemToIntTimestamp() function,
given it only needs to do something if float timestamps are enabled?
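For reference, such a conversion is mostly arithmetic: float timestamps count seconds since 2000-01-01 as a double, while integer timestamps count microseconds since the same epoch as a 64-bit integer. A minimal stand-alone sketch follows; the function name comes from the suggestion above, and this is not the actual PostgreSQL implementation (which would also need to handle infinite timestamps):

```c
#include <stdint.h>

#define USECS_PER_SEC 1000000

/* Convert a float timestamp (seconds since 2000-01-01, as a double)
 * to an integer timestamp (microseconds since 2000-01-01, as int64).
 * Rounds to the nearest microsecond, halfway cases away from zero. */
static int64_t
SystemToIntTimestamp(double fts)
{
	return (int64_t) (fts * USECS_PER_SEC + (fts >= 0.0 ? 0.5 : -0.5));
}
```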

Regards,

Andres




Re: [HACKERS] Gather Merge

2017-02-19 Thread Rushabh Lathia
Thanks, Amit, for raising this point. I was not at all aware of mark/restore.
I tried to come up with such a case, but haven't found one.

For now, here is the patch with the comment update.

Thanks,


On Sun, Feb 19, 2017 at 7:30 PM, Amit Kapila 
wrote:

> On Sun, Feb 19, 2017 at 2:22 PM, Robert Haas 
> wrote:
> > On Sat, Feb 18, 2017 at 6:43 PM, Amit Kapila 
> wrote:
> >> I think there is a value in supporting mark/restore position for any
> >> node which produces sorted results, however, if you don't want to
> >> support it, then I think we should update above comment in the code to
> >> note this fact.  Also, you might want to check other places to see if
> >> any similar comment updates are required in case you don't want to
> >> support mark/restore position for GatherMerge.
> >
> > I don't think it makes sense to put mark/support restore into Gather
> > Merge.  Maybe somebody else will come up with a test that shows
> > differently, but ISTM that with something like Sort it makes a ton of
> > sense to support mark/restore because the Sort node itself can do it
> > much more cheaply than would be possible with a separate Materialize
> > node.  If you added a separate Materialize node, the Sort node would
> > be trying to throw away the exact same data that the Materialize node
> > was trying to accumulate, which is silly.
> >
>
> I am not sure, but there might be some cases where adding a Materialize
> node on top of a Sort node could make sense, as we cost that plan as well
> and add it if it turns out to be cheap.
>
> >  Here with Gather Merge
> > there is no such thing happening; mark/restore support would require
> > totally new code - probably we would end up shoving the same code that
> > already exists in Materialize into Gather Merge as well.
> >
>
> I have tried to evaluate this based on plans reported by Rushabh and
> didn't find any case where GatherMerge can be beneficial by supporting
> mark/restore in the plans where it is being used (it is not being used
> on the right side of a merge join).  Now, it is quite possible that it
> might be beneficial at higher scale factors, or maybe the planner has
> ignored such a plan.  However, as we are not clear what kind of
> benefits we can get via mark/restore support for GatherMerge, it
> doesn't make much sense to take the trouble of implementing it.
>
> >
> > A comment update is probably a good idea, though.
> >
>
> Agreed.
>
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>



-- 
Rushabh Lathia


gather-merge-v8-update-comment.patch
Description: application/download



Re: [HACKERS] SCRAM authentication, take three

2017-02-19 Thread Michael Paquier
On Sun, Feb 19, 2017 at 10:07 PM, Michael Paquier
 wrote:
> There is something that I think is still unwelcome in this patch: the
> interface in pg_hba.conf. I mentioned that in the previous thread but
> now if you want to match a user and a database with a scram password
> you need to do that with the current set of patches:
> local $dbname $user scram
> That's not really portable as SCRAM is one protocol in the SASL
> family, and even worse in our case we use SCRAM-SHA-256. I'd like to
> change pg_hba.conf to be as follows:
> local $dbname $user sasl protocol=scram_sha_256
> This is extensible for the future, and protocol is a mandatory option
> that would have now just a single value: scram_sha_256. Heikki,
> others, are you fine with that?

I have implemented that as 0009, which is attached and needs to be
applied on top of the rest of the patches upthread. I am not sure if we
want to make the case where no protocol is specified map to everything.
That would be tricky to support for users if new authentication
mechanisms for SASL are added in the future.

Another issue that I have is: do we really want password_encryption
to be set to "scram" for verifiers of SCRAM-SHA-256? I would think that
scram_sha_256 makes the most sense. Who knows, perhaps in a couple of
years there could be a SCRAM-SHA-512..

At the same time, attached is a new version of 0008 that implements
SASLprep. I have stabilized the beast after fixing some allocation
calculations when converting the decomposed pg_wchar array back to a
UTF-8 string.
-- 
Michael


0009-Make-hba-configuration-for-SASL-more-extensible.patch
Description: Binary data


0008-Implement-SASLprep-aka-NFKC-for-SCRAM-authentication.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [Bug fix] PQsendQuery occurs error when target_session_attrs is set to read-write

2017-02-19 Thread Michael Paquier
On Thu, Feb 16, 2017 at 10:55 AM, Michael Paquier
 wrote:
> It is possible to get a test easily in this area by abusing of the
> fact that multiple -d switches defined in psql make it use only the
> last value. By looking at psql() in PostgresNode.pm you would see what
> I mean as -d is defined by default. Or we could just enforce
> PGHOST/PGPORT as there is a method to get the unix domain directory.
> Not sure which one of those two methods is most elegant though. I
> would tend for just using PGHOST and PGPORT.

What do you think about the attached patch, then? As $ENV{PGHOST} is set
once and for all in INIT for each test run, I have gone with the
approach of building connection strings and feeding them to psql -d. I
have reused 001_stream_rep.pl so that connections are made to already
existing nodes, making the test really cheap. Here is how the tests
look:
# testing connection parameter target_session_attrs
ok 5 - connect to node master if mode "read-write" and master,standby_1 listed
ok 6 - connect to node master if mode "read-write" and standby_1,master listed
ok 7 - connect to node master if mode "any" and master,standby_1 listed
ok 8 - connect to node standby_1 if mode "any" and standby_1,master listed

I have registered an entry in the CF here:
https://commitfest.postgresql.org/13/1014/
-- 
Michael


target_session_attrs-tap.patch
Description: Binary data



Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-19 Thread Masahiko Sawada
On Mon, Feb 20, 2017 at 11:35 AM, Jim Nasby  wrote:
> On 2/19/17 7:56 PM, Masahiko Sawada wrote:
>>
>> The half-dead pages are never cleaned up if the ratio of pages
>> containing garbage is always lower than the threshold. Also, in a gin index
>> the pending list is never cleared, which becomes a big problem. I guess
>> that we should take action for each type of index.
>
>
> What worries me is that each AM is going to have a different notion of what
> needs to happen to support this. That indicates that trying to handle this
> at the vacuum level is not a good idea.
>
> I think it would be wiser to add support for skipping scans to the AM API
> instead. That also means you don't have to add support for this to every
> index type to start with.

Yeah, and it's better to have it as an index storage parameter.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Parallel Append implementation

2017-02-19 Thread Ashutosh Bapat
On Sun, Feb 19, 2017 at 2:33 PM, Robert Haas  wrote:
> On Fri, Feb 17, 2017 at 11:44 AM, Ashutosh Bapat
>  wrote:
>> That's true for a partitioned table, but not necessarily for every
>> append relation. Amit's patch is generic for all append relations. If
>> the child plans are joins or subquery segments of set operations, I
>> doubt if the same logic works. It may be better if we throw as many
>> workers (or some function "summing" those up) as specified by those
>> subplans. I guess, we have to use different logic for append relations
>> which are base relations and append relations which are not base
>> relations.
>
> Well, I for one do not believe that if somebody writes a UNION ALL
> with 100 branches, they should get 100 (or 99) workers.  Generally
> speaking, the sweet spot for parallel workers on queries we've tested
> so far has been between 1 and 4.  It's straining credulity to believe
> that the number that's correct for parallel append is more than an
> order of magnitude larger.  Since increasing resource commitment by
> the logarithm of the problem size has worked reasonably well for table
> scans, I believe we should pursue a similar approach here.

Thanks for that explanation. It makes sense. So, something like this
would work: total number of workers = some function of log(sum of
sizes of relations). The number of workers allotted to each segment
is restricted to the number of workers chosen by the planner
while planning that segment. The patch takes care of the limit right
now. It needs to incorporate the calculation of the total number of
workers for the append.
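The log-scaled approach Robert describes for table scans can be sketched stand-alone. The thresholds below are purely illustrative (PostgreSQL's planner uses min_parallel_relation_size and a similar tripling rule in its own worker computation), not the append-specific formula being proposed:

```c
/* Each tripling of the relation size beyond a base threshold adds one
 * worker, capped at max_workers, so that resource commitment grows
 * with the logarithm of the problem size. */
static int
log_scaled_workers(double rel_pages, double base_pages, int max_workers)
{
	int		workers = 1;
	double	threshold = base_pages;

	if (rel_pages < base_pages)
		return 0;				/* too small to parallelize at all */

	while (rel_pages >= threshold * 3 && workers < max_workers)
	{
		workers++;
		threshold *= 3;
	}
	return workers;
}
```

With a base of 1000 pages and a cap of 4, a 100-branch UNION ALL over modest inputs still lands in the 1-to-4 sweet spot rather than getting one worker per branch.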

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




[HACKERS] fd.c just Assert()s that lseek() succeeds

2017-02-19 Thread Tom Lane
I noticed while researching bug #14555 that fd.c contains two separate
cases like

vfdP->seekPos = lseek(vfdP->fd, (off_t) 0, SEEK_CUR);
Assert(vfdP->seekPos != (off_t) -1);

This seems, um, unwise.  It might somehow fail to fail in production
builds, because elsewhere it's assumed that -1 means FileUnknownPos,
but it seems like reporting the error would be a better plan.

regards, tom lane




Re: [HACKERS] Passing query string to workers

2017-02-19 Thread Rafia Sabih
On Sun, Feb 19, 2017 at 10:11 PM, Robert Haas  wrote:

> On Thu, Feb 16, 2017 at 6:41 PM, Kuntal Ghosh
>  wrote:
> > On Thu, Feb 16, 2017 at 5:47 PM, Rafia Sabih
> >  wrote:
> >> Other that that I updated some comments and other cleanup things. Please
> >> find the attached patch for the revised version.
> > Looks good.
> >
> > It has passed the regression tests (with and without regress mode).
> > Query is getting displayed for parallel workers in pg_stat_activity,
> > log statements and failed-query statements. Moved status to Ready for
> > committer.
>
> The first hunk of this patch says:
>
> +estate->es_queryString = (char *)
> palloc0(strlen(queryDesc->sourceText) + 1);
> +estate->es_queryString = queryDesc->sourceText;
>
> This is all kinds of wrong.
>
> 1. If you assign a value to a variable, and then immediately assign
> another value to a variable, the first assignment might as well not
> have happened.  In other words, this is allocating a string and then
> immediately leaking the allocated memory.
>
> 2. If the intent was to copy the string in into
> estate->es_queryString, ignoring for the moment that you'd need to use
> strcpy() rather than an assignment to make that happen, the use of
> palloc0 would be unnecessary.  Regular old palloc would be fine,
> because you don't need to zero the memory if you're about to overwrite
> it with some new contents.  Zeroing isn't free, so it's best not to do
> it unnecessarily.
>
> 3. Instead of using palloc and then copying the string, you could just
> use pstrdup(), which does the whole thing for you.
>
> 4. I don't actually think you need to copy the query string at all.
> Tom's email upthread referred to copying the POINTER to the string,
> not the string itself, and src/backend/executor/README seems to
> suggest that FreeQueryDesc can't be called until after ExecutorEnd().
> So I think you could replace these two lines of code with
> estate->es_queryString = queryDesc->sourceText and call it good.
> That's essentially what this is doing anyway, as the code is written.
>
> Fixed.

> +/* For passing query text to the workers */
>
> Unnecessary, because it's clear from the name PARALLEL_KEY_QUERY_TEXT.
>

True, done.

>
>  #define PARALLEL_TUPLE_QUEUE_SIZE  65536
> -
>  /*
>
> Don't remove this blank line.
>

Done.

>
> +   query_data = (char *) palloc0(strlen(estate->es_queryString) + 1);
> +   strcpy(query_data, estate->es_queryString);
>
> It's unnecessary to copy the query string here; you're going to use it
> later on in the very same function.  That code can just refer to
> estate->es_queryString directly.
>

Done.

>
> +   const char *es_queryString; /* Query string for passing to workers
> */
>
> Maybe this should be called es_sourceText, since it's being copied
> from querydesc->sourceText?
>

Done.

Please find the attached patch for the revised version.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pass_queryText_to_workers_v6.patch
Description: Binary data



Re: [HACKERS] drop support for Python 2.3

2017-02-19 Thread Devrim Gündüz

Hi,

On Sun, 2017-02-19 at 16:20 -0500, Tom Lane wrote:
> Well, that test is checking which week-of-the-year a Sunday midnight is
> considered to fall into.  There could be an edge-case bug in Tcl itself,
> or a problem with the time zone data, or maybe if you're setting LC_TIME
> to tr_TR, that changes whether weeks are considered to start on Sunday
> or Monday?  Although if that were the explanation I'd have expected this
> test to fail in tr_TR locale on pretty much any platform.  Weird.

All LC_* settings are en_GB.UTF-8 on this server.

> Whatever, even if it's a bug it's not our bug.  I've adjusted the test to
> check the following Tuesday, so as to dodge the edge case.

Thanks! Looks like buildfarm is green again.

Regards,
-- 
Devrim Gündüz
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] pg_dump emits ALTER TABLE ONLY partitioned_table

2017-02-19 Thread Amit Langote
Hi Stephen,

On 2017/02/17 22:32, Stephen Frost wrote:
> Amit,
> 
> * Amit Langote (langote_amit...@lab.ntt.co.jp) wrote:
>> In certain cases, pg_dump's dumpTableSchema() emits a separate ALTER TABLE
>> command for those schema elements of a table that could not be included
>> directly in the CREATE TABLE command for the table.
> 
> Any chance we could start adding regression tests for how pg_dump
> handles partitions?  I'm just about to the point where I have pretty
> much everything else covered (at least in pg_dump.c, where it's not a
> hard-to-reproduce error/exit case, or something version-dependent).
> 
> If you have any questions about how the TAP tests for pg_dump work, or
> about how to generate code-coverage checks to make sure you're at least
> hitting every line (tho, of course, not every possible path), let me
> know.  I'd be happy to explain them.

Yeah, I guess it would be a good idea to have some pg_dump TAP test
coverage for the new partitioning stuff.  I will look into that and get
back to you if I don't grok something there.

Thanks,
Amit






Re: [HACKERS] Adding new output parameter of pg_stat_statements to identify operation of the query.

2017-02-19 Thread Tom Lane
Jim Nasby  writes:
> Something that needs to be considered with doing this in  
> pg_stat_statement is that a query that's reported there can contain  
> multiple SQL statements. I don't remember offhand if all statements get  
> parsed as a whole before anything else happens; if that's the case then  
> you could potentially have an array in pg_stat_statements indicating  
> what the command tags are.

I think that's been addressed as of 83f2061dd.

My own concern here is that pg_stat_statements shared hashtable entries
(pgssEntry) are currently 200 bytes, if I counted correctly.  It's hard
to see how to implement this feature without adding COMPLETION_TAG_BUFSIZE
(64 bytes) to that, which is kind of a large percentage bump for a feature
request that AFAIR nobody else has ever made.

I suppose one way to avoid a large fixed-size field would be to store
the tag string out-of-line, similarly to what we do with the query text.
But then you've traded off a shared-memory-bloat worry for a performance
worry, ie what's the added overhead for dealing with another external
string.
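The size bump can be made concrete with toy layouts; the fields below are placeholders chosen so the "before" struct is 200 bytes, matching the count above, and are not the real pgssEntry definition:

```c
#include <stdint.h>

#define COMPLETION_TAG_BUFSIZE 64

/* Placeholder entry layout: 2 x 8-byte key words plus 23 doubles of
 * counters comes to 200 bytes. */
typedef struct EntryBefore
{
	int64_t key[2];
	double	counters[23];
} EntryBefore;

/* Adding a fixed completion-tag buffer grows every shared hashtable
 * entry by COMPLETION_TAG_BUFSIZE bytes, a 32% bump here. */
typedef struct EntryAfter
{
	int64_t key[2];
	double	counters[23];
	char	completion_tag[COMPLETION_TAG_BUFSIZE];
} EntryAfter;
```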

regards, tom lane




Re: [HACKERS] tab completion for partitioning

2017-02-19 Thread Amit Langote
On 2017/02/20 1:22, Robert Haas wrote:
> On Thu, Feb 16, 2017 at 7:15 AM, Amit Langote
>  wrote:
>> Also attaching 0002 (unchanged) for tab-completion support for the new
>> partitioning syntax.
> 
> At one point you have this:
> 
> +/* Limited completion support for partition bound specification */
> +else if (TailMatches3("ATTACH", "PARTITION", MatchAny))
> +COMPLETE_WITH_CONST("FOR VALUES");
> +else if (TailMatches5("ATTACH", "PARTITION", MatchAny, "FOR", "VALUES"))
> +COMPLETE_WITH_LIST2("FROM (", "IN (");
> +/*
> 
> And then later on you have it again:
> 
> +/* Limited completion support for partition bound specification */
> +else if (TailMatches3("PARTITION", "OF", MatchAny))
> +COMPLETE_WITH_CONST("FOR VALUES");
> +else if (TailMatches5("PARTITION", "OF", MatchAny, "FOR", "VALUES"))
> +COMPLETE_WITH_LIST2("FROM (", "IN (");
> 
> I don't think there's any benefit in repeating this.  I'm not sure
> which location to keep, but it doesn't seem to make sense to have it
> in two places.

Thanks for taking a look.  Hm, I think the second part seems to be
needless duplication.  So, I changed it to match using TailMatches2("FOR",
"VALUES") and kept just one instance of it.  The first part is matching
and completing two different commands (ATTACH PARTITION partition_name and
PARTITION OF parent_name), so that seems fine.

Updated patch attached.

Thanks,
Amit
>From 898c4d77ceff443b3170b793a0c95a0b793c544a Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 13 Feb 2017 18:18:48 +0900
Subject: [PATCH] Tab completion for the new partitioning syntax

---
 src/bin/psql/tab-complete.c | 57 -
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 94814c20d0..ddad71a10f 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -463,6 +463,21 @@ static const SchemaQuery Query_for_list_of_tables = {
 	NULL
 };
 
+static const SchemaQuery Query_for_list_of_partitioned_tables = {
+	/* catname */
+	"pg_catalog.pg_class c",
+	/* selcondition */
+	"c.relkind IN ('P')",
+	/* viscondition */
+	"pg_catalog.pg_table_is_visible(c.oid)",
+	/* namespace */
+	"c.relnamespace",
+	/* result */
+	"pg_catalog.quote_ident(c.relname)",
+	/* qualresult */
+	NULL
+};
+
 static const SchemaQuery Query_for_list_of_constraints_with_schema = {
 	/* catname */
 	"pg_catalog.pg_constraint c",
@@ -913,6 +928,16 @@ static const SchemaQuery Query_for_list_of_matviews = {
 "   SELECT 'DEFAULT' ) ss "\
 "  WHERE pg_catalog.substring(name,1,%%d)='%%s'"
 
+/* the silly-looking length condition is just to eat up the current word */
+#define Query_for_partition_of_table \
+"SELECT pg_catalog.quote_ident(c2.relname) "\
+"  FROM pg_catalog.pg_class c1, pg_catalog.pg_class c2, pg_catalog.pg_inherits i"\
+" WHERE c1.oid=i.inhparent and i.inhrelid=c2.oid"\
+"   and (%d = pg_catalog.length('%s'))"\
+"   and pg_catalog.quote_ident(c1.relname)='%s'"\
+"   and pg_catalog.pg_table_is_visible(c2.oid)"\
+"   and c2.relispartition = 'true'"
+
 /*
  * This is a list of all "things" in Pgsql, which can show up after CREATE or
  * DROP; and there is also a query to get a list of them.
@@ -1742,7 +1767,8 @@ psql_completion(const char *text, int start, int end)
 		static const char *const list_ALTER2[] =
 		{"ADD", "ALTER", "CLUSTER ON", "DISABLE", "DROP", "ENABLE", "INHERIT",
 			"NO INHERIT", "RENAME", "RESET", "OWNER TO", "SET",
-		"VALIDATE CONSTRAINT", "REPLICA IDENTITY", NULL};
+		"VALIDATE CONSTRAINT", "REPLICA IDENTITY", "ATTACH PARTITION",
+		"DETACH PARTITION", NULL};
 
 		COMPLETE_WITH_LIST(list_ALTER2);
 	}
@@ -1923,6 +1949,26 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH_LIST4("FULL", "NOTHING", "DEFAULT", "USING");
 	else if (Matches4("ALTER", "TABLE", MatchAny, "REPLICA"))
 		COMPLETE_WITH_CONST("IDENTITY");
+	/*
+	 * If we have ALTER TABLE  ATTACH PARTITION, provide a list of
+	 * tables.
+	 */
+	else if (Matches5("ALTER", "TABLE", MatchAny, "ATTACH", "PARTITION"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, "");
+	/* Limited completion support for partition bound specification */
+	else if (TailMatches3("ATTACH", "PARTITION", MatchAny))
+		COMPLETE_WITH_CONST("FOR VALUES");
+	else if (TailMatches2("FOR", "VALUES"))
+		COMPLETE_WITH_LIST2("FROM (", "IN (");
+	/*
+	 * If we have ALTER TABLE  DETACH PARTITION, provide a list of
+	 * partitions of .
+	 */
+	else if (Matches5("ALTER", "TABLE", MatchAny, "DETACH", "PARTITION"))
+	{
+		completion_info_charp = prev3_wd;
+		COMPLETE_WITH_QUERY(Query_for_partition_of_table);
+	}
 
 	/* ALTER TABLESPACE  with RENAME TO, OWNER TO, SET, RESET */
 	else if (Matches3("ALTER", "TABLESPACE", MatchAny))
@@ -2300,6 +2346,15 @@ psql_completion(const char *text, int start, int end)
 	/* Complete "CREATE UNLOGGED" with TABLE 

Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-19 Thread Jim Nasby

On 2/19/17 7:56 PM, Masahiko Sawada wrote:

The half-dead pages are never cleaned up if the ratio of pages
containing garbage is always lower than the threshold. Also, in a gin index
the pending list is never cleared, which becomes a big problem. I guess
that we should take action for each type of index.


What worries me is that each AM is going to have a different notion of 
what needs to happen to support this. That indicates that trying to 
handle this at the vacuum level is not a good idea.


I think it would be wiser to add support for skipping scans to the AM 
API instead. That also means you don't have to add support for this to 
every index type to start with.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] Partitioned tables and relfilenode

2017-02-19 Thread Amit Langote
On 2017/02/20 5:31, Simon Riggs wrote:
> On 16 February 2017 at 11:32, Simon Riggs  wrote:
>> On 10 February 2017 at 06:19, Amit Langote
>>  wrote:
>>
>>>  the "right thing" here being that the
>>> command's code either throws an error or warning (in some cases) if the
>>> specified table is a partitioned table or ignores any partitioned tables
>>> when it reads the list of relations to process from pg_class.
>>
>> This is a massive assumption and deserves major discussion.
>>
>> My expectation is that "partitioned tables" are "tables". Anything
>> else seems to fly in the face of both the SQL Standard and the POLA
>> principle for users coming from other database systems.
>>
>> IMHO all the main actions should all "just work" not throw errors.
> 
> This included DROP TABLE, which I commented on before.
> 
> CASCADE should not be required.

Yeah, it seemed like that was the consensus, so I posted a patch [0], which I
re-posted in a new thread titled "dropping partitioned tables without
CASCADE" [1].

Thanks,
Amit

[0] https://postgr.es/m/ca132b99-0d18-439a-fe65-024085449259%40lab.ntt.co.jp
[1] https://postgr.es/m/6c420206-45d7-3f56-8325-4bd7b76483ba%40lab.ntt.co.jp






Re: [HACKERS] Adding new output parameter of pg_stat_statements to identify operation of the query.

2017-02-19 Thread Jim Nasby

On 2/19/17 6:34 PM, Tsunakawa, Takayuki wrote:

We have done the job and are willing to post a patch.
I sent one through my work mail, but it seems that my mail didn't reach
the maillist, so I try again by using my personal mail account.

A view for counting the number of executions per operation type is being 
developed for PostgreSQL 10, which is expected to be released this year.

https://commitfest.postgresql.org/13/790/

Would this fit your need?  If not, what's the benefit of getting the operation 
type via pg_stat_statements?


Something that needs to be considered with doing this in  
pg_stat_statement is that a query that's reported there can contain  
multiple SQL statements. I don't remember offhand if all statements get  
parsed as a whole before anything else happens; if that's the case then  
you could potentially have an array in pg_stat_statements indicating  
what the command tags are.


Short of that, I'm not sure it would be a good idea to only support a  
single tag being visible at a time; it would be certain to induce users  
to create code that's going to be buggy as soon as someone starts using  
multiple statements.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] Documentation improvements for partitioning

2017-02-19 Thread Amit Langote
On 2017/02/20 1:04, Robert Haas wrote:
> On Thu, Feb 16, 2017 at 12:43 PM, Amit Langote wrote:
>> So I count more than a few votes saying that we should be able to DROP
>> partitioned tables without specifying CASCADE.
>>
>> I tried to implement that using the attached patch by having
>> StoreCatalogInheritance1() create DEPENDENCY_AUTO dependency between
>> parent and child if the child is a partition, instead of DEPENDENCY_NORMAL
>> that would otherwise be created.  Now it seems that that is one way of
>> making sure that partitions are dropped when the root partitioned table is
>> dropped, not sure if the best; why create the pg_depend entries at all one
>> might ask.  I chose it for now because that's the one with fewest lines of
>> change.  Adjusted regression tests as well, since we recently tweaked
>> tests [1] to work around the irregularities of test output when using 
>> CASCADE.
> 
> Could you possibly post this on a new thread with a reference back to
> this one?  The number of patches on this one is getting a bit hard to
> track, and some people may be under the misimpression that this one is
> just about documentation.

Sorry about that.  Sent the above message and the patch in a new thread
titled "dropping partitioned tables without CASCADE" [1].

Thanks,
Amit

[1] https://postgr.es/m/6c420206-45d7-3f56-8325-4bd7b76483ba%40lab.ntt.co.jp






Re: [HACKERS] Instability in select_parallel regression test

2017-02-19 Thread Amit Kapila
On Sun, Feb 19, 2017 at 8:32 PM, Robert Haas  wrote:
> On Sun, Feb 19, 2017 at 6:50 PM, Amit Kapila  wrote:
>> To close the remaining gap, don't you think we can check slot->in_use
>> flag when generation number for handle and slot are same.
>
> That doesn't completely fix it either, because
> ForgetBackgroundWorker() also does
> BackgroundWorkerData->parallel_terminate_count++, which we might also
> fail to see, which would cause RegisterDynamicBackgroundWorker() to
> bail out early.  There are CPU ordering effects to think about here,
> not just the order in which the operations are actually performed.
>

Sure, I think we can attempt to fix that as well by adding a write
memory barrier in ForgetBackgroundWorker().  The main point is if we
keep any loose end in this area, then there is a chance that the
regression test select_parallel can still fail, if not now, then in
future.  Another way could be that we can try to minimize the race
condition here and then adjust the select_parallel as suggested above
so that we don't see this failure.
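The ordering being discussed can be sketched with C11 atomics; this is a toy analogue (not the real bgworker code — slot layout and names are invented) of why the writer needs a release barrier before clearing in_use, paired with an acquire on the reader side:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

/* Toy analogue of the publication order under discussion: the writer
 * must make parallel_terminate_count and the slot's generation visible
 * before clearing in_use (release), and the reader must load in_use
 * first (acquire) before trusting the generation it sees. */
typedef struct
{
    atomic_uint generation;
    atomic_bool in_use;
} WorkerSlot;

static atomic_uint parallel_terminate_count;

static void
forget_worker(WorkerSlot *slot)
{
    atomic_fetch_add_explicit(&parallel_terminate_count, 1,
                              memory_order_relaxed);
    atomic_fetch_add_explicit(&slot->generation, 1, memory_order_relaxed);
    /* write barrier: everything above is visible before in_use clears */
    atomic_store_explicit(&slot->in_use, false, memory_order_release);
}

/* Returns true only if the slot still belongs to the handle's
 * generation and is still marked in use. */
static bool
slot_matches(WorkerSlot *slot, unsigned handle_generation)
{
    if (!atomic_load_explicit(&slot->in_use, memory_order_acquire))
        return false;
    return atomic_load_explicit(&slot->generation,
                                memory_order_relaxed) == handle_generation;
}
```

A single-threaded test can only exercise the logic, not the reordering itself, but it shows the protocol: once forget_worker() runs, no handle from the old generation can match the slot.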


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] dropping partitioned tables without CASCADE

2017-02-19 Thread Amit Langote
Re-posting the patch I posted in a nearby thread [0].

On 2017/02/16 2:08, Robert Haas wrote:
> On Wed, Feb 15, 2017 at 11:34 AM, Alvaro Herrera
>  wrote:
>> I think new-style partitioning is supposed to consider each partition as
>> an implementation detail of the table; the fact that you can manipulate
>> partitions separately does not really mean that they are their own
>> independent object.  You don't stop to think "do I really want to drop
>> the TOAST table attached to this main table?" and attach a CASCADE
>> clause if so.  You just drop the main table, and the toast one is
>> dropped automatically.  I think new-style partitions should behave
>> equivalently.
>
> That's a reasonable point of view.  I'd like to get some more opinions
> on this topic.  I'm happy to have us do whatever most people want, but
> I'm worried that having table inheritance and table partitioning work
> differently will create confusion.  I'm also suspicious that there
> may be some implementation difficulties.  On the other hand, it does seem a
> little silly to say that DROP TABLE partitioned_table should always
> fail except in the degenerate case where there are no partitions, so
> maybe changing it is for the best.

So I count more than a few votes saying that we should be able to DROP
partitioned tables without specifying CASCADE.

I tried to implement that using the attached patch by having
StoreCatalogInheritance1() create DEPENDENCY_AUTO dependency between
parent and child if the child is a partition, instead of DEPENDENCY_NORMAL
that would otherwise be created.  Now it seems that that is one way of
making sure that partitions are dropped when the root partitioned table is
dropped, not sure if the best; why create the pg_depend entries at all one
might ask.  I chose it for now because that's the one with fewest lines of
change.  Adjusted regression tests as well, since we recently tweaked
tests [1] to work around the irregularities of test output when using CASCADE.

Thanks,
Amit

[0] https://postgr.es/m/ca132b99-0d18-439a-fe65-024085449259%40lab.ntt.co.jp
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c397814
>From 68133e249156a36be15a6e2e02f702e10f356db5 Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 16 Feb 2017 15:56:44 +0900
Subject: [PATCH] Allow dropping partitioned table without CASCADE

Currently, a normal dependency is created between an inheritance
parent and child when creating the child.  That means one must
specify CASCADE to drop the parent table if a child table exists.
When creating partitions as inheritance children, create an auto
dependency instead, so that partitions are dropped automatically
when the parent is dropped, i.e., without specifying CASCADE.
---
 src/backend/commands/tablecmds.c   | 26 ++
 src/test/regress/expected/alter_table.out  | 10 --
 src/test/regress/expected/create_table.out |  9 ++---
 src/test/regress/expected/inherit.out  | 18 --
 src/test/regress/expected/insert.out   |  7 ++-
 src/test/regress/expected/update.out   |  5 -
 src/test/regress/sql/alter_table.sql   | 10 --
 src/test/regress/sql/create_table.sql  |  9 ++---
 src/test/regress/sql/insert.sql|  7 ++-
 9 files changed, 34 insertions(+), 67 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3cea220421..31b50ad77f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -289,9 +289,11 @@ static List *MergeAttributes(List *schema, List *supers, char relpersistence,
 static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
 static void MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel);
 static void MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel);
-static void StoreCatalogInheritance(Oid relationId, List *supers);
+static void StoreCatalogInheritance(Oid relationId, List *supers,
+		bool child_is_partition);
 static void StoreCatalogInheritance1(Oid relationId, Oid parentOid,
-		 int16 seqNumber, Relation inhRelation);
+		 int16 seqNumber, Relation inhRelation,
+		 bool child_is_partition);
 static int	findAttrByName(const char *attributeName, List *schema);
 static void AlterIndexNamespaces(Relation classRel, Relation rel,
    Oid oldNspOid, Oid newNspOid, ObjectAddresses *objsMoved);
@@ -725,7 +727,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		  typaddress);
 
 	/* Store inheritance information for new rel. */
-	StoreCatalogInheritance(relationId, inheritOids);
+	StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL);
 
 	/*
 	 * We must bump the command counter to make the newly-created relation
@@ -2240,7 +2242,8 @@ MergeCheckConstraint(List *constraints, char *name, Node *expr)
  * supers is a list of the OIDs of the new 

Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supporting infrastructure

2017-02-19 Thread Jim Nasby

On 2/19/17 11:02 AM, David Christensen wrote:

My design notes for the patch were submitted to the list with little comment; 
see: 
https://www.postgresql.org/message-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8%40endpoint.com

I have since added the WAL logging of checksum states; however, I’d be glad to 
take feedback on the other proposed approaches (particularly the system catalog 
changes and the concept of a checksum cycle).


A couple notes:

- AFAIK unlogged tables get checksummed today if checksums are enabled; 
the same should hold true if someone enables checksums on the whole cluster.


- Shared relations should be handled as well; you don't mention them.

- If an entire cluster is going to be considered as checksummed, then 
even databases that don't allow connections would need to get enabled.


I like the idea of revalidation, but I'd suggest leaving that off of the 
first pass.


It might be easier on a first pass to look at supporting per-database 
checksums (in this case, essentially treating shared catalogs as their 
own database). All normal backends do per-database stuff (such as 
setting current_database) during startup anyway. That doesn't really 
help for things like recovery and replication though. :/ And there's 
still the question of SLRUs (or are those not checksum'd today??).


BTW, it occurs to me that this is related to the problem we have with 
trying to make changes that break page binary compatibility. If we had a 
method for handling that it would probably be useful for enabling 
checksums as well. You'd essentially treat an un-checksum'd page as if 
it was an "old page version". The biggest problem there is dealing with 
the potential that the new page needs to be larger than the old one was, 
but maybe there's some useful progress to be had in this area before 
tackling the "page too small" problem.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] Adding new output parameter of pg_stat_statements to identify operation of the query.

2017-02-19 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of
> husttrip...@vip.sina.com
>  When using pg_stat_statements to collect running SQL of PG, we
> find it is hard for our program to get exact operation type of the SQL,
> such as SELECT, DELETE, UPDATE, INSERT, and so on.
>So we modify the source code of pg_stat_statements and add another
> output parameter to tell us the operation type.
> Of course some application know their operation type, but for us and many
> public databases, doing this is hard.
> The only way is to reparse the SQL, obviously it is too expensive for a
> monitoring or diagnosis system.
> We have done the job and are willing to post a patch.
> I sent one through my work mail, but it seems that my mail didn't reach
> the maillist, so I try again by using my personal mail account.

A view for counting the number of executions per operation type is being 
developed for PostgreSQL 10, which is expected to be released this year.

https://commitfest.postgresql.org/13/790/

Would this fit your need?  If not, what's the benefit of getting the operation 
type via pg_stat_statements?

Regards
Takayuki Tsunakawa







Re: [HACKERS] case_preservation_and_insensitivity = on

2017-02-19 Thread Tom Lane
Joel Jacobson  writes:
> I think a good general philosophy for the PostgreSQL project would be to
> try to look at how to meed the needs for new users of new projects
> in a way that don't impair things for existing users,

Yeah, exactly, and the problem here is that claiming that something
like this doesn't impact existing users is just ignoring reality.
At the minimum, every author of a driver or other client-side tool
is going to have to try to figure out how to make their code work
with all the possible case-folding rules.  They don't have the option
of ignoring server-side changes in the language.

That's why I alluded upthread to the old "server-side-autocommit" fiasco.
That too was sold to us using the argument that it wouldn't impact people
who didn't turn it on.  It took about a year for the full scope of the
damage to become apparent, and the end result was that we took the feature
out again.  I haven't yet seen an alternate-case-folding proposal that
wouldn't likely end up as the same kind of failure.

The versions of autocommit that have actually stood the test of time were
implemented on the client side (in psql and JDBC, and I think ODBC as
well), where the scope of affected code was lots smaller.  I wonder
whether there's any hope of providing something useful for case-folding
in that way.  psql's lexer is already smart enough that you could teach it
rules like "smash any unquoted identifier to lower case" (probably it
would fold keywords too, but that seems OK).  That's probably not much
help for custom applications, which aren't likely to be going through
psql scripts; but the fact that such behavior is in reach at all on the
client side seems encouraging.
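A minimal sketch of that client-side folding rule (invented for illustration; a real version would live in psql's lexer and also handle dollar quoting, comments, and escapes) might look like this:

```c
#include <ctype.h>
#include <string.h>
#include <assert.h>

/* Rough sketch of client-side case folding: smash every character
 * outside quotes to lower case (folding keywords too), while leaving
 * "quoted" identifiers and 'string literals' untouched. */
static void
fold_unquoted(char *sql)
{
    for (char *p = sql; *p; p++)
    {
        if (*p == '"' || *p == '\'')
        {
            char quote = *p++;

            while (*p && *p != quote)
                p++;
            if (!*p)
                break;          /* unterminated quote; stop folding */
        }
        else
            *p = (char) tolower((unsigned char) *p);
    }
}
```

Because this runs before anything reaches the server, the server-side identifier rules stay exactly as they are, which is the appeal of the approach.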

regards, tom lane




Re: [HACKERS] Parallel bitmap heap scan

2017-02-19 Thread Tom Lane
Thomas Munro  writes:
> One practical problem that came up was the need for executor nodes to
> get a chance to do that kind of cleanup before the DSM segment is
> detached.  In my patch series I introduced a new node API
> ExecNodeDetach to allow for that.  Andres objected that the need for
> that is evidence that the existing protocol is broken and should be
> fixed instead.  I'm looking into that.

The thing that you really have to worry about for this kind of proposal
is "what if the query errors out and we never get to ExecEndNode"?
It's particularly nasty if you're talking about parallel queries where
maybe only one or some of the processes involved detect an error.

regards, tom lane




Re: [HACKERS] case_preservation_and_insensitivity = on

2017-02-19 Thread Jim Nasby

On 2/19/17 4:51 PM, Joel Jacobson wrote:

But once you've already
decided to have a hard-and-fast rule that the names must be unique
after lower-casing, there's no obvious benefit to rejecting queries
that mention the same name with different case.

Exactly, that trade-off is necessary, otherwise such queries would be ambiguous.

I think a good general philosophy for the PostgreSQL project would be to
try to look at how to meet the needs of new users of new projects
in a way that doesn't impair things for existing users,
by accepting the new users might have to live with some trade-offs
for their new feature to be possible to implement,
such as in this case that the trade-off is to not be able to create
objects of different casing with the same lowercase names,
a tradeoff that I personally think would not be a problem for most projects,
since it seems unlikely you would both have a "users" table and a
"Users" table in the same database.


There's a serious problem with that, though: there certainly *could* be 
existing users that depend on the difference between "Users" and users, 
and there's no way we can just leave them out in the cold.


Even if the project decided that "Users" and users is stupid and that we 
should deprecate it, I think the odds of also deciding to tell existing 
users to re-write their apps are zero.


So no matter how this is designed, there has to be some way for existing 
users to be able to continue relying on "Users" and users being 
different. AFAICT that rules out any chance of this being a GUC, because 
you can't take a GUC into consideration when creating a unique index.


What would work is an initdb option that controls this: when ignoring 
case for uniqueness is disabled, your new column would simply be left as 
NULL. With some extra effort you could probably allow changing that on a 
running database as well, just not with something as easy to change as a 
GUC.
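The "store the name twice" scheme is easy to sketch; this is a hypothetical illustration (the catalog layout and function names are invented, and it ignores the system-catalog indexing machinery): keep the original spelling for display and a downcased copy for uniqueness, and simply skip the downcased check when the initdb-time option is off.

```c
#include <ctype.h>
#include <string.h>
#include <stdbool.h>
#include <assert.h>

/* Hypothetical dual-copy name storage: orig_names preserves the case
 * the user typed, folded_names holds the downcased copy used for the
 * uniqueness check when case-insensitive mode is enabled. */
#define MAXNAMES 16
#define NAMELEN  64

static char orig_names[MAXNAMES][NAMELEN];
static char folded_names[MAXNAMES][NAMELEN];
static int  nnames;

static bool
insert_name(const char *name, bool case_insensitive_unique)
{
    char folded[NAMELEN];
    int  i;

    if (nnames >= MAXNAMES || strlen(name) >= NAMELEN)
        return false;

    for (i = 0; name[i]; i++)
        folded[i] = (char) tolower((unsigned char) name[i]);
    folded[i] = '\0';

    for (i = 0; i < nnames; i++)
    {
        if (strcmp(orig_names[i], name) == 0)
            return false;       /* exact duplicate always rejected */
        if (case_insensitive_unique &&
            strcmp(folded_names[i], folded) == 0)
            return false;       /* "Users" collides with "users" */
    }
    strcpy(orig_names[nnames], name);
    strcpy(folded_names[nnames], folded);
    nnames++;
    return true;
}
```

This also shows why the setting can't be a per-session GUC: whether the second strcmp runs determines what the "unique index" contains, so it has to be fixed when the catalog is built.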

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-02-19 Thread Masahiko Sawada
On Fri, Feb 17, 2017 at 3:41 AM, Robert Haas  wrote:
> On Thu, Feb 16, 2017 at 6:17 AM, Simon Riggs  wrote:
>> On 15 February 2017 at 08:07, Masahiko Sawada  wrote:
>>> It's a bug. Attached latest version patch, which passed make check.
>>
>> In its current form, I'm not sure this is a good idea. Problems...
>>
>> 1. I'm pretty sure the world doesn't need another VACUUM parameter
>>
>> I suggest that we use the existing vacuum scale factor/4 to reflect
>> that indexes are more sensitive to bloat.
>
> I do not think it's a good idea to control multiple behaviors with a
> single GUC.  We don't really know that dividing by 4 will be right for
> everyone, or even for most people.  It's better to have another
> parameter with a sensible default than to hardcode a ratio that might
> work out poorly for some people.
>
>> 2. The current btree vacuum code requires 2 vacuums to fully reuse
>> half-dead pages. So skipping an index vacuum might mean that second
>> index scan never happens at all, which would be bad.
>
> Maybe.  If there are a tiny number of those half-dead pages in a huge
> index, it probably doesn't matter.  Also, I don't think it would never
> happen, unless the table just never gets any more updates or deletes -
> but that case could also happen today.  It's just a matter of
> happening less frequently.

The half-dead pages are never cleaned up if the ratio of pages
containing garbage always stays below the threshold. Also, in gin indexes
the pending list is never cleared, which becomes a big problem. I guess
that we should take action for each type of indexes.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: ParallelFinish-hook of FDW/CSP (Re: [HACKERS] Steps inside ExecEndGather)

2017-02-19 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Sent: Monday, February 20, 2017 2:20 AM
> To: Kaigai Kouhei(海外 浩平) 
> Cc: Claudio Freire ; Amit Kapila
> ; pgsql-hackers 
> Subject: Re: ParallelFinish-hook of FDW/CSP (Re: [HACKERS] Steps inside
> ExecEndGather)
> 
> On Fri, Feb 17, 2017 at 12:46 PM, Kouhei Kaigai 
> wrote:
> > The attached patch is revised one.
> >
> > Invocation of Exec(Foreign|Custom)ParallelFinish was moved to
> > ExecParallelRetrieveInstrumentation() not to walk on the plan- state
> > tree twice.
> > One (hypothetical) downside is, FDW/CSP can retrieve its own run-time
> > statistics only when query is executed under EXPLAIN ANALYZE.
> >
> > This enhancement allows FDW/CSP to collect its specific run- time
> > statistics more than Instrumentation, then show them as output of
> > EXPLAIN. My expected examples are GPU's kernel execution time, DMA
> > transfer ratio and so on. These statistics will never appear in the
> > Instrumentation structure, however, can be a hot- point of performance
> > bottleneck if CustomScan works on background workers.
> 
> Would gather_shutdown_children_first.patch from
> https://www.postgresql.org/message-id/CAFiTN-s5KuRuDrQCEpiHHzmVf7JTtbn
> b8eb10c-6aywjdxb...@mail.gmail.com
> help with this problem also?  Suppose we did that, and then also added an
> ExecShutdownCustom method.  Then you'd definitely be able to get control
> before the DSM went away, either from ExecEndNode() or ExecShutdownNode().
> 
Ah, yes, I couldn't find any problem around the above approach.
ExecShutdownGather() can be called by either ExecShutdownNode() or
ExecEndGather(). This patch allows to have an entrypoint for CSP/FDW
prior to release of the DSM.

Thanks,

PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei 




Re: [HACKERS] Parallel bitmap heap scan

2017-02-19 Thread Thomas Munro
On Sun, Feb 19, 2017 at 10:34 PM, Robert Haas  wrote:
> On Sun, Feb 19, 2017 at 9:59 PM, Dilip Kumar  wrote:
>> I can imagine it can get executed over and over if plan is something like 
>> below.
>>
>> NestLoopJoin
>> -> SeqScan
>> -> Gather
>> -> Parallel Bitmap Heap Scan
>>
>> But in such case every time the Inner node of the NLJ will be
>> rescanned i.e. Gather will be rescanned which in turn shutdown
>> workers.
>
> Yeah, but it looks like ExecReScanGather gets rid of the workers, but
> reuses the existing DSM.  I'm not quite sure what happens to the DSA.
> It looks like it probably just hangs around from the previous
> iteration, which means that any allocations will also hang around.

Yes, it hangs around.  Being able to reuse state in a rescan is a
feature: you might be able to reuse a hash table or a bitmap.

In the Parallel Shared Hash patch, the last participant to detach from
the shared hash table barrier gets a different return code and runs
some cleanup code.  The alternative was to make the leader wait for
the workers to finish accessing the hash table so that it could do
that.  I had it that way in an early version, but my goal is to
minimise synchronisation points so now it's just 'last to leave turns
out the lights' with no waiting.

One practical problem that came up was the need for executor nodes to
get a chance to do that kind of cleanup before the DSM segment is
detached.  In my patch series I introduced a new node API
ExecNodeDetach to allow for that.  Andres objected that the need for
that is evidence that the existing protocol is broken and should be
fixed instead.  I'm looking into that.

On Sun, Feb 19, 2017 at 9:59 PM, Dilip Kumar  wrote:
> So basically, what I want to propose is that Only during
> ExecReScanBitmapHeapScan we can free all the DSA pointers because at
> that time we can be sure that all the workers have completed their
> tasks and we are safe to free. (And we don't free any DSA memory at
> ExecEndBitmapHeapScan).

I think this works.
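As a toy analogue of that proposal (ordinary malloc/free standing in for DSA, with invented names): the scan state remembers every pointer it handed out, the rescan path frees them all before any reallocation, and plain end-of-scan leaves teardown to the area itself.

```c
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>

/* Toy analogue of rescan-time cleanup: track every "DSA pointer" the
 * scan allocates so the rescan path can release them in one pass. */
#define MAXPTRS 32

typedef struct
{
    void *ptrs[MAXPTRS];
    int   nptrs;
} ScanState;

static void *
scan_alloc(ScanState *state, size_t size)
{
    void *p = malloc(size);

    if (p != NULL && state->nptrs < MAXPTRS)
        state->ptrs[state->nptrs++] = p;
    return p;
}

/* Called from the rescan path, before rebuilding the bitmap: by this
 * point all workers have finished, so the old allocations are safe to
 * release. */
static void
scan_rescan(ScanState *state)
{
    for (int i = 0; i < state->nptrs; i++)
        free(state->ptrs[i]);
    state->nptrs = 0;
}
```

The point of the bookkeeping is exactly the leak Robert noted: without it, allocations from a previous iteration would simply hang around in the DSA until end of query.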

I also grappled a bit with the question of whether it's actually worth
trying to free DSA memory when you're finished with it, eating
precious CPU cycles at end of a join, or just letting the the
executor's DSA area get nuked at end of parallel execution.  As you
say, there is a special case for rescans to avoid leaks.  I described
this as a potential approach in a TODO note in my v5 patch, but
currently my code just does the clean-up every time on the grounds
that it's simple and hasn't shown up as a performance problem yet.

Some hand-wavy thoughts on this topic in the context of hash joins:

The argument for cleaning up sooner rather than later would be that it
could reduce the total peak memory usage of large execution plans.  Is
that a reasonable goal and can we achieve it?  I suspect the answer is
yes in theory but no in practice, and we don't even try to achieve it
in non-parallel queries as far as I know.

My understanding is that PostgreSQL's planner can generate left-deep,
bushy and right-deep hash join plans:

N nested left-deep hash joins:  Each hash join is on the outer side of
its parent, supplying tuples to probe the parent hash table.  Their
probe phases overlap, so all N hash tables must exist and be fully
loaded at the same time.

N nested right-deep hash joins:  Each hash join is on the inner side
of its parent, supplying tuples to build the hash table of its parent.
Theoretically you only need two full hash tables in memory at peak,
because you'll finish probing each one while building its parent's hash
table and then not need the child's hash table again (unless you need
to rescan).

N nested bushy hash joins:  Somewhere in between.
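The three shapes above reduce to a simple back-of-the-envelope formula for peak concurrent hash tables (an illustration of the argument, not planner code):

```c
#include <assert.h>

/* Illustration of the peak-memory argument: with N nested hash joins,
 * a left-deep chain keeps all N hash tables loaded during the
 * overlapping probe phases, while a right-deep chain (absent rescans)
 * never needs more than two at once. */
typedef enum
{
    LEFT_DEEP,
    RIGHT_DEEP
} JoinShape;

static int
peak_hash_tables(int njoins, JoinShape shape)
{
    if (njoins <= 0)
        return 0;
    if (shape == LEFT_DEEP)
        return njoins;          /* all probe phases overlap */
    return njoins == 1 ? 1 : 2; /* child dies as parent's table is built */
}
```

Bushy plans fall between these bounds, which is why "eager destruction" would only pay off if the planner also started caring about plan shape for memory reasons.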

But we don't actually take advantage of that opportunity to reduce
peak memory today.  We always have N live hash tables and don't free
them until standard_ExecutorEnd runs ExecProcEnd on the top of the
plan.  Perhaps we could teach hash tables to free themselves ASAP at
the end of their probe phase unless they know a rescan is possible.
But that just opens a whole can of worms:  if we care about total peak
memory usage, should it become a planner goal that might favour
right-deep hash joins?  I guess similar questions must arise for
bitmap heap scan and anything else holding memory that it doesn't
technically need anymore, and parallel query doesn't really change
anything about the situation, except maybe that Gather nodes provide a
point of scoping somewhere in between 'eager destruction' and 'hog all
the space until end of plan' which makes things a bit better.  I don't
know anywhere near enough about query planners to say whether such
ideas about planning are reasonable, and am quite aware that it's
difficult terrain, and I have other fish to fry, so I'm going to put
down the can opener and back away.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] pg_get_object_address() doesn't support composites

2017-02-19 Thread Jim Nasby

On 2/18/17 4:26 PM, Jim Nasby wrote:

On 2/17/17 9:53 PM, Alvaro Herrera wrote:

Jim Nasby wrote:

See below. ISTM that pg_get_object_address should support everything
pg_identify_object_as_address can output, no?

I'm guessing the answer here is to have pg_identify_object_as_address
complain if you ask it for something that's not mappable.

Yes, I think we should just reject the case in
pg_identify_object_as_address.


Attached patch does that, and tests for it. Note that there were some
unsupported types that were not being tested before. I've added a
comment requesting people update the test if they add more types.


While testing a view on pg_depend, I discovered this has the unfortunate 
side-effect of meaning you can no longer use 
pg_identify_object_as_address against pg_depend.ref*. Using it against 
pg_depend was already problematic though, because it throws an error on 
the pinned objects if you try and hand it classid, objid or objsubid. So 
maybe it's OK.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-02-19 Thread Gilles Darold
Le 16/02/2017 à 16:13, Robert Haas a écrit :
> On Wed, Feb 15, 2017 at 4:03 PM, Alvaro Herrera
>  wrote:
>> So what is going on here is that SysLogger_Start() wants to unlink the
>> current-logfile file if the collector is not enabled.  This should
>> probably be split out into a separate new function, for two reasons:
>> first, it doesn't seem good idea to have SysLogger_Start do something
>> other than start the logger; and second, so that we don't have a syscall
>> on each ServerLoop iteration.  That new function should be called from
>> some other place -- perhaps reaper() and just before entering
>> ServerLoop, so that the file is deleted if the syslogger goes away or is
>> not started.
> I think it's sufficient to just remove the file once on postmaster
> startup before trying to launch the syslogger for the first time.
> logging_collector is PGC_POSTMASTER, so if it's not turned on when the
> cluster first starts, it can't be activated later.  If it dies, that
> doesn't seem like a reason to remove the file.  We're going to restart
> the syslogger, and when we do, it can update the file.
>

I've attached a new full v30 patch that includes the last patch from Karl.
Now the current_logfile file is removed only at postmaster startup, just
before launching the syslogger for the first time.


-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 95afc2c..a668456 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4280,6 +4280,11 @@ SELECT * FROM parent WHERE key = 2400;
   where to log
  
 
+ 
+   current_logfiles
+   and the log_destination configuration parameter
+ 
+
  
 
  
@@ -4310,6 +4315,27 @@ SELECT * FROM parent WHERE key = 2400;
  must be enabled to generate
 CSV-format log output.

+   
+When either stderr or
+csvlog are included, the file
+current_logfiles is created to record the location
+of the log file(s) currently in use by the logging collector and the
+associated logging destination. This provides a convenient way to
+find the logs currently in use by the instance. Here is an example of
+this file's content:
+
+stderr pg_log/postgresql.log
+csvlog pg_log/postgresql.csv
+
+
+current_logfiles is recreated when a new log file
+is created as an effect of rotation, and
+when log_destination is reloaded.  It is removed when
+neither stderr
+nor csvlog are included
+in log_destination, and when the logging collector is
+disabled.
+   
 

 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index d7738b1..cfa6349 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15468,6 +15468,13 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
   
 
   
+   pg_current_logfile(text)
+   text
+   Primary log file name, or log in the requested format,
+   currently in use by the logging collector
+  
+
+  
pg_my_temp_schema()
oid
OID of session's temporary schema, or 0 if none
@@ -15686,6 +15693,45 @@ SET search_path TO schema , schema, ..

 

+pg_current_logfile
+   
+
+   
+Logging
+pg_current_logfile function
+   
+
+   
+ current_logfiles
+ and the pg_current_logfile function
+   
+
+   
+Logging
+current_logfiles file and the pg_current_logfile
+function
+   
+
+   
+pg_current_logfile returns, as text,
+the path of the log file(s) currently in use by the logging collector.
+The path includes the  directory
+and the log file name.  Log collection must be enabled or the return value
+is NULL.  When multiple log files exist, each in a
+different format, pg_current_logfile called
+without arguments returns the path of the file having the first format
+found in the ordered list: stderr, csvlog.
+NULL is returned when no log file has any of these
+formats.  To request a specific file format supply, as text,
+either csvlog or stderr as the value of the
+optional parameter. The return value is NULL when the
+log format requested is not a configured
+.  The
pg_current_logfile function reflects the contents of the
+current_logfiles file.
+   
+
+   
 pg_my_temp_schema

 
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 127b759..262adea 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -36,6 +36,10 @@ these required items, the cluster configuration files
 PGDATA, although it is possible to place them elsewhere.
 
 
+
+  current_logfiles
+
+
 
 Contents of PGDATA
 
@@ -61,6 +65,12 @@ Item
 
 
 
+ current_logfiles
+ File recording the log file(s) currently written to by the logging
+  collector
+
+
+
  global
  Subdirectory containing cluster-wide 

Re: [HACKERS] case_preservation_and_insensitivity = on

2017-02-19 Thread Joel Jacobson
On Sun, Feb 19, 2017 at 5:58 PM, Robert Haas  wrote:
>> When case preservation by default is on, then simply enforce
>> UNIQUE(LOWER(object_name)), to prevent ambiguity.
>
> That (1) breaks backward compatibility, because people might have
> objects with names identical except for case in existing databases and

Yes, but since the target for this new feature is probably new projects
(who need this feature to even consider PostgreSQL as a database,
like the original author of the other thread where they were using SQLAnywhere),
maybe that's an acceptable tradeoff, since all the existing
databases that try
to use the feature will just get an error when they switch it on,
e.g. "error: objects exist with identical lowercase names".

> (2) requires an expression index on a system catalog, which is not
> supported.  You could work around problem #2 with enough work, I
> guess, by storing two copies of the name of "name" column, one with
> the original casing and a second that has been downcased for indexing
> purposes.

Yes, storing the unique lowercase name together with the OriginalCase
name was also my idea of how to implement it.
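The two-column idea can be illustrated outside PostgreSQL. Below is a minimal sketch using Python's sqlite3 module (the table and column names are hypothetical, and sqlite3 stands in for the catalog storage): a unique expression index on lower(name) enforces case-insensitive uniqueness while the stored value keeps its original casing.

```python
import sqlite3

# In-memory database; "objects" and "name" are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (name TEXT NOT NULL)")
# Expression index: uniqueness is checked on the lowercased name,
# while the stored value keeps its original casing.
conn.execute("CREATE UNIQUE INDEX objects_lname ON objects (lower(name))")

conn.execute("INSERT INTO objects VALUES ('Users')")
try:
    conn.execute("INSERT INTO objects VALUES ('users')")  # same lowercase name
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

stored = conn.execute("SELECT name FROM objects").fetchone()[0]
print(duplicate_allowed, stored)  # False Users
```

As noted above, PostgreSQL's system catalogs don't support expression indexes, which is why the real implementation would store a second, downcased copy of the name for indexing.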

> I don't really understand what the GUC does in this scenario.

Changing the GUC (if you run into problems with tools) would simply
change which names are returned from pg for all the object names,
i.e. it would then return the lowercase names instead of the OriginalCase
names, to make the tools happy.
This means any pg user who enabled the CasePreserving feature when
creating the database would not have to redesign their entire database
if they later run into problems with tools, but could simply switch off
the GUC.

> But once you've already
> decided to have a hard-and-fast rule that the names must be unique
> after lower-casing, there's no obvious benefit to rejecting queries
> that mention the same name with different case.

Exactly, that trade-off is necessary, otherwise such queries would be ambiguous.

I think a good general philosophy for the PostgreSQL project would be to
try to meet the needs of new users and new projects
in a way that doesn't impair things for existing users,
by accepting that the new users might have to live with some trade-offs
for their new feature to be possible to implement.
In this case the trade-off is not being able to create
objects whose names differ only in casing,
a trade-off that I personally think would not be a problem for most projects,
since it seems unlikely you would have both a "users" table and a
"Users" table in the same database.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Resolved typo in a comment

2017-02-19 Thread neha khatri
Hi,

Attached is a patch that fixes a comment typo in varlena.c


Thanks,
Neha


typo_correction.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting xmin from VACUUMs

2017-02-19 Thread Jim Nasby

On 2/19/17 3:43 AM, Robert Haas wrote:

This is the kind of information that you really want to see once per
autovac, though, not just the most recent autovac or some kind of
cumulative total.  Knowing that I've done 301 index scans in my last
300 vacuums is not nearly as useful as knowing which autovacuum did 2
index scans and what exactly was going on with that vacuum.  So I'm
not sure including this sort of thing in the stats files would be very
useful, or at least you'd want to think carefully about how to do it.


Well, counters would be better than nothing I think, but I agree with 
your concern. Really, that's a problem for the entire stats system to 
varying degrees.



As far as bloating the stats file is concerned, the big problem there
is that our current design for the stats file requires rewriting the
entire thing any time we want to update even a single byte of data.
We could fix that by splitting up the files more so that they are
smaller and faster to rewrite, but we could also fix it by coming up
with a way of rewriting just one part of a file instead of the whole
thing, or we could think about storing it in DSM so that you don't
have to rewrite anything at all.  I think that last option is worth
some serious study now that we have DSA, but it's currently not very
high on my personal priority list.


Hmm... so basically replace the temporary file with DSM?

Something else I think would be useful is a way to subscribe to stats 
updates.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy - comments snapbuild.c

2017-02-19 Thread Erik Rijkers

On 2017-02-19 23:24, Erik Rijkers wrote:

0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch


Improve comment blocks in
  src/backend/replication/logical/snapbuild.c



[deep sigh...]  attached...

--- src/backend/replication/logical/snapbuild.c.orig2	2017-02-19 17:25:57.237527107 +0100
+++ src/backend/replication/logical/snapbuild.c	2017-02-19 23:19:57.654946968 +0100
@@ -34,7 +34,7 @@
  * xid. That is we keep a list of transactions between snapshot->(xmin, xmax)
  * that we consider committed, everything else is considered aborted/in
  * progress. That also allows us not to care about subtransactions before they
- * have committed which means this modules, in contrast to HS, doesn't have to
+ * have committed which means this module, in contrast to HS, doesn't have to
  * care about suboverflowed subtransactions and similar.
  *
  * One complexity of doing this is that to e.g. handle mixed DDL/DML
@@ -82,7 +82,7 @@
  * Initially the machinery is in the START stage. When an xl_running_xacts
  * record is read that is sufficiently new (above the safe xmin horizon),
  * there's a state transition. If there were no running xacts when the
- * runnign_xacts record was generated, we'll directly go into CONSISTENT
+ * running_xacts record was generated, we'll directly go into CONSISTENT
  * state, otherwise we'll switch to the FULL_SNAPSHOT state. Having a full
  * snapshot means that all transactions that start henceforth can be decoded
  * in their entirety, but transactions that started previously can't. In
@@ -274,7 +274,7 @@
 /*
  * Allocate a new snapshot builder.
  *
- * xmin_horizon is the xid >=which we can be sure no catalog rows have been
+ * xmin_horizon is the xid >= which we can be sure no catalog rows have been
  * removed, start_lsn is the LSN >= we want to replay commits.
  */
 SnapBuild *
@@ -1642,7 +1642,7 @@
 	fsync_fname("pg_logical/snapshots", true);
 
 	/*
-	 * Now there's no way we can loose the dumped state anymore, remember this
+	 * Now there's no way we can lose the dumped state anymore, remember this
 	 * as a serialization point.
 	 */
 	builder->last_serialized_snapshot = lsn;
@@ -1858,7 +1858,7 @@
 	char		path[MAXPGPATH];
 
 	/*
-	 * We start of with a minimum of the last redo pointer. No new replication
+	 * We start off with a minimum of the last redo pointer. No new replication
 	 * slot will start before that, so that's a safe upper bound for removal.
 	 */
 	redo = GetRedoRecPtr();
@@ -1916,7 +1916,7 @@
 			/*
 			 * It's not particularly harmful, though strange, if we can't
 			 * remove the file here. Don't prevent the checkpoint from
-			 * completing, that'd be cure worse than the disease.
+			 * completing, that'd be a cure worse than the disease.
 			 */
 			if (unlink(path) < 0)
 			{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy - comments snapbuild.c

2017-02-19 Thread Erik Rijkers

0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch


Improve comment blocks in
  src/backend/replication/logical/snapbuild.c



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Provide list of subscriptions and publications in psql's completion

2017-02-19 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Sun, Feb 19, 2017 at 6:13 PM, Michael Paquier
>  wrote:
> > On Sun, Feb 19, 2017 at 8:03 PM, Magnus Hagander  
> > wrote:
> >> If password auth is used, we have to store the password in plaintext
> >> equivalent somewhere. Meaning it's by definition going to be exposed to
> >> superusers and replication downstreams.
> >
> > Another possibility is to mention the use of the new passfile
> > parameter for connection strings in the docs... This removes the need
> > to have plain passwords directly stored in the database. Not sure if
> > that's better though because that still mean that the password is
> > present in plain format somewhere.
> 
> The real solution to "the password is present in plain form somewhere"
> is probably "don't use passwords for authentication".  Because,
> ultimately, a password by its nature has to exist in plain form
> somewhere, at least in someone's brain, and very likely in their
> password manager or the post-it stuck to their desk or the Notes app
> on their iPhone or similar.  If the password is simple enough that the
> DBA can be certain of remembering it without any sort of memory aid,
> it's probably dumb simple.  If the DBA has few enough distinct
> passwords that he doesn't need a memory aid just on the basis of sheer
> volume of passwords needing to be remembered, that's probably not good
> either.

I agree that the use of plaintext passwords should be discouraged, but
we don't actually provide any way to address these kinds of proxy
authentication issues today except to store *some* secret somewhere on
the system which, if compromised, would allow the attacker to gain
access to the downstream system.

There are solutions to authentication proxying and it would be great if
we would implement them (Kerberos Delegation, for example, and there's a
mechanism to do something similar in TLS also, iirc, and we should be
able to come up with something for same-system cross-database or
cross-cluster unix-socket based auth, really...), but I don't think
anyone is currently working on implementing them, which is unfortunate.

I tend to agree with Michael that we should make it very clear in
our documentation that the password/private key/whatever has to be
stored unencrypted on the filesystem somewhere today.  Perhaps it
doesn't need to be a big "WARNING WARNING" type of message, but it's
important information which we should make sure the user is aware of.

If we do implement proper proxy authentication, we will want to include
documentation as to just what that means regarding attacks too (Kerberos
Delegation, for example, usually means that a ticket for the remote
service is stored somewhere and is valid for a period of time, meaning
that a privileged attacker who has access to the system at the same time
the user is using the system would be able to gain access to the remote
system by stealing the delegated ticket).

Thanks!

Stephen


signature.asc
Description: Digital signature


[HACKERS] Deprecate floating point timestamps.

2017-02-19 Thread Jim Nasby
Over in the "Keeping pg_recvlogical's "feTimestamp" separate from
TimestampTz" thread...


On 2/17/17 12:15 PM, Tom Lane wrote:
> I am not sure that it was really a good design to pretend that the
> replication protocol is independent of --disable-integer-datetimes
> when the underlying WAL stream most certainly isn't.

Ok, I'll open the can of worms...

- Should replication be changed to obey --disable-integer-datetimes?
- Should we consider formally deprecating FP timestamps, starting with 
no longer supporting SR?


While #2 may sound rather severe, I'm wondering if a different datatype 
for timestamps stored as floats would ease that pain.
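The precision argument against float timestamps can be sketched in a couple of lines of Python; this is only an illustration of 64-bit float behaviour, not of PostgreSQL's actual timestamp code:

```python
# A 64-bit float has a 53-bit significand, so integers above 2**53
# can no longer be represented exactly:
big = float(2**53)
print(big + 1 == big)  # True: adding 1 is lost to rounding

# Counting microseconds as a float therefore loses exactness long
# before a 64-bit integer does; 2**53 microseconds is only ~285 years.
years = 2**53 / 1e6 / 86400 / 365.25
print(round(years))  # 285
```

This is why --disable-integer-datetimes trades microsecond exactness near the epoch for gradually coarser precision further away, while 64-bit integer microseconds stay exact across the whole supported range.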

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Adding new output parameter of pg_stat_statements to identify operation of the query.

2017-02-19 Thread husttripper
 Hi PG hackers: When using pg_stat_statements to collect running SQL of
PG, we find it is hard for our program to get the exact operation type of the SQL,
such as SELECT, DELETE, UPDATE, INSERT, and so on.  So we modified the
source code of pg_stat_statements and added another output parameter to tell us
the operation type.  Of course some applications know their operation type, but
for us and many public databases, doing this is hard.  The only way is to
reparse the SQL, which is obviously too expensive for a monitoring or diagnosis
system.  We have done the job and are willing to post a patch.  I sent one through
my work mail, but it seems that my mail didn't reach the mailing list, so I am
trying again with my personal mail account.
jasonysli
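As an illustration of why client-side classification is fragile (and hence why a server-side tag is attractive), here is a naive first-keyword classifier in Python; this is purely hypothetical code, not part of the proposed patch, and it already mislabels common statements:

```python
def classify(sql: str) -> str:
    # Naive approach: take the first keyword of the statement.
    return sql.lstrip().split(None, 1)[0].upper()

print(classify("SELECT * FROM t"))  # SELECT
print(classify("WITH x AS (SELECT 1) INSERT INTO t SELECT * FROM x"))  # WITH
# The second statement is really an INSERT, but the naive classifier
# reports WITH; getting this right requires an actual parse, which is
# exactly the reparsing cost the post wants to avoid in a monitor.
```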

Re: [HACKERS] Allow pg_dumpall to work without pg_authid

2017-02-19 Thread Stephen Frost
Robins,

* Robins Tharakan (thara...@gmail.com) wrote:
> On 19 February 2017 at 17:02, Robins Tharakan  wrote:
> > On Sun, 19 Feb 2017 at 10:08 Stephen Frost  wrote:
> >> If anything, it should use pg_roles, not pg_user.
> >>
> >> I don't really like the "--avoid-pgauthid" option, but "--no-passwords"
> >> would probably work.
> >>
> > '--no-passwords' is a good alternative.
> > Would submit a patch soon.
> >
> After reviewing further, it seems that merely adding a password related
> workaround wouldn't suffice. Further --no-password is already an alias for
> -w, so that flag is effectively taken.

Ah, yes, that makes --no-passwords a bad name.

The other changes to use pg_roles instead of pg_authid when rolpassword
isn't being used look like they should just be changed to use pg_roles
instead of using one or the other.  That should be an independent patch
from the one which adds the option we are discussing.

> Since the main restriction with AWS RDS is the unavailability of pg_authid,
> probably that is a better basis to name the flag on.

I don't like the idea of having the catalog name drive the option name.
For one thing, there's been some discussion of using column-level privs
on catalogs, which would actually make it such that pg_authid could be
queried by regular users for the public columns.

Perhaps --no-role-passwords instead?

> Attaching a patch to add a new flag (--no-pgauthid) to pg_dumpall that can
> dump Globals without needing pg_authid. So the following works with AWS RDS
> Postgres databases.
> 
> pg_dumpall --no-pgauthid --globals-only > a.sql

Does that then work with a non-superuser account on a regular PG
instance also?  If not, I'd like to suggest that we consider follow-on
patches to provide options for whatever else currently requires
superuser on a regular install.

> I'll create a Commitfest entry, if there aren't many objections.

Yes, please do create a commitfest entry for this.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] drop support for Python 2.3

2017-02-19 Thread Tom Lane
Devrim Gündüz  writes:
> On Sun, 2017-02-19 at 13:48 -0500, Tom Lane wrote:
>> Or conceivably it's timezone dependent?

> FWIW, the timezone of the server is GMT+3, if that is what you are asking.

Well, that test is checking which week-of-the-year a Sunday midnight is
considered to fall into.  There could be an edge-case bug in Tcl itself,
or a problem with the time zone data, or maybe if you're setting LC_TIME
to tr_TR, that changes whether weeks are considered to start on Sunday
or Monday?  Although if that were the explanation I'd have expected this
test to fail in tr_TR locale on pretty much any platform.  Weird.

Whatever, even if it's a bug it's not our bug.  I've adjusted the test to
check the following Tuesday, so as to dodge the edge case.
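The edge case is easy to reproduce with strftime-style week numbers. A small Python illustration follows (this goes through the C library's strftime, not Tcl, so only the general Sunday-vs-Monday ambiguity carries over):

```python
import datetime

sunday = datetime.date(2010, 1, 24)   # a Sunday
tuesday = datetime.date(2010, 1, 26)  # the following Tuesday

# %U counts weeks starting on Sunday, %W starting on Monday.  For a
# Sunday the two conventions disagree, which is the kind of boundary
# the original test tripped over:
print(sunday.strftime("%U"), sunday.strftime("%W"))    # 04 03
# For a mid-week day the two conventions agree, so checking a Tuesday
# dodges the edge case entirely:
print(tuesday.strftime("%U"), tuesday.strftime("%W"))  # 04 04
```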

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Documentation improvements for partitioning

2017-02-19 Thread Simon Riggs
On 15 February 2017 at 15:46, Robert Haas  wrote:

>> It leaves me asking what else is missing.
>
> There is certainly a lot of room for improvement here but I don't
> understand your persistent negativity about what's been done thus far.
> I think it's pretty clearly a huge step forward, and I think Amit
> deserves a ton of credit for making it happen.  The improvements in
> bulk loading performance alone are stupendous.  You apparently have
> the idea that somebody could have written an even larger patch that
> solved even more problems at once, but this was already a really big
> patch, and IMHO quite a good one.

Please explain these personal comments against me.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-02-19 Thread Simon Riggs
On 16 February 2017 at 11:32, Simon Riggs  wrote:
> On 10 February 2017 at 06:19, Amit Langote
>  wrote:
>
>>  the "right thing" here being that the
>> command's code either throws an error or warning (in some cases) if the
>> specified table is a partitioned table or ignores any partitioned tables
>> when it reads the list of relations to process from pg_class.
>
> This is a massive assumption and deserves major discussion.
>
> My expectation is that "partitioned tables" are "tables". Anything
> else seems to fly in the face of both the SQL Standard and the POLA
> principle for users coming from other database systems.
>
> IMHO all the main actions should all "just work" not throw errors.

This included DROP TABLE, which I commented on before.

CASCADE should not be required.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] drop support for Python 2.3

2017-02-19 Thread Devrim Gündüz

Hi Tom,

On Sun, 2017-02-19 at 13:48 -0500, Tom Lane wrote:
> Hmph.  I can't see any relevant-looking source changes between 8.4.13
> and 8.4.15, which I have laying about here and which works fine.
> I wonder if Red Hat is carrying some distro-specific patch that
> breaks this case? 

Just downloaded SRPM, and I don't *think* so their patch would break this.

>  Or conceivably it's timezone dependent?

FWIW, the timezone of the server is GMT+3, if that is what you are asking.

> Anyway, my inclination is just to tweak that test a bit so it doesn't
> trip over the problem.  The point of the test is mainly to see if the
> [clock] command works at all, not to exercise any specific parameter
> choices.  Would you check whether this:
> 
> $ tclsh
> % clock format [clock scan "1/26/2010"] -format "%U"
> 
> gives the expected result "04" on that machine?

Yes, I got 04.

Regards,
-- 
Devrim Gündüz
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] drop support for Python 2.3

2017-02-19 Thread Tom Lane
Devrim Gündüz  writes:
> On Sun, 2017-02-19 at 10:42 -0500, Tom Lane wrote:
>> Relevant question: what version of tcl is installed on those?

> 8.4.13 is installed.

Hmph.  I can't see any relevant-looking source changes between 8.4.13
and 8.4.15, which I have laying about here and which works fine.
I wonder if Red Hat is carrying some distro-specific patch that
breaks this case?  Or conceivably it's timezone dependent?

Anyway, my inclination is just to tweak that test a bit so it doesn't
trip over the problem.  The point of the test is mainly to see if the
[clock] command works at all, not to exercise any specific parameter
choices.  Would you check whether this:

$ tclsh
% clock format [clock scan "1/26/2010"] -format "%U"

gives the expected result "04" on that machine?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] drop support for Python 2.3

2017-02-19 Thread Devrim Gündüz

Hi Tom,

On Sun, 2017-02-19 at 10:42 -0500, Tom Lane wrote:
> Relevant question: what version of tcl is installed on those?

8.4.13 is installed.

Regards,
-- 
Devrim Gündüz
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] COPY IN/BOTH vs. extended query mode

2017-02-19 Thread Corey Huinker
On Sun, Feb 19, 2017 at 9:04 AM, Robert Haas  wrote:

> If you tried to write an SQL-callable function that internally started
> and ended a copy from the client, then I think you would run into this
> problem, and probably some others.
That's it. I had submitted a PoC patch that allowed someone to do this:

insert into some_table(id, total_sum) select id, sum(a_numeric_metric) from
copy_srf('/a/file/name') group by 1

or

insert into some_table(id, total_sum) select id, sum(a_numeric_metric) from
copy_srf('/a/program/name arg1 arg2',true) group by 1


and those worked just fine; however, attempts to read from STDIN

insert into some_table(id, total_sum) select id, sum(a_numeric_metric) from
copy_srf(null) group by 1

failed because, as it was explained to me, the order of events would be:

1. start query
2. send result set format to client
3. start copy which implies that query result set is done
4. finish copy
5. emit query results to client, but the defining result format is gone,
thus error.

I'm just putting this here for future reference in case there is a protocol
change in the works.


Re: [pgsql-www] [HACKERS] Small issue in online devel documentation build

2017-02-19 Thread Fabien COELHO




  -f filename


The filename in the newer html appears much larger under chrome, seemingly 
because of the  within a . Maybe a bug in chrome CSS 
interpretation, because CSS on code seems to indicate "font-size: 1.3em", but 
it seems to do 1.3**2 instead for "filename"... However it does not do that 
elsewhere so it may not be that simple...



2. avoid replaceable within option in initial sgml files
   => many changes, will reappear if someone forgets.


I wrote a few lines of Perl to move replaceable out of option and did some
manual editing in special cases; the resulting 359 simple changes are
attached.


--
Fabien.

diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index 67582fd..6256508 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -122,8 +122,8 @@ PostgreSQL documentation
  
 
  
-  -t table
-  --table=table
+  -t table
+  --table=table
   

 Cluster table only.
@@ -173,8 +173,8 @@ PostgreSQL documentation
 
 
  
-  -h host
-  --host=host
+  -h host
+  --host=host
   

 Specifies the host name of the machine on which the server is
@@ -185,8 +185,8 @@ PostgreSQL documentation
  
 
  
-  -p port
-  --port=port
+  -p port
+  --port=port
   

 Specifies the TCP port or local Unix domain socket file
@@ -197,8 +197,8 @@ PostgreSQL documentation
  
 
  
-  -U username
-  --username=username
+  -U username
+  --username=username
   

 User name to connect as.
@@ -243,7 +243,7 @@ PostgreSQL documentation
  
 
  
-  --maintenance-db=dbname
+  --maintenance-db=dbname
   

  Specifies the name of the database to connect to discover what other
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index c363bd4..f3ef345 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -86,8 +86,8 @@ PostgreSQL documentation
  
 
  
-  -D tablespace
-  --tablespace=tablespace
+  -D tablespace
+  --tablespace=tablespace
   

 Specifies the default tablespace for the database. (This name
@@ -108,8 +108,8 @@ PostgreSQL documentation
  
 
  
-  -E encoding
-  --encoding=encoding
+  -E encoding
+  --encoding=encoding
   

 Specifies the character encoding scheme to be used in this
@@ -121,8 +121,8 @@ PostgreSQL documentation
  
 
  
-  -l locale
-  --locale=locale
+  -l locale
+  --locale=locale
   

 Specifies the locale to be used in this database.  This is equivalent
@@ -132,7 +132,7 @@ PostgreSQL documentation
  
 
  
-  --lc-collate=locale
+  --lc-collate=locale
   

 Specifies the LC_COLLATE setting to be used in this database.
@@ -141,7 +141,7 @@ PostgreSQL documentation
  
 
  
-  --lc-ctype=locale
+  --lc-ctype=locale
   

 Specifies the LC_CTYPE setting to be used in this database.
@@ -150,8 +150,8 @@ PostgreSQL documentation
  
 
  
-  -O owner
-  --owner=owner
+  -O owner
+  --owner=owner
   

 Specifies the database user who will own the new database.
@@ -161,8 +161,8 @@ PostgreSQL documentation
  
 
  
-  -T template
-  --template=template
+  -T template
+  --template=template
   

 Specifies the template database from which to build this
@@ -209,8 +209,8 @@ PostgreSQL documentation
 
 
  
-  -h host
-  --host=host
+  -h host
+  --host=host
   

 Specifies the host name of the machine on which the
@@ -221,8 +221,8 @@ PostgreSQL documentation
  
 
  
-  -p port
-  --port=port
+  -p port
+  --port=port
   

 Specifies the TCP port or the local Unix domain socket file
@@ -232,8 +232,8 @@ PostgreSQL documentation
  
 
  
-  -U username
-  --username=username
+  -U username
+  --username=username
   

 User name to connect as.
@@ -278,7 +278,7 @@ PostgreSQL documentation
  
 
  
-  --maintenance-db=dbname
+  --maintenance-db=dbname
   

  Specifies the name of the database to connect to when creating the
diff --git a/doc/src/sgml/ref/createlang.sgml b/doc/src/sgml/ref/createlang.sgml
index e9c95d3..ef442bc 100644
--- a/doc/src/sgml/ref/createlang.sgml
+++ b/doc/src/sgml/ref/createlang.sgml
@@ -138,8 +138,8 @@ PostgreSQL documentation
 
 
  
-  -h host
-  --host=host
+  -h host
+  --host=host
   

 Specifies the host name of the machine on which the
@@ -151,8 +151,8 @@ PostgreSQL documentation
  
 
  
-  -p port
-  --port=port
+  -p port
+  --port=port
   
  

Re: ParallelFinish-hook of FDW/CSP (Re: [HACKERS] Steps inside ExecEndGather)

2017-02-19 Thread Robert Haas
On Fri, Feb 17, 2017 at 12:46 PM, Kouhei Kaigai  wrote:
> The attached patch is revised one.
>
> Invocation of Exec(Foreign|Custom)ParallelFinish was moved to
> ExecParallelRetrieveInstrumentation() not to walk on the plan-
> state tree twice.
> One (hypothetical) downside is, FDW/CSP can retrieve its own
> run-time statistics only when query is executed under EXPLAIN
> ANALYZE.
>
> This enhancement allows FDW/CSP to collect its specific run-
> time statistics more than Instrumentation, then show them as
> output of EXPLAIN. My expected examples are GPU's kernel execution
> time, DMA transfer ratio and so on. These statistics will never
> appear in the Instrumentation structure, however, can be a hot-
> point of performance bottleneck if CustomScan works on background
> workers.

Would gather_shutdown_children_first.patch from
https://www.postgresql.org/message-id/cafitn-s5kurudrqcepihhzmvf7jttbnb8eb10c-6aywjdxb...@mail.gmail.com
help with this problem also?  Suppose we did that, and then also added
an ExecShutdownCustom method.  Then you'd definitely be able to get
control before the DSM went away, either from ExecEndNode() or
ExecShutdownNode().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-19 Thread Robert Haas
On Sun, Feb 19, 2017 at 9:19 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane  wrote:
>>> Thoughts?  Should we double down on trying to make this work according
>>> to the "all integer timestamps" protocol specs, or cut our losses and
>>> change the specs?
>
>> I vote for doubling down.  It's bad enough that we have so many
>> internal details that depend on this setting; letting that cascade
>> into the wire protocol seems like it's just letting the chaos spread
>> farther and wider.
>
> How do you figure that it's not embedded in the wire protocol already?
> Not only the replicated data for a timestamp column, but also the
> client-visible binary I/O format, depend on this.  I think having some
> parts of the protocol use a different timestamp format than other parts
> is simply weird, and as this exercise has shown, it's bug-prone as all
> get out.

You've got a point, but if we don't make the replication protocol
consistently use integers, then every client that cares about those
fields has to be able to handle either format, or at least detect that
it got a different format than it was expecting and fail cleanly.
How's that going to work?

Also, I feel like there's some difference between the data stored in a
cluster and the metadata which describes the cluster.  Making the
interpretation of the metadata depend on the cluster configuration
feels circular, like using non-ASCII characters in the name of an
encoding.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel bitmap heap scan

2017-02-19 Thread Robert Haas
On Sun, Feb 19, 2017 at 9:59 PM, Dilip Kumar  wrote:
> I can imagine it can get executed over and over if plan is something like 
> below.
>
> NestLoopJoin
> -> SeqScan
> -> Gather
> -> Parallel Bitmap Heap Scan
>
> But in such case every time the Inner node of the NLJ will be
> rescanned i.e. Gather will be rescanned which in turn shutdown
> workers.

Yeah, but it looks like ExecReScanGather gets rid of the workers, but
reuses the existing DSM.  I'm not quite sure what happens to the DSA.
It looks like it probably just hangs around from the previous
iteration, which means that any allocations will also hang around.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supporting infrastructure

2017-02-19 Thread David Christensen

> On Feb 19, 2017, at 5:05 AM, Robert Haas  wrote:
> 
> On Fri, Feb 17, 2017 at 2:28 AM, David Christensen  wrote:
>> - Change "data_checksums" from a simple boolean to "data_checksum_state", an 
>> enum type for all of
>>  the potentially-required states for this feature (as well as enabling).
> 
> Color me skeptical.  I don't know what CHECKSUMS_ENABLING,
> CHECKSUMS_ENFORCING, and CHECKSUMS_REVALIDATING are intended to
> represent -- and there's no comments in the patch explaining it -- but
> if we haven't yet written the code to enable checksums, how do we know
> for sure which states it will require?
> 
> If we're going to accept a patch to disable checksums without also
> having the capability to enable checksums, I think we should leave out
> the speculative elements about what might be needed on the "enable"
> side and just implement the minimal "disable" side.
> 
> However, FWIW, I don't accept that being able to disable checksums
> online is a sufficient advance to justify enabling checksums by
> default.  Tomas had some good points on another thread about what
> might be needed to really make that a good choice, and I'm still
> skeptical about whether checksums catch any meaningful number of
> errors that wouldn't be caught otherwise, and about the degree to
> which any complaints it issues are actionable.  I'm not really against
> this patch on its own merits, but I think it's a small advance in an
> area that needs a lot of work.  I think it would be a lot more useful
> if we had a way to *enable* checksums online.  Then people who find
> out that checksums exist and want them have an easier way of getting
> them, and anyone who uses the functionality in this patch and then
> regrets it has a way to get back.


Hi Robert, this is part of a larger patch which *does* enable checksums
online; I’ve been extracting the necessary pieces with the understanding
that some people thought the disable code might be useful on its own merits.
I can add documentation for the various states.  CHECKSUM_REVALIDATE is the
only one I feel is a little wibbly-wobbly; the other states are required
because enabling checksums requires distinguishing between “in the
process of enabling” and “everything is enabled”.

My design notes for the patch were submitted to the list with little comment; 
see: 
https://www.postgresql.org/message-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8%40endpoint.com

I have since added the WAL logging of checksum states; however, I’d be glad to
take feedback on the other proposed approaches (particularly the system catalog
changes + the concept of a checksum cycle).

Best,

David
--
David Christensen
End Point Corporation
da...@endpoint.com
785-727-1171







Re: [HACKERS] case_preservation_and_insensitivity = on

2017-02-19 Thread Robert Haas
On Thu, Feb 16, 2017 at 11:16 PM, Joel Jacobson  wrote:
>> The short answer is that nobody can see a way to modify the identifier
>> case-folding rules that isn't going to add more pain than it subtracts.
>> And much of the added pain will be felt by people who aren't getting
>> any benefit, who will therefore be vociferously against the whole thing.
>
> I've read the discussion and have an idea:
>
> When case preservation by default is on, then simply enforce
> UNIQUE(LOWER(object_name)), to prevent ambiguity.

That (1) breaks backward compatibility, because people might have
objects with names identical except for case in existing databases and
(2) requires an expression index on a system catalog, which is not
supported.  You could work around problem #2 with enough work, I
guess, by storing two copies of the name, one with
the original casing and a second that has been downcased for indexing
purposes.

I don't really understand what the GUC does in this scenario.
Changing a GUC won't change the data that's already in your system
catalogs, so it would have to change the interpretation of
newly-arriving queries against that data.  But once you've already
decided to have a hard-and-fast rule that the names must be unique
after lower-casing, there's no obvious benefit to rejecting queries
that mention the same name with different case.

As compared with any proposal that actually changes the case-folding
behavior, a new mode that is case-preserving but case-insensitive
would break less stuff.  People who never use case to differentiate
between different objects probably won't notice the difference, except
that sometimes they might accidentally type something in the wrong
case and it would work instead of failing.  Tools that are designed on
the existing fold-to-lowercase behavior would keep working if they
don't actually query the system catalogs for information; those that
do might need adjustment.

It still sounds pretty painful, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] New CORRESPONDING clause design

2017-02-19 Thread Robert Haas
On Thu, Feb 16, 2017 at 8:28 PM, Surafel Temsgen  wrote:
> Here is the implementation of the clause with a slight change: instead of
> doing column mapping for each side of the leaf queries in the planner, I make
> the projection nodes output only the corresponding column lists.
>
> This patch compiles and tests successfully against the master branch on
> ubuntu-15.10-desktop-amd64.  It also includes documentation and new regression
> tests in union.sql.

You should probably add this to https://commitfest.postgresql.org/ so
that it doesn't get forgotten about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Passing query string to workers

2017-02-19 Thread Robert Haas
On Thu, Feb 16, 2017 at 6:41 PM, Kuntal Ghosh
 wrote:
> On Thu, Feb 16, 2017 at 5:47 PM, Rafia Sabih
>  wrote:
>> Other that that I updated some comments and other cleanup things. Please
>> find the attached patch for the revised version.
> Looks good.
>
> It has passed the regression tests (with and without regress mode).
> Query is getting displayed for parallel workers in pg_stat_activity,
> log statements and failed-query statements. Moved status to Ready for
> committer.

The first hunk of this patch says:

+estate->es_queryString = (char *)
palloc0(strlen(queryDesc->sourceText) + 1);
+estate->es_queryString = queryDesc->sourceText;

This is all kinds of wrong.

1. If you assign a value to a variable, and then immediately assign
another value to a variable, the first assignment might as well not
have happened.  In other words, this is allocating a string and then
immediately leaking the allocated memory.

2. If the intent was to copy the string in into
estate->es_queryString, ignoring for the moment that you'd need to use
strcpy() rather than an assignment to make that happen, the use of
palloc0 would be unnecessary.  Regular old palloc would be fine,
because you don't need to zero the memory if you're about to overwrite
it with some new contents.  Zeroing isn't free, so it's best not to do
it unnecessarily.

3. Instead of using palloc and then copying the string, you could just
use pstrdup(), which does the whole thing for you.

4. I don't actually think you need to copy the query string at all.
Tom's email upthread referred to copying the POINTER to the string,
not the string itself, and src/backend/executor/README seems to
suggest that FreeQueryDesc can't be called until after ExecutorEnd().
So I think you could replace these two lines of code with
estate->es_queryString = queryDesc->sourceText and call it good.
That's essentially what this is doing anyway, as the code is written.

+/* For passing query text to the workers */

Unnecessary, because it's clear from the name PARALLEL_KEY_QUERY_TEXT.

 #define PARALLEL_TUPLE_QUEUE_SIZE  65536
-
 /*

Don't remove this blank line.

+   query_data = (char *) palloc0(strlen(estate->es_queryString) + 1);
+   strcpy(query_data, estate->es_queryString);

It's unnecessary to copy the query string here; you're going to use it
later on in the very same function.  That code can just refer to
estate->es_queryString directly.

+   const char *es_queryString; /* Query string for passing to workers */

Maybe this should be called es_sourceText, since it's being copied
from querydesc->sourceText?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel bitmap heap scan

2017-02-19 Thread Dilip Kumar
On Sun, Feb 19, 2017 at 7:44 PM, Robert Haas  wrote:
> It's probably OK if tbm_free() doesn't free the memory allocated from
> DSA, and we just let cleanup at end of query do it.  However, that
> could cause some trouble if the Parallel Bitmap Heap Scan gets
> executed over and over and keeps allocating more and more memory from
> DSA.

Is it possible that Parallel Bitmap Heap Scan will be executed
multiple times without shutting down the workers?

I can imagine it can get executed over and over if the plan is something like the one below.

NestLoopJoin
-> SeqScan
-> Gather
-> Parallel Bitmap Heap Scan

But in such a case, every time the inner node of the NLJ is
rescanned, i.e. the Gather is rescanned, which in turn shuts down the
workers.

So basically, what I want to propose is that only during
ExecReScanBitmapHeapScan can we free all the DSA pointers, because at
that time we can be sure that all the workers have completed their
task and we are safe to free.  (And we don't free any DSA memory at
ExecEndBitmapHeapScan.)

> I think the way to fix that would be to maintain a reference
> count that starts at 1 when the iterator arrays are created and gets
> incremented every time a TBMSharedIteratorState is created.  It gets
> decremented when the TIDBitmap is destroyed that has iterator arrays
> is destroyed, and each time a TBMSharedIteratorState is destroyed.
> When it reaches 0, the process that reduces the reference count to 0
> calls dsa_free on the DSA pointers for pagetable, spages, and schunks.
> (Also, if a TIDBitmap is freed before iteration begins, it frees the
> DSA pointer for the pagetable only; the others won't have values yet.)


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Does having a NULL column automatically exclude the table from the tupleDesc cache?

2017-02-19 Thread Ryan Murphy
> > You'll probably want to do those at a C level, bypassing the executor. I
> > would guess that executor overhead will completely swamp the effect of
> the
> > cache in most cases.
>
> That seems like it's kind of missing the point.  If the tupleDesc
> cache saves so little that it's irrelevant when tested through the
> executor, it's not a very useful cache.  I bet that's not the case,
> though.
>

Thank you both for your insight.  I'll probably hold off on the benchmarks
for right now.  I didn't have a production reason to worry about the cache,
just getting more familiar with the codebase.  Thanks again!


Re: [HACKERS] Logical replication existing data copy - comments origin.c

2017-02-19 Thread Erik Rijkers

On 2017-02-19 17:21, Erik Rijkers wrote:

0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch


Improve readability of comment blocks
in  src/backend/replication/logical/origin.c



now attached



thanks,

Erik Rijkers
--- src/backend/replication/logical/origin.c.orig	2017-02-19 16:45:28.558865304 +0100
+++ src/backend/replication/logical/origin.c	2017-02-19 17:11:09.034023021 +0100
@@ -11,31 +11,29 @@
  * NOTES
  *
  * This file provides the following:
- * * An infrastructure to name nodes in a replication setup
- * * A facility to efficiently store and persist replication progress in an
- *	 efficient and durable manner.
- *
- * Replication origin consist out of a descriptive, user defined, external
- * name and a short, thus space efficient, internal 2 byte one. This split
- * exists because replication origin have to be stored in WAL and shared
+ * * Infrastructure to name nodes in a replication setup
+ * * A facility to efficiently store and persist replication progress
+ *
+ * A replication origin has a descriptive, user defined, external
+ * name and a short, internal 2 byte one. This split
+ * exists because a replication origin has to be stored in WAL and shared
  * memory and long descriptors would be inefficient.  For now only use 2 bytes
  * for the internal id of a replication origin as it seems unlikely that there
- * soon will be more than 65k nodes in one replication setup; and using only
- * two bytes allow us to be more space efficient.
+ * soon will be more than 65k nodes in one replication setup.
  *
  * Replication progress is tracked in a shared memory table
- * (ReplicationStates) that's dumped to disk every checkpoint. Entries
+ * (ReplicationStates) that is dumped to disk every checkpoint. Entries
  * ('slots') in this table are identified by the internal id. That's the case
  * because it allows to increase replication progress during crash
  * recovery. To allow doing so we store the original LSN (from the originating
  * system) of a transaction in the commit record. That allows to recover the
- * precise replayed state after crash recovery; without requiring synchronous
+ * precise replayed state after crash recovery without requiring synchronous
  * commits. Allowing logical replication to use asynchronous commit is
  * generally good for performance, but especially important as it allows a
  * single threaded replay process to keep up with a source that has multiple
  * backends generating changes concurrently.  For efficiency and simplicity
- * reasons a backend can setup one replication origin that's from then used as
- * the source of changes produced by the backend, until reset again.
+ * reasons a backend can setup one replication origin that is used as
+ * the source of changes produced by the backend, until it is reset again.
  *
  * This infrastructure is intended to be used in cooperation with logical
  * decoding. When replaying from a remote system the configured origin is
@@ -45,11 +43,11 @@
  * There are several levels of locking at work:
  *
  * * To create and drop replication origins an exclusive lock on
- *	 pg_replication_slot is required for the duration. That allows us to
- *	 safely and conflict free assign new origins using a dirty snapshot.
+ *	 pg_replication_slot is required. That allows us to
+ *	 safely and conflict-free assign new origins using a dirty snapshot.
  *
- * * When creating an in-memory replication progress slot the ReplicationOirgin
- *	 LWLock has to be held exclusively; when iterating over the replication
+ * * When creating an in-memory replication progress slot the ReplicationOrigin
+ *	 LWLock has to be held exclusively. When iterating over the replication
  *	 progress a shared lock has to be held, the same when advancing the
  *	 replication progress of an individual backend that has not setup as the
  *	 session's replication origin.
@@ -57,7 +55,7 @@
  * * When manipulating or looking at the remote_lsn and local_lsn fields of a
  *	 replication progress slot that slot's lwlock has to be held. That's
  *	 primarily because we do not assume 8 byte writes (the LSN) is atomic on
- *	 all our platforms, but it also simplifies memory ordering concerns
+ *	 all our platforms, but it also simplifies memory ordering
  *	 between the remote and local lsn. We use a lwlock instead of a spinlock
  *	 so it's less harmful to hold the lock over a WAL write
  *	 (c.f. AdvanceReplicationProgress).
@@ -305,7 +303,7 @@
 		}
 	}
 
-	/* now release lock again,	*/
+	/* now release lock again. */
 	heap_close(rel, ExclusiveLock);
 
 	if (tuple == NULL)
@@ -382,7 +380,7 @@
 
 	CommandCounterIncrement();
 
-	/* now release lock again,	*/
+	/* now release lock again. */
 	

[HACKERS] tab completion for partitioning

2017-02-19 Thread Robert Haas
On Thu, Feb 16, 2017 at 7:15 AM, Amit Langote
 wrote:
> Also attaching 0002 (unchanged) for tab-completion support for the new
> partitioning syntax.

At one point you have this:

+/* Limited completion support for partition bound specification */
+else if (TailMatches3("ATTACH", "PARTITION", MatchAny))
+COMPLETE_WITH_CONST("FOR VALUES");
+else if (TailMatches5("ATTACH", "PARTITION", MatchAny, "FOR", "VALUES"))
+COMPLETE_WITH_LIST2("FROM (", "IN (");
+/*

And then later on you have it again:

+/* Limited completion support for partition bound specification */
+else if (TailMatches3("PARTITION", "OF", MatchAny))
+COMPLETE_WITH_CONST("FOR VALUES");
+else if (TailMatches5("PARTITION", "OF", MatchAny, "FOR", "VALUES"))
+COMPLETE_WITH_LIST2("FROM (", "IN (");

I don't think there's any benefit in repeating this.  I'm not sure
which location to keep, but it doesn't seem to make sense to have it
in two places.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical replication existing data copy - comments origin.c

2017-02-19 Thread Erik Rijkers

0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch


Improve readability of comment blocks
in  src/backend/replication/logical/origin.c


thanks,

Erik Rijkers






Re: [HACKERS] Documentation improvements for partitioning

2017-02-19 Thread Robert Haas
On Thu, Feb 16, 2017 at 12:43 PM, Amit Langote
 wrote:
> On 2017/02/16 2:08, Robert Haas wrote:
>> On Wed, Feb 15, 2017 at 11:34 AM, Alvaro Herrera
>>  wrote:
>>> I think new-style partitioning is supposed to consider each partition as
>>> an implementation detail of the table; the fact that you can manipulate
>>> partitions separately does not really mean that they are their own
>>> independent object.  You don't stop to think "do I really want to drop
>>> the TOAST table attached to this main table?" and attach a CASCADE
>>> clause if so.  You just drop the main table, and the toast one is
>>> dropped automatically.  I think new-style partitions should behave
>>> equivalently.
>>
>> That's a reasonable point of view.  I'd like to get some more opinions
>> on this topic.  I'm happy to have us do whatever most people want, but
>> I'm worried that having table inheritance and table partitioning work
>> differently will be create confusion.  I'm also suspicious that there
>> may be some implementation difficulties.  On the hand, it does seem a
>> little silly to say that DROP TABLE partitioned_table should always
>> fail except in the degenerate case where there are no partitions, so
>> maybe changing it is for the best.
>
> So I count more than a few votes saying that we should be able to DROP
> partitioned tables without specifying CASCADE.
>
> I tried to implement that using the attached patch by having
> StoreCatalogInheritance1() create a DEPENDENCY_AUTO dependency between
> parent and child if the child is a partition, instead of the DEPENDENCY_NORMAL
> one that would otherwise be created.  Now, that is one way of
> making sure that partitions are dropped when the root partitioned table is
> dropped, though I'm not sure it's the best; why create the pg_depend entries
> at all, one might ask.  I chose it for now because it requires the fewest
> lines of change.  I adjusted the regression tests as well, since we recently tweaked
> tests [1] to work around the irregularities of test output when using CASCADE.

Could you possibly post this on a new thread with a reference back to
this one?  The number of patches on this one is getting a bit hard to
track, and some people may be under the misimpression that this one is
just about documentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Documentation improvements for partitioning

2017-02-19 Thread Robert Haas
On Thu, Feb 16, 2017 at 7:15 AM, Amit Langote
 wrote:
>> I think 0001 is better than the status quo, but I'm wondering whether
>> we should try to do something slightly different.  Maybe it should
>> always work for the child table to specify neither WITH OIDS nor
>> WITHOUT OIDS, but if you do specify one of them then it has to be the
>> one that matches the parent partitioned table?  With this patch, IIUC,
>> WITH OIDS is allowed only if the parent has the same, but WITHOUT OIDS
>> is allowed (but ignored) regardless of the parent setting.
>
> With the patch, one can always specify (or not) WITH/WITHOUT OIDS when
> creating partitions.  If WITH OIDS is specified and the parent doesn't
> have OIDs, then an error is raised.  Then, just like with normal
> inheritance, a WITHOUT OIDS specification for a partition will be
> *overridden* if the parent has OIDs.  By the way, the CREATE TABLE page says
> this about inheritance and OIDS:
>
>   (If the new table inherits from any tables that have OIDs, then
>   OIDS=TRUE is forced even if the command says
>   OIDS=FALSE.
>
> Hopefully it's clear to someone reading "If the table inherits from any
> tables ..." that it also refers to creating a partition of a partitioned table.

I rewrote this to be a bit more explicit and committed it.  I noticed
that there was some duplication here: the old text said both this:

A partition cannot have columns other than those inherited from the parent.

And also this just a little later in the same page:

A partition must have the same column names and types as the table of
which it is a partition.

The second wording seemed better, so I moved that statement a little
higher up and rejiggered the wording to be super-clear about OIDs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-19 Thread Tom Lane
Robert Haas  writes:
> On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane  wrote:
>> Thoughts?  Should we double down on trying to make this work according
>> to the "all integer timestamps" protocol specs, or cut our losses and
>> change the specs?

> I vote for doubling down.  It's bad enough that we have so many
> internal details that depend on this setting; letting that cascade
> into the wire protocol seems like it's just letting the chaos spread
> farther and wider.

How do you figure that it's not embedded in the wire protocol already?
Not only the replicated data for a timestamp column, but also the
client-visible binary I/O format, depend on this.  I think having some
parts of the protocol use a different timestamp format than other parts
is simply weird, and as this exercise has shown, it's bug-prone as all
get out.

> Also, I wonder if we could consider deprecating and removing
> --disable-integer-datetimes at some point.

Seems like a different conversation.  Although given the lack of
replication bug reports so far, maybe nobody is using
--disable-integer-datetimes anymore.

Certainly, fixing these bugs by removing the --disable-integer-datetimes
option would be a lot less painful than trying to make it actually work
per protocol spec.

regards, tom lane




Re: [HACKERS] drop support for Python 2.3

2017-02-19 Thread Tom Lane
I wrote:
> Devrim =?ISO-8859-1?Q?G=FCnd=FCz?=  writes:
>> arapaima(x86) and aholehole(x86_64) are the new animals. They are running the
>> buildfarm script now.

> ... and failing.  I wonder what is wrong with tcl_date_week()?
> Will look for the explanation in a bit.

Relevant question: what version of tcl is installed on those?

regards, tom lane




Re: [HACKERS] Documentation improvements for partitioning

2017-02-19 Thread Robert Haas
On Fri, Feb 17, 2017 at 11:25 AM, Ashutosh Bapat
 wrote:
> Forgot to attach the patch. Thanks Rajkumar for notifying me.

I think this is overexplaining what is anyway obvious.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Suppressing that pesky warning with older flex versions

2017-02-19 Thread Tom Lane
Robert Haas  writes:
> On Sat, Feb 18, 2017 at 10:40 PM, Tom Lane  wrote:
>> It seems like it would be quite simple and reliable to apply a patch
>> that inserts "(void) yyg;" into this function.  (Which, indeed, is
>> essentially how flex 2.5.36 and later fixed it.)

> Sounds fine as a master-only fix, but I would vote against back-patching.

Agreed.  I see no need to change this in stable branches; having -Werror
everywhere is mainly of value for development.

regards, tom lane




Re: [HACKERS] drop support for Python 2.3

2017-02-19 Thread Tom Lane
Devrim =?ISO-8859-1?Q?G=FCnd=FCz?=  writes:
> arapaima(x86) and aholehole(x86_64) are the new animals. They are running the
> buildfarm script now.

... and failing.  I wonder what is wrong with tcl_date_week()?
Will look for the explanation in a bit.

regards, tom lane




Re: [HACKERS] Instability in select_parallel regression test

2017-02-19 Thread Robert Haas
On Sun, Feb 19, 2017 at 6:50 PM, Amit Kapila  wrote:
> To close the remaining gap, don't you think we can check the slot->in_use
> flag when the generation numbers for the handle and slot are the same?

That doesn't completely fix it either, because
ForgetBackgroundWorker() also does
BackgroundWorkerData->parallel_terminate_count++, which we might also
fail to see, which would cause RegisterDynamicBackgroundWorker() to
bail out early.  There are CPU ordering effects to think about here,
not just the order in which the operations are actually performed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow pg_dumpall to work without pg_authid

2017-02-19 Thread Robins Tharakan
On 19 February 2017 at 17:02, Robins Tharakan  wrote:
>
> On Sun, 19 Feb 2017 at 10:08 Stephen Frost  wrote:
>
>> If anything, it should use pg_roles, not pg_user.
>>
>> I don't really like the "--avoid-pgauthid" option, but "--no-passwords"
>> would probably work.
>>
>>
> '--no-passwords' is a good alternative.
> Would submit a patch soon.
>
>
After reviewing further, it seems that merely adding a password-related
workaround wouldn't suffice.  Furthermore, --no-password is already an alias
for -w, so that flag is effectively taken.

Since the main restriction with AWS RDS is the unavailability of pg_authid,
that is probably a better basis for naming the flag.

Attaching a patch to add a new flag (--no-pgauthid) to pg_dumpall that can
dump globals without needing pg_authid.  So the following works with AWS RDS
Postgres databases.

pg_dumpall --no-pgauthid --globals-only > a.sql

I'll create a Commitfest entry, if there aren't many objections.

-
robins​


pgdumpall_nopgauthid_flag.diff.gz
Description: GNU Zip compressed data



Re: [HACKERS] Parallel bitmap heap scan

2017-02-19 Thread Robert Haas
On Sun, Feb 19, 2017 at 7:18 PM, Dilip Kumar  wrote:
> I have observed one problem with 0002, and I thought of sharing it
> before fixing it, because we might have encountered the same
> problem in some other patches, i.e. parallel shared hash, and there might
> already be a way to handle it.
>
> The problem is that in ExecEndBitmapHeapScan we do tbm_free, wherein
> it frees the local pagetable memory and the shared memory using a callback
> routine.  If any worker other than the backend (the actual backend for the
> client which is executing the Gather node) becomes the leader (the worker
> responsible for creating the shared TBM), then it will finish its work and
> free the shared memory by calling ExecEndBitmapHeapScan, and it's possible
> that other workers are still operating on the shared memory.
>
> So far I have never observed this issue in our tests, maybe because either
> the backend becomes the leader, or by the time the leader (non-backend) frees
> the TBM the others have also finished their work.
>
> Solution:
> - One solution to this problem can be that the leader should not complete
> the scan unless all the other workers have finished their work.
> - We could also avoid making anyone wait by arranging for only the last
> worker to finish the bitmap scan to free the memory, but one problem with
> this solution is that the last worker can also be a non-leader, which will
> have access to the shared memory; but then how do we free the pagetable's
> local memory (it's local to the leader)?
>
> Any suggestions on this?

It's probably OK if tbm_free() doesn't free the memory allocated from
DSA, and we just let cleanup at end of query do it.  However, that
could cause some trouble if the Parallel Bitmap Heap Scan gets
executed over and over and keeps allocating more and more memory from
DSA.  I think the way to fix that would be to maintain a reference
count that starts at 1 when the iterator arrays are created and gets
incremented every time a TBMSharedIteratorState is created.  It gets
decremented when the TIDBitmap that has the iterator arrays is
destroyed, and each time a TBMSharedIteratorState is destroyed.
When it reaches 0, the process that reduces the reference count to 0
calls dsa_free on the DSA pointers for pagetable, spages, and schunks.
(Also, if a TIDBitmap is freed before iteration begins, it frees the
DSA pointer for the pagetable only; the others won't have values yet.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel bitmap heap scan

2017-02-19 Thread Amit Kapila
On Sun, Feb 19, 2017 at 7:18 PM, Dilip Kumar  wrote:
> On Sat, Feb 18, 2017 at 10:45 PM, Dilip Kumar  wrote:
>> in 0002:
>> - Improved comments.
>> - Code refactoring in BitmapHeapNext.
>> - Removed local tbm creation in BitmapHeapNext : as per new tidbitmap
>> it's of no use.
>
> I have observed one problem with 0002, and I thought of sharing it
> before fixing it because we might have encountered the same problem
> in some other patches, e.g. parallel shared hash, and there might
> already be a way to handle it.
>
> The problem is that in ExecEndBitmapHeapScan we do tbm_free, which
> frees the local pagetable memory and, via a callback routine, the
> shared memory. If any worker other than the backend (the actual
> backend for the client which is executing the gather node) becomes
> the leader (the worker responsible for creating the shared TBM), it
> will finish its work and free the shared memory by calling
> ExecEndBitmapHeapScan while it's possible that other workers are
> still operating on the shared memory.
> So far I have never observed this issue in our tests, maybe because
> either the backend becomes the leader or, by the time the
> (non-backend) leader frees the TBM, the others have also finished
> their work.
>
> Solution:
> - One solution to this problem can be that the leader should not
> complete the scan until all other workers have finished their work.

For Parallel Seq Scan, we do wait for parallel workers to finish
before freeing the shared memory.  Why is the case different here?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] COPY IN/BOTH vs. extended query mode

2017-02-19 Thread Robert Haas
On Thu, Feb 16, 2017 at 1:58 PM, Corey Huinker  wrote:
> Forgive my ignorance, but is this issue related to the Catch-22 I had with
> "COPY as a set returning function", wherein a function that invokes
> BeginCopyFrom() basically starts a result set, but then ends it to do the
> BeginCopyFrom() having NULL (meaning STDIN) as the file, so that when the
> results from the copy come back the 'T' record that was going to preface the
> 'D' records emitted by the function is now gone?

I can't quite understand what you've written here.  I would think that
"COPY TO STDOUT", not "COPY FROM", would begin a result set.

If you were trying to write a SQL-callable function that would return
a result set by emitting protocol messages directly, I imagine that
will cause all kinds of problems, because you won't be able to keep
the result set the function produces by emitting protocol messages
cleanly separated from whatever the backend code that's calling that
function does to return whatever it views as the result of the
function call.

If you tried to write an SQL-callable function that internally started
and ended a copy from the client, then I think you would run into this
problem, and probably some others.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Gather Merge

2017-02-19 Thread Amit Kapila
On Sun, Feb 19, 2017 at 2:22 PM, Robert Haas  wrote:
> On Sat, Feb 18, 2017 at 6:43 PM, Amit Kapila  wrote:
>> I think there is a value in supporting mark/restore position for any
>> node which produces sorted results, however, if you don't want to
>> support it, then I think we should update above comment in the code to
>> note this fact.  Also, you might want to check other places to see if
>> any similar comment updates are required in case you don't want to
>> support mark/restore position for GatherMerge.
>
> I don't think it makes sense to put mark/support restore into Gather
> Merge.  Maybe somebody else will come up with a test that shows
> differently, but ISTM that with something like Sort it makes a ton of
> sense to support mark/restore because the Sort node itself can do it
> much more cheaply than would be possible with a separate Materialize
> node.  If you added a separate Materialize node, the Sort node would
> be trying to throw away the exact same data that the Materialize node
> was trying to accumulate, which is silly.
>

I am not sure, but there might be some cases where adding a Materialize
node on top of a Sort node could make sense, as we cost that option as
well and add it if it turns out to be cheap.

>  Here with Gather Merge
> there is no such thing happening; mark/restore support would require
> totally new code - probably we would end up shoving the same code that
> already exists in Materialize into Gather Merge as well.
>

I have tried to evaluate this based on the plans reported by Rushabh
and didn't find any case where GatherMerge could benefit from
supporting mark/restore in the plans where it is being used (it is not
being used on the right side of a merge join).  Now, it is quite
possible that it might be beneficial at higher scale factors, or maybe
the planner has ignored such a plan.  However, as we are not clear what
kind of benefit we can get via mark/restore support for GatherMerge, it
doesn't make much sense to take the trouble of implementing it.

>
> A comment update is probably a good idea, though.
>

Agreed.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Parallel bitmap heap scan

2017-02-19 Thread Dilip Kumar
On Sat, Feb 18, 2017 at 10:45 PM, Dilip Kumar  wrote:
> in 0002:
> - Improved comments.
> - Code refactoring in BitmapHeapNext.
> - Removed local tbm creation in BitmapHeapNext : as per new tidbitmap
> it's of no use.

I have observed one problem with 0002, and I thought of sharing it
before fixing it because we might have encountered the same problem
in some other patches, e.g. parallel shared hash, and there might
already be a way to handle it.

The problem is that in ExecEndBitmapHeapScan we do tbm_free, which
frees the local pagetable memory and, via a callback routine, the
shared memory. If any worker other than the backend (the actual
backend for the client which is executing the gather node) becomes
the leader (the worker responsible for creating the shared TBM), it
will finish its work and free the shared memory by calling
ExecEndBitmapHeapScan while it's possible that other workers are
still operating on the shared memory.

So far I have never observed this issue in our tests, maybe because
either the backend becomes the leader or, by the time the (non-backend)
leader frees the TBM, the others have also finished their work.

Solution:
- One solution to this problem can be that the leader should not
complete the scan until all other workers have finished their work.
- We can also avoid making anyone wait and instead arrange for the
last worker to finish the bitmap scan to free the memory. One problem
with this solution is that the last worker can also be a non-leader,
which has access to the shared memory but has no way to free the
pagetable's local memory (it's local to the leader).

Any suggestions on this?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

2017-02-19 Thread Robert Haas
On Thu, Feb 16, 2017 at 8:13 PM, Thomas Munro wrote:
> Obviously there are vanishing returns here as we add more defences
> making it increasingly unlikely that we hit "fail" mode.  But it
> bothers me that hash joins in general are not 100% guaranteed to be
> able to complete unless you have infinite RAM.

I think in practice most people are forced to set work_mem to such a
small percentage of their available RAM that actual RAM exhaustion is
quite rare.  The default value of 4MB is probably conservative even
for a Raspberry Pi, at least until the connection count spikes
unexpectedly, or until you have this problem:

https://www.postgresql.org/message-id/16161.1324414...@sss.pgh.pa.us

Most advice that I've seen for work_mem involves choosing values like
RAM / (4 * max_connections), which tends to result in much larger
values that are typically still very small compared to the
amount of RAM that's available at any given moment, because most of
the time you either don't have the maximum number of connections or
some of them are idle or not all of them are using plans that need any
work_mem.  Unfortunately, even with these very conservative settings,
one sometimes sees a machine get absolutely swamped by a large
activity spike at a time when all of the backends just so happen to be
running a query that uses 2 or 3 (or 180) copies of work_mem.[1]
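To put numbers on that rule of thumb (the figures here are made up for illustration, not a recommendation):

```c
#include <assert.h>

/* Rule-of-thumb budget: total RAM divided by four times max_connections.
 * Everything is in kB, the unit work_mem itself uses. */
static long
suggested_work_mem_kb(long ram_kb, int max_connections)
{
    return ram_kb / (4L * max_connections);
}
```

With 16 GB of RAM and max_connections = 100 this suggests 41943 kB, roughly 40 MB — an order of magnitude above the 4 MB default, yet still only 1/400 of RAM per allocation, and a single backend may still use several such allocations at once.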

If I were going to try to do something about the problem of machines
running out of memory, I'd be inclined to look at the problem more
broadly than "hey, hash joins can exceed work_mem if certain bad
things happen" and instead think about "hey, work_mem is a stupid way
of deciding on a memory budget".  The intrinsic stupidity of work_mem
as an allocation system means that (1) it's perfectly possible to run
out of memory even if every node respects the memory budget and (2)
it's perfectly possible to drastically underutilize the memory you do
have even if some nodes fail to respect the memory budget.  Of course,
if we had a smarter system for deciding on the budget it would be more
not less important for nodes to respect the budget they were given, so
that's not really an argument against trying to plug the hole you're
complaining about here, just a doubt about how much it will improve
the user experience if that's the only thing you do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] Or all of the connections just touch each of your 100,000
relations and the backend-local caches fill up and the whole system
falls over without using any work_mem at all.




Re: [HACKERS] Instability in select_parallel regression test

2017-02-19 Thread Amit Kapila
On Sun, Feb 19, 2017 at 5:54 PM, Robert Haas  wrote:
> On Sun, Feb 19, 2017 at 2:17 PM, Robert Haas  wrote:
>> Such a change can be made, but as I pointed out in the part you didn't
>> quote, there are reasons to wonder whether that will be a constructive
>> change in real life even if it's better for the regression tests.
>> Optimizing PostgreSQL for the use case of running regression tests in
>> the buildfarm at the expense of other use cases wouldn't be very
>> smart.  Maybe such a change is better in real-world applications too,
>> but that deserves at least a little bit of thought and substantive
>> discussion.
>
> Rewind.  Wait a minute.  Looking at this code again, it looks like
> we're supposed to ALREADY BE DOING THIS.
>
> DestroyParallelContext() calls WaitForParallelWorkersToExit() which
> calls WaitForBackgroundWorkerShutdown() for each worker.  That
> function returns only when the postmaster dies (which causes an error
> with that specific complaint) or when GetBackgroundWorkerPid() sets
> the status to BGWH_STOPPED. GetBackgroundWorkerPid() only returns
> BGWH_STOPPED when either (a) handle->generation != slot->generation
> (meaning that the slot got reused, and therefore must have been freed)
> or when (b) slot->pid == 0.  The pid only gets set to 0 in
> BackgroundWorkerStateChange() when slot->terminate is set, or in
> ReportBackgroundWorkerPID() when it's called from
> CleanupBackgroundWorker.  So this function should not be returning
> until after all workers have actually exited.
>

Yeah, I have also noticed this point and was thinking of a way to
close this gap.

> However, it looks like there's a race condition here, because the slot
> doesn't get freed up at the same time that the PID gets set to 0.
> That actually happens later, when the postmaster calls
> maybe_start_bgworker() or DetermineSleepTime() and one of those
> functions calls ForgetBackgroundWorker(). We could tighten this up by
> changing CleanupBackgroundWorker() to also call
> ForgetBackgroundWorker() immediately after calling
> ReportBackgroundWorkerPID() if rw->rw_terminate ||
> rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART.  If we do that
> BEFORE sending the notification to the starting process, that closes
> this hole.  Almost.
>

To close the remaining gap, don't you think we can check the
slot->in_use flag when the generation numbers of the handle and the
slot are the same?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] SCRAM authentication, take three

2017-02-19 Thread Michael Paquier
On Wed, Feb 15, 2017 at 8:28 PM, Heikki Linnakangas  wrote:
> On 02/07/2017 04:20 AM, Michael Paquier wrote:
>> --- a/src/backend/utils/errcodes.txt
>> +++ b/src/backend/utils/errcodes.txt
>> @@ -247,6 +247,7 @@ Section: Class 28 - Invalid Authorization
>> Specification
>>
>>  28000EERRCODE_INVALID_AUTHORIZATION_SPECIFICATION
>> invalid_authorization_specification
>>  28P01EERRCODE_INVALID_PASSWORD
>> invalid_password
>> +28P01EERRCODE_INVALID_NONCE
>> invalid_nonce
>>
>
> Having two error codes with the same SQLSTATE is not cool, and tripped the
> assertion in PL/Python. I removed the new error code; it was only used in
> one place, and ERRCODE_PROTOCOL_VIOLATION was more appropriate there anyway.
>
> Attached is a new set of patches, with that fixed. Thanks for the report
> Aleksander!

There is something that I think is still unwelcome in this patch: the
interface in pg_hba.conf. I mentioned that in the previous thread, but
now, with the current set of patches, if you want to match a user and a
database with a SCRAM password you need to do this:
local $dbname $user scram
That's not really portable, as SCRAM is one protocol in the SASL
family, and even worse, in our case we use SCRAM-SHA-256. I'd like to
change pg_hba.conf to be as follows:
local $dbname $user sasl protocol=scram_sha_256
This is extensible for the future, and protocol is a mandatory option
that for now would have just a single value: scram_sha_256. Heikki,
others, are you fine with that?
-- 
Michael




Re: [HACKERS] Provide list of subscriptions and publications in psql's completion

2017-02-19 Thread Robert Haas
On Sun, Feb 19, 2017 at 6:13 PM, Michael Paquier wrote:
> On Sun, Feb 19, 2017 at 8:03 PM, Magnus Hagander  wrote:
>> If password auth is used, we have to store the password in plaintext
>> equivalent somewhere. Meaning it's by definition going to be exposed to
>> superusers and replication downstreams.
>
> Another possibility is to mention the use of the new passfile
> parameter for connection strings in the docs... This removes the need
> to have plain passwords directly stored in the database. Not sure if
> that's better though because that still means that the password is
> present in plain format somewhere.

The real solution to "the password is present in plain form somewhere"
is probably "don't use passwords for authentication".  Because,
ultimately, a password by its nature has to exist in plain form
somewhere, at least in someone's brain, and very likely in their
password manager or the post-it stuck to their desk or the Notes app
on their iPhone or similar.  If the password is simple enough that the
DBA can be certain of remembering it without any sort of memory aid,
it's probably dumb simple.  If the DBA has few enough distinct
passwords that he doesn't need a memory aid just on the basis of sheer
volume of passwords needing to be remembered, that's probably not good
either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Provide list of subscriptions and publications in psql's completion

2017-02-19 Thread Magnus Hagander
On Sun, Feb 19, 2017 at 1:43 PM, Michael Paquier wrote:

> On Sun, Feb 19, 2017 at 8:03 PM, Magnus Hagander 
> wrote:
> > If password auth is used, we have to store the password in plaintext
> > equivalent somewhere. Meaning it's by definition going to be exposed to
> > superusers and replication downstreams.
>
> Another possibility is to mention the use of the new passfile
> parameter for connection strings in the docs... This removes the need
> to have plain passwords directly stored in the database. Not sure if
> that's better though because that still means that the password is
> present in plain format somewhere.
>

That might definitely be a help, because it would be stored out of band.
But it also comes with a caveat: since it's stored outside the
database, it's not included in (physical HA) replication, so you need
to maintain it across all nodes or Bad Things can happen.

And yes, whatever you do, if you want the system to be able to
automatically start/restart, you have to keep the password in cleartext
equivalent *somewhere*. There's no way around that.



> > Or are you suggesting a scheme
> > whereby you have to enter all your subscription passwords in a prompt of
> > some kind when starting the postmaster, to avoid it?
>
> Thinking about it now, we could have a connection string parameter
> called, for example, password_command, that could be used with, say,
> gpg2 to get back a decrypted password value.
>
>
We could, but we don't really have a good way to interface with that. When
the server is started with, say, systemd, how and where are you going to
prompt the user for the gpg passphrase? It's not like there is a console
available to pop the question up on.

It's the same basic issue as with password protected SSL certificates -
which is why people end up deploying their servers with an unprotected key.
For all services.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Provide list of subscriptions and publications in psql's completion

2017-02-19 Thread Michael Paquier
On Sun, Feb 19, 2017 at 8:03 PM, Magnus Hagander  wrote:
> If password auth is used, we have to store the password in plaintext
> equivalent somewhere. Meaning it's by definition going to be exposed to
> superusers and replication downstreams.

Another possibility is to mention the use of the new passfile
parameter for connection strings in the docs... This removes the need
to have plain passwords directly stored in the database. Not sure if
that's better though because that still means that the password is
present in plain format somewhere.

> Or are you suggesting a scheme
> whereby you have to enter all your subscription passwords in a prompt of
> some kind when starting the postmaster, to avoid it?

Thinking about it now, we could have a connection string parameter
called, for example, password_command, that could be used with, say,
gpg2 to get back a decrypted password value.
-- 
Michael




Re: [HACKERS] Instability in select_parallel regression test

2017-02-19 Thread Robert Haas
On Sun, Feb 19, 2017 at 5:54 PM, Robert Haas  wrote:
> However, it looks like there's a race condition here, because the slot
> doesn't get freed up at the same time that the PID gets set to 0.
> That actually happens later, when the postmaster calls
> maybe_start_bgworker() or DetermineSleepTime() and one of those
> functions calls ForgetBackgroundWorker(). We could tighten this up by
> changing CleanupBackgroundWorker() to also call
> ForgetBackgroundWorker() immediately after calling
> ReportBackgroundWorkerPID() if rw->rw_terminate ||
> rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART.  If we do that
> BEFORE sending the notification to the starting process, that closes
> this hole.  Almost.

And here's a patch implementing something along those lines.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


forget-workers-faster.patch
Description: Binary data



Re: [HACKERS] Provide list of subscriptions and publications in psql's completion

2017-02-19 Thread Magnus Hagander
On Sun, Feb 19, 2017 at 1:03 PM, Petr Jelinek wrote:

> On 19/02/17 12:03, Magnus Hagander wrote:
> >
> >
> > On Sun, Feb 19, 2017 at 2:01 AM, Michael Paquier wrote:
> >
> > On Sun, Feb 19, 2017 at 9:50 AM, Michael Paquier wrote:
> > > I have been poking at it, and yeah... I missed the fact that
> > > pg_subscription is not a view. I thought that check_conninfo was
> > > being called in this context only...
> >
> > Still, storing plain passwords in system catalogs is a practice that
> > should be discouraged as base backup data can go over a network as
> > well... At least adding a note or a warning in the documentation
> > would be nice about the fact that any kind of security-sensitive
> > data should be avoided here.
> >
> >
> > Isn't that moving the goalposts quite a bit? We already allow passwords
> > in CREATE USER MAPPING without any warnings against it (in fact, we
> > suggest that's what you should do), which is a similar situation. Same
> > goes for dblink.
> >
> > If password auth is used, we have to store the password in plaintext
> > equivalent somewhere. Meaning it's by definition going to be exposed to
> > superusers and replication downstreams. Or are you suggesting a scheme
> > whereby you have to enter all your subscription passwords in a prompt of
> > some kind when starting the postmaster, to avoid it?
> >
>
> The subscriptions will happily use .pgpass for example, so it's not
> like users are forced to put the password in the catalog (well,
> barring some DBaaS solutions). But I guess it would not hurt to give
> extra notice in the docs about the dangers of the various catalogs
> storing passwords.
>
>
I certainly wouldn't object to doing that, but if we do we should
consistently do it in the other places that work the same way (like
user mappings).

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Instability in select_parallel regression test

2017-02-19 Thread Robert Haas
On Sun, Feb 19, 2017 at 2:17 PM, Robert Haas  wrote:
> Such a change can be made, but as I pointed out in the part you didn't
> quote, there are reasons to wonder whether that will be a constructive
> change in real life even if it's better for the regression tests.
> Optimizing PostgreSQL for the use case of running regression tests in
> the buildfarm at the expense of other use cases wouldn't be very
> smart.  Maybe such a change is better in real-world applications too,
> but that deserves at least a little bit of thought and substantive
> discussion.

Rewind.  Wait a minute.  Looking at this code again, it looks like
we're supposed to ALREADY BE DOING THIS.

DestroyParallelContext() calls WaitForParallelWorkersToExit() which
calls WaitForBackgroundWorkerShutdown() for each worker.  That
function returns only when the postmaster dies (which causes an error
with that specific complaint) or when GetBackgroundWorkerPid() sets
the status to BGWH_STOPPED. GetBackgroundWorkerPid() only returns
BGWH_STOPPED when either (a) handle->generation != slot->generation
(meaning that the slot got reused, and therefore must have been freed)
or when (b) slot->pid == 0.  The pid only gets set to 0 in
BackgroundWorkerStateChange() when slot->terminate is set, or in
ReportBackgroundWorkerPID() when it's called from
CleanupBackgroundWorker.  So this function should not be returning
until after all workers have actually exited.

However, it looks like there's a race condition here, because the slot
doesn't get freed up at the same time that the PID gets set to 0.
That actually happens later, when the postmaster calls
maybe_start_bgworker() or DetermineSleepTime() and one of those
functions calls ForgetBackgroundWorker(). We could tighten this up by
changing CleanupBackgroundWorker() to also call
ForgetBackgroundWorker() immediately after calling
ReportBackgroundWorker() if rw->rw_terminate ||
rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART.  If we do that
BEFORE sending the notification to the starting process, that closes
this hole.  Almost.

There's still a possibility that the waiting process might receive
SIGUSR1 for some unrelated reason and check the state of shared memory
just after slot->pid == 0 and before slot->in_use is cleared and (also
relevantly) just before BackgroundWorkerData->parallel_terminate_count
is incremented.  Fixing that seems a bit tougher, since we're
certainly not going to have the postmaster start taking locks on
shared data structures, but just forgetting the worker sooner would
probably improve things a lot.  I think.
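To make the window concrete, here is a toy single-process model of the handle/slot logic described above (field and function names are simplified inventions for this sketch, not the actual bgworker.c code):

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MODEL_STARTED, MODEL_STOPPED } ModelStatus;

/* Toy model of a background worker slot. */
typedef struct
{
    int      pid;        /* 0 once the worker's exit has been reported */
    bool     in_use;     /* cleared only when the slot is forgotten */
    unsigned generation; /* bumped each time the slot is reused */
} ModelSlot;

/* Toy model of a BackgroundWorkerHandle. */
typedef struct
{
    ModelSlot *slot;
    unsigned   generation; /* generation when the handle was created */
} ModelHandle;

/* Mirrors the status logic described above: STOPPED if the slot was
 * recycled (generation mismatch) or the exit was reported (pid == 0). */
static ModelStatus
model_get_status(const ModelHandle *h)
{
    if (h->generation != h->slot->generation)
        return MODEL_STOPPED;   /* slot already recycled */
    if (h->slot->pid == 0)
        return MODEL_STOPPED;   /* exit reported; slot may not be freed yet */
    return MODEL_STARTED;
}

/* The proposed tightening: free the slot for reuse immediately when the
 * exit is reported, instead of waiting for the postmaster's next
 * maybe_start_bgworker()/DetermineSleepTime() pass. */
static void
model_report_exit_and_forget(ModelSlot *slot)
{
    slot->pid = 0;
    slot->in_use = false;       /* the eager ForgetBackgroundWorker() */
}
```

The first two assertions below exhibit the window in the current behavior (STOPPED is visible while the slot is still in use); the last shows that the eager forget makes the slot reusable as soon as STOPPED becomes visible.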

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Provide list of subscriptions and publications in psql's completion

2017-02-19 Thread Petr Jelinek
On 19/02/17 12:03, Magnus Hagander wrote:
> 
> 
> On Sun, Feb 19, 2017 at 2:01 AM, Michael Paquier wrote:
> 
> On Sun, Feb 19, 2017 at 9:50 AM, Michael Paquier wrote:
> > I have been poking at it, and yeah... I missed the fact that
> > pg_subscription is not a view. I thought that check_conninfo was being
> > called in this context only...
> 
> Still, storing plain passwords in system catalogs is a practice that
> should be discouraged as base backup data can go over a network as
> well... At least adding a note or a warning in the documentation would
> be nice about the fact that any kind of security-sensitive data should
> be avoided here.
> 
> 
> Isn't that moving the goalposts quite a bit? We already allow passwords
> in CREATE USER MAPPING without any warnings against it (in fact, we
> suggest that's what you should do), which is a similar situation. Same
> goes for dblink.
> 
> If password auth is used, we have to store the password in plaintext
> equivalent somewhere. Meaning it's by definition going to be exposed to
> superusers and replication downstreams. Or are you suggesting a scheme
> whereby you have to enter all your subscription passwords in a prompt of
> some kind when starting the postmaster, to avoid it?
> 

The subscriptions will happily use .pgpass for example, so it's not
like users are forced to put the password in the catalog (well, barring
some DBaaS solutions). But I guess it would not hurt to give extra
notice in the docs about the dangers of the various catalogs storing
passwords.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] SCRAM authentication, take three

2017-02-19 Thread Michael Paquier
On Sun, Feb 19, 2017 at 6:55 PM, Robert Haas  wrote:
> Gosh, this SCRAM stuff seems to be taking us pretty deeply into
> dealing with encoding details which apparently we haven't formerly
> needed to worry about.  That is a little surprising and maybe
> something we should try to avoid?

The SCRAM RFC, RFC 5802, is clear on the matter
(https://tools.ietf.org/html/rfc5802): SASLprep needs NFKC (RFC 4013
here, the worst in the set, https://tools.ietf.org/html/rfc4013) if we
want our implementation to be compatible with any other Postgres
driver that implements things at the protocol level without libpq. I
think that JDBC is one of those. So I am afraid we cannot avoid it if
we want SCRAM.
-- 
Michael




Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-19 Thread Robert Haas
On Sun, Feb 19, 2017 at 3:52 PM, Pavan Deolasee wrote:
> This particular case of corruption results in a heap tuple getting indexed
> by a wrong key (or to be precise, indexed by its old value). So the only way
> to detect the corruption is to look at each index key and check if it
> matches with the corresponding heap tuple. We could write some kind of self
> join that can use a sequential scan and an index-only scan (David Rowley had
> suggested something of that sort internally here), but we can't guarantee
> index-only scan on a table which is being concurrently updated. So not sure
> even that will catch every possible case.

Oh, so the problem isn't index entries that are altogether missing?  I
guess I was confused.

You can certainly guarantee an index-only scan if you write the
validation code in C rather than using SQL.  I think the issue is that
if the table is large enough that keeping a TID -> index value mapping
in a hash table is impractical, there's not going to be a real
efficient strategy for this.  Ignoring the question of whether you use
the main executor for this or just roll your own code, your options
for a large table are (1) a multi-batch hash join, (2) a nested loop,
and (3) a merge join.  (2) is easy to implement but will generate a
ton of random I/O if the table is not resident in RAM.  (3) is most
suitable for very large tables but takes more work to code, and is
also likely to be a lot slower for small tables than a hash or
nestloop-based approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Adding new output parameter of pg_stat_statements to identify operation of the query.

2017-02-19 Thread Robert Haas
On Sun, Feb 19, 2017 at 2:58 PM, jasonysli(李跃森)  wrote:
> Hi PG hackers:
>
>  When using pg_stat_statements to collect the running SQL of PG, we find
> it is hard for our program to get the exact operation type of the SQL, such as
> SELECT, DELETE, UPDATE, INSERT, and so on.
>
>So we modify the source code of pg_stat_statements and add another
> output parameter to tell us the operation type.
>
> Of course some applications know their operation type, but for us and many
> public databases, doing this is hard.
>
> The only way is to reparse the SQL, which is obviously too expensive for a
> monitoring or diagnosis system.
>
> We have done the job and are willing to post a patch.

Hi,

You've posted this message to this mailing list 4 times so far.  Once is enough.

This effort sounds similar to https://commitfest.postgresql.org/13/790/

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Incremental sort (was: PoC: Partial sort)

2017-02-19 Thread Robert Haas
On Sat, Feb 18, 2017 at 4:01 PM, Alexander Korotkov wrote:
> I decided to start new thread for this patch for following two reasons.
>  * It's renamed from "Partial sort" to "Incremental sort" per suggestion by
> Robert Haas [1].  New name much better characterizes the essence of
> algorithm.
>  * I think it's not PoC anymore.  Patch received several rounds of review
> and now it's in pretty good shape.
>
> Attached revision of patch has following changes.
>  * According to review [1], two new path and plan nodes are responsible for
> incremental sort: IncSortPath and IncSort which are inherited from SortPath
> and Sort correspondingly.  That allowed to get rid of set of hacks with
> minimal code changes.
>  * According to review [1] and comment [2], previous tuple is stored in
> standalone tuple slot of SortState rather than just HeapTuple.
>  * New GUC parameter enable_incsort is introduced to control planner ability
> to choose incremental sort.
>  * Test of postgres_fdw with not pushed down cross join is corrected.  It
> appeared that with incremental sort such query is profitable to push down.
> I changed ORDER BY columns so that index couldn't be used.  I think this
> solution is more elegant than setting enable_incsort = off.

I usually advocate for spelling things out instead of abbreviating, so
I guess I'll stay true to form here and suggest that abbreviating
incremental to inc doesn't seem like a great idea.  Is that sort
incrementing, incremental, incredible, incautious, or incorporated?

The first hunk in the patch, a change in the postgres_fdw regression
test output, looks an awful lot like a bug: now the query that
formerly returned various different numbers is returning all zeroes.
It might not actually be a bug, because you've also changed the test
query (not sure why), but anyway the new regression test output that
is all zeroes seems less useful for catching bugs in, say, the
ordering of the results than the old output where the different rows
were different.

I don't know of any existing cases where the same executor file is
responsible for executing more than 1 different type of executor node.
I was imagining a more-complete separation of the new executor node.
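[For readers following the thread: the core idea of incremental sort — exploiting an existing ordering on a prefix of the sort keys and sorting only each prefix group by the remaining keys — can be sketched in a few lines. This is a simplified model, not the patch's executor code; detecting group boundaries by comparing against a remembered previous tuple is what the tuple-slot change in the patch is about.]

```python
from itertools import groupby

def incremental_sort(rows, n_presorted, full_key):
    """rows are already sorted on the first n_presorted key columns;
    sort each group sharing that prefix by the full key."""
    prefix = lambda row: full_key(row)[:n_presorted]
    out = []
    for _, group in groupby(rows, key=prefix):
        # Only each (typically small) prefix group is sorted,
        # rather than the whole input at once.
        out.extend(sorted(group, key=full_key))
    return out

rows = [(1, "b"), (1, "a"), (2, "c"), (2, "a"), (3, "a")]  # sorted on col 0
print(incremental_sort(rows, 1, lambda r: (r[0], r[1])))
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'c'), (3, 'a')]
```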

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supporting infrastructure

2017-02-19 Thread Robert Haas
On Fri, Feb 17, 2017 at 2:28 AM, David Christensen  wrote:
> - Change "data_checksums" from a simple boolean to "data_checksum_state", an 
> enum type for all of
>   the potentially-required states for this feature (as well as enabling).

Color me skeptical.  I don't know what CHECKSUMS_ENABLING,
CHECKSUMS_ENFORCING, and CHECKSUMS_REVALIDATING are intended to
represent -- and there's no comments in the patch explaining it -- but
if we haven't yet written the code to enable checksums, how do we know
for sure which states it will require?

If we're going to accept a patch to disable checksums without also
having the capability to enable checksums, I think we should leave out
the speculative elements about what might be needed on the "enable"
side and just implement the minimal "disable" side.

However, FWIW, I don't accept that being able to disable checksums
online is a sufficient advance to justify enabling checksums by
default.  Tomas had some good points on another thread about what
might be needed to really make that a good choice, and I'm still
skeptical about whether checksums catch any meaningful number of
errors that wouldn't be caught otherwise, and about the degree to
which any complaints it issues are actionable.  I'm not really against
this patch on its own merits, but I think it's a small advance in an
area that needs a lot of work.  I think it would be a lot more useful
if we had a way to *enable* checksums online.  Then people who find
out that checksums exist and want them have an easier way of getting
them, and anyone who uses the functionality in this patch and then
regrets it has a way to get back.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Provide list of subscriptions and publications in psql's completion

2017-02-19 Thread Magnus Hagander
On Sun, Feb 19, 2017 at 2:01 AM, Michael Paquier 
wrote:

> On Sun, Feb 19, 2017 at 9:50 AM, Michael Paquier
>  wrote:
> > I have been poking at it, and yeah... I missed the fact that
> > pg_subscription is not a view. I thought that check_conninfo was being
> > called in this context only..
>
> Still, storing plain passwords in system catalogs is a practice that
> should be discouraged as base backup data can go over a network as
> well... At least adding a note or a warning in the documentation would
> be nice about the fact that any kind of security-sensitive data should
> be avoided here.
>
>
Isn't that moving the goalposts quite a bit? We already allow passwords in
CREATE USER MAPPING without any warnings against it (in fact, we suggest
that's what you should do), which is a similar situation. Same goes for
dblink.

If password auth is used, we have to store the password in plaintext
equivalent somewhere. Meaning it's by definition going to be exposed to
superusers and replication downstreams. Or are you suggesting a scheme
whereby you have to enter all your subscription passwords in a prompt of
some kind when starting the postmaster, to avoid it?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Parallel Index-only scan

2017-02-19 Thread Robert Haas
On Sat, Feb 18, 2017 at 12:02 PM, Amit Kapila  wrote:
> +   /*
> +* If we are here to just update the scan keys, then don't
> reset parallel
> +* scan. For detailed reason behind this look in the comments for
> +* ExecReScanIndexScan.
> +*/
>
> You can phrase the second line as "See ExecReScanIndexScan for
> details.".  Apart from that this patch looks good to me.  I have
> marked this patch as "Ready For Committer".

Committed, although I neglected to incorporate this change.  Not sure
if I should go back and do that; it doesn't read too badly as-is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-19 Thread Pavan Deolasee
On Sun, Feb 19, 2017 at 3:43 PM, Robert Haas  wrote:

> On Fri, Feb 17, 2017 at 11:15 PM, Tom Lane  wrote:
>
> > Ah, nah, scratch that.  If any post-index-build pruning had occurred on a
> > page, the evidence would be gone --- the non-matching older tuples would
> > be removed and what remained would look consistent.  But it wouldn't
> > match the index.  You might be able to find problems if you were willing
> > to do the expensive check on *every* HOT chain, but that seems none too
> > practical.
>
> If the goal is just to detect tuples that aren't indexed and should
> be, what about scanning each index and building a TIDBitmap of all of
> the index entries, and then scanning the heap for non-HOT tuples and
> probing the TIDBitmap for each one?  If you find a heap entry for
> which you didn't find an index entry, go and recheck the index and see
> if one got added concurrently.  If not, whine.
>
>
This particular case of corruption results in a heap tuple getting indexed
by a wrong key (or to be precise, indexed by its old value). So the only
way to detect the corruption is to look at each index key and check if it
matches with the corresponding heap tuple. We could write some kind of self
join that can use a sequential scan and an index-only scan (David Rowley
had suggested something of that sort internally here), but we can't
guarantee index-only scan on a table which is being concurrently updated.
So not sure even that will catch every possible case.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint

2017-02-19 Thread Robert Haas
On Sat, Feb 18, 2017 at 4:52 AM, Tomas Vondra
 wrote:
> I have my doubts about this actually addressing gitlab-like mistakes,
> though, because it's a helluva jump from "It's waiting and not doing
> anything," to "We need to remove the datadir." (One of the reasons being
> that non-empty directory is a local issue, and there's no reason why the
> tool should wait instead of just reporting an error.)

It's pretty clear that the gitlab postmortem involves multiple people
making multiple serious errors, including failing to test that the
ostensible backups could actually be restored.  I was taught that rule
#1 as far as backups are concerned is to test that you can restore
them, so that seems like a big miss.  However, I don't think the fact
they made other mistakes is a reason not to improve the things we can
improve and, certainly, having some way for pg_basebackup to tell you
that it's waiting for the master to checkpoint will help the next
person who is confused by that particular thing.  That person may go
on to be confused by something else, but then again maybe not.
Improving the reporting in this case stands on its own merits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Does having a NULL column automatically exclude the table from the tupleDesc cache?

2017-02-19 Thread Robert Haas
On Sat, Feb 18, 2017 at 12:33 AM, Jim Nasby  wrote:
> On 2/15/17 1:37 PM, Ryan Murphy wrote:
>> attcacheoff can only be set positive for fields preceding any varlena
>> (typlen<0, but including the first such) or nullable values.  I don't
>> know how much faster it is with the cache; you can measure it if your
>> curiosity is strong enough -- just set the first column to nullable.
>>
>> Thanks!  Maybe I'll do some benchmarks.
>
> You'll probably want to do those at a C level, bypassing the executor. I
> would guess that executor overhead will completely swamp the effect of the
> cache in most cases.

That seems like it's kind of missing the point.  If the tupleDesc
cache saves so little that it's irrelevant when tested through the
executor, it's not a very useful cache.  I bet that's not the case,
though.
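[Aside for readers: the attcacheoff rule quoted above — offsets are only cacheable for attributes preceding, and including the first of, any varlena or nullable column — can be mimicked with a toy model. This is not the real heap_deform_tuple logic; the widths are illustrative.]

```python
def cacheable_offsets(columns):
    """columns: list of (width, nullable); width < 0 means varlena.
    Return byte offsets for the prefix of columns whose positions are
    fixed; caching stops after the first varlena or nullable column."""
    offsets, pos = [], 0
    for width, nullable in columns:
        offsets.append(pos)          # this column's start is still fixed
        if width < 0 or nullable:
            break                    # everything after this may shift
        pos += width
    return offsets

# int4, int8, text (varlena), int4: only the first three offsets cacheable.
print(cacheable_offsets([(4, False), (8, False), (-1, False), (4, False)]))
# [0, 4, 12]
```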

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

2017-02-19 Thread Robert Haas
On Fri, Feb 17, 2017 at 11:15 PM, Tom Lane  wrote:
> I wrote:
>> However, you might be able to find it without so much random I/O.
>> I'm envisioning a seqscan over the table, in which you simply look for
>> HOT chains in which the indexed columns aren't all the same.  When you
>> find one, you'd have to do a pretty expensive index lookup to confirm
>> whether things are OK or not, but hopefully that'd be rare enough to not
>> be a big performance sink.
>
> Ah, nah, scratch that.  If any post-index-build pruning had occurred on a
> page, the evidence would be gone --- the non-matching older tuples would
> be removed and what remained would look consistent.  But it wouldn't
> match the index.  You might be able to find problems if you were willing
> to do the expensive check on *every* HOT chain, but that seems none too
> practical.

If the goal is just to detect tuples that aren't indexed and should
be, what about scanning each index and building a TIDBitmap of all of
the index entries, and then scanning the heap for non-HOT tuples and
probing the TIDBitmap for each one?  If you find a heap entry for
which you didn't find an index entry, go and recheck the index and see
if one got added concurrently.  If not, whine.

One problem is that you'd need to make sure that the TIDBitmap didn't
lossify, but that could be prevented either by specifying a very large
maxbytes setting or by using something that serves the same function
instead of a TIDBitmap per se.
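[A toy model of the check being proposed, with plain Python sets standing in for a non-lossy TIDBitmap and a callback standing in for the concurrent index recheck — none of this is PostgreSQL code, just the shape of the idea.]

```python
def find_unindexed(index_tids, heap_tids, recheck):
    """index_tids: TIDs collected by scanning the index.
    heap_tids: TIDs of non-HOT heap tuples found by a seqscan.
    recheck(tid): re-probe the index for entries added concurrently."""
    suspects = [tid for tid in heap_tids if tid not in index_tids]
    # Whine only about tuples still missing after the recheck.
    return [tid for tid in suspects if not recheck(tid)]

index = {(0, 1), (0, 2)}
heap = [(0, 1), (0, 2), (0, 3), (0, 4)]
concurrently_added = {(0, 3)}
print(find_unindexed(index, heap, lambda tid: tid in concurrently_added))
# [(0, 4)]
```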

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] SUBSCRIPTIONS and pg_upgrade

2017-02-19 Thread Robert Haas
On Fri, Feb 17, 2017 at 7:34 AM, Stephen Frost  wrote:
> I'm not entirely sure about the reasoning behind requiring a flag to
> include subscriptions in pg_dump output, as the documentation doesn't
> actually provide one, but if we are going to require that, shouldn't
> pg_upgrade use it, to make sure that the subscriptions are pulled
> forward to the upgraded cluster?

Subscriptions are different from other objects in that whether or not
you want them in your dump output depends on how you plan to use the
dump.  If your goal is to create a server to replace the original
server, you want the subscriptions.  If your goal is to take a static
copy of the data, you don't.  Subscriptions aren't really data in the
sense that table data is data, or even in the sense that functions are
data.  Granted, you can execute a function, but a subscription is
self-executing.

That having been said, I share your discomfort with not dumping these
by default.  I think it would be good to dump and restore them by
default, but maybe restore them in disabled mode as you suggest
downthread, and document that if you want them enabled you have to
turn them on after doing the restore.  Otherwise, you could have
logical replication start up before the dump restore even completes,
which seems like it could cause all sorts of problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] SCRAM authentication, take three

2017-02-19 Thread Robert Haas
On Fri, Feb 17, 2017 at 7:29 PM, Michael Paquier
 wrote:
> On Wed, Feb 15, 2017 at 9:27 PM, Michael Paquier
>  wrote:
>> On Wed, Feb 15, 2017 at 7:58 PM, Heikki Linnakangas  wrote:
>>> On 02/09/2017 09:33 AM, Michael Paquier wrote:
 Now regarding the shape of the implementation for SCRAM, we need one
 thing: a set of routines in src/common/ to build decompositions for a
 given UTF-8 string with conversion UTF8 string <=> pg_wchar array, the
 decomposition and the reordering. The extension attached roughly
 implements that. What we can actually do as well is have in contrib/ a
 module that does NFK[C|D] using the base APIs in src/common/. Using
 arrays of pg_wchar (integers) to manipulate the characters, we can
 validate and have a set of regression tests that do *not* have to
 print non-ASCII characters.
>>>
>>>
>>> A contrib module or built-in extra functions to deal with Unicode characters
>>> might be handy for a lot of things. But I'd leave that out for now, to keep
>>> this patch minimal.
>>
>> No problem from me. I'll get something for SASLprep in the shape of
>> something like the above. It should not take me long.
>
> OK, attached is a patch that implements SASLprep that needs to be
> applied on top of the other ones. When working on the table reduction,
> the worst size was at 2.4MB. After removing all the characters with a
> class of 0 and no decomposition, I am able to get that down to 570kB.
> After splitting the decompositions by size into their own tables, it
> got down to 120kB, which is even nicer. One thing that I forgot
> previously was the handling of the decomposition of Hangul characters
> (Korean stuff) which is algorithmic, so you actually do not need a
> table for them. The algorithm is here for the curious =>
> http://unicode.org/reports/tr15/tr15-18.html#Hangul.
>
> The patch includes the conversion tables, which is why it is large,
> and the perl script that I used to generate it. It has been pushed as
> well on my github branch. The basics are here I think, still this
> portion really needs a careful review. I have done some basic tests
> and things are basically working, but I have been able to break things
> pretty easily when using some exotic characters. The conversion tables
> look correct, I have tested it using my module which implements NFKC
> (https://github.com/michaelpq/pg_plugins/tree/master/pg_sasl_prepare),
> still much refinement needs to be done.

Gosh, this SCRAM stuff seems to be taking us pretty deeply into
dealing with encoding details which apparently we haven't formerly
needed to worry about.  That is a little surprising and maybe
something we should try to avoid?
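[For reference, the Hangul decomposition mentioned upthread really is purely arithmetic, so it needs no table at all. A quick sketch using the standard Unicode constants from TR15:]

```python
# Standard Unicode constants for precomposed Hangul syllables (TR15).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT          # 588
S_COUNT = 19 * N_COUNT               # 11172 precomposed syllables

def decompose_hangul(cp):
    """Return the canonical decomposition of a Hangul syllable codepoint."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [cp]                  # not a precomposed Hangul syllable
    leading = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trailing = T_BASE + s_index % T_COUNT
    parts = [leading, vowel]
    if trailing != T_BASE:           # T_BASE itself means "no trailing jamo"
        parts.append(trailing)
    return parts

# U+AC01 decomposes to U+1100 U+1161 U+11A8.
print([hex(c) for c in decompose_hangul(0xAC01)])
```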

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partitioned tables and relfilenode

2017-02-19 Thread Robert Haas
On Fri, Feb 17, 2017 at 1:12 PM, Amit Langote
 wrote:
>> I agree.  But, we need to be careful that a database-wide VACUUM or
>> ANALYZE doesn't hit the partitions multiple times, once for the parent
>> and again for each child.  Actually, a database-wide VACUUM should hit
>> each partition individually and do nothing for the parents, but a
>
> This is what would happen even without the patch.  Patch only modifies
> what happens when a partitioned table is specified in the vacuum command.
> It emits a warning:
>
> WARNING: skipping "%s" --- cannot vacuum partitioned tables
>
> It seems both you and Simon agree that instead of this warning, we should
> recurse to process the leaf partitions (ignoring any partitioned tables in
> the hierarchy for which there is nothing to do).  If that's right, I will
> modify the patch to do that.

Yeah, that sounds fine.

>> database-wide ANALYZE should process the parents and do nothing for
>> the children, so that the inheritance statistics get updated.
>
> Currently vacuum() processes the following relations:
>
> /*
>  * Process all plain relations and materialized views listed in
>  * pg_class
>  */
>
> while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
> {
> Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);
>
> if (classForm->relkind != RELKIND_RELATION &&
> classForm->relkind != RELKIND_MATVIEW)
> continue;
>
> Do you mean that if database-wide analyze is to be run, we should also
> exclude those RELKIND_RELATION relations that are partitions?
>
> So the only way to update a partition's statistics is to directly specify
> it in the command or by autovacuum.

I think if you type:

ANALYZE;

...that should process all partitioned tables and all tables that are
not themselves partitions.  If you type:

ANALYZE name;

...that should ANALYZE that relation, whatever it is.  If it's a
partitioned table, it should recurse.
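[The database-wide selection rule being proposed can be stated as a small filter over pg_class-style entries. A sketch only: relkind letters as in pg_class, with 'p' for partitioned tables, and a hypothetical is_partition flag standing in for the partition check.]

```python
def analyze_targets(relations):
    """Database-wide ANALYZE per the proposal: process partitioned
    tables (updating inheritance statistics) plus every plain table
    that is not itself a partition; leaf partitions are reached by
    recursion from the parent or by autovacuum."""
    return [r["name"] for r in relations
            if r["relkind"] == "p"
            or (r["relkind"] == "r" and not r["is_partition"])]

rels = [
    {"name": "parent", "relkind": "p", "is_partition": False},
    {"name": "child1", "relkind": "r", "is_partition": True},
    {"name": "plain",  "relkind": "r", "is_partition": False},
]
print(analyze_targets(rels))  # ['parent', 'plain']
```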

> Truncate already recurses to partitions by way of inheritance recursion
> that's already in place.  The patch simply teaches ExecuteTruncate() to
> ignore partitioned tables when we get to the part where relfilenodes are
> manipulated.

Oh, OK.  That seems fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-02-19 Thread Robert Haas
On Sun, Feb 19, 2017 at 3:31 AM, Tom Lane  wrote:
> Thoughts?  Should we double down on trying to make this work according
> to the "all integer timestamps" protocol specs, or cut our losses and
> change the specs?

I vote for doubling down.  It's bad enough that we have so many
internal details that depend on this setting; letting that cascade
into the wire protocol seems like it's just letting the chaos spread
farther and wider.

Also, I wonder if we could consider deprecating and removing
--disable-integer-datetimes at some point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Reporting xmin from VACUUMs

2017-02-19 Thread Robert Haas
On Sat, Feb 18, 2017 at 12:30 AM, Jim Nasby  wrote:
>> What?
>
> There's a bunch of information reported by vacuum logging but not in
> pg_stat*, such as all-visible/frozen skipping, unable to get cleanup lock,
> last freeze scan, times autovac has been interrupted. There's been
> resistance in the past to further bloating the existing stats files, but if
> we allowed more than one stats file per database that bloat would be less of
> a concern (since vacuum stats will see far less update traffic than the main
> relation stats).

This is the kind of information that you really want to see once per
autovac, though, not just the most recent autovac or some kind of
cumulative total.  Knowing that I've done 301 index scans in my last
300 vacuums is not nearly as useful as knowing which autovacuum did 2
index scans and what exactly was going on with that vacuum.  So I'm
not sure including this sort of thing in the stats files would be very
useful, or at least you'd want to think carefully about how to do it.

As far as bloating the stats file is concerned, the big problem there
is that our current design for the stats file requires rewriting the
entire thing any time we want to update even a single byte of data.
We could fix that by splitting up the files more so that they are
smaller and faster to rewrite, but we could also fix it by coming up
with a way of rewriting just one part of a file instead of the whole
thing, or we could think about storing it in DSM so that you don't
have to rewrite anything at all.  I think that last option is worth
some serious study now that we have DSA, but it's currently not very
high on my personal priority list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Append implementation

2017-02-19 Thread Robert Haas
On Fri, Feb 17, 2017 at 2:56 PM, Amit Khandekar  wrote:
> The log2(num_children)+1 formula which you proposed does not take into
> account the number of workers for each of the subplans, that's why I
> am a bit more inclined to look for some other logic. May be, treat the
> children as if they belong to partitions, and accordingly calculate
> the final number of workers. So for 2 children with 4 and 5 workers
> respectively, Append parallel_workers would be : log3(3^4 + 3^5) .

In general this will give an answer not different by more than 1 or 2
from my answer, and often exactly the same.  In the case you mention,
whether we get the same answer depends on which way you round:
log3(3^4+3^5) is 5 if you round down, 6 if you round up.

My formula is more aggressive when there are many subplans that are
not parallel or take only 1 worker, because I'll always use at least 5
workers for an append that has 9-16 children, whereas you might use
only 2 if you do log3(3^0+3^0+3^0+3^0+3^0+3^0+3^0+3^0+3^0).  In that
case I like my formula better. With lots of separate children, the
chances of being able to use as many as 5 workers seem good.  (Note
that using 9 workers as Ashutosh seems to be proposing would be a
waste if the different children have very unequal execution times,
because the workers that run children with short execution times can
be reused to run additional subplans while the long ones are still
running.  Running a separate worker for each child only works out if
the shortest runtime is more than 50% of the longest runtime, which
may sometimes be true but doesn't seem like a good bet in general.)

Your formula is more aggressive when you have 3 children that all use
the same number of workers; it'll always decide on N+1 workers,
whereas mine won't add the extra worker in that case.
Possibly your formula is better than mine in that case, but I'm not
sure.  If you have as many as 9 children that all want N workers, your
formula will decide on N+2 workers, but since my formula guarantees a
minimum of 5 workers in such cases, I'll probably be within 1 of
whatever answer you were getting.

Basically, I don't believe that the log3(n) thing is anything very
special or magical.  The fact that I settled on that formula for
parallel sequential scan doesn't mean that it's exactly right for
every other case.  I do think it's likely that increasing workers
logarithmically is a fairly decent strategy here, but I wouldn't get
hung up on using log3(n) in every case or making all of the answers
100% consistent according to some grand principle.  I'm not even sure
log3(n) is right for parallel sequential scan, so insisting that
Parallel Append has to work that way when I had no better reason than
gut instinct for picking that for Parallel Sequential Scan seems to me
to be a little unprincipled.  We're still in the early stages of this
parallel query experiment, and a decent number of these algorithms are
likely to change as we get more sophisticated.  For now at least, it's
more important to pick things that work well pragmatically than to be
theoretically optimal.
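[To make the two formulas in this subthread concrete, a throwaway comparison. The rounding choices are my reading of the messages, not anything settled: ceil(log2(num_children)) + 1 for the children-count formula, versus log3 of the sum of 3^workers over the children.]

```python
import math

def workers_log2(num_children):
    # Count-based proposal: depends only on the number of children.
    return math.ceil(math.log2(num_children)) + 1

def workers_log3(per_child_workers):
    # Weighted proposal: each child contributes 3^(its own worker count).
    return math.log(sum(3 ** w for w in per_child_workers), 3)

print(workers_log2(9))                    # 5  (any 9-16 children -> 5)
print(round(workers_log3([4, 5]), 2))     # 5.26 -> 5 or 6 depending on rounding
print(round(workers_log3([0] * 9), 2))    # 2.0 (nine nonparallel children)
```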

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


