Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-26 Thread Andres Freund
On 2017-03-23 20:35:09 +1300, Thomas Munro wrote:
> Here is a new patch series responding to feedback from Peter and Andres:

+
+/* Per-participant shared state. */
+typedef struct SharedTuplestoreParticipant
+{
+   LWLock lock;

Hm. No padding (ala LWLockMinimallyPadded / LWLockPadded) - but that's
probably ok, for now.




+   bool error;         /* Error occurred flag. */
+   bool eof;           /* End of file reached. */
+   int read_fileno;    /* BufFile segment file number. */
+   off_t read_offset;  /* Offset within segment file. */

Hm. I wonder if it'd not be better to work with 64bit offsets, and just
separate that out upon segment access.
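For illustration, a minimal sketch of that suggestion, assuming 1GB segments as in buffile.c's MAX_PHYSICAL_FILESIZE; the helper names below are invented, not from the patch:

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not from the patch: keep a single flat 64-bit
 * position in shared memory and derive buffile.c's (segment file
 * number, offset within segment) pair only at access time, assuming
 * 1GB segments as in buffile.c's MAX_PHYSICAL_FILESIZE.
 */
#define BUFFILE_SEG_SIZE ((uint64_t) 1024 * 1024 * 1024)

static void
position_to_segment(uint64_t pos, int *fileno, uint64_t *offset)
{
	*fileno = (int) (pos / BUFFILE_SEG_SIZE);
	*offset = pos % BUFFILE_SEG_SIZE;
}

static uint64_t
segment_to_position(int fileno, uint64_t offset)
{
	return (uint64_t) fileno * BUFFILE_SEG_SIZE + offset;
}
```

That would shrink the shared state to a single read position per participant and confine segment arithmetic to one place.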

+/* The main data structure in shared memory. */

"main data structure" isn't particularly meaningful.

+struct SharedTuplestore
+{
+   int reading_partition;
+   int nparticipants;
+   int flags;

Maybe add a comment saying /* flag bits from SHARED_TUPLESTORE_* */?

+   Size meta_data_size;

What's this?

+   SharedTuplestoreParticipant participants[FLEXIBLE_ARRAY_MEMBER];

I'd add a comment here, that there's further data after participants.

+};

+
+/* Per-participant backend-private state. */
+struct SharedTuplestoreAccessor
+{

Hm. The name and it being backend-local are a bit conflicting.

+   int participant;        /* My participant number. */
+   SharedTuplestore *sts;  /* The shared state. */
+   int nfiles;             /* Size of local files array. */
+   BufFile **files;        /* Files we have open locally for writing. */

Shouldn't this mention that it's indexed by partition?

+   BufFile *read_file;     /* The current file to read from. */
+   int read_partition;     /* The current partition to read from. */
+   int read_participant;   /* The current participant to read from. */
+   int read_fileno;        /* BufFile segment file number. */
+   off_t read_offset;      /* Offset within segment file. */
+};


+/*
+ * Initialize a SharedTuplestore in existing shared memory.  There must be
+ * space for sts_size(participants) bytes.  If flags is set to the value
+ * SHARED_TUPLESTORE_SINGLE_PASS then each partition may only be read once,
+ * because underlying files will be deleted.

Any reason not to use flags that are compatible with tuplestore.c?

+ * Tuples that are stored may optionally carry a piece of fixed sized
+ * meta-data which will be retrieved along with the tuple.  This is useful for
+ * the hash codes used for multi-batch hash joins, but could have other
+ * applications.
+ */
+SharedTuplestoreAccessor *
+sts_initialize(SharedTuplestore *sts, int participants,
+  int my_participant_number,
+  Size meta_data_size,
+  int flags,
+  dsm_segment *segment)
+{

Not sure I like that the naming here has little in common with
tuplestore.h's api.


+
+MinimalTuple
+sts_gettuple(SharedTuplestoreAccessor *accessor, void *meta_data)
+{

This needs docs.

+   SharedBufFileSet *fileset = GetSharedBufFileSet(accessor->sts);
+   MinimalTuple tuple = NULL;
+
+   for (;;)
+   {

...
+   /* Check if this participant's file has already been entirely read. */
+   if (participant->eof)
+   {
+   BufFileClose(accessor->read_file);
+   accessor->read_file = NULL;
+   LWLockRelease(&participant->lock);
+   continue;

Why are we closing the file while holding the lock?

+
+   /* Read the optional meta-data. */
+   eof = false;
+   if (accessor->sts->meta_data_size > 0)
+   {
+   nread = BufFileRead(accessor->read_file, meta_data,
+                       accessor->sts->meta_data_size);
+   if (nread == 0)
+   eof = true;
+   else if (nread != accessor->sts->meta_data_size)
+   ereport(ERROR,
+           (errcode_for_file_access(),
+            errmsg("could not read from temporary file: %m")));
+   }
+
+   /* Read the size. */
+   if (!eof)
+   {
+   nread = BufFileRead(accessor->read_file, &tuple_size,
+                       sizeof(tuple_size));
+   if (nread == 0)
+   eof = true;

Why is it legal to have EOF here, if metadata previously didn't have an
EOF? Perhaps add an error if accessor->sts->meta_data_size != 0?
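As a self-contained illustration of the invariant being suggested (EOF is only legal before any part of a tuple has been read), here is a sketch against an in-memory stand-in for BufFileRead; all names are hypothetical, not patch code:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* In-memory stand-in for a BufFile, for illustration only. */
typedef struct MemFile
{
	const char *buf;
	size_t		len;
	size_t		pos;
} MemFile;

static size_t
memfile_read(MemFile *f, void *out, size_t n)
{
	size_t		avail = f->len - f->pos;

	if (n > avail)
		n = avail;
	memcpy(out, f->buf + f->pos, n);
	f->pos += n;
	return n;
}

/*
 * Read one tuple header: optional meta-data followed by the tuple size.
 * Returns 1 if a header was read, 0 on clean EOF.  Hitting EOF after
 * the meta-data has already been consumed is treated as corruption,
 * per the review comment above.
 */
static int
read_tuple_header(MemFile *f, size_t meta_size, char *meta, uint32_t *size)
{
	size_t		nread;

	if (meta_size > 0)
	{
		nread = memfile_read(f, meta, meta_size);
		if (nread == 0)
			return 0;			/* clean EOF before any tuple data */
		if (nread != meta_size)
		{
			fprintf(stderr, "could not read from temporary file\n");
			exit(1);
		}
	}

	nread = memfile_read(f, size, sizeof(*size));
	if (nread == 0 && meta_size == 0)
		return 0;				/* clean EOF, no meta-data in use */
	if (nread != sizeof(*size))
	{
		/* With meta-data in use, EOF here means a torn write. */
		fprintf(stderr, "unexpected end of data in temporary file\n");
		exit(1);
	}
	return 1;
}
```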


+   if (eof)
+   {
+   

Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-26 Thread Peter Geoghegan
On Sun, Mar 26, 2017 at 3:41 PM, Thomas Munro wrote:
>> 1.  Segments are what buffile.c already calls the individual
>> capped-at-1GB files that it manages.  They are an implementation
>> detail that is not part of buffile.c's user interface.  There seems to
>> be no reason to change that.
>
> After reading your next email I realised this is not quite true:
> BufFileTell and BufFileSeek expose the existence of segments.

Yeah, that's something that tuplestore.c itself relies on.

I always thought that the main practical reason why we have BufFile
multiplex 1GB segments concerns use of temp_tablespaces, rather than
considerations that matter only when using obsolete file systems:

/*
 * We break BufFiles into gigabyte-sized segments, regardless of RELSEG_SIZE.
 * The reason is that we'd like large temporary BufFiles to be spread across
 * multiple tablespaces when available.
 */

Now, I tend to think that most installations that care about
performance would be better off using RAID to stripe their one temp
tablespace file system. But, I suppose this still makes sense when you
have a number of file systems that happen to be available, and disk
capacity is the main concern. PHJ uses one temp tablespace per worker,
which I further suppose might not be as effective in balancing disk
space usage.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch proposal

2017-03-26 Thread Venkata B Nagothi
Hi David,

On Thu, Mar 23, 2017 at 4:21 AM, David Steele  wrote:

> On 3/21/17 8:45 PM, Venkata B Nagothi wrote:
>
>> On Tue, Mar 21, 2017 at 8:46 AM, David Steele wrote:
>> Unfortunately, I don't think the first patch (recoveryStartPoint)
>> will work as currently implemented.  The problem I see is that the
>> new function recoveryStartsHere() depends on pg_control containing a
>> checkpoint right at the end of the backup.  There's no guarantee
>> that this is true, even if pg_control is copied last.  That means a
>> time, lsn, or xid that occurs somewhere in the middle of the backup
>> can be selected without complaint from this code depending on timing.
>>
>>
>> Yes, that is true.  The latest checkpoint position, xid, and
>> timestamp of the restored backup show up in pg_controldata, which
>> means that is the position from which the recovery would start.
>>
>
> Backup recovery starts from the checkpoint in the backup_label, not from
> the checkpoint in pg_control.  The original checkpoint that started the
> backup is generally overwritten in pg_control by the end of the backup.


Yes, I totally agree. My initial intention was to compare the recovery
target position(s) with the contents of the backup_label, but then the
checks would fail if the recovery is performed without the backup_label
file. So I decided to compare the recovery target positions with the
contents of the pg_control file.


> Which in turn means, WALs start getting replayed
>> from that position towards --> minimum recovery position (which is the
>> end backup, which again means applying WALs generated between start and
>> the end backup) all the way through to  --> recovery target position.
>>
>
> minRecoveryPoint is only used when recovering a backup that was made from
> a standby.  For backups taken on the master, the backup end WAL record is
> used.
>
> The best start position to check with would the position shown up in the
>> pg_control file, which is way much better compared to the current
>> postgresql behaviour.
>>
>
> I don't agree, for the reasons given previously.
>

As explained above, my intention was to ensure that the recovery start
positions checks are successfully performed irrespective of the presence of
the backup_label file.

I did some analysis before deciding to use pg_controldata's output instead
of backup_label file contents.

Comparing the output of the pg_controldata with the contents of
backup_label contents.

*Recovery Target LSN*

START WAL LOCATION (which is 0/9C28) in the backup_label =
pg_controldata's latest checkpoint's REDO location (Latest checkpoint's
REDO location:  0/9C28)

*Recovery Target TIME*

backup start time in the backup_label (START TIME: 2017-03-26 11:55:46
AEDT) = pg_controldata's latest checkpoint time (Time of latest checkpoint
:  Sun 26 Mar 2017 11:55:46 AM AEDT)

*Recovery Target XID*

To begin with, backup_label does not contain any start XID. So, the only
option is to depend on pg_controldata's output.
After a few quick tests and thorough observation, I notice that the
pg_control file information is copied as-is to the backup location at
pg_start_backup. I performed some quick tests by running a few
transactions between pg_start_backup and pg_stop_backup. So, I believe
this is the ideal start point for WAL replay.

Am I missing anything here?
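A sketch of the kind of up-front check being discussed in this thread, using modulo-2^32 XID ordering in the style of PostgreSQL's TransactionIdPrecedes(); the function names are hypothetical, and this ignores the special permanent XIDs:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Modulo-2^32 XID comparison, mimicking the wraparound branch of
 * PostgreSQL's TransactionIdPrecedes(): ordering is defined by the
 * sign of the signed 32-bit distance between the two XIDs.
 */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	return (int32_t) (id1 - id2) < 0;
}

/*
 * Hypothetical sketch of the proposed start-point check: a
 * recovery_target_xid that precedes the XID recorded at the chosen
 * start point cannot be reached by replaying WAL forward, so recovery
 * could be refused up front instead of silently promoting at the first
 * consistent point.
 */
static bool
recovery_target_xid_reachable(TransactionId target_xid,
							  TransactionId start_xid)
{
	return !xid_precedes(target_xid, start_xid);
}
```

With the numbers used as an example later in this message, a start-point XID of 1400 and a recovery_target_xid of 200 would be rejected before any WAL is replayed.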



>> The tests pass (or rather fail as expected) because they are written
>> using values before the start of the backup.  It's actually the end
>> of the backup (where the database becomes consistent on recovery)
>> that defines where PITR can start and this distinction definitely
>> matters for very long backups.  A better test would be to start the
>> backup, get a time/lsn/xid after writing some data, and then make
>> sure that time/lsn/xid is flagged as invalid on recovery.
>>
>> The current behaviour is that, if the XID shown by pg_controldata is
>> say 1400 and the specified recovery_target_xid is say 200, then
>> postgresql just completes the recovery and promotes at the immediate
>> consistent recovery target, which means the DBA has to start the
>> restoration and recovery process all over again to promote the
>> database at the intended position.
>>
>
> I'm not saying that the current behavior is good, only that the proposed
> behavior is incorrect as far as I can tell.


Please consider my explanation above and share your thoughts.


>
> It is also problematic to assume that transaction IDs commit in
>> order. If checkPoint.latestCompletedXid contains 5 then a recovery
>> to xid 4 may or may not be successful depending on the commit
>> order.  However, it appears in this case the patch would disallow
>> recovery to 4.
>
>
>> If the txid 4 is the recovery target xid, then, the appropriate backup
>> taken previous to txid 4 must be used or as an 

Re: [HACKERS] pg_get_statisticsextdef() is not quite the full shilling

2017-03-26 Thread Alvaro Herrera
David Rowley wrote:

> Seems pg_get_statisticsextdef() has a couple of things wrong:
> 
> 1. HeapTupleIsValid() called on the wrong tuple.
> 2. Did not schema qualify names.

Actually we can solve both 1 and the first half of 2 by just using
generate_relation_name, which is less code and gives better results
(because we then schema-qualify the relation name conditionally).  I
patched the other half of 2 using your approach, but I wonder if we
should have a function StatisticsIsVisible or something.

> I've purposefully left out the WITH syntax. We'll want to add some
> logic around that once we have more than one statistic type supported.
> I'd suggest not appending WITH if all supported types are present, and
> only appending it if a true subset are present. That'll mean pg_dump
> from v10 and import into v11 will get all types, if they did in v10,
> and the same subset that they did in v10 when only a subset were
> originally defined.
> 
> Since we support only 1 type now, nothing needs to happen there yet.

Yeah, this sounds sensible to me.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] few fts functions for jsonb

2017-03-26 Thread Dmitry Dolgov
> I'm not through looking at this. However, here are a few preliminary
comments

I've attached new versions of the patches with improvements related to
these commentaries.
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index 6e5de8f..8f7bcfe 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -16,6 +16,7 @@
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
+#include "utils/jsonb.h"
 
 
 typedef struct MorphOpaque
@@ -24,6 +25,14 @@ typedef struct MorphOpaque
 	int			qoperator;		/* query operator */
 } MorphOpaque;
 
+typedef struct TSVectorBuildState
+{
+	ParsedText	*prs;
+	TSVector	result;
+	Oid			cfgId;
+} TSVectorBuildState;
+
+static void add_to_tsvector(void *state, char *elem_value, int elem_len);
 
 Datum
 get_current_ts_config(PG_FUNCTION_ARGS)
@@ -256,6 +265,109 @@ to_tsvector(PG_FUNCTION_ARGS)
 		PointerGetDatum(in)));
 }
 
+Datum
+jsonb_to_tsvector(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *jb = PG_GETARG_JSONB(0);
+	TSVectorBuildState	state;
+	ParsedText			*prs = (ParsedText *) palloc(sizeof(ParsedText));
+
+	prs->words = NULL;
+	state.result = NULL;
+	state.cfgId = getTSCurrentConfig(true);
+	state.prs = prs;
+
+	iterate_jsonb_values(jb, &state, (JsonIterateAction) add_to_tsvector);
+
+	PG_FREE_IF_COPY(jb, 1);
+
+	if (state.result == NULL)
+	{
+		/* There weren't any string elements in jsonb,
+		 * so we need to return an empty vector */
+
+		if (prs->words != NULL)
+			pfree(prs->words);
+
+		state.result = palloc(CALCDATASIZE(0, 0));
+		SET_VARSIZE(state.result, CALCDATASIZE(0, 0));
+		state.result->size = 0;
+	}
+
+	PG_RETURN_TSVECTOR(state.result);
+}
+
+Datum
+json_to_tsvector(PG_FUNCTION_ARGS)
+{
+	text	   *json = PG_GETARG_TEXT_P(0);
+	TSVectorBuildState	state;
+	ParsedText			*prs = (ParsedText *) palloc(sizeof(ParsedText));
+
+	prs->words = NULL;
+	state.result = NULL;
+	state.cfgId = getTSCurrentConfig(true);
+	state.prs = prs;
+
+	iterate_json_values(json, &state, (JsonIterateAction) add_to_tsvector);
+
+	PG_FREE_IF_COPY(json, 1);
+	if (state.result == NULL)
+	{
+		/* There weren't any string elements in json,
+		 * so we need to return an empty vector */
+
+		if (prs->words != NULL)
+			pfree(prs->words);
+
+		state.result = palloc(CALCDATASIZE(0, 0));
+		SET_VARSIZE(state.result, CALCDATASIZE(0, 0));
+		state.result->size = 0;
+	}
+
+	PG_RETURN_TSVECTOR(state.result);
+}
+
+/*
+ * Extend current TSVector from _state with a new one,
+ * build over a json(b) element.
+ */
+static void
+add_to_tsvector(void *_state, char *elem_value, int elem_len)
+{
+	TSVectorBuildState *state = (TSVectorBuildState *) _state;
+	ParsedText	*prs = state->prs;
+	TSVector	item_vector;
+	int			i;
+
+	prs->lenwords = elem_len / 6;
+	if (prs->lenwords == 0)
+		prs->lenwords = 2;
+
+	prs->words = (ParsedWord *) palloc(sizeof(ParsedWord) * prs->lenwords);
+	prs->curwords = 0;
+	prs->pos = 0;
+
+	parsetext(state->cfgId, prs, elem_value, elem_len);
+
+	if (prs->curwords)
+	{
+		if (state->result != NULL)
+		{
+			for (i = 0; i < prs->curwords; i++)
+prs->words[i].pos.pos = prs->words[i].pos.pos + TS_JUMP;
+
+			item_vector = make_tsvector(prs);
+
+			state->result = (TSVector) DirectFunctionCall2(tsvector_concat,
+	TSVectorGetDatum(state->result),
+	PointerGetDatum(item_vector));
+		}
+		else
+			state->result = make_tsvector(prs);
+	}
+}
+
 /*
  * to_tsquery
  */
diff --git a/src/backend/tsearch/wparser.c b/src/backend/tsearch/wparser.c
index 8ca1c62..ab1716a 100644
--- a/src/backend/tsearch/wparser.c
+++ b/src/backend/tsearch/wparser.c
@@ -20,6 +20,7 @@
 #include "tsearch/ts_cache.h"
 #include "tsearch/ts_utils.h"
 #include "utils/builtins.h"
+#include "utils/jsonb.h"
 #include "utils/varlena.h"
 
 
@@ -31,6 +32,19 @@ typedef struct
 	LexDescr   *list;
 } TSTokenTypeStorage;
 
+/* state for ts_headline_json_* */
+typedef struct HeadlineJsonState
+{
+	HeadlineParsedText *prs;
+	TSConfigCacheEntry *cfg;
+	TSParserCacheEntry *prsobj;
+	TSQuery		query;
+	List	   *prsoptions;
+	bool		transformed;
+} HeadlineJsonState;
+
+static text * headline_json_value(void *_state, char *elem_value, int elem_len);
+
 static void
 tt_setup_firstcall(FuncCallContext *funcctx, Oid prsid)
 {
@@ -362,3 +376,177 @@ ts_headline_opt(PG_FUNCTION_ARGS)
 		PG_GETARG_DATUM(1),
 		PG_GETARG_DATUM(2)));
 }
+
+Datum
+ts_headline_jsonb_byid_opt(PG_FUNCTION_ARGS)
+{
+	Jsonb			*out, *jb = PG_GETARG_JSONB(1);
+	TSQuery			query = PG_GETARG_TSQUERY(2);
+	text			*opt = (PG_NARGS() > 3 && PG_GETARG_POINTER(3)) ? PG_GETARG_TEXT_P(3) : NULL;
+
+	HeadlineParsedText prs;
+	HeadlineJsonState *state = palloc0(sizeof(HeadlineJsonState));
+
+	memset(&prs, 0, sizeof(HeadlineParsedText));
+	prs.lenwords = 32;
+	prs.words = (HeadlineWordEntry *) palloc(sizeof(HeadlineWordEntry) * prs.lenwords);
+
+	state->prs = &prs;
+	state->cfg = lookup_ts_config_cache(PG_GETARG_OID(0));
+	state->prsobj = lookup_ts_parser_cache(state->cfg->prsId);
+	

Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-26 Thread David Rowley
On 27 March 2017 at 10:23, Tomas Vondra  wrote:
> I'm not sure we need to invent a new magic value, though. Can we simply look
> at force_parallel_mode, and if it's 'regress' then treat 0 differently?

see standard_planner()

if (force_parallel_mode != FORCE_PARALLEL_OFF && best_path->parallel_safe)
{
Gather   *gather = makeNode(Gather);


Probably force_parallel_mode is good for testing the tuple queue code,
and some code in Gather, but I'm not quite sure what else it's good
for. Certainly not GatherMerge.



-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-26 Thread Thomas Munro
On Mon, Mar 27, 2017 at 12:12 PM, Peter Geoghegan  wrote:
> On Sun, Mar 26, 2017 at 3:41 PM, Thomas Munro wrote:
>>> 1.  Segments are what buffile.c already calls the individual
>>> capped-at-1GB files that it manages.  They are an implementation
>>> detail that is not part of buffile.c's user interface.  There seems to
>>> be no reason to change that.
>>
>> After reading your next email I realised this is not quite true:
>> BufFileTell and BufFileSeek expose the existence of segments.
>
> Yeah, that's something that tuplestore.c itself relies on.
>
> I always thought that the main practical reason why we have BufFile
> multiplex 1GB segments concerns use of temp_tablespaces, rather than
> considerations that matter only when using obsolete file systems:
>
> /*
>  * We break BufFiles into gigabyte-sized segments, regardless of RELSEG_SIZE.
>  * The reason is that we'd like large temporary BufFiles to be spread across
>  * multiple tablespaces when available.
>  */
>
> Now, I tend to think that most installations that care about
> performance would be better off using RAID to stripe their one temp
> tablespace file system. But, I suppose this still makes sense when you
> have a number of file systems that happen to be available, and disk
> capacity is the main concern. PHJ uses one temp tablespace per worker,
> which I further suppose might not be as effective in balancing disk
> space usage.

I was thinking about IO bandwidth balance rather than size.  If you
rotate through tablespaces segment-by-segment, won't you be exposed to
phasing effects that could leave disk arrays idle for periods of time?
 Whereas if you assign them to participants, you can only get idle
arrays if you have fewer participants than tablespaces.

This seems like a fairly complex subtopic and I don't have a strong
view on it.  Clearly you could rotate through tablespaces on the basis
of participant, partition, segment, some combination, or something
else.  Doing it by participant seemed to me to be the least prone to
IO imbalance caused by phasing effects (= segment based) or data
distribution (= partition based), of the options I considered when I
wrote it that way.
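The phasing effect can be made concrete with a toy model (illustrative only, not patch code): if all participants happen to be writing the same segment number at some instant, a by-segment round-robin leaves every tablespace but one idle, while a by-participant assignment keeps them all busy:

```c
/*
 * Toy model of the two striping policies: count how many distinct
 * tablespaces are busy at one instant, with every participant
 * currently writing the same segment number ("phase-locked").
 */
#define NPARTICIPANTS 4
#define NTABLESPACES  2

static int
busy_tablespaces(int by_participant, int current_segment)
{
	int			used[NTABLESPACES] = {0};
	int			nbusy = 0;
	int			p;

	for (p = 0; p < NPARTICIPANTS; p++)
	{
		int			ts;

		ts = by_participant ? p % NTABLESPACES
							: current_segment % NTABLESPACES;
		if (!used[ts])
		{
			used[ts] = 1;
			nbusy++;
		}
	}
	return nbusy;
}
```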

Like you, I also tend to suspect that people would be more likely to
use RAID type technologies to stripe things like this for both
bandwidth and space reasons these days.  Tablespaces seem to make more
sense as a way of separating different classes of storage
(fast/expensive, slow/cheap etc), not as an IO or space striping
technique.  I may be way off base there though...

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] [COMMITTERS] pgsql: Show more processes in pg_stat_activity.

2017-03-26 Thread Michael Paquier
On Mon, Mar 27, 2017 at 11:07 AM, Robert Haas  wrote:
> Show more processes in pg_stat_activity.
>
> Previously, auxiliary processes and background workers not connected
> to a database (such as the logical replication launcher) weren't
> shown.  Include them, so that we can see the associated wait state
> information.  Add a new column to identify the processes type, so that
> people can filter them out easily using SQL if they wish.
>
> Before this patch was written, there was discussion about whether we
> should expose this information in a separate view, so as to avoid
> contaminating pg_stat_activity with things people might not want to
> see.  But putting everything in pg_stat_activity was a more popular
> choice, so that's what the patch does.
>
> Kuntal Ghosh, reviewed by Amit Langote and Michael Paquier.  Some
> revisions and bug fixes by me.

-   /* Report to pgstat that this process is a WAL sender */
-   pgstat_report_activity(STATE_RUNNING, "walsender");
+   /* Report to pgstat that this process is running */
+   pgstat_report_activity(STATE_RUNNING, NULL);

Perhaps this bit should just be removed? We added it to make the
process show in pg_stat_activity, and with this patch it does show up
even without this call.
-- 
Michael




Re: [HACKERS] comments in hash_alloc_buckets

2017-03-26 Thread Robert Haas
On Sat, Mar 25, 2017 at 3:28 AM, Ashutosh Sharma  wrote:
>>> While working on - [1], I realised that the following comments in
>>> _hash_alloc_buckets() needs to be corrected.
>>>
>>> /*
>>>  * Initialize the freed overflow page.  Just zeroing the page won't work,
>>>  * See _hash_freeovflpage for similar usage.
>>>  */
>>> _hash_pageinit(page, BLCKSZ);
>>>
>>> Attached is the patch that corrects above comment. Thanks.
>>>
>>
>> - * Initialize the freed overflow page.  Just zeroing the page won't work,
>> + * Initialize the last page in hash index.
>>
>> I think saying ".. last page in hash index" sounds slightly awkward as
>> this is the last page for current split point, how about just
>> "Initialize the page. ..."
>
> Yes, I mean just adding "Initialize the page. ..." looks more simple
> and correct. Attached is the patch with similar comment.

Committed with a punctuation and formatting adjustment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Problem in Parallel Bitmap Heap Scan?

2017-03-26 Thread Dilip Kumar
On Sat, Mar 25, 2017 at 2:25 PM, Thomas Munro wrote:
>>> I think in this area we need more testing, reason these are not tested
>>> properly because these are not the natural case for parallel bitmap.
>>> I think in next few days I will test more such cases by forcing the
>>> parallel bitmap.
>>>
>>
>> Okay, is your testing complete?

Yes, I have done more testing around this area with more cases, like
one page with BitmapOr etc.
Now it looks fine to me.
>>
>>> Here is the patch to fix the issue in hand.  I have also run the
>>> regress suit with force_parallel_mode=regress and all the test are
>>> passing.
>>>
>>
>> Thomas, did you get chance to verify Dilip's latest patch?
>>
>> I have added this issue to the PostgreSQL 10 Open Items list so that we
>> don't lose track of this issue.
>
> The result is correct with this patch.  I ran make installcheck then
> the same steps as above and the query result was correct after
> creating the index.

Thanks for confirming.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Re: [COMMITTERS] pgsql: Faster expression evaluation and targetlist projection.

2017-03-26 Thread Joe Conway
On 03/25/2017 09:52 PM, Andres Freund wrote:
> On 2017-03-25 20:40:23 -0700, Andres Freund wrote:
>> I blindly tried to fix these, let's hope that works.
> 
> In a second attempt (yes, reading diffs correctly is helpful), this
> resolved the selinux issue. Yeha.

+1!

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-26 Thread Thomas Munro
On Mon, Mar 27, 2017 at 9:41 AM, Andres Freund  wrote:
> Hi,
>
>
> SharedBufFile allows temporary files to be created by one backend and
> then exported for read-only access by other backends, with clean-up
> managed by reference counting associated with a DSM segment.  This includes
> changes to fd.c and buffile.c to support new kinds of temporary file.
>
>
> diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
> index 4ca0ea4..a509c05 100644
> --- a/src/backend/storage/file/buffile.c
> +++ b/src/backend/storage/file/buffile.c
>
> I think the new facilities should be explained in the file's header.

Will do.

> @@ -68,9 +71,10 @@ struct BufFile
>  * avoid making redundant FileSeek calls.
>  */
>
> -   bool        isTemp;         /* can only add files if this is TRUE */
> +   bool        isSegmented;    /* can only add files if this is TRUE */
>
> That's a bit of a weird and uncommented upon change.

I was trying to cut down on the number of places we use the word
'temporary' to activate various different behaviours.  In this case,
the only thing it controls is whether the BufFile is backed by one
single fd.c File or many segments, so I figured it should be renamed.

As Peter and you have pointed out, there may be a case for removing it
altogether.

> @@ -79,6 +83,8 @@ struct BufFile
>  */
> ResourceOwner resowner;
>
> +   BufFileTag  tag;            /* for discoverability between backends */
>
> Not perfectly happy with the name tag here, the name is a bit too
> similar to BufferTag - something quite different.

Yeah, will rename.

> +static void
> +make_tagged_path(char *tempdirpath, char *tempfilepath,
> +                 const BufFileTag *tag, int segment)
> +{
> +   if (tag->tablespace == DEFAULTTABLESPACE_OID ||
> +       tag->tablespace == GLOBALTABLESPACE_OID)
> +       snprintf(tempdirpath, MAXPGPATH, "base/%s", PG_TEMP_FILES_DIR);
> +   else
> +   {
> +       snprintf(tempdirpath, MAXPGPATH, "pg_tblspc/%u/%s/%s",
> +                tag->tablespace, TABLESPACE_VERSION_DIRECTORY,
> +                PG_TEMP_FILES_DIR);
> +   }
> +
> +   snprintf(tempfilepath, MAXPGPATH, "%s/%s%d.%d.%d.%d.%d", tempdirpath,
> +            PG_TEMP_FILE_PREFIX,
> +            tag->creator_pid, tag->set, tag->partition, tag->participant,
> +            segment);
>
> Is there a risk that this ends up running afoul of filename length
> limits on some platforms?

Hmm.  I didn't think so.  Do we have a project guideline on maximum
path lengths based on some kind of survey?  There are various limits
involved (filesystem and OS per-path-component limits, total limits,
and the confusing PATH_MAX, MAX_PATH etc macros), but I was under the
impression that these numbers were always at least 255.  This scheme
seems capable of producing ~50 bytes in the final component
(admittedly more if int is 64 bits), and then nowhere near enough to
reach a limit of that order in the earlier components.
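A quick sanity check of that estimate, assuming the naming pattern quoted above with PG_TEMP_FILE_PREFIX ("pgsql_tmp") and every int field at its 32-bit maximum:

```c
#include <limits.h>
#include <stdio.h>

/*
 * Worst-case length of the final path component generated by the
 * patch's naming scheme, with all five int fields at INT_MAX
 * (10 digits each for a 32-bit int).
 */
static int
worst_case_component_len(void)
{
	char		name[256];

	return snprintf(name, sizeof(name), "%s%d.%d.%d.%d.%d",
					"pgsql_tmp",
					INT_MAX, INT_MAX, INT_MAX, INT_MAX, INT_MAX);
}
```

That comes to 63 bytes (9 for the prefix, five fields of 10 digits, 4 dots), comfortably below the 255-byte NAME_MAX found on common filesystems.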

> +}
> +
> +static File
> +make_tagged_segment(const BufFileTag *tag, int segment)
> +{
> +   File        file;
> +   char        tempdirpath[MAXPGPATH];
> +   char        tempfilepath[MAXPGPATH];
> +
> +   /*
> +    * There is a remote chance that disk files with this (pid, set) pair
> +    * already exist after a crash-restart.  Since the presence of
> +    * consecutively numbered segment files is used by BufFileOpenShared to
> +    * determine the total size of a shared BufFile, we'll defend against
> +    * confusion by unlinking segment 1 (if it exists) before creating
> +    * segment 0.
> +    */
>
> Gah.  Why on earth aren't we removing temp files when restarting, not
> just on the initial start?  That seems completely wrong?

See the comment above RemovePgTempFiles in fd.c.  From comments on
this list I understand that this is a subject that Robert and Tom
don't agree on.  I don't mind either way, but as long as
RemovePgTempFiles works that way and my patch uses the existence of
files to know how many files there are, I have to defend against that
danger by making sure that I don't accidentally identify files from
before a crash/restart as active.

> If we do decide not to change this: Why is that sufficient? Doesn't the
> same problem exist for segments later than the first?

It does exist and it is handled.  The comment really should say
"unlinking segment N + 1 (if it exists) before creating segment N".
Will update.
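The defense described here can be sketched in miniature (hypothetical names, plain stdio in place of fd.c): before creating segment N, unconditionally remove any stale segment N + 1, so a reader that counts consecutive segment files cannot run past the new end:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch, not patch code: name for segment N of a BufFile. */
static void
segment_name(char *buf, size_t bufsize, const char *base, int segment)
{
	snprintf(buf, bufsize, "%s.%d", base, segment);
}

/*
 * Create segment N, first unlinking any stale segment N + 1 left over
 * from before a crash-restart, so that a reader counting consecutively
 * numbered segment files computes the correct total size.
 */
static FILE *
create_segment(const char *base, int segment)
{
	char		next[256];
	char		cur[256];

	segment_name(next, sizeof(next), base, segment + 1);
	remove(next);				/* OK if it does not exist */
	segment_name(cur, sizeof(cur), base, segment);
	return fopen(cur, "wb");
}
```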

> +/*
> + * Open a file that was previously created in another backend with
> + * BufFileCreateShared.
> + */
> +BufFile *
> +BufFileOpenTagged(const BufFileTag *tag)
> +{
> +   BufFile    *file = (BufFile *) palloc(sizeof(BufFile));
> +   char        tempdirpath[MAXPGPATH];
> + 

Re: [HACKERS] New CORRESPONDING clause design

2017-03-26 Thread Pavel Stehule
Hi

2017-03-25 13:41 GMT+01:00 Surafel Temesgen :

>
>
>>
>> I took a quick look through this and noted that it fails to touch
>> ruleutils.c, which means that dumping of views containing CORRESPONDING
>> certainly doesn't work.
>>
> fixed
>
>> Also, the changes in parser/analyze.c seem rather massive and
>> correspondingly hard to review.  Is it possible to rearrange the
>> patch to reduce the size of that diff?  If you can avoid moving
>> or reindenting existing code, that'd help.
>>
> The part of transformSetOperationTree that makes the union data type of
> the set operation target list became makeUnionDatatype, in order to make
> it easy to use multiple times and to avoid a very long
> transformSetOperationTree function.
>
>
>> The code in that area seems rather confused, too.  For instance,
>> I'm not sure exactly what orderCorrespondingList() is good for,
>> but it certainly doesn't look to be doing anything that either its
>> name or its header comment (or the comments at the call sites) would
>> suggest.  Its input and output tlists are always in the same order.
>>
>> It gives the corresponding target list sequential resnos.
> In order to avoid touching generate_append_tlist I changed
> the comment and function name accordingly.
>
> I also think there should be some comments about exactly what matching
>> semantics we're enforcing.   The SQL standard says
>>
>> a) If CORRESPONDING is specified, then:
>>   i) Within the columns of T1, equivalent <column name>s shall
>>  not be specified more than once and within the columns of
>>  T2, equivalent <column name>s shall not be specified more
>>  than once.
>>
>> That seems unreasonably stringent to me; it ought to be sufficient to
>> forbid duplicates of the names listed in CORRESPONDING, or the common
>> column names if there's no BY list.  But whichever restriction you prefer,
>> this code seems to be failing to check it --- I certainly don't see any
>> new error message about "column name "foo" appears more than once".
>>
> fixed
>
> I'm not impressed by using A_Const for the members of the CORRESPONDING
>> name list.  That's not a clever solution, that's a confusing kluge,
>> because it's a complete violation of the meaning of A_Const.  Elsewhere
>> we just use lists of String for name lists, and that seems sufficient
>> here.  Personally I'd just use the existing columnList production rather
>> than rolling your own.
>>
> fixed
>
>>
>>
I am sending an updated version:

1. the corresponding columns must be unique, the others need not be - code
refactoring (the code is still O(N*M*Z), but some slow operations (string
comparisons) were reduced) + regress tests
2. regress tests for views
3. some cleaning (white chars)

Regards

Pavel
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 30792f45f1..c3cdee54ad 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -1601,6 +1601,9 @@ SELECT DISTINCT ON (expression , EXCEPT
   
   
+   CORRESPONDING
+  
+  
set union
   
   
@@ -1617,9 +1620,9 @@ SELECT DISTINCT ON (expression , 
-query1 UNION ALL query2
-query1 INTERSECT ALL query2
-query1 EXCEPT ALL query2
+query1 UNION ALL CORRESPONDING BY query2
+query1 INTERSECT ALL CORRESPONDING BY query2
+query1 EXCEPT ALL CORRESPONDING BY query2
 
query1 and
query2 are queries that can use any of
@@ -1659,11 +1662,22 @@ SELECT DISTINCT ON (expression , 
 
   
-   In order to calculate the union, intersection, or difference of two
-   queries, the two queries must be union compatible,
-   which means that they return the same number of columns and
-   the corresponding columns have compatible data types, as
-   described in .
+   EXCEPT returns all rows that are in the result of
+   query1 but not in the result of
+   query2.  (This is sometimes called the
+   difference between two queries.)  Again, duplicates
+   are eliminated unless EXCEPT ALL is used.
+  
+
+  
+   CORRESPONDING returns all columns that are in both 
+   query1 and query2 with the same name.
+  
+
+  
+   CORRESPONDING BY returns all columns in the column list 
+   that are also in both query1 and 
+   query2 with the same name.
   
  
 
diff --git a/doc/src/sgml/sql.sgml b/doc/src/sgml/sql.sgml
index 57396d7c24..f98c22e696 100644
--- a/doc/src/sgml/sql.sgml
+++ b/doc/src/sgml/sql.sgml
@@ -859,7 +859,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressioncondition ]
 [ GROUP BY expression [, ...] ]
 [ HAVING condition [, ...] ]
-[ { UNION | INTERSECT | EXCEPT } [ ALL ] select ]
+[ { UNION | INTERSECT | EXCEPT } [ ALL ] [ CORRESPONDING [ BY ( expression ) ] ] select ]
 [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start ]
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c23d5c5285..74db9529a1 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2995,6 +2995,7 @@ _copySelectStmt(const 

Re: [HACKERS] Logical decoding on standby

2017-03-26 Thread Craig Ringer
On 20 March 2017 at 17:33, Andres Freund  wrote:

> Have you checked how high the overhead of XLogReadDetermineTimeline is?
> A non-local function call, especially into a different translation-unit
> (no partial inlining), for every single page might end up being
> noticeable.  That's fine in the cases it actually adds functionality,
> but for a master streaming out data, that's not actually adding
> anything.

I haven't been able to measure any difference. But, since we require
the caller to ensure a reasonably up to date ThisTimeLineID, maybe
it's worth adding an inlineable function for the fast-path that tests
the "page cached" and "timeline is current and unchanged" conditions?


//xlogutils.h:
static inline void XLogReadDetermineTimeline(...)
{
   ... first test for page already read-in and valid ...
   ... second test for ThisTimeLineID ...
   XLogReadCheckTimeLineChange(...)
}

XLogReadCheckTimeLineChange(...)
{
   ... rest of function
}

(Yes, I know "inline" means little, but it's a hint for readers)

I'd rather avoid using a macro since it'd be pretty ugly, but it's
also an option if an inline func is undesirable.

#define XLOG_READ_DETERMINE_TIMELINE \
  do { \
... same as above ...
  } while (0);


Can be done after CF if needed anyway, it's just fiddling some code
around. Figured I'd mention though.

>> + /*
>> +  * To avoid largely duplicating ReplicationSlotDropAcquired() 
>> or
>> +  * complicating it with already_locked flags for ProcArrayLock,
>> +  * ReplicationSlotControlLock and 
>> ReplicationSlotAllocationLock, we
>> +  * just release our ReplicationSlotControlLock to drop the 
>> slot.
>> +  *
>> +  * There's no race here: we acquired this slot, and no slot 
>> "behind"
>> +  * our scan can be created or become active with our target 
>> dboid due
>> +  * to our exclusive lock on the DB.
>> +  */
>> + LWLockRelease(ReplicationSlotControlLock);
>> + ReplicationSlotDropAcquired();
>> + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
>
> I don't see much problem with this, but I'd change the code so you
> simply do a goto restart; if you released the slot.  Then there's a lot
> less chance / complications around temporarily releasing
> ReplicationSlotControlLock

I don't quite get this. I suspect I'm just not seeing the implications
as clearly as you do.

Do you mean we should restart the whole scan of the slot array if we
drop any slot? That'll be O(n log m) but since we don't expect to be
working on a big array or a lot of slots it's unlikely to matter.

The patch coming soon will assume we'll restart the whole scan, anyway.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-26 Thread Peter Geoghegan
On Sun, Mar 26, 2017 at 6:50 PM, Thomas Munro
 wrote:
> Like you, I also tend to suspect that people would be more likely to
> use RAID type technologies to stripe things like this for both
> bandwidth and space reasons these days.  Tablespaces seem to make more
> sense as a way of separating different classes of storage
> (fast/expensive, slow/cheap etc), not as an IO or space striping
> technique.

I agree.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-26 Thread Tomas Vondra

On 03/25/2017 05:18 PM, Rushabh Lathia wrote:



On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut
> wrote:

On 3/25/17 09:01, David Rowley wrote:
> On 25 March 2017 at 23:09, Rushabh Lathia > wrote:
>> Also another point which I think we should fix is, when someone set
>> max_parallel_workers = 0, we should also set the
>> max_parallel_workers_per_gather
>> to zero. So that way it we can avoid generating the gather path with
>> max_parallel_worker = 0.
> I see that it was actually quite useful that it works the way it does.
> If it had worked the same as max_parallel_workers_per_gather, then
> likely Tomas would never have found this bug.

Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting.  So each individual setting needs to be robust against
cases of related settings being nonsensical.


Okay.

About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not
returning a cleared slot when nreaders is zero (meaning nworkers_launched
= 0). Because of that, the scan continued even after all the tuples were
exhausted, and ended up with a server crash in gather_merge_getnext(). In
the patch I also added an Assert in gather_merge_getnext() that the index
should be less than nreaders + 1 (the leader).

PFA simple patch to fix the problem.



I think there are two issues at play here - the first one is that we
still produce parallel plans even with max_parallel_workers=0, and the
second one is the crash in GatherMerge when nworkers=0.


Your patch fixes the latter (thanks for looking into it), which is 
obviously a good thing - getting 0 workers on a busy system is quite 
possible, because all the parallel workers can be already chewing on 
some other query.


But it seems a bit futile to produce the parallel plan in the first 
place, because with max_parallel_workers=0 we can't possibly get any 
parallel workers ever. I wonder why compute_parallel_worker() only looks 
at max_parallel_workers_per_gather, i.e. why shouldn't it do:


   parallel_workers = Min(parallel_workers, max_parallel_workers);

Perhaps this was discussed and is actually intentional, though.

Regarding handling this at the GUC level - I agree with Peter that 
that's not a good idea. I suppose we could deal with checking the values 
in the GUC check/assign hooks, but what we don't have is a way to undo 
the changes in all the GUCs. That is, if I do


   SET max_parallel_workers = 0;
   SET max_parallel_workers = 16;

I expect to end up with just max_parallel_workers GUC changed and 
nothing else.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-26 Thread Michael Paquier
On Sat, Mar 25, 2017 at 5:26 PM, Kuntal Ghosh
 wrote:
> On Fri, Mar 24, 2017 at 9:23 PM, Robert Haas  wrote:
>> I think this is still not good.  The places where pgstat_bestart() has
>> been added are not even correct.  For example, the call added to
>> BackgroundWriterMain() occurs after the section that does
>> error-recovery, so it would get repeated every time the background
>> writer recovers from an error.  There are similar problems elsewhere.
>> Furthermore, although in theory there's an idea here that we're making
>> it no longer the responsibility of InitPostgres() to call
>> pgstat_bestart(), the patch as proposed only removes one of the two
>> calls, so we really don't even have a consistent practice.  I think
>> it's better to go with the idea of having InitPostgres() be
>> responsible for calling this for regular backends, and
>> AuxiliaryProcessMain() for auxiliary backends.  That involves
>> substantially fewer calls to pgstat_bestart() and they are spread
>> across only two functions, which IMHO makes fewer bugs of omission a
>> lot less likely.
>
> Agreed. Calling it from  InitPostgres() and AuxiliaryProcessMain()
> seems correct because of the following two reasons as you've mentioned
> up in the thread:
> 1. security-filtering should be left to some higher-level facility
> that can make policy decisions rather than being hard-coded in the
> individual modules.
> 2. makes fewer bugs of omission a lot less likely.

Okay, fine for me.

>> - I modified the code to tolerate a NULL return from
>> AuxiliaryPidGetProc().  I am pretty sure that without that there's a
>> race condition that could lead to a crash if somebody tried to call
>> this function just as an auxiliary process was terminating.
>
> Wow. Haven't thought of that. If it's called after
> AuxiliaryProcKill(), a crash is evident.

This one is a good catch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to drop old-style (V0) functions?

2017-03-26 Thread Craig Ringer
On 27 March 2017 at 10:59, Craig Ringer  wrote:
> On 27 March 2017 at 10:45, Craig Ringer  wrote:
>
>> Passes "make check" and recovery tests, check-world running now.
>
> A couple of fixes pending.

Updated.

I didn't have any way to make

seg_l = (SEG *) DatumGetPointer(DirectFunctionCall2(seg_union,
PointerGetDatum(seg_l),
PointerGetDatum(sort_items[i].data)));

pretty, but *shrug*.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From a1ae04c2d6aec7d8093d95e616a1e286e101d612 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 28 Feb 2017 23:14:58 -0800
Subject: [PATCH 1/2] Move contrib/seg to only use V1 calling conventions.

A later commit will remove V0 support.

Author: Andres Freund
Discussion: https://postgr.es/m/20161208213441.k3mbno4twhg2q...@alap3.anarazel.de
---
 contrib/cube/cube.c |   2 +-
 contrib/seg/seg.c   | 458 
 2 files changed, 247 insertions(+), 213 deletions(-)

diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c
index 2bb2ed0..43ecccf 100644
--- a/contrib/cube/cube.c
+++ b/contrib/cube/cube.c
@@ -101,7 +101,7 @@ bool		g_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy);
 bool		g_cube_internal_consistent(NDBOX *key, NDBOX *query, StrategyNumber strategy);
 
 /*
-** Auxiliary funxtions
+** Auxiliary functions
 */
 static double distance_1D(double a1, double a2, double b1, double b2);
 static bool cube_is_point_internal(NDBOX *cube);
diff --git a/contrib/seg/seg.c b/contrib/seg/seg.c
index 895d879..3980deb 100644
--- a/contrib/seg/seg.c
+++ b/contrib/seg/seg.c
@@ -47,56 +47,48 @@ PG_FUNCTION_INFO_V1(seg_center);
 /*
 ** GiST support methods
 */
-bool gseg_consistent(GISTENTRY *entry,
-SEG *query,
-StrategyNumber strategy,
-Oid subtype,
-bool *recheck);
-GISTENTRY  *gseg_compress(GISTENTRY *entry);
-GISTENTRY  *gseg_decompress(GISTENTRY *entry);
-float	   *gseg_penalty(GISTENTRY *origentry, GISTENTRY *newentry, float *result);
-GIST_SPLITVEC *gseg_picksplit(GistEntryVector *entryvec, GIST_SPLITVEC *v);
-bool		gseg_leaf_consistent(SEG *key, SEG *query, StrategyNumber strategy);
-bool		gseg_internal_consistent(SEG *key, SEG *query, StrategyNumber strategy);
-SEG		   *gseg_union(GistEntryVector *entryvec, int *sizep);
-SEG		   *gseg_binary_union(SEG *r1, SEG *r2, int *sizep);
-bool	   *gseg_same(SEG *b1, SEG *b2, bool *result);
+PG_FUNCTION_INFO_V1(gseg_consistent);
+PG_FUNCTION_INFO_V1(gseg_compress);
+PG_FUNCTION_INFO_V1(gseg_decompress);
+PG_FUNCTION_INFO_V1(gseg_picksplit);
+PG_FUNCTION_INFO_V1(gseg_penalty);
+PG_FUNCTION_INFO_V1(gseg_union);
+PG_FUNCTION_INFO_V1(gseg_same);
+static Datum gseg_leaf_consistent(Datum key, Datum query, StrategyNumber strategy);
+static Datum gseg_internal_consistent(Datum key, Datum query, StrategyNumber strategy);
+static Datum gseg_binary_union(Datum r1, Datum r2, int *sizep);
 
 
 /*
 ** R-tree support functions
 */
-bool		seg_same(SEG *a, SEG *b);
-bool		seg_contains_int(SEG *a, int *b);
-bool		seg_contains_float4(SEG *a, float4 *b);
-bool		seg_contains_float8(SEG *a, float8 *b);
-bool		seg_contains(SEG *a, SEG *b);
-bool		seg_contained(SEG *a, SEG *b);
-bool		seg_overlap(SEG *a, SEG *b);
-bool		seg_left(SEG *a, SEG *b);
-bool		seg_over_left(SEG *a, SEG *b);
-bool		seg_right(SEG *a, SEG *b);
-bool		seg_over_right(SEG *a, SEG *b);
-SEG		   *seg_union(SEG *a, SEG *b);
-SEG		   *seg_inter(SEG *a, SEG *b);
-void		rt_seg_size(SEG *a, float *sz);
+PG_FUNCTION_INFO_V1(seg_same);
+PG_FUNCTION_INFO_V1(seg_contains);
+PG_FUNCTION_INFO_V1(seg_contained);
+PG_FUNCTION_INFO_V1(seg_overlap);
+PG_FUNCTION_INFO_V1(seg_left);
+PG_FUNCTION_INFO_V1(seg_over_left);
+PG_FUNCTION_INFO_V1(seg_right);
+PG_FUNCTION_INFO_V1(seg_over_right);
+PG_FUNCTION_INFO_V1(seg_union);
+PG_FUNCTION_INFO_V1(seg_inter);
+static void rt_seg_size(SEG *a, float *size);
 
 /*
 ** Various operators
 */
-int32		seg_cmp(SEG *a, SEG *b);
-bool		seg_lt(SEG *a, SEG *b);
-bool		seg_le(SEG *a, SEG *b);
-bool		seg_gt(SEG *a, SEG *b);
-bool		seg_ge(SEG *a, SEG *b);
-bool		seg_different(SEG *a, SEG *b);
+PG_FUNCTION_INFO_V1(seg_cmp);
+PG_FUNCTION_INFO_V1(seg_lt);
+PG_FUNCTION_INFO_V1(seg_le);
+PG_FUNCTION_INFO_V1(seg_gt);
+PG_FUNCTION_INFO_V1(seg_ge);
+PG_FUNCTION_INFO_V1(seg_different);
 
 /*
-** Auxiliary funxtions
+** Auxiliary functions
 */
 static int	restore(char *s, float val, int n);
 
-
 /*
  * Input/Output functions
  */
@@ -193,13 +185,17 @@ seg_upper(PG_FUNCTION_ARGS)
 ** the predicate x op query == FALSE, where op is the oper
 ** corresponding to strategy in the pg_amop table.
 */
-bool
-gseg_consistent(GISTENTRY *entry,
-SEG *query,
-StrategyNumber strategy,
-	

Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-26 Thread Corey Huinker
>
> Patches do not apply cleanly.
> Part 1 gets:
>  error: patch failed: src/test/regress/parallel_schedule:89
>  error: src/test/regress/parallel_schedule: patch does not apply
>
> There is still the useless file, ok it is removed by part2. Could have
> been just one patch...
>

parallel_schedule failed because I hadn't rebased recently enough.

git format-patch did us no favors there. New patch is redone as one commit.

ISTM that PQExpBuffer is partially a memory leak. Something should need to
> be freed?
>

I copied that pattern from somewhere else, so yeah, I duplicated whatever
leak was there. Fixed.


> I think that you should use appendPQExpBufferChar and Str instead of
> relying on the format variant which is probably expensive. Something like:
>
>   if (num_options > 0)
> append...Char(buf, ' ');
>   append...Str(buf, ...);
>

All flavors of appendPQExpBuffer*() I can find have a const char *
format string, so no way to append a naked string. If you know differently, I'm
listening. Not fixed.


>
> is_true_boolean_expression: "return (success) ? tf : false;"
> Is this simply: "return success && tf;"?
>

Neat. Done.


>
> Some functions have opt1, opt2, but some start at opt0. This does not look
> too consistent, although the inconsistency may be preexisting from your
> patch. Basically, there is a need for some more restructuring in
> "command.c".


It is pre-existing. Maybe this patch will inspire someone else to make the
rest more consistent.

v26 attached
From fc0c466b0331839efe722d089a8ead521e8a827e Mon Sep 17 00:00:00 2001
From: Corey Huinker 
Date: Sun, 26 Mar 2017 17:30:51 -0400
Subject: [PATCH] psql if v26

---
 doc/src/sgml/ref/psql-ref.sgml |   90 +-
 src/bin/psql/.gitignore|2 +
 src/bin/psql/Makefile  |4 +-
 src/bin/psql/command.c | 1383 
 src/bin/psql/common.c  |4 +-
 src/bin/psql/conditional.c |  103 ++
 src/bin/psql/conditional.h |   62 +
 src/bin/psql/copy.c|4 +-
 src/bin/psql/help.c|7 +
 src/bin/psql/mainloop.c|   54 +-
 src/bin/psql/prompt.c  |6 +-
 src/bin/psql/prompt.h  |3 +-
 src/bin/psql/startup.c |8 +-
 src/test/regress/expected/psql.out |  110 ++
 .../regress/expected/psql_if_on_error_stop.out |   14 +
 src/test/regress/parallel_schedule |2 +-
 src/test/regress/serial_schedule   |1 +
 src/test/regress/sql/psql.sql  |  109 ++
 src/test/regress/sql/psql_if_on_error_stop.sql |   17 +
 19 files changed, 1699 insertions(+), 284 deletions(-)
 create mode 100644 src/bin/psql/conditional.c
 create mode 100644 src/bin/psql/conditional.h
 create mode 100644 src/test/regress/expected/psql_if_on_error_stop.out
 create mode 100644 src/test/regress/sql/psql_if_on_error_stop.sql

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 2a9c412..18f8bfe 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2064,6 +2064,93 @@ hello 10
 
 
   
+\if expression
+\elif expression
+\else
+\endif
+
+
+This group of commands implements nestable conditional blocks.
+A conditional block must begin with an \if and end
+with an \endif.  In between there may be any number
+of \elif clauses, which may optionally be followed
+by a single \else clause.  Ordinary queries and
+other types of backslash commands may (and usually do) appear between
+the commands forming a conditional block.
+
+
+The \if and \elif commands read
+the rest of the line and evaluate it as a boolean expression.  If the
+expression is true then processing continues
+normally; otherwise, lines are skipped until a
+matching \elif, \else,
+or \endif is reached.  Once
+an \if or \elif has succeeded,
+later matching \elif commands are not evaluated but
+are treated as false.  Lines following an \else are
+processed only if no earlier matching \if
+or \elif succeeded.
+
+
+Lines being skipped are parsed normally to identify queries and
+backslash commands, but queries are not sent to the server, and
+backslash commands other than conditionals
+(\if, \elif,
+\else, \endif) are
+ignored.  Conditional commands are checked only for valid nesting.
+
+
+The expression argument
+of \if or \elif
+is subject to variable interpolation and backquote expansion, just
+

Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-03-26 Thread David Rowley
On 14 March 2017 at 16:37, David Rowley  wrote:
> On 14 March 2017 at 11:35, David Rowley 
> wrote:
>>
>> On 14 March 2017 at 07:50, Tom Lane  wrote:
>>>
>>> [ getting back to this patch finally... ]
>>>
>>> David Rowley  writes:
>>> > I've attached a patch which implements this, though only for
>>> > MergeJoin, else I'd imagine we'd also need to ensure all proofs used
>>> > for testing the uniqueness were also hash-able too. I added some XXX
>>> > comments in analyzejoin.c around the mergeopfamilies == NIL tests to
>>> > mention that Merge Join depends on all the unique proof quals having
>>> > mergeopfamilies.  This also assumes we'll never use some subset of
>>> > mergejoin-able quals for a merge join, which could be an interesting
>>> > area in the future, as we might have some btree index on a subset of
>>> > those columns to provide pre-sorted input. In short, it's a great
>>> > optimisation, but I'm a little scared we might break it one day.
>>>
>>> Umm ... it's broken already isn't it?  See the comment for
>>> generate_mergejoin_paths:
>>>
>> Thanks for looking at this again.
>>
>> Yeah confirmed. It's broken. I guess I just need to remember in the Path
>> if we got all the join quals, although I've not looked in detail what the
>> best fix is. I imagine it'll require storing something else in the JoinPath.
>
>
> OK, so I've spent some more time on this and I've come up with a solution.
>
> Basically the solution is to not skip mark and restore when joinquals
> contains any items.  This is a requirement for SEMI joins, but overly
> cautious for unique joins. However, I think it'll apply in most cases when
> Merge Join will be a win, and that's when there's a btree index on both
> sides of the join which covers all columns in the join condition.  I
> carefully commented this part of the code to explain what can be done to
> have it apply in more cases.
>
> This caused me to go and change the following code too:
>
> @@ -2676,6 +2688,9 @@ final_cost_mergejoin(PlannerInfo *root, MergePath
> *path,
>   * it off does not entitle us to deliver an invalid plan.
>   */
>   else if (innersortkeys == NIL &&
> + !((extra->inner_unique || path->jpath.jointype == JOIN_SEMI) &&
> + list_length(path->jpath.joinrestrictinfo) ==
> + list_length(path->path_mergeclauses)) &&
>   !ExecSupportsMarkRestore(inner_path))
>   path->materialize_inner = true;
>
> I've been staring at this for a while and I think it's correct. If it's
> wrong then we'd get "ERROR: unrecognized node type: " from ExecMarkPos().
>
> Here we must make sure and never skip materializing the inner side, if we're
> not going to skip mark and restore in ExecInitMergeJoin().
>
> I've attached a patch which fixes the issue.
>
> Also added a regression test to try to make sure it stays fixed. There's a
> few other mostly cosmetic fixes in there too.

Patch is attached which fixes up the conflict with the expression
evaluation performance patch.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


unique_joins_2017-03-27.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-26 Thread Tomas Vondra

On 03/25/2017 02:01 PM, David Rowley wrote:
>

I wondered if there's anything we can do here to better test cases
when no workers are available, to try to ensure the parallel nodes work
correctly, but the more I think about it, the more I don't see anything
wrong with just using SET max_parallel_workers = 0;



It's demonstrably a valid way to disable parallel queries (i.e. there's 
nothing wrong with it), because the docs say this:


   Setting this value to 0 disables parallel query execution.

>

My vote would be to leave the GUC behaviour as is and add some tests
to select_parallel.sql which exploit setting max_parallel_workers to 0
for running some tests.

If that's not going to fly, then unless we add something else to allow
us to reliably get no workers, we're close to closing the door on being
able to write automatic tests to catch this sort of bug.

... thinks for a bit

perhaps some magic value like -1 could be used for this... unsure of
how that would be documented though...



I agree it'd be very useful to have a mode where we generate parallel
plans but then prohibit starting any workers. That would detect this and
similar issues, I think.


I'm not sure we need to invent a new magic value, though. Can we simply
look at force_parallel_mode, and if it's 'regress' then treat 0 differently?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speedup twophase transactions

2017-03-26 Thread Michael Paquier
On Sun, Mar 26, 2017 at 4:50 PM, Nikhil Sontakke
 wrote:
> I was away for a bit. I will take a look at this patch and get back to you
> soon.

No problem. Thanks for your time!
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical decoding of two-phase transactions

2017-03-26 Thread Craig Ringer
On 20 March 2017 at 21:47, Stas Kelvich  wrote:
>
>> On 20 Mar 2017, at 16:39, Craig Ringer  wrote:
>>
>> On 20 March 2017 at 20:57, Stas Kelvich  wrote:
>>>
 On 20 Mar 2017, at 15:17, Craig Ringer  wrote:

> I thought about having special field (or reusing one of the existing 
> fields)
> in snapshot struct to force filtering xmax > snap->xmax or xmin = 
> snap->xmin
> as Petr suggested. Then this logic can reside in ReorderBufferCommit().
> However this is not solving problem with catcache, so I'm looking into it 
> right now.

 OK, so this is only an issue if we have xacts that change the schema
 of tables and also insert/update/delete to their heaps. Right?

 So, given that this is CF3 for Pg10, should we take a step back and
 impose the limitation that we can decode 2PC with schema changes or
 data row changes, but not both?
>>>
>>> Yep, time is tight. I’ll try today/tomorrow to proceed with this two scan 
>>> approach.
>>> If I’ll fail to do that during this time then I’ll just update this patch 
>>> to decode
>>> only non-ddl 2pc transactions as you suggested.
>>
>> I wasn't suggesting not decoding them, but giving the plugin the
>> option of whether to proceed with decoding or not.
>>
>> As Simon said, have a pre-decode-prepared callback that lets the
>> plugin get a lock on the 2pc xact if it wants, or say it doesn't want
>> to decode it until it commits.
>>
>> That'd be useful anyway, so we can filter and only do decoding at
>> prepare transaction time of xacts the downstream wants to know about
>> before they commit.
>
> Ah, got that. Okay.

Any news here?

We're in the last week of the CF. If you have a patch that's nearly
ready or getting there, now would be a good time to post it for help
and input from others.

I would really like to get this in, but we're running out of time.

Even if you just post your snapshot management work, with the cosmetic
changes discussed above, that would be a valuable start.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-03-26 Thread David Rowley
On 27 March 2017 at 09:28, David Rowley  wrote:

> Patch is attached which fixes up the conflict with the expression
> evaluation performance patch.

Seems I forgot to commit locally before creating the patch... Here's
the actual patch I meant to attach earlier.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


unique_joins_2017-03-27a.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Valgrind failures caused by multivariate stats patch

2017-03-26 Thread Tomas Vondra

On 03/26/2017 08:47 PM, Andres Freund wrote:

On 2017-03-26 20:38:52 +0200, Tomas Vondra wrote:

...
Hmmm, so I have a theory about what's going on, but no matter what I do I
can't trigger these valgrind failures. What switches are you using to start
valgrind? I'm using this:

valgrind --leak-check=no --undef-value-errors=yes \
  --expensive-definedness-checks=yes --track-origins=yes \
  --read-var-info=yes --gen-suppressions=all \
  --suppressions=src/tools/valgrind.supp --time-stamp=yes \
  --log-file=$HOME/pg-valgrind/%p.log --trace-children=yes \
  postgres --log_line_prefix="%m %p " --log_statement=all \
  --shared_buffers=64MB -D /home/user/tmp/data-valgrind 2>&1


--quiet \
 --error-exitcode=55 \
 --suppressions=${PG_SRC_DIR}/src/tools/valgrind.supp \
 --trace-children=yes --track-origins=yes --read-var-info=yes \
 --num-callers=20 \
 --leak-check=no \
 --gen-suppressions=all \

the error-exitcode makes it a whole lot easier to see an error, because
it triggers a crash-restart cycle at backend exit ;)



OK, this did the trick. And just as I suspected, it seems to be due to 
doing memcpy+offsetof when serializing the statistic into a bytea. The 
attached patch fixes that for me. Can you test it on your build?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/statistics/mvdistinct.c b/src/backend/statistics/mvdistinct.c
index 5df4e29..acc2d7e 100644
--- a/src/backend/statistics/mvdistinct.c
+++ b/src/backend/statistics/mvdistinct.c
@@ -161,10 +161,10 @@ statext_ndistinct_serialize(MVNDistinct *ndistinct)
 	Assert(ndistinct->type == STATS_NDISTINCT_TYPE_BASIC);
 
 	/*
-	 * Base size is base struct size, plus one base struct for each items,
-	 * including number of items for each.
+	 * Base size is size of scalar fields in the struct, plus one base struct
+	 * for each item, including number of items for each.
 	 */
-	len = VARHDRSZ + offsetof(MVNDistinct, items) +
+	len = VARHDRSZ + (3 * sizeof(uint32)) +
 		ndistinct->nitems * (offsetof(MVNDistinctItem, attrs) + sizeof(int));
 
 	/* and also include space for the actual attribute numbers */
@@ -182,9 +182,15 @@ statext_ndistinct_serialize(MVNDistinct *ndistinct)
 
 	tmp = VARDATA(output);
 
-	/* Store the base struct values */
-	memcpy(tmp, ndistinct, offsetof(MVNDistinct, items));
-	tmp += offsetof(MVNDistinct, items);
+	/* Store the base struct values (magic, type, nitems) */
+	memcpy(tmp, &ndistinct->magic, sizeof(uint32));
+	tmp += sizeof(uint32);
+
+	memcpy(tmp, &ndistinct->type, sizeof(uint32));
+	tmp += sizeof(uint32);
+
+	memcpy(tmp, &ndistinct->nitems, sizeof(uint32));
+	tmp += sizeof(uint32);
 
 	/*
 	 * store number of attributes and attribute numbers for each ndistinct
@@ -231,9 +237,10 @@ statext_ndistinct_deserialize(bytea *data)
 	if (data == NULL)
 		return NULL;
 
-	if (VARSIZE_ANY_EXHDR(data) < offsetof(MVNDistinct, items))
+	/* we expect at least the basic fields of MVNDistinct struct */
+	if (VARSIZE_ANY_EXHDR(data) < (3 * sizeof(uint32)))
 		elog(ERROR, "invalid MVNDistinct size %ld (expected at least %ld)",
-			 VARSIZE_ANY_EXHDR(data), offsetof(MVNDistinct, items));
+			 VARSIZE_ANY_EXHDR(data), 3 * sizeof(uint32));
 
 	/* read the MVNDistinct header */
 	ndistinct = (MVNDistinct *) palloc(sizeof(MVNDistinct));
@@ -241,18 +248,24 @@ statext_ndistinct_deserialize(bytea *data)
 	/* initialize pointer to the data part (skip the varlena header) */
 	tmp = VARDATA_ANY(data);
 
-	/* get the header and perform basic sanity checks */
-	memcpy(ndistinct, tmp, offsetof(MVNDistinct, items));
-	tmp += offsetof(MVNDistinct, items);
+	/* get the header fields and perform basic sanity checks */
+	memcpy(&ndistinct->magic, tmp, sizeof(uint32));
+	tmp += sizeof(uint32);
 
 	if (ndistinct->magic != STATS_NDISTINCT_MAGIC)
 		elog(ERROR, "invalid ndistinct magic %d (expected %d)",
 			 ndistinct->magic, STATS_NDISTINCT_MAGIC);
 
+	memcpy(&ndistinct->type, tmp, sizeof(uint32));
+	tmp += sizeof(uint32);
+
 	if (ndistinct->type != STATS_NDISTINCT_TYPE_BASIC)
 		elog(ERROR, "invalid ndistinct type %d (expected %d)",
 			 ndistinct->type, STATS_NDISTINCT_TYPE_BASIC);
 
+	memcpy(&ndistinct->nitems, tmp, sizeof(uint32));
+	tmp += sizeof(uint32);
+
 	Assert(ndistinct->nitems > 0);
 
 	/* what minimum bytea size do we expect for those parameters */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-26 Thread Thomas Munro
On Mon, Mar 27, 2017 at 11:03 AM, Thomas Munro wrote:
>> Is there a risk that this ends up running afoul of filename length
>> limits on some platforms?
>
> Hmm.  I didn't think so.  Do we have a project guideline on maximum
> path lengths based on some kind of survey?  There are various limits
> involved (filesystem and OS per-path-component limits, total limits,
> and the confusing PATH_MAX, MAX_PATH etc macros), but I was under the
> impression that these numbers were always at least 255.  This scheme
> seems capable of producing ~50 bytes in the final component
> (admittedly more if int is 64 bits), and then nowhere near enough to
> reach a limit of that order in the earlier components.

Err, plus prefix.  Still seems unlikely to be too long.

>> I'm a bit unhappy with the partition terminology around this. It's
>> getting a bit confusing. We have partitions, participants and
>> segements. Most of them could be understood for something entirely
>> different than the meaning you have here...
>
> Ok.  Let me try to explain and defend them and see if we can come up
> with something better.
>
> 1.  Segments are what buffile.c already calls the individual
> capped-at-1GB files that it manages.  They are an implementation
> detail that is not part of buffile.c's user interface.  There seems to
> be no reason to change that.

After reading your next email I realised this is not quite true:
BufFileTell and BufFileSeek expose the existence of segments.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-26 Thread Haribabu Kommi
On Sat, Mar 25, 2017 at 6:40 AM, Fujii Masao  wrote:

> On Wed, Feb 15, 2017 at 12:53 PM, Haribabu Kommi
>  wrote:
> >
> >
> > On Wed, Feb 8, 2017 at 9:36 PM, Amit Kapila 
> wrote:
> >>
> >> On Tue, Feb 7, 2017 at 11:47 AM, Haribabu Kommi
> >>  wrote:
> >> > Hi Hackers,
> >> >
> >> > I just want to discuss adding of a new statistics view that provides
> >> > the information of wal writing details as follows
> >> >
> >>
> >> +1.  I think it will be useful to observe WAL activity.
> >
> >
> > Thanks for your opinion.
> >
> >> > postgres=# \d pg_stat_wal_writer
> >> >       View "pg_catalog.pg_stat_wal_writer"
> >> >        Column       |           Type           | Collation | Nullable | Default
> >> > --------------------+--------------------------+-----------+----------+---------
> >> >  num_backend_writes | bigint                   |           |          |
> >> >  num_total_writes   | bigint                   |           |          |
> >> >  num_blocks         | bigint                   |           |          |
> >> >  total_write_time   | bigint                   |           |          |
> >> >  stats_reset        | timestamp with time zone |           |          |
> >> >
> >> > The columns of the view are
> >> > 1. Total number of xlog writes that are called from the backend.
> >> > 2. Total number of xlog writes that are called from both backend
> >> >  and background workers. (This column can be changed to just
> >> >  display on the background writes).
> >> > 3. The number of the blocks that are written.
> >> > 4. Total write_time of the IO operation it took, this variable data is
> >> > filled only when the track_io_timing GUC is enabled.
> >>
> >> So, here is *write_time* the total time system has spent in WAL
> >> writing before the last reset?
> >
> >
> > total write_time spent in WAL writing "after" the last reset in
> > milliseconds.
> >
> >> I think there should be a separate column for write and sync time.
> >>
> >
> > Added.
> >
> >>
> >> > Or it is possible to integrate the new columns into the existing
> >> > pg_stat_bgwriter view also.
> >> >
> >>
> >> I feel separate view is better.
> >
> >
> > Ok.
> >
> > Following the sample out of the view after regress run.
> >
> > postgres=# select * from pg_stat_walwrites;
> > -[ RECORD 1 ]--+--
> > backend_writes | 19092
> > writes | 663
> > write_blocks   | 56116
> > write_time | 0
> > sync_time  | 3064
> > stats_reset| 2017-02-15 13:37:09.454314+11
> >
> > Currently, writer, walwriter and checkpointer processes
> > are considered as background processes that can do
> > the wal write mainly.
>

Thanks for the review.

> I'm not sure if this categorization is good. You told that this view is
> useful
> to tune walwriter parameters at the top of this thread. If so, ISTM that
> the information about walwriter's activity should be separated from others.
>

Yes, that's correct. At first I thought of providing only walwriter
statistics, but later in development it turned into showing the WAL write
activity of all background processes as well, to differentiate it from the
actual writes done by the backends.


> What about other processes which *can* write WAL, for example walsender
> (e.g., BASE_BACKUP can cause WAL record), startup process (e.g., end-of-
> recovery checkpoint) and logical replication worker (Not sure if it always
> works with synchronous_commit=off, though). There might be other processes
> which can write WAL


It is possible to add the walsender, startup and other processes easily,
but not background workers that do some WAL write operations, unless they
report their stats with pgstat_report_stat(). Is it fine to ignore the
workers that do not report stats?



> Why didn't you separate "write_blocks", "write_time" and "sync_time" per
> the process category, like "backend_writes" and "writes"?
>

Ok. I will add those columns.


> This view doesn't seem to take into consideration the WAL writing and
> flushing
> during creating new WAL segment file.
>
> I think that it's better to make this view report also the number of WAL
> pages
> which are written when wal_buffer is full. This information is useful to
> tune the size of wal_buffers. This was proposed by Nagayasu before.
> https://www.postgresql.org/message-id/4ff824f3.5090...@uptime.jp


OK. But this new column just shows how many times the WAL buffers are
flushed because they are full, not the number of WAL pages actually
flushed for that reason as a separate column.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Time to drop old-style (V0) functions?

2017-03-26 Thread Craig Ringer
On 7 March 2017 at 22:50, Peter Eisentraut
 wrote:
> I think we have consensus to go ahead with this, and the patches are
> mostly mechanical, so I only have a few comments on style and one
> possible bug below:
>
>
> 0001-Move-contrib-seg-to-only-use-V1-calling-conventions.patch
>
>  static int restore(char *s, float val, int n);
>
> -
>  
> /*
>   * Input/Output functions
>   
> */
>
> +
>
> doesn't seem like the right whitespace change

Fixed.

> @@ -359,13 +360,14 @@ gseg_picksplit(GistEntryVector *entryvec,
> /*
>  * Emit segments to the left output page, and compute its bounding 
> box.
>  */
> -   datum_l = (SEG *) palloc(sizeof(SEG));
> -   memcpy(datum_l, sort_items[0].data, sizeof(SEG));
> +   datum_l = PointerGetDatum(palloc(sizeof(SEG)));
> +   memcpy(DatumGetPointer(datum_l), sort_items[0].data, sizeof(SEG));
>
> There would be a little bit less back-and-forth here if you kept
> datum_l and datum_r of type SEG *.

Also, currently it does:

v->spl_ldatum = PointerGetDatum(datum_l);
v->spl_rdatum = PointerGetDatum(datum_r);

even though they're already Datum.

Downside of keeping them as SEG is we land up with:

seg_l = DatumGetPointer(DirectFunctionCall2(seg_union,
PointerGetDatum(datum_l),

PointerGetDatum(sort_items[i].data)));

but at least it's tied to the fmgr call.

>
> case RTOverlapStrategyNumber:
> -   retval = (bool) seg_overlap(key, query);
> > +   retval =
> > +       !DatumGetBool(DirectFunctionCall2(seg_overlap, key, query));
> break;
>
> Accidentally flipped the logic?

Looks like it. I don't see a reason for it; there's no corresponding
change around seg_overlap and the other callsite isn't negated:

 case RTOverlapStrategyNumber:
-retval = (bool) seg_overlap(key, query);
+retval = DirectFunctionCall2(seg_overlap, key, query);

so I'd say copy-pasteo, given nearby negated bools.

Fixed. Didn't find any other cases.

> -bool
> -seg_contains(SEG *a, SEG *b)
> +Datum
> +seg_contains(PG_FUNCTION_ARGS)
>  {
> -   return ((a->lower <= b->lower) && (a->upper >= b->upper));
> +   SEG*a = (SEG *) PG_GETARG_POINTER(0);
> +   SEG*b = (SEG *) PG_GETARG_POINTER(1);
> +
> +   PG_RETURN_BOOL((a->lower <= b->lower) && (a->upper >= b->upper));
>  }
>
> -bool
> -seg_contained(SEG *a, SEG *b)
> +Datum
> +seg_contained(PG_FUNCTION_ARGS)
>  {
> -   return (seg_contains(b, a));
> > +   PG_RETURN_DATUM(
> > +       DirectFunctionCall2(seg_contains,
> > +                           PG_GETARG_DATUM(1),
> > +                           PG_GETARG_DATUM(0)));
>  }
>
> Maybe in seg_contained also assign the arguments to a and b, so it's
> easier to see the relationship between contains and contained.

Done. Makes for nicer formatting too.

> -bool
> -seg_same(SEG *a, SEG *b)
> +Datum
> +seg_same(PG_FUNCTION_ARGS)
>  {
> -   return seg_cmp(a, b) == 0;
> +   Datum   cmp =
> +   DirectFunctionCall2(seg_cmp, PG_GETARG_DATUM(0), PG_GETARG_DATUM(1));
> +
> +   PG_RETURN_BOOL(DatumGetInt32(cmp) == 0);
>  }
>
> I would write this as
>
> int32 cmp = DatumGetInt32(DirectFunctionCall2(seg_cmp, PG_GETARG_DATUM(0),
> PG_GETARG_DATUM(1)));
>
> Having a variable of type Datum is a bit meaningless.

If you're passing things around between other fmgr-using functions
it's often useful to just carry the Datum form around.

In this case it doesn't make much sense though. Done.


> 0002-Remove-support-for-version-0-calling-conventions.patch
>
> diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
> index 255bfddad7..cd41b89136 100644
> --- a/doc/src/sgml/xfunc.sgml
> +++ b/doc/src/sgml/xfunc.sgml
> @@ -675,7 +675,7 @@ CREATE FUNCTION mleast(VARIADIC arr numeric[]) RETURNS 
> numeric AS $$
>  $$ LANGUAGE SQL;
>
>  SELECT mleast(10, -1, 5, 4.4);
> - mleast
> + mleast
>  
>   -1
>  (1 row)
>
> These changes are neither right nor wrong, but we have had previous
> discussions about this and settled on leaving the whitespace as psql
> outputs it.  In any case it seems unrelated here.

Removed. Though personally since the patch touches the file anyway it
doesn't seem to matter much either way.

> +
> +Currently only one calling convention is used for C functions
> +(version 1). Support for that calling convention is
> +indicated by writing a PG_FUNCTION_INFO_V1() macro
> +call for the function, as illustrated below.
> 
>
> extra blank line

Fixed.

> @@ 

Re: [HACKERS] Time to drop old-style (V0) functions?

2017-03-26 Thread Craig Ringer
On 27 March 2017 at 10:45, Craig Ringer  wrote:

> Passes "make check" and recovery tests, check-world running now.

A couple of fixes pending.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM

2017-03-26 Thread Nikolay Shaplov
In a message of 26 March 2017 15:02:12, you wrote:
> Nikolay Shaplov wrote:
> > If I would think about it now: we always know how many options we will
> > have. So we can just pass this number to palloc and assert if somebody
> > adds more options than expected... What do you think about it?
> 
> I think we need to preserve the ability to add custom options, because
> extensions may want to do that.
Ok. At least this will not do any harm :-)

-- 
Nikolay Shaplov, independent Perl & C/C++ developer. Available for hire.




Re: [HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM

2017-03-26 Thread Alvaro Herrera
Nikolay Shaplov wrote:

> If I would think about it now: we always know how many options we will have.
> So we can just pass this number to palloc and assert if somebody adds more
> options than expected... What do you think about it?

I think we need to preserve the ability to add custom options, because
extensions may want to do that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PDF build is broken

2017-03-26 Thread Peter Eisentraut
On 3/25/17 07:27, Devrim Gündüz wrote:
> I can't build PDFs with latest snapshot tarball:

Fixed.  But I also suggest that you try out the FOP based builds,
because the jadetex-based builds will probably go away soon.

> 
> 
> $ make postgres-A4.pdf
> { \
>   echo ""; \
>   echo ""; \
> } > version.sgml
> '/usr/bin/perl' ./mk_feature_tables.pl YES 
> ../../../src/backend/catalog/sql_feature_packages.txt 
> ../../../src/backend/catalog/sql_features.txt > features-supported.sgml
> '/usr/bin/perl' ./mk_feature_tables.pl NO 
> ../../../src/backend/catalog/sql_feature_packages.txt 
> ../../../src/backend/catalog/sql_features.txt > features-unsupported.sgml
> '/usr/bin/perl' ./generate-errcodes-table.pl 
> ../../../src/backend/utils/errcodes.txt > errcodes-table.sgml
> openjade  -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D . -c 
> /usr/share/sgml/docbook/dsssl-stylesheets/catalog -d stylesheet.dsl -t sgml 
> -i output-html -V html-index postgres.sgml
> LC_ALL=C '/usr/bin/perl' /usr/bin/collateindex.pl -f -g -i 'bookindex' -o 
> bookindex.sgml HTML.index
> Processing HTML.index...
> 2762 entries loaded...
> collateindex.pl: duplicated index entry found: TRUNC  
> 1 entries ignored...
> Done.
> openjade  -D . -D . -c /usr/share/sgml/docbook/dsssl-stylesheets/catalog -d 
> ./stylesheet.dsl -t tex -V tex-backend -i output-print -i include-index -V 
> texpdf-output -V '%paper-type%'=A4 -o postgres-A4.tex-pdf postgres.sgml
> openjade:ref/alter_collation.sgml:96:10:E: [xref to REFSECT1 unsupported]
> pdfjadetex postgres-A4.tex-pdf
> This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) (preloaded 
> format=pdfjadetex)
>  restricted \write18 enabled.
> entering extended mode
> ! I can't find file `postgres-A4.tex-pdf'.
> <*> postgres-A4.tex-pdf
>
> (Press Enter to retry, or Control-D to exit)
> Please type another input file name: 
> 
> 
> Can someone please take a look?
> 
> Regards,
> -- 
> Devrim Gündüz
> EnterpriseDB: http://www.enterprisedb.com
> PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
> Twitter: @DevrimGunduz , @DevrimGunduzTR
> 


-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Stale comments in vacuumlazy.c

2017-03-26 Thread Pavan Deolasee
I happened to notice a stale comment at the very beginning of vacuumlazy.c.
ISTM we forgot to fix that when we introduced the FSM. With the FSM, vacuum no
longer needs to track per-page free space info. I propose the attached fix.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


vacuumlazy_comment_fixes.patch
Description: Binary data



Re: [HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM

2017-03-26 Thread Nikolay Shaplov
In a message of 17 March 2017 14:21:26, Alvaro Herrera wrote:
> I gave this patch a quick skim.
Thanks!

> At first I was confused by the term
> "catalog"; I thought it meant we stored options in a system table.  But
> that's not what is meant at all; instead, what we do is build these
> "catalogs" in memory.  Maybe a different term could have been used, but
> I'm not sure it's stricly necessary.  I wouldn't really bother.
Yes "catalog" is quite confusing. I did not find better name while I was 
writing the code, so I take first one, that came into my mind.
If you, or somebody else, have better idea how to call this sets of options 
definitions I will gladly rename it, as catalog is a bad name. May be 
OptionsDefSet instead of OptionsCatalog?


> I'm confused by the "no ALTER, no lock" rule.  Does it mean that
> "ALTER..SET" is forbidden?  Because I noticed that brin's
> pages_per_range is marked as such, but we do allow that option to change
> with ALTER..SET, so there's at least one bug there, and I would be
> surprised if there aren't any others.

If you grep, for example, the gist index code for the "buffering_mode" option,
you will see that this option is used only in gistbuild.c. There it is written
into the meta page, and afterwards the value from the meta page is used; the
one from the options is just ignored.
Nowadays you can successfully alter this value, but it will not affect
anything until the index is recreated... I thought this is very confusing
behavior and decided that we should just forbid such alters.


> Please make sure to mark functions as static (e.g. bringetreloptcatalog).
Ok. Will do.

> Why not pass ->struct_size and ->postprocess_fun (which I'd rename it as
> ->postprocess_fn, more in line with our naming style) as a parameter to
> allocateOptionsCatalog?
struct_size -- very good idea!
postprocess_fn -- it is rarely used now; in most cases it is NULL. Maybe it
would be OK to set it afterwards in the rare cases when it is needed.

> Also, to avoid repalloc() in most cases (and to
> avoid pallocing more options than you're going to need in a bunch of
> cases), perhaps that function should take the number of times you expect to
> call AddItems for that catalog (since you do it immediately afterwards
> in all cases I can see), and allocate that number.  If later another
> item arrives, then repalloc using the same code you already have in
> AddItems().
I've copied this code from the reloptions code for custom options, without
much thinking about it.

If I would think about it now: we always know how many options we will have.
So we can just pass this number to palloc and assert if somebody adds more
options than expected... What do you think about it?


> Something is wrong with leading whitespace in many places; either you
> added too many tabs, or the wrong number spaces; not sure which but
> visually it's clearly wrong.  ... Actually there are whitespace-style
> violations in several places; please fix using pgindent (after adding
> any new typedefs your defining to typedefs.list).
I will run pgindent on my code.



-- 
Nikolay Shaplov, independent Perl & C/C++ developer. Available for hire.




Re: [HACKERS] Valgrind failures caused by multivariate stats patch

2017-03-26 Thread Andres Freund
On 2017-03-26 20:38:52 +0200, Tomas Vondra wrote:
> On 03/25/2017 10:10 PM, Andres Freund wrote:
> ...
> > ==2486== Uninitialised byte(s) found during client check request
> > ==2486==at 0x1C4857: printtup (printtup.c:347)
> > ==2486==by 0x401FD5: ExecutePlan (execMain.c:1681)
> > ==2486==by 0x3FFDED: standard_ExecutorRun (execMain.c:355)
> > ==2486==by 0x3FFC07: ExecutorRun (execMain.c:298)
> > ==2486==by 0x5C40BD: PortalRunSelect (pquery.c:928)
> > ==2486==by 0x5C3D50: PortalRun (pquery.c:769)
> > ==2486==by 0x5BDC1A: exec_simple_query (postgres.c:1101)
> > ==2486==by 0x5C203F: PostgresMain (postgres.c:4071)
> > ==2486==by 0x525138: BackendRun (postmaster.c:4317)
> > ==2486==by 0x524848: BackendStartup (postmaster.c:3989)
> > ==2486==by 0x520C4A: ServerLoop (postmaster.c:1729)
> > ==2486==by 0x5201D9: PostmasterMain (postmaster.c:1337)
> > ==2486==by 0x45DFA6: main (main.c:228)
> > ==2486==  Address 0x3e8c50f5 is in a rw- anonymous segment
> > ==2486==  Uninitialised value was created by a heap allocation
> > ==2486==at 0x76212F: palloc (mcxt.c:872)
> > ==2486==by 0x578814: statext_ndistinct_build (mvdistinct.c:80)
> > ==2486==by 0x577CCE: BuildRelationExtStatistics (extended_stats.c:94)
> > ==2486==by 0x3546B6: do_analyze_rel (analyze.c:577)
> > ==2486==by 0x353B76: analyze_rel (analyze.c:271)
> > ==2486==by 0x3E7B89: vacuum (vacuum.c:321)
> > ==2486==by 0x3E7755: ExecVacuum (vacuum.c:122)
> > ==2486==by 0x5C5F1C: standard_ProcessUtility (utility.c:670)
> > ==2486==by 0x5C5724: ProcessUtility (utility.c:353)
> > ==2486==by 0x5C46CA: PortalRunUtility (pquery.c:1174)
> > ==2486==by 0x5C48D0: PortalRunMulti (pquery.c:1317)
> > ==2486==by 0x5C3E1A: PortalRun (pquery.c:795)
> > ==2486==by 0x5BDC1A: exec_simple_query (postgres.c:1101)
> > ==2486==by 0x5C203F: PostgresMain (postgres.c:4071)
> > ==2486==by 0x525138: BackendRun (postmaster.c:4317)
> > ==2486==by 0x524848: BackendStartup (postmaster.c:3989)
> > ==2486==by 0x520C4A: ServerLoop (postmaster.c:1729)
> > ==2486==by 0x5201D9: PostmasterMain (postmaster.c:1337)
> > ==2486==by 0x45DFA6: main (main.c:228)
> 
> Hmmm, so I have a theory about what's going on, but no matter what I do I
> can't trigger these valgrind failures. What switches are you using to start
> valgrind? I'm using this:
> 
> valgrind --leak-check=no --undef-value-errors=yes \
>   --expensive-definedness-checks=yes --track-origins=yes \
>   --read-var-info=yes --gen-suppressions=all \
>   --suppressions=src/tools/valgrind.supp --time-stamp=yes \
>   --log-file=$HOME/pg-valgrind/%p.log --trace-children=yes \
>   postgres --log_line_prefix="%m %p " --log_statement=all \
>   --shared_buffers=64MB -D /home/user/tmp/data-valgrind 2>&1

--quiet \
 --error-exitcode=55 \
 --suppressions=${PG_SRC_DIR}/src/tools/valgrind.supp \
 --trace-children=yes --track-origins=yes --read-var-info=yes \
 --num-callers=20 \
 --leak-check=no \
 --gen-suppressions=all \

the error-exitcode makes it a whole lot easier to see an error, because
it triggers a crash-restart cycle at backend exit ;)

Note that skink also finds the issue:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2017-03-26%2013%3A00%3A01&stg=check

Did you compile postgres with valgrind support (-DUSE_VALGRIND / 
pg_config_manual.h)?

- Andres




Re: [HACKERS] [sqlsmith] Unpinning error in parallel worker

2017-03-26 Thread Thomas Munro
On Mon, Mar 27, 2017 at 4:18 AM, Andreas Seltenreich  wrote:
> Hi,
>
> today's testing with master as of d253b0f6e3 yielded two clusters that
> stopped processing queries.  Symptoms:
>
> [...]

Thanks Andreas.  Investigating.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-26 Thread Andres Freund
Hi,


SharedBufFile allows temporary files to be created by one backend and
then exported for read-only access by other backends, with clean-up
managed by reference counting associated with a DSM segment.  This includes
changes to fd.c and buffile.c to support new kinds of temporary file.


diff --git a/src/backend/storage/file/buffile.c 
b/src/backend/storage/file/buffile.c
index 4ca0ea4..a509c05 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c

I think the new facilities should be explained in the file's header.


@@ -68,9 +71,10 @@ struct BufFile
 * avoid making redundant FileSeek calls.
 */
 
-	bool		isTemp;			/* can only add files if this is TRUE */
+	bool		isSegmented;	/* can only add files if this is TRUE */

That's a bit of a weird and uncommented upon change.


@@ -79,6 +83,8 @@ struct BufFile
 */
ResourceOwner resowner;
 
+	BufFileTag	tag;			/* for discoverability between backends */

Not perfectly happy with the name tag here, the name is a bit too
similar to BufferTag - something quite different.



+static void
+make_tagged_path(char *tempdirpath, char *tempfilepath,
+const BufFileTag *tag, int segment)
+{
+   if (tag->tablespace == DEFAULTTABLESPACE_OID ||
+   tag->tablespace == GLOBALTABLESPACE_OID)
+   snprintf(tempdirpath, MAXPGPATH, "base/%s", PG_TEMP_FILES_DIR);
+   else
+   {
+   snprintf(tempdirpath, MAXPGPATH, "pg_tblspc/%u/%s/%s",
+tag->tablespace, TABLESPACE_VERSION_DIRECTORY,
+PG_TEMP_FILES_DIR);
+   }
+
+   snprintf(tempfilepath, MAXPGPATH, "%s/%s%d.%d.%d.%d.%d", tempdirpath,
+            PG_TEMP_FILE_PREFIX,
+            tag->creator_pid, tag->set, tag->partition, tag->participant,
+            segment);

Is there a risk that this ends up running afoul of filename length
limits on some platforms?

+}
+
+static File
+make_tagged_segment(const BufFileTag *tag, int segment)
+{
+   Filefile;
+   chartempdirpath[MAXPGPATH];
+   chartempfilepath[MAXPGPATH];
+
+   /*
+* There is a remote chance that disk files with this (pid, set) pair
+* already exists after a crash-restart.  Since the presence of
+* consecutively numbered segment files is used by BufFileOpenShared to
+* determine the total size of a shared BufFile, we'll defend against
+* confusion by unlinking segment 1 (if it exists) before creating
+* segment 0.
+*/

Gah.  Why on earth aren't we removing temp files when restarting, not
just on the initial start?  That seems completely wrong?


If we do decide not to change this: Why is that sufficient? Doesn't the
same problem exist for segments later than the first?

+/*
+ * Open a file that was previously created in another backend with
+ * BufFileCreateShared.
+ */
+BufFile *
+BufFileOpenTagged(const BufFileTag *tag)
+{
+   BufFile*file = (BufFile *) palloc(sizeof(BufFile));
+   chartempdirpath[MAXPGPATH];
+   chartempfilepath[MAXPGPATH];
+   Sizecapacity = 1024;
+   File   *files = palloc(sizeof(File) * capacity);
+   int nfiles = 0;
+
+   /*
+* We don't know how many segments there are, so we'll probe the
+* filesystem to find out.
+*/
+   for (;;)
+   {
+   /* See if we need to expand our file space. */
+   if (nfiles + 1 > capacity)
+   {
+   capacity *= 2;
+   files = repalloc(files, sizeof(File) * capacity);
+   }
+   /* Try to load a segment. */
+   make_tagged_path(tempdirpath, tempfilepath, tag, nfiles);
+   files[nfiles] = PathNameOpenTemporaryFile(tempfilepath);
+   if (files[nfiles] <= 0)
+   break;

Isn't 0 a theoretically valid return value from
PathNameOpenTemporaryFile?

+/*
+ * Delete a BufFile that was created by BufFileCreateTagged.  Return true if
+ * at least one segment was deleted; false indicates that no segment was
+ * found, or an error occurred while trying to delete.  Errors are logged but
+ * the function returns normally because this is assumed to run in a clean-up
+ * path that might already involve an error.
+ */
+bool
+BufFileDeleteTagged(const BufFileTag *tag)
+{
+   chartempdirpath[MAXPGPATH];
+   chartempfilepath[MAXPGPATH];
+   int segment = 0;
+   boolfound = false;
+
+   /*
+* We don't know if the BufFile really exists, because SharedBufFile
+* tracks only the range of file numbers.  If it does exists, we don't
+* know many 

Re: [HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM

2017-03-26 Thread Nikolay Shaplov
In a message of 23 March 2017 16:14:58, Fabrízio de Royes Mello wrote:
> On Thu, Mar 23, 2017 at 3:58 PM, Alvaro Herrera 
> 
> wrote:
> > Copying Fabrízio Mello here, who spent some time trying to work on
> > reloptions too.  He may have something to say about the new
> > functionality that this patch provides.
> 
> Thanks Álvaro, I'll look the patch and try to help in some way.
Thank you, that would be great!

-- 
Nikolay Shaplov, independent Perl & C/C++ developer. Available for hire.




Re: [HACKERS] Valgrind failures caused by multivariate stats patch

2017-03-26 Thread Tomas Vondra

On 03/25/2017 10:10 PM, Andres Freund wrote:
...

==2486== Uninitialised byte(s) found during client check request
==2486==at 0x1C4857: printtup (printtup.c:347)
==2486==by 0x401FD5: ExecutePlan (execMain.c:1681)
==2486==by 0x3FFDED: standard_ExecutorRun (execMain.c:355)
==2486==by 0x3FFC07: ExecutorRun (execMain.c:298)
==2486==by 0x5C40BD: PortalRunSelect (pquery.c:928)
==2486==by 0x5C3D50: PortalRun (pquery.c:769)
==2486==by 0x5BDC1A: exec_simple_query (postgres.c:1101)
==2486==by 0x5C203F: PostgresMain (postgres.c:4071)
==2486==by 0x525138: BackendRun (postmaster.c:4317)
==2486==by 0x524848: BackendStartup (postmaster.c:3989)
==2486==by 0x520C4A: ServerLoop (postmaster.c:1729)
==2486==by 0x5201D9: PostmasterMain (postmaster.c:1337)
==2486==by 0x45DFA6: main (main.c:228)
==2486==  Address 0x3e8c50f5 is in a rw- anonymous segment
==2486==  Uninitialised value was created by a heap allocation
==2486==at 0x76212F: palloc (mcxt.c:872)
==2486==by 0x578814: statext_ndistinct_build (mvdistinct.c:80)
==2486==by 0x577CCE: BuildRelationExtStatistics (extended_stats.c:94)
==2486==by 0x3546B6: do_analyze_rel (analyze.c:577)
==2486==by 0x353B76: analyze_rel (analyze.c:271)
==2486==by 0x3E7B89: vacuum (vacuum.c:321)
==2486==by 0x3E7755: ExecVacuum (vacuum.c:122)
==2486==by 0x5C5F1C: standard_ProcessUtility (utility.c:670)
==2486==by 0x5C5724: ProcessUtility (utility.c:353)
==2486==by 0x5C46CA: PortalRunUtility (pquery.c:1174)
==2486==by 0x5C48D0: PortalRunMulti (pquery.c:1317)
==2486==by 0x5C3E1A: PortalRun (pquery.c:795)
==2486==by 0x5BDC1A: exec_simple_query (postgres.c:1101)
==2486==by 0x5C203F: PostgresMain (postgres.c:4071)
==2486==by 0x525138: BackendRun (postmaster.c:4317)
==2486==by 0x524848: BackendStartup (postmaster.c:3989)
==2486==by 0x520C4A: ServerLoop (postmaster.c:1729)
==2486==by 0x5201D9: PostmasterMain (postmaster.c:1337)
==2486==by 0x45DFA6: main (main.c:228)


Hmmm, so I have a theory about what's going on, but no matter what I do 
I can't trigger these valgrind failures. What switches are you using to 
start valgrind? I'm using this:


valgrind --leak-check=no --undef-value-errors=yes \
  --expensive-definedness-checks=yes --track-origins=yes \
  --read-var-info=yes --gen-suppressions=all \
  --suppressions=src/tools/valgrind.supp --time-stamp=yes \
  --log-file=$HOME/pg-valgrind/%p.log --trace-children=yes \
  postgres --log_line_prefix="%m %p " --log_statement=all \
  --shared_buffers=64MB -D /home/user/tmp/data-valgrind 2>&1

FWIW I think the issue is that statext_ndistinct_build does palloc() and 
then uses offsetof() to copy data into the chunk, which might result in 
a few bytes skipped due to alignment/padding. But as I can't reproduce 
the valgrind failure, it's hard to say.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_dump truncating queries in error messages

2017-03-26 Thread Peter Eisentraut
When reporting an error from a query, pg_dump truncates the reported
query to 128 characters (pg_backup_db.c ExecuteSqlCommand()).

Is this (still) sensible?  The kind of queries that pg_dump is running
nowadays, I find myself unable to debug them if they are truncated at
that length.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_dump truncating queries in error messages

2017-03-26 Thread Tom Lane
Peter Eisentraut  writes:
> When reporting an error from a query, pg_dump truncates the reported
> query to 128 characters (pg_backup_db.c ExecuteSqlCommand()).

> Is this (still) sensible?  The kind of queries that pg_dump is running
> nowadays, I find myself unable to debug them if they are truncated at
> that length.

Not clear that it ever was very sensible ... +1 for removing the limit.

regards, tom lane




[HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-26 Thread Andreas Seltenreich
Hi,

testing with master as of cf366e97ff, sqlsmith occasionally triggers the
following assertion:

TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) (&(bufHdr)->content_lock", File: "bufmgr.c", Line: 3397)

Backtraces always look like the one below.  It is reproducible on a
cluster once it happens.  I could provide a tarball if needed.

regards,
Andreas

#2  0x008324b1 in ExceptionalCondition (conditionName=conditionName@entry=0x9e4e28 "!(LWLockHeldByMe(((LWLock*) (&(bufHdr)->content_lock", errorType=errorType@entry=0x87b03d "FailedAssertion", fileName=fileName@entry=0x9e5856 "bufmgr.c", lineNumber=lineNumber@entry=3397) at assert.c:54
#3  0x00706971 in MarkBufferDirtyHint (buffer=2844, buffer_std=buffer_std@entry=1 '\001') at bufmgr.c:3397
#4  0x004b3ecd in _hash_kill_items (scan=scan@entry=0x66dcf70) at hashutil.c:514
#5  0x004a9c1b in hashendscan (scan=0x66dcf70) at hash.c:512
#6  0x004cf17a in index_endscan (scan=0x66dcf70) at indexam.c:353
#7  0x0061fa51 in ExecEndIndexScan (node=0x3093f30) at nodeIndexscan.c:852
#8  0x00608e59 in ExecEndNode (node=<optimized out>) at execProcnode.c:715
#9  0x006045b8 in ExecEndPlan (estate=0x3064000, planstate=<optimized out>) at execMain.c:1540
#10 standard_ExecutorEnd (queryDesc=0x30cb880) at execMain.c:487
#11 0x005c87b0 in PortalCleanup (portal=0x1a60060) at portalcmds.c:302
#12 0x0085cbb3 in PortalDrop (portal=0x1a60060, isTopCommit=<optimized out>) at portalmem.c:489
#13 0x00736ed2 in exec_simple_query (query_string=0x315b7a0 "...") at postgres.c:
#14 0x00738b51 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x1a6c6c8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4071
#15 0x00475fef in BackendRun (port=0x1a65b90) at postmaster.c:4317
#16 BackendStartup (port=0x1a65b90) at postmaster.c:3989
#17 ServerLoop () at postmaster.c:1729
#18 0x006c8662 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x1a3f540) at postmaster.c:1337
#19 0x0047729d in main (argc=4, argv=0x1a3f540) at main.c:228




Re: [HACKERS] BRIN cost estimate

2017-03-26 Thread Emre Hasegeli
> If we want to have a variable which stores the number of ranges, then
> I think numRanges is better than numBlocks. I can't imagine many
> people would disagree there.

I renamed it along with the other two.

> At the very least please write a comment to explain this in the code.
> Right now it looks broken. If I noticed this then one day in the
> future someone else will. If you write a comment then person of the
> future will likely read it, and then not raise any questions about the
> otherwise questionable code.

I added a sentence about it.

> I do strongly agree that the estimates need improved here. I've
> personally had issues with bad brin estimates before, and I'd like to
> see it improved. I think the patch is not quite complete without it
> also considering stats on expression indexes. If you have time to go
> do that I'd suggest you go ahead with that.

I copy-pasted expression index support from btcostestimate() as well,
and extended the regression test.

I think this function can use more polishing before committing, but I
don't know where to begin.  There are single functions for every
access method in here.  I don't like them being in there to begin
with.  selfuncs.c is quite long with all kinds of dependencies and
dependents.  I think it would be better to have the access method
selectivity estimation functions under src/access.  Maybe we should
start doing so by moving this to src/access/brin/brin_selfuncs.c.
What do you think?


brin-correlation-v5.patch
Description: Binary data



Re: [HACKERS] delta relations in AFTER triggers

2017-03-26 Thread Thomas Munro
On Fri, Mar 24, 2017 at 1:14 PM, Thomas Munro wrote:
> On Tue, Mar 14, 2017 at 7:51 AM, Kevin Grittner  wrote:
>> On Sun, Mar 12, 2017 at 4:08 PM, Thomas Munro
>>  wrote:
>>> I found a new way to break it: run the trigger function so
>>> that the plan is cached by plpgsql, then ALTER TABLE incompatibly,
>>> then run the trigger function again.  See attached.
>>
>> [...]
>>
>> I expected that existing mechanisms would have forced re-planning of
>> a trigger function if the table the function was attached to was
>> altered.  Either that was a bit "optimistic", or the old TupleDesc
>> is used for the new plan.  Will track down which it is, and fix it.
>
> When PlanCacheRelCallback runs, I don't think it understands that
> these named tuplestore RangeTblEntry objects are dependent on the
> subject table.  Could that be fixed like this?
>
> @@ -2571,6 +2582,9 @@ extract_query_dependencies_walker(Node *node, PlannerInfo *context)
>         if (rte->rtekind == RTE_RELATION)
>             context->glob->relationOids =
>                 lappend_oid(context->glob->relationOids, rte->relid);
> +       else if (rte->rtekind == RTE_NAMEDTUPLESTORE)
> +           context->glob->relationOids =
> +               lappend_oid(context->glob->relationOids, [subject table's OID]);

I'm not sure if this is the right approach and it may have style
issues, but it does fix the crashing in the ALTER TABLE case I
reported: see attached patch which applies on top of your v12.

BTW I had to make the following change to your v12 because of commit b8d7f053:

     /*
      * initialize child expressions
      */
-    scanstate->ss.ps.targetlist = (List *)
-        ExecInitExpr((Expr *) node->scan.plan.targetlist,
-                     (PlanState *) scanstate);
-    scanstate->ss.ps.qual = (List *)
-        ExecInitExpr((Expr *) node->scan.plan.qual,
-                     (PlanState *) scanstate);
+    scanstate->ss.ps.qual =
+        ExecInitQual(node->scan.plan.qual, (PlanState *) scanstate);

-- 
Thomas Munro
http://www.enterprisedb.com


experimental-fix-for-transition-table-invalidation.patch
Description: Binary data



Re: [HACKERS] Speedup twophase transactions

2017-03-26 Thread Nikhil Sontakke
Thanks Michael,

I was away for a bit. I will take a look at this patch and get back to you
soon.

Regards,
Nikhils

On 22 March 2017 at 07:40, Michael Paquier 
wrote:

> On Fri, Mar 17, 2017 at 5:15 PM, Michael Paquier
>  wrote:
> > On Fri, Mar 17, 2017 at 5:00 PM, Nikhil Sontakke
> >  wrote:
> >> Micheal, it looks like you are working on a final version of this
> patch? I
> >> will wait to review it from my end, then.
> >
> > I have to admit that I am beginning to get drawn into it...
>
> And here is what I got. I have found a couple of inconsistencies in
> the patch, roughly:
> - During recovery entries marked with ondisk = true should have their
> start and end LSN reset to InvalidXLogRecPtr. This was actually
> leading to some inconsistencies in MarkAsPreparing() for 2PC
> transactions staying around for more than 2 checkpoints.
> - RecoverPreparedTransactions(), StandbyRecoverPreparedTransactions()
> and PrescanPreparedTransactions() doing both a scan of pg_twophase and
> the shared memory entries was way too complicated. I have changed
> things so as only memory entries are scanned by those routines, but an
> initial scan of pg_twophase is done before recovery.
> - Some inconsistencies in the comments and some typos found on the way.
> - Simplification of some routines used in redo, as well as simplified
> the set of routines made available to users.
>
> Tests are passing for me, an extra lookup would be nice.
> --
> Michael
>



-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-26 Thread Fabien COELHO


Hello Corey,


v25, try 2:

First file is what you were used to last time. 2nd and 3rd are changes
since then based on feedback.


Patches do not apply cleanly.
Part 1 gets:
 error: patch failed: src/test/regress/parallel_schedule:89
 error: src/test/regress/parallel_schedule: patch does not apply

There is still the useless file, ok it is removed by part2. Could have 
been just one patch...


After a manual fix in parallel_schedule, make check is ok.

gather_boolean_expression:

ISTM that the PQExpBuffer is partially leaked. Shouldn't something be 
freed?


I think that you should use appendPQExpBufferChar and Str instead of 
relying on the format variant which is probably expensive. Something like:


  if (num_options > 0)
append...Char(buf, ' ');
  append...Str(buf, ...);

is_true_boolean_expression: "return (success) ? tf : false;"
Is this simply: "return success && tf;"?

Some functions have opt1, opt2, but some start at opt0. This does not look 
too consistent, although the inconsistency may be preexisting from your 
patch. Basically, there is a need for some more restructuring in 
"command.c".


--
Fabien.




Re: [HACKERS] logical replication apply to run with sync commit off by default

2017-03-26 Thread Masahiko Sawada
On Fri, Mar 24, 2017 at 11:49 PM, Petr Jelinek
 wrote:
> On 21/03/17 22:37, Petr Jelinek wrote:
>> On 21/03/17 18:54, Robert Haas wrote:
>>> On Mon, Mar 20, 2017 at 7:56 PM, Petr Jelinek
>>>  wrote:
 On 18/03/17 13:31, Petr Jelinek wrote:
> On 07/03/17 06:23, Petr Jelinek wrote:
>> there has been discussion at the logical replication initial copy thread
>> [1] about making apply work with sync commit off by default for
>> performance reasons and adding option to change that per subscription.
>>
>> Here I propose patch to implement this - it adds boolean column
>> subssynccommit to pg_subscription catalog which decides if
>> synchronous_commit should be off or local for apply. And it adds
>> SYNCHRONOUS_COMMIT = boolean to the list of WITH options for CREATE and
>> ALTER SUBSCRIPTION. When nothing is specified it will set it to false.
>>
>> The patch is built on top of copy patch currently as there are conflicts
>> between the two and this helps a bit with testing of copy patch.
>>
>> [1]
>> https://www.postgresql.org/message-id/CA+TgmoY7Lk2YKArcp4O=Qu=xoor8j71mad1oteojawmuje3...@mail.gmail.com
>>
>
> I rebased this patch against recent changes and the latest version of
> copy patch.

>
> Rebase after table copy patch got committed.
>

This patch seems to conflict with current HEAD. Please update the patch.

$ patch -p1  < 0001-Add-option-to-modify-sync-commit-per-subscription.patch
(Stripping trailing CRs from patch.)
patching file doc/src/sgml/catalogs.sgml
Hunk #1 succeeded at 6511 (offset 97 lines).
(Stripping trailing CRs from patch.)
patching file doc/src/sgml/ref/alter_subscription.sgml
(Stripping trailing CRs from patch.)
patching file doc/src/sgml/ref/create_subscription.sgml
(Stripping trailing CRs from patch.)
patching file src/backend/catalog/pg_subscription.c
(Stripping trailing CRs from patch.)
patching file src/backend/commands/subscriptioncmds.c
(Stripping trailing CRs from patch.)
patching file src/backend/replication/logical/launcher.c
(Stripping trailing CRs from patch.)
patching file src/backend/replication/logical/worker.c
Hunk #1 succeeded at 1395 (offset -15 lines).
Hunk #2 succeeded at 1468 (offset -15 lines).
(Stripping trailing CRs from patch.)
patching file src/bin/pg_dump/pg_dump.c
Hunk #1 succeeded at 3654 (offset 1 line).
Hunk #2 succeeded at 3672 (offset 1 line).
Hunk #3 succeeded at 3687 (offset 1 line).
Hunk #4 succeeded at 3705 (offset 1 line).
Hunk #5 succeeded at 3776 (offset 1 line).
(Stripping trailing CRs from patch.)
patching file src/bin/pg_dump/pg_dump.h
Hunk #1 succeeded at 612 (offset 8 lines).
(Stripping trailing CRs from patch.)
patching file src/bin/pg_dump/t/002_pg_dump.pl
(Stripping trailing CRs from patch.)
patching file src/bin/psql/describe.c
Hunk #1 succeeded at 5171 (offset 51 lines).
Hunk #2 succeeded at 5198 (offset 51 lines).
(Stripping trailing CRs from patch.)
patching file src/include/catalog/pg_subscription.h
(Stripping trailing CRs from patch.)
patching file src/test/regress/expected/subscription.out
Hunk #1 FAILED at 25.
Hunk #2 succeeded at 89 (offset 25 lines).
1 out of 2 hunks FAILED -- saving rejects to file
src/test/regress/expected/subscription.out.rej
(Stripping trailing CRs from patch.)
patching file src/test/regress/sql/subscription.sql
Hunk #1 succeeded at 66 (offset 21 lines).

-
+ 
+  The value of this parameter overrides the
+  synchronous_commit setting.
+  The default value is false.
+ 

+  
+   If true, the apply for the subscription will run with
+   synchronous_commit set to local.
+   Otherwise it will have it set to false.
+  

synchronous_commit can work with false actually but I think that it's
better to use "off" instead of "false" according to the document.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




[HACKERS] [sqlsmith] Unpinning error in parallel worker

2017-03-26 Thread Andreas Seltenreich
Hi,

today's testing with master as of d253b0f6e3 yielded two clusters that
stopped processing queries.  Symptoms:

regression=> select application_name, state, wait_event, wait_event_type, 
count(1), min(pid)
 from pg_stat_activity group by 1,2,3,4;
 application_name | state  |           wait_event           | wait_event_type | count | min  
------------------+--------+--------------------------------+-----------------+-------+------
 psql             | active |                                |                 |     1 | 3781
 psql             | active | DynamicSharedMemoryControlLock | LWLock          |     2 | 3272
 sqlsmith::dut    | active | DynamicSharedMemoryControlLock | LWLock          |     9 | 2726
 sqlsmith::dut    | active | MessageQueueSend               | IPC             |     1 | 2625
 sqlsmith::dut    | active | BgWorkerShutdown               | IPC             |     1 | 3655
 sqlsmith::schema | idle   | ClientRead                     | Client          |     9 | 3634
(6 rows)

regression=> select application_name, state, wait_event, wait_event_type, 
count(1), min(pid)
 from pg_stat_activity group by 1,2,3,4;
 application_name | state  |           wait_event           | wait_event_type | count |  min  
------------------+--------+--------------------------------+-----------------+-------+-------
 psql             | active |                                |                 |     1 | 29645
 sqlsmith::dut    | active | DynamicSharedMemoryControlLock | LWLock          |     5 | 23167
 sqlsmith::schema | idle   | ClientRead                     | Client          |     5 |  1169
 sqlsmith::dut    | active | BgWorkerShutdown               | IPC             |     1 |  1178
(4 rows)

Both sport the following last error in their logfiles:

FATAL:  cannot unpin a segment that is not pinned

Below are the backtraces of the processes throwing them.

regards,
Andreas

Backtrace on dwagon:

#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1  0x006b7c02 in PGSemaphoreLock (sema=0x7f8aa19b20d8) at pg_sema.c:310
#2  0x00726b04 in LWLockAcquire (lock=0x7f8aa19b3400, 
mode=LW_EXCLUSIVE) at lwlock.c:1233
#3  0x00710b56 in dsm_detach (seg=0x12cf6a0) at dsm.c:760
#4  0x00710e6e in dsm_backend_shutdown () at dsm.c:643
#5  0x0071144f in shmem_exit (code=code@entry=1) at ipc.c:248
#6  0x0071150e in proc_exit_prepare (code=code@entry=1) at ipc.c:185
#7  0x00711588 in proc_exit (code=code@entry=1) at ipc.c:102
#8  0x008352ff in errfinish (dummy=dummy@entry=0) at elog.c:543
#9  0x00838a8c in elog_finish (elevel=, fmt=0x9e8e70 
"cannot unpin a segment that is not pinned") at elog.c:1378
#10 0x007111ff in dsm_unpin_segment (handle=545777640) at dsm.c:917
#11 0x00858205 in dsa_release_in_place (place=0x7f8aa524a178) at 
dsa.c:617
#12 0x00710ae0 in dsm_detach (seg=0x12cf6a0) at dsm.c:734
#13 0x00710e6e in dsm_backend_shutdown () at dsm.c:643
#14 0x0071144f in shmem_exit (code=code@entry=1) at ipc.c:248
#15 0x0071150e in proc_exit_prepare (code=code@entry=1) at ipc.c:185
#16 0x00711588 in proc_exit (code=code@entry=1) at ipc.c:102
#17 0x008352ff in errfinish (dummy=) at elog.c:543
#18 0x0073425c in ProcessInterrupts () at postgres.c:2837
#19 0x004baeb5 in heapgetpage (scan=scan@entry=0x1376138, 
page=page@entry=0) at heapam.c:373
#20 0x004bbaf0 in heapgettup_pagemode (key=0x1376550, nkeys=1, 
dir=ForwardScanDirection, scan=0x1376138) at heapam.c:830
#21 heap_getnext (scan=0x1376138, 
direction=direction@entry=ForwardScanDirection) at heapam.c:1804
#22 0x004cf2be in systable_getnext (sysscan=sysscan@entry=0x13760e0) at 
genam.c:435
#23 0x00823760 in ScanPgRelation (targetRelId=<optimized out>, 
indexOK=<optimized out>, force_non_historic=force_non_historic@entry=0 '\000') 
at relcache.c:357
#24 0x00823ad3 in RelationReloadIndexInfo 
(relation=relation@entry=0x7f8aa52321e8) at relcache.c:2235
#25 0x00827dd8 in RelationIdGetRelation 
(relationId=relationId@entry=2662) at relcache.c:2091
#26 0x004bb088 in relation_open (relationId=relationId@entry=2662, 
lockmode=lockmode@entry=1) at heapam.c:1128
#27 0x004cf8f6 in index_open (relationId=relationId@entry=2662, 
lockmode=lockmode@entry=1) at indexam.c:155
#28 0x004cf226 in systable_beginscan 
(heapRelation=heapRelation@entry=0x7f8aa10e64a0, indexId=2662, 
indexOK=<optimized out>, snapshot=snapshot@entry=0x0, nkeys=1, 
key=key@entry=0x7ffe7b2bd260) at genam.c:340
#29 0x0081ef9c in SearchCatCache (cache=0x132cd10, v1=v1@entry=16528, 
v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0) at catcache.c:1234
#30 0x0082d46e in SearchSysCache (cacheId=cacheId@entry=46, 
key1=key1@entry=16528, key2=key2@entry=0, key3=key3@entry=0, key4=key4@entry=0) 
at syscache.c:1108
#31 0x00530b8c in pg_class_aclmask (table_oid=table_oid@entry=16528, 

Re: [HACKERS] [PATCH] ALTER DEFAULT PRIVILEGES with GRANT/REVOKE ON SCHEMAS

2017-03-26 Thread Matheus de Oliveira
On Thu, Mar 2, 2017 at 10:27 AM, David Steele  wrote:

> It looks like this patch is still waiting on an update for tab
> completion in psql.


Hi All,

Sorry about the long delay... It was so simple to add it to tab-complete.c
that it is a shame I didn't do it before; very sorry about that.

Attached is the new version of the patch. It is basically the same as
before, with the addition of tab completion for psql, and rebased on
master.

Hope it is enough. Thank you all.

-- 
Matheus de Oliveira
diff --git a/doc/src/sgml/ref/alter_default_privileges.sgml b/doc/src/sgml/ref/alter_default_privileges.sgml
index 04064d3..e3363f8 100644
*** a/doc/src/sgml/ref/alter_default_privileges.sgml
--- b/doc/src/sgml/ref/alter_default_privileges.sgml
***
*** 46,51  GRANT { USAGE | ALL [ PRIVILEGES ] }
--- 46,55 
  ON TYPES
  TO { [ GROUP ] role_name | PUBLIC } [, ...] [ WITH GRANT OPTION ]
  
+ GRANT { USAGE | CREATE | ALL [ PRIVILEGES ] }
+ ON SCHEMAS
+ TO { [ GROUP ] role_name | PUBLIC } [, ...] [ WITH GRANT OPTION ]
+ 
  REVOKE [ GRANT OPTION FOR ]
  { { SELECT | INSERT | UPDATE | DELETE | TRUNCATE | REFERENCES | TRIGGER }
  [, ...] | ALL [ PRIVILEGES ] }
***
*** 71,76  REVOKE [ GRANT OPTION FOR ]
--- 75,86 
  ON TYPES
  FROM { [ GROUP ] role_name | PUBLIC } [, ...]
  [ CASCADE | RESTRICT ]
+ 
+ REVOKE [ GRANT OPTION FOR ]
+ { USAGE | CREATE | ALL [ PRIVILEGES ] }
+ ON SCHEMAS
+ FROM { [ GROUP ] role_name | PUBLIC } [, ...]
+ [ CASCADE | RESTRICT ]
  
   
  
***
*** 81,88  REVOKE [ GRANT OPTION FOR ]
 ALTER DEFAULT PRIVILEGES allows you to set the privileges
 that will be applied to objects created in the future.  (It does not
 affect privileges assigned to already-existing objects.)  Currently,
!only the privileges for tables (including views and foreign tables),
!sequences, functions, and types (including domains) can be altered.

  

--- 91,99 
 ALTER DEFAULT PRIVILEGES allows you to set the privileges
 that will be applied to objects created in the future.  (It does not
 affect privileges assigned to already-existing objects.)  Currently,
!only the privileges for schemas, tables (including views and foreign
!tables), sequences, functions, and types (including domains) can be
!altered.

  

***
*** 125,130  REVOKE [ GRANT OPTION FOR ]
--- 136,143 
are altered for objects later created in that schema.
If IN SCHEMA is omitted, the global default privileges
are altered.
+   IN SCHEMA is not allowed when using ON SCHEMAS
+   as schemas can't be nested.
   
  
 
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index d01930f..2d535c2 100644
*** a/src/backend/catalog/aclchk.c
--- b/src/backend/catalog/aclchk.c
***
*** 959,964  ExecAlterDefaultPrivilegesStmt(ParseState *pstate, AlterDefaultPrivilegesStmt *s
--- 959,968 
  			all_privileges = ACL_ALL_RIGHTS_TYPE;
  			errormsg = gettext_noop("invalid privilege type %s for type");
  			break;
+ 		case ACL_OBJECT_NAMESPACE:
+ 			all_privileges = ACL_ALL_RIGHTS_NAMESPACE;
+ 			errormsg = gettext_noop("invalid privilege type %s for schema");
+ 			break;
  		default:
  			elog(ERROR, "unrecognized GrantStmt.objtype: %d",
   (int) action->objtype);
***
*** 1146,1151  SetDefaultACL(InternalDefaultACL *iacls)
--- 1150,1165 
  this_privileges = ACL_ALL_RIGHTS_TYPE;
  			break;
  
+ 		case ACL_OBJECT_NAMESPACE:
+ 			if (OidIsValid(iacls->nspid))
+ ereport(ERROR,
+ 		(errcode(ERRCODE_INVALID_GRANT_OPERATION),
+ 		 errmsg("cannot use IN SCHEMA clause when using GRANT/REVOKE ON SCHEMAS")));
+ 			objtype = DEFACLOBJ_NAMESPACE;
+ 			if (iacls->all_privs && this_privileges == ACL_NO_RIGHTS)
+ this_privileges = ACL_ALL_RIGHTS_NAMESPACE;
+ 			break;
+ 
  		default:
  			elog(ERROR, "unrecognized objtype: %d",
   (int) iacls->objtype);
***
*** 1369,1374  RemoveRoleFromObjectACL(Oid roleid, Oid classid, Oid objid)
--- 1383,1391 
  			case DEFACLOBJ_TYPE:
  iacls.objtype = ACL_OBJECT_TYPE;
  break;
+ 			case DEFACLOBJ_NAMESPACE:
+ iacls.objtype = ACL_OBJECT_NAMESPACE;
+ break;
  			default:
  /* Shouldn't get here */
  elog(ERROR, "unexpected default ACL type: %d",
***
*** 5259,5264  get_user_default_acl(GrantObjectType objtype, Oid ownerId, Oid nsp_oid)
--- 5276,5285 
  			defaclobjtype = DEFACLOBJ_TYPE;
  			break;
  
+ 		case ACL_OBJECT_NAMESPACE:
+ 			defaclobjtype = DEFACLOBJ_NAMESPACE;
+ 			break;
+ 
  		default:
  			return NULL;
  	}
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 2948d64..1eb7930 100644
*** a/src/backend/catalog/objectaddress.c
--- b/src/backend/catalog/objectaddress.c
***
*** 1843,1853  get_object_address_defacl(List *object, bool missing_ok)
  		case DEFACLOBJ_TYPE:
  			

Re: [HACKERS] free space map and visibility map

2017-03-26 Thread Kyotaro HORIGUCHI
At Sat, 25 Mar 2017 19:53:47 -0700, Jeff Janes  wrote in 

Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-26 Thread Amit Kapila
On Sun, Mar 26, 2017 at 11:26 AM, Mithun Cy  wrote:
> Thanks, Amit for the review.
> On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila  wrote:
>>
>> I think one-dimensional patch has fewer places to touch, so that looks
>> better to me.  However, I think there is still hard coding and
>> assumptions in code which we should try to improve.
>
> Great!, I will continue with spares 1-dimensional improvement.
>

@@ -563,18 +563,20 @@ _hash_init_metabuffer(Buffer buf, double num_tuples, RegProcedure procid,
{
..
  else
- num_buckets = ((uint32) 1) << _hash_log2((uint32) dnumbuckets);
+ num_buckets = _hash_get_totalbuckets(_hash_spareindex(dnumbuckets));
..
..
- metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1;
- metap->hashm_highmask = (num_buckets << 1) - 1;
+ metap->hashm_maxbucket = num_buckets - 1;
+
+ /* set highmask, which should be sufficient to cover num_buckets. */
+ metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1;
+ metap->hashm_lowmask = (metap->hashm_highmask >> 1);
}

I think we can't change the number of buckets to be created or lowmask
and highmask calculation here without modifying _h_spoolinit() because
it sorts the data to be inserted based on hashkey which in turn
depends on the number of buckets that we are going to create during
create index operation.  We either need to allow create index
operation to still always create buckets in power-of-two fashion or we
need to update _h_spoolinit according to new computation.  One minor
drawback of using power-of-two scheme for creation of buckets during
create index is that it can lead to wastage of space and will be
inconsistent with what the patch does during split operation.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-26 Thread Rushabh Lathia
On Mon, Mar 27, 2017 at 3:43 AM, Tomas Vondra 
wrote:

> On 03/25/2017 05:18 PM, Rushabh Lathia wrote:
>
>>
>>
>> On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut wrote:
>>
>> On 3/25/17 09:01, David Rowley wrote:
>> > On 25 March 2017 at 23:09, Rushabh Lathia wrote:
>> >> Also another point which I think we should fix is, when someone set
>> >> max_parallel_workers = 0, we should also set the
>> >> max_parallel_workers_per_gather
>> >> to zero. So that way it we can avoid generating the gather path
>> with
>> >> max_parallel_worker = 0.
>> > I see that it was actually quite useful that it works the way it
>> does.
>> > If it had worked the same as max_parallel_workers_per_gather, then
>> > likely Tomas would never have found this bug.
>>
>> Another problem is that the GUC system doesn't really support cases
>> where the validity of one setting depends on the current value of
>> another setting.  So each individual setting needs to be robust
>> against
>> cases of related settings being nonsensical.
>>
>>
>> Okay.
>>
>> About the original issue reported by Tomas, I did more debugging and
>> found that - problem was gather_merge_clear_slots() was not returning
>> the clear slot when nreader is zero (means nworkers_launched = 0).
>> Due to the same scan was continue even all the tuple are exhausted,
>> and then end up with server crash at gather_merge_getnext(). In the patch
>> I also added the Assert into gather_merge_getnext(), about the index
>> should be less then the nreaders + 1 (leader).
>>
>> PFA simple patch to fix the problem.
>>
>>
> I think there are two issues at play, here - the first one is that we
> still produce parallel plans even with max_parallel_workers=0, and the
> second one is the crash in GatherMerge when nworkers=0.
>
> Your patch fixes the latter (thanks for looking into it), which is
> obviously a good thing - getting 0 workers on a busy system is quite
> possible, because all the parallel workers can be already chewing on some
> other query.
>
>
Thanks.


> But it seems a bit futile to produce the parallel plan in the first place,
> because with max_parallel_workers=0 we can't possibly get any parallel
> workers ever. I wonder why compute_parallel_worker() only looks at
> max_parallel_workers_per_gather, i.e. why shouldn't it do:
>
>parallel_workers = Min(parallel_workers, max_parallel_workers);
>
>
I agree with you here. Producing the parallel plan when
max_parallel_workers = 0 is wrong. But rather than your suggested fix, I
think that we should do something like:

/*
 * In no case use more than max_parallel_workers_per_gather or
 * max_parallel_workers.
 */
parallel_workers = Min(parallel_workers, Min(max_parallel_workers,
                                             max_parallel_workers_per_gather));



> Perhaps this was discussed and is actually intentional, though.
>
>
Yes, I am not quite sure about this.

> Regarding handling this at the GUC level - I agree with Peter that that's
> not a good idea. I suppose we could deal with checking the values in the
> GUC check/assign hooks, but what we don't have is a way to undo the changes
> in all the GUCs. That is, if I do
>
>SET max_parallel_workers = 0;
>SET max_parallel_workers = 16;
>
> I expect to end up with just max_parallel_workers GUC changed and nothing
> else.
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>



-- 
Rushabh Lathia


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-26 Thread Andres Freund
On 2017-03-25 20:59:27 -0700, Andres Freund wrote:
> On 2017-03-25 23:51:45 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > On March 25, 2017 4:56:11 PM PDT, Ants Aasma  wrote:
> > >> I haven't had the time to research this properly, but initial tests
> > >> show that with GCC 6.2 adding
> > >> 
> > >> #pragma GCC optimize ("no-crossjumping")
> > >> 
> > >> fixes merging of the op tail jumps.
> > >> 
> > >> Some quick and dirty benchmarking suggests that the benefit for the
> > >> interpreter is about 15% (5% speedup on a workload that spends 1/3 in
> > >> ExecInterpExpr). My idea of prefetching op->resnull/resvalue to local
> > >> vars before the indirect jump is somewhere between a tiny benefit and
> > >> no effect, certainly not worth introducing extra complexity. Clang 3.8
> > >> does the correct thing out of the box and is a couple of percent
> > >> faster than GCC with the pragma.
> > 
> > > That's large enough to be worth doing (although I recall you seeing all 
> > > jumps commonalized).  We should probably do this on a per function basis 
> > > however (either using pragma push option, or function attributes).
> > 
> > Seems like it would be fine to do it on a per-file basis.
> 
> I personally find per-function annotation ala
> __attribute__((optimize("no-crossjumping")))
> cleaner anyway.  I tested that, and it seems to work.
> 
> Obviously we'd have to hide that behind a configure test.  Could also do
> tests based on __GNUC__ / __GNUC_MINOR__, but that seems uglier.

Checking for this isn't entirely pretty - see my attached attempt at
doing so.  I considered hiding
__attribute__((optimize("no-crossjumping"))) in execInterpExpr.c behind
a macro (like PG_DISABLE_CROSSJUMPING), but I don't really think that
makes things better.

Comments?

Greetings,

Andres Freund
>From ce42086871ebfcbb7ae52266e9c4549b5958e80b Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sun, 26 Mar 2017 22:38:21 -0700
Subject: [PATCH] Disable gcc's crossjumping optimization for ExecInterpExpr().

GCC merges code in ExecInterpExpr() too aggressively, reducing branch
prediction accuracy. As this is performance critical code, go through
the trouble of disabling the relevant optimization for the function.

To do so, we have to detect whether the compiler supports
__attribute__((optimize("no-crossjumping"))) as a function
annotation. Do this check in configure.

Discussion: https://postgr.es/m/20170326035927.5mubkfdtaqlrg...@alap3.anarazel.de
---
 config/c-compiler.m4  | 35 ++
 configure | 41 +++
 configure.in  |  1 +
 src/backend/executor/execExprInterp.c |  9 
 src/include/pg_config.h.in|  4 
 src/include/pg_config.h.win32 |  4 
 6 files changed, 94 insertions(+)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 3321f226f3..ac72539dec 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -297,6 +297,41 @@ fi])# PGAC_C_COMPUTED_GOTO
 
 
 
+# PGAC_C_FUNCATTR_NO_CROSSJUMPING
+# ---
+# Check if the C compiler understands the
+# __attribute__((optimize("no-crossjumping"))) function attribute.
+# Define HAVE_FUNCATTR_NO_CROSSJUMPING if so.
+#
+# At some later point it might make sense to generalize this so we can
+# check for other optimization flags, but so far there's no need for
+# that.
+#
+# Have to enable Werror, as some compilers (e.g. clang) would
+# otherwise just warn about an unknown type of attribute.
+AC_DEFUN([PGAC_C_FUNCATTR_NO_CROSSJUMPING],
+[AC_CACHE_CHECK([for function attribute disabling crossjumping optimizations], pgac_cv_funcattr_no_crossjumping,
+[ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_werror_flag=yes
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([
+extern int testme(void);
+int
+__attribute__((optimize("no-crossjumping")))
+testme(void)
+{
+return 0;
+}
+])],
+[pgac_cv_funcattr_no_crossjumping=yes],
+[pgac_cv_funcattr_no_crossjumping=no])
+ac_c_werror_flag=$ac_save_c_werror_flag])
+if test x"$pgac_cv_funcattr_no_crossjumping" = xyes ; then
+AC_DEFINE(HAVE_FUNCATTR_NO_CROSSJUMPING, 1,
+  [Define to 1 if your compiler understands __attribute__((optimize("no-crossjumping"))).])
+fi])# PGAC_C_FUNCATTR_NO_CROSSJUMPING
+
+
+
 # PGAC_C_VA_ARGS
 # --
 # Check if the C compiler understands C99-style variadic macros,
diff --git a/configure b/configure
index 4b8229e959..87c4799be3 100755
--- a/configure
+++ b/configure
@@ -11835,6 +11835,47 @@ if test x"$pgac_cv_computed_goto" = xyes ; then
 $as_echo "#define HAVE_COMPUTED_GOTO 1" >>confdefs.h
 
 fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for function attribute disabling crossjumping optimizations" >&5
+$as_echo_n "checking for function attribute disabling crossjumping optimizations... " >&6; }
+if ${pgac_cv_funcattr_no_crossjumping+:} false; then :
+  $as_echo_n "(cached) " 

Re: [HACKERS] [sqlsmith] Unpinning error in parallel worker

2017-03-26 Thread Thomas Munro
On Mon, Mar 27, 2017 at 8:38 AM, Thomas Munro
 wrote:
> On Mon, Mar 27, 2017 at 4:18 AM, Andreas Seltenreich  
> wrote:
>> Hi,
>>
>> today's testing with master as of d253b0f6e3 yielded two clusters that
>> stopped processing queries.  Symptoms:
>>
>> [...]
>
> Thanks Andreas.  Investigating.

First, the hanging stems from reentering dsm_backend_shutdown and
trying to acquire DynamicSharedMemoryControlLock, which we already
acquired further down the stack in dsm_unpin_segment when it raised an
error.  That's obviously not great, but the real question is how we
reached this "cannot happen" error condition.

I reproduced this by inserting a sleep before dsa_attach_in_place,
inserting a call to dsa_allocate into ExecInitParallelPlan so that the
executor's DSA area owns at least one segment, and then cancelling a
parallel query before the sleepy worker has managed to attach.  The
DSA area is destroyed before the worker attaches, but the worker
doesn't know this, and goes on to destroy it again after it learns
that the query has been cancelled.

In an earlier version of DSA, attaching should have failed in this
scenario because the handle would be invalid.  Based on complaints
about creating an extra DSM segment all the time even if we don't turn
out to need it, I implemented "in place" DSA areas where the control
object is in user-supplied shared memory, in this case in the parallel
query main DSM segment.  But that created a new hazard: if you try to
attach to a piece of memory that contains the remains of an
already-destroyed DSA area, then we don't do anything to detect that.
Oops.

The attached patch fixes that one way: it detects refcnt == 0 as a
defunct DSA area and raises an error when you try to attach.

Another approach which I could explore would be to "reset" the DSA
area instead of destroying it when the last backend detaches.  I'm not
sure if that would ever be useful as a feature in its own right, but
it would at least allow a very late worker to attach and then detach
in an orderly fashion in this query-cancelled case, so you wouldn't
get a different error message in the worker in this rare case.

Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com


detect-late-dsa-attach-in-place.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Improve access to parallel query from procedural languages.

2017-03-26 Thread Rafia Sabih
On Sun, Mar 26, 2017 at 3:34 AM, Tom Lane  wrote:
> I wrote:
>> It doesn't seem to be a platform-specific problem: I can duplicate the
>> failure here by applying same settings mandrill uses, ie build with
>> -DRANDOMIZE_ALLOCATED_MEMORY and set force_parallel_mode = regress.
>
> Oh ... scratch that: you don't even need -DRANDOMIZE_ALLOCATED_MEMORY.
> For some reason I just assumed that any parallelism-related patch
> would have been tested with force_parallel_mode = regress.  This one
> evidently was not.
>
> regards, tom lane
>

This is caused by trigger-related functions being marked parallel-safe
while relying on global variables; when executed in parallel they
therefore give incorrect output.  The attached patch fixes this.  I
have modified only those functions that show incorrect behaviour in
the regress output, and with force_parallel_mode = regress all test
cases now pass.

This makes me wonder: should we re-check all the system-defined
functions to confirm they are actually parallel-safe?

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


system_defined_fn_update.patch
Description: Binary data
