Re: [HACKERS] psql: add \pset true/false

2015-12-04 Thread Pavel Stehule
2015-12-04 9:37 GMT+01:00 Kyotaro HORIGUCHI :

> Hello, I think this is my last proposal of an idea for a
> psql-side generic solution. Sorry for bothering.
>
> > My environment is CentOS7, but I completely forgot the Windows
> > environment (or other platforms). I agree with you. It would
> > indeed be too much to consider all possible platforms.
>
> OK, a DLL is not practical: it would not only be rather complex
> to support across platforms, but it would also be used only by this
> feature, with no other user of a DLL in sight. That's convincing
> enough for me.
>
> Then, how about calling a subprocess for this purpose? popen() looks
> to be platform-independent, so it can be used to call arbitrary
> external programs or scripts.
>
>
> There are several obvious problems with this.
>
>  - Tremendously slow, since it executes the program for every value.
>
>  - Dangerous in some respects. Setting it by environment variable
>    is too dangerous, and I'm not confident whether setting it by
>    \pset is acceptably safer.
>
> Is it still too complex? Or too slow?
>


For a long time I have dreamed about integrating Lua into psql.

It is fast enough for this purpose and could be used for custom macros, etc.

Regards

Pavel


>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
> diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
> index 438a4ec..5dad70b 100644
> --- a/src/bin/psql/command.c
> +++ b/src/bin/psql/command.c
> @@ -1148,7 +1148,8 @@ exec_command(const char *cmd,
>
> int i;
> static const char *const my_list[] = {
> -   "border", "columns", "expanded", "fieldsep", "fieldsep_zero",
> +   "border", "columns", "column_filter",
> +   "expanded", "fieldsep", "fieldsep_zero",
> "footer", "format", "linestyle", "null",
> "numericlocale", "pager", "pager_min_lines",
> "recordsep", "recordsep_zero",
> @@ -2695,6 +2696,17 @@ do_pset(const char *param, const char *value,
> printQueryOpt *popt, bool quiet)
> if (value)
> popt->topt.columns = atoi(value);
> }
> +   /* set column filter */
> +   else if (strcmp(param, "columnfilter") == 0)
> +   {
> +   if (vallen > 0)
> +   popt->topt.columnfilter = pg_strdup(value);
> +   else
> +   {
> +   if (popt->topt.columnfilter) pg_free(popt->topt.columnfilter);
> +   popt->topt.columnfilter = NULL;
> +   }
> +   }
> else
> {
> psql_error("\\pset: unknown option: %s\n", param);
> @@ -2726,6 +2738,15 @@ printPsetInfo(const char *param, struct
> printQueryOpt *popt)
> printf(_("Target width is %d.\n"),
> popt->topt.columns);
> }
>
> +   /* show the target width for the wrapped format */
> +   else if (strcmp(param, "columnfilter") == 0)
> +   {
> +   if (!popt->topt.columnfilter)
> +   printf(_("Column filter is unset.\n"));
> +   else
> +   printf(_("Column filter is \"%s\".\n"), popt->topt.columnfilter);
> +   }
> +
> /* show expanded/vertical mode */
> else if (strcmp(param, "x") == 0 || strcmp(param, "expanded") == 0
> || strcmp(param, "vertical") == 0)
> {
> @@ -2938,6 +2959,8 @@ pset_value_string(const char *param, struct
> printQueryOpt *popt)
> return psprintf("%d", popt->topt.border);
> else if (strcmp(param, "columns") == 0)
> return psprintf("%d", popt->topt.columns);
> +   else if (strcmp(param, "columnfilter") == 0)
> +   return pstrdup(popt->topt.columnfilter);
> else if (strcmp(param, "expanded") == 0)
> return pstrdup(popt->topt.expanded == 2
>? "auto"
> diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c
> index ad4350e..9731e54 100644
> --- a/src/bin/psql/print.c
> +++ b/src/bin/psql/print.c
> @@ -195,7 +195,7 @@ static void IsPagerNeeded(const printTableContent
> *cont, const int extra_lines,
>   FILE **fout, bool *is_pager);
>
>  static void print_aligned_vertical(const printTableContent *cont, FILE
> *fout);
> -
> +static char *valfilter(char *filtname, char *origcell, int colnum, int type, bool *mustfree);
>
>  /* Count number of digits in integral part of number */
>  static int
> @@ -3145,6 +3145,86 @@ printTable(const printTableContent *cont, FILE
> *fout, FILE *flog)
> ClosePager(fout);
>  }
>
> +static char *
> +valfilter(char *filtname, char *origcell, int colnum, int type, bool
> *mustfree)
> +{
> +   char *cmdline;
> +   int buflen = strlen(filtname) + 1 +
> +  

Re: [HACKERS] Error with index on unlogged table

2015-12-04 Thread Andres Freund
On 2015-12-04 17:00:13 +0900, Michael Paquier wrote:
> Andres Freund wrote:
> >> @@ -450,14 +450,21 @@ ginbuildempty(PG_FUNCTION_ARGS)
> >>   START_CRIT_SECTION();
> >>   GinInitMetabuffer(MetaBuffer);
> >>   MarkBufferDirty(MetaBuffer);
> >> - log_newpage_buffer(MetaBuffer, false);
> >> + log_newpage_buffer(MetaBuffer, false, false);
> >>   GinInitBuffer(RootBuffer, GIN_LEAF);
> >>   MarkBufferDirty(RootBuffer);
> >> - log_newpage_buffer(RootBuffer, false);
> >> + log_newpage_buffer(RootBuffer, false, true);
> >>   END_CRIT_SECTION();
> >>
> > Why isn't the metapage flushed to disk?
> 
> I was not sure if we should flush only at the last page, as the pages
> just before would already have been replayed. Switched.

Uh, that's not how it works! The earlier pages would just be in shared
buffers. Neither smgrwrite, nor smgrimmedsync does anything about that!

> >>  extern void InitXLogInsert(void);
> >> diff --git a/src/include/catalog/pg_control.h 
> >> b/src/include/catalog/pg_control.h
> >> index ad1eb4b..91445f1 100644
> >> --- a/src/include/catalog/pg_control.h
> >> +++ b/src/include/catalog/pg_control.h
> >> @@ -73,6 +73,7 @@ typedef struct CheckPoint
> >>  #define XLOG_END_OF_RECOVERY 0x90
> >>  #define XLOG_FPI_FOR_HINT0xA0
> >>  #define XLOG_FPI 0xB0
> >> +#define XLOG_FPI_FOR_SYNC0xC0
> >
> >
> > I'm not a big fan of the XLOG_FPI_FOR_SYNC name. Syncing is a bit too
> > ambiguous for my taste. How about either naming it XLOG_FPI_FLUSH or
> > instead adding actual record data and a 'flags' field in there? I
> > slightly prefer the latter - XLOG_FPI and XLOG_FPI_FOR_HINT really are
> > different, XLOG_FPI_FOR_SYNC not so much.
> 
> Let's go for XLOG_FPI_FLUSH.

I think the other way is a bit better, because we can add new flags
without changing the WAL format.
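Something like the following is the kind of thing I mean (just a sketch, the
names are invented here):

/* record data for XLOG_FPI, so new behaviour can be added as flag bits */
#define FPI_FLAG_FLUSH		0x01	/* flush the page to disk after replay */

typedef struct xl_fpi
{
	uint8		flags;
} xl_fpi;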

> diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
> index 99337b0..b646101 100644
> --- a/src/backend/access/brin/brin.c
> +++ b/src/backend/access/brin/brin.c
> @@ -682,7 +682,15 @@ brinbuildempty(PG_FUNCTION_ARGS)
>   brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
>  BRIN_CURRENT_VERSION);
>   MarkBufferDirty(metabuf);
> - log_newpage_buffer(metabuf, false);
> +
> + /*
> +  * For unlogged relations, this page should be immediately flushed
> +  * to disk after being replayed. This is necessary to ensure that the
> +  * initial on-disk state of unlogged relations is preserved when
> +  * they get reset at the end of recovery.
> +  */
> + log_newpage_buffer(metabuf, false,
> + index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED);
>   END_CRIT_SECTION();

Maybe write the last sentence as '... as the on disk files are copied
directly at the end of recovery.'?

> @@ -336,7 +336,8 @@ end_heap_rewrite(RewriteState state)
>   MAIN_FORKNUM,
>   state->rs_blockno,
>   state->rs_buffer,
> - true);
> + true,
> + false);
>   RelationOpenSmgr(state->rs_new_rel);
>  
>   PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
> @@ -685,7 +686,8 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
>   MAIN_FORKNUM,
>   state->rs_blockno,
>   page,
> - true);
> + true,
> + false);

Did you verify that that's ok when an unlogged table is clustered/vacuum
full'ed?

> @@ -181,6 +183,9 @@ xlog_identify(uint8 info)
>   case XLOG_FPI_FOR_HINT:
>   id = "FPI_FOR_HINT";
>   break;
> + case XLOG_FPI_FLUSH:
> + id = "FPI_FOR_SYNC";
> + break;
>   }

Old string.


> --- a/src/backend/access/transam/xloginsert.c
> +++ b/src/backend/access/transam/xloginsert.c
> @@ -932,10 +932,13 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
>   * If the page follows the standard page layout, with a PageHeader and unused
>   * space between pd_lower and pd_upper, set 'page_std' to TRUE. That allows
>   * the unused space to be left out from the WAL record, making it smaller.
> + *
> + * If 'is_flush' is set to TRUE, relation will be requested to flush
> + * immediately its data at replay after replaying this full page image.
>   */

s/is_flush/flush_immed/? And maybe say that it 

Re: [HACKERS] Error with index on unlogged table

2015-12-04 Thread Michael Paquier
Andres Freund wrote:
> On 2015-11-20 16:11:15 +0900, Michael Paquier wrote:
>> diff --git a/src/backend/access/transam/xlog.c 
>> b/src/backend/access/transam/xlog.c
>> index cc845d2..4883697 100644
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> @@ -9503,6 +9503,14 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
>>   data += sizeof(BkpBlock);
>>
>>   RestoreBackupBlockContents(lsn, bkpb, data, false, false);
>> +
>> + if (bkpb.fork == INIT_FORKNUM)
>> + {
>> + SMgrRelation srel;
>> + srel = smgropen(bkpb.node, InvalidBackendId);
>> + smgrimmedsync(srel, INIT_FORKNUM);
>> + smgrclose(srel);
>> + }
>>   }
>>   else if (info == XLOG_BACKUP_END)
>>   {
>
> A smgrwrite() instead of a smgrimmedsync() should be sufficient here.

Check.

On Fri, Dec 4, 2015 at 6:35 AM, Andres Freund  wrote:
> On 2015-11-27 16:59:20 +0900, Michael Paquier wrote:
>> Attached is a patch that fixes the issue for me in master and 9.5.
>> Actually in the last patch I forgot a call to smgrwrite to ensure that
>> the INIT_FORKNUM is correctly synced to disk when those pages are
>> replayed at recovery, letting the reset routines for unlogged
>> relations do their job correctly. I have noticed as well that we need
>> to do the same for gin and brin relations. In this case I think that
>> we could limit the flush to unlogged relations, my patch does it
>> unconditionally though to generalize the logic. Thoughts?
>
> I think it's a good idea to limit this to unlogged relations. For a
> dataset that fits into shared_buffers and creates many relations, the
> additional disk writes could be noticeable.

OK, then I have switched the patch this way to limit the flush to
unlogged relations where needed.

>> diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
>> index 99337b0..d7964ac 100644
>> --- a/src/backend/access/brin/brin.c
>> +++ b/src/backend/access/brin/brin.c
>> @@ -682,7 +682,12 @@ brinbuildempty(PG_FUNCTION_ARGS)
>>   brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
>>  BRIN_CURRENT_VERSION);
>>   MarkBufferDirty(metabuf);
>> - log_newpage_buffer(metabuf, false);
>> + /*
>> +  * When this full page image is replayed, there is no guarantee that
>> +  * this page will be present to disk when replayed particularly for
>> +  * unlogged relations, hence enforce it to be flushed to disk.
>> +  */
>
> The grammar seems a bit off here.

Check.

>> + /*
>> +  * Initialize and xlog metabuffer and root buffer. When those full
>> +  * pages are replayed, it is not guaranteed that those relation
>> +  * init forks will be flushed to disk after replaying them, hence
>> +  * enforce those pages to be flushed to disk at replay, only the
>> +  * last record will enforce a flush for performance reasons and
>> +  * because it is actually unnecessary to do it multiple times.
>> +  */
>
> That comment needs some love too. I think it really only makes sense if
> you know the current state. There's also some language polishing needed.

Check.

>> @@ -450,14 +450,21 @@ ginbuildempty(PG_FUNCTION_ARGS)
>>   START_CRIT_SECTION();
>>   GinInitMetabuffer(MetaBuffer);
>>   MarkBufferDirty(MetaBuffer);
>> - log_newpage_buffer(MetaBuffer, false);
>> + log_newpage_buffer(MetaBuffer, false, false);
>>   GinInitBuffer(RootBuffer, GIN_LEAF);
>>   MarkBufferDirty(RootBuffer);
>> - log_newpage_buffer(RootBuffer, false);
>> + log_newpage_buffer(RootBuffer, false, true);
>>   END_CRIT_SECTION();
>>
> Why isn't the metapage flushed to disk?

I was not sure if we should flush only at the last page, as the pages
just before would already have been replayed. Switched.

>> --- a/src/backend/access/spgist/spginsert.c
>> +++ b/src/backend/access/spgist/spginsert.c
>> @@ -173,7 +173,7 @@ spgbuildempty(PG_FUNCTION_ARGS)
>> (char *) page, true);
>>   if (XLogIsNeeded())
>> - log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
>> - SPGIST_METAPAGE_BLKNO, page, false);
>> + SPGIST_METAPAGE_BLKNO, page, false, 
>> false);
>>
>>   /* Likewise for the root page. */
>>   SpGistInitPage(page, SPGIST_LEAF);
>> @@ -183,7 +183,7 @@ spgbuildempty(PG_FUNCTION_ARGS)
>> (char *) page, true);
>>   if (XLogIsNeeded())
>> - log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
>> - SPGIST_ROOT_BLKNO, page, true);
>> + SPGIST_ROOT_BLKNO, page, true, false);
>>
>
> I don't think it's correct to not flush in these cases. Yes, the primary
> does an smgrimmedsync() - but that's not 

Re: [HACKERS] parallel joins, and better parallel explain

2015-12-04 Thread Amit Kapila
On Thu, Dec 3, 2015 at 3:25 AM, Robert Haas  wrote:
>
> On Tue, Dec 1, 2015 at 7:21 AM, Amit Kapila 
wrote:
>
> > - There seems to be some inconsistency in Explain's output when
> > multiple workers are used.
>
>
> So the net result of this is that the times and row counts are
> *averages* across all of the loop iterations.  In the case of the
> inner side of a nested loop, this means you can read the data just as
> you would in a non-parallel plan.  For nodes executed exactly once per
> worker plus once in the master, the value displayed ends up being a
> per-process average of the amount of time spent, and a per-process
> average of the number of rows.  On the other hand, values for buffers
> are NOT divided by the loop count, so those values are absolute
> totals.  Once you understand this, I think the data is pretty easy to
> read.
>

Okay, I am able to understand the results now.
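(For example, in the plan below the Parallel Seq Scan line shows actual
rows=2000 with loops=5, i.e. roughly 2000 * 5 = 10000 rows in total across
the master and the four workers; the per-worker lines account for
2170 + 1890 + 1946 + 2145 = 8151 of those, the master having scanned the
rest.)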

Currently, if the node doesn't get executed for any reason, we use the
code below to display the required information:
else if (es->analyze)
{
	if (es->format == EXPLAIN_FORMAT_TEXT)
		appendStringInfoString(es->str, " (never executed)");
	else
	{
		if (es->timing)
		{
			ExplainPropertyFloat("Actual Startup Time", 0.0, 3, es);
			ExplainPropertyFloat("Actual Total Time", 0.0, 3, es);
		}
		ExplainPropertyFloat("Actual Rows", 0.0, 0, es);
		ExplainPropertyFloat("Actual Loops", 0.0, 0, es);
	}
}

Do you think it would be useful to display it in a similar way if a worker
is not able to execute the plan (for example because the other workers have
already finished the work before it starts execution)?

Other than that parallel-explain-v2.patch looks good.

> >->  Gather  (cost=1000.00..46203.83 rows=9579 width=0) (actual
> > time=33.983..33592.030 rows= loops=1)
> >  Output: c1, c2
> >  Number of Workers: 4
> >  Buffers: shared hit=548 read=142506
> >  ->  Parallel Seq Scan on public.tbl_parallel_test
> > (cost=0.00..44245.93
> >  rows=2129 width=0) (actual time=13.447..33354.099 rows=2000 loops=5)
> >Output: c1, c2
> >Filter: (tbl_parallel_test.c1 < 1)
> >Rows Removed by Filter: 198000
> >Buffers: shared hit=352 read=142506
> >Worker 0: actual time=18.422..33322.132 rows=2170 loops=1
> >  Buffers: shared hit=4 read=30765
> >Worker 1: actual time=0.803..33283.979 rows=1890 loops=1
> >  Buffers: shared hit=1 read=26679
> >Worker 2: actual time=0.711..33360.007 rows=1946 loops=1
> >  Buffers: shared hit=197 read=30899
> >Worker 3: actual time=15.057..33252.605 rows=2145 loops=1
> >  Buffers: shared hit=145 read=25433
> >  Planning time: 0.217 ms
> >  Execution time: 33612.964 ms
> > (22 rows)
> >
> > I am not able to understand how buffer usage adds up to what is
> > shown at the Gather node.
>
> It doesn't, of course.  But I'm not sure it should,
>

I think the problem is that at the Gather node the number of buffers
(read + hit) is greater than the number of pages in the relation.  The
reason is that in the workers (ParallelQueryMain()) the buffer usage
accumulation starts before ExecutorStart(), whereas the master backend
always accumulates it only for the ExecutorRun() phase; after changing the
workers to accumulate only for the ExecutorRun() phase as well, the above
problem is fixed.  The attached patch does that.
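
Roughly, the idea is the following (only a sketch, not the patch itself;
only two of the BufferUsage counters are shown, and the destination is
whatever per-worker slot the instrumentation uses):

#include "postgres.h"
#include "access/sdir.h"
#include "executor/executor.h"
#include "executor/instrument.h"

/*
 * Sketch: run the executor and accumulate only the buffer usage incurred
 * during ExecutorRun() into the caller-supplied counters.
 */
static void
run_and_accumulate_bufusage(QueryDesc *queryDesc, BufferUsage *dst)
{
	BufferUsage start = pgBufferUsage;

	ExecutorRun(queryDesc, ForwardScanDirection, 0L);

	dst->shared_blks_hit += pgBufferUsage.shared_blks_hit - start.shared_blks_hit;
	dst->shared_blks_read += pgBufferUsage.shared_blks_read - start.shared_blks_read;
	/* ... and likewise for the remaining BufferUsage fields ... */
}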


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_parallel_bufusage_v1.patch
Description: Binary data



Re: [HACKERS] psql: add \pset true/false

2015-12-04 Thread Kyotaro HORIGUCHI
Hello, I think this is my last proposal of an idea for a
psql-side generic solution. Sorry for bothering.

> My environment is CentOS7, but I completely forgot the Windows
> environment (or other platforms). I agree with you. It would
> indeed be too much to consider all possible platforms.

OK, a DLL is not practical: it would not only be rather complex
to support across platforms, but it would also be used only by this
feature, with no other user of a DLL in sight. That's convincing
enough for me.

Then, how about calling a subprocess for this purpose? popen() looks
to be platform-independent, so it can be used to call arbitrary
external programs or scripts.


There are several obvious problems with this.

 - Tremendously slow, since it executes the program for every value.

 - Dangerous in some respects. Setting it by environment variable
   is too dangerous, and I'm not confident whether setting it by
   \pset is acceptably safer.

Is it still too complex? Or too slow?
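
To make it concrete, a filter program following the calling convention used
in the attached patch (filter <type> <column-number> <value>, with the result
read back from its stdout) could be as small as this sketch:

#include <stdio.h>
#include <string.h>

int
main(int argc, char **argv)
{
	if (argc < 4)
		return 1;				/* print nothing, keep the original cell */

	/* argv[1] = type, argv[2] = column number, argv[3] = cell value */
	if (strcmp(argv[3], "t") == 0)
		fputs("true", stdout);	/* render booleans as true/false, the case
								 * that started this thread */
	else if (strcmp(argv[3], "f") == 0)
		fputs("false", stdout);
	else
		fputs(argv[3], stdout);	/* pass everything else through unchanged */

	return 0;
}

A real filter would presumably look at the type argument first and rewrite
only boolean columns.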

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 438a4ec..5dad70b 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1148,7 +1148,8 @@ exec_command(const char *cmd,
 
 			int			i;
 			static const char *const my_list[] = {
-"border", "columns", "expanded", "fieldsep", "fieldsep_zero",
+"border", "columns", "column_filter",
+"expanded", "fieldsep", "fieldsep_zero",
 "footer", "format", "linestyle", "null",
 "numericlocale", "pager", "pager_min_lines",
 "recordsep", "recordsep_zero",
@@ -2695,6 +2696,17 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 		if (value)
 			popt->topt.columns = atoi(value);
 	}
+	/* set column filter */
+	else if (strcmp(param, "columnfilter") == 0)
+	{
+		if (vallen > 0)
+			popt->topt.columnfilter = pg_strdup(value);
+		else
+		{
+			if (popt->topt.columnfilter) pg_free(popt->topt.columnfilter);
+			popt->topt.columnfilter = NULL;
+		}
+	}
 	else
 	{
 		psql_error("\\pset: unknown option: %s\n", param);
@@ -2726,6 +2738,15 @@ printPsetInfo(const char *param, struct printQueryOpt *popt)
 			printf(_("Target width is %d.\n"), popt->topt.columns);
 	}
 
+	/* show the target width for the wrapped format */
+	else if (strcmp(param, "columnfilter") == 0)
+	{
+		if (!popt->topt.columnfilter)
+			printf(_("Column filter is unset.\n"));
+		else
+			printf(_("Column filter is \"%s\".\n"), popt->topt.columnfilter);
+	}
+
 	/* show expanded/vertical mode */
 	else if (strcmp(param, "x") == 0 || strcmp(param, "expanded") == 0 || strcmp(param, "vertical") == 0)
 	{
@@ -2938,6 +2959,8 @@ pset_value_string(const char *param, struct printQueryOpt *popt)
 		return psprintf("%d", popt->topt.border);
 	else if (strcmp(param, "columns") == 0)
 		return psprintf("%d", popt->topt.columns);
+	else if (strcmp(param, "columnfilter") == 0)
+		return pstrdup(popt->topt.columnfilter);
 	else if (strcmp(param, "expanded") == 0)
 		return pstrdup(popt->topt.expanded == 2
 	   ? "auto"
diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c
index ad4350e..9731e54 100644
--- a/src/bin/psql/print.c
+++ b/src/bin/psql/print.c
@@ -195,7 +195,7 @@ static void IsPagerNeeded(const printTableContent *cont, const int extra_lines,
 			  FILE **fout, bool *is_pager);
 
 static void print_aligned_vertical(const printTableContent *cont, FILE *fout);
-
+static char *valfilter(char *filtname, char *origcell, int colnum, int type, bool *mustfree);
 
 /* Count number of digits in integral part of number */
 static int
@@ -3145,6 +3145,86 @@ printTable(const printTableContent *cont, FILE *fout, FILE *flog)
 		ClosePager(fout);
 }
 
+static char *
+valfilter(char *filtname, char *origcell, int colnum, int type, bool *mustfree)
+{
+	char *cmdline;
+	int buflen = strlen(filtname) + 1 +
+		log10(colnum > 0 ? colnum : 1) + 2 + log10(type) + 2 +
+		+ strlen(origcell) + 1 ;
+	FILE *fp;
+	size_t celllen;
+	char *newcell = NULL;
+	char *p;
+	int additional_bytes;
+	bool escape_needed = false;
+
+	/* Check necessity of escaping */
+	for (additional_bytes = 0, p = origcell ; *p ; p++)
+	{
+		if (*p == '\'')
+		{
+			additional_bytes += 4; /* strlen('"'"') - 1 */
+			escape_needed = true;
+		}
+		else if (*p == ' ')
+			escape_needed = true;
+	}
+
+	/* Escaping needed */
+	if (escape_needed)
+	{
+		char *q;
+		char *str;
+		int  newlen;
+
+		additional_bytes += 2;
+		newlen = strlen(origcell) + 1 + additional_bytes;
+		str = (char *)pg_malloc(newlen);
+
+		q = str;
+		*q++ = '\'';
+		for (p = origcell ; *p ; p++)
+		{
+			if (*p == '\'')
+			{
+strcpy(q, "'\"'\"'");
+q += 5;
+			}
+			else
+*q++ = *p;
+		}
+		*q++ = '\'';
+		*q++ = '\0';
+		Assert(q - str == newlen);
+
+		if(*mustfree) free(origcell);
+		origcell = str;
+		*mustfree = true;
+		buflen += additional_bytes;
+	}
+
+	cmdline = pg_malloc(buflen);
+	snprintf(cmdline, buflen, "%s %d %d %s", filtname, type, colnum, origcell);
+	fp = popen(cmdline, "r");
+
+	/* fail silently 

Re: [HACKERS] proposal: add 'waiting for replication' to pg_stat_activity.state

2015-12-04 Thread Craig Ringer
On 3 December 2015 at 22:58, Amit Kapila  wrote:

> On Thu, Dec 3, 2015 at 9:02 AM, Craig Ringer 
> wrote:
> >
> > On 3 December 2015 at 09:32, Peter Eisentraut  wrote:
> >>
> >> On 12/2/15 7:00 PM, Craig Ringer wrote:
> >> > I notice that you don't set the 'waiting' flag.  'waiting' is
> presently
> >> > documented as:
> >> >
> >> >True if this backend is currently waiting on a
> lock
> >> >
> >> > ... but I'm inclined to just widen its definition and set it here,
> since
> >> > we most certainly are waiting, and the column isn't named
> >> > 'waiting_on_a_lock'. It shouldn't upset various canned lock monitoring
> >> > queries people have since they generally do an inner join on pg_locks
> >> > anyway.
> >>
> >> I'm not so sure about that assumption.
> >
> >
> > Even if it's an outer join, the worst that'll happen is that they'll get
> entries with nulls in pg_locks. I don't think it's worth worrying about too
> much.
> >
>
> That can be one way of dealing with it; another is that we
> keep the current column as it is and add another view for related
> wait stats.  Anyway, we need something like that for other purposes
> like lwlock waits etc.  Basically something along the lines of what
> is being discussed in thread [1]
>
> [1] -
> http://www.postgresql.org/message-id/CAA4eK1+=5Ex8-5NRr3u94=_t2p65v0kcjZ5rXddVmkx=lwa...@mail.gmail.com
>


Good point. I'm not sure this shouldn't set 'waiting' anyway, though.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Patch: ResourceOwner optimization for tables with many partitions

2015-12-04 Thread Aleksander Alekseev
Hello all,

The current implementation of ResourceOwner uses arrays to store resources
like TupleDescs, Snapshots, etc. When we want to release one of these
resources, ResourceOwner finds it with a linear scan. Granted, resource
arrays are usually small, so it's not a problem most of the time. But it
appears to be a bottleneck when we are working with tables that have a
lot of partitions.

To reproduce this issue:
1. run `./gen.pl 1 | psql my_database postgres`
2. run `pgbench -j 8 -c 8 -f q.sql -T 100 my_database`
3. in second terminal run `sudo perf top -u postgres`

Both gen.pl and q.sql are attached to this message.

You will see that postgres spends a lot of time in ResourceOwnerForget*
procedures:

 32.80%  postgres   [.] list_nth
 20.29%  postgres   [.] ResourceOwnerForgetRelationRef
 12.87%  postgres   [.] find_all_inheritors
  7.90%  postgres   [.] get_tabstat_entry
  6.68%  postgres   [.] ResourceOwnerForgetTupleDesc
  1.17%  postgres   [.] hash_search_with_hash_value
 ... < 1% ...

I would like to suggest a patch (see attachment) which fixes this
bottleneck. I also discovered that there is a lot of code duplication in
ResourceOwner; the patch fixes that too. The idea is quite simple: I just
replaced the arrays with something that could be considered hash tables,
but with some specific optimizations.

After applying this patch we can see that the bottleneck is gone:

 42.89%  postgres   [.] list_nth
 18.30%  postgres   [.] find_all_inheritors
 10.97%  postgres   [.] get_tabstat_entry
  1.82%  postgres   [.] hash_search_with_hash_value
  1.21%  postgres   [.] SearchCatCache
 ... < 1% ...

For tables with thousands of partitions we gain on average 32.5% more TPS.

As far as I can see, the patch doesn't make anything else worse at the
same time. `make check` passes with asserts enabled and disabled. There is
no performance degradation according to either the standard pgbench
benchmark or the benchmark described above for tables with 10 and 100
partitions.
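
To illustrate the approach (this is only a standalone sketch with made-up
names, not the patch itself): a power-of-two capacity, open addressing with
linear probing, and a resize once the table is 3/4 full:

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

/* Sketch: a set of non-NULL pointers stored with open addressing. */
typedef struct PtrTable
{
	void	  **items;			/* NULL means "slot is free" */
	uint32_t	capacity;		/* always a power of two (or 0 when empty) */
	uint32_t	nitems;
} PtrTable;

static void ptrtable_insert(PtrTable *t, void *p);

static uint32_t
ptr_hash(const void *p)
{
	/* cheap mixing of the pointer bits, good enough for a sketch */
	uintptr_t	x = (uintptr_t) p;

	x ^= x >> 16;
	x *= 0x85ebca6bU;
	x ^= x >> 13;
	return (uint32_t) x;
}

static void
ptrtable_grow(PtrTable *t)
{
	uint32_t	oldcap = t->capacity;
	void	  **olditems = t->items;
	uint32_t	i;

	t->capacity = oldcap ? oldcap * 2 : 16;
	t->items = calloc(t->capacity, sizeof(void *));
	t->nitems = 0;
	for (i = 0; i < oldcap; i++)
		if (olditems[i] != NULL)
			ptrtable_insert(t, olditems[i]);
	free(olditems);
}

static void
ptrtable_insert(PtrTable *t, void *p)
{
	uint32_t	mask, idx;

	/* keep the load factor below 3/4, as the patch does */
	if (t->capacity == 0 || t->nitems >= t->capacity / 4 * 3)
		ptrtable_grow(t);

	mask = t->capacity - 1;
	idx = ptr_hash(p) & mask;
	while (t->items[idx] != NULL)	/* linear probing */
		idx = (idx + 1) & mask;
	t->items[idx] = p;
	t->nitems++;
}

int
main(void)
{
	PtrTable	t = {0};
	int			values[1000];
	int			i;

	for (i = 0; i < 1000; i++)
		ptrtable_insert(&t, &values[i]);
	printf("stored %u pointers in a table of capacity %u\n",
		   t.nitems, t.capacity);
	return 0;
}

Removal (the ResourceOwnerForget* case) is where the win is: instead of
scanning the whole array, the slot can be located through the same hashing;
the sketch above leaves that part out.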

Best regards,
Aleksander

gen.pl
Description: Perl program


q.sql
Description: application/sql
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index 0e7acbf..6abbfa5 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -47,62 +47,296 @@
 #define MAX_RESOWNER_LOCKS 15
 
 /*
- * ResourceOwner objects look like this
+ * This number is used as initial size of resource arrays (like buffers,
+ * catrefs, etc) in ResourceOwnerData structure. If given number of items is
+ * not enough, we double array size and reallocate memory.
+ *
+ * Should be power of two since we use (arrsize -1) as mask for hash value.
+ *
  */
-typedef struct ResourceOwnerData
+#define RESARRAY_INIT_SIZE 16
+
+/*
+ * How many items could be stored in a resource array of given capacity. If
+ * this number is reached we need to resize an array to prevent hash collisions.
+ *
+ * This computation actually costs only two additions and one binary shift.
+ */
+#define RESARRAY_MAX_ITEMS(capacity) ((capacity)*3/4)
+
+/*
+ * Common structure for storing different type of resources.
+ */
+typedef struct ResourceArray
+{
+	uint8*		itemsarr;		/* buffer for storing values */
+	uint32		itemsizelg:  2;	/* sizeof one item log 2 */
+	uint32		capacity:   30;	/* capacity of array */
+	uint32		nitems;			/* how many items is stored in items array */
+	uint32		maxitems;		/* precalculated RESARRAY_MAX_ITEMS(capacity) */
+} ResourceArray;
+
+/*
+ * Such type of callback function is called when resource stored in
+ * ResourceArray is released using ResourceArrayFree. Callback should
+ * _actually_ release a resource so nitems value will be decremented.
+ */
+typedef void (*ResourceArrayRemoveCallback)(const uint8* ref, bool isCommit);
+
+/* Used as argument to memcmp to determine if ResourceArray[i] is free. */
+static const uint8 RESOURCE_ARRAY_ZERO_ELEMENT[sizeof(void*)] = {0};
+
+/*
+ * Calculate hash_any of given data. For uint32 values use faster hash_uint32.
+ */
+static Datum
+ResourceArrayHash(const uint8* data, int size)
+{
+	uint32		tmp;
+
+	Assert(size == sizeof(uint32) || size == sizeof(void*));
+
+	if(size == sizeof(uint32))
+	{
+		tmp = *((const uint32*)data);
+		return hash_uint32(tmp);
+	}
+	else
+	{
+		return hash_any(data, size);
+	}
+}
+
+/*
+ * Initialize ResourceArray
+ */
+static void
+ResourceArrayInit(ResourceArray *resarr, uint32 itemsize) {
+	Assert(itemsize == sizeof(int) || itemsize == sizeof(void*));
+	Assert(resarr->itemsarr == NULL);
+	Assert(resarr->capacity == 0);
+	Assert(resarr->nitems == 0);
+	Assert(resarr->maxitems == 0);
+
+	resarr->itemsizelg = 0;
+	while(itemsize > 1)
+	{
+		resarr->itemsizelg++;
+		itemsize >>= 1;
+	}
+
+	Assert(resarr->itemsizelg == 2 || resarr->itemsizelg == 3);
+}
+
+/*
+ * Add a resource to ResourceArray
+ *
+ * Caller must have previously done 

Re: [HACKERS] proposal: add 'waiting for replication' to pg_stat_activity.state

2015-12-04 Thread Andres Freund
On December 4, 2015 9:48:55 AM GMT+01:00, Craig Ringer  
wrote:
>On 3 December 2015 at 22:58, Amit Kapila 
>wrote:
>
>> On Thu, Dec 3, 2015 at 9:02 AM, Craig Ringer 
>> wrote:

>http://www.postgresql.org/message-id/CAA4eK1+=5Ex8-5NRr3u94=_t2p65v0kcjZ5rXddVmkx=lwa...@mail.gmail.com
>>
>Good point. I'm not sure this shouldn't set 'waiting' anyway, though.

No. Waiting corresponds to pg locks - setting it to true for other things
would be even more confusing...

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.




Re: [HACKERS] Confusing results with lateral references

2015-12-04 Thread Tom Lane
Ashutosh Bapat  writes:
> I am seeing different results with two queries which AFAIU have the same
> semantics and hence are expected to give the same results.

> postgres=# select * from t1, (select distinct val, val2 from t2) t2ss where 
> t1.val = t2ss.val for update of t1;

> postgres=# select * from t1, lateral (select distinct val, val2 from t2 where 
> t2.val = t1.val) t2ss for update of t1;

(I renamed your inline sub-selects to avoid confusion between them and the
table t2.)

I'm skeptical that those should be claimed to have identical semantics.

In the first example, after we've found the join row (1,1,1,1), we block
to see if the pending update on t1 will commit.  After it does, we recheck
the join condition using the updated row from t1 (and the original row
from t2ss).  The condition fails, so the updated row is not output.

The same thing happens in the second example, ie, we consider the updated
row from t1 and the non-updated row from t2ss (NOT t2).  There are no join
conditions to recheck (in the outer query level), so the row passes, and
we output it.

If you'd allowed the FOR UPDATE to propagate into the sub-select, then the
sub-select's conditions would be considered as needing rechecks ... of
course, that would require removing the DISTINCT.

This example does show that a lateral reference to a FOR UPDATE table from
a non-FOR-UPDATE subselect has confusing behavior.  Maybe we ought to
forbid that.

regards, tom lane




Re: [HACKERS] [DOCS] max_worker_processes on the standby

2015-12-04 Thread Alvaro Herrera
Fujii Masao wrote:

> Sorry for not reviewing the patch before you push it...
> 
> In HEAD, I ran very simple test case:
> 
> 1. enable track_commit_timestamp
> 2. start the server
> 3. run some transactions
> 4. execute pg_last_committed_xact() -- returns non-null values
> 5. shutdown the server with immediate mode
> 6. restart the server -- crash recovery happens
> 7. execute pg_last_committed_xact()
> 
> The last call of pg_last_committed_xact() returns NULL values, which means
> that the xid and timestamp information of the last committed transaction
> disappeared by crash recovery. Isn't this a bug?

Hm, not really, because the status of the "last" transaction is kept in
shared memory as a cache and not expected to live across a restart.
However, I tested the equivalent scenario:

alvherre=# create table fg();
CREATE TABLE

alvherre=# select ts.* from pg_class,pg_xact_commit_timestamp(xmin) ts where 
relname = 'fg';
              ts               
-------------------------------
 2015-12-04 12:41:48.017976-03
(1 fila)

then crash the server, and after recovery the data is gone:

alvherre=# select ts.*, xmin, c.relname from pg_class 
c,pg_xact_commit_timestamp(xmin) ts where relname = 'fg';
 ts | xmin | relname 
----+------+---------
    |  630 | fg
(1 fila)

Not sure what is going on; my reading of the code certainly says that
the data should be there.  I'm looking into it.

I also noticed that I didn't actually push the whole of the patch
yesterday -- I neglected to "git add" the latest changes, the ones that
fix the promotion scenario :-( so the commit message is misleading
because it describes something that's not there.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Confusing results with lateral references

2015-12-04 Thread Tom Lane
Ashutosh Bapat  writes:
> There's another seemingly wrong result, not with lateral, but with FOR
> UPDATE.

[ shrug... ]  You're getting the post-update images of the two join
rows that would have been reported without FOR UPDATE.  This one is
definitely not a bug.

regards, tom lane




Re: [HACKERS] Remaining 9.5 open items

2015-12-04 Thread Simon Riggs
On 1 December 2015 at 17:05, Robert Haas  wrote:


> do we want to
> back-patch those changes to 9.5 at this late date?
>

Surely the whole point of a release process is to fix issues in the
release. If we don't ever dare put something in the release, we may as well
have released it earlier.

I'm thinking about a two stage release process...

Stage 1 - released, but with caveats and some parts marked
experimental/beta whatever

Stage 2 - released, all caveats resolved

Not sure what to call that.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2015-12-04 Thread Robert Haas
On Wed, Dec 2, 2015 at 4:06 AM, Andres Freund  wrote:
> On 2015-12-02 08:52:20 +, Kouhei Kaigai wrote:
>> diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
>> index 26264cb..c4bb76e 100644
>> --- a/src/backend/nodes/copyfuncs.c
>> +++ b/src/backend/nodes/copyfuncs.c
>> @@ -635,8 +635,12 @@ _copyWorkTableScan(const WorkTableScan *from)
>>  static ForeignScan *
>>  _copyForeignScan(const ForeignScan *from)
>>  {
>> - ForeignScan *newnode = makeNode(ForeignScan);
>> -
>> + const ExtensibleNodeMethods *methods
>> + = GetExtensibleNodeMethods(from->extnodename, true);
>> + ForeignScan *newnode = (ForeignScan *) newNode(!methods
>> +  ? sizeof(ForeignScan)
>> +  : methods->node_size,
>> +  T_ForeignScan);
>
> Changing the FDW API seems to put this squarely into 9.6
> territory. Agreed?

I don't think anybody thought this patch was 9.5 material.  I didn't,
anyway.  The FDW changes for the join-pushdown-EPQ problem are another
issue, but that can be discussed on the other thread.

>>   /*
>>* copy node superclass fields
>>*/
>> @@ -645,6 +649,7 @@ _copyForeignScan(const ForeignScan *from)
>>   /*
>>* copy remainder of node
>>*/
>> + COPY_STRING_FIELD(extnodename);
>>   COPY_SCALAR_FIELD(fs_server);
>>   COPY_NODE_FIELD(fdw_exprs);
>>   COPY_NODE_FIELD(fdw_private);
>> @@ -652,6 +657,8 @@ _copyForeignScan(const ForeignScan *from)
>>   COPY_NODE_FIELD(fdw_recheck_quals);
>>   COPY_BITMAPSET_FIELD(fs_relids);
>>   COPY_SCALAR_FIELD(fsSystemCol);
>> + if (methods && methods->nodeCopy)
>> + methods->nodeCopy((Node *) newnode, (const Node *) from);
>
> I wonder if we shouldn't instead "recurse" into a generic routine for
> extensible nodes here.

I don't know what that means.

>> +void
>> +RegisterExtensibleNodeMethods(const ExtensibleNodeMethods *methods)
>> +{
>> + uint32  hash;
>> + int index;
>> + ListCell   *lc;
>> + MemoryContext   oldcxt;
>> +
>> + if (!xnodes_methods_slots)
>> + xnodes_methods_slots = MemoryContextAllocZero(TopMemoryContext,
>> +  sizeof(List *) * XNODES_NUM_SLOTS);
>> +
>> + hash = hash_any(methods->extnodename, strlen(methods->extnodename));
>> + index = hash % XNODES_NUM_SLOTS;
>> +
>> + /* duplication check */
>> + foreach (lc, xnodes_methods_slots[index])
>> + {
>> + const ExtensibleNodeMethods *curr = lfirst(lc);
>> +
>> + if (strcmp(methods->extnodename, curr->extnodename) == 0)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_DUPLICATE_OBJECT),
>> +  errmsg("ExtensibleNodeMethods \"%s\" already exists",
>> + methods->extnodename)));
>> + }
>> + /* ok, register this entry */
>> + oldcxt = MemoryContextSwitchTo(TopMemoryContext);
>> + xnodes_methods_slots[index] = lappend(xnodes_methods_slots[index],
>> +  (void *) methods);
>> + MemoryContextSwitchTo(oldcxt);
>> +}
>
> Why aren't you using dynahash here, and instead reimplement a hashtable?

I had thought we might assume that the number of extensible nodes was
small enough that we could use an array for this and just use linear
search.  But if we want to keep the door open to having enough
extensible nodes that this would be inefficient, then I agree that
dynahash is better than a hand-rolled hash table.
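
For what it's worth, the dynahash version could be quite small -- something
along these lines (the entry layout and the 64-byte name limit are
assumptions on my part, not anything taken from the patch):

#include "postgres.h"
#include "utils/hsearch.h"

#define EXTNODENAME_MAX_LEN 64		/* assumed maximum name length */

typedef struct ExtensibleNodeEntry
{
	char		extnodename[EXTNODENAME_MAX_LEN];	/* hash key */
	const ExtensibleNodeMethods *methods;
} ExtensibleNodeEntry;

static HTAB *extensible_node_methods = NULL;

void
RegisterExtensibleNodeMethods(const ExtensibleNodeMethods *methods)
{
	ExtensibleNodeEntry *entry;
	bool		found;

	if (extensible_node_methods == NULL)
	{
		HASHCTL		ctl;

		memset(&ctl, 0, sizeof(ctl));
		ctl.keysize = EXTNODENAME_MAX_LEN;
		ctl.entrysize = sizeof(ExtensibleNodeEntry);
		/* default string hash and compare are used without HASH_BLOBS */
		extensible_node_methods = hash_create("Extensible Node Methods",
											  64, &ctl, HASH_ELEM);
	}

	entry = (ExtensibleNodeEntry *) hash_search(extensible_node_methods,
												methods->extnodename,
												HASH_ENTER, &found);
	if (found)
		ereport(ERROR,
				(errcode(ERRCODE_DUPLICATE_OBJECT),
				 errmsg("ExtensibleNodeMethods \"%s\" already exists",
						methods->extnodename)));
	entry->methods = methods;
}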

>>  static ForeignScan *
>>  _readForeignScan(void)
>>  {
>> + const ExtensibleNodeMethods *methods;
>>   READ_LOCALS(ForeignScan);
>>
>>   ReadCommonScan(_node->scan);
>>
>> + READ_STRING_FIELD(extnodename);
>>   READ_OID_FIELD(fs_server);
>>   READ_NODE_FIELD(fdw_exprs);
>>   READ_NODE_FIELD(fdw_private);
>> @@ -1804,6 +1806,19 @@ _readForeignScan(void)
>>   READ_BITMAPSET_FIELD(fs_relids);
>>   READ_BOOL_FIELD(fsSystemCol);
>>
>> + methods = GetExtensibleNodeMethods(local_node->extnodename, true);
>> + if (methods && methods->nodeRead)
>> + {
>> + ForeignScan*local_temp = palloc0(methods->node_size);
>> +
>> + Assert(methods->node_size >= sizeof(ForeignScan));
>> +
>> + memcpy(local_temp, local_node, sizeof(ForeignScan));
>> + methods->nodeRead((Node *) local_temp);
>> +
>> + pfree(local_node);
>> + local_node = 

Re: [HACKERS] Remaining 9.5 open items

2015-12-04 Thread Tom Lane
Simon Riggs  writes:
> On 1 December 2015 at 17:05, Robert Haas  wrote:
>> do we want to
>> back-patch those changes to 9.5 at this late date?

> Surely the whole point of a release process is to fix issues in the
> release. If we don't ever dare put something in the release, we may as well
> have released it earlier.

> I'm thinking about a two stage release process...

> Stage 1 - released, but with caveats and some parts marked
> experimental/beta whatever

> Stage 2 - released, all caveats resolved

> Not sure what to call that.

9.5beta3.  If you're saying we are not ready for an RC, then it's a beta.

regards, tom lane




Re: [HACKERS] Support of partial decompression for datums

2015-12-04 Thread Simon Riggs
On 4 December 2015 at 13:47, Ildus Kurbangaliev <
i.kurbangal...@postgrespro.ru> wrote:


> Attached patch adds support of partial decompression for datums.
> It will be useful in many cases when extracting part of data is
> enough for big varlena structures.
>
> It is especially useful for expanded datums, because it provides
> storage for partial results.
>

This isn't enough for anyone else to follow your thoughts and agree enough
to commit.

Please explain the whole idea, starting from what problem you are trying to
solve and how well this does it, why you did it this way and the other ways
you tried/decided not to pursue. Thanks.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Remaining 9.5 open items

2015-12-04 Thread Simon Riggs
On 4 December 2015 at 16:29, Tom Lane  wrote:

> Simon Riggs  writes:
> > On 1 December 2015 at 17:05, Robert Haas  wrote:
> >> do we want to
> >> back-patch those changes to 9.5 at this late date?
>
> > Surely the whole point of a release process is to fix issues in the
> > release. If we don't ever dare put something in the release, we may as
> well
> > have released it earlier.
>
> > I'm thinking about a two stage release process...
>
> > Stage 1 - released, but with caveats and some parts marked
> > experimental/beta whatever
>
> > Stage 2 - released, all caveats resolved
>
> > Not sure what to call that.
>
> 9.5beta3.  If you're saying we are not ready for an RC, then it's a beta.
>

I guess I really meant "in the future" or "next time", but perhaps that
could apply now.

The main issue is that most of these things are pretty trivial and hardly
worth delaying the release process of The World's Most Advanced Open Source
Database for; it's not like the Saturn V will fail to fly.

We don't seem to put any negative weighting on delay; even the smallest
issues are enough to delay us. That's daft because we know we'll hit a raft
of bugs soon after release, we just don't know where they are yet - so
fooling ourselves that it needs to be perfect before release of 9.5.0
doesn't help anybody.

Do we think they ever launched a Saturn V that didn't have some marginal
flashing lights somewhere?

I accept there are open items. I'd like a way to indicate to people they
can start using it safely, apart from the listed caveats.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] parallel joins, and better parallel explain

2015-12-04 Thread Simon Riggs
On 30 November 2015 at 17:52, Robert Haas  wrote:


> My idea is that you'd end up with a plan like this:
>
> Gather
> -> Hash Join
>   -> Parallel Seq Scan
>   -> Parallel Hash
> -> Parallel Seq Scan
>
> Not only does this build only one copy of the hash table instead of N
> copies, but we can parallelize the hash table construction itself by
> having all workers insert in parallel, which is pretty cool.


Hmm. If the hash table is small it should be OK to keep it locally. If it's
larger, we need the shared copy. Seems like we can do the math on when to
use each kind of hash table: build it in shmem and then copy locally if
needed.

Another way might be to force the hash table into N batches, then give each
scan the task of handling one batch. That would allow a much larger hash
table to still be kept locally, moving the problem towards routing the data.

I'm not immediately convinced by the coolness of loading the hash table in
parallel. A whole class of bugs could be avoided if we choose not to, plus
the hash table is frequently so small a part of the HJ that it's not going
to gain us very much.

The other way to look at what you've said here is that you don't seem to
have a way of building the hash table in only one process, which concerns
me.

> What I can confirm at this point is that
> I've thought about the problem you're asking about here, and that
> EnterpriseDB intends to contribute code to address it.


Good

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[HACKERS] Support of partial decompression for datums

2015-12-04 Thread Ildus Kurbangaliev
Attached patch adds support of partial decompression for datums.
It will be useful in many cases when extracting part of data is
enough for big varlena structures.

It is especially useful for expanded datums, because it provides
storage for partial results.

I have another patch, which removes the 1 Mb limit on tsvector using
this feature.

Usage:

Assert(VARATT_IS_COMPRESSED(attr));
evh->data = (struct varlena *)
palloc(TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ);
SET_VARSIZE(evh->data, TOAST_COMPRESS_RAWSIZE(attr) + VARHDRSZ);

/* Extract size of tsvector */
res = toast_decompress_datum_partial(attr, evh->data,
evh->dcState, sizeof(int32));
if (res == -1)
elog(ERROR, "compressed tsvector is corrupted");

evh->count = TS_COUNT((TSVector) evh->data);

/* Extract entries of tsvector */
res = toast_decompress_datum_partial(attr, evh->data,
evh->dcState, sizeof(int32) + sizeof(WordEntry) * evh->count);
if (res == -1)
elog(ERROR, "compressed tsvector is corrupted");


-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index b9691a5..0fc5d5a 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -46,32 +46,12 @@
 
 #undef TOAST_DEBUG
 
-/*
- *	The information at the start of the compressed toast data.
- */
-typedef struct toast_compress_header
-{
-	int32		vl_len_;		/* varlena header (do not touch directly!) */
-	int32		rawsize;
-} toast_compress_header;
-
-/*
- * Utilities for manipulation of header information for compressed
- * toast entries.
- */
-#define TOAST_COMPRESS_HDRSZ		((int32) sizeof(toast_compress_header))
-#define TOAST_COMPRESS_RAWSIZE(ptr) (((toast_compress_header *) (ptr))->rawsize)
-#define TOAST_COMPRESS_RAWDATA(ptr) \
-	(((char *) (ptr)) + TOAST_COMPRESS_HDRSZ)
-#define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \
-	(((toast_compress_header *) (ptr))->rawsize = (len))
 
 static void toast_delete_datum(Relation rel, Datum value);
 static Datum toast_save_datum(Relation rel, Datum value,
  struct varlena * oldexternal, int options);
 static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
 static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
-static struct varlena *toast_fetch_datum(struct varlena * attr);
 static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
 		int32 sliceoffset, int32 length);
 static struct varlena *toast_decompress_datum(struct varlena * attr);
@@ -1792,7 +1772,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
  *	in the toast relation
  * --
  */
-static struct varlena *
+struct varlena *
 toast_fetch_datum(struct varlena * attr)
 {
 	Relation	toastrel;
@@ -2205,12 +2185,33 @@ toast_decompress_datum(struct varlena * attr)
 	if (pglz_decompress(TOAST_COMPRESS_RAWDATA(attr),
 		VARSIZE(attr) - TOAST_COMPRESS_HDRSZ,
 		VARDATA(result),
-		TOAST_COMPRESS_RAWSIZE(attr)) < 0)
+		TOAST_COMPRESS_RAWSIZE(attr),
+		NULL) < 0)
 		elog(ERROR, "compressed data is corrupted");
 
 	return result;
 }
 
+/* --
+ * toast_decompress_datum_partial -
+ *
+ * Decompress a compressed version of a varlena datum partially
+ */
+int
+toast_decompress_datum_partial(struct varlena *source,
+	struct varlena *dest,
+	PGLZ_DecompressState *state,
+	int32 until)
+{
+	Assert(VARATT_IS_COMPRESSED(source));
+
+	state->until = until;
+	return pglz_decompress(TOAST_COMPRESS_RAWDATA(source),
+		VARSIZE(source) - TOAST_COMPRESS_HDRSZ,
+		VARDATA(dest),
+		TOAST_COMPRESS_RAWSIZE(source),
+		state);
+}
 
 /* --
  * toast_open_indexes
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 37cf9de..a1642ed 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1309,7 +1309,7 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
 	{
 		/* If a backup block image is compressed, decompress it */
 		if (pglz_decompress(ptr, bkpb->bimg_len, tmp,
-			BLCKSZ - bkpb->hole_length) < 0)
+			BLCKSZ - bkpb->hole_length, NULL) < 0)
 		{
 			report_invalid_record(record, "invalid compressed image at %X/%X, block %d",
   (uint32) (record->ReadRecPtr >> 32),
diff --git a/src/common/pg_lzcompress.c b/src/common/pg_lzcompress.c
index 447a043..df5e169 100644
--- a/src/common/pg_lzcompress.c
+++ b/src/common/pg_lzcompress.c
@@ -680,18 +680,30 @@ pglz_compress(const char *source, int32 slen, char *dest,
  */
 int32
 pglz_decompress(const char *source, int32 slen, char *dest,
-int32 rawsize)
+int32 rawsize, PGLZ_DecompressState *state)
 {
-	const unsigned char *sp;
+	unsigned char *sp;
 	const unsigned char *srcend;
 	unsigned char *dp;
 	unsigned char 

Re: [HACKERS] Error with index on unlogged table

2015-12-04 Thread Michael Paquier
On Fri, Dec 4, 2015 at 5:10 PM, Andres Freund  wrote:
> On 2015-12-04 17:00:13 +0900, Michael Paquier wrote:
>> Andres Freund wrote:
>> >>  extern void InitXLogInsert(void);
>> >> diff --git a/src/include/catalog/pg_control.h 
>> >> b/src/include/catalog/pg_control.h
>> >> index ad1eb4b..91445f1 100644
>> >> --- a/src/include/catalog/pg_control.h
>> >> +++ b/src/include/catalog/pg_control.h
>> >> @@ -73,6 +73,7 @@ typedef struct CheckPoint
>> >>  #define XLOG_END_OF_RECOVERY 0x90
>> >>  #define XLOG_FPI_FOR_HINT0xA0
>> >>  #define XLOG_FPI 0xB0
>> >> +#define XLOG_FPI_FOR_SYNC0xC0
>> >
>> >
>> > I'm not a big fan of the XLOG_FPI_FOR_SYNC name. Syncing is a bit too
>> > ambiguous for my taste. How about either naming it XLOG_FPI_FLUSH or
>> > instead adding actual record data and a 'flags' field in there? I
>> > slightly prefer the latter - XLOG_FPI and XLOG_FPI_FOR_HINT really are
>> > different, XLOG_FPI_FOR_SYNC not so much.
>>
>> Let's go for XLOG_FPI_FLUSH.
>
> I think the other way is a bit better, because we can add new flags
> without changing the WAL format.

Hm. On the contrary, I think that it would make more sense to have a
flag for FOR_HINT as well; honestly, those are really the same operation,
and FOR_HINT is just there for statistics purposes.

>> diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
>> index 99337b0..b646101 100644
>> --- a/src/backend/access/brin/brin.c
>> +++ b/src/backend/access/brin/brin.c
>> @@ -682,7 +682,15 @@ brinbuildempty(PG_FUNCTION_ARGS)
>>   brin_metapage_init(BufferGetPage(metabuf), BrinGetPagesPerRange(index),
>>  BRIN_CURRENT_VERSION);
>>   MarkBufferDirty(metabuf);
>> - log_newpage_buffer(metabuf, false);
>> +
>> + /*
>> +  * For unlogged relations, this page should be immediately flushed
>> +  * to disk after being replayed. This is necessary to ensure that the
>> +  * initial on-disk state of unlogged relations is preserved when
>> +  * they get reset at the end of recovery.
>> +  */
>> + log_newpage_buffer(metabuf, false,
>> + index->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED);
>>   END_CRIT_SECTION();
>
> Maybe write the last sentence as '... as the on disk files are copied
> directly at the end of recovery.'?

Check.

>> @@ -336,7 +336,8 @@ end_heap_rewrite(RewriteState state)
>>   MAIN_FORKNUM,
>>   state->rs_blockno,
>>   state->rs_buffer,
>> - true);
>> + true,
>> + false);
>>   RelationOpenSmgr(state->rs_new_rel);
>>
>>   PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
>> @@ -685,7 +686,8 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
>>   MAIN_FORKNUM,
>>   state->rs_blockno,
>>   page,
>> - true);
>> + true,
>> + false);
>
> Did you verify that that's ok when an unlogged table is clustered/vacuum
> full'ed?

Yep.

>> @@ -181,6 +183,9 @@ xlog_identify(uint8 info)
>>   case XLOG_FPI_FOR_HINT:
>>   id = "FPI_FOR_HINT";
>>   break;
>> + case XLOG_FPI_FLUSH:
>> + id = "FPI_FOR_SYNC";
>> + break;
>>   }
>
> Old string.

Yeah, that's now completely removed.

>> --- a/src/backend/access/transam/xloginsert.c
>> +++ b/src/backend/access/transam/xloginsert.c
>> @@ -932,10 +932,13 @@ XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
>>   * If the page follows the standard page layout, with a PageHeader and 
>> unused
>>   * space between pd_lower and pd_upper, set 'page_std' to TRUE. That allows
>>   * the unused space to be left out from the WAL record, making it smaller.
>> + *
>> + * If 'is_flush' is set to TRUE, relation will be requested to flush
>> + * immediately its data at replay after replaying this full page image.
>>   */
>
> s/is_flush/flush_immed/? And maybe say that it 'will be flushed to the
> OS immediately after replaying the record'?

s/OS/stable storage?
-- 
Michael
From a25b938d39fe735cf2c4c46a5c54db762510220c Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 4 Dec 2015 16:58:23 +0900
Subject: [PATCH] Ensure consistent on-disk state of UNLOGGED indexes at
 recovery

Unlogged relation indexes need to have a consistent initial state on-disk

Re: [HACKERS] Support of partial decompression for datums

2015-12-04 Thread Ildus Kurbangaliev
On Fri, 4 Dec 2015 22:13:58 +0900
Michael Paquier  wrote:

> On Fri, Dec 4, 2015 at 9:47 PM, Ildus Kurbangaliev
>  wrote:
> > Attached patch adds support of partial decompression for datums.
> > It will be useful in many cases when extracting part of data is
> > enough for big varlena structures.
> >
> > It is especially useful for expanded datums, because it provides
> > storage for partial results.
> >
> > I have another patch, which removes the 1 Mb limit on tsvector using
> > this feature.  
> 
> -1 for changing the shape of pglz_decompress directly, and particularly
> for using metadata in it. The current format of those routines is close to
> what lz4 offers in terms of compression and decompression of a string;
> let's not break that, we had a hard enough time in the 9.5 cycle to get
> something clean.

Metadata is not used by the current code, only in the case of partial
decompression.

> 
> By the way, why don't you compress the multiple chunks and store the
> related metadata at a higher level? There is no need to put that in
> pglz itself.

Yes, but this idea with chunks means that you are creating a whole
new structure, and it can't be used if you want to optimize existing
structures. For example, you can't just change arrays; still, I think there
are places where partial decompression can be used.

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] Some bugs in psql_complete of psql

2015-12-04 Thread Fujii Masao
On Fri, Nov 6, 2015 at 11:27 AM, Kyotaro HORIGUCHI
 wrote:
> Hello, thank you for the comments.
>
> The revised version of this patch is attached.

Thanks for updating the patch!

I tested whether the following patterns work as expected or not.

CREATE UNIQUE INDEX CONCURRENTLY name ON
CREATE UNIQUE INDEX CONCURRENTLY ON
CREATE UNIQUE INDEX name ON
CREATE UNIQUE INDEX ON
CREATE INDEX CONCURRENTLY name ON
CREATE INDEX CONCURRENTLY ON
CREATE INDEX name ON
CREATE INDEX ON

Then I found the following problems.

"CREATE UNIQUE INDEX CONCURRENTLY " didn't suggest "ON".
"CREATE UNIQUE INDEX ON " suggested nothing.
"CREATE INDEX CONCURRENTLY " didn't suggest "ON".
"CREATE INDEX ON " suggested nothing.

BTW, I found that tab-completion for DROP INDEX has the following problems.

"DROP INDEX " didn't suggest "CONCURRENTLY".
"DROP INDEX CONCURRENTLY name " suggested nothing.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support of partial decompression for datums

2015-12-04 Thread Michael Paquier
On Fri, Dec 4, 2015 at 9:47 PM, Ildus Kurbangaliev
 wrote:
> Attached patch adds support of partial decompression for datums.
> It will be useful in many cases when extracting part of data is
> enough for big varlena structures.
>
> It is especially useful for expanded datums, because it provides
> storage for partial results.
>
> I have another patch, which removes the 1 Mb limit on tsvector using
> this feature.

-1 for changing the shape of pglz_decompress directly, and particularly
for using metadata in it. The current format of those routines is close to
what lz4 offers in terms of compression and decompression of a string;
let's not break that, we had a hard enough time in the 9.5 cycle to get
something clean.

By the way, why don't you compress the multiple chunks and store the
related metadata at a higher level? There is no need to put that in
pglz itself.
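
To illustrate what "at a higher level" could mean (purely a sketch with an
invented layout, not a proposal for the exact format), pglz could stay
untouched and the chunking could live next to it:

#include "postgres.h"
#include "common/pg_lzcompress.h"

#define CHUNK_RAW_SIZE	(64 * 1024)		/* arbitrary chunk size for the sketch */

/*
 * Hypothetical layout: header, then (nchunks + 1) offsets into the chunk
 * area (offsets[0] == 0), then the independently compressed chunks.
 */
typedef struct ChunkedHeader
{
	int32		rawsize;		/* total uncompressed size */
	int32		nchunks;
} ChunkedHeader;

/*
 * Decompress only the chunks covering bytes [0, until) of the raw data.
 * The offsets let us jump straight to each compressed chunk, so no
 * decompression state needs to be carried through pglz itself.
 */
static void
chunked_decompress_prefix(const char *src, char *dst, int32 until)
{
	const ChunkedHeader *hdr = (const ChunkedHeader *) src;
	const int32 *offsets = (const int32 *) (hdr + 1);
	const char *data = (const char *) (offsets + hdr->nchunks + 1);
	int32		done = 0;
	int			i;

	for (i = 0; i < hdr->nchunks && done < until; i++)
	{
		int32		complen = offsets[i + 1] - offsets[i];
		int32		rawlen = Min(CHUNK_RAW_SIZE, hdr->rawsize - done);

		if (pglz_decompress(data + offsets[i], complen,
							dst + done, rawlen) < 0)
			elog(ERROR, "compressed chunk is corrupted");
		done += rawlen;
	}
}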
-- 
Michael




Re: [HACKERS] proposal: multiple psql option -c

2015-12-04 Thread Robert Haas
On Wed, Dec 2, 2015 at 12:33 AM, Pavel Stehule  wrote:
>> Yeah, I don't think that's a big issue either to be honest. The code
>> is kept as consistent as possible with what was there previously.
>>
>> Patch is switched to ready for committer.
>
> perfect
>
> Thank you very much to all

I did some edits on this patch and was all set to commit it when I ran
the regression tests and discovered that this breaks 130 out of the
160 regression tests. Allow me to suggest that before submitting a
patch, or marking it ready for committer, you test that 'make check'
passes.

The problem seems to be the result of this code:

+	if (options.actions.head == NULL && pset.notty)
+		simple_action_list_append(&options.actions, ACT_FILE, "-");

The problem with this is that process_file() gets called with "-"
where it previously got called with NULL, which changes the way error
reports are printed. This would be trivial to fix were it not for the
fact that SimpleActionListCell uses char val[FLEXIBLE_ARRAY_MEMBER]
rather than char *val.  I think you should change it so that it does
the latter, and then change the above line to pass NULL for the third
argument.  I think that will fix it, but it's more work than I want to
do on somebody else's patch, so I'm attaching my edited version here
for further work.
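
For illustration, a minimal sketch of the suggested change; SimpleActionListCell
comes from the patch under discussion rather than from core, so treat the field
names here as assumptions:

typedef struct SimpleActionListCell
{
    struct SimpleActionListCell *next;
    enum _actions action;
    char       *val;        /* was: char val[FLEXIBLE_ARRAY_MEMBER] */
} SimpleActionListCell;

/* ... which would let the implicit "read from stdin" action carry NULL: */
if (options.actions.head == NULL && pset.notty)
    simple_action_list_append(&options.actions, ACT_FILE, NULL);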

For the most part, the cleanups in this version are just cosmetic: I
fixed some whitespace damage, and reverted some needless changes to
the psql references page that were whitespace-only adjustments.  In a
few places, I tweaked documentation or comment language.  I also
hoisted the psqlrc handling out of an if statement where it was the
same in both branches.  Other than that, this version is, I believe,
the same as Pavel's last version.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index e4f72a8..47e9da2 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -38,9 +38,10 @@ PostgreSQL documentation
  PostgreSQL. It enables you to type in
  queries interactively, issue them to
  PostgreSQL, and see the query results.
- Alternatively, input can be from a file. In addition, it provides a
- number of meta-commands and various shell-like features to
- facilitate writing scripts and automating a wide variety of tasks.
+ Alternatively, input can be from a file or from command line
+ arguments. In addition, it provides a number of meta-commands and various
+ shell-like features to facilitate writing scripts and automating a wide
+ variety of tasks.
 
  
 
@@ -89,11 +90,10 @@ PostgreSQL documentation
   --command=command
   
   
-  Specifies that psql is to execute one
-  command string, command,
-  and then exit. This is useful in shell scripts. Start-up files
-  (psqlrc and ~/.psqlrc) are
-  ignored with this option.
+  Specifies that psql is to execute the given
+  command string, command.
+  This option can be repeated and combined in any order with
+  the -f option.
   
   
   command must be either
@@ -102,25 +102,34 @@ PostgreSQL documentation
   or a single backslash command. Thus you cannot mix
   SQL and psql
   meta-commands with this option. To achieve that, you could
-  pipe the string into psql, for example:
+  use repeated -c options or pipe the string
+  into psql, for example:
+  psql -c '\x' -c 'SELECT * FROM foo;' or
   echo '\x \\ SELECT * FROM foo;' | psql.
   (\\ is the separator meta-command.)
   
   
-   If the command string contains multiple SQL commands, they are
-   processed in a single transaction, unless there are explicit
-   BEGIN/COMMIT commands included in the
-   string to divide it into multiple transactions.  This is
-   different from the behavior when the same string is fed to
-   psql's standard input.  Also, only
-   the result of the last SQL command is returned.
+   Each command string passed to -c is sent to the server
+   as a single query. Because of this, the server executes it as a single
+   transaction, even if a command string contains
+   multiple SQL commands, unless there are
+   explicit BEGIN/COMMIT commands included in the
+   string to divide it into multiple transactions.  Also, the server only
+   returns the result of the last SQL command to the
+   client.  This is different from the behavior when the same string with
+   multiple SQL commands is fed
+   to psql's standard input because
+   then psql sends each SQL
+   command separately.
   
   
-   Because of these legacy behaviors, putting more than one command in
-   the -c string often has unexpected results.  It's
-   better to feed multiple commands to psql's
-   standard input, either using 

Re: [HACKERS] Logical replication and multimaster

2015-12-04 Thread Craig Ringer
On 3 December 2015 at 20:39, Simon Riggs  wrote:

> On 30 November 2015 at 17:20, Konstantin Knizhnik <
> k.knizh...@postgrespro.ru> wrote:
>
>
>> But looks like there is not so much sense in having multiple network
>> connection between one pair of nodes.
>> It seems to be better to have one connection between nodes, but provide
>> parallel execution of received transactions at destination side. But it
>> seems to be also nontrivial. We have now in PostgreSQL some infrastructure
>> for background works, but there is still no abstraction of workers pool and
>> job queue which can provide simple way to organize parallel execution of
>> some jobs. I wonder if somebody is working now on it or we should try to
>> propose our solution?
>>
>
> There are definitely two clear places where additional help would be
> useful and welcome right now.
>
> 1. Allowing logical decoding to have a "speculative pre-commit data"
> option, to allow some data to be made available via the decoding api,
> allowing data to be transferred prior to commit.
>

Something relevant I ran into re this:

in reorderbuffer.c, on ReorderBufferCommit:

   * We currently can only decode a transaction's contents when their commit
   * record is read because that's currently the only place where we know about
   * cache invalidations. Thus, once a toplevel commit is read, we iterate over
   * the top and subtransactions (using a k-way merge) and replay the changes in
   * lsn order.

I haven't dug into the implications particularly as I'm chasing something
else, but want to note it on the thread. Here be dragons when it comes to
transaction streaming.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] atomic reads & writes (with no barriers)

2015-12-04 Thread Kevin Grittner
On Fri, Dec 4, 2015 at 6:35 AM, Greg Stark  wrote:
> On Thu, Dec 3, 2015 at 10:10 PM, Kevin Grittner  wrote:
>> Is the c.h change above on anything resembling the right track for
>> a patch for this?  If not, what would such a patch look like?
>
> It would be nicer if we could come up with an interface that didn't
> require #ifdefs everywhere it's used.
>
> Something like
> ...
>   pg_maybe_atomic int64 threshold_timestamp;
> ...
>
> SpinLockAcquire_if_no_atomics(...)
> threshold_timestamp = >threshold_timestamp;
> SpinLockRelease_if_no_atomics(...)
>
> return threshold_timestamp;

Yeah, I didn't much like including the #ifdefs everywhere; I like
your suggestions.  Will work up a patch for the next CF along those
lines.

Thanks!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [DOCS] max_worker_processes on the standby

2015-12-04 Thread Fujii Masao
On Tue, Nov 17, 2015 at 6:43 AM, Alvaro Herrera
 wrote:
> I paraphrase Fujii Masao, who wrote:
>
>> 1. Start the master and standby servers with track_commit_timestamp enabled.
>> 2. Disable track_commit_timestamp in the master and restart the master 
>> server.
>> 3. Run checkpoint in the master.
>> 4. Run restartpoint in the standby after the checkpoint WAL record generated
>> 5. Restart the standby server.
>> 6. Enable track_commit_timestamp in the master and restart the master server.
>> 7. Disable track_commit_timestamp in the master and restart the master 
>> server.
>
>> What I think strange is that pg_last_committed_xact() behaves differently
>> in #2, #5, and #7 though the settings of track_commit_timestamp are same
>> in both servers, i.e., it's disabled in the master but enabled in the 
>> standby.
>
> Interesting, thanks.  You're right that this behaves oddly.
>
> I think in order to fix these two points (#5 and #7), we need to make
> the standby not honour the GUC at all, i.e. only heed what the master
> setting is.
>
>> 8. Promote the standby server to new master.
>> Since committs is still inactive even after the promotion,
>> pg_last_committed_xact() keeps causing an ERROR though
>> track_commit_timestamp is on.
>>
>> I was thinking that whether committs is active or not should depend on
>> the setting of track_commit_timestamp *after* the promotion.
>> The behavior in #8 looked strange.
>
> To fix this problem, we can re-run StartupCommitTs() after recovery is
> done, this time making sure to honour the GUC setting.
>
> I haven't tried the scenarios we fixed with the previous round of
> patching, but this patch seems to close the problems just reported.
> (My next step will be looking over the recovery test framework by
> Michael et al, so that I can apply a few tests for this stuff.)
> In the meantime, if you can look this over I would appreciate it.

Sorry for not reviewing the patch before you push it...

In HEAD, I ran very simple test case:

1. enable track_commit_timestamp
2. start the server
3. run some transactions
4. execute pg_last_committed_xact() -- returns non-null values
5. shut down the server in immediate mode
6. restart the server -- crash recovery happens
7. execute pg_last_committed_xact()

The last call of pg_last_committed_xact() returns NULL values, which means
that the xid and timestamp information of the last committed transaction
disappeared during crash recovery. Isn't this a bug?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] atomic reads & writes (with no barriers)

2015-12-04 Thread Greg Stark
On Thu, Dec 3, 2015 at 10:10 PM, Kevin Grittner  wrote:
> Is the c.h change above on anything resembling the right track for
> a patch for this?  If not, what would such a patch look like?

It would be nicer if we could come up with an interface that didn't
require #ifdefs everywhere it's used.

Something like
...
  pg_maybe_atomic int64 threshold_timestamp;
...

SpinLockAcquire_if_no_atomics(...)
threshold_timestamp = >threshold_timestamp;
SpinLockRelease_if_no_atomics(...)

return threshold_timestamp;
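
A rough sketch of how such wrappers could be defined, using the hypothetical
names from the pseudocode above (they are not an existing PostgreSQL API; the
guard macro is likewise made up, while SpinLockAcquire/SpinLockRelease do exist):

#ifdef HAVE_NATIVE_64BIT_ATOMICS        /* hypothetical condition */
/* native 64-bit loads/stores are atomic: the lock calls compile away */
#define pg_maybe_atomic
#define SpinLockAcquire_if_no_atomics(lock)   ((void) 0)
#define SpinLockRelease_if_no_atomics(lock)   ((void) 0)
#else
/* no 64-bit atomics: protect the field with the caller-supplied spinlock */
#define pg_maybe_atomic                       volatile
#define SpinLockAcquire_if_no_atomics(lock)   SpinLockAcquire(lock)
#define SpinLockRelease_if_no_atomics(lock)   SpinLockRelease(lock)
#endif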

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some questions about the array.

2015-12-04 Thread Teodor Sigaev

Some inconsistency (if we believe that an omitted lower bound is equal to 1):
regression=# insert into arrtest_s values 
('[-1:9]={3,1,4,1,5,9,5,6,7,8,9}'::int[], null);

INSERT 0 1
regression=# UPDATE arrtest_s SET a[:2] = '{23, 24, 25}';
ERROR:  source array too small
regression=# UPDATE arrtest_s SET a[1:2] = '{23, 24, 25}';
UPDATE 1

It seems that omitting boundaries in INSERT/UPDATE isn't a good idea. I suggest
allowing omission only in SELECT subscripting.


YUriy Zhuravlev wrote:

On Tuesday 01 December 2015 15:43:47 you wrote:

On Tuesday 01 December 2015 15:30:47 Teodor Sigaev wrote:

As I understand it, the update should fail with any array, so the first update
should fail too. Am I right?


You're right. Done. New patch attached.


I found an error when the lower bound is omitted in an INSERT like this:
INSERT INTO arrtest_s (a[:2], b[1:2]) VALUES ('{1,2,3,4,5}', '{7,8,9}');

I fixed it in the new patch.  The lower bound for a new array is 1 by default.

Thanks.






--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-12-04 Thread Masahiko Sawada
On Fri, Dec 4, 2015 at 9:51 PM, Jeff Janes  wrote:
> On Tue, Dec 1, 2015 at 10:40 AM, Masahiko Sawada  
> wrote:
>> On Tue, Dec 1, 2015 at 3:04 AM, Jeff Janes  wrote:
>>> On Mon, Nov 30, 2015 at 9:18 AM, Masahiko Sawada  
>>> wrote:
 On Sun, Nov 29, 2015 at 2:21 AM, Jeff Janes  wrote:
> On Tue, Nov 24, 2015 at 3:13 PM, Masahiko Sawada  
> wrote:
>>
>> Yeah, we need to consider to compute checksum if enabled.
>> I've changed the patch, and attached.
>> Please review it.
>
> Thanks for the update.  This now conflicts with the updates done to
> fix the pg_upgrade out-of-space issue on Windows. I've fixed (I think) the
> conflict in order to do some testing, but I'd like to get an updated
> patch from the author in case I did it wrong.  I don't want to find
> bugs that I just introduced myself.
>

 Thank you for having a look.

 Attached updated v28 patch.
 Please review it.

 Regards,
>>>
>>> After running pg_upgrade, if I manually vacuum a table I start getting 
>>> warnings:
>>>
>>> WARNING:  page is not marked all-visible (and all-frozen) but
>>> visibility map bit(s) is set in relation "foo" page 32756
>>> WARNING:  page is not marked all-visible (and all-frozen) but
>>> visibility map bit(s) is set in relation "foo" page 32756
>>> WARNING:  page is not marked all-visible (and all-frozen) but
>>> visibility map bit(s) is set in relation "foo" page 32757
>>> WARNING:  page is not marked all-visible (and all-frozen) but
>>> visibility map bit(s) is set in relation "foo" page 32757
>>>
>>> The warnings are right where the blocks would start using the 2nd page
>>> of the _vm, so I think the problem is there.  And looking at the code,
>>> I think that "cur += SizeOfPageHeaderData;" in the inner loop cannot
>>> be correct.  We can't skip a header in the current (old) block each
>>> time we reach the end of the new block.  The thing we are skipping in
>>> the current block is half the time not a header, but the data at the
>>> halfway point through the block.
>>>
>>
>> Thank you for reviewing.
>>
>> You're right, it's not necessary.
>> Attached latest v29 patch which removes the mention in pg_upgrade 
>> documentation.
>
> I could successfully upgrade with this patch, with the link option and
> without.  After the update the tables seemed to have their correct
> visibility status, and after a VACUUM FREEZE then had the correct
> freeze status as well.

Thank you for testing!

> Then I manually corrupted the vm file, just to make sure a corrupted
> one would get detected.  And much to my surprise, I didn't get any
> errors or warning when starting it back up and running vacuum freeze
> (unless I had page checksums turned on, then I got warnings and zeroed
> out pages).  But I guess this is not considered a warnable condition
> for bits to be off when they should be on, only the opposite.

How did you break the vm file?
An inconsistent flag state (all-frozen set but all-visible not set)
will be detected by the visibility map code. But the vm file is simply
consecutive bits after its page header, so detecting its corruption
would be difficult unless the whole page is corrupted.

> Consecutive VACUUM FREEZE operations with no DML activity between were
> not sped up by as much as I thought they would be, because it still
> had to walk all the indexes even though it didn't touch the table at
> all.  In real-world usage there would almost always be some dead
> tuples that would require an index scan anyway for a normal vacuum.

Another reason why consecutive VACUUM FREEZE runs were not sped up much
could be that many pages of that table were already in the disk cache, right?
In the case of a very large database, vacuuming a large table dominates the
total vacuum time, so the improvement would be more noticeable there.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More stable query plans via more predictable column statistics

2015-12-04 Thread Tom Lane
Robert Haas  writes:
> Still, maybe we should try to sneak at least this much into
> 9.5 RSN, because I have to think this is going to help people with
> mostly-NULL (or mostly-really-wide) columns.

Please no.  We are trying to get to release, not destabilize things.

I think this is fine work for leisurely review and incorporation into
9.6.  It's not appropriate to rush it into 9.5 at the RC stage after
minimal review.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynloader.h missing in prebuilt package for Windows?

2015-12-04 Thread Bruce Momjian
On Sat, Nov 14, 2015 at 07:11:32PM -0500, Chapman Flack wrote:
> I use the PG dynloader because, hey, you guys have already done the work
> of abstracting up from 12 different platforms' variations on dlopen, and
> it seems smarter to stand on your shoulders and not reinvent that. The
> one minor quirk is that the declaration of pg_dlsym is tailored to return
> a PGFunction specifically, which of course is not the type of the one
> function JNI_CreateJavaVM that I need to look up in libjvm. But adding
> a cast is no trouble. I am not expecting any platform's wrapped dl*
> functions to actually fail if the symbol hasn't got that exact type, right?

OK, good logical reason to install dynloader.h on Windows.  Also, I am
very glad you are working on PL/Java.  :-)

What do we need to do to close this item?  What versions of Windows
installers are missing dynloader.h?  Mingw?  MSVC?  EDB's?  OpenSCG? 
Postgres Pro (Russian)?  

Is this a change we need to make on the server end so that all the
installers will automatically install the file?  Is it present in all
Unix-like installs?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More stable query plans via more predictable column statistics

2015-12-04 Thread Robert Haas
On Tue, Dec 1, 2015 at 10:21 AM, Shulgin, Oleksandr
 wrote:
> Hi Hackers!
>
> This post summarizes a few weeks of research of ANALYZE statistics
> distribution on one of our bigger production databases with some real-world
> data and proposes a patch to rectify some of the oddities observed.
>
>
> Introduction
> 
>
> We have observed that for certain data sets the distribution of samples
> between most_common_vals and histogram_bounds can be unstable: so that it
> may change dramatically with the next ANALYZE run, thus leading to radically
> different plans.
>
> I was revisiting the following performance thread and I've found some
> interesting details about statistics in our environment:
>
>
> http://www.postgresql.org/message-id/flat/CAMkU=1zxynmn11yl8g7agf7k5u4zhvjn0dqcc_eco1qs49u...@mail.gmail.com#CAMkU=1zxynmn11yl8g7agf7k5u4zhvjn0dqcc_eco1qs49u...@mail.gmail.com
>
> My initial interest was in evaluation if distribution of samples could be
> made more predictable and less dependent on the factor of luck, thus leading
> to more stable execution plans.
>
>
> Unexpected findings
> ===
>
> What I have found is that in a significant percentage of instances, when a
> duplicate sample value is *not* put into the MCV list, it does produce
> duplicates in the histogram_bounds, so it looks like the MCV cut-off happens
> too early, even though we have enough space for more values in the MCV list.
>
> In the extreme cases I've found completely empty MCV lists and histograms
> full of duplicates at the same time, with only about 20% of distinct values
> in the histogram (as it turns out, this happens due to high fraction of
> NULLs in the sample).

Wow, this is very interesting work.  Using values_cnt rather than
samplerows to compute avgcount seems like a clear improvement.  It
doesn't make any sense to raise the threshold for creating an MCV
based on the presence of additional nulls or too-wide values in the
table.  I bet compute_distinct_stats needs a similar fix.  But for
plan stability considerations, I'd say we should back-patch this all
the way, but those considerations might argue for a more restrained
approach.  Still, maybe we should try to sneak at least this much into
9.5 RSN, because I have to think this is going to help people with
mostly-NULL (or mostly-really-wide) columns.

As far as the rest of the fix, your code seems to remove the handling
for ndistinct < 0.  That seems unlikely to be correct, although it's
possible that I am missing something.  Aside from that, the rest of
this seems like a policy change, and I'm not totally sure off-hand
whether it's the right policy.  Having more MCVs can increase planning
time noticeably, and the point of the existing cutoff is to prevent us
from choosing MCVs that aren't actually "C".  I think this change
significantly undermines those protections.  It seems to me that it
might be useful to evaluate the effects of this part of the patch
separately from the samplerows -> values_cnt change.
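
For readers following along, the core of the change being praised here is
roughly the following one-line difference in compute_scalar_stats(); this is
paraphrased from memory of src/backend/commands/analyze.c, so treat the exact
variable names and cutoff constants as approximate:

/* estimate of how often a "typical" value appears in the sample */
avgcount = (double) samplerows / ndistinct_table;   /* before: NULLs and
                                                     * too-wide values inflate
                                                     * the threshold */
avgcount = (double) values_cnt / ndistinct_table;   /* after: only non-null,
                                                     * not-too-wide values */

mincount = avgcount * 1.25;     /* cutoff for entering the MCV list, unchanged */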

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remaining 9.5 open items

2015-12-04 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Stephen Frost wrote:
> > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> > > Stephen Frost wrote:
> > > 
> > > > The non-documentation question is around DROP OWNED.  We need to either
> > > > have policies dropped by DROP OWNED (well, roles removed, unless it's
> > > > the last one, in which case the policy should be dropped), or update the
> > > > documentation to reflect that they don't.  I had been thinking we'd
> > > > fix DROP OWNED to deal with the policies, but if folks feel it's too
> > > > late for that kind of a change, then we can simply document it.  I don't
> > > > believe that's unreasonable for a new feature and we can work to get it
> > > > addressed in 9.6.
> > > 
> > > DROP OWNED is documented as a mechanism to help you drop the role, so
> > > it should do whatever is needed for that.  I don't think documenting the
> > > fact that it leaves the user as part of policies is good enough.
> > 
> > We already can't take care of everything with DROP OWNED though, since
> > we can't do cross-database queries, and the overall process almost
> > certainly requires additional effort (to reassign objects, etc...), so
> > while I'd be happier if policies were handled by it, I don't think it's
> > as serious of an issue.
> 
> Yes, the documentation says to apply a combination of REASSIGN OWNED
> plus DROP OWNED to each database.  Sure, it's not a single command, but
> if you additionally put the burden that the policies must be taken care
> of separately, then the whole process is made a little worse.
> 
> > Still, I'll get a patch worked up for it and then we can discuss the
> > merits of that patch going in to 9.5 now versus just into HEAD.
> 
> Cool.
> 
> In the past, we've made a bunch of changes to DROP OWNED in order to
> deal with object types that caused errors, even in minor releases.  I
> think this is just another case of the same problem.

Patch attached for review/comment.

I noticed in passing that the role removal documentation should really
also discuss shared objects (as the DROP OWNED BY reference page does).

Thanks!

Stephen
From e3f59f57add2afbb27b68486865274db44e38ab6 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Fri, 4 Dec 2015 11:00:52 -0500
Subject: [PATCH] Handle policies during DROP OWNED BY

DROP OWNED BY handled GRANT-based ACLs but was not removing roles from
policies.  Fix that by having DROP OWNED BY remove the role specified
from the list of roles the policy (or policies) apply to, or the entire
policy (or policies) if it only applied to the role specified.

As with ACLs, the DROP OWNED BY caller must have permission to modify
the policy or a WARNING is thrown and no change is made to the policy.
---
 src/backend/catalog/pg_shdepend.c |  13 ++
 src/backend/commands/policy.c | 246 ++
 src/include/commands/policy.h |   2 +
 src/test/regress/expected/rowsecurity.out |  14 ++
 src/test/regress/sql/rowsecurity.sql  |  19 +++
 5 files changed, 294 insertions(+)

diff --git a/src/backend/catalog/pg_shdepend.c b/src/backend/catalog/pg_shdepend.c
index 43076c9..eeb231b 100644
--- a/src/backend/catalog/pg_shdepend.c
+++ b/src/backend/catalog/pg_shdepend.c
@@ -50,6 +50,7 @@
 #include "commands/defrem.h"
 #include "commands/event_trigger.h"
 #include "commands/extension.h"
+#include "commands/policy.h"
 #include "commands/proclang.h"
 #include "commands/schemacmds.h"
 #include "commands/tablecmds.h"
@@ -1245,6 +1246,18 @@ shdepDropOwned(List *roleids, DropBehavior behavior)
 			sdepForm->classid,
 			sdepForm->objid);
 	break;
+case SHARED_DEPENDENCY_POLICY:
+	/* If unable to remove role from policy, remove policy. */
+	if (!RemoveRoleFromObjectPolicy(roleid,
+	sdepForm->classid,
+	sdepForm->objid))
+	{
+		obj.classId = sdepForm->classid;
+		obj.objectId = sdepForm->objid;
+		obj.objectSubId = sdepForm->objsubid;
+		add_exact_object_address(&obj, deleteobjs);
+	}
+	break;
 case SHARED_DEPENDENCY_OWNER:
 	/* If a local object, save it for deletion below */
 	if (sdepForm->dbid == MyDatabaseId)
diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c
index 8851fe7..7bdc1a4 100644
--- a/src/backend/commands/policy.c
+++ b/src/backend/commands/policy.c
@@ -408,6 +408,252 @@ RemovePolicyById(Oid policy_id)
 }
 
 /*
+ * RemoveRoleFromObjectPolicy -
+ *	 remove a role from a policy by its OID.  If the role is not a member of
+ *	 the policy then an error is raised.  False is returned to indicate that
+ *	 the role could not be removed due to being the only role on the policy
+ *	 and therefore the entire policy should be removed.
+ *
+ * Note that a warning will be thrown and true will be returned on a
+ * permission error, as the policy should not be removed in that case.
+ *
+ * roleid - the oid of the role to remove
+ * classid - 

Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-04 Thread Pavel Stehule
2015-12-04 17:34 GMT+01:00 Alvaro Herrera :

> Pavel Stehule wrote:
>
> > It should be disabled by default
> >
> > only when you have some problems, then you can enable it
>
> That still seems mostly unworkable to me.  Are we going to tell DBAs to
> set PGOPTIONS when they have some pg_hba problem?
>

Why not? It isn't a bad idea.


>
> What's the issue with calling the function when you want to research
> some problem?  Presumably that's the whole point of the function.
>

Sometimes you shouldn't set the real parameters - I had to solve some issues
with IPv6/IPv4 - and I missed having debug info on the server side.

Regards

Pavel


>
> --
> Álvaro Herrerahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-12-04 Thread Alvaro Herrera
Michael Paquier wrote:

> By the way, if there are no objections, I am going to mark this patch
> as committed in the CF app. Putting in the infrastructure is already a
> good step forward, and I will add an entry in next CF to track the
> patch to add tests for recovery itself. Alvaro, what do you think?

Feel free to do that, but I'm likely to look into it before the next CF
anyway.  The thread about this has been open since 2013, so that doesn't
seem unfair.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-12-04 Thread Alvaro Herrera
Noah Misch wrote:

>   git checkout 807b9e0
>   (find src -name \*.pl -o -name \*.pm ) | sort -u | xargs perltidy 
> --profile=src/tools/pgindent/perltidyrc
> 
> perltidy v20090616 leaves the working directory clean, but perltidy v20150815
> introduces diffs:
> 
>  src/backend/catalog/genbki.pl| 15 ---
>  src/bin/pg_basebackup/t/010_pg_basebackup.pl |  2 +-
>  src/tools/msvc/Install.pm|  6 +++---
>  src/tools/msvc/Mkvcbuild.pm  |  2 +-
>  src/tools/msvc/Project.pm|  2 +-
>  src/tools/msvc/Solution.pm   |  5 ++---
>  src/tools/msvc/gendef.pl |  4 ++--
>  7 files changed, 18 insertions(+), 18 deletions(-)
> 
> You see a different result?

Oh, you're right -- on that commit, I get the same results as you with
v20090616 (no changes).  I don't have v20150815; the version packaged by
Debian is v20140328, and Install.pm is not changed by that one (so I
only get 15 lines changed, not 18).

I think my confusion stems from code that was introduced after the last
pgindent run.  I guess I could have tidied the original files prior to
patching.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Random crud left behind by aborted TAP tests

2015-12-04 Thread Tom Lane
Yesterday I was fooling around with the TAP tests, and at least once
I changed my mind and control-C'd out of a run.  Today I find:

[postgres@sss1 pgsql]$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

Untracked files:
  (use "git add ..." to include in what will be committed)

src/bin/scripts/tmp_testGvBC/

nothing added to commit but untracked files present (use "git add" to track)
[postgres@sss1 pgsql]$ ls src/bin/scripts/tmp_testGvBC
archives/  backup/  pgdata/

"make distclean" doesn't get rid of it either; I'll have to remove it
manually.

Judging by the subdirectory names, this is "_basedir" of some PostgresNode
instance.  Should this not have been put inside "tmp_check/" so that
existing make clean/distclean rules could take care of it?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-12-04 Thread Catalin Iacob
On Fri, Dec 4, 2015 at 3:47 PM, Robert Haas  wrote:
> For the most part, the cleanups in this version are just cosmetic: I
> fixed some whitespace damage, and reverted some needless changes to
> the psql references page that were whitespace-only adjustments.  In a
> few places, I tweaked documentation or comment language.

Sorry for the docs whitespace-only changes, I did that.

I realized before the submission I made the diff bigger than it needed
to be, but that's because I used M-q in Emacs to break the lines I did
change and that reformatted the whole paragraph including some
unchanged lines. Breaking all the lines by hand would be quite a job,
and any time you go back and tweak the wording or so you need to do it
again. So I just used M-q and sent the result of that.
Do you happen to know of a better way to do this? I do load
src/tools/editors/emacs.samples in my ~/.emacs but it seems the width
my Emacs chooses doesn't match the one already in the file.

The doc tweaks are good, they make the text more clear. I'm happy
that's all you found to improve: writing good docs is hard and the
Postgres docs are already good, so it's not easy to change them. I had
the feeling I'd only make them worse and spent quite some time trying
not to do that.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-04 Thread Pavel Stehule
2015-12-04 17:16 GMT+01:00 Alvaro Herrera :

> Haribabu Kommi wrote:
>
> > The trace messages that are going to print doesn't come to client until
> the
> > connection gets successful. The traces may not useful for the clients
> > to find out
> > why the connection is failing. But it may be useful for administrators.
> > How about the attached patch?
> >
> > [kommih@localhost bin]$ ./psql postgres -h ::1
> > psql (9.6devel)
> > Type "help" for help.
> >
> > postgres=#
> >
> > ServerLog:
> > NOTICE:  Skipped 84 pg_hba line, because of host connection type.
> > NOTICE:  Skipped 86 pg_hba line, because of non matching IP.
>
> That's going to be way too noisy.  Some applications open dozens of
> connections per second -- imagine a dozen NOTICEs per each connection
> established.  It's going to fill any disk you install as the server log
> partition ...
>
> I can imagine worse nightmares, but this one's a pretty ugly one.
>

It should be disabled by default

only when you have some problems, then you can enable it

Regards

Pavel


>
> --
> Álvaro Herrerahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Random crud left behind by aborted TAP tests

2015-12-04 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Tom Lane wrote:
> > Yesterday I was fooling around with the TAP tests, and at least once
> > I changed my mind and control-C'd out of a run.  Today I find:
> > 
> > [postgres@sss1 pgsql]$ git status
> > On branch master
> > Your branch is up-to-date with 'origin/master'.
> > 
> > Untracked files:
> >   (use "git add ..." to include in what will be committed)
> > 
> > src/bin/scripts/tmp_testGvBC/
> 
> Hmm ... I would have supposed that this directory should have been
> removed by File::Temp's END block.  Are END blocks abandoned completely
> when a signal is received?  That seems pretty surprising.

Yes, they are.  Quoth man perlmod:

   An "END" code block is executed as late as possible, that is,
   after perl has finished running the program and just before
   the interpreter is being exited, even if it is exiting as a
   result of a die() function.  (But not if it's morphing into
   another program via "exec", or being blown out of the water
   by a signal--you have to trap that yourself (if you can).)

One option is to install a signal handler somewhere to clean this up.
Perhaps we can just set $SIG{INT} to a function that calls "die" or
something like that.  But this doesn't solve the problem if the system
crashes while a test is running, for instance, so perhaps your idea of
moving these directories to inside tmp_check/ is good.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS]WIP: Covering + unique indexes.

2015-12-04 Thread Anastasia Lubennikova



On 03.12.2015 04:03, Robert Haas wrote:

On Tue, Dec 1, 2015 at 7:53 AM, Anastasia Lubennikova
 wrote:

If we don't need c4 as an index scankey, we don't need any btree opclass on
it.
But we still want to have it in covering index for queries like

SELECT c4 FROM tbl WHERE c1=1000; // uses the IndexOnlyScan
SELECT * FROM tbl WHERE c1=1000; // uses the IndexOnlyScan

The patch "optional_opclass" completely ignores opclasses of included
attributes.

OK, I don't get it.  Why have an opclass here at all, even optionally?


We don't have an opclass on c4 and there's no need to have one.
But currently (without the patch) it's impossible to create a covering index
that contains columns with no btree opclass.


test=# create index on tbl using btree (c1, c4);
ERROR:  data type box has no default operator class for access method 
"btree"


ComputeIndexAttrs() processes the list of index attributes and tries to
get an opclass for each of them via GetIndexOpClass().
The patch drops this check for included attributes, which makes it possible
to store any datatype in a btree and take advantage of IndexOnlyScan.


I hope that this helps to clarify.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-04 Thread Alvaro Herrera
Haribabu Kommi wrote:

> The trace messages that are going to print doesn't come to client until the
> connection gets successful. The traces may not useful for the clients
> to find out
> why the connection is failing. But it may be useful for administrators.
> How about the attached patch?
> 
> [kommih@localhost bin]$ ./psql postgres -h ::1
> psql (9.6devel)
> Type "help" for help.
> 
> postgres=#
> 
> ServerLog:
> NOTICE:  Skipped 84 pg_hba line, because of host connection type.
> NOTICE:  Skipped 86 pg_hba line, because of non matching IP.

That's going to be way too noisy.  Some applications open dozens of
connections per second -- imagine a dozen NOTICEs per each connection
established.  It's going to fill any disk you install as the server log
partition ...

I can imagine worse nightmares, but this one's a pretty ugly one.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-04 Thread Alvaro Herrera
Haribabu Kommi wrote:

> How about as follows?
> 
> postgres=# select * from pg_hba_lookup('all','all','::1');
>  line_number | type  | database |  user   |  address  | hostname | method | options |  mode
> -------------+-------+----------+---------+-----------+----------+--------+---------+---------
>           84 | local | ["all"]  | ["all"] |           |          | trust  | {}      | skipped
>           86 | host  | ["all"]  | ["all"] | 127.0.0.1 |          | trust  | {}      | skipped
>           88 | host  | ["all"]  | ["all"] | ::1       |          | trust  | {}      | matched
> (3 rows)

What did you do to the whitespace when posting that table?  I had to
reformat it pretty heavily to understand what you had.
Anyway, I think the "mode" column should be right after the line number.
I assume the "reason" for skipped lines is going to be somewhere in the
table too.

What happens if a "reject" line is matched?  I hope the lookup
would terminate there.

What does it mean to query for "all"?  Do you have database and user
named "all"?  Because otherwise that seems wrong to me; you should be
able to query for specific databases/users, but not for special
keywords; maybe I am wrong and there is a use case for this, in which
case please state what it is.

I see three problems in your code.  One is that the translation of
auth_method enum to text should be a separate function, not the SQL
function layer; another is that the code to put keywords as JSON object
values is way too repetitive; the other is that messing with the JSON
API like that is not nice.  (I don't think we're closed to doing that,
but that would be a separate discussion).  I think this patch should
just use the "push value" interface rather than expose add_jsonb.

(I assume the usage of JSON rather than a regular array was already
discussed and JSON was chosen for some reason.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] atomic reads & writes (with no barriers)

2015-12-04 Thread Andres Freund
Hi,

On 2015-12-03 16:10:51 -0600, Kevin Grittner wrote:
> Is the c.h change above on anything resembling the right track for
> a patch for this?  If not, what would such a patch look like?

I think a better path would be to add fallback support for 64bit atomics
- like we already have for 32bit. That should really only take a few
lines. Then you can use pg_atomic_read_u64/pg_atomic_write_u64 and all
the rest of the atomics api.

For that to be helpful we'd want a more efficient read/write
implementation for some platforms (falls back to compare/exchange right
now), but that's easy.
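
A very rough sketch of such a spinlock-based fallback, mirroring the existing
32-bit one; the struct layout and function names here are paraphrased from
memory and would of course need to follow the atomics/fallback.h conventions:

/* only used on platforms without native 64-bit atomics */
typedef struct pg_atomic_uint64
{
    slock_t         sema;       /* spinlock protecting value */
    volatile uint64 value;
} pg_atomic_uint64;

static uint64
pg_atomic_read_u64_impl(volatile pg_atomic_uint64 *ptr)
{
    uint64      val;

    SpinLockAcquire((slock_t *) &ptr->sema);
    val = ptr->value;
    SpinLockRelease((slock_t *) &ptr->sema);
    return val;
}

static void
pg_atomic_write_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 val)
{
    SpinLockAcquire((slock_t *) &ptr->sema);
    ptr->value = val;
    SpinLockRelease((slock_t *) &ptr->sema);
}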

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-04 Thread Alvaro Herrera
Pavel Stehule wrote:

> It should be disabled by default
> 
> only when you have some problems, then you can enable it

That still seems mostly unworkable to me.  Are we going to tell DBAs to
set PGOPTIONS when they have some pg_hba problem?

What's the issue with calling the function when you want to research
some problem?  Presumably that's the whole point of the function.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Replication stats from slaves

2015-12-04 Thread Jehan-Guillaume de Rorthais
Hi hackers,

We are developing and maintaining a Pacemaker resource agent for PostgreSQL.
This agent is "stateful", i.e. it knows where the master is and where the
slaves are. See [1]. Now that this version of the resource agent is close to
what we wanted to achieve, we will make an official announcement soon, with
details. We welcome feedback, help with issues, etc., but on GitHub please,
not in this thread.

For the next version, we are now considering improving the switchover
mechanism with appropriate checks for every step. As a reminder, switchover
in PostgreSQL has been possible since the following commit:

  commit 985bd7d49726c9f178558491d31a570d47340459
  Author: Fujii Masao 
  Date:   Wed Jun 26 02:14:37 2013 +0900

It requires:

  (1) shutdown the master first
  (2) make sure the slave received the shutdown checkpoint from the old master
  (3) promote the slave as master
  (4) start the old master as slave

The problem here is step (2). After discussing it in person with Magnus and
Heikki, they confirmed to me that checking this using pg_xlogdump is fine, e.g.
(reformatted):

  $ pg_xlogdump 0001000B0014 -s 'B/14D845F8' -n1 
  rmgr: XLOG
  len (rec/tot): 72/104, 
  tx: 0, 
  lsn: B/14D845F8, prev B/14D84590, bkp: , 
  desc: checkpoint: redo B/14D845F8; 
  tli 1; prev tli 1; fpw true; 
  xid 0/6422462;oid 1183646; multi 1; offset 0; 
  oldest xid 712 in DB 1; oldest multi 1 in DB 1; oldest running xid 0;
  shutdown

This is possible from the resource agent's point of view, but not really in a
clean way. It requires:

  * keeping in memory the last LSN of the master after shutdown
  * checking on the slave that this LSN has been received
  * checking that the record is an rmgr XLOG record with shutdown information
    as payload
  * checking that this is the very last WAL record received (nothing after it).

First, looking at the last LSN and creating a cluster attribute (in the
Pacemaker context) from the old master to share it with the slaves is possible,
but not really elegant for a resource agent. Then, the -n1 in the sample command
above keeps pg_xlogdump from exiting with an error and rc=1, but it is not
compatible with the last check (very last WAL record), and I need to check that
the command succeeded.


A better solution here would be to be able to check, from a view on the slave,
say pg_stat_standby, when it was last connected to the master, the last WAL
restored by log shipping, the last LSN received by streaming replication and
flushed, and how/why the streaming replication has been disconnected. For
instance, reasons for SR disconnection might be: master shutdown, too much lag,
connection reset.

I can give such a patch a try, after some agreement on it and a discussion of
what exactly we should expose in such a view.

Comments? Guidance? Thoughts? Other solutions?

Thanks!

Regards,

[1] https://github.com/dalibo/pgsql-resource-agent/tree/master/multistate
-- 
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-12-04 Thread Jeff Janes
On Tue, Dec 1, 2015 at 10:40 AM, Masahiko Sawada  wrote:
> On Tue, Dec 1, 2015 at 3:04 AM, Jeff Janes  wrote:
>> On Mon, Nov 30, 2015 at 9:18 AM, Masahiko Sawada  
>> wrote:
>>> On Sun, Nov 29, 2015 at 2:21 AM, Jeff Janes  wrote:
 On Tue, Nov 24, 2015 at 3:13 PM, Masahiko Sawada  
 wrote:
>
> Yeah, we need to consider to compute checksum if enabled.
> I've changed the patch, and attached.
> Please review it.

 Thanks for the update.  This now conflicts with the updates done to
 fix the pg_upgrade out-of-space issue on Windows. I've fixed (I think) the
 conflict in order to do some testing, but I'd like to get an updated
 patch from the author in case I did it wrong.  I don't want to find
 bugs that I just introduced myself.

>>>
>>> Thank you for having a look.
>>>
>>> Attached updated v28 patch.
>>> Please review it.
>>>
>>> Regards,
>>
>> After running pg_upgrade, if I manually vacuum a table I start getting 
>> warnings:
>>
>> WARNING:  page is not marked all-visible (and all-frozen) but
>> visibility map bit(s) is set in relation "foo" page 32756
>> WARNING:  page is not marked all-visible (and all-frozen) but
>> visibility map bit(s) is set in relation "foo" page 32756
>> WARNING:  page is not marked all-visible (and all-frozen) but
>> visibility map bit(s) is set in relation "foo" page 32757
>> WARNING:  page is not marked all-visible (and all-frozen) but
>> visibility map bit(s) is set in relation "foo" page 32757
>>
>> The warnings are right where the blocks would start using the 2nd page
>> of the _vm, so I think the problem is there.  And looking at the code,
>> I think that "cur += SizeOfPageHeaderData;" in the inner loop cannot
>> be correct.  We can't skip a header in the current (old) block each
>> time we reach the end of the new block.  The thing we are skipping in
>> the current block is half the time not a header, but the data at the
>> halfway point through the block.
>>
>
> Thank you for reviewing.
>
> You're right, it's not necessary.
> Attached latest v29 patch which removes the mention in pg_upgrade 
> documentation.

I could successfully upgrade with this patch, with the link option and
without.  After the update the tables seemed to have their correct
visibility status, and after a VACUUM FREEZE then had the correct
freeze status as well.

Then I manually corrupted the vm file, just to make sure a corrupted
one would get detected.  And much to my surprise, I didn't get any
errors or warning when starting it back up and running vacuum freeze
(unless I had page checksums turned on, then I got warnings and zeroed
out pages).  But I guess this is not considered a warnable condition
for bits to be off when they should be on, only the opposite.

Consecutive VACUUM FREEZE operations with no DML activity between were
not sped up by as much as I thought they would be, because it still
had to walk all the indexes even though it didn't touch the table at
all.  In real-world usage there would almost always be some dead
tuples that would require an index scan anyway for a normal vacuum.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Random crud left behind by aborted TAP tests

2015-12-04 Thread Alvaro Herrera
Tom Lane wrote:
> Yesterday I was fooling around with the TAP tests, and at least once
> I changed my mind and control-C'd out of a run.  Today I find:
> 
> [postgres@sss1 pgsql]$ git status
> On branch master
> Your branch is up-to-date with 'origin/master'.
> 
> Untracked files:
>   (use "git add ..." to include in what will be committed)
> 
> src/bin/scripts/tmp_testGvBC/

Hmm ... I would have supposed that this directory should have been
removed by File::Temp's END block.  Are END blocks abandoned completely
when a signal is received?  That seems pretty surprising.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remaining 9.5 open items

2015-12-04 Thread Geoff Winkless
On 4 December 2015 at 15:50, Simon Riggs  wrote:

> Do we think they ever launched a Saturn V that didn't have some marginal
> flashing lights somewhere?
>

Almost certainly. They had triple-redundant systems that were certified
for correctness. You don't knowingly send rockets into space with dubious
control systems.

> I accept there are open items. I'd like a way to indicate to people they
> can start using it with safety, apart from the listed caveats.
>

Just to add my .2c worth...

That's what betas are for.

There's an implied open-source contract that anyone who wants to use a
feature in the next version will invest a little of their time making sure
that the feature works for them in the beta before it gets released.

The developer side of that contract is that you fix the bugs people found
in the beta before release, because otherwise next time they won't bother.

And if you start pushing out full releases that you *know* introduce bugs
that weren't in the previous release, "with caveats" or not, you end up in
a situation where people won't upgrade until the second or third point
release.

Geoff


Re: [HACKERS] Random crud left behind by aborted TAP tests

2015-12-04 Thread Tom Lane
Alvaro Herrera  writes:
>> Hmm ... I would have supposed that this directory should have been
>> removed by File::Temp's END block.  Are END blocks abandoned completely
>> when a signal is received?  That seems pretty surprising.

> Yes, they are.  Quoth man perlmod:

>An "END" code block is executed as late as possible, that is,
>after perl has finished running the program and just before
>the interpreter is being exited, even if it is exiting as a
>result of a die() function.  (But not if it's morphing into
>another program via "exec", or being blown out of the water
>by a signal--you have to trap that yourself (if you can).)

> One option is to install a signal handler somewhere to clean this up.
> Perhaps we can just set $SIG{INT} to a function that calls "die" or
> something like that.  But this doesn't solve the problem if the system
> crashes while a test is running, for instance, so perhaps your idea of
> moving these directories to inside tmp_check/ is good.

A second signal received while we're trying to respond to the first
would break it too.  (What I was actually doing was strace'ing the whole
process, which would've slowed things down enough that that wouldn't
be an unlikely thing ...)

Let's just put the cruft in tmp_check/ and call it good.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support of partial decompression for datums

2015-12-04 Thread Michael Paquier
On Sat, Dec 5, 2015 at 12:10 AM, Simon Riggs  wrote:
> On 4 December 2015 at 13:47, Ildus Kurbangaliev
>  wrote:
>
>>
>> Attached patch adds support of partial decompression for datums.
>> It will be useful in many cases when extracting part of data is
>> enough for big varlena structures.
>>
>> It is especially useful for expanded datums, because it provides
>> storage for partial results.
>
>
> This isn't enough for anyone else to follow your thoughts and agree enough
> to commit.
>
> Please explain the whole idea, starting from what problem you are trying to
> solve and how well this does it, why you did it this way and the other ways
> you tried/decided not to pursue. Thanks.

Yeah, I would imagine that what Ildus is trying to achieve is
something close to LZ4_decompress_safe_partial, by being able to stop
decompression after getting a certain amount of data decompressed, and
continue working once again afterwards.

And actually I think I get the idea. With his test case, what we get
first is a size, and then we reuse this size to extract only what we
need to fetch only a number of items from the tsvector. But that's
actually linked to the length of the compressed chunk, and at the end
we would still need to decompress the whole string perhaps, but it is
not possible to be sure using the information provided.

Ildus, using your patch for tsvector, are you aiming at being able to
complete an operation by only using a portion of the compressed data?
Or are you planning to use that to improve the speed of detection of
corrupted data in the chunk? If that's the latter, we would need to
still decompress the whole string anyway, so having a routine able to
decompress only until a given position is not necessary, and based on
the example given upthread it is not possible to know what you are
trying to achieve. Hence could you share your thoughts regarding your
stuff with tsvector?

Changing pglz_decompress shape is still a bad idea anyway, I guess we
had better have something new like pglz_decompress_partial instead.
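
For comparison, the existing routine and a hypothetical shape for the suggested
one; the partial variant's signature is an assumption rather than an existing
API, and the existing prototype is quoted from memory of pg_lzcompress.h:

/* existing: decompresses the whole source into dest (rawsize bytes) */
extern int32 pglz_decompress(const char *source, int32 slen,
                             char *dest, int32 rawsize);

/* hypothetical: stop after producing the first destlen bytes of raw data */
extern int32 pglz_decompress_partial(const char *source, int32 slen,
                                     char *dest, int32 destlen);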
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Size of Path nodes

2015-12-04 Thread Robert Haas
On Fri, Dec 4, 2015 at 12:50 PM, Tom Lane  wrote:
> So over in my private branch where I've been fooling around with upper
> planner pathification, I've been sweating bullets to avoid enlarging the
> size of struct Path, because I was aware that common path types like
> IndexPath and AppendPath were already a power-of-2 size (at least on
> 64-bit machines), meaning that palloc'ing them would cost twice as much
> space if I added any fields.
>
> When I got around to merging up to HEAD, I found this in commit f0661c4e8:
>
> @@ -753,6 +753,7 @@ typedef struct Path
>
>  RelOptInfo *parent;/* the relation this path can build */
>  ParamPathInfo *param_info; /* parameterization info, or NULL if none 
> */
> +boolparallel_aware;/* engage parallel-aware logic? */
>
>  /* estimated size/costs for path (see costsize.c for more info) */
>  doublerows;/* estimated number of result tuples */
>
> which means Robert has already blown the planner's space consumption out
> by close to a factor of 2, and I should stop worrying.  Or else he should
> start worrying.  Do we really need this field?  Did anyone pay any
> attention to what happened to planner space consumption and performance
> when the parallel-scan patch went in?  Or am I just too conditioned by
> working with last-century hardware, and I should stop caring about how
> large these nodes are?

If it's really true that the extra byte I added there has doubled
planner memory use, then that's definitely cause for concern.
However, I am skeptical that's really what has happened here.  Not
every path has crossed a power-of-two threshold, and paths are not the
only things the planner allocates.  What's a reasonable way to assess
the effect of this on planner memory use in general?
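
Two crude probes come to mind (sketches only; the placement and details are
assumptions): compare the node sizes against the power-of-two chunk rounding
that aset.c applies to small requests, and dump memory context stats right
after planning a representative query, before and after the struct change.

/* how big are the common Path nodes, and do they sit just under 2^N? */
elog(LOG, "sizeof(Path)=%zu sizeof(IndexPath)=%zu sizeof(AppendPath)=%zu",
     sizeof(Path), sizeof(IndexPath), sizeof(AppendPath));

/* ... and immediately after planning a query with many relations/paths: */
MemoryContextStats(CurrentMemoryContext);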

I really HOPE that this isn't a serious problem.  The pending patch to
allow parallel joins adds another int right after that bool, and I doubt
rather deeply that'll be the end of it.  That int may yet need to become
a double, and there may be a few more things; I don't expect to need tons
of space here, but the current code is pretty stupid about estimating the
number of workers that are needed for a parallel plan, and at some point
we're going to want to make that less stupid, and that's probably going
to require some more bookkeeping.  I don't know exactly how that's all
going to work yet, but it's pretty hard to make the planner generate
parallel query paths for a wide variety of plan types if none of those
plan types carry any information specific to parallel query.

> While I'm bitching about this: a field added to a struct as fundamental
> as Path ought to have a pretty well-defined meaning, and this does not
> as far as I can tell.  There was certainly no documentation worthy of
> the name added by the above commit.  What is the semantic difference
> between a Path with this flag set and the identical Path without?
> Likewise for struct Plan?

/me crawls into a hole.

Well, funny story about that.  The parallel sequential scan code
forgot to make that flag actually do what it was budgeted to do, but
it turns out not to matter as long as the only thing that can be done
in parallel is a sequential scan.  So there's no live bug right now,
and the parallel join patch adds the missing code to make that flag
actually meaningful.  So I'm not very surprised that you found the
current state of things stupid.  The reason it's needed is that the
parallel join patch generates plans like this:

Gather
-> Hash Join
  -> Parallel Seq Scan
  -> Hash
-> Seq Scan

The idea of this plan is that each worker builds a hash table on the
inner rel, and then reads a subset of the outer rel and performs the
join for the subset of rows which it reads.  Then the Gather brings
all the worker results together in the leader.  Well, when I initially
coded this, it wasn't working properly, and I couldn't for the life of
me figure out why.

Eventually, after hours of debugging, I realized that while I'd added
this parallel_aware flag with the idea of distinguishing the
outer seq scan (which needs to be split between the workers) from the
inner seq scan (which needs to be executed to completion by every
worker), there was nothing in the code which would actually cause it to
work like that.  The inner seq scan was behaving as if it were
parallel - that is, it returned only a subset of the rows in each
worker - even though the parallel_aware flag was not set.  I had
thought that there was some logic in execParallel.c or, uh, somewhere,
which would give that flag meaning but it turns out that, in master,
there isn't.  So the parallel join patch, which is otherwise just an
optimizer patch, also adds the missing bits to execParallel.c.  Maybe
I should have committed that separately as a bug fix.

To try to clarify a little further, execParallel.c contains three
functions that make passes over the PlanState tree.  The first one is
made before creating the parallel DSM, and its purpose is to estimate
the amount of space that each plan node will require in the DSM (it
can overestimate, but it can't underestimate).  The second one is made
after creating the parallel DSM, and gives plan nodes a chance to
initialize the space they estimated they would use (or some lesser
amount).  The third one is made in each worker, and gives plan nodes a
chance to fish out a pointer to the space previously initialized in
the master.  All of this processing is of course skipped for plan
nodes that have no parallel smarts.  But the parallel-aware flag is
supposed to allow these steps to be skipped even for a node that does
have parallel smarts, so that those smarts can in effect be shut off
when they're not desired.

Of course, it would be possible to rejigger things to avoid having
this flag in every path.  Currently, SeqScans are the only plan type
that is ever parallel-aware, so we could put the flag in only that
path type.  However, I expect to want to expand the set of
parallel-aware paths pretty soon.  In particular, I want to add at
least AppendPath and IndexPath to the list.
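
In code form, what "giving the flag meaning" amounts to is roughly this - a
simplified sketch, not the actual execParallel.c walker, and
estimate_node_dsm_space() is a hypothetical stand-in for the per-node-type
estimate function:

static bool
sketch_estimate_walker(PlanState *planstate, ParallelContext *pcxt)
{
    if (planstate == NULL)
        return false;

    /* Only nodes the planner marked parallel-aware get per-node DSM space. */
    if (planstate->plan->parallel_aware)
        estimate_node_dsm_space(planstate, pcxt);   /* hypothetical helper */

    return planstate_tree_walker(planstate, sketch_estimate_walker, pcxt);
}

The initialize-DSM and initialize-worker passes would test the flag the same
way.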

Re: [HACKERS] Size of Path nodes

2015-12-04 Thread Jim Nasby

On 12/4/15 11:50 AM, Tom Lane wrote:

which means Robert has already blown the planner's space consumption out
by close to a factor of 2, and I should stop worrying.  Or else he should
start worrying.  Do we really need this field?  Did anyone pay any
attention to what happened to planner space consumption and performance
when the parallel-scan patch went in?  Or am I just too conditioned by
working with last-century hardware, and I should stop caring about how
large these nodes are?


I suspect Cachegrind[1] would answer a lot of these questions (though 
I've never actually used it). I can't get postgres to run under valgrind 
on my laptop, but maybe someone that's been successful at valgrind can 
try cachegrind (It's just another mode of valgrind).


[1] http://valgrind.org/docs/manual/cg-manual.html
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Remaining 9.5 open items

2015-12-04 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Stephen Frost wrote:
> > Still, I'll get a patch worked up for it and then we can discuss the
> > merits of that patch going in to 9.5 now versus just into HEAD.
> 
> Cool.

While working on the DROP OWNED BY patch (which is part of what took me
a bit longer with it), I came to the realization that ALTER POLICY wasn't
handling dependencies quite right.  All of the policy's dependencies
would be dropped, but then only those objects referred to in the ALTER
POLICY command would have dependencies recreated for them.
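
A made-up example of the sort of sequence that exposes it (not from the
regression tests):

-- hypothetical table and roles
CREATE POLICY p1 ON documents TO alice USING (owner = current_user);
-- Changing only the roles dropped *all* of the policy's recorded
-- dependencies, but rebuilt only the dependency on the new role; the
-- dependency on the USING expression's objects was silently lost.
ALTER POLICY p1 ON documents TO bob;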

The attached patch fixes that (using the same approach that I used in
the DROP OWNED BY patch).

Comments welcome, as always.

I'll plan to apply these two patches in a couple of days.

Thanks!

Stephen
From c3d396f0ca6b5caedb96cd3d00f15e271fda08a3 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Fri, 4 Dec 2015 14:01:23 -0500
Subject: [PATCH] Handle dependencies properly in ALTER POLICY

ALTER POLICY hadn't fully considered partial policy alteration
(e.g., changing just the roles on the policy, or just one of
the expressions) when rebuilding the dependencies.  Instead, it
would happily remove all dependencies which existed for the
policy and then only recreate the dependencies for the objects
referred to in the specific ALTER POLICY command.

Correct that by extracting and building the dependencies for all
objects referenced by the policy, regardless of whether they were
provided as part of the ALTER POLICY command or were already in
place as part of the pre-existing policy.
---
 src/backend/commands/policy.c | 97 +++
 src/test/regress/expected/rowsecurity.out | 43 ++
 src/test/regress/sql/rowsecurity.sql  | 31 ++
 3 files changed, 171 insertions(+)

diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c
index 8851fe7..2db8125 100644
--- a/src/backend/commands/policy.c
+++ b/src/backend/commands/policy.c
@@ -766,6 +766,35 @@ AlterPolicy(AlterPolicyStmt *stmt)
 		replaces[Anum_pg_policy_polroles - 1] = true;
 		values[Anum_pg_policy_polroles - 1] = PointerGetDatum(role_ids);
 	}
+	else
+	{
+		Oid*roles;
+		Datum		roles_datum;
+		bool		attr_isnull;
+		ArrayType  *policy_roles;
+
+		/*
+		 * We need to pull the set of roles this policy applies to from
+		 * what's in the catalog, so that we can recreate the dependencies
+		 * correctly for the policy.
+		 */
+
+		roles_datum = heap_getattr(policy_tuple, Anum_pg_policy_polroles,
+   RelationGetDescr(pg_policy_rel),
+   &attr_isnull);
+		Assert(!attr_isnull);
+
+		policy_roles = DatumGetArrayTypePCopy(roles_datum);
+
+		roles = (Oid *) ARR_DATA_PTR(policy_roles);
+
+		nitems = ARR_DIMS(policy_roles)[0];
+
+		role_oids = (Datum *) palloc(nitems * sizeof(Datum));
+
+		for (i = 0; i < nitems; i++)
+			role_oids[i] = ObjectIdGetDatum(roles[i]);
+	}
 
 	if (qual != NULL)
 	{
@@ -773,6 +802,40 @@ AlterPolicy(AlterPolicyStmt *stmt)
 		values[Anum_pg_policy_polqual - 1]
 			= CStringGetTextDatum(nodeToString(qual));
 	}
+	else
+	{
+		Datum	value_datum;
+		bool	attr_isnull;
+
+		/*
+		 * We need to pull the USING expression and build the range table for
+		 * the policy from what's in the catalog, so that we can recreate
+		 * the dependencies correctly for the policy.
+		 */
+
+		/* Check if the policy has a USING expr */
+		value_datum = heap_getattr(policy_tuple, Anum_pg_policy_polqual,
+   RelationGetDescr(pg_policy_rel),
+   &attr_isnull);
+		if (!attr_isnull)
+		{
+			char	   *qual_value;
+			ParseState *qual_pstate;
+
+			/* parsestate is built just to build the range table */
+			qual_pstate = make_parsestate(NULL);
+
+			qual_value = TextDatumGetCString(value_datum);
+			qual = stringToNode(qual_value);
+
+			/* Add this rel to the parsestate's rangetable, for dependencies */
+			addRangeTableEntryForRelation(qual_pstate, target_table, NULL,
+		  false, false);
+
+			qual_parse_rtable = qual_pstate->p_rtable;
+			free_parsestate(qual_pstate);
+		}
+	}
 
 	if (with_check_qual != NULL)
 	{
@@ -780,6 +843,40 @@ AlterPolicy(AlterPolicyStmt *stmt)
 		values[Anum_pg_policy_polwithcheck - 1]
 			= CStringGetTextDatum(nodeToString(with_check_qual));
 	}
+	else
+	{
+		Datum	value_datum;
+		bool	attr_isnull;
+
+		/*
+		 * We need to pull the WITH CHECK expression and build the range table
+		 * for the policy from what's in the catalog, so that we can recreate
+		 * the dependencies correctly for the policy.
+		 */
+
+		/* Check if the policy has a WITH CHECK expr */
+		value_datum = heap_getattr(policy_tuple, Anum_pg_policy_polwithcheck,
+   RelationGetDescr(pg_policy_rel),
+   &attr_isnull);
+		if (!attr_isnull)
+		{
+			char	   *with_check_value;
+			ParseState *with_check_pstate;
+
+			/* parsestate is built just to build the range table */
+			with_check_pstate = make_parsestate(NULL);
+
+			with_check_value 

[HACKERS] remapped localhost causes connections to localhost to fail using Postgres

2015-12-04 Thread Dann Corbit
Using a Windows computer, editing the file:
C:\WINDOWS\system32\drivers\etc\hosts
the localhost entry was remapped to the machine name by adding the following 
line:
127.0.0.1   

After this change, Postgres would not allow access using the address localhost.
Only using the machine name to access the database was possible.
Is this by design?

In other words
psql -h localhost
fails, but:
psql -h 
succeeds.



Re: [HACKERS] remapped localhost causes connections to localhost to fail using Postgres

2015-12-04 Thread Bill Moran
On Fri, 4 Dec 2015 21:55:23 +
Dann Corbit  wrote:

> Using a Windows computer, editing the file:
> C:\WINDOWS\system32\drivers\etc\hosts
> the localhost entry was remapped to the machine name by adding the following 
> line:
> 127.0.0.1   
> 
> After this change, Postgres would not allow access using the address 
> localhost.
> Only using the machine name to access the database was possible.
> Is this by design?
> 
> In other words
> psql -h localhost
> fails, but:
> psql -h 
> succeeds.

This has nothing to do with Postgres. This is the behavior of the operating
system. I expect that you'll find that no program on the machine is able to
connect using the string "localhost" any more. psql is simply using the
OS' builtin name resolution, and you told that name resolution that there
is no longer a machine named localhost.

On POSIX systems, it's customary to put multiple names in the hosts file,
localhost included. I expect that Windows is the same, but I haven't used
it in so long that I can't be sure.
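
For illustration only (the extra name is whatever your machine is called),
the usual shape of the line is:

127.0.0.1   localhost   yourmachinename

That keeps "localhost" resolving while also adding the machine name.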

-- 
Bill Moran




Re: [HACKERS] Size of Path nodes

2015-12-04 Thread Peter Geoghegan
On Fri, Dec 4, 2015 at 2:44 PM, Jim Nasby  wrote:
> I suspect Cachegrind[1] would answer a lot of these questions (though I've
> never actually used it). I can't get postgres to run under valgrind on my
> laptop, but maybe someone that's been successful at valgrind can try
> cachegrind (It's just another mode of valgrind).

I've used Cachegrind, and think it's pretty good. You still need a
test case that exercises what you're interested in, though.
Distributed costs are really hard to quantify. Sometimes that's
because they don't exist, and sometimes it's because they can only be
quantified as part of a value judgement.

As frustrated as I've sometimes been with those discussions, I do
recognize that there has to be a middle ground, and that the emphasis
on distributed costs has as much to do with fairness for every
contributor as anything else. I would have appreciated some attempt to
quantify the overhead here, but would not have insisted on
Robert being as thorough as he conceivably could have been.

-- 
Peter Geoghegan




Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-12-04 Thread Robert Haas
On Tue, Dec 1, 2015 at 10:20 PM, Etsuro Fujita
 wrote:
> One thing I can think of is that we can keep both the structure of a
> ForeignPath node and the API of create_foreignscan_path as-is.  The latter
> is a good thing for FDW authors.  And IIUC the patch you posted today, I
> think we could make create_foreignscan_plan a bit simpler too.  Ie, in your
> patch, you modified that function as follows:
>
> @@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
>      */
>     scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
>                                                 best_path,
> -                                               tlist, scan_clauses);
> +                                               tlist,
> +                                               scan_clauses);
> +   outerPlan(scan_plan) = fdw_outerplan;
>
> I think that would be OK, but I think we would have to do a bit more here
> about the fdw_outerplan's targetlist and qual; I think that the targetlist
> needs to be changed to fdw_scan_tlist, as in the patch [1], and that it'd be
> better to change the qual to remote conditions, ie, quals not in the
> scan_plan's scan.plan.qual, to avoid duplicate evaluation of local
> conditions.  (In the patch [1], I didn't do anything about the qual because
> the current postgres_fdw join pushdown patch assumes that all the the
> scan_plan's scan.plan.qual are pushed down.) Or, FDW authors might want to
> do something about fdw_recheck_quals for a foreign-join while creating the
> fdw_outerplan.  So if we do that during GetForeignPlan, I think we could
> make create_foreignscan_plan a bit simpler, or provide flexibility to FDW
> authors.

It's certainly true that we need the alternative plan's tlist to match
that of the main plan; otherwise, it's going to be difficult for the
FDW to make use of that alternative subplan to fill its slot, which is
kinda the point of all this.  However, I'm quite reluctant to
introduce code into create_foreignscan_plan() that forces the
subplan's tlist to match that of the main plan.  For one thing, that
would likely foreclose the possibility of an FDW ever using the outer
plan for any purpose other than EPQ rechecks.  It may be hard to
imagine what else you'd do with the outer plan as things are today,
but right now the two halves of the patch - letting FDWs have an outer
subplan, and providing them with a way of overriding the EPQ recheck
behavior - are technically independent.  Putting tlist-altering
behavior into create_foreignscan_plan() ties those two things together
irrevocably.

Instead, I think we should go the opposite direction and pass the
outerplan to GetForeignPlan after all.  I was lulled into a full sense
of security by the realization that every FDW that uses this feature
MUST want to do outerPlan(scan_plan) = fdw_outerplan.  That's true,
but irrelevant.  The point is that the FDW might want to do something
additional, like frob the outer plan's tlist, and it can't do that if
we don't pass it fdw_outerplan.  So we should do that, after all.
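
To make that concrete, here is a rough sketch - not the attached patch, and
the FDW name is invented - of what a GetForeignPlan implementation looks
like with the extra argument, mirroring the file_fdw changes below:

/* Hypothetical FDW, written against the revised hook signature. */
static ForeignScan *
blackhole_GetForeignPlan(PlannerInfo *root,
						 RelOptInfo *baserel,
						 Oid foreigntableid,
						 ForeignPath *best_path,
						 List *tlist,
						 List *scan_clauses,
						 Plan *outer_plan)
{
	/* Strip the RestrictInfo nodes, as FDWs normally do. */
	scan_clauses = extract_actual_clauses(scan_clauses, false);

	/*
	 * An FDW that wants to use the subplan for EPQ rechecks could adjust
	 * its targetlist here, before handing it to make_foreignscan().
	 */
	return make_foreignscan(tlist,
							scan_clauses,
							baserel->relid,
							NIL,				/* no expressions to evaluate */
							best_path->fdw_private,
							NIL,				/* no custom tlist */
							NIL,				/* no remote quals */
							outer_plan);		/* EPQ recheck subplan */
}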

Updated patch attached.  This fixes a couple of whitespace issues that
were pointed out, also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 5ce8f90..83bbfa1 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -121,7 +121,8 @@ static ForeignScan *fileGetForeignPlan(PlannerInfo *root,
    Oid foreigntableid,
    ForeignPath *best_path,
    List *tlist,
-   List *scan_clauses);
+   List *scan_clauses,
+   Plan *outer_plan);
 static void fileExplainForeignScan(ForeignScanState *node, ExplainState *es);
 static void fileBeginForeignScan(ForeignScanState *node, int eflags);
 static TupleTableSlot *fileIterateForeignScan(ForeignScanState *node);
@@ -525,6 +526,7 @@ fileGetForeignPaths(PlannerInfo *root,
 	 total_cost,
 	 NIL,		/* no pathkeys */
 	 NULL,		/* no outer rel either */
+	 NULL,		/* no extra plan */
 	 coptions));
 
 	/*
@@ -544,7 +546,8 @@ fileGetForeignPlan(PlannerInfo *root,
    Oid foreigntableid,
    ForeignPath *best_path,
    List *tlist,
-   List *scan_clauses)
+   List *scan_clauses,
+   Plan *outer_plan)
 {
 	Index		scan_relid = baserel->relid;
 
@@ -564,7 +567,8 @@ fileGetForeignPlan(PlannerInfo *root,
 			NIL,	/* no expressions to evaluate */
 			best_path->fdw_private,
 			NIL,	/* no custom tlist */
-			NIL /* no remote quals */ );
+			NIL,	/* no remote quals */
+			outer_plan);
 }
 
 /*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a6ba672..9a014d4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -214,7 +214,8 @@ static ForeignScan *postgresGetForeignPlan(PlannerInfo *root,
 	   Oid foreigntableid,
 	   ForeignPath *best_path,
 	   List *tlist,
-	   List *scan_clauses);
+	   List *scan_clauses,
+	   Plan 

Re: [HACKERS] dynloader.h missing in prebuilt package for Windows?

2015-12-04 Thread Chapman Flack
On 12/04/15 12:56, Bruce Momjian wrote:
> 
> OK, good logical reason to install dynloader.h on Windows.

Ah!  Thanks.  I was starting to wonder whether I had said something wrong
and been sent to the bit bucket within my first two -hackers posts. :)

> What do we need to do to close this item?  What versions of Windows
> installers are missing dynloader.h?  Mingw?  MSVC?  EDB's?  OpenSCG?
> Postgres Pro (Russian)?

I am too new around here to be able to supply good answers myself to
those questions.  I keep learning though.  For example, I have just
learned that OpenSCG and Postgres Pro (Russian) are things, and (by
inference) that they run on Windows. :)

In working on PL/Java, as a non-Windows dev myself, I have been
blessed to find Ken Olson willing to build my WIP branches in MSVC
and tell me what I've busted. I think he's building against an EDB
installation of PostgreSQL, so that's the combination I am least
ignorant about. I know that mingw has also been used, but I haven't
yet made a connection with someone who can tell me what I'm breaking
for that build.

> Is this a change we need to make on the server end and then all the
> installers will automatically install the file?  It is present in all
> Unix-like installs?

AFAICS, it must at one time have been envisioned that sometimes
extension code might like a nice dynloader; the whole way that
backend/port/dynloader contains a version for every platform, and
the configure script links the appropriate one into src/include with
all the other .h files that would go into a -devel package, makes me
_think_ that it'd be present in any install that used configure in
the build. Anyone building a -devel package would have to go out
of their way to exclude it but still package the other .h files.

So I'd guess that Windows builds that don't use configure are probably
the odd players out. But I don't have any contacts or name recognition
to approach { EDB, OpenSCG, Postgres Pro } and find out how their
internal build processes work, or whether they all end up lacking
the file, or whether there is any single change that could be made
in the PG source tree so that their builds would then all include it.

>  Also, I am very glad you are working on PL/Java.  :-)

Thanks!  It has been interesting trying to get up to speed, both on
how it technically works, and also on the development history, why it
lives out-of-tree, and so on. (That question had puzzled me for a long
time, and when I finally found the long 2006 -hackers thread,
http://www.postgresql.org/message-id/44b3952b.2080...@tada.se
I was like your genealogy-obsessed aunt finding a trunk in the attic. :)

I can see that a lot of the considerations raised in that thread, both
ways, are still pertinent today, plus with nine more years behind us
to see how things have actually played out. Tom Lane was concerned about
what would happen if Thomas's time for maintenance became scarce, and
what seems to happen is someone like Johann Oskarsson, or yours truly,
will step up, flounder a bit to get over the learning curve, and then
carry the ball some distance. There is a bit of a barrier to entry,
because PostgreSQL and Java are each large and deep and PL/Java has
both, and for the future vigor of the project it seems important to
find the ways to minimize that barrier. (I know I'm getting OT from
dynloader here, but this other stuff has been on my mind a while.)

That doesn't require reopening the in-tree question necessarily
(I saw that Tom Lane was concerned about a buildfarm going all red
because of a problem too few people could easily fix), but it would
probably be very helpful to at least have some kind of _associated
buildfarm_ so the main board might not go all red, but it would still
be easy to see right away if a change was going to affect important
out-of-tree components.

That seems to have been a worthwhile idea nine years ago (I read that
Andrew Dunstan _almost_ found the time to set it up then), and
still today something like that would be very helpful. PL/Java still needs
work on an easily-automatable suite of tests (that much, I'm working on),
and once that's in place, the next obvious and valuable step would be to get
it building continuously on the several major platforms, so it can turn red
on some board that the team can easily see. I might ask for some help or
suggestions on what are the currently most-favored ways to do that.

I think that with more automated testing and CI, the barrier to entry on
PL/Java contributions will be a lot lower, because it is much less
intimidating to start poking at something when it is easy to see what
happens.

But that's all food for future thought ... this thread's about dynloader ...

-Chap




Re: [HACKERS] Size of Path nodes

2015-12-04 Thread Jim Nasby

On 12/4/15 5:14 PM, Peter Geoghegan wrote:

On Fri, Dec 4, 2015 at 2:44 PM, Jim Nasby  wrote:

>I suspect Cachegrind[1] would answer a lot of these questions (though I've
>never actually used it). I can't get postgres to run under valgrind on my
>laptop, but maybe someone that's been successful at valgrind can try
>cachegrind (It's just another mode of valgrind).

I've used Cachegrind, and think it's pretty good. You still need a
test case that exercises what you're interested in, though.
Distributed costs are really hard to quantify. Sometimes that's
because they don't exist, and sometimes it's because they can only be
quantified as part of a value judgement.


If we had a good way to run cachegrind (and maybe if it was run 
automatically somewhere) then at least we'd know what effect a patch had 
on things. (For those not familiar, there is a tool for diffing two 
cachegrind runs.) Knowing is half the battle and all that.
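
Roughly (paths and flags from memory, so treat this as a sketch rather than
a recipe):

# run the workload under cachegrind, once per build
valgrind --tool=cachegrind --cachegrind-out-file=before.out \
    postgres --single -D /path/to/datadir regression < queries.sql

# rebuild with the patch, repeat into after.out, then diff and annotate
cg_diff before.out after.out > delta.out
cg_annotate delta.out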


Another interesting possibility would be a standardized perf test. [1] 
makes an argument for that.


Maybe a useful way to set stuff like this up would be to create support 
for things to run in travis-ci. Time-based tools presumably would be 
useless, but something doing analysis like cachegrind would probably be 
OK (though I think they do cap test runs to an hour or something).


[1] https://news.ycombinator.com/item?id=8426302
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Size of Path nodes

2015-12-04 Thread Robert Haas
On Fri, Dec 4, 2015 at 6:14 PM, Peter Geoghegan  wrote:
> As frustrated as I've sometimes been with those discussions, I do
> recognize that there has to be a middle ground, and that the emphasis
> on distributed costs has as much to do with fairness for every
> contributor as anything else. I would have appreciated some attempt to
> have quantified the overhead here, but would not have insisted on
> Robert being as thorough as he conceivably could have been.

Honestly, I really never thought of the issue of whether this would
push some structure over a power-of-two size.  It's not like either
Path or IndexPath have comments saying "for the love of heaven, don't
make this bigger".  This is in contrast to, say, PGXACT, where there
is such a comment, because I knew it was an issue and I added a
comment explaining that issue.  Now maybe I should have foreseen the
objection anyway, but, you know, it's hard to foresee everything
somebody might object to.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Size of Path nodes

2015-12-04 Thread Amit Kapila
On Sat, Dec 5, 2015 at 2:30 AM, Robert Haas  wrote:
>
> On Fri, Dec 4, 2015 at 12:50 PM, Tom Lane  wrote:
> > So over in my private branch where I've been fooling around with upper
> > planner pathification, I've been sweating bullets to avoid enlarging the
> > size of struct Path, because I was aware that common path types like
> > IndexPath and AppendPath were already a power-of-2 size (at least on
> > 64-bit machines), meaning that palloc'ing them would cost twice as much
> > space if I added any fields.
> >
> > When I got around to merging up to HEAD, I found this in commit
f0661c4e8:
> >
>
> > While I'm bitching about this: a field added to a struct as fundamental
> > as Path ought to have a pretty well-defined meaning, and this does not
> > as far as I can tell.  There was certainly no documentation worthy of
> > the name added by the above commit.  What is the semantic difference
> > between a Path with this flag set and the identical Path without?
> > Likewise for struct Plan?
>
> /me crawls into a hole.
>
> Well, funny story about that.  The parallel sequential scan code
> forgot to make that flag actually do what it was intended to do, but
> it turns out not to matter as long as the only thing that can be done
> in parallel is a sequential scan.  So there's no live bug right now,
> and the parallel join patch adds the missing code to make that flag
> actually meaningful.  So I'm not very surprised that you found the
> current state of things stupid.  The reason it's needed is that the
> parallel join patch generates plans like this:
>
> Gather
> -> Hash Join
>   -> Parallel Seq Scan
>   -> Hash
> -> Seq Scan
>
> The idea of this plan is that each worker builds a hash table on the
> inner rel, and then reads a subset of the outer rel and performs the
> join for the subset of rows which it reads.  Then the Gather brings
> all the worker results together in the leader.  Well, when I initially
> coded this, it wasn't working properly, and I couldn't for the life of
> me figure out why.
>
> Eventually, after hours of debugging, I realized that while I'd added
> this parallel_aware flag with the idea of distinguishing the
> outer seq scan (which needs to be split between the workers) from the
> inner seq scan (which needs to be executed to completion by every
> worker), there was nothing in the code which would actually cause it to
> work like that.  The inner seq scan was behaving as if it were
> parallel - that is, it returned only a subset of the rows in each
> worker - even though the parallel_aware flag was not set.  I had
> thought that there was some logic in execParallel.c or, uh, somewhere,
> which would give that flag meaning but it turns out that, in master,
> there isn't.  So the parallel join patch, which is otherwise just an
> optimizer patch, also adds the missing bits to execParallel.c.  Maybe
> I should have committed that separately as a bug fix.
>
> To try to clarify a little further, execParallel.c contains three
> functions that make passes over the PlanState tree.  The first one is
> made before creating the parallel DSM, and its purpose is to estimate
> the amount of space that each plan node will require in the DSM (it
> can overestimate, but it can't underestimate).  The second one is made
> after creating the parallel DSM, and gives plan nodes a chance to
> initialize the space they estimated they would use (or some lesser
> amount).  The third one is made in each worker, and gives plan nodes a
> chance to fish out a pointer to the space previously initialized in
> the master.  All of this processing is of course skipped for plan
> nodes that have no parallel smarts.  But the parallel-aware flag is
> supposed to allow these steps to be skipped even for a node that does
> have parallel smarts, so that those smarts can in effect be shut off
> when they're not desired.
>
>
> Of course, it would be possible to rejigger things to avoid having
> this flag in every path.  Currently, SeqScans are the only plan type
> that is ever parallel-aware, so we could put the flag in only that
> path type.  However, I expect to want to expand the set of
> parallel-aware paths pretty soon.  In particular, I want to add at
> least AppendPath and IndexPath to the list.
>

To add to whatever has been said above, the intention of adding that flag
was also to avoid adding new nodes for parallelism: basically, modify the
existing nodes (like SeqScan) to take care of whatever is needed for
parallel execution.  Now fortunately, not much is needed for Seq Scan
(except for the explain part and distinguishing backward scans), but I
think that will not be the case going forward when we need to make other
nodes like Append parallel-aware.  The need and idea for this flag were
proposed and discussed on the parallel seq scan thread [1].


[1] -
http://www.postgresql.org/message-id/CA+Tgmoa4=KaObDcnvADrKY5WwEWDN+rnGAfjzrc=d3ug0sz...@mail.gmail.com

With Regards,

Re: [HACKERS] Rework the way multixact truncations work

2015-12-04 Thread Noah Misch
On Thu, Dec 03, 2015 at 07:03:21PM +0100, Andres Freund wrote:
> On 2015-12-03 04:38:45 -0500, Noah Misch wrote:
> > On Wed, Dec 02, 2015 at 04:46:26PM +0100, Andres Freund wrote:
> > > Especially if reverting and redoing includes conflicts that mainly
> > > increases the chance of accidental bugs.
> > 
> > True.  (That doesn't apply to these patches.)
> 
> Uh, it does. You had conflicts in your process, and it's hard to verify
> that the re-applied patch is actually functionally identical given the
> volume of changes. It's much easier to see what actually changes by
> looking at iterative commits forward from the current state.

Ah, we were talking about different topics after all.  I was talking about
_merge_ conflicts in a reversion commit.

> Sorry, but I really just want to see these changes as iterative patches
> ontop of 9.5/HEAD instead of this process. I won't revert the reversion
> if you push it anyway, but I think it's a rather bad idea.

I hear you.  I evaluated your request and judged that the benefits you cited
did not make up for the losses I cited.  Should you wish to change my mind,
your best bet is to find defects in the commits I proposed.  If I introduced
juicy defects, that discovery would lend much weight to your conjectures.

> My question was more about the comment being after the "early return"
> than about the content change, should have made that clearer. Can we
> just move your comment up?

Sure, I will.

> > > >  static bool
> > > >  SetOffsetVacuumLimit(void)
> > > >  {
> > > > -   MultiXactId oldestMultiXactId;
> > > > +   MultiXactId oldestMultiXactId;
> > > > MultiXactId nextMXact;
> > > > -   MultiXactOffset oldestOffset = 0;   /* placate compiler */
> > > > -   MultiXactOffset prevOldestOffset;
> > > > +   MultiXactOffset oldestOffset = 0;   /* placate 
> > > > compiler */
> > > > MultiXactOffset nextOffset;
> > > > boololdestOffsetKnown = false;
> > > > +   MultiXactOffset prevOldestOffset;
> > > > boolprevOldestOffsetKnown;
> > > > -   MultiXactOffset offsetStopLimit = 0;
> > > 
> > > I don't see the benefit of the order changes here.
> > 
> > I reacted the same way.  Commit 4f627f8 reordered some declarations, which I
> > reverted when I refinished that commit as branch mxt3-main.
> 
> But the other changes are there, and in the history anyway. As the new
> order isn't more meaningful than the current one...

Right.  A revert+redo patch series can and should purge formatting changes
that did not belong in its predecessor commits.  Alternate change delivery
strategies wouldn't do that.

> > > > -   if (oldestOffsetKnown)
> > > > -   ereport(DEBUG1,
> > > > -   (errmsg("oldest MultiXactId 
> > > > member is at offset %u",
> > > > -   oldestOffset)));
> > > 
> > > That's imo a rather useful debug message.
> > 
> > The branches emit that message at the same times 4f627f8^ and earlier emit 
> > it.
> 
> During testing I found it rather helpful if it was emitted regularly.

I wouldn't oppose a patch making it happen more often.

> > > > LWLockRelease(MultiXactTruncationLock);
> > > > 
> > > > /*
> > > > -* If we can, compute limits (and install them MultiXactState) 
> > > > to prevent
> > > > -* overrun of old data in the members SLRU area. We can only do 
> > > > so if the
> > > > -* oldest offset is known though.
> > > > +* There's no need to update anything if we don't know the 
> > > > oldest offset
> > > > +* or if it hasn't changed.
> > > >  */
> > > 
> > > Is that really a worthwhile optimization?
> > 
> > I would neither remove that longstanding optimization nor add it from 
> > scratch
> > today.  Branch commit 06c9979 restored it as part of a larger restoration to
> > the pre-4f627f8 structure of SetOffsetVacuumLimit().
> 
> There DetermineSafeOldestOffset() did it unconditionally.

That is true; one won't be consistent with both.  06c9979 materially shortened
the final patch and eliminated some user-visible message emission changes.
Moreover, this is clearly a case of SetOffsetVacuumLimit() absorbing
DetermineSafeOldestOffset(), not vice versa.

> > > > -SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, 
> > > > void *data)
> > > 
> > > That really seems like an independent change, deserving its own commit +
> > > explanation.
> > 
> > Indeed.  I explained that change at length in
> > http://www.postgresql.org/message-id/20151108085407.ga1097...@tornado.leadboat.com,
> > including that it's alone on a branch (mxt1-disk-independent), to become its
> > own PostgreSQL commit.
> 
> The comment there doesn't include the explanation...

If you visit that URL, everything from "If anything here requires careful
study, it's the small mxt1-disk-independent change, which 

[HACKERS] libxml2 2.9.3 breaks xml test output

2015-12-04 Thread Christian Ullrich

Hello,

I just noticed that last night all built branches failed on my buildfarm 
animal, jaguarundi. They all failed on the "xml" test, and the output is 
essentially the same everywhere:


***
*** 9,16 
  LINE 1: INSERT INTO xmltest VALUES (3, '');

xmlconcat
  --

The last thing I did in regard to XML on this system was upgrade libxml2 
from 2.9.2 to 2.9.3. I bisected it a bit and the guilty commit to 
libxml2 is ce0b0d0d8 "Do not print error context when there is none" 
from about two weeks ago.


I have zero experience with libxml2, so no idea if the previous context 
level can be turned on again. IMHO, the libxml2 change is a bug in 
itself; PostgreSQL's error messages are more readable with the XML text 
in them.


--
Christian





Re: [HACKERS] libxml2 2.9.3 breaks xml test output

2015-12-04 Thread Michael Paquier
On Sat, Dec 5, 2015 at 4:38 PM, Christian Ullrich  wrote:
> I have zero experience with libxml2, so no idea if the previous context
> level can be turned on again. IMHO, the libxml2 change is a bug in itself;
> PostgreSQL's error messages are more readable with the XML text in them.

See here, for example; you are not the only one to see this problem:
http://www.postgresql.org/message-id/cafj8pra4xjqfgnqcqmcygx-umgmr3stt3xfeuw7kbsoiovg...@mail.gmail.com
https://bugzilla.redhat.com/show_bug.cgi?id=1286692
--
Michael




[HACKERS] Size of Path nodes

2015-12-04 Thread Tom Lane
So over in my private branch where I've been fooling around with upper
planner pathification, I've been sweating bullets to avoid enlarging the
size of struct Path, because I was aware that common path types like
IndexPath and AppendPath were already a power-of-2 size (at least on
64-bit machines), meaning that palloc'ing them would cost twice as much
space if I added any fields.

When I got around to merging up to HEAD, I found this in commit f0661c4e8:

@@ -753,6 +753,7 @@ typedef struct Path
 
 RelOptInfo *parent;/* the relation this path can build */
 ParamPathInfo *param_info; /* parameterization info, or NULL if none */
+boolparallel_aware;/* engage parallel-aware logic? */
 
 /* estimated size/costs for path (see costsize.c for more info) */
 doublerows;/* estimated number of result tuples */

which means Robert has already blown the planner's space consumption out
by close to a factor of 2, and I should stop worrying.  Or else he should
start worrying.  Do we really need this field?  Did anyone pay any
attention to what happened to planner space consumption and performance
when the parallel-scan patch went in?  Or am I just too conditioned by
working with last-century hardware, and I should stop caring about how
large these nodes are?

While I'm bitching about this: a field added to a struct as fundamental
as Path ought to have a pretty well-defined meaning, and this does not
as far as I can tell.  There was certainly no documentation worthy of
the name added by the above commit.  What is the semantic difference
between a Path with this flag set and the identical Path without?
Likewise for struct Plan?

regards, tom lane




Re: [HACKERS] Size of Path nodes

2015-12-04 Thread Andres Freund
On 2015-12-04 12:50:09 -0500, Tom Lane wrote:
> So over in my private branch where I've been fooling around with upper
> planner pathification, I've been sweating bullets to avoid enlarging the
> size of struct Path, because I was aware that common path types like
> IndexPath and AppendPath were already a power-of-2 size (at least on
> 64-bit machines), meaning that palloc'ing them would cost twice as much
> space if I added any fields.
> 
> When I got around to merging up to HEAD, I found this in commit f0661c4e8:
> 
> @@ -753,6 +753,7 @@ typedef struct Path
>  
>  RelOptInfo *parent;/* the relation this path can build */
>  ParamPathInfo *param_info; /* parameterization info, or NULL if none 
> */
> +boolparallel_aware;/* engage parallel-aware logic? */
>  
>  /* estimated size/costs for path (see costsize.c for more info) */
>  doublerows;/* estimated number of result tuples */
> 
> which means Robert has already blown the planner's space consumption out
> by close to a factor of 2, and I should stop worrying.

FWIW, for me it's still <= 64 bytes:

struct Path {
NodeTagtype; /* 0 4 */
NodeTagpathtype; /* 4 4 */
RelOptInfo *   parent;   /* 8 8 */
ParamPathInfo *param_info;   /*16 8 */
bool   parallel_aware;   /*24 1 */

/* XXX 7 bytes hole, try to pack */

double rows; /*32 8 */
Cost   startup_cost; /*40 8 */
Cost   total_cost;   /*48 8 */
List * pathkeys; /*56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */

/* size: 64, cachelines: 1, members: 9 */
/* sum members: 57, holes: 1, sum holes: 7 */
};

> While I'm bitching about this: a field added to a struct as fundamental
> as Path ought to have a pretty well-defined meaning, and this does not
> as far as I can tell.  There was certainly no documentation worthy of
> the name added by the above commit.  What is the semantic difference
> between a Path with this flag set and the identical Path without?
> Likewise for struct Plan?

That part is obviously independently true, regardless of the size.

Andres




Re: [HACKERS] Remaining 9.5 open items

2015-12-04 Thread Tom Lane
Stephen Frost  writes:
> I noticed in passing that the role removal documentation should really
> also discuss shared objects (as the DROP OWNED BY reference page does).

If you're speaking of section 20.4, that text is all my fault ... but
I'm not clear on what you think needs to be added?  The first DROP OWNED
BY will take care of any privileges on shared objects, so I didn't really
think we need to burden the recipe with that detail.

regards, tom lane




Re: [HACKERS] Remaining 9.5 open items

2015-12-04 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > I noticed in passing that the role removal documentation should really
> > also discuss shared objects (as the DROP OWNED BY reference page does).
> 
> If you're speaking of section 20.4, that text is all my fault ... but
> I'm not clear on what you think needs to be added?  The first DROP OWNED
> BY will take care of any privileges on shared objects, so I didn't really
> think we need to burden the recipe with that detail.

Specifically this:


Once any valuable objects have been transferred to new owners, any
remaining objects owned by the role-to-be-dropped can be dropped with
the DROP OWNED command.  Again, this command can only access objects in
the current database, so it is necessary to run it in each database that
contains objects owned by the role.


Isn't quite right, as databases which are owned by the role you're
trying to get rid of won't be dropped.  The "Again," does pay it some
back-handed service but it felt to me like it'd be better if it was more
explicit about shared objects, which won't be dropped even if you do go
through and connect to each database and issue the command.  Perhaps
that's a bit excessive as, really, the only kinds of 'owned, shared'
objects currently are databases and maybe it's clear enough that you
have to manually drop databases owned by the role you are trying to
drop, if you don't reassign the ownership.  That's what I was referring
to, anyway.
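
Spelled out (names made up), the recipe ends up looking something like:

-- in each database that contains objects owned by doomed_role
DROP OWNED BY doomed_role;   -- also revokes its privileges on shared objects
-- databases owned by the role are NOT dropped by the above; they have to
-- be dropped (or have their ownership reassigned) explicitly
DROP DATABASE doomed_role_db;
-- only after that will this succeed
DROP ROLE doomed_role;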

Thanks!

Stephen




Re: [HACKERS] Size of Path nodes

2015-12-04 Thread Tom Lane
Andres Freund  writes:
> On 2015-12-04 12:50:09 -0500, Tom Lane wrote:
>> which means Robert has already blown the planner's space consumption out
>> by close to a factor of 2, and I should stop worrying.

> FWIW, for me it's still <= 64 bytes:

No, it's not bare Path I'm worried about, it's IndexPath, which went
from 128 bytes in previous versions to 136 in HEAD.  Likewise for
BitmapHeapPath, TidPath, ForeignPath, AppendPath, ResultPath,
MaterialPath, and possibly others.
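
To spell out the arithmetic for the archives: palloc rounds small requests
up to the next power of 2, so a minimal illustration of the behavior (not
the actual aset.c code) is:

static size_t
chunk_size(size_t request)
{
	size_t		size = 8;		/* smallest chunk palloc hands out */

	while (size < request)
		size <<= 1;
	return size;
}

chunk_size(128) == 128, but chunk_size(136) == 256: one extra byte in
IndexPath roughly doubles the space consumed by every IndexPath palloc'd.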

regards, tom lane




Re: [HACKERS] Remaining 9.5 open items

2015-12-04 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> If you're speaking of section 20.4, that text is all my fault ... but
>> I'm not clear on what you think needs to be added?  The first DROP OWNED
>> BY will take care of any privileges on shared objects, so I didn't really
>> think we need to burden the recipe with that detail.

> Specifically this:
> ...
> Isn't quite right, as databases which are owned by the role you're
> trying to get rid of won't be dropped.

Ah, good point.  I'll add something about that.  I'm not sure that we
should talk about shared objects in general, since as you say databases
are the only instance.  It would feel like handwaving I think.  The point
of that section IMO is to be as concrete as we can be about how to drop
a role.

regards, tom lane

