Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-21 Thread Paul Ramsey
On Tue, Jul 21, 2015 at 7:45 AM, Andres Freund and...@anarazel.de wrote:

 +
 + /* We need this relation to scan */
 + depRel = heap_open(DependRelationId, RowExclusiveLock);
 +
 + /* Scan the system dependency table for all entries this operator */
 + /* depends on, then iterate through and see if one of them */
 + /* is a registered extension */
 + ScanKeyInit(&key[0],
 +             Anum_pg_depend_objid,
 +             BTEqualStrategyNumber, F_OIDEQ,
 +             ObjectIdGetDatum(procnumber));
 +
 + scan = systable_beginscan(depRel, DependDependerIndexId, true,
 +                           GetCatalogSnapshot(depRel->rd_id), nkeys, key);
 +
 + while (HeapTupleIsValid(tup = systable_getnext(scan)))
 + {
 +     Form_pg_depend foundDep = (Form_pg_depend) GETSTRUCT(tup);
 +
 +     if (foundDep->deptype == DEPENDENCY_EXTENSION)
 +     {
 +         List *extlist = fpinfo->extensions;
 +         ListCell *ext;
 +
 +         foreach(ext, extlist)
 +         {
 +             Oid extension_oid = (Oid) lfirst(ext);
 +             if (foundDep->refobjid == extension_oid)
 +             {
 +                 nresults++;
 +             }
 +         }
 +     }
 +     if (nresults > 0) break;
 + }
 +
 + systable_endscan(scan);
 + relation_close(depRel, RowExclusiveLock);
 +
 + return nresults > 0;
 +}

 Phew. That's mighty expensive to do at this frequency.

 I guess it'll be more practical to expand this list once and then do a
 binary search on the result for the individual functions

So, right after reading the options in postgresGetForeignRelSize,
expand the extension list into a list of all ops/functions, in a
sorted list, and let that carry through to the deparsing instead? That
would happen once per query, right? But the deparsing also happens
once per query too, yes? Is the difference going to be that big? (I
speak not knowing the overhead of things like systable_beginscan, etc)

Thanks!

P




Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-21 Thread Andres Freund
On 2015-07-21 07:55:17 -0700, Paul Ramsey wrote:
 On Tue, Jul 21, 2015 at 7:45 AM, Andres Freund and...@anarazel.de wrote:
 So, right after reading the options in postgresGetForeignRelSize,
 expand the extension list into a list of all ops/functions, in a
 sorted list, and let that carry through to the deparsing instead?

I'd actually try to make it longer lived, i.e. permanently. And just
deallocate when a catcache callback says it needs to be invalidated;
IIRC there is a relevant cache.

 That
 would happen once per query, right? But the deparsing also happens
 once per query too, yes? Is the difference going to be that big? (I
 speak not knowing the overhead of things like systable_beginscan, etc)

What you have here is an O(nmembers * nexpressions) approach. And each
of those scans is going to do non-negligible work (an index scan of
pg_depend); that's fairly expensive.

- Andres




Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-21 Thread Andres Freund
On 2015-07-21 17:00:51 +0200, Andres Freund wrote:
 On 2015-07-21 07:55:17 -0700, Paul Ramsey wrote:
  On Tue, Jul 21, 2015 at 7:45 AM, Andres Freund and...@anarazel.de wrote:
  So, right after reading the options in postgresGetForeignRelSize,
  expand the extension list into a list of all ops/functions, in a
  sorted list, and let that carry through to the deparsing instead?
 
 I'd actually try to make it longer lived, i.e. permanently. And just
 deallocate when a catcache callback says it needs to be invalidated;
 IIRC there is a relevant cache.

On second thought I'd not use a binary search but a hash table. If you
choose the right key a single table is enough for the lookup.

If you need references for invalidations you might want to look for
CacheRegisterSyscacheCallback callers. E.g. attoptcache.c
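
As a minimal sketch of that approach (modeled loosely on attoptcache.c; the
type and function names below are illustrative, not from the patch): keep a
backend-local hash table of "is this function shippable?" verdicts keyed by
function OID, and throw the whole table away from a syscache invalidation
callback whenever the foreign server's options may have changed.

    #include "postgres.h"

    #include "utils/hsearch.h"
    #include "utils/inval.h"
    #include "utils/syscache.h"

    typedef struct ShippableCacheEntry
    {
        Oid         procid;     /* hash key: function/operator OID */
        bool        shippable;  /* cached verdict for this function */
    } ShippableCacheEntry;

    static HTAB *ShippableCacheHash = NULL;

    /* Flush all cached verdicts; the cache is rebuilt lazily on demand. */
    static void
    InvalidateShippableCacheCallback(Datum arg, int cacheid, uint32 hashvalue)
    {
        hash_destroy(ShippableCacheHash);
        ShippableCacheHash = NULL;
    }

    /* Called whenever a lookup finds the cache missing. */
    static void
    InitializeShippableCache(void)
    {
        HASHCTL     ctl;
        static bool callback_registered = false;

        MemSet(&ctl, 0, sizeof(ctl));
        ctl.keysize = sizeof(Oid);
        ctl.entrysize = sizeof(ShippableCacheEntry);
        ShippableCacheHash =
            hash_create("Shippable cache", 256, &ctl, HASH_ELEM | HASH_BLOBS);

        /* register the invalidation callback only once per backend */
        if (!callback_registered)
        {
            /* pg_foreign_server changes when server options are altered */
            CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
                                          InvalidateShippableCacheCallback,
                                          (Datum) 0);
            callback_registered = true;
        }
    }

With something like that in place, is_in_extension() would consult the hash
first and fall back to the pg_depend scan only on a cache miss.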




Re: [HACKERS] Selectivity estimation for intarray with @@

2015-07-21 Thread Heikki Linnakangas

On 07/21/2015 03:44 PM, Alexander Korotkov wrote:

While Uriy is on vacation, I've revised this patch a bit.


I whacked this around quite a bit, and I think it's in a committable 
state now. But if you could run whatever tests you were using before on 
this, to make sure it still produces the same estimates, that would be 
great. I didn't change the estimates it should produce, only the code 
structure.


One thing that bothers me slightly with this patch is the way it peeks 
into the Most-Common-Elements arrays, which are produced by the built-in
type analyze function. If we ever change what statistics are collected 
for arrays, or the way they are stored, this might break. In matchsel, 
why don't we just call the built-in estimator function for each element 
that we need to probe, and not look into the statistics ourselves at 
all? I actually experimented with that, and it did slash much of the 
code, and it would be more future-proof. However, it was also a lot 
slower for queries that contain multiple values. That's understandable: 
the built-in estimator will fetch the statistics tuple, parse the 
arrays, etc. separately for each value in the query_int, while this 
patch will do it only once for the whole query, and perform a simple 
binary search for each value. So overall, I think this is OK as it is. 
But if we find that we need to use the MCE list in this fashion in more 
places in the future, it might be worthwhile to add some support code 
for this in the backend to allow extracting the stats once, and doing 
multiple lightweight estimations using the extracted stats.
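
As a rough illustration of why the cached form is cheap (assuming the arrays
are laid out the way the built-in array analyze function produces them:
"values" sorted ascending with "freqs" parallel to it; the function name and
shape are mine, not the patch's):

    #include "postgres.h"
    #include "nodes/nodes.h"        /* for Selectivity */

    /* Look up one probe element among the Most-Common-Elements stats. */
    static Selectivity
    mce_elem_selec(Datum *values, float4 *freqs, int nmncelems,
                   int32 elem, float4 minfreq)
    {
        int     lo = 0,
                hi = nmncelems - 1;

        while (lo <= hi)
        {
            int     mid = (lo + hi) / 2;
            int32   midval = DatumGetInt32(values[mid]);

            if (midval == elem)
                return freqs[mid];      /* found: use observed frequency */
            else if (midval < elem)
                lo = mid + 1;
            else
                hi = mid - 1;
        }

        /* not a most-common element: assume it is rarer than any MCE */
        return minfreq / 2;
    }

Parsing the statistics tuple once and then running something like this per
query_int value is what makes the multi-value case fast.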


Some things I fixed/changed:

* I didn't like that transformOperator() function, which looked up the 
function's name. I replaced it with separate wrapper functions for each 
operator, so that the built-in operator's OID can be hardcoded into each.


* I refactored the matchsel function heavily. I think it's more readable 
now.


* I got rid of the Int4Freq array. It didn't seem significantly easier 
to work with than the separate values/numbers arrays, so I just used 
those directly.


* Also use the matchsel estimator for ~~ (the commutator of @@)

* Also use the estimators for the obsolete @ and ~ operators. Not that I 
care much about those as they are obsolete, but seems strange not to, as 
it's a trivial matter of setting the right estimator function.


* I added an ANALYZE in the regression test. It still won't 
systematically test all the cost estimation code, and there's nothing to 
check that the estimates make sense, but at least more of the code will 
now run.


- Heikki

diff --git a/contrib/intarray/Makefile b/contrib/intarray/Makefile
index 920c5b1..5ea7f2a 100644
--- a/contrib/intarray/Makefile
+++ b/contrib/intarray/Makefile
@@ -2,10 +2,10 @@
 
 MODULE_big = _int
 OBJS = _int_bool.o _int_gist.o _int_op.o _int_tool.o \
-	_intbig_gist.o _int_gin.o $(WIN32RES)
+	_intbig_gist.o _int_gin.o _int_selfuncs.o $(WIN32RES)
 
 EXTENSION = intarray
-DATA = intarray--1.0.sql intarray--unpackaged--1.0.sql
+DATA = intarray--1.1.sql intarray--1.0--1.1.sql intarray--unpackaged--1.0.sql
 PGFILEDESC = intarray - functions and operators for arrays of integers
 
 REGRESS = _int
diff --git a/contrib/intarray/_int_selfuncs.c b/contrib/intarray/_int_selfuncs.c
new file mode 100644
index 000..4896461
--- /dev/null
+++ b/contrib/intarray/_int_selfuncs.c
@@ -0,0 +1,334 @@
+/*-
+ *
+ * _int_selfuncs.c
+ *	  Functions for selectivity estimation of intarray operators
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/intarray/_int_selfuncs.c
+ *
+ *-
+ */
+#include "postgres.h"
+#include "_int.h"
+
+#include "access/htup_details.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_statistic.h"
+#include "catalog/pg_type.h"
+#include "utils/selfuncs.h"
+#include "utils/syscache.h"
+#include "utils/lsyscache.h"
+#include "miscadmin.h"
+
+PG_FUNCTION_INFO_V1(_int_overlap_sel);
+PG_FUNCTION_INFO_V1(_int_contains_sel);
+PG_FUNCTION_INFO_V1(_int_contained_sel);
+PG_FUNCTION_INFO_V1(_int_overlap_joinsel);
+PG_FUNCTION_INFO_V1(_int_contains_joinsel);
+PG_FUNCTION_INFO_V1(_int_contained_joinsel);
+PG_FUNCTION_INFO_V1(_int_matchsel);
+
+Datum		_int_overlap_sel(PG_FUNCTION_ARGS);
+Datum		_int_contains_sel(PG_FUNCTION_ARGS);
+Datum		_int_contained_sel(PG_FUNCTION_ARGS);
+Datum		_int_overlap_joinsel(PG_FUNCTION_ARGS);
+Datum		_int_contains_joinsel(PG_FUNCTION_ARGS);
+Datum		_int_contained_joinsel(PG_FUNCTION_ARGS);
+Datum		_int_matchsel(PG_FUNCTION_ARGS);
+
+
+static Selectivity int_query_opr_selec(ITEM *item, Datum *values, float4 *freqs,
+	int nmncelems, float4 minfreq);
+static int	compare_val_int4(const void *a, const void *b);
+
+/*
+ * Wrappers around the default array 

[HACKERS] ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard

2015-07-21 Thread Robert Haas
Consider:

rhaas=# create table foo (a text);
CREATE TABLE
rhaas=# create unique index on foo (a collate "C");
CREATE INDEX
rhaas=# alter table foo add primary key using index foo_a_idx;
ALTER TABLE
rhaas=# \d foo
    Table "public.foo"
 Column | Type | Modifiers
--------+------+-----------
 a      | text | not null
Indexes:
    "foo_a_idx" PRIMARY KEY, btree (a COLLATE "C")

Now dump and restore this database.  Then:

rhaas=# \d foo
    Table "public.foo"
 Column | Type | Modifiers
--------+------+-----------
 a      | text | not null
Indexes:
    "foo_a_idx" PRIMARY KEY, btree (a)

Notice that the collation specifier is gone.  Oops.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Selectivity estimation for intarray with @@

2015-07-21 Thread Alexander Korotkov
On Tue, Jul 21, 2015 at 6:49 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 07/21/2015 03:44 PM, Alexander Korotkov wrote:

 While Uriy is on vacation, I've revised this patch a bit.


 I whacked this around quite a bit, and I think it's in a committable state
 now. But if you could run whatever tests you were using before on this, to
 make sure it still produces the same estimates, that would be great. I
 didn't change the estimates it should produce, only the code structure.

 One thing that bothers me slightly with this patch is the way it peeks
 into the Most-Common-Elements arrays, which are produced by the built-in
 type analyze function. If we ever change what statistics are collected for
 arrays, or the way they are stored, this might break. In matchsel, why
 don't we just call the built-in estimator function for each element that we
 need to probe, and not look into the statistics ourselves at all? I
 actually experimented with that, and it did slash much of the code, and it
 would be more future-proof. However, it was also a lot slower for queries
 that contain multiple values. That's understandable: the built-in estimator
 will fetch the statistics tuple, parse the arrays, etc. separately for each
 value in the query_int, while this patch will do it only once for the whole
 query, and perform a simple binary search for each value. So overall, I
 think this is OK as it is. But if we find that we need to use the MCE list
 in this fashion in more places in the future, it might be worthwhile to add
 some support code for this in the backend to allow extracting the stats
 once, and doing multiple lightweight estimations using the extracted
 stats.


Yeah, I see. We could end up with something like this. But probably we
would need something more general for extensions which wants to play with
statistics.
For instance, pg_trgm could estimate selectivity for text % text
operator. But in order to provide that it needs trigram statistics. Now it
could be implemented by defining separate datatype, but it's a kluge.
Probably, we would end up with custom additional statistics for datatypes.


 Some things I fixed/changed:

 * I didn't like that transformOperator() function, which looked up the
 function's name. I replaced it with separate wrapper functions for each
 operator, so that the built-in operator's OID can be hardcoded into each.

 * I refactored the matchsel function heavily. I think it's more readable
 now.

 * I got rid of the Int4Freq array. It didn't seem significantly easier to
 work with than the separate values/numbers arrays, so I just used those
 directly.

 * Also use the matchsel estimator for ~~ (the commutator of @@)


In this version of the patch it's not checked whether the variable is actually
an int[] and not a query_int. See the following test case.

# create table test2 as (select '1'::query_int val from
generate_series(1,100));
# analyze test2;

# explain select * from test2 where '{1}'::int[] @@ val;
ERROR:  unrecognized int query item type: 0

I've added this check.

* Also use the estimators for the obsolete @ and ~ operators. Not that I
 care much about those as they are obsolete, but seems strange not to, as
 it's a trivial matter of setting the right estimator function.

 * I added an ANALYZE in the regression test. It still won't systematically
 test all the cost estimation code, and there's nothing to check that the
 estimates make sense, but at least more of the code will now run.


You also forgot to include intarray--1.0--1.1.sql in the patch. I've also
added it.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


intarray-sel-4.patch
Description: Binary data



Re: [HACKERS] Fillfactor for GIN indexes

2015-07-21 Thread Heikki Linnakangas

On 07/21/2015 04:14 PM, Alexander Korotkov wrote:

On Tue, Jul 21, 2015 at 3:52 PM, Heikki Linnakangas hlinn...@iki.fi wrote:


On 07/21/2015 02:56 PM, Alexander Korotkov wrote:


Probably, but currently we are in quite an illogical situation. We have a
default fillfactor = 90 for GiST, where it has no use cases at all and
effectively is just a waste of space.



Why is it useless for GiST?



It's because all GiST pages are produced by page splits. So, just after
CREATE INDEX, GiST pages aren't tightly packed on average. Actually, they
could be tightly packed by incredible coincidence, but for large indexes
it's a quite safe assumption that they are not. With GiST we don't get a
storm of page splits after index creation even with fillfactor = 100. So why
should we reserve additional space with fillfactor = 90?


Aha, I see. Yeah, that's pretty useless. Ideally, we would make the GiST 
build algorithm smarter so that it would pack the pages more tightly. I 
have no idea how to do that, however.


Anyway, the fact that fillfactor is useless for GiST is more of an 
argument for removing it from GiST, than for adding it to GIN.


- Heikki





[HACKERS] What is HeapScanDescData.rs_initblock good for?

2015-07-21 Thread Tom Lane
The BRIN patch added a HeapScanDescData field rs_initblock, but so far as
I can see it's utterly without use, and it's quite confusing (people might
mistake it for rs_startblock, for example).  Any objection to taking it
out again?

regards, tom lane




Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-07-21 Thread Merlin Moncure
On Tue, Jul 21, 2015 at 2:53 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 On 07/21/2015 10:38 AM, Pavel Stehule wrote:

 Where are we with this patch? Can I do something for it?


 I still feel this approach is misguided, and we should be tweaking psql
 and/or libpq instead. I don't feel strongly though, and if some other
 committer wants to pick this up in its current form, I won't object. So this
 patch has reached an impasse, and if no-one else wants to pick this up, I'm
 going to mark this as Returned with Feedback and move on.

That's unfortunate.  Maybe I'm missing something:

What does a client-side implementation offer that a server-side
implementation does not offer?

merlin




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-21 Thread Petr Jelinek

On 2015-07-20 17:23, Tom Lane wrote:


Doesn't look like it to me: heap_beginscan_sampling always passes
allow_sync = false to heap_beginscan_internal.



Oh, right.


More to the point, the existing coding of all these methods is such
that they would fail to be reproducible if syncscan were enabled,
because the scan start point wouldn't be the same.  That's fixable,
but it'll take more work than just dealing with the above oversight.
In any case, given that Simon has stated he wants support for sample
methods that don't try to be reproducible, we need the method
to be able to say whether syncscan is allowed.



I am not completely clear on how we do this, to be honest; do you mean we use
the info about repeatability for this?


Another thing that's not clear to me after playing with this is how we want
to decide whether to do a pagemode scan or not. I understand that it's a neat
optimization to say pagemode = true in bernoulli when the percentage is high
and false when it's low, but then this would have to come from the
BeginSampleScan() in the proposed API, and BeginSampleScan would not have
access to the HeapScanDesc as it would not be initialized yet when it's
called. The info that BeginSampleScan() needs can be obtained directly from
ss_currentRelation, I guess, but it's somewhat strange to pass a
semi-initialized SampleScanState to BeginSampleScan().


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-21 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
 Another thing that's not clear to me after playing with this is how we
 want to decide whether to do a pagemode scan or not. I understand that it's
 a neat optimization to say pagemode = true in bernoulli when the percentage
 is high and false when it's low, but then this would have to come from the
 BeginSampleScan() in the proposed API, and BeginSampleScan would not have
 access to the HeapScanDesc as it would not be initialized yet when it's
 called.

Right.

 The info that BeginSampleScan() needs can be obtained directly
 from ss_currentRelation, I guess, but it's somewhat strange to pass a
 semi-initialized SampleScanState to BeginSampleScan().

Doesn't seem like a big problem from here.  The HeapScanDesc pointer
will be null at that point, so it's not like attempts to access it
would escape notice.

We could alternatively provide two scan-initialization callbacks,
one that analyzes the parameters before we do heap_beginscan,
and another that can do additional setup afterwards.  Actually,
that second call would really not be meaningfully different from
the ReScan call, so another solution would be to just automatically
invoke ReScan after we've created the HeapScanDesc.  TSMs could work
around not having this by complicating their NextBlock function a bit
(so that it would do some initialization on first call) but perhaps
it would be easier to have the additional init call.
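
For illustration, the NextBlock-side workaround could look roughly like this
(a sketch only: the type and field names are placeholders, since the API was
still being settled here; rs_nblocks and InvalidBlockNumber are existing
names):

	#include "postgres.h"
	#include "access/relscan.h"		/* HeapScanDesc, rs_nblocks */

	typedef struct MySamplerState
	{
		bool		inited;		/* first-call setup done? */
		BlockNumber	nblocks;	/* relation size, known once the scan exists */
		BlockNumber	cur;		/* last block returned */
	} MySamplerState;

	static BlockNumber
	my_nextblock(MySamplerState *sampler, HeapScanDesc scan)
	{
		if (!sampler->inited)
		{
			/* the HeapScanDesc exists by the time we're first called */
			sampler->nblocks = scan->rs_nblocks;
			sampler->cur = InvalidBlockNumber;
			sampler->inited = true;
		}

		/* trivial sequential policy, purely for illustration */
		sampler->cur = (sampler->cur == InvalidBlockNumber) ? 0 : sampler->cur + 1;
		return (sampler->cur < sampler->nblocks) ? sampler->cur : InvalidBlockNumber;
	}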

regards, tom lane




Re: [HACKERS] pgbench stats per script other stuff

2015-07-21 Thread Robert Haas
On Tue, Jul 21, 2015 at 10:42 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:
   sh ./pgbench -T 3 -B -N -w 2 -S -w 7 --per-script-stats

 That is a truly horrifying abuse of command-line arguments.  -1 from
 me, or minus more than one if I've got that many chits to burn.

 Are you against the -w, or against saying that pgbench executes scripts,
 whether internal or from files?

I'm against the idea that we accept multiple arguments for scripts,
and that a subsequent -w modifies the meaning of the
script-specifying argument already read.  That strikes me as a very
unintuitive interface.  I'm not sure exactly what would be better at
the moment, but I think we need something better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench stats per script other stuff

2015-07-21 Thread Fabien COELHO


That is a truly horrifying abuse of command-line arguments.  -1 from
me, or minus more than one if I've got that many chits to burn.


Are you against the -w, or against saying that pgbench executes scripts,
whether internal or from files?


I'm against the idea that we accept multiple arguments for scripts,


Pgbench *currently* already accepts multiple -f ... options, and this is
a good thing for testing realistic loads which may intermix several kinds of
transactions, say a lot of read-only ones, some updates or inserts, and very
rare deletes...


Now if you do not need it you do not use it, and all is fine. Once you 
have several scripts, being able to weight them becomes useful for 
realism.


and that a subsequent -w modifies the meaning of the script-specifying
argument already read. That strikes me as a very unintuitive interface.


OK, I understand this "afterward modification" objection.

What if -w were required *before*, supplying a weight for the (first, or
maybe all) script(s) specified *afterwards*, so that it does not modify
something already provided? I think it would be more intuitive, or at
least less surprising.


I'm not sure exactly what would be better at the moment, but I think we 
need something better.


Maybe "-f file.sql:weight" (yuk from my point of view, but it can be
done easily).
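
For illustration only, assuming that colon syntax were adopted (and with
hypothetical script files), a 90%/10% read/update mix would then be:

  sh ./pgbench -T 60 --per-script-stats -f readonly.sql:9 -f update.sql:1

where each weight is the script's relative probability of being picked for a
given transaction.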


--
Fabien.




Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-21 Thread Andres Freund
Hi,

On 2015-07-21 07:28:22 -0700, Paul Ramsey wrote:
  /*
 @@ -229,6 +236,9 @@ foreign_expr_walker(Node *node,
   Oid collation;
   FDWCollateState state;

 + /* Access extension metadata from fpinfo on baserel */
 + PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) (glob_cxt->foreignrel->fdw_private);
 +
   /* Need do nothing for empty subexpressions */
   if (node == NULL)
   return true;
 @@ -361,7 +371,7 @@ foreign_expr_walker(Node *node,
    * can't be sent to remote because it might have incompatible
    * semantics on remote side.
    */
 - if (!is_builtin(fe->funcid))
 + if (!is_builtin(fe->funcid) && !is_in_extension(fe->funcid, fpinfo))
   return false;

...

  /*
 + * Returns true if given operator/function is part of an extension declared in the
 + * server options.
 + */
 +static bool
 +is_in_extension(Oid procnumber, PgFdwRelationInfo *fpinfo)
 +{
 + static int nkeys = 1;
 + ScanKeyData key[nkeys];
 + HeapTuple tup;
 + Relation depRel;
 + SysScanDesc scan;
 + int nresults = 0;
 +
 + /* Always return false if we don't have any declared extensions */
 + if (!fpinfo->extensions)
 +     return false;
 +
 + /* We need this relation to scan */
 + depRel = heap_open(DependRelationId, RowExclusiveLock);
 +
 + /* Scan the system dependency table for all entries this operator */
 + /* depends on, then iterate through and see if one of them */
 + /* is a registered extension */
 + ScanKeyInit(&key[0],
 +             Anum_pg_depend_objid,
 +             BTEqualStrategyNumber, F_OIDEQ,
 +             ObjectIdGetDatum(procnumber));
 +
 + scan = systable_beginscan(depRel, DependDependerIndexId, true,
 +                           GetCatalogSnapshot(depRel->rd_id), nkeys, key);
 +
 + while (HeapTupleIsValid(tup = systable_getnext(scan)))
 + {
 +     Form_pg_depend foundDep = (Form_pg_depend) GETSTRUCT(tup);
 +
 +     if (foundDep->deptype == DEPENDENCY_EXTENSION)
 +     {
 +         List *extlist = fpinfo->extensions;
 +         ListCell *ext;
 +
 +         foreach(ext, extlist)
 +         {
 +             Oid extension_oid = (Oid) lfirst(ext);
 +             if (foundDep->refobjid == extension_oid)
 +             {
 +                 nresults++;
 +             }
 +         }
 +     }
 +     if (nresults > 0) break;
 + }
 +
 + systable_endscan(scan);
 + relation_close(depRel, RowExclusiveLock);
 +
 + return nresults > 0;
 +}

Phew. That's mighty expensive to do at this frequency.

I guess it'll be more practical to expand this list once and then do a
binary search on the result for the individual functions

Greetings,

Andres Freund




Re: [HACKERS] Solaris testers wanted for strxfrm() behavior

2015-07-21 Thread Bjorn Munch
On 27/06 12.51, Noah Misch wrote:
 
 PostgreSQL 9.5 adds a strxfrm() call in bttext_abbrev_convert(), which does
 not account for the Solaris bug.  I wish to determine whether that bug is
 still relevant today.  If you have access to Solaris with the is_IS.ISO8859-1
 locale installed (or root access to install it), please compile and run the
 attached test program on each Solaris version you have available.  Reply here
 with the program's output.  I especially need a report from Solaris 10, but
 reports from older and newer versions are valuable.  Thanks.

I ran this program on Solaris 9 U5 (September 2006) on Sparc and got:

---
SunOS tor10-z1 5.9 Generic_Virtual sun4v sparc SUNW,SPARC-Enterprise-T5120
locale is_IS.ISO8859-1: strxfrm returned 162; last modified byte at 106 (0xffd2)
locale is_IS.ISO8859-1: strxfrm returned 162; last modified byte at 106 (0xffd2)
locale : strxfrm returned 26; last modified byte at 58 (0x0)
---

NB this runs in a Container on a Solaris 10 host. On our oldest
Solaris 10 sparc box, running Solaris 10 U6 (October 2008) I get:

---
SunOS tor07 5.10 Generic_137137-09 sun4v sparc SUNW,SPARC-Enterprise-T5120
locale is_IS.ISO8859-1: strxfrm returned 216; last modified byte at 58 (0x0)
locale is_IS.ISO8859-1: strxfrm returned 216; last modified byte at 58 (0x0)
locale : strxfrm returned 26; last modified byte at 27 (0x0)
---

On x86: Solaris 10 U5 (May 2008) I get:

---
SunOS loki08 5.10 Generic_127128-11 i86pc i386 i86pc
locale is_IS.ISO8859-1: strxfrm returned 216; last modified byte at 58 (0x0)
locale is_IS.ISO8859-1: strxfrm returned 216; last modified byte at 58 (0x0)
locale : strxfrm returned 26; last modified byte at 27 (0x0)
---

- Bjorn




Re: [HACKERS] ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard

2015-07-21 Thread Michael Paquier
On Wed, Jul 22, 2015 at 1:23 AM, Robert Haas robertmh...@gmail.com wrote:
 Notice that the collation specifier is gone.  Oops.

As it is not possible to specify a collation directly in a PRIMARY
KEY constraint, what about switching dumpConstraint to have it first
issue a CREATE INDEX query with the collation and then use ALTER TABLE
to attach the constraint to it? I am noticing that we already fetch
the index definition in indxinfo via pg_get_indexdef. Thoughts?
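
For instance, with Robert's example table, the dumped SQL would then take
roughly this shape (a sketch of the intent, not the patch's exact output):

  CREATE UNIQUE INDEX foo_a_idx ON foo USING btree (a COLLATE "C");
  ALTER TABLE ONLY foo ADD CONSTRAINT foo_a_idx PRIMARY KEY USING INDEX foo_a_idx;

The collation survives because the index is created from its full
pg_get_indexdef definition before the constraint is attached to it.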
-- 
Michael




Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP

2015-07-21 Thread Andreas Karlsson

On 07/02/2015 06:13 PM, Peter Eisentraut wrote:

I think this would be a useful feature, and the implementation looks
sound.  But I don't like how the reload is organized.  Reinitializing
the context in the sighup handler, aside from questions about how much
work one should do in a signal handler, would cause SSL reinitialization
for unrelated reloads.  We have the GUC assign hook mechanism for
handling this sort of thing.  The trick would be that when multiple
SSL-related settings change, you only want to do one reinitialization.
You could either have the different assign hooks communicate with each
other somehow, or have them set a "need SSL init" flag that is checked
somewhere else.


It is not enough to just add a hook to the GUCs, since I would guess most
users would expect the certificate to be reloaded if just the file has been
replaced and no GUC was changed. To support this we would need to also check
the mtimes of the SSL files; would that complexity really be worth it?
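
A minimal sketch of that mtime check (illustrative only, not from the patch):
remember the last observed modification time per SSL file and reload only
when it actually changes.

    #include <sys/stat.h>
    #include <stdbool.h>
    #include <time.h>

    static bool
    ssl_file_changed(const char *path, time_t *last_mtime)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return false;       /* let the normal error paths complain */

        if (st.st_mtime != *last_mtime)
        {
            *last_mtime = st.st_mtime;
            return true;        /* caller should rebuild the SSL context */
        }
        return false;
    }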


Andreas





Re: [HACKERS] Alpha2/Beta1

2015-07-21 Thread Noah Misch
On Tue, Jul 21, 2015 at 02:01:57PM -0700, Josh Berkus wrote:
 At the Developer's Meeting, we said we'd be releasing an alpha or beta
 each month until final 9.5 release.  As such, we should be releasing a
 9.5 alpha/beta in the first week of August.
 
 My question for Hackers is: should this be Alpha2 or Beta 1?

I vote for alpha2.  Comparing the "Open Issues" and "resolved after 9.5alpha1"
sections of https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items, we've
not transitioned to a qualitatively different level of API stability.

Thanks,
nm




Re: [HACKERS] Linux kernel performance regression tests and Postgres

2015-07-21 Thread Michael Paquier
On Wed, Jul 22, 2015 at 1:27 PM, Peter Geoghegan p...@heroku.com wrote:
 Mel Gorman wrote a tool to perform Linux kernel performance testing.
 Detecting performance regressions in new kernels appears to be the
 major goal.

 I note that there are Postgres specific tests listed here:

 http://www.csn.ul.ie/~mel/results/home/marvin/dashboard-openSUSE-13.1-smart.html

That's really cool. Thanks for sharing.
-- 
Michael




Re: [HACKERS] Selectivity estimation for intarray with @@

2015-07-21 Thread Alexander Korotkov
On Tue, Jul 21, 2015 at 5:14 PM, Teodor Sigaev teo...@sigaev.ru wrote:

 Some notices about code:

 1 Near function transformOperator() there is wrong operator name @


Fixed.


 2 int_query (and intquery) should be replaced with query_int to be
 consistent with the actual type name. At least where it's used as a
 separate lexeme.


Fixed.


 3 For historical reasons @@ doesn't commute with itself; it has a
 commutator ~~. The patch assumes that @@ is its own commutator, but ~~ will
 use 'contsel' as a restriction estimate. I suppose we need to declare ~~
 deprecated and introduce a self-commutating @@ operator.


I think deprecating ~~ is a subject for a separate patch. I fixed the patch's
behavior in a different way: @@ and ~~ are now handled by the same
function. The function handles "var @@ const" and "const ~~ var", but
doesn't handle "const @@ var" and "var ~~ const". It determines the case
by the type of the variable: it should be int[].

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


intarray_sel-3.patch
Description: Binary data



Re: [HACKERS] Fillfactor for GIN indexes

2015-07-21 Thread Alexander Korotkov
On Tue, Jul 21, 2015 at 7:20 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 07/21/2015 04:14 PM, Alexander Korotkov wrote:

 On Tue, Jul 21, 2015 at 3:52 PM, Heikki Linnakangas hlinn...@iki.fi
 wrote:

  On 07/21/2015 02:56 PM, Alexander Korotkov wrote:

 Probably, but currently we are in quite an illogical situation. We have a
 default fillfactor = 90 for GiST, where it has no use cases at all and
 effectively is just a waste of space.


 Why is it useless for GiST?



 It's because all GiST pages are produced by page splits. So, just after
 CREATE INDEX, GiST pages aren't tightly packed on average. Actually, they
 could be tightly packed by incredible coincidence, but for large indexes
 it's a quite safe assumption that they are not. With GiST we don't get a
 storm of page splits after index creation even with fillfactor = 100. So
 why should we reserve additional space with fillfactor = 90?


 Aha, I see. Yeah, that's pretty useless. Ideally, we would make the GiST
 build algorithm smarter so that it would pack the pages more tightly. I
 have no idea how to do that, however.

 Anyway, the fact that fillfactor is useless for GiST is more of an
 argument for removing it from GiST, than for adding it to GIN.


Yes it is. I just think that fillfactor is more useful for GIN than for
GiST now. However, it probably isn't worth it for either of them.
One of our customers complains that freshly built GIN indexes bloat
very fast. I've asked them for a test case. Using that test case I would try
to see if fillfactor for GIN could help in practice. For now we can mark it as
"Returned with feedback" where the feedback is "Real-life use cases needed".
Removing fillfactor from GiST is not easy now. Once it's exposed to users,
removing it completely would break compatibility. I would propose to change
the default fillfactor for GiST from 90 to 100.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] What is HeapScanDescData.rs_initblock good for?

2015-07-21 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Tom Lane wrote:
 The BRIN patch added a HeapScanDescData field rs_initblock, but so far as
 I can see it's utterly without use, and it's quite confusing (people might
 mistake it for rs_startblock, for example).  Any objection to taking it
 out again?

 Ouch, you're right, my mistake.  Feel free to remove it, yeah.

... While I'm looking at it, it sure looks like the BRIN patch broke
syncscan for those index build methods that were using it, which was
most.  You've got IndexBuildHeapRangeScan unconditionally calling
heap_setscanlimits and thereby trashing the result of ss_get_location().

I'm inclined to let it call heap_setscanlimits only if not allow_sync.
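
In code form, the idea is roughly (a sketch against IndexBuildHeapRangeScan's
existing parameters, not the final commit):

	/* don't clobber the syncscan start point chosen via ss_get_location() */
	if (!allow_sync)
		heap_setscanlimits(scan, start_blockno, numblocks);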

regards, tom lane




Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-21 Thread Paul Ramsey
On July 21, 2015 at 11:07:36 AM, Tom Lane (t...@sss.pgh.pa.us) wrote:
I'm inclined to think that it's not really necessary to worry about
invalidating a per-connection cache of "is this function safe to ship"
determinations.
So: yes to a local cache of all forwardable functions/ops, populated in full 
the first time through (does that speak maybe to using a binary search on a 
sorted list instead of a hash, since I only pay the sort price once and am not 
doing any insertions?). And then we just hold it until the connection goes 
away. 

Yes?

P.

Re: [HACKERS] What is HeapScanDescData.rs_initblock good for?

2015-07-21 Thread Alvaro Herrera
Tom Lane wrote:
 The BRIN patch added a HeapScanDescData field rs_initblock, but so far as
 I can see it's utterly without use, and it's quite confusing (people might
 mistake it for rs_startblock, for example).  Any objection to taking it
 out again?

Ouch, you're right, my mistake.  Feel free to remove it, yeah.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-21 Thread Jim Nasby

On 7/20/15 4:32 AM, Thakur, Sameer wrote:

Hello,

Does this actually handle multiple indexes? It doesn't appear so, which I'd 
think is a significant problem... :/

Please find v2 attached which does this.


I think it'd be better to combine both numbers into one report:

elog(WARNING, "Current/Overall index percentage completion %f/%f",
     current_index_progress * 100, all_index_progress);


It'd also be good to standardize on where the * 100 is happening.

Also, AFAIK:

(itemptr->ip_blkid.bi_hi != vacrelstats->last_scanned_page.bi_hi) ||
(itemptr->ip_blkid.bi_lo != vacrelstats->last_scanned_page.bi_lo)


can be replaced by

(itemptr->ip_blkid != vacrelstats->last_scanned_page)

and

vacrelstats->current_index_scanned_page_count =
    vacrelstats->current_index_scanned_page_count + 1;


can simply be

vacrelstats->current_index_scanned_page_count++;
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-21 Thread Paul Ramsey
 
On July 21, 2015 at 8:26:31 AM, Andres Freund (and...@anarazel.de) wrote:
 On 2015-07-21 17:00:51 +0200, Andres Freund wrote:
  On 2015-07-21 07:55:17 -0700, Paul Ramsey wrote:
   On Tue, Jul 21, 2015 at 7:45 AM, Andres Freund wrote:
   So, right after reading the options in postgresGetForeignRelSize,
   expand the extension list into a list of all ops/functions, in a
   sorted list, and let that carry through to the deparsing instead?
 
  I'd actually try to make it longer lived, i.e. permanently. And just
  deallocate when a catcache callback says it needs to be invalidated;
  IIRC there is a relevant cache.
  
 On second thought I'd not use a binary search but a hash table. If you
 choose the right key a single table is enough for the lookup.
  
 If you need references for invalidations you might want to look for
 CacheRegisterSyscacheCallback callers. E.g. attoptcache.c

 On 2015-07-21 08:32:34 -0700, Paul Ramsey wrote: 
  Thanks! Reading some of the syscache callback stuff, one thing I’m not 
  sure of is which SysCacheIdentifier(s) I should be registering 
  callbacks against to ensure my list of funcs/procs that reside in 
  particular extensions is kept fresh. I don’t see anything tracking the 
  dependencies there 

 FOREIGNSERVEROID, and a new syscache on pg_extension.oid should suffice 
 I think. pg_foreign_server will be changed upon option changes and 
 pg_extension.oid on extension upgrades. 

 Since dependencies won't change independently of extension versions I 
 don't think we need to care otherwise. There's ALTER EXTENSION ... ADD 
 but I'm rather prepared to ignore that; if that's not ok it's trivial to 
 make it emit an invalidation. 

This hole just keeps getting deeper :) So,

- a HASH in my own code to hold all the functions that I consider “safe to 
forward” (which I’ll derive by reading the contents of the extensions the users 
declare)
- callbacks in my code registered using CacheRegisterSyscacheCallback on 
FOREIGNSERVEROID and __see_below__ that refresh my cache when called
- since there is no syscache for extensions right now, a new syscache entry for 
pg_extension.oid (and I ape things in syscache.h and syscache.c and Magic 
Occurs?)
- which means I can also CacheRegisterSyscacheCallback on the new EXTENSIONOID 
syscache
- and finally change my is_in_extension() to efficiently check my HASH instead 
of querying the system catalog

Folks are going to be OK w/ me dropping in new syscache entries to support my
niche little feature?

ATB,

P








Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-21 Thread Tom Lane
Paul Ramsey pram...@cleverelephant.ca writes:
 Folks are going to be OK w/ me dropping in new syscache entries to support my
 niche little feature?

No, mainly because it adds overhead without fixing your problem.  It's not
correct to suppose that a syscache on pg_extension would reliably report
anything; consider ALTER EXTENSION ADD/DROP, which does not touch the
pg_extension row.

I'm inclined to think that it's not really necessary to worry about
invalidating a per-connection cache of "is this function safe to ship"
determinations.  Neither CREATE EXTENSION nor DROP EXTENSION pose any
hazard, nor would ALTER EXTENSION UPGRADE for typical scenarios (which
would only include adding new functions that weren't there before, so
they weren't in your cache anyway).

Anybody who's screwing around with extension membership on-the-fly is
unlikely to expect the system to redetermine ship-ability for active FDW
connections anyway.  If you could do that fully correctly for not a lot of
additional cost, sure; but really anything like this is only going to
take you from 99% to 99.01% coverage of real cases.  Doesn't seem worth
the trouble.

regards, tom lane




Re: [HACKERS] First Aggregate Funtion?

2015-07-21 Thread Jim Nasby

On 7/20/15 11:07 AM, Marko Tiikkaja wrote:

On 7/20/15 6:02 PM, Corey Huinker wrote:

By using only(a.name_of_the_thing) we'd have a bit more clarity that the
author expected all of those values to be the same across the aggregate
window, and discovering otherwise was reason enough to fail the query.

*IF* we're considering adding these to core, I think that only() would be
just a slight modification of the last() implementation, and could be
done
at the same time.

[1] I don't care what it gets named. I just want the functionality.


A big +1 from me.  In fact, I wrote a patch implementing this for 9.5
but never got around to finishing it.


A big +1 here too; I've wanted this many times in the past.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Selectivity estimation for intarray with @@

2015-07-21 Thread Heikki Linnakangas

On 07/21/2015 07:28 PM, Alexander Korotkov wrote:

On Tue, Jul 21, 2015 at 6:49 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

In this version of the patch it's not checked whether the variable is actually
an int[] and not a query_int. See the following test case.

# create table test2 as (select '1'::query_int val from
generate_series(1,100));
# analyze test2;

# explain select * from test2 where '{1}'::int[] @@ val;
ERROR:  unrecognized int query item type: 0

I've added this check.

* Also use the estimators for the obsolete @ and ~ operators. Not that I

care much about those as they are obsolete, but seems strange not to, as
it's a trivial matter of setting the right estimator function.

* I added an ANALYZE in the regression test. It still won't systematically
test all the cost estimation code, and there's nothing to check that the
estimates make sense, but at least more of the code will now run.


You also forgot to include intarray--1.0--1.1.sql in the patch. I've also
added it.


Thanks, committed!

- Heikki





Re: [HACKERS] make check changes have caused buildfarm deterioration.

2015-07-21 Thread Peter Eisentraut
On 7/21/15 10:00 AM, Tom Lane wrote:
 I agree; this change may have seemed like a good idea at the time, but
 it was not.  Failures during make check's install step are rare enough
 that you don't really need all that output in your face to help with the
 rare situation where it fails.  And for the buildfarm's purposes, it is
 surely desirable to segregate that output from the actual check step.

It wasn't really an idea; it was just not necessary anymore.  We can put
it [directing the make install output into a file] back if that's what
people prefer.

 A possible alternative is to run the make install sub-step with -s,
 but that could be objected to on the grounds that if it did fail, you'd
 have a hard time telling exactly which step failed.

I usually run the whole make check with -s.





Re: [HACKERS] pg_basebackup and replication slots

2015-07-21 Thread Peter Eisentraut
On 7/2/15 3:37 AM, Michael Paquier wrote:
 Regarding patch 3, I have more comments:
 1) I think that documentation should clearly mention that if -R and -S
 are used together, a primary_slot_name entry is added in the
 recovery.conf generated with the slot name defined.

Updated proposal attached.

 2)
  sub psql
  {
 	my ($dbname, $sql) = @_;
 -	run [ 'psql', '-X', '-q', '-d', $dbname, '-f', '-' ], '<', \$sql or die;
 +	my ($stdout, $stderr);
 +	run [ 'psql', '-X', '-A', '-t', '-q', '-d', $dbname, '-f', '-' ],
 +		'<', \$sql, '>', \$stdout, '2>', \$stderr or die;
 +	chomp $stdout;
 +	return $stdout;
  }
 RewindTest.pm has a routine called check_query defined. It would be
 great to extend psql a bit more than what I thought previously by
 returning ($result, $stdout, $stderr) from it and making check_query
 use it directly. This way we have a unique interface to run psql and
 capture output. I don't mind writing this refactoring patch on top of
 your set if that's necessary.

Let's do that as a separate patch.  Adding multiple return values makes
calling awkward, so I'd like to sort out the actual use cases before we
do that.

 3)
 +command_ok([ 'pg_basebackup', '-D', "$tempdir/backupxs_sl_R", '-X',
 'stream', '-S', 'slot1', '-R' ],
 +	'pg_basebackup with replication slot and -R runs');
 +$recovery_conf = slurp_file "$tempdir/backupR/recovery.conf";
 +like(slurp_file("$tempdir/backupxs_sl_R/recovery.conf"),
 slurp_file is slurping an incorrect file and $recovery_conf is used
 nowhere afterwards, so you could simply remove this line.

Yeah, that was some leftover garbage.

From 7ecb1794c2aaeea9753af07e2f54f6e483af255f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut pete...@gmx.net
Date: Tue, 21 Jul 2015 21:06:45 -0400
Subject: [PATCH 3/3] pg_basebackup: Add --slot option

This option specifies a replication slot for WAL streaming (-X stream),
so that there can be continuous replication slot use between WAL
streaming during the base backup and the start of regular streaming
replication.
---
 doc/src/sgml/ref/pg_basebackup.sgml  | 27 ---
 src/bin/pg_basebackup/pg_basebackup.c| 24 +++-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 22 +-
 src/test/perl/TestLib.pm |  5 -
 4 files changed, 72 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index cb8b8a3..5f8b9b7 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -215,15 +215,36 @@ <title>Options</title>
   <listitem>
 
    <para>
-    Write a minimal <filename>recovery.conf</filename> in the output directory (or into
-    the base archive file when using tar format) to ease setting
-    up a standby server.
+    Write a minimal <filename>recovery.conf</filename> in the output
+    directory (or into the base archive file when using tar format) to
+    ease setting up a standby server.
+    The <filename>recovery.conf</filename> file will record the connection
+    settings and, if specified, the replication slot
+    that <application>pg_basebackup</application> is using, so that the
+    streaming replication will use the same settings later on.
    </para>
 
   </listitem>
  </varlistentry>
 
  <varlistentry>
+  <term><option>-S <replaceable>slotname</replaceable></option></term>
+  <term><option>--slot=<replaceable class="parameter">slotname</replaceable></option></term>
+  <listitem>
+   <para>
+    This option can only be used together with <literal>-X
+    stream</literal>.  It causes the WAL streaming to use the specified
+    replication slot.  If the base backup is intended to be used as a
+    streaming replication standby using replication slots, it should then
+    use the same replication slot name
+    in <filename>recovery.conf</filename>.  That way, it is ensured that
+    the server does not remove any necessary WAL data in the time between
+    the end of the base backup and the start of streaming replication.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><option>-T <replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
   <term><option>--tablespace-mapping=<replaceable class="parameter">olddir</replaceable>=<replaceable class="parameter">newdir</replaceable></option></term>
   <listitem>
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 5363680..80de882 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -239,6 +239,7 @@ usage(void)
 	   "(in kB/s, or use suffix \"k\" or \"M\")\n"));
 	printf(_("  -R, --write-recovery-conf\n"
 			 "                         write recovery.conf after backup\n"));
+	printf(_("  -S, --slot=SLOTNAME    replication slot to use\n"));
 	printf(_("  -T, 

Re: [HACKERS] creating extension including dependencies

2015-07-21 Thread Michael Paquier
On Tue, Jul 21, 2015 at 11:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Petr Jelinek p...@2ndquadrant.com writes:
 ... My main question is if we are
 ok with SCHEMA having different behavior with CASCADE vs without
 CASCADE. I went originally with "no" and added the DEFAULT flag to
 SCHEMA. If the answer is "yes" then we don't need the flag (in that case
 CASCADE acts as the flag).

 Yeah, I was coming around to that position as well.  Insisting that
 SCHEMA throw an error if the extension isn't relocatable makes sense
 as long as only one extension is being considered, but once you say
 CASCADE it seems like mostly a usability fail.  I think it's probably
 OK if with CASCADE, SCHEMA is just "use if needed, else ignore".

OK, I'm fine with that, aka with CASCADE and a SCHEMA specified we use
it if needed or ignore it otherwise (if I am following correctly).

CREATE EXTENSION foo SCHEMA bar will fail if the extension is not
relocatable *and* does not have a schema specified in its control
file. A non-relocatable extension can be initially created anywhere.
It just cannot be moved afterwards from its original schema.
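
For example (relocatable hstore plus a hypothetical extension "foo" and
schema "util"; the CASCADE line shows the proposed semantics, not existing
syntax):

  CREATE EXTENSION hstore SCHEMA util;       -- relocatable: installed into util
  CREATE EXTENSION foo SCHEMA util;          -- error if the rule above forbids it
  CREATE EXTENSION foo SCHEMA util CASCADE;  -- proposed: use util if needed, else ignore it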

 Obviously we've gotta document all this properly.

Sure. That's a sine-qua-non condition for this patch.
Regards,
-- 
Michael




Re: [HACKERS] Use pg_rewind when target timeline was switched

2015-07-21 Thread Michael Paquier
On Mon, Jul 20, 2015 at 9:18 PM, Alexander Korotkov
a.korot...@postgrespro.ru wrote:
 the attached patch allows pg_rewind to work when the target timeline was
 switched. Actually, this patch fixes a TODO from the pg_rewind comments.

   /*
* Trace the history backwards, until we hit the target timeline.
*
* TODO: This assumes that there are no timeline switches on the target
* cluster after the fork.
*/

 This patch allows pg_rewind to handle data directory synchronization in a
 much more general way. For instance, a user can return a promoted standby to
 the old master.

Yes. That would be useful.

 In this patch the target timeline history is exposed as a global variable.
 An index into the target timeline history is used in function interfaces
 instead of specifying the TLI directly. Thus, SimpleXLogPageRead() can easily
 start reading XLOG from the next timeline when the current timeline ends.
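
A rough sketch of that structure (reusing the existing TimeLineHistoryEntry
from access/timeline.h; the helper below is illustrative, not the patch's
actual code):

	#include "postgres.h"
	#include "access/timeline.h"	/* TimeLineHistoryEntry */
	#include "access/xlogdefs.h"	/* XLogRecPtr, InvalidXLogRecPtr */

	/* global target timeline history, oldest entry first */
	static TimeLineHistoryEntry *targetHistory;
	static int	targetNentries;

	/*
	 * Return the history index whose timeline covers the given record
	 * pointer; the page reader switches to the next entry once it reads
	 * past the current timeline's switchpoint.
	 */
	static int
	tliIndexForLSN(XLogRecPtr ptr)
	{
		int			i;

		for (i = 0; i < targetNentries; i++)
		{
			if (targetHistory[i].end == InvalidXLogRecPtr ||
				ptr < targetHistory[i].end)
				return i;
		}
		return targetNentries - 1;	/* last timeline extends to infinity */
	}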

The implementation looks rather neat on a first look, but
the attached patch fails to compile:
pg_rewind.c:488:4: error: use of undeclared identifier 'targetEntry'
targetEntry = targetHistory[i];
^
Nice to see as well that this has been added to the CF of September.
Regards,
-- 
Michael




Re: [HACKERS] make check changes have caused buildfarm deterioration.

2015-07-21 Thread Michael Paquier
On Tue, Jul 21, 2015 at 10:34 PM, Andrew Dunstan wrote:
 OK, looks sane enough, but please do address the other issue.

Okidoki.
-- 
Michael




Re: [HACKERS] ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard

2015-07-21 Thread Michael Paquier
On Wed, Jul 22, 2015 at 9:34 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Jul 22, 2015 at 1:23 AM, Robert Haas robertmh...@gmail.com wrote:
 Notice that the collation specifier is gone.  Oops.

 As it is not possible to specify a collation directly in a PRIMARY
 KEY constraint, what about switching dumpConstraint to have it first
 issue a CREATE INDEX query with the collation and then use ALTER TABLE
 to attach the constraint to it? I am noticing that we already fetch
 the index definition in indxinfo via pg_get_indexdef. Thoughts?

And poking at that, I have finished with the attached, which adds a
CREATE INDEX query before the ALTER TABLE ADD CONSTRAINT, to which a USING
INDEX clause is appended. Storage options as well as building the column list
become unnecessary because indexdef already provides everything that is
needed, so this patch makes pg_dump rely more on what is on the
backend side.
-- 
Michael
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 0e036b8..7a1e6db 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -14515,7 +14515,6 @@ dumpConstraint(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo)
 	{
 		/* Index-related constraint */
 		IndxInfo   *indxinfo;
-		int			k;
 
 		indxinfo = (IndxInfo *) findObjectByDumpId(coninfo->conindex);
 
@@ -14527,6 +14526,10 @@ dumpConstraint(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo)
 			binary_upgrade_set_pg_class_oids(fout, q,
 							 indxinfo->dobj.catId.oid, true);
 
+		if (coninfo->contype == 'p' ||
+			coninfo->contype == 'u')
+			appendPQExpBuffer(q, "%s;\n", indxinfo->indexdef);
+
 		appendPQExpBuffer(q, "ALTER TABLE ONLY %s\n",
 						  fmtId(tbinfo->dobj.name));
 		appendPQExpBuffer(q, "ADD CONSTRAINT %s ",
@@ -14534,31 +14537,21 @@ dumpConstraint(Archive *fout, DumpOptions *dopt, ConstraintInfo *coninfo)
 
 		if (coninfo->condef)
 		{
+			/*
+			 * getIndexes sets a constraint definition only for exclusion
+			 * constraints.
+			 */
+			Assert(coninfo->contype == 'x');
+
 			/* pg_get_constraintdef should have provided everything */
 			appendPQExpBuffer(q, "%s;\n", coninfo->condef);
 		}
 		else
 		{
-			appendPQExpBuffer(q, "%s (",
+			appendPQExpBuffer(q, "%s ",
						 coninfo->contype == 'p' ? "PRIMARY KEY" : "UNIQUE");
-			for (k = 0; k < indxinfo->indnkeys; k++)
-			{
-				int			indkey = (int) indxinfo->indkeys[k];
-				const char *attname;
-
-				if (indkey == InvalidAttrNumber)
-					break;
-				attname = getAttrName(indkey, tbinfo);
-
-				appendPQExpBuffer(q, "%s%s",
-								  (k == 0) ? "" : ", ",
-								  fmtId(attname));
-			}
-
-			appendPQExpBufferChar(q, ')');
 
-			if (indxinfo->options && strlen(indxinfo->options) > 0)
-				appendPQExpBuffer(q, " WITH (%s)", indxinfo->options);
+			appendPQExpBuffer(q, "USING INDEX %s", indxinfo->dobj.name);
 
 			if (coninfo->condeferrable)
 			{



Re: [HACKERS] Alpha2/Beta1

2015-07-21 Thread Michael Paquier
On Wed, Jul 22, 2015 at 12:40 PM, Noah Misch n...@leadboat.com wrote:
 On Tue, Jul 21, 2015 at 02:01:57PM -0700, Josh Berkus wrote:
 At the Developer's Meeting, we said we'd be releasing an alpha or beta
 each month until final 9.5 release.  As such, we should be releasing a
 9.5 alpha/beta in the first week of August.

 My question for Hackers is: should this be Alpha2 or Beta 1?

 I vote for alpha2.  Comparing the "Open Issues" and "resolved after 9.5alpha1"
 sections of https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items, we've
 not transitioned to a qualitatively different level of API stability.

Mine goes for alpha2. There are some issues remaining yet that require
low-level attention.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench stats per script other stuff

2015-07-21 Thread Fabien COELHO


Hello Robert,


Pgbench *currently* already accepts multiple -f ... options, and this is a
good thing for testing realistic loads which may intermix several kinds of
transactions, say a lot of read-only, some updates or inserts, and very rare
deletes...


Hmm, I didn't realize that.  The code looks a bit inconsistent right
now - e.g. we do support multiple files, but pgbench's options-parsing
loop sets ttype to a value that depends only on the last of -f, -N,
and -S encountered.


Indeed. However, as in current pgbench the nothing/-N/-S options and -f are 
mutually exclusive, it is OK to have ttype set as it is.


With the patch, pgbench just executes scripts and the options are not
mutually exclusive: some scripts are internal and others are not, but they
are treated the same beyond initialization, which helps remove some code,
including the ttype variable you mention. The name of the script is kept
in an SQLScript struct along with its commands, weight and stats.
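
(Presumably something of this shape - the field names below are guesses for
illustration, not the patch's actual definition:)

typedef struct SQLScript
{
	const char *name;		/* file name, or name of a builtin script */
	int			weight;		/* relative probability of drawing this script */
	Command   **commands;	/* parsed commands of the script */
	StatsData	stats;		/* per-script latency and transaction counts */
} SQLScript;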


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [BUGS] [HACKERS] object_classes array is broken, again

2015-07-21 Thread Alvaro Herrera
Jaimin Pan wrote:
 Hi all,
 
 How about this patch? I believe it will never miss an entry, and any
 omission will be detected at compile time.

Hmm, yeah this looks like something that's worth considering going
forward.  I can't think of any reason not to do this.  Perhaps we can
write getObjectClass using this, too.

The new file has a wrong comment above the list, copy-pasted from
rmgrlist.h.

I've always wondered about unifying OCLASS with OBJECT, but that's
probably a completely independent topic.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-21 Thread Todd A. Cook

Hi,

This thread seemed to trail off without a resolution.  Was anything done?
(See more below.)

On 07/09/15 19:06, Tom Lane wrote:

Andres Freund and...@anarazel.de writes:

On 2015-07-06 11:49:54 -0400, Tom Lane wrote:

Rather than reverting cab9a0656c36739f, which would re-introduce a
different performance problem, perhaps we could have COPY create a new
relfilenode when it does this.  That should be safe if the table was
previously empty.



I'm not convinced that cab9a0656c36739f needs to survive in that
form. To me only allowing one COPY to benefit from the wal_level =
minimal optimization has a significantly higher cost than
cab9a0656c36739f.


What evidence have you got to base that value judgement on?

cab9a0656c36739f was based on an actual user complaint, so we have good
evidence that there are people out there who care about the cost of
truncating a table many times in one transaction.


I'm the complainer mentioned in the cab9a0656c36739f commit message. :)

FWIW, we use a temp table to split a join across 4 largish tables
(10^8 rows or more each) and 2 small tables (10^6 rows each).  We
write the results of joining the 2 largest tables into the temp
table, and then join that to the other 4.  This gave significant
performance benefits because the planner would know the exact row
count of the 2-way join heading into the 4-way join.  After commit
cab9a0656c36739f, we got another noticeable performance improvement
(I did timings before and after, but I can't seem to put my hands
on the numbers right now).

We do millions of these queries every day in batches.  Each batch
reuses a single temp table (truncating it before each pair of joins)
so as to reduce the churn in the system catalogs.  In case it matters,
the temp table is created with ON COMMIT DROP.

This was (and still is) done on 9.2.x.
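
A minimal sketch of that pattern, with invented table and column names:

BEGIN;
CREATE TEMP TABLE pair_join ON COMMIT DROP AS
  SELECT t1.id, t2.payload
    FROM big1 t1 JOIN big2 t2 USING (id);
-- each subsequent batch in the same transaction:
TRUNCATE pair_join;        -- the TRUNCATE that cab9a0656c36739f made cheap
INSERT INTO pair_join
  SELECT t1.id, t2.payload
    FROM big1 t1 JOIN big2 t2 USING (id);
SELECT count(*)
  FROM pair_join
  JOIN big3 USING (id)
  JOIN big4 USING (id);
COMMIT;                    -- ON COMMIT DROP removes the temp table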

HTH.

-- todd cook
-- tc...@blackducksoftware.com


 On the other hand,

I know of no evidence that anyone's depending on multiple sequential
COPYs, nor intermixed COPY and INSERT, to be fast.  The original argument
for having this COPY optimization at all was to make restoring pg_dump
scripts in a single transaction fast; and that use-case doesn't care
about anything but a single COPY into a virgin table.

I think you're worrying about exactly the wrong case.


My tentative guess is that the best course is to
a) Make heap_truncate_one_rel() create a new relfilenode. That fixes the
truncation replay issue.
b) Force new pages to be used when using the heap_sync mode in
COPY. That avoids the INIT danger you found. It seems rather
reasonable to avoid using pages that have already been the target of
WAL logging here in general.


And what reason is there to think that this would fix all the problems?
We know of those two, but we've not exactly looked hard for other cases.
Again, the only known field usage for the COPY optimization is the pg_dump
scenario; were that not so, we'd have noticed the problem long since.
So I don't have any faith that this is a well-tested area.

regards, tom lane






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-21 Thread Paul Ramsey
On July 21, 2015 at 11:22:12 AM, Tom Lane (t...@sss.pgh.pa.us) wrote:
 So: yes to a local cache of all forwardable functions/ops, populated in full 
 the first time through (does that speak maybe to using a binary search on a 
 sorted list instead of a hash, since I only pay the sort price once and am 
 not doing any insertions?). And then we just hold it until the connection 
 goes away.   

No, *not* populated first-time-through, because that won't handle any of  
the CREATE, DROP, or UPGRADE cases. It's also doing a lot of work you  
might never need. I was thinking of populate on demand, that is, first  
time you need to know whether function X is shippable, you find that out  
and then cache the answer (whether it be positive or negative).  

Roger that. Off to the races..

P

Re: [HACKERS] pgbench stats per script other stuff

2015-07-21 Thread Fabien COELHO


[...] and that a subsequent -w modifies the meaning of the 
script-specifying argument already read. That strikes me as a very 
unintuitive interface.


Ok, I understand this afterward modification objection.

What if -w were required *before*, supplying a weight for the script(s) 
specified *afterwards* (the first one, or maybe all of them), so that it does 
not modify something already provided? I think that would be more intuitive, 
or at least less surprising.


Here is a v3 which does that. If there is a better idea, do not hesitate!

 sh ./pgbench -w 9 -f one.sql -f now.sql -T 2 -P 1 --per-script-stats
 starting vacuum...end.
 progress: 1.0 s, 24536.0 tps, lat 0.039 ms stddev 0.024
 progress: 2.0 s, 25963.8 tps, lat 0.038 ms stddev 0.015
 transaction type: multiple scripts
 scaling factor: 1
 query mode: simple
 number of clients: 1
 number of threads: 1
 duration: 2 s
 number of transactions actually processed: 50501
 latency average = 0.039 ms
 latency stddev = 0.020 ms
 tps = 25249.464772 (including connections establishing)
 tps = 25339.454154 (excluding connections establishing)
 SQL script 0, weight 9: one.sql
  - 45366 transactions (89.8% of total, tps = 22682.070035)
  - latency average = 0.038 ms
  - latency stddev = 0.016 ms
 SQL script 1, weight 1: now.sql
  - 5135 transactions (10.2% of total, tps = 2567.394737)
  - latency average = 0.044 ms
  - latency stddev = 0.041 ms

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 2517a3a..6bac511 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -261,6 +261,18 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
 benchmarking arguments:
 
 <variablelist>
+ <varlistentry>
+  <term><option>-B</option></term>
+  <term><option>--tpc-b</option></term>
+  <listitem>
+   <para>
+    Built-in TPC-B like test which updates three tables and performs
+    an insert on a fourth. See below for details.
+    This is the default if no bench is explicitly specified.
+   </para>
+  </listitem>
+ </varlistentry>
+
 
  <varlistentry>
   <term><option>-c</option> <replaceable>clients</></term>
@@ -313,8 +325,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
    <para>
     Read transaction script from <replaceable>filename</>.
     See below for details.
-    <option>-N</option>, <option>-S</option>, and <option>-f</option>
-    are mutually exclusive.
+    Multiple <option>-f</> options are allowed.
    </para>
   </listitem>
  </varlistentry>
@@ -404,10 +415,10 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
   <term><option>--skip-some-updates</option></term>
   <listitem>
    <para>
-    Do not update <structname>pgbench_tellers</> and
-    <structname>pgbench_branches</>.
-    This will avoid update contention on these tables, but
-    it makes the test case even less like TPC-B.
+    Built-in test which updates only one table compared to <option>-B</>.
+    This avoids update contention on <structname>pgbench_tellers</>
+    and <structname>pgbench_branches</>, but it makes the test case
+    even less like TPC-B.
    </para>
   </listitem>
  </varlistentry>
@@ -499,9 +510,9 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
     Report the specified scale factor in <application>pgbench</>'s
     output.  With the built-in tests, this is not necessary; the
     correct scale factor will be detected by counting the number of
-    rows in the <structname>pgbench_branches</> table.  However, when testing
-    custom benchmarks (<option>-f</> option), the scale factor
-    will be reported as 1 unless this option is used.
+    rows in the <structname>pgbench_branches</> table.
+    However, when testing only custom benchmarks (<option>-f</> option),
+    the scale factor will be reported as 1 unless this option is used.
    </para>
   </listitem>
  </varlistentry>
@@ -511,7 +522,7 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
   <term><option>--select-only</option></term>
   <listitem>
    <para>
-    Perform select-only transactions instead of TPC-B-like test.
+    Built-in test with select-only transactions.
    </para>
   </listitem>
  </varlistentry>
@@ -552,6 +563,20 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
  </varlistentry>
 
  <varlistentry>
+  <term><option>-w</option></term>
+  <term><option>--weight</option></term>
+  <listitem>
+   <para>
+    Set the integer weight for the next test script
+    (<option>-B</>, <option>-N</>, <option>-S</> or <option>-f</>).
+    When several test scripts are used, the relative probability of
+    drawing each test is proportional to these weights.
+    The default weight is <literal>1</>.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><option>--aggregate-interval=<replaceable>seconds</></option></term>
   <listitem>
    <para>
@@ 

Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-21 Thread Martijn van Oosterhout
On Tue, Jul 21, 2015 at 02:24:47PM -0400, Todd A. Cook wrote:
 Hi,
 
 This thread seemed to trail off without a resolution.  Was anything done?

Not that I can tell. I was the original poster of this thread. We've
worked around the issue by placing a CHECKPOINT command at the end of
the migration script.  For us it's not a performance issue, more a
correctness one: tables were empty when they shouldn't have been.

I'm hoping a fix will appear in the 9.5 release, since we're intending
to release with that version.  A forced checkpoint every now and then
probably won't be a serious problem though.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




[HACKERS] brin index vacuum versus transaction snapshots

2015-07-21 Thread Kevin Grittner
If you run `make installcheck` against a cluster with
default_transaction_isolation = 'repeatable read' you get one
failure:

*** /home/kgrittn/pg/master/src/test/regress/expected/brin.out 2015-07-21 
10:21:58.798619111 -0500
--- /home/kgrittn/pg/master/src/test/regress/results/brin.out 2015-07-21 
14:00:25.169320059 -0500
***
*** 405,409 
--- 405,410 
box(point(odd, even), point(thousand, twothousand))
  FROM tenk1 ORDER BY unique2 LIMIT 5 OFFSET 5;
  VACUUM brintest; -- force a summarization cycle in brinidx
+ ERROR: brin_summarize_new_values() cannot run in a transaction that has 
already obtained a snapshot
  UPDATE brintest SET int8col = int8col * int4col;
  UPDATE brintest SET textcol = '' WHERE textcol IS NOT NULL;


The stacktrace is:

#0 brinsummarize (index=0x7f1fe0eca6b8, heapRel=0x7f1fe0ed64f8, 
numSummarized=0x30e1da8, numExisting=0x30e1da8) at brin.c:1080
#1 0x004683d3 in brinvacuumcleanup (fcinfo=0x7ffd41b95c98) at brin.c:731
#2 0x00a69b28 in FunctionCall2Coll (flinfo=0x7ffd41b96078, collation=0, 
arg1=140725706121624, arg2=0) at fmgr.c:1323
#3 0x004f7b60 in index_vacuum_cleanup (info=0x7ffd41b96198, stats=0x0) 
at indexam.c:717
#4 0x006e1004 in lazy_cleanup_index (indrel=0x7f1fe0eca6b8, stats=0x0, 
vacrelstats=0x30e0bb0) at vacuumlazy.c:1397
#5 0x006df637 in lazy_scan_heap (onerel=0x7f1fe0ed64f8, 
vacrelstats=0x30e0bb0, Irel=0x30e0ca8, nindexes=1, scan_all=0 '\000') at 
vacuumlazy.c:1108
#6 0x006dda3b in lazy_vacuum_rel (onerel=0x7f1fe0ed64f8, options=1, 
params=0x7ffd41b96798, bstrategy=0x30d9a38) at vacuumlazy.c:244
#7 0x006dc18d in vacuum_rel (relid=30220, relation=0x2f8d1a8, 
options=1, params=0x7ffd41b96798) at vacuum.c:1372
#8 0x006db711 in vacuum (options=1, relation=0x2f8d1a8, relid=0, 
params=0x7ffd41b96798, va_cols=0x0, bstrategy=0x30d9a38, isTopLevel=1 '\001') 
at vacuum.c:293
#9 0x006db31d in ExecVacuum (vacstmt=0x2f8d200, isTopLevel=1 '\001') at 
vacuum.c:121
#10 0x008bef36 in standard_ProcessUtility (parsetree=0x2f8d200, 
queryString=0x2f8c788 "VACUUM brintest;", context=PROCESS_UTILITY_TOPLEVEL, 
params=0x0, dest=0x2f8d588, completionTag=0x7ffd41b96d50 "") at utility.c:654
#11 0x008be69e in ProcessUtility (parsetree=0x2f8d200, 
queryString=0x2f8c788 "VACUUM brintest;", context=PROCESS_UTILITY_TOPLEVEL, 
params=0x0, dest=0x2f8d588, completionTag=0x7ffd41b96d50 "") at utility.c:335
#12 0x008be1a7 in PortalRunUtility (portal=0x2f36b18, 
utilityStmt=0x2f8d200, isTopLevel=1 '\001', dest=0x2f8d588, 
completionTag=0x7ffd41b96d50 "") at pquery.c:1187
#13 0x008bd1bd in PortalRunMulti (portal=0x2f36b18, isTopLevel=1 
'\001', dest=0x2f8d588, altdest=0x2f8d588, completionTag=0x7ffd41b96d50 "") at 
pquery.c:1318
#14 0x008bc80d in PortalRun (portal=0x2f36b18, 
count=9223372036854775807, isTopLevel=1 '\001', dest=0x2f8d588, 
altdest=0x2f8d588, completionTag=0x7ffd41b96d50 "") at pquery.c:816
#15 0x008b7edf in exec_simple_query (query_string=0x2f8c788 "VACUUM 
brintest;") at postgres.c:1104
#16 0x008b720c in PostgresMain (argc=1, argv=0x2f1d450, 
dbname=0x2f1d2b0 "regression", username=0x2f1d290 "kgrittn") at postgres.c:4025
#17 0x0081ab99 in BackendRun (port=0x2f3d610) at postmaster.c:4183
#18 0x0081a17a in BackendStartup (port=0x2f3d610) at postmaster.c:3859
#19 0x00816753 in ServerLoop () at postmaster.c:1618
#20 0x00813d4a in PostmasterMain (argc=3, argv=0x2f1c460) at 
postmaster.c:1263
#21 0x0074ec36 in main (argc=3, argv=0x2f1c460) at main.c:223


Note that the function mentioned in the error message is not
anywhere in the stack trace, and that there was not any explicit
transaction started -- just a VACUUM command for the table, without
any BEGIN/COMMIT.


This is using source at commit 9faa6ae14f6098e4b55f0131f7ec2694a381fb87.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-21 Thread Tom Lane
Paul Ramsey pram...@cleverelephant.ca writes:
 On July 21, 2015 at 11:07:36 AM, Tom Lane (t...@sss.pgh.pa.us) wrote:
 I'm inclined to think that it's not really necessary to worry about 
 invalidating a per-connection cache of "is this function safe to ship" 
 determinations.

 So: yes to a local cache of all forwardable functions/ops, populated in full 
 the first time through (does that speak maybe to using a binary search on a 
 sorted list instead of a hash, since I only pay the sort price once and am 
 not doing any insertions?). And then we just hold it until the connection 
 goes away. 

No, *not* populated first-time-through, because that won't handle any of
the CREATE, DROP, or UPGRADE cases.  It's also doing a lot of work you
might never need.  I was thinking of populate on demand, that is, first
time you need to know whether function X is shippable, you find that out
and then cache the answer (whether it be positive or negative).
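
For concreteness, a minimal sketch of that shape using a backend-local
dynahash (the entry layout and the lookup_shippable() helper are illustrative
assumptions, not the eventual patch):

typedef struct ShippableCacheEntry
{
	Oid			objid;		/* function or operator OID (hash key) */
	bool		shippable;	/* cached verdict */
} ShippableCacheEntry;

static HTAB *ShippableCacheHash = NULL;

static bool
is_shippable(Oid objid, PgFdwRelationInfo *fpinfo)
{
	ShippableCacheEntry *entry;
	bool		found;

	if (ShippableCacheHash == NULL)
	{
		HASHCTL		ctl;

		MemSet(&ctl, 0, sizeof(ctl));
		ctl.keysize = sizeof(Oid);
		ctl.entrysize = sizeof(ShippableCacheEntry);
		ShippableCacheHash = hash_create("Shippability cache", 256, &ctl,
										 HASH_ELEM | HASH_BLOBS);
	}

	entry = (ShippableCacheEntry *)
		hash_search(ShippableCacheHash, &objid, HASH_ENTER, &found);

	/* First sighting of this OID: compute the verdict and remember it. */
	if (!found)
		entry->shippable = lookup_shippable(objid, fpinfo);

	return entry->shippable;
}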

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-21 Thread Andres Freund
On 2015-07-21 14:07:24 -0400, Tom Lane wrote:
 Paul Ramsey pram...@cleverelephant.ca writes:
  Folks are going to be OK w/ me dropping in new syscache entries to support 
  my niche little feature?
 
 No, mainly because it adds overhead without fixing your problem.

Meh. pg_extension updates are exceedingly rare, and there's a bunch of
code in extension.c that could very well have used a syscache instead of
doing manual scans over the table.

 It's not correct to suppose that a syscache on pg_extension would
 reliably report anything; consider ALTER EXTENSION ADD/DROP, which
 does not touch the pg_extension row.

I'd have just brute-force solved that by forcing a cache inval in that
case.


But I'm not going to complain too loudly if we don't do invalidation.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench stats per script other stuff

2015-07-21 Thread Robert Haas
On Tue, Jul 21, 2015 at 12:29 PM, Fabien COELHO coe...@cri.ensmp.fr wrote:
 Pgbench *currently* already accepts multiple -f ... options, and this is a
 good thing for testing realistic loads which may intermix several kinds of
 transactions, say a lot of read-only, some updates or inserts, and very rare
 deletes...

Hmm, I didn't realize that.  The code looks a bit inconsistent right
now - e.g. we do support multiple files, but pgbench's options-parsing
loop sets ttype to a value that depends only on the last of -f, -N,
and -S encountered.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] RFC: Allow materialized views to use unlogged data store

2015-07-21 Thread Joshua D. Drake


-hackers,

What do we think about $subject? In short, the underlying table of an MV 
would have the option of being unlogged.


JD
--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing I'm offended is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: Allow materialized views to use unlogged data store

2015-07-21 Thread Andres Freund
On 2015-07-21 14:38:24 -0700, Joshua D. Drake wrote:
 What do we think about $subject? In short, the underlying table of an MV
 would have the option of being unlogged.

There was a rather long thread about the difficulties of doing so
correctly back when matviews went in. It's not just a line of code or
10.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin index vacuum versus transaction snapshots

2015-07-21 Thread Kevin Grittner
Kevin Grittner kgri...@ymail.com wrote:

 If you run `make installcheck` against a cluster with
 default_transaction_isolation = 'repeatable read' you get one
 failure:

 + ERROR: brin_summarize_new_values() cannot run in a transaction that has 
 already obtained a snapshot

 This is using source at commit 9faa6ae14f6098e4b55f0131f7ec2694a381fb87.

Ah, the very next commit fixed this while I was running the tests.

Nothing to see here, folks; move along

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-21 Thread Robert Haas
On Tue, Jun 30, 2015 at 4:32 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Yes, I suggest just a single column on pg_stat_activity called pct_complete

 trace_completion_interval = 5s (default)

 Every interval, we report the current % complete for any operation that
 supports it. We just show NULL if the current operation has not reported
 anything or never will.

I am deeply skeptical about the usefulness of a progress-reporting
system that can only report one number.  I think that, in many cases,
it won't be possible to compute an accurate completion percentage, and
even if it is, people may want more data than that for various reasons
anyway.

For example, in the case of VACUUM, suppose there is a table with
1,000,000 heap pages and 200,000 index pages (so it's probably
over-indexed, but whatever).  After reading 500,000 heap pages, we
have found 0 dead tuples.  What percentage of the work have we
finished?

It's hard to say.  If we don't find any dead tuples, we've read half
the pages we will eventually read and are therefore half done.  But if
we find even 1 dead tuple, then we've got to scan all 200,000 index
pages, so we've read only 41.7% of the pages we'll eventually touch.
If we find so many dead tuples that we have to scan the indexes
multiple times for lack of maintenance_work_mem, we'll eventually read
1,000,000 + 200,000k pages, where k is the number of index scans; if
say k = 5 then we are only 25% done.  All of these scenarios are
plausible because, in all likelihood, the dirty pages in the table are
concentrated near the end.
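
Spelling out the arithmetic for those three cases (500,000 heap pages read so far):

  no dead tuples:        500,000 / 1,000,000                 = 50.0% done
  at least 1 dead tuple: 500,000 / 1,200,000                 = 41.7% done
  k index scans needed:  500,000 / (1,000,000 + 200,000 * k) = 25.0% done at k = 5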

Now we could come up with ways of making good guesses about what is
likely to happen.  We could look at the data from pg_stat_all_tables,
historical results of vacuuming this table, the state of the
visibility map, and so on.  And that all might help.  But it's going
to be fairly hard to produce a percentage of completion that is
monotonically increasing and always accurately reflects the time
remaining.  Even if we can do it, it doesn't seem like a stretch to
suppose that sometimes people will want to look at the detail data.
Instead of getting told we're X% done (according to some arcane
formula), it's quite reasonable to think that people will want to get
a bunch of values, e.g.:

1. For what percentage of heap pages have we completed phase one (HOT
prune + mark all visible if appropriate + freeze + remember dead
tuples)?
2. For what percentage of heap pages have we completed phase two (mark
line pointers unused)?
3. What percentage of maintenance_work_mem are we currently using to
remember tuples?

For a query, the information we want back is likely to be even more
complicated; e.g. EXPLAIN output with row counts and perhaps timings
to date for each plan node.  We can insist that all status reporting
get boiled down to one number, but I suspect we would be better off
asking ourselves how we could let commands return a richer set of
data.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN index and aborted transaction

2015-07-21 Thread Robert Haas
On Sat, Jul 18, 2015 at 5:11 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Yeah, that's a bit of an open problem: we don't have any mechanism to
 mark a block range as needing resummarization, yet.  I don't have any
 great ideas there, TBH.  Some options that were discussed but never led
 anywhere:

 1. whenever a heap tuple is deleted that's minimum or maximum for a
 column, mark the index tuple as needing resummarization.  On a future
 vacuuming pass the index would be updated.  (I think this works for
 minmax, but I don't see how to apply it to inclusion).

 2. have block ranges be resummarized randomly during vacuum.

 3. Have index tuples last for only X number of transactions, marking the
 as needing summarization when that expires.

 4. Have a user-invoked function that re-runs summarization.  That way
 the user can implement any of the above policies, or others.

Maybe I'm confused here, but it seems like the only time
re-summarization can be needed is when tuples are pruned.  The mere
act of deleting a tuple, even if the delete goes on to commit, doesn't
create a scenario where re-summarization can work out to a win,
because there may still be snapshots that can see it.  At the point
where we prune the tuple, though, there might well be a benefit in
re-summarizing, because now a newly-computed summary value won't need
to cover a value that previously had to be there.

But it seems obviously impractical to re-summarize when we HOT-prune,
so it seems like the obvious thing to do is make vacuum do it.  We
know during phase one of vacuum whether we saw any dead tuples in page
range X-Y; if yes, re-summarize.  The only reason not to do this is if
it causes us to do a lot of resummarization that frequently fails to
produce a smaller range. Do you have any experimental data suggesting
that this is or is not a problem?
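
(For reference, 9.5 already exposes on-demand summarization of
not-yet-summarized ranges, so a vacuum-driven or user-invoked
re-summarization in the spirit of option 4 would presumably be exercised the
same way:)

  SELECT brin_summarize_new_values('brinidx'::regclass);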

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Alpha2/Beta1

2015-07-21 Thread Josh Berkus
All:

At the Developer's Meeting, we said we'd be releasing an alpha or beta
each month until final 9.5 release.  As such, we should be releasing a
9.5 alpha/beta in the first week of August.

My question for Hackers is: should this be Alpha2 or Beta 1?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-21 Thread Paul Ramsey
On Tue, Jul 21, 2015 at 11:29 AM, Paul Ramsey pram...@cleverelephant.ca wrote:
 On July 21, 2015 at 11:22:12 AM, Tom Lane (t...@sss.pgh.pa.us) wrote:

 No, *not* populated first-time-through, because that won't handle any of
 the CREATE, DROP, or UPGRADE cases. It's also doing a lot of work you
 might never need. I was thinking of populate on demand, that is, first
 time you need to know whether function X is shippable, you find that out
 and then cache the answer (whether it be positive or negative).


 Roger that. Off to the races..

Attached, reworked with a local cache. I felt a little dirty sticking
the cache entry right in postgres_fdw.c, so I broke out all my
nonsense into shippable.c.

Thanks!

P
diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index d2b98e1..3543312 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -1,7 +1,7 @@
 # contrib/postgres_fdw/Makefile
 
 MODULE_big = postgres_fdw
-OBJS = postgres_fdw.o option.o deparse.o connection.o $(WIN32RES)
+OBJS = postgres_fdw.o option.o deparse.o connection.o shippable.o $(WIN32RES)
 PGFILEDESC = "postgres_fdw - foreign data wrapper for PostgreSQL"
 
 PG_CPPFLAGS = -I$(libpq_srcdir)
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 81cb2b4..d6fff30 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -229,6 +229,9 @@ foreign_expr_walker(Node *node,
 	Oid collation;
 	FDWCollateState state;
 
+	/* Access extension metadata from fpinfo on baserel */
+	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) (glob_cxt->foreignrel->fdw_private);
+
 	/* Need do nothing for empty subexpressions */
 	if (node == NULL)
 		return true;
@@ -361,7 +364,7 @@ foreign_expr_walker(Node *node,
 				 * can't be sent to remote because it might have incompatible
 				 * semantics on remote side.
 				 */
-				if (!is_builtin(fe->funcid))
+				if (!is_builtin(fe->funcid) && !is_shippable(fe->funcid, fpinfo))
 					return false;
 
 				/*
@@ -407,7 +410,7 @@ foreign_expr_walker(Node *node,
 				 * (If the operator is, surely its underlying function is
 				 * too.)
 				 */
-				if (!is_builtin(oe->opno))
+				if (!is_builtin(oe->opno) && !is_shippable(oe->opno, fpinfo))
 					return false;
 
 				/*
@@ -445,7 +448,7 @@ foreign_expr_walker(Node *node,
 				/*
 				 * Again, only built-in operators can be sent to remote.
 				 */
-				if (!is_builtin(oe->opno))
+				if (!is_builtin(oe->opno) && !is_shippable(oe->opno, fpinfo))
 					return false;
 
 				/*
@@ -591,7 +594,7 @@ foreign_expr_walker(Node *node,
 	 * If result type of given expression is not built-in, it can't be sent to
 	 * remote because it might have incompatible semantics on remote side.
 	 */
-	if (check_type && !is_builtin(exprType(node)))
+	if (check_type && !is_builtin(exprType(node)) && !is_shippable(exprType(node), fpinfo))
 		return false;
 
 	/*
@@ -1404,8 +1407,7 @@ deparseConst(Const *node, deparse_expr_cxt *context)
 	}
 	if (needlabel)
 		appendStringInfo(buf, "::%s",
-						 format_type_with_typemod(node->consttype,
-												  node->consttypmod));
+						 format_type_be_qualified(node->consttype));
 }
 
 /*
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 7547ec2..9aeaf1a 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -15,6 +15,7 @@
 #include "postgres_fdw.h"
 
 #include "access/reloptions.h"
+#include "catalog/pg_foreign_data_wrapper.h"
 #include "catalog/pg_foreign_server.h"
 #include "catalog/pg_foreign_table.h"
 #include "catalog/pg_user_mapping.h"
@@ -124,6 +125,10 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 					 errmsg("%s requires a non-negative numeric value",
 							def->defname)));
 		}
+		else if (strcmp(def->defname, "extensions") == 0)
+		{
+			extractExtensionList(defGetString(def), NULL);
+		}
 	}
 
 	PG_RETURN_VOID();
@@ -153,6 +158,9 @@ InitPgFdwOptions(void)
 	/* updatable is 

[HACKERS] Eliminating CREATE INDEX comparator TID tie-breaker overhead

2015-07-21 Thread Peter Geoghegan
I have one more idea for accelerating sorting-related operations that
is both fairly effective and relatively easy to implement. This just
about clears my backlog of those, though.

There is an open item on the TODO list entitled Consider whether
duplicate keys should be sorted by block/offset [1].

Currently, comparetup_index_btree() and comparetup_index_hash() do a
tie-breaker on ItemPointer (heap TID). This started as insurance
against bad qsort() implementations that do badly with many equal
keys, but it was also thought that there is some value in having equal
keys be in physical (heap) order. Clearly the first concern has been
obsolete since 2006, when a high quality qsort() was added to
Postgres, but the second concern is probably still valid.

Overhead
--------

Tom said in 2008 that the CREATE INDEX overhead of doing this may be
about 7% [2]. It seems even worse now, presumably due to SortSupport
reducing costs elsewhere. Several years ago, I suggested ripping out
the tie-breaker too, but didn't get very far with that idea due to the
potential downsides for index scans. I'm not about to revisit the
discussion from 2011 about whether or not it should be torn out. I'd
rather just (mostly) fix the real problem without changing the
behavior of comparetup_index_btree() (or comparetup_index_hash()).

The real problem is that we do pointer chasing to get to the
IndexTuple (tuple proper), where we might get away with only
examining the SortTuple, as would happen in comparetup_heap() in the
event of an equivalent heap tuplesort, for example. The added memory
latency hurts lower cardinality attributes (when only one attribute
appears in the Index definition and the IndexTuple isn't cache
resident), and wastes memory bandwidth on the system, which is
something that there is virtually never enough of.

Patch
=====

Attached patch adds a new tie-breaker which is used only when the
tuples being sorted still fit in memory. Currently, the field
SortTuple.tupindex has a different purpose depending on whether we're
building initial runs or merging runs. However, SortTuple.tupindex
currently has no purpose when a sort can complete entirely in memory,
so that case is available to support a third meaning:
SortTuple.tupindex can be used to hold a simple ordinal number. When
our status is TSS_INITIAL (when we qsort()), this is used as a proxy
for what the TID tie-breaker would ordinarily return.

This reliably preserves the existing behavior because index tuplesort
clients invariably scan heap tuples in physical order, and so nothing
is lost.
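
A sketch of the comparator tail this implies (simplified; the real change
lives in tuplesort.c, and the external-sort branch below just stands in for
the existing ItemPointer comparison via a hypothetical helper):

/* Tie-break after all key comparisons have returned equality. */
static int
index_tiebreak(const SortTuple *a, const SortTuple *b, Tuplesortstate *state)
{
	if (state->status == TSS_INITIAL)
	{
		/*
		 * In-memory sort: tupindex is otherwise unused here, so it holds
		 * the ordinal position in which each tuple was fed to tuplesort.
		 * Index builds scan the heap in physical order, so comparing
		 * ordinals matches the old heap-TID order without dereferencing
		 * the IndexTuple.
		 */
		if (a->tupindex != b->tupindex)
			return (a->tupindex < b->tupindex) ? -1 : 1;
		return 0;
	}

	/* Spilled to disk: fall back to comparing the stored heap TIDs. */
	return compare_heap_tids(a, b);		/* hypothetical helper */
}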

Performance
-----------

The patch can make CREATE INDEX with an unlogged B-Tree index with a
single int4 attribute as much as 15% faster (although I think under
10% is much more likely). That seems like an improvement that makes
the patch worthwhile.

Design considerations and consequences


I don't think that there is much point in getting rid of
SortTuple.tupindex for in-memory sorts, which this patch closes off
the possibility of. I tested the performance of a SortTuple struct
with only a datum1 and tuple proper for in-memory sorting at one
time, and it was disappointing, and in any case unworkably invasive.

I also don't think it's worth worrying about the fact that tupindex is
assigned an ordinal number even in cases where it won't be used inside
the comparator. The overhead has to be indistinguishable from zero
anyway, and the right place to put that assignment happens to be where
there is generic TSS_INITIAL copying within puttuple_common().

I'm not concerned about synchronized scans breaking my assumption of a
physical ordering of heaptuples being fed to tuplesort.c. I think that
it is unlikely to ever be worth seriously considering this case.

I have a hard time imagining anything (beyond synchronous scans)
breaking my assumption that index tuplesorts receive tuples in heap
physical order. If anything was to break that in the future (e.g.
parallelizing the heap scan for index builds), then IMV the onus
should be on that new case to take appropriate precautions against
breaking my assumption.

[1] https://wiki.postgresql.org/wiki/Todo#Sorting
[2] http://www.postgresql.org/message-id/23321.1205726...@sss.pgh.pa.us
-- 
Peter Geoghegan
From 22e3aa14436d3c4ca8ccde2cc55777c6207fd9f0 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan peter.geoghega...@gmail.com
Date: Thu, 16 Jul 2015 22:26:01 -0700
Subject: [PATCH] Cache-aware index sort comparator tie-breakers

For the B-Tree and hash index build sort cases only, tuplesort
previously performed a tie-breaker on ItemPointer (heap physical order).
This may later provide some performance benefits for index scans.
However, it implied significant overhead during index builds, because
additional pointer chasing (to the IndexTuple tuple proper) was
required in the comparators, which adds significant memory latency where
the tuple proper does not otherwise need to be 

Re: [HACKERS] Arguable RLS security bug, EvalPlanQual() paranoia

2015-07-21 Thread Stephen Frost
Robert,

As I mentioned up thread, I'm out until the 27th. I have posted a patch
which I will push to fix the copy.c issue, and I have already stated that
I'll address the statistics issue. Further, Joe has also been working on
issues but he was out of pocket last week attending a conference.

I'm happy to work up a documentation patch for this when I get back.

Thanks!

Stephen

On Tuesday, July 21, 2015, Robert Haas robertmh...@gmail.com wrote:

 On Sun, Jul 19, 2015 at 8:56 PM, Peter Geoghegan p...@heroku.com wrote:
  On Mon, Jun 1, 2015 at 12:29 AM, Peter Geoghegan p...@heroku.com wrote:
  If you're using another well known MVCC database system that has RLS,
  I imagine when this happens the attacker similarly waits on the
  conflicting (privileged) xact to finish (in my example in the patch,
  Bob's xact). However, unlike with the Postgres READ COMMITTED mode,
  Mallory would then have her malicious UPDATE statement entirely rolled
  back, and her statement would acquire an entirely new MVCC snapshot,
  to be used by the USING security barrier qual (and everything else)
  from scratch. This other system would then re-run her UPDATE with the
  new MVCC snapshot. This would repeat until Mallory's UPDATE statement
  completes without encountering any concurrent UPDATEs/DELETEs to her
  would-be affected rows.
 
  In general, with this other database system, an UPDATE must run to
  completion without violating MVCC, even in READ COMMITTED mode. For
  that reason, I think we can take no comfort from the presumption that
  this flexibility in USING security barrier quals (allowing subqueries,
  etc) works securely in this other system. (I actually didn't check
  this out, but I imagine it's true).
 
  Where are we on this?
 
  Discussion during pgCon with Heikki and Andres led me to believe that
  the issue is acceptable. The issue can be documented to help ensure
  that user expectation is in line with actual user-visible behavior.
  Unfortunately, I think that that will be a clunky documentation patch.

 Perhaps I'm missing something, but it looks to me like Stephen has
 done absolutely nothing about the many issues reported with the RLS
 patch.  I organized the open items list by topic on June 26th; almost
 a month later, four more issues have been added to the section on RLS,
 and none have been removed.

 I think it is right that we should be concerned about this.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




Re: [HACKERS] Alpha2/Beta1

2015-07-21 Thread Gavin Flower

On 22/07/15 09:01, Josh Berkus wrote:
[...]


My question for Hackers is: should this be Alpha2 or Beta 1?



Yes!  :-)

(The decision, as to which, I'll leave to the REAL Developers.)

More seriously, I expect that the alpha probably generated a lot of 
useful feedback.  Was this actually the case?



Cheers,
Gavin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Arguable RLS security bug, EvalPlanQual() paranoia

2015-07-21 Thread Robert Haas
On Sun, Jul 19, 2015 at 8:56 PM, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Jun 1, 2015 at 12:29 AM, Peter Geoghegan p...@heroku.com wrote:
 If you're using another well known MVCC database system that has RLS,
 I imagine when this happens the attacker similarly waits on the
 conflicting (privileged) xact to finish (in my example in the patch,
 Bob's xact). However, unlike with the Postgres READ COMMITTED mode,
 Mallory would then have her malicious UPDATE statement entirely rolled
 back, and her statement would acquire an entirely new MVCC snapshot,
 to be used by the USING security barrier qual (and everything else)
 from scratch. This other system would then re-run her UPDATE with the
 new MVCC snapshot. This would repeat until Mallory's UPDATE statement
 completes without encountering any concurrent UPDATEs/DELETEs to her
 would-be affected rows.

 In general, with this other database system, an UPDATE must run to
 completion without violating MVCC, even in READ COMMITTED mode. For
 that reason, I think we can take no comfort from the presumption that
 this flexibility in USING security barrier quals (allowing subqueries,
 etc) works securely in this other system. (I actually didn't check
 this out, but I imagine it's true).

 Where are we on this?

 Discussion during pgCon with Heikki and Andres led me to believe that
 the issue is acceptable. The issue can be documented to help ensure
 that user expectation is in line with actual user-visible behavior.
 Unfortunately, I think that that will be a clunky documentation patch.

Perhaps I'm missing something, but it looks to me like Stephen has
done absolutely nothing about the many issues reported with the RLS
patch.  I organized the open items list by topic on June 26th; almost
a month later, four more issues have been added to the section on RLS,
and none have been removed.

I think it is right that we should be concerned about this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] make check changes have caused buildfarm deterioration.

2015-07-21 Thread Michael Paquier
On Wed, Jul 22, 2015 at 10:04 AM, Peter Eisentraut pete...@gmx.net wrote:
 On 7/21/15 10:00 AM, Tom Lane wrote:
 I agree; this change may have seemed like a good idea at the time, but
 it was not.  Failures during make check's install step are rare enough
 that you don't really need all that output in your face to help with the
 rare situation where it fails.  And for the buildfarm's purposes, it is
 surely desirable to segregate that output from the actual check step.

 It wasn't really an idea; it was just not necessary anymore.  We can put
 it [directing the make install output into a file] back if that's what
 people prefer.

OK... Attached are two patches (please merge them into a single
commit, I am just separating them as they are separate issues):
- 0001 adds a missing entry in test_ddl_deparse's .gitignore. I
mentioned that upthread.
- 0002 redirects the installation logs into
abs_top_builddir/tmp_install/log/install.log. We could redirect it
only to abs_top_builddir/log/ but tmp_install is not removed after a
run of a regression make target.
-- 
Michael
From 0614062b1d41bd72ed4a0ba000d639607156066d Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Wed, 22 Jul 2015 10:15:20 +0900
Subject: [PATCH 1/2] Add missing entry log/ in .gitignore for test_ddl_deparse

Entry forgotten by 9faa6ae1.
---
 src/test/modules/test_ddl_deparse/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/test/modules/test_ddl_deparse/.gitignore b/src/test/modules/test_ddl_deparse/.gitignore
index 6628455..3337b3d 100644
--- a/src/test/modules/test_ddl_deparse/.gitignore
+++ b/src/test/modules/test_ddl_deparse/.gitignore
@@ -1 +1,2 @@
+/log/
 /results/
-- 
2.4.6

From 9df069d70f8c9aecf7254ebfdec85460faf5 Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Wed, 22 Jul 2015 10:24:17 +0900
Subject: [PATCH 2/2] Redirect installation output of make check into a log
 file

dbf2ec1a changed make check so that the installation logs get directed
to stdout and stderr; however, some recent objections have shown that this
was a bad move. Hence put it back into a log file under tmp_install/log,
which is created once per invocation of any make target doing regression
tests.
---
 src/Makefile.global.in | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 8d1250d..e2f7211 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -307,9 +307,10 @@ temp-install:
 ifndef NO_TEMP_INSTALL
 ifeq ($(MAKELEVEL),0)
 	rm -rf '$(abs_top_builddir)'/tmp_install
-	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install
+	$(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
+	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
 endif
-	$(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install || exit; done)
+	$(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install >>'$(abs_top_builddir)'/tmp_install/log/install.log || exit; done)
 endif
 
 PROVE = @PROVE@
-- 
2.4.6


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alpha2/Beta1

2015-07-21 Thread Peter Geoghegan
On Tue, Jul 21, 2015 at 8:40 PM, Noah Misch n...@leadboat.com wrote:
 I vote for alpha2.  Comparing the "Open Issues" and "resolved after 9.5alpha1"
 sections of https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items, we've
 not transitioned to a qualitatively different level of API stability.

I agree. I'd hoped that my bugfixes would get committed in time for
what we were going to call Beta 1. Everyone qualified seems to be very
busy with other things that are more pressing.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin index vacuum versus transaction snapshots

2015-07-21 Thread Jeff Janes
On Tue, Jul 21, 2015 at 3:10 PM, Kevin Grittner kgri...@ymail.com wrote:

 Kevin Grittner kgri...@ymail.com wrote:

  If you run `make installcheck` against a cluster with
  default_transaction_isolation = 'repeatable read' you get one
  failure:

  + ERROR: brin_summarize_new_values() cannot run in a transaction that
 has already obtained a snapshot

  This is using source at commit 9faa6ae14f6098e4b55f0131f7ec2694a381fb87.

 Ah, the very next commit fixed this while I was running the tests.

 Nothing to see here, folks; move along



Still looks broken from here.


Re: [HACKERS] pgbench stats per script other stuff

2015-07-21 Thread Josh Berkus
On 07/21/2015 09:29 AM, Fabien COELHO wrote:
 Maybe -f file.sql:weight (yuk from my point of view, but it can be done
 easily).

Maybe it's past time for pgbench to have a config file?

Given that we want to define some per-workload options, the config file
would probably need to be YAML or JSON, e.g.:

pgbench --config=workload1.pgb

workload1.pgb
-

database: bench
port: 5432
host: localhost
user: josh
clients : 16
threads : 4
response-times : on
stats-level: script
script1:
    file: script1.bench
    weight: 3
script2:
    file: script2.bench
    weight: 1

the above would execute a pgbench with 16 clients, 4 threads, script1
three times as often as script2, and report stats at the script (rather
than SQL statement) level.




-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Planner debug views

2015-07-21 Thread Qingqing Zhou
Here is a proposal introducing some debugging infrastructure into the
core. The basic idea is to allow us to query the planner search space.
To do so, we can dump related information to csv files and use foreign
table to query them. So here I propose two foreign tables:

  create foreign table pg_planner_rels(rel text, content text) ...
  create foreign table pg_planner_paths(rel text, path text,
replacedby text, totalcost float, cheapest text, content text) ...

Where
1. pg_planner_rels consists of RelOptInfo information.
2. pg_planner_paths consists of the paths belonging to each rel.
3. Field rel is a RelOptInfo* pointer so these two tables can join each other.
4. We can further add a subqueries view, or easily expand the
pg_planner_paths table by adding startup_cost etc.

And an example output is attached. The nice thing about it is that it
reuses the database's own query power to diagnose database query problems
:-). From the tables, we can find out all paths considered and why
they were discarded.
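
Since rel is a join key, the two views compose; e.g. a hypothetical query
listing the surviving (never-replaced) paths per relation, assuming discarded
paths have replacedby set:

  SELECT r.content AS rel, p.totalcost, p.content AS path
  FROM pg_planner_paths p
  JOIN pg_planner_rels r ON r.rel = p.rel
  WHERE p.replacedby IS NULL
  ORDER BY r.rel, p.totalcost;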

I have a draft implementation of the above idea - the main changes are:
1. Keep removed paths in a separate list, along with the reason for removal
(see the replacedby field);
2. Change the debug_print_rel() family to use StringInfo instead of printf().

I would also suggest we change DEBUG_OPTIMIZER into a GUC, because we
need this ability in release builds as well. There is almost no
overhead when the GUC is off.

Before I get into more details, I'd like to see if we see this is a
good way to go.

Thoughts?

Regards,
Qingqing


-- query to test
postgres=# explain select a.i, count(a.i) from a join b on a.i=b.i
group by a.i order by a.i limit 1;
  QUERY PLAN

--
 Limit  (cost=14.47..15.64 rows=1 width=4)
   ->  GroupAggregate  (cost=14.47..120.67 rows=91 width=4)
         Group Key: a.i
         ->  Merge Join  (cost=14.47..97.51 rows=4450 width=4)
               Merge Cond: (a.i = b.i)
               ->  Index Only Scan using ai on a  (cost=0.28..45.07 rows=600 width=4)
               ->  Sort  (cost=14.20..14.72 rows=210 width=4)
                     Sort Key: b.i
                     ->  Seq Scan on b  (cost=0.00..6.10 rows=210 width=4)
(9 rows)

-- after query is done we can see rels
postgres=# select * from pg_planner_rels;
   rel   |   content
-+-
 2f49b70 | RELOPTINFO (1): rows=600 width=4
 2f49d08 | RELOPTINFO (2): rows=210 width=4
 2f4a590 | RELOPTINFO (1 2): rows=4450 width=4
 1543340 | RELOPTINFO (1): rows=1 width=64
(4 rows)

-- and paths
postgres=# select rel, path, replacedby, totalcost, substr(content, 1,
30) from pg_planner_paths ;
   rel   |  path   | replacedby | totalcost | substr
-+-++---+
 2f49b70 | 2f4abb8 || 1 | IdxScan(1) rows=600 cost=0.28.
 2f49b70 | 2f4a1f0 ||10 | SeqScan(1) rows=600 cost=0.00.
 2f49b70 | 2f49e10 ||45 | IdxScan(1) rows=600 cost=0.28.
 2f49b70 | 2f4d6b8 | 2f4d6b8| 5 | BitmapHeapScan(1) rows=600 cos
 2f49d08 | 2f4e3e0 || 6 | SeqScan(2) rows=210 cost=0.00.
 2f4a590 | 2f4f938 ||77 | HashJoin(1 2) rows=4450 cost=1
 2f4a590 | 2f4f678 ||88 | HashJoin(1 2) rows=4450 cost=8
 2f4a590 | 2f4f330 ||98 | MergeJoin(1 2) rows=4450 cost=
 2f4a590 | 2f4f850 ||   140 | NestLoop(1 2) rows=4450 cost=0
 2f4a590 | 2f4f188 ||  1907 | NestLoop(1 2) rows=4450 cost=0
 2f4a590 | 2f4f278 ||  1942 | NestLoop(1 2) rows=4450 cost=0
 2f4a590 | 2f4f8a8 | 2f4f8a8|  1908 | NestLoop(1 2) rows=4450 cost=0
 2f4a590 | 2f4f700 | 2f4f700|   105 | MergeJoin(1 2) rows=4450 cost=
 2f4a590 | 2f4eea0 | 2f4f330|   120 | MergeJoin(1 2) rows=4450 cost=
 2f4a590 | 2f4f220 | 2f4f278|  5280 | NestLoop(1 2) rows=4450 cost=0
 2f4a590 | 2f4efb8 | 2f4f188|  5245 | NestLoop(1 2) rows=4450 cost=0
 1543340 | 1543d08 || 1 | ForeignScan(1) rows=1 cost=0.0
 1543cf8 | 2f49a20 || 3 | ForeignScan(1) rows=17 cost=0.
(18 rows)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] security labels on databases are bad for dump restore

2015-07-21 Thread Alvaro Herrera
Noah Misch wrote:
 On Mon, Jul 20, 2015 at 07:01:14PM -0400, Adam Brightwell wrote:

  I think that makes sense, but what about other DATABASE level info
  such as COMMENT?  Should that also be ignored by pg_dump as well?  I'm
  specifically thinking of the discussion from the following thread:
  
  http://www.postgresql.org/message-id/20150317172459.gm3...@alvh.no-ip.org
  
  If COMMENT is included then why not SECURITY LABEL or others?
 
 In any given situation, we should indeed restore both pg_database comments and
 pg_database security labels, or we should restore neither.

Agreed.

   In a greenfield, I would make pg_dump --create reproduce pertinent entries
   from datacl, pg_db_role_setting, pg_shseclabel and pg_shdescription.  I would
   make non-creating pg_dump reproduce none of those.
  
  I think the bigger question is Where is the line drawn between
  pg_dump and pg_dumpall?.  At what point does one tool become the
  other?
 
 That question may be too big for me.

I don't think there's any line near pg_dumpall.  That tool seems to
have grown out of desperation without much actual design.  I think it
makes more sense to plan around what's the best pg_dump behavior for the
various use cases.

I like Noah's proposal of having pg_dump --create reproduce all
database-level state.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-07-21 Thread Heikki Linnakangas

On 07/21/2015 10:38 AM, Pavel Stehule wrote:

where we are with this patch? Can I do some for it?


I still feel this approach is misguided, and we should be tweaking psql 
and/or libpq instead. I don't feel strongly though, and if some other 
committer wants to pick this up in its current form, I won't object. So 
this patch has reached an impasse, and if no-one else wants to pick this 
up, I'm going to mark this as Returned with Feedback and move on.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-07-21 Thread Pavel Stehule
2015-07-21 9:53 GMT+02:00 Heikki Linnakangas hlinn...@iki.fi:

 On 07/21/2015 10:38 AM, Pavel Stehule wrote:

 where we are with this patch? Can I do some for it?


 I still feel this approach is misguided, and we should be tweaking psql
 and/or libpq instead. I don't feel strongly though, and if some other
 committer wants to pick this up in its current form, I won't object. So
 this patch has reached an impasse, and if no-one else wants to pick this
 up, I'm going to mark this as Returned with Feedback and move on.


Can we define where we have agreement and where we do not? The missing context
for the RAISE EXCEPTION statement is an important issue and I would like to solve it.

last patch has two parts:

1. remove the plpgsql fix that removes the context for the plpgsql RAISE
statement - it works well enough for levels up to NOTICE, but badly for
EXCEPTION and higher levels.

2. enforce filtering of CONTEXT field on both sides (client/log)

For me, @1 is important and a good solution (because there is a strange
inconsistency between PLpgSQL and every other PL); @2 allows more ways - but
probably log_min_context (WARNING) is a good idea too.

The advantage of context filtering on the server side (client_min_message) is
that it can be controlled by plpgsql - so I can decide dynamically whether
some NOTICE will have context or not.

The complexity will be roughly the same for psql/libpq as for server-side filtering.

Regards

Pavel


 - Heikki




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-21 Thread Michael Paquier
On Mon, Jul 20, 2015 at 9:59 PM, Beena Emerson memissemer...@gmail.com wrote:
 Simon Riggs wrote:

 The choice between formats is not
 solely predicated on whether we have multi-line support.

 I still think writing down some actual use cases would help bring the
 discussion to a conclusion. Inventing a general facility is hard without
 some clear goals about what we need to support.

 We need to at least support the following:
 - Grouping: Specify a set of standbys along with the minimum number of commits
 required from the group.
 - Group Type: Groups can either be priority or quorum groups.

As far as I understood, at the lowest level a group is just an alias
for a list of nodes; quorum or priority are properties that can be
applied to a group of nodes when this group is used in the expression
defining what synchronous commit means.

 - Group names: to simplify status reporting
 - Nesting: At least 2 levels of nesting

If I am following correctly, at the first level there is the
definition of the top level objects, like groups and sync expression.

 Using JSON, sync rep parameter to replicate in 2 different clusters could be
 written as:

{"remotes":
 {"quorum": 2,
  "servers": [{"london":
 {"priority": 2,
  "servers": ["lndn1", "lndn2", "lndn3"]
 }}
 ,
   {"nyc":
 {"priority": 1,
  "servers": ["ny1", "ny2"]
 }}
   ]
 }
 }
 The same parameter in the new language (as suggested above) could be written
 as:
  'remotes: 2(london: 1[lndn1, lndn2, lndn3], nyc: 1[ny1, ny2])'

OK, there is a typo. That's actually 2(london: 2[lndn1, lndn2, lndn3],
nyc: 1[ny1, ny2]) in your grammar. Honestly, if we want group aliases,
I think that JSON makes the most sense. One of the advantages of a
group is that you can use it in several places in the blob and set
different properties on it, hence we should be able to define a
group outside of the sync expression.

Hence I would think that something like that makes more sense:
{
"sync_standby_names":
{
"quorum":2,
"nodes":
[
{"priority":1,"group":"cluster1"},
{"quorum":2,"nodes":["node1","node2","node3"]}
]
},
"groups":
{
"cluster1":["node11","node12","node13"],
"cluster2":["node21","node22","node23"]
}
}

 Also, I was thinking the name of the main group could be optional.
 Internally, it can be given the name 'default group' or 'main group' for
 status reporting.

 The above could also be written as:
  '2(london: 2[lndn1, lndn2, lndn3], nyc: 1[ny1, ny2])'

 backward compatible:
 In JSON, while validating we may have to check if it starts with '{' to go

Something worth noticing: application_name can begin with '{'.

 for JSON parsing else proceed with the current method.

 A,B,C = 1[A,B,C]. This can be added in the new parser code.

This makes sense. We could do the same for the JSON-based format by
reusing the in-memory structure used to deparse the blob when the former
grammar is used.
-- 
Michael




Re: [HACKERS] make check changes have caused buildfarm deterioration.

2015-07-21 Thread Andres Freund
On 2015-07-21 14:39:42 +0900, Michael Paquier wrote:
 Regarding initdb.log and postmaster.log, this is definitely a bug.
 Those have been moved by dcae5fa from log/ to tmp_check/log/,
 tmp_check/ getting removed at the end of pg_regress if there are no
 failures counted.

FWIW, I think that's bad TM. Especially when comparing good/bad
buildfarm runs the ability to get logs from good runs is essential.

Andres




Re: [HACKERS] [COMMITTERS] pgsql: Improve BRIN documentation somewhat

2015-07-21 Thread Emre Hasegeli
 Emre Hasegeli told me on IM he's going to submit a patch to add
 something similar for the inclusion opclass framework.

It is attached.  Thank you for pushing forward to improve the documentation.


0001-Improve-BRIN-documentation-for-Inclusion-opclass.patch
Description: Binary data



Re: [HACKERS] make check changes have caused buildfarm deterioration.

2015-07-21 Thread Alvaro Herrera
Michael Paquier wrote:
 On Tue, Jul 21, 2015 at 2:39 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:

  Regarding initdb.log and postmaster.log, this is definitely a bug.
  Those have been moved by dcae5fa from log/ to tmp_check/log/,
  tmp_check/ getting removed at the end of pg_regress if there are no
  failures counted. Both files will be saved in log/ at the location
  pg_regress is called using outputdir, whose default is '.'. This way
  behavior is similar to ~9.4. Attached is a patch to fix this for 9.5
  and master.
 
 Something I just noticed: an entry for log/ in test_ddl_deparse's
 gitignore is missing.

Well, three things: 1) the entry is not missing right now, but it will
be missing if we apply your patch; 2) the file is inconsistent with the
other test modules anyway so we might as well apply your patch to make
them all alike; 3) we shouldn't really do anything about that until the
other patch's fate is decided.

So, regarding the other patch, I don't know if it's that useful to keep
the log files when the check succeeds; and if it fails, the only problem
we have, I think, is that the path is wrong in the buildfarm code and
that seems easily fixed, but do we want to make the master branch
different from the others?  Maybe the BF code can look up the new path
first, and if it can't find the file then look in the old path.

Unless I'm misunderstanding the whole thing.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-21 Thread Ildus Kurbangaliev

Hello.
I did some refactoring of the previous patch. Improvements:

1) The wait is determined by class and event without affecting the atomic
usage of it. They are still stored in one variable. This improvement gives
an opportunity to make more detailed views later (waits can be grouped by
class).
2) Only the active wait of each backend is visible. The pg_report_wait_end()
function is called at the end of a wait and clears it.
3) Wait name determination was optimized (the last version looped for each
of them, and was very heavy). I added a lazy `group` field to LWLock, which
is used as an index into the lwlock names array.
4) New waits can be added in a simpler way. For example, an individual
lwlock requires only specifying its name in the LWLock names array.
5) Added new types of waits: Storage, Network, Latch.

This patch is more informative and it'll be easier to extend.

Sample:

b1=# select pid, wait_event from pg_stat_activity;

pid  |  wait_event
---+--
17099 | LWLocks: BufferCleanupLock
17100 | Locks: Transaction
17101 | LWLocks: BufferPartitionLock
17102 |
17103 | Network: READ
17086 |
(6 rows)
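
Since the wait_event text has the form Class: Event, waits can already be
grouped by class with plain SQL (a sketch against the sample above; it uses
nothing beyond the column added by the patch):

b1=# select split_part(wait_event, ':', 1) as wait_class, count(*)
     from pg_stat_activity
     where wait_event is not null
     group by 1;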

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1dd31b3..91dca5c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4661,8 +4661,10 @@ XLOGShmemInit(void)
 	LWLockRegisterTranche(XLogCtl->Insert.WALInsertLockTrancheId, &XLogCtl->Insert.WALInsertLockTranche);
 	for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
 	{
-		LWLockInitialize(&WALInsertLocks[i].l.lock,
+		LWLock *lock = &WALInsertLocks[i].l.lock;
+		LWLockInitialize(lock,
 		 XLogCtl->Insert.WALInsertLockTrancheId);
+		lock->group = LWLOCK_WAL_INSERT;
 		WALInsertLocks[i].l.insertingAt = InvalidXLogRecPtr;
 	}
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e82a53a..fd04258 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -621,6 +621,7 @@ CREATE VIEW pg_stat_activity AS
 S.query_start,
 S.state_change,
 S.waiting,
+S.wait_event,
 S.state,
 S.backend_xid,
 s.backend_xmin,
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 4a650cc..e5c023f 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -36,6 +36,7 @@
 #include "tcop/tcopprot.h"
 #include "utils/memutils.h"
 #include "storage/proc.h"
+#include "pgstat.h"
 
 
 char	   *ssl_cert_file;
@@ -129,6 +130,8 @@ secure_read(Port *port, void *ptr, size_t len)
 	ssize_t		n;
 	int			waitfor;
 
+	pgstat_report_wait_start(WAIT_NETWORK, WAIT_NETWORK_READ);
+
 retry:
 #ifdef USE_SSL
 	waitfor = 0;
@@ -175,6 +178,7 @@ retry:
 	 * interrupts from being processed.
 	 */
 	ProcessClientReadInterrupt(false);
+	pgstat_report_wait_end();
 
 	return n;
 }
@@ -209,6 +213,8 @@ secure_write(Port *port, void *ptr, size_t len)
 	ssize_t		n;
 	int			waitfor;
 
+	pgstat_report_wait_start(WAIT_NETWORK, WAIT_NETWORK_WRITE);
+
 retry:
 	waitfor = 0;
 #ifdef USE_SSL
@@ -254,6 +260,7 @@ retry:
 	 * interrupts from being processed.
 	 */
 	ProcessClientWriteInterrupt(false);
+	pgstat_report_wait_end();
 
 	return n;
 }
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 90ec4f8..dcebfe4 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -55,6 +55,7 @@
 #include "storage/latch.h"
 #include "storage/pmsignal.h"
 #include "storage/shmem.h"
+#include "pgstat.h"
 
 /* Are we currently in WaitLatch? The signal handler would like to know. */
 static volatile sig_atomic_t waiting = false;
@@ -262,6 +263,8 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 #endif
 	}
 
+	pgstat_report_wait_start(WAIT_LATCH, 0);
+
 	waiting = true;
 	do
 	{
@@ -500,6 +503,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 		}
 	} while (result == 0);
 	waiting = false;
+	pgstat_report_wait_end();
 
 	return result;
 }
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 0e3aaee..92bb5a0 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -177,6 +177,8 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 	/* Ensure that signals are serviced even if latch is already set */
 	pgwin32_dispatch_queued_signals();
 
+	pgstat_report_wait_start(WAIT_LATCH, 0);
+
 	do
 	{
 		/*
@@ -278,6 +280,8 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 		}
 	} while (result == 0);
 
+	pgstat_report_wait_end();
+
 	/* Clean up the event object we created for the socket */
 	if (sockevent != WSA_INVALID_EVENT)
 	{
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e9fbc38..b379f7f 100644
--- a/src/backend/postmaster/pgstat.c
+++ 

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-21 Thread Andres Freund
On 2015-07-21 13:11:36 +0300, Ildus Kurbangaliev wrote:
  
  /*
   * Top-level transactions are identified by VirtualTransactionIDs comprising
 diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
 index cff3b99..55b0687 100644
 --- a/src/include/storage/lwlock.h
 +++ b/src/include/storage/lwlock.h
 @@ -58,6 +58,9 @@ typedef struct LWLock
  #ifdef LOCK_DEBUG
   struct PGPROC *owner;   /* last exlusive owner of the lock */
  #endif
 +
 + /* LWLock group, initialized as -1, calculated in first acquire */
 + int group;
  } LWLock;

I'd very much like to avoid increasing the size of struct LWLock. We
have a lot of those and I'd still like to inline them with the buffer
descriptors. Why do we need a separate group and can't reuse the
tranche?  That might require creating a few more tranches, but ...?

Andres




Re: [HACKERS] PL/pgSQL, RAISE and error context

2015-07-21 Thread Pavel Stehule
2015-07-09 23:16 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:



 2015-07-09 22:57 GMT+02:00 Merlin Moncure mmonc...@gmail.com:

 On Thu, Jul 9, 2015 at 3:31 PM, Pavel Stehule pavel.steh...@gmail.com
 wrote:
 
 
  2015-07-09 20:08 GMT+02:00 Merlin Moncure mmonc...@gmail.com:
 
  On Thu, Jul 9, 2015 at 10:48 AM, Pavel Stehule 
 pavel.steh...@gmail.com
  wrote:
  
   2015-07-09 15:17 GMT+02:00 Merlin Moncure mmonc...@gmail.com:
  
   On Wed, Jul 8, 2015 at 11:28 PM, Pavel Stehule
   pavel.steh...@gmail.com

   wrote:
Hi
   
second version of this patch
   
make check-world passed
  
   quickly scanning the patch, the implementation is trivial (minus
   regression test adjustments), and is, IMSNSHO, the right solution.
  
  
   yes, it is right way - the behave of RAISE statement will be much
 more
   cleaner
  
  
   Several of the source level comments need some minor wordsmithing
 and
   the GUCs are missing documentation.  If we've got consensus on the
   approach, I'll pitch in on that.
  
   thank you
 
  revised patch attached. added GUC docs and cleaned up pg_settings
  language.  Also tested patch and it works beautifully.
 
  Note, Pavel's patch does adjust default behavior to what we think is
  the right settings.
 
 
  Thank you for documentation.
 
  There is a small error - the default for client_min_context is error, not
  notice. With this level the diff from the regression tests is minimal. The
  default for log_min_context should be warning.

 whoop!  thanks.   Also, I was playing a bit with the idea of making
 client_min_context a superuser-only setting.  The idea being this
 could be used to prevent leakage of stored procedure code in cases
 where the admins don't want it to be exposed.  I figured it was a bad
 idea though; it would frustrate debugging in reasonable cases.


 This is not designed for security usage. Probably there can be some rule
 in the future - the possibility to see or not see an error context - OFF, ON.
 For that use case, setting some minimum level is not a good way.


Hi

Where are we with this patch? Can I do something for it?

Regards

Pavel



 Regards

 Pavel





 merlin





Re: [HACKERS] make check changes have caused buildfarm deterioration.

2015-07-21 Thread Alvaro Herrera
Andres Freund wrote:
 On 2015-07-21 14:39:42 +0900, Michael Paquier wrote:
  Regarding initdb.log and postmaster.log, this is definitely a bug.
  Those have been moved by dcae5fa from log/ to tmp_check/log/,
  tmp_check/ getting removed at the end of pg_regress if there are no
  failures counted.
 
 FWIW, I think that's bad TM. Especially when comparing good/bad
 buildfarm runs the ability to get logs from good runs is essential.

Hm, okay, so let's put it back where it was.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Fillfactor for GIN indexes

2015-07-21 Thread Alexander Korotkov
On Tue, Jul 21, 2015 at 12:40 PM, Heikki Linnakangas hlinn...@iki.fi
wrote:

 Has anyone done any performance testing of this?

 The purpose of a fillfactor is to avoid the storm of page splits right
 after building the index, when there are some random updates to the table.
 It causes the index to bloat, as every full page is split to two half-full
 pages, and also adds latency to all the updates that have to split pages.

 As the code stands, do we actually have such a problem with GIN? Let's
 investigate. Let's create a little test table:

 create table foo (id int4, t text);
 insert into foo select g, 'foo' from generate_series(1, 1000000) g;

 And some indexes on it:

 -- B-tree index on id, 100%
 create index btree_100 on foo (id) with (fillfactor=100);
 -- B-tree index on id, 90% (which is the default)
 create index btree_90 on foo (id) with (fillfactor=90);
 -- GIN index on id. Id is different on each row, so this index has no
 posting trees, just the entry tree.
 create index gin_id on foo using gin (id) with (fastupdate=off);
 -- GIN index on t. T is the same on each row, so this index consists of a
 single posting tree.
 create index gin_t on foo using gin (t) with (fastupdate=off);

 Immediately after creation, the index sizes are:

 postgres=# \di+
   List of relations
  Schema |   Name| Type  | Owner  | Table |  Size   | Description
 +---+---++---+-+-
  public | btree_100 | index | heikki | foo   | 19 MB   |
  public | btree_90  | index | heikki | foo   | 21 MB   |
  public | gin_id| index | heikki | foo   | 53 MB   |
  public | gin_t | index | heikki | foo   | 1072 kB |
 (4 rows)


 Now let's update 1% of the table, spread evenly across the table, and see
 what happens to the index sizes:

 postgres=# update foo set id = id where id % 100 = 0;
 UPDATE 10000
 postgres=# \di+
   List of relations
  Schema |   Name| Type  | Owner  | Table |  Size   | Description
 +---+---++---+-+-
  public | btree_100 | index | heikki | foo   | 39 MB   |
  public | btree_90  | index | heikki | foo   | 21 MB   |
  public | gin_id| index | heikki | foo   | 53 MB   |
  public | gin_t | index | heikki | foo   | 1080 kB |
 (4 rows)

 As you can see, btree_100 index doubled in size. That's the effect we're
 trying to avoid with the fillfactor, and indeed in the btree_90 index it
 was avoided. However, the GIN indexes are not bloated either. Why is that?

 The entry tree (demonstrated by the gin_id index) is not packed tightly
 when it's built. If anything, we could pack it more tightly to make it
 smaller to begin with. For the use cases where GIN works best, though, the
 entry tree should be fairly small compared to the posting lists, so it
 shouldn't make a big difference. For the posting tree, I think that's
 because the way the items are packed in the segments. Even though we pack
 the segments as tightly as possible, there is always some free space left
 over on a page that's not enough to add another segment to it at index
 build, but can be used by the updates to add items to the existing
 segments. So in practice you always end up with some free space on the
 posting tree pages, even without a fillfactor, so the effective
 fillfactor even today is not actually 100%.


There are two reasons for this behavior. You mentioned the first one: the
effective fillfactor for posting lists is not 100%. But the second one is
the structure of posting lists. In your example UPDATE allocates new tuples
at the end of the heap. So, from the point of view of the posting lists,
these updates are appends. The situation is different when you reuse space
in the heap. See an example.

drop table foo;
create table foo (id integer, val int[]) with (fillfactor = 90);
insert into foo (select g, array[g%2]::int[] from
generate_series(1,1000000) g);
create index foo_val_idx_100 on foo using gin(val) with (fastupdate=off,
fillfactor=100);
create index foo_val_idx_90 on foo using gin(val) with (fastupdate=off,
fillfactor=90);

# \di+
 List of relations
 Schema |  Name   | Type  | Owner  | Table |  Size   | Description
+-+---++---+-+-
 public | foo_val_idx_100 | index | smagen | foo   | 1088 kB |
 public | foo_val_idx_90  | index | smagen | foo   | 1200 kB |
(2 rows)

update foo set val = array[(id+1)%2]::int[] where id % 50 = 0;

# \di+
 List of relations
 Schema |  Name   | Type  | Owner  | Table |  Size   | Description
+-+---++---+-+-
 public | foo_val_idx_100 | index | smagen | foo   | 1608 kB |
 public | foo_val_idx_90  | index | smagen | foo   | 1200 kB |
(2 rows)

Based on this, I think we should just drop this patch. It's not useful in
 practice.


I wouldn't say it's not useful at all. It's for sure 

Re: [BUGS] [HACKERS] object_classes array is broken, again

2015-07-21 Thread Alvaro Herrera
Tom Lane wrote:
 I wrote:
  +1 to this patch, in fact I think we could remove MAX_OCLASS altogether
  which would be very nice for switch purposes.
 
 Oh, wait, I forgot that the patch itself introduces another reference to
 MAX_OCLASS.  I wonder though if we should get rid of that as an enum value
 in favor of #define'ing a LAST_OCLASS macro referencing the last enum
 item, or some other trick like that.  It's certainly inconvenient in
 event_trigger.c to have a phony member of the enum.

Yeah, that works well enough.  Pushed that way.

 Are there any other arrays that need such tests?

I looked around with this:

git grep 'const.*\[.*[^][0-9].*\].*=.*{'

and found one suspicious-looking use of MAX_ACL_KIND which we could
perhaps define in a way equivalent to what we've done here.  We also
have RM_MAX_ID in a couple of arrays but that looks safe because both
the values and the arrays are auto-generated.

We also have MAX_TIME_PRECISION, MAX_TIMESTAMP_PRECISION,
MAX_INTERVAL_PRECISION, MAXDATEFIELDS, KeyWord_INDEX_SIZE, but these
don't look likely to actually cause any trouble.

(There are many arrays sized to numerical constants, but there are about
5000 of those and I'm not in a hurry to verify how they are used.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-21 Thread Ildus Kurbangaliev

On 07/21/2015 01:18 PM, Andres Freund wrote:

On 2015-07-21 13:11:36 +0300, Ildus Kurbangaliev wrote:
  
  /*

   * Top-level transactions are identified by VirtualTransactionIDs comprising
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index cff3b99..55b0687 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -58,6 +58,9 @@ typedef struct LWLock
  #ifdef LOCK_DEBUG
struct PGPROC *owner;   /* last exlusive owner of the lock */
  #endif
+
+   /* LWLock group, initialized as -1, calculated in first acquire */
+   int group;
  } LWLock;

I'd very much like to avoid increasing the size of struct LWLock. We
have a lot of those and I'd still like to inline them with the buffer
descriptors. Why do we need a separate group and can't reuse the
tranche?  That might require creating a few more tranches, but ...?

Andres
Do you mean moving LWLocks defined by offsets and with dynamic sizes to
separate tranches?
It sounds like a good option, but it will require refactoring of the
current tranches. In the current implementation I tried not to change
existing things.

A simple solution would be using uint8 for the tranche (because we have
only 3 of them now, and it will take a long time to get to 255) and uint8
for the group. In this case the size will not change.


--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Fillfactor for GIN indexes

2015-07-21 Thread Alexander Korotkov
On Tue, Jul 21, 2015 at 2:49 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

 Hmm. That's slightly different from the test case I used, in that the
 update is changing the indexed value, which means that the new value goes
 to different posting tree than the old one. I'm not sure if that makes any
 difference. You're also updating more rows, 1/50 vs. 1/100.

 This case is closer to my earlier one:

 postgres=# create table foo (id int4, i int4, t text) with (fillfactor=90);
 CREATE TABLE
 postgres=# insert into foo select g, 1 from  generate_series(1, 1000000) g;
 INSERT 0 1000000
 postgres=# create index btree_foo_id on foo (id); -- to force non-HOT
 updates
 CREATE INDEX
 postgres=# create index gin_i on foo using gin (i) with (fastupdate=off);
 CREATE INDEX
 postgres=# \di+
List of relations
  Schema | Name | Type  | Owner  | Table |  Size   | Description
 +--+---++---+-+-
  public | btree_foo_id | index | heikki | foo   | 21 MB   |
  public | gin_i| index | heikki | foo   | 1072 kB |
 (2 rows)

 postgres=# update foo set id=-id where id % 100 = 0;
 UPDATE 10000
 postgres=# \di+
List of relations
  Schema | Name | Type  | Owner  | Table |  Size   | Description
 +--+---++---+-+-
  public | btree_foo_id | index | heikki | foo   | 22 MB   |
  public | gin_i| index | heikki | foo   | 1080 kB |
 (2 rows)

 I'm not sure what's making the difference to your test case. Could be
 simply that your case happens to leave less free space on each page,
 because of the different data.


Yeah, it's likely because of different data.

 Based on this, I think we should just drop this patch. It's not useful in
 practice.


 I wouldn't say it's not useful at all. It's for sure not as useful as
 btree
 fillfactor, but it still could be useful in some cases...
 Probably, we could leave 100 as default fillfactor, but provide an option.


 Doesn't seem worth the trouble to me...


Probably, but currently we are in quite an illogical situation. We have
a default fillfactor = 90 for GiST, where it has no use cases at all and
is effectively just a waste of space. On the other hand we're rejecting
fillfactor for GIN, where it could have at least some use cases... Should we
drop fillfactor for both GiST and GIN?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Fillfactor for GIN indexes

2015-07-21 Thread Heikki Linnakangas

On 07/21/2015 02:00 PM, Alexander Korotkov wrote:

On Tue, Jul 21, 2015 at 12:40 PM, Heikki Linnakangas hlinn...@iki.fi
wrote:


Has anyone done any performance testing of this?

The purpose of a fillfactor is to avoid the storm of page splits right
after building the index, when there are some random updates to the table.
It causes the index to bloat, as every full page is split to two half-full
pages, and also adds latency to all the updates that have to split pages.

As the code stands, do we actually have such a problem with GIN? Let's
investigate. Let's create a little test table:

create table foo (id int4, t text);
insert into foo select g, 'foo' from generate_series(1, 1000000) g;

And some indexes on it:

-- B-tree index on id, 100%
create index btree_100 on foo (id) with (fillfactor=100);
-- B-tree index on id, 90% (which is the default)
create index btree_90 on foo (id) with (fillfactor=90);
-- GIN index on id. Id is different on each row, so this index has no
posting trees, just the entry tree.
create index gin_id on foo using gin (id) with (fastupdate=off);
-- GIN index on t. T is the same on each row, so this index consists of a
single posting tree.
create index gin_t on foo using gin (t) with (fastupdate=off);

Immediately after creation, the index sizes are:

postgres=# \di+
   List of relations
  Schema |   Name| Type  | Owner  | Table |  Size   | Description
+---+---++---+-+-
  public | btree_100 | index | heikki | foo   | 19 MB   |
  public | btree_90  | index | heikki | foo   | 21 MB   |
  public | gin_id| index | heikki | foo   | 53 MB   |
  public | gin_t | index | heikki | foo   | 1072 kB |
(4 rows)


Now let's update 1% of the table, spread evenly across the table, and see
what happens to the index sizes:

postgres=# update foo set id = id where id % 100 = 0;
UPDATE 10000
postgres=# \di+
   List of relations
  Schema |   Name| Type  | Owner  | Table |  Size   | Description
+---+---++---+-+-
  public | btree_100 | index | heikki | foo   | 39 MB   |
  public | btree_90  | index | heikki | foo   | 21 MB   |
  public | gin_id| index | heikki | foo   | 53 MB   |
  public | gin_t | index | heikki | foo   | 1080 kB |
(4 rows)

As you can see, btree_100 index doubled in size. That's the effect we're
trying to avoid with the fillfactor, and indeed in the btree_90 index it
was avoided. However, the GIN indexes are not bloated either. Why is that?

The entry tree (demonstrated by the gin_id index) is not packed tightly
when it's built. If anything, we could pack it more tightly to make it
smaller to begin with. For the use cases where GIN works best, though, the
entry tree should be fairly small compared to the posting lists, so it
shouldn't make a big difference. For the posting tree, I think that's
because the way the items are packed in the segments. Even though we pack
the segments as tightly as possible, there is always some free space left
over on a page that's not enough to add another segment to it at index
build, but can be used by the updates to add items to the existing
segments. So in practice you always end up with some free space on the
posting tree pages, even without a fillfactor, so the effective
fillfactor even today is not actually 100%.



There are two reasons for this behavior. You mentioned the first one: the
effective fillfactor for posting lists is not 100%. But the second one is
the structure of posting lists. In your example UPDATE allocates new tuples
at the end of the heap. So, from the point of view of the posting lists,
these updates are appends.


Ah, good point.


The situation is different when you reuse space in the heap. See an
example.

drop table foo;
create table foo (id integer, val int[]) with (fillfactor = 90);
insert into foo (select g, array[g%2]::int[] from
generate_series(1,1000000) g);
create index foo_val_idx_100 on foo using gin(val) with (fastupdate=off,
fillfactor=100);
create index foo_val_idx_90 on foo using gin(val) with (fastupdate=off,
fillfactor=90);

# \di+
  List of relations
  Schema |  Name   | Type  | Owner  | Table |  Size   | Description
+-+---++---+-+-
  public | foo_val_idx_100 | index | smagen | foo   | 1088 kB |
  public | foo_val_idx_90  | index | smagen | foo   | 1200 kB |
(2 rows)

update foo set val = array[(id+1)%2]::int[] where id % 50 = 0;

# \di+
  List of relations
  Schema |  Name   | Type  | Owner  | Table |  Size   | Description
+-+---++---+-+-
  public | foo_val_idx_100 | index | smagen | foo   | 1608 kB |
  public | foo_val_idx_90  | index | smagen | foo   | 1200 kB |
(2 rows)


Hmm. That's slightly different from the test case I used, in that the 
update is changing the indexed 

Re: [HACKERS] make check changes have caused buildfarm deterioration.

2015-07-21 Thread Michael Paquier
On Tue, Jul 21, 2015 at 7:01 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Michael Paquier wrote:
 On Tue, Jul 21, 2015 at 2:39 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:

  Regarding initdb.log and postmaster.log, this is definitely a bug.
  Those have been moved by dcae5fa from log/ to tmp_check/log/,
  tmp_check/ getting removed at the end of pg_regress if there are no
  failures counted. Both files will be saved in log/ at the location
  pg_regress is called using outputdir, whose default is '.'. This way
  behavior is similar to ~9.4. Attached is a patch to fix this for 9.5
  and master.

 Something I just noticed: an entry for log/ in test_ddl_deparse's
 gitignore is missing.

 Well, three things: 1) the entry is not missing right now, but it will
 be missing if we apply your patch; 2) the file is inconsistent with the
 other test modules anyway so we might as well apply your patch to make
 them all alike; 3) we shouldn't really do anything about that until the
 other patch's fate is decided.

Well, just to be clear, I meant with the previous patch I have attached.
-- 
Michael




[HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-07-21 Thread Haribabu Kommi
Hi Hackers,

This patch adds a new function called pg_hba_lookup to get all matching
entries from pg_hba.conf for the provided input data. The rows are
displayed in the order the hba conf entries are matched.

This is an updated version of a previous failed attempt to create a
catalog view for pg_hba.conf [1]. The view was not able to handle SQL
queries properly because of the keywords that are present in the database
and user columns.


Currently the following two variants are added:

pg_hba_lookup(database, user)
pg_hba_lookup(database, user, address, hostname)


How it works:

With the provided input data, it tries to match the entries of
pg_hba.conf and populates the result set with all matching entries.

With Tom Lane's recent commit
1e24cf645d24aab3ea39a9d259897fd0cae4e4b6, Don't leave pg_hba and
pg_ident data lying around in running backends [2], the parsed hba
conf entries are no longer available on the backend side. I have
temporarily reverted that patch for proof-of-concept purposes. Once we
agree on the approach, I will try to find a solution for this.


How is it useful:

With the output of this view, an administrator can easily identify the
lines that match the given criteria without going through the file.


Record format:

Column name | datatype
------------+---------
line_number | int4
type        | text
database    | jsonb
user        | jsonb
address     | inet
hostname    | text
method      | text
options     | jsonb

Please suggest any column additions or data type changes that are
required.


Example output:

postgres=# select pg_hba_lookup('postgres','all');
 pg_hba_lookup
---
 (84,local,[all],[all],,,trust,{})
 (86,host,[all],[all],127.0.0.1,,trust,{})
 (88,host,[all],[all],::1,,trust,{})
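
The four-argument variant would be invoked the same way (a hypothetical
call; the address and hostname values here are only illustrative):

postgres=# select pg_hba_lookup('postgres', 'all', '127.0.0.1', 'localhost');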

Attached is a proof-of-concept patch for the same.

Any suggestions/comments on this proposed approach?

[1] 
http://www.postgresql.org/message-id/f40b0968db0a904da78a924e633be78645f...@sydexchtmp2.au.fjanz.com

[2] 
http://www.postgresql.org/message-id/e1zaquy-00072j...@gemulon.postgresql.org

Regards,
Hari Babu
Fujitsu Australia


pg_hba_lookup_poc.patch
Description: Binary data


revert_hba_context_release_in_backend.patch
Description: Binary data



Re: [HACKERS] Fillfactor for GIN indexes

2015-07-21 Thread Heikki Linnakangas

Has anyone done any performance testing of this?

The purpose of a fillfactor is to avoid the storm of page splits right 
after building the index, when there are some random updates to the 
table. It causes the index to bloat, as every full page is split to two 
half-full pages, and also adds latency to all the updates that have to 
split pages.


As the code stands, do we actually have such a problem with GIN? Let's 
investigate. Let's create a little test table:


create table foo (id int4, t text);
insert into foo select g, 'foo' from generate_series(1, 1000000) g;

And some indexes on it:

-- B-tree index on id, 100%
create index btree_100 on foo (id) with (fillfactor=100);
-- B-tree index on id, 90% (which is the default)
create index btree_90 on foo (id) with (fillfactor=90);
-- GIN index on id. Id is different on each row, so this index has no 
posting trees, just the entry tree.

create index gin_id on foo using gin (id) with (fastupdate=off);
-- GIN index on t. T is the same on each row, so this index consists of 
a single posting tree.

create index gin_t on foo using gin (t) with (fastupdate=off);

Immediately after creation, the index sizes are:

postgres=# \di+
  List of relations
 Schema |   Name| Type  | Owner  | Table |  Size   | Description
+---+---++---+-+-
 public | btree_100 | index | heikki | foo   | 19 MB   |
 public | btree_90  | index | heikki | foo   | 21 MB   |
 public | gin_id| index | heikki | foo   | 53 MB   |
 public | gin_t | index | heikki | foo   | 1072 kB |
(4 rows)


Now let's update 1% of the table, spread evenly across the table, and 
see what happens to the index sizes:


postgres=# update foo set id = id where id % 100 = 0;
UPDATE 10000
postgres=# \di+
  List of relations
 Schema |   Name| Type  | Owner  | Table |  Size   | Description
+---+---++---+-+-
 public | btree_100 | index | heikki | foo   | 39 MB   |
 public | btree_90  | index | heikki | foo   | 21 MB   |
 public | gin_id| index | heikki | foo   | 53 MB   |
 public | gin_t | index | heikki | foo   | 1080 kB |
(4 rows)

As you can see, btree_100 index doubled in size. That's the effect we're 
trying to avoid with the fillfactor, and indeed in the btree_90 index it 
was avoided. However, the GIN indexes are not bloated either. Why is that?


The entry tree (demonstrated by the gin_id index) is not packed tightly 
when it's built. If anything, we could pack it more tightly to make it 
smaller to begin with. For the use cases where GIN works best, though, 
the entry tree should be fairly small compared to the posting lists, so 
it shouldn't make a big difference. For the posting tree, I think that's 
because the way the items are packed in the segments. Even though we 
pack the segments as tightly as possible, there is always some free 
space left over on a page that's not enough to add another segment to it 
at index build, but can be used by the updates to add items to the 
existing segments. So in practice you always end up with some free space 
on the posting tree pages, even without a fillfactor, so the effective 
fillfactor even today is not actually 100%.


Based on this, I think we should just drop this patch. It's not useful 
in practice.


- Heikki




Re: [HACKERS] Grouping Sets: Fix unrecognized node type bug

2015-07-21 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 20 Jul 2015 15:45:21 +0530, Jeevan Chalke 
jeevan.cha...@enterprisedb.com wrote in 
CAM2+6=X9QWgbjJrR-dcLXh-RvvpGy=9enhuoghzrxhcj2kv...@mail.gmail.com
 On Sat, Jul 18, 2015 at 12:27 AM, Andrew Gierth and...@tao11.riddles.org.uk
  wrote:
 
   Kyotaro == Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp
  writes:
 
   Kyotaro Hello, this looks to be a kind of thinko. The attached patch
   Kyotaro fixes it.
 
  No, that's still wrong. Just knowing that there is a List is not enough
  to tell whether to concat it or append it.

Thank you. I've missed the non-grouping-set cases.

  Jeevan's original patch tries to get around this by making the RowExpr
  case wrap another List around its result (which is then removed by the
  concat), but this is the wrong approach too because it breaks nested
  RowExprs (which isn't valid syntax in the spec, because the spec allows
  only column references in GROUP BY, not arbitrary expressions, but which
  we have no reason not to support).
 
  Attached is the current version of my fix (with Jeevan's regression
  tests plus one of mine).
 
 
 Looks good to me.

It also looks to me like it works as expected and is in good shape.

The two foreach loops for T_GroupingSet and T_List have come to look
very similar, but it doesn't seem they can be merged into a reasonable
shape.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] creating extension including dependencies

2015-07-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jul 20, 2015 at 10:29 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 In short I would give up on the DEFAULT SCHEMA business, and
 add a new flag in the control file to decide if a given extension
 passes down the schema name of its child when created in cascade,
 default being true for the potential issues with search_path not
 pointing to public.

 Well, so far, it seems like this decision is something where different
 DBAs might have different policies.  If you put the flag in the
 control file, you're saying it is the extension developer's decision,
 which may not be best.

I have doubts about that too.  But really, why have a flag at all
anywhere?  If we are doing a CASCADE, and the referenced extension needs a
schema, the alternatives are either (1) let it have one, or (2) fail.
I am not seeing that (2) is a superior alternative in any circumstances.

We will need to document that the behavior of CASCADE is "install all
needed extensions into the schema you specify", but what's wrong with
that?  If the user wants to put them in different schemas, he cannot
use CASCADE in any case.

regards, tom lane




Re: [HACKERS] make check changes have caused buildfarm deterioration.

2015-07-21 Thread Tom Lane
Andrew Dunstan andrew.duns...@pgexperts.com writes:
 On 07/21/2015 01:39 AM, Michael Paquier wrote:
 Regarding install.log, the use of stdout/stderr instead of a log file
 has been changed in dbf2ec1a after that:
 http://www.postgresql.org/message-id/553fe7fc.2040...@gmx.net
 Since 9.5 as the location of the temporary installation is global, we
 could basically revert some parts of dcae5fac if that helps so as
 install.log is saved in $ROOT/tmp_install/log/install.log... But I am
 not sure what we win with that, and the argument to remove install.log
 is that now the temporary installation is a make target. Both ways
 have advantages and disadvantages.

 I'm quite unhappy about how we now pollute the output of make check 
 with the install log. See the above links for why. And when I run it by 
 hand I don't want all this scrolling by me on the screen. The previous 
 output of make check was small and clean, and I want it to go back to 
 that, with the other logs available as necessary. The reason the 
 buildfarm client cd's into the regress directory to run make check is 
 to avoid extraneous output.

I agree; this change may have seemed like a good idea at the time, but
it was not.  Failures during make check's install step are rare enough
that you don't really need all that output in your face to help with the
rare situation where it fails.  And for the buildfarm's purposes, it is
surely desirable to segregate that output from the actual check step.

A possible alternative is to run the make install sub-step with -s,
but that could be objected to on the grounds that if it did fail, you'd
have a hard time telling exactly which step failed.

regards, tom lane




Re: [HACKERS] security labels on databases are bad for dump restore

2015-07-21 Thread Noah Misch
On Mon, Jul 20, 2015 at 07:01:14PM -0400, Adam Brightwell wrote:
  Consistency with existing practice would indeed have pg_dump ignore
  pg_shseclabel and have pg_dumpall reproduce its entries.
 
 I think that makes sense, but what about other DATABASE level info
 such as COMMENT?  Should that also be ignored by pg_dump as well?  I'm
 specifically thinking of the discussion from the following thread:
 
 http://www.postgresql.org/message-id/20150317172459.gm3...@alvh.no-ip.org
 
 If COMMENT is included then why not SECURITY LABEL or others?

In any given situation, we should indeed restore both pg_database comments and
pg_database security labels, or we should restore neither.  Restoring neither
is most consistent with history, but several people like the idea of restoring
both.  I won't mind either conclusion.

  In a greenfield, I would make pg_dump --create reproduce pertinent entries
  from datacl, pg_db_role_setting, pg_shseclabel and pg_shdescription.  I 
  would
  make non-creating pg_dump reproduce none of those.
 
 I think the bigger question is Where is the line drawn between
 pg_dump and pg_dumpall?.  At what point does one tool become the
 other?

That question may be too big for me.




Re: [HACKERS] make check changes have caused buildfarm deterioration.

2015-07-21 Thread Andrew Dunstan

On 07/21/2015 01:39 AM, Michael Paquier wrote:

On Tue, Jul 21, 2015 at 6:17 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Andrew Dunstan andrew.duns...@pgexperts.com writes:

Somewhere along the way some changes to the way we do make check have
caused a significant deterioration in the buildfarm's logging. Compare
these two from animal crake, which happens to be my test instance:
http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=crakedt=2015-07-20%2013%3A09%3A02stg=check
and
http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=crakedt=2015-07-20%2017%3A23%3A08stg=check

Yeah, I've been bitching about the poor logging for awhile, but I had
not realized that it's still working as-expected in the back branches.

Comparing different branches, it looks like somewhere along the way
means between 9.4 and 9.5.  I suspect that commit dcae5faccab64776, or
perhaps the followon dbf2ec1a1c053379, is to blame.

Regarding install.log, the use of stdout/stderr instead of a log file
has been changed in dbf2ec1a after that:
http://www.postgresql.org/message-id/553fe7fc.2040...@gmx.net
Since 9.5 as the location of the temporary installation is global, we
could basically revert some parts of dcae5fac if that helps so as
install.log is saved in $ROOT/tmp_install/log/install.log... But I am
not sure what we win with that, and the argument to remove install.log
is that now the temporary installation is a make target. Both ways
have advantages and disadvantages.



I'm quite unhappy about how we now pollute the output of make check 
with the install log. See the above links for why. And when I run it by 
hand I don't want all this scrolling by me on the screen. The previous 
output of make check was small and clean, and I want it to go back to 
that, with the other logs available as necessary. The reason the 
buildfarm client cd's into the regress directory to run make check is 
to avoid extraneous output.




Regarding initdb.log and postmaster.log, this is definitely a bug.
Those have been moved by dcae5fa from log/ to tmp_check/log/,
tmp_check/ getting removed at the end of pg_regress if there are no
failures counted. Both files will be saved in log/ at the location
pg_regress is called using outputdir, whose default is '.'. This way
behavior is similar to ~9.4. Attached is a patch to fix this for 9.5
and master.




OK, looks sane enough, but please do address the other issue.

cheers

andrew




Re: [HACKERS] Fillfactor for GIN indexes

2015-07-21 Thread Heikki Linnakangas

On 07/21/2015 02:56 PM, Alexander Korotkov wrote:

Probably, but currently we are in quite an illogical situation. We have
a default fillfactor = 90 for GiST, where it has no use cases at all and
is effectively just a waste of space.


Why is it useless for GiST?
- Heikki





Re: [HACKERS] creating extension including dependencies

2015-07-21 Thread Robert Haas
On Mon, Jul 20, 2015 at 10:29 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 In short I would give up on the DEFAULT SCHEMA business, and
 add a new flag in the control file to decide if a given extension
 passes down the schema name of its child when created in cascade,
 default being true for the potential issues with search_path not
 pointing to public.

Well, so far, it seems like this decision is something where different
DBAs might have different policies.  If you put the flag in the
control file, you're saying it is the extension developer's decision,
which may not be best.

Maybe I'm confused.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fillfactor for GIN indexes

2015-07-21 Thread Alexander Korotkov
On Tue, Jul 21, 2015 at 3:52 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

 On 07/21/2015 02:56 PM, Alexander Korotkov wrote:

 Probably, but currently we are in quite unlogical situation. We have
 default fillfactor = 90 for GiST where it has no use cases at all and
 effectively is just a waste of space.


 Why is it useless for GiST?


It's because all GiST pages are produced by page splits. So, just after
CREATE INDEX, GiST pages aren't tightly packed on average. Actually, they
could be tightly packed by incredible coincidence, but for large indexes
it's quite a safe assumption that they are not. With GiST we don't get a
storm of page splits after index creation with fillfactor = 100. So, why
should we reserve additional space with fillfactor = 90?
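
One way to check this is to build the same GiST index at both settings and
compare sizes (a sketch; the table and data are only illustrative):

create table pts as select point(random(), random()) as p
  from generate_series(1,1000000);
create index pts_gist_90 on pts using gist (p) with (fillfactor = 90);
create index pts_gist_100 on pts using gist (p) with (fillfactor = 100);
-- \di+ then shows how much extra space fillfactor = 90 reserves; per the
-- argument above, that space is not paid back by fewer splits later.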

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Selectivity estimation for intarray with @@

2015-07-21 Thread Alexander Korotkov
Hi!

While Uriy is on vacation, I've revised this patch a bit.

On Tue, Jul 14, 2015 at 8:55 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 Hi Uriy,

 This patch looks pretty good.

 The first line of intarray--1.1.sql mis-identifies itself as /*
 contrib/intarray/intarray--1.0.sql */


Fixed.


 The real file intarray--1.0.sql file probably should not be included in
 the final patch, but I like having it for testing.


I've removed intarray--1.0.sql since it shouldn't be in final commit.


 It applies and builds cleanly over the alter operator patch (and now the
 commit as well), passes make check of the contrib module both with and
 without cassert.

 I could succesfully upgrade from version 1.0 to 1.1 without having to drop
 the gin__int_ops indexes in the process.

 I could do pg_upgrade from 9.2 and 9.4 to 9.6devel with large indexes in
 place, and then upgrade the extension to 1.1, and it worked without having
 to rebuild the index.

 It does what it says, and I think we want this.


Good.


 There were some cases where the estimates were not very good, but that
 seems to be limitation of what pg_stats makes available, not of this
 patch.  Specifically if the elements listed in the query text are not part
 of most_common_elems (or worse yet, most_common_elems is null) then it is
 left to guess with no real data, and those guesses can be pretty bad.  It
 is not this patches job to fix that, however.



 It assumes all the stats are independent and so doesn't account for
 correlation between members.  This is also how the core selectivity
 estimates work between columns, so I can't really complain about that.  It
 is easy to trick it with things like @@ '(!300  300)'::query_int, but I
 don't think that is necessary to fix that.


Analyzing all the dependencies inside a query is an NP-complete task. We
could try to work out simple cases, but the postgres optimizer currently
doesn't care about it.

# explain select * from test where val = 'val' and val != 'val';
 QUERY PLAN
-
 Seq Scan on test  (cost=0.00..39831.00 rows=45 width=8)
   Filter: ((val <> 'val'::text) AND (val = 'val'::text))
(2 rows)

I think we could do the same inside intquery until we figure out some
better solution for the postgres optimizer in general.


 I have not been creative enough to come up with queries for which this
 improvement in selectivity estimate causes the execution plan to change in
 important ways.  I'm sure the serious users of this module would have such
 cases, however.

 I haven't tested gist__int_ops as I can't get those indexes to build in a
 feasible amount of time.  But the selectivity estimates are independent of
 any actual index, so testing those doesn't seem to be critical.

 There is no documentation change, which makes sense as this internal stuff
 which isn't documented to start with.


Yes. For instance, tsquery makes a very similar selectivity estimation to
intquery, but it's assumed to be internal and isn't mentioned in the
documentation.

There are no regression test changes.  Not sure about that, it does seem
 like regression tests would be desirable.


It would be nice to cover selectivity estimation with regression tests, but
AFAICS we haven't done that for any selectivity estimation functions yet.
The problem is that selectivity estimation is quite a complex process and
its result depends on random sampling during analyze, floating point
operations and so on. We could make a test like "with very high level of
confidence, the estimated number of rows here should be in [10;100]". But
it's hard to fit such assumptions into our current regression test
infrastructure.
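
If we tried anyway, the check would have to be range-based rather than
exact, something like this sketch (the helper function, the test table and
the [10;100] bounds are all illustrative):

create or replace function estimated_rows(q text) returns bigint as $f$
declare
  plan json;
begin
  -- grab the planner's row estimate from EXPLAIN's JSON output
  execute 'explain (format json) ' || q into plan;
  return (plan->0->'Plan'->>'Plan Rows')::bigint;
end
$f$ language plpgsql;

select estimated_rows('select * from test where val @@ ''300''::query_int')
       between 10 and 100 as estimate_ok;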

I haven't gone through the C code.


I also did some coding style and comments modifications.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


intarray_sel-2.patch
Description: Binary data



Re: [HACKERS] creating extension including dependencies

2015-07-21 Thread Petr Jelinek

On 2015-07-21 15:48, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

On Mon, Jul 20, 2015 at 10:29 PM, Michael Paquier
michael.paqu...@gmail.com wrote:

In short I would give up on the DEFAULT SCHEMA business, and
add a new flag in the control file to decide if a given extension
passes down the schema name of its child when created in cascade,
default being true for the potential issues with search_path not
pointing to public.



Well, so far, it seems like this decision is something where different
DBAs might have different policies.  If you put the flag in the
control file, you're saying it is the extension developer's decision,
which may not be best.


I have doubts about that too.  But really, why have a flag at all
anywhere?  If we are doing a CASCADE, and the referenced extension needs a
schema, the alternatives are either (1) let it have one, or (2) fail.
I am not seeing that (2) is a superior alternative in any circumstances.

We will need to document that the behavior of CASCADE is "install all
needed extensions into the schema you specify", but what's wrong with
that?  If the user wants to put them in different schemas, he cannot
use CASCADE in any case.



Yes, this is the behavior I want as well. My main question is whether we are
OK with SCHEMA having different behavior with CASCADE vs. without
CASCADE. I went originally with "no" and added the DEFAULT flag to
SCHEMA. If the answer is "yes" then we don't need the flag (in that case
CASCADE acts as the flag).


Of course "yes" would then mean CREATE EXTENSION foo SCHEMA bar will
fail if foo is not relocatable, but CREATE EXTENSION foo SCHEMA bar
CASCADE will succeed and install foo into schema foo instead of
bar, and only relocatable dependencies will go to bar. OTOH
non-relocatable dependencies will go to their own schema no matter what
the user specifies in the command, so I guess it's OK to just document that
this is the behavior of CASCADE. As you say, if somebody wants control
over each individual extension they can't use CASCADE anyway.
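
To make that concrete, a sketch of the two cases being discussed (extension
and schema names are illustrative):

CREATE EXTENSION foo SCHEMA bar;          -- fails if foo is not relocatable
CREATE EXTENSION foo SCHEMA bar CASCADE;  -- succeeds: foo goes to its own
                                          -- schema, relocatable dependencies
                                          -- go to bar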


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Selectivity estimation for intarray with @@

2015-07-21 Thread Teodor Sigaev

Some notes about the code:

1. Near the function transformOperator() there is a wrong operator name @.
2. int_query (and intquery) should be replaced with query_int to be consistent with
the actual type name, at least where it's used as a separate lexeme.
3. For historical reasons @@ doesn't commutate with itself; it has the commutator
~~. The patch assumes that @@ is its own commutator, but ~~ will use 'contsel' as its
restriction estimator. I suppose we need to declare ~~ deprecated and introduce
a commutating @@ operator.
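
For point 3, the declaration could look roughly like this (a sketch only:
rboolop is intarray's existing query-on-the-left support function, while the
RESTRICT estimator name is just a placeholder for the function this patch
adds):

CREATE OPERATOR @@ (
    LEFTARG = query_int,
    RIGHTARG = _int4,
    PROCEDURE = rboolop,
    COMMUTATOR = @@,
    RESTRICT = _int_matchsel  -- hypothetical name
);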


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] creating extension including dependencies

2015-07-21 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
 ... My main question is whether we are
 OK with SCHEMA having different behavior with CASCADE vs. without
 CASCADE. I went originally with "no" and added the DEFAULT flag to
 SCHEMA. If the answer is "yes" then we don't need the flag (in that case
 CASCADE acts as the flag).

Yeah, I was coming around to that position as well.  Insisting that
SCHEMA throw an error if the extension isn't relocatable makes sense
as long as only one extension is being considered, but once you say
CASCADE it seems like mostly a usability fail.  I think it's probably
OK if with CASCADE, SCHEMA is just "use if needed, else ignore".

Obviously we've gotta document all this properly.

regards, tom lane




Re: [HACKERS] pgbench stats per script other stuff

2015-07-21 Thread Robert Haas
On Fri, Jul 17, 2015 at 9:50 AM, Fabien coe...@cri.ensmp.fr wrote:
   sh ./pgbench -T 3 -B -N -w 2 -S -w 7 --per-script-stats

That is a truly horrifying abuse of command-line arguments.  -1 from
me, or minus more than one if I've got that many chits to burn.

I have been thinking that the way to do this is to push more into the
script file itself, e.g. allow:

\if random() < 0.1
stuff
\else
other stuff
\endif

Maybe that's overkill and there's some way of specifying multiple
scripts on the command line, but IMO what you've got here is not it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-21 Thread Paul Ramsey
Here's a third revision that allows the 'extensions' option on the
wrapper as well, so that supported extensions can be declared once in
one place.

Since the CREATE FOREIGN DATA WRAPPER statement is actually run
inside the CREATE EXTENSION script for postgres_fdw, the way to set
this option is to alter the wrapper:

  ALTER FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( extensions 'seg' );

Right now, declaring extensions at different levels is additive; I
didn't add the option to revoke an extension for a particular server
definition (a 'cube,-postgis' entry, for example). If that's a
deal-breaker, I can add it too, but it felt like something that could
wait for a user to positively declare "I must have that feature!"
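
To illustrate the additive behavior, a server that needs more than the 
wrapper-level list can declare its own option and the two lists combine 
(sketch only; the server name is a placeholder):

  -- the wrapper above already declares 'seg'; for this server the
  -- effective list becomes seg plus cube:
  ALTER SERVER my_pg_server OPTIONS ( ADD extensions 'cube' );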

P.


On Fri, Jul 17, 2015 at 5:58 AM, Paul Ramsey pram...@cleverelephant.ca wrote:

 On July 17, 2015 at 5:57:42 AM, Simon Riggs (si...@2ndquadrant.com) wrote:

 Options already exist on CREATE FOREIGN DATA WRAPPER, so it should be easy 
 to support that.

 I'd rather add it once on the wrapper than be forced to list all the options 
 on every foreign server, unless required to do so.

 Gotcha, that does make sense.

 P.
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 81cb2b4..9c5136e 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -34,11 +34,15 @@
 
 #include "postgres_fdw.h"
 
+#include "access/genam.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/transam.h"
+#include "catalog/dependency.h"
+#include "catalog/indexing.h"
 #include "catalog/pg_collation.h"
+#include "catalog/pg_depend.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
@@ -49,8 +53,10 @@
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
+#include "utils/snapmgr.h"
 #include "utils/syscache.h"
 
 
@@ -136,6 +142,7 @@ static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
              deparse_expr_cxt *context);
 static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
                    deparse_expr_cxt *context);
+static bool is_in_extension(Oid procid, PgFdwRelationInfo *fpinfo);
 
 
 /*
@@ -229,6 +236,9 @@ foreign_expr_walker(Node *node,
     Oid         collation;
     FDWCollateState state;
 
+    /* Access extension metadata from fpinfo on baserel */
+    PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) (glob_cxt->foreignrel->fdw_private);
+
     /* Need do nothing for empty subexpressions */
     if (node == NULL)
         return true;
@@ -361,7 +371,7 @@ foreign_expr_walker(Node *node,
                  * can't be sent to remote because it might have incompatible
                  * semantics on remote side.
                  */
-                if (!is_builtin(fe->funcid))
+                if (!is_builtin(fe->funcid) && !is_in_extension(fe->funcid, fpinfo))
                     return false;
 
                 /*
@@ -407,7 +417,7 @@ foreign_expr_walker(Node *node,
                  * (If the operator is, surely its underlying function is
                  * too.)
                  */
-                if (!is_builtin(oe->opno))
+                if (!is_builtin(oe->opno) && !is_in_extension(oe->opno, fpinfo))
                     return false;
 
                 /*
@@ -445,7 +455,7 @@ foreign_expr_walker(Node *node,
                 /*
                  * Again, only built-in operators can be sent to remote.
                  */
-                if (!is_builtin(oe->opno))
+                if (!is_builtin(oe->opno) && !is_in_extension(oe->opno, fpinfo))
                     return false;
 
                 /*
@@ -591,7 +601,7 @@ foreign_expr_walker(Node *node,
      * If result type of given expression is not built-in, it can't be sent to
      * remote because it might have incompatible semantics on remote side.
      */
-    if (check_type && !is_builtin(exprType(node)))
+    if (check_type && !is_builtin(exprType(node)) && !is_in_extension(exprType(node), fpinfo))
         return false;
 
     /*
@@ -669,6 +679,65 @@ is_builtin(Oid oid)
 
 
 /*
+ * Returns true if given operator/function is part of an extension declared
+ * in the server options.
+ */
+static bool
+is_in_extension(Oid procnumber, PgFdwRelationInfo *fpinfo)
+{
+    static int nkeys = 1;
+    ScanKeyData key[nkeys];
+    HeapTuple tup;
+    Relation depRel;
+    SysScanDesc scan;
+
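
For anyone sanity-checking the logic, the catalog test that 
is_in_extension() performs boils down to roughly this SQL (an 
illustration only; 12345 and 'seg' are placeholders):

  SELECT count(*) > 0 AS is_in_extension
  FROM pg_depend d
  WHERE d.objid = 12345    -- OID of the function/operator being deparsed
    AND d.deptype = 'e'    -- DEPENDENCY_EXTENSION
    AND d.refobjid IN (SELECT oid FROM pg_extension WHERE extname = 'seg');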

Re: [HACKERS] pgbench stats per script & other stuff

2015-07-21 Thread Fabien COELHO


Hello Robert,


On Fri, Jul 17, 2015 at 9:50 AM, Fabien coe...@cri.ensmp.fr wrote:

  sh> ./pgbench -T 3 -B -N -w 2 -S -w 7 --per-script-stats


That is a truly horrifying abuse of command-line arguments.  -1 from
me, or minus more than one if I've got that many chits to burn.


Are you against the -w, or against saying that pgbench executes scripts, 
whether internal or from files?


The former is obviously a matter of taste, and I can remove -w if nobody 
wants it; too bad, because the feature seems useful to me from a testing 
point of view. This is a choice between aesthetics and features. Note that 
you do not have to use it if you do not like it.


The latter really homogenises the code internally and allows things to be 
factored out, keeping features orthogonal (internal scripts are treated the 
same way as external files, which takes fewer lines of code because it is 
simpler), and it does not harm anyone IMO, so it would be sad to let it go.



I have been thinking that the way to do this is to push more into the
script file itself, e.g. allow:

\if random() < 0.1
stuff
\else
other stuff
\endif

Maybe that's overkill and there's some way of specifying multiple
scripts on the command line, but IMO what you've got here is not it.


I think that is overkill, and moreover it is not useful: the point is to 
collect statistics *per script*; with a random \if you would not know 
which part was executed, so you would lose the whole point of having 
per-script stats.


Do you have another suggestion for how to provide weights that relies 
neither on \ifs nor on options? Maybe a special comment in the script (yuk 
from my point of view, because the script would then carry its own weight, 
whereas I think this should be orthogonal to the script contents, but it 
would be better than nothing...).
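
To make the special-comment idea concrete, something like this could be 
parsed from the top of a script file (the annotation syntax below is 
entirely invented, just to show the shape of it):

  -- pgbench-weight: 7
  SELECT abalance FROM pgbench_accounts WHERE aid = 1;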


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers