Re: [HACKERS] Function and view to retrieve WAL receiver status

2016-01-07 Thread Michael Paquier
On Fri, Jan 8, 2016 at 4:38 AM, Alvaro Herrera  wrote:
> Michael Paquier wrote:
>> On Thu, Jan 7, 2016 at 1:57 PM, Haribabu Kommi  
>> wrote:
>> > On Wed, Jan 6, 2016 at 8:00 PM, Michael Paquier
>> >  wrote:
>> >> On Wed, Jan 6, 2016 at 3:04 PM, Michael Paquier
>> >>  wrote:
>> >>> Attached is an updated patch.
>> >>
>> >> Forgot to update rules.out...
>> >
>> > Thanks for the update. Patch looks good to me.
>> > I marked it as ready for committer.
>>
>> Thanks!
>
> Messed around with it, couldn't find any fault, pushed.

Thanks!
-- 
Michael




Re: [HACKERS] Re: [COMMITTERS] pgsql: Windows: Make pg_ctl reliably detect service status

2016-01-07 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > I confirmed that adding that line makes the new file get compiled.  I
> > also noticed these warnings when compiling it:
> 
> > In file included from 
> > /usr/lib/gcc/x86_64-w64-mingw32/4.9-win32/include/stdarg.h:1:0,
> >  from /pgsql/source/master/src/include/c.h:85,
> >  from /pgsql/source/master/src/include/postgres_fe.h:25,
> >  from /pgsql/source/master/src/port/win32security.c:17:
> > /pgsql/source/master/src/port/win32security.c: In function ‘log_error’:
> > /pgsql/source/master/src/port/win32security.c:37:11: warning: passing 
> > argument 1 of ‘__builtin_va_start’ from incompatible pointer type
> >   va_start(fmt, ap);
> >^
> 
> I take it this code is quite untested, because what that's whining
> about is that the arguments of va_start() are reversed.

It is untested by me, yes.  Pushed a fix for this problem.
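
For reference, the corrected pattern looks like this (a minimal sketch; only
the va_start() usage is the point here, the real log_error() in
src/port/win32security.c does more):

#include <stdarg.h>
#include <stdio.h>

static void
log_error(const char *fmt, ...)
{
    va_list     ap;

    /*
     * va_start() takes the va_list first and the last named parameter
     * second; the warning was about the two being swapped.
     */
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}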

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Very confusing installcheck behavior with PGXS

2016-01-07 Thread David E. Wheeler
On Jan 7, 2016, at 11:20 AM, Jim Nasby  wrote:

>>> Also worth noting: the only reason I'm using pg_regress is it's the easiest
>>> way to get a test cluster. If not for that, I'd just use pg_prove since I'm
>>> already using pgTap.
>> 
>> In 9.5 you might want to "use PostgresNode" which allows you to initdb
>> and such.
> 
> Oooh, thanks! I might well just copy that into my pgxntool utility.

Is this documented somewhere? If it’s Perl, seems like it’d only be useful for 
those of us who compile from source, yes?

David





Re: [HACKERS] Very confusing installcheck behavior with PGXS

2016-01-07 Thread Alvaro Herrera
David E. Wheeler wrote:
> On Jan 7, 2016, at 11:20 AM, Jim Nasby  wrote:
> 
> >>> Also worth noting: the only reason I'm using pg_regress is it's the 
> >>> easiest
> >>> way to get a test cluster. If not for that, I'd just use pg_prove since 
> >>> I'm
> >>> already using pgTap.
> >> 
> >> In 9.5 you might want to "use PostgresNode" which allows you to initdb
> >> and such.
> > 
> > Oooh, thanks! I might well just copy that into my pgxntool utility.
> 
> Is this documented somewhere?

Not in detail.  Patches welcome; see the "regress-tap" sect1 in
doc/src/sgml/regress.sgml.

> If it’s Perl, seems like it’d only be useful for those of us who compile from 
> source, yes?

Feel free to submit patches for discussion to install the modules.  We
already install pg_regress under lib/pgxs/src/test/regress/ so I see no
reason we couldn't install the Perl test modules somewhere.  If you
hurry you can still make it to 9.6's last commitfest.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Proposal: Generic WAL logical messages

2016-01-07 Thread Petr Jelinek

On 2016-01-07 17:22, Simon Riggs wrote:


* You call them "logical messages" here, but standby messages in the code.
But they only apply to logical decoding, so "logical message" seems a
better name. Can we avoid calling them "messages", because that will get
confusing?



Yes, it's slightly confusing; the "Standby" in the code is mostly for
consistency with other "Standby*" stuff in neighbouring code, but that
can be changed. I don't have a better name than "messages", though;
"records" is too generic.



For standard WAL replay, these are basically no-ops


We should document that.


Okay.



These messages can be both transactional (decoded on commit) or
non-transactional (decoded immediately). Each message has prefix to
differentiate between individual plugins. The prefix has to be
registered exactly once (in similar manner as security label
providers) to avoid conflicts between plugins.


I'm not sure what "transactional" means, nor is that documented.
(Conversely, I think "immediate" is fairly clear.)

Are they fired only on commit? (Guess so)
Are they fired in the original order, if multiple messages in same
transaction? (Hope so)
Are they fired as they come in the original message sequence, or before
anything else or after everything else? For example, cache invalidation
messages are normally fired right at the end of a transaction, no matter
when they were triggered.



A transactional message is added to the stream the same way as any DML
operation is, and has the same properties (processed on commit, in the
original order, with duplicate messages delivered as they are).


The immediate kind is, as you say, obvious: such messages reach logical
decoding immediately, without regard to what is happening around them
(WAL order is still preserved, though).


Will make this clearer in the docs.
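
For concreteness, a sketch of how emitting both kinds of message could
look, assuming an SQL-callable function pg_logical_emit_message(bool, text,
text); the exact name and signature are assumptions here:

-- transactional: queued in the stream, decoded at commit in original order
SELECT pg_logical_emit_message(true, 'my_plugin', 'decoded on commit');
-- non-transactional: decoded immediately, whatever the transaction does
SELECT pg_logical_emit_message(false, 'my_plugin', 'decoded immediately');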

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Re: [COMMITTERS] pgsql: Windows: Make pg_ctl reliably detect service status

2016-01-07 Thread Michael Paquier
On Fri, Jan 8, 2016 at 8:38 AM, Alvaro Herrera  wrote:
> Tom Lane wrote:
>> Alvaro Herrera  writes:
>> > I confirmed that adding that line makes the new file get compiled.  I
>> > also noticed these warnings when compiling it:
>>
>> > In file included from 
>> > /usr/lib/gcc/x86_64-w64-mingw32/4.9-win32/include/stdarg.h:1:0,
>> >  from /pgsql/source/master/src/include/c.h:85,
>> >  from /pgsql/source/master/src/include/postgres_fe.h:25,
>> >  from /pgsql/source/master/src/port/win32security.c:17:
>> > /pgsql/source/master/src/port/win32security.c: In function ‘log_error’:
>> > /pgsql/source/master/src/port/win32security.c:37:11: warning: passing 
>> > argument 1 of ‘__builtin_va_start’ from incompatible pointer type
>> >   va_start(fmt, ap);
>> >^
>>
>> I take it this code is quite untested, because what that's whining
>> about is that the arguments of va_start() are reversed.
>
> It is untested by me, yes.  Pushed a fix for this problem.

Arg, thanks! My MS 2010 compiler did not complain about that. That's a
bit depressing...
-- 
Michael




[HACKERS] BEGINNER HACKERS: array_remove(anyarray, anyarray)

2016-01-07 Thread Jim Nasby

On 12/10/15 6:25 PM, Jim Nasby wrote:

Recently I had need of removing occurrences of a number of values from
an array. Obviously I could have nested array_remove() call or wrapped
the whole thing in a SELECT unnest(), but that seems rather silly and
inefficient.

Any one have objections to changing array_replace_internal() to make
search and replace arrays instead of Datums?


Seeing none... let's see if someone wants to tackle this.

The goal is to modify array_replace_internal() (in 
src/backend/utils/adt/arrayfuncs.c) so that two of its arguments 
(search and replace) can be arrays. I believe the best way to do that 
would be to add bool search_isarray and bool replace_isarray arguments, 
and have array_replace_internal() act accordingly if those are true.


The expected behavior when search_isarray is true would be to search for 
a group of elements that completely match the search array. When 
replace_isarray is true, then any replacements into *array would be the 
elements from replace, growing or shrinking the array as necessary.
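
To illustrate the intended semantics (these calls are hypothetical; the
array-argument variants do not exist yet, which is the point of the task):

-- each complete {2,3} group is removed
SELECT array_remove(ARRAY[1,2,3,2,3,4], ARRAY[2,3]);  -- would give {1,4}
-- the replacement can change the array's length
SELECT array_replace(ARRAY[1,2,3,4], ARRAY[2,3], ARRAY[9,9,9]);
-- would give {1,9,9,9,4}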


The user visible array_remove and array_replace functions will need to 
check the data type of their inputs using get_fn_expr_argtype(). You can 
find examples of that in arrayfuncs.c.


A complete patch will also need regression tests 
(src/test/regress/sql/arrays.sql) and tweaks to the documentation for 
those functions (doc/src/sgml/func.sgml).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] pg_conversion seems rather strangely defined

2016-01-07 Thread Noah Misch
On Wed, Jan 06, 2016 at 11:56:14PM -0500, Tom Lane wrote:
> Noah Misch  writes:
> > On Tue, Jan 05, 2016 at 01:46:51PM -0500, Tom Lane wrote:
> >> I do not see a lot of point in the namespacing of encoding conversions
> >> either.  Does anyone really need or use search-path-dependent lookup of
> >> conversions?
> 
> > I have not issued CREATE CONVERSION except to experiment, and I have never
> > worked in a database in which someone else had created one.  Among PGXN
> > distributions, CREATE CONVERSION appears only in the pyrseas test suite.  It
> > could be hard to track down testimony on real-world usage patterns, but I
> > envision two credible patterns.  First, you could change the default search
> > path to "corrected_conversions, pg_catalog, $user, public" and inject fixed
> > versions of the system conversions.  One could use that to backport commit
> > 8d3e090.  Second, you could add conversions we omit entirely, like UTF8 ->
> > MULE_INTERNAL.  Dropping search-path-dependent lookup would remove the
> > supported way to fix system conversions.
> 
> The built-in conversions are very intentionally not pinned.  So to my
> mind, the supported way to replace one is to drop it and create your own.

I just learned something.  Interesting.

> The way you describe works only if an appropriate search path is installed
> at the time a new session activates its client encoding conversions.  TBH,
> I have no idea whether we apply (for example) "ALTER ROLE SET search_path"
> before that happens; but even if we do, there's no real guarantee that it
> wouldn't change.  We publish no documentation about the order of startup
> actions.  Moreover past attempts at defining dependencies between GUC
> settings have been spectacular failures, so I really don't want to go
> there in this context.
> 
> It would only be important to be able to do it like that if different
> users of the same database had conflicting ideas about what was the
> correct conversion between client and database encodings.  I submit
> that that's somewhere around epsilon probability, considering we have
> not even heard of anyone replacing the system conversions at all.
> 
> (Adding conversions we don't supply is, of course, orthogonal to this.)

Agreed on all those points.  Even so, I don't find that the need to set GUCs
in a particular order makes a good case for removing this ancient capability.
I _would_ send a new feature back for redesign on the strength of such a
defect, but that is different.

Independent of that dearth of positive cause to restrict this, users taking
the "DROP CONVERSION pg_catalog.foo" route get dump/restore problems.  pg_dump
doesn't notice that one dropped a pg_catalog conversion; the user would have to
manually repeat each drop before each restore.  That's especially awkward for
pg_upgrade.  I guess the user could drop each conversion in the new cluster's
template0, run pg_upgrade, and then recreate conversions in databases that had
not overridden them in the original cluster.  That's a mess.
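
(For reference, the drop-and-recreate route under discussion looks roughly
like this; the replacement function is hypothetical and would have to be a
C function with the conversion-procedure signature:

DROP CONVERSION pg_catalog.utf8_to_iso_8859_1;
CREATE DEFAULT CONVERSION utf8_to_iso_8859_1
    FOR 'UTF8' TO 'LATIN1' FROM my_fixed_utf8_to_latin1;
)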

> Moreover, we have precedent both for this approach being a bad idea
> and for us changing it without many complaints.  We used to have
> search-path-dependent lookup of default index operator classes.
> We found out that that was a bad idea and got rid of it, cf commit
> 3ac1ac58c.  This situation looks much the same to me.

Per the 3ac1ac58c log message, "CREATE OPERATOR CLASS only allows one default
opclass per datatype regardless of schemas."  That had been true since day one
for CREATE OPERATOR CLASS.  It doesn't hold for conversions today, and that's
a crucial difference between that commit and this proposal.

Thanks,
nm




Re: [HACKERS] Relation extension scalability

2016-01-07 Thread Andres Freund
On 2016-01-07 16:48:53 +0530, Amit Kapila wrote:
> What I could understand from the above e-mail is that Dilip has tried to
> extend the relation multiple-pages-at-a-time and observed that it gives
> comparable or better performance as compared to Andres's idea of
> lock-free extension, and that it doesn't regress the low-thread-count case.

I think it's a worthwhile approach to pursue. But until it actually
fixes the problem of leaving around uninitialized pages I don't think
it's very meaningful to do performance comparisons.

> Now, I think here point to discuss is that there could be multiple-ways
> for extending a relation multiple-pages-at-a-time like:
> 
> a. Extend the relation page by page and add it to the FSM without
> initializing it.  I think this is what the current patch of Dilip seems
> to be doing.  If we want to go via this route, then we need to ensure
> that whenever we get the page from the FSM, if it is empty and not
> initialised, then we initialise it.

I think that's pretty much unacceptable, for the non-error path at
least.

One very simple, linux specific, approach would be to simply do
fallocate(FALLOC_FL_KEEP_SIZE) to extend the file, that way space is
pre-allocated, but not yet marked as allocated.
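
A minimal sketch of that call (Linux-only; the wrapper name is made up
here), reserving space past the current end of file without changing the
file's apparent size, so no uninitialized pages become visible:

#define _GNU_SOURCE
#include <fcntl.h>

static int
preallocate_extension(int fd, off_t current_size, off_t add_bytes)
{
    /* space is reserved on disk, but st_size stays at current_size */
    return fallocate(fd, FALLOC_FL_KEEP_SIZE, current_size, add_bytes);
}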

Greetings,

Andres Freund




Re: [HACKERS] checkpointer continuous flushing

2016-01-07 Thread Fabien COELHO


Hello,


I read your patch and I know what I want to try in order to get a small and
simple fix. I must admit that I have not really understood in which
conditions the checkpointer would decide to close a file, but that does not
mean that the potential issue should not be addressed.


There's a trivial example: Consider three tablespaces and
max_files_per_process = 2. The balancing can easily cause three files
being flushed at the same time.


Indeed. Thanks for this explanation!


But more importantly: You designed the API to be generic because you
wanted it to be usable for other purposes as well. And for that it
certainly needs to deal with that.


Yes, I'm planning to try to do the minimum possible damage to the current 
API to fix the issue.



Also, I gave some thoughts about what should be done for bgwriter random
IOs. The idea is to implement some per-file sorting there and then do some
LRU/LFU combing. It would not interact much with the checkpointer, so for me
the two issues should be kept separate and this should not preclude changing
the checkpointer, esp. given the significant performance benefit of the
patch.


Well, the problem is that the patch significantly regresses some cases
right now. So keeping them separate isn't particularly feasible.


I have not seen significant regressions on my many test runs. In
particular, I would not consider a tps dip a blocker in cases where
postgresql is doing 0 tps most of the time anyway (i.e. pg is effectively
offline) because of random IO issues.


As I understood it, the regressions occur when the checkpointer is less 
used, i.e. bgwriter is doing most of the writes, but this does not change 
much whether the checkpointer sorts buffers or not, and the overall 
behavior of pg is very bad anyway in these cases.


Also, I think that coupling the two issues is a recipe for never getting
anything done in the end and keeping the current awful behavior :-(


The solution on the bgwriter front is somewhat similar to the checkpointer
one, but from a code point of view there is minimal interaction, so I would
really separate them, esp. as the bgwriter part will require extensive
testing and discussion as well.


--
Fabien.




Re: [HACKERS] pglogical - logical replication contrib module

2016-01-07 Thread Craig Ringer
... and this is why we don't post while jetlagged and tired.

The patch on the prior mail is the output plugin. Wrong thread, wrong
filename. It's the output plugin update needed for the pglogical downstream
in this thread.

Corrected post of v5 output plugin here:

http://www.postgresql.org/message-id/CAMsr+YEGtE8gYnpAo7=n=ims9olcc8oemvmrh+9ki9wb5cy...@mail.gmail.com

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] checkpointer continuous flushing

2016-01-07 Thread Andres Freund
On 2016-01-07 11:27:13 +0100, Fabien COELHO wrote:
> I read your patch and I know what I want to try in order to get a small
> and simple fix. I must admit that I have not really understood in which
> conditions the checkpointer would decide to close a file, but that does
> not mean that the potential issue should not be addressed.

There's a trivial example: Consider three tablespaces and
max_files_per_process = 2. The balancing can easily cause three files
being flushed at the same time.

But more importantly: You designed the API to be generic because you
wanted it to be usable for other purposes as well. And for that it
certainly needs to deal with that.

> Also, I gave some thoughts about what should be done for bgwriter random
> IOs. The idea is to implement some per-file sorting there and then do some
> LRU/LFU combing. It would not interact much with the checkpointer, so for me
> the two issues should be kept separate and this should not preclude changing
> the checkpointer, esp. given the significant performance benefit of the
> patch.

Well, the problem is that the patch significantly regresses some cases
right now. So keeping them separate isn't particularly feasible.

Greetings,

Andres Freund




Re: [HACKERS] Add schema-qualified relnames in constraint error messages.

2016-01-07 Thread Shulgin, Oleksandr
On Wed, Jan 6, 2016 at 3:02 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

>
> Please find attached a POC patch, using \errverbose for the command name.
> Unfortunately, I didn't see a good way to contain the change in psql only
> and had to change libpq, adding a new interface PQresultBuildErrorMessage().
> Also, what I don't like very much is that we need to keep track of the last
> result from last SendQuery() in psql's pset.  So in my view this is sort of
> a dirty hack that works nonetheless.
>

Added to the Open commitfest: https://commitfest.postgresql.org/9/475/


Re: [HACKERS] Add numeric_trim(numeric)

2016-01-07 Thread Pavel Stehule
2016-01-07 8:12 GMT+01:00 Pavel Stehule :

>
>
> 2016-01-07 1:11 GMT+01:00 Tom Lane :
>
>> Dean Rasheed  writes:
>> > On 6 January 2016 at 20:09, Robert Haas  wrote:
>> >> On Wed, Jan 6, 2016 at 10:21 AM, Dean Rasheed <
>> dean.a.rash...@gmail.com> wrote:
>> >>> It seems like a useful function to have, but perhaps it should just be
>> >>> called trim() rather than numeric_trim(), for consistency with the
>> >>> names of the other numeric functions, which don't start with
>> >>> "numeric_".
>>
>> >> That wouldn't work in this case, because we have hard-coded parser
>> >> productions for TRIM().
>>
>> Does it have to be called TRIM()?  After looking at the spec for it
>> I'd think rtrim() is the more correct analogy.
>>
>> Also worth noting is that those hard-wired parser productions aren't
>> as hard-wired as all that.
>>
>> regression=# select trim(43.5);
>> ERROR:  function pg_catalog.btrim(numeric) does not exist
>>
>> If we wanted to call the function btrim() underneath, this would
>> Just Work.  However, to alleviate confusion, it might be better
>> if we altered the grammar productions to output "trim" not "btrim"
>> for the not-LEADING-or-TRAILING cases, and of course renamed the
>> relevant string functions to match.
>>
>> A different approach is that I'm not real sure why we want a function
>> that returns a modified numeric value at all.  To the extent I understood
>> Marko's original use case, it seems like what you'd invariably do with the
>> result is extract its scale().  Why not skip the middleman and define a
>> function named something like minscale() or leastscale(), which returns an
>> int that is the smallest scale that would not drop data?  (If you actually
>> did want the modified numeric value, you could use round(x, minscale(x))
>> to get it.)
>>
>
> An example "round(x, minscale(x))" looks nice, but there can be a
> performance issue: you have to unpack the varlena twice.
>

The overhead of two numeric function calls instead of one is about 100ms on
1M rows; that can be acceptable.

I prefer it over a string-like design.

Regards

Pavel


>
> I'll try to some performance tests
>
> Regards
>
> Pavel
>
>
>> regards, tom lane
>>
>
>


Re: [HACKERS] Add numeric_trim(numeric)

2016-01-07 Thread Dean Rasheed
On 6 January 2016 at 15:21, Dean Rasheed  wrote:
> Actually I found the implementation a little confusing, partly because
> the first comment doesn't really match the line of code that follows
> it, and partly because of the complexity of the loop, the macros and
> working out all the exit conditions from the loop. A much simpler
> implementation would be to first call strip_var() ...

Additionally, the central part of the algorithm where it computes dig
% 10, dig % 100 and dig % 1000 inside a bunch of conditional macros is
overly complex and hard to read. Once you've got it down to the case
where you know dig is non-zero, you can just do

while ((dig % 10) == 0)
{
dscale--;
dig /= 10;
}

which works for any NBASE, rather than having to introduce an
NBASE-dependent block of code with all those preprocessor
conditionals.

(Actually, given the other thread of this discussion, I would name
that local variable min_scale now.)

Regards,
Dean




Re: [HACKERS] extend pgbench expressions with functions

2016-01-07 Thread Fabien COELHO


Hello Michaël,


Given that all the final results of a \set command will have an integer
format, still, after going through this patch, allowing double as a return
type for nested function calls (the first time "nested" is written in this
thread) is actually really useful, and that's what makes sense for this
feature.


Yep.

[...] Something that bothered me though while testing: the debug()
function is useful, but it becomes difficult to see its results
efficiently when many outputs are stacking up, so I think that it would be
useful to be able to pass a string prefix to annotate a debug entry; say
"debug('foo', 5.2)" would output "debug(script=0,command=1): note: foo,
int/double X". I guess that it would then be useful to also allow
concatenation of strings using "+", with a new PgBenchExprType
ENODE_STRING, but perhaps I am asking too much.


I think that the answer is yes:-)

Obviously all this is possible in the grand scale of things, but it is also
significant work for a tiny benefit, if any. I'd rather see "debug" as a
simple tool for debugging a script with "pgbench -t 1" (i.e. one or a few
transactions), so that the trace length is not an issue.


Once the script is satisfactory, remove the debug and run it for real.

Note that it does not make much sense to run with "debug" calls for many 
transactions because of the performance impact.
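
As an illustration, a hedged sketch of such a throwaway debugging script,
using function names from this patch (run once with "pgbench -f script.sql
-t 1", then delete the debug() calls):

\set size debug(abs(random(-10, 10)))
\set double_size debug(:size * 2)
SELECT :double_size;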



Thoughts are welcome regarding that, it does not seem mandatory


Good.


as of now as this patch is already doing much.


Yep.


We could remove some of the functions in the first shot of this patch
to simplify it a bit, but that does not look like a win as their
footprint on the code is low.


That is one good thing about this function infrastructure: adding a new
function has a small impact. Some functions are there more as a demo of how
to do it than because they are useful as such (e.g. sqrt()), but that is
not a bad thing.



I haven't noticed at quick glance any kind of memory leaks but this
deserves a closer look with valgrind for example, still the patch
looks in good shape to me.


ISTM that there is no allocation in the evaluation part; all is done on the
stack, so there is no risk of a leak there. If there is a leak in the
parser we really don't care: it runs once when reading a script, and after
a run pgbench exits anyway.


And more comments for example in pgbench.h would be welcome to explain 
more the code.


Ok. I'm generally in favor of comments.

I am fine with taking a final look at that before handing it to a committer
though. Does that sound fine as a plan, Fabien?


I understand that you propose to add these comments?

I'm fine with that! :-)

If I misunderstood, tell me :-)

--
Fabien.


Re: [HACKERS] parallel joins, and better parallel explain

2016-01-07 Thread Dilip Kumar
On Wed, Jan 6, 2016 at 10:29 PM, Robert Haas  wrote:

> On Mon, Jan 4, 2016 at 8:52 PM, Dilip Kumar  wrote:
> > One strange behaviour, after increasing number of processor for VM,
> > max_parallel_degree=0; is also performing better.
>
> So, you went from 6 vCPUs to 8?  In general, adding more CPUs means
> that there is less contention for CPU time, but if you already had 6
> CPUs and nothing else running, I don't know why the backend running
> the query would have had a problem getting a whole CPU to itself.  If
> you previously only had 1 or 2 CPUs then there might have been some
> CPU competition with background processes, but if you had 6 then I
> don't know why the max_parallel_degree=0 case got faster with 8.
>

I am really not sure about this case; maybe the CPU allocation in the
virtual machine had a problem, but I can't say for sure.


> Anyway, I humbly suggest that this query isn't the right place to put
> our attention.  There's no reason why we can't improve things further
> in the future, and if it turns out that lots of people have problems
> with the cost estimates on multi-batch parallel hash joins, then we
> can revise the cost model.  We wouldn't treat a single query where a
> non-parallel multi-batch hash join run slower than the costing would
> suggest as a reason to revise the cost model for that case, and I
> don't think this patch should be held to a higher standard.  In this
> particular case, you can easily make the problem go away by tuning
> configuration parameters, which seems like an acceptable answer for
> people who run into this,


Yes, I agree with this point; the cost model can always be improved. And
anyway, in most of the queries, even in the TPC-H benchmark, we have seen a
big improvement with parallel join.

I have done further testing to observe planning time, using TPC-H
queries and some other many-table join queries (7-8 tables).

I did not find any visible regression in planning time.

There are many combinations of queries I have tested, and because of the
big size of the queries and results I did not attach them to the mail; let
me know if anybody wants the details of the queries.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Add numeric_trim(numeric)

2016-01-07 Thread Dean Rasheed
On 7 January 2016 at 00:11, Tom Lane  wrote:
> A different approach is that I'm not real sure why we want a function
> that returns a modified numeric value at all.  To the extent I understood
> Marko's original use case, it seems like what you'd invariably do with the
> result is extract its scale().  Why not skip the middleman and define a
> function named something like minscale() or leastscale(), which returns an
> int that is the smallest scale that would not drop data?  (If you actually
> did want the modified numeric value, you could use round(x, minscale(x))
> to get it.)
>

minscale() sounds good to me.

Re-reading Marko's original use case, it sounds like that specific
example would boil down to a check that minscale(x) <= 2, although
that can be done today using trunc(x,2) = x. Still, it seems that
minscale() would be a useful general purpose function to have.
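
A hedged sketch of what that would look like for callers (neither the
function nor the results exist at the time of writing):

SELECT minscale(15.2400);                  -- would return 2
SELECT round(15.2400, minscale(15.2400));  -- would return 15.24
-- the scale check from the original use case, expressible even today:
SELECT trunc(15.2400, 2) = 15.2400;        -- true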

Regards,
Dean




Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-01-07 Thread Joel Jacobson
+1 to both the feature and the concept of how it's implemented.
Haven't looked at the code though.

This feature would be very useful for us at Trustly.
This would mean we could get rid of an entire system component in our
architecture (=memcached) which we only use to write data which must
be immediately readable at the sync slave after the master commits.
The only such data we currently have is the backoffice sessionid which
must be readable on the slave, otherwise the read-only calls which we
route to the slave might fail because it's missing.

On Wed, Nov 11, 2015 at 6:37 AM, Thomas Munro
 wrote:
>
> Hi hackers,
>
> Many sites use hot standby servers to spread read-heavy workloads over more 
> hardware, or at least would like to.  This works well today if your 
> application can tolerate some time lag on standbys.  The problem is that 
> there is no guarantee of when a particular commit will become visible for 
> clients connected to standbys.  The existing synchronous commit feature is no 
> help here because it guarantees only that the WAL has been flushed on another 
> server before commit returns.  It says nothing about whether it has been 
> applied or whether it has been applied on the standby that you happen to be 
> talking to.
>
> A while ago I posted a small patch[1] to allow synchronous_commit to wait for 
> remote apply on the current synchronous standby, but (as Simon Riggs rightly 
> pointed out in that thread) that part isn't the main problem.  It seems to me 
> that the main problem for a practical 'writer waits' system is how to support 
> a dynamic set of servers, gracefully tolerating failures and timing out 
> laggards, while also providing a strong guarantee during any such 
> transitions.  Now I would like to propose something to do that, and share a 
> proof-of-concept patch.
>
>
> === PROPOSAL ===
>
> The working name for the proposed feature is "causal reads", because it 
> provides a form of "causal consistency"[2] (and "read-your-writes" 
> consistency) no matter which server the client is connected to.  There is a 
> similar feature by the same name in another product (albeit implemented 
> differently -- 'reader waits'; more about that later).  I'm not wedded to the 
> name.
>
> The feature allows arbitrary read-only transactions to be run on any hot 
> standby, with a specific guarantee about the visibility of preceding 
> transactions.  The guarantee is that if you set a new GUC "causal_reads = on" 
> in any pair of consecutive transactions (tx1, tx2) where tx2 begins after tx1 
> successfully returns, then tx2 will either see tx1 or fail with a new error 
> "standby is not available for causal reads", no matter which server it runs 
> on.  A discovery mechanism is also provided, giving an instantaneous snapshot 
> of the set of standbys that are currently available for causal reads (ie 
> won't raise the error), in the form of a new column in pg_stat_replication.
>
> For example, a web server might run tx1 to insert a new row representing a 
> message in a discussion forum on the primary server, and then send the user 
> to another web page that runs tx2 to load all messages in the forum on an 
> arbitrary hot standby server.  If causal_reads = on in both tx1 and tx2 (for 
> example, because it's on globally), then tx2 is guaranteed to see the new 
> post, or get a (hopefully rare) error telling the client to retry on another 
> server.
>
> Very briefly, the approach is:
> 1.  The primary tracks apply lag on each standby (including between commits).
> 2.  The primary deems standbys 'available' for causal reads if they are 
> applying WAL and replying to keepalives fast enough, and periodically sends 
> the standby an authorization to consider itself available for causal reads 
> until a time in the near future.
> 3.  Commit on the primary with "causal_reads = on" waits for all 'available' 
> standbys either to apply the commit record, or to cease to be 'available' and 
> begin raising the error if they are still alive (because their authorizations 
> have expired).
> 4.  Standbys can start causal reads transactions only while they have an 
> authorization with an expiry time in the future; otherwise they raise an 
> error when an initial snapshot is taken.
>
> In a follow-up email I can write about the design trade-offs considered 
> (mainly 'writer waits' vs 'reader waits'), comparison with some other 
> products, method of estimating replay lag, wait and timeout logic and how it 
> maintains the guarantee in various failure scenarios, logic for standbys 
> joining and leaving, implications of system clock skew between servers, or 
> any other questions you may have, depending on feedback/interest (but see 
> comments in the attached patch for some of those subjects).  For now I didn't 
> want to clog up the intertubes with too large a wall of text.
>
>
> === PROOF-OF-CONCEPT ===
>
> Please see the POC patch attached.  It adds two new GUCs. 
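
To make the guarantee concrete, here is a hedged sketch of the intended
usage (the causal_reads GUC is the patch's proposal; the table is made up):

-- tx1, on the primary:
SET causal_reads = on;
INSERT INTO posts (forum_id, body) VALUES (42, 'hello');

-- tx2, begun after tx1 returns, on any hot standby with causal_reads = on:
-- it either sees the new row, or fails with
-- "standby is not available for causal reads"
SELECT * FROM posts WHERE forum_id = 42;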

Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-01-07 Thread Etsuro Fujita

On 2016/01/06 20:37, Thom Brown wrote:

On 25 December 2015 at 10:00, Etsuro Fujita  wrote:

Attached is an updated version of the patch, which is
still WIP, but I'd be happy if I could get any feedback.



I've run into an issue:

*# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
tableoid::regclass;
ERROR:
CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
WHERE ((id = 16)) RETURNING NULL

However, this works:

*# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
tableoid::regclass, *;
    tableoid     | id | name  |    company    | registered_date | expiry_date | active | status  | account_level
-----------------+----+-------+---------------+-----------------+-------------+--------+---------+---------------
 local_customers | 22 | Bruce | Jo's Cupcakes | 2015-01-15      | 2017-01-14  | t      | running | basic
(1 row)

In this example, "local_customers" inherits from the remote table
"public"."customers", which inherits again from the local table
"master_customers".

Same issue with DELETE of course, and the ::regclass isn't important here.


Will fix.

Thanks for the testing!

Best regards,
Etsuro Fujita






Re: [HACKERS] Proposal: Generic WAL logical messages

2016-01-07 Thread Andres Freund
> Process-wise, I don't understand why Andres is mentioned as co-author.
> Did he actually write part of the patch, or advise on the design?
> Who is reviewing the patch?

It's extracted & extended from BDR, where I added that feature (to
implement distributed locking).




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-01-07 Thread Etsuro Fujita

On 2016/01/06 18:58, Rushabh Lathia wrote:

I started looking at the updated patch and I definitely liked the new
approach.


Thanks for the review!


With the initial look and test overall things looking great, I am still
reviewing the code changes but here are few early doubts/questions:



.) What's the need for the following change?

@@ -833,9 +833,6 @@ appendWhereClause(StringInfo buf,
 int nestlevel;
 ListCell   *lc;

-   if (params)
-   *params = NIL;  /* initialize result list to empty */
-
 /* Set up context struct for recursion */
 context.root = root;
 context.foreignrel = baserel;
@@ -971,6 +968,63 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
  }


It is needed so that deparsePushedDownUpdateSql can store the params from
both the WHERE clauses and the expressions to assign to the target columns
in one params_list list.


.) When Tom Lane and Stephen Frost suggested getting the core code involved,
I thought that we could do the mandatory checks in core itself, taking them
completely out of dml_is_pushdown_safe(). Please correct me.


The reason why I put that function in postgres_fdw.c is check point 4:

+  * 4. We can't push an UPDATE down, if any expressions to assign to the
+  * target columns are unsafe to evaluate on the remote server.

I think this depends on the capabilities of the FDW.


.) Documentation for the new API is missing (fdw-callbacks).


Will add the docs.

Best regards,
Etsuro Fujita






Re: [HACKERS] Improved tab completion for FDW DDL

2016-01-07 Thread Tom Lane
Andreas Karlsson  writes:
> On 01/04/2016 01:09 AM, David Fetter wrote:
>> On Wed, Dec 30, 2015 at 01:21:06PM +0100, Andreas Karlsson wrote:
>>> Another completion which is currently missing but I am not sure if we should
>>> add or not is completing "ALTER|CREATE|DROP USER" with "MAPPING FOR", but
>>> since it might interfere with completing to username for "ALTER|DROP USER" I
>>> am not sure we want it. What do you think?

>> Is there a way to require some reasonable chunk--say, one that's
>> disambiguated from the name any known ROLE with LOGIN--of MAPPING
>> before completing with MAPPING FOR?  I confess to not knowing how the
>> new system works in enough detail to know that off the top of my head.

> No, and while it would not be too hard to build it would not be worth 
> doing just for this use case.

The way we've solved other similar cases would translate like this:
instead of the "query for user names" just returning user names, add
on "UNION 'MAPPING FOR'".  So if you do

# alter user 

where you're now offered

alice    joe      postgres

you'd instead get

alice    joe      postgres    MAPPING FOR

Dunno if it's worth the trouble though.
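
In tab-complete.c that would look something like the sketch below; the
surrounding condition is abbreviated, and the exact spelling may differ
slightly from the current source:

else if (pg_strcasecmp(prev2_wd, "ALTER") == 0 &&
         pg_strcasecmp(prev_wd, "USER") == 0)
    COMPLETE_WITH_QUERY(Query_for_list_of_roles
                        " UNION SELECT 'MAPPING FOR'");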

regards, tom lane




Re: [HACKERS] Very confusing installcheck behavior with PGXS

2016-01-07 Thread Jim Nasby

On 1/7/16 8:47 AM, Tom Lane wrote:

Jim Nasby  writes:

However, if I do this:
mv test/sql/acl_type.sql test/sql/acl.sql
mv test/expected/acl_type.out test/expected/acl.out



And change acl_type to acl in that pg_regress command:
/Users/decibel/pgsql/HEAD/i/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress
--bindir='/Users/decibel/pgsql/HEAD/i/bin' --inputdir=test
--load-language=plpgsql --dbname=contrib_regression acl build compat rights



Instead of executing test/sql/acl.sql, it executes ./sql/acl.sql.


That's pretty hard to believe.  There's nothing in pg_regress that looks
in places other than the given --inputdir.


Actually, I think it does... from pg_regress_main.c:

/*
 * Look for files in the output dir first, consistent with a vpath search.
 * This is mainly to create more reasonable error messages if the file is
 * not found.  It also allows local test overrides when running pg_regress
 * outside of the source tree.
 */
snprintf(infile, sizeof(infile), "%s/sql/%s.sql",
         outputdir, testname);
if (!file_exists(infile))
    snprintf(infile, sizeof(infile), "%s/sql/%s.sql",
             inputdir, testname);

If I add --outputdir=test, then everything works fine.

Obviously I can just deal with this in my Makefile, but is this really 
the behavior we want? It certainly seems to violate POLA...



I wonder whether you have a test/input/acl.sql and/or test/output/acl.out
that's confusing matters.


Nope...

decibel@decina:[08:35]~/git/pg_acl (master *%=)$ ll test
total 16
drwxr-x---   6 decibel  staff  204 Jan  2 17:31 ./
drwxr-x---  17 decibel  staff  578 Jan  7 08:35 ../
-rw-r-----   1 decibel  staff   65 Jan  2 17:31 deps.sql
drwxr-x---   6 decibel  staff  204 Jan  7 08:32 expected/
lrwxr-x---   1 decibel  staff   25 Dec 26 13:43 pgxntool@ -> ../pgxntool/test/pgxntool
drwxr-x---   6 decibel  staff  204 Jan  7 08:32 sql/
decibel@decina:[08:48]~/git/pg_acl (master *%=)$


--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] checkpointer continuous flushing

2016-01-07 Thread Fabien COELHO


Hello Andres,


I thought of adding a pointer to the current flush structure at the vfd
level, so that on closing a file with a flush in progress the flush can be
done and the structure properly cleaned up, hence later the checkpointer
would see a clean thing and be able to skip it instead of generating flushes
on a closed file or on a different file...

Maybe I'm missing something, but that is the plan I had in mind.


That might work, although it'd not be pretty (not fatally so
though).


Alas, any solution has to communicate somehow between the API levels, so it
cannot be "pretty", although we should avoid the worst.


But I'm inclined to go a different way: I think it's a mistake to do 
flushing based on a single file. It seems better to track a fixed number 
of outstanding 'block flushes', independent of the file. Whenever the 
number of outstanding blocks is exceeded, sort that list, and flush all 
outstanding flush requests after merging neighbouring flushes.


Hmmm. I'm not sure I understand your strategy.

I do not think that flushing without prior sorting would be effective,
because there is no clear reason why buffers written together would then
be next to each other and thus give sequential write benefits; we would
just get flushed random IO. I tested that and it worked badly.


One of the points of aggregating flushes is that the range flush call cost
is significant, as shown by preliminary tests I did, probably up in the
thread, so it makes sense to limit this cost, hence the aggregation. This
removed some performance regressions I had in some cases.


Also, the granularity of the buffer flush call is a file + offset + size, 
so necessarily it should be done this way (i.e. per file).


Once buffers are sorted per file and per offset within file, written
buffers are as close as possible one after the other, and the merging is
very easy to compute (it is done on the fly; no need to keep the list of
buffers, for instance), so it is optimally effective. When the checkpointed
file changes, we will never go back to it before the next checkpoint, so
there is no reason not to flush right then.
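
A sketch of that on-the-fly merging (Linux-only; the struct and function
names are made up here, and a final flush of the pending range is still
needed at the end of the checkpoint):

#define _GNU_SOURCE
#include <fcntl.h>

typedef struct
{
    int     fd;         /* -1 while no flush is pending */
    off_t   start;
    off_t   len;
} PendingFlush;

static void
record_write(PendingFlush *p, int fd, off_t offset, off_t nbytes)
{
    if (p->fd == fd && offset == p->start + p->len)
    {
        p->len += nbytes;       /* contiguous write: extend the range */
        return;
    }
    if (p->fd >= 0)             /* new file or a gap: flush what we have */
        sync_file_range(p->fd, p->start, p->len, SYNC_FILE_RANGE_WRITE);
    p->fd = fd;
    p->start = offset;
    p->len = nbytes;
}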


So basically I do not see a clear positive advantage to your suggestion, 
especially when taking into consideration the scheduling process of the 
scheduler:


In effect the checkpointer already works in little bursts of activity
between sleep phases, writing buffers a few at a time, so it may already
work more or less as you expect, but not for the same reason.


The closest strategy I experimented with, which is maybe close to your
suggestion, was to manage a minimum number of buffers to write when awoken
and to change the sleep delay in between, but I had no clear way to choose
the values, and the experiments I did showed no significant performance
impact from varying these parameters, so I kept that out. If you find a
magic number of buffers which results in consistently better performance,
fine with me, but this is independent of aggregating before or after.


Imo that means that we'd better track writes on a relfilenode + block 
number level.


I do not think that it is a better option. Moreover, the current approach 
has been proven to be very effective on hundreds of runs, so redoing it 
differently for the sake of it does not look like good resource 
allocation.


--
Fabien.




Re: [HACKERS] Very confusing installcheck behavior with PGXS

2016-01-07 Thread Tom Lane
Jim Nasby  writes:
> On 1/7/16 8:47 AM, Tom Lane wrote:
>> That's pretty hard to believe.  There's nothing in pg_regress that looks
>> in places other than the given --inputdir.

> Actually, I think it does... from pg_regress_main.c:

>   /*
>* Look for files in the output dir first, consistent with a vpath 
> search.
>* This is mainly to create more reasonable error messages if the file 
> is
>* not found.  It also allows local test overrides when running 
> pg_regress
>* outside of the source tree.
>*/
>   snprintf(infile, sizeof(infile), "%s/sql/%s.sql",
>outputdir, testname);
>   if (!file_exists(infile))
>   snprintf(infile, sizeof(infile), "%s/sql/%s.sql",
>inputdir, testname);

Oh, I'd just been looking in pg_regress.c.

The comment's argument about "more reasonable error messages" seems pretty
dubious to me.  The real reason for this behavior seems to have been to
simplify VPATH builds --- see commit feae7856, which added this code,
and was able to get rid of quite a lot of makefile cruft.

I think though that doing it exactly like this is a bit klugy.  A better
implementation of VPATH would've been to introduce an optional "secondary
input directory" into which we look if we fail to find something in the
main input directory.  Then we'd run it with main input directory in the
build tree and secondary input directory pointing to the source tree.
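
A sketch of what that lookup could look like in pg_regress_main.c
(secondary_inputdir is hypothetical; no such option exists today):

snprintf(infile, sizeof(infile), "%s/sql/%s.sql", inputdir, testname);
if (!file_exists(infile) && secondary_inputdir != NULL)
    snprintf(infile, sizeof(infile), "%s/sql/%s.sql",
             secondary_inputdir, testname);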

(I'm also wondering how convert_sourcefiles() works at all in a vpath
build, considering that I don't see it doing anything like this ...)

regards, tom lane




Re: [HACKERS] checkpointer continuous flushing

2016-01-07 Thread Andres Freund
On 2016-01-07 16:05:32 +0100, Fabien COELHO wrote:
> >But I'm inclined to go a different way: I think it's a mistake to do
> >flusing based on a single file. It seems better to track a fixed number of
> >outstanding 'block flushes', independent of the file. Whenever the number
> >of outstanding blocks is exceeded, sort that list, and flush all
> >outstanding flush requests after merging neighbouring flushes.
> 
> Hmmm. I'm not sure I understand your strategy.
> 
> I do not think that flushing without a prior sorting would be effective,
> because there is no clear reason why buffers written together would then be
> next to the other and thus give sequential write benefits, we would just get
> flushed random IO, I tested that and it worked badly.

Oh, I was thinking of sorting & merging these outstanding flushes. Sorry
for not making that clear.


> One of the point of aggregating flushes is that the range flush call cost
> is significant, as shown by preliminary tests I did, probably up in the
> thread, so it makes sense to limit this cost, hence the aggregation. These
> removed some performation regression I had in some cases.

FWIW, my tests show that flushing for clean ranges is pretty cheap.


> Also, the granularity of the buffer flush call is a file + offset + size, so
> necessarily it should be done this way (i.e. per file).

What syscalls we issue, and at what level we track outstanding flushes,
doesn't have to be the same.


> Once buffers are sorted per file and offset within file, then written
> buffers are as close as possible one after the other, the merging is very
> easy to compute (it is done on the fly, no need to keep the list of buffers
> for instance), it is optimally effective, and when the checkpointed file
> changes then we will never go back to it before the next checkpoint, so
> there is no reason not to flush right then.

Well, that's true if there's only one tablespace, but e.g. not the case
with two tablespaces of about the same number of dirty buffers.


> So basically I do not see a clear positive advantage to your suggestion,
> especially when taking into consideration the scheduling process of the
> scheduler:

I don't think it makes a big difference for the checkpointer alone, but
it makes the interface much more suitable for other processes, e.g. the
bgwriter, and normal backends.



> >Imo that means that we'd better track writes on a relfilenode + block
> >number level.
> 
> I do not think that it is a better option. Moreover, the current approach
> has been proven to be very effective on hundreds of runs, so redoing it
> differently for the sake of it does not look like good resource allocation.

For a subset of workloads, yes.

Greetings,

Andres Freund




Re: [HACKERS] Proposal: BSD Authentication support

2016-01-07 Thread Greg Stark
This sounds like a sensible thing to me. I'm actually surprised; it
sounds like something we would have already seen. Do some people just
use PAM on OpenBSD? Are both supported?

You should add the patch to https://commitfest.postgresql.org to
ensure it doesn't slip through the cracks. It's too late for January,
though there's nothing stopping people from commenting on or even
committing patches outside the commitfest.




Re: [HACKERS] checkpointer continuous flushing

2016-01-07 Thread Andres Freund
On 2016-01-07 13:07:33 +0100, Fabien COELHO wrote:
> 
> >>Yes, I'm planning to try to do the minimum possible damage to the current
> >>API to fix the issue.
> >
> >What's your thought there? Afaics it's infeasible to do the flushing at
> >the fd.c level.
> 
> I thought of adding a pointer to the current flush structure at the vfd
> level, so that on closing a file with a flush in progress the flush can be
> done and the structure properly cleaned up, hence later the checkpointer
> would see a clean thing and be able to skip it instead of generating flushes
> on a closed file or on a different file...
> 
> Maybe I'm missing something, but that is the plan I had in mind.

That might work, although it'd not be pretty (not fatally so
though). But I'm inclined to go a different way: I think it's a mistake
to do flushing based on a single file. It seems better to track a fixed
number of outstanding 'block flushes', independent of the file. Whenever
the number of outstanding blocks is exceeded, sort that list, and flush
all outstanding flush requests after merging neighbouring flushes. Imo
that means that we'd better track writes on a relfilenode + block number
level.

Andres




Re: [HACKERS] Very confusing installcheck behavior with PGXS

2016-01-07 Thread Jim Nasby

On 1/6/16 11:54 AM, Tom Lane wrote:

Robert Haas  writes:

On Sun, Jan 3, 2016 at 5:22 PM, Jim Nasby  wrote:

The rule that gets executed if you do `make installcheck` with something
using PGXS is

pgxs.mk:$(pg_regress_installcheck) $(REGRESS_OPTS) $(REGRESS)

where $(pg_regress_installcheck) is set in Makefile.global.in to


pg_regress_installcheck = $(top_builddir)/src/test/regress/pg_regress
--inputdir=$(srcdir) --bindir='$(bindir)' $(pg_regress_locale_flags)
$(EXTRA_REGRESS_OPTS)


The problem here is that in a PGXS make, srcdir is set to '.'[1], and
--inputdir is specified a second time in REGRESS_OPTS. Normally that works
OK (for some reason ignoring what's in ./sql), but if you happen to have a
file in your test/sql directory that matches a file in ./sql, pg_regress
runs the first file and not the second.



Stupid question time: why in the world would you have that?


It doesn't seem that odd to have a test file that matches an extension 
file...



AFAICS, the pieces we supply (Makefile.global and pgxs.mk) would only
specify --inputdir once.  If there's a second specification being added
to REGRESS_OPTS by your own Makefile, that would override the default,
which seems like entirely sensible behavior.  Maybe I'm misunderstanding
something, but it sounds like you're saying "if I write --inputdir=test
in REGRESS_OPTS, it runs the tests in test/sql not those in ./sql".
Why would you not think that's expected?


Actually, it's more bizarre than I thought...

This is what my makefile[1] produces, which currently works:
/Users/decibel/pgsql/HEAD/i/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress 
--inputdir=./ --bindir='/Users/decibel/pgsql/HEAD/i/bin' 
--inputdir=test --load-language=plpgsql --dbname=contrib_regression 
acl_type build compat rights


Removing the first --inputdir also works:
/Users/decibel/pgsql/HEAD/i/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress 
 --bindir='/Users/decibel/pgsql/HEAD/i/bin' --inputdir=test 
--load-language=plpgsql --dbname=contrib_regression acl_type build 
compat rights


However, if I do this:
mv test/sql/acl_type.sql test/sql/acl.sql
mv test/expected/acl_type.out test/expected/acl.out

And change acl_type to acl in that pg_regress command:
/Users/decibel/pgsql/HEAD/i/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress 
--bindir='/Users/decibel/pgsql/HEAD/i/bin' --inputdir=test 
--load-language=plpgsql --dbname=contrib_regression acl build compat rights


Instead of executing test/sql/acl.sql, it executes ./sql/acl.sql.

I'd assumed this was due to the extra --inputdir option, but clearly 
something else is going on here.


Time to gvim pg_regress.c...

[1] https://github.com/decibel/pg_acl
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Very confusing installcheck behavior with PGXS

2016-01-07 Thread Jim Nasby

On 1/7/16 9:12 AM, Tom Lane wrote:

Jim Nasby  writes:

On 1/7/16 8:47 AM, Tom Lane wrote:

That's pretty hard to believe.  There's nothing in pg_regress that looks
in places other than the given --inputdir.



Actually, I think it does... from pg_regress_main.c:



/*
 * Look for files in the output dir first, consistent with a vpath search.
 * This is mainly to create more reasonable error messages if the file is
 * not found.  It also allows local test overrides when running pg_regress
 * outside of the source tree.
 */
snprintf(infile, sizeof(infile), "%s/sql/%s.sql",
         outputdir, testname);
if (!file_exists(infile))
    snprintf(infile, sizeof(infile), "%s/sql/%s.sql",
             inputdir, testname);


Oh, I'd just been looking in pg_regress.c.

The comment's argument about "more reasonable error messages" seems pretty
dubious to me.  The real reason for this behavior seems to have been to
simplify VPATH builds --- see commit feae7856, which added this code,
and was able to get rid of quite a lot of makefile cruft.

I think though that doing it exactly like this is a bit klugy.  A better
implementation of VPATH would've been to introduce an optional "secondary
input directory" into which we look if we fail to find something in the
main input directory.  Then we'd run it with main input directory in the
build tree and secondary input directory pointing to the source tree.


Seems reasonable. Sounds like a good beginner project to boot. :)
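
Something like this, perhaps -- a hypothetical sketch, not actual pg_regress
code (file_exists() is the helper pg_regress already has; everything else
here is made up):

#include <stdbool.h>
#include <stdio.h>

extern bool file_exists(const char *file);	/* as in pg_regress.c */

/*
 * Sketch of a "secondary input directory": look in the primary
 * (build-tree) directory first, then fall back to the secondary
 * (source-tree) directory.  Returns true if an input file was found.
 */
static bool
find_input_file(const char *primary, const char *secondary,
				const char *testname, char *path, size_t pathlen)
{
	snprintf(path, pathlen, "%s/sql/%s.sql", primary, testname);
	if (file_exists(path))
		return true;
	if (secondary == NULL)
		return false;
	snprintf(path, pathlen, "%s/sql/%s.sql", secondary, testname);
	return file_exists(path);
}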


(I'm also wondering how convert_sourcefiles() works at all in a vpath
build, considering that I don't see it doing anything like this ...)


It's only looking at outputdir, which I suspect is never ambiguous.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_conversion seems rather strangely defined

2016-01-07 Thread Tom Lane
Tatsuo Ishii  writes:
>> It would only be important to be able to do it like that if different
>> users of the same database had conflicting ideas about what was the
>> correct conversion between client and database encodings.  I submit
>> that that's somewhere around epsilon probability, considering we have
>> not even heard of anyone replacing the system conversions at all.

> I used to have a customer who needed a different client and database
> encoding conversion than the default.  That is, a slightly different
> mapping between Shift-JIS and other database encodings.  Due to
> unfortunate historical reasons, there are several Shift-JIS variants
> (in addition to the standard defined by the government, there are IBM,
> NEC and Microsoft versions).  This is the reason why I wanted to have
> the functionality at that time.  I'm not sure the customer still uses
> the functionality, but it is possible that there are other users who
> have similar use cases, since the Shift-JIS variants are still used.

Hm.  Odd that we've not heard complaints about the removal of
CONVERT(... USING ...), then.

I think it would be a good idea at least to put back some equivalent
of CONVERT(... USING ...), if for no other reason than that it would
ease testing.  As I understood it, the argument to remove it was not
that the functionality was bad, but that we were using a SQL-standard
syntax for what we concluded was not SQL-standard functionality.
I'd propose putting it back with a syntax of, say,

convert_with(input bytea, conversion_name text) returns bytea
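
Usage would presumably look something like this (a hypothetical sketch
based on the signature above; the conversion name is made up):

SELECT convert_with(raw_bytes, 'my_sjis_to_utf8') FROM some_table;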

As for the client encoding conversion case, I still say a
search-path-based lookup is a horrible idea, and furthermore there
seems no very good reason why it has to be restricted to default
conversions.  Aside from other arguments, that tends to push people
to mark *every* conversion as default, which is outright silly if
you have several competing ones.

As a sketch of an alternative, consider inventing a GUC named
preferred_conversions or some such, which is a list of
possibly-schema-qualified conversion names.  When establishing an
original or new value of client_encoding, we look through the list
for the first entry that exists and performs the desired encoding
conversion (whether or not it is default).  If there is no match,
look up the (unique) default conversion for the encoding pair, and
use that.  (Obviously this has to be done twice, once for each
direction, when setting up client encoding conversions.)  We could
apply the same rules for identifying which specific conversion to use
in convert() and siblings.
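
To sketch what that might look like (hypothetical; no such GUC exists
today, and the conversion names are made up):

-- prefer a custom Shift_JIS conversion when present, else the default
SET preferred_conversions = 'myschema.ms_sjis_to_utf8, public.sjis_to_utf8';
SET client_encoding = 'SJIS';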

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: BSD Authentication support

2016-01-07 Thread Marisa Emerson
There's a port for PAM, but we would prefer to use BSD Auth as it's quite 
a lot cleaner and is standard on OpenBSD.


I've attached an updated patch that includes documentation. It has been 
tested against OpenBSD 5.8. I'll add this thread to the commitfest.
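
For reference, the heart of the backend-side check is roughly the
following shape -- a minimal sketch against the documented bsd_auth(3)
API, not the patch's actual code (the function name is illustrative):

#include <sys/types.h>
#include <stdbool.h>
#include <bsd_auth.h>		/* OpenBSD BSD Authentication API */

/*
 * Validate a user name/password pair.  auth_userokay() returns non-zero
 * on success; NULL for the style and type arguments selects the
 * login.conf defaults.
 */
static bool
check_bsd_auth(const char *user, const char *password)
{
	return auth_userokay((char *) user, NULL, NULL,
						 (char *) password) != 0;
}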


On 07/01/16 23:26, Greg Stark wrote:

This sounds like a sensible thing to me. I'm actually surprised; it
sounds like something we would have already seen. Do some people just
use PAM on OpenBSD? Are both supported?

You should add the patch to https://commitfest.postgresql.org to
ensure it doesn't slip through the cracks. It's too late for January
though there's nothing stopping people from commenting on or even
committing patches outside the commitfest.

diff --git a/configure b/configure
index 5772d0e..c982e2b 100755
--- a/configure
+++ b/configure
@@ -826,6 +826,7 @@ with_python
 with_gssapi
 with_krb_srvnam
 with_pam
+with_bsd_auth
 with_ldap
 with_bonjour
 with_openssl
@@ -1514,6 +1515,7 @@ Optional Packages:
   --with-krb-srvnam=NAME  default service principal name in Kerberos (GSSAPI)
   [postgres]
   --with-pam  build with PAM support
+  --with-bsd-auth build with BSD Authentication support
   --with-ldap build with LDAP support
   --with-bonjour  build with Bonjour support
   --with-openssl  build with OpenSSL support
@@ -5557,6 +5559,41 @@ $as_echo "$with_pam" >&6; }
 
 
 #
+# BSD
+#
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with BSD Authentication support" >&5
+$as_echo_n "checking whether to build with BSD Authentication support... " >&6; }
+
+
+
+# Check whether --with-bsd-auth was given.
+if test "${with_bsd_auth+set}" = set; then :
+  withval=$with_bsd_auth;
+  case $withval in
+yes)
+
+$as_echo "#define USE_BSD_AUTH 1" >>confdefs.h
+
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --with-bsd-auth option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  with_bsd_auth=no
+
+fi
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_bsd_auth" >&5
+$as_echo "$with_bsd_auth" >&6; }
+
+
+#
 # LDAP
 #
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with LDAP support" >&5
@@ -10475,6 +10512,23 @@ done
 
 fi
 
+if test "$with_bsd_auth" = yes ; then
+  for ac_header in bsd_auth.h
+do :
+  ac_fn_c_check_header_mongrel "$LINENO" "bsd_auth.h" "ac_cv_header_bsd_auth_h" "$ac_includes_default"
+if test "x$ac_cv_header_bsd_auth_h" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_BSD_AUTH_H 1
+_ACEOF
+
+else
+  as_fn_error $? "header file <bsd_auth.h> is required for BSD Authentication support" "$LINENO" 5
+fi
+
+done
+
+fi
+
 if test "$with_libxml" = yes ; then
   ac_fn_c_check_header_mongrel "$LINENO" "libxml/parser.h" "ac_cv_header_libxml_parser_h" "$ac_includes_default"
 if test "x$ac_cv_header_libxml_parser_h" = xyes; then :
diff --git a/configure.in b/configure.in
index 44f832f..d5fb726 100644
--- a/configure.in
+++ b/configure.in
@@ -663,6 +663,16 @@ AC_MSG_RESULT([$with_pam])
 
 
 #
+# BSD AUTH
+#
+AC_MSG_CHECKING([whether to build with BSD Authentication support])
+PGAC_ARG_BOOL(with, bsd-auth, no,
+  [build with BSD Authentication support],
+  [AC_DEFINE([USE_BSD_AUTH], 1, [Define to 1 to build with BSD Authentication support. (--with-bsd-auth)])])
+AC_MSG_RESULT([$with_bsd_auth])
+
+
+#
 # LDAP
 #
 AC_MSG_CHECKING([whether to build with LDAP support])
@@ -1249,6 +1259,10 @@ if test "$with_pam" = yes ; then
 [AC_MSG_ERROR([header file <security/pam_appl.h> or <pam/pam_appl.h> is required for PAM.])])
 fi
 
+if test "$with_bsd_auth" = yes ; then
+  AC_CHECK_HEADERS(bsd_auth.h, [], [AC_MSG_ERROR([header file <bsd_auth.h> is required for BSD Authentication support])])
+fi
+
 if test "$with_libxml" = yes ; then
  AC_CHECK_HEADER(libxml/parser.h, [], [AC_MSG_ERROR([header file <libxml/parser.h> is required for XML support])])
 fi
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 3b2935c..b2c8a43 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -522,6 +522,17 @@ hostnossl  database  user
  
 

+
+   
+ bsd
+ 
+  
+   Authenticate using BSD Authentication (BSD Auth) provided
+   by the operating system. See 
+   for details.
+  
+ 
+   
   
 
   
@@ -1647,6 +1658,33 @@ host ... ldap ldapurl="ldap://ldap.example.net/dc=example,dc=net?uid?sub;
 

   
+
+  
+   BSD Authentication
+
+   
+BSD
+   
+
+   
+This authentication method operates similarly to
+password except that it uses BSD
+Authentication as the authentication mechanism. BSD Authentication
+is used only to validate user name/password pairs.
+Therefore the user must already exist in the database before BSD
+Authentication can be used for authentication. For more information
+about BSD Authentication, please read the
+http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/auth_call.3?query=bsd_auth;>
+   

Re: [HACKERS] pg_conversion seems rather strangely defined

2016-01-07 Thread Tatsuo Ishii
> It would only be important to be able to do it like that if different
> users of the same database had conflicting ideas about what was the
> correct conversion between client and database encodings.  I submit
> that that's somewhere around epsilon probability, considering we have
> not even heard of anyone replacing the system conversions at all.

I used to have a customer who needed a different client and database
encoding conversion than the default.  That is, a slightly different
mapping between Shift-JIS and other database encodings.  Due to
unfortunate historical reasons, there are several Shift-JIS variants
(in addition to the standard defined by the government, there are IBM,
NEC and Microsoft versions).  This is the reason why I wanted to have
the functionality at that time.  I'm not sure the customer still uses
the functionality, but it is possible that there are other users who
have similar use cases, since the Shift-JIS variants are still used.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses

2016-01-07 Thread Alvaro Herrera
Peter Geoghegan wrote:
> We lack SortSupport for many character-like-type cases. In full, the
> cases within the core system are:

You're stealthily introducing a new abstraction called "string",
including a typedef and DatumGetString support macros.  Is that really
necessary?  Shouldn't it be discussed specifically?  I don't necessarily
oppose it as is, mainly because it's limited to within varlena.c for
now, but I'm not sure it'd be okay to make it more public.

Also, there's a lot of churn in this patch just to rename tss to sss.
Can't we just keep the original variable name?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-01-07 Thread Jarred Ward
I didn't receive a response on the bugs mailing list for the following bug,
so I was hoping we could triage it to someone with more familiarity with
Postgres internals than I have.

This seems like a ticket for folks who are invested in logical decoding.

The attached script is a simple workload that logical decoding is unable to
decode. It causes an unrecoverable crash in the logical decoder with
'ERROR:  subxact logged without previous toplevel record'.

On Thu, Jan 7, 2016 at 12:44 AM, Craig Ringer  wrote:

>
> Here's v5 of the pglogical patch.
>
> Changes:
>
> * Implement relation metadata caching
> * Add the relmeta_cache_size parameter for cache control
> * Add an extension to get version information
> * Create the pglogical_output header directory on install
> * Restore 9.4 compatibility (it's small)
> * Allow row filter hooks to see details of the changed tuple
> * Remove forward_changesets from pglogical_output (use a hook if you want
> this functionality)
>
> I'm not sure if 9.4 compat will be desirable or not. It's handy to avoid
> needing a separate backported version, but also confusing to do a PGXS
> build within a 9.6 tree against 9.4...
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


bug.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multi-tenancy with RLS

2016-01-07 Thread Tom Lane
Joe Conway  writes:
> On 01/06/2016 12:15 PM, Robert Haas wrote:
>> Is any committer thinking about taking a serious look at this patch series?
>> 
>> I ask because (1) it seems like it could be nice to have but (2) it
>> frightens me terribly.  We are generally very sparing about assuming
>> that "stuff" (triggers, partial indexes, etc.) that works for user
>> tables can also be made to work for system tables.  I haven't thought
>> deeply about what might go wrong in this particular case, but it
>> strikes me that if Haribabu Kommi is building something that is doomed
>> for some reason, it would be good to figure that out before he spends
>> any more time on it than he already has.

> As Stephen mentioned, yes, I am very interested in at least some aspects
> of this patch. The ability to apply RLS to system tables could be useful
> to solve a number of problems we don't have a good story for today,
> multi-tenancy only being one of them.

FWIW, it seems offhand like we might not have that much trouble with
applying RLS to system catalogs as long as it's understood that RLS
only has anything to do with SQL queries issued against the catalogs.

If we imagine that RLS should somehow filter a backend's own operations on
the catalogs, then I agree with Robert that the entire thing is deeply
scary and probably incapable of being made to work robustly.

However, by "not that much trouble" I only mean getting an implementation
that works and doesn't create more security problems than it fixes.
Usability is still likely to be a huge problem.  In particular it seems
likely that any attempt to actually put RLS policies on the catalogs would
completely destroy the ability to run pg_dump except as a BYPASSRLS role.
That would be an unpleasant consequence.

I've not read the patch set, so maybe it contains some ideas about how
to mitigate the usability issues, in which case never mind ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Very confusing installcheck behavior with PGXS

2016-01-07 Thread Tom Lane
Jim Nasby  writes:
> On 1/7/16 9:12 AM, Tom Lane wrote:
>> (I'm also wondering how convert_sourcefiles() works at all in a vpath
>> build, considering that I don't see it doing anything like this ...)

> It's only looking at outputdir, which I suspect is never ambiguous.

Eh, no, look again.  What it's actually doing is reading $inputdir/input/
and converting into $outputdir/sql/, and reading $inputdir/output/ and
converting into $outputdir/expected/.  I guess that works, but again it's
kind of at variance with the normal expectation of VPATH behavior.  What
you'd expect is that $builddir/input files would override $srcdir/input
files; but as is, $builddir/input and $builddir/output are never examined
at all.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Very confusing installcheck behavior with PGXS

2016-01-07 Thread Jim Nasby

On 1/7/16 9:56 AM, Tom Lane wrote:

Jim Nasby  writes:

On 1/7/16 9:12 AM, Tom Lane wrote:

(I'm also wondering how convert_sourcefiles() works at all in a vpath
build, considering that I don't see it doing anything like this ...)



It's only looking at outputdir, which I suspect is never ambiguous.


Eh, no, look again.  What it's actually doing is reading $inputdir/input/
and converting into $outputdir/sql/, and reading $inputdir/output/ and
converting into $outputdir/expected/.  I guess that works, but again it's
kind of at variance with the normal expectation of VPATH behavior.  What
you'd expect is that $builddir/input files would override $srcdir/input
files; but as is, $builddir/input and $builddir/output are never examined
at all.


Yeah, I just discovered the whole input/ and output/ bit. That really 
complicates things, because ISTM it's very common to directly specify 
both sql/ and expected/ files, and you'd certainly THINK that those 
files are input, and not output.


Which raises the question... how the hell do sql/ and expected/ ever work 
in a vpath build? AFAICT things are never copied from 
$inputdir/(sql|expected) to $outputdir...


Maybe it's just me, but this whole convention seems like a giant POLA 
violation. If pg_regress was only used in Postgres source maybe that 
wouldn't matter since presumably there's always an example to copy from 
(and presumably tests use either input/ and output/ OR sql/ and 
expected/, but never both). But pg_regress is used by contrib and now 
extensions, so it's got a much wider audience than just -hackers. :/


input and output are used in only 3 places in the whole tree[1], so 
maybe the best thing to do is just rip support for that out of 
pg_regress and handle it some other way.


Also worth noting: the only reason I'm using pg_regress is it's the 
easiest way to get a test cluster. If not for that, I'd just use 
pg_prove since I'm already using pgTap.


[1] find . \( -name input -o -name output \) -type d
./contrib/dblink/input
./contrib/dblink/output
./contrib/file_fdw/input
./contrib/file_fdw/output
./src/test/regress/input
./src/test/regress/output
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2016-01-07 Thread Joshua D. Drake

On 01/06/2016 04:18 PM, Greg Stark wrote:

On Wed, Jan 6, 2016 at 11:42 PM, Jim Nasby  wrote:

Right. Personally, I feel the TODO has pretty much outlived its usefulness.
An issue tracker would make maintaining items like this a lot more
reasonable, but it certainly wouldn't be free.


Eh, a bug tracker that tracks actual bugs would be useful, I don't
think anyone would argue with that. A vague "issue" tracker that just
collects ideas people have had that seemed like a good idea at some
time in history would suffer exactly the same problem the TODO has.


Not if it was properly integrated, it wouldn't. The problem with the TODO 
is that it is in the no man's land of the wiki and there is ZERO structure to it.


The wiki is the nosql of the postgresql community. ;)

With an issue tracker you can say:

This is an IN PROGRESS TODO
This is an APPROVED TODO
This is an ISSUE
-> BUG CONFIRMED
-> USER ERROR

etc

And if the right software is used, it can all be done via email AND can 
be tracked a hell of a lot easier than the way we track everything 
(literally) now. It is a hell of a lot easier to say:


See #53466

than every alternative we currently have.

JD







--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multi-tenancy with RLS

2016-01-07 Thread Joe Conway
On 01/06/2016 12:15 PM, Robert Haas wrote:
> On Tue, Jan 5, 2016 at 11:07 PM, Haribabu Kommi
>  wrote:
>> May be you missed to apply the 3_shared_catalog_tenancy_v4 path,
>> because 4_database_catalog_tenancy_v5 patch depends on it.
>>
>> Here I attached all the patches for your convenience, I am able to
>> apply all patches in the order without any problem.
> 
> Is any committer thinking about taking a serious look at this patch series?
> 
> I ask because (1) it seems like it could be nice to have but (2) it
> frightens me terribly.  We are generally very sparing about assuming
> that "stuff" (triggers, partial indexes, etc.) that works for user
> tables can also be made to work for system tables.  I haven't thought
> deeply about what might go wrong in this particular case, but it
> strikes me that if Haribabu Kommi is building something that is doomed
> for some reason, it would be good to figure that out before he spends
> any more time on it than he already has.

As Stephen mentioned, yes, I am very interested in at least some aspects
of this patch. The ability to apply RLS to system tables could be useful
to solve a number of problems we don't have a good story for today,
multi-tenancy only being one of them.

> Apart from the issue of whether this is doomed for some architectural
> reason, it is not entirely clear to me that there's any consensus that
> we want this.  I don't think that I understand the issues here well
> enough to proffer an opinion of my own just yet... but I'd like to
> hear what other people think.

As said above, I definitely see a need for something like this if not
this specifically.

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] pg_conversion seems rather strangely defined

2016-01-07 Thread Tom Lane
Noah Misch  writes:
> On Wed, Jan 06, 2016 at 11:56:14PM -0500, Tom Lane wrote:
>> Moreover, we have precedent both for this approach being a bad idea
>> and for us changing it without many complaints.  We used to have
>> search-path-dependent lookup of default index operator classes.
>> We found out that that was a bad idea and got rid of it, cf commit
>> 3ac1ac58c.  This situation looks much the same to me.

> Per the 3ac1ac58c log message, "CREATE OPERATOR CLASS only allows one default
> opclass per datatype regardless of schemas."  That had been true since day one
> for CREATE OPERATOR CLASS.  It doesn't hold for conversions today, and that's
> a crucial difference between that commit and this proposal.

Well, the state of affairs is slightly different, but that doesn't mean
it's not equally broken.  What I took away from the default-opclass fiasco
is that search-path-based lookup is a good idea only when you are looking
up something *by name*, so that the user can resolve any ambiguity by
schema-qualifying that name.  Searches that aren't based on a
user-specified name shouldn't depend on search_path.  This is important
because there are multiple conflicting concerns when you choose a
search_path setting, and you may not want to allow any particular search
to force your hand on what you put where in your search path.  If, for
example, you don't really want to put "public" on the front of your search
path, the current code gives you no way to select a conversion that's in
"public".

If we need to cater for alternative conversions, my preference would be a
GUC or some other kind of setting that allows selecting a client I/O
conversion by name, whether it is "default" or not.  But given the lack
of apparent demand, I don't really want to design and implement such a
feature right now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Weighted Stats

2016-01-07 Thread Haribabu Kommi
On Mon, Dec 21, 2015 at 1:50 PM, David Fetter  wrote:
> On Sun, Dec 20, 2015 at 06:13:33PM -0600, Jim Nasby wrote:
>> On 11/2/15 5:46 PM, David Fetter wrote:
>> >I'd like to add weighted statistics to PostgreSQL
>>
>> Anything happen with this? If community isn't interested, ISTM it'd be good
>> to put this in PGXN.
>
> I think it's already in PGXN as an extension, and I'll get another
> version out this early this week, as it involves mostly adding some
> tests.
>
> I'll do the float8 ones for core this week, too, and unless there's a
> really great reason to do more data types on the first pass, it should
> be in committable shape.

I reviewed the patch, following are my observations.

1. +   precision, numeric, or interval

With the interval type it gives an error. As the interval data type is not
supported, remove it from the list of supported inputs.

postgres=# select weighted_avg(f7,f1) from tbl;
ERROR:  function weighted_avg(interval, smallint) does not exist at character 8
HINT:  No function matches the given name and argument types. You
might need to add explicit type casts.


2. +float8_weighted_avg(PG_FUNCTION_ARGS)

It would be helpful if you provided some information in a function header
on how the weighted average is calculated, similar to the other weighted
functions.


3. + transvalues = check_float8_array(transarray,
"float8_weighted_stddev_accum", 4);

The second parameter to check_float8_array should be "float8_weighted_accum".


4. There is an OID conflict of 4066 with the latest master code.


5.+ A += newvalW * ( newvalX - transvalues[2] ) / W;
+ CHECKFLOATVAL(A, isinf(newvalW) || isinf(newvalX - transvalues[2])
|| isinf(1.0/W), true);

+ Q += newvalW * (newvalX - transvalues[2]) * (newvalX - A);
+ CHECKFLOATVAL(A, isinf(newvalX -  transvalues[3]) || isinf(newvalX -
A) || isinf(1.0/W), true);


Does the whole calculation also need to be passed to CHECKFLOATVAL?
Isn't just passing the variables involved in the calculation enough?
If expressions are required, then it should be something like the following:

CHECKFLOATVAL(A, isinf(transvalues[2]) || isinf(newvalW) ||
isinf(newvalX - transvalues[2]) || isinf(1.0/W), true);

CHECKFLOATVAL(Q, isinf(transvalues[3]) || isinf(newvalX -
transvalues[2]) || isinf(newvalX - A) || isinf(1.0/W), true);


I verified the stddev transition and final function calculations
according to Wikipedia, and they are fine.
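
For reference, the update rule being implemented here is the standard
weighted incremental mean/variance algorithm (West 1979); a standalone
sketch, with variable names mine rather than the patch's:

/*
 * W is the running sum of weights, A the running weighted mean, and Q
 * the weighted sum of squared deviations; the population variance is
 * Q / W once all inputs have been accumulated.
 */
static void
weighted_accum(double *W, double *A, double *Q, double x, double w)
{
	double		A_old = *A;

	*W += w;
	*A += w * (x - A_old) / *W;
	*Q += w * (x - A_old) * (x - *A);
}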

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-01-07 Thread Catalin Iacob
On Thu, Dec 31, 2015 at 11:37 AM, Pavel Stehule
 wrote:
> here is new version.

And here's a new review, sorry for the delay.

> Now I use a common ancestor "plpy.BaseError" for plpy builtin classes. So
> plpy.SPIError isn't descendant of plpy.Error and then there are not possible
> incompatibility issues.

That's good.

> Instead modification builtin function plpy.debug, plpy.info, ... and
> introduction incompatibility I wrote new set of functions with keyword
> parameters (used mainly  for elevel < ERROR):
>
> plpy.raise_debug, plpy.raise_info, plpy.raise_notice, plpy.raise_warning,
> plpy.raise_error and plpy.raise_fatal.

I'm on the fence whether these are good ideas. On one hand having them
is nice and they avoid changing the existing plpy.debug etc. in
backward incompatible ways, on the other hand they're similar to those
so it can be confusing to know which ones to pick. Any opinions from
others on whether it's better to add these or not?

The docs need more work, especially if raise_* are kept, as the docs
should then clearly explain the differences between the 2 sets and
nudge users toward the new raise_* functions. I can help with that
though if there are objections to these functions I wouldn't mind
hearing it before I document them.

As for the code:

1. there are quite some out of date comments referring to SPIError in
places that now handle more types (like /* prepare dictionary with
__init__ method for SPIError class */ in plpy_plpymodule.c)

2. you moved PLy_spi_exception_set from plpy_spi.c to plpy_elog.c and
renamed it to PLy_base_exception_set but it's still spi specific: it
still sets an attribute called spidata. You needed this because you
call it in PLy_output_kw but can't you instead make PLy_output_kw
similar to PLy_output and just call PLy_exception_set in the PG_CATCH
block like PLy_output does? With the patch plpy.error(msg) will raise
an error object without an spidata attribute but plpy.raise_error(msg)
will raise an error with an spidata attribute.

3. PLy_error__init__ is used for BaseError but always sets an
attribute called spidata, I would expect that to only happen for
SPIError not for BaseError. You'll need to pick some other way of
attaching the error details to BaseError instances. Similarly,
PLy_get_spi_error_data became PLy_get_error_data and it's invoked on
other classes than SPIError but it still always looks at the spidata
attribute.

4. it wouldn't hurt to expand the tests a bit to also raise plpy.Fatal
with keyword arguments and maybe catch BaseError and inspect it a bit
to see that it contains reasonable values (per 3., have spidata when raising
an SPIError but not when raising an Error or BaseError or Fatal etc.)

As seen from 1, 2, and 3 the patch is still too much about SPIError.
As I see it, it should only touch SPIError to make it inherit from the
new BaseError but for the rest, the patch shouldn't change it and
shouldn't propagate spidata attributes and SPIError comments. As it
stands, the patch has historical artifacts that show it was initially
a patch for SPIError but it's now a patch for error reporting so those
should go away.

I'll set the patch back to waiting for author.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-01-07 Thread Pavel Stehule
2016-01-08 7:31 GMT+01:00 Catalin Iacob :

> On Thu, Dec 31, 2015 at 11:37 AM, Pavel Stehule
>  wrote:
> > here is new version.
>
> And here's a new review, sorry for the delay.
>
> > Now I use a common ancestor "plpy.BaseError" for plpy builtin classes. So
> > plpy.SPIError isn't descendant of plpy.Error and then there are not
> possible
> > incompatibility issues.
>
> That's good.
>
> > Instead modification builtin function plpy.debug, plpy.info, ... and
> > introduction incompatibility I wrote new set of functions with keyword
> > parameters (used mainly  for elevel < ERROR):
> >
> > plpy.raise_debug, plpy.raise_info, plpy.raise_notice, plpy.raise_warning,
> > plpy.raise_error and plpy.raise_fatal.
>
> I'm on the fence whether these are good ideas. On one hand having them
> is nice and they avoid changing the existing plpy.debug etc. in
> backward incompatible ways, on the other hand they're similar to those
> so it can be confusing to know which ones to pick. Any opinions from
> others on whether it's better to add these or not?
>
> The docs need more work, especially if raise_* are kept, as the docs
> should then clearly explain the differences between the 2 sets and
> nudge users toward the new raise_* functions. I can help with that
> though if there are objections to these functions I wouldn't mind
> hearing it before I document them.
>

ok, we will wait.


>
> As for the code:
>
> 1. there are quite some out of date comments referring to SPIError in
> places that now handle more types (like /* prepare dictionary with
> __init__ method for SPIError class */ in plpy_plpymodule.c)
>
> 2. you moved PLy_spi_exception_set from plpy_spi.c to plpy_elog.c and
> renamed it to PLy_base_exception_set but it's still spi specific: it
> still sets an attribute called spidata. You needed this because you
> call it in PLy_output_kw but can't you instead make PLy_output_kw
> similar to PLy_output and just call PLy_exception_set in the PG_CATCH
> block like PLy_output does? With the patch plpy.error(msg) will raise
> an error object without an spidata attribute but plpy.raise_error(msg)
> will raise an error with an spidata attribute.
>
> 3. PLy_error__init__ is used for BaseError but always sets an
> attribute called spidata, I would expect that to only happen for
> SPIError not for BaseError. You'll need to pick some other way of
> attaching the error details to BaseError instances. Similarly,
> PLy_get_spi_error_data became PLy_get_error_data and it's invoked on
> other classes than SPIError but it still always looks at the spidata
> attribute.
>

I am afraid of compatibility issues - so I am not changing the internal
format lightly. Some legacy applications may depend on detecting the
spidata attribute, so I cannot change this structure for SPIError.

I can change it for BaseError, but that is the question. Internally
BaseError and SPIError share the same data, so it could be strange if
BaseError and SPIError used different internal formats.

I am interested in your opinion.


>
> 4. it wouldn't hurt to expand the tests a bit to also raise plpy.Fatal
> with keyword arguments and maybe catch BaseError and inspect it a bit
> to see that it contains reasonable values (per 3., have spidata when raising
> an SPIError but not when raising an Error or BaseError or Fatal etc.)
>

A Fatal cannot really be raised - it ends the session. Personally, I think
support of the Fatal level is wrong, and it should not be propagated to the
user level, but it is now. So for consistency I wrote support for the fatal
level. But it is hard to test.



>
> As seen from 1, 2, and 3 the patch is still too much about SPIError.
> As I see it, it should only touch SPIError to make it inherit from the
> new BaseError but for the rest, the patch shouldn't change it and
> shouldn't propagate spidata attributes and SPIError comments. As it
> stands, the patch has historical artifacts that show it was initially
> a patch for SPIError but it's now a patch for error reporting so those
> should go away.
>
> I'll set the patch back to waiting for author.
>


[HACKERS] Duplicate 'use' for TestLib in 001_ssltests.pl

2016-01-07 Thread Kyotaro HORIGUCHI
Hello, I noticed that 001_ssltests.pl uselessly (perhaps) imports
TestLib twice. Is this intentional? The following change made no
differences.


diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl
index 9ce0cf3..cd38e45 100644
--- a/src/test/ssl/t/001_ssltests.pl
+++ b/src/test/ssl/t/001_ssltests.pl
@@ -2,7 +2,6 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use TestLib;
 use Test::More tests => 38;
 use ServerSetup;
 use File::Copy;


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Duplicate 'use' for TestLib in 001_ssltests.pl

2016-01-07 Thread Magnus Hagander
On Fri, Jan 8, 2016 at 8:37 AM, Michael Paquier 
wrote:

> On Fri, Jan 8, 2016 at 4:16 PM, Kyotaro HORIGUCHI
>  wrote:
> > Hello, I noticed that 001_ssltests.pl uselessly (perhaps) imports
> > TestLib twice. Is this intentional? The following change made no
> > differences.
> >
> >
> > diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/
> 001_ssltests.pl
> > index 9ce0cf3..cd38e45 100644
> > --- a/src/test/ssl/t/001_ssltests.pl
> > +++ b/src/test/ssl/t/001_ssltests.pl
> > @@ -2,7 +2,6 @@ use strict;
> >  use warnings;
> >  use PostgresNode;
> >  use TestLib;
> > -use TestLib;
> >  use Test::More tests => 38;
> >  use ServerSetup;
> >  use File::Copy;
>
> Nice catch. Yes that's an oversight and should be removed.
>

Applied, thanks.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-01-07 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 4 Jan 2016 15:29:34 +0900, Michael Paquier  
wrote in 

[HACKERS] Comment typo in port/atomics/arch-x86.h

2016-01-07 Thread Tatsuro Yamada
Hi,

I found a typo in arch-x86.h.
The attached patch fixes a typo: s/468/486/g

Best regards,
Tatsuro Yamada
*** a/src/include/port/atomics/arch-x86.h
--- b/src/include/port/atomics/arch-x86.h
***
*** 67,73  typedef struct pg_atomic_uint32
  
  /*
   * It's too complicated to write inline asm for 64bit types on 32bit and the
!  * 468 can't do it anyway.
   */
  #ifdef __x86_64__
  #define PG_HAVE_ATOMIC_U64_SUPPORT
--- 67,73 
  
  /*
   * It's too complicated to write inline asm for 64bit types on 32bit and the
!  * 486 can't do it anyway.
   */
  #ifdef __x86_64__
  #define PG_HAVE_ATOMIC_U64_SUPPORT

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-01-07 Thread Michael Paquier
On Fri, Jan 8, 2016 at 1:53 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Mon, 4 Jan 2016 15:29:34 +0900, Michael Paquier 
>  wrote in 
> 

Re: [HACKERS] Bug in MergeAttributesIntoExisting() function.

2016-01-07 Thread amul sul
>On Monday, 4 January 2016 8:24 PM, Tom Lane  wrote:

>Either way, however, the way you declared c2, it has an independent
>local definition of all four columns, and so they should not go away
>even if the parent's columns go away.  This is exactly the purpose
>that attislocal was created to serve, and your patch destroys it.

Understood this is not a bug, thank you.


I missed the ALTER TABLE statement in the previous test case; the correct 
test case is as follows:
-- create table
CREATE TABLE p1 (a int , b int);
CREATE TABLE c2 (a int , b int);

-- alter c2's inheritance 
ALTER TABLE c2 INHERIT p1;

-- drop column b
ALTER TABLE p1 DROP COLUMN b;

-- table description 


postgres=# \d p1
      Table "public.p1"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
Number of child tables: 1 (Use \d+ to list them.)

postgres=# \d c2
      Table "public.c2"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 
Inherits: p1
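
For reference, the surviving local definition is visible directly in
pg_attribute (a query sketch using the tables above; column b keeps
attislocal = true, which is why it survives the parent's DROP):

SELECT attname, attislocal, attinhcount
FROM pg_attribute
WHERE attrelid = 'c2'::regclass AND attnum > 0 AND NOT attisdropped;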



Regards,
Amul Sul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Comment typo in port/atomics/arch-x86.h

2016-01-07 Thread Magnus Hagander
On Fri, Jan 8, 2016 at 7:37 AM, Tatsuro Yamada  wrote:

> Hi,
>
> I found a typo in arch-x86.h.
> The attached patch fixes a typo: s/468/486/g
>
>
Applied, thanks.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses

2016-01-07 Thread Peter Geoghegan
On Thu, Jan 7, 2016 at 7:41 AM, Alvaro Herrera  wrote:
> You're stealthily introducing a new abstraction called "string",
> including a typedef and DatumGetString support macros.  Is that really
> necessary?  Shouldn't it be discussed specifically?  I don't necessarily
> oppose it as is, mainly because it's limited to within varlena.c for
> now, but I'm not sure it'd okay to make it more public.

Note that a similar abstraction for the "unknown" type also exists
only within varlena.c.  So, DatumGetStringP() and so on appear right
alongside DatumGetUnknownP() and so on.

The idea of the "string" abstraction is that it advertises that
certain functions could equally well apply to a variety of "varlena
header + some bytes" types. I thought about just using the varlena
type instead, but preferred the "string" abstraction.

> Also, there's a lot of churn in this patch just to rename tss to sss.
> Can't we just keep the original variable name?

I think that minimizing lines changed in a mechanical way by a commit
is overrated as a goal for Postgres patches, but I don't feel too
strongly about holding on to the "churn" in this patch. I attach a new
revision, which has the changes I outlined to code comments. I haven't
minimized the differences in the way you suggest just yet.

-- 
Peter Geoghegan
From b18d45437e99657fed990edf718c21ad6d212970 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Sun, 8 Nov 2015 15:34:32 -0800
Subject: [PATCH] Generalize SortSupport for text

Expose interface that allows char(n) and bytea types to piggyback on a
now-generalized SortSupport for text.  This pushes a little more
knowledge of the bpchar/char(n) type into varlena.c than might be
preferred, but that seems like the approach that creates least friction.

Also, accelerate index builds that use the text_pattern_ops opfamily
(text_pattern_ops and varchar_pattern_ops opclasses), and the
bpchar_pattern_ops opfamily (which has one opclass, also named
bpchar_pattern_ops) by introducing SortSupport.  These work just like
the bytea caller of the generalized SortSupport for text interface --
the "C" locale is forced.
---
 doc/src/sgml/datatype.sgml  |   3 +-
 src/backend/utils/adt/varchar.c |  55 +-
 src/backend/utils/adt/varlena.c | 391 +++-
 src/include/catalog/pg_amproc.h |   4 +
 src/include/catalog/pg_proc.h   |   8 +
 src/include/utils/builtins.h|   6 +
 src/include/utils/bytea.h   |   1 +
 7 files changed, 334 insertions(+), 134 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index d1db0d2..6513b21 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1140,8 +1140,7 @@ SELECT '52093.89'::money::numeric::float8;
  advantages in some other database systems, there is no such advantage in
  PostgreSQL; in fact
  character(n) is usually the slowest of
- the three because of its additional storage costs and slower
- sorting.  In most situations
+ the three because of its additional storage costs.  In most situations
  text or character varying should be used
  instead.
 
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 0498fef..94d6da5 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -17,6 +17,7 @@
 
 #include "access/hash.h"
 #include "access/tuptoaster.h"
+#include "catalog/pg_collation.h"
 #include "libpq/pqformat.h"
 #include "nodes/nodeFuncs.h"
 #include "utils/array.h"
@@ -649,14 +650,21 @@ varchartypmodout(PG_FUNCTION_ARGS)
  */
 
 /* "True" length (not counting trailing blanks) of a BpChar */
-static int
+static inline int
 bcTruelen(BpChar *arg)
 {
-	char	   *s = VARDATA_ANY(arg);
+	return bpchartruelen(VARDATA_ANY(arg), VARSIZE_ANY_EXHDR(arg));
+}
+
+int
+bpchartruelen(char *s, int len)
+{
 	int			i;
-	int			len;
 
-	len = VARSIZE_ANY_EXHDR(arg);
+	/*
+	 * Note that we rely on the assumption that ' ' is a singleton unit on
+	 * every supported multibyte server encoding.
+	 */
 	for (i = len - 1; i >= 0; i--)
 	{
 		if (s[i] != ' ')
@@ -859,6 +867,23 @@ bpcharcmp(PG_FUNCTION_ARGS)
 }
 
 Datum
+bpchar_sortsupport(PG_FUNCTION_ARGS)
+{
+	SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
+	Oid			collid = ssup->ssup_collation;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(ssup->ssup_cxt);
+
+	/* Use generic string SortSupport */
+	varstr_sortsupport(ssup, collid, true);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	PG_RETURN_VOID();
+}
+
+Datum
 bpchar_larger(PG_FUNCTION_ARGS)
 {
 	BpChar	   *arg1 = PG_GETARG_BPCHAR_PP(0);
@@ -926,8 +951,9 @@ hashbpchar(PG_FUNCTION_ARGS)
 /*
  * The following operators support character-by-character comparison
  * of bpchar datums, to allow building indexes suitable for LIKE clauses.
- * Note that the regular bpchareq/bpcharne 

Odd behavior in foreign table modification (Was: Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW)

2016-01-07 Thread Etsuro Fujita

On 2016/01/07 21:50, Etsuro Fujita wrote:

On 2016/01/06 20:37, Thom Brown wrote:

On 25 December 2015 at 10:00, Etsuro Fujita
 wrote:

Attached is an updated version of the patch, which is
still WIP, but I'd be happy if I could get any feedback.



I've run into an issue:

*# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
tableoid::regclass;
ERROR:
CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
WHERE ((id = 16)) RETURNING NULL



Will fix.


While working on this, I noticed that the existing postgres_fdw system 
shows similar behavior, so I changed the subject.


IIUC, the reason for that is that when the local query specifies "RETURNING 
tableoid::regclass", the FDW has fmstate->has_returning=false while the 
remote query executed at ModifyTable has "RETURNING NULL", as shown in 
the above example; that causes an abnormal exit when executing the 
remote query in postgresExecForeignUpdate, since the FDW gets 
PGRES_TUPLES_OK as the result of the query while it thinks, based on the 
flag fmstate->has_returning=false, that the right result should be 
PGRES_COMMAND_OK.


Attached is a patch to fix that.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 1003,1008  deparseReturningList(StringInfo buf, PlannerInfo *root,
--- 1003,1009 
  	 List **retrieved_attrs)
  {
  	Bitmapset  *attrs_used = NULL;
+ 	bool		has_returning = false;
  
  	if (trig_after_row)
  	{
***
*** 1021,1031  deparseReturningList(StringInfo buf, PlannerInfo *root,
--- 1022,1072 
   	   &attrs_used);
  	}
  
+ 	/*
+ 	 * Check to see whether the remote query has a RETURNING clause.
+ 	 *
+ 	 * XXX be careful to keep this in sync with deparseTargetList.
+ 	 */
  	if (attrs_used != NULL)
  	{
+ 		if (bms_is_member(SelfItemPointerAttributeNumber - FirstLowInvalidHeapAttributeNumber,
+ 		  attrs_used))
+ 			has_returning = true;
+ 		else
+ 		{
+ 			TupleDesc	tupdesc = RelationGetDescr(rel);
+ 			bool		have_wholerow;
+ 			int			i;
+ 
+ 			/* If there's a whole-row reference, we'll need all the columns. */
+ 			have_wholerow = bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
+ 		  attrs_used);
+ 
+ 			for (i = 1; i <= tupdesc->natts; i++)
+ 			{
+ Form_pg_attribute attr = tupdesc->attrs[i - 1];
+ 
+ /* Ignore dropped attributes. */
+ if (attr->attisdropped)
+ 	continue;
+ 
+ if (have_wholerow ||
+ 	bms_is_member(i - FirstLowInvalidHeapAttributeNumber,
+   attrs_used))
+ {
+ 	has_returning = true;
+ 	break;
+ }
+ 			}
+ 		}
+ 	}
+ 
+ 	if (has_returning != false)
+ 	{
  		appendStringInfoString(buf, " RETURNING ");
  		deparseTargetList(buf, root, rtindex, rel, attrs_used,
  		  retrieved_attrs);
+ 		Assert(*retrieved_attrs != NIL);
  	}
  	else
  		*retrieved_attrs = NIL;
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 2408,2413  SELECT c1,c2,c3,c4 FROM ft2 ORDER BY c1;
--- 2408,2466 
   1104 | 204 | ddd| 
  (819 rows)
  
+ EXPLAIN (verbose, costs off)
+ INSERT INTO ft2 (c1,c2,c3) VALUES (,999,'foo') RETURNING tableoid::regclass;
+QUERY PLAN
+ -
+  Insert on public.ft2
+Output: (tableoid)::regclass
+Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
+->  Result
+  Output: , 999, NULL::integer, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time zone, NULL::character varying, 'ft2   '::character(10), NULL::user_enum
+ (5 rows)
+ 
+ INSERT INTO ft2 (c1,c2,c3) VALUES (,999,'foo') RETURNING tableoid::regclass;
+  tableoid 
+ --
+  ft2
+ (1 row)
+ 
+ EXPLAIN (verbose, costs off)
+ UPDATE ft2 SET c3 = 'bar' WHERE c1 =  RETURNING tableoid::regclass;
+ QUERY PLAN 
+ ---
+  Update on public.ft2
+Output: (tableoid)::regclass
+Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1
+->  Foreign Scan on public.ft2
+  Output: c1, c2, NULL::integer, 'bar'::text, c4, c5, c6, c7, c8, ctid
+  Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" = )) FOR UPDATE
+ (6 rows)
+ 
+ UPDATE ft2 SET c3 = 'bar' WHERE c1 =  RETURNING 

Re: [HACKERS] Duplicate 'use' for TestLib in 001_ssltests.pl

2016-01-07 Thread Michael Paquier
On Fri, Jan 8, 2016 at 4:16 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, I noticed that 001_ssltests.pl uselessly (perhaps) imports
> TestLib twice. Is this intentional? The following change made no
> differences.
>
>
> diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl
> index 9ce0cf3..cd38e45 100644
> --- a/src/test/ssl/t/001_ssltests.pl
> +++ b/src/test/ssl/t/001_ssltests.pl
> @@ -2,7 +2,6 @@ use strict;
>  use warnings;
>  use PostgresNode;
>  use TestLib;
> -use TestLib;
>  use Test::More tests => 38;
>  use ServerSetup;
>  use File::Copy;

Nice catch. Yes that's an oversight and should be removed.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer continuous flushing

2016-01-07 Thread Fabien COELHO


Hello Andres,


One of the points of aggregating flushes is that the range flush call cost
is significant, as shown by preliminary tests I did, probably up in the
thread, so it makes sense to limit this cost, hence the aggregation. This
removed some performance regressions I had in some cases.


FWIW, my tests show that flushing for clean ranges is pretty cheap.


Yes, I agree that it is quite cheap, but I had a few % tps regressions 
in some cases without aggregating, and aggregating was enough to avoid 
these small regressions.



Also, the granularity of the buffer flush call is a file + offset + size, so
necessarily it should be done this way (i.e. per file).


What syscalls we issue, and at what level we track outstanding flushes,
doesn't have to be the same.


Sure. But the current version is simple, efficient and proven by many 
runs, so there should be a very strong argument showing a significant 
benefit before changing the approach, and I see no such thing in your arguments.


For me the current approach is optimal for the checkpointer, because it 
takes advantage of all available information to perform a better job.



Once buffers are sorted per file and offset within file, then written
buffers are as close as possible one after the other, the merging is very
easy to compute (it is done on the fly, no need to keep the list of buffers
for instance), it is optimally effective, and when the checkpointed file
changes then we will never go back to it before the next checkpoint, so
there is no reason not to flush right then.
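
(For concreteness, the on-the-fly merging described above can be sketched
as below -- illustrative code assuming Linux's sync_file_range(), not the
actual patch:)

#define _GNU_SOURCE
#include <fcntl.h>

typedef struct FlushRange
{
	int			fd;			/* file of the pending flush range */
	off_t		offset;		/* start of the pending flush range */
	off_t		nbytes;		/* length of the pending flush range */
} FlushRange;

static void
flush_after_write(FlushRange *r, int fd, off_t offset, off_t nbytes)
{
	if (r->fd == fd && offset == r->offset + r->nbytes)
	{
		r->nbytes += nbytes;	/* contiguous write: extend the range */
		return;
	}

	/* file or position changed: issue one flush for the merged range */
	if (r->nbytes > 0)
		(void) sync_file_range(r->fd, r->offset, r->nbytes,
							   SYNC_FILE_RANGE_WRITE);

	r->fd = fd;
	r->offset = offset;
	r->nbytes = nbytes;
}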


Well, that's true if there's only one tablespace, but e.g. not the case
with two tablespaces of about the same number of dirty buffers.


ISTM that in the version of the patch I sent there was one flushing 
structure per tablespace, each doing its own flushing on its files, so it 
should work the same, only the writing intensity is divided by the number 
of tablespaces? Or am I missing something?



So basically I do not see a clear positive advantage to your suggestion,
especially when taking into consideration the scheduling process of the
scheduler:


I don't think it makes a big difference for the checkpointer alone, but
it makes the interface much more suitable for other processes, e.g. the
bgwriter, and normal backends.


Hmmm.

ISTM that the requirements are not exactly the same for the bgwriter and 
backends vs the checkpointer. The checkpointer has the advantage of being 
able to plan its IOs over the long term (volume & time are known...) and the 
implementation takes full benefit of this planning by sorting, 
scheduling and flushing buffers so as to generate as many sequential 
writes as possible.


The bgwriter and backends have a much shorter vision (a few seconds, or 
just one query being processed), so the solution will be less efficient and 
probably messier on the coding side. This is life. I do not see why we should 
not take the benefit of full planning in the checkpointer just because 
other processes cannot do the same, especially as under plenty of loads 
the checkpointer does most of the writing and so is the limiting factor.


So I do not buy your suggestion for the checkpointer. Maybe it will be the 
way to go for bgwriter and backends, then fine for them.



Imo that means that we'd better track writes on a relfilenode + block
number level.


I do not think that it is a better option. Moreover, the current approach
has been proven to be very effective on hundreds of runs, so redoing it
differently for the sake of it does not look like good resource allocation.


For a subset of workloads, yes.


Hmmm. What I understood is that the workloads that have some performance 
regressions (regressions that I have *not* seen in the many tests I ran) 
are not due to checkpointer IOs, but rather occur in settings where most of 
the writes are done by backends or the bgwriter.


I do not see the point of rewriting the checkpointer for them, although 
obviously I agree that something has to be done also for the other 
processes.


Maybe if all the writes (bgwriter and checkpointer) were performed by the 
same process then some dynamic mixing and sorting and aggregating would 
make sense, but this is currently not the case, and would probably have 
quite limited effect.


Basically I do not understand how changing the flushing organisation as 
you suggest would improve the checkpointer performance significantly; to 
me it could only degrade the performance compared to the current version, 
as far as the checkpointer is concerned.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Windows: Make pg_ctl reliably detect service status

2016-01-07 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> > Windows: Make pg_ctl reliably detect service status
> > 
> > pg_ctl is using isatty() to verify whether the process is running in a
> > terminal, and if not it sends its output to Windows' Event Log ... which
> > does the wrong thing when the output has been redirected to a pipe, as
> > reported in bug #13592.
> 
> This broke the mingw port.  Looking.

A bit of grepping appears to say that I ought to patch configure.in to
add
  AC_LIBOBJ(win32security)
around line 1580 and rerun autoconf, but this seems completely at odds
with the documented use of AC_LIBOBJS.  Is this black magic?
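
For reference, the change amounts to something like this in the existing
win32 block of configure.in (placement illustrative), followed by
rerunning autoconf:

if test "$PORTNAME" = "win32"; then
  AC_LIBOBJ(win32security)
fi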

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer continuous flushing

2016-01-07 Thread Andres Freund
On 2016-01-07 21:08:10 +0100, Fabien COELHO wrote:
> Hmmm. What I understood is that the workloads that have some performance
> regressions (regressions that I have *not* seen in the many tests I ran)
> are not due to checkpointer IOs, but rather occur in settings where most of
> the writes are done by backends or the bgwriter.

As far as I can see you've not run many tests where the hot/warm data
set is larger than memory (the full machine's memory, not
shared_buffers). That quite drastically alters the performance
characteristics here, because you suddenly have lots of synchronous read
IO thrown into the mix.


Whether it's bgwriter or not I've not fully been able to establish, but
it's a working theory.


> I do not see the point of rewriting the checkpointer for them, although
> obviously I agree that something has to be done also for the other
> processes.

Rewriting the checkpointer and fixing the flush interface in a more
generic way aren't the same thing at all.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2016-01-07 Thread Jim Nasby

On 1/7/16 1:49 PM, Josh Berkus wrote:

Yeah, we could also get rid of this conversation:

"Here's a patch for X, which is on the TODO list"

"Oh, we've obsolesced that, that was added to the TODO before we had Y"

... by auto-closing TODO items at a certain age.


Even if not auto-closed, at least it'd be easy to find old items.

Bonus points if we attract some volunteer project managers who will 
keep tabs on all those kinds of things...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Very confusing installcheck behavior with PGXS

2016-01-07 Thread Tom Lane
Jim Nasby  writes:
> If we want to keep input/ and output/ inside pg_regress then I think 
> what needs to happen in a vpath build is to first create $vpath/sql and 
> $vpath/expected, copy anything from $(uh... source?)/sql and /expected 
> there, and then process /input and /output (and deal with any duplicate 
> file references).

No, I don't think we want to physically copy anything we don't have to.
(Before the commit I mentioned upthread, we were actually doing it more
or less as you suggest, and it was messy.)

I'm not really concerned about the current behavior of putting transformed
input/output files into sql/ and expected/.  Only experts are likely to
be creating files requiring transformation at all (and even the experts
prefer to avoid that, because they're a PITA).  So I am not very worried
about duplication of file names between eg input/ and sql/.  But I don't
like handling VPATH by introducing confusion between --inputdir and
--outputdir.
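
(To make the transformation concrete: a hypothetical input/copydata.source
containing a line such as

  COPY tab FROM '@abs_srcdir@/data/tab.data';

gets rewritten by pg_regress into sql/copydata.sql with @abs_srcdir@
replaced by the actual source directory.  The file and table names here are
made up for illustration; @abs_srcdir@ is one of pg_regress's substitution
variables.)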

It does strike me though that some of your pain is self-inflicted: why
did you think it was a good idea to create both test/sql/ and sql/
subdirectories?  What's the point of test/ at all?  It's certainly not
part of the standard layout for a contrib module.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2016-01-07 Thread Jeff Janes
On Wed, Jan 6, 2016 at 4:18 PM, Greg Stark  wrote:
> On Wed, Jan 6, 2016 at 11:42 PM, Jim Nasby  wrote:
>> Right. Personally, I feel the TODO has pretty much outlived its usefulness.
>> An issue tracker would make maintaining items like this a lot more
>> reasonable, but it certainly wouldn't be free.
>
> Eh, a bug tracker that tracks actual bugs would be useful, I don't
> think anyone would argue with that. A vague "issue" tracker that just
> collects ideas people have had that seemed like a good idea at some
> time in history would suffer exactly the same problem the TODO has.

I don't completely agree with that.  I have often wanted to know when
a specific item was added to the TODO page, and/or its individual edit
history.  With only a unified history of the entire TODO page, and
with no wiki equivalent of "git blame", figuring this out is extremely
tedious.  A tracker would precisely solve this problem, if nothing
else.  And when I edit the wiki and forget to make a coherent edit
summary, there is no way to fix that, while presumably an issue
tracker would be more tolerant of people's imperfections.

It could also be ameliorated without a tracker by people being more
disciplined about linking to the email archives, but evidently we are
not disciplined enough to do that reliably.  I think we are
better about that recently than we were in the past, but without the
ability to readily see when an item was added, it is hard to go back
and find the emails to fix the past mistakes.

But, if we want a list of projects for beginners, I think it has to be
explicitly that.  A list of things an experienced expert could do
trivially, but are consciously refraining from doing so that a
beginner can do them instead.  It would need to be separate from a "we
can't decide if we want this or can't decide how to do it or don't
have the time to do it" list.  There is no reason we have to have an
issue tracker in order to create that separation, but it could help.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Very confusing installcheck behavior with PGXS

2016-01-07 Thread Alvaro Herrera
Jim Nasby wrote:

> Also worth noting: the only reason I'm using pg_regress is it's the easiest
> way to get a test cluster. If not for that, I'd just use pg_prove since I'm
> already using pgTap.

In 9.5 you might want to "use PostgresNode" which allows you to initdb
and such.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Very confusing installcheck behavior with PGXS

2016-01-07 Thread Jim Nasby

On 1/7/16 1:04 PM, Alvaro Herrera wrote:

Jim Nasby wrote:


Also worth noting: the only reason I'm using pg_regress is it's the easiest
way to get a test cluster. If not for that, I'd just use pg_prove since I'm
already using pgTap.


In 9.5 you might want to "use PostgresNode" which allows you to initdb
and such.


Oooh, thanks! I might well just copy that into my pgxntool utility.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2016-01-07 Thread Josh Berkus
On 01/07/2016 10:30 AM, Jeff Janes wrote:
> I don't completely agree with that.  I have often wanted to know when
> a specific item was added to the TODO page, and/or its individual edit
> history.  With only a unified history of the entire TODO page, and
> with no wiki equivalent of "git blame", figuring this out is extremely
> tedious.  A tracker would precisely solve this problem, if nothing
> else.  And when I edit the wiki and forget to make a coherent edit
> summary, there is no way to fix that, while presumably an issue
> tracker would be more tolerant of people's imperfections.

Yeah, we could also get rid of this conversation:

"Here's a patch for X, which is on the TODO list"

"Oh, we've obsolesced that, that was added to the TODO before we had Y"

... by auto-closing TODO items at a certain age.

-- 
Josh Berkus
Red Hat OSAS
(opinions are my own)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-01-07 Thread Andres Freund
On 2016-01-07 09:28:29 -0800, Jarred Ward wrote:
> I didn't receive a response on the bugs mailing list for the following bug,
> so I was hoping we could triage to someone with more familiarity with
> Postgres internals than I to fix.

Please don't post to unrelated threads, that just confuses things.

Andres

PS: Yes, I do plan to look at that issue at some point.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Very confusing installcheck behavior with PGXS

2016-01-07 Thread Jim Nasby

On 1/7/16 12:12 PM, Tom Lane wrote:

I'm not really concerned about the current behavior of putting transformed
input/output files into sql/ and expected/.  Only experts are likely to
be creating files requiring transformation at all (and even the experts
prefer to avoid that, because they're a PITA).  So I am not very worried
about duplication of file names between eg input/ and sql/.  But I don't
like handling VPATH by introducing confusion between --inputdir and
--outputdir.


Maybe I'm just confused... what should happen then in a vpath build? Do 
the results of input/*.source get put in sql/ in the source tree?


Oh, I guess there's magic that looks first in $source/sql and then in 
$vpath/sql?


One thing I'm in violent agreement on is there's way too much magic 
happening here. That magic isn't even hinted at by --help, and since 
there's no documentation on pg_regress, extension authors can only follow 
other examples here...



It does strike me though that some of your pain is self-inflicted: why
did you think it was a good idea to create both test/sql/ and sql/
subdirectories?  What's the point of test/ at all?  It's certainly not
part of the standard layout for a contrib module.


I blame David Wheeler, who I copied the pattern from. :) Actually, I got 
the pattern from pgxntool, which AFAIK got the pattern from him.


Regardless of blame, the problem I see is that extensions frequently 
deal with nothing but sql, and mixing up your test code with your 
extension code is going to be confusing and messy. So instead of piling 
everything into /sql, David decided to put the test stuff in /test and 
leave the sql stuff in /sql (if there are C files, those tend to go in /src).


I see a lot of extensions using this pattern (or something similar). The 
other common pattern I see is to just pile everything into the top-level 
extension directory. That's OK at first (and for a really, really simple 
extension it might be all you ever want), but once you have a few tests, a 
doc file, and several upgrade scripts, things start getting messy.
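
For concreteness, the layout pattern being described looks roughly like 
this (a sketch of the convention, not any particular extension):

  myext/
    myext.control
    Makefile
    sql/          -- the extension's install/upgrade scripts
    src/          -- C sources, if any
    doc/
    test/
      sql/        -- pg_regress/pgTAP test scripts
      expected/   -- expected test output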

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Windows: Make pg_ctl reliably detect service status

2016-01-07 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> > Alvaro Herrera wrote:
> > > Windows: Make pg_ctl reliably detect service status
> > > 
> > > pg_ctl is using isatty() to verify whether the process is running in a
> > > terminal, and if not it sends its output to Windows' Event Log ... which
> > > does the wrong thing when the output has been redirected to a pipe, as
> > > reported in bug #13592.
> > 
> > This broke the mingw port.  Looking.
> 
> A bit of grepping appears to say that I ought to patch configure.in to
> add
>   AC_LIBOBJ(win32security)
> around line 1580 and rerun autoconf, but this seems completely at odds
> with the documented use of AC_LIBOBJS.  Is this black magic?

I confirmed that adding that line makes the new file get compiled.  I
also noticed these warnings when compiling it:

In file included from 
/usr/lib/gcc/x86_64-w64-mingw32/4.9-win32/include/stdarg.h:1:0,
 from /pgsql/source/master/src/include/c.h:85,
 from /pgsql/source/master/src/include/postgres_fe.h:25,
 from /pgsql/source/master/src/port/win32security.c:17:
/pgsql/source/master/src/port/win32security.c: In function ‘log_error’:
/pgsql/source/master/src/port/win32security.c:37:11: warning: passing argument 
1 of ‘__builtin_va_start’ from incompatible pointer type
  va_start(fmt, ap);
   ^
/pgsql/source/master/src/port/win32security.c:37:11: note: expected ‘char **’ 
but argument is of type ‘const char **’
/pgsql/source/master/src/port/win32security.c:37:2: warning: second parameter 
of ‘va_start’ not last named argument [-Wvarargs]
  va_start(fmt, ap);
  ^

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2016-01-07 Thread Joshua D. Drake

On 01/07/2016 12:32 PM, Jim Nasby wrote:

On 1/7/16 1:49 PM, Josh Berkus wrote:

Yeah, we could also get rid of this conversation:

"Here's a patch for X, which is on the TODO list"

"Oh, we've obsolesced that, that was added to the TODO before we had Y"

... by auto-closing TODO items at a certain age.


Even if not auto-closed, at least it'd be easy to find old items.

Bonus points if we attract some volunteer project managers who will
keep tabs on all those kinds of things...


/me waves hand

There are quite a few contributing companies that likely have people 
who could help out with this in an educated fashion but aren't coders.



JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-01-07 Thread David Rowley
On 7 January 2016 at 06:36, Jeff Janes  wrote:

> On Tue, Jan 5, 2016 at 11:55 PM, David Rowley
>  wrote:
> > create table ab (a int,b int);
> > insert into ab select x,y from generate_series(1,20) x(x),
> > generate_series(10,1,-1) y(y);
> > create index on ab (a) including (b);
> > explain select * from ab order by a,b;
> >                         QUERY PLAN
> > ----------------------------------------------------------
> >  Sort  (cost=10.64..11.14 rows=200 width=8)
> >    Sort Key: a, b
> >    ->  Seq Scan on ab  (cost=0.00..3.00 rows=200 width=8)
> > (3 rows)
>
> If you set enable_sort=off, then you get the index-only scan with no
> sort.  So it believes the index can be used for ordering (correctly, I
> think), just sometimes it thinks it is not faster to do it that way.
>
> I'm not sure why this would be a correctness problem.  The covered
> column does not participate in uniqueness checks, but it still usually
> participates in index ordering.  (That is why dummy op-classes are
> needed if you want to include non-sortable-type columns as being
> covered.)
>

If that's the case, then it appears that I've misunderstood INCLUDING. From
reading _bt_doinsert() it appeared that it ignores the INCLUDING columns
and just finds the insert position based on the key columns, yet that's not
the way it appears to work. I was also a bit confused because, in another
database with very similar syntax to this, the included columns exist only
to allow index-only scans: they are not indexed, therefore can't be part of
index quals, and the index only provides a sorted path for the indexed
columns, not the included ones.

Saying that, I'm now a bit confused as to why the following does not
produce two indexes of the same size:

create table t1 (a int, b text);
insert into t1 select x,md5(random()::text) from generate_series(1,100) x(x);
create index t1_a_inc_b_idx on t1 (a) including (b);
create index t1_a_b_idx on t1 (a,b);
select pg_relation_size('t1_a_b_idx'),pg_relation_size('t1_a_inc_b_idx');
 pg_relation_size | pg_relation_size
------------------+------------------
         59064320 |         58744832
(1 row)

Also, if we want INCLUDING() to mean "uniqueness is not enforced on these
columns, but they're still in the index", then I don't really think
allowing types without a btree opclass is a good idea. It's likely too
surprise-filled and might not be what the user actually wants. I'd suggest
that these non-indexed columns would be better defined by further expanding
the syntax; the first (perhaps not very good) thing that comes to mind is:

create unique index idx_name on table (unique_col) also index
(other,idx,cols) including (leaf,onlycols);

Looking up thread, I don't think I was the first to be confused by this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] checkpointer continuous flushing

2016-01-07 Thread Fabien COELHO


Hello Andres,


Hmmm. What I understood is that the workloads that have some performance
regressions (regressions that I have *not* seen in the many tests I ran) are
not due to checkpointer IOs, but rather to settings where most of the writes
are done by backends or the bgwriter.


As far as I can see you've not run many tests where the hot/warm data
set is larger than memory (the full machine's memory, not
shared_buffers).


Indeed, I think I ran some, but not many with such characteristics.

That quite drastically alters the performance characteristics here, 
because you suddenly have lots of synchronous read IO thrown into the 
mix.


If I understand this point correctly...

I would expect the overall performance to be abysmal in such a situation 
because you get only intermixed *random* reads and writes: as you point 
out, synchronous *random* reads (very slow), while on the checkpointer's 
write side the IOs are mostly random as well, because there is not much to 
aggregate into sequential writes.


Now why would that degrade performance significantly? As I see it, it 
should just render the sorting/flushing less and less effective, and 
performance would fall back to the previous levels...


Or maybe it is only the flushing itself which degrades performance, as you 
point out, because then you have some synchronous (synced) writes as well 
as reads, whereas without the patch there were just the reads.


If this is indeed the issue, then the solution to avoid the regression is 
*not* to flush, so that the OS IO scheduler is less constrained in its job 
and can be slightly more effective (well, we are talking about abysmal 
random IO disk performance here, so "effective" means somewhere between 
slightly more and slightly less very, very bad).


Maybe a trick could be not to aggregate and flush when buffers in the same 
file are too far apart, for instance based on some threshold? This can be 
implemented locally when deciding whether to merge buffer flushes and 
whether to flush at all, so it would fit the current code quite simply.
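
To make that concrete, here is a minimal sketch of such a locality check 
(the names and the threshold value are made up for illustration; they are 
not from the patch):

#include <stdbool.h>

/*
 * Hypothetical sketch of the threshold idea above: merge two pending
 * writes into a single flush request only if their blocks are close
 * together in the file, so the combined flush stays roughly sequential.
 */
#define MAX_MERGE_GAP_BLOCKS 128    /* arbitrary: 1 MB of 8 kB blocks */

static bool
should_merge_flush(int blkno_a, int blkno_b)
{
    int     gap = (blkno_a > blkno_b) ? blkno_a - blkno_b
                                      : blkno_b - blkno_a;

    return gap <= MAX_MERGE_GAP_BLOCKS;
}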


Now my understanding of the sync_file_range call is that it is a hint to 
flush the stuff, but that it is still asynchronous in nature, so how badly 
it impacts performance depends on the OS IO scheduler. Also, I would like 
to check whether, under the "regressed performance" (in tps terms) that 
you observed, pg is more or less responsive. It could be that the average 
performance is better but pg is offline longer on fsync, in which case I 
would consider it better to have lower tps in such cases *if* pg 
responsiveness is significantly improved.
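
For reference, a minimal sketch of the Linux-specific call being discussed 
(the wrapper name and flag choice are my illustration, not the patch's 
actual code):

#define _GNU_SOURCE
#include <fcntl.h>

/*
 * With SYNC_FILE_RANGE_WRITE, sync_file_range() merely *initiates*
 * writeback of the given byte range -- an asynchronous hint, as noted
 * above.  Adding SYNC_FILE_RANGE_WAIT_BEFORE/_WAIT_AFTER would make
 * the call block on the writeback instead.
 */
static int
hint_flush_range(int fd, off_t offset, off_t nbytes)
{
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}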


Would you have these measurements for the regression runs you observed?


Whether it's bgwriter or not I've not fully been able to establish, but
it's a working theory.


Ok, that is something to check, for confirmation or refutation.

Given the above discussion, I think my suggestion may be wrong: since the 
tps is low because of random read/write accesses, not many buffers are 
modified (so the bgwriter/backends won't need to make space), and the 
checkpointer does not have much to write (good), *but* all of it is random 
(bad).



I do not see the point of rewriting the checkpointer for them, although
obviously I agree that something has to be done also for the other
processes.


Rewriting the checkpointer and fixing the flush interface in a more
generic way aren't the same thing at all.


Hmmm, probably I misunderstood something in the discussion. It started 
with an implementation strategy, but drifted into discussing a performance 
regression. I agree that these are two different subjects.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Windows: Make pg_ctl reliably detect service status

2016-01-07 Thread Tom Lane
Alvaro Herrera  writes:
> I confirmed that adding that line makes the new file get compiled.  I
> also noticed these warnings when compiling it:

> In file included from 
> /usr/lib/gcc/x86_64-w64-mingw32/4.9-win32/include/stdarg.h:1:0,
>  from /pgsql/source/master/src/include/c.h:85,
>  from /pgsql/source/master/src/include/postgres_fe.h:25,
>  from /pgsql/source/master/src/port/win32security.c:17:
> /pgsql/source/master/src/port/win32security.c: In function ‘log_error’:
> /pgsql/source/master/src/port/win32security.c:37:11: warning: passing 
> argument 1 of ‘__builtin_va_start’ from incompatible pointer type
>   va_start(fmt, ap);
>^

I take it this code is quite untested, because what that's whining
about is that the arguments of va_start() are reversed.
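
For reference, va_start() takes the va_list first and the last named
parameter second, so a corrected log_error() would look roughly like this
(a sketch of the fix, not the committed code):

#include <stdarg.h>
#include <stdio.h>

/*
 * Minimal varargs error logger: note va_start(ap, fmt) -- the va_list
 * first, then the last named parameter -- the opposite of the order
 * that produced the warnings above.
 */
static void
log_error(const char *fmt, ...)
{
    va_list     ap;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}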

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers