Re: [HACKERS] security labels on databases are bad for dump & restore

2015-07-22 Thread Noah Misch
On Wed, Jul 22, 2015 at 03:42:58PM -0400, Adam Brightwell wrote:
> > I like Noah's proposal of having pg_dump --create reproduce all
> > database-level state.
> 
> Should it be enabled by default?  If so, then wouldn't it make more
> sense to call it --no-create and do the opposite?  So, --no-create
> would exclude rather than include database-level information?  Would
> enabling it by default cause issues with the current expected use of
> the tool by end users?

While I'd favor optional --no-create if we were designing fresh, it's not
worth breaking user scripts by changing that now.

> How would this handle related global objects? It seems like this part
> could get a little tricky.

Like roles and tablespaces?  No need to change their treatment.
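
For context, the database-level state in question (as opposed to global objects
like roles and tablespaces) is the kind of thing shown below, which the proposal
would have pg_dump --create reproduce; the object names and label are made up:

COMMENT ON DATABASE appdb IS 'application database';
SECURITY LABEL FOR selinux ON DATABASE appdb IS 'system_u:object_r:sepgsql_db_t:s0';
ALTER DATABASE appdb SET work_mem = '64MB';
GRANT CONNECT ON DATABASE appdb TO app_user;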

> Taking it one step further, would a --all option that dumps all
> databases make sense as well?  Of course I know that's probably a
> considerable undertaking and certainly beyond the current scope.

I agree it's outside the scope of fixing $subject.

Thanks,
nm




Re: [HACKERS] pg_dump quietly ignore missing tables - is it bug?

2015-07-22 Thread Kyotaro HORIGUCHI
Sorry, I have to correct my own comment below.

At Thu, 23 Jul 2015 14:22:59 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20150723.142259.200902861.horiguchi.kyot...@lab.ntt.co.jp>
> Hello,
> 
> > > 2015-07-19 20:54 GMT+02:00 Pavel Stehule :
> > >> I am sending an updated version. It implements a new long option,
> > >> "--strict-names". If this option is used, then every entered name
> > >> (one without any wildcard character) must match at least one object.
> > >> This option has no impact on patterns (names with wildcard characters).
> 
> I share Tom's opinion that it should behave the same way
> regardless of whether wildcards appear.
> 
> You replied to Tom with:
> 
> > the behavior is the same - only one real identifier is allowed
> 
> But the description of this patch above apparently says that
> they are handled differently. And
> expand_(schema|table)_name_patterns does something different
> again from both Tom's suggestion and what you said in your
> reply. I will come back to this topic later in this mail.
> 
> # Might "only one real ident is allowed" mean "at most one
> # match" rather than "exactly one match"?
> 
> With --strict-names, the behavior is:
> 
>   - A non-wildcarded name that matches nothing is an error.
> 
>   - A wildcarded name spec may match nothing, and in the end
> *all* of them are allowed to match nothing.
> 
> This looks quite confusing to me.
> 
> > >>  When this option is not used,
> > >> the behavior is 100% the same as before (with the same number of SQL
> > >> queries for the -t option). It is based on Oleksandr's documentation (and
> > >> a lightly modified philosophy), and cleans up my previous patch. A test
> > >> for wildcard existence such as "strcspn(cell->val, "?*")" cannot be used,
> > >> because it doesn't account for quoting (but the replacement is only a few
> > >> lines).
> 
> The new name "processSQLName" looks a bit
> bogus. processSQLNameIntenral would be a name commonly seen in
> such cases.

The actual name is not that but processSQLNamePatternInternal.

> > >> There is a possibility to remove the wildcard-character test and require
> > >> at least one match for patterns too. But I am thinking that when somebody
> > >> explicitly uses a wildcard, they reckon with the possibility of an empty
> > >> result.
> 
> Why do you think so? Wildcards are usually used to glob multiple
> names at once. One or more matches are expected in many or
> perhaps most cases, I think. That being so, if someone anticipates
> that some of their patterns may have no match, I think they
> shouldn't specify the --strict-names option at all.
> 
> Furthermore, I don't think anyone wants to use both wildcarded
> and non-wildcarded name specs at once, but this is a little out of
> scope.
> 
> I'd like to have opinions from others about this point.
> 
> > > Another variant is making the --strict-names behavior the default (and
> > > implementing a negative option like --disable-strict-names or similar).
> 
> This contradicts Josh's request (with which I don't totally agree :p).
> 
> > Note: originally I thought we had to fix it and change the default behavior.
> > But with a special option, we don't need to. Having this option in the help
> > output signals to the user that something here is risky.
> 
> regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




[HACKERS] Autonomous Transaction is back

2015-07-22 Thread Rajeev rastogi
After a few earlier failed attempts to propose autonomous transactions, Simon
Riggs and I would like to propose them again, this time with a completely
different approach.

We also discussed this feature at the PGCon 2015 Unconference Day; those who
missed that discussion, please refer to

https://wiki.postgresql.org/wiki/AutonomousTransactionsUnconference2015


Before jumping into the design and code proposal for this feature, Simon Riggs
and I wanted to lay out its behavior and usage, to keep everyone on the same
page. So we have summarized the behavior and usage of the autonomous
transaction, based on the discussion with community members at the PGCon 2015
Unconference Day:

Behavior of an Autonomous Transaction:
1. The autonomous transaction is treated as a completely separate
transaction from the master transaction.
2. It should be possible for it to deadlock with the master transaction;
we need to work out a solution to avoid such deadlocks.
3. It can support multiple levels of nesting, up to a configured limit
(perhaps a maximum of 70).
4. The outer (i.e. main or upper autonomous) transaction is suspended
while the inner autonomous transaction is running.
5. The outer transaction should not see the inner transaction's data until
the inner transaction has committed (a serializable outer transaction should
not see it even after the inner transaction commits).

How to Use an Autonomous Transaction:
1. We can issue an explicit command to start an autonomous transaction, as
below (don't worry about the exact keywords at this point); a sketch follows
this list:
BEGIN AUTONOMOUS TRANSACTION
... do your work ...
COMMIT/ROLLBACK   (commits/rolls back the autonomous transaction and
returns to the main transaction or the upper autonomous transaction)

2. The above commands can be issued either inside a procedure, to put a few of
the procedure's statements inside an autonomous transaction, or in stand-alone
query execution.
3. We can mark a whole procedure itself as autonomous, which is equivalent to
starting an autonomous transaction at the beginning of the procedure and
committing/rolling back at the end of the procedure.
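
A minimal sketch of that usage, with made-up table names and the placeholder
keywords from point 1 (this is proposed syntax, not something that runs today):

BEGIN;                                   -- outer (main) transaction
INSERT INTO orders VALUES (1, 'pending');
BEGIN AUTONOMOUS TRANSACTION;            -- suspends the outer transaction (behavior 4)
INSERT INTO audit_log VALUES (now(), 'order 1 created');
COMMIT;                                  -- commits only the autonomous transaction
ROLLBACK;                                -- rolls back the outer transaction;
                                         -- the audit_log row stays committed (behavior 1)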

There was another discussion at the Unconference Day about whether to implement
COMMIT/ROLLBACK inside procedures or autonomous transactions. Our opinion is
that COMMIT/ROLLBACK inside a procedure is somewhat different from an
autonomous transaction: in the first case, once we commit inside the procedure,
it also commits everything done before the call of the procedure. This is
Oracle's behavior.
So in that case the user has to be very careful not to perform, before calling
the procedure, any operation that is not yet intended to be committed from
inside the procedure.

So we prefer to implement autonomous transactions, which will not only be
compatible with Oracle but also provide a genuinely strong and much-requested
feature.

I have not listed the use cases here, as we already agree that it has strong
use cases.

We request everyone's opinion on this; based on that we can proceed to
enhance/tune/rewrite our design.

Thanks and Regards,
Kumar Rajeev Rastogi



Re: [HACKERS] pg_dump quietly ignore missing tables - is it bug?

2015-07-22 Thread Kyotaro HORIGUCHI
Hello,

> > 2015-07-19 20:54 GMT+02:00 Pavel Stehule :
> >> I am sending an updated version. It implements a new long option,
> >> "--strict-names". If this option is used, then every entered name
> >> (one without any wildcard character) must match at least one object.
> >> This option has no impact on patterns (names with wildcard characters).

I share Tom's opinion that it should behave the same way
regardless of whether wildcards appear.

You replied to Tom with:

> the behavior is the same - only one real identifier is allowed

But the description of this patch above apparently says that
they are handled differently. And
expand_(schema|table)_name_patterns does something different
again from both Tom's suggestion and what you said in your
reply. I will come back to this topic later in this mail.

# Might "only one real ident is allowed" mean "at most one
# match" rather than "exactly one match"?

With --strict-names, the behavior is:

  - A non-wildcarded name that matches nothing is an error.

  - A wildcarded name spec may match nothing, and in the end
*all* of them are allowed to match nothing.

This looks quite confusing to me.
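
(For context: pg_dump expands each -n/-t switch through processSQLNamePattern,
which turns the shell-style pattern into an anchored regular expression and
looks it up in the catalogs with a query roughly like the sketch below - a
simplification, not the exact query pg_dump sends:

SELECT c.oid
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'v', 'm', 'f')
  AND c.relname OPERATOR(pg_catalog.~) '^(emp.*)$';   -- from -t 'emp*'

The --strict-names question is whether zero rows back from such a query should
be an error for a plain name but silently accepted for a pattern.)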

> >>  When this option is not used,
> >> the behavior is 100% the same as before (with the same number of SQL
> >> queries for the -t option). It is based on Oleksandr's documentation (and
> >> a lightly modified philosophy), and cleans up my previous patch. A test
> >> for wildcard existence such as "strcspn(cell->val, "?*")" cannot be used,
> >> because it doesn't account for quoting (but the replacement is only a few
> >> lines).

The new name "processSQLName" looks a bit
bogus. processSQLNameIntenral would be a name commonly seen in
such cases.

> >> There is a possibility to remove the wildcard-character test and require
> >> at least one match for patterns too. But I am thinking that when somebody
> >> explicitly uses a wildcard, they reckon with the possibility of an empty
> >> result.

Why do you think so? Wildcards are usually used to glob multiple
names at once. One or more matches are expected in many or
perhaps most cases, I think. That being so, if someone anticipates
that some of their patterns may have no match, I think they
shouldn't specify the --strict-names option at all.

Furthermore, I don't think anyone wants to use both wildcarded
and non-wildcarded name specs at once, but this is a little out of
scope.

I'd like to have opinions from others about this point.

> > Another variant is making the --strict-names behavior the default (and
> > implementing a negative option like --disable-strict-names or similar).

This contradicts Josh's request (with which I don't totally agree :p).

> Note: originally I thought we had to fix it and change the default behavior.
> But with a special option, we don't need to. Having this option in the help
> output signals to the user that something here is risky.

regards,


-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP

2015-07-22 Thread Michael Paquier
On Wed, Jul 22, 2015 at 9:52 AM, Andreas Karlsson  wrote:
> On 07/02/2015 06:13 PM, Peter Eisentraut wrote:
>>
>> I think this would be a useful feature, and the implementation looks
>> sound.  But I don't like how the reload is organized.  Reinitializing
>> the context in the sighup handler, aside from questions about how much
>> work one should do in a signal handler, would cause SSL reinitialization
>> for unrelated reloads.  We have the GUC assign hook mechanism for
>> handling this sort of thing.  The trick would be that when multiple
>> SSL-related settings change, you only want to do one reinitialization.
>> You could either have the different assign hooks communicate with each
>> other somehow, or have them set a "need SSL init" flag that is checked
>> somewhere else.
>
>
> It is not enough to just add a hook to the GUCs since I would guess most
> users would expect the certificate to be reloaded if just the file has been
> replaced and no GUC was changed.

Why? It seems to me that the assign hook gets called once per process
at reload for a SIGHUP parameter even if its value is not changed, no?

> To support this we would need to also check
> the mtimes of the SSL files, would that complexity really be worth it?

Or we could reload the SSL context unconditionally once per reload
loop. I am wondering how costly that may prove to be though.
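
For illustration: if the SSL context were reloaded on SIGHUP as discussed here,
rotating certificates would come down to replacing the files and firing a
reload. pg_reload_conf() exists today; reloading the SSL state at that point is
what the patch would add.

-- after replacing server.crt / server.key in the data directory:
SELECT pg_reload_conf();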
-- 
Michael




Re: [HACKERS] A little RLS oversight?

2015-07-22 Thread Robert Haas
On Wed, Jul 22, 2015 at 5:17 PM, Dean Rasheed  wrote:
> There's another issue here though -- just adding filters to the
> pg_stats view won't prevent a determined user from seeing the contents
> of the underlying table. For that, the view needs to have the
> security_barrier property. Arguably the fact that pg_stats isn't a
> security barrier view is a long-standing information leak allowing
> users to see values from tables for which they don't have any
> permissions. Is anyone concerned about that?

Hrm.  There's no help for that in the back-branches, but we should
probably change it in 9.5+.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Add CINE for ALTER TABLE ... ADD COLUMN

2015-07-22 Thread Michael Paquier
On Thu, Jul 23, 2015 at 9:55 AM, Fabrízio de Royes Mello wrote:
> Thank you for the review.

+   /* skipp if the name already exists and if_not_exists is true */
s/skipp/skip.

Except for that, this looks in good shape to me (see attached for a version
fixing the typo), so I switched it to "Ready for committer".
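
For reference, the syntax the patch adds would be used like this (illustrative
table and column names):

ALTER TABLE measurements ADD COLUMN IF NOT EXISTS recorded_at timestamptz;
-- running it again is accepted instead of raising an error:
ALTER TABLE measurements ADD COLUMN IF NOT EXISTS recorded_at timestamptz;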
-- 
Michael
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 207fec1..5a71c3c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -36,7 +36,7 @@ ALTER TABLE ALL IN TABLESPACE name
 
 where action is one of:
 
-ADD [ COLUMN ] column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
+ADD [ COLUMN ] [ IF NOT EXISTS ] column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
 DROP [ COLUMN ] [ IF EXISTS ] column_name [ RESTRICT | CASCADE ]
 ALTER [ COLUMN ] column_name [ SET DATA ] TYPE data_type [ COLLATE collation ] [ USING expression ]
 ALTER [ COLUMN ] column_name SET DEFAULT expression
@@ -96,11 +96,12 @@ ALTER TABLE ALL IN TABLESPACE name
 
   

-ADD COLUMN
+ADD COLUMN [ IF NOT EXISTS ]
 
  
   This form adds a new column to the table, using the same syntax as
-  .
+  . If IF NOT EXISTS
+  is specified and the column already exists, no error is thrown.
  
 

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1c7eded..92b2662 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -328,8 +328,9 @@ static void ATPrepAddColumn(List **wqueue, Relation rel, bool recurse, bool recu
 bool is_view, AlterTableCmd *cmd, LOCKMODE lockmode);
 static ObjectAddress ATExecAddColumn(List **wqueue, AlteredTableInfo *tab,
 Relation rel, ColumnDef *colDef, bool isOid,
-bool recurse, bool recursing, LOCKMODE lockmode);
-static void check_for_column_name_collision(Relation rel, const char *colname);
+bool recurse, bool recursing, bool if_not_exists, LOCKMODE lockmode);
+static bool check_for_column_name_collision(Relation rel, const char *colname,
+bool if_not_exists);
 static void add_column_datatype_dependency(Oid relid, int32 attnum, Oid typid);
 static void add_column_collation_dependency(Oid relid, int32 attnum, Oid collid);
 static void ATPrepAddOids(List **wqueue, Relation rel, bool recurse,
@@ -2304,7 +2305,7 @@ renameatt_internal(Oid myrelid,
 		oldattname)));
 
 	/* new name should not already exist */
-	check_for_column_name_collision(targetrelation, newattname);
+	(void) check_for_column_name_collision(targetrelation, newattname, false);
 
 	/* apply the update */
 	namestrcpy(&(attform->attname), newattname);
@@ -3455,11 +3456,11 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		case AT_AddColumnToView:		/* add column via CREATE OR REPLACE
 		 * VIEW */
 			address = ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-	  false, false, false, lockmode);
+	  false, false, false, false, lockmode);
 			break;
 		case AT_AddColumnRecurse:
 			address = ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-	  false, true, false, lockmode);
+	  false, true, false, cmd->missing_ok, lockmode);
 			break;
 		case AT_ColumnDefault:	/* ALTER COLUMN DEFAULT */
 			address = ATExecColumnDefault(rel, cmd->name, cmd->def, lockmode);
@@ -3572,14 +3573,14 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			if (cmd->def != NULL)
 address =
 	ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-	true, false, false, lockmode);
+	true, false, false, cmd->missing_ok, lockmode);
 			break;
 		case AT_AddOidsRecurse:	/* SET WITH OIDS */
 			/* Use the ADD COLUMN code, unless prep decided to do nothing */
 			if (cmd->def != NULL)
 address =
 	ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-	true, true, false, lockmode);
+	true, true, false, cmd->missing_ok, lockmode);
 			break;
 		case AT_DropOids:		/* SET WITHOUT OIDS */
 
@@ -4677,7 +4678,7 @@ ATPrepAddColumn(List **wqueue, Relation rel, bool recurse, bool recursing,
 static ObjectAddress
 ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 ColumnDef *colDef, bool isOid,
-bool recurse, bool recursing, LOCKMODE lockmode)
+bool recurse, bool recursing, bool if_not_exists, LOCKMODE lockmode)
 {
 	Oid			myrelid = RelationGetRelid(rel);
 	Relation	pgclass,
@@ -4771,8 +4772,14 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		elog(ERROR, "cache lookup failed for relation %u", myrelid);
 	relkind = ((Form_pg_class) GETSTRUCT(reltup))->relkind;
 
-	/* new name should not already exist */
-	check_for_column_name_collision(rel, colDef->colname);
+	/* skip if the name already exists and if_not_exists is true */
+	if (!check_for_column_name_collision(rel, colDef->colname, if_not_exists))
+	{
+		heap_close(attrdesc, RowExclusive

Re: [HACKERS] Queries runs slow on GPU with PG-Strom

2015-07-22 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Josh Berkus
> Sent: Thursday, July 23, 2015 2:49 AM
> To: YANG; pgsql-hackers@postgresql.org; KaiGai Kohei
> Subject: Re: [HACKERS] Queries runs slow on GPU with PG-Strom
> 
> On 07/22/2015 08:16 AM, YANG wrote:
> >
> > Hello,
> >
> > I've performed some tests on pg_strom according to the wiki. But it seems 
> > that
> > queries run slower on GPU than CPU. Can someone shed a light on what's wrong
> > with my settings. My setup was Quadro K620 + CUDA 7.0 (For Ubuntu 14.10) +
> > Ubuntu 15.04. And the results was
> 
> I believe that pgStrom has its own mailing list.  KaiGai?
>
Sorry, I already replied to this earlier.

We have no mailing list of our own, but the issue tracker on GitHub is the
proper way to report problems to the developers.

  https://github.com/pg-strom/devel/issues

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] Queries runs slow on GPU with PG-Strom

2015-07-22 Thread Kouhei Kaigai
Hi Yang,

> I've performed some tests on pg_strom according to the wiki. But it seems that
> queries run slower on GPU than CPU. Can someone shed a light on what's wrong
> with my settings. My setup was Quadro K620 + CUDA 7.0 (For Ubuntu 14.10) +
> Ubuntu 15.04. And the results was
>
  :
> ,
> | LOG:  CUDA Runtime version: 7.0.0
> | LOG:  NVIDIA driver version: 346.59
> | LOG:  GPU0 Quadro K620 (384 CUDA cores, 1124MHz), L2 2048KB, RAM 
> 2047MB
> (128bits, 900KHz), capability 5.0
> | LOG:  NVRTC - CUDA Runtime Compilation vertion 7.0
> | LOG:  redirecting log output to logging collector process
> | HINT:  Future log output will appear in directory "pg_log".
> `
>
It looks to me like your GPU has poor memory access capability;
thus, the preprocessing for aggregation (which makes heavy use of atomic
operations on global memory) consumes the majority of the processing time.
Please try the query with:
  SET pg_strom.enable_gpupreagg = off;
GpuJoin uses fewer atomic operations, so it has an advantage there.

A GPU's two major advantages are a massive number of cores and higher
memory bandwidth than a CPU, so, fundamentally, I'd recommend using a
better GPU board...
According to NVIDIA, the K620 uses DDR3 DRAM, so there is no advantage
in memory access speed here. How about a GTX 750 Ti (which, like the K620,
needs no external power) or AWS's g2.2xlarge instance type?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of YANG
> Sent: Thursday, July 23, 2015 12:16 AM
> To: pgsql-hackers@postgresql.org
> Subject: [HACKERS] Queries runs slow on GPU with PG-Strom
> 
> 
> Hello,
> 
> I've performed some tests on pg_strom according to the wiki. But it seems that
> queries run slower on GPU than CPU. Can someone shed a light on what's wrong
> with my settings. My setup was Quadro K620 + CUDA 7.0 (For Ubuntu 14.10) +
> Ubuntu 15.04. And the results was
> 
> with pg_strom
> =
> 
> explain SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;
> 
> 
> QUERY PLAN
> 
> 
> ---
>  Aggregate  (cost=190993.70..190993.71 rows=1 width=0) (actual
> time=18792.236..18792.236 rows=1 loops=1)
>->  Custom Scan (GpuPreAgg)  (cost=7933.07..184161.18 rows=86 width=108)
> (actual time=4249.656..18792.074 rows=77 loops=1)
>  Bulkload: On (density: 100.00%)
>  Reduction: NoGroup
>  Device Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < '10'::double precision)
>  ->  Custom Scan (BulkScan) on t0  (cost=6933.07..182660.32
> rows=1060 width=0) (actual time=139.399..18499.246 rows=1000 loops=1)
>  Planning time: 0.262 ms
>  Execution time: 19268.650 ms
> (8 rows)
> 
> 
> 
> explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;
> 
> QUERY
> PLAN
> 
> --
>  HashAggregate  (cost=298541.48..298541.81 rows=26 width=12) (actual
> time=11311.568..11311.572 rows=26 loops=1)
>Group Key: t0.cat
>->  Custom Scan (GpuPreAgg)  (cost=5178.82..250302.07 rows=1088 width=52)
> (actual time=3304.727..11310.021 rows=2307 loops=1)
>  Bulkload: On (density: 100.00%)
>  Reduction: Local + Global
>  ->  Custom Scan (GpuJoin)  (cost=4178.82..248541.18 rows=1060
> width=12) (actual time=923.417..2661.113 rows=1000 loops=1)
>Bulkload: On (density: 100.00%)
>Depth 1: Logic: GpuHashJoin, HashKeys: (aid), JoinQual: (aid =
> aid), nrows_ratio: 1.
>->  Custom Scan (BulkScan) on t0  (cost=0.00..242858.60
> rows=1060 width=16) (actual time=6.980..871.431 rows=1000 loops=1)
>->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=4)
> (actual time=0.204..7.309 rows=4 loops=1)
>  Planning time: 47.834 ms
>  Execution time: 11355.103 ms
> (12 rows)
> 
> 
> without pg_strom
> 
> 
> test=# explain analyze SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 +
> (y-12.8)^2) < 10;
> 
> QUERY PLAN
> 
> 
> 
>  Aggregate  (cost=426193.03..426193.04 rows=1 width=0) (actual
> time=3880.379..3880.379 rows=1 loops=1)
>->  Seq Scan on t0  (cost=0.00..417859.65 rows=353 width=0) (actual
> time=0.075..3859.200 rows=314063 loops=1)
>  Filter: (sqrt

Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-22 Thread dinesh kumar
On Wed, Jul 22, 2015 at 8:56 PM, Michael Paquier 
wrote:

> On Thu, Jul 23, 2015 at 10:56 AM, dinesh kumar 
> wrote:
> > Hi All,
> >
> > On Mon, Jul 13, 2015 at 2:12 PM, Tom Lane  wrote:
> >>
> >> Jim Nasby  writes:
> >> > On 7/13/15 3:39 PM, dinesh kumar wrote:
> >> >> Ah. It's' my bad interpretation. Let me work on it, and will send a
> new
> >> >> patch as a wrapper sql function for ereport.
> >>
> >> > You might want to present a plan for that; it's not as trivial as it
> >> > sounds due to how ereport works. In particular, I'd want to see (at
> >> > minimum) the same functionality that plpgsql's RAISE command now
> >> > provides (errdetail, errhint, etc).
> >>
> >
> > Jim,
> >
> > For now, I worked on (error level, error message, hide-error-statement).
> > In our TODO item description, I found that this wrapper needs to return
> > "anyelement", but I believe returning "void" is enough for this function.
> > Let me know if I erred here.
> >
> > In the design phase,
> >
> > 1. I defined a custom data type holding the elevel code and elevel text.
> >
> > 2. I populated this CDT with all existing error levels, except {FATAL,
> > PANIC}, since we don't expose these to the client.
> >
> > 3. The report-log function works by matching the user-supplied elevel text.
> >
> > Find the attached patch with the implementation.
>
>
Thanks Michael.

Uploaded my patch there.


Regards,
Dinesh
manojadinesh.blogspot.com


> Btw, if you want to get more attention for your patch as well as
> reviews, you should consider registering to the next commit fest of
> September:
> https://commitfest.postgresql.org/6/
> Regards,
> --
> Michael
>


Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-22 Thread Michael Paquier
On Thu, Jul 23, 2015 at 10:56 AM, dinesh kumar  wrote:
> Hi All,
>
> On Mon, Jul 13, 2015 at 2:12 PM, Tom Lane  wrote:
>>
>> Jim Nasby  writes:
>> > On 7/13/15 3:39 PM, dinesh kumar wrote:
>> >> Ah. It's' my bad interpretation. Let me work on it, and will send a new
>> >> patch as a wrapper sql function for ereport.
>>
>> > You might want to present a plan for that; it's not as trivial as it
>> > sounds due to how ereport works. In particular, I'd want to see (at
>> > minimum) the same functionality that plpgsql's RAISE command now
>> > provides (errdetail, errhint, etc).
>>
>
> Jim,
>
> For now, I worked on (error level, error message, hide-error-statement). In
> our TODO item description, I found that this wrapper needs to return
> "anyelement", but I believe returning "void" is enough for this function.
> Let me know if I erred here.
>
> In the design phase,
>
> 1. I defined a custom data type holding the elevel code and elevel text.
>
> 2. I populated this CDT with all existing error levels, except {FATAL,
> PANIC}, since we don't expose these to the client.
>
> 3. The report-log function works by matching the user-supplied elevel text.
>
> Find the attached patch with the implementation.

Btw, if you want to get more attention for your patch as well as
reviews, you should consider registering to the next commit fest of
September:
https://commitfest.postgresql.org/6/
Regards,
-- 
Michael




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-22 Thread David Rowley
On 23 June 2015 at 05:37, Robert Haas  wrote:

> When a PostgreSQL system wedges, or when it becomes dreadfully slow
> for some reason, I often find myself relying on tools like strace,
> gdb, or perf to figure out what is happening.  This doesn't tend to
> instill customers with confidence; they would like (quite
> understandably) a process that doesn't require installing developer
> tools on their production systems, and doesn't require a developer to
> interpret the results, and perhaps even something that they could
> connect up to PEM or Nagios or whatever alerting system they are
> using.
>
> There are obviously many ways that we might think about improving
> things here, but what I'd like to do is try to put some better
> information in pg_stat_activity, so that when a process is not
> running, users can get some better information about *why* it's not
> running.  The basic idea is that pg_stat_activity.waiting would be
> replaced by a new column pg_stat_activity.wait_event, which would
> display the reason why that backend is waiting.  This wouldn't be a
> free-form text field, because that would be too expensive to populate.


I've not looked into the feasibility of it, but if it were also possible to
have a "waiting_for" column which would store the process ID of the process
that's holding a lock this process is waiting on, then it would be possible
for some smart person to write code which draws beautiful graphs, perhaps in
pgAdmin 4, of which processes are blocking other processes. I imagine this as
a chart with an icon for each process. Processes waiting on locks being
released would have an arrow pointing to their blocking process; if we clicked
on that blocking process we could see the query it's running and various other
properties that are existing columns in pg_stat_activity.

Obviously this is blue-skies stuff, but if we had a way to provide that
information it would be a great step forward towards it.
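
For what it's worth, that information can already be pieced together from
pg_locks; a rough sketch along the lines of the usual lock-monitoring queries,
pairing a waiting lock request with a granted lock on the same lockable object
(simplified, not exhaustive):

SELECT waiting.pid        AS waiting_pid,
       blocking.pid       AS blocking_pid,
       waiting_act.query  AS waiting_query
FROM pg_locks waiting
JOIN pg_locks blocking
  ON  blocking.locktype = waiting.locktype
  AND blocking.database      IS NOT DISTINCT FROM waiting.database
  AND blocking.relation      IS NOT DISTINCT FROM waiting.relation
  AND blocking.page          IS NOT DISTINCT FROM waiting.page
  AND blocking.tuple         IS NOT DISTINCT FROM waiting.tuple
  AND blocking.virtualxid    IS NOT DISTINCT FROM waiting.virtualxid
  AND blocking.transactionid IS NOT DISTINCT FROM waiting.transactionid
  AND blocking.classid       IS NOT DISTINCT FROM waiting.classid
  AND blocking.objid         IS NOT DISTINCT FROM waiting.objid
  AND blocking.objsubid      IS NOT DISTINCT FROM waiting.objsubid
  AND blocking.pid          <> waiting.pid
JOIN pg_stat_activity waiting_act ON waiting_act.pid = waiting.pid
WHERE NOT waiting.granted
  AND blocking.granted;

A waiting_for (blocking pid) column alongside wait_event would make such a join
unnecessary for the common case.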

Regards

David Rowley


--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-22 Thread Kyotaro HORIGUCHI
Hi, I forgot to mention a significant point.

> At Wed, 22 Jul 2015 17:50:35 +0300, Ildus Kurbangaliev 
>  wrote in <55afadbb.9090...@postgrespro.ru>
> > On 07/22/2015 09:10 AM, Kyotaro HORIGUCHI wrote:
> > > Hello,
> > >
> > > At Tue, 21 Jul 2015 14:28:25 +0300, Ildus Kurbangaliev
> > >  wrote in
> > > <55ae2cd9.4050...@postgrespro.ru>
> > >> On 07/21/2015 01:18 PM, Andres Freund wrote:
> > >>> I'd very much like to avoid increasing the size of struct LWLock. We
> > >>> have a lot of those and I'd still like to inline them with the buffer
> > >>> descriptors. Why do we need a separate group and can't reuse the
> > >>> tranche?  That might require creating a few more tranches, but ...?
> > >>>
> > >>> Andres
> > >> Do you mean moving LWLocks defined by offsets and with dynamic sizes
> > >> to separate tranches?
> > > I think it is too much for the purpose. Only two new tranches and
> > > maybe one or some new members (maybe representing the group) of
> > > trances will do, I suppose.
> > 
> > Can you explain why only two new tranches?
> > There is 13 types of lwlocks (besides individual), and we need
> > separate them somehow.
> 
> Sorry, I misunderstood tranches.
> 
> Currently, tranches other than the main one are used by WALInsertLocks and
> ReplicationOrigins. Other "dynamic locks" are defined as parts of the
> main LWLocks since they have the same shape as individual
> lwlocks. Leaving the individual locks aside, every lock group could have
> its own tranche if we allow lwlocks to have their own tranche even while
> they live in MainLWLockArray. Some 13-16 new tranches would be added, but
> there would be no need to register their names in LWLOCK_GROUPS[]. After all,
> this array would be renamed to something like "IndividualLWLockNames" and the
> name lookup could be done by the following simple steps:
> 
> - If the lock is in the main tranche, look up its name in the individual
>   name array.

 This lookup can be done by calculation; there is no need to scan.

> - Otherwise, use the name of its tranche.
> 
> Does this make sense?
> 
> > >> It sounds like an good option, but it will require refactoring of
> > >> current tranches. In current implementation
> > >> i tried not change current things.
> > > Now one of the most controversial points of this patch is the
> > > details of the implement, which largely affects performance drag,
> > > maybe.
> > >
> > >
> > >  From the viewpoint of performance, I have some comments on the
> > >  feature:
> > >
> > >   - LWLockReportStat runs a linear search loop which I suppose
> > > should be avoided even if the loop count is rather small for
> > > LWLocks, as Fujii-san said upthread or anywhere.
> > It runs only one time in first acquirement. In previous patch
> > it was much heavier. Anyway this code will be removed if we
> > split main tranche to smaller tranches.
> 
> Ah, this should be the same as what I wrote above, shouldn't it?
> 
> > >   - Currently pg_stat_activity view is the only API, which would
> > > be a bit heavy for sampling use. It'd be appreciated to have a
> > > far lighter means to know the same informtion.
> > Yes, pg_stat_activity is just information about current wait,
> > and it's too heavy for sampling.  Main goal of this patch was
> > creating base structures that can be used later.
> 
> Ok, I see it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




[HACKERS] Proposing COPY .. WITH PERMISSIVE

2015-07-22 Thread dinesh kumar
Hi All,

I would like to propose a PERMISSIVE mode for files created by COPY.
At the moment, if I run COPY to a file as the postgres instance owner, I can
read the resulting file as a non-instance user as well; I would like to be
able to restrict access to the instance owner via a WITH PERMISSIVE option.
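
For illustration only - this option does not exist today and the exact syntax
and semantics are still open; one possible shape, with a made-up path:

COPY orders TO '/var/lib/postgresql/exports/orders.csv';
-- today the file is created subject to the server's umask, typically world-readable
COPY orders TO '/var/lib/postgresql/exports/orders.csv' WITH PERMISSIVE;
-- proposed: only when PERMISSIVE is given would the file stay world-readable;
-- by default it would be readable by the instance owner only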

Let me know your thoughts on this.

Regards,
Dinesh
manojadinesh.blogspot.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-22 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 22 Jul 2015 17:50:35 +0300, Ildus Kurbangaliev 
 wrote in <55afadbb.9090...@postgrespro.ru>
> On 07/22/2015 09:10 AM, Kyotaro HORIGUCHI wrote:
> > Hello,
> >
> > At Tue, 21 Jul 2015 14:28:25 +0300, Ildus Kurbangaliev
> >  wrote in
> > <55ae2cd9.4050...@postgrespro.ru>
> >> On 07/21/2015 01:18 PM, Andres Freund wrote:
> >>> I'd very much like to avoid increasing the size of struct LWLock. We
> >>> have a lot of those and I'd still like to inline them with the buffer
> >>> descriptors. Why do we need a separate group and can't reuse the
> >>> tranche?  That might require creating a few more tranches, but ...?
> >>>
> >>> Andres
> >> Do you mean moving LWLocks defined by offsets and with dynamic sizes
> >> to separate tranches?
> > I think it is too much for the purpose. Only two new tranches and
> > maybe one or some new members (maybe representing the group) of
> > trances will do, I suppose.
> 
> Can you explain why only two new tranches?
> There is 13 types of lwlocks (besides individual), and we need
> separate them somehow.

Sorry, I misunderstood tranches.

Currently, tranches other than the main one are used by WALInsertLocks and
ReplicationOrigins. Other "dynamic locks" are defined as parts of the
main LWLocks since they have the same shape as individual
lwlocks. Leaving the individual locks aside, every lock group could have
its own tranche if we allow lwlocks to have their own tranche even while
they live in MainLWLockArray. Some 13-16 new tranches would be added, but
there would be no need to register their names in LWLOCK_GROUPS[]. After all,
this array would be renamed to something like "IndividualLWLockNames" and the
name lookup could be done by the following simple steps:

- If the lock is in the main tranche, look up its name in the individual
  name array.

- Otherwise, use the name of its tranche.

Does this make sense?

> >> It sounds like an good option, but it will require refactoring of
> >> current tranches. In current implementation
> >> i tried not change current things.
> > Now one of the most controversial points of this patch is the
> > details of the implement, which largely affects performance drag,
> > maybe.
> >
> >
> >  From the viewpoint of performance, I have some comments on the
> >  feature:
> >
> >   - LWLockReportStat runs a linear search loop which I suppose
> > should be avoided even if the loop count is rather small for
> > LWLocks, as Fujii-san said upthread or anywhere.
> It runs only one time in first acquirement. In previous patch
> it was much heavier. Anyway this code will be removed if we
> split main tranche to smaller tranches.

Ah, this should be the same as what I wrote above, shouldn't it?

> >   - Currently pg_stat_activity view is the only API, which would
> > be a bit heavy for sampling use. It'd be appreciated to have a
> > far lighter means to know the same informtion.
> Yes, pg_stat_activity is just information about current wait,
> and it's too heavy for sampling.  Main goal of this patch was
> creating base structures that can be used later.

Ok, I see it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-22 Thread dinesh kumar
Hi All,

On Mon, Jul 13, 2015 at 2:12 PM, Tom Lane  wrote:

> Jim Nasby  writes:
> > On 7/13/15 3:39 PM, dinesh kumar wrote:
> >> Ah. It's' my bad interpretation. Let me work on it, and will send a new
> >> patch as a wrapper sql function for ereport.
>
> > You might want to present a plan for that; it's not as trivial as it
> > sounds due to how ereport works. In particular, I'd want to see (at
> > minimum) the same functionality that plpgsql's RAISE command now
> > provides (errdetail, errhint, etc).
>
>
Jim,

For now, I worked on (error level, error message, hide-error-statement). In our
TODO item description, I found that this wrapper needs to return "anyelement",
but I believe returning "void" is enough for this function. Let me know if I
erred here.

In the design phase,

1. I defined a custom data type holding the elevel code and elevel text.

2. I populated this CDT with all existing error levels, except {FATAL,
PANIC}, since we don't expose these to the client.

3. The report-log function works by matching the user-supplied elevel text.

Find the attached patch with the implementation.



> The real question is why the existing functionality in plpgsql isn't
> sufficient.  Somebody who wants a "log from SQL" function can easily
> write a simple plpgsql function that does exactly what they want,
> with no more or fewer bells-n-whistles than they need.  If we try
> to create a SQL function that does all that, it's likely to be a mess
> to use, even with named arguments.
>
> I'm not necessarily against the basic idea, but I think inventing
> something that actually offers an increment in usability compared
> to the existing alternative is going to be harder than it sounds.
>
>
Tom,

I agree with your input. We can build a PL/pgSQL function as an alternative
for this.
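
For example, a minimal PL/pgSQL wrapper along the lines Tom describes could
look like the sketch below (illustrative only; RAISE takes its level as a
keyword, hence the CASE):

CREATE OR REPLACE FUNCTION report_log(level text, message text)
RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
    CASE upper(level)
        WHEN 'DEBUG'   THEN RAISE DEBUG   '%', message;
        WHEN 'LOG'     THEN RAISE LOG     '%', message;
        WHEN 'INFO'    THEN RAISE INFO    '%', message;
        WHEN 'NOTICE'  THEN RAISE NOTICE  '%', message;
        WHEN 'WARNING' THEN RAISE WARNING '%', message;
        ELSE RAISE EXCEPTION 'unsupported log level: %', level;
    END CASE;
END;
$$;

-- SELECT report_log('NOTICE', 'Custom Message');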

My initial proposal and implementation was about logging messages to the log
file irrespective of our log settings. I was not sure we could do this with
something like pl/perlu. Then I started working on our TODO item, "ereport
wrapper callable from SQL", and found it can be useful to have a direct
function call with the required log level.

Thanks.

Regards,
Dinesh
manojadinesh.blogspot.com

regards, tom lane
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 76f77cb..43dbaec 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17850,6 +17850,15 @@ postgres=# SELECT * FROM 
pg_xlogfile_name_offset(pg_stop_backup());
 Return information about a file.

   
+  
+   
+pg_report_log(eleveltext, 
message anyelement, 
ishidestmtboolean)
+   
+   void
+   
+Write message into log file as per log level.
+   
+  
  
 

@@ -17918,6 +17927,24 @@ SELECT (pg_stat_file('filename')).modification;
 

 
+   
+pg_report_log
+   
+   
+pg_report_log is useful to write custom messages
+into current log destination and returns void.
+This function don't support the PANIC, FATAL log levels due to their 
unique internal DB usage, which may cause the database instability. Using 
ishidestmt, function can write or ignore the current SQL 
statement into the log file.
+Typical usages include:
+
+postgres=# SELECT pg_report_log('NOTICE', 'Custom Message', true);
+NOTICE:  Custom Message
+ pg_report_log 
+---
+ 
+(1 row)
+
+   
+
   
 
   
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index c0495d9..1c7c263 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -76,6 +76,91 @@ current_query(PG_FUNCTION_ARGS)
 }
 
 /*
+ * pg_report_log()
+ *
+ * Printing custom log messages in log file.
+ */
+
+typedef struct
+{
+   int ecode;
+   char *level;
+} errorlevels;
+
+Datum
+pg_report_log(PG_FUNCTION_ARGS)
+{
+
+   /*
+* Do not add FATAL, PANIC log levels to the below list.
+*/
+   errorlevels elevels[]={
+   {DEBUG5, "DEBUG5"}, {DEBUG4, "DEBUG4"}, {DEBUG3, 
"DEBUG3"},
+   {DEBUG2, "DEBUG2"}, {DEBUG1, "DEBUG1"}, {LOG, "LOG"},
+   {COMMERROR, "COMMERROR"}, {INFO, "INFO"}, {NOTICE, 
"NOTICE"},
+   {WARNING, "WARNING"}, {ERROR, "ERROR"}
+   /*
+* Adding PGERROR to elevels if WIN32
+*/
+   #ifdef WIN32
+   ,{PGERROR, "PGERROR"}
+   #endif
+   };
+
+   int itr = 0;
+   bool ishidestmt = false;
+   int noelevel = (int) sizeof(elevels)/sizeof(*elevels);
+   char *level;
+
+   level = text_to_cstring(PG_GETARG_TEXT_P(0));
+   ishidestmt = PG_GETARG_BOOL(2);
+
+   /*
+* Do not expose FATAL, PANIC log levels to outer world.
+*/
+   if(pg_strcasecmp("FATAL", level) == 0)
+   ereport(ERROR,
+   (errmsg("%s is an unsupported report log 
level.", level)));
+
+   else if(pg_strcasecmp("PANIC", 

Re: [HACKERS] Asynchronous execution on FDW

2015-07-22 Thread Kyotaro HORIGUCHI
Hello,

> Let me ask an elemental question.
> 
> If we have ParallelAppend node that kicks a background worker process for
> each underlying child node in parallel, does ForeignScan need to do something
> special?

Although I don't see the point of the background worker in your
story, at least for ParallelMergeAppend the scan would frequently be
cut short by an upper Limit, so one more state, say "setup"
- which means a worker is allocated but not yet started - would be
useful, and the driver node might need to manage the number of
asynchronous executions. Or, conversely, the driven nodes might do so.

As for ForeignScan, it is merely an API for FDWs and does nothing
substantial, so it would have nothing special to do. As for
postgres_fdw, the current patch itself restricts execution to one
scan per foreign server at a time. We would have to provide
additional execution management if we want two or more
simultaneous scans per foreign server at once.

Sorry for the unfocused discussion, but does this answer some of
your questions?

> The expected waste of CPU or I/O is a common problem to be solved; however, it
> does not require adding special-case handling to ForeignScan, I think.
> What is your opinion?

I agree with you that ForeignScan, as the wrapper for FDWs, doesn't
need anything special for this case. I suppose for now that
avoiding the penalty of abandoning too many speculatively
executed scans (or other work on background workers, such as sorts)
would be the business of the node above the FDWs, or of somewhere else.

However, I haven't dismissed the possibility that some common
work related to resource management could be integrated into the
executor (or even into the planner), but I see none for now.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Add CINE for ALTER TABLE ... ADD COLUMN

2015-07-22 Thread Fabrízio de Royes Mello
On Thu, Jul 16, 2015 at 10:36 PM, Michael Paquier 
wrote:
>
> I had a look at this patch, and here are some minor comments:
> 1) In alter_table.sgml, you need a space here:
> [ IF NOT EXISTS ] 2)
> +   check_for_column_name_collision(targetrelation, newattname,
false);
> (void) needs to be added in front of check_for_column_name_collision
> where its return value is not checked or static code analyzers are
> surely going to complain.

Fixed.


> 3) Something minor, some lines of codes exceed 80 characters, see
> declaration of check_for_column_name_collision for example...

Fixed.


> 4) This comment needs more precisions?
> /* new name should not already exist */
> -   check_for_column_name_collision(rel, colDef->colname);
> +   if (!check_for_column_name_collision(rel, colDef->colname,
> if_not_exists))
> The new name can actually exist if if_not_exists is true.
>

Improved the comment.


> Except that the implementation looks sane to me.
>

Thank you for the review.

Att,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 207fec1..5a71c3c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -36,7 +36,7 @@ ALTER TABLE ALL IN TABLESPACE name
 
 where action is one of:
 
-ADD [ COLUMN ] column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
+ADD [ COLUMN ] [ IF NOT EXISTS ] column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
 DROP [ COLUMN ] [ IF EXISTS ] column_name [ RESTRICT | CASCADE ]
 ALTER [ COLUMN ] column_name [ SET DATA ] TYPE data_type [ COLLATE collation ] [ USING expression ]
 ALTER [ COLUMN ] column_name SET DEFAULT expression
@@ -96,11 +96,12 @@ ALTER TABLE ALL IN TABLESPACE name
 
   

-ADD COLUMN
+ADD COLUMN [ IF NOT EXISTS ]
 
  
   This form adds a new column to the table, using the same syntax as
-  .
+  . If IF NOT EXISTS
+  is specified and the column already exists, no error is thrown.
  
 

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1c7eded..05fbe51 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -328,8 +328,9 @@ static void ATPrepAddColumn(List **wqueue, Relation rel, bool recurse, bool recu
 bool is_view, AlterTableCmd *cmd, LOCKMODE lockmode);
 static ObjectAddress ATExecAddColumn(List **wqueue, AlteredTableInfo *tab,
 Relation rel, ColumnDef *colDef, bool isOid,
-bool recurse, bool recursing, LOCKMODE lockmode);
-static void check_for_column_name_collision(Relation rel, const char *colname);
+bool recurse, bool recursing, bool if_not_exists, LOCKMODE lockmode);
+static bool check_for_column_name_collision(Relation rel, const char *colname,
+bool if_not_exists);
 static void add_column_datatype_dependency(Oid relid, int32 attnum, Oid typid);
 static void add_column_collation_dependency(Oid relid, int32 attnum, Oid collid);
 static void ATPrepAddOids(List **wqueue, Relation rel, bool recurse,
@@ -2304,7 +2305,7 @@ renameatt_internal(Oid myrelid,
 		oldattname)));
 
 	/* new name should not already exist */
-	check_for_column_name_collision(targetrelation, newattname);
+	(void) check_for_column_name_collision(targetrelation, newattname, false);
 
 	/* apply the update */
 	namestrcpy(&(attform->attname), newattname);
@@ -3455,11 +3456,11 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		case AT_AddColumnToView:		/* add column via CREATE OR REPLACE
 		 * VIEW */
 			address = ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-	  false, false, false, lockmode);
+	  false, false, false, false, lockmode);
 			break;
 		case AT_AddColumnRecurse:
 			address = ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-	  false, true, false, lockmode);
+	  false, true, false, cmd->missing_ok, lockmode);
 			break;
 		case AT_ColumnDefault:	/* ALTER COLUMN DEFAULT */
 			address = ATExecColumnDefault(rel, cmd->name, cmd->def, lockmode);
@@ -3572,14 +3573,14 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			if (cmd->def != NULL)
 address =
 	ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-	true, false, false, lockmode);
+	true, false, false, cmd->missing_ok, lockmode);
 			break;
 		case AT_AddOidsRecurse:	/* SET WITH OIDS */
 			/* Use the ADD COLUMN code, unless prep decided to do nothing */
 			if (cmd->def != NULL)
 address =
 	ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd->def,
-	true, true, false, lockmode);
+	true, true, false, cmd->missing_ok, lockmode);
 

Re: [HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ testing, doesn't it?

2015-07-22 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Sent: Wednesday, July 22, 2015 11:19 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Etsuro Fujita; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ 
> testing,
> doesn't it?
> 
> On Wed, Jul 22, 2015 at 8:13 AM, Kouhei Kaigai  wrote:
> >> No, what I'm concerned about is the case when scanrelid > 0.
> >>
> > Hmm. if scanrelid > 0, then fdw_scan_tlist should be NIL.
> > I want to put Assert(scanrelid==0 || fdw_scan_tlist == NIL) just after
> > the GetForeignPlan() in createplan.c.
> >
> > I'm curious why you tried to put valid fdw_scan_tlist for scanrelid > 0.
> > It's unusual.
> 
> Allowing that was part of the point of Tom Lane's commit
> 1a8a4e5cde2b7755e11bde2ea7897bd650622d3e.  See the second bullet
> point, after the comma.
>
Indeed, this commit allows a ForeignScan to have fdw_scan_tlist even if
scanrelid > 0; however, I'm uncertain about its reason/intention.
Is it a preparation for the upcoming target-list pushdown?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard

2015-07-22 Thread Michael Paquier
On Thu, Jul 23, 2015 at 5:47 AM, Gurjeet Singh  wrote:
> On Wed, Jul 22, 2015 at 7:34 AM, Robert Haas  wrote:
>>
>> On Tue, Jul 21, 2015 at 8:34 PM, Michael Paquier
>>  wrote:
>> > On Wed, Jul 22, 2015 at 1:23 AM, Robert Haas 
>> > wrote:
>> >> Notice that the collation specifier is gone.  Oops.
>> >
>> > As it is not possible to specify directly a constraint for a PRIMARY
>> > KEY expression, what about switching dumpConstraint to have it use
>> > first a CREATE INDEX query with the collation and then use ALTER TABLE
>> > to attach the constraint to it? I am noticing that we already fetch
>> > the index definition in indxinfo via pg_get_indexdef. Thoughts?
>>
>> I guess the questions on my mind is "Why is it that you can't do this
>> directly when creating a primary key, but you can do it when turning
>> an existing index into a primary key?"

Sure. This does not seem to be complicated.

>> If there's a good reason for prohibiting doing this when creating a
>> primary key, then presumably it shouldn't be allowed when turning an
>> index into a primary key either.  If there's not, then why not extend
>> the PRIMARY KEY syntax to allow it?

Yeah, I think we should be able to define a collation in this case.
For example, it is likewise possible to pass a WITH clause with storage
parameters, though we do not document it in
table_constraint_using_index
(http://www.postgresql.org/docs/devel/static/sql-altertable.html).
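
For reference, the kind of definition under discussion looks roughly like this
(made-up names):

CREATE TABLE people (name text NOT NULL);
CREATE UNIQUE INDEX people_name_key ON people (name COLLATE "C");
ALTER TABLE people ADD CONSTRAINT people_pkey
    PRIMARY KEY USING INDEX people_name_key;

Per Robert's report, pg_dump currently re-emits the constraint as a plain
ADD CONSTRAINT ... PRIMARY KEY (name), so the COLLATE "C" is lost on restore.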

> +1. I think in the short term we can treat this as a bug, and "fix" the bug
> by disallowing an index with attributes that cannot be present in an index
> created by PRIMARY KEY constraint. The collation attribute on one of the
> keys may be just one of many such attributes.

Er, pushing such a patch to the back-branches may break applications that
do exactly what Robert did in his test case (using a unique index with a
collation as a primary key), no? I am not sure that this is
acceptable. Taking the problem at its root by switching pg_dump to
define an index first and then use it looks like a more solid approach on
the back branches.

> In the long term, we may want to allow collation in primary key, but that
> will be a feature ideally suited for a major version release.

Yep. Definitely.
-- 
Michael




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-22 Thread Tom Lane
I wrote:
> We could alternatively provide two scan-initialization callbacks,
> one that analyzes the parameters before we do heap_beginscan,
> and another that can do additional setup afterwards.  Actually,
> that second call would really not be meaningfully different from
> the ReScan call, so another solution would be to just automatically
> invoke ReScan after we've created the HeapScanDesc.  TSMs could work
> around not having this by complicating their NextBlock function a bit
> (so that it would do some initialization on first call) but perhaps
> it would be easier to have the additional init call.

Actually, there's another reason why this has to be redesigned: evaluating
the parameter expressions of TABLESAMPLE during ExecInitSampleScan is
not OK.  Consider this:

regression=# select * from (values (0::float4),(100)) v(pct), lateral (select 
count(*) from tenk1 tablesample bernoulli (pct)) ss;
 pct | count 
-+---
   0 | 0
 100 | 0
(2 rows)

Surely the second execution of the subquery should've given me 1.
It doesn't because the TABLESAMPLE parameters aren't recomputed during a
rescan.  Even if you're minded to claim that that's all right, this is
definitely not acceptable:

regression=# select * from (values (0::float4),(100)) v(pct), lateral (select * 
from tenk1 tablesample bernoulli (pct)) ss;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

That one's crashing because "pct" is a Var that refers to an outer scan
tuple that doesn't exist yet.  I'm not real sure how come the case with
an aggregate manages not to crash, actually.

This needs to work more like LIMIT, which doesn't try to compute the
limit parameters until the first fetch.  So what we need is an Init
function that does very darn little indeed (maybe we don't even need
it at all), and then a ParamInspect function that is called at first fetch
or during a ReScan, and that one is the one that gets to look at the
evaluated parameter values.

If we wanted to let the method inspect the HeapScanDesc during setup
it would need still a third callback.  I'm not exactly sure if that's
worth the trouble.

regards, tom lane




Re: [HACKERS] could not truncate directory "pg_subtrans": apparent wraparound

2015-07-22 Thread Heikki Linnakangas

On 06/13/2015 05:02 AM, Thomas Munro wrote:

Since the multixact equivalent of this problem[1] fell through the
cracks on the multixact mega-thread, here is an updated patch that
addresses this problem for both pg_subtrans and pg_multixact/offsets
using the same approach: always step back one multixact/xid (rather
than doing so only if oldest == next, which seemed like an unnecessary
complication, and a bit futile since the result of such a test is only
an instantaneous snapshot).  I've added this to the commitfest[2].  I
am also attaching a new set of repro scripts including a pair to test
the case where next multixact/xid == first valid ID (the scripts with
'wraparound' in the name, which use dirty pg_resetxlog tricks to get
into that situation).  In my previous patch I naively subtracted one,
which didn't work for those (even rarer!) cases.  The new patch steps
over the special ID values.


Thanks, great repro scripts!


I also took a look at the pg_clog and pg_commit_ts truncation
functions.  You could argue that they have the same problem in theory
(they pass a page number derived from the oldest xid to
SimpleLruTruncate, and maybe there is a way for that to be an xid that
hasn't been issued yet), but in practice I don't think it's a
reachable condition.  They use the frozen xid that is updated by
vacuuming, but vacuum itself advances the next xid counter in the
process.  Is there a path though the vacuum code that ever exposes
frozen xid == next xid?  In contrast, for pg_subtrans we use
GetOldestXmin(), which is equal to the next xid if there are no
running transactions, and for pg_multixact we use the oldest
multixact, which can be equal to the next multixact ID after a
wraparound vacuum because vacuum itself doesn't always consume
multixacts.


Yeah, I think pg_clog and pg_commit_ts are safe, for now. It's rather 
accidental, though.


There is one more instance of this bug though: in 9.2 and below, 
pg_multixact members are truncated similarly. Attached are corresponding 
repro-scripts for that case.


Committed with the additional fix for that.

- Heikki



repro-bogus-multixact-members-error.sh
Description: application/shellscript


repro-bogus-multixact-members-error-wraparound.sh
Description: application/shellscript

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A little RLS oversight?

2015-07-22 Thread Dean Rasheed
On 21 July 2015 at 04:53, Michael Paquier  wrote:
> On Tue, Jul 14, 2015 at 4:01 AM, Stephen Frost  wrote:
>> We need to be careful to avoid the slippery slope of trying to prevent
>> all covert channels, which has been extensively discussed previously.

I think this is more serious than the covert channel leaks discussed
before, since most_common_vals explicitly reveals values from the
table, making it an overt leak, albeit of a small portion of the
table's values.

> Looking at that I am not seeing any straight-forward way to resolve
> this issue except by hardening pg_stats by having an additional filter
> of this type so as a non-owner of a relation cannot see the stats of
> this table directly when RLS is enabled:
> c.relrowsecurity = false OR c.relowner = current_user::regrole::oid
> Attached is a patch doing that (/me now hides, expecting to receive
> laser shots because of the use of current_user on a system view).
> Thoughts?

Hmm, I think it probably ought to do more, based on whether or not RLS
is being bypassed or in force-mode -- see the first few checks in
get_row_security_policies(). Perhaps a new SQL-callable function
exposing those checks and calling check_enable_rls(). It's probably
still worth including the "c.relrowsecurity = false" check in SQL to
save calling the function for the majority of tables that don't have
RLS.

There's another issue here though -- just adding filters to the
pg_stats view won't prevent a determined user from seeing the contents
of the underlying table. For that, the view needs to have the
security_barrier property. Arguably the fact that pg_stats isn't a
security barrier view is a long-standing information leak allowing
users to see values from tables for which they don't have any
permissions. Is anyone concerned about that?
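
For illustration, the sort of leak that the security_barrier property guards
against; the function name and cost below are made up:

-- A cheap, non-leakproof function that the planner may evaluate before a
-- non-security_barrier view's own quals:
CREATE FUNCTION leak(anyarray) RETURNS boolean
    COST 0.0000001
    LANGUAGE plpgsql AS
$$ BEGIN RAISE NOTICE 'leaked: %', $1; RETURN true; END $$;

-- Even if pg_stats grew an ownership filter, without security_barrier the
-- NOTICEs could still expose most_common_vals of rows the filter hides:
SELECT tablename, attname
FROM pg_stats
WHERE leak(most_common_vals);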

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql :: support for \ev viewname and \sv viewname

2015-07-22 Thread Dean Rasheed
On 22 July 2015 at 21:37, Tom Lane  wrote:
> Dean Rasheed  writes:
>> On 3 July 2015 at 20:50, Tom Lane  wrote:
>>> Oh?  If that were true, pg_dump wouldn't work on such views.  It is kind
>>> of a PITA for this purpose that it doesn't include the CREATE text for
>>> you, but we're surely not changing that behavior now.
>
>> This appears to be missing support for view options (WITH CHECK OPTION
>> and security_barrier), so editing a view with either of those options
>> will cause them to be stripped off.
>
> Hm.  Why exactly were those not implemented as part of pg_get_viewdef?
>

pg_get_viewdef in its current form is needed for the
information_schema "views" view, which has separate columns for the
view's query and its CHECK OPTIONs.

Arguably another function could be added. However, given the need for
psql to support older server versions, a new function wouldn't
actually help much, since psql would still need to be able to do it
the hard way in the absence of that new function on the server.
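
To illustrate what gets stripped, a sketch with hypothetical table and view
names:

CREATE TABLE users (id int PRIMARY KEY, active boolean);
CREATE VIEW active_users WITH (security_barrier = true) AS
    SELECT id, active FROM users WHERE active
    WITH CHECK OPTION;
-- pg_get_viewdef('active_users') returns only the SELECT; security_barrier
-- and check_option are stored as view reloptions, so \ev/\sv must fetch and
-- re-emit them separately.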

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard

2015-07-22 Thread Gurjeet Singh
On Tue, Jul 21, 2015 at 9:23 AM, Robert Haas  wrote:

> rhaas=# create unique index on foo (a collate "C");
> CREATE INDEX
> rhaas=# alter table foo add primary key using index foo_a_idx;
> ALTER TABLE
>


> Now dump and restore this database.  Then:
>


> Notice that the collation specifier is gone.  Oops.
>

The intent of the 'USING INDEX' feature was to work around the problem that
PRIMARY KEY indexes cannot be reindexed concurrently to reduce bloat.

Considering that the feature doesn't work [1] (at least as shown in the example
in the docs [2]) if there are any foreign keys referencing the table,
there's scope for improvement.

There was a proposal to support reindexing the primary key index, but I am
not sure where that stands. If that's already in, or soon to be in core, we
may put a deprecation notice around USING INDEX. OTOH, if reindexing PKeys
is too difficult, we may want to support index replacement via a top-level
ALTER TABLE subcommand that works in CONCURRENT mode, and not recommend the
method shown in the docs (i.e. deprecate the USING INDEX syntax).

[1]: The DROP CONSTRAINT part of the command fails if there are any FKeys
pointing to this table.
[2]: http://www.postgresql.org/docs/9.4/static/sql-altertable.html
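
For reference, the workaround pattern in question is essentially the example
from the ALTER TABLE docs [2] (table, column and index names here are
hypothetical):

CREATE UNIQUE INDEX CONCURRENTLY foo_pkey_new ON foo (id);
ALTER TABLE foo
    DROP CONSTRAINT foo_pkey,
    ADD CONSTRAINT foo_pkey PRIMARY KEY USING INDEX foo_pkey_new;
-- As noted in [1], the DROP CONSTRAINT step fails if any foreign keys
-- reference foo.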

-- 
Gurjeet Singh http://gurjeet.singh.im/


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-22 Thread Robert Haas
On Wed, Jul 22, 2015 at 2:55 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Jul 21, 2015 at 2:27 PM, Andres Freund  wrote:
>>> But I'm not going to complain too loudly if we don't do invalidation.
>
>> Not doing invalidation seems silly to me.  But I don't want to bend
>> Paul too far around the axle, either.
>
> Just to be clear here: the case we are concerned about is, given that
> we have determined that function X is or is not a member of one of the
> extensions marked "shippable" for a given connection, is it likely that
> that status will change (while the function continues to exist with
> the same OID) during the lifespan of the connection?  If it did change,
> how critical would it be for us to honor the new shippability criterion
> on that connection?  My answer to both is "not very".  So I'm not excited
> about expending lots of code or cycles to check for such changes.
>
> If we were trying to cache things across more than a connection lifespan,
> the answer might be different.

We've had a few cases like this at EnterpriseDB where we've not done
invalidations in situations like this, and customers have reported
them as bugs.  We've also had cases where PostgreSQL doesn't do this
that have been reported to EnterpriseDB as bugs:

http://www.postgresql.org/message-id/ca+tgmoydf7dkxhktk7u_ynafst47sgk5j8gd8c1jfsiouu1...@mail.gmail.com

If you know what's happening, these kinds of problems are often no big
deal: you just reconnect and it starts working.  The problem is that
customers often DON'T know what is happening, leading to a lot of
frustration and head-banging.  "Oh, let me see if reconnecting fixes
it" is just not something that tends to occur to people, at least IME.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard

2015-07-22 Thread Gurjeet Singh
On Wed, Jul 22, 2015 at 7:34 AM, Robert Haas  wrote:

> On Tue, Jul 21, 2015 at 8:34 PM, Michael Paquier
>  wrote:
> > On Wed, Jul 22, 2015 at 1:23 AM, Robert Haas 
> wrote:
> >> Notice that the collation specifier is gone.  Oops.
> >
> > As it is not possible to specify directly a constraint for a PRIMARY
> > KEY expression, what about switching dumpConstraint to have it use
> > first a CREATE INDEX query with the collation and then use ALTER TABLE
> > to attach the constraint to it? I am noticing that we already fetch
> > the index definition in indxinfo via pg_get_indexdef. Thoughts?
>
> I guess the questions on my mind is "Why is it that you can't do this
> directly when creating a primary key, but you can do it when turning
> an existing index into a primary key?"
>
> If there's a good reason for prohibiting doing this when creating a
> primary key, then presumably it shouldn't be allowed when turning an
> index into a primary key either.  If there's not, then why not extend
> the PRIMARY KEY syntax to allow it?
>

+1. I think in the short term we can treat this as a bug, and "fix" the bug
by disallowing an index with attributes that cannot be present in an index
created by a PRIMARY KEY constraint. The collation attribute on one of the
keys may be just one of many such attributes.

In the long term, we may want to allow collation in primary key, but that
will be a feature ideally suited for a major version release.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/


Re: [HACKERS] MultiXact member wraparound protections are now enabled

2015-07-22 Thread Robert Haas
On Wed, Jul 22, 2015 at 4:11 PM, Peter Eisentraut  wrote:
> Why is this message logged by default in a fresh installation?  The
> technicality of that message doesn't seem to match the kinds of messages
> that we normally print at startup.

It seems nobody likes that message.

I did it that way because I wanted to provide an easy way for users to
know whether they had those protections enabled.  If you don't display
the message when things are already OK at startup, users have to make
a negative inference, like this: let's see, I'm on a version that is
new enough that it would have printed a message if the protections had
not been enabled, so the absence of the message must mean things are
OK.

But it seemed to me that this could be rather confusing.  I thought it
would be better to be explicit about whether the protections are
enabled in all cases.  That way, (1) if you see the message saying
they are enabled, they are enabled; (2) if you see the message saying
they are disabled, they are disabled; and (3) if you see neither
message, your version does not have those protections.

You are not the first person to dislike this, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql :: support for \ev viewname and \sv viewname

2015-07-22 Thread Tom Lane
Dean Rasheed  writes:
> On 3 July 2015 at 20:50, Tom Lane  wrote:
>> Oh?  If that were true, pg_dump wouldn't work on such views.  It is kind
>> of a PITA for this purpose that it doesn't include the CREATE text for
>> you, but we're surely not changing that behavior now.

> This appears to be missing support for view options (WITH CHECK OPTION
> and security_barrier), so editing a view with either of those options
> will cause them to be stripped off.

Hm.  Why exactly were those not implemented as part of pg_get_viewdef?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql :: support for \ev viewname and \sv viewname

2015-07-22 Thread Dean Rasheed
On 3 July 2015 at 20:50, Tom Lane  wrote:
> Petr Korobeinikov  writes:
>> Fixed. Now both \ev and \sv  numbering lines starting with "1". New version
>> attached.
>
> Applied with a fair amount of mostly-cosmetic adjustment.
>
>> As I've already noticed that pg_get_viewdef() does not support full syntax
>> of creating or replacing views.
>
> Oh?  If that were true, pg_dump wouldn't work on such views.  It is kind
> of a PITA for this purpose that it doesn't include the CREATE text for
> you, but we're surely not changing that behavior now.
>

This appears to be missing support for view options (WITH CHECK OPTION
and security_barrier), so editing a view with either of those options
will cause them to be stripped off.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] MultiXact member wraparound protections are now enabled

2015-07-22 Thread Peter Eisentraut
Why is this message logged by default in a fresh installation?  The
technicality of that message doesn't seem to match the kinds of messages
that we normally print at startup.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] security labels on databases are bad for dump & restore

2015-07-22 Thread Fabrízio de Royes Mello
On Wed, Jul 22, 2015 at 4:42 PM, Adam Brightwell <
adam.brightw...@crunchydatasolutions.com> wrote:
>
> [...]
>
> Also, I think this would potentially conflict with what Fabrízio is
> doing with CURRENT_DATABASE on COMMENT, though, I think it might be a
> preferable solution.
>
> https://commitfest.postgresql.org/5/229/
>

Unfortunately this code is a bit weird, and it will be better to move it to the
next commitfest (I have not had time to improve it yet), so we can join efforts,
implement all the ideas, and make the reviewers' lives easier with a more
consistent patch.

Seems reasonable?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] security labels on databases are bad for dump & restore

2015-07-22 Thread Adam Brightwell
> I don't think there's any line near pg_dumpall.  That tool seems to
> have grown out of desperation without much actual design.  I think it
> makes more sense to plan around that's the best pg_dump behavior for the
> various use cases.

Ok.

> I like Noah's proposal of having pg_dump --create reproduce all
> database-level state.

Should it be enabled by default?  If so, then wouldn't it make more
sense to call it --no-create and do the opposite?  So, --no-create
would exclude rather than include database-level information?  Would
enabling it by default cause issues with the current expected use of
the tool by end users?
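
For context, the database-level state in question is roughly this kind of
thing (the database name, provider, label and settings below are hypothetical):

COMMENT ON DATABASE appdb IS 'application database';
SECURITY LABEL FOR selinux ON DATABASE appdb IS 'system_u:object_r:sepgsql_db_t:s0';
ALTER DATABASE appdb SET search_path = app, public;
GRANT CONNECT ON DATABASE appdb TO app_user;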

How would this handle related global objects? It seems like this part
could get a little tricky.

Taking it one step further, would a --all option that dumps all
databases make sense as well?  Of course I know that's probably a
considerable undertaking and certainly beyond the current scope.
Though, I thought I'd throw it out there.

Also, I think this would potentially conflict with what Fabrízio is
doing with CURRENT_DATABASE on COMMENT, though, I think it might be a
preferable solution.

https://commitfest.postgresql.org/5/229/

-Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-22 Thread Paul Ramsey
On July 22, 2015 at 12:15:14 PM, Andres Freund (and...@anarazel.de) wrote:
> It doesn't seem that unlikely that somebody does an ALTER SERVER OPTIONS
> SET .. to add an extension to be shippable while connections are already
> using the fdw. It'll be confusing if some clients are fast and some
> others are really slow.

This seems more likely than anyone mucking around with extension stuff (adding 
new functions (and working with FDW in production at the same time?)) or 
adding/dropping whole extensions (you’ll have more problems than a stale cache, 
whole columns will disappear if you "DROP EXTENSION postgis CASCADE"), and has 
the added benefit of not needing me to muck into core stuff for my silly 
feature.

I’ll have a look at doing invalidation for the case of changes to the FDW 
wrappers and servers.

P. 



Re: [HACKERS] Eliminating CREATE INDEX comparator TID tie-breaker overhead

2015-07-22 Thread Peter Geoghegan
On Wed, Jul 22, 2015 at 11:03 AM, Robert Haas  wrote:
>> I'm not concerned about synchronized scans breaking my assumption of a
>> physical ordering of heaptuples being fed to tuplesort.c. I think that
>> it is unlikely to ever be worth seriously considering this case.
>
> Why not?

The case for doing this tie-breaker is theoretical. The original
rationale for adding it is now obsolete. On the other hand, the cost
of doing it is most certainly not theoretical.

If it's absolutely necessary to preserve a guarantee that equal tuples
are in TID order (rather than at most two staggered sequential
groupings per set of equal tuples -- possible with synchronized
scans), then I suggest we detect synchronized scans and disable this
optimization. They're not especially common, so it would still be
worthwhile, in my estimation.

>> I have a hard time imagining anything (beyond synchronous scans)
>> breaking my assumption that index tuplesorts receive tuples in heap
>> physical order. If anything was to break that in the future (e.g.
>> parallelizing the heap scan for index builds), then IMV the onus
>> should be on that new case to take appropriate precautions against
>> breaking my assumption.
>
> I'm very dubious about that.  There are lots of reasons why we might
> want to read tuples out of order; for example, suppose we want a
> parallel sequential scan to feed the sort.

I agree that there are many reasons to want to do that. If we ever get
parallel sorts, then having a bunch of memtuple arrays, each fed by a
worker participating in a parallel scan makes sense. Those runs could
still be sorted in physical order. Only the merge phase would have to
do pointer chasing to tie-break on the real TID, and maybe not even
then (because run number can also be a proxy for physical order when
merging, assuming some new parallel internal sort algorithm).
Actually, for tape sort/replacement selection sort, the initial heap
build (where run number 0 is assigned currently) could probably reuse
this trick. I just haven't done that because it would be significantly
more invasive and less helpful.

If it's just a matter of wanting to parallelize the heap scan for its
own sake, then I don't think that's likely to be a terribly effective
optimization for B-Tree index builds; most of the cost is always going
to be in the sort proper, which I'm targeting here. In any case, I
think that the very common case where an int4 PK index is built using
quicksort should not have to suffer to avoid a minor inconveniencing
of (say) parallel sorts.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-22 Thread Andres Freund
On 2015-07-22 14:55:15 -0400, Tom Lane wrote:
> Just to be clear here: the case we are concerned about is, given that
> we have determined that function X is or is not a member of one of the
> extensions marked "shippable" for a given connection, is it likely that
> that status will change (while the function continues to exist with
> the same OID) during the lifespan of the connection?  If it did change,
> how critical would it be for us to honor the new shippability criterion
> on that connection?  My answer to both is "not very".  So I'm not excited
> about expending lots of code or cycles to check for such changes.

It doesn't seem that unlikely that somebody does an ALTER SERVER OPTIONS
SET .. to add an extension to be shippable while connections are already
using the fdw. It'll be confusing if some clients are fast and some
others are really slow.
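
For concreteness, that scenario looks something like this; the server name is
hypothetical, and "extensions" is the server option proposed in this thread
(use ADD instead of SET if the option has not been defined on the server yet):

ALTER SERVER pgserver OPTIONS (SET extensions 'postgis, seg');
-- Backends that already cached shippability answers for pgserver keep using
-- them until they receive an invalidation or reconnect.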

I think we should at least add invalidations for changing server
options. I think adding support for extension upgrades wouldn't hurt
either. Not particularly excited to ALTER EXTENSION ADD, but we could
add it easy enough.

> If we were trying to cache things across more than a connection lifespan,
> the answer might be different.

I think connection poolers defeat that logic more widely than we
sometimes assume.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-22 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jul 21, 2015 at 2:27 PM, Andres Freund  wrote:
>> But I'm not going to complain too loudly if we don't do invalidation.

> Not doing invalidation seems silly to me.  But I don't want to bend
> Paul too far around the axle, either.

Just to be clear here: the case we are concerned about is, given that
we have determined that function X is or is not a member of one of the
extensions marked "shippable" for a given connection, is it likely that
that status will change (while the function continues to exist with
the same OID) during the lifespan of the connection?  If it did change,
how critical would it be for us to honor the new shippability criterion
on that connection?  My answer to both is "not very".  So I'm not excited
about expending lots of code or cycles to check for such changes.

If we were trying to cache things across more than a connection lifespan,
the answer might be different.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench stats per script & other stuff

2015-07-22 Thread Andres Freund
On 2015-07-22 10:54:14 -0700, Josh Berkus wrote:
> Making command-line options order-dependant breaks a lot of system call
> libraries in various languages, as well as being easy to mess up.

What?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsets: ROW expression semantic broken between 9.4 and 9.5

2015-07-22 Thread Andres Freund
Hi,

On 2015-07-22 23:53:26 +0530, Jeevan Chalke wrote:
> It looks like we have broken the ROW expression without explicit
> ROW keyword in GROUP BY.

That was "intentional", and is actually standards required
behaviour. GROUP BY (a, b) is the same as GROUP BY a,b. It'd otherwise
be pretty confusing because parens in GROUPING SETS would mean something
different than in GROUP BY.

We should probably add a release note entry about it...

Thanks for testing gsets!

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench stats per script & other stuff

2015-07-22 Thread Fabien COELHO



If so, I would vote for:
  -f script1.bench:3 -f script2.bench:1
over:
  -f script1.bench -w 3 -f script2.bench -w 1


Ok, I'll take that into consideration. Any other opinion out there? The 
current v3 version is:


  -w 3 -f script1.bench -w 1 -f script2.bench

With provision to generate errors if a -w is set but not used,
in two case.

 - in the middle ... -w 4  -w 1 ...
 - in the end ... -w 1 

I can provide -f x:weight easily, but this means that there will be no way
to associate a weight with the internal scripts. Not orthogonal, not very
elegant, but no big deal.


Oh, you misunderstand.  "script1" and "script2" are meant to be 
user-supplied names which then get reported in things like response time 
output.  They're labels.


Ok, that is much better. This means that labels should not choose names 
which may interact with other commands, so maybe a list would have been 
nice as well. Anyway, I do not think it is the way to go just for this 
feature.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Gsets: ROW expression semantic broken between 9.4 and 9.5

2015-07-22 Thread Jeevan Chalke
Hi

It looks like we have broken ROW expressions written without an explicit
ROW keyword in GROUP BY.
I mean, after the grouping sets merge, if we have (c1, c2) in GROUP BY,
we treat it as a ROW expression for grouping, but at the same
time we allow the individual columns in the target list.
However, this was not true with PG 9.4, where we errored out saying
"column "c1" must appear in the GROUP BY clause..".

But if I use an explicit ROW keyword, like ROW(c1, c2), then on PG 9.5
it errors out for an individual column reference in the select list.

An example may make this clearer:

ON PG 9.5 (after grouping sets implementation)

postgres=# create view gstest1(a,b,v)
postgres-#   as values (1,1,10),(1,1,11),(1,2,12),(1,2,13),(1,3,14),
postgres-# (2,3,15),
postgres-# (3,3,16),(3,4,17),
postgres-# (4,1,18),(4,1,19);
CREATE VIEW

postgres=#
postgres=# SELECT a, b, max(v) FROM gstest1 GROUP BY (a, b)
ORDER BY 1, 2, 3 DESC;
 a | b | max
---+---+-
 1 | 1 |  11
 1 | 2 |  13
 1 | 3 |  14
 2 | 3 |  15
 3 | 3 |  16
 3 | 4 |  17
 4 | 1 |  19
(7 rows)

postgres=# SELECT a, b, max(v) FROM gstest1 GROUP BY ROW(a, b)
ORDER BY 1, 2, 3 DESC;
ERROR:  column "gstest1.a" must appear in the GROUP BY clause or be used in
an aggregate function
LINE 1: SELECT a, b, max(v) FROM gstest1 GROUP BY ROW(a, b) ORDER BY...
   ^

In the above example, you can see that when we have only (a, b), it works fine.
But when we have ROW(a, b), it throws an error.
On PG 9.4 both cases fail. Here it is:

postgres=# SELECT a, b, max(v) FROM gstest1 GROUP BY (a, b)
ORDER BY 1, 2, 3 DESC;
ERROR:  column "gstest1.a" must appear in the GROUP BY clause or be used in
an aggregate function
LINE 1: SELECT a, b, max(v) FROM gstest1 GROUP BY (a, b) ORDER BY 1,...
   ^
postgres=# SELECT a, b, max(v) FROM gstest1 GROUP BY ROW(a, b)
ORDER BY 1, 2, 3 DESC;
ERROR:  column "gstest1.a" must appear in the GROUP BY clause or be used in
an aggregate function
LINE 1: SELECT a, b, max(v) FROM gstest1 GROUP BY ROW(a, b) ORDER BY...
   ^

Did we break ROW expression semantics in the grouping sets implementation?

Any idea why this is happening?

Thanks
-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] "A huge debt of gratitude" - Michael Stonebraker

2015-07-22 Thread Robert Haas
On Tue, Jul 21, 2015 at 6:42 PM, Jolly Chen  wrote:
> You have probably heard that Mike Stonebraker recently won the Turing award.  
> A recording of his award lecture is available at:
> https://www.youtube.com/watch?v=BbGeKi6T6QI
>
> It is an entertaining talk overall. If you fast forward to about the 1:07 
> mark, he makes some comments about postgres.
>
> Here’s my rough transcription:
>
> "The abstract data type system in postgres has been added to a lot of 
> relational database systems. It's kind of de facto table stakes for 
> relational databases these days, essentially intact.  That idea was really a 
> good one. It was mentioned in the citation for my Turing award winning.  
> However, serendipity played a huge role, which is, the biggest impact of 
> postgres by far came from two Berkeley students that I'll affectionately call 
> Grumpy and Sleepy.  They converted the academic postgres prototype from QUEL 
> to SQL in 1995. This was in parallel to the commercial activity. And then a 
> pick-up team of volunteers, none of whom have anything to do with me or 
> Berkeley, have been shepherding that open source system ever since 1995. The 
> system that you get off the web for postgres comes from this pick-up team.  
> It is open source at its best and I want to just mention that I have nothing 
> to do with that and that collection of folks we all owe a huge debt of 
> gratitude to, because they have robustize that code line and made it so it 
> really works.”
>
> Thank you all so much for your hard work over the last twenty years!!

Wow, thanks for reaching out.  Here is a quote from the current
version of src/test/regress/input/misc.source:

--
-- BTREE shutting out non-functional updates
--
-- the following two tests seem to take a long time on some
-- systems.    This non-func update stuff needs to be examined
-- more closely.    - jolly (2/22/96)
--

That comment might be obsolete, but we still have it, and a few other
references.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-22 Thread Robert Haas
On Tue, Jul 21, 2015 at 2:27 PM, Andres Freund  wrote:
> But I'm not going to complain too loudly if we don't do invalidation.

Not doing invalidation seems silly to me.  But I don't want to bend
Paul too far around the axle, either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Volatility of pg_xact_commit_timestamp() and pg_last_committed_xact()

2015-07-22 Thread Robert Haas
On Thu, Jul 16, 2015 at 9:49 AM, Fujii Masao  wrote:
> Volatilities of pg_xact_commit_timestamp() and pg_last_committed_xact()
> are now STABLE. But ISTM that those functions can return different results
> even within a single statement. So we should change the volatilities of them
> to VOLATILE?

Sounds right to me.
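
For reference, a minimal sketch of the change (the actual fix would adjust the
built-in pg_proc entries rather than run DDL):

ALTER FUNCTION pg_xact_commit_timestamp(xid) VOLATILE;
ALTER FUNCTION pg_last_committed_xact() VOLATILE;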

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] "A huge debt of gratitude" - Michael Stonebraker

2015-07-22 Thread Jolly Chen
Hey everyone,

You have probably heard that Mike Stonebraker recently won the Turing award.  A 
recording of his award lecture is available at:
https://www.youtube.com/watch?v=BbGeKi6T6QI

It is an entertaining talk overall. If you fast forward to about the 1:07 mark, 
he makes some comments about postgres.

Here’s my rough transcription:

"The abstract data type system in postgres has been added to a lot of 
relational database systems. It's kind of de facto table stakes for relational 
databases these days, essentially intact.  That idea was really a good one. It 
was mentioned in the citation for my Turing award winning.  However, 
serendipity played a huge role, which is, the biggest impact of postgres by far 
came from two Berkeley students that I'll affectionately call Grumpy and 
Sleepy.  They converted the academic postgres prototype from QUEL to SQL in 
1995. This was in parallel to the commercial activity. And then a pick-up team 
of volunteers, none of whom have anything to do with me or Berkeley, have been 
shepherding that open source system ever since 1995. The system that you get 
off the web for postgres comes from this pick-up team.  It is open source at 
its best and I want to just mention that I have nothing to do with that and 
that collection of folks we all owe a huge debt of gratitude to, because they 
have robustize that code line and made it so it really works.”

Thank you all so much for your hard work over the last twenty years!!

Affectionately,

Grumpy



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Eliminating CREATE INDEX comparator TID tie-breaker overhead

2015-07-22 Thread Robert Haas
On Tue, Jul 21, 2015 at 4:06 PM, Peter Geoghegan  wrote:
> Design considerations and consequences
> 

Good write-up.

> I'm not concerned about synchronized scans breaking my assumption of a
> physical ordering of heaptuples being fed to tuplesort.c. I think that
> it is unlikely to ever be worth seriously considering this case.

Why not?

> I have a hard time imagining anything (beyond synchronous scans)
> breaking my assumption that index tuplesorts receive tuples in heap
> physical order. If anything was to break that in the future (e.g.
> parallelizing the heap scan for index builds), then IMV the onus
> should be on that new case to take appropriate precautions against
> breaking my assumption.

I'm very dubious about that.  There are lots of reasons why we might
want to read tuples out of order; for example, suppose we want a
parallel sequential scan to feed the sort.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin index vacuum versus transaction snapshots

2015-07-22 Thread Alvaro Herrera
Kevin Grittner wrote:
> If you run `make installcheck` against a cluster with
> default_transaction_isolation = 'repeatable read' you get one
> failure:

So there's some code that's specifically intended to handle this case:

    /*
     * If using transaction-snapshot mode, it would be possible for another
     * transaction to insert a tuple that's not visible to our snapshot if we
     * have already acquired one, when in snapshot-isolation mode; therefore,
     * disallow this from running in such a transaction unless a snapshot
     * hasn't been acquired yet.
     *
     * This code is called by VACUUM and brin_summarize_new_values. Have the
     * error message mention the latter because VACUUM cannot run in a
     * transaction and thus cannot cause this issue.
     */
    if (IsolationUsesXactSnapshot() && FirstSnapshotSet)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("brin_summarize_new_values() cannot run in a transaction that has already obtained a snapshot")));

However, this comment is full of it because a snapshot is obtained even
for VACUUM.  So the fact that brin_summarize_new_values() is mentioned
as being in trouble is just a very minor issue: fixing that would just
be a matter of passing a flag down from the caller into
summarize_range() to indicate whether it's vacuum or the function.  The
real problem is that VACUUM should be allowed to run without error.
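
For reference, the user-callable path that the check was written for can be
reproduced like this (table and index names are hypothetical):

BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM brin_test;                      -- acquires the snapshot
SELECT brin_summarize_new_values('brin_test_idx');   -- fails with the error above
ROLLBACK;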

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alpha2/Beta1

2015-07-22 Thread Josh Berkus
All:

Sounds like the overwhelming consensus is Alpha2 then.  Will run with it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench stats per script & other stuff

2015-07-22 Thread Josh Berkus
On 07/21/2015 10:25 PM, Fabien COELHO wrote:
> 
> Hello Josh,
> 
>>> Maybe -f file.sql:weight (yuk from my point of view, but it can be done
>>> easily).
>>
>> Maybe it's past time for pgbench to have a config file?
> 
> That is an idea.  For "simple" usage, for backward compatibility and for
> people like me who like them, ISTM that options are fine too:-)
> 
> Also this may mean adding a dependency to some YAML library, configure
> issues (I'm not sure whether pg currently uses YAML, and JSON is quite
> verbose), maybe conditionals around the feature to compile without the
> dependency, more documentation...
> 
> I'm not sure all that is desirable just for weighting scripts.

Maybe not.

If so, I would vote for:

-f script1.bench:3 -f script2.bench:1

over:

-f script1.bench -w 3 -f script2.bench -w 1

Making command-line options order-dependant breaks a lot of system call
libraries in various languages, as well as being easy to mess up.

>> Given that we want to define some per-workload options, the config file
>> would probably need to be YAML or JSON, e.g.:
>>
>> [...]
>>
>> script1:
>>   file: script1.bench
>>   weight: 3
>> script2:
>>   file: script2.bench
>>   weight: 1
>>
>> the above would execute a pgbench with 16 clients, 4 threads, "script1"
>> three times as often as script2, and report stats at the script (rather
>> than SQL statement) level.
> 
> Yep. Probably numbering within field names should be avoided, so a list
> of records that could look like:

Oh, you misunderstand.  "script1" and "script2" are meant to be
user-supplied names which then get reported in things like response time
output.  They're labels. Better example:

deposit:
  file: deposit.bench
  weight: 3
withdrawal:
  file: withdrawal.bench
  weight: 3
reporting:
  file: summary_report.bench
  weight: 1

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Queries runs slow on GPU with PG-Strom

2015-07-22 Thread Josh Berkus
On 07/22/2015 08:16 AM, YANG wrote:
> 
> Hello,
> 
> I've performed some tests on pg_strom according to the wiki. But it seems that
> queries run slower on GPU than CPU. Can someone shed a light on what's wrong
> with my settings. My setup was Quadro K620 + CUDA 7.0 (For Ubuntu 14.10) +
> Ubuntu 15.04. And the results was

I believe that pgStrom has its own mailing list.  KaiGai?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin index vacuum versus transaction snapshots

2015-07-22 Thread Kevin Grittner
Jeff Janes  wrote:
> On Tue, Jul 21, 2015 at 3:10 PM, Kevin Grittner  wrote:
>> Kevin Grittner  wrote:
>>
>>> If you run `make installcheck` against a cluster with
>>> default_transaction_isolation = 'repeatable read' you get one
>>> failure:
>>
>>> + ERROR: brin_summarize_new_values() cannot run in a transaction that has 
>>> already obtained a snapshot
>>
>>> This is using source at commit 9faa6ae14f6098e4b55f0131f7ec2694a381fb87.
>>
>> Ah, the very next commit fixed this while I was running the tests.
>
> Still looks broken from here.

You are right; I rushed my test to see whether Tom's patch had any
impact, and neglected to set default_transaction_isolation after
installing the new build.  :-(  The problem still reliably occurs,
and the error message still references a function name that was not
part of the problem.

I have added this to the "PostgreSQL 9.5 Open Items" Wiki page.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-22 Thread Heikki Linnakangas

On 07/22/2015 11:18 AM, Simon Riggs wrote:

On 10 July 2015 at 00:06, Tom Lane  wrote:


Andres Freund  writes:

On 2015-07-06 11:49:54 -0400, Tom Lane wrote:

Rather than reverting cab9a0656c36739f, which would re-introduce a
different performance problem, perhaps we could have COPY create a new
relfilenode when it does this.  That should be safe if the table was
previously empty.



I'm not convinced that cab9a0656c36739f needs to survive in that
form. To me only allowing one COPY to benefit from the wal_level =
minimal optimization has a significantly higher cost than
cab9a0656c36739f.


What evidence have you got to base that value judgement on?

cab9a0656c36739f was based on an actual user complaint, so we have good
evidence that there are people out there who care about the cost of
truncating a table many times in one transaction.  On the other hand,
I know of no evidence that anyone's depending on multiple sequential
COPYs, nor intermixed COPY and INSERT, to be fast.  The original argument
for having this COPY optimization at all was to make restoring pg_dump
scripts in a single transaction fast; and that use-case doesn't care
about anything but a single COPY into a virgin table.



We have to backpatch this fix, so it must be both simple and effective.

Heikki's suggestions may be best, maybe not, but they don't seem
backpatchable.

Tom's suggestion looks good. So does Andres' suggestion. I have coded both.


Thanks. For comparison, I wrote a patch to implement what I had in mind.

When a WAL-skipping COPY begins, we add an entry for that relation in a 
"pending-fsyncs" hash table. Whenever we perform any action on a heap 
that would normally be WAL-logged, we check if the relation is in the 
hash table, and skip WAL-logging if so.


That was a simplified explanation. In reality, when WAL-skipping COPY 
begins, we also memorize the current size of the relation. Any actions 
on blocks greater than the old size are not WAL-logged, and any actions 
on smaller-numbered blocks are. This ensures that if you did any INSERTs 
on the table before the COPY, any new actions on the blocks that were 
already WAL-logged by the INSERT are also WAL-logged. And likewise if 
you perform any INSERTs after (or during, by trigger) the COPY, and they 
modify the new pages, those actions are not WAL-logged. So starting a 
WAL-skipping COPY splits the relation into two parts: the first part 
that is WAL-logged as usual, and the later part that is not WAL-logged. 
(there is one loose end marked with XXX in the patch on this, when one 
of the pages involved in a cold UPDATE is before the watermark and the 
other is after)


The actual fsync() has been moved to the end of transaction, as we are 
now skipping WAL-logging of any actions after the COPY as well.


And truncations complicate things further. If we emit a truncation WAL 
record in the transaction, we also make an entry in the hash table to 
record that. All operations on a relation that has been truncated must 
be WAL-logged as usual, because replaying the truncate record will 
destroy all data even if we fsync later. But we still optimize for 
"BEGIN; CREATE; COPY; TRUNCATE; COPY;" style patterns, because if we 
truncate a relation that has already been marked for fsync-at-COMMIT, we 
don't need to WAL-log the truncation either.
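
For concreteness, the pattern this is trying to keep fast looks like this (a
sketch assuming wal_level = minimal; the table name and file paths are made up):

BEGIN;
CREATE TABLE t (a int, b text);
COPY t FROM '/tmp/batch1.csv' WITH (FORMAT csv);  -- WAL skipped, fsync at commit
TRUNCATE t;                                       -- relation is already pending an fsync
COPY t FROM '/tmp/batch2.csv' WITH (FORMAT csv);  -- still WAL-skipped under this scheme
COMMIT;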



This is more invasive than I'd like to backpatch, but I think it's the 
simplest approach that works, and doesn't disable any of the important 
optimizations we have.



And what reason is there to think that this would fix all the problems?


I don't think either suggested fix could be claimed to be a great solution,
since there is little principle here, only heuristic. Heikki's solution
would be the only safe way, but is not backpatchable.


I can't get too excited about a half-fix that leaves you with data 
corruption in some scenarios.


I wrote a little test script to test all these different scenarios 
(attached). Both of your patches fail with the script.


- Heikki

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2e3b9d2..9ef688b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2021,12 +2021,6 @@ FreeBulkInsertState(BulkInsertState bistate)
  * The new tuple is stamped with current transaction ID and the specified
  * command ID.
  *
- * If the HEAP_INSERT_SKIP_WAL option is specified, the new tuple is not
- * logged in WAL, even for a non-temp relation.  Safe usage of this behavior
- * requires that we arrange that all new tuples go into new pages not
- * containing any tuples from other transactions, and that the relation gets
- * fsync'd before commit.  (See also heap_sync() comments)
- *
  * The HEAP_INSERT_SKIP_FSM option is passed directly to
  * RelationGetBufferForTuple, which see for more info.
  *
@@ -2115,7 +2109,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	MarkBufferDirty(buffer);
 
 	/* XLOG stuff */
-	if (!(option

Re: [HACKERS] optimizing vacuum truncation scans

2015-07-22 Thread Jeff Janes
On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas  wrote:

> On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes  wrote:
> > Attached is a patch that implements the vm scan for truncation.  It
> > introduces a variable to hold the last blkno which was skipped during the
> > forward portion.  Any blocks after both this blkno and after the last
> > inspected nonempty page (which the code is already tracking) must have
> been
> > observed to be empty by the current vacuum.  Any other process rendering
> the
> > page nonempty are required to clear the vm bit, and no other process can
> set
> > the bit again during the vacuum's lifetime.  So if the bit is still set,
> the
> > page is still empty without needing to inspect it.
>
> Urgh.  So if we do this, that forever precludes having HOT pruning set
> the all-visible bit.


I wouldn't say forever, as it would be easy to revert the change if
something more important came along that conflicted with it.  I don't think
this change would grow tentacles across the code that make it hard to
revert, you would just have to take the performance hit (and by that time,
maybe HDD will truly be dead anyway and so we don't care anymore). But yes,
that is definitely a downside.  HOT pruning is one example, but also one
could envision having someone (bgwriter?) set vm bits on unindexed tables.
Or if we invent some efficient way to know that no expiring tids for a
certain block range are stored in indexes, other jobs could also set the vm
bit on indexed tables.  Or parallel vacuums in the same table, not that I
really see a reason to have those.


> At the least we'd better document that carefully
> so that nobody breaks it later.  But I wonder if there isn't some
> better approach, because I would certainly rather that we didn't
> foreclose the possibility of doing something like that in the future.
>

But where do we document it (other than in-place)?  README.HOT doesn't seem
sufficient, and there is no README.vm.

I guess add an "Assert(InRecovery || running_a_vacuum);" to
the visibilitymap_set with a comment there, except that I don't know how to
implement running_a_vacuum so that it covers manual vacs as well as
autovac.  Perhaps assert that we hold a SHARE UPDATE EXCLUSIVE on rel?

The advantage of the other approach, just force kernel read-ahead to work
for us, is that it doesn't impose any of these restrictions on future
development.  The disadvantage is that I don't know how to auto-tune it, or
auto-disable it for SSD, and it will never be as quite as efficient.

Cheers,

Jeff


Re: [HACKERS] Parallel Seq Scan

2015-07-22 Thread Robert Haas
On Mon, Jul 6, 2015 at 8:49 PM, Haribabu Kommi  wrote:
> I ran some performance tests on a 16 core machine with large shared
> buffers, so there is no IO involved.
> With the default value of cpu_tuple_comm_cost, parallel plan is not
> getting generated even if we are selecting 100K records from 40
> million records. So I changed the value to '0' and collected the
> performance readings.
>
> Here are the performance numbers:
>
> selectivity(millions)  Seq scan(ms)            Parallel scan (ms)
>                                       2 workers    4 workers    8 workers
>                  0.1      11498.93      4821.40      3305.84      3291.90
>                  0.4      10942.98      4967.46      3338.58      3374.00
>                  0.8      11619.44      5189.61      3543.86      3534.40
>                  1.5      12585.51      5718.07      4162.71      2994.90
>                  2.7      14725.66      8346.96     10429.05      8049.11
>                  5.4      18719.00     20212.33     21815.19     19026.99
>                  7.2      21955.79     28570.74     28217.60     27042.27
>
> The average table row size is around 500 bytes and query selection
> column width is around 36 bytes.
> when the query selectivity goes more than 10% of total table records,
> the parallel scan performance is dropping.

Thanks for doing this testing.  I think that is quite valuable.  I am
not too concerned about the fact that queries where more than 10% of
records are selected do not speed up.  Obviously, it would be nice to
improve that, but I think that can be left as an area for future
improvement.

One thing I noticed that is a bit dismaying is that we don't get a lot
of benefit from having more workers.  Look at the 0.1 data.  At 2
workers, if we scaled perfectly, we would be 3x faster (since the
master can do work too), but we are actually 2.4x faster.  Each
process is on the average 80% efficient.  That's respectable.  At 4
workers, we would be 5x faster with perfect scaling; here we are 3.5x
faster.   So the third and fourth worker were about 50% efficient.
Hmm, not as good.  But then going up to 8 workers bought us basically
nothing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Jim Nasby

On 7/22/15 9:15 AM, Robert Haas wrote:

I'm not proposing this feature, I'm merely asking for it to be defined in a
>way that makes it work for more than just VACUUM. Once we have a way of
>reporting useful information, other processes can be made to follow that
>mechanism, like REINDEX, ALTER TABLE etc.. I believe those things are
>important, even if we never get such information for user queries. But I
>hope we do.
>
>I won't get in the way of your search for detailed information in more
>complex forms. Both things are needed.

OK.

One idea I have is to create a system where we expose a command tag
(i.e. VACUUM) plus a series of generic fields whose specific meanings
are dependent on the command tag.  Say, 6 bigint counters, 6 float8
counters, and 3 strings up to 80 characters each.  So we have a
fixed-size chunk of shared memory per backend, and each backend that
wants to expose progress information can fill in those fields however
it likes, and we expose the results.

This would be sorta like the way pg_statistic works: the same columns
can be used for different purposes depending on what estimator will be
used to access them.


If we want to expose that level of detail, I think either JSON or arrays 
would make more sense, so we're not stuck with a limited amount of info. 
Perhaps DDL would be OK with the numbers you suggested, but 
https://www.pgcon.org/2013/schedule/events/576.en.html would not, and I 
think wanting query progress is much more common.
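
To make the trade-off concrete, here is a purely illustrative mock-up of the
two shapes being discussed; every name and number below is invented and no
such view exists:

-- Fixed generic slots keyed by a command tag:
SELECT 'VACUUM'::text AS command_tag,
       137::bigint    AS int8_counter1,   -- e.g. heap blocks scanned
       2162::bigint   AS int8_counter2,   -- e.g. total heap blocks
       1::bigint      AS int8_counter3;   -- e.g. index vacuum passes
-- Open-ended detail, as JSON:
SELECT json_build_object('command', 'VACUUM',
                         'heap_blks_scanned', 137,
                         'heap_blks_total', 2162,
                         'index_vacuum_count', 1) AS progress;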

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Jim Nasby

On 7/22/15 6:58 AM, Amit Langote wrote:

On Wed, Jul 22, 2015 at 8:19 PM, Alvaro Herrera
 wrote:


Not sure what Jim meant.  Maybe he meant to be aware of when spilling to
disk happens?  Obviously, things become slower, so maybe you need to
consider it for progress reporting purposes.



Perhaps the m_w_m determines how many dead tuples lazy_scan_heap() can
keep track of before doing a lazy_vacuum_indexes() +
lazy_vacuum_heap() round. Smaller the m_w_m, more the number of index
scans, slower the progress?


Yes. Any percent completion calculation will have to account for the 
case of needing multiple passes through all the indexes.


Each dead tuple requires 6 bytes (IIRC) of maintenance work mem. So if 
you're deleting 5M rows with m_w_m=1MB you should be getting many passes 
through the indexes. Studying the output of VACUUM VERBOSE will confirm 
that (or just throw a temporary WARNING in the path where we start the 
scan).
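
A quick way to see that, sketched with a hypothetical table name:

-- Rough check of the arithmetic above: 1MB holds about 1,048,576 / 6, i.e.
-- roughly 174k dead-tuple TIDs, so deleting 5M rows should force about
-- 5,000,000 / 174,000, i.e. around 29 rounds of index vacuuming.
SET maintenance_work_mem = '1MB';
VACUUM (VERBOSE) some_table;  -- VERBOSE reports each index-vacuum round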

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Queries runs slow on GPU with PG-Strom

2015-07-22 Thread YANG

Hello,

I've performed some tests on pg_strom according to the wiki. But it seems that
queries run slower on GPU than CPU. Can someone shed a light on what's wrong
with my settings. My setup was Quadro K620 + CUDA 7.0 (For Ubuntu 14.10) +
Ubuntu 15.04. And the results was

with pg_strom
=

explain SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;


  QUERY PLAN
---
 Aggregate  (cost=190993.70..190993.71 rows=1 width=0) (actual 
time=18792.236..18792.236 rows=1 loops=1)
   ->  Custom Scan (GpuPreAgg)  (cost=7933.07..184161.18 rows=86 width=108) 
(actual time=4249.656..18792.074 rows=77 loops=1)
 Bulkload: On (density: 100.00%)
 Reduction: NoGroup
 Device Filter: (sqrtx - '25.6'::double precision) ^ '2'::double 
precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < 
'10'::double precision)
 ->  Custom Scan (BulkScan) on t0  (cost=6933.07..182660.32 
rows=1060 width=0) (actual time=139.399..18499.246 rows=1000 loops=1)
 Planning time: 0.262 ms
 Execution time: 19268.650 ms
(8 rows)



explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;

QUERY PLAN
--
 HashAggregate  (cost=298541.48..298541.81 rows=26 width=12) (actual 
time=11311.568..11311.572 rows=26 loops=1)
   Group Key: t0.cat
   ->  Custom Scan (GpuPreAgg)  (cost=5178.82..250302.07 rows=1088 width=52) 
(actual time=3304.727..11310.021 rows=2307 loops=1)
 Bulkload: On (density: 100.00%)
 Reduction: Local + Global
 ->  Custom Scan (GpuJoin)  (cost=4178.82..248541.18 rows=1060 
width=12) (actual time=923.417..2661.113 rows=1000 loops=1)
   Bulkload: On (density: 100.00%)
   Depth 1: Logic: GpuHashJoin, HashKeys: (aid), JoinQual: (aid = 
aid), nrows_ratio: 1.
   ->  Custom Scan (BulkScan) on t0  (cost=0.00..242858.60 
rows=1060 width=16) (actual time=6.980..871.431 rows=1000 loops=1)
   ->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=4) 
(actual time=0.204..7.309 rows=4 loops=1)
 Planning time: 47.834 ms
 Execution time: 11355.103 ms
(12 rows)


without pg_strom


test=# explain analyze SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + 
(y-12.8)^2) < 10;
   
QUERY PLAN

 Aggregate  (cost=426193.03..426193.04 rows=1 width=0) (actual 
time=3880.379..3880.379 rows=1 loops=1)
   ->  Seq Scan on t0  (cost=0.00..417859.65 rows=353 width=0) (actual 
time=0.075..3859.200 rows=314063 loops=1)
 Filter: (sqrtx - '25.6'::double precision) ^ '2'::double 
precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < 
'10'::double precision)
 Rows Removed by Filter: 9685937
 Planning time: 0.411 ms
 Execution time: 3880.445 ms
(6 rows)

t=# explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;
  QUERY PLAN
--
 HashAggregate  (cost=431593.73..431594.05 rows=26 width=12) (actual 
time=4960.810..4960.812 rows=26 loops=1)
   Group Key: t0.cat
   ->  Hash Join  (cost=1234.00..381593.43 rows=1060 width=12) (actual 
time=20.859..3367.510 rows=1000 loops=1)
 Hash Cond: (t0.aid = t1.aid)
 ->  Seq Scan on t0  (cost=0.00..242858.60 rows=1060 width=16) 
(actual time=0.021..895.908 rows=1000 loops=1)
 ->  Hash  (cost=734.00..734.00 rows=4 width=4) (actual 
time=20.567..20.567 rows=4 loops=1)
   Buckets: 65536  Batches: 1  Memory Usage: 1919kB
   ->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=4) 
(actual time=0.017..11.013 rows=4 loops=1)
 Planning time: 0.567 ms
 Execution time: 4961.029 ms
(10 rows)



Here is the details how I installed pg_strom,

1. download postgresql 9.5alpha1 and compile it with

,
| ./configure --prefix=/export/pg-9.5 --enable-debug --enable-cassert
| make -j8 all
| make install
`

2. install cuda-7.0 (ubuntu 14.10 package from nvidia website)

3. download and compile pg_strom with pg_config in /export/pg-9.5/bin

,
| make
| make install
`


4. create a d

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-22 Thread Ildus Kurbangaliev

On 07/22/2015 09:10 AM, Kyotaro HORIGUCHI wrote:

Hello,

At Tue, 21 Jul 2015 14:28:25 +0300, Ildus Kurbangaliev 
 wrote in <55ae2cd9.4050...@postgrespro.ru>

On 07/21/2015 01:18 PM, Andres Freund wrote:

On 2015-07-21 13:11:36 +0300, Ildus Kurbangaliev wrote:

 /*
* Top-level transactions are identified by VirtualTransactionIDs
* comprising
diff --git a/src/include/storage/lwlock.h
b/src/include/storage/lwlock.h
index cff3b99..55b0687 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -58,6 +58,9 @@ typedef struct LWLock
   #ifdef LOCK_DEBUG
struct PGPROC *owner;   /* last exlusive owner of the lock */
   #endif
+
+ /* LWLock group, initialized as -1, calculated in first acquire */
+   int group;
   } LWLock;

I'd very much like to avoid increasing the size of struct LWLock. We
have a lot of those and I'd still like to inline them with the buffer
descriptors. Why do we need a separate group and can't reuse the
tranche?  That might require creating a few more tranches, but ...?

Andres

Do you mean moving LWLocks defined by offsets and with dynamic sizes
to separate tranches?

I think it is too much for the purpose. Only two new tranches, and
maybe one or a few new members (perhaps representing the group) of
tranches, will do, I suppose.


Can you explain why only two new tranches?
There are 13 types of lwlocks (besides the individual ones), and we need to 
separate them somehow.

It sounds like a good option, but it will require refactoring of the
current tranches. In the current implementation
I tried not to change existing things.

Now one of the most controversial points of this patch is the detail
of the implementation, which may largely affect the performance
impact.


 From the viewpoint of performance, I have some comments on the feature:

  - LWLockReportStat runs a linear search loop, which I suppose
should be avoided even if the loop count is rather small for
LWLocks, as Fujii-san said upthread or elsewhere.

It runs only once, on first acquisition. In the previous patch
it was much heavier. Anyway, this code will be removed if we
split the main tranche into smaller tranches.
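
For illustration only, here is a standalone toy sketch of the size trade-off
being discussed (the structs below are simplified, hypothetical stand-ins, not
the real LWLock): a per-lock group field grows every lock, while deriving the
group from a tranche id kept in a small table adds no per-lock overhead.

/*
 * Toy model, not PostgreSQL code: compare a per-lock "group" field against
 * a tranche-id -> group lookup table kept outside the lock struct.
 */
#include <stdint.h>
#include <stdio.h>

typedef struct ToyLWLockPlain
{
	uint16_t	tranche;	/* which tranche this lock belongs to */
	uint32_t	state;		/* placeholder for the lock word */
} ToyLWLockPlain;

typedef struct ToyLWLockWithGroup
{
	uint16_t	tranche;
	uint32_t	state;
	int32_t		group;		/* the extra per-lock field proposed above */
} ToyLWLockWithGroup;

int
main(void)
{
	const long	nlocks = 100000;	/* think of the many buffer-related locks */

	/* one small lookup table instead of a field in every lock */
	const char *tranche_group[] = {"individual", "buffer mapping", "lock manager"};

	printf("per-lock size without group field: %zu bytes\n",
		   sizeof(ToyLWLockPlain));
	printf("per-lock size with group field:    %zu bytes\n",
		   sizeof(ToyLWLockWithGroup));
	printf("extra shared memory for %ld locks: %ld bytes\n",
		   nlocks,
		   nlocks * (long) (sizeof(ToyLWLockWithGroup) - sizeof(ToyLWLockPlain)));
	printf("group of a lock in tranche 1:      %s\n", tranche_group[1]);
	return 0;
}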

  - Currently the pg_stat_activity view is the only API, which would
be a bit heavy for sampling use. It'd be appreciated to have a
far lighter means of getting the same information.

Yes, pg_stat_activity only reports the current wait,
and it's too heavy for sampling.  The main goal of this patch was
to create base structures that can be used later.

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard

2015-07-22 Thread Robert Haas
On Tue, Jul 21, 2015 at 8:34 PM, Michael Paquier
 wrote:
> On Wed, Jul 22, 2015 at 1:23 AM, Robert Haas  wrote:
>> Notice that the collation specifier is gone.  Oops.
>
> As it is not possible to specify directly a constraint for a PRIMARY
> KEY expression, what about switching dumpConstraint to have it use
> first a CREATE INDEX query with the collation and then use ALTER TABLE
> to attach the constraint to it? I am noticing that we already fetch
> the index definition in indxinfo via pg_get_indexdef. Thoughts?

I guess the question on my mind is "Why is it that you can't do this
directly when creating a primary key, but you can do it when turning
an existing index into a primary key?"

If there's a good reason for prohibiting doing this when creating a
primary key, then presumably it shouldn't be allowed when turning an
index into a primary key either.  If there's not, then why not extend
the PRIMARY KEY syntax to allow it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alpha2/Beta1

2015-07-22 Thread Robert Haas
On Wed, Jul 22, 2015 at 2:50 AM, Etsuro Fujita
 wrote:
> On 2015/07/22 12:40, Noah Misch wrote:
>> I vote for alpha2.  Comparing the "Open Issues" and "resolved after
>> 9.5alpha1"
>> sections of https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items,
>> we've
>> not transitioned to a qualitatively different level of API stability.
>
> +1 for alpha2.

+1 for alpha2 from me as well.  We have made some progress but there
is an awful lot of stuff left to get fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implementation of global temporary tables?

2015-07-22 Thread Robert Haas
On Wed, Jul 15, 2015 at 11:52 AM, Simon Riggs  wrote:
> On 15 July 2015 at 16:44, Andres Freund  wrote:
>> On 2015-07-15 16:36:12 +0100, Simon Riggs wrote:
>> > On 15 July 2015 at 16:28, Andres Freund  wrote:
>> > > I think that's generally a fair point. But here we're discussing to
>> > > add
>> > > a fair amount of wrinkles with the copy approach. The fact alone that
>> > > the oid is different will have some ugly consequences.
>> > >
>> >
>> > Why? We are creating a local temp table LIKE the global temp table. That
>> > is
>> > already a supported operation. So there is no "different oid".
>>
>> Then your locking against ALTER, DROP etc. isn't going to work.
>
> There would be two objects, both locked. The temp table is just nice and
> simple. No problem.
>
> Your optimization may work; I hope it does. My approach definitely will. So
> we could choose either.

It's not really an optimization; it's a whole different approach.  I
looked at the create-a-temp-table-on-the-fly idea back when I
implemented unlogged tables and concluded it was an unworkable mess.
Deep down in the guts of name resolution code is not the place where
you want to suddenly decide that you need to run some DDL.  So I
believe in what Andres is proposing.  I'm not necessarily going to
shout it down if somebody finds a way to make the
temp-table-on-the-fly approach work, but my view is that making that
work, although it may look superficially appealing, will eventually
make whoever has to do it hate their life; and that even if they get
it to where it sorta works, it's going to have ugly corner cases that
are almost impossible to file down.

Another advantage of Andres's approach, BTW, is that it could
potentially eventually be extended to work on Hot Standby machines.
For that to work, we'd need a separate XID space for temporary tables,
but Noah proposed that before, and I don't think it's a completely
crazy idea (just mostly crazy).  Now, maybe nobody's going to care
about that any more in 5 years if we have full-blown logical
replication deeply integrated into core, but there's a lot to like
about a design that keeps our options in that area open.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace

2015-07-22 Thread Marti Raudsepp
On Wed, Jul 22, 2015 at 5:00 PM, Robert Haas  wrote:
> +1.  I would recommend adding it to the CF *immediately* to have it
> get attention.   The CF app is basically our patch tracker.

Thanks, I have done so now: https://commitfest.postgresql.org/6/313/

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stringify MAKE_SQLSTATE()

2015-07-22 Thread Teodor Sigaev

#define MAKE_SQLSTATE(ch1,ch2,ch3,ch4,ch5) \
((char[]){(char)(ch1),(char)(ch2),(char)(ch3),(char)(ch4),(char)(ch5),(char)'\0'})


I'm pretty sure that's a gcc-ism, not standard C.
Hmm, after some digging: the feature is called compound literals and it was 
introduced in C99, although gcc supports it in C90 mode as an extension. To 
disable this support the -pedantic flag is needed. Sorry.



The real question is do we want to.  What's your use-case for this?
Just to avoid hardcoded values. It is too easy to make a mistake in a 
hardcoded value.





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ testing, doesn't it?

2015-07-22 Thread Robert Haas
On Wed, Jul 22, 2015 at 8:13 AM, Kouhei Kaigai  wrote:
>> No, what I'm concerned about is the case when scanrelid > 0.
>>
> Hmm. if scanrelid > 0, then fdw_scan_tlist should be NIL.
> I want to put Assert(scanrelid==0 || fdw_scan_tlist == NIL) just after
> the GetForeignPlan() in createplan.c.
>
> I'm curious why you tried to put valid fdw_scan_tlist for scanrelid > 0.
> It's unusual.

Allowing that was part of the point of Tom Lane's commit
1a8a4e5cde2b7755e11bde2ea7897bd650622d3e.  See the second bullet
point, after the comma.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Robert Haas
On Wed, Jul 22, 2015 at 8:24 AM, Simon Riggs  wrote:
> * An estimate of the time of completion - I liked your view that
> this prediction may be costly to request

I'm saying it may be massively unreliable, not that it may be costly.
(Someone else may have said that it would be costly, but I don't think
it was me.)

>> Most of the progress estimators I have seen over the ~30 years that
>> I've been playing with computers have been unreliable, and many of
>> those have been unreliable to the point of being annoying.  I think
>> that's likely to happen with what you are proposing too, though of
>> course like all predictions of the future it could turn out to be
>> wrong.
>
> Almost like an Optimizer then. Important, often annoyingly wrong, needs more
> work.

Yes, but with an important difference.  If the optimizer mis-estimates
the row count by 3x or 10x or 1000x, but the plan is OK anyway, it's
often the case that no one cares.  Except when the plan is bad, people
don't really care about the method used to derive it.  The same is not
true here: people will rely on the progress estimates directly, and
they will really care if they are not right.

> I'm not proposing this feature, I'm merely asking for it to be defined in a
> way that makes it work for more than just VACUUM. Once we have a way of
> reporting useful information, other processes can be made to follow that
> mechanism, like REINDEX, ALTER TABLE etc.. I believe those things are
> important, even if we never get such information for user queries. But I
> hope we do.
>
> I won't get in the way of your search for detailed information in more
> complex forms. Both things are needed.

OK.

One idea I have is to create a system where we expose a command tag
(e.g. VACUUM) plus a series of generic fields whose specific meanings
are dependent on the command tag.  Say, 6 bigint counters, 6 float8
counters, and 3 strings up to 80 characters each.  So we have a
fixed-size chunk of shared memory per backend, and each backend that
wants to expose progress information can fill in those fields however
it likes, and we expose the results.

This would be sorta like the way pg_statistic works: the same columns
can be used for different purposes depending on what estimator will be
used to access them.
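
Purely as an illustration of that shape (the struct name, the fill function,
and the way a VACUUM-like task uses the fields below are hypothetical, not an
actual PostgreSQL API), a standalone sketch might look like this:

/*
 * Standalone sketch of a fixed-size per-backend progress slot: a command
 * tag plus generic counters whose meaning depends on the tag.  Purely
 * illustrative; none of this is PostgreSQL code.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PROGRESS_NUM_INT     6
#define PROGRESS_NUM_FLOAT   6
#define PROGRESS_NUM_STRING  3
#define PROGRESS_STRING_LEN  80

typedef struct ProgressSlot
{
	char		command[PROGRESS_STRING_LEN];	/* e.g. "VACUUM" */
	int64_t		counter[PROGRESS_NUM_INT];		/* tag-specific meaning */
	double		fcounter[PROGRESS_NUM_FLOAT];
	char		message[PROGRESS_NUM_STRING][PROGRESS_STRING_LEN];
} ProgressSlot;

/* How a VACUUM-like task might fill its slot; the field assignments are arbitrary. */
static void
report_vacuum_progress(ProgressSlot *slot,
					   int64_t heap_pages_total, int64_t heap_pages_scanned,
					   int64_t index_scans_done)
{
	memset(slot, 0, sizeof(*slot));
	snprintf(slot->command, sizeof(slot->command), "VACUUM");
	slot->counter[0] = heap_pages_total;
	slot->counter[1] = heap_pages_scanned;
	slot->counter[2] = index_scans_done;
	slot->fcounter[0] = heap_pages_total > 0 ?
		(double) heap_pages_scanned * 100.0 / (double) heap_pages_total : 0.0;
	snprintf(slot->message[0], PROGRESS_STRING_LEN, "phase one: scanning heap");
}

int
main(void)
{
	ProgressSlot slot;

	report_vacuum_progress(&slot, 1000000, 250000, 1);
	printf("%s: %.1f%% of heap scanned, %lld index scan(s)\n",
		   slot.command, slot.fcounter[0], (long long) slot.counter[2]);
	return 0;
}

A view could then simply expose the raw fields and leave interpretation to the
reader, much as the pg_statistic analogy above suggests.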

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace

2015-07-22 Thread Robert Haas
On Mon, Jul 20, 2015 at 8:20 AM, Michael Paquier
 wrote:
> After a certain amount of time without anything happening, I would
> recommend just adding it to a CF to have it get attention. I imagine
> that it is one of the reasons why there is as well a category "Bug
> Fix".

+1.  I would recommend adding it to the CF *immediately* to have it
get attention.   The CF app is basically our patch tracker.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stringify MAKE_SQLSTATE()

2015-07-22 Thread Tom Lane
Teodor Sigaev  writes:
> Following the discussion of the patch at https://commitfest.postgresql.org/5/190/,
> I found (as it seems to me) a way to stringify MAKE_SQLSTATE(); the idea is to
> use a char array as a string:

> #define MAKE_SQLSTATE(ch1,ch2,ch3,ch4,ch5) \
> ((char[]){(char)(ch1),(char)(ch2),(char)(ch3),(char)(ch4),(char)(ch5),(char)'\0'})

I'm pretty sure that's a gcc-ism, not standard C.

As I mentioned in the other thread, if we wanted to provide string
versions of the ERRCODE macros, the safest way to do it would be
to create a separate header file with defines like

#define ERRCODE_WARNING_DEPRECATED_FEATURE "01P01"

Now that we generate errcodes.h from errcodes.txt anyway, it would
certainly be a trivial extension of the build infrastructure to make
a second header like this.
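
For illustration, here is a self-contained sketch of what such a generated
string-define header might boil down to and how it could be consumed; the
UNIQUE_VIOLATION define and the helper function are hypothetical examples
added here, not existing PostgreSQL code:

/*
 * Illustrative only: what a generated "string SQLSTATE" header might contain
 * and how it could be consumed.
 */
#include <stdio.h>
#include <string.h>

/* what the generated header might contain */
#define ERRCODE_WARNING_DEPRECATED_FEATURE	"01P01"
#define ERRCODE_UNIQUE_VIOLATION			"23505"

/* compare a five-character SQLSTATE pulled out of an error report */
static int
is_unique_violation(const char *sqlstate)
{
	return sqlstate != NULL && strcmp(sqlstate, ERRCODE_UNIQUE_VIOLATION) == 0;
}

int
main(void)
{
	const char *sqlstate = "23505";		/* stand-in for a reported SQLSTATE */

	printf("unique violation? %s\n", is_unique_violation(sqlstate) ? "yes" : "no");
	return 0;
}

The point, per Teodor's reply, is simply to avoid hand-typed SQLSTATE literals.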

The real question is do we want to.  What's your use-case for this?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-22 Thread Robert Haas
On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes  wrote:
> Attached is a patch that implements the vm scan for truncation.  It
> introduces a variable to hold the last blkno which was skipped during the
> forward portion.  Any blocks after both this blkno and after the last
> inspected nonempty page (which the code is already tracking) must have been
> observed to be empty by the current vacuum.  Any other process rendering the
> page nonempty is required to clear the vm bit, and no other process can set
> the bit again during the vacuum's lifetime.  So if the bit is still set, the
> page is still empty without needing to inspect it.

Urgh.  So if we do this, that forever precludes having HOT pruning set
the all-visible bit.  At the least we'd better document that carefully
so that nobody breaks it later.  But I wonder if there isn't some
better approach, because I would certainly rather that we didn't
foreclose the possibility of doing something like that in the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] stringify MAKE_SQLSTATE()

2015-07-22 Thread Teodor Sigaev

Hi!

Following the discussion of the patch at https://commitfest.postgresql.org/5/190/, 
I found (as it seems to me) a way to stringify MAKE_SQLSTATE(); the idea is to 
use a char array as a string:


#include <stdio.h>

#define MAKE_SQLSTATE(ch1,ch2,ch3,ch4,ch5) \

((char[]){(char)(ch1),(char)(ch2),(char)(ch3),(char)(ch4),(char)(ch5),(char)'\0'})

#define ERRCODE_WARNING_DEPRECATED_FEATURE MAKE_SQLSTATE('0','1','P','0','1')

int
main(int argn, char* argv[])
{
char*x = ERRCODE_WARNING_DEPRECATED_FEATURE;

printf("%s\n", x);

return 0;
}

That works for clang and gcc 4.8 with flags -Wall -Wextra -ansi (or -std=c90) 
without warnings, but doesn't work with -pedantic. Is that enough for it to work 
with other compilers?


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Simon Riggs
On 22 July 2015 at 13:00, Robert Haas  wrote:

> On Wed, Jul 22, 2015 at 3:02 AM, Simon Riggs 
> wrote:
> > For me, the user workflow looks like this:
> >
> > Worried: "Task X is taking ages? When is it expected to finish?"
> > Ops: 13:50
> > 
> > Worried: "Task X is still running? But I thought its ETA was 13:50?"
> > Ops: Now says 14:30
> > Worried: "Is it stuck, or is it making progress?"
> > Ops: Looks like its making progress
> > Worried: "Can we have a look at it and find out what its doing?"
>
> How does Ops know that it is making progress?  Just because the
> completion percentage is changing?
>

You could, but that is not the way I suggested.

We need
* Some measure of actual progress (the definition of which may vary from
action to action, e.g. blocks scanned)
* Some estimate of the total work required
* An estimate of the time of completion - I liked your view that
this prediction may be costly to request


> > In terms of VACUUM specifically: VACUUM should be able to assess
> beforehand
> > whether it will scan the indexes, or it can just assume that it will
> need to
> > scan the indexes. Perhaps VACUUM can pre-scan the VM to decide how big a
> > task it has before it starts.
>
> Well, we can assume that it will scan the indexes exactly once, but
> the actual number may be more or less; and the cost of rescanning the
> heap in phase 2 is also hard to estimate.
>
> Maybe I'm worrying over nothing, but I have a feeling that if we try
> to do what you're proposing here, we're gonna end up with this:
>
> https://xkcd.com/612/
>
> Most of the progress estimators I have seen over the ~30 years that
> I've been playing with computers have been unreliable, and many of
> those have been unreliable to the point of being annoying.  I think
> that's likely to happen with what you are proposing too, though of
> course like all predictions of the future it could turn out to be
> wrong.


Almost like an Optimizer then. Important, often annoyingly wrong, needs
more work.

I'm not proposing this feature, I'm merely asking for it to be defined in a
way that makes it work for more than just VACUUM. Once we have a way of
reporting useful information, other processes can be made to follow that
mechanism, like REINDEX, ALTER TABLE etc.. I believe those things are
important, even if we never get such information for user queries. But I
hope we do.

I won't get in the way of your search for detailed information in more
complex forms. Both things are needed.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ testing, doesn't it?

2015-07-22 Thread Kouhei Kaigai
> -Original Message-
> From: Etsuro Fujita [mailto:fujita.ets...@lab.ntt.co.jp]
> Sent: Wednesday, July 22, 2015 7:05 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ 
> testing,
> doesn't it?
> 
> Hi KaiGai-san,
> 
> On 2015/07/22 16:44, Kouhei Kaigai wrote:
> >> The latest foreign-join pushdown patch allows fdw_scan_tlist to be set
> >> to a targetlist even for simple foreign table scans.  However, since I
> >> think we assume that the test tuple of a foreign table for an EPQ
> >> testing, whether it may be copied from the whole-row var or returned by
> >> the RefetchForeignRow routine, has the rowtype declared for the foreign
> >> table, ISTM that EPQ testing doesn't work properly in such a case since
> >> that the targetlist and qual are adjusted to reference fdw_scan_tlist in
> >> such a case.  Maybe I'm missing something though.
> >>
> > Let me confirm step-by-step.
> > For EPQ testing, whole-row-reference or RefetchForeignRow pulls a record
> > with row-type compatible to the base foreign table. Then, this record
> > is stored in the es_epqTuple[] indexed by the base relation.
> >
> > According to the previous discussion, I expect these tuples are re-checked
> > by built-in execution plan, but equivalent to the sub-plan entirely pushed
> > out to the remote side.
> > Do we see the same assumption?
> 
> No, what I'm concerned about is the case when scanrelid > 0.
>
Hmm. if scanrelid > 0, then fdw_scan_tlist should be NIL.
I want to put Assert(scanrelid==0 || fdw_scan_tlist == NIL) just after
the GetForeignPlan() in createplan.c.

I'm curious why you tried to put valid fdw_scan_tlist for scanrelid > 0.
It's unusual.

> > If so, next step is enhancement of ExecScanFetch() to run the alternative
> > built-in plans towards each es_epqTuple[] records, if given scanrelid==0.
> > In this case, expression nodes adjusted to fdw_scan_tlist never called,
> > so it should not lead any problems...?
> 
> When scanrelid = 0, I think we should run the alternative plans in
> ExecScanFetch or somewhere, as you mentioned.
>
OK,

> >> I don't understand custom scans/joins exactly, but I have a similar
> >> concern for the simple-custom-scan case too.
> >>
> > In case of custom scan/join, it fetches a record using heap_fetch()
> > identified by ctid, and saved to es_epqTuple[].
> > Then, EvalPlanQual() walks down the plan-tree. Once it appears a node
> > of custom-join (scanrelid==0), it shall call the equivalent alternatives
> > if possible, or calls ExecProcNode() towards the underlying nodes then
> > re-construct its result according to the custom_scan_tlist definition.
> >
> > It does not look to me problematic.
> 
> Sorry, I don't understand what you mean.  Maybe I have to learn more
> about custom scans/joins, but thanks for the explanation!
>
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Robert Haas
On Wed, Jul 22, 2015 at 3:02 AM, Simon Riggs  wrote:
> For me, the user workflow looks like this:
>
> Worried: "Task X is taking ages? When is it expected to finish?"
> Ops: 13:50
> 
> Worried: "Task X is still running? But I thought its ETA was 13:50?"
> Ops: Now says 14:30
> Worried: "Is it stuck, or is it making progress?"
> Ops: Looks like its making progress
> Worried: "Can we have a look at it and find out what its doing?"

How does Ops know that it is making progress?  Just because the
completion percentage is changing?

> In terms of VACUUM specifically: VACUUM should be able to assess beforehand
> whether it will scan the indexes, or it can just assume that it will need to
> scan the indexes. Perhaps VACUUM can pre-scan the VM to decide how big a
> task it has before it starts.

Well, we can assume that it will scan the indexes exactly once, but
the actual number may be more or less; and the cost of rescanning the
heap in phase 2 is also hard to estimate.

Maybe I'm worrying over nothing, but I have a feeling that if we try
to do what you're proposing here, we're gonna end up with this:

https://xkcd.com/612/

Most of the progress estimators I have seen over the ~30 years that
I've been playing with computers have been unreliable, and many of
those have been unreliable to the point of being annoying.  I think
that's likely to happen with what you are proposing too, though of
course like all predictions of the future it could turn out to be
wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Amit Langote
On Wed, Jul 22, 2015 at 8:19 PM, Alvaro Herrera
 wrote:
>
> Not sure what Jim meant.  Maybe he meant to be aware of when spilling to
> disk happens?  Obviously, things become slower, so maybe you need to
> consider it for progress reporting purposes.
>

Perhaps the m_w_m determines how many dead tuples lazy_scan_heap() can
keep track of before doing a lazy_vacuum_indexes() +
lazy_vacuum_heap() round. The smaller the m_w_m, the more index
scans, and the slower the progress?
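
As a rough back-of-the-envelope illustration of that (assuming 6 bytes per
remembered dead-tuple pointer, the size of an ItemPointerData), the arithmetic
below suggests that 1MB of maintenance_work_mem and 5 million deleted rows
should already force on the order of thirty index-vacuum rounds:

/*
 * Back-of-the-envelope estimate of how many index-vacuum rounds a given
 * maintenance_work_mem forces.  Assumes 6 bytes per dead tuple pointer;
 * purely illustrative arithmetic.
 */
#include <stdio.h>

int
main(void)
{
	const long	maintenance_work_mem = 1L * 1024 * 1024;	/* 1MB, as in the test above */
	const long	bytes_per_dead_tuple = 6;
	const long	dead_tuples = 5L * 1000 * 1000;				/* 5 million deleted rows */

	long		tuples_per_round = maintenance_work_mem / bytes_per_dead_tuple;
	long		rounds = (dead_tuples + tuples_per_round - 1) / tuples_per_round;

	printf("dead-tuple TIDs that fit per round: %ld\n", tuples_per_round);
	printf("expected index-vacuum rounds:       %ld\n", rounds);
	return 0;
}

So the 1MB / 5-million-row test described above should already be triggering
many index-vacuum passes; the exhaustion just shows up as extra passes rather
than an error.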

Thanks,
Amit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Alvaro Herrera
Thakur, Sameer wrote:
> Hello,
> >I think it'd be better to combine both numbers into one report:
> >It'd also be good to standardize on where the * 100 is happening.
> Done
> >can be replaced by
> >(itemptr->ipblkid != vacrelstats->last_scanned_page)
> Get compiler error : invalid operands to binary != (have ‘BlockIdData’ and 
> ‘BlockIdData’)
> >vacrelstats->current_index_scanned_page_count++;
> Done
> Please find v3 attached.
> 
> I am struggling to create maintenance work memory exhaustion. I did the
> following: set maintenance_work_mem=1MB, inserted 10 million records in tbl1
> with 3 indexes, deleted 5 million and vacuumed. So far no error. I could keep
> bumping up the records to, say, 100 million and try to get this error.
> This seems a tedious way to simulate maintenance work memory exhaustion.
> Is there a better way?
> To insert I am using COPY (from a csv which has 10 million records) and 
> building indexes after insert is complete.

I don't think there's any maintenance work memory exhaustion that results in an
error.  The system is designed to use all the memory it is allowed to,
and to have other strategies when it's not sufficient to do the whole
sort.

Not sure what Jim meant.  Maybe he meant to be aware of when spilling to
disk happens?  Obviously, things become slower, so maybe you need to
consider it for progress reporting purposes.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] We need to support ForeignRecheck for late row locking, don't we?

2015-07-22 Thread Etsuro Fujita
While working on the issue "Foreign join pushdown vs EvalPlanQual", I
happened to notice odd behaviors of late row locking in FDWs.  An
example will be shown below using the attached postgres_fdw patch, which
is basically the same as [1], but I changed its isolation level to READ
COMMITTED and modified it so that the output parameter "updated" of
RefetchForeignRow is updated accordingly.

[Create an environment]

mydatabase=# create table mytable (a int);
mydatabase=# insert into mytable values (1);
postgres=# create extension postgres_fdw;
postgres=# create server myserver foreign data wrapper postgres_fdw
options (dbname 'mydatabase');
postgres=# create user mapping for current_user server myserver;
postgres=# create foreign table ftable (a int) server myserver options
(table_name 'mytable');

[Run concurrent transactions that behave oddly]

mydatabase=# begin;
mydatabase=# update mytable set a = a + 1;
postgres=# select * from ftable where a = 1 for update;
mydatabase=# commit;
(After the commit, the following result will be shown in the terminal
for the postgres database.  Note that the result doesn't satisfy a = 1.)
 a
---
 2
(1 row)

I think the reason for that is that we don't check pushed-down quals
inside an EPQ test, even if what was fetched by RefetchForeignRow was
an updated version of the tuple rather than the same version previously
obtained.  So, to fix this, I'd like to propose that pushed-down quals
be checked in ForeignRecheck.

Comments welcome!

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/16016.1431455...@sss.pgh.pa.us
*** a/contrib/postgres_fdw/connection.c
--- b/contrib/postgres_fdw/connection.c
***
*** 384,390  begin_remote_xact(ConnCacheEntry *entry)
  		if (IsolationIsSerializable())
  			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
  		else
! 			sql = "START TRANSACTION ISOLATION LEVEL REPEATABLE READ";
  		do_sql_command(entry->conn, sql);
  		entry->xact_depth = 1;
  	}
--- 384,390 
  		if (IsolationIsSerializable())
  			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
  		else
! 			sql = "START TRANSACTION ISOLATION LEVEL READ COMMITTED";
  		do_sql_command(entry->conn, sql);
  		entry->xact_depth = 1;
  	}
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 89,94  typedef struct PgFdwRelationInfo
--- 89,96 
   *
   * 1) SELECT statement text to be sent to the remote server
   * 2) Integer list of attribute numbers retrieved by the SELECT
+  * 3) SELECT statement text to be sent to the remote server
+  * 4) Integer list of attribute numbers retrieved by the SELECT
   *
   * These items are indexed with the enum FdwScanPrivateIndex, so an item
   * can be fetched with list_nth().  For example, to get the SELECT statement:
***
*** 99,105  enum FdwScanPrivateIndex
  	/* SQL statement to execute remotely (as a String node) */
  	FdwScanPrivateSelectSql,
  	/* Integer list of attribute numbers retrieved by the SELECT */
! 	FdwScanPrivateRetrievedAttrs
  };
  
  /*
--- 101,111 
  	/* SQL statement to execute remotely (as a String node) */
  	FdwScanPrivateSelectSql,
  	/* Integer list of attribute numbers retrieved by the SELECT */
! 	FdwScanPrivateRetrievedAttrs,
! 	/* SQL statement to execute remotely (as a String node) */
! 	FdwScanPrivateSelectSql2,
! 	/* Integer list of attribute numbers retrieved by SELECT */
! 	FdwScanPrivateRetrievedAttrs2
  };
  
  /*
***
*** 187,192  typedef struct PgFdwModifyState
--- 193,222 
  } PgFdwModifyState;
  
  /*
+  * Execution state for fetching/locking foreign rows.
+  */
+ typedef struct PgFdwFetchState
+ {
+ 	Relation	rel;			/* relcache entry for the foreign table */
+ 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
+ 
+ 	/* for remote query execution */
+ 	PGconn	   *conn;			/* connection for the fetch */
+ 	char	   *p_name;			/* name of prepared statement, if created */
+ 
+ 	/* extracted fdw_private data */
+ 	char	   *query;			/* text of SELECT command */
+ 	List	   *retrieved_attrs;	/* attr numbers retrieved by SELECT */
+ 
+ 	/* info about parameters for prepared statement */
+ 	int			p_nums;			/* number of parameters to transmit */
+ 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
+ 
+ 	/* working memory context */
+ 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+ } PgFdwFetchState;
+ 
+ /*
   * Workspace for analyzing a foreign table.
   */
  typedef struct PgFdwAnalyzeState
***
*** 277,282  static TupleTableSlot *postgresExecForeignDelete(EState *estate,
--- 307,318 
  static void postgresEndForeignModify(EState *estate,
  		 ResultRelInfo *resultRelInfo);
  static int	postgresIsForeignRelUpdatable(Relation rel);
+ static RowMarkType postgresGetForeignRowMarkType(RangeTblEntry *rte,
+ 			  LockClauseStrength strength);
+ static HeapTuple postgresRefetchForeignRow(EState *esta

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Thakur, Sameer
Hello,
>I think it'd be better to combine both numbers into one report:
>It'd also be good to standardize on where the * 100 is happening.
Done
>can be replaced by
>(itemptr->ipblkid != vacrelstats->last_scanned_page)
Get compiler error : invalid operands to binary != (have ‘BlockIdData’ and 
‘BlockIdData’)
>vacrelstats->current_index_scanned_page_count++;
Done
Please find v3 attached.

I am struggling to create maintenance work memory exhaustion. I did the
following: set maintenance_work_mem=1MB, inserted 10 million records in tbl1
with 3 indexes, deleted 5 million and vacuumed. So far no error. I could keep
bumping up the records to, say, 100 million and try to get this error.
This seems a tedious way to simulate maintenance work memory exhaustion. Is
there a better way?
To insert I am using COPY (from a csv which has 10 million records) and 
building indexes after insert is complete.
Thank you
Sameer

__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.


IndexScanProgress_v3.patch
Description: IndexScanProgress_v3.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ testing, doesn't it?

2015-07-22 Thread Etsuro Fujita
Hi KaiGai-san,

On 2015/07/22 16:44, Kouhei Kaigai wrote:
>> The latest foreign-join pushdown patch allows fdw_scan_tlist to be set
>> to a targetlist even for simple foreign table scans.  However, since I
>> think we assume that the test tuple of a foreign table for an EPQ
>> testing, whether it may be copied from the whole-row var or returned by
>> the RefetchForeignRow routine, has the rowtype declared for the foreign
>> table, ISTM that EPQ testing doesn't work properly in such a case since
>> that the targetlist and qual are adjusted to reference fdw_scan_tlist in
>> such a case.  Maybe I'm missing something though.
>>
> Let me confirm step-by-step.
> For EPQ testing, whole-row-reference or RefetchForeignRow pulls a record
> with row-type compatible to the base foreign table. Then, this record
> is stored in the es_epqTuple[] indexed by the base relation.
> 
> According to the previous discussion, I expect these tuples are re-checked
> by built-in execution plan, but equivalent to the sub-plan entirely pushed
> out to the remote side.
> Do we see the same assumption?

No, what I'm concerned about is the case when scanrelid > 0.

> If so, next step is enhancement of ExecScanFetch() to run the alternative
> built-in plans towards each es_epqTuple[] records, if given scanrelid==0.
> In this case, expression nodes adjusted to fdw_scan_tlist never called,
> so it should not lead any problems...?

When scanrelid = 0, I think we should run the alternative plans in
ExecScanFetch or somewhere, as you mentioned.

>> I don't understand custom scans/joins exactly, but I have a similar
>> concern for the simple-custom-scan case too.
>>
> In case of custom scan/join, it fetches a record using heap_fetch()
> identified by ctid, and saved to es_epqTuple[].
> Then, EvalPlanQual() walks down the plan-tree. Once it appears a node
> of custom-join (scanrelid==0), it shall call the equivalent alternatives
> if possible, or calls ExecProcNode() towards the underlying nodes then
> re-construct its result according to the custom_scan_tlist definition.
> 
> It does not look to me problematic.

Sorry, I don't understand what you mean.  Maybe I have to learn more
about custom scans/joins, but thanks for the explanation!

Best regards,
Etsuro Fujita


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] polymorphic types - enforce casting to most common type automatically

2015-07-22 Thread Heikki Linnakangas

On 07/11/2015 12:19 AM, Pavel Stehule wrote:

2015-07-10 18:43 GMT+02:00 Tom Lane :


An example of what would presumably happen if we adopted this sort of rule
(I've not checked whether the patch as written does this, but it would
logically follow) is that appending a float to an integer array would
cause the whole array to be silently promoted to float, with attendant
possible loss of precision for existing array elements.


it is based on select_common_type() - so it uses only the available implicit
casts.


Without patch:

postgres=# select pg_typeof(array_append('{1,2,3}'::int[], 1.2::float));
ERROR:  function array_append(integer[], double precision) does not exist

With patch:

postgres=# select pg_typeof(array_append('{1,2,3}'::int[], 1.2::float));
 pg_typeof

 double precision[]
(1 row)


Yeah, I agree with Tom that we don't want that change in behaviour. I'll 
mark this as rejected.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Asynchronous execution on FDW

2015-07-22 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kyotaro HORIGUCHI
> Sent: Wednesday, July 22, 2015 4:10 PM
> To: robertmh...@gmail.com
> Cc: hlinn...@iki.fi; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Asynchronous execution on FDW
> 
> Hello, thank you for the comment.
> 
> At Fri, 17 Jul 2015 14:34:53 -0400, Robert Haas  wrote
> in 
> > On Fri, Jul 3, 2015 at 4:41 PM, Heikki Linnakangas  wrote:
> > > At a quick glance, I think this has all the same problems as starting the
> > > execution at ExecInit phase. The correct way to do this is to kick off the
> > > queries in the first IterateForeignScan() call. You said that "ExecProc
> > > phase does not fit" - why not?
> >
> > What exactly are those problems?
> >
> > I can think of these:
> >
> > 1. If the scan is parametrized, we probably can't do it for lack of
> > knowledge of what they will be.  This seems easy; just don't do it in
> > that case.
> 
> We can give foreign scans an early kick only for the first shot
> if we do it outside (before) the ExecProc phase.
> 
> Nestloop
> -> SeqScan
> -> Append
>-> Foreign (Index) Scan
>-> Foreign (Index) Scan
>..
> 
> This plan presumes a precise (at least to some extent) estimate for the
> remote query, but async execution within the ExecProc phase would be
> in effect for this case.
> 
> 
> > 2. It's possible that we're down inside some subtree of the plan that
> > won't actually get executed.  This is trickier.
> 
> As for the current postgres_fdw, it is done by simply abandoning the queued
> result and then closing the cursor.
> 
> > Consider this:
> >
> > Append
> > -> Foreign Scan
> > -> Foreign Scan
> > -> Foreign Scan
> > 
> >
> > If we don't start each foreign scan until the first tuple is fetched,
> > we will not get any benefit here, because we won't fetch the first
> > tuple from query #2 until we finish reading the results of query #1.
> > If the result of the Append node will be needed in its entirety, we
> > really, really want to launch of those queries as early as possible.
> > OTOH, if there's a Limit node with a small limit on top of the Append
> > node, that could be quite wasteful.
> 
> It's the nature of speculative execution, but the Limit will be
> pushed down onto every Foreign Scan in the near future.
> 
> > We could decide not to care: after all, if our limit is
> > satisfied, we can just bang the remote connections shut, and if
> > they wasted some CPU, well, tough luck for them.  But it would
> > be nice to be smarter.  I'm not sure how, though.
> 
> An appropriate fetch size will cap the harm, and the case will be
> handled as I mentioned above for postgres_fdw.
>
Horiguchi-san,

Let me ask an elementary question.

If we have a ParallelAppend node that kicks off a background worker process for
each underlying child node in parallel, does ForeignScan need to do anything
special?

The expected waste of CPU or I/O is a common problem to be solved; however, I
don't think it requires special-case handling in ForeignScan.
What is your opinion?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-22 Thread Simon Riggs
On 10 July 2015 at 00:06, Tom Lane  wrote:

> Andres Freund  writes:
> > On 2015-07-06 11:49:54 -0400, Tom Lane wrote:
> >> Rather than reverting cab9a0656c36739f, which would re-introduce a
> >> different performance problem, perhaps we could have COPY create a new
> >> relfilenode when it does this.  That should be safe if the table was
> >> previously empty.
>
> > I'm not convinced that cab9a0656c36739f needs to survive in that
> > form. To me only allowing one COPY to benefit from the wal_level =
> > minimal optimization has a significantly higher cost than
> > cab9a0656c36739f.
>
> What evidence have you got to base that value judgement on?
>
> cab9a0656c36739f was based on an actual user complaint, so we have good
> evidence that there are people out there who care about the cost of
> truncating a table many times in one transaction.  On the other hand,
> I know of no evidence that anyone's depending on multiple sequential
> COPYs, nor intermixed COPY and INSERT, to be fast.  The original argument
> for having this COPY optimization at all was to make restoring pg_dump
> scripts in a single transaction fast; and that use-case doesn't care
> about anything but a single COPY into a virgin table.
>

We have to backpatch this fix, so it must be both simple and effective.

Heikki's suggestions may be best, maybe not, but they don't seem
backpatchable.

Tom's suggestion looks good. So does Andres' suggestion. I have coded both.

And what reason is there to think that this would fix all the problems?
>

I don't think either suggested fix could be claimed to be a great solution,
since there is little principle here, only heuristic. Heikki's solution
would be the only safe way, but is not backpatchable.

Forcing SKIP_FSM to always extend has no negative side effects in other
code paths, AFAICS.

Patches attached. Martijn, please verify.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


fix_wal_logging_copy_truncate.v1.patch
Description: Binary data


fix_copy_zero_blocks.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ testing, doesn't it?

2015-07-22 Thread Kouhei Kaigai
Fujita-san,

Sorry for my late response.

> The latest foreign-join pushdown patch allows fdw_scan_tlist to be set
> to a targetlist even for simple foreign table scans.  However, since I
> think we assume that the test tuple of a foreign table for an EPQ
> testing, whether it may be copied from the whole-row var or returned by
> the RefetchForeignRow routine, has the rowtype declared for the foreign
> table, ISTM that EPQ testing doesn't work properly in such a case since
> that the targetlist and qual are adjusted to reference fdw_scan_tlist in
> such a case.  Maybe I'm missing something though.
>
Let me confirm step-by-step.
For EPQ testing, whole-row-reference or RefetchForeignRow pulls a record
with row-type compatible to the base foreign table. Then, this record
is stored in the es_epqTuple[] indexed by the base relation.

According to the previous discussion, I expect these tuples are re-checked
by built-in execution plan, but equivalent to the sub-plan entirely pushed
out to the remote side.
Do we see the same assumption?

If so, next step is enhancement of ExecScanFetch() to run the alternative
built-in plans towards each es_epqTuple[] records, if given scanrelid==0.
In this case, expression nodes adjusted to fdw_scan_tlist never called,
so it should not lead any problems...?

> I don't understand custom scans/joins exactly, but I have a similar
> concern for the simple-custom-scan case too.
>
In case of custom scan/join, it fetches a record using heap_fetch()
identified by ctid, and saved to es_epqTuple[].
Then, EvalPlanQual() walks down the plan-tree. Once it appears a node
of custom-join (scanrelid==0), it shall call the equivalent alternatives
if possible, or calls ExecProcNode() towards the underlying nodes then
re-construct its result according to the custom_scan_tlist definition.

It does not look to me problematic.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use pg_rewind when target timeline was switched

2015-07-22 Thread Alexander Korotkov
On Wed, Jul 22, 2015 at 8:48 AM, Michael Paquier 
wrote:

> On Mon, Jul 20, 2015 at 9:18 PM, Alexander Korotkov
>  wrote:
> > the attached patch allows pg_rewind to work when the target timeline was
> > switched. Actually, this patch fixes a TODO from the pg_rewind comments.
> >
> >   /*
> >* Trace the history backwards, until we hit the target timeline.
> >*
> >* TODO: This assumes that there are no timeline switches on the target
> >* cluster after the fork.
> >*/
> >
> > This patch allows pg_rewind to handle data directory synchronization in a
> > much more general way. For instance, a user can return a promoted standby to
> > the old master.
>
> Yes. That would be useful.
>
> > In this patch the target timeline history is exposed as a global variable.
> > An index into the target timeline history is used in function interfaces
> > instead of specifying the TLI directly. Thus, SimpleXLogPageRead() can easily
> > start reading XLOGs from the next timeline when the current timeline ends.
>
> The implementation looks rather neat on a first look, but
> the attached patch fails to compile:
> pg_rewind.c:488:4: error: use of undeclared identifier 'targetEntry'
> targetEntry = &targetHistory[i];
> ^
> Nice to see as well that this has been added to the CF of September.
>

Uh, sorry. Fixed version is attached.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pg-rewind-target-switch-2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Asynchronous execution on FDW

2015-07-22 Thread Kyotaro HORIGUCHI
Hello, thank you for the comment.

At Fri, 17 Jul 2015 14:34:53 -0400, Robert Haas  wrote 
in 
> On Fri, Jul 3, 2015 at 4:41 PM, Heikki Linnakangas  wrote:
> > At a quick glance, I think this has all the same problems as starting the
> > execution at ExecInit phase. The correct way to do this is to kick off the
> > queries in the first IterateForeignScan() call. You said that "ExecProc
> > phase does not fit" - why not?
> 
> What exactly are those problems?
> 
> I can think of these:
> 
> 1. If the scan is parametrized, we probably can't do it for lack of
> knowledge of what they will be.  This seems easy; just don't do it in
> that case.

We can give foreign scans an early kick only for the first shot
if we do it outside (before) the ExecProc phase.

Nestloop
-> SeqScan
-> Append
   -> Foreign (Index) Scan
   -> Foreign (Index) Scan
   ..

This plan presumes a precise (at least to some extent) estimate for the
remote query, but async execution within the ExecProc phase would be
in effect for this case.


> 2. It's possible that we're down inside some subtree of the plan that
> won't actually get executed.  This is trickier.

As for the current postgres_fdw, it is done by simply abandoning the queued
result and then closing the cursor.

> Consider this:
> 
> Append
> -> Foreign Scan
> -> Foreign Scan
> -> Foreign Scan
> 
> 
> If we don't start each foreign scan until the first tuple is fetched,
> we will not get any benefit here, because we won't fetch the first
> tuple from query #2 until we finish reading the results of query #1.
> If the result of the Append node will be needed in its entirety, we
> really, really want to launch of those queries as early as possible.
> OTOH, if there's a Limit node with a small limit on top of the Append
> node, that could be quite wasteful.

It's the nature of speculative execution, but the Limit will be
pushed down onto every Foreign Scan in the near future.

> We could decide not to care: after all, if our limit is
> satisfied, we can just bang the remote connections shut, and if
> they wasted some CPU, well, tough luck for them.  But it would
> be nice to be smarter.  I'm not sure how, though.

An appropriate fetch size will cap the harm, and the case will be
handled as I mentioned above for postgres_fdw.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Simon Riggs
On 21 July 2015 at 21:24, Robert Haas  wrote:

> On Tue, Jun 30, 2015 at 4:32 AM, Simon Riggs 
> wrote:
> > Yes, I suggest just a single column on pg_stat_activity called
> pct_complete
> >
> > trace_completion_interval = 5s (default)
> >
> > Every interval, we report the current % complete for any operation that
> > supports it. We just show NULL if the current operation has not reported
> > anything or never will.
>
> I am deeply skeptical about the usefulness of a progress-reporting
> system that can only report one number.  I think that, in many cases,
> it won't be possible to compute an accurate completion percentage, and
> even if it is, people may want more data than that for various reasons
> anyway.
>

The goal here was to have a common metric for all tasks. Multiple numbers
are OK for me, but not extended detail (yet!)

As I said later:

Simon Riggs  wrote:
> I'm interested in seeing a report that relates to actual progress made.
> Predicted total work required is also interesting, but is much less
trustworthy figure.

I'm aware of the difficulties for VACUUM in particular and agree with your
scenario/details.

That does not deter me from wishing to see high-level information, even if it
varies or is inaccurate. The "arrival time" on my Sat Nav is still useful,
even if it changes because of traffic jams that develop while my journey is
in progress. If the value bothers me, I look at the detail. So both summary
and detail information are useful, but summary is more important even
though it is less accurate.


Instead of getting told "we're X% done" (according to some arcane
> formula), it's quite reasonable to think that people will want to get
> a bunch of values, e.g.:
>
> 1. For what percentage of heap pages have we completed phase one (HOT
> prune + mark all visible if appropriate + freeze + remember dead
> tuples)?
> 2. For what percentage of heap pages have we completed phase two (mark
> line pointers unused)?
> 3. What percentage of maintenance_work_mem are we currently using to
> remember tuples?
>
> For a query, the information we want back is likely to be even more
> complicated; e.g. EXPLAIN output with row counts and perhaps timings
> to date for each plan node.  We can insist that all status reporting
> get boiled down to one number, but I suspect we would be better off
> asking ourselves how we could let commands return a richer set of
> data.


I agree that it is desirable to have a more detailed breakdown of what is
happening. As soon as we do that we hit the need for very action-specific
information reporting, which renders the topic much harder and much more
specific.

For me, the user workflow looks like this:

Worried: "Task X is taking ages? When is it expected to finish?"
Ops: 13:50

Worried: "Task X is still running? But I thought its ETA was 13:50?"
Ops: Now says 14:30
Worried: "Is it stuck, or is it making progress?"
Ops: Looks like its making progress
Worried: "Can we have a look at it and find out what its doing?"

Worried: "When will Task Y finish?"
Ops: Monday at 11am
Worried: "Bad news! We should cancel it on Sunday evening."

The point is that nobody looks at the detailed info until we have looked at
the summary. So the summary of progress/completion time is important, even
if it is possibly wrong. The detail is also useful. I think we should have
both, but I'd like to see the summary info first, because it is the most
useful, best leading indicator of problems.

In terms of VACUUM specifically: VACUUM should be able to assess beforehand
whether it will scan the indexes, or it can just assume that it will need
to scan the indexes. Perhaps VACUUM can pre-scan the VM to decide how big a
task it has before it starts.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services