Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?

2019-10-18 Thread Andrew Gierth
> "Michael" == Michael Paquier  writes:

 > On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote:
 >> However, an alternative would be to backport the new syntax to some
 >> earlier versions. "WITH ... AS MATERIALIZED" can easily just be
 >> synonymous with "WITH ... AS" in versions prior to 12; there's no
 >> need to support "NOT MATERIALIZED" since that's explicitly
 >> requesting the new query-folding feature that only exists in 12.
 >> Would something like the attached patch against REL_11_STABLE be
 >> acceptable? I'd like to backpatch it at least as far as PostgreSQL
 >> 10.

 Michael> I am afraid that new features don't gain a backpatch. This is
 Michael> a project policy. Back-branches should just include bug fixes.

I do think an argument can be made for making an exception in this
particular case. This wouldn't be backpatching a feature, just accepting
and ignoring some of the new syntax to make upgrading easier.

-- 
Andrew (irc:RhodiumToad)




Re: v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held

2019-10-18 Thread Michael Paquier
On Fri, Oct 18, 2019 at 01:26:27PM -0500, Justin Pryzby wrote:
> Checking if anybody is working on either of these
> https://www.postgresql.org/message-id/20191013025145.GC4475%40telsasoft.com
> https://www.postgresql.org/message-id/20191015164047.GA22729%40telsasoft.com

FWIW, I have spent an hour or two poking at this issue over the last couple
of days, so I am not ignoring them; not as much time as I would have liked,
but well...  My initial look leads me to think that something is going wrong
with the cleanup of the session-level lock on the parent table taken in the
first transaction doing the REINDEX CONCURRENTLY.
--
Michael




Re: Backport "WITH ... AS MATERIALIZED" syntax to <12?

2019-10-18 Thread Michael Paquier
On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote:
> However, an alternative would be to backport the new syntax to some
> earlier versions.  "WITH ... AS MATERIALIZED" can easily just be
> synonymous with "WITH ... AS" in versions prior to 12; there's no need
> to support "NOT MATERIALIZED" since that's explicitly requesting the new
> query-folding feature that only exists in 12.  Would something like the
> attached patch against REL_11_STABLE be acceptable?  I'd like to
> backpatch it at least as far as PostgreSQL 10.

I am afraid that new features don't gain a backpatch.  This is a
project policy.  Back-branches should just include bug fixes.
--
Michael




Re: recovery_min_apply_delay in archive recovery causes assertion failure in latch

2019-10-18 Thread Michael Paquier
On Thu, Oct 17, 2019 at 02:35:13PM +0900, Michael Paquier wrote:
> ArchiveRecoveryRequested will be set to true if recovery.signal or
> standby.signal are found, so it seems to me that you can make all
> those checks more simple by removing from the equation
> StandbyModeRequested, no?  StandbyModeRequested is never set to true
> if ArchiveRecoveryRequested is not itself true.

For the sake of the archives, this has been applied by Fujii-san as of
ec1259e8.
--
Michael




Re: Remaining calls of heap_close/heap_open in the tree

2019-10-18 Thread Michael Paquier
On Fri, Oct 18, 2019 at 10:03:11AM +0900, Michael Paquier wrote:
> Not sure that's worth the trouble.  If there are no objections, I will
> remove the compatibility macros.

Okay, cleanup done with the compatibility macros removed.
--
Michael




Re: v12.0: segfault in reindex CONCURRENTLY

2019-10-18 Thread Michael Paquier
On Fri, Oct 18, 2019 at 07:30:37AM -0300, Alvaro Herrera wrote:
> Sure thing, thanks, done :-)

Thanks, Alvaro.
--
Michael




Missing error_context_stack = NULL in AutoVacWorkerMain()

2019-10-18 Thread Ashwin Agrawal
I am not sure whether this causes any actual problems, but for consistency
it seems we are missing the change below.  All other places in the code where
sigsetjmp() is used for top-level error handling set error_context_stack to
NULL.

diff --git a/src/backend/postmaster/autovacuum.c
b/src/backend/postmaster/autovacuum.c
index 073f313337..b06d0ad058 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1558,6 +1558,9 @@ AutoVacWorkerMain(int argc, char *argv[])
 */
if (sigsetjmp(local_sigjmp_buf, 1) != 0)
{
+   /* Since not using PG_TRY, must reset error stack by hand */
+   error_context_stack = NULL;
+
/* Prevents interrupts while cleaning up */
HOLD_INTERRUPTS();

This was spotted by Paul during code inspection.


Backport "WITH ... AS MATERIALIZED" syntax to <12?

2019-10-18 Thread Colin Watson
I've been struggling with how we're going to upgrade launchpad.net to
PostgreSQL 12 or newer (we're currently on 10).  We're one of those
applications that deliberately uses CTEs as optimization fences in a few
difficult places.  The provision of the MATERIALIZED keyword in 12 is
great, but the fact that it doesn't exist in earlier versions is
awkward.  We certainly don't want to upgrade our application code at the
same time as upgrading the database, and dealing with performance
degradation between the database upgrade and the application upgrade
doesn't seem great either (not to mention that it would be hard to
coordinate).  That leaves us with hacking our query compiler to add the
MATERIALIZED keyword only if it's running on 12 or newer, which would be
possible but is pretty cumbersome.

However, an alternative would be to backport the new syntax to some
earlier versions.  "WITH ... AS MATERIALIZED" can easily just be
synonymous with "WITH ... AS" in versions prior to 12; there's no need
to support "NOT MATERIALIZED" since that's explicitly requesting the new
query-folding feature that only exists in 12.  Would something like the
attached patch against REL_11_STABLE be acceptable?  I'd like to
backpatch it at least as far as PostgreSQL 10.

This compiles and passes regression tests.
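
To illustrate (table and column names here are made up), the goal is for a
query like the following to be accepted on the back branches and treated
exactly as if MATERIALIZED were absent, while on 12 the keyword explicitly
requests the old optimization-fence behaviour:

    WITH totals AS MATERIALIZED (
        SELECT customer_id, sum(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT * FROM totals WHERE total > 100;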

Thanks,

-- 
Colin Watson[cjwat...@canonical.com]
>From 063186eb678ad9831961d6319f7a4279f1029358 Mon Sep 17 00:00:00 2001
From: Colin Watson 
Date: Fri, 18 Oct 2019 14:08:11 +0100
Subject: [PATCH] Backport "WITH ... AS MATERIALIZED" syntax

Applications that deliberately use CTEs as optimization fences need to
adjust their code to prepare for PostgreSQL 12.  Unfortunately, the
MATERIALIZED keyword that they need to add isn't valid syntax in earlier
versions of PostgreSQL, so they're stuck with either upgrading the
application and the database simultaneously, accepting performance
degradation between the two parts of the upgrade, or doing complex query
compiler work to add MATERIALIZED conditionally.

It makes things much easier in these cases if the MATERIALIZED keyword
is accepted and ignored in earlier releases.  Users can then upgrade to
a suitable point release, change their application code to add
MATERIALIZED, and then upgrade to PostgreSQL 12.
---
 doc/src/sgml/queries.sgml   | 12 ++
 doc/src/sgml/ref/select.sgml| 18 +-
 src/backend/parser/gram.y   | 12 +++---
 src/test/regress/expected/subselect.out | 31 +
 src/test/regress/sql/subselect.sql  | 14 +++
 5 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 88bc189646..cc33d92133 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -2215,6 +2215,18 @@ SELECT n FROM t LIMIT 100;
rows.)
   
 
+  
+   In some cases, PostgreSQL 12 folds
+   WITH queries into the parent query, allowing joint
+   optimization of the two query levels.  You can override that decision by
+   specifying MATERIALIZED to force separate calculation
+   of the WITH query.  While versions of
+   PostgreSQL before 12 do not support folding of
+   WITH queries, specifying
+   MATERIALIZED is permitted to ease application
+   upgrades.
+  
+
   
The examples above only show WITH being used with
SELECT, but it can be attached in the same way to
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..1bd711a3cb 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -72,7 +72,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionand with_query is:
 
-with_query_name [ ( column_name [, ...] ) ] AS ( select | values | insert | update | delete )
+with_query_name [ ( column_name [, ...] ) ] AS [ MATERIALIZED ] ( select | values | insert | update | delete )
 
 TABLE [ ONLY ] table_name [ * ]
 
@@ -290,6 +290,17 @@ TABLE [ ONLY ] table_name [ * ]
 row, the results are unspecified.

 
+   
+PostgreSQL 12 folds side-effect-free
+WITH queries into the primary query in some cases.
+To override this and retain the behaviour up to
+PostgreSQL 11, mark the
+WITH query as MATERIALIZED.  That
+might be useful, for example, if the WITH query is
+being used as an optimization fence to prevent the planner from choosing
+a bad plan.
+   
+

 See  for additional information.

@@ -2087,6 +2098,11 @@ SELECT distributors.* WHERE distributors.name = 'Westward';

 ROWS FROM( ... ) is an extension of the SQL standard.

+
+   
+The MATERIALIZED option of WITH is
+an extension of the SQL standard.
+   
   
 
  
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index bc65319c2c..70df09f409 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -479,7 +479,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node 

Re: jsonb_set() strictness considered harmful to data

2019-10-18 Thread Mark Felder



On Fri, Oct 18, 2019, at 12:37, Ariadne Conill wrote:
> Hello,
> 
> I am one of the primary maintainers of Pleroma, a federated social
> networking application written in Elixir, which uses PostgreSQL in
> ways that may be considered outside the typical usage scenarios for
> PostgreSQL.
> 
> Namely, we leverage JSONB heavily as a backing store for JSON-LD
> documents[1].  We also use JSONB in combination with Ecto's "embedded
> structs" to store things like user preferences.
> 
> The fact that we can use JSONB to achieve our design goals is a
> testament to the flexibility PostgreSQL has.
> 
> However, in the process of doing so, we have discovered a serious flaw
> in the way jsonb_set() functions, but upon reading through this
> mailing list, we have discovered that this flaw appears to be an
> intentional design.[2]
> 
> A few times now, we have written migrations that do things like copy
> keys in a JSONB object to a new key, to rename them.  These migrations
> look like so:
> 
>update users set info=jsonb_set(info, '{bar}', info->'foo');
> 
> Typically, this works nicely, except for cases where evaluating
> info->'foo' results in an SQL null being returned.  When that happens,
> jsonb_set() returns an SQL null, which then results in data loss.[3]
> 
> This is not acceptable.  PostgreSQL is a database that is renowned for
> data integrity, but here it is wiping out data when it encounters a
> failure case.  The way jsonb_set() should fail in this case is to
> simply return the original input: it should NEVER return SQL null.
> 
> But hey, we've been burned by this so many times now that we'd like to
> donate a useful function to the commons, consider it a mollyguard for
> the real jsonb_set() function.
> 
> create or replace function safe_jsonb_set(target jsonb, path
> text[], new_value jsonb, create_missing boolean default true) returns
> jsonb as $$
> declare
>   result jsonb;
> begin
>   result := jsonb_set(target, path, coalesce(new_value,
> 'null'::jsonb), create_missing);
>   if result is NULL then
> return target;
>   else
> return result;
>   end if;
> end;
> $$ language plpgsql;
> 
> This safe_jsonb_set() wrapper should not be necessary.  PostgreSQL's
> own jsonb_set() should have this safety feature built in.  Without it,
> using jsonb_set() is like playing russian roulette with your data,
> which is not a reasonable expectation for a database renowned for its
> commitment to data integrity.
> 
> Please fix this bug so that we do not have to hack around this bug.
> It has probably ruined countless people's days so far.  I don't want
> to hear about how the function is strict, I'm aware it is strict, and
> that strictness is harmful.  Please fix the function so that it is
> actually safe to use.
> 
> [1]: JSON-LD stands for JSON Linked Data.  Pleroma has an "internal
> representation" that shares similar qualities to JSON-LD, so I use
> JSON-LD here as a simplification.
> 
> [2]: 
> https://www.postgresql.org/message-id/flat/qfkua9$2q0e$1...@blaine.gmane.org
> 
> [3]: https://git.pleroma.social/pleroma/pleroma/issues/1324 is an
> example of data loss induced by this issue.
> 
> Ariadne
>

This should be directed towards the hackers list, too.

What will it take to change the semantics of jsonb_set()? MySQL implements safe 
behavior here. It's a real shame Postgres does not. I'll offer a $200 bounty to 
whoever fixes it. I'm sure it's destroyed more than $200 worth of data and 
people's time by now, but it's something.
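
For anyone who wants to reproduce the reported behaviour, a minimal sketch
(mirroring the migration quoted above; jsonb_set() is declared STRICT, so a
NULL new_value makes the whole result NULL):

    SELECT jsonb_set('{"foo": 1}'::jsonb, '{bar}', NULL::jsonb);
    -- returns NULL rather than '{"foo": 1}'

    -- hence an UPDATE like this silently wipes the info column on every
    -- row where info->'foo' happens to be NULL:
    UPDATE users SET info = jsonb_set(info, '{bar}', info->'foo');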


Kind regards,



-- 
  Mark Felder
  ports-secteam & portmgr alumni
  f...@freebsd.org




Re: v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held

2019-10-18 Thread Justin Pryzby
Checking if anybody is working on either of these
https://www.postgresql.org/message-id/20191013025145.GC4475%40telsasoft.com
https://www.postgresql.org/message-id/20191015164047.GA22729%40telsasoft.com

On Sat, Oct 12, 2019 at 09:51:45PM -0500, Justin Pryzby wrote:
> I ran into this while trying to trigger the previously-reported segfault. 
> 
> CREATE TABLE t(i) AS SELECT * FROM generate_series(1,9);
> CREATE INDEX ON t(i);
> 
> [pryzbyj@database ~]$ for i in `seq 1 9`; do 
> PGOPTIONS='-cstatement_timeout=9' psql postgres --host /tmp --port 5678 -c 
> "REINDEX INDEX CONCURRENTLY t_i_idx" ; done
> ERROR:  canceling statement due to statement timeout
> ERROR:  lock ShareUpdateExclusiveLock on object 14185/47287/0 is already held
> [...]
> 
> Variations on this seem to leave the locks table (?) or something else in a
> Real Bad state, such that I cannot truncate the table or drop it; or at least
> commands are unreasonably delayed for minutes, on this otherwise-empty test
> cluster.

On Tue, Oct 15, 2019 at 11:40:47AM -0500, Justin Pryzby wrote:
> On a badly-overloaded VM, we hit the previously-reported segfault in progress
> reporting.  This left around some *ccold indices.  I tried to drop them but:
> 
> sentinel=# DROP INDEX child.alarms_null_alarm_id_idx1_ccold; -- 
> child.alarms_null_alarm_time_idx_ccold; -- alarms_null_alarm_id_idx_ccold;
> ERROR:  could not find tuple for parent of relation 41351896
> 
> Those are children of relkind=I index on relkind=p table.
> 
> postgres=# CREATE TABLE t(i int)PARTITION BY RANGE(i);
> postgres=# CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (1)TO(100);
> postgres=# INSERT INTO t1 SELECT 1 FROM generate_series(1,9);
> postgres=# CREATE INDEX ON t(i);
> 
> postgres=# begin; SELECT * FROM t; -- DO THIS IN ANOTHER SESSION
> 
> postgres=# REINDEX INDEX CONCURRENTLY t1_i_idx; -- cancel this one
> ^CCancel request sent
> ERROR:  canceling statement due to user request
> 
> postgres=# \d t1
> ...
> "t1_i_idx" btree (i)
> "t1_i_idx_ccold" btree (i) INVALID
> 
> postgres=# SELECT inhrelid::regclass FROM pg_inherits WHERE 
> inhparent='t_i_idx'::regclass;
> inhrelid
> t1_i_idx
> (1 row)
> 
> Not only can't I DROP the _ccold indexes, but also dropping the table doesn't
> cause them to be dropped, and then I can't even slash dee them anymore:
> 
> jtp=# DROP INDEX t1_i_idx_ccold;
> ERROR:  could not find tuple for parent of relation 290818869
> 
> jtp=# DROP TABLE t; -- does not fail, but ..
> 
> jtp=# \d t1_i_idx_ccold
> ERROR:  cache lookup failed for relation 290818865
> 
> jtp=# SELECT indrelid::regclass, * FROM pg_index WHERE 
> indexrelid='t1_i_idx_ccold'::regclass;
> indrelid   | 290818865
> indexrelid | 290818869
> indrelid   | 290818865
> [...]




Re: Columns correlation and adaptive query optimization

2019-10-18 Thread Konstantin Knizhnik

Smarter version of join selectivity patch handling cases like this:


explain select * from outer_tab join inner_tab using(x,y) where x=1;
   QUERY PLAN

 Nested Loop  (cost=0.42..1815.47 rows=10 width=12)
   Join Filter: (outer_tab.y = inner_tab.y)
   ->  Seq Scan on outer_tab  (cost=0.00..1791.00 rows=1 width=12)
 Filter: (x = 1)
   ->  Index Only Scan using inner_tab_x_y_idx on inner_tab 
(cost=0.42..24.35 rows=10 width=8)

 Index Cond: (x = 1)
(6 rows)
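
The kind of multicolumn statistics the patch creates (or, with
auto_explain.suggest_only = on, merely suggests) can also be built by hand
today; a rough sketch for the x and y columns shown in the plan above:

    CREATE STATISTICS inner_tab_x_y (ndistinct, dependencies)
        ON x, y FROM inner_tab;
    ANALYZE inner_tab;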


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index a9536c2..915a204 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -13,12 +13,25 @@
 #include "postgres.h"
 
 #include 
+#include 
 
 #include "access/parallel.h"
 #include "commands/explain.h"
+#include "commands/defrem.h"
 #include "executor/instrument.h"
 #include "jit/jit.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/cost.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/planmain.h"
+#include "parser/parsetree.h"
+#include "storage/ipc.h"
+#include "statistics/statistics.h"
 #include "utils/guc.h"
+#include "utils/syscache.h"
+#include "utils/lsyscache.h"
+#include "utils/ruleutils.h"
 
 PG_MODULE_MAGIC;
 
@@ -33,7 +46,9 @@ static bool auto_explain_log_settings = false;
 static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static int	auto_explain_log_level = LOG;
 static bool auto_explain_log_nested_statements = false;
+static bool auto_explain_suggest_only = false;
 static double auto_explain_sample_rate = 1;
+static double auto_explain_add_statistics_threshold = 0.0;
 
 static const struct config_enum_entry format_options[] = {
 	{"text", EXPLAIN_FORMAT_TEXT, false},
@@ -218,6 +233,30 @@ _PG_init(void)
 			 NULL,
 			 NULL);
 
+	DefineCustomRealVariable("auto_explain.add_statistics_threshold",
+			 "Sets the threshold for actual/estimated #rows ratio triggering creation of multicolumn statistic for the related columns.",
+			 "Zero disables implicit creation of multicolumn statistic.",
+			 &auto_explain_add_statistics_threshold,
+			 0.0,
+			 0.0,
+			 INT_MAX,
+			 PGC_SUSET,
+			 0,
+			 NULL,
+			 NULL,
+			 NULL);
+
+	DefineCustomBoolVariable("auto_explain.suggest_only",
+			 "Do not create statistic but just record in WAL suggested create statistics statement.",
+			 NULL,
+			 &auto_explain_suggest_only,
+			 false,
+			 PGC_SUSET,
+			 0,
+			 NULL,
+			 NULL,
+			 NULL);
+
 	EmitWarningsOnPlaceholders("auto_explain");
 
 	/* Install hooks. */
@@ -353,6 +392,230 @@ explain_ExecutorFinish(QueryDesc *queryDesc)
 	PG_END_TRY();
 }
 
+static void
+AddMultiColumnStatisticsForNode(PlanState *planstate, ExplainState *es);
+
+static void
+AddMultiColumnStatisticsForSubPlans(List *plans, ExplainState *es)
+{
+	ListCell   *lst;
+
+	foreach(lst, plans)
+	{
+		SubPlanState *sps = (SubPlanState *) lfirst(lst);
+
+		AddMultiColumnStatisticsForNode(sps->planstate, es);
+	}
+}
+
+static void
+AddMultiColumnStatisticsForMemberNodes(PlanState **planstates, int nsubnodes,
+	ExplainState *es)
+{
+	int			j;
+
+	for (j = 0; j < nsubnodes; j++)
+		AddMultiColumnStatisticsForNode(planstates[j], es);
+}
+
+static int
+vars_list_comparator(const ListCell *a, const ListCell *b)
+{
+	char* va = strVal((Value *) linitial(((ColumnRef *)lfirst(a))->fields));
+	char* vb = strVal((Value *) linitial(((ColumnRef *)lfirst(b))->fields));
+	return strcmp(va, vb);
+}
+
+static void
+AddMultiColumnStatisticsForQual(void* qual, ExplainState *es)
+{
+	List *vars = NULL;
+	ListCell* lc;
+	foreach (lc, qual)
+	{
+		Node* node = (Node*)lfirst(lc);
+		if (IsA(node, RestrictInfo))
+			node = (Node*)((RestrictInfo*)node)->clause;
+		vars = list_concat(vars, pull_vars_of_level(node, 0));
+	}
+	while (vars != NULL)
+	{
+		ListCell *cell;
+		List *cols = NULL;
+		Index varno = 0;
+		Bitmapset* colmap = NULL;
+
+		foreach (cell, vars)
+		{
+			Node* node = (Node *) lfirst(cell);
+			if (IsA(node, Var))
+			{
+Var *var = (Var *) node;
+if (cols == NULL || var->varnoold == varno)
+{
+	varno = var->varnoold;
+	if (var->varattno > 0 &&
+		!bms_is_member(var->varattno, colmap) &&
+		varno >= 1 &&
+		varno <= list_length(es->rtable) &&
+		list_length(cols) < STATS_MAX_DIMENSIONS)
+	{
+		RangeTblEntry *rte = rt_fetch(varno, es->rtable);
+		if (rte->rtekind == RTE_RELATION)
+		{
+			ColumnRef  *col = makeNode(ColumnRef);
+			char *colname = get_rte_attribute_name(rte, var->varattno);
+			col->fields = list_make1(makeString(colname));
+			cols = lappend(cols, col);
+			colmap = bms_add_member(colmap, 

Missing constant propagation in planner on hash quals causes join slowdown

2019-10-18 Thread Hans Buschmann
Hi Hackers,

While optimising our application I stumbled over the join quals we use very
often.
In general this concerns datasets that are subdivided into chunks, like
years, seasons (here half a year), multiple tenants in an OLTP system, etc.
In these cases many tables are joined only to data of the same chunk, always
identified by a separating column in the table (here: xx_season).

(I tested this on PG 12.0 on Windows 64 bit, but saw similar results on older
stable releases and other OSes.)

Here is the test case, also in the attached file:
(I chose to join 2 tables with 2 seasons (2 and 3) of about 1 million records
per season.  I put some randomness into the table creation to simulate the
normal situation in OLTP systems.)

--- Source start

drop table if exists tmaster;

create table tmaster (
id_t1 integer,
t1_season integer,
t1_id_t2 integer,
t1_value integer,
t1_cdescr varchar,
primary key (id_t1)
);

--


select setseed (0.34512);

insert into tmaster
select
 inum
,iseason
,row_number () over () as irow
,irandom
,'TXT: '||irandom::varchar
from (
select 
 inum::integer
,((inum>>20)+2)::integer as iseason
,inum::integer + (50*random())::integer as irandom
from generate_series (1,(1<<21)) as inum
order by irandom
)qg
;

alter table tmaster add constraint uk_master_season_id unique (t1_season,id_t1);



drop table if exists tfact;

create table tfact (
id_t2 integer,
t2_season integer,
t2_value integer,
t2_cdescr varchar,
primary key (id_t2)
);

--


select setseed (-0.76543);

insert into tfact
select
 qg.*
,'FKT: '||irandom::varchar
from (
select 
 inum::integer
,((inum>>20)+2)::integer as iseason
,inum::integer + (50*random())::integer as irandom
from generate_series (1,(1<<21)) as inum
order by irandom
)qg
;

alter table tfact add constraint uk_fact_season_id unique (t2_season,id_t2);

-

-- slower:

explain (analyze, verbose, costs, settings, buffers)
select *
from tmaster
left join tfact on id_t2=t1_id_t2 and t2_season=t1_season
where t1_season=3
;

-- faster by setting a constant in left join on condition:

explain (analyze, verbose, costs, settings, buffers)
select *
from tmaster
left join tfact on id_t2=t1_id_t2 and t2_season=3 --t1_season
where t1_season=3
;

--- Source end

The results for the first query:

explain (analyze, verbose, costs, settings, buffers)
select *
from tmaster
left join tfact on id_t2=t1_id_t2 and t2_season=t1_season
where t1_season=3
;


QUERY PLAN
--
 Hash Left Join  (cost=53436.01..111610.15 rows=1046129 width=52) (actual 
time=822.784..2476.573 rows=1048576 loops=1)
   Output: tmaster.id_t1, tmaster.t1_season, tmaster.t1_id_t2, 
tmaster.t1_value, tmaster.t1_cdescr, tfact.id_t2, tfact.t2_season, 
tfact.t2_value, tfact.t2_cdescr
   Inner Unique: true
   Hash Cond: ((tmaster.t1_season = tfact.t2_season) AND (tmaster.t1_id_t2 = 
tfact.id_t2))
   Buffers: shared hit=2102193, temp read=10442 written=10442
   ->  Index Scan using uk_master_season_id on public.tmaster  
(cost=0.43..32263.38 rows=1046129 width=28) (actual time=0.008..565.222 
rows=1048576 loops=1)
 Output: tmaster.id_t1, tmaster.t1_season, tmaster.t1_id_t2, 
tmaster.t1_value, tmaster.t1_cdescr
 Index Cond: (tmaster.t1_season = 3)
 Buffers: shared hit=1051086
   ->  Hash  (cost=31668.49..31668.49 rows=1043473 width=24) (actual 
time=820.960..820.961 rows=1048576 loops=1)
 Output: tfact.id_t2, tfact.t2_season, tfact.t2_value, tfact.t2_cdescr
 Buckets: 524288 (originally 524288)  Batches: 4 (originally 2)  Memory 
Usage: 28673kB
 Buffers: shared hit=1051107, temp written=4316
 ->  Index Scan using uk_fact_season_id on public.tfact  
(cost=0.43..31668.49 rows=1043473 width=24) (actual time=0.024..598.648 
rows=1048576 loops=1)
   Output: tfact.id_t2, tfact.t2_season, tfact.t2_value, 
tfact.t2_cdescr
   Index Cond: (tfact.t2_season = 3)
   Buffers: shared hit=1051107
 Settings: effective_cache_size = '8GB', random_page_cost = '1', temp_buffers = 
'32MB', work_mem = '32MB'
 Planning Time: 0.627 ms
 Execution Time: 2502.702 ms
(20 rows)

and for the second one:

explain (analyze, verbose, costs, settings, buffers)
select *
from tmaster
left join tfact on id_t2=t1_id_t2 and t2_season=3 --t1_season
where t1_season=3
;


QUERY PLAN
--
 Hash Left Join  (cost=50827.33..106255.38 rows=1046129 width=52) (actual 
time=758.086..2313.175 rows=1048576 loops=1)
   

Re: Bug about drop index concurrently

2019-10-18 Thread Tomas Vondra

Hi,

I can trivially reproduce this - it's enough to create a master-standby
setup, and then do this on the master

 CREATE TABLE t (a int, b int);
 INSERT INTO t SELECT i, i FROM generate_series(1,1) s(i);

and run pgbench with this script

 DROP INDEX CONCURRENTLY IF EXISTS t_a_idx;
 CREATE INDEX t_a_idx ON t(a);

while on the standby there's another pgbench running this script

 EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1;

and it fails pretty fast for me. With an extra assert(false) added to
src/backend/access/common/relation.c I get a backtrace like this:

   Program terminated with signal SIGABRT, Aborted.
   #0  0x7c32e458fe35 in raise () from /lib64/libc.so.6
   Missing separate debuginfos, use: dnf debuginfo-install 
glibc-2.29-22.fc30.x86_64
   (gdb) bt
   #0  0x7c32e458fe35 in raise () from /lib64/libc.so.6
   #1  0x7c32e457a895 in abort () from /lib64/libc.so.6
   #2  0x00a58579 in ExceptionalCondition (conditionName=0xacd9bc "!(0)", 
errorType=0xacd95b "FailedAssertion", fileName=0xacd950 "relation.c", lineNumber=64) at 
assert.c:54
   #3  0x0048d1bd in relation_open (relationId=38216, lockmode=1) at 
relation.c:64
   #4  0x005082e4 in index_open (relationId=38216, lockmode=1) at 
indexam.c:130
   #5  0x0080ac3f in get_relation_info (root=0x21698b8, 
relationObjectId=16385, inhparent=false, rel=0x220ce60) at plancat.c:196
   #6  0x008118c6 in build_simple_rel (root=0x21698b8, relid=1, 
parent=0x0) at relnode.c:292
   #7  0x007d485d in add_base_rels_to_query (root=0x21698b8, 
jtnode=0x2169478) at initsplan.c:114
   #8  0x007d48a3 in add_base_rels_to_query (root=0x21698b8, 
jtnode=0x21693e0) at initsplan.c:122
   #9  0x007d8fad in query_planner (root=0x21698b8, qp_callback=0x7ded97 
, qp_extra=0x7fffa4834f10) at planmain.c:168
   #10 0x007dc316 in grouping_planner (root=0x21698b8, 
inheritance_update=false, tuple_fraction=0) at planner.c:2048
   #11 0x007da7ca in subquery_planner (glob=0x220d078, parse=0x2168f78, 
parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:1012
   #12 0x007d942c in standard_planner (parse=0x2168f78, 
cursorOptions=256, boundParams=0x0) at planner.c:406
   #13 0x007d91e8 in planner (parse=0x2168f78, cursorOptions=256, 
boundParams=0x0) at planner.c:275
   #14 0x008e1b0d in pg_plan_query (querytree=0x2168f78, 
cursorOptions=256, boundParams=0x0) at postgres.c:878
   #15 0x00658683 in ExplainOneQuery (query=0x2168f78, cursorOptions=256, 
into=0x0, es=0x220cd90, queryString=0x21407b8 "explain analyze select * from t where 
a = 1;", params=0x0, queryEnv=0x0) at explain.c:367
   #16 0x00658386 in ExplainQuery (pstate=0x220cc28, stmt=0x2141728, 
queryString=0x21407b8 "explain analyze select * from t where a = 1;", 
params=0x0, queryEnv=0x0, dest=0x220cb90) at explain.c:255
   #17 0x008ea218 in standard_ProcessUtility (pstmt=0x21425c0, 
queryString=0x21407b8 "explain analyze select * from t where a = 1;", 
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x220cb90,
   completionTag=0x7fffa48355c0 "") at utility.c:675
   #18 0x008e9a45 in ProcessUtility (pstmt=0x21425c0, queryString=0x21407b8 
"explain analyze select * from t where a = 1;", 
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x220cb90,
   completionTag=0x7fffa48355c0 "") at utility.c:360
   #19 0x008e8a0c in PortalRunUtility (portal=0x219c278, pstmt=0x21425c0, 
isTopLevel=true, setHoldSnapshot=true, dest=0x220cb90, completionTag=0x7fffa48355c0 
"") at pquery.c:1175
   #20 0x008e871a in FillPortalStore (portal=0x219c278, 
isTopLevel=true) at pquery.c:1035
   #21 0x008e8075 in PortalRun (portal=0x219c278, count=9223372036854775807, 
isTopLevel=true, run_once=true, dest=0x21efb90, altdest=0x21efb90, 
completionTag=0x7fffa48357b0 "") at pquery.c:765
   #22 0x008e207c in exec_simple_query (query_string=0x21407b8 "explain 
analyze select * from t where a = 1;") at postgres.c:1215
   #23 0x008e636e in PostgresMain (argc=1, argv=0x216c600, dbname=0x216c4e0 
"test", username=0x213c3f8 "user") at postgres.c:4236
   #24 0x0083c71e in BackendRun (port=0x2165850) at postmaster.c:4437
   #25 0x0083beef in BackendStartup (port=0x2165850) at 
postmaster.c:4128
   #26 0x00838313 in ServerLoop () at postmaster.c:1704
   #27 0x00837bbf in PostmasterMain (argc=3, argv=0x213a360) at 
postmaster.c:1377
   #28 0x00759643 in main (argc=3, argv=0x213a360) at main.c:228

So my guess is the root cause is pretty simple - we close/unlock the
indexes after completing the query, but then EXPLAIN tries to open it
again when producing the explain plan.

I don't have a very good idea how to fix this, as explain has no idea
which indexes will be used by the query, etc.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com

Re: configure fails for perl check on CentOS8

2019-10-18 Thread Tom Lane
Kyotaro Horiguchi  writes:
> The immediately problematic command generated by autoconf is:
> ...
> /usr/bin/ld: /tmp/ccGxodNv.o: relocation R_X86_64_32 against symbol 
> `PL_memory_wrap' can not be used when making a PIE object; recompile with 
> -fPIC
> /usr/bin/ld: final link failed: Nonrepresentable section on output
> collect2: error: ld returned 1 exit status

> Very interestingly, I don't get the error when "-O0" is changed to "-O2".
> It is because gcc eliminates PL_memory_wrap, maybe by inlining.

Yeah, probably so.  But I don't like the idea of fixing a problem
triggered by user-supplied CFLAGS by injecting more cflags from
elsewhere.  That seems likely to be counterproductive, or at
least it risks overriding what the user wanted.

Can we fix this by using something other than perl_alloc() as
the tested-for function?  That is surely a pretty arbitrary
choice.  Are there any standard Perl entry points that are just
plain functions with no weird macro expansions?

regards, tom lane




Re: Add Change Badges to documentation

2019-10-18 Thread Tomas Vondra

On Fri, Oct 18, 2019 at 07:54:18AM -0400, Corey Huinker wrote:

Attached is a patch to implement change badges in our documentation.

What's a change badge? It's my term for a visual cue in the documentation
used to indicate that the nearby section of documentation is new in this
version or otherwise changed from the previous version.

One example of change badges being used is in the DocBook documentation
reference:
https://tdg.docbook.org/tdg/4.5/ref-elements.html#common.attributes

Docbook used graphical badges, which seemed to be a bad idea. Instead, I
went with a decorated text span like one finds in gmail labels or Reddit
"flair".



Looks useful. I sometimes need to look at a command in version X and see
what changed since version Y. Currently I do that by opening both pages
and visually comparing them, so those badges make it easier.


The badges are implemented via using the "revision" attribute available on
all docbook tags. All one needs to do to indicate a change is to change one
tag, and add a revision attribute. For example:



will add a small green text box with the text "new in 13" immediately
preceding the rendered elements. I have attached a screenshot
(badges_in_acronyms.png) of an example of this from my browser viewing
changes to the acronyms.html file. This obviously lacks the polish of
viewing the page on a full website, but it does give you an idea of the
flexibility of the change badge, and where badge placement is (and is not)
a good idea.

What are the benefits of using this?

I think the benefits are as follows:

1. It shows a casual user what pieces are new on that page (new functions,
new keywords, new command options, etc).



Yep.


2. It also works in the negative: a user can quickly skim a page, and
lacking any badges, feel confident that everything there works in the way
that it did in version N-1.



Not sure about this. It'd require marking all changes with the badge,
but we'll presumably mark only the large-ish changes, and it's unclear
where the threshold is.

It also does not work when removing a block of text (e.g. when removing
some limitation), although it's true we often add a new para too.


3. It also acts as a subtle cue for the user to click on the previous
version to see what it used to look like, confident that there *will* be a
difference on the previous version.


How would we implement this?

1. All new documentation pages would get a "NEW" badge in their title.

2. New function definitions, new command options, etc would get a "NEW"
badge as visually close to the change as is practical.

3. Changes to existing functions, options, etc. would get a badge of
"UPDATED"

4. At major release time, we could do one of two things:

4a. We could keep the NEW/UPDATED badges in the fixed release version, and
then completely remove them from the master, because for version N+1, they
won't be new anymore. This can be accomplished with an XSL transform
looking for any tag with the "revision" attribute

4b. We could code in the version number at release time, and leave it in
place. So in version 14 you could find both "v13" and "v14" badges, and in
version 15 you could find badges for 15, 14, and 13. At some point (say
v17), we start retiring the v13 badges, and in v18 we'd retire the v14
badges, and so on, to keep the clutter to a minimum.



Presumably we could keep the SGML source and only decide which badges to
ignore during build of the docs. That would however require somewhat
more structured approach - now it's a single attribute with free text,
which does not really allow easy filtering. With separate attributes for
new/removed bits, e.g.

  

and

  

the filtering would be much easier. But my experience with SGML is
rather limited, so maybe I'm wrong.


Back to the patch:
I implemented this only for html output, and the colors I chose are very
off-brand for postgres, so that will have to change. There's probably some
spacing/padding issues I haven't thought of. Please try it out, make some
modifications to existing document pages to see how badges would work in
those contexts.


Haven't looked yet, but I agree the colors might need a change - that's
a rather minor detail, though.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: UPSERT on view does not find constraint by name

2019-10-18 Thread Tom Lane
Jeremy Finzel  writes:
> test=# CREATE TEMP TABLE foo (id int primary key);
> CREATE TABLE
> test=# CREATE VIEW bar AS SELECT * FROM foo;
> NOTICE:  view "bar" will be a temporary view
> CREATE VIEW
> ...
> test=# INSERT INTO bar (id)
> VALUES (1)
> ON CONFLICT ON CONSTRAINT foo_pkey
> DO NOTHING;
> ERROR:  constraint "foo_pkey" for table "bar" does not exist
> test=# INSERT INTO bar (id)
> VALUES (1)
> ON CONFLICT (id)
> DO NOTHING;
> INSERT 0 0

> Of interest are the last 2 statements above.  ON CONFLICT on the constraint
> name does not work, but it does work by field name.  I'm not saying it
> *should* work both ways, but I'm more wondering if this is
> known/expected/desired behavior.

The first case looks perfectly normal to me: there is no "foo_pkey"
constraint associated with the "bar" view.  It is interesting that
the second case drills down to find there's an underlying constraint,
but that seems like a bit of a hack :-(.

Poking at it a little more closely, it seems like the first case
involves a parse-time constraint lookup, while the second case
postpones the lookup to plan time, and so the second case works
because the view has already been expanded into a direct reference
to the underlying table.  Maybe it wasn't good to do those cases
differently.  I can't get too excited about it though.

regards, tom lane




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-18 Thread Stephen Frost
Greetings,

* Chapman Flack (c...@anastigmatix.net) wrote:
> On 10/18/19 08:18, Stephen Frost wrote:
> > I realize that I need to don some fireproof gear for suggesting this,
> > but I really wonder how much fallout we'd have from just allowing {} to
> > be used..  It's about a billion[1] times cleaner and more sensible than
> > using {0} and doesn't create a dependency on what the first element of
> > the struct is..
> 
> I guess the non-flamey empirical question would be, if it's not ISO C,
> are we supporting any compiler that doesn't understand it?

Right, that's basically what I was trying to ask. :)

Thanks,

Stephen




Re: Clean up MinGW def file generation

2019-10-18 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Oct-17, Michael Paquier wrote:
>> On Tue, Oct 15, 2019 at 09:00:23AM +0200, Peter Eisentraut wrote:
>>> I think we can clean this up and just have the regular ddl.def built
>>> normally at build time if required.
>>> Does anyone know more about this?

> Well, yes, but that code originates from much earlier.  For example
> 2a63c1602d9d (Tom Lane, Oct. 2004) is the one that created the libpq
> ones.

Yeah, the comment that Peter complained about is mine.  I believe the
desire to avoid depending on "sed" at build time was focused on our
old support for building libpq with Borland C (and not much else).
Since this makefile infrastructure is now only used for MinGW, I agree
we ought to be able to quit shipping those files in tarballs.

I think there could be some .gitignore cleanup done along with this.
Notably, I see exclusions for /exports.list in several places, but no
other references to that name --- isn't that an intermediate file that
we used to generate while creating these files?

regards, tom lane




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-18 Thread Chapman Flack
On 10/18/19 08:18, Stephen Frost wrote:
> I realize that I need to don some fireproof gear for suggesting this,
> but I really wonder how much fallout we'd have from just allowing {} to
> be used..  It's about a billion[1] times cleaner and more sensible than
> using {0} and doesn't create a dependency on what the first element of
> the struct is..

I guess the non-flamey empirical question would be, if it's not ISO C,
are we supporting any compiler that doesn't understand it?

-Chap




Re: UPSERT on view does not find constraint by name

2019-10-18 Thread Jeremy Finzel
On Fri, Oct 18, 2019 at 3:42 AM Tom Lane  wrote:

> Jeremy Finzel  writes:
> > I'm not sure if this can be considered a bug or not, but it is perhaps
> > unexpected.  I found that when using a view that is simply select * from
> > table, then doing INSERT ... ON CONFLICT ON CONSTRAINT constraint_name on
> > that view, it does not find the constraint and errors out.  But it does
> > find the constraint if one lists the columns instead.
>
> I'm confused by this report.  The view wouldn't have any constraints,
> and experimenting shows that the parser won't let you name a
> constraint of the underlying table here.  So would you provide a
> concrete example of what you're talking about?
>
> regards, tom lane
>

Apologies for the lack of clarity.  Here is a simple example of what I mean:

test=# CREATE TEMP TABLE foo (id int primary key);
CREATE TABLE
test=# \d foo
   Table "pg_temp_4.foo"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           | not null |
Indexes:
"foo_pkey" PRIMARY KEY, btree (id)

test=# CREATE VIEW bar AS SELECT * FROM foo;
NOTICE:  view "bar" will be a temporary view
CREATE VIEW
test=# INSERT INTO foo (id)
test-# VALUES (1)
test-# ON CONFLICT ON CONSTRAINT foo_pkey
test-# DO NOTHING;
INSERT 0 1
test=# INSERT INTO foo (id)
VALUES (1)
ON CONFLICT ON CONSTRAINT foo_pkey
DO NOTHING;
INSERT 0 0
test=# INSERT INTO foo (id)
VALUES (1)
ON CONFLICT ON CONSTRAINT foo_pkey
DO NOTHING;
INSERT 0 0
test=# INSERT INTO bar (id)
VALUES (1)
ON CONFLICT ON CONSTRAINT foo_pkey
DO NOTHING;
ERROR:  constraint "foo_pkey" for table "bar" does not exist
test=# INSERT INTO bar (id)
VALUES (1)
ON CONFLICT (id)
DO NOTHING;
INSERT 0 0



Of interest are the last 2 statements above.  ON CONFLICT on the constraint
name does not work, but it does work by field name.  I'm not saying it
*should* work both ways, but I'm more wondering if this is
known/expected/desired behavior.

The point of interest for us is that we frequently preserve a table's
"public API" by swapping out a table for a view as above, for instance in
order to rebuild a table behind the scenes without breaking table usage.
The above case is a rare example where that doesn't work; in any case I
advise (as do the docs) not to use ON CONFLICT ON CONSTRAINT there, but
rather to list the field names instead.

Thanks,
Jeremy


RE: Copy data to DSA area

2019-10-18 Thread ideriha.take...@fujitsu.com
>>ShmZoneContext for SharedPlan and SharedRelCache is not implemented but
>>I'm going to do it following your points.
>
>After looking into the existing code, I think the Generation Memory Context
>has a similar purpose, so I'll implement ShmZoneContext by referring to it.
>The Generation context manages chunks that have almost the same life span.
>ShmZoneContext would dsa_allocate() a block, hand out chunks from it, and
>free the chunks and the block all together.

I added ShmZoneContext to my patch.
I haven't added detailed comments or a test set yet, so let me explain how to
use it here.  I followed Thomas' suggestion.

At startup, a ShmZoneContext is created in shared memory by
ShmZoneContextCreateOrigin().  Before allocating memory, another context is
created and set to a short-lived parent context via MemoryContextClone() so
that objects and contexts are freed automatically.  Then you can use palloc(),
which returns a chunk from a dsa_allocated block.  When you use
MemoryContextSetParent() to re-parent to the long-lived ShmZoneContext, you
need to acquire a lock to protect the parent-child path.  The LWLock object is
obtained with ShmZoneContextGetLock(); every cloned ShmZoneContext uses the
same lock instance.  If you want to free an allocated object, use
MemoryContextDelete().  After the context has become long-lived, you need to
take the lock again around MemoryContextDelete(), in order to protect the
MemoryContext parent-child path.

Thinking about the shared RelCache/PlanCache use case, allocation happens only
before the parent context is switched to the long-lived shared one, so I think
we don't need to take the lock during palloc().
I also think MemoryContextDelete() should be done only after the allocated
objects have been removed from the shared index structure (e.g. a hash table
or list in shared memory), so that no other process can still be looking at
them.

What do you think?

Regards,
Takeshi Ideriha


shm_zone_retail_context-v2.patch
Description: shm_zone_retail_context-v2.patch


Re: Non working timeout detection in logical worker

2019-10-18 Thread Jehan-Guillaume (ioguix) de Rorthais
On Fri, 18 Oct 2019 07:47:13 +0200
Julien Rouhaud  wrote:

> On Fri, Oct 18, 2019 at 7:32 AM Michael Paquier  wrote:
> >
> > On Thu, Oct 17, 2019 at 08:00:15PM +0200, Julien Rouhaud wrote:  
> > > Jehan-Guillaume (in Cc) reported me today a problem with logical
> > > replication, where in case of network issue the walsender is correctly
> > > terminating at the given wal_sender_timeout but the logical worker
> > > kept waiting indefinitely.
> > >
> > > The issue is apparently a simple thinko, the timestamp of the last
> > > received activity being unconditionally set at the beginning of the
> > > main processing loop, making any reasonable timeout setting
> > > ineffective.  Trivial patch to fix the problem attached.  
> >
> > Right, good catch.  That's indeed incorrect.  The current code would
> > just keep resetting the timeout if walrcv_receive() returns 0 roughly
> > once per NAPTIME_PER_CYCLE.  The ping sent to the server once reaching
> > half of wal_receiver_timeout was also broken because of that.
> >
> > In short, applied and back-patched down to 10.  
> 
> Thanks Michael!

Thank you both guys!




Re: [Patch proposal] libpq portal support

2019-10-18 Thread Craig Ringer
On Thu, 17 Oct 2019 at 03:12, Sergei Fedorov 
wrote:

> Hello everybody,
>
> Our company was in desperate need of portals in async interface of libpq,
> so we patched it.
>
> We would be happy to upstream the changes.
>
> The description of changes:
>
> Two functions in libpq-fe.h:
> PQsendPortalBindParams for sending a command to bind a portal to a
> previously prepared statement;
> PQsendPortalExecute for executing a previously bound portal with a given
> number of rows.
>
> A patch to pqParseInput3 in fe-protocol3.c to handle the `portal
> suspended` message tag.
>
> The patch is ready for review, but it lacks documentation, tests and usage
> examples.
>
> There are no functions for sending bind without params and no functions
> for sync interface, but they can easily be added to the feature.
>

If you are happy to put it under The PostgreSQL License, then sending it as
an attachment here is the first step.

If possible, please rebase it on top of git master.

Some explanation for why you have this need and what problems this solves
for you would be helpful as well.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-18 Thread Stephen Frost
Greetings,

* Thomas Munro (thomas.mu...@gmail.com) wrote:
> On Tue, Oct 8, 2019 at 11:09 PM Amit Kapila  wrote:
> > I am personally still in the camp of people advocating the use of
> > macro for this purpose.  It is quite possible after reading your
> > points, some people might change their opinion or some others also
> > share their opinion against using a macro in which case we can drop
> > the idea of using a macro.
> 
> -1 for these macros.

Agreed.

> These are basic facts about the C language.  I hope C eventually
> supports {} like C++, so that you don't have to think hard about
> whether the first member is another struct, and recursively so … but
> since the macros can't help with that problem, what is the point?

I realize that I need to don some fireproof gear for suggesting this,
but I really wonder how much fallout we'd have from just allowing {} to
be used..  It's about a billion[1] times cleaner and more sensible than
using {0} and doesn't create a dependency on what the first element of
the struct is..

Thanks,

Stephen

1: Detailed justification not included intentionally and is left as an
exercise to the reader.




Re: Clean up MinGW def file generation

2019-10-18 Thread Alvaro Herrera
On 2019-Oct-17, Michael Paquier wrote:

> On Tue, Oct 15, 2019 at 09:00:23AM +0200, Peter Eisentraut wrote:

> > I think we can clean this up and just have the regular ddl.def built
> > normally at build time if required.
> > 
> > Does anyone know more about this?
> 
> This comes from here, but I cannot see a thread about this topic
> around this date:
> commit: a1d5d8574751d62a039d8ceb44329ee7c637196a
> author: Peter Eisentraut 
> date: Tue, 26 Feb 2008 06:41:24 +
> Refactor the code that creates the shared library export files to appear
> only once in Makefile.shlib and not in four copies.

Well, yes, but that code originates from much earlier.  For example
2a63c1602d9d (Tom Lane, Oct. 2004) is the one that created the libpq
ones.  But even that ancient one seems to be just refactoring some stuff
that was already there, namely something that seems to have been created
by commit 53cd7cd8a916:

commit 53cd7cd8a9168d4b2e2feb52129336429cc99b98
Author: Bruce Momjian 
AuthorDate: Tue Mar 9 04:53:37 2004 +
CommitDate: Tue Mar 9 04:53:37 2004 +

Make a separate win32 debug DLL along with the non-debug version:

Currently, src/interfaces/libpq/win32.mak builds a statically-linked
library "libpq.lib", a debug dll "libpq.dll", import library for the
debug dll "libpqdll.lib", a release dll "libpq.dll", import library for
the release dll "libpqdll.lib".  To avoid naming clashes, I would make
the debug dll and import libraries "libpqd.dll" and "libpqddll.lib".

Basically, the debug build uses the cl flags: "/MDd /D _DEBUG", and the
release build uses the cl flags "/MD /D NDEBUG".  Usually the debug
build has a "D" suffix on the file name, so for example:

libpqd.dll libpq, debug build
libpqd.lib libpq, debug build, import library
libpq.dll  libpq, release build
libpq.lib  libpq, release build, import library

David Turner

This stuff was used by win32.mak, but I don't know if that tells anyone
anything.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: libpq: Fix wrong connection status on invalid "connect_timeout"

2019-10-18 Thread Lars Kanis
Am 18.10.19 um 05:06 schrieb Michael Paquier:

> So attached is a patch to skip trailing whitespaces as well,
> which also fixes the issue with ECPG.  I have refactored the parsing
> logic a bit while on it.  The comment at the top of parse_int_param()
> needs to be reworked a bit more.

I tested this and it looks good to me. Maybe you could omit some
redundant 'end' checks, as in the attached patch. Or was your intention
to verify non-NULL 'end'?


> Perhaps we could add directly regression
> tests for libpq.  I'll start a new thread about that once we are done
> here, the topic is larger.

We have around 650 tests on ruby-pg to ensure everything runs as
expected and I always wondered how the API of libpq is being verified.


--
Kind Regards,
Lars Kanis

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index f91f0f2..0eeabb8 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -1687,7 +1687,7 @@ useKeepalives(PGconn *conn)
 /*
  * Parse and try to interpret "value" as an integer value, and if successful,
  * store it in *result, complaining if there is any trailing garbage or an
- * overflow.
+ * overflow.  This allows any number of leading and trailing whitespaces.
  */
 static bool
 parse_int_param(const char *value, int *result, PGconn *conn,
@@ -1698,14 +1698,31 @@ parse_int_param(const char *value, int *result, PGconn *conn,
 
 	*result = 0;
 
+	/* strtol(3) skips leading whitespaces */
 	errno = 0;
 	numval = strtol(value, &end, 10);
-	if (errno == 0 && *end == '\0' && numval == (int) numval)
-	{
-		*result = numval;
-		return true;
-	}
 
+	/*
+	 * If no progress was done during the parsing or an error happened, fail.
+	 * This tests properly for overflows of the result.
+	 */
+	if (value == end || errno != 0 || numval != (int) numval)
+		goto error;
+
+	/*
+	 * Skip any trailing whitespace; if anything but whitespace remains before
+	 * the terminating character, fail
+	 */
+	while (*end != '\0' && isspace((unsigned char) *end))
+		end++;
+
+	if (*end != '\0')
+		goto error;
+
+	*result = numval;
+	return true;
+
+error:
 	appendPQExpBuffer(&conn->errorMessage,
 	  libpq_gettext("invalid integer value \"%s\" for connection option \"%s\"\n"),
 	  value, context);
@@ -2008,7 +2025,11 @@ connectDBComplete(PGconn *conn)
 	{
 		if (!parse_int_param(conn->connect_timeout, , conn,
 			 "connect_timeout"))
+		{
+			/* mark the connection as bad to report the parsing failure */
+			conn->status = CONNECTION_BAD;
 			return 0;
+		}
 
 		if (timeout > 0)
 		{


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-18 Thread Amit Kapila
On Mon, Oct 14, 2019 at 3:09 PM Dilip Kumar  wrote:
>
> On Thu, Oct 3, 2019 at 4:03 AM Tomas Vondra
>  wrote:
> >
> >
> > Sure, I wasn't really proposing to adding all stats from that patch,
> > including those related to streaming.  We need to extract just those
> > related to spilling. And yes, it needs to be moved right after 0001.
> >
> I have extracted the spilling related code to a separate patch on top
> of 0001.  I have also fixed some bugs and review comments and attached
> as a separate patch.  Later I can merge it to the main patch if you
> agree with the changes.
>

Few comments
-
0001-Add-logical_decoding_work_mem-to-limit-ReorderBuffer
1.
+ {
+ {"logical_decoding_work_mem", PGC_USERSET, RESOURCES_MEM,
+ gettext_noop("Sets the maximum memory to be used for logical decoding."),
+ gettext_noop("This much memory can be used by each internal "
+ "reorder buffer before spilling to disk or streaming."),
+ GUC_UNIT_KB
+ },

I think we can remove 'or streaming' from above sentence for now.  We
can add it later with later patch where streaming will be allowed.

2.
@@ -206,6 +206,18 @@ CREATE SUBSCRIPTION subscription_name
 

+
+   
+work_mem (integer)
+
+ 
+  Limits the amount of memory used to decode changes on the
+  publisher.  If not specified, the publisher will use the default
+  specified by logical_decoding_work_mem. When
+  needed, additional data are spilled to disk.
+ 
+
+   

It is not clear why we need this parameter, at least with this patch.
I have raised this multiple times [1][2].

bugs_and_review_comments_fix
1.
},
  _decoding_work_mem,
- -1, -1, MAX_KILOBYTES,
- check_logical_decoding_work_mem, NULL, NULL
+ 65536, 64, MAX_KILOBYTES,
+ NULL, NULL, NULL

I think the default value should be 1MB similar to
maintenance_work_mem.  The same was true before this change.

2. -#logical_decoding_work_mem = 64MB # min 1MB, or -1 to use
maintenance_work_mem
+i#logical_decoding_work_mem = 64MB # min 64kB

It seems the 'i' is a leftover character in the above change.  Also,
change the default value considering the previous point.

3.
@@ -2479,7 +2480,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb,
ReorderBufferTXN *txn)

  /* update the statistics */
  rb->spillCount += 1;
- rb->spillTxns += txn->serialized ? 1 : 0;
+ rb->spillTxns += txn->serialized ? 0 : 1;
  rb->spillBytes += size;

Why is this change required?  Shouldn't we increase the spillTxns
count only when the txn is serialized?

0002-Track-statistics-for-spilling
1.
+
+ spill_txns
+ integer
+ Number of transactions spilled to disk after the memory used by
+  logical decoding exceeds logical_work_mem. The
+  counter gets incremented both for toplevel transactions and
+  subtransactions.
+  
+

The parameter name is wrong here. /logical_work_mem/logical_decoding_work_mem

2.
+
+ spill_txns
+ integer
+ Number of transactions spilled to disk after the memory used by
+  logical decoding exceeds logical_work_mem. The
+  counter gets incremented both for toplevel transactions and
+  subtransactions.
+  
+
+
+ spill_count
+ integer
+ Number of times transactions were spilled to disk. Transactions
+  may get spilled repeatedly, and this counter gets incremented on every
+  such invocation.
+  
+
+
+ spill_bytes
+ integer
+ Amount of decoded transaction data spilled to disk.
+  
+

In all the above cases, the explanation text starts immediately after the
opening tag, but the general coding practice is to start it on the next
line; see the explanation of nearby parameters.

It seems these parameters are added to pg-stat-wal-receiver-view in
the docs, but in the code they are present as part of pg_stat_replication.
It seems the docs need to be updated.  Am I missing something?

3.
ReorderBufferSerializeTXN()
{
..
/* update the statistics */
rb->spillCount += 1;
rb->spillTxns += txn->serialized ? 0 : 1;
rb->spillBytes += size;

Assert(spilled == txn->nentries_mem);
Assert(dlist_is_empty(>changes));
txn->nentries_mem = 0;
txn->serialized = true;
..
}

I am not able to understand the above code.  We are setting the
serialized parameter a few lines after we check it and increment the
spillTxns count. Can you please explain it?

Also, isn't the spillTxns count a bit confusing, because in some cases it
will include subtransactions and in other cases (where the largest picked
transaction is a subtransaction) it won't?
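
For what it's worth, here is how the first-spill accounting appears intended
to work, as a minimal standalone sketch (illustrative types and names, not
the patch code): spillCount goes up on every serialization pass, spillTxns
only the first time a given transaction spills, and txn->serialized is what
remembers that, which is why it is set only after the counters are updated.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-ins for ReorderBuffer / ReorderBufferTXN fields. */
typedef struct TxnSketch
{
	bool		serialized;		/* has this txn been spilled before? */
} TxnSketch;

typedef struct BufferSketch
{
	long		spillTxns;		/* distinct txns spilled at least once */
	long		spillCount;		/* total serialization passes */
	long		spillBytes;		/* total bytes written out */
} BufferSketch;

static void
record_spill(BufferSketch *rb, TxnSketch *txn, long size)
{
	rb->spillCount += 1;			/* every pass counts */
	if (!txn->serialized)
		rb->spillTxns += 1;			/* ...but each txn only once */
	rb->spillBytes += size;

	txn->serialized = true;			/* remember for later passes */
}

int
main(void)
{
	BufferSketch rb = {0, 0, 0};
	TxnSketch	txn = {false};

	record_spill(&rb, &txn, 1024);
	record_spill(&rb, &txn, 2048);

	/* prints: txns=1 count=2 bytes=3072 */
	printf("txns=%ld count=%ld bytes=%ld\n",
		   rb.spillTxns, rb.spillCount, rb.spillBytes);
	return 0;
}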

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Add Change Badges to documentation

2019-10-18 Thread Corey Huinker
Attached is a patch to implement change badges in our documentation.

What's a change badge? It's my term for a visual cue in the documentation
used to indicate that the nearby section of documentation is new in this
version or otherwise changed from the previous version.

One example of change badges being used is in the DocBook documentation
reference:
https://tdg.docbook.org/tdg/4.5/ref-elements.html#common.attributes

Docbook used graphical badges, which seemed to be a bad idea. Instead, I
went with a decorated text span like one finds in gmail labels or Reddit
"flair".

The badges are implemented using the "revision" attribute available on
all DocBook tags. All one needs to do to indicate a change is to change one
tag and add a revision attribute to it; doing so, for example,
will add a small green text box with the text "new in 13" immediately
preceding the rendered elements. I have attached a screenshot
(badges_in_acronyms.png) of an example of this from my browser viewing
changes to the acronyms.html file. This obviously lacks the polish of
viewing the page on a full website, but it does give you an idea of the
flexibility of the change badge, and where badge placement is (and is not)
a good idea.

What are the benefits of using this?

I think the benefits are as follows:

1. It shows a casual user what pieces are new on that page (new functions,
new keywords, new command options, etc).

2. It also works in the negative: a user can quickly skim a page, and
lacking any badges, feel confident that everything there works in the way
that it did in version N-1.

3. It also acts as a subtle cue for the user to click on the previous
version to see what it used to look like, confident that there *will* be a
difference on the previous version.


How would we implement this?

1. All new documentation pages would get a "NEW" badge in their title.

2. New function definitions, new command options, etc would get a "NEW"
badge as visually close to the change as is practical.

3. Changes to existing functions, options, etc. would get a badge of
"UPDATED"

4. At major release time, we could do one of two things:

4a. We could keep the NEW/UPDATED badges in the fixed release version, and
then completely remove them from the master, because for version N+1, they
won't be new anymore. This can be accomplished with an XSL transform
looking for any tag with the "revision" attribute.

4b. We could code in the version number at release time, and leave it in
place. So in version 14 you could find both "v13" and "v14" badges, and in
version 15 you could find badges for 15, 14, and 13. At some point (say
v17), we start retiring the v13 badges, and in v18 we'd retire the v14
badges, and so on, to keep the clutter to a minimum.

Back to the patch:
I implemented this only for html output, and the colors I chose are very
off-brand for postgres, so that will have to change. There's probably some
spacing/padding issues I haven't thought of. Please try it out, make some
modifications to existing document pages to see how badges would work in
those contexts.
From ded965fc90b223a834ac52d55512587b7a6ea139 Mon Sep 17 00:00:00 2001
From: Corey Huinker 
Date: Fri, 18 Oct 2019 06:15:10 -0400
Subject: [PATCH] add document change badges

---
 doc/src/sgml/acronyms.sgml  |  6 +++---
 doc/src/sgml/stylesheet-html-common.xsl | 10 ++
 doc/src/sgml/stylesheet.css | 10 ++
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/acronyms.sgml b/doc/src/sgml/acronyms.sgml
index f638665dc9..87bfef04be 100644
--- a/doc/src/sgml/acronyms.sgml
+++ b/doc/src/sgml/acronyms.sgml
@@ -10,7 +10,7 @@
   
 

-ANSI
+ANSI
 
  
   https://en.wikipedia.org/wiki/American_National_Standards_Institute;>
@@ -19,7 +19,7 @@
 

 
-   
+   
 API
 
  
@@ -31,7 +31,7 @@

 ASCII
 
- 
+ 
   https://en.wikipedia.org/wiki/Ascii;>American Standard
   Code for Information Interchange
  
diff --git a/doc/src/sgml/stylesheet-html-common.xsl b/doc/src/sgml/stylesheet-html-common.xsl
index 9edce52a10..cb04cb7f0d 100644
--- a/doc/src/sgml/stylesheet-html-common.xsl
+++ b/doc/src/sgml/stylesheet-html-common.xsl
@@ -289,4 +289,14 @@ set   toc,title
   
 
 
+
+
+  
+  
+  
+
+  
+  
+
+
 
diff --git a/doc/src/sgml/stylesheet.css b/doc/src/sgml/stylesheet.css
index 1a66c789d5..d0cae2f59f 100644
--- a/doc/src/sgml/stylesheet.css
+++ b/doc/src/sgml/stylesheet.css
@@ -109,3 +109,13 @@ acronym		{ font-style: inherit; }
 width: 75%;
   }
 }
+
+/* version badge styling */
+span.revision-badge {
+	visibility: visible  ;
+color: white;
+	background-color: #00933C;
+	border: 1px solid #00;
+	border-radius: 2px;
+padding: 1px;
+}
-- 
2.14.1



Re: Questions/Observations related to Gist vacuum

2019-10-18 Thread Dilip Kumar
On Fri, Oct 18, 2019 at 10:55 AM Amit Kapila  wrote:
>
> On Fri, Oct 18, 2019 at 9:41 AM Dilip Kumar  wrote:
> >
> > On Wed, Oct 16, 2019 at 7:22 PM Heikki Linnakangas  wrote:
> > >
> > > On 16 October 2019 12:57:03 CEST, Amit Kapila  
> > > wrote:
> > > >On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas 
> > > >wrote:
> > > >> All things
> > > >> considered, I'm not sure which is better.
> > > >
> > > >Yeah, this is a tough call to make, but if we can allow it to delete
> > > >the pages in bulkdelete conditionally for parallel vacuum workers,
> > > >then it would be better.
> > >
> > > Yeah, if it's needed for parallel vacuum, maybe that tips the scale.
> >
> > Are we planning to do this only if it is called from parallel vacuum
> > workers or in general?
> >
>
> I think we can do it in general as adding some check for parallel
> vacuum there will look bit hackish.

I agree with that point.

> It is not clear if we get enough
> benefit by keeping it for cleanup phase of the index as discussed in
> emails above.  Heikki, others, let us know if you don't agree here.

I have prepared a first version of the patch.  Currently, I am
performing an empty page deletion for all the cases.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


delete_emptypages_in_gistbulkdelete_v1.patch
Description: Binary data


Re: WIP/PoC for parallel backup

2019-10-18 Thread Jeevan Chalke
On Thu, Oct 17, 2019 at 10:51 AM Asif Rehman  wrote:

>
> Attached are the updated patches.
>

I had a quick look over these changes and they look good overall.
However, here are a few review comments I caught while glancing through
patches 0002 and 0003.


--- 0002 patch

1.
Can the lsn option be renamed to start-wal-location? That will be clearer
too.

2.
+typedef struct
+{
+    char        name[MAXPGPATH];
+    char        type;
+    int32       size;
+    time_t      mtime;
+} BackupFile;

I think it will be good if we keep this structure in a common place so that
the client can also use it.

3.
+SEND_FILE_LIST,
+SEND_FILES_CONTENT,
Can the above two commands be renamed to SEND_BACKUP_MANIFEST and
SEND_BACKUP_FILE respectively?
The reason behind the first name change is that we are not getting only a
file list here; we are getting a few more details along with it. As for the
other, it will be in line with START_BACKUP/STOP_BACKUP/SEND_BACKUP_MANIFEST.

4.
Typos:
non-exlusive => non-exclusive
retured => returned
optionaly => optionally
nessery => necessary
totoal => total


--- 0003 patch

1.
+static int
+simple_list_length(SimpleStringList *list)
+{
+    int         len = 0;
+    SimpleStringListCell *cell;
+
+    for (cell = list->head; cell; cell = cell->next, len++)
+        ;
+
+    return len;
+}

I think it will be good if it goes to simple_list.c. That will help in other
usages as well.

2.
Please revert these unnecessary changes:

@@ -1475,6 +1575,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res,
int rownum)
  */
 snprintf(filename, sizeof(filename), "%s/%s", current_path,
  copybuf);
+
 if (filename[strlen(filename) - 1] == '/')
 {
 /*

@@ -1528,8 +1622,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res,
int rownum)
  * can map them too.)
  */
 filename[strlen(filename) - 1] = '\0';/* Remove
trailing slash */
-
 mapped_tblspc_path =
get_tablespace_mapping([157]);
+
 if (symlink(mapped_tblspc_path, filename) != 0)
 {
 pg_log_error("could not create symbolic link from
\"%s\" to \"%s\": %m",

3.
Typos:
retrive => retrieve
takecare => take care
tablespae => tablespace

4.
The ParallelBackupEnd() function does not do anything for parallelism. Would
it be better to just rename it EndBackup()?

5.
To pass a tablespace path to the server in SEND_FILES_CONTENT, you are
reusing a LABEL option, which seems odd. How about adding a new option for
that?

6.
It will be good if we have some comments explaining what the function is
actually doing in its prologue. For functions like:
GetBackupFilesList()
ReceiveFiles()
create_workers_and_fetch()


Thanks


>
> Thanks,
>
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>

-- 
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: v12.0: segfault in reindex CONCURRENTLY

2019-10-18 Thread Alvaro Herrera
On 2019-Oct-18, Michael Paquier wrote:

> What you are proposing here sounds fine to me.  Perhaps you would
> prefer to adjust the code yourself?

Sure thing, thanks, done :-)

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Compressed pluggable storage experiments

2019-10-18 Thread Andres Freund
Hi,

On 2019-10-17 12:47:47 -0300, Alvaro Herrera wrote:
> On 2019-Oct-10, Ildar Musin wrote:
> 
> > 1. Unlike FDW API, in pluggable storage API there are no routines like
> > "begin modify table" and "end modify table" and there is no shared
> > state between insert/update/delete calls.
> 
> Hmm.  I think adding a begin/end to modifytable is a reasonable thing to
> do (it'd be a no-op for heap and zheap I guess).

I'm fairly strongly against that. Adding two additional "virtual"
function calls for something that's rarely going to be used seems like
adding too much overhead to me.


> > 2. It looks like I cannot implement custom storage options. E.g. for
> > compressed storage it makes sense to implement different compression
> > methods (lz4, zstd etc.) and corresponding options (like compression
> > level). But as i can see storage options (like fillfactor etc) are
> > hardcoded and are not extensible. Possible solution is to use GUCs
> > which would work but is not extremely convinient.
> 
> Yeah, the reloptions module is undergoing some changes.  I expect that
> there will be a way to extend reloptions from an extension, at the end
> of that set of patches.

Cool.


> > 3. A bit surprising limitation that in order to use bitmap scan the
> > maximum number of tuples per page must not exceed 291 due to
> > MAX_TUPLES_PER_PAGE macro in tidbitmap.c which is calculated based on
> > 8kb page size. In case of 1mb page this restriction feels really
> > limiting.
> 
> I suppose this is a hardcoded limit that needs to be fixed by patching
> core as we make table AM more pervasive.

That's not unproblematic - a dynamic limit would make a number of
computations more expensive, and we already spend plenty of CPU cycles
building the tid bitmap. And we'd waste plenty of memory just having all
that space for the worst case.  ISTM that we "just" need to replace the
TID bitmap with some tree-like structure.


> > 4. In order to use WAL-logging each page must start with a standard 24
> > byte PageHeaderData even if it is needless for storage itself. Not a
> > big deal though. Another (acutally documented) WAL-related limitation
> > is that only generic WAL can be used within extension. So unless
> > inserts are made in bulks it's going to require a lot of disk space to
> > accomodate logs and wide bandwith for replication.
> 
> Not sure what to suggest.  Either you should ignore this problem, or
> you should fix it.

I think if it becomes a problem you should ask for an rmgr ID to use for
your extension, which we encode and then allow to set the relevant
rmgr callbacks for that rmgr id at startup.  But you should obviously
first develop the WAL logging etc, and make sure it's beneficial over
generic wal logging for your case.

Greetings,

Andres Freund




Re: [PATCH] Race condition in logical walsender causes long postgresql shutdown delay

2019-10-18 Thread Craig Ringer
On Thu, 17 Oct 2019 at 21:19, Alvaro Herrera 
wrote:

> On 2019-Sep-26, Alvaro Herrera wrote:
>
> > On 2019-Sep-26, Jeff Janes wrote:
>
> > > Hi Alvaro, does this count as a review?
> >
> > Well, I'm already a second pair of eyes for Craig's code, so I think it
> > does :-)  I would have liked confirmation from Craig that my change
> > looks okay to him too, but maybe we'll have to go without that.
>
> There not being a third pair of eyes, I have pushed this.
>
> Thanks!
>

Thanks.

I'm struggling to keep up with my own threads right now...


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Obsolete comment in partbounds.c

2019-10-18 Thread Alvaro Herrera
On 2019-Oct-18, Etsuro Fujita wrote:

> While reviewing the partitionwise-join patch, I noticed $Subject,ie,
> this in create_list_bounds():
> 
> /*
>  * Never put a null into the values array, flag instead for
>  * the code further down below where we construct the actual
>  * relcache struct.
>  */
> if (null_index != -1)
> elog(ERROR, "found null more than once");
> null_index = i;
> 
> "the code further down below where we construct the actual relcache
> struct" isn't in the same file anymore by refactoring by commit
> b52b7dc25.  How about modifying it like the attached?

Yeah, agreed.  Instead of "the null comes from" I would use "the
partition that stores nulls".

While reviewing your patch I noticed a few places where we use an odd
pattern in switches, which can be simplified as shown here.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 318d8ecae9..e051094d54 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -209,13 +209,9 @@ partition_bounds_create(PartitionBoundSpec **boundspecs, int nparts,
 			return create_range_bounds(boundspecs, nparts, key, mapping);
 
 		default:
-			elog(ERROR, "unexpected partition strategy: %d",
- (int) key->strategy);
-			break;
+			Assert(false);
+			return NULL;		/* keep compiler quiet */
 	}
-
-	Assert(false);
-	return NULL;/* keep compiler quiet */
 }
 
 /*
@@ -1797,10 +1793,7 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
 static int
 get_partition_bound_num_indexes(PartitionBoundInfo bound)
 {
-	int			num_indexes;
-
 	Assert(bound);
-
 	switch (bound->strategy)
 	{
 		case PARTITION_STRATEGY_HASH:
@@ -1809,24 +1802,20 @@ get_partition_bound_num_indexes(PartitionBoundInfo bound)
 			 * The number of the entries in the indexes array is same as the
 			 * greatest modulus.
 			 */
-			num_indexes = get_hash_partition_greatest_modulus(bound);
-			break;
+			return get_hash_partition_greatest_modulus(bound);
 
 		case PARTITION_STRATEGY_LIST:
-			num_indexes = bound->ndatums;
+			return bound->ndatums;
 			break;
 
 		case PARTITION_STRATEGY_RANGE:
 			/* Range partitioned table has an extra index. */
-			num_indexes = bound->ndatums + 1;
-			break;
+			return bound->ndatums + 1;
 
 		default:
-			elog(ERROR, "unexpected partition strategy: %d",
- (int) bound->strategy);
+			Assert(false);
+			return 0;			/* keep compiler quiet */
 	}
-
-	return num_indexes;
 }
 
 /*


Re: Partitioning versus autovacuum

2019-10-18 Thread Greg Stark
At the risk of forking this thread... I think there's actually a
planner estimation bug here too.

Consider this test case of a simple partitioned table and a simple
join. The cardinality estimates for each partition and the Append node
are all perfectly accurate. But the estimate for the join is way off.
The corresponding test case without partitioning produces a perfect
cardinality estimate for the join.

I've never completely wrapped my head around the planner selectivity
estimations. IIRC join restrictions are treated differently from
single-relation restrictions. Perhaps what's happening here is that
the single-relation restrictions are being correctly estimated based
on the child partitions but the join restriction code hasn't been
taught the same tricks?



stark=# create table p (i integer, j integer) partition by list  (i);
CREATE TABLE

stark=# create table p0 partition of p for values in (0);
CREATE TABLE
stark=# create table p1 partition of p for values in (1);
CREATE TABLE

stark=# insert into p select 0,generate_series(1,1000);
INSERT 0 1000
stark=# insert into p select 1,generate_series(1,1000);
INSERT 0 1000

stark=# analyze p0;
ANALYZE
stark=# analyze p1;
ANALYZE

stark=# create table q (i integer);
CREATE TABLE
stark=# insert into q values (0);
INSERT 0 1
stark=# analyze q;
ANALYZE

-- Query partitioned table, get wildly off row estimates for join

stark=# explain analyze select * from q join p using (i) where j
between 1 and 500;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.02..44.82 rows=5 width=8) (actual time=0.060..1.614 rows=500 loops=1)
   Hash Cond: (p0.i = q.i)
   ->  Append  (cost=0.00..40.00 rows=1000 width=8) (actual time=0.030..1.127 rows=1000 loops=1)
         ->  Seq Scan on p0  (cost=0.00..20.00 rows=500 width=8) (actual time=0.029..0.440 rows=500 loops=1)
               Filter: ((j >= 1) AND (j <= 500))
               Rows Removed by Filter: 500
         ->  Seq Scan on p1  (cost=0.00..20.00 rows=500 width=8) (actual time=0.018..0.461 rows=500 loops=1)
               Filter: ((j >= 1) AND (j <= 500))
               Rows Removed by Filter: 500
   ->  Hash  (cost=1.01..1.01 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on q  (cost=0.00..1.01 rows=1 width=4) (actual time=0.005..0.006 rows=1 loops=1)
 Planning time: 0.713 ms
 Execution time: 1.743 ms
(14 rows)


-- Query non-partitioned table, get accurate row estimates for join

stark=# create table pp as (Select * from p);
SELECT 2000
stark=# analyze pp;
ANALYZE

stark=# explain analyze select * from q join pp using (i) where j
between 1 and 500;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.02..48.77 rows=500 width=8) (actual time=0.027..0.412 rows=500 loops=1)
   Hash Cond: (pp.i = q.i)
   ->  Seq Scan on pp  (cost=0.00..39.00 rows=1000 width=8) (actual time=0.014..0.243 rows=1000 loops=1)
         Filter: ((j >= 1) AND (j <= 500))
         Rows Removed by Filter: 1000
   ->  Hash  (cost=1.01..1.01 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on q  (cost=0.00..1.01 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
 Planning time: 0.160 ms
 Execution time: 0.456 ms
(10 rows)


Re: [Patch] Base backups and random or zero pageheaders

2019-10-18 Thread Michael Banck
Hi,

Am Samstag, den 04.05.2019, 21:50 +0900 schrieb Michael Paquier:
> On Tue, Apr 30, 2019 at 03:07:43PM +0200, Michael Banck wrote:
> > This is still an open item for the back branches I guess, i.e. zero page
> > header for pg_verify_checksums and additionally random page header for
> > pg_basebackup's base backup.
> 
> I may be missing something, but could you add an entry in the future
> commit fest about the stuff discussed here?  I have not looked at your
> patch closely..  Sorry.

Here is finally a rebased patch for the (IMO) more important issue in
pg_basebackup. I've added a commitfest entry for this now: 
https://commitfest.postgresql.org/25/2308/


Michael

-- 
Michael Banck
Project Lead / Senior Consultant
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID number: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Managing directors: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the
following provisions: https://www.credativ.de/datenschutz

From 084a2fe968a3ee9c0bb4f18fa9fbc8a582f38b3c Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Fri, 18 Oct 2019 10:48:22 +0200
Subject: [PATCH] Fix base backup checksum verification for random or zero page
 headers

We currently do not checksum a page if it is considered new by PageIsNew() or
if its pd_lsn page header field is larger than the LSN at the beginning of the
base backup. However, this means that if the whole page header is zero,
PageIsNew() will consider that page new due to the pd_upper field being zero
and no subsequent checksum verification is done. Also, a random value in the
pd_lsn field will very likely be larger than the LSN at backup start, again
resulting in the page not getting checksummed.

To fix, factor out the all-zeroes check from PageIsVerified() into a new method
PageIsZero() and call that for pages considered new by PageIsNew(). If a page
is all zero, consider that a checksum failure. Also check the pd_lsn field
against both the backup start pointer and the current pointer from
GetInsertRecPtr().

Add two more tests to the pg_basebackup TAP tests to check those errors.

Reported-By: Andres Freund
Discussion: https://postgr.es/m/20190319200050.ncuxejradurja...@alap3.anarazel.de
---
 src/backend/replication/basebackup.c | 159 +++
 src/backend/storage/page/bufpage.c   |  56 ++
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |  30 +++--
 src/include/storage/bufpage.h|   1 +
 4 files changed, 148 insertions(+), 98 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d0f210de8c..ee026bc1d0 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1456,87 +1456,28 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 page = buf + BLCKSZ * i;
 
 /*
- * Only check pages which have not been modified since the
- * start of the base backup. Otherwise, they might have been
- * written only halfway and the checksum would not be valid.
- * However, replaying WAL would reinstate the correct page in
- * this case. We also skip completely new pages, since they
- * don't have a checksum yet.
+ * We skip completely new pages after checking they are
+ * all-zero, since they don't have a checksum yet.
  */
-if (!PageIsNew(page) && PageGetLSN(page) < startptr)
+if (PageIsNew(page))
 {
-	checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
-	phdr = (PageHeader) page;
-	if (phdr->pd_checksum != checksum)
+	if (!PageIsZero(page))
 	{
 		/*
-		 * Retry the block on the first failure.  It's
-		 * possible that we read the first 4K page of the
-		 * block just before postgres updated the entire block
-		 * so it ends up looking torn to us.  We only need to
-		 * retry once because the LSN should be updated to
-		 * something we can ignore on the next pass.  If the
-		 * error happens again then it is a true validation
-		 * failure.
+		 * pd_upper is zero, but the page is not all zero.  We
+		 * cannot run pg_checksum_page() on the page as it
+		 * would throw an assertion failure.  Consider this a
+		 * checksum failure.
 		 */
-		if (block_retry == false)
-		{
-			/* Reread the failed block */
-			if (fseek(fp, -(cnt - BLCKSZ * i), SEEK_CUR) == -1)
-			{
-ereport(ERROR,
-		(errcode_for_file_access(),
-		 errmsg("could not fseek in file \"%s\": %m",
-readfilename)));
-			}
-
-			if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ)
-			{
-/*
- * If we hit end-of-file, a concurrent
- * truncation must have occurred, so break out
- * of this loop just as if the initial fread()
- * 

Re: UPSERT on view does not find constraint by name

2019-10-18 Thread Tom Lane
Jeremy Finzel  writes:
> I'm not sure if this can be considered a bug or not, but it is perhaps
> unexpected.  I found that when using a view that is simply select * from
> table, then doing INSERT ... ON CONFLICT ON CONSTRAINT constraint_name on
> that view, it does not find the constraint and errors out.  But it does
> find the constraint if one lists the columns instead.

I'm confused by this report.  The view wouldn't have any constraints,
and experimenting shows that the parser won't let you name a
constraint of the underlying table here.  So would you provide a
concrete example of what you're talking about?

regards, tom lane




Re: generating catcache control data

2019-10-18 Thread John Naylor
On Fri, Oct 11, 2019 at 3:14 AM Tom Lane  wrote:
> I do not like attaching this data to the DECLARE_UNIQUE_INDEX macros.
> It's really no business of the indexes' whether they are associated
> with a syscache.  It's *certainly* no business of theirs how many
> buckets such a cache should start off with.
>
> I'd be inclined to make a separate file that's specifically concerned
> with declaring syscaches, and put all the required data there.

That gave me another idea that would further reduce the bookkeeping
involved in creating new syscaches -- put declarations in the cache id
enum (syscache.h), like this:

#define DECLARE_SYSCACHE(cacheid,indexname,indexoid,numbuckets) cacheid

enum SysCacheIdentifier
{
DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index,
AggregateFnoidIndexId, 16) = 0,
...
};

> > Relname, and relisshared are straightforward. For eq/hash functions,
> > we could add metadata attributes to pg_type.dat for the relevant
> > types. Tuple descriptors would get their attrs from schemapg.h.
>
> I don't see a need to hard-wire more information than we do today, and
> I'd prefer not to because it adds to the burden of creating new syscaches.
> Assuming that the plan is for genbki.pl or some similar script to generate
> the constants, it could look up all the appropriate data from the initial
> contents for pg_opclass and friends.  That is, basically what we want here
> is for a constant-creation script to perform the same lookups that're now
> done during backend startup.

Looking at it again, the eq/hash functions are local and looked up via
GetCCHashEqFuncs(), but the runtime of that is surely minuscule, so I
left it alone.

> > Is this something worth doing?
>
> Hard to tell.  It'd take a few cycles out of backend startup, which
> seems like a worthy goal; but I don't know if it'd save enough to be
> worth the trouble.  Probably can't tell for sure without doing most
> of the work :-(.

I went ahead and did just enough to remove the relation-opening code.
Looking in the archives, I found this as a quick test:

echo '\set x 1' > x.txt
./inst/bin/pgbench -n -C -c 1 -f x.txt -T 10

Typical numbers:

master:
number of transactions actually processed: 4276
latency average = 2.339 ms
tps = 427.549137 (including connections establishing)
tps = 24082.726350 (excluding connections establishing)

patch:
number of transactions actually processed: 4436
latency average = 2.255 ms
tps = 443.492369 (including connections establishing)
tps = 21817.308410 (excluding connections establishing)

...which amounts to nearly 4% improvement in the first tps number,
which isn't earth-shattering, but it's something. Opinions? It
wouldn't be a lot of additional work to put together a WIP patch.

-- 
John Naylor    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Obsolete comment in partbounds.c

2019-10-18 Thread Etsuro Fujita
While reviewing the partitionwise-join patch, I noticed $Subject, i.e.,
this in create_list_bounds():

/*
 * Never put a null into the values array, flag instead for
 * the code further down below where we construct the actual
 * relcache struct.
 */
if (null_index != -1)
elog(ERROR, "found null more than once");
null_index = i;

"the code further down below where we construct the actual relcache
struct" isn't in the same file anymore by refactoring by commit
b52b7dc25.  How about modifying it like the attached?

Best regards,
Etsuro Fujita


fix-obsolete-comment.patch
Description: Binary data


Re: [HACKERS] Block level parallel vacuum

2019-10-18 Thread Dilip Kumar
On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila  wrote:
>
> On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar  wrote:
> >
> > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila  wrote:
> > >
> > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar  wrote:
> > > >
> > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila  
> > > > > wrote:
> > > > > >
> > > > > > Another point in this regard is that the user anyway has an option 
> > > > > > to
> > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > So, if the user enables it we have to provide some sensible 
> > > > > > behavior.
> > > > > > If we can't come up with anything, then, in the end, we might want 
> > > > > > to
> > > > > > turn it off for a parallel vacuum and mention the same in docs, but 
> > > > > > I
> > > > > > think we should try to come up with a solution for it.
> > > > >
> > > > > I finally got your point and now understood the need. And the idea I
> > > > > proposed doesn't work fine.
> > > > >
> > > > > So you meant that all workers share the cost count and if a parallel
> > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > are still running and then the sleep might not help?
> > > > >
> > >
> > > Remember that the other running workers will also increase
> > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > make sure that overall throttling works the same?
> > >
> > > > I agree with this point.  There is a possibility that some of the
> > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > workers who are doing very less I/O might become the victim and
> > > > unnecessarily delay its operation.
> > > >
> > >
> > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > limit we want to provide for overall vacuum operation will still be
> > > the same.  Also, isn't a similar thing happens now also where heap
> > > might have done a major portion of I/O but soon after we start
> > > vacuuming the index, we will hit the limit and will sleep.
> >
> > Actually, What I meant is that the worker who performing actual I/O
> > might not go for the delay and another worker which has done only CPU
> > operation might pay the penalty?  So basically the worker who is doing
> > CPU intensive operation might go for the delay and pay the penalty and
> > the worker who is performing actual I/O continues to work and do
> > further I/O.  Do you think this is not a practical problem?
> >
>
> I don't know.  Generally, we try to delay (if required) before
> processing (read/write) one page which means it will happen for I/O
> intensive operations, so I am not sure if the point you are making is
> completely correct.

Ok, I agree with the point that we are checking it only when we are
doing the I/O operation.  But we also need to consider that each I/O
operation has a different weight.  So even if we have a delay point at
the I/O operation, there is a possibility that we might delay a worker
which is just reading a buffer with a page hit (VacuumCostPageHit),
while the other worker which is actually dirtying the page
(VacuumCostPageDirty = 20) continues the work and does more I/O.

>
> > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > one worker who has done more I/O will continue to do further I/O and
> > the one which has not done much I/O will not perform more I/O in
> > future.  So it might not be too bad if we compute shared costs as you
> > suggested above.
> >
>
> I am thinking if we can write the patch for both the approaches (a.
> compute shared costs and try to delay based on that, b. try to divide
> the I/O cost among workers as described in the email above[1]) and do
> some tests to see the behavior of throttling, that might help us in
> deciding what is the best strategy to solve this problem, if any.
> What do you think?

I agree with this idea.  I can come up with a POC patch for approach
(b).  Meanwhile, if someone is interested in quickly hacking on
approach (a), then we can do some testing and compare.  Sawada-san, by
any chance would you be interested in writing a POC for approach (a)?
Otherwise, I will try to write it after finishing the first one
(approach b).
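
For concreteness, here is a rough standalone sketch of what approach (a)
could look like (C11 atomics standing in for a pg_atomic_* counter in DSM;
the names, values, and the reset-then-sleep policy are illustrative, not
actual patch code): every worker adds its page costs to one shared balance,
and whichever worker observes the limit being crossed resets the balance and
takes the nap.

#include <stdatomic.h>
#include <unistd.h>

/* Stand-ins for state that would live in shared memory and for the GUCs. */
static _Atomic long shared_cost_balance;
static const long vacuum_cost_limit = 200;
static const long vacuum_cost_delay_usec = 2000;	/* e.g. 2ms */

/*
 * Called by each parallel vacuum worker after charging the cost of one page
 * access (using the usual hit/miss/dirty weights).
 */
static void
shared_vacuum_delay_point(long page_cost)
{
	long		newbal;

	newbal = atomic_fetch_add(&shared_cost_balance, page_cost) + page_cost;

	if (newbal >= vacuum_cost_limit)
	{
		/* Reset the shared balance; this worker pays the sleep. */
		atomic_store(&shared_cost_balance, 0);
		usleep(vacuum_cost_delay_usec);
	}
}

int
main(void)
{
	/* Simulate one worker charging 100 page-miss costs of 10 each. */
	for (int i = 0; i < 100; i++)
		shared_vacuum_delay_point(10);
	return 0;
}

The obvious caveat, as discussed above, is that the worker that ends up
sleeping is not necessarily the one that generated most of the I/O.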

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com