Re: Print physical file path when checksum check fails

2020-02-18 Thread Michael Paquier
On Wed, Feb 19, 2020 at 03:00:54PM +0900, Kyotaro Horiguchi wrote:
> I have had support requests related to broken blocks several times, and
> (I think) most of *them* had a hard time locating the broken block or
> even the broken file.  I don't think it is useless at all, but I'm not
> sure it is worth the additional complexity.

I handle stuff like that from time to time, and any reports usually
end up with people knowledgeable enough about PostgreSQL to know the
difference.  My point is that in order to know where a broken block is
physically located on disk, you need to know four things:
- The block number.
- The physical location of the relation.
- The size of the block.
- The length of a file segment.
The first two items are printed in the error message, and you can
easily work out the actual location (file, offset) from the other two.
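
For what it's worth, that calculation is just an integer division and a
multiplication.  A minimal standalone sketch, assuming the default BLCKSZ and
RELSEG_SIZE of a stock build and the sample block number from this thread (it
is not code from any of the patches discussed here):

  #include <stdio.h>

  /* Defaults of a stock build; both are compile-time constants. */
  #define BLCKSZ      8192            /* page size in bytes */
  #define RELSEG_SIZE 131072          /* blocks per 1GB segment file */

  int
  main(void)
  {
      unsigned int blkno = 333571;    /* sample block number from this thread */
      unsigned int segno = blkno / RELSEG_SIZE;
      unsigned long long offset =
          (unsigned long long) (blkno % RELSEG_SIZE) * BLCKSZ;

      /* for relation base/65959/656195 this points at base/65959/656195.2 */
      printf("segment suffix .%u, byte offset %llu\n", segno, offset);
      return 0;
  }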

I am not necessarily against improving the error message here, but
FWIW I think that we need to seriously consider whether the code
complications and the maintenance cost involved are really worth
saving users one simple calculation.  In particular, quickly reading
through the patch, I am rather unhappy about the shape of the second
patch, which pushes knowledge of segment numbers down into relpath.c
and creates more complication around the handling of
zero_damaged_pages and zeroed pages.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] WAL logging problem in 9.4.3?

2020-02-18 Thread Noah Misch
I think attached v35nm is ready for commit to master.  Would anyone like to
talk me out of back-patching this?  I would not enjoy back-patching it, but
it's hard to justify lack of back-patch for a data-loss bug.

Notable changes since v34:

- Separate a few freestanding fixes into their own patches.

On Mon, Jan 27, 2020 at 07:28:31PM +0900, Kyotaro Horiguchi wrote:
> --- a/src/backend/catalog/storage.c
> +++ b/src/backend/catalog/storage.c
> @@ -388,13 +388,7 @@ RelationPreTruncate(Relation rel)
>   /* Record largest maybe-unsynced block of files under tracking  */
>   pending = hash_search(pendingSyncHash, &(rel->rd_smgr->smgr_rnode.node),
> HASH_FIND, NULL);
> - if (pending)
> - {
> - BlockNumber nblocks = smgrnblocks(rel->rd_smgr, MAIN_FORKNUM);
> -
> - if (pending->max_truncated < nblocks)
> - pending->max_truncated = nblocks;
> - }
> + pending->is_truncated = true;

- Fix this crashing when "pending" is NULL, as it is in this test case:

  begin;
  create temp table t ();
  create table t2 ();  -- cause pendingSyncHash to exist
  truncate t;
  rollback;

- Fix the "deleted while still in use" problem that Thomas Munro reported, by
  removing the heap_create() change.  Restoring the saved rd_createSubid had
  made obsolete the heap_create() change.  check-world now passes with
  wal_level=minimal and CLOBBER_CACHE_ALWAYS.

- Set rd_droppedSubid in RelationForgetRelation(), not
  RelationClearRelation().  RelationForgetRelation() knows it is processing a
  drop, but RelationClearRelation() could only infer that from circumstantial
  evidence.  This seems more future-proof to me.

- When reusing an index build, instead of storing the dropped relid in the
  IndexStmt and opening the dropped relcache entry in ATExecAddIndex(), store
  the subid fields in the IndexStmt.  This is less code, and I felt
  RelationIdGetRelationCache() invited misuse.
Author: Noah Misch 
Commit: Noah Misch 

Fix cosmetic blemishes involving rd_createSubid.

Remove an obsolete comment from AtEOXact_cleanup().  Restore formatting
of a comment in struct RelationData, mangled by the pgindent run in
commit 9af4159fce6654aa0e081b00d02bca40b978745c.  Back-patch to 9.5 (all
supported versions), because another fix stacks on this.

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 50f8912..44fa843 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3029,10 +3029,7 @@ AtEOXact_cleanup(Relation relation, bool isCommit)
 *
 * During commit, reset the flag to zero, since we are now out of the
 * creating transaction.  During abort, simply delete the relcache entry
-* --- it isn't interesting any longer.  (NOTE: if we have forgotten the
-* new-ness of a new relation due to a forced cache flush, the entry 
will
-* get deleted anyway by shared-cache-inval processing of the aborted
-* pg_class insertion.)
+* --- it isn't interesting any longer.
 */
if (relation->rd_createSubid != InvalidSubTransactionId)
{
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 31d8a1a..bf7a7df 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -63,7 +63,7 @@ typedef struct RelationData
 * 
rd_replidindex) */
boolrd_statvalid;   /* is rd_statlist valid? */
 
-   /*
+   /*--
 * rd_createSubid is the ID of the highest subtransaction the rel has
 * survived into; or zero if the rel was not created in the current top
 * transaction.  This can be now be relied on, whereas previously it 
could
@@ -73,8 +73,13 @@ typedef struct RelationData
 * have forgotten changing it). rd_newRelfilenodeSubid can be forgotten
 * when a relation has multiple new relfilenodes within a single
 * transaction, with one of them occurring in a subsequently aborted
-* subtransaction, e.g. BEGIN; TRUNCATE t; SAVEPOINT save; TRUNCATE t;
-* ROLLBACK TO save; -- rd_newRelfilenodeSubid is now forgotten
+* subtransaction, e.g.
+*  BEGIN;
+*  TRUNCATE t;
+*  SAVEPOINT save;
+*  TRUNCATE t;
+*  ROLLBACK TO save;
+*  -- rd_newRelfilenodeSubid is now forgotten
 */
SubTransactionId rd_createSubid;/* rel was created in current 
xact */
SubTransactionId rd_newRelfilenodeSubid;/* new relfilenode 
assigned in
Author: Noah Misch 
Commit: Noah Misch 

During heap rebuild, lock any TOAST index until end of transaction.

swap_relation_files() calls toast_get_valid_index() to find and lock
this index, just before 

Re: WAL usage calculation patch

2020-02-18 Thread Kirill Bychik
On Tue, Feb 18, 2020 at 06:23, Thomas Munro wrote:
> On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer  wrote:
> > On Wed, 5 Feb 2020 at 21:36, Kirill Bychik  wrote:
> > > Patch is separated in two parts: core changes and pg_stat_statements
> > > additions. Essentially the extension has its schema updated to allow
> > > two more fields, docs updated to reflect the change. Patch is prepared
> > > against master branch.
> > >
> > > Please provide your comments and/or code findings.
> >
> > I like the concept, I'm a big fan of anything that affordably improves
> > visibility into Pg's I/O and activity.
>
> +1
>
> > To date I've been relying on tools like systemtap to do this sort of
> > thing. But that's a bit specialised, and Pg currently lacks useful
> > instrumentation for it so it can be a pain to match up activity by
> > parallel workers and that sort of thing. (I aim to find time to submit
> > a patch for that.)
>
> (I'm interested in seeing your conference talk about that!  I did a
> bunch of stuff with static probes to measure PHJ behaviour around
> barrier waits and so on but it was hard to figure out what stuff like
> that to put in the actual tree, it was all a bit
> use-once-to-test-a-theory-and-then-throw-away.)
>
> Kirill, I noticed that you included a regression test that is failing.  Can
> this possibly be stable across machines or even on the same machine?
> Does it still pass for you or did something change on the master
> branch to add a new WAL record since you posted the patch?

Thank you for testing the patch and running extension checks. I assume
the patch applies without problems.

As for the regression test, it apparently requires some rework. I didn't
pay enough attention to making sure the data I check is actually meaningful
and isolated enough to be repeatable.

Please consider the extension part of the patch as WIP; I'll resubmit
the patch once I get a stable and meaningful test up. Thanks for
finding it!

>  query                                      | calls | rows | wal_write_bytes | wal_write_records
> --------------------------------------------+-------+------+-----------------+-------------------
> - CREATE INDEX test_b ON test(b)            |     1 |    0 |            1673 |                16
> - DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER) |     1 |    0 |              56 |                 1
> + CREATE INDEX test_b ON test(b)            |     1 |    0 |            1755 |                17
> + DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER) |     1 |    0 |               0 |                 0




Re: plan cache overhead on plpgsql expression

2020-02-18 Thread Amit Langote
On Wed, Feb 19, 2020 at 3:56 PM Amit Langote  wrote:
> On Wed, Feb 19, 2020 at 3:38 PM Pavel Stehule  wrote:
> > On Wed, Feb 19, 2020 at 7:30, Pavel Stehule wrote:
> >> On Tue, Feb 18, 2020 at 17:08, Amit Langote wrote:
> >>> > I updated the patch to do that.
> >>> >
> >>> > With the new patch, `select foo()`, with inline-able sql_incr() in it,
> >>> > runs in 679 ms.
> >>> >
> >>> > Without any inline-able function, it runs in 330 ms, whereas with
> >>> > HEAD, it takes 590 ms.
> >>>
> >>> I polished it a bit.
> >>
> >>
> >> the performance looks very interesting - on my comp the execution time of  
> >> 1 iterations was decreased from 34 sec to 15 sec,
> >>
> >> So it is interesting speedup
> >
> > but the regression tests fail
>
> Oops, I failed to check src/pl/plpgsql tests.
>
> Fixed in the attached.

Added a regression test based on examples discussed here too.

Thanks,
Amit
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 2d3ec22407..5ce0079a12 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -108,6 +108,7 @@ static bool 
contain_volatile_functions_not_nextval_walker(Node *node, void *cont
 static bool max_parallel_hazard_walker(Node *node,
   
max_parallel_hazard_context *context);
 static bool contain_nonstrict_functions_walker(Node *node, void *context);
+static bool contain_inlinable_functions_walker(Node *node, void *context);
 static bool contain_context_dependent_node(Node *clause);
 static bool contain_context_dependent_node_walker(Node *node, int *flags);
 static bool contain_leaked_vars_walker(Node *node, void *context);
@@ -1218,6 +1219,63 @@ contain_nonstrict_functions_walker(Node *node, void 
*context)
  context);
 }
 
+/*
+ * Check clauses for inline-able functions
+ */
+
+bool
+contain_inlinable_functions(Node *node)
+{
+   return contain_inlinable_functions_walker(node, NULL);
+}
+
+/*
+ * can_inline_function - checks if a function is inline-able
+ */
+static bool
+can_inline_function(HeapTuple func_tuple)
+{
+   Form_pg_proc funcform = (Form_pg_proc) GETSTRUCT(func_tuple);
+
+   /*
+* Nope if the function is not SQL-language or has other showstopper
+* properties.  (The prokind and nargs checks are just paranoia.)
+*/
+   return  funcform->prolang == SQLlanguageId &&
+   funcform->prokind == PROKIND_FUNCTION &&
+   !funcform->prosecdef && !funcform->proretset &&
+   funcform->prorettype != RECORDOID &&
+   heap_attisnull(func_tuple, Anum_pg_proc_proconfig, 
NULL);
+}
+
+static bool
+can_inline_function_checker(Oid funcid, void *context)
+{
+   HeapTuple   func_tuple;
+   boolresult;
+
+   func_tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
+   if (!HeapTupleIsValid(func_tuple))
+   elog(ERROR, "cache lookup failed for function %u", funcid);
+
+   result = can_inline_function(func_tuple);
+   ReleaseSysCache(func_tuple);
+
+   return result;
+}
+
+static bool
+contain_inlinable_functions_walker(Node *node, void *context)
+{
+   if (node == NULL)
+   return false;
+   if (check_functions_in_node(node, can_inline_function_checker, context))
+   return true;
+
+   return expression_tree_walker(node, contain_inlinable_functions_walker,
+ context);
+}
+
 /*
  * Check clauses for context-dependent nodes
  */
@@ -4022,7 +4080,8 @@ simplify_function(Oid funcid, Oid result_type, int32 
result_typmod,
Assert(newexpr != (Expr *) );
}
 
-   if (!newexpr && allow_non_const)
+   if (!newexpr && allow_non_const &&
+   can_inline_function(func_tuple))
newexpr = inline_function(funcid, result_type, result_collid,
  input_collid, 
args, funcvariadic,
  func_tuple, 
context);
@@ -4415,16 +4474,11 @@ inline_function(Oid funcid, Oid result_type, Oid 
result_collid,
int i;
 
/*
-* Forget it if the function is not SQL-language or has other 
showstopper
-* properties.  (The prokind and nargs checks are just paranoia.)
+* Caller should already have checked whether the function can be 
inlined
+* using 

Re: Yet another fast GiST build

2020-02-18 Thread Thomas Munro
On Mon, Dec 30, 2019 at 7:43 PM Andrey Borodin  wrote:
> PFA rebased patch.

Hi Andrey,

This looks really interesting, and I am sure there are a lot of GIS
people who would love to see dramatically faster and smaller indexes
in PG13.  I don't know enough to comment on the details, but here are
some superficial comments:

+   method is also optional and is used diring fast GiST build.

-> during

+/* esteblish order between x and y */

-> establish

+/* Compute Z-oder for point */
 static inline uint64
 point_zorder_internal(Point *p)

-> order

Could this function please have a comment that explains why it works?
I mean, just a breadcrumb... the name of the technique or something...
so that uninitiated hackers can google their way to a clue (is it
"Morton encoding"?)

MSVC says:

src/backend/access/gist/gistproc.c(1582): error C2065: 'INT32_MAX' :
undeclared identifier

GCC says:

gistbuild.c: In function ‘gist_indexsortbuild’:
gistbuild.c:256:4: error: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]
IndexTuple *itvec = gistextractpage(lower_page, _len);
^




Re: plan cache overhead on plpgsql expression

2020-02-18 Thread Amit Langote
On Wed, Feb 19, 2020 at 3:38 PM Pavel Stehule  wrote:
> On Wed, Feb 19, 2020 at 7:30, Pavel Stehule wrote:
>> On Tue, Feb 18, 2020 at 17:08, Amit Langote wrote:
>>> > I updated the patch to do that.
>>> >
>>> > With the new patch, `select foo()`, with inline-able sql_incr() in it,
>>> > runs in 679 ms.
>>> >
>>> > Without any inline-able function, it runs in 330 ms, whereas with
>>> > HEAD, it takes 590 ms.
>>>
>>> I polished it a bit.
>>
>>
>> the performance looks very interesting - on my comp the execution time of  
>> 1 iterations was decreased from 34 sec to 15 sec,
>>
>> So it is interesting speedup
>
> but the regression tests fail

Oops, I failed to check src/pl/plpgsql tests.

Fixed in the attached.

Thanks,
Amit
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 2d3ec22407..5ce0079a12 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -108,6 +108,7 @@ static bool 
contain_volatile_functions_not_nextval_walker(Node *node, void *cont
 static bool max_parallel_hazard_walker(Node *node,
   
max_parallel_hazard_context *context);
 static bool contain_nonstrict_functions_walker(Node *node, void *context);
+static bool contain_inlinable_functions_walker(Node *node, void *context);
 static bool contain_context_dependent_node(Node *clause);
 static bool contain_context_dependent_node_walker(Node *node, int *flags);
 static bool contain_leaked_vars_walker(Node *node, void *context);
@@ -1218,6 +1219,63 @@ contain_nonstrict_functions_walker(Node *node, void 
*context)
  context);
 }
 
+/*
+ * Check clauses for inline-able functions
+ */
+
+bool
+contain_inlinable_functions(Node *node)
+{
+   return contain_inlinable_functions_walker(node, NULL);
+}
+
+/*
+ * can_inline_function - checks if a function is inline-able
+ */
+static bool
+can_inline_function(HeapTuple func_tuple)
+{
+   Form_pg_proc funcform = (Form_pg_proc) GETSTRUCT(func_tuple);
+
+   /*
+* Nope if the function is not SQL-language or has other showstopper
+* properties.  (The prokind and nargs checks are just paranoia.)
+*/
+   return  funcform->prolang == SQLlanguageId &&
+   funcform->prokind == PROKIND_FUNCTION &&
+   !funcform->prosecdef && !funcform->proretset &&
+   funcform->prorettype != RECORDOID &&
+   heap_attisnull(func_tuple, Anum_pg_proc_proconfig, 
NULL);
+}
+
+static bool
+can_inline_function_checker(Oid funcid, void *context)
+{
+   HeapTuple   func_tuple;
+   boolresult;
+
+   func_tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
+   if (!HeapTupleIsValid(func_tuple))
+   elog(ERROR, "cache lookup failed for function %u", funcid);
+
+   result = can_inline_function(func_tuple);
+   ReleaseSysCache(func_tuple);
+
+   return result;
+}
+
+static bool
+contain_inlinable_functions_walker(Node *node, void *context)
+{
+   if (node == NULL)
+   return false;
+   if (check_functions_in_node(node, can_inline_function_checker, context))
+   return true;
+
+   return expression_tree_walker(node, contain_inlinable_functions_walker,
+ context);
+}
+
 /*
  * Check clauses for context-dependent nodes
  */
@@ -4022,7 +4080,8 @@ simplify_function(Oid funcid, Oid result_type, int32 
result_typmod,
Assert(newexpr != (Expr *) );
}
 
-   if (!newexpr && allow_non_const)
+   if (!newexpr && allow_non_const &&
+   can_inline_function(func_tuple))
newexpr = inline_function(funcid, result_type, result_collid,
  input_collid, 
args, funcvariadic,
  func_tuple, 
context);
@@ -4415,16 +4474,11 @@ inline_function(Oid funcid, Oid result_type, Oid 
result_collid,
int i;
 
/*
-* Forget it if the function is not SQL-language or has other 
showstopper
-* properties.  (The prokind and nargs checks are just paranoia.)
+* Caller should already have checked whether the function can be 
inlined
+* using can_function_inline().
 */
-   if (funcform->prolang != SQLlanguageId ||
-   funcform->prokind != PROKIND_FUNCTION ||
-   

Re: plan cache overhead on plpgsql expression

2020-02-18 Thread Pavel Stehule
On Wed, Feb 19, 2020 at 7:30, Pavel Stehule wrote:

>
>
> On Tue, Feb 18, 2020 at 17:08, Amit Langote wrote:
>
>> On Tue, Feb 18, 2020 at 6:56 PM Amit Langote 
>> wrote:
>> > On Tue, Feb 18, 2020 at 2:56 PM Pavel Stehule 
>> wrote:
>> > > On Tue, Feb 18, 2020 at 6:03, Amit Langote <amitlangot...@gmail.com> wrote:
>> > >> I didn't send the patch, because it didn't handle the cases where a
>> > >> simple expression consists of an inline-able function(s) in it, which
>> > >> are better handled by a full-fledged planner call backed up by the
>> > >> plan cache.  If we don't do that then every evaluation of such
>> > >> "simple" expression needs to invoke the planner.  For example:
>> > >>
>> > >> Consider this inline-able SQL function:
>> > >>
>> > >> create or replace function sql_incr(a bigint)
>> > >> returns int
>> > >> immutable language sql as $$
>> > >> select a+1;
>> > >> $$;
>> > >>
>> > >> Then this revised body of your function foo():
>> > >>
>> > >> CREATE OR REPLACE FUNCTION public.foo()
>> > >>  RETURNS int
>> > >>  LANGUAGE plpgsql
>> > >>  IMMUTABLE
>> > >> AS $function$
>> > >> declare i bigint = 0;
>> > >> begin
>> > >>   while i < 100
>> > >>   loop
>> > >> i := sql_incr(i);
>> > >>   end loop; return i;
>> > >> end;
>> > >> $function$
>> > >> ;
>> > >>
>> > >> With HEAD `select foo()` finishes in 786 ms, whereas with the patch,
>> > >> it takes 5102 ms.
>> > >>
>> > >> I think the patch might be good idea to reduce the time to compute
>> > >> simple expressions in plpgsql, if we can address the above issue.
>> > >
>> > >
>> > > Your patch is very interesting - minimally it returns performance
>> before 8.2. The mentioned issue can be fixed if we disallow SQL functions
>> in this fast execution.
>> >
>> > I updated the patch to do that.
>> >
>> > With the new patch, `select foo()`, with inline-able sql_incr() in it,
>> > runs in 679 ms.
>> >
>> > Without any inline-able function, it runs in 330 ms, whereas with
>> > HEAD, it takes 590 ms.
>>
>> I polished it a bit.
>>
>
> the performance looks very interesting - on my comp the execution time of
> 1 iterations was decreased from 34 sec to 15 sec,
>
> So it is interesting speedup
>

but the regression tests fail



> Pavel
>
>
>
>> Thanks,
>> Amit
>>
>


regression.out
Description: Binary data


regression.diffs
Description: Binary data


Re: plan cache overhead on plpgsql expression

2020-02-18 Thread Pavel Stehule
On Tue, Feb 18, 2020 at 17:08, Amit Langote wrote:

> On Tue, Feb 18, 2020 at 6:56 PM Amit Langote 
> wrote:
> > On Tue, Feb 18, 2020 at 2:56 PM Pavel Stehule 
> wrote:
> > > On Tue, Feb 18, 2020 at 6:03, Amit Langote wrote:
> > >> I didn't send the patch, because it didn't handle the cases where a
> > >> simple expression consists of an inline-able function(s) in it, which
> > >> are better handled by a full-fledged planner call backed up by the
> > >> plan cache.  If we don't do that then every evaluation of such
> > >> "simple" expression needs to invoke the planner.  For example:
> > >>
> > >> Consider this inline-able SQL function:
> > >>
> > >> create or replace function sql_incr(a bigint)
> > >> returns int
> > >> immutable language sql as $$
> > >> select a+1;
> > >> $$;
> > >>
> > >> Then this revised body of your function foo():
> > >>
> > >> CREATE OR REPLACE FUNCTION public.foo()
> > >>  RETURNS int
> > >>  LANGUAGE plpgsql
> > >>  IMMUTABLE
> > >> AS $function$
> > >> declare i bigint = 0;
> > >> begin
> > >>   while i < 100
> > >>   loop
> > >> i := sql_incr(i);
> > >>   end loop; return i;
> > >> end;
> > >> $function$
> > >> ;
> > >>
> > >> With HEAD `select foo()` finishes in 786 ms, whereas with the patch,
> > >> it takes 5102 ms.
> > >>
> > >> I think the patch might be good idea to reduce the time to compute
> > >> simple expressions in plpgsql, if we can address the above issue.
> > >
> > >
> > > Your patch is very interesting - minimally it returns performance
> before 8.2. The mentioned issue can be fixed if we disallow SQL functions
> in this fast execution.
> >
> > I updated the patch to do that.
> >
> > With the new patch, `select foo()`, with inline-able sql_incr() in it,
> > runs in 679 ms.
> >
> > Without any inline-able function, it runs in 330 ms, whereas with
> > HEAD, it takes 590 ms.
>
> I polished it a bit.
>

the performance looks very interesting - on my comp the execution time of
1 iterations was decreased from 34 sec to 15 sec,

So it is interesting speedup

Pavel



> Thanks,
> Amit
>


Re: Print physical file path when checksum check fails

2020-02-18 Thread Kyotaro Horiguchi
At Wed, 19 Feb 2020 13:28:04 +0900, Michael Paquier  wrote 
in 
> On Wed, Feb 19, 2020 at 01:07:36PM +0900, Kyotaro Horiguchi wrote:
> > If we also verify checksum in md layer, callback is overkill since the
> > immediate caller consumes the event immediately.  We can signal the
> > error by somehow returning a file tag.
> 
> FWIW, I am wondering if there is any need for a change here and
> complicate more the code.  If you know the block number, the page size
> and the segment file size you can immediately guess where is the
> damaged block.  The first information is already part of the error

I have had support requests related to broken blocks several times, and
(I think) most of *them* had a hard time locating the broken block or
even the broken file.  I don't think it is useless at all, but I'm not
sure it is worth the additional complexity.

> damaged block.  The first information is already part of the error
> message, and the two other ones are constants defined at
> compile-time.

Maybe you have misread the snippet?

What Hubert proposed is:

 "invalid page in block %u of relation file %s; zeroing out page",
blkno, 

The second format in my messages just before is:
  "invalid page in block %u in relation %u, file \"%s\"",
 blockNum, smgr->smgr_rnode.node.relNode, smgrfname()

None of them are compile-time constants at all.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-02-18 Thread Amit Kapila
On Mon, Feb 17, 2020 at 2:42 AM Tom Lane  wrote:
>
> Andres Freund  writes:
> > On 2020-02-14 13:34:03 -0500, Robert Haas wrote:
> >> I think the group locking + deadlock detection things are more
> >> fundamental than you might be crediting, but I agree that having
> >> parallel mechanisms has its own set of pitfalls.
>
> > It's possible. But I'm also hesitant to believe that we'll not need
> > other lock types that conflict between leader/worker, but that still
> > need deadlock detection.  The more work we want to parallelize, the more
> > likely that imo will become.
>
> Yeah.  The concept that leader and workers can't conflict seems to me
> to be dependent, in a very fundamental way, on the assumption that
> we only need to parallelize read-only workloads.  I don't think that's
> going to have a long half-life.
>

Surely, someday, we will need to solve that problem.  But it is not
clear when, because the operations for which we want to solve the
relation extension lock problem don't seem to require it.  For example,
for parallel copy, or for further improving parallel vacuum to allow
multiple workers to scan and process the heap and individual indexes,
we don't need to change anything in group locking as far as I understand.

Now, for parallel deletes/updates, I think it will depend on how we
choose to parallelize those operations.  I mean, if we decide that each
worker will work on an independent set of pages, like we do for a
sequential scan, we again might not need to change group locking,
unless I am missing something, which is possible.

I think that until we know the real need for changing group locking,
going in the direction of what Tom suggested, using an array of LWLocks
[1] to address the problems at hand, is a good idea.  It is not very
clear to me whether we are thinking of giving up on Tom's idea [1] and
changing group locking even though nobody has proposed an idea or patch
which requires that, or whether we are thinking that we can do what Tom
suggested for the relation extension lock and also plan to change group
locking for future parallel operations that might require it.

[1] - https://www.postgresql.org/message-id/19443.1581435793%40sss.pgh.pa.us
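
(For context, "an array of LWLocks" for relation extension could be as simple
as a fixed shared array indexed by a hash of the relation OID.  The fragment
below is purely an illustrative sketch against lwlock.h, with made-up names,
and it assumes the array has already been allocated and initialized in shared
memory at startup; it is not the design from [1].)

  #include "postgres.h"
  #include "storage/lwlock.h"

  #define N_EXTENSION_LOCKS 1024     /* illustrative size only */

  /* Assumed to point into shared memory, set up at postmaster startup. */
  extern LWLock *RelExtensionLockArray;

  /* Map a relation to one slot; collisions only cause spurious sharing. */
  static inline LWLock *
  RelExtensionLock(Oid relid)
  {
      uint32      h = (uint32) relid * 2654435761u;   /* multiplicative hash */

      return &RelExtensionLockArray[h % N_EXTENSION_LOCKS];
  }

  /*
   * Usage around a relation extension would then be:
   *     LWLockAcquire(RelExtensionLock(relid), LW_EXCLUSIVE);
   *     ... extend the relation ...
   *     LWLockRelease(RelExtensionLock(relid));
   */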

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




pg_regress cleans up tablespace twice.

2020-02-18 Thread Kyotaro Horiguchi
Hello.

I saw a failure of vcregress check with the following message several
times, on a machine under a heavy load and maybe with realtime virus
scanning.

> pg_regress: could not create directory ".../testtablespace": Permission 
> denied.

I found that pg_regress repeats the sequence
rmtree(tablespace)->make_directory(tablespace) twice under
initialize_environment, so the failure is most likely the Windows
DELETE_PENDING state.  It happens because the code is in
convert_sourcefiles_in, which is called successively twice from
convert_sourcefiles.

But in the first place it comes from [1] and the comment says:

> * XXX it would be better if pg_regress.c had nothing at all to do with
> * testtablespace, and this were handled by a .BAT file or similar on
> * Windows.  See pgsql-hackers discussion of 2008-01-18.

Is there any reason not to do that in vcregress.pl?  I think the
commands other than 'check' don't need this.

[1] https://www.postgresql.org/message-id/11718.1200684807%40sss.pgh.pa.us

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 9a4e52bc7b..ae2bbba541 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -490,28 +490,6 @@ convert_sourcefiles_in(const char *source_subdir, const char *dest_dir, const ch
 
 	snprintf(testtablespace, MAXPGPATH, "%s/testtablespace", outputdir);
 
-#ifdef WIN32
-
-	/*
-	 * On Windows only, clean out the test tablespace dir, or create it if it
-	 * doesn't exist.  On other platforms we expect the Makefile to take care
-	 * of that.  (We don't migrate that functionality in here because it'd be
-	 * harder to cope with platform-specific issues such as SELinux.)
-	 *
-	 * XXX it would be better if pg_regress.c had nothing at all to do with
-	 * testtablespace, and this were handled by a .BAT file or similar on
-	 * Windows.  See pgsql-hackers discussion of 2008-01-18.
-	 */
-	if (directory_exists(testtablespace))
-		if (!rmtree(testtablespace, true))
-		{
-			fprintf(stderr, _("\n%s: could not remove test tablespace \"%s\"\n"),
-	progname, testtablespace);
-			exit(2);
-		}
-	make_directory(testtablespace);
-#endif
-
 	/* finally loop on each file and do the replacement */
 	for (name = names; *name; name++)
 	{
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index 82dca29a61..d20d700d15 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -128,8 +128,15 @@ sub installcheck
 sub check
 {
 	my $schedule = shift || 'parallel';
+	my $tablespace = 'testtablespace';
+
 	InstallTemp();
 	chdir "${topdir}/src/test/regress";
+
+	# Tablespace setup
+	rmtree($tablespace) if (-e $tablespace);
+	mkdir($tablespace);
+
 	my @args = (
 		"../../../$Config/pg_regress/pg_regress",
 		"--dlpath=.",


Re: Resolving the python 2 -> python 3 mess

2020-02-18 Thread Tom Lane
After thinking about this awhile longer, I'm starting to believe
we should do some of each.  That is, the stub replacement for
plpython2.so should redirect "plpythonu" functions to plpython3.so,
but throw errors for "plpython2u" functions.  This is not because
of any technical difference between plpythonu and plpython2u ---
up to now, there wasn't any --- but because it seems like users
would be expecting that if they've read what we have said in

https://www.postgresql.org/docs/current/plpython-python23.html

Admittedly, what it says there is that plpythonu might become
Python 3 in some "distant" future release, not next year.
But at least there's a direct line between that documentation
and this behavior.

So attached is a pair of draft patches that do it like that.
0001 creates an extension with two conversion functions, based
on the script I showed in the other thread.  Almost independently
of that, 0002 provides code to generate a stub version of
plpython2.so that behaves as stated above.  0002 is incomplete,
because I haven't looked into what is needed in the MSVC build
scripts.  Maybe we could create some regression tests, too.
But I think these are potentially committable with those additions,
if people approve of this approach.

regards, tom lane

diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml
index 1921915..ac989a3 100644
--- a/doc/src/sgml/plpython.sgml
+++ b/doc/src/sgml/plpython.sgml
@@ -164,13 +164,6 @@
   
 
   
-   See also the
-   document https://docs.python.org/3/whatsnew/3.0.html;>What's
-   New In Python 3.0 for more information about porting to
-   Python 3.
-  
-
-  
It is not allowed to use PL/Python based on Python 2 and PL/Python
based on Python 3 in the same session, because the symbols in the
dynamic modules would clash, which could result in crashes of the
@@ -179,6 +172,90 @@
a mismatch is detected.  It is possible, however, to use both
PL/Python variants in the same database, from separate sessions.
   
+
+  
+   Converting from Python 2 to Python 3
+
+   
+See the
+document https://docs.python.org/3/whatsnew/3.0.html;>What's
+New In Python 3.0 for the Python community's information and
+recommendations about porting to Python 3.
+   
+
+   
+PostgreSQL provides some support for helping
+you to convert existing Python 2 routines to Python 3.  In an
+installation built with Python 3, there is an
+extension convert_python3 that changes functions
+and procedures from the plpythonu
+and plpython2u languages to
+the plpython3u language.  While doing so, it applies
+the 2to3 tool described in the above document to
+the body of each such routine.
+   
+
+   
+Using convert_python3 can be as simple as:
+
+CREATE EXTENSION convert_python3;
+CALL convert_python3_all();
+
+This must be done as database superuser.  If you wish, you can drop the
+extension once you're done converting everything.
+   
+
+   
+Since convert_python3 is Python 3 code, be careful
+not to install or run it in a session that has previously executed any
+Python 2 code.  As explained above, that won't work.
+   
+
+   
+convert_python3_all has two optional arguments: the
+name of the conversion tool to use (by default 2to3,
+but you might for instance need to provide a full path name) and any
+special command-line options to provide to it.  You might for example
+want to adjust the set of fixer rules
+that 2to3 applies:
+
+CALL convert_python3_all(options = '-f idioms -x apply');
+
+See 2to3's
+https://docs.python.org/3/library/2to3.html;>documentation
+for more information.
+   
+
+   
+The convert_python3 extension also provides a
+procedure that converts just one Python 2 function at a time:
+
+CALL convert_python3_one('myfunc(int)');
+
+The argument is the target function's OID, which can be written as
+a regprocedure constant (see
+).  The main reason to use this would be
+if you need to use different options for different functions.  It has
+the same optional arguments as convert_python3_all:
+
+CALL convert_python3_one('otherfunc(text)', tool = '/usr/bin/2to3',
+ options = '-f idioms');
+
+   
+
+   
+If you have needs that go beyond this, consult the source code for
+the convert_python3 extension (it's just a
+couple of plpython3u procedures) and adapt those
+procedures as necessary.
+   
+
+   
+Keep in mind that if you've constructed any DO blocks
+that use Python 2 code, those will have to be fixed up manually,
+wherever the source code for them exists.
+   
+  
  
 
  
diff --git a/src/pl/plpython/Makefile b/src/pl/plpython/Makefile
index 0d53d3d..f0de8ff 100644
--- a/src/pl/plpython/Makefile
+++ b/src/pl/plpython/Makefile
@@ -38,6 +38,9 @@ DATA = $(NAME)u.control $(NAME)u--1.0.sql $(NAME)u--unpackaged--1.0.sql
 ifeq ($(python_majorversion),2)
 

Re: Print physical file path when checksum check fails

2020-02-18 Thread Michael Paquier
On Wed, Feb 19, 2020 at 01:07:36PM +0900, Kyotaro Horiguchi wrote:
> If we also verify checksum in md layer, callback is overkill since the
> immediate caller consumes the event immediately.  We can signal the
> error by somehow returning a file tag.

FWIW, I am wondering if there is any need for a change here that
complicates the code further.  If you know the block number, the page
size and the segment file size, you can immediately work out where the
damaged block is.  The first piece of information is already part of
the error message, and the two other ones are constants defined at
compile-time.
--
Michael


signature.asc
Description: PGP signature


Re: Parallel copy

2020-02-18 Thread Amit Kapila
On Tue, Feb 18, 2020 at 7:51 PM Mike Blackwell 
wrote:

> On Sun, Feb 16, 2020 at 12:51 AM Andrew Dunstan <
> andrew.duns...@2ndquadrant.com> wrote:
>
>>
>> IIRC, in_quote only matters here in CSV mode (because CSV fields can
>> have embedded newlines). So why not just forbid parallel copy in CSV
>> mode, at least for now? I guess it depends on the actual use case. If we
>> expect to be parallel loading humungous CSVs then that won't fly.
>
>
> Loading large CSV files is pretty common here.  I hope this can be
> supported.
>
>
Thank you for your inputs.  It is important and valuable.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Parallel copy

2020-02-18 Thread Amit Kapila
On Tue, Feb 18, 2020 at 8:08 PM Ants Aasma  wrote:
>
> On Tue, 18 Feb 2020 at 15:21, Amit Kapila  wrote:
> >
> > On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma  wrote:
> > >
> > > On Tue, 18 Feb 2020 at 12:20, Amit Kapila  wrote:
> > > > This is something similar to what I had also in mind for this idea.  I
> > > > had thought of handing over complete chunk (64K or whatever we
> > > > decide).  The one thing that slightly bothers me is that we will add
> > > > some additional overhead of copying to and from shared memory which
> > > > was earlier from local process memory.  And, the tokenization (finding
> > > > line boundaries) would be serial.  I think that tokenization should be
> > > > a small part of the overall work we do during the copy operation, but
> > > > will do some measurements to ascertain the same.
> > >
> > > I don't think any extra copying is needed.
> > >
> >
> > I am talking about access to shared memory instead of the process
> > local memory.  I understand that an extra copy won't be required.
> >
> > > The reader can directly
> > > fread()/pq_copymsgbytes() into shared memory, and the workers can run
> > > CopyReadLineText() inner loop directly off of the buffer in shared memory.
> > >
> >
> > I am slightly confused here.  AFAIU, the for(;;) loop in
> > CopyReadLineText is about finding the line endings which we thought
> > that the reader process will do.
>
> Indeed, I somehow misread the code while scanning over it. So CopyReadLineText
> currently copies data from cstate->raw_buf to the StringInfo in
> cstate->line_buf. In parallel mode it would copy it from the shared data 
> buffer
> to local line_buf until it hits the line end found by the data reader. The
> amount of copying done is still exactly the same as it is now.
>

Yeah, on a broader level it will be something like that, but the actual
details might vary during implementation.  BTW, have you given any
thought to the other approach I shared above [1]?  We might not
go with that idea, but it is better to discuss different ideas and
evaluate their pros and cons.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1LyAyPCtBk4rkwomeT6%3DyTse5qWws-7i9EFwnUFZhvu5w%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Clean up some old cruft related to Windows

2020-02-18 Thread Michael Paquier
On Tue, Feb 18, 2020 at 05:54:43PM +0900, Kyotaro Horiguchi wrote:
> I understand that this is not for back-patching.

Cleanups don't go to back-branches.

> At Tue, 18 Feb 2020 16:44:59 +0900, Michael Paquier  
> wrote in 
> The unmodified section just above is griping that "Strange they call
> UNIX an application". The expression "application such as UNIX" seems
> to correspond to the gripe.  I tried to find the source of the phrase
> but the above URL (.._crt_signal.asp) sent me "We're sorry, the page
> you requested cannot be found.":(

Yes, we should use that instead:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/signal

> Do we allow the new PG version on already-unsupported platforms?  If I
> am not missing anything, Windows Server 2008 already reached
> End-Of-Extended-Support (2020/1/14) along with Windows 7.

Windows is known for keeping things backward compatible, so I don't
see any reason to not allow Postgres to run on those versions.
Outdated of course, but they could still be used at runtime even if
we cannot compile the code on them.

> By the way, that phrase is considering the Windows environment and
> perhaps cmd.exe. So the following description:
> 
> https://www.postgresql.org/docs/current/install-windows-full.html

Let's tackle that as a separate patch as this is MSVC-dependent.

>>  
>>   PostgreSQL can be expected to work on these 
>> operating
>>   systems: Linux (all recent distributions), Windows (Win2000 SP4 and later),
>>   FreeBSD, OpenBSD, NetBSD, macOS, AIX, HP/UX, and Solaris.
>>   Other Unix-like systems may also work but are not currently
>>   being tested.  In most cases, all CPU architectures supported by
> 
> (The coming version of) PostgreSQL doesn't support Win2000 SP4.

Right, per the change for src/common/exec.c.  I am wondering though if
we wouldn't have more portability issues if we tried to run Postgres on
something older than XP, as there have been many changes in the last
couple of years and we have no buildfarm members that old anymore.
Anyway, that's not worth the cost.  For now I have applied the smaller
version to the tree, as that's still a good cut.
--
Michael


signature.asc
Description: PGP signature


Re: Parallel copy

2020-02-18 Thread Amit Kapila
On Tue, Feb 18, 2020 at 8:41 PM David Fetter  wrote:
>
> On Tue, Feb 18, 2020 at 06:51:29PM +0530, Amit Kapila wrote:
> > On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma  wrote:
> > >
> > > On Tue, 18 Feb 2020 at 12:20, Amit Kapila  wrote:
> > > > This is something similar to what I had also in mind for this idea.  I
> > > > had thought of handing over complete chunk (64K or whatever we
> > > > decide).  The one thing that slightly bothers me is that we will add
> > > > some additional overhead of copying to and from shared memory which
> > > > was earlier from local process memory.  And, the tokenization (finding
> > > > line boundaries) would be serial.  I think that tokenization should be
> > > > a small part of the overall work we do during the copy operation, but
> > > > will do some measurements to ascertain the same.
> > >
> > > I don't think any extra copying is needed.
> >
> > I am talking about access to shared memory instead of the process
> > local memory.  I understand that an extra copy won't be required.
>
> Isn't accessing shared memory from different pieces of execution what
> threads were designed to do?
>

Sorry, but I don't understand what you mean by the above?  We are
going to use background workers (which are processes) for parallel
workers.  In general, it might not make a big difference in accessing
shared memory as compared to local memory especially because the cost
of other stuff in the copy is relatively higher.  But still, it is a
point to consider.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Print physical file path when checksum check fails

2020-02-18 Thread Kyotaro Horiguchi
Hello. Thank you for the new patch.

At Tue, 18 Feb 2020 09:27:39 +0800, Hubert Zhang  wrote in 
> On Wed, Feb 12, 2020 at 5:22 PM Hubert Zhang  wrote:
> 
> > Thanks Andres,
> >
> > On Tue, Feb 11, 2020 at 5:30 AM Andres Freund  wrote:
> >
> >> HHi,
> >>
> >> On 2020-02-10 16:04:21 +0800, Hubert Zhang wrote:
> >> > Currently we only print block number and relation path when checksum
> >> check
> >> > fails. See example below:
> >> >
> >> > ERROR: invalid page in block 333571 of relation base/65959/656195
> >>
> >> > DBA complains that she needs additional work to calculate which physical
> >> > file is broken, since one physical file can only contain `RELSEG_SIZE`
> >> > number of blocks. For large tables, we need to use many physical files
> >> with
> >> > additional suffix, e.g. 656195.1, 656195.2 ...
> >> >
> >> > Is that a good idea to also print the physical file path in error
> >> message?
> >> > Like below:
> >> >
> >> > ERROR: invalid page in block 333571 of relation base/65959/656195, file
> >> > path base/65959/656195.2
> >>
> >> I think that'd be a nice improvement. But:
> >>
> >> I don't think the way you did it is right architecturally. The
> >> segmenting is really something that lives within md.c, and we shouldn't
> >> further expose it outside of that. And e.g. the undo patchset uses files
> >> with different segmentation - but still goes through bufmgr.c.
> >>
> >> I wonder if this partially signals that the checksum verification piece
> >> is architecturally done in the wrong place currently? It's imo not good
> >> that every place doing an smgrread() needs to separately verify
> >> checksums. OTOH, it doesn't really belong inside smgr.c.
> >>
> >>
> >> This layering issue was also encountered in
> >>
> >> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3eb77eba5a51780d5cf52cd66a9844cd4d26feb0
> >> so perhaps we should work to reuse the FileTag it introduces to
> >> represent segments, without hardcoding the specific segment size?
> >>
> >>
> > I checked the FileTag commit. It calls `register_xxx_segment` inside md.c
> > to store the sync request into a hashtable and used by checkpointer later.
> >
> > Checksum verification is simpler. We could move the `PageIsVerified` call
> > into md.c (mdread). But we cannot elog an error inside md.c, because the
> > read buffer mode RBM_ZERO_ON_ERROR is handled at the bufmgr.c level.
> >
> > One idea is to save the error message (or the FileTag) somewhere (e.g. a
> > static string in bufmgr.c) by calling `register_checksum_failure` inside
> > mdread in md.c.
> >
> > As for your concern about the need to do checksum verify after every
> > smgrread, we now move the checksum verify logic into md.c, but we still
> > need to check the checksum verify result after smgrread and reset buffer to
> > zero if mode is RBM_ZERO_ON_ERROR.
> >
> > If this idea is OK, I will submit the new PR.
> >
> >
> Based on Andres's comments, here is the new patch for moving the checksum
> verification logic into mdread() instead of calling PageIsVerified in every
> smgrread(). It also uses FileTag to print the physical file name when
> checksum verification fails, which handles segmenting inside md.c as well.

The patch doesn't address the first comment from Andres. It still
exposes the notion of a segment to the upper layers, but neither the
bufmgr (bufmgr.c and bufpage.c) nor the Relation (relpath.c) layer
should know about segments.  So the two layers should ask smgr for the
real file name of a block without using a segment number.

For example, I think the following structure works. (Without moving
checksum verification.)

==
md.c:
  char *mdfname(SmgrRelation reln, Forknumber forkNum, BlockNumber blocknum);
smgr.c:
  char *smgrfname(SMgrRelation reln, ForkNumber forkNum, BlockNumber Blocknum);

bufmgr.c:
  ReadBuffer_common()
  {
  ..
      /* check for garbage data */
      if (!PageIsVerified((Page) bufBlock, blockNum))
          if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
              ereport(WARNING,
                      errmsg("invalid page in block %u in file %s; zeroing out page",
                             blockNum,
                             smgrfname(smgr, forkNum, blockNum)));
              MemSet((char *) bufBlock, 0, BLCKSZ);


However, the block number in the error message looks odd, as it is
actually not the block number within the segment. We could convert
BlockNum into a relative block number or an offset within the file, but
that would be overkill. So maybe something like this works?

"invalid page in block %u in relation %u, file \"%s\"",
   blockNum, smgr->smgr_rnode.node.relNode, smgrfname()


If we also verify checksum in md layer, callback is overkill since the
immediate caller consumes the event immediately.  We can signal the
error by somehow returning a file tag.


Re: Clean up some old cruft related to Windows

2020-02-18 Thread Michael Paquier
On Tue, Feb 18, 2020 at 12:26:06PM +0100, Juan José Santamaría Flecha wrote:
> I cannot point when SetEnv.bat was exactly dropped, probably Windows 7 SDK
> was the place where it was included [1], so that needs to be updated.
> 
> Using VS2019 and VS2017 this would be done using VsDevCmd.bat [2], while
> VS2015 and VS2013 use VSVARS32.bat.

Would you like to write a patch for that part?
--
Michael


signature.asc
Description: PGP signature


Re: Clean up some old cruft related to Windows

2020-02-18 Thread Michael Paquier
On Tue, Feb 18, 2020 at 12:05:42PM +0100, Juan José Santamaría Flecha wrote:
> Maybe this needs a specific thread, as it is not quite cruft but something
> that will require maintenance.

Makes sense.  I have discarded that part for now.
--
Michael


signature.asc
Description: PGP signature


Re: pg_trigger.tgparentid

2020-02-18 Thread Amit Langote
On Tue, Feb 18, 2020 at 1:11 PM Amit Langote  wrote:
> On Tue, Feb 18, 2020 at 6:56 AM Alvaro Herrera  
> wrote:
> @@ -16541,7 +16493,7 @@ CloneRowTriggersToPartition(Relation parent,
> Relation partition)
>   *
>   * However, if our parent is a partitioned relation, there might be
>
> This is existing text, but should really be:
>
> However, if our parent is a *partition* itself, there might be
>
> (Sorry, I forgot to report this when the bug-fix went in couple months ago.)
>
>   * internal triggers that need cloning.  In that case, we must skip
> - * clone it if the trigger on parent depends on another trigger.
> + * cloning it if the trigger on parent depends on another trigger.
>
> 2nd sentence seems unclear to me.  Does the following say what needs
> to be said here:
>
>  * However, if our parent is a partition itself, there might be
>  * internal triggers that need cloning.  For example, triggers on the
>  * parent that were in turn cloned from its own parent are marked
>  * internal, which too must be cloned to the partition.

Or:

 * However, if our parent is a partition itself, there might be
 * internal triggers that must not be skipped.  For example, triggers
 * on the parent that were in turn cloned from its own parent are
 * marked internal, which must be cloned to the partition.

Thanks,
Amit




Re: Internal key management system

2020-02-18 Thread Craig Ringer
On Tue, 11 Feb 2020 at 09:58, Andres Freund  wrote:
> Isn't that basically a problem of the past by now?  Partially due to
> changed laws (e.g. France, which used to be a problematic case), but
> also because it's basically futile to have import restrictions on
> cryptography by now. Just about every larger project contains
> significant amounts of cryptographic code and it's entirely impractical
> to operate anything interfacing with network without some form of
> transport encryption.  And just about all open source distribution
> mechanism have stopped separating out crypto code a long time ago.

Australia passed some stunningly backwards crypto laws only quite recently.
The Defence Trade Controls Act (DTCA) imposes restrictions not only on
exporting crypto software, but even on teaching about cryptography without
a permit. While supposedly restricted to military items and software, it's
rather broad and unclear how that is determined. It's one of those "written
broadly, applied selectively, trust us to be nice" laws, because they're
NEVER abused, right? See
https://www.defence.gov.au/ExportControls/Cryptography.asp .

More recently we passed another idiotic "did you even bother to listen at
all to the people who explained this to you" law called the
Telecommunications (Assistance and Access) Act. This allows the Government
to order companies/organisations to permit "lawful access" to encrypted
communication, including end-to-end encrypted communications. It doesn't
legislatively order the creation of backdoors, it just legislates that
companies must be able to add them on demand, so ... um, it legislates
backdoors. The law was drafted quickly, with little consultation, and
rammed through Parliament during Christmas with the usual "but the
Terrorists" handwaving. (Nevermind that real world terrorist organisations
are communicating mainly through videogames chats and other innocuous
places, not relying on strong crypto.) The law is already being abused to
attack journalists. It has some nasty provisions about what Australia may
order employees of a company to do as well, but thankfully the politicians
who drafted those provisions did not appear to understand things like
revision control or code review, so their real world threat is minimal.

My point? In practice, much of what we do with crypto is subject to a
variety of legal questions in many legal jurisdictions. Not much is
outright illegal in most places, but it's definitely complicated. I do not
participate in anything I know to be illegal or reasonably suspect to be
illegal - but with the kind of incredibly broad laws we have now on the
books in so many places, talking about the Caesar cipher / rot13 could be a
violation of someone's crypto law somewhere if you get the wrong judge and
the wrong situation.

The main change has been that it got simpler in the US, so enough
developers stopped caring. The US's Dep't of Commerce export restrictions
were weakened and the set of countries they applied to were narrowed,
allowing US companies and citizens the ability to participate in projects
containing non-crippled crypto.

There are still plenty of places where any sort of non-backdoored crypto is
entirely illegal, we just say "that's your problem" to people in those
places.

I wholly support this approach. Pretty much everything is illegal
somewhere. Patents are pain enough already.

(Apologies for thread-breaking reply, this is not from my
usually-subscribed account. I do not speak in any way for my employer on
this matter.)

-- 
Craig Ringer


Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-02-18 Thread Amit Langote
On Tue, Feb 18, 2020 at 9:32 PM Fujii Masao  wrote:
> OK, I changed the doc that way. Attached the updated version of the patch.

Thank you.  Looks good to me.

Regards,
Amit




Re: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()

2020-02-18 Thread Peter Geoghegan
On Mon, Feb 10, 2020 at 1:05 AM Floris Van Nee  wrote:
> I ran all the tests on two different machines, several times for 1 hour each 
> time. I'm still having a hard time getting reliable results for the 30 
> clients case though. I'm pretty certain the patches bring a performance 
> benefit, but how high exactly is difficult to say. As for applying only patch 
> 1+2 or all three patches - I found no significant difference between these 
> two cases. It looks like all the performance benefit is in the first two 
> patches.

Attached is v3, which no longer includes the third patch/optimization.
It also inlines (in the second patch) by marking the _bt_compare
definition as inline, while not changing anything in nbtree.h. I
believe that this is portable C99 -- let's see what CF Tester thinks
of it.
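
(For anyone double-checking the C99 rules: assuming the existing declaration
in nbtree.h does not carry "inline", an inline-qualified definition in the .c
file still provides the external definition per C99 6.7.4, so other
translation units can keep calling it.  A tiny self-contained illustration
with hypothetical names:)

  /* Declaration as a header like nbtree.h would keep it, without "inline": */
  extern int add_one(int x);

  /*
   * Definition marked inline in the .c file.  Because at least one file-scope
   * declaration omits "inline", C99 (6.7.4) still treats this as the external
   * definition, so the symbol remains linkable from other translation units
   * while calls in this unit may be inlined.
   */
  inline int
  add_one(int x)
  {
      return x + 1;
  }

  int
  main(void)
  {
      return add_one(41) == 42 ? 0 : 1;
  }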

I'm going to test this myself. It would be nice if you could repeat
something like the previous experiments with v3, Floris. master vs v3
(both patches together). With a variable number of clients.

Thanks
-- 
Peter Geoghegan


v3-0002-Inline-_bt_compare.patch
Description: Binary data


v3-0001-Avoid-pipeline-stall-in-_bt_compare.patch
Description: Binary data


Re: PL/Python - lifetime of variables?

2020-02-18 Thread Craig Ringer
On Wed, 19 Feb 2020 at 09:12, Kohei KaiGai  wrote:

> I noticed that variables in PL/Python are not released at the end of 
> procedure.
> Does it expected behavior?

PL/Python vars are freed when the interpreter instance is freed and
their refcounts reach zero.

I believe we use one subinterpreter for the lifetime of the backend
session. It might be worth checking whether we do an eager refcount
check and sweep when a procedure finishes.

But in general, I suggest that relying on object
finalizers/destructors to accomplish side effects visible outside the
procedure is bad development practice. Instead, use a "with" block, or
a try/finally block, and do explicit cleanup for external resources.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()

2020-02-18 Thread Peter Geoghegan
On Tue, Feb 18, 2020 at 4:45 PM Thomas Munro  wrote:
> Yeah, for CF entries with multiple threads, it currently looks at
> whichever thread has the most recent email on it, and then it finds
> the most recent patch set on that thread.  Perhaps it should look at
> all the registered threads and find the message with the newest patch
> set across all of them, as you say.  I will look into that.

Thanks!

I know that I am a bit unusual in that I use all of the CF app
features at the same time. But the current behavior of CF Tester
disincentivizes using multiple threads. This works against the goal of
having good metadata about patches that are worked on over multiple
releases or years. We have a fair few of those.

-- 
Peter Geoghegan




PL/Python - lifetime of variables?

2020-02-18 Thread Kohei KaiGai
Hello,

I noticed that variables in PL/Python are not released at the end of procedure.
Does it expected behavior?

See this example below:
https://github.com/heterodb/pg-strom/blob/master/test/input/arrow_python.source#L53

This PL/Python function maps a GPU buffer as a cupy.ndarray object using
cupy_strom.ipc_import() at L59. It is stored in X. I expected that X
would be released at the end of the procedure invocation, but that does
not happen.

The object X internally holds an IpcMemory class:
  https://github.com/heterodb/pg-strom/blob/master/python/cupy_strom.pyx
And it has a destructor routine that unmaps the above GPU buffer using the CUDA API:
  https://github.com/heterodb/pg-strom/blob/master/python/cupy_ipcmem.c#L242
Because of a restriction of the CUDA API, we cannot map a certain GPU
buffer twice in the same process space. So I noticed that the second
invocation of the PL/Python procedure in the same session failed.
L103 explicitly resets X, by X=0, to invoke the destructor manually.

I wonder whether this is expected behavior, or an oversight of some kind.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()

2020-02-18 Thread Thomas Munro
On Wed, Feb 19, 2020 at 1:35 PM Peter Geoghegan  wrote:
> On Tue, Feb 18, 2020 at 12:55 PM Thomas Munro  wrote:
> > The cfbot seems to be showing "pg_regress: initdb failed" on Ubuntu,
> > with an assertion failure like this:
> >
> > #2 0x008e594f in ExceptionalCondition
> > (conditionName=conditionName@entry=0x949098 "BTreeTupleGetNAtts(itup,
> > rel) >= key->keysz", errorType=errorType@entry=0x938a7d
> > "FailedAssertion", fileName=fileName@entry=0x949292 "nbtsearch.c",
>
> This is a legitimate bug in v1 of the patch, which was written in a
> hurry. v2 does not have the problem. Floris inadvertently created a
> separate thread for this same patch, which I responded to when posting
> v2. I've now updated the CF entry for this patch [1] to have both
> threads.
>
> BTW, I've noticed that CF Tester is wonky with patches that have
> multiple threads with at least one patch file posted to each thread.
> The deduplication patch [2] has this problem, for example. It would be
> nice if CF Tester knew to prefer one thread over another based on a
> simple rule, like "consistently look for patch files on the first
> thread connected to a CF app entry, never any other thread".

Ahh.  Well, I had that rule early on, and then had the problem that
some discussions moved entirely to a second or third thread, which left it
testing some ancient stale patch.

> Maybe you'd rather not go that way -- I guess that it would break
> other cases, such as the CF app entry for this patch (which now
> technically has one thread that supersedes the other). Perhaps a
> compromise is possible. At a minimum, CF Tester should not look for a
> patch on the (say) second thread of a CF app entry for a patch just
> because somebody posted an e-mail to that thread (an e-mail that did
> not contain a new patch). CF Tester will do this even though there is
> a more recent patch on the first thread of the CF app entry, that has
> already been accepted as passing by CFTester. I believe that CF Tester
> will actually pingpong back and forth between the same two patches on
> each thread as e-mail is sent to each thread, without anybody ever
> posting a new patch.

Yeah, for CF entries with multiple threads, it currently looks at
whichever thread has the most recent email on it, and then it finds
the most recent patch set on that thread.  Perhaps it should look at
all the registered threads and find the message with the newest patch
set across all of them, as you say.  I will look into that.




Re: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()

2020-02-18 Thread Peter Geoghegan
On Tue, Feb 18, 2020 at 12:55 PM Thomas Munro  wrote:
> The cfbot seems to be showing "pg_regress: initdb failed" on Ubuntu,
> with an assertion failure like this:
>
> #2 0x008e594f in ExceptionalCondition
> (conditionName=conditionName@entry=0x949098 "BTreeTupleGetNAtts(itup,
> rel) >= key->keysz", errorType=errorType@entry=0x938a7d
> "FailedAssertion", fileName=fileName@entry=0x949292 "nbtsearch.c",

This is a legitimate bug in v1 of the patch, which was written in a
hurry. v2 does not have the problem. Floris inadvertently created a
separate thread for this same patch, which I responded to when posting
v2. I've now updated the CF entry for this patch [1] to have both
threads.

BTW, I've noticed that CF Tester is wonky with patches that have
multiple threads with at least one patch file posted to each thread.
The deduplication patch [2] has this problem, for example. It would be
nice if CF Tester knew to prefer one thread over another based on a
simple rule, like "consistently look for patch files on the first
thread connected to a CF app entry, never any other thread".

Maybe you'd rather not go that way -- I guess that it would break
other cases, such as the CF app entry for this patch (which now
technically has one thread that supersedes the other). Perhaps a
compromise is possible. At a minimum, CF Tester should not look for a
patch on the (say) second thread of a CF app entry for a patch just
because somebody posted an e-mail to that thread (an e-mail that did
not contain a new patch). CF Tester will do this even though there is
a more recent patch on the first thread of the CF app entry, that has
already been accepted as passing by CFTester. I believe that CF Tester
will actually pingpong back and forth between the same two patches on
each thread as e-mail is sent to each thread, without anybody ever
posting a new patch.

Thanks

[1] https://commitfest.postgresql.org/27/2429/#
[2] https://commitfest.postgresql.org/27/2202/
-- 
Peter Geoghegan




Re: Fastpath while arranging the changes in LSN order in logical decoding

2020-02-18 Thread David Zhang

After manually applying the patch, a regenerated diff is attached.

On 2020-02-18 4:16 p.m., David Zhang wrote:

1. Tried to apply the patch to PG 12.2 commit 
45b88269a353ad93744772791feb6d01bc7e1e42 (HEAD -> REL_12_2, tag: REL_12_2), it 
doesn't work. Then tried to check the patch, and found the errors shown below.
$ git apply --check 
0001-Fastpath-for-sending-changes-to-output-plugin-in-log.patch
error: patch failed: contrib/test_decoding/logical.conf:1
error: contrib/test_decoding/logical.conf: patch does not apply
error: patch failed: src/backend/replication/logical/reorderbuffer.c:1133
error: src/backend/replication/logical/reorderbuffer.c: patch does not apply

2. Ran a further check on the file "logical.conf" and found there has been only one
commit since 2014, which doesn't have the parameter "logical_decoding_work_mem = 64kB".

3. Manually applied the patch, including
src/backend/replication/logical/reorderbuffer.c, and then ran a simple logical
replication test. A connection issue was found, as shown below:
"table public.pgbench_accounts: INSERT: aid[integer]:4071 bid[integer]:1 
abalance[integer]:0 filler[character]:'  
  '
pg_recvlogical: error: could not receive data from WAL stream: server closed 
the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_recvlogical: disconnected; waiting 5 seconds to try again"

4. This connection issue can be reproduced on PG 12.2 commit mentioned above, 
the basic steps,
4.1 Change "wal_level = logical" in "postgresql.conf"
4.2 create a logical slot and listen on it,
$ pg_recvlogical -d postgres --slot test --create-slot
$ pg_recvlogical -d postgres --slot test --start -f -

4.3 from another terminal, run the command below,
$ pgbench -i -p 5432 -d postgres

Let me know if I did something wrong, and if a new patch is available, I can 
re-run the test on the same environment.


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
diff --git a/contrib/test_decoding/logical.conf b/contrib/test_decoding/logical.conf
index 367f706651..02595d99d5 100644
--- a/contrib/test_decoding/logical.conf
+++ b/contrib/test_decoding/logical.conf
@@ -1,2 +1,3 @@
 wal_level = logical
 max_replication_slots = 4
+logical_decoding_work_mem = 64MB
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index a74fd705b4..62b661dc4d 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -989,11 +989,6 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		nr_txns++;
 	}
 
-	/*
-	 * TODO: Consider adding fastpath for the rather common nr_txns=1 case, no
-	 * need to allocate/build a heap then.
-	 */
-
 	/* allocate iteration state */
 	state = (ReorderBufferIterTXNState *)
 		MemoryContextAllocZero(rb->context,
@@ -1009,10 +1004,11 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		state->entries[off].segno = 0;
 	}
 
-	/* allocate heap */
-	state->heap = binaryheap_allocate(state->nr_txns,
-									  ReorderBufferIterCompare,
-									  state);
+	/* allocate heap, if we have more than one transaction. */
+	if (nr_txns > 1)
+		state->heap = binaryheap_allocate(state->nr_txns,
+										  ReorderBufferIterCompare,
+										  state);
 
 	/* Now that the state fields are initialized, it is safe to return it. */
 	*iter_state = state;
@@ -1044,7 +1040,9 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		state->entries[off].change = cur_change;
 		state->entries[off].txn = txn;
 
-		binaryheap_add_unordered(state->heap, Int32GetDatum(off++));
+		/* add to heap, only if we have more than one transaction. */
+		if (nr_txns > 1)
+			binaryheap_add_unordered(state->heap, Int32GetDatum(off++));
 	}
 
 	/* add subtransactions if they contain changes */
@@ -1073,12 +1071,15 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			state->entries[off].change = cur_change;
 			state->entries[off].txn = cur_txn;
 
-			binaryheap_add_unordered(state->heap, Int32GetDatum(off++));
+			/* add to heap, only if we have more than one transaction. */
+			if (nr_txns > 1)
+				

Re: Internal key management system

2020-02-18 Thread Cary Huang
Hi 



I have tried the attached kms_v3 patch and have some comments:



1) In the comments, I think you meant hmac + iv + encrypted data instead of iv 
+ hmac + encrypted data? 



---> in kmgr_wrap_key( ):

+   /*
+    * Assemble the wrapped key. The order of the wrapped key is iv, hmac and
+    * encrypted data.
+    */





2) I see that the create_keywrap_ctx function in kmgr_utils.c and the regular cipher
context init will both call ossl_aes256_encrypt_init to initialise a context for
encryption and key wrapping. In ossl_aes256_encrypt_init, the cipher method always
initialises to aes-256-cbc, which is OK for key wrapping because under CBC block
cipher mode we only have to supply one unique IV as the initial value. But for the
actual WAL and buffer encryption that will come later, I think the plan under
discussion is to use CTR block cipher mode, which requires one unique IV for each
block, and the sequence id from WAL and buffer can be used to derive a unique IV
per block for better security. I think it would be better to allow the caller to
decide which EVP_CIPHER to initialize: EVP_aes_256_cbc(), EVP_aes_256_ctr(), or
others.



+ossl_aes256_encrypt_init(pg_cipher_ctx *ctx, uint8 *key)
+{
+   if (!EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, NULL, NULL))
+       return false;
+   if (!EVP_CIPHER_CTX_set_key_length(ctx, PG_AES256_KEY_LEN))
+       return false;
+   if (!EVP_EncryptInit_ex(ctx, NULL, NULL, key, NULL))
+       return false;
+
+   /*
+    * Always enable padding. We don't need to check the return
+    * value as EVP_CIPHER_CTX_set_padding always returns 1.
+    */
+   EVP_CIPHER_CTX_set_padding(ctx, 1);
+
+   return true;
+}


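To make this concrete, here is a rough, untested sketch of what I mean by letting
the caller pick the EVP_CIPHER. The pg_cipher_ctx type and PG_AES256_KEY_LEN come
from the patch; the function name and the extra cipher parameter are only my
illustration:

/*
 * Sketch only, not from the patch: one init routine could serve both key
 * wrapping (CBC) and, later, WAL/buffer encryption (CTR) if the caller
 * passes the EVP_CIPHER explicitly.
 */
static bool
ossl_cipher_encrypt_init(pg_cipher_ctx *ctx, const EVP_CIPHER *cipher, uint8 *key)
{
	if (!EVP_EncryptInit_ex(ctx, cipher, NULL, NULL, NULL))
		return false;
	if (!EVP_CIPHER_CTX_set_key_length(ctx, PG_AES256_KEY_LEN))
		return false;
	if (!EVP_EncryptInit_ex(ctx, NULL, NULL, key, NULL))
		return false;
	EVP_CIPHER_CTX_set_padding(ctx, 1);
	return true;
}

/* callers would then choose the mode per purpose, for example:          */
/*   ossl_cipher_encrypt_init(ctx, EVP_aes_256_cbc(), key);  key wrapping */
/*   ossl_cipher_encrypt_init(ctx, EVP_aes_256_ctr(), key);  WAL/buffers  */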

3) Following up on point 2), I think we should enhance the enum to include not
only the encryption algorithm and key size, but also the block cipher mode,
because having all three pieces of information describes exactly how the KMS
performs encryption and decryption. Then, when we call "ossl_aes256_encrypt_init",
we can pass the new enum as an input parameter and it will initialise the
EVP_CIPHER_CTX with either EVP_aes_256_cbc() or EVP_aes_256_ctr() for different
purposes (key wrapping, WAL encryption, etc.).



---> kmgr.h

+/* Value of key_management_cipher */
+enum
+{
+   KMGR_CIPHER_OFF = 0,
+   KMGR_CIPHER_AES256
+};
+

so it would become

+enum
+{
+   KMGR_CIPHER_OFF = 0,
+   KMGR_CIPHER_AES256_CBC = 1,
+   KMGR_CIPHER_AES256_CTR = 2
+};



If you agree with this change, several other places will need to be changed as
well, such as "kmgr_cipher_string", "kmgr_cipher_value" and the initdb code.



4) The pg_wrap_key and pg_unwrap_key SQL functions defined in kmgr.c

I tried these new SQL functions and found that pg_unwrap_key produces the
original key with 4 bytes missing. This is because the result length is not set
long enough to accommodate the 4-byte VARHDRSZ header used by the
variable-length type.

The len variable in SET_VARSIZE(res, len) should also include the variable-length
header VARHDRSZ. Right now it is 4 bytes short, so it produces incomplete output.



---> pg_unwrap_key function in kmgr.c

+   if (!kmgr_unwrap_key(UnwrapCtx, (uint8 *) VARDATA_ANY(data), datalen,
+                        (uint8 *) VARDATA(res), &len))
+       ereport(ERROR,
+               (errmsg("could not unwrap the given secret")));
+
+   /*
+    * The size of unwrapped key can be smaller than the size estimated
+    * before unwrapping since the padding is removed during unwrapping.
+    */
+   SET_VARSIZE(res, len);
+   PG_RETURN_BYTEA_P(res);

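As a minimal sketch of the fix I have in mind (assuming, as in the patch, that
len holds only the length of the unwrapped payload), the total varlena size
passed to SET_VARSIZE should include the header:

	/* account for the 4-byte varlena header in addition to the payload */
	SET_VARSIZE(res, VARHDRSZ + len);
	PG_RETURN_BYTEA_P(res);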


I am only testing their functionality with a random key as input data. It is
currently not possible for a user to obtain the wrapped key from the server in
order to use these wrap/unwrap functions. I personally don't think it is a good
idea to expose these functions to users.



thank you






Cary Huang

-

HighGo Software Inc. (Canada)

mailto:cary.hu...@highgo.ca

http://www.highgo.ca



On Fri, 14 Feb 2020 08:00:45 -0800, Robert Haas wrote:


On Thu, Feb 6, 2020 at 9:19 PM Masahiko Sawada 
 wrote: 
> This feature protects data from disk thefts but cannot protect data 
> from attackers who are able to access PostgreSQL server. In this 
> design application side still is responsible for managing the wrapped 
> secret in order to protect it from attackers. This is the same as when 
> we use pgcrypto now. The difference is that data is safe even if 
> attackers steal the wrapped secret and the disk. The data cannot be 
> decrypted either without the passphrase which can be stored to other 
> key management systems or without accessing postgres server. IOW for 
> example, attackers can get the data if they get the wrapped secret 
> managed by 

Re: Fastpath while arranging the changes in LSN order in logical decoding

2020-02-18 Thread David Zhang
1. Tried to apply the patch to PG 12.2 commit 
45b88269a353ad93744772791feb6d01bc7e1e42 (HEAD -> REL_12_2, tag: REL_12_2), it 
doesn't work. Then tried to check the patch, and found the errors shown below.
$ git apply --check 
0001-Fastpath-for-sending-changes-to-output-plugin-in-log.patch
error: patch failed: contrib/test_decoding/logical.conf:1
error: contrib/test_decoding/logical.conf: patch does not apply
error: patch failed: src/backend/replication/logical/reorderbuffer.c:1133
error: src/backend/replication/logical/reorderbuffer.c: patch does not apply

2. Ran a further check on the file "logical.conf" and found there has been only one
commit since 2014, which doesn't have the parameter "logical_decoding_work_mem
= 64kB".

3. Manually applied the patch, including
src/backend/replication/logical/reorderbuffer.c, and then ran a simple logical
replication test. A connection issue was found, as shown below:
"table public.pgbench_accounts: INSERT: aid[integer]:4071 bid[integer]:1 
abalance[integer]:0 filler[character]:' 
   '
pg_recvlogical: error: could not receive data from WAL stream: server closed 
the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_recvlogical: disconnected; waiting 5 seconds to try again"

4. This connection issue can be reproduced on PG 12.2 commit mentioned above, 
the basic steps,
4.1 Change "wal_level = logical" in "postgresql.conf"
4.2 create a logical slot and listen on it,
$ pg_recvlogical -d postgres --slot test --create-slot
$ pg_recvlogical -d postgres --slot test --start -f -

4.3 from another terminal, run the command below,
$ pgbench -i -p 5432 -d postgres

Let me know if I did something wrong, and if a new patch is available, I can 
re-run the test on the same environment.

-- 
David
Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca

Re: Implementing Incremental View Maintenance

2020-02-18 Thread Yugo NAGATA
On Tue, 18 Feb 2020 22:03:47 +0900
nuko yokohama  wrote:

> Hi.
> 
> SELECT statements with a TABLESAMPLE clause should be rejected.
> 
> Currently, CREATE INCREMENTAL MATERIALIZED VIEW allows SELECT statements
> with the TABLESAMPLE clause.
> However, the result of this SELECT statement is undefined and should be
> rejected when specified in CREATE INCREMENTAL MATERIALIZED VIEW.
> (similar to handling non-immutable functions)

Thanks!  We totally agree with you.  We are now working on improving the
query checks performed when creating an IMMV.  TABLESAMPLE will also be checked there.

Regards,
Yugo Nagata

> Regard.
> 
> On Sat, Feb 8, 2020 at 11:15, nuko yokohama wrote:
> 
> > Hi.
> >
> > UNION query problem.(server crash)
> >
> > When creating an INCREMENTAL MATERIALIZED VIEW,
> > the server process crashes if you specify a query with a UNION.
> >
> > (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
> >
> > execute log.
> >
> > ```
> > [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
> > DROP TABLE IF EXISTS table_x CASCADE;
> > psql:union_query_crash.sql:6: NOTICE:  drop cascades to view xy_union_v
> > DROP TABLE
> > DROP TABLE IF EXISTS table_y CASCADE;
> > DROP TABLE
> > CREATE TABLE table_x (id int, data numeric);
> > CREATE TABLE
> > CREATE TABLE table_y (id int, data numeric);
> > CREATE TABLE
> > INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
> > INSERT 0 3
> > INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
> > INSERT 0 3
> > SELECT * FROM table_x;
> >  id |data
> > +
> >   1 |  0.950724735058774
> >   2 | 0.0222670808201144
> >   3 |  0.391258547114841
> > (3 rows)
> >
> > SELECT * FROM table_y;
> >  id |data
> > +
> >   1 |  0.991717347778337
> >   2 | 0.0528458947672874
> >   3 |  0.965044982911163
> > (3 rows)
> >
> > CREATE VIEW xy_union_v AS
> > SELECT 'table_x' AS name, * FROM table_x
> > UNION
> > SELECT 'table_y' AS name, * FROM table_y
> > ;
> > CREATE VIEW
> > TABLE xy_union_v;
> >   name   | id |data
> > -++
> >  table_y |  2 | 0.0528458947672874
> >  table_x |  2 | 0.0222670808201144
> >  table_y |  3 |  0.965044982911163
> >  table_x |  1 |  0.950724735058774
> >  table_x |  3 |  0.391258547114841
> >  table_y |  1 |  0.991717347778337
> > (6 rows)
> >
> > CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
> > SELECT 'table_x' AS name, * FROM table_x
> > UNION
> > SELECT 'table_y' AS name, * FROM table_y
> > ;
> > psql:union_query_crash.sql:28: server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > psql:union_query_crash.sql:28: fatal: connection to server was lost
> > ```
> >
> > On Thu, Dec 27, 2018 at 21:57, Yugo Nagata wrote:
> >
> >> Hi,
> >>
> >> I would 

Re: Memory-Bounded Hash Aggregation

2020-02-18 Thread Melanie Plageman
On Tue, Feb 18, 2020 at 10:57 AM Tomas Vondra 
wrote:

> Hi,
>
> I wanted to take a look at this thread and do a review, but it's not
> very clear to me if the recent patches posted here are independent or
> how exactly they fit together. I see
>
> 1) hashagg-20200212-1.patch (2020/02/13 by Jeff)
>
> 2) refactor.patch (2020/02/13 by Jeff)
>
> 3) v1-0001-aggregated-unaggregated-cols-together.patch (2020/02/14 by
> Melanie)
>
> I suppose this also confuses the cfbot - it's probably only testing (3)
> as it's the last thing posted here, at least I think that's the case.
>
> And it fails:
>
> nodeAgg.c: In function ‘find_aggregated_cols_walker’:
> nodeAgg.c:1208:2: error: ISO C90 forbids mixed declarations and code
> [-Werror=declaration-after-statement]
>FindColsContext *find_cols_context = (FindColsContext *) context;
>^
> nodeAgg.c: In function ‘find_unaggregated_cols_walker’:
> nodeAgg.c:1225:2: error: ISO C90 forbids mixed declarations and code
> [-Werror=declaration-after-statement]
>FindColsContext *find_cols_context = (FindColsContext *) context;
>^
> cc1: all warnings being treated as errors
> : recipe for target 'nodeAgg.o' failed
> make[3]: *** [nodeAgg.o] Error 1
> make[3]: *** Waiting for unfinished jobs
>
>
Oops! Sorry. I would fix the code those compiler warnings are complaining
about, but that would confuse the cfbot more. So, I'll let Jeff decide what
he wants to do with the patch (e.g. include it in his overall patch set or
exclude it for now). Anyway, it is trivial to move those declarations up,
were he to decide to include it.

-- 
Melanie Plageman


Re: Extracting only the columns needed for a query

2020-02-18 Thread Melanie Plageman
On Fri, Jan 31, 2020 at 9:52 AM Melanie Plageman 
wrote:

> I'm bumping this to the next commitfest because I haven't had a chance
> to address the questions posed by Dmitry. I'm sure I'll get to it in
> the next few weeks.
>
> > I believe it would be beneficial to add this potential API extension
> patch into
> > the thread (as an example of an interface defining how scanCols could be
> used)
> > and review them together.
>
> As for including some code that uses the scanCols, after discussing
> off-list with a few folks, there are three options I would like to
> pursue for doing this.
>
> One option I will pursue is using the scanCols to inform the columns
> needed to be spilled for memory-bounded hashagg (mentioned by Jeff
> here [1]).
>
>
> The third is exercising it with a test only but providing an example
> of how a table AM API user like Zedstore uses the columns during
> planning.
>

Outside of the use case that Pengzhou has provided in [1], we started
looking into using scanCols for extracting the subset of columns
needed in two cases:

1) columns required to be spilled for memory-bounded hashagg
2) referenced CTE columns which must be materialized into tuplestore

However, implementing these optimizations with the scanCols patch
wouldn't work with its current implementation.

The scanCols are extracted from PlannerInfo->simple_rel_array and
PlannerInfo->simple_rte_array, at which point, we have no way of
knowing if the column was aggregated or if it was part of a CTE or
anything else about how it is used in the query.

We could solve this by creating multiple bitmaps at the time that we
create the scanCols field -- one for aggregated columns, one for
unaggregated columns, one for CTEs. However, that seems like it would
add a lot of extra complexity to the common code path during planning.
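
Purely for illustration (this is not proposed code), the multiple-bitmaps idea
would amount to tracking something like the following per relation, which is
why it looks like a lot of extra bookkeeping on the common path:

/* hypothetical shape of a per-relation "scan columns" record */
typedef struct ScanColsInfo
{
	Bitmapset  *scanCols;			/* every column that must be scanned */
	Bitmapset  *aggregatedCols;		/* columns referenced inside aggregates */
	Bitmapset  *unaggregatedCols;	/* columns referenced outside aggregates */
	Bitmapset  *cteCols;			/* columns that must be materialized for CTEs */
} ScanColsInfo;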

Basically, scanCols are simply the columns that need to be scanned. It is
probably okay if they are only used by table access method API users, as
Pengzhou's patch illustrates.

Given that we have addressed the feedback about showing a use case,
this patch is probably ready for a once over from Dmitry again. (It is
registered for the March fest).

[1]
https://www.postgresql.org/message-id/CAG4reAQc9vYdmQXh%3D1D789x8XJ%3DgEkV%2BE%2BfT9%2Bs9tOWDXX3L9Q%40mail.gmail.com

-- 
Melanie Plageman


Re: BUG #16108: Colorization to the output of command-line has unproperly behaviors at Windows platform

2020-02-18 Thread Michail Nikolaev
P.S.

Also, should we enable vt100 mode in case of PG_COLOR=always? I think yes.


Re: Transactions involving multiple postgres foreign servers, take 2

2020-02-18 Thread Masahiko Sawada
On Tue, 11 Feb 2020 at 12:42, amul sul  wrote:
>
> On Fri, Jan 24, 2020 at 11:31 AM Masahiko Sawada 
>  wrote:
>>
>> On Fri, 6 Dec 2019 at 17:33, Kyotaro Horiguchi  
>> wrote:
>> >
>> > Hello.
>> >
>> > This is the reased (and a bit fixed) version of the patch. This
>> > applies on the master HEAD and passes all provided tests.
>> >
>> > I took over this work from Sawada-san. I'll begin with reviewing the
>> > current patch.
>> >
>>
>> The previous patch set is no longer applied cleanly to the current
>> HEAD. I've updated and slightly modified the codes.
>>
>> This patch set has been marked as Waiting on Author for a long time
>> but the correct status now is Needs Review. The patch actually was
>> updated and incorporated all review comments but they was not rebased
>> actively.
>>
>> The mail[1] I posted before would be helpful to understand the current
>> patch design and there are README in the patch and a wiki page[2].
>>
>> I've marked this as Needs Review.
>>
>
> Hi Sawada san,
>
> I just had a quick look to 0001 and 0002 patch here is the few suggestions.
>
> patch: v27-0001:
>
> Typo: s/non-temprary/non-temporary
> 
>
> patch: v27-0002: (Note:The left-hand number is the line number in the 
> v27-0002 patch):
>
>  138 +PostgreSQL's the global transaction manager (GTM), as a distributed 
> transaction
>  139 +participant The registered foreign transactions are tracked until the 
> end of
>
> Full stop "." is missing after "participant"
>
>
> 174 +API Contract With Transaction Management Callback Functions
>
> Can we just say "Transaction Management Callback Functions";
> TBH, I am not sure that I understand this title.
>
>
>  203 +processing foreign transaction (i.g. preparing, committing or aborting) 
> the
>
> Do you mean "i.e" instead of i.g. ?
>
>
> 269 + * RollbackForeignTransactionAPI. Registered participant servers are 
> identified
>
> Add space before between RollbackForeignTransaction and API.
>
>
>  292 + * automatically so must be processed manually using by 
> pg_resovle_fdwxact()
>
> Do you mean pg_resolve_foreign_xact() here?
>
>
>  320 + *   the foreign transaction is authorized to update the fields from 
> its own
>  321 + *   one.
>  322 +
>  323 + * Therefore, before doing PREPARE, COMMIT PREPARED or ROLLBACK 
> PREPARED a
>
> Please add asterisk '*' on line#322.
>
>
>  816 +static void
>  817 +FdwXactPrepareForeignTransactions(void)
>  818 +{
>  819 +   ListCell*lcell;
>
> Let's have this variable name as "lc" like elsewhere.
>
>
> 1036 +   ereport(ERROR, (errmsg("could not insert a foreign 
> transaction entry"),
> 1037 +   errdetail("duplicate entry with transaction 
> id %u, serverid %u, userid %u",
> 1038 +  xid, serverid, userid)));
> 1039 +   }
>
> Incorrect formatting.
>
>
> 1166 +/*
> 1167 + * Return true and set FdwXactAtomicCommitReady to true if the current 
> transaction
>
> Do you mean ForeignTwophaseCommitIsRequired instead of 
> FdwXactAtomicCommitReady?
>
>
> 3529 +
> 3530 +/*
> 3531 + * FdwXactLauncherRegister
> 3532 + * Register a background worker running the foreign transaction
> 3533 + *  launcher.
> 3534 + */
>
> This prolog style is not consistent with the other function in the file.
>
>
> And here are the few typos:
>
> s/conssitent/consistent
> s/consisnts/consist
> s/Foriegn/Foreign
> s/tranascation/transaction
> s/itselft/itself
> s/rolbacked/rollbacked
> s/trasaction/transaction
> s/transactio/transaction
> s/automically/automatically
> s/CommitForeignTransaciton/CommitForeignTransaction
> s/Similary/Similarly
> s/FDWACT_/FDWXACT_
> s/dink/disk
> s/requried/required
> s/trasactions/transactions
> s/prepread/prepared
> s/preapred/prepared
> s/beging/being
> s/gxact/xact
> s/in-dbout/in-doubt
> s/respecitively/respectively
> s/transction/transaction
> s/idenetifier/identifier
> s/identifer/identifier
> s/checkpoint'S/checkpoint's
> s/fo/of
> s/transcation/transaction
> s/trasanction/transaction
> s/non-temprary/non-temporary
> s/resovler_internal.h/resolver_internal.h
>
>

Thank you for reviewing the patch! I've incorporated all comments in
local branch.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: BUG #16108: Colorization to the output of command-line has unproperly behaviors at Windows platform

2020-02-18 Thread Michail Nikolaev
Hello everyone.

>  Please find attached a version that supports older Mingw versions and SDKs.

I have checked the patch source code and it seems to be working. But there
are a few points I want to mention:

I think it is not a good idea to mix the logic for detecting a TTY with
enabling VT100 mode. It seems to be correct for the current case, but it is a
little confusing.
Maybe it would be better to detect the terminal using *isatty* and then call
*enable_vt_mode*?

Also, it seems that if GetConsoleMode returns with the
ENABLE_VIRTUAL_TERMINAL_PROCESSING flag already set, we could skip the
SetConsoleMode call (not a big deal, of course).

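To illustrate the flow I have in mind (a rough sketch only; the function and
variable names are mine, and older SDKs may need ENABLE_VIRTUAL_TERMINAL_PROCESSING
defined manually):

#include <stdbool.h>
#include <stdio.h>
#include <io.h>
#include <windows.h>

/* Enable VT100/ANSI processing on an already-detected console handle. */
static bool
enable_vt_mode(HANDLE handle)
{
	DWORD		mode;

	if (!GetConsoleMode(handle, &mode))
		return false;
	if (mode & ENABLE_VIRTUAL_TERMINAL_PROCESSING)
		return true;			/* already enabled, skip SetConsoleMode */
	return SetConsoleMode(handle, mode | ENABLE_VIRTUAL_TERMINAL_PROCESSING) != 0;
}

/* Caller: first decide "is this a terminal?", then enable VT processing. */
static void
maybe_enable_colors(void)
{
	if (_isatty(_fileno(stderr)))
		(void) enable_vt_mode(GetStdHandle(STD_ERROR_HANDLE));
}
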
Thanks,
Michail.




Re: ssl passphrase callback

2020-02-18 Thread Andrew Dunstan
On Wed, Feb 19, 2020 at 7:10 AM Andrew Dunstan
 wrote:
>
> On Tue, Feb 18, 2020 at 2:01 PM Thomas Munro  wrote:
> >
> > On Wed, Jan 22, 2020 at 8:02 PM Andrew Dunstan
> >  wrote:
> > > Not sure if the placement is what you want, but maybe something like this?
> >
> > Hi Andrew, FYI this failed here:
> >
> > t/001_testfunc.pl .. Bailout called. Further testing stopped: pg_ctl
> > start failed
> > FAILED--Further testing stopped: pg_ctl start failed
> > Makefile:23: recipe for target 'prove-check' failed
> >
> > Unfortunately my robot is poorly trained and does not dump any of the
> > interesting logs for this case, but it looks like it's failing that
> > way every time.
> >
> > https://travis-ci.org/postgresql-cfbot/postgresql/builds/651756191
>
>
> Thanks for letting me know, I will investigate.
>


This should fix the issue; it happened when I switched to using a
pre-generated cert/key.

cheers

andrew

-- 
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 18d3fff068..b0a8816cff 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -45,6 +45,9 @@
 #include "tcop/tcopprot.h"
 #include "utils/memutils.h"
 
+/* default init hook can be overridden by a shared library */
+static void  default_openssl_tls_init(SSL_CTX *context, bool isServerStart);
+openssl_tls_init_hook_typ openssl_tls_init_hook = default_openssl_tls_init;
 
 static int	my_sock_read(BIO *h, char *buf, int size);
 static int	my_sock_write(BIO *h, const char *buf, int size);
@@ -116,27 +119,10 @@ be_tls_init(bool isServerStart)
 	SSL_CTX_set_mode(context, SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);
 
 	/*
-	 * Set password callback
+	 * Call init hook (usually to set password callback)
 	 */
-	if (isServerStart)
-	{
-		if (ssl_passphrase_command[0])
-			SSL_CTX_set_default_passwd_cb(context, ssl_external_passwd_cb);
-	}
-	else
-	{
-		if (ssl_passphrase_command[0] && ssl_passphrase_command_supports_reload)
-			SSL_CTX_set_default_passwd_cb(context, ssl_external_passwd_cb);
-		else
+	(* openssl_tls_init_hook)(context, isServerStart);
 
-			/*
-			 * If reloading and no external command is configured, override
-			 * OpenSSL's default handling of passphrase-protected files,
-			 * because we don't want to prompt for a passphrase in an
-			 * already-running server.
-			 */
-			SSL_CTX_set_default_passwd_cb(context, dummy_ssl_passwd_cb);
-	}
 	/* used by the callback */
 	ssl_is_server_start = isServerStart;
 
@@ -1313,3 +1299,27 @@ ssl_protocol_version_to_openssl(int v, const char *guc_name, int loglevel)
 	GetConfigOption(guc_name, false, false;
 	return -1;
 }
+
+
+static void
+default_openssl_tls_init(SSL_CTX *context, bool isServerStart)
+{
+	if (isServerStart)
+	{
+		if (ssl_passphrase_command[0])
+			SSL_CTX_set_default_passwd_cb(context, ssl_external_passwd_cb);
+	}
+	else
+	{
+		if (ssl_passphrase_command[0] && ssl_passphrase_command_supports_reload)
+			SSL_CTX_set_default_passwd_cb(context, ssl_external_passwd_cb);
+		else
+			/*
+			 * If reloading and no external command is configured, override
+			 * OpenSSL's default handling of passphrase-protected files,
+			 * because we don't want to prompt for a passphrase in an
+			 * already-running server.
+			 */
+			SSL_CTX_set_default_passwd_cb(context, dummy_ssl_passwd_cb);
+	}
+}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b3986bee75..53776d6198 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -972,17 +972,6 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	LocalProcessControlFile(false);
 
-	/*
-	 * Initialize SSL library, if specified.
-	 */
-#ifdef USE_SSL
-	if (EnableSSL)
-	{
-		(void) secure_initialize(true);
-		LoadedSSL = true;
-	}
-#endif
-
 	/*
 	 * Register the apply launcher.  Since it registers a background worker,
 	 * it needs to be called before InitializeMaxBackends(), and it's probably
@@ -996,6 +985,17 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	process_shared_preload_libraries();
 
+	/*
+	 * Initialize SSL library, if specified.
+	 */
+#ifdef USE_SSL
+	if (EnableSSL)
+	{
+		(void) secure_initialize(true);
+		LoadedSSL = true;
+	}
+#endif
+
 	/*
 	 * Now that loadable modules have had their chance to register background
 	 * workers, calculate MaxBackends.
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 82e57afc64..90f80bb33a 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -287,6 +287,10 @@ extern void be_tls_get_peer_serial(Port *port, char *ptr, size_t len);
 extern char *be_tls_get_certificate_hash(Port *port, size_t *len);
 #endif
 
+/* init hook for SSL, the default sets the password callback if appropriate */
+typedef void(* openssl_tls_init_hook_typ)(SSL_CTX *context, bool 
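
For readers following along, this is roughly how a library loaded via
shared_preload_libraries would be expected to use the hook. This is only an
illustrative sketch based on the hook type above, assuming the hook variable is
exported in a header as the patch intends; it is not code from the patch or its
test module:

#include "postgres.h"
#include "fmgr.h"
#include "libpq/libpq-be.h"

PG_MODULE_MAGIC;

/* hypothetical passphrase callback supplied by the module */
static int
my_passphrase_cb(char *buf, int size, int rwflag, void *userdata)
{
	/* in real code, fetch the passphrase from wherever the module keeps it */
	strlcpy(buf, "secret-passphrase", size);
	return (int) strlen(buf);
}

static void
my_tls_init(SSL_CTX *context, bool isServerStart)
{
	SSL_CTX_set_default_passwd_cb(context, my_passphrase_cb);
}

void
_PG_init(void)
{
	openssl_tls_init_hook = my_tls_init;
}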

Re: Improve search for missing parent downlinks in amcheck

2020-02-18 Thread Peter Geoghegan
On Tue, Feb 18, 2020 at 2:16 AM Alexander Korotkov
 wrote:
> Great, thank you very much!

No problem!

My remarks here are based on
"amcheck-btree-improve-missing-parent-downlinks-check-6.patch". I have
found a false positive corruption report bug in this latest version --
see note below about incomplete page splits.

> > Don't need the "!P_ISLEAF()" here.
>
> Why don't I need.  bt_downlink_connectivity_check() checks one level
> down to the target level.  But there is no one level down to leaf...

Because offset_is_negative_infinity() checks P_ISLEAF() for you. Maybe
it's better your way, though -- apparently it's clearer.

> > > Alternatively
> > > +* we might find child with high key while traversing from
> > > +* previous downlink to current one.  Then matching key 
> > > resides
> > > +* the same offset number as current downlink.
> > > +*/
> >
> > Not sure what "traversing from previous downlink to current one" means at 
> > all.
>
> I've rephrased this comment, please check.

> Agree, replaced _bt_compare() with bt_pivot_tuple_identical().  It
> becomes even simpler now, thanks!

There was actually an even better reason to invent
bt_pivot_tuple_identical(): a call to _bt_compare() in amcheck needs
to do something like the extra steps that you see in routines like
invariant_l_offset(). _bt_compare() will return 0 when the insertion
scankey has a prefix of scankey/column values that are equal, even
though there may be additional columns in the index tuple that are not
compared. So, you could have a truncated multi-column high key that is
"equal" to a pivot tuple in the parent that is actually to the right in the
key space. This blind spot would often occur with low cardinality
indexes, where we often have something like this in pivot tuples on
internal pages:

'foo, -inf'
'foo, (1,24)'
'food, -inf'. <-- This pivot tuple's downlink points to the final leaf
page that's filled with duplicates of the value 'foo'
'food, (3,19)' <-- This pivot tuple's downlink points to the *first*
leaf page that's filled with duplicates of the value 'food'
...

The situation is really common in low cardinality indexes because
nbtsplitloc.c hates splitting a leaf page between two duplicates -- it
is avoided whenever possible. You reliably get a '-inf' value for the
TID in the first pivot tuple for the duplicate, followed by a real
heap TID for later pivot tuples for pages with the same duplicate
value.
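
For anyone following along, a byte-wise identity check sidesteps that blind
spot entirely. Here is a hypothetical sketch of the idea (not necessarily what
the patch implements):

#include "postgres.h"
#include "access/itup.h"

/*
 * Treat two pivot tuples as identical only when they have the same length
 * and the same bytes, so a merely "prefix-equal" truncated key can never
 * pass as a match.
 */
static bool
bt_pivot_tuple_identical_sketch(IndexTuple itup1, IndexTuple itup2)
{
	if (IndexTupleSize(itup1) != IndexTupleSize(itup2))
		return false;
	return memcmp(itup1, itup2, IndexTupleSize(itup1)) == 0;
}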

(Anyway, it's not important now.)

> > I think bt_downlink_check() and bt_downlink_connectivity_check()
> > should be renamed to something broader. In my mind, downlink is
> > basically a block number. We have been sloppy about using the term
> > downlink when we really mean "pivot tuple with a downlink" -- I am
> > guilty of this myself. But it seems more important, now that you have
> > the new high key check.
>
> Hmm... Names are hard for me.  I didn't do any renaming for now.  What
> about this?
> bt_downlink_check() => bt_child_check()
> bt_downlink_connectivity_check() => bt_child_connectivity_highkey_check()

I suggest:

bt_downlink_check() => bt_child_check()
bt_downlink_connectivity_check() => bt_child_highkey_check()

While bt_downlink_connectivity_check() moves right on the target's
child level, this isn't the common case. Moving right like that
doesn't need to be suggested by the name of the function.

Most of the time, we just check the high key -- right?

> I've revised finding the matching pivot key for high key.  Now, code
> assumes we should always find a matching pivot key.  It could use both
> target high key or left sibling high key (which is memorized as "low
> key").
>
> I've checked this on normal indexes, but I didn't try to exercise this
> with broken indexes.  I would appreciate if you do.

I can confirm that checking the high key in target's child page
against the high key in target (when appropriate) removes the "cousin
page verification blind spot" that I noticed in the last version, as
expected. Great!

* You should say "target page lsn" here instead:

pg@tpce:5432 [19852]=# select bt_index_parent_check(:'idxrelation',true, true);
ERROR:  mismatch between parent key and child high key in index "i_holding2"
DETAIL:  Target block=1570 child block=1690 parent page lsn=0/0.
Time: 12.509 ms

* Maybe say "Move to the right on the child level" in a comment above
the bt_downlink_connectivity_check() "while (true)" loop here:

> +
> +   while (true)
> +   {
> +   /*
> +* Did we traverse the whole tree level and this is check for pages to
> +* the right of rightmost downlink?
> +*/

* If you are going to save a low key for the target page in memory,
then you only need to do so for "state->readonly"/parent verification.

* You should s/lokey/lowkey/ -- I prefer the spelling "lowkey" or "low
key". This is a term that nbtsort.c now uses, in case you didn't know.

* The reason for saving a low key for each target page is very
unclear. Can 

Re: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()

2020-02-18 Thread Thomas Munro
On Mon, Feb 10, 2020 at 10:05 PM Floris Van Nee
 wrote:
> > The interesting thing now is the role of the "negative infinity test"
> > patch (the 0003-* patch) in all of this. I suspect that it may not be 
> > helping us
> > much here. I wonder, could you test the following configurations to settle
> > this question?
> >
> > *  with 30 clients (i.e. repeat the test that you reported on most
> > recently)
> >
> > *  with 30 clients (i.e. repeat the test that you reported got 
> > us
> > that nice ~8.6% increase in TPS)
> >
> > *  with 30 clients -- a new test, to see if performance is at all
> > helped by the "negative infinity test" patch (the 0003-* patch).
> >
> > It seems like a good idea to repeat the other two tests as part of 
> > performing
> > this third test, out of general paranoia. Intel seem to roll out a microcode
> > update for a spectre-like security issue about every other day.
> >
>
> I ran all the tests on two different machines, several times for 1 hour each 
> time. I'm still having a hard time getting reliable results for the 30 
> clients case though. I'm pretty certain the patches bring a performance 
> benefit, but how high exactly is difficult to say. As for applying only patch 
> 1+2 or all three patches - I found no significant difference between these 
> two cases. It looks like all the performance benefit is in the first two 
> patches.

The cfbot seems to be showing "pg_regress: initdb failed" on Ubuntu,
with an assertion failure like this:

#2 0x008e594f in ExceptionalCondition
(conditionName=conditionName@entry=0x949098 "BTreeTupleGetNAtts(itup,
rel) >= key->keysz", errorType=errorType@entry=0x938a7d
"FailedAssertion", fileName=fileName@entry=0x949292 "nbtsearch.c",
lineNumber=lineNumber@entry=620) at assert.c:67
No locals.
#3  0x004fdbaa in _bt_compare_inl (offnum=3,
page=0x7ff7904bdf00 "", key=0x7ffde7c9bfa0, rel=0x7ff7a2325c20) at
nbtsearch.c:620
itup = 0x7ff7904bfec8
heapTid = 
ski = 
itupdesc = 0x7ff7a2325f50
scankey = 
ntupatts = 0

https://travis-ci.org/postgresql-cfbot/postgresql/builds/651843143

It's passing on Windows though, so perhaps there is something
uninitialised or otherwise unstable in the patch?




Re: ssl passphrase callback

2020-02-18 Thread Andrew Dunstan
On Tue, Feb 18, 2020 at 2:01 PM Thomas Munro  wrote:
>
> On Wed, Jan 22, 2020 at 8:02 PM Andrew Dunstan
>  wrote:
> > Not sure if the placement is what you want, but maybe something like this?
>
> Hi Andrew, FYI this failed here:
>
> t/001_testfunc.pl .. Bailout called. Further testing stopped: pg_ctl
> start failed
> FAILED--Further testing stopped: pg_ctl start failed
> Makefile:23: recipe for target 'prove-check' failed
>
> Unfortunately my robot is poorly trained and does not dump any of the
> interesting logs for this case, but it looks like it's failing that
> way every time.
>
> https://travis-ci.org/postgresql-cfbot/postgresql/builds/651756191


Thanks for letting me know, I will investigate.

cheers

andrew


-- 
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: First WAL segment file that initdb creates

2020-02-18 Thread David Zhang
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:tested, passed

The issue has been verified using the steps below:
1. $ initdb -D /home/test/PG122DATA/data
2. $ ls -l /home/test/PG122DATA/data/pg_wal/
total 16388
-rw--- 1 test test 16777216 Feb 18 12:07 00010001
drwx-- 2 test test 4096 Feb 18 12:07 archive_status

The first WAL segment file created by initdb is "00010001", not 
"0001".

Re: Resolving the python 2 -> python 3 mess

2020-02-18 Thread Corey Huinker
>
> So, as with Jesse's example, what I'm wondering is whether or not 2to3
> will fix that for you (or even flag it).  The basic difference between
> the two alternatives I suggested is whether we force people to put their
> python function through that converter before we'll even try to run it.
> Subtleties that 2to3 doesn't catch seem like non-reasons to insist on
> applying it.
>

The 2018 vintage of 2to3 didn't catch it.

It's not firsthand knowledge, but I just watched a nearby team have some
production issues where one library couldn't fetch b'http://foo.org' so I'm
guessing 2to3 still doesn't catch those things, or they stopped using it.


Re: Memory-Bounded Hash Aggregation

2020-02-18 Thread Tomas Vondra

Hi,

I wanted to take a look at this thread and do a review, but it's not
very clear to me if the recent patches posted here are independent or
how exactly they fit together. I see

1) hashagg-20200212-1.patch (2020/02/13 by Jeff)

2) refactor.patch (2020/02/13 by Jeff)

3) v1-0001-aggregated-unaggregated-cols-together.patch (2020/02/14 by
   Melanie)

I suppose this also confuses the cfbot - it's probably only testing (3)
as it's the last thing posted here, at least I think that's the case.

And it fails:

nodeAgg.c: In function ‘find_aggregated_cols_walker’:
nodeAgg.c:1208:2: error: ISO C90 forbids mixed declarations and code 
[-Werror=declaration-after-statement]
  FindColsContext *find_cols_context = (FindColsContext *) context;
  ^
nodeAgg.c: In function ‘find_unaggregated_cols_walker’:
nodeAgg.c:1225:2: error: ISO C90 forbids mixed declarations and code 
[-Werror=declaration-after-statement]
  FindColsContext *find_cols_context = (FindColsContext *) context;
  ^
cc1: all warnings being treated as errors
: recipe for target 'nodeAgg.o' failed
make[3]: *** [nodeAgg.o] Error 1
make[3]: *** Waiting for unfinished jobs


It's probably a good idea to either start a separate thread for patches
that are only loosely related to the main topic, or always post the
whole patch series.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: DB running out of memory issues after upgrade

2020-02-18 Thread Merlin Moncure
On Tue, Feb 18, 2020 at 12:10 PM Nagaraj Raj  wrote:
>
> Below are the same configurations ins .conf file before and after updagrade
>
> show max_connections; = 1743
> show shared_buffers = "4057840kB"
> show effective_cache_size =  "8115688kB"
> show maintenance_work_mem = "259MB"
> show checkpoint_completion_target = "0.9"
> show wal_buffers = "16MB"
> show default_statistics_target = "100"
> show random_page_cost = "1.1"
> show effective_io_concurrency =" 200"
> show work_mem = "4MB"
> show min_wal_size = "256MB"
> show max_wal_size = "2GB"
> show max_worker_processes = "8"
> show max_parallel_workers_per_gather = "2"

This smells like the OOM killer for sure.  How did you settle on some of
these values, in particular max_connections and effective_cache_size?
How much memory is in this server?

merlin




Re: Making psql error out on output failures

2020-02-18 Thread David Zhang

hi,

I did some further research on this bug. Here is the summary:

1. Tried to wrap "fputs" similarly to how "fprintf" is redefined by "pg_fprintf",
but it ended up with too many errors because "fputs" is called everywhere
in "print_unaligned_text". Adding an extra static variable to track the
output status would be overkill, and potentially more bugs may
be introduced.


2. Investigated some other libraries such as libexplain
(http://libexplain.sourceforge.net/), but that would definitely introduce
extra complexity.


3. I think a better way to resolve this issue is still the solution
with an extra %m, which makes the error message much more informative
to end users. The reasons are:


3.1 Providing the reasons for errors is required by the PostgreSQL documentation,
https://www.postgresql.org/docs/12/error-style-guide.html

    "Reasons for Errors
    Messages should always state the reason why an error occurred. For 
example:


    BAD:    could not open file %s
    BETTER: could not open file %s (I/O failure)
    If no reason is known you better fix the code."

3.2 Adding "%m" provides a common and easy-to-understand reason across
many operating systems. The "%m" fix has been tested on the following
platforms: CentOS 7, RedHat 7, Ubuntu 18.04, Windows 7/10, and macOS
Mojave 10.14, and it works.

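For reference, the change being discussed boils down to something like the
sketch below. The stream and status variable names (fout, success) are mine,
and it assumes the frontend logging/printf support for %m that the tests above
already exercise:

	/* after writing the result, check the output stream and say why it failed */
	if (ferror(fout))
	{
		pg_log_error("Error printing tuples (%m)");
		success = false;
	}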

3.3 If multiple errors happen after the root cause "No space left on
device", such as "No such file or directory" or "Permission denied",
then it makes sense to report the latest one. End users are supposed to
see the latest error and solve it first; eventually, the "No space left on
device" error will show up.


3.4 Test log on different operating systems.

### CentOS 7
[postgres@localhost ~]$ uname -a
Linux localhost.localdomain 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 
15:49:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux


[postgres@localhost ~]$ sudo mount -t tmpfs -o rw,size=1M tmpfs /mnt/ramdisk
[postgres@localhost ~]$ df -h
tmpfs    1.0M 0  1.0M   0% /mnt/ramdisk

[postgres@localhost ~]$ psql
psql (12.2)
Type "help" for help.

postgres=# select repeat('111', 100) \g /mnt/ramdisk/file
Error printing tuples (No space left on device)

[postgres@localhost ~]$ psql -d postgres  -At -c "select repeat('111', 
100)" -o /mnt/ramdisk/file

Error printing tuples (No space left on device)

[postgres@localhost ~]$ psql -d postgres  -At -c "select repeat('111', 
100)" > /mnt/ramdisk/file

Error printing tuples (No space left on device)


### RedHat 7
[root@rh7 postgres]# uname -a
Linux rh7 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 
x86_64 x86_64 x86_64 GNU/Linux

[postgres@rh7 ~]$ sudo mkdir -p /mnt/ramdisk

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

[sudo] password for postgres:
[postgres@rh7 ~]$ sudo mount -t tmpfs -o rw,size=1M tmpfs /mnt/ramdisk

[postgres@rh7 ~]$ psql -d postgres
psql (12.2)
Type "help" for help.

postgres=# select repeat('111', 100) \g /mnt/ramdisk/file
Error printing tuples (No space left on device)

[postgres@rh7 ~]$ psql -d postgres  -At -c "select repeat('111', 
100)" > /mnt/ramdisk/file

Error printing tuples (No space left on device)

[postgres@rh7 ~]$ psql -d postgres  -At -c "select repeat('111', 
100)" -o /mnt/ramdisk/file

Error printing tuples (No space left on device)


### Ubuntu 18.04
postgres=# select repeat('111', 1000) \g /mnt/ramdisk/file
Error printing tuples (No space left on device)
postgres=# \q

david@david-VM:~$ psql -d postgres  -At -c "select repeat('111', 
100)" > /mnt/ramdisk/file

Error printing tuples (No space left on device)

david@david-VM:~$ psql -d postgres  -At -c "select repeat('111', 
100)" -o /mnt/ramdisk/file

Error printing tuples (No space left on device)

### Windows 7
postgres=# select repeat('111', 100) \g E:/file
Error printing tuples (No space left on device)
postgres=# \q

C:\pg12.1\bin>psql -d postgres -U postgres -h 172.20.14.29 -At -c 
"select repeat

('111', 10)" >> E:/file
Error printing tuples (No space left on device)

C:\pg12.1\bin>psql -d postgres -U postgres -h 172.20.14.29 -At -c 
"select repeat

('111', 10)" -o E:/file
Error printing tuples (No space left on device)

### Windows 10
postgres=# select repeat('111', 100) \g E:/file
Error printing tuples (No space left on device)
postgres=# \q

C:\>psql -d postgres -U postgres -h 192.168.0.19 -At -c "select 
repeat('111', 1000)" -o E:/file

Error printing tuples (No space left on device)

C:\>psql -d postgres -U postgres -h 192.168.0.19 -At -c "select 
repeat('111', 200)" >> E:/file

Error printing tuples (No space left on device)

### macOS Mojave 10.14
postgres=# select repeat('111', 100) \g  /Volumes/sdcard/file
Error printing tuples (No space left on device)
postgres=# 

Re: DB running out of memory issues after upgrade

2020-02-18 Thread Nagaraj Raj
Below are the same configurations in the .conf file before and after the upgrade:
show max_connections; = 1743
show shared_buffers = "4057840kB"
show effective_cache_size =  "8115688kB"
show maintenance_work_mem = "259MB"
show checkpoint_completion_target = "0.9"
show wal_buffers = "16MB"
show default_statistics_target = "100"
show random_page_cost = "1.1"
show effective_io_concurrency =" 200"
show work_mem = "4MB"
show min_wal_size = "256MB"
show max_wal_size = "2GB"
show max_worker_processes = "8"
show max_parallel_workers_per_gather = "2"

Here are some system logs:
2020-02-16 21:01:17 UTC [-] The database process was killed by the OS
due to excessive memory consumption.
2020-02-16 13:41:16 UTC [-] The database process was killed by the OS
due to excessive memory consumption.

I identified one simple select which is consuming a lot of memory, and here is the
query plan:

"Result  (cost=0.00..94891854.11 rows=3160784900 width=288)"
"  ->  Append  (cost=0.00..47480080.61 rows=3160784900 width=288)"
"        ->  Seq Scan on msghist  (cost=0.00..15682777.12 rows=312949 width=288)"
"              Filter: (((data -> 'info'::text) ->> 'status'::text) = 'CLOSE'::text)"
"        ->  Seq Scan on msghist msghist_1  (cost=0.00..189454.50 rows=31294900 width=288)"
"              Filter: (((data -> 'info'::text) ->> 'status'::text) = 'CLOSE'::text)"


Thanks,


On Tuesday, February 18, 2020, 09:59:37 AM PST, Tomas Vondra 
 wrote:  
 
 On Tue, Feb 18, 2020 at 05:46:28PM +, Nagaraj Raj wrote:
>after upgrading Postgres to v9.6.11 from v9.6.9, the DB is running out of memory;
>no workload has changed before and after the upgrade.
>
>spec: RAM 16gb,4vCore
>Any bug reported like this or suggestions on how to fix this issue? I 
>appreciate the response..!! 
>

This bug report (in fact, we don't know if it's a bug, but OK) is
woefully incomplete :-(

The server log is mostly useless, unfortunately - it just says a bunch
of processes were killed (by OOM killer, most likely) so the server has
to restart. It tells us nothing about why the backends consumed so much
memory etc.

What would help us is knowing how much memory was the backend (killed by
OOM) consuming, which should be in dmesg.

And then MemoryContextStats output - you need to connect to a backend
consuming a lot of memory using gdb (before it gets killed) and do

  (gdb) p MemoryContextStats(TopMemoryContext)
  (gdb) q

and show us the output printed into server log. If it's a backend
running a query, it'd help knowing the execution plan.

It would also help knowing the non-default configuration, i.e. stuff
tweaked in postgresql.conf.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


  

Re: DB running out of memory issues after upgrade

2020-02-18 Thread Tomas Vondra

On Tue, Feb 18, 2020 at 05:46:28PM +, Nagaraj Raj wrote:

after upgrading Postgres to v9.6.11 from v9.6.9, the DB is running out of memory;
no workload has changed before and after the upgrade.

spec: RAM 16gb,4vCore
Any bug reported like this or suggestions on how to fix this issue? I 
appreciate the response..!! 



This bug report (in fact, we don't know if it's a bug, but OK) is
woefully incomplete :-(

The server log is mostly useless, unfortunately - it just says a bunch
of processes were killed (by OOM killer, most likely) so the server has
to restart. It tells us nothing about why the backends consumed so much
memory etc.

What would help us is knowing how much memory was the backend (killed by
OOM) consuming, which should be in dmesg.

And then MemoryContextStats output - you need to connect to a backend
consuming a lot of memory using gdb (before it gets killed) and do

 (gdb) p MemoryContextStats(TopMemoryContext)
 (gdb) q

and show us the output printed into server log. If it's a backend
running a query, it'd help knowing the execution plan.

It would also help knowing the non-default configuration, i.e. stuff
tweaked in postgresql.conf.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: plan cache overhead on plpgsql expression

2020-02-18 Thread Amit Langote
On Tue, Feb 18, 2020 at 6:56 PM Amit Langote  wrote:
> On Tue, Feb 18, 2020 at 2:56 PM Pavel Stehule  wrote:
> > út 18. 2. 2020 v 6:03 odesílatel Amit Langote  
> > napsal:
> >> I didn't send the patch, because it didn't handle the cases where a
> >> simple expression consists of an inline-able function(s) in it, which
> >> are better handled by a full-fledged planner call backed up by the
> >> plan cache.  If we don't do that then every evaluation of such
> >> "simple" expression needs to invoke the planner.  For example:
> >>
> >> Consider this inline-able SQL function:
> >>
> >> create or replace function sql_incr(a bigint)
> >> returns int
> >> immutable language sql as $$
> >> select a+1;
> >> $$;
> >>
> >> Then this revised body of your function foo():
> >>
> >> CREATE OR REPLACE FUNCTION public.foo()
> >>  RETURNS int
> >>  LANGUAGE plpgsql
> >>  IMMUTABLE
> >> AS $function$
> >> declare i bigint = 0;
> >> begin
> >>   while i < 100
> >>   loop
> >> i := sql_incr(i);
> >>   end loop; return i;
> >> end;
> >> $function$
> >> ;
> >>
> >> With HEAD `select foo()` finishes in 786 ms, whereas with the patch,
> >> it takes 5102 ms.
> >>
> >> I think the patch might be good idea to reduce the time to compute
> >> simple expressions in plpgsql, if we can address the above issue.
> >
> >
> > Your patch is very interesting - minimally it returns performance before 
> > 8.2. The mentioned issue can be fixed if we disallow SQL functions in this 
> > fast execution.
>
> I updated the patch to do that.
>
> With the new patch, `select foo()`, with inline-able sql_incr() in it,
> runs in 679 ms.
>
> Without any inline-able function, it runs in 330 ms, whereas with
> HEAD, it takes 590 ms.

I polished it a bit.

Thanks,
Amit


plpgsql-simple-exprs_v3.patch
Description: Binary data


Re: Resolving the python 2 -> python 3 mess

2020-02-18 Thread Tom Lane
Corey Huinker  writes:
>> A possible gotcha in this approach is if there are any python 2/3
>> incompatibilities that would not manifest as syntax errors or
>> obvious runtime errors, but would allow old code to execute and
>> silently do the wrong thing.

> Unfortunately, I think there are cases like that. The shift to Unicode as
> the default string means that some functions that used to return a `str`
> now return a `bytes` (I know of this in the hashlib and base64 modules, but
> probably also in URL request data and others), and to use a `bytes` in
> string manipulation you have to first explicitly convert it to some string
> encoding. So things like a function that wraps around a python crypto
> library would be the exact places where those was-str-now-bytes functions
> would be used.

So, as with Jesse's example, what I'm wondering is whether or not 2to3
will fix that for you (or even flag it).  The basic difference between
the two alternatives I suggested is whether we force people to put their
python function through that converter before we'll even try to run it.
Subtleties that 2to3 doesn't catch seem like non-reasons to insist on
applying it.

regards, tom lane




Re: Clean up some old cruft related to Windows

2020-02-18 Thread Juan José Santamaría Flecha
On Tue, Feb 18, 2020 at 12:26 PM Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:

>
> [Edit] ... probably Windows 7 SDK was the *last* place where it was
included [1]...

>
> Regards,
>
> Juan José Santamaría Flecha
>


Re: Marking some contrib modules as trusted extensions

2020-02-18 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Andres Freund  writes:
> > On 2020-01-29 14:41:16 -0500, Tom Lane wrote:
> >> pgcrypto
> 
> > FWIW, given the code quality, I'm doubtful about putting it into the 
> > trusted
> > section.
> 
> I don't particularly have an opinion about that --- is it really that
> awful?  If there is anything broken in it, wouldn't we consider that
> a security problem anyhow?

I would certainly hope so- and I would expect that to go for any of the
other extensions which are included in core.  If we aren't going to
maintain them and deal with security issues in them, then we should drop
them.

Which goes back to my earlier complaint that having extensions in core
which aren't or can't be marked as trusted is not a position we should
put our users in.  Either they're maintained and have been vetted
through our commit process, or they aren't and should be removed.

> > Especially with FROM UNPACKAGED it seems like it'd be fairly easy to get
> > an extension script to do dangerous things (as superuser). One could
> > just create pre-existing objects that have *not* been created by a
> > previous version, and some upgrade scripts would do pretty weird
> > stuff. There's several that do things like updating catalogs directly
> > etc.  It seems to me that FROM UNPACKAGED shouldn't support trusted.
> 
> Hmm, seems like a reasonable idea, but I'm not quite sure how to mechanize
> it given that "unpackaged" isn't magic in any way so far as extension.c
> is concerned.  Maybe we could decide that the time for supporting easy
> updates from pre-9.1 is past, and just remove all the unpackaged-to-XXX
> scripts?  Maybe even remove the "FROM version" option altogether.

I agree in general with dropping the unpackaged-to-XXX bits.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Parallel copy

2020-02-18 Thread David Fetter
On Tue, Feb 18, 2020 at 06:51:29PM +0530, Amit Kapila wrote:
> On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma  wrote:
> >
> > On Tue, 18 Feb 2020 at 12:20, Amit Kapila  wrote:
> > > This is something similar to what I had also in mind for this idea.  I
> > > had thought of handing over complete chunk (64K or whatever we
> > > decide).  The one thing that slightly bothers me is that we will add
> > > some additional overhead of copying to and from shared memory which
> > > was earlier from local process memory.  And, the tokenization (finding
> > > line boundaries) would be serial.  I think that tokenization should be
> > > a small part of the overall work we do during the copy operation, but
> > > will do some measurements to ascertain the same.
> >
> > I don't think any extra copying is needed.
> 
> I am talking about access to shared memory instead of the process
> local memory.  I understand that an extra copy won't be required.

Isn't accessing shared memory from different pieces of execution what
threads were designed to do?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Parallel copy

2020-02-18 Thread Ants Aasma
On Tue, 18 Feb 2020 at 15:21, Amit Kapila  wrote:
>
> On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma  wrote:
> >
> > On Tue, 18 Feb 2020 at 12:20, Amit Kapila  wrote:
> > > This is something similar to what I had also in mind for this idea.  I
> > > had thought of handing over complete chunk (64K or whatever we
> > > decide).  The one thing that slightly bothers me is that we will add
> > > some additional overhead of copying to and from shared memory which
> > > was earlier from local process memory.  And, the tokenization (finding
> > > line boundaries) would be serial.  I think that tokenization should be
> > > a small part of the overall work we do during the copy operation, but
> > > will do some measurements to ascertain the same.
> >
> > I don't think any extra copying is needed.
> >
>
> I am talking about access to shared memory instead of the process
> local memory.  I understand that an extra copy won't be required.
>
> > The reader can directly
> > fread()/pq_copymsgbytes() into shared memory, and the workers can run
> > CopyReadLineText() inner loop directly off of the buffer in shared memory.
> >
>
> I am slightly confused here.  AFAIU, the for(;;) loop in
> CopyReadLineText is about finding the line endings which we thought
> that the reader process will do.

Indeed, I somehow misread the code while scanning over it. So CopyReadLineText
currently copies data from cstate->raw_buf to the StringInfo in
cstate->line_buf. In parallel mode it would copy it from the shared data buffer
to local line_buf until it hits the line end found by the data reader. The
amount of copying done is still exactly the same as it is now.

Regards,
Ants Aasma




Re: Index only scan and ctid

2020-02-18 Thread Tom Lane
Greg Stark  writes:
> For the user visible ctid we could just arbitrarily declare that the ctid
> returned by an IOS is the head of the HOT update chain instead of the tail.

No, I don't think that'd work at all, because that tuple might be dead.
A minimum expectation is that "SELECT ... WHERE ctid = 'xxx'" would return
the same data as the IOS, and that would fail because it wouldn't return
anything.

(In principle I suppose we could *also* redefine what selecting by ctid
means.  Doubt I want to go there though.)

> For a data modifying query -- and it would have to be one targeting some
> other table or else there's no way it could be an IOS -- does having a ctid
> for the head rather than the tail still work?

If you target a tuple that is live according to your current snapshot,
but nonetheless out-of-date, EPQ will chase up to the head for you.
But you gotta start with a tuple that is visible to your snapshot.

regards, tom lane




Re: Parallel copy

2020-02-18 Thread Mike Blackwell
On Sun, Feb 16, 2020 at 12:51 AM Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:

>
> IIRC, in_quote only matters here in CSV mode (because CSV fields can
> have embedded newlines). So why not just forbid parallel copy in CSV
> mode, at least for now? I guess it depends on the actual use case. If we
> expect to be parallel loading humungous CSVs then that won't fly.


Loading large CSV files is pretty common here.  I hope this can be
supported.



MIKE BLACKWELL




Re: Collation versioning

2020-02-18 Thread Julien Rouhaud
On Mon, Feb 17, 2020 at 5:58 AM Thomas Munro  wrote:
>
> On Thu, Feb 13, 2020 at 8:13 AM Julien Rouhaud  wrote:
> > Hearing no complaints on the suggestions, I'm attaching v8 to address that:
> >
> > - pg_dump is now using a binary_upgrade_set_index_coll_version() function
> >   rather than plain DDL
> > - the additional DDL is now of the form:
> >   ALTER INDEX name ALTER COLLATION name REFRESH VERSION
>
> +1
>
> A couple of thoughts:
>
> @@ -1115,21 +1117,44 @@ index_create(Relation heapRelation,
> ...
> +   /*
> +* Get required distinct dependencies on collations
> for all index keys.
> +* Collations of directly referenced column in hash
> indexes can be
> +* skipped is they're deterministic.
> +*/
> for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
> {
> -   if (OidIsValid(collationObjectId[i]) &&
> -   collationObjectId[i] != DEFAULT_COLLATION_OID)
> +   Oid colloid = collationObjectId[i];
> +
> +   if (OidIsValid(colloid))
> {
> -   referenced.classId = CollationRelationId;
> -   referenced.objectId = collationObjectId[i];
> -   referenced.objectSubId = 0;
> +   if ((indexInfo->ii_Am != HASH_AM_OID) ||
> +
> !get_collation_isdeterministic(colloid))
>
> I still don't like the way catalog/index.c has hard-coded knowledge of
> HASH_AM_OID here.  Although it errs on the side of the assuming that
> there *is* a version dependency (good)

Oh, but it also means that it fails to create a versionless
dependency, which is totally wrong.  What we should do is instead
set up a "track_version" flag to pass down.

It also means that the current way I handled unknown version (empty
string) vs unknown collation lib version (null) will be problematic,
both for runtime check and pg_upgrade.  I think we should record an
empty string for both cases, and keep NULL for when explicitly no
version has to be recorded (whether it's not a dependency on
collation, or because the depender doesn't care about version).  It
also means that I'm missing regression tests using such an access
method.

> there is already another AM in
> the tree that could safely skip it for deterministic collations AFAIK:
> Bloom indexes.  I suppose that any extension AM that is doing some
> kind of hashing would also like to be able to be able to opt out of
> collation version checking, when that is safe.  The question is how to
> model that in our system...

Oh indeed.

> One way would be for each AM to declare whether it is affected by
> collations; the answer could be yes/maybe (default), no,
> only-non-deterministic-ones.  But that still feels like the wrong
> level, not taking advantage of knowledge about operators.

On the other hand, would it be possible that some AM only supports
collation-dependency-free operators while still internally relying on
a stable sort order?

> A better way might be to make declarations about that sort of thing in
> the catalog, somewhere in the vicinity of the operator classes, or
> maybe just to have hard coded knowledge about operator classes (ie
> making declarations in the manual about what eg hash functions are
> allowed to consult and when), and then check which of those an index
> depends on.  I am not sure what would be best, I'd need to spend some
> time studying the am operator system.

I think this will be required at some point anyway, if we want to
eventually avoid warning about possible corruption when an
expression/where clause doesn't depend on the collation ordering.

> Perhaps for the first version of this feature, we should just add a
> new local function
> index_can_skip_collation_version_dependency(indexInfo, colloid) to
> encapsulate your existing logic, but add a comment that in future we
> might be able to support skipping in more cases through analysis of
> the catalogs.

That'd be convenient, but would also break extensibility, as bloom
would get preferential treatment (even if such an AM doesn't already
exist).
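
For reference, a minimal sketch of such a helper, merely wrapping the
logic already visible in the quoted index_create() hunk (the function
name and its placement are Thomas' suggestion, not an existing API):

/*
 * Sketch only: a collation version dependency can be skipped when the
 * index AM does not rely on collation ordering (currently hard-coded to
 * hash) and the collation is deterministic, since only then is the
 * stored structure unaffected by collation library changes.
 */
static bool
index_can_skip_collation_version_dependency(IndexInfo *indexInfo, Oid colloid)
{
    return indexInfo->ii_Am == HASH_AM_OID &&
        get_collation_isdeterministic(colloid);
}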

>
> +   
> +ALTER COLLATION
> +
> + 
> +  This command declares that the index is compatible with the currently
> +  installed version of the collations that determine its order.  It is 
> used
> +  to silence warnings caused by collation version incompatibilities and
> +  should be called after rebuilding the index or otherwise verifying its
> +  consistency.  Be aware that incorrect use of this command can hide 
> index
> +  corruption.
> + 
> +
> +   
>
> This sounds like something that you need to do after you reindex, but
> that's not true, is it?  This is something you can do *instead* of
> reindex, to make it shut up about versions.  Shouldn't it be something
> like "... should be issued only if the 

Re: Index only scan and ctid

2020-02-18 Thread Greg Stark
For the user visible ctid we could just arbitrarily declare that the ctid
returned by an IOS is the head of the HOT update chain instead of the tail.
It might be a bit confusing when sequential scans return the tail (or
whichever member is visible). But it's not really wrong, all the members of
the chain are equally valid answers.

For a data modifying query -- and it would have to be one targeting some
other table or else there's no way it could be an IOS -- does having a ctid
for the head rather than the tail still work? I'm not clear how EPQ works
for such cases. Does it still do an index scan at all or does it just do a
ctid scan? And does it follow HOT update chains if the row was updated?

On Tue., Feb. 4, 2020, 13:23 Tom Lane,  wrote:

> Laurenz Albe  writes:
> > On Mon, 2020-02-03 at 14:43 -0500, Tom Lane wrote:
> >> Laurenz Albe  writes:
> >>> I noticed that "ctid" in the select list prevents an index only scan:
> >>> This strikes me as strange, since every index contains "ctid".
>
> >> There's no provision for an IOS to return a system column, though.
> >> Not sure what it'd take to make that possible.
>
> > I was reminded what the obvious problem is:
> > the ctid of a heap only tuple is not stored in the index.  Duh.
>
> Duh ... the members of a HOT chain share the same indexed value(s),
> which is why we needn't care exactly which one is live during IOS.
> But they don't have the same TID.  Oh well.
>
> regards, tom lane
>
>
>


Re: Parallel copy

2020-02-18 Thread Amit Kapila
On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma  wrote:
>
> On Tue, 18 Feb 2020 at 12:20, Amit Kapila  wrote:
> > This is something similar to what I had also in mind for this idea.  I
> > had thought of handing over complete chunk (64K or whatever we
> > decide).  The one thing that slightly bothers me is that we will add
> > some additional overhead of copying to and from shared memory which
> > was earlier from local process memory.  And, the tokenization (finding
> > line boundaries) would be serial.  I think that tokenization should be
> > a small part of the overall work we do during the copy operation, but
> > will do some measurements to ascertain the same.
>
> I don't think any extra copying is needed.
>

I am talking about access to shared memory instead of the process
local memory.  I understand that an extra copy won't be required.

> The reader can directly
> fread()/pq_copymsgbytes() into shared memory, and the workers can run
> CopyReadLineText() inner loop directly off of the buffer in shared memory.
>

I am slightly confused here.  AFAIU, the for(;;) loop in
CopyReadLineText is about finding the line endings which we thought
that the reader process will do.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Implementing Incremental View Maintenance

2020-02-18 Thread nuko yokohama
Hi.

SELECT statements with a TABLESAMPLE clause should be rejected.

Currently, CREATE INCREMENTAL MATERIALIZED VIEW allows SELECT statements
with the TABLESAMPLE clause.
However, the result of this SELECT statement is undefined and should be
rejected when specified in CREATE INCREMENTAL MATERIALIZED VIEW.
(similar to handling non-immutable functions)
Regards.

2020-02-08 (Sat) 11:15 nuko yokohama :

> Hi.
>
> UNION query problem.(server crash)
>
> When creating an INCREMENTAL MATERIALIZED VIEW,
> the server process crashes if you specify a query with a UNION.
>
> (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
>
> execute log.
>
> ```
> [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
> DROP TABLE IF EXISTS table_x CASCADE;
> psql:union_query_crash.sql:6: NOTICE:  drop cascades to view xy_union_v
> DROP TABLE
> DROP TABLE IF EXISTS table_y CASCADE;
> DROP TABLE
> CREATE TABLE table_x (id int, data numeric);
> CREATE TABLE
> CREATE TABLE table_y (id int, data numeric);
> CREATE TABLE
> INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
> INSERT 0 3
> INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
> INSERT 0 3
> SELECT * FROM table_x;
>  id |data
> +
>   1 |  0.950724735058774
>   2 | 0.0222670808201144
>   3 |  0.391258547114841
> (3 rows)
>
> SELECT * FROM table_y;
>  id |data
> +
>   1 |  0.991717347778337
>   2 | 0.0528458947672874
>   3 |  0.965044982911163
> (3 rows)
>
> CREATE VIEW xy_union_v AS
> SELECT 'table_x' AS name, * FROM table_x
> UNION
> SELECT 'table_y' AS name, * FROM table_y
> ;
> CREATE VIEW
> TABLE xy_union_v;
>   name   | id |data
> -++
>  table_y |  2 | 0.0528458947672874
>  table_x |  2 | 0.0222670808201144
>  table_y |  3 |  0.965044982911163
>  table_x |  1 |  0.950724735058774
>  table_x |  3 |  0.391258547114841
>  table_y |  1 |  0.991717347778337
> (6 rows)
>
> CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
> SELECT 'table_x' AS name, * FROM table_x
> UNION
> SELECT 'table_y' AS name, * FROM table_y
> ;
> psql:union_query_crash.sql:28: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> psql:union_query_crash.sql:28: fatal: connection to server was lost
> ```
>
> 2018-12-27 (Thu) 21:57 Yugo Nagata :
>
>> Hi,
>>
>> I would like to implement Incremental View Maintenance (IVM) on
>> PostgreSQL.
>> IVM is a technique to maintain materialized views which computes and
>> applies
>> only the incremental changes to the materialized views rather than
>> recomputate the contents as the current REFRESH command does.
>>
>> I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
>> [1].
>> Our implementation uses row OIDs to compute deltas for materialized
>> views.
>> The basic idea is that if we have information about which rows in base
>> tables

Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-02-18 Thread Fujii Masao



On 2020/02/18 16:53, Amit Langote wrote:

On Tue, Feb 18, 2020 at 4:42 PM Fujii Masao  wrote:

On 2020/02/18 16:02, Amit Langote wrote:

I noticed that there is missing  tag in the documentation changes:


Could you tell me where I should add  tag?


+ 
+  waiting for checkpoint to finish
+  
+   The WAL sender process is currently performing
+   pg_start_backup to set up for
+   taking a base backup, and waiting for backup start
+   checkpoint to finish.
+  
+ 

There should be a  between  and  at the end of the
hunk shown above.


Will fix. Thanks!


Just to clarify, that's the missing  tag I am talking about above.


OK, so I added  tag just after the above .


+  
+   Whenever pg_basebackup is taking a base
+   backup, the pg_stat_progress_basebackup
+   view will contain a row for each WAL sender process that is currently
+   running BASE_BACKUP replication command
+   and streaming the backup.

I understand that you wrote "Whenever pg_basebackup is taking a
backup...", because description of other views contains a similar
starting line.  But, it may not only be pg_basebackup that would be
served by this view, no?  It could be any tool that speaks Postgres'
replication protocol and thus be able to send a BASE_BACKUP command.
If that is correct, I would write something like "When an application
is taking a backup" or some such without specific reference to
pg_basebackup.  Thoughts?


Yeah, there may be some such applications. But most users would
use pg_basebackup, so getting rid of the reference to pg_basebackup
would make the description a bit difficult to read. Also, I can imagine
that a user of those backup applications would get to know about
the progress reporting view from their documentation. So I prefer
the existing text or something like "Whenever an application like
   pg_basebackup ...". Thoughts?


Sure, "an application like pg_basebackup" sounds fine to me.


OK, I changed the doc that way. Attached the updated version of the patch.
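
For background on where the view's numbers come from, it is fed through
the existing command-progress facility; a minimal sketch of how a WAL
sender could report streamed bytes (the parameter-index constant below is
an assumption defined locally for illustration, not a symbol taken from
the actual patch):

#include "postgres.h"
#include "pgstat.h"

/* assumed slot index for the backup_streamed column, illustration only */
#define PROGRESS_BASEBACKUP_BACKUP_STREAMED 1

/*
 * Hypothetical sketch: publish the number of bytes streamed so far into
 * one of this backend's progress slots, which the
 * pg_stat_progress_basebackup view would then expose.
 */
static void
report_backup_streamed(int64 streamed_bytes)
{
    pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_STREAMED,
                                 streamed_bytes);
}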

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 87586a7b06..dd4a668eea 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -376,6 +376,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
   
  
 
+ 
+  
pg_stat_progress_basebackuppg_stat_progress_basebackup
+  One row for each WAL sender process streaming a base backup,
+   showing current progress.
+   See .
+  
+ 
+
 

   
@@ -3535,7 +3543,10 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
certain commands during command execution.  Currently, the only commands
which support progress reporting are ANALYZE,
CLUSTER,
-   CREATE INDEX, and VACUUM.
+   CREATE INDEX, VACUUM,
+   and  (i.e., replication
+   command that  issues to take
+   a base backup).
This may be expanded in the future.
   
 
@@ -4336,6 +4347,156 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,


   
+ 
+
+ 
+  Base Backup Progress Reporting
+
+  
+   Whenever an application like pg_basebackup
+   is taking a base backup, the
+   pg_stat_progress_basebackup
+   view will contain a row for each WAL sender process that is currently
+   running BASE_BACKUP replication command
+   and streaming the backup. The tables below describe the information
+   that will be reported and provide information about how to interpret it.
+  
+
+  
+   pg_stat_progress_basebackup View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of a WAL sender process.
+
+
+ phase
+ text
+ Current processing phase. See .
+
+
+ backup_total
+ bigint
+ 
+  Total amount of data that will be streamed. If progress reporting
+  is not enabled in pg_basebackup
+  (i.e., --progress option is not specified),
+  this is 0. Otherwise, this is estimated and
+  reported as of the beginning of
+  streaming database files phase. Note that
+  this is only an approximation since the database
+  may change during streaming database files phase
+  and WAL log may be included in the backup later. This is always
+  the same value as backup_streamed
+  once the amount of data streamed exceeds the estimated
+  total size.
+ 
+
+
+ backup_streamed
+ bigint
+ 
+  Amount of data streamed. This counter only advances
+  when the phase is streaming database files or
+  transfering wal files.
+ 
+
+
+ tablespaces_total
+ bigint
+ 
+  Total number of tablespaces that will be streamed.
+ 
+
+
+ tablespaces_streamed
+ bigint
+ 
+  Number of tablespaces streamed. This counter only
+  advances when the phase is streaming database 

Re: Parallel copy

2020-02-18 Thread Ants Aasma
On Tue, 18 Feb 2020 at 12:20, Amit Kapila  wrote:
> This is something similar to what I had also in mind for this idea.  I
> had thought of handing over complete chunk (64K or whatever we
> decide).  The one thing that slightly bothers me is that we will add
> some additional overhead of copying to and from shared memory which
> was earlier from local process memory.  And, the tokenization (finding
> line boundaries) would be serial.  I think that tokenization should be
> a small part of the overall work we do during the copy operation, but
> will do some measurements to ascertain the same.

I don't think any extra copying is needed. The reader can directly
fread()/pq_copymsgbytes() into shared memory, and the workers can run
CopyReadLineText() inner loop directly off of the buffer in shared memory.

For serial performance of tokenization into lines, I really think a SIMD
based approach will be fast enough for quite some time. I hacked up the code in
the simdcsv  project to only tokenize on line endings and it was able to
tokenize a CSV file with short lines at 8+ GB/s. There are going to be many
other bottlenecks before this one starts limiting. Patch attached if you'd
like to try that out.

Regards,
Ants Aasma
diff --git a/src/main.cpp b/src/main.cpp
index 9d33a85..2cf775c 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -185,7 +185,6 @@ bool find_indexes(const uint8_t * buf, size_t len, ParsedCSV & pcsv) {
 #endif
 simd_input in = fill_input(buf+internal_idx);
 uint64_t quote_mask = find_quote_mask(in, prev_iter_inside_quote);
-uint64_t sep = cmp_mask_against_input(in, ',');
 #ifdef CRLF
 uint64_t cr = cmp_mask_against_input(in, 0x0d);
 uint64_t cr_adjusted = (cr << 1) | prev_iter_cr_end;
@@ -195,7 +194,7 @@ bool find_indexes(const uint8_t * buf, size_t len, ParsedCSV & pcsv) {
 #else
 uint64_t end = cmp_mask_against_input(in, 0x0a);
 #endif
-fields[b] = (end | sep) & ~quote_mask;
+fields[b] = (end) & ~quote_mask;
   }
   for(size_t b = 0; b < SIMDCSV_BUFFERSIZE; b++){
 size_t internal_idx = 64 * b + idx;
@@ -211,7 +210,6 @@ bool find_indexes(const uint8_t * buf, size_t len, ParsedCSV & pcsv) {
 #endif
   simd_input in = fill_input(buf+idx);
   uint64_t quote_mask = find_quote_mask(in, prev_iter_inside_quote);
-  uint64_t sep = cmp_mask_against_input(in, ',');
 #ifdef CRLF
   uint64_t cr = cmp_mask_against_input(in, 0x0d);
   uint64_t cr_adjusted = (cr << 1) | prev_iter_cr_end;
@@ -226,7 +224,7 @@ bool find_indexes(const uint8_t * buf, size_t len, ParsedCSV & pcsv) {
 // then outside the quotes with LF so it's OK to "and off"
 // the quoted bits here. Some other quote convention would
 // need to be thought about carefully
-  uint64_t field_sep = (end | sep) & ~quote_mask;
+  uint64_t field_sep = (end) & ~quote_mask;
   flatten_bits(base_ptr, base, idx, field_sep);
   }
 #undef SIMDCSV_BUFFERSIZE


Re: Clean up some old cruft related to Windows

2020-02-18 Thread Juan José Santamaría Flecha
On Tue, Feb 18, 2020 at 9:56 AM Kyotaro Horiguchi 
wrote:

>
> https://www.postgresql.org/docs/current/install-windows-full.html
> > In recent SDK versions you can change the targeted CPU architecture,
> > build type, and target OS by using the setenv command, e.g. setenv
> > /x86 /release /xp to target Windows XP or later with a 32-bit
> > release build. See /? for other options to setenv. All commands
> > should be run from the src\tools\msvc directory.
>
> AFAICS we cannot use "setenv command" on cmd.exe, or no such command
> found in the msvc directory.
>
>
I cannot pinpoint exactly when SetEnv.bat was dropped; probably Windows 7 SDK
was the place where it was included [1], so that needs to be updated.

Using VS2019 and VS2017 this would be done using VsDevCmd.bat [2], while
VS2015 and VS2013 use VSVARS32.bat.

[1]
https://docs.microsoft.com/en-us/previous-versions/visualstudio/windows-sdk/ff660764(v=vs.100)?redirectedfrom=MSDN
[2]
https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/compiler-options/how-to-set-environment-variables-for-the-visual-studio-command-line

Regards,

Juan José Santamaría Flecha


Re: Clean up some old cruft related to Windows

2020-02-18 Thread Juan José Santamaría Flecha
On Tue, Feb 18, 2020 at 7:54 AM Michael Paquier  wrote:

> On Thu, Dec 19, 2019 at 08:09:45PM +0100, Juan José Santamaría Flecha
> wrote:
> > This is probably not an issue for the supported MSVC and their SDK, but
> > current MinGW defaults to Windows 2003 [1]. So I would suggest a logic
> like:
> >
> > #define WINNTVER(ver) ((ver) >> 16)
> > #define NTDDI_VERSION 0x06000100
> > #define _WIN32_WINNT WINNTVER(NTDDI_VERSION)
> >
> > [1]
> >
> https://github.com/mirror/mingw-w64/blob/master/mingw-w64-headers/include/sdkddkver.h
>
> You're right, thanks for the pointer.  This is this part of the
> header:
> #define NTDDI_VERSION NTDDI_WS03
>
> Thinking more about that, the changes in win32.h are giving me cold
> feet.
>
>
Maybe this needs a specific thread, as it is not quite cruft but something
that will require maintenance.

Regards,

Juan José Santamaría Flecha


Re: Parallel copy

2020-02-18 Thread Kyotaro Horiguchi
At Tue, 18 Feb 2020 15:59:36 +0530, Amit Kapila  wrote 
in 
> On Tue, Feb 18, 2020 at 7:28 AM Kyotaro Horiguchi
>  wrote:
> >
> > In an extreme case, if we didn't see a QUOTE in a chunk, we cannot
> > know the chunk is in a quoted section or not, until all the past
> > chunks are parsed.  After all we are forced to parse fully
> > sequentially as far as we allow QUOTE.
> >
> 
> Right, I think the benefits of this as compared to single reader idea
> would be (a) we can save accessing shared memory for the most part of
> the chunk (b) for non-csv mode, even the tokenization (finding line
> boundaries) would also be parallel.   OTOH, doing processing
> differently for csv and non-csv mode might not be good.

Agreed. So I think it's a good compromise.

> > On the other hand, if we allowed "COPY t FROM f WITH (FORMAT CSV,
> > QUOTE '')" in order to signal that there's no quoted section in the
> > file then all chunks would be fully concurrently parsable.
> >
> 
> Yeah, if we can provide such an option, we can probably make parallel
> csv processing equivalent to non-csv.  However, users might not like
> this as I think in some cases it won't be easier for them to tell
> whether the file has quoted fields or not.  I am not very sure of this
> point.

I'm not sure how large a portion of the usage contains quoted sections,
so I'm not sure how useful it would be.
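
To make the earlier point about QUOTE concrete, here is a minimal
illustration (not taken from any posted patch) of the state that has to
be carried across chunks when quoting is allowed; a chunk containing no
quote character leaves in_quote unchanged, so it cannot be classified
until every earlier chunk has been scanned:

#include <stdbool.h>
#include <stddef.h>

/*
 * Illustration only: scan one chunk and return whether it ends inside a
 * quoted section.  Escaped quotes and multi-byte encodings are ignored
 * for brevity.
 */
static bool
scan_chunk_quote_state(const char *chunk, size_t len, bool in_quote)
{
    for (size_t i = 0; i < len; i++)
    {
        if (chunk[i] == '"')
            in_quote = !in_quote;
    }
    return in_quote;
}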

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] WAL logging problem in 9.4.3?

2020-02-18 Thread Kyotaro Horiguchi
Oops. I was working on the wrong branch and got stuck in a slow build on
Windows...

At Tue, 18 Feb 2020 00:53:37 -0800, Noah Misch  wrote in 
> On Tue, Feb 18, 2020 at 03:56:15PM +1300, Thomas Munro wrote:
> > CREATE TYPE priv_testtype1 AS (a int, b text);
> > +ERROR: relation 24844 deleted while still in use
> > REVOKE USAGE ON TYPE priv_testtype1 FROM PUBLIC;
> > 
> > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.79923
> > 
> > It didn't fail on the same OS a couple of days earlier:
> > 
> > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/30829686
> 
> Thanks for the report.  This reproduces consistently under
> CLOBBER_CACHE_ALWAYS (which, coincidentally, I started today).  Removing the
> heap_create() change fixes it.  Since we now restore a saved rd_createSubid,
> the heap_create() change is obsolete.  My next version will include that fix.

Yes, ATExecAddIndex correctly set createSubid without that.

> The system uses rd_createSubid to mean two things.  First, rd_node is new.
> Second, the rel might not yet be in catalogs, so we can't rebuild its relcache
> entry.  The first can be false while the second is true, hence this failure.
> However, the second is true in a relatively-narrow period in which we don't
> run arbitrary user code.  Hence, that simple fix suffices.

I hadn't considered the second meaning. I thought it was caused by
invalidation, but I couldn't get a core dump on Windows 10. The
comment for RelationCacheInvalidate seems to only faintly explain the
second meaning.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Parallel copy

2020-02-18 Thread Amit Kapila
On Tue, Feb 18, 2020 at 7:28 AM Kyotaro Horiguchi
 wrote:
>
> At Mon, 17 Feb 2020 16:49:22 +0530, Amit Kapila  
> wrote in
> > On Sun, Feb 16, 2020 at 12:21 PM Andrew Dunstan
> >  wrote:
> > > On 2/15/20 7:32 AM, Amit Kapila wrote:
> > > > On Sat, Feb 15, 2020 at 4:08 PM Alastair Turner  
> > > > > wrote:
> > > So why not just forbid parallel copy in CSV
> > > mode, at least for now? I guess it depends on the actual use case. If we
> > > expect to be parallel loading humungous CSVs then that won't fly.
> > >
> >
> > I am not sure about this part.  However, I guess we should at the very
> > least have some extendable solution that can deal with csv, otherwise,
> > we might end up re-designing everything if someday we want to deal
> > with CSV.  One naive idea is that in csv mode, we can set up the
> > things slightly differently like the worker, won't start processing
> > the chunk unless the previous chunk is completely parsed.  So each
> > worker would first parse and tokenize the entire chunk and then start
> > writing it.  So, this will make the reading/parsing part serialized,
> > but writes can still be parallel.  Now, I don't know if it is a good
> > idea to process in a different way for csv mode.
>
> In an extreme case, if we didn't see a QUOTE in a chunk, we cannot
> know the chunk is in a quoted section or not, until all the past
> chunks are parsed.  After all we are forced to parse fully
> sequentially as far as we allow QUOTE.
>

Right, I think the benefits of this as compared to single reader idea
would be (a) we can save accessing shared memory for the most part of
the chunk (b) for non-csv mode, even the tokenization (finding line
boundaries) would also be parallel.   OTOH, doing processing
differently for csv and non-csv mode might not be good.

> On the other hand, if we allowed "COPY t FROM f WITH (FORMAT CSV,
> QUOTE '')" in order to signal that there's no quoted section in the
> file then all chunks would be fully concurrently parsable.
>

Yeah, if we can provide such an option, we can probably make parallel
csv processing equivalent to non-csv.  However, users might not like
this as I think in some cases it won't be easier for them to tell
whether the file has quoted fields or not.  I am not very sure of this
point.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-02-18 Thread Amit Kapila
On Mon, Feb 17, 2020 at 8:34 PM Ants Aasma  wrote:
>
> On Sat, 15 Feb 2020 at 14:32, Amit Kapila  wrote:
> > Good point and I agree with you that having a single process would
> > avoid any such stuff.   However, I will think some more on it and if
> > you/anyone else gets some idea on how to deal with this in a
> > multi-worker system (where we can allow each worker to read and
> > process the chunk) then feel free to share your thoughts.
>
> I think having a single process handle splitting the input into tuples makes
> most sense. It's possible to parse csv at multiple GB/s rates [1], finding
> tuple boundaries is a subset of that task.
>
> My first thought for a design would be to have two shared memory ring buffers,
> one for data and one for tuple start positions. Reader process reads the CSV
> data into the main buffer, finds tuple start locations in there and writes
> those to the secondary buffer.
>
> Worker processes claim a chunk of tuple positions from the secondary buffer 
> and
> update their "keep this data around" position with the first position. Then
> proceed to parse and insert the tuples, updating their position until they 
> find
> the end of the last tuple in the chunk.
>

This is something similar to what I had also in mind for this idea.  I
had thought of handing over complete chunk (64K or whatever we
decide).  The one thing that slightly bothers me is that we will add
some additional overhead of copying to and from shared memory which
was earlier from local process memory.  And, the tokenization (finding
line boundaries) would be serial.  I think that tokenization should be
a small part of the overall work we do during the copy operation, but
will do some measurements to ascertain the same.
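
For readers following along, below is a minimal illustration of the two
shared structures described in the quoted design above: a byte ring
filled by the reader and a ring of line-start offsets claimed by
workers. The layout and names are hypothetical, not from any posted
patch:

#include "postgres.h"
#include "port/atomics.h"

#define RAW_RING_SIZE   (16 * 65536)    /* raw input bytes */
#define LINE_RING_SIZE  8192            /* line start positions */

/*
 * Hypothetical shared-memory layout for the single-reader design: the
 * reader appends raw input bytes to data[] and publishes each line-start
 * offset into line_starts[]; workers claim ranges of line_starts[] and
 * advance keep_pos once they no longer need the underlying bytes.
 */
typedef struct ParallelCopyShared
{
    pg_atomic_uint64    data_written;   /* bytes appended by the reader */
    pg_atomic_uint64    keep_pos;       /* oldest byte still needed */
    char                data[RAW_RING_SIZE];

    pg_atomic_uint64    lines_written;  /* offsets published by the reader */
    pg_atomic_uint64    lines_claimed;  /* offsets claimed by workers */
    uint64              line_starts[LINE_RING_SIZE];
} ParallelCopyShared;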

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Improve search for missing parent downlinks in amcheck

2020-02-18 Thread Alexander Korotkov
On Tue, Feb 18, 2020 at 1:15 PM Alexander Korotkov
 wrote:
> On Fri, Jan 24, 2020 at 4:31 AM Peter Geoghegan  wrote:
> > On Wed, Jan 22, 2020 at 6:41 PM Alexander Korotkov
> >  wrote:
> > > Rebased patch is attached.  Sorry for so huge delay.
> >
> > I really like this patch. Your interest in amcheck is something that
> > makes me feel good about having put so much work into it myself.
> >
> > Here are some review comments:
>
> Great, thank you very much!

Sorry, I forgot to commit some comments before publishing the patch.
The right version is attached.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


amcheck-btree-improve-missing-parent-downlinks-check-6.patch
Description: Binary data


Re: Improve search for missing parent downlinks in amcheck

2020-02-18 Thread Alexander Korotkov
Hi, Peter!

On Fri, Jan 24, 2020 at 4:31 AM Peter Geoghegan  wrote:
> On Wed, Jan 22, 2020 at 6:41 PM Alexander Korotkov
>  wrote:
> > Rebased patch is attached.  Sorry for so huge delay.
>
> I really like this patch. Your interest in amcheck is something that
> makes me feel good about having put so much work into it myself.
>
> Here are some review comments:

Great, thank you very much!

> > +   /*
> > +* Rightlink and incomplete split flag of previous block referenced by
> > +* downlink.
> > +*/
> > +   BlockNumber prevrightlink;
> > +   boolprevincompletesplit;
> > +
>
> What downlink? What does this mean? Do you mean the most recently
> followed rightlink on the current level, or InvalidBlockNumber if
> target page is the leftmost page on the current level on the scan?
>
> (Thinks some more...)
>
> Actually, these two new fields track the state *one level down* from
> the target page level when !readonly (unless target page is on the
> leaf level). Right? Comments should be explicit about this. The
> current comments about downlinks isn't clear.

I agree. I've used very vague terms in comments.  Revised now.

> > if (offset_is_negative_infinity(topaque, offset))
> > +   {
> > +   /*
> > +* Initialize downlink connectivity check if needed.
> > +*/
> > +   if (!P_ISLEAF(topaque) && state->readonly)
> > +   {
> > +   bt_downlink_connectivity_check(state,
> > +  offset,
> > +  NULL,
> > +  topaque->btpo.level);
> > +   }
> > continue;
> > +   }
>
> Don't need the "!P_ISLEAF()" here.

Why wouldn't I need it?  bt_downlink_connectivity_check() checks one level
down from the target level.  But there is no level one down from the leaf...

> Also, you should say something like
> "we need to call this here because the usual callsite in
> bt_downlink_check() won't be reached".

Sure, fixed.

> > /*
> > -* * Check if page has a downlink in parent *
> > -*
> > -* This can only be checked in heapallindexed + readonly case.
> > +* If we traversed the whole level to the rightmost page, there might be
> > +* missing downlinks for the pages to the right of rightmost downlink.
> > +* Check for them.
> >  */
>
> You mean "to the right of the child page pointed to by our rightmost 
> downlink"?

Yep, fixed.

> I think that the final bt_downlink_connectivity_check() call within
> bt_target_page_check() should make it clear that it is kind of special
> compared to the other two calls.

Yes, this is fixed too.

> > +/*
> > + * Check connectivity of downlinks.  Traverse rightlinks from previous 
> > downlink
> > + * to the current one.  Check that there are no intermediate pages with 
> > missing
> > + * downlinks.
> > + *
> > + * If 'loaded_page' is given, it's assumed to be contents of downlink
> > + * referenced by 'downlinkoffnum'.
> > + */
>
> Say "assumed to be the page pointed to by the downlink", perhaps?

Yes, fixed.

> > +static void
> > +bt_downlink_connectivity_check(BtreeCheckState *state,
> > +  OffsetNumber downlinkoffnum,
> > +  Page loaded_page,
> > +  uint32 parent_level)
> > +{
>
> In amcheck, we always have a current target page. Every page gets to
> be the target exactly once, though sometimes other subsidiary pages
> are accessed. We try to blame the target page, even with checks that
> are technically against its child/sibling/whatever. The target page is
> always our conceptual point of reference. Sometimes this is a bit
> artificial, but it's still worth doing consistently. So I think you
> should change these argument names with that in mind (see below).

Yes, the arguments were changed as you proposed.

> > +   /*
> > +* If we visit page with high key, check that it should be equal to
> > +* the target key next to corresponding downlink.
> > +*/
>
> I suggest "...check that it is equal to the target key..."

Agree, fixed.

> > +   /*
> > +* There might be two situations when we examine high key.  If
> > +* current child page is referenced by given downlink, we should
> > +* look to the next offset number for matching key.
>
> You mean "the next offset number for the matching key from the target
> page"? I find it much easier to keep this stuff in my head if
> everything is defined in terms of its relationship with the current
> target page. For example, bt_downlink_connectivity_check()'s
> "parent_level" argument should be called "target_level" instead, while
> its "loaded_page" should be called "loaded_child". Maybe
> "downlinkoffnum" should be "target_downlinkoffnum". And downlinkoffnum
> should definitely be explained in comments at the top of
> bt_downlink_connectivity_check() 

Re: plan cache overhead on plpgsql expression

2020-02-18 Thread Amit Langote
On Tue, Feb 18, 2020 at 2:56 PM Pavel Stehule  wrote:
> út 18. 2. 2020 v 6:03 odesílatel Amit Langote  
> napsal:
>> I didn't send the patch, because it didn't handle the cases where a
>> simple expression consists of an inline-able function(s) in it, which
>> are better handled by a full-fledged planner call backed up by the
>> plan cache.  If we don't do that then every evaluation of such
>> "simple" expression needs to invoke the planner.  For example:
>>
>> Consider this inline-able SQL function:
>>
>> create or replace function sql_incr(a bigint)
>> returns int
>> immutable language sql as $$
>> select a+1;
>> $$;
>>
>> Then this revised body of your function foo():
>>
>> CREATE OR REPLACE FUNCTION public.foo()
>>  RETURNS int
>>  LANGUAGE plpgsql
>>  IMMUTABLE
>> AS $function$
>> declare i bigint = 0;
>> begin
>>   while i < 100
>>   loop
>> i := sql_incr(i);
>>   end loop; return i;
>> end;
>> $function$
>> ;
>>
>> With HEAD `select foo()` finishes in 786 ms, whereas with the patch,
>> it takes 5102 ms.
>>
>> I think the patch might be good idea to reduce the time to compute
>> simple expressions in plpgsql, if we can address the above issue.
>
>
> Your patch is very interesting - minimally it returns performance before 8.2. 
> The mentioned issue can be fixed if we disallow SQL functions in this fast 
> execution.

I updated the patch to do that.

With the new patch, `select foo()`, with inline-able sql_incr() in it,
runs in 679 ms.

Without any inline-able function, it runs in 330 ms, whereas with
HEAD, it takes 590 ms.

Thanks,
Amit
diff --git a/src/backend/optimizer/util/clauses.c 
b/src/backend/optimizer/util/clauses.c
index 2d3ec22407..5ce0079a12 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -108,6 +108,7 @@ static bool 
contain_volatile_functions_not_nextval_walker(Node *node, void *cont
 static bool max_parallel_hazard_walker(Node *node,
   
max_parallel_hazard_context *context);
 static bool contain_nonstrict_functions_walker(Node *node, void *context);
+static bool contain_inlinable_functions_walker(Node *node, void *context);
 static bool contain_context_dependent_node(Node *clause);
 static bool contain_context_dependent_node_walker(Node *node, int *flags);
 static bool contain_leaked_vars_walker(Node *node, void *context);
@@ -1218,6 +1219,63 @@ contain_nonstrict_functions_walker(Node *node, void 
*context)
  context);
 }
 
+/*
+ * Check clauses for inline-able functions
+ */
+
+bool
+contain_inlinable_functions(Node *node)
+{
+   return contain_inlinable_functions_walker(node, NULL);
+}
+
+/*
+ * can_inline_function - checks if a function is inline-able
+ */
+static bool
+can_inline_function(HeapTuple func_tuple)
+{
+   Form_pg_proc funcform = (Form_pg_proc) GETSTRUCT(func_tuple);
+
+   /*
+* Nope if the function is not SQL-language or has other showstopper
+* properties.  (The prokind and nargs checks are just paranoia.)
+*/
+   return  funcform->prolang == SQLlanguageId &&
+   funcform->prokind == PROKIND_FUNCTION &&
+   !funcform->prosecdef && !funcform->proretset &&
+   funcform->prorettype != RECORDOID &&
+   heap_attisnull(func_tuple, Anum_pg_proc_proconfig, 
NULL);
+}
+
+static bool
+can_inline_function_checker(Oid funcid, void *context)
+{
+   HeapTuple   func_tuple;
+   boolresult;
+
+   func_tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
+   if (!HeapTupleIsValid(func_tuple))
+   elog(ERROR, "cache lookup failed for function %u", funcid);
+
+   result = can_inline_function(func_tuple);
+   ReleaseSysCache(func_tuple);
+
+   return result;
+}
+
+static bool
+contain_inlinable_functions_walker(Node *node, void *context)
+{
+   if (node == NULL)
+   return false;
+   if (check_functions_in_node(node, can_inline_function_checker, context))
+   return true;
+
+   return expression_tree_walker(node, contain_inlinable_functions_walker,
+ context);
+}
+
 /*
  * Check clauses for context-dependent nodes
  */
@@ -4022,7 +4080,8 @@ simplify_function(Oid funcid, Oid result_type, int32 
result_typmod,
Assert(newexpr != (Expr *) );
}
 
-   if (!newexpr && allow_non_const)
+   if (!newexpr && allow_non_const &&
+   can_inline_function(func_tuple))

Re: error context for vacuum to include block number

2020-02-18 Thread Masahiko Sawada
On Mon, 17 Feb 2020 at 15:14, Justin Pryzby  wrote:
>
> On Mon, Feb 17, 2020 at 02:18:11PM +0900, Masahiko Sawada wrote:
> > Oops it seems to me that it's better to set error context after
> > tupindex = 0. Sorry for my bad.
>
> I take your point but did it differently - see what you think
>
> > And the above comment can be written in a single line as other same 
> > comments.
>
> Thanks :)
>
> > Hmm I don't think it's a good idea to have count_nondeletable_pages
> > set the error context of PHASE_TRUNCATE.
>
> I think if we don't do it there then we shouldn't bother handling
> PHASE_TRUNCATE at all, since that's what's likely to hit filesystem or other
> lowlevel errors, before lazy_truncate_heap() hits them.
>
> > Because the patch sets the
> > error context during RelationTruncate that actually truncates the heap
> > but count_nondeletable_pages doesn't truncate it.
>
> I would say that ReadBuffer called by the prefetch in
> count_nondeletable_pages() is called during the course of truncation, the same
> as ReadBuffer called during the course of vacuuming can be attributed to
> vacuuming.

Why do we want to include only count_nondeletable_pages, even though
there are also several other places where we may wait: waiting for a
lock, getting the number of blocks, etc.? The user may cancel vacuum
during those, but will not be able to know that vacuum is in the
truncation phase. If we want to set the error callback during operations
that don't actually truncate the heap, like count_nondeletable_pages, we
should set it for the whole of lazy_truncate_heap. Otherwise I think we
should set it only for RelationTruncate.

>
> > I think setting the error context only during RelationTruncate would be a
> > good start. We can hear other opinions from other hackers. Some hackers may
> > want to set the error context for whole lazy_truncate_heap.
>
> I avoided doing that since it has several "return" statements, each of which
> would need to "Pop the error context stack", which is at risk of being
> forgotten and left unpopped by anyone who adds or changes flow control.

I imagined that we could add some gotos and pop the error callback there,
but since that might make the code worse, I suggested setting the error
callback only for RelationTruncate as a first step.
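
For context, the mechanism under discussion is the usual
error-context-callback push/pop pattern; a minimal sketch of setting it
only around RelationTruncate, as suggested above (the callback and its
argument are placeholders, not the patch's actual code):

#include "postgres.h"
#include "catalog/storage.h"
#include "utils/elog.h"
#include "utils/rel.h"

/* placeholder callback: adds truncation info to any error raised inside */
static void
truncate_error_callback(void *arg)
{
    errcontext("while truncating relation to %u blocks",
               *((BlockNumber *) arg));
}

/* sketch of wrapping only the actual truncation call */
static void
truncate_with_error_context(Relation rel, BlockNumber new_rel_pages)
{
    ErrorContextCallback errcallback;

    errcallback.callback = truncate_error_callback;
    errcallback.arg = &new_rel_pages;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    RelationTruncate(rel, new_rel_pages);

    /* pop the error context stack */
    error_context_stack = errcallback.previous;
}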

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Some problems of recovery conflict wait events

2020-02-18 Thread Masahiko Sawada
Hi all,

When recovery conflicts happen on the streaming replication standby,
the wait event of the startup process is null when
max_standby_streaming_delay = 0 (to be exact, when the limit time
calculated from max_standby_streaming_delay based on the last WAL data
receipt time is behind). Moreover, the process title of the waiting
startup process looks odd in the case of lock conflicts.

1. When max_standby_streaming_delay > 0 and the startup process
conflicts with a lock,

* wait event
 backend_type | wait_event_type | wait_event
--+-+
 startup  | Lock| relation
(1 row)

* ps
42513   ??  Ss 0:00.05 postgres: startup   recovering
00010003 waiting

Looks good.

2. When max_standby_streaming_delay > 0 and the startup process
conflicts with a snapshot,

* wait event
 backend_type | wait_event_type | wait_event
--+-+
 startup  | |
(1 row)

* ps
44299   ??  Ss 0:00.05 postgres: startup   recovering
00010003 waiting

wait_event_type and wait_event are null in spite of waiting for
conflict resolution.

3. When max_standby_streaming_delay > 0 and the startup process
conflicts with a lock,

* wait event
 backend_type | wait_event_type | wait_event
--+-+
 startup  | |
(1 row)

* ps
46510   ??  Ss 0:00.05 postgres: startup   recovering
00010003 waiting waiting

wait_event_type and wait_event are null and the process title is
wrong; "waiting" appears twice.

The cause of the first problem (wait_event_type and wait_event not being
set) is that WaitExceedsMaxStandbyDelay, which is called by
ResolveRecoveryConflictWithVirtualXIDs, waits for other transactions
using pg_usleep rather than WaitLatch. I think we can change it to use
WaitLatch and have the callers pass the wait event information.
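
For illustration, a minimal sketch of the pg_usleep-to-WaitLatch change
being proposed (the wait-event value below reuses the generic PG_WAIT_IPC
class as a placeholder; a real patch would add a dedicated wait event):

#include "postgres.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/latch.h"

/*
 * Sketch only: wait for up to standby_wait_ms instead of pg_usleep(), so
 * that the startup process advertises a wait event and wakes up promptly
 * when its latch is set.
 */
static void
standby_conflict_wait(long standby_wait_ms)
{
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     standby_wait_ms,
                     PG_WAIT_IPC);
    ResetLatch(MyLatch);
}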

For the second problem, the wrong process title, the cause is also
related to ResolveRecoveryConflictWithVirtualXIDs; in the case of lock
conflicts we add "waiting" to the process title in WaitOnLock, but then
add it again in ResolveRecoveryConflictWithVirtualXIDs. I think we can
have WaitOnLock not set the process title in the recovery case.

This problem exists on 12, 11 and 10. I'll submit the patch.

Regards,

--
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Clean up some old cruft related to Windows

2020-02-18 Thread Kyotaro Horiguchi
Hello.

I understand that this is not for back-patching.

At Tue, 18 Feb 2020 16:44:59 +0900, Michael Paquier  wrote 
in 
> On Thu, Dec 19, 2019 at 01:46:33PM +0900, Kyotaro Horiguchi wrote:
> > I found some similar places by grep'ing for windows version names the
> > whole source tree.
> > 
> > - The comment for trapsig is mentioning win98/Me/NT/2000/XP.
> 
> Let's refresh the comment here, as per the following:
> https://docs.microsoft.com/en-us/previous-versions/xdkz3x12(v=vs.140)

  * The Windows runtime docs at
  * http://msdn.microsoft.com/library/en-us/vclib/html/_crt_signal.asp
...
- *  Win32 operating systems generate a new thread to specifically handle
- *  that interrupt. This can cause a single-thread application such as 
UNIX,
- *  to become multithreaded, resulting in unexpected behavior.
+ *  SIGINT is not supported for any Win32 application. When a CTRL+C interrupt
+ *  occurs, Win32 operating systems generate a new thread to specifically
+ *  handle that interrupt. This can cause a single-thread application, such as
+ *  one in UNIX, to become multithreaded and cause unexpected behavior.
  *
  * I have no idea how to handle this. (Strange they call UNIX an application!)
  * So this will need some testing on Windows.

The unmodified section just above gripes that "Strange they call
UNIX an application". The expression "application such as UNIX" seems
to correspond to that gripe.  I tried to find the source of the phrase,
but the above URL (.._crt_signal.asp) sent me "We're sorry, the page
you requested cannot be found." :(

Thank you for checking the belows.

> > - We don't need the (only) caller site of IsWindows7OrGreater()?
> 
> The compiled code can still run with Windows Server 2008. 

Do we support the new PG version on already-unsupported platforms?  If I'm
not missing anything, Windows Server 2008 has already reached
End of Extended Support (2020/1/14), along with Windows 7.

> > - The comment for AddUserToTokenDacl() is mentioning "XP/2K3,
> >   Vista/2008".
> 
> Keeping some context is still good here IMO.

I'm fine with that.

> > - InitializeLDAPConnection dynamically loads WLDAP32.DLL for Windows
> >   2000. It could either be statically loaded or could be left as it
> >   is, but the comment seems to need a change in either case.
> 
> Looks safer to me to keep it.

If it is still possible that the file is missing on Windows 8 / Server
2012 or later, the comment should be updated accordingly.

> > - The comment for IsoLocaleName mentioning Vista and Visual Studio
> >   2012.
> 
> It is good to keep some history in this context.

Agreed.

> > - install-windows.sgml is mentioning "XP and later" around line 117.
> 
> But this still applies to XP, even if compilation is supported from
> Windows 7.

Hmm. "/xp" can be the reason to preserve it.

By the way, that phrase assumes the Windows environment and perhaps
cmd.exe. So the following description:

https://www.postgresql.org/docs/current/install-windows-full.html
> In recent SDK versions you can change the targeted CPU architecture,
> build type, and target OS by using the setenv command, e.g. setenv
> /x86 /release /xp to target Windows XP or later with a 32-bit
> release build. See /? for other options to setenv. All commands
> should be run from the src\tools\msvc directory.

AFAICS we cannot use the "setenv command" from cmd.exe, nor is any such
command found in the msvc directory.

> > - installation.sgml is mentioning NT/2000/XP as platforms that don't
> >   support adduser/su, command.
> 
> No objections to simplify that a bit.

Sorry for the ambiguity. I meant the following part

installation.sgml
>  
>   PostgreSQL can be expected to work on these 
> operating
>   systems: Linux (all recent distributions), Windows (Win2000 SP4 and later),
>   FreeBSD, OpenBSD, NetBSD, macOS, AIX, HP/UX, and Solaris.
>   Other Unix-like systems may also work but are not currently
>   being tested.  In most cases, all CPU architectures supported by

(The coming version of) PostgreSQL doesn't support Win2000 SP4.

> Attached is a simplified version.  It is smaller than the previous
> one, but that's already a good cut.  I have also done some testing
> with the service manager to check after pipe_read_line(), and that
> works.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] WAL logging problem in 9.4.3?

2020-02-18 Thread Noah Misch
On Tue, Feb 18, 2020 at 03:56:15PM +1300, Thomas Munro wrote:
> CREATE TYPE priv_testtype1 AS (a int, b text);
> +ERROR: relation 24844 deleted while still in use
> REVOKE USAGE ON TYPE priv_testtype1 FROM PUBLIC;
> 
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.79923
> 
> It didn't fail on the same OS a couple of days earlier:
> 
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/30829686

Thanks for the report.  This reproduces consistently under
CLOBBER_CACHE_ALWAYS (which, coincidentally, I started today).  Removing the
heap_create() change fixes it.  Since we now restore a saved rd_createSubid,
the heap_create() change is obsolete.  My next version will include that fix.

The system uses rd_createSubid to mean two things.  First, rd_node is new.
Second, the rel might not yet be in catalogs, so we can't rebuild its relcache
entry.  The first can be false while the second is true, hence this failure.
However, the second is true in a relatively-narrow period in which we don't
run arbitrary user code.  Hence, that simple fix suffices.




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2020-02-18 Thread Amit Kapila
On Tue, Feb 18, 2020 at 11:33 AM Andres Freund  wrote:
>
> On 2020-02-18 11:20:17 +0530, Amit Kapila wrote:
> > Andres, any objections on proceeding with Kuntal's patch for
> > back-branches (10, 9.6 and 9.5)?
>
> Yes. In my past experiments that lead to *terrible* allocator
> performance due to fragmentation. Like, up to 90% of the time spent in
> aset.c.  Try a workload with a number of overlapping transactions that
> have different tuple sizes.
>

I thought the slab cache would have addressed it.  But it is possible
that if there are many such overlapping transactions, it might lead
to a performance regression.  OTOH, the current code might also lead to
worse performance for transactions with multiple subtransactions, as
they would frequently need to malloc.

> I'm not even sure it's the right thing to do anything in the back
> branches to be honest. If somebody hits this badly they likely have done
> so before, and they at least have the choice to upgrade, but if we
> regress performance for more people...

I can see that in some cases the current code might give better
performance, but OTOH, consuming memory at a high rate in some other
cases is not good either. But you are right that we can always
ask such users to upgrade (which again is sometimes painful for some
of the users), so maybe the right thing is to do nothing here. Does
anyone else have an opinion on this?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com