Re: [HACKERS] Custom compression methods

2020-12-26 Thread Andrey Borodin


> On 25 Dec 2020, at 14:34, Dilip Kumar wrote:

Maybe add Lz4/Zlib WAL FPI compression on top of this patchset? I'm not
insisting on anything, it just would be so cool to have it...

BTW, currently there are Oid collisions in the original patchset.

Best regards, Andrey Borodin.



v16-0007-Add-Lz4-compression-to-WAL-FPIs.patch
Description: Binary data


possible replacement for llvm JIT

2020-12-26 Thread Pavel Stehule
Hi

maybe an interesting project for decreasing the overhead of JIT:

https://developers.redhat.com/blog/2020/01/20/mir-a-lightweight-jit-compiler-project/

Regards

Pavel


Re: Wrong comment in tuptable.h

2020-12-26 Thread Andres Freund
Hi,

On 2020-12-26 18:00:49 -0800, Jeff Davis wrote:
>   /*
>* Return a copy of heap tuple representing the contents of the slot. The
>* copy needs to be palloc'd in the current memory context. The slot
>* itself is expected to remain unaffected. It is *not* expected to have
>* meaningful "system columns" in the copy. The copy is not be "owned" by
>* the slot i.e. the caller has to take responsibility to free memory
>* consumed by the slot.
>*/
>   HeapTuple (*copy_heap_tuple) (TupleTableSlot *slot);
> 
> But acquire_sample_rows() calls ExecCopySlotHeapTuple(), and then
> subsequently sorts the rows by TID. Is acquire_sample_rows() doing
> something it shouldn't, or is the comment mistaken?

I think the comment is too vague and is thinking of system columns as
xmin/xmax/cmin/cmax.

Greetings,

Andres Freund




Wrong comment in tuptable.h

2020-12-26 Thread Jeff Davis



  /*
   * Return a copy of heap tuple representing the contents of the slot. The
   * copy needs to be palloc'd in the current memory context. The slot
   * itself is expected to remain unaffected. It is *not* expected to have
   * meaningful "system columns" in the copy. The copy is not be "owned" by
   * the slot i.e. the caller has to take responsibility to free memory
   * consumed by the slot.
   */
  HeapTuple (*copy_heap_tuple) (TupleTableSlot *slot);

But acquire_sample_rows() calls ExecCopySlotHeapTuple(), and then
subsequently sorts the rows by TID. Is acquire_sample_rows() doing
something it shouldn't, or is the comment mistaken?

Regards,
Jeff Davis






Re: SQL/JSON: JSON_TABLE

2020-12-26 Thread Zhihong Yu
For new files introduced in the patches:

+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group

2021 is a few days away. It would be good to update the year range.

For transformJsonTableColumn:

+   jfexpr->op =
+   jtc->coltype == JTC_REGULAR ? IS_JSON_VALUE :
+   jtc->coltype == JTC_EXISTS ? IS_JSON_EXISTS : IS_JSON_QUERY;

Should IS_JSON_EXISTS be mentioned in the comment preceding the method ?

For JsonTableDestroyOpaque():

+   state->opaque = NULL;

Should the memory pointed to by opaque be freed ?

+   l2 = list_head(tf->coltypes);
+   l3 = list_head(tf->coltypmods);
+   l4 = list_head(tf->colvalexprs);

Maybe the ListCell pointer variables can be named to correspond to the
lists they iterate over, so that the code is easier to understand.

+typedef enum JsonTablePlanJoinType
+{
+   JSTP_INNER = 0x01,
+   JSTP_OUTER = 0x02,
+   JSTP_CROSS = 0x04,

Since the plan type enum values start with JSTP_, I think the plan join type
values should start with JSTPJ_ or JSTJ_. Otherwise the following would be a
bit hard to read:

+   else if (plan->plan_type == JSTP_JOINED)
+   {
+   if (plan->join_type == JSTP_INNER ||
+   plan->join_type == JSTP_OUTER)

since different fields are checked in the two if statements but the
prefixes don't convey that.

+  Even though the path names are not incuded into the PLAN
DEFAULT

Typo: incuded -> included

+   int nchilds = 0;

nchilds -> nchildren

+#if 0 /* XXX it' unclear from the standard whether root path name is
mandatory or not */
+   if (plan && plan->plan_type != JSTP_DEFAULT && !rootPathName)

Do you plan to drop the if block ?

Cheers

On Fri, Dec 25, 2020 at 12:32 PM Nikita Glukhov wrote:

>
> On 03.08.2020 10:55, Michael Paquier wrote:
>
> On Sun, Jul 05, 2020 at 12:15:58PM -0500, Justin Pryzby wrote:
>
> It looks like this needs to be additionally rebased - I will set cfbot to
> "Waiting".
>
> ...  Something that has not happened in four weeks, so this is marked
> as returned with feedback.  Please feel free to resubmit once a rebase
> is done.
> --
> Michael
>
>
> Attached 44th version of the patches rebased onto current master
> (#0001 corresponds to v51 of SQL/JSON patches).
>
>
> --
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


Re: Better client reporting for "immediate stop" shutdowns

2020-12-26 Thread Andres Freund
Hi,

On 2020-12-26 13:37:15 -0500, Tom Lane wrote:
> I suppose a compromise position could be to let the postmaster export its
> "Shutdown" state variable, and then let backends assume that SIGTERM means
> fast shutdown or pg_terminate_backend depending on the state of that one
> global variable.  But it'd be a bit imprecise so I don't really feel it's
> more useful than what we have.

Fair enough, I think.


> > I'd like to not log all these repeated messages into the server
> > log. It's quite annoying to have to dig through thousands of lines of
> > repeated "terminating connection..."
> 
> Hm.  That's an orthogonal issue, but certainly worth considering.
> There are a couple of levels we could consider:
> 
> 1. Just make the logged messages less verbose (they certainly don't
> need the DETAIL and HINT lines).
> 
> 2. Suppress the log entries altogether.
> 
> I would have been against #2 before this patch, because it'd mean
> that a rogue SIGQUIT leaves no clear trace in the log.  But with
> this patch, we can be fairly sure that we know whether SIGQUIT came
> from the postmaster, and then it might be all right to suppress the
> log entry altogether when it did.
> 
> I'd be happy to write up a patch for either of these, but let's
> decide what we want first.

My vote would be #2, with the same reasoning as yours.

Greetings,

Andres Freund




Re: SQL/JSON: functions

2020-12-26 Thread Zhihong Yu
Hi,
For ExecEvalJsonExprSubtrans(), if you check !subtrans first,

+   /* No need to use subtransactions. */
+   return func(op, econtext, res, resnull, p, error);

the early return would allow omitting the else keyword and left-indenting
the code in the current if block.

For ExecEvalJsonExpr()

+   *resnull = !DatumGetPointer(res);
+   if (error && *error)
+   return (Datum) 0;

Suppose *resnull is false and *error is true; (Datum) 0 would be returned
with *resnull still false. Should *resnull be made consistent with the
actual return value?

For ExecEvalJson() :

+   Assert(*op->resnull);
+   *op->resnull = true;

I am not sure of the purpose for the assignment since *op->resnull should
be true by the assertion.

For raw_expression_tree_walker :

+   if (walker(jfe->on_empty, context))
+   return true;

Should the if condition include jfe->on_empty prior to walking ?

nit: for contain_mutable_functions_walker, if !IsA(jexpr->path_spec, Const)
is checked first (with an early return), the current if block can be
left-indented.

For JsonPathDatatypeStatus,

+   jpdsDateTime,   /* unknown datetime type */

Should the enum be named jpdsUnknownDateTime so that its meaning is clear
to people reading the code ?

For get_json_behavior(), I wonder if mapping from behavior->btype to the
string form would shorten the body of the switch statement, e.g.:

static const char *map[] = {
  " NULL",
  " ERROR",
  " EMPTY",
...
};

Cheers

On Fri, Dec 25, 2020 at 5:19 PM Zhihong Yu  wrote:

> For 0001-Common-SQL-JSON-clauses-v51.patch :
>
> +   /*  | implementation_defined_JSON_representation_option (BSON, AVRO etc) */
>
> I don't find implementation_defined_JSON_representation_option in the
> patchset. Maybe rephrase the above as a comment
> without implementation_defined_JSON_representation_option ?
>
> For getJsonEncodingConst(), should the method error out for the default
> case of switch (encoding) ?
>
> 0002-SQL-JSON-constructors-v51.patch :
>
> +   Assert(!OidIsValid(collation)); /* result is always an json[b] type */
>
> an json -> a json
>
> +   /* XXX TEXTOID is default by standard */
> +   returning->typid = JSONOID;
>
> Comment doesn't seem to match the assignment.
>
> For json_object_agg_transfn :
>
> +   if (out->len > 2)
> +   appendStringInfoString(out, ", ");
>
> Why length needs to be at least 3 (instead of 2) ?
>
> Cheers
>
> On Fri, Dec 25, 2020 at 12:26 PM Nikita Glukhov wrote:
>
>> On 17.09.2020 08:41, Michael Paquier wrote:
>>
>> On Sat, Jul 18, 2020 at 09:24:11AM -0400, Andrew Dunstan wrote:
>>
>> I think patches 5 and 6 need to be submitted to the next commitfest.
>> This is far too much scope creep to be snuck in under the current CF item.
>>
>> I'll look at patches 1-4.
>>
>> Even with that, the patch set has been waiting on author for the last
>> six weeks, so I am marking it as RwF for now.  Please feel free to
>> resubmit.
>>
>> Attached 51st version of the patches rebased onto current master.
>>
>>
>> There were some shift/reduce conflicts in the SQL grammar that appeared
>> after the "expr AS keyword" refactoring in 06a7c3154f.  I'm not sure if I
>> resolved them correctly.  The JSON TEXT pseudotype, introduced in #0006,
>> caused a lot of grammar conflicts, so it was replaced with a simple
>> explicit pg_catalog.json.
>>
>> Also a new CoercionForm COERCE_SQL_SYNTAX was introduced, which resembles
>> the custom function formats that I used in earlier versions of the patches
>> for deparsing SQL/JSON constructor expressions based on raw json[b]
>> function calls.  These custom function formats were replaced in v43 with
>> dedicated executor nodes for SQL/JSON constructors.  So I'm not sure
>> whether it's worth trying to replace the nodes back with the new
>> COERCE_SQL_SYNTAX.
>>
>> --
>> Nikita Glukhov
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company
>>
>>


Re: Better client reporting for "immediate stop" shutdowns

2020-12-26 Thread Tom Lane
Andres Freund  writes:
> On 2020-12-21 16:43:33 -0500, Tom Lane wrote:
>> * I chose to report the same message for immediate shutdown as we
>> already use for SIGTERM (fast shutdown or pg_terminate_backend()).
>> Should it be different, and if so what?

[ per upthread, I did already change the SIGQUIT message to specify
"immediate shutdown" ]

> To do better I think we'd have to distinguish the different cases? An
> error message like
> "terminating connection due to {fast shutdown,immediate shutdown,connection 
> termination} administrator command"
> or such could be helpful, but I don't think your patch adds *quite*
> enough state?

Well, if you want to distinguish different causes for SIGTERM then
you'd need additional mechanism for that.  I think we'd have to have
a per-child termination-reason field, since SIGTERM might be sent to
just an individual backend rather than the whole flotilla at once.
I didn't think it was quite worth the trouble --- "administrator command"
seems close enough for both fast shutdown and pg_terminate_backend() ---
but you could certainly argue differently.

I suppose a compromise position could be to let the postmaster export its
"Shutdown" state variable, and then let backends assume that SIGTERM means
fast shutdown or pg_terminate_backend depending on the state of that one
global variable.  But it'd be a bit imprecise so I don't really feel it's
more useful than what we have.

> I'd like to not log all these repeated messages into the server
> log. It's quite annoying to have to dig through thousands of lines of
> repeated "terminating connection..."

Hm.  That's an orthogonal issue, but certainly worth considering.
There are a couple of levels we could consider:

1. Just make the logged messages less verbose (they certainly don't
need the DETAIL and HINT lines).

2. Suppress the log entries altogether.

I would have been against #2 before this patch, because it'd mean
that a rogue SIGQUIT leaves no clear trace in the log.  But with
this patch, we can be fairly sure that we know whether SIGQUIT came
from the postmaster, and then it might be all right to suppress the
log entry altogether when it did.

I'd be happy to write up a patch for either of these, but let's
decide what we want first.

regards, tom lane




Re: [HACKERS] [PATCH] Generic type subscripting

2020-12-26 Thread Tom Lane
Dmitry Dolgov <9erthali...@gmail.com> writes:
>> On Tue, Dec 22, 2020 at 02:21:22PM -0500, Tom Lane wrote:
>> We do have precedent for this, it's the rules about resolving argument
>> types for overloaded functions.  But the conclusion that that precedent
>> leads to is that we should check whether the subscript expression can
>> be *implicitly* coerced to either integer or text, and fail if neither
>> coercion or both coercions succeed.

> I'm not sure I completely follow and can't immediately find the relevant
> code for overloaded functions, so I need to do a perception check.
> Following this design, in jsonb_subscripting_transform we try to coerce
> the subscript expression to both integer and text (and maybe even to
> jsonpath), and based on which coercion has succeeded choose different
> logic to handle it, right?

Right, with the important proviso that the coercion strength is 
COERCION_IMPLICIT not COERCION_ASSIGNMENT.

> And just so I understand: in the above example of the overloaded function,
> with the integer we can coerce it only to text (since the original type of
> the expression is integer), and with the bigint it could be coerced both
> to integer and text, and that's why it fails, isn't it?

No, there's no such IMPLICIT-level casts.  Coercing bigint down to int
is only allowed at ASSIGNMENT or higher coercion strength.

In a case like jsonpath['...'], the initially UNKNOWN-type literal could
in theory be coerced to any of these types, so you'd have to resolve that
case manually.  The overloaded-function code has an internal preference
that makes it choose TEXT if it has a choice of TEXT or some other target
type for an UNKNOWN input (cf parse_func.c starting about line 1150), but
if you ask can_coerce_type() it's going to say TRUE for all three cases.

Roughly speaking, then, I think what you want to do is

1. If input type is UNKNOWNOID, choose result type TEXT.

2. Otherwise, apply can_coerce_type() to see if the input type can be
coerced to int4, text, or jsonpath.  If it succeeds for none or more
than one of these, throw error.  Otherwise choose the single successful
type.

3. Apply coerce_type() to coerce to the chosen result type.

4. At runtime, examine exprType() of the input to figure out what to do.

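The four steps above can be sketched outside the backend. The following toy
in Python is purely illustrative: the cast table stands in for
can_coerce_type() at COERCION_IMPLICIT strength, and the function name is
invented, not parser API:

```python
# Hedged sketch (not backend code) of the resolution procedure above.
# IMPLICIT_CASTS is an invented, minimal stand-in for PostgreSQL's
# implicit-cast matrix: which source types coerce to each candidate.
IMPLICIT_CASTS = {
    "int4": {"int4", "int2"},      # smallint promotes implicitly; bigint does not
    "text": {"text", "varchar"},
    "jsonpath": {"jsonpath"},
}

def resolve_subscript_type(input_type):
    # Step 1: an UNKNOWN literal resolves to text.
    if input_type == "unknown":
        return "text"
    # Step 2: exactly one implicit coercion must succeed, else error.
    candidates = [target for target, sources in IMPLICIT_CASTS.items()
                  if input_type in sources]
    if len(candidates) != 1:
        raise TypeError("cannot determine subscript type for %s" % input_type)
    # Steps 3 and 4 (coerce_type() and runtime dispatch on exprType())
    # happen in the backend and are not modeled here.
    return candidates[0]
```

Note that bigint raises here, mirroring the point above that bigint-to-int
is only allowed at ASSIGNMENT strength or higher.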
regards, tom lane




Re: pglz compression performance, take two

2020-12-26 Thread Tom Lane
Tomas Vondra  writes:
> On 12/26/20 8:06 AM, Andrey Borodin wrote:
>> I'm still in doubt whether I should register this patch in the CF or not.
>> I'm willing to work on this, but it's not clear whether it will hit PGv14.
>> And I hope for PGv15 we will have lz4 or something better for WAL
>> compression.

> I'd suggest registering it, otherwise people are much less likely to 
> give you feedback. I don't see why it couldn't land in PG14.

Even if lz4 or something else shows up, the existing code will remain
important for TOAST purposes.  It would be years before we lose interest
in it.

regards, tom lane




Re: Rethinking plpgsql's assignment implementation

2020-12-26 Thread Pavel Stehule
Hi

I repeated the tests. I wrote a set of simple functions. It is a synthetic
test, but I think it can identify potential problems well.

I calculated the average of 3 cycles and checked the performance of each
function. I didn't find any problem. The total execution time is good as
well. The patched code is about 11% faster than master (14 sec vs 15.8 sec).
So there is important new functionality with nice performance benefits.

make check-world passed

Regards

Pavel


plpgsql-perftest.sql
Description: application/sql


Re: pg_stat_statements and "IN" conditions

2020-12-26 Thread Zhihong Yu
Hi,
A few comments.

+   "After this number of duplicating constants start to merge them.",

duplicating -> duplicate

+   foreach(lc, (List *) expr)
+   {
+   Node * subExpr = (Node *) lfirst(lc);
+
+   if (!IsA(subExpr, Const))
+   {
+   allConst = false;
+   break;
+   }
+   }

It seems the above foreach loop (within foreach(temp, (List *) node)) could
be preceded by a check that allConst is still true; if it is already false,
the loop can be skipped.

+   if (currentExprIdx == pgss_merge_threshold - 1)
+   {
+   JumbleExpr(jstate, expr);
+
+   /*
+* A const expr is already found, so JumbleExpr must
+* record it. Mark it as merged, it will be the first
+* merged but still present in the statement query.
+*/
+   Assert(jstate->clocations_count > 0);
+   jstate->clocations[jstate->clocations_count - 1].merged = true;
+   currentExprIdx++;
+   }

The above snippet occurs a few times. Maybe extract into a helper method.

Cheers

On Sat, Dec 26, 2020 at 2:45 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Wed, Nov 18, 2020 at 05:04:32PM +0100, Dmitry Dolgov wrote:
> > > On Wed, Aug 12, 2020 at 06:19:02PM +0200, Dmitry Dolgov wrote:
> > >
> > > I would like to start another thread to follow up on [1], mostly to bump
> > > up the topic. Just to remind, it's about how pg_stat_statements jumbles
> > > ArrayExpr in queries like:
> > >
> > > SELECT something FROM table WHERE col IN (1, 2, 3, ...)
> > >
> > > The current implementation produces a different jumble hash for every
> > > different number of arguments for essentially the same query.
> > > Unfortunately a lot of ORMs like to generate these types of queries,
> > > which in turn leads to pg_stat_statements pollution. Ideally we want to
> > > prevent this and have only one record for such a query.
> > >
> > > As the result of [1] I've identified two highlighted approaches to
> > > improve this situation:
> > >
> > > * Reduce the generated ArrayExpr to an array Const immediately, in
> > >   cases where all the inputs are Consts.
> > >
> > > * Make repeating Consts contribute nothing to the resulting hash.
> > >
> > > I've tried to prototype both approaches to find out pros/cons and be
> > > more specific. Attached patches could not be considered a completed
> > > piece of work, but they seem to work, mostly pass the tests and
> > > demonstrate the point. I would like to get some high level input about
> > > them and ideally make it clear what is the preferred solution to
> > > continue with.
> >
> > I've implemented the second approach mentioned above, this version was
> > tested on our test clusters for some time without visible issues. Will
> > create a CF item and would appreciate any feedback.
>
> After more testing I found a couple of things that could be improved:
> in the presence of non-reducible constants one part of the query was not
> copied into the normalized version, and this approach could also be
> extended to Params. These are incorporated in the attached patch.
>
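The merging idea discussed in this thread can be illustrated with a hedged
string-level toy; the real patch operates on the parse tree inside
pg_stat_statements, and the threshold name below is an invented stand-in
for the merge-threshold setting:

```python
import re

# Toy model only: collapse an all-constant IN list into numbered
# parameters, merging everything past the threshold into "...".
MERGE_THRESHOLD = 4  # invented stand-in for the merge-threshold GUC

def normalize_in_list(query):
    def repl(match):
        elems = [e.strip() for e in match.group(1).split(",")]
        # Merge only when every element is a plain literal constant.
        if not all(re.fullmatch(r"-?\d+|'[^']*'", e) for e in elems):
            return match.group(0)
        params = ["$%d" % (i + 1) for i in range(len(elems))]
        if len(params) > MERGE_THRESHOLD:
            params = params[:MERGE_THRESHOLD] + ["..."]
        return "IN (" + ", ".join(params) + ")"
    return re.sub(r"IN \(([^)]*)\)", repl, query)
```

With this, queries differing only in the number of IN-list constants
normalize to the same text, which is what keeps them under one
pg_stat_statements entry.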


Re: Parallel Inserts in CREATE TABLE AS

2020-12-26 Thread vignesh C
On Thu, Dec 24, 2020 at 1:07 PM Bharath Rupireddy wrote:
>
> On Thu, Dec 24, 2020 at 10:25 AM vignesh C  wrote:
> > You could change intoclause_len = strlen(intoclausestr) to
> > strlen(intoclausestr) + 1 and use intoclause_len in the remaining
> > places. We can avoid the +1 in the other places.
> > +   /* Estimate space for into clause for CTAS. */
> > +   if (IS_CTAS(intoclause) && OidIsValid(objectid))
> > +   {
> > +   intoclausestr = nodeToString(intoclause);
> > +   intoclause_len = strlen(intoclausestr);
> > +   shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
> > +   shm_toc_estimate_keys(&pcxt->estimator, 1);
> > +   }
>
> Done.
>
> > Can we use  node->nworkers_launched == 0 in place of
> > node->need_to_scan_locally, that way the setting and resetting of
> > node->need_to_scan_locally can be removed. Unless need_to_scan_locally
> > is needed in any of the functions that gets called.
> > +   /* Enable leader to insert in case no parallel workers were launched. */
> > +   if (node->nworkers_launched == 0)
> > +   node->need_to_scan_locally = true;
> > +
> > +   /*
> > +* By now, for parallel workers (if launched any), would have started
> > +* their work i.e. insertion to target table. In case the leader is
> > +* chosen to participate for parallel inserts in CTAS, then finish
> > +* its share before going to wait for the parallel workers to finish.
> > +*/
> > +   if (node->need_to_scan_locally)
> > +   {
>
> need_to_scan_locally is being set in ExecGather() even if
> nworkers_launched > 0 it can still be true, so I think we can not
> remove need_to_scan_locally in ExecParallelInsertInCTAS.
>
> Attaching v15 patch set for further review. Note that the change is
> only in 0001 patch, other patches remain unchanged from v14.
>

+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;

We can change the comment "parallel inserts must occur" to something like
"parallel insert must be selected for CTAS on a normal table".

+-- parallel inserts must occur
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;

We can change the comment "parallel inserts must occur" to something like
"parallel insert must be selected for CTAS on an unlogged table".
Similar comments need to be handled in other places also.

+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ln text;
+begin
+for ln in
+execute format('explain (analyze, costs off, summary off, timing off) %s',
+$1)
+loop
+ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+return next ln;
+end loop;
+end;
+$$;

The above function is the same as the function present in partition_prune.sql:

create function explain_parallel_append(text) returns setof text
language plpgsql as
$$
declare
ln text;
begin
for ln in
execute format('explain (analyze, costs off, summary off, timing off) %s',
$1)
loop
ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
return next ln;
end loop;
end;
$$;

If possible, try to make a common function for both and use it in both places.
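The shared core of the two helpers is just three regexp substitutions over
EXPLAIN output. A quick Python sketch of that common logic (illustrative
only, not the regress-test code itself):

```python
import re

# The three substitutions both explain_pictas() and
# explain_parallel_append() apply to make EXPLAIN ANALYZE output
# stable across runs and worker counts.
SUBSTITUTIONS = [
    (r"Workers Launched: \d+", "Workers Launched: N"),
    (r"actual rows=\d+ loops=\d+", "actual rows=N loops=N"),
    (r"Rows Removed by Filter: \d+", "Rows Removed by Filter: N"),
]

def normalize_explain_line(line):
    for pattern, replacement in SUBSTITUTIONS:
        line = re.sub(pattern, replacement, line)
    return line
```

In the regress suite the deduplication would simply mean keeping one
plpgsql function with these replacements and calling it from both tests.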

+   if (intoclausestr && OidIsValid(objectid))
+   fpes->objectid = objectid;
+   else
+   fpes->objectid = InvalidOid;
Here the OidIsValid(objectid) check is not required; intoclausestr will be
set only if the OID is valid.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-12-26 Thread vignesh C
On Wed, Dec 23, 2020 at 3:05 PM Hou, Zhijie  wrote:
>
> Hi
>
> > Yes this optimization can be done, I will handle this in the next patch
> > set.
> >
>
> I have a suggestion for the parallel safety-check.
>
> As designed, the leader does not participate in the insertion of data.
> If the user uses (PARALLEL 1), there is only one worker process, which will
> do the insertion.
>
> IMO, we can skip some of the safety-checks in this case, because the
> safety-check is there to limit parallel insert.
> (except temporary table or ...)
>
> So, how about checking (PARALLEL 1) separately?
> Although it looks a bit complicated, (PARALLEL 1) does have a good
> performance improvement.
>

Thanks for the comments Hou Zhijie, I will run a few tests with 1
worker and try to include this in the next patch set.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Added missing copy related data structures to typedefs.list

2020-12-26 Thread vignesh C
On Thu, Dec 17, 2020 at 4:28 AM Bruce Momjian  wrote:
>
> On Mon, Dec  7, 2020 at 01:56:50PM +0530, vignesh C wrote:
> > Hi,
> >
> > Added missing copy related data structures to typedefs.list, these
> > data structures were added while copy files were split during the
> > recent commit. I found this while running pgindent for parallel copy
> > patches.
> > The Attached patch has the changes for the same.
> > Thoughts?
>
> Uh, we usually only update the typedefs file before we run pgindent on
> the master branch.
>

Ok, thanks for the clarification. I was not sure, as in a few of the
enhancements it was included as part of the patches.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: pglz compression performance, take two

2020-12-26 Thread Tomas Vondra




On 12/26/20 8:06 AM, Andrey Borodin wrote:
> > On 12 Dec 2020, at 22:47, Andrey Borodin wrote:
>
> I've cleaned up comments, checked that the memory alignment stuff actually
> makes sense for 32-bit ARM (according to Godbolt) and did some more code
> cleanup. PFA v2 patch.
>
> I'm still in doubt whether I should register this patch in the CF or not.
> I'm willing to work on this, but it's not clear whether it will hit PGv14.
> And I hope for PGv15 we will have lz4 or something better for WAL
> compression.

I'd suggest registering it, otherwise people are much less likely to
give you feedback. I don't see why it couldn't land in PG14.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pgsql: Add key management system

2020-12-26 Thread Bruce Momjian
On Sat, Dec 26, 2020 at 02:38:40PM +, Shinoda, Noriyoshi (PN Japan FSIP) wrote:
> Hi,
> 
> Thank you for developing a great feature. I tested it immediately.
> The attached patch makes a value mandatory for the --file-encryption-keylen
> option of the initdb command; currently no value is required.
> I also changed the help message and the manual.

Thank you, applied.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





RE: pgsql: Add key management system

2020-12-26 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Hi,

Thank you for developing a great feature. I tested it immediately.
The attached patch makes a value mandatory for the --file-encryption-keylen
option of the initdb command; currently no value is required.
I also changed the help message and the manual.

Regards,
Noriyoshi Shinoda

-Original Message-
From: Bruce Momjian [mailto:br...@momjian.us] 
Sent: Saturday, December 26, 2020 4:01 AM
To: Erik Rijkers 
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: pgsql: Add key management system

On Fri, Dec 25, 2020 at 07:30:01PM +0100, Erik Rijkers wrote:
> On 2020-12-25 16:19, Bruce Momjian wrote:
> 
> > Add key management system
> > doc/src/sgml/database-encryption.sgml |  97 +
> 
> Attached are a few typos.
> 
> I also noticed that this option does not occur in the initdb --help:
> 
>   -u  --copy-encryption-keys
> 
> Was that deliberate?

No.  :-(  Attached patch applied.  Thanks.

-- 
  Bruce Momjian  https://momjian.us 
  EnterpriseDB https://enterprisedb.com 

  The usefulness of a cup is in its emptiness, Bruce Lee



keylength.diff
Description: keylength.diff


Re: pg_stat_statements and "IN" conditions

2020-12-26 Thread Dmitry Dolgov
> On Wed, Nov 18, 2020 at 05:04:32PM +0100, Dmitry Dolgov wrote:
> > On Wed, Aug 12, 2020 at 06:19:02PM +0200, Dmitry Dolgov wrote:
> >
> > I would like to start another thread to follow up on [1], mostly to bump
> > up the topic. Just to remind, it's about how pg_stat_statements jumbles
> > ArrayExpr in queries like:
> >
> > SELECT something FROM table WHERE col IN (1, 2, 3, ...)
> >
> > The current implementation produces a different jumble hash for every
> > different number of arguments for essentially the same query.
> > Unfortunately a lot of ORMs like to generate these types of queries,
> > which in turn leads to pg_stat_statements pollution. Ideally we want to
> > prevent this and have only one record for such a query.
> >
> > As the result of [1] I've identified two highlighted approaches to
> > improve this situation:
> >
> > * Reduce the generated ArrayExpr to an array Const immediately, in
> >   cases where all the inputs are Consts.
> >
> > * Make repeating Consts contribute nothing to the resulting hash.
> >
> > I've tried to prototype both approaches to find out pros/cons and be
> > more specific. Attached patches could not be considered a completed
> > piece of work, but they seem to work, mostly pass the tests and
> > demonstrate the point. I would like to get some high level input about
> > them and ideally make it clear what is the preferred solution to
> > continue with.
>
> I've implemented the second approach mentioned above, this version was
> tested on our test clusters for some time without visible issues. Will
> create a CF item and would appreciate any feedback.

After more testing I found a couple of things that could be improved:
in the presence of non-reducible constants one part of the query was not
copied into the normalized version, and this approach could also be
extended to Params. These are incorporated in the attached patch.
From a93824799eda63391989e8845393f0b773508e18 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Tue, 17 Nov 2020 16:18:08 +0100
Subject: [PATCH v2] Prevent jumbling of every element in ArrayExpr

pg_stat_statements produces multiple entries for queries like

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on number of parameters, because every element of ArrayExpr is
jumbled. Make Consts contribute nothing to the jumble hash if they're
part of a series and at position further that specified threshold. Do
the same for similar queries with VALUES as well.
---
 .../expected/pg_stat_statements.out   | 657 +-
 .../pg_stat_statements/pg_stat_statements.c   | 262 ++-
 .../sql/pg_stat_statements.sql| 129 
 3 files changed, 1034 insertions(+), 14 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 2a303a7f07..6978e37ca7 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -205,7 +205,7 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
  INSERT INTO test VALUES(generate_series($1, $2), $3) | 1 |   10
  SELECT * FROM test ORDER BY a| 1 |   12
  SELECT * FROM test WHERE a > $1 ORDER BY a   | 2 |4
- SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5)   | 1 |8
+ SELECT * FROM test WHERE a IN ($1, $2, $3, $4, ...)  | 1 |8
  SELECT pg_stat_statements_reset()| 1 |1
  SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 |0
  UPDATE test SET b = $1 WHERE a = $2  | 6 |6
@@ -861,4 +861,659 @@ SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE
  SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" | 1 | 0 |0
 (6 rows)
 
+--
+-- Consts merging
+--
+SET pg_stat_statements.merge_threshold = 5;
+CREATE TABLE test_merge (id int, data int);
+-- IN queries
+-- Normal
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3);
+ id | data 
++--
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query  | calls 
++---
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3)  | 1
+ SELECT pg_stat_statements_reset()  | 1
+ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0