Re: WIP: Aggregation push-down

2020-02-02 Thread Antonin Houska
legrand legrand  wrote:

> set enable_agg_pushdown to true;
> isn't displayed in EXPLAIN (SETTINGS) syntax.
> 
> 
> The following modification seems to fix that: 
> 
> src/backend/utils/misc/guc.c
>  
>   {"enable_agg_pushdown", PGC_USERSET, QUERY_TUNING_METHOD,
>   gettext_noop("Enables aggregation push-down."),
>   NULL,
>   GUC_EXPLAIN   --- line Added ---
>   },
>   _agg_pushdown,
>   false,
>   NULL, NULL, NULL
>   },

Thanks. I'll include this change in the next version.
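For reference, once the GUC is marked with GUC_EXPLAIN, a non-default
setting should show up in EXPLAIN (SETTINGS) output roughly like this
(illustrative only: enable_agg_pushdown exists only with the WIP patch
applied, and plan details/costs are omitted):

    SET enable_agg_pushdown = on;
    EXPLAIN (SETTINGS) SELECT count(*) FROM pg_class;
    --  Aggregate  (cost=...)
    --    ->  Seq Scan on pg_class  (cost=...)
    --  Settings: enable_agg_pushdown = 'on'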

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-02-02 Thread Kasahara Tatsuhito
On Mon, Feb 3, 2020 at 4:22 PM Fujii Masao  wrote:
> On 2020/02/01 16:05, Kasahara Tatsuhito wrote:
> >  if (scan->rs_base.rs_flags & (SO_TYPE_SEQSCAN | SO_TYPE_SAMPLESCAN))
> >  {
> >  /*
> >   * Ensure a missing snapshot is noticed reliably, even if the
> >   * isolation mode means predicate locking isn't performed (and
> >   * therefore the snapshot isn't used here).
> >   */
> >  Assert(snapshot);
> >  PredicateLockRelation(relation, snapshot);
> >  }
> >
> > Therefore, we cannot simply remove the SO_TYPE_SEQSCAN flag from a TID
> > scan.
> > To keep the old behavior, I think it would be better to add a new
> > SO_TYPE_TIDSCAN flag and take a predicate lock on the entire relation.
>
> But in the old behavior, PredicateLockRelation() was not called in TidScan 
> case
> because its flag was not SO_TYPE_SEQSCAN. No?
No. Tid scan has called PredicateLockRelation() in both the previous and the
current code.

In the current code (v12 and HEAD), a Tid scan has the SO_TYPE_SEQSCAN flag,
so PredicateLockRelation() is called for Tid scans.
In the previous code (v11 and earlier), heap_beginscan_internal() checks the
is_bitmapscan flag, and if is_bitmapscan is false, it calls
PredicateLockRelation().

(- v11)
if (!is_bitmapscan)
PredicateLockRelation(relation, snapshot);

And in a Tid scan, is_bitmapscan is set to false, so that
PredicateLockRelation() is called for Tid scans.
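A minimal sketch of that idea (the flag name and its bit position are
assumptions, not code from an actual patch; the real ScanOptions enum lives
in src/include/access/tableam.h):

    /* new ScanOptions bit, set by nodeTidscan.c for TID scans */
    SO_TYPE_TIDSCAN = 1 << 3,

    /*
     * heap_beginscan(): keep taking the relation-level predicate lock for
     * TID scans, preserving the pre-v12 behavior described above.
     */
    if (scan->rs_base.rs_flags &
        (SO_TYPE_SEQSCAN | SO_TYPE_SAMPLESCAN | SO_TYPE_TIDSCAN))
    {
        Assert(snapshot);
        PredicateLockRelation(relation, snapshot);
    }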

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-02-02 Thread Amit Langote
On Mon, Feb 3, 2020 at 4:28 PM Amit Langote  wrote:
> If we are trying to "pg_stop_backup" in phase name, maybe we should
> avoid "pg_start_backup"?  Then maybe:

Sorry, I meant to write:

If we are trying to avoid "pg_stop_backup" in phase name, maybe we
should avoid "pg_start_backup"?  Then maybe:

Thanks,
Amit




Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-02-02 Thread Amit Langote
On Mon, Feb 3, 2020 at 1:17 PM Fujii Masao  wrote:
> On 2020/02/02 14:59, Masahiko Sawada wrote:
> > On Fri, 31 Jan 2020 at 02:29, Fujii Masao  
> > wrote:
> >> On 2020/01/30 12:58, Kyotaro Horiguchi wrote:
> >>> +WHEN 3 THEN 'stopping backup'::text
> >>>
> >>> I'm not sure, but the "stop" seems to suggest the backup is terminated
> >>> before completion. If it is following the name of the function
> >>> pg_stop_backup, I think the name is suggesting to stop "the state for
> >>> performing backup", not a backup.
> >>>
> >>> In the first place, the progress is about "backup" so it seems strange
> >>> that we have another phase after the "stopping backup" phase.  It
> >>> might be better that it is "finishing file transfer" or such.
> >>>
> >>>  "initializing"
> >>> -> "starting file transfer"
> >>> -> "transferring files"
> >>> -> "finishing file transfer"
> >>> -> "transaferring WAL"
> >>
> >> A better name is always welcome! If "stopping backup" is confusing,
> >> what about "performing pg_stop_backup"? So
> >>
> >> initializing
> >> performing pg_start_backup
> >> streaming database files
> >> performing pg_stop_backup
> >> transferring WAL files
> >
> > Another idea I came up with is to show steps that take time instead of
> > pg_start_backup/pg_stop_backup, for better understanding the
> > situation. That is, "performing checkpoint" and "performing WAL
> > archive" etc, which engage the most of the time of these functions.
>
> Yeah, that's an idea. ISTM that "waiting for WAL archiving" sounds
> better than "performing WAL archive". Thought?
> I've not applied this change in the patch yet, but if there is no
> other idea, I'd like to adopt this.

If we are trying to "pg_stop_backup" in phase name, maybe we should
avoid "pg_start_backup"?  Then maybe:

initializing
starting backup / waiting for [ backup start ] checkpoint to finish
transferring database files
finishing backup / waiting for WAL archiving to finish
transferring WAL files

?

Some comments on documentation changes in v2 patch:

+  Amount of data already streamed.

"already" may be redundant.  For example, in pg_start_progress_vacuum,
heap_blks_scanned is described as "...blocks scanned", not "...blocks
already scanned".

+ tablespace_total
+ tablespace_streamed

Better to use plural tablespaces_total and tablespaces_streamed for consistency?

+   The WAL sender process is currently performing
+   pg_start_backup and setting up for
+   making a base backup.

How about "taking" instead of "making" in the above sentence?

-  
+  

I don't see any new text in the documentation patch that uses above
xref, so no need to define it?

Thanks,
Amit




Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-02-02 Thread Fujii Masao




On 2020/02/01 16:05, Kasahara Tatsuhito wrote:

On Thu, Jan 30, 2020 at 1:49 PM Kyotaro Horiguchi
 wrote:

At Thu, 30 Jan 2020 13:30:56 +0900, Kasahara Tatsuhito 
 wrote in

TID scan   : yes , seq_scan, , 

This is wrong, because TID scan came to have the SO_TYPE_SEQSCAN flag
as of commit 147e3722f7.


It reflects the discussion below, which means TID scan doesn't
have a corresponding SO_TYPE_ value. Currently it sets
SO_TYPE_SEQSCAN by accident.

Ah, sorry I misunderstood..

Upon further investigation, the SO_TYPE_SEQSCAN flag is also used in
heap_beginscan() to determine whether a predicate lock is taken on
the entire relation.

 if (scan->rs_base.rs_flags & (SO_TYPE_SEQSCAN | SO_TYPE_SAMPLESCAN))
 {
 /*
  * Ensure a missing snapshot is noticed reliably, even if the
  * isolation mode means predicate locking isn't performed (and
  * therefore the snapshot isn't used here).
  */
 Assert(snapshot);
 PredicateLockRelation(relation, snapshot);
 }

Therefore, we cannot simply remove the SO_TYPE_SEQSCAN flag from a TID scan.
To keep the old behavior, I think it would be better to add a new
SO_TYPE_TIDSCAN flag and take a predicate lock on the entire relation.


But in the old behavior, PredicateLockRelation() was not called in the TidScan
case because its flag was not SO_TYPE_SEQSCAN. No?

Regards,


--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Add %x to PROMPT1 and PROMPT2

2020-02-02 Thread Michael Paquier
On Wed, Jan 29, 2020 at 11:51:10PM +0100, Vik Fearing wrote:
> Thanks for the review!
> 
> Would you mind changing the status in the commitfest app?
> https://commitfest.postgresql.org/27/2427/

FWIW, I am not really in favor of changing a default old enough that
it could vote (a45195a).
--
Michael




Re: Proposal: Add more compile-time asserts to expose inconsistencies.

2020-02-02 Thread Michael Paquier
On Fri, Jan 31, 2020 at 02:46:16PM +0900, Michael Paquier wrote:
> Actually, thinking more about it, I'd rather remove this part as well,
> keeping only the change in the third paragraph of this comment block.

I have tweaked the comments in this area a bit, and applied the
patch.  I'll begin a new thread with the rest of the refactoring.
There are a couple of things I'd like to double-check first.
--
Michael




Re: table partitioning and access privileges

2020-02-02 Thread Amit Langote
On Mon, Feb 3, 2020 at 2:07 PM Fujii Masao  wrote:
> On 2020/02/03 11:05, Amit Langote wrote:
> > Okay.  How about the attached?
>
> Thanks for the patches! You added the note just after the description
> about row level security on inherited table, but isn't it better to
> add it before that? Attached patch does that. Thought?

Yeah, that might be a better flow for that paragraph.

> > Maybe, we should also note the LOCK TABLE exception?
>
> Yes.

Note that, unlike the TRUNCATE exception, the LOCK TABLE exception exists in
HEAD too, but maybe you're aware of that.

Thanks,
Amit




Re: table partitioning and access privileges

2020-02-02 Thread Fujii Masao



On 2020/02/03 11:05, Amit Langote wrote:

On Fri, Jan 31, 2020 at 9:39 PM Fujii Masao  wrote:

On 2020/01/31 13:38, Amit Langote wrote:

On Fri, Jan 31, 2020 at 1:28 AM Fujii Masao  wrote:

Fair enough. I finally did back-patch because the behavior is clearly
documented and I didn't hear any opinions objecting to the back-patch.
But I should have solicited and discussed such risks more.

I'm OK with reverting all those back-patches. Instead, the documentation
should probably be updated in the old branches.


I could find only this paragraph in the section on inheritance that
talks about how access permissions work:

9.4:

"Note how table access permissions are handled. Querying a parent
table can automatically access data in child tables without further
access privilege checking. This preserves the appearance that the data
is (also) in the parent table. Accessing the child tables directly is,
however, not automatically allowed and would require further
privileges to be granted."

9.5-12:

"Inherited queries perform access permission checks on the parent
table only. Thus, for example, granting UPDATE permission on the
cities table implies permission to update rows in the capitals table
as well, when they are accessed through cities. This preserves the
appearance that the data is (also) in the parent table. But the
capitals table could not be updated directly without an additional
grant. In a similar way, the parent table's row security policies (see
Section 5.7) are applied to rows coming from child tables during an
inherited query. A child table's policies, if any, are applied only
when it is the table explicitly named in the query; and in that case,
any policies attached to its parent(s) are ignored."

Do you mean that the TRUNCATE exception should be noted here?


Yes, that's what I was thinking.


Okay.  How about the attached?


Thanks for the patches! You added the note just after the description
about row level security on inherited table, but isn't it better to
add it before that? Attached patch does that. Thought?


Maybe, we should also note the LOCK TABLE exception?


Yes.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index d88651df9e..7550d03f27 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3384,7 +3384,15 @@ VALUES ('Albany', NULL, NULL, 'NY');
accessed through cities.  This preserves the 
appearance
that the data is (also) in the parent table.  But
the capitals table could not be updated directly
-   without an additional grant.  In a similar way, the parent table's row
+   without an additional grant.  Two exceptions to this rule are
+   TRUNCATE and LOCK TABLE,
+   where permissions on the child tables are always checked,
+   whether they are processed directly or recursively via those commands
+   performed on the parent table.
+  
+
+  
+   In a similar way, the parent table's row
security policies (see ) are applied to
rows coming from child tables during an inherited query.  A child table's
policies, if any, are applied only when it is the table explicitly named


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-02-02 Thread Amit Kapila
On Fri, Jan 10, 2020 at 10:53 AM Dilip Kumar  wrote:
>
> On Mon, Dec 30, 2019 at 3:43 PM Amit Kapila  wrote:
> >
> >
> > > 2. During commit time in DecodeCommit we check whether we need to skip
> > > the changes of the transaction or not by calling
> > > SnapBuildXactNeedsSkip but since now we support streaming so it's
> > > possible that before we decode the commit WAL, we might have already
> > > sent the changes to the output plugin even though we could have
> > > skipped those changes.  So my question is instead of checking at the
> > > commit time can't we check before adding to ReorderBuffer itself
> > >
> >
> > I think if we can do that then the same will be true for current code
> > irrespective of this patch.  I think it is possible that we can't take
> > that decision while decoding because we haven't assembled a consistent
> > snapshot yet.  I think we might be able to do that while we try to
> > stream the changes.  I think we need to take care of all the
> > conditions during streaming (when the logical_decoding_workmem limit
> > is reached) as we do in DecodeCommit.  This needs a bit more study.
>
> I have analyzed this further and I think we cannot decide all the
> conditions even while streaming.  IMHO, once we reach
> SNAPBUILD_FULL_SNAPSHOT we can add the changes to the reorder buffer,
> so that we can decode them if we get the commit of the transaction
> after we reach SNAPBUILD_CONSISTENT.  However, if we get the commit
> before we reach SNAPBUILD_CONSISTENT then we need to ignore this
> transaction.  Now, even if we have SNAPBUILD_FULL_SNAPSHOT we can
> stream changes which might get dropped later, but we cannot decide
> that while streaming.
>

This makes sense to me, but when we are streaming we should add a comment
saying why we can't skip, similar to what we do at commit time, because of
the reason you describe above.  Also, what about the other conditions under
which we can skip the transaction, basically cases like (a) when the
transaction happened in another database, (b) when the output plugin is not
interested in the origin, and (c) when we are doing fast-forwarding?
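For reference, these are essentially the checks DecodeCommit already applies
at commit time (paraphrased from src/backend/replication/logical/decode.c;
not guaranteed to match the source line for line):

    if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
        (parsed->dbId != InvalidOid &&
         parsed->dbId != ctx->slot->data.database) ||
        ctx->fast_forward ||
        FilterByOrigin(ctx, origin_id))
    {
        /* forget the transaction's changes instead of replaying them */
    }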

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-02-02 Thread Fujii Masao



On 2020/02/02 14:59, Masahiko Sawada wrote:

On Fri, 31 Jan 2020 at 02:29, Fujii Masao  wrote:




On 2020/01/30 12:58, Kyotaro Horiguchi wrote:

At Wed, 29 Jan 2020 23:16:08 +0900, Fujii Masao  
wrote in

Hi,

pg_basebackup reports the backup progress if the --progress option is
specified, and we can monitor it on the client side. I think it would be
useful if we could monitor the progress information also on the server side
because, for example, we could then easily check it by using SQL. So I'd
like to propose a pg_stat_progress_basebackup view that allows us to
monitor the progress of pg_basebackup on the server side. Thought?

POC patch is attached.


| postgres=# \d pg_stat_progress_basebackup
|        View "pg_catalog.pg_stat_progress_basebackup"
|        Column        |  Type   | Collation | Nullable | Default
| ---------------------+---------+-----------+----------+---------
|  pid                 | integer |           |          |
|  phase               | text    |           |          |
|  backup_total        | bigint  |           |          |
|  backup_streamed     | bigint  |           |          |
|  tablespace_total    | bigint  |           |          |
|  tablespace_streamed | bigint  |           |          |

I think the view needs client identity, such as a host/port pair, besides
pid (though a host/port pair fails to identify the client in the case of
unix sockets).


I don't think this is the job of progress reporting. If that information
is necessary, we can join this view with pg_stat_replication using the
pid column as the join key.
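For example, something like this against the proposed view (the progress
columns come from the POC patch quoted above):

    SELECT bb.pid, r.client_addr, r.client_port,
           bb.phase, bb.backup_streamed, bb.backup_total
    FROM pg_stat_progress_basebackup bb
    JOIN pg_stat_replication r USING (pid);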


Also elapsed time from session start might be
useful.


+1
I think that not only pg_stat_progress_basebackup but also all the other
progress views should report the time when the target command was
started and the time when the phase was last changed. Those times
would be useful to estimate the remaining execution time from the
progress information.


pg_stat_progress_vacuum has datid, datname and relid.

+ if (backup_total > 0 && backup_streamed > backup_total)
+ {
+ backup_total = backup_streamed;

Do we need the condition "backup_total > 0"?


I added the condition for the case where the --progress option is not
supplied to pg_basebackup, i.e., the case where the total amount of the
backup is not estimated at the beginning of the backup. In this case,
backup_total is always 0.


+ int tblspc_streamed = 0;
+
+ pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
+  
PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP);

This starts "streaming backup" phase with backup_total = 0. Coudln't
we move to that phase after setting backup total and tablespace total?
That is, just after calling SendBackupHeader().


OK, that's a bit less confusing for users. I will change in that way.


Fixed. Attached is the updated version of the patch.
I also fixed the regression test failure.




+WHEN 3 THEN 'stopping backup'::text

I'm not sure, but the "stop" seems suggesting the backup is terminated
before completion. If it is following the name of the function
pg_stop_backup, I think the name is suggesting to stop "the state for
performing backup", not a backup.

In the first place, the progress is about "backup" so it seems strange
that we have another phase after the "stopping backup" phase.  It
might be better that it is "finishing file transfer" or such.

 "initializing"
-> "starting file transfer"
-> "transferring files"
-> "finishing file transfer"
-> "transaferring WAL"


A better name is always welcome! If "stopping backup" is confusing,
what about "performing pg_stop_backup"? So

initializing
performing pg_start_backup
streaming database files
performing pg_stop_backup
transferring WAL files


Another idea I came up with is to show steps that take time instead of
pg_start_backup/pg_stop_backup, for better understanding the
situation. That is, "performing checkpoint" and "performing WAL
archive" etc, which engage the most of the time of these functions.


Yeah, that's an idea. ISTM that "waiting for WAL archiving" sounds
better than "performing WAL archive". Thought?
I've not applied this change in the patch yet, but if there is no
other idea, I'd like to adopt this.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8839699079..136fcbc2af 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -376,6 +376,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
   
  
 
+ 
+  
pg_stat_progress_basebackuppg_stat_progress_basebackup
+  One row for each WAL sender process streaming a base backup,
+   showing current progress.
+   See .
+  
+ 
+
 

   
@@ -3515,7 +3523,10 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,

Re: Autovacuum on partitioned table

2020-02-02 Thread Amit Langote
On Sun, Feb 2, 2020 at 12:53 PM Masahiko Sawada
 wrote:
> On Wed, 29 Jan 2020 at 17:56, Amit Langote  wrote:
> > On Wed, Jan 29, 2020 at 11:29 AM yuzuko  wrote:
> > > > How are you going to track changes_since_analyze of partitioned table?
> > > > It's just an idea, but we can accumulate changes_since_analyze of a
> > > > partitioned table by adding the child tables' values after analyzing each
> > > > child table, and compare the partitioned table's value to the threshold
> > > > that is computed by (autovacuum_analyze_threshold + total rows
> > > > including all child tables * autovacuum_analyze_scale_factor).
> > > >
> > > The idea Sawada-san mentioned is similar to mine.
> >
> > So if I understand this idea correctly, a partitioned table's analyze
> > will only be triggered when partitions are analyzed.  That is,
> > inserts, updates, deletes of tuples in partitions will be tracked by
> > pgstat, which in turn is used by autovacuum to trigger analyze on
> > partitions.  Then, a partition's changes_since_analyze is added into the
> > parent's changes_since_analyze, which in turn *may* trigger analyze on the
> > parent.  I said "may", because it would take multiple partition
> > analyzes to accumulate enough changes to trigger one on the parent.
> > Am I getting that right?
>
> Yeah that is what I meant. In addition, adding partition's
> changes_since_analyze to its parent needs to be done recursively as
> the parent table could also be a partitioned table.

That's a good point.  So, changes_since_analyze increments are
essentially propagated from leaf partitions to all the way up to the
root table, including any intermediate partitioned tables.  We'll need
to consider whether we should propagate only one level at a time (from
bottom of the tree) or update all parents up to the root, every time a
leaf partition is analyzed.  If we do the latter, that might end up
triggering analyze on all the parents at the same time causing
repeated scanning of the same child tables in close intervals,
although setting analyze threshold and scale factor of the parent
tables of respective levels wisely can help avoid any negative impact
of that.
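As a rough illustration of the trigger condition being discussed (a sketch
only: the table name is hypothetical, and this query counts direct children
only, not deeper levels of a multi-level partition tree):

    -- total rows over the children of a partitioned table
    SELECT sum(c.reltuples)::bigint AS total_rows
    FROM pg_inherits i
    JOIN pg_class c ON c.oid = i.inhrelid
    WHERE i.inhparent = 'parent_tbl'::regclass;

    -- the parent would then be auto-analyzed once its accumulated
    -- changes_since_analyze (propagated up from leaf partitions, as
    -- proposed above) exceeds:
    --   autovacuum_analyze_threshold
    --     + autovacuum_analyze_scale_factor * total_rows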

Thanks,
Amit




Re: Internal key management system

2020-02-02 Thread Chris Travers
Hi;

So I have actually tried to carefully encrypt data in Postgres via
pgcrypto.  I think the key management problems in PostgreSQL are separable
from table-level encryption. In particular the largest problem right now
with having encrypted attributes is accidental key disclosure.  I think if
we solve key management in a way that works for encrypted attributes first,
we can then add encrypted tables later.

Additionally, big headaches come with key rotation.  So here are my
thoughts.  This is a fairly big topic.  And I am not sure it can be done
incrementally as much as that seems to doom big things in the community,
but I think it could be done with a major push by a combination of big
players, such as Second Quadrant.


On Sun, Feb 2, 2020 at 3:02 AM Masahiko Sawada <
masahiko.saw...@2ndquadrant.com> wrote:

> Hi,
>
> I've started a new separate thread from the previous long thread[1]
> for internal key management system to PostgreSQL. As I mentioned in
> the previous mail[2], we've decided to step back and focus on only
> internal key management system for PG13. The internal key management
> system introduces the functionality to PostgreSQL that allows user to
> encrypt and decrypt data without knowing the actual key. Besides, it
> will be able to be integrated with transparent data encryption in the
> future.
>
> The basic idea is that PostgreSQL generates the master encryption key
> which is further protected by the user-provided passphrase. The key
> management system provides two functions to wrap and unwrap the secret
> by the master encryption key. A user generates a secret key locally
> and sends it to PostgreSQL to wrap it using pg_kmgr_wrap() and saves
> it somewhere. Then the user can use the encrypted secret key to
> encrypt data and decrypt data by something like:
>

So my understanding is that  you would then need something like:

1.  Symmetric keys for actual data storage.  These could never be stored in
the clear.
2.  User public/private keys to  use to access data storage keys.  The
private key would need to be encrypted with passphrases.  And the server
needs to access the private key.
3.  Symmetric secret keys to encrypt private keys
4.  A key management public/private key pair used to exchange the password
for the private key.

>
> INSERT INTO tbl VALUES (pg_encrypt('user data', pg_kmgr_unwrap('x')));
> SELECT pg_decrypt(secret_column, pg_kmgr_unwrap('x')) FROM tbl;
>

If you get anything wrong, you risk logs being useful for breaking the
encryption keys and making data access easy.  You don't want
pg_kmgr_unwrap('') in your logs.

Here what I would suggest is a protocol extension to do the key exchange.
In other words, protocol messages to:
1.  Request data exchange server public key.
2.  Send server public-key encrypted symmetric key.  Make sure it is
properly padded etc.

These are still safe only over SSL with sslmode=verify-full, since otherwise
you might be vulnerable to an MITM attack.

Then the keys should be stored in something like CacheMemoryContext and
pg_encrypt()/pg_decrypt() would have access to them along with appropriate
 catalogs needed to get to the storage keys themselves.




>
> Where 'x' is the result of pg_kmgr_wrap function.
>
> That way we can get something encrypted and decrypted without ever
> knowing the actual key that was used to encrypt it.
>
> I'm currently updating the patch and will submit it.
>

The above though is only a small part of the problem.  What we also need
are a variety of new DDL commands specifically for key management.  This is
needed because without commands of this sort, we cannot make absolutely
sure that the commands are never logged.  These commands MUST not have keys
logged  and therefore must have keys stripped prior to logging.  If I were
designing this:

1.  Users on an SSL connection would be able to:  CREATE ENCRYPTION USER
KEY PAIR WITH PASSWORD 'xyz' which would automatically rotate keys.
2.  Superusers could:  ALTER SYSTEM ROTATE ENCRYPTION EXCHANGE KEY PAIR;
3.  Add an ENCRYPTED attribute to columns and disallow indexing of
ENCRYPTED columns.  This would store keys for the columns encrypted with
user public keys where they have access.
4. Allow superusers to ALTER TABLE foo ALTER encrypted_column ROTATE KEYS;
which would naturally require a full table rewrite.

Now, what that proposal does not provide is the use of encryption to
enforce finer-grained access such as per-row keys but that's another topic
and maybe something we don't need.

However I hope that explains what I see as a version of a minimum viable
infrastructure here.

>
> On Sun, 2 Feb 2020 at 00:37, Sehrope Sarkuni  wrote:
> >
> > On Fri, Jan 31, 2020 at 1:21 AM Masahiko Sawada
> >  wrote:
> > > On Thu, 30 Jan 2020 at 20:36, Sehrope Sarkuni 
> wrote:
> > > > That
> > > > would allow the internal usage to have a fixed output length of
> > > > LEN(IV) + LEN(HMAC) + LEN(DATA) = 16 + 32 + 64 = 112 bytes.
> > >
> > > Probably you meant 

Re: Portal->commandTag as an enum

2020-02-02 Thread Tom Lane
Mark Dilger  writes:
> I put the CommandTag enum in src/common because there wasn’t any reason
> not to do so.  It seems plausible that frontend test frameworks might want
> access to this enum.

Au contraire, that's an absolutely fundamental mistake.  There is
zero chance of this enum holding still across PG versions, so if
we expose it to frontend code, we're going to have big problems
with cross-version compatibility.  See our historical problems with
assuming the enum for character set encodings was the same between
frontend and backend ... and that set is orders of magnitude more
stable than this one.

As I recall, the hardest problem with de-string-ifying this is the fact
that for certain tags we include a rowcount in the string.  I'd like to
see that undone --- we have to keep it like that on-the-wire to avoid a
protocol break, but it'd be best if noplace inside the backend did it that
way, and we converted at the last moment before sending a CommandComplete
to the client.  Your references to "completion tags" make it sound like
you've only half done the conversion (though I've not read the patch
in enough detail to verify).
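In other words, something along these lines at the point where the
CommandComplete message is built (purely a sketch; the helper names are
hypothetical, not taken from the patch):

    /* Build the wire-format tag only when sending CommandComplete. */
    static void
    send_command_complete(CommandTag tag, uint64 nprocessed)
    {
        StringInfoData buf;
        char        completion[64];

        if (command_tag_display_rowcount(tag))     /* tags that carry a count */
            snprintf(completion, sizeof(completion), "%s " UINT64_FORMAT,
                     GetCommandTagName(tag), nprocessed);
        else
            snprintf(completion, sizeof(completion), "%s",
                     GetCommandTagName(tag));

        pq_beginmessage(&buf, 'C');
        pq_sendstring(&buf, completion);
        pq_endmessage(&buf);
    }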

BTW, the size of the patch is rather depressing, especially if it's
only half done.  Unlike Andres, I'm not even a little bit convinced
that this is worth the amount of code churn it'll cause.  Have you
done any code-size or speed comparisons?

regards, tom lane




Re: table partitioning and access privileges

2020-02-02 Thread Amit Langote
On Fri, Jan 31, 2020 at 9:39 PM Fujii Masao  wrote:
> On 2020/01/31 13:38, Amit Langote wrote:
> > On Fri, Jan 31, 2020 at 1:28 AM Fujii Masao  
> > wrote:
> >> Fair enough. I finally did back-patch because the behavior is clearly
> >> documented and I didn't hear any opinions objecting to the back-patch.
> >> But I should have solicited and discussed such risks more.
> >>
> >> I'm OK with reverting all those back-patches. Instead, the documentation
> >> should probably be updated in the old branches.
> >
> > I could find only this paragraph in the section on inheritance that
> > talks about how access permissions work:
> >
> > 9.4:
> >
> > "Note how table access permissions are handled. Querying a parent
> > table can automatically access data in child tables without further
> > access privilege checking. This preserves the appearance that the data
> > is (also) in the parent table. Accessing the child tables directly is,
> > however, not automatically allowed and would require further
> > privileges to be granted."
> >
> > 9.5-12:
> >
> > "Inherited queries perform access permission checks on the parent
> > table only. Thus, for example, granting UPDATE permission on the
> > cities table implies permission to update rows in the capitals table
> > as well, when they are accessed through cities. This preserves the
> > appearance that the data is (also) in the parent table. But the
> > capitals table could not be updated directly without an additional
> > grant. In a similar way, the parent table's row security policies (see
> > Section 5.7) are applied to rows coming from child tables during an
> > inherited query. A child table's policies, if any, are applied only
> > when it is the table explicitly named in the query; and in that case,
> > any policies attached to its parent(s) are ignored."
> >
> > Do you mean that the TRUNCATE exception should be noted here?
>
> Yes, that's what I was thinking.

Okay.  How about the attached?

Maybe, we should also note the LOCK TABLE exception?

Regards,
Amit
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 8d2908c34d..d274d048ec 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2323,7 +2323,10 @@ VALUES ('New York', NULL, NULL, 'NY');
access privilege checking.  This preserves the appearance that the
data is (also) in the parent table.  Accessing the child tables
directly is, however, not automatically allowed and would require
-   further privileges to be granted.
+   further privileges to be granted.  One exception to this rule is
+   TRUNCATE command, where permissions on the child tables
+   are always checked, whether they are truncated directly or recursively via
+   TRUNCATE performed on the parent table.
   
 
  
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 6ab37e7354..cf7b4fd891 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2762,7 +2762,10 @@ VALUES ('Albany', NULL, NULL, 'NY');
rows coming from child tables during an inherited query.  A child table's
policies, if any, are applied only when it is the table explicitly named
in the query; and in that case, any policies attached to its parent(s) are
-   ignored.
+   ignored.  One exception to this rule is TRUNCATE command,
+   where permissions on the child tables are always checked, whether they are
+   truncated directly or recursively via TRUNCATE performed
+   on the parent table.
   
 
   


Re: Prevent pg_basebackup running as root

2020-02-02 Thread Ian Barwick

On 2020/02/01 18:34, Michael Paquier wrote:

On Thu, Jan 30, 2020 at 04:00:40PM +0900, Michael Paquier wrote:

Anyway, your patch looks like a good idea to me, so let's see if
others have opinions or objections about it.


Seeing nothing, committed v2.


Thanks!


Regards

Ian Barwick


--
Ian Barwick   https://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-02-02 Thread Tomas Vondra

On Sun, Feb 02, 2020 at 10:59:32AM +0100, Pierre Ducroquet wrote:

On Saturday, February 1, 2020 3:24:46 PM CET Tomas Vondra wrote:

On Sat, Feb 01, 2020 at 08:51:04AM +0100, Pierre Ducroquet wrote:
>Hello
>
>At my current job, we have a lot of multi-tenant databases, thus with
>tables containing a tenant_id column. Such a column introduces a severe
>bias in statistics estimation since any other FK in the next columns is
>very likely to have a functional dependency on the tenant id. We found
>several queries where this functional dependency messed up the estimations
>so much the optimizer chose wrong plans.
>When we tried to use extended statistics with CREATE STATISTIC on
>tenant_id, other_id, we noticed that the current implementation for
>detecting functional dependency lacks two features (at least in our use
>case):
>- support for IN clauses
>- support for the array contains operator (that could be considered as a
>special case of IN)

Thanks for the patch. I don't think the lack of support for these clause
types is an oversight - we haven't done them because we were not quite
sure the functional dependencies can actually apply to them. But maybe
we can support them, I'm not against that in principle.

>After digging in the source code, I think the lack of support for IN
>clauses is an oversight and due to the fact that IN clauses are
>ScalarArrayOpExpr instead of OpExpr. The attached patch fixes this by
>simply copying the code- path for OpExpr and changing the type name. It
>compiles and the results are correct, with a dependency being correctly
>taken into consideration when estimating rows. If you think such a copy
>paste is bad and should be factored in another static bool function,
>please say so and I will happily provide an updated patch.

Hmmm. Consider a query like this:

   ... WHERE tenant_id = 1 AND another_column IN (2,3)

which kinda contradicts the idea of functional dependencies that knowing
a value in one column, tells us something about a value in a second
column. But that assumes a single value, which is not quite true here.

The above query is essentially the same thing as

   ... WHERE (tenant_id=1 AND (another_column=2 OR another_column=3))

and also

   ... WHERE (tenant_id=1 AND another_column=2)
  OR (tenant_id=1 AND another_column=3)

at which point we could apply functional dependencies - but we'd do it
once for each AND-clause, and then combine the results to compute
selectivity for the OR clause.

But this means that if we apply functional dependencies directly to the
original clause, it'll be inconsistent. Which seems a bit unfortunate.

Or do I get this wrong?


In my tests, I've got a table with two columns a and b, generated this way:
   CREATE TABLE test (a INT, b INT)
   AS SELECT i, i/10 FROM
 generate_series(1, 10) s(i),
 generate_series(1, 5) x;

With statistics defined on the a, b columns
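For reference, the statistics object described above could be created with
something like this (the statistics name is illustrative; the thread does
not show the exact command):

    CREATE STATISTICS test_a_b_deps (dependencies) ON a, b FROM test;
    ANALYZE test;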

Here are the estimated selectivity results without any patch:

SELECT * FROM test WHERE a = 1 AND b = 1 : 5
SELECT * FROM test WHERE a = 1 AND (b = 1 OR b = 2) : 1
SELECT * FROM test WHERE (a = 1 AND b = 1) OR (a = 1 AND b = 2) : 1
SELECT * FROM test WHERE a = 1 AND b IN (1, 2) : 1

With the patch, the estimated rows of the last query goes back to 5, which is
more logical. The other ones don't change.



Yes, I think you're right. I've been playing with this a bit more, and I
think you're right we can support it the way you propose.

I'm still a bit annoyed by the inconsistency this might/does introduce.
Consider for example these clauses:

a) ... WHERE a = 10 AND b IN (100, 200)
b) ... WHERE a = 10 AND (b = 100 OR b = 200)
c) ... WHERE (a = 10 AND b = 100) OR (a = 10 AND b = 200)

All three cases are logically equivalent and do return the same set of
rows. But we estimate them differently, arriving at different estimates.

Case (a) is the one you improve in your patch. Case (c) is actually not
possible in practice, because we rewrite it as (b) during planning.

But (b) is estimated very differently, because we don't recognize the OR
clause as supported by functional dependencies. On the one hand I'm sure
it's not the first case where we already estimate equivalent clauses
differently. On the other hand I wonder how difficult would it be to
support this specific type of OR clause (with all expressions having the
form "Var = Const" and all Vars referencing the same rel).

I'm not going to block the patch because of this, of course. Similarly,
it'd be nice to add support for ScalarArrayOpExpr to MCV stats, not just
functional dependencies ...


BTW the code added in the 0001 patch is the same as for is_opclause, so
maybe we can simply do

 if (is_opclause(rinfo->clause) ||
 IsA(rinfo->clause, ScalarArrayOpExpr))
 {
 ...
 }

instead of just duplicating the code.


I would love doing that, but the ScalarArrayOpExpr and OpExpr are not binary
compatible for the members used here. In ScalarArrayOpExpr, on 

[PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables

2020-02-02 Thread Adé
Hello all,

Like the title says, using "gin_fuzzy_search_limit" degrades speed when it
has a relatively low setting.

What do I mean by "relatively low"? An example would be when a table with a
GIN index has many millions of rows and a particular keyword search has
1,000,000 possible results because the keyword is very common (or it's just
that the table is so supremely large that even a somewhat common keyword
appears enough to return one million results). However, you only want to
return around 100 random results from that one million, so you set
gin_fuzzy_search_limit to 100. That limit is relatively low when you look
at the ratio of the limit value to the possible results: 100 / 1,000,000  =
0.0001. You'll find the query is very slow for such a low ratio. It isn't
so slow if gin_fuzzy_search_limit is 100 but the keyword search has only a
total of 10,000 possible results (resulting in a higher ratio of 0.1).

This would explain why in the documentation it is said that "From
experience, values in the thousands (e.g., 5000 — 20000) work well". It's
not so common to have queries that return large enough result sets such
that gin_fuzzy_search_limit values between 5,000 and 20,000 would result in
low ratios and so result in the performance issue I've observed (these
gin_fuzzy_search_limit values have relatively high ratios between 0.005 and
0.02 if you have 1,000,000 results for a keyword search). However, if you
desire a lower gin_fuzzy_search_limit such as 100, while also having a
relatively larger table, you'll find this slowness issue.
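As a minimal illustration of the scenario described above (the table and
column names are made up; assume "tsv" is a tsvector column with a GIN
index and "common_word" matches about a million rows):

    SET gin_fuzzy_search_limit = 100;   -- ratio ~ 100 / 1,000,000 = 0.0001
    SELECT id FROM messages
    WHERE tsv @@ to_tsquery('english', 'common_word');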

I discussed this issue more and the reason for it in my original bug
report:
https://www.postgresql.org/message-id/16220-1a0a4f0cb67ca...@postgresql.org

Attached is SQL to test and observe this issue and also attached is a patch
I want to eventually submit to a commitfest.

Best regards,
Adé


gin_fuzzy_search_limit_test.sql
Description: Binary data


ginget.patch
Description: Binary data


Re: TestLib condition for deleting temporary directories

2020-02-02 Thread Daniel Gustafsson
> On 2 Feb 2020, at 18:01, Noah Misch  wrote:
> 
> Forking thread "logical decoding : exceeded maxAllocatedDescs for .spill
> files" for this side issue:

Thanks, I hadn't seen this.

> On Wed, Jan 08, 2020 at 09:37:04PM -0800, Noah Misch wrote:
>> v10
>> deletes PostgresNode base directories at the end of this test file, despite
>> the failure[1].
> 
>> [1] It has the all_tests_passing() logic in an attempt to stop this.  I'm
>> guessing it didn't help because the file failed by calling die "connection
>> error: ...", not by reporting a failure to Test::More via ok(0) or similar.
> 
> That is what happened.  We should test the exit status to decide whether to
> keep temporaries, as attached.  PostgresNode does that, since commit 90627cf
> (thread https://postgr.es/m/flat/6205.1492883490%40sss.pgh.pa.us).  That
> thread already discussed $SUBJECT[1] and the __DIE__ handler being
> redundant[2].  I plan to back-patch, since it's most useful for v10 and v9.6.

I'm travelling and haven't been able to test, but this makes sense from
reading.  +1 on backpatching.

cheers ./daniel



TestLib condition for deleting temporary directories

2020-02-02 Thread Noah Misch
Forking thread "logical decoding : exceeded maxAllocatedDescs for .spill
files" for this side issue:

On Wed, Jan 08, 2020 at 09:37:04PM -0800, Noah Misch wrote:
> v10
> deletes PostgresNode base directories at the end of this test file, despite
> the failure[1].

> [1] It has the all_tests_passing() logic in an attempt to stop this.  I'm
> guessing it didn't help because the file failed by calling die "connection
> error: ...", not by reporting a failure to Test::More via ok(0) or similar.

That is what happened.  We should test the exit status to decide whether to
keep temporaries, as attached.  PostgresNode does that, since commit 90627cf
(thread https://postgr.es/m/flat/6205.1492883490%40sss.pgh.pa.us).  That
thread already discussed $SUBJECT[1] and the __DIE__ handler being
redundant[2].  I plan to back-patch, since it's most useful for v10 and v9.6.

[1] 
https://postgr.es/m/CAMsr+YFyFU=+mvfzqhthfmw22x5-h517e6ck6et+dt+x4bu...@mail.gmail.com
[2] https://postgr.es/m/fea925b2-c3ae-4ba9-9194-5f5616ad0...@yesql.se
Author: Noah Misch 
Commit: Noah Misch 

When a TAP file has non-zero exit status, retain temporary directories.

PostgresNode already retained base directories in such cases.  Stop
using $SIG{__DIE__}, which is redundant with the exit status check, in
lieu of proliferating it to TestLib.  Back-patch to 9.6, where commit
88802e068017bee8cea7a5502a712794e761c7b5 introduced retention on
failure.

Reviewed by FIXME.

Discussion: https://postgr.es/m/FIXME

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 2e0cf4a..ff972d1 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1206,20 +1206,6 @@ sub can_bind
return $ret;
 }
 
-# Retain the errno on die() if set, else assume a generic errno of 1.
-# This will instruct the END handler on how to handle artifacts left
-# behind from tests.
-$SIG{__DIE__} = sub {
-   if ($!)
-   {
-   $died = $!;
-   }
-   else
-   {
-   $died = 1;
-   }
-};
-
 # Automatically shut down any still-running nodes when the test script exits.
 # Note that this just stops the postmasters (in the same order the nodes were
 # created in).  Any temporary directories are deleted, in an unspecified
@@ -1238,8 +1224,7 @@ END
next if defined $ENV{'PG_TEST_NOCLEAN'};
 
# clean basedir on clean test invocation
-   $node->clean_node
- if TestLib::all_tests_passing() && !defined $died && 
!$exit_code;
+   $node->clean_node if $exit_code == 0 && 
TestLib::all_tests_passing();
}
 
$? = $exit_code;
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index 458b801..65ee042 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -183,8 +183,13 @@ INIT
 END
 {
 
-   # Preserve temporary directory for this test on failure
-   $File::Temp::KEEP_ALL = 1 unless all_tests_passing();
+   # Test files have several ways of causing prove_check to fail:
+   # 1. Exit with a non-zero status.
+   # 2. Call ok(0) or similar, indicating that a constituent test failed.
+   # 3. Deviate from the planned number of tests.
+   #
+   # Preserve temporary directories after (1) and after (2).
+   $File::Temp::KEEP_ALL = 1 unless $? == 0 && all_tests_passing();
 }
 
 =pod


Re: BUG #16171: Potential malformed JSON in explain output

2020-02-02 Thread Tom Lane
[ cc'ing depesz to see what he thinks about this ]

Daniel Gustafsson  writes:
> On 1 Feb 2020, at 20:37, Tom Lane  wrote:
>> This does lead to some field
>> order rearrangement in text mode, as per the regression test changes,
>> but I think that's not a big deal.  (A change can only happen if there
>> are initplan(s) attached to the parent node.)

> Does that prevent backpatching this, or are we Ok with EXPLAIN text output not
> being stable across minors?  AFAICT Pg::Explain still works fine with this
> change, but mileage may vary for other parsers.

I'm not sure about that either.  It should be a clear win for parsers
of the non-text formats, because now we're generating valid
JSON-or-whatever where we were not before.  But it's not too hard to
imagine that someone's ad-hoc parser of text output would break,
depending on how much it relies on field order rather than indentation
to make sense of things.

In the background of all this is that cases where it matters must be
pretty thin on the ground so far, else we'd have gotten complaints
sooner.  So we shouldn't really assume that everyone's parser handles
such cases at all yet.

I'm a little bit inclined to back-patch, on the grounds that JSON
output is hopelessly broken without this, and any text-mode parsers
that need work would need work by September anyway.  But I could
easily be argued into not back-patching.

Another approach we could consider is putting your patch in the
back branches and mine in HEAD.  I'm not sure that's a good idea;
it trades short-term stability of the text format for a long-term
mess in the non-text formats.  But maybe somebody will argue for it.

Thoughts?

regards, tom lane




ALTER tbl rewrite loses CLUSTER ON index

2020-02-02 Thread Justin Pryzby
Other options are preserved by ALTER (and CLUSTER ON is and most obviously
should be preserved by CLUSTER's rewrite), so I think (SET) CLUSTER should be
preserved by ALTER, too.

As far as I can see, this should be the responsibility of something in the
vicinity of ATPostAlterTypeParse/RememberIndexForRebuilding.

Attached patch sketches a fix.

ts=# SET client_min_messages=debug; DROP TABLE t; CREATE TABLE t(i int); CREATE INDEX ON t(i) WITH (fillfactor=11, vacuum_cleanup_index_scale_factor=12); CLUSTER t USING t_i_key; ALTER TABLE t ALTER i TYPE bigint; \d t
SET
DEBUG:  drop auto-cascades to type t
DEBUG:  drop auto-cascades to type t[]
DEBUG:  drop auto-cascades to index t_i_idx
DROP TABLE
CREATE TABLE
DEBUG:  building index "t_i_idx" on table "t" serially
CREATE INDEX
ERROR:  index "t_i_key" for table "t" does not exist
DEBUG:  rewriting table "t"
DEBUG:  building index "t_i_idx" on table "t" serially
DEBUG:  drop auto-cascades to type pg_temp_3091172777
DEBUG:  drop auto-cascades to type pg_temp_3091172777[]
ALTER TABLE
 Table "public.t"
 Column |  Type  | Collation | Nullable | Default 
++---+--+-
 i  | bigint |   |  | 
Indexes:
"t_i_idx" btree (i) WITH (fillfactor='11', 
vacuum_cleanup_index_scale_factor='12')
From f235a691722a464059358cd6b1d744f75d7bf92f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 2 Feb 2020 09:49:57 -0600
Subject: [PATCH v1] preserve CLUSTER ON during ALTER TABLE

---
 src/backend/commands/tablecmds.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f599393..c4e6cbd 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11616,6 +11616,7 @@ RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab)
 		}
 		else
 		{
+			Relation indrel;
 			/* OK, capture the index's existing definition string */
 			char	   *defstring = pg_get_indexdef_string(indoid);
 
@@ -11623,6 +11624,18 @@ RememberIndexForRebuilding(Oid indoid, AlteredTableInfo *tab)
 indoid);
 			tab->changedIndexDefs = lappend(tab->changedIndexDefs,
 			defstring);
+			/* Preserve CLUSTER ON if set */
+			indrel = index_open(indoid, AccessShareLock);
+			if (indrel->rd_index->indisclustered) {
+char buf[3*NAMEDATALEN + 24];
+sprintf(buf, "ALTER TABLE %s CLUSTER ON %s",
+		get_rel_name(tab->relid), get_rel_name(indoid)); // XXX: schema
+tab->changedIndexOids = lappend_oid(tab->changedIndexOids,
+	indoid);
+tab->changedIndexDefs = lappend(tab->changedIndexDefs,
+pstrdup(buf));
+			}
+			index_close(indrel, NoLock);
 		}
 	}
 }
@@ -11901,6 +11914,11 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 	 * the new table definition.
 	 */
 }
+else if (cmd->subtype == AT_ClusterOn)
+{
+	tab->subcmds[AT_PASS_OLD_INDEX] =
+		lappend(tab->subcmds[AT_PASS_OLD_INDEX], cmd);
+}
 else
 	elog(ERROR, "unexpected statement subtype: %d",
 		 (int) cmd->subtype);
-- 
2.7.4



Re: BUG #16171: Potential malformed JSON in explain output

2020-02-02 Thread Daniel Gustafsson
> On 1 Feb 2020, at 20:37, Tom Lane  wrote:
> 
> Hamid Akhtar  writes:
>> I've reviewed and verified this patch and IMHO, this is ready to be 
>> committed.
> 
> I took a look at this and I don't think it's really going in the right
> direction.  ISTM the clear intent of this code was to attach the "Subplans
> Removed" item as a field of the parent [Merge]Append node, but the author
> forgot about the intermediate "Plans" array.  So I think that, rather than
> doubling down on a mistake, we ought to move where the field is generated
> so that it *is* a field of the parent node.

Right, that makes sense; +1 on the attached 0001 patch.

> This does lead to some field
> order rearrangement in text mode, as per the regression test changes,
> but I think that's not a big deal.  (A change can only happen if there
> are initplan(s) attached to the parent node.)

Does that prevent backpatching this, or are we Ok with EXPLAIN text output not
being stable across minors?  AFAICT Pg::Explain still works fine with this
change, but mileage may vary for other parsers.

> 0002 attached isn't committable, because nobody would want the overhead
> in production, but it seems like a good trick to keep up our sleeves.

That's a neat trick. I wonder if it would be worth maintaining a curated list of
these tricks in a README under src/test to help others avoid/reduce wheel
reinventing?

cheers ./daniel



Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-02-02 Thread Pierre Ducroquet
On Saturday, February 1, 2020 3:24:46 PM CET Tomas Vondra wrote:
> On Sat, Feb 01, 2020 at 08:51:04AM +0100, Pierre Ducroquet wrote:
> >Hello
> >
> >At my current job, we have a lot of multi-tenant databases, thus with
> >tables containing a tenant_id column. Such a column introduces a severe
> >bias in statistics estimation since any other FK in the next columns is
> >very likely to have a functional dependency on the tenant id. We found
> >several queries where this functional dependency messed up the estimations
> >so much the optimizer chose wrong plans.
> >When we tried to use extended statistics with CREATE STATISTIC on
> >tenant_id, other_id, we noticed that the current implementation for
> >detecting functional dependency lacks two features (at least in our use
> >case):
> >- support for IN clauses
> >- support for the array contains operator (that could be considered as a
> >special case of IN)
> 
> Thanks for the patch. I don't think the lack of support for these clause
> types is an oversight - we haven't done them because we were not quite
> sure the functional dependencies can actually apply to them. But maybe
> we can support them, I'm not against that in principle.
> 
> >After digging in the source code, I think the lack of support for IN
> >clauses is an oversight and due to the fact that IN clauses are
> >ScalarArrayOpExpr instead of OpExpr. The attached patch fixes this by
> >simply copying the code- path for OpExpr and changing the type name. It
> >compiles and the results are correct, with a dependency being correctly
> >taken into consideration when estimating rows. If you think such a copy
> >paste is bad and should be factored in another static bool function,
> >please say so and I will happily provide an updated patch.
> 
> Hmmm. Consider a query like this:
> 
>... WHERE tenant_id = 1 AND another_column IN (2,3)
> 
> which kinda contradicts the idea of functional dependencies that knowing
> a value in one column, tells us something about a value in a second
> column. But that assumes a single value, which is not quite true here.
> 
> The above query is essentially the same thing as
> 
>... WHERE (tenant_id=1 AND (another_column=2 OR another_column=3))
> 
> and also
> 
>... WHERE (tenant_id=1 AND another_column=2)
>   OR (tenant_id=1 AND another_column=3)
> 
> at which point we could apply functional dependencies - but we'd do it
> once for each AND-clause, and then combine the results to compute
> selectivity for the OR clause.
> 
> But this means that if we apply functional dependencies directly to the
> original clause, it'll be inconsistent. Which seems a bit unfortunate.
> 
> Or do I get this wrong?

In my tests, I've got a table with two columns a and b, generated this way:
CREATE TABLE test (a INT, b INT)
AS SELECT i, i/10 FROM 
  generate_series(1, 10) s(i),
  generate_series(1, 5) x;

With statistics defined on the a, b columns

Here are the estimated selectivity results without any patch:

SELECT * FROM test WHERE a = 1 AND b = 1 : 5
SELECT * FROM test WHERE a = 1 AND (b = 1 OR b = 2) : 1
SELECT * FROM test WHERE (a = 1 AND b = 1) OR (a = 1 AND b = 2) : 1
SELECT * FROM test WHERE a = 1 AND b IN (1, 2) : 1

With the patch, the estimated rows of the last query goes back to 5, which is 
more logical. The other ones don't change.

> BTW the code added in the 0001 patch is the same as for is_opclause, so
> maybe we can simply do
> 
>  if (is_opclause(rinfo->clause) ||
>  IsA(rinfo->clause, ScalarArrayOpExpr))
>  {
>  ...
>  }
> 
> instead of just duplicating the code.

I would love to do that, but ScalarArrayOpExpr and OpExpr are not binary
compatible for the members used here. In ScalarArrayOpExpr, on AMD64, args is
at offset 24 and opno at 4, while they are at 32 and 4 in OpExpr. I can work
around it with this kind of code, but I don't like it much:
List   *args;
Oid     opno;

if (IsA(rinfo->clause, OpExpr))
{
    /* If it's an opclause, we will check for Var = Const or Const = Var. */
    OpExpr *expr = (OpExpr *) rinfo->clause;

    args = expr->args;
    opno = expr->opno;
}
else if (IsA(rinfo->clause, ScalarArrayOpExpr))
{
    /* If it's a ScalarArrayOpExpr, check for Var IN Const. */
    ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) rinfo->clause;

    args = expr->args;
    opno = expr->opno;
}

Or I can rewrite it in C++ to play with templates... :)

> We also need some at least some
> regression tests, testing functional dependencies with this clause type.

Agreed

> >The lack of support for @> operator, on the other hand, seems to be a
> >decision taken when writing the initial code, but I can not find any
> >mathematical nor technical reason for it. The current code restricts
> >dependency calculation to the = operator, obviously because inequality
> >operators are not going to work... but array contains is just several =
> >operators grouped, thus the same for the 

Re: psql - add SHOW_ALL_RESULTS option

2020-02-02 Thread Fabien COELHO



Hello Tomas,


This patch was marked as ready for committer, but clearly there's an
ongoing discussion about what should be the default behavior, if this
breaks existing apps etc. So I've marked it as "needs review" and moved
it to the next CF.


The issue is that root (aka Tom) seems to be against the feature, and
would like to keep it as it is now. Although my opinion is that the
previous behavior is close to insane, I'm ready to resurrect the guc to
control the behavior so that it would be possible, or even the default.


Right now I'm waiting for an "I will not veto it on principle" from Tom
(I'm okay with a reject based on bad implementation) before spending more
time on it: although my time is given for free, that is not a good reason
to send it down the drain if a reject is coming whatever I do.


Tom, would you consider the feature acceptable with a guc to control it?

--
Fabien.




Re: Internal key management system

2020-02-02 Thread Fabien COELHO



Hello Masahiko-san,


I've started a new separate thread from the previous long thread[1]
for internal key management system to PostgreSQL. As I mentioned in
the previous mail[2], we've decided to step back and focus on only
internal key management system for PG13. The internal key management
system introduces the functionality to PostgreSQL that allows user to
encrypt and decrypt data without knowing the actual key. Besides, it
will be able to be integrated with transparent data encryption in the
future.

The basic idea is that PostgreSQL generates the master encryption key
which is further protected by the user-provided passphrase. The key
management system provides two functions to wrap and unwrap the secret
by the master encryption key. A user generates a secret key locally


I understand that the secret key is sent in the clear to be encrypted
by a master key.



and sends it to PostgreSQL to wrap it using pg_kmgr_wrap() and saves
it somewhere. Then the user can use the encrypted secret key to
encrypt data and decrypt data by something like:

INSERT INTO tbl VALUES (pg_encrypt('user data', pg_kmgr_unwrap('x')));
SELECT pg_decrypt(secret_column, pg_kmgr_unwrap('x')) FROM tbl;

Where 'x' is the result of pg_kmgr_wrap function.


I'm lost. If pg_{en,de}crypt and pg_kmgr_unwrap are functions, what
prevents users from doing:


  SELECT pg_kmgr_unwrap('');

so as to recover the key, somehow in contradiction to "allows user to 
encrypt and decrypt data without knowing the actual key".


When dealing with cryptography and key management, I can only recommend
extreme caution.


--
Fabien.