Re: docs about FKs referencing partitioned tables

2019-05-28 Thread Amit Langote
Hi,

On 2019/05/29 14:01, Paul A Jungwirth wrote:
> On Sun, May 26, 2019 at 7:49 PM Michael Paquier  wrote:
>> ...  However you are adding a paragraph for something which is
>> completely unrelated to the issue we are trying to fix.  If I were to
>> add something, I think that I would be more general than what you are
>> trying here and just mention a link to the previous paragraph about
>> the caveats of inheritance as they apply to single table members of an
>> inheritance tree and not a full set:
>> "Indexes and foreign key constraint apply to single tables and not
>> their inheritance children, hence they have some caveats to
>> be aware of."
> 
> That seems reasonable to me. Here is a patch file if that is helpful
> (minor typo corrected).

The patch looks good, thanks.

Michael commented upthread that the new text might be repeating what's
already said elsewhere.  I did find that to be the case by looking at
5.10. Inheritance.  For example, 5.10.1. Caveats says exactly the same thing:

   A serious limitation of the inheritance feature is that indexes
   (including unique constraints) and foreign key constraints only apply
   to single tables, not to their inheritance children. This is true on
   both the referencing and referenced sides of a foreign key constraint.

But a couple of other points mentioned in 5.11.3.3. Caveats (of 5.11. Table
Partitioning) are already repeated in 5.10.1. Caveats; for example, note
the point about VACUUM, ANALYZE, INSERT ON CONFLICT, etc. applying to
single tables.  So perhaps it won't hurt to repeat the caveat about
indexes and foreign keys too.
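
For readers of the archive, a minimal sketch of the caveat in question,
modeled on the cities/capitals example from the inheritance chapter
(table names are illustrative):

CREATE TABLE cities (name text PRIMARY KEY, population int);
CREATE TABLE capitals (state char(2)) INHERITS (cities);
CREATE TABLE weather (city text REFERENCES cities (name), temp int);

INSERT INTO capitals (name, population, state) VALUES ('Madison', 250000, 'WI');
-- Fails: the foreign key only checks rows stored in "cities" itself,
-- not rows in its inheritance child "capitals".
INSERT INTO weather VALUES ('Madison', 21);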

Thanks,
Amit





Re: docs about FKs referencing partitioned tables

2019-05-28 Thread Paul A Jungwirth
On Sun, May 26, 2019 at 7:49 PM Michael Paquier  wrote:
> Well, the point I would like to outline is that section 5.11.2 about
> declarative partitioning and 5.11.3 about partitioning with
> inheritance deal with two separate, independent partitioning
> methods.  So removing the paragraph from the declarative partitioning
> section mentioning foreign keys referencing partitioned tables is
> fine, because that's not the case anymore...
> [snip]
> ...  However you are adding a paragraph for something which is
> completely unrelated to the issue we are trying to fix.  If I were to
> add something, I think that I would be more general than what you are
> trying here and just mention a link to the previous paragraph about
> the caveats of inheritance as they apply to single table members of an
> inheritance tree and not a full set:
> "Indexes and foreign key constraint apply to single tables and not
> their inheritance children, hence they have some caveats to
> be aware of."

That seems reasonable to me. Here is a patch file if that is helpful
(minor typo corrected).

Yours,
Paul


inheritance_fk_docs_v0003.patch
Description: Binary data


Re: How to know referenced sub-fields of a composite type?

2019-05-28 Thread Amit Langote
Kaigai-san,

On 2019/05/29 12:13, Kohei KaiGai wrote:
> One interesting data type in Apache Arrow is the "Struct" data type. It
> is equivalent to the composite type in PostgreSQL. The "Struct" type has
> sub-fields, and each sub-field has its own values array.
> 
> It means we can skip loading unreferenced sub-fields, if the query
> planner can handle referenced and unreferenced sub-fields correctly.
> On the other hand, it looks to me like RelOptInfo and other
> optimizer-related structures don't have this kind of information.
> RelOptInfo->attr_needed tells an extension which attributes are
> referenced by other relations; however, its granularity is not
> sufficient for sub-fields.

Isn't that true for some other cases as well, like when a query accesses
only some sub-fields of a json(b) column?  In that case too, the planner
itself can't optimize away access to the other sub-fields.  What it can do
though is match a suitable index to the operator used to access the
individual sub-fields, so that the index (if one is matched and chosen)
can optimize away accessing unnecessary sub-fields.  IOW, it seems to me
that the optimizer leaves it up to the indexes (and plan nodes) to further
optimize access within a field.  How is this case any different?
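
(As an illustration of that point, with a hypothetical table: the
expression index below is what lets the chosen plan avoid decomposing
the whole value, not the planner itself.)

CREATE TABLE docs (doc jsonb);
CREATE INDEX ON docs ((doc -> 'customer'));
-- The planner can't skip fetching the rest of "doc" on its own, but it
-- can match the index to the -> operator, and the index scan then avoids
-- touching the unreferenced sub-fields.
SELECT doc -> 'customer' FROM docs WHERE doc -> 'customer' = '"acme"';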

> Probably all we can do right now is walk the RelOptInfo expressions
> looking for FieldSelect nodes to see the referenced sub-fields. Do we
> have a better idea than this expensive way?
> # Right now, PG-Strom loads all the sub-fields of a Struct column from
> an arrow_fdw foreign table,
> # regardless of referenced / unreferenced sub-fields. Just a second best.

Maybe I'm missing something, but if PG-Strom/arrow_fdw does look at the
FieldSelect nodes to see which sub-fields are referenced, why doesn't it
(or why can't it) generate a plan that will only access those sub-fields?

Thanks,
Amit





How to know referenced sub-fields of a composite type?

2019-05-28 Thread Kohei KaiGai
Hello,

A recent revision of PG-Strom has columnar storage using Apache Arrow
format files on the FDW infrastructure. Because of the columnar nature,
it can load only the values which are referenced by the query, thus
maximizing the efficiency of the storage bandwidth.
http://heterodb.github.io/pg-strom/arrow_fdw/

Apache Arrow defines various primitive types that can be mapped onto
PostgreSQL data types.
For example, FloatingPoint (precision=Single) on Arrow is equivalent
to float4 of PostgreSQL.
One interesting data type in Apache Arrow is the "Struct" data type. It
is equivalent to the composite type in PostgreSQL. The "Struct" type has
sub-fields, and each sub-field has its own values array.

It means we can skip loading unreferenced sub-fields, if the query
planner can handle referenced and unreferenced sub-fields correctly.
On the other hand, it looks to me like RelOptInfo and other
optimizer-related structures don't have this kind of information.
RelOptInfo->attr_needed tells an extension which attributes are
referenced by other relations; however, its granularity is not
sufficient for sub-fields.

Probably all we can do right now is walk the RelOptInfo expressions
looking for FieldSelect nodes to see the referenced sub-fields. Do we
have a better idea than this expensive way?
# Right now, PG-Strom loads all the sub-fields of a Struct column from
an arrow_fdw foreign table,
# regardless of referenced / unreferenced sub-fields. Just a second best.

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Question about some changes in 11.3

2019-05-28 Thread Amit Langote
Hi Mat,

On 2019/05/25 6:05, Mat Arye wrote:
> Hi,
> 
> 11.3 included some changes to partition table planning. Namely,
> commit 925f46f ("Fix handling of targetlist SRFs when scan/join relation is
> known empty.") seems to redo all paths for partitioned tables
> in apply_scanjoin_target_to_paths. It clears the paths in:
> 
> ```
>  if (rel_is_partitioned)
> rel->pathlist = NIL
> ```
> 
> Then the code rebuilds the paths. However, the rebuilt append path
> never gets the set_rel_pathlist_hook called. Thus, the work that hook
> did previously gets thrown away, and the rebuilt append path can never
> be influenced by this hook.

By dropping the old paths as done here, the core code is simply
forgetting that set_rel_pathlist_hook may have editorialized over them,
which seems like an oversight in that commit.

Your proposal to call set_rel_pathlist_hook() after
add_paths_to_append_rel() to rebuild the Append paths sounds fine to me.

Thanks,
Amit





Re: PG 12 draft release notes

2019-05-28 Thread Amit Langote
On 2019/05/29 0:58, Andres Freund wrote:
> Hi,
> 
> Add function pg_partition_root()
> to return top-most parent of a partition tree (Michaël Paquier)
> 
> Add function pg_partition_ancestors()
> to report all ancestors of a partition (Álvaro Herrera)
> 
> Add function pg_partition_tree()
> to display information about partitions (Amit Langote)
> 
> Can we combine these three?

+1

Thanks,
Amit





Re: Fix inconsistencies for v12

2019-05-28 Thread Amit Langote
On 2019/05/28 20:26, Amit Kapila wrote:
> On Tue, May 28, 2019 at 12:29 PM Amit Langote wrote:
>> Seeing that the crash occurs due to postgres_fdw relying on
>> es_result_relation_info being set when initializing a "direct
>> modification" plan on foreign tables managed by it, we could change the
>> comment to say that instead.  Note that allowing "direct modification" of
>> foreign tables is a core feature, so there's no postgres_fdw-specific
>> behavior here; there may be other FDWs that support "direct modification"
>> plans and so likewise rely on es_result_relation_info being set.
> 
> 
> Can we ensure in some way that only FDWs rely on it and not any other
> part of the code?

Hmm, I can't think of any way of doing that other than manual inspection.
We are sure that no piece of core code relies on it in the ExecInitNode()
code path.  Apparently FDWs may, as we've found out here.  Now that I've
looked around, maybe other loadable modules may too, by way of (only?)
Custom nodes.  I don't see any other way to hook into ExecInitNode(), so
maybe that's it.

So, maybe reword a bit as:

diff --git a/src/backend/executor/nodeModifyTable.c
b/src/backend/executor/nodeModifyTable.c
index a3c0e91543..95545c9472 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2316,7 +2316,7 @@ ExecInitModifyTable(ModifyTable *node, EState
*estate, int eflags)
  * verify that the proposed target relations are valid and open their
  * indexes for insertion of new index entries.  Note we *must* set
  * estate->es_result_relation_info correctly while we initialize each
- * sub-plan; ExecContextForcesOids depends on that!
+ * sub-plan; external modules such as FDWs may depend on that.

Thanks,
Amit





Re: Quick doc typo fix

2019-05-28 Thread Guillaume Lelarge
On Tue, May 28, 2019 at 21:28, Michael Paquier wrote:

> On Tue, May 28, 2019 at 05:05:10PM +0200, Guillaume Lelarge wrote:
> >   
> > linkend="catalog-pg-am">pg_am
> > -  index access methods
> > +  table and index access methods
> >   
>
> Perhaps we could just say "relation" here?  That's the term used in
> the paragraph describing pg_am.
>

Hehe, that was the first thing I wrote :) but went with "table and index"
as it was also used a bit later in the chapter. Both are fine with me.
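
For reference, in v12 the catalog itself distinguishes the two kinds,
so either wording can be checked against amtype:

SELECT amname, amtype FROM pg_am ORDER BY amtype, amname;
-- amtype is 'i' for index AMs (btree, gin, gist, ...) and 't' for
-- table AMs (heap), which is why "index access methods" alone is too narrow.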


-- 
Guillaume.


Re: Fix comment in pgcrypto tests

2019-05-28 Thread Gurjeet Singh
Thanks!

I have changed the patch status as follows in commitfest [1]
Reviewer: Michael Paquier
Committer: Michael Paquier
Status: committed

[1]: https://commitfest.postgresql.org/23/2132/

Best regards,

On Tue, May 28, 2019 at 3:38 AM Michael Paquier  wrote:

> On Mon, May 27, 2019 at 07:25:37PM -0700, Gurjeet Singh wrote:
> > Please see attached the patch that corrects the file-level SQL comment
> that
> > indicates which submodule of pgcrypto is being tested.
>
> Thanks, committed.  There was a second one in pgp-decrypt.sql.
> --
> Michael
>


-- 
Gurjeet Singh http://gurjeet.singh.im/


Re: Quick doc typo fix

2019-05-28 Thread Michael Paquier
On Tue, May 28, 2019 at 05:05:10PM +0200, Guillaume Lelarge wrote:
>   
> linkend="catalog-pg-am">pg_am
> -  index access methods
> +  table and index access methods
>   

Perhaps we could just say "relation" here?  That's the term used in
the paragraph describing pg_am.
--
Michael




Re: Confusing error message for REINDEX TABLE CONCURRENTLY

2019-05-28 Thread David Rowley
On Tue, 28 May 2019 at 01:23, Ashwin Agrawal  wrote:
> I think we will need to separate out the NOTICE message for concurrent and 
> regular case.
>
> For example this doesn't sound correct
> WARNING:  cannot reindex exclusion constraint index "public.circles_c_excl" 
> concurrently, skipping
> NOTICE:  table "circles" has no indexes to reindex
>
> As no indexes can't be reindexed *concurrently* but there are still indexes 
> which can be reindexed, invalid indexes I think fall in same category.

Swap "can't" for "can" and, yeah. I think it would be good to make the
error messages differ for these two cases. This would serve as a hint
to the user that they might have better luck trying without the
"concurrently" option.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Collation versioning

2019-05-28 Thread Thomas Munro
On Fri, Sep 28, 2018 at 9:30 AM Peter Eisentraut
 wrote:
> On 16/09/2018 20:12, Douglas Doole wrote:
> > All this collation stuff is great, and I know users want it, but it
> > feels like we're pushing them out of an airplane with a ripped parachute
> > every time the collation libraries change. Maybe they'll land safely or
> > maybe things will get very messy.
>
> At some point, a schema designer also needs to take some responsibility
> for making smart choices for longevity.  It is known that collations can
> change, and the sort of changes that can happen are also generally
> understood.  So if you want to use range partitioning on text fields,
> maybe you shouldn't, or at least choose the ranges conservatively.
> Similarly, maybe you shouldn't have timestamp range partition boundaries
> around DST changes or on the 29th of every month, and maybe you
> shouldn't partition float values at negative zero.  Some ideas are
> better than others.  We will help you recognize and fix breakage, but we
> can't prevent it altogether.

Since there's a chance of an "unconference" session on locale versions
tomorrow at PGCon, here's a fresh rebase of the patchset to add
per-database-object collation version tracking.  It doesn't handle
default collations yet (not hard AFAIK, will try that soon), but it
does work well enough to demonstrate the general principle.  I won't
attach the CHECK support just yet, because it needs more work, but the
point of it was to demonstrate that pg_depend can handle this for all
kinds of database objects in one standard way, rather than sprinkling
collation version stuff all over the place in pg_index, pg_constraint,
etc, and I think it did that already.

postgres=# create table t (k text collate "en-x-icu");
CREATE TABLE
postgres=# create index on t(k);
CREATE INDEX
postgres=# select refobjversion from pg_depend where refobjversion != '';
 refobjversion
---
 153.72
(1 row)

Mess with it artificially (or install a different version of ICU):
postgres=# update pg_depend set refobjversion = '42' where
refobjversion = '153.72';
UPDATE 1

In a new session, we get a warning when first loading the index
because the version doesn't match:
postgres=# select * from t where k = 'x';
psql: WARNING:  index "t_k_idx" depends on collation 12711 version
"42", but the current version is "153.72"
DETAIL:  The index may be corrupted due to changes in sort order.
HINT:  REINDEX to avoid the risk of corruption.
 k
---
(0 rows)

The warning can be cleared for the indexes on that one table like so:
postgres=# reindex table t;
REINDEX

You can see that it's captured the new version:
postgres=# select refobjversion from pg_depend where refobjversion != '';
 refobjversion
---
 153.72
(1 row)

-- 
Thomas Munro
https://enterprisedb.com


0001-Remove-pg_collation.collversion.patch
Description: Binary data


0002-Add-pg_depend.refobjversion.patch
Description: Binary data


0003-Track-collation-versions-for-indexes.patch
Description: Binary data


[PATCH] Simple typos fix

2019-05-28 Thread Andrea Gelmini
Hi everybody,
   thanks a lot for your work.

   This is just a stupid patch to fix some typos.
   Thanks a lot to Magnus for the review.

   You can see it also on GitHub,¹ if you prefer, or
   apply it on top of today's latest Git.²

   It passed "make check" on Linux.

Ciao,
Gelma

---

   ¹ 
https://github.com/Gelma/postgres/commit/6c59961f91b89f55b103c57fffa94308dc29546a
   ² commit: d5ec46bf224d2ea1b010b2bc10a65e44d4456553
diff --git a/config/pkg.m4 b/config/pkg.m4
index 13a8890178..f9075e56c8 100644
--- a/config/pkg.m4
+++ b/config/pkg.m4
@@ -86,7 +86,7 @@ dnl Check to see whether a particular set of modules exists. Similar to
 dnl PKG_CHECK_MODULES(), but does not set variables or print errors.
 dnl
 dnl Please remember that m4 expands AC_REQUIRE([PKG_PROG_PKG_CONFIG])
-dnl only at the first occurence in configure.ac, so if the first place
+dnl only at the first occurrence in configure.ac, so if the first place
 dnl it's called might be skipped (such as if it is within an "if", you
 dnl have to call PKG_CHECK_EXISTS manually
 AC_DEFUN([PKG_CHECK_EXISTS],
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index de0a98f6d9..ff13b0c9e7 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -1278,7 +1278,7 @@ bt_right_page_check_scankey(BtreeCheckState *state)
 	 * Routines like _bt_search() don't require *any* page split interlock
 	 * when descending the tree, including something very light like a buffer
 	 * pin. That's why it's okay that we don't either.  This avoidance of any
-	 * need to "couple" buffer locks is the raison d' etre of the Lehman & Yao
+	 * need to "couple" buffer locks is the raison d'être of the Lehman & Yao
 	 * algorithm, in fact.
 	 *
 	 * That leaves deletion.  A deleted page won't actually be recycled by
diff --git a/contrib/citext/expected/citext.out b/contrib/citext/expected/citext.out
index 94aba67cdb..96800be9c0 100644
--- a/contrib/citext/expected/citext.out
+++ b/contrib/citext/expected/citext.out
@@ -2609,7 +2609,7 @@ SELECT citext_pattern_ge('b'::citext, 'A'::citext) AS true;
  t
 (1 row)
 
--- Multi-byte tests below are diabled like the sanity tests above.
+-- Multi-byte tests below are disabled like the sanity tests above.
 -- Uncomment to run them.
 -- Test ~<~ and ~<=~
 SELECT 'a'::citext ~<~  'B'::citext AS t;
diff --git a/contrib/citext/expected/citext_1.out b/contrib/citext/expected/citext_1.out
index 187d3b5d2c..33e3676d3c 100644
--- a/contrib/citext/expected/citext_1.out
+++ b/contrib/citext/expected/citext_1.out
@@ -2609,7 +2609,7 @@ SELECT citext_pattern_ge('b'::citext, 'A'::citext) AS true;
  t
 (1 row)
 
--- Multi-byte tests below are diabled like the sanity tests above.
+-- Multi-byte tests below are disabled like the sanity tests above.
 -- Uncomment to run them.
 -- Test ~<~ and ~<=~
 SELECT 'a'::citext ~<~  'B'::citext AS t;
diff --git a/contrib/citext/sql/citext.sql b/contrib/citext/sql/citext.sql
index 0cc909eb52..261b73cfa6 100644
--- a/contrib/citext/sql/citext.sql
+++ b/contrib/citext/sql/citext.sql
@@ -810,7 +810,7 @@ SELECT citext_pattern_ge('b'::citext, 'a'::citext) AS true;
 SELECT citext_pattern_ge('B'::citext, 'a'::citext) AS true;
 SELECT citext_pattern_ge('b'::citext, 'A'::citext) AS true;
 
--- Multi-byte tests below are diabled like the sanity tests above.
+-- Multi-byte tests below are disabled like the sanity tests above.
 -- Uncomment to run them.
 
 -- Test ~<~ and ~<=~
diff --git a/contrib/isn/EAN13.h b/contrib/isn/EAN13.h
index 7023ebdf63..c1bea862da 100644
--- a/contrib/isn/EAN13.h
+++ b/contrib/isn/EAN13.h
@@ -54,7 +54,7 @@ const char *EAN13_range[][2] = {
 	{"484", "484"},/* GS1 Moldova */
 	{"485", "485"},/* GS1 Armenia */
 	{"486", "486"},/* GS1 Georgia */
-	{"487", "487"},/* GS1 Kazakstan */
+	{"487", "487"},/* GS1 Kazakhstan */
 	{"489", "489"},/* GS1 Hong Kong */
 	{"490", "499"},/* GS1 Japan */
 	{"500", "509"},/* GS1 UK */
diff --git a/contrib/pg_trgm/data/trgm2.data b/contrib/pg_trgm/data/trgm2.data
index 664e079fec..d52bcf7127 100644
--- a/contrib/pg_trgm/data/trgm2.data
+++ b/contrib/pg_trgm/data/trgm2.data
@@ -18,7 +18,7 @@ Daikalay
 Bakall
 Stubaital
 Neustift im Stubaital
-Anonyme Appartments Stubaital
+Anonyme Apartments Stubaital
 Barkaladja Pool
 Awabakal Nature Reserve
 Awabakal Field Studies Centre
diff --git a/contrib/pgcrypto/imath.c b/contrib/pgcrypto/imath.c
index 6936d2cdca..9ad5b24644 100644
--- a/contrib/pgcrypto/imath.c
+++ b/contrib/pgcrypto/imath.c
@@ -3366,14 +3366,14 @@ s_udiv_knuth(mp_int u, mp_int v)
 		 * D4,D5,D6: Multiply qhat * v and test for a correct value of q
 		 *
 		 * We proceed a bit different than the way described by Knuth. This
-		 * way is simpler but less efficent. Instead of doing the multiply and
+		 * way is simpler but less efficient. Instead of doing the multiply and
 		 * subtract then checking for underflow, we first do the multiply of
 		 * qhat * v and see if it is larger than the 

Re: Indexing - comparison of tree structures

2019-05-28 Thread Andres Freund
Hi,

On 2019-05-27 12:40:07 +0200, Sascha Kuhl wrote:
> Where can I find research on trees and indexing related to postgresql?

1) Please respect the list style of properly quoting responses inline,
   and only responding to messages that are somewhat related to the
   previous content
2) You ask a lot of questions, without actually responding to responses
3) Please do some of your own research, before asking
   questions. E.g. there's documentation about our btree implementation
   etc in our source tree.

Regards,

Andres




Re: Indexing - comparison of tree structures

2019-05-28 Thread Sascha Kuhl
Dear moderator,

Can you inform me after you (as a mailing list) have changed something
related to my work? I'd like to keep track of my success.

Regards

Sascha Kuhl

Sascha Kuhl  wrote on Mon., May 27, 2019, 16:07:

> Would not
>
> Sascha Kuhl  wrote on Mon., May 27, 2019, 16:06:
>
>> To give you another fair hint: big O estimations would have revealed such
>> a difference.
>>
>> Sascha Kuhl  wrote on Mon., May 27, 2019, 14:06:
>>
>>> I understand that changing is never easy
>>>
>>> Sascha Kuhl  wrote on Mon., May 27, 2019, 13:52:
>>>
 You don't have to be rude: social communication is higher than looking
 and studying in a mailing list db. For me, at least;)

 Thanks for the direction and permission (I'm respectful with the work
 of others)

 Jonah H. Harris  wrote on Mon., May 27, 2019,
 13:05:

> On Mon, May 27, 2019 at 5:14 AM Sascha Kuhl 
> wrote:
>
>> Can you point me to the research showing b-tree is equally
>> performant? Is postgres taking this research into account?
>>
>
> Not trying to be rude, but you've been asking rather general
> questions; our mailing list is archived, searchable, and probably a better
> use of everyone's time for you to consult prior to posting. Per your
> question, to my knowledge, there is no active work on changing our primary
> b-tree index structure, which is based on Lehman and Yao's b-link tree.
> Given the maturity of our current implementation, I think it would be
> rather difficult to improve upon it in terms of performance, especially
> considering concurrency-related issues.
>
> --
> Jonah H. Harris
>
>


Re: Indexing - comparison of tree structures

2019-05-28 Thread Sascha Kuhl
Where can I find research on trees and indexing related to postgresql?

Sascha Kuhl  wrote on Mon., May 27, 2019, 11:14:

> Can you point me to the research showing b-tree is equally performant? Is
> postgres taking this research into account?
>
> Jonah H. Harris  wrote on Sat., May 25, 2019,
> 02:15:
>
>> T-tree (and variants) are index types commonly associated with in-memory
>> database management systems and rarely, if-ever, used with on-disk
>> databases. There has been a lot of research in regard to more modern cache
>> conscious/oblivious b-trees that perform equally or better than t-tree.
>> What’s the use-case?
>>
>> On Fri, May 24, 2019 at 5:38 AM Sascha Kuhl 
>> wrote:
>>
>>> Hi,
>>>
>>> I compared two data structures realistically by time, after estimating
>>> big O. T-tree outperforms b-tree, which is commonly used, for a medium size
>>> table. Lehmann and Carey showed the same, earlier.
>>>
>>> Can you improve indexing by this?
>>>
>>> Understandably
>>>
>>> Sascha Kuhl
>>>
>> --
>> Jonah H. Harris
>>
>>


Re: initdb recommendations

2019-05-28 Thread Magnus Hagander
On Fri, May 24, 2019 at 11:24 AM Noah Misch  wrote:

> On Thu, May 23, 2019 at 06:56:49PM +0200, Magnus Hagander wrote:
> > On Thu, May 23, 2019, 18:54 Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
> > > To recap, the idea here was to change the default authentication
> methods
> > > that initdb sets up, in place of "trust".
> > >
> > > I think the ideal scenario would be to use "peer" for local and some
> > > appropriate password method (being discussed elsewhere) for host.
> > >
> > > Looking through the buildfarm, I gather that the only platforms that
> > > don't support peer are Windows, AIX, and HP-UX.  I think we can
> probably
> > > figure out some fallback or alternative default for the latter two
> > > platforms without anyone noticing.  But what should the defaults be on
> > > Windows?  It doesn't have local sockets, so the lack of peer wouldn't
> > > matter.  But is it OK to default to a password method, or would that
> > > upset people particularly?
> >
> > I'm sure password would be fine there. It's what "everybody else" does
> > (well, sqlserver also does integrated security, but people are used to
> it).
>
> Our sspi auth is a more-general version of peer auth, and it works over
> TCP.  It would be a simple matter of programming to support "peer" on
> Windows, consisting of sspi auth with an implicit pg_ident map.
> Nonetheless, I agree password would be fine.
>

I hope you don't mean "make peer use sspi on windows". I think that's a
really bad idea from a confusion perspective.

However, having the default pg_hba.conf file contain a "reasonable setup
using sspi" is a different story.

But I wonder if that isn't better implemented at the installer level. I
think we're better off doing something like scram as the config when you
build from source, and then encouraging installers to do other things based
on the fact that they know more information about the setup (such as
usernames actually used).
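
To make that concrete, a sketch of what a scram-by-default pg_hba.conf
from a source build could look like (illustrative, not settled wording):

# TYPE  DATABASE  USER  ADDRESS       METHOD
local   all       all                 peer
host    all       all   127.0.0.1/32  scram-sha-256
host    all       all   ::1/128       scram-sha-256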

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: PG 12 draft release notes

2019-05-28 Thread Andres Freund
Hi,

Add function pg_partition_root()
to return top-most parent of a partition tree (Michaël Paquier)

Add function pg_partition_ancestors()
to report all ancestors of a partition (Álvaro Herrera)

Add function pg_partition_tree()
to display information about partitions (Amit Langote)

Can we combine these three?
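
For context, a quick sketch of what the three functions return, on a
hypothetical two-level partition tree:

CREATE TABLE m (i int) PARTITION BY RANGE (i);
CREATE TABLE m1 PARTITION OF m FOR VALUES FROM (0) TO (10);

SELECT pg_partition_root('m1');                   -- m
SELECT relid FROM pg_partition_ancestors('m1');   -- m1, then m
SELECT relid, isleaf, level FROM pg_partition_tree('m');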

Greetings,

Andres Freund




Re: Question about some changes in 11.3

2019-05-28 Thread Mat Arye
On Fri, May 24, 2019 at 5:05 PM Mat Arye  wrote:

> Hi,
>
> 11.3 included some changes to partition table planning. Namely,
> commit 925f46f ("Fix handling of targetlist SRFs when scan/join relation is
> known empty.") seems to redo all paths for partitioned tables
> in apply_scanjoin_target_to_paths. It clears the paths in:
>
> ```
>  if (rel_is_partitioned)
> rel->pathlist = NIL
> ```
>
> Then the code rebuilds the paths. However, the rebuilt append path
> never gets the set_rel_pathlist_hook called. Thus, the work that hook
> did previously gets thrown away, and the rebuilt append path can never
> be influenced by this hook. Is this intended behavior? Am I missing
> something?
>
> Thanks,
> Mat
> TimescaleDB
>

I've attached a small patch to address this discrepancy in when the
set_rel_pathlist_hook is called, so that it is called for the actual
paths used for partitioned rels. Please let me know if I am
misunderstanding how this should be handled.


call_set_rel_pathlist_hook_on_part_rels.patch
Description: Binary data


Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-05-28 Thread Andres Freund
Hi,

On 2019-04-07 18:04:27 -0700, Andres Freund wrote:
> Here's a *prototype* patch for this.  It only implements what I
> described for heap_multi_insert, not for plain inserts. I wanted to see
> what others think before investing additional time.

Pavan, are you planning to work on this for v13 CF1? Or have you lost
interest in the topic?

- Andres




Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

2019-05-28 Thread Antonin Houska
Thanks of another round of review.

Robert Haas  wrote:

> On Wed, Apr 24, 2019 at 10:48 AM Antonin Houska  wrote:
> > Attached is the next version. It tries to address various problems
> > pointed out upthread, including documentation.
> 
> It's not immediately evident to me, perhaps because I haven't looked
> at this stuff in a while, what the purpose of 0001 and 0002 is.  I
> think it'd be helpful if you would generate these patches with 'git
> format-patch' and include a meaningful commit message in each patch of
> the series.

ok, will do.

> +   
> +This setting specifies path to executable file that returns
> +either encryption_key= for key
> +and encryption_password= for password.
> +   
> 
> This is incomprehensible.  Executables don't return strings.

That's right, the key should be written to the standard output.

> Also, you can return "either a or b" or you can return "both a and b," but
> you can't return "either a and b".

Sure, this is a thinko.

> - Build with support for SSL (encrypted)
> - connections. This requires the OpenSSL
> - package to be installed.  configure will check
> - for the required header files and libraries to make sure that
> + Build with support for on-disk data encryption
> + or SSL (encrypted) connections. This requires
> + the OpenSSL package to be
> + installed.  configure will check for the
> + required header files and libraries to make sure that
> 
> I think we should instead just remove the word 'connections'.  We
> don't want to have multiple places in the documentation listing all
> the things for which OpenSSL is required.

ok

> Generally, the documentation changes here have a lot of stuff about
> key-or-password, which is quite confusing.  It seems like the idea is
> that you accept either a key or a password from which a key is
> derived, but I don't think that's explained super-clearly, and I
> wonder whether it's unnecessary complexity.  Why not just require a
> key always?

We thought that it's more convenient for the administrator to enter a
password. To achieve that, we can instead tell the admin how to use the key
derivation tool (pg_keysetup) and send its output to postgres. So yes, it's
possible to always receive the key from the "encryption key command". I'll
change this.

> Another general point is that the documentation, comments, and README
> need some work on the English.  They contain lots of information, but
> they read somewhat awkwardly and are hard to understand in some
> places.

ok, I'll try to explain the tricky things in simple sentences.

> Wherever you are adding a large hunk of code into the middle of an
> existing function (e.g. XLogWrite), please consider putting the logic
> in a new function and calling it from the original function, instead
> of just inlining the entire thing.

ok

> xloginsert.c adds a new #include even though there are no other
> changes to that file.  That should not be necessary.  If it is, you've
> broken the principle the header files are compilable independently of
> other header files (except postgres.h, which is a prerequisite for
> every header file).

It's not intentional, I failed to remove it when moving some encryption
related function calls elsewhere.

> -XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr,
> - Size count)
> +XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr,
> Size count)

> Spurious hunk.  Please try to avoid unnecessary changes.

Maybe I added and later removed some argument and forgot to enforce the
maximum line length. I expected pg_indent to fix things like this but it
apparently does not. I'll pay more attention to such things next time.

> + * XXX Currently the function is only called with startptr at page
> + * boundary. If it should change, encryption needs to reflect this fact,
> + * i.e. read data from the beginning of the page. Actually this is a
> + * reason to adjust and use walsender.c:XLogRead().
> 
> This comment and the similar comment at the end of the function are
> not very clear.   It sorta sounds, though, like maybe the decryption
> logic should move from here to the caller or something.  You seem to
> be saying that, after your changes, this function depends on the
> caller using it in a certain way, which is often a sign that you
> haven't abstracted things in the right way.  And if walsender.c's
> version of the logic is better, why not also do that stuff here?

Please consider this an incomplete implementation of the XLOG
decryption. Eventually these restrictions will be relaxed and the decryption
will be hidden from the caller. Currently there are two implementations of
XLogRead() in the backend (plus one in the frontend) and I didn't want to
spend too much effort on merging the decryption into core until this other
patch gets committed:

https://commitfest.postgresql.org/23/2098/

The current XLogRead() functions 

Quick doc typo fix

2019-05-28 Thread Guillaume Lelarge
Issue found while translating the v12 manual. I also fixed something that
was missing, as far as I understand it (the first fix; the typo is the
second fix).

See patch attached.

Thanks.


-- 
Guillaume.
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 4c7e93892a..f81f3d4160 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -57,7 +57,7 @@
 
  
   pg_am
-  index access methods
+  table and index access methods
  
 
  
diff --git a/doc/src/sgml/tablesample-method.sgml b/doc/src/sgml/tablesample-method.sgml
index 880e42c950..68eea68077 100644
--- a/doc/src/sgml/tablesample-method.sgml
+++ b/doc/src/sgml/tablesample-method.sgml
@@ -264,7 +264,7 @@ NextSampleTuple (SampleScanState *node,
 requests to sample missing or invisible tuples; that should not result in
 any bias in the sample.  However, if necessary, the function can use
node->donetuples to examine how many of the tuples
-it returned were vlaid and visible.
+it returned were valid and visible.

   
 


Re: Alternate methods for multiple rows input/output to a function.

2019-05-28 Thread Adrian Klaver

On 5/28/19 7:36 AM, RAJIN RAJ K wrote:

--> Function 'filter_id' filters the IDs based on some conditions.
--> Input is a set of IDs. (Not directly taking the input since there is
no provision to pass multiple rows to a function)


To be honest I cannot follow what you are trying to achieve below. I do 
have one suggestion as to creating temp tables.


Why not use a CTE:

https://www.postgresql.org/docs/11/queries-with.html

in the function to build a 'temp' table on the fly?
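
For instance, a minimal sketch of that idea (table names are hypothetical,
and the array parameter is just one possible way to hand the rows in
without a pre-created temp table):

CREATE FUNCTION filter_id(input_ids bigint[])
RETURNS TABLE (id bigint)
LANGUAGE sql AS
$$
    WITH input AS (                     -- plays the role of #temp_input_id
        SELECT unnest(input_ids) AS id
    )
    SELECT a.id
    FROM tbl a
    JOIN input b ON a.id = b.id;
$$;

-- The caller passes the rows as an array instead of creating a temp table:
SELECT * FROM filter_id(ARRAY(SELECT id FROM tbla));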



create function filter_id()
return table (id bigint)
begin

--> Assuming input table is already created #temp_input_id

return query as select id
from tbl a
inner join
#temp_input_id b on (a.id = b.id)
where a.;

end;


--> Calling Function:

create function caller()
return table (id bigint,col1 bigint, col2 bigint)
begin

--> do some processing

--> Find out the input id's for filtering.

--> Create temp table for providing input for the filtering function

create temp table #TEMP1
as select id from tbla;
(Cannot move the input id logic to  filter_function)

--> calling the filter function
create temp table #TEMP2
as select * from filter_id(); --> This is a generic function used in 
many functions.



return query
as select a.*
from tb3 a inner join tb4 inner join tb 5 inner join #TEMP2;
end;


Is there any alternate way of achieving this? For passing multiple
records to a function, I'm creating a temp table before invoking the
function. For receiving an output of multiple rows, I'm creating a temp
table to reuse further in the code.


Can this be done using a refcursor? Is it possible to convert a refcursor
to a temp table and use it as a normal table in a query?






--
Adrian Klaver
adrian.kla...@aklaver.com




Alternate methods for multiple rows input/output to a function.

2019-05-28 Thread RAJIN RAJ K
--> Function 'filter_id' filters the IDs based on some conditions.
--> Input is a set of IDs. (Not directly taking the input since there is no
provision to pass multiple rows to a function)

create function filter_id()
return table (id bigint)
begin

--> Assuming input table is already created #temp_input_id

return query as select id
from tbl a
inner join
#temp_input_id b on (a.id = b.id)
where a.;

end;


--> Calling Function:

create function caller()
return table (id bigint,col1 bigint, col2 bigint)
begin

--> do some processing

--> Find out the input id's for filtering.

--> Create temp table for providing input for the filtering function

create temp table #TEMP1
as select id from tbla;
(Cannot move the input id logic to  filter_function)

--> calling the filter function
create temp table #TEMP2
as select * from filter_id(); --> This is a generic function used in many
functions.


return query
as select a.*
from tb3 a inner join tb4 inner join tb 5 inner join #TEMP2;
end;


Is there any alternate way of achieving this? For passing multiple records
to a function, I'm creating a temp table before invoking the function.
For receiving an output of multiple rows, I'm creating a temp table to
reuse further in the code.

Can this be done using a refcursor? Is it possible to convert a refcursor
to a temp table and use it as a normal table in a query?


Re: Add command column to pg_stat_progress_create_index

2019-05-28 Thread David Fetter
On Mon, May 27, 2019 at 11:20:28AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2019-05-27 14:18:12 -0400, Peter Eisentraut wrote:
> > I propose to add a column "command" to pg_stat_progress_create_index.
> > The sibling view pg_stat_progress_cluster already contains such a
> > column.  This can help distinguish which command is running and thus
> > which phases to expect.  It seems reasonable to keep these views
> > consistent, too.  (They are both new in PG12.)  Patch attached.
> 
> Seems like we should do that for v12 then?

+1

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Remove page-read callback from XLogReaderState.

2019-05-28 Thread Andres Freund
Hi,

On 2019-04-18 21:02:57 +0900, Kyotaro HORIGUCHI wrote:
> Hello. As mentioned before [1], the read_page callback in
> XLogReaderState is a cause of headaches. Adding another piece of
> remote-controlling stuff to xlog readers makes things messier [2].
> 
> I refactored the XLOG reading functions so that we don't need the
> callback. In short, ReadRecord now calls XLogPageRead directly
> with the attached patch set.
> 
> | while (XLogReadRecord(xlogreader, RecPtr, , )
> |== XLREAD_NEED_DATA)
> | XLogPageRead(xlogreader, fetching_ckpt, emode, randAccess);
> 
> On the other hand, XLogReadRecord became a bit complex. The patch
> makes XLogReadRecord a state machine. I'm not confident that the
> additional complexity is worth it. Anyway, I'll register this
> in the next CF.

Just FYI, to me this doesn't clearly enough look like an improvement,
for a change of this size.

Greetings,

Andres Freund




Re: Fix inconsistencies for v12

2019-05-28 Thread Amit Kapila
On Tue, May 28, 2019 at 12:29 PM Amit Langote
 wrote:
>
> On 2019/05/28 14:00, Alexander Lakhin wrote:
> > 28.05.2019 2:05, Amit Kapila wrote:
> >> ... If we read the comment atop ExecContextForcesOids
> >> [1] before it was removed, it seems to indicate that the
> >> initialization of es_result_relation_info for each subplan is for its
> >> usage in ExecContextForcesOids.  I have run the regression tests with
> >> the attached patch (which removes changing es_result_relation_info in
> >> ExecInitModifyTable) and all the tests passed.  Do you have any
> >> thoughts on this matter?
> >
> > I got a coredump with `make installcheck-world` (on postgres_fdw test):
> > Core was generated by `postgres: law contrib_regression [local]
> > UPDATE   '.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x7ff1410ece98 in postgresBeginDirectModify
> > (node=0x560a563fab30, eflags=0) at postgres_fdw.c:2363
> > 2363rtindex =
> > estate->es_result_relation_info->ri_RangeTableIndex;
> > (gdb) bt
> > #0  0x7ff1410ece98 in postgresBeginDirectModify
> > (node=0x560a563fab30, eflags=0) at postgres_fdw.c:2363
> > #1  0x560a55979e62 in ExecInitForeignScan
> > (node=node@entry=0x560a56254dc0, estate=estate@entry=0x560a563f9ae8,
> > eflags=eflags@entry=0) at nodeForeignscan.c:227
> > #2  0x560a5594e123 in ExecInitNode (node=node@entry=0x560a56254dc0,
> > estate=estate@entry=0x560a563f9ae8,
> > eflags=eflags@entry=0) at execProcnode.c:277
> > ...
> > So it seems that this is not dead code.
>
> > ... So it seems that
> > this comment at least diverged from the initial author's intent.
> > With this in mind, I am inclined to just remove it.
>
> Seeing that the crash occurs due to postgres_fdw relying on
> es_result_relation_info being set when initializing a "direct
> modification" plan on foreign tables managed by it, we could change the
> comment to say that instead.  Note that allowing "direct modification" of
> foreign tables is a core feature, so there's no postgres_fdw-specific
> behavior here; there may be other FDWs that support "direct modification"
> plans and so likewise rely on es_result_relation_info being set.
>


Can we ensure in some way that only FDWs rely on it and not any other
part of the code?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Add command column to pg_stat_progress_create_index

2019-05-28 Thread Michael Paquier
On Mon, May 27, 2019 at 11:20:28AM -0700, Andres Freund wrote:
> On 2019-05-27 14:18:12 -0400, Peter Eisentraut wrote:
>> I propose to add a column "command" to pg_stat_progress_create_index.
>> The sibling view pg_stat_progress_cluster already contains such a
>> column.  This can help distinguish which command is running and thus
>> which phases to expect.  It seems reasonable to keep these views
>> consistent, too.  (They are both new in PG12.)  Patch attached.
> 
> Seems like we should do that for v12 then?

+1.
--
Michael




Re: Fix comment in pgcrypto tests

2019-05-28 Thread Michael Paquier
On Mon, May 27, 2019 at 07:25:37PM -0700, Gurjeet Singh wrote:
> Please see attached the patch that corrects the file-level SQL comment that
> indicates which submodule of pgcrypto is being tested.

Thanks, committed.  There was a second one in pgp-decrypt.sql.
--
Michael




Re: Remove useless associativity/precedence from parsers

2019-05-28 Thread Andrew Gierth
> "Akim" == Akim Demaille  writes:

 >> Yeah, we've definitely found that resolving shift/reduce conflicts
 >> via precedence declarations has more potential for surprising
 >> side-effects than one would think.

 Akim> That's why in recent versions of Bison we also provide a means to
 Akim> put %expect directives on the rules themselves, to be more
 Akim> precise about what happens.

It's possibly worth looking at the details of each case where we've run
into problems to see whether there is a better solution.

The main cases I know of are:

1. RANGE UNBOUNDED PRECEDING - this one is actually a defect in the
standard SQL grammar, since UNBOUNDED is a non-reserved keyword and so
it can also appear as a legal <identifier>, and the construct
RANGE <expression> PRECEDING allows an <identifier> to
appear as an <expression>.

We solve this by giving UNBOUNDED a precedence below PRECEDING.
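
A SQL-level sketch of why the sequence is ambiguous at all (hypothetical
table):

CREATE TABLE t (x int, unbounded int);  -- legal: UNBOUNDED is non-reserved
-- After RANGE, the parser must choose between the keyword reading and the
-- start of an expression; the precedence assignment makes the keyword win:
SELECT sum(x) OVER (ORDER BY x RANGE UNBOUNDED PRECEDING) FROM t;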

2. CUBE() - in the SQL spec, GROUP BY does not allow expressions, only
column references, but we allow expressions as an extension. The syntax
GROUP BY CUBE(a,b) is a shorthand for grouping sets, but this is
ambiguous with a function cube(...). (CUBE is also a reserved word in the
spec, but it's an unreserved keyword for us.)

We solve this by giving CUBE (and ROLLUP) precedence below '('.

3. General handling of postfix operator conflicts

The fact that we allow postfix operators means that any sequence which
looks like <expression> <identifier> is ambiguous. This affects the use
of aliases in the SELECT list, and also PRECEDING, FOLLOWING, GENERATED,
and NULL can all follow expressions.

4. Not reserving words that the spec says should be reserved

We avoid reserving PARTITION, RANGE, ROWS, GROUPS by using precedence
hacks.

-- 
Andrew (irc:RhodiumToad)




Re: accounting for memory used for BufFile during hash joins

2019-05-28 Thread Hubert Zhang
Hi Tomas,

Here is the patch. It should be compatible with your patch, and it focuses
on when to grow the number of batches.


On Tue, May 28, 2019 at 3:40 PM Hubert Zhang  wrote:

> On Sat, May 4, 2019 at 8:34 AM Tomas Vondra 
> wrote:
>
>> The root cause is that hash join treats batches as pretty much free, but
>> that's not really true - we do allocate two BufFile structs per batch,
>> and each BufFile is ~8kB as it includes PGAlignedBuffer.
>>
>> The OOM is not very surprising, because with 524288 batches it'd need
>> about 8GB of memory, and the system only has 8GB RAM installed.
>>
>> The second patch tries to enforce work_mem more strictly. That would be
>> impossible if we were to keep all the BufFile structs in memory, so
>> instead it slices the batches into chunks that fit into work_mem, and
>> then uses a single "overflow" file for slices currently not in memory.
>> These extra slices can't be counted into work_mem, but we should need
>> just very few of them. For example with work_mem=4MB the slice is 128
>> batches, so we need 128x less overflow files (compared to per-batch).
>>
>>
> Hi Tomas
>
> I read your second patch which uses overflow buf files to reduce the total
> number of batches.
> It would solve the hash join OOM problem you discussed above: 8KB per
> batch leads to the batch bloating problem.
>
> I mentioned in another thread:
>
> https://www.postgresql.org/message-id/flat/CAB0yrekv%3D6_T_eUe2kOEvWUMwufcvfd15SFmCABtYFOkxCFdfA%40mail.gmail.com
> There is another hashjoin OOM problem, caused by disabling the splitting
> of batches too early. PG uses a flag, hashtable->growEnable, to determine
> whether to split batches. Once one split fails (all the tuples are
> assigned to only one of the two split batches), the growEnable flag is
> turned off forever.
> 
> This is the opposite side of the batch bloating problem: there are too
> few batches, and the in-memory hash table grows too large to fit into
> memory.
>
> Here is the tradeoff: each batch takes more than 8KB (8KB makes sense,
> due to performance), the in-memory hash table takes memory as well, and
> splitting batches may (not must) reduce the in-memory hash table size
> but introduces more batches (and thus more memory usage, 8KB * #batches).
> Can we conclude that splitting is worthwhile only if:
> (the reduced memory of the in-memory hash table) - (8KB * number of new
> batches) > 0
>
> So I'm considering combining our patch with your patch to fix the join
> OOM problem, no matter whether the OOM is introduced by (the memory usage
> of the in-memory hash table) or by (8KB * number of batches).
> 
> nbatch_inmemory in your patch could also be redefined using the rule above.
>
> What's your opinion?
>
> Thanks
>
> Hubert Zhang
>


-- 
Thanks

Hubert Zhang


0001-Allow-to-continue-to-split-batch-when-tuples-become-.patch
Description: Binary data


Re: Why does pg_checksums -r not have a long option?

2019-05-28 Thread Fabien COELHO



|I have no problem with changing it to -r. -f seems a bit wrong to me,
|as it might read as a file. And in the future we might want to implement
|the ability to take full filename (with path), in which case it would
|make sense to use -f for that.


You could also use a long option for that without a one-letter option, 
like --file-path or such, so reserving a one-letter option for a future, 
hypothetical use is not really a stopper in my opinion.  In consequence, 
I think that it is fine to just use -f/--filenode.


Yep. Also, the -f option could be overloaded by guessing whether its 
associated argument is a number or a path…



Any objections or better suggestions from other folks here?


--
Fabien.

Re: Remove page-read callback from XLogReaderState.

2019-05-28 Thread Antonin Houska
Kyotaro HORIGUCHI  wrote:

> v3-0001 : Changed macros as suggested.

This attachment is missing, please send it too.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: accounting for memory used for BufFile during hash joins

2019-05-28 Thread Hubert Zhang
On Sat, May 4, 2019 at 8:34 AM Tomas Vondra 
wrote:

> The root cause is that hash join treats batches as pretty much free, but
> that's not really true - we do allocate two BufFile structs per batch,
> and each BufFile is ~8kB as it includes PGAlignedBuffer.
>
> The OOM is not very surprising, because with 524288 batches it'd need
> about 8GB of memory, and the system only has 8GB RAM installed.
>
> The second patch tries to enforce work_mem more strictly. That would be
> impossible if we were to keep all the BufFile structs in memory, so
> instead it slices the batches into chunks that fit into work_mem, and
> then uses a single "overflow" file for slices currently not in memory.
> These extra slices can't be counted into work_mem, but we should need
> just very few of them. For example with work_mem=4MB the slice is 128
> batches, so we need 128x less overflow files (compared to per-batch).
>
>
Hi Tomas

I read your second patch which uses overflow buf files to reduce the total
number of batches.
It would solve the hash join OOM problem you discussed above: 8KB per
batch leads to the batch bloating problem.

I mentioned in another thread:

https://www.postgresql.org/message-id/flat/CAB0yrekv%3D6_T_eUe2kOEvWUMwufcvfd15SFmCABtYFOkxCFdfA%40mail.gmail.com
There is another hashjoin OOM problem, caused by disabling the splitting
of batches too early. PG uses a flag, hashtable->growEnable, to determine
whether to split batches. Once one split fails (all the tuples are
assigned to only one of the two split batches), the growEnable flag is
turned off forever.

This is the opposite side of the batch bloating problem: there are too few
batches, and the in-memory hash table grows too large to fit into memory.

Here is the tradeoff: each batch takes more than 8KB (8KB makes sense, due
to performance), the in-memory hash table takes memory as well, and
splitting batches may (not must) reduce the in-memory hash table size but
introduces more batches (and thus more memory usage, 8KB * #batches).
Can we conclude that splitting is worthwhile only if:
(the reduced memory of the in-memory hash table) - (8KB * number of new
batches) > 0
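
As a worked instance of that inequality (numbers purely illustrative):

\[ \Delta M_{\mathrm{hash\ table}} - 8\,\mathrm{kB} \times N_{\mathrm{new\ batches}} > 0 \]

e.g. doubling nbatch from 1024 to 2048 creates 1024 new batches, costing
1024 * 8KB = 8MB of BufFile overhead, so that split pays off only if it
shrinks the in-memory hash table by more than 8MB.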

So I'm considering combining our patch with your patch to fix the join OOM
problem, no matter whether the OOM is introduced by (the memory usage of the
in-memory hash table) or by (8KB * number of batches).

nbatch_inmemory in your patch could also be redefined using the rule above.

What's your opinion?

Thanks

Hubert Zhang


Re: Fix inconsistencies for v12

2019-05-28 Thread Amit Langote
On 2019/05/28 14:00, Alexander Lakhin wrote:
> 28.05.2019 2:05, Amit Kapila wrote:
>> ... If we read the comment atop ExecContextForcesOids
>> [1] before it was removed, it seems to indicate that the
>> initialization of es_result_relation_info for each subplan is for its
>> usage in ExecContextForcesOids.  I have run the regression tests with
>> the attached patch (which removes changing es_result_relation_info in
>> ExecInitModifyTable) and all the tests passed.  Do you have any
>> thoughts on this matter?
>
> I got a coredump with `make installcheck-world` (on postgres_fdw test):
> Core was generated by `postgres: law contrib_regression [local]
> UPDATE   '.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x7ff1410ece98 in postgresBeginDirectModify
> (node=0x560a563fab30, eflags=0) at postgres_fdw.c:2363
> 2363    rtindex =
> estate->es_result_relation_info->ri_RangeTableIndex;
> (gdb) bt
> #0  0x7ff1410ece98 in postgresBeginDirectModify
> (node=0x560a563fab30, eflags=0) at postgres_fdw.c:2363
> #1  0x560a55979e62 in ExecInitForeignScan
> (node=node@entry=0x560a56254dc0, estate=estate@entry=0x560a563f9ae8,
>     eflags=eflags@entry=0) at nodeForeignscan.c:227
> #2  0x560a5594e123 in ExecInitNode (node=node@entry=0x560a56254dc0,
> estate=estate@entry=0x560a563f9ae8,
>     eflags=eflags@entry=0) at execProcnode.c:277
> ...
> So it seems that this is not dead code.

> ... So it seems that
> this comment at least diverged from the initial author's intent.
> With this in mind, I am inclined to just remove it.

Seeing that the crash occurs due to postgres_fdw relying on
es_result_relation_info being set when initializing a "direct
modification" plan on foreign tables managed by it, we could change the
comment to say that instead.  Note that allowing "direct modification" of
foreign tables is a core feature, so there's no postgres_fdw-specific
behavior here; there may be other FDWs that support "direct modification"
plans and so likewise rely on es_result_relation_info being set.

How about:

diff --git a/src/backend/executor/nodeModifyTable.c
b/src/backend/executor/nodeModifyTable.c
index a3c0e91543..95545c9472 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2316,7 +2316,7 @@ ExecInitModifyTable(ModifyTable *node, EState
*estate, int eflags)
  * verify that the proposed target relations are valid and open their
  * indexes for insertion of new index entries.  Note we *must* set
  * estate->es_result_relation_info correctly while we initialize each
- * sub-plan; ExecContextForcesOids depends on that!
+ * sub-plan; FDWs may depend on that.
  */
 saved_resultRelInfo = estate->es_result_relation_info;

Thanks,
Amit





Re: BEFORE UPDATE trigger on postgres_fdw table not work

2019-05-28 Thread Etsuro Fujita
Hi,

On Tue, May 28, 2019 at 12:54 PM Amit Langote
 wrote:
> On 2019/05/27 22:02, Tom Lane wrote:
> > Amit Langote  writes:
> >> On 2019/05/27 10:52, Shohei Mochizuki wrote:
> >>> I noticed that returning a modified record in a row-level BEFORE UPDATE
> >>> trigger on postgres_fdw foreign tables does not work. The attached
> >>> patch fixes this issue.
> >>> This is because the current fdw code adds only columns to RemoteSQL
> >>> that were explicitly targets of the UPDATE, as follows.
> >
> >> Yeah.  So, the trigger execution correctly modifies the existing tuple
> >> fetched from the remote server, but those changes are then essentially
> >> discarded by postgres_fdw, that is, postgresExecForeignModify().

> > Perhaps, if the table has relevant BEFORE triggers, we should just abandon
> > our attempts to optimize away fetching/storing all columns?  It seems like
> > another potential hazard here is a trigger needing to read a column that
> > is not mentioned in the SQL query.

> So, the only problem here is the optimizing away of storing all columns,
> which Mochizuki-san's patch seems enough to fix.
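
(For readers of the archive, a minimal sketch of the reported behavior,
assuming a postgres_fdw server "loopback" and a matching remote table are
already set up:)

CREATE FOREIGN TABLE ft (a int, b int)
    SERVER loopback OPTIONS (table_name 'remote_t');

CREATE FUNCTION bump_b() RETURNS trigger LANGUAGE plpgsql AS
$$ BEGIN NEW.b := NEW.b + 1; RETURN NEW; END; $$;

CREATE TRIGGER ft_bump BEFORE UPDATE ON ft
    FOR EACH ROW EXECUTE FUNCTION bump_b();

-- The trigger's change to NEW.b is lost: the remote UPDATE built by
-- postgres_fdw only sets the explicitly targeted column "a".
UPDATE ft SET a = 1;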

Will look into the patch after returning from PGCon, unless somebody wants to.

Best regards,
Etsuro Fujita