Re: zheap: a new storage format for PostgreSQL

2018-10-31 Thread Amit Kapila
On Sat, May 26, 2018 at 6:33 PM Amit Kapila  wrote:
> On Fri, Mar 2, 2018 at 4:05 PM, Alexander Korotkov
>  wrote:
>
> It's been a while since we have updated the progress on this project,
> so here is an update.
>

Yet another update.

>  This is based on the features that were not
> working (as mentioned in Readme.md) when the branch was published.
> 1. TID Scans are working now.
> 2. Insert .. On Conflict is working now.
> 3. Tuple locking is working with a restriction that if there are more
> concurrent lockers on a page than the number of transaction slots on a
> page, then some of the lockers will wait till others get committed.
> We are working on a solution to extend the number of transaction slots
> on a separate set of pages which exist in heap, but will contain only
> transaction data.
>

Now, we have a working solution for this problem.  The extended
transaction slots are stored in TPD pages (which contain only
transaction slot arrays) interleaved with regular pages.  For a
detailed description of the idea, see the comments atop
src/backend/access/zheap/tpd.c.  One caveat remains: once TPD pages
are pruned (a TPD page can be pruned if all of its transaction slots
are old enough to no longer matter), they are not added to the FSM for
reuse.  We are working on a patch for this which we expect to finish
in a week or so.

Toast tables are working now; the toast data is stored in zheap.
Apart from the consistency of storing toast data in the same storage
engine as the main data, this has the advantage of early cleanup,
which means the space for deleted rows can be reclaimed as soon as the
transaction commits.  This is good for toast tables, as each update of
a toast value is a DELETE+INSERT.

Alignment of tuples is changed such that we don't have alignment
padding between the tuple header and the tuple data, as we always make
a copy of the tuple to support in-place updates.  Likewise, we ideally
don't need any alignment padding between tuples.  However, there are
places in the zheap code where we access the tuple header directly
from the page (e.g. zheap_delete, zheap_update), for which we want
tuples to be aligned at a two-byte boundary.  We omit all alignment
padding for pass-by-value types.  Even in the current heap, we never
point directly to such values, so the alignment padding doesn't help
much; it lets us fetch the value using a single instruction, but that
is all.  Pass-by-reference types will work as they do in the heap.  We
can't directly access unaligned values; instead, we need to use
memcpy.  We believe that the space savings will more than pay for the
additional CPU costs.
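
To illustrate the memcpy point, here is a minimal C sketch (mine, not
taken from the zheap code) of reading a 4-byte value from an arbitrary,
possibly unaligned, offset within a tuple's data area:

/* Illustrative sketch only -- not from the zheap patch. */
#include <stdint.h>
#include <string.h>

static int32_t
fetch_unaligned_int32(const char *tupdata, size_t off)
{
	int32_t		val;

	/*
	 * memcpy copes with any alignment; a direct cast-and-dereference of
	 * (int32_t *) (tupdata + off) may fault or be slow on some platforms
	 * when the offset is not 4-byte aligned.
	 */
	memcpy(&val, tupdata + off, sizeof(val));
	return val;
}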

Vacuum full is implemented in such a way that we don't copy the
information required for MVCC-aware scans.  We copy only LIVE tuples
into the new heap and freeze them before storing them there.  This is
not ideal, as we lose all the visibility information of the tuples,
but OTOH it can't be copied from the original tuples since that
information is maintained in undo and we don't have a facility to
modify undo records.  We could either allow modifying undo records or
write a special kind of undo record that captures the required
visibility information.  I think it will be tricky to do this, and I'm
not sure it is worth putting in a whole lot of effort before the basic
things work; also, with zheap the need for vacuum will anyway be
reduced to a good extent.

Serializable isolation is also supported; we didn't need to make any
major changes except for making it understand ZHeapTuple (we used the
TID in the required APIs).  I think this part will need some changes
after integration with the pluggable storage API.  We have special
handling for tuples that are in-place updated or whose latest
modifying transaction was aborted.  In that case, we check whether the
latest committed transaction that modified the tuple is a concurrent
transaction, and based on that we decide whether there is a
serialization conflict.

In zheap, we don't need to generate a new xid for sub-transactions, as
the visibility information for a particular tuple is present in undo,
and on Rollback To Savepoint we apply the required undo to restore the
state of the tuples to what it was before that sub-transaction.  This
gives us a performance/scalability boost when sub-transactions are
involved, as we don't need to acquire XIDGenLock for a subtransaction.
Apart from that benefit, we need this for zheap anyway, as otherwise
the undo chain for each transaction wouldn't be linear; it also saves
allocating additional page-level slots for each transaction id.
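
For illustration (the table is invented), the kind of statement
sequence being described looks like this; the point is that the
savepoint level never needs its own XID:

-- Invented example: the second UPDATE is undone by applying its undo
-- records, without ever having assigned a separate XID for the
-- savepoint level.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
SAVEPOINT s1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
ROLLBACK TO SAVEPOINT s1;
COMMIT;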

Undo workers and transaction rollbacks are working now.  My colleague
Dilip has posted a separate patch [1] for this, as it can have some
use cases without zheap as well, and Thomas has just posted a patch
using that facility.

Some of the other features, like row movement for an update of the
partition key, are also handled.

In short, now most of the user-visible features are working.  The make
installch

Re: INSTALL file

2018-10-31 Thread Michael Paquier
On Thu, Nov 01, 2018 at 01:32:09AM +0100, Andreas 'ads' Scherbaum wrote:
> Picking up on this idea, attached is a first draft for changing the
> README.

Why don't you add it to the upcoming commit fest?  It would be good to
get some traction with a formal review.

> It includes links to the website, as well as the short version of the
> installation instructions.

+The installation instructions are listed on the website:
+
+https://www.postgresql.org/docs/current/static/install-short.html
+
+Short version:
+
+./configure
+make
+su
+make install
+adduser postgres
+mkdir /usr/local/pgsql/data
+chown postgres /usr/local/pgsql/data
+su - postgres
+/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
+/usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start

Adding a section about installation and another one about documentation
are good things.  Now for the installation section, I disagree with
adding this detailed way of doing things; just adding a URL seems
enough.

Pointing to the global installation recommendations would be a better
fit also as a lot of things are platform-dependent.  So this URL looks
better:
https://www.postgresql.org/docs/current/static/installation.html

Now there is also a problem: the README would point to the development
version of the documentation.  As this is aimed at people working with
git, I could personally live with having stable branches also refer to
the development version, but it could also make sense to have each
stable branch point to the URL of the version it works on.
--
Michael




Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

2018-10-31 Thread Michael Paquier
On Tue, Oct 23, 2018 at 10:43:38AM +0900, Michael Paquier wrote:
> Well, following the same kind of thoughts, txid_current_snapshot() uses
> sort_snapshot() to remove all the duplicates after fetching its data
> from GetSnapshotData(), so wouldn't we want to do something about
> removal of duplicates if dummy PGXACT entries are found while scanning
> the ProcArray also in this case?  What I would think we should do is not
> only to patch GetRunningTransactionData() but also GetSnapshotData() so
> as we don't have duplicates also in this case, and do things in such a
> way that both code paths use the same logic, and that we don't need to
> have sort_snapshot() anymore.  That would be more costly though...

My apologies, it took a bit longer than I thought.  I have had a patch
on my desk for a couple of days, and finally took the time to finish
something which addresses the concerns raised here.  As long as we don't
reach more than hundreds of thousands of entries, there is not going to
be any performance impact.  So what I do in the attached is to revert
1df21ddb, and then have GetRunningTransactionData() sort the XIDs in the
snapshot and remove duplicates only if at least one dummy proc entry is
found while scanning, for both xids and subxids.  This way, there is no
need to burden most instance deployments with the extra sort/removal
phase, as most don't use two-phase transactions.  The sorting done at
recovery when initializing the standby snapshot still needs to happen of
course.

The patch is added to the upcoming CF for review.

Thanks,
--
Michael
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 908f62d37e..625c8ddf48 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -799,21 +799,10 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
 		qsort(xids, nxids, sizeof(TransactionId), xidComparator);
 
 		/*
-		 * Add the sorted snapshot into KnownAssignedXids.  The running-xacts
-		 * snapshot may include duplicated xids because of prepared
-		 * transactions, so ignore them.
+		 * Add the sorted snapshot into KnownAssignedXids.
 		 */
 		for (i = 0; i < nxids; i++)
-		{
-			if (i > 0 && TransactionIdEquals(xids[i - 1], xids[i]))
-			{
-				elog(DEBUG1,
-					 "found duplicated transaction %u for KnownAssignedXids insertion",
-					 xids[i]);
-				continue;
-			}
 			KnownAssignedXidsAdd(xids[i], xids[i], true);
-		}
 
 		KnownAssignedXidsDisplay(trace_recovery(DEBUG3));
 	}
@@ -1924,10 +1913,9 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc)
  * This is never executed during recovery so there is no need to look at
  * KnownAssignedXids.
  *
- * Dummy PGXACTs from prepared transaction are included, meaning that this
- * may return entries with duplicated TransactionId values coming from
- * transaction finishing to prepare.  Nothing is done about duplicated
- * entries here to not hold on ProcArrayLock more than necessary.
+ * If dummy entries for prepared transactions are found in the proc array
+ * make sure to discard any duplicates as a transaction finishing to
+ * prepare may cause two entries with the same TransactionId to be found.
  *
  * We don't worry about updating other counters, we want to keep this as
  * simple as possible and leave GetSnapshotData() as the primary code for
@@ -1951,6 +1939,7 @@ GetRunningTransactionData(void)
 	int			count;
 	int			subcount;
 	bool		suboverflowed;
+	bool		found_dummy = false;
 
 	Assert(!RecoveryInProgress());
 
@@ -1999,6 +1988,7 @@ GetRunningTransactionData(void)
 	{
 		int			pgprocno = arrayP->pgprocnos[index];
 		volatile PGXACT *pgxact = &allPgXact[pgprocno];
+		volatile PGPROC *proc = &allProcs[pgprocno];
 		TransactionId xid;
 
 		/* Fetch xid just once - see GetNewTransactionId */
@@ -2022,6 +2012,13 @@ GetRunningTransactionData(void)
 		if (pgxact->overflowed)
 			suboverflowed = true;
 
+		/*
+		 * Mark that a dummy entry has been found, and that it is necessary to
+		 * check for duplicate entries.
+		 */
+		if (proc->pid == 0)
+			found_dummy = true;
+
 		/*
 		 * If we wished to exclude xids this would be the right place for it.
 		 * Procs with the PROC_IN_VACUUM flag set don't usually assign xids,
@@ -2033,6 +2030,34 @@ GetRunningTransactionData(void)
 		xids[count++] = xid;
 	}
 
+	/*
+	 * If some dummy entries have been found, it is possible that a two-phase
+	 * transaction just finishing to prepare has a duplicated entry with still
+	 * the active transaction entry in the procArray. There is no need to
+	 * remove any duplicates if no dummy entries have been found.
+	 */
+	if (found_dummy && count > 1)
+	{
+		TransactionId last = 0;
+		int			idx1,
+					idx2;
+		int			former_count;
+
+		qsort(xids, count, sizeof(TransactionId), xidComparator);
+
+		/* remove duplicate xids */
+		former_count = count;
+		idx1 = idx2 = 0;
+		while (idx1 < former_count)
+		{
+			if (TransactionIdEquals(xids[idx1], last))
+last = xids[idx2++] = x

Re: [HACKERS] Block level parallel vacuum

2018-10-31 Thread Yura Sokolov
Excuse me for being noisy.

Increasing vacuum's ring buffer improves vacuum by up to 6 times:
https://www.postgresql.org/message-id/flat/20170720190405.GM1769%40tamriel.snowman.net
This is a one-line change.

How much improvement does parallel vacuum give?

On 31.10.2018 3:23, Masahiko Sawada wrote:
> On Tue, Oct 30, 2018 at 5:30 PM Masahiko Sawada  wrote:
>>
>> On Tue, Aug 14, 2018 at 9:31 AM Masahiko Sawada  
>> wrote:
>>>
>>> On Thu, Nov 30, 2017 at 11:09 AM, Michael Paquier
>>>  wrote:
 On Tue, Oct 24, 2017 at 5:54 AM, Masahiko Sawada  
 wrote:
> Yeah, I was thinking the commit is relevant to this issue, but as
> Amit mentioned this error is emitted by DROP SCHEMA CASCADE.
> I haven't found the cause of this issue yet. With the previous
> version patch, autovacuum workers were working with one parallel worker
> but it never drops relations. So it's possible that the error might
> not have been relevant to the patch, but anyway I'll continue to work
> on that.

 This depends on the extension lock patch from
 https://www.postgresql.org/message-id/flat/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=gn...@mail.gmail.com/
 if I am following correctly. So I propose to mark this patch as
 returned with feedback for now, and come back to it once the root
 problems are addressed. Feel free to correct me if you think that's
 not adapted.
>>>
>>> I've re-designed the parallel vacuum patch. Attached is the latest
>>> version. As per the discussion so far, this patch depends on the
>>> extension lock patch[1]. However, I think we can discuss the design
>>> part of parallel vacuum independently from that patch. That's why I'm
>>> proposing the new patch. In this patch, I restructured and refined
>>> lazy_scan_heap() because it's a single big function and not suitable
>>> for making it parallel.
>>>
>>> The parallel vacuum worker processes keep waiting for commands from
>>> the parallel vacuum leader process. Before entering each phase of lazy
>>> vacuum, such as scanning the heap, vacuuming indexes and vacuuming the
>>> heap, the leader process changes all the workers' state to the next
>>> state. Vacuum worker processes do the job according to their state and
>>> wait for the next command after finishing. Also, before entering the
>>> next phase, the leader process does some preparation work while the
>>> vacuum workers are sleeping; for example, clearing the shared dead
>>> tuple space before entering the 'scanning heap' phase. The status of
>>> the vacuum workers is stored in a DSM area pointed to by WorkerState
>>> variables, and controlled by the leader process. For the basic design
>>> and performance improvements please refer to my presentation at PGCon
>>> 2018[2].
>>>
>>> The number of parallel vacuum workers is determined according to
>>> either the table size or PARALLEL option in VACUUM command. The
>>> maximum of parallel workers is max_parallel_maintenance_workers.
>>>
>>> I've separated the code for the vacuum worker process into
>>> backends/commands/vacuumworker.c, and created an
>>> includes/commands/vacuum_internal.h file to declare the definitions
>>> for lazy vacuum.
>>>
>>> For autovacuum, this patch allows an autovacuum worker process to use
>>> the parallel option according to the relation size or the reloption.
>>> Regarding autovacuum delay, since there are no slots for parallel
>>> workers of autovacuum in AutoVacuumShmem, this patch doesn't support
>>> changing the autovacuum delay configuration while running.
>>>
>>
>> Attached rebased version patch to the current HEAD.
>>
>>> Please apply this patch with the extension lock patch[1] when testing
>>> as this patch can try to extend visibility map pages concurrently.
>>>
>>
>> Because the patch leads to performance degradation in the case of
>> bulk-loading into a partitioned table, I think that the original
>> proposal, which makes group locking conflict for relation extension
>> locks, is the more realistic approach. So I worked on this with a
>> simple patch instead of [1]. Attached three patches:
>>
>> * 0001 patch publishes some static functions such as
>> heap_parallelscan_startblock_init so that the parallel vacuum code can
>> use them.
>> * 0002 patch makes group locking conflict for relation extension locks.
>> * 0003 patch adds a parallel option to lazy vacuum.
>>
>> Please review them.
>>
> 
> Oops, forgot to attach patches.
> 
> Regards,
> 
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
> 




Re: Speeding up INSERTs and UPDATEs to partitioned tables

2018-10-31 Thread Gavin Flower

On 01/11/2018 14:30, David Rowley wrote:

On 1 November 2018 at 13:35, Amit Langote  wrote:

On 2018/11/01 8:58, David Rowley wrote:



[...]

I agree. I don't think "TupRouting" really needs to be in the name.
Probably "To" can also just become "2" and we can put back the
Parent/Child before that.

Agree that "TupRouting" can go, but "To" is not too long, so there's no
need to use "2" instead of it.
I think that while '2' may only be one character less than 'to', the
character '2' stands out more.  However, I can't say I could claim this
is of the utmost importance!

Okay. Here's a version with "2" put back to "To"...


[...]




Re: Getting ERROR: could not open file "base/13164/t3_16388" with partition table with ON COMMIT

2018-10-31 Thread Michael Paquier
On Thu, Nov 01, 2018 at 12:39:16PM +0900, Amit Langote wrote:
> Rajkumar pointed out off-list that the patch still remains to be applied.
> Considering that there is a planned point release on Nov 8, maybe we
> should do something about this?

Yes, doing something about that very soon would be a good idea.  Tom,
are you planning to look at it or should I jump in?
--
Michael




Re: move PartitionBoundInfo creation code

2018-10-31 Thread Amit Langote
On 2018/11/01 13:02, Michael Paquier wrote:
> On Thu, Nov 01, 2018 at 12:58:29PM +0900, Amit Langote wrote:
>> Attached find a patch that does such refactoring, along with making
>> static some functions in partbounds.c that are not needed outside of it.
> 
> This looks like a very good idea to me.  Thanks for digging into that.
> Please just make sure to register it to the next CF if not already
> done.

Done a few moments ago. :)

Thanks,
Amit





Re: move PartitionBoundInfo creation code

2018-10-31 Thread Michael Paquier
On Thu, Nov 01, 2018 at 12:58:29PM +0900, Amit Langote wrote:
> Attached find a patch that does such refactoring, along with making
> static some functions in partbounds.c that are not needed outside of it.

This looks like a very good idea to me.  Thanks for digging into that.
Please just make sure to register it to the next CF if not already
done.
--
Michael




move PartitionBoundInfo creation code

2018-10-31 Thread Amit Langote
Hi,

Currently, the code that creates a PartitionBoundInfo struct from the
PartitionBoundSpec nodes of constituent partitions read from the catalog
is in RelationBuildPartitionDesc that's in partcache.c.  I think that
da6f3e45dd that moved around the partitioning code [1] really missed the
opportunity to move the aforementioned code into partbounds.c.  I think
there is quite a bit of logic contained in that code that can aptly be
said to belong in partbounds.c.  Also, if factored out into a function of
its own with a proper interface, it could be useful to other callers that
may want to build it for, say, fake partitions [2] which are not real
relations.

Attached find a patch that does such refactoring, along with making
static some functions in partbounds.c that are not needed outside of it.

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=da6f3e45ddb

[2]
https://www.postgresql.org/message-id/009901d3f34c%2471e1bdc0%2455a53940%24%40lab.ntt.co.jp
From b03a6fee7624058197bdeb67c5da04f8e3492505 Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 1 Nov 2018 11:32:35 +0900
Subject: [PATCH v1] Move PartitionBoundInfo creation code to partbounds.c

This factors out the code that creates a PartitionBoundInfo struct
from a list of PartitionBoundSpec nodes into a function called
build_partition_boundinfo and moves it into partbounds.c, where it
aptly seems to belong.

Along with that movement, also make some functions in partbounds.c
static that are not used (or no longer used) outside of that file.
---
 src/backend/partitioning/partbounds.c | 534 +-
 src/backend/utils/cache/partcache.c   | 526 +
 src/include/partitioning/partbounds.h |  14 +-
 3 files changed, 547 insertions(+), 527 deletions(-)

diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index c94f73aadc..cd30bb2b25 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -36,6 +36,24 @@
 #include "utils/ruleutils.h"
 #include "utils/syscache.h"
 
+static PartitionRangeBound *make_one_partition_rbound(PartitionKey key, int index,
+ List *datums, bool lower);
+static int32 partition_hbound_cmp(int modulus1, int remainder1, int modulus2,
+int remainder2);
+static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
+Oid *partcollation, Datum *datums1,
+PartitionRangeDatumKind *kind1, bool lower1,
+PartitionRangeBound *b2);
+static int partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+   Oid *partcollation,
+   PartitionBoundInfo boundinfo,
+   PartitionRangeBound *probe, bool *is_equal);
+
+static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
+static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
+  void *arg);
+static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
+  void *arg);
 static int get_partition_bound_num_indexes(PartitionBoundInfo b);
 static Expr *make_partition_op_expr(PartitionKey key, int keynum,
  uint16 strategy, Expr *arg1, Expr *arg2);
@@ -93,6 +111,469 @@ get_qual_from_partbound(Relation rel, Relation parent,
 }
 
 /*
+ * build_partition_boundinfo
+ * Build a PartitionBoundInfo struct from a list of PartitionBoundSpec
+ * nodes
+ *
+ * partoids is the list of OIDs of partitions, which can be NIL if the
+ * caller doesn't have one.  If non-NIL, oids should also be non-NULL and
+ * *oids should point to an array of OIDs containing space for
+ * list_length(partoids) elements.
+ */
+PartitionBoundInfo
+build_partition_boundinfo(PartitionKey key, List *boundspecs,
+ List *partoids, Oid **oids)
+{
+   ListCell   *cell;
+   int i,
+   nparts;
+   int ndatums = 0;
+   int default_index = -1;
+
+   /* Hash partitioning specific */
+   PartitionHashBound **hbounds = NULL;
+
+   /* List partitioning specific */
+   PartitionListValue **all_values = NULL;
+   int null_index = -1;
+
+   /* Range partitioning specific */
+   PartitionRangeBound **rbounds = NULL;
+
+   PartitionBoundInfo boundinfo;
+   int*mapping;
+   int next_index = 0;
+
+   nparts = list_length(boundspecs);
+   Assert(nparts > 0);
+
+   /* Convert from node to the inter

Re: Getting ERROR: could not open file "base/13164/t3_16388" with partition table with ON COMMIT

2018-10-31 Thread Amit Langote
On 2018/09/14 10:53, Amit Langote wrote:
> On 2018/09/13 23:13, Tom Lane wrote:
>> Amit Langote  writes:
>>> On 2018/09/13 1:14, Tom Lane wrote:
 That seems excessively restrictive.  Anything that has storage (e.g.
 matviews) ought to be truncatable, no?
>>
>>> Not by heap_truncate it seems.  The header comment of heap_truncate says
>>> that it concerns itself only with ON COMMIT truncation of temporary tables:
>>
>> Ah.  Well, in that case I'm OK with just a simple test for
>> RELKIND_RELATION, but I definitely feel that it should be inside
>> heap_truncate.  Callers don't need to know about the limited scope
>> of what that does.
> 
> I guess you meant inside heap_truncate_one_rel.  I updated the patch that
> way, but I wonder if we shouldn't also allow other relkinds that have
> storage which RelationTruncate et al can technically deal with.

Rajkumar pointed out off-list that the patch still remains to be applied.
Considering that there is a planned point release on Nov 8, maybe we
should do something about this?

Thanks,
Amit




Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

2018-10-31 Thread David Rowley
Thanks for looking at this.

On 1 November 2018 at 16:07, Steve Singer  wrote:
> --- a/doc/src/sgml/perform.sgml
> +++ b/doc/src/sgml/perform.sgml
> @@ -1534,9 +1534,10 @@ SELECT * FROM x, y, a, b, c WHERE something AND 
> somethingelse;
>  TRUNCATE command. In such cases no WAL
>  needs to be written, because in case of an error, the files
>  containing the newly loaded data will be removed anyway.
> -However, this consideration only applies when
> - is minimal as all 
> commands
> -must write WAL otherwise.
> +However, this consideration only applies for non-partitioned
> +tables when  is
> +minimal as all commands must write WAL
> +otherwise.
> 
>
> Would this be better as "However, this consideration only applies for
> non-partitioned tables and the wal_level is minimal"?
> (Switching the 'when' to 'and the'.)

I ended up going with:

However, this consideration only applies when
 is minimal for
non-partitioned tables as all commands must write WAL otherwise.

> --- a/doc/src/sgml/ref/copy.sgml
> +++ b/doc/src/sgml/ref/copy.sgml
> @@ -222,9 +222,10 @@ COPY {  class="parameter">table_name [ ( Requests copying the data with rows already frozen, just as they
>would be after running the VACUUM FREEZE command.
>This is intended as a performance option for initial data loading.
> -  Rows will be frozen only if the table being loaded has been created
> -  or truncated in the current subtransaction, there are no cursors
> -  open and there are no older snapshots held by this transaction.
> +  Rows will be frozen only for non-partitioned tables if the table
> +  being loaded has been created or truncated in the current
> +  subtransaction, there are no cursors open and there are no older
> +  snapshots held by this transaction.
>
> Similar concern with above

I've changed this to:

  Rows will be frozen only if the table being loaded has been created
  or truncated in the current subtransaction, there are no cursors
  open and there are no older snapshots held by this transaction.  It is
  currently not possible to perform a COPY FREEZE on
  a partitioned table.

Which is just the original text with the new sentence tagged onto the end.
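
For readers less familiar with the FREEZE requirement described above,
here is a small hedged example (table and file names are invented) of a
case where COPY FREEZE is accepted on a non-partitioned table:

-- Illustrative only: the table is created in the same transaction,
-- so its rows can be frozen during the load.  A partitioned table
-- would be rejected here.
BEGIN;
CREATE TABLE measurements (id int, reading numeric);
COPY measurements FROM '/tmp/measurements.csv' WITH (FORMAT csv, FREEZE);
COMMIT;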

> Other than that the patch looks fine and works as expected.
>
> The new status of this patch is: Waiting on Author

Thanks for looking at this. I've attached an updated patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


fix_incorrect_setting_of_hi_options_for_partitioned_tables_v3.patch
Description: Binary data


Re: PG vs macOS Mojave

2018-10-31 Thread Tom Lane
[ just when you thought it was safe to go back in the water ]

I wrote:
>> Jakob Egger  writes:
>>> I would assume that clang sets -isysroot automatically, but I have no idea 
>>> why that didn't work for you previously.

>> [ experiments further ... ]  It looks like clang does default to assuming
>> -isysroot with the correct sysroot for its Xcode version.

So today I updated longfin's host to macOS 10.14.1 + Xcode 10.1,
and things promptly broke:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-10-31%2013%3A49%3A50

It can't find Perl's headers anymore.  Investigation says that the reason
is that clang no longer has this as default:

  -isysroot 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk

You can verify the thing's built-in settings with something like

$ echo >test.c
$ clang -v -E test.c

and on Xcode 10.0 I get

 
"/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang"
 -cc1 -triple x86_64-apple-macosx10.14.0 -Wdeprecated-objc-isa-usage 
-Werror=deprecated-objc-isa-usage -E -disable-free -disable-llvm-verifier 
-discard-value-names -main-file-name test.c -mrelocation-model pic -pic-level 2 
-mthread-model posix -mdisable-fp-elim -fno-strict-return -masm-verbose 
-munwind-tables -target-cpu penryn -dwarf-column-info -debugger-tuning=lldb 
-target-linker-version 409.12 -v -resource-dir 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/10.0.0
 -isysroot 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk
 -I/usr/local/include -fdebug-compilation-dir /Users/tgl -ferror-limit 19 
-fmessage-length 80 -stack-protector 1 -fblocks 
-fencode-extended-block-signature -fobjc-runtime=macosx-10.14.0 
-fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics -o - -x c 
test.c

while with Xcode 10.1's compiler:

 
"/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang"
 -cc1 -triple x86_64-apple-macosx10.14.0 -Wdeprecated-objc-isa-usage 
-Werror=deprecated-objc-isa-usage -E -disable-free -disable-llvm-verifier 
-discard-value-names -main-file-name test.c -mrelocation-model pic -pic-level 2 
-mthread-model posix -mdisable-fp-elim -fno-strict-return -masm-verbose 
-munwind-tables -target-cpu penryn -dwarf-column-info -debugger-tuning=lldb 
-target-linker-version 409.12 -v -resource-dir 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/10.0.0
 -fdebug-compilation-dir /Users/tgl -ferror-limit 19 -fmessage-length 80 
-stack-protector 1 -fblocks -fencode-extended-block-signature 
-fobjc-runtime=macosx-10.14.0 -fmax-type-align=16 -fdiagnostics-show-option 
-fcolor-diagnostics -o - -x c test.c

No default sysroot at all.  So the immediate question is whether Apple
broke this in Xcode 10.1, or whether it was broken in Xcode 10.0 and
now they've "fixed" it.  Unfortunately, I'm afraid the answer is the
latter.  I do not any longer have a pre-10.0 Xcode handy to try, but
if those versions did not have any default sysroot, that would handily
explain the results I was getting before updating to Mojave.  Also,
some googling suggests that older Xcode versions had no default sysroot
setting; for instance see here:

https://langui.sh/2015/07/24/osx-clang-include-lib-search-paths/

In short, it looks like our current solution of expecting the compiler
to default to the correct -isysroot setting is just broken.  It
accidentally works on Xcode 10.0, but not earlier or later versions.

We don't have a lot of time to fix this, as our next quarterly
releases wrap on Monday.  Right now I think the only plausible
fix is to go back to adding "-isysroot $PG_SYSROOT" to CPPFLAGS.
Maybe we can fix things to make it relatively easy to override that
for cases like building against a different Xcode version, but it's
not going to just be automatic for that to work.
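
As a rough illustration of that direction (this is not the committed
fix, and the exact way configure ends up consuming the setting is my
assumption), one could pin the sysroot explicitly at configure time:

# Sketch of a manual workaround: ask Xcode for the SDK path and pass it
# explicitly, so the build does not rely on the compiler's built-in
# default sysroot (which varies across Xcode releases).
SDKPATH=$(xcrun --show-sdk-path)
./configure CPPFLAGS="-isysroot $SDKPATH" LDFLAGS="-isysroot $SDKPATH"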

regards, tom lane



Re: Ordered Partitioned Table Scans

2018-10-31 Thread David Rowley
On 1 November 2018 at 04:01, Antonin Houska  wrote:
> * As for the logic, I found generate_mergeappend_paths() to be the most
>   interesting part:
>
> Imagine table partitioned by "i", so "partition_pathkeys" is {i}.
>
> partition 1:
>
> i | j
> --+--
> 0 | 0
> 1 | 1
> 0 | 1
> 1 | 0
>
> partition 2:
>
> i | j
> --+--
> 3 | 0
> 2 | 0
> 2 | 1
> 3 | 1
>
> Even if "pathkeys" is {i, j}, i.e. not contained in "partition_pathkeys", the
> ordering of the subpaths should not change the way tuples are split into
> partitions.
>
> Obviously a problem is if "partition_pathkeys" and "pathkeys" lists start with
> different items. To propose more generic rule, I used this example of
> range-partitioned table, where "i" and "j" are the partitioning keys:
>
> partition 1:
>
>  i | j | k
> ---+---+---
>  0 | 0 | 1
>  0 | 0 | 0
>
> partition 2:
>
>  i | j | k
> ---+---+---
>  0 | 1 | 0
>  0 | 1 | 1
>
> If the output "pathkey" is {i, k}, then the Append path makes rows of both
> partitions interleave:
>
>  i | j | k
> ---+---+---
>  0 | 0 | 0
>  0 | 1 | 0
>  0 | 0 | 1
>  0 | 1 | 1
>
> So in general I think the restriction is that no valid position of "pathkeys"
> and "partition_pathkeys" may differ. Or in other words: the shorter of the 2
> pathkey lists must be contained in the longer one. Does it make sense to you?

I understand what you're saying. I just don't understand what you
think is wrong with the patch in this area.

> Another problem I see is that build_partition_pathkeys() continues even if it
> fails to create a pathkey for some partitioning column. In the example above
> it would mean that the table can have "partition_pathkeys" equal to {j}
> instead of {i, j} on some EC-related conditions. However such a key does not
> correspond to reality - this is easier to imagine if another partition is
> considered.
>
> partition 3:
>
>  i | j | k
> ---+---+---
>  1 | 0 | 1
>  1 | 0 | 0
>
> So I think no "partition_pathkeys" should be generated in that case. On the
> other hand, if the function returned the part of the list it could construct
> so far, it'd be wrong because such incomplete pathkeys could pass the checks I
> proposed above for reasons unrelated to the partitioning scheme.

Oops. That's a mistake. We should return what we have so far if we
can't make one of the pathkeys. Will fix.

> The following comments are mostly on coding:
>
> * Both qsort_partition_list_value_cmp() and qsort_partition_rbound_cmp() have
>   this sentence in the header comment:
>
> Note: If changing this, see build_partition_pathkeys()
>
> However I could not find other use besides that in
> RelationBuildPartitionDesc().

While the new code does not call those directly, the new code does
depend on the sort order of the partitions inside the PartitionDesc,
which those functions are responsible for. Perhaps there's a better
way to communicate that.

Actually, I think the partition-checking code I added to pathkeys.c
does not belong there. Likely those checks should live with the other
partitioning code in the form of a bool-returning function. I'll
change that now. It also means we don't have to work that out twice, as
I'm currently running it once for the forward and once for the backward
scan case.  Currently the code is very simple, but if we start analysing
list partition bounds then it will become slower.

> * create_append_path():
>
> /*
>  * Apply query-wide LIMIT if known and path is for sole base relation.
>  * (Handling this at this low level is a bit klugy.)
>  */
> if (root != NULL && pathkeys != NULL &&
> bms_equal(rel->relids, root->all_baserels))
> pathnode->limit_tuples = root->limit_tuples;
> else
> pathnode->limit_tuples = -1.0;
>
>   I think this optimization is not specific to AppendPath / MergeAppendPath,
>   so it could be moved elsewhere (as a separate patch of course). But
>   specifically for AppendPath, why do we have to test pathkeys? The pathkeys
>   of the AppendPath do not necessarily describe the order of the set to which
>   LIMIT is applied, so their existence should not be important here.

The pathkeys != NULL could be removed. I was just trying to maintain
the status quo for Appends without pathkeys. In reality it currently
does not matter since that's only used as a parameter for cost_sort().
There'd be no reason previously to have a Sort path as a subpath in an
Append node since the order would be destroyed after the Append.
Perhaps we should just pass it through as one day it might be useful.
I just can't currently imagine why.

> * If pathkeys is passed, shouldn't create_append_path() include the
>   cost_sort() of subpaths just like create_merge_append_path() does?  And if
>   so, then create_append_path() and create_merge_append_path() might
>   eventually have some common code (at least for the subpath processing) to be
>   put into a separate function.

It does. It's just done via the call to cost_append().
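
To make the discussion a bit more concrete, here is a small invented
example (schema and query are mine, not taken from the patch) of the
kind of case where an ordered Append over a range-partitioned table
could replace a MergeAppend:

-- Invented example: "i" is the partition key, so the partition bounds
-- already guarantee that all rows of events_p1 sort before events_p2.
CREATE TABLE events (i int, j int) PARTITION BY RANGE (i);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES FROM (0) TO (100);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES FROM (100) TO (200);
CREATE INDEX ON events_p1 (i, j);
CREATE INDEX ON events_p2 (i, j);

-- With the patch, ORDER BY i, j could use a plain Append of the
-- already-ordered per-partition index scans instead of a MergeAppend.
EXPLAIN SELECT * FROM events ORDER BY i, j LIMIT 10;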

Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

2018-10-31 Thread Steve Singer
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, failed

--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1534,9 +1534,10 @@ SELECT * FROM x, y, a, b, c WHERE something AND 
somethingelse;
 TRUNCATE command. In such cases no WAL
 needs to be written, because in case of an error, the files
 containing the newly loaded data will be removed anyway.
-However, this consideration only applies when
- is minimal as all 
commands
-must write WAL otherwise.
+However, this consideration only applies for non-partitioned
+tables when  is
+minimal as all commands must write WAL
+otherwise.


Would this be better as "However, this consideration only applies for
non-partitioned tables and the wal_level is minimal"?
(Switching the 'when' to 'and the'.)



--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -222,9 +222,10 @@ COPY { table_name [ ( VACUUM FREEZE command.
   This is intended as a performance option for initial data loading.
-  Rows will be frozen only if the table being loaded has been created
-  or truncated in the current subtransaction, there are no cursors
-  open and there are no older snapshots held by this transaction.
+  Rows will be frozen only for non-partitioned tables if the table
+  being loaded has been created or truncated in the current
+  subtransaction, there are no cursors open and there are no older
+  snapshots held by this transaction.

Similar concern with above

When I read this comment it makes me think that if the table is
partitioned, then the rows will be frozen even if the table was created
by an earlier transaction.
Do we want to instead say "Rows will be frozen if the table is not partitioned 
and the table being loaded has been"

Other than that the patch looks fine and works as expected.

The new status of this patch is: Waiting on Author


Re: PostgreSQL Limits and lack of documentation about them.

2018-10-31 Thread John Naylor
On 11/1/18, Nasby, Jim  wrote:
> Hmm… 18 bytes doesn’t sound right, at least not for the Datum. Offhand I’d
> expect it to be the small (1 byte) varlena header + an OID (4 bytes). Even
> then I don’t understand how 1600 text columns would work; the data area of a
> tuple should be limited to ~2000 bytes, and 2000/5 = 400.

The wording in the docs (under Physical Storage) is "Allowing for the
varlena header bytes, the total size of an on-disk TOAST pointer datum
is therefore 18 bytes regardless of the actual size of the represented
value.", and as I understand it, it's

header + toast table oid + chunk_id + logical size + compressed size.

This is one area where visual diagrams would be nice.
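
To make that arithmetic concrete, here is a hedged C sketch of the
pointed-to layout as I understand it, modeled on varatt_external (the
exact field names should be double-checked against postgres.h):

#include <stdint.h>

typedef uint32_t Oid;			/* stand-in for the real typedef */

/*
 * Sketch of an on-disk TOAST pointer: a 1-byte varlena header plus a
 * 1-byte tag, followed by the 16 bytes below, gives 18 bytes total
 * regardless of the size of the value being pointed to.
 */
typedef struct toast_pointer_sketch
{
	int32_t		va_rawsize;		/* original ("logical") size, incl. header */
	int32_t		va_extsize;		/* external (possibly compressed) size */
	Oid			va_valueid;		/* chunk_id used in the TOAST table */
	Oid			va_toastrelid;	/* OID of the TOAST table holding the value */
} toast_pointer_sketch;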

-John Naylor



Re: Super PathKeys (Allowing sort order through precision loss functions)

2018-10-31 Thread David Rowley
On 1 November 2018 at 12:24, Andres Freund  wrote:
> FWIW, I kind of wonder whether, if we built proper infrastructure to
> allow making such inferences from function calls, it could also be made
> to support the transformation of LIKEs into indexable <= >= clauses.

Perhaps, but I doubt it would be the same function to do both.  Surely
I need something that accepts details about the function call as
arguments and returns an Expr * of the argument that we can derive the
order of the return value from, or NULL.  I think the transformation
you need might be more along the lines of returning a List * of quals
that can substitute an OpExpr containing a function call. I'm not that
clear on how we'd know the new quals were better than the existing
ones.  For example, extract('year', dt) = 2018 could be transformed to
dt >= '2018-01-01' AND dt < '2019-01-01', but how would we know that
was better?  There might be an index on extract('year', dt).
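
As a hedged illustration (the orders table and dt column are invented,
and I'm using the standard extract(year FROM dt) spelling), the two
predicate forms being compared would be:

-- Function form: can use an expression index on (extract(year FROM dt)).
SELECT * FROM orders WHERE extract(year FROM dt) = 2018;

-- Rewritten range form: can instead use a plain index on dt; the
-- planner would need some way to judge which of the two is better.
SELECT * FROM orders
WHERE dt >= '2018-01-01' AND dt < '2019-01-01';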

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Speeding up INSERTs and UPDATEs to partitioned tables

2018-10-31 Thread David Rowley
On 1 November 2018 at 13:35, Amit Langote  wrote:
> On 2018/11/01 8:58, David Rowley wrote:
>> On 1 November 2018 at 06:45, Robert Haas  wrote:
>>> I think a better way to shorten the name would be to truncate the
>>> PartitionTupRouting() prefix in some way, e.g. dropping TupRouting.
>>
>> Thanks for chipping in on this.
>>
>> I agree. I don't think "TupRouting" really needs to be in the name.
>> Probably "To" can also just become "2" and we can put back the
>> Parent/Child before that.
>
> Agree that "TupRouting" can go, but "To" is not too long, so there's no
> need to use "2" instead of it.

Okay. Here's a version with "2" put back to "To"...

It's great to know the patch is now so perfect that we've only the
macro naming left to debate ;-)

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v12-0001-Speed-up-INSERT-and-UPDATE-on-partitioned-tables.patch
Description: Binary data


Re: pg_promote not marked as parallel-restricted in pg_proc.dat

2018-10-31 Thread Michael Paquier
On Wed, Oct 31, 2018 at 01:09:53PM -0400, Robert Haas wrote:
> There's no rule whatsoever that a parallel worker can't write to the
> disk.  pg_start_backup and pg_stop_backup have to be
> parallel-restricted because, when used in non-exclusive mode, they
> establish backend-local state that wouldn't be synchronized with the
> state in the workers -- namely the information that a non-exclusive
> backup is in progress.

Okay, but likely we would not want to signal the postmaster
unnecessarily, no?  FALLBACK_PROMOTE_SIGNAL_FILE gets discarded if
promotion is triggered more than once, but that does not look like a
sane thing to do if not necessary.

As far as I understand, there has been some input on this thread:
- I would prefer marking the function as parallel-restricted.
- Tom would make it parallel-unsafe.
- Laurenz (author of the feature) is fine with restricted or unsafe.

It is a bit hard to make a decision from that :)
--
Michael




Re: COPY FROM WHEN condition

2018-10-31 Thread David Fetter
On Wed, Oct 31, 2018 at 11:21:33PM +, Nasby, Jim wrote:
> On Oct 11, 2018, at 10:35 AM, David Fetter  wrote:
> > 
> >> It didn't get far, but you may want to take a look at a rejected patch for
> >> copy_srf() (set returning function)
> >> https://www.postgresql.org/message-id/CADkLM%3DdoeiWQX4AGtDNG4PsWfSXz3ai7kY%3DPZm3sUhsUeev9Bg%40mail.gmail.com
> >> https://commitfest.postgresql.org/12/869/
> >> 
> >> Having a set returning function gives you the full expressiveness of SQL,
> >> at the cost of an extra materialization step.
> > 
> > I wonder whether something JIT-like could elide this. A very
> > interesting subset of such WHEN clauses could be pretty
> > straight-forward to implement in a pretty efficient way.
> 
> Are you thinking something like having a COPY command that provides
> results in such a way that they could be referenced in a FROM clause
> (perhaps a COPY that defines a cursor…)?

That would also be nice, but what I was thinking of was that some
highly restricted subset of cases of SQL in general could lend
themselves to levels of optimization that would be impractical in
other contexts.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Is there way to detect uncommitted 'new table' in pg_class?

2018-10-31 Thread Michael Paquier
On Wed, Oct 31, 2018 at 01:30:52PM -0400, Robert Haas wrote:
> In theory, at least, you could write C code to scan the catalog tables
> with SnapshotDirty to find the catalog entries, but I don't think that
> helps a whole lot.  You couldn't necessarily rely on those catalog
> entries to be in a consistent state, and even if they were, they might
> depend on committed types or functions or similar whose definitions
> your backend can't see.  Moreover, the creating backend will have an
> AccessExclusiveLock on the table -- if you write C code to bypass that
> and read the data anyway, then you will probably destabilize the
> entire system for complicated reasons that I don't feel like
> explaining right now.

One take here is that we cannot give any guarantee that a single DDL
will create only one consistent version of the tuple added in system
catalogs.  In those cases a new version is made visible by using
CommandCounterIncrement() so that the follow-up processing can see it.

> You should try very hard to find some way of solving this problem that
> doesn't require reading data from a table that hasn't been committed
> yet, because you are almost certainly not going to be able to make
> that work reliably even if you are willing to write code in C.

+1.
--
Michael




Re: Speeding up INSERTs and UPDATEs to partitioned tables

2018-10-31 Thread Amit Langote
On 2018/11/01 8:58, David Rowley wrote:
> On 1 November 2018 at 06:45, Robert Haas  wrote:
>> On Wed, Aug 22, 2018 at 8:30 AM David Rowley
>>  wrote:
>>> On 22 August 2018 at 19:08, Amit Langote  
>>> wrote:
 +#define PartitionTupRoutingGetToParentMap(p, i) \
 +#define PartitionTupRoutingGetToChildMap(p, i) \

 If the "Get" could be replaced by "Child" and "Parent", respectively,
 they'd sound more meaningful, imho.
>>>
>>> I did that to save 3 chars.  I think putting the additional
>>> Child/Parent in the name is not really required. It's not as if we're
>>> going to have a ParentToParent or a ChildToChild map, so I thought it
>>> might be okay to assume that if it's "ToParent", that it's being
>>> converted from the child and "ToChild" seems safe to assume it's being
>>> converted from the parent. I can change it though if you feel very
>>> strongly that what I've got is no good.
>>
>> I'm not sure exactly what is best here, but it seems unlikely to me
>> that somebody is going to read that macro name and think, oh, that
>> means "get the to-parent map".  They are more likely to be confused by
>> the juxtaposition of "get" and "to".
>>
>> I think a better way to shorten the name would be to truncate the
>> PartitionTupRouting() prefix in some way, e.g. dropping TupRouting.
> 
> Thanks for chipping in on this.
> 
> I agree. I don't think "TupRouting" really needs to be in the name.
> Probably "To" can also just become "2" and we can put back the
> Parent/Child before that.

Agree that "TupRouting" can go, but "To" is not too long, so there's no
need to use "2" instead of it.

Thanks,
Amit




Re: replication_slots usability issue

2018-10-31 Thread Michael Paquier
Hi Andres,

On Wed, Oct 31, 2018 at 03:48:02PM -0700, Andres Freund wrote:
> And done.  Thanks for the report JD.

Shouldn't we also switch the PANIC to a FATAL in RestoreSlotFromDisk()?
I don't mind doing so myself if you agree with the change, only on
HEAD as you seemed to disagree about changing that on back-branches.

Also, from 691d79a which you just committed:
+   ereport(FATAL,
+   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+errmsg("logical replication slots \"%s\" exists, but wal_level 
< logical",
+   NameStr(cp.slotdata.name)),
I can see one grammar mistake here, as you refer to only one slot.
The error messages should read:
"logical replication slot \"%s\" exists, but wal_level < logical"
and:
"physical replication slot \"%s\" exists, but wal_level < replica"

Thanks,
--
Michael




Re: INSTALL file

2018-10-31 Thread Andreas 'ads' Scherbaum

On 01.11.18 01:29, Michael Paquier wrote:

On Wed, Oct 31, 2018 at 08:30:40AM -0400, Stephen Frost wrote:

Agreed, we should really improve the README by merging the README.git
into it and make the project, as a whole, more accessible to new
developers.

+1.  I think as well that this approach would be a good thing.



Picking up on this idea, attached is a first draft for changing the README.

It includes links to the website, as well as the short version of the 
installation instructions.



Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

diff --git a/README b/README
index 12de3f1d73..24814265e8 100644
--- a/README
+++ b/README
@@ -13,14 +13,47 @@ PostgreSQL has many language interfaces, many of which are listed here:
 
 	https://www.postgresql.org/download
 
-See the file INSTALL for instructions on how to build and install
-PostgreSQL.  That file also lists supported operating systems and
-hardware platforms and contains information regarding any other
-software packages that are required to build or run the PostgreSQL
-system.  Copyright and license information can be found in the
-file COPYRIGHT.  A comprehensive documentation set is included in this
-distribution; it can be read as described in the installation
-instructions.
+
+Installation
+
+
+The installation instructions are listed on the website:
+
+https://www.postgresql.org/docs/current/static/install-short.html
+
+Short version:
+
+./configure
+make
+su
+make install
+adduser postgres
+mkdir /usr/local/pgsql/data
+chown postgres /usr/local/pgsql/data
+su - postgres
+/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
+/usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start
+
+The list of supported platforms is available on the website:
+
+https://www.postgresql.org/docs/current/static/supported-platforms.html
+
+Copyright and license information can be found in the file COPYRIGHT.
+
+
+Documentation
+=
+
+A comprehensive documentation set is included in this distribution;
+it can be read as described in the installation instructions.
+
+The documentation for all versions of PostgreSQL is available on the website:
+
+https://www.postgresql.org/docs/
+
+
+Download
+
 
 The latest version of this software may be obtained at
 https://www.postgresql.org/download/.  For more information look at our


Re: INSTALL file

2018-10-31 Thread Andres Freund
Hi,

On 2018-11-01 09:29:37 +0900, Michael Paquier wrote:
> On Wed, Oct 31, 2018 at 08:30:40AM -0400, Stephen Frost wrote:
> > Agreed, we should really improve the README by merging the README.git
> > into it and make the project, as a whole, more accessible to new
> > developers.
> 
> +1.  I think as well that this approach would be a good thing.

Strong +1.  I was recently looking through the information for new
developers and it's a major mess.

Greetings,

Andres Freund



Re: row filtering for logical replication

2018-10-31 Thread Euler Taveira
On Wed, Feb 28, 2018 at 20:03, Euler Taveira
 wrote:
> The attached patches add support for filtering rows in the publisher.
>
I rebased the patch. I added row filtering for initial
synchronization, pg_dump support and psql support. 0001 removes unused
code. 0002 reduces memory use. 0003 passes only the structure member
that is used in create_estate_for_relation. 0004 reuses a parser node for
row filtering. 0005 is the feature. 0006 prints WHERE expression in
psql. 0007 adds pg_dump support. 0008 is only for debug purposes (I'm
not sure some of these messages will be part of the final patch).
0001, 0002, 0003 and 0008 are not mandatory for this feature.

Comments?
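
For context, a hedged sketch of what a row-filtered publication might
look like with these patches; the exact WHERE placement and the
table/column names are my assumptions from the patch description, not
verified syntax:

-- Sketch only: publish just the rows matching the filter expression.
CREATE PUBLICATION active_customers
    FOR TABLE customers WHERE (active IS TRUE);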


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
From b2e56eaa9e16246c8158ff2961a6a4b2acbd096b Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Fri, 9 Mar 2018 18:39:22 +
Subject: [PATCH 1/8] Remove unused atttypmod column from initial table
 synchronization

 Since commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920, atttypmod was
 added but not used. The removal is safe because COPY from publisher
 does not need such information.
---
 src/backend/replication/logical/tablesync.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 6e420d8..f285813 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -660,7 +660,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 	StringInfoData cmd;
 	TupleTableSlot *slot;
 	Oid			tableRow[2] = {OIDOID, CHAROID};
-	Oid			attrRow[4] = {TEXTOID, OIDOID, INT4OID, BOOLOID};
+	Oid			attrRow[3] = {TEXTOID, OIDOID, BOOLOID};
 	bool		isnull;
 	int			natt;
 
@@ -704,7 +704,6 @@ fetch_remote_table_info(char *nspname, char *relname,
 	appendStringInfo(&cmd,
 	 "SELECT a.attname,"
 	 "   a.atttypid,"
-	 "   a.atttypmod,"
 	 "   a.attnum = ANY(i.indkey)"
 	 "  FROM pg_catalog.pg_attribute a"
 	 "  LEFT JOIN pg_catalog.pg_index i"
@@ -714,7 +713,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 	 "   AND a.attrelid = %u"
 	 " ORDER BY a.attnum",
 	 lrel->remoteid, lrel->remoteid);
-	res = walrcv_exec(wrconn, cmd.data, 4, attrRow);
+	res = walrcv_exec(wrconn, cmd.data, 3, attrRow);
 
 	if (res->status != WALRCV_OK_TUPLES)
 		ereport(ERROR,
@@ -735,7 +734,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 		Assert(!isnull);
 		lrel->atttyps[natt] = DatumGetObjectId(slot_getattr(slot, 2, &isnull));
 		Assert(!isnull);
-		if (DatumGetBool(slot_getattr(slot, 4, &isnull)))
+		if (DatumGetBool(slot_getattr(slot, 3, &isnull)))
 			lrel->attkeys = bms_add_member(lrel->attkeys, natt);
 
 		/* Should never happen. */
-- 
2.7.4

From 797a0e8d858b490df7a9e1526f76e49fe1e10215 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Fri, 9 Mar 2018 17:37:36 +
Subject: [PATCH 2/8] Store number of tuples in WalRcvExecResult

It seems to be useful information when allocating memory for queries
that return more than one row. It reduces memory allocation
for initial table synchronization.

While in it, since we have the number of columns, allocate only nfields
for cstrs instead of MaxTupleAttributeNumber.
---
 src/backend/replication/libpqwalreceiver/libpqwalreceiver.c | 9 ++---
 src/backend/replication/logical/tablesync.c | 5 ++---
 src/include/replication/walreceiver.h   | 1 +
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 1e1695e..2533e3a 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -865,6 +865,7 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
  errdetail("Expected %d fields, got %d fields.",
 		   nRetTypes, nfields)));
 
+	walres->ntuples = PQntuples(pgres);
 	walres->tuplestore = tuplestore_begin_heap(true, false, work_mem);
 
 	/* Create tuple descriptor corresponding to expected result. */
@@ -875,7 +876,7 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 	attinmeta = TupleDescGetAttInMetadata(walres->tupledesc);
 
 	/* No point in doing more here if there were no tuples returned. */
-	if (PQntuples(pgres) == 0)
+	if (walres->ntuples == 0)
 		return;
 
 	/* Create temporary context for local allocations. */
@@ -884,15 +885,17 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 	   ALLOCSET_DEFAULT_SIZES);
 
 	/* Process returned rows. */
-	for (tupn = 0; tupn < PQntuples(pgres); tupn++)
+	for (tupn = 0; tupn < walres->ntuples; tupn++)
 	{
-		char	   *cstrs[MaxTupleAttributeNumber];
+		char	**cstrs;
 
 		CHECK_FOR_INTE

Re: INSTALL file

2018-10-31 Thread Michael Paquier
On Wed, Oct 31, 2018 at 08:30:40AM -0400, Stephen Frost wrote:
> Agreed, we should really improve the README by merging the README.git
> into it and make the project, as a whole, more accessible to new
> developers.

+1.  I think as well that this approach would be a good thing.
--
Michael




Re: Speeding up INSERTs and UPDATEs to partitioned tables

2018-10-31 Thread David Rowley
On 1 November 2018 at 06:45, Robert Haas  wrote:
> On Wed, Aug 22, 2018 at 8:30 AM David Rowley
>  wrote:
>> On 22 August 2018 at 19:08, Amit Langote  
>> wrote:
>> > +#define PartitionTupRoutingGetToParentMap(p, i) \
>> > +#define PartitionTupRoutingGetToChildMap(p, i) \
>> >
>> > If the "Get" could be replaced by "Child" and "Parent", respectively,
>> > they'd sound more meaningful, imho.
>>
>> I did that to save 3 chars.  I think putting the additional
>> Child/Parent in the name is not really required. It's not as if we're
>> going to have a ParentToParent or a ChildToChild map, so I thought it
>> might be okay to assume that if it's "ToParent", that it's being
>> converted from the child and "ToChild" seems safe to assume it's being
>> converted from the parent. I can change it though if you feel very
>> strongly that what I've got is no good.
>
> I'm not sure exactly what is best here, but it seems unlikely to me
> that somebody is going to read that macro name and think, oh, that
> means "get the to-parent map".  They are more likely to be confused by
> the juxtaposition of "get" and "to".
>
> I think a better way to shorten the name would be to truncate the
> PartitionTupRouting() prefix in some way, e.g. dropping TupRouting.

Thanks for chipping in on this.

I agree. I don't think "TupRouting" really needs to be in the name.
Probably "To" can also just become "2" and we can put back the
Parent/Child before that.

I've attached v11, which does this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v11-0001-Speed-up-INSERT-and-UPDATE-on-partitioned-tables.patch


Re: Super PathKeys (Allowing sort order through precision loss functions)

2018-10-31 Thread Andres Freund
Hi,

On 2018-11-01 12:19:32 +1300, David Rowley wrote:
> On 1 November 2018 at 12:11, Tomas Vondra  
> wrote:
> > I still have trouble imagining what exactly would the function do to
> > determine if the optimization can be applied to substr() and similar
> > collation-dependent cases.
> 
> I guess the function would have to check for a Const offset of 0, and
> a collation, perhaps of "C" for the 1st arg.  In any case, I wouldn't
> want this idea to be hung up on the fact we can't determine how to
> make substr() work correctly with it.
> 
> I'm most interested in date_trunc() and friends. A first cut
> implementation would not have to implement functions for everything
> that's possible to implement.

FWIW, I kind of wonder, if we built proper infrastructure for making
such inferences from function calls, whether it could also be made
to support the transformation of LIKEs into indexable <= >= clauses.
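
For illustration, the kind of rewrite meant here, sketched against a
hypothetical table t with a text column name (the planner can already derive
this for constant-prefix patterns under a suitable opclass or the "C"
collation):

CREATE INDEX ON t (name text_pattern_ops);

-- this ...
SELECT * FROM t WHERE name LIKE 'abc%';
-- ... can be answered with the indexable range below (the upper bound being
-- derived by "incrementing" the constant prefix, roughly):
SELECT * FROM t WHERE name >= 'abc' AND name < 'abd';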

Greetings,

Andres Freund



Re: COPY FROM WHEN condition

2018-10-31 Thread Nasby, Jim
On Oct 11, 2018, at 10:35 AM, David Fetter  wrote:
> 
>> It didn't get far, but you may want to take a look at a rejected patch for
>> copy_srf() (set returning function)
>> https://www.postgresql.org/message-id/CADkLM%3DdoeiWQX4AGtDNG4PsWfSXz3ai7kY%3DPZm3sUhsUeev9Bg%40mail.gmail.com
>> https://commitfest.postgresql.org/12/869/
>> 
>> Having a set returning function gives you the full expressiveness of SQL,
>> at the cost of an extra materialization step.
> 
> I wonder whether something JIT-like could elide this. A very
> interesting subset of such WHEN clauses could be pretty
> straight-forward to implement in a pretty efficient way.

Are you thinking something like having a COPY command that provides results in 
such a way that they could be referenced in a FROM clause (perhaps a COPY that 
defines a cursor…)?
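
For context, a minimal sketch of what this takes today without such a feature,
and roughly the one-step form the proposal aims at (file path, table, and
column names here are made up):

CREATE TEMP TABLE staging (LIKE target INCLUDING DEFAULTS);
COPY staging FROM '/tmp/data.csv' WITH (FORMAT csv);
INSERT INTO target SELECT * FROM staging WHERE amount > 0;

-- roughly what the proposed syntax would collapse this into
-- (not implemented as of this thread):
-- COPY target FROM '/tmp/data.csv' WITH (FORMAT csv) WHEN (amount > 0);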

Re: Super PathKeys (Allowing sort order through precision loss functions)

2018-10-31 Thread David Rowley
On 1 November 2018 at 12:11, Tomas Vondra  wrote:
> I still have trouble imagining what exactly would the function do to
> determine if the optimization can be applied to substr() and similar
> collation-dependent cases.

I guess the function would have to check for a Const offset of 0, and
a collation, perhaps of "C" for the 1st arg.  In any case, I wouldn't
want this idea to be hung up on the fact we can't determine how to
make substr() work correctly with it.

I'm most interested in date_trunc() and friends. A first cut
implementation would not have to implement functions for everything
that's possible to implement.
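
To make the intended benefit concrete, a small sketch (hypothetical table and
index): because date_trunc('month', ts) is monotonic in ts, a path already
sorted on ts also satisfies an ordering on the truncated value, so the sort
below could be skipped:

CREATE TABLE evt (ts timestamptz NOT NULL);
CREATE INDEX ON evt (ts);
INSERT INTO evt SELECT g FROM generate_series('2018-01-01'::timestamptz,
                                              '2018-12-31', '1 hour') g;

-- today this needs an explicit sort step; with the proposal the index
-- order on ts could be reused directly:
SELECT date_trunc('month', ts) AS mon, count(*)
FROM evt
GROUP BY 1
ORDER BY 1;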

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel threads in query

2018-10-31 Thread David Fetter
On Wed, Oct 31, 2018 at 09:07:43AM -1000, Darafei "Komяpa" Praliaskouski wrote:
> Hi,
> 
> I've tried porting some of PostGIS algorithms to utilize multiple cores via
> OpenMP to return faster.

Great!

> Question is, what's the best policy to allocate cores so we can play nice
> with rest of postgres?

As Tom mentioned, the PostgreSQL backend is not yet threaded and is
not likely to be any time in the next couple of years. There
has been at least one attempt recently to start this work.
https://www.postgresql.org/message-id/9defcb14-a918-13fe-4b80-a0b02ff85527%40postgrespro.ru
 

> What I'd like to see is some function that I can call and get a
> number of threads I'm allowed to run, that will also advise rest of
> postgres to not use them, and a function to return the cores back
> (or do it automatically at the end of query). Is there an
> infrastructure for that?

Not really, as above.  In the case of GIS queries, you may be able to
send a large-enough payload of work to make it worthwhile to do some
kind of IPC or even a (short, high-bandwidth, low-latency) network hop
to communicate with a separate OpenMP GIS server.  So long as the
threads don't directly interact with the backend, you could do this
safely.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Super PathKeys (Allowing sort order through precision loss functions)

2018-10-31 Thread Tomas Vondra
On 10/31/2018 10:07 PM, David Rowley wrote:
> On 1 November 2018 at 05:40, Robert Haas  wrote:
>> This kinda reminds me of commit
>> 8f9fe6edce358f7904e0db119416b4d1080a83aa.  We needed a way to provide
>> the planner with knowledge about the behavior of specific functions.
>> In that case, the specific need was to be able to tell the planner
>> that a certain function call could be omitted or strength-reduced, and
>> we did that by having the planner call a function that encoded the
>> necessary knowledge.  Here, we want to evaluate a function call and
>> see whether it is order preserving, which could depend on a whole
>> bunch of stuff that isn't easily parameterized by catalog entries, but
>> could be figured out by a C function written for that purpose.  I'm
>> not really sure how that would work in this case, or whether it's a
>> good idea, but I thought I'd mention it just in case it's helpful.
> 
> Agreed. That's a good idea. Thanks.
> 

FWIW this is mostly what I had in mind when referring to the selectivity
estimation functions for operators, although I now realize it might not
have been explained very clearly.

Anyway, I agree this seems like a much better way than trying to store
all the potentially relevant meta-data in catalogs.

I still have trouble imagining what exactly would the function do to
determine if the optimization can be applied to substr() and similar
collation-dependent cases.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Super PathKeys (Allowing sort order through precision loss functions)

2018-10-31 Thread Nasby, Jim

> On Oct 30, 2018, at 9:08 AM, Simon Riggs  wrote:
> 
> On Tue, 30 Oct 2018 at 07:58, David Rowley  
> wrote:
>  
> I've started working on something I've ended up calling "Super
> PathKeys".  The idea here is to increase the likelihood of a Path with
> PathKeys being used for a purpose that requires a less strict sort
> order due to ordering being required from the return value of some
> precision loss function.
> 
> Anything left anchored would benefit, so SUBSTR(), TRIM() etc
> 
> Main use for this would be where the partition condition is a function, so we 
> can still order by partitions easily.

This would also be very helpful in many BI cases; it’s very common to aggregate 
based on year, year/month, year/quarter, etc.

The other thing that would be extremely useful would be pushing predicates 
through this, so you could do things like

WHERE date_trunc('year', timestamp_field) = 2018
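
A sketch of the manual rewrite that such predicate push-through would make
unnecessary (hypothetical table sales; note the comparison value would be the
truncated timestamp, e.g. '2018-01-01', rather than the bare year):

-- not index-friendly for a plain index on timestamp_field:
SELECT sum(amount) FROM sales
WHERE date_trunc('year', timestamp_field) = TIMESTAMP '2018-01-01';

-- equivalent sargable form that such an index can use:
SELECT sum(amount) FROM sales
WHERE timestamp_field >= '2018-01-01' AND timestamp_field < '2019-01-01';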

Re: PostgreSQL Limits and lack of documentation about them.

2018-10-31 Thread Nasby, Jim

> On Oct 31, 2018, at 5:22 PM, David Rowley  
> wrote:
> 
> On 1 November 2018 at 04:40, John Naylor  wrote:
>> Thanks for doing this. I haven't looked at the rendered output yet,
>> but I have some comments on the content.
>> 
>> +  Maximum Relation Size
>> +  32 TB
>> +  Limited by 2^32 pages per relation
>> 
>> I prefer "limited to" or "limited by the max number of pages per
>> relation, ...". I think pedantically it's 2^32 - 1, since that value
>> is used for InvalidBlockNumber. More importantly, that seems to be for
>> 8kB pages. I imagine this would go up with a larger page size. Page
>> size might also be worth mentioning separately. Also max number of
>> relation file segments, if any.
> 
> Thanks for looking at this.
> 
> I've changed this and added mention of BLCKSZ.  I was a bit unclear
> on how much internal detail should go into this.

It's a bit misleading to say "Can be increased by increasing BLCKSZ and
recompiling", since you'd also need to re-initdb. Given that messing with BLCKSZ
is pretty uncommon, I would simply put a note somewhere that mentions that these
values assume the default BLCKSZ of 8192.
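
For what it's worth, a quick way to see where the 32 TB figure comes from and
to check the block size of a running server (assuming a default build):

SHOW block_size;   -- 8192 on a default build

SELECT pg_size_pretty(((2^32 - 1) * current_setting('block_size')::bigint)::bigint);
-- roughly 32 TB: 2^32 - 1 pages of 8 kB each, the per-relation ceiling discussed above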

>> +  Maximum Columns per Table
>> +  250 - 1600
>> +  Depending on column types. (More details here)
>> 
>> Would this also depend on page size? Also, I'd put this entry before this 
>> one:
>> 
>> +  Maximum Row Size
>> +  1600 GB
>> +  Assuming 1600 columns, each 1 GB in size
>> 
>> A toast pointer is 18 bytes, according to the docs, so I would guess
>> the number of toasted columns would actually be much less? I'll test
>> this on my machine sometime (not 1600GB, but the max number of toasted
>> columns per tuple).
> 
> I did try a table with 1600 text columns then inserted values of
> several kB each. Trying with BIGINT columns the row was too large for
> the page. I've never really gotten a chance to explore these limits
> before, so I guess this is about the time.

Hmm… 18 bytes doesn’t sound right, at least not for the Datum. Offhand I’d 
expect it to be the small (1 byte) varlena header + an OID (4 bytes). Even then 
I don’t understand how 1600 text columns would work; the data area of a tuple 
should be limited to ~2000 bytes, and 2000/5 = 400.

Re: replication_slots usability issue

2018-10-31 Thread Andres Freund
On 2018-10-30 10:52:54 -0700, Andres Freund wrote:
> On 2018-10-30 11:51:09 +0900, Michael Paquier wrote:
> > On Mon, Oct 29, 2018 at 12:13:04PM -0700, Andres Freund wrote:
> > > I don't think this quite is the problem. ISTM the issue is rather that
> > > StartupReplicationSlots() *needs* to check whether wal_level > minimal,
> > > and doesn't. So you can create a slot, shutdown, change wal_level,
> > > startup. A slot exists but won't work correctly.
> > 
> > It seems to me that what we are looking for is just to complain at
> > startup if we find any slot data and if trying to start up with
> > wal_level = minimal.
> 
> Right, we really should just call CheckSlotRequirements() before doing
> so. I'll make it so, once I'm actually awake and had some coffee.

And done.  Thanks for the report JD.
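
For reference, a sketch of the sequence being fixed (on a throwaway test
cluster; the error wording below is quoted from memory):

-- with wal_level = replica (or higher):
SELECT pg_create_physical_replication_slot('test_slot');

-- then set wal_level = minimal (and max_wal_senders = 0) in postgresql.conf
-- and restart.  Previously the server started up with a slot that could never
-- work; with the fix, startup complains much like slot creation already does
-- under wal_level = minimal:
--   ERROR:  replication slots can only be used if wal_level >= replica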

- Andres



Re: PostgreSQL Limits and lack of documentation about them.

2018-10-31 Thread David Rowley
On 1 November 2018 at 04:40, John Naylor  wrote:
> Thanks for doing this. I haven't looked at the rendered output yet,
> but I have some comments on the content.
>
> +  Maximum Relation Size
> +  32 TB
> +  Limited by 2^32 pages per relation
>
> I prefer "limited to" or "limited by the max number of pages per
> relation, ...". I think pedantically it's 2^32 - 1, since that value
> is used for InvalidBlockNumber. More importantly, that seems to be for
> 8kB pages. I imagine this would go up with a larger page size. Page
> size might also be worth mentioning separately. Also max number of
> relation file segments, if any.

Thanks for looking at this.

I've changed this and added mention of BLCKSZ.  I was a bit unclear
on how much internal detail should go into this.

> +  Maximum Columns per Table
> +  250 - 1600
> +  Depending on column types. (More details here)
>
> Would this also depend on page size? Also, I'd put this entry before this one:
>
> +  Maximum Row Size
> +  1600 GB
> +  Assuming 1600 columns, each 1 GB in size
>
> A toast pointer is 18 bytes, according to the docs, so I would guess
> the number of toasted columns would actually be much less? I'll test
> this on my machine sometime (not 1600GB, but the max number of toasted
> columns per tuple).

I did try a table with 1600 text columns then inserted values of
several kB each. Trying with BIGINT columns the row was too large for
the page. I've never really gotten a chance to explore these limits
before, so I guess this is about the time.

> +  Maximum Identifier Length
> +  63 characters
> +  
>
> Can this be increased with recompiling, if not conveniently?

Yeah. I added a note about that.

> +  Maximum Indexed Columns
> +  32
> +  Can be increased by recompiling
> PostgreSQL
>
> How about the max number of included columns in a covering index?

Those are included in the limit. I updated the text.

>> I'm not so sure about detailing limits of GUCs since the limits of
>> those are mentioned in pg_settings.
>
> Maybe we could just have a link to that section in the docs.

That's likely a good idea. I was just unable to find anything better
than the link to the pg_settings view.

I've attached an updated patch, again it's just intended as an aid for
discussions at this stage. Also included the rendered html.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Title: Appendix B. Database Limitations

Appendix B. Database Limitations

The following table describes the limits of PostgreSQL.

Table B.1. PostgreSQL limitations

  Item                                           Limit          Comment
  Maximum Database Size                          Unlimited
  Maximum Number of Databases                    Unlimited
  Maximum Relation Size                          32 TB          Limited to 2^32 - 1 pages per relation.
                                                                Can be increased by increasing BLCKSZ
                                                                and recompiling PostgreSQL
  Maximum Columns per Table                      250 - 1600     Depending on column types. (More details here)
  Maximum Row Size                               1600 GB        Assuming 1600 columns, each 1 GB in size
  Maximum Field Size                             1 GB
  Maximum Identifier Length                      63 characters  Can be increased by recompiling PostgreSQL
  Maximum Rows per Table                         Unlimited
  Maximum Indexes per Table                      Unlimited
  Maximum Indexed Columns                        32             Can be increased by recompiling PostgreSQL.
                                                                Limit includes any INCLUDE columns
  Maximum Partition Keys                         32             Can be increased by recompiling PostgreSQL
  Maximum Relations per Database                 Unlimited
  Maximum Partitions per Partitioned Relations   268,435,456    May be increased by using sub-partitioning

v2-0001-Add-documentation-section-appendix-detailing-some.patch


Re: Should pg 11 use a lot more memory building an spgist index?

2018-10-31 Thread Tom Lane
Amit Langote  writes:
> Maybe, we don't need to spoil the interface of index_beginscan with the
> new memory context argument like my patch does if the simple following of
> its contract by amendscan would suffice.

Yeah, I'm not enamored of changing the API of index_beginscan for this;
the existing convention that it should allocate in CurrentMemoryContext
seems perfectly suitable.  And changing it would create a lot of code
churn, not only for us but for externally-maintained AMs.

What is at stake is changing the API of amendscan, to specify that it
need not release memory because the current context is expected to be
destroyed or reset shortly after ending the scan.  Then, for the small
number of call sites where that wouldn't be true, it's on those callers
to set up a suitable context and switch into it.  Note this is actually
forwards compatible, in that an AM that's still following the convention
of releasing stuff manually would not be broken.

regards, tom lane



Re: Parallel threads in query

2018-10-31 Thread Tom Lane
Darafei "Komяpa" Praliaskouski  writes:
> Question is, what's the best policy to allocate cores so we can play nice
> with rest of postgres?

> What I'd like to see is some function that I can call and get a number of
> threads I'm allowed to run, that will also advise rest of postgres to not
> use them, and a function to return the cores back (or do it automatically
> at the end of query). Is there an infrastructure for that?

There is not, because we do not use or support multiple threads inside
a Postgres backend, and have no intention of doing so any time soon.
There is a huge amount of non-thread-safe infrastructure there, and
if you call any of it from other than the main thread, bad things will
happen.

You might be able to make this work if the threaded stuff is kept rigidly
separate from any core Postgres code, but it'll be error-prone.

regards, tom lane



Re: Super PathKeys (Allowing sort order through precision loss functions)

2018-10-31 Thread David Rowley
On 1 November 2018 at 05:40, Robert Haas  wrote:
> This kinda reminds me of commit
> 8f9fe6edce358f7904e0db119416b4d1080a83aa.  We needed a way to provide
> the planner with knowledge about the behavior of specific functions.
> In that case, the specific need was to be able to tell the planner
> that a certain function call could be omitted or strength-reduced, and
> we did that by having the planner call a function that encoded the
> necessary knowledge.  Here, we want to evaluate a function call and
> see whether it is order preserving, which could depend on a whole
> bunch of stuff that isn't easily parameterized by catalog entries, but
> could be figured out by a C function written for that purpose.  I'm
> not really sure how that would work in this case, or whether it's a
> good idea, but I thought I'd mention it just in case it's helpful.

Agreed. That's a good idea. Thanks.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Parallel threads in query

2018-10-31 Thread Komяpa
Hi,

I've tried porting some of PostGIS algorithms to utilize multiple cores via
OpenMP to return faster.

Question is, what's the best policy to allocate cores so we can play nice
with rest of postgres?

What I'd like to see is some function that I can call and get a number of
threads I'm allowed to run, that will also advise rest of postgres to not
use them, and a function to return the cores back (or do it automatically
at the end of query). Is there an infrastructure for that?
-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


Re: Super PathKeys (Allowing sort order through precision loss functions)

2018-10-31 Thread Tom Lane
Robert Haas  writes:
> This kinda reminds me of commit
> 8f9fe6edce358f7904e0db119416b4d1080a83aa.  We needed a way to provide
> the planner with knowledge about the behavior of specific functions.
> In that case, the specific need was to be able to tell the planner
> that a certain function call could be omitted or strength-reduced, and
> we did that by having the planner call a function that encoded the
> necessary knowledge.  Here, we want to evaluate a function call and
> see whether it is order preserving, which could depend on a whole
> bunch of stuff that isn't easily parameterized by catalog entries, but
> could be figured out by a C function written for that purpose.

+1.  If we're otherwise going to need multiple pg_proc columns, this
is a better answer just from the standpoint of avoiding pg_proc bloat:
we'd only need to add one OID column.  And I concur with your point
that we're going to have a really hard time parameterizing the mechanism
adequately if there isn't dedicated per-function code.

regards, tom lane



Re: Continue work on changes to recovery.conf API

2018-10-31 Thread Robert Haas
On Fri, Sep 28, 2018 at 4:20 PM Peter Eisentraut
 wrote:
> > - recovery_target (immediate), recovery_target_name, recovery_target_time, 
> > recovery_target_xid, recovery_target_lsn are replaced to 
> > recovery_target_type and recovery_target_value (was discussed and changed 
> > in previous thread)
>
> I think this was the major point of contention.  I reread the old
> thread, and it's still not clear why we need to change this.  _type and
> _value look like an EAV system to me.  GUC variables should be
> verifiable independent of another variable.  The other idea of using
> only one variable seems better, but using two variables seems like a
> poor compromise between one and five.

+1.  I like one best, I can live with five, and I think two stinks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgbench doc fix

2018-10-31 Thread Robert Haas
On Tue, Oct 30, 2018 at 8:48 AM Tatsuo Ishii  wrote:
> Yes, you need to send params (thus send bind message) anyway.
> Regarding re-parsing, maybe you mixed up parse analysis with
> planning? Re-doing parse analysis can only be avoided if you can reuse
> named (or unnamed) prepared statements.

So given this, I'm struggling to see anything wrong with the current
wording.  I mean, if you say that you are reusing prepared statements,
someone will assume that you are avoiding preparing them repeatedly,
which -M extended will not do ... and by the nature of that approach,
cannot do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Speeding up INSERTs and UPDATEs to partitioned tables

2018-10-31 Thread Robert Haas
On Wed, Aug 22, 2018 at 8:30 AM David Rowley
 wrote:
> On 22 August 2018 at 19:08, Amit Langote  
> wrote:
> > +#define PartitionTupRoutingGetToParentMap(p, i) \
> > +#define PartitionTupRoutingGetToChildMap(p, i) \
> >
> > If the "Get" could be replaced by "Child" and "Parent", respectively,
> > they'd sound more meaningful, imho.
>
> I did that to save 3 chars.  I think putting the additional
> Child/Parent in the name is not really required. It's not as if we're
> going to have a ParentToParent or a ChildToChild map, so I thought it
> might be okay to assume that if it's "ToParent", that it's being
> converted from the child and "ToChild" seems safe to assume it's being
> converted from the parent. I can change it though if you feel very
> strongly that what I've got is no good.

I'm not sure exactly what is best here, but it seems unlikely to me
that somebody is going to read that macro name and think, oh, that
means "get the to-parent map".  They are more likely to be confused by
the juxtaposition of "get" and "to".

I think a better way to shorten the name would be to truncate the
PartitionTupRouting() prefix in some way, e.g. dropping TupRouting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



How to properly use the Executor interface?

2018-10-31 Thread Kai Kratz
Hi Hackers,

first time writing to the hackers list, so I hope this is the right place to 
ask. I recently joined Swarm64 and we are building a postgres extension with 
the fdw interface.

I am trying to evaluate SQL statements with ExecutorStart, -Run, -Finish, and
-End calls during ExecForeignInsert. I set up a QueryDesc and call ExecutorStart
during BeginForeignModify and store the QueryDesc in the fdwState to call
ExecutorRun / Rewind on each ExecForeignInsert. On EndForeignModify,
ExecutorFinish and ExecutorEnd are called. The statement is used to generate an
additional column from the value of one or more of the present columns,
e.g. hashing a string.

I am now not really sure if I am abusing the interface and if it is safe to
use in such a way. I noticed there are hooks for Start, Run, End and Finish,
and I am unsure what requirements this places on my usage.

Initially I tried to keep the started Executor around beyond the lifetime of a 
query and only clean up when the worker process shuts down. This did not end 
well (as in SIGSEGV) as soon as I loaded auto_explain which tried to report on 
the 'slow query' outside of a portal.

Given the comments in execMain.c, I am also wondering what cleanup I need to do
in case the insert fails after ExecutorStart.

Cheers
--
Kai Kratz
Swarm64: https://swarm64.com/



Re: Is there way to detect uncommitted 'new table' in pg_class?

2018-10-31 Thread Robert Haas
On Wed, Oct 31, 2018 at 6:05 AM Hubert Zhang  wrote:
> In PG READ UNCOMMITTED is treated as READ COMMITTED
> But I have a requirement to read dirty table. Is there way to detect table 
> which is created in other uncommitted transaction?
>
> T1:
> BEGIN;
> create table a(i int);
>
> T2:
> select * from pg_class where relname='a';
> could return table a?

No.  The catalog entries are uncommitted, and therefore invisible to
other transactions.

In theory, at least, you could write C code to scan the catalog tables
with SnapshotDirty to find the catalog entries, but I don't think that
helps a whole lot.  You couldn't necessarily rely on those catalog
entries to be in a consistent state, and even if they were, they might
depend on committed types or functions or similar whose definitions
your backend can't see.  Moreover, the creating backend will have an
AccessExclusiveLock on the table -- if you write C code to bypass that
and read the data anyway, then you will probably destabilize the
entire system for complicated reasons that I don't feel like
explaining right now.

You should try very hard to find some way of solving this problem that
doesn't require reading data from a table that hasn't been committed
yet, because you are almost certainly not going to be able to make
that work reliably even if you are willing to write code in C.
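
A minimal demonstration of the visibility rule described above, using two
ordinary sessions at the default isolation level:

-- session 1
BEGIN;
CREATE TABLE a (i int);

-- session 2, while session 1 is still open
SELECT relname FROM pg_class WHERE relname = 'a';   -- zero rows
SELECT * FROM a;                                    -- ERROR:  relation "a" does not exist

-- once session 1 commits, both statements in session 2 see the table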

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_dumpall --exclude-database option

2018-10-31 Thread Fabien COELHO



:-( My fault, I just created a new one.


Hmmm... so did I:-) We did it a few minutes apart. I did not find yours 
when I first searched, then I proceeded to try to move the previous CF 
entry which had been marked as "returned" but this was rejected, so I 
recreated the one without checking whether it had appeared in between.


Hopefully someone can remove it…

--
Fabien.

Re: ToDo: show size of partitioned table

2018-10-31 Thread Pavel Stehule
st 31. 10. 2018 v 7:34 odesílatel Amit Langote <
langote_amit...@lab.ntt.co.jp> napsal:

> On 2018/10/31 15:30, Pavel Stehule wrote:
> > st 31. 10. 2018 v 3:27 odesílatel Amit Langote <
> > langote_amit...@lab.ntt.co.jp> napsal:
> >> +appendPQExpBufferStr(&buf, "\nWHERE c.relkind IN ('p')\n");
> >>
> >> I wonder if we should list partitioned indexes ('I') as well, because
> >> their size information is not available with \di+.  But maybe, they
> should
> >> have a separate command.
> >>
> >
> > I though about it too and I prefer separate command. Similar to \di+
>
> Okay, maybe \dI+.
>
>
what about combination

\dPt+ for partitined tables and size
\dPi+ for indexes on partitioned tables and size
\dP+ for total partition size (tables + indexes)

What do you think about it?
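
For comparison, roughly what \dP+ would condense: a manual query summing a
partitioned table together with all of its partitions (hypothetical table name
measurement):

WITH RECURSIVE parts AS (
    SELECT 'measurement'::regclass AS relid
    UNION ALL
    SELECT i.inhrelid::regclass
    FROM pg_inherits i JOIN parts p ON i.inhparent = p.relid
)
SELECT pg_size_pretty(sum(pg_total_relation_size(relid))) AS total_size
FROM parts;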

Regards

Pavel



> > I am not sure. Has not sense run this test over empty database, and some
> > bigger database can increase running.
> >
> > More the size can be platform depend.
>
> Okay, sure.
>
> Thanks,
> Amit
>
>


Re: FETCH FIRST clause WITH TIES option

2018-10-31 Thread Robert Haas
On Mon, Oct 29, 2018 at 12:48 PM Andrew Gierth
 wrote:
> Then FETCH FIRST N WITH TIES becomes "stop when the expression
>   rank() over (order by ...) <= N
> is no longer true" (where the ... is the existing top level order by)

Wouldn't that be wicked expensive compared to something hard-coded
that does the same thing?
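
For reference, the window-function formulation under discussion, spelled out
against a hypothetical table scores; the idea is to stop fetching once the
rank() expression turns false rather than computing it for every row:

-- manual equivalent of:  ... ORDER BY score DESC FETCH FIRST 10 ROWS WITH TIES
SELECT *
FROM (SELECT s.*, rank() OVER (ORDER BY score DESC) AS rnk
      FROM scores s) ranked
WHERE rnk <= 10
ORDER BY score DESC;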

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_promote not marked as parallel-restricted in pg_proc.dat

2018-10-31 Thread Robert Haas
On Mon, Oct 29, 2018 at 6:48 PM Michael Paquier  wrote:
> All the backup-related functions doing on-disk activity are marked as
> parallel-restricted:
> =# select proparallel, proname from pg_proc where proname ~ 'backup';
>  proparallel |       proname
> -------------+----------------------
>  s           | pg_is_in_backup
>  s           | pg_backup_start_time
>  r           | pg_stop_backup
>  r           | pg_start_backup
>  r           | pg_stop_backup
> (5 rows)
>
> As restricted, only the group leader is allowed to execute the
> function.  So as long as we enforce the execution uniqueness, I don't
> think that there is any point in enforcing a serial scan for the whole
> query.  Queries launching pg_promote are unlikely to be very
> complicated, but it still does not seem like a consistent experience to me
> to not use the same level for such operations.

There's no rule whatsoever that a parallel worker can't write to the
disk.  pg_start_backup and pg_stop_backup have to be
parallel-restricted because, when used in non-exclusive mode, they
establish backend-local state that wouldn't be synchronized with the
state in the workers -- namely the information that a non-exclusive
backup is in progress.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_dumpall --exclude-database option

2018-10-31 Thread Andrew Dunstan




On 10/31/2018 12:44 PM, Fabien COELHO wrote:


Hello Andrew,


This patch addresses all these concerns.


Patch v4 applies cleanly, compiles, doc generation ok, global & local 
tests ok.


Tiny comments: there is a useless added blank line at the beginning of 
the added varlistenry.


I have recreated the CF entry and put the patch to ready... but I've 
must have mixed up something because now there are two entries:-(


Could anyone remove the duplicate entry (1859 & 1860 are the same)? 
Peter??





:-( My fault, I just created a new one.

cheers

andrew

--
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: WIP: Avoid creation of the free space map for small tables

2018-10-31 Thread Robert Haas
On Tue, Oct 23, 2018 at 9:42 AM John Naylor  wrote:
> A case could be made for setting the threshold to 4, since not having
> 3 blocks of FSM in shared buffers exactly makes up for the 3 other
> blocks of heap that are checked when free space runs out.

That doesn't seem like an unreasonable argument.  I'm not sure whether
the right threshold is 4 or something a little bigger, but I bet it's
not very large.  It seems important to me that before anybody thinks
about committing this, we construct some kind of destruction case
where repeated scans of the whole table are triggered as frequently as
possible, and then run that test with varying thresholds.  I might be
totally wrong, but I bet with a value as large as 32 you will be able
to find cases where it regresses in a big way.

We also need to think about what happens on the standby, where the FSM
is updated in a fairly different way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_dumpall --exclude-database option

2018-10-31 Thread Fabien COELHO



Hello Andrew,


This patch addresses all these concerns.


Patch v4 applies cleanly, compiles, doc generation ok, global & local 
tests ok.


Tiny comments: there is a useless added blank line at the beginning of the 
added varlistenry.


I have recreated the CF entry and put the patch to ready... but I've must 
have mixed up something because now there are two entries:-(


Could anyone remove the duplicate entry (1859 & 1860 are the same)? 
Peter??


--
Fabien.



Re: Super PathKeys (Allowing sort order through precision loss functions)

2018-10-31 Thread Robert Haas
On Wed, Oct 31, 2018 at 9:19 AM Tomas Vondra
 wrote:
> I think it can't be made just by adding a couple of columns to pg_proc,
> as explained above. A separate catalog mapping procs to opclasses (and
> maybe collations) may be needed. For cases where it depends on the
> actual parameter values (like substr/trim/ltrim) an extra function might
> be needed (something like the selectivity functions for data types).

This kinda reminds me of commit
8f9fe6edce358f7904e0db119416b4d1080a83aa.  We needed a way to provide
the planner with knowledge about the behavior of specific functions.
In that case, the specific need was to be able to tell the planner
that a certain function call could be omitted or strength-reduced, and
we did that by having the planner call a function that encoded the
necessary knowledge.  Here, we want to evaluate a function call and
see whether it is order preserving, which could depend on a whole
bunch of stuff that isn't easily parameterized by catalog entries, but
could be figured out by a C function written for that purpose.  I'm
not really sure how that would work in this case, or whether it's a
good idea, but I thought I'd mention it just in case it's helpful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: FDW Parallel Append

2018-10-31 Thread Sanyo Moura
Hello Amit
Tks a lot, that worked.
Now I have to deal with other concerns when launching more than 1 worker.
I'll try to advance my tests.

Regards,

Sanyo Moura

Em qua, 31 de out de 2018 às 01:38, Amit Langote <
langote_amit...@lab.ntt.co.jp> escreveu:

> Hi,
>
> On 2018/10/31 3:25, Sanyo Moura wrote:
> > Hi hackers,
> >
> > I am trying to improve my xdr_fdw (a foreign data wrapper that scan file
> > systems that keep big data compacted) to scan partitions in parallel.
> >
> > But when I execute or analyze I get an error:
> >
> > EXPLAIN ANALYZE SELECT * FROM precio WHERE fecha BETWEEN '2017-01-01' AND
> > '2017-01-02'
> >
> > ERROR:  ExtensibleNodeMethods "XDRInfo" was not registered
> > CONTEXT:  parallel worker
> > SQL state: 42704
> >
> > XDRInfo is my private structure that implement extensible methods.
> > It is registered in my "xdr_fdw_handler"
> >
> > Datum
> > xdr_fdw_handler(PG_FUNCTION_ARGS)
> > {
> > FdwRoutine *routine = makeNode(FdwRoutine);
> > .
> > .
> > .
> > RegisterXDRInfoExtension();
> >
> > PG_RETURN_POINTER(routine);
> > }
> >
> > I think that each new worker process need to call my
> > RegisterXDRInfoExtension function.
> > But where do I call it? Do I have an entry point each time a new worker
> > process is created?
>
> I think you'll need to call your Register* function from the _PG_init
> function of your module.  If you haven't defined the _PG_init function,
> you should do that first.
>
> Thanks,
> Amit
>
>


Re: PostgreSQL Limits and lack of documentation about them.

2018-10-31 Thread John Naylor
On 10/30/18, David Rowley  wrote:
> On 26 October 2018 at 11:40, Haribabu Kommi 
> wrote:
>> On Fri, Oct 26, 2018 at 9:30 AM David Rowley
>> 
>> wrote:
>>>
>>> For a long time, we documented our table size, max columns, max column
>>> width limits, etc. in https://www.postgresql.org/about/ , but that
>>> information seems to have now been removed. The last version I can
>>> find with the information present is back in April this year. Here's a
>>> link to what we had:
>>>
>>> https://web.archive.org/web/20180413232613/https://www.postgresql.org/about/
>>>
>>> I think it's a bit strange that we don't have this information fairly
>>> early on in the official documentation.  I only see a mention of the
>>> 1600 column limit in the create table docs. Nothing central and don't
>>> see mention of 32 TB table size limit.
>>>
>>> I don't have a patch, but I propose we include this information in the
>>> docs, perhaps on a new page in the preface part of the documents.
>>
>>
>> I also try to find such limits of PostgreSQL, but I couldn't find it.
>> +1 to add them to docs.
>
> I've attached a very rough patch which adds a new appendix section
> named "Database Limitations".  I've included what was mentioned in [1]
> plus I've added a few other things that I thought should be mentioned.
> I'm sure there will be many more ideas.

David,
Thanks for doing this. I haven't looked at the rendered output yet,
but I have some comments on the content.

+  Maximum Relation Size
+  32 TB
+  Limited by 2^32 pages per relation

I prefer "limited to" or "limited by the max number of pages per
relation, ...". I think pedantically it's 2^32 - 1, since that value
is used for InvalidBlockNumber. More importantly, that seems to be for
8kB pages. I imagine this would go up with a larger page size. Page
size might also be worth mentioning separately. Also max number of
relation file segments, if any.

+  Maximum Columns per Table
+  250 - 1600
+  Depending on column types. (More details here)

Would this also depend on page size? Also, I'd put this entry before this one:

+  Maximum Row Size
+  1600 GB
+  Assuming 1600 columns, each 1 GB in size

A toast pointer is 18 bytes, according to the docs, so I would guess
the number of toasted columns would actually be much less? I'll test
this on my machine sometime (not 1600GB, but the max number of toasted
columns per tuple).

+  Maximum Identifier Length
+  63 characters
+  

Can this be increased with recompiling, if not conveniently?

+  Maximum Indexed Columns
+  32
+  Can be increased by recompiling
PostgreSQL

How about the max number of included columns in a covering index?

> I'm not so sure about detailing limits of GUCs since the limits of
> those are mentioned in pg_settings.

Maybe we could just have a link to that section in the docs.

--
-John Naylor



Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-10-31 Thread Andres Freund
Hi,

On 2018-10-31 11:13:13 -0400, Andrew Dunstan wrote:
> I agree that just sending a blob of the internal format isn't a great idea.

It's entirely unacceptable afaict. Besides the whole "exposing
internals" issue, it's also, at the very least, not endianness-safe, and it
depends on the local alignment requirements (which differ both between
platforms and 32/64 bit), numeric's internal encoding, and probably more.

Greetings,

Andres Freund



Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-10-31 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> I dunno, I do not think it's a great idea to expose jsonb's internal
> >> format to the world.  We intentionally did not do that when the type
> >> was first defined --- that's why its binary I/O format isn't already
> >> like this --- and I don't see that the tradeoffs have changed since then.
> 
> > I disagree- it's awfully expensive to go back and forth between string
> > and a proper representation.
> 
> Has anyone put any effort into making jsonb_out() faster?  I think that
> that would be way more productive.  Nobody is going to want to write
> code to convert jsonb's internal form into whatever their application
> uses; particularly not dealing with numeric fields.

I'm all for making jsonb_out() faster, but even a faster jsonb_out()
isn't going to be faster than shoveling the jsonb across.

The concern over the application question seems like a complete red
herring to me- people will bake into the various libraries the ability
to convert from our jsonb format to the language's preferred json data
structure, and even if that doesn't happen, we clearly have someone here
who is saying they'd write code to convert from our jsonb format to
whatever their application or language uses, and we've heard that
multiple times on this list.

> In any case, the approach proposed in this patch seems like the worst
> of all possible worlds: it's inconsistent and we get zero benefit from
> having thrown away our information-hiding.  If we're going to expose the
> internal format, let's just change the definition of the type's binary
> I/O format, thereby getting a win for purposes like COPY BINARY as well.

I hadn't looked at the patch, so I'm not surprised it has issues.  I
recall there was some prior discussion about what a good approach was to
implementing this and that changing the binary i/o format was the way to
go, but certainly a review of the threads should be done by whomever
wants to implement this or review the patch.

> We'd need to ensure that jsonb_recv could tell whether it was seeing the
> old or new format, but at worst that'd require prepending a header of
> some sort.  (In practice, I suspect we'd end up with a wire-format
> definition that isn't exactly the bits-on-disk, but something easily
> convertible to/from that and more easily verifiable by jsonb_recv.
> Numeric subfields, for instance, ought to match the numeric wire
> format, which IIRC isn't exactly the bits-on-disk either.)

Agreed, that'd certainly be a good idea.

Thanks!

Stephen




Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-10-31 Thread Andrew Dunstan




On 10/31/2018 10:21 AM, Tom Lane wrote:

Stephen Frost  writes:

* Tom Lane (t...@sss.pgh.pa.us) wrote:

I dunno, I do not think it's a great idea to expose jsonb's internal
format to the world.  We intentionally did not do that when the type
was first defined --- that's why its binary I/O format isn't already
like this --- and I don't see that the tradeoffs have changed since then.

I disagree- it's awfully expensive to go back and forth between string
and a proper representation.

Has anyone put any effort into making jsonb_out() faster?  I think that
that would be way more productive.  Nobody is going to want to write
code to convert jsonb's internal form into whatever their application
uses; particularly not dealing with numeric fields.

In any case, the approach proposed in this patch seems like the worst
of all possible worlds: it's inconsistent and we get zero benefit from
having thrown away our information-hiding.  If we're going to expose the
internal format, let's just change the definition of the type's binary
I/O format, thereby getting a win for purposes like COPY BINARY as well.
We'd need to ensure that jsonb_recv could tell whether it was seeing the
old or new format, but at worst that'd require prepending a header of
some sort.  (In practice, I suspect we'd end up with a wire-format
definition that isn't exactly the bits-on-disk, but something easily
convertible to/from that and more easily verifiable by jsonb_recv.
Numeric subfields, for instance, ought to match the numeric wire
format, which IIRC isn't exactly the bits-on-disk either.)





jsonb_send() sends a version number byte, currently 1. So if we invent a 
new version we would send 2 and teach jsonb_recv to be able to handle 
both. This was kinda anticipated.
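
A quick way to see that version byte from SQL (assuming the send format is
still the one-byte version followed by the text form of the value):

SELECT substring(jsonb_send('{"a": 1}'::jsonb) from 1 for 1);
-- returns \x01, the current binary-format version byte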


I agree that just sending a blob of the internal format isn't a great idea.

cheers

andrew

--
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Ordered Partitioned Table Scans

2018-10-31 Thread Antonin Houska
David Rowley  wrote:

> On 31 October 2018 at 13:05, David Rowley  
> wrote:
> >>> On 28 October 2018 at 03:49, Julien Rouhaud  wrote:
> >> I've registered as a reviewer.   I still didn't have a deep look at
> >> the patch yet, but thanks a lot for working on it!
> >
> > Thanks for signing up to review.  I need to send another revision of
> > the patch to add a missing call to truncate_useless_pathkeys(). Will
> > try to do that today.
> 
> I've attached a patch that ...

I've picked this one when looking around what I could review.

* As for the logic, I found generate_mergeappend_paths() to be the most
  interesting part:

Imagine table partitioned by "i", so "partition_pathkeys" is {i}.

partition 1:

i | j
--+--
0 | 0
1 | 1
0 | 1
1 | 0

partition 2:

i | j
--+--
3 | 0
2 | 0
2 | 1
3 | 1

Even if "pathkeys" is {i, j}, i.e. not contained in "partition_pathkeys", the
ordering of the subpaths should not change the way tuples are split into
partitions.

Obviously a problem is if "partition_pathkeys" and "pathkeys" lists start with
different items. To propose more generic rule, I used this example of
range-partitioned table, where "i" and "j" are the partitioning keys:

partition 1:

 i | j | k 
---+---+---
 0 | 0 | 1
 0 | 0 | 0

partition 2:

 i | j | k 
---+---+---
 0 | 1 | 0
 0 | 1 | 1

If the output "pathkey" is {i, k}, then the Append path makes rows of both
partitions interleave:

 i | j | k 
---+---+---
 0 | 0 | 0
 0 | 1 | 0
 0 | 0 | 1
 0 | 1 | 1

So in general I think the restriction is that "pathkeys" and
"partition_pathkeys" may not differ at any position where both are defined. Or
in other words: the shorter of the two pathkey lists must be contained in the
longer one. Does that make sense to you?
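
The same example in runnable form (table and partition names are made up),
showing the interleaving that rules out a plain ordered Append here:

CREATE TABLE rp (i int, j int, k int) PARTITION BY RANGE (i, j);
CREATE TABLE rp1 PARTITION OF rp FOR VALUES FROM (0, 0) TO (0, 1);
CREATE TABLE rp2 PARTITION OF rp FOR VALUES FROM (0, 1) TO (1, 0);
INSERT INTO rp VALUES (0, 0, 1), (0, 0, 0), (0, 1, 0), (0, 1, 1);

SELECT * FROM rp ORDER BY i, k;   -- rows from rp1 and rp2 interleave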


Another problem I see is that build_partition_pathkeys() continues even if it
fails to create a pathkey for some partitioning column. In the example above
it would mean that the table can have "partition_pathkeys" equal to {j}
instead of {i, j} on some EC-related conditions. However such a key does not
correspond to reality - this is easier to imagine if another partition is
considered.

partition 3:

 i | j | k 
---+---+---
 1 | 0 | 1
 1 | 0 | 0

So I think no "partition_pathkeys" should be generated in that case. On the
other hand, if the function returned the part of the list it could construct
so far, it'd be wrong because such incomplete pathkeys could pass the checks I
proposed above for reasons unrelated to the partitioning scheme.


The following comments are mostly on coding:

* Both qsort_partition_list_value_cmp() and qsort_partition_rbound_cmp() have
  this sentence in the header comment:

Note: If changing this, see build_partition_pathkeys()

However I could not find other use besides that in
RelationBuildPartitionDesc().

* create_append_path():

/*
 * Apply query-wide LIMIT if known and path is for sole base relation.
 * (Handling this at this low level is a bit klugy.)
 */
if (root != NULL && pathkeys != NULL &&
bms_equal(rel->relids, root->all_baserels))
pathnode->limit_tuples = root->limit_tuples;
else
pathnode->limit_tuples = -1.0;

  I think this optimization is not specific to AppendPath / MergeAppendPath,
  so it could be moved elsewhere (as a separate patch of course). But
  specifically for AppendPath, why do we have to test pathkeys? The pathkeys
  of the AppendPath do not necessarily describe the order of the set to which
  LIMIT is applied, so their existence should not be important here.

* If pathkeys is passed, shouldn't create_append_path() include the
  cost_sort() of subpaths just like create_merge_append_path() does?  And if
  so, then create_append_path() and create_merge_append_path() might
  eventually have some common code (at least for the subpath processing) to be
  put into a separate function.

* Likewise, create_append_plan() / create_merge_append_plan() are going to be
  more similar then before, so some refactoring could also make sense.

Although it's not too much code, I admit the patch is not trivial, so I'm
curious about your opinion.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Re: Typo in xlog.c comment?

2018-10-31 Thread Andres Freund
Hi,

On 2018-10-31 15:32:24 +0100, Daniel Gustafsson wrote:
> Am I right in interpreting the below comment in xlog.c as documenting a new
> mode of operation, so “Not” should actually be “Note”?  Or am I just not able
> to English today?

No, you're right. Fixed!

Greetings,

Andres Freund



Typo in xlog.c comment?

2018-10-31 Thread Daniel Gustafsson
Am I right in interpreting the below comment in xlog.c as documenting a new
mode of operation, so “Not” should actually be “Note”?  Or am I just not able
to English today?

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 62fc418893..246869bba2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5579,7 +5579,7 @@ readRecoveryCommandFile(void)
}

/*
-* Override any inconsistent requests. Not that this is a change of
+* Override any inconsistent requests. Note that this is a change of
 * behaviour in 9.5; prior to this we simply ignored a request to pause 
if
 * hot_standby = off, which was surprising behaviour.
 */

cheers ./daniel


Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-10-31 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I dunno, I do not think it's a great idea to expose jsonb's internal
>> format to the world.  We intentionally did not do that when the type
>> was first defined --- that's why its binary I/O format isn't already
>> like this --- and I don't see that the tradeoffs have changed since then.

> I disagree- it's awfully expensive to go back and forth between string
> and a proper representation.

Has anyone put any effort into making jsonb_out() faster?  I think that
that would be way more productive.  Nobody is going to want to write
code to convert jsonb's internal form into whatever their application
uses; particularly not dealing with numeric fields.

In any case, the approach proposed in this patch seems like the worst
of all possible worlds: it's inconsistent and we get zero benefit from
having thrown away our information-hiding.  If we're going to expose the
internal format, let's just change the definition of the type's binary
I/O format, thereby getting a win for purposes like COPY BINARY as well.
We'd need to ensure that jsonb_recv could tell whether it was seeing the
old or new format, but at worst that'd require prepending a header of
some sort.  (In practice, I suspect we'd end up with a wire-format
definition that isn't exactly the bits-on-disk, but something easily
convertible to/from that and more easily verifiable by jsonb_recv.
Numeric subfields, for instance, ought to match the numeric wire
format, which IIRC isn't exactly the bits-on-disk either.)

regards, tom lane



Re: pg_dumpall --exclude-database option

2018-10-31 Thread Andrew Dunstan



On 10/13/2018 10:07 AM, Fabien COELHO wrote:


Hello Andrew,

A question: would it makes sense to have a symmetrical 
--include-database=PATTERN option as well?


I don't think so. If you only want a few databases, just use pg_dump. 
The premise of pg_dumpall is that you want all of them and this 
switch provides for exceptions to that.


Ok, sounds reasonable.

Somehow the option does not make much sense when under -g/-r/-t... 
maybe it should complain, like it does when the others are used 
together?


Added an error check.


Ok.

ISTM that it would have been better to issue just one query with an 
OR list, but that would require to extend "processSQLNamePattern" a 
little bit. Not sure whether it is worth it.


I don't think it is. This uses the same pattern that is used in 
pg_dump.c for similar switches.


Ok.


revised patch attached.


Patch applies cleanly, compiles, make check ok, pg_dump tap tests ok, 
doc build ok.


Very minor comments:

Missing space after comma:

 + {"exclude-database",required_argument, NULL, 5},

Now that C99 is okay, ISTM that both for loops in 
expand_dbname_patterns could benefit from using loop-local variables:


  for (SimpleStringListCell *cell = ...
  for (int i = ...

About the documentation:

   "When using wildcards, be careful to quote the pattern if needed to 
prevent

    the shell from expanding the wildcards."

I'd suggest to consider simplifying the end, maybe "to prevent shell 
wildcard expansion".


The feature is not tested per se. Maybe one existing tap test could be 
extended with minimal fuss to use it, eg --exclude-database='[a-z]*' 
should be close to only keeping the global stuff? I noticed an 
"exclude table" test already exists.






This patch addresses all these concerns.

cheers

andrew


--
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index c51a130..973b995 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -300,6 +300,27 @@ PostgreSQL documentation
   
  
 
+
+ 
+  --exclude-database=pattern
+  
+   
+Do not dump databases whose name matches
+pattern.
+Multiple patterns can be excluded by writing multiple
+--exclude-database switches.  The
+pattern parameter is
+interpreted as a pattern according to the same rules used by
+psql's \d
+commands (see ),
+so multiple databases can also be excluded by writing wildcard
+characters in the pattern.  When using wildcards, be careful to
+quote the pattern if needed to prevent shell wildcard expansion.
+   
+  
+ 
+
  
   --if-exists
   
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 5176626..ee7bc8e 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -52,6 +52,8 @@ static PGconn *connectDatabase(const char *dbname, const char *connstr, const ch
 static char *constructConnStr(const char **keywords, const char **values);
 static PGresult *executeQuery(PGconn *conn, const char *query);
 static void executeCommand(PGconn *conn, const char *query);
+static void expand_dbname_patterns(PGconn *conn, SimpleStringList *patterns,
+   SimpleStringList *names);
 
 static char pg_dump_bin[MAXPGPATH];
 static const char *progname;
@@ -87,6 +89,9 @@ static char role_catalog[10];
 static FILE *OPF;
 static char *filename = NULL;
 
+static SimpleStringList database_exclude_patterns = {NULL, NULL};
+static SimpleStringList database_exclude_names = {NULL, NULL};
+
 #define exit_nicely(code) exit(code)
 
 int
@@ -123,6 +128,7 @@ main(int argc, char *argv[])
 		{"column-inserts", no_argument, &column_inserts, 1},
 		{"disable-dollar-quoting", no_argument, &disable_dollar_quoting, 1},
 		{"disable-triggers", no_argument, &disable_triggers, 1},
+		{"exclude-database", required_argument, NULL, 5},
 		{"if-exists", no_argument, &if_exists, 1},
 		{"inserts", no_argument, &inserts, 1},
 		{"lock-wait-timeout", required_argument, NULL, 2},
@@ -318,6 +324,10 @@ main(int argc, char *argv[])
 appendPQExpBufferStr(pgdumpopts, " --no-sync");
 break;
 
+			case 5:
+simple_string_list_append(&database_exclude_patterns, optarg);
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit_nicely(1);
@@ -334,6 +344,16 @@ main(int argc, char *argv[])
 		exit_nicely(1);
 	}
 
+	if (database_exclude_patterns.head != NULL &&
+		(globals_only || roles_only || tablespaces_only))
+	{
+		fprintf(stderr, _("%s: option --exclude-database cannot be used together with -g/--globals-only, -r/--roles-only or -t/--tablespaces-only\n"),
+progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+progname);
+		exit_nicely(1);
+	}
+
 	/* Make sure the user has

Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-10-31 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Kevin Van  writes:
> > This patch adds a new function that allows callers to receive binary jsonb.
> > This change was proposed in the discussion here:
> > https://www.postgresql.org/message-id/CAOsiKEL7%2BKkV0C_hAJWxqwTg%2BPYVfiGPQ0yjFww7ECtqwBjb%2BQ%40mail.gmail.com
> > and the primary motivation is to reduce database load by skipping jsonb to
> > string conversion (with the expectation that the application can parse
> > binary jsonb).
> 
> I dunno, I do not think it's a great idea to expose jsonb's internal
> format to the world.  We intentionally did not do that when the type
> was first defined --- that's why its binary I/O format isn't already
> like this --- and I don't see that the tradeoffs have changed since then.
> The format is complex, and specific to Postgres' needs, and we might wish
> to change it in future.

I disagree- it's awfully expensive to go back and forth between string
and a proper representation.  If we did decide to change the on-disk
format, we'd likely be able to translate that format into the
wire-protocol format and that'd still be much better than going to
strings.

As I recall, we did end up making changes to jsonb right before we
released it and so, at the time, it was definitely a good thing that we
hadn't exposed that format, but we're quite a few years in now (by the
time 12 comes out, every supported release will have jsonb) and the
format hasn't changed further.

If we really did want to go to a new format in the future, we'd probably
end up wanting to do so in a way which allowed us to avoid a full
dump/restore of the database too, so we'd need code to be able to at
least convert from the old format into the new format and it'd hopefully
not be too bad to have code to go the other way too.  Even if we didn't
though, I expect library authors would deal with the update across a
major version change- it's not like we don't do other things that
require them to update (SCRAM being my current favorite) and they've
been pretty good about getting things updated.

As always, there'll be the fallback option of going back to text format
too.

Thanks!

Stephen




Re: Super PathKeys (Allowing sort order through precision loss functions)

2018-10-31 Thread Tomas Vondra

On 10/31/2018 04:32 AM, David Rowley wrote:

On 31 October 2018 at 14:23, Tomas Vondra  wrote:

The other thing likely affecting this is locale / collation. Probably
not for date_trunc, but certainly for things like substr()/trim(),
mentioned by Simon upthread.

In some languages the rules are pretty complex, and there's no chance
it'll survive arbitrary substr() applied to the string. For example, in
Czech we mostly sort character-by-character, but "ch" is an exception
sorted in between "h" and "i".

So essentially " <= hchh <= hihh". Obviously, substr($1,0,3) cuts
the "ch" in half, changing the ordering:

create table x (y text collate "cs_CZ");
insert into x values ('hhhh'), ('hchh'), ('hihh');

test=# select y from x order by 1;
  y
------
 hhhh
 hchh
 hihh
(3 rows)

test=# select substr(y,0,3) from x order by 1;
 substr
--------
 hc
 hh
 hi
(3 rows)

I'm preeetty sure other languages have even funnier rules ...


That's pretty interesting, but as mentioned in my initial email...
More careful thought would be needed for other numerical types and
text types, I imagine, though I've not thought much about that.

I don't really think trim() or substr() would ever work for the reason
that they don't always operate on a prefix of the string. What you've
mentioned seems to rule out LEFT().



Sure, but it wasn't very clear to me what the "more careful thought" 
would mean.


I think a direct consequence of Tom's point about opclasses is that 
storing this information in pg_proc is not going to fly - you would need 
a separate catalog for that, to allow mapping the function to multiple 
opclasses (possibly with different features?). But OK, that's doable.


But what if you also need to do that for collations? Firstly, it would
add another degree of freedom, essentially making it (proc, opclass,
collation, ... metadata ...), so there would be many such combinations. I
can't really imagine defining that manually, but maybe it can be
simplified. But more importantly - the list of collations is kinda
dynamic, and AFAIK the collation rules may change depending on the
glibc/icu versions. So, that seems pretty tricky. It's certainly not
just a SMOP.



I'm mildly suspicious of this data set, because it's essentially
perfectly sorted/deterministic, and the Sort node won't have to do
anything. So perhaps it just works by chance.

Consider this instead:

create table dt (ts timestamp, x text);

insert into dt select * from (select d, (case when random() < 0.5 then
'month' else 'hour' end) from generate_series('2018-01-01'::timestamp,
'2018-12-31'::timestamp, '1 hour'::interval) d) as foo order by random();



[...]


  2018-01-01 00:00:00
  2018-01-01 00:00:00
  2018-01-01 00:00:00
  2018-01-02 13:00:00
  ... kaboom! ...


Yeah. This is an issue. Thanks for the test case. However, I
acknowledged that in my reply to Tom. I did overlook it, which was
completely down to lack of imagination on my part. I'd never
considered using date_trunc() without a const 1st argument before.  It
seems simple enough to disable the optimisation in that case. I've
attached a patch which does that. Maybe that'll help us look beyond
this and focus on any other reasons why this is not possible.



Ah, sorry - I've missed this bit in your response. In my defense, it's 
been quite late over here, and my caffeine level got a tad too low.


Anyway, the question is how strong the requirement would need to be, and
how that would affect the applicability of this optimization in other cases.
I agree it's probably not an issue for date_trunc() - I don't recall
ever using it with a non-constant first parameter. I wonder if there are
other functions where that's not the case - although that's not really
an argument against this optimization.
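
For illustration, a minimal sketch (using the dt table from the test case above; just an illustration, not code from the patch) of why a non-constant first argument defeats the order derivation: an ordering on ts does not imply an ordering on date_trunc(x, ts) when x varies per row, while a constant first argument keeps the expression monotonic in ts.

select date_trunc(x, ts) from dt order by ts;        -- output jumps between hour and month granularity, so it is not sorted
select date_trunc('month', ts) from dt order by ts;  -- monotonic in ts, so a sort on the expression is implied by a sort on ts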



It's also true that this diminishes the usefulness of the idea, but
part of the reason I'm posting the idea so early after having thought
about it is precisely to see if this is going to float or sink.


Sure. And pgsql-hackers are very good at sinking ideas ;-)


Maybe we'll decide the scope of usefulness is so small that it's not
worth it, or that each function has such different requirements that
we can't reasonably make it work by adding a few columns to pg_proc.
I'm personally more interested in the cases that can work. I
understand there is no shortage of cases where it can't.



I think it can't be made just by adding a couple of columns to pg_proc, 
as explained above. A separate catalog mapping procs to opclasses (and 
maybe collations) may be needed. For cases where it depends on the 
actual parameter values (like substr/trim/ltrim) an extra function might 
be needed (something like the selectivity functions for data types).



Given that we require const arguments apart from the order key, perhaps
it could be made to work for simple arithmetic OpExprs. I'm not sure
if the use cases are actually there for that sort of thing, and I've
seen WHERE indexcol+0 = ... used many times

Re: WIP: Avoid creation of the free space map for small tables

2018-10-31 Thread Amit Kapila
On Wed, Oct 31, 2018 at 1:42 PM John Naylor  wrote:
>
> Upthread I wrote:
>
> > -A possible TODO item is to teach pg_upgrade not to link FSMs for
> > small heaps. I haven't looked into the feasibility of that, however.
>
> This turned out to be relatively lightweight (0002 attached). I had
> to add relkind to the RelInfo struct and save the size of each heap as
> it's transferred. The attached SQL script will set up a couple of test
> cases to demonstrate pg_upgrade. Installations with large numbers of
> small tables will be able to see space savings right away.
>
> For 0001, I adjusted the README and docs, and also made some cosmetic
> improvements to the code, mostly in the comments. I've set the
> commitfest entry back to 'needs review'
>
> One thing I noticed is that one behavior on master hasn't changed:
> System catalogs created during bootstrap still have a FSM if they have
> any data. Possibly related, catalogs also have a VM even if they have
> no data at all. This isn't anything to get excited about, but it would
> be nice to investigate, at least so it can be documented. A cursory
> dig hasn't found the cause, but I'll keep doing that as time permits.
>

Thanks for your work on this, I will try to review it during CF.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] generated columns

2018-10-31 Thread Sergei Kornilov
Hi

> OK, so the problem is COPY.
>
> Which means we have an issue with restore. We need to be able to pg_dump a 
> table with generated columns, then restore it afterwards. More generally, we 
> need to be able to handle data that has already been generated - the 
> "generate" idea should apply to new data not existing data.
pg_dump was fixed in the published patch (I checked this yesterday). It does not COPY
generated column values, only non-generated ones.
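
A minimal sketch of what that looks like (the table and data here are illustrative, assuming the inline column form behaves like the ALTER TABLE form shown elsewhere in this thread):

create table test (i int, i2 int generated always as (i * 2) stored);
-- the dump then contains a COPY with an explicit column list that omits i2:
COPY public.test (i) FROM stdin;
1
2
\.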

regards, Sergei



Re: [HACKERS] generated columns

2018-10-31 Thread Simon Riggs
On Wed, 31 Oct 2018 at 08:29, Erik Rijkers  wrote:

> On 2018-10-31 09:15, Simon Riggs wrote:
> > On Wed, 31 Oct 2018 at 07:58, Erikjan Rijkers  wrote:
> >
> >
> >> I have also noticed that logical replication isn't possible on tables
> >> with a generated column.  That's a shame but I suppose that is as
> >> expected.
> >>
> >
> > Couldn't see anything like that in the patch. Presumably unintended
> > consequence. The generated value needs to be in WAL, so decoding it
> > should be trivial.
> >
>
> These log messages occur on attempting at logical replication:
>
> ( table t1 has no generated columns; replicates fine.
>table t2 has one generated column; replication fails: see below )
>
> LOG:  database system is ready to accept connections
> LOG:  logical replication apply worker for subscription "sub1" has
> started
> LOG:  logical replication table synchronization worker for subscription
> "sub1", table "t1" has started
> LOG:  logical replication table synchronization worker for subscription
> "sub1", table "t2" has started
> LOG:  logical replication table synchronization worker for subscription
> "sub1", table "t1" has finished
> ERROR:  column "i2" is a generated column
> DETAIL:  Generated columns cannot be used in COPY.
> LOG:  background worker "logical replication worker" (PID 22252) exited
> with exit code 1
>

OK, so the problem is COPY.

Which means we have an issue with restore. We need to be able to pg_dump a
table with generated columns, then restore it afterwards. More generally,
we need to be able to handle data that has already been generated - the
"generate" idea should apply to new data not existing data.

Sounds like we need to do an ALTER TABLE ... GENERATE ALWAYS after the
table has been re-created and re-loaded, so that both logical replication
and dump/restore would work.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: INSTALL file

2018-10-31 Thread Stephen Frost
* Andreas 'ads' Scherbaum (a...@pgug.de) wrote:
> On 30.10.18 11:49, Andrew Dunstan wrote:
> >On 10/30/2018 06:14 AM, Andreas 'ads' Scherbaum wrote:
> >>On 30.10.18 04:11, Michael Paquier wrote:
> >>>FWIW, I think that people depend too much on github and what github
> >>>thinks projects should do to be more presentable, like adding a
> >>>markdown-style README or such.

I don't find this to be all that relevant- people clone our git repo all
the time and cloning it either directly or through one of the mirrors is
almost certainly what people interested in working with the source code
will do first, these days.

For better or worse, the days of people pulling down tarballs to start
hacking on code are pretty much gone.  We likely still need to provide
the tarballs for our packagers, but that may be gone one day too.

> >>>I get your point that people look at README first though, and that the
> >>>current status is confusing.  One idea would be to merge the contents
> >>>of
> >>>README.git into the README.  However the current status also makes some
> >>>sense, as INSTALL is part of an distributed tarball, while README.git
> >>>is
> >>>automatically removed when running "make distdir".  Looking at README
> >>>is
> >>>the first thing I do when checking out any project or after
> >>>decompressing any source code tarball, so things could be better.

The current status is confusing and does *not* make sense today.  The
README needs to be updated to be sensible to someone using git to pull
down the repo.  The only way to make the current approach make sense
would be to actually *remove* the currently confusing README from the
git repo in its current form and also generate it as part of the tarball
build, but that seems like it's going very much in the wrong direction.

> >>Right, thanks. That's why one of my proposals was to have an INSTALL
> >>file in place, and overwrite it during the tarball creation process.
> >>
> >>This way the general INSTALL file is there, and can contain "general"
> >>instructions, and later on is overwritten by a specific INSTALL file for
> >>the tarballs.

I'm not really much of a fan of this.  My thinking would be to merge the
README.git into the README, and, in particular, provide a very clear
link to the
https://www.postgresql.org/docs/devel/static/installation.html URL.  I'd
also be a fan of including the 'Short version' in the README, and/or
maybe instructions on how to build the INSTALL file, but I suspect that
to be non-trivial to someone new.

> >That has the potential to be somewhat confusing:
> >
> >   "The INSTALL file says ..."
> >
> >   "Which INSTALL file are you referring to?"
> >
> >
> >Merging README.git into README make sense.

Agreed.

> >I think our attitude has generally been that if you're a developer you
> >should build from git, in which case we assume you know what you're doing,
> >and everyone else should build from a tarball. That's arguably somewhat
> >old-fashioned, especially since you can download release tarballs/zips from
> >places like  Sadly, these
> >won't have the artefacts created by "make dist". Maybe those too are less
> >important these days.

This is more than 'old fashioned', in my view, it's just not what
happens today.  Unfortunately, I don't think we have the stats anymore,
but I strongly suspect that our github/git/git mirrors get far more
pulls than our tarballs do now.

> Most experienced developers will know, I think. This was raised during the
> Google Code-In project, where students stumbled over this, and asked where
> the INSTALL file is ...
> 
> This has potential to confuse anyone new to PostgreSQL, and it's a burden
> which can easily be avoided.

Agreed, we should really improve the README by merging the README.git
into it and make the project, as a whole, more accessible to new
developers.

Thanks!

Stephen




Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-10-31 Thread Tom Lane
Kevin Van  writes:
> This patch adds a new function that allows callers to receive binary jsonb.
> This change was proposed in the discussion here:
> https://www.postgresql.org/message-id/CAOsiKEL7%2BKkV0C_hAJWxqwTg%2BPYVfiGPQ0yjFww7ECtqwBjb%2BQ%40mail.gmail.com
> and the primary motivation is to reduce database load by skipping jsonb to
> string conversion (with the expectation that the application can parse
> binary jsonb).

I dunno, I do not think it's a great idea to expose jsonb's internal
format to the world.  We intentionally did not do that when the type
was first defined --- that's why its binary I/O format isn't already
like this --- and I don't see that the tradeoffs have changed since then.
The format is complex, and specific to Postgres' needs, and we might wish
to change it in future.

regards, tom lane



WIP Patch: Add a function that returns binary JSONB as a bytea

2018-10-31 Thread Kevin Van
This patch adds a new function that allows callers to receive binary jsonb.
This change was proposed in the discussion here:
https://www.postgresql.org/message-id/CAOsiKEL7%2BKkV0C_hAJWxqwTg%2BPYVfiGPQ0yjFww7ECtqwBjb%2BQ%40mail.gmail.com
and the primary motivation is to reduce database load by skipping jsonb to
string conversion (with the expectation that the application can parse
binary jsonb).

This patch is still a WIP and I am looking for any guidance on the
approach. If it is valid, I'd also appreciate any guidance on what kind of
test coverage would be appropriate for a change like this.

Note: I prepended the version number "1" to the byte array -- this is
similar to what jsonb_recv does in the same file.

I have been testing on a x86-64 processor running MacOSX and do not know
what the behavior is on other platforms.

This patch is off of master and compiles successfully. The following is
some example output using psql:

database=# SELECT jsonb_raw_bytes('{"b":12345}'::jsonb);

 jsonb_raw_bytes

--

 \x01012001800d1062002800018001002909

(1 row)


database=# SELECT jsonb_raw_bytes('{"a":{"b":12345}}'::jsonb->'a');

 jsonb_raw_bytes

--

 \x01012001800d1062002800018001002909

(1 row)



Some preliminary testing on my own machine shows me that this change has a
significant impact on performance.


I used psql to select a jsonb column from all the rows in a table (about 1
million rows) where the json data was roughly 400-500 bytes per record.


database=# \timing
Timing is on.
database=# \o /tmp/raw_bytes_out.txt;
database=# SELECT jsonb_raw_bytes(data) FROM datatable;
Time: 2582.545 ms (00:02.583)
database=# \o /tmp/json_out.txt;
database=# SELECT data FROM datatable;
Time: 5653.235 ms (00:05.653)


Of note is that the size of raw_bytes_out.txt in the example is roughly
twice that of json_out.txt so the timing difference is not due to less data
being transferred.


jsonb_raw_bytes_v1.patch
Description: Binary data


Re: Pluggable Storage - Andres's take

2018-10-31 Thread Dmitry Dolgov
> On Mon, 29 Oct 2018 at 05:56, Haribabu Kommi  wrote:
>
>> This problem couldn't be reproduced on the master branch, so I've tried to
>> investigate it. It comes from nodeModifyTable.c:1267, when we've got
>> HeapTupleInvisible as a result, and this value in turn comes from
>> table_lock_tuple. Everything points to the new way of handling 
>> HeapTupleUpdated
>> result from heap_update, when table_lock_tuple call was introduced. Since I
>> don't see anything similar in the master branch, can anyone clarify why
>> this lock is necessary here?
>
>
> In the master branch code as well, there is a tuple lock happening in the
> EvalPlanQual() function, but in the pluggable-storage code the lock is kept outside,
> along with some function call rearrangements, to make it easier for the table access
> methods to provide their own MVCC implementation.

Yes, now I see it, thanks. Also I can confirm that the attached patch solves
this issue.

FYI, alongside reviewing the code changes I've run a few performance tests
(that's why I hit this issue with pgbench in the first place). In the case of high
concurrency, so far I see a small performance degradation in comparison with the
master branch (about 2-5% of average latency, depending on the level of
concurrency), but I can't really say why exactly (perf just shows barely
noticeable overhead here and there; maybe what I see is actually a cumulative
impact).



Is there a way to detect an uncommitted 'new table' in pg_class?

2018-10-31 Thread Hubert Zhang
Hi all,

In PG, READ UNCOMMITTED is treated as READ COMMITTED,
but I have a requirement to read dirty tables. Is there a way to detect a table
which is created in another, uncommitted transaction?

T1:
BEGIN;
create table a(i int);

T2:
select * from pg_class where relname='a';
could return table a?
-- 
Thanks

Hubert Zhang


Re: [PATCH][PROPOSAL] Add enum relation option type

2018-10-31 Thread Nikolay Shaplov
In a message from September 12, 2018, 21:40:49, Nikolay Shaplov wrote:
> > As you mentioned in previous mail, you prefer to keep enum and
> > relopt_enum_element_definition array in the same .h file. I'm not sure,
> > but I think it is done to keep everything related to enum in one place
> > to avoid inconsistency in case of changes in some part (e.g. change of
> > enum without change of array). On the other hand, the array content is defined
> > without the array creation itself in the .h file. Can we move the actual array
> > creation into the same .h file? What is the point of separating the array content
> > definition from the array definition?
> 
> Hm... This would be good. Can we really do it? ;-) I am not a C expert; some
> aspects of C development are still mysterious to me. So if it is really OK
> to create the array in the .h file, I would be happy to move it there (this patch
> does not include this change; I still want to be sure we can do it)
I've discussed this issue with a colleague, who IS a C expert, and his advice
was not to include this static const in the .h file, because a copy of this const
would be created in every object file where this .h is included. And that is not
the best way...

-- 
Do code for fun.



Re: syntax error: VACUUM ANALYZE VERBOSE (PostgreSQL 11 regression)

2018-10-31 Thread Pavel Stehule
st 31. 10. 2018 v 8:55 odesílatel Michael Paquier 
napsal:

> On Wed, Oct 31, 2018 at 08:38:27AM +0100, Pavel Stehule wrote:
> > ok. Unfortunately it is not mentioned in the release notes - in the
> > incompatible changes.
> >
> > This change can hit a lot of users. It is a bit of nonsense, but a lot of people
> > use it.
>
> Please just look at the order of the words in the command.  This is
> documented:
> VACUUM VERBOSE ANALYZE;
> However, this version has never been documented, yet unfortunately
> it was still supported:
> VACUUM ANALYZE VERBOSE;
>

I miss a sentence in the release notes - something like a note about the order of
keywords in VACUUM xxx.

Regards

Pavel

--
> Michael
>


RE: Timeout parameters

2018-10-31 Thread Nagaura, Ryohei
Hi Andrei,

First, please note that I may not be reachable during the following period:
from November 1st to November 19th.

Second, I noticed a misunderstanding of mine in my previous mail.
> > Nevertheless, it is necessary to take into account that the option
> > TCP_USER_TIMEOUT is supported by the Linux kernel starting from 2.6.37.
> > On lower kernel versions these changes will not take effect.
> Do you mean we need to consider how to support Linux systems whose kernel
> version is less than 2.6.37?
I understand that you were pointing out an issue in my implementation.
I'll rework the patch files when I return.

Finally, here is a rough description of the test method for each parameter.
You may use the iptables command on Linux when testing TCP_USER_TIMEOUT.
You may use the pg_sleep(seconds) function in postgres.
I'll write up the details after I return.
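
For example, a long-running statement on an otherwise idle connection is enough to exercise a client-side timeout (just a sketch; the actual timeout values under test would of course vary):

-- keep the connection busy longer than the timeout being tested
SELECT pg_sleep(120);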

Continue to discuss the socket_timeout, please.

Best regards,
-
Ryohei Nagaura



Re: [HACKERS] generated columns

2018-10-31 Thread Erik Rijkers

On 2018-10-31 09:15, Simon Riggs wrote:

On Wed, 31 Oct 2018 at 07:58, Erikjan Rijkers  wrote:



I have also noticed that logical replication isn't possible on tables
with a generated column.  That's a shame but I suppose that is as
expected.



Couldn't see anything like that in the patch. Presumably unintended
consequence. The generated value needs to be in WAL, so decoding it 
should be trivial.



These log messages occur on attempting at logical replication:

( table t1 has no generated columns; replicates fine.
  table t2 has one generated column; replication fails: see below )

LOG:  database system is ready to accept connections
LOG:  logical replication apply worker for subscription "sub1" has 
started
LOG:  logical replication table synchronization worker for subscription 
"sub1", table "t1" has started
LOG:  logical replication table synchronization worker for subscription 
"sub1", table "t2" has started
LOG:  logical replication table synchronization worker for subscription 
"sub1", table "t1" has finished

ERROR:  column "i2" is a generated column
DETAIL:  Generated columns cannot be used in COPY.
LOG:  background worker "logical replication worker" (PID 22252) exited 
with exit code 1





Virtual columns wouldn't need to be replicated.

I guess we might choose to replicate generated cols as a value, or 
leave

them out and let them be generated on the downstream side. The default
should be to just treat them as a value.

--
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] generated columns

2018-10-31 Thread Simon Riggs
On Wed, 31 Oct 2018 at 07:58, Erikjan Rijkers  wrote:


> I have also noticed that logical replication isn't possible on tables
> with a generated column.  That's a shame but I suppose that is as
> expected.
>

Couldn't see anything like that in the patch. Presumably unintended
consequence. The generated value needs to be in WAL, so decoding it should
be trivial.

Virtual columns wouldn't need to be replicated.

I guess we might choose to replicate generated cols as a value, or leave
them out and let them be generated on the downstream side. The default
should be to just treat them as a value.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: WIP: Avoid creation of the free space map for small tables

2018-10-31 Thread John Naylor
Upthread I wrote:

> -A possible TODO item is to teach pg_upgrade not to link FSMs for
> small heaps. I haven't looked into the feasibility of that, however.

This turned out to be relatively lightweight (0002 attached). I had
to add relkind to the RelInfo struct and save the size of each heap as
it's transferred. The attached SQL script will set up a couple of test
cases to demonstrate pg_upgrade. Installations with large numbers of
small tables will be able to see space savings right away.
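
As a quick sanity check from SQL (a sketch, assuming the 8-block threshold described in the patch's commit message; fork sizes can be inspected with pg_relation_size()):

create table small_t (a int);
insert into small_t select g from generate_series(1, 100) g;
vacuum small_t;
-- a one-block heap should now report a zero-size FSM fork,
-- while a table above the threshold still gets one
select pg_relation_size('small_t', 'main') as main_bytes,
       pg_relation_size('small_t', 'fsm')  as fsm_bytes;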

For 0001, I adjusted the README and docs, and also made some cosmetic
improvements to the code, mostly in the comments. I've set the
commitfest entry back to 'needs review'

One thing I noticed is that one behavior on master hasn't changed:
System catalogs created during bootstrap still have a FSM if they have
any data. Possibly related, catalogs also have a VM even if they have
no data at all. This isn't anything to get excited about, but it would
be nice to investigate, at least so it can be documented. A cursory
dig hasn't found the cause, but I'll keep doing that as time permits.

-John Naylor
From 428b56e12e3f5d2bd8fc81c9fb5fe7169e9da580 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 31 Oct 2018 14:26:38 +0700
Subject: [PATCH v7 1/2] Avoid creation of the free space map for small tables.

The FSM isn't created if the heap has fewer than 8 blocks. If the last
known good block has insufficient space, try every block before extending
the heap.

If a heap with a FSM is truncated back to below the threshold, the FSM
stays around and can be used as usual.
---
 contrib/pageinspect/expected/page.out |  77 +++
 contrib/pageinspect/sql/page.sql  |  33 +--
 doc/src/sgml/storage.sgml |   3 +-
 src/backend/access/brin/brin.c|   2 +-
 src/backend/access/brin/brin_pageops.c|   8 +-
 src/backend/access/heap/hio.c |  61 --
 src/backend/commands/vacuumlazy.c |  17 +-
 src/backend/storage/freespace/README  |  11 +-
 src/backend/storage/freespace/freespace.c | 234 +-
 src/backend/storage/freespace/indexfsm.c  |   4 +-
 src/include/storage/freespace.h   |   6 +-
 11 files changed, 360 insertions(+), 96 deletions(-)

diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 3fcd9fbe6d..83e5910453 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -1,48 +1,69 @@
 CREATE EXTENSION pageinspect;
-CREATE TABLE test1 (a int, b int);
-INSERT INTO test1 VALUES (16777217, 131584);
-VACUUM test1;  -- set up FSM
+CREATE TABLE test_rel_forks (a int);
+-- Make sure there are enough blocks in the heap for the FSM to be created.
+INSERT INTO test_rel_forks SELECT g from generate_series(1,1) g;
+-- set up FSM and VM
+VACUUM test_rel_forks;
 -- The page contents can vary, so just test that it can be read
 -- successfully, but don't keep the output.
-SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0;
  main_0 
 
8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'main', 1)) AS main_1;
-ERROR:  block number 1 is out of range for relation "test1"
-SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
+ERROR:  block number 100 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
  fsm_0 
 ---
   8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1;
- fsm_1 

-  8192
-(1 row)
-
-SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
+ERROR:  block number 10 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0;
  vm_0 
 --
  8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'vm', 1)) AS vm_1;
-ERROR:  block number 1 is out of range for relation "test1"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1;
+ERROR:  block number 1 is out of range for relation "test_rel_forks"
 SELECT octet_length(get_raw_page('xxx', 'main', 0));
 ERROR:  relation "xxx" does not exist
-SELECT octet_length(get_raw_page('test1', 'xxx', 0));
+SELECT octet_length(get_raw_page('test_rel_forks', 'xxx', 0));
 ERROR:  invalid fork name
 HINT:  Valid fork names are "main", "fsm", "vm", and "init".
-SELECT get_raw_page('test1', 0) = get_raw_page('test1', 'main', 0);
+SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0));
+ fsm_page_contents 
+---
+ 0: 192   +
+ 1: 192   +
+ 3: 192   +
+ 7: 192   +
+ 15: 192  +
+ 31: 192  +
+ 63: 192  +
+ 127: 192 +
+ 255: 192 +
+ 511: 192 +
+ 1023: 192+
+ 2047: 192+

Re: ToDo: show size of partitioned table

2018-10-31 Thread Michael Paquier
On Wed, Oct 31, 2018 at 08:41:45AM +0100, Pavel Stehule wrote:
> I am not sure - I remember one private test that we did on our patches, and
> this test sometimes fails on 32-bit. So I am afraid about its stability.

That could be possible as well.  I think that you are right to be afraid
of such things, and keeping the tests as portable as possible is always a
good thing, I think.
--
Michael




Re: [HACKERS] generated columns

2018-10-31 Thread Erikjan Rijkers

On 2018-10-30 16:14, Sergei Kornilov wrote:

Hi

I applied this patch on top of commit 2fe42baf7c1ad96b5f9eb898161e258315298351
and found a bug while adding a STORED column:

postgres=# create table test(i int);
CREATE TABLE
postgres=# insert into test values (1),(2);
INSERT 0 2
postgres=# alter table test add column gen_stored integer GENERATED
ALWAYS AS ((i * 2)) STORED;
ALTER TABLE
postgres=# alter table test add column gen_virt integer GENERATED
ALWAYS AS ((i * 2));
ALTER TABLE
postgres=# table test;
 i | gen_stored | gen_virt
---++--
 1 ||2
 2 ||4

Virtual columns are calculated on table read, which is OK, but the stored
column does not update the table data.


This workaround is possible:

update test set i = i where gen_stored is null returning *;
 i | gen_stored | gen_virt
---++--
 1 |  2 |2
 2 |  4 |4
(2 rows)

table test ;
 i | gen_stored | gen_virt
---++--
 3 |  6 |6
 4 |  8 |8
 1 |  2 |2
 2 |  4 |4
(4 rows)


Hm, well, I suppose it's still a bug...


I have also noticed that logical replication isn't possible on tables 
with a generated column.  That's a shame but I suppose that is as
expected.



Erik Rijkers



regards, Sergei




Re: syntax error: VACUUM ANALYZE VERBOSE (PostgreSQL 11 regression)

2018-10-31 Thread Michael Paquier
On Wed, Oct 31, 2018 at 08:38:27AM +0100, Pavel Stehule wrote:
> ok. Unfortunately it is not mentioned in the release notes - in the incompatible
> changes.
> 
> This change can hit a lot of users. It is a bit of nonsense, but a lot of people
> use it.

Please just look at the order of the words in the command.  This is
documented:
VACUUM VERBOSE ANALYZE;
However, this version has never been documented, yet unfortunately
it was still supported:
VACUUM ANALYZE VERBOSE;
--
Michael




Re: COPY FROM WHEN condition

2018-10-31 Thread Masahiko Sawada
On Tue, Oct 30, 2018 at 11:47 PM Surafel Temesgen  wrote:
>
>
> Hi,
> Thank you for looking at it.
> On Sun, Oct 28, 2018 at 7:19 PM Tomas Vondra  
> wrote:
>>
>>
>> 1) I think this deserves at least some regression tests. Plenty of tests
>> already use COPY, but there's no coverage for the new piece. So let's
>> add a new test suite, or maybe add a couple of tests into copy2.sql.
>>
>>
>> 2) In copy.sqml, the new item is defined like this
>>
>> WHEN Clause
>>
>> I suggest we use just WHEN, that's what
>> the other items do (see ENCODING).
>>
>> The other thing is that this does not say what expressions are allowed
>> in the WHEN clause. It seems pretty close to WHEN clause for triggers,
>> which e.g. mentions that subselects are not allowed. I'm pretty sure
>> that's true here too, because it fails like this (118 = T_SubLink):
>>
>> test=# copy t(a,b,c) from '/tmp/t.data' when ((select 1) < 10);
>> ERROR:  unrecognized node type: 118
>>
>> So, the patch needs to detect this, produce a reasonable error message
>> and document the limitations in copy.sqml, just like we do for CREATE
>> TRIGGER.
>
> fixed
>>
>>
>> 3) For COPY TO, the WHEN clause is accepted but ignored, leading to
>> confusing cases like this:
>>
>> test=# copy t(a,b,c) to '/tmp/t.data' when ((select 100) < 10);
>> COPY 151690
>>
>> So, it contains subselect, but unlike COPY FROM it does not fail
>> (because we never execute it). The fun part is that the expression is
>> logically false, so a user might expect it to filter rows, yet we copy
>> everything.
>>
>> IMHO we need to either error-out in these cases, complaining about WHEN
>> not being supported for COPY TO, or make it work (effectively treating
>> it as a simpler alternative to COPY (subselect) TO).
>
> English is not my first language, but I chose to error out because a WHEN condition
> for COPY TO seems semantically incorrect to me.
>>
>>
>> AFAICS we could just get rid of the extra when_cluase variable and mess
>> with the cstate->whenClause directly, depending on how (3) gets fixed.
>
> I did it this way because the CopyState structure's memory is allocated and initialized
> in BeginCopyFrom, but the analysis is done before that.
>>
>>
>> 5) As I mentioned, the CREATE TRIGGER already has WHEN clause, but it
>> requires it to be 'WHEN (expr)'. I suggest we do the same thing here,
>> requiring the parentheses.
>>
>>
>> 6) The skip logic in CopyFrom() seems to be slightly wrong. It does
>> work, but the next_record label is defined after CHECK_FOR_INTERRUPTS()
>> so a COPY will not respond to Ctrl-C unless it finds a row matching the
>> WHEN condition. If you have a highly selective condition, that's a bit
>> inconvenient.
>>
>> It also skips
>>
>> MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
>>
>> so I wonder what the heap_form_tuple() right after the next_record label
>> will use for tuples right after a skipped one. I'd bet it'll use the
>> oldcontext (essentially the long-lived context), essentially making it
>> a memory leak.
>>
>> So I suggest to get rid of the next_record label, and use 'continue'
>> instead of the 'goto next_record'.
>>
> fixed
>>

I've looked at this patch and tested.

When I use a function returning a string in the WHEN clause I get the following error:

=# copy test from '/tmp/aaa.csv' (format 'csv') when (lower(t) = 'hello');
ERROR:  could not determine which collation to use for lower() function
HINT:  Use the COLLATE clause to set the collation explicitly.
CONTEXT:  COPY hoge, line 1: "1,hoge,2018-01-01"

And then, although I specified COLLATE, I got another error (127 =
T_CollateExpr):

=# copy test from '/tmp/aaa.csv' (format 'csv') when (lower(t) collate
"en_US" = 'hello');
ERROR:  unrecognized node type: 127

This error doesn't happen if I put a similar condition in the WHEN
clause for triggers. I think the patch needs to produce a reasonable
error message.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: syntax error: VACUUM ANALYZE VERBOSE (PostgreSQL 11 regression)

2018-10-31 Thread Michael Paquier
On Wed, Oct 31, 2018 at 08:19:35AM +0100, Pavel Stehule wrote:
> I have a report from one customer. Some scripts stopped working on 11 because VACUUM
> ANALYZE VERBOSE doesn't work now.
> 
> postgres=# vacuum analyze verbose;
> ERROR:  syntax error at or near "verbose"
> LINE 1: vacuum analyze verbose;
>^
> vacuum verbose analyze is working.

Here you go:
commit: 921059bd66c7fb1230c705d3b1a65940800c4cbb
author: Robert Haas 
date: Tue, 9 Jan 2018 10:12:58 -0500
Don't allow VACUUM VERBOSE ANALYZE VERBOSE.

There are plans to extend the syntax for ANALYZE, so we need to break
the link between VacuumStmt and AnalyzeStmt.  But apart from that, the
syntax above is undocumented and, if discovered by users, might give
the impression that the VERBOSE option for VACUUM differs from the
verbose option from ANALYZE, which it does not.

Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada

Discussion: http://postgr.es/m/d3fc73e2-9b1a-4db4-8180-55f57d116...@amazon.com

You could just use this query for the same result:
vacuum (analyze, verbose);
--
Michael




Re: ToDo: show size of partitioned table

2018-10-31 Thread Pavel Stehule
st 31. 10. 2018 v 8:38 odesílatel Michael Paquier 
napsal:

> On Wed, Oct 31, 2018 at 03:34:02PM +0900, Amit Langote wrote:
> > On 2018/10/31 15:30, Pavel Stehule wrote:
> >> I am not sure. It doesn't make sense to run this test over an empty database, and a
> >> bigger database can increase the runtime.
> >>
> >> Moreover, the size can be platform-dependent.
> >
> > Okay, sure.
>
> Well, that would mostly depend on the relation page size which is
> defined at compilation, no?  The plans of some queries are unstable
> depending on the page size so they would fail, but let's limit the
> damage if possible.
>

I am not sure - I remember one private test that we did on our patches, and
that test sometimes fails on 32-bit. So I am afraid about stability.

Regards

Pavel

--
> Michael
>


Re: syntax error: VACUUM ANALYZE VERBOSE (PostgreSQL 11 regression)

2018-10-31 Thread Pavel Stehule
st 31. 10. 2018 v 8:34 odesílatel Sergei Kornilov  napsal:

> Hi
>
> At least this is documented behavior:
> > When the option list is surrounded by parentheses, the options can be
> written in any order. Without parentheses, options must be specified in
> exactly the order shown above.
> https://www.postgresql.org/docs/current/static/sql-vacuum.html
>
> Previously we have another additional synopsis
> > VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ table_name [
> (column_name [, ...] ) ] ]
> Seems this syntax was removed in this commit:
> https://www.postgresql.org/message-id/E1dzW4X-00089L-5a%40gemulon.postgresql.org


ok. Unfortunately it is not mentioned in the release notes - in the incompatible
changes.

This change can hit a lot of users. It is a bit of nonsense, but a lot of people
use it.

Regards

Pavel




>
>
> regards, Sergei
>


Re: ToDo: show size of partitioned table

2018-10-31 Thread Michael Paquier
On Wed, Oct 31, 2018 at 03:34:02PM +0900, Amit Langote wrote:
> On 2018/10/31 15:30, Pavel Stehule wrote:
>> I am not sure. It doesn't make sense to run this test over an empty database, and a
>> bigger database can increase the runtime.
>> 
>> Moreover, the size can be platform-dependent.
> 
> Okay, sure.

Well, that would mostly depend on the relation page size which is
defined at compilation, no?  The plans of some queries are unstable
depending on the page size so they would fail, but let's limit the
damage if possible.
--
Michael




Re: syntax error: VACUUM ANALYZE VERBOSE (PostgreSQL 11 regression)

2018-10-31 Thread Sergei Kornilov
Hi

At least this is documented behavior: 
> When the option list is surrounded by parentheses, the options can be written 
> in any order. Without parentheses, options must be specified in exactly the 
> order shown above.
https://www.postgresql.org/docs/current/static/sql-vacuum.html

Previously we have another additional synopsis
> VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ table_name [ (column_name [, 
> ...] ) ] ]
Seems this syntax was removed in this commit: 
https://www.postgresql.org/message-id/E1dzW4X-00089L-5a%40gemulon.postgresql.org

regards, Sergei



Re: jsonpath

2018-10-31 Thread Oleg Bartunov
On Mon, Oct 29, 2018 at 2:20 AM Tomas Vondra
 wrote:
>
> Hi,
>
> On 10/02/2018 04:33 AM, Michael Paquier wrote:
> > On Sat, Sep 08, 2018 at 02:21:27AM +0300, Nikita Glukhov wrote:
> >> Attached 18th version of the patches rebased onto the current master.
> >
> > Nikita, this version fails to apply, as 0004 has conflicts with some
> > regression tests.  Could you rebase?  I am moving the patch to CF
> > 2018-11, waiting for your input.
> > --
> > Michael
> >
>
> As Michael mentioned, the patch does not apply anymore, so it would be
> good to provide a rebased version for CF 2018-11. I've managed to do
> that, as the issues are due to minor bitrot, so that I can do some quick
> review of the current version.
>
> I haven't done much testing, pretty much just compiling, running the
> usual regression tests and reading through the patches. I plan to do
> more testing once the rebased patch is submitted.
>
> Now, a couple of comments based on eye-balling the diffs.
>
>
> 1) There are no docs, which makes it pretty much non-committable for
> now. I know there is [1] and it was a good intro for the review, but we
> need something as a part of our sgml docs.

The SQL/JSON documentation is discussed in a separate topic:
https://www.postgresql.org/message-id/flat/732208d3-56c3-25a4-8f08-3be1d54ad51b%40postgrespro.ru
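
(For readers following along, a minimal sketch of the kind of query the patch set enables; the jsonb_path_query() function name is taken from the patches and the README linked at the end of the quoted review, so treat this as illustrative:)

-- apply a jsonpath filter expression to a jsonb value
select jsonb_path_query('{"a": [1, 2, 3]}', '$.a[*] ? (@ > 1)');
-- returns 2 and 3 as separate rows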


>
>
> 2) 0001 says this:
>
> *typmod = -1; /* TODO implement FF1, ..., FF9 */
>
> but I have no idea what FF1 or FF9 are. I guess it needs to be
> implemented, or explained better.
>
>
> 3) The makefile rule for scan.o does this:
>
> +# Latest flex causes warnings in this file.
> +ifeq ($(GCC),yes)
> +scan.o: CFLAGS += -Wno-error
> +endif
>
> That seems a bit ugly, and we should probably try to make it work with
> the latest flex, instead of hiding the warnings. I don't think we have
> any such ad-hoc rules in other Makefiles. If we really need it, can't we
> make it part of configure, and/or restrict it depending on flex version?
>
>
> 4) There probably should be .gitignore rule for jsonpath_gram.h, just
> like for other generated header files.
>
>
> 5) jbvType says jbvDatetime is "virtual type" but does not explain what
> it is. IMHO that deserves a comment or something.
>
>
> 6) I see the JsonPath definition says this about header:
>
> /* just version, other bits are reservedfor future use */
>
> but the very next thing it does is defining two pieces stored in the
> header - version AND "lax" mode flag. Which makes the comment invalid
> (also, note the missing space after "reserved").
>
> FWIW, I'd use JSONPATH_STRICT instead of JSONPATH_LAX. The rest of the
> codebase works with "strict" flags passed around, and it's easy to
> forget to negate the flag somewhere (at least that's my experience).
>
>
> 7) I see src/include/utils/jsonpath_json.h adds support for plain json
> by undefining various jsonb macros and redirecting them to the json
> variants. I find that rather suspicious - imagine you're investigating
> something in code using those jsonb macros, and wondering why it ends up
> calling the json stuff. I'd be pretty confused ...
>
> Also, some of the redefinitions are not really needed for example
> JsonbWrapItemInArray and JsonbWrapItemsInArray are not used anywhere
> (and neither are the json variants).
>
>
> 8) I see 0002 defines IsAJsonbScalar like this:
>
>   #define IsAJsonbScalar(jsonbval)  (((jsonbval)->type >= jbvNull && \
>  (jsonbval)->type <= jbvBool) || \
>  (jsonbval)->type == jbvDatetime)
>
> while 0004 does this
>
>   #define jspIsScalar(type) ((type) >= jpiNull && (type) <= jpiBool)
>
> I know those are for different data types (jsonb vs. jsonpath), but I
> suppose jspIsScalar should include the datetime too.
>
> FWIW JsonPathItemType would deserve better documentation what the
> various items mean (a comment for each line would be enough). I've been
> wondering if "jpiDouble" means scalar double value until I realized
> there's a ".double()" function for paths.
>
>
> 9) It's generally a good idea to make the individual pieces committable
> separately, but that means e.g. the regression tests have to pass after
> each patch. At the moment that does not seem to be the case for 0002,
> see the attached file. I'm running with -DRANDOMIZE_ALLOCATED_MEMORY,
> not sure if that's related.
>
> regards
>
>
> [1] https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] generated columns

2018-10-31 Thread Michael Paquier
On Tue, Oct 30, 2018 at 09:35:18AM +0100, Peter Eisentraut wrote:
> Attached is a new version of this patch.

Thanks Peter for sending a new patch.  I am still assigned as a
reviewer, and still plan to look at it in more details.

> It supports both computed-on-write and computed-on-read variants, using
> the keywords STORED and VIRTUAL respectively.
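
(A minimal sketch of the two variants, using the forms seen elsewhere in this thread; the bare form behaves as virtual per Erik's test upthread, and the explicit VIRTUAL keyword is assumed to go in the same position as STORED:)

create table t (
    i        int,
    i_stored int generated always as (i * 2) stored,  -- computed on write
    i_virt   int generated always as (i * 2)           -- computed on read (virtual)
);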

It is actually good to see that you are tackling both problems now.
--
Michael




syntax error: VACUUM ANALYZE VERBOSE (PostgreSQL 11 regression)

2018-10-31 Thread Pavel Stehule
Hi

I have a report from one customer. Some scripts stopped working on 11 because VACUUM
ANALYZE VERBOSE doesn't work now.

postgres=# vacuum analyze verbose;
ERROR:  syntax error at or near "verbose"
LINE 1: vacuum analyze verbose;
   ^

vacuum verbose analyze is working.

Regards

Pavel