RE: [bug fix] Produce a crash dump before main() on Windows

2018-07-25 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz]
> No, I really mean a library dependency failure.  For example, imagine that
> Postgres is compiled on Windows dynamically, and that it depends on
> libxml2.dll, which is itself compiled dynamically.  Then imagine, in a
> custom build ecosystem, that someone comes in and adds lz support to
> libxml2 on Windows.  If Postgres still consumes libxml2 but does not add
> in its PATH a version of lz, then a backend in need of libxml2 would fail
> to load, causing Postgres to not start properly.  True, painful, story.

I see, that's surely painful.  But a DLL that is in use cannot be overwritten 
on Windows.  So, I assume the following sequence:

1. postmaster loads libxml2.dll without LZ in folder A.
2. Someone adds libxml2.dll with LZ in folder B.  Folder B is ahead of folder A 
in postgres's PATH.
3. Some user tries to connect to a database, creating a new child postgres 
process, which picks up libxml2.dll from folder B and fails to load it 
because the lz DLL is missing.




> What is ticking me is if the popup window stuff discussed on this thread
> could be a problem in the detection of such dependency errors, as with the
> product I am thinking about, Postgres is not running as a service, but kicked
> by another thing which is a service, and monitors the postmaster.

I understand you are talking about a case where some (server) application uses 
PostgreSQL internally.  That application runs as a Windows service, but 
PostgreSQL itself doesn't on its own.  The application starts PostgreSQL by 
running pg_ctl start.

In that case, postgres runs under a service.  I confirmed it with the following 
test program.  The code is extracted from pgwin32_is_service() in PostgreSQL.

--
#include <windows.h>
#include <stdio.h>

int
main(void)
{
	BOOL		IsMember;
	PSID		ServiceSid;
	SID_IDENTIFIER_AUTHORITY NtAuthority = {SECURITY_NT_AUTHORITY};
	FILE	   *fp;

	SetErrorMode(0);

	/* Check for service group membership */
	if (!AllocateAndInitializeSid(&NtAuthority, 1,
								  SECURITY_SERVICE_RID, 0, 0, 0, 0, 0, 0, 0,
								  &ServiceSid))
	{
		fprintf(stderr, "could not get SID for service group: error code %lu\n",
				GetLastError());
		return 1;
	}

	if (!CheckTokenMembership(NULL, ServiceSid, &IsMember))
	{
		fprintf(stderr, "could not check access token membership: error code %lu\n",
				GetLastError());
		FreeSid(ServiceSid);
		return 1;
	}
	FreeSid(ServiceSid);

	fp = fopen("d:\\a.txt", "a");
	if (fp == NULL)
		return 1;
	if (IsMember)
		fprintf(fp, "is under service\n");
	else
		fprintf(fp, "is not under service\n");
	fclose(fp);

	return 0;
}
--

You can build the above program with:
  cl chksvc.c advapi32.lib


Regards
Takayuki Tsunakawa






Re: [HACKERS] WAL logging problem in 9.4.3?

2018-07-25 Thread Kyotaro HORIGUCHI
Hello.

At Wed, 25 Jul 2018 23:08:33 +0900, Michael Paquier  wrote 
in <20180725140833.gc6...@paquier.xyz>
> On Wed, Jul 18, 2018 at 05:58:16PM +0300, Heikki Linnakangas wrote:
> > I'd suggest that we continue based on the patch that Kyotaro posted at
> > https://www.postgresql.org/message-id/20180330.100646.86008470.horiguchi.kyotaro%40lab.ntt.co.jp.
> 
> Whatever happens here, perhaps one way to move on would be to commit
> first the TAP test that I proposed upthread.  That would not work for
> wal_level=minimal so this part should be commented out, but that's
> easier this way to test basically all the cases we talked about with any
> approach taken.

https://www.postgresql.org/message-id/20180704045912.gg1...@paquier.xyz

I'm not sure the policy (if any) allows us to add a test that is
expected to succeed, but I'm not opposed to doing that. Even if we
did, though, it wouldn't be visible to anyone other than those of us
in this thread. It seems to me more or less equivalent to pasting a
boilerplate note that points to the above message in this thread, or
just writing "this patch passes "the" test.".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Optimizer misses big in 10.4 with BRIN index

2018-07-25 Thread David Rowley
On 26 July 2018 at 04:50, Tomas Vondra  wrote:
> My guess is this is the root cause - the estimated number of rows is much
> higher than in practice (3377106 vs. 23040), so in the end the seqscan is
> considered to be slightly cheaper and wins. But the actual row count is
> ~150x lower, making the bitmap index scan way faster.

Isn't the 23040 just totalpages * 10, per the `return totalpages * 10;`
in bringetbitmap()?

The BRIN index costing was changed quite dramatically in [1], which
first appeared in pg10.  Prior to that patch, BRIN assumed it would
touch total_pages * pred_selectivity blocks in the bitmap scan.  That
worked well when the table happened to be perfectly ordered by the
column of the BRIN index, but was pretty broken when the order was
random.  That led to BRIN indexes being used when they really
information it could. That was basically the correlation statistics
from pg_statistic.   It appeared that the patch was producing more
realistic plans than unpatched, so we went with it, but it's
definitely not perfect; what statistics are?
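
To make that concrete, here's a toy model contrasting the two
approaches.  This is illustrative only, not the actual
brincostestimate() code, and the interpolation formula is my own
simplification:

--
#include <stdio.h>

/* Toy model only, not the real brincostestimate() logic. */
static double
old_brin_pages(double total_pages, double selectivity)
{
	/* pre-v10: assumes the table is perfectly ordered by the indexed column */
	return total_pages * selectivity;
}

static double
corr_aware_brin_pages(double total_pages, double selectivity,
					  double correlation)
{
	double		best = total_pages * selectivity;	/* perfectly ordered */
	double		worst = total_pages;	/* random order: touch everything */
	double		c2 = correlation * correlation;

	/* high |correlation| trusts the ordered estimate, low falls to worst */
	return worst - c2 * (worst - best);
}

int
main(void)
{
	printf("old: %.0f pages, corr-aware (corr=0.2): %.0f pages\n",
		   old_brin_pages(100000, 0.01),
		   corr_aware_brin_pages(100000, 0.01, 0.2));
	return 0;
}
--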

It would be quite interesting to know the result of:

select stakind3 from pg_statistic where starelid =
'schema0_lab.data_table'::regclass;

I also see the estimated costs of either plan are very close, so the
chosen plan may well be quite susceptible to changing after different
ANALYZE runs on the table.

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7e534adcdc70866e7be74d626b0ed067c890a251

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-25 Thread Ashutosh Bapat
On Thu, Jul 26, 2018 at 2:12 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita
>>  wrote:
>>> I'm not sure that's a good idea, because I think we have a trade-off
>>> relation; the more we make create_plan simple, the more we need to make
>>> earlier stages of the planner complicated.
>>>
>>> And it looks to me like the partitionwise join code is making earlier (and
>>> later) stages of the planner too complicated, to make create_plan simple.
>
>> I think that create_plan is *supposed* to be simple.  Its purpose is
>> to prune away data that's only needed during planning and add things
>> that can be computed at the last minute which are needed at execution
>> time.  Making it do anything else is, in my opinion, not good.
>
> I tend to agree with Robert on this.  In particular, if you put anything
> into create_plan or later that affects cost estimates, you're really doing
> it wrong, because changing those estimates might've changed the plan
> entirely.  (As I recall, we have a couple of cheats like that now,
> associated with dropping no-op projection and subquery scan nodes.
> But let's not add more.)

+1

>
> TBH, I think this entire discussion is proving what sheer folly it was
> to allow partitions to have rowtypes not identical to their parent.
> But I suppose we're stuck with that now.
>

Every table created has a different rowtype, and when the partition's
rowtype is different from that of the partitioned table, we add a
ConvertRowtypeExpr node on top of the whole-row expression.

    if (appinfo->parent_reltype != appinfo->child_reltype)
    {
        ConvertRowtypeExpr *r = makeNode(ConvertRowtypeExpr);
        ...

We may be able to set the rowtype of a partition to be the same as the
rowtype of the partitioned table when a partition is attached using the
ALTER TABLE syntax. But are you suggesting that we change the reltype
of a partition being attached? What happens when we detach a partition
and let the table live independently of the partitioned table? Do we
create a new reltype?

I am not even going into the problem of attaching a partition with a
different tuple layout. I do believe that such use cases arise and that
rewriting the table being attached is a viable option.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-25 Thread Michael Paquier
Hi,

So, I have spent the last couple of days trying to figure out a nice
solution for VACUUM, TRUNCATE and REINDEX, and attached is a set of
three patches to address the issues of this thread:
1) 0001 does the work for TRUNCATE, by using RangeVarGetRelidExtended
with a custom callback based on the existing truncate_check_rel().  I
had to split the session-level checks into a separate routine as no
Relation is available at that point, but that ends up being a neat
result without changing any user-facing behavior.
2) 0002 does the work for VACUUM.  It happens that vacuum_rel() already
does all the skip-related checks needed to decide whether a relation
should be vacuumed or not, so I refactored things as follows:
2-1) For a manual VACUUM listing relations, issue an ERROR if a relation
cannot be vacuumed.  Previously vacuum_rel() would just log a WARNING
and call it a day *after* locking the relation, but as we need to rely
on RangeVarGetRelidExtended() an ERROR is necessary.  The ERROR happens
only if VACUUM FULL is used.
2-2) When a relation list is not specified in a manual VACUUM command,
the decision to skip a relation is made in get_all_vacuum_rels() when
building the relation list from the pg_class lookup.  This logs a DEBUG
message when the relation is skipped, which is more information than
what we have now.  The checks need to happen again in vacuum_rel(),
where a WARNING is logged, as the VACUUM work could be spawned across
multiple transactions.
3) REINDEX is already smart enough to check for ownership of a relation
if one is manually listed, and reports an ERROR.  However the instance
can get stuck when doing a database-wide REINDEX on a database using
just the owner of this database.  In this case it seems to me that we
need to make ReindexMultipleTables smarter in terms of ACL checks, as
per 0003.
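
For those following along, the callback shape looks roughly like this.
This is a sketch based on the declarations in 0001 (the callback
signature and the two-argument truncate_check_rel() come from the
patch), not necessarily the final code:

static void
RangeVarCallbackForTruncate(const RangeVar *relation,
							Oid relId, Oid oldRelId, void *arg)
{
	HeapTuple	tuple;

	/* Nothing to do if the relation was not found. */
	if (!OidIsValid(relId))
		return;

	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relId));
	if (!HeapTupleIsValid(tuple))	/* should not happen */
		elog(ERROR, "cache lookup failed for relation %u", relId);

	/* check permissions before any lock is taken */
	truncate_check_rel(relId, (Form_pg_class) GETSTRUCT(tuple));

	ReleaseSysCache(tuple);
}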

I quite like the shape of the patches proposed here, and the refactoring
is I think pretty clear.  Each patch can be treated independently as
well.  Comments are welcome.  (Those patches are not indented yet, which
does not matter much at this stage anyway.)

Thanks,
--
Michael
From 4dc84bb5fb0d33a5f6fedd31828adda9db8fd2d4 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 26 Jul 2018 11:51:45 +0900
Subject: [PATCH 1/3] Refactor TRUNCATE execution to avoid early lock lookups

A caller of TRUNCATE can take an access-exclusive lock before any
permission check is done, which can cause other sessions to block behind
the request, potentially enabling denial-of-service attacks: even a
non-privileged user can attempt a truncation of a critical catalog table
to block all incoming connection attempts.

This commit fixes the case of TRUNCATE so that RangeVarGetRelidExtended
is used with a callback doing the necessary checks at an earlier stage,
before the lock is taken.

Author: Michael Paquier
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526...@wrigleys.postgresql.org
---
 src/backend/commands/tablecmds.c | 83 +---
 1 file changed, 65 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eb2d33dd86..b06c9dbf63 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -300,7 +300,10 @@ struct DropRelationCallbackState
 #define child_dependency_type(child_is_partition)	\
 	((child_is_partition) ? DEPENDENCY_AUTO : DEPENDENCY_NORMAL)
 
-static void truncate_check_rel(Relation rel);
+static void truncate_check_rel(Oid relid, Form_pg_class reltuple);
+static void truncate_check_session(Relation rel);
+static void RangeVarCallbackForTruncate(const RangeVar *relation,
+			Oid relId, Oid oldRelId, void *arg);
 static List *MergeAttributes(List *schema, List *supers, char relpersistence,
 bool is_partition, List **supOids, List **supconstr,
 int *supOidCount);
@@ -1336,15 +1339,25 @@ ExecuteTruncate(TruncateStmt *stmt)
 		bool		recurse = rv->inh;
 		Oid			myrelid;
 
-		rel = heap_openrv(rv, AccessExclusiveLock);
-		myrelid = RelationGetRelid(rel);
+		myrelid = RangeVarGetRelidExtended(rv, AccessExclusiveLock,
+		   false, RangeVarCallbackForTruncate,
+		   NULL);
+
+		/* open the relation, we already hold a lock on it */
+		rel = heap_open(myrelid, NoLock);
+
 		/* don't throw error for "TRUNCATE foo, foo" */
 		if (list_member_oid(relids, myrelid))
 		{
 			heap_close(rel, AccessExclusiveLock);
 			continue;
 		}
-		truncate_check_rel(rel);
+
+		/*
+		 * RangeVarGetRelidExtended() has already done the basic checks via
+		 * its callback; only session-level checks remain.
+		 */
+		truncate_check_session(rel);
 		rels = lappend(rels, rel);
 		relids = lappend_oid(relids, myrelid);
 		/* Log this relation only if needed for logical decoding */
@@ -1367,7 +1380,9 @@ ExecuteTruncate(TruncateStmt *stmt)
 
 /* find_all_inheritors already got lock */
 rel = heap_open(childrelid, NoLock);
-truncate_check_rel(rel);
+truncate_check_rel(RelationGetRelid(rel), 

Re: print_path is missing GatherMerge and CustomScan support

2018-07-25 Thread Etsuro Fujita

(2018/07/19 14:11), Ashutosh Bapat wrote:

On Thu, Jul 19, 2018 at 5:37 AM, Michael Paquier  wrote:

On Wed, Jul 18, 2018 at 12:15:25PM +0530, Ashutosh Bapat wrote:

Yes that's right. Thanks for taking care of it.


Okay, I have pushed a fix for this one as that's wrong and
back-patched to v11.  The coverage of reparameterize_path_by_child is
actually quite poor if you look at the reports:
https://coverage.postgresql.org/src/backend/optimizer/util/pathnode.c.gcov.html

Could it be possible to stress that more?  This way mistakes like this
one could have been avoided from the start.


I had extensive testcases in my original patch-set to exercise that
code, but 1. that test set was too extensive; even today
partition_join.sql is a separate testcase and it's quite large. 2.
that function returns NULL rather than throwing an error if it cannot
produce a parameterized path. So, unless we check whether each of
those paths gets created, no test is useful, and that can only be done
through EXPLAIN output, which means that the testcase becomes
fragile. I'm fine with adding more tests just to cover the code if
they are not as fragile and do not blow up partition_join too much.


It would not be possible to cover these cases:

case T_GatherPath:
{
GatherPath *gpath;

FLAT_COPY_PATH(gpath, path, GatherPath);
REPARAMETERIZE_CHILD_PATH(gpath->subpath);
new_path = (Path *) gpath;
}
break;

case T_GatherMergePath:
{
GatherMergePath *gmpath;

FLAT_COPY_PATH(gmpath, path, GatherMergePath);
REPARAMETERIZE_CHILD_PATH(gmpath->subpath);
new_path = (Path *) gmpath;
}
break;

because we currently don't consider gathering partial child-scan or 
child-join paths.  I think we might consider that in future, though.


Best regards,
Etsuro Fujita



Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack

2018-07-25 Thread Michael Paquier
On Tue, Jul 24, 2018 at 06:27:16PM +0900, Kyotaro HORIGUCHI wrote:
> I may be misunderstanding something because (really ?) it's still
> extremely hot in Japan, today.

It is better since yesterday, in exchange for a typhoon heading straight
to Tokyo :) 
--
Michael


signature.asc
Description: PGP signature


Re: PartitionDispatch's partdesc field

2018-07-25 Thread Amit Langote
On 2018/07/26 5:23, Robert Haas wrote:
> I noticed today that in the PartitionDispatch structure, the partdesc
> field is set but not used.  So we could remove it.  See attached
> pd-partdesc-remove.patch.  If we want to go this route, I suggest
> doing a slightly more thorough removal and getting rid of the key
> field as well.  See attached pd-partdesc-and-key-remove.patch.
> 
> Another alternative, which I think might make more sense, is to make
> use of pd->key and pd->partdesc in preference to pd->reldesc->rd_partkey
> and pd->reldesc->rd_partdesc.  It seems to me that the idea of the
> PartitionDispatch structure is that it gathers together all of the
> information that we need for tuple routing, so it might make sense for
> the tuple routing code to get the information from there rather
> than referring back to the RelationDesc.  See attached
> pd-partdesc-use.patch.

+1 to pd-partdesc-use.patch.

> Basically, the decision here is whether we want to avoid invoking
> RelationGetPartitionKey and RelationGetPartitionDesc over and over
> again, either for reasons of notational cleanliness (macro names are
> long) or safety (absolutely no chance that the answer will change) or
> efficiency (maybe those macros will someday do more than just a
> pointer dereference?).  If so, we should cache the data in the
> PartitionDispatch object and then use it.

I agree.

Thanks,
Amit




Re: Making "COPY partitioned_table FROM" faster

2018-07-25 Thread David Rowley
On 25 July 2018 at 04:37, Simon Riggs  wrote:
> I don't see any need here for another GUC, nor even a command option.
> The user has already indicated their use case to us:

I agree.

> We know that the common case for RANGE partitioned tables is to load
> into the one current partition. We also know that the partition might
> change as we load data, when the data loaded crosses the partition
> boundary, so we need this optimization to be adaptive. Not all data
loads follow that rule, so we also need the adaptive algorithm to shut
> off for those cases.
>
> We also know that the common case for HASH partitions is to load into
> all partitions at once, since hash is specifically designed to spread
> data out across partitions. It is almost never true that we would want
> to load one partition at a time, so it seems easy to turn the
> optimization off if we use this type of partitioning. Or better, we
> need work done to improve that case also, but that is outside the
> current scope of this patch.
>
> If we have multi-level partitions, if any of the levels includes a
> Hash, then turn it off.
>
> LIST partitions are less likely to have a clear pattern, so I would
> treat them like HASH and assume the data is not sorted by partition.
>
> So for this patch, just add an "if (RANGE)" test.

I agree RANGE partitioning is probably the most likely case to benefit
from this optimisation, but I just don't think that HASH could never
benefit, and LIST probably sits somewhere in the middle.

HASH partitioning might be useful in cases like partitioning by
"sensor_id".  It does not seem that unreasonable that someone might
want to load all the data for an entire sensor at once.

The v3 version of the patch also fixes the very small performance
regression for the (probably quite likely) worst-case situation.  New
performance is about 3.5% faster instead of 0.5-1% slower. So likely
there's no longer any need to consider this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Explain buffers wrong counter with parallel plans

2018-07-25 Thread Amit Kapila
On Wed, Jul 25, 2018 at 7:42 PM, Jonathan S. Katz
 wrote:
>
> On Jul 7, 2018, at 12:03 AM, Amit Kapila  wrote:
>
> I tried running the below on both 11beta2 and HEAD with the patch
> applied:
>
> CREATE UNLOGGED TABLE t1 (c1 int);
> INSERT INTO t1 SELECT generate_series(1,100000000);
> /** restart PostgreSQL */
> EXPLAIN (analyze,buffers,timing off,costs off)
> SELECT count(*) FROM t1;
> /** repeat above two queries */
>
> I have identical postgresql.conf files on both instances as well.
>
> 11beta2
> ==
> buffers=# explain (analyze,buffers,timing off,costs off) select count(*)
> from t1;
> QUERY PLAN
>
> --
>  Finalize Aggregate (actual rows=1 loops=1)
>Buffers: shared read=63175
>->  Gather (actual rows=7 loops=1)
>  Workers Planned: 6
>  Workers Launched: 6
>  Buffers: shared read=63175
>  ->  Partial Aggregate (actual rows=1 loops=7)
>Buffers: shared read=442478
>->  Parallel Seq Scan on t1 (actual rows=14285714
> loops=7)
>  Buffers: shared read=442478
>  Planning Time: 1.422 ms
>  Execution Time: 3214.407 ms
> (12 rows)
>
> buffers=# explain (analyze,buffers,timing off,costs off) select count(*)
> from t1;
> QUERY PLAN
>
> --
>  Finalize Aggregate (actual rows=1 loops=1)
>Buffers: shared hit=27 read=64960
>->  Gather (actual rows=7 loops=1)
>  Workers Planned: 6
>  Workers Launched: 6
>  Buffers: shared hit=27 read=64960
>  ->  Partial Aggregate (actual rows=1 loops=7)
>Buffers: shared hit=224 read=442254
>->  Parallel Seq Scan on t1 (actual rows=14285714
> loops=7)
>  Buffers: shared hit=224 read=442254
>  Planning Time: 0.080 ms
>  Execution Time: 3774.091 ms
> (12 rows)
>
> HEAD
> =
> buffers=# explain (analyze,buffers,timing off,costs off) select count(*)
> from t1;
> QUERY PLAN
>
> --
>  Finalize Aggregate (actual rows=1 loops=1)
>Buffers: shared read=442478
>->  Gather (actual rows=7 loops=1)
>  Workers Planned: 6
>  Workers Launched: 6
>  Buffers: shared read=442478
>  ->  Partial Aggregate (actual rows=1 loops=7)
>Buffers: shared read=442478
>->  Parallel Seq Scan on t1 (actual rows=14285714
> loops=7)
>  Buffers: shared read=442478
>  Planning Time: 1.594 ms
>  Execution Time: 6207.799 ms
> (12 rows)
>
> buffers=# explain (analyze,buffers,timing off,costs off) select count(*)
> from t1;
> QUERY PLAN
>
> --
>  Finalize Aggregate (actual rows=1 loops=1)
>Buffers: shared hit=224 read=442254
>->  Gather (actual rows=7 loops=1)
>  Workers Planned: 6
>  Workers Launched: 6
>  Buffers: shared hit=224 read=442254
>  ->  Partial Aggregate (actual rows=1 loops=7)
>Buffers: shared hit=224 read=442254
>->  Parallel Seq Scan on t1 (actual rows=14285714
> loops=7)
>  Buffers: shared hit=224 read=442254
>  Planning Time: 0.074 ms
>  Execution Time: 5006.395 ms
> (12 rows)
>
> Notice the “read” numbers just before the “Finalize Aggregate” is much
> higher,
>

You mean the number (Buffers: shared read=442478) on HEAD, right?  If
so, yeah, I am also wondering why the results with the patch are
different on HEAD and 11beta2.  If I read correctly, the numbers in
11beta2 appear correct, but on HEAD they are not: the buffers read
should have been divided by loops.  I will figure that out, but this is
just to clarify that both of us are seeing the results in the same way.
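
As a quick sanity check of that reading: 442478 shared reads over
loops=7 (one leader plus six workers) gives 442478 / 7 = 63211 reads
per loop, which lines up with the ~63175 reads that 11beta2 reports at
the Gather and Finalize Aggregate level.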

>and there appears to be a performance hit.
>

Do you mean the performance of the same query in 11beta2 versus HEAD,
or something else?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Making "COPY partitioned_table FROM" faster

2018-07-25 Thread David Rowley
Hi Melanie,

Many thanks for looking over this again.

On 25 July 2018 at 03:32, Melanie Plageman  wrote:
> One small additional typo I noticed:
>
> In the patched code on line 2555, there appears to be a typo:
>  /* ...
>  * inserting into and act differently if the tuples that have already
>  * been processed any prepared for insertion are not there.
>  */
> The word "any" here is probably wrong, though I am actually not sure what
> was meant.

It was meant to be "and"

> Though it is from the existing comments, it would be nice to spell out once
> what is meant by "BR trigger". I'm sure it is mentioned elsewhere in the
> code, but, since it is not easily googled, it might be helpful to specify.
>
>> Many thanks to both of you for the thorough review. I hope the
>> attached addresses all the concerns.
>
>
> I feel that the code itself is clear and feel our concerns are addressed.

Great!

>> One final note:  I'm not entirely convinced we need this adaptive
>> code, but it seems easy enough to rip it back out if it's more trouble
>> than it's worth. But if the other option is a GUC, then I'd rather
>> stick with the adaptive code, it's likely going to do much better than
>> a GUC since it can change itself during the copy which will be useful
>> when the stream contains a mix of small and large sets of consecutive
>> tuples which belong to the same partition.
>>
> Though the v2 numbers do look better, it doesn't completely alleviate the
> slow-down in the worst case. Perhaps the GUC and the adaptive code are not
> alternatives and could instead be used together. You could even make the
> actual RECHECK_MULTI_INSERT_THRESHOLD the GUC.
> However, I think that the decision as to whether or not it makes sense to do
> the adaptive code without a GUC is really based on the average use case, to
> which I can't really speak.
> The counter-argument I see, however, is that it is not acceptable to have
> completely un-optimized insertion of data into partition tables and that an
> overwhelming flood of GUCs is undesirable.

I'm pretty much against a GUC, because:

1) Most people won't use it or know about it, and;
2) Copy streams might contain mixes of long runs of tuples belonging
to the same partition and periods where the target partition changes
frequently.  A GUC cannot be set to account for that, and;
3) Because of the following:

I looked at this patch again and I saw that there are a few more
things we can do to improve the performance further.

I've made the following changes:

a) Made the avgTuplesPerPartChange >= 1.3 the first condition in the
test that sets leafpart_use_multi_insert.
b) Added an unlikely() to the test of whether the partition's resultRelInfo
has yet to be initialised. This is only true the first time a
partition is inserted into during the COPY.
c) Added unlikely() around lastPartitionSampleLineNo test.  This will
only be true 1 in 1000, so pretty unlikely.
d) Moved the nBufferedTuples > 0 test under the insertMethod ==
CIM_MULTI_CONDITIONAL test. It's never going to be > 0 when we're
doing CIM_SINGLE.
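
To make the adaptive part concrete, here's a condensed sketch.  The
names avgTuplesPerPartChange, lastPartitionSampleLineNo and
RECHECK_MULTI_INSERT_THRESHOLD follow this thread; the exact weighting
below is an assumption, not the patch's formula:

#include <stdbool.h>

#define RECHECK_MULTI_INSERT_THRESHOLD	1000

static double avgTuplesPerPartChange = 0;
static long lastPartitionSampleLineNo = 0;

static bool
leafpart_use_multi_insert(long cur_lineno, long nPartitionChanges)
{
	if (cur_lineno - lastPartitionSampleLineNo >=
		RECHECK_MULTI_INSERT_THRESHOLD)
	{
		/* tuples seen per partition change over the last window */
		double	window = (double) RECHECK_MULTI_INSERT_THRESHOLD /
			(nPartitionChanges > 0 ? nPartitionChanges : 1);

		/* blend the new window into the running average */
		avgTuplesPerPartChange = avgTuplesPerPartChange * 0.5 + window * 0.5;
		lastPartitionSampleLineNo = cur_lineno;
	}

	/* multi-insert pays off when same-partition runs are long enough */
	return avgTuplesPerPartChange >= 1.3;
}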

I spun up another AWS m5d.large instance to test this. This time I
only tested the case where the partition changes on every tuple.  The
new patch takes about 96.5% of the time that master takes. I performed
10 runs of each and tested both with fsync=on and fsync=off.  I've
attached the results in spreadsheet form.

I think given the new performance results there's no need for any GUC
or conditionally enabling this optimisation based on partitioning
strategy.

v3 patch is attached.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


copy_speedup_v3_results.ods
Description: application/vnd.oasis.opendocument.spreadsheet


v3-0001-Allow-multi-inserts-during-COPY-into-a-partitione.patch
Description: Binary data


Re: no partition pruning when partitioning using array type

2018-07-25 Thread Amit Langote
On 2018/07/26 1:41, Alvaro Herrera wrote:
> On 2018-Jul-12, Amit Langote wrote:
>> On 2018/07/12 2:33, Alvaro Herrera wrote:
>>> Yeah, any domain constraints added before won't be a problem.  Another
>>> angle on this problem is to verify partition bounds against the domain
>>> constraint being added; if they all pass, there's no reason to reject
>>> the constraint.
>>
>> That's an idea.  It's not clear to me how easy it is to get hold of all
>> the partition bounds that contain domain values.  How do we get from the
>> domain on which a constraint is being added to the relpartbound which
>> would contain the partition bound value of the domain?
> 
> Well, we already get all table columns using a domain when the domain
> gets a new constraint; I suppose it's just a matter of verifying for
> each column whether it's part of the partition key, and then drill down
> from there.  (I'm not really familiar with that part of the catalog.)

Possibly doable, but maybe leave it for another day.

>>> Actually, here's another problem: why are we letting a column on a
>>> domain become partition key, if you'll never be able to create a
>>> partition?  It seems that for pg11 the right reaction is to check
>>> the partition key and reject this case.
>>
>> Yeah, that seems much safer, and given how things stand today, no users
>> would complain if we do this.
> 
> Agreed.
> 
>>> I'm not sure of this implementation -- I couldn't find any case where we
>>> reject the deletion on the function called from doDeletion (and we don't
>>> have a preliminary phase on which to check for this, either).  Maybe we
>>> need a pg_depend entry to prevent this, though it seems a little silly.
>>> Maybe we should add a preliminary verification phase in
>>> deleteObjectsInList(), where we invoke object-type-specific checks.
>>
>> Doing pre-check based fix had crossed my mind, but I didn't try hard to
>> pursue it.  If we decide to just prevent domain partition keys, we don't
>> need to try too hard now to come up with a nice implementation for this,
>> right?
> 
> Absolutely.

OK, attached is a patch for that.

Thanks,
Amit
From 042aa582f717ceb695a1ab60517c2e9f7d04704b Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 26 Jul 2018 11:07:51 +0900
Subject: [PATCH v1] Disallow domain type partition key

---
 src/backend/commands/tablecmds.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eb2d33dd86..871c831cd2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13693,6 +13693,12 @@ ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs,
 			atttype = attform->atttypid;
 			attcollation = attform->attcollation;
 			ReleaseSysCache(atttuple);
+
+			/* Prevent using domains as a partition key. */
+			if (get_typtype(atttype) == TYPTYPE_DOMAIN)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("cannot use domain type column as partition key")));
 		}
 		else
 		{
@@ -13703,6 +13709,12 @@ ComputePartitionAttrs(Relation rel, List *partParams, AttrNumber *partattrs,
 			atttype = exprType(expr);
 			attcollation = exprCollation(expr);
 
+			/* Prevent using domains as a partition key. */
+			if (get_typtype(atttype) == TYPTYPE_DOMAIN)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("cannot use domain type expression as partition key")));
+
 			/*
 			 * Strip any top-level COLLATE clause.  This ensures that we treat
 			 * "x COLLATE y" and "(x COLLATE y)" alike.
-- 
2.11.0



Re: [Proposal] Add accumulated statistics for wait event

2018-07-25 Thread Michael Paquier
On Tue, Jul 24, 2018 at 04:23:03PM +, Phil Florent wrote:
> It loses non meaningful details and it's in fact a good point. In this
> example, sampling will definitely find the cause and won't cost
> resources.

The higher the sampling frequency, the more details you get, but the
more load you put on the instance.  So if you were able to take an
infinity of samples, where registering the same event multiple times for
the same backend also matters because its overall weight gets higher and
it shows up higher in profiles, then you would converge to the set of
results that this patch adds.  The sampling method, especially its
frequency, is something controlled by the client and not the server.
Approaches like the one proposed here push the load onto the server side,
unconditionally, for *all* backends, and this has its cost.
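
To illustrate the convergence point, a tiny simulation: a backend that
truly spends 30% of its time in wait event A, sampled at random
instants.  Entirely illustrative; nothing here is PostgreSQL code:

--
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	const double true_share = 0.30;	/* real share of time in event A */
	long		hits = 0;
	long		samples = 1000000;

	srand(42);
	for (long i = 0; i < samples; i++)
		if ((double) rand() / RAND_MAX < true_share)
			hits++;

	/* with enough samples this converges to 0.30 */
	printf("estimated share of event A: %.4f\n", (double) hits / samples);
	return 0;
}
--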

Even if you have spiky workloads that sampling may miss, adding counters
for each event does not obviously help: you would need to query the table
holding the counters at an insane frequency to perhaps get something out
of it, as you need to sample the counters as well to extract deltas.

As Tomas has mentioned up-thread, sampling is light-weight, as is the
current design for wait events.  Even if it is not perfect because it
cannot give exact numbers, it would find bottlenecks in most cases,
and that's what matters.  If not, increasing the sampling frequency
makes things easier to detect as well.  What would be the point of
taking only one sample every checkpoint, for example?

There may be a benefit in having counters, I don't know the answer to
that, though the point would be to make sure that there is a specific
set of workloads where it makes sense, still my gut feeling is that
sampling would be able to detect those anyway.

(I am not a computer scientist by training but a physicist, think fluid
dynamics and turbulence, and I have had my load of random events and signal
analysis as well.  All that is just statistics with attempts to approach
reality, where sampling is a life-saver over exactitude of
measurements.)

Adding hooks is not acceptable to me either; those have a cost, and it
is not clear what benefit we would get from putting hooks in such a
place for cases other than sampling and counters...
--
Michael


signature.asc
Description: PGP signature


Re: Alter index rename concurrently to

2018-07-25 Thread Corey Huinker
>
> You appear to be saying that you think that renaming an index
> concurrently is not safe.  In that case, this patch should be rejected.
> However, I don't think it necessarily is unsafe.  What we need is some
> reasoning about the impact, not a bunch of different options that we
> don't understand.
>

I've had this same need, and dreamed of this same solution before. I also
thought about a syntax like ALTER INDEX foo RENAME TO
WHATEVER-IT-WOULD-HAVE-BEEN-NAMED-BY-DEFAULT to aid this situation.

But all of those needs fade if we have REINDEX CONCURRENTLY. I think that's
where we should focus our efforts.

A possible side effort is something like VACUUM FULL CONCURRENTLY, which
would essentially do what pg_repack does but keep the same oid and the
stats that go with it; even that, though, is a nice-to-have add-on to
REINDEX CONCURRENTLY.


Re: LLVM jit and matview

2018-07-25 Thread Andres Freund
On 2018-07-25 18:59:51 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-07-25 18:11:13 -0400, Alvaro Herrera wrote:
> >> But what would be the advantage of avoiding the context release inside
> >> FreeExecutorState?  It seems pretty appropriate to me to do it there.
> >> You could argue that the JIT context is definitely part of the estate
> >> being freed.  Just amend the comment, no?
> 
> > I agree it's right to do it there.  I think I'm more questioning whether
> > there's even a need to adapt the comment, given it's really a local
> > memory resource. But I guess I'll just add a 'and ...' after
> > "ExprContexts within the EState".
> 
> +1.  Clarity is good.

Pushed something along those lines.

Thanks again Dmitry for reporting the issue!


- Andres



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-07-25 Thread Andres Freund
Hi,

On 2018-07-20 13:24:50 -0700, Andres Freund wrote:
> On 2018-07-20 16:15:14 -0400, Tom Lane wrote:
> > We've seen several occurrences of $subject in the buildfarm in the past
> > month or so.  Scraping the logs, I find
> > 
> >  coypu| 2018-06-14 21:17:49 | HEAD  | Check | 2018-06-14 
> > 23:31:44.505 CEST [5b22deb8.30e1:124] ERROR:  could not read block 3 in 
> > file "base/16384/2662": read only 0 of 8192 bytes
> >  coypu| 2018-06-14 21:17:49 | HEAD  | Check | 2018-06-14 
> > 23:31:44.653 CEST [5b22deb8.6d4d:132] ERROR:  could not read block 3 in 
> > file "base/16384/2662": read only 0 of 8192 bytes
> >  gull | 2018-06-21 04:02:11 | HEAD  | Check | 2018-06-20 
> > 21:27:06.843 PDT [31138:3] FATAL:  could not read block 3 in file 
> > "base/16384/2662": read only 0 of 8192 bytes
> >  mandrill | 2018-06-22 16:14:16 | HEAD  | Check | 2018-06-22 
> > 16:46:24.138 UTC [14353240:43] pg_regress/create_table_like ERROR:  could 
> > not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
> >  mandrill | 2018-06-22 16:14:16 | HEAD  | Check | 2018-06-22 
> > 16:46:24.137 UTC [20710034:1] ERROR:  could not read block 3 in file 
> > "base/16384/2662": read only 0 of 8192 bytes
> >  lapwing  | 2018-07-05 07:20:02 | HEAD  | Check | 2018-07-05 
> > 07:21:45.585 UTC [24022:1] ERROR:  could not read block 3 in file 
> > "base/16384/2662": read only 0 of 8192 bytes
> >  lapwing  | 2018-07-05 07:20:02 | HEAD  | Check | 2018-07-05 
> > 07:21:45.591 UTC [23941:39] pg_regress/roleattributes ERROR:  could not 
> > read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
> >  mantid   | 2018-07-17 01:20:04 | REL_11_STABLE | Check | 2018-07-16 
> > 21:21:32.557 EDT [26072:1] ERROR:  could not read block 3 in file 
> > "base/16384/2662": read only 0 of 8192 bytes
> >  mantid   | 2018-07-17 01:20:04 | REL_11_STABLE | Check | 2018-07-16 
> > 21:21:32.557 EDT [25990:82] ERROR:  could not read block 3 in file 
> > "base/16384/2662": read only 0 of 8192 bytes
> >  coypu| 2018-07-17 01:27:39 | HEAD  | Check | 2018-07-17 
> > 03:47:26.264 CEST [5b4d4aa4.3ac4:71] ERROR:  could not read block 3 in file 
> > "base/16384/2662": read only 0 of 8192 bytes
> >  coypu| 2018-07-20 11:18:02 | HEAD  | Check | 2018-07-20 
> > 13:32:14.104 CEST [5b51c833.4884:131] ERROR:  could not read block 3 in 
> > file "base/16384/2662": read only 0 of 8192 bytes
> > 
> > While I didn't go back indefinitely far, there are no other occurrences of
> > "could not read block" failures in the last three months.  This suggests
> > strongly that we broke something in early June.  Don't know what though.
> > Ideas?
> > 
> > In case anyone's wondering, 2662 is pg_class_oid_index.
> 
> While I do not immediately see how, and the affected versions don't
> really point that way, the timing does make me think of
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a54e1f1587793b5bf926630ec9ce142e4578d7a0
> 
> HEAD/REL_11_STABLE apparently solely being affected points elsewhere,
> but I don't immediatley know where.

Hm, there was:
http://archives.postgresql.org/message-id/20180628150209.n2qch5jtn3vt2xaa%40alap3.anarazel.de

I don't immediately see it being responsible, but I wonder if there's a
chance it actually is: Note that it happens in a parallel group that
includes vacuum.sql, which does a VACUUM FULL pg_class - but I still
don't immediately see how it could apply.

But either way, we need to fix the issue mentioned in the above email.

Greetings,

Andres Freund



Re: LLVM jit and matview

2018-07-25 Thread Tom Lane
Andres Freund  writes:
> On 2018-07-25 18:11:13 -0400, Alvaro Herrera wrote:
>> But what would be the advantage of avoiding the context release inside
>> FreeExecutorState?  It seems pretty appropriate to me to do it there.
>> You could argue that the JIT context is definitely part of the estate
>> being freed.  Just amend the comment, no?

> I agree it's right to do it there.  I think I'm more questioning whether
> there's even a need to adapt the comment, given it's really a local
> memory resource. But I guess I'll just add a 'and ...' after
> "ExprContexts within the EState".

+1.  Clarity is good.

regards, tom lane



Re: Scariest patch tournament, PostgreSQL 11 edition

2018-07-25 Thread Peter Geoghegan
On Mon, Jun 25, 2018 at 3:52 PM, Alvaro Herrera
 wrote:
> One month of beta testing has flown by, and enough bugs have already
> been reported that your view of what patches are scariest might have
> matured.  You still have a few days before we close the contest at the
> end of the month.  Let us know what patches you think are scariest:

Is there going to be an announcement of the results?

-- 
Peter Geoghegan



Re: LLVM jit and matview

2018-07-25 Thread Alvaro Herrera
On 2018-Jul-25, Andres Freund wrote:

> On 2018-07-25 18:11:13 -0400, Alvaro Herrera wrote:
> > On 2018-Jul-25, Andres Freund wrote:

> > But what would be the advantage of avoiding the context release inside
> > FreeExecutorState?  It seems pretty appropriate to me to do it there.
> > You could argue that the JIT context is definitely part of the estate
> > being freed.  Just amend the comment, no?
> 
> I agree it's right to do it there.

Ok.

> I think I'm more questioning whether there's even a need to adapt the
> comment, given it's really a local memory resource. But I guess I'll
> just add a 'and ...' after "ExprContexts within the EState".

Yeah, adding two words there sounds good.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: LLVM jit and matview

2018-07-25 Thread Alvaro Herrera
On 2018-Jul-25, Andres Freund wrote:

> The fix is easy, releasing the JIT context should just happen in
> FreeExecutorState(). Only thing is that that function has the following
> comment in the header:
>  * Note: this is not responsible for releasing non-memory resources,
>  * such as open relations or buffer pins.  But it will shut down any
>  * still-active ExprContexts within the EState.  That is sufficient
>  * cleanup for situations where the EState has only been used for expression
>  * evaluation, and not to run a complete Plan.
> 
> I don't really think throwing away functions is a violation of that, but
> I think it's possible to argue the other way?

I suppose the other possible way to go about it is to save estate->es_jit in a
local variable so that you can release it after FreeExecutorState.  But
what would be the advantage of avoiding the context release inside
FreeExecutorState?  It seems pretty appropriate to me to do it there.
You could argue that the JIT context is definitely part of the estate
being freed.  Just amend the comment, no?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Allow COPY's 'text' format to output a header

2018-07-25 Thread Simon Muller
On 25 July 2018 at 19:24, Cynthia Shang 
wrote:

>
> I've reviewed this patch and feel this patch addresses the original ask. I
> tested it manually trying to break it and, as mentioned previously, its
> behavior is the same as the CSV copy with regards to its shortcomings.
> However, I feel
> 1) a "copy from" test is needed and
> 2) the current "copy to" test is (along with a few others) in the wrong
> file.
>
> With regards to #2, the copy.source tests are for things requiring
> replacement when running the tests. Given that these copy tests do not, I
> have moved the current last set of copy tests to the copy2.sql file and
> have provided an attached patch.
>
>
Thanks for reviewing the patch.

I agree that moving those previous and these new tests out of the .source
files seems to make more sense as they don't make use of the
preprocessing/replacement feature.

With regards to #1, the patch I have provided can then be used and the
> following added as the COPY TO/FROM tests (perhaps after line 426 of the
> attached copy2.sql file).  Note that I moved the FROM test before the TO
> test and omitted the "(format text, header true)" in the FROM test since it
> is another way the command can be invoked.
>
> copy copytest3 from stdin header;
> this is just a line full of junk that would error out if parsed
> 11 a 1
> 22 b 2
> \.
>
> copy copytest3 to stdout with (format text, header true);
>
>
>
I've incorporated both your suggestions and included the patch you provided
in the attached patch. Hope it's as expected.


> As for the matching check of the header in the discussion of this patch, I
> feel that is a separate patch that can be added later since it would affect
> the general functionality of the copy command, not just the ability to have
> a text header.
>
> Best,
> - Cynthia Shang
>

P.S. I did receive the first attached patch, but on my Ubuntu I had to
apply it using "git apply --ignore-space-change --ignore-whitespace",
probably due to line ending differences.

--
Simon Muller


text_header_v4.patch
Description: Binary data


Re: LLVM jit and matview

2018-07-25 Thread Andres Freund
Hi,

On 2018-07-25 08:41:29 -0700, Andres Freund wrote:
> Oh, interesting. It only crashes when compiling LLVM without LLVM's
> asserts enabled, even when using exactly the same LLVM checkout for both
> builds. No idea what that's about, yet.

Oh, well, this took me longer than it should have.  The problem, to my
possibly demented mind, is actually kinda funny:

The problematic expression in the query invokes ExecEvalRowNull() (it's
EEOP_WHOLEROW followed by EEOP_NULLTEST_ROWISNOTNULL). With inlining
enabled, that ends up inlining ExecEvalRowNull(), ExecEvalRowNullInt(),
get_cached_rowtype(), ShutdownTupleDescRef() (largely because these are
static functions that'd otherwise prevent ExecEvalRowNull from being
inlined).
get_cached_rowtype() does
/* Need to register shutdown callback to release tupdesc */
RegisterExprContextCallback(econtext,
ShutdownTupleDescRef,
PointerGetDatum(cache_field));
which, in the inlined version, means that ShutdownTupleDescRef is the
inlined copy. So, the query execution sets up a shutdown callback, that
is a JITed function.

But note standard_ExecutorEnd():

/* release JIT context, if allocated */
if (estate->es_jit)
{
jit_release_context(estate->es_jit);
estate->es_jit = NULL;
}

/*
 * Must switch out of context before destroying it
 */
MemoryContextSwitchTo(oldcontext);

/*
 * Release EState and per-query memory context.  This should release
 * everything the executor has allocated.
 */
FreeExecutorState(estate);

FreeExecutorState(), which calls the shutdown callbacks, is executed
*after* the JIT context is destroyed! Thereby causing the segfault, as
the shutdown callback now invokes unmapped memory.

The fix is easy, releasing the JIT context should just happen in
FreeExecutorState(). Only thing is that that function has the following
comment in the header:
 * Note: this is not responsible for releasing non-memory resources,
 * such as open relations or buffer pins.  But it will shut down any
 * still-active ExprContexts within the EState.  That is sufficient
 * cleanup for situations where the EState has only been used for expression
 * evaluation, and not to run a complete Plan.

I don't really think throwing away functions is a violation of that, but
I think it's possible to argue the other way?
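
Concretely, the ordering I have in mind inside FreeExecutorState()
would be along these lines (a sketch of the idea, not the committed
diff):

	/*
	 * First shut down any still-active ExprContexts, running their
	 * shutdown callbacks while any JITed code they point at is valid.
	 */
	while (estate->es_exprcontexts != NIL)
		FreeExprContext((ExprContext *) linitial(estate->es_exprcontexts),
						true);

	/* only now is it safe to throw away the JITed functions */
	if (estate->es_jit)
	{
		jit_release_context(estate->es_jit);
		estate->es_jit = NULL;
	}

	/* ... then destroy es_query_cxt as before ... */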

Greetings,

Andres Freund



Re: Loaded footgun open_datasync on Windows

2018-07-25 Thread Thomas Munro
On Thu, Jun 21, 2018 at 2:06 AM, Laurenz Albe  wrote:
> How about checking what the buildfarm thinks about the attached?

Hi Laurenz,

It looks like initdb is failing with this patch:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.6732

Unfortunately cfbot is not clever enough to print out the contents of
the initdb log so I don't have any more information...

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-25 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita
>  wrote:
>> I'm not sure that's a good idea, because I think we have a trade-off
>> relation; the more we make create_plan simple, the more we need to make
>> earlier stages of the planner complicated.
>> 
>> And it looks to me like the partitionwise join code is making earlier (and
>> later) stages of the planner too complicated, to make create_plan simple.

> I think that create_plan is *supposed* to be simple.  Its purpose is
> to prune away data that's only needed during planning and add things
> that can be computed at the last minute which are needed at execution
> time.  Making it do anything else is, in my opinion, not good.

I tend to agree with Robert on this.  In particular, if you put anything
into create_plan or later that affects cost estimates, you're really doing
it wrong, because changing those estimates might've changed the plan
entirely.  (As I recall, we have a couple of cheats like that now,
associated with dropping no-op projection and subquery scan nodes.
But let's not add more.)

TBH, I think this entire discussion is proving what sheer folly it was
to allow partitions to have rowtypes not identical to their parent.
But I suppose we're stuck with that now.

regards, tom lane



BLOB / CLOB support in PostgreSQL

2018-07-25 Thread Vladimir Sitnikov
Hi,

According to Pgjdbc GitHub statistics, the most popular page is
https://github.com/pgjdbc/pgjdbc/issues/1102 which is
 "org.postgresql.jdbc.PgConnection.createClob() is not yet implemented"
issue (1600 visits from 1400 unique visitors per fortnight).

There are workarounds to silence the error, however I'm sure CLOB (as in
"streaming text datatype") is not yet supported in PostgreSQL backend.

I have browsed pgsql-hackers mailing list re CLOB, and it looks like
there's no relevant discussion, so I'm quite sure I've done my homework on
"check prior mails regarding the subject".

**Issue**: there's no sensible way to map java.sql.Clob to the existing
backend datatypes. `text` can't stream (it always loads the data fully),
and it is limited to 1GB.

Java distinguishes java.sql.Blob and java.sql.Clob.
Blob is a "binary stream with streaming features". It can be mapped to
existing "Large Objects", and existing Large Object API somewhat suits for
the implementation.
There are glitches (like "pgjdbc has to perform 4 API calls
tell/seek/tell/seek in order just to get LO length once"), however it is
fine.

The Java Clob API is just a dozen methods (13 to be exact); however, there
are two major issues:
1) "Large Object" is just a binary object. There's no way to tell if the
contents is a UTF-8 string or Windows-1251 string or protobuf-encoded
message or whatever.
That is, if pgjdbc encodes java.sql.Clob (a large string) into some binary
form (e.g. UTF-8) and stores it as a PostgreSQL Large Object, then this LO
automatically becomes a "pgjdbc-specific blob".
There's no way to use the data in SQL or pl/pgsql or other applications.
For instance, one can't perform " where clob_column like '%abcd%' "

2) "characters". For instance, `long length()` should return the number of
characters in the string.
If pgjdbc implements java.sql.Clob as a UTF-8 encoded binary, then it would
have to **process the whole blob** in order to measure the string length.
The same thing goes for `String getSubString(long pos, int length)`: it
would have to process all the bytes up to character `pos` (how else would
it know the byte position of character `pos`?).
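
A tiny example of why this is O(n) for UTF-8: counting characters means
visiting every byte, since only non-continuation bytes start a character
(this is generic C, not pgjdbc code):

	/* continuation bytes look like 10xxxxxx */
	static size_t
	utf8_char_count(const unsigned char *buf, size_t nbytes)
	{
		size_t		nchars = 0;

		for (size_t i = 0; i < nbytes; i++)
			if ((buf[i] & 0xC0) != 0x80)
				nchars++;
		return nchars;
	}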

Currently pgjdbc encodes strings using client_encoding, stores them as LO,
and has been like that for ages. Apparently that might easily produce
garbage in the DB if clients use various encodings, however pgjdbc's
default setting is to use UTF-8, so the problem should not be that visible.

I fully understand LO has issues with "removing obsolete entries", however
mapping java.sql.Clob to `text` seems to make less sense.
For instance: suppose pgjdbc chooses "Clob == text". Then a client hits
the 1GB limit.

"Streaming TOAST data" looks more like a research project rather than a
clear thing to implement.

What if there was a standard of storing strings in Large Objects?
For instance: "CLOB is a UTF-8 encoded string stored as a single LO". When
such an agreement exists, various applications could read and write the
data.
Of course, UTF-8 might not suit everybody, so a format might be "prefix
that specifies encoding, then encoded string".
Of course both variations above fail to support streaming (as in "need to
process all the contents in order to get the last character"), so it might
be better to use
"prefix that specifies encoding + 'index block' (that specifies offsets for
each 1M characters) + encoded string".
I'm sure there are known algorithms to store strings in binary format that
support subsequence / overwrite / length in reasonable time (O(1) or O(N)
with reasonable constant).
There might be an option to use UTF-16 (so each "character" always becomes
2 bytes), however it would come at the cost of space usage.
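
To make the proposal concrete, the layout could be sketched as a C
struct; everything here (names, field sizes, the 1M step) is
hypothetical:

	#include <stdint.h>

	#define CLOB_INDEX_STEP 1000000	/* characters per index entry */

	typedef struct ClobHeader
	{
		uint32_t	magic;		/* marks "string stored in a LO" */
		uint32_t	encoding;	/* e.g. UTF-8 */
		uint64_t	nchars;		/* total characters, for length() */
		uint64_t	nindex;		/* number of offset entries following */
		/* uint64_t offsets[nindex]; then the encoded bytes */
	} ClobHeader;

With such an index, getSubString(pos, len) needs to decode at most
CLOB_INDEX_STEP characters from the nearest indexed byte offset instead
of scanning from the start.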

**Here goes the question**: do you think such an implementation ("large
strings stored in Large Objects") could be merged into core eventually?

Q2: any ideas/existing libraries for random access read-write large strings
stored as binary?

PS. Relevant pgjdbc PR is https://github.com/pgjdbc/pgjdbc/pull/1272

-- 
Regards,
Vladimir Sitnikov


Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-07-25 Thread Robert Haas
On Tue, Jul 24, 2018 at 7:51 AM, Etsuro Fujita
 wrote:
>> Isn't that assumption fundamental to your whole approach?
>
> I don't think so.  What I mean here is: currently the subplan would be a
> scan/join node, but in future we might have eg, a Sort node atop the
> scan/join node, so it would be better to update the patch to handle such a
> case as well.

But how would you do that?

 I think that's a bad idea.  The target list affects lots
 of things, like costing.  If we don't insert a ConvertRowTypeExpr into
 the child's target list, the costing will be wrong to the extent that
 ConvertRowTypeExpr has any cost, which it does.
>>>
>>>
>>> Actually, this is not true at least currently, because
>>> set_append_rel_size
>>> doesn't do anything about the costing:
>>
>>
>> Why would it?  Append can't project, so the cost of any expressions
>> that appear in its target list is irrelevant.  What is affected is the
>> cost of the scans below the Append -- see e.g. cost_seqscan(), which
>> uses the data produced by set_pathtarget_cost_width().
>
> By set_rel_size()?

Sorry, I don't understand what you mean by this.

> I'm not sure that's a good idea, because I think we have a trade-off
> relation; the more we make create_plan simple, the more we need to make
> earlier stages of the planner complicated.
>
> And it looks to me like the partitionwise join code is making earlier (and
> later) stages of the planner too complicated, to make create_plan simple.

I think that create_plan is *supposed* to be simple.  Its purpose is
to prune away data that's only needed during planning and add things
that can be computed at the last minute which are needed at execution
time.  Making it do anything else is, in my opinion, not good.

> When considering partitionwise joining, it would make things complicated to
> have a ConvertRowtypeExpr in a partrel's targetlist, because as discussed
> upthread, it deviates from the planner's assumption that a rel's targetlist
> would only include Vars and PHVs.  So, I propose to include a child
> whole-row Var in the targetlist instead, in which case, we need to fix the
> Var after the fact, but can avoid making many other parts of the planner
> complicated.

Well, I could have the wrong idea here, but I tend to think allowing
for ConvertRowTypeExpr elsewhere won't be that bad.

Does anyone else want to weigh in on this issue?  Tom, perhaps?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



PartitionDispatch's partdesc field

2018-07-25 Thread Robert Haas
I noticed today that in the PartitionDispatch structure, the partdesc
field is set but not used.  So we could remove it.  See attached
pd-partdesc-remove.patch.  If we want to go this route, I suggest
doing a slightly more thorough removal and getting rid of the key
field as well.  See attached pd-partdesc-and-key-remove.patch.

Another alternative, which I think might make more sense, is to make
use of pd->key and pd->partdesc in preference to pd->reldesc->rd_partkey
and pd->reldesc->rd_partdesc.  It seems to me that the idea of the
PartitionDispatch structure is that it gathers together all of the
information that we need for tuple routing, so it might make sense for
the tuple routing code to get the information from there rather
than referring back to the RelationDesc.  See attached
pd-partdesc-use.patch.

Basically, the decision here is whether we want to avoid invoking
RelationGetPartitionKey and RelationGetPartitionDesc over and over
again, either for reasons of notational cleanliness (macro names are
long) or safety (absolutely no chance that the answer will change) or
efficiency (maybe those macros will someday do more than just a
pointer dereference?).  If so, we should cache the data in the
PartitionDispatch object and then use it.  If not, there seems to be
no reason to copy those pointers from the Relation to the
PartitionDispatch in the first place, since the latter has a pointer
to the former anyway.
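
In code terms, the third option just changes call sites along these
lines (a sketch of the direction, not the patch text):

	PartitionKey	key;
	PartitionDesc	partdesc;

	/* instead of key = RelationGetPartitionKey(pd->reldesc); */
	key = pd->key;
	/* instead of partdesc = RelationGetPartitionDesc(pd->reldesc); */
	partdesc = pd->partdesc;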

Opinions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pd-partdesc-remove.patch
Description: Binary data


pd-partdesc-and-key-remove.patch
Description: Binary data


pd-partdesc-use.patch
Description: Binary data


Re: JIT breaks PostGIS

2018-07-25 Thread Andres Freund
Hi,

On 2018-07-25 21:59:32 +0200, Christoph Berg wrote:
> Re: Andres Freund 2018-07-25 
> <20180725195037.jmykfzfporf6a...@alap3.anarazel.de>
> > The way inlining works is that, when referencing a function, the bitcode
> > summary file corresponding to it (either postgres.index.bc if builtin or
> > $extension.index.bc if in an extension) gets opened.  That contains
> > metadata (name, number of instructions, ...) about the functions
> > included in the indexed .bc files (which reside in
> > $module/path/to/$file.bc). Those .bc files in turn are generated by
> > clang -emit-llvm.  The inlining machinery looks up functions in the
> > corresponding .index.bc, checks whether they are smaller than the
> > instruction limit, and inlines them if below.
> 
> Is that top-level functions (SQL language "C" functions), or all
> C-code functions?

The "initial" entry point has to be a C (or INTERNAL) function. But
called function can recursively get inlined (with the size limit being
halved every recursion step). Outside of hooks etc there's no other way
to call extension functions outside of SQL level functions, so that's
not really a restriction.


> > If a function is not in the summary, or it is too large, it'll just
> > generate an external function call. It's perfectly normal to have too
> > large functions and functions that aren't present (e.g. random libraries
> > including libc).
> 
> What happens if some (SQL) function is in there, but calls into a
> function that is not? Or is in a different index?

I'm not precisely sure what you mean. You're thinking of a C UDF that's
calling a C function (not a UDF) that's not in the index? If so, then
we'll just not inline that inner function, but generate a normal
function call (i.e. on asm level that'll be a callq instruction or
whatever your platform wants to do).

Greetings,

Andres Freund



Re: JIT breaks PostGIS

2018-07-25 Thread Christoph Berg
Re: Andres Freund 2018-07-25 <20180725195037.jmykfzfporf6a...@alap3.anarazel.de>
> > Different question, the other way round, is there a way to know that
> > all files needed to inline a query/extension are there? How does the
> > JIT machinery determine it can (try to) compile things? (That's
> > something extension packages might want to test for.)
> 
> I'm not 100% sure I understand your question. Let me describe how it
> works, and maybe you can then rephrase afterwards?
> 
> The way inlining works is that, when referencing a function, the bitcode
> summary file corresponding to it (either postgres.index.bc if builtin or
> $extension.index.bc if in an extension) gets opened.  That contains
> metadata (name, number of instructions, ...) about the functions
> included in the indexed .bc files (which reside in
> $module/path/to/$file.bc). Those .bc files in turn are generated by
> clang -emit-llvm.  The inlining machinery looks up functions in the
> corresponding .index.bc, checks whether they are smaller than the
> instruction limit, and inlines them if below.

Is that top-level functions (SQL language "C" functions), or all
C-code functions?

> If a function is not in the summary, or it is too large, it'll just
> generate an external function call. It's perfectly normal to have too
> large functions and functions that aren't present (e.g. random libraries
> including libc).

What happens if some (SQL) function is in there, but calls into a
function that is not? Or is in a different index?

Christoph



Re: JIT breaks PostGIS

2018-07-25 Thread Andres Freund
Hi,

On 2018-07-25 21:39:26 +0200, Christoph Berg wrote:
> Re: Andres Freund 2018-07-25 
> <20180725191143.sietxlbfehv24...@alap3.anarazel.de>
> > I haven't investigated the details here.  It certainly would be possible
> > to have the _PG_init() of postgis's so force JIT to be off, and emit a
> > warning.
> 
> Isn't that too late, if postgis.so gets loaded by a query that is in
> the process of being jit'ed?

Hm, I'd guess it'd still be fine, but I haven't thought it sufficiently
through.


> Different question, the other way round, is there a way to know that
> all files needed to inline a query/extension are there? How does the
> JIT machinery determine it can (try to) compile things? (That's
> something extension packages might want to test for.)

I'm not 100% sure I understand your question. Let me describe how it
works, and maybe you can then rephrase afterwards?

The way inlining works is that, when referencing a function, the bitcode
summary file corresponding to it (either postgres.index.bc if builtin or
$extension.index.bc if in an extension) gets opened.  That contains
metadata (name, number of instructions, ...) about the functions
included in the indexed .bc files (which reside in
$module/path/to/$file.bc). Those .bc files in turn are generated by
clang -emit-llvm.  The inlining machinery looks up functions in the
corresponding .index.bc, checks whether they are smaller than the
instruction limit, and inlines them if below.

If a function is not in the summary, or it is too large, it'll just
generate an external function call. It's perfectly normal to have too
large functions and functions that aren't present (e.g. random libraries
including libc).
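
As a quick smoke test at the SQL level, one can at least verify that
jitted and inlined execution works at all (a sketch; assumes JIT support
is compiled in, and forces the cost thresholds down so the machinery
kicks in for a trivial query):

SET jit = on;
SET jit_above_cost = 0;
SET jit_inline_above_cost = 0;
EXPLAIN (ANALYZE) SELECT sum(i) FROM generate_series(1, 1000) AS s(i);
-- the JIT summary at the end of the output reports whether inlining
-- was performed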

What precisely would you want to test?

Greetings,

Andres Freund



Re: Early WIP/PoC for inlining CTEs

2018-07-25 Thread David Fetter
On Wed, Jul 25, 2018 at 04:18:42PM +0100, Andrew Gierth wrote:
> > "David" == David Fetter  writes:
> 
>  David> Please find attached a version rebased atop 167075be3ab1547e18
>  David> with what I believe are appropriate changes to regression test
>  David> output. The other changes to the regression tests output are
>  David> somewhat puzzling, as they change the actual results of queries.
> 
> Both of those changes are the result of volatile CTEs being inlined more
> than once (in one case, as part of an explicit test to ensure that CTEs
> are being materialized and not multiply evaluated).
> 
> If you look for the XXX comment in the patch, it should be easy to add a
> check that skips inlining if cterefcount > 1 and
> contains_volatile_functions is true.

Thanks for the broad hints!

Please find attached the next version, which passes 'make check'.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 92cc38740341b74dd7d26c0fd80d285554faf038 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Tue, 24 Jul 2018 22:09:55 -0700
Subject: [PATCH] Inlining CTEs v0003
To: pgsql-hack...@postgresql.org

---
 src/backend/nodes/copyfuncs.c |   1 +
 src/backend/nodes/equalfuncs.c|   1 +
 src/backend/nodes/outfuncs.c  |   1 +
 src/backend/nodes/readfuncs.c |   1 +
 src/backend/optimizer/plan/subselect.c| 152 ++
 src/include/nodes/parsenodes.h|   1 +
 src/test/regress/expected/rowsecurity.out |  92 ++---
 src/test/regress/expected/rowtypes.out|  13 +-
 src/test/regress/expected/rules.out   |   6 +-
 9 files changed, 201 insertions(+), 67 deletions(-)

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 17b650b8cb..8ba2fc1aed 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2530,6 +2530,7 @@ _copyCommonTableExpr(const CommonTableExpr *from)
 	COPY_NODE_FIELD(ctequery);
 	COPY_LOCATION_FIELD(location);
 	COPY_SCALAR_FIELD(cterecursive);
+	COPY_SCALAR_FIELD(ctematerialized);
 	COPY_SCALAR_FIELD(cterefcount);
 	COPY_NODE_FIELD(ctecolnames);
 	COPY_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 378f2facb8..2a04506d0d 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2793,6 +2793,7 @@ _equalCommonTableExpr(const CommonTableExpr *a, const CommonTableExpr *b)
 	COMPARE_NODE_FIELD(ctequery);
 	COMPARE_LOCATION_FIELD(location);
 	COMPARE_SCALAR_FIELD(cterecursive);
+	COMPARE_SCALAR_FIELD(ctematerialized);
 	COMPARE_SCALAR_FIELD(cterefcount);
 	COMPARE_NODE_FIELD(ctecolnames);
 	COMPARE_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index a6454ce28b..0ea6b7125a 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -3080,6 +3080,7 @@ _outCommonTableExpr(StringInfo str, const CommonTableExpr *node)
 	WRITE_NODE_FIELD(ctequery);
 	WRITE_LOCATION_FIELD(location);
 	WRITE_BOOL_FIELD(cterecursive);
+	WRITE_BOOL_FIELD(ctematerialized);
 	WRITE_INT_FIELD(cterefcount);
 	WRITE_NODE_FIELD(ctecolnames);
 	WRITE_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 9a01eb6b63..01db3c1c85 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -409,6 +409,7 @@ _readCommonTableExpr(void)
 	READ_NODE_FIELD(ctequery);
 	READ_LOCATION_FIELD(location);
 	READ_BOOL_FIELD(cterecursive);
+	READ_BOOL_FIELD(ctematerialized);
 	READ_INT_FIELD(cterefcount);
 	READ_NODE_FIELD(ctecolnames);
 	READ_NODE_FIELD(ctecoltypes);
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 83008d7661..5aa242ff26 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1146,6 +1146,141 @@ hash_ok_operator(OpExpr *expr)
 }
 
 
+struct inline_cte_walker_ctx
+{
+	const char *ctename;
+	int levelsup;
+	int refcount;
+	Query *ctequery;
+	CommonTableExpr *cte;
+};
+
+static bool inline_cte_walker(Node *node, void *ctxp)
+{
+	struct inline_cte_walker_ctx *ctx = ctxp;
+
+	if (!node)
+		return false;
+
+	if (IsA(node, Query))
+	{
+		/*
+		 * This one is a bit tricky. It's our job to handle the recursion here,
+		 * but we do some of the lifting normally handled by query_tree_walker
+		 * in order to get the sequence of operations right.
+		 *
+		 * First, if the Query we're looking at is the one containing our CTE
+		 * definition, then we don't need to recurse into our own CTE or CTEs
+		 * that are earlier in the list than ours (since cteList has been
+		 * sorted for us into dependency order). We could check whether a
+		 * nested query is hiding ours, but that seems too much of an edge case
+		 * to be worth optimizing (the levelsup check will ensure we don't
+		 * 

Re: JIT breaks PostGIS

2018-07-25 Thread Andres Freund
Hi,

On 2018-07-25 21:05:33 +0200, Christoph Berg wrote:
> > > It'll only be an issue for extensions that throw c++ style exceptions. I
> > > don't think that rises to the level of disallowing any LLVM version <
> > > 5.0. I suggest postgis adds an error check to its buildprocess that
> > > refuses to run if jit is enabled and a too old version is used?
> > 
> > Unfortunately that's not an option.
> 
> Is it possible to disable JIT per extension, say by removing all .bc
> files for it, or via a PGXS variable? Or does this bug still trigger
> even without the .bc files for PostGIS?

I haven't investigated the details here.  It certainly would be possible
to have the _PG_init() of postgis's so force JIT to be off, and emit a
warning.

There's no way to just force JIT off whenever anything involving postgis
is present in the query. Not installing the .bc files will prevent
inlining, but I don't think that's sufficient to prevent the problem in
all cases.


> > I think that a good way to deal with it will be to bump minimum required
> > version of LLVM to 5 on Postgres build, with a notice in docs that will say
> > that "you can build with lower version, but that will give you this and
> > this bug".
> 
> Is LLVM << 5 generally acceptable for use with PostgreSQL

It is. Newer version wouldn't hurt though.


> , or should we look into getting a new version to distribute along
> with it on apt.postgresql.org? I'd rather like to avoid having to ship
> a compiler...

Well, you don't need to ship the compiler (clang), strictly
speaking. But yea.


> > It also may happen that a patch for LLVM can be applied to LLVM4 build in
> > Debian and brought in as an update, but such a plan should not be a default
> > one.
> 
> That's actually a plan I like very much. Hopefully the patch is small
> and backpatchable.

It's not entirely trivial, and I suspect there's independent changes
making it not apply cleanly:
https://github.com/llvm-mirror/llvm/commit/ab3dba86f951a1bdfe01d3102e2850e607af791a

Greetings,

Andres Freund



Re: JIT breaks PostGIS

2018-07-25 Thread Christoph Berg
Re: Darafei "Komяpa" Praliaskouski 2018-07-23 

> > It looks to me like it's a LLVM issue, specifically
> > https://bugs.llvm.org/show_bug.cgi?id=34424
> > fixed in LLVM 5+.
> >
> 
> Thank you for your investigation.

Thanks!

> > It'll only be an issue for extensions that throw c++ style exceptions. I
> > don't think that rises to the level of disallowing any LLVM version <
> > 5.0. I suggest postgis adds an error check to its buildprocess that
> > refuses to run if jit is enabled and a too old version is used?
> 
> Unfortunately that's not an option.

Is it possible to disable JIT per extension, say by removing all .bc
files for it, or via a PGXS variable? Or does this bug still trigger
even without the .bc files for PostGIS?

> If Christoph decides to keep LLVM enabled for 11 on that platform, as he is
> recommended upthread by Tom, that would mean that PostGIS can't be packaged
> at all, and all the people who need it will have to stay on Postgres 10.

We'll definitely need to find a proper fix, not shipping PostGIS is
not an option.

> If PostGIS decides not to implement the check, and instead tweaks test
> runner to execute `set jit to off;` before tickets.sql, then Postgres 11 on
> that platform will have a known way to segfault it, even without superuser
> rights, as jit GUC is tweakable by anyone.

Not a good option, ack.

> I think that a good way to deal with it will be to bump minimum required
> version of LLVM to 5 on Postgres build, with a notice in docs that will say
> that "you can build with lower version, but that will give you this and
> this bug".

Is LLVM << 5 generally acceptable for use with PostgreSQL, or should
we look into getting a new version to distribute along with it on
apt.postgresql.org? I'd rather like to avoid having to ship a
compiler...

> It also may happen that a patch for LLVM can be applied to LLVM4 build in
> Debian and brought in as an update, but such a plan should not be a default
> one.

That's actually a plan I like very much. Hopefully the patch is small
and backpatchable.

Christoph



Re: Early WIP/PoC for inlining CTEs

2018-07-25 Thread Nico Williams
On Tue, Jul 24, 2018 at 07:57:49PM -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-07-24 19:49:19 -0400, Tom Lane wrote:
> >> However, a singly-referenced SELECT CTE could reasonably be treated as
> >> equivalent to a sub-select-in-FROM, and then you would have the same
> >> mechanisms for preventing inlining as you do for those cases,
> >> e.g. OFFSET 0.  And sticking in OFFSET 0 would be backwards-compatible
> >> too: your code would still work the same in older releases, unlike if
> >> we invent new syntax for this.
> 
> > I still think this is just doubling down on prior mistakes.
> 
> Not following what you think a better alternative is?  I'd be the
> first to agree that OFFSET 0 is a hack, but people are used to it.
> 
> Assuming that we go for inline-by-default for at least some cases,
> there's a separate discussion to be had about whether it's worth
> making a planner-control GUC to force the old behavior.  I'm not
> very excited about that, but I bet some people will be.

It is widely known that CTEs in PG are optimizer barriers.

That actually is useful, and I do make use of that fact (though I'm not
proud of it).
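
For concreteness, a minimal illustration of that fence behavior (my own
sketch, not from any patch in this thread; it needs no tables):

-- Today the CTE is evaluated at most once, so both references see the
-- same value even though random() is volatile:
WITH w AS (SELECT random() AS r)
SELECT a.r = b.r AS same FROM w a, w b;   -- true

-- The sub-select equivalent can be inlined by the planner; OFFSET 0 is
-- the traditional hack to force a fence there:
SELECT r FROM (SELECT random() AS r OFFSET 0) s;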

My proposal is that PG add an extension for specifying that a CTE is to
be materialized (barrier) or not (then inlined).

Nico
-- 



Re: Early WIP/PoC for inlining CTEs

2018-07-25 Thread Nico Williams
On Wed, Jul 25, 2018 at 05:08:43PM +0100, Andrew Gierth wrote:
>  Nico> It is possible to add a keyword for this purpose in the WITH syntax:
> 
>  Nico> WITH   VIEW (...) AS a_view
> 
> The existing (and standard) syntax is WITH ctename AS (query).

Oy, I flubbed that up.

> Syntaxes that have been suggested for explicitly controlling the
> materialization are along the lines of:
> 
> WITH ctename AS [[NOT] MATERIALIZED] (query)
> 
> (MATERIALIZED is already an un-reserved keyword)

Works for me.



Re: How can we submit code patches that implement our (pending) patents?

2018-07-25 Thread Nico Williams
On Wed, Jul 25, 2018 at 11:45:58AM -0400, Chapman Flack wrote:
> On 07/25/2018 11:25 AM, Nico Williams wrote:
> 
> > I don't understand why it's not obvious that one can unknowingly and
> > accidentally re-invent someone else's idea.
> 
> It's perfectly obvious. It's the chief reason the whole topic
> of software patents has been deeply controversial for so long.

Thanks.

> You seem to be using it as part of a proof by contradiction:

wat

>   One can unknowingly and accidentally reinvent
>   a patented idea.
> 
>   If that were not tidily excused in practice,
>   software patents would be deeply problematic.
>   --
> 
>   Therefore, it must be the case that unknowing and
>   accidental reinvention is tidily excused in practice.
> 
> I don't think it works.

I didn't say it's excused.  The damages that can be awarded are just
three times less.  In practice it is common to settle for no longer
infringing.

What are you proposing anyways?  That every commit come with a patent
search?

Nico
-- 



Re: Allow COPY's 'text' format to output a header

2018-07-25 Thread Cynthia Shang
On Wed, Jul 25, 2018 at 1:24 PM, Cynthia Shang
 wrote:

> With regards to #2, the copy.source tests are for things requiring
> replacement when running the tests. Given that these copy tests do not, I
> have moved the current last set of copy tests to the copy2.sql file and have
> provided an attached patch.

The patch appears in the RAW and in your email (hopefully) but it
doesn't appear in the thread archive so I am reattaching from a
different email client.


move-copy-tests-v1.patch
Description: Binary data


Re: Allow COPY's 'text' format to output a header

2018-07-25 Thread Cynthia Shang
On 4 July 2018 at 22:44, Simon Muller  wrote:

> I noticed through the patch tester link at http://commitfest.cputube.org/
> that my patch caused a file_fdw test to fail (since I previously tested
> only with "make check" and not with "make check-world"). This v2 patch
> should fix that.
>
> This patch just fixes a newline issue introduced in my previous patch.

I've reviewed this patch and feel it addresses the original ask.  I tested
it manually trying to break it and, as mentioned previously, its behavior
is the same as the CSV copy with regards to its shortcomings.  However, I
feel
	1) a "copy from" test is needed and
	2) the current "copy to" test is (along with a few others) in the
	wrong file.

With regards to #2, the copy.source tests are for things requiring
replacement when running the tests.  Given that these copy tests do not, I
have moved the current last set of copy tests to the copy2.sql file and
have provided an attached patch.

With regards to #1, the patch I have provided can then be used and the
following added as the COPY TO/FROM tests (perhaps after line 426 of the
attached copy2.sql file).  Note that I moved the FROM test before the TO
test and omitted the "(format text, header true)" in the FROM test since
it is another way the command can be invoked.

copy copytest3 from stdin header;
this is just a line full of junk that would error out if parsed
11	a	1
22	b	2
\.

copy copytest3 to stdout with (format text, header true);

As for the matching check of the header in the discussion of this patch, I
feel that is a separate patch that can be added later since it would
affect the general functionality of the copy command, not just the ability
to have a text header.

move-copy-tests-v1.patch
Description: Binary data
Best,
- Cynthia Shang

Re: Optimizer misses big in 10.4 with BRIN index

2018-07-25 Thread Tomas Vondra

Hi,

On 07/25/2018 03:58 PM, Arcadiy Ivanov wrote:
  ->  Bitmap Index Scan on tradedate_idx 
(cost=0.00..231.96 rows=3377106 width=0) (actual time=4.500..4.500 
rows=23040 loops=1)
    Index Cond: ((((data_table.data ->> 
'tradeDate'::text))::numeric >= '1531267200'::numeric) AND 
(((data_table.data ->> 'tradeDate'::text))::numeric <= 
'1531353600'::numeric))


My guess is this is the root cause - the estimated number of rows is 
much higher than in practice (3377106 vs. 23040), so at the end the 
seqscan is considered to be slightly cheaper and wins. But the actual 
row count is ~150x lower, making the bitmap index scan way faster.


IMHO you'll need to find a way to improve the estimates, which may be 
difficult. The first thing I'd try is creating an expression index on 
the expression you use in the WHERE clause. Something like


CREATE INDEX ON data_table (((data_table.data ->> 
'tradeDate'::text))::numeric);


And then ANALYZE the table again ...
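
For example (a sketch assuming the expression index above, reusing the
predicate from your plan):

ANALYZE data_table;

-- re-check whether the estimated row count for the range predicate
-- now gets closer to the ~23040 rows actually matched:
EXPLAIN SELECT 1 FROM data_table
WHERE ((data ->> 'tradeDate'))::numeric >= 1531267200
  AND ((data ->> 'tradeDate'))::numeric <= 1531353600;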

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: no partition pruning when partitioning using array type

2018-07-25 Thread Alvaro Herrera
On 2018-Jul-12, Amit Langote wrote:

> On 2018/07/12 2:33, Alvaro Herrera wrote:

> > Yeah, any domain constraints added before won't be a problem.  Another
> > angle on this problem is to verify partition bounds against the domain
> > constraint being added; if they all pass, there's no reason to reject
> > the constraint.
> 
> That's an idea.  It's not clear to me how easy it is to get hold of all
> the partition bounds that contain domain values.  How do we get from the
> domain on which a constraint is being added to the relpartbound which
> would contain the partition bound value of the domain?

Well, we already get all table columns using a domain when the domain
gets a new constraint; I suppose it's just a matter of verifying for
each column whether it's part of the partition key, and then drill down
from there.  (I'm not really familiar with that part of the catalog.)
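
For concreteness, the hazardous sequence is something like this (names
are made up; just a sketch of the scenario, not a test from any patch):

CREATE DOMAIN d_posint AS int;
CREATE TABLE t (k d_posint) PARTITION BY LIST (k);
CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1);
-- if this succeeds unchecked, the bound of t_1 silently violates the
-- domain's new constraint:
ALTER DOMAIN d_posint ADD CONSTRAINT gt_one CHECK (VALUE > 1);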

> > Actually, here's another problem: why are we letting a column on a
> > domain become partition key, if you'll never be able to create a
> > partition?  It seems that for pg11 the right reaction is to check
> > the partition key and reject this case.
> 
> Yeah, that seems much safer, and given how things stand today, no users
> would complain if we do this.

Agreed.

> > I'm not sure of this implementation -- I couldn't find any case where we
> > reject the deletion on the function called from doDeletion (and we don't
> > have a preliminary phase on which to check for this, either).  Maybe we
> > need a pg_depend entry to prevent this, though it seems a little silly.
> > Maybe we should add a preliminary verification phase in
> > deleteObjectsInList(), where we invoke object-type-specific checks.
> 
> Doing pre-check based fix had crossed my mind, but I didn't try hard to
> pursue it.  If we decide to just prevent domain partition keys, we don't
> need to try too hard now to come up with a nice implementation for this,
> right?

Absolutely.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Using test_ddl_deparse as an extension

2018-07-25 Thread Alvaro Herrera
On 2018-Jul-25, Jeremy Finzel wrote:

> I am interested in using the exact functionality in test_ddl_deparse which
> provides the sub command types for alter table statements.  I would prefer
> to leverage this code which has already been vetted by the community, and
> also not just duplicate this code myself, or package it up myself.
> 
> It would be nice to just use the extension.  But I'm interested to hear
> what recommendations could be given for my use case (for that matter - what
> is the recommendation to anyone wanting to use the exact functionality
> provided in a postgres test module).
> 
> My specific intention is for sending DDL via logical replication, and
> ignoring, for example, enable/disable trigger statements when I am using
> selective replication.

The design for the intended functionality was to emit a JSON blob that
could be modified or examined programmatically; so you would be able to
obtain specific bits of knowledge (is this going into schema XYZ? etc).
There was a module that was about 90% complete before I abandoned it
because of changing priorities, needed just "a little push" and I never
got around to doing that.  Alexander Shulgin spent some time with it and
moved it forward a good bit, but I didn't get time to review his work,
so he understandably gave up.  I think with a little bit more effort we
could get it in core, which I'd love ...

That test code was more a placeholder to have *something* to ensure we
didn't completely break the feature, but I don't really see it as a good
way forward for the feature.  Let's not turn it into a production-level
feature.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: How can we submit code patches that implement our (pending) patents?

2018-07-25 Thread jonasmehler46
On 2018-Jul-07, David Fetter wrote:

> If they have no plans to exercise any exclusive rights, our standard
> process where individuals submit things and consent to have us label
> them with the PGDG copyright and distribute them under TPL would be
> the most straightforward approach to achieve it.

Eh, but in the event that the submitting organization does hold patents,
would it not be untrustworthy to distribute as PGDG copyright and
license with no accompanying patent grant?

Some other organization obtaining a restrictive fork of Postgres could
later be sued by the submitting organization, on the grounds that there
is no legitimate standing for them to utilize the patented code.

TBH I don't see how we would be able to dual-license the code in a way
that protects those restrictive forks.  Can you (Andres) clarify what
the idea is?




"WIP: Data at rest encryption" patch and, 2 phase commit.

2018-07-25 Thread Toshi Harada
Hi.

I applied the patch "WIP: Data at rest encryption" to PostgreSQL 11 - beta 2 
and I'm working on it.

When this patch is applied, the following problem occurs.

* An error occurs when CHECKPOINT is executed during two-phase commit.
* After an error occurs, if you stop PostgreSQL, it will never start again.

(1) First, execute PREPARE TRANSACTION.

postgres=# BEGIN;
BEGIN
postgres=# PREPARE TRANSACTION 'foo';
PREPARE TRANSACTION
postgres=#

(2) Execute the CHECKPOINT command from another terminal.
The CHECKPOINT command fails.

postgres=# CHECKPOINT;
ERROR:  checkpoint request failed
HINT:  Consult recent messages in the server log for details.
postgres=#


(3) ROLLBACK PREPARED command also fails.

postgres=# ROLLBACK PREPARED 'foo';
ERROR:  could not read two-phase state from WAL at 0/167EBA0
postgres=# 


(4) Shut down the PostgreSQL server.
During shutdown, a "could not read two-phase state from WAL" error occurs.


2018-07-23 14:49:08.924 JST [15821] LOG:  received fast shutdown request
2018-07-23 14:49:08.925 JST [15821] LOG:  aborting any active transactions
2018-07-23 14:49:08.925 JST [15831] FATAL:  terminating connection due to 
administrator command
2018-07-23 14:49:08.928 JST [15821] LOG:  background worker "logical 
replication launcher" (PID 15829) exited with exit code 1
2018-07-23 14:49:08.928 JST [15824] LOG:  shutting down
2018-07-23 14:49:08.935 JST [15824] FATAL:  could not read two-phase state from 
WAL at 0/167EBA0
2018-07-23 14:49:08.936 JST [15821] LOG:  checkpointer process (PID 15824) 
exited with exit code 1
2018-07-23 14:49:08.936 JST [15821] LOG:  terminating any other active server 
processes
2018-07-23 14:49:08.937 JST [15821] LOG:  abnormal database system shutdown
2018-07-23 14:49:08.945 JST [15821] LOG:  database system is shut down


(5) When restarting the PostgreSQL server, an error(could not read two-phase 
state from WAL) occurs 
and the PostgreSQL server can not be started.

2018-07-23 14:49:42.489 JST [15864] LOG:  listening on IPv6 address "::1", port 
5432
2018-07-23 14:49:42.489 JST [15864] LOG:  listening on IPv4 address 
"127.0.0.1", port 5432
2018-07-23 14:49:42.492 JST [15864] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.5432"
2018-07-23 14:49:42.521 JST [15866] LOG:  database system shutdown was 
interrupted; last known up at 2018-07-23 14:49:08 JST
2018-07-23 14:49:42.674 JST [15866] LOG:  database system was not properly shut 
down; automatic recovery in progress
2018-07-23 14:49:42.676 JST [15866] LOG:  redo starts at 0/167EB60
2018-07-23 14:49:42.676 JST [15866] LOG:  invalid record length at 0/167EC70: 
wanted 24, got 0
2018-07-23 14:49:42.676 JST [15866] LOG:  redo done at 0/167EC30
2018-07-23 14:49:42.677 JST [15866] FATAL:  could not read two-phase state from 
WAL at 0/167EBA0
2018-07-23 14:49:42.678 JST [15864] LOG:  startup process (PID 15866) exited 
with exit code 1
2018-07-23 14:49:42.678 JST [15864] LOG:  aborting startup due to startup 
process failure
2018-07-23 14:49:42.682 JST [15864] LOG:  database system is shut down

Regards.


Harada Toshi.
NTT TechnoCross Corporation

Antonin Houska  wrote:
> Ants Aasma  wrote:
> 
> > Attached to this mail is a work in progress patch that adds an
> > extensible encryption mechanism. There are some loose ends left to tie
> > up, but the general concept and architecture is at a point where it's
> > ready for some feedback, fresh ideas and bikeshedding.
> 
> Rebased patch is attached here, in case it helps to achieve (some of) the
> goals mentioned in the related thread [1].
> 
> Besides encrypting table and WAL pages, it encrypts the temporary files
> (buffile.c), data stored during logical decoding (reorderbuffer.c) and
> statistics temporary files (pgstat.c). Unlike the previous version, SLRU files
> (e.g. CLOG) are not encrypted (it does not seem critical and the encryption
> makes torn page write quite difficult to handle).
> 
> Another difference is that we use the OpenSSL of the (tweaked) AES XTS cipher
> now.
> 
> Binary upgrade from unencrypted to encrypted cluster is not implemented yet.
> 
> 
> [1] 
> https://www.postgresql.org/message-id/031401d3f41d$5c70ed90$1552c8b0$@lab.ntt.co.jp
> 
> -- 
> Antonin Houska
> Cybertec Schönig & Schönig GmbH
> Gröhrmühlgasse 26, A-2700 Wiener Neustadt
> Web: https://www.cybertec-postgresql.com
> 





Re: Early WIP/PoC for inlining CTEs

2018-07-25 Thread Andrew Gierth
> "Nico" == Nico Williams  writes:

 Nico> It is possible to add a keyword for this purpose in the WITH syntax:

 Nico> WITH   VIEW (...) AS a_view

The existing (and standard) syntax is WITH ctename AS (query).

Syntaxes that have been suggested for explicitly controlling the
materialization are along the lines of:

WITH ctename AS [[NOT] MATERIALIZED] (query)

(MATERIALIZED is already an un-reserved keyword)
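
Under that syntax the two behaviors would be spelled out explicitly,
along these lines (a sketch assuming the grammar is adopted as written):

-- force single evaluation / optimization fence:
WITH w AS MATERIALIZED (SELECT random() AS r)
SELECT a.r, b.r FROM w a, w b;

-- let the planner inline the CTE body:
WITH w AS NOT MATERIALIZED (SELECT random() AS r)
SELECT a.r, b.r FROM w a, w b;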

-- 
Andrew (irc:RhodiumToad)



Re: How can we submit code patches that implement our (pending) patents?

2018-07-25 Thread Andres Freund
On 2018-07-25 11:45:58 -0400, Chapman Flack wrote:
> On 07/25/2018 11:25 AM, Nico Williams wrote:
> 
> > I don't understand why it's not obvious that one can unknowingly and
> > accidentally re-invent someone else's idea.
> 
> It's perfectly obvious. It's the chief reason the whole topic
> of software patents has been deeply controversial for so long.
> 
> You seem to be using it as part of a proof by contradiction:
> 
> 
>   One can unknowingly and accidentally reinvent
>   a patented idea.
> 
>   If that were not tidily excused in practice,
>   software patents would be deeply problematic.
>   --
> 
>   Therefore, it must be the case that unknowing and
>   accidental reinvention is tidily excused in practice.

This seems like a discussion more suited to be over a beer, or some
other medium, not the PG lists.

Greetings,

Andres Freund



Re: How can we submit code patches that implement our (pending) patents?

2018-07-25 Thread Chapman Flack
On 07/25/2018 11:25 AM, Nico Williams wrote:

> I don't understand why it's not obvious that one can unknowingly and
> accidentally re-invent someone else's idea.

It's perfectly obvious. It's the chief reason the whole topic
of software patents has been deeply controversial for so long.

You seem to be using it as part of a proof by contradiction:


  One can unknowingly and accidentally reinvent
  a patented idea.

  If that were not tidily excused in practice,
  software patents would be deeply problematic.
  --

  Therefore, it must be the case that unknowing and
  accidental reinvention is tidily excused in practice.


I don't think it works.

-Chap



Re: LLVM jit and matview

2018-07-25 Thread Andres Freund
On 2018-07-25 14:59:20 +0200, Dmitry Dolgov wrote:
> > On Wed, 25 Jul 2018 at 07:49, Michael Paquier  wrote:
> >
> > On Tue, Jul 24, 2018 at 07:49:56PM -0700, Andres Freund wrote:
> > > FWIW, this doesn't crash for me, using a trunk checkout for LLVM. I'll
> > > try older ones, as soon as they finish rebuilding. But perhaps you could
> > > re-verify that this still is an issue on recent PG checkouts? And which
> > > version of LLVM are you guys using?
> >
> > One of your recent fixes has visibly taken take of the issue (I am too
> > lazy to check which one).  I am using llvm 6.0 with Debian SID as far as
> > I know, configure links to stuff in /usr/lib/llvm-6.0/.
> 
> Looks like I still can reproduce it on the current head 167075be3ab. I was 
> using
> LLVM 5.0 package, but will try reproduce on the LLVM master branch as soon
> as compilation will be finished

Oh, interesting. It only crashes when compiling LLVM without LLVM's
asserts enabled, even when using exactly the same LLVM checkout for both
builds. No idea what that's about, yet.


> (apparently, I need to rebuild it with LLVM_USE_PERF, otherwise
> Postgres complains about undefined symbol:
> LLVMCreatePerfJITEventListener).

Sorry, that was an intermittent state, fixed now (was waiting for a
review to come in). OTOH, having perf support is good anyway ;)

Greetings,

Andres Freund



Re: Missing pg_control crashes postmaster

2018-07-25 Thread David Steele

On 7/25/18 11:26 AM, Andres Freund wrote:
> On 2018-07-25 11:19:49 -0400, David Steele wrote:
>
> If one wanted to improve recoverability in scenarios like this, there'd
> be actually useful things like adding the option to extract control
> files, FPIs, clog contents from the WAL with pg_waldump.

I think you'd need to be a bit careful about what to trust in the WAL 
but presumably the last checkpoint LSN would be safe enough.

At the end of the day good backups are the best protection against an 
issue like this.


Regards,
--
-David
da...@pgmasters.net



Re: Missing pg_control crashes postmaster

2018-07-25 Thread Tom Lane
David Steele  writes:
> On 7/25/18 11:09 AM, Andres Freund wrote:
>> The problem is that that'll just hide the issue for a bit longer, while
>> continuing (due to the O_CREAT we'll not PANIC anymore).  Which can lead
>> to a lot of followup issues, like checkpoints removing old WAL that'd
>> have been useful for data recovery.

> So if a panic is the best thing to do, it might still be good to write 
> out a copy of pg_control to another file and let the user know that it's 
> there.  More information seems better than less to me.

I'm still dubious that this is fixing any real-world problem that is
more pressing than the problems it would create.  If you're asked to
resuscitate a dead cluster, do you trust pg_control.bak if you find
it?  Maybe it's horribly out of date (consider likelihood that someone
removed pg_control more than once, having got away with that the first
time).  If there's both that and pg_control, which do you trust?

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2018-07-25 Thread Nico Williams
On Wed, Jul 25, 2018 at 07:42:37AM +0200, David Fetter wrote:
> Please find attached a version rebased atop 167075be3ab1547e18 with
> what I believe are appropriate changes to regression test output.  The
> other changes to the regression tests output are somewhat puzzling, as
> they change the actual results of queries.  I've also attached both
> the "leftover" diff and the files to which it should be applied.

I think the SQL programmer needs some control over whether a CTE is:

 - a materialized view -- and therefore a barrier
 - a view (which can then be inlined by the optimizer)

It is possible to add a keyword for this purpose in the WITH syntax:

WITH   VIEW (...) AS a_view
 , MATERIALIZED VIEW (...) AS a_barrier
...;

This would be a lot like creating TEMP views, but without the catalog
overhead.

(I wonder how hard it would be to partition the OID namespace into
temp/persistent ranges so that temp schema elements need not be written
into the catalog.)

Nico
-- 



Re: Missing pg_control crashes postmaster

2018-07-25 Thread Andres Freund
Hi,

On 2018-07-25 11:19:49 -0400, David Steele wrote:
> On 7/25/18 11:09 AM, Andres Freund wrote:
> > On 2018-07-25 10:52:08 -0400, David Steele wrote:
> > The problem is that that'll just hide the issue for a bit longer, while
> > continuing (due to the O_CREAT we'll not PANIC anymore).  Which can lead
> > to a lot of followup issues, like checkpoints removing old WAL that'd
> > have been useful for data recovery.
> 
> So if a panic is the best thing to do, it might still be good to write out a
> copy of pg_control to another file and let the user know that it's there.
> More information seems better than less to me.

Sure, if that were the only concern. But code complexity and testability
also are. This means we need to maintain code for an edge case that
nobody has come up with a reason for. And at least during development,
we need to come up with test cases. Or we continually run the test case
as part of the regression tests for something without any sort of
practical relevance.

If one wanted to improve recoverability in scenarios like this, there'd
be actually useful things like adding the option to extract control
files, FPIs, clog contents from the WAL with pg_waldump.

Greetings,

Andres Freund



Re: How can we submit code patches that implement our (pending) patents?

2018-07-25 Thread Nico Williams
On Wed, Jul 25, 2018 at 02:48:01PM +0700, Benjamin Scherrey wrote:
> If you violate a patent, knowingly or otherwise, you are subject to
> penalties (perhaps not treble but still penalties) and will have to remove
> the offending code unless a deal is reached with the patent holder.

Unless you do a patent search every time you design something, you can't
avoid the risk of accidentally infringing.

> It is critical that Postgres require that all contributors do not enforce
> patents against Postgres - full stop. That's the IP agreement that should
> be in place.

My take is that the PG license should make royalty-free, non-exclusive,
transferrable, worldwide patent grants that are either irrevocable or
revocable only against those who sue the owner.  This together with a
contributor agreement would stop contributions by patent trolls, but it
still wouldn't prevent accidental re-inventions.

I don't understand why it's not obvious that one can unknowingly and
accidentally re-invent someone else's idea.



Re: Missing pg_control crashes postmaster

2018-07-25 Thread Tom Lane
David Steele  writes:
> On 7/25/18 10:37 AM, Andres Freund wrote:
>> What would we win here? Which scenario that's not contrived would be less 
>> bad due to the proposed change?  This seems complexity for its own sake.

> I favor the contrived scenario that helps preserve the current cluster 
> instead of a hypothetical newly init'd one.  I also don't think that 
> users deleting files out of a cluster is all that contrived.

What's not contrived about it?  Particularly the case of *only* deleting
pg_control and not any other critical file?  I don't recall having heard
any such stories in the last twenty years.

Also, even if we added the complexity needed to write data into say
pg_control.bak, what then?  You think that that would be any less
prone to indiscriminate rm'ing?  Where and how would we document this,
and what's the odds that someone dumb enough to remove pg_control would
ever have read that part of the documentation?

I'm with Andres: this is a solution in search of a problem.

regards, tom lane



Re: How can we submit code patches that implement our (pending) patents?

2018-07-25 Thread Nico Williams
On Wed, Jul 25, 2018 at 03:06:22AM -0400, Chapman Flack wrote:
> On 07/25/18 01:56, Nico Williams wrote:
> 
> > Wrong.  With patents the important thing is not to know about them when
> > you implement -- if you come up with the same idea by accident (which,
> > of course, is obviously entirely possible) then you are not subject to
> > trebble damages.
> 
> Even if the damages are not trebled, can 1✕ the damages be more than you
> would like to shell out? Not to mention the hassle of getting any infringing
> uses to cease?

You get a chance to stop infringing.  Also, how could you avoid
accidentally re-inventing patented algorithms if not by doing a costly
patent search every time you open $EDITOR??

> Also, is this distinction universally applied across jurisdictions?

It doesn't matter.  The point is that no one does a patent search every
time they design an algorithm -- not unless they mean to patent it.  So
you can't avoid accidentally re-inventing patented algorithms.



Re: Missing pg_control crashes postmaster

2018-07-25 Thread David Steele

On 7/25/18 11:09 AM, Andres Freund wrote:
> On 2018-07-25 10:52:08 -0400, David Steele wrote:
> > I favor the contrived scenario that helps preserve the current cluster
> > instead of a hypothetical newly init'd one.  I also don't think that users
> > deleting files out of a cluster is all that contrived.
>
> But trying to limp on in that case, and that being helpful, is.

OK, I can't argue with that.  It would be wrong to continue operating 
without knowing what the damage is.

> > Adding O_CREATE to open() doesn't seem too complex to me.  I'm not really in
> > favor of the renaming idea, but I'm not against it either if it gets me a
> > copy of the pg_control file.
>
> The problem is that that'll just hide the issue for a bit longer, while
> continuing (due to the O_CREAT we'll not PANIC anymore).  Which can lead
> to a lot of followup issues, like checkpoints removing old WAL that'd
> have been useful for data recovery.

So if a panic is the best thing to do, it might still be good to write 
out a copy of pg_control to another file and let the user know that it's 
there.  More information seems better than less to me.


Regards,
--
-David
da...@pgmasters.net



Re: Early WIP/PoC for inlining CTEs

2018-07-25 Thread Andrew Gierth
> "David" == David Fetter  writes:

 David> Please find attached a version rebased atop 167075be3ab1547e18
 David> with what I believe are appropriate changes to regression test
 David> output. The other changes to the regression tests output are
 David> somewhat puzzling, as they change the actual results of queries.

Both of those changes are the result of volatile CTEs being inlined more
than once (in one case, as part of an explicit test to ensure that CTEs
are being materialized and not multiply evaluated).

If you look for the XXX comment in the patch, it should be easy to add a
check that skips inlining if cterefcount > 1 and
contains_volatile_functions is true.

-- 
Andrew (irc:RhodiumToad)



Re: Missing pg_control crashes postmaster

2018-07-25 Thread Andres Freund
Hi,

On 2018-07-25 10:52:08 -0400, David Steele wrote:
> On 7/25/18 10:37 AM, Andres Freund wrote:
> > On July 25, 2018 7:18:30 AM PDT, David Steele  wrote:
> > > 
> > > It seems like an easy win if we can find a safe way to do it, though I
> > > admit that this is only a benefit in corner cases.
> > 
> > What would we win here? Which scenario that's not contrived would be less 
> > bad due to the proposed change?  This seems complexity for its own sake.
> 
> I think it's worth preserving pg_control even in the case where there is
> other damage to the cluster.  The alternative in this case (if no backup
> exists) is to run pg_resetwal which means data since the last checkpoint
> will not be written out causing even more data loss.  I have run clusters
> with checkpoint_timeout = 60m so data loss in this case is a real concern.

Wait, what? How is "data loss in this case is a real concern"?  Not
even a remotely realistic scenario has been described so far where this
matters.


> I favor the contrived scenario that helps preserve the current cluster
> instead of a hypothetical newly init'd one.  I also don't think that users
> deleting files out of a cluster is all that contrived.

But trying to limp on in that case, and that being helpful, is.


> Adding O_CREATE to open() doesn't seem too complex to me.  I'm not really in
> favor of the renaming idea, but I'm not against it either if it gets me a
> copy of the pg_control file.

The problem is that that'll just hide the issue for a bit longer, while
continuing (due to the O_CREAT we'll not PANIC anymore).  Which can lead
to a lot of followup issues, like checkpoints removing old WAL that'd
have been useful for data recovery.

Greetings,

Andres Freund



Re: How can we submit code patches that implement our (pending) patents?

2018-07-25 Thread David Fetter
On Tue, Jul 24, 2018 at 06:13:37AM +, Tsunakawa, Takayuki wrote:
> From: Bruce Momjian [mailto:br...@momjian.us]
> > On Tue, Jul 10, 2018 at 08:20:53AM +, Tsunakawa, Takayuki wrote:
> > > Yes, that's one unfortunate future, which I don't want to happen
> > > of course.  I believe PostgreSQL should accept patent for further
> > > evolution, because PostgreSQL is now a popular, influential software
> > > that many organizations want to join.
> > 
> > Why did you say this last sentence?
> 
> That's a simple story (but might still be a pipedream now.)
> PostgreSQL may become popular enough to be considered a public
> property like Linux, OpenStack and Hadoop.

I disagree with your characterization of this as a future possibility.
It's a current reality, and it got to be that way by a process I
describe below.

> Then more companies may want to join its development.  For example,
> Greenplum may want to contribute its clever planner code to better
> align with the latest version of PostgreSQL, IBM may want to give
> its Netezza-specific code to reduce maintenance burden, and
> AWS/Microsoft/Google may want to contribute some basic scalability
> and HA technology so that they can focus on more advanced features
> with less rebase burden.  I think the PostgreSQL community can be ready
> to open its door a bit more to embrace big companies with many
> patents.

What made PostgreSQL attractive to those companies in the first place
was a known lack of need to have Extensive Conversations with Legal™
about licensing and other financial/IP matters. This in turn was
guaranteed by our license, which is totally pervasive in its grants,
and by our refusal, as a policy, to include anything that might
compromise those pervasive grants. We have specifically excluded, and
in at least one case removed, patented things in order to maintain
this property.

Your proposal is aimed at a problem which doesn't exist, and it is
asking us to make a titanic shift in our project policy in exchange
for benefits which are at best ephemeral, even according to you.

Please let this go.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Missing pg_control crashes postmaster

2018-07-25 Thread David Steele

On 7/25/18 10:37 AM, Andres Freund wrote:
> On July 25, 2018 7:18:30 AM PDT, David Steele  wrote:
> > It seems like an easy win if we can find a safe way to do it, though I
> > admit that this is only a benefit in corner cases.
>
> What would we win here? Which scenario that's not contrived would be less bad 
> due to the proposed change?  This seems complexity for its own sake.

I think it's worth preserving pg_control even in the case where there is 
other damage to the cluster.  The alternative in this case (if no backup 
exists) is to run pg_resetwal which means data since the last checkpoint 
will not be written out causing even more data loss.  I have run 
clusters with checkpoint_timeout = 60m so data loss in this case is a 
real concern.

I favor the contrived scenario that helps preserve the current cluster 
instead of a hypothetical newly init'd one.  I also don't think that 
users deleting files out of a cluster is all that contrived.

Adding O_CREATE to open() doesn't seem too complex to me.  I'm not 
really in favor of the renaming idea, but I'm not against it either if 
it gets me a copy of the pg_control file.


Regards,
--
-David
da...@pgmasters.net



Re: Missing pg_control crashes postmaster

2018-07-25 Thread Andres Freund



On July 25, 2018 7:18:30 AM PDT, David Steele  wrote:
>On 7/23/18 7:00 PM, Tom Lane wrote:
>> Brian Faherty  writes:
> >
>>> There does not really seem to be a need for this behavior as all the
>>> information postgres needs is in memory at this point. I propose
>with
>>> a patch to just recreate pg_control on updates if it does not exist.
>> 
>> I would vote to reject any such patch; it's too likely to cause more
>> problems than it solves.  Generally, if critical files like that one
>> have disappeared, trying to write new data isn't going to be enough
>> to fix it and could well result in more corruption.
>> 
>> As an example, imagine that you do "rm -rf $PGDATA; initdb" without
>> remembering to shut down the old postmaster first.  Currently, the
>> old postmaster will panic/quit fairly promptly and no harm done.
>> The more aggressive it is at trying to "recover" from the situation,
>> the more likely it is to corrupt the new installation.
>
>It seems much more likely that a missing/modified postmaster.pid will 
>cause postgres to panic than it is for a missing pg_control to do so.
>
>Older versions of postgres don't panic until the next checkpoint and 
>newer versions won't panic at all on an idle system since we fixed 
>redundant checkpoints in 9.6 (6ef2eba3).  An idle postgres 11 cluster 
>seems happy enough to run without a pg_control file indefinitely (or at
>
>least 10 minutes, which is past the default checkpoint time).  As soon 
>as I write data or perform a checkpoint it does panic, of course.
>
>Conversely, removing/modifying postmaster.pid causes postgres to panic 
>very quickly on the versions I tested, 9.4 and 11.
>
>It seems to me that doing the postmaster.pid test at checkpoint time
>(if 
>we don't already) would be enough to protect pg_control against 
>unintentionally replaced clusters.
>
>Or perhaps writing to an alternate file as David J suggests would do
>the 
>trick.
>
>It seems like an easy win if we can find a safe way to do it, though I 
>admit that this is only a benefit in corner cases.

What would we win here? Which scenario that's not contrived would be less bad 
due to the proposed change?  This seems complexity for its own sake.

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Missing pg_control crashes postmaster

2018-07-25 Thread David Steele

On 7/23/18 7:00 PM, Tom Lane wrote:
> Brian Faherty  writes:
> > There does not really seem to be a need for this behavior as all the
> > information postgres needs is in memory at this point. I propose with
> > a patch to just recreate pg_control on updates if it does not exist.
>
> I would vote to reject any such patch; it's too likely to cause more
> problems than it solves.  Generally, if critical files like that one
> have disappeared, trying to write new data isn't going to be enough
> to fix it and could well result in more corruption.
>
> As an example, imagine that you do "rm -rf $PGDATA; initdb" without
> remembering to shut down the old postmaster first.  Currently, the
> old postmaster will panic/quit fairly promptly and no harm done.
> The more aggressive it is at trying to "recover" from the situation,
> the more likely it is to corrupt the new installation.

It seems much more likely that a missing/modified postmaster.pid will 
cause postgres to panic than it is for a missing pg_control to do so.

Older versions of postgres don't panic until the next checkpoint and 
newer versions won't panic at all on an idle system since we fixed 
redundant checkpoints in 9.6 (6ef2eba3).  An idle postgres 11 cluster 
seems happy enough to run without a pg_control file indefinitely (or at 
least 10 minutes, which is past the default checkpoint time).  As soon 
as I write data or perform a checkpoint it does panic, of course.

Conversely, removing/modifying postmaster.pid causes postgres to panic 
very quickly on the versions I tested, 9.4 and 11.

It seems to me that doing the postmaster.pid test at checkpoint time (if 
we don't already) would be enough to protect pg_control against 
unintentionally replaced clusters.

Or perhaps writing to an alternate file as David J suggests would do the 
trick.

It seems like an easy win if we can find a safe way to do it, though I 
admit that this is only a benefit in corner cases.


Regards,
--
-David
da...@pgmasters.net



Re: Explain buffers wrong counter with parallel plans

2018-07-25 Thread Jonathan S. Katz

> On Jul 7, 2018, at 12:03 AM, Amit Kapila  wrote:
> 
> On Sat, Jul 7, 2018 at 7:45 AM, Amit Kapila  > wrote:
>> On Sat, Jul 7, 2018 at 12:44 AM, Robert Haas  wrote:
>>> On Fri, Jul 6, 2018 at 9:44 AM, Amit Kapila  wrote:
 I have tried this idea, but it doesn't completely solve the problem.
 The problem is that nodes below LIMIT won't get a chance to accumulate
 the stats as they won't be able to call InstrStopNode.
>>> 
>>> I'm not sure I understand.  Why not?  I see that we'd need to insert
>>> an extra call to InstrStopNode() if we were stopping the node while it
>>> was running, because then InstrStartNode() would have already been
>>> done, but the corresponding call to InstrStopNode() would not have
>>> been done.  But I'm not sure how that would happen in this case.  Can
>>> you explain further?
>>> 
>> 
>> Okay, let me try.   The code flow is that for each node we will call
>> InstrStartNode()->ExecProcNodeReal()->InstrStopNode().  Now let's say
>> we have to execute a plan Limit->Gather-> Parallel SeqScan.  In this,
>> first for Limit node, we will call InstrStartNode() and
>> ExecProcNodeReal() and then for Gather we will call InstrStartNode(),
>> ExecProcNodeReal() and InstrStopNode().  Now, Limit node decides that
>> it needs to shutdown all the nodes (ExecShutdownNode) and after that
>> it will call InstrStopNode() for Limit node.  So, in this flow after
>> shutting down nodes, we never get chance for Gather node to use stats
>> collected during ExecShutdownNode.
>> 
> 
> I went ahead and tried the solution which I had mentioned yesterday,
> that is to allow ExecShutdownNode to count stats.  Apart from fixing
> this problem, it will also fix the problem with Gather Merge as
> reported by Adrien [1], because now Gather Merge will also get a
> chance to count stats after shutting down workers.
> 
> Note that, I have changed the location of InstrStartParallelQuery in
> ParallelQueryMain so that the buffer usage stats are accumulated only
> for the plan execution which is what we do for instrumentation
> information as well.  If we don't do that, it will count some
> additional stats for ExecutorStart which won't match with what we have
> in Instrumentation structure of each node.

I tried running the below on both 11beta2 and HEAD with the patch
applied:

CREATE UNLOGGED TABLE t1 (c1 int);
INSERT INTO t1 SELECT generate_series(1,100000000);
/** restart PostgreSQL */
EXPLAIN (analyze,buffers,timing off,costs off)
SELECT count(*) FROM t1;
/** repeat above two queries */

I have identical postgresql.conf files on both instances as well.

11beta2
==
buffers=# explain (analyze,buffers,timing off,costs off) select count(*)
from t1;
QUERY PLAN
--
 Finalize Aggregate (actual rows=1 loops=1)
   Buffers: shared read=63175
   ->  Gather (actual rows=7 loops=1)
 Workers Planned: 6
 Workers Launched: 6
 Buffers: shared read=63175
 ->  Partial Aggregate (actual rows=1 loops=7)
   Buffers: shared read=442478
   ->  Parallel Seq Scan on t1 (actual rows=14285714 loops=7)
 Buffers: shared read=442478
 Planning Time: 1.422 ms
 Execution Time: 3214.407 ms
(12 rows)

buffers=# explain (analyze,buffers,timing off,costs off) select count(*)
from t1;
QUERY PLAN
--
 Finalize Aggregate (actual rows=1 loops=1)
   Buffers: shared hit=27 read=64960
   ->  Gather (actual rows=7 loops=1)
 Workers Planned: 6
 Workers Launched: 6
 Buffers: shared hit=27 read=64960
 ->  Partial Aggregate (actual rows=1 loops=7)
   Buffers: shared hit=224 read=442254
   ->  Parallel Seq Scan on t1 (actual rows=14285714 loops=7)
 Buffers: shared hit=224 read=442254
 Planning Time: 0.080 ms
 Execution Time: 3774.091 ms
(12 rows)

HEAD
=
buffers=# explain (analyze,buffers,timing off,costs off) select count(*)
from t1;
QUERY PLAN
--
 Finalize Aggregate (actual rows=1 loops=1)
   Buffers: shared read=442478
   ->  Gather (actual rows=7 loops=1)
 Workers Planned: 6
 Workers Launched: 6
 Buffers: shared read=442478
 ->  Partial Aggregate (actual rows=1 loops=7)
   Buffers: shared read=442478
   ->  Parallel Seq Scan on t1 (actual rows=14285714 loops=7)
 Buffers: shared read=442478
 Planning Time: 1.594 ms
 Execution Time: 


Optimizer misses big in 10.4 with BRIN index

2018-07-25 Thread Arcadiy Ivanov

Hi,

Before spamming the list with reproduction examples I want to make sure 
the issue isn't already known.


Moving to 10.4 from 9.2 (AWS RDS, but it reproduces on a local laptop as
well), we've discovered that the optimizer prefers a seq scan to a fully
analyzed, consistent BRIN index, ending up with a query that takes 4.8s with
the seqscan vs 56ms when forcing use of the BRIN index (an 85x difference).
The dataset is millions of rows, and with extremely high probability the
rows are naturally clustered on the BRIN index column.


Anybody observed anything like that?
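
For anyone who wants a self-contained starting point before digging into the
plans below, a minimal BRIN-vs-seqscan setup along these lines can be used
(an illustration, not the reporter's actual schema):

--
-- Build a table whose rows are naturally clustered on ts, index it with
-- BRIN, and compare the plans with and without enable_seqscan.
CREATE TABLE brin_demo AS
    SELECT g AS id, timestamp '2018-01-01' + g * interval '1 second' AS ts
    FROM generate_series(1, 5000000) g;
CREATE INDEX brin_demo_ts_idx ON brin_demo USING brin (ts);
ANALYZE brin_demo;

EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM brin_demo
    WHERE ts >= timestamp '2018-01-02' AND ts < timestamp '2018-01-03';

SET enable_seqscan = false;   -- then re-run the EXPLAIN and compare
--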


schema0=# SET enable_seqscan=false;
SET
schema0=# EXPLAIN (analyze, verbose, costs, buffers)
SELECT data, count(*) OVER() AS full_count
FROM schema0_lab.data_table
WHERE segment = 'pb1'
    AND (data->>'tradeDate')::numeric >= '1531267200'
    AND (data->>'tradeDate')::numeric <= '1531353600'
    AND data->>'tradeStatus' = 'Replaced'
ORDER BY (data->>'tradeDate')::numeric DESC;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=617851.03..617851.24 rows=84 width=1219) (actual time=55.765..55.794 rows=611 loops=1)
   Output: data, (count(*) OVER (?)), (((data ->> 'tradeDate'::text))::numeric)
   Sort Key: (((data_table.data ->> 'tradeDate'::text))::numeric) DESC
   Sort Method: quicksort  Memory: 1256kB
   Buffers: shared hit=824
   ->  WindowAgg  (cost=1231.98..617848.34 rows=84 width=1219) (actual time=52.688..55.068 rows=611 loops=1)
         Output: data, count(*) OVER (?), ((data ->> 'tradeDate'::text))::numeric
         Buffers: shared hit=824
         ->  Gather  (cost=1231.98..617846.66 rows=84 width=1179) (actual time=8.247..51.804 rows=611 loops=1)
               Output: data
               Workers Planned: 2
               Workers Launched: 2
               Buffers: shared hit=824
               ->  Parallel Bitmap Heap Scan on schema0_lab.data_table  (cost=231.98..616838.26 rows=35 width=1179) (actual time=3.850..46.704 rows=204 loops=3)
                     Output: data
                     Recheck Cond: ((((data_table.data ->> 'tradeDate'::text))::numeric >= '1531267200'::numeric) AND (((data_table.data ->> 'tradeDate'::text))::numeric <= '1531353600'::numeric))
                     Rows Removed by Index Recheck: 4404
                     Filter: (((data_table.segment)::text = 'pb1'::text) AND ((data_table.data ->> 'tradeStatus'::text) = 'Replaced'::text))
                     Heap Blocks: lossy=794
                     Buffers: shared hit=2334
                     Worker 0: actual time=3.572..44.145 rows=236 loops=1
                       Buffers: shared hit=749
                     Worker 1: actual time=0.326..45.184 rows=212 loops=1
                       Buffers: shared hit=761
                     ->  Bitmap Index Scan on tradedate_idx  (cost=0.00..231.96 rows=3377106 width=0) (actual time=4.500..4.500 rows=23040 loops=1)
                           Index Cond: ((((data_table.data ->> 'tradeDate'::text))::numeric >= '1531267200'::numeric) AND (((data_table.data ->> 'tradeDate'::text))::numeric <= '1531353600'::numeric))
                           Buffers: shared hit=30
 Planning time: 0.246 ms
 Execution time: 56.209 ms
(29 rows)

schema0=# SET enable_seqscan=true;
SET
schema0=# EXPLAIN (analyze, verbose, costs, buffers)
SELECT data, count(*) OVER() AS full_count
FROM schema0_lab.data_table
WHERE segment = 'pb1'
    AND (data->>'tradeDate')::numeric >= '1531267200'
    AND (data->>'tradeDate')::numeric <= '1531353600'
    AND data->>'tradeStatus' = 'Replaced'
ORDER BY (data->>'tradeDate')::numeric DESC;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Sort  (cost=617619.05..617619.26 rows=84 width=1219) (actual time=4823.081..4823.106 rows=611 loops=1)
   Output: data, (count(*) OVER (?)), (((data ->> 'tradeDate'::text))::numeric)
   Sort Key: (((data_table.data ->> 'tradeDate'::text))::numeric) DESC
   Sort Method: quicksort  Memory: 1256kB
   Buffers: shared hit=839 read=187353
   ->  WindowAgg  (cost=1000.00..617616.36 rows=84 width=1219) (actual time=4820.005..4822.390 rows=611 loops=1)
         Output: data, count(*) OVER (?), ((data ->> 'tradeDate'::text))::numeric
         Buffers: shared hit=839 read=187353
         ->  Gather  (cost=1000.00..617614.68 rows=84 width=1179) (actual time=3.262..4819.362 rows=611 loops=1)

Using test_ddl_deparse as an extension

2018-07-25 Thread Jeremy Finzel
I am interested in using the exact functionality in test_ddl_deparse which
provides the subcommand types for ALTER TABLE statements.  I would prefer to
leverage this code, which has already been vetted by the community, rather
than duplicate it or package it up myself.

It would be nice to just use the extension.  But I'm interested to hear what
recommendations could be given for my use case (for that matter, what is the
recommendation for anyone wanting to use the exact functionality provided in
a Postgres test module?).

My specific intention is for sending DDL via logical replication, and
ignoring, for example, enable/disable trigger statements when I am using
selective replication.
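
For illustration, the kind of event trigger this use case implies might look
like the sketch below, assuming the module's functions were available as an
extension.  The exact subcommand labels returned by
get_altertable_subcmdtypes() are an assumption here; check the module source.

--
CREATE EXTENSION test_ddl_deparse;

CREATE FUNCTION forward_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    cmd record;
BEGIN
    FOR cmd IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
        IF cmd.command_tag = 'ALTER TABLE' THEN
            -- skip enable/disable trigger statements (labels assumed)
            CONTINUE WHEN get_altertable_subcmdtypes(cmd.command)
                          && ARRAY['ENABLE TRIGGER', 'DISABLE TRIGGER'];
        END IF;
        -- queue cmd.command_tag / cmd.object_identity for replication here
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER forward_ddl_trg ON ddl_command_end
    EXECUTE PROCEDURE forward_ddl();
--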

Many thanks!
Jeremy


Re: LLVM jit and matview

2018-07-25 Thread Dmitry Dolgov
> On Wed, 25 Jul 2018 at 07:49, Michael Paquier  wrote:
>
> On Tue, Jul 24, 2018 at 07:49:56PM -0700, Andres Freund wrote:
> > FWIW, this doesn't crash for me, using a trunk checkout for LLVM. I'll
> > try older ones, as soon as they finish rebuilding. But perhaps you could
> > re-verify that this still is an issue on recent PG checkouts? And which
> > version of LLVM are you guys using?
>
> One of your recent fixes has visibly taken care of the issue (I am too
> lazy to check which one).  I am using LLVM 6.0 with Debian SID as far as
> I know; configure links to stuff in /usr/lib/llvm-6.0/.

Looks like I can still reproduce it on the current head, 167075be3ab. I was
using the LLVM 5.0 package, but I will try to reproduce it on the LLVM master
branch as soon as compilation finishes (apparently, I need to rebuild it with
LLVM_USE_PERF, otherwise Postgres complains about undefined symbol:
LLVMCreatePerfJITEventListener).
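
For anyone trying to reproduce, a minimal way to force JIT while exercising a
materialized view is something like this sketch (requires a build with
--with-llvm; names are illustrative):

--
SET jit = on;
SET jit_above_cost = 0;          -- force JIT even for cheap plans
CREATE TABLE base AS SELECT g FROM generate_series(1, 100000) g;
CREATE MATERIALIZED VIEW mv AS SELECT g, g % 10 AS bucket FROM base;
REFRESH MATERIALIZED VIEW mv;    -- exercises the matview path with JIT active
--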



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-25 Thread Alexander Korotkov
On Wed, Jul 25, 2018 at 5:54 AM Imai, Yoshikazu
 wrote:
> On Mon, July 9, 2018 at 3:19 AM, Imai, Yoshikazu wrote:
> > I'm planning to do code review and send it in the next mail.
>
> Sorry for delaying the code review.
>
> I did the code review, and I think there are no logical wrongs
> with B-Tree.
>
> I tested the integrity of the B-Tree with amcheck just in case.
> I executed this test on a 16-core machine with 64GB of memory.
>
> I ran B-Tree insertion and deletion simultaneously and executed
> 'bt_index_parent_check' periodically.
> I also ran B-Tree insertion and update simultaneously and executed
> 'bt_index_parent_check' periodically.
>
> ##  DDL
> create table mytab (val1 int, val2 int);
> create index mytabidx on mytab (val1, val2);
>
> ## script_integrity_check_insert.sql
> \set i random(1, 100)
> \set j random(1, 100)
> insert into mytab values (:i, :j);
>
> ## script_integrity_check_delete.sql
> \set i random(1, 100)
> \set j random(1, 100)
> delete from mytab where val1 = :i and val2 = :j
>
> ## script_integrity_check_update.sql
> \set i random(1, 100)
> \set j random(1, 100)
> \set k random(1, 100)
> \set l random(1, 100)
> update mytab set val1 = :k, val2 = :l where val1 = :i and val2 = :j
>
> ## commands(insert, delete, simultaneously executed)
> pgbench -T 60 -P 1 -M prepared -f script_integrity_check_insert.sql -c 2 
> postgres
> pgbench -T 60 -P 1 -M prepared -f script_integrity_check_delete.sql -c 2 
> postgres
>
> ## commands(insert, update, simultaneously executed)
> pgbench -T 60 -P 1 -M prepared -f script_integrity_check_insert.sql -c 2 
> postgres
> pgbench -T 60 -P 1 -M prepared -f script_integrity_check_update.sql -c 2 
> postgres
>
> ## commands(executed during above insert, delete, update)
> (psql) select bt_index_parent_check('mytabidx');
>
>
> Finally, I confirmed there are no integrity problems during B-Tree operation.
>
> The remaining task in my review is running the regression tests.

Cool, thank you for review!
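
For anyone repeating the integrity test above, note that
bt_index_parent_check() is provided by the amcheck extension, e.g.:

--
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_parent_check('mytabidx');
--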

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-25 Thread Alexander Korotkov
Hi!

On Wed, Jul 25, 2018 at 1:19 PM Simon Riggs  wrote:
> On 13 July 2018 at 03:14, Imai, Yoshikazu  
> wrote:
> > From the attached graph ("some_contention_points_on_leaf_nodes.png"), we
> > can see that as the contention points disperse, TPS increases and the TPS
> > difference between master and the patched version becomes smaller.
>
> So I think this clearly shows the drop in throughput when we have
> index contention and that this patch improves on that situation.
>
> In cases where we don't have contention, the patch doesn't cause a
> negative effect.
>
> So +1 from me!

Thank you, but Imai found another case [1] where the patch causes a small
regression.  For now, it's not completely clear to me what test case was
used, because I forgot to add any indexes to the initial specification of
this test case [2].  If the regression is confirmed, then it would be nice
to mitigate it.  And even if we can't mitigate it and accept it as a
trade-off, then it would be nice to at least explain it...

1. 
https://www.postgresql.org/message-id/0F97FA9ABBDBE54F91744A9B37151A51189451%40g01jpexmbkw24
2. 
https://www.postgresql.org/message-id/CAPpHfds%3DWmjv%2Bu7S0peHN2zRrw4C%3DYySn-ZKddp4E7q8KQ18hQ%40mail.gmail.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-25 Thread Alexander Korotkov
Hi!

On Tue, Jul 10, 2018 at 4:39 PM 今井 良一  wrote:
> On 2018/07/10 20:36, Alexander Korotkov wrote:
> > Thank you for the experiments!  It seems that there is real regression
> > here...  BTW, which script were you using in this benchmark:
> > script_unordered.sql or script_duplicated.sql?
>
> Sorry, I forgot to write that. I used script_unordered.sql.

I've reread the thread, and I found that in my initial letter [1] I
forgot to include the index definition:
CREATE INDEX unordered_i_idx ON unordered (i);

So, when you did the experiments [2], did you define any index on the
unordered table?  If not, then it's curious why there is any
performance difference at all.

1. 
https://www.postgresql.org/message-id/CAPpHfduAMDFMNYTCN7VMBsFg_hsf0GqiqXnt%2BbSeaJworwFoig%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/0F97FA9ABBDBE54F91744A9B37151A51189451%40g01jpexmbkw24

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: "WIP: Data at rest encryption" patch and, 2 phase commit.

2018-07-25 Thread Antonin Houska
Toshi Harada  wrote:

> Hi.
> 
> I applied the patch "WIP: Data at rest encryption" to PostgreSQL 11 - beta 2 
> and I'm working on it.
> 
> When this patch is applied, the following problem occurs.
> 
> * An error occurs when CHECKPOINT is executed during two-phase commit.
> * After an error occurs, if you stop PostgreSQL, it will never start again.
> 
> (1) First, execute PREPARE TRANSACTION.
> 
> postgres=# BEGIN;
> BEGIN
> postgres=# PREPARE TRANSACTION 'foo';
> PREPARE TRANSACTION
> postgres=#
> 
> (2) Execute the CHECKPOINT command from another terminal.
> The CHECKPOINT command fails.
> 
> postgres=# CHECKPOINT;
> ERROR:  checkpoint request failed
> HINT:  Consult recent messages in the server log for details.
> postgres=#
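
As a side note for anyone reproducing this, the dangling prepared transaction
from step (1) can be inspected and cleaned up with the standard commands
(unrelated to the patch itself):

--
SELECT gid, prepared, owner FROM pg_prepared_xacts;
ROLLBACK PREPARED 'foo';
--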

The patch version I posted in

https://www.postgresql.org/message-id/11678.1532519255%40localhost

fixes an issue (an uninitialized pointer) that caused the failure here, but
it was a SEGFAULT rather than an ERROR.  And the scope of the bug was broader
than just CHECKPOINT.

Can you please test it again with the new version of the patch?

Anyway, thanks for your reports!


-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Fwd: Re[2]: Alter index rename concurrently to

2018-07-25 Thread Andrey Klychkov
Monday, 23 July 2018, 18:06 +03:00 from Peter Eisentraut <
peter.eisentr...@2ndquadrant.com >:
>
>You appear to be saying that you think that renaming an index
>concurrently is not safe
No, I didn't say that about renaming indexes.
I tried to say that it does not make sense to rename a table (or view, or
function) concurrently (and, respectively, to implement the lower lock level
for those operations), because probably no one will do this in production:
the names of those objects are used in applications' configuration.  I don't
remember any case where I needed to do this for tables, but for indexes it
comes up very often.
Renaming indexes is safe for client applications because index names are not
used in the text of SQL queries.  I think it's safe for the relations too
because, as we know, the database server works with indexes by using
identifiers, not names.
Moreover, I tested it.
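
A quick way to see this, using a prepared statement (an illustration):

--
-- Cached plans keep working across an index rename, since plans reference
-- the index by OID rather than by name.
CREATE TABLE t (id int PRIMARY KEY);
INSERT INTO t VALUES (1);
PREPARE q AS SELECT * FROM t WHERE id = $1;
EXECUTE q(1);                                 -- may use the index t_pkey
ALTER INDEX t_pkey RENAME TO t_pkey_renamed;
EXECUTE q(1);                                 -- still works
--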

Ok, Peter, thanks for your answer!

-- 
Kind regards,
Andrey Klychkov



Re: Global snapshots

2018-07-25 Thread Arseny Sher
Hello,

I have looked through the patches and found them pretty accurate.  I've
fixed a lot of small issues here and there; the updated patchset is
attached.  But first, some high-level notes:

 * I agree that it would be cool to implement functionality like current
   "snapshot too old": that is, abort transaction with old global
   snapshot only if it really attempted to touch modified data.

 * I also agree with Stas that any attempts to trade oldestxmin in
   gossip between the nodes would drastically complicate this patch and
   make it discussion-prone; it would be nice first to get some feedback
   on general approach, especially from people trying to distribute
   Postgres.

 * One drawback of these patches is that only REPEATABLE READ is
   supported. For READ COMMITTED, we must export every new snapshot
   generated on the coordinator to all nodes, which is fairly easy to
   do. SERIALIZABLE will definitely require chatter between nodes,
   but that's a much less demanded isolation level (e.g. we still
   don't support it on replicas).

 * Another somewhat serious issue is that there is a risk of recency
   guarantee violation. If client starts transaction at node with
   lagging clocks, its snapshot might not include some recently
   committed transactions; if client works with different nodes, she
   might not even see her own changes. CockroachDB describes at [1] how
   they and Google Spanner overcome this problem. In short, both set
   hard limit on the maximum allowed clock skew.  Spanner uses atomic
   clocks, so this skew is small, and they just wait it out at the end
   of each transaction before acknowledging the client. In CockroachDB, if
   tuple is not visible but we are unsure whether it is truly invisible
   or it's just the skew (the difference between snapshot and tuple's
   csn is less than the skew), transaction is restarted with advanced
   snapshot. This process is not infinite because the upper border
   (initial snapshot + max skew) stays the same; this is correct as we
   just want to ensure that our xact sees all the committed ones before
   it started. We can implement the same thing.

Now, the description of my mostly cosmetical changes:

 * Don't ERROR in BroadcastStmt to allow us to handle failure manually;
 * Check global_snapshot_defer_time in ImportGlobalSnapshot instead of
   falling on assert;
 * (Arguably) improved comments around locking at circular buffer
   maintenance; also, don't lock procarray during global_snapshot_xmin
   bump.
 * s/snaphot/snapshot, other typos.
 * Don't track_global_snapshots by default -- while handy for testing, it
   doesn't look generally good.
 * Set track_global_snapshots = true in tests everywhere.
 * GUC renamed from postgres_fdw.use_tsdtm to
   postgres_fdw.use_global_snapshots for consistency.
 * 003_bank_shared.pl test is removed. In current shape (loading one
   node) it is useless, and if we bombard both nodes, deadlock surely
   appears. In general, global snapshots are not needed for such
   multimaster-like setup -- either there are no conflicts and we are
   fine, or there is a conflict, in which case we get a deadlock.
 * Fix initdb failure with non-zero global_snapshot_defer_time.
 * Enforce REPEATABLE READ since currently we export snap only once in
   xact.
 * Remove assertion that circular buffer entries are monotonic, as
   GetOldestXmin *can* go backwards.


[1] https://www.cockroachlabs.com/blog/living-without-atomic-clocks/

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From 21687e75366df03b92e48c6125bb2e90d01bb70a Mon Sep 17 00:00:00 2001
From: Stas Kelvich 
Date: Wed, 25 Apr 2018 16:05:46 +0300
Subject: [PATCH 1/3] GlobalCSNLog SLRU

---
 src/backend/access/transam/Makefile |   3 +-
 src/backend/access/transam/global_csn_log.c | 439 
 src/backend/access/transam/twophase.c   |   1 +
 src/backend/access/transam/varsup.c |   2 +
 src/backend/access/transam/xlog.c   |  12 +
 src/backend/storage/ipc/ipci.c  |   3 +
 src/backend/storage/ipc/procarray.c |   3 +
 src/backend/storage/lmgr/lwlocknames.txt|   1 +
 src/backend/utils/misc/guc.c|   9 +
 src/backend/utils/probes.d  |   2 +
 src/bin/initdb/initdb.c |   3 +-
 src/include/access/global_csn_log.h |  30 ++
 src/include/storage/lwlock.h|   1 +
 src/include/utils/snapshot.h|   3 +
 14 files changed, 510 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/access/transam/global_csn_log.c
 create mode 100644 src/include/access/global_csn_log.h

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 16fbe47269..03aa360ea3 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/access/transam
 top_builddir = ../../../..
 include 

Re: Online enabling of checksums

2018-07-25 Thread Sergei Kornilov
Hello
Thank you for the update! I have only done a quick test so far: the patch
applies and builds cleanly. But I have a reproducible error during check-world:

> t/001_standby_checksum.pl .. 6/10 
> #   Failed test 'ensure checksums are enabled on standby'
> #   at t/001_standby_checksum.pl line 84.
> #  got: 'inprogress'
> # expected: 'on'

In the standby log I found this error:
> 2018-07-25 13:13:05.463 MSK [16544] FATAL:  could not receive data from WAL 
> stream: ERROR:  requested WAL segment 00010003 has already 
> been removed

Checksumhelper obviously writes a lot of WAL. The test passes if I change the
restart order to restart the standby first:
> $node_standby_1->restart();
> $node_master->restart();
Or we need a replication slot setup.
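
A sketch of the replication slot alternative (the TAP test would drive this
through PostgresNode; shown here as plain SQL and config for clarity):

--
-- On the master:
SELECT pg_create_physical_replication_slot('standby_1');
-- In the standby's recovery.conf:
--   primary_slot_name = 'standby_1'
-- The slot retains WAL until the standby has streamed it, avoiding the
-- "requested WAL segment ... has already been removed" failure.
--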

Also, we get this log record after start:
> data checksums in pending state, starting background worker to enable
even in recovery state, although the background worker actually starts only
at recovery end (or promote). I think it would be better to use a DEBUG
ereport in the postmaster and LOG in the checksumhelper.

regards, Sergei

25.07.2018, 12:35, "Daniel Gustafsson" :
>>  On 24 Jul 2018, at 11:05, Sergei Kornilov  wrote:
>>
>>  The following review has been posted through the commitfest application:
>>  make installcheck-world: tested, failed
>>  Implements feature: not tested
>>  Spec compliant: not tested
>>  Documentation: tested, failed
>>
>>  Hello
>>  As i wrote few weeks ago i can not build documentation due errors:
>>>  postgres.sgml:19625: element xref: validity error : IDREF attribute 
>>> linkend references an unknown ID "runtime-checksumhelper-cost-delay"
>>>  postgres.sgml:19626: element xref: validity error : IDREF attribute 
>>> linkend references an unknown ID "runtime-checksumhelper-cost-limit"
>>
>>  After remove such xref for test purposes patch pass check-world.
>
> Hi!,
>
> Thanks for reviewing, I’ve updated the patch with the above mentioned 
> incorrect
> linkends as well as fixed the comments you made in a previous review.
>
> The CF-builder-bot is red, but it’s because it’s trying to apply the already
> committed patch which is in the attached datallowconn thread.
>
> cheers ./daniel



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-25 Thread Simon Riggs
On 13 July 2018 at 03:14, Imai, Yoshikazu  wrote:
> On Mon, July 9, 2018 at 5:25 PM, Simon Riggs wrote:
>> Please can you check insertion with the index on 2 keys
>> 1st key has 10,000 values
>> 2nd key has monotonically increasing value from last 1st key value
>>
>> So each session picks one 1st key value
>> Then each new INSERTion is a higher value of 2nd key
>> so 1,1, then 1,2 then 1,3 etc
>>
>> Probably easier to do this with a table like this
>>
>> CREATE UNLOGGED TABLE ordered2 (id integer, logdate timestamp default
>> now(), value text not null, primary key (id, logdate));
>>
>> # script_ordered2.sql
>> \set i random(1, 10000)
>> INSERT INTO ordered2 (id, value) VALUES (:i, 'abcdefghijklmnoprsqtuvwxyz');
>>
>> Thanks
> I tried to do this, but I might have mistaken your intention, so please
> point it out if I am wrong.
>
> While script_ordered.sql supposes that there is one contention point on the
> rightmost leaf node, script_ordered2.sql supposes that there are several
> contention points on some leaf nodes; is that right?

Yes, that is right, thank you for testing.

> I experimented with key1 having 10,000 values, but there is no difference in
> the results compared to the unordered.sql case

That is an important result, thanks

> so I experimented with key1 having 1, 2, 3, 5, 10, and 100 values.

OK, good

> From the attached graph ("some_contention_points_on_leaf_nodes.png"), we
> can see that as the contention points disperse, TPS increases and the TPS
> difference between master and the patched version becomes smaller.

So I think this clearly shows the drop in throughput when we have
index contention and that this patch improves on that situation.

In cases where we don't have contention, the patch doesn't cause a
negative effect.

So +1 from me!

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-07-25 Thread Daniel Gustafsson
> On 24 Jul 2018, at 11:05, Sergei Kornilov  wrote:
> 
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:tested, failed
> 
> Hello
> As i wrote few weeks ago i can not build documentation due errors:
>> postgres.sgml:19625: element xref: validity error : IDREF attribute linkend 
>> references an unknown ID "runtime-checksumhelper-cost-delay"
>> postgres.sgml:19626: element xref: validity error : IDREF attribute linkend 
>> references an unknown ID "runtime-checksumhelper-cost-limit"
> 
> After remove such xref for test purposes patch pass check-world.

Hi!,

Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
linkends as well as fixed the comments you made in a previous review.

The CF-builder-bot is red, but it’s because it’s trying to apply the already
committed patch which is in the attached datallowconn thread.

cheers ./daniel



online_checksums13.patch
Description: Binary data


Re: ToDo: show size of partitioned table

2018-07-25 Thread Pavel Stehule
2018-07-25 11:09 GMT+02:00 Amit Langote :

> Hi Pavel.
>
> On 2018/07/23 20:46, Pavel Stehule wrote:
> > Hi
> >
> > I am sending a prototype of the patch. For now, it calculates the size of
> > partitioned tables with a recursive query. When a simpler method becomes
> > possible, the size calculation will be changed.
> >
> > postgres=# \dt+
> >                        List of relations
> > +--------+------------+-------+-------+---------+-------------+
> > | Schema |    Name    | Type  | Owner |  Size   | Description |
> > +--------+------------+-------+-------+---------+-------------+
> > | public | data       | table | pavel | 0 bytes |             |
> > | public | data_2016  | table | pavel | 15 MB   |             |
> > | public | data_2017  | table | pavel | 15 MB   |             |
> > | public | data_other | table | pavel | 11 MB   |             |
> > +--------+------------+-------+-------+---------+-------------+
> > (4 rows)
> >
> > postgres=# \dP+
> >       List of partitioned tables
> > +--------+------+-------+-------+-------------+
> > | Schema | Name | Owner | Size  | Description |
> > +--------+------+-------+-------+-------------+
> > | public | data | pavel | 42 MB |             |
> > +--------+------+-------+-------+-------------+
> > (1 row)
>
> This looks nice, although I haven't looked at the patch yet.  Also, as you
> said, we could later replace the method of directly querying pg_inherits
> by something else.
>
> > p.s. Another patch could replace the relation type "table" with
> > "partitioned table"
> >
> > postgres=# \dt+
> >                             List of relations
> > +--------+------------+-------------------+-------+---------+-------------+
> > | Schema |    Name    |       Type        | Owner |  Size   | Description |
> > +--------+------------+-------------------+-------+---------+-------------+
> > | public | data       | partitioned table | pavel | 0 bytes |             |
> > | public | data_2016  | table             | pavel | 15 MB   |             |
> > | public | data_2017  | table             | pavel | 15 MB   |             |
> > | public | data_other | table             | pavel | 11 MB   |             |
> > +--------+------------+-------------------+-------+---------+-------------+
> > (4 rows)
> >
> > A patch for this case is simple:
> >
> > diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
> > index c3bdf8555d..491e58eb29 100644
> > --- a/src/bin/psql/describe.c
> > +++ b/src/bin/psql/describe.c
> > @@ -3490,8 +3490,8 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
> >   gettext_noop("sequence"),
> >   gettext_noop("special"),
> >   gettext_noop("foreign table"),
> > - gettext_noop("table"),/* partitioned table */
> > - gettext_noop("index"),/* partitioned index */
> > + gettext_noop("partitioned table"),/* partitioned table */
> > + gettext_noop("partitioned index"),/* partitioned index */
> >   gettext_noop("Type"),
> >   gettext_noop("Owner"));
>
> Inclined to +1 this, too.  Although, one might ask why partitioned tables
> are called so only in psql's output, but not in the backend's error
> messages, for example, as was discussed in the past.
>

I think the error messages are a different chapter: changing them would mean
revising all of the error messages, and in some places the name "partition",
rather than "partitioned table", may be correct.

Personally, I am not sure about the benefit of changing the error messages.
A partition is a table too, so the current messages are not strongly wrong.

Regards

Pavel



> Thanks,
> Amit
>
>


Re: ToDo: show size of partitioned table

2018-07-25 Thread Amit Langote
Hi Pavel.

On 2018/07/23 20:46, Pavel Stehule wrote:
> Hi
> 
> I am sending a prototype of the patch. For now, it calculates the size of
> partitioned tables with a recursive query. When a simpler method becomes
> possible, the size calculation will be changed.
> 
> postgres=# \dt+
>                        List of relations
> +--------+------------+-------+-------+---------+-------------+
> | Schema |    Name    | Type  | Owner |  Size   | Description |
> +--------+------------+-------+-------+---------+-------------+
> | public | data       | table | pavel | 0 bytes |             |
> | public | data_2016  | table | pavel | 15 MB   |             |
> | public | data_2017  | table | pavel | 15 MB   |             |
> | public | data_other | table | pavel | 11 MB   |             |
> +--------+------------+-------+-------+---------+-------------+
> (4 rows)
> 
> postgres=# \dP+
>       List of partitioned tables
> +--------+------+-------+-------+-------------+
> | Schema | Name | Owner | Size  | Description |
> +--------+------+-------+-------+-------------+
> | public | data | pavel | 42 MB |             |
> +--------+------+-------+-------+-------------+
> (1 row)

This looks nice, although I haven't looked at the patch yet.  Also, as you
said, we could later replace the method of directly querying pg_inherits
by something else.
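
For context, the sort of recursive pg_inherits query the prototype describes
might look like this (my sketch, not necessarily the patch's actual SQL):

--
WITH RECURSIVE tree(relid) AS (
    SELECT 'data'::regclass
    UNION ALL
    SELECT i.inhrelid
    FROM pg_inherits i
         JOIN tree t ON i.inhparent = t.relid
)
SELECT pg_size_pretty(sum(pg_total_relation_size(relid))) AS size
FROM tree;
--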

> p.s. Another patch could replace the relation type "table" with
> "partitioned table"
> 
> postgres=# \dt+
>                             List of relations
> +--------+------------+-------------------+-------+---------+-------------+
> | Schema |    Name    |       Type        | Owner |  Size   | Description |
> +--------+------------+-------------------+-------+---------+-------------+
> | public | data       | partitioned table | pavel | 0 bytes |             |
> | public | data_2016  | table             | pavel | 15 MB   |             |
> | public | data_2017  | table             | pavel | 15 MB   |             |
> | public | data_other | table             | pavel | 11 MB   |             |
> +--------+------------+-------------------+-------+---------+-------------+
> (4 rows)
> 
> A patch for this case is simple:
> 
> diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
> index c3bdf8555d..491e58eb29 100644
> --- a/src/bin/psql/describe.c
> +++ b/src/bin/psql/describe.c
> @@ -3490,8 +3490,8 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
>   gettext_noop("sequence"),
>   gettext_noop("special"),
>   gettext_noop("foreign table"),
> - gettext_noop("table"),/* partitioned table */
> - gettext_noop("index"),/* partitioned index */
> + gettext_noop("partitioned table"),/* partitioned table */
> + gettext_noop("partitioned index"),/* partitioned index */
>   gettext_noop("Type"),
>   gettext_noop("Owner"));

Inclined to +1 this, too.  Although, one might ask why partitioned tables
are called so only in psql's output, but not in the backend's error
messages, for example, as was discussed in the past.

Thanks,
Amit




Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

2018-07-25 Thread Amit Langote
On 2018/07/21 0:17, David Rowley wrote:
> On 20 July 2018 at 21:44, Amit Langote  wrote:
>> But I don't think the result of make_partition_pruneinfo itself has to be
>> List of PartitionedRelPruneInfo nested under PartitionPruneInfo.  I gather
>> that each PartitionPruneInfo corresponds to each root partitioned table
>> and a PartitionedRelPruneInfo contains the actual pruning information,
>> which is created for every partitioned table (non-leaf tables), including
>> the root tables.  I don't think such nesting is necessary.  I think that
>> just like flattened partitioned_rels list, we should put flattened list of
>> PartitionedRelPruneInfo into the Append or MergeAppend plan.  No need for
>> nesting PartitionedRelPruneInfo under PartitionPruneInfo.
> 
> To do that properly you'd need to mark the target partitioned table of
> each hierarchy. Your test of pg_class.relispartition is not going to
> work as you're assuming the query is always going to the root. The
> user might query some intermediate partitioned table (which will have
> relispartition = true). Your patch will fall flat in that case.

Yeah, I forgot to consider that.

> You could work around that by having some array that points to the
> target partitioned table of each hierarchy, but I don't see why that's
> better than having the additional struct.

Or it could be a Bitmapset called root_indexes that stores the offset of
the first Index value of every partitioned_rels list contained, in turn, in
the list that's passed to make_partition_pruneinfo.

> There's also some code
> inside ExecFindInitialMatchingSubPlans() which does a backward scan
> over the partitions. This must process children before their parents.
> Unsure how well that's going to work if we start mixing the
> hierarchies. I'm sure it can be made to work providing each hierarchy
> is stored together consecutively in the array, but it just seems
> pretty fragile to me. That code is already pretty hard to follow.

I don't see how removing a nested loop changes things for worse.  AIUI,
the code replaces index values contained in the subplan_map arrays of
various PartitionedRelPruningData structs to account for any pruned
sub-plans.  Removing a nesting level because of having removed the nesting
struct doesn't seem to affect anything about that translation.  But your
point here seems to be about the relative ordering of
PartitionedRelPruningData structs among themselves being affected due to
their now being put into a flat array, although I don't see that as being
any more fragile.  We already are assuming a bunch about the relative
ordering of sub-plans or of PartitionedRelPruningData structs to have been
relying on storing their indexes in subplan_map and subpart_map.  Also, it
occurred to me that the new subplan indexes that
ExecFindInitialMatchingSubPlans computes are based on where subplans are
actually stored in the AppendState.appendplans array, which, in turn, is
based on the Bitmapset of "valid subplans" that
ExecFindInitialMatchingSubPlans passes back to ExecInitAppend.

> What's the reason you don't like the additional level to represent
> multiple hierarchies?

I started thinking about flattening PartitionedRelPruneInfo after looking
at flatten_partitioned_rels() in your patch.  If we're flattening
partitioned_rels (that is, not having it as a List of Lists in the
finished plan), why not flatten the pruning info node too?  As I said
earlier, I get it that we need List of Lists within the planner to get
make_partition_pruneinfo to work correctly in these types of queries, but
once we have figured out the correct details to pass to executor to
perform run-time pruning, I don't see why we need to pass that info again
as a List of Lists.

I have attached v2 of the delta patch which adds a root_indexes field to
PartitionPruneInfo to track topmost parents in every partition hierarchy
contained whose pruning info is contained in the Append.

Also, I noticed a bug with how ExecFindInitialMatchingSubPlans handles
other_subplans.  While the indexes in subplan_map arrays are updated to
contain revised values after pruning, those in the other_subplans
Bitmapset are not, leading to crashes or possibly wrong results.  For example:

create table p (a int, b int, c int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create table q (a int, b int, c int) partition by list (a);
create table q1 partition of q for values in (1) partition by list (b);
create table q11 partition of q1 for values in (1) partition by list (c);
create table q111 partition of q11 for values in (1);
create table q2 partition of q for values in (2) partition by list (b);
create table q21 partition of q2 for values in (1);
create table q22 partition of q2 for values in (2);

prepare q (int, int) as
select *
from (
  select * from p
  union all
  select * from q1
  union all
  select 1, 1, 1
 ) s(a, b, c)
where s.a = 

Re: How can we submit code patches that implement our (pending) patents?

2018-07-25 Thread Benjamin Scherrey
On Wed, Jul 25, 2018 at 12:56 PM, Nico Williams 
wrote:

> On Tue, Jul 24, 2018 at 06:29:37PM -0400, Isaac Morland wrote:
> > On 24 July 2018 at 18:17, Nico Williams  wrote:
> > > Note that it's OK to *accidentally* implement patented algorithms as
> > > long as the author of the contribution didn't know about them.  There
> > > are no treble damages in that case, and no tainting of others; plus,
> > > contributors and code reviewers/committers can't be expected to do
> > > patent searches for each contribution.
> >
> > Non-lawyer here, but "OK" is not a description I would apply to
> > implementing a patented algorithm. You might be thinking of copyright. Of
> > course it is true that people can't reasonably be expected to do patent
> > searches, as you describe, but the patent system as applied to software
> is
> > not well-known among knowledgeable people for being reasonable.
>
> Wrong.  With patents the important thing is not to know about them when
> you implement -- if you come up with the same idea by accident (which,
> of course, is obviously entirely possible) then you are not subject to
> treble damages.  But if you knowingly copy the invention without a
> license then you are subject to treble damages.
>
> A lot of patented ideas are fairly obvious.  That always seems true
> after the fact, naturally, but many are fairly obvious even before
> knowing about them.  It's clearly possible that you'll infringe by
> accident -- that's OK by comparison to infringing on purpose.
>
> Nico
> --
>

If you violate a patent, knowingly or otherwise, you are subject to
penalties (perhaps not treble but still penalties) and will have to remove
the offending code unless a deal is reached with the patent holder.

It is critical that Postgres require that all contributors do not enforce
patents against Postgres - full stop. That's the IP agreement that should
be in place.

  -- Ben Scherrey


Re: [HACKERS] Optional message to user when terminating/cancelling backend

2018-07-25 Thread Daniel Gustafsson
> On 24 Jul 2018, at 22:57, Daniel Gustafsson  wrote:
> 
>> On 6 Jul 2018, at 02:18, Thomas Munro  wrote:
>> 
>> On Fri, Jun 15, 2018 at 9:56 AM, Daniel Gustafsson  wrote:
>>> attached
>> 
>> Hi Daniel,
>> 
>> 6118  --select pg_cancel_backend(pg_backend_pid(), 'it brings on many 
>> changes');
>> 6119  select pg_cancel_backend(pg_backend_pid(), NULL);
>> 6120! ERROR:  canceling statement due to user request
>> 6121--- 25,32 
>> 6122
>> 6123  --select pg_cancel_backend(pg_backend_pid(), 'it brings on many 
>> changes');
>> 6124  select pg_cancel_backend(pg_backend_pid(), NULL);
>> 6125!  pg_cancel_backend
>> 6126! ---
>> 6127!  t
>> 
>> Apparently Windows can take or leave it as it pleases.
> 
> Well played =)
> 
>> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.4488
> 
> That reads to me like it’s cancelling another backend than the current one,
> which clearly isn’t right as we’re passing pg_backend_pid().  I can’t really
> see what Windows specific bug was introduced by this patch though (or well, 
> the
> bug exhibits itself on Windows but it may well be generic of course).
> 
> Will continue to hunt.

Seems the updated patch built and tested OK.  I still have no idea why the
previous one didn’t.

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.6695

cheers ./daniel


Re: How can we submit code patches that implement our (pending) patents?

2018-07-25 Thread Chapman Flack
On 07/25/18 01:56, Nico Williams wrote:

> Wrong.  With patents the important thing is not to know about them when
> you implement -- if you come up with the same idea by accident (which,
> of course, is obviously entirely possible) then you are not subject to
> treble damages.

Even if the damages are not trebled, can 1✕ the damages be more than you
would like to shell out? Not to mention the hassle of getting any infringing
uses to cease?

Also, is this distinction universally applied across jurisdictions?

-Chap