Re: [HACKERS] Re: [COMMITTERS] pgsql: Add configure infrastructure to detect support for C99's restric

2017-10-11 Thread Andres Freund
On 2017-10-11 21:59:53 -0700, Andres Freund wrote:
> That fixed that problem I think. But unfortunately since then another
> problem has been reported by some other animals, all with older msvc
> versions afaict (thrips - vs 2012, bowerbird - vs 2012).

Correction, thrips is vs 2010, not 2012.




[HACKERS] Re: [COMMITTERS] pgsql: Add configure infrastructure to detect support for C99's restric

2017-10-11 Thread Andres Freund
On 2017-10-11 17:13:20 -0700, Andres Freund wrote:
> Hi,
> 
> > On 2017-10-11 23:11:15 +0000, Andres Freund wrote:
> > Add configure infrastructure to detect support for C99's restrict.
> > 
> > Will be used in later commits improving performance for a few key
> > routines where information about aliasing allows for significantly
> > better code generation.
> > 
> > This allows use of the C99 'restrict' keyword without breaking C89, or
> > for that matter C++, compilers. If not supported, it's defined to be
> > empty.
> 
> Woodlouse doesn't like this, erroring out with:
> C:\buildfarm\buildenv\HEAD\pgsql.build\src\include\libpq/pqformat.h(47): 
> error C2219: syntax error : type qualifier must be after '*' 
> (src/backend/access/common/printsimple.c) 
> [C:\buildfarm\buildenv\HEAD\pgsql.build\postgres.vcxproj]
> 
> It's MSVC being super peculiar about error checks. I think msvc is just
> confused by the pointer hiding typedef. Using some online MSVC (and
> other compilers) frontend:
> https://godbolt.org/g/TD3nmA
> 
> I confirmed that removing the pointer hiding typedef indeed resolves the
> issue.
> 
> I can't quite decide whether msvc just has taste and dislikes pointer
> hiding typedefs as much as I do, or whether it's incredibly stupid ;)
> 
> I'll add a comment and use StringInfoData *.
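
For reference, a minimal reduction of the construct that trips MSVC
(illustrative names, not the actual pqformat.h declarations):

/* A pointer-hiding typedef, as with StringInfo: */
typedef struct StringInfoData
{
    char   *data;
    int     len;
} StringInfoData;

typedef StringInfoData *StringInfo;

/* Older MSVC rejects qualifying the hidden pointer:
 *   error C2219: syntax error : type qualifier must be after '*'
 * extern void demo(restrict StringInfo buf);
 */

/* Spelling the pointer out compiles everywhere: */
extern void demo(StringInfoData *restrict buf);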

That fixed that problem I think. But unfortunately since then another
problem has been reported by some other animals, all with older msvc
versions afaict (thrips - vs 2012, bowerbird - vs 2012). Those report
that defining restrict to __restrict interferes with some system
headers:
C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\include\stdlib.h(619): 
error C2485: '__restrict' : unrecognized extended attribute 
(src/backend/access/brin/brin.c) 
[c:\prog\bf\root\HEAD\pgsql.build\postgres.vcxproj]

Presumably that is because the headers contain __declspec(restrict) on
some function declarations.
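
If that guess is right, the failure mechanism would be roughly the
following (a sketch; the real CRT declarations differ in detail):

/* pg_config.h.win32 defines: */
#define restrict __restrict

/* The CRT headers declare allocators along the lines of: */
__declspec(restrict) void *malloc(size_t size);

/* After macro expansion that becomes
 *   __declspec(__restrict) void *malloc(size_t size);
 * and __restrict is not a recognized __declspec attribute,
 * hence error C2485. */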

I've temporarily silenced that error by moving the stdlib.h include
before the definition of restrict, but that seems fairly fragile. I
primarily wanted to see whether there are other problems.  At least
thrips is now happy.

I see a number of options to continue:
- only define restrict on msvc 2013+ - for some reason woodlouse didn't
  complain about this problem, but I'm very doubtful that that's
  actually reliable.
- rename restrict to pg_restrict. That's annoying because we'd have to
  copy the autoconf test.
- live with the hack of including stdlib.h early, in pg_config.h.win32.
- $better_idea

Comments?

Greetings,

Andres Freund




Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-10-11 Thread Dilip Kumar
On Fri, Oct 6, 2017 at 9:21 PM, Dilip Kumar  wrote:
> On Fri, Oct 6, 2017 at 7:24 PM, Dilip Kumar  wrote:
>> On Fri, Oct 6, 2017 at 6:08 PM, Alexander Kuzmenkov
>>  wrote:
>>>
>>>> Analysis: The estimated value of the lossy_pages is way higher than
>>>> its actual value and the reason is that the total_pages calculated by
>>>> the "Mackert and Lohman formula" is not correct.
>>>
>>>
>>> I think the problem might be that the total_pages includes cache effects and
>>> rescans. For bitmap entries we should use something like relation pages *
>>> selectivity.
>>
>> I have noticed that for the TPCH case if I use "pages * selectivity"
>> it gives me better results, but IMHO directly multiplying the pages
>> with selectivity may not be the correct way to calculate the number of
>> heap pages: it can only give the correct result when all the TIDs being
>> fetched are clustered.  On the other hand, the "Mackert and Lohman"
>> formula assumes that the TIDs are evenly distributed across the heap
>> pages, which can also give a wrong estimate, as we are seeing in our
>> TPCH case.
>
> I agree with the point that total_pages includes the cache effects
> and rescans when loop_count > 1; that can be avoided if we always
> calculate heap_pages as it is calculated in the else branch
> (loop_count = 0).  Fortunately, in all the TPCH query plans I posted
> upthread the bitmap scan was never on the inner side of the NLJ, so
> loop_count was always 0.  I will fix this.

I have fixed the issue. Now, for calculating the lossy pages it does
not consider rescans.  As mentioned above, this does not affect the
TPCH plans, so I haven't measured the performance again.
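
For reference, the two estimates under discussion look roughly like this
(a sketch following the Mackert-Lohman formula as described in
costsize.c's index_pages_fetched(); variable names are mine, not the
planner's):

#include <math.h>

/*
 * T = pages in relation, N = tuples in relation, s = selectivity,
 * b = buffer/cache size in pages; Ns is the expected number of
 * tuple fetches.
 */
static double
pages_fetched_mackert_lohman(double T, double N, double s, double b)
{
    double  Ns = N * s;

    if (T <= b)
        return fmin(2.0 * T * Ns / (2.0 * T + Ns), T);
    else if (Ns <= 2.0 * T * b / (2.0 * T - b))
        return 2.0 * T * Ns / (2.0 * T + Ns);
    else
        return b + (Ns - 2.0 * T * b / (2.0 * T - b)) * (T - b) / T;
}

/* The alternative discussed above; correct only for clustered TIDs. */
static double
pages_fetched_naive(double T, double s)
{
    return T * s;
}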

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


improve_bitmap_cost_v6.patch
Description: Binary data



Re: [HACKERS] [PATCH] Lockable views

2017-10-11 Thread Tatsuo Ishii
>> test=# CREATE VIEW v3 AS SELECT count(*) FROM v1;
>> CREATE VIEW
>> test=# BEGIN;
>> BEGIN
>> test=# LOCK TABLE v3;
>> ERROR:  cannot lock view "v3"
>> DETAIL:  Views that return aggregate functions are not automatically 
>> updatable.
> 
> It would be nice if the message would be something like:
> 
> DETAIL:  Views that return aggregate functions are not lockable
> 
>> test=# END;
>> ROLLBACK
>> 
>> test=# CREATE FUNCTION fnc() RETURNS trigger AS $$ BEGIN RETURN NEW; END; $$ 
>> LANGUAGE plpgsql;
>> CREATE FUNCTION
>> test=# CREATE TRIGGER trg INSTEAD OF INSERT ON v1 FOR EACH ROW EXECUTE 
>> PROCEDURE fnc();
>> CREATE TRIGGER
>> test=# BEGIN;
>> BEGIN
>> test=# LOCK TABLE v1;
>> ERROR:  cannot lock view "v1"
>> DETAIL:  views that have an INSTEAD OF trigger are not lockable
>> test=# END;
>> ROLLBACK
> 
> I wonder if we should lock tables in a subquery as well. For example,
> 
> create view v1 as select * from t1 where i in (select i from t2);
> 
> In this case should we lock t2 as well?

Currently the patch ignores t2 in the case above.

So we have options below:

- Leave as it is (ignore tables appearing in a subquery)

- Lock all tables including in a subquery

- Check subquery in the view definition. If there are some tables
  involved, emit an error and abort.

The first one might be different from what users expect. There may be
a risk that the second one could cause deadlock. So the third one seems
to be the safest IMO.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




[HACKERS] Continuous integration on Windows?

2017-10-11 Thread Thomas Munro
Hi hackers,

I don't use Windows myself, but I'd rather avoid submitting patches
that fail to build, build with horrible warnings or blow up on that
fine operating system.  I think it would be neat to be able to have
experimental branches of PostgreSQL built and tested on Windows
automatically just by pushing them to a watched public git repo.  Just
like Travis CI and several others do for the major GNU/Linux distros,
it seems there is at least one Windows-based CI company that
generously offers free testing to open source projects:
https://www.appveyor.com (hat tip to Ilmari for the pointer).  I
wonder... has anyone here with Microsoft know-how ever tried to
produce an appveyor.yml file that would do a MSVC build and
check-world?

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] show precise repos version for dev builds?

2017-10-11 Thread Tom Lane
Craig Ringer  writes:
> On 12 October 2017 at 06:46, Peter Eisentraut
>  wrote:
>> I've been using
>> --with-extra-version=+git`date +%Y%m%d`"~"`git rev-parse --short HEAD`
>> for my local builds for some time, and I've not experienced any such
>> problems.

> I've seen issues with a number of tools. The one I can remember most
> clearly is check_postgres.pl . Nobody's going to argue that this is
> pretty code, but last time I tested (9.4-era, admittedly) it exploded
> messily with extra-version.

FWIW, Salesforce tried to do something similar to Peter's example
while I was there.  It did not work as well as we'd hoped :-( because
what got baked into the built executables was the latest git commit hash
as of the time you'd last run configure, not what was current as of the
latest "make".  Not to mention that you might have built from an
uncommitted state.  We tried to find a fix for the former problem that
didn't create lots of overhead, without much success.  No idea what
to do about the latter.

These are not big problems for package-building since you basically
never distribute executables that don't get built from a clean state.
But they put a crimp in the usefulness of the idea for developers.

regards, tom lane




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-10-11 Thread Ashutosh Bapat
On Wed, Oct 11, 2017 at 7:47 PM, Robert Haas  wrote:
> On Mon, Oct 9, 2017 at 2:05 AM, Ashutosh Bapat
>  wrote:
>> On Sat, Oct 7, 2017 at 1:04 AM, Robert Haas  wrote:
>>> Committed.  I hope that makes things less red rather than more,
>>> because I'm going to be AFK for a few hours anyway.
>>
>> Here's the last patch, dealing with the dummy relations, rebased. With
>> this fix every join order of a partitioned join can be considered
>> partitioned. (This wasn't the case earlier when dummy relation was
>> involved.). So, we can allocate the child-join RelOptInfo array in
>> build_joinrel_partition_info(), instead of waiting for an appropriate
>> pair to arrive in try_partition_wise_join().
>
> Wouldn't a far more general approach be to allow a partition-wise join
> between a partitioned table and an unpartitioned table, considering
> the result as partitioned?  That seems like it would very often yield
> much better query plans than what we have right now, and also make the
> need for this particular thing go away.
>

You are suggesting that a dummy partitioned table be treated as an
unpartitioned table, applying the optimization suggested above. A join
between a partitioned and an unpartitioned table is partitioned by the
keys of the partitioned table only. An unpartitioned table doesn't have
any keys, so this is fine. But a dummy partitioned table does have
keys. Recording them as keys of the join relation helps when it joins
to other relations. Furthermore, a join between a partitioned and an
unpartitioned table doesn't require any equi-join condition on the
partition keys of the partitioned table, but a join between partitioned
tables is considered to be partitioned by keys on both sides only when
there is an equi-join. So, when implementing a partitioned join
between a partitioned and an unpartitioned table, we would have to make
a special case to record partition keys when the unpartitioned side is
actually a dummy partitioned table. That might be awkward.

Because we don't have dummy child relations in all cases, we already
have some awkwardness, like allocating the part_rels array only when we
encounter a join order which has all the children. This patch removes
that.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [POC] hash partitioning

2017-10-11 Thread Amit Langote
On 2017/09/30 1:53, Robert Haas wrote:
> On Thu, Sep 28, 2017 at 1:54 AM, Amit Langote
>  wrote:
>> I looked into how satisfies_hash_partition() works and came up with an
>> idea that I think will make constraint exclusion work.  What if we emitted
>> the hash partition constraint in the following form instead:
>>
>> hash_partition_mod(hash_partition_hash(key1-exthash, key2-exthash),
>> <modulus>) = <remainder>
>>
>> With that form, constraint exclusion seems to work as illustrated below:
>>
>> \d+ p0
>> <...>
>> Partition constraint:
>> (hash_partition_modulus(hash_partition_hash(hashint4extended(a,
>> '8816678312871386367'::bigint)), 4) = 0)
>>
>> -- note only p0 is scanned
>> explain select * from p where
>> hash_partition_modulus(hash_partition_hash(hashint4extended(a,
>> '8816678312871386367'::bigint)), 4) = 0;
> 
> What we actually want constraint exclusion to cover is SELECT * FROM p
> WHERE a = 525600;

I agree.

> As Amul says, nobody's going to enter a query in the form you have it
> here.  Life is too short to take time to put queries into bizarre
> forms.

Here too.  I was falsely thinking that satisfies_hash_partition() is
intended to be used for more than just enforcing the partition constraint
when data is directly inserted into a hash partition, or more precisely to
be used in the CHECK constraint of the table that is to be attached as a
hash partition.  Now, we ask users to add such a constraint to avoid the
constraint validation scan, because the system knows how to infer from the
constraint that the partition constraint is satisfied.  I observed however
that, unlike range and list partitioning, the hash partition's constraint
could only ever be implied because of structural equality (equal()'ness)
of the existing constraint expression and the partition constraint
expression.  For example, a more restrictive range or list qual implies
the partition constraint, but it requires performing btree operator based
proof.  The proof is impossible with the chosen structure of the hash
partitioning constraint, but it seems that that's OK.  That is, it's OK to
ask users to add the exact constraint (matching modulus and remainder
values in the call to satisfies_hash_partition() specified in the CHECK
constraint) to avoid the validation scan.

Thanks,
Amit





Re: [HACKERS] show precise repos version for dev builds?

2017-10-11 Thread Craig Ringer
On 12 October 2017 at 06:46, Peter Eisentraut
 wrote:

> I've been using
>
> --with-extra-version=+git`date +%Y%m%d`"~"`git rev-parse --short HEAD`
>
> for my local builds for some time, and I've not experienced any such
> problems.

Interesting.

I've seen issues with a number of tools. The one I can remember most
clearly is check_postgres.pl . Nobody's going to argue that this is
pretty code, but last time I tested (9.4-era, admittedly) it exploded
messily with extra-version.

> However, using the various numeric reporting options is clearly better
> if you want to do version comparisons.  The "extra version" stuff should
> be mainly for labeling.

The trouble there is that we lack a numeric version in some (IMO
crucial) places where we only have the string version:

- In the startup packet. We send server_version, but not
server_version_num, as GUC_REPORT. So if a client wants
server_version_num it has to do another round trip to query for it
(see the sketch below).

- In pg_config, where we don't expose any --version-num, only --version.
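
To make the startup-packet point concrete, a libpq client sees the
asymmetry roughly like this (sketch):

#include <stdio.h>
#include <libpq-fe.h>

static void
show_versions(PGconn *conn)
{
    PGresult   *res;

    /* No round trip: server_version is GUC_REPORT, so libpq already
     * has it from the startup packet. */
    printf("server_version = %s\n",
           PQparameterStatus(conn, "server_version"));

    /* The numeric form costs an extra round trip. */
    res = PQexec(conn, "SHOW server_version_num");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("server_version_num = %s\n", PQgetvalue(res, 0, 0));
    PQclear(res);
}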

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Slow synchronous logical replication

2017-10-11 Thread Craig Ringer
On 12 October 2017 at 00:57, Konstantin Knizhnik
 wrote:

> The reason for such behavior is obvious: the wal sender has to decode the
> huge transaction generated by the insert although it has no relation to
> this publication.

It does. Though I wouldn't expect anywhere near the kind of drop you
report, and haven't observed it here.

Are the CREATE TABLE and INSERT done in the same transaction? Because
that's a known pathological case for logical replication, it has to do
a LOT of extra work when it's in a transaction that has done DDL. I'm
sure there's room for optimisation there, but the general
recommendation for now is "don't do that".

> Filtering of the insert records of this transaction is done only inside
> the output plugin.

Only partly true. The output plugin can register a transaction origin
filter and use that to say it's entirely uninterested in a
transaction. But this only works based on filtering by origins. Not
tables.

I imagine we could call another hook in output plugins, "do you care
about this table", and use it to skip some more work for tuples that
particular decoding session isn't interested in. Skip adding them to
the reorder buffer, etc. No such hook currently exists, but it'd be an
interesting patch for Pg11 if you feel like working on it.
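
For concreteness, such a hook's signature might look like this, by
analogy with the existing filter_by_origin callback; this API does not
exist, it's purely a sketch:

#include "replication/output_plugin.h"

/* Hypothetical per-relation filter callback, not an existing API.
 * Decoding could consult it before queuing a relation's changes into
 * the reorder buffer, analogous to LogicalDecodeFilterByOriginCB. */
typedef bool (*LogicalDecodeFilterByRelCB) (struct LogicalDecodingContext *ctx,
                                            Oid relid);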

> Unfortunately it is not quite clear how to make the wal sender smarter and
> let it skip transactions not affecting its publication.

As noted, it already can do so by origin. Mostly. We cannot totally
skip over WAL, since we need to process various invalidations etc. See
ReorderBufferSkip.

It's not so simple by table since we don't know early enough whether
the xact affects tables of interest or not. But you could definitely
do some selective skipping. Making it efficient could be the
challenge.

> One of the possible solutions is to let the backend inform the wal sender
> about the smallest LSN it should wait for (the backend knows which table is
> affected by the current operation, hence which publications are interested
> in this operation, and so can point the wal sender to the proper LSN
> without decoding a huge part of WAL).
> But it seems to be not so easy to implement.

Sounds like confusing layering violations to me.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [POC] hash partitioning

2017-10-11 Thread Robert Haas
On Tue, Oct 10, 2017 at 7:07 AM, amul sul  wrote:
> How about the attached patch(0003)?
> Also, the dim variable is renamed to natts.

I'm not sure I believe this comment:

+/*
+ * We arrange the partitions in the ascending order of their modulus
+ * and remainders.  Also every modulus is factor of next larger
+ * modulus.  This means that the index of a given partition is same as
+ * the remainder of that partition.  Also entries at (remainder + N *
+ * modulus) positions in indexes array are all same for (modulus,
+ * remainder) specification for any partition.  Thus datums array from
+ * both the given bounds are same, if and only if their indexes array
+ * will be same.  So, it suffices to compare indexes array.
+ */

I am particularly not sure that I believe that the index of a
partition must be the same as the remainder.  It doesn't seem like
that would be true when there is more than one modulus or when some
partitions are missing.

+                if (offset < 0)
+                {
+                    next_modulus = DatumGetInt32(datums[0][0]);
+                    valid_modulus = (next_modulus % spec->modulus) == 0;
+                }
+                else
+                {
+                    prev_modulus = DatumGetInt32(datums[offset][0]);
+                    valid_modulus = (spec->modulus % prev_modulus) == 0;
+
+                    if (valid_modulus && (offset + 1) < ndatums)
+                    {
+                        next_modulus = DatumGetInt32(datums[offset + 1][0]);
+                        valid_modulus = (next_modulus % spec->modulus) == 0;
+                    }
+                }

I don't think this is quite right.  It checks the new modulus against
prev_modulus whenever prev_modulus is defined, which is correct, but
it doesn't check the new modulus against the next_modulus except when
offset < 0.  But actually that check needs to be performed, I think,
whenever the new modulus is less than the greatest modulus seen so
far.
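
One possible restructuring along those lines (a sketch reusing the
patch's variable names; untested):

valid_modulus = true;
if (offset >= 0)
{
    prev_modulus = DatumGetInt32(datums[offset][0]);
    valid_modulus = (spec->modulus % prev_modulus) == 0;
}
/* check the next modulus whenever one exists, not only for offset < 0 */
if (valid_modulus && offset + 1 < ndatums)
{
    next_modulus = DatumGetInt32(datums[offset + 1][0]);
    valid_modulus = (next_modulus % spec->modulus) == 0;
}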

+ * For a partitioned table defined as:
+ *CREATE TABLE simple_hash (a int, b char(10)) PARTITION BY HASH (a, b);
+ *
+ * CREATE TABLE p_p1 PARTITION OF simple_hash
+ *FOR VALUES WITH (MODULUS 2, REMAINDER 1);
+ * CREATE TABLE p_p2 PARTITION OF simple_hash
+ *FOR VALUES WITH (MODULUS 4, REMAINDER 2);
+ * CREATE TABLE p_p3 PARTITION OF simple_hash
+ *FOR VALUES WITH (MODULUS 8, REMAINDER 0);
+ * CREATE TABLE p_p4 PARTITION OF simple_hash
+ *FOR VALUES WITH (MODULUS 8, REMAINDER 4);
+ *
+ * This function will return one of the following in the form of an
+ * expression:
+ *
+ * for p_p1: satisfies_hash_partition(2, 1, hash_fn_1_extended(a, HASH_SEED),
+ *                                    hash_fn_2_extended(b, HASH_SEED))
+ * for p_p2: satisfies_hash_partition(4, 2, hash_fn_1_extended(a, HASH_SEED),
+ *                                    hash_fn_2_extended(b, HASH_SEED))
+ * for p_p3: satisfies_hash_partition(8, 0, hash_fn_1_extended(a, HASH_SEED),
+ *                                    hash_fn_2_extended(b, HASH_SEED))
+ * for p_p4: satisfies_hash_partition(8, 4, hash_fn_1_extended(a, HASH_SEED),
+ *                                    hash_fn_2_extended(b, HASH_SEED))

I think instead of this lengthy example you should try to explain the
general rule.  Maybe something like: the partition constraint for a
hash partition is always a call to the built-in function
satisfies_hash_partition().  The first two arguments are the modulus
and remainder for the partition; the remaining arguments are the hash
values computed for each column of the partition key using the
extended hash function from the appropriate opclass.

+static uint64
+mix_hash_value(int nkeys, Datum *hash_array, bool *isnull)

It would be nice to use the hash_combine() facility Andres recently
added for this rather than having a way to do it that is specific to
hash partitioning, but that function only works for 32-bit hash
values.  Maybe we can persuade Andres to add a hash_combine64...
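
A 64-bit variant could follow the same boost-derived recipe as the
existing 32-bit hash_combine(); the constant and shifts below are
illustrative, not settled:

static inline uint64
hash_combine64(uint64 a, uint64 b)
{
    /* sketch only: 64-bit analogue of hash_combine()'s mixing step */
    a ^= b + UINT64CONST(0x49a0f4dd15e5a8e3) + (a << 54) + (a >> 7);
    return a;
}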

+ * a hash operator class

Missing period at end.

+if (strategy == PARTITION_STRATEGY_HASH)
+ereport(ERROR,
+(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ errmsg("default hash partition is not supported")));

Maybe errmsg("a hash-partitioned table may not have a default partition")?

+/* Seed for the extended hash function */
+#define HASH_SEED UINT64CONST(0x7A5B22367996DCFF)

I suggest HASH_PARTITION_SEED -- this is too generic.

Have you checked how well the tests you've added cover the code you've
added?  What code is not covered by the tests, and is there any way to
cover it?

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 64-bit queryId?

2017-10-11 Thread Michael Paquier
On Thu, Oct 12, 2017 at 9:05 AM, Robert Haas  wrote:
> On Wed, Oct 4, 2017 at 9:00 PM, Michael Paquier
>  wrote:
>> v4 looks correct to me. Testing it through (pgbench and some custom
>> queries) I have not spotted issues. If the final decision is to use
>> 64-bit query IDs, then this patch could be pushed.
>
> Cool.  Committed, thanks for the review.

The final result looks fine to me. Thanks.
-- 
Michael




Re: [HACKERS] 64-bit queryId?

2017-10-11 Thread Robert Haas
On Wed, Oct 4, 2017 at 9:00 PM, Michael Paquier
 wrote:
> v4 looks correct to me. Testing it through (pgbench and some custom
> queries) I have not spotted issues. If the final decision is to use
> 64-bit query IDs, then this patch could be pushed.

Cool.  Committed, thanks for the review.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Lockable views

2017-10-11 Thread Tatsuo Ishii
> Hi,
> 
> Attached is a patch to enable views to be locked.

Nice.

> PostgreSQL has supported automatically updatable views since 9.3, so we can
> update simply defined views like regular tables. However, currently,
> table-level locks on views are not supported. We cannot execute LOCK TABLE
> for views, while we can get row-level locks with FOR UPDATE/SHARE. In some
> situations where we need table-level locks on tables, we may also need
> table-level locks on automatically updatable views. Although we can lock
> the base relations manually, it would be useful if we could lock views
> without knowing the definition of the views.
> 
> In the attached patch, only automatically-updatable views that do not have
> INSTEAD OF rules or INSTEAD OF triggers are lockable. It is assumed that
> such views' definitions have only one base relation. When an auto-updatable
> view is locked, its base relation is also locked. If the base relation is a
> view again, base relations are processed recursively. For locking a view,
> the view owner has to have the privilege to lock the base relation.
> 
> * Example
> 
> test=# CREATE TABLE tbl (i int);
> CREATE TABLE
> 
> test=# CREATE VIEW v1 AS SELECT * FROM tbl; 
> CREATE VIEW
> test=# BEGIN;
> BEGIN
> test=# LOCK TABLE v1;
> LOCK TABLE
> test=# SELECT relname, locktype, mode FROM pg_locks,pg_class c WHERE 
> c.oid=relation AND relname NOT LIKE 'pg%';
>  relname | locktype |mode 
> -+--+-
>  tbl | relation | AccessExclusiveLock
>  v1  | relation | AccessExclusiveLock
> (2 rows)
> 
> test=# END;
> COMMIT
> 
> test=# CREATE VIEW v2 AS SELECT * FROM v1;
> CREATE VIEW
> test=# BEGIN;
> BEGIN
> test=# LOCK TABLE v2;
> LOCK TABLE
> test=# SELECT relname, locktype, mode FROM pg_locks,pg_class c WHERE 
> c.oid=relation AND relname NOT LIKE 'pg%';
>  relname | locktype |mode 
> -+--+-
>  v2  | relation | AccessExclusiveLock
>  tbl | relation | AccessExclusiveLock
>  v1  | relation | AccessExclusiveLock
> (3 rows)
> 
> test=# END;
> COMMIT
> 
> test=# CREATE VIEW v3 AS SELECT count(*) FROM v1;
> CREATE VIEW
> test=# BEGIN;
> BEGIN
> test=# LOCK TABLE v3;
> ERROR:  cannot lock view "v3"
> DETAIL:  Views that return aggregate functions are not automatically 
> updatable.

It would be nice if the message would be something like:

DETAIL:  Views that return aggregate functions are not lockable

> test=# END;
> ROLLBACK
> 
> test=# CREATE FUNCTION fnc() RETURNS trigger AS $$ BEGIN RETURN NEW; END; $$ 
> LANGUAGE plpgsql;
> CREATE FUNCTION
> test=# CREATE TRIGGER trg INSTEAD OF INSERT ON v1 FOR EACH ROW EXECUTE 
> PROCEDURE fnc();
> CREATE TRIGGER
> test=# BEGIN;
> BEGIN
> test=# LOCK TABLE v1;
> ERROR:  cannot lock view "v1"
> DETAIL:  views that have an INSTEAD OF trigger are not lockable
> test=# END;
> ROLLBACK

I wonder if we should lock tables in a subquery as well. For example,

create view v1 as select * from t1 where i in (select i from t2);

In this case should we lock t2 as well?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Is it time to kill support for very old servers?

2017-10-11 Thread Michael Paquier
On Wed, Oct 11, 2017 at 10:28 PM, Robert Haas  wrote:
> On Tue, Oct 10, 2017 at 9:14 PM, Andres Freund  wrote:
>> I'm actually inclined not to, and keep this as an undocumented debugging
>> option. Limiting the use of this option to people willing to read the
>> code seems like a good idea to me.
>
> -1.  I use the documentation to find things, even though I am a
> developer, and I don't think hiding things from users is a good idea
> anyway.  We can say that we recommend against using the option and
> that it's just for testing, but it should be documented anyway.

You would be disappointed with the current state of things then.
fe-connect.c lists a couple of debug options:
- authtype, which is not used anywhere, but is still kept so as not to
reject connection strings using it. I can get that this is not documented.
- tty, same argument as authtype.
- replication, which is still available, and that I have for example
used a couple of times for debugging the replication parser (psql can
work directly with some replication commands as you know). But this is
not documented.

FWIW, I would like to see things properly documented as well. And
please note that I am fine with writing such a patch myself.
-- 
Michael




[HACKERS] replace GrantObjectType with ObjectType

2017-10-11 Thread Peter Eisentraut
It seems to me that having ACL_OBJECT_* symbols alongside OBJECT_*
symbols is not useful and leads to duplication.  Digging around in the
past suggests that we used to have a lot of these command-specific
symbols but got rid of them over time, except that the ACL stuff was
never touched.  The attached patch accomplishes that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From f80e5aced1cedd2682fd50f6fa067bf455f66f4d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 11 Oct 2017 18:35:19 -0400
Subject: [PATCH] Replace GrantObjectType with ObjectType

There used to be a lot of different *Type and *Kind symbol groups to
address objects within different commands, most of which have been
replaced by ObjectType, starting with
b256f2426433c56b4bea3a8102757749885b81ba.  But this conversion was never
done for the ACL commands until now.

This change ends up being just a plain replacement of the types and
symbols, without any code restructuring needed, except deleting some now
redundant code.
---
 src/backend/catalog/aclchk.c | 208 +--
 src/backend/catalog/heap.c   |   4 +-
 src/backend/catalog/pg_namespace.c   |   2 +-
 src/backend/catalog/pg_proc.c|   2 +-
 src/backend/catalog/pg_type.c|   2 +-
 src/backend/commands/event_trigger.c | 109 +++---
 src/backend/parser/gram.y|  44 
 src/backend/tcop/utility.c   |   2 +-
 src/backend/utils/adt/acl.c  |  56 +-
 src/include/commands/event_trigger.h |   1 -
 src/include/nodes/parsenodes.h   |  19 +---
 src/include/tcop/deparse_utility.h   |   2 +-
 src/include/utils/acl.h  |   4 +-
 src/include/utils/aclchk_internal.h  |   2 +-
 14 files changed, 203 insertions(+), 254 deletions(-)

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index ccde66a7dd..2aee4f2751 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -86,7 +86,7 @@ typedef struct
 	Oid			nspid;			/* namespace, or InvalidOid if none */
/* remaining fields are same as in InternalGrant: */
boolis_grant;
-   GrantObjectType objtype;
+   ObjectType  objtype;
boolall_privs;
AclMode privileges;
List   *grantees;
@@ -116,8 +116,8 @@ static void ExecGrant_Type(InternalGrant *grantStmt);
 static void SetDefaultACLsInSchemas(InternalDefaultACL *iacls, List *nspnames);
 static void SetDefaultACL(InternalDefaultACL *iacls);
 
-static List *objectNamesToOids(GrantObjectType objtype, List *objnames);
-static List *objectsInSchemaToOids(GrantObjectType objtype, List *nspnames);
+static List *objectNamesToOids(ObjectType objtype, List *objnames);
+static List *objectsInSchemaToOids(ObjectType objtype, List *nspnames);
 static List *getRelationsInNamespace(Oid namespaceId, char relkind);
 static void expand_col_privileges(List *colnames, Oid table_oid,
  AclMode this_privileges,
@@ -441,60 +441,60 @@ ExecuteGrantStmt(GrantStmt *stmt)
 
/*
 * Convert stmt->privileges, a list of AccessPriv nodes, into an AclMode
-* bitmask.  Note: objtype can't be ACL_OBJECT_COLUMN.
+* bitmask.  Note: objtype can't be OBJECT_COLUMN.
 */
switch (stmt->objtype)
{
+   case OBJECT_TABLE:
 			/*
 			 * Because this might be a sequence, we test both relation and
 			 * sequence bits, and later do a more limited test when we know
 			 * the object type.
 			 */
-		case ACL_OBJECT_RELATION:
 			all_privileges = ACL_ALL_RIGHTS_RELATION | ACL_ALL_RIGHTS_SEQUENCE;
 			errormsg = gettext_noop("invalid privilege type %s for relation");
 			break;
-		case ACL_OBJECT_SEQUENCE:
+		case OBJECT_SEQUENCE:
 			all_privileges = ACL_ALL_RIGHTS_SEQUENCE;
 			errormsg = gettext_noop("invalid privilege type %s for sequence");
 			break;
-		case ACL_OBJECT_DATABASE:
+		case OBJECT_DATABASE:
 			all_privileges = ACL_ALL_RIGHTS_DATABASE;
 			errormsg = gettext_noop("invalid privilege type %s for database");
 			break;
-		case ACL_OBJECT_DOMAIN:
+		case OBJECT_DOMAIN:
 			all_privileges = ACL_ALL_RIGHTS_TYPE;
 			errormsg = gettext_noop("invalid privilege type %s for domain");
 			break;
-		case ACL_OBJECT_FUNCTION:
+		case OBJECT_FUNCTION:
 			all_privileges = 

Re: [HACKERS] Fix a typo in execReplication.c

2017-10-11 Thread Masahiko Sawada
On Thu, Oct 12, 2017 at 5:02 AM, Robert Haas  wrote:
> On Mon, Oct 9, 2017 at 10:59 PM, Masahiko Sawada  
> wrote:
>> Attached a patch for $subject. ISTM these are mistakes of copy-and-paste.
>
> Committed, but isn't the code itself wrong as well in the DELETE case?
>
> /* BEFORE ROW DELETE Triggers */
> if (resultRelInfo->ri_TrigDesc &&
> resultRelInfo->ri_TrigDesc->trig_update_before_row)
> {
> skip_tuple = !ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
>                                    &searchslot->tts_tuple->t_self,
>                                    NULL);
> }
>
> Why not trig_delete_before_row?
>

Thank you!
I think you're right. It should be trig_delete_before_row and be
back-patched to 10. Attached patch fixes it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index c26420a..fb538c0 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -511,7 +511,7 @@ ExecSimpleRelationDelete(EState *estate, EPQState *epqstate,
 
 	/* BEFORE ROW DELETE Triggers */
 	if (resultRelInfo->ri_TrigDesc &&
-		resultRelInfo->ri_TrigDesc->trig_update_before_row)
+		resultRelInfo->ri_TrigDesc->trig_delete_before_row)
 	{
 		skip_tuple = !ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
 		   &searchslot->tts_tuple->t_self,



Re: [HACKERS] show precise repos version for dev builds?

2017-10-11 Thread Peter Eisentraut
On 10/11/17 04:19, Craig Ringer wrote:
> On 11 October 2017 at 11:44, Jeremy Schneider  
> wrote:
>> On Sun, Oct 1, 2017 at 8:10 AM, Tom Lane  wrote:
>>>
>>> configure --with-extra-version=whateveryouwant
>>
>> I see that this build option has been around since 9.4; is anyone
>> using it to mark patched production builds?  EnterpriseDB or
>> 2ndQuadrant? How about the cloud providers?
> 
> We started using it for BDR, but unfortunately too much software
> explodes spectacularly when you use it, due to simplistic/buggy
> version parsing.

I've been using

--with-extra-version=+git`date +%Y%m%d`"~"`git rev-parse --short HEAD`

for my local builds for some time, and I've not experienced any such
problems.

However, using the various numeric reporting options is clearly better
if you want to do version comparisons.  The "extra version" stuff should
be mainly for labeling.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: [BUGS] BUG #14821: idle_in_transaction_session_timeout sometimes gets ignored when statement timeout is pending

2017-10-11 Thread Lukas Fittl
On Wed, Oct 11, 2017 at 2:11 PM Andres Freund  wrote:

> I've pushed this. Thanks for the report & fix!
>

Excellent, thanks!

Best,
Lukas
-- 
Lukas Fittl


Re: [HACKERS] Windows warnings from VS 2017

2017-10-11 Thread Tom Lane
Andres Freund  writes:
> Andrew Dunstan  writes:
>> I presume if you can make that assertion you already have something
>> along those lines?

> Not really. I just replaced memset with MemSetAligned in a bunch of
> places in the code and looked at profiles.

Well, it's not that much work to try it and see.  I compared results
of this simplistic test case:
pgbench -S -c 1 -T 60 bench
(using a scale-factor-10 pgbench database) on current HEAD and HEAD
with the attached patch, which just lobotomizes all the MemSet
macros into memset().  Median of 3 runs is 9698 tps for HEAD and
9675 tps with the patch; the range is ~100 tps though, making
this difference well within the noise level.

I did this using RHEL6's gcc (Red Hat 4.4.7-18), which is pretty
far from bleeding-edge so I think it's okay as a baseline for
what optimizations we can expect to find used in the field.

So I can't sustain Andres' assertion that memset is actually faster
for the cases we care about, but it certainly doesn't seem any
slower either.  It would be interesting to check compromise
patches, such as keeping only the MemSetLoop case.

BTW, I got a couple of warnings with this patch:

pmsignal.c: In function 'PMSignalShmemInit':
pmsignal.c:104: warning: passing argument 1 of 'memset' discards qualifiers 
from pointer target type
/usr/include/string.h:65: note: expected 'void *' but argument is of type 
'volatile struct PMSignalData *'
procsignal.c: In function 'ProcSignalInit':
procsignal.c:119: warning: passing argument 1 of 'memset' discards qualifiers 
from pointer target type
/usr/include/string.h:65: note: expected 'void *' but argument is of type 
'volatile sig_atomic_t *'

Those seem like maybe something to look into.  MemSet is evidently
casting away pointer properties that it perhaps shouldn't.
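
The warning is easy to reproduce in isolation (a minimal sketch):

#include <string.h>

typedef struct PMSignalData { int dummy; } PMSignalData;

static volatile PMSignalData *PMSignalState;

void
pmsignal_init_sketch(void)
{
    /* warning: passing argument 1 of 'memset' discards qualifiers
     * from pointer target type -- memset() takes a plain "void *",
     * so the volatile qualifier is silently cast away. */
    memset(PMSignalState, 0, sizeof(PMSignalData));
}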

regards, tom lane

diff --git a/src/include/c.h b/src/include/c.h
index b6a9697..f7ea3a5 100644
*** a/src/include/c.h
--- b/src/include/c.h
*** typedef NameData *Name;
*** 843,874 
   *	memset() functions.  More research needs to be done, perhaps with
   *	MEMSET_LOOP_LIMIT tests in configure.
   */
! #define MemSet(start, val, len) \
! 	do \
! 	{ \
! 		/* must be void* because we don't know if it is integer aligned yet */ \
! 		void   *_vstart = (void *) (start); \
! 		int		_val = (val); \
! 		Size	_len = (len); \
! \
! 		if ((((uintptr_t) _vstart) & LONG_ALIGN_MASK) == 0 && \
! 			(_len & LONG_ALIGN_MASK) == 0 && \
! 			_val == 0 && \
! 			_len <= MEMSET_LOOP_LIMIT && \
! 			/* \
! 			 *	If MEMSET_LOOP_LIMIT == 0, optimizer should find \
! 			 *	the whole "if" false at compile time. \
! 			 */ \
! 			MEMSET_LOOP_LIMIT != 0) \
! 		{ \
! 			long *_start = (long *) _vstart; \
! 			long *_stop = (long *) ((char *) _start + _len); \
! 			while (_start < _stop) \
! *_start++ = 0; \
! 		} \
! 		else \
! 			memset(_vstart, _val, _len); \
! 	} while (0)
  
  /*
   * MemSetAligned is the same as MemSet except it omits the test to see if
--- 843,849 
   *	memset() functions.  More research needs to be done, perhaps with
   *	MEMSET_LOOP_LIMIT tests in configure.
   */
! #define MemSet(start, val, len) memset(start, val, len)
  
  /*
   * MemSetAligned is the same as MemSet except it omits the test to see if
*** typedef NameData *Name;
*** 876,901 
   * that the pointer is suitably aligned (typically, because he just got it
   * from palloc(), which always delivers a max-aligned pointer).
   */
! #define MemSetAligned(start, val, len) \
! 	do \
! 	{ \
! 		long   *_start = (long *) (start); \
! 		int		_val = (val); \
! 		Size	_len = (len); \
! \
! 		if ((_len & LONG_ALIGN_MASK) == 0 && \
! 			_val == 0 && \
! 			_len <= MEMSET_LOOP_LIMIT && \
! 			MEMSET_LOOP_LIMIT != 0) \
! 		{ \
! 			long *_stop = (long *) ((char *) _start + _len); \
! 			while (_start < _stop) \
! *_start++ = 0; \
! 		} \
! 		else \
! 			memset(_start, _val, _len); \
! 	} while (0)
! 
  
  /*
   * MemSetTest/MemSetLoop are a variant version that allow all the tests in
--- 851,857 
   * that the pointer is suitably aligned (typically, because he just got it
   * from palloc(), which always delivers a max-aligned pointer).
   */
! #define MemSetAligned(start, val, len) memset(start, val, len)
  
  /*
   * MemSetTest/MemSetLoop are a variant version that allow all the tests in
*** typedef NameData *Name;
*** 911,925 
  	MEMSET_LOOP_LIMIT != 0 && \
  	(val) == 0 )
  
! #define MemSetLoop(start, val, len) \
! 	do \
! 	{ \
! 		long * _start = (long *) (start); \
! 		long * _stop = (long *) ((char *) _start + (Size) (len)); \
! 	\
! 		while (_start < _stop) \
! 			*_start++ = 0; \
! 	} while (0)
  
  
  /*
--- 867,873 
  	MEMSET_LOOP_LIMIT != 0 && \
  	(val) == 0 )
  
! #define MemSetLoop(start, val, len) memset(start, val, len)
  
  
  /*


Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns

2017-10-11 Thread Andres Freund
On 2017-10-11 08:54:10 -0700, Andres Freund wrote:
> On 2017-10-11 10:53:56 +0200, Alvaro Herrera wrote:
> > Maybe it'd be a good idea to push 0001 with some user of restrict ahead
> > of the rest, just to see how older msvc reacts.
> 
> Can do. Not quite sure which older user yet, but I'm sure I can find
> something.

I looked around and didn't immediately see a point where it'd be useful.
I don't really want to put it in some place where it's not useful. I
think we can just as well wait for the first patch using it to exercise
restrict support.

There are references to restrict support going back to VS 2008:
https://msdn.microsoft.com/en-us/library/5ft82fed(v=vs.90).aspx


So I'll change the pg_config.h.win32 hunk to:
/* Define to the equivalent of the C99 'restrict' keyword, or to
   nothing if this is not supported.  Do not define if restrict is
   supported directly.  */
/* Visual Studio 2008 and upwards */
#if (_MSC_VER >= 1500)
/* works for C and C++ in msvc */
#define restrict __restrict
#else
#define restrict
#endif

there's several patterns like that (except for the version mapping) in
the file already.

- Andres




Re: [HACKERS] Windows warnings from VS 2017

2017-10-11 Thread Andres Freund
> On 09/21/2017 09:41 PM, Andres Freund wrote:
> > On 2017-09-21 09:30:31 -0400, Tom Lane wrote:
> >> Andrew Dunstan  writes:
> >>> The speed of memset is hardly going to be the dominating factor in a
> >>> 'CREATE DATABASE' command, so we could certainly afford to change to
> >>> plain memset calls here.
> >> Another thought is that it may be time for our decennial debate about
> >> whether MemSet is worth the electrons it's printed on.  I continue to
> >> think that any modern compiler+libc ought to do an equivalent or better
> >> optimization given just a plain memset().  If that seems to be true
> >> for recent MSVC, we could consider putting an #if into c.h to define
> >> MemSet as just memset for MSVC.  Maybe later that could be extended
> >> to other compilers.
> > +many. glibc's memset is nearly an order of magnitude faster than MemSet
> > on my machine for medium+ length copies, and still 3-4 times as fast at
> > ~100 bytes. And both gcc and clang optimize away the memcpy entirely when
> > the length is a fixed short length.

> Perhaps we need some sort of small benchmark program that we can run on
> a representative set of platforms?

I personally don't think that's needed here, given how incredibly widely
used memset is in our codebase.


> I presume if you can make that assertion you already have something
> along those lines?

Not really. I just replaced memset with MemSetAligned in a bunch of
places in the code and looked at profiles.

Greetings,

Andres Freund




Re: [HACKERS] Re: [BUGS] BUG #14821: idle_in_transaction_session_timeout sometimes gets ignored when statement timeout is pending

2017-10-11 Thread Andres Freund
Hi,

On 2017-09-20 20:27:05 -0700, Lukas Fittl wrote:
> As per the bug report at
> https://www.postgresql.org/message-id/20170921010956.17345.61461%40wrigleys.postgresql.org
> it seems that the query cancellation holdoff logic in ProcessInterrupts is
> a bit overly aggressive in keeping other interrupts from running.
> 
> In particular I've seen an issue in the wild where
> idle_in_transaction_session_timeout did not get triggered because
> the HOLD_CANCEL_INTERRUPTS() in SocketBackend wraps around a pq_getbyte()
> call, and so ProcessInterrupts doesn't do anything when it gets called
> because the query cancel holdoff counter is positive.
> 
> Andres suggested the following re-ordering of the logic on -bugs:

I've pushed this. Thanks for the report & fix!

Greetings,

Andres Freund




Re: [HACKERS] Omission in GRANT documentation

2017-10-11 Thread Tom Lane
Laurenz Albe  writes:
> grant.sgml says that
>the default privileges granted to PUBLIC are as follows: CONNECT and
>CREATE TEMP TABLE for databases; EXECUTE privilege for functions;
>and USAGE privilege for languages.

> But types also have the USAGE privilege for PUBLIC by default:

Yup, that's an oversight.

> Hence I propose the attached documentation patch.

Pushed, with a bit of additional wordsmithing.

regards, tom lane




Re: [HACKERS] Pluggable storage

2017-10-11 Thread Peter Geoghegan
On Wed, Oct 11, 2017 at 1:08 PM, Robert Haas  wrote:
> On Mon, Oct 9, 2017 at 10:22 AM, Alexander Korotkov
>  wrote:
>> For me, it's crucial point that pluggable storages should be able to have
>> different MVCC implementation, and correspondingly have full control over
>> its interactions with indexes.
>> Thus, it would be good if we would get consensus on that point.  I'd like
>> other discussion participants to comment whether they agree/disagree and
>> why.
>> Any comments?
>
> I think it's good for new storage managers to have full control over
> interactions with indexes.  I'm not sure about the MVCC part.  I think
> it would be legitimate to want a storage manager to ignore MVCC
> altogether - e.g. to build a non-transactional table.

I agree with Alexander -- if you're going to have a new MVCC
implementation, you have to do significant work within index access
methods. Adding "retail index tuple deletion" is probably just the
beginning. ISTM that you need something like InnoDB's purge thread
when index values change, since two versions of the same index tuple
(each with distinct attribute values) have to physically co-exist for
a time.

> I don't know
> that it would be a very good idea to have two different full-fledged
> MVCC implementations, though.  Like Tom says, that would be
> replicating a lot of the awfulness of the MySQL model.

It's not just the MySQL model, FWIW. SQL-on-Hadoop systems like
Impala, certain NoSQL systems, and AFAIK any database system that
claims to have pluggable storage all do it this way. That is, core
transaction management functions (e.g. MVCC snapshot acquisition) are
outsourced to the storage engine. It *is* very cumbersome, but that's
what they do.

-- 
Peter Geoghegan




Re: [HACKERS] show precise repos version for dev builds?

2017-10-11 Thread Jeremy Schneider
On Wed, Oct 11, 2017 at 1:19 AM, Craig Ringer  wrote:
> We started using it for BDR, but unfortunately too much software
> explodes spectacularly when you use it, due to simplistic/buggy
> version parsing.

Since 10.0 will break most of that software anyway, maybe this is a
good time to revisit the question?  :)

-J

-- 
http://about.me/jeremy_schneider




Re: [HACKERS] Pluggable storage

2017-10-11 Thread Robert Haas
On Mon, Oct 9, 2017 at 10:22 AM, Alexander Korotkov
 wrote:
> For me, it's crucial point that pluggable storages should be able to have
> different MVCC implementation, and correspondingly have full control over
> its interactions with indexes.
> Thus, it would be good if we would get consensus on that point.  I'd like
> other discussion participants to comment whether they agree/disagree and
> why.
> Any comments?

I think it's good for new storage managers to have full control over
interactions with indexes.  I'm not sure about the MVCC part.  I think
it would be legitimate to want a storage manager to ignore MVCC
altogether - e.g. to build a non-transactional table.  I don't know
that it would be a very good idea to have two different full-fledged
MVCC implementations, though.  Like Tom says, that would be
replicating a lot of the awfulness of the MySQL model.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fix a typo in execReplication.c

2017-10-11 Thread Robert Haas
On Mon, Oct 9, 2017 at 10:59 PM, Masahiko Sawada  wrote:
> Attached a patch for $subject. ISTM these are mistakes of copy-and-paste.

Committed, but isn't the code itself wrong as well in the DELETE case?

/* BEFORE ROW DELETE Triggers */
if (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_update_before_row)
{
skip_tuple = !ExecBRDeleteTriggers(estate, epqstate, resultRelInfo,
   &searchslot->tts_tuple->t_self,
   NULL);
}

Why not trig_delete_before_row?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns

2017-10-11 Thread Robert Haas
On Wed, Oct 11, 2017 at 2:47 PM, Andres Freund  wrote:
>> If we do nothing, it's unlikely we'd ever get rid of the compat function.
>
> I think that's ok.

Yeah.  I mean, it seems similar to what happened with heap_formtuple:
the last in-tree users of that function went away in 8.4 (902d1cb35)
but we didn't remove the function itself until 9.6 (726117243).  It
didn't really hurt anyone in the meantime; it was just a function
that, in most installs, was probably never called.  I think we should
do the same thing here: plan to keep the old functions around at least
until 11 is out of support, maybe longer, and not worry about it very
much.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns

2017-10-11 Thread Andres Freund
Hi,

On 2017-10-11 18:05:32 +0200, Alvaro Herrera wrote:
> Well, my concern is to ensure that extension authors take advantage of
> the optimized implementation.  If we never let them know that we've
> rewritten things, most are not going to realize they can make their
> extensions faster with a very simple code change.

The inline pq_gsendint() already results in faster code in a good
number of cases, as a decent compiler will often be able to evaluate it
at compile time.


> On the other hand, I suppose that for the vast majority of extensions,
> doing the change is unlikely to have a noticeable effect on performance,
> so perhaps we should just keep the shim and move along.

Yea, I think it's unlikely to be noticeable unless you do a lot of them
in a row. Unfortunately all send functions essentially allocate a new
StringInfo - which is going to dominate execution cost.  We literally
allocate 1kb to send a single four byte integer.

Fixing the output function performance requires a fairly different type
of patch imo.
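
To illustrate the allocation cost: a typical binary send function is
shaped essentially like int4send(), where pq_begintypsend() runs
initStringInfo(), which palloc's a 1024-byte buffer just to emit four
bytes:

#include "postgres.h"
#include "fmgr.h"
#include "libpq/pqformat.h"

Datum
int4send(PG_FUNCTION_ARGS)
{
    int32       arg1 = PG_GETARG_INT32(0);
    StringInfoData buf;

    pq_begintypsend(&buf);                  /* initStringInfo(): palloc of 1kb */
    pq_sendint(&buf, arg1, sizeof(int32));  /* the four payload bytes */
    PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
}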


> If we do nothing, it's unlikely we'd ever get rid of the compat function.

I think that's ok.

Greetings,

Andres Freund




Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-11 Thread Robert Haas
On Tue, Oct 10, 2017 at 11:22 AM, Alvaro Herrera
 wrote:
> Hmm ... yeah, ATTACH and DETACH sound acceptable to me.  On DETACH, the
> abstract index should be marked indisvalid=false unless a substitute
> index already exists; and on ATTACH when indisvalid=false we verify that
> all local indexes exist, and if so we can flip indisvalid.  That way, we
> can continue to rely on the parent index always having all its children
> when the flag is set.

True.  I don't see an immediate use for that guarantee, actually,
because the index can't be "scanned" either way. But it might be
useful someday, if for example we wanted to try to plan large numbers
of children symmetrically rather than (as we do currently) planning
each one individually, or if we've got an idea about making foreign
keys work.  I think you could ignore this for now, though, if you want.

> I'm not clear on a syntax that creates the main index and hopes to later
> have the sub-indexes created.  Another approach is to do it the other
> way around, i.e. create the children first, then once they're all in
> place create the main one normally, which merely verifies that all the
> requisite children exist.  This is related to what I proposed to occur
> when a regular table is joined as a partition of the partitioned table:
> we run a verification that an index matching the parent's abstract
> indexes exists, and if not we raise an error.  (Alternatively we could
> allow the case, and mark the abstract index as indisvalid=false, but
> that seems to violate POLA).

It's not much different than
pg_catalog.binary_upgrade_create_empty_extension but I don't really
care which way pg_dump emits it.

>> Another thing that would let you do is CREATE INDEX CONCURRENTLY
>> replacement_concrete_index; ALTER INDEX abstract_index DETACH
>> PARTITION old_concrete_index, ATTACH PARTITION
>> replacement_concrete_index; DROP INDEX CONCURRENTLY
>> old_concrete_index, which seems like a thing someone might want to do.
>
> Yeah, this is a point I explicitly mentioned, and this proposal seems
> like a good way.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Slow synchronous logical replication

2017-10-11 Thread Konstantin Knizhnik



On 11.10.2017 10:07, Craig Ringer wrote:

On 9 October 2017 at 15:37, Konstantin Knizhnik
 wrote:

Thank you for explanations.

On 08.10.2017 16:00, Craig Ringer wrote:

I think it'd be helpful if you provided reproduction instructions,
test programs, etc, making it very clear when things are / aren't
related to your changes.


It will not be so easy to provide a reproducing scenario, because it
actually involves many components (postgres_fdw, pg_pathman,
pg_shardman, LR, ...)

So simplify it to a test case that doesn't.

The simplest reproducing scenario is the following:
1. Start two Posgtgres instances: synchronous_commit=on, fsync=off
2. Initialize pgbench database at both instances: pgbench -i
3. Create publication for pgbench_accounts table at one node
4. Create correspondent subscription at another node with 
copy_data=false parameter

5. Add subscription to synchronous_standby_names at first node.
6. Start pgbench -c 8 -N -T 100 -P 1 at the first node. On my system
the results are the following:

standalone postgres: 8600 TPS
asynchronous replication: 6600 TPS
synchronous replication:   5600 TPS
Quite good results.
7. Create some dummy table and perform bulk insert in it:
create table dummy(x integer primary key);
insert into dummy values (generate_series(1,1000));

pgbench almost stuck: until end of insert performance drops almost 
to zero.
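
For concreteness, steps 3-5 look something like this (node and object
names here are just examples):

-- at the first node (publisher):
CREATE PUBLICATION pgbench_pub FOR TABLE pgbench_accounts;
ALTER SYSTEM SET synchronous_standby_names = 'pgbench_sub';
SELECT pg_reload_conf();

-- at the second node (subscriber):
CREATE SUBSCRIPTION pgbench_sub
  CONNECTION 'host=node1 dbname=postgres'
  PUBLICATION pgbench_pub
  WITH (copy_data = false);

(The subscription name can be used in synchronous_standby_names because
the apply worker reports it as its application_name by default.)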


The reason for this behavior is obvious: the walsender has to decode the
huge transaction generated by the insert although it has no relation to
this publication.
Insert records of this transaction are filtered out only inside the
output plugin.
Unfortunately it is not quite clear how to make the walsender smarter and
let it skip transactions that do not affect its publication.
One possible solution is to let the backend inform the walsender about the
smallest LSN it should wait for (the backend knows which table is affected
by the current operation, hence which publications are interested in this
operation, and so can point the walsender to the proper LSN without
decoding a huge part of the WAL).

But it seems to be not so easy to implement.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Partition-wise aggregation/grouping

2017-10-11 Thread Robert Haas
On Tue, Oct 10, 2017 at 6:00 AM, Ashutosh Bapat
 wrote:
> This looks good to me. I think it should be a separate, yet very small patch.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Slow synchronous logical replication

2017-10-11 Thread Andres Freund
Hi,

On 2017-10-09 10:37:01 +0300, Konstantin Knizhnik wrote:
> So we have implemented sharding - splitting data between several remote
> tables using pg_pathman and postgres_fdw.
> It means that an insert or update of the parent table causes an insert or
> update of some derived partitions, which is forwarded by postgres_fdw to
> the correspondent node.
> The number of shards is significantly larger than the number of nodes, i.e.
> for 5 nodes we have 50 shards. Which means that at each node we have 10
> shards.
> To provide fault tolerance each shard is replicated using logical
> replication to one or more nodes. Right now we considered only redundancy
> level 1 - each shard has only one replica.
> So from each node we establish 10 logical replication channels.

Isn't that part of the pretty fundamental problem? There shouldn't be 10
different replication channels per node. There should be one.

Greetings,

Andres Freund




Re: [HACKERS] [COMMITTERS] pgsql: Add port/strnlen support to libpq and ecpg Makefiles.

2017-10-11 Thread Tom Lane
Andres Freund  writes:
> On 2017-10-11 11:58:58 -0400, Tom Lane wrote:
>> I agree the PITA factor of the current approach keeps increasing.
>> It sounds a bit silly to build libpgport three ways, but maybe
>> we should just do that.

> We already kinda are, just by copying things around ;)

Yeah.  I hadn't realized how much duplication of effort is happening
within ecpg.  This was a somewhat reasonable solution when it was
first invented for libpq only, but building snprintf.o four times
is pretty silly.

>> Or conceivably we should just build the FE version of libpgport.a
>> with -fPIC and not worry about whether that loses some efficiency
>> for client programs.  A lot of distros are effectively forcing
>> that, or even -fPIE, anyway.

> Hm.

On reflection, let's just go with the solution of building libpgport_lib.a
with the right flags (fPIC + threading) and leave libpgport.a alone.
That way we don't need a debate about whether there's an efficiency
cost worth worrying about.

regards, tom lane




Re: [HACKERS] Predicate Locks for writes?

2017-10-11 Thread Robert Haas
On Wed, Oct 11, 2017 at 11:51 AM, Simon Riggs  wrote:
> I'm inclined to believe Robert's hypothesis that there is some race
> condition there.
>
> The question is whether that still constitutes a serializability
> problem since we haven't done anything with the data, just passed it
> upwards to be modified.

Well, the question is whether passing it upwards constitutes reading
it.  I kind of suspect it does.  The plan tree isn't just blindly
propagating values upward but doing stuff with them.  There could be
quite a bit between the ModifyTable node and the underlying scan if
DELETE .. USING or UPDATE .. FROM is in use.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns

2017-10-11 Thread Alvaro Herrera
Andres Freund wrote:
> On 2017-10-11 10:53:56 +0200, Alvaro Herrera wrote:

> > I wonder if it'd be a good idea to nag external users about pq_sendint
> > usage (is a #warning possible?).
> 
> Think we'd need some separate infrastructure, i.e. for gcc ending up
> with __attribute__((deprecated)). I don't quite see this being worth
> adding it, but ...

Probably not.

> > OTOH, do we really need to keep it
> > around?  Maybe we should ditch it, since obviously the compat shim can
> > be installed locally in each extension that really needs it (thinking
> > that most external code can simply be adjusted to the new functions).
> 
> That seems like causing unnecessary pain - we're talking about a few
> lines in a header here, right? It's not like they'll be trivially
> converting to pq_sendint$width anytime soon, unless we backpatch this.

Well, my concern is to ensure that extension authors take advantage of
the optimized implementation.  If we never let them know that we've
rewritten things, most are not going to realize they can make their
extensions faster with a very simple code change.  On the other hand, I
suppose that for the vast majority of extensions, doing the change is
unlikely to have a noticeable effect on performance, so perhaps we
should just keep the shim and move along.

If we do nothing, it's unlikely we'd ever get rid of the compat function.
Maybe add a comment suggesting to remove it once pg10 is out of support?

> > I'm scared about the not-null-terminated stringinfo stuff.  Is it
> > possible to create bugs by polluting a stringinfo with it, then having
> > the stringinfo be used by unsuspecting code?  Admittedly, you can break
> > things already with the binary appends, so probably not an issue.
> 
> All of the converted sites already add integers into the StringInfo -
> and most of those integers consist of a couple of zero bytes,
> because they're lengths. So I don't think there's a huge danger here.

Right, agreed on that.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns

2017-10-11 Thread Andres Freund
On 2017-10-11 10:53:56 +0200, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Hi,
> > 
> > On 2017-10-03 13:58:37 -0400, Robert Haas wrote:
> > > On Tue, Oct 3, 2017 at 12:23 PM, Andres Freund  wrote:
> > > > Makes sense?
> > > 
> > > Yes.
> > 
> > Here's an updated version of this patchset.
> 
> Maybe it'd be a good idea to push 0001 with some user of restrict ahead
> of the rest, just to see how older msvc reacts.

Can do. Not quite sure which older user yet, but I'm sure I can find
something.


> I wonder if it'd be a good idea to nag external users about pq_sendint
> usage (is a #warning possible?).

Think we'd need some separate infrastructure, i.e. for gcc ending up
with __attribute__((deprecated)). I don't quite see this being worth
adding it, but ...


> OTOH, do we really need to keep it
> around?  Maybe we should ditch it, since obviously the compat shim can
> be installed locally in each extension that really needs it (thinking
> that most external code can simply be adjusted to the new functions).

That seems like causing unnecessary pain - we're talking about a few
lines in a header here, right? It's not like they'll be trivially
converting to pq_sendint$width anytime soon, unless we backpatch this.


> I'm scared about the not-null-terminated stringinfo stuff.  Is it
> possible to create bugs by polluting a stringinfo with it, then having
> the stringinfo be used by unsuspecting code?  Admittedly, you can break
> things already with the binary appends, so probably not an issue.

All of the converted sites already add integers into the StringInfo -
and most of those integers consist of a couple of zero bytes,
because they're lengths. So I don't think there's a huge danger here.

Greetings,

Andres Freund




Re: [HACKERS] parallelize queries containing initplans

2017-10-11 Thread Robert Haas
On Mon, Oct 9, 2017 at 5:56 AM, Amit Kapila  wrote:
> How about always returning false for PARAM_EXTERN?

Yeah, I think that's what we should do.  Let's do that first as a
separate patch, which we might even want to back-patch, then return to
this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Predicate Locks for writes?

2017-10-11 Thread Simon Riggs
On 11 October 2017 at 15:33, Robert Haas  wrote:
> On Sat, Oct 7, 2017 at 7:26 AM, Simon Riggs  wrote:
>> PredicateLockTuple() specifically avoids adding an SIRead lock if the
>> tuple already has a write lock on it, so surely it must also be
>> correct to skip the SIRead lock if we are just about to update the
>> row?
>
> I wonder if there's a race condition.  Can someone else read/update
> the tuple between the time when we would've taken the SIRead lock and
> the time when we would have gotten the actual tuple lock?

On 9 October 2017 at 13:23, Alexander Korotkov
 wrote:

>> PredicateLockTuple() specifically avoids adding an SIRead lock if the
>> tuple already has a write lock on it, so surely it must also be
>> correct to skip the SIRead lock if we are just about to update the
>> row?
>>
>> I am suggesting that we ignore PredicateLockTuple() for cases where we
>> are about to update or delete the found tuple.
>
>
> If my thoughts above are correct, than it would be a reasonable
> optimization.  We would skip cycle of getting SIReadLock on tuple and then
> releasing it, without any change of behavior.

I'm inclined to believe Robert's hypothesis that there is some race
condition there.

The question is whether that still constitutes a serializability
problem since we haven't done anything with the data, just passed it
upwards to be modified.

If not we can just skip the SIread lock.

If it is an issue, that still leaves the optimization for the case where
we choose to locate the row using an exclusive lock and just skip the
SIread lock altogether.
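
For anyone who wants to poke at this, the locks in question are visible
from SQL (table name made up):

BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 1 WHERE id = 1;
SELECT locktype, mode FROM pg_locks
 WHERE mode = 'SIReadLock' AND pid = pg_backend_pid();

Today this should show a tuple- or page-level SIReadLock taken while
locating the row; the idea above is to skip taking it when we are about
to update the row anyway.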

-- 
Simon Riggs        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Discussion on missing optimizations

2017-10-11 Thread Robert Haas
On Fri, Oct 6, 2017 at 10:19 PM, Tom Lane  wrote:
>> 9. Unneeded Self JOIN
>
>> Can't remember discussions of this.
>
> I can't get very excited about that one either.

My memories of being a PostgreSQL user rather than a developer are
getting a bit distant by now, but I definitely remember hitting this
problem, especially in cases involving DELETE ... USING and UPDATE ... FROM.
You can't specify a left join between the result relation and some
other table directly, so you end up having to do DELETE FROM x USING x
LEFT JOIN y ... which then has lousy performance.
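
To make that concrete, with made-up tables x(id) and y(x_id), deleting
the rows of x that have no match in y looks like:

DELETE FROM x
USING x AS x2
  LEFT JOIN y ON y.x_id = x2.id
WHERE x.id = x2.id
  AND y.x_id IS NULL;

The second scan of x (as x2) exists only because the result relation
can't itself be the left side of the LEFT JOIN - it's exactly the
self-join that this optimization would remove.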

I think you could hit it in other cases, too, e.g. a join between two
views which share a common base table.  Of course maybe you wouldn't
have two views in the first place if join removal worked better than
it currently does, but without, at least, inner join removal you can't
really rely on the query planner to prune things down to what's really
needed.

> In the end, what the article fails to consider is that all of these are
> tradeoffs, not unalloyed goods.  If you spend planner cycles on every
> query to look for cases that only the most unabashedly brain-dead ORMs
> ever generate, you're not really doing your users a favor on balance.

I think what you're failing to consider is that the author of the
article is a user.  When he says these optimizations are cool, he
means that they would benefit his use cases (and presumably that he's
willing to pay some number of additional cycles to get them).

We haven't really done a very good job figuring out what to do about
optimizations that some people (mostly you) think are marginal.  It's
obviously true that we can't just burn infinite planner cycles on
things that don't usually help, but at the same time, we can't just
keep ignoring requests by users for the same features and saying
"you'll be sad if we give this to you".  Those people don't give up on
wanting the optimization; they just don't use PostgreSQL.  I think we
need to stop just saying "no" and start thinking about what we could
do that would let us say "yes".

One trick that some systems use is to avoid replanning as much as we do
by, for example, saving plans in a shared cache and reusing them even
in other sessions.  That's hard to do in our architecture because the
controlling GUCs can be different in every session and there's not
even any explicit labeling of which GUCs control planner behavior. But
if you had it, then extra planning cycles would be, perhaps, more
tolerable.

Another trick is to plan harder when the cost or complexity of the
query exceeds some threshold, but that's hard to do because we don't
really know what the cost or complexity is until after we get done
planning, by which time it's too late to (cheaply) go back and apply
those optimizations.  This problem of not really knowing whether we've
got an expensive query is also a problem in terms of being smart about
how many parallel workers to pick or whether to consider parallelism
at all; we'd like to consider more options for expensive queries than
cheap ones.  But there's not really any good way to do this as things
stand today.

Maybe one way to attack this problem is to make it more general:
what's the next big thing for the planner?  It wasn't altogether clear
to me that the path->plan conversion was going to be a huge
improvement, but it definitely has been.  Nothing really got much
faster as a direct result of that work, but it made the code simpler
and more understandable and thus easier to enhance, unblocking
improvements like postgres_fdw aggregate pushdown.  I think we're a
long way from reaching the end of the benefits of that commit - there
is a lot more that can usefully be done.  But, what's next?  You told
me a few years back when I was thinking about what to do next that
there were still a lot of gains to be had from improving the planner,
and I wasn't sure I quite believed it.  But it's becoming more clear
to me that this is true, and that the kinds of things this article is
talking about are one approach.  I don't really have the answers here,
but I don't think our planner is anywhere close to as good as it can
be, even if in certain respects it is stuck at some kind of local
maximum.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints

2017-10-11 Thread Nico Williams
FYI, I've added my patch to the commitfest.

https://commitfest.postgresql.org/15/1319/




Re: [HACKERS] pg_regress help output

2017-10-11 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Also, why is the patch apparently changing whitespace in all the help
>> lines?  Seems like that will create a lot of make-work for translators.

> I think we don't have translations for pg_regress.

Good point --- objection withdrawn.

regards, tom lane




Re: [HACKERS] Predicate Locks for writes?

2017-10-11 Thread Robert Haas
On Sat, Oct 7, 2017 at 7:26 AM, Simon Riggs  wrote:
> PredicateLockTuple() specifically avoids adding an SIRead lock if the
> tuple already has a write lock on it, so surely it must also be
> correct to skip the SIRead lock if we are just about to update the
> row?

I wonder if there's a race condition.  Can someone else read/update
the tuple between the time when we would've taken the SIRead lock and
the time when we would have gotten the actual tuple lock?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] A hook for session start

2017-10-11 Thread Robert Haas
On Fri, Oct 6, 2017 at 3:36 PM, Nico Williams  wrote:
>> If global temporary tables are to be effective, then you must not have to
>> modify the system catalog after creating them. But lots of processes require
>> it - ANALYZE, query planning.
>
> But the nice thing about them is that you need only create them once, so
> leave them in the catalog.  Stats about them should not be gathered nor
> stored, since they could be different per-session.

The topic of this thread seems to have drifted quite far from the
subject line, but here are some links to previous discussions of
global temporary tables.

https://www.postgresql.org/message-id/flat/u2o603c8f071004231952i36642ae6u9d6a7eae6eb6ff32%40mail.gmail.com
https://www.postgresql.org/message-id/AANLkTim=5af41BKFvZ=ofvj465kxqkjdjhqzudz3k...@mail.gmail.com
https://www.postgresql.org/message-id/flat/CAFj8pRC2h6qhHsFbcE7b_7SagiS6o%3D5J2UvCwCb05Ka1XFv_Ng%40mail.gmail.com
https://www.postgresql.org/message-id/flat/CA%2B0EDdACCx8w4nK-wdj-eodbJn4juChnHoUWVMM3u3uhLVPnJw%40mail.gmail.com

I would strongly recommend reading through those carefully before
undertaking anything here.  This is not a straightforward project...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-10-11 Thread Robert Haas
On Mon, Oct 9, 2017 at 2:05 AM, Ashutosh Bapat
 wrote:
> On Sat, Oct 7, 2017 at 1:04 AM, Robert Haas  wrote:
>> Committed.  I hope that makes things less red rather than more,
>> because I'm going to be AFK for a few hours anyway.
>
> Here's the last patch, dealing with the dummy relations, rebased. With
> this fix every join order of a partitioned join can be considered
> partitioned. (This wasn't the case earlier when dummy relation was
> involved.). So, we can allocate the child-join RelOptInfo array in
> build_joinrel_partition_info(), instead of waiting for an appropriate
> pair to arrive in try_partition_wise_join().

Wouldn't a far more general approach be to allow a partition-wise join
between a partitioned table and an unpartitioned table, considering
the result as partitioned?  That seems like it would very often yield
much better query plans than what we have right now, and also make the
need for this particular thing go away.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Is it time to kill support for very old servers?

2017-10-11 Thread Robert Haas
On Tue, Oct 10, 2017 at 9:14 PM, Andres Freund  wrote:
> I'm actually inclined not to, and to keep this as an undocumented debugging
> option. Limiting the use of this option to people willing to read the
> code seems like a good idea to me.

-1.  I use the documentation to find things, even though I am a
developer, and I don't think hiding things from users is a good idea
anyway.  We can say that we recommend against using the option and
that it's just for testing, but it should be documented anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Append implementation

2017-10-11 Thread Amit Khandekar
On 9 October 2017 at 16:03, Amit Kapila  wrote:
> On Fri, Oct 6, 2017 at 12:03 PM, Amit Khandekar  
> wrote:
>> On 6 October 2017 at 08:49, Amit Kapila  wrote:
>>>
>>> Okay, but why not cheapest partial path?
>>
>> I gave some thought to this point. Overall I feel it does not matter
>> which partial path it should pick up. Right now the partial paths are
>> not ordered. But for non-partial paths sake, we are just choosing the
>> very last path. So in case of mixed paths, leader will get a partial
>> path, but that partial path would not be the cheapest path. But if we
>> also order the partial paths, the same logic would then pick up
>> cheapest partial path. The question is, should we also order the
>> partial paths for the leader ?
>>
>> The only scenario I see where the leader choosing the cheapest partial
>> path *might* show some benefit is if there are some partial paths that
>> need to do some startup work using only one worker. I think currently,
>> parallel hash join is one case where it builds the hash table, but I
>> guess here also, we support parallel hash build, but not sure about
>> the status.
>>
>
> You also need to consider how merge join currently works in parallel
> (each worker needs to perform the whole of the work of the right side).

Yes, here if the leader happens to take the right side, it may slow
down the overall merge join. But this seems to be a different case
than the case of high startup costs.

>  I think there could be more scenarios where the startup cost is high
> and the parallel worker needs to do that work independently.

True.

>
>  For such plan, if leader starts it, it would be slow, and
>> no other worker would be able to help it, so its actual startup cost
>> would be drastically increased. (Another path is parallel bitmap heap
>> scan where the leader has to do something and the other workers wait.
>> But here, I think it's not much work for the leader to do). So
>> overall, to handle such cases, it's better for leader to choose a
>> cheapest path, or may be, a path with cheapest startup cost. We can
>> also consider sorting partial paths with decreasing startup cost.
>>
>
> Yeah, that sounds reasonable.

Attached patch sorts partial paths by descending startup cost.


On 6 October 2017 at 08:49, Amit Kapila  wrote:
> On Thu, Oct 5, 2017 at 4:11 PM, Amit Khandekar  wrote:
>>
>> Ok. How about removing pa_all_partial_subpaths altogether , and
>> instead of the below condition :
>>
>> /*
>> * If all the child rels have partial paths, and if the above Parallel
>> * Append path has a mix of partial and non-partial subpaths, then consider
>> * another Parallel Append path which will have *all* partial subpaths.
>> * If enable_parallelappend is off, make this one non-parallel-aware.
>> */
>> if (partial_subpaths_valid && !pa_all_partial_subpaths)
>> ..
>>
>> Use this condition :
>> if (partial_subpaths_valid && pa_nonpartial_subpaths != NIL)
>> ..
>>
>
> Sounds good to me.

Did this. Here is the new condition I used, along with the comments
explaining it:

+        * If parallel append has not been added above, or the added one has a mix
+        * of partial and non-partial subpaths, then consider another Parallel
+        * Append path which will have *all* partial subpaths. We can add such a
+        * path only if all childrels have partial paths in the first place. This
+        * new path will be parallel-aware unless enable_parallelappend is off.
         */
-       if (partial_subpaths_valid && !pa_all_partial_subpaths)
+       if (partial_subpaths_valid &&
+               (!pa_subpaths_valid || pa_nonpartial_subpaths != NIL))

Also added some test scenarios.

On 6 October 2017 at 12:03, Amit Khandekar  wrote:
> On 6 October 2017 at 08:49, Amit Kapila  wrote:
>>
>> One minor point:
>>
>> + if (!node->as_padesc)
>> + {
>> + /*
>> + */
>> + if (!exec_append_seq_next(node))
>> + return ExecClearTuple(node->ps.ps_ResultTupleSlot);
>> + }
>>
>> It seems either you want to add a comment in above part of patch or
>> you just left /**/ mistakenly.
>
> Oops. Yeah, the comment wrapper remained there when I moved its
> content "This is Parallel-aware append. Follow it's own logic ..." out
> of the if block.

Removed the comment wrapper.


Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


ParallelAppend_v17.patch
Description: Binary data



Re: [HACKERS] Parallel Hash take II

2017-10-11 Thread Prabhat Sahu
Hi Thomas,

I was testing this feature with the v20 patch, and I got a crash while
doing large joins with small work_mem and lots of workers, as below.

-- Machine configuration: (d1.xlarge) CPUs: 8, RAM: 16GB, Size: 640GB

-- postgresql.conf settings as below:
work_mem = 64kB
max_parallel_workers_per_gather = 128
max_parallel_workers = 64
enable_mergejoin = off
enable_nestloop = off
enable_hashjoin = on
force_parallel_mode = on
seq_page_cost = 0.1
random_page_cost = 0.1
effective_cache_size = 128MB
parallel_tuple_cost = 0
parallel_setup_cost = 0
parallel_synchronization_cost = 0

-- created only one table "lineitem" of size 93GB.
postgres=# select pg_size_pretty(pg_total_relation_size('lineitem'));
 pg_size_pretty
----------------
 93 GB
(1 row)

[centos@centos-prabhat bin]$ vi test10.sql
explain (analyze, costs off)
select  count(*) from lineitem t1 join lineitem t2 using(l_suppkey) join
lineitem t3 using(l_suppkey);
select  count(*) from lineitem t1 join lineitem t2 using(l_suppkey) join
lineitem t3 using(l_suppkey);

[centos@centos-prabhat bin]$ ./psql postgres -a -f test10.sql > test10.out

[centos@centos-prabhat bin]$ vi test10.out
explain (analyze, costs off)
select  count(*) from lineitem t1 join lineitem t2 using(l_suppkey) join
lineitem t3 using(l_suppkey);
psql:test10.sql:2: WARNING:  terminating connection because of crash of
another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
psql:test10.sql:2: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:test10.sql:2: connection to server was lost


Kindly check if you can reproduce the same at your end.



Thanks & Regards,

Prabhat Kumar Sahu
Mob: 7758988455
Skype ID: prabhat.sahu1984

www.enterprisedb.com


On Thu, Oct 5, 2017 at 1:15 PM, Thomas Munro 
wrote:

> On Thu, Oct 5, 2017 at 7:07 PM, Rushabh Lathia 
> wrote:
> > v20 patch set (I was trying 0008, 0009 patch)  not getting cleanly apply
> on
> > latest commit also getting compilation error due to refactor in below
> > commit.
> >
> > commit 0c5803b450e0cc29b3527df3f352e6f18a038cc6
>
> Hi Rushabh
>
> I am about to post a new patch set that fixes the deadlock problem,
> but in the meantime here is a rebase of those two patches (numbers
> changed to 0006 + 0007).  In the next version I think I should
> probably remove that 'stripe' concept.  The idea was to spread
> temporary files over the available temporary tablespaces, but it's a
> terrible API, since you have to promise to use the same stripe number
> when opening the same name later... Maybe I should use a hash of the
> name for that instead.  Thoughts?
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>


Re: [HACKERS] On markers of changed data

2017-10-11 Thread Greg Stark
On 10 October 2017 at 23:50, Stephen Frost  wrote:
> Yeah, it sounds interesting, but I was just chatting w/ David about it
> and we were thinking about how checkpoints are really rather often done,
> so you end up with quite a few of these lists being out there.
>
> Now, if the lists were always kept in a sorted fashion, then perhaps we
> would be able to essentially merge-sort them all back together and
> de-dup that way but even then, you're talking about an awful lot if
> you're looking at daily incrementals- that's 288 standard 5-minute
> checkpoints, each with some 128k pages changed, assuming max_wal_size of
> 1GB, and I think we can all agree that the default max_wal_size is for
> rather small systems.  That ends up being something around 2MB per
> checkpoint to store the pages in or half a gig per day just to keep
> track of the pages which changed in each checkpoint across that day.

I was actually imagining a bitmap, probably for each 1GB piece of each
table. That's probably how you would maintain this data in memory
anyway. After compression it should be fairly small (uncompressed, a 1GB
segment is 131072 8kB blocks, i.e. a 16kB bitmap). You'll probably
be modifying the same blocks frequently or doing bulk loads which will
touch a consecutive range of blocks.

But that's still about the same amount of data, and you probably don't
want to actually keep every checkpoint anyway. The nice thing about
the changelists is that they will tend to reach a maximum size
regardless of how long a time range they span, so if you keep one
changelist for every 10 checkpoints or every 100 checkpoints you could
reduce the storage needs and only lose time precision.


-- 
greg




[HACKERS] [PATCH] Lockable views

2017-10-11 Thread Yugo Nagata
Hi,

Attached is a patch to enable views to be locked.

PostgreSQL has supported automatically updatable views since 9.3, so we can
update simply defined views like regular tables. However, table-level locks
on views are currently not supported. We cannot execute LOCK TABLE on
views, although we can take row-level locks with FOR UPDATE/SHARE. In some
situations where we need table-level locks on tables, we may also need
table-level locks on automatically updatable views. Although we can lock
the base relations manually, it would be useful if we could lock views
without knowing their definitions.

In the attached patch, only automatically updatable views that do not have
INSTEAD OF rules or INSTEAD OF triggers are lockable. It is assumed that
such view definitions have only one base relation. When an auto-updatable
view is locked, its base relation is also locked. If the base relation is a
view again, base relations are processed recursively. To lock a view,
the view owner has to have the privilege to lock the base relation.

* Example

test=# CREATE TABLE tbl (i int);
CREATE TABLE

test=# CREATE VIEW v1 AS SELECT * FROM tbl; 
CREATE VIEW
test=# BEGIN;
BEGIN
test=# LOCK TABLE v1;
LOCK TABLE
test=# SELECT relname, locktype, mode FROM pg_locks,pg_class c WHERE 
c.oid=relation AND relname NOT LIKE 'pg%';
 relname | locktype |mode 
-+--+-
 tbl | relation | AccessExclusiveLock
 v1  | relation | AccessExclusiveLock
(2 rows)

test=# END;
COMMIT

test=# CREATE VIEW v2 AS SELECT * FROM v1;
CREATE VIEW
test=# BEGIN;
BEGIN
test=# LOCK TABLE v2;
LOCK TABLE
test=# SELECT relname, locktype, mode FROM pg_locks,pg_class c WHERE 
c.oid=relation AND relname NOT LIKE 'pg%';
 relname | locktype |mode 
-+--+-
 v2  | relation | AccessExclusiveLock
 tbl | relation | AccessExclusiveLock
 v1  | relation | AccessExclusiveLock
(3 rows)

test=# END;
COMMIT

test=# CREATE VIEW v3 AS SELECT count(*) FROM v1;
CREATE VIEW
test=# BEGIN;
BEGIN
test=# LOCK TABLE v3;
ERROR:  cannot lock view "v3"
DETAIL:  Views that return aggregate functions are not automatically updatable.
test=# END;
ROLLBACK

test=# CREATE FUNCTION fnc() RETURNS trigger AS $$ BEGIN RETURN NEW; END; $$ 
LANGUAGE plpgsql;
CREATE FUNCTION
test=# CREATE TRIGGER trg INSTEAD OF INSERT ON v1 FOR EACH ROW EXECUTE 
PROCEDURE fnc();
CREATE TRIGGER
test=# BEGIN;
BEGIN
test=# LOCK TABLE v1;
ERROR:  cannot lock view "v1"
DETAIL:  views that have an INSTEAD OF trigger are not lockable
test=# END;
ROLLBACK



Regards,

-- 
Yugo Nagata 
diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml
index b946eab..5a431b7 100644
--- a/doc/src/sgml/ref/lock.sgml
+++ b/doc/src/sgml/ref/lock.sgml
@@ -46,6 +46,13 @@ LOCK [ TABLE ] [ ONLY ] name [ * ]
   
 
   
+   Automatically updatable views (see )
+   that do not have INSTEAD OF triggers or INSTEAD
+   rules are also lockable. When a view is locked, its base relations are
+   also locked recursively with the same lock mode.
+  
+
+  
When acquiring locks automatically for commands that reference
tables, PostgreSQL always uses the least
restrictive lock mode possible. LOCK TABLE
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index 9fe9e02..37fd028 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -19,15 +19,20 @@
 #include "commands/lockcmds.h"
 #include "miscadmin.h"
 #include "parser/parse_clause.h"
+#include "parser/parsetree.h"
 #include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "rewrite/rewriteHandler.h"
+#include "access/heapam.h"
 
-static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait);
-static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode);
+static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, Oid userid);
+static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode, Oid userid);
 static void RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid,
 			 Oid oldrelid, void *arg);
+static void LockViewRecurse(Oid reloid, Oid root_reloid, LOCKMODE lockmode, bool nowait);
+static void LockViewCheck(Relation view, Query *viewquery);
 
 /*
  * LOCK TABLE
@@ -62,8 +67,10 @@ LockTableCommand(LockStmt *lockstmt)
 		  RangeVarCallbackForLockTable,
		  (void *) &lockstmt->mode);
 
-		if (recurse)
-			LockTableRecurse(reloid, lockstmt->mode, lockstmt->nowait);
+		if (get_rel_relkind(reloid) == RELKIND_VIEW)
+			LockViewRecurse(reloid, reloid, lockstmt->mode, lockstmt->nowait);
+		else if (recurse)
+			LockTableRecurse(reloid, lockstmt->mode, lockstmt->nowait, GetUserId());
 	}
 }
 
@@ -86,15 +93,17 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid,
 		return;	/* woops, concurrently dropped; no permissions
  * check */
 
-	/* Currently, 

Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns

2017-10-11 Thread Alvaro Herrera
Andres Freund wrote:
> Hi,
> 
> On 2017-10-03 13:58:37 -0400, Robert Haas wrote:
> > On Tue, Oct 3, 2017 at 12:23 PM, Andres Freund  wrote:
> > > Makes sense?
> > 
> > Yes.
> 
> Here's an updated version of this patchset.

Maybe it'd be a good idea to push 0001 with some user of restrict ahead
of the rest, just to see how older msvc reacts.

I wonder if it'd be a good idea to nag external users about pq_sendint
usage (is a #warning possible?).  OTOH, do we really need to keep it
around?  Maybe we should ditch it, since obviously the compat shim can
be installed locally in each extension that really needs it (thinking
that most external code can simply be adjusted to the new functions).

I'm scared about the not-null-terminated stringinfo stuff.  Is it
possible to create bugs by polluting a stringinfo with it, then having
the stringinfo be used by unsuspecting code?  Admittedly, you can break
things already with the binary appends, so probably not an issue.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] show precise repos version for dev builds?

2017-10-11 Thread Craig Ringer
On 11 October 2017 at 11:44, Jeremy Schneider  wrote:
> On Sun, Oct 1, 2017 at 8:10 AM, Tom Lane  wrote:
>>
>> configure --with-extra-version=whateveryouwant
>
> I see that this build option has been around since 9.4; is anyone
> using it to mark patched production builds?  EnterpriseDB or
> 2ndQuadrant? How about the cloud providers?

We started using it for BDR, but unfortunately too much software
explodes spectacularly when you use it, due to simplistic/buggy
version parsing.

This led me to push for wider adoption of server_version_num - in
particular, exposing it as an initial connection parameter via
GUC_REPORT, and exposing it in pg_config. After a few attempts I've
given up on that as a lost cause now, though I still disagree with the
rationale.

Right now, if you use it, various 3rd party software will fail in a
variety of exciting ways, some more subtle than others.
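
To illustrate the hazard (version strings here are made up): on a server
built with configure --with-extra-version=' (ACME 1.2)',

SHOW server_version;      -- returns "10.0 (ACME 1.2)", trips naive parsers
SHOW server_version_num;  -- returns "100000", always machine-readable

which is exactly why server_version_num is the safer thing to rely on.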

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_regress help output

2017-10-11 Thread Alvaro Herrera
Tom Lane wrote:
> Joe Conway  writes:
> > I have been annoyed at least twice now by the lack of pg_regress command
> > line help output for the "--bindir=" option. In passing I noted
> > that there was no output for "--help" or "--version" options either.
> 
> > Any objections to the attached?
> 
> +1 for documenting it, but the phrasing seems a bit awkward:
> 
> ! printf(_("  --bindir=BINPATH  use BINPATH for programs that 
> are run;\n"));
> ! printf(_("if BINPATH empty, use PATH 
> from the environment\n"));
> 
> Maybe just "if empty, use PATH ..." ?
> 
> Also, why is the patch apparently changing whitespace in all the help
> lines?  Seems like that will create a lot of make-work for translators.

I think we don't have translations for pg_regress.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] utility commands benefiting from parallel plan

2017-10-11 Thread Haribabu Kommi
On Fri, Oct 6, 2017 at 2:43 AM, Robert Haas  wrote:

> On Fri, Sep 15, 2017 at 2:22 AM, Haribabu Kommi
>  wrote:
> > Thanks for the review.
>
> I committed this patch with some cosmetic changes.  I think the fact
> that several people have asked for this indicates that, even without
> making some of the more complicated cases work, this has some value.
> I am not convinced it is safe in any case other than when the DML
> command is both creating and populating the table, so I removed
> REFRESH MATERIALIZED VIEW support from the patch and worked over the
> documentation and comments to a degree.
>
> The problem with a case like REFRESH MATERIALIZED VIEW is that there's
> nothing to prevent something that gets run in the course of the query
> from trying to access the view (and the heavyweight lock won't prevent
> that, due to group locking).  That's probably a stupid thing to do,
> but it can't be allowed to break the world.  The other cases are safe
> from that particular problem because the table doesn't exist yet.
>

Thanks for committing the patch.
I understand the problem with the REFRESH MATERIALIZED VIEW case.
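
For the archives, the committed behavior can be observed with something
like this (table names made up):

EXPLAIN (COSTS OFF)
CREATE TABLE big_copy AS SELECT * FROM big_table WHERE status = 'live';

With a large enough source table this is now expected to show a Gather
node over a Parallel Seq Scan, since the SELECT feeding the CREATE
TABLE ... AS can use a parallel plan.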

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] parallelize queries containing initplans

2017-10-11 Thread tushar

On 10/09/2017 03:26 PM, Amit Kapila wrote:

I have reverted the check
in the attached patch.


I have applied this patch against PG HEAD, run sqlsmith, and analyzed
the results. I didn't find any specific failures attributable to this patch.


--
regards, tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: [HACKERS] Slow synchronous logical replication

2017-10-11 Thread Craig Ringer
On 9 October 2017 at 15:37, Konstantin Knizhnik
 wrote:
> Thank you for explanations.
>
> On 08.10.2017 16:00, Craig Ringer wrote:
>>
>> I think it'd be helpful if you provided reproduction instructions,
>> test programs, etc, making it very clear when things are / aren't
>> related to your changes.
>
>
> It will be not so easy to provide some reproducing scenario, because
> actually it involves many components (postgres_fdw, pg_pathman,
> pg_shardman, LR,...)

So simplify it to a test case that doesn't.

> I have checked the syncrepl.c file, particularly the SyncRepGetSyncRecPtr
> function. Each walsender independently calculates the minimal LSN among all
> synchronous replicas and wakes up backends waiting for this LSN. It means
> that a transaction performing an update of data in one shard will actually
> wait for confirmation from replication channels for all shards.

That's expected for the current sync rep design, yes. Because it's
based on lsn, and was designed for physical rep where there's no
question about whether we're sending some data to some peers and not
others.

So all backends will wait for the slowest-responding peer, including
peers that don't need to actually do anything for this xact. You could
possibly hack around that by having the output plugin advance the slot
position when it sees that it just processed an empty xact.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Slow synchronous logical replication

2017-10-11 Thread Masahiko Sawada
On Mon, Oct 9, 2017 at 4:37 PM, Konstantin Knizhnik
 wrote:
> Thank you for explanations.
>
> On 08.10.2017 16:00, Craig Ringer wrote:
>>
>> I think it'd be helpful if you provided reproduction instructions,
>> test programs, etc, making it very clear when things are / aren't
>> related to your changes.
>
>
> It will be not so easy to provide some reproducing scenario, because
> actually it involves many components (postgres_fdw, pg_pathman,
> pg_shardman, LR,...)
> and requires multinode installation.
> But let me try to explain what is going on:
> So we have implemented sharding - splitting data between several remote
> tables using pg_pathman and postgres_fdw.
> It means that an insert or update of the parent table causes an insert or
> update of some derived partitions, which is forwarded by postgres_fdw to
> the correspondent node.
> The number of shards is significantly larger than the number of nodes, i.e.
> for 5 nodes we have 50 shards. Which means that at each node we have 10
> shards.
> To provide fault tolerance each shard is replicated using logical
> replication to one or more nodes. Right now we considered only redundancy
> level 1 - each shard has only one replica.
> So from each node we establish 10 logical replication channels.
>
> We want commit to wait until data is actually stored at all replicas, so we
> are using synchronous replication:
> So we set the synchronous_commit option to "on" and include all 10
> subscriptions in the synchronous_standby_names list.
>
> In this setup commit latency is very large (about 100msec, and most of the
> time is actually spent in commit) and performance is very bad - pgbench
> shows about 300 TPS for the optimal number of clients (about 10; for larger
> numbers performance is almost the same). Without logical replication in the
> same setup we get about 6000 TPS.
>
> I have checked the syncrepl.c file, particularly the SyncRepGetSyncRecPtr
> function. Each walsender independently calculates the minimal LSN among all
> synchronous replicas and wakes up backends waiting for this LSN. It means
> that a transaction performing an update of data in one shard will actually
> wait for confirmation from replication channels for all shards.
> If some shard is updated more rarely than others or is not updated at all
> (for example because the communication channel to this node is broken), then
> all backends will be stuck.
> Also all backends are competing for the single SyncRepLock, which also can
> be a contention point.
>

IIUC, I guess you meant to say that in current synchronous logical
replication a transaction has to wait for updated table data to be
replicated even on servers that don't subscribe to the table. If we
changed it so that a transaction needs to wait only for the servers
that are subscribing to the table, it would be more efficient, at
least for your use case.
We send at least the begin and commit data to all subscriptions and
then wait for replies from them, but could we skip waiting for them, for
example, when the walsender didn't actually send any data modified by
the transaction?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] How does postgres store the join predicate for a relation in a given query

2017-10-11 Thread Ashutosh Bapat
On Tue, Oct 10, 2017 at 7:29 PM, Gourav Kumar  wrote:
> Hi all,
>
> When you fire a query in postgresql, it will first parse the query and
> create the data structures for storing various aspects of the query and
> executing the query. (Like RangeTblEntry, PlannerInfo, RelOptInfo etc.).
>
> I want to know how does postgresql stores the join predicates of a query.
> Like which data structure is used to store the join predicates.
>
> How can we find the join predicates applied on a relation from relid, Oid or
> RangeTblEntry ?
>

Every relation has a RelOptInfo associated with it. Predicates
applicable to it are stored in this RelOptInfo as lists. For base
relations (simple tables), restriction clauses go in baserestrictinfo,
and the applicable join predicates are in joininfo. You can get the
RelOptInfo of a given simple table using find_base_rel().
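
To see the distinction from the SQL level (tables a and b here are just
made-up examples):

EXPLAIN (COSTS OFF)
SELECT * FROM a JOIN b ON a.id = b.a_id WHERE a.x > 10;

a.x > 10 goes into a's baserestrictinfo and shows up as a Filter on the
scan of a, while a.id = b.a_id is a join predicate kept in the joininfo
lists of both a and b and shows up as the join condition (e.g. a Hash
Cond).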

HTH.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

