Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-03-21 Thread Haribabu Kommi
Because of this refactoring of database object handling between
pg_dump and pg_dumpall, the latest pg_dump TAP tests are
failing in the following scenarios.

1. CREATE DATABASE postgres

Before this patch, pg_dump used to dump the CREATE DATABASE
command for postgres, but pg_dumpall did not. During this
refactoring, the approach I took for pg_dump's --create option
follows pg_dumpall's approach of not printing the CREATE DATABASE
commands for the "postgres" and "template1" databases.

It just prints the ALTER DATABASE commands to SET the
TABLESPACE for those two databases.
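
For illustration, the commands emitted for those two databases would
look something like this (the tablespace name here is hypothetical):

ALTER DATABASE postgres SET TABLESPACE ts1;
ALTER DATABASE template1 SET TABLESPACE ts1;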

Solution 1) Just skip dumping these CREATE DATABASE commands
and document that the user should create the "postgres" and
"template1" databases on the target in case they don't exist.
Such cases should be very rare.

Solution 2) Add a new command-line option (or some other setting)
to indicate that the pg_dump execution comes from pg_dumpall, and
follow the current refactored behavior in that case; otherwise follow
the earlier pg_dump behavior for the CREATE DATABASE commands of the
"postgres" and "template1" databases.


2. In the dumpDatabases function, before refactoring, pg_dumpall
used to connect to each database before calling the runPgDump command
and dump "SET default_transaction_read_only = off;" to prevent
accidental overwrites of the target.

I fixed this in the attached patch by dumping the SET command without
making the extra connection.

Does this need the same approach as solution 2) for the previous
problem, i.e., dumping "SET default_transaction_read_only = off;"
whenever the CREATE DATABASE and \connect commands are issued?
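
For reference, a sketch of how the dump output would look with that
approach (the database name here is hypothetical):

CREATE DATABASE mydb;
\connect mydb
SET default_transaction_read_only = off;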

Documentation is yet to be updated to reflect the above changes.

Regards,
Hari Babu
Fujitsu Australia


pg_dump_changes_3.patch
Description: Binary data



[HACKERS] Partition-wise aggregation/grouping

2017-03-21 Thread Jeevan Chalke
Hi all,

Declarative partitioning is supported in PostgreSQL 10, and work is
already in progress to support partition-wise joins. Here is a proposal
for partition-wise aggregation/grouping.  Our initial performance
measurement has shown a 7x performance improvement when partitions are
on foreign servers and approximately 15% when partitions are local.

Partition-wise aggregation/grouping computes aggregates for each
partition separately.  If the group clause contains the partition key,
all the rows belonging to a given group come from one partition, thus
allowing aggregates to be computed completely for each partition.
Otherwise, partial aggregates computed for each partition are combined
across the partitions to produce the final aggregates.  This technique
(sketched below) improves performance because:
i. When partitions are located on foreign servers, we can push down the
aggregate to the foreign server.
ii. If the hash table for each partition fits in memory but the one for
the whole relation does not, each partition-wise aggregate can use an
in-memory hash table.
iii. Aggregation at the level of partitions can exploit properties of
partitions such as indexes, their storage, etc.
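
For illustration, here is a minimal local setup (table name and
partition bounds are hypothetical; the measurements above used foreign
partitions) in which the GROUP BY clause contains the partition key, so
each partition can be aggregated to completion independently:

CREATE TABLE pt (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE pt_p1 PARTITION OF pt FOR VALUES FROM (0) TO (100);
CREATE TABLE pt_p2 PARTITION OF pt FOR VALUES FROM (100) TO (200);
CREATE TABLE pt_p3 PARTITION OF pt FOR VALUES FROM (200) TO (300);

-- Every group of "a" lives in exactly one partition, so the aggregate
-- can be finalized per partition and the per-partition results appended:
SELECT a, count(*) FROM pt GROUP BY a;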

Attached is an experimental patch for the same, based on the
partition-wise join patches posted in [1].

This patch currently implements partition-wise aggregation when the
group clause contains the partitioning key.  A query below, involving a
partitioned table with 3 partitions containing 1M rows each and
producing 30 groups in total, showed a 15% improvement over
non-partition-wise aggregation.  The same query showed a 7x improvement
when the partitions were located on foreign servers.

Here is the sample plan:

postgres=# set enable_partition_wise_agg to true;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Append  (cost=5100.00..61518.90 rows=30 width=12) (actual time=324.837..944.804 rows=30 loops=1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=324.837..324.838 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p1 plt1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=309.954..309.956 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p2 plt1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=310.002..310.004 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p3 plt1)
 Planning time: 0.370 ms
 Execution time: 945.384 ms
(9 rows)

postgres=# set enable_partition_wise_agg to false;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
                                  QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=121518.01..121518.31 rows=30 width=12) (actual time=6498.452..6498.459 rows=30 loops=1)
   Group Key: plt1.a
   ->  Append  (cost=0.00..106518.00 rows=301 width=4) (actual time=0.595..5769.592 rows=300 loops=1)
         ->  Seq Scan on plt1  (cost=0.00..0.00 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=1)
         ->  Foreign Scan on fplt1_p1  (cost=100.00..35506.00 rows=100 width=4) (actual time=0.587..1844.506 rows=100 loops=1)
         ->  Foreign Scan on fplt1_p2  (cost=100.00..35506.00 rows=100 width=4) (actual time=0.384..1839.633 rows=100 loops=1)
         ->  Foreign Scan on fplt1_p3  (cost=100.00..35506.00 rows=100 width=4) (actual time=0.402..1876.505 rows=100 loops=1)
 Planning time: 0.251 ms
 Execution time: 6499.018 ms
(9 rows)

The patch needs a lot of improvement, including:
1. Support for partial partition-wise aggregation
2. Estimating the number of groups for every partition
3. Estimating the cost of partition-wise aggregation based on sample
partitions, similar to partition-wise join
and much more.

In order to support partial aggregation on foreign partitions, we need
support for fetching partially aggregated results from the foreign
server.  That can be handled as a separate follow-on patch.

Though there is a lot of work to be done, I would like to get
suggestions/opinions from hackers.

I would like to thank Ashutosh Bapat for providing a draft patch and
helping me off-list on this feature while he is busy working on the
partition-wise join feature.

[1]
https://www.postgresql.org/message-id/CAFjFpRcbY2QN3cfeMTzVEoyF5Lfku-ijyNR%3DPbXj1e%3D9a%3DqMoQ%40mail.gmail.com

Thanks

-- 
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


pg_partwise_agg_WIP.patch
Description: application/download



Re: [HACKERS] btree_gin and btree_gist for enums

2017-03-21 Thread Anastasia Lubennikova

28.02.2017 00:22, Andrew Dunstan:

OK, here's the whole series of patches.

Patch 1 adds the CallerFInfoFunctionCall{1,2} functions.

Patch 2 adds btree_gist support for their use with non-varlena types.

Patch 3 does the same for varlena types (Not required for patch 4, but
better to be consistent, I think.)

Patch 4 adds enum support to btree_gist

Patch 5 adds enum support to btree_gin

cheers

andrew


Thank you for implementing the feature!
These patches seem to have some merge conflicts with a recent commit:

commit c7a9fa399d557c6366222e90b35db31e45d25678
Author: Stephen Frost 
Date:   Wed Mar 15 11:16:25 2017 -0400

Add support for EUI-64 MAC addresses as macaddr8

Also, patch 0002 needs to be updated to consider macaddr8; I attached
the diff.  A complete patch (including the 0002_addition fix) rebased to
the current master is attached as well.

It works as expected; the code itself looks clear and well documented.

The only issue I've found is a make check failure in the
contrib/btree_gin subdirectory:


test numeric  ... ok
test enum ... /bin/sh: 1: cannot open /home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/sql/enum.sql: No such file
diff: /home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out: No such file or directory
diff command failed with status 512: diff "/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/expected/enum.out" "/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out" > "/home/anastasia/newprojects/original_repo/postgres/contrib/btree_gin/results/enum.out.diff"



Please add a regression test for btree_gin, and then this patch can be
considered "Ready for committer".


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/btree_gist/btree_macaddr8.c b/contrib/btree_gist/btree_macaddr8.c
index 13238ef..96afbcd 100644
--- a/contrib/btree_gist/btree_macaddr8.c
+++ b/contrib/btree_gist/btree_macaddr8.c
@@ -28,37 +28,37 @@ PG_FUNCTION_INFO_V1(gbt_macad8_same);
 
 
 static bool
-gbt_macad8gt(const void *a, const void *b)
+gbt_macad8gt(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_gt, PointerGetDatum(a), PointerGetDatum(b)));
 }
 static bool
-gbt_macad8ge(const void *a, const void *b)
+gbt_macad8ge(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_ge, PointerGetDatum(a), PointerGetDatum(b)));
 }
 
 static bool
-gbt_macad8eq(const void *a, const void *b)
+gbt_macad8eq(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_eq, PointerGetDatum(a), PointerGetDatum(b)));
 }
 
 static bool
-gbt_macad8le(const void *a, const void *b)
+gbt_macad8le(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_le, PointerGetDatum(a), PointerGetDatum(b)));
 }
 
 static bool
-gbt_macad8lt(const void *a, const void *b)
+gbt_macad8lt(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	return DatumGetBool(DirectFunctionCall2(macaddr8_lt, PointerGetDatum(a), PointerGetDatum(b)));
 }
 
 
 static int
-gbt_macad8key_cmp(const void *a, const void *b)
+gbt_macad8key_cmp(const void *a, const void *b, FmgrInfo *flinfo)
 {
 	mac8KEY    *ia = (mac8KEY *) (((const Nsrt *) a)->t);
 	mac8KEY    *ib = (mac8KEY *) (((const Nsrt *) b)->t);
@@ -142,7 +142,7 @@ gbt_macad8_consistent(PG_FUNCTION_ARGS)
 	key.upper = (GBT_NUMKEY *) &kkk->upper;
 
 	PG_RETURN_BOOL(
-   gbt_num_consistent(&key, (void *) query, &strategy, GIST_LEAF(entry), &tinfo)
+   gbt_num_consistent(&key, (void *) query, &strategy, GIST_LEAF(entry), &tinfo, fcinfo->flinfo)
 		);
 }
 
@@ -154,7 +154,7 @@ gbt_macad8_union(PG_FUNCTION_ARGS)
 	void	   *out = palloc0(sizeof(mac8KEY));
 
 	*(int *) PG_GETARG_POINTER(1) = sizeof(mac8KEY);
-	PG_RETURN_POINTER(gbt_num_union((void *) out, entryvec, &tinfo));
+	PG_RETURN_POINTER(gbt_num_union((void *) out, entryvec, &tinfo, fcinfo->flinfo));
 }
 
 
@@ -184,7 +184,7 @@ gbt_macad8_picksplit(PG_FUNCTION_ARGS)
 	PG_RETURN_POINTER(gbt_num_picksplit(
 	(GistEntryVector *) PG_GETARG_POINTER(0),
 	  (GIST_SPLITVEC *) PG_GETARG_POINTER(1),
-		&tinfo
+		&tinfo, fcinfo->flinfo
 		));
 }
 
@@ -195,6 +195,6 @@ gbt_macad8_same(PG_FUNCTION_ARGS)
 	mac8KEY*b2 = (mac8KEY *) PG_GETARG_POINTER(1);
 	bool	   *result = (bool *) PG_GETARG_POINTER(2);
 
-	*result = gbt_num_same((void *) b1, (void *) b2, &tinfo);
+	*result = gbt_num_same((void *) b1, (void *) b2, &tinfo, fcinfo->flinfo);
 	PG_RETURN_POINTER(result);
 }
commit 50f4a4efcb9ed3ae5e0378676459c6d1ce455bd2
Author: Anastasia 
Date:   Tue Mar 21 10:11:37 2017 +0300

complete btree_gin_gist_enum patch

diff --git a/contrib/btree_gin/Makefile b/contrib/btree_gin/Makefile
index f22e4af..589755f 100644
--- a/contrib/btree_gin/Make

Re: [HACKERS] Proposal: GetOldestXminExtend for ignoring arbitrary vacuum flags

2017-03-21 Thread Haribabu Kommi
On Tue, Mar 21, 2017 at 3:16 PM, Seki, Eiji wrote:
>
>
> Thank you for your review.
>
> I reflected your comment and attach the updated patch.


Thanks for the updated patch.

+/* Use these flags in GetOldestXmin as "flags" */

How about something like the following?
/* Use the following flags as an input "flags" to the GetOldestXmin function */


+/* Ignore vacuum backends (Note: this also ignores analyze with vacuum backends) */
+#define PROCARRAY_FLAGS_VACUUM PROCARRAY_FLAGS_DEFAULT | PROCARRAY_VACUUM_FLAG
+/* Ignore analyze backends (Note: this also ignores vacuum with analyze backends) */
+#define PROCARRAY_FLAGS_ANALYZE PROCARRAY_FLAGS_DEFAULT | PROCARRAY_ANALYZE_FLAG

Whenever the above flags are passed to the GetOldestXmin() function,
it just verifies whether any one of the flags is set in the backend
flags or not.  Also, the PROC_IN_ANALYZE flag is actually set when an
analyze operation starts and is reset at its end.  I feel it is not
required to mention the Note section.

+/* Ignore vacuum backends and analyze ones */

How about changing it to "Ignore both vacuum and analyze backends"?


Regards,
Hari Babu
Fujitsu Australia


[HACKERS] segfault in hot standby for hash indexes

2017-03-21 Thread Jeff Janes
Against an unmodified HEAD (17fa3e8), I got a segfault in the hot standby.

Using the attached files, I start the test case like this:

nice sh do_nocrash_sr.sh >& do_nocrash_sr.err &

And start the replica like:

rm -r /tmp/data2_replica/ ;
psql -p 9876 -c "select pg_create_physical_replication_slot('foo')";
~/pgsql/pure_origin/bin/pg_basebackup -p 9876 -D /tmp/data2_replica -R -S foo;
~/pgsql/pure_origin/bin/pg_ctl start -D /tmp/data2_replica/ -o "--port=9874"

After less than a minute, the replica fails.

#0  0x004b85fe in hash_xlog_vacuum_get_latestRemovedXid (record=0x1925418) at hash_xlog.c:1006
1006			iitemid = PageGetItemId(ipage, unused[i]);
(gdb) bt
#0  0x004b85fe in hash_xlog_vacuum_get_latestRemovedXid (record=0x1925418) at hash_xlog.c:1006
#1  0x004b881f in hash_xlog_vacuum_one_page (record=0x1925418) at hash_xlog.c:1113
#2  0x004b8bed in hash_redo (record=0x1925418) at hash_xlog.c:1217
#3  0x0051a96c in StartupXLOG () at xlog.c:7152
#4  0x00789ffb in StartupProcessMain () at startup.c:216
#5  0x0052d617 in AuxiliaryProcessMain (argc=2, argv=0x7fffe7661500) at bootstrap.c:421
#6  0x007890cf in StartChildProcess (type=StartupProcess) at postmaster.c:5256
#7  0x0078419d in PostmasterMain (argc=4, argv=0x18fc300) at postmaster.c:1329
#8  0x006cd78e in main (argc=4, argv=0x18fc300) at main.c:228

'unused' is NULL at that point.

Cheers,

Jeff


count.pl
Description: Binary data


do_nocrash_sr.sh
Description: Bourne shell script



Re: [HACKERS] Partition-wise aggregation/grouping

2017-03-21 Thread Antonin Houska
Jeevan Chalke  wrote:

> Declarative partitioning is supported in PostgreSQL 10 and work is already in
> progress to support partition-wise joins. Here is a proposal for 
> partition-wise
> aggregation/grouping. Our initial performance measurement has shown 7 times
> performance when partitions are on foreign servers and approximately 15% when
> partitions are local.
> 
> Partition-wise aggregation/grouping computes aggregates for each partition
> separately. If the group clause contains the partition key, all the rows
> belonging to a given group come from one partition, thus allowing aggregates
> to be computed completely for each partition. Otherwise, partial aggregates
> computed for each partition are combined across the partitions to produce the
> final aggregates. This technique improves performance because:

> i. When partitions are located on foreign server, we can push down the
> aggregate to the foreign server.

> ii. If hash table for each partition fits in memory, but that for the whole
> relation does not, each partition-wise aggregate can use an in-memory hash
> table.

> iii. Aggregation at the level of partitions can exploit properties of
> partitions like indexes, their storage etc.

I suspect this overlaps with

https://www.postgresql.org/message-id/29111.1483984605%40localhost

I'm working on the next version of the patch, which will be able to aggregate
the result of both base relation scans and joins. I'm trying hard to make the
next version available before an urgent vacation that I'll have to take at a
random date between today and early April. I suggest that we coordinate the
effort; it's a lot of work in any case.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-21 Thread Amit Kapila
On Tue, Mar 21, 2017 at 1:28 PM, Jeff Janes  wrote:
> Against an unmodified HEAD (17fa3e8), I got a segfault in the hot standby.
>

I think I see the problem in hash_xlog_vacuum_get_latestRemovedXid().
It seems to me that we are using one block_id for registering the
deleted items in the XLOG_HASH_VACUUM_ONE_PAGE record and a different
block_id for fetching those items in
hash_xlog_vacuum_get_latestRemovedXid().  So probably matching those
will fix this issue (instead of fetching the block number and items
from block_id 1, we should use block_id 0).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Implementing delete in columnar store fdw

2017-03-21 Thread sri harsha
Hello,

   I want to implement delete functionality for a column-store FDW in
Postgres, similar to file_fdw. I want to use the
"AddForeignUpdateTargets" function to implement this, but the junk
filter shouldn't be a column present in the table. Is it possible to add
an Expr/Var to the junk filter that is not present in the schema of the
table, one that I will generate on the fly?

If yes, how can I do that? Thanks in advance.

Regards,
Sri Harsha


Re: [HACKERS] Asymmetry between parent and child wrt "false" quals

2017-03-21 Thread Amit Langote
On 2017/03/21 14:59, Ashutosh Bapat wrote:
> When I run a query like below on a child-less table, the plan comes out to be
> 
> explain verbose SELECT * FROM uprt1_l WHERE a = 1 AND a = 2;
>                          QUERY PLAN
> ----------------------------------------------------------------
>  Result  (cost=0.00..11.50 rows=1 width=13)
>    Output: a, b, c
>    One-Time Filter: false
>    ->  Seq Scan on public.uprt1_l  (cost=0.00..11.50 rows=1 width=13)
>          Output: a, b, c
>          Filter: (uprt1_l.a = 1)
> (6 rows)
> 
> whereas the same query run on a parent with children gives this plan:
> postgres=# \d prt1_l
>               Table "public.prt1_l"
>  Column |       Type        | Collation | Nullable | Default
> --------+-------------------+-----------+----------+---------
>  a      | integer           |           | not null |
>  b      | integer           |           |          |
>  c      | character varying |           |          |
> Partition key: RANGE (a)
> Number of partitions: 3 (Use \d+ to list them.)
> 
> postgres=# explain verbose SELECT * FROM prt1_l WHERE a = 1 AND a = 2;
>                  QUERY PLAN
> -------------------------------------------
>  Result  (cost=0.00..0.00 rows=0 width=40)
>    Output: prt1_l.a, prt1_l.b, prt1_l.c
>    One-Time Filter: false
> (3 rows)
> 
> For a parent table with children, set_append_rel_size() evaluates
> restrictions in loop
>  880 foreach(l, root->append_rel_list)
>  881 {
>  882 AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
> 
> starting at 1021. If any of the restrictions evaluates to false,
> it sets the child as dummy. If all children are dummy, the appendrel is
> set to dummy.
> 
> But for a child-less table, even if the "false" qual is available in
> baserestrictinfo in set_rel_size(), we do not mark the relation as
> dummy. Instead, paths are created for it, and only at plan creation
> time do we add the gating plan when there are pseudoconstant quals.
> Why do we have different behaviours in these two cases?

I think the case where there is no child table would not be handled by
set_append_rel_size(), because rte->inh would be false.  Instead, I
thought the test at the beginning of relation_excluded_by_constraints()
would have detected this somehow; the comment there says the following:

/*
 * Regardless of the setting of constraint_exclusion, detect
 * constant-FALSE-or-NULL restriction clauses.  Because const-folding will
 * reduce "anything AND FALSE" to just "FALSE", any such case should
 * result in exactly one baserestrictinfo entry.

But the qual (a = 1 and a = 2) is *not* reduced to exactly one
constant-false-or-null baserestrictinfo entry; instead I see that there
are two RestrictInfos viz. a = 1 and const-FALSE at that point.  I think
the const-folding mentioned in the above comment does not occur after
equivalence class processing, which would be required to conclude that (a
= 1 and a = 2) reduces to constant-false.  OTOH, (a = 1 and false) can be
reduced to constant-false much earlier when performing
preprocess_qual_conditions().
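
For comparison, with a hypothetical child-less table t, a qual that is
already constant-false after const-folding is detected up front and no
scan path is built, unlike the (a = 1 and a = 2) case above (the plan
shown is roughly what one would expect):

-- const-folding reduces "anything AND FALSE" to just FALSE before
-- planning, so relation_excluded_by_constraints() proves the rel empty:
EXPLAIN SELECT * FROM t WHERE a = 1 AND false;
--  Result  (cost=0.00..0.00 rows=0 width=8)
--    One-Time Filter: false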

That said, I am not sure if it's worthwhile to modify the test at the
beginning of relation_excluded_by_constraints() to iterate over
rel->baserestrictinfo to look for any const-FALSE quals, instead of
doing it only when there is *only* the const-FALSE qual.

Thanks,
Amit






Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

2017-03-21 Thread Anastasia Lubennikova
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

As I can see, this bugfix was already discussed and reviewed.
All mentioned issues are fixed in the latest version of the patch,
so I mark it "Ready for committer".
Thank you for fixing it!

The new status of this patch is: Ready for Committer



Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-21 Thread Ideriha, Takeshi
Hi,
I like your suggestion and took a look at your patch, though I'm not an
expert on psql.

I like the idea of taking advantage of the linestyle utilities
to implement the rst and markdown formats efficiently instead of newly
developing pset format machinery.
But I think the two comments below need to be changed to something not
focused only on linestyle.
That's because they really take care of both \pset linestyle and \pset
format, and the current wording may mislead readers.

---
/* Line style control structures */
const printTextFormat pg_markdown =

/* get selected or default line style */
const printTextFormat *
get_line_style(const printTableOpt *opt)
---

The remaining points are about code style conventions.
- There are some indents with whitespace around skip_leading_spaces_print(),
  but PostgreSQL convention says indents should use 4-column tabs.
  https://www.postgresql.org/docs/devel/static/source-format.html

- On the other hand, in the docs there are some tab indents, but whitespace
  indentation is preferable; looking around the sgml files, whitespace is used.

- Some multi-line comments also need fixing according to the above
  documentation (link).

- And I also found the patch cannot be applied to the current master.

Regards,
Ideriha, Takeshi

From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Jan Michalek
Sent: Sunday, March 19, 2017 5:10 AM
To: Peter Eisentraut 
Cc: Pavel Stehule ; PostgreSQL mailing lists 

Subject: Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki



2017-03-12 19:21 GMT+01:00 Jan Michálek <godzilalal...@gmail.com>:


2017-03-10 9:43 GMT+01:00 Jan Michálek <godzilalal...@gmail.com>:


2017-03-09 20:10 GMT+01:00 Peter Eisentraut <peter.eisentr...@2ndquadrant.com>:
This is looking pretty neat.  I played around with it a bit.  There are
a couple of edge cases that you need to address, I think.

Thanks, the original code is very clear and well prepared for adding new
formats.


- Does not support \x

I know; I don't know if \x makes sense in this case. I will look at how it is
done in other formats like html. I think that it should work in the sense that
a table generated as rst should give similar output after processing to the
output of the html format.

I prepared something like this (I have not prepared a diff; I need to do some
other changes).
There are a few things I need to do. The first problem is bold column names; I
should do it in the same fashion as "RECORD", but I need to do some research
about column lengths.
A bigger problem is with tab indents; the rst processor doesn't work with them
in this case.

The new diff adds \x for rst and skips leading spaces in rst in both.
make check passed.

Jan


jelen=# execute q \g | xclip
+---------+--------------------------------------------------------------+
| **RECORD 1**                                                           |
+---------+--------------------------------------------------------------+
| column1 | Elephant, kangaroo,                                          |
|         | squirrel, gorilla                                            |
+---------+--------------------------------------------------------------+
| column2 | 121                                                          |
+---------+--------------------------------------------------------------+
| column3 | 1.0035971223021583                                           |
+---------+--------------------------------------------------------------+
| column4 | 0.                                                           |
+---------+--------------------------------------------------------------+
| column5 | Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello |
+---------+--------------------------------------------------------------+
| **RECORD 2**                                                           |
+---------+--------------------------------------------------------------+
| column1 | goat, rhinoceros,                                            |
|         | monkey, ape                                                  |
+---------+--------------------------------------------------------------+
| column2 | 11121                                                        |
+---------+--------------------------------------------------------------+
| column3 | 1.0007824726134585                                           |
+---------+--------------------------------------------------------------+
| column4 | 5.                                                           |
+---------+--------------------------------------------------------------+
| column5 | xx xx xx xx xx xx xx xx xx xx                                |
+---------+--------------------------------------------------------------+

Re: [HACKERS] extended statistics: n-distinct

2017-03-21 Thread Kyotaro HORIGUCHI
Thank you for finishing this.

At Mon, 20 Mar 2017 16:02:20 -0300, Alvaro Herrera  
wrote in <20170320190220.ixlaueanxegqd5gr@alvherre.pgsql>
> Here is a closer to final version of the multivariate statistics series,
> last posted at
> https://www.postgresql.org/message-id/20170316222033.ncdi7nidah2gdzjx%40alvherre.pgsql

I'm sorry, but this seems to conflict with the current master (17fa3e8)
and with a3eac988c267, which is the base of v28.

> If you've always wanted to review multivariate stats, but never found a
> good reason to, now is a terrific time to do so!  (In other words: I
> plan to get this pushed in the not too distant future.)

Great! But sorry for not having contributed very much.

> This is a new thread to present a version of the n-distinct patch that
> IMO is close enough to commit.  There are some work items still.
> There's some discussion on the topic of cross-column statistics:
> https://wiki.postgresql.org/wiki/Cross_Columns_Stats
> 
> This problem is important enough that Kyotaro Horiguchi submitted
> another patch that does the same thing:
> https://www.postgresql.org/message-id/flat/20150828.173334.114731693.horiguchi.kyotaro%40lab.ntt.co.jp
> This patch aims to provide the same functionality, keeping the design
> general enough that other kinds of statistics can be added later (such
> as functional dependencies, histograms and MCVs, all of which have been
> previously submitted as patches by Tomas).

I may be stupid, but I don't get the picture here, specifically
regarding the relation to Tomas's patch. Does this work as
infrastructure for Tomas's mv patch? Or do they have some other
relationship?

> To recap, what this patch provides is a new command of the form
>CREATE STATISTICS statname [WITH (opts)] ON (columns) FROM table
> 
> Note that we put the table name in a separate FROM clause instead of
> together with the column name, so that this is more readily extensible
> to things that are not just columns, for example expressions that might
> involve more than one table (per review from Dean Rasheed).  Currently,
> only one table is supported.
> 
> In this patch, the "opts" can only be "ndistinct", which creates a
> pg_statistic_ext row with the number of distinct groups found in all
> possible combination across that set of columns.  This can be used when
> a GROUP BY or a DISTINCT clause need to estimate the number of distinct
> groups in an aggregation.
> 
> 
> 
> Some things left to change:
> 
> * Currently, we use the ndistinct value only if the grouping uses
> exactly the set of columns covered by a statistics.  For example, if we
> have stats on (a,b,c) and the grouping is on (a,b,c,d), we fall back to
> the old method, which may result in worse results than if we used the
> number we know about (a,b,c) then applied a fixup to consider the
> distinctness of (d).

Are you planning to implement corrected estimation of joins
confused by strong correlations?

> * Also, estimate_num_groups() looks a bit patchy.  With slightly more
> invasive changes we can make it look more natural.
> 
> * I'm not terribly happy with the header organization.  I think
> VacAttrStats should be in its own (new) src/include/statistics/analyze.h
> for example (which cleans up a bunch of existing stuff a bit), and the
> new files could do with some slight makeover.
> 
> * The current code uses AttrNumber * and int2vector, in places where it
> would be more convenient to use Bitmapsets.
> 
> * We currently try to keep a stats object even if a column in it is
> dropped -- for example, if we have stats on (a,b,c) and drop (b), then
> we still have stats on (a,c).  While this is nice, it creates a bunch of
> weird corner cases, so I'm going to rip that out and just drop the
> statistics instead.  If the user wants stats on (a,c) to remain, they
> can create it after (or before) dropping the column.
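
For reference, the command form quoted above would be used like this
(the statistics, table, and column names here are hypothetical):

CREATE STATISTICS s1 WITH (ndistinct) ON (a, b) FROM t1;

-- the stored distinct-group count for (a, b) can then improve the
-- row estimate for grouping queries such as:
SELECT a, b, count(*) FROM t1 GROUP BY a, b;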

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Partitioned tables and relfilenode

2017-03-21 Thread Amit Langote
On 2017/03/21 1:16, Robert Haas wrote:
> On Fri, Mar 17, 2017 at 4:57 AM, Amit Langote
>  wrote:
>>> Yes, but on the flip side, you're having to add code in a lot of
>>> places -- I think I counted 7 -- where you turn around and ignore
>>> those AppendRelInfos.
>>
>> Perhaps you were looking at the previous version with "minimal" appinfos
>> containing the child_is_partitioned field?
> 
> Yes, I think I was.  I think this version looks a lot better.

Just to clarify, I assume you reviewed the latest version which does not
create AppendRelInfos, but instead creates PartitionedChildRelInfos (as
also evident from your comments below).  Sorry about the confusion.

>  /*
> + * Close the root partitioned rel if we opened it above, but keep the
> + * lock.
> + */
> +	if (rel != mtstate->resultRelInfo->ri_RelationDesc)
> +		heap_close(rel, NoLock);
> 
> We didn't take a lock above, though, so drop everything in the comment
> from "but" onward.

Oh, right.

> -add_paths_to_append_rel(root, rel, live_childrels);
> +add_paths_to_append_rel(root, rel, live_childrels, partitioned_rels);
> 
> I think it would make more sense to put the new logic into
> add_paths_to_append_rel, instead of passing this down as an additional
> parameter.

OK, done.

> + * do not appear anywhere else in the plan.  Situation is exactly the
> 
> The situation is...

Fixed.

> 
> +	if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
> +	{
> +		foreach(lc, root->pcinfo_list)
> +		{
> +			PartitionedChildRelInfo *pc = lfirst(lc);
> +
> +			if (pc->parent_relid == parentRTindex)
> +			{
> +				partitioned_rels = pc->child_rels;
> +				break;
> +			}
> +		}
> +	}
> 
> You seem to have a few copies of this logic.  I think it would be
> worth factoring it out into a separate function.

OK, done.  Put the definition in planner.c.

> +	root->glob->nonleafResultRelations =
> +		list_concat(root->glob->nonleafResultRelations,
> +					list_copy(splan->partitioned_rels));
> 
> Please add a brief comment.  One line is fine.

Done.

> 
> +newrc->isParent = childrte->relkind == RELKIND_PARTITIONED_TABLE;
> 
> I'm not sure what project style is, but I personally find these kinds
> of assignments easier to read with an extra set of parentheses:
> 
> newrc->isParent = (childrte->relkind == RELKIND_PARTITIONED_TABLE);

Ah, done.

> 
> +	if (partitioned_rels == NIL)
> +		return;
> +
> +	foreach(lc, partitioned_rels)
> 
> I think the if-test is pointless; the foreach loop is going to start
> by comparing the initial value with NIL.

Right, fixed.

> Why doesn't ExecSerializePlan() need to transfer a proper value for
> nonleafResultRelations to workers?  Seems like it should.

It doesn't transfer resultRelations either, presumably because workers do
not handle result relations yet.  Also, both resultRelations and
nonleafResultRelations are set *only* if there is a ModifyTable node.

Attached updated patches.

Thanks,
Amit
From 28fe97f6a831485f5f85d20ec6a3e0df066ad612 Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 10 Mar 2017 13:48:31 +0900
Subject: [PATCH 1/2] Avoid creating scan nodes for partitioned tables

There are quite a few changes here:

* Currently, we create scan nodes for inheritance parents in their role
  as an inheritance set member.  Partitioned tables do not contain any
  data, so it's useless to create scan nodes for them.  So, it's not
  necessary to create AppendRelInfos for partitioned child tables in
  expand_inherited_rtentry().  We do still create the RTEs and record
  the RT indexes in a list.  The list is associated with the parent_relid
  using the new PartitionedChildRelInfo nodes, which are kept in a global
  list called pcinfo_list in the root PlannerInfo, similar to
  append_rel_list.  For a given parent RTE, add_paths_to_append_rel() and
  inheritance_planner() must copy the aforementioned list of partitioned
  child RT indexes to Append/MergeAppend and ModifyTable paths,
  respectively.  The list is then carried over to a new field called
  partitioned_rels in Append, MergeAppend, and ModifyTable nodes.

* For partitioned parent RTEs with inh=false, set_rel_size() must not
  try to do set_plain_rel_size().  Instead mark such rels as "dummy",
  because they are empty.  Happens with partitioned tables without any
  leaf partitions.

* In case of UPDATE/DELETE on a partitioned table, executor can already
  know the leaf result relations with PlannedStmt.resultRelations, but
  not the non-leaf relations.  InitPlan must lock *all* the result
  relations before initializing the plan tree to avoid lock upgrade
  hazards, so we must include non-leaf result relations as well.  For
  that, add a new field nonleafResultRelations to PlannedStmt, which
  is initialized by copying ModifyTable.partitioned_rels.

* PlanRowMar

Re: [HACKERS] postgres_fdw: support parameterized foreign joins

2017-03-21 Thread Etsuro Fujita

On 2017/03/16 22:23, Arthur Zakirov wrote:

2017-02-27 12:40 GMT+03:00 Etsuro Fujita :

I'd like to propose to support parameterized foreign joins.  Attached is a
patch for that, which has been created on top of [1].



Can you rebase the patch? It does not apply now.


Ok, will do.  Thanks for the report!

Best regards,
Etsuro Fujita






Re: [HACKERS] postgres_fdw bug in 9.6

2017-03-21 Thread Etsuro Fujita

On 2017/03/17 0:37, David Steele wrote:

This patch does not apply cleanly at cccbdde:



Marked "Waiting on Author".


Ok, I'll update the patch.  One thing I'd like to revise in addition to 
that is (1) add to JoinPathExtraData a flag member to indicate whether 
to give the FDW a chance to consider a remote join, which will be set to 
true if the joinrel's fdwroutine is not NULL and the fdwroutine's 
GetForeignJoinPaths is not NULL, and (2) if the flag is true, save info 
to create an alternative local join path, such as hashclauses and 
mergeclauses proposed in the patch, into JoinPathExtraData in 
add_paths_to_joinrel.  This would avoid useless overhead in saving such 
info into JoinPathExtraData when we don't give the FDW that chance.


Best regards,
Etsuro Fujita






Re: [HACKERS] Push down more full joins in postgres_fdw

2017-03-21 Thread Etsuro Fujita

On 2017/03/17 2:35, Robert Haas wrote:

And ... I don't see anything to complain about, so, committed.


Thanks for committing, Robert!  Thanks for reviewing, Ashutosh and David!

Best regards,
Etsuro Fujita






Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-21 Thread Pavel Stehule
2017-03-21 9:59 GMT+01:00 Ideriha, Takeshi :

> Hi,
>
> I like your suggestion and took a look at your patch though I’m not the
> expert about psql.
>
>
>
> I like the idea taking advantage of linestyle utilities
> to implement rst and markdown format efficiently instead of newly
> developing pset format things.
>
> But I'm thinking two comments below needs change to something about not
> focusing only linestyle.
>
> That's because they really take care of both '\pset linestyle and \pset
> format' and it may lead to misunderstanding to readers.
>

I am not sure if linestyle is enough - the markdown alignment is not visual.

regards

Pavel

Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-21 Thread Jan Michálek
2017-03-21 9:59 GMT+01:00 Ideriha, Takeshi :

> Hi,
>
> I like your suggestion and took a look at your patch though I’m not the
> expert about psql.
>
>
>
> I like the idea taking advantage of linestyle utilities
>
> to implement rst and markdown format efficiently instead of newly
> developing pset format things.
>
> But I'm thinking two comments below needs change to something about not
> focusing only linestyle.
>
> That's because they really take care of both '\pset linestyle and \pset
> format' and it may lead to misunderstanding to readers.
>
>
>
> ---
>
> /* Line style control structures */
>
> const printTextFormat pg_markdown =
>
>
>
> /* get selected or default line style */
>
> const printTextFormat *
>
> get_line_style(const printTableOpt *opt)
>
> ---
>

Is it in command.c?

I have done it so that \pset format changes the linestyle:

psql (9.6.2, server 9.6.1)
Type "help" for help.

jelen=# \pset linestyle ascii
Line style is ascii.
jelen=# \pset format rst
Output format is rst.
jelen=# \pset linestyle
Line style is rst.
jelen=#

Peter wrote that this is not right, but I don't know how it should look,
because most of this is done via linestyle; format is used only for the
switch from the console.
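
To illustrate the intent, a hypothetical session with the patch applied
might look like the following (the markdown output shown here is only a
sketch of the general idea, not output taken from the patch):

jelen=# \pset format markdown
Output format is markdown.
jelen=# SELECT 1 AS a, 'foo' AS b;
| a | b   |
|---|-----|
| 1 | foo |
(1 row)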



>
>
> The rest things are about code style convention.
>
> - there are some indents with white spaces around
> skip_leading_spaces_print()
>
>   but Postgresql conventions says indents should be with 4 column tab.
>
>   https://www.postgresql.org/docs/devel/static/source-format.html
>

Thanks, I often use 4 spaces (I have it set in vim), but in other code I
found mostly 8-space indents. I will look at this. I used code from other
functions (fputnbytes, print_html_escaped) as a template.


>
>
> - On the other hand, in docs there are some tab indent
>
>   but white space indenet is preferable. Looking around sgml files, white
> space is used.
>
>
>
> - some multi-line comment style also needs fix according to the above
> documentation (link)
>

I will look at the comments; this is only a working version, and coding style
issues will be corrected (some comments are only for my own orientation in the
code).


>
>
> - And I also found patch cannot be applied to current master.
>

I have the 9.6.2 source code. Is that not correct? Where do I find the source
code I should use?

Have nice day
Jan



Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-21 Thread Anastasia Lubennikova

A patch rebased to the current master is attached.
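
For context, a minimal usage sketch of the INCLUDE syntax this patch adds
(the table, column, and index names here are hypothetical):

CREATE TABLE tbl (a int, b int, c text);
-- "a" is the only key column enforcing uniqueness; "b" and "c" are
-- carried in the index tuples so that index-only scans can return them:
CREATE UNIQUE INDEX tbl_a_idx ON tbl (a) INCLUDE (b, c);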

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 497d52b713dd8f926b465ddf22f21db7229b12e3
Author: Anastasia 
Date:   Tue Mar 21 12:58:13 2017 +0300

include_columns_10.0_v4.patch

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index c1e9089..5c80e3b 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1480,7 +1480,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1506,7 +1506,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_PP(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1526,9 +1526,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2016,10 +2016,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2029,8 +2029,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
 
@@ -2051,12 +2051,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-				result = (char **) palloc(*numatts * sizeof(char *));
+				result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-				for (i = 0; i < *numatts; i++)
+				for (i = 0; i < *indnkeyatts; i++)
 					result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 1241108..943e410 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary && IndexIsValid(index))
 		{
-			int			numatts = index->indnatts;
+			int indnkeyatts = index->indnkeyatts;
 
-			if (numatts > 0)
+			if (indnkeyatts > 0)
 			{
 int			i;
 
@@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 				appendStringInfoCharMacro(payload, ',');
 				appendStringInfoCharMacro(payload, operation);
 
-				for (i = 0; i < numatts; i++)
+				for (i = 0; i < indnkeyatts; i++)
 				{
 					int			colno = index->indkey.values[i];
 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index df0435c..e196e20 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3620,6 +3620,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns (as opposed to "included" columns).
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index ac51258..f9539e9 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -114,6 +114,8 @@ typedef struct IndexAmRoutine
 	bool		amcanparallel;
 	/* type of data stored in index, or InvalidOid if variable */
 	Oid			amkeytype;
+	/* does AM support columns included with clause INCLUDE? */
+	bool		amca

Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-21 Thread Pavel Stehule
2017-03-21 10:59 GMT+01:00 Jan Michálek :

> [...]
> I have 9.6.2 source code. It is not correct? Where i find source code i
> should use?
>

Please use git master: https://wiki.postgresql.org/wiki/Working_with_Git

Regards

Pavel


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-21 Thread Rafia Sabih
On Wed, Mar 15, 2017 at 8:55 PM, Robert Haas  wrote:
> Note this:
>
> if (completed || !fcache->returnsSet)
>     postquel_end(es);
>
> When the SQL function doesn't return a set, then we can allow
> parallelism even when lazyEval is set, because we'll only call
> ExecutorStart() once.  But my impression is that something like this:

Well, when I test the following SQL function, I see it cannot be
parallelised because lazyEval is true for it, though it does not
return a set:

CREATE OR REPLACE FUNCTION not_parallel()
RETURNS bigint AS $$
  SELECT count(distinct i) FROM t WHERE j = 12;
$$ LANGUAGE sql;

Query Text:
 SELECT count(distinct i) FROM t WHERE j = 12;
Aggregate  (cost=34.02..34.02 rows=1 width=8) (actual time=0.523..0.523 rows=1 loops=1)
  ->  Seq Scan on t  (cost=0.00..34.01 rows=1 width=4) (actual time=0.493..0.493 rows=0 loops=1)
        Filter: (j = 12)
        Rows Removed by Filter: 2001
2017-03-21 15:24:03.378 IST [117823] CONTEXT:  SQL function "already_parallel" statement 1
2017-03-21 15:24:03.378 IST [117823] LOG:  duration: 94868.181 ms  plan:
Query Text: select already_parallel();
Result  (cost=0.00..0.26 rows=1 width=8) (actual time=87981.047..87981.048 rows=1 loops=1)

 already_parallel
------------------
                0
(1 row)

As far as my understanding goes, in this case lazyEvalOK is set in
fmgr_sql irrespective of whether the function returns a set or not,

else
{
	randomAccess = false;
	lazyEvalOK = true;
}

then it is passed to init_sql_fcache, which passes it on to
init_execution_state, where fcache->lazyEval is set:

if (lasttages && fcache->junkFilter)
{
	lasttages->setsResult = true;
	if (lazyEvalOK &&
		lasttages->stmt->commandType == CMD_SELECT &&
		!lasttages->stmt->hasModifyingCTE)
		fcache->lazyEval = lasttages->lazyEval = true;
}

Finally, this lazyEval is passed to ExecutorRun in postquel_getnext,
which restricts parallelism by passing execute_once = false:

/* Run regular commands to completion unless lazyEval */
uint64 count = (es->lazyEval) ? 1 : 0;

ExecutorRun(es->qd, ForwardScanDirection, count, !es->lazyEval);

So, my concern is why such a query should not execute in parallel when
it is in a SQL function. If I run this same query from a PL/pgSQL
function, then it can run in parallel:

CREATE OR REPLACE FUNCTION not_parallel()
RETURNS bigint AS $$
declare cnt int:=0;
BEGIN
  SELECT count(distinct i) into cnt FROM t WHERE j = 12;
  RETURN cnt;
END;
$$ LANGUAGE plpgsql;

select not_parallel();
2017-03-21 15:28:56.282 IST [123086] LOG:  duration: 0.003 ms  plan:
Query Text: SELECT count(distinct i)  FROM t WHERE j = 12
Parallel Seq Scan on t  (cost=0.00..19.42 rows=1 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
 Filter: (j = 12)
2017-03-21 15:28:56.282 IST [123087] LOG:  duration: 0.003 ms  plan:
Query Text: SELECT count(distinct i)  FROM t WHERE j = 12
Parallel Seq Scan on t  (cost=0.00..19.42 rows=1 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
 Filter: (j = 12)
2017-03-21 15:28:57.530 IST [117823] LOG:  duration: 1745.372 ms  plan:
Query Text: SELECT count(distinct i)  FROM t WHERE j = 12
Aggregate  (cost=19.42..19.43 rows=1 width=8) (actual
time=1255.743..1255.743 rows=1 loops=1)
 ->  Gather  (cost=0.00..19.42 rows=1 width=4) (actual
time=1255.700..1255.700 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on t  (cost=0.00..19.42 rows=1 width=4)
(actual time=418.443..418.443 rows=0 loops=3)
 Filter: (j = 12)
 Rows Removed by Filter: 667
2017-03-21 15:28:57.530 IST [117823] CONTEXT:  SQL statement "SELECT
count(distinct i)  FROM t WHERE j = 12"
PL/pgSQL function not_parallel() line 4 at SQL statement
2017-03-21 15:28:57.531 IST [117823] LOG:  duration: 2584.282 ms  plan:
Query Text: select not_parallel();
Result  (cost=0.00..0.26 rows=1 width=8) (actual
time=2144.315..2144.316 rows=1 loops=1)
 not_parallel
--
0
(1 row)

Hence, it appears lazyEval is the main reason behind this, and in my
opinion it should definitely be fixed.
Please enlighten me with your comments/opinions.

Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect and hash indexes

2017-03-21 Thread Amit Kapila
On Tue, Mar 21, 2017 at 10:15 AM, Ashutosh Sharma  wrote:
>>
>> I think it is not just happening for freed overflow pages but also for newly
>> allocated bucket pages. It would be good if we could mark a freed
>> overflow page as an UNUSED page rather than just initialising its header
>> portion and leaving the page type in the special area as it is. Attached
>> is the patch '0001-mark_freed_ovflpage_as_UNUSED_pagetype.patch' that
>> marks a freed overflow page as an unused page.
>>
>> Also, I have now changed pageinspect module to handle unused and empty
>> hash index page. Attached is the patch
>> (0002-allow_pageinspect_handle_UNUSED_OR_EMPTY_hash_pages.patch) for
>> the same.
>>
>> [1] - 
>> https://www.postgresql.org/message-id/CAE9k0P%3DN%2BJjzqnHqrURE7ZQMgySRpv%3DBkjafbz%3DpeD4cbCgbhA%40mail.gmail.com
>>

I have yet to review your patches, let's first try to clear other
things before doing so.

>
> I think when expanding the hash index table we only initialise the
> pages that will be in use after the split, plus the last block in the index.
> The initial few pages, where tuples will be moved from old to new pages,
> have no issues, but the last block, which is just initialised and is not
> marked as a hash page, needs to be handled along with the freed overflow
> page.
>

I think the PageIsEmpty() check in hashfuncs.c is sufficient for that.  I
don't see any value in treating the last page allocated during
_hash_alloc_buckets() any differently from the pages prior to it, which
will get allocated when required.
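
For illustration, this is the kind of page the check covers, poked at from
SQL with the pageinspect hash functions (the index name and block number
here are hypothetical, and this is only a sketch of the behaviour under
discussion):

    CREATE EXTENSION IF NOT EXISTS pageinspect;
    -- block 10 stands in for a bucket page pre-allocated by
    -- _hash_alloc_buckets() but never yet used; with the unused/empty
    -- handling discussed here this reports a page type instead of erroring
    SELECT hash_page_type(get_raw_page('hash_idx', 10));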


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-21 Thread Rajkumar Raghuwanshi
> On Mon, Mar 20, 2017 at 1:19 PM, Ashutosh Bapat
>  wrote:

I have created some tests to cover partition-wise joins with
postgres_fdw, and also verified make check.
Patch attached.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 059c5c3..f0b1a32 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -7181,3 +7181,149 @@ AND ftoptions @> array['fetch_size=6'];
 (1 row)
 
 ROLLBACK;
+-- ===
+-- test partition-wise-joins
+-- ===
+SET enable_partition_wise_join=on;
+--range partition
+CREATE TABLE fprt1 (a int, b int, c varchar) PARTITION BY RANGE(a);
+CREATE TABLE fprt1_p1 (a int, b int, c text);
+CREATE TABLE fprt1_p2 (a int, b int, c text);
+CREATE FOREIGN TABLE ftprt1_p1 PARTITION OF fprt1 FOR VALUES FROM (0) TO (250)
+SERVER loopback OPTIONS (TABLE_NAME 'fprt1_p1');
+CREATE FOREIGN TABLE ftprt1_p2 PARTITION OF fprt1 FOR VALUES FROM (250) TO (500)
+SERVER loopback OPTIONS (TABLE_NAME 'fprt1_p2');
+INSERT INTO fprt1_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 2) i;
+INSERT INTO fprt1_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 2) i;
+ANALYZE fprt1;
+CREATE TABLE fprt2 (a int, b int, c varchar) PARTITION BY RANGE(b);
+CREATE TABLE fprt2_p1 (a int, b int, c text);
+CREATE TABLE fprt2_p2 (a int, b int, c text);
+CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
+SERVER loopback OPTIONS (TABLE_NAME 'fprt2_p1');
+CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
+SERVER loopback OPTIONS (TABLE_NAME 'fprt2_p2');
+INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
+INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
+ANALYZE fprt2;
+-- inner join three tables, all join qualified
+EXPLAIN (COSTS OFF)
+SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
+ QUERY PLAN 
+
+ Sort
+   Sort Key: t1.a, t3.c
+   ->  Append
+ ->  Foreign Scan
+   Relations: ((public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)) INNER JOIN (public.ftprt1_p1 t3)
+ ->  Foreign Scan
+   Relations: ((public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)) INNER JOIN (public.ftprt1_p2 t3)
+(7 rows)
+
+SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
+  a  |  b  |  c   
+-+-+--
+   0 |   0 | 
+ 150 | 150 | 0003
+ 250 | 250 | 0005
+ 400 | 400 | 0008
+(4 rows)
+
+-- left outer join + nullable clause
+EXPLAIN (COSTS OFF)
+SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
+ QUERY PLAN  
+-
+ Merge Append
+   Sort Key: t1.a, ftprt2_p1.b, ftprt2_p1.c
+   ->  Foreign Scan
+ Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
+(4 rows)
+
+SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
+ a | b |  c   
+---+---+--
+ 0 | 0 | 
+ 2 |   | 
+ 4 |   | 
+ 6 | 6 | 
+ 8 |   | 
+(5 rows)
+
+-- full outer join + right outer join
+EXPLAIN (COSTS OFF)
+SELECT t1.a,t2.b,t3.c FROM fprt1 t1 FULL JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) RIGHT JOIN fprt1 t3 ON (t2.a = t3.b and t2.a = t3.b) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
+ QUERY PLAN  
+-
+ Sort
+   Sort Key: t1.a, t3.c
+   ->  Hash Join
+ Hash Cond: (t3.b = t1.b)
+ ->  Append
+   ->  Seq Scan on fprt1 t3
+   ->  Foreign Scan on ftprt1_p1 t3_1
+   ->  Foreign Scan on ftprt1_p2 t3_2
+ ->  Hash
+   ->  Append
+ ->  Foreign Scan
+   Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
+ ->  Foreign Scan
+   Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
+(14 rows)
+
+SELECT t1.a,t2.b,t3.c FROM fprt1 t1 FULL JOIN fprt2 t2 ON

Re: [HACKERS] Logical Replication and Character encoding

2017-03-21 Thread Kyotaro HORIGUCHI
At Mon, 6 Mar 2017 11:13:48 +0100, Petr Jelinek  
wrote in 
> >>> Well the length is necessary to be able to add binary format support in
> >>> future so it's definitely not an omission.
> >>
> >> Right.  So it appears the right function to use here is
> >> pq_sendcountedtext().  However, this sends the strings without null
> >> termination, so we'd have to do extra processing on the receiving end.
> >> Or we could add an option to pq_sendcountedtext() to make it do what we
> >> want.  I'd rather stick to standard pqformat.c functions for handling
> >> the protocol.
> > 
> > It seems reasonable. I changed the prototype of
> > pg_sendcountedtext as the following,
> > 
> > | extern void pq_sendcountedtext(StringInfo buf, const char *str, int slen,
> > |  bool countincludesself, bool binary);
> > 
> > I think 'binary' seems fine for the parameter here. The patch
> > size increased a bit.
> > 
> 
> Hmm, I find it a bit weird that binary = true means NULL-terminated while
> false means not NULL-terminated; I would think that the behaviour would
> be exactly the opposite, given that text strings are the ones where NULL
> termination matters?

You're right. I renamed it with a more straightforward name. In addition
to that, it contained a stupid bug that sent a garbage byte when a
non-null-terminated string was not converted. The comment is also edited
to reflect the new behavior.

> > By the way, I noticed that postmaster launches logical
> > replication launcher even if wal_level < logical. Is it right?
> 
> Yes, that came up couple of times in various threads. The logical
> replication launcher is needed only to start subscriptions and
> subscriptions don't need wal_level to be logical, only publication needs
> that.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 5fe1c72..62fa70d 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -96,7 +96,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 	pq_sendcountedtext(&buf,
 	   VARDATA_ANY(t),
 	   VARSIZE_ANY_EXHDR(t),
-	   false);
+	   false, false);
 }
 break;
 
@@ -106,7 +106,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 	char	str[12];	/* sign, 10 digits and '\0' */
 
 	pg_ltoa(num, str);
-	pq_sendcountedtext(&buf, str, strlen(str), false);
+	pq_sendcountedtext(&buf, str, strlen(str), false, false);
 }
 break;
 
@@ -116,7 +116,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 	char	str[23];	/* sign, 21 digits and '\0' */
 
 	pg_lltoa(num, str);
-	pq_sendcountedtext(&buf, str, strlen(str), false);
+	pq_sendcountedtext(&buf, str, strlen(str), false, false);
 }
 break;
 
diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c
index a2ca2d7..1ebc50f 100644
--- a/src/backend/access/common/printtup.c
+++ b/src/backend/access/common/printtup.c
@@ -353,7 +353,8 @@ printtup(TupleTableSlot *slot, DestReceiver *self)
 			char	   *outputstr;
 
 			outputstr = OutputFunctionCall(&thisState->finfo, attr);
-			pq_sendcountedtext(&buf, outputstr, strlen(outputstr), false);
+			pq_sendcountedtext(&buf, outputstr, strlen(outputstr),
+			   false, false);
 		}
 		else
 		{
@@ -442,7 +443,7 @@ printtup_20(TupleTableSlot *slot, DestReceiver *self)
 		Assert(thisState->format == 0);
 
 		outputstr = OutputFunctionCall(&thisState->finfo, attr);
-		pq_sendcountedtext(&buf, outputstr, strlen(outputstr), true);
+		pq_sendcountedtext(&buf, outputstr, strlen(outputstr), true, false);
 	}
 
 	pq_endmessage(&buf);
diff --git a/src/backend/libpq/pqformat.c b/src/backend/libpq/pqformat.c
index c8cf67c..a83c73e 100644
--- a/src/backend/libpq/pqformat.c
+++ b/src/backend/libpq/pqformat.c
@@ -123,30 +123,39 @@ pq_sendbytes(StringInfo buf, const char *data, int datalen)
  * The data sent to the frontend by this routine is a 4-byte count field
  * followed by the string.  The count includes itself or not, as per the
  * countincludesself flag (pre-3.0 protocol requires it to include itself).
- * The passed text string need not be null-terminated, and the data sent
- * to the frontend isn't either.
+ * The passed text string need not be null-terminated, but should not contain
+ * an embedded null byte. sendterminator instructs whether to send a null terminator.
  * 
  */
 void
 pq_sendcountedtext(StringInfo buf, const char *str, int slen,
-   bool countincludesself)
+   bool countincludesself, bool sendterminator)
 {
-	int			extra = countincludesself ? 4 : 0;
+	int			extra_self		= countincludesself ? 4 : 0;
+	int			extra_term		= sendterminator ? 1 : 0;
 	char	   *p;
 
 	p = pg_server_to_client(str, slen);
 	if (p != str)/* actual conversion has been done? */
 	{
 		slen = strlen(p);
-		pq_sendint(buf, slen + extra, 4);
+		pq_sendint(buf, slen + 

Re: [HACKERS] Logical Replication and Character encoding

2017-03-21 Thread Kyotaro HORIGUCHI
Mmm. I shot the previous mail halfway.

At Mon, 6 Mar 2017 11:13:48 +0100, Petr Jelinek  
wrote in 
> > By the way, I noticed that postmaster launches logical
> > replication launcher even if wal_level < logical. Is it right?
> 
> Yes, that came up couple of times in various threads. The logical
> replication launcher is needed only to start subscriptions and
> subscriptions don't need wal_level to be logical, only publication needs
> that.

I understand, thanks. Anyway, the launcher's message seems to be hidden
among the startup messages :)
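
(For reference, a minimal illustration of that division of labour, with a
hypothetical connection string; only the publisher side needs
wal_level = logical:)

    -- on the publisher, which requires wal_level = logical:
    CREATE PUBLICATION mypub FOR ALL TABLES;

    -- on the subscriber, under any wal_level; starting this is what the
    -- logical replication launcher is there for:
    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=pubhost dbname=postgres'
        PUBLICATION mypub;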

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-21 Thread Dilip Kumar
On Tue, Mar 21, 2017 at 3:36 PM, Rafia Sabih
 wrote:
> On Wed, Mar 15, 2017 at 8:55 PM, Robert Haas  wrote:
>> Note this:
>>
>> if (completed || !fcache->returnsSet)
>> postquel_end(es);
>>
>> When the SQL function doesn't return a set, then we can allow
>> parallelism even when lazyEval is set, because we'll only call
>> ExecutorStart() once.  But my impression is that something like this:

How about basing the decision for execute_once on
fcache->returnsSet instead of on lazyEval?

change
+ ExecutorRun(es->qd, ForwardScanDirection, count, !es->lazyEval);
to
+ ExecutorRun(es->qd, ForwardScanDirection, count, !fcache->returnsSet);

IMHO, Robert had the same thing in mind?

>SELECT * FROM blah() LIMIT 3
>
>...will trigger three separate calls to ExecutorRun(), which is a
>problem if the plan is a parallel plan.

And you also need to test the case Robert mentioned upthread.
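
A sketch of that case (the function body here is hypothetical): with lazy
evaluation, each of the three rows fetched under the LIMIT re-enters
ExecutorRun(), which is exactly what a parallel plan cannot tolerate.

    CREATE FUNCTION blah() RETURNS SETOF int AS $$
        SELECT i FROM generate_series(1, 1000) AS s(i)
    $$ LANGUAGE sql;

    SELECT * FROM blah() LIMIT 3;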

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-21 Thread Jan Michálek
2017-03-21 11:01 GMT+01:00 Pavel Stehule :

>
>
> 2017-03-21 10:59 GMT+01:00 Jan Michálek :
>
>>
>>
>> 2017-03-21 9:59 GMT+01:00 Ideriha, Takeshi:
>>
>>> Hi,
>>>
>>> I like your suggestion and took a look at your patch though I’m not the
>>> expert about psql.
>>>
>>>
>>>
>>> I like the idea taking advantage of linestyle utilities
>>>
>>> to implement rst and markdown format efficiently instead of newly
>>> developing pset format things.
>>>
>>> But I'm thinking two comments below needs change to something about not
>>> focusing only linestyle.
>>>
>>> That's because they really take care of both '\pset linestyle and \pset
>>> format' and it may lead to misunderstanding to readers.
>>>
>>>
>>>
>>> ---
>>>
>>> /* Line style control structures */
>>>
>>> const printTextFormat pg_markdown =
>>>
>>>
>>>
>>> /* get selected or default line style */
>>>
>>> const printTextFormat *
>>>
>>> get_line_style(const printTableOpt *opt)
>>>
>>> ---
>>>
>>
>> Is it in command.c?
>>
>> I have done it so that \pset format changes the linestyle:
>>
>> psql (9.6.2, server 9.6.1)
>> Type "help" for help.
>>
>> jelen=# \pset linestyle ascii
>> Line style is ascii.
>> jelen=# \pset format rst
>> Output format is rst.
>> jelen=# \pset linestyle
>> Line style is rst.
>> jelen=#
>>
>> Peter wrote that this is not right, but I don't know how it should look,
>> because most of this is done via the linestyle; the format is only used to
>> switch it from the console.
>>
>>
>>
>>>
>>>
>>> The rest things are about code style convention.
>>>
>>> - there are some indents with white spaces around
>>> skip_leading_spaces_print()
>>>
>>>   but Postgresql conventions says indents should be with 4 column tab.
>>>
>>>   https://www.postgresql.org/docs/devel/static/source-format.html
>>>
>>
>> Thanks, I often use 4 spaces (I have it set in vim), but in other code
>> I found mostly 8-space indents used.
>> I will look into this. I used code from other functions (fputnbytes,
>> print_html_escaped) as a template.
>>
>>
>>>
>>>
>>> - On the other hand, in docs there are some tab indent
>>>
>>>   but white space indenet is preferable. Looking around sgml files,
>>> white space is used.
>>>
>>>
>>>
>>> - some multi-line comment style also needs fix according to the above
>>> documentation (link)
>>>
>>
>> I will look at the comments; this is only a working version, and coding style
>> issues will be corrected (some comments are there only for my own orientation in the code).
>>
>>
>>>
>>>
>>> - And I also found patch cannot be applied to current master.
>>>
>>
>> I have the 9.6.2 source code. Is that not correct? Where do I find the source code I
>> should use?
>>
>
> Please use the git master branch: https://wiki.postgresql.org/wiki/Working_with_Git
>
> Thanks, I will look at this over the weekend.



> Regards
>
> Pavel
>
>>
>> Have a nice day
>> Jan
>>
>>
>>>
>>>
>>> Regards,
>>>
>>> Ideriha, Takeshi
>>>
>>>
>>>
>>> *From:* pgsql-hackers-ow...@postgresql.org [mailto:
>>> pgsql-hackers-ow...@postgresql.org] *On Behalf Of *Jan Michalek
>>> *Sent:* Sunday, March 19, 2017 5:10 AM
>>> *To:* Peter Eisentraut 
>>> *Cc:* Pavel Stehule ; PostgreSQL mailing lists
>>> 
>>> *Subject:* Re: [HACKERS] Other formats in pset like markdown, rst,
>>> mediawiki
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> 2017-03-12 19:21 GMT+01:00 Jan Michálek :
>>>
>>>
>>>
>>>
>>>
>>> 2017-03-10 9:43 GMT+01:00 Jan Michálek :
>>>
>>>
>>>
>>>
>>>
>>> 2017-03-09 20:10 GMT+01:00 Peter Eisentraut <
>>> peter.eisentr...@2ndquadrant.com>:
>>>
>>> This is looking pretty neat.  I played around with it a bit.  There are
>>> a couple of edge cases that you need to address, I think.
>>>
>>>
>>>
>>> Thanks, the original code is very clear and well prepared for
>>> adding new formats.
>>>
>>>
>>>
>>>
>>> - Does not support \x
>>>
>>>
>>>
>>> I know; I don't know if \x makes sense in this case. I will look at how
>>> it is done in other formats like html. I think it should work in the
>>> sense that a table generated to rst should give similar output after
>>> processing to the output of the html format.
>>>
>>>
>>>
>>> I prepared something like this (I have no prepared diff, I need to do some
>>> other changes)
>>>
>>> There are a few things I need to do. The first problem is bold column names; I
>>> should do it in the same fashion as "RECORD", but I need to do some research
>>> about the length of the column.
>>>
>>> The bigger problem is with tab indents; the rst processor doesn't handle them
>>> in this case.
>>>
>>>
>>> The new diff adds \x for rst and skips leading spaces in rst in
>>> both. make check passed.
>>>
>>>
>>>
>>> Jan
>>>
>>>
>>>
>>>
>>>
>>> jelen=# execute q \g | xclip
>>> +---------+---------------------+
>>> | **RECORD 1**                  |
>>> +---------+---------------------+
>>> | column1 | Elephant, kangaroo, |
>>> |         | squirrel, gorilla   |
>>> +---------+---------------------+
>>> | column2 | 121                 |
>>> +---------+---------------------+

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Amit Kapila
On Fri, Mar 10, 2017 at 11:37 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> On Wed, Mar 8, 2017 at 2:30 PM, Alvaro Herrera  
>> wrote:
>> > Not really -- it's a bit slower actually in a synthetic case measuring
>> > exactly the slowed-down case.  See
>> > https://www.postgresql.org/message-id/cad__ougk12zqmwwjzim-yyud1y8jmmy6x9yectnif3rpp6h...@mail.gmail.com
>> > I bet in normal cases it's unnoticeable.  If WARM flies, then it's going
>> > to provide a larger improvement than is lost to this.
>>
>> Hmm, that test case isn't all that synthetic.  It's just a single
>> column bulk update, which isn't anything all that crazy,
>
> The problem is that the update touches the second indexed column.  With
> the original code we would have stopped checking at that point, but with
> the patched code we continue to verify all the other indexed columns for
> changes.
>
> Maybe we need more than one bitmapset to be given -- multiple ones for
> for "any of these" checks (such as HOT, KEY and Identity) which can be
> stopped as soon as one is found, and one for "all of these" (for WARM,
> indirect indexes) which needs to be checked to completion.
>

How will that help to mitigate the regression?  I think what might
help here is if we fetch the required columns for WARM only when we
know hot_update is false.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Amit Kapila
On Thu, Mar 9, 2017 at 8:43 AM, Robert Haas  wrote:
> On Wed, Mar 8, 2017 at 2:30 PM, Alvaro Herrera  
> wrote:
>> Not really -- it's a bit slower actually in a synthetic case measuring
>> exactly the slowed-down case.  See
>> https://www.postgresql.org/message-id/cad__ougk12zqmwwjzim-yyud1y8jmmy6x9yectnif3rpp6h...@mail.gmail.com
>> I bet in normal cases it's unnoticeable.  If WARM flies, then it's going
>> to provide a larger improvement than is lost to this.
>
> Hmm, that test case isn't all that synthetic.  It's just a single
> column bulk update, which isn't anything all that crazy, and 5-10%
> isn't nothing.
>
> I'm kinda surprised it made that much difference, though.
>

I think it is because heap_getattr() is not that cheap.  We have
noticed a similar problem during development of the scan key push-down
work [1].
work [1].

[1] - https://commitfest.postgresql.org/12/850/

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-21 Thread Ashutosh Sharma
Hi Jeff,

On Tue, Mar 21, 2017 at 1:54 PM, Amit Kapila  wrote:
> On Tue, Mar 21, 2017 at 1:28 PM, Jeff Janes  wrote:
>> Against an unmodified HEAD (17fa3e8), I got a segfault in the hot standby.
>>
>
> I think I see the problem in hash_xlog_vacuum_get_latestRemovedXid().
> It seems to me that we are using different block_id for registering
> the deleted items in xlog XLOG_HASH_VACUUM_ONE_PAGE and then using
> different block_id for fetching those items in
> hash_xlog_vacuum_get_latestRemovedXid().  So probably matching those
> will fix this issue (instead of fetching block number and items from
> block_id 1, we should use block_id 0).
>

Thanks for reporting this issue. As Amit said, it is happening due to a
block_id mismatch. Attached is a patch that fixes it. I apologise for
such a silly mistake. Please note that I was not able to reproduce the
issue on my local machine using the test script you shared. Could you
please check with the attached patch whether you are still seeing the
issue? Thanks in advance.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


corrected_block_id_reference_in_hash_vacuum_get_latestRemovedXid.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Problem in Parallel Bitmap Heap Scan?

2017-03-21 Thread Thomas Munro
Hi,

I noticed a failure in the inet.sql test while running the regression
tests with parallelism cranked up, and can reproduce it interactively
as follows.  After an spgist index is created and the plan changes to
the one shown below, the query returns no rows.

regression=# set force_parallel_mode = regress;
SET
regression=# set max_parallel_workers_per_gather = 2;
SET
regression=# set parallel_tuple_cost = 0;
SET
regression=# set parallel_setup_cost = 0;
SET
regression=# set min_parallel_table_scan_size = 0;
SET
regression=# set min_parallel_index_scan_size = 0;
SET
regression=# set enable_seqscan = off;
SET
regression=# SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr
ORDER BY i;

c  |i
+--
 10.0.0.0/8 | 9.1.2.3/8
 10.0.0.0/8 | 10.1.2.3/8
 10.0.0.0/32| 10.1.2.3/8
 10.0.0.0/8 | 10.1.2.3/8
 10.1.0.0/16| 10.1.2.3/16
 10.1.2.0/24| 10.1.2.3/24
 10.1.2.3/32| 10.1.2.3
 10.0.0.0/8 | 11.1.2.3/8
 192.168.1.0/24 | 192.168.1.226/24
 192.168.1.0/24 | 192.168.1.255/24
 192.168.1.0/24 | 192.168.1.0/25
 192.168.1.0/24 | 192.168.1.255/25
 192.168.1.0/26 | 192.168.1.226
 10.0.0.0/8 | 10::/8
 :::1.2.3.4/128 | ::4.3.2.1/24
 10:23::f1/128  | 10:23::f1/64
 10:23::8000/113| 10:23::
(17 rows)

regression=# CREATE INDEX inet_idx3 ON inet_tbl using spgist (i);
CREATE INDEX
regression=# SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr
ORDER BY i;
 c | i
---+---
(0 rows)

regression=# explain SELECT * FROM inet_tbl WHERE i <>
'192.168.1.0/24'::cidr ORDER BY i;
   QUERY PLAN
-
 Gather Merge  (cost=16.57..16.67 rows=10 width=64)
   Workers Planned: 1
   ->  Sort  (cost=16.56..16.58 rows=10 width=64)
 Sort Key: i
 ->  Parallel Bitmap Heap Scan on inet_tbl  (cost=12.26..16.39
rows=10 width=64)
   Recheck Cond: (i <> '192.168.1.0/24'::inet)
   ->  Bitmap Index Scan on inet_idx3  (cost=0.00..12.26
rows=17 width=0)
 Index Cond: (i <> '192.168.1.0/24'::inet)
(8 rows)


-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] postgres_fdw: correct regression test for parameterized scan for foreign table

2017-03-21 Thread Etsuro Fujita

Hi,

While working on adding support for parameterized foreign joins to
postgres_fdw, I noticed that commit
e4106b2528727c4b48639c0e12bf2f70a766b910 forgot to modify a test query
for execution of a parameterized foreign scan for a foreign table:


--- parameterized remote path
+-- parameterized remote path for foreign table
 EXPLAIN (VERBOSE, COSTS false)
-  SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;
+  SELECT * FROM "S 1"."T 1" a, ft2 b WHERE a."C 1" = 47 AND b.c1 = a.c2;
 SELECT * FROM ft2 a, ft2 b WHERE a.c1 = 47 AND b.c1 = a.c2;

The query in the last line as-is would test whether the remote join
works correctly, which isn't the right thing to do here.  Attached is a
small patch for addressing that.


Best regards,
Etsuro Fujita


postgresql-fdw-regress.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present

2017-03-21 Thread Michael Banck
Hi,

Am Dienstag, den 21.03.2017, 11:03 +0900 schrieb Michael Paquier:
> On Tue, Mar 21, 2017 at 1:32 AM, Michael Banck
>  wrote:
>  /*
> + * Try to create a permanent replication slot if one is specified. Do
> + * not error out if the slot already exists, other errors are already
> + * reported by CreateReplicationSlot().
> + */
> +if (!stream->temp_slot && stream->replication_slot)
> +{
> +if (!CreateReplicationSlot(conn, stream->replication_slot,
> NULL, true, true))
> +return false;
> +}
> This could be part of an else taken from the previous if where
> temp_slot is checked.

Not sure how this would work, as ISTM the if clause above only sets the
name for param->temp_slot (if supported), but doesn't distinguish which
kind of slot (if any) will actually be created. 

I've done it for the second (refactoring) patch though, but I had to
make `no_slot' a global variable to not have the logic be too
complicated.

> There should be some TAP tests included. And +1 for sharing the same
> behavior as pg_receivewal.

Well, I've adjusted the TAP tests insofar as they now check that the
physical slot got created; previously they checked for the opposite:

|-$node->command_fails(
|+$node->command_ok(
|[   'pg_basebackup', '-D',
|"$tempdir/backupxs_sl_fail", '-X',
|'stream','-S',
|-   'slot1' ],
|-   'pg_basebackup fails with nonexistent replication slot');
|+   'slot0' ],
|+   'pg_basebackup runs with nonexistent replication slot');
|+
|+my $lsn = $node->safe_psql('postgres',
|+   q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name
|= 'slot0'}
|+);
|+like($lsn, qr!^0/[0-9A-F]{7,8}$!, 'slot is present');

I have now made the message a bit clearer by saying "pg_basebackup -S
creates previously nonexistent replication slot".
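
(As a complement to the restart_lsn check, the slot's kind could also be
asserted from SQL; a sketch, reusing the slot name from the test:)

    SELECT slot_name, slot_type, temporary
      FROM pg_replication_slots
     WHERE slot_name = 'slot0';
    -- expected: one row with slot_type = 'physical' and temporary = false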

Probably there could be additional TAP tests, but off the top of my head
I can't think of any?

New patches attached.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
From 93337fe0ef320fe8ef68a9b64c4df85a63c4346c Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Sun, 19 Mar 2017 10:58:13 +0100
Subject: [PATCH 1/2] Create replication slot in pg_basebackup if requested and
 not yet present.

If a replication slot is explicitly requested, try to create it before
starting to replicate from it.
---
 src/bin/pg_basebackup/pg_basebackup.c| 15 +++
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 19 ---
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0a4944d..69ca4be 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -581,6 +581,21 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
 	else
 		param->temp_slot = temp_replication_slot;
 
+	/*
+	 * Try to create a permanent replication slot if one is specified. Do
+	 * not error out if the slot already exists, other errors are already
+	 * reported by CreateReplicationSlot().
+	 */
+	if (!temp_replication_slot && replication_slot)
+	{
+		if (verbose)
+			fprintf(stderr, _("%s: creating replication slot \"%s\"\n"),
+	progname, replication_slot);
+
+		if (!CreateReplicationSlot(param->bgconn, replication_slot, NULL, true, true))
+			return 1;
+	}
+
 	if (format == 'p')
 	{
 		/*
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 14bd813..f50005f 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -4,7 +4,7 @@ use Cwd;
 use Config;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 72;
+use Test::More tests => 73;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -246,17 +246,22 @@ $node->command_ok(
 	'pg_basebackup -X stream runs with --no-slot');
 
 $node->command_fails(
-	[ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1' ],
+	[ 'pg_basebackup', '-D', "$tempdir/fail", '-X', 'none', '-S', 'slot0' ],
 	'pg_basebackup with replication slot fails without -X stream');
-$node->command_fails(
+$node->command_ok(
 	[   'pg_basebackup', '-D',
 		"$tempdir/backupxs_sl_fail", '-X',
 		'stream','-S',
-		'slot1' ],
-	'pg_basebackup fails with nonexistent replication slot');
+		'slot0' ],
+	'pg_basebackup -S creates previously nonexistent replication slot');
+
+my $lsn = $node->safe_psql('postgres',
+	q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'}
+);
+like($lsn, qr!^0/[0-9A-F]{7,8}$!, 'slot is present');
 
 $node->saf

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 6:56 AM, Amit Kapila  wrote:
>> Hmm, that test case isn't all that synthetic.  It's just a single
>> column bulk update, which isn't anything all that crazy, and 5-10%
>> isn't nothing.
>>
>> I'm kinda surprised it made that much difference, though.
>>
>
> I think it is because heap_getattr() is not that cheap.  We have
> noticed the similar problem during development of scan key push down
> work [1].

Yeah.  So what's the deal with this?  Is somebody working on figuring
out a different approach that would reduce this overhead?  Are we
going to defer WARM to v11?  Or is the intent to just ignore the 5-10%
slowdown on a single column update and commit everything anyway?  (A
strong -1 on that course of action from me.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-21 Thread Thomas Munro
Hi,

Here is a new version of the patch series addressing complaints from
Rafia, Peter, Andres and Robert -- see below.

First, two changes not already covered in this thread:

1.  Today Robert asked me a question off-list that I hadn't previously
considered: since I am sharing tuples between backends, don't I have
the same kind of transient record remapping problems that tqueue.c has
to deal with?  The answer must be yes, and in fact it's a trickier
version because there are N 'senders' and N 'receivers' communicating
via the shared hash table.  So I decided to avoid the problem by not
planning shared hash tables if the tuples could include transient
RECORD types: see tlist_references_transient_type() in
0007-hj-shared-single-batch-v8.patch.  Perhaps in the future we can
find a way for parallel query to keep local types in sync, so this
restriction could be lifted.  (I've tested this with a specially
modified build, because I couldn't figure out how to actually get any
transient types to be considered in a parallel query, but if someone
has a suggestion for a good query for that I'd love to put one into
the regression test.)

2.  Earlier versions included support for Shared Hash (= one worker
builds, other workers wait, but we get to use work_mem * P memory) and
Parallel Shared Hash (= all workers build).  Shared Hash is by now
quite hard to reach, since so many hash join inner plans are now
parallelisable.  I decided to remove support for it from the latest
patch series: I think it adds cognitive load and patch lines for
little or no gain.  With time running out, I thought that it would be
better to rip it out for now to try to simplify things and avoid some
difficult questions about how to cost that mode.  It could be added
with a separate patch after some more study if it really does make
some sense.

>> On Mon, Mar 13, 2017 at 8:40 PM, Rafia Sabih
>>  wrote:
>>> In an attempt to test v7 of this patch on TPC-H 20 scale factor I found a
>>> few regressions,
>>> Q21: 52 secs on HEAD and  400 secs with this patch

As already mentioned there is a planner bug which we can fix
separately from this patch series.  Until that is resolved, please see
that other thread[1] for the extra tweak require for sane results when
testing Q21.

Even with that tweak, there was a slight regression with fewer than 3
workers at 1GB for Q21.  That turned out to be because the patched
version was not always using as many workers as unpatched.  To fix
that, I had to rethink the deadlock avoidance system to make it a bit
less conservative about giving up workers: see
src/backend/utils/misc/leader_gate.c in
0007-hj-shared-single-batch-v8.patch.

Here are some speed-up numbers comparing master to patched that I
recorded on TPCH scale 10 with work_mem = 1GB.  These are the queries
whose plans change with the patch.  Both master and v8 were patched
with fix-neqseljoin-for-semi-joins.patch.

 query | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8
---+---+---+---+---+---+---+---+---+---
 Q3| 0.94x | 1.06x | 1.25x | 1.46x | 1.64x | 1.87x | 1.99x | 1.67x | 1.67x
 Q5| 1.17x | 1.03x | 1.23x | 1.27x | 1.44x | 0.56x | 0.95x | 0.94x | 1.16x
 Q7| 1.13x | 1.04x | 1.31x | 1.06x | 1.15x | 1.28x | 1.31x | 1.35x | 1.13x
 Q8| 0.99x | 1.13x | 1.23x | 1.22x | 1.36x | 0.42x | 0.82x | 0.78x | 0.81x
 Q9| 1.16x | 0.95x | 1.92x | 1.68x | 1.90x | 1.89x | 2.02x | 2.05x | 1.81x
 Q10   | 1.01x | 1.03x | 1.08x | 1.10x | 1.16x | 1.17x | 1.09x | 1.01x | 1.07x
 Q12   | 1.03x | 1.19x | 1.42x | 0.75x | 0.74x | 1.00x | 0.99x | 1.00x | 1.01x
 Q13   | 1.10x | 1.66x | 1.99x | 1.00x | 1.12x | 1.00x | 1.12x | 1.01x | 1.13x
 Q14   | 0.97x | 1.13x | 1.22x | 1.45x | 1.43x | 1.55x | 1.55x | 1.50x | 1.45x
 Q16   | 1.02x | 1.13x | 1.07x | 1.09x | 1.10x | 1.10x | 1.13x | 1.10x | 1.11x
 Q18   | 1.05x | 1.43x | 1.33x | 1.21x | 1.07x | 1.57x | 1.76x | 1.09x | 1.09x
 Q21   | 0.99x | 1.01x | 1.07x | 1.18x | 1.28x | 1.37x | 1.63x | 1.26x | 1.60x

These tests are a bit short and noisy and clearly there are some
strange dips in there that need some investigation but the trend is
positive.

Here are some numbers from some simple test joins, so that you can see
the raw speedup of large hash joins without all the other things going
on in those TPCH plans.  I executed 1-join, 2-join and 3-join queries
like this:

  CREATE TABLE simple AS
  SELECT generate_series(1, 1000) AS id,
'aa';
  ANALYZE simple;

  SELECT COUNT(*)
FROM simple r
JOIN simple s USING (id);

  SELECT COUNT(*)
FROM simple r
JOIN simple s USING (id)
JOIN simple t USING (id);

  SELECT COUNT(*)
FROM simple r
JOIN simple s USING (id)
JOIN simple t USING (id)
JOIN simple u USING (id);

Unpatched master can make probing go faster by adding workers, but not
building, so in these self-joins the ability to scale with more CPUs
is limited (here w = 1 shows the speedup compared to 

Re: [HACKERS] Removing binaries

2017-03-21 Thread Robert Haas
On Mon, Mar 20, 2017 at 6:15 PM, David Steele  wrote:
> On 3/20/17 3:40 PM, Jan de Visser wrote:
>> On Monday, March 20, 2017 3:30:49 PM EDT Robert Haas wrote:
>>> That would annoy me, because I use these constantly.  I also think
>>> that they solve a problem for users, which is this:
>>>
>>> [rhaas ~]$ psql
>>> psql: FATAL:  database "rhaas" does not exist
>>> [rhaas ~]$ psql -c 'create database rhaas;'
>>> psql: FATAL:  database "rhaas" does not exist
>>> [rhaas ~]$ gosh, i know i need to connect to a database in order to
>>> create the database to which psql tries to connect by default, so
>>> there must be an existing database with some name, but what exactly is
>>> that name, anyway?
>>> -bash: gosh,: command not found
>>>
>>> There was an occasion when this exact problem almost caused me to give
>>> up on using PostgreSQL.  Everybody here presumably knows that
>>> template1 and postgres are the magic words you can add to the end of
>>> that command line to make it work, but that is NOT self-evident to
>>> newcomers.
>>
>> Same here. I worked on a system with a shrink-wrap installer which
>> installed
>> pgsql as well and initialized it for use by the system user of our
>> software.
>> If a tester or sales engineer wanted to play with the DB, it would be
>> about 30
>> minutes before they would end up at my desk, in tears.
>
> How about adding a hint?

I think it's tricky to do that, because the error happens in response
to the connection attempt and at that point you don't know that the
command the user plans to send is CREATE DATABASE.  If we could
somehow detect that the user is trying to CREATE DATABASE against a
nonexistent database and hint in that case, that would be *great*, but
I don't see a way to make it work.

Here's another idea: what if we always created the default database at
initdb time?  For example, if I initdb as rhaas, maybe it should
create an "rhaas" database for me, so that this works:

initdb
pg_ctl start
psql

I think a big part of the usability problem here comes from the fact
that the default database for connections is based on the username,
but the default databases that get created have fixed names (postgres,
template1).  So the default configuration is one where you can't
connect.  Why the heck do we do it that way?
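
(For comparison, this is the incantation newcomers currently have to
discover; the maintenance database "postgres" is the one name that
reliably exists, so the workaround, shown for the "rhaas" example above,
is:)

    -- connect via: psql -d postgres
    CREATE DATABASE rhaas;
    -- after which a plain "psql" connects for OS user rhaas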

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present

2017-03-21 Thread Michael Banck
Am Dienstag, den 21.03.2017, 12:52 +0100 schrieb Michael Banck:
> New patches attached.

On second though, there was a superfluous whitespace change in
t/010_pg_basebackup.pl, and I've moved the check-for-hex regex fix to
the second patch as it's cosmetic and not related to changing the --slot
creation behaviour.


Michael
 
-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

From 2ab1be27a341ecdcd2d6a3dd5ce535aba5e16b03 Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Sun, 19 Mar 2017 10:58:13 +0100
Subject: [PATCH 01/29] Create replication slot in pg_basebackup if requested
 and not yet present.

If a replication slot is explicitly requested, try to create it before
starting to replicate from it.
---
 src/bin/pg_basebackup/pg_basebackup.c| 15 +++
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 15 ++-
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0a4944d..69ca4be 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -581,6 +581,21 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
 	else
 		param->temp_slot = temp_replication_slot;
 
+	/*
+	 * Try to create a permanent replication slot if one is specified. Do
+	 * not error out if the slot already exists, other errors are already
+	 * reported by CreateReplicationSlot().
+	 */
+	if (!temp_replication_slot && replication_slot)
+	{
+		if (verbose)
+			fprintf(stderr, _("%s: creating replication slot \"%s\"\n"),
+	progname, replication_slot);
+
+		if (!CreateReplicationSlot(param->bgconn, replication_slot, NULL, true, true))
+			return 1;
+	}
+
 	if (format == 'p')
 	{
 		/*
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 14bd813..70c3284 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -4,7 +4,7 @@ use Cwd;
 use Config;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 72;
+use Test::More tests => 73;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -246,14 +246,19 @@ $node->command_ok(
 	'pg_basebackup -X stream runs with --no-slot');
 
 $node->command_fails(
-	[ 'pg_basebackup', '-D', "$tempdir/fail", '-S', 'slot1' ],
+	[ 'pg_basebackup', '-D', "$tempdir/fail", '-X', 'none', '-S', 'slot0' ],
 	'pg_basebackup with replication slot fails without -X stream');
-$node->command_fails(
+$node->command_ok(
 	[   'pg_basebackup', '-D',
 		"$tempdir/backupxs_sl_fail", '-X',
 		'stream','-S',
-		'slot1' ],
-	'pg_basebackup fails with nonexistent replication slot');
+		'slot0' ],
+	'pg_basebackup -S creates previously nonexistent replication slot');
+
+my $lsn = $node->safe_psql('postgres',
+	q{SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'slot0'}
+);
+like($lsn, qr!^0/[0-9A-F]{7,8}$!, 'slot is present');
 
 $node->safe_psql('postgres',
 	q{SELECT * FROM pg_create_physical_replication_slot('slot1')});
-- 
2.1.4

From 5f0f31f61bf668de937868840a97c0a56dc2cd5d Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Tue, 21 Mar 2017 12:50:22 +0100
Subject: [PATCH 02/29] Refactor replication slot creation in pg_basebackup et
 al.

Add new argument `is_temporary' to CreateReplicationSlot() in streamutil.c
which specifies whether a physical replication slot is temporary or not. Update
all callers.

Move the creation of the temporary replication slot for pg_basebackup from
receivelog.c to pg_basebackup.c. At the same time, also create the temporary
slot via CreateReplicationSlot() instead of creating the temporary slot with an
explicit SQL command.
---
 src/bin/pg_basebackup/pg_basebackup.c| 28 +---
 src/bin/pg_basebackup/pg_receivewal.c|  2 +-
 src/bin/pg_basebackup/pg_recvlogical.c   |  2 +-
 src/bin/pg_basebackup/receivelog.c   | 18 --
 src/bin/pg_basebackup/streamutil.c   | 18 +-
 src/bin/pg_basebackup/streamutil.h   |  2 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |  2 +-
 7 files changed, 34 insertions(+), 38 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 69ca4be..c7ae281 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -92,6 +92,7 @@ static pg_time_t last_progress_report = 0;
 static int32 maxrate = 0;		/* no limit by default */
 static char *replication_slot = NULL;
 static bool temp_replication_slot = true;
+static bool no_slot = false;
 
 static bool success = false;
 static bool made_new_pgdat

Re: [HACKERS] BRIN cost estimate

2017-03-21 Thread David Rowley
On 17 March 2017 at 23:50, Emre Hasegeli  wrote:
>> 1.
>>
>> + Assert(nnumbers == 1);
>>
>> I think it's a bad idea to Assert() this. The stat tuple can come from
>> a plugin which could do anything. Seems like if we need to be certain
>> of that then it should be an elog(ERROR), maybe mentioning that we
>> expected a 1-element array but got something else.
>
> But it is not coming from a plugin which can do anything.  It is the
> plugin we know.  Assert() is exact same coding with btcostestimate().

Not sure what you mean here. I'm not speaking of the brin index am, I
mean the get_index_stats_hook call which you've added. An extension
can be loaded which sets this hook and provides its own tuple, which
may cause the Assert to fail. Asserts are normally used to verify, in
assert-enabled builds, conditions whose violation would cause some
following code to crash or not do what it's meant to. Normally they're
used to help track down bugs in our own software; they're not meant to
track bugs in software we have no control over.

>> 2.
>>
>> + Assert(varCorrelation >= 0 && varCorrelation <= 1);
>>
>> same goes for that. I don't think we want to Assert() that either.
>> elog(ERROR) seems better, or perhaps we should fall back on the old
>> estimation method in this case?
>
> Again the correlation value is expected to be in this range.  I don't
> think we even need to bother with Assert().  Garbage in garbage out.

That's fine. Let's just not garbage crash.

>> The Asserted range also seems to contradict the range mentioned in
>> pg_statistic.h:
>
> We are using Abs() of the value.

I missed that.

>> 3.
>>
>> + numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);
>>
>> should numBlocks be named numRanges? After all we call the option
>> "pages_per_range".
>
> It was named this way before.

Hmm, before what exactly? Before your patch it didn't exist. You
introduced it in brincostestimate().

Here's the line of code:

+ double numBlocks;

If we want to have a variable which stores the number of ranges, then
I think numRanges is better than numBlocks. I can't imagine many
people would disagree there.
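
For what it's worth, the option name itself suggests the terminology.  As
a sketch (table and column names hypothetical), an index built with

    CREATE INDEX t_brin_idx ON t USING brin (created_at)
        WITH (pages_per_range = 32);

on a heap of roughly 3200 pages divides it into 3200/32 = 100 ranges, and
that count of ranges is what the variable holds.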

>> 6.
>>
>> Might want to consider not applying this estimate when the statistics
>> don't contain any STATISTIC_KIND_CORRELATION array.
>
> I am not sure what would be better than using the value as 0 in this case.

At the very least please write a comment to explain this in the code.
Right now it looks broken. If I noticed this then one day in the
future someone else will. If you write a comment then person of the
future will likely read it, and then not raise any questions about the
otherwise questionable code.

> I can look into supporting expression indexes, if you think something
> very much incomplete like this has a chance to get committed.

I do strongly agree that the estimates need improved here. I've
personally had issues with bad brin estimates before, and I'd like to
see it improved. I think the patch is not quite complete without it
also considering stats on expression indexes. If you have time to go
do that I'd suggest you go ahead with that.

I've not looked at the new patch yet, but thanks for making some of
those changes.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem in Parallel Bitmap Heap Scan?

2017-03-21 Thread Dilip Kumar
On Tue, Mar 21, 2017 at 4:47 PM, Thomas Munro
 wrote:
> I noticed a failure in the inet.sql test while running the regression
> tests with parallelism cranked up, and can reproduce it interactively
> as follows.  After an spgist index is created and the plan changes to
> the one shown below, the query returns no rows.

Thanks for reporting.  Seems like we are getting issues related to
TBM_ONE_PAGE and TBM_EMPTY.

I think this area needs more testing; the reason these paths are not
tested properly is that they are not the natural case for parallel
bitmap scans. In the next few days I will test more such cases by
forcing parallel bitmap scans.

Here is a patch to fix the issue at hand.  I have also run the
regression suite with force_parallel_mode=regress and all the tests
pass.

Results after fix.

postgres=# explain analyze SELECT * FROM inet_tbl WHERE i <>
'192.168.1.0/24'::cidr
ORDER BY i;
QUERY PLAN
--
 Gather Merge  (cost=16.53..16.62 rows=9 width=64) (actual
time=4.467..4.478 rows=16 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   ->  Sort  (cost=16.52..16.54 rows=9 width=64) (actual
time=0.090..0.093 rows=8 loops=2)
 Sort Key: i
 Sort Method: quicksort  Memory: 25kB
 ->  Parallel Bitmap Heap Scan on inet_tbl  (cost=12.26..16.37
rows=9 width=64) (actual time=0.048..0.050 rows=8 loops=2)
   Recheck Cond: (i <> '192.168.1.0/24'::inet)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on inet_idx3  (cost=0.00..12.25
rows=16 width=0) (actual time=0.016..0.016 rows=16 loops=1)
 Index Cond: (i <> '192.168.1.0/24'::inet)
 Planning time: 0.149 ms
 Execution time: 5.143 ms
(13 rows)

postgres=# SELECT * FROM inet_tbl WHERE i <> '192.168.1.0/24'::cidr
ORDER BY i;
 c  |i
+--
 10.0.0.0/8 | 9.1.2.3/8
 10.0.0.0/8 | 10.1.2.3/8
 10.0.0.0/32| 10.1.2.3/8
 10.0.0.0/8 | 10.1.2.3/8
 10.1.0.0/16| 10.1.2.3/16
 10.1.2.0/24| 10.1.2.3/24
 10.1.2.3/32| 10.1.2.3
 10.0.0.0/8 | 11.1.2.3/8
 192.168.1.0/24 | 192.168.1.226/24
 192.168.1.0/24 | 192.168.1.255/24
 192.168.1.0/24 | 192.168.1.0/25
 192.168.1.0/24 | 192.168.1.255/25
 192.168.1.0/26 | 192.168.1.226
 :::1.2.3.4/128 | ::4.3.2.1/24
 10:23::f1/128  | 10:23::f1/64
 10:23::8000/113| 10:23::
(16 rows)


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


fix_parallel_bitmap_handling_onepage.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE command progress checker

2017-03-21 Thread Haribabu Kommi
On Tue, Mar 21, 2017 at 3:41 PM, vinayak 
wrote:

> Thank you for testing the patch on Windows platform.
>
>
Thanks for the updated patch.

It works well for a normal relation.  But for a relation that contains
child tables, PROGRESS_ANALYZE_NUM_ROWS_SAMPLED produces wrong results.

How about adding another phase called
PROGRESS_ANALYZE_PHASE_COLLECT_INHERIT_SAMPLE_ROWS
and setting this phase only when it is an inheritance analyze operation?
Also, adding some explanation to the ROWS_SAMPLED phase about how the
sampled rows are calculated for inheritance tables would give good
analyze progress for a relation that contains child relations as well.
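
For illustration, this is the case that confuses a single counter (table
names hypothetical): ANALYZE on an inheritance parent collects sample rows
twice, once for the parent alone and once across the whole inheritance
tree, so the rows-sampled figure overshoots what the user expects.

    CREATE TABLE parent_tbl (id int);
    CREATE TABLE child_tbl () INHERITS (parent_tbl);
    INSERT INTO parent_tbl SELECT generate_series(1, 1000);
    INSERT INTO child_tbl  SELECT generate_series(1, 1000);
    -- collects two sample sets: parent_tbl alone, then parent_tbl + children
    ANALYZE VERBOSE parent_tbl;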

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] BRIN cost estimate

2017-03-21 Thread Emre Hasegeli
> Not sure what you mean here. I'm not speaking of the brin index am, I
> mean the get_index_stats_hook call which you've added.

I see.  Actually this part was from Alvaro.  I hadn't noticed the
get_index_stats_hook call before, but it is still the same coding as
btcostestimate().  btcostestimate() also calls get_index_stats_hook,
and then Asserts nnumbers == 1.

> hmm, before what exactly? before your patch it didn't exist. You
> introduced it into brincostestimate().

I was confused by looking at the changes in my repository that I made on
top of Alvaro's.  I will rename it in the next version.

> At the very least please write a comment to explain this in the code.
> Right now it looks broken. If I noticed this then one day in the
> future someone else will. If you write a comment then person of the
> future will likely read it, and then not raise any questions about the
> otherwise questionable code.

Will do.

> I do strongly agree that the estimates need improved here. I've
> personally had issues with bad brin estimates before, and I'd like to
> see it improved. I think the patch is not quite complete without it
> also considering stats on expression indexes. If you have time to go
> do that I'd suggest you go ahead with that.

I will look into it this week.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Pavan Deolasee
On Tue, Mar 21, 2017 at 5:34 PM, Robert Haas  wrote:

> On Tue, Mar 21, 2017 at 6:56 AM, Amit Kapila 
> wrote:
> >> Hmm, that test case isn't all that synthetic.  It's just a single
> >> column bulk update, which isn't anything all that crazy, and 5-10%
> >> isn't nothing.
> >>
> >> I'm kinda surprised it made that much difference, though.
> >>
> >
> > I think it is because heap_getattr() is not that cheap.  We have
> > noticed the similar problem during development of scan key push down
> > work [1].
>
> Yeah.  So what's the deal with this?  Is somebody working on figuring
> out a different approach that would reduce this overhead?  Are we
> going to defer WARM to v11?  Or is the intent to just ignore the 5-10%
> slowdown on a single column update and commit everything anyway?


I think I should clarify something. The test case does a single-column
update, but it also has columns which are very wide, and it has an index on
many columns (and it updates a column early in the list). In addition, in
the test Mithun updated all 10 million rows of the table in a single
transaction, used an UNLOGGED table, and fsync was turned off.

TBH I see many artificial scenarios here. It will be very useful if he can
rerun the query with some of these restrictions lifted. I'm all for
addressing whatever we can, but I am not sure this test demonstrates
real-world usage.
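
(Roughly the shape of the benchmark as described; all names, widths and
row counts here are a sketch, not Mithun's actual script:)

    CREATE UNLOGGED TABLE warm_test (
        id int, payload text,
        c1 int, c2 int, c3 int, c4 int,
        c5 int, c6 int, c7 int, c8 int);
    CREATE INDEX warm_idx ON warm_test (c1, c2, c3, c4, c5, c6, c7, c8);
    INSERT INTO warm_test
        SELECT i, repeat('x', 1000), i, i, i, i, i, i, i, i
          FROM generate_series(1, 10000000) i;
    -- single-column bulk update of every row, touching a column that
    -- appears early in the index's column list
    UPDATE warm_test SET c1 = c1 + 1;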

Having said that, maybe we can do a few things to reduce the overhead:

- Check if the page has enough free space to perform a HOT/WARM update. If
not, don't look for all index keys.
- Pass bitmaps separately for each index and bail out early if we conclude
neither HOT nor WARM is possible. In this case since there is just one
index and as soon as we check the second column we know neither HOT nor
WARM is possible, we will return early. It might complicate the API a lot,
but I can give it a shot if that's what is needed to make progress.

Any other ideas?

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Implementing delete in columnar store fdw

2017-03-21 Thread Hari Krishnan
Hello,

I want to implement delete functionality for a columnar store FDW in
Postgres, similar to file_fdw. I want to use the
"AddForeignUpdateTargets" function to implement this, but the junk filter
shouldn't be a column present in the table. Is it possible to add an
Expr/Var to the junk filter that is not present in the schema of the
table, and that I will generate on the fly?

If yes, how can I do that? Thanks in advance.

Regards,
Harikrishnan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-21 Thread Amit Kapila
On Mon, Mar 20, 2017 at 8:27 AM, Robert Haas  wrote:
> On Fri, Mar 17, 2017 at 2:30 AM, Amit Kapila  wrote:
>>> I was wondering about doing an explicit test: if the XID being
>>> committed matches the one in the PGPROC, and nsubxids matches, and the
>>> actual list of XIDs matches, then apply the optimization.  That could
>>> replace the logic that you've proposed to exclude non-commit cases,
>>> gxact cases, etc. and it seems fundamentally safer.  But it might be a
>>> more expensive test, too, so I'm not sure.
>>
>> I think if the number of subxids is very small let us say under 5 or
>> so, then such a check might not matter, but otherwise it could be
>> expensive.
>
> We could find out by testing it.  We could also restrict the
> optimization to cases with just a few subxids, because if you've got a
> large number of subxids this optimization probably isn't buying much
> anyway.
>

Yes, and I have modified the patch to compare xids and subxids for the
group update.  In initial short tests (with low client counts), it
seems that up to 3 savepoints we can win, while from 10 savepoints
onwards there is some regression, or at the very least there doesn't
appear to be any benefit.  We need more tests to identify the safe
number, but I thought it better to share the patch now to see if we
agree on the changes, because if not, the whole testing needs to be
repeated.  Let me know what you think about the attached.
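
To illustrate the comparison, a rough sketch (the helper name and the
subxid threshold here are mine, not the patch's):

/*
 * Sketch only: allow the group XID-status update when the XID and
 * subxid list being set match exactly what this backend advertises,
 * and the subxid list is small enough that the comparison is cheap.
 */
#define GROUP_UPDATE_MAX_SUBXIDS 4	/* exact threshold to be settled by testing */

static bool
GroupClogUpdateIsSafe(PGPROC *proc, PGXACT *pgxact,
					  TransactionId xid, int nsubxids,
					  TransactionId *subxids)
{
	int			i;

	if (!TransactionIdEquals(pgxact->xid, xid))
		return false;
	if (nsubxids != pgxact->nxids ||
		nsubxids > GROUP_UPDATE_MAX_SUBXIDS)
		return false;
	for (i = 0; i < nsubxids; i++)
	{
		if (!TransactionIdEquals(proc->subxids.xids[i], subxids[i]))
			return false;
	}
	return true;
}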



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


group_update_clog_v11.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-21 Thread Beena Emerson
PFA an updated patch.

This fixes an issue reported by Tushar internally. Since the patch changed
the way min and max wal_size are stored internally, from a segment count to
a size in KB, it limited the maximum size of min and max_wal_size to 2GB on
32-bit systems.

The minimum required segment count is 2 and the minimum WAL segment size is
1MB, so the lowest possible value of min/max wal_size is 2MB. Hence, I have
changed the internal representation to MB instead of KB so that we can
increase the range. Also, for any wal-seg-size, it retains the default
segment counts of 5 and 64 for min and max wal_size. Consequently, the
values of these variables scale automatically with the wal_segment_size.
This behaviour is similar to that of the existing code.
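
(For context: the 2GB ceiling presumably comes from MAX_KILOBYTES in guc.h,
which is INT_MAX / 1024 on 32-bit builds.  A tiny standalone illustration of
the arithmetic:)

#include <limits.h>
#include <stdio.h>

int
main(void)
{
	/* MAX_KILOBYTES on 32-bit builds is INT_MAX / 1024 (see guc.h) */
	double		cap = INT_MAX / 1024;

	/* the same stored integer, interpreted under two different units */
	printf("cap with KB units: %.1f GB\n",
		   cap * 1024.0 / (1024.0 * 1024.0 * 1024.0));
	printf("cap with MB units: %.1f TB\n",
		   cap * 1024.0 * 1024.0 / (1024.0 * 1024.0 * 1024.0 * 1024.0));
	return 0;
}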

I have also run pgindent on the files.


On Mon, Mar 20, 2017 at 11:37 PM, Beena Emerson 
wrote:

> Hello,
>
> PFA the updated patch.
>
> On Fri, Mar 17, 2017 at 6:40 AM, Robert Haas 
> wrote:
>
>> On Tue, Mar 14, 2017 at 1:44 AM, Beena Emerson 
>> wrote:
>> > Attached is the updated patch. It fixes the issues and also updates few
>> code
>> > comments.
>>
>> I did an initial readthrough of this patch tonight just to get a
>> feeling for what's going on.  Based on that, here are a few review
>> comments:
>>
>> The changes to pg_standby seem to completely break the logic to wait
>> until the file has attained the correct size.  I don't know how to
>> salvage that logic off-hand, but just breaking it isn't acceptable.
>>
>
> Using the XLogLongPageHeader->xlp_seg_size, all the original checks have
> been retained. This method is even used in pg_waldump.
>
>
>
>>
>> + Note that changing this value requires an initdb.
>>
>> Instead, maybe say something like "Note that this value is fixed for
>> the lifetime of the database cluster."
>>
>
> Corrected.
>
>
>>
>> -intmax_wal_size = 64;/* 1 GB */
>> -intmin_wal_size = 5;/* 80 MB */
>> +intwal_segment_size = 2048;/* 16 MB */
>> +intmax_wal_size = 1024 * 1024;/* 1 GB */
>> +intmin_wal_size = 80 * 1024;/* 80 MB */
>>
>> If wal_segment_size is now measured in multiple of XLOG_BLCKSZ, then
>> it's not the case that 2048 is always 16MB.  If the other values are
>> now measured in kB, perhaps rename the variables to add _kb, to avoid
>> confusion with the way it used to work (and in general).  The problem
>> with leaving this as-is is that any existing references to
>> max_wal_size in core or extension code will silently break; you want
>
> it to break in a noticeable way so that it gets fixed.
>>
>>
> The wal_segment_size is now DEFAULT_XLOG_SEG_SIZE / XLOG_BLCKSZ;
> min and max wal_size have a _kb postfix.
>
>
>> + * UsableBytesInSegment: It is set in assign_wal_segment_size and stores
>> the
>> + * number of bytes in a WAL segment usable for WAL data.
>>
>> The comment doesn't need to say where it gets set, and it doesn't need
>> to repeat the variable name.  Just say "The number of bytes in a..."
>>
>
> Done.
>
>
>>
>> +assign_wal_segment_size(int newval, void *extra)
>>
>> Why does a PGC_INTERNAL GUC need an assign hook?  I think the GUC
>> should only be there to expose the value; it shouldn't have
>> calculation logic associated with it.
>>
>
> Removed the function and called the functions in ReadControlFile.
>
>
>>
>>  /*
>> + * initdb passes the WAL segment size in an environment variable. We
>> don't
>> + * bother doing any sanity checking, we already check in initdb that
>> the
>> + * user gives a sane value.
>> + */
>> +XLogSegSize = pg_atoi(getenv("XLOG_SEG_SIZE"), sizeof(uint32), 0);
>>
>> I think we should bother.  I don't like the idea of the postmaster
>> crashing in flames without so much as a reasonable error message if
>> this parameter-passing mechanism goes wrong.
>>
>
> I have rechecked the XLogSegSize.
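>
> To illustrate the kind of check meant here (a sketch, not the patch's
> actual code; the validity-check helper is hypothetical):
>
> const char *env = getenv("XLOG_SEG_SIZE");
>
> if (env == NULL)
>     elog(FATAL, "XLOG_SEG_SIZE environment variable was not inherited");
>
> XLogSegSize = pg_atoi(env, sizeof(uint32), 0);
>
> /* hypothetical bounds check: a power of 2 within the supported range */
> if (!IsValidXLogSegSize(XLogSegSize))
>     elog(FATAL, "invalid WAL segment size %u", XLogSegSize);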
>
>
>>
>> +{"wal-segsize", required_argument, NULL, 'Z'},
>>
>> When adding an option with no documented short form, generally one
>> picks a number that isn't a character for the value at the end.  See
>> pg_regress.c or initdb.c for examples.
>>
>
> Done.
>
>
>>
>> +   wal_segment_size = atoi(str_wal_segment_size);
>>
>> So, you're comfortable interpreting --wal-segsize=1TB or
>> --wal-segsize=1GB as 1?  Implicitly, 1MB?
>>
>
> Imitating the current behaviour of the configure option --with-wal-segsize, I
> have used strtol to throw an error if the value is not a plain integer.
>
>
>>
>> + * ControlFile is not accessible here so use SHOW wal_segment_size
>> command
>> + * to set the XLogSegSize
>>
>> Breaks compatibility with pre-9.6 servers.
>>
>
> Added check for the version, the SHOW command will be run only in v10 and
> above. Previous versions do not need this.
>
>
>>
>> --
> Thank you,
>
> Beena Emerson
>
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] increasing the default WAL segment size

2017-03-21 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Mar 20, 2017 at 7:23 PM, David Steele  wrote:
> > With 16MB WAL segments the filename neatly aligns with the LSN.  For
> > example:
> >
> > WAL FILE 0000000100000001000000FE = LSN 1/FE000000
> >
> > This no longer holds true with this patch.
> 
> It is already possible to change the WAL segment size using the
> configure option --with-wal-segsize, and I think the patch should be
> consistent with whatever that existing option does.

Considering how little usage that option has likely seen (I can't say
I've ever run into usage of it so far...), I'm not really sure that it
makes sense to treat it as final when we're talking about changing the
default here.

In short, I'm also concerned about this change to make WAL file names no
longer match up with LSNs and also about the odd stepping that you get
as a result of this change when it comes to WAL file names.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 8:41 AM, Pavan Deolasee
 wrote:
>> Yeah.  So what's the deal with this?  Is somebody working on figuring
>> out a different approach that would reduce this overhead?  Are we
>> going to defer WARM to v11?  Or is the intent to just ignore the 5-10%
>> slowdown on a single column update and commit everything anyway?
>
> I think I should clarify something. The test case does a single column
> update, but it also has columns which are very wide, has an index on many
> columns (and it updates a column early in the list). In addition, in the
> test Mithun updated all 10 million rows of the table in a single transaction,
> used an UNLOGGED table, and had fsync turned off.
>
> TBH I see many artificial scenarios here. It will be very useful if he can
> rerun the query with some of these restrictions lifted. I'm all for
> addressing whatever we can, but I am not sure if this test demonstrates a
> real world usage.

That's a very fair point, but if these patches - or some of them - are
going to get committed then these things need to get discussed.  Let's
not just have nothing-nothing-nothing giant unagreed code drop.

I think that very wide columns and highly indexed tables are not
particularly unrealistic, nor do I think updating all the rows is
particularly unrealistic.  Sure, it's not everything, but it's
something.  Now, I would agree that all of that PLUS unlogged tables
with fsync=off is not too realistic.  What kind of regression would we
observe if we eliminated those last two variables?

> Having said that, maybe we can do a few things to reduce the overhead.
>
> - Check if the page has enough free space to perform a HOT/WARM update. If
> not, don't look for all index keys.
> - Pass bitmaps separately for each index and bail out early if we conclude
> neither HOT nor WARM is possible. In this case since there is just one index
> and as soon as we check the second column we know neither HOT nor WARM is
> possible, we will return early. It might complicate the API a lot, but I can
> give it a shot if that's what is needed to make progress.

I think that whether the code ends up getting contorted is an
important consideration here.  For example, if the first of the things
you mention can be done without making the code ugly, then I think
that would be worth doing; it's likely to help fairly often in
real-world cases.  The problem with making the code contorted and
ugly, as you say that the second idea would require, is that it can
easily mask bugs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

2017-03-21 Thread Teodor Sigaev

Thank you, pushed. I just make test table permanent.

Anastasia Lubennikova wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

As I can see, this bugfix was already discussed and reviewed.
All mentioned issues are fixed in the latest version of the patch
So I mark it "Ready for committer".
Thank you for fixing it!

The new status of this patch is: Ready for Committer



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Alvaro Herrera
Amit Kapila wrote:

> I think it is because heap_getattr() is not that cheap.  We have
> noticed a similar problem during development of the scan key push-down
> work [1].

One possibility to reduce the cost of that is to use whole tuple deform
instead of repeated individual heap_getattr() calls.  Since we don't
actually need *all* attrs, we can create a version of heap_deform_tuple
that takes an attribute number as argument and decodes up to that point.
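
Something along these lines (the function name and the attno variable are
hypothetical, mirroring the existing heap_deform_tuple() API):

/*
 * Hypothetical variant of heap_deform_tuple(): stop after decoding the
 * first 'lastatt' attributes, so a caller that only needs the early
 * indexed columns does not pay for the wide columns after them.
 */
extern void heap_deform_tuple_upto(HeapTuple tuple, TupleDesc tupleDesc,
								   int lastatt,
								   Datum *values, bool *isnull);

/* e.g. in heap_update(): deform each tuple once, up to the highest
 * interesting attribute, instead of one heap_getattr() per column */
heap_deform_tuple_upto(&oldtup, RelationGetDescr(relation),
					   max_interesting_attno, old_values, old_isnull);
heap_deform_tuple_upto(newtup, RelationGetDescr(relation),
					   max_interesting_attno, new_values, new_isnull);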

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-21 Thread Kuntal Ghosh
On Tue, Mar 21, 2017 at 10:52 AM, Michael Paquier
 wrote:
Thank you for the review.

> Unfortunately this is true only for background workers that connect to
> a database. And this would break for bgworkers that do not do that.
> The point to fix is here:
> +   if (MyBackendId != InvalidBackendId)
> +   {
> [...]
> +   else if (IsBackgroundWorker)
> +   {
> +   /* bgworker */
> +   beentry->st_backendType = B_BG_WORKER;
> +   }
> Your code is assuming that a bgworker will always be setting
> MyBackendId which is not necessarily true, and you would trigger the
> assertion down:
> Assert(MyAuxProcType != NotAnAuxProcess);
> So you need to rely on IsBackgroundWorker in priority I think. I would
> suggest as well to give up calling pgstat_bestart() in
> BackgroundWorkerInitializeConnection[ByOid] and let the workers do
> this work by themselves. This gives more flexibility. You would need
> to update the logical replication worker and worker_spi for that as
> well.
We reserve a slot for each possible BackendId, plus one for each
possible auxiliary process type. For a non-auxiliary process, the
BackendId is used to refer to the backend's status in the PgBackendStatus
array. So, a bgworker without any BackendId can't initialize its
entry in the PgBackendStatus array; in simple terms, it will not be shown
in pg_stat_activity. I've added some comments regarding this in
pgstat_bestart().
And, any bgworker having a valid BackendId will have either a valid
userid or BOOTSTRAP_SUPERUSERID.
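
For reference, the slot selection ends up looking roughly like this (a
simplified sketch of the pgstat_bestart() logic, not the patch verbatim):

/* sketch: which PgBackendStatus slot, if any, belongs to this process */
if (MyBackendId != InvalidBackendId)
{
	/* regular backend, or a bgworker with a reserved BackendId slot */
	beentry = &BackendStatusArray[MyBackendId - 1];
}
else if (MyAuxProcType != NotAnAuxProcess)
{
	/* auxiliary processes use the slots past MaxBackends */
	beentry = &BackendStatusArray[MaxBackends + MyAuxProcType];
}
else
{
	/* no slot reserved, so the process cannot appear in pg_stat_activity */
	return;
}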

> If you want to test this configuration, feel free to use this background 
> worker:
> https://github.com/michaelpq/pg_plugins/tree/master/hello_world
> This just prints an entry to the logs every 10s, and does not connect
> to any database. Adding a call to pgstat_bestart() in hello_main
> triggers the assertion.
>
In this case, pgstat_bestart() shouldn't be called from this module, as
the spawned bgworker will have an invalid BackendId. By the way, thanks
for sharing the repo link. Found a lot of interesting things to
explore and learn. :)

>>> @@ -808,6 +836,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
>>> nulls[12] = true;
>>> nulls[13] = true;
>>> nulls[14] = true;
>>> +   nulls[23] = true;
>>> }
>>> That's not possible to have backend_type set as NULL, no?
>>
>> Yes, backend_type can't be null. But, I'm not sure whether it should
>> be visible to a user with insufficient privileges. Anyway, I've made
>> it visible to all users for now.
>>
>> Please find the updated patches in the attachment.
>
> Yeah, hiding it may make sense...
Modified.

> /* The autovacuum launcher is done here */
> if (IsAutoVacuumLauncherProcess())
> +   {
> return;
> +   }
> Unnecessary noise here.
>
Fixed.

> Except for the two issues pointed out in this email, I am pretty cool
> with this patch. Nice work.
Thank you. :)

Please find the updated patches.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Infra-to-expose-all-backend-processes-in-pg_stat_get.patch
Description: application/download


0002-Expose-stats-for-all-backends.patch
Description: application/download


0003-Add-backend_type-column-in-pg_stat_get_activity.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 5:05 AM, Amit Langote
 wrote:
> Attached updated patches.

Committed 0001 after removing a comma.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze on Cygwin w/ concurrency

2017-03-21 Thread Robert Haas
On Mon, Mar 20, 2017 at 11:47 PM, Noah Misch  wrote:
> "pgbench -i -s 50; pgbench -S -j2 -c16 -T900 -P5" freezes consistently on
> Cygwin 2.2.1 and Cygwin 2.6.0.  (I suspect most other versions are affected.)
> I've pinged[1] the Cygwin bug thread with some additional detail.

Ouch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Multiple false-positive warnings from Valgrind

2017-03-21 Thread Aleksander Alekseev
Hello.

I need a little help.

Recently I've decided to run PostgreSQL under Valgrind according to wiki
description [1]. Lots of warnings are generated [2] but it is my
understanding that all of them are false-positive. For instance I've
found these two reports particularly interesting:

```
==00:00:40:40.161 7677== Use of uninitialised value of size 8
==00:00:40:40.161 7677==at 0xA15FF5: pg_b64_encode (base64.c:68)
==00:00:40:40.161 7677==by 0x6FFE85: scram_build_verifier (auth-scram.c:348)
==00:00:40:40.161 7677==by 0x6F3F76: encrypt_password (crypt.c:171)
==00:00:40:40.161 7677==by 0x68F40C: CreateRole (user.c:403)
==00:00:40:40.161 7677==by 0x85D53A: standard_ProcessUtility (utility.c:716)
==00:00:40:40.161 7677==by 0x85CCC7: ProcessUtility (utility.c:353)
==00:00:40:40.161 7677==by 0x85BD22: PortalRunUtility (pquery.c:1165)
==00:00:40:40.161 7677==by 0x85BF20: PortalRunMulti (pquery.c:1308)
==00:00:40:40.161 7677==by 0x85B4A0: PortalRun (pquery.c:788)
==00:00:40:40.161 7677==by 0x855672: exec_simple_query (postgres.c:1101)
==00:00:40:40.161 7677==by 0x8597BB: PostgresMain (postgres.c:4066)
==00:00:40:40.161 7677==by 0x7C6322: BackendRun (postmaster.c:4317)
==00:00:40:40.161 7677==  Uninitialised value was created by a stack allocation
==00:00:40:40.161 7677==at 0x6FFDB7: scram_build_verifier (auth-scram.c:328)

==00:00:40:40.593 7677== Use of uninitialised value of size 8
==00:00:40:40.593 7677==at 0x8A7C36: hex_encode (encode.c:132)
==00:00:40:40.593 7677==by 0x6FFEF5: scram_build_verifier (auth-scram.c:355)
==00:00:40:40.593 7677==by 0x6F3F76: encrypt_password (crypt.c:171)
==00:00:40:40.593 7677==by 0x68F40C: CreateRole (user.c:403)
==00:00:40:40.593 7677==by 0x85D53A: standard_ProcessUtility (utility.c:716)
==00:00:40:40.593 7677==by 0x85CCC7: ProcessUtility (utility.c:353)
==00:00:40:40.593 7677==by 0x85BD22: PortalRunUtility (pquery.c:1165)
==00:00:40:40.593 7677==by 0x85BF20: PortalRunMulti (pquery.c:1308)
==00:00:40:40.593 7677==by 0x85B4A0: PortalRun (pquery.c:788)
==00:00:40:40.593 7677==by 0x855672: exec_simple_query (postgres.c:1101)
==00:00:40:40.593 7677==by 0x8597BB: PostgresMain (postgres.c:4066)
==00:00:40:40.593 7677==by 0x7C6322: BackendRun (postmaster.c:4317)
==00:00:40:40.593 7677==  Uninitialised value was created by a stack allocation
==00:00:40:40.593 7677==at 0x6FFDB7: scram_build_verifier (auth-scram.c:328)
==00:00:40:40.593 7677==
```

And here is what I see in GDB [3]:

```
0x00a160b4 in pg_b64_encode (src=0xffefffb10 [...] at base64.c:80
80  *p++ = _base64[(buf >> 12) & 0x3f];
(gdb) monitor get_vbits 0xffefffb10 10
  

0x008a7c36 in hex_encode (
src=0xffefffbc0 [...] at encode.c:132
132 *dst++ = hextbl[(*src >> 4) & 0xF];
(gdb) monitor get_vbits 0xffefffbc0 32
       
```

So Valgrind thinks that in both cases the first argument is completely
uninitialized, which is very doubtful to say the least :) There are also
lots of memory leak reports, which can be found in [2].

I got a strong feeling that maybe I'm doing something wrong. Here are the
exact scripts I'm using to build [4] and to install and run PostgreSQL under
Valgrind [5]. Naturally, USE_VALGRIND is declared in pg_config_manual.h.
The Valgrind version is 3.12 and the environment in general is Arch Linux.

Could you please give me a little piece of advice? Or maybe the wiki page is
just a bit outdated?

[1] https://wiki.postgresql.org/wiki/Valgrind
[2] http://afiskon.ru/s/8a/390698e914_valgrind.tgz
[3] http://afiskon.ru/s/09/c4f6231679_pgvg.txt
[4] https://github.com/afiskon/pgscripts/blob/master/quick-build.sh
[5] https://github.com/afiskon/pgscripts/blob/master/valgrind.sh

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Amit Kapila
On Tue, Mar 21, 2017 at 6:55 PM, Robert Haas  wrote:
> On Tue, Mar 21, 2017 at 8:41 AM, Pavan Deolasee
>  wrote:
>>> Yeah.  So what's the deal with this?  Is somebody working on figuring
>>> out a different approach that would reduce this overhead?  Are we
>>> going to defer WARM to v11?  Or is the intent to just ignore the 5-10%
>>> slowdown on a single column update and commit everything anyway?
>>
>> I think I should clarify something. The test case does a single column
>> update, but it also has columns which are very wide, has an index on many
>> columns (and it updates a column early in the list). In addition, in the
>> test Mithun updated all 10 million rows of the table in a single transaction,
>> used an UNLOGGED table, and had fsync turned off.
>>
>> TBH I see many artificial scenarios here. It will be very useful if he can
>> rerun the query with some of these restrictions lifted. I'm all for
>> addressing whatever we can, but I am not sure if this test demonstrates a
>> real world usage.
>
> That's a very fair point, but if these patches - or some of them - are
> going to get committed then these things need to get discussed.  Let's
> not just have nothing-nothing-nothing giant unagreed code drop.
>
> I think that very wide columns and highly indexed tables are not
> particularly unrealistic, nor do I think updating all the rows is
> particularly unrealistic.  Sure, it's not everything, but it's
> something.  Now, I would agree that all of that PLUS unlogged tables
> with fsync=off is not too realistic.  What kind of regression would we
> observe if we eliminated those last two variables?
>

Sure, we can try that.  I think we need to try it with
synchronous_commit = off, otherwise, WAL writes completely overshadows
everything.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 10:01 AM, Amit Kapila  wrote:
>> I think that very wide columns and highly indexed tables are not
>> particularly unrealistic, nor do I think updating all the rows is
>> particularly unrealistic.  Sure, it's not everything, but it's
>> something.  Now, I would agree that all of that PLUS unlogged tables
>> with fsync=off is not too realistic.  What kind of regression would we
>> observe if we eliminated those last two variables?
>
> Sure, we can try that.  I think we need to try it with
> synchronous_commit = off, otherwise, WAL writes completely overshadows
> everything.

synchronous_commit = off is a much more realistic scenario than fsync = off.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Pavan Deolasee
On Tue, Mar 21, 2017 at 6:55 PM, Robert Haas  wrote:

>
> I think that very wide columns and highly indexed tables are not
> particularly unrealistic, nor do I think updating all the rows is
> particularly unrealistic.


Ok. But those who update 10M rows in a single transaction, would they
really notice 5-10% variation? I think it probably makes sense to run those
updates in smaller transactions and see if the regression is still visible
(otherwise tweaking synchronous_commit is moot anyway).


> Sure, it's not everything, but it's
> something.  Now, I would agree that all of that PLUS unlogged tables
> with fsync=off is not too realistic.  What kind of regression would we
> observe if we eliminated those last two variables?
>

Hard to say. I didn't find any regression on the machines available to me,
even with the original test case that I used, which was a pretty bad case to
start with (sure, Mithun tweaked it further to create an even worse scenario).
Maybe on the kind of machines he has access to, it might show up even with
those changes.


>
>
> I think that whether the code ends up getting contorted is an
> important consideration here.  For example, if the first of the things
> you mention can be done without making the code ugly, then I think
> that would be worth doing; it's likely to help fairly often in
> real-world cases.  The problem with making the code contorted and
> ugly, as you say that the second idea would require, is that it can
> easily mask bugs.


Agree. That's probably one reason why Alvaro wrote the patch to start with.
I'll give the first of those two options a try.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Alvaro Herrera
Robert Haas wrote:
> On Tue, Mar 21, 2017 at 10:01 AM, Amit Kapila  wrote:

> > Sure, we can try that.  I think we need to try it with
> > synchronous_commit = off, otherwise, WAL writes completely overshadows
> > everything.
> 
> synchronous_commit = off is a much more realistic scenario than fsync = off.

Sure, synchronous_commit=off is a reasonable case.  But I say if we lose
a few % on the case where you update only the first of a large number
of very wide, all-indexed columns, and this is only noticeable if
you don't write WAL and only if you update all the rows in the table,
then I don't see much reason for concern.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-21 Thread Simon Riggs
On 21 March 2017 at 02:21, Craig Ringer  wrote:
> On 20 March 2017 at 17:33, Andres Freund  wrote:
>
>>> Subject: [PATCH 2/3] Follow timeline switches in logical decoding
>>
>> FWIW, the title doesn't really seem accurate to me.
>
> Yeah, it's not really at the logical decoding layer at all.
>
> "Teach xlogreader to follow timeline switches" ?

Happy with that. I think Craig has addressed Andres' issues with this
patch, so I will apply it later today as planned, using that name.

The larger Logical Decoding on Standby patch will not be applied yet, and
not without further changes, per review.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-21 Thread Ashutosh Bapat
Thanks Rajkumar. Added those in the latest set of patches.

On Tue, Mar 21, 2017 at 3:52 PM, Rajkumar Raghuwanshi
 wrote:
>> On Mon, Mar 20, 2017 at 1:19 PM, Ashutosh Bapat
>>  wrote:
>
> I have created some test to cover partition wise joins with
> postgres_fdw, also verified make check.
> patch attached.
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
> QMG, EnterpriseDB Corporation



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing binaries

2017-03-21 Thread David G. Johnston
On Tue, Mar 21, 2017 at 5:12 AM, Robert Haas  wrote:

>
> Here's another idea: what if we always created the default database at
> initdb time?  For example, if I initdb as rhaas, maybe it should
> create an "rhaas" database for me, so that this works:
>
> initdb
> pg_ctl start
> psql
>
> I think a big part of the usability problem here comes from the fact
> that the default database for connections is based on the username,
> but the default databases that get created have fixed names (postgres,
> template1).  So the default configuration is one where you can't
> connect.  Why the heck do we do it that way?
>
>
I'd be curious to estimate how many users who have difficulties learning
how all this works actually run a manual initdb prior to beginning their
experimentation.  I suspect the percentage is fairly low.

Doing away with "the default database for psql is one named after the user"
seems like it would be more widely applicable.  I, for one, tend to name
things after what they do, or are used for, and thus have never benefited
from this particular design decision.

David J.


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 7:41 AM, Ashutosh Bapat
 wrote:
> On Mon, Mar 20, 2017 at 10:17 PM, Ashutosh Bapat
>  wrote:
>>>
>>> On further testing of this patch I found another case where it shows a
>>> regression: the time taken with the patch is around 160 secs and
>>> without it is 125 secs.
>>> Another minor thing to note is that the planning time almost doubles with
>>> this patch. I understand that this is for scenarios with really
>>> big 'big data', so it may not be a serious issue in such cases, but
>>> it'd be good to keep an eye on it so that it doesn't exceed
>>> computational bounds for a really large number of tables.
>>
>> Right, planning time would be proportional to the number of partitions
>> at least in the first version. We may improve upon it later.
>>
>>> Please find the attached .out file to check the output I witnessed, and
>>> let me know if any more information is required.
>>> The schema and data were similar to the previously shared schema, with the
>>> addition of more data for this case; the parameter settings used were:
>>> work_mem = 1GB
>>> random_page_cost = seq_page_cost = 0.1
>
> this doesn't look good. Why do you set both these costs to the same value?

That's a perfectly reasonable configuration if the data is in memory
on a medium with fast random access, like an SSD.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 10:21 AM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> On Tue, Mar 21, 2017 at 10:01 AM, Amit Kapila  
>> wrote:
>
>> > Sure, we can try that.  I think we need to try it with
>> > synchronous_commit = off, otherwise, WAL writes completely overshadows
>> > everything.
>>
>> synchronous_commit = off is a much more realistic scenario than fsync = off.
>
> Sure, synchronous_commit=off is a reasonable case.  But I say if we lose
> a few % on the case where you update only the first of a large number
> of very wide, all-indexed columns, and this is only noticeable if
> you don't write WAL and only if you update all the rows in the table,
> then I don't see much reason for concern.

If the WAL writing hides the loss, then I agree that's not a big
concern.  But if the loss is still visible even when WAL is written,
then I'm not so sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-21 Thread David Steele

On 3/21/17 9:04 AM, Stephen Frost wrote:

Robert,

* Robert Haas (robertmh...@gmail.com) wrote:

On Mon, Mar 20, 2017 at 7:23 PM, David Steele  wrote:

With 16MB WAL segments the filename neatly aligns with the LSN.  For
example:

WAL FILE 0000000100000001000000FE = LSN 1/FE000000

This no longer holds true with this patch.


It is already possible to change the WAL segment size using the
configure option --with-wal-segsize, and I think the patch should be
consistent with whatever that existing option does.


Considering how little usage that option has likely seen (I can't say
I've ever run into usage of it so far...), I'm not really sure that it
makes sense to treat it as final when we're talking about changing the
default here.


+1.  A seldom-used compile-time option does not necessarily provide a 
good model for a user-facing feature.



In short, I'm also concerned about this change to make WAL file names no
longer match up with LSNs and also about the odd stepping that you get
as a result of this change when it comes to WAL file names.


I can't decide which way I like best.  I like the filenames 
corresponding to LSNs as they do now, but it seems like a straight 
sequence might be easier to understand.  Either way you need to know 
that different segment sizes mean different numbers of segments per 
lsn.xlogid.


Even now the correspondence is a bit tenuous.  I've always thought:

00000001000000010000000F

Should be:

000000010000000100000F00
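
To make the current mapping concrete, here is a small standalone sketch of
how a 16MB-segment file name relates to an LSN (mirroring what the
XLByteToSeg()/XLogFileName() macros do):

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	const uint64_t seg_size = 16 * 1024 * 1024;	/* 16MB segments */
	const uint64_t segs_per_xlogid = UINT64_C(0x100000000) / seg_size;	/* 256 */
	uint64_t	lsn = (UINT64_C(1) << 32) | 0xFE000000;	/* LSN 1/FE000000 */
	uint64_t	segno = lsn / seg_size;
	unsigned	tli = 1;

	printf("%08X%08X%08X\n", tli,
		   (unsigned) (segno / segs_per_xlogid),
		   (unsigned) (segno % segs_per_xlogid));
	/*
	 * Prints 0000000100000001000000FE.  With 64MB segments the same LSN
	 * falls in segment ...0000003F, and the neat visual match is gone.
	 */
	return 0;
}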

I'm really excited to (hopefully) have this feature in v10.  I just want 
to be sure we discuss this as it will be a big change for tool authors 
and just about anybody who looks at WAL.


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing binaries

2017-03-21 Thread David Steele

On 3/21/17 10:30 AM, David G. Johnston wrote:

On Tue, Mar 21, 2017 at 5:12 AM, Robert Haas  wrote:


Here's another idea: what if we always created the default database at
initdb time?  For example, if I initdb as rhaas, maybe it should
create an "rhaas" database for me, so that this works:

initdb
pg_ctl start
psql

I think a big part of the usability problem here comes from the fact
that the default database for connections is based on the username,
but the default databases that get created have fixed names (postgres,
template1).  So the default configuration is one where you can't
connect.  Why the heck do we do it that way?


I'd be curious to estimate how many users who have difficulties
learning how all this works actually run a manual initdb prior to
beginning their experimentation.  I suspect the percentage is fairly low.

Doing away with "the default database for psql is one named after the
user" seems like it would be more widely applicable.  I, for one, tend to
name things after what they do, or are used for, and thus have never
benefited from this particular design decision.


I suppose it would be too big a change to have psql try the username and 
then fall back to postgres on failure?


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing binaries

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 10:43 AM, David Steele  wrote:
> I suppose it would be too big a change to have psql try the username and
> then fall back to postgres on failure?

It's not so much that the change would be too big as that the
resulting semantics would be confusing, at least IMHO.  Imagine:

1. Person A logs into the service account, runs psql, starts creating
stuff in postgres DB without realizing it.
2. Person B logs into the service account, runs psql -l, sees that the
usual DB name is not there, runs createdb.
3. Person A reruns psql and says "where'd all my stuff go?".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add missing support for new node fields

2017-03-21 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-21 07:22:57 +0100, Fabien COELHO wrote:
>> I've been annoyed by these stupid functions and forgetting to update them
>> since I ran into them while trying to fix an issue in pg_stat_statements some
>> time ago.
>> 
>> I've started to develop a perl script to generate most of them from headers.
>> It is not done yet, but it looks like it can work in the end with limited
>> effort. Currently it works for copy & equal.

> It'd have to do out/read as well imo.

Yeah.  A partial solution is pretty much useless.  Even with out/read
support as well, I'm not sure it's not useless, because you'd still
have to remember to consider places like expression_tree_walker and
expression_tree_mutator.  Those are not really amenable to automatic
generation AFAICS, because they have to understand the actual semantics
of each field.

It's conceivable that you could get somewhere if the starting point were
some marked-up representation of every node type's field list, rather than
just a C declaration.  (IOW, include/nodes/*.h would become generated
files as well.)  But really, isn't that just moving the error locale from
"forgetting to update equalfuncs.c" to "forgetting to include the correct
marker keywords"?  And people would have to learn about this generator
tool and its input language.

In the end there's very little substitute for checking every reference
to a struct type when you add/modify one of its fields.  grep is your
friend.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add missing support for new node fields

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 10:56 AM, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2017-03-21 07:22:57 +0100, Fabien COELHO wrote:
>>> I've been annoyed by these stupid functions and forgetting to update them
>>> since I ran into them while trying to fix an issue in pg_stat_statements some
>>> time ago.
>>>
>>> I've started to develop a perl script to generate most of them from headers.
>>> It is not done yet, but it looks like it can work in the end with limited
>>> effort. Currently it works for copy & equal.
>
>> It'd have to do out/read as well imo.
>
> Yeah.  A partial solution is pretty much useless.  Even with out/read
> support as well, I'm not sure it's not useless, because you'd still
> have to remember to consider places like expression_tree_walker and
> expression_tree_mutator.  Those are not really amenable to automatic
> generation AFAICS, because they have to understand the actual semantics
> of each field.

Well, let's not move the goalposts into the outer solar system.  There
are plenty of changes to node structure that don't require
expression_tree_walker or expression_tree_mutator to be touched at
all.  Expression nodes aren't touched all that frequently compared to
path, plan, etc. nodes.

IMHO, what would be a lot more useful than something that generates
{read,equal,copy,out}funcs.c automatically would be something that
just checks them for trivial errors of omission.  For example, if you
read a list of structure members from the appropriate header files and
cross-checked it against the list of structure members mentioned in
the body of a copy/equal/read/outfunc, you'd catch probably 80% of the
mistakes people make right there.  If you could also check for a
copy/read/equal/outfunc that ought to have been present but was
omitted entirely, that'd probably up it to 90%.

The idea would be that if you added a field that wasn't supposed to be
copied, you'd have to add something to copyfuncs.c that said, e.g.

/* NOTCOPIED: mymember */

...and the checking script would ignore things so flagged.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-21 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-20 16:06:27 -0400, Tom Lane wrote:
>> ... is there a reason why resultnum for EEOP_ASSIGN_* steps is declared
>> size_t and not just int?  Since it's an array index, and one that
>> certainly can't be bigger than AttrNumber, that seems rather confusing.

> Not that I can see, no.  I guess I might have "overcompensated" when
> changing it from AttrNumber - AttrNumber isn't a good idea because that
> needs an extra move-zero-extend, because 16bit indexing isn't that well
> supported on x86.  But that doesn't mean it should be a 64bit number -
> to the contrary actually.

OK, will fix in the edits I'm working on.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-03-21 Thread Teodor Sigaev

Thank you for your suggestions; do not hesitate to ask any questions.
Concurrency and GIN are both very interesting topics.


I had a look at the patch and found an issue.
Look at ginvacuum.c around line 387, in the function ginVacuumPostingTreeLeaves():


/*
 * The whole subtree is empty - just return TRUE to indicate that the
 * parent must do a cleanup, unless we are the ROOT and there is no way
 * to go further up.
 */

if (isChildHasVoid && !isAnyNonempy && !isRoot)
    return TRUE;

if (isChildHasVoid)
{
    ...
    ginScanToDelete(gvs, blkno, TRUE, &root, InvalidOffsetNumber);
}

In the first 'if' clause I see !isRoot, so the second 'if' and the
corresponding ginScanToDelete() call can be reached only for the root of the
posting tree. If so, it seems to me that it wasn't a good idea to move
ginScanToDelete() out of ginVacuumPostingTree(); the patch should instead
just remove the cleanup lock around ginVacuumPostingTreeLeaves(), and if
that reports that the tree contains an empty page,
then lock the whole posting tree to do the ginScanToDelete() work.
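
In other words, something like this control flow (a sketch only; the
argument lists here are approximate, not the actual signatures):

if (ginVacuumPostingTreeLeaves(gvs, rootBlkno, TRUE, &rootBuffer))
{
	/*
	 * The leaves pass reported empty pages: only now take the cleanup
	 * lock on the whole posting tree and delete them.
	 */
	LockBufferForCleanup(rootBuffer);
	ginScanToDelete(gvs, rootBlkno, TRUE, &root, InvalidOffsetNumber);
	UnlockReleaseBuffer(rootBuffer);
}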




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] kNN for SP-GiST

2017-03-21 Thread David Steele

Hi Nikita,

On 3/9/17 8:52 AM, Alexander Korotkov wrote:


I take a look to this patchset.  My first notes are following.


This thread has been idle for quite a while.  Please respond and/or post 
a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be 
marked "Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Potential data loss of 2PC files

2017-03-21 Thread Teodor Sigaev

If that can happen, don't we have the same problem in many other places?
Like, all the SLRUs? They don't fsync the directory either.

Right, pg_commit_ts and pg_clog fall into this category.


Implemented as attached.


Is unlink() guaranteed to be durable, without fsyncing the directory? If
not, then we need to fsync() the directory even if there are no files in it
at the moment, because some might've been removed earlier in the checkpoint
cycle.


What is the protection if pg crashes after unlink() but before fsync()? Right,
it's a rather small window for such a scenario, but isn't it better to have
another protection, like WAL-logging the removal of WAL segments...
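
For what it's worth, the durable-unlink pattern under discussion looks
roughly like this (a plain POSIX sketch, not PostgreSQL code):

#include <fcntl.h>
#include <unistd.h>

/* remove a file and make the removal itself crash-safe */
static int
durable_unlink_sketch(const char *path, const char *dirpath)
{
	int			fd;

	if (unlink(path) < 0)
		return -1;

	/* the directory entry is only durably gone once the directory
	 * itself has been fsync'd */
	fd = open(dirpath, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fsync(fd) < 0)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}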


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Robert Haas
On Sun, Mar 19, 2017 at 9:03 PM, Peter Geoghegan  wrote:
> On Sun, Mar 12, 2017 at 3:05 PM, Peter Geoghegan  wrote:
>> I attach my V9 of the patch. I came up some stuff for the design of
>> resource management that I think meets every design goal that we have
>> for shared/unified BufFiles:
>
> Commit 2609e91fc broke the parallel CREATE INDEX cost model. I should
> now pass -1 as the index block argument to compute_parallel_worker(),
> just as all callers that aren't parallel index scan do after that
> commit. This issue caused V9 to never choose parallel CREATE INDEX
> within nbtsort.c. There was also a small amount of bitrot.
>
> Attached V10 fixes this regression. I also couldn't resist adding a
> few new assertions that I thought were worth having to buffile.c, plus
> dedicated wait events for parallel tuplesort. And, I fixed a silly bug
> added in V9 around where worker_wait() should occur.

Some initial review comments:

- * This code is moderately slow (~10% slower) compared to the regular
- * btree (insertion) build code on sorted or well-clustered data.  On
- * random data, however, the insertion build code is unusable -- the
- * difference on a 60MB heap is a factor of 15 because the random
- * probes into the btree thrash the buffer pool.  (NOTE: the above
- * "10%" estimate is probably obsolete, since it refers to an old and
- * not very good external sort implementation that used to exist in
- * this module.  tuplesort.c is almost certainly faster.)

While I agree that the old comment is probably inaccurate, I don't
think dropping it without comment in a patch to implement parallel
sorting is the way to go. How about updating it to be more current as
a separate patch?

+/* Magic numbers for parallel state sharing */
+#define PARALLEL_KEY_BTREE_SHARED  UINT64CONST(0xA001)
+#define PARALLEL_KEY_TUPLESORT UINT64CONST(0xA002)
+#define PARALLEL_KEY_TUPLESORT_SPOOL2  UINT64CONST(0xA003)

1, 2, and 3 would probably work just as well.  The parallel
infrastructure uses high-numbered values to avoid conflict with
plan_node_id values, but this is a utility statement so there's no
such problem.  But it doesn't matter very much.

+ * Note: caller had better already hold some type of lock on the table and
+ * index.
+ */
+int
+plan_create_index_workers(Oid tableOid, Oid indexOid)

Caller should pass down the Relation rather than the Oid.  That is
better both because it avoids unnecessary work and because it more or
less automatically avoids the problem mentioned in the note.

Why is this being put in planner.c rather than something specific to
creating indexes?  Not sure that's a good idea.

+ * This should be called when workers have flushed out temp file buffers and
+ * yielded control to caller's process.  Workers should hold open their
+ * BufFiles at least until the caller's process is able to call here and
+ * assume ownership of BufFile.  The general pattern is that workers make
+ * available data from their temp files to one nominated process; there is
+ * no support for workers that want to read back data from their original
+ * BufFiles following writes performed by the caller, or any other
+ * synchronization beyond what is implied by caller contract.  All
+ * communication occurs in one direction.  All output is made available to
+ * caller's process exactly once by workers, following call made here at the
+ * tail end of processing.

Thomas has designed a system for sharing files among cooperating
processes that lacks several of these restrictions.  With his system,
it's still necessary for all data to be written and flushed by the
writer before anybody tries to read it.  But the restriction that the
worker has to hold its BufFile open until the leader can assume
ownership goes away.  That's a good thing; it avoids the need for
workers to sit around waiting for the leader to assume ownership of a
resource instead of going away faster and freeing up worker slots for
some other query, or moving on to some other computation.   The
restriction that the worker can't reread the data after handing off
the file also goes away.  The files can be read and written by any
participant in any order, as many times as you like, with only the
restriction that the caller must guarantee that data will be written
and flushed from private buffers before it can be read.  I don't see
any reason to commit both his system and your system, and his is more
general so I think you should use it.  That would cut hundreds of
lines from this patch with no real disadvantage that I can see --
including things like worker_wait(), which are only needed because of
the shortcomings of the underlying mechanism.

+ * run.  Parallel workers always use quicksort, however.

Comment fails to mention a reason.

+elog(LOG, "%d using " INT64_FORMAT " KB of memory for read
buffers among %d input tapes",
+ state->worker, state->availMem / 1024, numInputTapes);

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Andres Freund
On 2017-03-21 08:04:11 -0400, Robert Haas wrote:
> On Tue, Mar 21, 2017 at 6:56 AM, Amit Kapila  wrote:
> >> Hmm, that test case isn't all that synthetic.  It's just a single
> >> column bulk update, which isn't anything all that crazy, and 5-10%
> >> isn't nothing.
> >>
> >> I'm kinda surprised it made that much difference, though.
> >>
> >
> > I think it is because heap_getattr() is not that cheap.  We have
> > noticed a similar problem during development of the scan key push-down
> > work [1].
> 
> Yeah.  So what's the deal with this?  Is somebody working on figuring
> out a different approach that would reduce this overhead?

I think one reasonable thing would be to use slots here, and use
slot_getsomeattrs(), with a pre-computed offset, for doing the
deforming.  Given that more than one place run into the issue with
deforming cost via heap_*, that seems like something we're going to have
to do.  Additionally the patches I had for JITed deforming all
integrated at the slot layer, so it'd be a good thing from that angle as
well.

Deforming all columns at once would also be a boon for the accompanying
index_getattr calls.
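
Roughly like this on the caller side (a sketch; max_interesting_attno
stands for the pre-computed offset mentioned above):

int			attnum;

/* deform everything up to the highest attribute we care about, once */
slot_getsomeattrs(slot, max_interesting_attno);

for (attnum = 1; attnum <= max_interesting_attno; attnum++)
{
	Datum		value = slot->tts_values[attnum - 1];
	bool		isnull = slot->tts_isnull[attnum - 1];

	/* compare old and new column values here, with no heap_getattr() */
}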


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-03-21 Thread Simon Riggs
On 16 March 2017 at 10:03, Amit Langote  wrote:
> On 2017/03/15 7:09, Robert Haas wrote:

>> I think that eliding the Append node when there's only one child may
>> be unsafe in the case where the child's attribute numbers are
>> different from the parent's attribute numbers.  I remember Tom making
>> some comment about this when I was working on MergeAppend, although I
>> no longer remember the specific details.
>
> Append node elision does not occur in the one-child case.  With the patch:
...
> create table q1 partition of q for values in (1);
> explain select * from q;
>  QUERY PLAN
> ------------------------------------------------------------
>  Append  (cost=0.00..35.50 rows=2550 width=4)
>->  Seq Scan on q1  (cost=0.00..35.50 rows=2550 width=4)
> (2 rows)
>
> Maybe that should be done, but this patch doesn't implement that.

Robert raises the possible problem that removing the Append wouldn't
work when the parent and child attribute numbers don't match. Surely
that never happens with partitions, by definition?

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add missing support for new node fields

2017-03-21 Thread Pavel Stehule
>
> IMHO, what would be a lot more useful than something that generates
> {read,equal,copy,out}funcs.c automatically would be something that
> just checks them for trivial errors of omission.  For example, if you
> read a list of structure members from the appropriate header files and
> cross-checked it against the list of structure members mentioned in
> the body of a copy/equal/read/outfunc, you'd catch probably 80% of the
> mistakes people make right there.  If you could also check for a
> copy/read/equal/outfunc that ought to have been present but was
> omitted entirely, that'd probably up it to 90%.
>
>
+1

The work on nodes maintenance is not too much, but some smart check can be
nice.

Regards

Pavel


> The idea would be that if you added a field that wasn't supposed to be
> copied, you'd have to add something to copyfuncs.c that said, e.g.
>
> /* NOTCOPIED: mymember */
>
> ...and the checking script would ignore things so flagged.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Andres Freund
On 2017-03-21 19:49:07 +0530, Pavan Deolasee wrote:
> On Tue, Mar 21, 2017 at 6:55 PM, Robert Haas  wrote:
> 
> >
> > I think that very wide columns and highly indexed tables are not
> > particularly unrealistic, nor do I think updating all the rows is
> > particularly unrealistic.
> 
> 
> Ok. But those who update 10M rows in a single transaction, would they
> really notice 5-10% variation?

Yes. It's very common in ETL, and that's quite performance sensitive.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-21 Thread Jeff Janes
On Tue, Mar 21, 2017 at 4:00 AM, Ashutosh Sharma 
wrote:

> Hi Jeff,
>
> On Tue, Mar 21, 2017 at 1:54 PM, Amit Kapila 
> wrote:
> > On Tue, Mar 21, 2017 at 1:28 PM, Jeff Janes 
> wrote:
> >> Against an unmodified HEAD (17fa3e8), I got a segfault in the hot
> standby.
> >>
> >
> > I think I see the problem in hash_xlog_vacuum_get_latestRemovedXid().
> > It seems to me that we are using different block_id for registering
> > the deleted items in xlog XLOG_HASH_VACUUM_ONE_PAGE and then using
> > different block_id for fetching those items in
> > hash_xlog_vacuum_get_latestRemovedXid().  So probably matching those
> > will fix this issue (instead of fetching block number and items from
> > block_id 1, we should use block_id 0).
> >
>
> Thanks for reporting this issue. As Amit said, it is happening due to
> block_id mismatch. Attached is the patch that fixes the same. I
> apologise for such a silly mistake. Please note that  I was not able
> to reproduce the issue on my local machine using the test script you
> shared. Could you please check with the attached patch if you are
> still seeing the issue. Thanks in advance.
>


Hi Ashutosh,

I can confirm that that fixes the seg faults for me.

Did you mean you couldn't reproduce the problem in the first place, or that
you could reproduce it and now the patch fixes it?  If the first of those,
I forgot to say that you do have to wait for the hot standby to reach
consistency and open for connections, and then connect to the standby
("psql -p 9874"), before the seg fault will be triggered.

But there are places where hash_xlog_vacuum_get_latestRemovedXid diverges
from btree_xlog_delete_get_latestRemovedXid, and I don't understand the
reason for the divergence.  Is there a reason we dropped the PANIC if we
have not reached consistency?  That is a case which should never happen,
but it seems worth preserving the PANIC.  And why does this code get
'unused' from XLogRecGetBlockData(record, 0, &len), while the btree code
gets it from xlrec?  Is that because the record being replayed is
structured differently between btree and hash, or is there some other
reason?

Thanks,

Jeff


Re: [HACKERS] Partitioned tables and relfilenode

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 12:19 PM, Simon Riggs  wrote:
> On 16 March 2017 at 10:03, Amit Langote  wrote:
>> On 2017/03/15 7:09, Robert Haas wrote:
>
>>> I think that eliding the Append node when there's only one child may
>>> be unsafe in the case where the child's attribute numbers are
>>> different from the parent's attribute numbers.  I remember Tom making
>>> some comment about this when I was working on MergeAppend, although I
>>> no longer remember the specific details.
>>
>> Append node elision does not occur in the one-child case.  With the patch:
> ...
>> create table q1 partition of q for values in (1);
>> explain select * from q;
>>  QUERY PLAN
>> 
>>  Append  (cost=0.00..35.50 rows=2550 width=4)
>>->  Seq Scan on q1  (cost=0.00..35.50 rows=2550 width=4)
>> (2 rows)
>>
>> Maybe that should be done, but this patch doesn't implement that.
>
> Robert raises the possible problem that removing the Append wouldn't
> work when the parent and child attribute numbers don't match. Surely
> that never happens with partitions, by definition?

No, the attribute numbers don't have to match.  This decision was made
a long time ago, and there have been a whole bunch of followup commits
since the original partitioning patch that were dedicated to fixing up
cases where that wasn't working properly in the original commit.  It
seems superficially attractive to require the attribute numbers to
match, but it creates some really unpleasant cases.  For example,
suppose a user tries to create a partitioned table, drops a column,
then creates a standalone table which matches the apparent column list
of the partitioned table, then tries to attach it as a partition.

ERROR: the columns you previously dropped from the parent that you
can't see and don't know about aren't the same as the ones dropped
from the standalone table you're trying to attach as a partition
DETAIL: Try recreating your proposed new partition with a
pass-by-value column of width 8 after the third column, and then
dropping that column before trying to attach it as a partition.
HINT: Abandon all hope, ye who enter here.

Not cool with that.
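
For concreteness, here is a minimal sketch of that scenario (table and
partition names made up):

  CREATE TABLE parent (a int, junk int, b text) PARTITION BY LIST (a);
  ALTER TABLE parent DROP COLUMN junk;   -- parent keeps a hole at attnum 2
  CREATE TABLE part1 (a int, b text);    -- matches the apparent column list
  ALTER TABLE parent ATTACH PARTITION part1 FOR VALUES IN (1);

The ATTACH is accepted precisely because attribute numbers need not match:
"b" is attnum 3 in the parent but attnum 2 in the partition.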

The decision not to require the attribute numbers to match doesn't
necessarily mean we can't get rid of the Append node, though.  First
of all, in a lot of practical cases the attribute numbers will all
match.  Second, if they don't, the most that would be required is a
projection step, which could usually be done without a separate node
because most nodes are projection-capable.  And maybe not even that
much is needed; I'd have to go back and look at what Tom was worried
about the last time this came up.  (Hmm, maybe the problem had to do
with varnos matching, rather than attribute numbers?)

Another and independent problem with eliding the Append node is that,
if we did that, we'd still have to guarantee that the parent relation
corresponding to the Append node got locked somehow.  Otherwise, we'll
be accessing the tuple routing information for a table on which we
don't have a lock.  That's probably a solvable problem, too, but it
hasn't been solved yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we create a new roadmap page for development?

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 2:36 AM, Tsunakawa, Takayuki
 wrote:
> I'd like to share our roadmap for PostgreSQL development, as other companies
> and individuals do on the following page.  But this page is for PostgreSQL 10.
>
> PostgreSQL10 Roadmap
> https://wiki.postgresql.org/wiki/PostgreSQL10_Roadmap
>
> Should I create a page for PostgreSQL 11 likewise?  Or, do you want a more 
> stable page named "PostgreSQL Roadmap" instead of "PostgreSQL11 Roadmap" and 
> attach the target version to each feature as in "Feature X (V11)", so that we 
> can represent longer-term goals?

I think a page that's just called PostgreSQL Roadmap will get out of
date pretty quickly.  Ultimately it'll end up like the TODO list,
which is not really a list of things TO DO.  The advantage of the
version-specific pages is that old information kind of ages its way
out of the system.

> And, why don't we add a link to the above roadmap page in the "Development 
> information" page?
>
> Development information
> https://wiki.postgresql.org/wiki/Development_information

I'm sure nobody will object to you doing that.  It's a wiki, and
intended to be edited.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Bruce Momjian
On Tue, Mar 21, 2017 at 09:25:49AM -0400, Robert Haas wrote:
> On Tue, Mar 21, 2017 at 8:41 AM, Pavan Deolasee
> > TBH I see many artificial scenarios here. It will be very useful if he can
> > rerun the query with some of these restrictions lifted. I'm all for
> > addressing whatever we can, but I am not sure if this test demonstrates a
> > real world usage.
> 
> That's a very fair point, but if these patches - or some of them - are
> going to get committed then these things need to get discussed.  Let's
> not just have nothing-nothing-nothing giant unagreed code drop.

First, let me say I love this feature for PG 10, along with
multi-variate statistics.

However, not to be a bummer on this, but the persistent question I have
is whether we are locking ourselves into a feature that can only do
_one_ index-change per WARM chain before a lazy vacuum is required.  Are
we ever going to figure out how to do more changes per WARM chain in the
future, and is our use of so many bits for this feature going to
restrict our ability to do that in the future?

I know we have talked about it, but not recently, and if everyone else
is fine with it, I am too, but I have to ask these questions.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

2017-03-21 Thread David Steele

Hi Michael,

On 3/10/17 9:15 AM, Peter Eisentraut wrote:

On 3/9/17 17:03, Michael Paquier wrote:

Having something like --limit-retained-segments partially addresses
it, as long as there is a way to define an automatic mode, based on
statvfs() obviously.


But that is not portable/usable enough, as we have determined, I think.

Have you looked into using inotify for implementing your use case?


This thread has been idle for quite a while.  Please respond and/or post 
a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be 
marked "Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 12:49 PM, Bruce Momjian  wrote:
> On Tue, Mar 21, 2017 at 09:25:49AM -0400, Robert Haas wrote:
>> On Tue, Mar 21, 2017 at 8:41 AM, Pavan Deolasee
>> > TBH I see many artificial scenarios here. It will be very useful if he can
>> > rerun the query with some of these restrictions lifted. I'm all for
>> > addressing whatever we can, but I am not sure if this test demonstrates a
>> > real world usage.
>>
>> That's a very fair point, but if these patches - or some of them - are
>> going to get committed then these things need to get discussed.  Let's
>> not just have nothing-nothing-nothing giant unagreed code drop.
>
> First, let me say I love this feature for PG 10, along with
> multi-variate statistics.
>
> However, not to be a bummer on this, but the persistent question I have
> is whether we are locking ourselves into a feature that can only do
> _one_ index-change per WARM chain before a lazy vacuum is required.  Are
> we ever going to figure out how to do more changes per WARM chain in the
> future, and is our use of so many bits for this feature going to
> restrict our ability to do that in the future?
>
> I know we have talked about it, but not recently, and if everyone else
> is fine with it, I am too, but I have to ask these questions.

I think that's a good question.  I previously expressed similar
concerns.  On the one hand, it's hard to ignore the fact that, in the
cases where this wins, it already buys us a lot of performance
improvement.  On the other hand, as you say (and as I said), it eats
up a lot of bits, and that limits what we can do in the future.  On
the one hand, there is a saying that a bird in the hand is worth two
in the bush.  On the other hand, there is also a saying that one
should not paint oneself into the corner.

I'm not sure we've had any really substantive discussion of these
issues.  Pavan's response to my previous comments was basically "well,
I think it's worth it", which is entirely reasonable, because he
presumably wouldn't have written the patch that way if he thought it
sucked.  But it might not be the only opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-03-21 Thread David Steele

Hi Ivan,

On 3/12/17 10:20 PM, Thomas Munro wrote:

On Fri, Mar 10, 2017 at 1:49 AM, Ivan Kartyshov
 wrote:

Here I have attached the rebased patch waitlsn_10dev_v3 (core feature).
I will leave the choice of implementation (core/contrib) to the discretion
of the community.

I will be glad to hear your suggestions about the syntax, code, and patch.


Hi Ivan,

Here is some feedback based on a first read-through of the v4 patch.
I haven't tested it yet.


This thread has been idle for over a week.  Please respond and/or post a 
new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be 
marked "Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Peter Geoghegan
On Tue, Mar 21, 2017 at 10:04 AM, Robert Haas  wrote:
> I think that's a good question.  I previously expressed similar
> concerns.  On the one hand, it's hard to ignore the fact that, in the
> cases where this wins, it already buys us a lot of performance
> improvement.  On the other hand, as you say (and as I said), it eats
> up a lot of bits, and that limits what we can do in the future.  On
> the one hand, there is a saying that a bird in the hand is worth two
> in the bush.  On the other hand, there is also a saying that one
> should not paint oneself into the corner.

Are we really saying that there can be no incompatible change to the
on-disk representation for the rest of eternity? I can see why that's
something to avoid indefinitely, but I wouldn't like to rule it out.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan [take-2]

2017-03-21 Thread David Steele

Hi,

On 3/13/17 3:25 AM, Jeevan Chalke wrote:


I have reviewed this patch further and here are my comments:


This thread has been idle for over a week.  Please respond and/or post a 
new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be 
marked "Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 1:08 PM, Peter Geoghegan  wrote:
> On Tue, Mar 21, 2017 at 10:04 AM, Robert Haas  wrote:
>> I think that's a good question.  I previously expressed similar
>> concerns.  On the one hand, it's hard to ignore the fact that, in the
>> cases where this wins, it already buys us a lot of performance
>> improvement.  On the other hand, as you say (and as I said), it eats
>> up a lot of bits, and that limits what we can do in the future.  On
>> the one hand, there is a saying that a bird in the hand is worth two
>> in the bush.  On the other hand, there is also a saying that one
>> should not paint oneself into the corner.
>
> Are we really saying that there can be no incompatible change to the
> on-disk representation for the rest of eternity? I can see why that's
> something to avoid indefinitely, but I wouldn't like to rule it out.

Well, I don't want to rule it out either, but if we do a release to
which you can't pg_upgrade, it's going to be really painful for a lot
of users.  Many users can't realistically upgrade using pg_dump, ever.
So they'll be stuck on the release before the one that breaks
compatibility for a very long time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] free space map and visibility map

2017-03-21 Thread Masahiko Sawada
On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas  wrote:
> On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes  wrote:
>> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which then
>> can't leave the block as all visible or all frozen).  I think the issue
>> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this correctly,
>> that neither of those ever update the FSM, regardless of FPI?
>
> Yes, updates to the FSM are never logged.  Forcing replay of
> HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
>

I think I was missing something. I imagined your situation to be that the
FPI is replayed during crash recovery after the crashed server had vacuumed
the page and marked it as all-frozen. But that situation is also resolved
by this solution.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Generic type subscripting

2017-03-21 Thread David Steele

Hi Dmitry,

On 3/14/17 7:10 PM, Tom Lane wrote:

Dmitry Dolgov <9erthali...@gmail.com> writes:

[ generic_type_subscription_v7.patch ]


I looked through this a bit.


This thread has been idle for over a week.  Please respond and/or post a 
new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be 
marked "Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Bruce Momjian
On Tue, Mar 21, 2017 at 01:04:14PM -0400, Robert Haas wrote:
> > I know we have talked about it, but not recently, and if everyone else
> > is fine with it, I am too, but I have to ask these questions.
> 
> I think that's a good question.  I previously expressed similar
> concerns.  On the one hand, it's hard to ignore the fact that, in the
> cases where this wins, it already buys us a lot of performance
> improvement.  On the other hand, as you say (and as I said), it eats
> up a lot of bits, and that limits what we can do in the future.  On
> the one hand, there is a saying that a bird in the hand is worth two
> in the bush.  On the other hand, there is also a saying that one
> should not paint oneself into the corner.
> 
> I'm not sure we've had any really substantive discussion of these
> issues.  Pavan's response to my previous comments was basically "well,
> I think it's worth it", which is entirely reasonable, because he
> presumably wouldn't have written the patch that way if he thought it
> sucked.  But it might not be the only opinion.

Early in the discussion we talked about allowing multiple changes per
WARM chain if they all changed the same index and were in the same
direction so there were no duplicates, but it was complicated.  There
was also discussion about checking the index during INSERT/UPDATE to see
if there was a duplicate.  However, those ideas never led to further
discussion.

I know the current patch yields good results, but only on a narrow test
case, so I am not ready to just stop asking questions based on the
opinion of the author or test results alone.

If someone came to me and said, "We have thought about allowing more
than one index change per WARM chain, and if we can ever do it, it will
probably be done this way, and we have the bits for it," I would be more
comfortable.

One interesting side-issue is that indirect indexes have a similar
problem with duplicate index entries, and there is no plan on how to fix
that either.  I guess I just don't feel we have explored the
duplicate-index-entry problem enough for me to be comfortable.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-03-21 Thread Robert Haas
On Sun, Mar 12, 2017 at 10:20 PM, Thomas Munro
 wrote:
> Maybe someone can think of a clever way for an extension to insert a
> wait for a user-supplied LSN *before* acquiring a snapshot so it can
> work for the higher levels, or maybe the hooks should go into core
> PostgreSQL so that the extension can exist as an external project not
> requiring a patched PostgreSQL installation, or maybe this should be
> done with new core syntax that extends transaction commands.  Do other
> people have views on this?

IMHO, trying to do this using a function-based interface is a really
bad idea for exactly the reasons you mention.  I don't see why we'd
resist the idea of core syntax here; transactions are a core part of
PostgreSQL.

There is, of course, the question of whether making LSNs such a
user-visible thing is a good idea in the first place, but that's a
separate question from the issue of what syntax for such a thing is best.
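
Purely as an illustration of the syntax question, here is one hypothetical
shape core syntax could take; this is a sketch, not necessarily what the
patch actually implements, and the table name and LSN value are made up:

  -- on the standby: block until the given LSN has been replayed,
  -- then take the snapshot and run the query
  WAITLSN '0/693FF800';
  SELECT count(*) FROM mytable;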

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-21 Thread Bruce Momjian
On Tue, Mar 21, 2017 at 01:14:00PM -0400, Robert Haas wrote:
> On Tue, Mar 21, 2017 at 1:08 PM, Peter Geoghegan  wrote:
> > On Tue, Mar 21, 2017 at 10:04 AM, Robert Haas  wrote:
> >> I think that's a good question.  I previously expressed similar
> >> concerns.  On the one hand, it's hard to ignore the fact that, in the
> >> cases where this wins, it already buys us a lot of performance
> >> improvement.  On the other hand, as you say (and as I said), it eats
> >> up a lot of bits, and that limits what we can do in the future.  On
> >> the one hand, there is a saying that a bird in the hand is worth two
> >> in the bush.  On the other hand, there is also a saying that one
> >> should not paint oneself into the corner.
> >
> > Are we really saying that there can be no incompatible change to the
> > on-disk representation for the rest of eternity? I can see why that's
> > something to avoid indefinitely, but I wouldn't like to rule it out.
> 
> Well, I don't want to rule it out either, but if we do a release to
> which you can't pg_upgrade, it's going to be really painful for a lot
> of users.  Many users can't realistically upgrade using pg_dump, ever.
> So they'll be stuck on the release before the one that breaks
> compatibility for a very long time.

Right.  If we weren't setting tuple and tid bits we could improve it
easily in PG 11, but if we use them for a single-change WARM chain for
PG 10, we might need bits that are not available to improve it later.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-03-21 Thread Simon Riggs
On 21 March 2017 at 16:33, Robert Haas  wrote:
> On Tue, Mar 21, 2017 at 12:19 PM, Simon Riggs  wrote:
>> On 16 March 2017 at 10:03, Amit Langote  
>> wrote:
>>> On 2017/03/15 7:09, Robert Haas wrote:
>>
 I think that eliding the Append node when there's only one child may
 be unsafe in the case where the child's attribute numbers are
 different from the parent's attribute numbers.  I remember Tom making
 some comment about this when I was working on MergeAppend, although I
 no longer remember the specific details.
>>>
>>> Append node elision does not occur in the one-child case.  With the patch:
>> ...
>>> create table q1 partition of q for values in (1);
>>> explain select * from q;
>>>  QUERY PLAN
>>> 
>>>  Append  (cost=0.00..35.50 rows=2550 width=4)
>>>->  Seq Scan on q1  (cost=0.00..35.50 rows=2550 width=4)
>>> (2 rows)
>>>
>>> Maybe that should be done, but this patch doesn't implement that.
>>
>> Robert raises the possible problem that removing the Append wouldn't
>> work when the parent and child attribute numbers don't match. Surely
>> that never happens with partitions, by definition?
>
> No, the attribute numbers don't have to match.  This decision was made
> a long time ago, and there have been a whole bunch of followup commits
> since the original partitioning patch that were dedicated to fixing up
> cases where that wasn't working properly in the original commit.  It
> seems superficially attractive to require the attribute numbers to
> match, but it creates some really unpleasant cases.  For example,
> suppose a user tries to create a partitioned table, drops a column,
> then creates a standalone table which matches the apparent column list
> of the partitioned table, then tries to attach it as a partition.
>
> ERROR: the columns you previously dropped from the parent that you
> can't see and don't know about aren't the same as the ones dropped
> from the standalone table you're trying to attach as a partition
> DETAIL: Try recreating your proposed new partition with a
> pass-by-value column of width 8 after the third column, and then
> dropping that column before trying to attach it as a partition.
> HINT: Abandon all hope, ye who enter here.
>
> Not cool with that.

Thanks for the explanation.

> The decision not to require the attribute numbers to match doesn't
> necessarily mean we can't get rid of the Append node, though.  First
> of all, in a lot of practical cases the attribute numbers will all
> match.  Second, if they don't, the most that would be required is a
> projection step, which could usually be done without a separate node
> because most nodes are projection-capable.  And maybe not even that
> much is needed; I'd have to go back and look at what Tom was worried
> about the last time this came up.  (Hmm, maybe the problem had to do
> with varnos matching, rather than attribute numbers?)

There used to be some code there to fix them up, not sure where that went.

> Another and independent problem with eliding the Append node is that,
> if we did that, we'd still have to guarantee that the parent relation
> corresponding to the Append node got locked somehow.  Otherwise, we'll
> be accessing the tuple routing information for a table on which we
> don't have a lock.  That's probably a solvable problem, too, but it
> hasn't been solved yet.

Hmm, why would we need to access tuple routing information?

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-21 Thread Masahiko Sawada
On Wed, Mar 15, 2017 at 4:50 PM, Amit Kapila  wrote:
> On Thu, Mar 9, 2017 at 10:21 PM, Masahiko Sawada  
> wrote:
>> On Wed, Mar 8, 2017 at 1:43 AM, Peter Geoghegan  wrote:
>>> On Sat, Mar 4, 2017 at 1:30 AM, Amit Kapila  wrote:
> While I can't see this explained anywhere, I'm
> pretty sure that that's supposed to be impossible, which this patch
> changes.
>

 What makes you think that patch will allow pg_class.relfrozenxid to be
 advanced past opaque->btpo.xact which was previously not possible?
>>>
>>> By not reliably recycling pages in a timely manner, we won't change
>>> anything about the dead page. It just sticks around. This is mostly
>>> fine, but we still need VACUUM to be able to reason about it (to
>>> determine if it is in fact recyclable), which is what I'm concerned
>>> about breaking here. It still needs to be *possible* to recycle any
>>> recyclable page at some point (e.g., when we find it convenient).
>>>
>>> pg_class.relfrozenxid is InvalidTransactionId for indexes because
>>> indexes generally don't store XIDs. This is the one exception that I'm
>>> aware of, presumably justified by the fact that it's only for
>>> recyclable pages anyway, and those are currently *guaranteed* to get
>>> recycled quickly. In particular, they're guaranteed to get recycled by
>>> the next VACUUM. They may be recycled in the course of anti-wraparound
>>> VACUUM, even if VACUUM has no garbage tuples to kill (even if we only
>>> do lazy_cleanup_index() instead of lazy_vacuum_index()). This is the
>>> case that this patch proposes to have us skip touching indexes for.
>>>
>>
>> To prevent this, I think we need to not skip lazy_cleanup_index
>> during anti-wraparound vacuum (aggressive = true) even if the number of
>> scanned pages is less than the threshold. This can ensure that
>> pg_class.relfrozenxid is not advanced past opaque->btpo.xact with
>> minimal pain. Even if the dead btree page is left behind, a subsequent
>> modification makes garbage on the heap, and then autovacuum recycles
>> that page while vacuuming the index (lazy_vacuum_index).
>>
>
> What about if somebody does manual vacuum and there are no garbage
> tuples to clean, won't in that case also you want to avoid skipping
> the lazy_cleanup_index?

Yes, in that case lazy_cleanup_index will be skipped.

> Another option could be to skip updating the
> relfrozenxid if we have skipped the index cleanup.

This could make anti-wraparound VACUUM occur at high frequency, and then
we could not skip lazy_cleanup_index during aggressive vacuum after all.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-21 Thread David Steele

Hi,

On 3/15/17 9:50 AM, Amit Kapila wrote:


What about if somebody does manual vacuum and there are no garbage
tuples to clean, won't in that case also you want to avoid skipping
the lazy_cleanup_index?  Another option could be to skip updating the
relfrozenxid if we have skipped the index cleanup.


This thread has been idle for six days.  Please respond and/or post a 
new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be 
marked "Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Measuring replay lag

2017-03-21 Thread David Steele

Hi Thomas,

On 3/15/17 8:38 PM, Simon Riggs wrote:

On 16 March 2017 at 08:02, Thomas Munro  wrote:


I agree that these states exist, but we disagree on what 'lag' really
means, or, rather, which of several plausible definitions would be the
most useful here.

My proposal is that the *_lag columns should always report how long it
took for recently written, flushed and applied WAL to be written,
flushed and applied (and for the primary to know about it).  By this
definition, sent LSN = applied LSN is not a special case: we simply
report how long that LSN took to be written, flushed and applied.

Your proposal is that the *_lag columns should report how far in the
past the standby is at each of the three stages with respect to the
current end of WAL.  By this definition when sent LSN = applied LSN we
are currently in the 'A' state meaning 'caught up' and should show
00:00:00.


I accept your proposal for how we handle these, on condition that you
write up some docs that explain the subtle difference between the two,
so we can just show people the URL. That needs to explain clearly the
difference in an impartial way between "what is the most recent lag
measurement" and "how long until we are caught up" as possible
interpretations of these values. Thanks.
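
For readers catching up on the thread: the columns being discussed are the
proposed write_lag, flush_lag and replay_lag in pg_stat_replication, which
one would read on the primary with, e.g.:

  SELECT application_name, write_lag, flush_lag, replay_lag
    FROM pg_stat_replication;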


This thread has been idle for six days.  Please respond and/or post a 
new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be 
marked "Returned with Feedback".


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: recursive json_populate_record()

2017-03-21 Thread David Steele

On 3/16/17 11:54 AM, David Steele wrote:

On 2/1/17 12:53 AM, Michael Paquier wrote:

On Thu, Jan 26, 2017 at 6:49 AM, Tom Lane  wrote:

Nikita Glukhov  writes:

On 25.01.2017 23:58, Tom Lane wrote:

I think you need to take a second look at the code you're producing
and realize that it's not so clean either.  This extract from
populate_record_field, for example, is pretty horrid:



But what if we introduce some helper macros like this:



#define JsValueIsNull(jsv) \
 ((jsv)->is_json ? !(jsv)->val.json.str \
 : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull)



#define JsValueIsString(jsv) \
 ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \
 : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString)


Yeah, I was wondering about that too.  I'm not sure that you can make
a reasonable set of helper macros that will fix this, but if you want
to try, go for it.

BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have
to go back to the manual every darn time to convince myself whether
that means (a?b:c)||d or a?b:(c||d).  It's better not to rely on
the reader (... or the author) having memorized C's precedence rules
in quite that much detail.  Extra parens help.


Moved to CF 2017-03 as discussion is going on and more review is
needed on the last set of patches.



The current patches do not apply cleanly at cccbdde:

$ git apply ../other/0001-introduce-JsonContainerXxx-macros-v04.patch
error: patch failed: src/backend/utils/adt/jsonb_util.c:328
error: src/backend/utils/adt/jsonb_util.c: patch does not apply
error: patch failed: src/backend/utils/adt/jsonfuncs.c:1266
error: src/backend/utils/adt/jsonfuncs.c: patch does not apply
error: patch failed: src/include/utils/jsonb.h:218
error: src/include/utils/jsonb.h: patch does not apply

In addition, it appears a number of suggestions have been made by Tom
that warrant new patches.  Marked "Waiting on Author".


This thread has been idle for months since Tom's review.

The submission has been marked "Returned with Feedback".  Please feel 
free to resubmit to a future commitfest.


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send

2017-03-21 Thread David Steele

On 3/16/17 11:56 AM, David Steele wrote:


My recommendation is that we mark this patch "Returned with Feedback" to
allow you time to test and refine the patch.  You can resubmit once it
is ready.


This submission has been marked "Returned with Feedback".  Please feel 
free to resubmit to a future commitfest.


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-03-21 Thread Robert Haas
On Mon, Mar 20, 2017 at 1:38 AM, Craig Ringer  wrote:
> On 14 March 2017 at 19:57, Robert Haas  wrote:
>> On Mon, Mar 13, 2017 at 10:22 PM, Craig Ringer  wrote:
>>> I'll introduce a new LWLock, ClogTruncationLock, which will be held
>>> from when we advance the new clogOldestXid field through to when clog
>>> truncation completes.
>>
>> +1.
>
> OK, here's the revised patch. Relevant comments added where needed.
>
> It still changes the xlog record for clog truncation to carry the new
> xid, but I've left advancing oldestXid until the checkpoint as is
> currently the case, and only advanced oldestClogXid when we replay
> clog truncation.

/me smacks forehead.  Actually, it should be CLogTruncationLock, with
a capital L, for consistency with CLogControlLock.

The new lock needs to be added to the table in monitoring.sgml.

I don't think the new header comments in TransactionIdDidCommit and
TransactionIdDidAbort are super-clear.  I'm not sure you're going to
be able to explain it there in a reasonable number of words, but I
think that speaking of "testing against oldestClogXid" will leave
people wondering what exactly that means. Maybe just write "caller is
responsible for ensuring that the clog records covering XID being
looked up can't be truncated away while the lookup is in progress",
and then leave the bit about CLogTruncationLock to be explained by the
callers that do that.  Or you could drop these comments entirely.

Overall, though, I think that 0001 looks far better than any previous
iteration.  It's simple.  It looks safe.  It seems unlikely to break
anything that works now.  Woo hoo!
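
As an aside for readers, here is a sketch of how the function this series
adds would be used, going by the txid_status(bigint) signature in the
subject line; the literal xid is made up:

  SELECT txid_status(txid_current());   -- 'in progress' for our own xid
  SELECT txid_status(1234567890);       -- 'committed', 'aborted',
                                        -- 'in progress', or NULL once the
                                        -- clog covering it is truncated away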

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication apply to run with sync commit off by default

2017-03-21 Thread Robert Haas
On Mon, Mar 20, 2017 at 7:56 PM, Petr Jelinek
 wrote:
> On 18/03/17 13:31, Petr Jelinek wrote:
>> On 07/03/17 06:23, Petr Jelinek wrote:
>>> there has been discussion at the logical replication initial copy thread
>>> [1] about making apply work with sync commit off by default for
>>> performance reasons and adding option to change that per subscription.
>>>
>>> Here I propose patch to implement this - it adds boolean column
>>> subssynccommit to pg_subscription catalog which decides if
>>> synchronous_commit should be off or local for apply. And it adds
>>> SYNCHRONOUS_COMMIT = boolean to the list of WITH options for CREATE and
>>> ALTER SUBSCRIPTION. When nothing is specified it will set it to false.
>>>
>>> The patch is built on top of copy patch currently as there are conflicts
>>> between the two and this helps a bit with testing of copy patch.
>>>
>>> [1]
>>> https://www.postgresql.org/message-id/CA+TgmoY7Lk2YKArcp4O=Qu=xoor8j71mad1oteojawmuje3...@mail.gmail.com
>>>
>>
>> I rebased this patch against recent changes and the latest version of
>> copy patch.
>
> And another rebase after pg_dump tests commit.

+else if (strcmp(defel->defname, "nosynchronous_commit") == 0
+         && synchronous_commit)
+{
+    if (synchronous_commit_given)
+        ereport(ERROR,
+                (errcode(ERRCODE_SYNTAX_ERROR),
+                 errmsg("conflicting or redundant options")));
+
+    synchronous_commit_given = true;
+    *synchronous_commit = !defGetBoolean(defel);
+}

Uh, what's this nosynchronous_commit thing?

+  local otherwise to false. The
+  default value is false independently of the default
+  synchronous_commit setting for the instance.

This phrasing isn't very clear or accurate, IMHO.  I'd say something
like "The value of this parameter overrides the synchronous_commit
setting.  The default value is false."  And I'd make the word
synchronous_commit in that sentence a link to the GUC, so that it's
absolutely unmistakable what we mean by "the synchronous_commit
setting".

 /*
+ * We need to make new connection to new slot if slot name has changed
+ * so exit here as well if that's the case.
+ */
+if (strcmp(newsub->slotname, MySubscription->slotname) != 0)
+{
+    ereport(LOG,
+            (errmsg("logical replication worker for subscription \"%s\" will "
+                    "restart because the replication slot name was changed",
+                    MySubscription->name)));
+
+    walrcv_disconnect(wrconn);
+    proc_exit(0);
+}

Looks unrelated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-03-21 Thread Andrew Borodin
Hi, Teodor!

2017-03-21 20:32 GMT+05:00 Teodor Sigaev :
> I had a look on patch

That's great, thanks!

>
> /*
>  * All subtree is empty - just return TRUE to indicate that parent
> must
>  * do a cleanup. Unless we are ROOT an there is way to go upper.
>  */
>
> if(isChildHasVoid && !isAnyNonempy && !isRoot)
> return TRUE;
>
> if(isChildHasVoid)
> {
> ...
> ginScanToDelete(gvs, blkno, TRUE, &root,
> InvalidOffsetNumber);
> }
>
In the first 'if' clause I see !isRoot, so the second 'if' and the
corresponding ginScanToDelete() can be called only for the root of the
posting tree.

No, the second conditional will be called for any subtree that contains a
totally empty subtree. The !isRoot check covers the case where the entire
posting tree should be erased: we cannot just quit out of the recursive
cleanup; we have to make a scan here, starting from the root.

Probably the variable isChildHasVoid has a somewhat confusing name. This
flag indicates that some subtree:
1. Had empty pages
2. Did not bother deleting them, because there is a chance that it is
part of a bigger empty subtree.
Maybe it'd be better to call the variable "someChildIsVoidSubtree".

> just remove the lock for cleanup over ginVacuumPostingTreeLeaves(), and if
> it reports that the tree contains an empty page, then lock the whole
> posting tree to do the ginScanToDelete() work.

It is, indeed, a viable approach. But the currently proposed patch is
capable of dealing with small page deletes without much locking fuss,
even in 4-5 level trees.

Which way do you think we should take?

Best regards, Andrey Borodin.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

