[HACKERS] pg_dump, pg_dumpall and data durability

2016-10-12 Thread Michael Paquier
Hi all,

In my quest to make the backup tools more compliant with data
durability, here is a thread for pg_dump and pg_dumpall. Here is my
proposal in a couple of lines:
- Addition in _archiveHandle of a field to track if the dump generated
should be synced or not.
- This is effective for all modes, when the user specifies an output
file. In short that's when fileSpec is not NULL.
- Actually do the sync in _EndData and _EndBlob[s] if appropriate.
There is for example nothing to do for pg_backup_null.c
- Addition of --nosync option to allow users to disable it. By default
it is enabled.
Note that to make the data durable, the file needs to be sync'ed as
well as its parent folder. So with pg_dump we can only make the dump
really durable with -Fd. I think that in the case where the user
specifies an output file for the other modes we should sync it; that's
the best we can do. This last statement applies as well to
pg_dumpall.

Of course, if no output file is specified, that's up to the user to
deal with the sync phases.

Or, more simply, since true durability can only be achieved if each
file and its parent directory are fsync'd, we could support the
operation only for -Fd. Still, I'd like to think that we had better do
the best we can here and handle the other modes as well.
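For readers following along, the file-plus-parent-directory sequence described above can be sketched as a minimal POSIX example; durable_write is a made-up name for illustration, not the actual pg_dump code:

```c
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write data to path and make it durable: fsync the file itself,
 * then fsync the parent directory so the directory entry survives
 * a crash as well.  Returns 0 on success, -1 on failure. */
static int
durable_write(const char *path, const char *data)
{
    char    dirbuf[1024];
    int     fd, dfd, rc;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, data, strlen(data)) < 0 || fsync(fd) < 0)
    {
        close(fd);
        return -1;
    }
    close(fd);

    /* dirname() may modify its argument, so work on a copy. */
    snprintf(dirbuf, sizeof(dirbuf), "%s", path);
    dfd = open(dirname(dirbuf), O_RDONLY);
    if (dfd < 0)
        return -1;
    rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

Without the second fsync, a crash right after the dump finishes could leave the file missing from its directory even though its contents were flushed.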

Thoughts? I'd like to prepare a patch according to those lines for the next CF.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Mention to pg_backup in pg_dump.c

2016-10-12 Thread Michael Paquier
Hi all,

I just bumped into the following thing in pg_dump.c:
/* Set default options based on progname */
if (strcmp(progname, "pg_backup") == 0)
format = "c";
As far as I can see this comes from e8f69be0, dating from 2000. Couldn't
this be removed? I have never seen traces of pg_backup in the code.

Regards,
-- 
Michael




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-12 Thread Vitaly Burovoy
On 10/12/16, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> Vitaly Burovoy  writes:
 P.S.: I still think it is a good idea to change storage format,

>>> I'm not sure which part of "no" you didn't understand,

I just paid attention to the words "likelihood" (mixed up with
"likeliness"), "we wanted" and "probably".
Also, there was a note that it "would also break send/recv", but that
behavior can be preserved.
And after your letter, Julien Rouhaud wrote about the mapping from
MAC-48 to EUI-64, which leads to the absence of a bit indicating the
version of a stored value. That can be considered new information.
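For concreteness, the 48-bit-to-EUI-64 mapping under discussion can be sketched like this (illustrative code only, not PostgreSQL's; in the IEEE scheme, the inserted FF:FF marks an encapsulated MAC-48 and FF:FE a mapped EUI-48, which is exactly the version information that is lost if everything is stored as plain 8 bytes):

```c
#include <stdint.h>
#include <string.h>

/* Expand a 48-bit address to EUI-64 by inserting two bytes between
 * the OUI (first 3 bytes) and the device part (last 3 bytes).
 * FF:FF marks an encapsulated MAC-48, FF:FE a mapped EUI-48. */
static void
addr48_to_eui64(const uint8_t in[6], uint8_t out[8], int is_mac48)
{
    memcpy(out, in, 3);                /* OUI */
    out[3] = 0xFF;
    out[4] = is_mac48 ? 0xFF : 0xFE;
    memcpy(out + 5, in + 3, 3);        /* device-specific bits */
}
```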

>>> but we're
>>> not breaking on-disk compatibility of existing macaddr columns.

Can I ask why? It will not be a varlena (typstorage will not be
changed); it just changes typlen to 8 and typalign to 'd'.
For every major release 9.0, 9.1, 9.2 .. 9.6 the docs say "A
dump/restore using pg_dumpall, or use of pg_upgrade, is required".
Both handle changes in a storage format, don't they?

>>> Breaking the on-the-wire binary I/O representation seems like a
>>> nonstarter as well.

I wrote that for EUI-48 (MAC-48) values the binary I/O
representation can be preserved.
The binary format (in the DataRow message) includes the length of the
column value, which is reflected in PGresAttValue.len in libpq.
A client that works with the binary format must consult the
length of the data.
But as long as the client works with (and columns contain) MAC-48
values, nothing changes for it and PGresAttValue.len is "6", as now.

>> I think the suggestion was to rename macaddr to macaddr6 or similar,
>> keeping the existing behavior and the current OID.  So existing columns
>> would continue to work fine and maintain on-disk compatibility, but any
>> newly created columns would become the 8-byte variant.
>
> ... and would have different I/O behavior from before, possibly breaking
> applications that expect "macaddr" to mean what it used to.  I'm still
> dubious that that's a good idea.

Only if the new type sends xx:xx:xx:FF:FF:xx:xx:xx instead of the
usual (expected) 6 octets.
Again, that case in my proposal is similar (in I/O behavior) to "just
change 'macaddr' to keep and accept both MAC-48 and MAC-64", but it
allows using the "-k" option of pg_upgrade to avoid rewriting possibly
huge (for instance, 'log') tables (though users unexpectedly get
"macaddr6" in their columns and function names after the upgrade,
which looks strange enough).

> The larger picture here is that we got very little thanks when we squeezed
> IPv6 into the pre-existing inet datatype;

Without any sarcasm, I thank a lot all the people involved in it,
because it spares me (and many other people) from distinguishing IPv4
and IPv6 at the app level.
I write apps and just save the remote address of clients to an "inet"
column named "remote_ip" without thinking "what if we start serving
clients via IPv6?"; or I have a column named "allowed_ip" with IPs or
subnets and just check the client's IPv4 or IPv6 against that white
list (using "allowed_ip >>= $1"). It just works.

> there's a large number of people
> who just said "no thanks" and started using the add-on ip4r type instead.

I found a repository[1] on GitHub. From its description it is
understandable why people used ip4r back then (in 2005). The reason
"Furthermore, they are variable length types (to support ipv6) with
non-trivial overheads" is mentioned last in its README.
When you deal with IPv4 in 99.999% of cases, storing it in TOAST
tables leads to a big penalty, but the new version of macaddr is not
so wide, so it does not lead to a similar slowdown (it will be stored
in place).

> So I'm not sure why we want to complicate our lives in order to make
> macaddr follow the same path.

Because, according to the Wiki[3], MAC addresses now "are formed
according to the rules of one of three numbering name spaces ...:
MAC-48, EUI-48, and EUI-64.", so the IEEE extended the range of
allowed values from 48 to 64 bits, and since Postgres claims support
for "mac addresses", I (as a developer who still uses PG as a primary
database) expect support for any kind of MAC address, not a limited
one. I expect it to just work.

I refuse to imagine what I would have to do if I had a column of type
"macaddr" and unexpectedly had to deal with input of the EUI-64
kind. Add a new column, or change the column's type?

In the first case, what do I do with stored procedures? Duplicate an
input parameter to pass the new macaddr8 column (where macaddr was
passed before)? Duplicate the stored procedure?
I would also have to support two columns at the application level. Why?
I just want to store the value in the DB, work with it there and get it back!

In the second case (if output is not mapped back to MAC-48 when
possible), I have the same troubles you wrote about (oid, I/O and text
representation, at least for output, maybe also for input).
Moreover, I would still have to rewrite tables, but not when I'm ready
for it (at a migration stage from one 

Re: [HACKERS] Change of extension name to new name

2016-10-12 Thread Tom Lane
Haribabu Kommi  writes:
> As we are planning to change an extension name from one name to another
> name because of additional features that are added into this extension,

The usual approach to that is just to increase the version number.
Why is it necessary to change the name?

> I just thought of adding the support of (ALTER EXTENSION name RENAME To
> newname), this can be executed before executing the pg_upgrade to the new
> extension name that is available in the
> newer version.

And if the user forgets to do that before upgrading?  Not to mention
that the extension is mostly broken the moment its SQL name no longer
corresponds to the on-disk control file name.  This seems like
a non-solution.

In general, once you've shipped something, changing its name is a huge
pain both for you and your users.  Just say no.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Extend framework from commit 53be0b1ad to report latch waits.

2016-10-12 Thread Michael Paquier
On Thu, Oct 13, 2016 at 8:38 AM, Andres Freund  wrote:
> On 2016-10-12 11:18:57 -0400, Peter Eisentraut wrote:
>> On 10/4/16 11:29 AM, Tom Lane wrote:
>> > Robert Haas  writes:
>> >> Apparently, 'make world' does not build worker_spi.  I thought 'make
>> >> world' was supposed to build everything?
>> >
>> > You'd have thunk, yeah.  It looks like the issue is that src/Makefile
>> > is selective about recursing into certain subdirectories of test/,
>> > but mostly not test/ itself.  src/test/Makefile naively believes it's
>> > in charge, though.  Probably that logic ought to get shoved down one
>> > level, and then adjusted so that src/test/modules gets built by "all".
>> > Or else teach top-level "make world" to do "make all" in src/test/,
>> > but that seems like it's just doubling down on confusing interconnections.
>>
>> We generally don't build test code during make world.
>
> FWIW, I find that quite annoying. I want to run make world with parallelism so
> I can run make check-world afterwards with as little unnecessary
> unparallelized work as possible. And since the latter takes just about
> forever, any errors visible earlier are good.
>
> What are we gaining by this behaviour?

Personally, I have been trapped recently by the fact that worker_spi
does not get built when doing a simple check-world... So +1 for just
building it when running check-world. We gain nothing with the current
behavior except alarms on the buildfarm.
-- 
Michael




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-12 Thread Haribabu Kommi
On Thu, Oct 13, 2016 at 7:59 AM, Tom Lane  wrote:

> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> Vitaly Burovoy  writes:
> >>> P.S.: I still think it is a good idea to change storage format,
>
> >> I'm not sure which part of "no" you didn't understand, but we're
> >> not breaking on-disk compatibility of existing macaddr columns.
> >> Breaking the on-the-wire binary I/O representation seems like a
> >> nonstarter as well.
>
> > I think the suggestion was to rename macaddr to macaddr6 or similar,
> > keeping the existing behavior and the current OID.  So existing columns
> > would continue to work fine and maintain on-disk compatibility, but any
> > newly created columns would become the 8-byte variant.
>
> ... and would have different I/O behavior from before, possibly breaking
> applications that expect "macaddr" to mean what it used to.  I'm still
> dubious that that's a good idea.
>
> The larger picture here is that we got very little thanks when we squeezed
> IPv6 into the pre-existing inet datatype; there's a large number of people
> who just said "no thanks" and started using the add-on ip4r type instead.
> So I'm not sure why we want to complicate our lives in order to make
> macaddr follow the same path.
>
>
Thanks for all your opinions regarding the addition of a new datatype
to support EUI-64 MAC addresses. I will work on it and come up with a
patch.


Regards,
Hari Babu
Fujitsu Australia


[HACKERS] Change of extension name to new name

2016-10-12 Thread Haribabu Kommi
Hi All,

We are planning to change an extension's name because additional
features have been added to it and the old name no longer matches, so
we decided to rename it. But this is causing a problem for already
existing installations: an installation with the old extension name
fails when it tries to upgrade to the new version.

9.5 + 'extension_name1' -> 9.6 + 'extension_name2' fails during
pg_upgrade when it tries to load the extension and its object
dependencies on 9.6. If we keep the same extension name, it passes.

Is there any possibility to change the extension name without causing
the pg_upgrade failure, with minimal changes?

I just thought of adding support for (ALTER EXTENSION name RENAME TO
newname); this could be executed before running pg_upgrade to switch
to the new extension name that is available in the newer version.

Is there any simpler way to change the extension name? If not, any
opinions about adding the syntax support to rename an extension?

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] parallel.sgml

2016-10-12 Thread Tatsuo Ishii
> On Thu, Oct 13, 2016 at 10:57 AM, Tatsuo Ishii  wrote:
>>> While reading parallel.sgml, I met with following sentence.
>>>
>>> If this occurs, the leader will execute the portion of the plan
>>> between below the Gather node entirely by itself,
>>> almost as if the Gather node were not present.
>>>
>>> Maybe "the portion of the plan between below" should have been
>>> "the portion of the plan below"?
>>
>> Can someone, possibly a native English speaker, please confirm whether
>> this is a typo or not? I cannot parse the sentence as it is, but this
>> may be because I'm not a native English speaker.
> 
> No native speaker here, but it seems to me that "between" could be ripped off.

Thanks. Unless there's an objection, I will commit the proposed change.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Non-empty default log_line_prefix

2016-10-12 Thread Alvaro Herrera
Peter Eisentraut wrote:
> On 10/12/16 11:58 AM, Christoph Berg wrote:
> > (Yes, the '' default might be fine for syslog, but I don't think
> > that's a good argument for sticking with it for default installs. I've
> > seen way too many useless log files out there, and at worst we'll have
> > syslogs with two timestamps.)
> 
> We'd have to think of a way to sort this out during upgrades.

Sure, but we don't have to do that in this patch, do we?

Some systems such as Debian copy the postgresql.conf file from the old
installation to the new one, and then have specific code to adjust
settings that no longer exist in the new version.  I'm not a fan of that
approach myself precisely because if PGDG installs new defaults (such as
in this case), the sites using that upgrade tool don't benefit.  A
better approach might be to copy over the non-default settings from the
old install to the new.  If Debian were to do that then we wouldn't need
to do anything here.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Parallel Index Scans

2016-10-12 Thread Amit Kapila
As of now, the driving table for parallel query is accessed by
parallel sequential scan which limits its usage to a certain degree.
Parallelising index scans would further increase the usage of parallel
query in many more cases.  This patch enables the parallelism for the
btree scans.  Supporting parallel index scan for other index types
like hash, gist, spgist can be done as separate patches.

The basic idea is quite similar to parallel heap scans: each worker
(including the leader, whenever possible) will scan a block and then
get the next block that needs to be scanned. The parallelism is
implemented at the leaf level of the btree.  The first worker to start
a btree scan will descend to the leaf, and the others will wait until
the first worker has reached the leaf.  After reading the leaf block,
the first worker will set the next block to be read, wake the first
worker waiting to scan the next block, and proceed with scanning
tuples from the block it has read; similarly, each worker, after
reading a block, sets the next block to be read and wakes up the first
waiting worker.  This is achieved by using the condition variable
patch [1] proposed by Robert.  Parallelism is supported for both
forward and backward scans.
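The hand-off described above can be modelled as a toy, using plain pthreads in place of the proposed condition-variable facility (the descent and block numbers are simulated; this is only a sketch of the coordination, not the actual btree code):

```c
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

#define NBLOCKS 64

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  leaf_ready = PTHREAD_COND_INITIALIZER;
static bool descending = false;   /* a worker is walking down the tree */
static bool leaf_found = false;   /* leftmost leaf has been located */
static int  next_block = 0;       /* next leaf block to hand out */
static int  claimed[NBLOCKS];     /* how often each block was handed out */

/* Each worker calls this in a loop; -1 means the scan is finished. */
static int
claim_next_block(void)
{
    int blk = -1;

    pthread_mutex_lock(&lock);
    if (!leaf_found && !descending)
    {
        /* First worker: "descend" to the leftmost leaf outside the lock,
         * then publish it and wake everyone waiting behind us. */
        descending = true;
        pthread_mutex_unlock(&lock);
        usleep(1000);                 /* stand-in for the btree descent */
        pthread_mutex_lock(&lock);
        leaf_found = true;
        pthread_cond_broadcast(&leaf_ready);
    }
    while (!leaf_found)
        pthread_cond_wait(&leaf_ready, &lock);
    if (next_block < NBLOCKS)
    {
        blk = next_block++;           /* set the next block for others */
        claimed[blk]++;
    }
    pthread_mutex_unlock(&lock);
    return blk;
}

static void *
worker(void *arg)
{
    (void) arg;
    while (claim_next_block() != -1)
        ;                             /* "scan tuples from the block" */
    return NULL;
}

/* Run a leader plus workers; returns how many block claims were made. */
static int
run_parallel_scan(int nworkers)
{
    pthread_t th[8];
    int total = 0;

    for (int i = 0; i < nworkers && i < 8; i++)
        pthread_create(&th[i], NULL, worker, NULL);
    for (int i = 0; i < nworkers && i < 8; i++)
        pthread_join(th[i], NULL);
    for (int i = 0; i < NBLOCKS; i++)
        total += claimed[i];          /* each block is claimed exactly once */
    return total;
}
```

Because the block counter is advanced under the lock, every block is handed out exactly once no matter how the workers interleave, mirroring how only one backend at a time publishes the next leaf block.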

The optimizer will choose a parallel plan based on the number of pages
in the index relation, and the CPU cost of evaluating the rows is
divided equally among the workers.  The Index Scan node is made
parallel-aware and can be used beneath a Gather node as shown below:

Current Plan for Index Scans

 Index Scan using idx2 on test  (cost=0.42..7378.96 rows=2433 width=29)
   Index Cond: (c < 10)


Parallel version of plan
--
 Gather  (cost=1000.42..1243.40 rows=2433 width=29)
   Workers Planned: 1
   ->  Parallel Index Scan using idx2 on test  (cost=0.42..0.10
rows=1431 width=29)
 Index Cond: (c < 10)


Parallel index scans can be used in parallelising aggregate
queries as well.  For example, given a query like:  select count(*)
from t1 where c1 > 1000 and c1 < 1100 and c2='aaa' Group By c2; the
below forms of parallel plans are possible:

 Finalize HashAggregate
   Group Key: c2
   ->  Gather
 Workers Planned: 1
 ->  Partial HashAggregate
   Group Key: c2
   ->  Parallel Index Scan using idx_t1_partial on t1
 Index Cond: ((c1 > 1000) AND (c1 < 1100))
 Filter: (c2 = 'aaa'::bpchar)

OR

Finalize GroupAggregate
   Group Key: c2
   ->  Sort
 ->  Gather
   Workers Planned: 1
   ->  Partial GroupAggregate
 Group Key: c2
 ->  Parallel Index Scan using idx_t1_partial on t1
   Index Cond: ((c1 > 1000) AND (c1 < 1100))
   Filter: (c2 = 'aaa'::bpchar)

In the second plan (GroupAggregate), the Sort + Gather step would be
replaced with GatherMerge, once we have a GatherMerge node as proposed
by Rushabh [2].  Note that the above examples are just meant to explain
the usage of parallel index scan; actual plans will be selected based
on cost.

Performance tests

This test has been performed on the community machine hydra (POWER-7).

Initialize pgbench with 3000 scale factor (./pgbench -i -s 3000 postgres)

Count the rows in pgbench_accounts based on values of aid and bid

Serial plan
--
set max_parallel_workers_per_gather=0;

postgres=# explain analyze select count(aid) from pgbench_accounts
where aid > 1000 and aid < 9000 and bid > 800 and bid < 900;

                                   QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=4714590.52..4714590.53 rows=1 width=8) (actual
time=35684.425..35684.425 rows=1 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.57..4707458.12 rows=2852961 width=4) (actual
time=29210.743..34385.271 rows=990 loops
=1)
 Index Cond: ((aid > 1000) AND (aid < 9000))
 Filter: ((bid > 800) AND (bid < 900))
 Rows Removed by Filter: 80098999
 Planning time: 0.183 ms
 Execution time: 35684.459 ms
(7 rows)


Parallel Plan
---
set max_parallel_workers_per_gather=2;

postgres=# explain analyze select count(aid) from pgbench_accounts
where aid > 1000 and aid < 9000 and bid > 800 and bid < 900;

                                   QUERY PLAN
------------------------------------------------------------------------------
 Finalize Aggregate  (cost=3924773.13..3924773.14 rows=1 width=8)
(actual time=15033.105..15033.105 rows=1 loops=1)
   ->  Gather  (cost=3924772.92..3924773.12 rows=2 width=8) (actual
time=15032.986..15033.093 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 

Re: [HACKERS] parallel.sgml

2016-10-12 Thread Michael Paquier
On Thu, Oct 13, 2016 at 10:57 AM, Tatsuo Ishii  wrote:
>> While reading parallel.sgml, I met with following sentence.
>>
>> If this occurs, the leader will execute the portion of the plan
>> between below the Gather node entirely by itself,
>> almost as if the Gather node were not present.
>>
>> Maybe "the portion of the plan between below" shoud have been
>> "the portion of the plan below"?
>
> Can someone, possibly a native English speaker, please confirm this is
> typo or not? I cannot parse the sentence as it is, but this maybe
> because I'm not a English speaker.

No native speaker here, but it seems to me that "between" could be ripped off.
-- 
Michael




Re: [HACKERS] Non-empty default log_line_prefix

2016-10-12 Thread Peter Eisentraut
On 10/12/16 11:58 AM, Christoph Berg wrote:
> (Yes, the '' default might be fine for syslog, but I don't think
> that's a good argument for sticking with it for default installs. I've
> seen way too many useless log files out there, and at worst we'll have
> syslogs with two timestamps.)

We'd have to think of a way to sort this out during upgrades.
Documentation might be enough.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Non-empty default log_line_prefix

2016-10-12 Thread Peter Eisentraut
On 10/12/16 11:40 AM, Tom Lane wrote:
> There would be some value in the complexity you're thinking of for
> installations that log to multiple targets concurrently, but really,
> who does that?

I see that a lot.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-12 Thread Tomas Vondra
On 10/12/2016 08:55 PM, Robert Haas wrote:
> On Wed, Oct 12, 2016 at 3:21 AM, Dilip Kumar  wrote:
>> I think at higher client count from client count 96 onwards contention
>> on CLogControlLock is clearly visible and which is completely solved
>> with group lock patch.
>>
>> And at lower client count 32,64  contention on CLogControlLock is not
>> significant hence we can not see any gain with group lock patch.
>> (though we can see some contention on CLogControlLock is reduced at 64
>> clients.)
> 
> I agree with these conclusions.  I had a chance to talk with Andres
> this morning at Postgres Vision and based on that conversation I'd
> like to suggest a couple of additional tests:
> 
> 1. Repeat this test on x86.  In particular, I think you should test on
> the EnterpriseDB server cthulhu, which is an 8-socket x86 server.
> 
> 2. Repeat this test with a mixed read-write workload, like -b
> tpcb-like@1 -b select-only@9
> 

FWIW, I'm already running similar benchmarks on an x86 machine with 72
cores (144 with HT). It's "just" a 4-socket system, but the results I
got so far seem quite interesting. The tooling and results (pushed
incrementally) are available here:

https://bitbucket.org/tvondra/hp05-results/overview

The tooling is completely automated, and it also collects various
stats, for example the wait events. So perhaps we could simply run it
on cthulhu and get comparable results, and also more thorough data
sets than just snippets posted to the list?

There's also a bunch of reports for the 5 already completed runs

 - dilip-300-logged-sync
 - dilip-300-unlogged-sync
 - pgbench-300-logged-sync-skip
 - pgbench-300-unlogged-sync-noskip
 - pgbench-300-unlogged-sync-skip

The name identifies the workload type, scale and whether the tables are
wal-logged (for pgbench the "skip" means "-N" while "noskip" does
regular pgbench).

For example, "reports/wait-events-count-patches.txt" compares the
wait event stats with different patches applied (and master):

https://bitbucket.org/tvondra/hp05-results/src/506d0bee9e6557b015a31d72f6c3506e3f198c17/reports/wait-events-count-patches.txt?at=master=file-view-default

and average tps (from 3 runs, 5 minutes each):

https://bitbucket.org/tvondra/hp05-results/src/506d0bee9e6557b015a31d72f6c3506e3f198c17/reports/tps-avg-patches.txt?at=master=file-view-default

There are certainly interesting bits. For example, while the "logged"
case is dominated by WALWriteLock for most client counts, for large
client counts that's no longer true.

Consider for example dilip-300-logged-sync results with 216 clients:

 wait_event  | master  | gran_lock | no_cont_lock | group_upd
 +-+---+--+---
 CLogControlLock |  624566 |474261 |   458599 |225338
 WALWriteLock|  431106 |623142 |   619596 |699224
 |  331542 |358220 |   371393 |537076
 buffer_content  |  261308 |134764 |   138664 |102057
 ClientRead  |   59826 |100883 |   103609 |118379
 transactionid   |   26966 | 23155 |23815 | 31700
 ProcArrayLock   |3967 |  3852 | 4070 |  4576
 wal_insert  |3948 | 10430 | 9513 | 12079
 clog|1710 |  4006 | 2443 |   925
 XidGenLock  |1689 |  3785 | 4229 |  3539
 tuple   | 965 |   617 |  655 |   840
 lock_manager| 300 |   571 |  619 |   802
 WALBufMappingLock   | 168 |   140 |  158 |   147
 SubtransControlLock |  60 |   115 |  124 |   105

Clearly, CLOG is an issue here, and it's (slightly) improved by all the
patches (group_update performing the best). And with 288 clients (which
is 2x the number of virtual cores in the machine, so not entirely crazy)
you get this:

 wait_event  | master  | gran_lock | no_cont_lock | group_upd
 +-+---+--+---
 CLogControlLock |  901670 |736822 |   728823 |398111
 buffer_content  |  492637 |318129 |   319251 |270416
 WALWriteLock|  414371 |593804 |   589809 |656613
 |  380344 |452936 |   470178 |745790
 ClientRead  |   60261 |111367 |   111391 |126151
 transactionid   |   43627 | 34585 |35464 | 48679
 wal_insert  |5423 | 29323 |25898 | 30191
 ProcArrayLock   |4379 |  3918 | 4006 |  4582
 clog|2952 |  9135 | 5304 |  2514
 XidGenLock  |2182 |  9488 | 8894 |  8595
 tuple   |2176 |  1288 | 1409 |  1821
 lock_manager| 323 |   797 |  827 |  1006
 WALBufMappingLock   | 124 |   124 |   

Re: [HACKERS] parallel.sgml

2016-10-12 Thread Tatsuo Ishii
> While reading parallel.sgml, I met with following sentence.
> 
> If this occurs, the leader will execute the portion of the plan
> between below the Gather node entirely by itself,
> almost as if the Gather node were not present.
> 
> Maybe "the portion of the plan between below" should have been
> "the portion of the plan below"?

Can someone, possibly a native English speaker, please confirm whether
this is a typo or not? I cannot parse the sentence as it is, but this
may be because I'm not a native English speaker.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Proposal: scan key push down to heap [WIP]

2016-10-12 Thread Dilip Kumar
On Thu, Oct 13, 2016 at 6:05 AM, Robert Haas  wrote:
> I seriously doubt that this should EVER be supported for anything
> other than "var op const", and even then only for very simple
> operators.
Yes, with the existing key push-down infrastructure only "var op
const" is supported, but if we extend that, I think we can cover many
other simple expressions, i.e.

Unary operators: ISNULL, NOTNULL
Other simple expressions: "Var op Const", "Var op Var", "Var op
SimpleExpr(Var, Const)" ..

We cannot push down function expressions because we cannot guarantee
that they are safe, but can we push down built-in functions? (I think
we can identify their safety based on their data types, but I am not
too sure about this point.)

> For example, texteq() is probably too complicated, because
> it might de-TOAST, and that might involve doing a LOT of work while
> holding the buffer content lock.  But int4eq() would be OK.
>
> Part of the trick if we want to make this work is going to be figuring
> out how we'll identify which operators are safe.
Yes, I agree that's the difficult part. Can't we process the full qual
list and decide whether the operators are safe based on their
datatypes?

What I mean to say is, instead of checking the safety of each operator
like texteq(), text_le()...,
we can directly discard any operator involving such kinds of data types.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Typo in foreign.h

2016-10-12 Thread Amit Langote
Attached fixes a minor typo: s/Thes/These/g

Thanks,
Amit
diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 5dc2c90..143566a 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -23,7 +23,7 @@
 
 /*
  * Generic option types for validation.
- * NB! Thes are treated as flags, so use only powers of two here.
+ * NB! These are treated as flags, so use only powers of two here.
  */
 typedef enum
 {



Re: [HACKERS] Proposal: scan key push down to heap [WIP]

2016-10-12 Thread Robert Haas
On Tue, Oct 11, 2016 at 4:57 AM, Dilip Kumar  wrote:
> This patch will extract the expression from qual and prepare the scan
> keys. Currently in POC version I have only supported  "var OP const"
> type of qual, because these type of quals can be pushed down using
> existing framework.
>
> Purpose of this work is to first implement the basic functionality and
> analyze the results. If results are good then we can extend it for
> other type of expressions.
>
> However in future when we try to expand the support for complex
> expressions, then we need to be very careful while selecting
> pushable expression. It should not happen that we push something very
> complex, which may cause contention with other write operation (as
> HeapKeyTest is done under page lock).

I seriously doubt that this should EVER be supported for anything
other than "var op const", and even then only for very simple
operators.  For example, texteq() is probably too complicated, because
it might de-TOAST, and that might involve doing a LOT of work while
holding the buffer content lock.  But int4eq() would be OK.

Part of the trick if we want to make this work is going to be figuring
out how we'll identify which operators are safe.
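To make the shape of that check concrete, here is a toy classification of a qual as pushable, loosely following the discussion above; the node types and the detoast_risk flag are stand-ins for illustration, not PostgreSQL's real parse nodes or operator catalog:

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { NODE_VAR, NODE_CONST, NODE_OPEXPR } NodeTag;

typedef struct Expr
{
    NodeTag      tag;
    bool         detoast_risk;  /* could evaluating the operator de-TOAST? */
    struct Expr *left;
    struct Expr *right;
} Expr;

/* Pushable iff the qual is "Var op Const" and the operator cannot
 * de-TOAST (int4eq-style is fine, texteq-style is not), so no heavy
 * work happens while the buffer content lock is held. */
static bool
qual_is_pushable(const Expr *e)
{
    return e != NULL &&
           e->tag == NODE_OPEXPR &&
           !e->detoast_risk &&
           e->left != NULL && e->left->tag == NODE_VAR &&
           e->right != NULL && e->right->tag == NODE_CONST;
}
```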

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Is it time to kill support for very old servers?

2016-10-12 Thread Robert Haas
On Tue, Oct 11, 2016 at 11:34 AM, Heikki Linnakangas  wrote:
> On 10/11/2016 08:23 PM, Tom Lane wrote:
>> Not sure where to go from here, but the idea of dropping V2 support
>> is seeming attractive again.  Or we could just continue the policy
>> of benign neglect.
>
> Let's drop it.

I agree.  The evidence in Tom's latest email suggests that getting
this to a point where it can be tested will be more work than
anybody's likely to want to put in, and the return on investment
doesn't seem likely to be good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] another typo in parallel.sgml

2016-10-12 Thread Robert Haas
On Mon, Oct 10, 2016 at 5:40 PM, Tatsuo Ishii  wrote:
> I think I found a typo in parallel.sgml.

I think so, too.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Remove vacuum_defer_cleanup_age

2016-10-12 Thread Robert Haas
On Sun, Oct 9, 2016 at 9:51 PM, Josh Berkus  wrote:
> Given that hot_standby_feedback is pretty bulletproof now, and a lot of
> the work in reducing replay conflicts, I think the utility of
> vacuum_defer_cleanup_age is at an end.  I really meant to submit a patch
> to remove it for 9.6, but it got away from me.
>
> Any objections to removing the option in 10?

I'm not sure I see the point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename max_parallel_degree?

2016-10-12 Thread Robert Haas
On Wed, Oct 12, 2016 at 8:13 AM, Peter Eisentraut
 wrote:
> On 10/4/16 10:16 AM, Julien Rouhaud wrote:
>> Please find attached v9 of the patch, adding the parallel worker class
>> and changing max_worker_processes default to 16 and max_parallel_workers
>> to 8.  I also added Amit's explanation for the need of a write barrier
>> in ForgetBackgroundWorker().
>
> This approach totally messes up the decoupling between the background
> worker facilities and consumers of those facilities.  Having dozens of
> lines of code in bgworker.c that does the accounting and resource
> checking on behalf of parallel.c looks very suspicious.  Once we add
> logical replication workers or whatever, we'll be tempted to add even
> more stuff in there and it will become a mess.

I attach a new version of the patch that I've been hacking on in my
"spare time", which reduces the footprint in bgworker.c quite a bit.
I don't think it's wrong that the handling is done there, though.  The
process that is registering the background worker is well-placed to
check whether there are already too many, and if it does not then the
slot is consumed at least temporarily even if it busts the cap.  On
the flip side, the postmaster is the only process that is well-placed
to know when a background worker terminates.  The worker process
itself can't be made responsible for it, as you suggest below, because
it may never even start up in the first place (e.g. fork() returns
EAGAIN).  And the registering process can't be made responsible,
because it might die before the worker.

> I think we should think of a way where we can pass the required
> information during background worker setup, like a pointer to the
> resource-limiting GUC variable.

I don't think this can work, per the above.

> For style, I would also prefer the "class" to be a separate struct field
> from the flags.

I think that it makes sense to keep the class as part of the flags
because then a large amount of the stuff in this patch that is
concerned with propagating the flags around becomes unnecessary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e826c19..8e53edc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1984,7 +1984,7 @@ include_dir 'conf.d'
 
  Sets the maximum number of background processes that the system
  can support.  This parameter can only be set at server start.  The
- default is 8.
+ default is 16.
 
 
 
@@ -2006,8 +2006,9 @@ include_dir 'conf.d'
  Sets the maximum number of workers that can be started by a single
  Gather node.  Parallel workers are taken from the
  pool of processes established by
- .  Note that the requested
- number of workers may not actually be available at run time.  If this
+ , limited by
+ .  Note that the requested
+ number of workers may not actually be available at runtime.  If this
  occurs, the plan will run with fewer workers than expected, which may
  be inefficient.  The default value is 2.  Setting this value to 0
  disables parallel query execution.
@@ -2036,6 +2037,22 @@ include_dir 'conf.d'

   
 
+  
+   max_parallel_workers (integer)
+   
+max_parallel_workers configuration parameter
+   
+   
+   
+
+ Sets the maximum number of workers that the system can support for
+ parallel queries.  The default value is 8.  When increasing or
+ decreasing this value, consider also adjusting
+ .
+
+   
+  
+
   
backend_flush_after (integer)

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 59dc394..1c32fcd 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -454,7 +454,8 @@ LaunchParallelWorkers(ParallelContext *pcxt)
 	snprintf(worker.bgw_name, BGW_MAXLEN, "parallel worker for PID %d",
 			 MyProcPid);
 	worker.bgw_flags =
-		BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+		BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION
+		| BGWORKER_CLASS_PARALLEL;
 	worker.bgw_start_time = BgWorkerStart_ConsistentState;
 	worker.bgw_restart_time = BGW_NEVER_RESTART;
 	worker.bgw_main = ParallelWorkerMain;
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 3a2929a..75f9a55 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -80,9 +80,22 @@ typedef struct BackgroundWorkerSlot
 	BackgroundWorker worker;
 } BackgroundWorkerSlot;
 
+/*
+ * In order to limit the total number of parallel workers (according to
+ * max_parallel_workers GUC), we maintain the number of active parallel
+ * workers.  Since the postmaster cannot take 

Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-12 Thread Haribabu Kommi
On Thu, Oct 13, 2016 at 7:31 AM, Vitaly Burovoy 
wrote:

> Haribabu Kommi, have you read enough about EUI-64?
> Your function "macaddr64_trunc" sets the 4 lower bytes to 0, whereas
> (according to Wikipedia, but I can look for a standard) 3 bytes
> still define an OUI (organizationally unique identifier), so
> truncation should be done for the 5, not 4, lower octets.
>
> The macros "hibits" should be 3 octets long, not 4; "lobits" --- 5 bytes,
> not 4.
> In the other case your comparisons fail.
>
> What document have you used to write the patch? Are short form formats
> correct in macaddr64_in?
>


Yes, the OUI is 24 bits. I just created a prototype patch to check
community opinion on it. I checked the following links [1], [2] while
developing macaddr8. The patch is not correct for all cases; it is just
a prototype to see whether an 8-byte MAC address is accepted.


[1] - http://standards.ieee.org/develop/regauth/tut/eui64.pdf
[2] - https://en.wikipedia.org/wiki/MAC_address


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Re: [COMMITTERS] pgsql: Extend framework from commit 53be0b1ad to report latch waits.

2016-10-12 Thread Andres Freund
On 2016-10-12 11:18:57 -0400, Peter Eisentraut wrote:
> On 10/4/16 11:29 AM, Tom Lane wrote:
> > Robert Haas  writes:
> >> Apparently, 'make world' does not build worker_spi.  I thought 'make
> >> world' was supposed to build everything?
> > 
> > You'd have thunk, yeah.  It looks like the issue is that src/Makefile
> > is selective about recursing into certain subdirectories of test/,
> > but mostly not test/ itself.  src/test/Makefile naively believes it's
> > in charge, though.  Probably that logic ought to get shoved down one
> > level, and then adjusted so that src/test/modules gets built by "all".
> > Or else teach top-level "make world" to do "make all" in src/test/,
> > but that seems like it's just doubling down on confusing interconnections.
> 
> We generally don't build test code during make world.

FWIW, I find that quite annoying. I want to run make world with
parallelism so I can run make check-world afterwards with as little
unnecessary unparallelized work as possible. And since the latter takes
just about forever, any errors visible earlier are good.

What are we gaining by this behaviour?

Andres




Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size

2016-10-12 Thread Merlin Moncure
On Wed, Oct 12, 2016 at 5:18 PM, Tom Lane  wrote:
> Merlin Moncure  writes:
>> ISTM all this silliness is pretty much unique to linux anyways.
>> Instead of reading the filesystem, what about doing test map and test
>> unmap?
>
> And if mmap succeeds and munmap fails, you'll recover how exactly?
>
> If this API were less badly designed, we'd not be having this problem
> in the first place ...

I was thinking we could 'guess' in a power-of-two loop in case the
obvious unmap didn't work, finally aborting if no guess worked.  :-)

merlin




Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers

2016-10-12 Thread Robert Haas
On Wed, Oct 12, 2016 at 12:24 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Oct 12, 2016 at 11:54 AM, Greg Stark  wrote:
>>> Fwiw I was considering proposing committing some patches for these old
>>> releases to make them easier to build. I would suggest creating a tag
>>> for a for this stable legacy version and limiting the commits to just:
>>>
>>> 1) Disabling warnings
>>> 2) Fixing bugs in the configure scripts that occur on more recent
>>> systems (version number parsing etc)
>>> 3) Backporting things like the variable-length array code that prevents 
>>> building
>>> 4) Adding compiler options like -fwrapv
>
>> I'd support that.  The reason why we remove branches from support is
>> so that we don't have to back-patch things to 10 or 15 branches when
>> we have a bug fix.  But that doesn't mean that we should prohibit all
>> commits to those branches for any reason, and making it easier to test
>> backward-compatibility when we want to do that seems like a good
>> reason.
>
> Meh, I think that this will involve a great deal more work than it's
> worth.  We deal with moving-target platforms *all the time*.  New compiler
> optimizations break things, libraries such as OpenSSL whack things around,
> other libraries such as uuid-ossp stop getting maintained and become
> unusable on new platforms, bison decides to get stickier about comma
> placement, yadda yadda yadda.  How much of that work are we going to
> back-port to dead branches?  And to what extent is such effort going to be
> self-defeating because the branch no longer behaves as it did back in the
> day?
>
> If Greg wants to do this kind of work, he's got a commit bit.  My position
> is that we have a limited support lifespan for a reason, and I'm not going
> to spend time on updating dead branches forever.  To me, it's more useful
> to test them in place on contemporary platforms.  We've certainly got
> enough old platforms laying about in the buildfarm and elsewhere.

I agree that it shouldn't be an expectation that committers in general
will do this, whether you or me or anyone else.  However, I think that
if Greg or some other committer wants to volunteer their own time to
do some of it, that is fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size

2016-10-12 Thread Andres Freund
On 2016-10-12 16:33:38 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On October 12, 2016 1:25:54 PM PDT, Tom Lane  wrote:
> >> A little bit of research suggests that on Linux the thing to do would
> >> be to get the actual default hugepage size by reading /proc/meminfo and
> >> looking for a line like "Hugepagesize:   2048 kB".
> 
> > We had that, but Heikki ripped it out when merging... I think you're 
> > supposed to use /sys to get the available size.
> 
> According to
> https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
> looking into /proc/meminfo is the longer-standing API and thus is
> likely to work on more kernel versions.

MAP_HUGETLB, which we rely on for hugepage support, is newer than the
introduction of the /sys stuff.


> Also, if you look into /sys then you are going to see multiple
> possible values and it's not clear how to choose the right one.

That's a fair point. It'd probably be good to use the largest we can,
bounded by a percentage of max waste or such.  But that's likely
something for another day.

Regards,

Andres




Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size

2016-10-12 Thread Tom Lane
Merlin Moncure  writes:
> ISTM all this silliness is pretty much unique to linux anyways.
> Instead of reading the filesystem, what about doing test map and test
> unmap?

And if mmap succeeds and munmap fails, you'll recover how exactly?

If this API were less badly designed, we'd not be having this problem
in the first place ...

regards, tom lane




Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size

2016-10-12 Thread Merlin Moncure
On Wed, Oct 12, 2016 at 5:10 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> According to
>>> https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
>>> looking into /proc/meminfo is the longer-standing API and thus is
>>> likely to work on more kernel versions.  Also, if you look into
>>> /sys then you are going to see multiple possible values and it's
>>> not clear how to choose the right one.
>
>> I'm not sure that this is the best rationale.  In my system there are
>> 2MB and 1GB huge page sizes; in systems with lots of memory (let's say 8
>> GB of shared memory is requested) it seems a clear winner to allocate 8
>> 1GB hugepages than 4096 2MB hugepages because the page table is so much
>> smaller.  The /proc interface only shows the 2MB page size, so if we go
>> that route we'd not be getting the full benefit of the feature.
>
> And you'll tell mmap() which one to do how exactly?  I haven't found
> anything explaining how applications get to choose which page size applies
> to their request.  The kernel document says that /proc/meminfo reflects
> the "default" size, and I'd assume that that's what we'll get from mmap.

hm. for (recent) linux, I see:

   MAP_HUGE_2MB, MAP_HUGE_1GB (since Linux 3.8)
  Used in conjunction with MAP_HUGETLB to select alternative
  hugetlb page sizes (respectively, 2 MB and 1 GB) on systems
  that support multiple hugetlb page sizes.

  More generally, the desired huge page size can be configured
  by encoding the base-2 logarithm of the desired page size in
  the six bits at the offset MAP_HUGE_SHIFT.  (A value of zero
  in this bit field provides the default huge page size; the
  default huge page size can be discovered via the Hugepagesize
  field exposed by /proc/meminfo.)  Thus, the above two
  constants are defined as:

  #define MAP_HUGE_2MB    (21 << MAP_HUGE_SHIFT)
  #define MAP_HUGE_1GB    (30 << MAP_HUGE_SHIFT)

  The range of huge page sizes that are supported by the system
  can be discovered by listing the subdirectories in
  /sys/kernel/mm/hugepages.


via: http://man7.org/linux/man-pages/man2/mmap.2.html#NOTES

ISTM all this silliness is pretty much unique to linux anyways.
Instead of reading the filesystem, what about doing test map and test
unmap?  We could zero in on the page size for default I think with
some probing of known possible values.

merlin




Re: [HACKERS] logical replication connection information management

2016-10-12 Thread Alvaro Herrera
Peter Eisentraut wrote:

> An idea is to make the foreign server concept more general and allow
> it to exist independently of a foreign data wrapper.  Then create more
> specific syntax like
> 
> CREATE SERVER node1 FOR SUBSCRIPTION OPTIONS ( ... );
> 
> or
> 
> CREATE SUBSCRIPTION SERVER ...
> 
> This would work a bit like pg_constraint, which can be tied to a table
> or a type or even nothing (for the hypothetical assertions feature).

I was with you until you mentioned pg_constraint, because as I
understand it we're not entirely happy with that one catalog.  We've
talked about splitting it up into two catalogs for table and domain
constraints (I don't think we've considered assertions).  Doing this
would cause us to drop the NOT NULL from pg_foreign_server.srvfdw.
However, while this sounds bad I think it's not *too* bad:  it's not as
heavily used as the corresponding pg_constraint columns would be.  In
other words, this raised some red flags with me initially but I think
it's okay.

> We'd need a separate mechanism for controlling which user has the right
> to create such subscription servers, but it might be acceptable at the
> beginning to just require superuserness.

We'll need to have a better answer to this sooner rather than later.
Perhaps a new predefined grantable privilege?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size

2016-10-12 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> According to
>> https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
>> looking into /proc/meminfo is the longer-standing API and thus is
>> likely to work on more kernel versions.  Also, if you look into
>> /sys then you are going to see multiple possible values and it's
>> not clear how to choose the right one.

> I'm not sure that this is the best rationale.  In my system there are
> 2MB and 1GB huge page sizes; in systems with lots of memory (let's say 8
> GB of shared memory is requested) it seems a clear winner to allocate 8
> 1GB hugepages than 4096 2MB hugepages because the page table is so much
> smaller.  The /proc interface only shows the 2MB page size, so if we go
> that route we'd not be getting the full benefit of the feature.

And you'll tell mmap() which one to do how exactly?  I haven't found
anything explaining how applications get to choose which page size applies
to their request.  The kernel document says that /proc/meminfo reflects
the "default" size, and I'd assume that that's what we'll get from mmap.

regards, tom lane




Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-12 Thread Tom Lane
I wrote:
> Well, the buildfarm doesn't like that even a little bit.  It seems that
> the MSVC compiler does not like seeing both "extern Datum foo(...)" and
> "extern PGDLLEXPORT Datum foo(...)", so anything that had an extern in
> a .h file is failing.  There is also quite a bit of phase-of-the-moon
> behavior in here, because in some cases some functions are raising errors
> and others that look entirely the same are not.

I take back the latter comment --- I was comparing HEAD source code
against errors reported on back branches, and at least some of the
discrepancies are explained by additions/removals of separate "extern"
declarations.  But still, if we want to do this we're going to need to
put PGDLLEXPORT in every V1 function extern declaration.  We might be
able to remove some of the externs as unnecessary instead, but any way
you slice it it's going to be messy.

For the moment I've reverted the change.

regards, tom lane




[HACKERS] Add support for SRF and returning composites to pl/tcl

2016-10-12 Thread Jim Nasby
Attached is a patch that adds support for SRFs and returning composites 
from pl/tcl. This work was sponsored by Flight Aware.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
diff --git a/doc/src/sgml/pltcl.sgml b/doc/src/sgml/pltcl.sgml
index 805cc89..1c185cb 100644
--- a/doc/src/sgml/pltcl.sgml
+++ b/doc/src/sgml/pltcl.sgml
@@ -173,8 +173,54 @@ $$ LANGUAGE pltcl;
 
 
 
- There is currently no support for returning a composite-type
- result value, nor for returning sets.
+ PL/Tcl functions can return a record containing multiple output
+ parameters.  The function's Tcl code should return a list of
+ key-value pairs matching the output parameters.
+
+
+CREATE FUNCTION square_cube(in int, out squared int, out cubed int) AS $$
+return [list squared [expr {$1 * $1}] cubed [expr {$1 * $1 * $1}]]
+$$ LANGUAGE 'pltcl';
+
+
+
+
+ Sets can be returned as a table type.  The Tcl code should successively
+ call return_next with an argument consisting of a Tcl
+ list of key-value pairs.
+
+
+CREATE OR REPLACE FUNCTION squared_srf(int,int) RETURNS TABLE (x int, y int) 
AS $$
+for {set i $1} {$i < $2} {incr i} {
+return_next [list x $i y [expr {$i * $i}]]
+}
+$$ LANGUAGE 'pltcl';
+
+
+
+
+ Any columns that are defined in the composite return type but absent from
+ a list of key-value pairs passed to return_next are implicitly
+ null in the corresponding row. PL/Tcl will generate a Tcl error when a
+ column name in the key-value list is not one of the defined columns.
+
+
+
+ Similarly, functions can be defined as returning SETOF
+ with a user-defined data type.
+
+
+
+ PL/Tcl functions can also use return_next to return a set of
+ a scalar data type.
+
+
+CREATE OR REPLACE FUNCTION sequence(int,int) RETURNS SETOF int AS $$
+for {set i $1} {$i < $2} {incr i} {
+return_next $i
+}
+$$ language 'pltcl';
+
 
 
 
@@ -197,8 +243,10 @@ $$ LANGUAGE pltcl;
  displayed by a SELECT statement).  Conversely, the
  return
  command will accept any string that is acceptable input format for
- the function's declared return type.  So, within the PL/Tcl function,
- all values are just text strings.
+ the function's declared return type(s).  Likewise when producing a
+ set using return_next, values are converted to their
+ native database data types.  (A Tcl error is generated whenever this
+ conversion fails.)
 
 

diff --git a/src/pl/tcl/expected/pltcl_queries.out 
b/src/pl/tcl/expected/pltcl_queries.out
index 6cb1fdb..7a7b029 100644
--- a/src/pl/tcl/expected/pltcl_queries.out
+++ b/src/pl/tcl/expected/pltcl_queries.out
@@ -303,3 +303,44 @@ select tcl_lastoid('t2') > 0;
  t
 (1 row)
 
+-- test compound return
+select * from tcl_test_cube_squared(5);
+ squared | cubed 
+-+---
+  25 |   125
+(1 row)
+
+CREATE FUNCTION bad_record(OUT a text , OUT b text) AS $$return [list cow]$$ 
LANGUAGE pltcl;
+SELECT bad_record();
+ERROR:  list must have even number of elements
+CREATE FUNCTION bad_field(OUT a text , OUT b text) AS $$return [list cow 1 a 2 
b 3]$$ LANGUAGE pltcl;
+SELECT bad_field();
+ERROR:  Tcl list contains nonexistent column "cow"
+CREATE OR REPLACE FUNCTION tcl_error(OUT a int, OUT b int) AS $$return {$$ 
LANGUAGE pltcl;
+SELECT tcl_error();
+ERROR:  missing close-brace
+-- test SRF
+select * from tcl_test_squared_rows(0,5);
+ x | y  
+---+
+ 0 |  0
+ 1 |  1
+ 2 |  4
+ 3 |  9
+ 4 | 16
+(5 rows)
+
+CREATE OR REPLACE FUNCTION non_srf() RETURNS int AS $$return_next 1$$ LANGUAGE 
pltcl;
+select non_srf();
+ERROR:  cannot use return_next in a non-set-returning function
+-- test setof returns
+select * from tcl_test_sequence(0,5) as a;
+ a 
+---
+ 0
+ 1
+ 2
+ 3
+ 4
+(5 rows)
+
diff --git a/src/pl/tcl/expected/pltcl_setup.out 
b/src/pl/tcl/expected/pltcl_setup.out
index e65e9e3..5332187 100644
--- a/src/pl/tcl/expected/pltcl_setup.out
+++ b/src/pl/tcl/expected/pltcl_setup.out
@@ -569,6 +569,19 @@ create function tcl_error_handling_test() returns text as 
$$
 return "no error"
 }
 $$ language pltcl;
+CREATE OR REPLACE FUNCTION tcl_test_cube_squared(in int, out squared int, out 
cubed int) AS $$
+return [list squared [expr {$1 * $1}] cubed [expr {$1 * $1 * $1}]]
+$$ LANGUAGE 'pltcl';
+CREATE OR REPLACE FUNCTION tcl_test_squared_rows(int,int) RETURNS TABLE (x 
int, y int) AS $$
+for {set i $1} {$i < $2} {incr i} {
+return_next [list y [expr {$i * $i}] x $i]
+}
+$$ LANGUAGE 'pltcl';
+CREATE OR REPLACE FUNCTION tcl_test_sequence(int,int) RETURNS SETOF int AS $$
+for {set i $1} {$i < $2} {incr i} {
+return_next $i
+}
+$$ language 'pltcl';
 select tcl_error_handling_test();
 tcl_error_handling_test 

Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size

2016-10-12 Thread Alvaro Herrera
Tom Lane wrote:
> Andres Freund  writes:
> > On October 12, 2016 1:25:54 PM PDT, Tom Lane  wrote:
> >> A little bit of research suggests that on Linux the thing to do would
> >> be to get the actual default hugepage size by reading /proc/meminfo and
> >> looking for a line like "Hugepagesize:   2048 kB".
> 
> > We had that, but Heikki ripped it out when merging... I think you're 
> > supposed to use /sys to get the available size.
> 
> According to
> https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
> looking into /proc/meminfo is the longer-standing API and thus is
> likely to work on more kernel versions.  Also, if you look into
> /sys then you are going to see multiple possible values and it's
> not clear how to choose the right one.

I'm not sure that this is the best rationale.  In my system there are
2MB and 1GB huge page sizes; in systems with lots of memory (let's say 8
GB of shared memory is requested) it seems a clear winner to allocate 8
1GB hugepages than 4096 2MB hugepages because the page table is so much
smaller.  The /proc interface only shows the 2MB page size, so if we go
that route we'd not be getting the full benefit of the feature.

We could just fall back to reading /proc if we cannot find the /sys
stuff; that makes it continue to work on older kernels.

Regarding choosing one among several size choices, I'd think the larger
huge page size is always better if the request size is at least that
large, otherwise fall back to the next smaller one.  (This could stand
some actual research).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] FTS Configuration option

2016-10-12 Thread Artur Zakirov
Thank you for sharing your thoughts!

2016-10-12 15:08 GMT+03:00 Emre Hasegeli :
> However then the stemmer doesn't do a good job on those words, because
> the changed characters are important for the language.  What I really
> needed was something like this:
>
>> ALTER TEXT SEARCH CONFIGURATION turkish
>> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, 
>> hword_part
>> WITH (fix_mistyped_characters AND (turkish_hunspell OR turkish_stem) AND 
>> unaccent);

Your syntax looks more flexible and prettier than the JOIN option. As
I understand it, there are three steps here. At each step a dictionary
returns a lexeme and passes it to the next dictionary. If a dictionary
returns NULL, processing stops.

With such syntax we also don't need the TSL_FILTER flag for lexemes.
Currently the unaccent extension sets this flag to pass a lexeme on to
the next dictionary, and the flag is consumed by the text-search
parser. That looks like a hard-coded solution: users can't change the
behaviour.

Maybe it would also be better to use -> instead of AND? AND would have
a different behaviour. I could create the following configuration:

=> ALTER TEXT SEARCH CONFIGURATION multi_conf
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
word, hword, hword_part
WITH (german_ispell AND english_ispell) OR simple;

which would return both german_ispell and english_ispell results. But
I'm not sure that this is a good solution.

Of course, if this syntax is implemented, the old syntax with commas
should also be maintained.

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-12 Thread Tom Lane
I wrote:
> Albe Laurenz  writes:
>> Tom Lane wrote:
>>> I'm okay with adding PGDLLEXPORT to the extern, but we should update
>>> that comment to note that it's not necessary with any of our standard
>>> Windows build processes.  (For that matter, the comment fails to explain
>>> why this macro is providing an extern for the base function at all...)

>> Here is a patch for that, including an attempt to improve the comment.

> Pushed with some further twiddling of the comment.

Well, the buildfarm doesn't like that even a little bit.  It seems that
the MSVC compiler does not like seeing both "extern Datum foo(...)" and
"extern PGDLLEXPORT Datum foo(...)", so anything that had an extern in
a .h file is failing.  There is also quite a bit of phase-of-the-moon
behavior in here, because in some cases some functions are raising errors
and others that look entirely the same are not.

We could plaster all those declarations with PGDLLEXPORT, but I'm rather
considerably tempted to go back to the way things were, instead.  I do
not like patches that end up with Microsoft-droppings everywhere.

regards, tom lane




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-12 Thread Vitaly Burovoy
On 10/12/16, Tom Lane  wrote:
> Vitaly Burovoy  writes:
>> I'm sorry for the offtopic, but does anyone know a reason why a
>> condition in mac.c
>
>>> if ((a < 0) || (a > 255) || (b < 0) || (b > 255) ||
>>> (c < 0) || (c > 255) || (d < 0) || (d > 255) ||
>>> (e < 0) || (e > 255) || (f < 0) || (f > 255))
>
>> can not be rewritten as:
>
>>> if (((a | b | c | d | e | f) < 0) ||
>>> ((a | b | c | d | e | f) > 255))
>

> Well, it's ugly and

> it adds a bunch of assumptions about arithmetic
> behavior that we don't particularly need to make.

That explains the reason, thank you. I'm just not familiar with other
architectures where this does not behave the same as on x86/x86-64.

> If this were some
> amazingly hot hot-spot then maybe it would be worth making the code
> unreadable to save a few nanoseconds, but I doubt that it is.

> (Anyway, you've not shown that there actually is any benefit ...)
I don't think it has a speed benefit, especially compared with the
sscanf call. But personally the second variant does not seem ugly to
me; it just says "no bit in any of the variables lies outside a byte"
(it reads even better as a comparison with "0xff", since sscanf
operates with "%2x"). Sorry for my bad taste and for the noise.

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-12 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Vitaly Burovoy  writes:
>>> P.S.: I still think it is a good idea to change storage format,

>> I'm not sure which part of "no" you didn't understand, but we're
>> not breaking on-disk compatibility of existing macaddr columns.
>> Breaking the on-the-wire binary I/O representation seems like a
>> nonstarter as well.

> I think the suggestion was to rename macaddr to macaddr6 or similar,
> keeping the existing behavior and the current OID.  So existing columns
> would continue to work fine and maintain on-disk compatibility, but any
> newly created columns would become the 8-byte variant.

... and would have different I/O behavior from before, possibly breaking
applications that expect "macaddr" to mean what it used to.  I'm still
dubious that that's a good idea.

The larger picture here is that we got very little thanks when we squeezed
IPv6 into the pre-existing inet datatype; there's a large number of people
who just said "no thanks" and started using the add-on ip4r type instead.
So I'm not sure why we want to complicate our lives in order to make
macaddr follow the same path.

regards, tom lane




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-12 Thread Tom Lane
Vitaly Burovoy  writes:
> I'm sorry for the off-topic question, but does anyone know the reason why a
> condition in mac.c

>> if ((a < 0) || (a > 255) || (b < 0) || (b > 255) ||
>> (c < 0) || (c > 255) || (d < 0) || (d > 255) ||
>> (e < 0) || (e > 255) || (f < 0) || (f > 255))

> cannot be rewritten as:

>> if (((a | b | c | d | e | f) < 0) ||
>> ((a | b | c | d | e | f) > 255))

Well, it's ugly and it adds a bunch of assumptions about arithmetic
behavior that we don't particularly need to make.  If this were some
amazingly hot hot-spot then maybe it would be worth making the code
unreadable to save a few nanoseconds, but I doubt that it is.
(Anyway, you've not shown that there actually is any benefit ...)

regards, tom lane




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-12 Thread Alvaro Herrera
Tom Lane wrote:
> Vitaly Burovoy  writes:
> > P.S.: I still think it is a good idea to change storage format,
> 
> I'm not sure which part of "no" you didn't understand, but we're
> not breaking on-disk compatibility of existing macaddr columns.
> Breaking the on-the-wire binary I/O representation seems like a
> nonstarter as well.

I think the suggestion was to rename macaddr to macaddr6 or similar,
keeping the existing behavior and the current OID.  So existing columns
would continue to work fine and maintain on-disk compatibility, but any
newly created columns would become the 8-byte variant.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-12 Thread Tom Lane
Vitaly Burovoy  writes:
> P.S.: I still think it is a good idea to change storage format,

I'm not sure which part of "no" you didn't understand, but we're
not breaking on-disk compatibility of existing macaddr columns.
Breaking the on-the-wire binary I/O representation seems like a
nonstarter as well.

If you can think of a way to have one type do it all without breaking
that, then fine; but that seems like a pretty hard problem.

regards, tom lane




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-12 Thread Vitaly Burovoy
I'm sorry for the off-topic question, but does anyone know the reason why a
condition in mac.c

> if ((a < 0) || (a > 255) || (b < 0) || (b > 255) ||
> (c < 0) || (c > 255) || (d < 0) || (d > 255) ||
> (e < 0) || (e > 255) || (f < 0) || (f > 255))

cannot be rewritten as:

> if (((a | b | c | d | e | f) < 0) ||
> ((a | b | c | d | e | f) > 255))

It seems more compact, and a compiler can optimize it to keep the result
of the binary OR for the comparison with 255...
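As a self-contained illustration (hypothetical helper names, not the actual mac.c code), the two variants can be compared side by side. The folded form relies on the machine using two's-complement integers, where any negative operand sets the sign bit of the OR — the kind of arithmetic assumption discussed in this thread:

```c
#include <assert.h>
#include <stdbool.h>

/* Plain per-variable range check, as in mac.c today. */
static bool
in_byte_range_plain(int a, int b, int c, int d, int e, int f)
{
    return !((a < 0) || (a > 255) || (b < 0) || (b > 255) ||
             (c < 0) || (c > 255) || (d < 0) || (d > 255) ||
             (e < 0) || (e > 255) || (f < 0) || (f > 255));
}

/* Folded variant: the OR of non-negative values is >= each operand,
 * so any value above 255 pushes the OR above 255, and any negative
 * value sets the sign bit (on two's-complement hardware). */
static bool
in_byte_range_folded(int a, int b, int c, int d, int e, int f)
{
    int m = a | b | c | d | e | f;

    return !((m < 0) || (m > 255));
}
```

Both functions agree for all inputs on two's-complement machines, which is the equivalence being argued about here.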

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size

2016-10-12 Thread Tom Lane
Andres Freund  writes:
> On October 12, 2016 1:25:54 PM PDT, Tom Lane  wrote:
>> A little bit of research suggests that on Linux the thing to do would
>> be to get the actual default hugepage size by reading /proc/meminfo and
>> looking for a line like "Hugepagesize:   2048 kB".

> We had that, but Heikki ripped it out when merging... I think you're supposed 
> to use /sys to get the available size.

According to
https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
looking into /proc/meminfo is the longer-standing API and thus is
likely to work on more kernel versions.  Also, if you look into
/sys then you are going to see multiple possible values and it's
not clear how to choose the right one.

regards, tom lane




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-12 Thread Vitaly Burovoy
On 10/12/16, Vitaly Burovoy  wrote:
> On 10/12/16, Alvaro Herrera  wrote:
>> Julien Rouhaud wrote:
>>> On 12/10/2016 14:32, Alvaro Herrera wrote:
>>> > Julien Rouhaud wrote:
>>> >
>>> >> and you can instead make macaddr64 support both format, and provide a
>>> >> macaddr::macaddr64 cast
>>> >
>>> > Having macaddr64 support both formats sounds nice, but how does it
>>> > work?
>>> > Will we have to reserve one additional bit to select the
>>> > representation?
>>> > That would make the type be 65 bits which is a clear loser IMO.
>>> >
>>> > Is it allowed to just leave 16 bits as zeroes which would indicate
>>> > that
>>> > the address is EUI48?  I wouldn't think so ...
>>>
>>> From what I read, you can indicate it's an EUI-48 address by storing
>>> FF:FF (or FF:FE for MAC-48) in 4th and 5th bytes of the EUI-64 address.
>>
>> That seems reasonable at first glance; so the new type would be able to
>> store both 48-bit and 64-bit values, and there would be assignment casts
>> in both directions
>
> I think either "macaddr" should be renamed to "macaddr6" (keeping its
> OID) and a new type "macaddr8" added along with a new alias
> "macaddr", or the current "macaddr" should accept both versions as the
> "inet" type does.
>
> The function "macaddr_recv" can distinguish them by the
> StringInfoData.len member, "macaddr_in" - by number of parts split by
> ":".
> The "macaddr_out" and "macaddr_send" can out 6 octets if the stored
> value is mapped to MAC-48.
> Storing can be done always as 8 bytes using the rule above.
>
> Otherwise it is hard, from the user's PoV, to decide at the
> design stage when a column should be defined as macaddr and when
> as macaddr8.
> If users need to change a column's type (new hardware appears with an
> EUI-64 address), they should keep in mind that everything tied to the
> column type changes as well (stored procedures' parameters, application
> code interacting with the DB, etc.).
> I don't agree with Tom's proposal to introduce a new type; it would be
> inconvenient for users.
>
> We have types "int2", "int4", "int8" and an alias "int" for a type
> "int4", at least psql does not show it:
> postgres=# \dT+ int
>                                List of data types
>  Schema | Name | Internal name | Size | Elements | Owner | Access privileges | Description
> --------+------+---------------+------+----------+-------+-------------------+-------------
> (0 rows)
>
> It is understandable to have 3 types for integers, because they occupy
> most of the space in a DB and in real life we mostly don't use big
> numbers, but the cases of "inet" and "macaddr" are different.
>
>> and a suite of operators to enable interoperability
>> of 48-bit values in macaddr8 with values in type macaddr.  Right?
>>
>> (The cast function from macaddr8 to macaddr would raise error if the
>> 4th and 5th bytes are not either FF:FF or FF:FE -- I don't think we can
>> in practice distinguish EUI-48 from MAC-48 in this context.
>
> Wikipedia says[1] they are the same thing, but MAC-48 is an
> obsolete term for a special case, so there is no need to
> distinguish them.
>
>> The cast in the other direction would have no restriction and should
>> probably always use FF:FE).
>
> [1]
> https://en.wikipedia.org/wiki/Organizationally_unique_identifier#48-bit_Media_Access_Control_Identifier_.28MAC-48.29

Haribabu Kommi, have you read enough about EUI-64?
Your function "macaddr64_trunc" sets the 4 lower bytes to 0, whereas
(according to Wikipedia, but I can look for a standard) 3 bytes
still define an OUI (organizationally unique identifier), so
truncation should be done for the 5, not 4, lower octets.

The macro "hibits" should be 3 octets long, not 4; "lobits" --- 5 bytes, not 4.
Otherwise your comparisons fail.

Which document did you use to write the patch? Are the short-form
formats correct in macaddr64_in?

P.S.: I still think it is a good idea to change the storage format and
macaddr_{in,out,send,recv}, filling the 4th and 5th bytes if necessary,
change the "lobits" macro, and add new fields to the bit operation
functions. That avoids a new type, casts, comparison functions between
macaddr6 and macaddr8, etc. The proposed patch is mostly a copy-paste
of mac.c.
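A minimal sketch of the byte-level mapping discussed upthread (hypothetical function names, not part of the proposed patch): a 48-bit address is widened to 8 bytes by inserting marker octets between the OUI and the device-specific part, and an 8-byte value is recognized as an embedded 48-bit address by checking those two octets (FF:FF for EUI-48, FF:FE for MAC-48, per the earlier messages):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Widen a 6-byte MAC-48 address to an 8-byte EUI-64-style value by
 * inserting FF:FE between the OUI (first 3 octets) and the
 * device-specific part (last 3 octets). */
static void
mac48_to_eui64(const uint8_t in[6], uint8_t out[8])
{
    memcpy(out, in, 3);
    out[3] = 0xFF;
    out[4] = 0xFE;
    memcpy(out + 5, in + 3, 3);
}

/* True if an 8-byte value carries an embedded 48-bit address,
 * i.e. the 4th and 5th octets are FF:FF (EUI-48) or FF:FE (MAC-48). */
static bool
eui64_embeds_mac48(const uint8_t v[8])
{
    return v[3] == 0xFF && (v[4] == 0xFF || v[4] == 0xFE);
}
```

Output functions could then print 6 octets when eui64_embeds_mac48() holds, which is the "store 8 bytes, print 6 when mapped" behavior suggested above.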

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size

2016-10-12 Thread Andres Freund


On October 12, 2016 1:25:54 PM PDT, Tom Lane  wrote:
>If any of you were following the thread at
>https://www.postgresql.org/message-id/flat/CAOan6TnQeSGcu_627NXQ2Z%2BWyhUzBjhERBm5RN9D0QFWmk7PoQ%40mail.gmail.com
>I spent quite a bit of time following a bogus theory, but the problem
>turns out to be very simple: on Linux, munmap() is pickier than mmap()
>about the length of a hugepage allocation.  The comments in
>sysv_shmem.c
>mention that on older kernels mmap() with MAP_HUGETLB will fail if
>given
>a length request that's not a multiple of the hugepage size.  Well, the
>behavior they replaced that with is little better: mmap() succeeds, but
>it gives you back a region that's been silently enlarged to the next
>hugepage boundary, and then munmap() will fail if you specify the
>region
>size you asked for rather than the region size you were given.
>
>Since AFAICS there is no way to inquire what region size you were
>given,
>this API is astonishingly brain-dead IMO.  But that seems to be what
>we've got.  Chris Richards reported it against a 3.16.7 kernel, and
>I can replicate the behavior on RHEL6 (2.6.32) by asking for an
>odd-size
>huge page region.
>
>We've mostly masked this by rounding up to a 2MB boundary, which is
>what
>the hugepage size typically is.  But that assumption is wrong on some
>hardware, and it's not likely to get less wrong as time passes.
>
>A little bit of research suggests that on Linux the thing to do would
>be
>to get the actual default hugepage size by reading /proc/meminfo and
>looking for a line like "Hugepagesize:   2048 kB".  I don't know
>of any more-portable API, so this does nothing for non-Linux kernels.
>But we have not heard of similar misbehavior on other platforms, even
>though IA64 and PPC64 can both have hugepages larger than 2MB, so it's
>reasonable to hope that other implementations of munmap() don't have
>the same gotcha.

We had that, but Heikki ripped it out when merging... I think you're supposed 
to use /sys to get the available size.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




[HACKERS] munmap() failure due to sloppy handling of hugepage size

2016-10-12 Thread Tom Lane
If any of you were following the thread at
https://www.postgresql.org/message-id/flat/CAOan6TnQeSGcu_627NXQ2Z%2BWyhUzBjhERBm5RN9D0QFWmk7PoQ%40mail.gmail.com
I spent quite a bit of time following a bogus theory, but the problem
turns out to be very simple: on Linux, munmap() is pickier than mmap()
about the length of a hugepage allocation.  The comments in sysv_shmem.c
mention that on older kernels mmap() with MAP_HUGETLB will fail if given
a length request that's not a multiple of the hugepage size.  Well, the
behavior they replaced that with is little better: mmap() succeeds, but
it gives you back a region that's been silently enlarged to the next
hugepage boundary, and then munmap() will fail if you specify the region
size you asked for rather than the region size you were given.

Since AFAICS there is no way to inquire what region size you were given,
this API is astonishingly brain-dead IMO.  But that seems to be what
we've got.  Chris Richards reported it against a 3.16.7 kernel, and
I can replicate the behavior on RHEL6 (2.6.32) by asking for an odd-size
huge page region.

We've mostly masked this by rounding up to a 2MB boundary, which is what
the hugepage size typically is.  But that assumption is wrong on some
hardware, and it's not likely to get less wrong as time passes.

A little bit of research suggests that on Linux the thing to do would be
to get the actual default hugepage size by reading /proc/meminfo and
looking for a line like "Hugepagesize:   2048 kB".  I don't know
of any more-portable API, so this does nothing for non-Linux kernels.
But we have not heard of similar misbehavior on other platforms, even
though IA64 and PPC64 can both have hugepages larger than 2MB, so it's
reasonable to hope that other implementations of munmap() don't have
the same gotcha.
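A rough sketch of the /proc/meminfo approach (the helper name is made up; this is not the eventual sysv_shmem.c code): parse the "Hugepagesize:" line, returning 0 when a line does not match so the caller can keep scanning or fall back to the 2MB guess:

```c
#include <stdio.h>

/* Return the default huge page size in bytes parsed from one line of
 * /proc/meminfo, or 0 if the line is not the "Hugepagesize:" entry.
 * A caller would fgets() through /proc/meminfo until this returns
 * nonzero, falling back to a 2MB assumption otherwise. */
static long
hugepagesize_from_line(const char *line)
{
    long kb;

    if (sscanf(line, "Hugepagesize: %ld kB", &kb) == 1)
        return kb * 1024L;
    return 0;
}
```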

Barring objections I'll go make this happen.

regards, tom lane




Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers

2016-10-12 Thread Tom Lane
Robert Haas  writes:
> On Wed, Oct 12, 2016 at 11:54 AM, Greg Stark  wrote:
>> Fwiw I was considering proposing committing some patches for these old
>> releases to make them easier to build. I would suggest creating a tag
>> for this stable legacy version and limiting the commits to just:
>> 
>> 1) Disabling warnings
>> 2) Fixing bugs in the configure scripts that occur on more recent
>> systems (version number parsing etc)
>> 3) Backporting things like the variable-length array code that prevents 
>> building
>> 4) Adding compiler options like -fwrapv

> I'd support that.  The reason why we remove branches from support is
> so that we don't have to back-patch things to 10 or 15 branches when
> we have a bug fix.  But that doesn't mean that we should prohibit all
> commits to those branches for any reason, and making it easier to test
> backward-compatibility when we want to do that seems like a good
> reason.

Meh, I think that this will involve a great deal more work than it's
worth.  We deal with moving-target platforms *all the time*.  New compiler
optimizations break things, libraries such as OpenSSL whack things around,
other libraries such as uuid-ossp stop getting maintained and become
unusable on new platforms, bison decides to get stickier about comma
placement, yadda yadda yadda.  How much of that work are we going to
back-port to dead branches?  And to what extent is such effort going to be
self-defeating because the branch no longer behaves as it did back in the
day?

If Greg wants to do this kind of work, he's got a commit bit.  My position
is that we have a limited support lifespan for a reason, and I'm not going
to spend time on updating dead branches forever.  To me, it's more useful
to test them in place on contemporary platforms.  We've certainly got
enough old platforms laying about in the buildfarm and elsewhere.

regards, tom lane




Re: [HACKERS] Non-empty default log_line_prefix

2016-10-12 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Oct 12, 2016 at 11:20 AM, Christoph Berg  wrote:
> > Re: Jeff Janes 2016-10-12 
> > 

Re: [HACKERS] Non-empty default log_line_prefix

2016-10-12 Thread Devrim Gündüz

Hi,

On Wed, 2016-10-12 at 13:11 -0400, Tom Lane wrote:
> > What is the cost of using %m, other than 4 (rather compressible) bytes per
> > log entry?
> 
> More log I/O, which is not free ...

FWIW, we've been setting log_line_prefix to '< %m > ' for quite a long time in
PGDG RPMs and have not gotten any complaints. I'd vote for %m as the default.

Regards,
-- 
Devrim GÜNDÜZ
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-12 Thread Peter Geoghegan
On Wed, Oct 12, 2016 at 11:09 AM, Robert Haas  wrote:
> I realize that you are primarily targeting utility commands here, and
> that is obviously great, because making index builds faster is very
> desirable. However, I'd just like to talk for a minute about how this
> relates to parallel query.  With Rushabh's Gather Merge patch, you can
> now have a plan that looks like Gather Merge -> Sort -> whatever.
> That patch also allows other patterns that are useful completely
> independently of this patch, like Finalize GroupAggregate -> Gather
> Merge -> Partial GroupAggregate -> Sort -> whatever, but the Gather
> Merge -> Sort -> whatever path is very related to what this patch
> does.  For example, instead of committing this patch at all, we could
> try to funnel index creation through the executor, building a plan of
> that shape, and using the results to populate the index.  I'm not
> saying that's a good idea, but it could be done.

Right, but that would be essentially the same approach as mine, but, I
suspect, less efficient and more complicated. More importantly, it
wouldn't be partitioning, and partitioning is what we really need
within the executor.

> On the flip side, what if anything can queries hope to get out of
> parallel sort that they can't get out of Gather Merge?  One
> possibility is that a parallel sort might end up being substantially
> faster than Gather Merge-over-non-parallel sort.  In that case, we
> obviously should prefer it.

I must admit that I don't know enough about it to comment just yet.
Offhand, it occurs to me that the Gather Merge sorted input could come
from a number of different types of paths/nodes, whereas adopting what
I've done here could only work more or less equivalently to "Gather
Merge -> Sort -> Seq Scan" -- a special case, really.

> For example, it's possible that you might want to have all
> workers participate in sorting some data and then divide the result of
> the sort into equal ranges that are again divided among the workers,
> or that you might want all workers to sort and then each worker to
> read a complete copy of the output data.  But these don't seem like
> particularly mainstream needs, nor do they necessarily seem like
> problems that parallel sort itself should be trying to solve.

This project of mine is about parallelizing tuplesort.c, which isn't
really what you want for parallel query -- you shouldn't try to scope
the problem as "make the sort more scalable using parallelism" there.
Rather, you want to scope it at "make the execution of the entire
query more scalable using parallelism", which is really quite a
different thing, which necessarily involves the executor having direct
knowledge of partition boundaries. Maybe the executor enlists
tuplesort.c to help with those boundaries to some degree, but that
whole thing is basically something which treats tuplesort.c as a low
level primitive.

> The
> Volcano paper[1], one of the oldest and most-cited sources I can find
> for research into parallel execution and with a design fairly similar
> to our own executor, describes various variants of what they call
> Exchange, of which what we now call Gather is one.

I greatly respect the work of Goetz Graefe, including his work on the
Volcano paper. Graefe has been the single biggest external influence on
my work on Postgres.

> They describe
> another variant called Interchange, which acts like a Gather node
> without terminating parallelism: every worker process reads the
> complete output of an Interchange, which is the union of all rows
> produced by all workers running the Interchange's input plan.  That
> seems like a better design than coupling such data flows specifically
> to parallel sort.
>
> I'd like to think that parallel sort will help lots of queries, as
> well as helping utility commands, but I'm not sure it will.  Thoughts?

You are right that I'm targeting the cases where we can get real
benefits without really changing the tuplesort.h contract too much.
This is literally the parallel tuplesort.c patch, which probably isn't
very useful for parallel query, because the final output is always
consumed serially here (this doesn't matter all that much for CREATE
INDEX, I believe). This approach of mine seems like the simplest way
of getting a large benefit to users involving parallelizing sorting,
but I certainly don't imagine it to be the be all and end all.

I have at least tried to anticipate how tuplesort.c will eventually
serve the needs of partitioning for the benefit of parallel query. My
intuition is that you'll have to teach it about partitioning
boundaries fairly directly -- it won't do to add something generic to
the executor. And, it probably won't be the only thing that needs to
be taught about them.

-- 
Peter Geoghegan




Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers

2016-10-12 Thread Robert Haas
On Wed, Oct 12, 2016 at 11:54 AM, Greg Stark  wrote:
> On Mon, Oct 10, 2016 at 9:52 PM, Greg Stark  wrote:
>>
>> The code is here:
>>
>> https://github.com/gsstark/retropg
>>
>> The build script is called "makeall" and it applies patches from the
>> "old-postgres-fixes" directory though some of the smarts are in the
>> script because it knows about how to run older version of the
>> configure script and it tries to fix up the ecpg parser duplcate
>> tokens separately. It saves a diff after applying the patches and
>> other fixups into the "net-diffs" directory but I've never checked if
>> those diffs would work cleanly on their own.
>
>
> Fwiw I was considering proposing committing some patches for these old
> releases to make them easier to build. I would suggest creating a tag
> for this stable legacy version and limiting the commits to just:
>
> 1) Disabling warnings
> 2) Fixing bugs in the configure scripts that occur on more recent
> systems (version number parsing etc)
> 3) Backporting things like the variable-length array code that prevents 
> building
> 4) Adding compiler options like -fwrapv

I'd support that.  The reason why we remove branches from support is
so that we don't have to back-patch things to 10 or 15 branches when
we have a bug fix.  But that doesn't mean that we should prohibit all
commits to those branches for any reason, and making it easier to test
backward-compatibility when we want to do that seems like a good
reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Non-empty default log_line_prefix

2016-10-12 Thread Robert Haas
On Wed, Oct 12, 2016 at 11:20 AM, Christoph Berg  wrote:
> Re: Jeff Janes 2016-10-12 
> 

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-12 Thread Robert Haas
On Wed, Oct 12, 2016 at 3:21 AM, Dilip Kumar  wrote:
> I think that at higher client counts, from 96 onwards, contention
> on CLogControlLock is clearly visible, and it is completely solved
> with the group lock patch.
>
> And at lower client counts (32, 64) contention on CLogControlLock is not
> significant, hence we cannot see any gain with the group lock patch
> (though we can see that contention on CLogControlLock is reduced at 64
> clients).

I agree with these conclusions.  I had a chance to talk with Andres
this morning at Postgres Vision and based on that conversation I'd
like to suggest a couple of additional tests:

1. Repeat this test on x86.  In particular, I think you should test on
the EnterpriseDB server cthulhu, which is an 8-socket x86 server.

2. Repeat this test with a mixed read-write workload, like -b
tpcb-like@1 -b select-only@9

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] int2vector and btree indexes

2016-10-12 Thread Tom Lane
Amit Langote  writes:
> On 2016/10/11 21:40, Tom Lane wrote:
>> Hmm ... I kind of wonder why we have int2vectoreq or hashint2vector at
>> all, likewise the hash opclass based on them.

> Agreed.  So how about the attached patch to remove the said infrastructure?

Looks good, pushed.

regards, tom lane




Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers

2016-10-12 Thread Greg Stark
On Mon, Oct 10, 2016 at 9:52 PM, Greg Stark  wrote:
>
> The code is here:
>
> https://github.com/gsstark/retropg
>
> The build script is called "makeall" and it applies patches from the
> "old-postgres-fixes" directory though some of the smarts are in the
> script because it knows how to run older versions of the
> configure script, and it tries to fix up the ecpg parser duplicate
> tokens separately. It saves a diff after applying the patches and
> other fixups into the "net-diffs" directory but I've never checked if
> those diffs would work cleanly on their own.


Fwiw I was considering proposing committing some patches for these old
releases to make them easier to build. I would suggest creating a tag
for this stable legacy version and limiting the commits to just:

1) Disabling warnings
2) Fixing bugs in the configure scripts that occur on more recent
systems (version number parsing etc)
3) Backporting things like the variable-length array code that prevents building
4) Adding compiler options like -fwrapv


-- 
greg




Re: [HACKERS] Polyphase merge is obsolete

2016-10-12 Thread Heikki Linnakangas

On 10/12/2016 08:27 PM, Tom Lane wrote:
> Heikki Linnakangas  writes:
>> The beauty of the polyphase merge algorithm is that it allows reusing
>> input tapes as output tapes efficiently ... So the whole idea of trying to
>> efficiently reuse input tapes as output tapes is pointless.
>
> It's been awhile since I looked at that code, but I'm quite certain that
> it *never* thought it was dealing with actual tapes.  Rather, the point of
> sticking with polyphase merge was that it allowed efficient incremental
> re-use of temporary disk files, so that the maximum on-disk footprint was
> only about equal to the volume of data to be sorted, rather than being a
> multiple of that.  Have we thrown that property away?

No, there's no difference to that behavior. logtape.c takes care of
incremental re-use of disk space, regardless of the merging pattern.


- Heikki





Re: [HACKERS] Non-empty default log_line_prefix

2016-10-12 Thread Christoph Berg
Re: Jeff Janes 2016-10-12 

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-12 Thread Robert Haas
On Fri, Oct 7, 2016 at 5:47 PM, Peter Geoghegan  wrote:
> Work is still needed on:
>
> * Cost model. Should probably attempt to guess final index size, and
> derive calculation of number of workers from that. Also, I'm concerned
> that I haven't given enough thought to the low end, where with default
> settings most CREATE INDEX statements will use at least one parallel
> worker.
>
> * The whole way that I teach nbtsort.c to disallow catalog tables for
> parallel CREATE INDEX due to concerns about parallel safety is in need
> of expert review, preferably from Robert. It's complicated in a way
> that relies on things happening or not happening from a distance.
>
> * Heikki seems to want to change more about logtape.c, and its use of
> indirection blocks. That may evolve, but for now I can only target the
> master branch.
>
> * More extensive performance testing. I think that this V3 is probably
> the fastest version yet, what with Heikki's improvements, but I
> haven't really verified that.

I realize that you are primarily targeting utility commands here, and
that is obviously great, because making index builds faster is very
desirable. However, I'd just like to talk for a minute about how this
relates to parallel query.  With Rushabh's Gather Merge patch, you can
now have a plan that looks like Gather Merge -> Sort -> whatever.
That patch also allows other patterns that are useful completely
independently of this patch, like Finalize GroupAggregate -> Gather
Merge -> Partial GroupAggregate -> Sort -> whatever, but the Gather
Merge -> Sort -> whatever path is very related to what this patch
does.  For example, instead of committing this patch at all, we could
try to funnel index creation through the executor, building a plan of
that shape, and using the results to populate the index.  I'm not
saying that's a good idea, but it could be done.

On the flip side, what if anything can queries hope to get out of
parallel sort that they can't get out of Gather Merge?  One
possibility is that a parallel sort might end up being substantially
faster than Gather Merge-over-non-parallel sort.  In that case, we
obviously should prefer it.  Other possibilities seem a little
obscure.  For example, it's possible that you might want to have all
workers participate in sorting some data and then divide the result of
the sort into equal ranges that are again divided among the workers,
or that you might want all workers to sort and then each worker to
read a complete copy of the output data.  But these don't seem like
particularly mainstream needs, nor do they necessarily seem like
problems that parallel sort itself should be trying to solve.  The
Volcano paper[1], one of the oldest and most-cited sources I can find
for research into parallel execution and with a design fairly similar
to our own executor, describes various variants of what they call
Exchange, of which what we now call Gather is one.  They describe
another variant called Interchange, which acts like a Gather node
without terminating parallelism: every worker process reads the
complete output of an Interchange, which is the union of all rows
produced by all workers running the Interchange's input plan.  That
seems like a better design than coupling such data flows specifically
to parallel sort.

I'd like to think that parallel sort will help lots of queries, as
well as helping utility commands, but I'm not sure it will.  Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] "Volcano - an Extensible and Parallel Query Evaluation System",
https://pdfs.semanticscholar.org/865b/5f228f08ebac0b68d3a4bfd97929ee85e4b6.pdf
[2] See "C. Variants of the Exchange Operator" on p. 13 of [1]




[HACKERS] logical replication connection information management

2016-10-12 Thread Peter Eisentraut
I want to discuss the connection information management aspect of the
logical replication patch set that is currently being proposed
(https://commitfest.postgresql.org/10/701/).

To review, the user-visible interfaces center around

   -- on sending end
   CREATE PUBLICATION mypub FOR TABLE tbl1, tbl2, ...;

   -- on receiving end
   CREATE SUBSCRIPTION mysub PUBLICATION mypub CONNECTION 'host= dbname=
...'

Both of these map pretty directly into system catalogs pg_publication
and pg_subscription.

The concern is about storing the connection information.  Right now,
this is just a string that is stored and passed to libpqwalreceiver.
But this string can contain passwords, so it needs to be protected.
Currently, pg_subscription has read permissions removed.  This creates
various annoyances.

An idea was to use the facilities we already have for foreign data
access for storing replication connection information.  It already has
considered and solved these problems.  So it might look like this:

CREATE SERVER node1 OPTIONS (host '...', dbname '...');
CREATE USER MAPPING FOR CURRENT_USER SERVER node1;
CREATE SUBSCRIPTION mysub PUBLICATION mypub SERVER node1;

This would have a number of advantages:

- Secret information such as passwords is all stored in one place that
  is already secured.

- Remote connection information is stored all in one place.

- Subscriptions pointing to the same remote host are logically
  connected.

- It's easier to change connection information for all subscriptions
  pointing to a host or to change the password of a user.

- Access control can use existing facilities.  We would not need a new
  concept for who can create subscriptions and not need to use
  superuser or some semi-superuser status.  To allow the use of a
  server, grant USAGE on the server.

So functionality-wise, this looks pretty good, but there is some
awkwardness in how to wire this into the existing facilities, since a
server, also known as a foreign server, is currently tied to a foreign
data wrapper.  I have currently implemented this by creating a fake
built-in foreign data wrapper called "subscription", so the actual
syntax is

CREATE SERVER node1 WRAPPER subscription OPTIONS (host '...', dbname
'...');

which isn't terrible, but still a bit weird.

An idea is to make the foreign server concept more general and allow
it to exist independently of a foreign data wrapper.  Then create more
specific syntax like

CREATE SERVER node1 FOR SUBSCRIPTION OPTIONS ( ... );

or

CREATE SUBSCRIPTION SERVER ...

This would work a bit like pg_constraint, which can be tied to a table
or a type or even nothing (for the hypothetical assertions feature).

We'd need a separate mechanism for controlling which user has the right
to create such subscription servers, but it might be acceptable at the
beginning to just require superuserness.

Thoughts on that?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Remove "Source Code" column from \df+ ?

2016-10-12 Thread Pavel Stehule
2016-10-12 19:48 GMT+02:00 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com>:

> On 10/12/16 11:16 AM, Tom Lane wrote:
> > I'm not sure that Peter was voting for retaining "internal name", but
> > personally I prefer that to deleting prosrc entirely, so +1.
>
> I'm not sure what the point of showing the internal name would be if we
> have already declared that the source code of non-C functions is not
> that interesting.  But I don't have a strong feeling about it.
>

The benefit is for people who need to look at the C implementation of internal
functions. Probably not too big a group, but it can be interesting for
beginners who are starting to read the PostgreSQL code.

Regards

Pavel


>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Remove "Source Code" column from \df+ ?

2016-10-12 Thread Peter Eisentraut
On 10/12/16 11:16 AM, Tom Lane wrote:
> I'm not sure that Peter was voting for retaining "internal name", but
> personally I prefer that to deleting prosrc entirely, so +1.

I'm not sure what the point of showing the internal name would be if we
have already declared that the source code of non-C functions is not
that interesting.  But I don't have a strong feeling about it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-12 Thread Vitaly Burovoy
On 10/12/16, Alvaro Herrera  wrote:
> Julien Rouhaud wrote:
>> On 12/10/2016 14:32, Alvaro Herrera wrote:
>> > Julien Rouhaud wrote:
>> >
>> >> and you can instead make macaddr64 support both format, and provide a
>> >> macaddr::macaddr64 cast
>> >
>> > Having macaddr64 support both formats sounds nice, but how does it
>> > work?
>> > Will we have to reserve one additional bit to select the
>> > representation?
>> > That would make the type be 65 bits which is a clear loser IMO.
>> >
>> > Is it allowed to just leave 16 bits as zeroes which would indicate that
>> > the address is EUI48?  I wouldn't think so ...
>>
>> From what I read, you can indicate it's an EUI-48 address by storing
>> FF:FF (or FF:FE for MAC-48) in 4th and 5th bytes of the EUI-64 address.
>
> That seems reasonable at first glance; so the new type would be able to
> store both 48-bit and 64-bit values, and there would be assignment casts
> in both directions

I think either "macaddr" should be renamed to "macaddr6" (keeping its
OID) and a new type "macaddr8" added along with a new alias "macaddr",
or the current "macaddr" should accept both versions, as the "inet"
type does.

The function "macaddr_recv" can distinguish them by the
StringInfoData.len member, and "macaddr_in" by the number of parts
split by ":".
"macaddr_out" and "macaddr_send" can output 6 octets if the stored
value maps to MAC-48.
Storage can always use 8 bytes, following the rule above.

Otherwise it is hard, from the user's PoV, to decide at the design
stage when a column should be defined as macaddr and when as macaddr8.
If users need to update a column type (say, new hardware appears with
EUI-64 addresses), they must keep in mind that everything tied to the
column type changes as well: stored procedures' parameters, application
code interacting with the DB, etc.
I don't agree with Tom's proposal to introduce a new type; it would be
inconvenient for users.

We have types "int2", "int4", "int8" and an alias "int" for the type
"int4"; at least psql does not show it as a separate type:
postgres=# \dT+ int
                                List of data types
 Schema | Name | Internal name | Size | Elements | Owner | Access privileges | Description
--------+------+---------------+------+----------+-------+-------------------+-------------
(0 rows)

It is understandable to have three types for integers, because they
occupy most of the space in a DB and in real life we mostly don't need
big numbers, but the cases of "inet" and "macaddr" are different.

> and a suite of operators to enable interoperability
> of 48-bit values in macaddr8 with values in type macaddr.  Right?
>
> (The cast function from macaddr8 to macaddr would raise error if the
> 4th and 5th bytes are not either FF:FF or FF:FE -- I don't think we can
> in practice distinguish EUI-48 from MAC-48 in this context.

Wikipedia says[1] they are the same thing, MAC-48 being an obsolete
term for a special case, so there is no need to distinguish them.

> The cast in the other direction would have no restriction and should
> probably always use FF:FE).

[1] 
https://en.wikipedia.org/wiki/Organizationally_unique_identifier#48-bit_Media_Access_Control_Identifier_.28MAC-48.29
-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Non-empty default log_line_prefix

2016-10-12 Thread Jeff Janes
On Wed, Oct 12, 2016 at 10:11 AM, Tom Lane  wrote:

> Jeff Janes  writes:
> > On Sun, Oct 2, 2016 at 1:20 PM, Christoph Berg  wrote:
> >> Patch attached. (Still using %t, I don't think %m makes sense for the
> >> default.)
>
> > What is the cost of using %m, other than 4 (rather compressible) bytes
> per
> > log entry?
>
> More log I/O, which is not free ... and that remark about compressibility
> is bogus for anyone who doesn't pipe their postmaster stderr into gzip.
> I'm already afraid that adding the timestamps will get us some pushback
> about log volume.
>

I don't pipe them into gzip, but every few months I go and pxz any of them
more than few months old.

Do you think the pushback will come from people who just accept the
defaults?

Cheers,

Jeff


Re: [HACKERS] Polyphase merge is obsolete

2016-10-12 Thread Tom Lane
Heikki Linnakangas  writes:
> The beauty of the polyphase merge algorithm is that it allows reusing 
> input tapes as output tapes efficiently ... So the whole idea of trying to 
> efficiently reuse input tapes as output tapes is pointless.

It's been awhile since I looked at that code, but I'm quite certain that
it *never* thought it was dealing with actual tapes.  Rather, the point of
sticking with polyphase merge was that it allowed efficient incremental
re-use of temporary disk files, so that the maximum on-disk footprint was
only about equal to the volume of data to be sorted, rather than being a
multiple of that.  Have we thrown that property away?

regards, tom lane




[HACKERS] Polyphase merge is obsolete

2016-10-12 Thread Heikki Linnakangas
The beauty of the polyphase merge algorithm is that it allows reusing 
input tapes as output tapes efficiently. So if you have N tape drives, 
you can keep them all busy throughout the merge.


That doesn't matter when we can easily have as many "tape drives" as we 
want. In PostgreSQL, a tape drive consumes just a few kB of memory, for 
the buffers. With the patch being discussed to allow a tape to be 
"paused" between write passes [1], we don't even keep the tape buffers 
around when a tape is not actively being read or written to, so all it 
consumes is the memory needed for the LogicalTape struct.


The number of *input* tapes we can use in each merge pass is still 
limited, by the memory needed for the tape buffers and the merge heap, 
but only one output tape is active at any time. The inactive output 
tapes consume very few resources. So the whole idea of trying to 
efficiently reuse input tapes as output tapes is pointless.


Let's switch over to a simple k-way balanced merge. Because it's 
simpler. If you're severely limited on memory, like when sorting 1GB of 
data with work_mem='1MB' or less, it's also slightly faster. I'm not too 
excited about the performance aspect, because in the typical case of a 
single-pass merge, there's no difference. But it would be worth changing 
on simplicity grounds, since we're mucking around in tuplesort.c anyway.


I came up with the attached patch to do that. This patch applies on top 
of my latest "Logical tape pause/resume" patches [1]. It includes 
changes to the logtape.c interface, that make it possible to create and 
destroy LogicalTapes in a tapeset on the fly. I believe this will also 
come handy for Peter's parallel tuplesort patch set.


[1] Logical tape pause/resume, 
https://www.postgresql.org/message-id/55b3b7ae-8dec-b188-b8eb-e07604052351%40iki.fi


PS. I finally bit the bullet and got myself a copy of The Art of Computer 
Programming, Vol 3, 2nd edition. In section 5.4 on External Sorting, 
Knuth says:


"
When this book was first written, magnetic tapes were abundant and disk 
drives were expensive. But disks became enormously better during the 
1980s, and by the late 1990s they had almost completely replaced 
magnetic tape units on most of the world's computer systems. Therefore 
the once-crucial topic of tape merging has become of limited relevance 
to current needs.


Yet many of the patterns are quite beautiful, and the associated 
algorithms reflect some of the best research done in computer science 
during its early days; the techniques are just too nice to be discarded 
abruptly onto the rubbish heap of history. Indeed, the ways in which 
these methods blend theory with practice are especially instructive. 
Therefore merging patterns are discussed carefully and completely below, 
in what may be their last grand appearance before they accept the final 
curtain call.

"

Yep, the polyphase and other merge patterns are beautiful. I enjoyed 
reading through those sections. Now let's put them to rest in PostgreSQL.


- Heikki
>From 15169f32f99a69401d3565c8d0ff0d532d6b6638 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 12 Oct 2016 20:04:17 +0300
Subject: [PATCH 1/1] Replace polyphase merge algorithm with a simple balanced
 k-way merge.

The advantage of polyphase merge is that it can reuse the input tapes as
output tapes efficiently, but that is irrelevant on modern hardware, when
we can easily emulate any number of tape drives. The number of input tapes
we can/should use during merging is limited by work_mem, but output tapes
that we are not currently writing to only cost a little bit of memory, so
there is no need to skimp on them.

Refactor LogicalTapeSet/LogicalTape interface. All the tape functions,
like LogicalTapeRead and LogicalTapeWrite, take a LogicalTape as argument,
instead of LogicalTapeSet+tape number. You can create any number of
LogicalTapes in a single LogicalTapeSet, and you don't need to decide the
number upfront, when you create the tape set.
---
 src/backend/utils/sort/logtape.c   | 223 +
 src/backend/utils/sort/tuplesort.c | 665 +++--
 src/include/utils/logtape.h|  37 ++-
 3 files changed, 366 insertions(+), 559 deletions(-)

diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index 1f540f0..2370fa5 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -118,6 +118,8 @@ typedef struct TapeBlockTrailer
  */
 typedef struct LogicalTape
 {
+	LogicalTapeSet *tapeSet;	/* tape set this tape is part of */
+
 	bool		writing;		/* T while in write phase */
 	bool		paused;			/* T if the tape is paused */
 	bool		frozen;			/* T if blocks should not be freed when read */
@@ -173,10 +175,6 @@ struct LogicalTapeSet
 	long	   *freeBlocks;		/* resizable array */
 	int			nFreeBlocks;	/* # of currently free blocks */
 	int			freeBlocksLen;	/* current allocated length of freeBlocks[] */
-
-	/* The 

Re: [HACKERS] Non-empty default log_line_prefix

2016-10-12 Thread Tom Lane
Jeff Janes  writes:
> On Sun, Oct 2, 2016 at 1:20 PM, Christoph Berg  wrote:
>> Patch attached. (Still using %t, I don't think %m makes sense for the
>> default.)

> What is the cost of using %m, other than 4 (rather compressible) bytes per
> log entry?

More log I/O, which is not free ... and that remark about compressibility
is bogus for anyone who doesn't pipe their postmaster stderr into gzip.
I'm already afraid that adding the timestamps will get us some pushback
about log volume.

regards, tom lane




Re: [HACKERS] Remove "Source Code" column from \df+ ?

2016-10-12 Thread Tom Lane
Stephen Frost  writes:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> I'm still not used to the change that I have to use \d+ rather than \d
>> to see the view definition.  It's the #1 thing I want to see when
>> examining a view, and since 2fe1b4dd651917aad2accac7ba8adb44d9f54930 I
>> have to remember to stick a + sign in there.  So, in short, I agree.

> I definitely see the argument of "\d on a view used to give me the view
> def and now it's almost useless and I have to remember to \d+ all the
> time", but I also think that I might be able to retrain my fingers to
> do \sv for views more easily than always remembering to add a '+' to \d,
> which I use much more frequently than \sv or \d+.

I'm unimpressed with the "I can retrain" argument, because that only
applies to people who exclusively use the latest and greatest.  \sv
didn't exist before 9.6, so if I relearn to use that, it'll fail on me
anytime I'm using a pre-9.6 psql, which is going to be a significant
percentage of the time for awhile to come.

Some quick digging says that \d on a view included the view definition
in a footer since the very beginning (7.0 era, see commit a45195a19) and
we changed it in 9.0 to require +.  That's a heck of a lot of history
and fingertip knowledge to override on the strength of one man's
complaint, even if he is the man who made it that way in the first place.
I also note that getting into the habit of using "\d+" to see view
definitions didn't create any major problems when using older psqls.

(BTW, \sf has been there since 9.1, which means that equating the
compatibility situations for views and functions is a fallacy.)

regards, tom lane




Re: [HACKERS] Non-empty default log_line_prefix

2016-10-12 Thread Jeff Janes
On Sun, Oct 2, 2016 at 1:20 PM, Christoph Berg  wrote:

> Re: Tom Lane 2016-09-29 <18642.1475159...@sss.pgh.pa.us>
> > > Possibly the longer version could be added as an example in the
> > > documentation.
> >
> > I suspect that simply having a nonempty default in the first place
> > is going to do more to raise peoples' awareness than anything we
> > could do in the documentation.  But perhaps an example along these
> > lines would be useful for showing proper use of %q.
>
> Patch attached. (Still using %t, I don't think %m makes sense for the
> default.)
>

I don't agree with that part.  When looking at sections of log files that
people post on help forums, I've often wished people had shared
milliseconds, and I've never wished they had truncated them off.

If two messages are separated by 0.950 seconds, it can have entirely
different implications than if they are separated by 0.002 seconds.

What is the cost of using %m, other than 4 (rather compressible) bytes per
log entry?

Cheers,

Jeff


Re: [HACKERS] Remove "Source Code" column from \df+ ?

2016-10-12 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> >> I'm OK with just removing all the source codes from the \d family and
> >> using the \s family instead.
> 
> > Ok, great, thanks for clarifying that.  Since we only have '\sf' today,
> > I think the prevailing option here is then to make the change to
> > removing 'prosrc' from \df+, have an 'internal name' column, and have
> > users use \sf for functions.
> 
> I'm not sure that Peter was voting for retaining "internal name", but
> personally I prefer that to deleting prosrc entirely, so +1.

Apologies, didn't mean to say that he had agreed with keeping 'internal
name', just that it seemed to be the most generally accepted approach
(and it has a +1 from me as well).

> > Personally, I like the idea of a '\sv' for views, though we should
> > discuss that on a new thread.
> 
> We have \sv already no?

Right, sorry.

> I'm kind of -1 on removing view definitions from \d+.  It's worked like
> that for a very long time and Peter's is the first complaint I've heard.
> I think changing it is likely to annoy more people than will think it's
> an improvement.

* Robert Haas (robertmh...@gmail.com) wrote:
> I'm still not used to the change that I have to use \d+ rather than \d
> to see the view definition.  It's the #1 thing I want to see when
> examining a view, and since 2fe1b4dd651917aad2accac7ba8adb44d9f54930 I
> have to remember to stick a + sign in there.  So, in short, I agree.

I definitely see the argument of "\d on a view used to give me the view
def and now it's almost useless and I have to remember to \d+ all the
time", but I also think that I might be able to retrain my fingers to
do \sv for views more easily than always remembering to add a '+' to \d,
which I use much more frequently than \sv or \d+.

Thanks!

Stephen




Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-12 Thread Tom Lane
Albe Laurenz  writes:
> Tom Lane wrote:
>> I'm okay with adding PGDLLEXPORT to the extern, but we should update
>> that comment to note that it's not necessary with any of our standard
>> Windows build processes.  (For that matter, the comment fails to explain
>> why this macro is providing an extern for the base function at all...)

> Here is a patch for that, including an attempt to improve the comment.

Pushed with some further twiddling of the comment.

regards, tom lane




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-12 Thread Alvaro Herrera
Julien Rouhaud wrote:
> On 12/10/2016 14:32, Alvaro Herrera wrote:
> > Julien Rouhaud wrote:
> > 
> >> and you can instead make macaddr64 support both format, and provide a
> >> macaddr::macaddr64 cast
> > 
> > Having macaddr64 support both formats sounds nice, but how does it work?
> > Will we have to reserve one additional bit to select the representation?
> > That would make the type be 65 bits which is a clear loser IMO.
> > 
> > Is it allowed to just leave 16 bits as zeroes which would indicate that
> > the address is EUI48?  I wouldn't think so ...
> 
> From what I read, you can indicate it's an EUI-48 address by storing
> FF:FF (or FF:FE for MAC-48) in 4th and 5th bytes of the EUI-64 address.

That seems reasonable at first glance; so the new type would be able to
store both 48-bit and 64-bit values, and there would be assignment casts
in both directions and a suite of operators to enable interoperability
of 48-bit values in macaddr8 with values in type macaddr.  Right?

(The cast function from macaddr8 to macaddr would raise error if the
4th and 5th bytes are not either FF:FF or FF:FE -- I don't think we can
in practice distinguish EUI-48 from MAC-48 in this context.  The cast in
the other direction would have no restriction and should probably always
use FF:FE).

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Remove "Source Code" column from \df+ ?

2016-10-12 Thread Robert Haas
On Wed, Oct 12, 2016 at 8:16 AM, Tom Lane  wrote:
> Stephen Frost  writes:
>> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
>>> I'm OK with just removing all the source codes from the \d family and
>>> using the \s family instead.
>
>> Ok, great, thanks for clarifying that.  Since we only have '\sf' today,
>> I think the prevailing option here is then to make the change to
>> removing 'prosrc' from \df+, have an 'internal name' column, and have
>> users use \sf for functions.
>
> I'm not sure that Peter was voting for retaining "internal name", but
> personally I prefer that to deleting prosrc entirely, so +1.
>
>> Personally, I like the idea of a '\sv' for views, though we should
>> discuss that on a new thread.
>
> We have \sv already no?
>
> I'm kind of -1 on removing view definitions from \d+.  It's worked like
> that for a very long time and Peter's is the first complaint I've heard.
> I think changing it is likely to annoy more people than will think it's
> an improvement.

I'm still not used to the change that I have to use \d+ rather than \d
to see the view definition.  It's the #1 thing I want to see when
examining a view, and since 2fe1b4dd651917aad2accac7ba8adb44d9f54930 I
have to remember to stick a + sign in there.  So, in short, I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: [COMMITTERS] pgsql: Extend framework from commit 53be0b1ad to report latch waits.

2016-10-12 Thread Robert Haas
On Wed, Oct 12, 2016 at 8:18 AM, Peter Eisentraut
 wrote:
> On 10/4/16 11:29 AM, Tom Lane wrote:
>> Robert Haas  writes:
>>> Apparently, 'make world' does not build worker_spi.  I thought 'make
>>> world' was supposed to build everything?
>>
>> You'd have thunk, yeah.  It looks like the issue is that src/Makefile
>> is selective about recursing into certain subdirectories of test/,
>> but mostly not test/ itself.  src/test/Makefile naively believes it's
>> in charge, though.  Probably that logic ought to get shoved down one
>> level, and then adjusted so that src/test/modules gets built by "all".
>> Or else teach top-level "make world" to do "make all" in src/test/,
>> but that seems like it's just doubling down on confusing interconnections.
>
> We generally don't build test code during make world.
>
> The reason src/Makefile does that is probably because pg_regress is part
> of the installation.
>
> So this looks more or less correct to me.

But it's annoying to push code and have it break the buildfarm when
you already did 'make world' and 'make check-world'.  What target
should I be using?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Non-empty default log_line_prefix

2016-10-12 Thread Christoph Berg
Re: Peter Eisentraut 2016-10-12 
<0caa6d7f-deb6-9a43-2b38-60e63af93...@2ndquadrant.com>
> >> > is going to do more to raise peoples' awareness than anything we
> >> > could do in the documentation.  But perhaps an example along these
> >> > lines would be useful for showing proper use of %q.
> > Patch attached. (Still using %t, I don't think %m makes sense for the
> > default.)
> 
> That still doesn't address what to do about syslog and eventlog users.
> We would need either a separate prefix setting for those, or maybe
> something like %q that says, skip to here if using syslog.  (I don't
> know eventlog, so I don't know if a common setting for syslog and
> eventlog would work.)

This patch simply tries to fix the default (stderr + '') which wasn't
useful for anyone. Note that there is already a hint in the
documentation that says timestamps and PID are not useful for syslog.

(Yes, the '' default might be fine for syslog, but I don't think
that's a good argument for sticking with it for default installs. I've
seen way too many useless log files out there, and at worst we'll have
syslogs with two timestamps.)

Christoph




Re: [HACKERS] [COMMITTERS] pgsql: Extend framework from commit 53be0b1ad to report latch waits.

2016-10-12 Thread Tom Lane
Peter Eisentraut  writes:
> On 10/4/16 11:29 AM, Tom Lane wrote:
>> Robert Haas  writes:
>>> Apparently, 'make world' does not build worker_spi.  I thought 'make
>>> world' was supposed to build everything?

>> You'd have thunk, yeah.  It looks like the issue is that src/Makefile
>> is selective about recursing into certain subdirectories of test/,
>> but mostly not test/ itself.  src/test/Makefile naively believes it's
>> in charge, though.  Probably that logic ought to get shoved down one
>> level, and then adjusted so that src/test/modules gets built by "all".
>> Or else teach top-level "make world" to do "make all" in src/test/,
>> but that seems like it's just doubling down on confusing interconnections.

> We generally don't build test code during make world.

> The reason src/Makefile does that is probably because pg_regress is part
> of the installation.

> So this looks more or less correct to me.

Even if you think the behavior is correct (I'm not convinced), the
implementation is certainly confusing.  I don't like the way that
src/test/Makefile gets bypassed for decisions about which of its
subdirectories get built for what.  Any normal person would expect
that src/test/Makefile is what determines that.

regards, tom lane




Re: [HACKERS] Non-empty default log_line_prefix

2016-10-12 Thread Tom Lane
Peter Eisentraut  writes:
> That still doesn't address what to do about syslog and eventlog users.
> We would need either a separate prefix setting for those, or maybe
> something like %q that says, skip to here if using syslog.  (I don't
> know eventlog, so I don't know if a common setting for syslog and
> eventlog would work.)

Meh.  Those aren't default log output targets, so I don't think the
default prefix needs to cater for them.  People who adjust one setting
can adjust the other too.

There would be some value in the complexity you're thinking of for
installations that log to multiple targets concurrently, but really,
who does that?  Most production installations already begrudge
the I/O volume for a single log target.

regards, tom lane




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-12 Thread Stephen Frost
* David Steele (da...@pgmasters.net) wrote:
> On 10/4/16 1:48 AM, Michael Paquier wrote:
> 
> > So this is still open for votes. Here are the candidates and who
> > voiced for what:
> > - pg_transaction: Michael P, Thomas M. => Current 0002 is doing that.
> > - pg_xact: David S, Robert
> > - pg_trans: Tom
> > - pg_transaction_status: Peter E.
> 
> Christoph also prefers pg_xact [1].

Since we're asking for votes, I also prefer pg_xact over the other
options listed here.

Thanks!

Stephen




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-12 Thread Tom Lane
Peter Eisentraut  writes:
> On 10/4/16 1:48 AM, Michael Paquier wrote:
>> So this is still open for votes. Here are the candidates and who
>> voiced for what:
>> - pg_transaction: Michael P, Thomas M. => Current 0002 is doing that.
>> - pg_xact: David S, Robert
>> - pg_trans: Tom
>> - pg_transaction_status: Peter E.

> I think this would all be a lot easier if we just moved all the pg_*
> directories one directory down.

The main problem we're trying to fix here is people thinking that
something with "log" in the name contains discardable data.  Just
relocating the directory without renaming it won't improve that.

The relocation aspect was meant to address another problem, which
is making it easier to distinguish which parts of the tree should
be copied by tools like pg_basebackup.  But that's not what Michael
is asking for votes on.

regards, tom lane




Re: [HACKERS] Non-empty default log_line_prefix

2016-10-12 Thread Peter Eisentraut
On 10/2/16 4:20 PM, Christoph Berg wrote:
>> I suspect that simply having a nonempty default in the first place
>> > is going to do more to raise peoples' awareness than anything we
>> > could do in the documentation.  But perhaps an example along these
>> > lines would be useful for showing proper use of %q.
> Patch attached. (Still using %t, I don't think %m makes sense for the
> default.)

That still doesn't address what to do about syslog and eventlog users.
We would need either a separate prefix setting for those, or maybe
something like %q that says, skip to here if using syslog.  (I don't
know eventlog, so I don't know if a common setting for syslog and
eventlog would work.)
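For illustration, a prefix that exercises %q might look like this in postgresql.conf (the value shown is a hypothetical example for this discussion, not a proposed default):

```
# Everything after %q is dropped when the line is emitted by a
# non-session process, so session-only fields like %u and %d can
# safely follow it.
log_line_prefix = '%t [%p] %q%u@%d '
```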

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Re: [COMMITTERS] pgsql: Extend framework from commit 53be0b1ad to report latch waits.

2016-10-12 Thread Peter Eisentraut
On 10/4/16 11:29 AM, Tom Lane wrote:
> Robert Haas  writes:
>> Apparently, 'make world' does not build worker_spi.  I thought 'make
>> world' was supposed to build everything?
> 
> You'd have thunk, yeah.  It looks like the issue is that src/Makefile
> is selective about recursing into certain subdirectories of test/,
> but mostly not test/ itself.  src/test/Makefile naively believes it's
> in charge, though.  Probably that logic ought to get shoved down one
> level, and then adjusted so that src/test/modules gets built by "all".
> Or else teach top-level "make world" to do "make all" in src/test/,
> but that seems like it's just doubling down on confusing interconnections.

We generally don't build test code during make world.

The reason src/Makefile does that is probably because pg_regress is part
of the installation.

So this looks more or less correct to me.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Remove "Source Code" column from \df+ ?

2016-10-12 Thread Tom Lane
Stephen Frost  writes:
> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
>> I'm OK with just removing all the source codes from the \d family and
>> using the \s family instead.

> Ok, great, thanks for clarifying that.  Since we only have '\sf' today,
> I think the prevailing option here is then to make the change to
> removing 'prosrc' from \df+, have an 'internal name' column, and have
> users use \sf for functions.

I'm not sure that Peter was voting for retaining "internal name", but
personally I prefer that to deleting prosrc entirely, so +1.

> Personally, I like the idea of a '\sv' for views, though we should
> discuss that on a new thread.

We have \sv already no?

I'm kind of -1 on removing view definitions from \d+.  It's worked like
that for a very long time and Peter's is the first complaint I've heard.
I think changing it is likely to annoy more people than will think it's
an improvement.

regards, tom lane




Re: [HACKERS] Remove "Source Code" column from \df+ ?

2016-10-12 Thread Stephen Frost
* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 10/12/16 11:08 AM, Stephen Frost wrote:
> > Personally, I like the idea of a '\sv' for views, though we should
> > discuss that on a new thread.
> 
> \sv already exists. :-)

Whoops, sorry, was looking at a 9.5 psql. :)

Neat!

Thanks!

Stephen




Re: [HACKERS] Remove "Source Code" column from \df+ ?

2016-10-12 Thread Peter Eisentraut
On 10/12/16 11:08 AM, Stephen Frost wrote:
> Personally, I like the idea of a '\sv' for views, though we should
> discuss that on a new thread.

\sv already exists. :-)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Rename max_parallel_degree?

2016-10-12 Thread Peter Eisentraut
On 10/4/16 10:16 AM, Julien Rouhaud wrote:
> Please find attached v9 of the patch, adding the parallel worker class
> and changing max_worker_processes default to 16 and max_parallel_workers
> to 8.  I also added Amit's explanation for the need of a write barrier
> in ForgetBackgroundWorker().

This approach totally messes up the decoupling between the background
worker facilities and consumers of those facilities.  Having dozens of
lines of code in bgworker.c that does the accounting and resource
checking on behalf of parallel.c looks very suspicious.  Once we add
logical replication workers or whatever, we'll be tempted to add even
more stuff in there and it will become a mess.

I think we should think of a way where we can pass the required
information during background worker setup, like a pointer to the
resource-limiting GUC variable.

For style, I would also prefer the "class" to be a separate struct field
from the flags.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Remove "Source Code" column from \df+ ?

2016-10-12 Thread Stephen Frost
* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 10/11/16 7:51 PM, Tom Lane wrote:
> > 1. Do nothing.
> > 2. Remove the prosrc column from \df+ altogether.
> > 3. Suppress prosrc for PL functions, but continue to show it for
> >C and internal functions (and, probably, rename it to something
> >other than "Source code" in that case).
> > 4. #3 plus show PL function source code in footers.
> 
> One related annoyance I have with psql is that \d+ on a view *does* show
> the "source code" in the footer, because it's often too long and bulky
> and ugly and unrelated to why I wanted to use the +.

I tend to agree with that, though I believe it's a topic for another
thread.

> I'm OK with just removing all the source codes from the \d family and
> using the \s family instead.

Ok, great, thanks for clarifying that.  Since we only have '\sf' today,
I think the prevailing option here is then to make the change to
removing 'prosrc' from \df+, have an 'internal name' column, and have
users use \sf for functions.

If anyone feels differently, please speak up.

Personally, I like the idea of a '\sv' for views, though we should
discuss that on a new thread.

Thanks!

Stephen




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-12 Thread Peter Eisentraut
On 10/4/16 1:48 AM, Michael Paquier wrote:
> So this is still open for votes. Here are the candidates and who
> voiced for what:
> - pg_transaction: Michael P, Thomas M. => Current 0002 is doing that.
> - pg_xact: David S, Robert
> - pg_trans: Tom
> - pg_transaction_status: Peter E.

I think this would all be a lot easier if we just moved all the pg_*
directories one directory down.  We did this with "global" many years
ago and it worked out well.  We didn't actually have to change the file
names, the top-level directory became cleaner, and no one was messing
around with the files anymore.  Adjusting external tools also becomes
easier.  There is so much lore out there about clog and xlog; it would
be annoying to break with that.

I can see a reason for not doing that with pg_xlog, because that is a
bit of an external interface, with initdb -X and such, and pg_wal is a
pretty good alternative name.  But no one deals with these other
directories directly, so just move them out of the way.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-12 Thread Albe Laurenz
Tom Lane wrote:
> I'm okay with adding PGDLLEXPORT to the extern, but we should update
> that comment to note that it's not necessary with any of our standard
> Windows build processes.  (For that matter, the comment fails to explain
> why this macro is providing an extern for the base function at all...)

Here is a patch for that, including an attempt to improve the comment.

Yours,
Laurenz Albe


0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1.patch
Description: 0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1.patch



Re: [HACKERS] Remove "Source Code" column from \df+ ?

2016-10-12 Thread Peter Eisentraut
On 10/11/16 7:51 PM, Tom Lane wrote:
> 1. Do nothing.
> 2. Remove the prosrc column from \df+ altogether.
> 3. Suppress prosrc for PL functions, but continue to show it for
>C and internal functions (and, probably, rename it to something
>other than "Source code" in that case).
> 4. #3 plus show PL function source code in footers.

One related annoyance I have with psql is that \d+ on a view *does* show
the "source code" in the footer, because it's often too long and bulky
and ugly and unrelated to why I wanted to use the +.

I'm OK with just removing all the source codes from the \d family and
using the \s family instead.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical tape pause/resume

2016-10-12 Thread Heikki Linnakangas

On 10/10/2016 10:31 PM, Peter Geoghegan wrote:

On Sun, Oct 9, 2016 at 11:52 PM, Heikki Linnakangas  wrote:

Ok, good. I think we're in agreement on doing this, then, even if we don't
agree on the justification :-).


Agreed. :-)


Attached are new patch versions. Rebased them over current head, fixed a 
number of bugs in the seek and backspace code, and did a bunch of 
stylistic cleanups and comment fixes.


I changed the API of LogicalTapeBackspace() slightly. If you try to back 
up beyond the beginning of tape, it used to return FALSE and do nothing. 
Now it backspaces as much as it can, to the very beginning of the tape, 
and returns the amount backspaced. That was more convenient than the old 
behaviour with this new implementation.


- Heikki

From 588d74efe33f380221391b1d8520dc4b4cfa09be Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 12 Oct 2016 12:26:25 +0300
Subject: [PATCH 1/2] Simplify tape block format.

No more indirect blocks. The blocks form a linked list instead.

This saves some memory, because we don't need to have a buffer in memory to
hold the indirect block (or blocks). To reflect that, TAPE_BUFFER_OVERHEAD
is reduced from 3 to 1 buffer, which allows using more memory for building
the initial runs.
---
 src/backend/utils/sort/logtape.c   | 627 +++--
 src/backend/utils/sort/tuplesort.c |  58 ++--
 src/include/utils/logtape.h|   4 +-
 3 files changed, 214 insertions(+), 475 deletions(-)

diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index 3163010..07d147d 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -31,15 +31,8 @@
  * in BLCKSZ-size blocks.  Space allocation boils down to keeping track
  * of which blocks in the underlying file belong to which logical tape,
  * plus any blocks that are free (recycled and not yet reused).
- * The blocks in each logical tape are remembered using a method borrowed
- * from the Unix HFS filesystem: we store data block numbers in an
- * "indirect block".  If an indirect block fills up, we write it out to
- * the underlying file and remember its location in a second-level indirect
- * block.  In the same way second-level blocks are remembered in third-
- * level blocks, and so on if necessary (of course we're talking huge
- * amounts of data here).  The topmost indirect block of a given logical
- * tape is never actually written out to the physical file, but all lower-
- * level indirect blocks will be.
+ * The blocks in each logical tape form a chain, with a prev- and next-
+ * pointer in each block.
  *
  * The initial write pass is guaranteed to fill the underlying file
  * perfectly sequentially, no matter how data is divided into logical tapes.
@@ -87,58 +80,65 @@
 #include "utils/memutils.h"
 
 /*
- * Block indexes are "long"s, so we can fit this many per indirect block.
- * NB: we assume this is an exact fit!
- */
-#define BLOCKS_PER_INDIR_BLOCK	((int) (BLCKSZ / sizeof(long)))
-
-/*
- * We use a struct like this for each active indirection level of each
- * logical tape.  If the indirect block is not the highest level of its
- * tape, the "nextup" link points to the next higher level.  Only the
- * "ptrs" array is written out if we have to dump the indirect block to
- * disk.  If "ptrs" is not completely full, we store -1L in the first
- * unused slot at completion of the write phase for the logical tape.
+ * A TapeBlockTrailer is stored at the end of each BLCKSZ block.
+ *
+ * The first block of a tape has prev == -1.  The last block of a tape
+ * stores the number of valid bytes on the block, inverted, in 'next'.
+ * Therefore next < 0 indicates the last block.
  */
-typedef struct IndirectBlock
+typedef struct TapeBlockTrailer
 {
-	int			nextSlot;		/* next pointer slot to write or read */
-	struct IndirectBlock *nextup;		/* parent indirect level, or NULL if
-		 * top */
-	long		ptrs[BLOCKS_PER_INDIR_BLOCK];	/* indexes of contained blocks */
-} IndirectBlock;
+	long		prev;			/* previous block on this tape, or -1 on first
+ * block */
+	long		next;			/* next block on this tape, or # of valid
+ * bytes on last block (if < 0) */
+}	TapeBlockTrailer;
+
+#define TapeBlockPayloadSize  (BLCKSZ - sizeof(TapeBlockTrailer))
+#define TapeBlockGetTrailer(buf) \
+	((TapeBlockTrailer *) ((char *) buf + TapeBlockPayloadSize))
+
+#define TapeBlockIsLast(buf) (TapeBlockGetTrailer(buf)->next < 0)
+#define TapeBlockGetNBytes(buf) \
+	(TapeBlockIsLast(buf) ? \
+	 (- TapeBlockGetTrailer(buf)->next) : TapeBlockPayloadSize)
+#define TapeBlockSetNBytes(buf, nbytes) \
+	(TapeBlockGetTrailer(buf)->next = -(nbytes))
+
 
 /*
  * This data structure represents a single "logical tape" within the set
- * of logical tapes stored in the same file.  We must keep track of the
- * current partially-read-or-written data block as well as the active
- * indirect block level(s).
+ * of 

Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-12 Thread Tom Lane
Albe Laurenz  writes:
> Tom Lane wrote:
>> Except that we don't.  There aren't PGDLLEXPORT markings for any
>> functions exported from contrib modules, and we don't use dlltool
>> on them either.  By your argument, none of contrib would work on
>> Windows builds at all, but we have a ton of buildfarm evidence and
>> successful field use to the contrary.  How is that all working?

> I thought it was the job of src/tools/msvc/gendef.pl to generate
> .DEF files?

Hm, okay, so after further review and googling:

MSVC: gendef.pl is used to build a .DEF file, creating the equivalent
of --export-all-symbols behavior.

Mingw: we use handmade .DEF files for libpq and a few other libraries,
otherwise Makefile.shlib explicitly specifies -Wl,--export-all-symbols.

Cygwin: per https://sourceware.org/binutils/docs-2.17/ld/WIN32.html
--export-all-symbols is the default unless you use a .DEF file or have at
least one symbol marked __declspec(dllexport) --- but the Cygwin build
defines PGDLLEXPORT as empty, so there won't be any of the latter.

So it works for us, but the stackoverflow complainant is evidently using
some homebrew build process that does none of the above.

I'm okay with adding PGDLLEXPORT to the extern, but we should update
that comment to note that it's not necessary with any of our standard
Windows build processes.  (For that matter, the comment fails to explain
why this macro is providing an extern for the base function at all...)

regards, tom lane




Re: [HACKERS] ICU integration

2016-10-12 Thread Peter Eisentraut
On 9/30/16 4:32 PM, Thomas Munro wrote:
>> Hmm, yeah, that will need more work.  To be completely correct, I think,
>> > we'd also need to record the version in each expression node, so that
>> > check constraints of the form CHECK (x > 'abc') can be handled.
> Hmm.  That is quite a rabbit hole.  In theory you need to recheck such
> a constraint, but it's not at all clear when you should recheck and
> what you should do about it if it fails.  Similar for the future
> PARTITION feature.

I think it's not worth dealing with this in that much detail at the
moment.  It's not like the collation will just randomly change under you
(unlike with glibc).  It would have to involve pg_upgrade, physical
replication, or a rebuilt installation.  So I think I will change the
message to something to the effect of "however you got here, you can't
do that".  We can develop some recipes and ideas on the side for how to
recover situations like that and then maybe integrate tooling for that
later.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-12 Thread Albe Laurenz
Tom Lane wrote:
>> PostgreSQL itself seems to use export files that explicitly declare the
>> exported symbols, so it gets away without these decorations.
> 
> Except that we don't.  There aren't PGDLLEXPORT markings for any
> functions exported from contrib modules, and we don't use dlltool
> on them either.  By your argument, none of contrib would work on
> Windows builds at all, but we have a ton of buildfarm evidence and
> successful field use to the contrary.  How is that all working?

I thought it was the job of src/tools/msvc/gendef.pl to generate
.DEF files?  In the buildfarm logs I can find lines like:

  Generating CITEXT.DEF from directory Release\citext, platform x64
  Generating POSTGRES_FDW.DEF from directory Release\postgres_fdw, platform x64

which are emitted by gendef.pl.

Yours,
Laurenz Albe



Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-12 Thread Tom Lane
Albe Laurenz  writes:
> Tom Lane wrote:
>> The lack of complaints about this suggest that it's not actually necessary
>> to do so.  So what this makes me wonder is whether we can't drop the
>> DLLEXPORT on the finfo function too.  I'd rather reduce the number of
>> Microsoft-isms in the code, not increase it.

> I understand the sentiment.

> But it seems to actually cause a problem on Windows, at least there was a
> complaint here: http://stackoverflow.com/q/39964233

> Adding PGDLLEXPORT solved the problem there.

> I guess that there are not more complaints about that because
> few people build C functions on Windows themselves (lack of PGXS)
> and those who do are probably knowledgeable enough that they can
> fix it themselves by sticking an extra declaration with PGDLLEXPORT
> into their source file.

> PostgreSQL itself seems to use export files that explicitly declare the
> exported symbols, so it gets away without these decorations.

Except that we don't.  There aren't PGDLLEXPORT markings for any
functions exported from contrib modules, and we don't use dlltool
on them either.  By your argument, none of contrib would work on
Windows builds at all, but we have a ton of buildfarm evidence and
successful field use to the contrary.  How is that all working?

regards, tom lane




Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-12 Thread Albe Laurenz
Tom Lane wrote:
> Albe Laurenz  writes:
>> Currently, PG_FUNCTION_INFO_V1 is defined as
[...]
> 
>> Is there a good reason why the "funcname" declaration is not decorated
>> with PGDLLEXPORT?
> 
> The lack of complaints about this suggest that it's not actually necessary
> to do so.  So what this makes me wonder is whether we can't drop the
> DLLEXPORT on the finfo function too.  I'd rather reduce the number of
> Microsoft-isms in the code, not increase it.

I understand the sentiment.

But it seems to actually cause a problem on Windows, at least there was a
complaint here: http://stackoverflow.com/q/39964233

Adding PGDLLEXPORT solved the problem there.

I guess that there are not more complaints about that because
few people build C functions on Windows themselves (lack of PGXS)
and those who do are probably knowledgeable enough that they can
fix it themselves by sticking an extra declaration with PGDLLEXPORT
into their source file.

PostgreSQL itself seems to use export files that explicitly declare the
exported symbols, so it gets away without these decorations.

>> BTW, I admit I don't understand the comment.
> 
> It seems like an obvious typo.  Or, possibly, sloppy thinking that
> contributed to failure to recognize that the keyword isn't needed.

Looking through the history, it seems to be as follows:
- In 5fde8613, the comment was added (mistakenly) together with a DLLIMPORT 
decoration.
- In 77e50a61, the decoration was changed to PGDLLEXPORT, but the comment 
forgotten.
- In e7128e8d, the function prototype was added to the macro, but
  the PGDLLEXPORT decoration was forgotten.

The dlltool mentioned belongs to MinGW, so it doesn't apply to people
building with MSVC.

Maybe the comment should be like this:

/*
 *  Macro to build an info function associated with the given function name.
 *  Win32 loadable functions usually link with 'dlltool --export-all' or use
 *  a .DEF file, but it doesn't hurt to add PGDLLEXPORT in case they don't.
 */

Yours,
Laurenz Albe



Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server

2016-10-12 Thread Jeevan Chalke
On Thu, Sep 8, 2016 at 10:41 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

>
>
> I think we should try to measure performance gain because of aggregate
> pushdown. The EXPLAIN
> doesn't show actual improvement in the execution times.
>

I did performance testing for aggregate push down and see good performance
with the patch.

Attached is the script I have used to get the performance numbers along with
the results I got with and without patch (pg_agg_push_down_v3.patch).  Also
attached few GNU plots from the readings I got.  These were run on my local
VM having following details:

Linux centos7 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux
RAM alloted: 8 GB
CPUs alloted: 8
postgresql.conf is default.

With aggregate push down I see around a 12x speedup for the count(*)
operation.
In another test, I observed that if the number of groups returned from the
remote server is the same as the number of input rows, then aggregate push
down performs slightly worse, which I think is expected. However, in all
other cases, where the number of groups is smaller than the number of input
rows, pushing down the aggregate gives better performance than performing
the grouping on the local server.

I did this performance testing on my local setup. I would expect even better
numbers on a specialized high-end performance machine.  I would be more than
happy if someone ran this test on such a machine.

Let me know if you need any help.

Thanks

-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
=== WITH PATCH ===

 query                                                            | rows  | avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
------------------------------------------------------------------+-------+--------------+------------------+--------------+--------------
 select count(*) from fperf1                                      |     1 |     141.8686 |  4.4668353500495 |      138.903 |      152.241
 select c2, avg(c1) from fperf1 group by c2 having count(*) < 34  |     2 |     405.7661 | 11.9403142844368 |      400.689 |      439.675
 select c2, sum(c1) from fperf1 group by c2                       |     3 |     363.2299 | 4.29278180851687 |      354.815 |      369.739
 select c3, avg(c1), sum(c2) from fperf1 group by c3              |     5 |     454.4478 | 3.98680590057494 |      447.248 |      457.955
 select c4, avg(c1), sum(c2) from fperf1 group by c4              |    10 |     501.0197 | 5.26951028823454 |      491.726 |      508.574
 select c5, avg(c1), sum(c2) from fperf1 group by c5              |   100 |     490.0783 | 5.64261263462349 |       480.78 |      496.662
 select c6, avg(c1), sum(c2) from fperf1 group by c6              |  1000 |     582.6842 |  9.9196984474776 |      564.425 |       592.69
 select c1%1, avg(c1), sum(c2) from fperf1 group by c1%1          |     1 |     901.1682 | 9.58382273302904 |      888.383 |      923.386
 select c1%10, avg(c1), sum(c2) from fperf1 group by c1%10        |    10 |    1032.1959 | 6.89087326268629 |     1018.598 |     1045.423
 select c1%100, avg(c1), sum(c2) from fperf1 group by c1%100      |   100 |    1076.3834 | 11.1022883947539 |     1061.305 |     1093.892
 select c1%1000, avg(c1), sum(c2) from fperf1 group by c1%1000    |  1000 |    1113.6001 | 11.2143472634172 |     1092.863 |     1133.007
 select c1%1, avg(c1), sum(c2) from fperf1 group by c1%1          |     1 |    1182.1374 | 32.5953859659133 |     1148.961 |     1231.296
 select c1%10, avg(c1), sum(c2) from fperf1 group by c1%10        |    10 |    1467.1811 | 14.3535175437048 |      1443.95 |     1485.645
 select c1%100, avg(c1), sum(c2) from fperf1 group by c1%100      |   100 |    5466.2306 | 633.367848489717 |     5127.339 |     7248.381
(14 rows)


=== WITHOUT PATCH ===

 query                                                            | rows  | avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
------------------------------------------------------------------+-------+--------------+------------------+--------------+--------------
 select count(*) from fperf1                                      |     1 |    1674.5339 | 27.1549108570754 |     1637.345 |     1725.057
 select c2, avg(c1) from fperf1 group by c2 having count(*) < 34  |     2 |    2467.8368 |  22.606929678949 |     2437.674 |     2506.438
 select c2, sum(c1) from fperf1 group by c2                       |     3 |      2387.39 | 34.3686766983568 |     2350.396 |     2444.313
 select c3, avg(c1), sum(c2) from fperf1 group by c3              |     5 |    2702.3344 | 28.0312843452488 |     2665.317 |     2746.862
 select c4, avg(c1), sum(c2) from fperf1 group by c4              |    10 |    2850.9818 | 42.5758532759606 |     2813.562 |     2946.991
 select c5, avg(c1), 

Re: [HACKERS] How to inspect tuples during execution of a plan?

2016-10-12 Thread Ernst-Georg Schmid
Hello,

>
> Either of those would do, if you want to write change the executor.
>

I used the ExecutorRun_hook and it seems to work. The drawback is that
I have to provide my own implementation of ExecutePlan(), which could
make it incompatible across versions of PostgreSQL. I don't know

https://github.com/ergo70/pg_sentinel

Thank you,

Ernst-Georg




Re: [HACKERS] Remove "Source Code" column from \df+ ?

2016-10-12 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > As was mentioned, this thread doesn't really need a patch but rather
> > some comment from those who have voiced a -1 on removing the PL source
> > code column.
> 
> > In another, perhaps vain, attempt to get to a consensus, here's what it
> > looks like the current standings are for "Remove source from \df+",
> 
> I think this is oversimplified, because there are multiple proposals on
> the table, and it's not entirely clear to me who approves of which.

That's certainly fair and I had begun that email by trying to come up
with a way to represent everyone's positions fairly but, frankly, after
an hour of reading through the thread and noting the various changes in
positions, I got to the point where I felt...

> > There have been a number of voices asking that we do *something* here.
> 
> Yes.  I agree with your summary that Peter is the only one who appears
> to be in favor of "do nothing" (and even there, his complaint was at
> least partly procedural not substantive).

We really need a response on this part if we're going to actually make
any progress.

If we'd actually like to do a formal condorcet-style vote (or something
similar which allows preferences to be considered) over the various
options, I'm willing to put effort into making it happen, but only if
we'd actually agree to accept the result, otherwise we're just back here
again.

Thanks!

Stephen




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-12 Thread Julien Rouhaud
On 12/10/2016 14:32, Alvaro Herrera wrote:
> Julien Rouhaud wrote:
> 
>> and you can instead make macaddr64 support both format, and provide a
>> macaddr::macaddr64 cast
> 
> Having macaddr64 support both formats sounds nice, but how does it work?
> Will we have to reserve one additional bit to select the representation?
> That would make the type be 65 bits which is a clear loser IMO.
> 
> Is it allowed to just leave 16 bits as zeroes which would indicate that
> the address is EUI48?  I wouldn't think so ...
> 

From what I read, you can indicate it's an EUI-48 address by storing
FF:FF (or FF:FE for MAC-48) in 4th and 5th bytes of the EUI-64 address.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-12 Thread Tom Lane
Albe Laurenz  writes:
> Currently, PG_FUNCTION_INFO_V1 is defined as
>   /*
>*  Macro to build an info function associated with the given function name.
>*  Win32 loadable functions usually link with 'dlltool --export-all', but 
> it
>*  doesn't hurt to add PGDLLIMPORT in case they don't.
>*/
>   #define PG_FUNCTION_INFO_V1(funcname) \
>   Datum funcname(PG_FUNCTION_ARGS); \
>   extern PGDLLEXPORT const Pg_finfo_record * CppConcat(pg_finfo_,funcname)(void); \
>   const Pg_finfo_record * \
>   CppConcat(pg_finfo_,funcname) (void) \
>   { \
>   static const Pg_finfo_record my_finfo = { 1 }; \
>   return &my_finfo; \
>   } \
>   extern int no_such_variable

> Is there a good reason why the "funcname" declaration is not decorated
> with PGDLLEXPORT?

The lack of complaints about this suggest that it's not actually necessary
to do so.  So what this makes me wonder is whether we can't drop the
DLLEXPORT on the finfo function too.  I'd rather reduce the number of
Microsoft-isms in the code, not increase it.

> BTW, I admit I don't understand the comment.

It seems like an obvious typo.  Or, possibly, sloppy thinking that
contributed to failure to recognize that the keyword isn't needed.

regards, tom lane




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-12 Thread Tom Lane
Haribabu Kommi  writes:
> Here I attached a POC patch that adds the support for EUI-64 MAC address
> storage with a new datatype macaddr64. Currently this type takes only
> EUI-64 datatype, not accepts 48 bit MAC address.

Our other data types that have sizes in the names measure the sizes in
bytes (float4, int8, etc).  Should this be called macaddr8?

> As we are moving to PostgreSQL 10, so are there any plans of backward
> compatiblity breakage, so the existing macaddr datatype itself can be
> changed to support both 48 and 64 bit MAC addresses.

As others have noted, there is no likelihood that we'd take a disk-format-
compatibility-breaking patch for v10.  Even if we wanted to do that, the
above proposal would also break send/recv (binary COPY) compatibility for
macaddr.

I think that probably the best bet here is to have two types and put some
thought into making them interoperate where appropriate, as the various
sizes of int do.  It's kind of a shame that this won't look like the
approach used for inet addresses, but we're stuck.

regards, tom lane



