[HACKERS] pg_dump, pg_dumpall and data durability
Hi all, In my quest of making the backup tools more compliant with data durability, here is a thread for pg_dump and pg_dumpall. Here is my proposal in a couple of lines:
- Addition in _archiveHandle of a field to track whether the dump generated should be synced or not.
- This is effective for all modes when the user specifies an output file, in short when fileSpec is not NULL.
- Actually do the sync in _EndData and _EndBlob[s] if appropriate. There is for example nothing to do for pg_backup_null.c.
- Addition of a --nosync option to allow users to disable it. By default it is enabled.
Note that to make the data durable, the file needs to be synced as well as its parent folder. So with pg_dump we can only make that truly durable with -Fd. I think that in the case where the user specifies an output file for the other modes we should sync it; that's the best we can do. This last statement applies as well to pg_dumpall. Of course, if no output file is specified, it is up to the user to deal with the sync phases. Or, more simply, as true durability can only be achieved if each file and its parent directory are fsync'd, we could just support the operation for -Fd. Still, I'd like to think that we had better do the best we can here and do things as well for the other modes. Thoughts? I'd like to prepare a patch along those lines for the next CF. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Mention to pg_backup in pg_dump.c
Hi all, I just bumped into the following thing in pg_dump.c:

    /* Set default options based on progname */
    if (strcmp(progname, "pg_backup") == 0)
        format = "c";

As far as I can see this comes from e8f69be0, dating back to 2000. Couldn't this be removed? I have never seen traces of pg_backup in the code. Regards, -- Michael
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
On 10/12/16, Tom Lane wrote: > Alvaro Herrera writes: >> Tom Lane wrote: >>> Vitaly Burovoy writes: >>>> P.S.: I still think it is a good idea to change storage format, >>> I'm not sure which part of "no" you didn't understand, I just paid attention to the words "likelihood" (mixed up with "likeliness"), "we wanted" and "probably". Also there was a note about "would also break send/recv", whose behavior can be preserved. And after your letter Julien Rouhaud wrote about the mapping from MAC-48 to EUI-64, which leads to the absence of a bit indicating the version of a stored value. That can be considered new information. >>> but we're >>> not breaking on-disk compatibility of existing macaddr columns. Can I ask why? It will not be a varlena (typstorage will not be changed); it just changes typlen to 8 and typalign to 'd'. For every major release 9.0, 9.1, 9.2 .. 9.6 the docs say "A dump/restore using pg_dumpall, or use of pg_upgrade, is required". Both handle changes in a storage format. Do they? >>> Breaking the on-the-wire binary I/O representation seems like a >>> nonstarter as well. I wrote that for EUI-48 (MAC-48) values the binary I/O representation can be preserved. The binary format (in the DataRow message) has a length for the column value, which is reflected in PGresAttValue.len in libpq. If the client works with the binary format it must consult the length of the data. But as long as the client works with (and columns have) MAC-48, nothing hurts it and PGresAttValue.len is "6" as now. >> I think the suggestion was to rename macaddr to macaddr6 or similar, >> keeping the existing behavior and the current OID. So existing columns >> would continue to work fine and maintain on-disk compatibility, but any >> newly created columns would become the 8-byte variant. > > ... and would have different I/O behavior from before, possibly breaking > applications that expect "macaddr" to mean what it used to. I'm still > dubious that that's a good idea. 
Only if the new type sends xx:xx:xx:FF:FF:xx:xx:xx instead of the usual (expected) 6 octets. Again, that case in my proposal is similar (in I/O behavior) to "just change 'macaddr' to keep and accept both MAC-48 and MAC-64", but allows use of the "-k" option for pg_upgrade to prevent rewriting possibly huge (for instance, 'log') tables (though users unexpectedly get "macaddr6" after upgrade in their columns and function names, which looks strange enough). > The larger picture here is that we got very little thanks when we squeezed > IPv6 into the pre-existing inet datatype; Without any sarcasm, I thank a lot all the people involved in it, because it saves me (and many other people) from having to distinguish ipv4 and ipv6 at the app level. I write apps and just save the remote address of clients to an "inet" column named "remote_ip" without thinking "what if we start serving clients via ipv6?"; or have a column named "allowed_ip" with IPs or subnets and just save a client's IPv4 or IPv6 as a white list (and use "allowed_ip >>= $1"). It just works. > there's a large number of people > who just said "no thanks" and started using the add-on ip4r type instead. I found a repository[1] at github. From the description it is understandable why people used ip4r back then (in 2005). The reason "Furthermore, they are variable length types (to support ipv6) with non-trivial overheads" is mentioned as the last in its README. When you deal with IPv4 in 99.999% of cases, storing it in TOAST tables leads to a big penalty, but the new version of macaddr is not so wide, so it does not lead to a similar slowdown (it will be stored in place). > So I'm not sure why we want to complicate our lives in order to make > macaddr follow the same path. 
Because according to the Wiki[3] MAC addresses now "are formed according to the rules of one of three numbering name spaces ...: MAC-48, EUI-48, and EUI-64.", so IEEE extended the range of allowed values from 48 to 64 bits, and since Postgres claims to support "mac addresses", I (as a developer who still uses PG as a primary database) expect support for any kind of mac address, not a limited one. I expect it to just work. I dread to imagine what I would have to do if I had a column of type "macaddr" and unexpectedly had to deal with input of EUI-64 type. Add a new column or change the column's type? In the first case, what to do with stored procedures? Duplicate an input parameter to pass the new column of macaddr8 (if macaddr was passed before)? Duplicate the stored procedure? I would also have to support two columns at the application level. Why? I just want to store it in the DB, work with it there and get it back! In the second case (if output is not mapped to MAC-48 when possible) I have the same troubles you wrote about (oid, I/O and text representation at least for output, maybe also for input). Moreover, I still have to rewrite tables, but not when I'm ready for it (at a migration stage from one
Re: [HACKERS] Change of extension name to new name
Haribabu Kommi writes: > As we are planning to change an extension name from one name to another > name because of additional features that are added into this extension, The usual approach to that is just to increase the version number. Why is it necessary to change the name? > I just thought of adding the support of (ALTER EXTENSION name RENAME TO > newname), this can be executed before executing the pg_upgrade to the new > extension name that is available in the > newer version. And if the user forgets to do that before upgrading? Not to mention that the extension is mostly broken the moment its SQL name no longer corresponds to the on-disk control file name. This seems like a non-solution. In general, once you've shipped something, changing its name is a huge pain both for you and your users. Just say no. regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] pgsql: Extend framework from commit 53be0b1ad to report latch waits.
On Thu, Oct 13, 2016 at 8:38 AM, Andres Freund wrote: > On 2016-10-12 11:18:57 -0400, Peter Eisentraut wrote: >> On 10/4/16 11:29 AM, Tom Lane wrote: >> > Robert Haas writes: >> >> Apparently, 'make world' does not build worker_spi. I thought 'make >> >> world' was supposed to build everything? >> > >> > You'd have thunk, yeah. It looks like the issue is that src/Makefile >> > is selective about recursing into certain subdirectories of test/, >> > but mostly not test/ itself. src/test/Makefile naively believes it's >> > in charge, though. Probably that logic ought to get shoved down one >> > level, and then adjusted so that src/test/modules gets built by "all". >> > Or else teach top-level "make world" to do "make all" in src/test/, >> > but that seems like it's just doubling down on confusing interconnections. >> >> We generally don't build test code during make world. > > FWIW, I find that quite annoying. I want to run make world with parallelism so > I can run make check-world afterwards with as little unnecessary > unparallelized work as possible. And since the latter takes just about forever, any > errors visible earlier are good. > > What are we gaining by this behaviour? Personally, I have been trapped recently by the fact that worker_spi does not get built when doing a simple check-world... So +1 for just building it when running check-world. We gain nothing with the current behavior except alarms on the buildfarm. -- Michael
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
On Thu, Oct 13, 2016 at 7:59 AM, Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> Vitaly Burovoy writes: > >>> P.S.: I still think it is a good idea to change storage format, > > >> I'm not sure which part of "no" you didn't understand, but we're > >> not breaking on-disk compatibility of existing macaddr columns. > >> Breaking the on-the-wire binary I/O representation seems like a > >> nonstarter as well. > > > I think the suggestion was to rename macaddr to macaddr6 or similar, > > keeping the existing behavior and the current OID. So existing columns > > would continue to work fine and maintain on-disk compatibility, but any > > newly created columns would become the 8-byte variant. > > ... and would have different I/O behavior from before, possibly breaking > applications that expect "macaddr" to mean what it used to. I'm still > dubious that that's a good idea. > > The larger picture here is that we got very little thanks when we squeezed > IPv6 into the pre-existing inet datatype; there's a large number of people > who just said "no thanks" and started using the add-on ip4r type instead. > So I'm not sure why we want to complicate our lives in order to make > macaddr follow the same path. Thanks for all your opinions regarding the addition of a new datatype to support EUI-64 MAC addresses. I will work on it and come up with a patch. Regards, Hari Babu Fujitsu Australia
[HACKERS] Change of extension name to new name
Hi All, We are planning to change an extension's name because additional features have been added to it and the old name no longer matches what it does. However, this causes a problem for already existing installations: an installation with the old extension name fails when it tries to upgrade to the new version. 9.5 + 'extension_name1' -> 9.6 + 'extension_name2' fails during pg_upgrade when it tries to load the extension and its object dependencies on 9.6. If we keep the same extension name, then it passes. Is there any possibility to change the extension name without causing the pg_upgrade failure, with minimal changes? I just thought of adding support for ALTER EXTENSION name RENAME TO newname; this could be executed before running pg_upgrade, to switch to the new extension name that is available in the newer version. Is there any simpler way to change the extension name? If not, any opinion about adding syntax support to rename an extension? Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] parallel.sgml
> On Thu, Oct 13, 2016 at 10:57 AM, Tatsuo Ishii wrote: >>> While reading parallel.sgml, I met with the following sentence. >>> >>> If this occurs, the leader will execute the portion of the plan >>> between below the Gather node entirely by itself, >>> almost as if the Gather node were not present. >>> >>> Maybe "the portion of the plan between below" should have been >>> "the portion of the plan below"? >> >> Can someone, possibly a native English speaker, please confirm whether this is >> a typo or not? I cannot parse the sentence as it is, but this may be >> because I'm not an English speaker. > > No native speaker here, but it seems to me that "between" could be ripped off. Thanks. Unless there's an objection, I will commit the proposed change. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Non-empty default log_line_prefix
Peter Eisentraut wrote: > On 10/12/16 11:58 AM, Christoph Berg wrote: > > (Yes, the '' default might be fine for syslog, but I don't think > > that's a good argument for sticking with it for default installs. I've > > seen way too many useless log files out there, and at worst we'll have > > syslogs with two timestamps.) > > We'd have to think of a way to sort this out during upgrades. Sure, but we don't have to do that in this patch, do we? Some systems such as Debian copy the postgresql.conf file from the old installation to the new one, and then have specific code to adjust settings that no longer exist in the new version. I'm not a fan of that approach myself, precisely because if PGDG installs new defaults (such as in this case), the sites using that upgrade tool don't benefit. A better approach might be to copy over the non-default settings from the old install to the new. If Debian were to do that then we wouldn't need to do anything here. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] Parallel Index Scans
As of now, the driving table for a parallel query is accessed by a parallel sequential scan, which limits its usage to a certain degree. Parallelising index scans would further increase the usage of parallel query in many more cases. This patch enables parallelism for btree scans. Supporting parallel index scan for other index types like hash, gist, spgist can be done as separate patches.

The basic idea is quite similar to parallel heap scans: each worker (including the leader whenever possible) will scan a block and then get the next block that needs to be scanned. The parallelism is implemented at the leaf level of the btree. The first worker to start a btree scan will descend to a leaf, and the others will wait until the first worker has reached the leaf. The first worker, after reading the leaf block, will set the next block to be read, wake the first worker waiting to scan the next block, and proceed with scanning tuples from the block it has read; similarly, each worker after reading a block sets the next block to be read and wakes up the first waiting worker. This is achieved by using the condition variable patch [1] proposed by Robert. Parallelism is supported for both forward and backward scans. The optimizer will choose parallelism based on the number of pages in the index relation, and the cpu cost for evaluating the rows is divided equally among workers.

The Index Scan node is made parallel-aware and can be used beneath Gather as shown below.

Current plan for index scans:

 Index Scan using idx2 on test  (cost=0.42..7378.96 rows=2433 width=29)
   Index Cond: (c < 10)

Parallel version of the plan:

 Gather  (cost=1000.42..1243.40 rows=2433 width=29)
   Workers Planned: 1
   ->  Parallel Index Scan using idx2 on test  (cost=0.42..0.10 rows=1431 width=29)
         Index Cond: (c < 10)

Parallel index scans can be used in parallelising aggregate queries as well. 
For example, given a query like:

 select count(*) from t1 where c1 > 1000 and c1 < 1100 and c2='aaa' Group By c2;

the below forms of parallel plans are possible:

 Finalize HashAggregate
   Group Key: c2
   ->  Gather
         Workers Planned: 1
         ->  Partial HashAggregate
               Group Key: c2
               ->  Parallel Index Scan using idx_t1_partial on t1
                     Index Cond: ((c1 > 1000) AND (c1 < 1100))
                     Filter: (c2 = 'aaa'::bpchar)

OR

 Finalize GroupAggregate
   Group Key: c2
   ->  Sort
         ->  Gather
               Workers Planned: 1
               ->  Partial GroupAggregate
                     Group Key: c2
                     ->  Parallel Index Scan using idx_t1_partial on t1
                           Index Cond: ((c1 > 1000) AND (c1 < 1100))
                           Filter: (c2 = 'aaa'::bpchar)

In the second plan (GroupAggregate), the Sort + Gather step would be replaced with GatherMerge, once we have a GatherMerge node as proposed by Rushabh [2]. Note that the above examples are just meant to explain the usage of parallel index scan; actual plans will be selected based on cost.

Performance tests

This test has been performed on a community machine (hydra, POWER-7). Initialize pgbench with scale factor 3000 (./pgbench -i -s 3000 postgres), then count the rows in pgbench_accounts based on values of aid and bid.

Serial plan:

set max_parallel_workers_per_gather=0;
postgres=# explain analyze select count(aid) from pgbench_accounts where aid > 1000 and aid < 9000 and bid > 800 and bid < 900;
 Aggregate  (cost=4714590.52..4714590.53 rows=1 width=8) (actual time=35684.425..35684.425 rows=1 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.57..4707458.12 rows=2852961 width=4) (actual time=29210.743..34385.271 rows=990 loops=1)
         Index Cond: ((aid > 1000) AND (aid < 9000))
         Filter: ((bid > 800) AND (bid < 900))
         Rows Removed by Filter: 80098999
 Planning time: 0.183 ms
 Execution time: 35684.459 ms
(7 rows)

Parallel plan:

set max_parallel_workers_per_gather=2;
postgres=# explain analyze select count(aid) from pgbench_accounts where aid > 1000 and aid < 9000 and bid > 800 and bid < 900;
 Finalize Aggregate  (cost=3924773.13..3924773.14 rows=1 width=8) (actual time=15033.105..15033.105 rows=1 loops=1)
   ->  Gather  (cost=3924772.92..3924773.12 rows=2 width=8) (actual time=15032.986..15033.093 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
Re: [HACKERS] parallel.sgml
On Thu, Oct 13, 2016 at 10:57 AM, Tatsuo Ishii wrote: >> While reading parallel.sgml, I met with the following sentence. >> >> If this occurs, the leader will execute the portion of the plan >> between below the Gather node entirely by itself, >> almost as if the Gather node were not present. >> >> Maybe "the portion of the plan between below" should have been >> "the portion of the plan below"? > > Can someone, possibly a native English speaker, please confirm whether this is > a typo or not? I cannot parse the sentence as it is, but this may be > because I'm not an English speaker. No native speaker here, but it seems to me that "between" could be ripped off. -- Michael
Re: [HACKERS] Non-empty default log_line_prefix
On 10/12/16 11:58 AM, Christoph Berg wrote: > (Yes, the '' default might be fine for syslog, but I don't think > that's a good argument for sticking with it for default installs. I've > seen way too many useless log files out there, and at worst we'll have > syslogs with two timestamps.) We'd have to think of a way to sort this out during upgrades. Documentation might be enough. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Non-empty default log_line_prefix
On 10/12/16 11:40 AM, Tom Lane wrote: > There would be some value in the complexity you're thinking of for > installations that log to multiple targets concurrently, but really, > who does that? I see that a lot. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/12/2016 08:55 PM, Robert Haas wrote: > On Wed, Oct 12, 2016 at 3:21 AM, Dilip Kumar wrote: >> I think at higher client counts, from 96 onwards, contention >> on CLogControlLock is clearly visible and is completely solved >> with the group lock patch. >> >> And at lower client counts (32, 64) contention on CLogControlLock is not >> significant, hence we cannot see any gain with the group lock patch. >> (though we can see that some contention on CLogControlLock is reduced at 64 >> clients.) > > I agree with these conclusions. I had a chance to talk with Andres > this morning at Postgres Vision and based on that conversation I'd > like to suggest a couple of additional tests: > > 1. Repeat this test on x86. In particular, I think you should test on > the EnterpriseDB server cthulhu, which is an 8-socket x86 server. > > 2. Repeat this test with a mixed read-write workload, like -b > tpcb-like@1 -b select-only@9 FWIW, I'm already running similar benchmarks on an x86 machine with 72 cores (144 with HT). It's "just" a 4-socket system, but the results I got so far seem quite interesting. The tooling and results (pushed incrementally) are available here: https://bitbucket.org/tvondra/hp05-results/overview The tooling is completely automated, and it also collects various stats, for example the wait events. So perhaps we could simply run it on cthulhu and get comparable results, and also more thorough data sets than just snippets posted to the list? There's also a bunch of reports for the 5 already completed runs: - dilip-300-logged-sync - dilip-300-unlogged-sync - pgbench-300-logged-sync-skip - pgbench-300-unlogged-sync-noskip - pgbench-300-unlogged-sync-skip The name identifies the workload type, scale and whether the tables are wal-logged (for pgbench the "skip" means "-N" while "noskip" does regular pgbench). 
For example, "reports/wait-events-count-patches.txt" compares the wait event stats with different patches applied (and master): https://bitbucket.org/tvondra/hp05-results/src/506d0bee9e6557b015a31d72f6c3506e3f198c17/reports/wait-events-count-patches.txt?at=master=file-view-default and average tps (from 3 runs, 5 minutes each): https://bitbucket.org/tvondra/hp05-results/src/506d0bee9e6557b015a31d72f6c3506e3f198c17/reports/tps-avg-patches.txt?at=master=file-view-default

There are certainly interesting bits. For example, while the "logged" case is dominated by WALWriteLock for most client counts, for large client counts that's no longer true. Consider for example the dilip-300-logged-sync results with 216 clients:

     wait_event        |  master | gran_lock | no_cont_lock | group_upd
 ----------------------+---------+-----------+--------------+-----------
 CLogControlLock       |  624566 |    474261 |       458599 |    225338
 WALWriteLock          |  431106 |    623142 |       619596 |    699224
                       |  331542 |    358220 |       371393 |    537076
 buffer_content        |  261308 |    134764 |       138664 |    102057
 ClientRead            |   59826 |    100883 |       103609 |    118379
 transactionid         |   26966 |     23155 |        23815 |     31700
 ProcArrayLock         |    3967 |      3852 |         4070 |      4576
 wal_insert            |    3948 |     10430 |         9513 |     12079
 clog                  |    1710 |      4006 |         2443 |       925
 XidGenLock            |    1689 |      3785 |         4229 |      3539
 tuple                 |     965 |       617 |          655 |       840
 lock_manager          |     300 |       571 |          619 |       802
 WALBufMappingLock     |     168 |       140 |          158 |       147
 SubtransControlLock   |      60 |       115 |          124 |       105

Clearly, CLOG is an issue here, and it's (slightly) improved by all the patches (group_update performing the best). 
And with 288 clients (which is 2x the number of virtual cores in the machine, so not entirely crazy) you get this:

     wait_event        |  master | gran_lock | no_cont_lock | group_upd
 ----------------------+---------+-----------+--------------+-----------
 CLogControlLock       |  901670 |    736822 |       728823 |    398111
 buffer_content        |  492637 |    318129 |       319251 |    270416
 WALWriteLock          |  414371 |    593804 |       589809 |    656613
                       |  380344 |    452936 |       470178 |    745790
 ClientRead            |   60261 |    111367 |       111391 |    126151
 transactionid         |   43627 |     34585 |        35464 |     48679
 wal_insert            |    5423 |     29323 |        25898 |     30191
 ProcArrayLock         |    4379 |      3918 |         4006 |      4582
 clog                  |    2952 |      9135 |         5304 |      2514
 XidGenLock            |    2182 |      9488 |         8894 |      8595
 tuple                 |    2176 |      1288 |         1409 |      1821
 lock_manager          |     323 |       797 |          827 |      1006
 WALBufMappingLock     |     124 |       124 |
Re: [HACKERS] parallel.sgml
> While reading parallel.sgml, I met with the following sentence. > > If this occurs, the leader will execute the portion of the plan > between below the Gather node entirely by itself, > almost as if the Gather node were not present. > > Maybe "the portion of the plan between below" should have been > "the portion of the plan below"? Can someone, possibly a native English speaker, please confirm whether this is a typo or not? I cannot parse the sentence as it is, but this may be because I'm not an English speaker. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Proposal: scan key push down to heap [WIP]
On Thu, Oct 13, 2016 at 6:05 AM, Robert Haas wrote: > I seriously doubt that this should EVER be supported for anything > other than "var op const", and even then only for very simple > operators. Yes, with the existing key push down infrastructure only "var op const", but if we extend that I think we can cover many other simple expressions, i.e. unary operators (ISNULL, NOTNULL) and other simple expressions: "Var op Const", "Var op Var", "Var op SimpleExpr(Var, Const)" .. We cannot push down function expressions because we cannot guarantee that they are safe, but can we push down built-in functions? (I think we can identify their safety based on their data type, but I am not too sure about this point.) > For example, texteq() is probably too complicated, because > it might de-TOAST, and that might involve doing a LOT of work while > holding the buffer content lock. But int4eq() would be OK. > > Part of the trick if we want to make this work is going to be figuring > out how we'll identify which operators are safe. Yes, I agree that's the difficult part. Can't we process the full qual list and decide whether the operators are safe based on their datatype? What I mean to say is, instead of checking the safety of each operator like texteq(), text_le()..., we can directly discard any operator involving such data types. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
[HACKERS] Typo in foreign.h
Attached fixes a minor typo: s/Thes/These/g Thanks, Amit

diff --git a/src/include/foreign/foreign.h b/src/include/foreign/foreign.h
index 5dc2c90..143566a 100644
--- a/src/include/foreign/foreign.h
+++ b/src/include/foreign/foreign.h
@@ -23,7 +23,7 @@
 
 /*
  * Generic option types for validation.
- * NB! Thes are treated as flags, so use only powers of two here.
+ * NB! These are treated as flags, so use only powers of two here.
  */
 typedef enum
 {
Re: [HACKERS] Proposal: scan key push down to heap [WIP]
On Tue, Oct 11, 2016 at 4:57 AM, Dilip Kumar wrote: > This patch will extract the expression from the qual and prepare the scan > keys. Currently in the POC version I have only supported "var OP const" > type quals, because these types of quals can be pushed down using the > existing framework. > > The purpose of this work is to first implement the basic functionality and > analyze the results. If the results are good then we can extend it for > other types of expressions. > > However, in future when we try to expand the support for complex > expressions, we need to be very careful while selecting > pushable expressions. It should not happen that we push something very > complex, which may cause contention with other write operations (as > HeapKeyTest is done under a page lock). I seriously doubt that this should EVER be supported for anything other than "var op const", and even then only for very simple operators. For example, texteq() is probably too complicated, because it might de-TOAST, and that might involve doing a LOT of work while holding the buffer content lock. But int4eq() would be OK. Part of the trick if we want to make this work is going to be figuring out how we'll identify which operators are safe. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Is it time to kill support for very old servers?
On Tue, Oct 11, 2016 at 11:34 AM, Heikki Linnakangas wrote: > On 10/11/2016 08:23 PM, Tom Lane wrote: >> Not sure where to go from here, but the idea of dropping V2 support >> is seeming attractive again. Or we could just continue the policy >> of benign neglect. > > Let's drop it. I agree. The evidence in Tom's latest email suggests that getting this to a point where it can be tested will be more work than anybody's likely to want to put in, and the return on investment doesn't seem likely to be good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] another typo in parallel.sgml
On Mon, Oct 10, 2016 at 5:40 PM, Tatsuo Ishii wrote: > I think I found a typo in parallel.sgml. I think so, too. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Remove vacuum_defer_cleanup_age
On Sun, Oct 9, 2016 at 9:51 PM, Josh Berkus wrote: > Given that hot_standby_feedback is pretty bulletproof now, and a lot of > the work in reducing replay conflicts, I think the utility of > vacuum_defer_cleanup_age is at an end. I really meant to submit a patch > to remove it in 9.6, but it got away from me. > > Any objections to removing the option in 10? I'm not sure I see the point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Rename max_parallel_degree?
On Wed, Oct 12, 2016 at 8:13 AM, Peter Eisentraut wrote: > On 10/4/16 10:16 AM, Julien Rouhaud wrote: >> Please find attached v9 of the patch, adding the parallel worker class >> and changing max_worker_processes default to 16 and max_parallel_workers >> to 8. I also added Amit's explanation for the need of a write barrier >> in ForgetBackgroundWorker(). > > This approach totally messes up the decoupling between the background > worker facilities and consumers of those facilities. Having dozens of > lines of code in bgworker.c that does the accounting and resource > checking on behalf of parallel.c looks very suspicious. Once we add > logical replication workers or whatever, we'll be tempted to add even > more stuff in there and it will become a mess. I attach a new version of the patch that I've been hacking on in my "spare time", which reduces the footprint in bgworker.c quite a bit. I don't think it's wrong that the handling is done there, though. The process that is registering the background worker is well-placed to check whether there are already too many, and if it does not then the slot is consumed at least temporarily even if it busts the cap. On the flip side, the postmaster is the only process that is well-placed to know when a background worker terminates. The worker process itself can't be made responsible for it, as you suggest below, because it may never even start up in the first place (e.g. fork() returns EAGAIN). And the registering process can't be made responsible, because it might die before the worker. > I think we should think of a way where we can pass the required > information during background worker setup, like a pointer to the > resource-limiting GUC variable. I don't think this can work, per the above. > For style, I would also prefer the "class" to be a separate struct field > from the flags. 
I think that it makes sense to keep the class as part of the flags because then a large amount of the stuff in this patch that is concerned with propagating the flags around becomes unnecessary. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index e826c19..8e53edc 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1984,7 +1984,7 @@ include_dir 'conf.d' Sets the maximum number of background processes that the system can support. This parameter can only be set at server start. The - default is 8. + default is 16. @@ -2006,8 +2006,9 @@ include_dir 'conf.d' Sets the maximum number of workers that can be started by a single Gather node. Parallel workers are taken from the pool of processes established by - . Note that the requested - number of workers may not actually be available at run time. If this + , limited by + . Note that the requested + number of workers may not actually be available at runtime. If this occurs, the plan will run with fewer workers than expected, which may be inefficient. The default value is 2. Setting this value to 0 disables parallel query execution. @@ -2036,6 +2037,22 @@ include_dir 'conf.d' + + max_parallel_workers (integer) + +max_parallel_workers configuration parameter + + + + + Sets the maximum number of workers that the system can support for + parallel queries. The default value is 8. When increasing or + decreasing this value, consider also adjusting + . 
+ + + + backend_flush_after (integer) diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c index 59dc394..1c32fcd 100644 --- a/src/backend/access/transam/parallel.c +++ b/src/backend/access/transam/parallel.c @@ -454,7 +454,8 @@ LaunchParallelWorkers(ParallelContext *pcxt) snprintf(worker.bgw_name, BGW_MAXLEN, "parallel worker for PID %d", MyProcPid); worker.bgw_flags = - BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION; + BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION + | BGWORKER_CLASS_PARALLEL; worker.bgw_start_time = BgWorkerStart_ConsistentState; worker.bgw_restart_time = BGW_NEVER_RESTART; worker.bgw_main = ParallelWorkerMain; diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c index 3a2929a..75f9a55 100644 --- a/src/backend/postmaster/bgworker.c +++ b/src/backend/postmaster/bgworker.c @@ -80,9 +80,22 @@ typedef struct BackgroundWorkerSlot BackgroundWorker worker; } BackgroundWorkerSlot; +/* + * In order to limit the total number of parallel workers (according to + * max_parallel_workers GUC), we maintain the number of active parallel + * workers. Since the postmaster cannot take
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
On Thu, Oct 13, 2016 at 7:31 AM, Vitaly Burovoy wrote: > Haribabu Kommi, have you read enough about EUI-64? > Your function "macaddr64_trunc" sets 4 lower bytes as 0 whereas > (according to Wikipedia, but I can look for a standard) 3 bytes > still define an OUI (organizationally unique identifier), so > truncation should be done for 5, not 4 lower octets. > > The macro "hibits" should be 3 octets long, not 4; "lobits" --- 5 bytes, > not 4. > Otherwise your comparisons fail. > > What document have you used to write the patch? Are short form formats > correct in macaddr64_in? > Yes, OUI is 24 bits. I just created a prototype patch to check community opinion on it. I checked the following links [1], [2] for the development of macaddr8. But the patch is not correct for all the cases; it is just a prototype to see whether it accepts an 8-byte MAC address or not. [1] - http://standards.ieee.org/develop/regauth/tut/eui64.pdf [2] - https://en.wikipedia.org/wiki/MAC_address Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] Re: [COMMITTERS] pgsql: Extend framework from commit 53be0b1ad to report latch waits.
On 2016-10-12 11:18:57 -0400, Peter Eisentraut wrote: > On 10/4/16 11:29 AM, Tom Lane wrote: > > Robert Haas writes: > >> Apparently, 'make world' does not build worker_spi. I thought 'make > >> world' was supposed to build everything? > > > > You'd have thunk, yeah. It looks like the issue is that src/Makefile > > is selective about recursing into certain subdirectories of test/, > > but mostly not test/ itself. src/test/Makefile naively believes it's > > in charge, though. Probably that logic ought to get shoved down one > > level, and then adjusted so that src/test/modules gets built by "all". > > Or else teach top-level "make world" to do "make all" in src/test/, > > but that seems like it's just doubling down on confusing interconnections. > > We generally don't build test code during make world. FWIW, I find that quite annoying. I want to run make world with parallelism so I can run make check-world afterwards with as little unnecessary unparallelized work as possible. And since the latter takes just about forever, any errors visible earlier are good. What are we gaining by this behaviour? Andres
Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size
On Wed, Oct 12, 2016 at 5:18 PM, Tom Lane wrote: > Merlin Moncure writes: >> ISTM all this silliness is pretty much unique to linux anyways. >> Instead of reading the filesystem, what about doing test map and test >> unmap? > > And if mmap succeeds and munmap fails, you'll recover how exactly? > > If this API were less badly designed, we'd not be having this problem > in the first place ... I was thinking to 'guess' in a ^2 loop in the case the obvious unmap didn't work, finally aborting if no guess worked. :-). merlin
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
On Wed, Oct 12, 2016 at 12:24 PM, Tom Lane wrote: > Robert Haas writes: >> On Wed, Oct 12, 2016 at 11:54 AM, Greg Stark wrote: >>> Fwiw I was considering proposing committing some patches for these old >>> releases to make them easier to build. I would suggest creating a tag >>> for this stable legacy version and limiting the commits to just: >>> >>> 1) Disabling warnings >>> 2) Fixing bugs in the configure scripts that occur on more recent >>> systems (version number parsing etc) >>> 3) Backporting things like the variable-length array code that prevents >>> building >>> 4) Adding compiler options like -fwrapv > >> I'd support that. The reason why we remove branches from support is >> so that we don't have to back-patch things to 10 or 15 branches when >> we have a bug fix. But that doesn't mean that we should prohibit all >> commits to those branches for any reason, and making it easier to test >> backward-compatibility when we want to do that seems like a good >> reason. > > Meh, I think that this will involve a great deal more work than it's > worth. We deal with moving-target platforms *all the time*. New compiler > optimizations break things, libraries such as OpenSSL whack things around, > other libraries such as uuid-ossp stop getting maintained and become > unusable on new platforms, bison decides to get stickier about comma > placement, yadda yadda yadda. How much of that work are we going to > back-port to dead branches? And to what extent is such effort going to be > self-defeating because the branch no longer behaves as it did back in the > day? > > If Greg wants to do this kind of work, he's got a commit bit. My position > is that we have a limited support lifespan for a reason, and I'm not going > to spend time on updating dead branches forever. To me, it's more useful > to test them in place on contemporary platforms. We've certainly got > enough old platforms laying about in the buildfarm and elsewhere.
I agree that it shouldn't be an expectation that committers in general will do this, whether you or me or anyone else. However, I think that if Greg or some other committer wants to volunteer their own time to do some of it, that is fine. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size
On 2016-10-12 16:33:38 -0400, Tom Lane wrote: > Andres Freund writes: > > On October 12, 2016 1:25:54 PM PDT, Tom Lane wrote: > >> A little bit of research suggests that on Linux the thing to do would > >> be to get the actual default hugepage size by reading /proc/meminfo and > >> looking for a line like "Hugepagesize: 2048 kB". > > > We had that, but Heikki ripped it out when merging... I think you're > > supposed to use /sys to get the available size. > > According to > https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt > looking into /proc/meminfo is the longer-standing API and thus is > likely to work on more kernel versions. MAP_HUGETLB, which we rely on for hugepage support, is newer than the introduction of the /sys stuff. > Also, if you look into /sys then you are going to see multiple > possible values and it's not clear how to choose the right one. That's a fair point. It'd probably be good to use the largest we can, bounded by a percentage of max waste or such. But that's likely something for another day. Regards, Andres
Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size
Merlin Moncure writes: > ISTM all this silliness is pretty much unique to linux anyways. > Instead of reading the filesystem, what about doing test map and test > unmap? And if mmap succeeds and munmap fails, you'll recover how exactly? If this API were less badly designed, we'd not be having this problem in the first place ... regards, tom lane
Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size
On Wed, Oct 12, 2016 at 5:10 PM, Tom Lane wrote: > Alvaro Herrera writes: >> Tom Lane wrote: >>> According to >>> https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt >>> looking into /proc/meminfo is the longer-standing API and thus is >>> likely to work on more kernel versions. Also, if you look into >>> /sys then you are going to see multiple possible values and it's >>> not clear how to choose the right one. > >> I'm not sure that this is the best rationale. In my system there are >> 2MB and 1GB huge page sizes; in systems with lots of memory (let's say 8 >> GB of shared memory is requested) it seems a clear winner to allocate 8 >> 1GB hugepages rather than 4096 2MB hugepages because the page table is so much >> smaller. The /proc interface only shows the 2MB page size, so if we go >> that route we'd not be getting the full benefit of the feature. > > And you'll tell mmap() which one to do how exactly? I haven't found > anything explaining how applications get to choose which page size applies > to their request. The kernel document says that /proc/meminfo reflects > the "default" size, and I'd assume that that's what we'll get from mmap. hm. for (recent) linux, I see: MAP_HUGE_2MB, MAP_HUGE_1GB (since Linux 3.8) Used in conjunction with MAP_HUGETLB to select alternative hugetlb page sizes (respectively, 2 MB and 1 GB) on systems that support multiple hugetlb page sizes. More generally, the desired huge page size can be configured by encoding the base-2 logarithm of the desired page size in the six bits at the offset MAP_HUGE_SHIFT. (A value of zero in this bit field provides the default huge page size; the default huge page size can be discovered via the Hugepagesize field exposed by /proc/meminfo.)
Thus, the above two constants are defined as: #define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT) #define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT) The range of huge page sizes that are supported by the system can be discovered by listing the subdirectories in /sys/kernel/mm/hugepages. via: http://man7.org/linux/man-pages/man2/mmap.2.html#NOTES ISTM all this silliness is pretty much unique to linux anyways. Instead of reading the filesystem, what about doing test map and test unmap? We could zero in on the page size for default I think with some probing of known possible values. merlin
Re: [HACKERS] logical replication connection information management
Peter Eisentraut wrote: > An idea is to make the foreign server concept more general and allow > it to exist independently of a foreign data wrapper. Then create more > specific syntax like > > CREATE SERVER node1 FOR SUBSCRIPTION OPTIONS ( ... ); > > or > > CREATE SUBSCRIPTION SERVER ... > > This would work a bit like pg_constraint, which can be tied to a table > or a type or even nothing (for the hypothetical assertions feature). I was with you until you mentioned pg_constraint, because as I understand it we're not entirely happy with that one catalog. We've talked about splitting it up into two catalogs for table and domain constraints (I don't think we've considered assertions). Doing this would cause us to drop the NOT NULL from pg_foreign_server.srvfdw. However, while this sounds bad I think it's not *too* bad: it's not as heavily used as the corresponding pg_constraint columns would be. In other words, this raised some red flags with me initially but I think it's okay. > We'd need a separate mechanism for controlling which user has the right > to create such subscription servers, but it might be acceptable at the > beginning to just require superuserness. We'll need to have a better answer to this sooner rather than later. Perhaps a new predefined grantable privilege? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size
Alvaro Herrera writes: > Tom Lane wrote: >> According to >> https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt >> looking into /proc/meminfo is the longer-standing API and thus is >> likely to work on more kernel versions. Also, if you look into >> /sys then you are going to see multiple possible values and it's >> not clear how to choose the right one. > I'm not sure that this is the best rationale. In my system there are > 2MB and 1GB huge page sizes; in systems with lots of memory (let's say 8 > GB of shared memory is requested) it seems a clear winner to allocate 8 > 1GB hugepages rather than 4096 2MB hugepages because the page table is so much > smaller. The /proc interface only shows the 2MB page size, so if we go > that route we'd not be getting the full benefit of the feature. And you'll tell mmap() which one to do how exactly? I haven't found anything explaining how applications get to choose which page size applies to their request. The kernel document says that /proc/meminfo reflects the "default" size, and I'd assume that that's what we'll get from mmap. regards, tom lane
Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1
I wrote: > Well, the buildfarm doesn't like that even a little bit. It seems that > the MSVC compiler does not like seeing both "extern Datum foo(...)" and > "extern PGDLLEXPORT Datum foo(...)", so anything that had an extern in > a .h file is failing. There is also quite a bit of phase-of-the-moon > behavior in here, because in some cases some functions are raising errors > and others that look entirely the same are not. I take back the latter comment --- I was comparing HEAD source code against errors reported on back branches, and at least some of the discrepancies are explained by additions/removals of separate "extern" declarations. But still, if we want to do this we're going to need to put PGDLLEXPORT in every V1 function extern declaration. We might be able to remove some of the externs as unnecessary instead, but any way you slice it it's going to be messy. For the moment I've reverted the change. regards, tom lane
[HACKERS] Add support for SRF and returning composites to pl/tcl
Attached is a patch that adds support for SRFs and returning composites from pl/tcl. This work was sponsored by Flight Aware. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461 diff --git a/doc/src/sgml/pltcl.sgml b/doc/src/sgml/pltcl.sgml index 805cc89..1c185cb 100644 --- a/doc/src/sgml/pltcl.sgml +++ b/doc/src/sgml/pltcl.sgml @@ -173,8 +173,54 @@ $$ LANGUAGE pltcl; - There is currently no support for returning a composite-type - result value, nor for returning sets. + PL/Tcl functions can return a record containing multiple output + parameters. The function's Tcl code should return a list of + key-value pairs matching the output parameters. + + +CREATE FUNCTION square_cube(in int, out squared int, out cubed int) AS $$ +return [list squared [expr {$1 * $1}] cubed [expr {$1 * $1 * $1}]] +$$ LANGUAGE 'pltcl'; + + + + + Sets can be returned as a table type. The Tcl code should successively + call return_next with an argument consisting of a Tcl + list of key-value pairs. + + +CREATE OR REPLACE FUNCTION squared_srf(int,int) RETURNS TABLE (x int, y int) AS $$ +for {set i $1} {$i < $2} {incr i} { +return_next [list x $i y [expr {$i * $i}]] +} +$$ LANGUAGE 'pltcl'; + + + + + Any columns that are defined in the composite return type but absent from + a list of key-value pairs passed to return_next are implicitly + null in the corresponding row. PL/Tcl will generate a Tcl error when a + column name in the key-value list is not one of the defined columns. + + + + Similarly, functions can be defined as returning SETOF + with a user-defined data type. + + + + PL/Tcl functions can also use return_next to return a set of + a scalar data type. 
+ + +CREATE OR REPLACE FUNCTION sequence(int,int) RETURNS SETOF int AS $$ +for {set i $1} {$i < $2} {incr i} { +return_next $i +} +$$ language 'pltcl'; + @@ -197,8 +243,10 @@ $$ LANGUAGE pltcl; displayed by a SELECT statement). Conversely, the return command will accept any string that is acceptable input format for - the function's declared return type. So, within the PL/Tcl function, - all values are just text strings. + the function's declared return type(s). Likewise when producing a + set using return_next, values are converted to their + native database data types. (A Tcl error is generated whenever this + conversion fails.) diff --git a/src/pl/tcl/expected/pltcl_queries.out b/src/pl/tcl/expected/pltcl_queries.out index 6cb1fdb..7a7b029 100644 --- a/src/pl/tcl/expected/pltcl_queries.out +++ b/src/pl/tcl/expected/pltcl_queries.out @@ -303,3 +303,44 @@ select tcl_lastoid('t2') > 0; t (1 row) +-- test compound return +select * from tcl_test_cube_squared(5); + squared | cubed +-+--- + 25 | 125 +(1 row) + +CREATE FUNCTION bad_record(OUT a text , OUT b text) AS $$return [list cow]$$ LANGUAGE pltcl; +SELECT bad_record(); +ERROR: list must have even number of elements +CREATE FUNCTION bad_field(OUT a text , OUT b text) AS $$return [list cow 1 a 2 b 3]$$ LANGUAGE pltcl; +SELECT bad_field(); +ERROR: Tcl list contains nonexistent column "cow" +CREATE OR REPLACE FUNCTION tcl_error(OUT a int, OUT b int) AS $$return {$$ LANGUAGE pltcl; +SELECT tcl_error(); +ERROR: missing close-brace +-- test SRF +select * from tcl_test_squared_rows(0,5); + x | y +---+ + 0 | 0 + 1 | 1 + 2 | 4 + 3 | 9 + 4 | 16 +(5 rows) + +CREATE OR REPLACE FUNCTION non_srf() RETURNS int AS $$return_next 1$$ LANGUAGE pltcl; +select non_srf(); +ERROR: cannot use return_next in a non-set-returning function +-- test setof returns +select * from tcl_test_sequence(0,5) as a; + a +--- + 0 + 1 + 2 + 3 + 4 +(5 rows) + diff --git a/src/pl/tcl/expected/pltcl_setup.out b/src/pl/tcl/expected/pltcl_setup.out index 
e65e9e3..5332187 100644 --- a/src/pl/tcl/expected/pltcl_setup.out +++ b/src/pl/tcl/expected/pltcl_setup.out @@ -569,6 +569,19 @@ create function tcl_error_handling_test() returns text as $$ return "no error" } $$ language pltcl; +CREATE OR REPLACE FUNCTION tcl_test_cube_squared(in int, out squared int, out cubed int) AS $$ +return [list squared [expr {$1 * $1}] cubed [expr {$1 * $1 * $1}]] +$$ LANGUAGE 'pltcl'; +CREATE OR REPLACE FUNCTION tcl_test_squared_rows(int,int) RETURNS TABLE (x int, y int) AS $$ +for {set i $1} {$i < $2} {incr i} { +return_next [list y [expr {$i * $i}] x $i] +} +$$ LANGUAGE 'pltcl'; +CREATE OR REPLACE FUNCTION tcl_test_sequence(int,int) RETURNS SETOF int AS $$ +for {set i $1} {$i < $2} {incr i} { +return_next $i +} +$$ language 'pltcl'; select tcl_error_handling_test(); tcl_error_handling_test
Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size
Tom Lane wrote: > Andres Freund writes: > > On October 12, 2016 1:25:54 PM PDT, Tom Lane wrote: > >> A little bit of research suggests that on Linux the thing to do would > >> be to get the actual default hugepage size by reading /proc/meminfo and > >> looking for a line like "Hugepagesize: 2048 kB". > > > We had that, but Heikki ripped it out when merging... I think you're > > supposed to use /sys to get the available size. > > According to > https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt > looking into /proc/meminfo is the longer-standing API and thus is > likely to work on more kernel versions. Also, if you look into > /sys then you are going to see multiple possible values and it's > not clear how to choose the right one. I'm not sure that this is the best rationale. In my system there are 2MB and 1GB huge page sizes; in systems with lots of memory (let's say 8 GB of shared memory is requested) it seems a clear winner to allocate 8 1GB hugepages rather than 4096 2MB hugepages because the page table is so much smaller. The /proc interface only shows the 2MB page size, so if we go that route we'd not be getting the full benefit of the feature. We could just fall back to reading /proc if we cannot find the /sys stuff; that makes it continue to work on older kernels. Regarding choosing one among several size choices, I'd think the larger huge page size is always better if the request size is at least that large, otherwise fall back to the next smaller one. (This could stand some actual research). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] FTS Configuration option
Thank you for sharing your thoughts! 2016-10-12 15:08 GMT+03:00 Emre Hasegeli: > However then the stemmer doesn't do a good job on those words, because > the changed characters are important for the language. What I really > needed was something like this: > >> ALTER TEXT SEARCH CONFIGURATION turkish >> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, >> hword_part >> WITH (fix_mistyped_characters AND (turkish_hunspell OR turkish_stem) AND >> unaccent); Your syntax looks more flexible and prettier than the JOIN option. As I understand it, there are three steps here. At each step a dictionary returns a lexeme and passes it to the next dictionary. If a dictionary returns NULL then processing stops. With such syntax we also don't need the TSL_FILTER flag for lexemes. At present the unaccent extension sets this flag to pass a lexeme to the next dictionary. This flag is used by the text-search parser. It looks like a hard-coded solution; users can't change this behaviour. Maybe it would also be better to use -> instead of AND? AND would have another behaviour. I could create the following configuration: => ALTER TEXT SEARCH CONFIGURATION multi_conf ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part WITH (german_ispell AND english_ispell) OR simple; which would return both german_ispell and english_ispell results. But I'm not sure that this is a good solution. Of course, if this syntax is implemented, the old syntax with commas should also be maintained. -- Artur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1
I wrote: > Albe Laurenz writes: >> Tom Lane wrote: >>> I'm okay with adding PGDLLEXPORT to the extern, but we should update >>> that comment to note that it's not necessary with any of our standard >>> Windows build processes. (For that matter, the comment fails to explain >>> why this macro is providing an extern for the base function at all...) >> Here is a patch for that, including an attempt to improve the comment. > Pushed with some further twiddling of the comment. Well, the buildfarm doesn't like that even a little bit. It seems that the MSVC compiler does not like seeing both "extern Datum foo(...)" and "extern PGDLLEXPORT Datum foo(...)", so anything that had an extern in a .h file is failing. There is also quite a bit of phase-of-the-moon behavior in here, because in some cases some functions are raising errors and others that look entirely the same are not. We could plaster all those declarations with PGDLLEXPORT, but I'm rather considerably tempted to go back to the way things were, instead. I do not like patches that end up with Microsoft-droppings everywhere. regards, tom lane
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
On 10/12/16, Tom Lane wrote: > Vitaly Burovoy writes: >> I'm sorry for the offtopic, but does anyone know a reason why a >> condition in mac.c > >>> if ((a < 0) || (a > 255) || (b < 0) || (b > 255) || >>> (c < 0) || (c > 255) || (d < 0) || (d > 255) || >>> (e < 0) || (e > 255) || (f < 0) || (f > 255)) > >> can not be rewritten as: > >>> if (((a | b | c | d | e | f) < 0) || >>> ((a | b | c | d | e | f) > 255)) > > Well, it's ugly and > it adds a bunch of assumptions about arithmetic > behavior that we don't particularly need to make. It explains the reason, thank you. I'm just not familiar with other architectures where it is not the same as in X86/X86-64. > If this were some > amazingly hot hot-spot then maybe it would be worth making the code > unreadable to save a few nanoseconds, but I doubt that it is. > (Anyway, you've not shown that there actually is any benefit ...) I don't think it has a speed benefit, especially compared with the sscanf call. But personally for me the second variant does not seem ugly, just "no one bit in all variables is out of a byte" (looks better with comparison with "0xff" as sscanf operates with "%2x"). Sorry for my bad taste and for the noise. -- Best regards, Vitaly Burovoy
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
Alvaro Herrera writes: > Tom Lane wrote: >> Vitaly Burovoy writes: >>> P.S.: I still think it is a good idea to change storage format, >> I'm not sure which part of "no" you didn't understand, but we're >> not breaking on-disk compatibility of existing macaddr columns. >> Breaking the on-the-wire binary I/O representation seems like a >> nonstarter as well. > I think the suggestion was to rename macaddr to macaddr6 or similar, > keeping the existing behavior and the current OID. So existing columns > would continue to work fine and maintain on-disk compatibility, but any > newly created columns would become the 8-byte variant. ... and would have different I/O behavior from before, possibly breaking applications that expect "macaddr" to mean what it used to. I'm still dubious that that's a good idea. The larger picture here is that we got very little thanks when we squeezed IPv6 into the pre-existing inet datatype; there's a large number of people who just said "no thanks" and started using the add-on ip4r type instead. So I'm not sure why we want to complicate our lives in order to make macaddr follow the same path. regards, tom lane
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
Vitaly Burovoy writes: > I'm sorry for the offtopic, but does anyone know a reason why a > condition in mac.c >> if ((a < 0) || (a > 255) || (b < 0) || (b > 255) || >> (c < 0) || (c > 255) || (d < 0) || (d > 255) || >> (e < 0) || (e > 255) || (f < 0) || (f > 255)) > can not be rewritten as: >> if (((a | b | c | d | e | f) < 0) || >> ((a | b | c | d | e | f) > 255)) Well, it's ugly and it adds a bunch of assumptions about arithmetic behavior that we don't particularly need to make. If this were some amazingly hot hot-spot then maybe it would be worth making the code unreadable to save a few nanoseconds, but I doubt that it is. (Anyway, you've not shown that there actually is any benefit ...) regards, tom lane
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
Tom Lane wrote: > Vitaly Burovoy writes: > > P.S.: I still think it is a good idea to change storage format, > > I'm not sure which part of "no" you didn't understand, but we're > not breaking on-disk compatibility of existing macaddr columns. > Breaking the on-the-wire binary I/O representation seems like a > nonstarter as well. I think the suggestion was to rename macaddr to macaddr6 or similar, keeping the existing behavior and the current OID. So existing columns would continue to work fine and maintain on-disk compatibility, but any newly created columns would become the 8-byte variant. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
Vitaly Burovoy writes: > P.S.: I still think it is a good idea to change storage format, I'm not sure which part of "no" you didn't understand, but we're not breaking on-disk compatibility of existing macaddr columns. Breaking the on-the-wire binary I/O representation seems like a nonstarter as well. If you can think of a way to have one type do it all without breaking that, then fine; but that seems like a pretty hard problem. regards, tom lane
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
I'm sorry for the off-topic question, but does anyone know a reason why a condition in mac.c > if ((a < 0) || (a > 255) || (b < 0) || (b > 255) || > (c < 0) || (c > 255) || (d < 0) || (d > 255) || > (e < 0) || (e > 255) || (f < 0) || (f > 255)) cannot be rewritten as: > if (((a | b | c | d | e | f) < 0) || > ((a | b | c | d | e | f) > 255)) It seems more compact, and a compiler can optimize it to reuse the result of the binary OR for the comparison with 255... -- Best regards, Vitaly Burovoy
Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size
Andres Freund writes: > On October 12, 2016 1:25:54 PM PDT, Tom Lane wrote: >> A little bit of research suggests that on Linux the thing to do would >> be to get the actual default hugepage size by reading /proc/meminfo and >> looking for a line like "Hugepagesize: 2048 kB". > We had that, but Heikki ripped it out when merging... I think you're supposed > to use /sys to get the available size. According to https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt looking into /proc/meminfo is the longer-standing API and thus is likely to work on more kernel versions. Also, if you look into /sys then you are going to see multiple possible values and it's not clear how to choose the right one. regards, tom lane
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
On 10/12/16, Vitaly Burovoy wrote: > On 10/12/16, Alvaro Herrera wrote: >> Julien Rouhaud wrote: >>> On 12/10/2016 14:32, Alvaro Herrera wrote: >>> > Julien Rouhaud wrote: >>> > >>> >> and you can instead make macaddr64 support both format, and provide a >>> >> macaddr::macaddr64 cast >>> > >>> > Having macaddr64 support both formats sounds nice, but how does it >>> > work? >>> > Will we have to reserve one additional bit to select the >>> > representation? >>> > That would make the type be 65 bits which is a clear loser IMO. >>> > >>> > Is it allowed to just leave 16 bits as zeroes which would indicate >>> > that >>> > the address is EUI48? I wouldn't think so ... >>> >>> From what I read, you can indicate it's an EUI-48 address by storing >>> FF:FF (or FF:FE for MAC-48) in 4th and 5th bytes of the EUI-64 address. >> >> That seems reasonable at first glance; so the new type would be able to >> store both 48-bit and 64-bit values, and there would be assignment casts >> in both directions > > I think either "macaddr" should be renamed to "macaddr6" (saved its > oid), a new type "macaddr8" is added with introducing a new alias > "macaddr" or the current "macaddr" should accept both versions as the > "inet" type does. > > The function "macaddr_recv" can distinguish them by the > StringInfoData.len member, "macaddr_in" - by number of parts split by > ":". > The "macaddr_out" and "macaddr_send" can out 6 octets if the stored > value is mapped to MAC-48. > Storing can be done always as 8 bytes using the rule above. > > Otherwise it is hard from the user's PoV to decide at the > design stage when it is necessary to define a column as macaddr and when > as macaddr8. > If users need to update a column type (new hardware appears with an > EUI-64 address), they should keep in mind that all things are changed > for the new column type - stored procedure parameters, application > code interacting with the DB etc.). 
> I don't agree with Tom's proposal to introduce a new type; it would be > inconvenient for users. > > We have types "int2", "int4", "int8" and an alias "int" for a type > "int4", at least psql does not show it: > postgres=# \dT+ int > List of data types > Schema | Name | Internal name | Size | Elements | Owner | Access > privileges | Description > +--+---+--+--+---+---+- > (0 rows) > > It is understandable to have 3 types for integers because most space > of the DB is occupied by them and in real life we just don't use big > numbers, but the cases for "inet" and "macaddr" are different. > >> and a suite of operators to enable interoperability >> of 48-bit values in macaddr8 with values in type macaddr. Right? >> >> (The cast function from macaddr8 to macaddr would raise error if the >> 4th and 5th bytes are not either FF:FF or FF:FE -- I don't think we can >> in practice distinguish EUI-48 from MAC-48 in this context. > > Wikipedia says[1] they are the same thing but MAC-48 is an > obsolete term for a special case, so there is no need to > distinguish them. > >> The cast in the other direction would have no restriction and should >> probably always use FF:FE). > > [1] > https://en.wikipedia.org/wiki/Organizationally_unique_identifier#48-bit_Media_Access_Control_Identifier_.28MAC-48.29 Haribabu Kommi, have you read enough about EUI-64? Your function "macaddr64_trunc" sets the 4 lower bytes to 0 whereas (according to Wikipedia, but I can look for a standard) 3 bytes still define an OUI (organizationally unique identifier), so truncation should be done for the 5, not 4, lower octets. The "hibits" macro should be 3 octets long, not 4; "lobits" --- 5 bytes, not 4. Otherwise your comparisons fail. What document have you used to write the patch? Are the short form formats correct in macaddr64_in? 
P.S.: I still think it is a good idea to change the storage format and macaddr_{in,out,send,recv}, filling the 4th and 5th bytes if necessary; change the "lobits" macro and add new fields to the bit operation functions. It avoids a new type, casts, comparison functions between macaddr6 and macaddr8, etc. The proposed patch is mostly a copy-paste of mac.c -- Best regards, Vitaly Burovoy
Re: [HACKERS] munmap() failure due to sloppy handling of hugepage size
On October 12, 2016 1:25:54 PM PDT, Tom Lane wrote: >If any of you were following the thread at >https://www.postgresql.org/message-id/flat/CAOan6TnQeSGcu_627NXQ2Z%2BWyhUzBjhERBm5RN9D0QFWmk7PoQ%40mail.gmail.com >I spent quite a bit of time following a bogus theory, but the problem >turns out to be very simple: on Linux, munmap() is pickier than mmap() >about the length of a hugepage allocation. The comments in >sysv_shmem.c >mention that on older kernels mmap() with MAP_HUGETLB will fail if >given >a length request that's not a multiple of the hugepage size. Well, the >behavior they replaced that with is little better: mmap() succeeds, but >it gives you back a region that's been silently enlarged to the next >hugepage boundary, and then munmap() will fail if you specify the >region >size you asked for rather than the region size you were given. > >Since AFAICS there is no way to inquire what region size you were >given, >this API is astonishingly brain-dead IMO. But that seems to be what >we've got. Chris Richards reported it against a 3.16.7 kernel, and >I can replicate the behavior on RHEL6 (2.6.32) by asking for an >odd-size >huge page region. > >We've mostly masked this by rounding up to a 2MB boundary, which is >what >the hugepage size typically is. But that assumption is wrong on some >hardware, and it's not likely to get less wrong as time passes. > >A little bit of research suggests that on Linux the thing to do would >be >to get the actual default hugepage size by reading /proc/meminfo and >looking for a line like "Hugepagesize: 2048 kB". I don't know >of any more-portable API, so this does nothing for non-Linux kernels. >But we have not heard of similar misbehavior on other platforms, even >though IA64 and PPC64 can both have hugepages larger than 2MB, so it's >reasonable to hope that other implementations of munmap() don't have >the same gotcha. We had that, but Heikki ripped it out when merging... 
I think you're supposed to use /sys to get the available size. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
[HACKERS] munmap() failure due to sloppy handling of hugepage size
If any of you were following the thread at https://www.postgresql.org/message-id/flat/CAOan6TnQeSGcu_627NXQ2Z%2BWyhUzBjhERBm5RN9D0QFWmk7PoQ%40mail.gmail.com I spent quite a bit of time following a bogus theory, but the problem turns out to be very simple: on Linux, munmap() is pickier than mmap() about the length of a hugepage allocation. The comments in sysv_shmem.c mention that on older kernels mmap() with MAP_HUGETLB will fail if given a length request that's not a multiple of the hugepage size. Well, the behavior they replaced that with is little better: mmap() succeeds, but it gives you back a region that's been silently enlarged to the next hugepage boundary, and then munmap() will fail if you specify the region size you asked for rather than the region size you were given. Since AFAICS there is no way to inquire what region size you were given, this API is astonishingly brain-dead IMO. But that seems to be what we've got. Chris Richards reported it against a 3.16.7 kernel, and I can replicate the behavior on RHEL6 (2.6.32) by asking for an odd-size huge page region. We've mostly masked this by rounding up to a 2MB boundary, which is what the hugepage size typically is. But that assumption is wrong on some hardware, and it's not likely to get less wrong as time passes. A little bit of research suggests that on Linux the thing to do would be to get the actual default hugepage size by reading /proc/meminfo and looking for a line like "Hugepagesize: 2048 kB". I don't know of any more-portable API, so this does nothing for non-Linux kernels. But we have not heard of similar misbehavior on other platforms, even though IA64 and PPC64 can both have hugepages larger than 2MB, so it's reasonable to hope that other implementations of munmap() don't have the same gotcha. Barring objections I'll go make this happen. 
regards, tom lane
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
Robert Haas writes: > On Wed, Oct 12, 2016 at 11:54 AM, Greg Stark wrote: >> Fwiw I was considering proposing committing some patches for these old >> releases to make them easier to build. I would suggest creating a tag >> for this stable legacy version and limiting the commits to just: >> >> 1) Disabling warnings >> 2) Fixing bugs in the configure scripts that occur on more recent >> systems (version number parsing etc) >> 3) Backporting things like the variable-length array code that prevents >> building >> 4) Adding compiler options like -fwrapv > I'd support that. The reason why we remove branches from support is > so that we don't have to back-patch things to 10 or 15 branches when > we have a bug fix. But that doesn't mean that we should prohibit all > commits to those branches for any reason, and making it easier to test > backward-compatibility when we want to do that seems like a good > reason. Meh, I think that this will involve a great deal more work than it's worth. We deal with moving-target platforms *all the time*. New compiler optimizations break things, libraries such as OpenSSL whack things around, other libraries such as uuid-ossp stop getting maintained and become unusable on new platforms, bison decides to get stickier about comma placement, yadda yadda yadda. How much of that work are we going to back-port to dead branches? And to what extent is such effort going to be self-defeating because the branch no longer behaves as it did back in the day? If Greg wants to do this kind of work, he's got a commit bit. My position is that we have a limited support lifespan for a reason, and I'm not going to spend time on updating dead branches forever. To me, it's more useful to test them in place on contemporary platforms. We've certainly got enough old platforms laying about in the buildfarm and elsewhere. 
regards, tom lane
Re: [HACKERS] Non-empty default log_line_prefix
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Oct 12, 2016 at 11:20 AM, Christoph Berg wrote: > > Re: Jeff Janes 2016-10-12 > >
Re: [HACKERS] Non-empty default log_line_prefix
Hi, On Wed, 2016-10-12 at 13:11 -0400, Tom Lane wrote: > > What is the cost of using %m, other than 4 (rather compressible) bytes per > > log entry? > > More log I/O, which is not free ... FWIW, we've been setting log_line_prefix to '< %m > ' for quite a long time in PGDG RPMs, and did not get any complaints. I'd vote for %m as the default. Regards, -- Devrim GÜNDÜZ EnterpriseDB: http://www.enterprisedb.com PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer Twitter: @DevrimGunduz , @DevrimGunduzTR
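For reference, the setting under discussion looks like this in postgresql.conf; the sample output line below is illustrative:

```
# %m is the timestamp with milliseconds; %t is the same without them.
log_line_prefix = '< %m > '

# A log line then looks roughly like:
# < 2016-10-12 13:11:05.432 EDT > LOG:  database system is ready to accept connections
```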
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Wed, Oct 12, 2016 at 11:09 AM, Robert Haas wrote: > I realize that you are primarily targeting utility commands here, and > that is obviously great, because making index builds faster is very > desirable. However, I'd just like to talk for a minute about how this > relates to parallel query. With Rushabh's Gather Merge patch, you can > now have a plan that looks like Gather Merge -> Sort -> whatever. > That patch also allows other patterns that are useful completely > independently of this patch, like Finalize GroupAggregate -> Gather > Merge -> Partial GroupAggregate -> Sort -> whatever, but the Gather > Merge -> Sort -> whatever path is very related to what this patch > does. For example, instead of committing this patch at all, we could > try to funnel index creation through the executor, building a plan of > that shape, and using the results to populate the index. I'm not > saying that's a good idea, but it could be done. Right, but that would be essentially the same approach as mine, but, I suspect, less efficient and more complicated. More importantly, it wouldn't be partitioning, and partitioning is what we really need within the executor. > On the flip side, what if anything can queries hope to get out of > parallel sort that they can't get out of Gather Merge? One > possibility is that a parallel sort might end up being substantially > faster than Gather Merge-over-non-parallel sort. In that case, we > obviously should prefer it. I must admit that I don't know enough about it to comment just yet. Offhand, it occurs to me that the Gather Merge sorted input could come from a number of different types of paths/nodes, whereas adopting what I've done here could only work more or less equivalently to "Gather Merge -> Sort -> Seq Scan" -- a special case, really. 
> For example, it's possible that you might want to have all > workers participate in sorting some data and then divide the result of > the sort into equal ranges that are again divided among the workers, > or that you might want all workers to sort and then each worker to > read a complete copy of the output data. But these don't seem like > particularly mainstream needs, nor do they necessarily seem like > problems that parallel sort itself should be trying to solve. This project of mine is about parallelizing tuplesort.c, which isn't really what you want for parallel query -- you shouldn't try to scope the problem as "make the sort more scalable using parallelism" there. Rather, you want to scope it at "make the execution of the entire query more scalable using parallelism", which is really quite a different thing, which necessarily involves the executor having direct knowledge of partition boundaries. Maybe the executor enlists tuplesort.c to help with those boundaries to some degree, but that whole thing is basically something which treats tuplesort.c as a low level primitive. > The > Volcano paper[1], one of the oldest and most-cited sources I can find > for research into parallel execution and with a design fairly similar > to our own executor, describes various variants of what they call > Exchange, of which what we now call Gather is one. I greatly respect the work of Goetz Graefe, including his work on the Volcano paper. Graefe has been the single biggest external influence on my work on Postgres. > They describe > another variant called Interchange, which acts like a Gather node > without terminating parallelism: every worker process reads the > complete output of an Interchange, which is the union of all rows > produced by all workers running the Interchange's input plan. That > seems like a better design than coupling such data flows specifically > to parallel sort. 
> > I'd like to think that parallel sort will help lots of queries, as > well as helping utility commands, but I'm not sure it will. Thoughts? You are right that I'm targeting the cases where we can get real benefits without really changing the tuplesort.h contract too much. This is literally the parallel tuplesort.c patch, which probably isn't very useful for parallel query, because the final output is always consumed serially here (this doesn't matter all that much for CREATE INDEX, I believe). This approach of mine seems like the simplest way of getting a large benefit to users involving parallelizing sorting, but I certainly don't imagine it to be the be all and end all. I have at least tried to anticipate how tuplesort.c will eventually serve the needs of partitioning for the benefit of parallel query. My intuition is that you'll have to teach it about partitioning boundaries fairly directly -- it won't do to add something generic to the executor. And, it probably won't be the only thing that needs to be taught about them. -- Peter Geoghegan
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
On Wed, Oct 12, 2016 at 11:54 AM, Greg Stark wrote: > On Mon, Oct 10, 2016 at 9:52 PM, Greg Stark wrote: >> >> The code is here: >> >> https://github.com/gsstark/retropg >> >> The build script is called "makeall" and it applies patches from the >> "old-postgres-fixes" directory though some of the smarts are in the >> script because it knows how to run older versions of the >> configure script and it tries to fix up the ecpg parser duplicate >> tokens separately. It saves a diff after applying the patches and >> other fixups into the "net-diffs" directory but I've never checked if >> those diffs would work cleanly on their own. > > > Fwiw I was considering proposing committing some patches for these old > releases to make them easier to build. I would suggest creating a tag > for this stable legacy version and limiting the commits to just: > > 1) Disabling warnings > 2) Fixing bugs in the configure scripts that occur on more recent > systems (version number parsing etc) > 3) Backporting things like the variable-length array code that prevents > building > 4) Adding compiler options like -fwrapv I'd support that. The reason why we remove branches from support is so that we don't have to back-patch things to 10 or 15 branches when we have a bug fix. But that doesn't mean that we should prohibit all commits to those branches for any reason, and making it easier to test backward-compatibility when we want to do that seems like a good reason. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Non-empty default log_line_prefix
On Wed, Oct 12, 2016 at 11:20 AM, Christoph Berg wrote: > Re: Jeff Janes 2016-10-12 >
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Oct 12, 2016 at 3:21 AM, Dilip Kumar wrote: > I think at higher client count from client count 96 onwards contention > on CLogControlLock is clearly visible and which is completely solved > with group lock patch. > > And at lower client count 32,64 contention on CLogControlLock is not > significant hence we can not see any gain with group lock patch. > (though we can see some contention on CLogControlLock is reduced at 64 > clients.) I agree with these conclusions. I had a chance to talk with Andres this morning at Postgres Vision and based on that conversation I'd like to suggest a couple of additional tests: 1. Repeat this test on x86. In particular, I think you should test on the EnterpriseDB server cthulhu, which is an 8-socket x86 server. 2. Repeat this test with a mixed read-write workload, like -b tpcb-like@1 -b select-only@9 -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] int2vector and btree indexes
Amit Langote writes: > On 2016/10/11 21:40, Tom Lane wrote: >> Hmm ... I kind of wonder why we have int2vectoreq or hashint2vector at >> all, likewise the hash opclass based on them. > Agreed. So how about the attached patch to remove the said infrastructure? Looks good, pushed. regards, tom lane
Re: [HACKERS] pg_dump getBlobs query broken for 7.3 servers
On Mon, Oct 10, 2016 at 9:52 PM, Greg Stark wrote: > > The code is here: > > https://github.com/gsstark/retropg > > The build script is called "makeall" and it applies patches from the > "old-postgres-fixes" directory though some of the smarts are in the > script because it knows how to run older versions of the > configure script and it tries to fix up the ecpg parser duplicate > tokens separately. It saves a diff after applying the patches and > other fixups into the "net-diffs" directory but I've never checked if > those diffs would work cleanly on their own. Fwiw I was considering proposing committing some patches for these old releases to make them easier to build. I would suggest creating a tag for this stable legacy version and limiting the commits to just: 1) Disabling warnings 2) Fixing bugs in the configure scripts that occur on more recent systems (version number parsing etc) 3) Backporting things like the variable-length array code that prevents building 4) Adding compiler options like -fwrapv -- greg
Re: [HACKERS] Polyphase merge is obsolete
On 10/12/2016 08:27 PM, Tom Lane wrote: > Heikki Linnakangas writes: >> The beauty of the polyphase merge algorithm is that it allows reusing >> input tapes as output tapes efficiently ... So the whole idea of trying >> to efficiently reuse input tapes as output tapes is pointless. > It's been awhile since I looked at that code, but I'm quite certain that > it *never* thought it was dealing with actual tapes. Rather, the point > of sticking with polyphase merge was that it allowed efficient > incremental re-use of temporary disk files, so that the maximum on-disk > footprint was only about equal to the volume of data to be sorted, > rather than being a multiple of that. Have we thrown that property away? No, there's no difference to that behavior. logtape.c takes care of incremental re-use of disk space, regardless of the merging pattern. - Heikki
Re: [HACKERS] Non-empty default log_line_prefix
Re: Jeff Janes 2016-10-12
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Fri, Oct 7, 2016 at 5:47 PM, Peter Geoghegan wrote: > Work is still needed on: > > * Cost model. Should probably attempt to guess final index size, and > derive calculation of number of workers from that. Also, I'm concerned > that I haven't given enough thought to the low end, where with default > settings most CREATE INDEX statements will use at least one parallel > worker. > > * The whole way that I teach nbtsort.c to disallow catalog tables for > parallel CREATE INDEX due to concerns about parallel safety is in need > of expert review, preferably from Robert. It's complicated in a way > that relies on things happening or not happening from a distance. > > * Heikki seems to want to change more about logtape.c, and its use of > indirection blocks. That may evolve, but for now I can only target the > master branch. > > * More extensive performance testing. I think that this V3 is probably > the fastest version yet, what with Heikki's improvements, but I > haven't really verified that. I realize that you are primarily targeting utility commands here, and that is obviously great, because making index builds faster is very desirable. However, I'd just like to talk for a minute about how this relates to parallel query. With Rushabh's Gather Merge patch, you can now have a plan that looks like Gather Merge -> Sort -> whatever. That patch also allows other patterns that are useful completely independently of this patch, like Finalize GroupAggregate -> Gather Merge -> Partial GroupAggregate -> Sort -> whatever, but the Gather Merge -> Sort -> whatever path is very related to what this patch does. For example, instead of committing this patch at all, we could try to funnel index creation through the executor, building a plan of that shape, and using the results to populate the index. I'm not saying that's a good idea, but it could be done. On the flip side, what if anything can queries hope to get out of parallel sort that they can't get out of Gather Merge? 
One possibility is that a parallel sort might end up being substantially faster than Gather Merge-over-non-parallel sort. In that case, we obviously should prefer it. Other possibilities seem a little obscure. For example, it's possible that you might want to have all workers participate in sorting some data and then divide the result of the sort into equal ranges that are again divided among the workers, or that you might want all workers to sort and then each worker to read a complete copy of the output data. But these don't seem like particularly mainstream needs, nor do they necessarily seem like problems that parallel sort itself should be trying to solve. The Volcano paper[1], one of the oldest and most-cited sources I can find for research into parallel execution and with a design fairly similar to our own executor, describes various variants of what they call Exchange, of which what we now call Gather is one. They describe another variant called Interchange, which acts like a Gather node without terminating parallelism: every worker process reads the complete output of an Interchange, which is the union of all rows produced by all workers running the Interchange's input plan. That seems like a better design than coupling such data flows specifically to parallel sort. I'd like to think that parallel sort will help lots of queries, as well as helping utility commands, but I'm not sure it will. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] "Volcano - an Extensible and Parallel Query Evaluation System", https://pdfs.semanticscholar.org/865b/5f228f08ebac0b68d3a4bfd97929ee85e4b6.pdf [2] See "C. Variants of the Exchange Operator" on p. 13 of [1]
[HACKERS] logical replication connection information management
I want to discuss the connection information management aspect of the logical replication patch set that is currently being proposed (https://commitfest.postgresql.org/10/701/). To review, the user-visible interfaces center around -- on sending end CREATE PUBLICATION mypub FOR TABLE tbl1, tbl2, ...; -- on receiving end CREATE SUBSCRIPTION mysub PUBLICATION mypub CONNECTION 'host= dbname= ...' Both of these map pretty directly into system catalogs pg_publication and pg_subscription. The concern is about storing the connection information. Right now, this is just a string that is stored and passed to libpqwalreceiver. But this string can contain passwords, so it needs to be protected. Currently, pg_subscription has read permissions removed. This creates various annoyances. An idea was to use the facilities we already have for foreign data access for storing replication connection information. It already has considered and solved these problems. So it might look like this: CREATE SERVER node1 OPTIONS (host '...', dbname '...'); CREATE USER MAPPING FOR CURRENT_USER SERVER node1; CREATE SUBSCRIPTION mysub PUBLICATION mypub SERVER node1; This would have a number of advantages: - Secret information such as passwords is all stored in one place that is already secured. - Remote connection information is stored all in one place. - Subscriptions pointing to the same remote host are logically connected. - It's easier to change connection information for all subscriptions pointing to a host or to change the password of a user. - Access control can use existing facilities. We would not need a new concept for who can create subscriptions and not need to use superuser or some semi-superuser status. To allow the use of a server, grant USAGE on the server. So functionality-wise, this looks pretty good, but there is some awkwardness in how to wire this into the existing facilities, since a server, also known as a foreign server, is currently tied to a foreign data wrapper. 
I have currently implemented this by creating a fake built-in foreign data wrapper called "subscription", so the actual syntax is CREATE SERVER node1 WRAPPER subscription OPTIONS (host '...', dbname '...'); which isn't terrible, but still a bit weird. An idea is to make the foreign server concept more general and allow it to exist independently of a foreign data wrapper. Then create more specific syntax like CREATE SERVER node1 FOR SUBSCRIPTION OPTIONS ( ... ); or CREATE SUBSCRIPTION SERVER ... This would work a bit like pg_constraint, which can be tied to a table or a type or even nothing (for the hypothetical assertions feature). We'd need a separate mechanism for controlling which user has the right to create such subscription servers, but it might be acceptable at the beginning to just require superuserness. Thoughts on that? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Remove "Source Code" column from \df+ ?
2016-10-12 19:48 GMT+02:00 Peter Eisentraut < peter.eisentr...@2ndquadrant.com>: > On 10/12/16 11:16 AM, Tom Lane wrote: > > I'm not sure that Peter was voting for retaining "internal name", but > > personally I prefer that to deleting prosrc entirely, so +1. > > I'm not sure what the point of showing the internal name would be if we > have already declared that the source code of non-C functions is not > that interesting. But I don't have a strong feeling about it. > The benefit is for people who want to look at the C implementation of internal functions. Probably not too big a group - but it can be interesting for beginners who are starting to read the PostgreSQL code. Regards Pavel > > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: [HACKERS] Remove "Source Code" column from \df+ ?
On 10/12/16 11:16 AM, Tom Lane wrote: > I'm not sure that Peter was voting for retaining "internal name", but > personally I prefer that to deleting prosrc entirely, so +1. I'm not sure what the point of showing the internal name would be if we have already declared that the source code of non-C functions is not that interesting. But I don't have a strong feeling about it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
On 10/12/16, Alvaro Herrera wrote: > Julien Rouhaud wrote: >> On 12/10/2016 14:32, Alvaro Herrera wrote: >> > Julien Rouhaud wrote: >> > >> >> and you can instead make macaddr64 support both formats, and provide a >> >> macaddr::macaddr64 cast >> > >> > Having macaddr64 support both formats sounds nice, but how does it >> > work? >> > Will we have to reserve one additional bit to select the >> > representation? >> > That would make the type be 65 bits which is a clear loser IMO. >> > >> > Is it allowed to just leave 16 bits as zeroes which would indicate that >> > the address is EUI48? I wouldn't think so ... >> >> From what I read, you can indicate it's an EUI-48 address by storing >> FF:FF (or FF:FE for MAC-48) in 4th and 5th bytes of the EUI-64 address. > > That seems reasonable at first glance; so the new type would be able to > store both 48-bit and 64-bit values, and there would be assignment casts > in both directions I think either "macaddr" should be renamed to "macaddr6" (keeping its OID) and a new type "macaddr8" added, with "macaddr" introduced as an alias; or the current "macaddr" should accept both versions as the "inet" type does. The function "macaddr_recv" can distinguish them by the StringInfoData.len member, and "macaddr_in" by the number of parts split by ":". "macaddr_out" and "macaddr_send" can output 6 octets if the stored value maps to MAC-48. Storage can always be done as 8 bytes using the rule above. Otherwise it is hard, from the user's point of view, to decide at the design stage when a column should be defined as macaddr and when as macaddr8. If users need to change a column's type (say new hardware appears with an EUI-64 address), they must keep in mind that everything tied to the column type changes as well - stored procedure parameters, application code interacting with the DB, etc. I don't agree with Tom's proposal to introduce a new type; it would be inconvenient for users.
We have types "int2", "int4", "int8" and an alias "int" for the type "int4"; at least psql does not show it:

postgres=# \dT+ int
                                List of data types
 Schema | Name | Internal name | Size | Elements | Owner | Access privileges | Description
--------+------+---------------+------+----------+-------+-------------------+-------------
(0 rows)

It is understandable to have 3 types for integers, because most of the space in a DB is occupied by them and in real life we just don't use big numbers, but the cases of "inet" and "macaddr" are different. > and a suite of operators to enable interoperability > of 48-bit values in macaddr8 with values in type macaddr. Right? > > (The cast function from macaddr8 to macaddr would raise error if the > 4th and 5th bytes are not either FF:FF or FF:FE -- I don't think we can > in practice distinguish EUI-48 from MAC-48 in this context. Wikipedia says[1] they are the same thing, but MAC-48 is an obsolete term for a special case, so there is no need to distinguish them. > The cast in the other direction would have no restriction and should > probably always use FF:FE). [1] https://en.wikipedia.org/wiki/Organizationally_unique_identifier#48-bit_Media_Access_Control_Identifier_.28MAC-48.29 -- Best regards, Vitaly Burovoy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
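For what it's worth, the FF:FF / FF:FE embedding discussed above fits in a few lines. A purely illustrative sketch in Python (function names are hypothetical; the real implementation would of course be C code in the macaddr type's I/O functions):

```python
def mac48_to_eui64(addr: str) -> str:
    """Embed a 6-octet EUI-48/MAC-48 address into the 8-octet EUI-64
    form by inserting ff:fe between the 3rd and 4th octets, per the
    mapping discussed in the thread."""
    octets = addr.lower().split(":")
    if len(octets) != 6:
        raise ValueError("expected a 6-octet address")
    return ":".join(octets[:3] + ["ff", "fe"] + octets[3:])

def eui64_to_mac48(addr: str) -> str:
    """Reverse mapping; fails unless octets 4 and 5 are ff:fe or ff:ff,
    mirroring the proposed macaddr8 -> macaddr cast restriction."""
    octets = addr.lower().split(":")
    if len(octets) != 8:
        raise ValueError("expected an 8-octet address")
    if octets[3:5] not in (["ff", "fe"], ["ff", "ff"]):
        raise ValueError("no embedded EUI-48 address")
    return ":".join(octets[:3] + octets[5:])
```

So 08:00:2b:01:02:03 would round-trip through 08:00:2b:ff:fe:01:02:03, and any 8-octet value without ff:fe/ff:ff in the middle would be rejected by the narrowing cast.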
Re: [HACKERS] Non-empty default log_line_prefix
On Wed, Oct 12, 2016 at 10:11 AM, Tom Lane wrote: > Jeff Janes writes: > > On Sun, Oct 2, 2016 at 1:20 PM, Christoph Berg wrote: > >> Patch attached. (Still using %t, I don't think %m makes sense for the > >> default.) > > > What is the cost of using %m, other than 4 (rather compressible) bytes > per > > log entry? > > More log I/O, which is not free ... and that remark about compressibility > is bogus for anyone who doesn't pipe their postmaster stderr into gzip. > I'm already afraid that adding the timestamps will get us some pushback > about log volume. > I don't pipe them into gzip, but every few months I go and pxz any of them more than a few months old. Do you think the pushback will come from people who just accept the defaults? Cheers, Jeff
Re: [HACKERS] Polyphase merge is obsolete
Heikki Linnakangas writes: > The beauty of the polyphase merge algorithm is that it allows reusing > input tapes as output tapes efficiently ... So the whole idea of trying to > efficiently reuse input tapes as output tapes is pointless. It's been awhile since I looked at that code, but I'm quite certain that it *never* thought it was dealing with actual tapes. Rather, the point of sticking with polyphase merge was that it allowed efficient incremental re-use of temporary disk files, so that the maximum on-disk footprint was only about equal to the volume of data to be sorted, rather than being a multiple of that. Have we thrown that property away? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Polyphase merge is obsolete
The beauty of the polyphase merge algorithm is that it allows reusing input tapes as output tapes efficiently. So if you have N tape drives, you can keep them all busy throughout the merge. That doesn't matter, when we can easily have as many "tape drives" as we want. In PostgreSQL, a tape drive consumes just a few kB of memory, for the buffers. With the patch being discussed to allow a tape to be "paused" between write passes [1], we don't even keep the tape buffers around, when a tape is not actively read or written to, so all it consumes is the memory needed for the LogicalTape struct. The number of *input* tapes we can use in each merge pass is still limited, by the memory needed for the tape buffers and the merge heap, but only one output tape is active at any time. The inactive output tapes consume very few resources. So the whole idea of trying to efficiently reuse input tapes as output tapes is pointless. Let's switch over to a simple k-way balanced merge. Because it's simpler. If you're severely limited on memory, like when sorting 1GB of data with work_mem='1MB' or less, it's also slightly faster. I'm not too excited about the performance aspect, because in the typical case of a single-pass merge, there's no difference. But it would be worth changing on simplicity grounds, since we're mucking around in tuplesort.c anyway. I came up with the attached patch to do that. This patch applies on top of my latest "Logical tape pause/resume" patches [1]. It includes changes to the logtape.c interface, that make it possible to create and destroy LogicalTapes in a tapeset on the fly. I believe this will also come in handy for Peter's parallel tuplesort patch set. [1] Logical tape pause/resume, https://www.postgresql.org/message-id/55b3b7ae-8dec-b188-b8eb-e07604052351%40iki.fi PS. I finally bit the bullet and got myself a copy of The Art of Computer Programming, Vol 3, 2nd edition.
In section 5.4 on External Sorting, Knuth says: " When this book was first written, magnetic tapes were abundant and disk drives were expensive. But disks became enormously better during the 1980s, and by the late 1990s they had almost completely replaced magnetic tape units on most of the world's computer systems. Therefore the once-crucial topic of tape merging has become of limited relevance to current needs. Yet many of the patterns are quite beautiful, and the associated algorithms reflect some of the best research done in computer science during its early days; the techniques are just too nice to be discarded abruptly onto the rubbish heap of history. Indeed, the ways in which these methods blend theory with practice are especially instructive. Therefore merging patterns are discussed carefully and completely below, in what may be their last grand appearance before they accept the final curtain call. " Yep, the polyphase and other merge patterns are beautiful. I enjoyed reading through those sections. Now let's put them to rest in PostgreSQL. - Heikki >From 15169f32f99a69401d3565c8d0ff0d532d6b6638 Mon Sep 17 00:00:00 2001 From: Heikki LinnakangasDate: Wed, 12 Oct 2016 20:04:17 +0300 Subject: [PATCH 1/1] Replace polyphase merge algorithm with a simple balanced k-way merge. The advantage of polyphase merge is that it can reuse the input tapes as output tapes efficiently, but that is irrelevant on modern hardware, when we can easily emulate any number of tape drives. The number of input tapes we can/should use during merging is limited by work_mem, but output tapes that we are not currently writing to only cost a little bit of memory, so there is no need to skimp on them. Refactor LogicalTapeSet/LogicalTape interface. All the tape functions, like LogicalTapeRead and LogicalTapeWrite, take a LogicalTape as argument, instead of LogicalTapeSet+tape number. 
You can create any number of LogicalTapes in a single LogicalTapeSet, and you don't need to decide the number upfront, when you create the tape set. --- src/backend/utils/sort/logtape.c | 223 + src/backend/utils/sort/tuplesort.c | 665 +++-- src/include/utils/logtape.h| 37 ++- 3 files changed, 366 insertions(+), 559 deletions(-) diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c index 1f540f0..2370fa5 100644 --- a/src/backend/utils/sort/logtape.c +++ b/src/backend/utils/sort/logtape.c @@ -118,6 +118,8 @@ typedef struct TapeBlockTrailer */ typedef struct LogicalTape { + LogicalTapeSet *tapeSet; /* tape set this tape is part of */ + bool writing; /* T while in write phase */ bool paused; /* T if the tape is paused */ bool frozen; /* T if blocks should not be freed when read */ @@ -173,10 +175,6 @@ struct LogicalTapeSet long *freeBlocks; /* resizable array */ int nFreeBlocks; /* # of currently free blocks */ int freeBlocksLen; /* current allocated length of freeBlocks[] */ - - /* The
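The k-way balanced merge the patch switches to is conceptually simple: keep the current head element of every input run in a binary heap, repeatedly emit the smallest, and refill from the run it came from. A toy in-memory sketch (Python, hypothetical names; the real patch does this over on-disk logical tapes in tuplesort.c, not Python lists):

```python
import heapq

def kway_merge(runs):
    """Merge k pre-sorted runs in a single pass using a heap of
    (head element, run index) pairs -- the balanced k-way merge
    strategy, minus all of the tape management."""
    iters = [iter(run) for run in runs]
    heap = []
    for i, it in enumerate(iters):
        head = next(it, None)          # seed the heap with each run's head
        if head is not None:
            heapq.heappush(heap, (head, i))
    merged = []
    while heap:
        value, i = heapq.heappop(heap)  # smallest head across all runs
        merged.append(value)
        nxt = next(iters[i], None)      # refill from the same run
        if nxt is not None:
            heapq.heappush(heap, (nxt, i))
    return merged
```

With k runs and n total elements this does one pass over the data with O(n log k) comparisons, which is why the number of *input* runs per pass is what work_mem limits, while idle output tapes cost almost nothing.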
Re: [HACKERS] Non-empty default log_line_prefix
Jeff Janes writes: > On Sun, Oct 2, 2016 at 1:20 PM, Christoph Berg wrote: >> Patch attached. (Still using %t, I don't think %m makes sense for the >> default.) > What is the cost of using %m, other than 4 (rather compressible) bytes per > log entry? More log I/O, which is not free ... and that remark about compressibility is bogus for anyone who doesn't pipe their postmaster stderr into gzip. I'm already afraid that adding the timestamps will get us some pushback about log volume. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove "Source Code" column from \df+ ?
Stephen Frostwrites: > * Robert Haas (robertmh...@gmail.com) wrote: >> I'm still not used to the change that I have to use \d+ rather than \d >> to see the view definition. It's the #1 thing I want to see when >> examining a view, and since 2fe1b4dd651917aad2accac7ba8adb44d9f54930 I >> have to remember to stick a + sign in there. So, in short, I agree. > I definitely see the argument of "\d on a view used to give me the view > def and now it's almost useless and I have to remember to \d+ all the > time", but I also think that I might be able to retrain my fingers to > do \sv for views more easily than always remembering to add a '+' to \d, > which I use much more frequently than \sv or \d+. I'm unimpressed with the "I can retrain" argument, because that only applies to people who exclusively use the latest and greatest. \sv didn't exist before 9.6, so if I relearn to use that, it'll fail on me anytime I'm using a pre-9.6 psql, which is going to be a significant percentage of the time for awhile to come. Some quick digging says that \d on a view included the view definition in a footer since the very beginning (7.0 era, see commit a45195a19) and we changed it in 9.0 to require +. That's a heck of a lot of history and fingertip knowledge to override on the strength of one man's complaint, even if he is the man who made it that way in the first place. I also note that getting into the habit of using "\d+" to see view definitions didn't create any major problems when using older psqls. (BTW, \sf has been there since 9.1, which means that equating the compatibility situations for views and functions is a fallacy.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Non-empty default log_line_prefix
On Sun, Oct 2, 2016 at 1:20 PM, Christoph Bergwrote: > Re: Tom Lane 2016-09-29 <18642.1475159...@sss.pgh.pa.us> > > > Possibly the longer version could be added as an example in the > > > documentation. > > > > I suspect that simply having a nonempty default in the first place > > is going to do more to raise peoples' awareness than anything we > > could do in the documentation. But perhaps an example along these > > lines would be useful for showing proper use of %q. > > Patch attached. (Still using %t, I don't think %m makes sense for the > default.) > I don't agree with that part. When looking at sections of log files that people post on help forums, I've often wished people had shared milliseconds, and I've never wished they had truncated them off. If two messages are separated by 0.950 seconds, it can have entirely different implications than if they are separated by 0.002 seconds. What is the cost of using %m, other than 4 (rather compressible) bytes per log entry? Cheers, Jeff
Re: [HACKERS] Remove "Source Code" column from \df+ ?
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > >> I'm OK with just removing all the source codes from the \d family and > >> using the \s family instead. > > > Ok, great, thanks for clarifying that. Since we only have '\sf' today, > > I think the prevailing option here is then to make the change to > > removing 'prosrc' from \df+, have an 'internal name' column, and have > > users use \sf for functions. > > I'm not sure that Peter was voting for retaining "internal name", but > personally I prefer that to deleting prosrc entirely, so +1. Apologies, didn't mean to say that he had agreed with keeping 'internal name', just that it seemed to be the most generally accepted approach (and it has a +1 from me as well). > > Personally, I like the idea of a '\sv' for views, though we should > > discuss that on a new thread. > > We have \sv already no? Right, sorry. > I'm kind of -1 on removing view definitions from \d+. It's worked like > that for a very long time and Peter's is the first complaint I've heard. > I think changing it is likely to annoy more people than will think it's > an improvement. * Robert Haas (robertmh...@gmail.com) wrote: > I'm still not used to the change that I have to use \d+ rather than \d > to see the view definition. It's the #1 thing I want to see when > examining a view, and since 2fe1b4dd651917aad2accac7ba8adb44d9f54930 I > have to remember to stick a + sign in there. So, in short, I agree. I definitely see the argument of "\d on a view used to give me the view def and now it's almost useless and I have to remember to \d+ all the time", but I also think that I might be able to retrain my fingers to do \sv for views more easily than always remembering to add a '+' to \d, which I use much more frequently than \sv or \d+. Thanks! Stephen
Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1
Albe Laurenz writes: > Tom Lane wrote: >> I'm okay with adding PGDLLEXPORT to the extern, but we should update >> that comment to note that it's not necessary with any of our standard >> Windows build processes. (For that matter, the comment fails to explain >> why this macro is providing an extern for the base function at all...) > Here is a patch for that, including an attempt to improve the comment. Pushed with some further twiddling of the comment. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
Julien Rouhaud wrote: > On 12/10/2016 14:32, Alvaro Herrera wrote: > > Julien Rouhaud wrote: > > > >> and you can instead make macaddr64 support both formats, and provide a > >> macaddr::macaddr64 cast > > > > Having macaddr64 support both formats sounds nice, but how does it work? > > Will we have to reserve one additional bit to select the representation? > > That would make the type be 65 bits which is a clear loser IMO. > > > > Is it allowed to just leave 16 bits as zeroes which would indicate that > > the address is EUI48? I wouldn't think so ... > > From what I read, you can indicate it's an EUI-48 address by storing > FF:FF (or FF:FE for MAC-48) in 4th and 5th bytes of the EUI-64 address. That seems reasonable at first glance; so the new type would be able to store both 48-bit and 64-bit values, and there would be assignment casts in both directions and a suite of operators to enable interoperability of 48-bit values in macaddr8 with values in type macaddr. Right? (The cast function from macaddr8 to macaddr would raise error if the 4th and 5th bytes are not either FF:FF or FF:FE -- I don't think we can in practice distinguish EUI-48 from MAC-48 in this context. The cast in the other direction would have no restriction and should probably always use FF:FE). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove "Source Code" column from \df+ ?
On Wed, Oct 12, 2016 at 8:16 AM, Tom Lanewrote: > Stephen Frost writes: >> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: >>> I'm OK with just removing all the source codes from the \d family and >>> using the \s family instead. > >> Ok, great, thanks for clarifying that. Since we only have '\sf' today, >> I think the prevailing option here is then to make the change to >> removing 'prosrc' from \df+, have an 'internal name' column, and have >> users use \sf for functions. > > I'm not sure that Peter was voting for retaining "internal name", but > personally I prefer that to deleting prosrc entirely, so +1. > >> Personally, I like the idea of a '\sv' for views, though we should >> discuss that on a new thread. > > We have \sv already no? > > I'm kind of -1 on removing view definitions from \d+. It's worked like > that for a very long time and Peter's is the first complaint I've heard. > I think changing it is likely to annoy more people than will think it's > an improvement. I'm still not used to the change that I have to use \d+ rather than \d to see the view definition. It's the #1 thing I want to see when examining a view, and since 2fe1b4dd651917aad2accac7ba8adb44d9f54930 I have to remember to stick a + sign in there. So, in short, I agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Extend framework from commit 53be0b1ad to report latch waits.
On Wed, Oct 12, 2016 at 8:18 AM, Peter Eisentrautwrote: > On 10/4/16 11:29 AM, Tom Lane wrote: >> Robert Haas writes: >>> Apparently, 'make world' does not build worker_spi. I thought 'make >>> world' was supposed to build everything? >> >> You'd have thunk, yeah. It looks like the issue is that src/Makefile >> is selective about recursing into certain subdirectories of test/, >> but mostly not test/ itself. src/test/Makefile naively believes it's >> in charge, though. Probably that logic ought to get shoved down one >> level, and then adjusted so that src/test/modules gets built by "all". >> Or else teach top-level "make world" to do "make all" in src/test/, >> but that seems like it's just doubling down on confusing interconnections. > > We generally don't build test code during make world. > > The reason src/Makefile does that is probably because pg_regress is part > of the installation. > > So this looks more or less correct to me. But it's annoying to push code and have it break the buildfarm when you already did 'make world' and 'make check-world'. What target should I be using? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Non-empty default log_line_prefix
Re: Peter Eisentraut 2016-10-12 <0caa6d7f-deb6-9a43-2b38-60e63af93...@2ndquadrant.com> > >> > is going to do more to raise peoples' awareness than anything we > >> > could do in the documentation. But perhaps an example along these > >> > lines would be useful for showing proper use of %q. > > Patch attached. (Still using %t, I don't think %m makes sense for the > > default.) > > That still doesn't address what to do about syslog and eventlog users. > We would need either a separate prefix setting for those, or maybe > something like %q that says, skip to here if using syslog. (I don't > know eventlog, so I don't know if a common setting for syslog and > eventlog would work.) This patch simply tries to fix the default (stderr + '') which wasn't useful for anyone. Note that there is already a hint in the documentation that says timestamps and PID are not useful for syslog. (Yes, the '' default might be fine for syslog, but I don't think that's a good argument for sticking with it for default installs. I've seen way too many useless log files out there, and at worst we'll have syslogs with two timestamps.) Christoph -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Extend framework from commit 53be0b1ad to report latch waits.
Peter Eisentrautwrites: > On 10/4/16 11:29 AM, Tom Lane wrote: >> Robert Haas writes: >>> Apparently, 'make world' does not build worker_spi. I thought 'make >>> world' was supposed to build everything? >> You'd have thunk, yeah. It looks like the issue is that src/Makefile >> is selective about recursing into certain subdirectories of test/, >> but mostly not test/ itself. src/test/Makefile naively believes it's >> in charge, though. Probably that logic ought to get shoved down one >> level, and then adjusted so that src/test/modules gets built by "all". >> Or else teach top-level "make world" to do "make all" in src/test/, >> but that seems like it's just doubling down on confusing interconnections. > We generally don't build test code during make world. > The reason src/Makefile does that is probably because pg_regress is part > of the installation. > So this looks more or less correct to me. Even if you think the behavior is correct (I'm not convinced), the implementation is certainly confusing. I don't like the way that src/test/Makefile gets bypassed for decisions about which of its subdirectories get built for what. Any normal person would expect that src/test/Makefile is what determines that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Non-empty default log_line_prefix
Peter Eisentraut writes: > That still doesn't address what to do about syslog and eventlog users. > We would need either a separate prefix setting for those, or maybe > something like %q that says, skip to here if using syslog. (I don't > know eventlog, so I don't know if a common setting for syslog and > eventlog would work.) Meh. Those aren't default log output targets, so I don't think the default prefix needs to cater for them. People who adjust one setting can adjust the other too. There would be some value in the complexity you're thinking of for installations that log to multiple targets concurrently, but really, who does that? Most production installations already begrudge the I/O volume for a single log target. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
* David Steele (da...@pgmasters.net) wrote: > On 10/4/16 1:48 AM, Michael Paquier wrote: > > > So this is still open for votes. Here are the candidates and who > > voiced for what: > > - pg_transaction: Michael P, Thomas M. => Current 0002 is doing that. > > - pg_xact: David S, Robert > > - pg_trans: Tom > > - pg_transaction_status: Peter E. > > Christoph also prefers pg_xact [1]. Since we're asking for votes, I also prefer pg_xact over the other options listed here. Thanks! Stephen
Re: [HACKERS] Renaming of pg_xlog and pg_clog
Peter Eisentrautwrites: > On 10/4/16 1:48 AM, Michael Paquier wrote: >> So this is still open for votes. Here are the candidates and who >> voiced for what: >> - pg_transaction: Michael P, Thomas M. => Current 0002 is doing that. >> - pg_xact: David S, Robert >> - pg_trans: Tom >> - pg_transaction_status: Peter E. > I think this would all be a lot easier if we just moved all the pg_* > directories one directory down. The main problem we're trying to fix here is people thinking that something with "log" in the name contains discardable data. Just relocating the directory without renaming it won't improve that. The relocation aspect was meant to address another problem, which is making it easier to distinguish which parts of the tree should be copied by tools like pg_basebackup. But that's not what Michael is asking for votes on. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Non-empty default log_line_prefix
On 10/2/16 4:20 PM, Christoph Berg wrote: >> I suspect that simply having a nonempty default in the first place >> > is going to do more to raise peoples' awareness than anything we >> > could do in the documentation. But perhaps an example along these >> > lines would be useful for showing proper use of %q. > Patch attached. (Still using %t, I don't think %m makes sense for the > default.) That still doesn't address what to do about syslog and eventlog users. We would need either a separate prefix setting for those, or maybe something like %q that says, skip to here if using syslog. (I don't know eventlog, so I don't know if a common setting for syslog and eventlog would work.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Extend framework from commit 53be0b1ad to report latch waits.
On 10/4/16 11:29 AM, Tom Lane wrote: > Robert Haas writes: >> Apparently, 'make world' does not build worker_spi. I thought 'make >> world' was supposed to build everything? > > You'd have thunk, yeah. It looks like the issue is that src/Makefile > is selective about recursing into certain subdirectories of test/, > but mostly not test/ itself. src/test/Makefile naively believes it's > in charge, though. Probably that logic ought to get shoved down one > level, and then adjusted so that src/test/modules gets built by "all". > Or else teach top-level "make world" to do "make all" in src/test/, > but that seems like it's just doubling down on confusing interconnections. We generally don't build test code during make world. The reason src/Makefile does that is probably because pg_regress is part of the installation. So this looks more or less correct to me. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove "Source Code" column from \df+ ?
Stephen Frostwrites: > * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: >> I'm OK with just removing all the source codes from the \d family and >> using the \s family instead. > Ok, great, thanks for clarifying that. Since we only have '\sf' today, > I think the prevailing option here is then to make the change to > removing 'prosrc' from \df+, have an 'internal name' column, and have > users use \sf for functions. I'm not sure that Peter was voting for retaining "internal name", but personally I prefer that to deleting prosrc entirely, so +1. > Personally, I like the idea of a '\sv' for views, though we should > discuss that on a new thread. We have \sv already no? I'm kind of -1 on removing view definitions from \d+. It's worked like that for a very long time and Peter's is the first complaint I've heard. I think changing it is likely to annoy more people than will think it's an improvement. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove "Source Code" column from \df+ ?
* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > On 10/12/16 11:08 AM, Stephen Frost wrote: > > Personally, I like the idea of a '\sv' for views, though we should > > discuss that on a new thread. > > \sv already exists. :-) Whoops, sorry, was looking at a 9.5 psql. :) Neat! Thanks! Stephen
Re: [HACKERS] Remove "Source Code" column from \df+ ?
On 10/12/16 11:08 AM, Stephen Frost wrote: > Personally, I like the idea of a '\sv' for views, though we should > discuss that on a new thread. \sv already exists. :-) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rename max_parallel_degree?
On 10/4/16 10:16 AM, Julien Rouhaud wrote: > Please find attached v9 of the patch, adding the parallel worker class > and changing max_worker_processes default to 16 and max_parallel_workers > to 8. I also added Amit's explanation for the need of a write barrier > in ForgetBackgroundWorker(). This approach totally messes up the decoupling between the background worker facilities and consumers of those facilities. Having dozens of lines of code in bgworker.c that does the accounting and resource checking on behalf of parallel.c looks very suspicious. Once we add logical replication workers or whatever, we'll be tempted to add even more stuff in there and it will become a mess. I think we should think of a way where we can pass the required information during background worker setup, like a pointer to the resource-limiting GUC variable. For style, I would also prefer the "class" to be a separate struct field from the flags. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove "Source Code" column from \df+ ?
* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > On 10/11/16 7:51 PM, Tom Lane wrote: > > 1. Do nothing. > > 2. Remove the prosrc column from \df+ altogether. > > 3. Suppress prosrc for PL functions, but continue to show it for > >C and internal functions (and, probably, rename it to something > >other than "Source code" in that case). > > 4. #3 plus show PL function source code in footers. > > One related annoyance I have with psql is that \d+ on a view *does* show > the "source code" in the footer, because it's often too long and bulky > and ugly and unrelated to why I wanted to use the +. I tend to agree with that, though I believe it's a topic for another thread. > I'm OK with just removing all the source codes from the \d family and > using the \s family instead. Ok, great, thanks for clarifying that. Since we only have '\sf' today, I think the prevailing option here is then to make the change to removing 'prosrc' from \df+, have an 'internal name' column, and have users use \sf for functions. If anyone feels differently, please speak up. Personally, I like the idea of a '\sv' for views, though we should discuss that on a new thread. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On 10/4/16 1:48 AM, Michael Paquier wrote: > So this is still open for votes. Here are the candidates and who > voiced for what: > - pg_transaction: Michael P, Thomas M. => Current 0002 is doing that. > - pg_xact: David S, Robert > - pg_trans: Tom > - pg_transaction_status: Peter E. I think this would all be a lot easier if we just moved all the pg_* directories one directory down. We did this with "global" many years ago and it worked out well. We didn't actually have to change the file names, the top-level directory became cleaner, and no one was messing around with the files anymore. Adjusting external tools also becomes easier. There is so much lore out there about clog and xlog; it would be annoying to break with that. I can see a reason for not doing that with pg_xlog, because that is a bit of an external interface, with initdb -X and such, and pg_wal is a pretty good alternative name. But no one deals with these other directories directly, so just move them out of the way. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1
Tom Lane wrote: > I'm okay with adding PGDLLEXPORT to the extern, but we should update > that comment to note that it's not necessary with any of our standard > Windows build processes. (For that matter, the comment fails to explain > why this macro is providing an extern for the base function at all...) Here is a patch for that, including an attempt to improve the comment. Yours, Laurenz Albe 0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1.patch Description: 0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove "Source Code" column from \df+ ?
On 10/11/16 7:51 PM, Tom Lane wrote: > 1. Do nothing. > 2. Remove the prosrc column from \df+ altogether. > 3. Suppress prosrc for PL functions, but continue to show it for >C and internal functions (and, probably, rename it to something >other than "Source code" in that case). > 4. #3 plus show PL function source code in footers. One related annoyance I have with psql is that \d+ on a view *does* show the "source code" in the footer, because it's often too long and bulky and ugly and unrelated to why I wanted to use the +. I'm OK with just removing all the source codes from the \d family and using the \s family instead. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical tape pause/resume
On 10/10/2016 10:31 PM, Peter Geoghegan wrote:
> On Sun, Oct 9, 2016 at 11:52 PM, Heikki Linnakangas wrote:
>> Ok, good. I think we're in agreement on doing this, then, even if we
>> don't agree on the justification :-).
>
> Agreed. :-)

Attached are new patch versions. Rebased them over current head, fixed a
number of bugs in the seek and backspace code, and did a bunch of
stylistic cleanups and comment fixes.

I changed the API of LogicalTapeBackspace() slightly. If you try to back
up beyond the beginning of the tape, it used to return FALSE and do
nothing. Now it backspaces as much as it can, to the very beginning of
the tape, and returns the amount backspaced. That was more convenient
than the old behaviour with this new implementation.

- Heikki

From 588d74efe33f380221391b1d8520dc4b4cfa09be Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas
Date: Wed, 12 Oct 2016 12:26:25 +0300
Subject: [PATCH 1/2] Simplify tape block format.

No more indirect blocks. The blocks form a linked list instead. This
saves some memory, because we don't need to have a buffer in memory to
hold the indirect block (or blocks). To reflect that,
TAPE_BUFFER_OVERHEAD is reduced from 3 to 1 buffer, which allows using
more memory for building the initial runs.
---
 src/backend/utils/sort/logtape.c   | 627 +++--
 src/backend/utils/sort/tuplesort.c |  58 ++--
 src/include/utils/logtape.h        |   4 +-
 3 files changed, 214 insertions(+), 475 deletions(-)

diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index 3163010..07d147d 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -31,15 +31,8 @@
  * in BLCKSZ-size blocks.  Space allocation boils down to keeping track
  * of which blocks in the underlying file belong to which logical tape,
  * plus any blocks that are free (recycled and not yet reused).
- * The blocks in each logical tape are remembered using a method borrowed
- * from the Unix HFS filesystem: we store data block numbers in an
- * "indirect block".  If an indirect block fills up, we write it out to
- * the underlying file and remember its location in a second-level indirect
- * block.  In the same way second-level blocks are remembered in third-
- * level blocks, and so on if necessary (of course we're talking huge
- * amounts of data here).  The topmost indirect block of a given logical
- * tape is never actually written out to the physical file, but all lower-
- * level indirect blocks will be.
+ * The blocks in each logical tape form a chain, with a prev- and next-
+ * pointer in each block.
  *
  * The initial write pass is guaranteed to fill the underlying file
  * perfectly sequentially, no matter how data is divided into logical tapes.
@@ -87,58 +80,65 @@
 #include "utils/memutils.h"
 
 /*
- * Block indexes are "long"s, so we can fit this many per indirect block.
- * NB: we assume this is an exact fit!
- */
-#define BLOCKS_PER_INDIR_BLOCK ((int) (BLCKSZ / sizeof(long)))
-
-/*
- * We use a struct like this for each active indirection level of each
- * logical tape.  If the indirect block is not the highest level of its
- * tape, the "nextup" link points to the next higher level.  Only the
- * "ptrs" array is written out if we have to dump the indirect block to
- * disk.  If "ptrs" is not completely full, we store -1L in the first
- * unused slot at completion of the write phase for the logical tape.
+ * A TapeBlockTrailer is stored at the end of each BLCKSZ block.
+ *
+ * The first block of a tape has prev == -1.  The last block of a tape
+ * stores the number of valid bytes on the block, inverted, in 'next'.
+ * Therefore next < 0 indicates the last block.
  */
-typedef struct IndirectBlock
+typedef struct TapeBlockTrailer
 {
-	int			nextSlot;		/* next pointer slot to write or read */
-	struct IndirectBlock *nextup;	/* parent indirect level, or NULL if
-									 * top */
-	long		ptrs[BLOCKS_PER_INDIR_BLOCK];	/* indexes of contained blocks */
-} IndirectBlock;
+	long		prev;			/* previous block on this tape, or -1 on first
+								 * block */
+	long		next;			/* next block on this tape, or # of valid
+								 * bytes on last block (if < 0) */
+} TapeBlockTrailer;
+
+#define TapeBlockPayloadSize  (BLCKSZ - sizeof(TapeBlockTrailer))
+#define TapeBlockGetTrailer(buf) \
+	((TapeBlockTrailer *) ((char *) buf + TapeBlockPayloadSize))
+
+#define TapeBlockIsLast(buf) (TapeBlockGetTrailer(buf)->next < 0)
+#define TapeBlockGetNBytes(buf) \
+	(TapeBlockIsLast(buf) ? \
+	 (- TapeBlockGetTrailer(buf)->next) : TapeBlockPayloadSize)
+#define TapeBlockSetNBytes(buf, nbytes) \
+	(TapeBlockGetTrailer(buf)->next = -(nbytes))
 
 /*
  * This data structure represents a single "logical tape" within the set
- * of logical tapes stored in the same file.  We must keep track of the
- * current partially-read-or-written data block as well as the active
- * indirect block level(s).
+ * of
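The trailer scheme in the patch can be sketched standalone (a toy BLCKSZ and simplified names, purely for illustration; the authoritative definitions are the ones in logtape.c above):

```c
#include <assert.h>
#include <string.h>

#define BLCKSZ 64				/* toy block size, just for illustration */

/* Trailer stored at the end of each block, as in the patch. */
typedef struct TapeBlockTrailer
{
	long		prev;			/* previous block, or -1 on the first block */
	long		next;			/* next block, or -(valid bytes) on the last */
} TapeBlockTrailer;

#define TapeBlockPayloadSize  (BLCKSZ - sizeof(TapeBlockTrailer))
#define TapeBlockGetTrailer(buf) \
	((TapeBlockTrailer *) ((char *) (buf) + TapeBlockPayloadSize))
#define TapeBlockIsLast(buf)  (TapeBlockGetTrailer(buf)->next < 0)
#define TapeBlockGetNBytes(buf) \
	(TapeBlockIsLast(buf) ? \
	 -TapeBlockGetTrailer(buf)->next : (long) TapeBlockPayloadSize)

/* Mark a block as the last one on its tape, holding nbytes of payload. */
static void
tape_block_set_last(char *buf, long nbytes)
{
	TapeBlockGetTrailer(buf)->next = -nbytes;
}
```

Storing the byte count negated in 'next' is what lets a single long serve double duty: a non-negative value is the next block number, a negative value means "last block, holding this many valid bytes".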
Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1
Albe Laurenz writes:
> Tom Lane wrote:
>> Except that we don't.  There aren't PGDLLEXPORT markings for any
>> functions exported from contrib modules, and we don't use dlltool
>> on them either.  By your argument, none of contrib would work on
>> Windows builds at all, but we have a ton of buildfarm evidence and
>> successful field use to the contrary.  How is that all working?

> I thought it was the job of src/tools/msvc/gendef.pl to generate
> .DEF files?

Hm, okay, so after further review and googling:

MSVC: gendef.pl is used to build a .DEF file, creating the equivalent
of --export-all-symbols behavior.

Mingw: we use handmade .DEF files for libpq and a few other libraries,
otherwise Makefile.shlib explicitly specifies -Wl,--export-all-symbols.

Cygwin: per https://sourceware.org/binutils/docs-2.17/ld/WIN32.html
--export-all-symbols is the default unless you use a .DEF file or have
at least one symbol marked __declspec(dllexport) --- but the Cygwin
build defines PGDLLEXPORT as empty, so there won't be any of the latter.

So it works for us, but the stackoverflow complainant is evidently using
some homebrew build process that does none of the above.

I'm okay with adding PGDLLEXPORT to the extern, but we should update
that comment to note that it's not necessary with any of our standard
Windows build processes.  (For that matter, the comment fails to explain
why this macro is providing an extern for the base function at all...)

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ICU integration
On 9/30/16 4:32 PM, Thomas Munro wrote: >> Hmm, yeah, that will need more work. To be completely correct, I think, >> > we'd also need to record the version in each expression node, so that >> > check constraints of the form CHECK (x > 'abc') can be handled. > Hmm. That is quite a rabbit hole. In theory you need to recheck such > a constraint, but it's not at all clear when you should recheck and > what you should do about it if it fails. Similar for the future > PARTITION feature. I think it's not worth dealing with this in that much detail at the moment. It's not like the collation will just randomly change under you (unlike with glibc). It would have to involve pg_upgrade, physical replication, or a rebuilt installation. So I think I will change the message to something to the effect of "however you got here, you can't do that". We can develop some recipes and ideas on the side for how to recover situations like that and then maybe integrate tooling for that later. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1
Tom Lane wrote: >> PostgreSQL itself seems to use export files that explicitly declare the >> exported symbols, so it gets away without these decorations. > > Except that we don't. There aren't PGDLLEXPORT markings for any > functions exported from contrib modules, and we don't use dlltool > on them either. By your argument, none of contrib would work on > Windows builds at all, but we have a ton of buildfarm evidence and > successful field use to the contrary. How is that all working? I thought it was the job of src/tools/msvc/gendef.pl to generate .DEF files? In the buildfarm logs I can find lines like: Generating CITEXT.DEF from directory Release\citext, platform x64 Generating POSTGRES_FDW.DEF from directory Release\postgres_fdw, platform x64 which are emitted by gendef.pl. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1
Albe Laurenz writes:
> Tom Lane wrote:
>> The lack of complaints about this suggest that it's not actually necessary
>> to do so.  So what this makes me wonder is whether we can't drop the
>> DLLEXPORT on the finfo function too.  I'd rather reduce the number of
>> Microsoft-isms in the code, not increase it.

> I understand the sentiment.
> But it seems to actually cause a problem on Windows, at least there was a
> complaint here: http://stackoverflow.com/q/39964233
> Adding PGDLLEXPORT solved the problem there.
> I guess that there are not more complaints about that because
> few people build C functions on Windows themselves (lack of PGXS)
> and those who do are probably knowledgeable enough that they can
> fix it themselves by sticking an extra declaration with PGDLLEXPORT
> into their source file.
> PostgreSQL itself seems to use export files that explicitly declare the
> exported symbols, so it gets away without these decorations.

Except that we don't.  There aren't PGDLLEXPORT markings for any
functions exported from contrib modules, and we don't use dlltool
on them either.  By your argument, none of contrib would work on
Windows builds at all, but we have a ton of buildfarm evidence and
successful field use to the contrary.  How is that all working?

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1
Tom Lane wrote:
> Albe Laurenz writes:
>> Currently, PG_FUNCTION_INFO_V1 is defined as [...]
>> Is there a good reason why the "funcname" declaration is not decorated
>> with PGDLLEXPORT?
>
> The lack of complaints about this suggest that it's not actually necessary
> to do so.  So what this makes me wonder is whether we can't drop the
> DLLEXPORT on the finfo function too.  I'd rather reduce the number of
> Microsoft-isms in the code, not increase it.

I understand the sentiment.

But it seems to actually cause a problem on Windows; at least there was
a complaint here: http://stackoverflow.com/q/39964233
Adding PGDLLEXPORT solved the problem there.

I guess that there are not more complaints about that because few people
build C functions on Windows themselves (lack of PGXS), and those who do
are probably knowledgeable enough that they can fix it themselves by
sticking an extra declaration with PGDLLEXPORT into their source file.

PostgreSQL itself seems to use export files that explicitly declare the
exported symbols, so it gets away without these decorations.

>> BTW, I admit I don't understand the comment.
>
> It seems like an obvious typo.  Or, possibly, sloppy thinking that
> contributed to failure to recognize that the keyword isn't needed.

Looking through the history, it seems to be as follows:
- In 5fde8613, the comment was added (mistakenly) together with a
  DLLIMPORT decoration.
- In 77e50a61, the decoration was changed to PGDLLEXPORT, but the
  comment forgotten.
- In e7128e8d, the function prototype was added to the macro, but the
  PGDLLEXPORT decoration was forgotten.

The dlltool mentioned is MinGW's, so it doesn't apply to people building
with MSVC.  Maybe the comment should be like this:

/*
 * Macro to build an info function associated with the given function name.
 * Win32 loadable functions usually link with 'dlltool --export-all' or use
 * a .DEF file, but it doesn't hurt to add PGDLLEXPORT in case they don't.
 */

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
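For anyone wanting to see what the decorated macro would boil down to, here is a stripped-down, self-contained sketch (Pg_finfo_record is mocked up, PGDLLEXPORT is defined empty as on non-Windows builds, and the Datum prototype that the real macro in fmgr.h also declares is omitted — this only illustrates the export decoration being discussed):

```c
#define PGDLLEXPORT				/* empty here; __declspec(dllexport) under MSVC */
#define CppConcat(x, y) x##y

/* Mock of the real Pg_finfo_record from fmgr.h, for illustration only. */
typedef struct Pg_finfo_record { int api_version; } Pg_finfo_record;

/* Simplified PG_FUNCTION_INFO_V1 with PGDLLEXPORT on the finfo extern. */
#define PG_FUNCTION_INFO_V1(funcname) \
	extern PGDLLEXPORT const Pg_finfo_record *CppConcat(pg_finfo_, funcname)(void); \
	const Pg_finfo_record * \
	CppConcat(pg_finfo_, funcname)(void) \
	{ \
		static const Pg_finfo_record my_finfo = { 1 }; \
		return &my_finfo; \
	} \
	extern int no_such_variable

/* Expands to a pg_finfo_add_one() that reports fmgr API version 1. */
PG_FUNCTION_INFO_V1(add_one);
```

Under MSVC, PGDLLEXPORT would instead be __declspec(dllexport), which is exactly the decoration the stackoverflow complainant's homebrew build was missing.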
Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server
On Thu, Sep 8, 2016 at 10:41 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> I think we should try to measure performance gain because of aggregate
> pushdown. The EXPLAIN doesn't show actual improvement in the execution
> times.

I did performance testing for aggregate push down and saw good
performance with the patch. Attached is the script I have used to get
the performance numbers, along with the results I got with and without
the patch (pg_agg_push_down_v3.patch). Also attached are a few gnuplot
graphs from the readings I got.

These were run on my local VM having the following details:
Linux centos7 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
RAM allotted: 8 GB
CPUs allotted: 8
postgresql.conf is default.

With aggregate push down I see around 12x performance for the count(*)
operation. In another test, I have observed that if the number of groups
returned from the remote server is the same as the number of input rows,
then aggregate push down performs slightly worse, which I think is
expected. However, in all other cases, where the number of groups is
smaller than the number of input rows, pushing down the aggregate gives
better performance than performing the grouping on the local server.

I did this performance testing on my local setup. I would expect even
better numbers on a specialized high-end performance machine. I would be
more than happy if someone does this testing on such a machine. Let me
know if you need any help.
Thanks
-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

=== WITH PATCH ===

query | rows | avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
select count(*) from fperf1 | 1 | 141.8686 | 4.4668353500495 | 138.903 | 152.241
select c2, avg(c1) from fperf1 group by c2 having count(*) < 34 | 2 | 405.7661 | 11.9403142844368 | 400.689 | 439.675
select c2, sum(c1) from fperf1 group by c2 | 3 | 363.2299 | 4.29278180851687 | 354.815 | 369.739
select c3, avg(c1), sum(c2) from fperf1 group by c3 | 5 | 454.4478 | 3.98680590057494 | 447.248 | 457.955
select c4, avg(c1), sum(c2) from fperf1 group by c4 | 10 | 501.0197 | 5.26951028823454 | 491.726 | 508.574
select c5, avg(c1), sum(c2) from fperf1 group by c5 | 100 | 490.0783 | 5.64261263462349 | 480.78 | 496.662
select c6, avg(c1), sum(c2) from fperf1 group by c6 | 1000 | 582.6842 | 9.9196984474776 | 564.425 | 592.69
select c1%1, avg(c1), sum(c2) from fperf1 group by c1%1 | 1 | 901.1682 | 9.58382273302904 | 888.383 | 923.386
select c1%10, avg(c1), sum(c2) from fperf1 group by c1%10 | 10 | 1032.1959 | 6.89087326268629 | 1018.598 | 1045.423
select c1%100, avg(c1), sum(c2) from fperf1 group by c1%100 | 100 | 1076.3834 | 11.1022883947539 | 1061.305 | 1093.892
select c1%1000, avg(c1), sum(c2) from fperf1 group by c1%1000 | 1000 | 1113.6001 | 11.2143472634172 | 1092.863 | 1133.007
select c1%1, avg(c1), sum(c2) from fperf1 group by c1%1 | 1 | 1182.1374 | 32.5953859659133 | 1148.961 | 1231.296
select c1%10, avg(c1), sum(c2) from fperf1 group by c1%10 | 10 | 1467.1811 | 14.3535175437048 | 1443.95 | 1485.645
select c1%100, avg(c1), sum(c2) from fperf1 group by c1%100 | 100 | 5466.2306 | 633.367848489717 | 5127.339 | 7248.381
(14 rows)

=== WITHOUT PATCH ===

query | rows | avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
select count(*) from fperf1 | 1 | 1674.5339 | 27.1549108570754 | 1637.345 | 1725.057
select c2, avg(c1) from fperf1 group by c2 having count(*) < 34 | 2 | 2467.8368 | 22.606929678949 | 2437.674 | 2506.438
select c2, sum(c1) from fperf1 group by c2 | 3 | 2387.39 | 34.3686766983568 | 2350.396 | 2444.313
select c3, avg(c1), sum(c2) from fperf1 group by c3 | 5 | 2702.3344 | 28.0312843452488 | 2665.317 | 2746.862
select c4, avg(c1), sum(c2) from fperf1 group by c4 | 10 | 2850.9818 | 42.5758532759606 | 2813.562 | 2946.991
select c5, avg(c1),
Re: [HACKERS] How to inspect tuples during execution of a plan?
Hello,

>> Either of those would do, if you want to change the executor.

I used the ExecutorRun_hook and it seems to work. The drawback is that I
have to provide my own implementation of ExecutePlan(), which could make
it incompatible across versions of PostgreSQL. I don't know how stable
that function is over time.

https://github.com/ergo70/pg_sentinel

Thank you,
Ernst-Georg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
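For readers unfamiliar with the hook mechanism, the pattern can be sketched in miniature (every type and name below is a mock for illustration; the real ExecutorRun_hook is declared in executor.h, takes a QueryDesc plus direction and count arguments, and its signature has changed between PostgreSQL versions, which is exactly the compatibility worry above):

```c
#include <stdio.h>

/* Mock stand-ins for PostgreSQL's types -- illustration only. */
typedef struct QueryDesc { int ntuples; } QueryDesc;
typedef void (*ExecutorRun_hook_type)(QueryDesc *desc);

static ExecutorRun_hook_type ExecutorRun_hook = NULL;
static int tuples_seen = 0;

/* Stands in for the built-in executor entry point. */
static void standard_ExecutorRun(QueryDesc *desc)
{
	tuples_seen += desc->ntuples;	/* stands in for running the plan */
}

/*
 * A hook that inspects the query and then delegates to the standard
 * implementation, rather than reimplementing ExecutePlan().  Delegating
 * keeps an extension robust when executor internals change between
 * releases.
 */
static void my_ExecutorRun(QueryDesc *desc)
{
	printf("about to execute a plan producing %d tuples\n", desc->ntuples);
	standard_ExecutorRun(desc);
}

/* How the server dispatches: the installed hook wins, if there is one. */
static void ExecutorRun(QueryDesc *desc)
{
	if (ExecutorRun_hook)
		ExecutorRun_hook(desc);
	else
		standard_ExecutorRun(desc);
}
```

Delegating to standard_ExecutorRun, as extensions such as auto_explain do, avoids copying ExecutePlan() wholesale, which is the version-compatibility drawback mentioned above.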
Re: [HACKERS] Remove "Source Code" column from \df+ ?
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost writes:
> > As was mentioned, this thread doesn't really need a patch but rather
> > some comment from those who have voiced a -1 on removing the PL source
> > code column.
>
> > In another, perhaps vain, attempt to get to a consensus, here's what it
> > looks like the current standings are for "Remove source from \df+",
>
> I think this is oversimplified, because there are multiple proposals on
> the table, and it's not entirely clear to me who approves of which.

That's certainly fair and I had begun that email by trying to come up
with a way to represent everyone's positions fairly but, frankly, after
an hour of reading through the thread and noting the various changes in
positions, I got to the point where I felt...

> > There have been a number of voices asking that we do *something* here.
>
> Yes.  I agree with your summary that Peter is the only one who appears
> to be in favor of "do nothing" (and even there, his complaint was at
> least partly procedural not substantive).

We really need a response on this part if we're going to actually make
any progress.

If we'd actually like to do a formal Condorcet-style vote (or something
similar which allows preferences to be considered) over the various
options, I'm willing to put effort into making it happen, but only if
we'd actually agree to accept the result; otherwise we're just back here
again.

Thanks!

Stephen

signature.asc Description: Digital signature
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
On 12/10/2016 14:32, Alvaro Herrera wrote:
> Julien Rouhaud wrote:
>> and you can instead make macaddr64 support both formats, and provide a
>> macaddr::macaddr64 cast
>
> Having macaddr64 support both formats sounds nice, but how does it work?
> Will we have to reserve one additional bit to select the representation?
> That would make the type be 65 bits which is a clear loser IMO.
>
> Is it allowed to just leave 16 bits as zeroes which would indicate that
> the address is EUI48?  I wouldn't think so ...

From what I read, you can indicate it's an EUI-48 address by storing
FF:FF (or FF:FE for MAC-48) in the 4th and 5th bytes of the EUI-64
address.

-- 
Julien Rouhaud http://dalibo.com - http://dalibo.org

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
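That embedding rule is easy to sketch; the helper below is hypothetical (it is not from any posted patch) and just shows where the marker octets land:

```c
#include <stdint.h>
#include <string.h>

/*
 * Expand a 6-byte EUI-48 address into an 8-byte EUI-64 one by inserting
 * two marker octets between the OUI (first 3 bytes) and the device part.
 * 0xFF,0xFF marks an embedded EUI-48; 0xFF,0xFE is the MAC-48 variant
 * mentioned upthread.  (Illustrative helper, not part of the patch.)
 */
static void
eui48_to_eui64(const uint8_t mac48[6], uint8_t marker2, uint8_t out[8])
{
	memcpy(out, mac48, 3);			/* OUI */
	out[3] = 0xFF;
	out[4] = marker2;				/* 0xFF or 0xFE */
	memcpy(out + 5, mac48 + 3, 3);	/* device-specific part */
}
```

So an EUI-64 value whose 4th and 5th bytes are FF:FF (or FF:FE) can be recognized and converted back, which is what would make a macaddr::macaddr64 cast lossless.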
Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1
Albe Laurenz writes:
> Currently, PG_FUNCTION_INFO_V1 is defined as
>
> /*
>  * Macro to build an info function associated with the given function name.
>  * Win32 loadable functions usually link with 'dlltool --export-all', but it
>  * doesn't hurt to add PGDLLIMPORT in case they don't.
>  */
> #define PG_FUNCTION_INFO_V1(funcname) \
> Datum funcname(PG_FUNCTION_ARGS); \
> extern PGDLLEXPORT const Pg_finfo_record * CppConcat(pg_finfo_,funcname)(void); \
> const Pg_finfo_record * \
> CppConcat(pg_finfo_,funcname) (void) \
> { \
> 	static const Pg_finfo_record my_finfo = { 1 }; \
> 	return &my_finfo; \
> } \
> extern int no_such_variable
>
> Is there a good reason why the "funcname" declaration is not decorated
> with PGDLLEXPORT?

The lack of complaints about this suggest that it's not actually necessary
to do so.  So what this makes me wonder is whether we can't drop the
DLLEXPORT on the finfo function too.  I'd rather reduce the number of
Microsoft-isms in the code, not increase it.

> BTW, I admit I don't understand the comment.

It seems like an obvious typo.  Or, possibly, sloppy thinking that
contributed to failure to recognize that the keyword isn't needed.

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
Haribabu Kommi writes:
> Here I attached a POC patch that adds the support for EUI-64 MAC address
> storage with a new datatype macaddr64. Currently this type takes only
> the EUI-64 format; it does not accept 48-bit MAC addresses.

Our other data types that have sizes in the names measure the sizes in
bytes (float4, int8, etc).  Should this be called macaddr8?

> As we are moving to PostgreSQL 10, are there any plans of backward
> compatibility breakage, so the existing macaddr datatype itself can be
> changed to support both 48 and 64 bit MAC addresses?

As others have noted, there is no likelihood that we'd take a
disk-format-compatibility-breaking patch for v10.  Even if we wanted to
do that, the above proposal would also break send/recv (binary COPY)
compatibility for macaddr.

I think that probably the best bet here is to have two types and put
some thought into making them interoperate where appropriate, as the
various sizes of int do.  It's kind of a shame that this won't look like
the approach used for inet addresses, but we're stuck.

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers