[HACKERS] max_connections and standby server
Today I encountered an interesting situation.

1) A streaming replication primary server and a standby server are running. At this point max_connections = 100 on both servers.
2) Shut down both servers.
3) Change max_connections to 1100 on both servers and restart them.
4) The primary server happily started, but the standby would not start for lack of resources.
5) Shut down both servers.
6) Restore max_connections to 100 on both servers and restart them.
7) The primary server happily started, but the standby would not start, for the reason below.

32695 2015-08-11 13:46:22 JST FATAL: hot standby is not possible because max_connections = 100 is a lower setting than on the master server (its value was 1100)
32695 2015-08-11 13:46:22 JST CONTEXT: xlog redo parameter change: max_connections=1100 max_worker_processes=8 max_prepared_xacts=10 max_locks_per_xact=64 wal_level=hot_standby wal_log_hints=off
32693 2015-08-11 13:46:22 JST LOG: startup process (PID 32695) exited with exit code 1
32693 2015-08-11 13:46:22 JST LOG: terminating any other active server processes

I think this is because pg_control on the standby remembers that the previous primary server's max_connections was 1100, even though the standby server failed to start. Shouldn't we update the pg_control file only when the standby starts successfully?

Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] linux sparc compile issue
Hi Tom, Tom Lane wrote: Waldemar Brodkorb w...@openadk.org writes: while doing regular builds via buildroot autobuilders, a compile problem for sparc 32-bit v8 was found. It seems the defines for Linux are different from those for Solaris. The following patch fixes it for buildroot. The gcc predefines for Linux are __sparc_v8__/__sparc_v7__.

I've applied your suggested patch for this, but I'm a bit curious what version of gcc you are using; our code's been like that for a very long time and nobody complained before.

Thanks. The cross-compiler we use is gcc 4.9.3. But the native gcc on my Sun Voyager running Debian 4.0 has it, too:

platin:~# gcc -dM -E - < /dev/null | grep sparc
#define sparc 1
#define __sparc__ 1
#define __sparc 1
#define __sparc_v8__ 1
platin:~# gcc -v
Using built-in specs.
Target: sparc-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --with-cpu=v8 --enable-checking=release sparc-linux-gnu
Thread model: posix
gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
platin:~# cat /etc/debian_version
4.0

The last supported Debian is delivering Postgresql 7.5.22. I think this version did not contain the code:

platin:~/postgresql-7.5.22# find . -name \*lock.h
platin:~/postgresql-7.5.22# grep -r sparc *

So maybe buildroot is one of the few projects supporting sparcv8 for 32-bit sparc machines. best regards Waldemar
Re: [HACKERS] max_connections and standby server
Tatsuo Ishii is...@postgresql.org writes: I think this is because pg_control on the standby remembers that the previous primary server's max_connections = 1100 even if the standby server fails to start. Shouldn't we update pg_control file only when standby succeeds to start?

Somebody refresh my memory as to why we have this restriction (that is, slave's max_connections >= master's max_connections) in the first place? It seems like it should not be a necessary requirement, and working towards getting rid of it would be far better than any other answer. regards, tom lane
Re: [HACKERS] max_connections and standby server
Somebody refresh my memory as to why we have this restriction (that is, slave's max_connections >= master's max_connections) in the first place? Seems like it should not be a necessary requirement, and working towards getting rid of it would be far better than any other answer.

If you care about max_connections, you might want to care about the settings below as well (from xlog.c):

RecoveryRequiresIntParameter("max_worker_processes", max_worker_processes, ControlFile->max_worker_processes);
RecoveryRequiresIntParameter("max_prepared_transactions", max_prepared_xacts, ControlFile->max_prepared_xacts);
RecoveryRequiresIntParameter("max_locks_per_transaction",

Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
On Tue, Aug 11, 2015 at 1:50 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Masahiko Sawada wrote: This topic may have been already discussed but, why don't we use just total scanned pages and total pages?

Because those numbers don't extrapolate nicely. If the density of dead tuples is irregular across the table, such absolute numbers might be completely meaningless: you could scan 90% of the table without seeing any index scan, and then at the final 10% be hit by many index scans cleaning dead tuples. Thus you would see progress go up to 90% very quickly and then take hours to go to 91%. (Additionally, and a comparatively minor point: since you don't know how many index scans are going to happen, there's no way to know the total number of blocks scanned, unless you don't count index blocks at all, and then the numbers become a lie.) If you instead track the number of heap pages separately from index pages, and indicate how many index scans have taken place, you have a chance of actually figuring out how many heap pages are left to scan and how many more index scans will occur.

Thank you for your explanation! I understand this now. VACUUM has 3 phases now, but since phases 2 and 3 repeat, you can have an unbounded number of phases. But that assumes that we don't count truncation as a 4th phase of VACUUM... In the case of vacuum, I think we need to track at least the number of scanned heap pages; the information about index scans is additional information. Another idea for displaying progress is to have two kinds of information: essential information and additional information. Essential information has one numeric datum, which essentially stores information about the command's processing. Additional information has two data: text and numeric. These are free-style data which each backend stores as it likes. All three are output at the same time.
For example, in the case of vacuum, the essential information is the total number of scanned heap pages.
* When lazy_scan_heap starts, the two additional data are NULL.
* When lazy_vacuum_index starts, the backend sets the additional data as follows: "Index vacuuming" into the text datum, describing what we are actually doing now, and e.g. 50 into the numeric datum, describing how many index pages we have scanned.
* When lazy_vacuum_index is done, the backend sets the additional data to NULL again.

Regards, -- Masahiko Sawada
Re: [HACKERS] max_connections and standby server
On Tue, Aug 11, 2015 at 2:42 PM, Tom Lane t...@sss.pgh.pa.us wrote: Tatsuo Ishii is...@postgresql.org writes: I think this is because pg_control on the standby remembers that the previous primary server's max_connections = 1100 even if the standby server fails to start. Shouldn't we update pg_control file only when standby succeeds to start?

Somebody refresh my memory as to why we have this restriction (that is, slave's max_connections >= master's max_connections) in the first place? Seems like it should not be a necessary requirement, and working towards getting rid of it would be far better than any other answer.

If I recall correctly, that's because KnownAssignedXids and the lock table need to be large enough on the standby for the largest snapshot possible (procarray.c). -- Michael
Re: [HACKERS] PL/pgSQL, RAISE and error context
2015-08-10 9:11 GMT+02:00 Heikki Linnakangas hlinn...@iki.fi: On 07/26/2015 08:34 AM, Pavel Stehule wrote: Hi, here is the complete patch, which introduces context filtering on the client side. The core of this patch is trivial and small; almost all of its size is trivial changes in regression tests, removing useless context. Documentation, check-world.

Looks good to me at first glance. I'll mark this as Ready for Committer.

Is this acceptable to everyone? I have no problem with this approach. Regards Pavel - Heikki
Re: [HACKERS] WIP: Rework access method interface
Alexander Korotkov a.korot...@postgrespro.ru writes: On Mon, Aug 10, 2015 at 1:12 PM, Petr Jelinek p...@2ndquadrant.com wrote: I don't understand this, there is already AmRoutine in RelationData, why the need for additional field for just amsupport? We need amsupport in load_relcache_init_file() which reads pg_internal.init. I'm not sure this is correct place to call am_handler. It should work in the case of built-in AM. But if AM is defined in the extension then we wouldn't be able to do catalog lookup for am_handler on this stage of initialization. This is an issue we'll have to face before there's much hope of having index AMs as extensions: how would you locate any extension function without catalog access? Storing raw function pointers in pg_internal.init is not an answer in an ASLR world. I think we can dodge the issue so far as pg_internal.init is concerned by decreeing that system catalogs can only have indexes with built-in AMs. Calling a built-in function doesn't require catalog access, so there should be no problem with re-calling the handler function by OID during load_relcache_init_file(). We could also have problems with WAL replay, though I think the consensus there is that extension AMs have to use generic WAL records that don't require any index-specific replay code. regards, tom lane
Re: [HACKERS] WIP: Rework access method interface
On Mon, Aug 10, 2015 at 6:36 PM, Petr Jelinek p...@2ndquadrant.com wrote: On 2015-08-10 16:58, Alexander Korotkov wrote: That should work, thanks! Also we can have SQL-visible functions to get amsupport and amstrategies and use them in the regression tests.

SQL-visible functions would be preferable to storing it in pg_am, as keeping the params in pg_am would limit the extensibility of pg_am itself.

I actually meant just two particular functions (not one per AM), both of which take the AM OID as an argument. They should call amhandler and return amsupport and amstrategies respectively. These functions can be used in regression tests to check the correctness of opclasses and opfamilies. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Rework access method interface
Tom Lane wrote: There are a couple of other pg_am columns, such as amstorage and amcanorderbyop, which similarly bear on what's legal to appear in related catalogs such as pg_opclass. I'd be sort of inclined to leave those in the catalog as well. I do not see that exposing a SQL function is better than exposing a catalog column; either way, that property is SQL-visible.

If we do that, it doesn't seem reasonable to use the same catalog for other things such as sequence AMs, right? IMO it'd be better to keep the catalog agnostic about exactly what each row is going to be an AM for. -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] cache invalidation skip logic
On Sun, Aug 9, 2015 at 8:24 AM, Robert Haas robertmh...@gmail.com wrote: In step 1, AcceptInvalidationMessages() should process all pending invalidation messages. So if step 2 did AcceptInvalidationMessages() again it would be a no-op, because no messages should remain at that point.

That's what I thought at first. I will try to see if I can manually repro a case. Thanks, Qingqing
Re: [HACKERS] WIP: Rework access method interface
On 2015-08-10 18:16, Alvaro Herrera wrote: Tom Lane wrote: There are a couple of other pg_am columns, such as amstorage and amcanorderbyop, which similarly bear on what's legal to appear in related catalogs such as pg_opclass. I'd be sort of inclined to leave those in the catalog as well. I do not see that exposing a SQL function is better than exposing a catalog column; either way, that property is SQL-visible. If we do that, it doesn't seem reasonable to use the same catalog for other things such as sequence AM, right? IMO it'd be better to keep the catalog agnostic for exactly what each row is going to be an AM for.

Yeah, I said the same; the question is whether we should have pg_am agnostic, or just assume that it's an index AM and let other AM types create separate catalogs. Tom seems to prefer the latter. I don't see a problem with that, except that I would really hate to add more AM-related columns to pg_class. I would not mind having relam point to a different AM catalog for different relkinds, but I don't know if that's OK for others (it's not really a normalized design). We could also keep pg_am agnostic and add pg_index_am for additional info about index AMs, but that would make this patch more invasive. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] WIP: Rework access method interface
On Mon, Aug 10, 2015 at 5:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov a.korot...@postgrespro.ru writes: On Mon, Aug 10, 2015 at 1:12 PM, Petr Jelinek p...@2ndquadrant.com wrote: I don't understand this, there is already AmRoutine in RelationData, why the need for additional field for just amsupport? We need amsupport in load_relcache_init_file() which reads pg_internal.init. I'm not sure this is correct place to call am_handler. It should work in the case of built-in AM. But if AM is defined in the extension then we wouldn't be able to do catalog lookup for am_handler on this stage of initialization. This is an issue we'll have to face before there's much hope of having index AMs as extensions: how would you locate any extension function without catalog access? Storing raw function pointers in pg_internal.init is not an answer in an ASLR world. I think we can dodge the issue so far as pg_internal.init is concerned by decreeing that system catalogs can only have indexes with built-in AMs. Calling a built-in function doesn't require catalog access, so there should be no problem with re-calling the handler function by OID during load_relcache_init_file().

That should work, thanks! Also we can have SQL-visible functions to get amsupport and amstrategies and use them in the regression tests.

We could also have problems with WAL replay, though I think the consensus there is that extension AMs have to use generic WAL records that don't require any index-specific replay code.

Yes, I've already shown one version of generic WAL records. The main objection to them was that it's hard to ensure that a composed WAL record is correct. Now I'm working on a new version which will be much easier and safer to use. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Moving SS_finalize_plan processing to the end of planning
David Rowley david.row...@2ndquadrant.com writes: On 10 August 2015 at 07:50, Tom Lane t...@sss.pgh.pa.us wrote: I've started to work on path-ification of the upper planner (finally).

I was digging around grouping_planner() last week with the intention of making some changes there to allow GROUP BY before join, but in the end decided to stay well clear of this area until this path-ification is done. So far I've managed to keep my changes away from the upper planner, and I've added GroupingPath types. From what I can predict of the changes you'll be making, I think you'll also need to have grouping_planner() return a few variations of GroupingPath to allow the path list to be passed up to subquery_planner() and on up to functions like recurse_set_operations(), so that they have the option of choosing GroupAggregate / MergeAppend to implement UNION. If I'm right on this, then maybe there are a few things you can copy and paste from the patch I posted here: http://www.postgresql.org/message-id/CAKJS1f-sEcm=gtfs-dqjsocsz-vlhrp_hsrntjjq-s7egn8...@mail.gmail.com specifically around create_grouping_path()?

Yeah, I saw your patch, but have not yet had time to think about what parts of it I could borrow. regards, tom lane
Re: [HACKERS] WIP: Rework access method interface
On 2015-08-10 17:47, Tom Lane wrote: Petr Jelinek p...@2ndquadrant.com writes: On 2015-08-10 16:58, Alexander Korotkov wrote: That should work, thanks! Also we can have SQL-visible functions to get amsupport and amstrategies and use them in the regression tests. SQL-visible functions would be preferable to storing it in pg_am as keeping the params in pg_am would limit the extensibility of pg_am itself. I don't see any particularly good reason to remove amsupport and amstrategies from pg_am. Those are closely tied to the other catalog infrastructure for indexes (pg_amproc, pg_amop) which I don't think are candidates for getting changed by this patch. Ok, in that case it seems unlikely that we'll be able to use pg_am for any other access methods besides indexes in the future. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] WIP: Rework access method interface
On 2015-08-10 18:08, Tom Lane wrote: Alexander Korotkov a.korot...@postgrespro.ru writes: On Mon, Aug 10, 2015 at 6:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: There are a couple of other pg_am columns, such as amstorage and amcanorderbyop, which similarly bear on what's legal to appear in related catalogs such as pg_opclass. I'd be sort of inclined to leave those in the catalog as well. I do not see that exposing a SQL function is better than exposing a catalog column; either way, that property is SQL-visible. That answers my question, thanks!

BTW, just to clarify: we can still have the desirable property that CREATE INDEX ACCESS METHOD needs no parameters other than the AM handler function name. The AM code would still expose all of its properties through the struct returned by the handler. What is at issue here is how information in that struct that needs to be available to SQL code gets exposed. We can do that either by exposing SQL functions to get it, or by having CREATE INDEX ACCESS METHOD call the handler and then copy some fields into the new pg_am row. I'm suggesting that we should do the latter for any fields that determine validity of pg_opclass, pg_amop, etc entries associated with the AM type. The AM could not reasonably change its mind about such properties (within a major release at least) so there is no harm in making a long-lived copy in pg_am. And we might as well not break SQL code unnecessarily by removing fields that used to be there.

That's definitely the case for built-in AMs, but 3rd-party ones won't necessarily follow the PG release schedule/versioning, and I can imagine, for example, amcanorderbyop changing between AM releases as the AM matures. This could probably be solved by some kind of invalidation mechanism (even if it's some additional SQL). However, I am not sure that using a catalog as a sort of cache for function output is a good idea in general.
IMHO it would be better to just have those options as part of CREATE and ALTER DDL for INDEX ACCESS METHODS if we store them in pg_am. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Asynchronous execution on FDW
On Mon, Aug 10, 2015 at 3:23 AM, Heikki Linnakangas hlinn...@iki.fi wrote: I've marked this as rejected in the commitfest, because others are working on a more general solution with parallel workers. That's still work-in-progress, and it's not certain if it's going to make it into 9.6, but if it does it will largely render this obsolete. We can revisit this patch later in the release cycle, if the parallel scan patch hasn't solved the same use case by then. I think the really important issue for this patch is the one discussed here: http://www.postgresql.org/message-id/ca+tgmoaijk1svzw_gkfu+zssxcijkfelqu2aomvuphpsfw4...@mail.gmail.com You raised an important issue there but never really expressed an opinion on the points I raised, here or on the other thread. And neither did anyone else except the patch author who, perhaps unsurprisingly, thinks it's OK. I wish we could get more discussion about that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] How to compare different datums within from a tuple?
Hello, I am trying to write my first patch. It is too early to tell more about it, but I am just fiddling around with some prototypes. Can someone tell me how I can compare two datum fields when I do not know the data type in advance inside an executor function? For example, x < y, where x and y are of various types that form intervals. I have found the method ExecTuplesMatch, but it is only for equality comparison, I think. Another one is ApplySortComparator... maybe that's the correct way to go? Some code to make things clearer...

Datum x = heap_getattr(out->tts_tuple, node->xpos, out->tts_tupleDescriptor, &isNull1);
Datum y = slot_getattr(curr, node->ypos, &isNull2);
if (compareDatumWithCorrectMethod(x, y) < 0)
{
    /* do something */
}

Thx, Peter
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
Masahiko Sawada wrote: This topic may have been already discussed but, why don't we use just total scanned pages and total pages?

Because those numbers don't extrapolate nicely. If the density of dead tuples is irregular across the table, such absolute numbers might be completely meaningless: you could scan 90% of the table without seeing any index scan, and then at the final 10% be hit by many index scans cleaning dead tuples. Thus you would see progress go up to 90% very quickly and then take hours to have it go to 91%. (Additionally, and a comparatively minor point: since you don't know how many index scans are going to happen, there's no way to know the total number of blocks scanned, unless you don't count index blocks at all, and then the numbers become a lie.) If you instead track number of heap pages separately from index pages, and indicate how many index scans have taken place, you have a chance of actually figuring out how many heap pages are left to scan and how many more index scans will occur.

The mechanism of VACUUM is complicated a bit today,

Understatement of the week ... -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] tap tests remove working directories
On 08/10/2015 09:55 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 08/09/2015 08:58 PM, Michael Paquier wrote: Sure. Attached is what I have in mind. Contrary to your version we keep around temp paths should a run succeed after one that has failed when running make check multiple times in a row. Perhaps it does not matter much in practice as log files get removed at each new run but I think that this allows a finer control in case of failure. Feel free to have a look. Actually, I don't think this is a very good idea. You could end up with a whole series of opaquely named directories from a series of failing runs. If you want to keep the directory after a failure, and want to do another run, then rename the directory. That's what you have to do with the main regression tests, too. My version also has the benefit of changing exactly 3 lines in the source :-) FWIW, I think we should keep the behavior of the TAP tests as close as possible to the established behavior of the main regression tests. It's certainly possible that there's room for improvement in that benchmark behavior. But let's first converge the test behaviors, and then have a discussion about whether we want changes, and if so change all the tests at the same time. Yeah. To do that we should probably stop using File::Temp to make the directory, and just use a hardcoded name given to File::Path::mkpath. If the directory exists, we'd just remove it first. That won't be a very big change - probably just a handful of lines. cheers andrew
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
Hello,

When we're in Phase 2 or 3, don't we need to report the number of total pages scanned, or the percentage of table pages scanned, as well?

The total heap pages scanned needs to be reported with phase 2 or 3. A complete progress report needs to have numbers from each phase when applicable.

Phase 1. Report 2 integer counters: heap pages scanned and total heap pages; 1 float counter: percentage_complete; and a progress message.
Phase 2. Report 2 integer counters: index pages scanned and total index pages (across all indexes), and a progress message.
Phase 3. 1 integer counter: heap pages vacuumed.

Sorry for being unclear here. What I meant to say is that each phase of a command will correspond to a slot in COMMAND_NUM_SLOTS. Each phase will be a separate array element and will comprise n integers, n floats, and a string. So, in the view reporting progress, the VACUUM command can have 3 entries, one for each phase.

Thank you, Rahila Syed
Re: [HACKERS] WIP: Rework access method interface
On Mon, Aug 10, 2015 at 6:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Petr Jelinek p...@2ndquadrant.com writes: On 2015-08-10 16:58, Alexander Korotkov wrote: That should work, thanks! Also we can have SQL-visible functions to get amsupport and amstrategies and use them in the regression tests. SQL-visible functions would be preferable to storing it in pg_am as keeping the params in pg_am would limit the extensibility of pg_am itself. I don't see any particularly good reason to remove amsupport and amstrategies from pg_am. Those are closely tied to the other catalog infrastructure for indexes (pg_amproc, pg_amop) which I don't think are candidates for getting changed by this patch. There are a couple of other pg_am columns, such as amstorage and amcanorderbyop, which similarly bear on what's legal to appear in related catalogs such as pg_opclass. I'd be sort of inclined to leave those in the catalog as well. I do not see that exposing a SQL function is better than exposing a catalog column; either way, that property is SQL-visible. That answers my question, thanks! -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Rework access method interface
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: I think that's likely for the best anyway; there are too many catalogs that think a pg_am OID identifies an index AM. Better to create other catalogs for other types of AMs. That means we won't be able to reuse pg_class.relam as a pointer to the AM-of-the-other-kind either. Hm. So one way or the other we're going to end up violating relational theory somewhere. OK, I yield: let's say that pg_am has amname, amkind, amhandler, and nothing else. Then we will need SQL functions to expose whatever information we think needs to be available to SQL code. regards, tom lane
Re: [HACKERS] checkpointer continuous flushing
Hello Andres, thanks for your comments. Some answers and new patches included.

+ /* + * Array of buffer ids of all buffers to checkpoint. + */ +static int *CheckpointBufferIds = NULL;

Should be at the beginning of the file. There's a bunch more cases of that.

Done.

+/* Compare checkpoint buffers + */ +static int bufcmp(const int * pa, const int * pb) +{ + BufferDesc + *a = GetBufferDescriptor(*pa), + *b = GetBufferDescriptor(*pb);

This definitely needs comments about ignoring the normal buffer header locking.

Added.

Why are we ignoring the database directory? I doubt it'll make a huge difference, but grouping metadata-affecting operations by directory helps.

I wanted to do the minimal comparisons needed to order buffers per file, so I skipped everything else. My idea of a checkpoint is a lot of data in a few files (at least compared to the data...), so I do not think that it is worth it. I may be proven wrong!

+static void +AllocateCheckpointBufferIds(void) +{ + /* Safe worst case allocation, all buffers belong to the checkpoint... +* that is pretty unlikely. +*/ + CheckpointBufferIds = (int *) palloc(sizeof(int) * NBuffers); +}

(wrong comment style...)

Fixed.

Heikki, you were concerned about the size of the allocation of this, right? I don't think it's relevant - we used to allocate an array of that size for the backend's private buffer pin array until 9.5, so in theory we should be safe against that. NBuffers is limited to INT_MAX/2 in guc.c, which ought to be sufficient?

I think that there is no issue with the current shared_buffers limit. I could allocate and use 4 GB on my laptop without problem. I added a cast to ensure that unsigned ints are used for the size computation.

+ /* + * Lazy allocation: this function is called through the checkpointer, + * but also by initdb. Maybe the allocation could be moved to the callers. + */ + if (CheckpointBufferIds == NULL) + AllocateCheckpointBufferIds(); +

I don't think it's a good idea to allocate this on every round.
That just means a lot of page table entries have to be built and torn down regularly. It's not like checkpoints only run for 1% of the time or such. Sure. It is not allocated on every round, it is allocated once on the first checkpoint, the variable tested is static. There is no free. Maybe the allocation could be moved to the callers, though. FWIW, I still think it's a much better idea to allocate the memory once in shared buffers. Hmmm. The memory does not need to be shared with other processes? It's not like that makes us need more memory overall, and it'll be huge page allocations if configured. I also think that sooner rather than later we're going to need more than one process flushing buffers, and then it'll need to be moved there. That is an argument. I think that it could wait for the need to actually arise. + /* +* Sort buffer ids to help find sequential writes. +* +* Note: buffers are not locked in anyway, but that does not matter, +* this sorting is really advisory, if some buffer changes status during +* this pass it will be filtered out later. The only necessary property +* is that marked buffers do not move elsewhere. +*/ That reasoning makes it impossible to move the fsyncing of files into the loop (whenever we move to a new file). That's not nice. I do not see why. Moving fsync ahead is definitely an idea that you already pointed out, I have given it some thought, and it would require a careful implementation and some restructuring. For instance, you do not want to issue fsync right after having done writes, you want to wait a little bit so that the system has time to write the buffers to disk. The formulation with necessary property doesn't seem very clear to me? Removed. How about: /* * Note: Buffers are not locked in any way during sorting, but that's ok: * A change in the buffer header is only relevant when it changes the * buffer's identity.
If the identity has changed it'll have been * written out by BufferAlloc(), so there's no need for checkpointer to * write it out anymore. The buffer might also get written out by a * backend or bgwriter, but that's equally harmless. */ This new version included. Also, qsort implementation +* should be resilient to occasional contradictions (cmp(a,b) != -cmp(b,a)) +* because of these possible concurrent changes. Hm. Is that actually the case for our qsort implementation? I think that it is hard to write a qsort which would fail that. That would mean that it would compare the same items twice, which would be inefficient. If the pivot element changes its identity won't the result be pretty much random? That would be a very unlikely event, given the short time spent in qsort. Anyway, this is not a problem, and is the beauty of the advisory sort:
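The grouping that this advisory sort is aiming for is easy to visualize. Below is a toy sketch in Python (field names are illustrative, not PostgreSQL's actual buffer-tag fields): sorting dirty buffers by (tablespace, relation, fork, block) puts neighboring blocks of the same relation file next to each other, which is all the ordering needs to achieve for the writes to become mostly sequential.

```python
# Toy model of the advisory checkpoint sort discussed above. Each dirty
# buffer is identified by (tablespace, relation, fork, block); the field
# names here are illustrative only.
buffers = [
    {"ts": 1663, "rel": 16384, "fork": 0, "block": 7},
    {"ts": 1663, "rel": 16385, "fork": 0, "block": 0},
    {"ts": 1663, "rel": 16384, "fork": 0, "block": 2},
    {"ts": 1663, "rel": 16384, "fork": 0, "block": 3},
]

def ckpt_key(buf):
    # The ordering is only advisory: it just needs to put blocks of the
    # same relation file next to each other, in ascending block order.
    return (buf["ts"], buf["rel"], buf["fork"], buf["block"])

ordered = sorted(buffers, key=ckpt_key)
blocks = [(b["rel"], b["block"]) for b in ordered]
print(blocks)  # [(16384, 2), (16384, 3), (16384, 7), (16385, 0)]
```

Even if a few entries end up misplaced because a buffer changed identity mid-sort, the result is still "mostly sequential", which is the only property the performance argument relies on.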
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
On 10 August 2015 at 18:02, Josh Berkus j...@agliodbs.com wrote: There's a lesser version of this item which remains relevant unless we implement (5). That is, currently the same autovacuum_vacuum_delay (AVVD) applies to regular autovacuums as does to anti-wraparound autovacuums. If the user has set AV with a high delay, this means that anti-wraparound AV may never complete. For that reason, we ought to have a separate parameter for AVVD, which defaults to a lower number (like 5ms), or even to zero. Of course, if we implement (5), that's not necessary, since AV will never trigger an anti-wraparound freeze. Good idea. Having a freeze map would be wholly unnecessary if we don't ever need to freeze whole tables again. Freezing would still be needed on individual blocks where an old row has been updated or deleted; a freeze map would not help there either. So there is no conflict, but options 2) and 3) are completely redundant if we go for 5). After investigation, I now think 5) is achievable in 9.6, but if I am wrong for whatever reason, we have 2) as a backstop. It's not redundant. Users may still want to freeze for two reasons: 1. to shrink the clog and multixact logs 2. to support INDEX-ONLY SCAN Freezing is not a necessary pre-condition for either of those things, I am happy to say. There is confusion here because for ( 1 ) the shrink was performed after freezing, but when you have access to the epoch there is no need for exhaustive freezing - only in special cases, as noted. If we are lucky those special cases will mean a massive reduction in I/O. For ( 2 ) a normal VACUUM is sufficient and as Robert observes, maybe just HOT is enough. In the new world, the clog can be shrunk when everything has been hinted. Given that is possible with just a normal VACUUM, I think the new anti-freeze design (hey, we need a name, right?) will mean the clog actually stays smaller in most cases than it does now.
In both of those cases, having a freeze map would speed up the manual vacuum freeze considerably. Otherwise, we're just punting on the problem, and making it worse for users who wait too long. There would be no further need for the VACUUM FREEZE command. It would do nothing desirable. Now, it might still be the case that the *overhead* of a freeze map is a bad tradeoff if we don't have to worry about forced wraparound. But that's a different argument. I myself was in favour of the freeze map solution for some time, but I'm not anymore. Thanks to discussions at PgCon slowly working their way into my brain. BTW, has it occurred to anyone that implementing XID epoch headers is going to mean messing with multixact logs again? Just thought I'd open a papercut and pour some lemon juice on it. I doubt we have seen the last of that pain, but it's not my fingers on the chopping board, so squeeze all you want. ;-) -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Commitfest remaining Needs Review items
* extends pgbench expressions with functions Robert signed up as reviewer for this a long time ago. Ping, do you still plan to review this? I seem to recall that I nominated Robert when adding the patch, because this is an extension of his expression patch. So the sign up is really just a hint. * checkpoint continuous flushing Per discussion on that thread, let's just do the sorting part now. Needs some cleanup, but it's almost there. Given the performance improvements (yes some tps, but also latency), I think that the full patch deserves consideration. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpointer continuous flushing
Ok ok, I stop resisting... I'll have a look. Here is a v7 ab version which uses shared memory instead of palloc. -- Fabien.diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index e900dcc..1cec243 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2454,6 +2454,28 @@ include_dir 'conf.d' /listitem /varlistentry + varlistentry id=guc-checkpoint-sort xreflabel=checkpoint_sort + termvarnamecheckpoint_sort/varname (typebool/type) + indexterm + primaryvarnamecheckpoint_sort/ configuration parameter/primary + /indexterm + /term + listitem + para +Whether to sort buffers before writting them out to disk on checkpoint. +For a HDD storage, this setting allows to group together +neighboring pages written to disk, thus improving performance by +reducing random write activity. +This sorting should have limited performance effects on SSD backends +as such storages have good random write performance, but it may +help with wear-leveling so be worth keeping anyway. +The default is literalon/. +This parameter can only be set in the filenamepostgresql.conf/ +file or on the server command line. + /para + /listitem + /varlistentry + varlistentry id=guc-checkpoint-warning xreflabel=checkpoint_warning termvarnamecheckpoint_warning/varname (typeinteger/type) indexterm diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml index e3941c9..f538698 100644 --- a/doc/src/sgml/wal.sgml +++ b/doc/src/sgml/wal.sgml @@ -546,6 +546,18 @@ /para para + When hard-disk drives (HDD) are used for terminal data storage + xref linkend=guc-checkpoint-sort allows to sort pages + so that neighboring pages on disk will be flushed together by + chekpoints, reducing the random write load and improving performance. + If solid-state drives (SSD) are used, sorting pages induces no benefit + as their random write I/O performance is good: this feature could then + be disabled by setting varnamecheckpoint_sort/ to valueoff/. 
+ It is possible that sorting may help with SSD wear leveling, so it may + be kept on that account. + /para + + para The number of WAL segment files in filenamepg_xlog/ directory depends on varnamemin_wal_size/, varnamemax_wal_size/ and the amount of WAL generated in previous checkpoint cycles. When old log diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 68e33eb..bee38ab 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7995,11 +7995,13 @@ LogCheckpointEnd(bool restartpoint) sync_secs, total_secs, longest_secs, +sort_secs, average_secs; int write_usecs, sync_usecs, total_usecs, longest_usecs, +sort_usecs, average_usecs; uint64 average_sync_time; @@ -8030,6 +8032,10 @@ LogCheckpointEnd(bool restartpoint) CheckpointStats.ckpt_end_t, total_secs, total_usecs); + TimestampDifference(CheckpointStats.ckpt_sort_t, + CheckpointStats.ckpt_sort_end_t, + sort_secs, sort_usecs); + /* * Timing values returned from CheckpointStats are in microseconds. * Convert to the second plus microsecond form that TimestampDifference @@ -8048,8 +8054,8 @@ LogCheckpointEnd(bool restartpoint) elog(LOG, %s complete: wrote %d buffers (%.1f%%); %d transaction log file(s) added, %d removed, %d recycled; - write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; - sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; + sort=%ld.%03d s, write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; + sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; distance=%d kB, estimate=%d kB, restartpoint ? 
restartpoint : checkpoint, CheckpointStats.ckpt_bufs_written, @@ -8057,6 +8063,7 @@ LogCheckpointEnd(bool restartpoint) CheckpointStats.ckpt_segs_added, CheckpointStats.ckpt_segs_removed, CheckpointStats.ckpt_segs_recycled, + sort_secs, sort_usecs / 1000, write_secs, write_usecs / 1000, sync_secs, sync_usecs / 1000, total_secs, total_usecs / 1000, diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c index 3ae2848..ec2436f 100644 --- a/src/backend/storage/buffer/buf_init.c +++ b/src/backend/storage/buffer/buf_init.c @@ -65,7 +65,8 @@ void InitBufferPool(void) { bool foundBufs, -foundDescs; +foundDescs, +foundCpid; /* Align descriptors to a cacheline boundary. */ BufferDescriptors = (BufferDescPadded *) CACHELINEALIGN( @@ -77,10 +78,14 @@ InitBufferPool(void) ShmemInitStruct(Buffer Blocks, NBuffers * (Size) BLCKSZ, foundBufs); - if (foundDescs || foundBufs) + CheckpointBufferIds = (int *) + ShmemInitStruct(Checkpoint BufferIds, + NBuffers * sizeof(int), foundCpid); + + if (foundDescs || foundBufs || foundCpid) { - /* both should be
Re: [HACKERS] checkpointer continuous flushing
On 2015-08-10 19:07:12 +0200, Fabien COELHO wrote: I think that there is no issue with the current shared_buffers limit. I could allocate and use 4 GB on my laptop without problem. I added a cast to ensure that unsigned int are used for the size computation. You can't allocate 4GB with palloc(), it has a builtin limit against allocating more than 1GB. + /* + * Lazy allocation: this function is called through the checkpointer, + * but also by initdb. Maybe the allocation could be moved to the callers. + */ + if (CheckpointBufferIds == NULL) + AllocateCheckpointBufferIds(); + I don't think it's a good idea to allocate this on every round. That just means a lot of page table entries have to be built and torn down regularly. It's not like checkpoints only run for 1% of the time or such. Sure. It is not allocated on every round, it is allocated once on the first checkpoint, the variable tested is static. There is no free. Maybe the allocation could be moved to the callers, though. Well, then every time the checkpointer is restarted. FWIW, I still think it's a much better idea to allocate the memory once in shared buffers. Hmmm. The memory does not need to be shared with other processes? The point is that it's done at postmaster startup, and we're pretty much guaranteed that the memory will be available. It's not like that makes us need more memory overall, and it'll be huge page allocations if configured. I also think that sooner rather than later we're going to need more than one process flushing buffers, and then it'll need to be moved there. That is an argument. I think that it could wait for the need to actually arise. Huge pages are used today. + /* +* Sort buffer ids to help find sequential writes. +* +* Note: buffers are not locked in anyway, but that does not matter, +* this sorting is really advisory, if some buffer changes status during +* this pass it will be filtered out later. The only necessary property +* is that marked buffers do not move elsewhere.
+*/ That reasoning makes it impossible to move the fsyncing of files into the loop (whenever we move to a new file). That's not nice. I do not see why. Because it means that the sorting isn't necessarily correct. I.e. we can't rely on it to determine whether a file has already been fsynced. Also, qsort implementation +* should be resilient to occasional contradictions (cmp(a,b) != -cmp(b,a)) +* because of these possible concurrent changes. Hm. Is that actually the case for our qsort implementation? I think that it is hard to write a qsort which would fail that. That would mean that it would compare the same items twice, which would be inefficient. What? The same two elements aren't frequently compared pairwise with each other, but of course an individual element is frequently compared with other elements. Consider what happens when the chosen pivot element changes its identity after already dividing half the array. The two partitions will not be divided in any meaningful way anymore. I don't see how this will result in a meaningful sort. If the pivot element changes its identity won't the result be pretty much random? That would be a very unlikely event, given the short time spent in qsort. Meh, we don't want to rely on likeliness on such things. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
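Andres's 1 GB point from earlier in this message can be checked with quick arithmetic: palloc() rejects requests above MaxAllocSize (0x3fffffff, one byte under 1 GiB; value taken from PostgreSQL's memutils.h, the rest is arithmetic). A 4-bytes-per-buffer sort array therefore hits that ceiling once shared_buffers approaches roughly 2 TB at the default 8 kB block size. A sketch in Python:

```python
# Why a lazily palloc'd int-per-buffer array runs into trouble at large
# shared_buffers: palloc() refuses any single request above MaxAllocSize.
MAX_ALLOC_SIZE = 0x3FFFFFFF   # palloc's ceiling, just under 1 GiB
BLCKSZ = 8192                 # default PostgreSQL block size
ENTRY = 4                     # sizeof(int): one array slot per buffer

def fits_in_palloc(nbuffers):
    # Would an int array with one entry per buffer be palloc-able?
    return nbuffers * ENTRY <= MAX_ALLOC_SIZE

small = (4 * 1024**3) // BLCKSZ   # 4 GiB shared_buffers -> 2 MiB array
huge = (4 * 1024**4) // BLCKSZ    # 4 TiB shared_buffers -> 2 GiB array
print(fits_in_palloc(small), fits_in_palloc(huge))  # True False
```

So the 4 GB array Fabien tested would correspond to a shared-memory allocation, not a palloc'd one, which is part of why the shared-buffers placement keeps coming up.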
Re: [HACKERS] checkpointer continuous flushing
Hello Andres, You can't allocate 4GB with palloc(), it has a builtin limit against allocating more than 1GB. Argh, too bad, I assumed very naively that palloc was malloc in disguise. [...] Well, then every time the checkpointer is restarted. Hm... The point is that it's done at postmaster startup, and we're pretty much guaranteed that the memory will be available. Ok ok, I stop resisting... I'll have a look. Would it also fix the 1 GB palloc limit on the same go? I guess so... That reasoning makes it impossible to move the fsyncing of files into the loop (whenever we move to a new file). That's not nice. I do not see why. Because it means that the sorting isn't necessarily correct. I.e. we can't rely on it to determine whether a file has already been fsynced. Ok, I understand your point. Then the file would be fsynced twice: if the fsync is done properly (data have already been flushed to disk) then it would not cost much, and doing it sometimes twice on some file would not be a big issue. The code could also detect such an event and log a warning, which would give a hint about how often it occurs in practice. Hm. Is that actually the case for our qsort implementation? I think that it is hard to write a qsort which would fail that. That would mean that it would compare the same items twice, which would be inefficient. What? The same two elements aren't frequently compared pairwise with each other, but of course an individual element is frequently compared with other elements. Sure. Consider what happens when the chosen pivot element changes its identity after already dividing half the array. The two partitions will not be divided in any meaningful way anymore. I don't see how this will result in a meaningful sort. It would be partly meaningful, which is enough for performance, and does not matter for correctness: currently buffers are not sorted at all and it works, even if it does not work well. If the pivot element changes its identity won't the result be pretty much random?
That would be a very unlikely event, given the short time spent in qsort. Meh, we don't want to rely on likeliness on such things. My main argument is that even if it occurs, and the qsort result is partly wrong, it does not change correctness, it just means that the actual writes will be less in order than wished. If it occurs, one pivot separation would be quite strange, but then the others would be right, so the buffers would be partly sorted. Another issue I see is that even if buffers are locked within cmp, the status may change between two cmp... I do not think that locking all buffers for sorting them is an option. So on the whole, I think that locking buffers for sorting is probably not possible with the simple (and efficient) lightweight approach used in the patch. The good news, as I argued before, is that the order is only advisory to help with performance, but the correctness is really that all checkpoint buffers are written and fsync is called in the end, and does not depend on the buffer order. That is how it currently works anyway. If you block on this then I'll go with a heavyweight approach, but that would be a waste of memory in my opinion, hence my argumentation for the lightweight approach. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
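For what it's worth, one way to sidestep the comparator-stability question entirely (a sketch of an alternative, not what the posted patch does) is to snapshot each buffer's identity into the to-be-sorted array before calling qsort, so the comparator only ever reads immutable copies and cmp(a,b) == -cmp(b,a) holds throughout. Illustrated in Python with hypothetical field layouts:

```python
# Sketch of a "snapshot before sort" variant: copy each buffer's identity
# into the sort array once, so concurrent changes to the live buffer
# headers can no longer affect comparisons mid-sort. Names are hypothetical.
live_buffers = {
    0: (16384, 0, 9),   # buf_id -> (relation, fork, block)
    1: (16384, 0, 1),
    2: (16385, 0, 4),
}

# Take the snapshot once, before sorting begins.
snapshot = [(key, buf_id) for buf_id, key in live_buffers.items()]

# A concurrent backend now recycles buffer 0 for a different page...
live_buffers[0] = (99999, 0, 0)

# ...but the sort is unaffected: it orders by the snapshotted keys.
snapshot.sort()
order = [buf_id for _, buf_id in snapshot]
print(order)  # [1, 0, 2]
```

The cost is the extra per-entry copy (a few more bytes per buffer in the sort array), which is the memory trade-off Fabien alludes to above.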
[HACKERS] linux sparc compile issue
Hi, while doing regular builds via buildroot autobuilders a compile problem for sparc 32bit v8 was found. It seems the defines for Linux are other than for Solaris. Following patch fixes it for buildroot: The gcc predefines for Linux are __sparc_v8__/__sparc_v7__ Signed-off-by: Waldemar Brodkorb w...@openadk.org diff -Nur postgresql-9.4.4.orig/src/include/storage/s_lock.h postgresql-9.4.4/src/include/storage/s_lock.h --- postgresql-9.4.4.orig/src/include/storage/s_lock.h 2015-06-09 21:29:38.0 +0200 +++ postgresql-9.4.4/src/include/storage/s_lock.h 2015-08-09 19:57:06.0 +0200 @@ -420,12 +420,12 @@ : =r(_res), +m(*lock) : r(lock) : memory); -#if defined(__sparcv7) +#if defined(__sparcv7) || defined(__sparc_v7__) /* * No stbar or membar available, luckily no actually produced hardware * requires a barrier. */ -#elif defined(__sparcv8) +#elif defined(__sparcv8) || defined(__sparc_v8__) /* stbar is available (and required for both PSO, RMO), membar isn't */ __asm__ __volatile__ (stbar \n:::memory); #else @@ -438,13 +438,13 @@ return (int) _res; } -#if defined(__sparcv7) +#if defined(__sparcv7) || defined(__sparc_v7__) /* * No stbar or membar available, luckily no actually produced hardware * requires a barrier. */ #define S_UNLOCK(lock) (*((volatile slock_t *) (lock)) = 0) -#elif defined(__sparcv8) +#elif defined(__sparcv8) || defined(__sparc_v8__) /* stbar is available (and required for both PSO, RMO), membar isn't */ #define S_UNLOCK(lock) \ do \ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
Simon, Thank you for this summary! I was losing track, myself. On 08/09/2015 11:03 PM, Simon Riggs wrote: Freezing is painful for VLDBs and high transaction rate systems. We have a number of proposals to improve things... 3. Speed up autovacuums when they are triggered to avoid wraparounds (Simon) Idea is to do a VACUUM scan which only freezes tuples. If we dirty a page from freezing we then also prune it, but don't attempt to scan indexes to remove the now-truncated dead tuples. This looks very straightforward, no technical issues. Might even be able to backpatch it. [patch investigated but not finished yet] There's a lesser version of this item which remains relevant unless we implement (5). That is, currently the same autovacuum_vacuum_delay (AVVD) applies to regular autovacuums as does to anti-wraparound autovacuums. If the user has set AV with a high delay, this means that anti-wraparound AV may never complete. For that reason, we ought to have a separate parameter for AVVD, which defaults to a lower number (like 5ms), or even to zero. Of course, if we implement (5), that's not necessary, since AV will never trigger an anti-wraparound freeze. Having a freeze map would be wholly unnecessary if we don't ever need to freeze whole tables again. Freezing would still be needed on individual blocks where an old row has been updated or deleted; a freeze map would not help there either. So there is no conflict, but options 2) and 3) are completely redundant if we go for 5). After investigation, I now think 5) is achievable in 9.6, but if I am wrong for whatever reason, we have 2) as a backstop. It's not redundant. Users may still want to freeze for two reasons: 1. to shrink the clog and multixact logs 2. to support INDEX-ONLY SCAN In both of those cases, having a freeze map would speed up the manual vacuum freeze considerably. Otherwise, we're just punting on the problem, and making it worse for users who wait too long.
Now, it might still be the case that the *overhead* of a freeze map is a bad tradeoff if we don't have to worry about forced wraparound. But that's a different argument. BTW, has it occurred to anyone that implementing XID epoch headers is going to mean messing with multixact logs again? Just thought I'd open a papercut and pour some lemon juice on it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] A \pivot command for psql
Tom Lane wrote: I'm not sure how pushing it out to psql makes that better. There is no way to do further processing on something that psql has printed, so you've punted on solving that issue just as much if not more. It's the same spirit as \x : the only thing it achieves is better readability in certain cases, I don't expect more from it. But I admit that \pivot would be much more of a niche use case than \x. Besides which, psql is not all that great for looking at very wide tables, so I'm not sure that this would be very useful for dynamic column sets anyway. I use \x all the time with wide tables, and \x and \pivot happen to play out well together. (In fact I was pleasantly surprised by this.) Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_dump and search_path
The earliest reference I found to this issue is here http://postgresql.nabble.com/set-search-path-in-dump-output-considered-harmful-td1947594.html and refers to the search_path being arbitrarily set in the file created by pg_dump. This is apparently still the case in 9.4. I found this issue because I use SERIAL/BIGSERIAL columns and when I created schema-specific tables in a schema other than the first listed in search_path the nextval() sequence references were schema-qualified. When I created a backup file with pg_dump and then restored using psql, the nextval() sequence references were no longer schema-qualified because the backup file set my table schema as the first schema in search_path. I saw the same result with pg_restore. While the result of \d testschema.testtable shows the schema-qualified sequence name in nextval():

\d testschema.testtable;
                          Table "testschema.testtable"
 Column |  Type   |                             Modifiers
--------+---------+-------------------------------------------------------------------
 id     | integer | not null default nextval('testschema.testtable_id_seq'::regclass)

The actual default read from pg_attrdef does not:

SELECT a.attnum, n.nspname, c.relname, d.adsrc AS default_value
FROM pg_attribute AS a
JOIN pg_class AS c ON a.attrelid = c.oid
JOIN pg_namespace AS n ON c.relnamespace = n.oid
LEFT JOIN pg_attrdef AS d ON d.adrelid = c.oid AND d.adnum = a.attnum
WHERE a.attnum > 0 AND n.nspname = 'testschema' AND c.relname = 'testtable';

 attnum |  nspname   |  relname  |             default_value
--------+------------+-----------+---------------------------------------
      1 | testschema | testtable | nextval('testtable_id_seq'::regclass)
      2 | testschema | testtable |

This inconsistency is described here http://dba.stackexchange.com/questions/21150/default-value-of-serial-fields-changes-after-restore . This is not a documented behavior; at least I couldn't find it, and I searched quite a bit. There was no indication to me that when I run pg_dump it will do something more than I asked it to do and it took me a while to figure out why.
I solved the problem by setting the search_path as pg_dump does when creating the database so now the restore does not create a different database than I did. Certainly it would seem a bug that \d and a direct read from pg_attrdef give different results even though pg_dump determining on its own what the search_path should be is no doubt an intended behavior. But it seems to me this should be an option. I expected pg_dump to do what I asked it to do and when it did something other than that it was quite a headache. What's more, I like schema-qualified references. Schemas are an effective database organization tool and I teach my people to use them and not depend on the search path as doing so leads to sloppy and inconsistent thinking as well as coding. Please consider making the arbitrary determination of search_path by pg_dump an optional behavior. Or better yet, just have it generate a backup that accurately reflects the database it is backing up. BTW, I am a huge fan of PostgreSQL. Cheers!
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Thu, Oct 30, 2014 at 5:30 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Oct 30, 2014 at 7:30 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: + { + {pending_list_cleanup_size, PGC_USERSET, CLIENT_CONN_STATEMENT, + gettext_noop(Sets the maximum size of the pending list for GIN index.), +NULL, + GUC_UNIT_KB + }, + pending_list_cleanup_size, + 4096, 0, MAX_KILOBYTES, + NULL, NULL, NULL + }, ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No? Yes if the pending list always exists in the memory. But not, IIUC. Thought? Also why not set min to 64, not to 0, in accordance with that of work_mem? I'm OK to use 64. But I just chose 0 because I could not think of any reasonable reason why 64k is suitable as the minimum size of the pending list. IOW, I have no idea about whether it's reasonable to use the min value of work_mem as the min size of the pending list. I know I am late to the party here, but would like to have the minimum be 0, not 64. As long as by zero, it means it doesn't insert anything into the pending list, rather than inserting and promptly cleaning it up. The reason for this is that if I am trying to decide what pending_list_cleanup_size I want to set for the index in the indexes storage parameters, the way to find that out is try a bunch of different ones through the guc while the index is still at the default of no overriding storage parameter. It would be nice to try the fastupdate=off alternative (i.e. the same as pending_list_cleanup_size=0) without having to change the index itself and change the syntax used in the testing. Cheers, Jeff
Re: [HACKERS] [patch] A \pivot command for psql
David Fetter wrote: I'm working up a proposal to add (UN)PIVOT support to the back-end. I was under the impression that a server-side PIVOT *with dynamic columns* was just unworkable as an SQL query, because it couldn't be prepared if it existed. Am I wrong on that? I feel like you guys are all telling me that \pivot should happen on the server, but the point that it would not be realistic in the first place is not being considered. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace
On Fri, Jun 19, 2015 at 07:10:40PM +0300, Marti Raudsepp wrote: Hi list Sorry I am just getting this report. Thanks to the people who stalled for me. One of my databases failed to upgrade successfully and produced this error in the copying phase: error while copying relation pg_catalog.pg_largeobject (/srv/ssd/PG_9.3_201306121/1/12023 to /PG_9.4_201409291/1/12130): No such file or directory As with all good bug reports, there are multiple bugs here. ;-) Notice the second path has an invalid prefix: /PG_9.4_201409291/1/12130 --- obviously something is seriously wrong. The failure of pg_dumpall to move 'postgres' and 'template1' databases to the new tablespace cannot explain that, i.e. I could understand pg_upgrade looking for pg_largeobject in the default data directory, or in the new one, but not in a path that doesn't even exist. pg_upgrade tracks old and new tablespaces separately, so there is obviously something wrong. And I found it, patch attached. The pg_upgrade code was testing for the tablespace of the _old_ object, then assigning the old and _new_ tablespaces based on that. The first patch fixes that, and should be backpatched to all supported pg_upgrade versions. Turns out this happens when either the postgres or template1 databases have been moved to a non-default tablespace. pg_dumpall does not dump attributes (such as tablespace) for these databases; pg_upgrade queries the new cluster about the tablespace for these relations and builds a broken destination path for the copy/link operation. The least bad solution seems to be generating ALTER DATABASE SET TABLESPACE commands for these from pg_dumpall. Previously a --globals-only dump didn't generate psql \connect commands, but you can't run SET TABLESPACE from within the same database you're connected to. So to move postgres, it needs to connect to template1 and vice versa. That seems fine for the purpose of pg_upgrade which can assume a freshly created cluster with both databases intact.
Yes, seems like a good solution. I have implemented a proof of concept patch for this. Currently I'm only tackling the binary upgrade failure and not general pg_dumpall. Alternatively, we could allow SET TABLESPACE in the current database, which seems less ugly to me. A code comment says Obviously can't move the tables of my own database, but it's not obvious to me why. If I'm the only connected backend, it seems that any caches and open files could be invalidated. But I don't know how big of an undertaking that would be. Your submitted patch, attached, also looks good, and should be backpatched. My only question is whether this should be for all runs of pg_dumpall, not just in binary upgrade mode. Comments? Once we agree on this, I will apply these to all back branches and run tests by moving template1 before the upgrade to make sure it works for all PG versions. pg_upgrade) misses: * Nothing at all is dumped for the template0 database, although ACLs, settings and the tablespace can be changed by the user Uh, we _assume_ no one is connecting to template0, but you are right people can change things _about_ template0. It would be nice to preserve those. * template1 postgres databases retain ACLs and settings, but not attributes like TABLESPACE or CONNECTION LIMIT. Other attributes like LC_COLLATE and LC_CTYPE can't be changed without recreating the DB, but those don't matter for the pg_upgrade case anyway. It seems those would be good material for another patch? Agreed. I am not sure how we have gotten this far with so few complaints about this problem. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god.
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
new file mode 100644
index e158c9f..390fc62
*** a/src/bin/pg_upgrade/info.c
--- b/src/bin/pg_upgrade/info.c
*** create_rel_filename_map(const char *old_
*** 140,145 ****
--- 140,146 ----
  	const RelInfo *old_rel, const RelInfo *new_rel,
  	FileNameMap *map)
  {
+ 	/* Someday the old/new tablespaces might not match, so handle it. */
  	if (strlen(old_rel->tablespace) == 0)
  	{
  		/*
*** create_rel_filename_map(const char *old_
*** 147,162 ****
  		 * exist in the data directories.
  		 */
  		map->old_tablespace = old_data;
- 		map->new_tablespace = new_data;
  		map->old_tablespace_suffix = "/base";
- 		map->new_tablespace_suffix = "/base";
  	}
  	else
  	{
  		/* relation belongs to a tablespace, so use the tablespace location */
  		map->old_tablespace = old_rel->tablespace;
- 		map->new_tablespace = new_rel->tablespace;
  		map->old_tablespace_suffix = old_cluster.tablespace_suffix;
  		map->new_tablespace_suffix = new_cluster.tablespace_suffix;
  	}
--- 148,171 ----
  		 * exist in the data directories.
  		 */
  		map->old_tablespace = old_data;
  		map->old_tablespace_suffix = "/base";
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
On 08/10/2015 10:31 AM, Simon Riggs wrote: Freezing is not a necessary pre-condition for either of those things, I am happy to say. There is confusion here because for ( 1 ) the shrink was performed after freezing, but when you have access to the epoch there is no need for exhaustive freezing - only in special cases, as noted. If we are lucky those special cases will mean a massive reduction in I/O. For ( 2 ) a normal VACUUM is sufficient and as Robert observes, maybe just HOT is enough. Yeah, saw your explanation on this on the other thread. Good point. Question: does regular vacuum update the visibility map in the same way vacuum freeze does? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Rework access method interface
On 2015-08-10 16:58, Alexander Korotkov wrote: On Mon, Aug 10, 2015 at 5:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov a.korot...@postgrespro.ru writes: On Mon, Aug 10, 2015 at 1:12 PM, Petr Jelinek p...@2ndquadrant.com wrote: I don't understand this, there is already AmRoutine in RelationData, why the need for an additional field for just amsupport? We need amsupport in load_relcache_init_file() which reads pg_internal.init. I'm not sure this is the correct place to call am_handler. It should work in the case of a built-in AM. But if the AM is defined in an extension then we wouldn't be able to do a catalog lookup for am_handler at this stage of initialization. This is an issue we'll have to face before there's much hope of having index AMs as extensions: how would you locate any extension function without catalog access? Storing raw function pointers in pg_internal.init is not an answer in an ASLR world. I think we can dodge the issue so far as pg_internal.init is concerned by decreeing that system catalogs can only have indexes with built-in AMs. Calling a built-in function doesn't require catalog access, so there should be no problem with re-calling the handler function by OID during load_relcache_init_file(). That should work, thanks! Also we can have SQL-visible functions to get amsupport and amstrategies and use them in the regression tests. SQL-visible functions would be preferable to storing it in pg_am, as keeping the params in pg_am would limit the extensibility of pg_am itself. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
On Tue, Aug 11, 2015 at 12:20 AM, Simon Riggs si...@2ndquadrant.com wrote: On 10 August 2015 at 15:59, Syed, Rahila rahila.s...@nttdata.com wrote: Hello, When we're in Phase 2 or 3, don't we need to report the number of total pages scanned, or the percentage of table pages scanned, as well? The total heap pages scanned needs to be reported with phase 2 or 3. A complete progress report needs to have numbers from each phase when applicable. Phase 1. Report 2 integer counters: heap pages scanned and total heap pages, 1 float counter: percentage_complete, and a progress message. Phase 2. Report 2 integer counters: index pages scanned and total index pages (across all indexes), and a progress message. Phase 3. 1 integer counter: heap pages vacuumed. Sorry for being unclear here. What I meant to say is, each phase of a command will correspond to a slot in COMMAND_NUM_SLOTS. Each phase will be a separate array element and will comprise n integers, n floats, and a string. So, in the view reporting progress, the VACUUM command can have 3 entries, one for each phase. VACUUM has 3 phases now, but since phases 2 and 3 repeat, you can have an unbounded number of phases. But that assumes that we don't count truncation as a 4th phase of VACUUM... Yeah. This topic may have already been discussed, but why don't we use just total scanned pages and total pages? The mechanism of VACUUM is a bit complicated today, and other maintenance commands are as well. It would be tough to trace this processing, and it might change in the future. But basically, we can trace the total scanned pages of the target relation easily, and such information would be enough in many cases. Regards, -- Masahiko Sawada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN pageinspect functions
On Fri, Nov 21, 2014 at 2:04 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 11/20/2014 05:52 AM, Michael Paquier wrote: On Wed, Nov 19, 2014 at 7:01 AM, Peter Geoghegan p...@heroku.com wrote: On Tue, Nov 4, 2014 at 7:26 AM, Amit Kapila amit.kapil...@gmail.com wrote: 1. Documentation seems to be missing; other APIs exposed via pageinspect are documented at: http://www.postgresql.org/docs/devel/static/pageinspect.html Done. 2.
+CREATE FUNCTION gin_metapage(IN page bytea,
+    OUT pending_head bigint,
+    OUT pending_tail bigint,
+    OUT version int4)
+AS 'MODULE_PATHNAME', 'gin_metapage'
+LANGUAGE C STRICT;
a. Isn't it better to name the function as gin_metap(..), similar to the existing function bt_metap(..)? I actually liked gin_metapage_info more, a name similar to the newly-introduced BRIN indexes. b. Can this function have a similar signature as bt_metap(), which means it should take input as relname? That's mostly a matter of taste, but I think we should definitely pass a raw page to it as it is now. This has the advantage of adding an extra check of whether the page passed is really a meta page or not, something useful for development. 3. Can the gin_dataleafpage() API have a similar name and signature as the API bt_page_items() exposed for btree? What about gin_leafpage_items then? The signature of bt_page_items() isn't a good example to follow. It existed before the get_raw_page() function, and the other functions that are designed to work with that, was added. The gin_leafpage_items() name seems fine to me. When I call gin_leafpage_items on a leaf page, I get the ERROR:
ERROR:  input page is not a compressed GIN data leaf page
DETAIL:  Flags 0002, expected 0083
I don't know why it won't work on an uncompressed leaf page (or for that matter, why my index pages are not compressed), but the docs should probably note the restriction. 
Cheers, Jeff pageinspect_gin_leaf.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE POLICY and RETURNING
Zhaomo, * Zhaomo Yang (zmp...@gmail.com) wrote: This thread has a pretty thorough discussion of the pros and cons of applying the SELECT policy to other commands. Besides what has been mentioned, I think there is another potential pro: we can enable references to the pseudorelations OLD and NEW in predicates. Now, for UPDATE, the references to the target table in the USING clause are actually references to OLD and the references in the WITH CHECK clause are references to NEW. Logically now USING and WITH CHECK are combined by AND, so we cannot have predicates like For my part, I find that the simplicity of having USING only ever refer to existing records and WITH CHECK only ever refer to records being added to be good and I'm concerned that this approach would be confusing. If no NEW or OLD is used, what happens? Or would you have to always specify OLD/NEW for UPDATE, and then what about for the other policies, and the FOR ALL policies? foo(NEW) > 1 OR bar(OLD) > 1 (combine predicates referencing OLD and NEW by an operation other than AND) Your statement that this can't be done with the existing policy approach is incorrect, or I've misunderstood what you mean above. This could be accomplished with USING (bar > 1) and WITH CHECK (foo > 1), no? Your sentence above that USING and WITH CHECK are combined by AND isn't correct either - they're independent and are therefore really OR'd. If they were AND'd then the new record would have to pass both the USING and WITH CHECK policies. NEW.id <> OLD.id (reference both in the same expression) Here you're correct that this isn't something the existing approach to UPDATE policies can support. I don't intend to simply punt on this, but I will point out that this particular requirement can be handled in a trigger, if desired. Further, I'm not sure that I see how this would work in a case where you have the SELECT policy (which clearly could only refer to OLD) applied first, as you suggest? 
If we apply SELECT policy to other commands, we only need one predicate for INSERT/UPDATE/DELETE. That predicate can reference to OLD and NEW, just like predicates for triggers and rules. For UPDATE and DELETE, the predicate of SELECT will be applied first (when the target table is scanned) to ensure no information leakage and their own predicate will be applied later. This doesn't change much for INSERT and DELETE, but it gives users more flexibility when they set predicates for UPDATE. OLD and NEW are only applicable for the UPDATE case and requiring those to be used for the other policies is adding complexity for a pretty narrow use-case, as is combining the SELECT USING policy with the USING (or any) policy for the other commands. Further, it clearly reduces the range of options for UPDATE as it means that you can't do blind updates or deletes. Perhaps that's sensible, but we could do that by simply AND'ing the SELECT USING policy with the other command USING policy, but Dean was against that idea up-thread, as am I, because it adds complexity, and it would further mean that neither of your suggested predicates above would be supported, as I explained above. Also, as far as I see, none of this really amounts to anything different with regard to RETURNING than the previous proposal of simply applying the SELECT USING policy to the records first, and the records returned could still not be allowed by the SELECT policy as they would be the results of the update, which could still pass the UPDATE policy for new records. Applying the SELECT USING policy against the RETURNING records strikes me, more and more, as just complexity which will cause more confusion than it helps anyone. We definitely need to explain how the USING clause works for the commands and how that impacts RETURNING, as I've mentioned in the 9.5 open items list, and I'm planning to tackle that this week. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] checkpointer continuous flushing
Hi, On 2015-08-08 20:49:03 +0300, Heikki Linnakangas wrote: I ripped out the flushing part, keeping only the sorting. I refactored the logic in BufferSync() a bit. There's now a separate function, nextCheckpointBuffer(), that returns the next buffer ID from the sorted list. The tablespace-parallelization behaviour is encapsulated there, keeping the code in BufferSync() much simpler. See attached. Needs some minor cleanup and commenting still before committing, and I haven't done any testing besides a simple make check. Thought it'd be useful to review the current version as well. Some of what I'm commenting on you'll probably already have thought of under the label of minor cleanup.
+/*
+ * Array of buffer ids of all buffers to checkpoint.
+ */
+static int *CheckpointBufferIds = NULL;
+
+/* Compare checkpoint buffers
+ */
Should be at the beginning of the file. There's a bunch more cases of that.
+/* Compare checkpoint buffers
+ */
+static int bufcmp(const int * pa, const int * pb)
+{
+	BufferDesc
+		*a = GetBufferDescriptor(*pa),
+		*b = GetBufferDescriptor(*pb);
+
+	/* tag: rnode, forkNum (different files), blockNum
+	 * rnode: { spcNode (ignore: not really needed),
+	 *          dbNode (ignore: this is a directory), relNode }
+	 * spcNode: table space oid, note that there are at least two
+	 * (pg_global and pg_default).
+	 */
+	/* compare relation */
+	if (a->tag.rnode.spcNode < b->tag.rnode.spcNode)
+		return -1;
+	else if (a->tag.rnode.spcNode > b->tag.rnode.spcNode)
+		return 1;
+	if (a->tag.rnode.relNode < b->tag.rnode.relNode)
+		return -1;
+	else if (a->tag.rnode.relNode > b->tag.rnode.relNode)
+		return 1;
+	/* same relation, compare fork */
+	else if (a->tag.forkNum < b->tag.forkNum)
+		return -1;
+	else if (a->tag.forkNum > b->tag.forkNum)
+		return 1;
+	/* same relation/fork, so same segmented file, compare block number
+	 * which are mapped on different segments depending on the number.
+	 */
+	else if (a->tag.blockNum < b->tag.blockNum)
+		return -1;
+	else /* should not be the same block anyway...
	 */
+		return 1;
+}
This definitely needs comments about ignoring the normal buffer header locking. Why are we ignoring the database directory? I doubt it'll make a huge difference, but grouping metadata-affecting operations by directory helps.
+
+static void
+AllocateCheckpointBufferIds(void)
+{
+	/* Safe worst case allocation, all buffers belong to the checkpoint...
+	 * that is pretty unlikely.
+	 */
+	CheckpointBufferIds = (int *) palloc(sizeof(int) * NBuffers);
+}
(wrong comment style...) Heikki, you were concerned about the size of the allocation of this, right? I don't think it's relevant - we used to allocate an array of that size for the backend's private buffer pin array until 9.5, so in theory we should be safe against that. NBuffers is limited to INT_MAX/2 in guc.c, which ought to be sufficient?
+	/*
+	 * Lazy allocation: this function is called through the checkpointer,
+	 * but also by initdb. Maybe the allocation could be moved to the callers.
+	 */
+	if (CheckpointBufferIds == NULL)
+		AllocateCheckpointBufferIds();
+
I don't think it's a good idea to allocate this on every round. That just means a lot of page table entries have to be built and torn down regularly. It's not like checkpoints only run for 1% of the time or such. FWIW, I still think it's a much better idea to allocate the memory once in shared buffers. It's not like that makes us need more memory overall, and it'll be huge page allocations if configured. I also think that sooner rather than later we're going to need more than one process flushing buffers, and then it'll need to be moved there.
+	/*
+	 * Sort buffer ids to help find sequential writes.
+	 *
+	 * Note: buffers are not locked in any way, but that does not matter,
+	 * this sorting is really advisory, if some buffer changes status during
+	 * this pass it will be filtered out later. The only necessary property
+	 * is that marked buffers do not move elsewhere.
+	 */
That reasoning makes it impossible to move the fsyncing of files into the loop (whenever we move to a new file). That's not nice. The formulation with "necessary property" doesn't seem very clear to me? How about:
/*
 * Note: Buffers are not locked in any way during sorting, but that's ok:
 * A change in the buffer header is only relevant when it changes the
 * buffer's identity. If the identity has changed it'll have been
 * written out by BufferAlloc(), so there's no need for checkpointer to
 * write it out anymore. The buffer might also get written out by a
 * backend or bgwriter, but that's equally harmless.
 */
Also, qsort implementation
+ *
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
On 10 August 2015 at 15:59, Syed, Rahila rahila.s...@nttdata.com wrote: Hello, When we're in Phase 2 or 3, don't we need to report the number of total pages scanned, or the percentage of table pages scanned, as well? The total heap pages scanned needs to be reported with phase 2 or 3. A complete progress report needs to have numbers from each phase when applicable. Phase 1. Report 2 integer counters: heap pages scanned and total heap pages, 1 float counter: percentage_complete, and a progress message. Phase 2. Report 2 integer counters: index pages scanned and total index pages (across all indexes), and a progress message. Phase 3. 1 integer counter: heap pages vacuumed. Sorry for being unclear here. What I meant to say is, each phase of a command will correspond to a slot in COMMAND_NUM_SLOTS. Each phase will be a separate array element and will comprise n integers, n floats, and a string. So, in the view reporting progress, the VACUUM command can have 3 entries, one for each phase. VACUUM has 3 phases now, but since phases 2 and 3 repeat, you can have an unbounded number of phases. But that assumes that we don't count truncation as a 4th phase of VACUUM... SELECT statements also have a variable number of phases: hash, materialize and sort nodes all act as blocking nodes where you cannot progress to the next phase until it is complete, and you don't know for certain how much data will come in later phases. I think the best you'll do is an array of pairs of values [(current blocks, total blocks), ...] Knowing how many phases there are is a tough problem. I think the only way forwards is to admit that we will publish our best initial estimate of total workload size and then later we may realise it was wrong and publish a better number (do until complete). It's not wonderful, but la vida es loca. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] WIP: Rework access method interface
Petr Jelinek p...@2ndquadrant.com writes: On 2015-08-10 16:58, Alexander Korotkov wrote: That should work, thanks! Also we can have SQL-visible functions to get amsupport and amstrategies and use them in the regression tests. SQL-visible functions would be preferable to storing it in pg_am as keeping the params in pg_am would limit the extensibility of pg_am itself. I don't see any particularly good reason to remove amsupport and amstrategies from pg_am. Those are closely tied to the other catalog infrastructure for indexes (pg_amproc, pg_amop) which I don't think are candidates for getting changed by this patch. There are a couple of other pg_am columns, such as amstorage and amcanorderbyop, which similarly bear on what's legal to appear in related catalogs such as pg_opclass. I'd be sort of inclined to leave those in the catalog as well. I do not see that exposing a SQL function is better than exposing a catalog column; either way, that property is SQL-visible. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Rework access method interface
Petr Jelinek p...@2ndquadrant.com writes: On 2015-08-10 17:47, Tom Lane wrote: I don't see any particularly good reason to remove amsupport and amstrategies from pg_am. Those are closely tied to the other catalog infrastructure for indexes (pg_amproc, pg_amop) which I don't think are candidates for getting changed by this patch. Ok, in that case it seems unlikely that we'll be able to use pg_am for any other access methods besides indexes in the future. I think that's likely for the best anyway; there are too many catalogs that think a pg_am OID identifies an index AM. Better to create other catalogs for other types of AMs. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Rework access method interface
Alexander Korotkov a.korot...@postgrespro.ru writes: On Mon, Aug 10, 2015 at 6:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: There are a couple of other pg_am columns, such as amstorage and amcanorderbyop, which similarly bear on what's legal to appear in related catalogs such as pg_opclass. I'd be sort of inclined to leave those in the catalog as well. I do not see that exposing a SQL function is better than exposing a catalog column; either way, that property is SQL-visible. That answers my question, thanks! BTW, just to clarify: we can still have the desirable property that CREATE INDEX ACCESS METHOD needs no parameters other than the AM handler function name. The AM code would still expose all of its properties through the struct returned by the handler. What is at issue here is how information in that struct that needs to be available to SQL code gets exposed. We can do that either by exposing SQL functions to get it, or by having CREATE INDEX ACCESS METHOD call the handler and then copy some fields into the new pg_am row. I'm suggesting that we should do the latter for any fields that determine validity of pg_opclass, pg_amop, etc entries associated with the AM type. The AM could not reasonably change its mind about such properties (within a major release at least) so there is no harm in making a long-lived copy in pg_am. And we might as well not break SQL code unnecessarily by removing fields that used to be there. There may be some other AM properties that would be better to expose through SQL functions because they could potentially change without invalidating information stored in the other catalogs. I'm not sure whether there is any such data that needs to be accessible at SQL level, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Rework access method interface
Tom Lane wrote: Petr Jelinek p...@2ndquadrant.com writes: On 2015-08-10 17:47, Tom Lane wrote: I don't see any particularly good reason to remove amsupport and amstrategies from pg_am. Those are closely tied to the other catalog infrastructure for indexes (pg_amproc, pg_amop) which I don't think are candidates for getting changed by this patch. Ok, in that case it seems unlikely that we'll be able to use pg_am for any other access methods besides indexes in the future. I think that's likely for the best anyway; there are too many catalogs that think a pg_am OID identifies an index AM. Better to create other catalogs for other types of AMs. That means we won't be able to reuse pg_class.relam as a pointer to the AM-of-the-other-kind either. I don't think this is the best course of action. We have the sequence AM patch that already reuses pg_class.relam to point to pg_seqam.oid, but you objected to that on relational theory grounds, which seemed reasonable to me. The other option is to create yet another pg_class column with an OID of some other AM catalog, but this seems a waste. FWIW the column store patch we're working on also wants to have its own AM-like catalog. In our current code we have a separate catalog pg_colstore_am, but are eagerly waiting for the discussion on this to settle so that we can just use pg_am and pg_class.relam instead. (The patch itself is not public yet since it's nowhere near usable, and this detail is a pretty minor issue, but I thought reasonable to bring it up here. We're also waiting on upper-planner path-ification since it seems likely that some code will collide there, anyway.) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Precedence of standard comparison operators
On 10 August 2015 at 16:31, Tom Lane t...@sss.pgh.pa.us wrote: Pavel Stehule pavel.steh...@gmail.com writes: On Sun, Aug 09, 2015 at 08:06:11PM -0400, Tom Lane wrote: So yeah, I do think that getting a syntax error if you don't use parentheses is the preferable behavior here. If we raise a syntax error, then there should be a very informative message, Yeah, it would sure be nice to throw something better than "syntax error". But I've looked at bison's facilities for that, and they're pretty bad. I'm not sure there's much we can do short of moving to some other parser generator tool, which would be a Large Undertaking ... and I don't know of any arguably-better tool anyway. I would say that anyone who's tricksy enough to be using boolean logic in the way you describe would be expected to have enough nous about them to either a) know what the precedences are or b) know that their lack of knowledge means they should sprinkle their code with brackets. Returning a syntax error for something that isn't actually an error in syntax is a poor showing. Are we to start expecting syntax errors when people use comparison operators on NULLable columns without a COALESCE, because the comparison might do something they don't expect if they haven't tested their code with NULL values? IMO using a = b < c as described is unnecessarily* horrid, *whichever* way you mean it, and will only produce the sort of unreadability that generates errors in the long run anyway (even if you understand it, chances are the next poor sap reading your code won't) and deserves code that breaks, especially if you're the sort of person who uses it without fully understanding it. (*Unnecessarily because there are clearer constructs - CASE springs to mind - that do the same thing without the associated unreadability and/or ambiguity) Geoff
Re: [HACKERS] checkpointer continuous flushing
On August 10, 2015 8:24:21 PM GMT+02:00, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello Andres, You can't allocate 4GB with palloc(), it has a builtin limit against allocating more than 1GB. Argh, too bad, I assumed very naively that palloc was malloc in disguise. It is, but there's some layering (memory pools/contexts) on top. You can get huge allocations with the huge-allocation variants (e.g. MemoryContextAllocHuge). Then the file would be fsynced twice: if the fsync is done properly (data have already been flushed to disk) then it would not cost much, and doing it sometimes twice on some file would not be a big issue. The code could also detect such an event and log a warning, which would give a hint about how often it occurs in practice. Right. At the cost of keeping track of all files... If the pivot element changes its identity won't the result be pretty much random? That would be a very unlikely event, given the short time spent in qsort. Meh, we don't want to rely on likelihood for such things. My main argument is that even if it occurs, and the qsort result is partly wrong, it does not change correctness, it just means that the actual writes will be less in order than wished. If it occurs, one pivot separation would be quite strange, but then others would be right, so the buffers would be partly sorted. It doesn't matter for correctness today, correct. But it makes it impossible to rely on the ordering, too. Another issue I see is that even if buffers are locked within cmp, the status may change between two cmps... Sure. That's not what I'm suggesting. Earlier versions of the patch kept an array of buffer headers exactly because of that. I do not think that locking all buffers for sorting them is an option. So on the whole, I think that locking buffers for sorting is probably not possible with the simple (and efficient) lightweight approach used in the patch. Yes, the other version has a higher space overhead. I'm not convinced that's meaningful in comparison to the size of shared buffers. 
And I rather doubt it's a loss performance-wise in a loaded server. All the buffer headers are touched on other cores, and doing the sort with indirection will greatly increase bus traffic. The good news, as I argued before, is that the order is only advisory to help with performance; the correctness requirement is really that all checkpoint buffers are written and fsync is called in the end, and that does not depend on the buffer order. That is how it currently works anyway. It's not particularly desirable to have a performance feature that works less well if the server is heavily and concurrently loaded. The likelihood of bogus sort results will increase with the churn rate in shared buffers. Andres --- Please excuse brevity and formatting - I am writing this on my mobile phone. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.
On 8/9/15 6:23 PM, Tom Lane wrote: It looks to me like the reason for this is that pg_dump forces the typacl of a type to be '{=U}' when reading the schema data for a pre-9.2 type, rather than reading it as NULL (ie default permissions) which would result in not printing any grant/revoke commands for the object. I do not see a good reason for that; quite aside from this problem, it means there is one more place that knows the default permissions for a type than there needs to be. Peter, what was the rationale? This was probably just copied from how proacl and lanacl are handled, which predate typacl by quite a bit. Maybe there was a reason in those days. It might also have something to do with how owner privileges are handled. An explicit '{=U}' doesn't create owner privileges, unlike a null value in that field. Maybe this is necessary if you dump and restore between databases with different user names. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] fix oversight converting buf_id to Buffer
Attached patch fixes oversights converting buf_id to Buffer in PrintBufferDescs() and InvalidateBuffer(). Especially for the latter, the reason we haven't seen any reports of the issue might be that it needs certain concurrent conditions to be true. Along the line, it also changes all direct maths against buf_id to use BufferDescriptorGetBuffer() instead. Regards, Qingqing
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
new file mode 100644
index e4b2558..2e9a7c7
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
*** retry:
*** 1273,1279 ****
  		UnlockBufHdr(buf);
  		LWLockRelease(oldPartitionLock);
  		/* safety check: should definitely not be our *own* pin */
! 		if (GetPrivateRefCount(buf->buf_id) > 0)
  			elog(ERROR, "buffer is pinned in InvalidateBuffer");
  		WaitIO(buf);
  		goto retry;
--- 1273,1279 ----
  		UnlockBufHdr(buf);
  		LWLockRelease(oldPartitionLock);
  		/* safety check: should definitely not be our *own* pin */
! 		if (GetPrivateRefCount(BufferDescriptorGetBuffer(buf)) > 0)
  			elog(ERROR, "buffer is pinned in InvalidateBuffer");
  		WaitIO(buf);
  		goto retry;
*** ReleaseAndReadBuffer(Buffer buffer,
*** 1426,1441 ****
  static bool
  PinBuffer(volatile BufferDesc *buf, BufferAccessStrategy strategy)
  {
! 	int		b = buf->buf_id;
  	bool	result;
  	PrivateRefCountEntry *ref;

! 	ref = GetPrivateRefCountEntry(b + 1, true);
  	if (ref == NULL)
  	{
  		ReservePrivateRefCountEntry();
! 		ref = NewPrivateRefCountEntry(b + 1);

  		LockBufHdr(buf);
  		buf->refcount++;
--- 1426,1441 ----
  static bool
  PinBuffer(volatile BufferDesc *buf, BufferAccessStrategy strategy)
  {
! 	Buffer	b = BufferDescriptorGetBuffer(buf);
  	bool	result;
  	PrivateRefCountEntry *ref;

! 	ref = GetPrivateRefCountEntry(b, true);
  	if (ref == NULL)
  	{
  		ReservePrivateRefCountEntry();
! 		ref = NewPrivateRefCountEntry(b);

  		LockBufHdr(buf);
  		buf->refcount++;
*** PinBuffer(volatile BufferDesc *buf, Buff
*** 1460,1467 ****
  	ref->refcount++;
  	Assert(ref->refcount > 0);
! 	ResourceOwnerRememberBuffer(CurrentResourceOwner,
! 								BufferDescriptorGetBuffer(buf));
  	return result;
  }
--- 1460,1466 ----
  	ref->refcount++;
  	Assert(ref->refcount > 0);
! 	ResourceOwnerRememberBuffer(CurrentResourceOwner, b);
  	return result;
  }
*** PinBuffer(volatile BufferDesc *buf, Buff
*** 1489,1511 ****
  static void
  PinBuffer_Locked(volatile BufferDesc *buf)
  {
! 	int		b = buf->buf_id;
  	PrivateRefCountEntry *ref;

  	/*
  	 * As explained, We don't expect any preexisting pins. That allows us to
  	 * manipulate the PrivateRefCount after releasing the spinlock
  	 */
! 	Assert(GetPrivateRefCountEntry(b + 1, false) == NULL);

  	buf->refcount++;
  	UnlockBufHdr(buf);

! 	ref = NewPrivateRefCountEntry(b + 1);
  	ref->refcount++;
! 	ResourceOwnerRememberBuffer(CurrentResourceOwner,
! 								BufferDescriptorGetBuffer(buf));
  }

  /*
--- 1488,1509 ----
  static void
  PinBuffer_Locked(volatile BufferDesc *buf)
  {
! 	Buffer	b = BufferDescriptorGetBuffer(buf);
  	PrivateRefCountEntry *ref;

  	/*
  	 * As explained, We don't expect any preexisting pins. That allows us to
  	 * manipulate the PrivateRefCount after releasing the spinlock
  	 */
! 	Assert(GetPrivateRefCountEntry(b, false) == NULL);

  	buf->refcount++;
  	UnlockBufHdr(buf);

! 	ref = NewPrivateRefCountEntry(b);
  	ref->refcount++;
! 	ResourceOwnerRememberBuffer(CurrentResourceOwner, b);
  }

  /*
*** static void
*** 1520,1533 ****
  UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
  {
  	PrivateRefCountEntry *ref;
--- 1518,1531 ----
  UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
  {
  	PrivateRefCountEntry *ref;
+ 	Buffer	b = BufferDescriptorGetBuffer(buf);

  	/* not
Re: [HACKERS] WIP: SCRAM authentication
On 08/09/2015 07:19 PM, Michael Paquier wrote: ... during my exchange with Michael, I was thinking about the bug potential of taking the password field and multiplexing it in some way, which is significant. There is a definite risk of making this too complicated and we'll need to contrast that against ease-of-migration, because complicated mechanisms tend to be less secure due to user error. Sure. That's why I am all in for adding a compatibility GUC or similar that enforces the removal of old verifier types after marking those as deprecated for a couple of years as there's surely a significant risk to keep old passwords around or bad pg_hba entries. Still we need IMO a way for a user to save multiple verifiers generated from a client to manage carefully the password verifier aging, deprecations and support removal. That still falls under the heading of possibly too complicated though. As I see it, there's two potential migration paths: 1. Allow multiple verifiers for each login role (Heikki's plan). 2. Set verifier type per login role. (2) has the advantage of not requiring a bunch of new scaffolding (although it will require a little) and thus being less likely to introduce new bugs. It also doesn't require adding new catalog structures which are really only needed for the migration period, and after which will become a wart (i.e. having multiple verifiers per login role). In real migration terms, though, (2) has some major drawbacks in terms of making migration much harder. a) it won't work for Heroku and other 1-login-per-database hosting. b) moving to multiple roles from single roles per app is a painful process currently. For (a), one could argue that these are good candidates for all at once migrations, and that moving to SCRAM will depend on the host supporting it. Someone from Heroku could speak up here. 
For (b), there are a lot of things we could do to make migrating to a multiple-role infra much easier for users, which would continue to be useful even after the migration to SCRAM is history: * remove the role requirement for ALTER DEFAULT PRIVILEGES, and ... * add ALTER ROLE ___ ALTER DEFAULT OWNER, a command which sets the default owner of newly created objects by that login role to a different role of which they are a member. Alternatively, add a way to make a default SET ROLE whenever a login role logs in. These two changes, or changes like them that serve the same purpose, would allow us to prescribe the following migration path for most users: 1. add a new login role which is a member of the old login role and uses SCRAM; 2. set the defaults for that role so that its objects and permissions belong to the parent role; 3. move all applications to using SCRAM and the new role; 4. disable logins on the old, parent role. It's currently (2) which is painfully difficult, and could be made less so via the two features recommended above. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: SCRAM authentication
* Robert Haas (robertmh...@gmail.com) wrote: On Sat, Aug 8, 2015 at 1:23 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 08/08/2015 04:27 PM, Robert Haas wrote: I don't see that there's any good reason to allow the same password to be stored in the catalog encrypted more than one way, Sure there is. If you want to be able to authenticate using different mechanism, you need the same password encrypted in different ways. SCRAM uses verifier that's derived from the password in one way, MD5 authentication needs an MD5 hash, and yet other protocols have other requirements. Why do we need to be able to authenticate using more than one mechanism? If you have some clients that can't support SCRAM yet, you might as well continue using MD5 across the board until that changes. You're not going to get much real security out of using MD5 for some authentication attempts and SCRAM for other ones, and the amount of infrastructure we're proposing to introduce to support that is pretty substantial. I agree with Josh that requiring a hard switch simply won't be acceptable to our user community and if decide there can only be one running at a time then we'll never get people to move off md5 and therefore we wouldn't ever make real progress here. Apologies to Josh if I've misconstrued anything in his excellent response. and I don't think there's any good reason to introduce the PASSWORD VERIFIER terminology. I think we should store (1) your password, either encrypted or unencrypted; and (2) the method used to encrypt it. And that's it. Like Joe and Stephen, I actually find it highly confusing that we call the MD5 hash an encrypted password. The term password verifier is fairly common in the specifications of authentication mechanisms. I think we should adopt it. OK, but it sure looked from Michael's syntax description like you wrote PASSWORD VERIFIER (md5 'the_actual_password'). Or at least that was my impression from reading it, maybe I got it wrong. 
If you want to introduce ALTER USER ... PASSWORD VERIFIER as alternative syntax for what we now call ALTER USER ... ENCRYPTED PASSWORD, that works for me. But a plaintext password shouldn't be called a password verifier under the terminology you are using here, IIUC. Apologies for my incomplete up-thread definition of a password verifier, it was a bit off-the-cuff while I was out of the country and referred to the specific password verifier in SCRAM, which isn't a cleartext or encrypted password or a simple hash of it, or a salted hash. To flip it around as a positive definition instead of the negative it's not X, Y or Z which I provided up-thread, a password verifier is a general term for anything that can be used to verify that the client knows the right password, so it technically could be the cleartext version of the password, or a simple hash of the password, or an encrypted version of the password, or pretty much anything else that works for the protocol. The reason it came about was because of this very issue that there wasn't a general term for the value: it was referred to as a salted hash, or encrypted, or just hashed, etc, and those terms don't apply to everything that can be used today as a password verifier. As such, it's actually correct usage in this case as it's a general term, rather than the specific and incorrect term encrypted which we have today. Thanks! Stephen
Re: [HACKERS] Asynchronous execution on FDW
* Robert Haas (robertmh...@gmail.com) wrote: On Mon, Aug 10, 2015 at 3:23 AM, Heikki Linnakangas hlinn...@iki.fi wrote: I've marked this as rejected in the commitfest, because others are working on a more general solution with parallel workers. That's still work-in-progress, and it's not certain if it's going to make it into 9.6, but if it does it will largely render this obsolete. We can revisit this patch later in the release cycle, if the parallel scan patch hasn't solved the same use case by then. I think the really important issue for this patch is the one discussed here: http://www.postgresql.org/message-id/ca+tgmoaijk1svzw_gkfu+zssxcijkfelqu2aomvuphpsfw4...@mail.gmail.com I agree that it'd be great to figure out the answer to #2, but I'm also of the opinion that we can either let the user tell us through the use of the GUCs proposed in the patch or simply not worry about the potential for time wastage associated with starting them all at once, as you suggested there. You raised an important issue there but never really expressed an opinion on the points I raised, here or on the other thread. And neither did anyone else except the patch author who, perhaps unsurprisingly, thinks it's OK. I wish we could get more discussion about that. When I read the proposal, I had the same reaction that it didn't seem like quite the right place and it further bothered me that it was specific to FDWs. Perhaps not surprisingly, as I authored it, but I'm still a fan of my proposal #1 here: http://www.postgresql.org/message-id/20131104032604.gb2...@tamriel.snowman.net More generally, I completely agree with the position (I believe yours, but I might be misremembering) that we want to have this async capability independently and in addition to parallel scan. I don't believe one obviates the advantages of the other. Thanks! Stephen
Re: [HACKERS] FSM versus GIN pending list bloat
On Tue, Aug 4, 2015 at 12:38 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Tue, Aug 4, 2015 at 1:39 AM, Simon Riggs si...@2ndquadrant.com wrote: On 4 August 2015 at 06:03, Jeff Janes jeff.ja...@gmail.com wrote: The attached proof of concept patch greatly improves the bloat for both the insert and the update cases. You need to turn on both features: adding the pages to fsm, and vacuuming the fsm, to get the benefit (so JJ_GIN=3). The first of those two things could probably be adopted for real, but the second probably is not acceptable. What is the right way to do this? Could a variant of RecordFreeIndexPage bubble the free space up the map immediately rather than waiting for a vacuum? It would only have to move up until it found a page with freespace already recorded in it, which the vast majority of the time would mean observing up one level and then not writing to it, assuming the pending list pages remain well clustered. You make a good case for action here since insert-only tables with GIN indexes on text are a common use case for GIN. Why would vacuuming the FSM be unacceptable? With a large gin_pending_list_limit it makes sense. But with a smallish gin_pending_list_limit (like the default 4MB) this could be called a lot (multiple times a second during some spurts), and would read the entire fsm each time. If it is unacceptable, perhaps we can avoid calling it every time, or simply have FreeSpaceMapVacuum() terminate more quickly on some kind of 80/20 heuristic for this case. Or maybe it could be passed a range of blocks which need vacuuming, so it concentrated on that range. But from the README file, it sounds like it is already supposed to be bubbling up. I'll have to see just what's going on there when I get a chance. Before making changes to the FSM code to make immediate summarization possible, I decided to quantify the effect of vacuuming the entire fsm.
Going up to 5 GB of index size, the time taken to vacuum the entire FSM one time for each GIN_NDELETE_AT_ONCE was undetectable. Based on that, I made this patch which vacuums it one time per completed ginInsertCleanup, which should be far less than once per GIN_NDELETE_AT_ONCE. I would be interested in hearing what people with very large GIN indexes think of it. It does seem like at some point the time needed must become large, but from what I can tell that point is way beyond what someone is likely to have for an index on an unpartitioned table. I have a simple test case that inserts an array of 101 md5 digests into each row. With 10_000 of these rows inserted into an already indexed table, I get 40MB for the table and 80MB for the index unpatched. With the patch, I get 7.3 MB for the index. Cheers, Jeff gin_fast_freespace_v001.patch Description: Binary data gin_freespace2.pl Description: Binary data
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
Having a freeze map would be wholly unnecessary if we don't ever need to freeze whole tables again. Freezing would still be needed on individual blocks where an old row has been updated or deleted; a freeze map would not help there either. So there is no conflict, but options 2) and 3) are completely redundant if we go for 5). After investigation, I now think 5) is achievable in 9.6, but if I am wrong for whatever reason, we have 2) as a backstop.
Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.
I wrote: But now that you mention it, isn't that completely broken? What pg_dump actually prints given this made-up data is REVOKE ALL ON TYPE myshell FROM PUBLIC; REVOKE ALL ON TYPE myshell FROM postgres; GRANT ALL ON TYPE myshell TO PUBLIC; which seems like a completely insane interpretation. There is no way that dumping a type from a pre-typacl database and restoring it into a newer one should end up with the type's owner having no privileges on it. I'm astonished that we've not gotten bug reports about that. ... of course, the reason for no field reports is that the owner still has privileges, as she inherits them from PUBLIC. Doesn't make this any less broken though. regards, tom lane
Re: [HACKERS] linux sparc compile issue
On 2015-08-10 17:36:57 -0400, Tom Lane wrote: Waldemar Brodkorb w...@openadk.org writes: while doing regular builds via buildroot autobuilders a compile problem for sparc 32bit v8 was found. It seems the defines for Linux are other than for Solaris. Following patch fixes it for buildroot: The gcc predefines for Linux are __sparc_v8__/__sparc_v7__ I've applied your suggested patch for this, but I'm a bit curious what version of gcc you are using; our code's been like that for a very long time and nobody complained before. Hm. Could this be caused by using a new solaris studio compiler? It apparently uses gcc as a frontend and thus might actually reach that bit of code. Greetings, Andres Freund
Re: [HACKERS] tap tests remove working directories
On 08/10/2015 10:32 AM, Andrew Dunstan wrote: On 08/10/2015 09:55 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 08/09/2015 08:58 PM, Michael Paquier wrote: Sure. Attached is what I have in mind. Contrary to your version we keep around temp paths should a run succeed after one that has failed when running make check multiple times in a row. Perhaps it does not matter much in practice as log files get removed at each new run but I think that this allows a finer control in case of failure. Feel free to have a look. Actually, I don't think this is a very good idea. You could end up with a whole series of opaquely named directories from a series of failing runs. If you want to keep the directory after a failure, and want to do another run, then rename the directory. That's what you have to do with the main regression tests, too. My version also has the benefit of changing exactly 3 lines in the source :-) FWIW, I think we should keep the behavior of the TAP tests as close as possible to the established behavior of the main regression tests. It's certainly possible that there's room for improvement in that benchmark behavior. But let's first converge the test behaviors, and then have a discussion about whether we want changes, and if so change all the tests at the same time. Yeah. To do that we should probably stop using File::Temp to make the directory, and just use a hardcoded name given to File::Path::mkpath. If the directory exists, we'd just remove it first. That won't be a very big change - probably just a handful of lines. Unfortunately, it's rather more complicated. These tests are extremely profligate with space and time, creating not less than 29 data directories in 19 temporary directories, including 14 temp directories in scripts alone, and not counting those in pg_rewind which puts two more data directories in a place which is different from all the others. 
And of course, in those directories (scripts and pg_ctl) that have multiple temp directories, we have no idea which one goes with which set of tests. It's no wonder that these tests take an inordinate amount of time to run - on crake, where it takes 23 seconds for make installcheck to run, it takes 176 seconds for make -C src/bin installcheck to run. My bet is all those initdb calls account for a substantial amount of that time. I think this needs a significant bit of reworking. cheers andrew
Re: [HACKERS] [sqlsmith] subplan variable reference / unassigned NestLoopParams
Andreas Seltenreich seltenre...@gmx.de writes: Tom Lane writes: Andreas Seltenreich seltenre...@gmx.de writes: Tom Lane writes: Well, I certainly think all of these represent bugs: 3 | ERROR: plan should not reference subplan's variable 2 | ERROR: failed to assign all NestLoopParams to plan nodes These appear to be related. The following query produces the former, but if you replace the very last reference of provider with the literal 'bar', it raises the latter error. Fixed that, thanks for the test case! I haven't seen the former since your commit, but there recently were some instances of the latter. The attached queries all throw the error against master at 8752bbb. Turns out my patch only addressed some cases of the underlying issue. I've gone back for a second swipe at it; thanks for the continued testing! BTW, the commit hashes you've been mentioning don't seem to correspond to anything in the master PG repo ... are you working with modified sources? regards, tom lane
Re: [HACKERS] linux sparc compile issue
Waldemar Brodkorb w...@openadk.org writes: while doing regular builds via buildroot autobuilders a compile problem for sparc 32bit v8 was found. It seems the defines for Linux are other than for Solaris. Following patch fixes it for buildroot: The gcc predefines for Linux are __sparc_v8__/__sparc_v7__ I've applied your suggested patch for this, but I'm a bit curious what version of gcc you are using; our code's been like that for a very long time and nobody complained before. regards, tom lane
Re: [HACKERS] fix oversight converting buf_id to Buffer
Hi, That's a very nice catch! Did you trigger the error or just find it when reading the code? On 2015-08-10 12:12:01 -0700, Qingqing Zhou wrote: Attached patch fixes oversights converting buf_id to Buffer in PrintBufferDescs() and InvalidateBuffer(). Especially for the latter, the reason we haven't seen any reports of the issue might be that it needs certain concurrent conditions to be true. PrintBufferDescs() is dead code, so that one is not surprising. I'm also not surprised that the wrong buffer in InvalidateBuffer()'s check doesn't trigger. You'd have to have a cursor open in the current backend that currently pins the off-by-one buffer. I'm too tired right now to look at this, but it generally looked sane. Regards, Andres
Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.
I wrote: Peter Eisentraut pete...@gmx.net writes: This was probably just copied from how proacl and lanacl are handled, which predate typacl by quite a bit. Maybe there was a reason in those days. Hm ... I wonder whether those are well-thought-out either. They're not. Testing with ancient servers shows that we dump very silly grant/revoke state for functions and languages as well, if the source server is too old to have proacl or lanacl (ie, pre-7.3). As with typacl, the silliness is accidentally masked as long as the owner doesn't do something like revoke the privileges granted to PUBLIC. Things work far more sanely with the attached patch, to wit we just leave all object privileges as default if dumping from a version too old to have privileges on that type of object. I think we should back-patch this into all supported branches; it's considerably more likely that older branches would be used to dump from ancient servers. regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 51b6d3a..87dadbf 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*** getTypes(Archive *fout, int *numTypes)
*** 3513,3519 ****
  	else if (fout->remoteVersion >= 80300)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 						  "typnamespace, '{=U}' AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
--- 3513,3519 ----
  	else if (fout->remoteVersion >= 80300)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 						  "typnamespace, NULL AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
*** getTypes(Archive *fout, int *numTypes)
*** 3528,3534 ****
  	else if (fout->remoteVersion >= 70300)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 						  "typnamespace, '{=U}' AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
--- 3528,3534 ----
  	else if (fout->remoteVersion >= 70300)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 						  "typnamespace, NULL AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
*** getTypes(Archive *fout, int *numTypes)
*** 3542,3548 ****
  	else if (fout->remoteVersion >= 70100)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 						  "0::oid AS typnamespace, '{=U}' AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
--- 3542,3548 ----
  	else if (fout->remoteVersion >= 70100)
  	{
  		appendPQExpBuffer(query, "SELECT tableoid, oid, typname, "
! 						  "0::oid AS typnamespace, NULL AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
*** getTypes(Archive *fout, int *numTypes)
*** 3558,3564 ****
  		appendPQExpBuffer(query,
  						  "SELECT (SELECT oid FROM pg_class WHERE relname = 'pg_type') AS tableoid, "
  						  "oid, typname, "
! 						  "0::oid AS typnamespace, '{=U}' AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
--- 3558,3564 ----
  		appendPQExpBuffer(query,
  						  "SELECT (SELECT oid FROM pg_class WHERE relname = 'pg_type') AS tableoid, "
  						  "oid, typname, "
! 						  "0::oid AS typnamespace, NULL AS typacl, "
  						  "(%s typowner) AS rolname, "
  						  "typinput::oid AS typinput, "
  						  "typoutput::oid AS typoutput, typelem, typrelid, "
*** getAggregates(Archive *fout, DumpOptions
*** 4249,4255 ****
  						  "CASE WHEN aggbasetype = 0 THEN 0 ELSE 1 END AS pronargs, "
  						  "aggbasetype AS proargtypes, "
  						  "(%s aggowner) AS rolname, "
! 						  "'{=X}' AS aggacl "
  						  "FROM pg_aggregate "
  						  "where oid > '%u'::oid",
  						  username_subquery,
--- 4249,4255 ----
  						  "CASE WHEN aggbasetype = 0 THEN 0 ELSE 1 END AS pronargs, "
  						  "aggbasetype AS proargtypes, "
  						  "(%s aggowner) AS rolname, "
! 						  "NULL AS aggacl "
  						  "FROM pg_aggregate "
  						  "where oid > '%u'::oid",
  						  username_subquery,
*** getAggregates(Archive *fout, DumpOptions
*** 4264,4270 ****
  						  "CASE WHEN aggbasetype = 0 THEN 0 ELSE 1 END AS pronargs, "
  						  "aggbasetype AS proargtypes, "
  						  "(%s aggowner) AS rolname, "
! 						  "'{=X}' AS aggacl "
  						  "FROM pg_aggregate "
  						  "where oid > '%u'::oid",
  						  username_subquery,
--- 4264,4270 ----
  						  "CASE WHEN aggbasetype = 0 THEN 0 ELSE 1 END AS pronargs, "
  						  "aggbasetype AS proargtypes, "
  						  "(%s aggowner) AS rolname, "
!
Re: [HACKERS] [patch] A \pivot command for psql
On Mon, Aug 10, 2015 at 07:10:41PM +0200, Daniel Verite wrote: David Fetter wrote: I'm working up a proposal to add (UN)PIVOT support to the back-end. I was under the impression that a server-side PIVOT *with dynamic columns* was just unworkable as an SQL query, because it couldn't be prepared if it existed. That depends on what you mean by dynamic columns. The approach taken in the tablefunc extension is to use functions which return SETOF RECORD, which in turn need to be cast at runtime. At least one other implementation takes two separate approaches, both interesting at least from my point of view. The first is to spell out all the columns in the query, which is not dynamic columns in any reasonable sense, as hand-crafting one piece of SQL whose purpose is to generate the actual pivot SQL is not a reasonable level of dynamic. The second, more on point, is to specify a serialization for the rows in the dynamic columns case. Their syntax is PIVOT XML, but I would rather do something more like PIVOT (SERIALIZATION XML). A third idea I have only roughed out feels a bit like cheating: creating a function which returns two distinct rowtypes via REFCURSORs or similar. The first result, used to create a CAST, spells out the row type of the second. Am I wrong on that? I feel like you guys are all telling me that \pivot should happen on the server, but the point that it would not be realistic to begin with is not considered. I think that starting the conversation again, especially at this stage of the 9.6 cycle, is a very good thing, whatever its outcome. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] ON CONFLICT DO UPDATE using EXCLUDED.column gives an error about mismatched types
On Wed, Aug 5, 2015 at 12:58 AM, Amit Langote langote_amit...@lab.ntt.co.jp wrote: I forgot to mention one thing later yesterday. The way exclRelTlist is initialized, all the way in the beginning (transformOnConflictClause), is most probably to blame. It uses expandRelAttrs() for other valid reasons; but within it, expandRTE() is called with 'false' for include_dropped (columns). I applied the attached (perhaps ugly) fix, and it seemed to fix the issue. But, I guess you'll be able to come up with some better fix. Thanks for working on this. I think you have this right, but I also don't think we should discuss this fix outside of the context of a wider discussion about the excluded targetlist. What I'm going to do is roll this into my own pending patch to fix the issue with wholerow vars, which is also down to a problem with the excluded targetlist initially generated by calling expandRelAttrs(). Andres might want to take an alternative approach with that, so a consolidation makes sense. -- Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.
On 08/10/2015 07:29 PM, Tom Lane wrote: I wrote: Peter Eisentraut pete...@gmx.net writes: This was probably just copied from how proacl and lanacl are handled, which predate typacl by quite a bit. Maybe there was a reason in those days. Hm ... I wonder whether those are well-thought-out either. They're not. Testing with ancient servers shows that we dump very silly grant/revoke state for functions and languages as well, if the source server is too old to have proacl or lanacl (ie, pre-7.3). As with typacl, the silliness is accidentally masked as long as the owner doesn't do something like revoke the privileges granted to PUBLIC. Things work far more sanely with the attached patch, to wit we just leave all object privileges as default if dumping from a version too old to have privileges on that type of object. I think we should back-patch this into all supported branches; it's considerably more likely that older branches would be used to dump from ancient servers. FYI this has fixed the problem that was encountered in cross-version upgrade testing. cheers andrew
Re: [HACKERS] [COMMITTERS] pgsql: Fix pg_dump to dump shell types.
Peter Eisentraut pete...@gmx.net writes: On 8/9/15 6:23 PM, Tom Lane wrote: It looks to me like the reason for this is that pg_dump forces the typacl of a type to be '{=U}' when reading the schema data for a pre-9.2 type, rather than reading it as NULL (ie default permissions) which would result in not printing any grant/revoke commands for the object. I do not see a good reason for that; quite aside from this problem, it means there is one more place that knows the default permissions for a type than there needs to be. Peter, what was the rationale? This was probably just copied from how proacl and lanacl are handled, which predate typacl by quite a bit. Maybe there was a reason in those days. Hm ... I wonder whether those are well-thought-out either. It might also have something to do with how owner privileges are handled. An explicit '{=U}' doesn't create owner privileges, unlike a null value in that field. Maybe this is necessary if you dump and restore between databases with different user names. But now that you mention it, isn't that completely broken? What pg_dump actually prints given this made-up data is REVOKE ALL ON TYPE myshell FROM PUBLIC; REVOKE ALL ON TYPE myshell FROM postgres; GRANT ALL ON TYPE myshell TO PUBLIC; which seems like a completely insane interpretation. There is no way that dumping a type from a pre-typacl database and restoring it into a newer one should end up with the type's owner having no privileges on it. I'm astonished that we've not gotten bug reports about that. regards, tom lane
Re: [HACKERS] checkpointer continuous flushing
On Tue, Aug 11, 2015 at 4:28 AM, Andres Freund wrote: On August 10, 2015 8:24:21 PM GMT+02:00, Fabien COELHO wrote: You can't allocate 4GB with palloc(), it has a builtin limit against allocating more than 1GB. Argh, too bad, I assumed very naively that palloc was malloc in disguise. It is, but there's some layering (memory pools/contexts) on top. You can get huge allocations with palloc_huge. palloc_huge does not exist yet ;) There is either repalloc_huge or palloc_extended now, though implementing one would be trivial. -- Michael
Re: [HACKERS] fix oversight converting buf_id to Buffer
On Mon, Aug 10, 2015 at 4:15 PM, Andres Freund and...@anarazel.de wrote: That's a very nice catch! Did you trigger the error or just find it when reading the code? My colleagues hit the issue during some stress testing: I am not clear on the exact repro, but from the failed assertion the cause is fairly clear. Regards, Qingqing
Re: [HACKERS] [patch] A \pivot command for psql
On 08/10/2015 04:03 PM, David Fetter wrote: Am I wrong on that? I feel like you guys are all telling me that \pivot should happen on the server, but the point that it would not be realistic to begin with is not considered. I think that starting the conversation again, especially at this stage of the 9.6 cycle, is a very good thing, whatever its outcome. Talk to Atri Sharma -- he recently indicated to me that he might be working on built-in pivot in the near-term future (if not already started). -- Joe Conway Crunchy Data Enterprise PostgreSQL http://crunchydata.com
Re: [HACKERS] Test code is worth the space
On Mon, Aug 10, 2015 at 2:02 AM, Simon Riggs si...@2ndquadrant.com wrote: Almost every patch I review has either zero or insufficient tests. If we care about robustness, then we must discuss tests. Here are my two recent experiences: I agree we could do with x10 as many tests, but that doesn't mean all tests make sense, so there needs to be some limiting principles to avoid adding pointless test cases. I recently objected to adding a test case to Isolation tester for the latest ALTER TABLE patch because in that case there is no concurrent behavior to test. There is already a regression test that tests lock levels for those statements, so in my view it is more sensible to modify the existing test than to add a whole new isolation test that does nothing more than demonstrate the lock levels work as described, which we already knew. On another review I suggested we add a function to core to allow it to be used in regression tests. A long debate ensued, deciding that we must be consistent and put diagnostic functions in contrib. My understanding is that we are not allowed to write regression tests that use contrib modules, yet the consistent place to put diagnostic functions is contrib - therefore, we are never allowed to write tests utilizing diagnostic functions. We are allowed to put modules for testing in another directory, but these are not distributed to users so cannot be used for production diagnosis. Systemic fail, advice needed. There are actually two ways to do this. One model is the dummy_seclabel module. The build system arranges for that to be available when running the core regression tests, so they can use it. And the dummy_seclabel test does. There are actually a couple of other loadable modules that are used by the core regression tests in this kind of way (refint and autoinc, from contrib/spi). The other model is what I'll call the test_decoding model. Since the core code can't be tested without a module, we have a module. 
But then the tests are in the module's directory (contrib/test_decoding/{sql,expected}) not the core regression tests. In general, I have a mild preference for the second model. It seems more scalable, and keeps the core tests quick to run, which is appropriate for more obscure tests that are unlikely to break very often. But the first model can also be done, as shown by the fact that we have in fact done it several times. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Commitfest remaining Needs Review items
On 08/10/2015 10:46 AM, Atri Sharma wrote: * Unique Joins Still needs to be reviewed. Any volunteers? Can take this one up, if it's within my limits. Thanks! I've added you as reviewer to that. - Heikki
Re: [HACKERS] [patch] A \pivot command for psql
2015-08-10 6:04 GMT+02:00 David Fetter da...@fetter.org: On Sun, Aug 09, 2015 at 07:29:40PM +0200, Daniel Verite wrote: Hi, I want to suggest a client-side \pivot command in psql, implemented in the attached patch. \pivot takes the current query in the buffer, executes it and displays it pivoted by interpreting the result as: column1 = row in pivoted output column2 = column in pivoted output column3 = value at (row,column) in pivoted output This is really neat work, and thanks for the patch. The issue in this one, as in the crosstab extension, is that it's available only if you are using some particular piece of extra software, in this case the psql client. I'm working up a proposal to add (UN)PIVOT support to the back-end. Would you like to join in on that? PIVOT and UNPIVOT should be the preferred solution. I can look at implementations in other databases. Pavel Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Commitfest remaining Needs Review items
On Mon, Aug 10, 2015 at 1:04 PM, Heikki Linnakangas hlinn...@iki.fi wrote: Hello Hackers, There are a few Needs Review items remaining in the July commitfest. Reviewers, please take action - you are holding up the commitfest. In addition to these items, there are a bunch of items in Ready for Committer state. Committers: please help with those.

* extends pgbench expressions with functions: Robert signed up as reviewer for this a long time ago. Ping, do you still plan to review this?
* self-defined policy for sepgsql regression test: Stephen, would you have the time to handle this, please?
* Allocate restart_lsn at the time of creating physical replication slot: Andres, can you have a look at this, please?
* Parallel Seq scan: Jeff, Haribabu, you're signed up as reviewers. How's it going?
* Join pushdown support for foreign tables: Ashutosh, KaiGai, Etsuro: You signed up as reviewers in spring, but there hasn't been any activity during this commitfest. What's the status of this patch? Do you plan to review this now?

There are three patches involved here. I have reviewed one of them. I am planning to look at them within a few days.

* replace pg_stat_activity.waiting with something more descriptive: This is not quite ready yet, but has had its fair share of review, so I'm going to move it to the next commitfest. Feel free to continue the discussion and review, though; I just don't want to drag the commitfest because of this.
* Unique Joins: Still needs to be reviewed. Any volunteers?
* checkpoint continuous flushing: Per discussion on that thread, let's just do the sorting part now. Needs some cleanup, but it's almost there.

- Heikki -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
On 2015-08-10 07:26:29 +0100, Simon Riggs wrote: On 10 August 2015 at 07:14, Peter Geoghegan p...@heroku.com wrote: On Sun, Aug 9, 2015 at 11:03 PM, Simon Riggs si...@2ndquadrant.com wrote: If 5) fails to bring a workable solution by the Jan 2016 CF then we commit 2) instead. Is there actually a conflict there? I didn't think so. I didn't explain myself fully, thank you for asking. Having a freeze map would be wholly unnecessary if we don't ever need to freeze whole tables again. Freezing would still be needed on individual blocks where an old row has been updated or deleted; a freeze map would not help there either. So there is no conflict, but options 2) and 3) are completely redundant if we go for 5). After investigation, I now think 5) is achievable in 9.6, but if I am wrong for whatever reason, we have 2) as a backstop. I don't think that's true. You can't ever delete the clog without freezing. There's no need for anti-wraparound scans anymore, but you still need to freeze once. Andres
Re: [HACKERS] Precedence of standard comparison operators
2015-08-10 5:37 GMT+02:00 Noah Misch n...@leadboat.com: On Sun, Aug 09, 2015 at 08:06:11PM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: In SQL:2008 and SQL:2011 at least, =, <, and BETWEEN are all in the same boat. They have no precedence relationships to each other; SQL sidesteps the question by requiring parentheses. They share a set of precedence relationships to other constructs. SQL does not imply whether to put them in one %nonassoc precedence group or in a few, but we can contemplate whether users prefer an error or prefer the 9.4 behavior for affected queries. Part of my thinking was that the 9.4 behavior fails the principle of least astonishment, because I seriously doubt that people expect '=' to be either right-associative or lower priority than '<'. Here's one example:

regression=# select false = true < false;
 ?column?
----------
 t
(1 row)

So yeah, I do think that getting a syntax error if you don't use parentheses is the preferable behavior here. If we raise a syntax error, then there should be a very informative message, because the pattern true = 2 > 1 is probably relatively common and it is hard to spot a syntax error in such a trivial expression. Regards Pavel I agree.
Re: [HACKERS] checkpointer continuous flushing
On 2015-08-08 20:49:03 +0300, Heikki Linnakangas wrote: * I think we should drop the flush part of this for now. It's not as clearly beneficial as the sorting part, and adds a great deal more code complexity. And it's orthogonal to the sorting patch, so we can deal with it separately. I don't agree. For one, I've seen it cause rather big latency improvements, and we're horrible at that. But more importantly, I think the requirements of the flush logic influence how exactly the sorting is done. Splitting them will just make it harder to do the flushing in a not-too-big patch. * Is it really necessary to parallelize the I/O among tablespaces? I can see the point, but I wonder if it makes any difference in practice. Today it's somewhat common to have databases that are bottlenecked on write IO, with all those writes being done by the checkpointer. If we suddenly do the writes to individual tablespaces separately and sequentially, we'll be bottlenecked on the peak IO of a single tablespace. * Is there ever any harm in sorting the buffers? The GUC is useful for benchmarking, but could we leave it out of the final patch? Agreed. * Do we need to worry about exceeding the 1 GB allocation limit in AllocateCheckpointBufferIds? It's enough for 2 TB of shared_buffers. That's a lot, but it's not totally crazy these days that someone might do that. At the very least, we need to lower the maximum of shared_buffers so that you can't hit that limit. We can just use the _huge variant? Greetings, Andres Freund
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
On 08/10/2015 11:17 AM, Andres Freund wrote: On 2015-08-10 07:26:29 +0100, Simon Riggs wrote: On 10 August 2015 at 07:14, Peter Geoghegan p...@heroku.com wrote: On Sun, Aug 9, 2015 at 11:03 PM, Simon Riggs si...@2ndquadrant.com wrote: If 5) fails to bring a workable solution by the Jan 2016 CF then we commit 2) instead. Is there actually a conflict there? I didn't think so. I didn't explain myself fully, thank you for asking. Having a freeze map would be wholly unnecessary if we don't ever need to freeze whole tables again. Freezing would still be needed on individual blocks where an old row has been updated or deleted; a freeze map would not help there either. So there is no conflict, but options 2) and 3) are completely redundant if we go for 5). After investigation, I now think 5) is achievable in 9.6, but if I am wrong for whatever reason, we have 2) as a backstop. I don't think that's true. You can't ever delete the clog without freezing. There's no need for anti-wraparound scans anymore, but you still need to freeze once. What's your definition of freezing? As long as you remove all dead tuples, you can just leave the rest in place with their original XID+epoch in place, and assume that everything old enough is committed. - Heikki
Re: [HACKERS] CREATE POLICY and RETURNING
Hi, This thread has a pretty thorough discussion of the pros and cons of applying the SELECT policy to other commands. Besides what has been mentioned, I think there is another potential pro: we can enable references to the pseudorelations OLD and NEW in predicates. Now, for UPDATE, the references to the target table in the USING clause are actually references to OLD and the references in the WITH CHECK clause are references to NEW. Logically now USING and WITH CHECK are combined by AND, so we cannot have predicates like

foo(NEW) > 1 OR bar(OLD) > 1   (combine predicates referencing OLD and NEW by an operation other than AND)
NEW.id > OLD.id   (reference both in the same expression)

If we apply the SELECT policy to other commands, we only need one predicate for INSERT/UPDATE/DELETE. That predicate can reference OLD and NEW, just like predicates for triggers and rules. For UPDATE and DELETE, the predicate of SELECT will be applied first (when the target table is scanned) to ensure no information leakage, and their own predicate will be applied later. This doesn't change much for INSERT and DELETE, but it gives users more flexibility when they set predicates for UPDATE. Thanks, Zhaomo
Re: [HACKERS] CREATE POLICY and RETURNING
In case you missed the link to the previous discussion at the bottom, http://www.postgresql.org/message-id/CAHGQGwEqWD=ynqe+zojbpoxywt3xlk52-v_q9s+xofckjd5...@mail.gmail.com
Re: [HACKERS] statement_timeout affects query results fetching?
Thanks for the explanation Robert, that makes total sense. However, it seems like the utility of PG's statement_timeout is much more limited than I thought. In case you're interested, I dug a little further and it seems that Microsoft's client for SQL Server implements the following timeout (source https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlcommand.commandtimeout%28v=vs.110%29.aspx?f=255MSPPError=-2147217396 ): cumulative time-out (for all network packets that are read during the invocation of a method) for all network reads during command execution or processing of the results. A time-out can still occur after the first row is returned, and does not include user processing time, only network read time. Since it doesn't seem possible to have a clean query-processing-only timeout at the backend, we may be better off doing something similar to the above and enforce timeouts on the client only. Any further thoughts on this would be appreciated. On Sun, Aug 9, 2015 at 5:21 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Aug 8, 2015 at 11:30 AM, Shay Rojansky r...@roji.org wrote: the entire row in memory (imagine rows with megabyte-sized columns). This makes sense to me; Tom, doesn't the libpq behavior you describe of absorbing the result set as fast as possible mean that a lot of memory is wasted on the client side? It sure does. I can definitely appreciate the complexity of changing this behavior. I think that some usage cases (such as mine) would benefit from a timeout on the time until the first row is sent, this would allow to put an upper cap on stuff like query complexity, for example. Unfortunately, it would not do any such thing. It's possible for the first row to be returned really really fast and then for an arbitrary amount of time to pass and computation to happen before all the rows are returned. A plan can have a startup cost of zero and a run cost of a billion (or a trillion). 
This kind of scenario isn't even particularly uncommon. You just need a plan that looks like this:

Nested Loop
  -> Nested Loop
     -> Nested Loop
        -> Seq Scan
        -> Index Scan
     -> Index Scan
  -> Index Scan

You can just keep pushing more nested loop/index scans on there and the first row will still pop out quite fast. But if the seq-scanned table is large, generating the whole result set can take a long, long time. Even worse, you could have a case like this: SELECT somefunc(a) FROM foo; Now suppose somefunc is normally very quick, but if a = 42 then it does pg_sleep() in a loop until the world ends. You're going to have however many rows of foo have a != 42 pop out near-instantaneously, and then it will go into the tank and not come out until the meaning of life, the universe, and everything is finally revealed. That second case is a little extreme, and a little artificial, but the point is this: just as you don't want the client to have to buffer the results in memory, the server doesn't either. It's not the case that the server computes the rows and sends them to you. Each one is sent to you as it is computed, and in the normal case, at the time the first row is sent, only a small percentage of the total work of the query has been performed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Moving SS_finalize_plan processing to the end of planning
On 10 August 2015 at 07:50, Tom Lane t...@sss.pgh.pa.us wrote: I've started to work on path-ification of the upper planner (finally), and since that's going to be a large patch in any case, I've been looking for pieces that could be bitten off and done separately. One such piece is the fact that SS_finalize_plan (computation of extParams/allParams) currently gets run at the end of subquery_planner; as long as that's true, we cannot have subquery_planner returning paths rather than concrete plans. The attached patch moves that processing to occur just before set_plan_references is run. Basically what this patch does is to divide what had been done in SS_finalize_plan into three steps:

* SS_identify_outer_params determines which outer-query-level Params will be available for the current query level to use, and saves that aside in a new PlannerInfo field root->outer_params. This processing turns out to be the only reason that SS_finalize_plan had to be called in subquery_planner: we have to capture this data before exiting subquery_planner because the upper levels' plan_params lists may change as they plan other subplans.
* SS_attach_initplans does the work of attaching initplans to the parent plan node and adjusting the parent's cost estimate accordingly.
* SS_finalize_plan now *only* does extParam/allParam calculation.

I had to change things around a bit in planagg.c (which was already a hack, and the law of conservation of cruft applies). Otherwise it's pretty straightforward and removes some existing hacks. One notable point is that there's no longer an assumption within SS_finalize_plan that initPlans can only appear in the top plan node. Any objections? Great! I'm very interested in this work, specifically around the grouping_planner() changes too. I've looked over the patch and from what I understand it seems like a good solid step in the right direction.
I was digging around the grouping_planner() last week with the intention of making some changes there to allow GROUP BY before join, but in the end decided to stay well clear of this area until this pathification is done. So far I've managed to keep my changes away from the upper planner and I've added GroupingPath types, which from what I can predict of what you'll be making changes to, I think you'll also need to have grouping_planner() return a few variations of GroupingPath to allow the paths list to be passed up to subquery_planner() and on up to functions like recurse_set_operations() so that they have the option of choosing GroupAggregate / MergeAppend to implement UNION. If I'm right on this, then maybe there's a few things you can copy and paste from the patch I posted here: http://www.postgresql.org/message-id/CAKJS1f-sEcm=gtfs-dqjsocsz-vlhrp_hsrntjjq-s7egn8...@mail.gmail.com specifically around create_grouping_path()? Kind Regards David Rowley -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Test code is worth the space
On 8 August 2015 at 17:47, Noah Misch n...@leadboat.com wrote: We've too often criticized patches for carrying many lines/bytes of test case additions. Let's continue to demand debuggable, secure tests that fail only when something is wrong, but let's stop pushing for test minimalism. Such objections may improve the individual patch, but that doesn't make up for the chilling effect on test contributions. I remember clearly the first time I submitted thorough test coverage with a feature. Multiple committers attacked it in the name of brevity. PostgreSQL would be better off with 10x its current test bulk, even if the average line of test code were considerably less important than we expect today. Almost every patch I review has either zero or insufficient tests. If we care about robustness, then we must discuss tests. Here are my two recent experiences: I agree we could do with x10 as many tests, but that doesn't mean all tests make sense, so there needs to be some limiting principles to avoid adding pointless test cases. I recently objected to adding a test case to Isolation tester for the latest ALTER TABLE patch because in that case there is no concurrent behavior to test. There is already a regression test that tests lock levels for those statements, so in my view it is more sensible to modify the existing test than to add a whole new isolation test that does nothing more than demonstrate the lock levels work as described, which we already knew. On another review I suggested we add a function to core to allow it to be used in regression tests. A long debate ensued, deciding that we must be consistent and put diagnostic functions in contrib. My understanding is that we are not allowed to write regression tests that use contrib modules, yet the consistent place to put diagnostic functions is contrib - therefore, we are never allowed to write tests utilizing diagnostic functions. 
We are allowed to put modules for testing in another directory, but these are not distributed to users so cannot be used for production diagnosis. Systemic fail, advice needed. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
On Sun, Aug 9, 2015 at 11:03 PM, Simon Riggs si...@2ndquadrant.com wrote: If 5) fails to bring a workable solution by the Jan 2016 CF then we commit 2) instead. Is there actually a conflict there? I didn't think so. -- Peter Geoghegan
Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing
On 10 August 2015 at 07:14, Peter Geoghegan p...@heroku.com wrote: On Sun, Aug 9, 2015 at 11:03 PM, Simon Riggs si...@2ndquadrant.com wrote: If 5) fails to bring a workable solution by the Jan 2016 CF then we commit 2) instead. Is there actually a conflict there? I didn't think so. I didn't explain myself fully, thank you for asking. Having a freeze map would be wholly unnecessary if we don't ever need to freeze whole tables again. Freezing would still be needed on individual blocks where an old row has been updated or deleted; a freeze map would not help there either. So there is no conflict, but options 2) and 3) are completely redundant if we go for 5). After investigation, I now think 5) is achievable in 9.6, but if I am wrong for whatever reason, we have 2) as a backstop. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
[HACKERS] Commitfest remaining Needs Review items
Hello Hackers, There are a few Needs Review items remaining in the July commitfest. Reviewers, please take action - you are holding up the commitfest. In addition to these items, there are a bunch of items in Ready for Committer state. Committers: please help with those.

* extends pgbench expressions with functions: Robert signed up as reviewer for this a long time ago. Ping, do you still plan to review this?
* self-defined policy for sepgsql regression test: Stephen, would you have the time to handle this, please?
* Allocate restart_lsn at the time of creating physical replication slot: Andres, can you have a look at this, please?
* Parallel Seq scan: Jeff, Haribabu, you're signed up as reviewers. How's it going?
* Join pushdown support for foreign tables: Ashutosh, KaiGai, Etsuro: You signed up as reviewers in spring, but there hasn't been any activity during this commitfest. What's the status of this patch? Do you plan to review this now?
* replace pg_stat_activity.waiting with something more descriptive: This is not quite ready yet, but has had its fair share of review, so I'm going to move it to the next commitfest. Feel free to continue the discussion and review, though; I just don't want to drag the commitfest because of this.
* Unique Joins: Still needs to be reviewed. Any volunteers?
* checkpoint continuous flushing: Per discussion on that thread, let's just do the sorting part now. Needs some cleanup, but it's almost there.

- Heikki
Re: [HACKERS] Commitfest remaining Needs Review items
* Unique Joins Still needs to be reviewed. Any volunteers? Can take this one up, if its within my limits.
[HACKERS] Summary of plans to avoid the annoyance of Freezing
Freezing is painful for VLDBs and high transaction rate systems. We have a number of proposals to improve things...

1. Allow parallel cores to be used with vacuumdb (Dilip) Idea is if we have to get the job done, let's do it as fast as we can using brute force. Reasonable thinking, but there are more efficient approaches.

2. Freeze Map (Sawada) This works and we have a mostly-ready patch. I'm down to do final review on this, which is why I'm producing this summary and working out what's the best next action for Postgres.

3. Speed up autovacuums when they are triggered to avoid wraparounds (Simon) Idea is to do a VACUUM scan which only freezes tuples. If we dirty a page from freezing we then also prune it, but don't attempt to scan indexes to remove the now-truncated dead tuples. This looks very straightforward, no technical issues. Might even be able to backpatch it. [patch investigated but not finished yet]

4. 64-bit Xids (Alexander) Proposal rejected

5. 64-bit Xid Visibility, with Xid Epoch in page header (Heikki)
* Epoch is stored on new pages, using a new page format. Epoch of existing pages is stored in control file - code changes to bufpage.c look isolated
* All transaction age tests are made using both xid and epoch - the code changes to tqual.c look isolated
* We won't need to issue anti-wraparound vacuums again, ever
* We will still need to issue clog-trimming vacuums, so we'll need a new parameter to control when to start scanning to avoid clog growing too large
* relfrozenxid will be set to InvalidTransactionId once all existing pages have been made visible
* relhintxid and relhintepoch will be required so we have a backstop to indicate where to truncate clog (and same for multixacts)
* There is no need to FREEZE except when UPDATEing/DELETEing pages from previous epochs
* VACUUM FREEZE will only freeze old pages; on a new cluster it will work the same as VACUUM
* VACUUMs that touch every non-all-visible page will be able to set relhintxid to keep clog trimmed, so never need to scan all blocks in table
* Code changes seem fairly isolated
* No action required at pg_upgrade
* Additional benefit: we can move to 32-bit CRC checksums on data pages at the same time as doing this (seamlessly).
* 8 bytes additional space required per page (~0.1% growth in database size)
* (Any other changes to page headers can be made or reserved at this time)

To me 3) would have been useful if we'd done it earlier. Now that we have 2) and 5), I don't see much point pursuing 3). The main options are 2) or 5). Freeze Map (2) makes freezing more efficient for larger tables, but doesn't avoid the need altogether. 5) is a deeper treatment of the underlying problem and is a better architecture for the future of Postgres, IMHO. I was previously a proponent of (2) as a practical way forwards, but my proposal here today is that we don't do anything further on 2) yet, and seek to make progress on 5) instead. If 5) fails to bring a workable solution by the Jan 2016 CF then we commit 2) instead. If Heikki wishes to work on (5), that's good. Otherwise, I think it's something I can understand and deliver by 1 Jan, though likely for 1 Nov CF. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Asynchronous execution on FDW
I've marked this as rejected in the commitfest, because others are working on a more general solution with parallel workers. That's still work-in-progress, and it's not certain if it's going to make it into 9.6, but if it does it will largely render this obsolete. We can revisit this patch later in the release cycle, if the parallel scan patch hasn't solved the same use case by then. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/pgSQL, RAISE and error context
On 07/26/2015 08:34 AM, Pavel Stehule wrote: Hi, here is the complete patch that introduces context filtering on the client side. The core of this patch is trivial and small - almost all of its size is trivial changes in the regression tests, removing the now-useless context. Documentation included; check-world passes. Looks good to me at first glance. I'll mark this as Ready for Committer. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest remaining Needs Review items
On Mon, Aug 10, 2015 at 1:04 PM, Heikki Linnakangas hlinn...@iki.fi wrote: * Parallel Seq scan Jeff, Haribabu, you're signed up as reviewers. How's it going? The current state of the patch is that there are a few comments by Jeff and others which I need to take care of; however, it would have been better if there were more comments after a detailed review. In the absence of a more detailed review, I will address the existing comments and send an updated patch. Feel free to move it to the next CF, or I will do the same if there are no more comments before you want to close this CF. Many thanks for managing this commitfest, giving each patch its fair share of time for review and discussion, and reviewing as many patches as possible yourself. Your contribution to the systematic progress of the CF is really valuable. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Commitfest remaining Needs Review items
On 2015/08/10 17:10, Ashutosh Bapat wrote: On Mon, Aug 10, 2015 at 1:04 PM, Heikki Linnakangas hlinn...@iki.fi wrote: * Join pushdown support for foreign tables Ashutosh, KaiGai, Etsuro: You signed up as reviewers in spring, but there hasn't been any activity during this commitfest. What's the status of this patch? Do you plan to review it now? There are three patches involved here. I have reviewed one of them. I am planning to look at the others within a few days. Sorry for not having taken any action. I think we should address the issue discussed in [1] first, because I think that would affect the design of the join pushdown API in the core. I'll submit a patch for that within a few days. I'd like to review this in detail after addressing that, if that's OK. Best regards, Etsuro Fujita [1] http://www.postgresql.org/message-id/558a18b3.9050...@lab.ntt.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Rework access method interface
On 2015-08-09 23:56, Alexander Korotkov wrote: Hackers, some time ago I proposed patches implementing CREATE ACCESS METHOD. http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com As I gather from the comments on my patches, and also from Tom's comment about the AM interface in the tablesample thread, the AM interface needs reworking. And AFAICS the AM interface rework is essential to having a CREATE ACCESS METHOD command. http://www.postgresql.org/message-id/5438.1436740...@sss.pgh.pa.us Cool, I was planning to take a stab at this myself when I have time, so I am glad you posted this. That is why I'd like to show a WIP patch implementing the AM interface rework. The patch is still quite dirty, but I think it illustrates the idea quite clearly. An AM now has a single parameter – a handler function. This handler returns a pointer to an AmRoutine, which holds the same data as pg_am did before. Thus, the API is organized much like the FDW one. I wonder if it would make sense to call it IndexAmRoutine instead, in case we add other AMs (like the sequence AM, or maybe a column store if that's done as an AM) in the future. The patch should probably add an am_handler type, which should be the return type of the AM handler function (see the fdw_handler and tsm_handler types). I also think that the CHECK_PROCEDUREs should be in the place of the original GET_REL_PROCEDUREs (just after the relation check). If the interface must exist, we may as well check for it at the beginning, and not after we've done other work which is useless without the interface. However, this patch appears to take more work than I expected. It has to make many changes spread across many files. Yeah, you need to change the definition and I/O handling of every AM function, but that's to be expected. Also, it appears not so easy to hide amsupport in AmRoutine, because it's needed by the relcache. As a temporary solution it's duplicated in RelationData.
I don't understand this - there is already an AmRoutine in RelationData, so why the need for an additional field just for amsupport? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Using contrib modules in check (Re: pgsql: Fix BRIN to use SnapshotAny during summarization)
Re: Tom Lane 2015-08-07 928.1438900...@sss.pgh.pa.us Alvaro Herrera alvhe...@alvh.no-ip.org writes: Fix BRIN to use SnapshotAny during summarization This patch added an isolation test that fails unless contrib/pageinspect has been built and installed. I do not find that acceptable. It causes make check-world to fail ... and no, installing the extension during make check-world isn't going to make me happier. I don't really think we need this isolation test at all, but if we do, please fix it to not rely on any extensions. Perhaps looking at pg_relation_size or some such would do? Or you could just issue a query that should use the index, and see if it finds the rows it ought to. Hi, this breaks the Debian package builds as well because we run check-world as a build step. Any chance for a fix/workaround so the nightly master/head builds will succeed again? Christoph -- c...@df7cb.de | http://www.df7cb.de/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tap tests remove working directories
On 08/09/2015 08:58 PM, Michael Paquier wrote: On Sun, Aug 9, 2015 at 11:19 PM, Andrew Dunstan and...@dunslane.net wrote: On 08/09/2015 08:41 AM, Michael Paquier wrote: On Sun, Aug 9, 2015 at 1:40 AM, Andrew Dunstan and...@dunslane.net wrote: On 08/08/2015 09:31 AM, Robert Haas wrote: On Fri, Aug 7, 2015 at 7:17 PM, Andrew Dunstan and...@dunslane.net wrote: That certainly isn't what happens, and given the way this is done in TestLib.pm, using the CLEANUP parameter of File::Temp's tempdir() function, it's not clear how we could do that easily. shot-in-the-dark Set cleanup to false and manually remove the directory later in the code, so that stuff runs only if we haven't died sooner? /shot-in-the-dark Yeah, maybe. I'm thinking of trying to do it more globally, like in src/Makefile.global.in. That way we wouldn't have to add new code to every test file. If we rely on END to clean up the temporary data folder, there is no need to touch each test file; just the test functions called in TestLib.pm could switch a flag to skip the cleanup at the end of the run should an unexpected result be found. Now I am also curious to see what you have in mind for manipulating Makefile.global. See attached. If you have a better idea please post your patch. Sure. Attached is what I have in mind. Unlike your version, mine keeps the temp paths of a failed run around even if a later run succeeds when running make check multiple times in a row. Perhaps it does not matter much in practice, as log files get removed at each new run, but I think this allows finer control in case of failure. Feel free to have a look. Actually, I don't think this is a very good idea. You could end up with a whole series of opaquely named directories from a series of failing runs. If you want to keep the directory after a failure, and want to do another run, then rename the directory. That's what you have to do with the main regression tests, too.
My version also has the benefit of changing exactly 3 lines in the source :-) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
On Mon, Aug 10, 2015 at 1:36 PM, Rahila Syed rahilasye...@gmail.com wrote: Hello, Say, 6 bigint counters, 6 float8 counters, and 3 strings up to 80 characters each. So we have a fixed-size chunk of shared memory per backend, and each backend that wants to expose progress information can fill in those fields however it likes, and we expose the results. This would be sorta like the way pg_statistic works: the same columns can be used for different purposes depending on what estimator will be used to access them. After thinking more on this suggestion, I came up with the following generic structure, which can be used to store the progress of any command per backend in shared memory:

struct PgBackendProgress
{
    int32  *counter[COMMAND_NUM_SLOTS];
    float8 *counter_float[COMMAND_NUM_SLOTS];
    char   *progress_message[COMMAND_NUM_SLOTS];
};

COMMAND_NUM_SLOTS will define the maximum number of slots (phases) for any command. The progress of a command will be measured using the progress of each of its phases. For some commands the number of phases can be one, and the rest of the slots will be NULL. Each phase will report n integer counters, n float counters, and a progress message. For some phases, any of the above fields can be NULL. For VACUUM, there can be 3 phases, as discussed in the earlier mails. Phase 1: report 2 integer counters (heap pages scanned and total heap pages), 1 float counter (percentage complete), and a progress message. Phase 2: report 2 integer counters (index pages scanned and total index pages, across all indexes) and a progress message. Phase 3: report 1 integer counter (heap pages vacuumed). This structure can be accessed by the statistics collector to display progress via a new view. I have one question about this: when we're in phase 2 or 3, don't we also need to report the total number of pages scanned, or the percentage of table pages scanned?
As Robert said, phase 2 (meaning lazy_vacuum_index here) and phase 3 (meaning lazy_vacuum_heap here) could be called whenever lazy_scan_heap fills up maintenance_work_mem. And phase 3 could be called at the end of scanning a single page if the table doesn't have an index. So if the vacuum progress checker reports only the current phase information, we would not be able to know where we are now. Regards, -- Masahiko Sawada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Test code is worth the space
On 10 August 2015 at 13:55, Robert Haas robertmh...@gmail.com wrote: On Mon, Aug 10, 2015 at 2:02 AM, Simon Riggs si...@2ndquadrant.com wrote: On another review I suggested we add a function to core to allow it to be used in regression tests. A long debate ensued, deciding that we must be consistent and put diagnostic functions in contrib. My understanding is that we are not allowed to write regression tests that use contrib modules, yet the consistent place to put diagnostic functions is contrib - therefore, we are never allowed to write tests utilizing diagnostic functions. We are allowed to put modules for testing in another directory, but these are not distributed to users, so they cannot be used for production diagnosis. Systemic fail, advice needed. There are actually two ways to do this. One model is the dummy_seclabel module. The build system arranges for that to be available when running the core regression tests, so they can use it. And the dummy_seclabel test does. There are actually a couple of other loadable modules that are used by the core regression tests in this kind of way (refint and autoinc, from contrib/spi). The other model is what I'll call the test_decoding model. Since the core code can't be tested without a module, we have a module. But then the tests are in the module's directory (contrib/test_decoding/{sql,expected}), not the core regression tests. In general, I have a mild preference for the second model. It seems more scalable, and keeps the core tests quick to run, which is appropriate for more obscure tests that are unlikely to break very often. But the first model can also be done, as shown by the fact that we have in fact done it several times. Neither of those uses a diagnostic function that also has value in production environments - for logical decoding we added something to core specifically to allow this: pg_recvlogical.
Anyway, resolving this isn't important anymore because I wish to pursue a different mechanism for freezing, but it's possible I'll hit the same issue. -- Simon Riggs  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] tap tests remove working directories
Andrew Dunstan and...@dunslane.net writes: On 08/09/2015 08:58 PM, Michael Paquier wrote: Sure. Attached is what I have in mind. Contrary to your version we keep around temp paths should a run succeed after one that has failed when running make check multiple times in a row. Perhaps it does not matter much in practice as log files get removed at each new run but I think that this allows a finer control in case of failure. Feel free to have a look. Actually, I don't think this is a very good idea. You could end up with a whole series of opaquely named directories from a series of failing runs. If you want to keep the directory after a failure, and want to do another run, then rename the directory. That's what you have to do with the main regression tests, too. My version also has the benefit of changing exactly 3 lines in the source :-) FWIW, I think we should keep the behavior of the TAP tests as close as possible to the established behavior of the main regression tests. It's certainly possible that there's room for improvement in that benchmark behavior. But let's first converge the test behaviors, and then have a discussion about whether we want changes, and if so change all the tests at the same time. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Don't start streaming after creating a slot in pg_receivexlog
On 2015-07-30 17:14:16 +0900, Michael Paquier wrote: So, perhaps the attached is more convincing then? It just changes --create-slot to leave immediately after creation to address the complain of this thread. -- Michael Pushed that. Thanks! Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] expose confirmed_flush for replication slots
On 2015-07-08 15:01:15 +0300, Marko Tiikkaja wrote:

+ if (confirmed_flush_lsn != InvalidTransactionId)
+     values[i++] = LSNGetDatum(confirmed_flush_lsn);
+ else
+     nulls[i++] = true;

Hm. That comparison is using the wrong datatype, but it appears you only copied my earlier mistake... Fixed back to 9.4 and in your patch. Other notes: * you missed updating test_decoding's regression test output files. * None of the docs were touched. catalogs.sgml definitely needs docs about the new columns, and I see no reason to leave the examples elsewhere stale. Fixed those and committed it. Thanks for the patch! - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers