[HACKERS] Proposal: ALTER EXTENSION SET OPTION
Hi all;

One of the annoyances we currently deal with regarding analytics extensions in a PG environment with mixed versions is that there is no facility right now to conditionally support certain modifications to functions that might be required to make certain features work properly. The case that comes to mind right now is marking some functions parallel safe for PostgreSQL 9.6 and above while taking no action for 9.5. Currently we have a few options:

1. Mark on a best-effort basis
2. Drop support for 9.5 and below
3. Come up with some much more complicated version graph

It would be very nice to be able to define some options which could be set for extensions but don't affect their arguments or return types, such as marking functions parallel safe, and then have scripts which run to apply these options. My thinking is one would have a syntax for a few specified options, such as:

ALTER EXTENSION foo SET OPTION PARALLEL SAFETY;

or, if the extension supports the reverse:

ALTER EXTENSION foo UNSET OPTION PARALLEL SAFETY;

Over time more options could be added, but each would be mapped to a file convention. In this case, we could expect:

foo--[version]--set-parallel-safety.sql
and
foo--[version]--unset-parallel-safety.sql

-- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
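To make the file convention concrete, a script under the proposed naming scheme might contain nothing but ALTER FUNCTION statements. (The SET OPTION syntax and the script name are part of this proposal, not existing PostgreSQL features; the function names below are invented for illustration.)

```sql
-- foo--1.2--set-parallel-safety.sql
-- Hypothetical script under the proposed convention, run by:
--   ALTER EXTENSION foo SET OPTION PARALLEL SAFETY;
-- The ALTER FUNCTION ... PARALLEL SAFE syntax itself exists since 9.6.
ALTER FUNCTION foo_sum(integer, integer) PARALLEL SAFE;
ALTER FUNCTION foo_rollup(text) PARALLEL SAFE;
```

On 9.5 and below the file would simply be absent, so the extension installs cleanly without the 9.6-only markings.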
Re: [HACKERS] proposal: schema variables
Some thoughts on this. On Thu, Nov 2, 2017 at 4:48 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Nico Williams <n...@cryptonector.com> writes: > > With access controls, GUCs could become schema variables, and settings > > from postgresql.conf could move into the database itself (which I think > > would be nice). > > People re-propose some variant of that every so often, but it never works, > because it ignores the fact that some of the GUCs' values are needed > before you can access system catalogs at all, or in places where relying > on system catalog access would be a bad idea. > I think the basic point one should get here is that no matter the unification, you still have some things in the db and some things out. I would rather look at how the GUC could be improved on a functional/use case level before we look at the question of a technical solution. One major use case today would be restricting how high various users can set something like work_mem or the like. As it stands, there isn't really a way to control this with any granularity. So some of the proposals regarding granting access to a session variable would be very handy in granting access to a GUC variable. > > Sure, we could have two completely different configuration mechanisms > so that some of the variables could be "inside the database", but that > doesn't seem like a net improvement to me. The point of the Grand Unified > Configuration mechanism was to be unified, after all. > +1 > > I'm on board with having a totally different mechanism for session > variables. The fact that people have been abusing GUC to store > user-defined variables doesn't make it a good way to do that. 
> What about having a more clunky syntax, such as:

SET VARIABLE foo = 'bar';

Perhaps one can have a short form of:

SET VAR foo = 'bar';

vs

SET foo = 'bar'; -- GUC

> > regards, tom lane > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Oracle to PostGre
As a brief note, this is probably not the best list for this. You would do better to ask questions like this on -general where you have more application developers and so forth. This is more of an SQL question so asking people who are hacking the codebase may not be the best way to get it answered. Also, it is Postgres or PostgreSQL. People will assume you are totally new if you call it Postgre. On Wed, Nov 1, 2017 at 12:55 PM, Brahmam Eswar <brahmam1...@gmail.com> wrote: > Hi, > > App is moving to Postgre from Oracel . After migrating the store procedure > is throwing an error with collection type. > > *Oracle :* > > create or replace PROCEDURE"PROC1" > ( > > , REQ_CURR_CODE IN VARCHAR2 > , IS_VALID OUT VARCHAR2 > , ERROR_MSG OUT VARCHAR2 > ) AS > > >TYPE INV_LINES_RT IS RECORD( > VENDOR_NUM AP.CREATURE_TXN_LINE_ITEMS.VENDOR_NUM%TYPE, > VENDOR_SITE_CODE AP.CREATURE_TXN_LINE_ITEMS. > VENDOR_SITE_CODE%TYPE, > INVOICE_NUM AP.CREATURE_TXN_LINE_ITEMS.INVOICE_NUM%TYPE, > TXN_CNT NUMBER >); >TYPE INV_LINES_T IS TABLE OF INV_LINES_RT; >L_INV_LINES INV_LINES_T; >IS_MULTI_VENDOR FINO_APRVL_BUS_CHANN_DEFAULTS. > MULTI_VENDOR_REQ_ALLOWED%TYPE; >BUS_CHANNEL_RECORD FINO_APRVL_BUS_CHANN_DEFAULTS%ROWTYPE; > CAL_APRVL_AMT_BY_TOTAL_AMT FINO_APRVL_BUS_CHANN_DEFAULTS. 
> CAL_APR_AMT_BY_INV%TYPE; > > > Postgre : > > create or replace FUNCTION AP.VALIDATE_CRTR_LINE_ITEMS > ( > REQ_CURR_CODE IN VARCHAR, > IS_VALID OUT VARCHAR, > ERROR_MSG OUT VARCHAR > ) AS $$ > > DECLARE > > INV_LINES_T ap.validate_crtr_line_items$inv_lines_rt ARRAY; > L_INV_LINES INV_LINES_T%TYPE; > L_INV_LINES$temporary_record ap.validate_crtr_line_items$inv_lines_rt; > IS_MULTI_VENDOR AP.FINO_APRVL_BUS_CHANN_DEFAULTS.MULTI_VENDOR_REQ_ > ALLOWED%TYPE; > BUS_CHANNEL_RECORD ap.fino_aprvl_bus_chann_defaults%ROWTYPE; > CAL_APRVL_AMT_BY_TOTAL_AMT AP.FINO_APRVL_BUS_CHANN_ > DEFAULTS.CAL_APR_AMT_BY_INV%TYPE; > > > but it's throwing an error as : 0 SQLState: 42P01 Message: ERROR: > relation "l_inv_lines" does not exist > -- > Thanks & Regards, > Brahmeswara Rao J. > When you ask on -general, please include the query which is actually causing the problem. My guess is that either you didn't declare the type properly or there is some other error in your function, but the information provided is not sufficient to answer it. Best of luck asking on -general. -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
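For what it is worth, one common way to port an Oracle RECORD plus TABLE OF collection to PL/pgSQL is a composite type plus an array of that type; the 42P01 error above suggests the array variable is being referenced as if it were a relation. A rough sketch only — column types and the source table are guessed from the Oracle snippet:

```sql
-- Composite type standing in for Oracle's INV_LINES_RT record
CREATE TYPE ap.inv_lines_rt AS (
    vendor_num       text,
    vendor_site_code text,
    invoice_num      text,
    txn_cnt          numeric
);

CREATE OR REPLACE FUNCTION ap.validate_crtr_line_items(
    req_curr_code varchar,
    OUT is_valid varchar,
    OUT error_msg varchar
) AS $$
DECLARE
    l_inv_lines ap.inv_lines_rt[];   -- stands in for TABLE OF INV_LINES_RT
BEGIN
    -- Fill the collection with a query instead of selecting FROM the
    -- variable itself (which PostgreSQL would parse as a missing relation,
    -- hence "relation ... does not exist").
    SELECT array_agg((vendor_num, vendor_site_code, invoice_num, 1)::ap.inv_lines_rt)
      INTO l_inv_lines
      FROM ap.creature_txn_line_items;

    is_valid := 'Y';
    error_msg := NULL;
END;
$$ LANGUAGE plpgsql;
```

The point of the sketch is the declaration pattern, not the business logic: arrays of composite types replace Oracle's local collection types, and they are populated with queries or array functions rather than treated as tables.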
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
Attached is the patch as submitted for commitfest. Please note, I am not averse to adding an additional --include-path directive if that avoids backwards-compatibility problems. However, the patch is complex enough that I would really prefer review of the rest of it to start first. This doesn't strike me as perfectly trivial, and I think it is easier to review pieces separately. I have already started on the --include-path directive and could probably get it attached to a later version of the patch very soon.

I would also like to address a couple of important points here:

1. I think default restrictions plus additional paths is the best, safest way forward. Excluding shell globs doesn't solve the "I need this particular config file" case very well, particularly if we want to support this outside of an internal environment. Shell globs also tend to be overly inclusive, so if you exclude based on them, you run the risk that your rewind is corrupt for being overly exclusive.

2. I would propose any additional paths be specified using an --include-path directive. This could be specified multiple times and could point to a file or directory which would be added to the paths rewound. I have a patch for this mostly done, but I am concerned that these sorts of changes result in a combination of changes that are easier to review separately than together. So if needed, I can add it here or in a separate patch following.

3. I think it would be a mistake to tie backup solutions in non-replicated environments to replication use cases, and vice versa. Things like replication slots (used for streaming backups) have different considerations in different environments. Rather than build the same infrastructure first, I think it is better to support different use cases well and then build common infrastructure to support them. I am not against building common infrastructure for pg_rewind and pg_basebackup.
I am very much against having the core guarantees be exactly the same. Best Wishes, Chris Travers On Sat, Oct 28, 2017 at 1:22 PM, Chris Travers <chris.trav...@adjust.com> wrote: > Hi; > > There are still some cleanup bits needed here but I wanted to get feedback > on my general direction. > > I hope to submit for commit fest soon if the general feedback is good. > Tests are passing (with adjustments intended for change of behaviour in one > test script). I want to note that Crimson Thompson (also with Adjust) has > provided some valuable discussion on this code. > > The Problem: > > pg_rewind makes no real guarantees regarding the final state of non-data > files or directories. It checks to see if the timeline has incremented > (and therefore guarantees that if successful the data directories are on > the same timeline) but for non-data files, these are clobbered if we rewind > and left intact if not. These other files include postgresql.auto.conf, > replication slots, and can include log files. > > Copying logs over to the new slave is something one probably never wants > to do (same with replication slots), and the behaviours here can mask > troubleshooting regarding why a particular master failed, cause wal > segments to build up, automatic settings changes, and other undesirable > behaviours. Together these make pg_rewind very difficult to use properly > and push tasks to replication management tooling that the management tools > are not in a good position to handle correctly. > > Designing the Fix: > > Two proposed fixes have been considered and one selected: one was to > whitelist directories and rewind only those; the other was to allow shell > globs to be provided that could be used to exclude files. The whitelisting > solution was chosen for a number of reasons. > > When pg_rewind "succeeds" but not quite correctly the result is usually a > corrupted installation which requires a base backup to replace it anyway.
> In a recovery situation, sometimes pressures occur which render human > judgment less effective, and therefore glob exclusion strikes me as > something which would probably do more harm than good, but maybe I don't > understand the use case (comments as to why some people look at the other > solution as preferable would be welcome). > > In going with the whitelisting solution, we chose to include all > directories with WAL-related information. This allows more predictable > interaction with other parts of the replication chain. Consequently not > only do we copy pg_wal and pg_xact but also commit timestamps and so forth. > > The Solution: > > The solution is a whitelist of directories specified which are the only > ones which are synchronised. The relevant part of this patch is: > > +/* List of directories to synchronize: > > + * base data dirs (a
Re: [HACKERS] Patch: restrict pg_rewind to whitelisted directories
On Tue, Oct 31, 2017 at 1:38 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Mon, Oct 30, 2017 at 6:44 PM, Chris Travers <chris.trav...@adjust.com> > wrote: > > The attached patch is cleaned up and filed for the commit fest this next > > month: > > It's generally better to post the patch on the same message as the > discussion thread, or at least link back to the discussion thread from > the new one. Otherwise people may have trouble understanding the > history. > Fair point. Mostly concerned about the WIP marker there. This is my first time around this. I will then retire this thread and attach the patch on the other one. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
[HACKERS] Anyone have experience benchmarking very high effective_io_concurrency on NVME's?
Hi; After Andres's excellent talk at PGConf we tried benchmarking effective_io_concurrency on some of our servers and found that those which have a number of NVMe storage volumes could not fill the I/O queue even at the maximum setting (1000). Before we start benchmarking patched versions of PostgreSQL to bump this setting up to ever higher values, I am wondering if anyone has done this yet and, if so, whether you would be willing to share results. -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
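For anyone wanting to reproduce the setup, the relevant knob (within the current hard-coded ceiling of 1000) can be set session-wide or scoped to a tablespace; the tablespace name below is hypothetical:

```sql
-- Session-wide, at the current maximum:
SET effective_io_concurrency = 1000;

-- Or scoped to the NVMe-backed tablespace only, so slower volumes
-- keep their own setting:
ALTER TABLESPACE nvme_ts SET (effective_io_concurrency = 1000);
```

The per-tablespace form is probably the more realistic comparison point on servers mixing NVMe and spinning storage.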
[HACKERS] Patch: restrict pg_rewind to whitelisted directories
The attached patch is cleaned up and filed for the commit fest this next month: Here's the full commit message via Mercurial. I will likely have a new branch per version of the patch since that's the closest thing to a rebase in this version control system.

changeset: 60492:47f87a2d2fa1
tag: mine/pg_rewind_restrict_dirs
parent: 60446:e638ba9c3c11
user: Chris Travers <chris.trav...@gmail.com>
date: Mon Oct 30 12:25:18 2017 +0100
files: doc/src/sgml/ref/pg_rewind.sgml src/bin/pg_rewind/copy_fetch.c src/bin/pg_rewind/fetch.c src/bin/pg_rewind/fetch.h src/bin/pg_rewind/libpq_fetch.c src/bin/pg_rewind/pg_rewind.c src/bin/pg_rewind/t/003_extrafiles.pl
description: Restrict pg_rewind to whitelisted directories.

This is intended to be a minimum working version and in fact builds and passes tests. Note that tests for extra files have been changed to reflect new behavior and additional debugging information added to the output in case of failure. The patch iterates through a series of set directories to synchronize them only. This improves predictability of the complete state of the system after a rewind. One important outstanding question here is whether we need to ensure the possibility of backing up other files if they exist via an --include-path command line switch (this would not be a glob). In the thread discussing this patch, Michael Paquier has expressed concern about configuration files created by extensions or other components not being copied. I could add such a switch, but the patch is long enough, and it is unclear to what extent this is needed at present, so I am leaving it at the reviewer's discretion whether I should add this here or submit a second patch later to add the ability to add additional paths to the filemap. Either way, it is worth noting that I expect to have a subsequent patch either incorporated here or in a further submission that takes this and adds the ability to include additional directories or files via a command line flag.
This will *not* be a shell glob but one directory or file per invocation of the switch (similar to -t in pg_dump). -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin pg_rewind_restrict_dirs.v2.patch Description: Binary data
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
On Mon, Oct 30, 2017 at 11:36 AM, Michael Paquier <michael.paqu...@gmail.com > wrote: > On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers > <chris.trav...@adjust.com> wrote: > > This also brings up a fairly major concern more generally about control > by > > the way. A lot of cases where pg_rewind is called, the user doesn't > > necessarily have much control on how it is called. Moreover in many of > > these cases, the user is probably not in a position to understand the > > internals well enough to grasp what to check after. > > Likely they are not. > So I am submitting the current patch to commit fest. I am open to adding a new --include-path option in this patch, but I am worried about atomicity concerns, and I am still not really sure what the impact is for you (I haven't heard whether you expect this file to be there before the rewind, i.e. whether it would be on both master and slave, or just on the slave). So there is the question of under what specific circumstances this would break for you. > > > Right, but there is a use case difference between "I am taking a backup > of a > > server" and "I need to get the server into rejoin the replication as a > > standby." > > The intersection of the first and second categories is not empty. Some > take backups and use them to deploy standbys. > Sure, but it is not complete either. > > > A really good example of where this is a big problem is with replication > > slots. On a backup I would expect you want replication slots to be > copied > > in. > > I would actually expect the contrary, and please note that replication > slots are not taken in a base backup, which is what the documentation > says as well: > https://www.postgresql.org/docs/10/static/protocol-replication.html > "pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots, > pg_stat_tmp, and pg_subtrans are copied as empty directories (even if > they are symbolic links)." > Many of these are emptied on server restart but pg_replslot is not.
What is the logic for excluding it from backups other than to avoid problems with replication provisioning? I mean, if I have a replication slot for taking backups with barman (and I am not actually doing replication), why would I *not* want that in my base backup if I might have to restore to a new machine somewhere? > > Some code I have with 9.6's pg_basebackup removes replication > slots manually, as this logic is new in 10 ;) > > >> The pattern that base backups now use is an exclude list. In the > >> future I would rather see base backups and pg_rewind using the same > >> infrastructure for both things: > >> - pg_rewind should use the replication protocol with BASE_BACKUP. > >> Having it rely on root access now is crazy. > >> - BASE_BACKUP should have an option where it is possible to exclude > >> custom paths. > >> What you are proposing here would make both diverge more, which is in > >> my opinion not a good approach. > > > > How does rep mgr or other programs using pg_rewind know what to exclude? > > Good question. Answers could come from folks such as David Steele > (pgBackRest) or Marco (barman) whom I am attaching in CC. > Two points that further occur to me: Shell globs seem to me to be foot guns in this area. I think special paths should be one path per invocation of the option, not "--exclude=pg_w*", since this is just asking for problems as time goes on and things get renamed. It also seems to me that adding specific paths is far safer than removing specific paths. > -- > Michael > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
On Mon, Oct 30, 2017 at 10:57 AM, Michael Paquier <michael.paqu...@gmail.com > wrote: > On Mon, Oct 30, 2017 at 9:43 AM, Chris Travers <chris.trav...@adjust.com> > wrote: > > Are there any cases right now where you have features added by > extensions that write to directories which are required for a rewind? > > In some of the stuff I maintain, I actually have one case now of a > configuration file included with include_if_exists by postgresql.conf > and is expected to be generated by a component that my code doing the > rewind has no direct access on... I can control how pg_rewind kicks > in, but I think that you would actually break silently the current > code, which is scary especially if I don't control the code where > pg_rewind is called when Postgres 11 gets integrated into the product > I am thinking about, and even more scary if there is no way to include > something. > Ok, so there is an argument that there needs to be a way to include additional paths in this patch. One important question I would have in these cases is whether you expect these to be set only on the master. If not, then is this a problem, and if not, then how do you handle the fail-back problem etc.? This also brings up a fairly major concern more generally about control, by the way. In a lot of cases where pg_rewind is called, the user doesn't necessarily have much control over how it is called. Moreover, in many of these cases, the user is probably not in a position to understand the internals well enough to grasp what to check after. > > > The problem with an exclude list is that we cannot safely exclude > > configuration files or logs (because these could be named anything), > right? > > You have the exact same problem with base backups now, and any > configuration files created by extensions are a per-case problem... > Right, but there is a use case difference between "I am taking a backup of a server" and "I need to get the server to rejoin replication as a standby."
A really good example of where this is a big problem is with replication slots. On a backup I would expect you want replication slots to be copied in. However when setting yourself up as a slave you most certainly do not want to just copy these over from the master unless you have infinite disk space. I would argue that under *no* circumstance should pg_rewind *ever* copy replication slots. But pg_base_backup really *should* (and when provisioning a new slave you should clear these as soon as it is set up). The pattern that base backups now use is an exclude list. In the > future I would rather see base backups and pg_rewind using the same > infrastructure for both things: > - pg_rewind should use the replication protocol with BASE_BACKUP. > Having it rely on root access now is crazy. > - BASE_BACKUP should have an option where it is possible to exclude > custom paths. > What you are proposing here would make both diverge more, which is in > my opinion not a good approach. > How does rep mgr or other programs using pg_rewind know what to exclude? Again I am not convinced setting up a replica and taking a backup for disaster recovery are the same use case or have the same requirements. -- > Michael > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
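As an aside, the manual cleanup step described above ("clear these as soon as it is set up") can be done from SQL on the freshly provisioned standby, assuming any slots copied over from the master are inactive:

```sql
-- Drop replication slots inherited from the master after provisioning,
-- so they do not pin WAL on the new standby indefinitely.
SELECT pg_drop_replication_slot(slot_name)
  FROM pg_replication_slots
 WHERE NOT active;
```

This is exactly the kind of step that differs between the backup use case (where you may want the slots preserved) and the standby-provisioning use case (where you almost never do).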
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
First, thanks for your thoughts on this, and I am interested in probing them more. On Mon, Oct 30, 2017 at 9:04 AM, Michael Paquier <michael.paqu...@gmail.com> wrote: > On Sat, Oct 28, 2017 at 4:22 AM, Chris Travers <chris.trav...@adjust.com> > wrote: > > The Solution: > > The solution is a whitelist of directories specified which are the only > ones > > which are synchronised. The relevant part of this patch is: > > > > +/* List of directories to synchronize: > > + * base data dirs (and ablespaces) > > + * wal/transaction data > > + * and that is it. > > + * > > + * This array is null-terminated to make > > + * it easy to expand > > + */ > > > > +const char *rewind_dirs[] = { > > +"base", > > +"global", > > +"pg_commit_ts", > > +"pg_logical", > > +"pg_multixact", > > +"pg_serial", > > +"pg_subtrans", > > +"pg_tblspc", > > +"pg_twophase", > > +"pg_wal", > > +"pg_xact", > > +NULL > > +}; > > The problem with a white list, which is one reason why I do not favour > it, is in the case where a feature adds in the data folder a new path > for its data, particularly in the case where this is critical for a > base backup. If this folder is not added in this list, then a rewind > may be silently corrupted as well. There are also a set of directories > in this list that we do not care about, pg_serial being one. > pg_subtrans is a second, as it gets zeroed on startup. > The problem with an exclude list is that we cannot safely exclude configuration files or logs (because these could be named anything), right? Are there any cases right now where you have features added by extensions that write to directories which are required for a rewind? I am asking because I would like to see if this is the minimum working change or if this change is fundamentally broken currently and must be extended to allow custom paths to be sync'd as well. > > And if you think about it, pg_rewind is actually the *same* processing > as a base backup taken using the replication protocol. 
So instead of > this patch I would recommend using excludeFiles and excludeDirContents > by moving this list to a common header where frontend and backend can > refer to. Note that basebackup.h includes replnodes.h, so my thoughts > is that you should split the current header with something like > basebackup_internal.h which is backend-only, and have pg_rewind fetch > the list of directories to automatically ignore as those ones. You can > also simplify a bit the code of pg_rewind a bit so as things like > postmaster.opts. On top of that, there is visibly no need to tweak the > SQL query fetching the directory list (which is useful for debugging > as shaped now actually), but just change process_source_file so as its > content is not added in the file map. > I am not sure it *should* be the same, however. In a backup we probably want to backup the postgresql.auto.conf, but on a failover, I don't think we want to clobber configuration. We certainly do not want to sometimes clobber configuration but not other times (which is what happens right now in some cases). And under no circumstances do we want to clobber logs on a failed server with logs on a working server. That's asking for serious problems in my view. If you think about it, there's a huge difference in use case in backing up a database cluster (Including replication slots, configs in the dir etc) and re-syncing the data so that replication can resume, and I think there are some dangers that come up when assuming these should be closely tied together. > > Then there is a second case where you do not want a set of folders to > be included, but those can be useful by default, an example here being > pg_wal where one might want to have an empty path to begin with. A > third case is a set of folders that you do not care about, but depends > on the deployment, being here "log" for example. 
For those you could > just add an --exclude-path option which simply piles up paths into a > linked list when called multiple times. This could happen with a > second patch. > Agreed. And one could add an "--include-path" option to allow for unusual cases where you want extra directories, such as replication slots, or the like. I think another patch could also specifically empty and perhaps create a replication slot, allowing one to bring things back up via streaming replication more safely. > > > From there we iterate over this array for directory-based approaches in > copy_fetch.c and with one query per directory in libpq
Re: [HACKERS] proposal: schema variables
On Sat, Oct 28, 2017 at 4:56 PM, Pavel Stehule <pavel.steh...@gmail.com> wrote: > >> > The creating database objects and necessary infrastructure is the most > simple task of this project. I'll be more happy if there are zero > intersection because variables and GUC are designed for different purposes. > But due SET keyword the intersection there is. > > When I thinking about it, I have only one, but important reason, why I > prefer design new type of database object -the GUC are stack based with > different default granularity - global, database, user, session, function. > This can be unwanted behave for variables - it can be source of hard to > detected bugs. I afraid so this behave can be too messy for usage as > variables. > > @1 I have not clean opinion about it - not sure if rights are good enough > - probably some user limits can be more practical - but can be hard to > choose result when some user limits and GUC will be against > I was mostly thinking that users can probably set things like work_mem and possibly this might be a problem. > @2 With variables typed custom GUC are not necessary > I don't know about that. For example with the geoip2lookup extension it is nice that you could set the preferred language for translation on a per user basis or the mmdb path on a per-db basis. > @3 Why you need it? It is possible with set_config function now. > Yeah you could do it safely with set_config and a CTE, but suppose I have: with a (Id, value) as (values (1::Int, 'foo'), (2, 'bar'), (3, 'baz')) SELECT set_config('custom_val', value) from a where id = 2; What is the result out of this? I would *expect* that this would probably run set_config 3 times and filter the output. 
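For reference, set_config actually takes a third argument (is_local) controlling whether the value reverts at transaction end, and custom settings need a dotted name; a hedged restatement of the snippet above (the `myapp.` prefix is an assumption for illustration):

```sql
-- set_config(setting_name, new_value, is_local): is_local = false keeps
-- the value for the session; true reverts it at transaction end.
-- How many times it fires inside a query's target list is not guaranteed,
-- which is exactly the hazard being discussed.
WITH a(id, value) AS (VALUES (1, 'foo'), (2, 'bar'), (3, 'baz'))
SELECT set_config('myapp.custom_val', value, false)
  FROM a
 WHERE id = 2;

SELECT current_setting('myapp.custom_val');
```

Whether the planner evaluates set_config once (after filtering) or for every row is an implementation detail, which is what makes volatile functions with side effects in target lists a poor fit for variable assignment.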
> Regards
> Pavel
-- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] proposal: schema variables
On Thu, Oct 26, 2017 at 9:21 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote: > Hi, > > I propose a new database object - a variable. The variable is persistent > object, that holds unshared session based not transactional in memory value > of any type. Like variables in any other languages. The persistence is > required for possibility to do static checks, but can be limited to session > - the variables can be temporal. > > My proposal is related to session variables from Sybase, MSSQL or MySQL > (based on prefix usage @ or @@), or package variables from Oracle (access > is controlled by scope), or schema variables from DB2. Any design is coming > from different sources, traditions and has some advantages or > disadvantages. The base of my proposal is usage schema variables as session > variables for stored procedures. It should to help to people who try to > port complex projects to PostgreSQL from other databases. > > The Sybase (T-SQL) design is good for interactive work, but it is weak > for usage in stored procedures - the static check is not possible. Is not > possible to set some access rights on variables. > > The ADA design (used on Oracle) based on scope is great, but our > environment is not nested. And we should to support other PL than PLpgSQL > more strongly. > > There is not too much other possibilities - the variable that should be > accessed from different PL, different procedures (in time) should to live > somewhere over PL, and there is the schema only. > > The variable can be created by CREATE statement: > > CREATE VARIABLE public.myvar AS integer; > CREATE VARIABLE myschema.myvar AS mytype; > > CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type > [ DEFAULT expression ] [[NOT] NULL] > [ ON TRANSACTION END { RESET | DROP } ] > [ { VOLATILE | STABLE } ]; > > It is dropped by command DROP VARIABLE [ IF EXISTS] varname. > > The access rights is controlled by usual access rights - by commands > GRANT/REVOKE. 
The possible rights are: READ, WRITE > > The variables can be modified by SQL command SET (this is taken from > standard, and it natural) > > SET varname = expression; > > Unfortunately we use the SET command for different purpose. But I am > thinking so we can solve it with few tricks. The first is moving our GUC to > pg_catalog schema. We can control the strictness of SET command. In one > variant, we can detect custom GUC and allow it, in another we can disallow > a custom GUC and allow only schema variables. A new command LET can be > alternative. > > The variables should be used in queries implicitly (without JOIN) > > SELECT varname; > > The SEARCH_PATH is used, when varname is located. The variables can be > used everywhere where query parameters are allowed. > > I hope so this proposal is good enough and simple. > > Comments, notes? > I have a question on this. Since one can issue set commands on arbitrary settings (and later ALTER database/role/system on settings you have created in the current session) I am wondering how much overlap there is between a sort of extended GUC with custom settings and variables. Maybe it would be simpler to treat variables and GUC settings to be similar and see what can be done to extend GUC in this way? I mean if instead we allowed restricting SET to known settings then we could have a CREATE SETTING command which would behave like this and then use SET the same way across both. In essence I am wondering if this really needs to be as separate from GUC as you are proposing. If done this way then: 1. You could issue grant or revoke on GUC settings, allowing some users but not others to set things like work_mem for their queries 2. You could specify allowed types in custom settings. 3. In a subsequent stage you might be able to SELECT INTO setting_name FROM ; allowing access to setting writes based on queries. 
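For context, part of this already works today: custom settings can be given per-role and per-database defaults via ALTER ROLE / ALTER DATABASE, just without any access control. The extension and setting names below are hypothetical, as is the GRANT syntax sketched in the comment:

```sql
-- What the existing GUC machinery already allows for custom settings:
ALTER ROLE reporting SET geoip2lookup.preferred_lang = 'de';
ALTER DATABASE appdb SET geoip2lookup.mmdb_path = '/srv/geoip/GeoLite2-City.mmdb';

-- What the proposal above would add on top (hypothetical syntax):
-- GRANT WRITE ON SETTING work_mem TO analyst;
```

The gap the proposal targets is the first point in the list above: there is currently no way to say that one role may raise work_mem and another may not.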
> regards > > Pavel > > > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
[HACKERS] WIP: Restricting pg_rewind to data/wal dirs
Hi; There are still some cleanup bits needed here but I wanted to get feedback on my general direction. I hope to submit for commit fest soon if the general feedback is good. Tests are passing (with adjustments intended for the change of behaviour in one test script). I want to note that Crimson Thompson (also with Adjust) has provided some valuable discussion on this code. The Problem: pg_rewind makes no real guarantees regarding the final state of non-data files or directories. It checks to see if the timeline has incremented (and therefore guarantees that, if successful, the data directories are on the same timeline), but non-data files are clobbered if we rewind and left intact if not. These other files include postgresql.auto.conf, replication slots, and can include log files. Copying logs over to the new slave is something one probably never wants to do (same with replication slots), and the behaviours here can hamper troubleshooting of why a particular master failed, cause WAL segments to build up, silently change settings, and produce other undesirable behaviours. Together these make pg_rewind very difficult to use properly and push tasks onto replication management tooling that such tooling is not in a good position to handle correctly. Designing the Fix: Two fixes were considered and one selected: one was to whitelist directories and rewind only those; the other was to allow shell globs to be provided that could be used to exclude files. The whitelisting solution was chosen for a number of reasons. When pg_rewind "succeeds" but not quite correctly, the result is usually a corrupted installation which requires a base backup to replace it anyway. 
In a recovery situation, sometimes pressures occur which render human judgment less effective, and therefore glob exclusion strikes me as something which would probably do more harm than good, but maybe I don't understand the use case (comments as to why some people look at the other solution as preferable would be welcome). In going with the whitelisting solution, we chose to include all directories with WAL-related information. This allows more predictable interaction with other parts of the replication chain. Consequently we copy not only pg_wal and pg_xact but also commit timestamps and so forth. The Solution: The solution is a specified whitelist of directories which are the only ones synchronised. The relevant part of this patch is: +/* List of directories to synchronize: + * base data dirs (and tablespaces) + * wal/transaction data + * and that is it. + * + * This array is null-terminated to make + * it easy to expand + */ + +const char *rewind_dirs[] = { +"base", +"global", +"pg_commit_ts", +"pg_logical", +"pg_multixact", +"pg_serial", +"pg_subtrans", +"pg_tblspc", +"pg_twophase", +"pg_wal", +"pg_xact", +NULL +}; From there we iterate over this array for directory-based approaches in copy_fetch.c and with one query per directory in libpq_fetch.c. This also means shifting the basic interface from PQexec to PQexecParams. It would be possible to move to binary formats too, but this was not done here in order to simplify code review (that could be a separate independent patch at a later time). Testing Done: The extra-files tests correctly test this change in behaviour. The tests have been modified, and diagnostics in cases of failures expanded. The other tests provide good coverage of whether pg_rewind does what it is supposed to do. Cleanup still required: I accidentally left Carp::Always in this Perl module. It will be fixed. 
I renamed one of the functions to have a more descriptive name but have not yet removed the old function. Feedback is very welcome. pg_rewind is a very nice piece of software. I am hoping that these sorts of changes will help ensure that it is easier to use and provides more predictable results. -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin pg_rewind_restrict.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add Roman numeral conversion to to_number
On Sun, Sep 17, 2017 at 6:43 PM, David Fetter <da...@fetter.org> wrote: > On Sat, Sep 16, 2017 at 10:42:49PM +, Douglas Doole wrote: > > Oliver, I took a look at your tests and they look thorough to me. > > > > One recommendation, instead of having 3999 separate selects to test every > > legal roman numeral, why not just do something like this: > > > > do $$ > > declare > > i int; > > rn text; > > rn_val int; > > begin > > for i in 1..3999 loop > > rn := trim(to_char(i, 'rn')); > > rn_val := to_number(rn, 'rn'); > > if (i <> rn_val) then > > raise notice 'Mismatch: i=% rn=% rn_val=%', i, rn, rn_val; > > end if; > > end loop; > > raise notice 'Tested roman numerals 1..3999'; > > end; > > $$; > > > > It's a lot easier to maintain than separate selects. > > Why not just one SELECT, as in: > > SELECT i, to_char(i, 'rn'), to_number(to_char(i, 'rn'), 'rn'); > FROM generate_series(1,3999) i > Question: What is our definition of a legal Roman numeral? For example sometimes IXX appears in the corpus to refer to 19 even though our standardised notation would be XIX. > > Best, > David. > -- > David Fetter <david(at)fetter(dot)org> http://fetter.org/ > Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter > Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] pg_rewind proposed scope and interface changes
On Wed, Sep 13, 2017 at 6:28 AM, Michael Paquier <michael.paqu...@gmail.com> wrote: > On Tue, Sep 12, 2017 at 11:52 PM, Chris Travers > <chris.trav...@adjust.com> wrote: > > Additionally the wal, xact, timestamp and logical directories must be > > processed in some way. > > To what does the term "logical directories" refer to? > > > * if --wal=sync the directories are processed the way they are today > > * if --wal=clear then the contents of the directories are cleared and > > replication is assumed to be used to bring the system up after. Note > this > > will need to come with warning about the need for replication slots. > > Hm. I am not sure in what --wal=clear is helpful. Keeping around WAL > segments from the point of the last checkpoint where WAL forked up to > the point where WAL has forked is helpful, because you don't need to > copy again those WAL segments, be they come from an archive or from > streaming. Copying a set of WAL segments during the rewind of the new > timeline is helpful as well because you don't need to do the copy > again. One configuration where this is helpful is that there is > already an archive local to the target server available with the > segments of the new timeline available. > so maybe clear should just clear diverged wal files. That makes some sense. > > > Base, global, pg_tablespace > > > > With > > pg_wal, pg_xact, pg_commit_ts, pg_logical added if wal strategy is set to > > sync. > > Skipping some directories in a way similar to what a base backup does > would be nicer I think. We already have a list of those in > basebackup.c in the shape of excludeDirContents and excludeFiles. I > think that it would be a good idea to export those into a header that > pg_rewind could include, and refer to in order to exclude them when > fetching a set of files. At the end of the day, a rewind is a kind of > base backup in itself, and this patch would already serve well a lot > of people. 
> I think there are several reasons not to just do this based on base backup. For a base backup there may be some things we restore that we don't want to restore with pg_rewind. Configuration files, logs, and replication slots are the ones that immediately come to mind. Given that configuration files can have arbitrary names and locations, expecting third-party failover tooling like repmgr or Patroni to pick up on such policies strikes me as asking a bit much. I think we are better off with a program that solves one problem well and solves it right, and that problem is rewinding a system so that it can follow a replica that was previously following it. > > Having on top of that a way to exclude a wanted set of files and the > log directory (for example: should we look at log_directory and > exclude it from the fetched paths if it is not an absolute path?), > which is smart enough to take care of not removing paths critical for > a rewind like anything in base/, then you are good to go with a > full-blown tool that I think would serve the purposes you are looking > for. > If you are going to go this route, I think you need to look at the config files themselves, parse them, and possibly look at the command line by which PostgreSQL is started. Otherwise I don't know how you expect this to be configured. > -- > Michael > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
[HACKERS] pg_rewind proposed scope and interface changes
Hi all; After working with pg_rewind a bit more it has become clear to me that some aspects of the ideal scope of the program are a bit unclear. This is in part because in the majority of cases where you use the utility, a small amount of data loss is accepted as a tradeoff for continuity. Basic Responsibility: pg_rewind’s core responsibility is to restore data directories so that they are compatible with following another server. Expanded Options: As with pg_basebackup, WAL files and related data may need to be specified separately. Pre-run checks: PG_VERSION must be the same on target and source or else we stop. We then check the timelines, and if they have not diverged we do nothing. Processed directories: pg_rewind must *always* process the global, base, and pg_tablespace directories regardless of the WAL strategy. Additionally the wal, xact, timestamp, and logical directories must be processed in some way: * if --wal=sync the directories are processed the way they are today * if --wal=clear then the contents of the directories are cleared and replication is assumed to be used to bring the system up after. Note this will need to come with a warning about the need for replication slots. No other files will be clobbered. This means that the basic processing list will look like this: base, global, pg_tablespace, with pg_wal, pg_xact, pg_commit_ts, pg_logical added if the WAL strategy is set to sync. Proposed patch roadmap: 1. Improve WAL handling by allowing unlinking of all regular files in those directories during the rewind process. Make --wal=sync the default for now because that is backwards compatible. Longer term we may consider switching the default. 2. Restrict main operation to the directories listed above. Problems that we will not try to solve: * Rewinding past table creation orphans the table file. This is a complex topic on its own and probably needs a separate utility. Thoughts? 
-- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Proposal: pg_rewind to skip config files
On Thu, Sep 7, 2017 at 2:47 PM, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote: > After reading this discussion, I agree that pg_rewind needs to become > smarter in order to be fully useful in production environments; right > now there are many warts and corner cases that did not seem to have been > considered during the initial development (which I think is all right, > taking into account that its mere concept was complicated enough; so we > need not put any blame on the original developers, rather the contrary). > Agreed with this assessment. As a solution to the problem of "base backups take too long to take and transfer," the overall design and its corner cases make a lot of sense. > > I think we need to make the program simple to use (i.e. not have the > user write shell globs for the common log file naming patterns) while > remaining powerful (i.e. not forcibly copy any files that do not match > hardcoded globs). I would add that well-defined tasks are a key aspect of powerful software in my view, and here the well-defined task is to restore data state to a particular usable timeline point taken from another system. If that is handled well, that opens up new uses and makes some problems that are difficult right now much easier to solve. > Is the current dry-run mode enough to give the user > peace of mind regarding what would be done in terms of testing globs > etc? If not, maybe the debug mode needs to be improved (for instance, > have it report the file size for each file that would be copied; > otherwise you may not notice it's going to copy the 20GB log file until > it's too late ...) > The dry-run facility solves one problem in one circumstance, namely a manually invoked run of the software along with the question of "will this actually rewind?" I suppose software developers might be able to use it to back up and restore things that are to be clobbered (but is anyone on the software development side likely to?). 
I don't see anything in that corner that can be improved without over-engineering the solution. There are two reasons I am skeptical that a dry-run mode will ever be "enough." The first is that pg_rewind is often integrated into auto-failover/failback tools, and the chance of running it in a dry-run mode before it is automatically triggered is more or less nil. These are also the cases where you won't notice it does something bad until much later. The second is that there are at least some corner cases we may need to define as outside the responsibility of pg_rewind. The one that comes to mind here is if I am rewinding back past the creation of a small table. I don't see an easy or safe way to address that from inside pg_rewind without a lot of complication. It might be better to have a dedicated tool for that. > > Now, in order for any of this to happen, there will need to be a > champion to define what the missing pieces are, write all those patches > and see them through the usual (cumbersome, slow) procedure. Are you, > Chris, willing to take the lead on that? > Yeah. I think the first step would be a list of the corner cases and a proposal for how I think it should work. Then maybe a roadmap of patches, and then submitting them as they become complete. > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Proposal: pg_rewind to skip config files
One more side to this which is relevant to other discussions. If I am rewinding back to before when a table was created, the current algorithm as well as any proposed algorithms will delete the reference to the relfilenode in the catalogs but not the file itself. I don't see how an undo subsystem would fix this. Is this a reason to rethink the idea that maybe a pg_fsck utility might be useful that could be run immediately after a rewind? -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Proposal: pg_rewind to skip config files
On Tue, Sep 5, 2017 at 2:40 PM, Vladimir Borodin <r...@simply.name> wrote: > > 5 сент. 2017 г., в 14:04, Michael Paquier <michael.paqu...@gmail.com> > написал(а): > > For example, in archive_command we put WALs for archiving from > pg_xlog/pg_wal into another directory inside PGDATA and than another cron > task makes real archiving. This directory ideally should be skipped by > pg_rewind, but it would not be handled by proposed change. > > > I would be curious to follow the reasoning for such a two-phase > archiving (You basically want to push it in two places, no? But why > not just use pg_receivexlog then?). This is complicated to handle from > the point of view of availability and backup reliability + durability. > > > We do compress WALs and send them over network. Doing it via > archive_command in single thread is sometimes slower than new WALs are > written under heavy load. > How would this work when it comes to rewinding against a file directory? > > -- > May the force be with you… > https://simply.name > > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Proposal: pg_rewind to skip config files
On Tue, Sep 5, 2017 at 1:04 PM, Michael Paquier <michael.paqu...@gmail.com> wrote: > On Tue, Sep 5, 2017 at 7:54 PM, Vladimir Borodin <r...@simply.name> wrote: > > 5 сент. 2017 г., в 12:31, Chris Travers <chris.trav...@adjust.com> > > написал(а): > > > > I think the simplest solution for now is to skip any files ending in > .conf, > > .log, and serverlog. > > This is not a portable solution. Users can include configuration files > with the names they want. So the current patch as proposed is > definitely not something worth it. > Actually that is exactly why I think the long-term solution is to figure out what we need to copy and not copy anything we don't recognise. That means the following directories as far as I can see: * base * global * pg_xlog/pg_wal * pg_clog/pg_xact * pg_commit_ts * pg_twophase * pg_snapshots? Are there any other directories I am missing? At any rate, I think the current state makes it very difficult to test rewind adequately, and it makes it extremely difficult to use in a non-trivial environment because you have to handle replication slots, configuration files, and so forth yourself, and you have to be aware that these *may* or *may not* be consistently clobbered by a rewind, so you have to have some way of applying another set of files in following a rewind. If nothing else we ought to *at least* special case the recovery.conf and the postgresql.auto.conf, and pg_replslot because these are always located there and should never be clobbered. > > > For example, in archive_command we put WALs for archiving from > > pg_xlog/pg_wal into another directory inside PGDATA and than another cron > > task makes real archiving. This directory ideally should be skipped by > > pg_rewind, but it would not be handled by proposed change. > > I would be curious to follow the reasoning for such a two-phase > archiving (You basically want to push it in two places, no? But why > not just use pg_receivexlog then?). 
This is complicated to handle from > the point of view of availability and backup reliability + durability. > > > While it is definitely an awful idea the user can easily put something > > strange (i.e. logs) to any important directory in PGDATA (i.e. into base > or > > pg_wal). Or how for example pg_replslot should be handled (I asked about > it > > a couple of years ago [1])? It seems that a glob/regexp for things to > skip > > is a more universal solution. > > > > [1] > > https://www.postgresql.org/message-id/flat/8DDCCC9D-450D- > 4CA2-8CF6-40B382F1F699%40simply.name > > Well, keeping the code simple is not always a bad thing. Logs are an > example that can be easily countered, as well as archives in your > case. > > -- > Michael > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Proposal: pg_rewind to skip config files
On Tue, Sep 5, 2017 at 12:54 PM, Vladimir Borodin <r...@simply.name> wrote: > > On 5 Sep 2017, at 12:31, Chris Travers <chris.trav...@adjust.com> > wrote: > > I think the simplest solution for now is to skip any files ending in > .conf, .log, and serverlog. > > > Why don’t you want to solve the problem once? It is a bit harder to get > consensus on a way how to do it, but it seems that there are no reasons to > make temporary solution here. > > For example, in archive_command we put WALs for archiving from > pg_xlog/pg_wal into another directory inside PGDATA and than another cron > task makes real archiving. This directory ideally should be skipped by > pg_rewind, but it would not be handled by proposed change. > Ok, let's back up a bit in terms of what I see as the proper long-term fix. Simple code, by the way, is important, but at least as important are programs which solve simple, well-defined problems. The current state is: 1. pg_rewind makes *no guarantee* as to whether differences in logs, config files, etc. are clobbered. They may be (if a rewind is needed) or not (if the timelines haven't diverged). Therefore the behavior of these sorts of files across an invocation of pg_rewind is not really well defined. That's a fairly big issue in an operational environment. 2. There are files which *may* be copied (i.e. are copied if the timelines have diverged) which *may* have side effects on replication topology, WAL archiving, etc. Replication slots are a good example. The problem I think pg_rewind should solve is "give me a consistent data environment from the timeline on that server." I would think that access to the xlog/clog files would indeed be relevant to that. If I were rewriting the application now I would include those. Just because something can be handled separately doesn't mean it should be, and I would prefer not to assume that archiving is properly set up and working. 
> > > Long run, it would be nice to change pg_rewind from an opt-out approach to > an approach of processing the subdirectories we know are important. > > > While it is definitely an awful idea the user can easily put something > strange (i.e. logs) to any important directory in PGDATA (i.e. into base or > pg_wal). Or how for example pg_replslot should be handled (I asked about it > a couple of years ago [1])? It seems that a glob/regexp for things to skip > is a more universal solution. > I am not convinced it is a universal solution unless you take an arbitrary number of regexes and loop through checking all of them. Then the chance of getting something catastrophically wrong in a critical environment goes way up, and you may end up in an inconsistent state at the end. Simple code is good. A program that solves simple problems reliably (and in simple ways) is better. The problem I see is that pg_rewind gets incorporated into other tools which don't really provide the user with before or after hooks, and therefore it isn't really fair to say, for example, that repmgr has the responsibility to copy server logs out if present, or to make sure that configuration files are not in the directory. The universal solution is to touch only the files that we know are needed and therefore work simply and reliably in a demanding environment. > > [1] https://www.postgresql.org/message-id/flat/8DDCCC9D-450D-4CA2-8CF6-40B382F1F699%40simply.name > > > -- > May the force be with you… > https://simply.name > > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Proposal: pg_rewind to skip config files
On Mon, Sep 4, 2017 at 3:38 PM, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote: > Chris Travers wrote: > > On Mon, Sep 4, 2017 at 12:23 PM, Michael Paquier < > michael.paqu...@gmail.com> > > wrote: > > > > > On Mon, Sep 4, 2017 at 7:21 PM, Michael Paquier > > > <michael.paqu...@gmail.com> wrote: > > > > A simple idea would be to pass as a parameter a regex on which we > > > > check files to skip when scanning the directory of the target > remotely > > > > or locally. This needs to be used with care though, it would be easy > > > > to corrupt an instance. > > > > > > I actually shortcut that with a strategy similar to base backups: logs > > > are on another partition, log_directory uses an absolute path, and > > > PGDATA has no reference to the log path. > > > > Yeah, it is quite possible to move all these out of the data directory, > but > > bad things can happen when you accidentally copy configuration or logs > over > > those on the target and expecting that all environments will be properly > > set up to avoid these problems is not always a sane assumption. > > I agree that operationally it's better if these files weren't in PGDATA > to start with, but from a customer support perspective, things are > frequently not already setup like that, so failing to support that > scenario is a loser. > > I wonder how portable fnmatch() is in practice (which we don't currently > use anywhere). A shell glob seems a more natural interface to me for > this than a regular expression. > I think the simplest solution for now is to skip any files ending in .conf, .log, and serverlog. Long run, it would be nice to change pg_rewind from an opt-out approach to an approach of processing the subdirectories we know are important. 
It is worth noting further that if you rewind in the wrong way, in a cascading replication environment, you can accidentally change your replication topology if you clobber the recovery.conf from another replica and there is no way to ensure that this file is not in the data directory since it MUST be put there. Best Wishes, Chris Travers > > -- > Álvaro Herrerahttps://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Proposal: pg_rewind to skip config files
On Mon, Sep 4, 2017 at 12:23 PM, Michael Paquier <michael.paqu...@gmail.com> wrote: > On Mon, Sep 4, 2017 at 7:21 PM, Michael Paquier > <michael.paqu...@gmail.com> wrote: > > A simple idea would be to pass as a parameter a regex on which we > > check files to skip when scanning the directory of the target remotely > > or locally. This needs to be used with care though, it would be easy > > to corrupt an instance. > > I actually shortcut that with a strategy similar to base backups: logs > are on another partition, log_directory uses an absolute path, and > PGDATA has no reference to the log path. > Yeah, it is quite possible to move all these out of the data directory, but bad things can happen when you accidentally copy configuration or logs over those on the target, and expecting that all environments will be properly set up to avoid these problems is not always a sane assumption. Consequently, I think it would be good to fix this in the tool. The fundamental question is whether there is any reason someone would actually want to copy config files over. -- > Michael > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Proposal: pg_rewind to skip config files
Ok so I have a proof of concept patch here. This is proof of concept only. It does not change documentation or the like. The purpose of the patch is discussion on the "do we want this" side. The patch is fairly trivial, but I have not added test cases or changed docs yet. Intention of the patch: pg_rewind is an important backbone tool for recovering data directories following a switchover. However, currently it is over-inclusive as to what it copies. This patch excludes any file ending in "serverlog", ".conf", and ".log" because these are never directly related to the rewind itself and add a great deal of complexity to switchovers. .conf files are excluded for two major reasons. The first is that often we may want to put postgresql.conf and other files in the data directory, and if we change these during switchover this can change, for example, the port the database is running on or other things that can break production or testing environments. This is usually a problem with testing environments, but it could happen with production environments as well. A much larger concern with .conf files though is the recovery.conf. This file MUST be put in the data directory, and it helps determine the replication topology regarding cascading replication and the like. If you rewind from an upstream replica, you suddenly change the replication topology, and that can have wide-ranging impacts. I think we are much better off insisting that .conf files should be copied separately, because the scenarios where you want to do so are more limited and the concern is often separate from rewinding the timeline itself. The second major exclusion added is files ending in "serverlog" and ".log". I can find no good reason why server logs from the source should *ever* clobber those on the target. If you do this, you lose historical information relating to problems and introduce management issues. 
Backwards-incompatibility scenarios: If somehow one has a workflow that depends on copying .conf files, this would break that. I cannot think of any cases where anyone would actually want to do this, but that doesn't mean they aren't out there. If people really want to, then they need to copy the configuration files they want separately. Next Steps: If people like this idea I will add test cases and edit documentation as appropriate. -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin pg_rewind_log_conf_patch.patch Description: Binary data
[HACKERS] Proposal: pg_rewind to skip config files
In some experiments with pg_rewind and repmgr I noticed that local testing is complicated by the fact that pg_rewind appears to copy configuration files from the source to the target directory. I would propose to make a modest patch to exclude postgresql.conf, pg_hba.conf, and pg_ident.conf from the file tree traversal. Any feedback before I create a proof of concept? -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Proposal: global index
On Fri, Aug 25, 2017 at 12:15 PM, Petr Jelinek <petr.jeli...@2ndquadrant.com > wrote: > On 25/08/17 10:28, Chris Travers wrote: > > > > > > On Thu, Aug 24, 2017 at 9:44 PM, Andres Freund <and...@anarazel.de > > <mailto:and...@anarazel.de>> wrote: > > > > Hi, > > > > On 2017-08-18 12:12:58 +0300, Ildar Musin wrote: > > > While we've been developing pg_pathman extension one of the most > frequent > > > questions we got from our users was about global index support. We > cannot > > > provide it within an extension. And I couldn't find any recent > discussion > > > about someone implementing it. So I'm thinking about giving it a > shot and > > > start working on a patch for postgres. > > > > FWIW, I personally think for constraints the better approach is to > make > > the constraint checking code cope with having to check multiple > > indexes. Initially by just checking all indexes, over the longer term > > perhaps pruning the set of to-be-checked indexes based on the values > in > > the partition key if applicable. The problem with creating huge > global > > indexes is that you give away some the major advantages of > partitioning: > > - dropping partitions now is slow / leaves a lof of garbage again > > - there's no way you can do this with individual partitions being > remote > > or such > > - there's a good chunk of locality loss in global indexes > > > > The logic we have for exclusion constraints checking can essentially > be > > extended to do uniqueness checking over multiple partitions. > Depending > > on the desired deadlock behaviour one might end up doing speculative > > insertions in addition. The foreign key constraint checking is > fairly > > simple, essentially one "just" need to remove the ONLY from the > > generated check query. 
> > > > +1 (or +as much as I am allowed to get away with really ;) ) > > > > > To be clear, this would still require a high-level concept of a global > > index and the only question is whether it gets stored as multiple > > partitions against partitioned tables vs stored in one giant index, > right? > > > No, just global constraints. For example, if you consider unique index > to be implementation detail of a unique constraint, there is nothing > stopping us to use multiple such indexes (one per partition) as > implementation detail to single global unique constraint. No need for > global index at all. > Ok so in this case a global constraint needs to track partitioned indexes, right? How does this differ, in practice from a "global but partitioned index?" This seems like splitting hairs but I am trying to see if there is disagreement here that goes beyond language. For example, could I have a global, partial unique constraint the way I can do things with a single table currently (something like create global unique index foo_id_idxuf on foo(Id) where id > 12345)? Is this something the discussion here would foreclose? It seems to me that you can get both of these (and more) by adding the concept of a global index which means: 1. Index is on parent table 2. Index is inherited to child tables and managed on parent. 3. Writes to children that hit inherited unique index ALSO must check (with exclusion constraints etc) ALL other tables in the inheritance tree of the index. That would also have a few important side benefits: 1. Easier management of indexes where all partitions (or other children since there are other uses for table inheritance than partitioning) must be indexed 2. Ability to have partial unique indexes enforced consistently across an inheritance tree. An alternative might be to generalise partial unique indexes into partial unique constraints. 
(alter table foo add unique (bar) where id > 12345) > > -- > Petr Jelinek http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Thoughts on unit testing?
On Thu, Aug 24, 2017 at 9:07 AM, Torsten Zuehlsdorff < mailingli...@toco-domains.de> wrote: > > > On 13.08.2017 21:19, Peter Geoghegan wrote: > >> On Thu, Aug 10, 2017 at 2:53 PM, Thomas Munro >> <thomas.mu...@enterprisedb.com> wrote: >> >>> The current regression tests, isolation tests and TAP tests are very >>> good (though I admit my experience with TAP is limited), but IMHO we >>> are lacking support for C-level unit testing. Complicated, fiddly >>> things with many states, interactions, edge cases etc can be hard to >>> get full test coverage on from the outside. Consider >>> src/backend/utils/mmgr/freepage.c as a case in point. >>> >> I don't want to see full test coverage of the code btw. I think that undermines the benefit from testing. I would like to see better test coverage though for reasons below. > >> It is my understanding that much of the benefit of unit testing comes >> from maintainability. >> > > I never had this understanding. I write tests to test expected behavior > and not the coded one. If possible i separate the persons writing > unit-tests from the ones writing the function itself. Having someone really > read the documentation or translate the expectation into a test-case, makes > sure, the function itself works well. > Right. I would go so far as to say I test to the documentation, not to the code. Usually when I test my own code, I write docs, then code, then I re-read the docs, and write test cases from the docs. In my not-so humble opinion, the point of tests is to make sure you don't break other people's stuff, people who are reading the docs and using them when they write their code. > > Also it really safes time in the long run. Subtle changes / bugs can be > caught which unit-tests. Also a simple bug-report can be translated into a > unit-test make sure that the bugfix really works and that no regression > will happen later. I literally saved ones a week of work with a single > unit-test. 
> Also, comparing to the docs means we have a way to reason about where misbehaviour is occurring, which means better maintainability and less code entropy in the course of fixing bugs. > > There are many other advantages, but to earn them the code need to be > written to be testable. And this is often not the case. Most literature > advises to Mocking, mixins or other techniques, which most times just > translate into "this code is not written testable" or "the technique / > language / concept / etc is not very good in being testable". > Agreed here. Usually you end up with better, more stable, more carefully designed components as a result. But as I say, documentation, design, and testing are actually harder to get right than coding. So with the above being said, the fact is that a lot of advanced stuff can be done by writing C libraries that get preloaded into PostgreSQL. These C libraries risk being broken by cases where behaviour does not match documentation. So if I want to translate an oid into a db name or vice versa, I might call the internal functions to do this. Having a stable C API would be of great help in ensuring longer-term maintainability of such libraries. So I think it would be good to add some unit tests here. Of course it also means we get to decide what functionality is sufficiently stable to test and guarantee, and that saves both time in maintenance, and it improves the safety of moving forward. But I still think the question of what to test ought to be geared around "what are we willing to try to guarantee as behaviour for some years, not just to ourselves but to third parties." > > Greetings, > Torsten > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Proposal: global index
On Thu, Aug 24, 2017 at 9:44 PM, Andres Freund <and...@anarazel.de> wrote: > Hi, > > On 2017-08-18 12:12:58 +0300, Ildar Musin wrote: > > While we've been developing pg_pathman extension one of the most frequent > > questions we got from our users was about global index support. We cannot > > provide it within an extension. And I couldn't find any recent discussion > > about someone implementing it. So I'm thinking about giving it a shot and > > start working on a patch for postgres. > > FWIW, I personally think for constraints the better approach is to make > the constraint checking code cope with having to check multiple > indexes. Initially by just checking all indexes, over the longer term > perhaps pruning the set of to-be-checked indexes based on the values in > the partition key if applicable. The problem with creating huge global > indexes is that you give away some the major advantages of partitioning: > - dropping partitions now is slow / leaves a lof of garbage again > - there's no way you can do this with individual partitions being remote > or such > - there's a good chunk of locality loss in global indexes > > The logic we have for exclusion constraints checking can essentially be > extended to do uniqueness checking over multiple partitions. Depending > on the desired deadlock behaviour one might end up doing speculative > insertions in addition. The foreign key constraint checking is fairly > simple, essentially one "just" need to remove the ONLY from the > generated check query. > To be clear, this would still require a high-level concept of a global index and the only question is whether it gets stored as multiple partitions against partitioned tables vs stored in one giant index, right? Also for foreign key constraints, does it make sense, for backwards-compatibility reasons to introduce a new syntax for checking all child tables? 
> > Greetings, > > Andres Freund > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Proposal: global index
On Aug 21, 2017 07:47, "Simon Riggs" wrote: On 18 August 2017 at 15:40, Alvaro Herrera wrote: > Ildar Musin wrote: > >> While we've been developing pg_pathman extension one of the most frequent >> questions we got from our users was about global index support. We cannot >> provide it within an extension. And I couldn't find any recent discussion >> about someone implementing it. So I'm thinking about giving it a shot and >> start working on a patch for postgres. >> >> One possible solution is to create an extended version of item pointer which >> would store relation oid along with block number and position: > > I've been playing with the index code in order to allow indirect tuples, > which are stored in a format different from IndexTupleData. > > I've been adding an "InMemoryIndexTuple" (maybe there's a better name) > which internally has pointers to both IndexTupleData and > IndirectIndexTupleData, which makes it easier to pass around the index > tuple in either format. > It's very easy to add an OID to that struct, > which then allows to include the OID in either an indirect index tuple > or a regular one. If there is a unique index then there is no need for that. Additional data to the index makes it even bigger and even less useful, so we need to count that as a further disadvantage of global indexes. I have a very clear statement from a customer recently that "We will never use global indexes", based upon their absolute uselessness in Oracle. It is worth noting that the only use case I can see where global indexes fill a functionality gap is with unique indexes which allow you to enforce uniqueness across an inheritance tree where the uniqueness is orthogonal to any partition key. I could find large numbers of uses for that.
That could also allow referential integrity to check against a root table rather than force partition explosion. Otherwise the following mostly works: create table bar (like foo including all) inherits (foo); So the gap this addresses is very real even if it is narrow. > Then, wherever we're using IndexTupleData in the index AM code, we would > replace it with InMemoryIndexTuple. This should satisfy both your use > case and mine. Global indexes are a subset of the indirect indexes use case but luckily not the only use. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
On Sun, Aug 20, 2017 at 4:10 AM, MauMau <maumau...@gmail.com> wrote: > From: Chris Travers > > Why cannot you do all this in a language handler and treat as a user > defined function? > > ... > > If you have a language handler for cypher, why do you need in_region > or cast_region? Why not just have a graph_search() function which > takes in a cypher query and returns a set of records? > > The language handler is for *stored* functions. The user-defined > function (UDF) doesn't participate in the planning of the outer > (top-level) query. And they both assume that they are executed in SQL > commands. > Sure but stored functions can take arguments, such as a query string which gets handled by the language handler. There's absolutely no reason you cannot declare a function in C that takes in a Cypher query and returns a set of tuples. And you can do a whole lot with preloaded shared libraries if you need to. The planning bit is more difficult, but see below as to where I see major limits here. > > I want the data models to meet these: > > 1) The query language can be used as a top-level session language. > For example, if an app specifies "region=cypher_graph" at database > connection, it can use the database as a graph database and submit > Cypher queries without embedding them in SQL. > That sounds like a foot gun. I would probably think of those cases as being ideal for a custom background worker, similar to Mongress. Expecting to be able to switch query languages on the fly strikes me as adding totally needless complexity everywhere to be honest. Having different listeners on different ports simplifies this a lot and having, additionally, query languages for ad-hoc mixing via language handlers might be able to get most of what you want already. > > 2) When a query contains multiple query fragments of different data > models, all those fragments are parsed and planned before execution. > The planner comes up with the best plan, crossing the data model > boundary. 
To take the query example in my first mail, which joins a > relational table and the result of a graph query. The relational > planner considers how to scan the table, the graph planner considers > how to search the graph, and the relational planner considers how to > join the two fragments. > It seems like all you really need is a planner hook for user defined languages (I.e. "how many rows does this function return with these parameters" right?). Right now we allow hints but they are static. I wonder how hard this would be using preloaded, shared libraries. > > So in_region() and cast_region() are not functions to be executed > during execution phase, but are syntax constructs that are converted, > during analysis phase, into calls to another region's parser/analyzer > and an inter-model cast routine. > So basically they work like immutable functions except that you cannot index the output? > > 1. The relational parser finds in_region('cypher_graph', 'graph > query') and produces a parse node InRegion(region_name, query) in the > parse tree. > > 2. The relational analyzer looks up the system catalog to checks if > the specified region exists, then calls its parser/analyzer to produce > the query tree for the graph query fragment. The relational analyser > attaches the graph query tree to the InRegion node. > > 3. When the relational planner finds the graph query tree, it passes > the graph query tree to the graph planner to produce the graph > execution plan. > > 4. The relational planner produces a join plan node, based on the > costs/statistics of the relational table scan and graph query. The > graph execution plan is attached to the join plan node. > > The parse/query/plan nodes have a label to denote a region, so that > appropriate region's routines can be called. > It would be interesting to see how much of what you want you can get with what we currently have and what pieces are really missing. 
Am I right that if you wrote a function in C to take a Cypher query plan, and analyse it, and execute it, the only thing really missing would be feedback to the PostgreSQL planner regarding number of rows expected? > > Regards > MauMau > > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
On Sat, Aug 19, 2017 at 4:29 PM, MauMau <maumau...@gmail.com> wrote: > Hello, > > Please forgive me for asking such a stupid and rough question. > > I'm thinking of making PostgreSQL a multi-model database by supporting > data models other than the current relational model. A data model > consists of a query language (e.g. SQL for relational model, Cypher > for graph model), a parser and analyzer to transform a query into a > query tree, a planner to transform the query tree into an execution > plan, an executor, and a storage engine. > > To promote the data model development, I want to make data models > pluggable. The rough sketch is: > > 1) A data model developer implements the parser, analyzer, > transformer, planner, executor, and storage engine functions in a > shared library. > > 2) The DBA registers the data model. > > CREATE QUERY LANGUAGE Cypher ( > PARSER = > ); > > CREATE DATA MODEL graph ( > QUERY LANGUAGE = Cypher, > ANALYZER = , > TRANSFORMER = , > PLANNER = , > EXECUTOR = , > STORAGE ENGINE = , > ); > > CREATE REGION cypher_graph ( > QUERY LANGUAGE = Cypher, > DATA MODEL = graph > ); > Why cannot you do all this in a language handler and treat as a user defined function? > > The region is just a combination of a query language and a data model, > much like a locale is a combination of a language and a country. This > is because there may be multiple popular query languages for a data > model. > > 3) The application connects to the database, specifying a desired > region. The specified region's query language becomes the default > query language for the session. > > > The application can use the data of multiple data models in one query > by specifying another region and its query via in_region(). For > example, the following query combines the relational restaurant table > and a social graph to get the five chinese restaurants in Tokyo that > are most popular among friends of John and their friends. 
> > SELECT r.name, g.num_likers > FROM restaurant r, > cast_region( > in_region('cypher_graph', > 'MATCH (:Person {name:"John"})-[:IS_FRIEND_OF*1..2]-(friend), > (friend)-[:LIKES]->(restaurant:Restaurant) > RETURN restaurant.name, count(*)'), > 'relational', 'g', '(name text, num_likers int)') > WHERE r.name = g.name AND > r.city = 'Tokyo' AND r.cuisine = 'chinese' > ORDER BY g.num_likers DESC LIMIT 5 If you have a language handler for Cypher, why do you need in_region or cast_region? Why not just have a graph_search() function which takes in a Cypher query and returns a set of records? > ; > > > What do you think would be difficult to make data models pluggable, > especially related to plugging the parser, planner, executor, etc? > One possible concern is that various PostgreSQL components might be > too dependent on the data model being relational, and it would be > difficult to separate tight coupling. > I guess I am missing why the current language handler structure is not enough. Maybe I am missing something? > > Regards > MauMau > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Proposal: global index
I would really like to see global indexes. It would make things a lot easier for things like unique constraints across table inheritance trees. On Fri, Aug 18, 2017 at 11:12 AM, Ildar Musin <i.mu...@postgrespro.ru> wrote: > Hi hackers, > > While we've been developing pg_pathman extension one of the most frequent > questions we got from our users was about global index support. We cannot > provide it within an extension. And I couldn't find any recent discussion > about someone implementing it. So I'm thinking about giving it a shot and > start working on a patch for postgres. > > One possible solution is to create an extended version of item pointer > which would store relation oid along with block number and position: > > struct ItemPointerExt > { > Oid ip_relid; > BlockIdData ip_blkid; > OffsetNumber ip_posid; > }; > > and use it in global index (regular index will still use old version). > This will require in-depth refactoring of existing index nodes to make them > support both versions. Accordingly, we could replace ItemPointer with > ItemPointerExt in index AM to make unified API to access both regular and > global indexes. TIDBitmap will require refactoring as well to be able to > deal with relation oids. > So, to be clear, on-disk representations would be unchanged for old indexes (ensuring that pg_upgrade would not be broken), right? > > It seems to be quite an invasive patch since it requires changes in > general index routines, existing index nodes, catalog, vacuum routines and > syntax. So I'm planning to implement it step by step. As a first prototype > it could be: > > * refactoring of btree index to be able to store both regular and extended > item pointers; Do you foresee any performance implications of handling both?
> * refactoring of TIDBitmap; > * refactoring of general index routines (index_insert, index_getnext, etc) > and indexAM api; > * catalog (add pg_index.indisglobal attribute and/or a specific relkind as > discussed in [1] thread); > * syntax for global index definition. E.g., it could be oracle-like syntax: > > CREATE INDEX my_idx ON my_tbl (key) GLOBAL; > > If it goes well, then I’ll do the rest of indexes and vacuuming. If you > have any ideas or concerns I’ll be glad to hear it. > > [1] https://www.postgresql.org/message-id/c8fe4f6b-ff46-aae0-89e > 3-e936a35f0cfd%40postgrespro.ru > > Thanks! > > -- > Ildar Musin > i.mu...@postgrespro.ru > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Orphaned files in base/[oid]
On Wed, Aug 16, 2017 at 7:15 PM, Andres Freund <and...@anarazel.de> wrote: > > > I think this entirely is the wrong approach. We shouldn't add weird > check commands that require locks on pg_class, we should avoid leaving > the orphaned files in the first place. I've upthread outlined > approached how to do so. > And in the meantime I suppose there is no reason that the approach I have outlined cannot be used by an external program. I think it is sane to draw the line at having the db responsible for cleaning up after itself when something goes wrong and leave external programs to do after-the-fact review and cleanup. > > Greetings, > > Andres Freund > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Orphaned files in base/[oid]
So having thought about this a bit more, and having had some real-world experience with the script now, I have an idea that might work and some questions to make it succeed. My thinking is to add a new form of vacuum called VACUUM FSCK. This would:

1. Lock pg_class in exclusive mode (or do I need access exclusive?), as this is needed to solve the race conditions. As I see it, this brings the database to a screeching halt concurrency-wise (but unlike my script it would allow other databases to be accessed normally).

2. Read the files whose names consist only of digits out of the filesystem and compare them with oids and relfilenodes from pg_class.

3. Unlink any file not found in that list, as well as any files matching that name followed by an underscore or period.

This would mean that the following cases would not be handled: If the first segment of a relation is gone but later segments are present, we only check the first segment's name and so would not see the later ones. Same goes for visibility maps and other helper files. If you add a file to the directory with a name like 34F3A222BC, it would never get cleaned up because it contains non-digits. So this leads to the following questions:

1. Is locking pg_class enough to avoid race conditions? Is exclusive mode sufficient or do I need access exclusive mode?

2. Would it be preferable to move the file to a directory rather than unlinking it?

3. Should I perform any sort of check on the tables at the end to make sure everything is ok?

-- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
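To make the under-inclusiveness concrete: the candidate test above only admits all-digit names, so fork and segment files are reached solely through their main file, and a stray name like 34F3A222BC is never touched. A tiny shell sketch of that rule (a sketch of the proposal, not existing backend code):

```shell
#!/bin/sh
# Proposed candidate rule: only file names consisting entirely of
# digits are ever considered for cleanup; _fsm/_vm/.N files would be
# zapped only as companions of their main file, never on their own.
is_candidate() {
  case "$1" in
    *[!0-9]*) return 1 ;;   # any non-digit character disqualifies the name
    *) return 0 ;;
  esac
}
```

So if the main file 16384 is already gone, its companions 16384_fsm and 16384.1 are never even considered, which is exactly the first unhandled case above.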
Re: [HACKERS] Orphaned files in base/[oid]
On Tue, Aug 15, 2017 at 3:32 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Chris Travers <chris.trav...@adjust.com> writes: > > I wonder about a different solution. Would it be possible to special > case > > vacuum to check for and remove (or just move to where they can be > removed) > > files when vacuuming pg_class? At the point we are vacuuming pg_class, > we > > ought to be able to know that a relfilenode shouldn't be used anymore, > > right? > > I don't think so. It's not clear to me whether you have in mind "scan > pg_class, collect relfilenodes from all live tuples, then zap all files > not in that set" or "when removing a dead tuple, zap the relfilenode > it mentions". But neither one works. The first case has a race condition > against new pg_class entries. As for the second, the existence of a dead > tuple bearing relfilenode N isn't evidence that some other live tuple > can't have relfilenode N. > Ah because if the file never made it on to disk the number could be re-used. > > Another problem for the second solution is that in the case you're worried > about (ie, PANIC due to out-of-WAL-space during relation's creating > transaction), there's no very good reason to expect that the relation's > pg_class tuple ever made it to disk at all. > > A traditional low-tech answer to this has been to keep the WAL on a > separate volume from the main data store, so that it's protected from > out-of-space conditions in the main store and temp areas. The space > needs for WAL proper are generally much more predictable than the main > store, so it's easier to keep the dedicated space from overflowing. > (Stalled replication/archiving processes can be hazardous to your > health in this scenario, though, if they prevent prompt recycling of > WAL files.) > Yeah, most of our dbs here have wal on a separate volume but not this system. 
This system is also unusual in that disk usage varies wildly (and I am not 100% sure this is the only case which causes it, though I can reproduce it consistently when the WAL writer runs out of disk space, with symptoms exactly matching what I found). So for now that leaves my fallback approach as a way to fix it when I see it. I have written a shell script which does as follows:

1. Starts Postgres in single user mode with a data directory or dies (won't run if Postgres seems to be already running)

2. Gets the oid of the current database

3. Lists all files consisting of only digits in the base/[dboid] directory

4. Asks Postgres (in single user mode again) for all relfilenodes and oids of tables (in my testing both were required because there were some cases where relfilenodes were not set for some system catalogs)

5. Loops through the file nodes gathered, checks against the relfilenode entries, and zaps $f, ${f}_*, and ${f}.*. Currently for testing "zaps" has been to move to a lostnfound folder for inspection following the script.

The logic here is not perfect and is very slightly under-inclusive, but better that than the other way. Then we can start Postgres again. I cannot find a better way to avoid race conditions, I guess. At any rate it sounds like preventing the problem more generally may be something beyond what I would feel comfortable trying to do as a patch at my current level of familiarity with the source code. The full script is included inline below my signature in case it is of interest to anyone on the list.
> regards, tom lane > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin

---
#!/bin/bash
# Usage: ./cleanup.sh /path/to/datadir dbname
# Moves files in base/[dboid] with no matching pg_class entry into
# ./lostnfound for inspection. Postgres must not be running.
datadir=$1
database=$2

pg_ctl -D $datadir stop

dboidfile="$PWD/cleanupdb.oid"
reloidfile="$PWD/relfilenodes.list"

# Get the oid of the target database in single user mode
echo "COPY (select oid from pg_database where datname = current_database()) TO '$dboidfile'" | postgres --single -D $datadir $database > /dev/null
if (($?))
then
    echo "FATAL: Could not start Postgres in single user mode"
    exit 1
fi
dboid=`cat $dboidfile`

# All-digit file names in the database directory (main forks only)
filenodes=`(cd $datadir/base/$dboid; ls [0-9]*[0-9] | grep -v '\.' | sort -n)`
#echo $filenodes

# Both relfilenode and oid, since relfilenode was unset for some system catalogs
echo "COPY (select relfilenode from pg_class union select oid as relfilenode from pg_class) TO '$reloidfile'" | postgres --single -D $datadir $database > /dev/null
relfilenodes=`cat $reloidfile`
#echo $relfilenodes
if [[ -z $relfilenodes ]]
then
    echo "FATAL: did not get any relfilenodes"
    exit 2
fi

mkdir lostnfound
for f in $filenodes
do
    if [[ -z `echo $relfilenodes | grep -w $f` ]]
    then
        echo moving $f to lostnfound
        mv $datadir/base/$dboid/$f lostnfound
        mv $datadir/base/$dboid/${f}_* lostnfound 2> /dev/null
        mv $datadir/base/$dboid/${f}.* lostnfound 2> /dev/null
    fi
done

rm $dboidfile
rm $reloidfile
Re: [HACKERS] Orphaned files in base/[oid]
There's another side to this, and I am not sure it is a backend crash. Here is what I did to reproduce: two virtual disk images, 100 MB for main data and 40 MB for WAL, with work_mem set to 256MB. The idea is to test different out of space conditions.

Create table as ...; drop table; select pg_size_pretty(pg_current_xlog_location() - '0/0');

I played around with parameters to determine how different kinds of out of space errors were handled:

1. Running out of temp space was cleaned up without a server restart needed.

2. A relation running out of disk space *seemed* to get cleaned up.

3. Running out of WAL space left *both* temp and non-temp files.

I wonder about a different solution. Would it be possible to special case vacuum to check for and remove (or just move to where they can be removed) files when vacuuming pg_class? At the point we are vacuuming pg_class, we ought to be able to know that a relfilenode shouldn't be used anymore, right?

-- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Orphaned files in base/[oid]
On Mon, Aug 14, 2017 at 8:40 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > > > It would be possible to have orphaned non-temp tables if you'd suffered > a crash during the transaction that created those tables. Ordinarily > a newly-created table file wouldn't be that large, but if your workflow > created tables and shoved boatloads of data into them in the same > transaction, it's not so hard to see this becoming an issue. > I think the working theory is that these were very likely a number of very large (multi-hundred-GB) materialised views. > > The core problem with zapping non-temp table files is that you can't > do that unless you're sure you have consistent, up-to-date pg_class > data that nobody else is busy adding to. It's hard to see an external > application being able to do that safely. You certainly can't do it > at the point in the postmaster startup cycle where we currently do > the other things --- for those, we rely only on filesystem naming > conventions to identify what to zap. Yeah, that occurred to me. At this point I would settle for something I could run with Postgres in single user mode, although that is very far from ideal. So what I wonder is whether at least a short-term solution might be a utility that starts Postgres in single user mode, where we insist that PostgreSQL is otherwise not running before the run. I am certainly not feeling qualified at present for more advanced solutions, but that I might be able to do. > > regards, tom lane > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Orphaned files in base/[oid]
On Mon, Aug 14, 2017 at 6:33 PM, Andres Freund <and...@anarazel.de> wrote: > Hi, > > On 2017-08-14 14:12:22 +0200, Chris Travers wrote: > > Problem: > > The system this came up on is PostgreSQL 9.6.3 and has had repeated > trouble > > with disk space. Querying pg_database_size, as well as du on the > > subdirectory of base/ show total usage to be around 3.8TB. Summing up > the > > size of the relations in pg_class though shows around 2.1TB. > > > > Initial troubleshooting found around 150 GB of space in pg_temp which had > > never been cleared and was at least several days old. Restarting the > > server cleared these up. > > > > Poking around the base/[oid] directory, I found a large number of files > > which did not correspond with a pg_class entry. One of the apparent > > relations was nearly 1TB in size. > > > > What I think happened: > > I think various pg_temp/* and orphaned relation files (In base/[oid]) > were > > created when PostgreSQL crashed due to running out of space in various > > operations including creating materialised views. > > > > So my question is if there is a way we can safely clean these up on > server > > restart? If not does it make sense to try to create a utility that can > > connect to PostgreSQL, seek out valid files, and delete the rest? > > I think the fix here is to call RemovePgTempFiles() during > crash-restarts, instead of just full starts. The previously stated need > to be able to inspect temp files after a crash can be less impactfully > fulfilled with restart_after_crash = false. > > But that only clears temp files right? I am less concerned about the temp files because a restart clears them. The bigger issue I see are with the orphaned base files. It looks like files in base/[oid] don't get cleaned up either if I read my output correctly and it would explain why we saw 1.7TB of discrepancy between relations and database size. 
Safety-wise it seems like the best way out of that is a dump/restore, but doing that with a 2.1TB database is annoying. > Greetings, > > Andres Freund > -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: [HACKERS] Orphaned files in base/[oid]
On Aug 14, 2017 14:12, "Chris Travers" <chris.trav...@adjust.com> wrote: Hi all; I am trying to track down a problem we are seeing that looks very similar to bug #12050, and would certainly consider trying to contribute a fix if we agree on one. (I am not sure we can, so absent that, the next question is whether it makes sense to create a utility to fix the problem when it comes up so that a dump/restore is not needed). The system: PostgreSQL 9.6.3 Gentoo Linux. Problem: The system this came up on is PostgreSQL 9.6.3 and has had repeated trouble with disk space. Querying pg_database_size, as well as du on the subdirectory of base/ show total usage to be around 3.8TB. Summing up the size of the relations in pg_class though shows around 2.1TB. Initial troubleshooting found around 150 GB of space in pg_temp which had never been cleared and was at least several days old. Restarting the server cleared these up. Poking around the base/[oid] directory, I found a large number of files which did not correspond with a pg_class entry. One of the apparent relations was nearly 1TB in size. What I think happened: I think various pg_temp/* and orphaned relation files (In base/[oid]) were created when PostgreSQL crashed due to running out of space in various operations including creating materialised views. So my question is if there is a way we can safely clean these up on server restart? If not does it make sense to try to create a utility that can connect to PostgreSQL, seek out valid files, and delete the rest? Ok, I have identified one case where the symptoms I am seeing can be reproduced. I am currently working on a Mac, so there may be quirks in my repro. However, when the WAL writer runs out of disk space, no cleanup is done. So I will be looking at possible solutions next. -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 <+49%20162%209037210> | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
[HACKERS] Orphaned files in base/[oid]
Hi all; I am trying to track down a problem we are seeing that looks very similar to bug #12050, and would certainly consider trying to contribute a fix if we agree on one. (I am not sure we can, so absent that, the next question is whether it makes sense to create a utility to fix the problem when it comes up so that a dump/restore is not needed). The system: PostgreSQL 9.6.3 Gentoo Linux. Problem: The system this came up on is PostgreSQL 9.6.3 and has had repeated trouble with disk space. Querying pg_database_size, as well as du on the subdirectory of base/ show total usage to be around 3.8TB. Summing up the size of the relations in pg_class though shows around 2.1TB. Initial troubleshooting found around 150 GB of space in pg_temp which had never been cleared and was at least several days old. Restarting the server cleared these up. Poking around the base/[oid] directory, I found a large number of files which did not correspond with a pg_class entry. One of the apparent relations was nearly 1TB in size. What I think happened: I think various pg_temp/* and orphaned relation files (In base/[oid]) were created when PostgreSQL crashed due to running out of space in various operations including creating materialised views. So my question is if there is a way we can safely clean these up on server restart? If not does it make sense to try to create a utility that can connect to PostgreSQL, seek out valid files, and delete the rest? -- Best Regards, Chris Travers Database Administrator Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
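The "seek out valid files" idea can be sketched as a pure filtering step. This is an illustrative sketch, not the proposed utility: it assumes the file names under base/<dboid> and the set of relfilenode values (e.g. from pg_relation_filenode()) have already been collected separately, and it only flags candidates. As the thread discusses, deciding whether deletion is actually safe is the hard part, so nothing here should be deleted while a server might be running.

```python
import re

def orphan_candidates(filenames, relfilenodes):
    """Flag files under base/<dboid> whose base number matches no known
    relfilenode.  Segment files (12345.1), free-space maps (_fsm),
    visibility maps (_vm) and init forks (_init) share the relation's
    base number; non-numeric names (e.g. pg_internal.init) are skipped."""
    pat = re.compile(r'^(\d+)(?:\.\d+|_fsm|_vm|_init)?$')
    out = []
    for name in filenames:
        m = pat.match(name)
        if m and int(m.group(1)) not in relfilenodes:
            out.append(name)
    return out
```

Note that relfilenode can differ from the pg_class OID after a rewrite, and mapped catalogs report relfilenode 0, so the relfilenode set must come from pg_relation_filenode() rather than a naive pg_class scan.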
Re: [HACKERS] Funny WAL corruption issue
On Fri, Aug 11, 2017 at 1:33 PM, Greg Stark <st...@mit.edu> wrote: > On 10 August 2017 at 15:26, Chris Travers <chris.trav...@gmail.com> wrote: > > > > > > The bitwise comparison is interesting. Remember the error was: > > > > pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: unexpected > > pageaddr 1E375/61118000 in log segment 0001E39C00E1, offset > > 1146880 > ... > > Since this didn't throw a checksum error (we have data checksums > disabled but wal records ISTR have a separate CRC check), would this > perhaps indicate that the checksum operated over incorrect data? > > No checksum error and this "unexpected pageaddr" doesn't necessarily > mean data corruption. It could mean that when the database stopped logging > it was reusing a wal file and the old wal stream had a record boundary > on the same byte position. So the previous record checksum passed and > the following record checksum passes but the record header is for a > different wal stream position. > > I think you could actually hack xlogdump to ignore this condition and > keep outputting and you'll see whether the records that follow appear > to be old wal log data. I haven't actually tried this though. > For better or worse, I get a different error at the same spot if I try this (doing so involved disabling the check in the backend WAL reader): pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: invalid contrecord length 4509 at 1E39C/E1117FF8 If I hack it to ignore all errors on that record, no further records come out, though it does run over the same records. This leads me to conclude there are no further valid records. > > -- > greg > -- Best Wishes, Chris Travers Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in. http://www.efficito.com/learn_more
Re: [HACKERS] Funny WAL corruption issue
On Fri, Aug 11, 2017 at 1:33 PM, Greg Stark <st...@mit.edu> wrote: > On 10 August 2017 at 15:26, Chris Travers <chris.trav...@gmail.com> wrote: > > > > > > The bitwise comparison is interesting. Remember the error was: > > > > pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: unexpected > > pageaddr 1E375/61118000 in log segment 0001E39C00E1, offset > > 1146880 > ... > > Since this didn't throw a checksum error (we have data checksums > disabled but wal records ISTR have a separate CRC check), would this > perhaps indicate that the checksum operated over incorrect data? > > No checksum error and this "unexpected pageaddr" doesn't necessarily > mean data corruption. It could mean that when the database stopped logging > it was reusing a wal file and the old wal stream had a record boundary > on the same byte position. So the previous record checksum passed and > the following record checksum passes but the record header is for a > different wal stream position. > I expect to test this theory shortly. Assuming it is correct, what can we do to prevent restarts of slaves from running into it? > I think you could actually hack xlogdump to ignore this condition and > keep outputting and you'll see whether the records that follow appear > to be old wal log data. I haven't actually tried this though. > > -- > greg > -- Best Wishes, Chris Travers Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in. http://www.efficito.com/learn_more
Re: [HACKERS] Funny WAL corruption issue
On Thu, Aug 10, 2017 at 3:17 PM, Vladimir Rusinov <vrusi...@google.com> wrote: > > > On Thu, Aug 10, 2017 at 1:48 PM, Aleksander Alekseev < > a.aleks...@postgrespro.ru> wrote: > >> I just wanted to point out that a hardware issue or third party software >> issues (bugs in FS, software RAID, ...) could not be fully excluded from >> the list of suspects. According to the talk by Christophe Pettus [1] >> it's not that uncommon as most people think. > > > This still might be the case of hardware corruption, but it does not look > like one. > Yeah, I don't think so either. The systems were not restarted, only the service so I don't think this is a lie-on-write case. We have ECC with full checks, etc. It really looks like something I initiated caused it but not sure what and really not interested in trying to reproduce on a db of this size. > > Likelihood of two different persons seeing similar error message just a > year apart is low. From our practice hardware corruption usually looks like > a random single bit flip (most common - bad cpu or memory), bunch of zeroes > (bad storage), or bunch of complete garbage (usually indicates in-memory > pointer corruption). > > Chris, if you still have original WAL segment from the master and it's > corrupt copy from standby, can you do bit-by-bit comparison to see how they > are different? Also, if you can please share some hardware details. > Specifically, do you use ECC? If so, are there any ECC errors logged? Do > you use physical disks/ssd or some form of storage virtualization? > Straight on bare metal, ECC with no errors logged. SSD for both data and wal. The bitwise comparison is interesting.
Remember the error was:

pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: unexpected pageaddr 1E375/61118000 in log segment 0001E39C00E1, offset 1146880

Starting with the good WAL segment (I think the record starts at 003b):

0117fb0 003b
0117fc0 7f28 e111 e39c 0001 0940 cb88 db01
0117fd0 0200 067f 4000 2249 0195
0117fe0 0001 8001 b5c3 05ff
0117ff0 0003 008c
0118000 d093 0005 0001 8000 e111 e39c 0001
0118010 0084 7fb8 e111 e39c 0001
0118020 0910 ccac 2eba 2000 0056 067f
0118030 4000 2249 0195 b5c4 08ff 0001
0118040 0002 0003 0004 0005 0006 0007 0008 0009
0118050 000a 000b 000c 000d 000e 000f 0010 0011
0118060 0012 0013 0014 0015 0016 0017 0018 0019
0118070 001a 001b 001c 001d 001e 001f 0020 0021

And the corrupt segment:

0117fb0 003b
0117fc0 7f28 e111 e39c 0001 0940 cb88 db01
0117fd0 0200 067f 4000 2249 0195
0117fe0 0001 8001 b5c3 05ff
0117ff0 0003 4079 ce05 1cce ecf9
0118000 d093 0005 0001 8000 6111 e375 0001
0118010 119d cfd4 00cc ca00 0410
0118020 1800 7c00 5923 544b dc20 914c 7a5c afec
0118030 db45 0060 b700 1910 1800 7c00 791f 2ede
0118040 c573 a110 5a88 e1e6 ab48 0034 9c00 2210
0118050 1800 7c00 4415 400d 2c7e b5e3 7c88 bcef
0118060 4666 00db 9900 0a10 1800 7c00 7d1d b355
0118070 d432 8365 de99 4dba 87c7 00ed 6200 2210

I think the divergence is interesting here. Up through 0117ff8, they are identical. Then the last half of the line differs. The first half of the next is the same (but up through 011800a this time), but the last 6 bytes differ (those six hold what appear to be the memory address causing the problem), and we only have a few bits different in the rest of the line. It looks like some data and some flags were overwritten, perhaps while the process exited. Very interesting. > Also, in absolute majority of cases corruption is caught by checksums. I > am not familiar with WAL protocol - do we have enough checksums when > writing it out and on the wire?
I suspect there are much more things > PostgreSQL can do to be more resilient, and at least detect corruptions > earlier. > Since this didn't throw a checksum error (we have data checksums disabled but wal records ISTR have a separate CRC check), would this perhaps indicate that the checksum operated over incorrect data? > > -- > Vladimir Rusinov > PostgreSQL SRE, Google Ireland > > Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland > Registered in Dublin, Ireland > Registration Number: 368047 > -- Best Wishes, Chris Travers Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in. http://www.efficito.com/learn_more
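The bit-by-bit comparison requested above is easy to script. A minimal sketch, assuming local copies of the good and corrupt segment files are available to read:

```python
def first_divergence(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if the
    two buffers agree over their common length."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None

# Hypothetical usage against local copies of the two segment files:
#   good = open("segment.good", "rb").read()
#   bad = open("segment.corrupt", "rb").read()
#   off = first_divergence(good, bad)
```

In this thread the interesting part is not just the first divergent offset but the pattern after it, which is why the hexdump excerpts above compare whole lines rather than stopping at the first mismatch.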
Re: [HACKERS] Funny WAL corruption issue
> Yes. Exactly the same output until a certain point where pg_xlogdump dies > with an error. That is the same LSN where PostgreSQL died with an error on > restart. > One other thing that is possibly relevant after reading through the last bug report is the error pg_xlogdump gives: pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: unexpected pageaddr 1E375/61118000 in log segment 0001E39C00E1, offset 1146880 > -- Best Wishes, Chris Travers Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in. http://www.efficito.com/learn_more
Re: [HACKERS] Funny WAL corruption issue
Sorry, meant to reply all. On Thu, Aug 10, 2017 at 2:19 PM, Vladimir Borodin <r...@simply.name> wrote: > Hi, Chris. > > 10 авг. 2017 г., в 15:09, Chris Travers <chris.trav...@gmail.com> > написал(а): > > Hi; > > I ran into a funny situation today regarding PostgreSQL replication and > wal corruption and wanted to go over what I think happened and what I > wonder about as a possible solution. > > Basic information is custom-build PostgreSQL 9.6.3 on Gentoo, on a ~5TB > database with variable load. Master database has two slaves and generates > 10-20MB of WAL traffic a second. The data_checksum option is off. > > > The problem occurred when I attempted to restart the service on the slave > using pg_ctl (I believe the service had been started with sys V init > scripts). On trying to restart, it gave me a nice "Invalid memory > allocation request" error and promptly stopped. > > The main logs showed a lot of messages like before the restart: > 2017-08-02 11:47:33 UTC LOG: PID 19033 in cancel request did not match > any process > 2017-08-02 11:47:33 UTC LOG: PID 19032 in cancel request did not match > any process > 2017-08-02 11:47:33 UTC LOG: PID 19024 in cancel request did not match > any process > 2017-08-02 11:47:33 UTC LOG: PID 19034 in cancel request did not match > any process > > On restart, the following was logged to stderr: > LOG: entering standby mode > LOG: redo starts at 1E39C/8B77B458 > LOG: consistent recovery state reached at 1E39C/E1117FF8 > FATAL: invalid memory alloc request size 3456458752 > LOG: startup process (PID 18167) exited with exit code 1 > LOG: terminating any other active server processes > > LOG: database system is shut down > > After some troubleshooting I found that the wal segment had become > corrupt, I copied the correct one from the master and everything came up to > present. > > So It seems like somewhere something crashed big time on the back-end and > when we tried to restart, the wal ended in an invalid way. 
> > > We have reported the same thing [1] nearly a year ago. Could you please > check with pg_xlogdump that both WALs (normal from master and corrupted) > are exactly the same until some certain LSN? > > [1] https://www.postgresql.org/message-id/20160614103415. > 5796.6885%40wrigleys.postgresql.org > Yes. Exactly the same output until a certain point where pg_xlogdump dies with an error. That is the same LSN where PostgreSQL died with an error on restart. > > > I am wondering what can be done to prevent these sorts of things from > happening in the future if, for example, a replica dies in the middle of a > wal fsync. > -- > Best Wishes, > Chris Travers > > Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor > lock-in. > http://www.efficito.com/learn_more > > > > -- > May the force be with you… > https://simply.name > > -- Best Wishes, Chris Travers Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in. http://www.efficito.com/learn_more
[HACKERS] Funny WAL corruption issue
Hi; I ran into a funny situation today regarding PostgreSQL replication and wal corruption and wanted to go over what I think happened and what I wonder about as a possible solution. Basic information is custom-built PostgreSQL 9.6.3 on Gentoo, on a ~5TB database with variable load. Master database has two slaves and generates 10-20MB of WAL traffic a second. The data_checksum option is off. The problem occurred when I attempted to restart the service on the slave using pg_ctl (I believe the service had been started with sys V init scripts). On trying to restart, it gave me a nice "Invalid memory allocation request" error and promptly stopped. The main logs showed a lot of messages like before the restart: 2017-08-02 11:47:33 UTC LOG: PID 19033 in cancel request did not match any process 2017-08-02 11:47:33 UTC LOG: PID 19032 in cancel request did not match any process 2017-08-02 11:47:33 UTC LOG: PID 19024 in cancel request did not match any process 2017-08-02 11:47:33 UTC LOG: PID 19034 in cancel request did not match any process On restart, the following was logged to stderr: LOG: entering standby mode LOG: redo starts at 1E39C/8B77B458 LOG: consistent recovery state reached at 1E39C/E1117FF8 FATAL: invalid memory alloc request size 3456458752 LOG: startup process (PID 18167) exited with exit code 1 LOG: terminating any other active server processes LOG: database system is shut down After some troubleshooting I found that the wal segment had become corrupt, I copied the correct one from the master and everything came up to present. So it seems like somewhere something crashed big time on the back-end and when we tried to restart, the wal ended in an invalid way. I am wondering what can be done to prevent these sorts of things from happening in the future if, for example, a replica dies in the middle of a wal fsync. -- Best Wishes, Chris Travers Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in. http://www.efficito.com/learn_more
Re: [HACKERS] GiST support for UUIDs
On Tue, Nov 29, 2016 at 1:13 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Pushed with that change and some other mostly-cosmetic tweaking. Thank you for addressing all those issues, Tom! I tested some exclusion constraints that are interesting to me, and everything seems to be working well. -- Chris -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GiST support for UUIDs
On Mon, Nov 28, 2016 at 4:04 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > > What I would suggest is that you forget the union hack and just use > memcmp in all the comparison functions. It's not likely to be worth > the trouble to try to get those calls to be safely aligned. The > only place where you need go to any trouble is in uuid_parts_distance, > which could be redesigned to memcpy a not-necessarily-aligned source > into a local uint64[2] array and then byte-swap if needed. Done. I only have one architecture to test on (Linux, Intel x86_64) so I must defer to others in this regard. > I don't entirely see the point of making pg_uuid_t an opaque struct when > the only interesting thing about it, namely UUID_LEN, is exposed anyway. > So my inclination would be to just do > > typedef struct pg_uuid_t > { > unsigned char data[UUID_LEN]; > } pg_uuid_t; > > in uuid.h and forget the idea of it being opaque. Done. > Also, the patch could be made a good deal smaller (and easier to review) > in the wake of commit 40b449ae8: you no longer need to convert > btree_gist--1.2.sql into btree_gist--1.3.sql, just leave it alone and add > btree_gist--1.2--1.3.sql. That way we don't have to worry about whether > the upgrade script matches the change in the base script. Done. -- Chris btree_gist_uuid_8.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
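For what it's worth, the reason plain memcmp suffices here is that UUID bytes are stored big-endian, so bytewise order matches the order of the two 64-bit halves once they are byte-swapped into host order on little-endian machines. A quick illustration of that equivalence (Python, not the patch's C):

```python
import struct

def parts(u: bytes):
    """Interpret a 16-byte UUID as two big-endian uint64 halves, i.e.
    what memcpy-into-uint64[2]-then-byteswap produces on a little-endian
    machine, as suggested for uuid_parts_distance."""
    return struct.unpack(">QQ", u)

a = bytes.fromhex("00000000000000000000000000000001")
b = bytes.fromhex("00000000000000010000000000000000")

# bytewise (memcmp) order on the raw bytes agrees with tuple order of the halves
assert (a < b) == (parts(a) < parts(b))
```

The same property is what makes it safe to drop the union hack in the comparison functions: no alignment is required to memcmp the raw bytes, and only uuid_parts_distance needs the swapped integer view.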
[HACKERS] extract text from XML
Hi, I have found a basic use case which is supported by the xml2 module, but is unsupported by the new XML API. It is not possible to correctly extract text (either from text nodes or attribute values) which contains the characters '<', '&', or '>'. xpath() (correctly) returns XML text nodes for queries targeting these node types, and there is no inverse to xmlelement(). For example: => select (xpath('/a/text()', xmlelement(name a, '<&>')))[1]::text; xpath --- &lt;&amp;&gt; (1 row) Again, not a bug; but there is no way to specify my desired intent. The xml2 module does provide such a function, xpath_string: => select xpath_string(xmlelement(name a, '<&>')::text, '/a/text()'); xpath_string -- <&> (1 row) One workaround is to return the node's text value by serializing the XML value, and textually replacing those three entities with the characters they represent, but this relies on the xpath() function not generating other entities. (My use case is importing data in XML format, and processing with Postgres into a relational format.) Perhaps a function xpath_value(text, xml) -> text[] would close the gap? (I did search and no such function seems to exist currently, outside xml2.) Thanks, Chris -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
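The entity-replacement workaround described above can be sketched like this; note that &amp; has to be handled last so a literal "&amp;lt;" in the serialized output is not decoded twice. This sketch assumes xpath() emits only these three predefined entities, which is exactly the fragile assumption the mail points out:

```python
def xpath_text_unescape(s: str) -> str:
    """Undo the three predefined entities left in serialized text-node
    output.  '&amp;' is replaced last to avoid double-decoding."""
    return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")

assert xpath_text_unescape("&lt;&amp;&gt;") == "<&>"
```

A real inverse of the serialization would have to handle numeric character references as well, which is part of why a server-side xpath_value-style function would be preferable to string munging.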
Re: [HACKERS] [GENERAL] OS X 10.11.3, psql, bus error 10, 9.5.1
unfortunately, I have to admit to my disgrace, that I'm still no C programmer after all these decades of dabbling in writing code. I just used the flags because someone at some point told me that it was a good idea, turns out, it's not [always]. I shall rebuild 9.5.1 without the -fno-common flag and see if that fixes things. Thanks, Tom, for spending part of your weekend on this. Chris. > On Mar 12, 2016, at 17:58, Tom Lane <t...@sss.pgh.pa.us> wrote: > > I wrote: >> That's confusing because it implies that -fno-common is the default, >> which it evidently is not. But anyway, my diagnosis is that you're >> breaking something about the linker's behavior with that switch. > > Oh! Looking closer, the core dump happens here: > > const printTextFormat pg_utf8format; > > printTextFormat *popt = (printTextFormat *) &pg_utf8format; > > --> popt->name = "unicode"; > > So apparently, the relevant property of "-fno-common" is that it > causes "const" variables to actually get placed in read-only data. > > I think this code is new in 9.5, which'd explain why you didn't see > the failure with older PG versions. It's surely busted though. > > I shall get rid of the const-ness, as well as the lame casting away > of it, and I think I will also go make buildfarm member longfin use > "-fno-common". It is truly sad that we apparently have no test > machine that enforces that const means const ... > > regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Removing max_connection requirement on hot_standby
I'm orchestrating Postgres to behave as a leader-follower cluster. I've run into issues when I am scaling down a connection count for a cluster (scaling up is fine -- scaling down results in fatal errors). I use an open source tool I've written to orchestrate the cluster called Governor ( http://github.com/compose/governor). It would work if I weren't running hot_standby, but alas…I'm running with it. I found the code which throws the fatal is actually a pre-flight test for hot_standby written in 2009 (i.e. battle tested): https://github.com/postgres/postgres/blob/efc16ea520679d713d98a2c7bf1453c4ff7b91ec/src/backend/access/transam/xlog.c#L5312-L5321 I've tested changing this value from a FATAL to a WARN. I've compiled and tested my scenario and all appears to be correct: https://github.com/compose/postgres/commit/2bdf6b36821987aadb401e1b8590ecc5b02126d8 In researching these lines of code, it appears the original FATAL code was put in place to ensure that a hot_standby is configured as closely as possible to the leader. This change will also allow backups taken using `pg_basebackup` to work with settings that differ from the original host. Am I missing something with this change? Cheers, Chris
Re: [HACKERS] CTE optimization fence on the todo list?
I need this feature a lot. Can anyone point me to a place in the code where I can hack together a quick-and-dirty, compatibility-breaking implementation? Thanks! On Sun, May 3, 2015 at 10:03 PM, Jim Nasby jim.na...@bluetreble.com wrote: On 5/3/15 11:59 AM, Andrew Dunstan wrote: On 05/03/2015 11:49 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 05/01/2015 07:24 PM, Josh Berkus wrote: (A possible compromise position would be to offer a new GUC to enable/disable the optimization globally; that would add only a reasonably small amount of control code, and people who were afraid of the change breaking their apps would probably want a global disable anyway.) This could be a very bad, almost impossible to catch, behaviour break. Even if we add the GUC, we're probably going to be imposing very significant code audit costs on some users. On what grounds do you claim it'd be a behavior break? It's possible that the subquery flattening would result in less-desirable plans not more-desirable ones, but the results should still be correct. I meant w.r.t. performance. Sorry if that wasn't clear. To put this in perspective... I've seen things like this take query runtime from minutes to multiple hours or worse; bad enough that behavior break becomes a valid description. We definitely need to highlight this in the release notes, and I think the GUC would be mandatory. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CTE optimization fence on the todo list?
Has there been any movement on this in the last couple years? I could really use the ability to optimize across CTE boundaries, and it seems like a lot of other people could too.
[HACKERS] Re: [pgsql-pkg-debian] Updated libpq5 packages cause connection errors on postgresql 9.2
Hi Christoph, - Original Message - From: Christoph Berg c...@df7cb.de To: Chris Butler cbut...@zedcore.com Googling for digest too big for rsa key seems to indicate that this problem occurs when you are using (client?) certificates with short RSA keys. 512 bits is most often cited in the problem reports, something like 768 is around the minimum size that works, and of course, anything smaller than 1024 or really 1536 (or 2048) bits is too small for today's crypto standards. So the question here is if this is also the problem you saw - are you using client or server certificates with short keys? Yes, that would appear to be the case - the key we're using is only 512 bits. I'll make sure we replace the certificate before re-applying the update (which will probably be after the holidays now). What this explanation doesn't explain is why the problem occurs with 9.4's libpq5 while it works with 9.3's. The libssl version used for building these packages should really be the same, 9.3.5-2.pgdg70+1 was built just two days ago as well. For info, I can confirm that both libraries are loading the same libssl: zedcore@web2:/tmp/usr/lib/x86_64-linux-gnu$ ldd /usr/lib/x86_64-linux-gnu/libpq.so.5 | grep libssl libssl.so.1.0.0 => /usr/lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x7f3e8d898000) zedcore@web2:/tmp/usr/lib/x86_64-linux-gnu$ ldd ./libpq.so.5 | grep libssl libssl.so.1.0.0 => /usr/lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x7f5d76176000) I can see a few changes are listed in the 9.4 changelog relating to SSL, so my guess would be one of those changes has altered the behaviour of libssl when presented with a small key. -- Chris Butler Zedcore Systems Ltd Telephone: 0114 303 0666 Direct dial: 0114 303 0572 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] why does LIMIT 2 take orders of magnitude longer than LIMIT 1 in this query?
I'm on PostgreSQL 9.3. This should reproduce on any table with 100,000+ rows. The EXPLAIN ANALYZE shows many more rows getting scanned with LIMIT 2, but I can't figure out why.

Limit 1:

EXPLAIN ANALYZE
WITH base AS (
  SELECT *, ROW_NUMBER() OVER () AS rownum FROM a_big_table
),
filter AS (
  SELECT rownum, true AS thing FROM base
)
SELECT * FROM base
LEFT JOIN filter USING (rownum)
WHERE filter.thing
LIMIT 1

Result:

Limit  (cost=283512.19..283517.66 rows=1 width=2114) (actual time=0.019..0.019 rows=1 loops=1)
  CTE base
    ->  WindowAgg  (cost=0.00..188702.69 rows=4740475 width=101) (actual time=0.008..0.008 rows=1 loops=1)
          ->  Seq Scan on a_big_table  (cost=0.00..129446.75 rows=4740475 width=101) (actual time=0.003..0.003 rows=1 loops=1)
  CTE filter
    ->  CTE Scan on base base_1  (cost=0.00..94809.50 rows=4740475 width=8) (actual time=0.000..0.000 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..307677626611.24 rows=56180269915 width=2114) (actual time=0.018..0.018 rows=1 loops=1)
        Join Filter: (base.rownum = filter.rownum)
        ->  CTE Scan on base  (cost=0.00..94809.50 rows=4740475 width=2113) (actual time=0.011..0.011 rows=1 loops=1)
        ->  CTE Scan on filter  (cost=0.00..94809.50 rows=2370238 width=9) (actual time=0.002..0.002 rows=1 loops=1)
              Filter: thing
Total runtime: 0.057 ms

Limit 2:

EXPLAIN ANALYZE
WITH base AS (
  SELECT *, ROW_NUMBER() OVER () AS rownum FROM a_big_table
),
filter AS (
  SELECT rownum, true AS thing FROM base
)
SELECT * FROM base
LEFT JOIN filter USING (rownum)
WHERE filter.thing
LIMIT 2

Result:

Limit  (cost=283512.19..283523.14 rows=2 width=2114) (actual time=0.018..14162.283 rows=2 loops=1)
  CTE base
    ->  WindowAgg  (cost=0.00..188702.69 rows=4740475 width=101) (actual time=0.008..4443.359 rows=4714243 loops=1)
          ->  Seq Scan on a_big_table  (cost=0.00..129446.75 rows=4740475 width=101) (actual time=0.002..1421.622 rows=4714243 loops=1)
  CTE filter
    ->  CTE Scan on base base_1  (cost=0.00..94809.50 rows=4740475 width=8) (actual time=0.001..10214.684 rows=4714243 loops=1)
  ->  Nested Loop  (cost=0.00..307677626611.24 rows=56180269915 width=2114) (actual time=0.018..14162.280 rows=2 loops=1)
        Join Filter: (base.rownum = filter.rownum)
        Rows Removed by Join Filter: 4714243
        ->  CTE Scan on base  (cost=0.00..94809.50 rows=4740475 width=2113) (actual time=0.011..0.028 rows=2 loops=1)
        ->  CTE Scan on filter  (cost=0.00..94809.50 rows=2370238 width=9) (actual time=0.009..6595.770 rows=2357122 loops=2)
              Filter: thing
Total runtime: 14247.374 ms
Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations
On Tue, Oct 7, 2014 at 5:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:

typedef struct
{
!   char        token[TOKMAXLEN + 1];   /* now always null-terminated */
    char        type;
!   int32       value;
} datetkn;

Being entirely new to this code, "now" makes me think of the current timestamp. I think this word can be removed from the comment to reduce ambiguity.

+   /* use strncmp so that we match truncated tokens */
    result = strncmp(key, position->token, TOKMAXLEN);

In your proposal you wanted to remove crufty code that deals with non-null-terminated token strings. Is this some of that crufty code? Can it be removed? -- Chris
Re: [HACKERS] 9.3 Json Array's
On 24 September 2013 at 13:46 Andrew Dunstan and...@dunslane.net wrote: Feel free to ask questions. The heart of the API is the event handlers defined in this struct in include/utils/jsonapi.h:

typedef struct JsonSemAction
{
    void       *semstate;
    json_struct_action object_start;
    json_struct_action object_end;
    json_struct_action array_start;
    json_struct_action array_end;
    json_ofield_action object_field_start;
    json_ofield_action object_field_end;
    json_aelem_action array_element_start;
    json_aelem_action array_element_end;
    json_scalar_action scalar;
} JsonSemAction;

Basically there is a handler for the start and end of each non-scalar structural element in JSON, plus a handler for scalars. There are several problems that will be posed by processing nested arrays and objects, including: * in effect you would need to construct a stack of state that could be pushed and popped True. * JSON arrays aren't a very good match for SQL arrays - they are unidimensional and heterogenous. This is true, but I think one would have to start with an assumption that the data is valid for an SQL type and then check again once one gets it done. JSON is a pretty flexible format which makes it a poor match in many cases for SQL types generally. But I think the example so far (with json_populate_recordset) is a good one, namely a best effort conversion. I'm not saying this can't be done - it will just take a bit of effort. Yeah, looking through the code, I think it will be more work than I originally thought but that just means it will take longer. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers Best Wishes, Chris Travers http://www.2ndquadrant.com PostgreSQL Services, Training, and Support
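The "stack of state" point can be illustrated with a toy walker over an already-parsed value: each container start pushes and each end pops, which is roughly what a consumer of the JsonSemAction callbacks has to do by hand. This is a Python sketch of the event shape, not the C API:

```python
def walk(node, events, stack):
    """Fire SAX-ish events for a parsed JSON value, tracking nesting
    depth with an explicit stack the way a JsonSemAction-style consumer
    would for nested objects and arrays."""
    if isinstance(node, dict):
        stack.append("object")
        events.append(("object_start", len(stack)))
        for key, val in node.items():
            events.append(("object_field_start", key))
            walk(val, events, stack)
        events.append(("object_end", len(stack)))
        stack.pop()
    elif isinstance(node, list):
        stack.append("array")
        events.append(("array_start", len(stack)))
        for val in node:
            walk(val, events, stack)
        events.append(("array_end", len(stack)))
        stack.pop()
    else:
        events.append(("scalar", node))

events, stack = [], []
walk({"tags": ["GAME", "WII"]}, events, stack)
# nesting returned to zero when the walk finished
assert stack == []
```

The heterogeneity problem shows up here too: nothing in the event stream guarantees an array's scalars share a type, so a best-effort conversion to an SQL array still has to validate after the fact.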
Re: [HACKERS] 9.3 Json Array's
On 23 September 2013 at 23:37 Adam Jelinek ajeli...@gmail.com wrote:

I am sure I am doing something wrong here, or this is an unsupported feature, but I wanted to double check. I was hoping that if I did a json_agg(x) and then copied the output of that and passed it into json_populate_recordset that I would get the record back. I know I can make things work using a CTE and other functions like json_each, but I was hoping for a simple one-liner.

Yeah, I had the same experience. It is not supported. I am looking at trying to add support for nested objects and better support for arrays. Interested in collaborating?

CREATE SCHEMA varrm;

CREATE SEQUENCE varrm.item_id_seq;

CREATE TABLE varrm.item
(item_id          bigint DEFAULT nextval('varrm.item_id_seq')
,title            text
,short_desc       text
,long_desc        text
,tags             text[]
       ^^^ That is what it chokes on.
,external_api_key text
,trans_timestamp  timestamp without time zone DEFAULT now()
,upsert_timestamp timestamp without time zone DEFAULT clock_timestamp()
,end_timestamp    timestamp without time zone DEFAULT '-12-31 23:59:59.99'::timestamp without time zone
,CONSTRAINT item_primary_key PRIMARY KEY (item_id)
);

INSERT INTO varrm.item (title, short_desc, long_desc, tags, external_api_key)
VALUES ('My Title', 'My Short Desc', 'My Very Long Desc', '{GAME, WII, PS4, ACTION, FIRST PERSON SHOOTER}', '1235467');

SELECT json_agg(t1)
FROM (SELECT title, short_desc, long_desc, tags, external_api_key FROM varrm.item) AS t1

--output is
--[{"title":"My Title","short_desc":"My Short Desc","long_desc":"My Very Long Desc","tags":["GAME","WII","PS4","ACTION","FIRST PERSON SHOOTER"],"external_api_key":null}]

SELECT '[{"title":"My Title","short_desc":"My Short Desc","long_desc":"My Very Long Desc","tags":["GAME","WII","PS4","ACTION","FIRST PERSON SHOOTER"],"external_api_key":null}]'::JSON

SELECT * FROM json_populate_recordset(null::varrm.item, '[{"title":"My Title","short_desc":"My Short Desc","long_desc":"My Very Long Desc","tags":["GAME","WII","PS4","ACTION","FIRST PERSON SHOOTER"],"external_api_key":null}]'::JSON)

/** ERROR: cannot
call json_populate_recordset on a nested object **/

I am still in the process of wrapping my head around the current JSON logic. I hope to produce a proof of concept that can later be turned into a patch. See my previous post on this topic. Again, collaboration is welcome.

Best Wishes,
Chris Travers
http://www.2ndquadrant.com
PostgreSQL Services, Training, and Support
Re: [HACKERS] Minmax indexes
On 16 September 2013 at 11:03 Heikki Linnakangas hlinnakan...@vmware.com wrote:

Something like this seems completely sensible to me:

create index i_accounts on accounts using minmax (ts) where valid = true;

The situation where that would be useful is if 'valid' accounts are fairly well clustered, but invalid ones are scattered all over the table. The minimum and maximum stored in the index would only concern valid accounts.

- Heikki

Here's one that occurs to me:

CREATE INDEX i_billing_id_mm ON billing(id) WHERE paid_in_full IS NOT TRUE;

Note that this would be a frequently moving target, and over years of billing the subset would be quite small compared to the full system (imagine, say, 50k rows out of 20M).

Best Wishes,
Chris Travers
http://www.2ndquadrant.com
PostgreSQL Services, Training, and Support
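The win from a minmax index (with or without a predicate) comes from keeping a per-block-range (min, max) summary so whole ranges can be skipped during a scan. A rough Python sketch of just the pruning idea (illustrative only; it has nothing to do with the actual on-disk format):

```python
def build_summaries(blocks):
    """One (min, max) summary per block of values."""
    return [(min(b), max(b)) for b in blocks]


def candidate_blocks(summaries, lo, hi):
    """Return indexes of blocks whose [min, max] range can possibly
    intersect the query range [lo, hi]; all other blocks are skipped
    without ever being read."""
    return [i for i, (mn, mx) in enumerate(summaries) if mx >= lo and mn <= hi]


blocks = [[1, 5, 3], [100, 120, 90], [7, 2, 4]]
summaries = build_summaries(blocks)
print(candidate_blocks(summaries, 0, 10))  # [0, 2]
```

This is also why the clustering caveat matters: if qualifying rows are scattered across every block, every summary overlaps the query range and nothing gets pruned.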
Re: [HACKERS] Proposal: json_populate_record and nested json objects
On 16 September 2013 at 14:43 Merlin Moncure mmonc...@gmail.com wrote:

Huge +1 on this. Couple random thoughts:

*) Hard to see how you would structure this as an extension as you're adjusting the behaviors of existing functions, unless you wanted to introduce new function names for testing purposes?

Yeah, and reading the source, it looks like some parts of the JSON parsing code will have to be rewritten because the nested object errors are thrown quite deeply in the parsing stage. It looks to me as if this will require some significant copying as a POC into a new file with different publicly exposed function names.

*) Would like to at least consider being able to use casting syntax as a replacement for populate_record and (the missing) populate_array for most usages.

Yes. I am trying to figure out how best to do this at present. Initially I think I would be happy to settle for casts wrapping functions which themselves just wrap the call to populate_record. What I will probably do for my POC is expose the following methods:

1. json_populate_type()
2. json_populate_array()

Then we can talk about whether to merge the changes into the core. This will be an interesting way to get into PostgreSQL hacking.

Best Wishes,
Chris Travers
http://www.2ndquadrant.com
PostgreSQL Services, Training, and Support
Re: [HACKERS] Proposal: json_populate_record and nested json objects
On 15 September 2013 at 18:42 Andrew Dunstan and...@dunslane.net wrote:

On 09/14/2013 10:27 PM, chris travers wrote:

Well, you could fairly easily build it as an extension as a POC. The main point of the API this is built on was to allow for extensions. The logic changes might be a bit tricky. I'll be interested to see what you come up with. If we're going to do this we should make these handle arrays as well as objects.

Yes, arrays are necessary for my use case. I probably should have been explicit that I needed that too.

cheers
andrew

Best Wishes,
Chris Travers
http://www.2ndquadrant.com
PostgreSQL Services, Training, and Support
[HACKERS] Proposal: json_populate_record and nested json objects
Hi all;

Currently json_populate_record and json_populate_recordset cannot work with nested json objects. This creates two fundamental problems when trying to use JSON as an interface format.

The first problem is you can't easily embed a json data type in a json object and have it populate a record. This means that storing extended attributes in the database is somewhat problematic if you accept the whole row as a json object.

The second problem is that nested data structures and json don't go together well. You can't have a composite type which has as an attribute an array of another composite type and populate this from a json object. This makes json largely an alternative to hstore for interfaces in its current shape.

I would propose handling json_populate_record and friends as such:

1. Don't throw errors initially as a pre-check if the json object is nested.
2. If one comes to a nested fragment, check the attribute type it is going into first.
2.1 If it is a json type, put the nested fragment there.
2.2 If it is a composite type (i.e. anything in pg_class), push it through another json_populate_record run.
2.3 If it is neither, then see if a json::[type] cast exists; if so, call it.
2.4 Otherwise raise an exception.

I have a few questions before I go on to look at creating a patch.

1. Are there any problems anyone spots with this approach?
2. Is anyone working on something like this?
3. Would it be preferable to build something like this first as an extension (perhaps with different function names) or first as a patch?

Best Wishes,
Chris Travers
http://www.2ndquadrant.com
PostgreSQL Services, Training, and Support
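The dispatch rules above can be sketched in a few lines (a Python sketch with invented type tags and helper names; the real work would be C inside json_populate_record, and the cast lookup here stands in for finding a json::[type] cast):

```python
def populate(record_type, obj, casts=None):
    """record_type maps field -> type tag: the string 'json', a nested
    dict (standing in for a composite type), or a base-type name.
    Nested fragments are handled per rules 2.1-2.4 of the proposal."""
    casts = casts or {}
    out = {}
    for field, ftype in record_type.items():
        val = obj.get(field)
        if isinstance(val, (dict, list)):        # step 2: a nested fragment
            if ftype == "json":                  # 2.1: keep the fragment as json
                out[field] = val
            elif isinstance(ftype, dict):        # 2.2: composite type -> recurse
                out[field] = populate(ftype, val, casts)
            elif ftype in casts:                 # 2.3: a json::[type] cast exists
                out[field] = casts[ftype](val)
            else:                                # 2.4: nothing applies
                raise TypeError(f"cannot populate {field} ({ftype}) from nested json")
        else:
            out[field] = val                     # scalar: plain assignment
    return out
```

For example, `populate({"a": "int", "b": {"c": "int"}, "d": "json"}, {"a": 1, "b": {"c": 2}, "d": {"x": 3}})` recurses into `b` and passes `d` through untouched, while an unhandled nested value raises.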
Re: [HACKERS] Proposal: PL/PgSQL strict_mode
A few thoughts about this.

On 14 September 2013 at 05:28 Marko Tiikkaja ma...@joh.to wrote:

Hi,

After my previous suggestion for adding a STRICT keyword got shot down[1], I've been thinking about an idea Andrew Gierth tossed out: adding a new strict mode into PL/PgSQL. In this mode, any query which processes more than one row will raise an exception. This is a bit similar to specifying INTO STRICT .. for every statement, except processing no rows does not trigger the exception.

The need for this mode comes from a few observations I make almost every day:

1) The majority of statements only deal with exactly 0 or 1 rows.
2) Checking row_count for a statement is ugly and cumbersome, so often it just isn't checked. I often use RETURNING TRUE INTO STRICT _OK for DML, but that a) requires an extra variable, and b) isn't possible if 0 rows affected is not an error in the application logic.
3) SELECT .. INTO only fetches one row and ignores the rest. Even row_count is always set to 0 or 1, so there's no way to fetch a value *and* to check that there would not have been more rows. This creates bugs which make your queries return wrong results and which could go undetected for a long time.

I am going to suggest that STRICT is semantically pretty far from what is meant here in common speech. I think STRICT here would be confusing. This would be really pretty severe for people coming from Perl or MySQL backgrounds. May I suggest SINGLE as a keyword instead? It might be worth having it attached to INSERT, UPDATE, and DELETE statements. I am thinking something like:

DELETE SINGLE FROM foo WHERE f1 > 1000;

would be clearer. Similarly one could have:

INSERT SINGLE INTO foo SELECT * FROM foo2;

and

UPDATE SINGLE foo ...

You could even use SELECT SINGLE, but I am not sure where the use case is there where unique indexes are not sufficient.
Attached is a proof-of-concept patch (no docs, probably some actual code problems too) to implement this as a compile option:

=# create or replace function footest() returns void as $$
$# #strict_mode strict
$# begin
$#   -- not allowed to delete more than one row
$#   delete from foo where f1 > 100;
$# end$$ language plpgsql;
CREATE FUNCTION
=# select footest();
ERROR:  query processed more than one row
CONTEXT:  PL/pgSQL function footest() line 5 at SQL statement

Now while I think this is a step in the right direction, I do have a couple of problems with this patch:

1) I'm not sure what would be the correct behaviour with EXECUTE. I'm tempted to just leave EXECUTE alone, as it has slightly different rules anyway.

2) If you're running in strict mode and you want to insert/update/delete more than one row, things get a bit uglier; a wCTE would work for some cases. If strict mode doesn't affect EXECUTE (see point 1 above), that could work too. Or maybe there could be a new command which runs a query, discards the results and ignores the number of rows processed.

Yeah, I am worried about this one. I am concerned that if you can't disable it on a statement-by-statement basis, then you have a problem where you end up removing the mode from the function, and then it becomes a lot harder for everyone maintaining the function to have a clear picture of what is going on. I am further worried that ugly hacked workarounds will introduce plenty of other maintenance pain points that will be worse than what you are solving.

I'll be adding this to the open commitfest in hopes of getting some feedback on this idea (I'm prepared to hear a lot of "you're crazy!"), but feel free to comment away any time you please.
Well, I don't know if my feedback above is helpful, but there it is ;-)

Regards,
Marko Tiikkaja

[1]: http://www.postgresql.org/message-id/510bf731.5020...@gmx.net

Best Wishes,
Chris Travers
http://www.2ndquadrant.com
PostgreSQL Services, Training, and Support
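The 0-or-1-rows contract the proposed mode enforces is easy to state outside the backend (a Python sketch with made-up names; the real enforcement would live in PL/pgSQL's executor):

```python
class TooManyRows(Exception):
    """Stands in for the 'query processed more than one row' error."""


def run_single(rows):
    """Accept a statement's result set: 0 or 1 rows pass through,
    more than one raises, mirroring the proposed strict/SINGLE mode
    (unlike INTO STRICT, zero rows is not an error)."""
    if len(rows) > 1:
        raise TooManyRows("query processed more than one row")
    return rows[0] if rows else None
```

The asymmetry is the whole point of the proposal: `run_single([])` quietly returns nothing, while two rows is treated as a logic bug rather than silently taking the first.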
Re: [HACKERS] ASYNC Privileges proposal
So I would think that if this was to go further then channels would need to be more of a first class citizen and created explicitly, with CREATE CHANNEL, DROP CHANNEL etc:

CREATE CHANNEL channame;
GRANT LISTEN ON CHANNEL channame TO rolename;
GRANT NOTIFY ON CHANNEL channame TO rolename;
LISTEN channame;        -- OK
NOTIFY channame, 'hi';  -- OK
LISTEN ;        -- exception: no channel named
NOTIFY , 'hi';  -- exception: no channel named

Personally I think explicitly creating channels would be beneficial; I have hit issues where a typo in a channel name has caused a bug to go unnoticed for a while. But of course this would break backwards compatibility with the current model (with implicit channel names), so unless we wanted to force everyone to add CREATE CHANNEL statements during their upgrade, maybe there would need to be some kind of system to work around this. Possibly some kind of catch-all channel that enables implicit channel names?

GRANT LISTEN, NOTIFY ON CHANNEL * TO PUBLIC;  -- enabled by default for backwards compat
NOTIFY ;  -- OK via * CHANNEL
LISTEN ;  -- OK via * CHANNEL

Chris

On 18 June 2013 18:31, Josh Berkus j...@agliodbs.com wrote:

I had a quick play to see what might be involved [attached], and would like to hear people's thoughts; good idea, bad idea, not like that! etc

I question the usefulness of allowing listen/notify to be restricted to an entire class of users. The granularity of this seems too broad, though I am not sure if allowing a message to be sent to a specific user is easily achievable.

Well, if we're going to have privs on LISTEN/NOTIFY at all, they should be on specific message bands, i.e.:

REVOKE LISTEN ON 'cacheupdates' FROM PUBLIC;
GRANT LISTEN ON 'cacheupdates' TO webuser;
GRANT LISTEN ON ALL TO admin;

I can certainly see wanting this, but it'd be a great deal more sophisticated than what Chris has proposed.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
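The catch-all '*' channel idea amounts to a two-step ACL lookup: check the named channel first, then fall back to the wildcard entry. A hypothetical sketch (the ACL shape and role names are invented for illustration):

```python
def allowed(acl, role, channel, action):
    """acl maps (channel, role) -> set of actions ('LISTEN', 'NOTIFY').
    A ('*', role) entry provides the backwards-compatible default for
    implicitly created channels."""
    for chan in (channel, "*"):
        if action in acl.get((chan, role), set()):
            return True
    return False


acl = {
    ("cacheupdates", "webuser"): {"LISTEN"},
    ("*", "PUBLIC"): {"LISTEN", "NOTIFY"},  # the backwards-compat grant
}
```

With this shape, revoking the wildcard grant flips the system into the strict explicitly-created-channels model without changing the lookup code.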
[HACKERS] ASYNC Privileges proposal
Hey all, I find the current LISTEN / NOTIFY rather limited in the context of databases with multiple roles. As it stands it is not possible to restrict the use of LISTEN or NOTIFY to specific roles, and therefore notifications (and their payloads) cannot really be trusted as coming from any particular source. If the payloads of notifications could be trusted, then applications could make better use of them, without fear of leaking any sensitive information to anyone who shouldn't be able to see it. I'd like to propose a new ASYNC database privilege that would control whether a role can use LISTEN, NOTIFY and UNLISTEN statements and the associated pg_notify function. ie: GRANT ASYNC ON DATABASE TO bob; REVOKE ASYNC ON DATABASE FROM bob; SECURITY DEFINER functions could then be used anywhere that a finer grained access control was required. I had a quick play to see what might be involved [attached], and would like to hear people thoughts; good idea, bad idea, not like that! etc Chris. async_privileges_r0.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ASYNC Privileges proposal
In fairness NOTIFY has only had a payload since v9 (maybe 8.4?), and the issue of trust is mainly tied to data leaking from the payload, so I suspect I won't be the last person to request this as people re-visit NOTIFY :) ...but I totally get your point of course. My first thought was also that having control at the channel level would be ideal, but that would be a huge implementation change by the looks of things, would certainly affect performance a great deal more, and would not really give much more benefit than could be attained with database-level control + a SECURITY DEFINER function.

Chris

On 20 May 2013 03:23, Tom Lane t...@sss.pgh.pa.us wrote:

Chris Farmiloe chrisfa...@gmail.com writes:

I find the current LISTEN / NOTIFY rather limited in the context of databases with multiple roles. As it stands it is not possible to restrict the use of LISTEN or NOTIFY to specific roles, and therefore notifications (and their payloads) cannot really be trusted as coming from any particular source.

TBH, nobody has complained about this in the fifteen-plus years that LISTEN has been around. I'm dubious about adding privilege-checking overhead for everybody to satisfy a complaint from one person.

I'd like to propose a new ASYNC database privilege that would control whether a role can use LISTEN, NOTIFY and UNLISTEN statements and the associated pg_notify function.

... and if I did think that there were an issue here, I doubt I'd think that a privilege as coarse-grained as that would fix it. Surely you'd want per-channel privileges if you were feeling paranoid about this, not to mention separate read and write privileges. But the demand for that just isn't out there.

regards, tom lane
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
What's the use case of this? It sounds like it will just create a maintenance nightmare where some stuff you expect to look up in postgresql.conf is actually hiding in the .auto file. Assuming only super users/sysadmins would have the ability to change things in the config file, wouldn't they be more likely to just do it on the server and edit the .conf (which among other things keeps it tidy and orderly)? Also, how would you propose to handle settings that require the server to be restarted, such as checkpoint_segments? It seems like by allowing these to be set via a command (which isn't really SQL) you're creating the impression that they will take immediate effect, which isn't the case. Just my $0.02. Of course, I might be missing the point.

On 30 Oct 2012, at 00:31, Amit Kapila wrote:

SYNTAX:
ALTER SYSTEM SET configuration_parameter = value COMMENT 'value';

DESIGN IDEA:
(a) have a postgresql.conf.auto
(b) add a default include for postgresql.conf.auto at the beginning of postgresql.conf
(c) SQL updates go to postgresql.conf.auto, which consists only of "setting = value # comments" lines.
(d) We document that settings which are changed manually in postgresql.conf will override postgresql.conf.auto.

IMPLEMENTATION IDEA:
The main idea is we create a lock file; it acts as a lock to avoid concurrent edits to the .auto file and also as an intermediate file where we keep all the new changes until we commit the ALTER SYSTEM command.

CREATION OF AUTO FILE
1. During initdb we create the .auto file and it will be empty.
2. The .conf file will have as its first entry:

#--
# postgresql.conf.auto inclusion
#--
# Do not edit the postgresql.conf.auto file or remove the include.
# You can edit the settings below in this file, which will override the auto-generated file.
include = 'postgresql.conf.auto'

ALGORITHM for ALTER SYSTEM:
1. Check whether the given key : value is valid. -- This is done so that the next read from the .auto file should not throw an error.
2.
Get the postgresql.conf.auto path (always the data directory). -- Since the .auto file is in the data directory, pg_basebackup will pick it up.
3. Create the postgresql.conf.auto.lock file (with the O_EXCL flag). -- This acts as protection from other backends who are trying to edit this file. -- If it already exists, we wait for some time by retrying.
4. Open the postgresql.conf.auto file in read mode.
5. Write the new (key, value, comment) into the postgresql.conf.auto.lock file using the steps below:
   a. Read the contents of postgresql.conf.auto into a memory buffer line by line.
   b. Scan for the key in the postgresql.conf.auto file. If found, get the line number in the file where we have to insert the new (key, value); else we write the new (key, value) pair on the last line.
   c. Add the new (key, value, comment) to the memory buffer at the line found in step b.
   d. Write the memory buffer into the postgresql.conf.auto.lock file. -- Here the memory buffer represents the modified state of the postgresql.conf.auto file.
   e. Commit the .lock file. -- Here we rename the lock file to the auto file. -- If the auto file is opened by another process (SIGHUP processing), then we retry the rename for some time; otherwise the ALTER SYSTEM command fails.
   f. If any error occurs in between, roll back the lock file. -- Here we delete the lock file.

CLARIFICATION
1. Tom, the below is mentioned by you in one of the discussions for this topic. I need a small clarification: "About the only change I want to make immediately is that initdb ought to shove its settings into postgresql.auto instead of mucking with postgresql.conf." So do you mean to say the settings done by initdb (like max_connections, etc.) need to be in the .auto file instead of .conf, and let these parameters be commented out in .conf?
2. Does the .auto file need to be included by default?
3. Can the path of .auto be fixed as the data directory path?

Note:
1. Only one backend can edit the conf file at a time; others wait.
2.
Suppose .auto has an invalid entry, e.g. the listening port number mentioned is taken up by another application; then if we try to restart postgres it fails. This needs manual intervention.
3. This command cannot be executed inside a transaction block. Not sure what to do for this part, whether it needs to be supported in a block?
4. currently command for reset or invalidation of (key, value)
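The lock-file-plus-rename dance in steps 3-5 is the standard atomic-update pattern: O_EXCL creation serializes writers, and the final rename publishes the new file in one step. A condensed, simplified Python sketch of that pattern (illustrative only; file names follow the proposal, and the real implementation would be C in the backend):

```python
import os


def alter_system_set(datadir, key, value):
    auto = os.path.join(datadir, "postgresql.conf.auto")
    lock = auto + ".lock"
    # O_CREAT|O_EXCL plays the role of the advisory lock: creation fails
    # if another writer currently holds the .lock file (step 3).
    fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        lines = []
        if os.path.exists(auto):
            with open(auto) as f:
                # drop any existing line for this key (step 5b)
                lines = [l for l in f if not l.startswith(key + " =")]
        lines.append(f"{key} = '{value}'\n")
        os.write(fd, "".join(lines).encode())
        os.fsync(fd)
    except BaseException:
        os.close(fd)
        os.unlink(lock)  # step 5f: "rollback" by deleting the lock file
        raise
    os.close(fd)
    # Atomic rename (step 5e): readers see either the old or the new
    # file in its entirety, never a half-written one.
    os.replace(lock, auto)
```

Calling it twice with the same key leaves exactly one line for that key, which is the de-duplication behavior step 5b describes.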
Re: [HACKERS] Should select 'nan'::float = 'nan'::float; return false as per IEEE 754
Would this introduce problems finding rows where the stored value is NaN? You'd need to add a function or operator to avoid that.

On 28 Oct 2012, at 20:43, Hannu Krosing wrote:

This is how PostgreSQL currently works:

test=# select 'NaN'::float = 'NaN'::float as must_be_false;
 must_be_false
---------------
 t
(1 row)

I think that PostgreSQL's behaviour of comparing two NaN-s as equal is wrong and we should follow the IEEE 754 spec here. As per IEEE 754 a NaN behaves similar to NULL in SQL. There is some discussion of why it is so at: http://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values especially the first comment.

- Hannu
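For reference, the IEEE 754 semantics under discussion, as most languages expose them (Python shown): NaN compares unequal to everything, itself included, so any code that needs to find NaN values must use an explicit predicate rather than `=` — which is exactly the "finding stored NaN rows" concern raised above.

```python
import math

nan = float("nan")
# Under IEEE 754, NaN compares unequal to everything, itself included:
print(nan == nan)        # False
print(nan != nan)        # True
# so searching for NaN requires an explicit predicate instead of ==:
print(math.isnan(nan))   # True
```

PostgreSQL instead treats NaN as equal to itself (and ordered after all other float values), which keeps sorting and btree indexing well-defined at the cost of deviating from the spec.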
[HACKERS] Is PQexecParams() simply a wrapper function?
Hi All,

I'm currently using PQexecParams() as part of a bridge to allow Postgres to be used in another programming language. Not specifying the OIDs simplifies things (for the end user), but bytea params still cause headaches, since they need to be formatted differently and I receive the data as a (char *) internally, but can't really infer a type myself, at least not reliably. It looks like PQprepare(), PQdescribePrepared() and PQexecPrepared() could be used to allow me to check if any of the params represent a BYTEAOID, then escape only those params. This seems like 3 round-trips to the server, however. That said, I believe PQexecParams() is doing a similar thing, in that it internally prepares a statement, then executes it (2 round trips). Or am I needlessly concerning myself over microseconds here?

Cheers,
Chris
Re: [HACKERS] SEGFAULT on SELECT * FROM view
I ported the entire schema to my test DB server and could not reproduce the error there. Note that probably recreating the view solves this issue. Given this, how should I proceed to create a test case? Any tutorial on this? (I'm not too familiar with all this yet.)

It's possibly statistics-dependent; make sure you have the same stats targets on both DBs, and try re-analyzing a few times. Check other planner parameters are the same, too.

In addition to the schema, I now also copied parts of the data in our production system to the testing DB. (I cannot copy all data, as it's too large for the testing DB.) After a couple of VACUUM FULL ANALYZEs, the bug still doesn't reproduce. In our production environment, I'm working with the re-created view that doesn't show the SEGFAULT behavior. I can live with this workaround, as only one particular view seems to be affected, but if you want me to debug more on the error please let me know. Thanks for your help until now!

Chris
[HACKERS] SEGFAULT on SELECT * FROM view
Hi all,

When SELECTing from one particular view, I get reproducible SEGFAULTs in my pg 9.1.2 production database. To test for hardware errors, I ran memtest, which succeeded. I then asked for help in #postgres and got advice to create stacktraces, but I couldn't find the problem yet.

Scenario: VIEW 'vwa' on TABLE 'tba' and some subselects and aggregates. I've no idea what's wrong with this view, but a

SELECT * FROM vwa WHERE myid = 110 LIMIT 100 OFFSET 0;

and even EXPLAINing this statement SEGFAULTs the server.

gdb trace: http://pgsql.privatepaste.com/eacd1b6c5d
gdb ec output: http://pgsql.privatepaste.com/a61db5b564
smaps: http://pgsql.privatepaste.com/3c4f494015

Then I used the definition of vwa, and created vwb, and the very same SELECT (just with vwb) works perfectly fine. I've a pastebin link comparing the two view definitions, which I'm willing to share privately if that helps. The views actually differ, although they look identical with \d+ in the psql console, in that the newer view names more columns that were added to the referenced tables lately. So, you tell me, what's wrong with the old view?

Thanks a lot,
Chris
Re: [HACKERS] SEGFAULT on SELECT * FROM view
So, you tell me, what's wrong with the old view?

Nothing; that looks like a planner bug to me. Please submit a self-contained test case.

I ported the entire schema to my test DB server and could not reproduce the error there. Note that probably recreating the view solves this issue. Given this, how should I proceed to create a test case? Any tutorial on this? (I'm not too familiar with all this yet.)

Chris
Re: [HACKERS] 9.1.2 ?
On Wed, Nov 9, 2011 at 6:22 PM, Florian Pflug f...@phlo.org wrote:

On Nov 9, 2011, at 23:53, Daniel Farina wrote:

I think a novice user would be scared half to death: I know I was the first time. That's not a great impression for the project to leave for what is not, at its root, a vast defect, and the fact it's occurring for people when they use rsync rather than my very sensitive backup routines is an indication that it's not very corner-ey.

Just to emphasize the non-corner-ness of this problem, it should be mentioned that the HS issue was observed even with backups taken with pg_basebackup, if memory serves correctly.

Yes, I personally can reliably reproduce both the clog+subtrans problems using pg_basebackup, and can confirm that the oldestActiveXid_fixed.v2.patch does resolve both issues.
Re: [HACKERS] Hot Standby startup with overflowed snapshots
oops, reply-to-all...

-- Forwarded message --
From: Chris Redekop ch...@replicon.com
Date: Wed, Nov 2, 2011 at 8:41 AM
Subject: Re: [HACKERS] Hot Standby startup with overflowed snapshots
To: Simon Riggs si...@2ndquadrant.com

Sure, I've got quite a few logs lying around - I've attached 3 of 'em...let me know if there are any specific things you'd like me to do or look for next time it happens.

On Wed, Nov 2, 2011 at 2:59 AM, Simon Riggs si...@2ndquadrant.com wrote:

On Fri, Oct 28, 2011 at 3:42 AM, Chris Redekop ch...@replicon.com wrote:

On a side note I am sporadically seeing another error on hot standby startup. I'm not terribly concerned about it as it is pretty rare and it will work on a retry, so it's not a big deal. The error is FATAL: out-of-order XID insertion in KnownAssignedXids. If you think it might be a bug and are interested in hunting it down let me know and I'll help any way I can...but if you're not too worried about it then neither am I :)

I'd be interested to see further details of this if you see it again, or have access to previous logs.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

postgresql-2011-10-27_202007.log Description: Binary data
postgresql-2011-10-31_152925.log Description: Binary data
postgresql-2011-11-01_094501.log Description: Binary data
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
okay, sorry I'm a little confused then. Should I be able to apply both the v2 patch as well as the v3 patch? or is it expected that I'd have to manually do the merge? On Wed, Nov 2, 2011 at 1:34 AM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Nov 2, 2011 at 2:40 AM, Chris Redekop ch...@replicon.com wrote: looks like the v3 patch re-introduces the pg_subtrans issue... No, I just separated the patches to be clearer about the individual changes. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
looks like the v3 patch re-introduces the pg_subtrans issue... On Tue, Nov 1, 2011 at 9:33 AM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Oct 27, 2011 at 4:25 PM, Simon Riggs si...@2ndquadrant.com wrote: StartupMultiXact() didn't need changing, I thought, but I will review further. Good suggestion. On review, StartupMultiXact() could also suffer similar error to the clog failure. This was caused *because* MultiXact is not maintained by recovery, which I had thought meant it was protected from such failure. Revised patch attached. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Hot Standby startup with overflowed snapshots
Thanks for the patch Simon, but unfortunately it does not resolve the issue I am seeing. The standby still refuses to finish starting up until long after all clients have disconnected from the primary (10 minutes). I do see your new log statement on startup, but only once - it does not repeat. Is there any way for me to see what the oldest xid on the standby is via controldata or something like that? The standby does stream to keep up with the primary while the primary has load, and then it becomes idle when the primary becomes idle (when I kill all the connections), so it appears to be current...but it just doesn't finish starting up. I'm not sure if it's relevant, but after it has sat idle for a couple minutes I start seeing these statements in the log (with the same offset every time):

DEBUG: skipping restartpoint, already performed at 9/9520

On Thu, Oct 27, 2011 at 7:26 AM, Simon Riggs si...@2ndquadrant.com wrote:

Chris Redekop's recent report of slow startup for Hot Standby has made me revisit the code there. Although there isn't a bug, there is a missed opportunity for starting up faster which could be the source of Chris' annoyance. The following patch allows a faster startup in some circumstances. The patch also alters the log levels for messages and gives a single simple message for this situation. The log will now say

LOG: recovery snapshot waiting for non-overflowed snapshot or until oldest active xid on standby is at least %u (now %u)

...multiple times until snapshot non-overflowed or xid reached... whereas before the first LOG message shown was

LOG: consistent state delayed because recovery snapshot incomplete

and only later, at DEBUG2, do you see

LOG: recovery snapshot waiting for %u oldest active xid on standby is %u

...multiple times until xid reached...

Comments please.
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby startup with overflowed snapshots
hrmz, still basically the same behaviour. I think it might be a *little* better with this patch. Before, when under load, it would start up quickly maybe 2 or 3 times out of 10 attempts... with this patch it might be up to 4 or 5 times out of 10... ish... or maybe it was just fluke *shrug*. I'm still only seeing your log statement a single time (I'm running at debug2). I have discovered something, though - when the standby is in this state, if I force a checkpoint on the primary then the standby comes right up. Is there anything I can check or try for you to help figure this out? Or is it actually as designed that it could take 10-ish minutes to start up even after all clients have disconnected from the primary? On Thu, Oct 27, 2011 at 11:27 AM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Oct 27, 2011 at 5:26 PM, Chris Redekop ch...@replicon.com wrote: Thanks for the patch Simon, but unfortunately it does not resolve the issue I am seeing. The standby still refuses to finish starting up until long after all clients have disconnected from the primary (10 minutes). I do see your new log statement on startup, but only once - it does not repeat. Is there any way for me to see what the oldest xid on the standby is via controldata or something like that? The standby does stream to keep up with the primary while the primary has load, and then it becomes idle when the primary becomes idle (when I kill all the connections), so it appears to be current... but it just doesn't finish starting up. I'm not sure if it's relevant, but after it has sat idle for a couple of minutes I start seeing these statements in the log (with the same offset every time): DEBUG: skipping restartpoint, already performed at 9/9520 OK, so it looks like there are 2 opportunities to improve, not just one. Try this. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Hot Standby startup with overflowed snapshots
Sorry... 'designed' was a poor choice of words; I meant 'not unexpected'. Doing the checkpoint right after pg_stop_backup() looks like it will work perfectly for me, so thanks for all your help! On a side note, I am sporadically seeing another error on hot standby startup. I'm not terribly concerned about it, as it is pretty rare and it will work on a retry, so it's not a big deal. The error is FATAL: out-of-order XID insertion in KnownAssignedXids. If you think it might be a bug and are interested in hunting it down, let me know and I'll help any way I can... but if you're not too worried about it then neither am I :) On Thu, Oct 27, 2011 at 4:55 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Oct 27, 2011 at 10:09 PM, Chris Redekop ch...@replicon.com wrote: hrmz, still basically the same behaviour. I think it might be a *little* better with this patch. Before, when under load, it would start up quickly maybe 2 or 3 times out of 10 attempts... with this patch it might be up to 4 or 5 times out of 10... ish... or maybe it was just fluke *shrug*. I'm still only seeing your log statement a single time (I'm running at debug2). I have discovered something, though - when the standby is in this state, if I force a checkpoint on the primary then the standby comes right up. Is there anything I can check or try for you to help figure this out? Or is it actually as designed that it could take 10-ish minutes to start up even after all clients have disconnected from the primary? Thanks for testing. The improvements cover specific cases, so it's not subject to chance; it's not a performance patch. It's not designed to act the way you describe, but it does. The reason this occurs is that you have a transaction-heavy workload with occasional periods of complete quiet and a base backup time that is much less than checkpoint_timeout. If your base backup were slower, the checkpoint would have hit naturally before recovery had reached a consistent state. Which seems fairly atypical.
I guess you're doing this on a test system. It seems cheap to add in a call to LogStandbySnapshot() after each call to pg_stop_backup(). Does anyone think this case is worth adding code for? Seems like one more thing to break. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
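The workaround settled on above can be sketched as a psql session on the primary. LogStandbySnapshot() is an internal C function with no SQL interface, but an explicit CHECKPOINT after the backup has a similar effect, since checkpoints under wal_level = hot_standby also log a running-xacts snapshot. This is an illustration of the discussion, not a supported recipe:

```sql
-- On the primary (label 'base' is arbitrary; true = immediate checkpoint):
SELECT pg_start_backup('base', true);
-- ... copy the data directory here (rsync, pg_basebackup, etc.) ...
SELECT pg_stop_backup();
CHECKPOINT;  -- forces a fresh running-xacts record so the standby can
             -- reach a consistent, non-overflowed snapshot sooner
```

This matches Chris' observation that forcing a checkpoint on the primary makes the stuck standby come right up.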
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
And I think they also reported that if they didn't run hot standby, but just normal recovery into a new master, it didn't have the problem either, i.e. without hot standby, recovery ran, properly extended the clog, and then ran as a new master fine. Yes, this is correct... attempting to start as a hot standby will produce the pg_clog error repeatedly, and then, without changing anything else, just turning hot standby off, it will start up successfully. This fits the OP's observation of the problem vanishing when pg_start_backup() does an immediate checkpoint. Note that this is *not* the behaviour I'm seeing... it's possible it happens more frequently without the immediate checkpoint, but I am seeing it happen even with the immediate checkpoint. This is a different problem and has already been reported by one of your colleagues in a separate thread, and answered in detail by me there. There is no bug related to this error message. Excellent... I will continue this discussion in that thread.
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
FYI I have given this patch a good test and can no longer reproduce either the subtrans or the clog error. Thanks guys! On Wed, Oct 26, 2011 at 11:09 AM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs si...@2ndquadrant.com wrote: Brewing a patch now. Latest thinking... confirmations or other error reports please. This fixes both the subtrans and clog bugs in one patch. I'll be looking to commit it tomorrow afternoon as two separate patches with appropriate credits. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
Chris, can you rearrange the backup so you copy the pg_control file as the first act after the pg_start_backup? I tried this and it doesn't seem to make any difference. I also tried the patch, and I can no longer reproduce the subtrans error; however, instead it now starts up but never gets to the point where it'll accept connections. It starts up, but if I try to do anything I always get FATAL: the database system is starting up... even if the load is removed from the primary, the standby still never finishes starting up. Attached below is a log of one of these startup attempts. In my testing with the patch applied, approx 3 in 10 attempts start up successfully, and 7 in 10 attempts go into the 'db is starting up' state... the pg_clog error is still there, but seems much harder to reproduce now... I've seen it only once since applying the patch (out of probably 50 or 60 under-load startup attempts). It does seem to be moody like that though... it will be very difficult to reproduce for a while, and then it will happen damn-near every time for a while... weirdness. On a bit of a side note, I've been thinking of changing my scripts so that they perform an initial rsync prior to doing the startbackup-rsync-stopbackup, just so that the second rsync will be faster... so that the backup is in progress for a shorter period of time, as while it is running it will stop other standbys from starting up... this shouldn't cause any issues, eh?
2011-10-25 13:43:24.035 MDT [15072]: [1-1] LOG: database system was interrupted; last known up at 2011-10-25 13:43:11 MDT 2011-10-25 13:43:24.035 MDT [15072]: [2-1] LOG: creating missing WAL directory pg_xlog/archive_status 2011-10-25 13:43:24.037 MDT [15072]: [3-1] LOG: entering standby mode DEBUG: received replication command: IDENTIFY_SYSTEM DEBUG: received replication command: START_REPLICATION 2/CF00 2011-10-25 13:43:24.041 MDT [15073]: [1-1] LOG: streaming replication successfully connected to primary 2011-10-25 13:43:24.177 MDT [15092]: [1-1] FATAL: the database system is starting up 2011-10-25 13:43:24.781 MDT [15072]: [4-1] DEBUG: checkpoint record is at 2/CF81A478 2011-10-25 13:43:24.781 MDT [15072]: [5-1] DEBUG: redo record is at 2/CF20; shutdown FALSE 2011-10-25 13:43:24.781 MDT [15072]: [6-1] DEBUG: next transaction ID: 0/4634700; next OID: 1188228 2011-10-25 13:43:24.781 MDT [15072]: [7-1] DEBUG: next MultiXactId: 839; next MultiXactOffset: 1686 2011-10-25 13:43:24.781 MDT [15072]: [8-1] DEBUG: oldest unfrozen transaction ID: 1669, in database 1 2011-10-25 13:43:24.781 MDT [15072]: [9-1] DEBUG: transaction ID wrap limit is 2147485316, limited by database with OID 1 2011-10-25 13:43:24.783 MDT [15072]: [10-1] DEBUG: resetting unlogged relations: cleanup 1 init 0 2011-10-25 13:43:24.791 MDT [15072]: [11-1] DEBUG: initializing for hot standby 2011-10-25 13:43:24.791 MDT [15072]: [12-1] LOG: consistent recovery state reached at 2/CF81A4D0 2011-10-25 13:43:24.791 MDT [15072]: [13-1] LOG: redo starts at 2/CF20 2011-10-25 13:43:25.019 MDT [15072]: [14-1] LOG: consistent state delayed because recovery snapshot incomplete 2011-10-25 13:43:25.019 MDT [15072]: [15-1] CONTEXT: xlog redo running xacts: nextXid 4634700 latestCompletedXid 4634698 oldestRunningXid 4634336; 130 xacts: 4634336 4634337 4634338 4634339 4634340 4634341 4634342 4634343 4634344 4634345 4634346 4634347 4634348 4634349 4634350 4634351 4634352 4634353 4634354 4634355 4634356 4634357 4634358 
4634359 4634360 4634361 4634362 4634363 4634364 4634365 4634366 4634367 4634368 4634369 4634370 4634371 4634515 4634516 4634517 4634518 4634519 4634520 4634521 4634522 4634523 4634524 4634525 4634526 4634527 4634528 4634529 4634530 4634531 4634532 4634533 4634534 4634535 4634536 4634537 4634538 4634539 4634540 4634541 4634542 4634543 4634385 4634386 4634387 4634388 4634389 4634390 4634391 4634392 4634393 4634394 4634395 4634396 4634397 4634398 4634399 4634400 4634401 4634402 4634403 4634404 4634405 4634406 4634407 4634408 4634409 4634410 4634411 4634412 4634413 4634414 4634415 4634416 4634417 4634418 4634419 4634420 4634579 4634580 4634581 4634582 4634583 4634584 4634585 4634586 4634587 4634588 4634589 4634590 4634591 4634592 4634593 4634594 4634595 4634596 4634597 4634598 4634599 4634600 4634601 4634602 4634603 4634604 4634605 4634606 4634607; subxid ovf 2011-10-25 13:43:25.240 MDT [15130]: [1-1] FATAL: the database system is starting up DEBUG: standby sync_rep_test has now caught up with primary 2011-10-25 13:43:26.304 MDT [15167]: [1-1] FATAL: the database system is starting up 2011-10-25 13:43:27.366 MDT [15204]: [1-1] FATAL: the database system is starting up 2011-10-25 13:43:28.426 MDT [15241]: [1-1] FATAL: the database system is starting up 2011-10-25 13:43:29.461 MDT [15275]: [1-1] FATAL: the database system is starting up and so on... On Tue, Oct 25, 2011 at 6:51 AM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug f...@phlo.org
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
That isn't a Hot Standby problem, a recovery problem, nor is it certain it's a PostgreSQL problem. Do you have any theories on this that I could help investigate? It happens even when using pg_basebackup, and it persists until another sync is performed, so the files must be in some state that it can't recover from... without understanding the internals, just viewing from an outside perspective, I don't really see how this could not be a PostgreSQL problem.
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
I can confirm that both the pg_clog and pg_subtrans errors do occur when using pg_basebackup instead of rsync. The data itself seems to be fine, because using the exact same data I can start up a warm standby no problem; it is just the hot standby that will not start up. On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop ch...@replicon.com wrote: Linas, could you capture the output of pg_controldata *and* increase the log level to DEBUG1 on the standby? We should then see the nextXid value of the checkpoint the recovery is starting from. I'll try to do that whenever I'm in that territory again... Incidentally, recently there was a lot of unrelated-to-this-post work to polish things up for a talk being given at PGWest 2011 Today :) I also checked what rsync does when a file vanishes after rsync computed the file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains loudly, and doesn't sync the file. It BTW also exits non-zero, with a special exit code for precisely that failure case. To be precise, my script has logic to accept the exit code 24, just as stated in the PG manual: Docs> For example, some versions of rsync return a separate exit code for Docs> vanished source files, and you can write a driver script to accept Docs> this exit code as a non-error case. I also am running into this issue and can reproduce it very reliably. For me, however, it happens even when doing the fast backup like so: pg_start_backup('whatever', true)... my traffic is more write-heavy than linas's tho, so that might have something to do with it. Yesterday it reliably errored out on pg_clog every time, but today it is failing sporadically on pg_subtrans (which seems to be past where the pg_clog error was)... the only thing that has changed is that I've changed the log level to debug1... I wouldn't think that could be related though. I've linked the requested pg_controldata and debug1 logs for both errors.
Both links contain the output from pg_start_backup, rsync, pg_stop_backup, pg_controldata, and then the postgres debug1 log produced from a subsequent startup attempt. pg_clog: http://pastebin.com/mTfdcjwH pg_subtrans: http://pastebin.com/qAXEHAQt Any workarounds would be very appreciated... would copying clog+subtrans before or after the rest of the data directory (or something like that) make any difference? Thanks!
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
Well, on the other hand, maybe there is something wrong with the data. Here's the test/steps I just did -
1. I do the pg_basebackup when the master is under load; a hot slave now will not start up, but a warm slave will.
2. I start a warm slave and let it catch up to current.
3. On the slave I change 'hot_standby=on' and do a 'service postgresql restart'.
4. The postgres fails to restart with the same error.
5. I turn hot_standby back off and postgres starts back up fine as a warm slave.
6. I then turn off the load; the slave is all caught up, and master and slave are both sitting idle.
7. I, again, change 'hot_standby=on' and do a service restart.
8. Again it fails, with the same error, even though there is no longer any load.
9. I repeat this warmstart/hotstart cycle a couple more times until, to my surprise, instead of failing, it successfully starts up as a hot standby (this is after maybe 5 minutes or so of sitting idle).
So... given that it continued to fail even after the load had been turned off, that makes me believe that the data which was copied over was invalid in some way, and when a checkpoint/logrotation/something else occurred when not under load, it cleared itself up... I'm shooting in the dark here. Anyone have any suggestions/ideas/things to try? On Mon, Oct 17, 2011 at 2:13 PM, Chris Redekop ch...@replicon.com wrote: I can confirm that both the pg_clog and pg_subtrans errors do occur when using pg_basebackup instead of rsync. The data itself seems to be fine, because using the exact same data I can start up a warm standby no problem; it is just the hot standby that will not start up. On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop ch...@replicon.com wrote: Linas, could you capture the output of pg_controldata *and* increase the log level to DEBUG1 on the standby? We should then see the nextXid value of the checkpoint the recovery is starting from. I'll try to do that whenever I'm in that territory again...
Incidentally, recently there was a lot of unrelated-to-this-post work to polish things up for a talk being given at PGWest 2011 Today :) I also checked what rsync does when a file vanishes after rsync computed the file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains loudly, and doesn't sync the file. It BTW also exits non-zero, with a special exit code for precisely that failure case. To be precise, my script has logic to accept the exit code 24, just as stated in the PG manual: Docs> For example, some versions of rsync return a separate exit code for Docs> vanished source files, and you can write a driver script to accept Docs> this exit code as a non-error case. I also am running into this issue and can reproduce it very reliably. For me, however, it happens even when doing the fast backup like so: pg_start_backup('whatever', true)... my traffic is more write-heavy than linas's tho, so that might have something to do with it. Yesterday it reliably errored out on pg_clog every time, but today it is failing sporadically on pg_subtrans (which seems to be past where the pg_clog error was)... the only thing that has changed is that I've changed the log level to debug1... I wouldn't think that could be related though. I've linked the requested pg_controldata and debug1 logs for both errors. Both links contain the output from pg_start_backup, rsync, pg_stop_backup, pg_controldata, and then the postgres debug1 log produced from a subsequent startup attempt. pg_clog: http://pastebin.com/mTfdcjwH pg_subtrans: http://pastebin.com/qAXEHAQt Any workarounds would be very appreciated... would copying clog+subtrans before or after the rest of the data directory (or something like that) make any difference? Thanks!
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
Linas, could you capture the output of pg_controldata *and* increase the log level to DEBUG1 on the standby? We should then see the nextXid value of the checkpoint the recovery is starting from. I'll try to do that whenever I'm in that territory again... Incidentally, recently there was a lot of unrelated-to-this-post work to polish things up for a talk being given at PGWest 2011 Today :) I also checked what rsync does when a file vanishes after rsync computed the file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains loudly, and doesn't sync the file. It BTW also exits non-zero, with a special exit code for precisely that failure case. To be precise, my script has logic to accept the exit code 24, just as stated in the PG manual: Docs> For example, some versions of rsync return a separate exit code for Docs> vanished source files, and you can write a driver script to accept Docs> this exit code as a non-error case. I also am running into this issue and can reproduce it very reliably. For me, however, it happens even when doing the fast backup like so: pg_start_backup('whatever', true)... my traffic is more write-heavy than linas's tho, so that might have something to do with it. Yesterday it reliably errored out on pg_clog every time, but today it is failing sporadically on pg_subtrans (which seems to be past where the pg_clog error was)... the only thing that has changed is that I've changed the log level to debug1... I wouldn't think that could be related though. I've linked the requested pg_controldata and debug1 logs for both errors. Both links contain the output from pg_start_backup, rsync, pg_stop_backup, pg_controldata, and then the postgres debug1 log produced from a subsequent startup attempt. pg_clog: http://pastebin.com/mTfdcjwH pg_subtrans: http://pastebin.com/qAXEHAQt Any workarounds would be very appreciated... would copying clog+subtrans before or after the rest of the data directory (or something like that) make any difference? Thanks!
Re: [HACKERS] pg_last_xact_insert_timestamp
Thanks for all the feedback guys. Just to throw another monkey wrench in here - I've been playing with Simon's proposed solution of returning 0 when the WAL positions match, and I've come to the realization that even if using pg_last_xact_insert_timestamp, although it would help, we still wouldn't be able to get a 100% accurate "how far behind?" counter... not that this is a big deal, but I know my ops team is going to bitch to me about it :). Take this situation: there's a lull of 30 seconds where there are no transactions committed on the server... the slave is totally caught up, WAL positions match, I'm reporting 0, everything is happy. Then a transaction is committed on the master... before the slave gets it, my query hits it and sees that we're 30 seconds behind (when in reality we're 1 sec behind). Because of this effect my graph is a little spiky... I mean, it's not a huge deal or anything - I can put some sanity checking in my number reporting (if 1 second ago you were 0 seconds behind, you can't be more than 1 second behind now, sorta thing). But if we wanted to go for the super-ideal solution, there would be a way to get the timestamp of pg_stat_replication.replay_location+1 (the first transaction that the slave does not have). On Thu, Sep 8, 2011 at 7:03 AM, Robert Haas robertmh...@gmail.com wrote: On Thu, Sep 8, 2011 at 6:14 AM, Fujii Masao masao.fu...@gmail.com wrote: OTOH, the new function enables users to monitor the delay as a timestamp. For users, a timestamp is obviously easier to handle than an LSN, and the delay as a timestamp is more intuitive. So, I think that it's worth adding something like pg_last_xact_insert_timestamp into core for improvement of user-friendliness. It seems very nice from a usability point of view, but I have to agree with Simon's concern about performance. Actually, as of today, WALInsertLock is such a gigantic bottleneck that I suspect the overhead of this additional bookkeeping would be completely unnoticeable.
But I'm still reluctant to add more centralized spinlocks that everyone has to fight over, having recently put a lot of effort into getting rid of some of the ones we've traditionally had. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
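Lacking pg_last_xact_insert_timestamp (still only a proposal in this thread), Simon's "report 0 when the WAL positions match" idea can be approximated on a 9.1 standby using functions that already exist; a sketch, run on the standby:

```sql
-- Report zero lag when everything received has been replayed;
-- otherwise report seconds since the last replayed commit.
-- Caveat from the message above still applies: during an idle lull,
-- the ELSE branch would overstate the lag for the first new commit.
SELECT CASE
         WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
           THEN 0
         ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
       END AS lag_seconds;
```

Note this measures "received vs. replayed" rather than "inserted on the master vs. replayed", which is exactly the gap the proposed function is meant to close.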
Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions
On Mon, Jul 11, 2011 at 12:49 PM, David Johnston pol...@yahoo.com wrote: I do not see how recursive queries (really iteration of records) even enter the picture... I agree, FWIW. If the feature were that desirable, we could look at questions of implementation to make recursion either unnecessary or at least well managed. Right now I can emulate a hierarchical schema structure via a naming scheme - for example schemabase_sub1_sub2_etc. I am simply looking for a formal way to do the above AND also tell the system that I want all schemas under schemabase to be in the search path. Heck, I guess just allowing for simple pattern matching in search_path would be useful in this case, regardless of the presence of an actual schema hierarchy. Using LIKE syntax, say: SET search_path TO 'schemabase_sub1_%' or something similar. The only missing ability becomes a way for graphical tools to represent the schema hierarchy using a tree structure with multiple depths. Right. Semantically myapp_schemaname_subschemaname is no less hierarchical than myapp.schemaname.subschemaname. The larger issue is that of potential ambiguity wrt cross-database references (I don't have a lot of experience reading the SQL standards, but seeing how different db's implement cross-db references suggests that the standards contemplate semantic meaning to depth of the namespace). I can see how adding . and .. and relative paths would confuse the issue, but those are not necessary features of a multi-level schema depth. The above, combined with a different separator for intra-level namespace/schema delineation, would allow for an unambiguous way to define and use a hierarchical schema with seemingly minimal invasion into the current way of doing things. You could almost implement it just by requiring a specific character to act as the separator and then construct the actual schema using single-level literals and supporting functions that can convert them into a hierarchy.
In other words, the schema table would still only contain one field with the full parent!child, as opposed to (schema, parent) with (VALUES('parent',null),('child','parent')). In other words, if we use ! as the separator, any schema named parent!child could be stored and referenced as such, but then if you run a getChildren(parent) function it would return child along with any other schemas of the form parent!%. In this case the % sign could maybe only match everything except ! and the * symbol could be used to match ! as well. Agreed that this would be helpful. I would personally have a lot of use for this sort of feature, particularly with managing large numbers of stored procedures. Right now I am using a double underscore, which is error-prone. Best Wishes, Chris Travers
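The '!'-separator convention described above can already be emulated with plain LIKE over pg_namespace; getChildren() is a hypothetical helper (nothing like it exists in core), sketched here as a query. Note that in standard LIKE, % matches '!' too, so direct children have to be isolated by excluding deeper levels explicitly:

```sql
-- Sketch of getChildren('parent') under the parent!child naming convention
SELECT nspname
  FROM pg_namespace
 WHERE nspname LIKE 'parent!%'          -- all descendants of 'parent'
   AND nspname NOT LIKE 'parent!%!%';   -- exclude grandchildren and deeper
```

The proposal's alternative semantics (% stopping at '!', with * crossing levels) would fold the second condition into the pattern itself.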
Re: [HACKERS] Testing extension upgrade scripts
da...@kineticode.com (David E. Wheeler) writes: You should blog this. He just did, using the SMTP protocol... -- select 'cbbrowne' || '@' || 'acm.org'; http://linuxdatabases.info/info/postgresql.html Where do you want to Tell Microsoft To Go Today?
Re: [HACKERS] Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
j...@agliodbs.com (Josh Berkus) writes: I don't understand what you're talking about at all here. I think there are a lot of unsolved problems in monitoring, but the one thing I think everyone is pretty clear on is that the right way to export metrics like these is to export a counter and then have some external component periodically copy the counter into some history table and calculate the derivative, second derivative, running average of the first derivative, etc. You missed the original point of the discussion, which was to have stats we could use for auto-tuning internally. Not to export them. For example, there are optimizations we could make with the query planner if we knew which tables and indexes were hot in general. That's how we started this discussion, and it's not solved by storing the stats history on another server. There's value to both, and there's no dearth of monitoring frameworks that people keep on replacing with successors, so there's certainly room for both ;-). Recent stuff about such... https://lopsa.org/content/philosophy-monitoring https://labs.omniti.com/labs/reconnoiter I'm not quite sure what ought to be in PostgreSQL as a built-in; I suspect that what's eventually needed is to be able to correlate things across database instances, so that when Tom says, I need to know what data the planner's working on, the answer can be OK, got that... This data is surely useful to get out of the system, so I'd bias towards something sorta like what Greg suggests. And the closed-ended answer may prevent us from asking more sophisticated questions, also not a notably good thing... -- (reverse (concatenate 'string moc.liamg @ enworbbc)) If tautologies do not convey information, mathematicians would not be surprised by them. -- Mark Miller
Re: [HACKERS] CommitFest 2011-01 as of 2011-02-04
robertmh...@gmail.com (Robert Haas) writes: It does, but frankly I don't see much reason to change it, since it's been working pretty well on the whole. Andrew was on point when he mentioned that it's not obvious what committers get out of working on other people's patches. Obviously, the answer is, well, they get a better PostgreSQL, and that's ultimately good for all of us. But the trickiest part of this whole process is that, on the one hand, it's not fair for committers to ignore other people's patches, but on the other hand, it's not fair to expect committers to sacrifice getting their own projects done to get other people's projects done. I had two interesting germane comments in my RSS feed this morning, both entitled Please send a patch http://www.lucas-nussbaum.net/blog/?p=630 Where Lucas suggests that, when someone requests an enhancement, the retort Please send a patch mayn't be the best idea, because the one receiving the requests may be many times better at contributing such changes than the one making the request. http://hezmatt.org/~mpalmer/blog/general/please_send_a_patch.html On the other hand, Lucas, remember that each time you ask someone to take some time to implement your pet feature request, you take some time away from her that could be used to contribute something in an area where she gives a damn. These are *both* true statements, and, in order to grow the community that is capable of enhancing the system, there is merit to the careful application of both positions. There's stuff that Tom should do :-). And absent the general availability of cloning machines, we need to have people improving their skills so that there are more that are capable of submitting (and evaluating and committing) usable patches. -- wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','gmail.com'). http://linuxdatabases.info/info/languages.html Signs of a Klingon Programmer - 14. Our competitors are without honor! 
Re: [HACKERS] why two dashes in extension load files
t...@sss.pgh.pa.us (Tom Lane) writes: Peter Eisentraut pete...@gmx.net writes: On mån, 2011-02-14 at 10:13 -0500, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: Why do the extension load files need two dashes, like xml2--1.0.sql? Why isn't one enough? Because we'd have to forbid dashes in extension name and version strings. This was judged to be a less annoying solution. See yesterday's discussion. I'm not convinced. There was nothing in that discussion why any particular character would have to be allowed in a version number. Well, there's already a counterexample in the current contrib stuff: uuid-ossp. We could rename that to uuid_ossp of course, but it's not clear to me that there's consensus for forbidding dashes here. I suspect that _ might be troublesome. Let me observe on Debian policy... It requires that package names consist as follows: Package names (both source and binary, see Package, Section 5.6.7) must consist only of lower case letters (a-z), digits (0-9), plus (+) and minus (-) signs, and periods (.). They must be at least two characters long and must start with an alphanumeric character. http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Source I suspect that we'll need to have a policy analagous to that. Also worth observing: Debian package files are of the form: ${package}_${version}-${dversion}_${arch}.deb where package and version have fairly obvious interpretation, and... - dversion indicates a sequence handled by Debian - arch indicates CPU architecture (i386, amd64, ...) Probably the dversion/arch bits aren't of interest to us, but the remainder of the notation used by Debian seems not inapplicable for us. -- let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];; http://linuxdatabases.info/info/languages.html Signs of a Klingon Programmer - 4. You cannot really appreciate Dilbert unless you've read it in the original Klingon. 
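For comparison, the Debian name constraint quoted above translates directly into a regular expression; a quick illustrative check (PostgreSQL has no such validation today, and this is not proposed syntax):

```sql
-- Debian package-name policy: lowercase letters, digits, '+', '-', '.';
-- at least two characters; must start alphanumeric.
SELECT pkg, pkg ~ '^[a-z0-9][a-z0-9+.-]+$' AS debian_legal
  FROM (VALUES ('uuid-ossp'), ('uuid_ossp'), ('xml2')) AS t(pkg);
-- 'uuid-ossp' and 'xml2' pass; 'uuid_ossp' fails (underscore not allowed)
```

So an extension-naming policy modeled on Debian's would keep uuid-ossp legal while ruling out underscores, the opposite of the rename suggested above.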
Re: [HACKERS] postponing some large patches to 9.2
pg...@j-davis.com (Jeff Davis) writes:
> On Tue, 2011-02-08 at 15:10 -0500, Chris Browne wrote:
>> It's more than a bit sad...  The RangeType change has the massive
>> merit of enabling some substantial development changes, where we can
>> get rid of whole classes of comparison clauses, and hopefully whole
>> classes of range errors.  That was my favorite would-be feature for 9.1.
> I appreciate the support.  If you take the feature for a quick spin
> before the next commitfest, that would be a big help.  If I get it in
> the first commitfest of 9.2, that may mean some follow-up features,
> like RANGE KEYs/FKs, and maybe even RANGE JOIN, might have a chance
> for 9.2 as well.  Or maybe some other features might find it useful,
> like partitioning or audit logs.

I've found my wish item...  I wish that queries could expand ranges in much the same fashion that BETWEEN expands into two query nodes.  That way, you can use a range to pick data from a large table, and not revert to a Seq Scan+Filter, which is what I'm seeing for the following sort of query:

   select * from some_data
    where '[2010-01-01,2010-02-01)'::daterange @> whensit;
--
let name=cbbrowne and tld=gmail.com in String.concat "@" [name;tld];;
http://linuxfinances.info/info/lsf.html
Rules of the Evil Overlord #162. "If I steal something very important to the hero, I will not put it on public display."
[HACKERS] Range Types - efficiency
One of the things I'd particularly like to use range types for is to make it easier to construct range-related queries.  The classic example is that of reports that work on date ranges.

I create a table that will have transaction data:

   CREATE TABLE some_data (
      id serial,
      whensit date
      -- And it'll have other attributes, but those don't matter here...
   );
   CREATE INDEX some_when ON some_data USING btree (whensit);

I then populate it with a bunch of date-based data...

   rangetest@localhost-> select count(*), min(whensit), max(whensit) from some_data;
    count |    min     |    max
   -------+------------+------------
    37440 | 2007-01-01 | 2014-12-27
   (1 row)

Here's the traditional way of doing a range-based query on this data:

   rangetest@localhost-> explain analyze select * from some_data
                          where whensit >= '2010-01-01' and whensit < '2010-02-01';
                                                        QUERY PLAN
   ------------------------------------------------------------------------------------------------------------------------
    Bitmap Heap Scan on some_data  (cost=12.30..184.23 rows=395 width=8) (actual time=0.064..0.150 rows=390 loops=1)
      Recheck Cond: ((whensit >= '2010-01-01'::date) AND (whensit < '2010-02-01'::date))
      ->  Bitmap Index Scan on some_when  (cost=0.00..12.21 rows=395 width=0) (actual time=0.054..0.054 rows=390 loops=1)
            Index Cond: ((whensit >= '2010-01-01'::date) AND (whensit < '2010-02-01'::date))
    Total runtime: 0.197 ms
   (5 rows)

The RangeType-based equivalent is the following:

   rangetest@localhost-> explain analyze select * from some_data
                          where '[2010-01-01,2010-02-01)'::daterange @> whensit;
                                                  QUERY PLAN
   ------------------------------------------------------------------------------------------------------------
    Seq Scan on some_data  (cost=0.00..634.00 rows=1 width=8) (actual time=1.045..111.739 rows=390 loops=1)
      Filter: ('[ 2010-01-01, 2010-02-01 )'::daterange @> whensit)
    Total runtime: 111.780 ms
   (3 rows)

This, alas, reverts to a seq scan on the table, rather than restricting itself to the tuples of interest.

I realize that, after a fashion, I'm using this backwards.
But when I'm doing temporal stuff, that tends to be the pattern:
- There is a set of temporal configuration, indicating criteria that are true for particular date ranges
- There is then event data, which has but a single date, but which needs to be matched against the temporal configuration.

It sure would be nice to expand that filter into subqueries involving the two criteria, in much the same fashion that is true today for BETWEEN.  I imagine that would allow many queries with this kind of pattern to make use of indexes, making them visibly thousands of times faster.
--
"I have traveled the length and breadth of this country and talked with the best people, and can assure you that data processing is a fad that won't last out the year."  -- Business books editor, Prentice Hall 1957
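[The wished-for expansion can of course be written out by hand today.  A sketch, assuming the same some_data table as above and a build with range types installed; lower() and upper() just pull the bounds out of the range, so the planner sees two ordinary btree-indexable comparisons:]

```sql
-- The BETWEEN-like expansion done manually: extract the range bounds
-- so the existing btree index on whensit can be used.
SELECT *
  FROM some_data
 WHERE whensit >= lower('[2010-01-01,2010-02-01)'::daterange)
   AND whensit <  upper('[2010-01-01,2010-02-01)'::daterange);
```

Note that this hand expansion is only correct for a range with an inclusive lower and exclusive upper bound; handling arbitrary bound flavours (and empty/unbounded ranges) correctly is precisely what makes an automatic planner rewrite attractive.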
Re: [HACKERS] Range Types - efficiency
pg...@j-davis.com (Jeff Davis) writes:
> On Wed, 2011-02-09 at 16:20 -0500, Chris Browne wrote:
>> rangetest@localhost-> explain analyze select * from some_data
>>                        where '[2010-01-01,2010-02-01)'::daterange @> whensit;
>>  Seq Scan on some_data  (cost=0.00..634.00 rows=1 width=8) (actual time=1.045..111.739 rows=390 loops=1)
>>    Filter: ('[ 2010-01-01, 2010-02-01 )'::daterange @> whensit)
>>  Total runtime: 111.780 ms
>> (3 rows)
>>
>> This, alas, reverts to a seq scan on the table, rather than
>> restricting itself to the tuples of interest.  I realize that, after
>> a fashion, I'm using this backwards.  But when I'm doing temporal
>> stuff, that tends to be the pattern:
> Yes.  The index is a btree index on a normal column, so range types
> can't exactly help with that directly -- except maybe as a rewrite
> like you say.  One thing you might try is a functional index on
> (range(whensit)) and then do: where '...' @> range(whensit).  Does
> that work for you?

That doesn't appear to actually help:

   rangetest@localhost-> create index i2 on some_data (range(whensit));
   CREATE INDEX
   rangetest@localhost-> explain analyze select * from some_data
                          where '[2010-01-01,2010-02-01)'::daterange @> range(whensit);
                                                   QUERY PLAN
   ----------------------------------------------------------------------------------------------------------------
    Seq Scan on some_data  (cost=0.00..727.60 rows=12480 width=8) (actual time=1.030..110.542 rows=390 loops=1)
      Filter: ('[ 2010-01-01, 2010-02-01 )'::daterange @> range(whensit))
    Total runtime: 110.585 ms
   (3 rows)

In any case, I suggest that, as a couple-of-steps-down-the-road thing, it would be desirable to have that query rewrite.  Seems like a reasonable ToDo item to consider for the future, if not in the first deployment.  Maybe that's something to add in 9.2 CommitFest #3! :-)
--
"There isn't any reason why Linux can't be implemented as an enterprise computing solution.  Find out what you've been missing while you've been rebooting Windows NT." - Infoworld
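[One more avenue, offered only as an untested sketch: a plain functional index is still btree-backed, so it can't match @>.  If the patch's GiST opclass for ranges and a three-argument daterange() constructor are available (both exist in range types as eventually released), indexing a degenerate single-point range and phrasing the query as range-contains-range should be indexable:]

```sql
-- Guess at an indexable formulation, assuming daterange(date, date,
-- text) and a GiST opclass for ranges are present in the build:
CREATE INDEX some_when_range
    ON some_data
 USING gist (daterange(whensit, whensit, '[]'));

EXPLAIN ANALYZE
SELECT *
  FROM some_data
 WHERE '[2010-01-01,2010-02-01)'::daterange
       @> daterange(whensit, whensit, '[]');
```

Range-contains-range is an operation the GiST opclass is built to accelerate, so in principle this yields a bitmap index scan rather than the Seq Scan+Filter above; it doesn't remove the wish for an automatic rewrite, though.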
Re: [HACKERS] postponing some large patches to 9.2
sfr...@snowman.net (Stephen Frost) writes:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> - Range Types.  This is a large patch which was submitted for the
>> first time to the last CommitFest of the cycle, and the first version
>> that had no open TODO items was posted yesterday, three-quarters of
>> the way through that last CommitFest.  Some good review has been done.
>> While more is probably needed, I think we should feel good about
>> what's been accomplished and mark this one Returned with Feedback.
> I don't agree w/ punting Range Types.  Range Types were discussed as
> far back as the 2010 developer meeting, were discussed quite a bit
> again starting in October and throughout the fall, and Jeff has
> regularly been posting updates to it.  Given how thorough Jeff is, my
> feeling is that this patch is more than ready for beta.  My impression
> is also that it's not as invasive or destabilizing as the others, and
> while it wasn't being posted to the previous commitfests, it was
> clearly being worked on, updated, and improved.

I generally mirror those thoughts.  Range Types don't seem invasive or destabilizing, and the code base has been deployed for quite some time as an extension (not quite contrib).  It would be disappointing to drop this one when it is mighty close.

>> - synchronous replication.  Based on some looking at this today, I am
>> somewhat doubtful about the possibility of me or anyone else beating
>> this completely into shape in time for 9.1, unless we choose to
>> extend the deadline by several weeks.  Simon said that he would have
>> time to finish this in the next two weeks, but, as noted, the
>> CommitFest is scheduled to be over in ONE week, and it looks to me
>> like this is still pretty rough.  However, there's a lot of good
>> stuff in here, and I think it might be practical to get some portion
>> of it committed even if we can't agree on all of it.  I recommend we
>> give this one a little more time to shake out before giving up on it.
> It really would be nice to have this, but I agree that it's pretty
> late in the game for it to be in the state it appears to be in. :/  It
> also seems to have been stalled for the past couple of months, which
> doesn't bode well for it, in my view.

The stall troubles me, and doesn't bode terribly well for 9.1.
--
Rules of the Evil Overlord #39. "If I absolutely must ride into battle, I will certainly not ride at the forefront of my Legions of Terror, nor will I seek out my opposite number among his army."
http://www.eviloverlord.com/
Re: [HACKERS] postponing some large patches to 9.2
pg...@j-davis.com (Jeff Davis) writes:
> On Tue, 2011-02-08 at 06:57 -0500, Stephen Frost wrote:
>> * Robert Haas (robertmh...@gmail.com) wrote:
>>> - Range Types.  This is a large patch which was submitted for the
>>> first time to the last CommitFest of the cycle, and the first
>>> version that had no open TODO items was posted yesterday,
>>> three-quarters of the way through that last CommitFest.  Some good
>>> review has been done.  While more is probably needed, I think we
>>> should feel good about what's been accomplished and mark this one
>>> Returned with Feedback.
>> I don't agree w/ punting Range Types.  Range Types were discussed as
>> far back as the 2010 developer meeting, were discussed quite a bit
>> again starting in October and throughout the fall, and Jeff has
>> regularly been posting updates to it.  Given how thorough Jeff is, my
>> feeling is that this patch is more than ready for beta.
> I appreciate the sentiment, but in addition to some cleanup, any patch
> like this at least requires some discussion.  It's a language change
> we'll be supporting for a long time.  At minimum, we're a couple
> hundred emails shy of a real consensus on the naming ;)

It's more than a bit sad...  The RangeType change has the massive merit of enabling some substantial development changes, where we can get rid of whole classes of comparison clauses, and hopefully whole classes of range errors.  That was my favorite would-be feature for 9.1.

It'll take some time to get code changes into systems to use this; the sooner the feature's in a deployable version of Postgres, the earlier that kind of thing may start.
--
(format nil "~S@~S" "cbbrowne" "gmail.com")
http://www3.sympatico.ca/cbbrowne/internet.html
"Colorless green ideas sleep furiously." -- Noam Chomsky
Re: [HACKERS] OpenVMS - an effort which needs guidance and support.
peder...@ccsscorp.com (Bill Pedersen) writes:
> I look forward to hearing from people in the PostgreSQL community as
> well as from others interested in this effort.

To a number of us it's academically interesting; though, as we don't have VMS systems, it's not likely to be super-easy to assist in the matter.  It certainly would be interesting to see how easy or difficult the port would be.  I suspect that's a more interesting port than, say, Digital UNIX, these days.
--
(reverse (concatenate 'string "ofni.sesabatadxunil" "@" "enworbbc"))
http://linuxfinances.info/info/slony.html
"On the other hand, you have different fingers."
Re: [HACKERS] Sync Rep for 2011CF1
dp...@pgadmin.org (Dave Page) writes:
> On Mon, Feb 7, 2011 at 6:55 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Mon, Feb 7, 2011 at 12:43 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmh...@gmail.com> writes:
>>>> ... Well, the current CommitFest ends in one week, ...
>>> Really?  I thought the idea for the last CF of a development cycle
>>> was that it kept going till we'd dealt with everything.  Arbitrarily
>>> rejecting stuff we haven't dealt with doesn't seem fair.
>> Uh, we did that with 8.4 and it was a disaster.  The CommitFest
>> lasted *five months*.  We've been doing schedule-based CommitFests
>> ever since and it's worked much better.
> Rejecting stuff because we haven't gotten round to dealing with it in
> such a short period of time is a damn good way to limit the number of
> contributions we get.  I don't believe we've agreed at any point that
> the last commitfest should be the same length as the others (when we
> originally came up with the commitfest idea, it certainly wasn't
> expected), and deciding that without giving people advance notice is a
> really good way to piss them off and encourage them to go work on
> other things.  If we're going to put a time limit on this - and I
> think we should - we should publish a date ASAP that gives everyone a
> fair chance to finish their work - say, 4 weeks.  Then, if we want to
> make the last commitfest the same length as the others next year, we
> can make that decision and document those plans.

There *is* a problem in that there doesn't seem to be enough time to readily allow development of larger features without people getting stuck fighting with the release periods.  But that's not the problem taking place here.  It was documented, last May, that the final CommitFest for 9.1 was to complete 2011-02-15, and there did seem to be agreement on that.
It sure looks to me like there are going to be a bunch of items that, based on the recognized policies, need to get deferred to 9.2, and the prospects for Sync Rep getting into 9.1 don't look notably good to me.

Looking at things statistically, the 9.1 commitfests have had the following numbers of items:
- #1 - 2010-09 - 52 items, of which 26 were committed
- #2 - 2010-11 - 43 items, of which 23 were committed
- #3 - 2011-01 - 98 items, of which 35 have been committed and 10 are considered ready to commit

It may appear unfair to not offer everyone a fair chance to finish their work, but it's not as if the date wasn't published Plenty Long Ago and well-publicized.  And deferring the end of the CommitFest would be Not Fair to those that *did* get their proposed changes ready for the preceding fests.  We cannot evade unfairness.

It's definitely readily arguable that fairness requires that:
- Items not committable by 2011-02-15 be deferred to the 2011-Next fest.  There are around 25 items right now that are sitting with [Waiting for Author] and [Returned with Feedback] statuses.  They largely seem like pretty fair game for the next fest.
- Large items that weren't included in the 2010-11 fest be considered problematic to try to integrate into 9.1.  There sure seem to be some large items in the 2011-01 fest, which I thought wasn't supposed to be the case.

We shouldn't just impose policy for the sake of imposing policy, but I do recall Really Long CommitFests being pretty disastrous.  And there's *SO* much outstanding in this particular fest that it's getting past time for doing some substantial triage, so that reviewer attention may be directed towards the items most likely to be acceptable for 9.1.  I hate to think that 9.1 won't include Simon's SR material, but that may have to be.
--
http://www3.sympatico.ca/cbbrowne/slony.html
"It's a pretty rare beginner who isn't clueless.  If beginners weren't clueless, the infamous Unix learning cliff wouldn't be a problem."
-- david parsons