Re: [HACKERS] Possible bug in cascaded standby
> I'll retry and report back if I see the problem on the offending platform.

Just to close out this thread, I can't reproduce this on the Mac OS either. While I'd done a make clean earlier, make distclean did the trick. Sorry for the noise.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Optimising Foreign Key checks
On Tue, Jun 04, 2013 at 02:45:17PM +0100, Simon Riggs wrote:
> On Sun, Jun 02, 2013 at 10:45:21AM +0100, Simon Riggs wrote:
>> For clarity the 4 problems are
>> 1. SQL execution overhead
>> 2. Memory usage
>> 3. Memory scrolling
>> 4. Locking overhead, specifically FPWs and WAL records from FK checks
>> probably in that order or thereabouts.
>
> Let's rethink things to put a few options on the table and see what we
> get...
>
> 2. Don't store FK events in the after trigger queue at all, but apply
> them as we go. That solves problems 2 and 3. That could be considered to
> be in violation of the SQL standard, which requires us to apply the
> checks at the end of the statement. We already violate the standard with
> regard to uniqueness checks, so doing it here doesn't seem unacceptable.

I wouldn't like to see that compliance bug propagate to other constraint types. What clauses in the standard demand end-of-statement timing, anyway?

What if we followed the example of deferred UNIQUE: attempt FK checks as we go, and enqueue an after-trigger recheck when such an initial test fails?

> Implementation: Given we know that COPY uses a ring buffer, and given
> your earlier thoughts on use of batched SQL, I have a new suggestion.
> Every time the ring buffer fills, we go through the last buffers
> accessed, pulling out all the PKs and then issue them as a single SQL
> statement (as above). We can do that manually, or using the block scan
> mentioned previously. This uses batched SQL to solve problem 1. It
> doesn't build up a large data structure in memory, problem 2, and it
> also solves problem 3 by accessing data blocks before they fall out of
> the ring buffer.
>
> If there are no duplicates in the referenced table, then this behaviour
> will do as much as possible to make accesses to the referenced table
> also use a small working set. (We may even wish to consider making the
> batched SQL use a separate ring buffer for RI accesses.) That approach
> doesn't make any assumptions about duplicates.

If this can be made standard-compliant, it sounds most fruitful.

> Perhaps another way would be to avoid very large COPY statements
> altogether, breaking down loads into smaller pieces.

True. It would be nice for the system to not need such hand-holding, but that's a largely-adequate tool for coping in the field.

Thanks,
nm

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
From: Daniel Farina dan...@heroku.com
> On Fri, Jun 7, 2013 at 12:14 PM, Josh Berkus j...@agliodbs.com wrote:
>> Right now, what we're telling users is "You can have continuous backup
>> with Postgres, but you'd better hire an expensive consultant to set it
>> up for you, or use this external tool of dubious provenance which
>> there's no packages for, or you might accidentally cause your database
>> to shut down in the middle of the night." At which point most sensible
>> users say "no thanks, I'll use something else."
>
> Inverted and just as well supported: if you want to not accidentally
> lose data, you'd better hire an expensive consultant to check your
> systems for all sorts of default "safety = off" features. This being but
> the hypothetical first one.
>
> Furthermore, I see no reason why high-quality external archiving
> software cannot exist. Maybe some even exists already, and no doubt it
> can be improved and the contract with Postgres enriched to that purpose.
> Finally, it's not that hard to teach any archiver how to no-op at the
> user's peril, or perhaps Postgres can learn a way to do this expressly,
> to standardize the procedure a bit and ease publicly shared recipes.

Yes, I feel designing reliable archiving, even for the simplest case (copy WAL to disk), is very difficult. I know of the following three problems if you just follow the PostgreSQL manual. Average users won't notice them, and I guess even professional DBAs migrating from other DBMSs won't, either.

1. If the machine or postgres crashes while archive_command is copying a WAL file, later archive recovery fails. This is because cp leaves a file of less than 16MB in the archive area, and postgres refuses to start when it finds such a small archived WAL file. The solution, which IIRC Tomas-san told me here, is to do something like:

cp %p /archive/dir/%f.tmp && mv /archive/dir/%f.tmp /archive/dir/%f

2. archive_command dumps core when you run pg_ctl stop -mi. This is because postmaster sends SIGQUIT to all its descendants. The core files accumulate in the data directory, which will be backed up with the database. Of course those core files are garbage. The archive_command script needs to catch SIGQUIT and exit.

3. You cannot know the reason for an archive_command failure (e.g. archive area full) if you don't use PostgreSQL's server logging, because archive_command failures are not logged to syslog/eventlog.

I hope PostgreSQL will provide a reliable archiving facility that is ready to use.

Regards
MauMau
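To make items 1 and 2 concrete, here is one way the copy-to-temp-then-rename recipe and the SIGQUIT handling could be wrapped up in a small script. This is only a sketch with assumed paths (/tmp is used purely for the self-contained demo at the bottom), not a vetted archiving tool:

```shell
#!/bin/sh
# Sketch of a safer archive_command wrapper, per the cp-to-temp-then-mv
# recipe above.  Paths and names here are illustrative assumptions.

# Exit instead of dumping core in the data directory when
# "pg_ctl stop -mi" makes the postmaster send SIGQUIT.
trap 'exit 1' QUIT

archive_one() {
    # $1 = %p (path to the WAL segment), $2 = %f (its file name)
    wal_path=$1
    wal_name=$2

    # Never overwrite a segment that was already archived.
    [ ! -e "$ARCHIVE_DIR/$wal_name" ] || return 1

    # Copy under a temporary name, then rename.  rename() is atomic on
    # POSIX filesystems, so a crash mid-copy leaves only a .tmp file that
    # recovery ignores, never a short (<16MB) file under the real name.
    cp "$wal_path" "$ARCHIVE_DIR/$wal_name.tmp" || return 1
    mv "$ARCHIVE_DIR/$wal_name.tmp" "$ARCHIVE_DIR/$wal_name"
}

# Self-contained demo with throwaway paths; a real setup would use
# archive_command = '/path/to/this.sh %p %f' and a durable directory.
ARCHIVE_DIR=/tmp/walarch_demo
mkdir -p "$ARCHIVE_DIR"
rm -f "$ARCHIVE_DIR"/000000010000000000000001*
printf 'fake WAL contents\n' > /tmp/walarch_demo_segment

archive_one /tmp/walarch_demo_segment 000000010000000000000001 \
    && echo archived                      # prints: archived
archive_one /tmp/walarch_demo_segment 000000010000000000000001 \
    || echo 'refused duplicate'           # prints: refused duplicate
```

A production version would also fsync the archived file and its directory before reporting success, which plain cp/mv does not do.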
Re: [HACKERS] UTF-8 encoding problem w/ libpq
On 06/03/2013 02:41 PM, Andrew Dunstan wrote:
> On 06/03/2013 02:28 PM, Tom Lane wrote:
>> I wonder though if we couldn't just fix this code to not do anything to
>> high-bit-set bytes in multibyte encodings.
>
> That's exactly what I suggested back in November.

This thread seems to have gone cold, so I have applied the fix I originally suggested along these lines to all live branches. At least that means we won't produce junk, but we still need to work out how to downcase multi-byte characters.

If anyone thinks there are other places in the code that need similar treatment, they are welcome to find them. I have not yet found one.

cheers

andrew
Re: [HACKERS] system catalog pg_rewrite column ev_attr document description problem
Kevin Grittner kgri...@ymail.com wrote:
> I'll leave this alone for a day. If nobody objects, I will change the
> ruleutils.c code to work with either value (to support pg_upgrade) and
> change the code to set this to zero, for 9.3 and forward only. I will
> change the 9.3 docs to mention that newly created rows will have zero,
> but that clusters upgraded via pg_upgrade may still have some rows
> containing the old value of -1.

For anyone casually following along without reading the code: it's not a bug in the sense that current code ever misbehaves, but the value which has been used for ev_attr to indicate "whole table" since at least Postgres95 version 1.01 is not consistent with other places where we set a dummy value in an attribute number to indicate "whole table". Since 2002 we have not supported column-level rules, so the column has just been filled with a constant of -1. The idea is to change the constant to zero, to make it more consistent, so as to reduce confusion and the chance of future bugs should we ever decide to use the column again.

When I went to make this change, I found over 100 lines of obsolete code related to this. Commit 95ef6a344821655ce4d0a74999ac49dd6af6d342 removed the ability to create a rule on a column effective with 7.3, but a lot of code for supporting that was left behind. It seems to me that the rewrite area is complicated enough without carrying code that's been dead for over a decade. The first patch removes the dead weight. The second patch makes the change suggested by Tom. Those carrying forward from beta1 without an initdb will still see some rows in pg_rewrite with -1, but anyone else will see zero for this column.

Does anyone see a problem with applying both of these for 9.3?

If we were a little earlier in the release cycle I would argue that if we're going to do anything with this column we should drop it. Which is exactly what I think we should do as soon as we branch.
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

*** a/src/backend/rewrite/rewriteHandler.c
--- b/src/backend/rewrite/rewriteHandler.c
*************** matchLocks(CmdType event,
*** 1274,1288 ****
  			}
  		}

! 		if (oneLock->event == event)
! 		{
! 			if (parsetree->commandType != CMD_SELECT ||
! 				(oneLock->attrno == -1 ?
! 				 rangeTableEntry_used((Node *) parsetree, varno, 0) :
! 				 attribute_used((Node *) parsetree,
! 								varno, oneLock->attrno, 0)))
! 				matching_locks = lappend(matching_locks, oneLock);
! 		}
  	}

  	return matching_locks;
--- 1274,1281 ----
  			}
  		}

! 		if (oneLock->event == event && parsetree->commandType != CMD_SELECT)
! 			matching_locks = lappend(matching_locks, oneLock);
  	}

  	return matching_locks;
***************
*** 1296,1302 ****
  static Query *
  ApplyRetrieveRule(Query *parsetree,
  				  RewriteRule *rule,
  				  int rt_index,
- 				  bool relation_level,
  				  Relation relation,
  				  List *activeRIRs,
  				  bool forUpdatePushedDown)
--- 1289,1294 ----
*************** ApplyRetrieveRule(Query *parsetree,
*** 1310,1317 ****
  		elog(ERROR, "expected just one rule action");
  	if (rule->qual != NULL)
  		elog(ERROR, "cannot handle qualified ON SELECT rule");
- 	if (!relation_level)
- 		elog(ERROR, "cannot handle per-attribute ON SELECT rule");

  	if (rt_index == parsetree->resultRelation)
  	{
--- 1302,1307 ----
*************** fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
*** 1633,1646 ****
  		if (rule->event != CMD_SELECT)
  			continue;

- 		if (rule->attrno > 0)
- 		{
- 			/* per-attr rule; do we need it? */
- 			if (!attribute_used((Node *) parsetree, rt_index,
- 								rule->attrno, 0))
- 				continue;
- 		}
-
  		locks = lappend(locks, rule);
  	}
--- 1623,1628 ----
*************** fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
*** 1665,1671 ****
  		parsetree = ApplyRetrieveRule(parsetree,
  									  rule,
  									  rt_index,
- 									  rule->attrno == -1,
  									  rel,
  									  activeRIRs,
  									  forUpdatePushedDown);
--- 1647,1652 ----
*** a/src/backend/rewrite/rewriteManip.c
--- b/src/backend/rewrite/rewriteManip.c
*************** rangeTableEntry_used(Node *node, int rt_index, int sublevels_up)
*** 858,927 ****

  /*
-  * attribute_used -
-  *	Check if a specific attribute number of a RTE is used
-  *	somewhere in the query or expression.
-  */
-
- typedef struct
- {
- 	int			rt_index;
- 	int			attno;
- 	int			sublevels_up;
- } attribute_used_context;
-
- static bool
- attribute_used_walker(Node *node,
- 					  attribute_used_context *context)
- {
- 	if (node == NULL)
- 		return false;
- 	if (IsA(node, Var))
- 	{
- 		Var		   *var = (Var *) node;
-
- 		if (var->varlevelsup == context->sublevels_up &&
- 			var->varno == context->rt_index &&
- 			var->varattno == context->attno)
- 			return true;
- 		return false;
- 	}
- 	if (IsA(node, Query))
- 	{
- 		/* Recurse into subselects */
- 		bool
[HACKERS] Proposed patch: remove hard-coded limit MAX_ALLOCATED_DESCS
Recently we had a gripe about how you can't read very many files concurrently with contrib/file_fdw:
http://www.postgresql.org/message-id/of419b5767.8a3c9adb-on85257b79.005491e9-85257b79.0054f...@isn.rtss.qc.ca

The reason for this is that file_fdw goes through the COPY code, which uses AllocateFile(), which has a wired-in assumption that not very many files need to be open concurrently. I thought for a bit about trying to get COPY to use a virtual file descriptor such as is provided by the rest of fd.c, but that didn't look too easy, and in any case probably nobody would be excited about adding additional overhead to the COPY code path. What seems more practical is to relax the hard-coded limit on the number of files concurrently open through AllocateFile().

Now, we could certainly turn that array into some open-ended list structure, but that still wouldn't let us have an arbitrary number of files open, because at some point we'd run into platform-specific EMFILE or ENFILE limits on the number of open file descriptors. In practice we want to keep it under the max_safe_fds limit that fd.c goes to a lot of trouble to determine. So it seems like the most useful compromise is to keep the allocatedDescs[] array data structure as-is, but allow its size to depend on max_safe_fds. In the attached proposed patch I limit it to max_safe_fds / 2, so that there's still a reasonable number of FDs available for fd.c's main pool of virtual FDs. On typical modern platforms that should be at least a couple of hundred, thus making the effective limit about an order of magnitude more than it is currently (32).

Barring objections or better ideas, I'd like to back-patch this as far as 9.1 where file_fdw was introduced.
			regards, tom lane

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 78c7d41ac48d95d6648baa3951563ba9e4ad3496..7de93647759ce8dc58d0a9f3c8aad9ed7f36f5e6 100644
*** a/src/backend/storage/file/fd.c
--- b/src/backend/storage/file/fd.c
***************
*** 43,50 ****
  * wrappers around fopen(3), opendir(3), popen(3) and open(2), respectively.
  * They behave like the corresponding native functions, except that the handle
  * is registered with the current subtransaction, and will be automatically
! * closed at abort.  These are intended for short operations like reading a
! * configuration file, and there is a fixed limit on the number of files that
  * can be opened using these functions at any one time.
  *
  * Finally, BasicOpenFile is just a thin wrapper around open() that can
--- 43,50 ----
  * wrappers around fopen(3), opendir(3), popen(3) and open(2), respectively.
  * They behave like the corresponding native functions, except that the handle
  * is registered with the current subtransaction, and will be automatically
! * closed at abort.  These are intended mainly for short operations like
! * reading a configuration file; there is a limit on the number of files that
  * can be opened using these functions at any one time.
  *
  * Finally, BasicOpenFile is just a thin wrapper around open() that can
*************** static uint64 temporary_files_size = 0;
*** 198,210 ****
  /*
   * List of OS handles opened with AllocateFile, AllocateDir and
   * OpenTransientFile.
-  *
-  * Since we don't want to encourage heavy use of those functions,
-  * it seems OK to put a pretty small maximum limit on the number of
-  * simultaneously allocated descs.
   */
- #define MAX_ALLOCATED_DESCS  32
-
  typedef enum
  {
  	AllocateDescFile,
--- 198,204 ----
*************** typedef enum
*** 216,232 ****
  typedef struct
  {
  	AllocateDescKind kind;
  	union
  	{
  		FILE	   *file;
  		DIR		   *dir;
  		int			fd;
  	}			desc;
- 	SubTransactionId create_subid;
  } AllocateDesc;

  static int	numAllocatedDescs = 0;
! static AllocateDesc allocatedDescs[MAX_ALLOCATED_DESCS];

  /*
   * Number of temporary files opened during the current session;
--- 210,227 ----
  typedef struct
  {
  	AllocateDescKind kind;
+ 	SubTransactionId create_subid;
  	union
  	{
  		FILE	   *file;
  		DIR		   *dir;
  		int			fd;
  	}			desc;
  } AllocateDesc;

  static int	numAllocatedDescs = 0;
! static int	maxAllocatedDescs = 0;
! static AllocateDesc *allocatedDescs = NULL;

  /*
   * Number of temporary files opened during the current session;
*************** static void FreeVfd(File file);
*** 284,289 ****
--- 279,286 ----
  static int	FileAccess(File file);
  static File OpenTemporaryFileInTablespace(Oid tblspcOid, bool rejectError);
+ static bool reserveAllocatedDesc(void);
+ static int	FreeDesc(AllocateDesc *desc);
  static void AtProcExit_Files(int code, Datum arg);
  static void CleanupTempFiles(bool isProcExit);
  static void RemovePgTempFilesInDir(const char *tmpdirname);
*************** FilePathName(File file)
*** 1491,1496 ****
--- 1488,1553 ----

  /*
+  * Create/enlarge the allocatedDescs[] array if needed and possible.
+  * Returns true if an
Re: [HACKERS] system catalog pg_rewrite column ev_attr document description problem
Kevin Grittner kgri...@ymail.com writes:
> If we were a little earlier in the release cycle I would argue that if
> we're going to do anything with this column we should drop it. Which is
> exactly what I think we should do as soon as we branch.

If we're going to do that, there doesn't seem to me to be a lot of point in adjusting the column's contents in 9.3 --- the argument for doing that was mostly forward compatibility with some future actual usage of the column. I'd be more inclined to leave HEAD as-is and then do the column-ectomy along with this removal of dead code in 9.4.

			regards, tom lane
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 06/07/2013 12:14 PM, Josh Berkus wrote:
> Right now, what we're telling users is "You can have continuous backup
> with Postgres, but you'd better hire an expensive consultant to set it
> up for you, or use this external tool of dubious provenance which
> there's no packages for, or you might accidentally cause your database
> to shut down in the middle of the night."

This is an outright falsehood. We are telling them, "You'd better know what you are doing" or "You should call a consultant." This is no different than, "You'd better know what you are doing" or "You should take driving lessons."

> At which point most sensible users say "no thanks, I'll use something
> else."

Josh, I have always admired your flair for dramatics; it almost rivals mine. Users are free to use what they want, and some will choose lesser databases. I am OK with that, because eventually, if PostgreSQL is the right tool, they will come back to us and to PgExperts or CMD or OmniTI, or they will know what they are doing and thus don't need us.

JD
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 06/08/2013 07:36 AM, MauMau wrote:
> 1. If the machine or postgres crashes while archive_command is copying a
> WAL file, later archive recovery fails. This is because cp leaves a file
> of less than 16MB in the archive area, and postgres refuses to start
> when it finds such a small archived WAL file. The solution, which IIRC
> Tomas-san told me here, is to do something like:
> cp %p /archive/dir/%f.tmp && mv /archive/dir/%f.tmp /archive/dir/%f

Well, it seems to me that one of the problems here is that we tell people to use cp. We should be telling people to use a command (or supply a command) that is smarter than that.

> 3. You cannot know the reason for an archive_command failure (e.g.
> archive area full) if you don't use PostgreSQL's server logging, because
> archive_command failures are not logged to syslog/eventlog.

Wait, what? Is this true (someone else?)

JD
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 06/06/2013 07:52 AM, Heikki Linnakangas wrote:
> I think it can be made fairly robust otherwise, and the performance
> impact should be pretty easy to measure with e.g. pgbench.

Once upon a time, in a land far, far away, we expected users to manage their own systems. We had things like soft and hard quotas on disks, and lastlog to find out who was logging into the system. Alas, as far as I know soft and hard quotas are kind of a thing of the past, but that doesn't mean that their usefulness has ended.

The idea that we PANIC is not just awful, it is stupid. I don't think anyone is going to disagree with that. However, there is a question of what to do instead. I think the idea of sprinkling checks into the higher-level code before specific operations is not invalid, but I also don't think it is necessary.

To me, a more pragmatic approach makes sense. Obviously having some kind of code that checks the space makes sense, but I don't know that it needs to be around any operation other than "we are creating a segment". What do we care why the segment is being created? If we don't have enough room to create the segment, the transaction rolls back with some OBVIOUS, not OBTUSE, error. Obviously this could cause a ton of transactions to roll back, but I think keeping the database consistent and rolling back a transaction in case of error is exactly what we are supposed to do.

Sincerely,

Joshua D. Drake
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 2013-06-08 11:15:40 -0700, Joshua D. Drake wrote:
> To me, a more pragmatic approach makes sense. Obviously having some kind
> of code that checks the space makes sense but I don't know that it needs
> to be around any operation other than we are creating a segment. What do
> we care why the segment is being created? If we don't have enough room
> to create the segment, the transaction rolls back with some OBVIOUS not
> OBTUSE error. Obviously this could cause a ton of transactions to roll
> back but I think keeping the database consistent and rolling back a
> transaction in case of error is exactly what we are supposed to do.

You know, the PANIC isn't there just because we like to piss off users. There are actual technical reasons that don't just go away by judging the PANIC as stupid. At the points where the XLogInsert()s happen, we're in critical sections out of which we *cannot* ERROR out, because we may already have made modifications that cannot be allowed to be performed partially/unlogged. That's why we throw a PANIC, which forces a cluster-wide restart, including *not* writing any further buffers from s_b out.

Greetings,

Andres Freund

--
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 2013-06-07 12:02:57 +0300, Heikki Linnakangas wrote:
> On 07.06.2013 00:38, Andres Freund wrote:
>> On 2013-06-06 23:28:19 +0200, Christian Ullrich wrote:
>>> * Heikki Linnakangas wrote:
>>>> The current situation is that if you run out of disk space while
>>>> writing WAL, you get a PANIC, and the server shuts down. That's
>>>> awful. [...] So we need to somehow stop new WAL insertions from
>>>> happening, before it's too late. A naive idea is to check if there's
>>>> enough preallocated WAL space, just before inserting the WAL record.
>>>> However, it's too late to check that in [...]
>>>
>>> There is a database engine, Microsoft's Jet Blue aka the Extensible
>>> Storage Engine, that just keeps some preallocated log files around,
>>> specifically so it can get consistent and halt cleanly if it runs out
>>> of disk space. In other words, the idea is not to check over and over
>>> again that there is enough already-reserved WAL space, but to make
>>> sure there always is, by having a preallocated segment that is never
>>> used outside a disk space emergency.
>>
>> That's not a bad technique. I wonder how reliable it would be in
>> postgres.
>
> That's no different from just having a bit more WAL space in the first
> place. We need a mechanism to stop backends from writing WAL before you
> run out of it completely. It doesn't matter if the reservation is done
> by stashing away a WAL segment for emergency use, or by a variable in
> shared memory. Either way, backends need to stop using it up, by
> blocking or throwing an error before they enter the critical section.

Well, if you have 16 or 32MB of reserved WAL space available, you don't need to judge all that precisely how much space is available. So we can just sprinkle some EnsureXLogHasSpace() onto XLogInsert() callsites like heap_insert(), but we can do that outside of the critical sections, and we can do it without locks, since quite some write activity needs to happen before the reserved space is overrun.

Anything that desperately needs to write stuff, like the end-of-recovery checkpoint, can just not call EnsureXLogHasSpace() and rely on the reserved space.

Seems like 90% of the solution for 30% of the complexity or so.

Greetings,

Andres Freund

--
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] system catalog pg_rewrite column ev_attr document description problem
Tom Lane t...@sss.pgh.pa.us wrote:
> Kevin Grittner kgri...@ymail.com writes:
>> If we were a little earlier in the release cycle I would argue that if
>> we're going to do anything with this column we should drop it. Which is
>> exactly what I think we should do as soon as we branch.
>
> If we're going to do that, there doesn't seem to me to be a lot of point
> in adjusting the column's contents in 9.3 --- the argument for doing
> that was mostly forward compatibility with some future actual usage of
> the column. I'd be more inclined to leave HEAD as-is and then do the
> column-ectomy along with this removal of dead code in 9.4.

ok

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Bad error message on valuntil
On 06/07/2013 12:31 PM, Tom Lane wrote:
> Joshua D. Drake j...@commandprompt.com writes:
>> On 06/07/2013 11:57 AM, Tom Lane wrote:
>>> I think it's intentional that we don't tell the *client* that level of
>>> detail.
>>
>> Why? That seems rather silly.
>
> The general policy on authentication failure reports is that we don't
> tell the client anything it doesn't know already about what the auth
> method is. We can log additional info into the postmaster log if it [...]

I was looking at the code and I saw this catchall:

		default:
			errstr = gettext_noop("authentication failed for user \"%s\": invalid authentication method");
			break;

I think we could make the argument that if valuntil has expired, then the authentication method is invalid. Thoughts? Else I am trying to come up with some decent wording... something like: "Authentication failed: not all authentication tokens were met"?
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On Fri, Jun 7, 2013 at 12:14 PM, Josh Berkus j...@agliodbs.com wrote:
>> The archive command can be made a shell script (or for that matter a
>> compiled program) which can do anything it wants upon failure,
>> including emailing people.
>
> You're talking about using external tools -- frequently hackish,
> workaround ones -- to handle something which PostgreSQL should be doing
> itself, and which only the database engine has full knowledge of.

I think the database engine is about the last thing which would have full knowledge of the best way to contact the DBA, especially during events which by definition mean things are already going badly. I certainly don't see having core code which knows how to talk to every PBX, SMS, email system, or Twitter feed that anyone might wish to use for logging.

PostgreSQL already supports two formats of text logs, plus syslog and eventlog. Is there some additional log management tool that we could support which is widely used, doesn't require an expensive consultant to set up and configure correctly (or even to decide what "correctly" means for the given situation), and which solves 80% of the problems?

It would be nice to have the ability to specify multiple log destinations with different log_min_messages for each one. I'm sure syslog already must implement some kind of method for doing that, but I've been happy enough with the text logs that I've never bothered to look into it much.

> While that's the only solution we have for now, it's hardly a worthy
> design goal. Right now, what we're telling users is "You can have
> continuous backup with Postgres, but you'd better hire an expensive
> consultant to set it up for you, or use this external tool of dubious
> provenance which there's no packages for, or you might accidentally
> cause your database to shut down in the middle of the night." At which
> point most sensible users say "no thanks, I'll use something else."

What does the "something else" do? Hopefully it is not to silently invalidate your backups.

Cheers,

Jeff
Re: [HACKERS] Proposed patch: remove hard-coded limit MAX_ALLOCATED_DESCS
On Saturday, June 8, 2013 at 19:06:58, Tom Lane wrote:
> Recently we had a gripe about how you can't read very many files
> concurrently with contrib/file_fdw:
> http://www.postgresql.org/message-id/OF419B5767.8A3C9ADB-ON85257B79.005491E9-85257b79.0054f...@isn.rtss.qc.ca
>
> The reason for this is that file_fdw goes through the COPY code, which
> uses AllocateFile(), which has a wired-in assumption that not very many
> files need to be open concurrently. I thought for a bit about trying to
> get COPY to use a virtual file descriptor such as is provided by the
> rest of fd.c, but that didn't look too easy, and in any case probably
> nobody would be excited about adding additional overhead to the COPY
> code path. What seems more practical is to relax the hard-coded limit on
> the number of files concurrently open through AllocateFile().
>
> Now, we could certainly turn that array into some open-ended list
> structure, but that still wouldn't let us have an arbitrary number of
> files open, because at some point we'd run into platform-specific EMFILE
> or ENFILE limits on the number of open file descriptors. In practice we
> want to keep it under the max_safe_fds limit that fd.c goes to a lot of
> trouble to determine. So it seems like the most useful compromise is to
> keep the allocatedDescs[] array data structure as-is, but allow its size
> to depend on max_safe_fds. In the attached proposed patch I limit it to
> max_safe_fds / 2, so that there's still a reasonable number of FDs
> available for fd.c's main pool of virtual FDs. On typical modern
> platforms that should be at least a couple of hundred, thus making the
> effective limit about an order of magnitude more than it is currently
> (32).
>
> Barring objections or better ideas, I'd like to back-patch this as far
> as 9.1 where file_fdw was introduced.

I just wonder about this statement that you removed:

	* Since we don't want to encourage heavy use of those functions,
	* it seems OK to put a pretty small maximum limit on the number of
	* simultaneously allocated descs.

Is it now encouraged to use those functions, or at least does it seem less 'scary' than in the past?

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Re: [HACKERS] Proposed patch: remove hard-coded limit MAX_ALLOCATED_DESCS
Cédric Villemain ced...@2ndquadrant.com writes:
> I just wonder about this statement that you removed:
> 	* Since we don't want to encourage heavy use of those functions,
> 	* it seems OK to put a pretty small maximum limit on the number of
> 	* simultaneously allocated descs.
> Is it now encouraged to use those functions, or at least does it seem
> less 'scary' than in the past?

Well, we could put back some weaker form of the statement, but I wasn't sure how to word it.

			regards, tom lane
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On Sat, Jun 8, 2013 at 11:15 AM, Joshua D. Drake j...@commandprompt.com wrote:
> On 06/06/2013 07:52 AM, Heikki Linnakangas wrote:
>> I think it can be made fairly robust otherwise, and the performance
>> impact should be pretty easy to measure with e.g. pgbench.
>
> Once upon a time in a land far, far away, we expected users to manage
> their own systems. We had things like soft and hard quotas on disks and
> lastlog to find out who was logging into the system. Alas, as far as I
> know soft and hard quotas are kind of a thing of the past but that
> doesn't mean that their usefulness has ended.
>
> The idea that we PANIC is not just awful, it is stupid. I don't think
> anyone is going to disagree with that. However, there is a question of
> what to do instead. I think the idea of sprinkling checks into the
> higher level code before specific operations is not invalid but I also
> don't think it is necessary.

Given that the system is going to become unusable, I don't see why PANIC is an awful, stupid way of doing it. And if it can only be used for things that don't generate WAL, that is pretty much unusable, as even read-only transactions often need to do clean-up tasks that generate WAL.

Cheers,

Jeff
Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)
On 6/1/13 5:00 AM, Fabien COELHO wrote: Question 1: should it report the maximum lag encountered? I haven't found the lag measurement to be very useful yet, outside of debugging the feature itself. Accordingly I don't see a reason to add even more statistics about the number outside of testing the code. I'm seeing some weird lag problems that this will be useful for though right now, more on that a few places below. Question 2: the next step would be to have the current lag shown under option --progress, but that would mean having a combined --throttle --progress patch submission, or maybe dependencies between patches. This is getting too far ahead. Let's get the throttle part nailed down before introducing even more moving parts into this. I've attached an updated patch that changes a few things around already. I'm not done with this yet and it needs some more review before commit, but it's not too far away from being ready. This feature works quite well. On a system that will run at 25K TPS without any limit, I did a run with 25 clients and a rate of 400/second, aiming at 10,000 TPS, and that's what I got: number of clients: 25 number of threads: 1 duration: 60 s number of transactions actually processed: 599620 average transaction lag: 0.307 ms tps = 9954.779317 (including connections establishing) tps = 9964.947522 (excluding connections establishing) I never thought of implementing the throttle like this before, but it seems to work out well so far. Check out tps.png to see the smoothness of the TPS curve (the graphs came out of pgbench-tools). There's a little more play outside of the target than ideal for this case. Maybe it's worth tightening the Poisson curve a bit around its center? The main implementation issue I haven't looked into yet is why things can get weird at the end of the run. See the latency.png graph attached and you can see what I mean. I didn't like the naming on this option or all of the ways you could specify the delay. 
None of those really added anything, since you can get every other behavior by specifying a non-integer TPS. And using the word throttle inside the code is fine, but I didn't like exposing that implementation detail more than it had to be. What I did instead was think of this as a transaction rate target, which makes the help a whole lot simpler: -R SPEC, --rate SPEC target rate per client in transactions per second Made the documentation easier to write too. I'm not quite done with that yet, the docs wording in this updated patch could still be better. I personally would like this better if --rate specified a *total* rate across all clients. However, there are examples of both types of settings in the program already, so there's no one precedent for which is right here. -t is per-client and now -R is too; I'd prefer it to be like -T instead. It's not that important though, and the code is cleaner as it's written right now. Maybe this is better; I'm not sure. I did some basic error handling checks on this and they seemed good, the program rejects target rates of <= 0. On the topic of this weird latency spike issue, I did see that show up in some of the results too. Here's one where I tried to specify a rate higher than the system can actually handle, 8 TPS total on a SELECT-only test $ pgbench -S -T 30 -c 8 -j 4 -R1tps pgbench starting vacuum...end. transaction type: SELECT only scaling factor: 100 query mode: simple number of clients: 8 number of threads: 4 duration: 30 s number of transactions actually processed: 761779 average transaction lag: 10298.380 ms tps = 25392.312544 (including connections establishing) tps = 25397.294583 (excluding connections establishing) It was actually limited by the capabilities of the hardware, 25K TPS. 10298 ms of lag per transaction can't be right though. 
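The way the patch spaces transactions can be modeled outside pgbench: draw each inter-transaction delay from an exponential distribution with mean 1/rate, so arrivals form a Poisson process whose long-run throughput converges on the target. This is a simplified sketch of the scheduling idea, not the patch's actual C code:

```python
import math
import random

def poisson_delays(rate_per_client, n, seed=42):
    """Draw n inter-transaction delays (seconds) from an exponential
    distribution with mean 1/rate; the resulting arrivals form a
    Poisson process whose long-run rate matches the target."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], so log() never sees zero
    return [-math.log(1.0 - rng.random()) / rate_per_client
            for _ in range(n)]

delays = poisson_delays(400.0, 100000)
achieved_tps = len(delays) / sum(delays)  # converges on 400
```

With clients throttled independently this way, per-client rates simply add, which is how 25 clients at a 400/second rate land near the 10,000 TPS measured above.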
Some general patch submission suggestions for you as a new contributor: -When re-submitting something with improvements, it's a good idea to add a version number to the patch so reviewers can tell them apart easily. But there is no reason to change the subject line of the e-mail each time. I followed that standard here. If you updated this again I would name the file pgbench-throttle-v9.patch but keep the same e-mail subject. -There were some extra carriage return characters in your last submission. Wasn't a problem this time, but if you can get rid of those that makes for a better patch. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com attachment: tps.png attachment: latency.png diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index 8c202bf..799dfcd 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -137,6 +137,12 @@ int unlogged_tables = 0; double sample_rate = 0.0; /* + * When clients are
Re: [HACKERS] Proposed patch: remove hard-coded limit MAX_ALLOCATED_DESCS
I just wonder about this statement that you removed: * Since we don't want to encourage heavy use of those functions, * it seems OK to put a pretty small maximum limit on the number of * simultaneously allocated descs. Is it now encouraged to use those functions, or at least that it seems less 'scary' than in the past ? Well, we could put back some weaker form of the statement, but I wasn't sure how to word it. I'm not sure of the 'expected' problems... The only thing I can think of at the moment is the time spent to close file descriptors on abort/commit. I'm not sure of the expected value of max_safe_fds. Your patch now initializes with 5 slots instead of 10; if max_safe_fds is large, maybe it is better to double the size each time we need it instead of jumping directly to the largest size ? -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
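The doubling idea can be sketched with hypothetical numbers (5 initial slots, a cap standing in for the max_safe_fds-derived limit): growing geometrically costs only O(log n) reallocations before reaching the cap, so there is little penalty for starting small.

```python
def growth_steps(start, needed, cap):
    """Array sizes a doubling strategy passes through before it can
    hold `needed` entries, never growing past `cap`."""
    sizes = [start]
    while sizes[-1] < min(needed, cap):
        sizes.append(min(sizes[-1] * 2, cap))
    return sizes

# Reaching 500 slots from 5 takes only 7 reallocations.
steps = growth_steps(5, 500, 500)
```

This is purely an illustration of the trade-off being discussed, not code from the patch.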
Re: [HACKERS] Redesigning checkpoint_segments
On 6/7/13 2:43 PM, Robert Haas wrote: name. What I would like to see is a single number here in memory units that replaces both checkpoint_segments and wal_keep_segments. This isn't really making sense to me. I don't think we should assume that someone who wants to keep WAL around for replication also wants to wait longer between checkpoints. Those are two quite different things. It's been years since I saw anyone actually using checkpoint_segments as that sort of limit. I see a lot of sites pushing the segments limit up and then using checkpoint_timeout carefully. It's pretty natural to say I don't want to go more than X minutes between checkpoints. The case for wanting to say I don't want to go more than X MB between checkpoints instead, motivated by not wanting too much activity to queue between them, I'm just not seeing demand for that now. The main reason I do see people paying attention to checkpoint_segments still is to try and put a useful bound on WAL disk space usage. That's the use case I think overlaps with wal_keep_segments such that you might replace both of them. I think we really only need one control that limits how much WAL space is expected inside of pg_xlog, and it should be easy and obvious how to set it. The more I look at this checkpoint_segments patch, the more I wonder why it's worth even bothering with anything but a disk space control here. checkpoint_segments is turning into an internal implementation detail most sites I see wouldn't miss at all. Rather than put work into autotuning it, I'd be happy to eliminate checkpoint_segments altogether, in favor of a WAL disk space limit. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
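For reference, the disk-space bound that checkpoint_segments only provides indirectly is the documentation's formula, which is exactly the kind of arithmetic a single space-based setting would replace (a sketch assuming 16MB segments):

```python
def xlog_space_bound_mb(checkpoint_segments, completion_target,
                        wal_keep_segments=0, segment_mb=16):
    """Approximate upper bound on pg_xlog size, per the docs' rule:
    the larger of (2 + checkpoint_completion_target) *
    checkpoint_segments + 1 segments and checkpoint_segments +
    wal_keep_segments + 1 segments."""
    by_checkpoints = (2 + completion_target) * checkpoint_segments + 1
    by_replication = checkpoint_segments + wal_keep_segments + 1
    return max(by_checkpoints, by_replication) * segment_mb

default_bound = xlog_space_bound_mb(3, 0.5)       # stock settings
replica_bound = xlog_space_bound_mb(3, 0.5, 32)   # with wal_keep_segments
```

Working backwards from a space budget to these two settings is the awkward part a direct WAL-space limit would remove.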
Re: [HACKERS] Cost limited statements RFC
On Thu, Jun 6, 2013 at 2:27 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-06-06 12:34:01 -0700, Jeff Janes wrote: On Fri, May 24, 2013 at 11:51 AM, Greg Smith g...@2ndquadrant.com wrote: On 5/24/13 9:21 AM, Robert Haas wrote: But I wonder if we wouldn't be better off coming up with a little more user-friendly API. Instead of exposing a cost delay, a cost limit, and various charges, perhaps we should just provide limits measured in KB/s, like dirty_rate_limit = amount of data you can dirty per second, in kB and read_rate_limit = amount of data you can read into shared buffers per second, in kB. I already made and lost the argument for doing vacuum in KB/s units, so I wasn't planning on putting that in the way of this one. I think the problem is that making that change would force people to relearn something that was already long established, and it was far from clear that the improvement, though real, was big enough to justify forcing people to do that. I don't find that argument very convincing. Since you basically can translate the current variables into something like the above variables with some squinting we sure could have come up with some way to keep the old definition and automatically set the new GUCs and the other way round. That may be, but it was not what the patch that was submitted did. And I don't think the author or the reviewers were eager to put in the effort to make that change, which would surely be quite a bit more work than the original patch was in the first place. Also, I'm not sure that such a complexity would even be welcomed. It sounds like an ongoing maintenance cost, and I'm sure the word baroque would get thrown around. Anyway, I don't think that resistance to making user visible changes to old features should inhibit us from incorporating lessons from them into new features. guc.c should even have enough information to prohibit setting both in the config file... Is there precedence/infrastructure for things like that? 
I could see uses for mutually exclusive complexes of configuration variables, but I wouldn't even know where to start in implementing such. Cheers, Jeff
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 06/08/2013 11:27 AM, Andres Freund wrote: On 2013-06-08 11:15:40 -0700, Joshua D. Drake wrote: To me, a more pragmatic approach makes sense. Obviously having some kind of code that checks the space makes sense but I don't know that it needs to be around any operation other than when we are creating a segment. What do we care why the segment is being created? If we don't have enough room to create the segment, the transaction rolls back with some OBVIOUS not OBTUSE error. Obviously this could cause a ton of transactions to roll back but I think keeping the database consistent and rolling back a transaction in case of error is exactly what we are supposed to do. You know, the PANIC isn't there just because we like to piss off users. There's actual technical reasons that don't just go away by judging the PANIC as stupid. Yes I know we aren't trying to piss off users. What I am saying is that it is stupid to the user that it PANICS. I apologize if that came out wrong. At the points where the XLogInsert()s happen we're in critical sections out of which we *cannot* ERROR out because we already may have made modifications that cannot be allowed to be performed partially/unlogged. That's why we're throwing a PANIC which will force a cluster wide restart including *NOT* writing any further buffers from s_b out. Does this preclude (sorry I don't know this part of the code very well) my suggestion of on log create? JD Greetings, Andres Freund
[HACKERS] small patch to crypt.c
Hello, In my quest to understand how all the logging etc works with authentication I came across the area of crypt.c that checks for valid_until but it seems like it has an extraneous check. If I am wrong I apologize for the noise but wouldn't mind an explanation. index f01d904..8d809b2 100644 --- a/src/backend/libpq/crypt.c +++ b/src/backend/libpq/crypt.c @@ -145,9 +145,7 @@ md5_crypt_verify(const Port *port, const char *role, char *client_pass) /* * Password OK, now check to be sure we are not past rolvaliduntil */ - if (isnull) - retval = STATUS_OK; - else if (vuntil < GetCurrentTimestamp()) + if (vuntil < GetCurrentTimestamp()) retval = STATUS_ERROR; else retval = STATUS_OK;
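If vuntil is only assigned when the rolvaliduntil column is non-null — which is how the surrounding code reads — then the isnull branch isn't extraneous: without it the comparison would read an uninitialized timestamp. A Python model of the intended logic (the names mirror the C, but this is an illustration, not the actual code path):

```python
# STATUS_* values as defined in PostgreSQL's c.h
STATUS_OK, STATUS_ERROR = 0, -1

def check_valid_until(vuntil, isnull, now):
    """A NULL rolvaliduntil means 'never expires'; it must be tested
    first, because vuntil holds no meaningful value in that case."""
    if isnull:
        return STATUS_OK
    if vuntil < now:
        return STATUS_ERROR
    return STATUS_OK

# Timestamps modeled as plain integers for the sketch.
now = 100
never_expires = check_valid_until(None, True, now)   # vuntil never read
expired = check_valid_until(50, False, now)
still_valid = check_valid_until(200, False, now)
```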
Re: [HACKERS] Optimising Foreign Key checks
On 8 June 2013 15:30, Noah Misch n...@leadboat.com wrote: On Tue, Jun 04, 2013 at 02:45:17PM +0100, Simon Riggs wrote: On Sun, Jun 02, 2013 at 10:45:21AM +0100, Simon Riggs wrote: For clarity the 4 problems are 1. SQL execution overhead 2. Memory usage 3. Memory scrolling 4. Locking overhead, specifically FPWs and WAL records from FK checks probably in that order or thereabouts. Let's rethink things to put a few options on the table and see what we get... 2. Don't store FK events in the after trigger queue at all, but apply them as we go. That solves problems 2 and 3. That could be considered to be in violation of the SQL standard, which requires us to apply the checks at the end of the statement. We already violate the standard with regard to uniqueness checks, so doing it here doesn't seem unacceptable. I wouldn't like to see that compliance bug propagate to other constraint types. What clauses in the standard demand end-of-statement timing, anyway? What if we followed the example of deferred UNIQUE: attempt FK checks as we go and enqueue an after-trigger recheck when such an initial test fails? The copy I have access to (2008 draft), 4.17.2 Checking of constraints The checking of a constraint depends on its constraint mode within the current SQL-transaction. Whenever an SQL-statement is executed, every constraint whose mode is immediate is checked, at a certain point after any changes to SQL-data and schemas resulting from that execution have been effected, to see if it is satisfied. A constraint is satisfied if and only if the applicable search condition included in its descriptor evaluates to True or Unknown. If any constraint is not satisfied, then any changes to SQL-data or schemas resulting from executing that statement are canceled. (See the General Rules of Subclause 13.5, “SQL procedure statement”.) NOTE 31 — This includes SQL-statements that are executed as a direct result or an indirect result of executing a different SQL-statement. 
It also includes statements whose effects explicitly or implicitly include setting the constraint mode to immediate. I can't see anything there that stops me applying locks as we go, but I feel like someone will object... This point seems crucial to the particular approach we take, so I need wider input. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Cost limited statements RFC
On Thu, Jun 6, 2013 at 1:02 PM, Robert Haas robertmh...@gmail.com wrote: If we can see our way clear to ripping out the autovacuum costing stuff and replacing them with a read rate limit and a dirty rate limit, I'd be in favor of that. The current system limits the linear combination of those with user-specified coefficients, which is more powerful but less intuitive. If we need that, we'll have to keep it the way it is, but I'm hoping we don't. I don't know what two independent settings would look like. Say you keep two independent counters, where each can trigger a sleep, and the triggering of that sleep clears only its own counter. Now you still have a limit on the linear combination, it is just that summation has moved to a different location. You have two independent streams of sleeps, but they add up to the same amount of sleeping as a single stream based on a summed counter. Or if one sleep clears both counters (the one that triggered it and the other one), I don't think that that is what I would call independent either. Or at least not if it has no memory. The intuitive meaning of independent would require that it keep track of which of the two counters was controlling over the last few seconds. Am I overthinking this? Also, in all the anecdotes I've been hearing about autovacuum causing problems from too much IO, in which people can identify the specific problem, it has always been the write pressure, not the read, that caused the problem. Should the default be to have the read limit be inactive and rely on the dirty-limit to do the throttling? Cheers, Jeff
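The point that two counters which each clear only themselves still amount to a bound on a linear combination can be checked with a toy simulation (the costs and limits here are made up for illustration):

```python
def sleeps_single(events, limit):
    """One counter over the summed cost; sleep and clear on overflow."""
    acc = sleeps = 0
    for read_cost, dirty_cost in events:
        acc += read_cost + dirty_cost
        if acc >= limit:
            sleeps += 1
            acc = 0
    return sleeps

def sleeps_independent(events, read_limit, dirty_limit):
    """Two counters, each triggering a sleep and clearing only itself."""
    r = d = sleeps = 0
    for read_cost, dirty_cost in events:
        r += read_cost
        d += dirty_cost
        if r >= read_limit:
            sleeps += 1
            r = 0
        if d >= dirty_limit:
            sleeps += 1
            d = 0
    return sleeps

# A steady stream of pages, each costing 10 to read and 20 to dirty.
events = [(10, 20)] * 1000
single = sleeps_single(events, 200)
independent = sleeps_independent(events, 200, 200)
```

The totals come out nearly equal: the summation just moves from inside the counter to across the two streams of sleeps.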
[HACKERS] ALTER TABLE ... ALTER CONSTRAINT
While fiddling with FK tuning, it was useful to be able to enable and disable the DEFERRED mode of constraints. That is not currently possible in SQL, so I wrote this patch. Without this you have to drop and then re-add a constraint, which is impractical for large tables. e.g. CREATE TABLE fktable (id integer, fk integer REFERENCES pktable (id)); ALTER TABLE fktable ALTER CONSTRAINT fktable_fk_fkey DEFERRABLE INITIALLY IMMEDIATE; Includes docs and tests. Currently works for FKs only. Potentially other constraints can be supported in future. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services alter_table_alter_constraint.v1.sql Description: Binary data
Re: [HACKERS] Cost limited statements RFC
On 6/8/13 4:43 PM, Jeff Janes wrote: Also, in all the anecdotes I've been hearing about autovacuum causing problems from too much IO, in which people can identify the specific problem, it has always been the write pressure, not the read, that caused the problem. Should the default be to have the read limit be inactive and rely on the dirty-limit to do the throttling? That would be bad, I have to carefully constrain both of them on systems that are short on I/O throughput. There are all sorts of cases where cleanup of a large and badly cached relation will hit the read limit right now. I suspect the reason we don't see as many complaints is that a lot more systems can handle 7.8MB/s of random reads than there are ones that can do 3.9MB/s of random writes. If we removed that read limit, a lot more complaints would start rolling in about the read side. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
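For context, the 7.8MB/s and 3.9MB/s figures fall straight out of the default cost model (vacuum_cost_limit = 200, a 20ms autovacuum cost delay, vacuum_cost_page_miss = 10, vacuum_cost_page_dirty = 20, 8KB blocks):

```python
def cost_model_rate_mib(cost_limit, cost_delay_ms, page_cost,
                        block_size=8192):
    """I/O ceiling implied by the vacuum cost model: once cost_limit
    is accumulated the worker sleeps for cost_delay_ms, so it handles
    cost_limit / page_cost pages per sleep round."""
    pages_per_round = cost_limit / page_cost
    rounds_per_second = 1000.0 / cost_delay_ms
    bytes_per_second = pages_per_round * rounds_per_second * block_size
    return bytes_per_second / (1024 * 1024)

read_limit_mib = cost_model_rate_mib(200, 20, 10)    # pages read from disk
dirty_limit_mib = cost_model_rate_mib(200, 20, 20)   # pages dirtied
```

Worst case both limits assume every page is a miss or a dirty, which is why the numbers quoted in this thread describe random I/O.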
[HACKERS] Batch API for After Triggers
While fiddling with FK tuning, Noah suggested batching trigger executions together to avoid execution overhead. It turns out there is no easy way to write triggers that can take advantage of the knowledge that they are being executed as a set of trigger executions. Some API is required to allow a trigger to understand that there may be other related trigger executions in the very near future, so it can attempt to amortise call overhead across many invocations (batching). The attached patch adds two fields to the TriggerDesc trigger functions are handed, allowing them to inspect (if they choose) the additional fields and thus potentially use some form of batching. This is backwards compatible with the earlier trigger API. The two fields are: int tg_tot_num_events; int tg_event_num. So your trigger can work out it is e.g. number 3 of 56 invocations in the current set of after triggers. Going back to Noah's example, this would allow you to collect all 56 values and then execute a single statement with an array of 56 values in it. Knowing there are 56 means you can wait until the 56th invocation before executing the batched statement, without risking skipping some checks because you've only got half a batch left. If you don't do this, then you'd need to introduce the concept of a final function similar to the way aggregates work. But that seems much too complex to be real world useful. This seemed a generally useful approach for any after trigger author, not just for RI. Comments please. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services batch_api_after_triggers.v1.patch Description: Binary data
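A trigger written against the proposed API would buffer per-row values and flush on the final invocation. A hypothetical sketch in Python rather than C — everything here is made up for illustration apart from the two proposed field names:

```python
class BatchingTrigger:
    """Buffers per-row values and runs one batched check when
    tg_event_num reaches tg_tot_num_events, amortising per-call
    overhead across the whole after-trigger set."""
    def __init__(self, run_batched_check):
        self.pending = []
        self.run_batched_check = run_batched_check

    def fire(self, value, tg_event_num, tg_tot_num_events):
        self.pending.append(value)
        if tg_event_num == tg_tot_num_events:
            self.run_batched_check(list(self.pending))
            self.pending.clear()

# The 56-invocation example: one batched statement instead of 56 calls.
batches = []
trig = BatchingTrigger(batches.append)
for i in range(1, 57):
    trig.fire(i, i, 56)
```

The key property is the one noted above: because the trigger knows the set size up front, nothing is left half-checked when the statement ends.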
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 7 June 2013 10:02, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 07.06.2013 00:38, Andres Freund wrote: On 2013-06-06 23:28:19 +0200, Christian Ullrich wrote: * Heikki Linnakangas wrote: The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the server shuts down. That's awful. We can So we need to somehow stop new WAL insertions from happening, before it's too late. A naive idea is to check if there's enough preallocated WAL space, just before inserting the WAL record. However, it's too late to check that in There is a database engine, Microsoft's Jet Blue aka the Extensible Storage Engine, that just keeps some preallocated log files around, specifically so it can get consistent and halt cleanly if it runs out of disk space. In other words, the idea is not to check over and over again that there is enough already-reserved WAL space, but to make sure there always is by having a preallocated segment that is never used outside a disk space emergency. That's not a bad technique. I wonder how reliable it would be in postgres. That's no different from just having a bit more WAL space in the first place. We need a mechanism to stop backends from writing WAL, before you run out of it completely. It doesn't matter if the reservation is done by stashing away a WAL segment for emergency use, or by a variable in shared memory. Either way, backends need to stop using it up, by blocking or throwing an error before they enter the critical section. I guess you could use the stashed away segment to ensure that you can recover after PANIC. At recovery, there are no other backends that could use up the emergency segment. But that's not much better than what we have now. Christian's idea seems good to me. Looks like you could be dismissing this too early, especially since no better idea has emerged. I doubt that we're going to think of something others didn't already face. 
It's pretty clear that most other DBMS do this with their logs. For your fast wal insert patch to work well, we need a simple and fast technique to detect out of space. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
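The reserved-segment trick being discussed can be sketched as a generic pattern (the file name is hypothetical, and a real implementation would also fsync and hold the file open):

```python
import os
import tempfile

SEGMENT_SIZE = 16 * 1024 * 1024   # one WAL segment

def reserve_emergency_space(dirpath, size=SEGMENT_SIZE):
    """Preallocate a file that normal operation never touches. Real
    zeros are written so the blocks are actually claimed; truncate
    alone could leave a sparse file that reserves nothing."""
    path = os.path.join(dirpath, "emergency.reserved")
    with open(path, "wb") as f:
        f.write(b"\0" * size)
    return path

def release_emergency_space(path):
    """On ENOSPC, dropping the reservation frees a segment's worth of
    space -- enough WAL headroom to reach a clean halt."""
    os.remove(path)

# Demo with a small reservation in a scratch directory.
demo_dir = tempfile.mkdtemp()
reserved = reserve_emergency_space(demo_dir, size=4096)
```

As noted above, the reservation alone doesn't solve the problem of stopping backends before their critical sections; it only guarantees there is room left for an orderly shutdown once they are stopped.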
Re: [HACKERS] Cost limited statements RFC
On Sat, Jun 8, 2013 at 1:57 PM, Greg Smith g...@2ndquadrant.com wrote: On 6/8/13 4:43 PM, Jeff Janes wrote: Also, in all the anecdotes I've been hearing about autovacuum causing problems from too much IO, in which people can identify the specific problem, it has always been the write pressure, not the read, that caused the problem. Should the default be to have the read limit be inactive and rely on the dirty-limit to do the throttling? That would be bad, I have to carefully constrain both of them on systems that are short on I/O throughput. There are all sorts of cases where cleanup of a large and badly cached relation will hit the read limit right now. I wouldn't remove the ability, just change the default. You can still tune your exquisitely balanced systems :) Of course if the default were to be changed, who knows what complaints we would start getting, which we don't get now because the current default prevents them. But my gut feeling is that if autovacuum is trying to read faster than the hardware will support, it will just automatically get throttled, by inherent IO waits, at a level which can be comfortably supported. And this will cause minimal interference with other processes. It is self-limiting. If it tries to write too much, however, the IO system is reduced to a quivering heap, not just for that process, but for all others as well. I suspect the reason we don't see as many complaints is that a lot more systems can handle 7.8MB/s of random reads than there are ones that can do 3.9MB/s of random writes. If we removed that read limit, a lot more complaints would start rolling in about the read side. Why is there so much random IO? Do your systems have autovacuum_vacuum_scale_factor set far below the default? Unless they do, most of the IO (both read and write) should be sequential. Or at least, I don't understand why they are not sequential. Cheers, Jeff
Re: [HACKERS] Cost limited statements RFC
Greg Smith g...@2ndquadrant.com wrote: I suspect the reason we don't see as many complaints is that a lot more systems can handle 7.8MB/s of random reads than there are ones that can do 3.9MB/s of random writes. If we removed that read limit, a lot more complaints would start rolling in about the read side. I'll believe that all of that is true, but I think there's another reason. With multiple layers of write cache (PostgreSQL shared_buffers, OS cache, controller or SSD cache) I think there's a tendency for an avalanche effect. Dirty pages stick to cache at each level like snow on the side of a mountain, accumulating over time. When it finally breaks loose at the top, it causes more from lower levels to break free as it passes. The result at the bottom can be devastating. Before I leave the metaphor, I will admit that I've sometimes done the equivalent of setting off an occasional stick of dynamite to cause things to cascade down before they have built up to a more dangerous level. Setting aside the colorful imagery, with a write cache you often see *very* fast writes for bursts or even sustained writes up to a certain point, after which you suddenly have serious latency spikes. Reads tend to degrade more gracefully, giving you a sense that you're starting to get into trouble while you still have time to react to prevent extreme conditions. At least that has been my experience. I think the sudden onset of problems from write saturation contributes to the complaints. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Batch API for After Triggers
Simon Riggs si...@2ndquadrant.com wrote: Comments please. How much of this problem space do you think could be addressed by providing OLD and NEW *relations* to AFTER EACH STATEMENT triggers? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Proposed patch: remove hard-coded limit MAX_ALLOCATED_DESCS
Cédric Villemain ced...@2ndquadrant.com writes: I'm not sure of the expected value of max_safe_fds. Your patch now initializes with 5 slots instead of 10; if max_safe_fds is large, maybe it is better to double the size each time we need it instead of jumping directly to the largest size ? I don't see the point particularly. At the default value of max_files_per_process (1000), the patch can allocate at most 500 array elements, which on a 64-bit machine would probably be 16 bytes each or 8KB total. regards, tom lane
Re: [HACKERS] Optimising Foreign Key checks
On Sat, Jun 08, 2013 at 09:39:14PM +0100, Simon Riggs wrote: On 8 June 2013 15:30, Noah Misch n...@leadboat.com wrote: On Tue, Jun 04, 2013 at 02:45:17PM +0100, Simon Riggs wrote: 2. Don't store FK events in the after trigger queue at all, but apply them as we go. That solves problems 2 and 3. That could be considered to be in violation of the SQL standard, which requires us to apply the checks at the end of the statement. We already violate the standard with regard to uniqueness checks, so doing it here doesn't seem unacceptable. I wouldn't like to see that compliance bug propagate to other constraint types. What clauses in the standard demand end-of-statement timing, anyway? What if we followed the example of deferred UNIQUE: attempt FK checks as we go and enqueue an after-trigger recheck when such an initial test fails? The copy I have access to (2008 draft), 4.17.2 Checking of constraints The checking of a constraint depends on its constraint mode within the current SQL-transaction. Whenever an SQL-statement is executed, every constraint whose mode is immediate is checked, at a certain point after any changes to SQL-data and schemas resulting from that execution have been effected, to see if it is satisfied. A constraint is satisfied if and only if the applicable search condition included in its descriptor evaluates to True or Unknown. If any constraint is not satisfied, then any changes to SQL-data or schemas resulting from executing that statement are canceled. (See the General Rules of Subclause 13.5, “SQL procedure statement”.) NOTE 31 — This includes SQL-statements that are executed as a direct result or an indirect result of executing a different SQL-statement. It also includes statements whose effects explicitly or implicitly include setting the constraint mode to immediate. This does appear to specify FK timing semantics like PostgreSQL gives today. 
Namely, it does not permit a FK-induced error when later actions of the query that prompted the check could possibly remedy the violation. I can't see anything there that stops me applying locks as we go, but Likewise; I don't see why we couldn't perform an optimistic check ASAP and schedule a final after-statement check when an early check fails. That changes performance characteristics without changing semantics. I feel like someone will object... This point seems crucial to the particular approach we take, so I need wider input. Agreed. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
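The check-as-you-go-with-recheck idea can be sketched abstractly — sets stand in for the referenced table and a list for the after-trigger recheck queue; this is an illustration of the semantics, not PostgreSQL code:

```python
def optimistic_fk_check(fk_value, referenced_keys, recheck_queue):
    """Check immediately; on failure, defer instead of erroring,
    since later work in the same statement may supply the key."""
    if fk_value not in referenced_keys:
        recheck_queue.append(fk_value)

def end_of_statement_recheck(recheck_queue, referenced_keys):
    """Only rows that failed the early check are re-verified here;
    an error is raised only if they still have no match."""
    missing = [v for v in recheck_queue if v not in referenced_keys]
    if missing:
        raise LookupError("foreign key violation on %r" % missing)
    recheck_queue.clear()

referenced = {1, 2}
queue = []
optimistic_fk_check(1, referenced, queue)    # passes, nothing queued
optimistic_fk_check(3, referenced, queue)    # fails now, rechecked later
referenced.add(3)                            # same statement adds the PK
end_of_statement_recheck(queue, referenced)  # now satisfied; no error
```

When the early check succeeds — the common case — nothing ever reaches the queue, which is where the performance win comes from without changing the end-of-statement semantics quoted above.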
Re: [HACKERS] Cost limited statements RFC
On 6/8/13 5:17 PM, Jeff Janes wrote: But my gut feeling is that if autovacuum is trying to read faster than the hardware will support, it will just automatically get throttled, by inherent IO waits, at a level which can be comfortably supported. And this will cause minimal interference with other processes. If this were true all the time autovacuum tuning would be a lot easier. You can easily make a whole server unresponsive by letting loose one rogue process doing a lot of reads. Right now this isn't a problem for autovacuum because any one process running at 7.8MB/s is usually not a big deal. It doesn't take too much in the way of read-ahead logic and throughput to satisfy that. But I've seen people try and push the read rate upwards who didn't get very far beyond that before it was way too obtrusive. I could collect some data from troubled servers to see how high I can push the read rate before they suffer. Maybe there's a case there for increasing the default read rate because the write one is a good enough secondary limiter. I'd be surprised if we could get away with more than a 2 or 3X increase though, and the idea of going unlimited is really scary. It took me a year of before/after data collection before I was confident that it's OK to run unrestricted in all cache hit situations. Why is there so much random IO? Do your systems have autovacuum_vacuum_scale_factor set far below the default? Unless they do, most of the IO (both read and write) should be sequential. Insert one process doing sequential reads into a stream of other activity and you can easily get random I/O against the disks out of the mix. You don't necessarily need the other activity to be random to get that. N sequential readers eventually acts like N random readers for high enough values of N. On busy servers, autovacuum is normally competing against multiple random I/O processes though. 
Also, the database's theoretical model that block number correlates directly with location on disk can break down. I haven't put a hard number to measuring it directly, but systems with vacuum problems seem more likely to have noticeable filesystem level fragmentation. I've been thinking about collecting data from a few systems with filefrag to see if I'm right about that. -- Greg Smith, 2ndQuadrant US, g...@2ndquadrant.com, Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
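For reference, the 7.8MB/s figure quoted above falls straight out of the default cost-based delay settings. A quick back-of-envelope check (the parameter values are the stock defaults; this is an illustrative calculation, not server code):

```python
# Effective autovacuum read rate implied by the default cost settings.
vacuum_cost_limit = 200                 # cost budget per delay round (default)
autovacuum_vacuum_cost_delay = 0.020    # 20 ms of sleep per round (default)
vacuum_cost_page_miss = 10              # cost charged per page read from disk (default)
block_size = 8192                       # bytes per page

pages_per_round = vacuum_cost_limit / vacuum_cost_page_miss   # 20 pages per round
rounds_per_second = 1 / autovacuum_vacuum_cost_delay          # 50 rounds per second
read_rate_mb = pages_per_round * rounds_per_second * block_size / (1024 * 1024)

print(f"{read_rate_mb:.1f} MB/s")       # -> 7.8 MB/s
```

So 20 pages per 20 ms round works out to 1000 pages/s, i.e. the 7.8MB/s ceiling the thread keeps referring to.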
Re: [HACKERS] Cost limited statements RFC
On 6/8/13 5:20 PM, Kevin Grittner wrote: I'll believe that all of that is true, but I think there's another reason. With multiple layers of write cache (PostgreSQL shared_buffers, OS cache, controller or SSD cache) I think there's a tendency for an avalanche effect. Dirty pages stick to cache at each level like snow on the side of a mountain, accumulating over time. When it finally breaks loose at the top, it causes more from lower levels to break free as it passes. I explained this once as being like a tower of leaky buckets where each one drips into the one below. Water drains out of the bottom of each bucket at one rate, and new water comes in at another. You can add water much faster than it drains, for a while. But once one of the buckets fills you've got a serious mess. I think the sudden onset of problems from write saturation contributes to the complaints. It's also important to realize that vacuum itself doesn't even do the writes in many cases. If you have a large shared_buffers value, it wanders off making things dirty without any concern for what's going to disk. When the next checkpoint shows up is when pressure increases at the top. The way this discussion has wandered off has nicely confirmed I was right to try to avoid going into this area again :( -- Greg Smith, 2ndQuadrant US, g...@2ndquadrant.com, Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
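The leaky-bucket analogy is easy to put numbers to. A toy simulation (the capacities, drain rates, and inflow below are invented for illustration, not real PostgreSQL or OS figures) shows the sudden-onset behavior: everything looks fine until the step where some layer finally fills.

```python
# Toy model of the "tower of leaky buckets": each write-cache layer
# drains into the one below at a fixed rate.
def fill_time(layers, inflow):
    """Step one second at a time until some layer overflows.
    Returns (seconds, layer index) at overflow, or None if stable."""
    levels = [0.0] * len(layers)
    for t in range(1, 10_000):
        incoming = inflow
        for i, (capacity, drain) in enumerate(layers):
            levels[i] += incoming
            incoming = min(levels[i], drain)  # what drips to the layer below
            levels[i] -= incoming
            if levels[i] > capacity:
                return t, i
    return None

# (capacity, drain rate) per layer: shared_buffers, OS cache, controller cache
layers = [(100, 20), (400, 10), (50, 5)]
print(fill_time(layers, inflow=12))   # -> (11, 2): the bottom layer fills first
print(fill_time(layers, inflow=4))    # -> None: inflow is below every drain rate
```

With inflow just above one layer's drain rate, that layer quietly accumulates and then overflows all at once, which matches the "serious mess" description above.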
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On Sat, Jun 8, 2013 at 11:27 AM, Andres Freund and...@2ndquadrant.com wrote: You know, the PANIC isn't there just because we like to piss off users. There's actual technical reasons that don't just go away by judging the PANIC as stupid. At the points where the XLogInsert()s happen we're in critical sections out of which we *cannot* ERROR out because we already may have made modifications that cannot be allowed to be performed partially/unlogged. That's why we're throwing a PANIC which will force a cluster-wide restart including *NOT* writing any further buffers from s_b out. If archiving is on and failure is due to no space, could we just keep trying XLogFileInit again for a couple minutes to give archiving a chance to do its thing? Doing that while holding onto locks and a critical section would be unfortunate, but if the alternative is a PANIC, it might be acceptable. The problem is that even if the file is only being kept so it can be archived, once archiving succeeds I think the file is not removed immediately but rather not until the next checkpoint, which will never happen when the locks are still held. Cheers, Jeff
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
From: Joshua D. Drake j...@commandprompt.com On 06/08/2013 07:36 AM, MauMau wrote: 3. You cannot know the reason of archive_command failure (e.g. archive area full) if you don't use PostgreSQL's server logging. This is because archive_command failure is not logged in syslog/eventlog. Wait, what? Is this true (someone else?) I'm sorry, please let me correct myself. What I meant is: This is because the exact reason for archive_command failure (e.g. cp's output) is not logged in syslog/eventlog. The fact itself that archive_command failed is recorded. Regards MauMau
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
From: Joshua D. Drake j...@commandprompt.com On 06/08/2013 11:27 AM, Andres Freund wrote: You know, the PANIC isn't there just because we like to piss off users. There's actual technical reasons that don't just go away by judging the PANIC as stupid. Yes I know we aren't trying to piss off users. What I am saying is that it is stupid to the user that it PANICS. I apologize if that came out wrong. I've experienced PANIC shutdown several times due to WAL full during development. As a user, one problem with PANIC is that it dumps core. I think core files should be dumped only when a "can't happen" event has occurred, such as a PostgreSQL bug. I didn't expect postgres to dump core simply because the disk is full. I want postgres to shut down with a FATAL message in that exact case. Regards MauMau
[HACKERS] Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding
On Sat, Jun 8, 2013 at 10:25 AM, Andrew Dunstan and...@dunslane.net wrote: Don't downcase non-ascii identifier chars in multi-byte encodings. Long-standing code has called tolower() on identifier character bytes with the high bit set. This is clearly an error and produces junk output when the encoding is multi-byte. This patch therefore restricts this activity to cases where there is a character with the high bit set AND the encoding is single-byte. There have been numerous gripes about this, most recently from Martin Schäfer. Backpatch to all live releases. I'm all for changing this, but back-patching seems like a terrible idea. This could easily break queries that are working now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Optimising Foreign Key checks
On Sat, Jun 8, 2013 at 5:41 PM, Noah Misch n...@leadboat.com wrote: This does appear to specify FK timing semantics like PostgreSQL gives today. Namely, it does not permit a FK-induced error when later actions of the query that prompted the check could possibly remedy the violation. Yeah. Standard or no standard, I think we'd have unhappy users if we broke that. I can't see anything there that stops me applying locks as we go, but Not sure I follow that bit but... Likewise; I don't see why we couldn't perform an optimistic check ASAP and schedule a final after-statement check when an early check fails. That changes performance characteristics without changing semantics. ...this seems like it might have some promise; but what if the action we're performing isn't idempotent? And how do we know? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
From: Josh Berkus j...@agliodbs.com There's actually three potential failure cases here: - One Volume: WAL is on the same volume as PGDATA, and that volume is completely out of space. - XLog Partition: WAL is on its own partition/volume, and fills it up. - Archiving: archiving is failing or too slow, causing the disk to fill up with waiting log segments. I think there is one more case. Is this correct? - Failure of a disk containing data directory or tablespace If checkpoint can't write buffers to disk because of disk failure, checkpoint cannot complete, thus WAL files accumulate in pg_xlog/. This means that one disk failure will lead to postgres shutdown. XLog Partition -- As Heikki pointed out, a full dedicated WAL drive is hard to fix once it gets full, since there's nothing you can safely delete to clear space, even enough for a checkpoint record. This sounds very scary. Is it possible to complete recovery and start up postmaster with either or both of the following modifications? [Idea 1] During recovery, force archiving a WAL file and delete/recycle it in pg_xlog/ as soon as its contents are applied. [Idea 2] During recovery, when disk full is encountered at end-of-recovery checkpoint, force archiving all unarchived WAL files and delete/recycle them at that time. Regards MauMau
Re: [HACKERS] Cost limited statements RFC
On Sat, Jun 8, 2013 at 4:43 PM, Jeff Janes jeff.ja...@gmail.com wrote: I don't know what two independent settings would look like. Say you keep two independent counters, where each can trigger a sleep, and the triggering of that sleep clears only its own counter. Now you still have a limit on the linear combination, it is just that summation has moved to a different location. You have two independent streams of sleeps, but they add up to the same amount of sleeping as a single stream based on a summed counter. Or if one sleep clears both counters (the one that triggered it and the other one), I don't think that that is what I would call independent either. Or at least not if it has no memory. The intuitive meaning of independent would require that it keep track of which of the two counters was controlling over the last few seconds. Am I overthinking this? Yep. Suppose the user has a read limit of 64 MB/s and a dirty limit of 4MB/s. That means that, each second, we can read 8192 buffers and dirty 512 buffers. If we sleep for 20 ms (1/50th of a second), that covers 163 buffer reads and 10 buffer writes, so we just reduce the accumulated counters by those amounts (minimum zero). Also, in all the anecdotes I've been hearing about autovacuum causing problems from too much IO, in which people can identify the specific problem, it has always been the write pressure, not the read, that caused the problem. Should the default be to have the read limit be inactive and rely on the dirty-limit to do the throttling? The main time I think you're going to hit the read limit is during anti-wraparound vacuums. That problem may be gone in 9.4, if Heikki writes that patch we were discussing just recently. But at the moment, we'll do periodic rescans of relations that are already all-frozen, and that's potentially expensive. So I'm not particularly skeptical about the need to throttle reads. 
I suspect many people don't need it, but there are probably some who do, at least for anti-wraparound cases - especially on EC2, where the limit on I/O is often the GigE card. What I *am* skeptical about is the notion that people need the precise value of the write limit to depend on how many of the pages read are being found in shared_buffers versus not. That's essentially what the present system is accomplishing - at a great cost in user-visible complexity. Basically, I think that anti-wraparound vacuums may need either read throttling or write throttling depending on whether the data is already frozen; and regular vacuums probably only need write-throttling. But I have neither any firsthand experience nor any empirical reason to presume that the write limit needs to be lower when the read-rate is high. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
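The two-counter accounting Robert describes can be sketched directly. The 64 MB/s and 4 MB/s limits and the 8 KB page size come from his example; the function shape and names are invented for illustration:

```python
# Two independent cost counters, each paid down by the same sleep.
BLOCK_SIZE = 8192
READ_RATE = 64 * 1024 * 1024 // BLOCK_SIZE    # 8192 buffers per second
DIRTY_RATE = 4 * 1024 * 1024 // BLOCK_SIZE    # 512 buffers per second

def pay_down(read_debt, dirty_debt, sleep_seconds):
    """Reduce each counter by what a sleep of this length pays for,
    never going below zero."""
    return (max(0, read_debt - int(READ_RATE * sleep_seconds)),
            max(0, dirty_debt - int(DIRTY_RATE * sleep_seconds)))

# A 20 ms sleep covers int(8192 * 0.02) = 163 reads and
# int(512 * 0.02) = 10 dirtied buffers:
print(pay_down(500, 30, 0.020))   # -> (337, 20)
```

Each sleep credits both counters in proportion to its configured rate, so neither limit can be exceeded over time even though only one of them may have triggered the sleep.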
Re: [HACKERS] Optimising Foreign Key checks
On Sat, Jun 08, 2013 at 08:20:42PM -0400, Robert Haas wrote: On Sat, Jun 8, 2013 at 5:41 PM, Noah Misch n...@leadboat.com wrote: Likewise; I don't see why we couldn't perform an optimistic check ASAP and schedule a final after-statement check when an early check fails. That changes performance characteristics without changing semantics. ...this seems like it might have some promise; but what if the action we're performing isn't idempotent? And how do we know? The action discussed so far is RI_FKey_check_ins(). It acquires a KEY SHARE lock (idempotent by nature) on a row that it finds using B-tree equality (presumed IMMUTABLE, thus idempotent). RI_FKey_check_upd() is nearly the same action, so the same argument holds. Before treating any other operation in the same way, one would need to conduct similar analysis. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
[HACKERS] Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding
On Sat, Jun 08, 2013 at 08:09:15PM -0400, Robert Haas wrote: On Sat, Jun 8, 2013 at 10:25 AM, Andrew Dunstan and...@dunslane.net wrote: Don't downcase non-ascii identifier chars in multi-byte encodings. Long-standing code has called tolower() on identifier character bytes with the high bit set. This is clearly an error and produces junk output when the encoding is multi-byte. This patch therefore restricts this activity to cases where there is a character with the high bit set AND the encoding is single-byte. There have been numerous gripes about this, most recently from Martin Schäfer. Backpatch to all live releases. I'm all for changing this, but back-patching seems like a terrible idea. This could easily break queries that are working now. If more than one encoding covers the characters used in a given application, that application's semantics should be the same regardless of which of those encodings is in use. We certainly don't _guarantee_ that today; PostgreSQL leaves much to libc, which may not implement the relevant locales compatibly. However, this change bakes into PostgreSQL itself a departure from that principle. If a database contains tables ä and Ä, which of those SELECT * FROM Ä finds will be encoding-dependent. If we're going to improve the current (granted, worse) downcase_truncate_identifier() behavior, we should not adopt another specification bearing such surprises. Let's return to the drawing board on this one. I would be inclined to keep the current bad behavior until we implement the i18n-aware case folding required by SQL. If I'm alone in thinking that, perhaps switch to downcasing only ASCII characters regardless of the encoding. That at least gives consistent application behavior. I apologize for not noticing this week's thread in time to comment. 
Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Proposed patch: remove hard-coded limit MAX_ALLOCATED_DESCS
* Tom Lane (t...@sss.pgh.pa.us) wrote: Recently we had a gripe about how you can't read very many files concurrently with contrib/file_fdw: http://www.postgresql.org/message-id/of419b5767.8a3c9adb-on85257b79.005491e9-85257b79.0054f...@isn.rtss.qc.ca Ouch, that's pretty bad. Barring objections or better ideas, I'd like to back-patch this as far as 9.1 where file_fdw was introduced. +1. This would bump the limit to something like 300 instead of 10, right? That seems pretty reasonable. Thanks, Stephen
Re: [HACKERS] small patch to crypt.c
JD, * Joshua D. Drake (j...@commandprompt.com) wrote: In my quest to understand how all the logging etc works with authentication I came across the area of crypt.c that checks for valid_until but it seems like it has an extraneous check. If I am wrong I apologize for the noise but wouldn't mind an explanation. Alright, there probably aren't too many people out there running with their clock set to pre-2000, but wouldn't this end up giving the wrong result in those cases, as GetCurrentTimestamp() would end up returning a negative value, which would make it less than vuntil's default of zero? Perhaps we could change what vuntil is set to by default, but I think it's probably better to keep things as-is; we should really be checking for null cases explicitly in general. Thanks, Stephen
[HACKERS] Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding
On 06/08/2013 10:52 PM, Noah Misch wrote: On Sat, Jun 08, 2013 at 08:09:15PM -0400, Robert Haas wrote: On Sat, Jun 8, 2013 at 10:25 AM, Andrew Dunstan and...@dunslane.net wrote: Don't downcase non-ascii identifier chars in multi-byte encodings. Long-standing code has called tolower() on identifier character bytes with the high bit set. This is clearly an error and produces junk output when the encoding is multi-byte. This patch therefore restricts this activity to cases where there is a character with the high bit set AND the encoding is single-byte. There have been numerous gripes about this, most recently from Martin Schäfer. Backpatch to all live releases. I'm all for changing this, but back-patching seems like a terrible idea. This could easily break queries that are working now. If more than one encoding covers the characters used in a given application, that application's semantics should be the same regardless of which of those encodings is in use. We certainly don't _guarantee_ that today; PostgreSQL leaves much to libc, which may not implement the relevant locales compatibly. However, this change bakes into PostgreSQL itself a departure from that principle. If a database contains tables ä and Ä, which of those SELECT * FROM Ä finds will be encoding-dependent. If we're going to improve the current (granted, worse) downcase_truncate_identifier() behavior, we should not adopt another specification bearing such surprises. Let's return to the drawing board on this one. I would be inclined to keep the current bad behavior until we implement the i18n-aware case folding required by SQL. If I'm alone in thinking that, perhaps switch to downcasing only ASCII characters regardless of the encoding. That at least gives consistent application behavior. I apologize for not noticing to comment on this week's thread. The behaviour which this fixes is an unambiguous bug. 
Calling tolower() on the individual bytes of a multi-byte character can't possibly produce any sort of correct result. A database that contains such corrupted names, probably not valid in any encoding at all, is almost certainly not restorable, and I'm not sure if it's dumpable either. It's already produced several complaints in recent months, so ISTM that returning to it for any period of time is unthinkable. cheers andrew
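Andrew's point is easy to demonstrate concretely. The snippet below simulates what byte-wise tolower() does under a single-byte (Latin-1) locale when fed the UTF-8 bytes of 'Ä'; it is an illustrative reconstruction, not the actual downcase_truncate_identifier() code:

```python
# Simulate C's tolower() applied byte-by-byte under a Latin-1 locale,
# which is effectively what the old identifier-downcasing code did.
def bytewise_tolower_latin1(data: bytes) -> bytes:
    return bytes(
        b.to_bytes(1, "big").decode("latin-1").lower().encode("latin-1")[0]
        for b in data
    )

original = "Ä".encode("utf-8")           # b'\xc3\x84'
mangled = bytewise_tolower_latin1(original)
print(mangled)                           # b'\xe3\x84' -- a truncated 3-byte sequence
try:
    mangled.decode("utf-8")
except UnicodeDecodeError:
    print("junk: the result is no longer valid UTF-8")
```

The lead byte 0xC3 gets "lowercased" to 0xE3 (the start of a three-byte UTF-8 sequence), leaving bytes that are not valid in the original encoding at all, which is exactly the corrupted-name scenario described above.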
Re: [HACKERS] Batch API for After Triggers
* Simon Riggs (si...@2ndquadrant.com) wrote: While fiddling with FK tuning, Noah suggested batching trigger executions together to avoid execution overhead. I like the general idea, but I'd prefer a way to avoid having to queue up tons of trigger events to be executed in the first place. Aggregates do this by providing a way to store up information to be processed by an eventual 'final' function. Another option, as Kevin asked about, would be statement level triggers which are able to see both the OLD and the NEW versions of the relation. The per-row trigger option with a way to be called immediately and then store what it cares about for a later final function strikes me as very generalized and able to do things that the statement-level option couldn't, but I'm not sure if there's a use-case it could solve which the OLD/NEW statement trigger capability couldn't. Thanks, Stephen
[HACKERS] Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding
On Sat, Jun 08, 2013 at 11:50:53PM -0400, Andrew Dunstan wrote: On 06/08/2013 10:52 PM, Noah Misch wrote: Let's return to the drawing board on this one. I would be inclined to keep the current bad behavior until we implement the i18n-aware case folding required by SQL. If I'm alone in thinking that, perhaps switch to downcasing only ASCII characters regardless of the encoding. That at least gives consistent application behavior. I apologize for not noticing to comment on this week's thread. The behaviour which this fixes is an unambiguous bug. Calling tolower() on the individual bytes of a multi-byte character can't possibly produce any sort of correct result. A database that contains such corrupted names, probably not valid in any encoding at all, is almost certainly not restorable, and I'm not sure if it's dumpable either. I agree with each of those points. However, since any change here breaks compatibility, we should fix it right the first time. A second compatibility break would be all the more onerous once this intermediate step helps more users to start using unquoted, non-ASCII object names. It's already produced several complaints in recent months, so ISTM that returning to it for any period of time is unthinkable. PostgreSQL has lived with this wrong behavior since ... the beginning? It's a problem, certainly, but a bandage fix brings its own trouble. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Redesigning checkpoint_segments
On 06/07/2013 01:00 AM, Josh Berkus wrote: Daniel, So your suggestion is that if archiving is falling behind, we should introduce delays on COMMIT in order to slow down the rate of WAL writing? Delaying commit wouldn't be enough; consider a huge COPY, which can produce a lot of WAL at a high rate without a convenient point to delay at. I expect a delay after writing an xlog record would make a more suitable write-rate throttle, though I'd want to be sure the extra branch didn't hurt performance significantly. Branch prediction hints would help; since we don't *care* if the delay branch is slow and causes pipeline stalls, tagging the no-delay branch as likely would probably deal with that concern for supported compilers/platforms. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
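The shape of Craig's per-record throttle might look something like the sketch below. Everything here (the class, the rate cap, the sleep threshold) is hypothetical; the real code would live in C on the WAL write path, and this only illustrates the accounting:

```python
import time

class WalThrottle:
    """Hypothetical per-record write throttle: sleep after WAL writes
    to keep the long-run write rate under a configured cap."""
    def __init__(self, max_bytes_per_sec):
        self.rate = max_bytes_per_sec
        self.debt = 0.0                  # seconds of sleep owed so far
        self.last = time.monotonic()

    def record_written(self, nbytes):
        now = time.monotonic()
        self.debt = max(0.0, self.debt - (now - self.last))  # credit elapsed time
        self.debt += nbytes / self.rate                       # cost of this record
        self.last = now
        if self.debt > 0.005:            # only sleep once the debt is worth a syscall
            time.sleep(self.debt)
            self.debt = 0.0
```

Because small records merely accumulate debt rather than sleeping immediately, the common no-delay path stays cheap, which is the same concern the branch-hint remark addresses.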
Re: [HACKERS] Redesigning checkpoint_segments
On 06/06/2013 03:21 PM, Joshua D. Drake wrote: Not to be unkind but the problems of the uniformed certainly are not the problems of the informed. Or perhaps they are certainly the problems of the informed :P. I'm not convinced that's a particularly good argument not to improve something. Sure, it might be a usability issue not a purely technical issue, but that IMO doesn't make it much less worth fixing. Bad usability puts people off early, before they can become productive and helpful community members. It also puts others off trying the software at all by reputation alone. In any case, I don't think this is an issue of the informed vs uninformed. It's also a matter of operational sanity at scale. The sysadmin can't watch 100,000 individual servers and jump in to make minute tweaks - nor should they have to when some auto-tuning could obviate the need. The same issue exists with vacuum - it's hard for basic users to understand, so they misconfigure it and often achieve the opposite results to what they need. It's been getting better, but some feedback-based control would make a world of difference when running Pg. In this I really have to agree with Heikki and Daniel - more usable and preferably feedback-tuned defaults would be really, really nice to have, though I'd want good visibility (logging, SHOW commands, etc) into what they were doing and options to override for special cases. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 06/06/2013 10:00 PM, Heikki Linnakangas wrote: I've seen a case, where it was even worse than a PANIC and shutdown. pg_xlog was on a separate partition that had nothing else on it. The partition filled up, and the system shut down with a PANIC. Because there was no space left, it could not even write the checkpoint after recovery, and thus refused to start up again. There was nothing else on the partition that you could delete to make space. The only recourse would've been to add more disk space to the partition (impossible), or manually delete an old WAL file that was not needed to recover from the latest checkpoint (scary). Fortunately this was a test system, so we just deleted everything. There were a couple of dba.stackexchange.com reports along the same lines recently, too. Both involved an antivirus vendor's VM appliance with a canned (stupid) configuration that set wal_keep_segments too high for the disk space allocated and stored WAL on a separate partition. People are having issues with WAL space management in the real world and I think it and autovacuum are the two hardest things for most people to configure and understand in Pg right now. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On 06/08/2013 10:57 AM, Daniel Farina wrote: At which point most sensible users say no thanks, I'll use something else. [snip] I have a clear bias in experience here, but I can't relate to someone who sets up archives but is totally okay losing a segment unceremoniously, because it only takes one of those once in a while to make a really, really bad day. It sounds like between you both you've come up with a pretty solid argument by exclusion for throttling WAL writing as space grows critical. Dropping archive segments seems pretty unsafe from the solid arguments Daniel presents, and Josh makes a good case for why shutting the DB down when archiving can't keep up isn't any better. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] small patch to crypt.c
On 06/08/2013 08:47 PM, Stephen Frost wrote: JD, * Joshua D. Drake (j...@commandprompt.com) wrote: In my quest to understand how all the logging etc works with authentication I came across the area of crypt.c that checks for valid_until but it seems like it has an extraneous check. If I am wrong I apologize for the noise but wouldn't mind an explanation. Alright, there probably aren't too many people out there running with their clock set to pre-2000, but wouldn't this end up giving the wrong result in those cases, as GetCurrentTimestamp() would end up returning a negative value, which would make it less than vuntil's default of zero? Perhaps we could change what vuntil is set to by default, but I think it's probably better to keep things as-is; we should really be checking for null cases explicitly in general. Well, I was more referring to the default, which is: check if null, if true return ok; check if valuntil < today, if true return error; else return ok. To me we don't need the null check. However, when I tested it, without the null check you can't login. So now I am curious about what is going on. JD Thanks, Stephen