Re: [HACKERS] PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
On 2013-06-26 20:07:40 -0400, Tom Lane wrote: I don't see how we could trigger the conditions for EINPROGRESS on windows that msdn lists, but since we need it on unixoid systems and its valid to treat the connect as partiall successfull on windows, there seems little benefit in dropping it. I was about to argue for removing the EINPROGRESS check on Windows, on the grounds that this would be the same type of bug as you're complaining of on Linux, ie, if the call does return this error we'll mistakenly think the connection is in progress and go on to deliver an unhelpful failure message later. However, some more trolling of the intertubes suggests that Cygwin's emulation of socket() does indeed return EINPROGRESS; see for instance this ancient thread of ours: http://www.postgresql.org/message-id/flat/14855.49635.565990.716...@kryten.bedford.waii.com#14855.49635.565990.716...@kryten.bedford.waii.com Unless we want to distinguish Cygwin from native Windows in this code chunk, maybe we'd better leave well enough alone. After some looking it gets funnier. There currently doesn't seem to be any chance that we actually can get an EINPROGRESS at that level on windows. We #define connect() to pgwin32_connect() which fudges errno around. Where WSAEINPROGRESS is mapped to EINVAL. Also for some reason the connect is performed synchronously to the outside by doing pgwin32_waitforsinglesocket() if EWOULDBLOCK is returned by connect()... That seems to be the case since a4c40f140d23cefbf94e00283f7688c772867f1b. I still want to delete the test for SOCK_ERRNO == 0. I traced that back to commit da9501bddb4dc33c031b1db6ce2133bcee7b, but I can't find anything in the mailing list archives to explain that. I have an inquiry in to Jan to see if he can remember the reason ... That looks strange, yes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal 9.4 plpgsql: allows access to call stack from GET DIAGNOSTICS statement
Latest patch looks good to me. Regards, Rushabh On Wed, Jun 26, 2013 at 11:21 PM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello updated patch with some basic doc Regards Pavel 2013/6/26 Rushabh Lathia rushabh.lat...@gmail.com: On Wed, Jun 26, 2013 at 5:11 PM, Pavel Stehule pavel.steh...@gmail.com wrote: 2013/6/26 Rushabh Lathia rushabh.lat...@gmail.com: On Tue, Jun 25, 2013 at 11:09 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2013/6/25 Rushabh Lathia rushabh.lat...@gmail.com: On Tue, Jun 25, 2013 at 1:47 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello This is a fragment of my old code from the orafce package - it is functional, but it is written a little bit more generic due to impossible access to static variables (in elog.c) static char* dbms_utility_format_call_stack(char mode) { MemoryContext oldcontext = CurrentMemoryContext; ErrorData *edata; ErrorContextCallback *econtext; StringInfo sinfo; #if PG_VERSION_NUM >= 80400 errstart(ERROR, __FILE__, __LINE__, PG_FUNCNAME_MACRO, TEXTDOMAIN); #else errstart(ERROR, __FILE__, __LINE__, PG_FUNCNAME_MACRO); #endif MemoryContextSwitchTo(oldcontext); for (econtext = error_context_stack; econtext != NULL; econtext = econtext->previous) (*econtext->callback) (econtext->arg); edata = CopyErrorData(); FlushErrorState(); https://github.com/orafce/orafce/blob/master/utility.c 2013/6/24 Rushabh Lathia rushabh.lat...@gmail.com: Hi, Use of this feature is to get the call stack for debugging without raising an exception. This definitely seems very useful. Here are my comments about the submitted patch: Patch gets applied cleanly (there are a couple of whitespace warnings but that's in the test case file). make and make install too went smooth. make check was smooth too. Did some manual testing about the feature and it went smooth. However would like to share a couple of points: My main motive is concentration to stack info string only. So I didn't use the err_start function, and I didn't use CopyErrorData too. These routines do a lot of other work. 1) I noticed that you did the initialization of edata manually, just wondering can't we directly use CopyErrorData() which will do that job ? yes, we can, but in this context only the context field is interesting for us. I found that inside CopyErrorData() there is an Assert: Assert(CurrentMemoryContext != ErrorContext); And in the patch we are using ErrorContext as a short-living context, is it the reason for not using CopyErrorData() ? it was not a primary reason, but sure - this routine is not designed for direct use from the elog module. Next, we can save one palloc call. Hmm ok. 2) To reset stack to empty, FlushErrorState() can be used. yes, it can be. I use explicit rows due to different semantics, although it does the same things. FlushErrorState resets some global ErrorState and it is used from other modules. I did a reset of ErrorContext. I feel a small difference there - so FlushErrorState is not the best name for my purpose. What about modification: static void resetErrorStack() { errordata_stack_depth = -1; recursion_depth = 0; /* Delete all data in ErrorContext */ MemoryContextResetAndDeleteChildren(ErrorContext); } and then call in InvokeErrorCallbacks -- resetErrorStack() and rewrite FlushErrorState like void FlushErrorState(void) { /* reset ErrorStack is enough */ resetErrorStack(); } ??? Nope. rather than that I would still prefer a direct call of FlushErrorState(). but I can live well with a direct call of FlushErrorState() - it is only a minor change.
3) I was thinking about the usability of the feature and wondering how about if we add some header to the stack, so the user can differentiate between the STACK and a normal NOTICE message ? Something like: postgres=# select outer_outer_func(10); NOTICE: - Call Stack - PL/pgSQL function inner_func(integer) line 4 at GET DIAGNOSTICS PL/pgSQL function outer_func(integer) line 3 at RETURN PL/pgSQL function outer_outer_func(integer) line 3 at RETURN CONTEXT: PL/pgSQL function outer_func(integer) line 3 at RETURN PL/pgSQL function outer_outer_func(integer) line 3 at RETURN outer_outer_func -- 20 (1 row) I
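For readers following the thread, here is a minimal sketch of how the feature under review is exercised, assuming the GET DIAGNOSTICS ... PG_CONTEXT spelling visible in the NOTICE output above (the function names mirror the example; the syntax is the patch's proposal, not yet in core):

CREATE FUNCTION inner_func(i integer) RETURNS integer AS $$
DECLARE
  stack text;
BEGIN
  -- proposed by the patch: fetch the call stack without raising an exception
  GET DIAGNOSTICS stack = PG_CONTEXT;
  RAISE NOTICE E'--- Call Stack ---\n%', stack;
  RETURN i;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION outer_func(i integer) RETURNS integer AS $$
BEGIN
  RETURN inner_func(i);
END;
$$ LANGUAGE plpgsql;

SELECT outer_func(10);  -- the NOTICE carries the stack, similar to the output quoted above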
Re: [HACKERS] pg_filedump 9.3: checksums (and a few other fixes)
On 2013-06-26 21:18:49 -0700, Peter Geoghegan wrote: On Wed, Jun 26, 2013 at 8:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: TBH, I've always been annoyed that pg_filedump is GPL and so there's no way for us to just ship it in contrib. (That stems from Red Hat corporate policy of a dozen years ago, but the conflict is real anyway.) If somebody is sufficiently excited about this topic to do something that's largely new anyway, I'd be in favor of starting from scratch so it could be put under the usual Postgres license. Heroku are interested in online verification of basebackups (i.e. using checksums to verify the integrity of heap files as they are backed up, with a view to relying less and less on logical backups). I am very glad that you made the page checksums stuff available to external utilities in commit f04216341dd1cc235e975f93ac806d9d3729a344. In the last couple of days, I haven't been able to figure out a way to solve the problem of torn pages in a way that isn't a complete kludge (with a hopefully-acceptable risk of false positives), so I've been operating under the assumption that anything I produce here won't be up to the standards of contrib. Why not do this from a function/background worker in the backend where you can go via the buffer manager to avoid torn pages et al. If you use a buffer strategy the cache poisoning et al should be controlleable. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch to compute Max LSN of Data Pages
On Thursday, June 27, 2013 11:26 AM Andres Freund wrote: On 2013-06-27 11:16:25 +0530, Amit Kapila wrote: On Wednesday, June 26, 2013 10:19 PM Fujii Masao wrote: On Wed, Jun 26, 2013 at 8:57 PM, Amit Kapila amit.kap...@huawei.com One more use case for which this utility was done is as below: It will be used to decide that on new-standby (old-master) whether a full backup is needed from New-master(old-standby). The backup is required when the data page in old-master precedes the last applied LSN in old-standby (i.e., new-master) at the moment of the failover. That's exactly what I was afraid of. Unless I miss something the tool is *NOT* sufficient to do this. You mean to say if user knows the max LSN of data pages in old- master and last applied LSN in new master, he cannot decide whether Full backup is needed? It should be straightforward decision that skip a backup if that old-master LSN is less than the new-master (i.e., last applied LSN, IOW, timeline switch LSN). It was proposed as a usecase in this below mail: http://www.postgresql.org/message-id/CAHGQGwHyd1fY0hF0qKh0-uKDh- gcbYxMOFBYVk kh4jzji-f...@mail.gmail.com I guess he meant the commit hint bit problem. True, after reading the thread mentioned by Andres, I got the reason he was pointing why it is not sufficient. So can it be useful if database has checksums enabled? I think for that usecase its far more useful to work on getting pg_rewind since that has a chance of working when local WAL has been applied that hadn't yet shipped to the other side (which is frequently the case). Aren't the use case for both is bit different Pg_computmaxlsn - by computing max lsn for checksums enabled database, user can made old-master follow new-master if maxlsn suggests that fullbackup is not required. Pg_rewind - a tool to resynchronize old-master and new-master by copying changed blocks from new master. I think more work might be needed in case DDL's happened on old-master after forking of new-master. Although for this case, both have resemblance in terms of avoiding full backup, but I think maxlsn tool can be independently also used. Do you think pg_rewind can be used by default for any checksum enabled database to resynchronize old-master? With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_filedump 9.3: checksums (and a few other fixes)
On Wed, Jun 26, 2013 at 11:27 PM, Andres Freund and...@2ndquadrant.com wrote: Why not do this from a function/background worker in the backend where you can go via the buffer manager to avoid torn pages et al. If you use a buffer strategy the cache poisoning et al should be controlleable. I had considered that, but thought it might be a little bit aggressive, even with a strategy of BAS_BULKREAD. Maybe the kludge I have in mind might not end up being that bad in practice, and would certainly perform better than an approach that used the buffer manager. But then, going through shared_buffers could be worth the overhead, if only for the peace of mind of not relying on something that is subtly broken. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_filedump 9.3: checksums (and a few other fixes)
On 2013-06-26 23:42:55 -0700, Peter Geoghegan wrote: On Wed, Jun 26, 2013 at 11:27 PM, Andres Freund and...@2ndquadrant.com wrote: Why not do this from a function/background worker in the backend where you can go via the buffer manager to avoid torn pages et al. If you use a buffer strategy the cache poisoning et al should be controlleable. I had considered that, but thought it might be a little bit aggressive, even with a strategy of BAS_BULKREAD. Well, you can influence the pacing yourself, you don't need to rely on the strategy for that. I'd only use it because of the ringbuffer logic it has to avoid trashing the cache. Maybe the kludge I have in mind might not end up being that bad in practice, and would certainly perform better than an approach that used the buffer manager. What do you have in mind then? But then, going through shared_buffers could be worth the overhead, if only for the peace of mind of not relying on something that is subtly broken. Spurious alarms quickly lead to people ignoring them, consciously or not, so trying to take care not to go there sounds like a good idea. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add --progress option to pgbench (submission 3)
Dear Fabien (2013/06/27 14:39), Fabien COELHO wrote: If I show a latency at full load, that would be nclients/tps, not 1/tps. However, I'm hoping to pass the throttling patch to pgbench, in which case the latency to show is a little bit different because the nclients/tps would include sleep time and does not correspond to the latency for the end user. Also, under throttling it would also be useful to show the time lag behind scheduled transactions. All right. Of course, I consider your wishing functions is realized with best implementation. So I would like to know whether the throttling patch is committed and then update the progress patch to take that into account. OK! I watch it and use it. Best regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_filedump 9.3: checksums (and a few other fixes)
On Tue, Jun 18, 2013 at 9:42 AM, Jeff Davis pg...@j-davis.com wrote: I'm not sure what the resolution of Alvaro's concern was, so I left the flag reporting the same as the previous patch. Alvaro's concern was that the new flags added (those added by the foreign key locks patch) do something cute with re-using multiple other bits in an otherwise nonsensical combination to represent a distinct state. So as written, the infoMask if statements will result in spurious reporting of information stored in t_infomask. If you AND some integer with HEAP_XMAX_SHR_LOCK and get something non-zero, you'll surely also get a non-zero result with HEAP_LOCK_MASK, because the latter flag has all the same bits set as the former (plus others, obviously). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash partitioning.
Guys, Single core CPU's are dying for Home users, my cellular has 4 cores. Today's standard is minimum 4 cores per CPU and tomorrow who knows? Parallelization sometimes is only one solution for heavy nightly jobs. From the other hand parallelization is very tricky and unpredictable when it comes into user's hands. Anyway when you have this option (same for hash partitioning) you in much better position than you don't have it. The question is: when we may hope to have it? Sincerely yours, Yuri Levinsky, DBA Celltick Technologies Ltd., 32 Maskit St., Herzliya 46733, Israel Mobile: +972 54 6107703, Office: +972 9 9710239; Fax: +972 9 9710222 -Original Message- From: Markus Wanner [mailto:mar...@bluegap.ch] Sent: Wednesday, June 26, 2013 6:56 PM To: Heikki Linnakangas Cc: Kevin Grittner; Claudio Freire; Robert Haas; Bruce Momjian; Yuri Levinsky; PostgreSQL-Dev Subject: Re: [HACKERS] Hash partitioning. On 06/26/2013 05:46 PM, Heikki Linnakangas wrote: We could also allow a large query to search a single table in parallel. A seqscan would be easy to divide into N equally-sized parts that can be scanned in parallel. It's more difficult for index scans, but even then it might be possible at least in some limited cases. So far reading sequentially is still faster than hopping between different locations. Purely from the I/O perspective, that is. For queries where the single CPU core turns into a bottle-neck and which we want to parallelize, we should ideally still do a normal, fully sequential scan and only fan out after the scan and distribute the incoming pages (or even tuples) to the multiple cores to process. Regards Markus Wanner This mail was received via Mail-SeCure System. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_filedump 9.3: checksums (and a few other fixes)
On Thu, Jun 27, 2013 at 12:07 AM, Peter Geoghegan p...@heroku.com wrote: I'm not sure what the resolution of Alvaro's concern was, so I left the flag reporting the same as the previous patch. Alvaro's concern was that the new flags added (those added by the foreign key locks patch) do something cute with re-using multiple other bits in an otherwise nonsensical combination to represent a distinct state. I just realized that it wasn't that you didn't understand the nature of the problem, but that you weren't sure of a resolution. Sorry for the noise. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] refresh materialized view concurrently
On Tue, Jun 25, 2013 at 9:07 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Jun 21, 2013 at 5:20 AM, Hitoshi Harada umi.tan...@gmail.com wrote: If I don't miss something, the requirement for the CONCURRENTLY option is to allow simple SELECT readers to read the matview concurrently while the view is populating the new data, and INSERT/UPDATE/DELETE and SELECT FOR UPDATE/SHARE are still blocked. So, I wonder why it is not possible just to acquire ExclusiveLock on the matview while populating the data and swap the relfile by taking a small AccessExclusiveLock. This lock escalation is no deadlock hazard, I suppose, because the concurrent operation would block the other at the point ExclusiveLock is acquired, and ExclusiveLock conflicts with AccessExclusiveLock. Then you don't need the complicated SPI logic or unique key index dependency. This is no good. One, all lock upgrades are deadlock hazards. In this case, that plays out as follows: suppose that the session running REFRESH MATERIALIZED VIEW CONCURRENTLY also holds a lock on something else. Some other process takes an AccessShareLock on the materialized view and then tries to take a conflicting lock on the other object. Kaboom, deadlock. Granted, the chances of that happening in practice are small, but it IS the reason why we typically try to avoid having long-running operations perform lock upgrades. Users get really annoyed when their DDL runs for an hour and then rolls back. OK, that's not safe. What I was thinking was something similar to compare-and-swap, where the whole operation is atomic under an AccessExclusiveLock. What if we release ExclusiveLock once a new matview was created and re-acquire AccessExclusiveLock before trying the swap? Note matview is a little different from index which I know people are talking about in the REINDEX CONCURRENTLY thread, in that the content of matview does not change incrementally (at least at this point), but only changes fully in the swapping operation by the same REFRESH MATERIALIZED VIEW command. The only race condition is that between releasing ExclusiveLock and re-acquiring AccessExclusiveLock someone else can go ahead with the same operation and could create another one. If it happens, let's abort us, because I guess that's the way our transaction system is working anyway; in case of unique key index insertion for example, if I find another guy is inserting the same value in the index, I wait for the other guy to finish his work and if his transaction commits I give up, otherwise I go ahead. Maybe it's annoying if an hour-long operation finally gets aborted, but my purpose is actually achieved by the other guy. If the primary goal of this feature is to let readers read the matview concurrently it should be ok? Hmm, but in such cases the first guy always wins and the second guy who may come an hour later loses so we cannot get the result from the latest command... I still wonder if there should be some way. Two, until we get MVCC catalog scans, it's not safe to update any system catalog tuple without an AccessExclusiveLock on some locktag that will prevent concurrent catalog scans for that tuple. Under SnapshotNow semantics, concurrent readers can fail to see that the object is present at all, leading to mysterious failures - especially if some of the object's catalog scans are seen and others are missed. So what I'm saying above is take AccessExclusiveLock on swapping the relfile in the catalog. This doesn't violate your statement, I suppose. 
I'm actually still skeptical about MVCC catalog, because even if you can make catalog lookup MVCC, relfile on the filesystem is not MVCC. If session 1 changes relfilenode in pg_class and commit transaction, delete the old relfile from the filesystem, but another concurrent session 2 that just took a snapshot before 1 made such change keeps running and tries to open this relation, grabbing the old relfile and open it from filesystem -- ERROR: relfile not found. So everyone actually needs to see up-to-date information that synchronizes with what filesystem says and that's SnapshotNow. In my experimental thought above about compare-and-swap way, in compare phase he needs to see the most recent valid information, otherwise he never thinks someone did something new. Since I haven't read the whole thread, maybe we have already discussed about it, but it would help if you clarify this concern. Thanks, -- Hitoshi Harada
Re: [HACKERS] refresh materialized view concurrently
On Wed, Jun 26, 2013 at 1:38 AM, Kevin Grittner kgri...@ymail.com wrote: Hitoshi Harada umi.tan...@gmail.com wrote: I spent a few hours to review the patch. Thanks! As far as I can tell, the overall approach is as follows. - create a new temp heap as non-concurrent does, but with ExclusiveLock on the matview, so that reader wouldn't be blocked Non-concurrent creates the heap in the matview's tablespace and namespace, so the temp part is different in concurrent generation. This difference is why concurrent can be faster when few rows change. It's still not clear to me why you need temp in concurrent and not in non-concurrent. If this type of operations is always creating temp table and just swap it with existing one, why can't we just make it temp always? And if the performance is the only concern, is temp better than just turning off WAL for the table or UNLOGGED table? Also, before the next step there is an ANALYZE of the temp table, so the planner can make good choices in the next step. - with this temp table and the matview, query FULL JOIN and extract difference between original matview and temp heap (via SPI) Right; into another temp table. - this operation requires unique index for performance reason (or correctness reason too) It is primarily for correctness in the face of duplicate rows which have no nulls. Do you think the reasons need to be better documented with comments? Ah, yes, even after looking at patch I was confused if it was for performance or correctness. It's a shame we cannot refresh it concurrently if we have duplicate rows in the matview. I thought it would make sense to allow it without unique key if it was only performance tradeoffs. I also modified the confusing error message to something close to the suggestion from Robert. What to do about the Assert that the matview is not a system relation seems like material for a separate patch. After review, I'm inclined to remove the test altogether, so that extensions can create matviews in pg_catalog. I like this better. New version attached. Will take another look. Thanks, -- Hitoshi Harada
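To make the temp-table-plus-FULL-JOIN approach concrete, here is a rough sketch of the kind of diff query being discussed. It is an illustration only: mv, newdata, base_table, k and v are invented names, the real patch drives this through SPI with generated SQL, and it assumes a unique, not-null key column k.

-- recompute the matview's contents into a temp table, then diff against the matview
CREATE TEMP TABLE newdata AS
  SELECT k, v FROM base_table;   -- stand-in for the matview's defining query
ANALYZE newdata;

SELECT old.k AS delete_k, new.k AS insert_k, new.v AS insert_v
FROM mv AS old
FULL JOIN newdata AS new
  ON old.k = new.k AND old.v IS NOT DISTINCT FROM new.v
WHERE old.k IS NULL    -- row only in the new contents: insert it
   OR new.k IS NULL;   -- row only in the old contents: delete it (an update shows up as one of each)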
[HACKERS] Documentation/help for materialized and recursive views
Is there a particular reason why CREATE RECURSIVE VIEW is part of the help for CREATE VIEW, but CREATE MATERIALIZED VIEW doesn't show up there? I realize the technical reason (they're different man pages, and that also controls what's in \h in psql which is where I ran into it), but was there any particular reason to split those up in the first place? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] refresh materialized view concurrently
On 2013-06-27 00:12:07 -0700, Hitoshi Harada wrote: Two, until we get MVCC catalog scans, it's not safe to update any system catalog tuple without an AccessExclusiveLock on some locktag that will prevent concurrent catalog scans for that tuple. Under SnapshotNow semantics, concurrent readers can fail to see that the object is present at all, leading to mysterious failures - especially if some of the object's catalog scans are seen and others are missed. So what I'm saying above is take AccessExclusiveLock on swapping relfile in catalog. This doesn't violate your statement, I suppose. I'm actually still skeptical about MVCC catalog, because even if you can make catalog lookup MVCC, relfile on the filesystem is not MVCC. If session 1 changes relfilenode in pg_class and commit transaction, delete the old relfile from the filesystem, but another concurrent session 2 that just took a snapshot before 1 made such change keeps running and tries to open this relation, grabbing the old relfile and open it from filesystem -- ERROR: relfile not found. We can play cute tricks akin to what CREATE INDEX CONCURRENTLY currently does, i.e. wait for all other relations that could have possibly seen the old relfilenode (they must have at least a share lock on the relation) before dropping the actual storage. The reason we cannot currently do that in most scenarios is that we cannot perform transactional/mvcc updates of non-exclusively locked objects due to the SnapshotNow problems of seeing multiple or no versions of a row during a single scan. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Group Commits Vs WAL Writes
Hi all, I think this is a naive question. When we do a commit, WAL buffers are written to the disk. This has a disk latency for the required I/O. Now, with group commits, do we see a spike in that disk write latency, especially in the cases where the user has set wal_buffers to a high value? Regards, Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD5 aggregate
On 26 June 2013 22:48, Noah Misch n...@leadboat.com wrote: On Wed, Jun 26, 2013 at 09:04:34PM +0100, Dean Rasheed wrote: On 26 June 2013 19:32, Noah Misch n...@leadboat.com wrote: On Mon, Jun 17, 2013 at 11:34:52AM +0100, Dean Rasheed wrote: md5_agg() is well-defined and not cryptographically novel, and your use case is credible. However, not every useful-sometimes function belongs in core; we mostly stick to functions with ubiquitous value and functions that would be onerous to implement externally. (We do carry legacy stuff that wouldn't make the cut today.) In my estimation, md5_agg() does not make that cut. The variety of intuitive md5_agg() definitions attested upthread doesn't bode well for its broad applicability. It could just as well live in an extension published on PGXN. Mine is just one opinion, though; I won't be horrified if the community wants an md5_agg() in core. I disagree with this though. I see md5_agg() as a natural extension of the already-in-core md5() functions and underlying code, satisfying a well-defined use-case, and providing a checksum comparable with externally generated md5 sums. All true, but I don't see those facts justifying core inclusion. A quick google search reveals several people asking for something like this, and people recommending md5(string_agg(...)) or md5(string_agg(md5(...))) based solutions, which are doomed to failure on larger tables. So I think that there is a case for having md5_agg() in core as an alternative to such hacks, while having more sophisticated crypto functions available as extensions. My nine Google hits for md5(string_agg included one Stack Overflow answer, several mirrors of that answer, and a few posts on this thread. I found more than that using Google. It's not uncommon for people to use Google and then pick the first accepted answer on Stack Overflow, in which case they might well pick one of the answers here [1] or here [2]. Or they might find this [3]. Or if they came to our lists they might find this [4], or deduce from this [5] that it isn't possible. [1] http://stackoverflow.com/questions/4020033/how-can-i-get-a-hash-of-an-entire-table-in-postgresql [2] http://stackoverflow.com/questions/13554333/finding-out-the-hash-value-of-a-group-of-rows [3] https://ralphholz.de/?q=node/298 [4] http://www.postgresql.org/message-id/4e5f469c.5070...@compulab.co.il [5] http://www.postgresql.org/message-id/e94d85500801301153u6b976e31m89e311c7134a0...@mail.gmail.com I'd say there are clearly people who want it, and the nature of some of those answers suggests to me that we ought to have a better answer in core. I haven't looked at pgcrypto because, as I said, performance wasn't my primary criterion either, but removing the unnessary data copy just seemed like good sense. I'll take a look at it, and then as you and Peter suggest, look to split the patch into two. I think I will withdraw md5_total() because I was never entirely happy with that anyway. Understood; feel free to back off from any performance improvements in which you didn't wish to involve yourself. If we do end up moving forward with md5_agg(), I note that the pgcrypto version already has an initialize/accumulate/finalize API division. A patch importing that code largely-unchanged would be easier to verify than a patch restructuring the src/backend/libpq/md5.c implementation. The two patches would then be a use pgcrypto md5.c in core patch and an md5_agg() patch. I'll take a look at it, although I think there is still the potential for bugs either way. 
What I've done with libpq's md5.c is just to rearrange the internal pieces, without touching the core algorithm code or the higher level callers. So the most likely types of bugs are boundary/size errors. If I'd broken any of pg_md5_hash(), pg_md5_binary() or pg_md5_crypt(), then I'd have broken them all. It's possible to get a reasonable level of confidence in those changes using queries like this on HEAD and with the patch: SELECT md5(string_agg(md5(str) || repeat(' ', i), '')) FROM ( SELECT i, string_agg(chr(32+(i+j*31)%224), '') AS str FROM generate_series(0,1) i, generate_series(0,i) j GROUP BY i ) t; and these with the patch: SELECT md5_agg(md5(str) || repeat(' ', i)) FROM ( SELECT i, string_agg(chr(32+(i+j*31)%224), '') AS str FROM generate_series(0,1) i, generate_series(0,i) j GROUP BY i ) t; SELECT md5_agg(md5_agg || repeat(' ', i)) FROM ( SELECT i, md5_agg(chr(32+(i+j*31)%224)) FROM generate_series(0,1) i, generate_series(0,i) j GROUP BY i ) t; which all produce the same overall sum. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD5 aggregate
On 26 June 2013 21:46, Peter Eisentraut pete...@gmx.net wrote: On 6/26/13 4:04 PM, Dean Rasheed wrote: A quick google search reveals several people asking for something like this, and people recommending md5(string_agg(...)) or md5(string_agg(md5(...))) based solutions, which are doomed to failure on larger tables. The thread discussed several other options of checksumming tables that did not have the air of a crytographic offering, as Noah put it. True but md5 has the advantage of being directly comparable with the output of Unix md5sum, which would be useful if you loaded data from external files and wanted to confirm that your import process didn't mangle it. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
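As a concrete illustration of that use case, assuming the proposed md5_agg() aggregate existed (it is not in core), one could checksum imported lines and compare against md5sum(1) on the original file. This is only a sketch with invented names, and byte-for-byte agreement depends on encoding and line endings:

CREATE TABLE import_lines (lineno serial, line text);
\copy import_lines (line) FROM 'data.txt'

-- reassemble the file contents in order and checksum them
SELECT md5_agg(line || E'\n' ORDER BY lineno) FROM import_lines;
-- compare with the shell:  md5sum data.txt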
Re: [HACKERS] in-catalog Extension Scripts and Control parameters (templates?)
On Mon, Jun 24, 2013 at 4:20 AM, Dimitri Fontaine dimi...@2ndquadrant.frwrote: Jaime Casanova ja...@2ndquadrant.com writes: just tried to build this one, but it doesn't apply cleanly anymore... specially the ColId_or_Sconst contruct in gram.y Please find attached a new version of the patch, v7, rebased to current master tree and with some more cleanup. I've been using the new grammar entry NonReservedWord_or_Sconst, I'm not sure about that. I played a bit with this patch. - If I have control file that has the same name as template, create extension picks up control file? Is this by design? - Though control file is kind of global information among different databases, pg_extension_template is not. Sounds a little weird to me. - Why do we need with() clause even if I don't need it? - I copied and paste from my plv8.control file to template script, and MODULE_PATHNAME didn't work. By design? - foo=# create template for extension ex2 version '1.0' with (requires ex1) as $$ $$; ERROR: template option requires takes an argument - create template ex2, create extension ex2, alter template ex2 rename to ex3, create extension ex3, drop template ex3; foo=# drop template for extension ex3 version '1.0'; ERROR: cannot drop unrecognized object 3179 16429 0 because other objects depend on it - a template that is created in another template script does not appear to depend on the parent template. - Without control file/template, attempt to create a new extension gives: foo=# create extension plv8; ERROR: extension plv8 has no default control template but can it be better, like extension plv8 has no default control file or template? - Do we really need separate tables, pg_extension_template and pg_extension_control? - Looks like both tables don't have toast tables. Some experiment gives: ERROR: row is too big: size 8472, maximum size 8160 Thanks, -- Hitoshi Harada
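For reference, the reproduction sequence described in the rename/drop item above, written out as statements. The spelling of the ALTER command is my assumption from the review's wording; I have not checked it against the patch's grammar.

CREATE TEMPLATE FOR EXTENSION ex2 VERSION '1.0' AS $$ $$;
CREATE EXTENSION ex2;
ALTER TEMPLATE FOR EXTENSION ex2 RENAME TO ex3;
CREATE EXTENSION ex3;
DROP TEMPLATE FOR EXTENSION ex3 VERSION '1.0';   -- this is the step that failed with "cannot drop unrecognized object ..."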
Re: [HACKERS] FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
Tom Lane said: Agreed, separating out the function-call-with-trailing-declaration syntaxes so they aren't considered in FROM and index_elem seems like the best compromise. If we do that for window function OVER clauses as well, can we make OVER less reserved? Yes. At least, I tried it with both OVER and FILTER unreserved and there were no grammar conflicts (and I didn't have to do anything fancy to avoid them), and it passed regression with the exception of the changed error message for window functions in the from-clause. So is this the final decision on how to proceed? It seems good to me, and I can work with David to get it done. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
Hello 2013/6/27 Andrew Gierth and...@tao11.riddles.org.uk: Tom Lane said: Agreed, separating out the function-call-with-trailing-declaration syntaxes so they aren't considered in FROM and index_elem seems like the best compromise. If we do that for window function OVER clauses as well, can we make OVER less reserved? Yes. At least, I tried it with both OVER and FILTER unreserved and there were no grammar conflicts (and I didn't have to do anything fancy to avoid them), and it passed regression with the exception of the changed error message for window functions in the from-clause. So is this the final decision on how to proceed? It seems good to me, and I can work with David to get it done. Isn't dangerous do OVER unreserved keyword?? Regards Pavel -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
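For context, this is the kind of query the grammar work is aimed at, combining the existing OVER clause with the FILTER clause proposed in this thread (emp is an invented table):

SELECT dept,
       count(*) FILTER (WHERE salary > 50000) AS well_paid,   -- proposed FILTER syntax
       rank() OVER (ORDER BY avg(salary) DESC) AS dept_rank   -- existing window syntax
FROM emp
GROUP BY dept;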
Re: [HACKERS] Documentation/help for materialized and recursive views
2013/6/27 Magnus Hagander mag...@hagander.net: Is there a particular reason why CREATE RECURSIVE VIEW is part of the help for CREATE VIEW, but CREATE MATERIALIZED VIEW doesn't show up there? I realize the technical reason (they're different man pages, and that also controls what's in \h in psql which is where I ran into it), but was there any particular reason to split those up in the first place? Normal views are an abstraction layer, while materialized views (despite containing the word “view”) are mainly a performance tool (in a way similar to indexes). The functionality of materialized views will (over time) totally swamp that of normal views, so mixing all the corresponding documentation with the documentation for normal views probably doesn’t make things easier for people that are only interested in normal views. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix conversion for Decimal arguments in plpython functions
On 27 June 2013 05:21, Steve Singer st...@ssinger.info wrote: On 06/26/2013 04:47 PM, Szymon Guz wrote: Attached patch has all changes against trunk code. There is added a function for conversion from Postgres numeric to Python Decimal. The Decimal type is taken from cdecimal.Decimal, if it is available. It is an external library, quite fast, but may not be available. If it is not available, then decimal.Decimal will be used. It is in the standard Python library, however it is rather slow. The initialization is done in the conversion function, the pointer to a proper Decimal constructor is stored as a static variable inside the function and is lazily initialized. The documentation is updated. I've tested this version with python 2.7 with and without cdecimal and also with 3.3 that has the faster decimal performance. It seems fine. The v5 version of the patch makes only white-space changes to plpy_main.c; you should exclude that from the patch if you're making a new version (I have done this in the v6 version I'm attaching). Tests for python 2 and 3 have been added. They work only with standard decimal.Decimal, as the type is printed in the *.out files. I think there is nothing we can do with that now. I think we should make test_type_conversion_numeric do something that generates the same output in both cases, i.e. plpy.info(str(x)). The downside of having the test fail on installs with cdecimal installed is much greater than any benefit we get by ensuring that the type is really Decimal. I've attached a v6 version of the patch that does this, do you agree with my thinking? Hi Steve, thanks for the changes. Your idea about common code for decimal and cdecimal is good, however not good enough. I like the idea of common code for decimal and cdecimal. But we need the class name, not the value. I've changed the code from str(x) to x.__class__.__name__ so the function prints the class name (which is Decimal for both packages), not the value. We need to have the class name check. The value is returned by the function and is a couple of lines lower in the file. patch is attached. thanks, Szymon plpython_decimal_v7.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
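A small sketch of the behaviour the tests aim at, assuming the patch is applied (the function name is invented): a numeric argument should arrive in PL/Python as a Decimal, whether it comes from cdecimal or the standard decimal module, and both report the same class name.

CREATE FUNCTION py_numeric_roundtrip(x numeric) RETURNS numeric AS $$
# with the patch, x is a Decimal; both implementations report the same class name
plpy.info(x.__class__.__name__)
return x
$$ LANGUAGE plpythonu;

SELECT py_numeric_roundtrip(3.14159);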
Re: [HACKERS] Bugfix and new feature for PGXS
Le mercredi 26 juin 2013 16:52:01, Andrew Dunstan a écrit : On 06/25/2013 11:29 AM, Cédric Villemain wrote: Le mardi 25 juin 2013 17:18:51, Andrew Dunstan a écrit : On 06/24/2013 07:24 PM, Cédric Villemain wrote: Le mardi 25 juin 2013 00:18:26, Andrew Dunstan a écrit : On 06/24/2013 04:02 PM, Cédric Villemain wrote: WIth extension, we do have to set VPATH explicitely if we want to use VPATH (note that contribs/extensions must not care that postgresql has been built with or without VPATH set). My patches try to fix that. No, this is exactly what I'm objecting to. I want to be able to do: invoke_vpath_magic standard_make_commands_as_for_non_vpath Your patches have been designed to overcome your particular issues, but they don't meet the general case, IMNSHO. This is why I want to have more discussion about how vpath builds could work for PGXS modules. The patch does not restrict anything, it is not supposed to lead to regression. The assignment of VPATH and srcdir are wrong in the PGXS case, the patch correct them. You're still free to do make VPATH=/mypath ... the VPATH provided won't be erased by the pgxs.mk logic. I still think this is premature. I have spent some more time trying to make it work the way I think it should, so far without success. I think we need more discussion about how we'd like VPATH to work for PGXS would. There's no urgency about this - we've lived with the current situation for quite a while. Sure... I did a quick and dirty patch (I just validate without lot of testing), attached to this email to fix json_build (at least for make, make install) As you're constructing targets based on commands to execute in the srcdir directory, and because srcdir is only set in pgxs.mk, it is possible to define srcdir early in the json_build/Makefile and use it. This still doesn't do what I really want, which is to be able to invoke make without anything special in the invocation that triggers VPATH processing. I believe it is the case currently (with my patches applyed), we just have to invoke Makefile like we invoke configure for PostgreSQL, except that we don't need a configure stage with the contribs. Obviously it is not exactly the same because 'configure' is a script to execute, but 'make' is a command with a configuration file (the Makefile) For PostgreSQL it is: $ mkdir build_dir $ cd build_dir $ path/to/source/tree/configure [options go here] $ gmake For contribs it is: $ mkdir build_dir $ cd build_dir $ gmake -f /path/to/source/tree/Makefile [options go here] Here's what I did that works like I think it should. First the attached patch on top of yours. Second, I did: mkdir vpath.json_build cd vpath.json_build sh/path/to/pgsource/config/prep_buildtree ../json_build . ln -s ../json_build/json_build.control . OK, this creates supposedly required directories in the build process. This looks a bit wrong and more work than required: not all the source directories are required by the build process. But I understand the point. Second, config/prep_buildtree is not installed (by make install), so it is not suitable to use it with PGXS (PGXS is found with pg_config, when the postgresql source tree is not accessible anymore). We may add config/prep_buildtree to INSTALL_PROGRAM. The 'ln -s' can probably be copied by prep_buildtree. Then I created vpath.mk with these contents: ext_srcdir = ../json_build USE_VPATH = $(ext_srcdir) OK. Finally I vpath-ized the Makefile (see attached). I don't see how this is more pretty than other solution. 
Given all of that, I was able to do, in the vpath directory: make make install make installcheck make clean and it all just worked, with exactly the same make invocations as work in an in-tree build. It breaks others: mkdir /tmp/auth_delay cd /tmp/auth_delay sh /path/prep_buildtree /path/contrib/auth_delay/ . (no .control, it's not an extension) make make: *** No rule to make target 'auth_delay.so', needed by 'all'. Stop. This works: cd /tmp/auth_delay rm -rf * make -f /path/contrib/auth_delay/Makefile PS: I exported USE_PGXS=1 because of the switch case in the makefile. So what's lacking to make this nice is a tool to automate the second and third steps above. If the second and third steps are automated, you'll still have to invoke them at some point. How ? mkdir /tmp/json_build cd /tmp/json_build make make install make installcheck make clean - this will never work, make won't find anything to do. You'll need the equivalent of configure (as we do for PostgreSQL, I agree) to trigger prep_buildtree: mkdir /tmp/json_build cd /tmp/json_build $ path/to/source/tree/configure [options go here] make make install make installcheck make clean Maybe there are other ways of doing this, but this
Re: [HACKERS] Hash partitioning.
2013/6/26 Heikki Linnakangas hlinnakan...@vmware.com: On 26.06.2013 16:41, Yuri Levinsky wrote: Heikki, As far as I understand the height of the btree will affect the number of I/Os necessary. The height of the tree does not increase linearly with the number of records. Now let's compare that with a hash partitioned table, with 1000 partitions and a b-tree index on every partition. [..] This is almost equivalent to just having a b-tree that's one level taller [..] There certainly isn't any difference in the number of actual I/O performed. Imagine that there are a lot of indexes, e.g., 50. Although a lookup (walking one index) is equally fast, an insertion must update all 50 indexes. When each index requires one extra I/O (because each index is one level taller), that is 50 extra I/Os. In the partitioned case, each index would require the normal smaller amount of I/Os. Choosing which partition to use must only be done once: The result “counts” for all indexes that are to be updated. Additionally: Imagine that the data can be partitioned along some column that makes sense for performance reasons (e.g., some “date” where most accesses are concentrated on rows with more recent dates). The other indexes will probably not have such a performance distribution. Using those other indexes (both for look-ups and updates) in the non-partitioned case, will therefore pull a huge portion of each index into cache (because of the “random distribution” of the non-date data). In the partitioned case, more cache can be spent on the indexes that correspond to the “hot partitions.” Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
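For readers wondering what hash partitioning looks like with today's tools, here is a rough, hand-rolled approximation using inheritance and CHECK constraints (all names invented). It illustrates the smaller per-partition indexes discussed above, though routing rows and getting constraint exclusion to fire remain manual:

CREATE TABLE events (id bigint NOT NULL, payload text);
CREATE TABLE events_p0 (CHECK (abs(hashint8(id)) % 4 = 0)) INHERITS (events);
CREATE TABLE events_p1 (CHECK (abs(hashint8(id)) % 4 = 1)) INHERITS (events);
CREATE TABLE events_p2 (CHECK (abs(hashint8(id)) % 4 = 2)) INHERITS (events);
CREATE TABLE events_p3 (CHECK (abs(hashint8(id)) % 4 = 3)) INHERITS (events);
CREATE INDEX ON events_p0 (id);   -- one shallow index per partition (repeat for p1..p3)

-- constraint exclusion can only skip partitions if the query repeats the hash expression
SELECT * FROM events
WHERE id = 42 AND abs(hashint8(id)) % 4 = abs(hashint8(42)) % 4;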
Re: [HACKERS] Hash partitioning.
2013/6/27 Nicolas Barbier nicolas.barb...@gmail.com: When each index requires one extra I/O (because each index is one level taller), that is 50 extra I/Os. In the partitioned case, each index would require the normal smaller amount of I/Os. [..] Using those other indexes (both for look-ups and updates) in the non-partitioned case, will therefore pull a huge portion of each index into cache (because of the “random distribution” of the non-date data). In the partitioned case, more cache can be spent on the indexes that correspond to the “hot partitions.” It seems that the system described by Claudio fixes this problem another way: Claudio wrote: Now I just have two indices. One that indexes only hot tuples, it's very heavily queried and works blazingly fast, and one that indexes by (hotness, key). Yuri, maybe that is something you should investigate instead of partitioning? Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checking variadic any argument in parser - should be array
Hi Pavel, I had a look over your new patch and it looks good to me. My review comments on the patch: 1. It cleanly applies with the patch -p1 command. 2. make/make install/make check were smooth. 3. My own testing didn't find any issue. 4. I had a code walk-through and I am a little bit worried or confused about the following changes: if (PG_ARGISNULL(argidx)) return NULL; ! /* ! * Non-null argument had better be an array. The parser doesn't ! * enforce this for VARIADIC ANY functions (maybe it should?), so that ! * check uses ereport not just elog. ! */ ! arr_typid = get_fn_expr_argtype(fcinfo->flinfo, argidx); ! if (!OidIsValid(arr_typid)) ! elog(ERROR, "could not determine data type of concat() input"); ! ! if (!OidIsValid(get_element_type(arr_typid))) ! ereport(ERROR, ! (errcode(ERRCODE_DATATYPE_MISMATCH), ! errmsg("VARIADIC argument must be an array"))); - /* OK, safe to fetch the array value */ arr = PG_GETARG_ARRAYTYPE_P(argidx); /* --- 3820,3828 if (PG_ARGISNULL(argidx)) return NULL; ! /* Non-null argument had better be an array */ ! Assert(OidIsValid(get_element_type(get_fn_expr_argtype(fcinfo->flinfo, argidx)))); arr = PG_GETARG_ARRAYTYPE_P(argidx); We have moved the checking of the array type into the parser (ParseFuncOrColumn()), which basically verifies that the argument type is indeed an array. This is exactly the same as the second if statement in the earlier code, which you replaced by an Assert. However, what about the first if statement ? Is it NO more required now? What if get_fn_expr_argtype() returns InvalidOid, don't you think we need to throw an error saying could not determine data type of concat() input? Well, I tried hard to trigger that code, but wasn't able to get any test which fails with that error in the earlier version or with your version. And thus I believe it is dead code, which you removed ? Is it so ? Moreover, if in any case get_fn_expr_argtype() returns an InvalidOid, we will hit an Assert rather than an error, is this what you expect ? 5. This patch has user visibility, i.e. now we are throwing an error when the user only says VARIADIC NULL like: select concat(variadic NULL) is NULL; Previously it was working but now we are throwing an error. Well, we are now stricter than earlier with using VARIADIC + ANY, so I have no issue as such. But I guess we need to document this user visibility change. I don't know exactly where though. I searched for VARIADIC and all related documentation says it needs an array, so nothing harmful as such, so you can ignore this review comment but I thought it worth mentioning. Thanks On Thu, Jun 27, 2013 at 12:35 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello remastered version Regards Pavel 2013/6/26 Jeevan Chalke jeevan.cha...@enterprisedb.com: Hi Pavel On Sat, Jan 26, 2013 at 9:22 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello Tom you did comment ! * Non-null argument had better be an array. The parser doesn't ! * enforce this for VARIADIC ANY functions (maybe it should?), so ! * that check uses ereport not just elog. ! */ So I moved this check to the parser. It is a little bit stricter - it requests a typed NULL instead of an unknown NULL, but it can mark the error better and earlier Tom suggested that this check should better be done by the parser. This patch tries to accomplish that. I will go review this. However, is it possible for you to re-base it on current master? I can't apply it using git apply but patch -p1 succeeded with a lot of offset. 
Thanks Regards Pavel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Jeevan B Chalke Senior Software Engineer, RD EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: +91 20 30589500 Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. -- Jeevan B Chalke Senior Software Engineer,
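To summarize the user-visible change discussed in point 5 (behaviour as described in the review above; exact messages may differ):

SELECT concat(VARIADIC ARRAY['a','b','c']);  -- ok: the VARIADIC argument is an array
SELECT concat(VARIADIC NULL::text[]);        -- ok: a typed NULL array is still accepted, result is NULL
SELECT concat(VARIADIC NULL);                -- previously returned NULL; with the patch: ERROR, VARIADIC argument must be an array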
Re: [HACKERS] Documentation/help for materialized and recursive views
On Thu, Jun 27, 2013 at 10:56 AM, Nicolas Barbier nicolas.barb...@gmail.com wrote: 2013/6/27 Magnus Hagander mag...@hagander.net: Is there a particular reason why CREATE RECURSIVE VIEW is part of the help for CREATE VIEW, but CREATE MATERIALIZED VIEW doesn't show up there? I realize the technical reason (they're different man pages, and that also controls what's in \h in psql which is where I ran into it), but was there any particular reason to split those up in the first place? Normal views are an abstraction layer, while materialized views (despite containing the word “view”) are mainly a performance tool (in a way similar to indexes). Oh yes, I'm well aware of this of course. The functionality of materialized views will (over time) totally swamp that of normal views, so mixing all the corresponding documentation with the documentation for normal views probably doesn’t make things easier for people that are only interested in normal views. That's a better point I think. That said, it would be very useful if it actually showed up in \h CREATE VIEW in psql - I wonder if we should just add the syntax to that page, and then link said future information on a separate page somehow? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
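For anyone comparing the two pages, the commands in question look like this (9.3 syntax, illustrative names):

-- documented under CREATE VIEW
CREATE RECURSIVE VIEW nums (n) AS
    VALUES (1)
  UNION ALL
    SELECT n + 1 FROM nums WHERE n < 10;

-- documented on its own page
CREATE MATERIALIZED VIEW nums_snapshot AS
    SELECT n FROM nums;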
Re: [HACKERS] Hash partitioning.
On Jun 27, 2013 12:24 PM, Nicolas Barbier nicolas.barb...@gmail.com wrote: 2013/6/27 Nicolas Barbier nicolas.barb...@gmail.com: When each index requires one extra I/O (because each index is one level taller), that is 50 extra I/Os. In the partitioned case, each index would require the normal smaller amount of I/Os. [..] Using those other indexes (both for look-ups and updates) in the non-partitioned case, will therefore pull a huge portion of each index into cache (because of the “random distribution” of the non-date data). In the partitioned case, more cache can be spent on the indexes that correspond to the “hot partitions.” It seems that the system described by Claudio fixes this problem another way: Claudio wrote: Now I just have two indices. One that indexes only hot tuples, it's very heavily queried and works blazingly fast, and one that indexes by (hotness, key). This is not really related to hash partitioning, but you can also do index partitioning while having the tables unpartitioned. If the hotness field is a timestamp like it often is, you can create a predicate index on (key, tstamp) where tstamp > [some date in recent past], and replace the index with a newer one every so often to keep the size small. This way you can have a non-partitioned index for batch queries and a small one for the OLTP workload. If we added the option to build indexes using an index only scan, building the replacement index would be quite cheap. Regards, Ants Aasma
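A sketch of that rolling predicate-index idea (table and column names invented; the cutoff must be a constant, which is why the index is periodically rebuilt with a fresher one):

CREATE INDEX CONCURRENTLY orders_hot_idx ON orders (key, tstamp)
    WHERE tstamp > '2013-06-01';
-- some weeks later, build a replacement with a newer cutoff, then drop the old one
CREATE INDEX CONCURRENTLY orders_hot_idx_new ON orders (key, tstamp)
    WHERE tstamp > '2013-07-01';
DROP INDEX CONCURRENTLY orders_hot_idx;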
Re: [HACKERS] in-catalog Extension Scripts and Control parameters (templates?)
Hi, Thanks a lot for your review! Some answers here, new version of the patch with fixes by tuesday. Hitoshi Harada umi.tan...@gmail.com writes: - If I have control file that has the same name as template, create extension picks up control file? Is this by design? Yes. That should allow to answer most of Heikki's security complaint, but isn't enough to allow us to open the feature to non superuser if I understand correctly. - Though control file is kind of global information among different databases, pg_extension_template is not. Sounds a little weird to me. I think that's a limitation of the old model and we don't want to turn templates for extensions into being shared catalogs. At least that's my understanding of the design consensus. - Why do we need with() clause even if I don't need it? Will have a fresh look, thanks. - I copied and paste from my plv8.control file to template script, and MODULE_PATHNAME didn't work. By design? Yes. MODULE_PATHNAME is deprecated by the control file parameter of the same name, and there's no reason to use it in extension templates even if we got to support C coded extensions in them, which is not the case now. foo=# create template for extension ex2 version '1.0' with (requires ex1) as $$ $$; ERROR: template option requires takes an argument Will see about that. - create template ex2, create extension ex2, alter template ex2 rename to ex3, create extension ex3, drop template ex3; foo=# drop template for extension ex3 version '1.0'; ERROR: cannot drop unrecognized object 3179 16429 0 because other objects depend on it Well, if I'm following, you're trying to remove a non-existing object. I guess you would prefer a better error message, right? - a template that is created in another template script does not appear to depend on the parent template. I don't think that should be automatically the case, even if I admit I didn't think about that case. - Without control file/template, attempt to create a new extension gives: foo=# create extension plv8; ERROR: extension plv8 has no default control template but can it be better, like extension plv8 has no default control file or template? Will rework. - Do we really need separate tables, pg_extension_template and pg_extension_control? Yes, to be able to have the same feature as we have today with auxilliary control files, that is change properties of the extension from a version to the next. - Looks like both tables don't have toast tables. Some experiment gives: ERROR: row is too big: size 8472, maximum size 8160 Will fix in next version of the patch. Thanks, -- Dimitri Fontaine06 63 07 10 78 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ASYNC Privileges proposal
So I would think that if this was to go further then channels would need to be more of a first class citizen and created explicitly, with CREATE CHANNEL, DROP CHANNEL etc: CREATE CHANNEL channame; GRANT LISTEN ON CHANNEL channame TO rolename; GRANT NOTIFY ON CHANNEL channame TO rolename; LISTEN channame; -- OK NOTIFY channame, 'hi'; -- OK LISTEN ; -- exception: no channel named NOTIFY , 'hi'; -- exception: no channel named Personally I think explicitly creating channels would be beneficial; I have hit issues where an typo in a channel name has caused a bug to go unnoticed for a while. But of course this would break backwards compatibility with the current model (with implicit channel names) so unless we wanted to force everyone to add CREATE CHANNEL statements during their upgrade then, maybe there would need to be some kind of system to workaround this Possibly some kind of catch-all channel, that enables implicit channel names? GRANT LISTEN, NOTIFY ON CHANNEL * TO PUBLIC; -- enabled by default for backwards compat NOTIFY ; -- OK via * CHANNEL LISTEN ; -- OK via * CHANNEL Chris On 18 June 2013 18:31, Josh Berkus j...@agliodbs.com wrote: I had a quick play to see what might be involved [attached], and would like to hear people thoughts; good idea, bad idea, not like that! etc I question the usefulness of allowing listen/notify to be restricted to an entire class of users. The granularity of this seems too broad, though I am not sure if allowing message to be sent to a specific user is easily achievable. Well, if we're going to have privs on LISTEN/NOTIFY at all, they should be on specific message bands, i.e.: REVOKE LISTEN ON 'cacheupdates' FROM PUBLIC; GRANT LISTEN ON 'cacheupdates' TO webuser; GRANT LISTEN ON ALL TO admin; I can certainly see wanting this, but it'd be a great deal more sophisticated than what Chris has proposed. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
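For contrast, this is roughly what the current, privilege-free model looks like, where any channel name springs into existence on first use (the channel name and payload are arbitrary examples):

LISTEN cacheupdates;
NOTIFY cacheupdates, 'refresh';
SELECT pg_notify('cacheupdates', 'refresh');  -- function form, handy when the channel name is computed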
Re: [HACKERS] in-catalog Extension Scripts and Control parameters (templates?)
On Thu, Jun 27, 2013 at 2:49 AM, Dimitri Fontaine dimi...@2ndquadrant.frwrote: Hi, Thanks a lot for your review! Some answers here, new version of the patch with fixes by tuesday. Hitoshi Harada umi.tan...@gmail.com writes: - create template ex2, create extension ex2, alter template ex2 rename to ex3, create extension ex3, drop template ex3; foo=# drop template for extension ex3 version '1.0'; ERROR: cannot drop unrecognized object 3179 16429 0 because other objects depend on it Well, if I'm following, you're trying to remove a non-existing object. I guess you would prefer a better error message, right? Right. unrecognized object x y z doesn't look good. - a template that is created in another template script does not appear to depend on the parent template. I don't think that should be automatically the case, even if I admit I didn't think about that case. Really? My understanding is everything that is created under extension depends on the extension, which depends on the template. Why only template is exception? -- Hitoshi Harada
Re: [HACKERS] PostgreSQL 9.3 latest dev snapshot
On Tue, Jun 25, 2013 at 3:31 PM, Michael Paquier michael.paqu...@gmail.com wrote: On 2013/06/25, at 22:23, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Jun 25, 2013 at 6:33 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Jun 25, 2013 at 5:33 PM, Misa Simic misa.si...@gmail.com wrote: Hi, Where can we find the latest snapshot for the 9.3 version? We have taken the latest snapshot from http://ftp.postgresql.org/pub/snapshot/dev/ But it seems it is for the 9.4 version... 9.3 has moved to branch REL9_3_STABLE a couple of days ago. Yes. We can find the snapshot from the REL9_3_STABLE git branch. http://git.postgresql.org/gitweb/?p=postgresql.git;a=shortlog;h=refs/heads/REL9_3_STABLE Indeed, I completely forgot that you can download snapshots from postgresql.org's git. Simply use that instead of the FTP server now, as long as 9.3 snapshots are not generated there. In case somebody is still looking, snapshots are properly building for 9.3 now. Those snapshots aren't identical to a download from git, as they've gone through a make dist-prep or whatever it's called. But they're pretty close. However, if you're looking for a snapshot, please use the one on the FTP site. Generating those snapshots on the git server is slow and expensive... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Min value for port
Is there a reason why we have set the min allowed value for port to 1, not 1024? Given that you can't actually start postgres with a value below 1024, shouldn't the entry in pg_settings reference that as well? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
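For reference, the entry being discussed can be inspected directly; min_val currently reports 1 and max_val 65535:

SELECT name, setting, min_val, max_val FROM pg_settings WHERE name = 'port';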
Re: backend hangs at immediate shutdown (Re: [HACKERS] Back-branch update releases coming in a couple weeks)
Hi, Alvaro san, From: Alvaro Herrera alvhe...@2ndquadrant.com MauMau escribió: Yeah, I see that --- after removing that early exit, there are unwanted messages. And in fact there are some signals sent that weren't previously sent. Clearly we need something here: if we're in immediate shutdown handler, don't signal anyone (because they have already been signalled) and don't log any more messages; but the cleaning up of postmaster's process list must still be carried out. Would you please add that on top of the attached cleaned up version of your patch? I did this. Please find attached the revised patch. I modified HandleChildCrash(). I tested the immediate shutdown, and the child cleanup succeeded. In addition, I added if condition at the end of the function. This is to prevent resetting AbortStartTime every time one child terminates. Regards MauMau reliable_immediate_shutdown-3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: query result history in psql
Thank you for the review! There were a few english/grammatical mistakes that I went ahead and fixed. Thank you for that. If you could send me a patch-to-a-patch so I can correct all the mistakes in the next release? Additionally, I think some of the string manipulation might be placed outside of the main ans.c file. I don't know if there's a better place for 'EscapeForCopy' and 'GetEscapedLen'. Not really a big deal, just an organizational idea. I also changed 'EscapeForCopy' to 'EscapeAndCopy'. I think that better describes the functionality. 'EscapeForCopy' kind of implies that another function is needed to copy the string. The 'EscapeForCopy' was meant to mean 'Escape string in a format require by the COPY TEXT format', so 'copy' in the name refers to the escaping format, not the action performed by the function. They could be, indeed, placed in separate module. I'll do it. What does 'ans' stand for? I am not sure how it relates to the concept of a query history. It didn't stop my understanding of the code, but I don't know if a user will immediately know the meaning. Some mathematical toolkits, like Matlab or Mathematica, automatically set a variable called 'ans' (short for answer) containing the result of the last operation. I was trying to emulate exactly this behaviour. Probably the biggest problem is that the query history list is missing a maximum size variable. I think this could be valuable for preventing users from shooting themselves in the foot. If the user is running large queries, they might accidentally store too much data. This probably somewhat of an edge-case but I believe it is worth considering. We could provide a sensible default limit (10 queries?) and also allow the user to change it. I was considering such a behaviour. But since the feature is turned off by default, I decided that whoever is using it, is aware of cost. Instead of truncating the history automatically (which could lead to a nasty surprise), I decided to equip the user with \ansclean , a command erasing the history. I believe that it is better to let the user decide when history should be erased, instead of doing it automatically. Finally, is it worth resetting the query history every time a user reconnects to the database? I can see how this might interrupt a user's workflow. If the user suddenly disconnects (network connection interrupted, etc) then they would lose their history. I think this is definitely up for debate. It would add more management overhead (psql options etc) and might just be unnecessary. However, with a sane limit to the size of the query history, I don't know if there would be too many drawbacks from a storage perspective. The history is not erased. The history is always stored in the client's memory. When a history item is used for the first time, a TEMPORARY table is created in the database that stores the data server-side. When user disconnects from the database, the session ends and all these tables are dropped. Tables names have to be removed from the history, so next time the item is used, the table will be created and populated again. I use the feature while switching often between databases, and it works seamlessly. Actually, it's quite useful to move bits of data across databases: Connect to database A, run a query, connect to database B, run another query joining local data with the results of the previous query. Those issues aside - I think it's a great feature! I can add the grammatical fixes I made whenever the final patch is ready. 
Or earlier, whatever works for you. Also, this is my first time reviewing a patch, so please let me know if I can improve on anything. Thanks! This is my first submitted patch, so I can't really comment on the process. But if you could add the author's email to CC, the message would be much easier to spot. I replied after two days only because I missed the message in the flood of other pgsql-hacker messages. I think I need to scan the list more carefully... Maciej
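A rough sketch of the server-side effect being described: when a stored result is first reused, the client replays the remembered rows into a session-local table that later statements can reference. All names here are invented for illustration only:

CREATE TEMPORARY TABLE ans1 (id int, val text);
INSERT INTO ans1 VALUES (1, 'row replayed from the client-side history');
-- Later statements in the same session can join ans1 against local tables;
-- the table (and all other server-side copies) vanishes when the session ends.
SELECT * FROM ans1;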
Re: [HACKERS] Reduce maximum error in tuples estimation after vacuum.
On Wednesday, June 26, 2013 7:40 AM Kyotaro HORIGUCHI wrote: I've recovered from messing up. snip Please let me have a bit of time to diagnose this. I was completely messed up and walking on the wrong way. I looked into the vacuum for UPDATEs, not DELETEs, so it's quite reasonable to have such results. The renewed test script attached shows the verbose output of vacuum after the deletes. I had the following output from it. # I believe this runs for you.. | INFO: t: found 98 removable, 110 nonremovable row | versions in 6308 out of 10829 pages In such a partially-scanned case, lazy_scan_heap() tries to estimate the resulting num_tuples in vac_estimate_reltuples() assuming uniformity of tuple density, which fails for such a strong imbalance made by bulk updates. Do you find any differences between what you will have and the following I had? I could see the same output with your latest script, and I could also reproduce the test if I run the test with individual sql statements. One of the main points for reproducing the individual test was to keep autovacuum = off. Now I can look into it further. I have still not gone through in detail your new approach to calculate the reltuples, but I am wondering whether there can be any way with which estimates can be improved with a different calculation in vac_estimate_reltuples(). One thing I have observed is that the 2nd parameter is_analyze of vac_estimate_reltuples() is currently not used. I cannot work on it till early next week, so others are welcome to join the review. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
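The kind of check being discussed can be reproduced by hand along these lines; the table name and delete predicate are placeholders, and autovacuum is assumed off so nothing interferes between the steps:

DELETE FROM t WHERE id % 10 = 0;   -- a bulk delete, concentrated however the test dictates
VACUUM VERBOSE t;                  -- note the "found N removable ... in X out of Y pages" line
SELECT reltuples::bigint AS estimated,
       (SELECT count(*) FROM t) AS actual
FROM pg_class WHERE relname = 't';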
Re: [HACKERS] MD5 aggregate
On Thu, Jun 27, 2013 at 11:28 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: On 26 June 2013 21:46, Peter Eisentraut pete...@gmx.net wrote: On 6/26/13 4:04 PM, Dean Rasheed wrote: A quick google search reveals several people asking for something like this, and people recommending md5(string_agg(...)) or md5(string_agg(md5(...))) based solutions, which are doomed to failure on larger tables. The thread discussed several other options of checksumming tables that did not have the air of a crytographic offering, as Noah put it. True but md5 has the advantage of being directly comparable with the output of Unix md5sum, which would be useful if you loaded data from external files and wanted to confirm that your import process didn't mangle it. The problem with md5_agg() is that it's only useful in toy scenarios. It's more useful give people script that does same sum(hash(row)) on dump file than try to run MD5 on ordered rows. Also, I don't think anybody actually cares about MD5(table-as-bytes), instead people want way to check if 2 tables or table and dump are same. -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
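A sketch of the sum(hash(row)) approach mentioned here, assuming the two tables have identical row types. hashtext() is an internal, non-cryptographic hash, so this catches accidental divergence rather than deliberate tampering, and an order-independent sum can in principle collide:

SELECT sum(hashtext(t::text)) FROM original_table AS t;
SELECT sum(hashtext(t::text)) FROM restored_table AS t;
-- equal results suggest the contents match, regardless of row order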
Re: [HACKERS] in-catalog Extension Scripts and Control parameters (templates?)
Hitoshi Harada umi.tan...@gmail.com writes: - a template that is created in another template script does not appear to depend on the parent template. I don't think that should be automatically the case, even if I admit I didn't think about that case. Really? My understanding is everything that is created under extension depends on the extension, which depends on the template. Why only template is exception? Oh sorry, I understood you meant at CREATE TEMPLATE FOR EXTENSION time rather than at CREATE EXTENSION time, and that you were refering to dependency as in the require control parameter. The pg_depend entry against the extension should be there, will fix. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
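One way to check whether the expected pg_depend entry is present is to list everything the resulting extension depends on; 'ex2' stands in for whatever extension the template produced:

SELECT pg_describe_object(d.refclassid, d.refobjid, d.refobjsubid) AS referenced_object,
       d.deptype
FROM pg_depend d
WHERE d.classid = 'pg_extension'::regclass
  AND d.objid = (SELECT oid FROM pg_extension WHERE extname = 'ex2');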
[HACKERS] [9.3 doc fix] ECPG VAR command does not describe the actual specification
Hello, Looking around the 9.3 doc, I found a small, but not-insignificant error in the documentation. http://www.postgresql.org/docs/9.3/static/ecpg-sql-var.html According to the description, EXEC SQL VAR a IS int; is equivalent to: Exec sql begin declare section; int a; exec sql end declare section; However, ecpg var.pgc produces the error: var.pgc:1: ERROR: variable a is not declared On the other hand, the following code can be preprocessed correctly, treating the host variable a as an int type. -- Exec sql begin declare section; short a; exec sql end declare section; EXEC SQL VAR a IS int; EXEC SQL SELECT c INTO :a FROM mytable; -- The attached patch fixes this documentation error. This is not specific to 9.3, so it needs to be backported. Could you commit this? Regards MauMau ecpg_var_doc.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Min value for port
On 6/27/13 6:34 AM, Magnus Hagander wrote: Is there a reason why we have set the min allowed value for port to 1, not 1024? Given that you can't actually start postgres with a value below 1024, shouldn't the entry in pg_settings reference that as well? Are you thinking of the restriction that you need to be root to use ports below 1024? That restriction is not necessarily universal. We can let the kernel tell us at run time if it doesn't like our port. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Move unused buffers to freelist
On Wed, Jun 26, 2013 at 8:09 AM, Amit Kapila amit.kap...@huawei.com wrote: Configuration Details O/S - Suse-11 RAM - 128GB Number of Cores - 16 Server Conf - checkpoint_segments = 300; checkpoint_timeout = 15 min, synchronous_commit = 0FF, shared_buffers = 14GB, AutoVacuum=off Pgbench - Select-only Scalefactor - 1200 Time - 30 mins 8C-8T16C-16T32C-32T64C-64T Head 62403101810 99516 94707 Patch 62827101404 99109 94744 On 128GB RAM, if use scalefactor=1200 (database=approx 17GB) and 14GB shared buffers, this is no major difference. One of the reasons could be that there is no much swapping in shared buffers as most data already fits in shared buffers. I'd like to just back up a minute here and talk about the broader picture here. What are we trying to accomplish with this patch? Last year, I did some benchmarking on a big IBM POWER7 machine (16 cores, 64 hardware threads). Here are the results: http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html Now, if you look at these results, you see something interesting. When there aren't too many concurrent connections, the higher scale factors are only modestly slower than the lower scale factors. But as the number of connections increases, the performance continues to rise at the lower scale factors, and at the higher scale factors, this performance stops rising and in fact drops off. So in other words, there's no huge *performance* problem for a working set larger than shared_buffers, but there is a huge *scalability* problem. Now why is that? As far as I can tell, the answer is that we've got a scalability problem around BufFreelistLock. Contention on the buffer mapping locks may also be a problem, but all of my previous benchmarking (with LWLOCK_STATS) suggests that BufFreelistLock is, by far, the elephant in the room. My interest in having the background writer add buffers to the free list is basically around solving that problem. It's a pretty dramatic problem, as the graph above shows, and this patch doesn't solve it. There may be corner cases where this patch improves things (or, equally, makes them worse) but as a general point, the difficulty I've had reproducing your test results and the specificity of your instructions for reproducing them suggests to me that what we have here is not a clear improvement on general workloads. Yet such an improvement should exist, because there are other products in the world that have scalable buffer managers; we currently don't. Instead of spending a lot of time trying to figure out whether there's a small win in narrow cases here (and there may well be), I think we should back up and ask why this isn't a great big win, and what we'd need to do to *get* a great big win. I don't see much point in tinkering around the edges here if things are broken in the middle; things that seem like small wins or losses now may turn out otherwise in the face of a more comprehensive solution. One thing that occurred to me while writing this note is that the background writer doesn't have any compelling reason to run on a read-only workload. It will still run at a certain minimum rate, so that it cycles the buffer pool every 2 minutes, if I remember correctly. But it won't run anywhere near fast enough to keep up with the buffer allocation demands of 8, or 32, or 64 sessions all reading data not all of which is in shared_buffers at top speed. In fact, we've had reports that the background writer isn't too effective even on read-write workloads. 
The point is - if the background writer isn't waking up and running frequently enough, what it does when it does wake up isn't going to matter very much. I think we need to spend some energy poking at that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
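One way to see how much of the working set actually fits in shared_buffers is the contrib pg_buffercache view; relation names will of course differ per workload, and this is only a rough accounting:

CREATE EXTENSION pg_buffercache;
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;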
Re: [HACKERS] Developer meeting photos
On Wed, Jun 26, 2013 at 10:22 AM, Alexander Korotkov aekorot...@gmail.com wrote: at last developer meeting we missed Oleg Bartunov. So, it's not surprising that photos is also missed. I remember that somebody took photos, but unfortunately it appears that I don't remember who. My employer who sponsored my attendance in PGCon want to publish post about it on the website. So, it would be very nice to have a photo of developer meeting. Does anybody have it? Didn't Kevin take some photos? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Developer meeting photos
On Thu, Jun 27, 2013 at 4:23 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jun 26, 2013 at 10:22 AM, Alexander Korotkov aekorot...@gmail.com wrote: at last developer meeting we missed Oleg Bartunov. So, it's not surprising that photos is also missed. I remember that somebody took photos, but unfortunately it appears that I don't remember who. My employer who sponsored my attendance in PGCon want to publish post about it on the website. So, it would be very nice to have a photo of developer meeting. Does anybody have it? Didn't Kevin take some photos? I've received photos by private email. Thanks. -- With best regards, Alexander Korotkov.
Re: [HACKERS] XLogInsert scaling, revisited
Hi, On 2013-06-26 18:52:30 +0300, Heikki Linnakangas wrote: * Could you document the way slots prevent checkpoints from occurring when XLogInsert rechecks for full page writes? I think it's correct - but not very obvious on a glance. There's this in the comment near the top of the file: * To update RedoRecPtr or fullPageWrites, one has to make sure that all * subsequent inserters see the new value. This is done by reserving all the * insertion slots before changing the value. XLogInsert reads RedoRecPtr and * fullPageWrites after grabbing a slot, so by holding all the slots * simultaneously, you can ensure that all subsequent inserts see the new * value. Those fields change very seldom, so we prefer to be fast and * non-contended when they need to be read, and slow when they're changed. Does that explain it well enough? XLogInsert holds onto a slot while it rechecks for full page writes. Yes. Earlieron that was explained in XLogInsert() - maybe point to the documentation ontop of the file? The file is too big to expect everyone to reread the whole file in a new release... * The read of Insert-RedoRecPtr while rechecking whether it's out of date now is unlocked, is that correct? And why? Same here, XLogInsert holds the slot while rechecking Insert-RedoRecptr. I was wondering whether its guaranteed that we don't read a cached value, but I didn't think of the memory barriers due to the spinlock in Release/AcquireSlot. I think I thought that release wouldn't acquire the spinlock at all. * Have you measured whether it works to just keep as many slots as max_backends requires around and not bothering with dynamically allocating them to inserters? That seems to require to keep actually waiting slots in a separate list which very well might make that too expensive. Correctness wise the biggest problem for that probably is exlusive acquiration of all slots CreateCheckpoint() does... Hmm. It wouldn't be much different, each backend would still need to reserve its own dedicated slot, because it might be held by the a backend that grabbed all the slots. Also, bgwriter and checkpointer need to flush the WAL, so they'd need slots too. More slots make WaitXLogInsertionsToFinish() more expensive, as it has to loop through all of them. IIRC some earlier pgbench tests I ran didn't show much difference in performance, whether there were 40 slots or 100, as long as there was enough of them. I can run some more tests with even more slots to see how it behaves. In a very quick test I ran previously I did see the slot acquiration in the profile when using short connections that all did some quick inserts - which isn't surprising since they tend to all start with the same slot so they will all fight for the first slots. Maybe we could ammeliorate that by initializing MySlotNo to (MyProc-pgprocno % num_xloginsert_slots) when uninitialized? That won't be completely fair since the number of procs won't usually be a multiple of num_insert_slots, but I doubt that will be problematic. * What about using some sort of linked list of unused slots for WALInsertSlotAcquireOne? Everytime we're done we put it to the *end* of the list so it's less likely to have been grabbed by somebody else so we can reuse it. a) To grab a new one go to the head of the linked list spinlock it and recheck whether it's still free. If not, restart. Otherwise, remove from list. b) To reuse a previously used slot That way we only have to do the PGSemaphoreLock() dance if there really aren't any free slots. 
That adds a spinlock acquisition / release into the critical path, to protect the linked list. I'm trying very hard to avoid serialization points like that. Hm. We already have at least one spinlock in that path? Or are you thinking of a global one protecting the linked list? If so, I think we should be able to get away with locking individual slots. IIRC if you never need to delete list elements you can even get away with a lockless implementation. While profiling the tests I've done, finding a free slot hasn't shown up much. So I don't think it's a problem performance-wise as it is, and I don't think it would make the code simpler. It sure wouldn't make it simpler. As I said above, I saw the slot acquiration in a profile when using -C and a short pgbench script (that just inserts 10 rows). * The queuing logic around slots seems to lack documentation. It's complex enough to warrant that imo. Yeah, it's complex. I expanded the comments above XLogInsertSlot, to explain how xlogInsertingAt works. Did that help, or was it some other part of that that needs more docs? What I don't understand so far is why we don't reset xlogInsertingAt during SlotReleaseOne. There are places documenting that we don't do so, but not why unless I missed it. Do we do this only to have some plausible value for a backend that been acquired but
Re: [HACKERS] Move unused buffers to freelist
On 2013-06-27 08:23:31 -0400, Robert Haas wrote: I'd like to just back up a minute here and talk about the broader picture here. Sounds like a very good plan. So in other words, there's no huge *performance* problem for a working set larger than shared_buffers, but there is a huge *scalability* problem. Now why is that? As far as I can tell, the answer is that we've got a scalability problem around BufFreelistLock. Part of the problem is its name ;) Contention on the buffer mapping locks may also be a problem, but all of my previous benchmarking (with LWLOCK_STATS) suggests that BufFreelistLock is, by far, the elephant in the room. Contention-wise I agree. What I have seen is that we have a huge amount of cacheline bouncing around the buffer header spinlocks. My interest in having the background writer add buffers to the free list is basically around solving that problem. It's a pretty dramatic problem, as the graph above shows, and this patch doesn't solve it. One thing that occurred to me while writing this note is that the background writer doesn't have any compelling reason to run on a read-only workload. It will still run at a certain minimum rate, so that it cycles the buffer pool every 2 minutes, if I remember correctly. I have previously added some ad hoc instrumentation that printed the number of buffers that were required (by other backends) during a bgwriter cycle and the number of buffers that the buffer manager could actually write out. I don't think I actually found any workload where the bgwriter actually wrote out a relevant percentage of the necessary pages. Which would explain why the patch doesn't have a big benefit. The freelist is empty most of the time, so we don't benefit from the reduced work done under the lock. I think the whole algorithm that guides how much the background writer actually does, including its pacing/sleeping logic, needs to be rewritten from scratch before we are actually able to measure the benefit from this patch. I personally don't think there's much to salvage from the current code. Problems with the current code: * doesn't manipulate the usage_count and never does anything to used pages. Which means it will just about never find a victim buffer in a busy database. * by far not aggressive enough, touches only a few buffers ahead of the clock sweep. * does not advance the clock sweep, so the individual backends will touch the same buffers again and transfer all the buffer spinlock cachelines around * The adaptation logic it has makes it so slow to adapt that it takes several minutes to adapt. * ... There's another thing we could do to noticeably improve scalability of buffer acquisition. Currently we do a huge amount of work under the freelist lock. In StrategyGetBuffer: LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE); ... // check freelist, will usually be empty ... for (;;) { buf = BufferDescriptors[StrategyControl->nextVictimBuffer]; ++StrategyControl->nextVictimBuffer; LockBufHdr(buf); if (buf->refcount == 0) { if (buf->usage_count > 0) { buf->usage_count--; } else { /* Found a usable buffer */ if (strategy != NULL) AddBufferToRing(strategy, buf); return buf; } } UnlockBufHdr(buf); } So, we perform the entire clock sweep until we find a single buffer we can use, inside a *global* lock. At times we need to iterate over the whole of shared buffers BM_MAX_USAGE_COUNT (5) times till we have pushed down all the usage counts enough (if the database is busy it can take even longer...). 
In a busy database where usually all the usagecounts are high the next backend will touch a lot of those buffers again which causes massive cache eviction bouncing. It seems far more sensible to only protect the clock sweep's nextVictimBuffer with a spinlock. With some care all the rest can happen without any global interlock. I think even after fixing this - which we definitely should do - having a sensible/more aggressive bgwriter moving pages onto the freelist makes sense because then backends then don't need to deal with dirty pages. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add long options to pgbench (submission 1)
On Tue, Jun 25, 2013 at 3:09 PM, Fabien COELHO coe...@cri.ensmp.fr wrote: I think --quiet-log should be spelled --quiet. ISTM that --quiet usually means not verbose on stdout, so I added log because this was specific to the log output, and that there may be a need for a --quiet option for stdout at some time. The output that is quieted by -q is not the log output produced by --log; it's the regular progress output on stdout/stderr. So I changed that, and committed this, with some further cosmetic changes. I made the formatting of the help message more like psql's help message, including adjusting pgbench to start the description of each option in the same column that psql does. This got rid of a lot of line breaks and IMHO makes the output of pgbench --help quite a bit more readable. I made stylistic adjustments to the documentation portion of the patch as well, again to match the markup used for psql. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash partitioning.
On 06/27/2013 11:12 AM, Nicolas Barbier wrote: Imagine that there are a lot of indexes, e.g., 50. Although a lookup (walking one index) is equally fast, an insertion must update all 50 indexes. When each index requires one extra I/O (because each index is one level taller), that is 50 extra I/Os. In the partitioned case, each index would require the normal smaller amount of I/Os. Choosing which partition to use must only be done once: The result “counts” for all indexes that are to be updated. I think you're underestimating the cost of partitioning. After all, the lookup of what index to update for a given partition is a lookup in pg_index via pg_index_indrelid_index - a btree index. Additionally, the depth of an index doesn't directly translate to the number of I/O writes per insert (or delete). I'd rather expect the avg. number of I/O writes per insert into a b-tree to be reasonably close to one - depending mostly on the number of keys per page, not depth. Additionally: Imagine that the data can be partitioned along some column that makes sense for performance reasons (e.g., some “date” where most accesses are concentrated on rows with more recent dates). The other indexes will probably not have such a performance distribution. Using those other indexes (both for look-ups and updates) in the non-partitioned case, will therefore pull a huge portion of each index into cache (because of the “random distribution” of the non-date data). In the partitioned case, more cache can be spent on the indexes that correspond to the “hot partitions.” That's a valid point, yes. I'd call this index partitioning. And with partial indices, Postgres already has something that gets pretty close, I think. Though, I don't consider this to be related to how the tuples of the relation are laid out on disk. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extensible external toast tuple support snappy prototype
On Wed, Jun 19, 2013 at 3:27 AM, Andres Freund and...@2ndquadrant.com wrote: There will be a newer version of the patch coming today or tomorrow, so there's probably no point in looking at the one linked above before that... This patch is marked as Ready for Committer in the CommitFest app. But it is not clear to me where the patch is that is being proposed for commit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Min value for port
On Thu, Jun 27, 2013 at 2:16 PM, Peter Eisentraut pete...@gmx.net wrote: On 6/27/13 6:34 AM, Magnus Hagander wrote: Is there a reason why we have set the min allowed value for port to 1, not 1024? Given that you can't actually start postgres with a value of 1024, shoulnd't the entry in pg_settings reference that as well? Are you thinking of the restriction that you need to be root to use ports 1024? That restriction is not necessarily universal. We can let the kernel tell us at run time if it doesn't like our port. Yes, that's the restriction I was talking about. It's just a bit annoying that if you look at pg_settings.min_value it doesn't actually tell you the truth. But yeah, I believe Windows actually lets you use a lower port number, so it'd at least have to be #ifdef'ed for that if we wanted to change it. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Min value for port
On 27/06/13 15:11, Magnus Hagander wrote: On Thu, Jun 27, 2013 at 2:16 PM, Peter Eisentraut pete...@gmx.net wrote: On 6/27/13 6:34 AM, Magnus Hagander wrote: Is there a reason why we have set the min allowed value for port to 1, not 1024? Given that you can't actually start postgres with a value of 1024, shoulnd't the entry in pg_settings reference that as well? Are you thinking of the restriction that you need to be root to use ports 1024? That restriction is not necessarily universal. We can let the kernel tell us at run time if it doesn't like our port. Yes, that's the restriction I was talking about. It's just a bit annoying that if you look at pg_settings.min_value it doesn't actually tell you the truth. But yeah, I believe Windows actually lets you use a lower port number, so it'd at least have to be #ifdef'ed for that if we wanted to change it. There's also authbind and CAP_NET_BIND_SERVICE. Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Min value for port
On 2013-06-27 15:11:26 +0200, Magnus Hagander wrote: On Thu, Jun 27, 2013 at 2:16 PM, Peter Eisentraut pete...@gmx.net wrote: On 6/27/13 6:34 AM, Magnus Hagander wrote: Is there a reason why we have set the min allowed value for port to 1, not 1024? Given that you can't actually start postgres with a value of 1024, shoulnd't the entry in pg_settings reference that as well? Are you thinking of the restriction that you need to be root to use ports 1024? That restriction is not necessarily universal. We can let the kernel tell us at run time if it doesn't like our port. Yes, that's the restriction I was talking about. It's just a bit annoying that if you look at pg_settings.min_value it doesn't actually tell you the truth. But yeah, I believe Windows actually lets you use a lower port number, so it'd at least have to be #ifdef'ed for that if we wanted to change it. You can easily change the setting on linux as well. And you can grant specific binaries the permission to bind to restricted ports without being root. I don't think the additional complexity to get a sensible value in there is warranted. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
Andres Freund and...@2ndquadrant.com writes: On 2013-06-26 20:07:40 -0400, Tom Lane wrote: However, some more trolling of the intertubes suggests that Cygwin's emulation of socket() does indeed return EINPROGRESS; see for instance this ancient thread of ours: http://www.postgresql.org/message-id/flat/14855.49635.565990.716...@kryten.bedford.waii.com#14855.49635.565990.716...@kryten.bedford.waii.com Unless we want to distinguish Cygwin from native Windows in this code chunk, maybe we'd better leave well enough alone. After some looking it gets funnier. There currently doesn't seem to be any chance that we actually can get an EINPROGRESS at that level on windows. We #define connect() to pgwin32_connect() which fudges errno around. Where WSAEINPROGRESS is mapped to EINVAL. That's only in the backend though: note the #define is controlled by #ifndef FRONTEND, and we don't link backend/port/win32/socket.c into libpq anyway. The signal managing it's doing wouldn't work at all in client-side programs. But yeah, on the backend side we would definitely treat WSAEINPROGRESS as a hard error. OTOH, we don't use nonblocking connect (much?) in the backend so I'm not sure whether that's a directly comparable case or not. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Move unused buffers to freelist
On Thu, Jun 27, 2013 at 9:01 AM, Andres Freund and...@2ndquadrant.com wrote: Contention wise I aggree. What I have seen is that we have a huge amount of cacheline bouncing around the buffer header spinlocks. How did you measure that? I have previously added some adhoc instrumentation that printed the amount of buffers that were required (by other backends) during a bgwriter cycle and the amount of buffers that the buffer manager could actually write out. I think you can see how many are needed from buffers_alloc. No? I don't think I actually found any workload where the bgwriter actually wroute out a relevant percentage of the necessary pages. Check. Problems with the current code: * doesn't manipulate the usage_count and never does anything to used pages. Which means it will just about never find a victim buffer in a busy database. Right. I was thinking that was part of this patch, but it isn't. I think we should definitely add that. In other words, the background writer's job should be to run the clock sweep and add buffers to the free list. I think we should also split the lock: a spinlock for the freelist, and an lwlock for the clock sweep. * by far not aggressive enough, touches only a few buffers ahead of the clock sweep. Check. Fixing this might be a separate patch, but then again maybe not. The changes we're talking about here provide a natural feedback mechanism: if we observe that the freelist is empty (or less than some length, like 32 buffers?) set the background writer's latch, because we know it's not keeping up. * does not advance the clock sweep, so the individual backends will touch the same buffers again and transfer all the buffer spinlock cacheline around Yes, I think that should be fixed as part of this patch too. It's obviously connected to the point about usage counts. * The adaption logic it has makes it so slow to adapt that it takes several minutes to adapt. Yeah. I don't know if fixing that will fall naturally out of these other changes or not, but I think it's a second-order concern in any event. There's another thing we could do to noticeably improve scalability of buffer acquiration. Currently we do a huge amount of work under the freelist lock. In StrategyGetBuffer: LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE); ... // check freelist, will usually be empty ... for (;;) { buf = BufferDescriptors[StrategyControl-nextVictimBuffer]; ++StrategyControl-nextVictimBuffer; LockBufHdr(buf); if (buf-refcount == 0) { if (buf-usage_count 0) { buf-usage_count--; } else { /* Found a usable buffer */ if (strategy != NULL) AddBufferToRing(strategy, buf); return buf; } } UnlockBufHdr(buf); } So, we perform the entire clock sweep until we found a single buffer we can use inside a *global* lock. At times we need to iterate over the whole shared buffers BM_MAX_USAGE_COUNT (5) times till we pushed down all the usage counts enough (if the database is busy it can take even longer...). In a busy database where usually all the usagecounts are high the next backend will touch a lot of those buffers again which causes massive cache eviction bouncing. It seems far more sensible to only protect the clock sweep's nextVictimBuffer with a spinlock. With some care all the rest can happen without any global interlock. That's a lot more spinlock acquire/release cycles, but it might work out to a win anyway. Or it might lead to the system suffering a horrible spinlock-induced death spiral on eviction-heavy workloads. 
I think even after fixing this - which we definitely should do - having a sensible/more aggressive bgwriter moving pages onto the freelist makes sense because then backends then don't need to deal with dirty pages. Or scanning to find evictable pages. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
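The counters referred to here live in pg_stat_bgwriter: buffers_alloc is the number of buffers handed out to backends, while buffers_clean is what the background writer managed to write, so comparing them shows how much of the allocation demand the background writer is actually covering:

SELECT buffers_alloc, buffers_clean, maxwritten_clean,
       buffers_backend, buffers_backend_fsync
FROM pg_stat_bgwriter;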
[HACKERS] Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
On Thu, Jun 27, 2013 at 08:41:59AM +, Andrew Gierth wrote: Tom Lane said: Agreed, separating out the function-call-with-trailing-declaration syntaxes so they aren't considered in FROM and index_elem seems like the best compromise. If we do that for window function OVER clauses as well, can we make OVER less reserved? Yes. At least, I tried it with both OVER and FILTER unreserved and there were no grammar conflicts (and I didn't have to do anything fancy to avoid them), and it passed regression with the exception of the changed error message for window functions in the from-clause. So is this the final decision on how to proceed? It seems good to me, and I can work with David to get it done. If this is really the direction people want to go, I'm in. Is there some code I can look at? I still submit that having our reserved word ducks in a row in advance is a saner way to go about this, and will work up a patch for that as I have time. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add more regression tests for CREATE OPERATOR
On Wed, Jun 26, 2013 at 3:29 AM, Szymon Guz mabew...@gmail.com wrote: OK, so I think this patch can be committed, I will change the status. We have a convention that roles created by the regression tests needs to have regress or something of the sort in the name, and that they need to be dropped by the regression tests. The idea is that if someone runs make installcheck against an installed server, it should pass - even if you run it twice in succession. And also, it shouldn't be likely to try to create (and then drop!) a role name that already exists. Setting this to Waiting on Author. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
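An illustration of the convention being described — the role name carries a regress prefix and everything is dropped again so make installcheck can be run repeatedly against an installed server; all object names here are made up:

CREATE ROLE regress_op_owner;
CREATE OPERATOR ##+ (leftarg = int4, rightarg = int4, procedure = int4pl);
ALTER OPERATOR ##+ (int4, int4) OWNER TO regress_op_owner;
-- ... assertions about the operator go here ...
DROP OPERATOR ##+ (int4, int4);
DROP ROLE regress_op_owner;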
Re: [HACKERS] pg_filedump 9.3: checksums (and a few other fixes)
On 2013-06-27 09:51:07 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-06-26 21:18:49 -0700, Peter Geoghegan wrote: Heroku are interested in online verification of basebackups (i.e. using checksums to verify the integrity of heap files as they are backed up, with a view to relying less and less on logical backups). Why not do this from a function/background worker in the backend where you can go via the buffer manager to avoid torn pages et al. If you use a buffer strategy the cache poisoning et al should be controlleable. That would require having a functioning postmaster environment, which seems like it would make such a tool much less flexible. I personally wouldn't trust online backups that aren't proven to be replayable into a runnable state very much. I have seen too many cases where that failed. Maybe the trick is to add a recovery.conf option to make postgres replay to the first restartpoint and then shutdown. At that point you can be sure there aren't any torn pages anymore (bugs aside). In fact that sounds like a rather useful pg_basebackup extension... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_filedump 9.3: checksums (and a few other fixes)
Andres Freund and...@2ndquadrant.com writes: On 2013-06-26 21:18:49 -0700, Peter Geoghegan wrote: Heroku are interested in online verification of basebackups (i.e. using checksums to verify the integrity of heap files as they are backed up, with a view to relying less and less on logical backups). Why not do this from a function/background worker in the backend where you can go via the buffer manager to avoid torn pages et al. If you use a buffer strategy the cache poisoning et al should be controlleable. That would require having a functioning postmaster environment, which seems like it would make such a tool much less flexible. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch for fail-back without fresh backup
On Mon, Jun 24, 2013 at 10:47 PM, Sawada Masahiko sawada.m...@gmail.com wrote: 1. synchronous standby and make same as failback safe standby 2. asynchronous standby and make same as failback safe standby in above case, adding new parameter might be meaningless. but I think that we should handle case not only case 1,2 but also following case 3, 4 for DR. to support case 1 and 2, I'm thinking that following another 2 ideas. we add synchronous_transfer( commit/ data_flush/ all) . This GUC will only affect the standbys mentioned in the list of synchronous_standby_names. 1. If synchronous_transfer is set to commit, current synchronous replication behavior is achieved 2. If synchronous_transfer is set to data_flush, the standbys named in synchronous_standby_names will act as ASYNC failback safe standbys 3. If synchronous_transfer is set to all, the standbys named in synchronous_standby_names will act as SYNC failback safe standbys in this approach, 3 is confusing because we are actually setting up a ASYNC standby by using the GUCs meant for sync standby setup. - we extend synchronous_commit so that it also accepts like 'all'. ( this approach dosen't provide 'synchronous_transfer' parameter) 'all' value means that master wait for not only replicated WAL but also replicated data page (e.g., CLOG, pg_control). and master changes the process by whether standby is connected as sync or async. 1. If synchronous_commit is set to 'all' and synchronous_standby_name is set to standby name, the standbys named in synchronous_standby_names will act as SYNC failback safe standby. 2. If synchronous_commit is set to 'all' and synchronous_standby_name is NOT set to standby name, the standbys which is connecting to master will act as ASYNC failback safe standby. one problem with not naming ASYNC standby explicitly is that the master has no clue which standby to wait on. If it chooses to wait on all async standbys for failback-safety that can be quite detrimental, especially because async standbys can become easily unreachable if they are on a slow link or at remote location. please give me feedback. Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
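For reference, which connected standbys the master currently treats as synchronous (and which would presumably be the candidates for any failback-safe behaviour) is visible in pg_stat_replication:

SELECT application_name, state, sync_priority, sync_state
FROM pg_stat_replication;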
Re: [HACKERS] FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
Andrew Gierth and...@tao11.riddles.org.uk writes: Tom Lane said: Agreed, separating out the function-call-with-trailing-declaration syntaxes so they aren't considered in FROM and index_elem seems like the best compromise. If we do that for window function OVER clauses as well, can we make OVER less reserved? Yes. At least, I tried it with both OVER and FILTER unreserved and there were no grammar conflicts (and I didn't have to do anything fancy to avoid them), and it passed regression with the exception of the changed error message for window functions in the from-clause. So is this the final decision on how to proceed? It seems good to me, and I can work with David to get it done. Yeah, please submit a separate patch that just refactors the existing grammar as above; that'll simplify reviewing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]
Pavel Stehule pavel.steh...@gmail.com writes: Tom Lane said: If we do that for window function OVER clauses as well, can we make OVER less reserved? Isn't dangerous do OVER unreserved keyword?? How so? The worst-case scenario is that we find we have to make it more reserved again in some future release, as a consequence of some new randomness from the SQL committee. That will just return us to the status quo, in which anybody who uses OVER as a table/column name has been broken since about 8.4. Since we still hear of people using releases as old as 7.2.x, I'm sure there are a few out there who would still be helped if we could de-reserve OVER again. (Not to mention people migrating from other systems in which it's not a keyword.) In any case, the general project policy has been to never make keywords any more reserved than we absolutely have to. If we didn't care about this, we wouldn't be bothering with four separate categories of keywords. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add more regression tests for CREATE OPERATOR
Sure Robert. I 'll update the tests and get back. Two questions, while we're at it: 1. Any other conventions (for naming)? 2. Should I assume that all database objects that get created, need to be dropped explicitly? Or is this point specifically about ROLES? -- Robins Tharakan On 27 June 2013 09:00, Robert Haas robertmh...@gmail.com wrote: On Wed, Jun 26, 2013 at 3:29 AM, Szymon Guz mabew...@gmail.com wrote: OK, so I think this patch can be committed, I will change the status. We have a convention that roles created by the regression tests needs to have regress or something of the sort in the name, and that they need to be dropped by the regression tests. The idea is that if someone runs make installcheck against an installed server, it should pass - even if you run it twice in succession. And also, it shouldn't be likely to try to create (and then drop!) a role name that already exists. Setting this to Waiting on Author. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Add more regression tests for dbcommands
On Wed, Jun 26, 2013 at 11:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Josh Berkus j...@agliodbs.com writes: On 06/26/2013 12:08 PM, Fabien COELHO wrote: I have been suggesting something upon that line in some of the reviews I've posted about Robins non regression tests, if they were to be rejected on the basis that they add a few seconds for checks. They are well made to test corner cases quite systematically, and I feel that it would be sad if they were lost. My thinking was that someone should add all of his new tests at once, and then see how much of a time difference they make. If it's 7 seconds, who cares? Making that measurement on the current set of tests doesn't seem to me to prove much. I assume Robins' eventual goal is to make a significant improvement in the tests' code coverage across the entire backend, and what we see submitted now is just as much as he's been able to do yet in that line. So even if the current cost is negligible, I don't think it'll stay that way. Well, this is a discussion we've had before. I'm certainly in favor of having a bigger set of regression tests, so that people can just run the small suite if they want to. But I'm not keen on pushing tests into that extended suite when they run in a fractions of a second. That's likely to create more hassle than it solves, since people will forget to update the expected outputs if they don't run those tests, and if they do run those tests, then having them in a separate suite isn't saving anything. The place for a separate test suite, IMHO, is when you have tests that run for a long time individually. FWIW, internally to EDB, we have it broken down just about exactly that way. Our core set of regression tests for PPAS takes about 5 minutes to run on my laptop, and unfortunately quite a bit longer on some older laptops. It's a nuisance, but it's a manageable nuisance, and it finds a lot of coding errors. Then we have separate schedules for test suites that contain long-running tests or have dependencies on which compiler flags you use; most developers don't run that suite regularly, but it gets run every night. That keeps those annoyances out of the foreground path of developers. I really don't see why we shouldn't take the same approach here. I don't think we want as large a regression test suite for PostgreSQL as we have for PPAS, because among other things, PostgreSQL doesn't have a paid staff of people who can be roped into helping manage it when needed. But the management effort for the kind of expansion that Robins Tharakan is proposing here is not going to be all that large, at least if my EDB experience is any indication. And I think it will help find bugs. I also think (and this is also something we've seen internally) that part of the value of a regression suite is that it exposes when you've changed the behavior. The regression tests fail and you have to change the expected outputs; fine. But now it's much less likely that you're going to make those kinds of changes without *realizing* that you've changed the behavior. I don't have a specific example top of mind right now, but I am pretty sure there have been at least a few cases where PostgreSQL changes have had knock-on consequences that weren't obvious at the time they were made. The kind of work that Robins is doing here is not going to fix that problem entirely, but I think it will help, and I think that has a lot of value. So I'd like to endorse Josh's idea: subject to appropriate review, let's add these test cases. 
Then, if it really turns out to be too burdensome, we can take them out, or figure out a sensible way to split the suite. Pushing all of Robins' work into a secondary suite, or throwing it out altogether, both seem to me to be things which will not be to the long-term benefit of the project. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part 1: additional information
On 06/25/2013 12:03 AM, Alexander Korotkov wrote: New revision of patch is attached. Now it includes some docs. Hi, I was curious about the new layout of the data page, so I spent a while looking into the code. It's interesting, but I suspect a few things are not o.k.:
* gindatapage.c:dataIsEnoughSpace() - 'i++' in the for loop should probably be 'j++', otherwise it loops forever
* gin_private.h:ginDataPageLeafRead() - fetch_att() is used to retrieve the additional info, but per the definition and comments in tupmacs.h it expects an aligned pointer.
* gindatapage.c:ginCheckPlaceToDataPageLeaf() - the comment "if leaf data page" should probably be "on a leaf data page" or so.
Regards, Antonin Houska (Tony) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add long options to pgbench (submission 1)
On Thu, Jun 27, 2013 at 10:02 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jun 25, 2013 at 3:09 PM, Fabien COELHO coe...@cri.ensmp.fr wrote: I think --quiet-log should be spelled --quiet. ISTM that --quiet usually means not verbose on stdout, so I added log because this was specific to the log output, and that there may be a need for a --quiet option for stdout at some time. The output that is quieted by -q is not the log output produced by --log; it's the regular progress output on stdout/stderr. So I changed that, and committed this, with some further cosmetic changes. I made the formatting of the help message more like psql's help message, including adjusting pgbench to start the description of each option in the same column that psql does. This got rid of a lot of line breaks and IMHO makes the output of pgbench --help quite a bit more readable. I made stylistic adjustments to the documentation portion of the patch as well, again to match the markup used for psql. In help messages: +-s NUM, --scale NUM scaling factor\n This should be -s, --scale=NUM for the sake of consistency with other options. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
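To illustrate the option-declaration and help-formatting convention being discussed, here is a minimal, hypothetical C sketch (not the actual pgbench source; option names and column widths are assumptions) of a short/long option pair declared with getopt_long() and documented in the "-s, --scale=NUM" style:

#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch, not the real pgbench code. */
static const struct option long_options[] = {
	{"scale", required_argument, NULL, 's'},
	{NULL, 0, NULL, 0}
};

static void
usage(void)
{
	/* description column aligned, long form spelled "--scale=NUM" */
	printf("  -s, --scale=NUM          scaling factor\n");
}

int
main(int argc, char **argv)
{
	int		c;
	int		scale = 1;

	while ((c = getopt_long(argc, argv, "s:", long_options, NULL)) != -1)
	{
		switch (c)
		{
			case 's':
				scale = atoi(optarg);
				break;
			default:
				usage();
				exit(1);
		}
	}
	printf("scale = %d\n", scale);
	return 0;
}

Spelling the long form with "=NUM" in the help line, as suggested, keeps it consistent with how a required_argument option is actually invoked.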
Re: [HACKERS] refresh materialized view concurrently
Andres Freund and...@2ndquadrant.com wrote: We can play cute tricks akin to what CREATE INDEX CONCURRENTLY currently does, i.e. wait for all other transactions that could have possibly seen the old relfilenode (they must have at least a share lock on the relation) before dropping the actual storage. The reason we cannot currently do that in most scenarios is that we cannot perform transactional/mvcc updates of non-exclusively locked objects due to the SnapshotNow problems of seeing multiple or no versions of a row during a single scan. Not only would that be slower than the submitted patch in cases where only a few rows differ, but it could be waiting to swap in that new heap for an unbounded amount of time. I think the current patch will play nicer with incremental maintenance than what you suggest here. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Move unused buffers to freelist
On 2013-06-27 09:50:32 -0400, Robert Haas wrote: On Thu, Jun 27, 2013 at 9:01 AM, Andres Freund and...@2ndquadrant.com wrote: Contention wise I agree. What I have seen is that we have a huge amount of cacheline bouncing around the buffer header spinlocks. How did you measure that? perf record -e cache-misses. If you want it more detailed, looking at {L1,LLC}-{load,store}{s,misses} can sometimes be helpful too. Also, running perf stat -vvv postgres -D ... for a whole benchmark can be useful to compare how much a change influences cache misses and such. For very detailed analysis running something under valgrind/cachegrind can be helpful too, but I usually find perf to be sufficient. I have previously added some ad hoc instrumentation that printed the amount of buffers that were required (by other backends) during a bgwriter cycle and the amount of buffers that the buffer manager could actually write out. I think you can see how many are needed from buffers_alloc. No? Not easily correlated with bgwriter activity. If we cannot keep up because it's 100% busy writing out buffers I don't have many problems with that. But I don't think we often are. Problems with the current code: * doesn't manipulate the usage_count and never does anything to used pages. Which means it will just about never find a victim buffer in a busy database. Right. I was thinking that was part of this patch, but it isn't. I think we should definitely add that. In other words, the background writer's job should be to run the clock sweep and add buffers to the free list. We might need to split it into two for that. One process to write out dirty pages, one to populate the freelist. Otherwise we will probably regularly hit the current scalability issues because we're currently I/O contended. Say during a busy or even immediate checkpoint. I think we should also split the lock: a spinlock for the freelist, and an lwlock for the clock sweep. Yea, thought about that when writing the thing about the exclusive lock during the clocksweep. * by far not aggressive enough, touches only a few buffers ahead of the clock sweep. Check. Fixing this might be a separate patch, but then again maybe not. The changes we're talking about here provide a natural feedback mechanism: if we observe that the freelist is empty (or less than some length, like 32 buffers?) set the background writer's latch, because we know it's not keeping up. Yes, that makes sense. Also provides adaptability to bursty workloads which means we don't have too complex logic in the bgwriter for that. There's another thing we could do to noticeably improve scalability of buffer acquisition. Currently we do a huge amount of work under the freelist lock. ... So, we perform the entire clock sweep until we find a single buffer we can use inside a *global* lock. At times we need to iterate over the whole shared buffers BM_MAX_USAGE_COUNT (5) times till we pushed down all the usage counts enough (if the database is busy it can take even longer...). In a busy database where usually all the usage counts are high the next backend will touch a lot of those buffers again which causes massive cache eviction bouncing. It seems far more sensible to only protect the clock sweep's nextVictimBuffer with a spinlock. With some care all the rest can happen without any global interlock. That's a lot more spinlock acquire/release cycles, but it might work out to a win anyway. Or it might lead to the system suffering a horrible spinlock-induced death spiral on eviction-heavy workloads. 
I can't imagine it to be worse than what we have today. Also, nobody requires us to only advance the clocksweep by one page, we can easily do it say 29 pages at a time or so if we detect the lock is contended. Alternatively it shouldn't be too hard to make it into an atomic increment, although that requires some trickery to handle the wraparound sanely. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
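As a rough illustration of the atomic-increment idea (a sketch only, not PostgreSQL source; the counter name, the NBUFFERS constant, and the use of C11 atomics are assumptions for the example), the wraparound trickery can be sidestepped by letting the counter grow monotonically and taking the buffer id modulo the number of buffers:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NBUFFERS 16384					/* stand-in for NBuffers */

static _Atomic uint64_t next_victim;	/* monotonically increasing ticket */

/*
 * Claim the next buffer slot for the clock sweep without a global lock.
 * The counter is never wrapped in place; it just keeps growing, and the
 * buffer id is derived modulo NBUFFERS.  Completed passes (useful for
 * bgwriter statistics) fall out as ticket / NBUFFERS.
 */
static inline int
clock_sweep_next(void)
{
	uint64_t	ticket = atomic_fetch_add(&next_victim, 1);

	return (int) (ticket % NBUFFERS);
}

int
main(void)
{
	for (int i = 0; i < 5; i++)
		printf("next victim candidate: %d\n", clock_sweep_next());
	return 0;
}

The usage-count check and the per-buffer header spinlock would still be taken for each candidate, as described above; only the selection of the next candidate stops being globally serialized.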
Re: [HACKERS] Spin Lock sleep resolution
On Wed, Jun 26, 2013 at 5:49 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Tue, Jun 18, 2013 at 12:09 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Jeff's patch seems to somewhat alleviate the huge fall in performance I'm otherwise seeing without the nonlocked-test patch. With the nonlocked-test patch, if you squint you can see a miniscule benefit. I wasn't expecting much of a gain from this, just wanted to verify that it's not making things worse. So looks good to me. Hi Heikki, Thanks for trying out the patch. I see in the commitfest app it is set to Waiting on Author (but I don't know who maiku41 is). Based on the comments so far, I don't know what I should be doing on it at the moment, and I thought perhaps your comment above meant it should be ready for committer. If we think the patch has a risk of introducing subtle errors, then it probably can't be justified based on the small performance gains you saw. But if we think it has little risk, then I think it is justified simply based on cleaner code, and less confusion for people who are tracing a problematic process. If we want it to do a random escalation, it should probably look like a random escalation to the interested observer. I think it has little risk. If it turns out to be worse for performance, we can always revert it, but I expect it'll be better or a wash, and the results so far seem to bear that out. Another interesting question is whether we should fool with the actual values for minimum and maximum delays, but that's a separate and much harder question, so I think we should just do this for now and call it good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
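For concreteness, here is a minimal sketch of a randomly escalating, microsecond-resolution sleep of the kind discussed in this thread. It is not the actual s_lock.c code; the constants and the growth rule are assumptions chosen for the example.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MIN_DELAY_USEC	1000			/* 1 ms */
#define MAX_DELAY_USEC	1000000			/* 1 s */

static long cur_delay_usec = 0;

/* sleep with microsecond resolution instead of rounding to coarser ticks */
static void
sleep_usec(long usec)
{
	struct timespec ts;

	ts.tv_sec = usec / 1000000L;
	ts.tv_nsec = (usec % 1000000L) * 1000L;
	nanosleep(&ts, NULL);
}

static void
spin_delay_backoff(void)
{
	if (cur_delay_usec == 0)
		cur_delay_usec = MIN_DELAY_USEC;

	sleep_usec(cur_delay_usec);

	/* escalate by a random fraction between 0% and 100%, capped */
	cur_delay_usec += (long) (cur_delay_usec * ((double) rand() / RAND_MAX));
	if (cur_delay_usec > MAX_DELAY_USEC)
		cur_delay_usec = MAX_DELAY_USEC;
}

int
main(void)
{
	for (int i = 0; i < 5; i++)
	{
		spin_delay_backoff();
		printf("next delay would be %ld usec\n", cur_delay_usec);
	}
	return 0;
}

Because the sleep request itself carries the randomized value, someone tracing the process sees the escalation directly, which is the "look like a random escalation to the interested observer" property mentioned above.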
Re: [HACKERS] [PATCH] add long options to pgbench (submission 1)
On Thu, Jun 27, 2013 at 10:24 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Jun 27, 2013 at 10:02 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jun 25, 2013 at 3:09 PM, Fabien COELHO coe...@cri.ensmp.fr wrote: I think --quiet-log should be spelled --quiet. ISTM that --quiet usually means not verbose on stdout, so I added log because this was specific to the log output, and that there may be a need for a --quiet option for stdout at some time. The output that is quieted by -q is not the log output produced by --log; it's the regular progress output on stdout/stderr. So I changed that, and committed this, with some further cosmetic changes. I made the formatting of the help message more like psql's help message, including adjusting pgbench to start the description of each option in the same column that psql does. This got rid of a lot of line breaks and IMHO makes the output of pgbench --help quite a bit more readable. I made stylistic adjustments to the documentation portion of the patch as well, again to match the markup used for psql. In help messages: +-s NUM, --scale NUM scaling factor\n This should be -s, --scale=NUM for the sake of consistency with other options. Woops, missed that one. Fixed, thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Kudos for Reviewers -- straw poll
On Tue, Jun 25, 2013 at 2:27 PM, Andrew Dunstan and...@dunslane.net wrote: I'd like to see prizes each release for best contribution and best reviewer - I've thought for years something like this would be worth trying. Committers and core members should not be eligible - this is about encouraging new people. Encouraging new people is good, but recognizing sustained, long-term contributions is good, too. I think we should do more of that, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]
On 6/23/13 10:50 PM, Tom Lane wrote: It'd sure be interesting to know what the SQL committee's target parsing algorithm is. It's whatever Oracle and IBM implement. Or maybe they really don't give a damn about breaking applications every time they invent a new reserved word? Well, yes, I think that policy was built into the language at the very beginning. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] refresh materialized view concurrently
Hitoshi Harada umi.tan...@gmail.com wrote: Kevin Grittner kgri...@ymail.com wrote: Hitoshi Harada umi.tan...@gmail.com wrote: As far as I can tell, the overall approach is as follows. - create a new temp heap as non-concurrent does, but with ExclusiveLock on the matview, so that readers wouldn't be blocked Non-concurrent creates the heap in the matview's tablespace and namespace, so the temp part is different in concurrent generation. This difference is why concurrent can be faster when few rows change. It's still not clear to me why you need temp in concurrent and not in non-concurrent. Well, temp tables can be in an entirely different tablespace, so you can't just move the heap of a temp table into place as the new heap for the matview (as we do for a non-concurrent REFRESH) -- at least not without potentially copying everything an extra time. For concurrent we are modifying the existing matview heap, and the freshly generated set of values, as well as the diff table, are just needed, well, temporarily. That makes the behavior of temp tables ideal. Not only are they not logged, they are potentially placed on faster tablespaces, and the data written to them might never be written to disk: http://www.postgresql.org/docs/9.2/interactive/runtime-config-resource.html#GUC-TEMP-BUFFERS If this type of operation always creates a temp table and just swaps it with the existing one, why can't we just make it temp always? Because of the potentially different tablespaces. And if the performance is the only concern, is temp better than just turning off WAL for the table or using an UNLOGGED table? Yes, it is. - this operation requires a unique index, for performance reasons (or correctness reasons too) It is primarily for correctness in the face of duplicate rows which have no nulls. Do you think the reasons need to be better documented with comments? Ah, yes, even after looking at the patch I was confused about whether it was for performance or correctness. This is the part of the function's comment which attempts to explain the problem.
 * This join cannot work if there are any
 * duplicated rows in either the old or new versions, in the sense that every
 * column would compare as equal between the two rows.  It does work correctly
 * in the face of rows which have at least one NULL value, with all non-NULL
 * columns equal.  The behavior of NULLs on equality tests and on UNIQUE
 * indexes turns out to be quite convenient here; the tests we need to make
 * are consistent with default behavior.  If there is at least one UNIQUE
 * index on the materialized view, we have exactly the guarantee we need.  By
 * joining based on equality on all columns which are part of any unique
 * index, we identify the rows on which we can use UPDATE without any problem.
 * If any column is NULL in either the old or new version of a row (or both),
 * we must use DELETE and INSERT, since there could be multiple rows which are
 * NOT DISTINCT FROM each other, and we could otherwise end up with the wrong
 * number of occurrences in the updated relation.
I'm open to suggestions on better wording. 
As an example of the way the full join gets into trouble with duplicate rows when used for a diff, see this example:
test=# create table old (c1 int, c2 int);
CREATE TABLE
test=# create table new (c1 int, c2 int);
CREATE TABLE
test=# insert into old values
test-# (1,1),(1,2),(1,2),(1,2),(1,3),(1,null),(1,null);
INSERT 0 7
test=# insert into new values
test-# (1,1),(1,2),(1,2),(1,2),(1,2),(1,2),(1,4),(1,null),(1,null),(1,null);
INSERT 0 10
At this point it is clear that we need to add two rows with values (1,2) and we need to wind up with one more row with values (1,null) than we already have. We also need to delete (1,3) and add (1,4). But full join logic fails to get things right for the case of duplicate rows with no nulls:
test=# select old, new
test-# from old
test-# full join new on old.c1 = new.c1 and old.c2 = new.c2
test-# where (old.*) is distinct from (new.*);
  old  |  new
-------+-------
 (1,3) |
 (1,)  |
 (1,)  |
       | (1,4)
       | (1,)
       | (1,)
       | (1,)
(7 rows)
It's a shame we cannot refresh it concurrently if we have duplicate rows in the matview. I thought it would make sense to allow it without a unique key if it was only a performance tradeoff. Well, we *could* allow it without a unique index, but the code would be more complex and significantly slower. I think we would still want to handle it the way the current patch does when a unique index is present, even if we have a way to handle cases where such an index is not present. Even if I were convinced it was worthwhile to support the more general case, I would want to commit this patch first, and add the more complicated code as a follow-on patch. At this point I'm not convinced of the value of supporting concurrent refresh of materialized views which contain duplicate rows, nor of the benefit of allowing it to work ten
Re: [HACKERS] Add more regression tests for CREATE OPERATOR
Robins Tharakan thara...@gmail.com writes: 2. Should I assume that all database objects that get created, need to be dropped explicitly? Or is this point specifically about ROLES? It's about any global objects (that wouldn't get dropped by dropping the regression database). As far as local objects go, there are benefits to leaving them around, particularly if they present interesting test cases for pg_dump/pg_restore. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add more regression tests for dbcommands
On 6/27/13 10:20 AM, Robert Haas wrote: So I'd like to endorse Josh's idea: subject to appropriate review, let's add these test cases. Then, if it really turns out to be too burdensome, we can take them out, or figure out a sensible way to split the suite. Pushing all of Robins work into a secondary suite, or throwing it out altogether, both seem to me to be things which will not be to the long-term benefit of the project. I agree. If it gets too big, let's deal with it then. But let's not make things more complicated because they *might* get too big later. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add more regression tests for dbcommands
On 6/26/13 12:17 PM, Tom Lane wrote: (I like to point at mysql's regression tests, which take well over an hour even on fast machines. If lots of tests are so helpful, why is their bug rate no better than ours?) Tests are not (primarily) there to prevent bugs. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Move unused buffers to freelist
Andres Freund and...@2ndquadrant.com wrote: I don't think I actually found any workload where the bgwriter actually wrote out a relevant percentage of the necessary pages. I had one at Wisconsin Courts. The database which we targeted with logical replication from the 72 circuit court databases (plus a few others), on a six-connection database pool with about 20 to (at peaks) hundreds of transactions per second modifying the database (the average transaction involving about 20 modifying statements with potentially hundreds of affected rows), with maybe 2000 to 3000 queries per second on a 30 connection pool, wrote about one-third each of the dirty buffers with checkpoints, background writer, and backends needing to read a page. I shared my numbers with Greg, who I believe used them as one of his examples for how to tune memory, checkpoints, and background writer, so you might want to check with him if you want more detail. Of course, we set bgwriter_lru_maxpages = 1000 and bgwriter_lru_multiplier = 4, and kept shared_buffers at 2GB to hit that. Without the reduced shared_buffers and more aggressive bgwriter we hit the problem with writes overwhelming the RAID controller's cache and causing everything in the database to freeze until it cleared some cache space. I'm not saying this invalidates your general argument; just that such cases do exist. Hopefully this data point is useful. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add more regression tests for dbcommands
Peter Eisentraut pete...@gmx.net writes: On 6/26/13 12:17 PM, Tom Lane wrote: (I like to point at mysql's regression tests, which take well over an hour even on fast machines. If lots of tests are so helpful, why is their bug rate no better than ours?) Tests are not (primarily) there to prevent bugs. Really? Pray tell, what do you think they're for? Particularly code coverage tests, which is what these are? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error code returned by lock_timeout
On Thu, Jun 27, 2013 at 2:22 PM, Boszormenyi Zoltan z...@cybertec.at wrote: Hi, I just realized that in the original incarnation of lock_timeout, I used ERRCODE_LOCK_NOT_AVAILABLE (to be consistent with NOWAIT) but the patch that was accepted into 9.3 contained ERRCODE_QUERY_CANCELED which is the same as for statement_timeout. Which would be more correct? ISTM that ERRCODE_LOCK_NOT_AVAILABLE is more suitable... Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.3 doc fix] ECPG VAR command does not describe the actual specification
Looking around the 9.3 doc, I found a small, but not-insignificant error in the documentation. Thanks for finding and fixing. Patch committed. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org Jabber: michael.meskes at gmail dot com VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of checkpoint IO scheduler for stable transaction responses
On Tue, Jun 25, 2013 at 4:28 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: The only feedback we have on how bad things are is how long it took the last fsync to complete, so I actually think that's a much better way to go than any fixed sleep - which will often be unnecessarily long on a well-behaved system, and which will often be far too short on one that's having trouble. I'm inclined to think Kondo-san has got it right. Quite possible, I really don't know. I'm inclined to first try the simplest thing possible, and only make it more complicated if that's not good enough. Kondo-san's patch wasn't very complicated, but nevertheless a fixed sleep between every fsync, unless you're behind the schedule, is even simpler. I'm pretty sure Greg Smith tried the fixed-sleep thing before and it didn't work that well. I have also tried it and the resulting behavior was unimpressive. It makes checkpoints take a long time to complete even when there's very little data to flush out to the OS, which is annoying; and when things actually do get ugly, the sleeps aren't long enough to matter. See the timings Kondo-san posted downthread: 100ms delays aren't going to let the system recover in any useful way when the fsync can take 13 s for one file. On a system that's badly weighed down by I/O, the fsync times are often *extremely* long - 13 s is far from the worst you can see. You have to give the system a meaningful time to recover from that, allowing other processes to make meaningful progress before you hit it again, or system performance just goes down the tubes. Greg's test, IIRC, used 3 s sleeps rather than your proposal of 100 ms, but it still wasn't enough. In particular, it's easier to tie that into the checkpoint scheduler - I'm not sure how you'd measure progress or determine how long to sleep unless you assume that every fsync is the same. I think the thing to do is assume that the fsync phase will take 10% or so of the total checkpoint time, but then be prepared to let the checkpoint run a bit longer if the fsyncs end up being slow. As Greg has pointed out during prior discussions of this, the normal scenario when things get bad here is that there is no way in hell you're going to fit the checkpoint into the originally planned time. Once all of the write caches between PostgreSQL and the spinning rust are full, the system is in trouble and things are going to suck. The hope is that we can stop beating the horse while it is merely in intensive care rather than continuing until the corpse is fully skeletized. Fixed delays don't work because - to push an already-overdone metaphor a bit further - we have no idea how much of a beating the horse can take; we need something adaptive so that we respond to what actually happens rather than making predictions that will almost certainly be wrong a large fraction of the time. To put this another way, when we start the fsync() phase, it often consumes 100% of the available I/O on the machine, completely starving every other process that might need any. This is certainly a deficiency in the Linux I/O scheduler, but as they seem in no hurry to fix it we'll have to cope with it as best we can. If you do the fsyncs in fast succession (and 100ms separation might as well be no separation at all), then the I/O starvation of the entire system persists through the entire fsync phase. 
If, on the other hand, you sleep for the same amount of time the previous fsync took, then on the average, 50% of the machine's I/O capacity will be available for all other system activity throughout the fsync phase, rather than 0%. Now, unfortunately, this is still not that good, because it's often the case that all of the fsyncs except one are reasonably fast, and there's one monster one that is very slow. ext3 has a known bad behavior that dumps all dirty data for the entire *filesystem* when you fsync, which tends to create these kinds of effects. But even on a better-behaved filesystem, like ext4, it's fairly common to have one fsync that is painfully longer than all the others. So even with this patch, there are still going to be cases where the whole system becomes unresponsive. I don't see any way to do better without a better kernel API, or a better I/O scheduler, but that doesn't mean we shouldn't do at least this much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
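A minimal sketch of the "sleep for as long as the previous fsync took" idea follows (illustrative only, not the actual checkpointer code; the 10-second cap and the plain array of file descriptors are assumptions, and a real implementation would also fold this into checkpoint scheduling):

#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

static double
elapsed_seconds(struct timeval start, struct timeval end)
{
	return (end.tv_sec - start.tv_sec) +
		(end.tv_usec - start.tv_usec) / 1000000.0;
}

static void
pause_for_seconds(double secs)
{
	struct timespec ts;

	ts.tv_sec = (time_t) secs;
	ts.tv_nsec = (long) ((secs - ts.tv_sec) * 1e9);
	nanosleep(&ts, NULL);
}

/*
 * Sync each file, then pause for roughly as long as that fsync took, so
 * that on average about half of the I/O capacity stays available to other
 * processes during the sync phase.
 */
static void
sync_files_with_adaptive_pause(const int *fds, int nfiles)
{
	for (int i = 0; i < nfiles; i++)
	{
		struct timeval before, after;

		gettimeofday(&before, NULL);
		if (fsync(fds[i]) != 0)
			perror("fsync");
		gettimeofday(&after, NULL);

		if (i < nfiles - 1)
		{
			double	secs = elapsed_seconds(before, after);

			if (secs > 10.0)
				secs = 10.0;	/* arbitrary cap for the sketch */
			pause_for_seconds(secs);
		}
	}
}

With this shape, an fsync that takes 13 s buys the rest of the system on the order of 10 s of breathing room before the next one is issued, instead of a fixed 100 ms.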
Re: [HACKERS] Error code returned by lock_timeout
On 2013-06-27 17:03, Fujii Masao wrote: On Thu, Jun 27, 2013 at 2:22 PM, Boszormenyi Zoltan z...@cybertec.at wrote: Hi, I just realized that in the original incarnation of lock_timeout, I used ERRCODE_LOCK_NOT_AVAILABLE (to be consistent with NOWAIT) but the patch that was accepted into 9.3 contained ERRCODE_QUERY_CANCELED which is the same as for statement_timeout. Which would be more correct? ISTM that ERRCODE_LOCK_NOT_AVAILABLE is more suitable... I think so, too. Can someone commit this one-liner to master and 9.3? Best regards, Zoltán Böszörményi -- -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
--- src/backend/tcop/postgres.c~	2013-06-27 07:05:08.0 +0200
+++ src/backend/tcop/postgres.c	2013-06-27 17:10:28.040972642 +0200
@@ -2900,7 +2900,7 @@
 			DisableNotifyInterrupt();
 			DisableCatchupInterrupt();
 			ereport(ERROR,
-					(errcode(ERRCODE_QUERY_CANCELED),
+					(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
 					 errmsg("canceling statement due to lock timeout")));
 		}
 		if (get_timeout_indicator(STATEMENT_TIMEOUT, true))
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add --progress option to pgbench (submission 3)
On Wed, Jun 26, 2013 at 7:16 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: Here is a v4 that takes into account most of your points: The report is performed for all threads by thread 0, however --progress is not supported under thread fork emulation if there are more than one thread. The report time does not slip anymore. I don't believe that to be an acceptable restriction. We generally require features to work on all platforms we support. We have made occasional compromises there, but generally only when the restriction is fundamental to the platform rather than for developer convenience. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group Commits Vs WAL Writes
On Thu, Jun 27, 2013 at 3:56 AM, Atri Sharma atri.j...@gmail.com wrote: When we do a commit, WAL buffers are written to the disk. This has a disk latency for the required I/O. Check. Now, with group commits, do we see a spike in that disk write latency, especially in the cases where the user has set wal_buffers to a high value? Well, it does take longer to fsync a larger byte range to disk than a smaller byte range, in some cases. But it's generally more efficient to write one larger range than many smaller ranges, so you come out ahead on the whole. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
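The tradeoff can be seen with a tiny standalone benchmark sketch (the file name, chunk size, and counts are arbitrary choices for the example): the same bytes are flushed once with one fsync per write, and once with a single fsync at the end.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define CHUNK	8192
#define NCHUNKS	128

static double
now_seconds(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return tv.tv_sec + tv.tv_usec / 1e6;
}

int
main(void)
{
	char	buf[CHUNK];
	int		fd = open("walsim.dat", O_CREAT | O_WRONLY | O_TRUNC, 0600);
	double	t;

	if (fd < 0)
	{
		perror("open");
		return 1;
	}
	memset(buf, 'x', sizeof(buf));

	/* many small writes, each followed by its own fsync */
	t = now_seconds();
	for (int i = 0; i < NCHUNKS; i++)
	{
		if (write(fd, buf, CHUNK) != CHUNK)
			perror("write");
		fsync(fd);
	}
	printf("%d separate fsyncs: %.3f s\n", NCHUNKS, now_seconds() - t);

	/* the same bytes written in one burst, flushed by a single fsync */
	t = now_seconds();
	for (int i = 0; i < NCHUNKS; i++)
		if (write(fd, buf, CHUNK) != CHUNK)
			perror("write");
	fsync(fd);
	printf("one grouped fsync: %.3f s\n", now_seconds() - t);

	close(fd);
	unlink("walsim.dat");
	return 0;
}

On most storage the grouped variant wins by a wide margin, which is the effect group commit exploits: a single, slightly larger flush instead of many small ones.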
Re: [HACKERS] Documentation/help for materialized and recursive views
On Thu, Jun 27, 2013 at 5:29 AM, Magnus Hagander mag...@hagander.net wrote: The functionality of materialized views will (over time) totally swamp that of normal views, so mixing all the corresponding documentation with the documentation for normal views probably doesn’t make things easier for people that are only interested in normal views. That's a better point I think. That said, it would be very useful if it actually showed up in \h CREATE VIEW in psql - I wonder if we should just add the syntax to that page, and then link said future information on a separate page somehow? IMHO, it's better to keep them separate; they are very different beasts. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add more regression tests for dbcommands
On 6/27/13 10:57 AM, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: On 6/26/13 12:17 PM, Tom Lane wrote: (I like to point at mysql's regression tests, which take well over an hour even on fast machines. If lots of tests are so helpful, why is their bug rate no better than ours?) Tests are not (primarily) there to prevent bugs. Really? Pray tell, what do you think they're for? Particularly code coverage tests, which is what these are? Tests document the existing functionality of the code, to facilitate refactoring. (paraphrased, Uncle Bob) Case in point, the existing ALTER DDL code could arguably use even more refactoring. But no one wants to do it because it's unclear what will break. With the proposed set of tests (which I have not read to completion), this could become quite a bit easier, both for the coder and the reviewer. But these tests probably won't detect, say, locking bugs in such new code. That can only be prevented by careful code review and a clean architecture. Perhaps, MySQL has neither. ;-) Code coverage is not an end itself, it's a tool. In this sense, tests prevent existing functionality being broken, which might be classified as a bug. But it's wrong to infer that because system X has a lot of bugs and a lot of tests, having a lot of tests must be useless. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reduce maximum error in tuples estimation after vacuum.
On Thu, Jun 27, 2013 at 7:27 AM, Amit Kapila amit.kap...@huawei.com wrote: Now I can look into it further, I have still not gone through in detail about your new approach to calculate the reltuples, but I am wondering whether there can be anyway with which estimates can be improved with different calculation in vac_estimate_reltuples(). I think this is getting at the threshold question for this patch, which is whether it's really making things better or just moving the problems around. I mean, I have no problem accepting that the new algorithm is (1) reasonably cheap and (2) better in some cases. But if it's worse in other cases, which AFAICS hasn't been discussed, then it's no good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD5 aggregate
On Thu, Jun 27, 2013 at 7:29 AM, Marko Kreen mark...@gmail.com wrote: On Thu, Jun 27, 2013 at 11:28 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: On 26 June 2013 21:46, Peter Eisentraut pete...@gmx.net wrote: On 6/26/13 4:04 PM, Dean Rasheed wrote: A quick google search reveals several people asking for something like this, and people recommending md5(string_agg(...)) or md5(string_agg(md5(...))) based solutions, which are doomed to failure on larger tables. The thread discussed several other options of checksumming tables that did not have the air of a cryptographic offering, as Noah put it. True but md5 has the advantage of being directly comparable with the output of Unix md5sum, which would be useful if you loaded data from external files and wanted to confirm that your import process didn't mangle it. The problem with md5_agg() is that it's only useful in toy scenarios. It's more useful to give people a script that does the same sum(hash(row)) on the dump file than to try to run MD5 on ordered rows. Also, I don't think anybody actually cares about MD5(table-as-bytes); instead people want a way to check whether two tables, or a table and a dump, are the same. I think you're trying to tell Dean to write the patch that you want instead of the patch that he wants. There are certainly other things that could be done that some people might sometimes prefer, but that doesn't mean what he did isn't useful. That having been said, I basically agree with Noah: I think this would be a useful extension (perhaps even in contrib?) but I don't think we need to install it by default. It's useful, but it's also narrow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
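For what it's worth, the "sum of per-row hashes" approach Marko mentions can be sketched in a few lines of C using zlib's CRC-32 (the file name is a placeholder, and CRC-32 stands in for whatever per-row hash one actually wants); because addition is commutative, the result does not depend on row order:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <zlib.h>					/* crc32(); link with -lz */

int
main(int argc, char **argv)
{
	const char *path = (argc > 1) ? argv[1] : "table.dump";	/* assumed input */
	FILE	   *fp = fopen(path, "r");
	char		line[65536];
	uint64_t	sum = 0;

	if (fp == NULL)
	{
		perror(path);
		return 1;
	}
	/* sum the CRC-32 of every line; order of lines does not matter */
	while (fgets(line, sizeof(line), fp) != NULL)
		sum += crc32(0L, (const unsigned char *) line, (unsigned) strlen(line));
	fclose(fp);

	printf("order-independent checksum: %llu\n", (unsigned long long) sum);
	return 0;
}

This is a change-detection tool, not a tamper-evidence tool, matching the checksum-versus-cryptographic-hash distinction drawn elsewhere in the thread.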
Re: [HACKERS] in-catalog Extension Scripts and Control parameters (templates?)
On Thu, Jun 27, 2013 at 5:49 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I think that's a limitation of the old model and we don't want to turn templates for extensions into being shared catalogs. At least that's my understanding of the design consensus. I agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fixing pg_ctl with relative paths
On Thu, Jun 27, 2013 at 10:36 AM, Josh Kupershmidt schmi...@gmail.com wrote: On Wed, Jun 26, 2013 at 12:22 PM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Jun 26, 2013 at 2:36 PM, Hari Babu haribabu.ko...@huawei.com wrote: On June 26, 2013 5:02 AM Josh Kupershmidt wrote: Thanks for the feedback. Attached is a rebased version of the patch with the two small issues noted fixed. The following description in the pg_ctl documentation needs to be modified? restart might fail if relative paths were specified on the command-line during server start. Right, that caveat could go away.
+#define DATADIR_SPEC	"\"-D\" \""
+
+	datadir = strstr(post_opts, DATADIR_SPEC);
Though this is a corner case, the patch doesn't seem to handle properly the case where -D appears as another option's value, e.g., the -k option value, in the postmaster.opts file. Could I see a command-line example of what you mean? postmaster -k -D, for example. Of course, it's really a corner case :) Another corner case is, for example, pg_ctl -D test1 -o -D test2, that is, multiple -D specifications appear in the command-line. Can we overlook these cases? One idea to work around that problem is to just append the specified -D option and value to post_opts. IOW, the -D option and value appear twice in post_opts. In this case, the -D that appears later takes effect in the end. Thoughts? Hrm, I think we'd have to be careful that postmaster.opts doesn't accumulate an additional -D specification with every restart. Yes. Oh, I was thinking that the postmaster writes only the -D specification which it actually uses in the opts file. So that accumulation would not happen, I thought. But that's not true. The postmaster writes all the specified arguments to the opts file. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
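The "append another -D" workaround floated above would look roughly like this (a sketch only; the function and variable names are illustrative, not the real pg_ctl.c ones, and as noted in the thread it would still need care so that postmaster.opts does not accumulate a new -D on every restart):

#include <stdio.h>

static void
build_restart_command(char *cmd, size_t cmdsize,
					  const char *exec_path,
					  const char *post_opts,
					  const char *datadir)
{
	/* the -D appended last is the one the postmaster ends up using */
	snprintf(cmd, cmdsize, "\"%s\" %s -D \"%s\"",
			 exec_path, post_opts, datadir);
}

int
main(void)
{
	char	cmd[1024];

	build_restart_command(cmd, sizeof(cmd),
						  "/usr/local/pgsql/bin/postgres",	/* assumed path */
						  "-c work_mem=4MB",				/* existing opts */
						  "/usr/local/pgsql/data");			/* assumed datadir */
	puts(cmd);
	return 0;
}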
Re: [HACKERS] Kudos for Reviewers -- straw poll
On Thu, Jun 27, 2013 at 10:37 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jun 25, 2013 at 2:27 PM, Andrew Dunstan and...@dunslane.net wrote: I'd like to see prizes each release for best contribution and best reviewer - I've thought for years something like this would be worth trying. Committers and core members should not be eligible - this is about encouraging new people. Encouraging new people is good, but recognizing sustained, long-term contributions is good, too. I think we should do more of that, too. Conforming with David Fetter's pointer to the notion that sometimes attempts to reward can backfire, I'm not sure that it will be super-helpful to create special rewards. On the other hand, to recognize reviewer contributions in places relevant to where they take place seems pretty apropos, which could include: a) Obviously we already capture this in the CommitFest web site (but it's worth mentioning when trying to do a census) b) It would be a pretty good thing to mention reviewers within commit notes; that provides some direct trace-back as to who it was that either validated that the change was good, or that let a bad one slip through. c) The release notes indicate authors of changes; to have a list of reviewers would be a fine thing. If it requires inordinate effort to get the reviewers directly attached to each and every change, perhaps it isn't worthwhile to go to extreme efforts to that end. It could be pretty satisfactory to have a simple listing, in the release notes, of the set of reviewers. That's a lot less bookkeeping than tracking this for each and every change. The statement of such a list is a public acknowledgement of those that help assure that the quality of PostgreSQL code remains excellent. (And that may represent a good way to sell this kudo.) This allows organizations that are sponsoring PostgreSQL development to have an extra metric by which *they* can recognize that their staff that do such work are being recognized as contributors. It seems to me that this is way more useful than a free t-shirt or the like.
Re: [HACKERS] Kudos for Reviewers -- straw poll
On Thu, Jun 27, 2013 at 11:50:07AM -0400, Christopher Browne wrote: b) It would be a pretty good thing to mention reviewers within commit notes; that provides some direct trace-back as to who it was that either validated that the change was good, or that let a bad one slip through. c) The release notes indicate authors of changes; to have a list of reviewers would be a fine thing. If it requires inordinate effort to get the reviewers directly attached to each and every change, perhaps it isn't worthwhile to go to extreme efforts to that end. It could be pretty satisfactory to have a simple listing, in the release notes, of the set of reviewers. That's a lot less bookkeeping than tracking this for each and every change. Adding the names to each release note item is not a problem; the problem is the volume of names that overwhelms the release note text. If we went that direction, I predict we would just remove _all_ names from the release notes. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Min value for port
On Thu, Jun 27, 2013 at 9:22 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-06-27 15:11:26 +0200, Magnus Hagander wrote: On Thu, Jun 27, 2013 at 2:16 PM, Peter Eisentraut pete...@gmx.net wrote: On 6/27/13 6:34 AM, Magnus Hagander wrote: Is there a reason why we have set the min allowed value for port to 1, not 1024? Given that you can't actually start postgres with a value below 1024, shouldn't the entry in pg_settings reference that as well? Are you thinking of the restriction that you need to be root to use ports below 1024? That restriction is not necessarily universal. We can let the kernel tell us at run time if it doesn't like our port. Yes, that's the restriction I was talking about. It's just a bit annoying that if you look at pg_settings.min_value it doesn't actually tell you the truth. But yeah, I believe Windows actually lets you use a lower port number, so it'd at least have to be #ifdef'ed for that if we wanted to change it. You can easily change the setting on linux as well. And you can grant specific binaries the permission to bind to restricted ports without being root. I don't think the additional complexity to get a sensible value in there is warranted. With that large a set of local policies that can change the usual below-1024 policy, yep, I agree that it's not worth trying too hard on this one. And supposing something like SE-Linux can grant bindings for a particular user/binary to access a *specific* port, that represents a model that is pretty incompatible with the notion of a minimum value. On the one hand, the idea of having to add a lot of platform-specific code (which may further be specific to a framework like SE-Linux) is not terribly appealing. Further, if the result is something that doesn't really fit with a minimum, is it much worth fighting with the platform localities? Indeed, I begin to question whether indicating a minimum is actually meaningful. -- When confronted by a difficult problem, solve it by reducing it to the question, How would the Lone Ranger handle this?
Re: [HACKERS] extensible external toast tuple support snappy prototype
On Thu, Jun 27, 2013 at 6:08 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jun 19, 2013 at 3:27 AM, Andres Freund and...@2ndquadrant.com wrote: There will be a newer version of the patch coming today or tomorrow, so there's probably no point in looking at the one linked above before that... This patch is marked as Ready for Committer in the CommitFest app. But it is not clear to me where the patch is that is being proposed for commit. No, this conversation is about patch #1153, pluggable toast compression, which is in Needs Review; you may have confused it with #1127, extensible external toast tuple support. -- Hitoshi Harada
Re: [HACKERS] Kudos for Reviewers -- straw poll
On 06/27/2013 12:12 PM, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Thu, Jun 27, 2013 at 11:50:07AM -0400, Christopher Browne wrote: It could be pretty satisfactory to have a simple listing, in the release notes, of the set of reviewers. That's a lot less bookkeeping than tracking this for each and every change. Adding the names to each release note item is not a problem; the problem is the volume of names that overwhelms the release note text. If we went that direction, I predict we would just remove _all_ names from the release notes. Yeah. Keep in mind that the overwhelming majority of the audience for the release notes doesn't actually give a darn who implemented what. Maybe we should have a Kudos / Bragging rights wiki page, with a table something like this: Release | Feature Name | Principal Author(s) | Contributing Author(s) | Code Reviewer(s) | Tester(s) Constructing it going backwards would be an interesting task :-) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group Commits Vs WAL Writes
On Thu, Jun 27, 2013 at 12:56 AM, Atri Sharma atri.j...@gmail.com wrote: Now, with group commits, do we see a spike in that disk write latency, especially in the cases where the user has set wal_buffers to a high value? commit_delay exists to artificially increase the window in which the leader backend waits for more group commit followers. At higher client counts, that isn't terribly useful because you'll naturally have enough clients anyway, but at lower client counts particularly where fsyncs have high latency, it can help quite a bit. I mention this because clearly commit_delay is intended to trade off latency for throughput. Although having said that, when I worked on commit_delay, the average and worse-case latencies actually *improved* for the workload in question, which consisted of lots of small write transactions. Though I wouldn't be surprised if you could produce a reasonable case where latency was hurt a bit, but throughput improved. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
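Sketched in C, the mechanism Peter describes amounts to something like the following (not the actual xlog.c code; the callbacks are hypothetical stand-ins for backend-counting and WAL-flushing routines, and the GUC-like variables are simplified):

#include <unistd.h>

static int	commit_delay = 0;		/* microseconds; 0 disables the wait */
static int	commit_siblings = 5;	/* minimum number of concurrent backends */

/*
 * The backend that ends up doing the flush can widen the group commit
 * window: if enough other transactions are in flight, sleep briefly so
 * their commit records ride along on the same fsync.
 */
static void
group_commit_flush(long flush_up_to,
				   int (*count_active_backends) (void),
				   void (*flush_wal_up_to) (long))
{
	if (commit_delay > 0 &&
		count_active_backends() >= commit_siblings)
		usleep(commit_delay);

	flush_wal_up_to(flush_up_to);	/* one fsync covers all queued commits */
}

The latency cost is bounded by commit_delay itself, while the potential win is every fsync the followers no longer issue themselves, which is why small-write workloads can come out ahead on both throughput and, as noted above, sometimes even average latency.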
Re: [HACKERS] Hash partitioning.
2013/6/27 Markus Wanner mar...@bluegap.ch: On 06/27/2013 11:12 AM, Nicolas Barbier wrote: Imagine that there are a lot of indexes, e.g., 50. Although a lookup (walking one index) is equally fast, an insertion must update all 50 indexes. When each index requires one extra I/O (because each index is one level taller), that is 50 extra I/Os. In the partitioned case, each index would require the normal, smaller number of I/Os. Choosing which partition to use must only be done once: The result “counts” for all indexes that are to be updated. I think you're underestimating the cost of partitioning. After all, the lookup of what index to update for a given partition is a lookup in pg_index via pg_index_indrelid_index - a btree index. I am assuming that this (comparatively very small and super-hot) index is cached all the time, while for the other indexes (that are supposedly super-huge) only the top part stays cached. I am mostly just trying to find out where Yuri’s “partitioning is needed for super-huge tables” experience might come from, and noting that Heikki’s argument might not be 100% valid. I think that the “PostgreSQL-answer” to this problem is to somehow cluster the data on the “hotness column” (so that all hot rows are close to one another, thereby improving the efficiency of the caching of relation blocks) + partial indexes for the hot rows (as first mentioned by Claudio; to improve the efficiency of the caching of index blocks). Additionally, the depth of an index doesn't directly translate to the number of I/O writes per insert (or delete). I'd rather expect the avg. number of I/O writes per insert into a b-tree to be reasonably close to one - depending mostly on the number of keys per page, not depth. My reasoning was: To determine which index block to update (typically one in both the partitioned and non-partitioned cases), one needs to walk the index first, which supposedly causes one additional (read) I/O in the non-partitioned case on average, because there is one extra level and the lower part of the index is not cached (because of the size of the index). I think that pokes a hole in Heikki’s argument of “it really doesn’t matter, partitioning == using one big table with big non-partial indexes.” Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add --progress option to pgbench (submission 3)
Dear Robert, Here is a v4 that takes into account most of your points: The report is performed for all threads by thread 0, however --progress is not supported under thread fork emulation if there is more than one thread. The report time does not slip anymore. I don't believe that to be an acceptable restriction. The pthread fork emulation is just an ugly hack to run pgbench on a host that does not have pthreads (POSIX threads). I'm not sure that it applies on any significant system, but I can assure you that it imposes severe limitations on how to do things properly in pgbench: As there are no threads, there is no shared memory, no locking mechanism, nothing really. So it is hard to generate a shared report in such conditions. My first proposal is to remove the fork emulation altogether, which would remove many artificial limitations in pgbench and simplify the code significantly. That would be an improvement. Otherwise, the simplest possible adaptation, if the progress feature must work under fork emulation for the patch to be accepted, is that under fork emulation each process reports its current progress instead of having a collective summing. Note that it is possible to implement the feature with interprocess communications, but really generating many pipes will add a lot of complexity to the code, and I do not think that the code nor this simple feature deserves that. Another option is to have each thread report its progress independently with all implementations, which is what I did in the first instance. It is much less interesting, but it would be homogeneous, although poor, for every version. We generally require features to work on all platforms we support. We have made occasional compromises there, but generally only when the restriction is fundamental to the platform rather than for developer convenience. I agree with this rule generally, but please consider that pthread fork emulation really means processes, so that simple things with threads become significantly more complex to implement. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
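Under real pthreads, the "thread 0 reports for everyone" scheme is simple because the per-thread counters live in one address space; a rough sketch follows (names are illustrative, not pgbench's; approximate counts are acceptable for a progress line, so no locking is used):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 4

static volatile long tx_done[NTHREADS];
static volatile int running = 1;

static void *
worker(void *arg)
{
	int		id = *(int *) arg;

	while (running)
	{
		usleep(1000);			/* stand-in for executing one transaction */
		tx_done[id]++;
	}
	return NULL;
}

int
main(void)
{
	pthread_t	th[NTHREADS];
	int			ids[NTHREADS];

	for (int i = 0; i < NTHREADS; i++)
	{
		ids[i] = i;
		pthread_create(&th[i], NULL, worker, &ids[i]);
	}

	for (int sec = 1; sec <= 5; sec++)	/* report once per second */
	{
		long	total = 0;

		sleep(1);
		for (int i = 0; i < NTHREADS; i++)
			total += tx_done[i];
		printf("progress: %d s, %ld transactions so far\n", sec, total);
	}

	running = 0;
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(th[i], NULL);
	return 0;
}

Compile with -pthread. Under fork emulation each worker is a separate process, so either each process prints its own line or the counters have to travel over pipes or shared memory, which is exactly the extra machinery being objected to above.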
Re: [HACKERS] PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
Andres Freund and...@2ndquadrant.com writes: On 2013-06-26 20:07:40 -0400, Tom Lane wrote: I still want to delete the test for SOCK_ERRNO == 0. I traced that back to commit da9501bddb4dc33c031b1db6ce2133bcee7b, but I can't find anything in the mailing list archives to explain that. I have an inquiry in to Jan to see if he can remember the reason ... That looks strange, yes. Committed that way. We can always undo it if Jan can defend the logic. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Kudos for Reviewers -- straw poll
Bruce Momjian br...@momjian.us writes: On Thu, Jun 27, 2013 at 11:50:07AM -0400, Christopher Browne wrote: It could be pretty satisfactory to have a simple listing, in the release notes, of the set of reviewers. That's a lot less bookkeeping than tracking this for each and every change. Adding the names to each release note item is not a problem; the problem is the volume of names that overwhelms the release note text. If we went that direction, I predict we would just remove _all_ names from the release notes. Yeah. Keep in mind that the overwhelming majority of the audience for the release notes doesn't actually give a darn who implemented what. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD5 aggregate
On 6/27/13 4:19 AM, Dean Rasheed wrote: I'd say there are clearly people who want it, and the nature of some of those answers suggests to me that we ought to have a better answer in core. It's not clear what these people wanted this functionality for. They all wanted to analyze a table to compare with another table (or the same table later). Either they wanted this to detect data changes, in which case the right tool is a checksum, not a cryptographic hash. We already have several checksum implementations in core, so we could expose one of them. Or they wanted this to protect their data from tampering, in which case the right tool is a cryptographic hash, but Noah argues that a sum of MD5 hashes is not cryptographically sound. (And in any case, we don't put cryptographic functionality into the core.) The reason md5_agg is proposed here and in all those cited posts is presumably because the md5() function was already there anyway. The md5() function is there because the md5 code was already there anyway because of the authentication. Let's not add higher-order already-there-anyway code. ;-) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group Commits Vs WAL Writes
commit_delay exists to artificially increase the window in which the leader backend waits for more group commit followers. At higher client counts, that isn't terribly useful because you'll naturally have enough clients anyway, but at lower client counts particularly where fsyncs have high latency, it can help quite a bit. I mention this because clearly commit_delay is intended to trade off latency for throughput. Although having said that, when I worked on commit_delay, the average and worse-case latencies actually *improved* for the workload in question, which consisted of lots of small write transactions. Though I wouldn't be surprised if you could produce a reasonable case where latency was hurt a bit, but throughput improved. Thanks for your reply. The logic says that latency will be hit when commit_delay is applied, but I am really interested in why we get an improvement instead. Can small writes be the reason? Regards, Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers