Re: [HACKERS] heap vacuum cleanup locks
On Mon, Jun 6, 2011 at 12:19 AM, Itagaki Takahiro itagaki.takah...@gmail.com wrote:
> On Sun, Jun 5, 2011 at 12:03, Robert Haas robertmh...@gmail.com wrote:
>> If other buffer pins do exist, then we can't defragment the page, but
>> that doesn't mean no useful work can be done: we can still mark used
>> line pointers dead, or dead line pointers unused. We cannot defragment,
>> but that can be done either by the next VACUUM or by a HOT cleanup.
>
> This is just an idea -- Is it possible to have copy-on-write techniques?
> VACUUM allocates a duplicated page for the pinned page, and copy valid
> tuples into the new page. Following buffer readers after the VACUUM will
> see the cloned page instead of the old pinned one.

Heikki suggested the same thing, and it's not a bad idea, but I think it
would be more work to implement than what I proposed. The caller would
need to be aware that, if it tries to re-acquire a content lock on the
same page, the offset of the tuple within the page might change. I'm not
sure how much work would be required to cope with that possibility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap vacuum cleanup locks
On Jun 6, 2011, at 1:00 AM, Robert Haas wrote:
> On Mon, Jun 6, 2011 at 12:19 AM, Itagaki Takahiro itagaki.takah...@gmail.com wrote:
>> On Sun, Jun 5, 2011 at 12:03, Robert Haas robertmh...@gmail.com wrote:
>>> If other buffer pins do exist, then we can't defragment the page, but
>>> that doesn't mean no useful work can be done: we can still mark used
>>> line pointers dead, or dead line pointers unused. We cannot
>>> defragment, but that can be done either by the next VACUUM or by a
>>> HOT cleanup.
>>
>> This is just an idea -- Is it possible to have copy-on-write
>> techniques? VACUUM allocates a duplicated page for the pinned page, and
>> copy valid tuples into the new page. Following buffer readers after the
>> VACUUM will see the cloned page instead of the old pinned one.
>
> Heikki suggested the same thing, and it's not a bad idea, but I think it
> would be more work to implement than what I proposed. The caller would
> need to be aware that, if it tries to re-acquire a content lock on the
> same page, the offset of the tuple within the page might change. I'm not
> sure how much work would be required to cope with that possibility.

I've had a related idea that I haven't looked into... if you're scanning
a relation (ie: index scan, seq scan), I've wondered if it would be more
efficient to deal with the entire page at once, possibly by making a copy
of it. This would reduce the number of times you pin the page (often
quite dramatically). I realize that means copying the entire page, but I
suspect that would occur entirely in the L1 cache, which would be fast.
So perhaps instead of copy-on-write we should try for copy-on-read in all
appropriate plan nodes.

On a related note, I've also wondered if it would be useful to allow
nodes to deal with more than one tuple at a time; the idea being that
it's better to execute a smaller chunk of code over a bigger chunk of
data instead of dribbling tuples through an entire execution tree one at
a time. Perhaps that will only be useful if nodes are executing in
parallel...

-- 
Jim C. Nasby, Database Architect
j...@nasby.net
512.569.9461 (cell)
http://jim.nasby.net
Re: [HACKERS] heap vacuum cleanup locks
On Sun, Jun 5, 2011 at 8:33 AM, Robert Haas robertmh...@gmail.com wrote:
> We've occasionally seen problems with VACUUM getting stuck for failure
> to acquire a cleanup lock due to, for example, a cursor holding a pin on
> the buffer page. In the worst case, this can cause an undetected
> deadlock, if the backend holding the buffer pin blocks trying to acquire
> a heavyweight lock that is in turn blocked by VACUUM. A while back,
> someone (Greg Stark? me?) floated the idea of not waiting for the
> cleanup lock. If we can't get it immediately, or within some short
> period of time, then we just skip the page and continue on.

Do we know if this is really a problem, though? The deadlock, for
example, can happen only when a backend tries to take a conflicting
table-level lock while holding the buffer pin, and I am not sure we do
that. The contention issue would probably make sense for small tables,
because for large to very large tables the probability that a backend and
vacuum would process the same page is quite low. With the current default
for vac_threshold, small tables can get vacuumed very frequently, and if
they are also heavily accessed, the cleanup lock can become a bottleneck.

Another issue that might be worth paying attention to is the single-pass
vacuum that I am currently working on. The design that we agreed upon
assumes that the index vacuum must clear index pointers to all the dead
line pointers. If we skip any page, we must at least collect the existing
dead line pointers and remove those index pointers. If we create dead
line pointers and we want to vacuum them later, we store the LSN in the
page, and that may require defrag. Of course, we can work around that,
but I think it will be useful if we do some tests to show that the
cleanup lock is indeed a major bottleneck.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] storing TZ along timestamps
On Jun 4, 2011, at 3:56 AM, Greg Stark wrote:
> On Thu, Jun 2, 2011 at 8:58 PM, Jim Nasby j...@nasby.net wrote:
>> I'm torn between whether the type should store the original time or the
>> original time converted to GMT.
>
> This is the wrong way to think about it. We *never* store time converted
> to GMT. When we want to represent a point in time we represent it as
> seconds since the epoch.

Right. Sorry, my bad.

> The question here is how to represent more complex concepts than simply
> points in time. I think the two concepts under discussion are a) a
> composite type representing a point in time and a timezone it should be
> interpreted in for operations and display and b) the original input
> provided, which is a text string with the constraint that it's a valid
> input which can be interpreted as a point in time.

My fear with A is that something could change that would make it
impossible to actually get back to the time that was originally entered.
For example, a new version of the timezone database could change
something. Though, that problem also exists for timestamptz today, so
presumably if it was much of an issue we'd have gotten complaints by now.

-- 
Jim C. Nasby, Database Architect
j...@nasby.net
512.569.9461 (cell)
http://jim.nasby.net
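Option (a) above can be illustrated with a small sketch: store the zone-independent instant (seconds since the epoch) together with the zone name used for interpretation and display. This is a Python model with made-up names, not a proposed on-disk format; it also shows why a tz database update could change only the displayed value, never the stored instant.

```python
# Sketch of "point in time + zone" as a composite value (illustrative only).
from dataclasses import dataclass
from datetime import datetime
from zoneinfo import ZoneInfo

@dataclass(frozen=True)
class TimestampWithZone:
    epoch_seconds: float   # the instant itself, zone-independent
    zone: str              # zone used only for operations and display

    def display(self) -> datetime:
        # Interpretation depends on the (mutable) tz database,
        # but epoch_seconds never changes.
        return datetime.fromtimestamp(self.epoch_seconds, ZoneInfo(self.zone))

ts = TimestampWithZone(0.0, "UTC")
print(ts.display().isoformat())  # 1970-01-01T00:00:00+00:00
```

Option (b), by contrast, would keep the original text string and re-derive the instant on demand, which is exactly what breaks if the tz rules change.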
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06.06.2011 07:12, Robert Haas wrote:
> I did some further investigation of this. It appears that more than 99%
> of the lock manager lwlock traffic that remains with this patch applied
> has locktag_type == LOCKTAG_VIRTUALTRANSACTION. Every SELECT statement
> runs in a separate transaction, and for each new transaction we run
> VirtualXactLockTableInsert(), which takes a lock on the vxid of that
> transaction, so that other processes can wait for it. That requires
> acquiring and releasing a lock manager partition lock, and we have to do
> the same thing a moment later at transaction end to dump the lock.
>
> A quick grep seems to indicate that the only places where we actually
> make use of those VXID locks are in DefineIndex(), when CREATE INDEX
> CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay
> expires. Considering that these are not commonplace events, it seems
> tremendously wasteful to incur the overhead for every transaction. It
> might be possible to make the lock entry spring into existence on demand
> - i.e. if a backend wants to wait on a vxid entry, it creates the LOCK
> and PROCLOCK objects for that vxid. That presents a few synchronization
> challenges, and plus we have to make sure that the backend that's just
> been given a lock knows that it needs to release it, but those seem like
> they might be manageable problems, especially given the new
> infrastructure introduced by the current patch, which already has to
> deal with some of those issues. I'll look into this further.

Ah, I remember I saw that vxid lock pop up quite high in an oprofile
profile recently. I think it was the case of executing a lot of very
simple prepared queries. So it would be nice to address that, even from a
single-CPU point of view.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] heap vacuum cleanup locks
On 06.06.2011 09:35, Jim Nasby wrote:
> I've had a related idea that I haven't looked into... if you're scanning
> a relation (ie: index scan, seq scan), I've wondered if it would be more
> efficient to deal with the entire page at once, possibly by making a
> copy of it. This would reduce the number of times you pin the page
> (often quite dramatically). I realize that means copying the entire
> page, but I suspect that would occur entirely in the L1 cache, which
> would be fast.

We already do that. When an index scan moves to an index page, the heap
tid pointers of all the matching index tuples are copied to
backend-private memory in one go, and the lock is released. And for a
seqscan, the visibility of all the tuples on the page is checked in one
go while holding the lock, then the lock is released but the pin is kept.
The pin is only released after all the tuples have been read. There's no
repeated pin-unpin for each tuple.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
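As a conceptual sketch (Python, all names mine; not PostgreSQL's actual C code), the seqscan batching Heikki describes looks like this: the content lock is taken once per page to decide visibility for every tuple, and individual tuples are then returned while holding only the pin.

```python
# Toy model of per-page batching: one lock acquisition per page, tuples
# returned afterwards under the pin alone.

class Page:
    def __init__(self, tuples):
        self.tuples = tuples            # list of (data, committed) pairs
        self.lock_acquisitions = 0
        self.pin_count = 0

def scan_page(page, snapshot_sees):
    page.pin_count += 1                 # pin: page can't be evicted under us
    page.lock_acquisitions += 1         # content lock taken once, not per tuple
    visible = [i for i, (_, committed) in enumerate(page.tuples)
               if snapshot_sees(committed)]
    # Content lock released here: visibility already decided for all tuples.
    try:
        for i in visible:
            yield page.tuples[i][0]     # returned while holding only the pin
    finally:
        page.pin_count -= 1             # pin dropped after the last tuple

page = Page([("a", True), ("b", False), ("c", True)])
rows = list(scan_page(page, lambda committed: committed))
print(rows, page.lock_acquisitions)     # ['a', 'c'] 1
```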
[HACKERS] WIP: AuthenticationMD5 protocol documentation clarification
This is my first patch, so I hope I've got the process right for
submitting patches. I'm building a driver that speaks version 3.0 of the
protocol, and generally I've found the documentation to be excellent. One
area I had trouble with was responding to the AuthenticationMD5Password
challenge. After receiving help on IRC, I've attached a patch to the
protocol documentation attempting to clarify what is expected by the
backend, basically:

    concat('md5',
           hex_encode(md5(concat(hex_encode(md5(concat(password, username))),
                                 salt))))

My technical writing skills were not up to wording that in plain English,
and it seems like the rest of the documentation for the protocol steers
clear of anything that looks like code. Is there a policy in this area,
or is the code-esque description ok? No code is changed, only
documentation, so I've left out the code-relevant patch info fields.

Patch info:
Project name: postgresql
Branch: master

Cyan

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
new file mode 100644
index d3de330..ba95241
*** a/doc/src/sgml/protocol.sgml
--- b/doc/src/sgml/protocol.sgml
***************
*** 294,303 ****
      <listitem>
       <para>
        The frontend must now send a PasswordMessage containing the
!       password encrypted via MD5, using the 4-character salt
!       specified in the AuthenticationMD5Password message. If
!       this is the correct password, the server responds with an
!       AuthenticationOk, otherwise it responds with an ErrorResponse.
       </para>
      </listitem>
     </varlistentry>
--- 294,306 ----
      <listitem>
       <para>
        The frontend must now send a PasswordMessage containing the
!       result of concat('md5',
!       hex_encode(md5(concat(hex_encode(md5(concat(password, username))),
!       salt)))), where salt is the 4-character salt specified in
!       the AuthenticationMD5Password message. Username and password do not
!       include the trailing null byte. If this is the correct password, the
!       server responds with an AuthenticationOk, otherwise it responds with
!       an ErrorResponse.
       </para>
      </listitem>
     </varlistentry>
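For driver authors, the computation above can be sketched as follows (a hedged example, not from the PostgreSQL source; function and variable names are mine):

```python
# PasswordMessage payload for AuthenticationMD5Password:
#   'md5' || hex(md5(hex(md5(password || username)) || salt))
import hashlib

def md5_password_response(user: str, password: str, salt: bytes) -> str:
    inner = hashlib.md5(password.encode() + user.encode()).hexdigest()
    outer = hashlib.md5(inner.encode() + salt).hexdigest()
    return "md5" + outer

digest = md5_password_response("alice", "s3cret", b"\x01\x02\x03\x04")
print(digest)  # 'md5' followed by 32 hex digits (35 characters total)
```

Note that neither the username, the password, nor the salt contributes a trailing null byte to the hash input; the terminating null belongs only to the string framing of the PasswordMessage itself.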
Re: [HACKERS] Vacuum, visibility maps and SKIP_PAGES_THRESHOLD
On 03.06.2011 22:16, Bruce Momjian wrote:
> I realize we just read the pages from the kernel to maintain sequential
> I/O, but do we actually read the contents of the page if we know it
> doesn't need vacuuming?

Yes.

> If so, do we need to?

Not necessarily, but it allows us to freeze old tuples, and doesn't cost
much anyway.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
[HACKERS] gdb with postgres
Hi,

I was using gdb to debug postgres. In order to debug the backend running
a query, I start postgres first and use select * from pg_backend_pid()
to ask for the backend pid. Then I start gdb in another bash window with
gdb postgres, attach the pid obtained above, and set the breakpoint. Then
I run the query from the first window. However, the debugging process,
shown below, does not reach the breakpoint. I tried many different
breakpoints, but it always starts from "305 client_read_ended()".

GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are welcome to change it and/or distribute copies of it under certain
conditions. Type "show copying" to see the conditions. There is
absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as sparc-sun-solaris2.10...
(gdb) b qp_add_paths_to_joinrel
Breakpoint 1 at 0x1a6744: file joinpath.c, line 67.
(gdb) attach 23903
Attaching to program `/usrlocal/pgsql/bin/postgres', process 23903
Retry #1:
Retry #2:
Retry #3:
Retry #4:
[New LWP 1]
0xff0cbaa4 in _rt_boot () from /usr/lib/ld.so.1
(gdb) n
Single stepping until exit from function _rt_boot, which has no line
number information.
secure_read (port=0x4a7760, ptr=0x455948, len=8192) at be-secure.c:305
305             client_read_ended();
(gdb) n
pq_recvbuf () at pqcomm.c:767
767             if (r < 0)
(gdb) n
769                     if (errno == EINTR)
(gdb) n
782             if (r == 0)
(gdb) n
788                     return EOF;
(gdb) n
791             PqRecvLength += r;
(gdb)

Anyone know what is going wrong? BTW, as you can see, the system is
Solaris.

-- 
Best Regards
Huang Qi Victor
Re: [HACKERS] gdb with postgres
On Mon, Jun 6, 2011 at 1:13 PM, HuangQi huangq...@gmail.com wrote:
> Hi, I was using gdb to debug postgres. In order to debug the backend
> running a query, I start postgres first and use select * from
> pg_backend_pid() to ask for the backend pid. Then I start gdb in another
> bash window with gdb postgres, attach the pid obtained above, and set
> the breakpoint. Then I run the query from the first window. However, the
> debugging process, shown below, does not reach the breakpoint. I tried
> many different breakpoints, but it always starts from "305
> client_read_ended()".

Please compile with -O0 -g flags to see all the debug symbols.

CFLAGS="-O0 -g" ./configure --enable-debug

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] gdb with postgres
Sorry, but recompiling with these flags still doesn't work.

On 6 June 2011 15:46, Pavan Deolasee pavan.deola...@gmail.com wrote:
> Please compile with -O0 -g flags to see all the debug symbols.
>
> CFLAGS="-O0 -g" ./configure --enable-debug
>
> Thanks,
> Pavan
>
> --
> Pavan Deolasee
> EnterpriseDB http://www.enterprisedb.com

-- 
Best Regards
Huang Qi Victor
Re: [HACKERS] BLOB support
On Sun, 05 Jun 2011 22:16:41 +0200, Dimitri Fontaine wrote:
> Tom Lane t...@sss.pgh.pa.us writes:
>> Yes. I think the appropriate problem statement is "provide streaming
>> access to large field values, as an alternative to just
>> fetching/storing the entire value at once". I see no good reason to
>> import the entire messy notion of LOBS/CLOBS. (The fact that other
>> databases have done it is not a good reason.)
>
> Spent some time in the archive to confirm a certain "déjà vu"
> impression. Couldn't find it. Had to manually search in closed commit
> fests… but here we are, I think:
>
> https://commitfest.postgresql.org/action/patch_view?id=70
> http://archives.postgresql.org/message-id/17891.1246301...@sss.pgh.pa.us
> http://archives.postgresql.org/message-id/4a4bf87e.7010...@ak.jp.nec.com

I thought more about this in light of the references sent, but I still
have in mind constructs like

    Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); // a bit outdated -- we have Blu-ray now
    conn.prepareStatement("INSERT INTO someonetubevideos VALUES (?)")

where the 1st parameter is myWeddingDvd; or, if someone doesn't like
Java, they may wish to pass a C++ istream or a C FILE. I think (with
respect to the consideration below) this implicitly requires that LOBs be
stored in one centralized place -- it doesn't matter whether that is the
file system, a special table, or something else -- because when the
statement is processed there is no idea which table the LOB will be
associated with. If we want to TOAST, where do we TOAST it? What happens
if the insertion is done by an SQL function that chooses the table
depending on the BLOB content?

A quite interesting idea from the cited patch was the string identifying
the LOB, but with the above it closes the road for JDBC to create a LOB.
I also think that constructs that first insert a small LOB into a table
to obtain some driver-dependent API are a little old-fashioned.

Possible solutions, if we don't want centralized storage, may be:
1. Keep the BLOB in memory, but this may, depending on the
implementation, limit the size of the BLOB.
2. Temporarily back up the blob in a file, then, when the values are
stored, copy the file to the TOAST table; but some changes are still
required to support LOBs for complex types and arrays.

So please give some ideas how to resolve this, or maybe it has low
priority?

Regards,
Radek
Re: [HACKERS] heap vacuum cleanup locks
On Sun, Jun 5, 2011 at 4:03 AM, Robert Haas robertmh...@gmail.com wrote:
> We've occasionally seen problems with VACUUM getting stuck for failure
> to acquire a cleanup lock due to, for example, a cursor holding a pin on
> the buffer page. In the worst case, this can cause an undetected
> deadlock, if the backend holding the buffer pin blocks trying to acquire
> a heavyweight lock that is in turn blocked by VACUUM.
>
> A while back, someone (Greg Stark? me?) floated the idea of not waiting
> for the cleanup lock. If we can't get it immediately, or within some
> short period of time, then we just skip the page and continue on. Today
> I had what might be a better idea: don't try to acquire a cleanup lock
> at all. Instead, acquire an exclusive lock. After having done so,
> observe the pin count. If there are no other buffer pins, that means our
> exclusive lock is actually a cleanup lock, and we proceed as now. If
> other buffer pins do exist, then we can't defragment the page, but that
> doesn't mean no useful work can be done: we can still mark used line
> pointers dead, or dead line pointers unused. We cannot defragment, but
> that can be done either by the next VACUUM or by a HOT cleanup. We can
> even arrange - using existing mechanism - to leave behind a hint that
> the page is a good candidate for a HOT cleanup, by setting pd_prune_xid
> to, say, FrozenXID.
>
> Like the idea of skipping pages on which we can't acquire a cleanup lock
> altogether, this should prevent VACUUM from getting stuck trying to lock
> a heap page. While buffer pins can be held for extended periods of time,
> I don't think there is any operation that holds a buffer content lock
> more than very briefly. Furthermore, unlike the idea of skipping the
> page altogether, we could use this approach even during an
> anti-wraparound vacuum.
>
> Thoughts?

Not waiting seems like a good idea. Not returning to the block while it
is in RAM, or not cleaning the block at all, would cause different
performance issues, which I would wish to avoid.

Hot Standby has specific code to avoid this situation. Perhaps you could
copy that, not sure.

-- 
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
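The proposal above can be modeled in a few lines (a toy Python sketch, names mine; the real logic would live in C in VACUUM and the buffer manager): take the exclusive content lock unconditionally, then let the observed pin count decide whether defragmentation is possible, while line-pointer reaping is always safe because it moves no tuple data.

```python
# Toy model: exclusive lock is held throughout; pin_count == 1 means our own
# pin is the only one, i.e. the exclusive lock is effectively a cleanup lock.
UNUSED, NORMAL, DEAD = "unused", "normal", "dead"

def vacuum_page(line_pointers, tuple_is_dead, pin_count):
    """Reap line pointers in place; report whether defragmentation is allowed."""
    for i, state in enumerate(line_pointers):
        if state == NORMAL and tuple_is_dead(i):
            line_pointers[i] = DEAD      # safe: no tuple data moves
        elif state == DEAD:
            line_pointers[i] = UNUSED    # assumes index pointers already removed
    return pin_count == 1                # True: may also defragment the page

lps = [NORMAL, DEAD, NORMAL]
defragged = vacuum_page(lps, tuple_is_dead=lambda i: i == 0, pin_count=2)
print(lps, defragged)  # ['dead', 'unused', 'normal'] False
```

The comment on DEAD -> UNUSED encodes Pavan's caveat from upthread: that transition is only legal once the index vacuum has cleared the corresponding index pointers.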
Re: [HACKERS] BLOB support
2011/6/6 Radosław Smogura rsmog...@softperience.eu:
> On Sun, 05 Jun 2011 22:16:41 +0200, Dimitri Fontaine wrote:
>> Tom Lane t...@sss.pgh.pa.us writes:
>>> Yes. I think the appropriate problem statement is "provide streaming
>>> access to large field values, as an alternative to just
>>> fetching/storing the entire value at once". I see no good reason to
>>> import the entire messy notion of LOBS/CLOBS. (The fact that other
>>> databases have done it is not a good reason.)
>>
>> Spent some time in the archive to confirm a certain "déjà vu"
>> impression. Couldn't find it. Had to manually search in closed commit
>> fests… but here we are, I think:
>> https://commitfest.postgresql.org/action/patch_view?id=70
>> http://archives.postgresql.org/message-id/17891.1246301...@sss.pgh.pa.us
>> http://archives.postgresql.org/message-id/4a4bf87e.7010...@ak.jp.nec.com
>
> I thought more about this in light of the references sent, but I still
> have in mind constructs like
>
>     Blob myWeddingDvd = conn.createBlob(myWeddingStream, size);
>     conn.prepareStatement("INSERT INTO someonetubevideos VALUES (?)")
>
> where the 1st parameter is myWeddingDvd; or, if someone doesn't like
> Java, they may wish to pass a C++ istream or a C FILE. I think (with
> respect to the consideration below) this implicitly requires that LOBs
> be stored in one centralized place, no matter whether this is the file
> system, a special table, or something else, because when the statement
> is processed there is no idea which table the LOB will be associated
> with. If we want to TOAST, where do we TOAST it? What happens if the
> insertion is done by an SQL function that chooses the table depending on
> the BLOB content?
>
> Possible solutions, if we don't want centralized storage, may be:
> 1. Keep the BLOB in memory, but this may, depending on the
> implementation, limit the size of the BLOB.
> 2. Temporarily back up the blob in a file, then, when the values are
> stored, copy the file to the TOAST table; but some changes are still
> required to support LOBs for complex types and arrays.

@1 is useless for multiuser applications. This is a problem of the
current implementation for large TOAST values. You can hold around
work_mem bytes in memory, but any larger content should be forwarded to a
file.

Pavel

> So please give some ideas how to resolve this, or maybe it has low
> priority?
>
> Regards,
> Radek
Re: [HACKERS] SAVEPOINTs and COMMIT performance
On 06.02.2011 23:09, Simon Riggs wrote:
> On Sun, 2011-02-06 at 12:11 -0500, Bruce Momjian wrote:
>> Did this ever get addressed?
>
> Patch attached. Seems like the easiest fix I can come up with.

@@ -2518,7 +2518,7 @@ CommitTransactionCommand(void)
 		case TBLOCK_SUBEND:
 			do
 			{
-				CommitSubTransaction();
+				CommitSubTransaction(true);
 				s = CurrentTransactionState;	/* changed by pop */
 			} while (s->blockState == TBLOCK_SUBEND);
 			/* If we had a COMMIT command, finish off the main xact too */

We also get into this codepath at RELEASE SAVEPOINT, in which case it is
wrong to not reassign the locks to the parent subtransaction.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
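The objection above can be made concrete with a toy model (Python, entirely illustrative; the real code is CommitSubTransaction in xact.c): skipping the lock transfer is only valid when the whole transaction is about to commit, while at RELEASE SAVEPOINT the parent must inherit the subtransaction's locks.

```python
# Toy model of the flag the patch adds to CommitSubTransaction: whether the
# subtransaction's locks are released outright or reassigned to the parent.
def commit_subtransaction(sub_locks, parent_locks, top_level_commit):
    if top_level_commit:
        # COMMIT of the main xact: all locks are released together anyway.
        sub_locks.clear()
    else:
        # RELEASE SAVEPOINT: locks must survive, now owned by the parent.
        parent_locks.extend(sub_locks)
        sub_locks.clear()

parent, sub = ["t1"], ["t2"]
commit_subtransaction(sub, parent, top_level_commit=False)
print(parent, sub)  # ['t1', 't2'] []
```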
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Sat, Jun 4, 2011 at 5:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Simon Riggs si...@2ndquadrant.com writes:
>> The approach looks sound to me. It's a fairly isolated patch and we
>> should be considering this for inclusion in 9.1, not wait another year.
>
> That suggestion is completely insane. The patch is only WIP and full of
> bugs, even according to its author. Even if it were solid, it is way too
> late to be pushing such stuff into 9.1. We're trying to ship a release,
> not find ways to cause it to slip more.

In 8.3, you implemented virtual transaction ids days before we produced a
Release Candidate, against my recommendation. At that time, I didn't
start questioning your sanity. In fact we all applauded that because it
was a great performance gain.

The fact that you disagree with me does not make me insane.

Inaction on this point, resulting in a year's delay, will be considered a
gross waste by the majority of objective observers.

-- 
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06.06.2011 12:40, Simon Riggs wrote:
> On Sat, Jun 4, 2011 at 5:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Simon Riggs si...@2ndquadrant.com writes:
>>> The approach looks sound to me. It's a fairly isolated patch and we
>>> should be considering this for inclusion in 9.1, not wait another
>>> year.
>>
>> That suggestion is completely insane. The patch is only WIP and full of
>> bugs, even according to its author. Even if it were solid, it is way
>> too late to be pushing such stuff into 9.1. We're trying to ship a
>> release, not find ways to cause it to slip more.
>
> In 8.3, you implemented virtual transaction ids days before we produced
> a Release Candidate, against my recommendation.

FWIW, this bottleneck was not introduced by the introduction of virtual
transaction ids. Before that patch, we just took the lock on the real
transaction id instead.

> The fact that you disagree with me does not make me insane.

You are not insane, even if your suggestion is.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] WIP: Fast GiST index build
On 06.06.2011 10:42, Heikki Linnakangas wrote:
> On 03.06.2011 14:02, Alexander Korotkov wrote:
>> Hackers,
>>
>> WIP patch of fast GiST index build is attached. Code is dirty and
>> comments are lacking, but it works. Now it is ready for first
>> benchmarks, which should prove efficiency of selected technique. It's
>> time to compare fast GiST index build with repeat insert build on
>> large enough datasets (datasets which don't fit to cache). There are
>> following aims of testing:
>> 1) Measure acceleration of index build.
>> 2) Measure change in index quality.
>> I'm going to do first testing using synthetic datasets. Everybody who
>> have interesting real-life datasets for testing are welcome.
>
> I ran another test with a simple table generated with:
>
> CREATE TABLE pointtest (p point);
> INSERT INTO pointtest SELECT point(random(), random()) FROM
> generate_series(1,5000);
>
> Generating a gist index with:
>
> CREATE INDEX i_pointtest ON pointtest USING gist (p);
>
> took about 15 hours without the patch, and 2 hours with it. That's
> quite dramatic.

Oops, that was a rounding error, sorry. The run took about 2.7 hours with
the patch, which of course should be rounded to 3 hours, not 2. Anyway,
it is still a very impressive improvement.

I'm glad you could get the patch ready for benchmarking this quickly. Now
you just need to get the patch into shape so that it can be committed.
That is always the more time-consuming part, so I'm glad you have plenty
of time left for it.

Could you please create a TODO list on the wiki page, listing all the
missing features, known bugs etc. that will need to be fixed? That'll
make it easier to see how much work there is left. It'll also help anyone
looking at the patch to know which issues are known issues.

Meanwhile, it would still be very valuable if others could test this with
different workloads. And Alexander, it would be good if at some point you
could write some benchmark scripts too, and put them on the wiki page,
just to see what kind of workloads have been taken into consideration and
tested already. Do you think there's some worst-case data distributions
where this algorithm would perform particularly badly?

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] DOMAINs and CASTs
On Tue, 2011-05-17 at 14:11 -0500, Jaime Casanova wrote:
> On Tue, May 17, 2011 at 12:19 PM, Robert Haas robertmh...@gmail.com wrote:
>> The more controversial question is what to do if someone tries to
>> create such a cast anyway. We could just ignore that as we do now, or
>> we could throw a NOTICE, WARNING, or ERROR.
>
> IMHO, not being an error per se but an implementation limitation, I
> would prefer to send a WARNING

Implementation limitations are normally reported as errors. I don't see
why it should be different here.

It's debatable whether it's an implementation restriction anyway. If you
want to create casts from or to a domain, maybe distinct types or type
aliases or something like that would be a more appropriate feature in the
long run.
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 2:54 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
> Ah, I remember I saw that vxid lock pop up quite high in an oprofile
> profile recently. I think it was the case of executing a lot of very
> simple prepared queries. So it would be nice to address that, even from
> a single-CPU point of view.

It doesn't seem too hard to do, although I have to think about the
details. Even though the VXID locks involved are Exclusive locks, they
are actually very much like the weak locks that the current patch
accelerates, because the Exclusive lock is taken only by the VXID owner,
and it can therefore be safely assumed that the initial lock acquisition
won't block anything. Therefore, it's really unnecessary to touch the
primary lock table at transaction start (and to only touch it at the end
if someone's waiting).

However, there's a fly in the ointment: when someone tries to ShareLock a
VXID, we need to determine whether that VXID is still around and, if so,
make an Exclusive lock entry for it in the primary lock table. And,
unlike what I'm doing for strong relation locks, it's probably NOT
acceptable for that to acquire and release every per-backend LWLock,
because every place that waits for VXID locks waits for a list of locks
in sequence, so we could end up with O(n^2) behavior.

Now, in theory that's not a huge problem: the VXID includes the backend
ID, so we ought to be able to figure out which single per-backend LWLock
is of interest and just acquire/release that one. Unfortunately, it
appears that there's no easy way to go from a backend ID to a PGPROC.
The backend IDs are offsets into the ProcState array, so they give us a
pointer to the backend's sinval state, not its PGPROC. And while the
PGPROC has a pointer to the sinval info, there's no pointer in the
opposite direction. Even if there were, we'd probably need to hold
SInvalWriteLock in shared mode to follow it. That might not be the end of
the world, since VXID locks are fairly infrequently used, but it's
certainly a little grotty.

I do rather wonder if we should be trying to reduce the number of
separate places where we list the running processes. We have arrays of
PGPROC structures, and then we have one set of pointers to PGPROCs in the
ProcArray, and then we have the ProcState structures for sinval. I wonder
if there's some way to rearrange all this to simplify the bookkeeping.

BTW, how do you identify from oprofile that *vxid* locks were the
problem? I didn't think it could produce that level of detail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: Fast GiST index build
Hi! On Mon, Jun 6, 2011 at 2:51 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.06.2011 10:42, Heikki Linnakangas wrote: I ran another test with a simple table generated with: CREATE TABLE pointtest (p point); INSERT INTO pointtest SELECT point(random(), random()) FROM generate_series(1,5000); Generating a gist index with: CREATE INDEX i_pointtest ON pointtest USING gist (p); took about 15 hours without the patch, and 2 hours with it. That's quite dramatic. Oops, that was a rounding error, sorry. The run took about 2.7 hours with the patch, which of course should be rounded to 3 hours, not 2. Anyway, it is still a very impressive improvement. I have similar results on 100 million rows: 21.6 hours without the patch and 2 hours with the patch. But I found a problem: index quality is worse. See the following query plans. Here test is a relation whose index was created in the ordinary way, and test2 is a relation where the patch was used. QUERY PLAN --- Bitmap Heap Scan on test (cost=4391.01..270397.31 rows=10 width=20) (actual time=1.257..2.147 rows=838 loops=1) Recheck Cond: (v @ '(0.903,0.203),(0.9,0.2)'::box) Buffers: shared hit=968 -> Bitmap Index Scan on test_idx (cost=0.00..4366.01 rows=10 width=0) (actual time=1.162..1.162 rows=838 loops=1) Index Cond: (v @ '(0.903,0.203),(0.9,0.2)'::box) Buffers: shared hit=131 Total runtime: 2.214 ms (7 rows) QUERY PLAN Bitmap Heap Scan on test2 (cost=4370.84..270377.13 rows=10 width=20) (actual time=5.252..6.056 rows=838 loops=1) Recheck Cond: (v @ '(0.903,0.203),(0.9,0.2)'::box) Buffers: shared hit=1458 -> Bitmap Index Scan on test2_idx (cost=0.00..4345.84 rows=10 width=0) (actual time=5.155..5.155 rows=838 loops=1) Index Cond: (v @ '(0.903,0.203),(0.9,0.2)'::box) Buffers: shared hit=621 Total runtime: 6.121 ms (7 rows) QUERY PLAN --- Bitmap Heap Scan on test (cost=4391.01..270397.31 rows=10 width=20) (actual time=2.148..2.977 rows=850 loops=1) Recheck Cond: (v @ '(0.503,0.503),(0.5,0.5)'::box) Buffers: shared hit=1099 ->
Bitmap Index Scan on test_idx (cost=0.00..4366.01 rows=10 width=0) (actual time=2.052..2.052 rows=850 loops=1) Index Cond: (v @ '(0.503,0.503),(0.5,0.5)'::box) Buffers: shared hit=249 Total runtime: 3.033 ms (7 rows) QUERY PLAN Bitmap Heap Scan on test2 (cost=4370.84..270377.13 rows=10 width=20) (actual time=6.806..7.602 rows=850 loops=1) Recheck Cond: (v @ '(0.503,0.503),(0.5,0.5)'::box) Buffers: shared hit=1615 -> Bitmap Index Scan on test2_idx (cost=0.00..4345.84 rows=10 width=0) (actual time=6.709..6.709 rows=850 loops=1) Index Cond: (v @ '(0.503,0.503),(0.5,0.5)'::box) Buffers: shared hit=773 Total runtime: 7.667 ms (7 rows) We can see that the index scan reads several times more pages. The original paper notes such an effect, explaining it by routing rectangles being laid out in less optimal ways. But this effect wasn't so dramatic in the tests provided in the paper. So, I have the following thoughts about this problem: 1) The number of pages read from the index is too large even with an ordinary index build. Querying a small area requires reading hundreds of pages. It is probably caused by the picksplit implementation. I have a version of the picksplit algorithm which seems to be much more efficient. I'll do some benchmarks with my picksplit algorithm; I hope the difference in index quality will not be so dramatic. 2) I can try to make some enhancements to the fast build algorithm which could improve tree quality. In the original paper the Hilbert heuristic was used to achieve even better tree quality than a tree created in the ordinary way. But since we use GiST we are restricted by its interface (or we have to create new interface function(s), which I'd like to avoid). I would like to try some ordering by penalty value in the buffer emptying process, and buffer relocation on split. 3) Probably, there is some bug which affects tree quality. Could you please create a TODO list on the wiki page, listing all the missing features, known bugs etc. that will need to be fixed?
That'll make it easier to see how much work there is left. It'll also help anyone looking at the patch to
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06.06.2011 07:12, Robert Haas wrote: I did some further investigation of this. It appears that more than 99% of the lock manager lwlock traffic that remains with this patch applied has locktag_type == LOCKTAG_VIRTUALTRANSACTION. Every SELECT statement runs in a separate transaction, and for each new transaction we run VirtualXactLockTableInsert(), which takes a lock on the vxid of that transaction, so that other processes can wait for it. That requires acquiring and releasing a lock manager partition lock, and we have to do the same thing a moment later at transaction end to dump the lock. A quick grep seems to indicate that the only places where we actually make use of those VXID locks are in DefineIndex(), when CREATE INDEX CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay expires. Considering that these are not commonplace events, it seems tremendously wasteful to incur the overhead for every transaction. It might be possible to make the lock entry spring into existence on demand - i.e. if a backend wants to wait on a vxid entry, it creates the LOCK and PROCLOCK objects for that vxid. That presents a few synchronization challenges, and plus we have to make sure that the backend that's just been given a lock knows that it needs to release it, but those seem like they might be manageable problems, especially given the new infrastructure introduced by the current patch, which already has to deal with some of those issues. I'll look into this further. At the moment, the transaction with a given vxid acquires an ExclusiveLock on the vxid, and anyone who wants to wait for it to finish acquires a ShareLock. If we simply reverse that, so that the transaction itself takes ShareLock, and anyone wanting to wait on it takes an ExclusiveLock, will this fastlock patch bust this bottleneck too?
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] heap vacuum cleanup locks
On Mon, Jun 6, 2011 at 2:36 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: Do we know if this is really a problem though? The deadlock, for example, can happen only when a backend tries to get a table-level conflicting lock while holding the buffer pin, and I am not sure if we do that. The deadlock isn't terribly common, because, as you say, you need the process holding the buffer pin to try to take a lock on the relation being vacuumed that is strong enough to conflict with ShareUpdateExclusiveLock. That's a slightly unusual thing to do. But the problem of vacuum stalling out because it can't get the cleanup lock is a very real one. I've seen at least one customer hit this in production, and it was pretty painful. Now, granted, you need some bad application design, too: you have to leave a cursor lying around instead of running it to completion and then stopping. But supposing you do make that mistake, you might hope that it wouldn't cause VACUUM starvation, which is what happens today. IOW, I'm less worried about whether the cleanup lock is slowing vacuum down than I am about eliminating the pathological cases where an autovacuum worker gets pinned down, stuck waiting for a cleanup lock that never arrives. Now the table doesn't get vacuumed (bad) and the system as a whole is one AV worker short of what it's supposed to have (also bad). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
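The skip-instead-of-stall behavior argued for here can be modeled in a few lines. This is a toy, single-threaded sketch under the assumption that a cleanup lock simply means "our pin is the only pin"; real buffer pins live in shared memory and need proper locking, and the names are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Grossly simplified model of a shared buffer header: just a pin count. */
typedef struct BufferDesc { int refcount; } BufferDesc;

/* Acquiring a cleanup lock requires that we hold the only pin. */
static bool conditional_cleanup_lock(BufferDesc *buf)
{
    /* caller is assumed to hold one pin already */
    return buf->refcount == 1;
}

/* Sketch of a vacuum loop that skips pinned pages instead of stalling. */
static int vacuum_relation(BufferDesc *pages, int npages)
{
    int cleaned = 0;

    for (int i = 0; i < npages; i++)
    {
        pages[i].refcount++;             /* pin the page */
        if (conditional_cleanup_lock(&pages[i]))
            cleaned++;                   /* prune and defragment */
        /* else: someone else holds a pin; skip the page and let the
         * next VACUUM or a HOT cleanup deal with it */
        pages[i].refcount--;             /* unpin */
    }
    return cleaned;
}

int main(void)
{
    BufferDesc pages[3] = {{0}, {1 /* pinned by an idle cursor */}, {0}};
    int cleaned = vacuum_relation(pages, 3);

    printf("%d\n", cleaned);   /* the pinned page is skipped */
    assert(cleaned == 2);
    return 0;
}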
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 8:02 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.06.2011 07:12, Robert Haas wrote: I did some further investigation of this. It appears that more than 99% of the lock manager lwlock traffic that remains with this patch applied has locktag_type == LOCKTAG_VIRTUALTRANSACTION. Every SELECT statement runs in a separate transaction, and for each new transaction we run VirtualXactLockTableInsert(), which takes a lock on the vxid of that transaction, so that other processes can wait for it. That requires acquiring and releasing a lock manager partition lock, and we have to do the same thing a moment later at transaction end to dump the lock. A quick grep seems to indicate that the only places where we actually make use of those VXID locks are in DefineIndex(), when CREATE INDEX CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay expires. Considering that these are not commonplace events, it seems tremendously wasteful to incur the overhead for every transaction. It might be possible to make the lock entry spring into existence on demand - i.e. if a backend wants to wait on a vxid entry, it creates the LOCK and PROCLOCK objects for that vxid. That presents a few synchronization challenges, and plus we have to make sure that the backend that's just been given a lock knows that it needs to release it, but those seem like they might be manageable problems, especially given the new infrastructure introduced by the current patch, which already has to deal with some of those issues. I'll look into this further. At the moment, the transaction with given vxid acquires an ExclusiveLock on the vxid, and anyone who wants to wait for it to finish acquires a ShareLock. If we simply reverse that, so that the transaction itself takes ShareLock, and anyone wanting to wait on it take an ExclusiveLock, will this fastlock patch bust this bottleneck too? Not without some further twaddling. 
Right now, the fast path only applies when you are taking a lock weaker than ShareUpdateExclusiveLock on an unshared relation. See also the email I just sent on why using the exact same mechanism might not be such a hot idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Error in PQsetvalue
Hello, guys. You wrote: MM On Fri, Jun 3, 2011 at 10:36 PM, Andrew Chernow a...@esilo.com wrote: On 6/3/2011 10:26 PM, Andrew Chernow wrote: I disagree -- I think the fix is a one-liner. line 446: if (tup_num == res->ntups && !res->tuples[tup_num]) should just become if (tup_num == res->ntups) also the memset of the tuple slots when the slot array is expanded can be removed. (in addition, the tuple array expansion should really be abstracted, but that isn't strictly necessary here). All true. This is a cleaner fix to something that was in fact broken ;) Attached a patch that fixes the OP's issue. PQsetvalue now uses pqAddTuple to grow the tuple table and has removed the remnants of an older idea that caused the bug. Sorry, I attached the wrong patch. Here is the correct one. MM This looks good. Pavel, want to test it? Sorry for the delay in answering. Yeah, I'm glad to. Should I apply this patch by myself? MM merlin -- With best wishes, Pavel mailto:pa...@gf.microolap.com
Re: [HACKERS] WIP: Fast GiST index build
On Mon, Jun 6, 2011 at 2:51 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Do you think there's some worst-case data distributions where this algorithm would perform particularly badly? I think there could be some worst-case GiST applications. Right now the GiST fast build algorithm invokes more penalty calls than the ordinary insert algorithm does. If I succeed then it will invoke even more such calls. So, if the penalty function is very slow then GiST fast build will be slower than ordinary insertion. -- With best regards, Alexander Korotkov.
Re: [HACKERS] WIP: Fast GiST index build
On Mon, Jun 6, 2011 at 4:14 PM, Alexander Korotkov aekorot...@gmail.com wrote: If I succeed then it will invoke even more such calls. I meant here that if I succeed with enhancements that improve index quality, then the fast build algorithm will invoke even more such calls. -- With best regards, Alexander Korotkov.
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06.06.2011 14:59, Robert Haas wrote: BTW, how do you identify from oprofile that *vxid* locks were the problem? I didn't think it could produce that level of detail. It can show the call stack of each call, with the --callgraph=n option, where you can see what percentage of the calls to LockAcquire come from VirtualXactLockTableInsert. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Range Types and extensions
Jeff Davis pg...@j-davis.com writes: I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types. That's a good question :) I think the way things are going to be organised now is that we will have core-blessed extensions: don't mix the mechanism and the policy. non-issue if we had a good type interface system (that works on polymorphic types) -- we could just have a built-in range interface, and the range extension could add && as the range interface's overlaps operator for the type ANYRANGE. That means that this is, IMHO, the right approach. Have core support that enables user-defined RANGE types with indexing and planner support, etc., like we have OPERATOR CLASS and FAMILY and all the jazz. And the useful stuff you need to have to benefit from that core support would be an extension. It could be a core-maintained extension, and it could even get installed by default, so that all the users would need to do is 'CREATE EXTENSION timeranges;', for example. So, I see us getting those different kinds of extensions in the future: a. core extensions, shipped by default b. contribs, not shipped by default, maintained by core hackers c. examples, included in the source code only, maintained as contribs d. “trusted network” of extensions (pgdg, pgxn, debian, privates, etc) e. external independent extensions, just as now The other main difference between a core extension and a contrib will be where it's documented. Greg Smith proposed a patch wherein he moved some contribs to a new extension/ place, and had them auto-installed. I think the consensus is to instead add a new chapter (maybe between current chapters 9. Functions and Operators and 10. Type Conversion) and host “core extensions” docs there. The source code organisation is controversial because it is technically not necessary. We have to keep the work Greg did to keep those contribs shipped by default.
Oh, and that is on the 9.1 Open Items, right? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] [PATCH] Bug in XPATH() if expression returns a scalar value
On tis, 2011-05-31 at 16:19 +0200, Florian Pflug wrote: If people deem this to be a problem, we could instead add a separate function XPATH_VALUE() that returns VARCHAR, and make people use that for scalar-value-returning expressions. Why not replicate what contrib/xml2 provides, namely xpath_string() xpath_number() xpath_bool() That way, types are preserved. However, to avoid confusion, XPATH() should then be taught to raise an error if used for scalar-value returning expressions, instead of silently returning an empty array as it does now. Sounds reasonable.
Re: [HACKERS] Domains versus polymorphic functions, redux
On fre, 2011-06-03 at 13:53 -0500, Kevin Grittner wrote: Another long-range nicety would be something which I have seen in some other databases, and which is consistent with the inheritance theme, is that you can't compare or assign dissimilar domains -- an error is thrown. So if you try to join from the eye color column in a person table to the key of a hair color table, you get an error unless you explicitly cast one or both of them to the common type. What you are looking for is the SQL feature called distinct types. The makers of the SQL standard have sort of deprecated domains in favor of distinct types, because distinct types address your sort of use case better, and prescribing the behavior of domains becomes weirder and weirder as the type system becomes more complex. Which is pretty much the same experience we've been having over the years.
[HACKERS] [v9.2] Fix leaky-view problem, part 1
This patch fixes the leaky-view problem for the scenario of functions with tiny cost estimates. The point of this scenario is the criteria used to reorder qualifiers of a scanning plan in order_qual_clauses(). The optimizer may pull up a simple subquery into the upper level, and its qualifiers then get merged with the ones in the upper level. When the executor scans a relation, qualifiers with smaller cost are executed earlier, to minimize the cost of filtering out invisible tuples. However, this has an undesirable side-effect when we use a view for row-level security. Even if a subquery rewritten from a view is meant to enforce row-level security, a function appended from outside the view may be executed earlier than the qualifiers that implement the view's security policy, as long as the view is simple enough and the supplied function has a tiny cost. As a result, this function can see arguments coming from invisible tuples, and leak them somewhere. The solution is quite simple. This patch tracks the original depth of each qualifier and modifies the sort criteria in order_qual_clauses(). Even if a function with tiny cost is supplied from outside a view, the patched optimizer does not prioritize estimated cost over depth. It fixes scenario [1] in the descriptions below. The background of the leaky-view problem is well summarized at: http://wiki.postgresql.org/wiki/RLS We discussed several scenarios during the v9.1 development cycle and at the last developer meeting, and roughly concluded on the following criteria to characterize whether a leaky-view scenario is a problem to be fixed, or not. * If an unprivileged user can directly reference the contents of invisible tuples, it is a problem to be fixed. * As long as the contents of invisible tuples are only consumed by internal stuff (e.g. an index access method), it is not a problem to be fixed. Thus, scenarios [1] and [2] are problems to be fixed, but [3] and [4] are not.
So, I'll try to fix these two scenarios with the part-1 and part-2 patches. [1] unexpected reordering of functions with tiny cost and side-effects. Qualifiers of a WHERE or JOIN ... ON clause are sorted by estimated cost, not by depth of nest level. Thus, this logic can reverse the order when a user-given qualifier has a smaller cost than the qualifiers implementing the security policy inside the view. As a result, these qualifiers can reference both visible and invisible tuples prior to the filtering by the view's row-level security policy, so this behavior can be used to leak the contents of invisible tuples. [2] unexpected push-down of functions with side-effects into a join loop. If the arguments of a qualifier appended outside a join loop reference only one side of the join, it is a good strategy, from a performance viewpoint, to push this qualifier down inside the join loop to minimize the number of tuples to be joined. However, this also reverses the order when the join is part of a view definition that should enforce a row-level security policy. These external qualifiers may then be executed prior to the filtering by the view's row-level security policy, so this behavior can be used to leak the contents of invisible tuples. [3] estimation of hidden values using iterated PK/FK probes. Due to the nature of PK/FK constraints, we can infer the existence of key values stored within invisible tuples, even if we never allow users to reference the contents of invisible tuples. This type of information leak is commonly called a covert channel, and according to previous security research it is basically impossible to prevent; however, its risk is also relatively small because of the slow bandwidth of the leak. We already reached consensus that this scenario is not a problem to be fixed.
[4] estimation of hidden values using statistics. One example is a selectivity-estimator function, which may reference, for optimization, statistical information derived from tables that have invisible tuples. There are two points to consider. The first is that purely internal stuff, such as index access methods, may be able to reference invisible tuples; this is not a problem as long as they are not leaked to end users. The second is statistics or other forms of data derived from invisible tuples. We can set up a table containing data derived from invisible tuples using row-level triggers; however, that is a matter of database administration. Unless the owner of the tables sets up such a leaky configuration, other users cannot reference them. Thanks, -- NEC Europe Ltd, SAP Global Competence Center KaiGai Kohei kohei.kai...@emea.nec.com pgsql-fix-leaky-view-part-1.patch Description: pgsql-fix-leaky-view-part-1.patch
Re: [HACKERS] gdb with postgres
On Mon, Jun 6, 2011 at 3:43 AM, HuangQi huangq...@gmail.com wrote: Hi, I was using gdb to debug postgres. In order to debug the backend of a running query, I start postgres first and use select * from pg_backend_pid() to ask for the backend pid. Then I start gdb in another bash window with gdb postgres, attach the pid obtained above, and set the breakpoint. Then I run the query from the first window. However, the debugging process which is shown below is not going to the breakpoint. I tried many different breakpoints, but it always starts from the 305 client_read_ended(). GNU gdb 6.6 Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as sparc-sun-solaris2.10... (gdb) b qp_add_paths_to_joinrel Breakpoint 1 at 0x1a6744: file joinpath.c, line 67. (gdb) attach 23903 Attaching to program `/usrlocal/pgsql/bin/postgres', process 23903 Retry #1: Retry #2: Retry #3: Retry #4: [New LWP 1] 0xff0cbaa4 in _rt_boot () from /usr/lib/ld.so.1 (gdb) n Single stepping until exit from function _rt_boot, which has no line number information. secure_read (port=0x4a7760, ptr=0x455948, len=8192) at be-secure.c:305 305 client_read_ended(); (gdb) n pq_recvbuf () at pqcomm.c:767 767 if (r < 0) (gdb) n 769 if (errno == EINTR) (gdb) n 782 if (r == 0) (gdb) n 788 return EOF; (gdb) n 791 PqRecvLength += r; (gdb) Anyone know what is going wrong? BTW, as you can see, the system is Solaris. Perhaps you want c for continue rather than n for next. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] gdb with postgres
If I enter c, gdb will directly finish executing this process and the current query will finish. Furthermore, if I enter the next query, gdb will not debug it and stays in continue status. On 6 June 2011 21:44, Robert Haas robertmh...@gmail.com wrote: On Mon, Jun 6, 2011 at 3:43 AM, HuangQi huangq...@gmail.com wrote: Hi, I was using gdb to debug postgres. In order to debug the backend of a running query, I start postgres first and use select * from pg_backend_pid() to ask for the backend pid. Then I start gdb in another bash window with gdb postgres, attach the pid obtained above, and set the breakpoint. Then I run the query from the first window. However, the debugging process which is shown below is not going to the breakpoint. I tried many different breakpoints, but it always starts from the 305 client_read_ended(). GNU gdb 6.6 Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as sparc-sun-solaris2.10... (gdb) b qp_add_paths_to_joinrel Breakpoint 1 at 0x1a6744: file joinpath.c, line 67. (gdb) attach 23903 Attaching to program `/usrlocal/pgsql/bin/postgres', process 23903 Retry #1: Retry #2: Retry #3: Retry #4: [New LWP 1] 0xff0cbaa4 in _rt_boot () from /usr/lib/ld.so.1 (gdb) n Single stepping until exit from function _rt_boot, which has no line number information. secure_read (port=0x4a7760, ptr=0x455948, len=8192) at be-secure.c:305 305 client_read_ended(); (gdb) n pq_recvbuf () at pqcomm.c:767 767 if (r < 0) (gdb) n 769 if (errno == EINTR) (gdb) n 782 if (r == 0) (gdb) n 788 return EOF; (gdb) n 791 PqRecvLength += r; (gdb) Anyone know what is going wrong? BTW, as you can see, the system is Solaris. Perhaps you want c for continue rather than n for next.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Best Regards Huang Qi Victor
Re: [HACKERS] gdb with postgres
On Mon, Jun 6, 2011 at 9:47 AM, HuangQi huangq...@gmail.com wrote: If I enter c, gdb will directly finish executing this process and current query will finish. Furthermore, if I enter next query, gdb will not debug it and stay in continue status. Hmm, that must mean your breakpoint isn't properly set. Instead of doing gdb postgres and then using attach, try just doing gdb -p PIDNAME, then set your breakpoint, then continue. That's how I've always done it, anyway... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] gdb with postgres
HuangQi huangq...@gmail.com wrote: (gdb) b qp_add_paths_to_joinrel Breakpoint 1 at 0x1a6744: file joinpath.c, line 67. (gdb) attach 23903 If I enter c, gdb will directly finish executing this process and current query will finish. Are you absolutely sure that running your query will result in a call to this function? -Kevin
Re: [HACKERS] WIP: AuthenticationMD5 protocol documentation clarification
On Sun, Jun 5, 2011 at 11:26 AM, Cyan Ogilvie cyan.ogil...@gmail.com wrote: This is my first patch, so I hope I've got the process right for submitting patches. You're doing great. I suspect we do want to either (1) reword what you've done in English, rather than writing it as code, or at least (2) add some SGML markup to the code. Our next CommitFest starts in just over a week, so you should receive some more specific feedback pretty soon. Also, if you'd like to help review someone else's patch, that would be great. http://archives.postgresql.org/pgsql-rrreviewers/2011-06/msg0.php -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] BLOB support
Radosław Smogura rsmog...@softperience.eu writes: I think more about this in contrast to the sent references, but I still have in my mind the construct Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); //A bit outdated, we have Blu-ray conn.prepareStatement(INSERT INTO someonetubevideos values (?)) where the 1st parameter is myWeddingDvd, Yes, if you insist upon designing the API like that, then you come to the conclusion that you need global LOB identifiers. However, there are many ways to design this that don't work that way. One idea to think about is insert into someonetubevideos values('') returning open_for_write(videocolumn) which gives you back some kind of writable stream ID (this is a transient, within-session ID, not global) for the target field in the row you just inserted. BTW, as was noted upthread by Dimitri, this whole subject has been discussed before on pgsql-hackers. You really ought to go re-read the previous threads. regards, tom lane
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 11:19 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.06.2011 12:40, Simon Riggs wrote: On Sat, Jun 4, 2011 at 5:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: The approach looks sound to me. It's a fairly isolated patch and we should be considering this for inclusion in 9.1, not wait another year. That suggestion is completely insane. The patch is only WIP and full of bugs, even according to its author. Even if it were solid, it is way too late to be pushing such stuff into 9.1. We're trying to ship a release, not find ways to cause it to slip more. In 8.3, you implemented virtual transactionids days before we produced a Release Candidate, against my recommendation. FWIW, this bottleneck was not introduced by the introduction of virtual transaction ids. Before that patch, we just took the lock on the real transaction id instead. Of course it wasn't. You've misunderstood completely. My point was that we have in the past implemented performance changes to increase scalability at the last minute, and also that our personal risk perspectives are not always set in stone. Robert has highlighted the value of this change and it's clearly not beyond our wit to include it, even if it is beyond our will to do so. The fact that you disagree with me does not make me insane. You are not insane, even if your suggestion is. LOL. Your logic is still poor though. :-) -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Different execution time for same plan
Hi, I am using PostgreSQL 8.4.6. I have made an index on my data type, and it is working fine; the output is correct. When I execute a query for the first time, it takes quite a long time, but the second execution of the same query takes much less time, even though the execution plan is the same. This is my first execution of the query: explain analyze select * from vehicle_st1 where '(116.3,39.3,2007/06/11 11:11:11),(117.2,39.8,2007/09/13 11:11:11)' @ stpoint; QUERY PLAN - Index Scan using st1 on vehicle_st1 (cost=0.00..8226.36 rows=2096 width=66) (actual time=65.962..1587.627 rows=9069 loops=1) Index Cond: ('(116.30,39.30,2007-06-11 11:11:11+05:30),(117.20,39.80,2007-09-13 11:11:11+05:30)'::ndpoint @ stpoint) Total runtime: 1594.446 ms (3 rows) Second time: explain analyze select * from vehicle_st1 where '(116.3,39.3,2007/06/11 11:11:11),(117.2,39.8,2007/09/13 11:11:11)' @ stpoint; QUERY PLAN - Index Scan using st1 on vehicle_st1 (cost=0.00..8226.36 rows=2096 width=66) (actual time=0.156..14.316 rows=9069 loops=1) Index Cond: ('(116.30,39.30,2007-06-11 11:11:11+05:30),(117.20,39.80,2007-09-13 11:11:11+05:30)'::ndpoint @ stpoint) Total runtime: 19.525 ms (3 rows) The third time it gives 17.148 ms, and the fourth time 25.102 ms. My postgresql.conf file has the following settings (these are the original defaults; I haven't modified anything): # RESOURCE USAGE (except WAL) # - Memory - shared_buffers = 28MB # min 128kB (change requires restart) #temp_buffers = 8MB # min 800kB #max_prepared_transactions = 0 # zero disables the feature (change requires restart) # Note: Increasing max_prepared_transactions costs ~600 bytes of shared memory per transaction slot, plus lock space (see max_locks_per_transaction). It is not advisable to set max_prepared_transactions nonzero unless you actively intend to use prepared transactions.
#work_mem = 1MB # min 64kB #maintenance_work_mem = 16MB # min 1MB #max_stack_depth = 2MB # min 100kB Why does the same plan give different execution times? (The reason may be that the data gets buffered/cached for the second execution.) Why is there so much difference? If I want to know the correct estimated time of this query, which option is right? 1. The first one (1594 ms), when the application has just started and all buffers are empty. (But in a practical situation they are not completely empty.) 2. Take the stable execution time (19-21 ms). 3. Average these four execution times. Which option is correct? Thanks Nick
Re: [HACKERS] Domains versus polymorphic functions, redux
Peter Eisentraut pete...@gmx.net writes: What you are looking for is the SQL feature called distinct types. The makers of the SQL standard have sort of deprecated domains in favor of distinct types, because distinct types address your sort of use case better, and prescribing the behavior of domains becomes weirder and weirder as the type system becomes more complex. Which is pretty much the same experience we've been having over the years. Yeah ... the one thing that is actually completely broken about (our current interpretation of) domains is that a first-class SQL datatype cannot sanely have a NOT NULL constraint attached to it. That just doesn't work in conjunction with outer joins, to take one glaring example. As I mentioned upthread, a closer look at the standard leads me to think that the committee doesn't actually intend that a domain's constraints follow it through operations --- I now think they only intend that the constraints get checked in the context of a cast to the domain (including assignment casts). In our terminology that would mean that a domain gets downcast to its base type as soon as you do anything at all to the value, even just pass it through a join. There are certainly applications where such a behavior isn't what you want, but trying to force domains to do something else is just not going to lead to desirable results. It's better to invent some other concept, and it sounds like the committee reached the same conclusion. Anyway, I think we're out of time to do anything about the issue for 9.1. I think what we'd better do is force a downcast in the context of matching to an ANYARRAY parameter, and leave the other cases to revisit later. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BLOB support
Tom Lane t...@sss.pgh.pa.us Monday 06 of June 2011 16:13:26 Radosław Smogura rsmog...@softperience.eu writes: I think more about this with contrast to sent references, but I still have in my mind construct Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); //A bit outdated we have BlueRay conn.prepareStatement(INSERT INTO someonetubevideos values (?)) where 1st parameter is myWeddingDvd, Yes, if you insist upon designing the API like that, then you come to the conclusion that you need global LOB identifiers. However, there are many ways to design this that don't work that way. One idea to think about is insert into someonetubevideos values('') returning open_for_write(videocolumn) which gives you back some kind of writable stream ID (this is a transient, within-session ID, not global) for the target field in the row you just inserted. I know, but this is a somewhat old-fashioned behaviour. BTW, as was noted upthread by Dimitri, this whole subject has been discussed before on pgsql-hackers. You really ought to go re-read the previous threads. regards, tom lane I have read this, but it may be worth rethinking. Actually, about changes to TOAST (I mean streaming would be just for LOBs): I had written down all the disadvantages I found of using TOAST for LOBs, and it looks like only the performance of the above approach is really a disadvantage; it also rules out some less useful concepts such as copy-on-write for LOBs. Introducing streaming for TOAST in general is of little use. Sorry, just to cite from my above-mentioned document (this is generally about on-demand streaming of TOASTed values; in the context of LOBs it is acceptable, as long as non-transaction-aware LOBs are acceptable).
If we add streaming of TOASTed values, so that the caller gets some reference to the value, we need to ensure that the pointed-to data will be neither changed nor deleted. I think this will require the caller to add FOR UPDATE (or the server to add it silently) to each statement returning pointers to TOASTed values, because a client may run a transactional query while another client, just after the first, removes the record, commits, and runs VACUUM. In that situation, when the first client tries to read the data from the given row, it will get an error. This may be acceptable for LOBs (commonly LOBs need not be transaction-aware), but I would be angry if it happened with VARCHAR. If this is acceptable I will make the following changes: - Add server_max_in_memory_lob_size, a start-time-only server GUC describing the maximum value of the client session parameter max_in_memory_lob. - Add max_in_memory_lob, a session GUC describing how large a LOB may be kept in memory before being backed by a file. - Recursively toast and detoast during insert/update/delete when searching for LOBs (we need this for arrays and composite types). This is for the last stage; an error disallowing LOBs in composites/arrays may be quite enough for a beginning. I want LOBs to be a starting point for LOBing other types (e.g. some big arrays might be LOBbed). - During toasting, the LOB will be toasted and, in place of the LOB, a reference to it will be put and encoded in the LOB datum. - An API for LOB manipulation (a few changes to the current implementation), such that a BLOB is a field type and a union of in_memory, in_file, and in_toast. These few extra bytes per LOB will not noticeably affect the datum size, given the size of a LOB. - An API for maintaining temporary LOB files. We need this as a per-session list of id - file descriptor mappings, to prevent stealing of LOBs by different connections (security). - A streaming API for TOASTed values (based on the COPY protocol, or a modified COPY protocol), or at least function calls; I haven't looked at this in the context of TOASTed LOBs. Is this good?
Regards, Radek -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BLOB support
Radosław Smogura rsmog...@softperience.eu writes: Introducing streaming for TOAST in general is of little use. Sorry, just to cite from my above-mentioned document (this is generally about on-demand streaming of TOASTed values; in the context of LOBs it is acceptable, as long as non-transaction-aware LOBs are acceptable). If we add streaming of TOASTed values, so that the caller gets some reference to the value, we need to ensure that the pointed-to data will be neither changed nor deleted - I think this will require the caller to add FOR UPDATE (or the server to add it silently) to each statement returning pointers to TOASTed values, because a client may run a transactional query, It's already been explained to you that that's not the case. If this is acceptable I will make the following changes: - Add server_max_in_memory_lob_size, a start-time-only server GUC describing the maximum value of the client session parameter max_in_memory_lob. - Add max_in_memory_lob, a session GUC describing how large a LOB may be kept in memory before being backed by a file. - Recursively toast and detoast during insert/update/delete when searching for LOBs (we need this for arrays and composite types). This is for the last stage; an error disallowing LOBs in composites/arrays may be quite enough for a beginning. I want LOBs to be a starting point for LOBing other types (e.g. some big arrays might be LOBbed). - During toasting, the LOB will be toasted and, in place of the LOB, a reference to it will be put and encoded in the LOB datum. - An API for LOB manipulation (a few changes to the current implementation), such that a BLOB is a field type and a union of in_memory, in_file, and in_toast. These few extra bytes per LOB will not noticeably affect the datum size, given the size of a LOB.
- An API for maintaining temporary LOB files. We need this as a per-session list of id - file descriptor mappings, to prevent stealing of LOBs by different connections (security). - A streaming API for TOASTed values (based on the COPY protocol, or a modified COPY protocol), or at least function calls; I haven't looked at this in the context of TOASTed LOBs. Is this good? This all looks like you decided on a solution first and then started to look for a problem to apply it to. I don't want to see us inventing a pile of GUCs for this, and I don't think there is a need to make any fundamental changes in the TOAST mechanism either. What we do need is a streaming access protocol to read or write wide field values *without* forcing people to change the basic structure of their tables. You keep on wanting to invent weird, IMO unnecessary language features instead. Try to make the feature as transparent as possible, not as visible as possible. regards, tom lane
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Excerpts from Tom Lane's message of sáb jun 04 12:49:05 -0400 2011: Alvaro Herrera alvhe...@commandprompt.com writes: What surprises me is that the open references remain after a database drop. Surely this means that no backends keep open file descriptors to any table in that database, because there are no connections. bgwriter ... Actually you were both wrong, hah. It's not bgwriter, and this doesn't belong on pgsql-general. It's a backend. However, as we mentioned initially, the database to which this file belongs is dropped. What we found out after more careful investigation is that the file is kept open by a backend connected to a different database. I have a suspicion that what happened here is that this backend was forced to flush out a page from shared buffers to read some other page; and it was forced to do a fsync of this file. And then it forgets to close the file descriptor. Actually, there are 11 processes holding open file descriptors to the table, each to a slightly different subset of the many segments of the table. (There's also one holding a FD to the deleted pg_largeobject_loid_pn_index -- remember, this is a dropped database). All those backends belong to Zabbix, the monitoring platform, which are connected to a different database. I think what we have here is a new bug. (This is running 8.4.8, by the way.) -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BLOB support
On 06.06.2011 17:13, Tom Lane wrote: Radosław Smogura rsmog...@softperience.eu writes: I think more about this with contrast to sent references, but I still have in my mind construct Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); //A bit outdated we have BlueRay conn.prepareStatement(INSERT INTO someonetubevideos values (?)) where 1st parameter is myWeddingDvd, Yes, if you insist upon designing the API like that, then you come to the conclusion that you need global LOB identifiers. That's what the JDBC API looks like, but it doesn't mean you need global LOB identifiers. When you create the Blob object (myWeddingDvd), the driver can just keep a reference to the given stream (myWeddingStream) in the Blob object. When you execute the INSERT statement, the driver can read the stream and stream the data to the server. The protocol changes I think Tom and I and others are envisioning would work just fine with that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Different execution time for same plan
First off, this is posted to the wrong list -- this list is for discussion of development of the PostgreSQL product. There is a list for performance questions where this belongs: pgsql-performa...@postgresql.org. I'm moving this to the performance list with a blind copy to the -hackers list so people know where the discussion went. Nick Raj nickrajj...@gmail.com wrote: When i execute the query first time, query takes a quite longer time but second time execution of the same query takes very less time (despite execution plan is same) Why the same plan giving different execution time? (Reason may be data gets buffered (cached) for the second time execution) Why there is so much difference? Because an access to a RAM buffer is much, much faster than a disk access. Which option will be true? It depends entirely on how much of the data needed for the query is cached. Sometimes people will run a set of queries to warm the cache before letting users in. MY postgresql.conf file having setting like this (this is original setting, i haven't modify anything) shared_buffers = 28MB #work_mem = 1MB# min 64kB #maintenance_work_mem = 16MB# min 1MB If you're concerned about performance, these settings (and several others) should probably be adjusted: http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
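Kevin's point — that the second run is served from RAM rather than disk — can be illustrated with a toy buffer-cache model. This is purely illustrative Python, not PostgreSQL code; the class and page counts are invented for the example:

```python
class ToyBufferCache:
    """Illustrative model: the first access to a page 'reads from disk'
    (slow, counted), later accesses hit the in-memory cache (fast)."""
    def __init__(self):
        self.pages = {}
        self.disk_reads = 0

    def read_page(self, page_no):
        if page_no not in self.pages:
            self.disk_reads += 1              # simulated disk I/O
            self.pages[page_no] = f"data-{page_no}"
        return self.pages[page_no]

cache = ToyBufferCache()
# "First execution": every page misses the cache.
for p in range(100):
    cache.read_page(p)
first_run_reads = cache.disk_reads
# "Second execution": same pages, all served from memory.
for p in range(100):
    cache.read_page(p)
second_run_reads = cache.disk_reads - first_run_reads
print(first_run_reads, second_run_reads)  # 100 0
```

This is why warming the cache with a preliminary set of queries, as mentioned above, makes subsequent timings stable.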
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 10:49 AM, Simon Riggs si...@2ndquadrant.com wrote: My point was that we have in the past implemented performance changes to increase scalability at the last minute, and also that our personal risk perspectives are not always set in stone. Robert has highlighted the value of this change and its clearly not beyond our wit to include it, even if it is beyond our will to do so. So, at the risk of totally derailing this thread -- what this boils down to is a philosophical disagreement. It seems to me (and, I think, to Tom and Heikki and others as well) that it's not possible to keep on making changes to the release right up until the last minute and then expect the release to be of high quality. If we keep committing new features, then we'll keep introducing new bugs. The only hope of making the bug count go down at some point is to stop making changes that aren't bug fixes. We could come up with some complex procedure for determining whether a patch is important enough and non-invasive enough to bypass the normal deadline, but that would probably lead to a lot more arguing about procedure, and realistically, it's still going to increase the bug count at least somewhat. IMHO, it's better to just have a deadline, and stuff either makes it or it doesn't. I realize we haven't always adhered to the principle in the past, but at least IMV that's not a mistake we want to continue repeating. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: AuthenticationMD5 protocol documentation clarification
On 06.06.2011 16:58, Robert Haas wrote: On Sun, Jun 5, 2011 at 11:26 AM, Cyan Ogilvie cyan.ogil...@gmail.com wrote: This is my first patch, so I hope I've got the process right for submitting patches. You're doing great. I suspect we do want to either (1) reword what you've done in English, rather than writing it as code, or at least (2) add some SGML markup to the code. Our next CommitFest starts in just over a week, so you should receive some more specific feedback pretty soon. That is quite complicated to explain in plain English, so some sort of pseudo-code is probably a good idea. I would recommend not to formulate it as a SQL expression, though. It makes you think you could execute it from psql or something. Even if you know that's not how to do it, it feels confusing. Maybe something like: <literal>md5</literal> || hex_encode(md5(hex_encode(md5(password || username)) || salt)) with some extra markup to make it look pretty. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
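The double-hash computation Heikki is describing can be sketched in a few lines of Python. The credentials and the 4-byte salt below are invented for illustration; the structure ("md5" prefix plus hex-encoded nested MD5) follows the formula above:

```python
import hashlib

def md5_auth_response(password: str, username: str, salt: bytes) -> str:
    """Compute "md5" || hex(md5(hex(md5(password || username)) || salt))."""
    inner = hashlib.md5((password + username).encode()).hexdigest()
    outer = hashlib.md5(inner.encode() + salt).hexdigest()
    return "md5" + outer

# Example with made-up credentials and a fixed 4-byte salt:
resp = md5_auth_response("secret", "alice", b"\x01\x02\x03\x04")
print(resp)  # "md5" followed by 32 hex digits (35 characters total)
```

Note that the inner digest is hex-encoded *before* the salt is appended, which is the detail that is easy to get wrong in plain-English wording.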
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Alvaro Herrera alvhe...@commandprompt.com wrote: What we found out after more careful investigation is that the file is kept open by a backend connected to a different database. I have a suspicion that what happened here is that this backend was forced to flush out a page from shared buffers to read some other page; and it was forced to do a fsync of this file. And then it forgets to close the file descriptor. This sounds vaguely similar to what I found with WAL files being held open for days after they were deleted by read-only backends: http://archives.postgresql.org/message-id/15412.1259630...@sss.pgh.pa.us I mention it only because there might be one place to fix both. -Kevin
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Excerpts from Kevin Grittner's message of lun jun 06 11:58:51 -0400 2011: Alvaro Herrera alvhe...@commandprompt.com wrote: What we found out after more careful investigation is that the file is kept open by a backend connected to a different database. I have a suspicion that what happened here is that this backend was forced to flush out a page from shared buffers to read some other page; and it was forced to do a fsync of this file. And then it forgets to close the file descriptor. This sounds vaguely similar to what I found with WAL files being held open for days after they were deleted by read-only backends: http://archives.postgresql.org/message-id/15412.1259630...@sss.pgh.pa.us I mention it only because there might be one place to fix both Hmm interesting. I don't think the placement suggested by Tom would be useful, because the Zabbix backends are particularly busy all the time, so they wouldn't run ProcessCatchupEvent at all. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Kevin Grittner's message of lun jun 06 11:58:51 -0400 2011: Alvaro Herrera alvhe...@commandprompt.com wrote: What we found out after more careful investigation is that the file is kept open by a backend connected to a different database. I have a suspicion that what happened here is that this backend was forced to flush out a page from shared buffers to read some other page; and it was forced to do a fsync of this file. And then it forgets to close the file descriptor. It doesn't forget to close the descriptor; it intentionally keeps it for possible further use. This sounds vaguely similar to what I found with WAL files being held open for days after they were deleted by read-only backends: http://archives.postgresql.org/message-id/15412.1259630...@sss.pgh.pa.us I mention it only because there might be one place to fix both Hmm interesting. I don't think the placement suggested by Tom would be useful, because the Zabbix backends are particularly busy all the time, so they wouldn't run ProcessCatchupEvent at all. Yeah, I wasn't that thrilled with the suggestion either. But we can't just have backends constantly closing every open FD they hold, or performance will suffer. I don't see any very good place to do this... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
Robert Haas robertmh...@gmail.com wrote: IMHO, it's better to just have a deadline, and stuff either makes it or it doesn't. I realize we haven't always adhered to the principle in the past, but at least IMV that's not a mistake we want to continue repeating. +1 I've said it before, but I think it bears repeating, that deferring this to 9.2 doesn't mean that it comes out in a production release 12 months later -- unless we continue to repeat this mistake endlessly. It means that this release comes out closer to when we said it would -- for the sake of argument let's hypothesize one month. So by holding the line on such inclusions all the current 9.1 features come out one month sooner, and this feature comes out 11 months later than it would have if we'd put it into 9.1. With some feature we consider squeezing in, it would be more like delaying everything which is done by three months so that one feature gets out nine months earlier. Perhaps the best way to describe the suggestion that this be included in 9.1 isn't that it's an insane suggestion; but that it's a suggestion which, if adopted, would be likely to drive those who are striving for a more organized development and release process insane. Or one could look at it in a cost/benefit format -- major features delivered per year go up by holding the line, administrative costs are reduced, and people who are focusing on release stability get more months per year to do development. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] WALInsertLock tuning
In earlier discussions of how to improve WALInsertLock contention, it was observed that we must zero each new page before we advance the WAL insertion point. http://postgresql.1045698.n5.nabble.com/Reworking-WAL-locking-td1983647.html IMHO the page zeroing is completely unnecessary, and replication works perfectly well without that (as a test of recovery logic). It is unnecessary because we already allow non-zeroed parts of WAL files, and provide a mechanism to detect stale data. The following trivial patch removes the page zeroing, which reduces the lock duration. Comments? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services walinsertlock_avoid_zero.v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Excerpts from Tom Lane's message of lun jun 06 12:10:24 -0400 2011: Alvaro Herrera alvhe...@commandprompt.com writes: Hmm interesting. I don't think the placement suggested by Tom would be useful, because the Zabbix backends are particularly busy all the time, so they wouldn't run ProcessCatchupEvent at all. Yeah, I wasn't that thrilled with the suggestion either. But we can't just have backends constantly closing every open FD they hold, or performance will suffer. I don't see any very good place to do this... How about doing something on an sinval message for pg_database? That doesn't solve the WAL problem Kevin found, of course ... -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Mon, 2011-06-06 at 06:56 +0200, Pavel Stehule wrote: 2011/6/6 Darren Duncan dar...@darrenduncan.net: Jeff Davis wrote: I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types. First of all, what are the advantages to being in core? it should be supported by FOREACH statement in PL/pgSQL Oh, good idea. It would only work for discrete ranges though. However, I would need to somehow reintroduce the concept of next, which has some hazards to it (as Tom pointed out, we don't want someone to define the next for a float to be +1.0). I'll have to think about this. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Alvaro Herrera alvhe...@commandprompt.com wrote: That doesn't solve the WAL problem Kevin found, of course ... I wouldn't worry about that too much -- the WAL issue is self-limiting and not likely to ever cause a failure. The biggest risk is that every now and then some new individual will notice it and waste a bit of time investigating. The LO issue, on the other hand, could easily eat enough disk to cause a failure. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Sun, 2011-06-05 at 21:51 -0700, Darren Duncan wrote: Jeff Davis wrote: I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types. First of all, what are the advantages to being in core? I believe that ranges aka intervals are widely useful generic types, next after relations/tuples/arrays, and they *should* be supported in core, same as arrays are. I think we all agree that ranges are important. I am not suggesting that we sacrifice on the semantics to make it an extension; I'm just trying to see if involving extensions for some of the approximately 5000 lines would be a good idea. Now assuming that a range/interval value is generally defined in terms of a pair of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or {,,=,=} etc or LIMIT makes sense), it will be essential that this value is capable of distinguishing open and closed intervals. Right, it already does that explicitly. I'd appreciate your input on some of the previous discussion though. Also, if Postgres has some concept of type-generic special values -Inf and +Inf (which always sort before or after any other value in the type system), those can be used as endpoints to indicate that the interval is unbounded. I already introduced +/- infinity to range types. They are not generic outside of ranges, however -- therefore you can't select the upper bound of an upper-infinite range. Unless you have some other syntax in mind, I suggest lifting the range literal syntax from Perl 6, where .. is an infix operator building a range between its arguments, and a ^ on either side means that side is open, I think; so there are 4 variants: {..,^..,..^,^..^}. Oh, interesting syntax. That might make a good operator version of a constructor. Unfortunately, . is not valid in an operator name in PG. Maybe I can use tilde or dash? 
Any operation that wants to deal with a range somehow, such as the BETWEEN syntax, could instead use a range/interval; for example, both of: foo in 1..10 I don't know if it's reasonable to introduce syntax like in here. Maybe we could just still use between and it would recognize that the RHS is a range? The LIMIT clause could take a range to specify take and skip count at once. Interesting idea. Array slicing can be done using foo[first..last] or such. I like that, but we already have foo[3:7], so it might be better not to introduce redundancy. Too bad I can't use : as an operator. A random number generator that takes endpoints can take a range argument. Sounds useful because it would make it more explicit whether the endpoints are possible results. An array or relation of these ranges can represent ranges with holes, and the general results of range union operations. Right, that's been brought up before as well. In particular, Scott Bailey has done some thinking/writing on this topic. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
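The open/closed-bound and infinite-bound semantics being discussed can be sketched with a tiny illustrative class. This is not the range-types implementation under discussion — the names and representation are invented — but it shows the four `{..,^..,..^,^..^}` variants and unbounded sides in miniature:

```python
class Range:
    """A range with independently inclusive/exclusive bounds;
    None for a bound means -infinity (lower) or +infinity (upper)."""
    def __init__(self, lower, upper, lower_inc=True, upper_inc=True):
        self.lower, self.upper = lower, upper
        self.lower_inc, self.upper_inc = lower_inc, upper_inc

    def __contains__(self, x):
        if self.lower is not None:
            if x < self.lower or (x == self.lower and not self.lower_inc):
                return False
        if self.upper is not None:
            if x > self.upper or (x == self.upper and not self.upper_inc):
                return False
        return True

closed = Range(1, 10)                      # like 1..10
half_open = Range(1, 10, upper_inc=False)  # like 1..^10
unbounded = Range(5, None)                 # like 5 .. +infinity
print(10 in closed, 10 in half_open, 10**9 in unbounded)  # True False True
```

Under this model, "foo in 1..10" is just a containment test against such a value, and an array of these objects is enough to represent a union with holes.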
Re: [HACKERS] Range Types and extensions
2011/6/6 Jeff Davis pg...@j-davis.com: On Mon, 2011-06-06 at 06:56 +0200, Pavel Stehule wrote: 2011/6/6 Darren Duncan dar...@darrenduncan.net: Jeff Davis wrote: I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types. First of all, what are the advantages to being in core? it should be supported by FOREACH statement in PL/pgSQL Oh, good idea. It would only work for discrete ranges though. However, I would need to somehow reintroduce the concept of next, which has some hazards to it (as Tom pointed out, we don't want someone to define the next for a float to be +1.0). I'll have to think about this. we can define a step FOREACH x IN RANGE . BY LOOP END LOOP Regards Pavel Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
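Pavel's proposed loop form might read like the following sketch. This is hypothetical syntax only — nothing here exists in PL/pgSQL — and the explicit BY step is what sidesteps the "next value" hazard Jeff mentions, since the step is supplied by the user rather than defined per-type:

```
-- hypothetical syntax sketch, not implemented
FOREACH d IN RANGE lower_bound .. upper_bound BY step LOOP
    -- body sees each value lower_bound, lower_bound + step, ...
    RAISE NOTICE 'value: %', d;
END LOOP;
```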
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of lun jun 06 12:10:24 -0400 2011: Yeah, I wasn't that thrilled with the suggestion either. But we can't just have backends constantly closing every open FD they hold, or performance will suffer. I don't see any very good place to do this... How about doing something on an sinval message for pg_database? That doesn't solve the WAL problem Kevin found, of course ... Hmm ... that would help for the specific scenario of dropped databases, but we've also heard complaints about narrower cases such as a single dropped table. A bigger issue is that I don't believe it's very practical to scan the FD array looking for files associated with a particular database (or table). They aren't labeled that way, and parsing the file path to find out the info seems pretty grotty. On reflection I think this behavior is probably limited to the case where we've done what we used to call a blind write of a block that is unrelated to our database or tables. For normal SQL-driven accesses, there's a relcache entry, and flushing of that entry will lead to closure of associated files. I wonder whether we should go back to forcibly closing the FD after a blind write. This would suck if a backend had to do many dirty-buffer flushes for the same relation, but hopefully the bgwriter is doing most of those. We'd want to make sure such forced closure *doesn't* occur in the bgwriter. (If memory serves, it has a checkpoint-driven closure mechanism instead.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
On Mon, Jun 6, 2011 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of lun jun 06 12:10:24 -0400 2011: Yeah, I wasn't that thrilled with the suggestion either. But we can't just have backends constantly closing every open FD they hold, or performance will suffer. I don't see any very good place to do this... How about doing something on an sinval message for pg_database? That doesn't solve the WAL problem Kevin found, of course ... Hmm ... that would help for the specific scenario of dropped databases, but we've also heard complaints about narrower cases such as a single dropped table. A bigger issue is that I don't believe it's very practical to scan the FD array looking for files associated with a particular database (or table). They aren't labeled that way, and parsing the file path to find out the info seems pretty grotty. On reflection I think this behavior is probably limited to the case where we've done what we used to call a blind write of a block that is unrelated to our database or tables. For normal SQL-driven accesses, there's a relcache entry, and flushing of that entry will lead to closure of associated files. I wonder whether we should go back to forcibly closing the FD after a blind write. This would suck if a backend had to do many dirty-buffer flushes for the same relation, but hopefully the bgwriter is doing most of those. We'd want to make sure such forced closure *doesn't* occur in the bgwriter. (If memory serves, it has a checkpoint-driven closure mechanism instead.) Instead of closing them immediately, how about flagging the FD and closing all the flagged FDs at the end of each query, or something like that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Mon, 2011-06-06 at 14:42 +0200, Dimitri Fontaine wrote: I think the way things are going to be organised now is that we will have core-blessed extensions: don't mix the mechanism and the policy. I like that idea. That would be a non-issue if we had a good type interface system (that works on polymorphic types) -- we could just have a built-in range interface, and the range extension could add && as the range interface's overlaps operator for the type ANYRANGE. That means that this is, IMHO, the right approach. Have core support that enables user defined RANGE types with indexing and planner support, etc, like we have OPERATOR CLASS and FAMILY and all the jazz. If we take the minimal approach, the index support would be the first to be moved to an extension. In order to have index support in core, we need quite a few functions and a significant amount of code. Minimal would be: * CREATE TYPE ... AS RANGE * ANYRANGE * The IO functions * Possibly the constructors and accessors ( range(), range_oc(), lower(), upper(), etc.) Regarding the type interfaces, the only thing that really worries me there is that my future work will depend on them existing, and I haven't really thought through the details. For instance, it just occurred to me recently that it would need to support polymorphic types, which might be a little bit more complicated than a simple lookup. I suppose it's easier to put a few functions in core later if we get stuck than to rip them out later. And the useful stuff you need to have to benefit from that core support would be an extension. It could be a core maintained extension, and it could even get installed by default, so that all the users would need to do is 'CREATE EXTENSION timeranges;', for example. Sounds good to me. However, would the extension be available in pg_regress? If not, I will need to include those constructors/accessors to be able to test anything. I think the consensus is to instead add a new chapter (maybe between current chapters 9.
Functions and Operators and 10. Type Conversion) and host “core extensions” docs there. The source code organisation is controversial because it is technically not necessary. We have to keep the work Greg did to keep those contribs shipped by default. Oh, and that is on the 9.1 Open Items, right? OK, so there are still a few things to be decided around documentation and tests. Both of those things can take a significant amount of time to rework, so I think I'll leave it alone until we have more of a consensus. We still have time before 9.2 to break some of the code out into an extension when we do have the doc/test issues resolved. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
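The minimal in-core pieces sketched above (CREATE TYPE ... AS RANGE, ANYRANGE, the IO functions, and possibly the constructors/accessors) might be exercised like this. This is a sketch only: the syntax and the names range(), range_oc(), lower(), and upper() are taken from the proposal in this thread, not from a shipped release.

```sql
-- Hypothetical minimal range support, using the names proposed above.
CREATE TYPE intrange AS RANGE (subtype = int4);

-- Proposed constructors: closed-open and open-closed variants.
SELECT range(1, 10);      -- would construct [1, 10)
SELECT range_oc(1, 10);   -- would construct (1, 10]

-- Proposed accessors for the range bounds.
SELECT lower(range(1, 10)), upper(range(1, 10));
```

Everything beyond this -- operators, opclasses, index support -- would then live in the extension.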
Re: [HACKERS] Range Types and extensions
On Sun, Jun 5, 2011 at 6:59 PM, Jeff Davis pg...@j-davis.com wrote: There might also be some middle ground, where it's like the minimalist approach, but with a few very basic constructors and accessors. That would at least make it easier to test, but then to be actually useful (with index support, operators, fancy functions, etc.) you'd need the extension. Thoughts? I can see merit to having parts of RANGE implemented in core, along with some of the usage parts implemented as extensions, so that if I'm not actually using (say) INET ranges, then the database isn't cluttered up with all the functions and operators for INET ranges. How to slice it apart into an appropriate admixture of core and extensions is a good question, though it seems pretty likely that having an extension for each data type that is to be mixed into a range is a reasonable way to go. I think this also can make some would-be arguments against RANGE go away... The argument “I hate that this RANGE extension means we have to draw 5000 lines of code into every database, and draws in 275 operator functions” evaporates if the base part is entirely smaller, and if you only draw in all the functions and operators when you request loading of each of the 17 extensions. Per-type extensions offer a pretty natural partitioning of the code for each type, which seems pretty good. -- When confronted by a difficult problem, solve it by reducing it to the question, How would the Lone Ranger handle this? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Mon, 2011-06-06 at 18:28 +0200, Pavel Stehule wrote: we can define a step FOREACH x IN RANGE . BY That wouldn't need any of the range infrastructure at all -- it would be purely syntactic, right? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WALInsertLock tuning
Simon Riggs si...@2ndquadrant.com writes: In earlier discussions of how to improve WALInsertLock contention, it was observed that we must zero each new page before we advance the WAL insertion point. http://postgresql.1045698.n5.nabble.com/Reworking-WAL-locking-td1983647.html IMHO the page zeroing is completely unnecessary, I don't believe it's completely unnecessary. It does in fact offer additional protection against mistakenly taking stale data as valid. You could maybe argue that the degree of safety increase isn't sufficient to warrant the cost of zeroing the page, but you've not offered any quantification of either the risk or the cost savings. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap vacuum cleanup locks
Excerpts from Robert Haas's message of lun jun 06 08:06:10 -0400 2011: But the problem of vacuum stalling out because it can't get the cleanup lock is a very real one. I've seen at least one customer hit this in production, and it was pretty painful. Now, granted, you need some bad application design, too: you have to leave a cursor lying around instead of running it to completion and then stopping. But supposing you do make that mistake, you might hope that it wouldn't cause VACUUM starvation, which is what happens today. IOW, I'm less worried about whether the cleanup lock is slowing vacuum down than I am about eliminating the pathological cases where an autovacuum worker gets pinned down, stuck waiting for a cleanup lock that never arrives. Now the table doesn't get vacuumed (bad) and the system as a whole is one AV worker short of what it's supposed to have (also bad). One of the good things about your proposal is that (AFAICS) you can freeze tuples without the cleanup lock, so the antiwraparound cleanup would still work. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Robert Haas robertmh...@gmail.com writes: On Mon, Jun 6, 2011 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: On reflection I think this behavior is probably limited to the case where we've done what we used to call a blind write of a block that is unrelated to our database or tables. For normal SQL-driven accesses, there's a relcache entry, and flushing of that entry will lead to closure of associated files. I wonder whether we should go back to forcibly closing the FD after a blind write. This would suck if a backend had to do many dirty-buffer flushes for the same relation, but hopefully the bgwriter is doing most of those. We'd want to make sure such forced closure *doesn't* occur in the bgwriter. (If memory serves, it has a checkpoint-driven closure mechanism instead.) Instead of closing them immediately, how about flagging the FD and closing all the flagged FDs at the end of each query, or something like that? Hmm, there's already a mechanism for closing temp FDs at the end of a query ... maybe blind writes could use temp-like FDs? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Mon, 2011-06-06 at 16:45 +, Christopher Browne wrote: How to slice it apart into an appropriate admixture of core and extensions is a good question, though it seems pretty likely that having an extension for each data type that is to be mixed into a range is a reasonable way to go. ... Per-type extensions offers a pretty natural partitioning of the code for each type, which seems pretty good. Ideally, most range types can be created with a simple: CREATE TYPE foorange AS RANGE (subtype=foo); There might be a few subtype-specific functions, like the canonical function, but overall it should be a small amount of code per range. However, I'd say just bundle a bunch of rangetypes together in one extension. There's not really much cost -- if you are using one range type, you'll use a few more. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Sun, Jun 5, 2011 at 2:59 PM, Jeff Davis pg...@j-davis.com wrote: So, where on this spectrum should range types fall? I think the most minimalist would be to only support #1 (and the necessary type IO functions); and leave all other functions, operators, and opclasses to an extension. That has a lot of appeal, but I don't think we can ignore the challenges above. On the other hand, trying to make it a complete feature in core has challenges as well. For instance, even with Range Types, Exclusion Constraints aren't practical out-of-the-box unless we also have BTree-GiST in core. So there's a snowball effect. There might also be some middle ground, where it's like the minimalist approach, but with a few very basic constructors and accessors. That would at least make it easier to test, but then to be actually useful (with index support, operators, fancy functions, etc.) you'd need the extension. I don't have a clear feeling on this question in general, but if we're going to break this up into pieces, it's important that they be logical pieces. Putting half the feature in core and half into an extension just because we can will complicate code maintenance to no good end. The snowball effect is something to avoid, and we need to watch out for that, but if the upshot of putting part of it in core is that the core code can no longer be understood or maintained because it depends heavily on a bunch of non-core code, that's not helpful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap vacuum cleanup locks
On Mon, Jun 6, 2011 at 12:49 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Robert Haas's message of lun jun 06 08:06:10 -0400 2011: But the problem of vacuum stalling out because it can't get the cleanup lock is a very real one. I've seen at least one customer hit this in production, and it was pretty painful. Now, granted, you need some bad application design, too: you have to leave a cursor lying around instead of running it to completion and then stopping. But supposing you do make that mistake, you might hope that it wouldn't cause VACUUM starvation, which is what happens today. IOW, I'm less worried about whether the cleanup lock is slowing vacuum down than I am about eliminating the pathological cases where an autovacuum workers gets pinned down, stuck waiting for a cleanup lock that never arrives. Now the table doesn't get vacuumed (bad) and the system as a whole is one AV worker short of what it's supposed to have (also bad). One of the good things about your proposal is that (AFAICS) you can freeze tuples without the cleanup lock, so the antiwraparound cleanup would still work. Yeah, I think that's a major selling point. VACUUM getting stuck is Bad. Anti-wraparound VACUUM getting stuck is Really Bad. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Allow \dd to show constraint comments
Excerpts from Josh Kupershmidt's message of dom jun 05 16:36:57 -0400 2011: On Tue, May 24, 2011 at 10:31 PM, Josh Kupershmidt schmi...@gmail.com wrote: Attached is a rebased patch. From a quick look, it seems that most of the object types missing from \dd are already covered by pg_comments (cast, constraint, conversion, domain, language, operator class, operator family). A few objects would probably still need to be added (foreign data wrapper, server). Here's another update to this patch. Includes: * rudimentary doc page for pg_comments * 'foreign data wrapper' and 'server' comment types now included in pg_comments; regression test updated Hmm, if we're going to have pg_comments as a syntactic sugar kind of thing, it should output things in format immediately useful to the user, i.e. relation/column/etc names and not OIDs. The OIDs would force you to do lots of joins just to make it readable. Maybe you should have a column for the class of object the comment applies to, but as a name and not a regclass. And then a column for names that each comment applies to. (We're still struggling to get a useful pg_locks display). I mean, if OIDs are good for you and you're OK with doing a few joins, why not go to the underlying catalogs in the first place? (IMHO anyway -- what do others think?) -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
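For context, the joins Alvaro alludes to are roughly these. The pg_description columns (objoid, classoid, objsubid, description) are the real catalog layout; the query itself is only a sketch of what a reader of raw OIDs has to write today, covering just tables and columns.

```sql
-- Resolving comment OIDs to readable names for relations and columns.
SELECT c.relname,
       a.attname,        -- NULL when the comment is on the relation itself
       d.description
FROM pg_description d
JOIN pg_class c
  ON c.oid = d.objoid
 AND d.classoid = 'pg_class'::regclass
LEFT JOIN pg_attribute a
  ON a.attrelid = d.objoid
 AND a.attnum = d.objsubid
 AND d.objsubid <> 0;
```

Every other object class (functions, operators, casts, ...) needs its own variant of this join, which is the usability gap a name-based pg_comments view would close.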
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Excerpts from Tom Lane's message of lun jun 06 12:49:46 -0400 2011: Robert Haas robertmh...@gmail.com writes: On Mon, Jun 6, 2011 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: On reflection I think this behavior is probably limited to the case where we've done what we used to call a blind write of a block that is unrelated to our database or tables. For normal SQL-driven accesses, there's a relcache entry, and flushing of that entry will lead to closure of associated files. I wonder whether we should go back to forcibly closing the FD after a blind write. This would suck if a backend had to do many dirty-buffer flushes for the same relation, but hopefully the bgwriter is doing most of those. We'd want to make sure such forced closure *doesn't* occur in the bgwriter. (If memory serves, it has a checkpoint-driven closure mechanism instead.) Instead of closing them immediately, how about flagging the FD and closing all the flagged FDs at the end of each query, or something like that? Hmm, there's already a mechanism for closing temp FDs at the end of a query ... maybe blind writes could use temp-like FDs? OK, I'll have a look at how blind writes work this afternoon and propose something. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 5:14 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Perhaps the best way to describe the suggestion that this be included in 9.1 isn't that it's an insane suggestion; but that it's a suggestion which, if adopted, would be likely to drive those who are striving for a more organized development and release process insane. Kevin, I respect your opinion and thank you for stating your case without insults. In this discussion it should be recognised that I have personally driven the development of a more organized dev and release process. I requested and argued for stated release dates to assist resource planning and suggested commitfests as a mechanism to reduce the feedback times for developers. I also provided the first guide to patch reviews we published. So I am a proponent of planning and organization, though some would like to claim I see things differently. The major problems of the dev process are now solved, yet more organization is still being discussed, as if more == better. What I hear is changed organization and I am not certain that all change == better in what I see as a leading example of how to produce great software. Releasing regularly is important, but not more important than anything. Ever. Period. Trying to force that will definitely make you mad, I can see. I request that people stop trying to enforce a process so strictly that sensible and important change cannot take place when needed. Or one could look at it in a cost/benefit format -- major features delivered per year go up by holding the line, administrative costs are reduced, and people who are focusing on release stability get more months per year to do development. I do look at it in a cost/benefit format. The problem is the above statement has nothing user-centric about it. The cost to us is a few days' work and the benefit is a whole year's worth of increased performance for our user base, which has a hardware equivalent well into the millions of dollars.
And that's ignoring the users that would've switched to using Postgres earlier, and those who might leave because of competitive comparison. I won't say any more about this because I am in no way a beneficiary of this, and even my opinion is given unpaid. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
2011/6/6 Jeff Davis pg...@j-davis.com: On Mon, 2011-06-06 at 18:28 +0200, Pavel Stehule wrote: we can define a step FOREACH x IN RANGE . BY That wouldn't need any of the range infrastructure at all -- it would be purely syntactic, right? I don't think so. For a lot of types the specification of a step is necessary -- for date, for float. Pavel Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
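The shape of the loop under discussion might look like this. The first loop uses the integer FOR ... BY step that PL/pgSQL already supports; the FOREACH ... IN RANGE ... BY form is the proposal from this thread and is shown commented out, since it is hypothetical syntax that never shipped in this form.

```sql
DO $$
BEGIN
  -- Existing PL/pgSQL: integer FOR loops already take a BY step.
  FOR i IN 1..10 BY 2 LOOP
    RAISE NOTICE '%', i;
  END LOOP;

  -- Proposed (hypothetical): iterating a range value would need an
  -- explicit step for continuous subtypes such as date or float,
  -- which is Pavel's point above.
  -- FOREACH d IN RANGE daterange('2011-01-01', '2011-02-01') BY '1 week' LOOP
  --   ...
  -- END LOOP;
END $$;
```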
Re: [HACKERS] patch: Allow \dd to show constraint comments
On Mon, Jun 6, 2011 at 1:03 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Josh Kupershmidt's message of dom jun 05 16:36:57 -0400 2011: On Tue, May 24, 2011 at 10:31 PM, Josh Kupershmidt schmi...@gmail.com wrote: Attached is a rebased patch. From a quick look, it seems that most of the object types missing from \dd are already covered by pg_comments (cast, constraint, conversion, domain, language, operator class, operator family). A few objects would probably still need to be added (foreign data wrapper, server). Here's another update to this patch. Includes: * rudimentary doc page for pg_comments * 'foreign data wrapper' and 'server' comment types now included in pg_comments; regression test updated Hmm, if we're going to have pg_comments as a syntactic sugar kind of thing, it should output things in format immediately useful to the user, i.e. relation/column/etc names and not OIDs. The OIDs would force you to do lots of joins just to make it readable. Well, that's basically what this is doing. See the objname/objtype columns. It's intended that the output of this view should match the format that COMMENT takes as input. But propagating the OIDs through is sensible as well, because sometimes people may want to do other joins, filtering, etc. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BLOB support
I just started with this because image-as-blob works. And I think this topic will come back, and many other problems will arise as well -- flattening tuples, etc. I will send sketches of streaming done this way, I hope, as I want to go back to the clusterization work. Sorry for the top reply -- Windows Phone 7. Regards, Radek -Original Message- From: Tom Lane Sent: 6 czerwca 2011 17:41 To: Radoslaw Smogura Cc: Alvaro Herrera; Pavel Stehule; Dimitri Fontaine; Robert Haas; Peter Eisentraut; PG Hackers Subject: Re: [HACKERS] BLOB support Radosław Smogura rsmog...@softperience.eu writes: Introducing streaming for TOAST is a little useless; sorry, just to cite from my aforementioned document: (This is generally about on-demand streaming of a TOASTed value; in the context of LOBs it is acceptable, as long as non-transaction-aware LOBs are acceptable). If we add streaming of TOASTed values, so that the caller gets some reference to this value, we need to ensure that the pointed-to data will not be changed or deleted -- I think this will require the caller to add FOR UPDATE (or the server to add it silently) for each statement returning pointers to TOASTed values, as the client may do a transactional query. It's already been explained to you that that's not the case. If this is acceptable I will do the following changes. Add - server_max_in_memory_lob_size - a start-only server GUC describing the maximum value of the client session parameter max_in_memory_lob. - max_in_memory_lob - a session GUC describing how large LOBs may be kept in memory before being backed by a file - recursively toasting and detoasting during insert/update/remove to search for LOBs (we need this for arrays and complex types) - this is for the last stage (an error disallowing LOBs in composites/arrays may be quite enough for a beginning) - I want LOBs to be the starting point for LOBing other types (e.g. some big arrays may be LOBbed). - during toasting, the LOB will be toasted and, in place of the LOB, a reference to it will be put and encoded in the LOB datum.
- an API for LOB manipulation (a few changes to the current implementation), such that BLOB is a field type and a union of in_memory, in_file, in_toast. These few bytes per LOB will not affect the size of the datum, considering the size of a LOB. - an API for maintaining temporary LOB files; we need this as a per-session list of id - file descriptor pairs, to prevent stealing of LOBs by different connections (security) - a streaming API for TOASTed values (based on the COPY protocol, or a changed COPY protocol), or at least function calls - I haven't looked at this in the context of TOASTed LOBs. Is it good? This all looks like you decided on a solution first and then started to look for a problem to apply it to. I don't want to see us inventing a pile of GUCs for this, and I don't think there is a need to make any fundamental changes in the TOAST mechanism either. What we do need is a streaming access protocol to read or write wide field values *without* forcing people to change the basic structure of their tables. You keep on wanting to invent weird, IMO unnecessary language features instead. Try to make the feature as transparent as possible, not as visible as possible. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] permissions of external PID file
On Fri, Jun 03, 2011 at 09:51:45PM +0300, Peter Eisentraut wrote: The external PID file, if configured, is currently generated with 600 permissions, which results from the umask setting in the postmaster. I think it would be nicer if we could make that 644. I have often dealt with scripts and such that look through PID files in /var/run, and it's always annoying when some PID file is not readable for some reason or no reason at all. (One simple benefit is that you don't need extra privileges to check whether the process is alive.) Almost all PID files on my system are world-readable now. +1 for making this world-readable. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Mon, Jun 06, 2011 at 12:53:49PM -0400, Robert Haas wrote: I don't have clear feeling on this question in general, but if we're going to break this up into pieces, it's important that they be logical pieces. Putting half the feature in core and half into an extension just because we can will simplify complicate code maintenance to no good end. The snowball effect is something to avoid, and we need to watch out for that, but if the upshot of putting part of it in core is that the core code can no longer be understood or maintained because it depends heavily on a bunch of non-core code, that's not helpful. And concretely, code paths that cannot be exercised easily from core-only code will not get regression tested, and will therefore rot. Ross -- Ross Reedstrom, Ph.D. reeds...@rice.edu Systems Engineer Admin, Research Scientistphone: 713-348-6166 Connexions http://cnx.orgfax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] heap_hot_search_buffer refactoring
The attached patch refactors heap_hot_search_buffer() so that index_getnext() can use it, and modifies index_getnext() to do so. The idea is based on one of Heikki's index-only scan patches, from 2009: http://archives.postgresql.org/pgsql-hackers/2009-07/msg00676.php I believe, however, that this portion of that patch stands alone, completely independently of index-only scans. At present, we have two copies of the logic to traverse HOT chains floating around, which means they can get out of sync, possibly resulting in bugs. This has already happened once: http://archives.postgresql.org/pgsql-hackers/2011-05/msg00733.php As a nice bonus, the new logic is both simpler and, I believe, more correct than the current logic. In IndexScanDescData, xs_hot_dead, xs_next_hot, and xs_prev_xmax get replaced by a single boolean xs_continue_hot. There is a small behavioral difference: in the current code, when we use a non-MVCC snapshot with index_getnext() and walk a HOT chain, each call remembers the *next* TID that should be examined. With this change, we instead remember the TID that we most recently returned, and compute the next TID when index_getnext() is called again. It seems to me that this is really what we should have been doing all along. Imagine, for example, that we have a HOT chain A -> B, where B has not yet committed. We return A and remember that we next intend to look at B. Before index_getnext() is called, B aborts and a new HOT update produces a HOT chain A -> C. The next call to index_getnext() will nonetheless look at B and conclude that it's reached the end of the HOT chain. This doesn't actually matter a bit for current uses of index_getnext(), because - at least according to Heikki's old notes - the only place we use a non-MVCC snapshot with this function is during CLUSTER. And at that point, we have the whole table locked down, so nothing is happening concurrently.
I'm not sure there's any possibility of us ever using this function in a way that would break under the current implementation, but what I'm proposing here does seem more robust to me. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company heap-hot-search-refactoring.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Allow \dd to show constraint comments
Robert Haas robertmh...@gmail.com writes: On Mon, Jun 6, 2011 at 1:03 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Hmm, if we're going to have pg_comments as a syntactic sugar kind of thing, it should output things in format immediately useful to the user, i.e. relation/column/etc names and not OIDs. The OIDs would force you to do lots of joins just to make it readable. Well, that's basically what this is doing. See the objname/objtype columns. It's intended that the output of this view should match the format that COMMENT takes as input. But propagating the OIDs through is sensible as well, because sometimes people may want to do other joins, filtering, etc. Is it also propagating the catalog OID through? Because joining on OID alone is not to be trusted. I tend to agree with Alvaro's viewpoint here: anybody who wants to deal directly in OIDs is better off joining directly to pg_description, and not going through the rather large overhead that this view is going to impose. So we should just make this a purely user-friendly view and be done with it, not try to create an amalgam that serves neither purpose well. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Allow \dd to show constraint comments
On Mon, Jun 6, 2011 at 2:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Jun 6, 2011 at 1:03 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Hmm, if we're going to have pg_comments as a syntactic sugar kind of thing, it should output things in format immediately useful to the user, i.e. relation/column/etc names and not OIDs. The OIDs would force you to do lots of joins just to make it readable. Well, that's basically what this is doing. See the objname/objtype columns. It's intended that the output of this view should match the format that COMMENT takes as input. But propagating the OIDs through is sensible as well, because sometimes people may want to do other joins, filtering, etc. Is it also propagating the catalog OID through? Because joining on OID alone is not to be trusted. Yep. I tend to agree with Alvaro's viewpoint here: anybody who wants to deal directly in OIDs is better off joining directly to pg_description, and not going through the rather large overhead that this view is going to impose. So we should just make this a purely user-friendly view and be done with it, not try to create an amalgam that serves neither purpose well. Possibly. On the other hand we have things like pg_tables floating around that are basically useless because you can't get the OID easily. The information_schema suffers from this problem as well. I get what you're saying, but I think we should think two or three times very carefully before throwing away the OID in a situation where it can easily be provided. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote: That means that this is, IMHO, the right approach. Have core support that enables user defined RANGE types with indexing and planner support, etc, like we have OPERATOR CLASS and FAMILY and all the jazz. Yes, we do, however.. And the useful stuff you need to have to benefit from that core support would be an extension. It could be a core maintained extension, and it could even get installed by default, so that all the users would need to do is 'CREATE EXTENSION timeranges;', for example. I don't like the idea of having a capability which is not utilized in core. We should make it so extensions can *also* have access to define their own, but we should have the basics covered in core. a. core extensions, shipped by default Having it as a core extension might work, but I'm not really 'sold' on it. Thanks, Stephen
Re: [HACKERS] WALInsertLock tuning
On Mon, Jun 6, 2011 at 5:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: In earlier discussions of how to improve WALInsertLock contention, it was observed that we must zero each new page before we advance the WAL insertion point. http://postgresql.1045698.n5.nabble.com/Reworking-WAL-locking-td1983647.html IMHO the page zeroing is completely unnecessary, I don't believe it's completely unnecessary. It does in fact offer additional protection against mistakenly taking stale data as valid. You could maybe argue that the degree of safety increase isn't sufficient to warrant the cost of zeroing the page, but you've not offered any quantification of either the risk or the cost savings. If we did ever reference stale data, it would need to have a value of xl_prev that exactly matched what we expected AND a CRC that exactly matched as well. That would be fairly difficult to arrange deliberately, and there is pretty close to zero chance of it happening otherwise. But even that assumes we write the unzeroed data at the end of the buffer. We don't. We only write data up to the end of the WAL record on the current page, unless we write a continuation record, which means that only at replay would we read in another page and check its page headers. So for this to cause an error, we'd have to have an overall matching CRC, a matching xl_prev, and at least one matching page header, which contains a page address. And even that assumes we would read the data in a different way than it was written. So the only way we could ever reference the stale data is if the WAL reading code didn't match the WAL writing code (1) because of a bug or (2) because of a corrupt pg_control record that caused random access to an otherwise unreachable part of WAL, AND the CRC matched, and the xl_prev matched, and the next page header matched. Risk == zero. If it weren't zero I wouldn't even mention it, because this is not a trade-off optimization.
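The layered checks being argued over can be sketched as follows (a toy Python model, not the real xlog.c replay code; zlib.crc32 merely stands in for the WAL CRC, and the field names are simplifications):

```python
import zlib

def would_accept(record, expected_prev_lsn):
    """Toy version of the independent checks replay applies to a WAL
    record: the back-pointer (xl_prev) must match the record we just
    replayed, and the CRC must match the payload. The real code does
    more (page-header checks when crossing pages, etc.); the point is
    that stale bytes are only mistaken for a valid record if *every*
    check passes."""
    if record['xl_prev'] != expected_prev_lsn:
        return False
    if zlib.crc32(record['payload']) != record['crc']:
        return False
    return True

# A genuine record, written after the record at LSN 0x1234.
good = {'xl_prev': 0x1234, 'payload': b'real data',
        'crc': zlib.crc32(b'real data')}

# Stale bytes from a recycled WAL segment: internally self-consistent
# (its CRC matches its own payload), but its back-pointer refers to
# wherever it was originally written, so the first check rejects it.
stale = {'xl_prev': 0x9999, 'payload': b'old data',
         'crc': zlib.crc32(b'old data')}
```

Simon's argument is essentially that for unzeroed bytes both checks (plus the page-header check) would have to pass simultaneously by accident.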
Cost savings are those already identified by yourself in our previous discussion on this, link given upthread. It's the biggest single removable item from within WALInsertLock. We can measure them if you like, but I don't see the value in that. It will clearly be a useful saving on what we all agree is a heavily contended lock. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Sun, Jun 5, 2011 at 1:59 PM, Jeff Davis pg...@j-davis.com wrote: In the several talks that I've given, a common question is related to multiranges (ranges with holes). These get a little complex, and I don't have a complete answer. However, multiranges can be approximated with ordered arrays of non-overlapping, non-adjacent ranges. If someone wants to take it upon themselves to develop a set of operators here, that would be great -- but without ANYRANGE the operators would be unmanageable. 2. Documentation and Tests -- Let's say we take a minimalist view, and only have ANYRANGE and CREATE TYPE ... AS RANGE in core; and leave the rest as an extension. What exactly would the documentation say? I think it would be even more hypothetical and abstract than the documentation for Exclusion Constraints. So, there is a certain documentation advantage to having at least enough functionality to allow someone to try out the feature. And the tests for such a minimalist feature would be a significant challenge -- what do we do there? Get pg_regress to load the extension from PGXN? 3. Quality -- PostgreSQL has a great reputation for quality, and for good reason. But extensions don't follow the same quality-control standards; and even if some do, there is no visible stamp of approval. So, to ask someone to use an extension means that they have to evaluate the quality for themselves, which is a pretty high barrier. Since PGXN (thanks David Wheeler) and EXTENSIONs (thanks Dmitri) solve many of the other issues, quality control is one of the biggest ones remaining. I still get questions about when the temporal type will be in core, and I think this is why. I don't think this is a good excuse to put it in core though. We need to solve this problem, and the best way to start is by getting well-reviewed, high-quality extensions out there. 4. Future work -- RANGE KEY, RANGE FOREIGN KEY, RANGE MERGE JOIN, etc. 
- There are a few aspects of range types that aren't in the first patch, but are fairly obvious follow-up additions. These will require some knowledge about ranges in the backend, like finding the overlaps operator for a range. The current patch provides this knowledge by providing a built-in overlaps operator for ANYRANGE. This would be a non-issue if we had a good type interface system (that works on polymorphic types) -- we could just have a built-in range interface, and the range extension could register its own operator as the range interface's overlaps operator for the type ANYRANGE. = So, where on this spectrum should range types fall? I think the most minimalist would be to only support #1 (and the necessary type IO functions); and leave all other functions, operators, and opclasses to an extension. That has a lot of appeal, but I don't think we can ignore the challenges above. On the other hand, trying to make it a complete feature in core has challenges as well. For instance, even with Range Types, Exclusion Constraints aren't practical out-of-the-box unless we also have BTree-GiST in core. So there's a snowball effect. There might also be some middle ground, where it's like the minimalist approach, but with a few very basic constructors and accessors. That would at least make it easier to test, but then to be actually useful (with index support, operators, fancy functions, etc.) you'd need the extension. Thoughts? ISTM (I haven't followed all the lead-up so apologies if this is already covered) a range is a 3rd pseudo 'container' type (the other two being composites and arrays). Do you see: *) being able to make arrays of ranges/ranges of arrays? *) range of composites? I vote for at minimum the type itself and ANYRANGE to be in core. From there you could make it like arrays where the range type is automatically generated for each POD type. I would consider that for sure on the basis of simplicity in user-land unless all the extra types and operators are a performance hit.
A clean and highly usable implementation in the type system in the spirit of arrays would be fantastic. I'm particularly interested in hypothetical constructor/destructor and in/out mechanics...an 'unnest' like function, a range(a,b,c) that does as row(a,b,c) does, etc, especially if you can work it out so that everything is not hammered through textual processing. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
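Jeff's "multiranges as ordered arrays of non-overlapping, non-adjacent ranges" approximation can be sketched like this (Python, hypothetical helper names, half-open (lo, hi) ranges assumed):

```python
import bisect

def ranges_overlap(a, b):
    """Half-open (lo, hi) ranges overlap iff each starts before the
    other ends."""
    return a[0] < b[1] and b[0] < a[1]

def multirange_contains(multi, x):
    """multi is a sorted list of non-overlapping, non-adjacent
    half-open (lo, hi) ranges -- the 'multirange as ordered array'
    approximation. Binary search locates the only candidate range."""
    i = bisect.bisect_right(multi, (x, float('inf'))) - 1
    return i >= 0 and multi[i][0] <= x < multi[i][1]

# A "range with holes": covers [1,3), [5,8), [10,12).
holes = [(1, 3), (5, 8), (10, 12)]
```

The invariants (sorted, non-overlapping, non-adjacent) keep the representation canonical, which is what makes a well-defined operator set possible; without ANYRANGE, as noted above, such operators would have to be redefined per range type.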
Re: [HACKERS] Error in PQsetvalue
On Mon, Jun 6, 2011 at 7:09 AM, Pavel Golub pa...@microolap.com wrote: Sorry for delay in answer. Yeah, I'm glad to. Should I apply this patch by myself? sure, run it against your test case and make sure it works. if so, we can pass it up the chain of command (hopefully as context diff). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
That's an improvement of about ~3.5x. According to the vmstat output, when running without the patch, the CPU state was about 40% idle. With the patch, it dropped down to around 6%. Wow! That's fantastic. Jignesh, are you in a position to test any of Robert's work using DBT or other benchmarks? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
Robert Haas robertmh...@gmail.com writes: IMHO, it's better to just have a deadline, Well, that's the fine point we're now talking about. I still think that we should try to make the best release possible. And if that means including changes at beta time because that's when someone got around to doing them, so be it — well, they should really be worth it. So, to the question “do we want hard deadlines?” I think the answer is “no”, to “do we need hard deadlines?”, my answer is still “no”, and to the question “should this very change be considered this late?” my answer is yes. Because it really changes the game for PostgreSQL users. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SAVEPOINTs and COMMIT performance
On Mon, Jun 6, 2011 at 10:33 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.02.2011 23:09, Simon Riggs wrote: On Sun, 2011-02-06 at 12:11 -0500, Bruce Momjian wrote: Did this ever get addressed? Patch attached. Seems like the easiest fix I can come up with. @@ -2518,7 +2518,7 @@ CommitTransactionCommand(void) case TBLOCK_SUBEND: do { - CommitSubTransaction(); + CommitSubTransaction(true); s = CurrentTransactionState; /* changed by pop */ } while (s-blockState == TBLOCK_SUBEND); /* If we had a COMMIT command, finish off the main xact too */ We also get into this codepath at RELEASE SAVEPOINT, in which case it is wrong to not reassign the locks to the parent subtransaction. Agreed. Thanks for the review. I'll change that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
Stephen Frost sfr...@snowman.net writes: I don't like the idea of having a capability which is not utilized in core. We should make it so extensions can *also* have access to define their own, but we should have the basics covered in core. Well if another part of core depends on the feature set, then of course you don't have a choice to make it an extension any more. I think that's where I would draw the line. Having it as a core extension might work, but I'm not really 'sold' on it. Well, core extension means built by default, part of default regression tests and all. The regression test simply begins with the create extension stanza, that's about it. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 8:12 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: So, to the question “do we want hard deadlines?” I think the answer is “no”, to “do we need hard deadlines?”, my answer is still “no”, and to the question “does this very change should be considered this late?” my answer is yes. Because it really changes the game for PostgreSQL users. Much as I hate to say it (I too want to keep our schedule as predictable and organised as possible), I have to agree. Assuming the patch is good, I think this is something we should push into 9.1. It really could be a game changer. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06/06/2011 09:24 PM, Dave Page wrote: On Mon, Jun 6, 2011 at 8:12 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: So, to the question “do we want hard deadlines?” I think the answer is “no”, to “do we need hard deadlines?”, my answer is still “no”, and to the question “does this very change should be considered this late?” my answer is yes. Because it really changes the game for PostgreSQL users. Much as I hate to say it (I too want to keep our schedule as predictable and organised as possible), I have to agree. Assuming the patch is good, I think this is something we should push into 9.1. It really could be a game changer. I disagree - the proposed patch may provide a very significant improvement for a certain workload type (nothing less, but nothing more), but it was posted way after -BETA and I'm not sure we yet understand all implications of the changes. We also have to consider that the underlying issues are known problems for multiple years/releases, so I don't think there is a particular rush to force them into a particular release (as in 9.1). Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
* Dave Page (dp...@pgadmin.org) wrote: Much as I hate to say it (I too want to keep our schedule as predictable and organised as possible), I have to agree. Assuming the patch is good, I think this is something we should push into 9.1. It really could be a game changer. So, with folks putting up that we should hammer this patch out and force it into 9.1.. What should our new release date for 9.1 be? What about other patches that didn't make it into 9.1? What about the upcoming CommitFest that we've asked people to start working on? If we're going to start putting in changes like this, I'd suggest that we try and target something like September for 9.1 to actually be released. Playing with the lock management isn't something we want to be doing lightly and I think we definitely need to have serious testing of this, similar to what has been done for the SSI changes, before we're going to be able to release it. I don't agree that we should delay 9.1, but if people really want this in, then we need to figure out what the new schedule is going to be. Thanks, Stephen
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 8:40 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: On 06/06/2011 09:24 PM, Dave Page wrote: On Mon, Jun 6, 2011 at 8:12 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: So, to the question “do we want hard deadlines?” I think the answer is “no”, to “do we need hard deadlines?”, my answer is still “no”, and to the question “does this very change should be considered this late?” my answer is yes. Because it really changes the game for PostgreSQL users. Much as I hate to say it (I too want to keep our schedule as predictable and organised as possible), I have to agree. Assuming the patch is good, I think this is something we should push into 9.1. It really could be a game changer. I disagree - the proposed patch maybe provides a very significant improvment for a certain workload type(nothing less but nothing more), but it was posted way after -BETA and I'm not sure we yet understand all implications of the changes. We certainly need to be happy with the implications if we were to make such a decision. We also have to consider that the underlying issues are known problems for multiple years^releases so I don't think there is a particular rush to force them into a particular release (as in 9.1). No, there's no *technical* reason we need to do this, as there would be if it were a bug fix for example. I would just like to see us narrow the gap with our competitors sooner rather than later, *if* we're a) happy with the change, and b) we're talking about a minimal delay (which we may be - Robert says he thinks the patch is good, so with another review and beta testing). -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 6/6/11 12:12 PM, Dimitri Fontaine wrote: So, to the question “do we want hard deadlines?” I think the answer is “no”, to “do we need hard deadlines?”, my answer is still “no”, and to the question “does this very change should be considered this late?” my answer is yes. I could not disagree more strongly. We're in *beta* now. It's not like the last CF closed a couple weeks ago. Heck, I'm about to open the first CF for 9.2 in just over a week. Also, a patch like this needs several months of development, discussion and testing in order to fix the issues Robert already identified and make sure it doesn't break something fundamental to concurrency. Which would mean the release would be delayed until at least November, screwing over all the users who don't care about this patch. There will *always* be another really cool patch. If we keep delaying release to get in one more patch, then we never release. At some point you just have to take what you have and call it a release, and we are months past that point. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06/06/2011 03:24 PM, Dave Page wrote: On Mon, Jun 6, 2011 at 8:12 PM, Dimitri Fontainedimi...@2ndquadrant.fr wrote: So, to the question “do we want hard deadlines?” I think the answer is “no”, to “do we need hard deadlines?”, my answer is still “no”, and to the question “does this very change should be considered this late?” my answer is yes. Because it really changes the game for PostgreSQL users. Much as I hate to say it (I too want to keep our schedule as predictable and organised as possible), I have to agree. Assuming the patch is good, I think this is something we should push into 9.1. It really could be a game changer. I'm not a fan of hard and fast deadlines for releases - it puts too much pressure on us to release before we might be ready. But I'm also not a fan of totally abandoning our established processes, which accepting this would. I don't mind bending the rules a bit occasionally; I do mind throwing them out the door. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 8:44 PM, Stephen Frost sfr...@snowman.net wrote: * Dave Page (dp...@pgadmin.org) wrote: Much as I hate to say it (I too want to keep our schedule as predictable and organised as possible), I have to agree. Assuming the patch is good, I think this is something we should push into 9.1. It really could be a game changer. So, with folks putting up that we should hammer this patch out and force it into 9.1.. What should our new release date for 9.1 be? What about other patches that didn't make it into 9.1? What about the upcoming CommitFest that we've asked people to start working on? If we're going to start putting in changes like this, I'd suggest that we try and target something like September for 9.1 to actually be released. Playing with the lock management isn't something we want to be doing lightly and I think we definitely need to have serious testing of this, similar to what has been done for the SSI changes, before we're going to be able to release it. Completely aside from the issue at hand, aren't we looking at a September release by now anyway (assuming we have to avoid late July/August as we usually do)? -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 2:49 PM, Josh Berkus j...@agliodbs.com wrote: That's an improvement of about ~3.5x. According to the vmstat output, when running without the patch, the CPU state was about 40% idle. With the patch, it dropped down to around 6%. Wow! That's fantastic. Jignesh, are you in a position to test any of Robert's work using DBT or other benchmarks? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com I missed the discussion. Can you send me the patch (will that work with 9.1 beta?)? I can do a before and after with DBT2 and let you know. And also test it with sysbench read test which also has a relation locking bottleneck. Thanks. Regards, Jignesh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 5:13 PM, Simon Riggs si...@2ndquadrant.com wrote: The cost to us is a few days work and the benefit is a whole year's worth of increased performance for our user base, which has a hardware equivalent well into the millions of dollars. I doubt that this is an accurate reflection of the cost. What was presented by Robert Haas was a proof of concept, and he pointed out that it had numerous problems. To requote: There are numerous problems with the code as it stands at this point. It crashes if you try to use 2PC, which means the regression tests fail; it probably does horrible things if you run out of shared memory; pg_locks knows nothing about the new mechanism (arguably, we could leave it that way: only locks that can't possibly be conflicting with anything can be taken using this mechanism, but it would be nice to fix, I think); and there are likely some other gotchas as well. Turning this into something ready for production deployment in 9.1 would require a non-trivial amount of additional effort, and would likely have the adverse effect of deferring the release of 9.1, as well as of further deferring all the effects of the patches submitted for the latest commitfest (https://commitfest.postgresql.org/action/commitfest_view?id=10), since this defers release of 9.2, as well. While the patch is a fine one, in that it has interesting effects, it seems like a way wiser idea to me to let it go through the 9.2 process, so that it has 6 months worth of buildfarm runs before it gets deployed for real just like all the other items in the 2011-06 CommitFest. Note that it may lead to further discoveries, so that perhaps, in the 9.2 series, we'd see further improvements due to things that are discovered as further consequence of testing https://commitfest.postgresql.org/action/patch_view?id=572. -- When confronted by a difficult problem, solve it by reducing it to the question, How would the Lone Ranger handle this? 
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 3:59 PM, Christopher Browne cbbro...@gmail.com wrote: On Mon, Jun 6, 2011 at 5:13 PM, Simon Riggs si...@2ndquadrant.com wrote: The cost to us is a few days work and the benefit is a whole year's worth of increased performance for our user base, which has a hardware equivalent well into the millions of dollars. I doubt that this is an accurate reflection of the cost. What was presented by Robert Haas was a proof of concept, and he pointed out that it had numerous problems. To requote: There are numerous problems with the code as it stands at this point. It crashes if you try to use 2PC, which means the regression tests fail; it probably does horrible things if you run out of shared memory; pg_locks knows nothing about the new mechanism (arguably, we could leave it that way: only locks that can't possibly be conflicting with anything can be taken using this mechanism, but it would be nice to fix, I think); and there are likely some other gotchas as well. The latest version of the patch is in much better shape: http://archives.postgresql.org/pgsql-hackers/2011-06/msg00403.php But this is not intended as disparagement for the balance of your argument. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] contrib/citext versus collations
I've been looking into bug #6053, in which Regina Obe complains that hash-based DISTINCT queries fail for type citext. The cause is not far to seek: the header comment for execGrouping.c states * Note: we currently assume that equality and hashing functions are not * collation-sensitive, so the code in this file has no support for passing * collation settings through from callers. That may have to change someday. and indeed the failure comes directly from the fact that citext's hash function *does* expect a collation to be passed to it. I'm a bit embarrassed to not have noticed that citext was a counterexample for this assumption, especially since I already fixed one bug that should have clued me in (commit a0b75a41a907e1582acdb8aa6ebb9cacca39d7d8). Now, removing this assumption from execGrouping.c is already a pretty sizable task --- for starters, at least plan node types Agg, Group, SetOp, Unique, and WindowAgg would need collation attributes that they don't have today. But the assumption that equality operators are not collation-sensitive is baked into a number of other places too; for instance nodeAgg.c @ line 600 indxpath.c @ line 2200 prepunion.c @ line 640 ri_triggers.c @ line 3000 and that's just places where there's a comment about it :-(. It's worth noting also that in many of these places, paying attention to collation is not merely going to need more coding; it will directly translate to a performance hit, one that is entirely unnecessary for the normal case where collation doesn't affect equality. So this leaves us between a rock and a hard place. I think there's just about no chance of fixing all these things without a serious fresh slip in the 9.1 schedule. Also, I'm *not* prepared to fix these things personally. I already regret the amount of time I put into collations this past winter/spring, and am not willing to drop another several weeks down that sinkhole right now. 
The most workable alternative that I can see is to lobotomize citext so that it always does lower-casing according to the database's default collation, which would allow us to pretend that its notion of equality is not collation-sensitive after all. We could hope to improve this in future release cycles, but not till we've done the infrastructure work outlined above. One bit of infrastructure that might be a good idea is a flag to indicate whether an equality operator's behavior is potentially collation-dependent, so that we could avoid taking performance hits in the normal case. Comments, other ideas? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
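The invariant the hash machinery needs from a "lobotomized" citext can be sketched as follows (a toy Python stand-in for the C functions, with Python's default str.lower() standing in for lower-casing under the database's default collation):

```python
def citext_eq(a, b):
    """'Lobotomized' citext equality: case-fold with one fixed locale
    (here Python's default str.lower()), never a per-column collation,
    so equality is not collation-sensitive."""
    return a.lower() == b.lower()

def citext_hash(s):
    """The invariant hash-based DISTINCT/grouping depends on:
    citext_eq(a, b) implies citext_hash(a) == citext_hash(b).
    Hashing the same case-folded form used by equality guarantees
    it, without any collation being passed in."""
    return hash(s.lower())
```

Bug #6053 arises precisely because the real citext hash function wants a collation that execGrouping.c never passes; folding with the fixed default collation, as proposed above, removes that dependency at the cost of per-collation case-folding behavior.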
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
Stephen Frost sfr...@snowman.net wrote: if people really want this in, then we need to figure out what the new schedule is going to be. I suggest June, 2012. That way we can get a whole bunch more really cool patches in, and the users won't have to wait for 9.2 to get them. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers