Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan
From: Amit Khandekar [mailto:amit.khande...@enterprisedb.com]

> On 1 November 2013 16:32, Etsuro Fujita <fujita.ets...@lab.ntt.co.jp> wrote:
>> From: Fujii Masao [mailto:masao.fu...@gmail.com]
>>> I'm not sure it's a good idea to show the number of the fetches,
>>> because it seems difficult to tune work_mem from that number.  How can
>>> we calculate how much to increase work_mem to avoid a lossy bitmap
>>> from the number of the fetches in the EXPLAIN output?
>>
>> We can calculate that from the following equation in tbm_create():
>>
>>   nbuckets = maxbytes /
>>              (MAXALIGN(sizeof(HASHELEMENT)) +
>>               MAXALIGN(sizeof(PagetableEntry)) +
>>               sizeof(Pointer) + sizeof(Pointer)),
>>
>> where maxbytes is the size of memory used for the hashtable in a
>> TIDBitmap, designated by work_mem, and nbuckets is the estimated number
>> of hashtable entries we can have within maxbytes.  From this, the size
>> of work_mem within which we can have every hashtable entry as an exact
>> bitmap is calculated as follows:
>>
>>   work_mem = (the number of exact pages + the number of lossy pages) *
>>              (MAXALIGN(sizeof(HASHELEMENT)) +
>>               MAXALIGN(sizeof(PagetableEntry)) +
>>               sizeof(Pointer) + sizeof(Pointer)) / (1024 * 1024).
>
> I have yet to give more thought to the above formula (particularly
> exact_pages + lossy_pages), but I was also wondering whether the user
> would really be able to figure out the above way of estimating the
> memory, or whether EXPLAIN itself should show the estimated memory
> required for the bitmap.  For hash joins we do show the memory taken by
> the hash table in show_hash_info().  We could show the memory
> requirement in addition to the number of exact/lossy pages.

Thank you for the review!  Reconsidering that, I would like your opinion
on one point.  The patch shows the number of exact/lossy pages that have
been fetched in a bitmap heap scan, but that number varies with the
fraction of tuples actually retrieved, as in the following example.
postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on demo  (cost=2187.35..101419.96 rows=102919 width=42) (actual time=23.684..1302.382 rows=99803 loops=1)
   Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
   Rows Removed by Index Recheck: 6279502
   Heap Blocks: exact=1990 lossy=59593
   ->  Bitmap Index Scan on demo_col2_idx  (cost=0.00..2161.62 rows=102919 width=0) (actual time=23.330..23.330 rows=99803 loops=1)
         Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
 Total runtime: 1311.949 ms
(7 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02 LIMIT 5000;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2187.35..7008.26 rows=5000 width=42) (actual time=23.543..86.093 rows=5000 loops=1)
   ->  Bitmap Heap Scan on demo  (cost=2187.35..101419.96 rows=102919 width=42) (actual time=23.542..85.196 rows=5000 loops=1)
         Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
         Rows Removed by Index Recheck: 312179
         Heap Blocks: exact=99 lossy=2963
         ->  Bitmap Index Scan on demo_col2_idx  (cost=0.00..2161.62 rows=102919 width=0) (actual time=23.189..23.189 rows=99803 loops=1)
               Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
 Total runtime: 86.626 ms
(8 rows)

So my question is: should we show the number of exact/lossy pages in the
TIDBitmap, rather than the number of those pages actually fetched in the
bitmap heap scan?

Thanks,

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
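[Editor's note: the work_mem arithmetic quoted above can be sketched as a small standalone C helper. The struct sizes below are illustrative assumptions (the real values of sizeof(HASHELEMENT) and sizeof(PagetableEntry) depend on the PostgreSQL build and platform), so this is a sketch of the estimate, not the actual tbm_create() code.]

```c
#include <assert.h>
#include <stddef.h>

/* 8-byte alignment, as on typical 64-bit builds (assumption) */
#define MAXALIGN(len)           (((len) + 7) & ~((size_t) 7))

/* Illustrative, assumed sizes -- NOT the real PostgreSQL values */
#define SIZEOF_HASHELEMENT      16
#define SIZEOF_PAGETABLE_ENTRY  40
#define SIZEOF_POINTER          8   /* assuming a 64-bit pointer */

/*
 * Bytes of hashtable memory needed so that every heap page touched by the
 * scan (exact + lossy) could get its own exact PagetableEntry, following
 * the per-entry cost quoted in the thread.
 */
static size_t
estimate_workmem_bytes(size_t exact_pages, size_t lossy_pages)
{
    size_t per_entry = MAXALIGN(SIZEOF_HASHELEMENT) +
                       MAXALIGN(SIZEOF_PAGETABLE_ENTRY) +
                       SIZEOF_POINTER + SIZEOF_POINTER;

    return (exact_pages + lossy_pages) * per_entry;
}

/* Same estimate expressed in megabytes, rounded up to a whole MB. */
static size_t
estimate_workmem_mb(size_t exact_pages, size_t lossy_pages)
{
    size_t bytes = estimate_workmem_bytes(exact_pages, lossy_pages);

    return (bytes + (1024 * 1024) - 1) / (1024 * 1024);
}
```

With the first EXPLAIN output above (exact=1990, lossy=59593) and these assumed sizes, the estimate comes out to roughly 5 MB of work_mem to avoid lossy pages.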
[HACKERS] Re: PostgreSQL Service on Windows does not start. ~ is not a valid Win32 application
* Tom Lane wrote:
> I looked at this patch a bit.  I agree that we need to fix
> pgwin32_CommandLine to double-quote the executable name, but it needs a
> great deal more work than that :-(.  Whoever wrote this code was

One additional issue is that the path to the service executable should
use backslashes exclusively.  Currently, the last directory separator in
the service command line (the one before pg_ctl.exe) is a forward slash.

I recently had trouble with Symantec Backup Exec (not sure which versions
are affected); it fails to do system state backups when a service
registered using pg_ctl is present on the system.  See
http://www.symantec.com/docs/TECH144413 for the same issue involving a
different service.  The EDB installer does not cause that problem,
although I don't know whether that is because it does not use pg_ctl to
register the service or because it fixes the path afterwards.

--
Christian
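[Editor's note: a minimal sketch of the separator fix Christian suggests. This helper is hypothetical, not code from pg_ctl; it simply rewrites forward slashes in a Windows service command-line path to backslashes.]

```c
#include <assert.h>
#include <string.h>

/*
 * Rewrite any forward slashes in a path to backslashes, in place, so the
 * registered service command line uses native Win32 separators only.
 * Illustrative helper, assuming the caller owns a writable buffer.
 */
static void
canonicalize_win32_separators(char *path)
{
    for (char *p = path; *p != '\0'; p++)
    {
        if (*p == '/')
            *p = '\\';
    }
}
```

A service command line registered as `C:/Program Files/PostgreSQL/bin/pg_ctl.exe` would then be stored as `C:\Program Files\PostgreSQL\bin\pg_ctl.exe`, which is what backup tools expecting native separators see.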
Re: [HACKERS] Sequence Access Method WIP
On 24.11.2013 19:23, Simon Riggs wrote:
> On 18 November 2013 07:06, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
>> On 18.11.2013 13:48, Simon Riggs wrote:
>>> On 18 November 2013 07:50, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
>>>> It doesn't go far enough, it's still too *low*-level.  The sequence
>>>> AM implementation shouldn't need to have direct access to the buffer
>>>> page at all.  I don't think the sequence AM should be in control of
>>>> 'cached'.  The caching is done outside the AM.  And log_cnt probably
>>>> should be passed to the _alloc function directly as an argument,
>>>> ie. the server code asks the AM to allocate N new values in one call.
>>>
>>> I can't see what the rationale of your arguments is.  All index AMs
>>> write WAL and control buffer locking etc..
>>
>> Index AMs are completely responsible for the on-disk structure, while
>> with the proposed API, both the AM and the backend are intimately aware
>> of the on-disk representation.  Such a shared responsibility is not a
>> good thing in an API.  I would also be fine with going 100% in the
>> index AM direction, and removing all knowledge of the on-disk layout
>> from the backend code and moving it into the AMs.  Then you could
>> actually implement the discussed "store all sequences in a single file"
>> change by writing a new sequence AM for it.
>
> I think the way to resolve this is to do both of these things, i.e. a
> two-level API:
>
> 1. Implement the SeqAM API at the most generic level.  Add a nextval()
>    call as well as alloc().
> 2. Also implement the proposed changes to alloc().

The proposed changes to alloc() would still suffer from all the problems
that I complained about.  Adding a new API alongside doesn't help with
that.

- Heikki
Re: [HACKERS] Status of FDW pushdowns
Merlin Moncure <mmonc...@gmail.com> writes:
> By 'insignificant' you mean 'necessary to do any non-trivial real
> work'.  Personally, I'd prefer it if FDWs were extended to allow
> arbitrary parameterized queries like every other database connectivity
> API ever made, ever.  But in lieu of that, I'll take as much push-down
> power as possible :-D.

That sounds more like FOREIGN VIEW and FOREIGN FUNCTION to me, where you
could have full control of the local/remote boundaries.  I mean that when
planning a query using a FOREIGN VIEW, it would probably make sense to
treat it as a CTE as far as the optimizer is concerned.

As for FOREIGN FUNCTION, that would allow injecting arbitrary parameters
anywhere in the remote query when using SQL functions.  We already have a
very nice version of FOREIGN FUNCTION: it's called plproxy.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] new unicode table border styles for psql
Pavel Stehule <pavel.steh...@gmail.com> writes:
> there is other issue - a simple parser will be really user-unfriendly,
> and a user-friendly parser will not be simple :(

If simple things are hard to implement, get yourself better tools.  Each
time we get onto the topic of improving the scripting abilities of our
interactive tool, it's always the same problem: having to invent a
scripting language with a whole parser is just too much work.

Maybe it's time we step back a little and consider real scripting
solutions to embed into psql, and pgbench too:

  http://ecls.sourceforge.net/          LGPL  Common Lisp
  http://www.gnu.org/software/guile/    LGPL  Scheme, Javascript, Emacs Lisp
  http://www.lua.org/                   MIT   Lua

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
On 25 November 2013, Amit Khandekar <amit.khande...@enterprisedb.com> wrote:
>> Ok, we will then first fix the \COPY TO issue where it does not revert
>> the overridden psql output file handle.  Once this is solved, fix both
>> COPY FROM and COPY TO, like how it is done in the earlier patch
>> (copydefectV2.patch).
>>
>> I analyzed the solution for the \COPY TO issue, but unfortunately I
>> observed that do_copy already resets the values of cur_cmd_source and
>> queryFout -- but the result status is printed before that.  So we will
>> have to reset the values before the result status is displayed.  As
>> alternative solutions, I have two approaches:
>>
>> 1. We can store the current file destination queryFout in a local
>>    variable and pass it to the SendQuery function as a parameter, then
>>    use it to reset the value of queryFout after the return from
>>    ProcessResult.  All other callers of SendQuery can pass NULL for
>>    this new parameter.
>>
>> 2. We can add a new structure member FILE *prevQueryFout to struct
>>    _psqlSettings, holding the value of queryFout before it is changed
>>    in do_copy.  The same can then be used to reset the value in
>>    SendQuery or ProcessResult.
>
> I think approach #2 is fine.  Rather than prevQueryFout, I suggest
> defining a separate FILE * handle for COPY.  I don't see any other
> client-side command that uses its own file pointer for reading and
> writing the way COPY does, and this handle has nothing to do with pset
> stdin and stdout.  So we can have this special
> _psqlSettings->copystream specifically for COPY.  Both handleCopyIn()
> and handleCopyOut() will be passed pset.copystream.  In do_copy(),
> instead of overriding pset.queryFout, we can set pset.copystream to
> copystream, or to stdin/stdout if copystream is NULL.

OK.  I have revised the patch as per the discussion.

Now if a \copy command is called, we set the appropriate value of
_psqlSettings->copystream in do_copy, and the same is used inside
handleCopyIn() and handleCopyOut().  Once the \copy command execution
finishes, we reset _psqlSettings->copystream to NULL.

If a COPY command (no backslash) is used, _psqlSettings->copystream will
be NULL, so copyStream will be assigned stdout/stdin, depending on
TO/FROM respectively, inside handleCopyOut()/handleCopyIn().

Also, in order to address queries like

  ./psql -d postgres -c "\copy tbl to '/home/rajeev/9.4gitcode/install/bin/data/temp.txt'; copy tbl from stdin;"

inside the function ProcessResult we check whether it is the second cycle
and the result status is COPY OUT or IN; if so, we reset
_psqlSettings->copystream to NULL so that it can take the value
stdout/stdin for further processing.

Please provide your opinion.

Thanks and Regards,
Kumar Rajeev Rastogi

copyerrorV4.patch
Description: copyerrorV4.patch
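[Editor's note: the copystream fallback described above can be sketched as follows. The struct and function names here are assumptions modelled on the discussion, not the actual psql sources: \copy sets copystream explicitly, plain COPY leaves it NULL, and the handlers fall back to stdout/stdin.]

```c
#include <stdio.h>

/*
 * Sketch of the proposed pset.copystream behaviour.  Hypothetical type --
 * the real struct _psqlSettings has many more members.
 */
typedef struct
{
    FILE *copystream;   /* set by \copy; NULL for plain COPY */
} PsqlSettingsSketch;

/* Stream for COPY ... TO: the \copy target file, else stdout. */
static FILE *
copy_out_stream(PsqlSettingsSketch *pset)
{
    return pset->copystream != NULL ? pset->copystream : stdout;
}

/* Stream for COPY ... FROM: the \copy source file, else stdin. */
static FILE *
copy_in_stream(PsqlSettingsSketch *pset)
{
    return pset->copystream != NULL ? pset->copystream : stdin;
}
```

Resetting copystream to NULL after each \copy then automatically gives a subsequent bare COPY the stdout/stdin behaviour described for the two-command case above.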
Re: [HACKERS] Sequence Access Method WIP
On 25 November 2013 04:01, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> The proposed changes to alloc() would still suffer from all the problems
> that I complained about.  Adding a new API alongside doesn't help with
> that.

You made two proposals.  I suggested implementing both.  What would you
have me do?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3
Hi,

On 2013-11-24 16:56:26 -0500, J Smith wrote:
> coredumper worked like a charm.  Useful tool, that is... although as a
> bit of advice, I'd try not to run it on Postgres if your various memory
> settings are tweaked towards production use -- the core dump that was
> captured on my server weighed in at 16 GB.
>
> Nov 23 14:38:32 dev postgres[23810]: [4-1] user=dev,db=dev ERROR:  could not access status of transaction 13514992
> Nov 23 14:38:32 dev postgres[23810]: [4-2] user=dev,db=dev DETAIL:  Could not open file "pg_subtrans/00CE": Success.
> Nov 23 14:38:32 dev postgres[23810]: [4-3] user=dev,db=dev CONTEXT:  SQL statement "SELECT 1 FROM ONLY dev.collection_batches x WHERE id OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"

Ok, this is helpful.  Do you have rather long-running transactions?  The
transaction that does the foreign key checks has an xid of 10260613,
while the row that's getting checked has 13514992.

> #4  0x00635dc7 in XactLockTableWait (xid=13514992) at lmgr.c:501
>         tag = {locktag_field1 = 13514992, locktag_field2 = 0, locktag_field3 = 0, locktag_field4 = 0, locktag_type = 4 '\004', locktag_lockmethodid = 1 '\001'}
> #5  0x00482223 in heap_lock_updated_tuple_rec (rel=0x2b20f050a8d0, tuple=<value optimized out>, ctid=<value optimized out>, xid=10260613, mode=LockTupleKeyShare) at heapam.c:4847

I am not sure whether that's the origin of the problem, but at the very
least it seems to me that heap_lock_updated_tuple_rec() is missing
several important pieces:

a) do the priorXmax == xmin dance to check we're still following the
   same ctid chain.  Currently we could easily stumble across completely
   unrelated tuples if a chain element aborted and got vacuumed.

b) check whether a chain element actually aborted -- currently we're
   only doing that in the HEAP_KEYS_UPDATED case, but that seems wrong
   (we can't check for committed, though!).

c) (reported separately as well) cope with failure of heap_fetch() to
   return anything.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logging WAL when updating hintbit
On Sun, Nov 24, 2013 at 6:02 AM, Jeff Davis <pg...@j-davis.com> wrote:
> On Tue, 2013-11-19 at 11:42 -0500, Robert Haas wrote:
>> My take is that configuration options should be used to serve different
>> use cases.
>
> One thing I like about postgres is that it doesn't have options for the
> sake of options.  The trade-off here seems to be: if you want fast
> failback, you have to write more WAL during normal operation.  But it's
> not clear to me which one I should choose for a given installation.  If
> there's a lot of data, then fast failback is nice, but then again,
> logging the hint bits on a large amount of data might be painful during
> normal operation.  The only time the choice is easy is when you already
> have checksums enabled.
>
> I think we should work some more in this area first so we can end up
> with something that works for everyone, or at least a clearer choice to
> offer users.  My hope is that it will go something like:
>
> 1. We get more reports from the field about checksum performance.
> 2. pg_rewind gets some more attention.
> 3. We find a way to upgrade a replica set using pg_upgrade and
>    pg_rewind, so that the replicas do not all need a new base backup
>    after an upgrade.
> 4. We further mitigate the performance impact of logging all page
>    modifications.
> 5. We eventually see that the benefits of logging all page
>    modifications outweigh the performance cost for most people, and
>    start recommending it to most people.
> 6. The other WAL levels become more specialized for single,
>    unreplicated instances.
>
> That's just a hope, of course, but we've made some progress and I think
> it's a plausible outcome.  As it stands, the replica re-sync after any
> failover or upgrade seems to be a design gap.
>
> All that being said, I don't object to this option -- I just want it to
> lead us somewhere, not be another option that we keep around forever
> with conflicting recommendations about how to set it.

Thank you for the feedback.  I agree with you: we need more reports
regarding checksum performance first.  I will work on that.

I attached a new version of the patch which adds a separate parameter,
'enable_logging_hintbit'.  It works the same as the previous patch, just
independent of wal_level, and the name is changed.  One behavioral change
is that it does not work together with the 'minimal' wal_level.

Regards,

---
Sawada Masahiko

log_hint_bit_wal_v3.patch
Description: Binary data
Re: [HACKERS] [GENERAL] pg_upgrade ?deficiency
On Sat, Nov 23, 2013 at 08:44:42AM -0800, Kevin Grittner wrote:
> Bruce Momjian <br...@momjian.us> wrote:
>> I am not a fan of backpatching any of this.
>
> Here's my problem with that.  Here's setup to create what I don't think
> is all that weird a setup: [...]  The cluster is created in the state
> that was dumped, default read-only flags and all.  Are you saying that
> you find the current behavior acceptable in back branches?

First, I don't need to see a 300-line pg_dump restore output to know it
is a bug.  Second, what you didn't do is address the rest of my
paragraph:

> I am not a fan of backpatching any of this.  We have learned the fix is
> more complex than thought, and the risk of breakage and having pg_dump
> diffs change between minor releases doesn't seem justified.  We have to
> balance the _one_ user failure report we have received vs. potential
> breakage.

Now, others seem to be fine with a backpatch, so perhaps it is safe.  I
am merely pointing out that, with all backpatching, we have to balance
the fix against possible breakage and behavioral change.

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
Re: [HACKERS] PostgreSQL Service on Windows does not start. ~ is not a valid Win32 application
On 24 November 2013, Tom Lane wrote:
> I looked at this patch a bit.  I agree that we need to fix
> pgwin32_CommandLine to double-quote the executable name, but it needs a
> great deal more work than that :-(.  Whoever wrote this code was
> apparently unacquainted with the concept of buffer overrun.  It's not
> going to be hard at all to crash pg_ctl with overlength arguments.  I'm
> not sure that that amounts to a security bug, but it's certainly bad.
>
> After some thought it seems like the most future-proof fix is to not
> use a fixed-length buffer for the command string at all.  The attached
> revised patch switches it over to using a PQExpBuffer instead, which is
> pretty much free since we're relying on libpq anyway in this program.
> (We still use a fixed-length buffer for the program path, which is OK
> because that's what find_my_exec and find_other_exec expect.)  In
> addition, I fixed it to append ".exe" in both cases, not just the one.
> I'm not in a position to actually test this, but it does compile
> without warnings.

One suggestion: instead of using sizeof(cmdLine),
a. can't we use strlen (hence a small 'for' loop)?
b. or use memmove to move one byte?

I tested the latest patch.  My observation is: if we give a relative
data directory path while registering the service, then service start
fails, but the same works if the data directory path is absolute.  This
looks like an existing issue.  Maybe we need to internally convert the
relative data path to an absolute one.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] new unicode table border styles for psql
On Mon, Nov 25, 2013 at 3:33 AM, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
> Pavel Stehule <pavel.steh...@gmail.com> writes:
>> there is other issue - a simple parser will be really user-unfriendly,
>> and a user-friendly parser will not be simple :(
>
> If simple things are hard to implement, get yourself better tools.
> Each time we get onto the topic of improving the scripting abilities of
> our interactive tool, it's always the same problem: having to invent a
> scripting language with a whole parser is just too much work.  Maybe
> it's time we step back a little and consider real scripting solutions
> to embed into psql, and pgbench too:

I'm thinking (did I miss something?) that Pavel was commenting merely on
the parsing of the unicode border-character settings, not the wider
scripting of psql.  (psql scripting is a fun topic to discuss, though
:-).)

merlin
Re: [HACKERS] new unicode table border styles for psql
On 11/25/2013 09:00 AM, Merlin Moncure wrote:
> On Mon, Nov 25, 2013 at 3:33 AM, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
>> Pavel Stehule <pavel.steh...@gmail.com> writes:
>>> there is other issue - a simple parser will be really user-unfriendly,
>>> and a user-friendly parser will not be simple :(
>>
>> If simple things are hard to implement, get yourself better tools.
>> Each time we get onto the topic of improving the scripting abilities
>> of our interactive tool, it's always the same problem: having to
>> invent a scripting language with a whole parser is just too much work.
>> Maybe it's time we step back a little and consider real scripting
>> solutions to embed into psql, and pgbench too:
>
> I'm thinking (did I miss something?) that Pavel was commenting merely
> on the parsing of setting unicode border characters, not the wider
> scripting of psql.  (psql scripting is a fun topic to discuss, though
> :-).)

Even if it is, it's totally off-topic.  Please don't hijack email
threads.

cheers

andrew
Re: [HACKERS] new unicode table border styles for psql
Andrew Dunstan <and...@dunslane.net> writes:
> Even if it is, it's totally off-topic.  Please don't hijack email
> threads.

Well, when I read that parsing a user setup is too complex, for me that
calls for a solution that offers more power to the user without us
having to write specialized code each time.  I'm sorry, but I don't
understand how "off-topic" or "hijack" applies here.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] new unicode table border styles for psql
On 11/25/2013 09:33 AM, Dimitri Fontaine wrote:
> Andrew Dunstan <and...@dunslane.net> writes:
>> Even if it is, it's totally off-topic.  Please don't hijack email
>> threads.
>
> Well, when I read that parsing a user setup is too complex, for me that
> calls for a solution that offers more power to the user without us
> having to write specialized code each time.  I'm sorry, but I don't
> understand how "off-topic" or "hijack" applies here.

It just seems to me a very big stretch to go from the topic of psql
border styles to the topic of psql scripting support.  Your use case
would surely be using a sledgehammer to crack a nut.  By all means argue
for better scripting support in psql, but I would suggest your argument
would be stronger if the use case were something more important and
central to psql's purpose.

cheers

andrew
Re: [HACKERS] new unicode table border styles for psql
Andrew Dunstan <and...@dunslane.net> writes:
>> I'm sorry, but I don't understand how "off-topic" or "hijack" applies
>> here.

And I just realized there's another way to read what Pavel said, which is
that *user scripts* parsing the output of psql might become harder to
write as soon as they don't control the default border style in use.
Well, in that case, yes, I'm vastly off-topic.  I was answering the
question of how to parse the user setting itself -- that is, writing C
code inside the psql source tree -- and how to expose a fine-grained
solution to that problem without having to write a whole new
configuration parser.

> It just seems to me a very big stretch to go from the topic of psql
> border styles to the topic of psql scripting support.  Your use case
> would surely be using a sledgehammer to crack a nut.  By all means
> argue for better scripting support in psql, but I would suggest your
> argument would be stronger if the use case were something more
> important and central to psql's purpose.

I think I just understood something entirely different from what you
were talking about.

--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] docbook-xsl version for release builds
On 10/3/13, 7:58 AM, Magnus Hagander wrote:
>> Did we get around to doing this?
>
> Nope.  Given my schedule between now and the release wrap, I won't have
> a chance to do it if I want any reasonable ability to roll it back if
> it fails.  But if you want to go ahead and get it done, go ahead :)

Next try?
Re: [HACKERS] docbook-xsl version for release builds
On Mon, Nov 25, 2013 at 4:27 PM, Peter Eisentraut <pete...@gmx.net> wrote:
> On 10/3/13, 7:58 AM, Magnus Hagander wrote:
>>> Did we get around to doing this?
>>
>> Nope.  Given my schedule between now and the release wrap, I won't
>> have a chance to do it if I want any reasonable ability to roll it
>> back if it fails.  But if you want to go ahead and get it done, go
>> ahead :)
>
> Next try?

Thanks for the reminder -- done (installed the deb from sid, version
1.78.1).  This will somehow show up in the snapshot builds, correct?  So
we (you? :P) can verify after the next snapshots that it's correct?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] docbook-xsl version for release builds
Magnus Hagander <mag...@hagander.net> writes:
> This will somehow show up in the snapshot builds, correct?  So we
> (you? :P) can verify after the next snapshots that it's correct?

I thought the devel docs at http://www.postgresql.org/docs/devel/static/
were built on that machine?

			regards, tom lane
Re: [HACKERS] MultiXact bugs
Andres Freund wrote:
> Hi,
>
> The attached pgbench testcase can reproduce two issues:

Thanks.

> 2) we frequently error out in heap_lock_updated_tuple_rec() with
>    ERROR: unable to fetch updated version of tuple
>
>    That's because when we're following a ctid chain, it's perfectly
>    possible for the updated version of a tuple to already have been
>    vacuumed/pruned away if the updating transaction has aborted.
>
>    Also looks like a 9.3+ issue to me; slightly higher impact, but in
>    the end you're just getting some errors under concurrency.

Yes, I think this is 9.3 only.  The first attachment shows my proposed
patch, which is just to report success to the caller in case the tuple
cannot be acquired by heap_fetch.  This is OK because if by the time we
check the updated version of a tuple it is gone, it means there is no
further update chain to follow and lock.

> 1) (takes a bit)
>    TRAP: FailedAssertion(!(((xid) >= ((TransactionId) 3))), File: "pruneheap.c", Line: 601)
>
>    That's because HeapTupleHeaderGetUpdateXid() ignores aborted
>    updaters and returns InvalidTransactionId in that case, but
>    HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DELETE_IN_PROGRESS...

Interesting.  This is a very narrow race condition: when we call
HeapTupleSatisfiesVacuum the updater is still running, so it returns
HEAPTUPLE_DELETE_IN_PROGRESS; but it aborts just before we read the
tuple's update Xid.  So that one returns InvalidXid, and that causes the
failure.

I was able to hit this race condition very quickly by adding a
pg_usleep(1000) in heap_prune_chain(), in the
HEAPTUPLE_DELETE_IN_PROGRESS case, before trying to acquire the update
Xid.

There is no way to close the window, but there is no need; if the
updater aborted, we don't need to mark the page prunable in the first
place.  So we can just check the return value of
HeapTupleHeaderGetUpdateXid, and if it's InvalidXid, not set the
prunable bit.  The second attachment below fixes the bug that way.

I checked for other cases where the update Xid is checked after
HeapTupleSatisfiesVacuum returns HEAPTUPLE_DELETE_IN_PROGRESS.  As far
as I can tell, the only one that would be affected is the one in
predicate.c.  It is far from clear to me what the right thing to do is
in those cases; the simplest idea is to return without reporting a
failure if the updater aborted, just as above, but I wonder if this
needs to be conditional on "visible".  I added a pg_usleep() before
acquiring the update Xid in the relevant case, but the isolation test
cases didn't hit the problem (I presume there is no update/delete in
these test cases, but I didn't check).  I defer to Kevin on this issue.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***************
*** 4829,4835 **** heap_lock_updated_tuple_rec(Relation rel, ItemPointer tid, TransactionId xid,
  	ItemPointerCopy(tupid, &(mytup.t_self));

  	if (!heap_fetch(rel, SnapshotAny, &mytup, &buf, false, NULL))
! 		elog(ERROR, "unable to fetch updated version of tuple");

  l4:
  	CHECK_FOR_INTERRUPTS();
--- 4829,4844 ----
  	ItemPointerCopy(tupid, &(mytup.t_self));

  	if (!heap_fetch(rel, SnapshotAny, &mytup, &buf, false, NULL))
! 	{
! 		/*
! 		 * if we fail to find the updated version of the tuple, it's
! 		 * because it was vacuumed/pruned away after its creator
! 		 * transaction aborted.  So behave as if we got to the end of the
! 		 * chain, and there's no further tuple to lock: return success to
! 		 * caller.
! 		 */
! 		return HeapTupleMayBeUpdated;
! 	}

  l4:
  	CHECK_FOR_INTERRUPTS();

*** a/src/backend/access/heap/pruneheap.c
--- b/src/backend/access/heap/pruneheap.c
***************
*** 479,491 **** heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
  				break;

  			case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- 				/*
- 				 * This tuple may soon become DEAD.  Update the hint field so
- 				 * that the page is reconsidered for pruning in future.
- 				 */
- 				heap_prune_record_prunable(prstate,
- 										   HeapTupleHeaderGetUpdateXid(htup));
  				break;

  			case HEAPTUPLE_LIVE:
--- 479,500 ----
  				break;

  			case HEAPTUPLE_DELETE_IN_PROGRESS:
+ 				{
+ 					TransactionId xmax;
+
+ 					/*
+ 					 * This tuple may soon become DEAD.  Update the hint field
+ 					 * so that the page is reconsidered for pruning in future.
+ 					 * If there was a MultiXactId updater, and it aborted after
+ 					 * HTSV checked, then we will get an invalid Xid here.
+ 					 * There is no need for future pruning of the page in that
+ 					 * case, so skip it.
+ 					 */
+ 					xmax = HeapTupleHeaderGetUpdateXid(htup);
+ 					if (TransactionIdIsValid(xmax))
+ 						heap_prune_record_prunable(prstate, xmax);
+ 				}
  				break;

  			case HEAPTUPLE_LIVE:

*** a/src/backend/storage/lmgr/predicate.c
--- b/src/backend/storage/lmgr/predicate.c
***************
Re: [HACKERS] docbook-xsl version for release builds
On Mon, Nov 25, 2013 at 4:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: This will somehow show up in the snapshot builds, correct? So we (you? :P) can verify after the next snapshots that it's correct? I thought the devel docs at http://www.postgresql.org/docs/devel/static/ were built on that machine? They are, but I thought the issue Peter wanted fixed was only in the man pages, which aren't uploaded there... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] session_preload_libraries not in sample config?
On Mon, Nov 25, 2013 at 4:04 AM, Jeff Davis pg...@j-davis.com wrote: session_preload_libraries is not in the sample config file. Is that just an oversight? I think so. Regards,
Re: [HACKERS] MultiXact bugs
On 2013-11-25 12:36:19 -0300, Alvaro Herrera wrote: 2) we frequently error out in heap_lock_updated_tuple_rec() with ERROR: unable to fetch updated version of tuple That's because when we're following a ctid chain, it's perfectly possible for the updated version of a tuple to already have been vacuumed/pruned away if the updating transaction has aborted. Also looks like a 9.3+ issue to me, slightly higher impact, but in the end you're just getting some errors under concurrency. Yes, I think this is 9.3 only. First attachment shows my proposed patch, which is just to report success to caller in case the tuple cannot be acquired by heap_fetch. This is OK because if by the time we check the updated version of a tuple it is gone, it means there is no further update chain to follow and lock. Looks good. 1) (takes a bit) TRAP: FailedAssertion("!(((xid) >= ((TransactionId) 3)))", File:/pruneheap.c, Line: 601) That's because HeapTupleHeaderGetUpdateXid() ignores aborted updaters and returns InvalidTransactionId in that case, but HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DELETE_IN_PROGRESS... Interesting. This is a very narrow race condition: when we call HeapTupleSatisfiesVacuum the updater is still running, so it returns HEAPTUPLE_DELETE_IN_PROGRESS; but it aborts just before we read the tuple's update Xid. Well, it's not *that* narrow - remember that a transaction is marked as aborted in the clog *before* it is removed from the proc array. There is no way to close the window, but there is no need; if the updater aborted, we don't need to mark the page prunable in the first place. So we can just check the return value of HeapTupleHeaderGetUpdateXid and if it's InvalidXid, don't set the prunable bit. The second attachment below fixes the bug that way. I am not sure I like the fact that HeapTupleHeaderGetUpdateXid() checks for aborted transactions in the first place. Why is that a good idea? ISTM that wanders off a fair bit from the other HeapTupleHeaderGet* macros.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Extension Templates S03E11
Dimitri, * Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote: As the patch didn't make it already, yes it needs another (final) round of review. The main difficulty in reviewing is understanding the design and the relation between our current model of extensions and what this patch offers. I'm afraid this really needs more work, at least of the more mundane kind. I started working through this patch, but when I hit on get_template_oid, I was reminded of the discussion we had back in January around using just 'template' everywhere. http://www.postgresql.org/message-id/20130118182156.gf16...@tamriel.snowman.net We already have other 'template' objects in the system and I'm not excited about the confusion. This also applies to 'CreateTemplate', 'CreateTemplateTupleDesc', right down to 'template.h' and 'template.c'. Attached is a patch against v16 which fixes up a few documentation issues (I'm pretty sure extension templates and aggregates are unrelated..), and points out that there is zero documentation on these new catalog tables (look for 'XXX' in the patch) along with a few other areas which could use improvement.
Thanks, Stephen diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml new file mode 100644 index acc261c..ae1ce2e 100644 *** a/doc/src/sgml/catalogs.sgml --- b/doc/src/sgml/catalogs.sgml *** *** 154,159 --- 154,174 /row row + entrylink linkend=catalog-pg-extension-controlstructnamepg_extension_control/structname/link/entry + XXXNeeds documentation/XXX + /row + + row + entrylink linkend=catalog-pg-extension-templatestructnamepg_extension_template/structname/link/entry + XXXNeeds documentation/XXX + /row + + row + entrylink linkend=catalog-pg-extension-uptmplstructnamepg_extension_uptmpl/structname/link/entry + XXXNeeds documentation/XXX + /row + + row entrylink linkend=catalog-pg-foreign-data-wrapperstructnamepg_foreign_data_wrapper/structname/link/entry entryforeign-data wrapper definitions/entry /row *** *** 3290,3295 --- 3305,3325 /para /sect1 + sect1 id=catalog-pg-extension-control + titlestructnamepg_extension_control/structname/title + XXXNeeds documentation/XXX + /sect1 + + sect1 id=catalog-pg-extension-template + titlestructnamepg_extension_template/structname/title + XXXNeeds documentation/XXX + /sect1 + + sect1 id=catalog-pg-extension-uptmpl + titlestructnamepg_extension_uptmpl/structname/title + XXXNeeds documentation/XXX + /sect1 + sect1 id=catalog-pg-foreign-data-wrapper titlestructnamepg_foreign_data_wrapper/structname/title diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml new file mode 100644 index 772310b..b4a589c 100644 *** a/doc/src/sgml/extend.sgml --- b/doc/src/sgml/extend.sgml *** *** 384,398 para The xref linkend=sql-create-template-for-extension command allows a ! superuser to create an firsttermExtension's Template/, that users will then be able to use as the source for their script and control parameters. Upgrade scripts can be provided with the same command, and ! those upgrade scripts can include parameters update too, as would a secondary control file. /para para ! 
Here's a very simple example at using a template for an extension: programlisting create template for extension myextension version '1.0' with (nosuperuser) as $script$ --- 384,398 para The xref linkend=sql-create-template-for-extension command allows a ! superuser to create an firsttermExtension Template/, that users will then be able to use as the source for their script and control parameters. Upgrade scripts can be provided with the same command, and ! those upgrade scripts can include parameter updates too, as would a secondary control file. /para para ! Here's a very simple example of using a template for an extension: programlisting create template for extension myextension version '1.0' with (nosuperuser) as $script$ *** create extension myextension; *** 476,487 listitem para This option allows an extension author to avoid shiping all versions ! scripts when shipping an extension. When a version is requested and ! the matching script does not exist on disk, set replaceabledefault_full_version/replaceable to the first script you still ship and PostgreSQL will apply the intermediate upgrade script as per the commandALTER EXTENSION UPDATE/command command. /para /listitem /varlistentry --- 476,488 listitem para This option allows an extension author to avoid shiping all versions ! of all scripts when shipping an
Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3
On Mon, Nov 25, 2013 at 6:47 AM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2013-11-24 16:56:26 -0500, J Smith wrote: Nov 23 14:38:32 dev postgres[23810]: [4-1] user=dev,db=dev ERROR: could not access status of transaction 13514992 Nov 23 14:38:32 dev postgres[23810]: [4-2] user=dev,db=dev DETAIL: Could not open file pg_subtrans/00CE: Success. Nov 23 14:38:32 dev postgres[23810]: [4-3] user=dev,db=dev CONTEXT: SQL statement SELECT 1 FROM ONLY dev.collection_batches x WHERE id OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x Ok, this is helpful. Do you have rather longrunning transactions? The transaction that does foreign key checks has an xid of 10260613, while the row that's getting checked has 13514992. We did have some long-running transactions, yes. We refactored a bit and removed them and the problem ceased on our end. We ended up reverting our changes for the sake of running this experiment over the weekend and the errors returned. We've since restored our fix and haven't had any problems since, so yeah, long-running transactions appear to be involved. We can continue to experiment if you have any additional tests you'd like us to run. We may have to keep the experiments to running over the weekend, but they're definitely do-able. Cheers
Re: [HACKERS] MultiXact bugs
Andres Freund wrote: On 2013-11-25 12:36:19 -0300, Alvaro Herrera wrote: There is no way to close the window, but there is no need; if the updater aborted, we don't need to mark the page prunable in the first place. So we can just check the return value of HeapTupleHeaderGetUpdateXid and if it's InvalidXid, don't set the prunable bit. The second attachment below fixes the bug that way. I am not sure I like the fact that HeapTupleHeaderGetUpdateXid() checks for aborted transactions in the first place. Why is that a good idea? ISTM that wanders off a fair bit from the other HeapTupleHeaderGet* macros. Originally it didn't, which caused various bugs. I recall it turned out to be cleaner to do the check inside it than putting it out to its callers. I have thoughts that this design might break other things such as the priorXmax checking while traversing HOT chains. Not seeing how: surely if there's an aborted updater in a tuple, there can't be a followup HOT chain elsewhere involving the same tuple. A HOT chain would require another updater Xid in the MultiXact (and we ensure there can only be one updater in a multi). I might be missing something. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3
J Smith escribió: We did have some long-running transactions, yes. We refactored a bit and removed them and the problem ceased on our end. We ended up reverting our changes for the sake of running this experiment over the weekend and the errors returned. We've since restored our fix and haven't had any problems since, so yeah, long-running transactions appear to be involved. We can continue to experiment if you have any additional tests you'd like us to run. We may have to keep the experiments to running over the weekend, but they're definitely do-able. I am working on patches to get these bugs fixed. Would you be up for running a patched binary and see if the errors go away? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] MultiXact bugs
On 2013-11-25 13:45:53 -0300, Alvaro Herrera wrote: Andres Freund wrote: On 2013-11-25 12:36:19 -0300, Alvaro Herrera wrote: There is no way to close the window, but there is no need; if the updater aborted, we don't need to mark the page prunable in the first place. So we can just check the return value of HeapTupleHeaderGetUpdateXid and if it's InvalidXid, don't set the prunable bit. The second attachment below fixes the bug that way. I am not sure I like the fact that HeapTupleHeaderGetUpdateXid() checks for aborted transactions in the first place. Why is that a good idea? ISTM that wanders off a fair bit from the other HeapTupleHeaderGet* macros. Originally it didn't, which caused various bugs. I recall it turned out to be cleaner to do the check inside it than putting it out to its callers. This seems strange to me because we do *not* do those checks for !IS_MULTI xmax's. So there surely shouldn't be too many callers caring about that? I have thoughts that this design might break other things such as the priorXmax checking while traversing HOT chains. Not seeing how: Hm. Are you arguing with yourself about this? surely if there's an aborted updater in a tuple, there can't be a followup HOT chain elsewhere involving the same tuple. A HOT chain would require another updater Xid in the MultiXact (and we ensure there can only be one updater in a multi). I might be missing something. I don't see dangers that way either. I think there might be some strange behaviour because callers need to check IsRunning() first though, which MultiXactIdGetUpdateXid() doesn't. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3
On Mon, Nov 25, 2013 at 11:46 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: J Smith escribió: We did have some long-running transactions, yes. We refactored a bit and removed them and the problem ceased on our end. We ended up reverting our changes for the sake of running this experiment over the weekend and the errors returned. We've since restored our fix and haven't had any problems since, so yeah, long-running transactions appear to be involved. We can continue to experiment if you have any additional tests you'd like us to run. We may have to keep the experiments to running over the weekend, but they're definitely do-able. I am working on patches to get these bugs fixed. Would you be up for running a patched binary and see if the errors go away? Sure, just give me a git commit hash and I can do a test build towards the weekend. I'll include coredumper again just in case things go awry and we can go from there. Cheers
Re: [HACKERS] [PATCH] SQL assertions prototype
Andrew Tipton and...@kiwidrew.com wrote: Simon Riggs si...@2ndquadrant.com wrote: So we'd need to get access to the changed rows, rather than re-executing a huge SQL command that re-checks every row of the table. That last point will make it unusable for sensible amounts of data. That sounds very similar to handling incremental maintenance of materialized views, which Kevin is working on. It does. Let's assume that the huge SQL command that re-checks every row of the table is actually a materialized view. In that case, the CREATE ASSERTION trigger would merely need to scan the matview and raise an error if any rows were present. That should be a very quick operation. That would certainly be a viable way to implement this once we have incremental maintenance for materialized views, although I make no claims to having evaluated it versus the alternatives to be able to assert what the *best* way is. No need to invent some sort of get access to the changed rows mechanism especially for CREATE ASSERTION. As soon as we are out of this CF, I am planning to write code to capture deltas and fire functions to process them eagerly (within the creating transaction). There have been suggestions that the changeset mechanism should be used for that, which I will look into; but my gut feel is that it will be better to build a tuplestore of tids flagged with old or new around the point that after triggers fire. How close does that sound to what CREATE ASSERTION (as currently envisioned) would need? How viable does it sound to turn an assertion expression into a matview which is empty if there are no violations? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] SQL assertions prototype
On Mon, Nov 25, 2013 at 11:04:23AM -0800, Kevin Grittner wrote: Andrew Tipton and...@kiwidrew.com wrote: Simon Riggs si...@2ndquadrant.com wrote: So we'd need to get access to the changed rows, rather than re-executing a huge SQL command that re-checks every row of the table. That last point will make it unusable for sensible amounts of data. That sounds very similar to handling incremental maintenance of materialized views, which Kevin is working on. It does. Let's assume that the huge SQL command that re-checks every row of the table is actually a materialized view. In that case, the CREATE ASSERTION trigger would merely need to scan the matview and raise an error if any rows were present. That should be a very quick operation. That would certainly be a viable way to implement this once we have incremental maintenance for materialized views, although I make no claims to having evaluated it versus the alternatives to be able to assert what the *best* way is. No need to invent some sort of get access to the changed rows mechanism especially for CREATE ASSERTION. As soon as we are out of this CF, I am planning to write code to capture deltas and fire functions to process them eagerly (within the creating transaction). There has been suggestions that the changeset mechanism should be used for that, which I will look into; but my gut feel is that it will be better to build a tuplestore of tids flagged with old or new around the point that after triggers fire. How close does that sound to what CREATE ASSERTION (as currently envisioned) would need? It sounds *extremely* close to what we'd need for row access in per-statement triggers, as in probably identical. The SQL syntax of this sub-feature is described in Foundation section 11.49 and called REFERENCING in CREATE TRIGGER. Do you have any prototypes I could use for that purpose? Cheers, David. 
-- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] [PATCH] SQL assertions prototype
David Fetter da...@fetter.org wrote: On Mon, Nov 25, 2013 at 11:04:23AM -0800, Kevin Grittner wrote: As soon as we are out of this CF, I am planning to write code to capture deltas and fire functions to process them eagerly (within the creating transaction). There has been suggestions that the changeset mechanism should be used for that, which I will look into; but my gut feel is that it will be better to build a tuplestore of tids flagged with old or new around the point that after triggers fire. How close does that sound to what CREATE ASSERTION (as currently envisioned) would need? It sounds *extremely* close to what we'd need for row access in per-statement triggers, as in probably identical. The SQL syntax of this sub-feature is described in Foundation section 11.49 and called REFERENCING in CREATE TRIGGER. Do you have any prototypes I could use for that purpose? No, but it is at the top of my list after the CF. I will also need an execution node type or two to produce the referenced rows for the appropriate contexts, which is probably also very close to what you need for per-statement triggers. I will be happy to coordinate work with you. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3
Andres Freund escribió: Ok, this is helpful. Do you have rather longrunning transactions? The transaction that does foreign key checks has an xid of 10260613, while the row that's getting checked has 13514992. Thanks for the analysis. #4 0x00635dc7 in XactLockTableWait (xid=13514992) at lmgr.c:501 tag = {locktag_field1 = 13514992, locktag_field2 = 0, locktag_field3 = 0, locktag_field4 = 0, locktag_type = 4 '\004', locktag_lockmethodid = 1 '\001'} #5 0x00482223 in heap_lock_updated_tuple_rec (rel=0x2b20f050a8d0, tuple=<value optimized out>, ctid=<value optimized out>, xid=10260613, mode=LockTupleKeyShare) at heapam.c:4847 I am not sure whether that's the origin of the problem but at the very least it seems to me that heap_lock_updated_tuple_rec() is missing several important pieces: a) do the priorXmax==xmin dance to check we're still following the same ctid chain. Currently we could easily stumble across completely unrelated tuples if a chain element aborted and got vacuumed. This seems simple to handle by adding the check you propose to the loop. Basically if the xmax doesn't match the xmin, we reached the end, there's nothing more to lock and we can return success without any further work: /* * Check the tuple XMIN against prior XMAX, if any. If we reached * the end of the chain, we're done, so return success. */ if (TransactionIdIsValid(priorXmax) && !TransactionIdEquals(HeapTupleHeaderGetXmin(mytup), priorXmax)) { UnlockReleaseBuffer(buf); return HeapTupleMayBeUpdated; } b) Check whether a chain element actually aborted - currently we're only doing that in the HEAP_KEYS_UPDATED updated case, but that seems wrong (we can't check for committed tho!). Let me point out that this is exactly the same code that would be affected by my proposed fix for #8434, which would have this check the updateXid in all cases, not only in KEYS_UPDATED as currently. I don't understand your comment about "can't check for committed".
I mean, if the updating transaction committed, then we need to return failure to caller, HeapTupleUpdated. This signals to the caller that the future version of the tuple is locked and the whole thing needs to be failed. (In READ COMMITTED isolation level, this would cause EvalPlanQual to fetch the updated version of the tuple and redo the whole thing from there. In REPEATABLE READ and above, the whole operation would be aborted.) In short I propose to fix this part with the simple patch I proposed for bug #8434. c) (reported separately as well) cope with failure of heap_fetch() to return anything. Proposed patch for this was posted in the other thread. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Why is UPDATE with column-list syntax not implemented
Claudio, Unfortunately, this UPDATE...FROM approach does not detect ambiguities, unless we go for tricks. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Why-is-UPDATE-with-column-list-syntax-not-implemented-tp5779600p5780215.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] why semicolon after begin is not allowed in postgresql?
Kevin, I do see your logic now, but this thing is a common mistake - it means that this seems counter-intuitive to some people. What would happen if we applied Occam's razor and just removed this rule? All existing code would continue to work as is, and we would have one less rule to memorize. That would make PostgreSQL a slightly better product, right? -- View this message in context: http://postgresql.1045698.n5.nabble.com/why-semicolon-after-begin-is-not-allowed-in-postgresql-tp5779905p5780216.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3
On 2013-11-25 17:10:39 -0300, Alvaro Herrera wrote: I am not sure whether that's the origin of the problem but at the very least it seems to me that heap_lock_updated_tuple_rec() is missing several important pieces: a) do the priorXmax==xmin dance to check we're still following the same ctid chain. Currently we could easily stumble across completely unrelated tuples if a chain element aborted and got vacuumed. This seems simple to handle by adding the check you propose to the loop. Basically if the xmax doesn't match the xmin, we reached the end, there's nothing more to lock and we can return success without any further work: Right, that's what we do in other places (notably EvalPlanQualFetch()). b) Check whether a chain element actually aborted - currently we're only doing that in the HEAP_KEYS_UPDATED updated case, but that seems wrong (we can't check for committed tho!). Let me point out that this is exactly the same code that would be affected by my proposed fix for #8434, which would have this check the updateXid in all cases, not only in KEYS_UPDATED as currently. Hm. I don't see a patch in any of the mails about #8434 although I seem to remember talking with you about one. Maybe that was on IM? I don't understand your comment about "can't check for committed". I mean, if the updating transaction committed, then we need to return failure to caller, HeapTupleUpdated. I mean that in the !KEYS_UPDATED case we don't need to abort if we're only acquiring a key share... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] why semicolon after begin is not allowed in postgresql?
On 11/25/2013 03:42 PM, AK wrote: Kevin, I do see your logic now, but this thing is a common mistake - it means that this seems counter-intuitive to some people. What would happen if we applied Occam's razor and just removed this rule? All existing code would continue to work as is, and we would have one less rule to memorize. That would make PostgreSql a slightly better product, right? It would make it a worse product, being inconsistent and stupid. The rule is that you use semicolons to terminate statements. 'begin' on its own is not a complete statement. Therefore it should not be followed by a semicolon. Several people have explained this basis of the rule. It's not counter-intuitive to me or lots of other people. cheers andrew
Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3
Andres Freund escribió: On 2013-11-25 17:10:39 -0300, Alvaro Herrera wrote: Let me point out that this is exactly the same code that would be affected by my proposed fix for #8434, which would have this check the updateXid in all cases, not only in KEYS_UPDATED as currently. Hm. I don't see a patch in any of the mails about #8434 although I seem to remember talking with you about one. Maybe that was on IM? Ah, yeah, that's possible. The patch I proposed back then is attached here. I haven't merged this to latest master; this applies cleanly on top of 16a906f5350 I don't understand your comment about "can't check for committed". I mean, if the updating transaction committed, then we need to return failure to caller, HeapTupleUpdated. I mean that in the !KEYS_UPDATED case we don't need to abort if we're only acquiring a key share... Hm, I think that's correct -- we don't need to abort. But we still need to wait until the updater completes. So this proposed patch is not the full story. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services commit 71aa5c689c153a03bc09e7f9f433ca55a6f00a23 Author: Alvaro Herrera alvhe...@alvh.no-ip.org Date: Mon Oct 7 18:33:59 2013 -0300 Don't skip waiting on tuple updaters if key is unchanged My commit 0ac5ad5134 contained a bug that allowed lockers to avoid sleeping waiting for a future version of the tuple being locked to be released. This happens while following the update chain to lock the updated versions of the row. Prior to that commit, there was no such behavior, so backpatch no further than 9.3. The isolation test changes illustrate the difference in behavior. Some transactions which were previously allowed to continue must now sleep waiting for the other transaction; but there is still more concurrency than there was prior to the mentioned commit.
Per bug #8434 reported by Tomonari Katsumata diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index ead3d69..530a001 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -4832,15 +4832,12 @@ l4: xmax = HeapTupleHeaderGetRawXmax(mytup.t_data); /* - * If this tuple is updated and the key has been modified (or - * deleted), what we do depends on the status of the updating - * transaction: if it's live, we sleep until it finishes; if it has - * committed, we have to fail (i.e. return HeapTupleUpdated); if it - * aborted, we ignore it. For updates that didn't touch the key, we - * can just plough ahead. + * If this tuple is updated, what we do depends on the status of the + * updating transaction: if it's live, we sleep until it finishes; if + * it has committed, we have to fail (i.e. return HeapTupleUpdated); if + * it aborted, we ignore it. */ - if (!(old_infomask & HEAP_XMAX_INVALID) && - (mytup.t_data->t_infomask2 & HEAP_KEYS_UPDATED)) + if (!(old_infomask & HEAP_XMAX_INVALID)) { TransactionId update_xid; diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 791f336..efe7d5a 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -1970,7 +1970,7 @@ EvalPlanQualFetch(EState *estate, Relation relation, int lockmode, * heap_lock_tuple() will throw an error, and so would any later * attempt to update or delete the tuple. (We need not check cmax * because HeapTupleSatisfiesDirty will consider a tuple deleted - * by our transaction dead, regardless of cmax.) Wee just checked + * by our transaction dead, regardless of cmax.) We just checked * that priorXmax == xmin, so we can test that variable instead of * doing HeapTupleHeaderGetXmin again.
*/ diff --git a/src/test/isolation/expected/fk-deadlock.out b/src/test/isolation/expected/fk-deadlock.out index 69eac88..bbd1867 100644 --- a/src/test/isolation/expected/fk-deadlock.out +++ b/src/test/isolation/expected/fk-deadlock.out @@ -11,25 +11,22 @@ step s2c: COMMIT; starting permutation: s1i s1u s2i s1c s2u s2c step s1i: INSERT INTO child VALUES (1, 1); step s1u: UPDATE parent SET aux = 'bar'; -step s2i: INSERT INTO child VALUES (2, 1); +step s2i: INSERT INTO child VALUES (2, 1); waiting ... step s1c: COMMIT; +step s2i: ... completed step s2u: UPDATE parent SET aux = 'baz'; step s2c: COMMIT; starting permutation: s1i s1u s2i s2u s1c s2c step s1i: INSERT INTO child VALUES (1, 1); step s1u: UPDATE parent SET aux = 'bar'; -step s2i: INSERT INTO child VALUES (2, 1); -step s2u: UPDATE parent SET aux = 'baz'; waiting ... -step s1c: COMMIT; -step s2u: ... completed -step s2c: COMMIT; +step s2i: INSERT INTO child VALUES (2, 1); waiting ... +invalid permutation detected starting permutation: s1i s1u s2i s2u s2c s1c step s1i: INSERT INTO child VALUES (1, 1); step s1u: UPDATE parent SET aux = 'bar'; -step s2i: INSERT INTO
[HACKERS] Put json type into alphabetical order in manual table
Hi,

When looking at table 8-1 at http://www.postgresql.org/docs/9.3/static/datatype.html I noticed that all types except for json were in alphabetical order. I have attached a patch which fixes this.

By the way, should character and character varying be swapped in that table too?

--
Andreas Karlsson

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
new file mode 100644
index dea5195..fb9c41a
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
***************
*** 137,142 ****
--- 137,148 ----
       </row>
  
       <row>
+       <entry><type>json</type></entry>
+       <entry></entry>
+       <entry>JSON data</entry>
+      </row>
+ 
+      <row>
        <entry><type>line</type></entry>
        <entry></entry>
        <entry>infinite line on a plane</entry>
***************
*** 269,280 ****
        <entry></entry>
        <entry>XML data</entry>
       </row>
- 
-      <row>
-       <entry><type>json</type></entry>
-       <entry></entry>
-       <entry>JSON data</entry>
-      </row>
   </tbody>
  </tgroup>
 </table>
--- 275,280 ----

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why semicolon after begin is not allowed in postgresql?
AK wrote:
> Kevin, I do see your logic now, but this thing is a common mistake - it means that this seems counter-intuitive to some people. What would happen if we applied Occam's razor and just removed this rule? All existing code would continue to work as is, and we would have one less rule to memorize. That would make PostgreSql a slightly better product, right?

I'm somewhat on the fence for this but am leaning toward maintaining the status quo, mostly because of the analogy with IF ... END IF; versus the SQL BEGIN; command, which is an entirely separate construct. I would maybe change the documentation so that instead of simply dictating a rule we explain why the syntax is the way it is - like this thread is doing. If they consciously omit the semi-colon, hopefully they also understand that what they are beginning is a code-block in plpgsql as opposed to an SQL transaction.

That said, technical purity isn't always a good answer. I'd be inclined to let someone passionate enough about the idea implement it and critique the result, instead of disallowing it outright; but in the end that is likely to lead to the same place.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/why-semicolon-after-begin-is-not-allowed-in-postgresql-tp5779905p5780222.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3
On 2013-11-25 18:06:30 -0300, Alvaro Herrera wrote: I mean that in the !KEYS_UPDATED case we don't need to abort if we're only acquiring a key share... Hm, I think that's correct -- we don't need to abort. But we still need to wait until the updater completes. So this proposed patch is not the full story. Hm. Why do we need to wait in that case? Isn't the entire point of KEY SHARE locks *not* having to wait for !KEYS_UPDATED? ISTM in that case we should only check whether the creating transaction has aborted because in that case we don't need to take out a lock? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why semicolon after begin is not allowed in postgresql?
AK alk...@gmail.com wrote: I do see your logic now, but this thing is a common mistake - it means that this seems counter-intuitive to some people. What would happen if we applied Occam's razor and just removed this rule? All existing code would continue to work as is, and we would have one less rule to memorize. That would make PostgreSql a slightly better product, right? Possibly; but what happens if we want to use plpgsql syntax for stored procedures which allow the BEGIN command some day? We would have lost the ability to distinguish that command from the start of a compound statement. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
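[Editorial note: the ambiguity discussed above can be made concrete with a short sketch. This example is not from the original mails; the table name is invented. At the SQL level, BEGIN is a complete command terminated by a semicolon, while in plpgsql BEGIN merely opens a compound statement and takes no terminator of its own:]

```sql
-- SQL level: BEGIN is a whole command, so it takes a semicolon.
BEGIN;
INSERT INTO t VALUES (1);
COMMIT;

-- plpgsql: BEGIN only opens a block. A semicolon here would have to
-- terminate some statement, but plpgsql has no empty statement, so
-- "BEGIN;" is a syntax error.
CREATE FUNCTION f() RETURNS integer AS $$
BEGIN           -- no semicolon after BEGIN
    RETURN 1;
END;            -- END is closed by the semicolon of the block
$$ LANGUAGE plpgsql;
```

If plpgsql ever gained a transaction-starting BEGIN command (as Kevin suggests for stored procedures), the two spellings would need to remain distinguishable, which is the crux of his objection.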
Re: [HACKERS] ToDo: fast update of arrays with fixed length fields for PL/pgSQL
(sorry for possible duplicates, used wrong account on first try)

On 07.10.2013 17:00, Pavel Stehule wrote:
> Hello
> I fixed patch - there was a missing cast to domain when it was used (and all regress tests are ok now).

This doesn't work correctly for varlen datatypes. I modified your quicksort example to work with text[], and got this:

postgres=# SELECT array_upper(quicksort(1,2,array_agg((g::text))),1) FROM generate_series(1,2) g;
ERROR:  invalid input syntax for integer:
CONTEXT:  PL/pgSQL function quicksort(integer,integer,text[]) line 10 at assignment
PL/pgSQL function quicksort(integer,integer,text[]) line 16 at assignment

The handling of array domains is also wrong. You replace the new value in the array and only check the domain constraint after that. If the constraint is violated, it's too late to roll that back. For example:

postgres=# create domain posarray as int[] check (value[1] > 0);
CREATE DOMAIN
postgres=# do $$
declare iarr posarray;
begin
  begin
    iarr[1] := 1;
    iarr[1] := -1;
  exception when others then
    raise notice 'got error: %', sqlerrm;
  end;
  raise notice 'value: %', iarr[1];
end;
$$;
NOTICE:  got error: value for domain posarray violates check constraint posarray_check
NOTICE:  value: -1
DO

Without the patch, it prints 1, but with the patch, -1.

In general, I'm not convinced this patch is worth the trouble. The speedup isn't all that great; manipulating large arrays in PL/pgSQL is still so slow that if you care about performance you'll want to rewrite your function in some other language or use temporary tables. And you only get a gain with arrays of fixed-length element type with no NULLs. So I think we should drop this patch in its current form. If we want to make array manipulation in PL/pgSQL faster, I think we'll have to do something similar to how we handle row variables, or something else entirely. But that's a much bigger patch, and I'm not sure it's worth the trouble either.
Looking at the perf profile and the functions involved, though, I see some low-hanging fruit: in array_set, the new array is allocated with palloc0, but we only need to zero out a few alignment bytes where the new value is copied. Replacing it with palloc() will save some cycles, per the attached patch. Nowhere near as much as your patch, but this is pretty uncontroversial.

- Heikki

CREATE OR REPLACE FUNCTION public.quicksort(l integer, r integer, a text[])
 RETURNS text[]
 LANGUAGE plpgsql
 IMMUTABLE STRICT
AS $function$
DECLARE
  akt text[] = a;
  i integer := l;
  j integer := r;
  x text = akt[(l+r) / 2];
  w integer;
BEGIN
  LOOP
    WHILE akt[i] < x LOOP i := i + 1; END LOOP;
    WHILE x < akt[j] LOOP j := j - 1; END LOOP;
    IF i <= j THEN
      w := akt[i];
      akt[i] := akt[j];
      akt[j] := w;
      i := i + 1;
      j := j - 1;
    END IF;
    EXIT WHEN i > j;
  END LOOP;
  IF l < j THEN akt := quicksort(l,j,akt); END IF;
  IF i < r THEN akt := quicksort(i,r,akt); END IF;
  RETURN akt;
END;
$function$

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 438c3d0..ced41f0 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -2235,7 +2235,7 @@ array_set(ArrayType *array,
 	/*
 	 * OK, create the new array and fill in header/dimensions
 	 */
-	newarray = (ArrayType *) palloc0(newsize);
+	newarray = (ArrayType *) palloc(newsize);
 	SET_VARSIZE(newarray, newsize);
 	newarray->ndim = ndim;
 	newarray->dataoffset = newhasnulls ?
overheadlen : 0;
@@ -2250,8 +2250,12 @@ array_set(ArrayType *array,
 			(char *) array + oldoverheadlen,
 			lenbefore);
 	if (!isNull)
+	{
+		/* zero out any padding in the slot reserved for the new item */
+		memset((char *) newarray + overheadlen + lenbefore, 0, newitemlen);
 		ArrayCastAndSet(dataValue, elmlen, elmbyval, elmalign,
 						(char *) newarray + overheadlen + lenbefore);
+	}
 	memcpy((char *) newarray + overheadlen + lenbefore + newitemlen,
 		   (char *) array + oldoverheadlen + lenbefore + olditemlen,
 		   lenafter);

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cube extension split algorithm fix
Hi!

On Wed, Sep 25, 2013 at 3:14 PM, Stas Kelvich stas.kelv...@gmail.com wrote:
> I've fixed the split algorithm that was implemented in the cube extension. I've changed it according to the original Guttman paper (the old version was a simpler algorithm) and also ported Alexander Korotkov's algorithm from box datatype indexing, which works faster and better on low dimensions. On my test dataset (1M records, 7 dimensions, real-world database of goods) the numbers were as follows:
>
> Building index over table (on expression):
> old: 67.296058 seconds
> new: 48.842391 seconds
>
> Cube point search, mean, 100 queries:
> old: 0.001025 seconds
> new: 0.000427 seconds
>
> Index on field size:
> old: 562 MB
> new: 283 MB

Thanks for the patch! Here is my review.

1) After points for cubes were committed, this patch doesn't apply anymore to head.

patching file contrib/cube/cube.c
Hunk #1 FAILED at 131.
Hunk #2 succeeded at 482 (offset 17 lines).
Hunk #3 succeeded at 494 (offset 17 lines).
Hunk #4 succeeded at 501 (offset 17 lines).
Hunk #5 succeeded at 611 (offset 17 lines).
Hunk #6 FAILED at 681.
Hunk #7 succeeded at 790 with fuzz 1 (offset 30 lines).
Hunk #8 succeeded at 808 (offset 29 lines).
Hunk #9 succeeded at 888 (offset 29 lines).
Hunk #10 succeeded at 1090 (offset 29 lines).
Hunk #11 succeeded at 1160 (offset 29 lines).
Hunk #12 succeeded at 1272 (offset 27 lines).
Hunk #13 succeeded at 1560 with fuzz 1 (offset 95 lines).
2 out of 13 hunks FAILED -- saving rejects to file contrib/cube/cube.c.rej
(Stripping trailing CRs from patch.)
patching file contrib/cube/cubedata.h
Hunk #1 succeeded at 47 (offset 35 lines).

2) The split algorithm is chosen by the number of dimensions in the first cube. This is generally a weak idea :) At least, cubes in the same index could have different numbers of dimensions, so this rule would give different answers for different orders of cubes. Also, as a corner case, (n+1)-dimensional cubes with a confluent (n+1)th dimension are essentially the same as n-dimensional cubes. Choosing the split algorithm shouldn't depend on this. We should try to extrapolate this principle a little further and detect dimensions that are useless for the split, in order to make a better choice. Also, a part as critical as choosing the split algorithm needs to be documented (with references to the papers if needed).

3) We need more comprehensive testing. One dataset is not enough; we need much more to justify the rule for choosing the split algorithm. The description of the 'Cube point search' query set is not detailed. Can you share the dataset you used? We need tests that anyone can reproduce.

--
With best regards, Alexander Korotkov.
Re: [HACKERS] ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Heikki Linnakangas hei...@localhost.vmware.com writes:
> In general, I'm not convinced this patch is worth the trouble. The speedup isn't all that great; manipulating large arrays in PL/pgSQL is still so slow that if you care about performance you'll want to rewrite your function in some other language or use temporary tables. And you only get a gain with arrays of fixed-length element type with no NULLs. So I think we should drop this patch in its current form. If we want to make array manipulation in PL/pgSQL faster, I think we'll have to do something similar to how we handle row variables, or something else entirely.

I think that this area would be a fruitful place to make use of the noncontiguous datatype storage ideas that we were discussing with the PostGIS guys recently. I agree that tackling it in the context of plpgsql alone is not a good way to go at it.

I'm not saying this in a vacuum of information, either. Some of the guys at Salesforce have been poking at noncontiguous storage for arrays and have gotten nice speedups --- but their patch is for plpgsql only and only addresses arrays, which makes it enough of a kluge that I've not wanted to bring it to the community. I think we should work towards a general solution instead.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why semicolon after begin is not allowed in postgresql?
On 26/11/13 09:42, AK wrote:
> Kevin, I do see your logic now, but this thing is a common mistake - it means that this seems counter-intuitive to some people. What would happen if we applied Occam's razor and just removed this rule? All existing code would continue to work as is, and we would have one less rule to memorize. That would make PostgreSql a slightly better product, right?

Perhaps not a good use of Mr Occam's razor. Postgres supports many procedural languages (e.g. plperl, plpython) and all these have different grammar rules from SQL - and from each other. We can't (and shouldn't) try altering them to be similar to SQL - it would defeat the purpose of providing a procedural environment where the given language works as advertised. So in the case of plpgsql - it needs to follow the Ada grammar, otherwise it would be useless. The fact that different languages may have similar or the same keywords with different grammar and punctuation rules is just a fact of life (I trip over that often when switching from perl to python in the same day)!

Regards

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MultiXact bugs
Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Andres Freund wrote:
>> That's because HeapTupleHeaderGetUpdateXid() ignores aborted updaters and returns InvalidTransactionId in that case, but HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DELETE_IN_PROGRESS...
> I checked for other cases where the update Xid is checked after HeapTupleSatisfiesVacuum returns HEAPTUPLE_DELETE_IN_PROGRESS. As far as I can tell, the only one that would be affected is the one in predicate.c. It is far from clear to me what is the right thing to do in these cases; the simplest idea is to return without reporting a failure if the updater aborted, just as above; but I wonder if this needs to be conditional on "visible". I added a pg_usleep() before acquiring the update Xid in the relevant case, but the isolation test cases didn't hit the problem (I presume there is no update/delete in these test cases, but I didn't check). I defer to Kevin on this issue.

Right now if HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DELETE_IN_PROGRESS we call HeapTupleHeaderGetUpdateXid(tuple->t_data) and Assert() that the result is valid. It sounds like we should do something like the attached, maybe?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index a8a0e98..1f2bf91 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -3907,6 +3907,8 @@ CheckForSerializableConflictOut(bool visible, Relation relation,
 			break;
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
 			xid = HeapTupleHeaderGetUpdateXid(tuple->t_data);
+			if (!TransactionIdIsValid(xid))
+				return;
 			break;
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
 			xid = HeapTupleHeaderGetXmin(tuple->t_data);

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] pg_upgrade ?deficiency
Bruce Momjian br...@momjian.us wrote: Kevin Grittner wrote: Bruce Momjian br...@momjian.us wrote: I am not a fan of backpatching any of this. Here's my problem with that. Here's setup to create what I don't think is all that weird a setup: The cluster is created in the state that was dumped, default read only flags and all. Are you saying that you find current behavior acceptable in back branches? First, I don't need to see a 300-line pg_dump restore output to know it is a bug. I was trying to save people the trouble of working up their own test to see what happened when running pg_dumpall output from a successful run into a freshly initdb'd cluster. No offense intended. Second, what you didn't do is to address the rest of my paragraph: I am not a fan of backpatching any of this. We have learned the fix is more complex than thought, I didn't realize that anyone thought the pg_dumpall fix would amount to something simpler than two printf statements; but even so, that seems pretty reasonable to me. and the risk of breakage and having pg_dump diffs change between minor releases doesn't seem justified. You might have missed the part of the thread where we seemed to have reached a consensus that pg_dump output would not change at all; only pg_dumpall output -- to something capable of being restored to a fresh cluster. We have to balance the _one_ user failure report we have received vs. potential breakage. What breakage mode do you see? Now, others seem to be fine with a backpatch, so perhaps it is safe. I am merely pointing out that, with all backpatching, we have to balance the fix against possible breakage and behavioral change. Sure, but I think a bug in a dump utility which allows it to run with apparent success while generating results which don't restore is pretty clearly something to fix. 
FWIW I don't feel as strongly about the pg_upgrade aspect of this -- as long as a test run reports that the cluster can't be upgraded without changing those settings, there is no problem. Of course it would be nice if it could be fixed instead, but that's not as critical in my view. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
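[Editorial note: the trimmed restore output above can be summarized by the following sketch of the failure scenario under discussion. The database name is invented; the essential point, per the thread, is the per-database default_transaction_read_only setting:]

```sql
-- Mark an existing database as read-only by default:
ALTER DATABASE mydb SET default_transaction_read_only = on;

-- A subsequent pg_dumpall run succeeds, but its output begins each
-- per-database section with \connect mydb; once the restored setting
-- takes effect, every CREATE/COPY in that database fails with:
--   ERROR:  cannot execute CREATE TABLE in a read-only transaction

-- The fix being discussed is for pg_dumpall to emit, after connecting:
SET default_transaction_read_only = off;
```

This is why the restore of otherwise-successful pg_dumpall output fails against a freshly initdb'd cluster.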
Re: [HACKERS] PL/Python: domain over array support
On 24.11.2013 18:44, Marko Kreen wrote: On Sat, Nov 23, 2013 at 11:09:53AM -0200, Rodolfo Campero wrote: 2013/11/22 Marko Kreen mark...@gmail.com One more thing - please update Python 3 regtests too. The attached patch (version 3) includes the expected results for Python 3 (file plpython_types_3.out). Thanks. Looks good now. Looks good to me too. This does change the behavior of any existing functions that return a domain over array. For example: postgres=# create domain intarr as integer[]; CREATE DOMAIN postgres=# create function intarr_test() returns intarr as $$ return '{1,2}' $$ language plpythonu; CREATE FUNCTION Before patch: postgres=# select intarr_test(); intarr_test - {1,2} (1 row) After patch: postgres=# select intarr_test(); ERROR: invalid input syntax for integer: { CONTEXT: while creating return value PL/Python function intarr_test The new behavior is clearly better, but it is an incompatibility nonetheless. I don't do anything with PL/python myself, so I don't have a good feel of how much that'll break people's applications. Probably not much I guess. But warrants a mention in the release notes at least. Any thoughts on that? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add \i option to bring in the specified file as a quoted literal
On Fri, 2013-11-22 at 09:54 -0300, Alvaro Herrera wrote:
> Amit Kapila escribió:
>> On Fri, Nov 22, 2013 at 1:33 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
>>> \ib homer ~/photos/homer.jpg
>>> insert into people (name, photo) values ('Homer', :homer);
>> Isn't something similar already supported as mentioned in docs:
>> "One example use of this mechanism is to copy the contents of a file into a table column. First load the file into a variable and then interpolate the variable's value as a quoted string:
>> testdb=> \set content `cat my_file.txt`
>> testdb=> INSERT INTO my_table VALUES (:'content');"
>> or do you prefer an alternative without any kind of quote using \ib?
> If the only use case of the feature proposed in this thread is to load stuff from files to use as column values, then we're pretty much done, and this patch is not needed -- except, maybe, that the `` is unlikely to work on Windows, as already mentioned elsewhere. But if the OP had something else in mind, let's hear what it is.

I've tested the \set command mentioned above, and I think it is enough. At this moment I see no point in implementing a new command.

Best Regards
Piotr Marcinczyk

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Put json type into alphabetical order in manual table
On Mon, Nov 25, 2013 at 3:07 PM, Andreas Karlsson andr...@proxel.se wrote: Hi, When looking at table 8-1 at http://www.postgresql.org/docs/9.3/static/datatype.html i noticed that all types except for json was in alphabetical order. I have attached a patch which fixes this. By the way should character and character varying be swapped in that table too? I would. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why semicolon after begin is not allowed in postgresql?
Mark Kirkwood-2 wrote Postgres supports many procedural languages (e.g plperl, plpython) and all these have different grammar rules from SQL - and from each other. We can't (and shouldn't) try altering them to be similar to SQL - it would defeat the purpose of providing a procedural environment where the given language works as advertised. So in the case of plpgsql - it needs to follow the Ada grammar, otherwise it would be useless. I do not follow the useless conclusion - what, present day, does Ada got to do with it? And the request is to alter only plpgsql, not all the other languages. To the casual end-user plpgsql is an internal language under our full control and installed by default in all new releases. Is it really unreasonable to expect us to design in some level of coordination between it and SQL? Cross-compatibility is a valid reason though I'm guessing with all the inherent differences between our standard PL and other database's PLs that making this change would not be a materially noticeable additional incompatibility. I'll even accept language consistency and not worth the effort of special-casing but mostly because the error is immediate and obvious, and the solution is simple and readily learned. A side observation: why does DECLARE not require a block-end keyword but instead BEGIN acts as effectively both start and end? BEGIN, IF, FOR, etc... all come in pairs but DECLARE does not. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/why-semicolon-after-begin-is-not-allowed-in-postgresql-tp5779905p5780245.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why semicolon after begin is not allowed in postgresql?
On 11/25/2013 06:13 PM, David Johnston wrote:
> A side observation: why does DECLARE not require a block-end keyword but instead BEGIN acts as effectively both start and end? BEGIN, IF, FOR, etc... all come in pairs but DECLARE does not.

A complete block is:

    [ DECLARE
        declarations ]
    BEGIN
        statements
    [ EXCEPTION
        handlers ]
    END

The DECLARE and EXCEPTION parts are optional, as indicated. Does that make it clearer?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
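[Editorial note: a minimal concrete instance of the block skeleton Andrew describes, with all three optional/required parts present. This sketch is not from the original mails:]

```sql
DO $$
DECLARE
    n integer := 0;                          -- declarations section
BEGIN
    n := 10 / 0;                             -- statements section: raises division_by_zero
EXCEPTION
    WHEN division_by_zero THEN
        RAISE NOTICE 'caught: %', SQLERRM;   -- handlers section
END;
$$;
```

Note that only the final END of the block is followed by a semicolon; DECLARE and BEGIN take none, which is the grammar point this thread keeps circling back to.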
Re: [HACKERS] why semicolon after begin is not allowed in postgresql?
Andrew Dunstan wrote On 11/25/2013 06:13 PM, David Johnston wrote: A side observation: why does DECLARE not require a block-end keyword but instead BEGIN acts as effectively both start and end? BEGIN, IF, FOR, etc... all come in pairs but DECLARE does not. A complete block is: [ DECLARE declarations ] BEGIN statements [ EXCEPTIONS handlers ] END The declare and exceptions parts are optional, as indicated. Does that make it clearer? Doh! IF / THEN / ELSE / ENDIF (concept, not syntax) That also does help to reinforce the point being made here... David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/why-semicolon-after-begin-is-not-allowed-in-postgresql-tp5779905p5780250.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extension Templates S03E11
On 24.11.2013 00:19, Dimitri Fontaine wrote: Jeff Davis pg...@j-davis.com writes: In the CF app, this is marked Ready for Committer. That's a bit vague here, considering Dimitri, you, Peter, and Alvaro are all committers. Who is this patch waiting on? Is the discussion concluding, or does it need another round of review? Thanks for the confusion I guess, but I'm no committer here ;-) This patch has received extensive review in July and I think it now properly implements the design proposed by Tom and Heikki in 9.3/CF4. Ok, since my name has been mentioned, I'll bite.. I still don't like this. What I suggested back in December was to have a simple mechanism to upload an extension zip file to the server via libpq (http://www.postgresql.org/message-id/50bf80a6.20...@vmware.com). The idea developed from that into the concept of extension templates, but the original idea was lost somewhere along the way. Back in December, when I agreed that upload zip file via libpq might be useful, Tom suggested that we call control+sql file a template, and the installed entity an extension. So far so good. Now comes the patch, and the extension template no longer means a control+sql file. It means an entity that's installed in the database that contains the same information as a control+sql file, but in a new format. In fact, what *do* you call the control+sql file that lies on the filesystem? Not a template, apparently. I want to be able to download extension.zip from pgxn.org, and then install it on a server. I want to be able to install it the traditional way, by unzipping it to the filesystem, or via libpq by using this new feature. I do *not* want to rewrite the extension using a new CREATE TEMPLATE FOR EXTENSION syntax to do the latter. I want to be able to install the *same* zip file using either method. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why semicolon after begin is not allowed in postgresql?
On 26/11/13 12:13, David Johnston wrote:
> Mark Kirkwood-2 wrote
>> Postgres supports many procedural languages (e.g. plperl, plpython) and all
>> So in the case of plpgsql - it needs to follow the Ada grammar, otherwise it would be useless.
> I do not follow the useless conclusion - what, present day, does Ada have to do with it? And the request is to alter only plpgsql, not all the other languages. To the casual end-user plpgsql is an internal language under our full control and installed by default in all new releases. Is it really unreasonable to expect us to design in some level of coordination between it and SQL? Cross-compatibility is a valid reason, though I'm guessing that with all the inherent differences between our standard PL and other databases' PLs, making this change would not be a materially noticeable additional incompatibility.

I guess I was thinking "useless as an example of a PL/SQL- or Ada-compatible language", which I probably should have stated fully - sorry. While we do add extra features to plpgsql, we don't usually add deliberately PL/SQL- or Ada-incompatible ones. Where we do, we sometimes might wish we had not (ISTR a discussion about PERFORM). Other posters have pointed out that adding the semicolon to BEGIN confuses its main reason for existence - indicating the start of a code block - and would also confuse the casual reader about whether a code block or a transaction was starting. All in all, a materially noticeable incompatibility!

regards

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Put json type into alphabetical order in manual table
On 11/25/2013 11:52 PM, Merlin Moncure wrote:
> On Mon, Nov 25, 2013 at 3:07 PM, Andreas Karlsson andr...@proxel.se wrote:
>> Hi, When looking at table 8-1 at http://www.postgresql.org/docs/9.3/static/datatype.html I noticed that all types except for json were in alphabetical order. I have attached a patch which fixes this. By the way, should character and character varying be swapped in that table too?
> I would.

Ok, I have attached a patch which fixes that too.

--
Andreas Karlsson

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
new file mode 100644
index dea5195..0386330
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
***************
*** 83,100 ****
       </row>
  
       <row>
-       <entry><type>character varying [ (<replaceable>n</replaceable>) ]</type></entry>
-       <entry><type>varchar [ (<replaceable>n</replaceable>) ]</type></entry>
-       <entry>variable-length character string</entry>
-      </row>
- 
-      <row>
        <entry><type>character [ (<replaceable>n</replaceable>) ]</type></entry>
        <entry><type>char [ (<replaceable>n</replaceable>) ]</type></entry>
        <entry>fixed-length character string</entry>
       </row>
  
       <row>
        <entry><type>cidr</type></entry>
        <entry></entry>
        <entry>IPv4 or IPv6 network address</entry>
--- 83,100 ----
       </row>
  
       <row>
        <entry><type>character [ (<replaceable>n</replaceable>) ]</type></entry>
        <entry><type>char [ (<replaceable>n</replaceable>) ]</type></entry>
        <entry>fixed-length character string</entry>
       </row>
  
       <row>
+       <entry><type>character varying [ (<replaceable>n</replaceable>) ]</type></entry>
+       <entry><type>varchar [ (<replaceable>n</replaceable>) ]</type></entry>
+       <entry>variable-length character string</entry>
+      </row>
+ 
+      <row>
        <entry><type>cidr</type></entry>
        <entry></entry>
        <entry>IPv4 or IPv6 network address</entry>
***************
*** 137,142 ****
--- 137,148 ----
       </row>
  
       <row>
+       <entry><type>json</type></entry>
+       <entry></entry>
+       <entry>JSON data</entry>
+      </row>
+ 
+      <row>
        <entry><type>line</type></entry>
        <entry></entry>
        <entry>infinite line on a plane</entry>
***************
*** 269,280 ****
        <entry></entry>
        <entry>XML data</entry>
       </row>
- 
-      <row>
-       <entry><type>json</type></entry>
-       <entry></entry>
-       <entry>JSON data</entry>
-      </row>
   </tbody>
  </tgroup>
 </table>
--- 275,280 ----

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Sat, Nov 23, 2013 at 11:52 PM, Peter Geoghegan p...@heroku.com wrote: I have some concerns about what you've done that may limit my immediate ability to judge performance, and the relative merits of both approaches generally. Now, I know you just wanted to sketch something out, and that's fine, but I'm only sharing my thoughts. I am particularly worried about the worst case (for either approach), particularly with more than 1 unique index. I am also worried about livelock hazards (again, in particular with more than 1 index) - I am not asserting that they exist in your patch, but they are definitely more difficult to reason about. Value locking works because once a page lock is acquired, all unique indexes are inserted into. Could you have two upserters livelock each other with two unique indexes with 1:1 correlated values in practice (i.e. 2 unique indexes that might almost work as 1 composite index)? That is a reasonable usage of upsert, I think. So I had it backwards: In fact, it isn't possible to get your patch to deadlock when it should - it livelocks instead (where with my patch, as far as I can tell, we predictably and correctly have detected deadlocks). I see an infinite succession of insertion conflicted after pre-check DEBUG1 elog messages, and no progress, which is an obvious indication of livelock. My test does involve 2 unique indexes - that's generally the hard case to get right. Dozens of backends are tied-up in livelock. Test case for this is attached. My patch is considerably slowed down by the way this test-case tangles everything up, but does get through each pgbench run/loop in the bash script predictably enough. And when I kill the test-case, a bunch of backends are not left around, stuck in perpetual livelock (with my patch it takes only a few seconds for the deadlock detector to get around to killing every backend). 
I'm also seeing this:

Client 45 aborted in state 2: ERROR: attempted to lock invisible tuple
Client 55 aborted in state 2: ERROR: attempted to lock invisible tuple
Client 41 aborted in state 2: ERROR: attempted to lock invisible tuple

To me this seems like a problem with the (potential) total lack of locking that your approach takes (inserting btree unique index tuples as in your patch is a form of value locking...sort of...it's a little hard to reason about as presented). Do you think this might be an inherent problem, or can you suggest a way to make your approach still work?

So I probably should have previously listed as a requirement for our design:

* Doesn't just work with one unique index. Naming a unique index directly in DML, or assuming that the PK is intended seems quite weak to me.

This is something I discussed plenty with Robert, and I guess I just forgot to repeat myself when asked.

Thanks
--
Peter Geoghegan

testcase.sh
Description: Bourne shell script

\set extent 10 * :scale
\setrandom rec 1 :extent
with rej as (
    insert into foo(a, b, c)
    values (:rec, :rec * random(), 'fd')
    on duplicate key lock for update
    returning rejects *
)
update foo set c = rej.c from rej where foo.a = rej.a;
Re: [HACKERS] [GENERAL] pg_upgrade ?deficiency
On Mon, Nov 25, 2013 at 02:24:25PM -0800, Kevin Grittner wrote:

Bruce Momjian br...@momjian.us wrote: Kevin Grittner wrote: Bruce Momjian br...@momjian.us wrote: I am not a fan of backpatching any of this.

Here's my problem with that. Here's setup to create what I don't think is all that weird a setup: The cluster is created in the state that was dumped, default read only flags and all. Are you saying that you find the current behavior acceptable in back branches?

First, I don't need to see a 300-line pg_dump restore output to know it is a bug.

I was trying to save people the trouble of working up their own test to see what happened when running pg_dumpall output from a successful run into a freshly initdb'd cluster. No offense intended.

Fine, but were 300 lines of output necessary?

Second, what you didn't do is to address the rest of my paragraph: I am not a fan of backpatching any of this. We have learned the fix is more complex than thought,

I didn't realize that anyone thought the pg_dumpall fix would amount to something simpler than two printf statements; but even so, that seems pretty reasonable to me.

Well, if I say "more complex than thought", it means not just the patch, but the impact it might have. We are pushing out a replication fix in a few weeks because of backpatches that people thought were safe. For example, if we backpatch, we get zero user testing. If we did it wrong somehow, it is much harder to fix later. For example, what is the logic that a per-database setting of statement_timeout is reset in pg_dump, but default_transaction_read_only is reset in pg_dumpall? Can we know we have thought this all through with zero user testing?

and the risk of breakage and having pg_dump diffs change between minor releases doesn't seem justified.

You might have missed the part of the thread where we seemed to have reached a consensus that pg_dump output would not change at all; only pg_dumpall output -- to something capable of being restored to a fresh cluster.
Yes, I saw that. My point was that we have had to move around the patch to fix the problem, meaning the fix was not obvious. The number of lines is only part of a measure of a patch's riskiness. We have to balance the _one_ user failure report we have received vs. potential breakage.

What breakage mode do you see?

Uh, I said _potential_ breakage. If I knew of the breakage, I would have said so. Now, others seem to be fine with a backpatch, so perhaps it is safe. I am merely pointing out that, with all backpatching, we have to balance the fix against possible breakage and behavioral change.

Sure, but I think a bug in a dump utility which allows it to run with apparent success while generating results which don't restore is pretty clearly something to fix.

pg_dumpall certainly reduces the diff risk.

FWIW I don't feel as strongly about the pg_upgrade aspect of this -- as long as a test run reports that the cluster can't be upgraded without changing those settings, there is no problem. Of course it would be nice if it could be fixed instead, but that's not as critical in my view.

How are we handling breakage of pg_dump, not pg_dumpall? doc patch? pg_upgrade probably needs a doc patch too, or a reset like pg_dumpall. pg_upgrade is more like pg_dumpall, so a code patch seems most logical, again, assuming we decide that pg_dumpall is the right place for this reset of default_transaction_read_only.

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
Re: [HACKERS] Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
On Fri, Nov 22, 2013 at 01:19:55PM -0500, Bruce Momjian wrote: On Fri, Nov 22, 2013 at 12:17:41PM -0500, Bruce Momjian wrote: Good points. I have modified the attached patch to do as you suggested.

Also, I have read through the thread and summarized the positions of the posters:

               9.3     WARNING               ERROR
SET            none    Tom, DavidJ, AndresF  Robert, Kevin
SAVEPOINT      error                         Tom, DavidJ, Robert, AndresF, Kevin
LOCK, DECLARE  error                         Tom, DavidJ, Robert, AndresF, Kevin

Everyone seems to agree that SAVEPOINT, LOCK, and DECLARE should remain as errors. Everyone also seems to agree that BEGIN and COMMIT should remain warnings, and ABORT should be changed from notice to warning. Our only disagreement seems to be how to handle the SET commands, which used to report nothing. Would anyone else like to correct or express an opinion? Given the current vote count and backward-compatibility, warning seems to be the direction we are heading.

Patch applied.

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
Re: [HACKERS] Extension Templates S03E11
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 24.11.2013 00:19, Dimitri Fontaine wrote: This patch has received extensive review in July and I think it now properly implements the design proposed by Tom and Heikki in 9.3/CF4.

Ok, since my name has been mentioned, I'll bite.. I still don't like this. What I suggested back in December was to have a simple mechanism to upload an extension zip file to the server via libpq (http://www.postgresql.org/message-id/50bf80a6.20...@vmware.com). The idea developed from that into the concept of extension templates, but the original idea was lost somewhere along the way.

I hate to admit it, but I kind of agree.. While reviewing the patch and thinking about it, the whole thing really did start to strike me as a little too 'meta'. We want a way to package and ship extensions which can then be loaded via the libpq protocol. I'm not sure that there's really any need to track all of this in catalog tables.

Also as part of the patch review, I went back and looked through some of the older threads around this and noticed the understandable concern, given the current patch, of non-superusers being able to create extension templates. I'm not sure that an approach which allows a zip file to be uploaded would be something we could allow non-superusers to do, but I really feel like we need a way for non-superusers to create extensions (when they don't have untrusted-language components).

Now, given these two trains of thought, I start to see that we may want to avoid non-superusers being able to create arbitrary files on disk, even in a controlled area. We've managed that for various other objects (tables, et al), I'm sure we could work out a solution to that issue...

Thanks,

Stephen
Re: [HACKERS] autovacuum_work_mem
On Sun, Nov 24, 2013 at 9:06 AM, Simon Riggs si...@2ndquadrant.com wrote: VACUUM uses 6 bytes per dead tuple. And autovacuum regularly removes dead tuples, limiting their numbers. In what circumstances will the memory usage from multiple concurrent VACUUMs become a problem? In those circumstances, reducing autovacuum_work_mem will cause more passes through indexes, dirtying more pages and elongating the problem workload.

Yes, of course, but if we presume that the memory for autovacuum workers to do everything in one pass simply isn't there, it's still better to do multiple passes. Similarly, it's also sometimes (roughly speaking) locally suboptimal but globally optimal to do tapesorts in preference to in-memory quicksorts, even though, as you know, very frequently tapesort is very considerably slower than quicksort.

Look at the folk wisdom for sizing maintenance_work_mem that is floating around (for example, take a look at Greg Smith's recommendations in his book). Setting it within postgresql.conf is assumed. You can end up with a conservative value because you're worrying about the worst case. The average case suffers. Especially since, as you say, autovacuum only uses 6 bytes per tuple, and so probably isn't all that likely to run out of working memory, making that worst case (that is, maintenance_work_mem over-allocation by autovacuum workers) very unlikely.

So on larger Heroku Postgres plans, the generic maintenance_work_mem is on the low side, and I sometimes have to manually increase it when I'm doing something like creating a new index. I would like to not have to do that, and I would like to not require users to be aware of this issue, especially since external sorting is so much slower. I am inclined to think that we need an altogether more sophisticated model, but this is an incremental improvement.

Can we re-state what the problem actually is here and discuss how to solve it?
(The reference [2] didn't provide a detailed explanation of the problem, only the reason why we want a separate parameter.)

It's principally a DBA feature, in that it allows the DBA to separately control the memory used by very routine vacuuming, while also having a less conservative default value for maintenance operations that typically are under direct human control. Yes, this is no better than just having maintenance_work_mem be equal to your would-be autovacuum_work_mem setting in the first place, and having everyone remember to set maintenance_work_mem dynamically. However, sometimes people are ill-informed (more ill-informed than the person that puts the setting in postgresql.conf), and other times they're forgetful, and other times still they're using a tool like pg_restore with no convenient way to dynamically set maintenance_work_mem.

So, to answer your question, yes: it is entirely possible that you or someone like you may have no use for this. It's often reasonable to assume that autovacuum workers are the only processes that can allocate memory bound in size by maintenance_work_mem that are not under the direct control of a human performing maintenance. Autovacuum workers are in a sense just servicing regular application queries (consider how Oracle handles ROLLBACK garbage collection), and things that service regular application queries are already bound separately by work_mem.

--
Peter Geoghegan
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Mon, Jul 22, 2013 at 2:31 PM, Greg Smith g...@2ndquadrant.com wrote: On the Mac, the only case that seems to have a slowdown now is "hundred tiny fields, half nulled". It would be nice to understand just what is going on with that one. I got some ugly results in "two short fields, no change" too, along with a couple of other weird results, but I think those were testing procedure issues that can be ignored. The pgbench throttle work I did recently highlights that I can't really make a Mac quiet/consistent for benchmarking very well. Note that I ran all of the Mac tests with assertions on, to try and catch platform-specific bugs. The Linux ones used the default build parameters.

Amit has been asking me to look at this patch for a while, so I finally did. While I agree that it would be nice to get the CPU overhead down low enough that we can turn this on by default and forget about it, I'm not convinced that it's without value even if we can't. Fundamentally, this patch trades away some CPU in exchange for decreased I/O. The testing thus far is all about whether the CPU overhead can be made trivial, which is a good question to ask, because if the answer is yes, then rather than trading something for something else, we just get something for free. Win! But even if that doesn't pan out, I think the fallback position should not be "OK, well, if we can't get decreased I/O for free then forget it" but rather "OK, if we can't get decreased I/O for free then let's get decreased I/O in exchange for increased CPU usage."

I spent a little time running the tests from Heikki's script under perf. On all three "two short fields" tests and also on the "ten tiny fields, all changed" test, we spend about 1% of the CPU time in pglz_delta_encode.
I don't see any evidence that it's actually compressing anything at all; it appears to be falling out where we test the input length against the strategy, presumably because the default strategy (which we largely copy here) doesn't try to compress input data of less than 32 bytes. Given that this code isn't actually compressing anything in these cases, I'm a bit confused by Greg's report of substantial gains on the "ten tiny fields, all changed" test; how can we win if we're not compressing?

I studied the "hundred tiny fields, half nulled" test case in some detail. Some thoughts:

- There is a comment "TODO: It would be nice to behave like the history and the source strings were concatenated, so that you could compress using the new data, too." If we're not already doing that, then how are we managing to compress WAL by more than one-quarter in the "hundred tiny fields, all changed" case? It looks to me like the patch IS doing that, and I'm not sure it's a good idea, especially because it's using pglz_hist_add_no_recycle rather than pglz_hist_add: we verify that hlen + slen < 2 * PGLZ_HISTORY_SIZE but that doesn't seem good enough. On the "hundred tiny fields, half nulled" test case, removing that line reduces compression somewhat but also saves on CPU cycles.

- pglz_find_match() is happy to walk all the way down even a really, really long bucket chain. It has some code that reduces good_match each time through, but it fails to produce a non-zero decrement once good_match * good_drop < 100. So if we're searching an enormously deep bucket many times in a row, and there are actually no matches, we'll go flying down the whole linked list every time. I tried mandating a minimum decrease of 1 and didn't observe that it made any difference in the run time of this test case, but it still seems odd. For the record, it's not new behavior with this patch; pglz_compress() has the same issue as things stand today.
I wonder if we ought to decrease the good match length by a constant rather than a percentage at each step.

- pglz_delta_encode() seems to spend about 50% of its CPU time loading up the history data. It would be nice to find a way to reduce that effort. I thought about maybe only making a history entry for, say, every fourth input position rather than every one, but a quick test seems to suggest that's a big fail, probably because it's too easy to skip over the position where we would have made the right match via some short match. But maybe there's some more sophisticated strategy here that would work better. For example, see: http://en.wikipedia.org/wiki/Rabin_fingerprint

The basic idea is that you use a rolling hash function to divide up the history data into chunks of a given average size. So we scan the history data, compute a rolling hash value at each point, and each time the bottom N bits are zero, we consider that to be the end of a chunk. We enter all the chunks into a hash table. The chunk size will vary, but on the average, given a reasonably well-behaved rolling hash function (the pglz one probably doesn't qualify) it'll happen every 2^N bytes, so perhaps for this purpose we'd choose N to be between 3 and 5. Then, we scan the input we want to compress
Re: [HACKERS] Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
On Mon, Nov 25, 2013 at 7:19 PM, Bruce Momjian br...@momjian.us wrote: On Fri, Nov 22, 2013 at 01:19:55PM -0500, Bruce Momjian wrote: On Fri, Nov 22, 2013 at 12:17:41PM -0500, Bruce Momjian wrote: Good points. I have modified the attached patch to do as you suggested.

Also, I have read through the thread and summarized the positions of the posters:

               9.3     WARNING               ERROR
SET            none    Tom, DavidJ, AndresF  Robert, Kevin
SAVEPOINT      error                         Tom, DavidJ, Robert, AndresF, Kevin
LOCK, DECLARE  error                         Tom, DavidJ, Robert, AndresF, Kevin

Everyone seems to agree that SAVEPOINT, LOCK, and DECLARE should remain as errors. Everyone also seems to agree that BEGIN and COMMIT should remain warnings, and ABORT should be changed from notice to warning. Our only disagreement seems to be how to handle the SET commands, which used to report nothing. Would anyone else like to correct or express an opinion? Given the current vote count and backward-compatibility, warning seems to be the direction we are heading.

Patch applied.

I must be missing something. The commit message for this patch says:

Also change ABORT outside of a transaction block from notice to warning.

But the documentation says:

- Issuing <command>ABORT</command> when not inside a transaction does
- no harm, but it will provoke a warning message.
+ Issuing <command>ABORT</command> outside of a transaction block has no effect.

Those things are not the same.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
On Mon, Nov 25, 2013 at 10:04:19PM -0500, Robert Haas wrote:

On Mon, Nov 25, 2013 at 7:19 PM, Bruce Momjian br...@momjian.us wrote: On Fri, Nov 22, 2013 at 01:19:55PM -0500, Bruce Momjian wrote: On Fri, Nov 22, 2013 at 12:17:41PM -0500, Bruce Momjian wrote: Good points. I have modified the attached patch to do as you suggested.

Also, I have read through the thread and summarized the positions of the posters:

               9.3     WARNING               ERROR
SET            none    Tom, DavidJ, AndresF  Robert, Kevin
SAVEPOINT      error                         Tom, DavidJ, Robert, AndresF, Kevin
LOCK, DECLARE  error                         Tom, DavidJ, Robert, AndresF, Kevin

Everyone seems to agree that SAVEPOINT, LOCK, and DECLARE should remain as errors. Everyone also seems to agree that BEGIN and COMMIT should remain warnings, and ABORT should be changed from notice to warning. Our only disagreement seems to be how to handle the SET commands, which used to report nothing. Would anyone else like to correct or express an opinion? Given the current vote count and backward-compatibility, warning seems to be the direction we are heading.

Patch applied.

I must be missing something. The commit message for this patch says:

Also change ABORT outside of a transaction block from notice to warning.

But the documentation says:

- Issuing <command>ABORT</command> when not inside a transaction does
- no harm, but it will provoke a warning message.
+ Issuing <command>ABORT</command> outside of a transaction block has no effect.

Those things are not the same.

Uh, I ended up mentioning "no effect" to highlight that it does nothing, rather than mention a warning. Would people prefer I say warning? Or should I say "issues a warning because it has no effect" or something? It is easy to change.

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
Re: [HACKERS] Get more from indices.
Kyotaro HORIGUCHI wrote: the attached pathkey_and_uniqueindx_v4_20131122.patch is changed as follows.

The patch is compiled successfully and passes all regression tests. Also, the patch works well for the tests shown in an earlier email from Horiguchi-san. But in this version I found an incorrect behavior.

postgres=# CREATE TABLE t (a int not null, b int not null, c int, d text);
CREATE TABLE
postgres=# CREATE UNIQUE INDEX i_t_ab ON t (a, b);
CREATE INDEX
postgres=# INSERT INTO t (SELECT a / 5, 4 - (a % 5), a, 't' FROM generate_series(00, 09) a);
INSERT 0 10
postgres=# ANALYZE t;
ANALYZE
postgres=# CREATE TABLE t2 (e text, f int);
CREATE TABLE
postgres=# INSERT INTO t2 VALUES ('t', 2);
INSERT 0 1
postgres=# INSERT INTO t2 VALUES ('t', 1);
INSERT 0 1
postgres=# ANALYZE t2;
ANALYZE
postgres=# EXPLAIN SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e ORDER BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Limit  (cost=0.29..9.30 rows=10 width=20)
   ->  Nested Loop  (cost=0.29..129.99 rows=144 width=20)
         Join Filter: (t.d = t2.e)
         ->  Index Scan using i_t_ab on t  (cost=0.29..126.80 rows=72 width=14)
               Index Cond: (a < 10)
         ->  Materialize  (cost=0.00..1.03 rows=2 width=6)
               ->  Seq Scan on t2  (cost=0.00..1.02 rows=2 width=6)
(7 rows)

postgres=# SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e ORDER BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
 a | b | c | d | e | f
---+---+---+---+---+---
 0 | 0 | 4 | t | t | 2
 0 | 0 | 4 | t | t | 1
 0 | 1 | 3 | t | t | 2
 0 | 1 | 3 | t | t | 1
 0 | 2 | 2 | t | t | 2
 0 | 2 | 2 | t | t | 1
 0 | 3 | 1 | t | t | 2
 0 | 3 | 1 | t | t | 1
 0 | 4 | 0 | t | t | 2
 0 | 4 | 0 | t | t | 1
(10 rows)

(Note the column f is sorted in the descending order.)

ISTM this was brought by the following change. In truncate_useless_pathkeys, if query_pathkeys is longer than the pathkeys made from index columns, the old patch picked up the latter as the IndexPath's pathkeys. But the former is more suitable according to the context here.
- truncate_useless_pathkeys returns root->query_pathkeys when the index is fully-ordered and query_pathkeys contains the pathkeys made from index columns.

I think it would be required to modify the patch so that the transformation of the pathkeys is performed only when each pathkey of query_pathkeys references the indexed relation. (The above change might have been made according to my comment in an earlier email I sent. But I have to admit my explanation there was not adequate. I'm sorry for that.)

Here are random comments.

* In grouping_planner(), the patch resets the pathkeys of the cheapest presorted index-path to query_pathkeys through get_cheapest_fractional_path_for_pathkeys(). Is this necessary? ISTM the transformation of the pathkeys after the scan/join optimization would be no longer necessary once it has been done at the index-path creation time, ie, build_index_paths(). Why does the patch do that thing?

* In get_relation_info(), the patch determines the branch condition keyattno != ObjectIdAttributeNumber. I fail to understand the meaning of this branch condition. Could you explain about it?

Thanks,

Best regards,
Etsuro Fujita
Re: [HACKERS] PL/Python: domain over array support
2013/11/25 Heikki Linnakangas hlinnakan...@vmware.com

[...] This does change the behavior of any existing functions that return a domain over array. For example:

postgres=# create domain intarr as integer[];
CREATE DOMAIN
postgres=# create function intarr_test() returns intarr as $$ return '{1,2}' $$ language plpythonu;
CREATE FUNCTION

Before patch:

postgres=# select intarr_test();
 intarr_test
-------------
 {1,2}
(1 row)

After patch:

postgres=# select intarr_test();
ERROR: invalid input syntax for integer: "{"
CONTEXT: while creating return value PL/Python function "intarr_test"

The new behavior is clearly better, but it is an incompatibility nonetheless. I don't do anything with PL/python myself, so I don't have a good feel of how much that'll break people's applications. Probably not much I guess. But it warrants a mention in the release notes at least. Any thoughts on that?

- Heikki

Bear in mind that the same goes for receiving domains over arrays as parameters; instead of seeing a string (the previous behavior), with this patch a function will see a list from the Python side (the function implementation). A mention in the release notes is in order, I agree with that.

I can't speak for other people, but I guess using domains over arrays as parameters and/or return values in plpythonu functions should be rare, considering the current behavior and especially given the possibility of using a regular array in order to handle array values as lists in Python.

Regards,
--
Rodolfo
Re: [HACKERS] UNION ALL on partitioned tables won't use indices.
Thank you. I'll soon come up with regress for the patch 3, 4.

See commit dd4134ea, which added that text. IIRC, the key point is that among real EC members, there will never be duplicates: a Var x.y for instance cannot appear in two different ECs, because we'd have merged the two ECs into one instead. However, this property does *not* hold for child EC members. The problem is that two completely unrelated UNION ALL child subselects might have, say, constant 1 in their tlists. This should not lead to concluding that we can merge ECs that mention their UNION ALL parent variables.

Thanks for the pointer; I found things clearer after reviewing the ECs from the test case queries from commits dd4134ea and 57664ed2. Is it correct that EC members are parted into two groups, one of which is 'full-fledged' and never duplicate among different ECs so as to be useful for forming pathkeys (parent), and another which is immature and might be duplicate, so it cannot form pathkeys? Thus the new-in-patch-1 third group, which is not duplicate (because they should always be Var x.y (for this case)) but immature (child), currently cannot be adequately classified? My PATCH-1, which makes them in the third group forcibly classified as 'parent' instead of 'child' as before, therefore makes a 'broken' EC member?

I've not looked at these patches yet, but if they're touching equivclass.c at all, I'd guess that it's wrong.

Only PATCH-1 touched equivclass.c. Surely.

My gut feeling after two minutes' thought is that the best fix is for expand_inherited_rtentry to notice when the parent table is already an appendrel member, and enlarge that appendrel instead of making a new one. (This depends on the assumption that pull_up_simple_union_all always happens first, but that seems OK.) I'm not sure if that concept exactly corresponds to either PATCH-3 or PATCH-4, but that's the way I'd think about it.

PATCH-4 does approximately that. I agree that's the right general direction.
Ok, I'll provide regress for the patches 3 and 4.

An alternative to extending our ability to pull up UNION ALL subqueries, having perhaps broader applicability, would be to push down the possibly-useful sort order to subqueries we can't pull up. We'd sometimes have two levels of MergeAppend, but that could still handily beat the explicit-sort plan. In any case, it is indeed a separate topic. For the sake of the archives, you can get such plans today by manually adding the ORDER BY to the relevant UNION ALL branch:

Yes, but this seems like it could be done in the path-generation phase or earlier (that's merely a smattering).

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] TODO: Split out pg_resetxlog output into pre- and post-sections
On Mon, Nov 25, 2013 at 9:17 AM, Rajeev rastogi rajeev.rast...@huawei.com wrote: On Sat, Nov 9, 2013, Amit Kapila wrote

Further review of this patch:

a. There are a lot of trailing whitespaces in the patch.

b. Why display 'First log segment after reset' in the 'Current pg_control values' section, as now the same information is available in the new section 'Values to be used after reset:'?

c. I think it is better to display 'First log segment after reset' as a file name as it was earlier. This utility takes a filename as input, so I think we should display a filename.

d. I still think it is not a good idea to use the new parameter changedParam to detect which parameters are changed, and the reason is the code already has that information. We should try to use that information rather than introducing a new parameter to mean the same.

e.
if (guessed && !force)
{
	PrintControlValues(guessed);
	if (!noupdate)
	{
		printf(_("\nIf these values seem acceptable, use -f to force reset.\n"));
		exit(1);
	}
}
Do you think there will be any chance when noupdate can be true in the above block? If not, then why keep the check for it?

f.
if (changedParam & DISPLAY_XID)
{
	printf(_("NextXID:  %u\n"), ControlFile.checkPointCopy.nextXid);
	printf(_("oldestXID:  %u\n"), ControlFile.checkPointCopy.oldestXid);
}
Here you are printing oldestXid, but not oldestXidDB; if the user provides an xid, both values are changed. Same is the case for multitransaction.

g.
/* newXlogSegNo will be always printed unconditionally*/
Extra space between "always" and "printed". In single-line comments a space should be provided when the comment text ends; please refer to other single-line comments.

In case the values are guessed and the -n option is not given, this patch will not be able to distinguish the values. Don't you think it is better to display values in 2 sections in the case of guessed values without the -n flag as well? Otherwise this utility will have 2 formats to display values.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] UNION ALL on partitioned tables won't use indices.
Hello, I'll soon come up with regress for the patch 3, 4.

New version patches with regression tests added are attached. I'll show you the pulling-up-in-expand_inherited_rtentry version later.

regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6670794..5d1cf83 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -404,6 +404,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
 	expand_inherited_tables(root);
 
 	/*
+	 * Collapse multilevel inheritances into fewer levels.
+	 *
+	 * UNION ALL containing subselects on inherited tables leaves multilevel
+	 * inheritance after calling expand_inherited_tables(). Fold them in
+	 * order to make MergeAppend plan available for such queries.
+	 */
+	collapse_appendrels(root);
+
+	/*
 	 * Set hasHavingQual to remember if HAVING clause is present. Needed
 	 * because preprocess_expression will reduce a constant-true condition to
 	 * an empty qual list ... but HAVING TRUE is not a semantic no-op.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..c7933b5 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,11 @@ typedef struct
 	AppendRelInfo *appinfo;
 } adjust_appendrel_attrs_context;
 
+typedef struct {
+	AppendRelInfo *child_appinfo;
+	Index		target_rti;
+} transvars_merge_context;
+
 static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,
 						double tuple_fraction,
 						List *colTypes, List *colCollations,
@@ -98,6 +103,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,
 					  List *input_plans,
 					  List *refnames_tlist);
 static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
+static Node *transvars_merge_mutator(Node *node,
+						transvars_merge_context *ctx);
 static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
 						Index rti);
 static void make_inh_translation_list(Relation oldrelation,
@@ -1211,6 +1218,131 @@ expand_inherited_tables(PlannerInfo *root)
 }
 
 /*
+ * collapse_appendrels
+ *		Pulling up the descendants by recursively combining successive
+ *		translations.
+ *
+ * Although the same thing could be done in adjust_appendrel_attrs(),
+ * doing it here all through at once is more efficient than individually
+ * (and maybe repetitively) done there.
+ */
+void
+collapse_appendrels(PlannerInfo *root)
+{
+	Index		last_parent_relid = 1;
+	AppendRelInfo *last_parent = NULL;
+	ListCell   *lcchild;
+	ListCell   *lcparent;
+	bool		full_search = false;
+
+	foreach(lcchild, root->append_rel_list)
+	{
+		AppendRelInfo *ch_appinfo = (AppendRelInfo *) lfirst(lcchild);
+		Index		ch_parent_relid = ch_appinfo->parent_relid;
+
+		if (last_parent_relid != ch_parent_relid)
+		{
+			/*
+			 * Find the parent for the current child appinfo if the parent
+			 * relid is different from that of the previous child.
+			 */
+			do
+			{
+				/*
+				 * In most cases, the relations are referred to as the parent
+				 * in their order of appearance in the rtable from
+				 * append_rel_list.  So start searching for the parent
+				 * appinfos from just after the last parent.  If the
+				 * incremental search fails, retry searching the entire list
+				 * and exit on failure.
+				 */
+				if (!last_parent)
+				{
+					/*
+					 * If this is the first time through, or the previous
+					 * search failed, set up for a full search.
+					 */
+					lcparent = list_head(root->append_rel_list);
+					last_parent_relid = 1;
+					full_search = true;
+				}
+
+				last_parent = NULL;
+				for_each_cell(lcparent, lcparent)
+				{
+					/*
+					 * Children's and parents' apprelinfos are bonded via
+					 * rtable relations.  So two apprelinfos are in a
+					 * parent-child relationship when the child_relid of the
+					 * parent is equal to the parent_relid of the child.
+					 */
+					AppendRelInfo *papp = (AppendRelInfo *) lfirst(lcparent);
+
+					if (papp->child_relid == ch_parent_relid)
+					{
+						last_parent = papp;
+						last_parent_relid = ch_parent_relid;
+
+						/* Search from the next cell next time. */
+						lcparent = lnext(lcparent);
+						full_search = false;
+						break;	/* Parent found */
+					}
+				}
+				/* Retry only when the incremental search failed. */
+			} while (!full_search && !last_parent);
+		}
+
+		/*
+		 * Replace the child's translated_vars so that it becomes a direct
+		 * child of the topmost relation.
+		 */
+		if (last_parent)
+		{
+			transvars_merge_context ctx;
+
+			ctx.child_appinfo = ch_appinfo;
+			ctx.target_rti = last_parent->child_relid;
+
+			ch_appinfo->translated_vars =
+				(List *) expression_tree_mutator(
+					(Node *) last_parent->translated_vars,
+					transvars_merge_mutator, &ctx);
+			ch_appinfo->parent_relid = last_parent->parent_relid;
+
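To illustrate the idea the patch implements with transvars_merge_mutator, here is a hedged, standalone sketch (not PostgreSQL code: plain integers stand in for the Var nodes in translated_vars, and compose_translations is a hypothetical name). Composing the topmost-to-child translation held by the parent AppendRelInfo with the child's own child-to-grandchild translation yields the direct topmost-to-grandchild mapping that collapse_appendrels installs:

```c
#include <assert.h>

/*
 * Sketch of translation composition.  parent_vars[i] gives, for attribute
 * i of the topmost relation, the matching attribute of the intermediate
 * child; child_vars[j] gives, for attribute j of that child, the matching
 * attribute of the grandchild.  The composition maps topmost attributes
 * directly to grandchild attributes, skipping the intermediate level.
 */
void
compose_translations(const int *parent_vars, int nparent,
                     const int *child_vars, int *out)
{
    for (int i = 0; i < nparent; i++)
        out[i] = child_vars[parent_vars[i]];    /* topmost -> grandchild */
}
```

Applied repeatedly from the top of append_rel_list downward, this is why every surviving AppendRelInfo ends up being a direct child of the topmost relation.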
Re: [HACKERS] Extension Templates S03E11
On Tue, 2013-11-26 at 01:37 +0200, Heikki Linnakangas wrote:

Back in December, when I agreed that uploading a zip file via libpq might be useful, Tom suggested that we call the control+sql files a template, and the installed entity an extension. Simply uploading safe extension files (i.e. not native libraries) using the protocol seems narrower in scope and less controversial. However, I like the idea of putting extensions in the catalogs (aside from DSOs, of course), too. Setting aside compatibility problems with existing extensions, do you think that's a reasonable goal to work toward, or do you think it's a dead end?

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan
On 25 November 2013 13:37, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:

Reconsidering that, I wish to know your opinion. The patch shows the number of exact/lossy pages that have been fetched in a bitmap heap scan. But the number varies with the fraction of tuples to be retrieved, like the following:

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on demo  (cost=2187.35..101419.96 rows=102919 width=42) (actual time=23.684..1302.382 rows=99803 loops=1)
   Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
   Rows Removed by Index Recheck: 6279502
   Heap Blocks: exact=1990 lossy=59593
   ->  Bitmap Index Scan on demo_col2_idx  (cost=0.00..2161.62 rows=102919 width=0) (actual time=23.330..23.330 rows=99803 loops=1)
         Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
 Total runtime: 1311.949 ms
(7 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02 LIMIT 5000;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=2187.35..7008.26 rows=5000 width=42) (actual time=23.543..86.093 rows=5000 loops=1)
   ->  Bitmap Heap Scan on demo  (cost=2187.35..101419.96 rows=102919 width=42) (actual time=23.542..85.196 rows=5000 loops=1)
         Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
         Rows Removed by Index Recheck: 312179
         Heap Blocks: exact=99 lossy=2963
         ->  Bitmap Index Scan on demo_col2_idx  (cost=0.00..2161.62 rows=102919 width=0) (actual time=23.189..23.189 rows=99803 loops=1)
               Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
 Total runtime: 86.626 ms
(8 rows)

So, my question is: should we show the number of exact/lossy pages in the TIDBitmap, rather than the number of those pages that have been fetched in the bitmap heap scan?

Yes, I agree that rather than looking at the bitmap heap scan to track the number of pages, we should look somewhere in the underlying index scan.
Yes, that way we would get a constant number of index pages regardless of how many parent-table rows are actually fetched. I can see that btgetbitmap() adds all the tuples into the bitmap, so somewhere below btgetbitmap() might be the right place to track this. Perhaps somewhere in tbm_create_pagetable(), but I am not sure.

Thanks,

Best regards,
Etsuro Fujita
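As a rough illustration of the work_mem estimate quoted earlier in the thread (work_mem large enough that every hashtable entry stays an exact page), here is a hedged standalone sketch. The byte figures are illustrative placeholders, not the real MAXALIGN(sizeof(HASHELEMENT)) and MAXALIGN(sizeof(PagetableEntry)) values, and estimate_work_mem_mb is a hypothetical helper, not anything in the PostgreSQL tree:

```c
#include <assert.h>

/*
 * Per-entry cost of the TIDBitmap hashtable, per the tbm_create()
 * equation discussed in this thread: hash element + page-table entry +
 * two bucket pointers.  16, 56, and 8 are illustrative stand-ins for the
 * real MAXALIGN'd struct sizes and sizeof(Pointer).
 */
#define ENTRY_BYTES (16 + 56 + 8 + 8)

/*
 * Estimate the work_mem setting (in MB, rounded up) needed so that the
 * given exact + lossy page counts could all be held as exact entries.
 */
long
estimate_work_mem_mb(long exact_pages, long lossy_pages)
{
    long bytes = (exact_pages + lossy_pages) * (long) ENTRY_BYTES;

    /* round up to whole megabytes */
    return (bytes + 1024 * 1024 - 1) / (1024 * 1024);
}
```

With the first plan above (Heap Blocks: exact=1990 lossy=59593) this gives about 6 MB under these placeholder sizes, which is exactly the kind of number one might prefer EXPLAIN to report directly, as suggested upthread for the hash-join case in show_hash_info().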