Re: [HACKERS] bgwriter reference to HOT standby
On 24 January 2013 07:25, Pavan Deolasee pavan.deola...@gmail.com wrote: On Thu, Jan 24, 2013 at 12:36 PM, Jeff Janes jeff.ja...@gmail.com wrote: The docs on bgworker twice refer to HOT standby. I don't think that in either case the hot needs emphasis, and if it does, making it look like an acronym (one already used for something else) is probably not the way to do it. I think it should be Hot Standby for consistency. +1 for changing it from HOT to hot/Hot anyway Patch applied. Thanks -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset
On 01/24/2013 01:06 AM, Alexander Law wrote: Hello, Please let me know if I can do something to get the bug fix (https://commitfest.postgresql.org/action/patch_view?id=902) committed. I would like to fix other bugs related to postgres localization, but I am not sure yet how to do it. For anyone looking for the history, the 1st post on this topic is here: http://www.postgresql.org/message-id/e1s3twb-0004oy...@wrigleys.postgresql.org -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 24 January 2013 01:17, Robert Haas robertmh...@gmail.com wrote: I agree. The thing that scares me about the logical replication stuff is not that it might be slow (and if your numbers are to be believed, it isn't), but that I suspect it's riddled with bugs and possibly some questionable design decisions. If we commit it and release it, then we're going to be stuck maintaining it for a very, very long time. If it turns out to have serious bugs that can't be fixed without a new major release, it's going to be a serious black eye for the project. Of course, I have no evidence that that will happen. This is a generic argument against applying any invasive patch. I agree 9.2 had major bugs on release, though that was because of the invasive nature of some of the changes, even in seemingly minor patches. The most invasive and therefore risky changes in this release are already committed - changes to the way WAL reading and timelines work. If we don't apply a single additional patch in this CF, we will still in my opinion have a major requirement for beta testing prior to release. The code executed here is isolated to users of the new feature and is therefore low risk to non-users. Of course there will be bugs. Everybody understands what new feature means and we as a project aren't exposed to risks from this. New feature also means groundbreaking new capabilities, so the balance of high reward, low risk means this gets my vote to apply. I'm just about to spend some days giving a final review on it to confirm/refute that opinion in technical detail. Code using these features is available and clearly marked as a full copyright transfer to PGDG, TPL licensed. That code is external not by the author's choice, but at the specific request of the project to make it that way. I personally will be looking to add code to core over time. 
It was useful for everybody that replication solutions started out of core, but replication is now a core requirement for databases and we must fully deliver on that thought. I agree with your concern re: checksums and foreign key locks. FK locks has had considerable review and support, so I expect that to be a manageable issue. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Setting visibility map in VACUUM's second phase
On Friday, December 7, 2012, Pavan Deolasee wrote: Revised patch attached. This time much less invasive. I added a new function heap_page_is_all_visible() to check if the given page is all-visible and also return the visibility cutoff xid. Hi Pavan, lazy_vacuum_page now pins and unpins the vmbuffer for each page it marks all-visible, which seems like a lot of needless traffic since the next vmbuffer is likely to be the same as the previous one. Could it instead get vmbuffer passed down to it from the two calling sites? (One call site would have to have it introduced, just for this purpose, the other should be able to use its existing one.) Cheers, Jeff
Re: [HACKERS] [sepgsql 1/3] add name qualified creation label
2013/1/24 Tom Lane t...@sss.pgh.pa.us: John R Pierce pie...@hogranch.com writes: On 1/23/2013 8:32 PM, Tom Lane wrote: FWIW, in Fedora-land I see: ... I'd be far more interested in what is in RHEL and CentOS. Fedora, with its 6 month obsolescence cycle, is of zero interest to me for deploying database servers. But of course Fedora is also the upstream that will become RHEL7 and beyond. EL6 has libselinux 2.0.94. EL5 has libselinux 1.33.4. sepgsql already requires libselinux 2.0.99, so it doesn't appear to me that moving that goalpost is going to change things one way or the other for the existing RHEL branches. I couldn't ship contrib/sepgsql today in those branches. It might be that the update timing makes a bigger difference in some other distros, though. To return to Heikki's original point about Debian, what are they shipping today?

Though I'm not familiar with Debian's release cycle, I tried to check the shipped versions of postgresql and libselinux for the stable, testing, unstable and experimental releases. I'm not certain why they haven't pushed postgresql-9.2 into the experimental release yet. However, it seems optimistic to me to expect libselinux-2.1.10 to be bundled on the timeline of postgresql-9.3. If someone is familiar with Debian's release cycle, I'd like to hear their suggestions.

* Debian (stable) ... postgresql-8.4 + libselinux-2.0.96
  http://packages.debian.org/en/squeeze/postgresql
  http://packages.debian.org/en/source/squeeze/libselinux
* Debian (testing) ... postgresql-9.1 + libselinux-2.1.9
  http://packages.debian.org/en/wheezy/postgresql
  http://packages.debian.org/en/source/wheezy/libselinux
* Debian (unstable) ... postgresql-9.1 + libselinux-2.1.9
  http://packages.debian.org/en/sid/postgresql
  http://packages.debian.org/en/source/sid/libselinux
* Debian (experimental) ... postgresql-9.1 + libselinux-2.1.12
  http://packages.debian.org/en/experimental/postgresql
  http://packages.debian.org/en/source/experimental/libselinux

Also, Ubuntu largely tracks Debian's releases.
* Ubuntu 11.10 ... postgresql-9.1 + libselinux-2.0.98
  https://launchpad.net/ubuntu/oneiric/+package/postgresql
  https://launchpad.net/ubuntu/oneiric/+source/libselinux
* Ubuntu 12.04 ... postgresql-9.1 + libselinux-2.1.0
  https://launchpad.net/ubuntu/precise/+package/postgresql
  https://launchpad.net/ubuntu/precise/+source/libselinux
* Ubuntu 12.10 ... postgresql-9.1 + libselinux-2.1.9
  https://launchpad.net/ubuntu/quantal/+package/postgresql
  https://launchpad.net/ubuntu/quantal/+source/libselinux

Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] CF3+4 (was Re: Parallel query execution)
On Wed, Jan 23, 2013 at 10:14 PM, Robert Haas robertmh...@gmail.com wrote: I think there's been something of a professionalization of PostgreSQL development over the last few years. More and more people are able to get paid to work on PostgreSQL as part or in a few cases all of their job. This trend includes me, but also a lot of other people. There are certainly good things about this, but I think it increases the pressure to get patches committed. Also, many of the new developers who were previously working at proprietary companies (includes me) may not have seen such long cycles for feature development, from design to acceptance/rejection. The problem is aggravated because the same developer was previously doing development at a much faster pace and will find our style very conservative. Of course, the quality of work might be a notch lower when you are coding features at that pace, and you may introduce regressions and bugs along the way, but that probably gets compensated by the more structured QA and testing that proprietary companies do. And many of these developers might have been working on equally important and mission-critical products even before. I know we are very conscious of our code quality and product stability, and for the right reasons, but I wonder whether overall productivity would go up if we did the same, i.e. had quicker feature-acceptance cycles compensated by more QA. The problem is that, being a community-driven project, we attract many developers but very few QA people. Would it help to significantly enhance our regression coverage so that bugs are caught early (assuming that's a problem)? If so, maybe we should have a month-long QAfest where every developer only writes test cases. Would it help to have a formal product manager (or a team) that gets to decide whether we want a particular feature or not? 
This might be similar to our current model of discussion on hackers, but more time bound and with the authority for the team to accept/reject ideas at the end of the time period, not keeping it vague for a later decision after people have already put in a lot of effort. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Setting visibility map in VACUUM's second phase
Hi Jeff, On Thu, Jan 24, 2013 at 2:41 PM, Jeff Janes jeff.ja...@gmail.com wrote: lazy_vacuum_page now pins and unpins the vmbuffer for each page it marks all-visible, which seems like a lot of needless traffic since the next vmbuffer is likely to be the same as the previous one. Good idea. Even though the cost of pinning/unpinning may not be high relative to the vacuum cost itself, it seems like a good idea because we already do that at other places. Do you have any other review comments on the patch? Otherwise I'll fix this and send an updated patch soon. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
[HACKERS] [PATCH 0/3] Work around icc miscompilation
I'm sending three smaller patches for review, which try to fix icc and pathscale (mis)compilation problems described in my last email. Not sure if we need to fix the overflow check x + 1 <= 0 (x > 0) at src/backend/executor/execQual.c:3088, which icc optimizes away, too.

  Fix x + y < x overflow checks
  Fix x + 1 < x overflow checks
  Fix overflow checking in repeat()

 src/backend/utils/adt/float.c         |  8 ++++----
 src/backend/utils/adt/oracle_compat.c | 18 +++++++-----------
 src/backend/utils/adt/varbit.c        |  7 +++++--
 src/backend/utils/adt/varlena.c       | 10 ++++++----
 src/pl/plpgsql/src/pl_exec.c          |  4 ++--
 5 files changed, 24 insertions(+), 23 deletions(-)

On 1/20/13 2:58 AM, Xi Wang wrote: Intel's icc and PathScale's pathcc compilers optimize away several overflow checks, since they consider signed integer overflow as undefined behavior. This leads to a vulnerable binary. Currently we use -fwrapv to disable such (mis)optimizations in gcc, but not in other compilers.

Examples
========

1) x + 1 <= 0 (assuming x > 0)

src/backend/executor/execQual.c:3088

Below is the simplified code.

----------------------------------------
void bar(void);
void foo(int this_ndims)
{
	if (this_ndims <= 0)
		return;
	int elem_ndims = this_ndims;
	int ndims = elem_ndims + 1;
	if (ndims <= 0)
		bar();
}
----------------------------------------

$ icc -S -o - sadd1.c
...
foo:
# parameter 1: %edi
..B1.1:
..___tag_value_foo.1:
..B1.2:
        ret

2) x + 1 < x

src/backend/utils/adt/float.c:2769
src/backend/utils/adt/float.c:2785
src/backend/utils/adt/oracle_compat.c:1045 (x + C < x)

Below is the simplified code.

----------------------------------------
void bar(void);
void foo(int count)
{
	int result = count + 1;
	if (result < count)
		bar();
}
----------------------------------------

$ icc -S -o - sadd2.c
...
foo:
# parameter 1: %edi
..B1.1:
..___tag_value_foo.1:
        ret

3) x + y <= x (assuming y > 0)

src/backend/utils/adt/varbit.c:1142
src/backend/utils/adt/varlena.c:1001
src/backend/utils/adt/varlena.c:2024
src/pl/plpgsql/src/pl_exec.c:1975
src/pl/plpgsql/src/pl_exec.c:1981

Below is the simplified code. 
----------------------------------------
void bar(void);
void foo(int sp, int sl)
{
	if (sp <= 0)
		return;
	int sp_pl_sl = sp + sl;
	if (sp_pl_sl <= sl)
		bar();
}
----------------------------------------

$ icc -S -o - sadd3.c
foo:
# parameter 1: %edi
# parameter 2: %esi
..B1.1:
..___tag_value_foo.1:
..B1.2:
        ret

Possible fixes
==============

* Recent versions of icc and pathcc support gcc's workaround option, -fno-strict-overflow, to disable some optimizations based on signed integer overflow. It's better to add this option to configure. They don't support gcc's -fwrapv yet.

* This -fno-strict-overflow option cannot help in all cases: it cannot prevent the latest icc from (mis)compiling the 1st case. We could also fix the source code by avoiding signed integer overflows, as follows.

  x + y <= 0 (assuming x > 0, y > 0)  -->  x > INT_MAX - y
  x + y <= x (assuming y > 0)         -->  x > INT_MAX - y

I'd suggest to fix the code rather than trying to work around the compilers since the fix seems simple and portable. See two recent compiler bugs of -fwrapv/-fno-strict-overflow as well.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55883
http://software.intel.com/en-us/forums/topic/358200

* I don't have access to IBM's xlc compiler. Not sure how it works for the above cases.

- xi
Re: [HACKERS] Event Triggers: adding information
Robert Haas robertmh...@gmail.com writes: Or maybe we should just silently ignore failures to look up the event trigger. That might be better, because the DBA could always do: DROP FUNCTION myeventtrgfn() CASCADE; ...and it would be undesirable for other sessions to error out in that case due to SnapshotNow effects. What about taking a lock on the functions we decide we will need to be running, maybe a ShareUpdateExclusiveLock, so that the function can not disappear under us from concurrent activity? Note to self, most probably using: LockRelationOid(fnoid, ShareUpdateExclusiveLock); After all, we might be right not to optimize for DDL concurrency… Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
[HACKERS] [PATCH 1/3] Fix x + y < x overflow checks
icc optimizes away the overflow check x + y < x (y > 0), because signed integer overflow is undefined behavior in C. Instead, use a safe precondition test x > INT_MAX - y.
---
 src/backend/utils/adt/varbit.c  |  7 +++++--
 src/backend/utils/adt/varlena.c | 10 ++++++----
 src/pl/plpgsql/src/pl_exec.c    |  4 ++--
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/adt/varbit.c b/src/backend/utils/adt/varbit.c
index 1712c12..af69200 100644
--- a/src/backend/utils/adt/varbit.c
+++ b/src/backend/utils/adt/varbit.c
@@ -16,6 +16,8 @@
 
 #include "postgres.h"
 
+#include <limits.h>
+
 #include "access/htup_details.h"
 #include "libpq/pqformat.h"
 #include "nodes/nodeFuncs.h"
@@ -1138,12 +1140,13 @@ bit_overlay(VarBit *t1, VarBit *t2, int sp, int sl)
 		ereport(ERROR,
 				(errcode(ERRCODE_SUBSTRING_ERROR),
 				 errmsg("negative substring length not allowed")));
-	sp_pl_sl = sp + sl;
-	if (sp_pl_sl <= sl)
+	if (sl > INT_MAX - sp)
 		ereport(ERROR,
 				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 				 errmsg("integer out of range")));
 
+	sp_pl_sl = sp + sl;
+
 	s1 = bitsubstring(t1, 1, sp - 1, false);
 	s2 = bitsubstring(t1, sp_pl_sl, -1, true);
 	result = bit_catenate(s1, t2);
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 95e41bf..c907f44 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -997,12 +997,13 @@ text_overlay(text *t1, text *t2, int sp, int sl)
 		ereport(ERROR,
 				(errcode(ERRCODE_SUBSTRING_ERROR),
 				 errmsg("negative substring length not allowed")));
-	sp_pl_sl = sp + sl;
-	if (sp_pl_sl <= sl)
+	if (sl > INT_MAX - sp)
 		ereport(ERROR,
 				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 				 errmsg("integer out of range")));
 
+	sp_pl_sl = sp + sl;
+
 	s1 = text_substring(PointerGetDatum(t1), 1, sp - 1, false);
 	s2 = text_substring(PointerGetDatum(t1), sp_pl_sl, -1, true);
 	result = text_catenate(s1, t2);
@@ -2020,12 +2021,13 @@ bytea_overlay(bytea *t1, bytea *t2, int sp, int sl)
 		ereport(ERROR,
 				(errcode(ERRCODE_SUBSTRING_ERROR),
 				 errmsg("negative substring length not allowed")));
-	sp_pl_sl = sp + sl;
-	if (sp_pl_sl <= sl)
+	if (sl > INT_MAX - sp)
 		ereport(ERROR,
 				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 				 errmsg("integer out of range")));
 
+	sp_pl_sl = sp + sl;
+
 	s1 = bytea_substring(PointerGetDatum(t1), 1, sp - 1, false);
 	s2 = bytea_substring(PointerGetDatum(t1), sp_pl_sl, -1, true);
 	result = bytea_catenate(s1, t2);
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 0ecf651..a9cf6df 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -1972,13 +1972,13 @@ exec_stmt_fori(PLpgSQL_execstate *estate, PLpgSQL_stmt_fori *stmt)
 		 */
 		if (stmt->reverse)
 		{
-			if ((int32) (loop_value - step_value) > loop_value)
+			if (loop_value < INT_MIN + step_value)
 				break;
 			loop_value -= step_value;
 		}
 		else
 		{
-			if ((int32) (loop_value + step_value) < loop_value)
+			if (loop_value > INT_MAX - step_value)
 				break;
 			loop_value += step_value;
 		}
-- 
1.7.10.4
[HACKERS] [PATCH 2/3] Fix x + 1 < x overflow checks
icc optimizes away x + 1 < x because signed integer overflow is undefined behavior in C. Instead, simply check if x is INT_MAX.
---
 src/backend/utils/adt/float.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index b73e0d5..344b092 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -2764,12 +2764,12 @@ width_bucket_float8(PG_FUNCTION_ARGS)
 			result = 0;
 		else if (operand >= bound2)
 		{
-			result = count + 1;
 			/* check for overflow */
-			if (result < count)
+			if (count == INT_MAX)
 				ereport(ERROR,
 						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 						 errmsg("integer out of range")));
+			result = count + 1;
 		}
 		else
 			result = ((float8) count * (operand - bound1) / (bound2 - bound1)) + 1;
@@ -2780,12 +2780,12 @@ width_bucket_float8(PG_FUNCTION_ARGS)
 			result = 0;
 		else if (operand <= bound2)
 		{
-			result = count + 1;
 			/* check for overflow */
-			if (result < count)
+			if (count == INT_MAX)
 				ereport(ERROR,
 						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 						 errmsg("integer out of range")));
+			result = count + 1;
 		}
 		else
 			result = ((float8) count * (bound1 - operand) / (bound1 - bound2)) + 1;
-- 
1.7.10.4
[HACKERS] [PATCH 3/3] Fix overflow checking in repeat()
icc optimizes away `check + VARHDRSZ <= check' since signed integer overflow is undefined behavior. Simplify the overflow check for `VARHDRSZ + count * slen' as `count > (INT_MAX - VARHDRSZ) / slen'.
---
 src/backend/utils/adt/oracle_compat.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/src/backend/utils/adt/oracle_compat.c b/src/backend/utils/adt/oracle_compat.c
index d448088..a85a672 100644
--- a/src/backend/utils/adt/oracle_compat.c
+++ b/src/backend/utils/adt/oracle_compat.c
@@ -15,6 +15,8 @@
  */
 #include "postgres.h"
 
+#include <limits.h>
+
 #include "utils/builtins.h"
 #include "utils/formatting.h"
 #include "mb/pg_wchar.h"
@@ -1034,20 +1036,14 @@ repeat(PG_FUNCTION_ARGS)
 		count = 0;
 
 	slen = VARSIZE_ANY_EXHDR(string);
-	tlen = VARHDRSZ + (count * slen);
 
 	/* Check for integer overflow */
-	if (slen != 0 && count != 0)
-	{
-		int			check = count * slen;
-		int			check2 = check + VARHDRSZ;
-
-		if ((check / slen) != count || check2 <= check)
-			ereport(ERROR,
-					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-					 errmsg("requested length too large")));
-	}
+	if (slen != 0 && count > (INT_MAX - VARHDRSZ) / slen)
+		ereport(ERROR,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("requested length too large")));
 
+	tlen = VARHDRSZ + (count * slen);
 	result = (text *) palloc(tlen);
 	SET_VARSIZE(result, tlen);
-- 
1.7.10.4
Re: [HACKERS] [PATCH 0/3] Work around icc miscompilation
On 24.01.2013 11:33, Xi Wang wrote: I'm sending three smaller patches for review, which try to fix icc and pathscale (mis)compilation problems described in my last email. These patches look ok at a quick glance, but how do we ensure this kind of problem doesn't crop back up again in the future? Does icc give a warning about these? Do we have a buildfarm animal that produces the warnings? If we fix these, can we stop using -fwrapv on gcc? Is there any way to get gcc to warn about these? - Heikki
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On Thursday, January 24, 2013 7:42 AM Noah Misch wrote: On Wed, Jan 09, 2013 at 11:22:06AM +, Simon Riggs wrote: On 24 December 2012 16:57, Amit Kapila amit.kap...@huawei.com wrote: Performance: Average of 3 runs of pgbench in tps

 9.3devel | with trailing null patch
 ---------+-------------------------
 578.9872 | 573.4980

On balance, it would seem optimizing for this special case would affect everybody negatively; not much, but enough. Which means we should reject this patch. Do you have a reason why a different view might be taken? We've mostly ignored performance changes of a few percentage points, because the global consequences of adding or removing code to the binary image can easily change performance that much. It would be great to get to the point where we can reason about 1% performance changes, but we're not there. If a measured +1% performance change would have yielded a different decision, we should rethink the decision based on more-robust criteria. Today, I took one more set of readings of the original pgbench, which are below in this mail. The difference is that this set of readings is on Suse 11 and with shared_buffers = 4G. IMO, the changes in this patch are not causing any regression; however, it gives a performance/size reduction in some of the cases, as shown by the data in previous mails. So the point to decide is whether the use cases in which it gives a benefit are worth having this patch. As Tom had already raised some concerns about the code, I think the patch can only have merit if the use case makes sense to people. 
System Configuration:
Hardware: 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz)
RAM: 24GB
Operating System: Suse-Linux 11.2 x86_64

Server Configuration:
The database cluster will be initialized with locales
  COLLATE:  C
  CTYPE:    C
  MESSAGES: en_US.UTF-8
  MONETARY: en_US.UTF-8
  NUMERIC:  en_US.UTF-8
  TIME:     en_US.UTF-8
shared_buffers = 4GB
checkpoint_segments = 255
checkpoint_timeout = 10min

pgbench:
transaction type: TPC-B (sort of)
scaling factor: 75
query mode: simple
number of clients: 4
number of threads: 4
duration: 300 s

        original  patch
Run-1:  311       312
Run-2:  307       302
Run-3:  300       312
Run-4:  285       295
Run-5:  308       305
Avg:    302.2     305.2

With Regards, Amit Kapila.
Re: [HACKERS] [PATCH 0/3] Work around icc miscompilation
On 1/24/13 5:02 AM, Heikki Linnakangas wrote: These patches look ok at a quick glance, but how do we ensure this kind of problems don't crop back again in the future? Does icc give a warning about these? Do we have a buildfarm animal that produces the warnings? If we fix these, can we stop using -fwrapv on gcc? Is there any way to get gcc to warn about these? Thanks for reviewing. gcc has this -Wstrict-overflow option to warn against overflow checks that may be optimized away. The result is inaccurate: it may produce a large number of false warnings, and it may also miss many cases (esp. when gcc's value-range-propagation fails to compute variables' ranges). Not sure if other compilers have similar options. I find these broken checks using a static checker I'm developing, and only report cases that existing compilers do miscompile. If you are interested, I'll post a complete list of overflow checks in pgsql that invoke undefined behavior and thus may be killed by future compilers. I believe we can get rid of -fwrapv once we fix all such checks. - xi
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 24.01.2013 00:30, Andres Freund wrote: Hi, I decided to reply on the patches thread to be able to find this later. On 2013-01-23 22:48:50 +0200, Heikki Linnakangas wrote: logical changeset generation v4 This is a boatload of infrastructure for supporting logical replication, yet we have no code actually implementing logical replication that would go with this. The premise of logical replication over trigger-based was that it'd be faster, yet we cannot assess that without a working implementation. I don't think this can be committed in this state. It's a fair point that this is a huge amount of code without a user in itself in-core. But the reason it got no user included is because several people explicitly didn't want a user in-core for now but said the first part of this would be to implement the changeset generation as a separate piece. Didn't you actually prefer not to have any users of this in-core yourself? Yes, I certainly did. But we still need to see the other piece of the puzzle to see how this fits with it. BTW, why does all the transaction reordering stuff have to be in core? How much of this infrastructure is to support replicating DDL changes? IOW, if we drop that requirement, how much code can we slash? Any other features or requirements that could be dropped? I think it's clear at this stage that this patch is not going to be committed as it is. If you can reduce it to a fraction of what it is now, that fraction might have a chance. Otherwise, it's just going to be pushed to the next commitfest as a whole, and we're going to be having the same doubts and discussions then. - Heikki
Re: [HACKERS] Event Triggers: adding information
Robert Haas robertmh...@gmail.com writes: On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: - Context exposing and filtering I'm not feeling very sanguine about any of this. I feel strongly that we should try to do something that's more principled than just throwing random junk into ProcessUtility. The goal is to allow advanced users of that feature to get at the sequence, constraint and index names that have been picked automatically by PostgreSQL when doing some CREATE TABLE statement that involves them:

  CREATE TABLE foo (id serial PRIMARY KEY, f1 text);

Andres and Steve, as you're the next ones to benefit directly from that whole feature and at different levels, do you have a strong opinion on whether you really need that to operate at the logical level? The trick is not implementing the feature (that's quite easy really), it's about being able to provide it without any API promises, because it's an opening on the internals. Maybe the answer is to provide the same level of information as you get in an Event Trigger to new hooks.

- Additional Information to PL/pgSQL

We're talking about:
 - operation (CREATE, ALTER, DROP)
 - object type (TABLE, FUNCTION, TEXT SEARCH DICTIONARY, …)
 - object OID (when there's a single object target)
 - object schema name (when there's a single object target)
 - object name (when there's a single object target)

I realise I omitted something important here. The current patch effort only adds that information if the target object exists in the catalogs at the time of the information lookup. In ddl_command_start, I believe you can only expose the information that is available in the parse tree. That is, if the user typed CREATE TABLE foo, you can tell the event trigger that the user didn't specify a schema and that they mentioned the name foo. 
You cannot I used to do that, and that's the reason why I had the Command String Normalisation in the same patch: both are walking the parsetree to some extent, and I didn't want to have different routines doing different walks to get at the information. predict what schema foo will actually end up in at that point, and we shouldn't try. I don't have a problem with exposing the information we have at this point with the documented caveat that if you rely on it to mean something it doesn't you get to keep both pieces. Fair enough. I retracted from doing that to be able to separate away the parsetree-walking code for later review, and I'm now wondering if that's really what we want to do. If auditing, you can log the information after the command has been run (in the case of CREATE and ALTER commands) or before the object has been deleted (in the case of a DROP command). If you want to cancel the whole operation depending on the name of the object, or the schema it ends up in, then you can do it at the end of the command's execution with a RAISE EXCEPTION. But I wonder: wouldn't it be better to just expose the raw string the user typed? I mean, in all the cases we really care about, the information we can *reliably* expose at this point is going to be pretty nearly the same as extracting the third space-delimited word out of the command text and splitting it on a period. And you can do that easily even in PL/pgSQL. You could also extract a whole lot Totally Agreed. That's another reason why I want to provide users with the Normalized Command String; it will then be even easier for them to do what you just said. After having been working on that for some time, though, I'm leaning toward Tom's view that you shouldn't try to expose information about objects that don't exist in the catalogs, because that's not code, that's magic. of OTHER potentially useful information from the command text that you couldn't get if we just exposed the given schema name and object name. 
For example, this would allow you to write an event trigger that lets you enforce a policy that all column names must use Hungarian notation. Note that this is a terrible idea and you shouldn't do it, but you could if you wanted to. And, the trigger would be relatively long and complex and you might have bugs, but that's still better than the status quo where you are simply out of luck. You seem to be talking about System Hungarian, not Apps Hungarian, but I don't think that's the point here. http://www.joelonsoftware.com/articles/Wrong.html Now, in a ddl_command_end trigger, there's a lot more information that you can usefully expose. In theory, if the command records what it did somewhere, you can extract that information with as much precision as it cared to record. However, I'm pretty skeptical of the idea of exposing a single OID. I mean, if the main use case here is There's precedent in PostgreSQL: how do you get information about each row that was in the target from a FOR EACH STATEMENT trigger? array
Re: [HACKERS] Event Triggers: adding information
Robert Haas robertmh...@gmail.com writes: Can you point me specifically at what you have in mind so I can make sure I'm considering the right thing? Something like this part:

+ -- now try something crazy to ensure we don't crash the backend
+ create function test_event_trigger_drop_function()
+ returns event_trigger
+ as $$
+ BEGIN
+ drop function test_event_trigger2() cascade;
+ END
+ $$ language plpgsql;
+
+ create function test_event_trigger2() returns event_trigger as $$
+ BEGIN
+ RAISE NOTICE 'test_event_trigger2: % %', tg_event, tg_tag;
+ END
+ $$ language plpgsql;
+
+ create event trigger drop_test_a on ddl_command_start
+ when tag in ('create table')
+ execute procedure test_event_trigger_drop_function();
+
+ create event trigger drop_test_b on ddl_command_start
+ execute procedure test_event_trigger2();
+
+ create table event_trigger_fire1 (a int);
+
+ -- then cleanup the crazy test
+ drop event trigger drop_test_a;
+ drop event trigger drop_test_b;
+ drop function test_event_trigger_drop_function();

The only problem for the regression tests being that there's an OID in the output, but with your proposed error message that would go away. -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
Hi Robert, Hi all, On 2013-01-23 20:17:04 -0500, Robert Haas wrote: On Wed, Jan 23, 2013 at 5:30 PM, Andres Freund and...@2ndquadrant.com wrote: The only reason the submitted version of logical decoding is comparatively slow is that its xmin update policy is braindamaged; working on that right now. I agree. The thing that scares me about the logical replication stuff is not that it might be slow (and if your numbers are to be believed, it isn't), but that I suspect it's riddled with bugs and possibly some questionable design decisions. If we commit it and release it, then we're going to be stuck maintaining it for a very, very long time. If it turns out to have serious bugs that can't be fixed without a new major release, it's going to be a serious black eye for the project. That's much more along the lines of what I am afraid of than the performance stuff - but Heikki cited those, so I replied to that. Note that I didn't say this must, must go in - I just don't think Heikki's reasoning about why not hit the nail on the head. Of course, I have no evidence that that will happen. But it is a really big piece of code, and therefore unless you are superman, it's probably got a really large number of bugs. The scary thing is that it is not as if we can say, well, this is a big hunk of code, but it doesn't really touch the core of the system, so if it's broken, it'll be broken itself, but it won't break anything else. Rather, this code is deeply in bed with WAL, with MVCC, and with the on-disk format of tuples, and makes fundamental changes to the first two of those. You agreed with Tom that 9.2 is the buggiest release in recent memory, but I think logical replication could easily be an order of magnitude worse. I tried very, very hard to get the basics of the design and interface solid. Which obviously doesn't mean I am succeeding - luckily not being superhuman after all ;).
And I think that's very much where input is desperately needed and where I failed to raise enough attention. The output plugin interface followed by the walsender interface is what needs to be most closely vetted. Those are the permanent, user/developer-exposed UI and the ones we should try to keep as stable as possible. The output plugin callbacks are defined here: http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/include/replication/output_plugin.h;hb=xlog-decoding-rebasing-cf4 To make it more agnostic of the technology used to implement changeset extraction we possibly should replace the ReorderBuffer(TXN|Change) structs being passed by something more implementation-agnostic. walsender interface: http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/replication/repl_gram.y;hb=xlog-decoding-rebasing-cf4 The interesting new commands are:
1) K_INIT_LOGICAL_REPLICATION NAME NAME
2) K_START_LOGICAL_REPLICATION NAME RECPTR plugin_options
3) K_FREE_LOGICAL_REPLICATION NAME
1 and 3 allocate (respectively free) the permanent state associated with one changeset consumer, whereas START_LOGICAL_REPLICATION streams out changes starting at RECPTR. Btw, there are currently *no* changes to the WAL format at all if wal_level < logical, except that xl_running_xacts are logged more frequently, which obviously could easily be made conditional. Barring bugs, of course. The changes with wal_level=logical aren't that big either imo:
* heap_insert, heap_update prevent full page writes from removing their normal record by using a separate XLogRecData block for the buffer and the record
* heap_delete adds more data (the pkey of the tuple) after the unchanged xl_heap_delete struct
* On changes to catalog tables, (relfilenode, tid, cmin, cmax) are logged.
No changes to MVCC for normal backends at all, unless you count the very slightly changed *Satisfies interface (getting passed a HeapTuple instead of a HeapTupleHeader).
I am not sure what you're concerned about WRT the on-disk format of the tuples? We are pretty much nailed down on that due to pg_upgrade'ability anyway, and it could be changed from this patch's POV without a problem; the output plugin just sees normal HeapTuples. Or are you concerned about the code extracting them from the xlog records? So I think the won't-break-anything-else argument can be made rather fairly if the heapam.c changes, which aren't that complex, are vetted closely. Now, the discussion about all the code that's active *during* decoding is something else entirely :/ You agreed with Tom that 9.2 is the buggiest release in recent memory, but I think logical replication could easily be an order of magnitude worse. I unfortunately think that not providing more builtin capabilities in this area also has significant dangers. Imo this is one of the weakest, or even the weakest, areas of postgres. I personally have the impression that just about nobody did actual beta testing of the latest releases, especially 9.2, and that is the
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
On Thursday, January 24, 2013 1:56 AM Andres Freund wrote: On 2013-01-22 12:32:07 +, Amit kapila wrote: This closes all comments raised till now for this patch. Kindly let me know if you feel something is missing? I am coming late to this patch, so bear with me if I repeat something said elsewhere. Thanks for looking at the patch. Review comments of a cursory pass through the patch: * most comments are hard to understand. I know the problem of that being hard for a non-native speaker by heart, but I think another pass over them would be a good thing. I shall give another pass to improve the comments in code. * The gram.y changes around set_rest_(more|common) seem pretty confused to me. E.g. it's not possible anymore to set the timezone for a function. What do you exactly mean by this part of comment. And why is it possible to persistently set the search path, but not client encoding? Why is FROM CURRENT in set_rest_more? I think the main reason was some of the commands can work only in a transaction, and as SET PERSISTENT cannot work inside a transaction block, I had kept some segregation. I shall reply to you with the exact reason in a separate mail. * set_config_file should elog(ERROR), not return, on an unhandled setstmt->kind Shall fix it. * why are you creating AutoConfFileName if its not stat'able? It seems better to simply skip parsing the old file in that case Sure, it's better to handle as you are suggesting. * Writing the temporary file to .$pid seems like a bad idea, better use one file for that, SET PERSISTENT is protected by an exclusive lock anyway. I think we can use one temporary file, in fact that was one of the ways I had asked about in one of the previous mails. However Tom and Zoltan felt this is the better way to do it. I can think of one reason where it will be better to have .$pid, and that is if someone has opened the file manually, then all other sessions can fail (on Windows). In fact this is one of the tests Zoltan had performed.
* the write sequence should be:
* fsync(tempfile)
* fsync(directory)
* rename(tempfile, configfile)
* fsync(configfile)
* fsync(directory)
Why do we need fsync(directory) and fsync(configfile)? * write_auto_conf_file should probably escape quoted values? Can you please elaborate more, I am not able to understand your point? * coding style should be adhered to more closely, there are many if (pointer) which should be if (pointer != NULL), Are you pointing in function validate_conf_option(const char *name, char *value) for below usage: + if (value) single-line blocks enclosed in curlies which shouldn't be, etc. Shall fix. * replace_auto_config_file and surrounding functions need more comments in the header Sure, shall add more comments in the function headers of the following functions:
validate_conf_option()
RemoveAutoConfTmpFiles()
write_auto_conf_file()
replace_auto_config_value()
* the check that prevents persistent SETs in a transaction should rather be in utility.c and use PreventTransactionChain like most of the others that need to do that (c.f. T_CreatedbStmt). We can move the check to utility.c, but if we use PreventTransactionChain, then it will be disallowed in functions as well. But the original idea was to not support it in transaction blocks only. So I feel use of the current function IsTransactionBlock() should be okay. With Regards, Amit Kapila.
Re: [HACKERS] CF3+4 (was Re: Parallel query execution)
On Thursday, January 24, 2013 2:58 AM Stephen Frost wrote: Heikki, * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: FWIW, here's how I feel about some of the patches. It's not an exhaustive list. Thanks for going through them and commenting on them. built-in/SQL Command to edit the server configuration file (postgresql.conf) I think this should be a pgfoundry project, not in core. At least for now. I don't think it would ever get deployed or used then.. I think for this feature a lot of testing has already been done, and it has been reviewed by quite a few people. About the feature's value and high-level design, quite a lot of discussion has already happened. At this point, there is no design-level concern for this patch; there are a few issues which can be dealt with via minor changes. Do you see any lack of importance for this feature? With Regards, Amit Kapila.
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
One random thing that caught my eye in the patch, I thought I'd mention it while I still remember: In heap_delete, you call heap_form_tuple() in a critical section. That's a bad idea, because if it runs out of memory - PANIC. - Heikki
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 2013-01-24 12:38:25 +0200, Heikki Linnakangas wrote: On 24.01.2013 00:30, Andres Freund wrote: Hi, I decided to reply on the patches thread to be able to find this later. On 2013-01-23 22:48:50 +0200, Heikki Linnakangas wrote: logical changeset generation v4 This is a boatload of infrastructure for supporting logical replication, yet we have no code actually implementing logical replication that would go with this. The premise of logical replication over trigger-based was that it'd be faster, yet we cannot assess that without a working implementation. I don't think this can be committed in this state. It's a fair point that this is a huge amount of code without a user in itself in-core. But the reason it got no user included is because several people explicitly didn't want a user in-core for now but said the first part of this would be to implement the changeset generation as a separate piece. Didn't you actually prefer not to have any users of this in-core yourself? Yes, I certainly did. But we still need to see the other piece of the puzzle to see how this fits with it. Fair enough. I am also working on a user of this infrastructure, but that doesn't help you very much. Steve Singer seemed to make some stabs at writing an output plugin as well. Steve, how far did you get there? BTW, why does all the transaction reordering stuff have to be in core? It didn't use to, but people argued pretty damned hard that no undecoded data should ever be allowed to leave the postgres cluster. And to be fair it makes writing an output plugin *way* easier. Check http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/test_decoding/test_decoding.c;hb=xlog-decoding-rebasing-cf4 If you skip over tuple_to_stringinfo(), which is just pretty generic scaffolding for converting a whole tuple to a string, writing out the changes in some format by now is pretty damn simple. How much of this infrastructure is to support replicating DDL changes?
IOW, if we drop that requirement, how much code can we slash? Unfortunately I don't think too much, unless we add in other code that allows us to check whether the current definition of a table is still the same as it was back when the tuple was logged. Any other features or requirements that could be dropped? I think it's clear at this stage that this patch is not going to be committed as it is. If you can reduce it to a fraction of what it is now, that fraction might have a chance. Otherwise, it's just going to be pushed to the next commitfest as a whole, and we're going to be having the same doubts and discussions then. One thing that reduces complexity is to declare the following as unsupported:
- CREATE TABLE foo(data text);
- DECODE UP TO HERE;
- INSERT INTO foo(data) VALUES(very-long-to-be-externally-toasted-tuple);
- DROP TABLE foo;
- DECODE UP TO HERE;
but that's just a minor thing. I think what we can do more realistically than to chop off required parts of changeset extraction is to start applying some of the preliminary patches independently:
- the relmapper/relfilenode changes + pg_relation_by_filenode(spc, relnode) should be independently committable, if a bit boring
- allowing walsenders to connect to a database possibly needs an interface change but otherwise it should be fine to go in independently. It also has other potential use-cases, so I think that's fair.
- logging xl_running_xacts more frequently could also be committed independently and makes sense on its own, as it allows a standby to enter HS faster if the master is busy
- introducing InvalidCommandId should be relatively uncontroversial. The fact that no invalid value for command ids exists is imo an oversight
- the *Satisfies changes could be applied and they are imo ready, but there's no use-case for them without the rest, so I am not sure whether there's a point
- currently not separately available, but we could add wal_level=logical independently.
There would be no user of it, but it would be partial work. That includes the relcache support for keeping track of the primary key which already is available separately. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] My first patch! (to \df output)
On Thu, Jan 24, 2013 at 2:27 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 01/24/2013 01:50 AM, Phil Sorber wrote: This looks good to me now. Compiles and works as described. Ready to go? https://commitfest.postgresql.org/action/patch_view?id=1008 I guess I wasn't ready to be so bold, but sure. :) I changed it to 'ready for committer'. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Re: logical changeset generation v4 - Heikki's thoughts about the patch state
On 2013-01-24 13:28:18 +0200, Heikki Linnakangas wrote: One random thing that caught my eye in the patch, I thought I'd mention it while I still remember: In heap_delete, you call heap_form_tuple() in a critical section. That's a bad idea, because if it runs out of memory - PANIC. Good point, will fix. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
On 2013-01-24 16:45:42 +0530, Amit Kapila wrote: * The gram.y changes around set_rest_(more|common) seem pretty confused to me. E.g. it's not possible anymore to set the timezone for a function. What do you exactly mean by this part of comment. The set_rest_more production is used in FunctionSetResetClause and you've removed capabilities from it. And why is it possible to persistently set the search path, but not client encoding? Why is FROM CURRENT in set_rest_more? I think the main reason was some of the commands can work only in a transaction, and as SET PERSISTENT cannot work inside a transaction block, I had kept some segregation. Yes, I can see reasons for doing this, I just think the split isn't correct as you've done it. * Writing the temporary file to .$pid seems like a bad idea, better use one file for that, SET PERSISTENT is protected by an exclusive lock anyway. I think we can use one temporary file, in fact that was one of the ways I had asked about in one of the previous mails. However Tom and Zoltan felt this is the better way to do it. They have? I didn't read it like that. The file can only ever be written by a running postmaster and we already have code that ensures that. There's absolutely no need for the tempfile to have a nondeterministic name. That makes cleanup way easier as well. I can think of one reason where it will be better to have .$pid, and that is if someone has opened the file manually, then all other sessions can fail (on Windows). In fact this is one of the tests Zoltan had performed. Why on earth should somebody have the tempfile open? There's an exclusive lock around writing the file in your code, and if anybody interferes like that with postgres' temporary data you have far bigger problems than SET PERSISTENT erroring out. * the write sequence should be:
* fsync(tempfile)
* fsync(directory)
* rename(tempfile, configfile)
* fsync(configfile)
* fsync(directory)
Why do we need fsync(directory) and fsync(configfile)?
So they don't vanish/get corrupted after a crash? The above is the canonical way to safely write a file without an invalid file ever being visible. * write_auto_conf_file should probably escape quoted values? Can you please elaborate more, I am not able to understand your point? You do something like (don't have the code right here) if (quoted) appendStringInfo(..., "= \"%s\"", value). What happens if value contains a "? * coding style should be adhered to more closely, there are many if (pointer) which should be if (pointer != NULL), Are you pointing in function validate_conf_option(const char *name, char *value) for below usage: + if (value) For example, yes. * the check that prevents persistent SETs in a transaction should rather be in utility.c and use PreventTransactionChain like most of the others that need to do that (c.f. T_CreatedbStmt). We can move the check to utility.c, but if we use PreventTransactionChain, then it will be disallowed in functions as well. But the original idea was to not support it in transaction blocks only. So I feel use of the current function IsTransactionBlock() should be okay. I don't think it's even remotely safe to allow doing this from a function. Doing it there explicitly allows doing it in a transaction. Should we find out it's safe, fine; relaxing restrictions later on is no problem, imposing them after introducing a feature is. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
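The sequence discussed here is the standard durable-rename pattern. A rough sketch in Python (illustrative only; the actual patch would do this in C inside the backend, and fsyncing a directory opened read-only is POSIX behaviour that does not carry over to Windows):

```python
import os

def atomic_write(path, data):
    """Sketch of the durable-rename pattern: write to a fixed-name
    temp file, fsync it, fsync the containing directory, rename over
    the target, then fsync the file and directory again, so that
    neither the data nor the rename can be lost after a crash and no
    partially written file is ever visible under the final name."""
    tmp = path + ".tmp"
    directory = os.path.dirname(path) or "."
    with open(tmp, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # fsync(tempfile)
    dfd = os.open(directory, os.O_RDONLY)  # POSIX-only: dir fd for fsync
    try:
        os.fsync(dfd)                 # fsync(directory)
        os.rename(tmp, path)          # rename(tempfile, configfile)
        fd = os.open(path, os.O_RDONLY)
        try:
            os.fsync(fd)              # fsync(configfile)
        finally:
            os.close(fd)
        os.fsync(dfd)                 # fsync(directory) again
    finally:
        os.close(dfd)
```

The directory fsyncs are the part the question is about: the rename is only durable once the directory entry itself has reached disk.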
Re: [HACKERS] [sepgsql 1/3] add name qualified creation label
On Thu, Jan 24, 2013 at 10:11 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2013/1/24 Tom Lane t...@sss.pgh.pa.us: John R Pierce pie...@hogranch.com writes: On 1/23/2013 8:32 PM, Tom Lane wrote: FWIW, in Fedora-land I see: ... I'd be far more interested in what is in RHEL and CentOS. Fedora, with its 6-month obsolescence cycle, is of zero interest to me for deploying database servers. But of course Fedora is also the upstream that will become RHEL7 and beyond. Do we know which version of Fedora will become RHEL7, and thus, which version of libselinux will go in RHEL7? (And do we know which version of postgres will go in RHEL7, assuming release schedules hold) It might be that the update timing makes a bigger difference in some other distros, though. To return to Heikki's original point about Debian, what are they shipping today? Even though I'm not familiar with Debian's release cycle, I tried to check the shipped version of postgresql and libselinux for the stable, testing, unstable and experimental releases. I'm not certain why they don't push postgresql-9.2 into the experimental release yet. However, it seems to me optimistic to expect libselinux-2.1.10 to be bundled on the timeline of postgresql-9.3. If someone is familiar with Debian's release cycle, I'd like to hear their suggestion. * Debian (stable) ... postgresql-8.4 + libselinux-2.0.96 http://packages.debian.org/en/squeeze/postgresql http://packages.debian.org/en/source/squeeze/libselinux * Debian (testing) ... postgresql-9.1 + libselinux-2.1.9 http://packages.debian.org/en/wheezy/postgresql http://packages.debian.org/en/source/wheezy/libselinux Just as a note, wheezy is the version that will be the next Debian stable, and it has been in freeze for quite a while. So we can safely expect it will be 2.1.9 that's included in the next Debian stable.
-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset
On Thu, Jan 24, 2013 at 03:45:36PM +0800, Craig Ringer wrote: On 01/24/2013 01:34 AM, Tom Lane wrote: Alexander Law exclus...@gmail.com writes: Please let me know if I can do something to get the bug fix (https://commitfest.postgresql.org/action/patch_view?id=902) committed. It's waiting on some Windows-savvy committer to pick it up, IMO. I'm no committer, but I can work with Windows and know text encoding issues fairly well. I'll take a look, though I can't do it immediately as I have some other priority work. Should be able to in the next 24h. Feel free, but it's already Ready for Committer. Also, Andrew has said he plans to pick it up. If in doubt, I think this patch is relatively solid in terms of non-committer review.
Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset
On 01/24/2013 07:57 PM, Noah Misch wrote: On Thu, Jan 24, 2013 at 03:45:36PM +0800, Craig Ringer wrote: On 01/24/2013 01:34 AM, Tom Lane wrote: Alexander Law exclus...@gmail.com writes: Please let me know if I can do something to get the bug fix (https://commitfest.postgresql.org/action/patch_view?id=902) committed. It's waiting on some Windows-savvy committer to pick it up, IMO. I'm no committer, but I can work with Windows and know text encoding issues fairly well. I'll take a look, though I can't do it immediately as I have some other priority work. Should be able to in the next 24h. Feel free, but it's already Ready for Committer. Also, Andrew has said he plans to pick it up. If in doubt, I think this patch is relatively solid in terms of non-committer review. I'm happy with that; I'll look elsewhere. Thanks. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] \d failing to find uppercased object names
Hello, some app created tables using uppercase letters in table names (this app was migrated from mysql). One tries to use \d to search tables info:
ru=# \d pnct_Board
Did not find any relation named "pnct_Board".
Of course, it works with:
ru=# \d "pnct_Board"
Table "public.pnct_Board"
Column | Type   | Modifiers
-------+--------+----------------------------------------------------------
id     | bigint | not null default nextval('pnct_Board_id_seq'::regclass)
...
Could this be considered as a gotcha? The thing is that the Postgres reply is pretty clear: Did not find any relation named "pnct_Board". But the table "pnct_Board" does exist in the system, so Postgres' reply is obviously incorrect. Version: 9.2.1
Re: [HACKERS] \d failing to find uppercased object names
2013/1/24 Nikolay Samokhvalov samokhva...@gmail.com: some app created tables using uppercase letters in table names (this app was migrated from mysql). One tries to use \d to search tables info: ... Could this be considered as a gotcha? The thing is that the Postgres reply is pretty clear: Did not find any relation named "pnct_Board". But the table "pnct_Board" does exist in the system, so Postgres' reply is obviously incorrect. Take a look at this section in the docs: http://www.postgresql.org/docs/current/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS This is expected behavior. -- Victor Y. Yegorov
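The folding rule behind that expected behavior: PostgreSQL lowercases unquoted identifiers, while double-quoted identifiers keep their exact case. A toy model of the rule (Python, illustrative only; real folding also handles Unicode, U& syntax, and more):

```python
def fold_identifier(ident):
    """Mimic PostgreSQL identifier folding: a double-quoted
    identifier keeps its exact case (outer quotes stripped, doubled
    quotes unescaped); an unquoted one is folded to lower case."""
    if len(ident) >= 2 and ident[0] == '"' and ident[-1] == '"':
        return ident[1:-1].replace('""', '"')
    return ident.lower()

# \d looks up the folded name, which is why quoting matters:
print(fold_identifier('pnct_Board'))    # -> pnct_board (no such table)
print(fold_identifier('"pnct_Board"'))  # -> pnct_Board (matches)
```

So `\d pnct_Board` searches for a relation literally named pnct_board, which does not exist, while `\d "pnct_Board"` finds the mixed-case table.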
Re: [HACKERS] pg_ctl idempotent option
On Thu, Jan 24, 2013 at 09:05:59AM +0530, Ashutosh Bapat wrote: I agree, answering the question of whether a particular attempt at starting a server succeeded or not will need the current behaviour. Now, the question is which of these behaviours should be the default? That would work. pg_upgrade knows the cluster version at that point and can use the proper flag. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
On Thursday, January 24, 2013 5:25 PM Andres Freund wrote: On 2013-01-24 16:45:42 +0530, Amit Kapila wrote: * The gram.y changes around set_rest_(more|common) seem pretty confused to me. E.g. it's not possible anymore to set the timezone for a function. What do you exactly mean by this part of comment. The set_rest_more production is used in FunctionSetResetClause and you've removed capabilities from it. set_rest_more has set_rest_common, so there should be no problem.
set_rest_more: /* Generic SET syntaxes: */
    set_rest_common
    | var_name FROM CURRENT_P
        {
            VariableSetStmt *n = makeNode(VariableSetStmt);
            n->kind = VAR_SET_CURRENT;
            n->name = $1;
            $$ = n;
        }
And why is it possible to persistently set the search path, but not client encoding? Why is FROM CURRENT in set_rest_more? I think the main reason was some of the commands can work only in a transaction, and as SET PERSISTENT cannot work inside a transaction block, I had kept some segregation. Yes, I can see reasons for doing this, I just think the split isn't correct as you've done it. * Writing the temporary file to .$pid seems like a bad idea, better use one file for that, SET PERSISTENT is protected by an exclusive lock anyway. I think we can use one temporary file, in fact that was one of the ways I had asked about in one of the previous mails. However Tom and Zoltan felt this is the better way to do it. They have? I didn't read it like that. The file can only ever be written by a running postmaster and we already have code that ensures that. There's absolutely no need for the tempfile to have a nondeterministic name. That makes cleanup way easier as well. Sure, I understand that cleaning will be easier.
So IIUC you are suggesting we should just keep the name as postgresql.auto.conf.tmp. In general, please read the below mail where it has been suggested to use .$pid http://www.postgresql.org/message-id/28379.1358541...@sss.pgh.pa.us I can think of one reason where it will be better to have .$pid, and that is if someone has opened the file manually, then all other sessions can fail (on Windows). In fact this is one of the tests Zoltan had performed. Why on earth should somebody have the tempfile open? There's an exclusive lock around writing the file in your code, and if anybody interferes like that with postgres' temporary data you have far bigger problems than SET PERSISTENT erroring out. I am also not sure, but maybe some people do for test purposes. * the write sequence should be:
* fsync(tempfile)
* fsync(directory)
* rename(tempfile, configfile)
* fsync(configfile)
* fsync(directory)
Why do we need fsync(directory) and fsync(configfile)? So they don't vanish/get corrupted after a crash? The above is the canonical way to safely write a file without an invalid file ever being visible. Do you think there will be a problem if we just use the following:
* fsync(tempfile)
* rename(tempfile, configfile)
I have seen this kind of use elsewhere in the code (writeTimeLineHistory()) * write_auto_conf_file should probably escape quoted values? Can you please elaborate more, I am not able to understand your point? You do something like (don't have the code right here) if (quoted) appendStringInfo(..., "= \"%s\"", value). What happens if value contains a "? I think this case is missed. I shall take care to handle the case when the value contains such a character. * coding style should be adhered to more closely, there are many if (pointer) which should be if (pointer != NULL), Are you pointing in function validate_conf_option(const char *name, char *value) for below usage: + if (value) For example, yes.
Okay, I shall fix these, but I have seen such code at other places (set_config_option); that's why I have kept it the way it is currently. * the check that prevents persistent SETs in a transaction should rather be in utility.c and use PreventTransactionChain like most of the others that need to do that (c.f. T_CreatedbStmt). We can move the check to utility.c, but if we use PreventTransactionChain, then it will be disallowed in functions as well. But the original idea was to not support it in transaction blocks only. So I feel use of the current function IsTransactionBlock() should be okay. I don't think it's even remotely safe to allow doing this from a function. Doing it there explicitly allows doing it in a transaction. As the SET command is allowed inside a function, I don't think we should disallow SET PERSISTENT in functions without any reason. The
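On the escaping question raised in this exchange: if values are written to the config file quoted, embedded quote characters need doubling. A minimal sketch of the idea, assuming the single-quote style that postgresql.conf itself documents (Python for illustration; the helper name is hypothetical, and the actual patch would do this in C in write_auto_conf_file):

```python
def quote_guc_value(value):
    """Quote a GUC value for a postgresql.conf-style file:
    wrap it in single quotes and double any embedded single
    quotes, so a value like  it's  is written as  'it''s'  and
    cannot break out of the quoted string."""
    return "'" + value.replace("'", "''") + "'"

print(quote_guc_value("dd'mm'yyyy"))  # -> 'dd''mm''yyyy'
```

Without the replace() step, a value containing a quote would terminate the quoted string early and corrupt the generated file, which is exactly the failure mode being pointed out.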
[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
On 2013-01-24 18:37:29 +0530, Amit Kapila wrote: On Thursday, January 24, 2013 5:25 PM Andres Freund wrote: On 2013-01-24 16:45:42 +0530, Amit Kapila wrote: * The gram.y changes around set_rest_(more|common) seem pretty confused to me. E.g. it's not possible anymore to set the timezone for a function. What exactly do you mean by this part of the comment? The set_rest_more production is used in FunctionSetResetClause and you've removed capabilities from it. set_rest_more has set_rest_common, so there should be no problem. set_rest_more: /* Generic SET syntaxes: */ set_rest_common | var_name FROM CURRENT_P { VariableSetStmt *n = makeNode(VariableSetStmt); n->kind = VAR_SET_CURRENT; n->name = $1; $$ = n; } True. I still think that the split you've made isn't right as-is. * Writing the temporary file to .$pid seems like a bad idea, better use one file for that, SET PERSISTENT is protected by an exclusive lock anyway. I think we can use one temporary file, in fact that was one of the ways I asked about in one of my previous mails. However Tom and Zoltan felt this is the better way to do it. They have? I didn't read it like that. The file can only ever be written by a running postmaster and we already have code that ensures that. There's absolutely no need for the tempfile to have a nondeterministic name. That makes cleanup way easier as well. Sure, I understand that cleaning will be easier. So IIUC you are suggesting we should just keep the name as postgresql.auto.conf.tmp. In general, please read the below mail where it has been suggested to use .$pid http://www.postgresql.org/message-id/28379.1358541...@sss.pgh.pa.us That was in the context of your use of mkstemp() as far as I read it. We use constantly named temp files in other locations as well. Why on earth should somebody have the tempfile open? There's an exclusive lock around writing the file in your code and if anybody interferes like that with postgres' temporary data you have far bigger problems than SET PERSISTENT erroring out. 
I am also not sure, but maybe some people do for test purposes. That seems like completely pointless reasoning. * the write sequence should be: * fsync(tempfile) * fsync(directory) * rename(tempfile, configfile) * fsync(configfile) * fsync(directory) Why do we need fsync(directory) and fsync(configfile)? So they don't vanish/get corrupted after a crash? The above is the canonical way to safely write a file without an invalid file ever being visible. Do you think there will be a problem if we just use the below: * fsync(tempfile) * rename(tempfile, configfile) I have seen this kind of use elsewhere in the code as well (writeTimeLineHistory()) Yea, there are some places where the code isn't entirely safe. No reason to introduce more of those. * the check that prevents persistent SETs in a transaction should rather be in utility.c and use PreventTransactionChain like most of the others that need to do that (c.f. T_CreatedbStmt). We can move the check to utility.c, but if we use PreventTransactionChain, then it will be disallowed in functions as well. But the original idea was to not support it in transaction blocks only. So I feel use of the current function IsTransactionBlock() should be okay. I don't think it's even remotely safe to allow doing this from a function. Doing it there explicitly allows doing it in a transaction. As the SET command is allowed inside a function, I don't think we should disallow SET PERSISTENT in functions without any reason. The reason why it's not allowed in a transaction is that we would have to delete the temporary file at transaction end or at rollback, which could have made the logic much more complex. Yea, and allowing use in functions doesn't help you with that at all: Consider the following plpgsql function: $$ BEGIN SET PERSISTENT foo = 'bar'; RAISE EXCEPTION 'blub'; END; $$ Now you have a SET PERSISTENT that succeeded although the transaction failed. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
Amit Kapila escribió: On Wednesday, January 23, 2013 11:51 PM Alvaro Herrera wrote: Yes it is -- the /etc/postgresql/version/cluster directory (where postgresql.conf resides) is owned by user postgres. So in that case we can consider postgresql.auto.conf to be in a path w.r.t. postgresql.conf instead of the data directory. That seems sensible to me. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset
On 01/24/2013 03:42 AM, Craig Ringer wrote: On 01/24/2013 01:06 AM, Alexander Law wrote: Hello, Please let me know if I can do something to get the bug fix (https://commitfest.postgresql.org/action/patch_view?id=902) committed. I would like to fix other bugs related to postgres localization, but I am not sure yet how to do it. For anyone looking for the history, the 1st post on this topic is here: http://www.postgresql.org/message-id/e1s3twb-0004oy...@wrigleys.postgresql.org Yeah. I'm happy enough with this patch. ISTM it's really a bug and should be backpatched, no? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Back-branch update releases coming in a couple weeks
From: Fujii Masao masao.fu...@gmail.com On Thu, Jan 24, 2013 at 7:42 AM, MauMau maumau...@gmail.com wrote: I searched through PostgreSQL mailing lists with WAL contains references to invalid pages, and i found 19 messages. Some people encountered similar problem. There were some discussions regarding those problems (Tom and Simon Riggs commented), but those discussions did not reach a solution. I also found a discussion which might relate to this problem. Does this fix the problem? [BUG] lag of minRecoveryPont in archive recovery http://www.postgresql.org/message-id/20121206.130458.170549097.horiguchi.kyot...@lab.ntt.co.jp Yes. Could you check whether you can reproduce the problem on the latest REL9_2_STABLE? I tried to produce the problem by doing pg_ctl stop -mi against the primary more than ten times on REL9_2_STABLE, but the problem did not appear. However, I encountered the crash only once out of dozens of failovers, possibly more than a hundred times, on PostgreSQL 9.1.6. So, I'm not sure the problem is fixed in REL9_2_STABLE. I'm wondering if the fix discussed in the above thread solves my problem. I found the following differences between Horiguchi-san's case and my case: (1) Horiguchi-san says the bug outputs the message: WARNING: page 0 of relation base/16384/16385 does not exist On the other hand, I got the message: WARNING: page 506747 of relation base/482272/482304 was uninitialized (2) Horiguchi-san produced the problem when he shut the standby immediately and restarted it. However, I saw the problem during failover. (3) Horiguchi-san did not use any index, but in my case the WARNING message refers to an index. But there's a similar point. Horiguchi-san says the problem occurs after DELETE+VACUUM. In my case, I shut the primary down while the application was doing INSERT/UPDATE. As the below messages show, some vacuuming was running just before the immediate shutdown: ... 
LOG: automatic vacuum of table GOLD.scm1.tbl1: index scans: 0
pages: 0 removed, 36743 remain
tuples: 0 removed, 73764 remain
system usage: CPU 0.09s/0.11u sec elapsed 0.66 sec
LOG: automatic analyze of table GOLD.scm1.tbl1 system usage: CPU 0.00s/0.14u sec elapsed 0.32 sec
LOG: automatic vacuum of table GOLD.scm1.tbl2: index scans: 0
pages: 0 removed, 12101 remain
tuples: 40657 removed, 44142 remain
system usage: CPU 0.06s/0.06u sec elapsed 0.30 sec
LOG: automatic analyze of table GOLD.scm1.tbl2 system usage: CPU 0.00s/0.06u sec elapsed 0.14 sec
LOG: received immediate shutdown request
... Could you tell me the details of the problem discussed and fixed in the upcoming minor release? I would like to know the phenomenon and its conditions, and whether it applies to my case. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 1/3] Fix x + y < x overflow checks
On Thu, Jan 24, 2013 at 6:36 AM, Xi Wang xi.w...@gmail.com wrote: icc optimizes away the overflow check x + y < x (y > 0), because signed integer overflow is undefined behavior in C. Instead, use a safe precondition test x > INT_MAX - y. I should mention gcc 4.7 does the same, and it emits a warning. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Passing connection string to pg_basebackup
On Tue, Jan 22, 2013 3:27 PM Hari Babu wrote: On Saturday, January 19, 2013 5:49 PM Magnus Hagander wrote: On Fri, Jan 18, 2013 at 1:05 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 18.01.2013 13:41, Amit Kapila wrote: On Friday, January 18, 2013 3:46 PM Heikki Linnakangas wrote: On 18.01.2013 08:50, Amit Kapila wrote: So to solve this problem below can be done: 1. Support connection string in pg_basebackup and mention keepalives or connection_timeout 2. Support recv_timeout separately to provide a way to users who are not comfortable tcp keepalives a. 1 can be done alone b. 2 can be done alone c. both 1 and 2. Right. Let's do just 1 for now. An general application level, non-TCP, keepalive message at the libpq level might be a good idea, but that's a much larger patch, definitely not 9.3 material. +1 for doing 1 now. But actually, I think we can just keep it that way in the future as well. If you need to specify these fairly advanced options, using a connection string really isn't a problem. I think it would be more worthwhile to go through the rest of the tools in bin/ and make sure they *all* support connection strings. And, an important point, do it the same way. Presently I am trying to implement the option-1 by adding an extra command line Option -C connection_string to pg_basebackup and pg_receivexlog. This option can be used with all the tools in bin folder. The existing command line options to the tools are not planned to remove as of now. To handle both options, we can follow these approaches. 1. To make the code simpler, the connection string is formed inside with the existing command line options, if the user is not provided the connection_string option. which is used for further processing. 2. The connection_string and existing command line options are handled separately. I feel approach-1 is better. Please provide your suggestions on the same. 
Here is the patch which handles taking a connection string as an argument to pg_basebackup and pg_receivexlog. Description of changes: 1. A new command-line option -C connection-string is added for passing the connection string. 2. Used the PQconnectdb function for connecting to the server instead of the existing function PQconnectdbParams. 3. The existing command-line parameters are formed into a string and passed to the PQconnectdb function. 4. With the connection string, if the user provides additional options along with the existing command-line options, higher priority is given to the additional options. 5. The conninfo_parse function is modified to handle single quotes in the password provided as input. Please provide your suggestions. Regards, Hari Babu. pg_basebkup_recvxlog_conn_string_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
On Thursday, January 24, 2013 6:51 PM Andres Freund wrote: On 2013-01-24 18:37:29 +0530, Amit Kapila wrote: On Thursday, January 24, 2013 5:25 PM Andres Freund wrote: On 2013-01-24 16:45:42 +0530, Amit Kapila wrote: * The gram.y changes around set_rest_(more|common) seem pretty confused to me. E.g. it's not possible anymore to set the timezone for a function. What exactly do you mean by this part of the comment? The set_rest_more production is used in FunctionSetResetClause and you've removed capabilities from it. set_rest_more has set_rest_common, so there should be no problem. set_rest_more: /* Generic SET syntaxes: */ set_rest_common | var_name FROM CURRENT_P { VariableSetStmt *n = makeNode(VariableSetStmt); n->kind = VAR_SET_CURRENT; n->name = $1; $$ = n; } True. I still think that the split you've made isn't right as-is. I am working on providing the exact reasoning for the split, and if some variable is not appropriate, I shall move it appropriately. * Writing the temporary file to .$pid seems like a bad idea, better use one file for that, SET PERSISTENT is protected by an exclusive lock anyway. I think we can use one temporary file, in fact that was one of the ways I asked about in one of my previous mails. However Tom and Zoltan felt this is the better way to do it. They have? I didn't read it like that. The file can only ever be written by a running postmaster and we already have code that ensures that. There's absolutely no need for the tempfile to have a nondeterministic name. That makes cleanup way easier as well. Sure, I understand that cleaning will be easier. So IIUC you are suggesting we should just keep the name as postgresql.auto.conf.tmp. In general, please read the below mail where it has been suggested to use .$pid http://www.postgresql.org/message-id/28379.1358541...@sss.pgh.pa.us That was in the context of your use of mkstemp() as far as I read it. We use constantly named temp files in other locations as well. 
Okay, as I told you, I had also proposed this initially; now that you have pointed it out specifically, I shall do it that way unless somebody has a strong reason for not doing it this way. Why on earth should somebody have the tempfile open? There's an exclusive lock around writing the file in your code and if anybody interferes like that with postgres' temporary data you have far bigger problems than SET PERSISTENT erroring out. I am also not sure, but maybe some people do for test purposes. That seems like completely pointless reasoning. I have no specific reason, so I shall do it the way you have suggested. * the write sequence should be: * fsync(tempfile) * fsync(directory) * rename(tempfile, configfile) * fsync(configfile) * fsync(directory) Why do we need fsync(directory) and fsync(configfile)? So they don't vanish/get corrupted after a crash? The above is the canonical way to safely write a file without an invalid file ever being visible. Do you think there will be a problem if we just use the below: * fsync(tempfile) * rename(tempfile, configfile) I have seen this kind of use elsewhere in the code as well (writeTimeLineHistory()) Yea, there are some places where the code isn't entirely safe. No reason to introduce more of those. Okay, I will check this and let you know if I have any doubts. * the check that prevents persistent SETs in a transaction should rather be in utility.c and use PreventTransactionChain like most of the others that need to do that (c.f. T_CreatedbStmt). We can move the check to utility.c, but if we use PreventTransactionChain, then it will be disallowed in functions as well. But the original idea was to not support it in transaction blocks only. So I feel use of the current function IsTransactionBlock() should be okay. I don't think it's even remotely safe to allow doing this from a function. Doing it there explicitly allows doing it in a transaction. 
As the SET command is allowed inside a function, I don't think we should disallow SET PERSISTENT in functions without any reason. The reason why it's not allowed in a transaction is that we would have to delete the temporary file at transaction end or at rollback, which could have made the logic much more complex. Yea, and allowing use in functions doesn't help you with that at all: Consider the following plpgsql function: $$ BEGIN SET PERSISTENT foo = 'bar'; RAISE EXCEPTION 'blub'; END; $$ Now you have a SET PERSISTENT that succeeded although the transaction failed.
Re: [HACKERS] [PATCH 0/3] Work around icc miscompilation
Xi Wang xi.w...@gmail.com writes: On 1/24/13 5:02 AM, Heikki Linnakangas wrote: If we fix these, can we stop using -fwrapv on gcc? Is there any way to get gcc to warn about these? I believe we can get rid of -fwrapv once we fix all such checks. TBH, I find that statement to be nonsense, and these patches to be completely the wrong way to go about it. The fundamental problem here is that the compiler, unless told otherwise by a compilation switch, believes it is entitled to assume that no integer overflow will happen anywhere in the program. Therefore, any error check that is looking for overflow *should* get optimized away. The only reason the compiler would fail to do that is if its optimizer isn't quite smart enough to prove that the code is testing for an overflow condition. So what you are proposing here is merely the next move in an arms race with the compiler writers, and it will surely break again in a future generation of compilers. Or even if these particular trouble spots don't break, something else will. The only reliable solution is to not let the compiler make that type of assumption. So I think we should just reject all of these, and instead fix configure to make sure it turns on icc's equivalent of -fwrapv. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting visibility map in VACUUM's second phase
On Thu, Jan 24, 2013 at 1:28 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: Hi Jeff, On Thu, Jan 24, 2013 at 2:41 PM, Jeff Janes jeff.ja...@gmail.com wrote: lazy_vacuum_page now pins and unpins the vmbuffer for each page it marks all-visible, which seems like a lot of needless traffic since the next vmbuffer is likely to be the same as the previous one. Good idea. Even though the cost of pinning/unpinning may not be high with respect to the vacuum cost itself, it seems to be a good idea because we already do that at other places. Do you have any other review comments on the patch, or shall I fix this and send an updated patch soon? That was the only thing that stood out to me. You had some doubts about visibility_cutoff_xid, but I don't know enough about that to address them. I agree that it would be nice to avoid the code duplication, but I don't see a reasonable way to do that. It applies cleanly (some offsets), builds without warnings, passes make check under cassert. No documentation or regression tests are needed. We want this, and it does it. I have verified the primary objective (setting the vm sooner) but haven't experimentally verified that this actually increases throughput for some workload. For pgbench when all data fits in shared_buffers, at least it doesn't cause a readily noticeable slowdown. I haven't devised any patch-specific test cases, either for correctness or performance. I just used make check, pgbench, and the vacuum verbose verification you told us about in the original submission. If we are going to scan a block twice, I wonder if it should be done the other way around. If there are any dead tuples that need to be removed from indexes, there is no point in dirtying the page with a HOT prune on the first pass when it will just be dirtied again after the indexes are vacuumed. I don't see this idea holding up your patch though, as surely it would be more invasive than what you are doing. 
Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 13-01-24 06:40 AM, Andres Freund wrote: Fair enough. I am also working on a user of this infrastructure but that doesn't help you very much. Steve Singer seemed to make some stabs at writing an output plugin as well. Steve, how far did you get there? I was able to get something that generated output for INSERT statements in a format similar to what a modified slony apply trigger would want. This was with the list of tables to replicate hard-coded in the plugin. This was with the patchset from the last commitfest. I had gotten a bit hung up on the UPDATE and DELETE support because slony allows you to use an arbitrary user-specified unique index as your key. It looks like better support for tables with a unique non-primary key is in the most recent patch set. I am hoping to have time this weekend to update my plugin to use parameters passed in on the init and other updates in the most recent version. If I make some progress I will post a link to my progress at the end of the weekend. My big issue is that I have limited time to spend on this. BTW, why does all the transaction reordering stuff have to be in core? It didn't use to, but people argued pretty damned hard that no undecoded data should ever be allowed to leave the postgres cluster. And to be fair it makes writing an output plugin *way* easier. Check http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/test_decoding/test_decoding.c;hb=xlog-decoding-rebasing-cf4 If you skip over tuple_to_stringinfo(), which is just pretty generic scaffolding for converting a whole tuple to a string, writing out the changes in some format by now is pretty damn simple. I think we will find that the replication systems won't be the only users of this feature. I have often seen systems that have a logging requirement for auditing purposes, or that log and then reconstruct the sequence of changes made to a set of tables in order to feed a downstream application. 
Triggers and a journaling table are the traditional way of doing this, but it should be pretty easy to write a plugin to accomplish the same thing with better performance. If the reordering stuff wasn't in core this would be much harder. How much of this infrastructure is to support replicating DDL changes? IOW, if we drop that requirement, how much code can we slash? Unfortunately I don't think too much, unless we add in other code that allows us to check whether the current definition of a table is still the same as it was back when the tuple was logged. Any other features or requirements that could be dropped? I think it's clear at this stage that this patch is not going to be committed as it is. If you can reduce it to a fraction of what it is now, that fraction might have a chance. Otherwise, it's just going to be pushed to the next commitfest as a whole, and we're going to be having the same doubts and discussions then. One thing that reduces complexity is to declare the following as unsupported: - CREATE TABLE foo(data text); - DECODE UP TO HERE; - INSERT INTO foo(data) VALUES(very-long-to-be-externally-toasted-tuple); - DROP TABLE foo; - DECODE UP TO HERE; but that's just a minor thing. I think what we can do more realistically than to chop off required parts of changeset extraction is to start applying some of the preliminary patches independently: - the relmapper/relfilenode changes + pg_relation_by_filenode(spc, relnode) should be independently committable, if a bit boring - allowing walsenders to connect to a database possibly needs an interface change but otherwise it should be fine to go in independently. It also has other potential use-cases, so I think that's fair. - logging xl_running_xact's more frequently could also be committed independently and makes sense independently as it allows a standby to enter HS faster if the master is busy - Introducing InvalidCommandId should be relatively uncontroversial. 
The fact that no invalid value for command ids exists is imo an oversight - the *Satisfies change could be applied and they are imo ready, but there's no use-case for it without the rest, so I am not sure whether there's a point - currently not separately available, but we could add wal_level=logical independently. There would be no user of it, but it would be partial work. That includes the relcache support for keeping track of the primary key, which already is available separately. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset
On Thu, Jan 24, 2013 at 08:50:36AM -0500, Andrew Dunstan wrote: On 01/24/2013 03:42 AM, Craig Ringer wrote: On 01/24/2013 01:06 AM, Alexander Law wrote: Hello, Please let me know if I can do something to get the bug fix (https://commitfest.postgresql.org/action/patch_view?id=902) committed. I would like to fix other bugs related to postgres localization, but I am not sure yet how to do it. For anyone looking for the history, the 1st post on this topic is here: http://www.postgresql.org/message-id/e1s3twb-0004oy...@wrigleys.postgresql.org Yeah. I'm happy enough with this patch. ISTM it's really a bug and should be backpatched, no? It is a bug, yes. I'm neutral on whether to backpatch. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
Andres Freund and...@2ndquadrant.com writes: On 2013-01-24 16:45:42 +0530, Amit Kapila wrote: * Writing the temporary file to .$pid seems like a bad idea, better use one file for that, SET PERSISTENT is protected by an exclusive lock anyway. I think we can use one temporary file, infact that was one of the ways I have asked in one of the previous mails. However Tom and Zoltan felt this is better way to do it. The have? I didn't read it like that. The file can only ever written by a running postmaster and we already have code that ensures that. There's absolutely no need for the tempfile to have a nondeterministic name. That makes cleanup way easier as well. Say again? Surely the temp file is being written by whichever backend is executing SET PERSISTENT, and there could be more than one. Or, if it isn't and you really do mean that this patch is getting the postmaster process involved in executing SET PERSISTENT, that's going to be sufficient grounds for rejection right there. The only way that we have to keep the postmaster reliable is to not have it doing more than it absolutely must. While I have not been paying very much attention to this patch, I have to say that every time I've looked into the thread I've gotten an overwhelming impression of overdesign and overcomplication. This should not be a very large or subtle patch. Write a setting into a temp file, rename it into place, done. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip checkpoint on promoting from streaming replication
On 6 January 2013 21:58, Simon Riggs si...@2ndquadrant.com wrote: On 9 August 2012 10:45, Simon Riggs si...@2ndquadrant.com wrote: On 22 June 2012 05:03, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: I hope this is promising. I've reviewed this and thought about it over some time. I've been torn between the need to remove the checkpoint for speed and being worried about the implications of doing so. We promote in multiple use cases. When we end a PITR, or are performing a switchover, it doesn't really matter how long the shutdown checkpoint takes, so I'm inclined to leave it there in those cases. For failover, we need fast promotion. So my thinking is to make pg_ctl promote -m fast be the way to initiate a fast failover that skips the shutdown checkpoint. That way all existing applications work the same as before, while new users that explicitly choose to do so will gain from the new option. Here's a patch to skip checkpoint when we do pg_ctl promote -m fast We keep the end of recovery checkpoint in all other cases. The only thing left from Kyotaro's patch is a single line of code - the call to ReadCheckpointRecord() that checks to see if the WAL records for the last two restartpoints is on disk, which was an important line of code. Patch implements a new record type XLOG_END_OF_RECOVERY that behaves on replay like a shutdown checkpoint record. I put this back in from my patch because I believe its important that we have a clear place where the WAL history changes timelineId. WAL format change bump required. So far this is only barely tested, but considering time goes on, I thought people might want to pass comment on this. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services fast_promote.v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
On 2013-01-24 11:22:52 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-01-24 16:45:42 +0530, Amit Kapila wrote: * Writing the temporary file to .$pid seems like a bad idea, better use one file for that, SET PERSISTENT is protected by an exclusive lock anyway. I think we can use one temporary file, in fact that was one of the ways I asked about in one of my previous mails. However Tom and Zoltan felt this is the better way to do it. They have? I didn't read it like that. The file can only ever be written by a running postmaster and we already have code that ensures that. There's absolutely no need for the tempfile to have a nondeterministic name. That makes cleanup way easier as well. Say again? Surely the temp file is being written by whichever backend is executing SET PERSISTENT, and there could be more than one. Sure, but the patch acquires SetPersistentLock exclusively beforehand, which seems fine to me. Any opinion on whether it's acceptable to allow SET PERSISTENT in functions? It seems absurd to me to allow it, but Amit seems to be of another opinion. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 0/3] Work around icc miscompilation
On 1/24/13 10:48 AM, Tom Lane wrote: The fundamental problem here is that the compiler, unless told otherwise by a compilation switch, believes it is entitled to assume that no integer overflow will happen anywhere in the program. Therefore, any error check that is looking for overflow *should* get optimized away. The only reason the compiler would fail to do that is if its optimizer isn't quite smart enough to prove that the code is testing for an overflow condition. So what you are proposing here is merely the next move in an arms race with the compiler writers, and it will surely break again in a future generation of compilers. Or even if these particular trouble spots don't break, something else will. The only reliable solution is to not let the compiler make that type of assumption. What I am proposing here is the opposite: _not_ to enter an arms race with the compiler writers. Instead, make the code conform to the C standard, something both sides can agree on. Particularly, avoid using (signed) overflowed results to detect overflows, which the C standard clearly specifies as undefined behavior and many compilers are actively exploiting. We could use either unsigned overflows (which are well defined) or precondition testing (like `x > INT_MAX - y' in the patches). So I think we should just reject all of these, and instead fix configure to make sure it turns on icc's equivalent of -fwrapv. While I agree it's better to turn on icc's -fno-strict-overflow as a workaround, the fundamental problem here is that we are _not_ programming in the C language. Rather, we are programming in some C-with-signed-wraparound dialect. The worst part of this C dialect is that it has never been clearly specified what it means by signed wraparound: gcc's -fwrapv assumes signed wraparound for add/sub/mul, but not for div (e.g., INT_MIN/-1 traps on x86) nor shift (e.g., 1<<32 produces undef with clang). clang's -fwrapv also assumes wraparound for pointer arithmetic, while gcc's does not. 
I have no idea what icc and pathscale's -fno-strict-overflow option does (probably the closest thing to -fwrapv). Sometimes it prevents such checks from being optimized away, sometimes it doesn't. Anyway, it would not be surprising if future C compilers break this dialect again. But they shouldn't break standard-conforming code. - xi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pg_isready (was: [WIP] pg_ping utility)
On Wed, Jan 23, 2013 at 8:12 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Jan 24, 2013 at 8:47 AM, Phil Sorber p...@omniti.com wrote: On Wed, Jan 23, 2013 at 6:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Phil Sorber p...@omniti.com writes: On Wed, Jan 23, 2013 at 1:58 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Jan 23, 2013 at 12:27:45PM -0500, Tom Lane wrote: +1 for default timeout --- if this isn't like ping where you are expecting to run indefinitely, I can't see that it's a good idea for it to sit very long by default, in any circumstance. FYI, the pg_ctl -w (wait) default is 60 seconds: Great. That is what I came to on my own as well. Figured that might be a sticking point, but as there is precedent, I'm happy with it. I'm not sure that's a relevant precedent at all. What that number is, is the time that pg_ctl will wait around for the postmaster to start or stop before reporting a problem --- and in either case, a significant delay (multiple seconds) is not surprising, because of crash-recovery work, shutdown checkpointing, etc. For pg_isready, you'd expect to get a response more or less instantly, wouldn't you? Personally, I'd decide that pg_isready is broken if it didn't give me an answer in a couple of seconds, much less a minute. What I had in mind was a default timeout of maybe 3 or 4 seconds... I was thinking that if it was in a script you wouldn't care if it was 60 seconds. If it was at the command line you would ^C it much earlier. I think the default linux TCP connection timeout is around 20 seconds. My feeling is everyone is going to have a differing opinion on this, which is why I was hoping that some good precedent existed. I'm fine with 3 or 4, whatever can be agreed upon. +1 with 3 or 4 seconds. Aside from this issue, I have one minor comment. ISTM you need to add the following line to the end of the help message. This line has been included in the help message of all the other client commands. 
Report bugs to pgsql-b...@postgresql.org. Ok, I set the default timeout to 3 seconds, added the bugs email to the help, and also added docs that I forgot last time. Also, still hoping to get some feedback on my other issues. Thanks. Regards, -- Fujii Masao pg_isready_timeout_v2.diff Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip checkpoint on promoting from streaming replication
On 24.01.2013 18:24, Simon Riggs wrote: On 6 January 2013 21:58, Simon Riggs si...@2ndquadrant.com wrote: I've been torn between the need to remove the checkpoint for speed and being worried about the implications of doing so. We promote in multiple use cases. When we end a PITR, or are performing a switchover, it doesn't really matter how long the shutdown checkpoint takes, so I'm inclined to leave it there in those cases. For failover, we need fast promotion. So my thinking is to make pg_ctl promote -m fast be the way to initiate a fast failover that skips the shutdown checkpoint. That way all existing applications work the same as before, while new users that explicitly choose to do so will gain from the new option. Here's a patch to skip checkpoint when we do pg_ctl promote -m fast. We keep the end of recovery checkpoint in all other cases. Hmm, there seems to be no way to do a fast promotion with a trigger file. I'm a bit confused why there needs to be special mode for this. Can't we just always do the fast promotion? I agree that there's no urgency when you're doing PITR, but shouldn't do any harm either. Or perhaps always do fast promotion when starting up from standby mode, and slow otherwise. Are we comfortable enough with this to skip the checkpoint after crash recovery? I may be missing something, but it looks like after a fast promotion, you don't request a new checkpoint. So it can take quite a while for the next checkpoint to be triggered by checkpoint_timeout/segments. That shouldn't be a problem, but I feel that it'd be prudent to request a new checkpoint immediately (not necessarily an immediate checkpoint, though). The only thing left from Kyotaro's patch is a single line of code - the call to ReadCheckpointRecord() that checks to see if the WAL records for the last two restartpoints are on disk, which was an important line of code. Why's that important, just for paranoia? 
If the last two restartpoints have disappeared, something's seriously wrong, and you will be in trouble e.g if you crash at that point. Do we need to be extra paranoid when doing a fast promotion? Patch implements a new record type XLOG_END_OF_RECOVERY that behaves on replay like a shutdown checkpoint record. I put this back in from my patch because I believe its important that we have a clear place where the WAL history changes timelineId. WAL format change bump required. Agreed, such a WAL record is essential. At replay, an end-of-recovery record should be a signal to the hot standby mechanism that there are no transactions running in the master at that point, same as a shutdown checkpoint. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Back-branch update releases coming in a couple weeks
On Thu, Jan 24, 2013 at 11:53 PM, MauMau maumau...@gmail.com wrote: From: Fujii Masao masao.fu...@gmail.com On Thu, Jan 24, 2013 at 7:42 AM, MauMau maumau...@gmail.com wrote: I searched through PostgreSQL mailing lists with WAL contains references to invalid pages, and I found 19 messages. Some people encountered similar problems. There were some discussions regarding those problems (Tom and Simon Riggs commented), but those discussions did not reach a solution. I also found a discussion which might relate to this problem. Does this fix the problem? [BUG] lag of minRecoveryPont in archive recovery http://www.postgresql.org/message-id/20121206.130458.170549097.horiguchi.kyot...@lab.ntt.co.jp Yes. Could you check whether you can reproduce the problem on the latest REL9_2_STABLE? I tried to produce the problem by doing pg_ctl stop -mi against the primary more than ten times on REL9_2_STABLE, but the problem did not appear. However, I encountered the crash only once out of dozens of failovers, possibly more than a hundred times, on PostgreSQL 9.1.6. So, I'm not sure the problem is fixed in REL9_2_STABLE. You can reproduce the problem in REL9_1_STABLE? Sorry, I pointed to the wrong version, i.e., REL9_2_STABLE. Since you encountered the problem in 9.1.6, you need to retry the test in REL9_1_STABLE. I'm wondering if the fix discussed in the above thread solves my problem. I found the following differences between Horiguchi-san's case and my case: (1) Horiguchi-san says the bug outputs the message: WARNING: page 0 of relation base/16384/16385 does not exist On the other hand, I got the message: WARNING: page 506747 of relation base/482272/482304 was uninitialized (2) Horiguchi-san produced the problem when he shut the standby down immediately and restarted it. However, I saw the problem during failover. (3) Horiguchi-san did not use any index, but in my case the WARNING message refers to an index. But there's a similar point. Horiguchi-san says the problem occurs after DELETE+VACUUM. 
In my case, I shut the primary down while the application was doing INSERT/UPDATE. As the below messages show, some vacuuming was running just before the immediate shutdown: ... LOG: automatic vacuum of table GOLD.scm1.tbl1: index scans: 0 pages: 0 removed, 36743 remain tuples: 0 removed, 73764 remain system usage: CPU 0.09s/0.11u sec elapsed 0.66 sec LOG: automatic analyze of table GOLD.scm1.tbl1 system usage: CPU 0.00s/0.14u sec elapsed 0.32 sec LOG: automatic vacuum of table GOLD.scm1.tbl2: index scans: 0 pages: 0 removed, 12101 remain tuples: 40657 removed, 44142 remain system usage: CPU 0.06s/0.06u sec elapsed 0.30 sec LOG: automatic analyze of table GOLD.scm1.tbl2 system usage: CPU 0.00s/0.06u sec elapsed 0.14 sec LOG: received immediate shutdown request ... Could you tell me the details of the problem discussed and fixed in the upcoming minor release? I would like to know the phenomenon and its conditions, and whether it applies to my case. http://www.postgresql.org/message-id/20121206.130458.170549097.horiguchi.kyot...@lab.ntt.co.jp Could you read the discussion in the above thread? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Improve concurrency of foreign key locking
Alvaro Herrera wrote: Improve concurrency of foreign key locking I noticed a bug in visibility routines when pg_upgrade is involved: tuples marked FOR UPDATE in the old cluster (i.e. HEAP_XMAX_INVALID not set, HEAP_XMAX_EXCL_LOCK set, HEAP_XMAX_IS_MULTI not set) are invisible (dead) on the new cluster. I had defined the HEAP_XMAX_IS_LOCKED_ONLY thusly: #define HEAP_XMAX_IS_LOCKED_ONLY(infomask) \ ((infomask) & HEAP_XMAX_LOCK_ONLY) but this doesn't work for the reason stated above. The fix is to redefine the macro like this: /* * A tuple is only locked (i.e. not updated by its Xmax) if the * HEAP_XMAX_LOCK_ONLY bit is set; or, for pg_upgrade's sake, if the Xmax is * not a multi and the EXCL_LOCK bit is set. * * See also HeapTupleHeaderIsOnlyLocked, which also checks for a possible * aborted updater transaction. * * Beware of multiple evaluations of the argument. */ #define HEAP_XMAX_IS_LOCKED_ONLY(infomask) \ (((infomask) & HEAP_XMAX_LOCK_ONLY) || \ (((infomask) & (HEAP_XMAX_IS_MULTI | HEAP_LOCK_MASK)) == HEAP_XMAX_EXCL_LOCK)) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
Andres Freund and...@2ndquadrant.com writes: On 2013-01-24 11:22:52 -0500, Tom Lane wrote: Say again? Surely the temp file is being written by whichever backend is executing SET PERSISTENT, and there could be more than one. Sure, but the patch acquires SetPersistentLock exclusively beforehand which seems fine to me. Why should we have such a lock? Seems like that will probably introduce as many problems as it fixes. Deadlock risk, blockages, etc. It is not necessary for atomicity, since rename() would be atomic already. Any opinion whether its acceptable to allow SET PERSISTENT in functions? It seems absurd to me to allow it, but Amit seems to be of another opinion. Well, it's really a definitional question I think: do you expect that subsequent failure of the transaction should cause such a SET to roll back? I think it would be entirely self-consistent to define SET PERSISTENT as a nontransactional operation. Then the implementation would just be to write the file immediately when the command is executed, and there's no particular reason why it can't be allowed inside a transaction block. If you want it to be transactional, then the point of disallowing it in transaction blocks (or functions) would be to not have a very large window between writing the file and committing. But it's still possible that the transaction would fail somewhere in there, leading to the inconsistent outcome that the transaction reports failing but we applied the SET anyway. I do agree that it would be nonsensical to allow SET PERSISTENT in functions but not transaction blocks. Another approach is to remember the requested setting until somewhere in the pre-commit sequence, and then try to do the file write at that time. 
I'm not terribly thrilled with that approach, though, because (a) it only narrows the window for an inconsistent outcome, it doesn't remove it entirely; (b) there are already too many things that want to be the last thing done before commit; and (c) it adds complexity and overhead that I'd just as soon this patch not add. But if you want S.P. to be transactional and allowed inside functions, I think this would be the only acceptable implementation. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_retainxlog for inclusion in 9.3?
After reviewing this, it appears to me that this is really just a very verbose version of archive_command = 'sleep $initialsleep; while test $(psql -AtX -c "select pg_xlogfile_name(something) < $$%f$$ collate \"C\"") = t; do sleep $sleep; done' I think it might be better to just document this as an example. I don't quite see the overhead of maintaining another tool justified. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_retainxlog for inclusion in 9.3?
On Thu, Jan 24, 2013 at 6:04 PM, Peter Eisentraut pete...@gmx.net wrote: After reviewing this, it appears to me that this is really just a very verbose version of archive_command = 'sleep $initialsleep; while test $(psql -AtX -c select pg_xlogfile_name(something) $$%f$$ collate \C\;) = t; sleep $sleep; done' I think it might be better to just document this as an example. I don't quite see the overhead of maintaining another tool justified. Well, obviously I don't entirely agree ;) Yes, it's a convenience command. Like pg_standby was. And like many other commands that we maintain as part of *core*, such as createuser, vacuumdb, etc. Those can all be done with an even *simpler* command than the one you suggest above. So I don't see that as an argument why it wouldn't be useful. Also, the command you suggest above does not work on Windows. You can probably write a .BAT file to do it for you, but I'm pretty sure it's impossible to do it as an archive_command there. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
On 2013-01-24 12:02:59 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-01-24 11:22:52 -0500, Tom Lane wrote: Say again? Surely the temp file is being written by whichever backend is executing SET PERSISTENT, and there could be more than one. Sure, but the patch acquires SetPersistentLock exclusively beforehand which seems fine to me. Why should we have such a lock? Seems like that will probably introduce as many problems as it fixes. Deadlock risk, blockages, etc. It is not necessary for atomicity, since rename() would be atomic already. Well, the patch acquires it as-is, so I made the judgement based on that. But I think that lock isn't necessarily a bad idea. While the replacement of the values clearly is atomic due to the rename I think there's another confusing behaviour lurking: Backend A: starts Backend B: starts Backend A: reads the config Backend B: reads the config Backend A: does SET PERSISTENT foobar =..; Backend B: does SET PERSISTENT foobar =..; Now B overwrites the config change A has made as they are all stored in the same file. So the only safe way to do this seems to be: LWLockAcquire(SetPersistentLock, LW_EXCLUSIVE); ProcessConfigFile(); validate_option(); write_rename(); LWLockRelease(); We can decide not to care about the above case but the window isn't that small if no reload is implied and so this seems to be overly grotty. Any opinion whether its acceptable to allow SET PERSISTENT in functions? It seems absurd to me to allow it, but Amit seems to be of another opinion. Well, it's really a definitional question I think: do you expect that subsequent failure of the transaction should cause such a SET to roll back? I think it would be entirely self-consistent to define SET PERSISTENT as a nontransactional operation. Then the implementation would just be to write the file immediately when the command is executed, and there's no particular reason why it can't be allowed inside a transaction block. 
That's how it's implemented atm, except for not allowing it in transactions. I think the reason I am wary of allowing it inside transactions is that the config file needs to be reloaded before writing the new one, and it seems dangerous to me to reload the config in all the possible situations a function can be called from. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_retainxlog for inclusion in 9.3?
Magnus Hagander mag...@hagander.net writes: On Thu, Jan 24, 2013 at 6:04 PM, Peter Eisentraut pete...@gmx.net wrote: I think it might be better to just document this as an example. I don't quite see the overhead of maintaining another tool justified. Well, obviously I don't entirely agree ;) Yes, it's a convenience command. Like pg_standby was. And like many other commands that we maintain as part of *core*, such as createuser, vacuumdb, etc. Those can all be done with an even *simpler* command than the one you suggest above. So I don't see that as an argument why it wouldn't be useful. We've discussed removing a lot of those tools, too. Not breaking backwards compatibility is probably the only reason they're still there. In the case at hand, I seem to recall from upthread that we expect this'd be obsolete in a release or two. If that's true then I think a para or two of documentation is a better idea than a tool we'll be essentially condemned to keep maintaining forever. Also, the command you suggest above does not work on Windows. You can probably write a .BAT file to do it for you, but I'm pretty sure it's impossible to do it as an archive_command there. Perhaps we could whip up such a .BAT file and put it in the docs? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_retainxlog for inclusion in 9.3?
On Thu, Jan 24, 2013 at 6:25 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Thu, Jan 24, 2013 at 6:04 PM, Peter Eisentraut pete...@gmx.net wrote: I think it might be better to just document this as an example. I don't quite see the overhead of maintaining another tool justified. Well, obviously I don't entirely agree ;) Yes, it's a convenience command. Like pg_standby was. And like many other commands that we maintain as part of *core*, such as createuser, vacuumdb, etc. Those can all be done with an even *simpler* command than the one you suggest above. So I don't see that as an argument why it wouldn't be useful. We've discussed removing a lot of those tools, too. Not breaking backwards compatibility is probably the only reason they're still there. In the case at hand, I seem to recall from upthread that we expect this'd be obsolete in a release or two. If that's true then I think a para or two of documentation is a better idea than a tool we'll be essentially condemned to keep maintaining forever. Not really sure there is such an expectation - any more than there was such an expectation when we initially put pg_standby in there. It would be *possible* to do it, certainly. But it's not like we have an actual plan. And AFAIK the stuff that was discussed upthread was a simplified version of it - not the full flexibility. That said, it's certainly a point that we'd have to maintain it. But I don't see why we'd have to maintain it beyond the point where we included the same functionality in core, if we did. Also, the command you suggest above does not work on Windows. You can probably write a .BAT file to do it for you, but I'm pretty sure it's impossible to do it as an archive_command there. Perhaps we could whip up such a .BAT file and put it in the docs? That would probably work, yes. 
-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
Andres Freund and...@anarazel.de writes: Backend A: does SET PERSISTENT foobar =..; Backend B: does SET PERSISTENT foobar =..; Now B overwrites the config change A has made as they are all stored in the same file. Say what? I thought the plan was one setting per file, so that we don't get involved in having to parse-and-edit the file contents. What was all that argument about a new directory, if we're only using one file? If we are using just one file, then I agree a lock would be needed to synchronize updates. But that seems to add a lot of complication elsewhere. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: fix corner use case of variadic fuctions usage
Hello, here is an updated version + some lines of doc + new regress tests. There was no reply to my previous mail - so I chose concat(variadic null) --- NULL concat(variadic '{}') --- empty string This behaviour should not set a precedent for any other variadic any function, because the concat() and concat_ws() functions have a specific behaviour - when called with one parameter they return a string or empty string. Regards Pavel 2013/1/23 Pavel Stehule pavel.steh...@gmail.com: 2013/1/23 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: next related example CREATE OR REPLACE FUNCTION public.myleast(VARIADIC integer[]) RETURNS integer LANGUAGE sql AS $function$ select min(v) from unnest($1) g(v) $function$ The reason you get a null from that is that (1) unnest() produces zero rows out for either a null or empty-array input, and (2) min() over zero rows produces NULL. In a lot of cases, it's not very sane for aggregates over no rows to produce NULL; the best-known example is that SUM() produces NULL, when anyone who'd not suffered brain-damage from sitting on the SQL committee would have made it return zero. So I'm not very comfortable with generalizing from this specific case to decide that NULL is the universally right result. I am testing some situations and there is no consistent idea - can we talk just about the functions concat and concat_ws? These functions are really specific - now we are talking about a corner use case and a strongly PostgreSQL-proprietary solution, so no solution should be too bad. The difference between the solutions will be at most +/- 4 simple rows per function. Possibilities: 1) A. concat(variadic NULL) = empty string, B. concat(variadic '{}') = empty string -- if we accept @A, then @B is ok 2) A. concat(variadic NULL) = NULL, B. concat(variadic '{}') = NULL -- question - is @B valid? 3) A. concat(variadic NULL) = NULL, B. concat(variadic '{}') = empty string There is no other possibility. 
I can live with any variant - we can probably find a precedent for any variant in our code or in ANSI SQL. I like @2 as a general concept for PostgreSQL variadic functions, but when we talk about concat() and concat_ws(), @1 is valid too. Please, can somebody state an opinion early? Regards Pavel regards, tom lane variadic_any_fix_20130124.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
On 2013-01-24 12:30:02 -0500, Tom Lane wrote: Andres Freund and...@anarazel.de writes: Backend A: does SET PERSISTENT foobar =..; Backend B: does SET PERSISTENT foobar =..; Now B overwrites the config change A has made as they are all stored in the same file. Say what? I thought the plan was one setting per file, so that we don't get involved in having to parse-and-edit the file contents. What was all that argument about a new directory, if we're only using one file? I initially thought that as well (and voted for it) but after reskimming the thread and reading the patch that doesn't seem to be the case unless it's implemented in a way I don't understand: +#define PG_AUTOCONF_DIR "auto.conf.d" +#define PG_AUTOCONF_FILENAME "postgresql.auto.conf" + /* Frame auto conf name and auto conf sample temp file name */ + snprintf(AutoConfFileName, sizeof(AutoConfFileName), "%s/%s/%s", + ConfigFileDir, + PG_AUTOCONF_DIR, + PG_AUTOCONF_FILENAME); + snprintf(AutoConfTmpFileName, sizeof(AutoConfTmpFileName), "%s/%s/%s.%d", + ConfigFileDir, + PG_AUTOCONF_DIR, + PG_AUTOCONF_FILENAME, + (int) getpid()); + I don't understand what the conf.d is all about either if only one file is going to be used. If we are using just one file, then I agree a lock would be needed to synchronize updates. But that seems to add a lot of complication elsewhere. More people seem to have voted for the single file approach but I still haven't understood why... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip checkpoint on promoting from streaming replication
On 24 January 2013 16:52, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 24.01.2013 18:24, Simon Riggs wrote: On 6 January 2013 21:58, Simon Riggs si...@2ndquadrant.com wrote: I've been torn between the need to remove the checkpoint for speed and being worried about the implications of doing so. We promote in multiple use cases. When we end a PITR, or are performing a switchover, it doesn't really matter how long the shutdown checkpoint takes, so I'm inclined to leave it there in those cases. For failover, we need fast promotion. So my thinking is to make pg_ctl promote -m fast be the way to initiate a fast failover that skips the shutdown checkpoint. That way all existing applications work the same as before, while new users that explicitly choose to do so will gain from the new option. Here's a patch to skip checkpoint when we do pg_ctl promote -m fast. We keep the end of recovery checkpoint in all other cases. Hmm, there seems to be no way to do a fast promotion with a trigger file. True. I thought we were moving away from trigger files to use of promote. I'm a bit confused why there needs to be special mode for this. Can't we just always do the fast promotion? I agree that there's no urgency when you're doing PITR, but shouldn't do any harm either. Or perhaps always do fast promotion when starting up from standby mode, and slow otherwise. Are we comfortable enough with this to skip the checkpoint after crash recovery? I'm not. Maybe if we get no bugs we can make it do this always, in the next release. It's fast when it needs to be and safe otherwise. I may be missing something, but it looks like after a fast promotion, you don't request a new checkpoint. So it can take quite a while for the next checkpoint to be triggered by checkpoint_timeout/segments. That shouldn't be a problem, but I feel that it'd be prudent to request a new checkpoint immediately (not necessarily an immediate checkpoint, though). 
I thought of that and there is a long comment to explain why I didn't. Two problems: 1) an immediate checkpoint can cause a disk/resource usage spike, which is definitely not what you need just when a spike of connections and new SQL hits the system. 2) If we did that, we would have an EndOfRecovery record, some other records and then a Shutdown checkpoint. As I write this, (2) is wrong, because we shouldn't do a Shutdown checkpoint anyway. But I still think (1) is a valid concern. The only thing left from Kyotaro's patch is a single line of code - the call to ReadCheckpointRecord() that checks to see if the WAL records for the last two restartpoints are on disk, which was an important line of code. Why's that important, just for paranoia? If the last two restartpoints have disappeared, something's seriously wrong, and you will be in trouble e.g. if you crash at that point. Do we need to be extra paranoid when doing a fast promotion? The check is cheap, so what do we gain by skipping the check? Patch implements a new record type XLOG_END_OF_RECOVERY that behaves on replay like a shutdown checkpoint record. I put this back in from my patch because I believe it's important that we have a clear place where the WAL history changes timelineId. WAL format change bump required. Agreed, such a WAL record is essential. At replay, an end-of-recovery record should be a signal to the hot standby mechanism that there are no transactions running in the master at that point, same as a shutdown checkpoint. I had a reason why I didn't do that, but it seems to have slipped my mind. If I can't remember, I'll add it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] LATERAL, UNNEST and spec compliance
Folks, Andrew Gierth asked me to send this out as his email is in a parlous state at the moment. My comments will follow in replies. Without further ado: SQL2008 says, for 7.6 <table reference> 6) a) If TR is contained in a <from clause> FC with no intervening <query expression>, then the scope clause SC of TR is the <select statement: single row> or innermost <query specification> that contains FC. The scope of a range variable of TR is the <select list>, <where clause>, <group by clause>, <having clause>, and <window clause> of SC, together with every <lateral derived table> that is simply contained in FC and is preceded by TR, and every <collection derived table> that is simply contained in FC and is preceded by TR, and the <join condition> of all <joined table>s contained in SC that contain TR. If SC is the <query specification> that is the <query expression body> of a <simple table query> STQ, then the scope of a range variable of TR also includes the <order by clause> of STQ. This is the clause that defines the scope effect of LATERAL, and as can be seen, it defines <collection derived table>, i.e. UNNEST(), as having the same behaviour as <lateral derived table>. It is also worth noting at this point that pg's FROM func() syntax is not in the spec (the nearest is FROM TABLE(<collection value expression>)). Our implementation of UNNEST currently deviates from the spec by not being implicitly LATERAL; given the (sub)query SELECT * FROM sometable, UNNEST(somearray); then somearray is required to be a parameter or outer reference rather than a column of sometable. To get the spec's behaviour for this, we currently have to do: SELECT * FROM sometable, LATERAL UNNEST(somearray); which is non-standard syntax. (In the spec, only <table subquery> can follow LATERAL.) (We also don't accept the (optional) syntax of S301, allowing multiple parameters to UNNEST().) As I see it, the current options are: 1. Do nothing, and insist on non-standard use of the LATERAL keyword. 2. 
Add UNNEST to the grammar (or parse analysis) as a special case, making it implicitly LATERAL. (This would make implementing S301 easier, but special cases are ugly.) 3. Make all cases of SRFs in the FROM-clause implicitly LATERAL. (As far as I can tell, those cases whose behaviour would be changed by this actually produce errors in versions prior to 9.3, so no working code should be affected.) Since LATERAL is new in 9.3, I think the pros and cons of these choices should be considered now, rather than being allowed to slide by unexamined. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
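[Editor's sketch of what option 3 would mean in practice. The table and column names are illustrative only; the first query uses the explicit, non-standard LATERAL spelling that 9.3 accepts today, and the second shows the kind of FROM-clause SRF reference that would become implicitly lateral under that option:]

```sql
-- Hypothetical table for illustration.
CREATE TABLE sometable (id int, n int, somearray int[]);

-- Works in 9.3 only with the explicit (non-standard) LATERAL keyword:
SELECT s.id, u.elem
FROM sometable s, LATERAL UNNEST(s.somearray) AS u(elem);

-- Under option 3, any SRF in FROM could reference earlier FROM items
-- without the keyword, matching the spec's behaviour for
-- collection derived tables:
SELECT s.id, g
FROM sometable s, generate_series(1, s.n) AS g;
```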
Re: [HACKERS] Materialized views WIP patch
On Thu, Jan 17, 2013 at 07:54:55AM -0500, Robert Haas wrote: On Wed, Jan 16, 2013 at 1:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Where I really need someone to hit me upside the head with a clue-stick is the code I added to the bottom of RelationBuildDesc() in relcache.c. The idea is that on first access to an unlogged MV, to detect that the heap has been replaced by the init fork, set relisvalid to false, and make the heap look normal again. Hmm. I agree that relcache.c has absolutely no business doing that, but not sure what else to do instead. Seems like it might be better done at first touch of the MV in the parser, rewriter, or planner --- but the fact that I can't immediately decide which of those is right makes me feel that it's still too squishy. I think we shouldn't be doing that at all. The whole business of transferring the relation-is-invalid information from the relation to a pg_class flag seems like a Rube Goldberg device to me. I'm still not convinced that we should have a relation-is-invalid flag at all, but can we at least not have two? It seems perfectly adequate to detect that the MV is invalid when we actually try to execute a plan - that is, when we first access the heap or one of its indexes. So the bit can just live in the file-on-disk, and there's no need to have a second copy of it in pg_class. Like Kevin, I want a way to distinguish unpopulated MVs from MVs that genuinely yielded the empty set at last refresh. I agree that there's no particular need to store that fact in pg_class, and I would much prefer only storing it one way in any case. A user-visible disadvantage of the current implementation is that relisvalid remains stale until something opens the rel. That's fine for the system itself, but it can deceive user-initiated catalog queries. Imagine a check_postgres action that looks for invalid MVs to complain about. It couldn't just scan pg_class; it would need to first do something that opens every MV. I suggest the following: 1. 
Let an invalid MV have a zero-length heap. Distinguish a valid, empty MV by giving it a page with no tuples. This entails VACUUM[1] not truncating MVs below one page and the refresh operation, where necessary, extending the relation from zero pages to one. 2. Remove pg_class.relisvalid. 3. Add a bool field to RelationData. The word valid is used in that context to refer to the validity of the structure itself, so perhaps call the new field rd_scannable. RelationIsFlaggedAsValid() can become a macro; consider changing its name for consistency with the field name. 4. During relcache build, set the field to RelationGetNumberOfBlocks(rel) != 0 for MVs, fixed true for everyone else. Any operation that changes the field must, and probably would anyway, instigate a relcache invalidation. 5. Expose a database function, say pg_relation_scannable(), for querying the current state of a relation. This supports user-level monitoring. Does that seem reasonable? One semantic difference to keep in mind is that unlogged MVs will be considered invalid on the standby while valid on the master. That's essentially an accurate report, so I won't mind it. For the benefit of the archives, I note that we almost need not truncate an unlogged materialized view during crash recovery. MVs are refreshed in a VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's pg_class to that relfilenode. When a crash occurs with no refresh in flight, the latest refresh had been safely synced. When a crash cuts short a refresh, the pg_class update will not stick, and the durability of the old heap is not in doubt. However, non-btree index builds don't have the same property; we would need to force an immediate sync of the indexes to be safe here. It would remain necessary to truncate unlogged MVs when recovering a base backup, which may contain a partially-written refresh that did eventually commit. 
Future MV variants that modify the MV in place would also need the usual truncate on crash. I'm going to follow this with a review covering a broader range of topics. Thanks, nm [1] For the time being, it's unfortunate to VACUUM materialized views at all; they only ever bear frozen tuples.
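[Editor's note: point 5 of the proposal above could then support monitoring queries along these lines. pg_relation_scannable() is only the function proposed in this message, not an existing catalog function:]

```sql
-- Sketch only: pg_relation_scannable() is the proposed function, not an
-- existing one. Find materialized views that are currently unscannable
-- (i.e. never populated, or reset by recovery for unlogged MVs):
SELECT c.oid::regclass AS matview
FROM pg_class c
WHERE c.relkind = 'm'
  AND NOT pg_relation_scannable(c.oid);
```

A check_postgres-style monitor could run exactly this, without first opening every MV.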
Re: [HACKERS] Materialized views WIP patch
Hi Kevin, The patch conflicts with git master; I tested against master@{2013-01-20}. On Wed, Jan 16, 2013 at 12:40:55AM -0500, Kevin Grittner wrote: I've been struggling with two areas: - pg_dump sorting for MVs which depend on other MVs From your later messages, I understand that you have a way forward on this. - proper handling of the relisvalid flag for unlogged MVs after recovery I have discussed this in a separate email. While reading the patch to assess that topic, I found a few more things:

*** a/contrib/pg_upgrade/version_old_8_3.c
--- b/contrib/pg_upgrade/version_old_8_3.c
*** 145,151 **** old_8_3_check_for_tsquery_usage(ClusterInfo *cluster)
  			FROM pg_catalog.pg_class c,
  				pg_catalog.pg_namespace n,
  				pg_catalog.pg_attribute a
! 			WHERE c.relkind = 'r' AND
  				c.oid = a.attrelid AND
  				NOT a.attisdropped AND
  				a.atttypid = 'pg_catalog.tsquery'::pg_catalog.regtype AND
--- 145,151 ----
  			FROM pg_catalog.pg_class c,
  				pg_catalog.pg_namespace n,
  				pg_catalog.pg_attribute a
! 			WHERE c.relkind in ('r', 'm') AND
  				c.oid = a.attrelid AND
  				NOT a.attisdropped AND
  				a.atttypid = 'pg_catalog.tsquery'::pg_catalog.regtype AND

PostgreSQL 8.3 clusters won't contain materialized views, so it doesn't really matter whether this change happens or not. I suggest adding a comment, whether or not you keep the code change.

*** a/contrib/sepgsql/sepgsql.h
--- b/contrib/sepgsql/sepgsql.h
*** 32,37 ****
--- 32,39 ----
  /*
   * Internally used code of object classes
+ *
+ * NOTE: Materialized views are treated as tables for now.

This smells like a bypass of mandatory access control. Unless you've determined that this is correct within the sepgsql security model, I suggest starting with a draconian policy, like simply crippling MVs. Even if you have determined that, separating out the nontrivial sepgsql support might be good. The set of ideal reviewers is quite different. 
 */
#define SEPG_CLASS_PROCESS 0
#define SEPG_CLASS_FILE 1

*** a/contrib/vacuumlo/vacuumlo.c
--- b/contrib/vacuumlo/vacuumlo.c
*** 209,215 **** vacuumlo(const char *database, const struct _param * param)
  	strcat(buf, " AND a.atttypid = t.oid ");
  	strcat(buf, " AND c.relnamespace = s.oid ");
  	strcat(buf, " AND t.typname in ('oid', 'lo') ");
! 	strcat(buf, " AND c.relkind = 'r'");
  	strcat(buf, " AND s.nspname !~ '^pg_'");
  	res = PQexec(conn, buf);
  	if (PQresultStatus(res) != PGRES_TUPLES_OK)
--- 209,215 ----
  	strcat(buf, " AND a.atttypid = t.oid ");
  	strcat(buf, " AND c.relnamespace = s.oid ");
  	strcat(buf, " AND t.typname in ('oid', 'lo') ");
! 	strcat(buf, " AND c.relkind in ('r', 'm')");

It concerns me slightly that older vacuumlo could silently remove large objects still referenced by MVs. Only slightly, though, because the next MV refresh would remove those references anyway. Is there anything we should do to help that situation? If nothing else, perhaps backpatch this patch hunk.

+<varlistentry>
+ <term><literal>WITH OIDS</literal></term>
+ <term><literal>WITHOUT OIDS</literal></term>
+ <listitem>
+  <para>
+   These are obsolescent syntaxes equivalent to <literal>WITH (OIDS)</literal>
+   and <literal>WITH (OIDS=FALSE)</literal>, respectively. If you wish to give
+   both an <literal>OIDS</literal> setting and storage parameters, you must use
+   the <literal>WITH ( ... )</literal> syntax; see above.
+  </para>
+ </listitem>
+</varlistentry>

Let's not support OIDs on MVs. They'll be regenerated on every refresh.

*** 336,342 **** ExplainOneQuery(Query *query, IntoClause *into, ExplainState *es,
  */
 void
 ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
! 				  const char *queryString, ParamListInfo params)
 {
 	if (utilityStmt
[HACKERS] NOT VALID FKs and Constraints
9.1 introduced delayed validation for FKs, and 9.2 did the same for table constraints; however, neither has been as useful as it could be, because the lesser-locks-on-ALTER-TABLE work was reverted (see http://www.postgresql.org/message-id/1330350691-su...@alvh.no-ip.org), which prevents the lock benefits during the VALIDATE stage. I don't see any commits since then about either one. Is fixing this still on the radar for 9.3? Also, is a similar feature planned (for 9.3 or further out) for NOT NULL column constraints? Thanks, Cody Cutrer
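[Editor's note: for reference, the two-step pattern under discussion looks like this. Table and constraint names are illustrative only:]

```sql
-- Add the FK without scanning existing rows (9.1+); only new rows
-- are checked at this point:
ALTER TABLE orders
  ADD CONSTRAINT orders_customer_fk
  FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;

-- Later, validate the existing rows. This is the step whose locking the
-- reverted lock-reduction work was meant to lighten:
ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_fk;
```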
Re: [HACKERS] [PATCH] pg_isready (was: [WIP] pg_ping utility)
On Fri, Jan 25, 2013 at 1:45 AM, Phil Sorber p...@omniti.com wrote: On Wed, Jan 23, 2013 at 8:12 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Jan 24, 2013 at 8:47 AM, Phil Sorber p...@omniti.com wrote: On Wed, Jan 23, 2013 at 6:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Phil Sorber p...@omniti.com writes: On Wed, Jan 23, 2013 at 1:58 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Jan 23, 2013 at 12:27:45PM -0500, Tom Lane wrote: +1 for default timeout --- if this isn't like ping where you are expecting to run indefinitely, I can't see that it's a good idea for it to sit very long by default, in any circumstance. FYI, the pg_ctl -w (wait) default is 60 seconds: Great. That is what I came to on my own as well. Figured that might be a sticking point, but as there is precedent, I'm happy with it. I'm not sure that's a relevant precedent at all. What that number is is the time that pg_ctl will wait around for the postmaster to start or stop before reporting a problem --- and in either case, a significant delay (multiple seconds) is not surprising, because of crash-recovery work, shutdown checkpointing, etc. For pg_isready, you'd expect to get a response more or less instantly, wouldn't you? Personally, I'd decide that pg_isready is broken if it didn't give me an answer in a couple of seconds, much less a minute. What I had in mind was a default timeout of maybe 3 or 4 seconds... I was thinking that if it was in a script you wouldn't care if it was 60 seconds. If it was at the command line you would ^C it much earlier. I think the default Linux TCP connection timeout is around 20 seconds. My feeling is everyone is going to have a differing opinion on this, which is why I was hoping that some good precedent existed. I'm fine with 3 or 4, whatever can be agreed upon. +1 with 3 or 4 seconds. Aside from this issue, I have one minor comment. ISTM you need to add the following line to the end of the help message. 
This line has been included in the help message of all the other client commands. Report bugs to pgsql-b...@postgresql.org. Ok, I set the default timeout to 3 seconds, added the bugs email to the help, and also added docs that I forgot last time. Thanks! Also, still hoping to get some feedback on my other issues. set_pglocale_pgservice() should be called? I think that the command name (i.e., pg_isready) should be given to PQpingParams() as fallback_application_name. Otherwise, the server by default uses "unknown" as the application name of pg_isready. It's undesirable. Why isn't the following message output only when an invalid option is specified? Try "%s --help" for more information. When a conninfo string including the hostname or port number is specified in the -d option, pg_isready displays the wrong information, as follows. $ pg_isready -d port= /tmp:5432 - no response Regards, -- Fujii Masao
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On Thu, Jan 24, 2013 at 6:14 AM, Andres Freund and...@2ndquadrant.com wrote: That's way more along the lines of what I am afraid of than the performance stuff - but Heikki cited those, so I replied to that. Note that I didn't say this must, must go in - I just don't think Heikki's reasoning about why not hit the nail on the head. Fair enough, no argument. Before getting bogged down in technical commentary, let me say this very clearly: I am enormously grateful for your work on this project. Logical replication based on WAL decoding is a feature of enormous value that PostgreSQL has needed for a long time, and your work has made that look like an achievable goal. Furthermore, it seems to me that you have pursued the community process with all the vigor and sincerity for which anyone could ask. Serious design concerns were raised early in the process and you made radical changes to the design which I believe have improved it tremendously, and you've continued to display an outstanding attitude at every phase of this process about which I can't say enough good things. There is no question in my mind that this work is going to be the beginning of a process that revolutionizes the way people think about replication and PostgreSQL, and you deserve our sincere thanks for that. Now, the bad news is, I don't think it's very reasonable to try to commit this to 9.3. I think it is just too much stuff too late in the cycle. I've reviewed some of the patches from time to time but there is a lot more stuff and it's big and complicated and it's not really clear that we have the interface quite right yet, even though I think it's also clear that we are a lot closer than we were. I don't want to be fixing that during beta, much less after release. I tried very, very hard to get the basics of the design interface solid. Which obviously doesn't mean I am succeeding - luckily not being superhuman after all ;). 
And I think that's very much where input is desperately needed and where I failed to raise enough attention. The output plugin interface followed by the walsender interface is what needs to be most closely vetted. Those are the permanent, user/developer-exposed UI and the ones we should try to keep as stable as possible. The output plugin callbacks are defined here: http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/include/replication/output_plugin.h;hb=xlog-decoding-rebasing-cf4 To make it more agnostic of the technology used to implement changeset extraction we possibly should replace the ReorderBuffer(TXN|Change) structs being passed by something more implementation agnostic. walsender interface: http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/replication/repl_gram.y;hb=xlog-decoding-rebasing-cf4 The interesting new commands are: 1) K_INIT_LOGICAL_REPLICATION NAME NAME 2) K_START_LOGICAL_REPLICATION NAME RECPTR plugin_options 3) K_FREE_LOGICAL_REPLICATION NAME 1 and 3 allocate (respectively free) the permanent state associated with one changeset consumer, whereas START_LOGICAL_REPLICATION streams out changes starting at RECPTR. Forgive me for not having looked at the patch, but to what extent is all this, ah, documented? Btw, there are currently *no* changes to the wal format at all if wal_level < logical, except that xl_running_xacts are logged more frequently, which obviously could easily be made conditional. Barring bugs, of course. The changes with wal_level=logical aren't that big either imo: * heap_insert, heap_update prevent full page writes from removing their normal record by using a separate XLogRecData block for the buffer and the record * heap_delete adds more data (the pkey of the tuple) after the unchanged xl_heap_delete struct * On changes to catalog tables, (relfilenode, tid, cmin, cmax) are logged. 
No changes to mvcc for normal backends at all, unless you count the very slightly changed *Satisfies interface (getting passed a HeapTuple instead of HeapTupleHeader). I am not sure what you're concerned about WRT the on-disk format of the tuples? We are pretty much nailed down on that due to pg_upgrade'ability anyway, and it could be changed from this patch's POV without a problem, the output plugin just sees normal HeapTuples? Or are you concerned about the code extracting them from the xlog records? Mostly, my concern is that you've accidentally broken something, or that your code will turn out to be flaky in ways we can't now predict. My only really specific concern at this point is about the special treatment of catalog tables. We've never done anything like that before, and it feels like a bad idea. In particular, the fact that you have to WAL-log new information about cmin/cmax really suggests that we're committing ourselves to the MVCC infrastructure in a way that we weren't previously. There's some category of stuff that our MVCC implementation didn't previously
Re: [HACKERS] Materialized views WIP patch
Thanks for looking at this! Noah Misch wrote: For the benefit of the archives, I note that we almost need not truncate an unlogged materialized view during crash recovery. MVs are refreshed in a VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's pg_class to that relfilenode. When a crash occurs with no refresh in flight, the latest refresh had been safely synced. When a crash cuts short a refresh, the pg_class update will not stick, and the durability of the old heap is not in doubt. However, non-btree index builds don't have the same property; we would need to force an immediate sync of the indexes to be safe here. It would remain necessary to truncate unlogged MVs when recovering a base backup, which may contain a partially-written refresh that did eventually commit. Future MV variants that modify the MV in place would also need the usual truncate on crash. Hmm. That's a very good observation. Perhaps the issue can be punted to a future release where we start adding more incremental updates to them. I'll think on that, but on the face of it, it sounds like the best choice. I'm going to follow this with a review covering a broader range of topics. I'll need time to digest the rest of it. As you note, recent commits conflict with the last patch. Please look at the github repo where I've been working on this. I'll post an updated patch later today. https://github.com/kgrittn/postgres/tree/matview You might want to ignore the interim work on detecting the new pg_dump dependencies through walking the internal structures. I decided that was heading in a direction which might be unnecessarily fragile and slow; so I tried writing it as a query against the system tables. I'm pretty happy with the results. 
Here's the query:

with recursive w as (
  select d1.objid, d1.objid as wrkid, d2.refobjid, c2.relkind as refrelkind
  from pg_depend d1
  join pg_class c1 on c1.oid = d1.objid and c1.relkind = 'm' and c1.relisvalid
  join pg_rewrite r1 on r1.ev_class = d1.objid
  join pg_depend d2 on d2.classid = 'pg_rewrite'::regclass
    and d2.objid = r1.oid and d2.refobjid <> d1.objid
  join pg_class c2 on c2.oid = d2.refobjid
    and c2.relkind in ('m','v') and c2.relisvalid
  where d1.classid = 'pg_class'::regclass
  union
  select w.objid, w.refobjid as wrkid, d3.refobjid, c3.relkind as refrelkind
  from w
  join pg_rewrite r3 on r3.ev_class = w.refobjid
  join pg_depend d3 on d3.classid = 'pg_rewrite'::regclass
    and d3.objid = r3.oid and d3.refobjid <> w.refobjid
  join pg_class c3 on c3.oid = d3.refobjid
    and c3.relkind in ('m','v') and c3.relisvalid
  where w.refrelkind <> 'm'
),
x as (
  select objid::regclass, refobjid::regclass from w where refrelkind = 'm'
)
select 'm'::text as type, x.objid, x.refobjid from x
union all
select 'i'::text as type, x.objid, i.indexrelid as refobjid
from x
join pg_index i on i.indrelid = x.refobjid and i.indisvalid;

If we bail on having pg_class.relisvalid, then it will obviously need adjustment. -Kevin
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andres Freund escribió: I somewhat dislike the fact that CONCURRENTLY isn't really concurrent here (for the listeners: swapping the indexes acquires exclusive locks), but I don't see any other naming being better. REINDEX ALMOST CONCURRENTLY? I'm kind of unconvinced of the value proposition of this patch. I mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY today, so ... how is this better? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Thu, Jan 24, 2013 at 01:29:56PM -0500, Robert Haas wrote: On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andres Freund escribió: I somewhat dislike the fact that CONCURRENTLY isn't really concurrent here (for the listeners: swapping the indexes acquires exclusive locks), but I don't see any other naming being better. REINDEX ALMOST CONCURRENTLY? I'm kind of unconvinced of the value proposition of this patch. I mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY today, so ... how is this better? This has been on the TODO list for a while, and I don't think the renaming-in-a-transaction work needed to use drop/create is really something we want to force on users. In addition, doing that for all tables in a database is even more work, so I would be disappointed _not_ to get this feature in 9.3. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
* Robert Haas (robertmh...@gmail.com) wrote: Now, the bad news is, I don't think it's very reasonable to try to commit this to 9.3. I think it is just too much stuff too late in the cycle. I've reviewed some of the patches from time to time but there is a lot more stuff and it's big and complicated and it's not really clear that we have the interface quite right yet, even though I think it's also clear that we are a lot of closer than we were. I don't want to be fixing that during beta, much less after release. The only way to avoid this happening again and again, imv, is to get it committed early in whatever cycle it's slated to release for. We've got some serious challenges there though because we want to encourage everyone to focus on beta testing and going through the release process, plus we don't want to tag/branch too early or we create more work for ourselves. It would have been nice to get this into 9.3, but I can certainly understand needing to move it back, but can we get a slightly more specific plan around getting it in then? Thanks, Stephen
Re: [HACKERS] Strange Windows problem, lock_timeout test request
On Sat, Jan 19, 2013 at 12:15 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/19/2013 02:36 AM, Boszormenyi Zoltan wrote: A long time ago I had a lot of sympathy with this answer, but these days not so much. Getting a working mingw/msys environment sufficient to build a bare bones PostgreSQL from scratch is both cheap and fairly easy. The improvements that mingw has made in its install process, and the presence of cheap or free windows instances in the cloud combine to make this pretty simple. But since it's still slightly involved here is how I constructed one such this morning: I've used this description, skipping the Amazon part and putting it directly on my Windows computer, and it worked. Except bin/pg_ctl does not work. It just silently exits without doing anything, so I have to use bin/postgres to start the database (which is what make check uses anyway, so not a problem if you just want make check). Is that just me or is that a known problem? I've seen some discussion from 2004, but didn't find a conclusion. What advantages does mingw have over MSVC? Is it just that it is cost-free, or is it easier to use mingw than MSVC for someone used to building on Linux? (mingw certainly does not seem to have the advantage of being fast!) Would you like to put this somewhere on wiki.postgresql.org, or would you mind if I do so? Thanks for the primer, Jeff
Re: [HACKERS] Support for REINDEX CONCURRENTLY
Bruce Momjian br...@momjian.us writes: On Thu, Jan 24, 2013 at 01:29:56PM -0500, Robert Haas wrote: I'm kind of unconvinced of the value proposition of this patch. I mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY today, so ... how is this better? This has been on the TODO list for a while, and I don't think the renaming in a transaction work needed to use drop/create is really something we want to force on users. In addition, doing that for all tables in a database is even more work, so I would be disappointed _not_ to get this feature in 9.3. I haven't given the current patch a look, but based on previous discussions, this isn't going to be more than a macro for things that users can do already --- that is, it's going to be basically DROP CONCURRENTLY plus CREATE CONCURRENTLY plus ALTER INDEX RENAME, including the fact that the RENAME step will transiently need an exclusive lock. (If that's not what it's doing, it's broken.) So there's some convenience argument for it, but it's hardly amounting to a stellar improvement. I'm kind of inclined to put it off till after we fix the SnapshotNow race condition problems; at that point it should be possible to do REINDEX CONCURRENTLY more simply and without any exclusive lock anywhere. regards, tom lane
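[Editor's sketch of the manual sequence Tom describes, for a hypothetical index idx on table t; the RENAME step briefly taking an exclusive lock is the point being made:]

```sql
-- Build a replacement index without blocking writes:
CREATE INDEX CONCURRENTLY idx_new ON t (col);

-- Drop the old one without blocking writes:
DROP INDEX CONCURRENTLY idx;

-- Swap the name back; this step transiently needs an exclusive lock:
ALTER INDEX idx_new RENAME TO idx;
```

Note that, as Andres points out elsewhere in the thread, CONCURRENTLY cannot run inside a transaction block, so this sequence cannot be made atomic.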
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2013-01-24 13:29:56 -0500, Robert Haas wrote: On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andres Freund escribió: I somewhat dislike the fact that CONCURRENTLY isn't really concurrent here (for the listeners: swapping the indexes acquires exclusive locks), but I don't see any other naming being better. REINDEX ALMOST CONCURRENTLY? I'm kind of unconvinced of the value proposition of this patch. I mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY today, so ... how is this better? In the wake of beb850e1d873f8920a78b9b9ee27e9f87c95592f I wrote a script to do this and it really is harder than one might think: * you cannot do it in the database as CONCURRENTLY cannot be used in a TX * you cannot do it to toast tables (this is currently broken in the patch but should be fixable) * you cannot legally do it when foreign keys reference your unique key * you cannot do it to exclusion constraints or non-immediate indexes All of those are fixable (and most are) within REINDEX CONCURRENTLY, so I find that to be a major feature even if it's not as good as it could be. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 24.01.2013 20:27, Robert Haas wrote: Before getting bogged down in technical commentary, let me say this very clearly: I am enormously grateful for your work on this project. Logical replication based on WAL decoding is a feature of enormous value that PostgreSQL has needed for a long time, and your work has made that look like an achievable goal. Furthermore, it seems to me that you have pursued the community process with all the vigor and sincerity for which anyone could ask. Serious design concerns were raised early in the process and you made radical changes to the design which I believe have improved it tremendously, and you've continued to display an outstanding attitude at every phase of this process about which I can't say enough good things. +1. I really appreciate all the work you, Andres, have put into this. I've argued in the past myself that there should be a little tool that scrapes the WAL to do logical replication. Essentially, just what you've implemented. That said (hah, you knew there would be a but ;-)), now that I see what that looks like, I'm feeling that maybe it wasn't such a good idea after all. It sounded like a fairly small patch that greatly reduces the overhead in the master with existing replication systems like Slony, but it turned out to be a huge patch with a lot of new concepts and interfaces. - Heikki
Re: [HACKERS] Skip checkpoint on promoting from streaming replication
On 24 January 2013 17:44, Simon Riggs si...@2ndquadrant.com wrote: At replay, an end-of-recovery record should be a signal to the hot standby mechanism that there are no transactions running in the master at that point, same as a shutdown checkpoint. I had a reason why I didn't do that, but it seems to have slipped my mind. If I can't remember, I'll add it. I think it was simply to keep things simple and avoid bugs in this release. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] gistchoose vs. bloat
On 21.01.2013 15:19, Heikki Linnakangas wrote: On 21.01.2013 15:06, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote: I looked at this patch. ISTM we should not have the option at all but just do it always. I cannot believe that always-go-left is ever a preferable strategy in the long run; the resulting imbalance in the index will surely kill any possible benefit. Even if there are some cases where it miraculously fails to lose, how many users are going to realize that applies to their case and make use of the option? Sounds good to me. If I remember correctly, there was also an argument that it may be useful for repeatable test results. That's a little questionable for performance (except in those cases where few penalties are identical anyway), but could plausibly be useful for a crash report or something. Meh. There's already a random decision, in the equivalent place and for a comparable reason, in btree (cf _bt_findinsertloc). Nobody's ever complained about that being indeterminate, so I'm unconvinced that there's a market for it with gist. I wonder if it would work for gist to do something similar to _bt_findinsertloc, and have a bias towards the left page, but sometimes descend to one of the pages to the right. You would get the cache locality of usually descending down the same subtree, but also fill the pages to the right. Jeff / Alexander, want to give that a shot? I did some experimenting with that. I used the same test case Alexander did, with geonames data, and compared the unpatched version, the original patch, and the attached patch that biases towards the first best tuple found, but still sometimes chooses the other equally good ones. 
    testname    | initsize | finalsize | idx_blks_read | idx_blks_hit
----------------+----------+-----------+---------------+--------------
 patched-10-4mb | 75497472 |  90202112 |       5853604 |     10178331
 unpatched-4mb  | 75145216 |  94863360 |       5880676 |     10185647
 unpatched-4mb  | 75587584 |  97165312 |       5903107 |     10183759
 patched-2-4mb  | 74768384 |  81403904 |       5768124 |     10193738
 origpatch-4mb  | 74883072 |  82182144 |       5783412 |     10185373

All these tests were performed with shared_buffers=4MB, so that the index won't fit completely in shared buffers, and I could use the idx_blks_read/hit ratio as a measure of cache-friendliness.

The origpatch test was with a simplified version of Alexander's patch, see attached. It's the same as the original, but with the randomization option and gist-build related code removed. The patched-10 and patched-2 tests are two variants with my patch, with 1/10 and 1/2 chance of moving to the next equal tuple, respectively.

The differences in cache hit ratio might be just a result of different index sizes. I included two unpatched runs above to show that there's a fair amount of noise in these tests. That's because of the random nature of the test case; it picks rows to delete and insert at random.

I think the conclusion is that all of these patches are effective. The 1/10 variant is less effective, as expected, as it's closer in behavior to the unpatched behavior than the others. The 1/2 variant seems as good as the original patch.

I performed another test, by inserting 100 duplicate values on an empty table.

  testname  | finalsize | idx_blks_read | idx_blks_hit
------------+-----------+---------------+--------------
 unpatched  |  89030656 |         21350 |      2972033
 patched-10 |  88973312 |         21450 |      2971920
 patched-2  |  88481792 |         22947 |      2970260
 origpatch  |  61186048 |        761817 |      2221500

The original patch produces a much smaller index in this test, but since the inserts are distributed randomly, the pages are accessed randomly, which is bad for caching. A table full of duplicates isn't very realistic, but overall, I'm leaning towards my version of this patch (gistchoose-2.patch).
It has less potential for causing a regression in existing applications, but is just as effective in the original scenario of repeated delete+insert.

- Heikki

duplicatetest.sh
Description: Bourne shell script

gistbloat.sh
Description: Bourne shell script

diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 8b60774..75778f6 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -379,6 +379,7 @@ gistchoose(Relation r, Page p, IndexTuple it, /* it has compressed entry */
 	GISTENTRY	entry, identry[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
+	int			look_further_on_equal = -1;
 
 	Assert(!GistPageIsLeaf(p));
@@ -446,6 +447,8 @@ gistchoose(Relation r, Page p, IndexTuple it, /* it has compressed entry */
 			if (j < r->rd_att->natts - 1)
 				best_penalty[j + 1] = -1;
+
+			look_further_on_equal = -1;
 		}
 		else if (best_penalty[j] == usize)
 		{
@@ -468,12 +471,46 @@
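For readers following along, the tie-breaking idea in the patches above can be sketched in isolation. The following is a self-contained illustration of the keep-or-decide-once strategy, not the actual gistchoose() code; the function name, the penalty array, and the injected coin function are all invented for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Coin flip, injected so the sketch is testable without a real PRNG. */
typedef int (*coin_fn)(void);

/*
 * Hypothetical helper mirroring the strategy discussed in the thread:
 * scan penalties[0..n-1], track the lowest penalty, and on a tie make
 * at most one random decision whether to keep the current best for all
 * subsequent ties.  Finding a strictly better tuple resets the decision.
 */
static size_t
choose_index(const double *penalties, size_t n, coin_fn coin)
{
	size_t	best = 0;
	int		keep_current_best = -1;	/* -1: undecided, 1: keep, 0: replace */

	for (size_t i = 1; i < n; i++)
	{
		if (penalties[i] < penalties[best])
		{
			best = i;
			keep_current_best = -1;	/* new best resets the keep-it decision */
		}
		else if (penalties[i] == penalties[best])
		{
			if (keep_current_best < 0)
				keep_current_best = coin();	/* decide once */
			if (!keep_current_best)
			{
				best = i;				/* replace, then become undecided again */
				keep_current_best = -1;
			}
		}
	}
	return best;
}
```

With a coin that always says "keep", ties are resolved in favor of the lowest offset; with a coin that always says "replace", the last equally-good tuple wins. The real patch additionally breaks out early on zero-penalty tuples, which this sketch omits.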
Re: [HACKERS] [PATCH] pg_isready (was: [WIP] pg_ping utility)
On Thu, Jan 24, 2013 at 1:12 PM, Fujii Masao masao.fu...@gmail.com wrote:

set_pglocale_pgservice() should be called? I think that the command name (i.e., pg_isready) should be given to PQpingParams() as fallback_application_name. Otherwise, the server by default uses "unknown" as the application name of pg_isready. That's undesirable. Why isn't the following message output only when an invalid option is specified? Try "%s --help" for more information.

I've updated the patch to address these three issues. Attached.

When a conninfo string including the hostname or port number is specified in the -d option, pg_isready displays the wrong information, as follows.

$ pg_isready -d port=
/tmp:5432 - no response

This is what I asked about in my previous email about precedence of the parameters. I can parse that with PQconninfoParse, but what are the rules for merging both individual and conninfo params together? For example, if someone did:

pg_isready -h foo -d "host=bar port=4321" -p 1234

what should the connection parameters be?

Regards,

-- 
Fujii Masao

pg_isready_timeout_v3.diff
Description: Binary data
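For what it's worth, one plausible answer to the precedence question is last-option-wins merging: process the options in command-line order, expand the -d conninfo string (e.g. via PQconninfoParse) into its individual keywords at the point where -d appears, and let any later option overwrite an earlier value before handing the final keyword/value arrays to PQpingParams(). A minimal sketch of that merge policy (the struct and helper names are invented, and this is only one possible rule, not something libpq mandates):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 8

/*
 * Tiny last-wins parameter table (hypothetical).  Real code would pass
 * the resulting keys[]/vals[] arrays, NULL-terminated, to PQpingParams().
 */
struct conn_params
{
	const char *keys[MAX_PARAMS];
	const char *vals[MAX_PARAMS];
	int			n;
};

/* Set a parameter; a later call for the same keyword overwrites it. */
static void
set_param(struct conn_params *p, const char *key, const char *val)
{
	for (int i = 0; i < p->n; i++)
	{
		if (strcmp(p->keys[i], key) == 0)
		{
			p->vals[i] = val;	/* later option wins */
			return;
		}
	}
	p->keys[p->n] = key;
	p->vals[p->n] = val;
	p->n++;
}

static const char *
get_param(const struct conn_params *p, const char *key)
{
	for (int i = 0; i < p->n; i++)
		if (strcmp(p->keys[i], key) == 0)
			return p->vals[i];
	return NULL;
}
```

Under this policy, pg_isready -h foo -d "host=bar port=4321" -p 1234 would end up with host=bar (the conninfo string came after -h) and port=1234 (the -p came last).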
Re: [HACKERS] Strange Windows problem, lock_timeout test request
On 01/24/2013 01:44 PM, Jeff Janes wrote:

On Sat, Jan 19, 2013 at 12:15 PM, Andrew Dunstan and...@dunslane.net wrote:

On 01/19/2013 02:36 AM, Boszormenyi Zoltan wrote:

A long time ago I had a lot of sympathy with this answer, but these days not so much. Getting a working mingw/msys environment sufficient to build a bare-bones PostgreSQL from scratch is both cheap and fairly easy. The improvements that mingw has made in its install process, and the presence of cheap or free Windows instances in the cloud, combine to make this pretty simple. But since it's still slightly involved, here is how I constructed one such this morning:

I've used this description, skipping the Amazon part and putting it directly on my Windows computer, and it worked. Except bin/pg_ctl does not work. It just silently exits without doing anything, so I have to use bin/postgres to start the database (which is what make check uses anyway, so not a problem if you just want make check). Is that just me or is that a known problem? I've seen some discussion from 2004, but didn't find a conclusion.

Did you copy libpq.dll from the lib directory to the bin directory? If not, try that and see if it fixes the problem.

What advantages does mingw have over MSVC? Is it just that it is cost-free, or is it easier to use mingw than MSVC for someone used to building on Linux? (mingw certainly does not seem to have the advantage of being fast!)

See Craig's email today about problems with SDKs and availability of

Would you like to put this somewhere on wiki.postgresql.org, or would you mind if I do so?

Thanks for the primer,

Jeff
Re: [HACKERS] gistchoose vs. bloat
Heikki Linnakangas hlinnakan...@vmware.com writes:

I did some experimenting with that. I used the same test case Alexander did, with geonames data, and compared the unpatched version, the original patch, and the attached patch that biases towards the first best tuple found, but still sometimes chooses one of the other equally good ones.

    testname    | initsize | finalsize | idx_blks_read | idx_blks_hit
----------------+----------+-----------+---------------+--------------
 patched-10-4mb | 75497472 |  90202112 |       5853604 |     10178331
 unpatched-4mb  | 75145216 |  94863360 |       5880676 |     10185647
 unpatched-4mb  | 75587584 |  97165312 |       5903107 |     10183759
 patched-2-4mb  | 74768384 |  81403904 |       5768124 |     10193738
 origpatch-4mb  | 74883072 |  82182144 |       5783412 |     10185373

I think the conclusion is that all of these patches are effective. The 1/10 variant is less effective, as expected, as it's closer in behavior to the unpatched behavior than the others. The 1/2 variant seems as good as the original patch.

At least on this example, it seems a tad better, if you look at index size.

A table full of duplicates isn't very realistic, but overall, I'm leaning towards my version of this patch (gistchoose-2.patch). It has less potential for causing a regression in existing applications, but is just as effective in the original scenario of repeated delete+insert.

+1 for this patch, but I think the comments could use more work. I was convinced it was wrong on first examination, mainly because it's hard to follow the underdocumented look_further_on_equal logic. I propose the attached, which is the same logic with better comments (I also chose to rename and invert the sense of the state variable, because it seemed easier to follow this way ... YMMV on that though).
regards, tom lane

diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 8b60774f50349222ee04fef13613e3592120973b..a127627334e80704edeed38d3522e4000a695abb 100644
*** a/src/backend/access/gist/gistutil.c
--- b/src/backend/access/gist/gistutil.c
*** gistchoose(Relation r, Page p, IndexTupl
*** 379,384 ****
--- 379,385 ----
  	GISTENTRY	entry, identry[INDEX_MAX_KEYS];
  	bool		isnull[INDEX_MAX_KEYS];
+ 	int			keep_current_best;
  
  	Assert(!GistPageIsLeaf(p));
*** gistchoose(Relation r, Page p, IndexTupl
*** 402,407 ****
--- 403,433 ----
  	best_penalty[0] = -1;
  
  	/*
+ 	 * If we find a tuple that's exactly as good as the currently best one, we
+ 	 * could use either one.  When inserting a lot of tuples with the same or
+ 	 * similar keys, it's preferable to descend down the same path when
+ 	 * possible, as that's more cache-friendly.  On the other hand, if all
+ 	 * inserts land on the same leaf page after a split, we're never going to
+ 	 * insert anything to the other half of the split, and will end up using
+ 	 * only 50% of the available space.  Distributing the inserts evenly would
+ 	 * lead to better space usage, but that hurts cache-locality during
+ 	 * insertion.  To get the best of both worlds, when we find a tuple that's
+ 	 * exactly as good as the previous best, choose randomly whether to stick
+ 	 * to the old best, or use the new one.  Once we decide to stick to the
+ 	 * old best, we keep sticking to it for any subsequent equally good tuples
+ 	 * we might find.  This favors tuples with low offsets, but still allows
+ 	 * some inserts to go to other equally-good subtrees.
+ 	 *
+ 	 * keep_current_best is -1 if we haven't yet had to make a random choice
+ 	 * whether to keep the current best tuple.  If we have done so, and
+ 	 * decided to keep it, keep_current_best is 1; if we've decided to
+ 	 * replace, keep_current_best is 0.  (This state will be reset to -1 as
+ 	 * soon as we've made the replacement, but sometimes we make the choice in
+ 	 * advance of actually finding a replacement best tuple.)
+ 	 */
+ 	keep_current_best = -1;
+ 
+ 	/*
  	 * Loop over tuples on page.
  	 */
  	maxoff = PageGetMaxOffsetNumber(p);
*** gistchoose(Relation r, Page p, IndexTupl
*** 446,451 ****
--- 472,480 ----
  			if (j < r->rd_att->natts - 1)
  				best_penalty[j + 1] = -1;
+ 
+ 			/* we have new best, so reset keep-it decision */
+ 			keep_current_best = -1;
  		}
  		else if (best_penalty[j] == usize)
  		{
*** gistchoose(Relation r, Page p, IndexTupl
*** 468,479 ****
  	}
  
  	/*
! 	 * If we find a tuple with zero penalty for all columns, there's no
! 	 * need to examine remaining tuples; just break out of the loop and
! 	 * return it.
  	 */
  	if (zero_penalty)
! 		break;
  	}
  
  	return result;
--- 497,537 ----
  	}
  
  	/*
! 	 * If we looped past the last column, and did not update "result",
! 	 * then this tuple is exactly as good as the prior best tuple.
! 	 */
! 	if (j == r->rd_att->natts && result != i)
! 	{
! 	if
Re: [HACKERS] Materialized views WIP patch
On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote:

Noah Misch wrote: For the benefit of the archives, I note that we almost need not truncate an unlogged materialized view during crash recovery. MVs are refreshed in a VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's pg_class to that relfilenode. When a crash occurs with no refresh in flight, the latest refresh had been safely synced. When a crash cuts short a refresh, the pg_class update will not stick, and the durability of the old heap is not in doubt. However, non-btree index builds don't have the same property; we would need to force an immediate sync of the indexes to be safe here. It would remain necessary to truncate unlogged MVs when recovering a base backup, which may contain a partially-written refresh that did eventually commit. Future MV variants that modify the MV in place would also need the usual truncate on crash.

Hmm. That's a very good observation. Perhaps the issue can be punted to a future release where we start adding more incremental updates to them. I'll think on that, but on the face of it, it sounds like the best choice.

That situation is challenging for the same reason pg_class.relisvalid was hard to implement for unlogged relations. The startup process doesn't know the relkind of the unlogged-relation relfilenodes it cleans. If you can work through all that, it's certainly a nice endpoint to not lose unlogged snapshot MVs on crash. But I intended the first half of my message as the recommendation and the above as a wish for the future.

You might want to ignore the interim work on detecting the new pg_dump dependencies through walking the internal structures. I decided that was heading in a direction which might be unnecessarily fragile and slow; so I tried writing it as a query against the system tables. I'm pretty happy with the results. Here's the query: with recursive w as [snip]

Why is the dependency problem of ordering MV refreshes and MV index builds so different from existing pg_dump dependency problems?

If we bail on having pg_class.relisvalid, then it will obviously need adjustment. Even if we don't have the column, we can have the fact of an MV's validity SQL-visible in some other way.
Re: [HACKERS] Strange Windows problem, lock_timeout test request
On 01/24/2013 02:41 PM, Andrew Dunstan wrote:

What advantages does mingw have over MSVC? Is it just that it is cost-free, or is it easier to use mingw than MSVC for someone used to building on Linux? (mingw certainly does not seem to have the advantage of being fast!)

See Craig's email today about problems with SDKs and availability of

Not sure what happened there. ... availability of free 64 bit MSVC compilers. Also, some third-party libraries are built with the Mingw compilers, and it's often best not to switch if you can help it.

Would you like to put this somewhere on wiki.postgresql.org, or would you mind if I do so?

Please go for it.

cheers

andrew
Re: [HACKERS] gistchoose vs. bloat
On Thu, Jan 24, 2013 at 11:26 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 21.01.2013 15:19, Heikki Linnakangas wrote:

On 21.01.2013 15:06, Tom Lane wrote:

Jeff Davis pg...@j-davis.com writes:

On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote: I looked at this patch. ISTM we should not have the option at all but just do it always. I cannot believe that always-go-left is ever a preferable strategy in the long run; the resulting imbalance in the index will surely kill any possible benefit. Even if there are some cases where it miraculously fails to lose, how many users are going to realize that applies to their case and make use of the option?

Sounds good to me. If I remember correctly, there was also an argument that it may be useful for repeatable test results. That's a little questionable for performance (except in those cases where few penalties are identical anyway), but could plausibly be useful for a crash report or something.

Meh. There's already a random decision, in the equivalent place and for a comparable reason, in btree (cf _bt_findinsertloc). Nobody's ever complained about that being indeterminate, so I'm unconvinced that there's a market for it with gist.

I wonder if it would work for gist to do something similar to _bt_findinsertloc, and have a bias towards the left page, but sometimes descend to one of the pages to the right. You would get the cache locality of usually descending down the same subtree, but also fill the pages to the right. Jeff / Alexander, want to give that a shot?

I did some experimenting with that. I used the same test case Alexander did, with geonames data, and compared the unpatched version, the original patch, and the attached patch that biases towards the first best tuple found, but still sometimes chooses one of the other equally good ones.
    testname    | initsize | finalsize | idx_blks_read | idx_blks_hit
----------------+----------+-----------+---------------+--------------
 patched-10-4mb | 75497472 |  90202112 |       5853604 |     10178331
 unpatched-4mb  | 75145216 |  94863360 |       5880676 |     10185647
 unpatched-4mb  | 75587584 |  97165312 |       5903107 |     10183759
 patched-2-4mb  | 74768384 |  81403904 |       5768124 |     10193738
 origpatch-4mb  | 74883072 |  82182144 |       5783412 |     10185373

All these tests were performed with shared_buffers=4MB, so that the index won't fit completely in shared buffers, and I could use the idx_blks_read/hit ratio as a measure of cache-friendliness. The origpatch test was with a simplified version of Alexander's patch, see attached. It's the same as the original, but with the randomization option and gist-build related code removed. The patched-10 and patched-2 tests are two variants with my patch, with 1/10 and 1/2 chance of moving to the next equal tuple, respectively. The differences in cache hit ratio might be just a result of different index sizes. I included two unpatched runs above to show that there's a fair amount of noise in these tests. That's because of the random nature of the test case; it picks rows to delete and insert at random. I think the conclusion is that all of these patches are effective. The 1/10 variant is less effective, as expected, as it's closer in behavior to the unpatched behavior than the others. The 1/2 variant seems as good as the original patch.

Are the two unpatched-4mb lines different just by coincidence? If so, then the variance is significant, and we need more experiments to actually compare the patched-2-4mb and origpatch-4mb lines.

There is another cause of overhead when using randomization in gistchoose: extra penalty calls. It could be significant when the index fits in cache. In order to avoid it, I deliberately changed the behaviour of my patch from "look sequentially and choose at random" to "look in random order". I think we need to include a comparison of CPU time.

-- 
With best regards, Alexander Korotkov.
Re: [HACKERS] Materialized views WIP patch
Noah Misch wrote:

On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote: Noah Misch wrote: For the benefit of the archives, I note that we almost need not truncate an unlogged materialized view during crash recovery. MVs are refreshed in a VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's pg_class to that relfilenode. When a crash occurs with no refresh in flight, the latest refresh had been safely synced. When a crash cuts short a refresh, the pg_class update will not stick, and the durability of the old heap is not in doubt. However, non-btree index builds don't have the same property; we would need to force an immediate sync of the indexes to be safe here. It would remain necessary to truncate unlogged MVs when recovering a base backup, which may contain a partially-written refresh that did eventually commit. Future MV variants that modify the MV in place would also need the usual truncate on crash.

Hmm. That's a very good observation. Perhaps the issue can be punted to a future release where we start adding more incremental updates to them. I'll think on that, but on the face of it, it sounds like the best choice.

That situation is challenging for the same reason pg_class.relisvalid was hard to implement for unlogged relations. The startup process doesn't know the relkind of the unlogged-relation relfilenodes it cleans. If you can work through all that, it's certainly a nice endpoint to not lose unlogged snapshot MVs on crash. But I intended the first half of my message as the recommendation and the above as a wish for the future.

Well, if I just don't create an init fork for MVs, they are left as they were on recovery, aren't they? So for 9.3, that solves that issue, I think. pg_class.relisvalid is a separate issue.

You might want to ignore the interim work on detecting the new pg_dump dependencies through walking the internal structures. I decided that was heading in a direction which might be unnecessarily fragile and slow; so I tried writing it as a query against the system tables. I'm pretty happy with the results. Here's the query: with recursive w as [snip]

Why is the dependency problem of ordering MV refreshes and MV index builds so different from existing pg_dump dependency problems?

If mva has indexes and is referenced by mvb, the CREATE statements are all properly ordered, but you want mva populated and indexed before you attempt to populate mvb. (Populated to get correct results, indexed to get them quickly.) We don't have anything else like that.

If we bail on having pg_class.relisvalid, then it will obviously need adjustment. Even if we don't have the column, we can have the fact of an MV's validity SQL-visible in some other way.

Sure, I didn't say we had to abandon the query -- probably just replace the relisvalid tests with a function call using the oid of the MV.

-Kevin
Re: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Jameison Martin jameis...@yahoo.com writes:

there have been a lot of different threads on this patch. I'm going to take a stab at teasing them out, summarizing them, and addressing them individually.

Thanks for reviewing the bidding so carefully.

4.3) does it violate a principle in the code (Tom's latest email)

from my perspective, the code has to deal with the fact that natts on the persistent row is different than the tupdesc already; this is the foundation upon which ALTER TABLE ADD NULL and DROP COLUMN as a metadata operation are built. So I believe that this patch strengthens the code's ability to handle the ALTER TABLE ADD NULL case by generalizing it: now the code will more frequently need to deal with the disparity between natts and tupdesc, rather than only after someone did an ALTER TABLE. I'm an advocate of making corner cases go through generalized code where possible.

I think we're already pretty convinced that that case works, since ALTER TABLE ADD COLUMN has been around for a very long time.

however, in all honesty I hadn't considered Tom's point that the patch has created a situation where natts on a row can deviate from the tupdesc that it was built with (as opposed to the current tupdesc, from which it could always deviate due to a subsequent ALTER TABLE). this is a subtle point, and I don't think I have enough experience in the Postgres code to argue one way or another about it, so I'll have to leave that determination up to people that are more familiar with the code.

To be a bit more concrete, the thing that is worrying me is that the executor frequently transforms tuples between virtual and HeapTuple formats (look at the TupleTableSlot infrastructure, junk filters, and tuplesort/tuplestore, for example). Up to now such transformation could be counted on not to affect the apparent number of columns in the tuple; but with this patch, a tuple materialized as a HeapTuple might well show an natts smaller than what you'd conclude from looking at it in virtual slot format.

Now it's surely true that any code that's broken by that would be broken anyway if it were fed a tuple direct-from-disk from a relation that had suffered a later ADD COLUMN, but there are lots of code paths where raw disk tuples would never appear. So IMO there's a pretty fair chance of exposing latent bugs with this. As an example that this type of concern isn't hypothetical, and that identifying such bugs can be very difficult, see commit 039680aff. That bug had been latent for years before it was exposed by an unrelated change, and it took several more years to track it down.

As I said, I'd be willing to take this risk if the patch showed more attractive performance benefits ... but it still seems mighty marginal from here.

regards, tom lane
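The mechanism the patch relies on can be shown without any heap machinery. The sketch below (invented helper name, not the patch's actual code) computes the natts a tuple-forming routine would store once trailing nulls are dropped; readers then treat attributes beyond the stored natts as null, the same convention that makes adding a nullable column with ALTER TABLE a metadata-only operation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: given the per-attribute null flags for a row with
 * tupdesc_natts attributes, return the number of attributes that actually
 * need to be stored once trailing nulls are truncated.  The null bitmap
 * (and the row's stored natts) then only needs to cover this many
 * attributes; anything past it is implicitly null.
 */
static size_t
effective_natts(const bool *isnull, size_t tupdesc_natts)
{
	size_t		n = tupdesc_natts;

	while (n > 0 && isnull[n - 1])
		n--;					/* drop trailing nulls */
	return n;
}
```

So a row whose last non-null column is the 3rd of 10 stores natts = 3, and its null bitmap covers 3 bits instead of 10; a row with no trailing nulls is stored exactly as before. This is also where Tom's concern enters: two rows built from the same tupdesc can now carry different stored natts values.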
Re: [HACKERS] gistchoose vs. bloat
Alexander Korotkov aekorot...@gmail.com writes:

There is another cause of overhead when using randomization in gistchoose: extra penalty calls. It could be significant when the index fits in cache. In order to avoid it, I deliberately changed the behaviour of my patch from "look sequentially and choose at random" to "look in random order". I think we need to include a comparison of CPU time.

Hmm ... actually, isn't that an argument in favor of Heikki's method? The way he's doing it, we can exit without making additional penalty calls once we've found a zero-penalty tuple and decided not to look further (which is something with a pretty high probability).

regards, tom lane
Re: [HACKERS] Re: [BUGS] BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
On Wed, Jan 23, 2013 at 04:58:57PM -0500, Bruce Momjian wrote:

Attached is a ready-to-apply version of the patch. I continued to use postmaster.pid to determine if I need to try the start/stop test --- that allows me to know which servers _might_ be running, because a server start failure might happen for many reasons, and I would prefer not to suggest the server is running if I can avoid it; the pid file gives me that.

The patch is longer than I expected because the code needed to be reordered. The starting banner indicates if it is a live check, but to check for a live server we need to start/stop the servers, and for that we need to know where the binaries are, which we didn't do until after the start banner. I removed the 'check' line for binary checks, and moved that before the banner printing. postmaster_start also now needs an option to suppress an error.

Applied.

-- 
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Materialized views WIP patch
On Thu, Jan 24, 2013 at 03:14:15PM -0500, Kevin Grittner wrote:

Noah Misch wrote: On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote: Noah Misch wrote: For the benefit of the archives, I note that we almost need not truncate an unlogged materialized view during crash recovery. MVs are refreshed in a VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's pg_class to that relfilenode. When a crash occurs with no refresh in flight, the latest refresh had been safely synced. When a crash cuts short a refresh, the pg_class update will not stick, and the durability of the old heap is not in doubt. However, non-btree index builds don't have the same property; we would need to force an immediate sync of the indexes to be safe here. It would remain necessary to truncate unlogged MVs when recovering a base backup, which may contain a partially-written refresh that did eventually commit. Future MV variants that modify the MV in place would also need the usual truncate on crash.

Hmm. That's a very good observation. Perhaps the issue can be punted to a future release where we start adding more incremental updates to them. I'll think on that, but on the face of it, it sounds like the best choice.

That situation is challenging for the same reason pg_class.relisvalid was hard to implement for unlogged relations. The startup process doesn't know the relkind of the unlogged-relation relfilenodes it cleans. If you can work through all that, it's certainly a nice endpoint to not lose unlogged snapshot MVs on crash. But I intended the first half of my message as the recommendation and the above as a wish for the future.

Well, if I just don't create an init fork for MVs, they are left as they were on recovery, aren't they? So for 9.3, that solves that issue, I think. pg_class.relisvalid is a separate issue.

The startup process just looks for init forks, yes. But it's acceptable to leave the unlogged MV materials alone during *crash* recovery only. When recovering from a base backup, we once again need an init fork to refresh the unlogged-MV relations. In turn, we would still need a relisvalid implementation that copes. This is all solvable, sure, but it looks like a trip off into the weeds relative to the core aim of this patch.

Why is the dependency problem of ordering MV refreshes and MV index builds so different from existing pg_dump dependency problems?

If mva has indexes and is referenced by mvb, the CREATE statements are all properly ordered, but you want mva populated and indexed before you attempt to populate mvb. (Populated to get correct results, indexed to get them quickly.) We don't have anything else like that.

Is the REFRESH order just a replay of the CREATE order (with index builds interspersed), or can it differ?

nm
Re: [HACKERS] gistchoose vs. bloat
On 24.01.2013 22:35, Tom Lane wrote: Alexander Korotkov aekorot...@gmail.com writes: There is another cause of overhead when using randomization in gistchoose: extra penalty calls. It could be significant when the index fits in cache. In order to evade it I especially changed the behaviour of my patch from "look sequentially and choose random" to "look in random order". I think we need to include a comparison of CPU time. Hmm ... actually, isn't that an argument in favor of Heikki's method? The way he's doing it, we can exit without making additional penalty calls once we've found a zero-penalty tuple and decided not to look further (which is something with a pretty high probability). No, I think Alexander is right, although I believe the difference is minimal in practice. If we assume that there are no zero-penalty tuples on the page, with both approaches you have to call penalty on every tuple to know which is best. If there are zero-penalty tuples, then there is a small difference. With Alexander's method, you can stop looking as soon as you find a zero-penalty tuple (the order you check the tuples in is random). With my method, you can stop looking as soon as you find a zero-penalty tuple, *and* you decide to not look further. With the 1/2 probability to stop looking further, you give up on average after 2 tuples. So if I'm doing my math right, my patch does on average between 1x and 2x as many penalty calls as Alexander's, depending on how many zero-penalty tuples there are. The 2x case is when the page is full of zero-penalty tuples, in which case the difference isn't big in absolute terms; 2 penalty calls per page versus 1. BTW, one thing that I wondered about this: How expensive is random()? I'm assuming not very, but I don't really know. Alexander's patch called random() for every tuple on the page, while I call it only once for each equal-penalty tuple. 
If it's really cheap, I think my/Tom's patch could be slightly simplified by always initializing keep_current_best with random() at the top of the function, and eliminating the -1 "haven't decided yet" state. OTOH, if random() is expensive, I note that we only need one bit of randomness, so we could avoid calling random() so often if we utilized all the bits from each random() call. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] text search: restricting the number of parsed words in headline generation
On Wed, Aug 15, 2012 at 11:09:18PM +0530, Sushant Sinha wrote: I will do the profiling and present the results. Sushant, do you have any profiling results on this issue from August? --- On Wed, 2012-08-15 at 12:45 -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Is this a TODO? AFAIR nothing's been done about the speed issue, so yes. I didn't like the idea of creating a user-visible knob when the speed issue might be fixable with internal algorithm improvements, but we never followed up on this in either fashion. regards, tom lane --- On Tue, Aug 23, 2011 at 10:31:42PM -0400, Tom Lane wrote: Sushant Sinha sushant...@gmail.com writes: Doesn't this force the headline to be taken from the first N words of the document, independent of where the match was? That seems rather unworkable, or at least unhelpful. In headline generation function, we don't have any index or knowledge of where the match is. We discover the matches by first tokenizing and then comparing the matches with the query tokens. So it is hard to do anything better than first N words. After looking at the code in wparser_def.c a bit more, I wonder whether this patch is doing what you think it is. Did you do any profiling to confirm that tokenization is where the cost is? Because it looks to me like the match searching in hlCover() is at least O(N^2) in the number of tokens in the document, which means it's probably the dominant cost for any long document. I suspect that your patch helps not so much because it saves tokenization costs as because it bounds the amount of effort spent in hlCover(). I haven't tried to do anything about this, but I wonder whether it wouldn't be possible to eliminate the quadratic blowup by saving more state across the repeated calls to hlCover(). At the very least, it shouldn't be necessary to find the last query-token occurrence in the document from scratch on each and every call. 
Actually, this code seems probably flat-out wrong: won't every successful call of hlCover() on a given document return exactly the same q value (end position), namely the last token occurrence in the document? How is that helpful? regards, tom lane -- Bruce Momjian br...@momjian.us  http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus
On Wed, Aug 29, 2012 at 01:13:51PM +, Rajeev rastogi wrote: From: pgsql-bugs-ow...@postgresql.org [pgsql-bugs-ow...@postgresql.org] on behalf of Bruce Momjian [br...@momjian.us] Sent: Wednesday, August 29, 2012 8:46 AM To: Tom Lane Cc: Robert Haas; Hitoshi Harada; pgsql-b...@postgresql.org; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus On Sun, Apr 15, 2012 at 12:29:39PM -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Apr 5, 2012 at 2:39 AM, Hitoshi Harada umi.tan...@gmail.com wrote: On Wed, Apr 4, 2012 at 8:00 AM, Tom Lane t...@sss.pgh.pa.us wrote: Given the lack of complaints since 9.0, maybe we should not fix this but just redefine the new behavior as being correct? But it seems mighty inconsistent that the tuple limit would apply if you have RETURNING but not when you don't. In any case, the ramifications are wider than one example in the SPI docs. To be honest, I was surprised when I found tcount parameter is said to be applied to even INSERT. I believe people think that parameter is to limit memory consumption when returning tuples thus it'd be applied for only SELECT or DML with RETURNING. So I'm +1 for non-fix but redefine the behavior. Who wants to limit the number of rows processed inside the backend, from SPI? Yeah. Okay, apparently nobody cares about RETURNING behaving differently from non-RETURNING, so the consensus is to redefine the current behavior as correct. That means what we need is to go through the docs and see what places need to be updated (and, I guess, back-patch the changes to 9.0). I will get to this if nobody else does, but not right away. Would someone make the doc change outlined above? Thanks. I would like to work on this documentation bug. As per analysis I am planning to update following SPI function: 1. SPI_Execute: Here we will mention that argument count is used only for the kind of command which returns result i.e. 
all kinds of SELECT and DML with a RETURNING clause; count is ignored for any other kind of command. I will also add one example to illustrate the difference. 2. SPI_execute_plan_with_paramlist: Here we can just give a reference to SPI_execute, i.e. I will mention that count has the same interpretation as in SPI_execute. 3. SPI_execp: Here we can just give a reference to SPI_execute, i.e. I will mention that count has the same interpretation as in SPI_execute. Would someone please provide answers to these questions, or write a patch? -- Bruce Momjian br...@momjian.us  http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, 2013-01-16 at 17:38 -0800, Jeff Davis wrote: New version of checksums patch. And another new version of both patches. Changes: * Rebased. * Rename SetBufferCommitInfoNeedsSave to MarkBufferDirtyHint. Now that it's being used more places, it makes sense to give it a more generic name. * My colleague, Yingjie He, noticed that the FSM doesn't write any WAL, and therefore we must protect those operations against torn pages. That seems simple enough: just use MarkBufferDirtyHint (formerly SetBufferCommitInfoNeedsSave) instead of MarkBufferDirty. The FSM changes are not critical, so the fact that we may lose the dirty bit is OK. Regards, Jeff Davis replace-tli-with-checksums-20130124.patch.gz Description: GNU Zip compressed data checksums-20130124.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset
On 01/24/2013 11:19 AM, Noah Misch wrote: On Thu, Jan 24, 2013 at 08:50:36AM -0500, Andrew Dunstan wrote: On 01/24/2013 03:42 AM, Craig Ringer wrote: On 01/24/2013 01:06 AM, Alexander Law wrote: Hello, Please let me know if I can do something to get the bug fix (https://commitfest.postgresql.org/action/patch_view?id=902) committed. I would like to fix other bugs related to postgres localization, but I am not sure yet how to do it. For anyone looking for the history, the 1st post on this topic is here: http://www.postgresql.org/message-id/e1s3twb-0004oy...@wrigleys.postgresql.org Yeah. I'm happy enough with this patch. ISTM it's really a bug and should be backpatched, no? It is a bug, yes. I'm neutral on whether to backpatch. Well, that's what I did. :-) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gistchoose vs. bloat
Heikki Linnakangas hlinnakan...@vmware.com writes: BTW, one thing that I wondered about this: How expensive is random()? I'm assuming not very, but I don't really know. Alexander's patch called random() for every tuple on the page, while I call it only once for each equal-penalty tuple. If it's really cheap, I think my/Tom's patch could be slightly simplified by always initializing keep_current_best with random() at top of the function, and eliminating the -1 haven't decided yet state. I thought about that too, and concluded that random() is probably expensive enough that we don't want to call it unnecessarily. OTOH, if random() is expensive, I note that we only need one bit of randomness, so we could avoid calling random() so often if we utilized all the bits from each random() call. Meh. That would hard-wire the decision that the probability of keeping a best tuple is exactly 0.5. I'd rather keep the flexibility to tune it later. The way your patch is set up, it seems unlikely that it will call random() very many times per page, so I'm not that concerned about minimizing the number of calls further. (Also, in the probably-common case that there are no exactly equally good alternatives, this would actually be a net loss since it would add one unnecessary call.) So basically, Alexander's method requires more random() calls and fewer penalty() calls than yours, at least in typical cases. It's hard to say which is faster without benchmarking, and it would matter which opclass you were testing on. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus
Bruce Momjian br...@momjian.us writes: Would someone make the doc change outlined above? Thanks. Sorry, I'd nearly forgotten about this issue. Will see about fixing the docs. (It looks like some of the comments in execMain.c could use work too.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] autovacuum not prioritising for-wraparound tables
Hi, I have a bug pending that autovacuum fails to give priority to for-wraparound tables. When xid consumption rate is high and dead tuple creation is also high, it is possible that some tables are waiting for for-wraparound vacuums that don't complete in time because the workers are busy processing other tables that have accumulated dead tuples; the system is then down because it's too near the Xid wraparound horizon. Apparently this is particularly notorious in connection with TOAST tables, because those are always put in the tables-to-process list after regular tables. (As far as I recall, this was already reported elsewhere, but so far I have been unable to find the discussion in the archives. Pointers appreciated.) So here's a small, backpatchable patch that sorts the list of tables to process (not all that much tested yet). Tables which have the wraparound flag set are processed before those that are not. Other than this criterion, the order is not defined. Now we could implement this differently, and maybe more simply (say by keeping two lists of tables to process, one with for-wraparound tables and one with the rest) but this way it is simpler to add additional sorting criteria later: say within each category we could first process smaller tables that have more dead tuples. My intention is to clean this up and backpatch to all live branches. Comments? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 1890,1895 **** get_database_list(void)
--- 1890,1919 ----
  	return dblist;
  }
  
+ typedef struct vacuum_table
+ {
+ 	Oid		reloid;
+ 	bool	wraparound;
+ } vacuum_table;
+ 
+ /*
+  * qsort comparator.  Tables marked for wraparound sort first.
+  */
+ static int
+ vacpriority_comparator(const void *a, const void *b)
+ {
+ 	const vacuum_table *taba = *(vacuum_table *const *) a;
+ 	const vacuum_table *tabb = *(vacuum_table *const *) b;
+ 
+ 	if (taba->wraparound == tabb->wraparound)
+ 		return 0;
+ 
+ 	if (taba->wraparound)
+ 		return -1;
+ 	else
+ 		return 1;
+ }
+ 
  /*
   * Process a database table-by-table
   *
***************
*** 1903,1917 **** do_autovacuum(void)
  	HeapTuple	tuple;
  	HeapScanDesc relScan;
  	Form_pg_database dbForm;
- 	List	   *table_oids = NIL;
  	HASHCTL		ctl;
  	HTAB	   *table_toast_map;
- 	ListCell   *volatile cell;
  	PgStat_StatDBEntry *shared;
  	PgStat_StatDBEntry *dbentry;
  	BufferAccessStrategy bstrategy;
  	ScanKeyData key;
  	TupleDesc	pg_class_desc;
  
  	/*
  	 * StartTransactionCommand and CommitTransactionCommand will automatically
--- 1927,1943 ----
  	HeapTuple	tuple;
  	HeapScanDesc relScan;
  	Form_pg_database dbForm;
  	HASHCTL		ctl;
  	HTAB	   *table_toast_map;
  	PgStat_StatDBEntry *shared;
  	PgStat_StatDBEntry *dbentry;
  	BufferAccessStrategy bstrategy;
  	ScanKeyData key;
  	TupleDesc	pg_class_desc;
+ 	vacuum_table **tables;
+ 	int			ntables;
+ 	int			maxtables;
+ 	int			i;
  
  	/*
  	 * StartTransactionCommand and CommitTransactionCommand will automatically
***************
*** 1986,1991 **** do_autovacuum(void)
--- 2012,2022 ----
  					  ctl, HASH_ELEM | HASH_FUNCTION);
  
+ 	/* initialize our tasklist */
+ 	ntables = 0;
+ 	maxtables = 32;
+ 	tables = palloc(maxtables * sizeof(vacuum_table *));
+ 
  	/*
  	 * Scan pg_class to determine which tables to vacuum.
  	 *
***************
*** 2077,2085 **** do_autovacuum(void)
  		}
  		else
  		{
! 			/* relations that need work are added to table_oids */
  			if (dovacuum || doanalyze)
! 				table_oids = lappend_oid(table_oids, relid);
  
  			/*
  			 * Remember the association for the second pass.  Note: we must do
--- 2108,2133 ----
  		}
  		else
  		{
! 			/* relations that need work are added to our tasklist */
  			if (dovacuum || doanalyze)
! 			{
! 				vacuum_table *tab;
! 
! 				/* enlarge the array if necessary */
! 				if (ntables >= maxtables)
! 				{
! 					maxtables *= 2;
! 					tables = repalloc(tables,
! 									  maxtables * sizeof(vacuum_table *));
! 				}
! 
! 				tab = palloc(sizeof(vacuum_table));
! 
! 				tab->reloid = relid;
! 				tab->wraparound = wraparound;
! 
! 				tables[ntables++] = tab;
! 			}
  
  			/*
  			 * Remember the association for the second pass.  Note: we must do
***************
*** 2162,2168 **** do_autovacuum(void)
  
  		/* ignore analyze for toast tables */
  		if (dovacuum)
! 			table_oids = lappend_oid(table_oids, relid);
  	}
  
  	heap_endscan(relScan);
--- 2210,2233 ----
  
  		/* ignore analyze for toast tables */
  		if (dovacuum)
! 		{
! 			vacuum_table *tab;
! 
! 			/* enlarge the array if necessary */
! 			if (ntables >= maxtables)
! 			{
! 				maxtables *= 2;
! 				tables = repalloc(tables,
! 								  maxtables * sizeof(vacuum_table *));
! 			}
! 
! 			tab = palloc(sizeof(vacuum_table));
! 
! 			tab->reloid = relid;
! 			tab->wraparound = wraparound;
! 
!
Re: [HACKERS] COPY FREEZE has no warning
On Wed, Jan 23, 2013 at 02:02:46PM -0500, Bruce Momjian wrote: As a reminder, COPY FREEZE still does not issue any warning/notice if the freezing does not happen: Requests copying the data with rows already frozen, just as they would be after running the <command>VACUUM FREEZE</command> command. This is intended as a performance option for initial data loading. Rows will be frozen only if the table being loaded has been created in the current subtransaction, there are no cursors open and there are no older snapshots held by this transaction. If those conditions are not met the command will continue without error though will not freeze rows. It is also possible in rare cases that the request cannot be honoured for internal reasons, hence <literal>FREEZE</literal> is more of a guideline than a hard rule. Note that all other sessions will immediately be able to see the data once it has been successfully loaded. This violates the normal rules of MVCC visibility and by specifying this option the user acknowledges explicitly that this is understood. Didn't we want to issue the user some kind of feedback? As no one wanted to write this patch, I have developed the attached version. -- Bruce Momjian br...@momjian.us  http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
new file mode 100644
index 8778e8b..c8c9821
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*** CopyFrom(CopyState cstate)
*** 2011,2023 ****
  		 * where we can apply the optimization, so in those rare cases
  		 * where we cannot honour the request we do so silently.
  		 */
! 		if (cstate->freeze &&
! 			ThereAreNoPriorRegisteredSnapshots() &&
! 			ThereAreNoReadyPortals() &&
! 			(cstate->rel->rd_newRelfilenodeSubid == GetCurrentSubTransactionId() ||
! 			 cstate->rel->rd_createSubid == GetCurrentSubTransactionId()))
! 			hi_options |= HEAP_INSERT_FROZEN;
  	}
  
  	/*
  	 * We need a ResultRelInfo so we can use the regular executor's
--- 2011,2029 ----
  		 * where we can apply the optimization, so in those rare cases
  		 * where we cannot honour the request we do so silently.
  		 */
! 		if (cstate->freeze)
! 		{
! 			if (ThereAreNoPriorRegisteredSnapshots() &&
! 				ThereAreNoReadyPortals() &&
! 				(cstate->rel->rd_newRelfilenodeSubid == GetCurrentSubTransactionId() ||
! 				 cstate->rel->rd_createSubid == GetCurrentSubTransactionId()))
! 				hi_options |= HEAP_INSERT_FROZEN;
! 			else
! 				ereport(NOTICE, (errmsg("unable to honor \"freeze\" option")));
! 		}
  	}
+ 	else if (cstate->freeze)
+ 		ereport(NOTICE, (errmsg("unable to honor \"freeze\" option")));
  
  	/*
  	 * We need a ResultRelInfo so we can use the regular executor's
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
new file mode 100644
index 78c601f..126b3c7
*** a/src/test/regress/expected/copy2.out
--- b/src/test/regress/expected/copy2.out
*** SELECT * FROM vistest;
*** 334,345 ****
--- 334,347 ----
  COMMIT;
  TRUNCATE vistest;
  COPY vistest FROM stdin CSV FREEZE;
+ NOTICE:  unable to honor "freeze" option
  BEGIN;
  INSERT INTO vistest VALUES ('z');
  SAVEPOINT s1;
  TRUNCATE vistest;
  ROLLBACK TO SAVEPOINT s1;
  COPY vistest FROM stdin CSV FREEZE;
+ NOTICE:  unable to honor "freeze" option
  SELECT * FROM vistest;
   a

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus
On Thu, Jan 24, 2013 at 04:51:04PM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Would someone make the doc change outlined above? Thanks. Sorry, I'd nearly forgotten about this issue. Will see about fixing the docs. (It looks like some of the comments in execMain.c could use work too.) Thanks. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers