Re: [HACKERS] bgwriter reference to HOT standby
On 24 January 2013 07:25, Pavan Deolasee pavan.deola...@gmail.com wrote: On Thu, Jan 24, 2013 at 12:36 PM, Jeff Janes jeff.ja...@gmail.com wrote: The docs on bgworker twice refer to HOT standby. I don't think that in either case the hot needs emphasis, and if it does, making it look like an acronym (one already used for something else) is probably not the way to do it. I think it should be Hot Standby for consistency. +1 for changing it from HOT to hot/Hot anyway Patch applied. Thanks -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset
On 01/24/2013 01:06 AM, Alexander Law wrote: Hello, Please let me know if I can do something to get the bug fix (https://commitfest.postgresql.org/action/patch_view?id=902) committed. I would like to fix other bugs related to postgres localization, but I am not sure yet how to do it. For anyone looking for the history, the 1st post on this topic is here: http://www.postgresql.org/message-id/e1s3twb-0004oy...@wrigleys.postgresql.org -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 24 January 2013 01:17, Robert Haas robertmh...@gmail.com wrote: I agree. The thing that scares me about the logical replication stuff is not that it might be slow (and if your numbers are to be believed, it isn't), but that I suspect it's riddled with bugs and possibly some questionable design decisions. If we commit it and release it, then we're going to be stuck maintaining it for a very, very long time. If it turns out to have serious bugs that can't be fixed without a new major release, it's going to be a serious black eye for the project. Of course, I have no evidence that that will happen. This is a generic argument against applying any invasive patch. I agree 9.2 had major bugs on release, though that was because of the invasive nature of some of the changes, even in seemingly minor patches. The most invasive and therefore risky changes in this release are already committed - changes to the way WAL reading and timelines work. If we don't apply a single additional patch in this CF, we will still in my opinion have a major requirement for beta testing prior to release. The code executed here is isolated to users of the new feature and is therefore low risk to non-users. Of course there will be bugs. Everybody understands what new feature means and we as a project aren't exposed to risks from this. New feature also means groundbreaking new capabilities, so the balance of high reward, low risk means this gets my vote to apply. I'm just about to spend some days giving a final review on it to confirm/refute that opinion in technical detail. Code using these features is available and clearly marked as a full copyright transfer to PGDG, TPL licensed. That code is external not by the author's choice, but at the specific request of the project to make it that way. I personally will be looking to add code to core over time. 
It was useful for everybody that replication solutions started out of core, but replication is now a core requirement for databases and we must fully deliver on that thought. I agree with your concern re: checksums and foreign key locks. FK locks has had considerable review and support, so I expect that to be a manageable issue. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Setting visibility map in VACUUM's second phase
On Friday, December 7, 2012, Pavan Deolasee wrote: Revised patch attached. This time much less invasive. I added a new function heap_page_is_all_visible() to check if the given page is all-visible and also return the visibility cutoff xid. Hi Pavan, lazy_vacuum_page now pins and unpins the vmbuffer for each page it marks all-visible, which seems like a lot of needless traffic since the next vmbuffer is likely to be the same as the previous one. Could it instead get vmbuffer passed down to it from the two calling sites? (One call site would have to have it introduced, just for this purpose, the other should be able to use its existing one.) Cheers, Jeff
Re: [HACKERS] [sepgsql 1/3] add name qualified creation label
2013/1/24 Tom Lane t...@sss.pgh.pa.us: John R Pierce pie...@hogranch.com writes: On 1/23/2013 8:32 PM, Tom Lane wrote: FWIW, in Fedora-land I see: ... I'd be far more interested in what is in RHEL and CentOS. Fedora, with its 6 month obsolescence cycle, is of zero interest to me for deploying database servers. But of course Fedora is also the upstream that will become RHEL7 and beyond. EL6 has libselinux 2.0.94. EL5 has libselinux 1.33.4. sepgsql already requires libselinux 2.0.99, so it doesn't appear to me that moving that goalpost is going to change things one way or the other for the existing RHEL branches. I couldn't ship contrib/sepgsql today in those branches. It might be that the update timing makes a bigger difference in some other distros, though. To return to Heikki's original point about Debian, what are they shipping today?

Though I'm not familiar with Debian's release cycle, I tried to check the shipped versions of postgresql and libselinux for the stable, testing, unstable and experimental releases. I'm not certain why they haven't pushed postgresql-9.2 into the experimental release yet. However, it seems optimistic to me to expect libselinux-2.1.10 to be bundled on the timeline of postgresql-9.3. If someone is familiar with Debian's release cycle, I'd like to hear their suggestions.

* Debian (stable) ... postgresql-8.4 + libselinux-2.0.96
  http://packages.debian.org/en/squeeze/postgresql
  http://packages.debian.org/en/source/squeeze/libselinux
* Debian (testing) ... postgresql-9.1 + libselinux-2.1.9
  http://packages.debian.org/en/wheezy/postgresql
  http://packages.debian.org/en/source/wheezy/libselinux
* Debian (unstable) ... postgresql-9.1 + libselinux-2.1.9
  http://packages.debian.org/en/sid/postgresql
  http://packages.debian.org/en/source/sid/libselinux
* Debian (experimental) ... postgresql-9.1 + libselinux-2.1.12
  http://packages.debian.org/en/experimental/postgresql
  http://packages.debian.org/en/source/experimental/libselinux

Also, Ubuntu largely tracks Debian's releases.
* Ubuntu 11.10 ... postgresql-9.1 + libselinux-2.0.98
  https://launchpad.net/ubuntu/oneiric/+package/postgresql
  https://launchpad.net/ubuntu/oneiric/+source/libselinux
* Ubuntu 12.04 ... postgresql-9.1 + libselinux-2.1.0
  https://launchpad.net/ubuntu/precise/+package/postgresql
  https://launchpad.net/ubuntu/precise/+source/libselinux
* Ubuntu 12.10 ... postgresql-9.1 + libselinux-2.1.9
  https://launchpad.net/ubuntu/quantal/+package/postgresql
  https://launchpad.net/ubuntu/quantal/+source/libselinux

Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] CF3+4 (was Re: Parallel query execution)
On Wed, Jan 23, 2013 at 10:14 PM, Robert Haas robertmh...@gmail.com wrote: I think there's been something of a professionalization of PostgreSQL development over the last few years. More and more people are able to get paid to work on PostgreSQL as part or in a few cases all of their job. This trend includes me, but also a lot of other people. There are certainly good things about this, but I think it increases the pressure to get patches committed. Also, many of the new developers who were previously working at proprietary companies (includes me) may not have seen such long cycles for feature development, from design to acceptance/rejection. The problem is aggravated because the same developer was previously doing development at a much faster pace and will find our style very conservative. Of course, the quality of work might be a notch lower when you are coding features at that pace, and you may introduce regressions and bugs along the way, but that probably gets compensated by the more structured QA and testing that proprietary companies do. And many of these developers might have been working on equally important and mission-critical products even before. I know we are very conscious of our code quality and product stability, and for the right reasons, but I wonder whether overall productivity would go up if we did the same, i.e. had quicker feature-acceptance cycles compensated by more QA. The problem is that, being a community-driven project, we attract many developers but very few QA people. Would it help to significantly enhance our regression coverage so that bugs are caught early (assuming that's a problem)? If so, maybe we should have a month-long QAfest where every developer only writes test cases. Would it help to have a formal product manager (or a team) that gets to decide whether we want a particular feature or not? 
This might be similar to our current model of discussion on hackers, but more time bound and with the authority for the team to accept/reject ideas at the end of the time period, not keeping it vague for a later decision after people have already put in a lot of effort. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Setting visibility map in VACUUM's second phase
Hi Jeff, On Thu, Jan 24, 2013 at 2:41 PM, Jeff Janes jeff.ja...@gmail.com wrote: lazy_vacuum_page now pins and unpins the vmbuffer for each page it marks all-visible, which seems like a lot of needless traffic since the next vmbuffer is likely to be the same as the previous one. Good idea. Even though the cost of pinning/unpinning may not be high relative to the vacuum cost itself, it seems like a good idea because we already do that at other places. Do you have any other review comments on the patch? Otherwise I'll fix this and send an updated patch soon. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
[HACKERS] [PATCH 0/3] Work around icc miscompilation
I'm sending three smaller patches for review, which try to fix icc and pathscale (mis)compilation problems described in my last email. Not sure if we need to fix the overflow check x + 1 <= 0 (x > 0) at src/backend/executor/execQual.c:3088, which icc optimizes away, too.

  Fix x + y < x overflow checks
  Fix x + 1 < x overflow checks
  Fix overflow checking in repeat()

 src/backend/utils/adt/float.c         |  8 ++++----
 src/backend/utils/adt/oracle_compat.c | 18 +++++++-----------
 src/backend/utils/adt/varbit.c        |  7 +++++--
 src/backend/utils/adt/varlena.c       | 10 ++++++----
 src/pl/plpgsql/src/pl_exec.c          |  4 ++--
 5 files changed, 24 insertions(+), 23 deletions(-)

On 1/20/13 2:58 AM, Xi Wang wrote: Intel's icc and PathScale's pathcc compilers optimize away several overflow checks, since they consider signed integer overflow as undefined behavior. This leads to a vulnerable binary. Currently we use -fwrapv to disable such (mis)optimizations in gcc, but not in other compilers.

Examples
========

1) x + 1 <= 0 (assuming x > 0)

src/backend/executor/execQual.c:3088

Below is the simplified code.

----------------------------------------
void bar(void);
void foo(int this_ndims)
{
	if (this_ndims <= 0)
		return;
	int elem_ndims = this_ndims;
	int ndims = elem_ndims + 1;
	if (ndims <= 0)
		bar();
}
----------------------------------------

$ icc -S -o - sadd1.c
...
foo:
# parameter 1: %edi
..B1.1:
..___tag_value_foo.1:
..B1.2:
        ret

2) x + 1 < x

src/backend/utils/adt/float.c:2769
src/backend/utils/adt/float.c:2785
src/backend/utils/adt/oracle_compat.c:1045 (x + C < x)

Below is the simplified code.

----------------------------------------
void bar(void);
void foo(int count)
{
	int result = count + 1;
	if (result < count)
		bar();
}
----------------------------------------

$ icc -S -o - sadd2.c
...
foo:
# parameter 1: %edi
..B1.1:
..___tag_value_foo.1:
        ret

3) x + y <= x (assuming y > 0)

src/backend/utils/adt/varbit.c:1142
src/backend/utils/adt/varlena.c:1001
src/backend/utils/adt/varlena.c:2024
src/pl/plpgsql/src/pl_exec.c:1975
src/pl/plpgsql/src/pl_exec.c:1981

Below is the simplified code. 
----------------------------------------
void bar(void);
void foo(int sp, int sl)
{
	if (sp <= 0)
		return;
	int sp_pl_sl = sp + sl;
	if (sp_pl_sl <= sl)
		bar();
}
----------------------------------------

$ icc -S -o - sadd3.c
foo:
# parameter 1: %edi
# parameter 2: %esi
..B1.1:
..___tag_value_foo.1:
..B1.2:
        ret

Possible fixes
==============

* Recent versions of icc and pathcc support gcc's workaround option, -fno-strict-overflow, to disable some optimizations based on signed integer overflow. It's better to add this option to configure. They don't support gcc's -fwrapv yet.

* This -fno-strict-overflow option cannot help in all cases: it cannot prevent the latest icc from (mis)compiling the 1st case. We could also fix the source code by avoiding signed integer overflows, as follows.

  x + y <= 0 (assuming x > 0, y > 0)  -->  x > INT_MAX - y
  x + y <= x (assuming y > 0)         -->  x > INT_MAX - y

I'd suggest to fix the code rather than trying to work around the compilers since the fix seems simple and portable. See two recent compiler bugs of -fwrapv/-fno-strict-overflow as well.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55883
http://software.intel.com/en-us/forums/topic/358200

* I don't have access to IBM's xlc compiler. Not sure how it works for the above cases.

- xi
Re: [HACKERS] Event Triggers: adding information
Robert Haas robertmh...@gmail.com writes: Or maybe we should just silently ignore failures to look up the event trigger. That might be better, because the DBA could always do: DROP FUNCTION myeventtrgfn() CASCADE; ...and it would be undesirable for other sessions to error out in that case due to SnapshotNow effects. What about taking a lock on the functions we decide we will need to be running, maybe a ShareUpdateExclusiveLock, so that the function can not disappear under us from concurrent activity? Note to self, most probably using: LockRelationOid(fnoid, ShareUpdateExclusiveLock); After all, we might be right not to optimize for DDL concurrency… Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
[HACKERS] [PATCH 1/3] Fix x + y < x overflow checks
icc optimizes away the overflow check x + y < x (y > 0), because signed integer overflow is undefined behavior in C. Instead, use a safe precondition test x > INT_MAX - y.
---
 src/backend/utils/adt/varbit.c  |  7 +++++--
 src/backend/utils/adt/varlena.c | 10 ++++++----
 src/pl/plpgsql/src/pl_exec.c    |  4 ++--
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/adt/varbit.c b/src/backend/utils/adt/varbit.c
index 1712c12..af69200 100644
--- a/src/backend/utils/adt/varbit.c
+++ b/src/backend/utils/adt/varbit.c
@@ -16,6 +16,8 @@
 
 #include "postgres.h"
 
+#include <limits.h>
+
 #include "access/htup_details.h"
 #include "libpq/pqformat.h"
 #include "nodes/nodeFuncs.h"
@@ -1138,12 +1140,13 @@ bit_overlay(VarBit *t1, VarBit *t2, int sp, int sl)
 		ereport(ERROR,
 				(errcode(ERRCODE_SUBSTRING_ERROR),
 				 errmsg("negative substring length not allowed")));
-	sp_pl_sl = sp + sl;
-	if (sp_pl_sl <= sl)
+	if (sl > INT_MAX - sp)
 		ereport(ERROR,
 				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 				 errmsg("integer out of range")));
 
+	sp_pl_sl = sp + sl;
+
 	s1 = bitsubstring(t1, 1, sp - 1, false);
 	s2 = bitsubstring(t1, sp_pl_sl, -1, true);
 	result = bit_catenate(s1, t2);
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 95e41bf..c907f44 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -997,12 +997,13 @@ text_overlay(text *t1, text *t2, int sp, int sl)
 		ereport(ERROR,
 				(errcode(ERRCODE_SUBSTRING_ERROR),
 				 errmsg("negative substring length not allowed")));
-	sp_pl_sl = sp + sl;
-	if (sp_pl_sl <= sl)
+	if (sl > INT_MAX - sp)
 		ereport(ERROR,
 				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 				 errmsg("integer out of range")));
 
+	sp_pl_sl = sp + sl;
+
 	s1 = text_substring(PointerGetDatum(t1), 1, sp - 1, false);
 	s2 = text_substring(PointerGetDatum(t1), sp_pl_sl, -1, true);
 	result = text_catenate(s1, t2);
@@ -2020,12 +2021,13 @@ bytea_overlay(bytea *t1, bytea *t2, int sp, int sl)
 		ereport(ERROR,
 				(errcode(ERRCODE_SUBSTRING_ERROR),
 				 errmsg("negative substring length not allowed")));
-	sp_pl_sl = sp + sl;
-	if (sp_pl_sl <= sl)
+	if (sl > INT_MAX - sp)
 		ereport(ERROR,
 				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 				 errmsg("integer out of range")));
 
+	sp_pl_sl = sp + sl;
+
 	s1 = bytea_substring(PointerGetDatum(t1), 1, sp - 1, false);
 	s2 = bytea_substring(PointerGetDatum(t1), sp_pl_sl, -1, true);
 	result = bytea_catenate(s1, t2);
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 0ecf651..a9cf6df 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -1972,13 +1972,13 @@ exec_stmt_fori(PLpgSQL_execstate *estate, PLpgSQL_stmt_fori *stmt)
 		 */
 		if (stmt->reverse)
 		{
-			if ((int32) (loop_value - step_value) > loop_value)
+			if (loop_value < INT_MIN + step_value)
 				break;
 			loop_value -= step_value;
 		}
 		else
 		{
-			if ((int32) (loop_value + step_value) < loop_value)
+			if (loop_value > INT_MAX - step_value)
 				break;
 			loop_value += step_value;
 		}
-- 
1.7.10.4
[HACKERS] [PATCH 2/3] Fix x + 1 < x overflow checks
icc optimizes away x + 1 < x because signed integer overflow is undefined behavior in C. Instead, simply check if x is INT_MAX.
---
 src/backend/utils/adt/float.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index b73e0d5..344b092 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -2764,12 +2764,12 @@ width_bucket_float8(PG_FUNCTION_ARGS)
 			result = 0;
 		else if (operand >= bound2)
 		{
-			result = count + 1;
 			/* check for overflow */
-			if (result < count)
+			if (count == INT_MAX)
 				ereport(ERROR,
 						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 						 errmsg("integer out of range")));
+			result = count + 1;
 		}
 		else
 			result = ((float8) count * (operand - bound1) / (bound2 - bound1)) + 1;
@@ -2780,12 +2780,12 @@ width_bucket_float8(PG_FUNCTION_ARGS)
 			result = 0;
 		else if (operand <= bound2)
 		{
-			result = count + 1;
 			/* check for overflow */
-			if (result < count)
+			if (count == INT_MAX)
 				ereport(ERROR,
 						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 						 errmsg("integer out of range")));
+			result = count + 1;
 		}
 		else
 			result = ((float8) count * (bound1 - operand) / (bound1 - bound2)) + 1;
-- 
1.7.10.4
[HACKERS] [PATCH 3/3] Fix overflow checking in repeat()
icc optimizes away `check + VARHDRSZ <= check' since signed integer overflow is undefined behavior. Simplify the overflow check for `VARHDRSZ + count * slen' as `count > (INT_MAX - VARHDRSZ) / slen'.
---
 src/backend/utils/adt/oracle_compat.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/src/backend/utils/adt/oracle_compat.c b/src/backend/utils/adt/oracle_compat.c
index d448088..a85a672 100644
--- a/src/backend/utils/adt/oracle_compat.c
+++ b/src/backend/utils/adt/oracle_compat.c
@@ -15,6 +15,8 @@
  */
 #include "postgres.h"
 
+#include <limits.h>
+
 #include "utils/builtins.h"
 #include "utils/formatting.h"
 #include "mb/pg_wchar.h"
@@ -1034,20 +1036,14 @@ repeat(PG_FUNCTION_ARGS)
 		count = 0;
 
 	slen = VARSIZE_ANY_EXHDR(string);
-	tlen = VARHDRSZ + (count * slen);
 
 	/* Check for integer overflow */
-	if (slen != 0 && count != 0)
-	{
-		int			check = count * slen;
-		int			check2 = check + VARHDRSZ;
-
-		if ((check / slen) != count || check2 <= check)
-			ereport(ERROR,
-					(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-					 errmsg("requested length too large")));
-	}
+	if (slen != 0 && count > (INT_MAX - VARHDRSZ) / slen)
+		ereport(ERROR,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("requested length too large")));
 
+	tlen = VARHDRSZ + (count * slen);
 	result = (text *) palloc(tlen);
 	SET_VARSIZE(result, tlen);
-- 
1.7.10.4
Re: [HACKERS] [PATCH 0/3] Work around icc miscompilation
On 24.01.2013 11:33, Xi Wang wrote: I'm sending three smaller patches for review, which try to fix icc and pathscale (mis)compilation problems described in my last email. These patches look ok at a quick glance, but how do we ensure this kind of problem doesn't crop back up again in the future? Does icc give a warning about these? Do we have a buildfarm animal that produces the warnings? If we fix these, can we stop using -fwrapv on gcc? Is there any way to get gcc to warn about these? - Heikki
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On Thursday, January 24, 2013 7:42 AM Noah Misch wrote: On Wed, Jan 09, 2013 at 11:22:06AM +, Simon Riggs wrote: On 24 December 2012 16:57, Amit Kapila amit.kap...@huawei.com wrote: Performance: Average of 3 runs of pgbench in tps

 9.3devel | with trailing null patch
 ---------+-------------------------
 578.9872 | 573.4980

On balance, it would seem optimizing for this special case would affect everybody negatively; not much, but enough. Which means we should reject this patch. Do you have a reason why a different view might be taken? We've mostly ignored performance changes of a few percentage points, because the global consequences of adding or removing code to the binary image can easily change performance that much. It would be great to get to the point where we can reason about 1% performance changes, but we're not there. If a measured +1% performance change would have yielded a different decision, we should rethink the decision based on more-robust criteria. Today, I took one more set of readings of the original pgbench, which are below in this mail. The difference is that this set of readings is on Suse 11 and with shared_buffers = 4G. IMO, the changes in this patch are not causing any regression; however, it gives a performance/size reduction in some of the cases, as shown by the data in previous mails. So the point to decide is whether the use cases in which it gives a benefit are worth having this patch. As Tom had already raised some concerns about the code, I think the patch can only have merit if the use case makes sense to people. 
System Configuration:
Hardware: 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz)
RAM: 24GB
Operating System: Suse-Linux 11.2 x86_64

Server Configuration:
The database cluster will be initialized with locales
  COLLATE:  C
  CTYPE:    C
  MESSAGES: en_US.UTF-8
  MONETARY: en_US.UTF-8
  NUMERIC:  en_US.UTF-8
  TIME:     en_US.UTF-8
shared_buffers = 4GB
checkpoint_segments = 255
checkpoint_timeout = 10min

pgbench:
transaction type: TPC-B (sort of)
scaling factor: 75
query mode: simple
number of clients: 4
number of threads: 4
duration: 300 s

        original  patch
Run-1:  311       312
Run-2:  307       302
Run-3:  300       312
Run-4:  285       295
Run-5:  308       305
Avg:    302.2     305.2

With Regards, Amit Kapila.
Re: [HACKERS] [PATCH 0/3] Work around icc miscompilation
On 1/24/13 5:02 AM, Heikki Linnakangas wrote: These patches look ok at a quick glance, but how do we ensure this kind of problems don't crop back again in the future? Does icc give a warning about these? Do we have a buildfarm animal that produces the warnings? If we fix these, can we stop using -fwrapv on gcc? Is there any way to get gcc to warn about these? Thanks for reviewing. gcc has this -Wstrict-overflow option to warn against overflow checks that may be optimized away. The result is inaccurate: it may produce a large number of false warnings, and it may also miss many cases (esp. when gcc's value-range-propagation fails to compute variables' ranges). Not sure if other compilers have similar options. I find these broken checks using a static checker I'm developing, and only report cases that existing compilers do miscompile. If you are interested, I'll post a complete list of overflow checks in pgsql that invoke undefined behavior and thus may be killed by future compilers. I believe we can get rid of -fwrapv once we fix all such checks. - xi
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 24.01.2013 00:30, Andres Freund wrote: Hi, I decided to reply on the patches thread to be able to find this later. On 2013-01-23 22:48:50 +0200, Heikki Linnakangas wrote: logical changeset generation v4 This is a boatload of infrastructure for supporting logical replication, yet we have no code actually implementing logical replication that would go with this. The premise of logical replication over trigger-based was that it'd be faster, yet we cannot assess that without a working implementation. I don't think this can be committed in this state. It's a fair point that this is a huge amount of code without a user in itself in-core. But the reason it got no user included is because several people explicitly didn't want a user in-core for now but said the first part of this would be to implement the changeset generation as a separate piece. Didn't you actually prefer not to have any users of this in-core yourself? Yes, I certainly did. But we still need to see the other piece of the puzzle to see how this fits with it. BTW, why does all the transaction reordering stuff have to be in core? How much of this infrastructure is to support replicating DDL changes? IOW, if we drop that requirement, how much code can we slash? Any other features or requirements that could be dropped? I think it's clear at this stage that this patch is not going to be committed as it is. If you can reduce it to a fraction of what it is now, that fraction might have a chance. Otherwise, it's just going to be pushed to the next commitfest as a whole, and we're going to be having the same doubts and discussions then. - Heikki
Re: [HACKERS] Event Triggers: adding information
Robert Haas robertmh...@gmail.com writes: On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: - Context exposing and filtering I'm not feeling very sanguine about any of this. I feel strongly that we should try to do something that's more principled than just throwing random junk into ProcessUtility. The goal is to allow advanced users of that feature to get at the sequence, constraint and index names that have been picked automatically by PostgreSQL when doing some CREATE TABLE statement that involves them:

  CREATE TABLE foo (id serial PRIMARY KEY, f1 text);

Andres and Steve, as you're the next ones to benefit directly from that whole feature and at different levels, do you have a strong opinion on whether you really need that to operate at the logical level? The trick is not implementing the feature (that's quite easy really), it's about being able to provide it without any API promises, because it's an opening on the internals. Maybe the answer is to provide the same level of information as you get in an Event Trigger to new hooks.

- Additional Information to PL/pgSQL

We're talking about:
 - operation (CREATE, ALTER, DROP)
 - object type (TABLE, FUNCTION, TEXT SEARCH DICTIONARY, …)
 - object OID (when there's a single object target)
 - object schema name (when there's a single object target)
 - object name (when there's a single object target)

I realise I omitted something important here. The current patch effort only adds that information if the target object exists in the catalogs at the time of the information lookup. In ddl_command_start, I believe you can only expose the information that is available in the parse tree. That is, if the user typed CREATE TABLE foo, you can tell the event trigger that the user didn't specify a schema and that they mentioned the name foo. 
You cannot I used to do that, and that's the reason why I had the Command String Normalisation in the same patch: both are walking the parsetree to some extent, and I didn't want to have different routines doing different walks to get at the information. predict what schema foo will actually end up in at that point, and we shouldn't try. I don't have a problem with exposing the information we have at this point with the documented caveat that if you rely on it to mean something it doesn't you get to keep both pieces. Fair enough. I retracted from doing that to be able to separate away the parsetree-walking code for later review, and I'm now wondering if that's really what we want to do. If auditing, you can log the information after the command has been run (in the case of CREATE and ALTER commands) or before the object has been deleted (in the case of a DROP command). If you want to cancel the whole operation depending on the name of the object, or the schema it ends up in, then you can do it at the end of the command's execution with a RAISE EXCEPTION. But I wonder: wouldn't it be better to just expose the raw string the user typed? I mean, in all the cases we really care about, the information we can *reliably* expose at this point is going to be pretty nearly the same as extracting the third space-delimited word out of the command text and splitting it on a period. And you can do that easily even in PL/pgSQL. You could also extract a whole lot Totally Agreed. That's another reason why I want to provide users with the Normalized Command String; it will then be even easier for them to do what you just said. After having been working on that for some time, though, I'm leaning toward Tom's view that you shouldn't try to expose information about objects that don't exist in the catalogs, because that's not code, that's magic. of OTHER potentially useful information from the command text that you couldn't get if we just exposed the given schema name and object name. 
For example, this would allow you to write an event trigger that lets you enforce a policy that all column names must use Hungarian notation. Note that this is a terrible idea and you shouldn't do it, but you could if you wanted to. And, the trigger would be relatively long and complex and you might have bugs, but that's still better than the status quo where you are simply out of luck. You seem to be talking about System Hungarian, not Apps Hungarian, but I don't think that's the point here. http://www.joelonsoftware.com/articles/Wrong.html Now, in a ddl_command_end trigger, there's a lot more information that you can usefully expose. In theory, if the command records what it did somewhere, you can extract that information with as much precision as it cared to record. However, I'm pretty skeptical of the idea of exposing a single OID. I mean, if the main use case here is There's precedent in PostgreSQL: how do you get information about each row that was in the target from a FOR EACH STATEMENT trigger? array
Re: [HACKERS] Event Triggers: adding information
Robert Haas robertmh...@gmail.com writes: Can you point me specifically at what you have in mind so I can make sure I'm considering the right thing? Something like this part:

+ -- now try something crazy to ensure we don't crash the backend
+ create function test_event_trigger_drop_function()
+ returns event_trigger
+ as $$
+ BEGIN
+ drop function test_event_trigger2() cascade;
+ END
+ $$ language plpgsql;
+
+ create function test_event_trigger2() returns event_trigger as $$
+ BEGIN
+ RAISE NOTICE 'test_event_trigger2: % %', tg_event, tg_tag;
+ END
+ $$ language plpgsql;
+
+ create event trigger drop_test_a on ddl_command_start
+ when tag in ('create table')
+ execute procedure test_event_trigger_drop_function();
+
+ create event trigger drop_test_b on ddl_command_start
+ execute procedure test_event_trigger2();
+
+ create table event_trigger_fire1 (a int);
+
+ -- then cleanup the crazy test
+ drop event trigger drop_test_a;
+ drop event trigger drop_test_b;
+ drop function test_event_trigger_drop_function();

The only problem for the regression tests being that there's an OID in the output, but with your proposed error message that would go away. -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
Hi Robert, Hi all, On 2013-01-23 20:17:04 -0500, Robert Haas wrote: On Wed, Jan 23, 2013 at 5:30 PM, Andres Freund and...@2ndquadrant.com wrote: The only reason the submitted version of logical decoding is comparatively slow is that its xmin update policy is braindamaged; working on that right now. I agree. The thing that scares me about the logical replication stuff is not that it might be slow (and if your numbers are to be believed, it isn't), but that I suspect it's riddled with bugs and possibly some questionable design decisions. If we commit it and release it, then we're going to be stuck maintaining it for a very, very long time. If it turns out to have serious bugs that can't be fixed without a new major release, it's going to be a serious black eye for the project. That's much more along the lines of what I am afraid of than the performance stuff - but Heikki cited those, so I replied to that. Note that I didn't say this must, must go in - I just don't think Heikki's reasoning about why not hit the nail on the head. Of course, I have no evidence that that will happen. But it is a really big piece of code, and therefore unless you are superman, it's probably got a really large number of bugs. The scary thing is that it is not as if we can say, well, this is a big hunk of code, but it doesn't really touch the core of the system, so if it's broken, it'll be broken itself, but it won't break anything else. Rather, this code is deeply in bed with WAL, with MVCC, and with the on-disk format of tuples, and makes fundamental changes to the first two of those. You agreed with Tom that 9.2 is the buggiest release in recent memory, but I think logical replication could easily be an order of magnitude worse. I tried very, very hard to get the basics of the design and interface solid. Which obviously doesn't mean I am succeeding - luckily not being superhuman after all ;).
And I think that's very much where input is desperately needed and where I failed to raise enough attention. The output plugin interface followed by the walsender interface is what needs to be most closely vetted. Those are the permanent, user/developer-exposed UI and the ones we should try to keep as stable as possible. The output plugin callbacks are defined here: http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/include/replication/output_plugin.h;hb=xlog-decoding-rebasing-cf4 To make it more agnostic of the technology used to implement changeset extraction we possibly should replace the ReorderBuffer(TXN|Change) structs being passed by something more implementation-agnostic. walsender interface: http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/replication/repl_gram.y;hb=xlog-decoding-rebasing-cf4 The interesting new commands are:
1) K_INIT_LOGICAL_REPLICATION NAME NAME
2) K_START_LOGICAL_REPLICATION NAME RECPTR plugin_options
3) K_FREE_LOGICAL_REPLICATION NAME
1 and 3 allocate (respectively free) the permanent state associated with one changeset consumer, whereas START_LOGICAL_REPLICATION streams out changes starting at RECPTR. Btw, there are currently *no* changes to the WAL format at all if wal_level < logical, except that xl_running_xacts are logged more frequently, which obviously could easily be made conditional. Barring bugs, of course. The changes with wal_level=logical aren't that big either imo:
* heap_insert, heap_update prevent full page writes from removing their normal record by using a separate XLogRecData block for the buffer and the record
* heap_delete adds more data (the pkey of the tuple) after the unchanged xl_heap_delete struct
* On changes to catalog tables, (relfilenode, tid, cmin, cmax) are logged.
No changes to MVCC for normal backends at all, unless you count the very slightly changed *Satisfies interface (getting passed a HeapTuple instead of a HeapTupleHeader).
I am not sure what you're concerned about WRT the on-disk format of the tuples? We are pretty much nailed down on that due to pg_upgrade'ability anyway, and it could be changed from this patch's POV without a problem; the output plugin just sees normal HeapTuples. Or are you concerned about the code extracting them from the xlog records? So I think the won't-break-anything-else argument can be made rather fairly if the heapam.c changes, which aren't that complex, are vetted closely. Now, the discussion about all the code that's active *during* decoding is something else entirely :/ You agreed with Tom that 9.2 is the buggiest release in recent memory, but I think logical replication could easily be an order of magnitude worse. I unfortunately think that not providing more builtin capabilities in this area also has significant dangers. Imo this is one of the weakest, or even the weakest, areas of postgres. I personally have the impression that just about nobody did actual beta testing of the latest releases, especially 9.2, and that is the
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
On Thursday, January 24, 2013 1:56 AM Andres Freund wrote: On 2013-01-22 12:32:07 +, Amit kapila wrote: This closes all comments raised till now for this patch. Kindly let me know if you feel something is missing? I am coming late to this patch, so bear with me if I repeat something said elsewhere. Thanks for looking at the patch. Review comments of a cursory pass through the patch: * most comments are hard to understand. I know the problem of that being hard for a non-native speaker by heart, but I think another pass over them would be a good thing. I shall give another pass to improve the comments in code. * The gram.y changes around set_rest_(more|common) seem pretty confused to me. E.g. it's not possible anymore to set the timezone for a function. What do you exactly mean by this part of comment. And why is it possible to persistently set the search path, but not client encoding? Why is FROM CURRENT in set_rest_more? I think the main reason was some of the commands can work only in a transaction, and as SET PERSISTENT cannot work inside a transaction block, I had kept some segregation. I shall reply to you with the exact reason in a separate mail. * set_config_file should elog(ERROR), not return, on an unhandled setstmt->kind Shall fix it. * why are you creating AutoConfFileName if its not stat'able? It seems better to simply skip parsing the old file in that case Sure, it's better to handle as you are suggesting. * Writing the temporary file to .$pid seems like a bad idea, better use one file for that, SET PERSISTENT is protected by an exclusive lock anyway. I think we can use one temporary file, in fact that was one of the ways I had asked about in one of the previous mails. However Tom and Zoltan felt this is the better way to do it. I can think of one reason where it will be better to have .$pid, and that is if someone has opened the file manually, then all other sessions can fail (on Windows). In fact this is one of the tests Zoltan had performed.
* the write sequence should be:
* fsync(tempfile)
* fsync(directory)
* rename(tempfile, configfile)
* fsync(configfile)
* fsync(directory)
Why do we need fsync(directory) and fsync(configfile)? * write_auto_conf_file should probably escape quoted values? Can you please elaborate more, I am not able to understand your point? * coding style should be adhered to more closely, there are many if (pointer) which should be if (pointer != NULL), Are you pointing in function validate_conf_option(const char *name, char *value) for below usage: + if (value) single-line blocks enclosed in curlies which shouldn't be, etc. Shall fix. * replace_auto_config_file and surrounding functions need more comments in the header Sure, shall add more comments in the function headers of the following functions:
validate_conf_option()
RemoveAutoConfTmpFiles()
write_auto_conf_file()
replace_auto_config_value()
* the check that prevents persistent SETs in a transaction should rather be in utility.c and use PreventTransactionChain like most of the others that need to do that (c.f. T_CreatedbStmt). We can move the check to utility.c, but if we use PreventTransactionChain, then it will be disallowed in functions as well. But the original idea was to not support it in transaction blocks only. So I feel use of the current function IsTransactionBlock() should be okay. With Regards, Amit Kapila.
Re: [HACKERS] CF3+4 (was Re: Parallel query execution)
On Thursday, January 24, 2013 2:58 AM Stephen Frost wrote: Heikki, * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: FWIW, here's how I feel about some of the patches. It's not an exhaustive list. Thanks for going through them and commenting on them. built-in/SQL Command to edit the server configuration file (postgresql.conf) I think this should be a pgfoundry project, not in core. At least for now. I don't think it would ever get deployed or used then.. I think for this feature a lot of testing has already been done, and it has been reviewed by quite a few people. About the feature's value and high-level design, quite a lot of discussion has already happened. At this point, there is no design-level concern for this patch; there are a few issues which can be dealt with via minor changes. Do you see any lack of importance for this feature? With Regards, Amit Kapila.
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
One random thing that caught my eye in the patch, I thought I'd mention it while I still remember: In heap_delete, you call heap_form_tuple() in a critical section. That's a bad idea, because if it runs out of memory - PANIC. - Heikki
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 2013-01-24 12:38:25 +0200, Heikki Linnakangas wrote: On 24.01.2013 00:30, Andres Freund wrote: Hi, I decided to reply on the patches thread to be able to find this later. On 2013-01-23 22:48:50 +0200, Heikki Linnakangas wrote: logical changeset generation v4 This is a boatload of infrastructure for supporting logical replication, yet we have no code actually implementing logical replication that would go with this. The premise of logical replication over trigger-based was that it'd be faster, yet we cannot assess that without a working implementation. I don't think this can be committed in this state. It's a fair point that this is a huge amount of code without a user in itself in-core. But the reason it got no user included is because several people explicitly didn't want a user in-core for now but said the first part of this would be to implement the changeset generation as a separate piece. Didn't you actually prefer not to have any users of this in-core yourself? Yes, I certainly did. But we still need to see the other piece of the puzzle to see how this fits with it. Fair enough. I am also working on a user of this infrastructure, but that doesn't help you very much. Steve Singer seemed to make some stabs at writing an output plugin as well. Steve, how far did you get there? BTW, why does all the transaction reordering stuff have to be in core? It didn't use to, but people argued pretty damned hard that no undecoded data should ever be allowed to leave the postgres cluster. And to be fair it makes writing an output plugin *way* easier. Check http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/test_decoding/test_decoding.c;hb=xlog-decoding-rebasing-cf4 If you skip over tuple_to_stringinfo(), which is just pretty generic scaffolding for converting a whole tuple to a string, writing out the changes in some format by now is pretty damn simple. How much of this infrastructure is to support replicating DDL changes?
IOW, if we drop that requirement, how much code can we slash? Unfortunately I don't think too much, unless we add in other code that allows us to check whether the current definition of a table is still the same as it was back when the tuple was logged. Any other features or requirements that could be dropped? I think it's clear at this stage that this patch is not going to be committed as it is. If you can reduce it to a fraction of what it is now, that fraction might have a chance. Otherwise, it's just going to be pushed to the next commitfest as a whole, and we're going to be having the same doubts and discussions then. One thing that reduces complexity is to declare the following as unsupported:
- CREATE TABLE foo(data text);
- DECODE UP TO HERE;
- INSERT INTO foo(data) VALUES(very-long-to-be-externally-toasted-tuple);
- DROP TABLE foo;
- DECODE UP TO HERE;
but that's just a minor thing. I think what we can do more realistically than to chop off required parts of changeset extraction is to start applying some of the preliminary patches independently:
- the relmapper/relfilenode changes + pg_relation_by_filenode(spc, relnode) should be independently committable, if a bit boring
- allowing walsenders to connect to a database possibly needs an interface change but otherwise it should be fine to go in independently. It also has other potential use-cases, so I think that's fair.
- logging xl_running_xacts more frequently could also be committed independently and makes sense on its own, as it allows a standby to enter HS faster if the master is busy
- introducing InvalidCommandId should be relatively uncontroversial. The fact that no invalid value for command ids exists is imo an oversight
- the *Satisfies changes could be applied and they are imo ready, but there's no use-case for them without the rest, so I am not sure whether there's a point
- currently not separately available, but we could add wal_level=logical independently.
There would be no user of it, but it would be partial work. That includes the relcache support for keeping track of the primary key which already is available separately. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] My first patch! (to \df output)
On Thu, Jan 24, 2013 at 2:27 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 01/24/2013 01:50 AM, Phil Sorber wrote: This looks good to me now. Compiles and works as described. Ready to go? https://commitfest.postgresql.org/action/patch_view?id=1008 I guess I wasn't ready to be so bold, but sure. :) I changed it to 'ready for committer'. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Re: logical changeset generation v4 - Heikki's thoughts about the patch state
On 2013-01-24 13:28:18 +0200, Heikki Linnakangas wrote: One random thing that caught my eye in the patch, I thought I'd mention it while I still remember: In heap_delete, you call heap_form_tuple() in a critical section. That's a bad idea, because if it runs out of memory - PANIC. Good point, will fix. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
On 2013-01-24 16:45:42 +0530, Amit Kapila wrote: * The gram.y changes around set_rest_(more|common) seem pretty confused to me. E.g. it's not possible anymore to set the timezone for a function. What do you exactly mean by this part of comment. The set_rest_more production is used in FunctionSetResetClause and you've removed capabilities from it. And why is it possible to persistently set the search path, but not client encoding? Why is FROM CURRENT in set_rest_more? I think the main reason was some of the commands can work only in a transaction, and as SET PERSISTENT cannot work inside a transaction block, I had kept some segregation. Yes, I can see reasons for doing this, I just think the split isn't correct as you've done it. * Writing the temporary file to .$pid seems like a bad idea, better use one file for that, SET PERSISTENT is protected by an exclusive lock anyway. I think we can use one temporary file, in fact that was one of the ways I had asked about in one of the previous mails. However Tom and Zoltan felt this is the better way to do it. They have? I didn't read it like that. The file can only ever be written by a running postmaster and we already have code that ensures that. There's absolutely no need for the tempfile to have a nondeterministic name. That makes cleanup way easier as well. I can think of one reason where it will be better to have .$pid, and that is if someone has opened the file manually, then all other sessions can fail (on Windows). In fact this is one of the tests Zoltan had performed. Why on earth should somebody have the tempfile open? There's an exclusive lock around writing the file in your code, and if anybody interferes like that with postgres' temporary data you have far bigger problems than SET PERSISTENT erroring out. * the write sequence should be:
* fsync(tempfile)
* fsync(directory)
* rename(tempfile, configfile)
* fsync(configfile)
* fsync(directory)
Why do we need fsync(directory) and fsync(configfile)?
So they don't vanish/get corrupted after a crash? The above is the canonical way to safely write a file without an invalid file ever being visible. * write_auto_conf_file should probably escape quoted values? Can you please elaborate more, I am not able to understand your point? You do something like (don't have the code right here) if (quoted) appendStringInfo(..., "= \"%s\"", value). What happens if value contains a "? * coding style should be adhered to more closely, there are many if (pointer) which should be if (pointer != NULL), Are you pointing in function validate_conf_option(const char *name, char *value) for below usage: + if (value) For example, yes. * the check that prevents persistent SETs in a transaction should rather be in utility.c and use PreventTransactionChain like most of the others that need to do that (c.f. T_CreatedbStmt). We can move the check to utility.c, but if we use PreventTransactionChain, then it will be disallowed in functions as well. But the original idea was to not support it in transaction blocks only. So I feel use of the current function IsTransactionBlock() should be okay. I don't think it's even remotely safe to allow doing this from a function. Doing it there explicitly allows doing it in a transaction. Should we find out it's safe, fine; relaxing restrictions later on is no problem, imposing them after introducing a feature is. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
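The sequence discussed here is the standard durable-rename pattern. A rough sketch in Python (illustrative only; the actual patch would do this in C inside the backend, and fsyncing a directory opened read-only is POSIX behaviour that does not carry over to Windows):

```python
import os

def atomic_write(path, data):
    """Sketch of the durable-rename pattern: write to a fixed-name
    temp file, fsync it, fsync the containing directory, rename over
    the target, then fsync the file and directory again, so that
    neither the data nor the rename can be lost after a crash and no
    partially written file is ever visible under the final name."""
    tmp = path + ".tmp"
    directory = os.path.dirname(path) or "."
    with open(tmp, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # fsync(tempfile)
    dfd = os.open(directory, os.O_RDONLY)  # POSIX-only: dir fd for fsync
    try:
        os.fsync(dfd)                 # fsync(directory)
        os.rename(tmp, path)          # rename(tempfile, configfile)
        fd = os.open(path, os.O_RDONLY)
        try:
            os.fsync(fd)              # fsync(configfile)
        finally:
            os.close(fd)
        os.fsync(dfd)                 # fsync(directory) again
    finally:
        os.close(dfd)
```

The directory fsyncs are the part the question is about: the rename is only durable once the directory entry itself has reached disk.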
Re: [HACKERS] [sepgsql 1/3] add name qualified creation label
On Thu, Jan 24, 2013 at 10:11 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2013/1/24 Tom Lane t...@sss.pgh.pa.us: John R Pierce pie...@hogranch.com writes: On 1/23/2013 8:32 PM, Tom Lane wrote: FWIW, in Fedora-land I see: ... I'd be far more interested in what is in RHEL and CentOS. Fedora, with its 6-month obsolescence cycle, is of zero interest to me for deploying database servers. But of course Fedora is also the upstream that will become RHEL7 and beyond. Do we know which version of Fedora will become RHEL7, and thus, which version of libselinux will go in RHEL7? (And do we know which version of postgres will go in RHEL7, assuming release schedules hold) It might be that the update timing makes a bigger difference in some other distros, though. To return to Heikki's original point about Debian, what are they shipping today? Even though I'm not familiar with Debian's release cycle, I tried to check the shipped version of postgresql and libselinux for the stable, testing, unstable and experimental releases. I'm not certain why they don't push postgresql-9.2 into the experimental release yet. However, it seems to me optimistic to expect libselinux-2.1.10 to be bundled on the timeline of postgresql-9.3. If someone is familiar with Debian's release cycle, I'd like to hear their suggestion. * Debian (stable) ... postgresql-8.4 + libselinux-2.0.96 http://packages.debian.org/en/squeeze/postgresql http://packages.debian.org/en/source/squeeze/libselinux * Debian (testing) ... postgresql-9.1 + libselinux-2.1.9 http://packages.debian.org/en/wheezy/postgresql http://packages.debian.org/en/source/wheezy/libselinux Just as a note, wheezy is the version that will be the next Debian stable, and it has been in freeze for quite a while. So we can safely expect it will be 2.1.9 that's included in the next Debian stable.
-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset
On Thu, Jan 24, 2013 at 03:45:36PM +0800, Craig Ringer wrote: On 01/24/2013 01:34 AM, Tom Lane wrote: Alexander Law exclus...@gmail.com writes: Please let me know if I can do something to get the bug fix (https://commitfest.postgresql.org/action/patch_view?id=902) committed. It's waiting on some Windows-savvy committer to pick it up, IMO. I'm no committer, but I can work with Windows and know text encoding issues fairly well. I'll take a look, though I can't do it immediately as I have some other priority work. Should be able to in the next 24h. Feel free, but it's already Ready for Committer. Also, Andrew has said he plans to pick it up. If in doubt, I think this patch is relatively solid in terms of non-committer review.
Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset
On 01/24/2013 07:57 PM, Noah Misch wrote: On Thu, Jan 24, 2013 at 03:45:36PM +0800, Craig Ringer wrote: On 01/24/2013 01:34 AM, Tom Lane wrote: Alexander Law exclus...@gmail.com writes: Please let me know if I can do something to get the bug fix (https://commitfest.postgresql.org/action/patch_view?id=902) committed. It's waiting on some Windows-savvy committer to pick it up, IMO. I'm no committer, but I can work with Windows and know text encoding issues fairly well. I'll take a look, though I can't do it immediately as I have some other priority work. Should be able to in the next 24h. Feel free, but it's already Ready for Committer. Also, Andrew has said he plans to pick it up. If in doubt, I think this patch is relatively solid in terms of non-committer review. I'm happy with that; I'll look elsewhere. Thanks. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] \d failing to find uppercased object names
Hello, some app created tables using uppercase letters in table names (this app was migrated from mysql). One tries to use \d to search tables info:
ru=# \d pnct_Board
Did not find any relation named "pnct_Board".
Of course, it works with:
ru=# \d "pnct_Board"
Table "public.pnct_Board"
Column | Type   | Modifiers
-------+--------+----------------------------------------------------------
id     | bigint | not null default nextval('pnct_Board_id_seq'::regclass)
...
Could this be considered as a gotcha? The thing is that the Postgres reply is pretty clear: Did not find any relation named "pnct_Board". But the table "pnct_Board" does exist in the system, so Postgres' reply is obviously incorrect. Version: 9.2.1
Re: [HACKERS] \d failing to find uppercased object names
2013/1/24 Nikolay Samokhvalov samokhva...@gmail.com: some app created tables using uppercase letters in table names (this app was migrated from mysql). One tries to use \d to search tables info: ... Could this be considered as a gotcha? The thing is that the Postgres reply is pretty clear: Did not find any relation named "pnct_Board". But the table "pnct_Board" does exist in the system, so Postgres' reply is obviously incorrect. Take a look at this section in the docs: http://www.postgresql.org/docs/current/interactive/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS This is expected behavior. -- Victor Y. Yegorov
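The folding rule behind that expected behavior: PostgreSQL lowercases unquoted identifiers, while double-quoted identifiers keep their exact case. A toy model of the rule (Python, illustrative only; real folding also handles Unicode, U& syntax, and more):

```python
def fold_identifier(ident):
    """Mimic PostgreSQL identifier folding: a double-quoted
    identifier keeps its exact case (outer quotes stripped, doubled
    quotes unescaped); an unquoted one is folded to lower case."""
    if len(ident) >= 2 and ident[0] == '"' and ident[-1] == '"':
        return ident[1:-1].replace('""', '"')
    return ident.lower()

# \d looks up the folded name, which is why quoting matters:
print(fold_identifier('pnct_Board'))    # -> pnct_board (no such table)
print(fold_identifier('"pnct_Board"'))  # -> pnct_Board (matches)
```

So `\d pnct_Board` searches for a relation literally named pnct_board, which does not exist, while `\d "pnct_Board"` finds the mixed-case table.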
Re: [HACKERS] pg_ctl idempotent option
On Thu, Jan 24, 2013 at 09:05:59AM +0530, Ashutosh Bapat wrote: I agree, answering the question of whether a particular attempt at starting a server succeeded or not will need the current behaviour. Now, the question is which of these behaviours should be the default? That would work. pg_upgrade knows the cluster version at that point and can use the proper flag. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
On Thursday, January 24, 2013 5:25 PM Andres Freund wrote: On 2013-01-24 16:45:42 +0530, Amit Kapila wrote: * The gram.y changes around set_rest_(more|common) seem pretty confused to me. E.g. it's not possible anymore to set the timezone for a function. What do you exactly mean by this part of comment. The set_rest_more production is used in FunctionSetResetClause and you've removed capabilities from it. set_rest_more has set_rest_common, so there should be no problem.
set_rest_more: /* Generic SET syntaxes: */
    set_rest_common
    | var_name FROM CURRENT_P
        {
            VariableSetStmt *n = makeNode(VariableSetStmt);
            n->kind = VAR_SET_CURRENT;
            n->name = $1;
            $$ = n;
        }
And why is it possible to persistently set the search path, but not client encoding? Why is FROM CURRENT in set_rest_more? I think the main reason was some of the commands can work only in a transaction, and as SET PERSISTENT cannot work inside a transaction block, I had kept some segregation. Yes, I can see reasons for doing this, I just think the split isn't correct as you've done it. * Writing the temporary file to .$pid seems like a bad idea, better use one file for that, SET PERSISTENT is protected by an exclusive lock anyway. I think we can use one temporary file, in fact that was one of the ways I had asked about in one of the previous mails. However Tom and Zoltan felt this is the better way to do it. They have? I didn't read it like that. The file can only ever be written by a running postmaster and we already have code that ensures that. There's absolutely no need for the tempfile to have a nondeterministic name. That makes cleanup way easier as well. Sure, I understand that cleaning will be easier.
So IIUC you are suggesting we should just keep the name as postgresql.auto.conf.tmp. In general, please read the below mail where it has been suggested to use .$pid http://www.postgresql.org/message-id/28379.1358541...@sss.pgh.pa.us I can think of one reason where it will be better to have .$pid, and that is if someone has opened the file manually, then all other sessions can fail (on Windows). In fact this is one of the tests Zoltan had performed. Why on earth should somebody have the tempfile open? There's an exclusive lock around writing the file in your code, and if anybody interferes like that with postgres' temporary data you have far bigger problems than SET PERSISTENT erroring out. I am also not sure, but maybe some people do for test purposes. * the write sequence should be:
* fsync(tempfile)
* fsync(directory)
* rename(tempfile, configfile)
* fsync(configfile)
* fsync(directory)
Why do we need fsync(directory) and fsync(configfile)? So they don't vanish/get corrupted after a crash? The above is the canonical way to safely write a file without an invalid file ever being visible. Do you think there will be a problem if we just use the following:
* fsync(tempfile)
* rename(tempfile, configfile)
I have seen this kind of use elsewhere in the code (writeTimeLineHistory()) * write_auto_conf_file should probably escape quoted values? Can you please elaborate more, I am not able to understand your point? You do something like (don't have the code right here) if (quoted) appendStringInfo(..., "= \"%s\"", value). What happens if value contains a "? I think this case is missed. I shall take care to handle the case when the value contains such a character. * coding style should be adhered to more closely, there are many if (pointer) which should be if (pointer != NULL), Are you pointing in function validate_conf_option(const char *name, char *value) for below usage: + if (value) For example, yes.
Okay, I shall fix these, but I have seen such code at other places (set_config_option); that's why I have kept it the way it is currently. * the check that prevents persistent SETs in a transaction should rather be in utility.c and use PreventTransactionChain like most of the others that need to do that (c.f. T_CreatedbStmt). We can move the check to utility.c, but if we use PreventTransactionChain, then it will be disallowed in functions as well. But the original idea was to not support it in transaction blocks only. So I feel use of the current function IsTransactionBlock() should be okay. I don't think it's even remotely safe to allow doing this from a function. Doing it there explicitly allows doing it in a transaction. As the SET command is allowed inside a function, I don't think we should disallow SET PERSISTENT in functions without any reason. The
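On the escaping question raised in this exchange: if values are written to the config file quoted, embedded quote characters need doubling. A minimal sketch of the idea, assuming the single-quote style that postgresql.conf itself documents (Python for illustration; the helper name is hypothetical, and the actual patch would do this in C in write_auto_conf_file):

```python
def quote_guc_value(value):
    """Quote a GUC value for a postgresql.conf-style file:
    wrap it in single quotes and double any embedded single
    quotes, so a value like  it's  is written as  'it''s'  and
    cannot break out of the quoted string."""
    return "'" + value.replace("'", "''") + "'"

print(quote_guc_value("dd'mm'yyyy"))  # -> 'dd''mm''yyyy'
```

Without the replace() step, a value containing a quote would terminate the quoted string early and corrupt the generated file, which is exactly the failure mode being pointed out.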
[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
On 2013-01-24 18:37:29 +0530, Amit Kapila wrote: On Thursday, January 24, 2013 5:25 PM Andres Freund wrote: On 2013-01-24 16:45:42 +0530, Amit Kapila wrote: * The gram.y changes around set_rest_(more|common) seem pretty confused to me. E.g. it's not possible anymore to set the timezone for a function. What exactly do you mean by this part of the comment? The set_rest_more production is used in FunctionSetResetClause and you've removed capabilities from it. set_rest_more has set_rest_common, so there should be no problem. set_rest_more: /* Generic SET syntaxes: */ set_rest_common | var_name FROM CURRENT_P { VariableSetStmt *n = makeNode(VariableSetStmt); n->kind = VAR_SET_CURRENT; n->name = $1; $$ = n; } True. I still think that the split you've made isn't right as-is. * Writing the temporary file to .$pid seems like a bad idea, better use one file for that, SET PERSISTENT is protected by an exclusive lock anyway. I think we can use one temporary file, in fact that was one of the ways I asked about in one of my previous mails. However Tom and Zoltan felt this is the better way to do it. They have? I didn't read it like that. The file can only ever be written by a running postmaster and we already have code that ensures that. There's absolutely no need for the tempfile to have a nondeterministic name. That makes cleanup way easier as well. Sure, I understand that cleaning will be easier. So IIUC you are suggesting we should just keep the name as postgresql.auto.conf.tmp. In general, please read the below mail where it has been suggested to use .$pid http://www.postgresql.org/message-id/28379.1358541...@sss.pgh.pa.us That was in the context of your use of mkstemp() as far as I read it. We use constantly named temp files in other locations as well. Why on earth should somebody have the tempfile open? There's an exclusive lock around writing the file in your code and if anybody interferes like that with postgres' temporary data you have far bigger problems than SET PERSISTENT erroring out. 
I am also not sure, but maybe some people do for test purposes. That seems like completely pointless reasoning. * the write sequence should be: * fsync(tempfile) * fsync(directory) * rename(tempfile, configfile) * fsync(configfile) * fsync(directory) Why do we need fsync(directory) and fsync(configfile)? So they don't vanish/get corrupted after a crash? The above is the canonical way to safely write a file without an invalid file ever being visible. Do you think there will be a problem if we just use the below: * fsync(tempfile) * rename(tempfile, configfile) I have seen this kind of use elsewhere in the code as well (writeTimeLineHistory()) Yea, there are some places where the code isn't entirely safe. No reason to introduce more of those. * the check that prevents persistent SETs in a transaction should rather be in utility.c and use PreventTransactionChain like most of the others that need to do that (c.f. T_CreatedbStmt). We can move the check to utility.c, but if we use PreventTransactionChain, then it will be disallowed in functions as well. But the original idea was to not support it in transaction blocks only. So I feel use of the current function IsTransactionBlock() should be okay. I don't think it's even remotely safe to allow doing this from a function. Doing it there explicitly allows doing it in a transaction. As the SET command is allowed inside a function, I don't think we should disallow SET PERSISTENT in functions without any reason. The reason why it's not allowed in a transaction is that we would have to delete the temporary file at transaction end or at rollback, which could have made the logic much more complex. Yea, and allowing use in functions doesn't help you with that at all: Consider the following plpgsql function: $$ BEGIN SET PERSISTENT foo = 'bar'; RAISE EXCEPTION 'blub'; END; $$ Now you have a SET PERSISTENT that succeeded although the transaction failed. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
Amit Kapila escribió: On Wednesday, January 23, 2013 11:51 PM Alvaro Herrera wrote: Yes it is -- the /etc/postgresql/version/cluster directory (where postgresql.conf resides) is owned by user postgres. So in that case we can consider postgresql.auto.conf to be in a path w.r.t. postgresql.conf instead of the data directory. That seems sensible to me. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset
On 01/24/2013 03:42 AM, Craig Ringer wrote: On 01/24/2013 01:06 AM, Alexander Law wrote: Hello, Please let me know if I can do something to get the bug fix (https://commitfest.postgresql.org/action/patch_view?id=902) committed. I would like to fix other bugs related to postgres localization, but I am not sure yet how to do it. For anyone looking for the history, the 1st post on this topic is here: http://www.postgresql.org/message-id/e1s3twb-0004oy...@wrigleys.postgresql.org Yeah. I'm happy enough with this patch. ISTM it's really a bug and should be backpatched, no? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Back-branch update releases coming in a couple weeks
From: Fujii Masao masao.fu...@gmail.com On Thu, Jan 24, 2013 at 7:42 AM, MauMau maumau...@gmail.com wrote: I searched through PostgreSQL mailing lists with WAL contains references to invalid pages, and i found 19 messages. Some people encountered similar problem. There were some discussions regarding those problems (Tom and Simon Riggs commented), but those discussions did not reach a solution. I also found a discussion which might relate to this problem. Does this fix the problem? [BUG] lag of minRecoveryPont in archive recovery http://www.postgresql.org/message-id/20121206.130458.170549097.horiguchi.kyot...@lab.ntt.co.jp Yes. Could you check whether you can reproduce the problem on the latest REL9_2_STABLE? I tried to produce the problem by doing pg_ctl stop -mi against the primary more than ten times on REL9_2_STABLE, but the problem did not appear. However, I encountered the crash only once out of dozens of failovers, possibly more than a hundred times, on PostgreSQL 9.1.6. So, I'm not sure the problem is fixed in REL9_2_STABLE. I'm wondering if the fix discussed in the above thread solves my problem. I found the following differences between Horiguchi-san's case and my case: (1) Horiguchi-san says the bug outputs the message: WARNING: page 0 of relation base/16384/16385 does not exist On the other hand, I got the message: WARNING: page 506747 of relation base/482272/482304 was uninitialized (2) Horiguchi-san produced the problem when he shut the standby immediately and restarted it. However, I saw the problem during failover. (3) Horiguchi-san did not use any index, but in my case the WARNING message refers to an index. But there's a similar point. Horiguchi-san says the problem occurs after DELETE+VACUUM. In my case, I shut the primary down while the application was doing INSERT/UPDATE. As the below messages show, some vacuuming was running just before the immediate shutdown: ... 
LOG: automatic vacuum of table GOLD.scm1.tbl1: index scans: 0
pages: 0 removed, 36743 remain
tuples: 0 removed, 73764 remain
system usage: CPU 0.09s/0.11u sec elapsed 0.66 sec
LOG: automatic analyze of table GOLD.scm1.tbl1 system usage: CPU 0.00s/0.14u sec elapsed 0.32 sec
LOG: automatic vacuum of table GOLD.scm1.tbl2: index scans: 0
pages: 0 removed, 12101 remain
tuples: 40657 removed, 44142 remain
system usage: CPU 0.06s/0.06u sec elapsed 0.30 sec
LOG: automatic analyze of table GOLD.scm1.tbl2 system usage: CPU 0.00s/0.06u sec elapsed 0.14 sec
LOG: received immediate shutdown request
... Could you tell me the details of the problem discussed and fixed in the upcoming minor release? I would like to know the phenomenon and its conditions, and whether it applies to my case. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 1/3] Fix x + y < x overflow checks
On Thu, Jan 24, 2013 at 6:36 AM, Xi Wang xi.w...@gmail.com wrote: icc optimizes away the overflow check x + y < x (y > 0), because signed integer overflow is undefined behavior in C. Instead, use a safe precondition test x > INT_MAX - y. I should mention gcc 4.7 does the same, and it emits a warning. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Passing connection string to pg_basebackup
On Tue, Jan 22, 2013 3:27 PM Hari Babu wrote: On Saturday, January 19, 2013 5:49 PM Magnus Hagander wrote: On Fri, Jan 18, 2013 at 1:05 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 18.01.2013 13:41, Amit Kapila wrote: On Friday, January 18, 2013 3:46 PM Heikki Linnakangas wrote: On 18.01.2013 08:50, Amit Kapila wrote: So to solve this problem below can be done: 1. Support connection string in pg_basebackup and mention keepalives or connection_timeout 2. Support recv_timeout separately to provide a way to users who are not comfortable tcp keepalives a. 1 can be done alone b. 2 can be done alone c. both 1 and 2. Right. Let's do just 1 for now. An general application level, non-TCP, keepalive message at the libpq level might be a good idea, but that's a much larger patch, definitely not 9.3 material. +1 for doing 1 now. But actually, I think we can just keep it that way in the future as well. If you need to specify these fairly advanced options, using a connection string really isn't a problem. I think it would be more worthwhile to go through the rest of the tools in bin/ and make sure they *all* support connection strings. And, an important point, do it the same way. Presently I am trying to implement the option-1 by adding an extra command line Option -C connection_string to pg_basebackup and pg_receivexlog. This option can be used with all the tools in bin folder. The existing command line options to the tools are not planned to remove as of now. To handle both options, we can follow these approaches. 1. To make the code simpler, the connection string is formed inside with the existing command line options, if the user is not provided the connection_string option. which is used for further processing. 2. The connection_string and existing command line options are handled separately. I feel approach-1 is better. Please provide your suggestions on the same. 
Here is the patch which handles taking a connection string as an argument to pg_basebackup and pg_receivexlog. Description of changes: 1. A new command-line option -C connection-string is added for passing the connection string. 2. Used the PQconnectdb function for connecting to the server instead of the existing function PQconnectdbParams. 3. The existing command-line parameters are formed into a string and passed to the PQconnectdb function. 4. With the connection string, if the user provides additional options along with the existing command-line options, higher priority is given to the additional options. 5. The conninfo_parse function is modified to handle single quotes in the password provided as input. Please provide your suggestions. Regards, Hari Babu. pg_basebkup_recvxlog_conn_string_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
On Thursday, January 24, 2013 6:51 PM Andres Freund wrote: On 2013-01-24 18:37:29 +0530, Amit Kapila wrote: On Thursday, January 24, 2013 5:25 PM Andres Freund wrote: On 2013-01-24 16:45:42 +0530, Amit Kapila wrote: * The gram.y changes around set_rest_(more|common) seem pretty confused to me. E.g. it's not possible anymore to set the timezone for a function. What exactly do you mean by this part of the comment? The set_rest_more production is used in FunctionSetResetClause and you've removed capabilities from it. set_rest_more has set_rest_common, so there should be no problem. set_rest_more: /* Generic SET syntaxes: */ set_rest_common | var_name FROM CURRENT_P { VariableSetStmt *n = makeNode(VariableSetStmt); n->kind = VAR_SET_CURRENT; n->name = $1; $$ = n; } True. I still think that the split you've made isn't right as-is. I am working on providing the exact reasoning for the split, and if some variable is not appropriate, I shall move it appropriately. * Writing the temporary file to .$pid seems like a bad idea, better use one file for that, SET PERSISTENT is protected by an exclusive lock anyway. I think we can use one temporary file, in fact that was one of the ways I asked about in one of my previous mails. However Tom and Zoltan felt this is the better way to do it. They have? I didn't read it like that. The file can only ever be written by a running postmaster and we already have code that ensures that. There's absolutely no need for the tempfile to have a nondeterministic name. That makes cleanup way easier as well. Sure, I understand that cleaning will be easier. So IIUC you are suggesting we should just keep the name as postgresql.auto.conf.tmp. In general, please read the below mail where it has been suggested to use .$pid http://www.postgresql.org/message-id/28379.1358541...@sss.pgh.pa.us That was in the context of your use of mkstemp() as far as I read it. We use constantly named temp files in other locations as well. 
Okay, as I told you, I had also proposed this initially; now that you have pointed it out specifically, I shall do it that way unless somebody has a strong reason for not doing it this way. Why on earth should somebody have the tempfile open? There's an exclusive lock around writing the file in your code and if anybody interferes like that with postgres' temporary data you have far bigger problems than SET PERSISTENT erroring out. I am also not sure, but maybe some people do for test purposes. That seems like completely pointless reasoning. I have no specific reason, so I shall do it the way you have suggested. * the write sequence should be: * fsync(tempfile) * fsync(directory) * rename(tempfile, configfile) * fsync(configfile) * fsync(directory) Why do we need fsync(directory) and fsync(configfile)? So they don't vanish/get corrupted after a crash? The above is the canonical way to safely write a file without an invalid file ever being visible. Do you think there will be a problem if we just use the below: * fsync(tempfile) * rename(tempfile, configfile) I have seen this kind of use elsewhere in the code as well (writeTimeLineHistory()) Yea, there are some places where the code isn't entirely safe. No reason to introduce more of those. Okay, I will check this and let you know if I have any doubts. * the check that prevents persistent SETs in a transaction should rather be in utility.c and use PreventTransactionChain like most of the others that need to do that (c.f. T_CreatedbStmt). We can move the check to utility.c, but if we use PreventTransactionChain, then it will be disallowed in functions as well. But the original idea was to not support it in transaction blocks only. So I feel use of the current function IsTransactionBlock() should be okay. I don't think it's even remotely safe to allow doing this from a function. Doing it there explicitly allows doing it in a transaction. 
As the SET command is allowed inside a function, I don't think we should disallow SET PERSISTENT in functions without any reason. The reason why it's not allowed in a transaction is that we would have to delete the temporary file at transaction end or at rollback, which could have made the logic much more complex. Yea, and allowing use in functions doesn't help you with that at all: Consider the following plpgsql function: $$ BEGIN SET PERSISTENT foo = 'bar'; RAISE EXCEPTION 'blub'; END; $$ Now you have a SET PERSISTENT that succeeded although the transaction failed.
Re: [HACKERS] [PATCH 0/3] Work around icc miscompilation
Xi Wang xi.w...@gmail.com writes: On 1/24/13 5:02 AM, Heikki Linnakangas wrote: If we fix these, can we stop using -fwrapv on gcc? Is there any way to get gcc to warn about these? I believe we can get rid of -fwrapv once we fix all such checks. TBH, I find that statement to be nonsense, and these patches to be completely the wrong way to go about it. The fundamental problem here is that the compiler, unless told otherwise by a compilation switch, believes it is entitled to assume that no integer overflow will happen anywhere in the program. Therefore, any error check that is looking for overflow *should* get optimized away. The only reason the compiler would fail to do that is if its optimizer isn't quite smart enough to prove that the code is testing for an overflow condition. So what you are proposing here is merely the next move in an arms race with the compiler writers, and it will surely break again in a future generation of compilers. Or even if these particular trouble spots don't break, something else will. The only reliable solution is to not let the compiler make that type of assumption. So I think we should just reject all of these, and instead fix configure to make sure it turns on icc's equivalent of -fwrapv. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting visibility map in VACUUM's second phase
On Thu, Jan 24, 2013 at 1:28 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: Hi Jeff, On Thu, Jan 24, 2013 at 2:41 PM, Jeff Janes jeff.ja...@gmail.com wrote: lazy_vacuum_page now pins and unpins the vmbuffer for each page it marks all-visible, which seems like a lot of needless traffic since the next vmbuffer is likely to be the same as the previous one. Good idea. Even though the cost of pinning/unpinning may not be high with respect to the vacuum cost itself, it seems to be a good idea because we already do that at other places. Do you have any other review comments on the patch, or shall I fix this and send an updated patch soon? That was the only thing that stood out to me. You had some doubts about visibility_cutoff_xid, but I don't know enough about that to address them. I agree that it would be nice to avoid the code duplication, but I don't see a reasonable way to do that. It applies cleanly (some offsets), builds without warnings, passes make check under cassert. No documentation or regression tests are needed. We want this, and it does it. I have verified the primary objective (setting the vm sooner) but haven't experimentally verified that this actually increases throughput for some workload. For pgbench when all data fits in shared_buffers, at least it doesn't cause a readily noticeable slowdown. I haven't devised any patch-specific test cases, either for correctness or performance. I just used make check, pgbench, and the vacuum verbose verification you told us about in the original submission. If we are going to scan a block twice, I wonder if it should be done the other way around. If there are any dead tuples that need to be removed from indexes, there is no point in dirtying the page with a HOT prune on the first pass when it will just be dirtied again after the indexes are vacuumed. I don't see this idea holding up your patch though, as surely it would be more invasive than what you are doing. 
Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 13-01-24 06:40 AM, Andres Freund wrote: Fair enough. I am also working on a user of this infrastructure but that doesn't help you very much. Steve Singer seemed to make some stabs at writing an output plugin as well. Steve, how far did you get there? I was able to get something that generated output for INSERT statements in a format similar to what a modified slony apply trigger would want. This was with the list of tables to replicate hard-coded in the plugin. This was with the patchset from the last commitfest. I had gotten a bit hung up on the UPDATE and DELETE support because slony allows you to use an arbitrary user-specified unique index as your key. It looks like better support for tables with a unique non-primary key is in the most recent patch set. I am hoping to have time this weekend to update my plugin to use parameters passed in on the init and other updates in the most recent version. If I make some progress I will post a link to my progress at the end of the weekend. My big issue is that I have limited time to spend on this. BTW, why does all the transaction reordering stuff have to be in core? It didn't use to, but people argued pretty damned hard that no undecoded data should ever be allowed to leave the postgres cluster. And to be fair it makes writing an output plugin *way* easier. Check http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/test_decoding/test_decoding.c;hb=xlog-decoding-rebasing-cf4 If you skip over tuple_to_stringinfo(), which is just pretty generic scaffolding for converting a whole tuple to a string, writing out the changes in some format by now is pretty damn simple. I think we will find that the replication systems won't be the only users of this feature. I have often seen systems that have a logging requirement for auditing purposes, or that log and then reconstruct the sequence of changes made to a set of tables in order to feed a downstream application. 
Triggers and a journaling table are the traditional way of doing this, but it should be pretty easy to write a plugin to accomplish the same thing with better performance. If the reordering stuff wasn't in core this would be much harder. How much of this infrastructure is to support replicating DDL changes? IOW, if we drop that requirement, how much code can we slash? Unfortunately I don't think too much, unless we add in other code that allows us to check whether the current definition of a table is still the same as it was back when the tuple was logged. Any other features or requirements that could be dropped? I think it's clear at this stage that this patch is not going to be committed as it is. If you can reduce it to a fraction of what it is now, that fraction might have a chance. Otherwise, it's just going to be pushed to the next commitfest as a whole, and we're going to be having the same doubts and discussions then. One thing that reduces complexity is to declare the following as unsupported: - CREATE TABLE foo(data text); - DECODE UP TO HERE; - INSERT INTO foo(data) VALUES(very-long-to-be-externally-toasted-tuple); - DROP TABLE foo; - DECODE UP TO HERE; but that's just a minor thing. I think what we can do more realistically than to chop off required parts of changeset extraction is to start applying some of the preliminary patches independently: - the relmapper/relfilenode changes + pg_relation_by_filenode(spc, relnode) should be independently committable, if a bit boring - allowing walsenders to connect to a database possibly needs an interface change but otherwise it should be fine to go in independently. It also has other potential use-cases, so I think that's fair. - logging xl_running_xact's more frequently could also be committed independently and makes sense independently as it allows a standby to enter HS faster if the master is busy - Introducing InvalidCommandId should be relatively uncontroversial. 
The fact that no invalid value for command ids exists is imo an oversight - the *Satisfies change could be applied and they are imo ready, but there's no use-case for it without the rest, so I am not sure whether there's a point - currently not separately available, but we could add wal_level=logical independently. There would be no user of it, but it would be partial work. That includes the relcache support for keeping track of the primary key, which already is available separately. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset
On Thu, Jan 24, 2013 at 08:50:36AM -0500, Andrew Dunstan wrote: On 01/24/2013 03:42 AM, Craig Ringer wrote: On 01/24/2013 01:06 AM, Alexander Law wrote: Hello, Please let me know if I can do something to get the bug fix (https://commitfest.postgresql.org/action/patch_view?id=902) committed. I would like to fix other bugs related to postgres localization, but I am not sure yet how to do it. For anyone looking for the history, the 1st post on this topic is here: http://www.postgresql.org/message-id/e1s3twb-0004oy...@wrigleys.postgresql.org Yeah. I'm happy enough with this patch. ISTM it's really a bug and should be backpatched, no? It is a bug, yes. I'm neutral on whether to backpatch. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
Andres Freund and...@2ndquadrant.com writes: On 2013-01-24 16:45:42 +0530, Amit Kapila wrote: * Writing the temporary file to .$pid seems like a bad idea, better use one file for that, SET PERSISTENT is protected by an exclusive lock anyway. I think we can use one temporary file, infact that was one of the ways I have asked in one of the previous mails. However Tom and Zoltan felt this is better way to do it. The have? I didn't read it like that. The file can only ever written by a running postmaster and we already have code that ensures that. There's absolutely no need for the tempfile to have a nondeterministic name. That makes cleanup way easier as well. Say again? Surely the temp file is being written by whichever backend is executing SET PERSISTENT, and there could be more than one. Or, if it isn't and you really do mean that this patch is getting the postmaster process involved in executing SET PERSISTENT, that's going to be sufficient grounds for rejection right there. The only way that we have to keep the postmaster reliable is to not have it doing more than it absolutely must. While I have not been paying very much attention to this patch, I have to say that every time I've looked into the thread I've gotten an overwhelming impression of overdesign and overcomplication. This should not be a very large or subtle patch. Write a setting into a temp file, rename it into place, done. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip checkpoint on promoting from streaming replication
On 6 January 2013 21:58, Simon Riggs si...@2ndquadrant.com wrote: On 9 August 2012 10:45, Simon Riggs si...@2ndquadrant.com wrote: On 22 June 2012 05:03, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: I hope this is promising. I've reviewed this and thought about it over some time. I've been torn between the need to remove the checkpoint for speed and being worried about the implications of doing so. We promote in multiple use cases. When we end a PITR, or are performing a switchover, it doesn't really matter how long the shutdown checkpoint takes, so I'm inclined to leave it there in those cases. For failover, we need fast promotion. So my thinking is to make pg_ctl promote -m fast be the way to initiate a fast failover that skips the shutdown checkpoint. That way all existing applications work the same as before, while new users that explicitly choose to do so will gain from the new option. Here's a patch to skip checkpoint when we do pg_ctl promote -m fast We keep the end of recovery checkpoint in all other cases. The only thing left from Kyotaro's patch is a single line of code - the call to ReadCheckpointRecord() that checks to see if the WAL records for the last two restartpoints is on disk, which was an important line of code. Patch implements a new record type XLOG_END_OF_RECOVERY that behaves on replay like a shutdown checkpoint record. I put this back in from my patch because I believe its important that we have a clear place where the WAL history changes timelineId. WAL format change bump required. So far this is only barely tested, but considering time goes on, I thought people might want to pass comment on this. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services fast_promote.v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
On 2013-01-24 11:22:52 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-01-24 16:45:42 +0530, Amit Kapila wrote: * Writing the temporary file to .$pid seems like a bad idea, better use one file for that, SET PERSISTENT is protected by an exclusive lock anyway. I think we can use one temporary file, in fact that was one of the ways I asked about in one of my previous mails. However Tom and Zoltan felt this is the better way to do it. They have? I didn't read it like that. The file can only ever be written by a running postmaster and we already have code that ensures that. There's absolutely no need for the tempfile to have a nondeterministic name. That makes cleanup way easier as well. Say again? Surely the temp file is being written by whichever backend is executing SET PERSISTENT, and there could be more than one. Sure, but the patch acquires SetPersistentLock exclusively beforehand, which seems fine to me. Any opinion on whether it's acceptable to allow SET PERSISTENT in functions? It seems absurd to me to allow it, but Amit seems to be of another opinion. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 0/3] Work around icc miscompilation
On 1/24/13 10:48 AM, Tom Lane wrote: The fundamental problem here is that the compiler, unless told otherwise by a compilation switch, believes it is entitled to assume that no integer overflow will happen anywhere in the program. Therefore, any error check that is looking for overflow *should* get optimized away. The only reason the compiler would fail to do that is if its optimizer isn't quite smart enough to prove that the code is testing for an overflow condition. So what you are proposing here is merely the next move in an arms race with the compiler writers, and it will surely break again in a future generation of compilers. Or even if these particular trouble spots don't break, something else will. The only reliable solution is to not let the compiler make that type of assumption. What I am proposing here is the opposite: _not_ to enter an arms race with the compiler writers. Instead, make the code conform to the C standard, something both sides can agree on. Particularly, avoid using (signed) overflowed results to detect overflows, which the C standard clearly specifies as undefined behavior and many compilers are actively exploiting. We could use either unsigned overflows (which are well defined) or precondition testing (like `x > INT_MAX - y' in the patches). So I think we should just reject all of these, and instead fix configure to make sure it turns on icc's equivalent of -fwrapv. While I agree it's better to turn on icc's -fno-strict-overflow as a workaround, the fundamental problem here is that we are _not_ programming in the C language. Rather, we are programming in some C-with-signed-wraparound dialect. The worst part of this C dialect is that it has never been clearly specified what it means by signed wraparound: gcc's -fwrapv assumes signed wraparound for add/sub/mul, but not for div (e.g., INT_MIN/-1 traps on x86) nor shift (e.g., 1<<32 produces undef with clang). clang's -fwrapv also assumes wraparound for pointer arithmetic, while gcc's does not. 
I have no idea what icc and pathscale's -fno-strict-overflow option does (probably the closest thing to -fwrapv). Sometimes it prevents such checks from being optimized away, sometimes it doesn't. Anyway, it would not be surprising if future C compilers break this dialect again. But they shouldn't break standard-conforming code. - xi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] pg_isready (was: [WIP] pg_ping utility)
On Wed, Jan 23, 2013 at 8:12 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Jan 24, 2013 at 8:47 AM, Phil Sorber p...@omniti.com wrote: On Wed, Jan 23, 2013 at 6:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Phil Sorber p...@omniti.com writes: On Wed, Jan 23, 2013 at 1:58 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Jan 23, 2013 at 12:27:45PM -0500, Tom Lane wrote: +1 for default timeout --- if this isn't like ping where you are expecting to run indefinitely, I can't see that it's a good idea for it to sit very long by default, in any circumstance. FYI, the pg_ctl -w (wait) default is 60 seconds: Great. That is what I came to on my own as well. Figured that might be a sticking point, but as there is precedent, I'm happy with it. I'm not sure that's a relevant precedent at all. What that number is, is the time that pg_ctl will wait around for the postmaster to start or stop before reporting a problem --- and in either case, a significant delay (multiple seconds) is not surprising, because of crash-recovery work, shutdown checkpointing, etc. For pg_isready, you'd expect to get a response more or less instantly, wouldn't you? Personally, I'd decide that pg_isready is broken if it didn't give me an answer in a couple of seconds, much less a minute. What I had in mind was a default timeout of maybe 3 or 4 seconds... I was thinking that if it was in a script you wouldn't care if it was 60 seconds. If it was at the command line you would ^C it much earlier. I think the default linux TCP connection timeout is around 20 seconds. My feeling is everyone is going to have a differing opinion on this, which is why I was hoping that some good precedent existed. I'm fine with 3 or 4, whatever can be agreed upon. +1 with 3 or 4 seconds. Aside from this issue, I have one minor comment. ISTM you need to add the following line to the end of the help message. This line has been included in the help message of all the other client commands. 
Report bugs to pgsql-b...@postgresql.org. Ok, I set the default timeout to 3 seconds, added the bugs email to the help, and also added docs that I forgot last time. Also, still hoping to get some feedback on my other issues. Thanks. Regards, -- Fujii Masao pg_isready_timeout_v2.diff Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip checkpoint on promoting from streaming replication
On 24.01.2013 18:24, Simon Riggs wrote: On 6 January 2013 21:58, Simon Riggs si...@2ndquadrant.com wrote: I've been torn between the need to remove the checkpoint for speed and being worried about the implications of doing so. We promote in multiple use cases. When we end a PITR, or are performing a switchover, it doesn't really matter how long the shutdown checkpoint takes, so I'm inclined to leave it there in those cases. For failover, we need fast promotion. So my thinking is to make pg_ctl promote -m fast be the way to initiate a fast failover that skips the shutdown checkpoint. That way all existing applications work the same as before, while new users that explicitly choose to do so will gain from the new option. Here's a patch to skip checkpoint when we do pg_ctl promote -m fast. We keep the end of recovery checkpoint in all other cases. Hmm, there seems to be no way to do a fast promotion with a trigger file. I'm a bit confused why there needs to be special mode for this. Can't we just always do the fast promotion? I agree that there's no urgency when you're doing PITR, but shouldn't do any harm either. Or perhaps always do fast promotion when starting up from standby mode, and slow otherwise. Are we comfortable enough with this to skip the checkpoint after crash recovery? I may be missing something, but it looks like after a fast promotion, you don't request a new checkpoint. So it can take quite a while for the next checkpoint to be triggered by checkpoint_timeout/segments. That shouldn't be a problem, but I feel that it'd be prudent to request a new checkpoint immediately (not necessarily an immediate checkpoint, though). The only thing left from Kyotaro's patch is a single line of code - the call to ReadCheckpointRecord() that checks to see if the WAL records for the last two restartpoints are on disk, which was an important line of code. Why's that important, just for paranoia? 
If the last two restartpoints have disappeared, something's seriously wrong, and you will be in trouble e.g if you crash at that point. Do we need to be extra paranoid when doing a fast promotion? Patch implements a new record type XLOG_END_OF_RECOVERY that behaves on replay like a shutdown checkpoint record. I put this back in from my patch because I believe its important that we have a clear place where the WAL history changes timelineId. WAL format change bump required. Agreed, such a WAL record is essential. At replay, an end-of-recovery record should be a signal to the hot standby mechanism that there are no transactions running in the master at that point, same as a shutdown checkpoint. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Back-branch update releases coming in a couple weeks
On Thu, Jan 24, 2013 at 11:53 PM, MauMau maumau...@gmail.com wrote: From: Fujii Masao masao.fu...@gmail.com On Thu, Jan 24, 2013 at 7:42 AM, MauMau maumau...@gmail.com wrote: I searched through PostgreSQL mailing lists with WAL contains references to invalid pages, and I found 19 messages. Some people encountered similar problems. There were some discussions regarding those problems (Tom and Simon Riggs commented), but those discussions did not reach a solution. I also found a discussion which might relate to this problem. Does this fix the problem? [BUG] lag of minRecoveryPont in archive recovery http://www.postgresql.org/message-id/20121206.130458.170549097.horiguchi.kyot...@lab.ntt.co.jp Yes. Could you check whether you can reproduce the problem on the latest REL9_2_STABLE? I tried to produce the problem by doing pg_ctl stop -mi against the primary more than ten times on REL9_2_STABLE, but the problem did not appear. However, I encountered the crash only once out of dozens of failovers, possibly more than a hundred times, on PostgreSQL 9.1.6. So, I'm not sure the problem is fixed in REL9_2_STABLE. You can reproduce the problem in REL9_1_STABLE? Sorry, I pointed to the wrong version, i.e., REL9_2_STABLE. Since you encountered the problem in 9.1.6, you need to retry the test in REL9_1_STABLE. I'm wondering if the fix discussed in the above thread solves my problem. I found the following differences between Horiguchi-san's case and my case: (1) Horiguchi-san says the bug outputs the message: WARNING: page 0 of relation base/16384/16385 does not exist On the other hand, I got the message: WARNING: page 506747 of relation base/482272/482304 was uninitialized (2) Horiguchi-san produced the problem when he shut the standby down immediately and restarted it. However, I saw the problem during failover. (3) Horiguchi-san did not use any index, but in my case the WARNING message refers to an index. But there's a similar point. Horiguchi-san says the problem occurs after DELETE+VACUUM. 
In my case, I shut the primary down while the application was doing INSERT/UPDATE. As the below messages show, some vacuuming was running just before the immediate shutdown: ... LOG: automatic vacuum of table GOLD.scm1.tbl1: index scans: 0 pages: 0 removed, 36743 remain tuples: 0 removed, 73764 remain system usage: CPU 0.09s/0.11u sec elapsed 0.66 sec LOG: automatic analyze of table GOLD.scm1.tbl1 system usage: CPU 0.00s/0.14u sec elapsed 0.32 sec LOG: automatic vacuum of table GOLD.scm1.tbl2: index scans: 0 pages: 0 removed, 12101 remain tuples: 40657 removed, 44142 remain system usage: CPU 0.06s/0.06u sec elapsed 0.30 sec LOG: automatic analyze of table GOLD.scm1.tbl2 system usage: CPU 0.00s/0.06u sec elapsed 0.14 sec LOG: received immediate shutdown request ... Could you tell me the details of the problem discussed and fixed in the upcoming minor release? I would like to know the phenomenon and its conditions, and whether it applies to my case. http://www.postgresql.org/message-id/20121206.130458.170549097.horiguchi.kyot...@lab.ntt.co.jp Could you read the discussion in the above thread? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Improve concurrency of foreign key locking
Alvaro Herrera wrote: Improve concurrency of foreign key locking I noticed a bug in visibility routines when pg_upgrade is involved: tuples marked FOR UPDATE in the old cluster (i.e. HEAP_XMAX_INVALID not set, HEAP_XMAX_EXCL_LOCK set, HEAP_XMAX_IS_MULTI not set) are invisible (dead) on the new cluster. I had defined the HEAP_XMAX_IS_LOCKED_ONLY thusly: #define HEAP_XMAX_IS_LOCKED_ONLY(infomask) \ ((infomask) & HEAP_XMAX_LOCK_ONLY) but this doesn't work for the reason stated above. The fix is to redefine the macro like this: /* * A tuple is only locked (i.e. not updated by its Xmax) if the * HEAP_XMAX_LOCK_ONLY bit is set; or, for pg_upgrade's sake, if the Xmax is * not a multi and the EXCL_LOCK bit is set. * * See also HeapTupleHeaderIsOnlyLocked, which also checks for a possible * aborted updater transaction. * * Beware of multiple evaluations of the argument. */ #define HEAP_XMAX_IS_LOCKED_ONLY(infomask) \ (((infomask) & HEAP_XMAX_LOCK_ONLY) || \ (((infomask) & (HEAP_XMAX_IS_MULTI | HEAP_LOCK_MASK)) == HEAP_XMAX_EXCL_LOCK)) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
Andres Freund and...@2ndquadrant.com writes: On 2013-01-24 11:22:52 -0500, Tom Lane wrote: Say again? Surely the temp file is being written by whichever backend is executing SET PERSISTENT, and there could be more than one. Sure, but the patch acquires SetPersistentLock exclusively beforehand which seems fine to me. Why should we have such a lock? Seems like that will probably introduce as many problems as it fixes. Deadlock risk, blockages, etc. It is not necessary for atomicity, since rename() would be atomic already. Any opinion whether its acceptable to allow SET PERSISTENT in functions? It seems absurd to me to allow it, but Amit seems to be of another opinion. Well, it's really a definitional question I think: do you expect that subsequent failure of the transaction should cause such a SET to roll back? I think it would be entirely self-consistent to define SET PERSISTENT as a nontransactional operation. Then the implementation would just be to write the file immediately when the command is executed, and there's no particular reason why it can't be allowed inside a transaction block. If you want it to be transactional, then the point of disallowing it in transaction blocks (or functions) would be to not have a very large window between writing the file and committing. But it's still possible that the transaction would fail somewhere in there, leading to the inconsistent outcome that the transaction reports failing but we applied the SET anyway. I do agree that it would be nonsensical to allow SET PERSISTENT in functions but not transaction blocks. Another approach is to remember the requested setting until somewhere in the pre-commit sequence, and then try to do the file write at that time. 
I'm not terribly thrilled with that approach, though, because (a) it only narrows the window for an inconsistent outcome, it doesn't remove it entirely; (b) there are already too many things that want to be the last thing done before commit; and (c) it adds complexity and overhead that I'd just as soon this patch not add. But if you want S.P. to be transactional and allowed inside functions, I think this would be the only acceptable implementation. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_retainxlog for inclusion in 9.3?
After reviewing this, it appears to me that this is really just a very verbose version of archive_command = 'sleep $initialsleep; while test $(psql -AtX -c "select pg_xlogfile_name(something) < $$%f$$ collate \"C\"") = t; do sleep $sleep; done' I think it might be better to just document this as an example. I don't quite see the overhead of maintaining another tool justified. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_retainxlog for inclusion in 9.3?
On Thu, Jan 24, 2013 at 6:04 PM, Peter Eisentraut pete...@gmx.net wrote: After reviewing this, it appears to me that this is really just a very verbose version of archive_command = 'sleep $initialsleep; while test $(psql -AtX -c select pg_xlogfile_name(something) $$%f$$ collate \C\;) = t; sleep $sleep; done' I think it might be better to just document this as an example. I don't quite see the overhead of maintaining another tool justified. Well, obviously I don't entirely agree ;) Yes, it's a convenience command. Like pg_standby was. And like many other commands that we maintain as part of *core*, such as createuser, vacuumdb, etc. Those can all be done with an even *simpler* command than the one you suggest above. So I don't see that as an argument why it wouldn't be useful. Also, the command you suggest above does not work on Windows. You can probably write a .BAT file to do it for you, but I'm pretty sure it's impossible to do it as an archive_command there. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
On 2013-01-24 12:02:59 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-01-24 11:22:52 -0500, Tom Lane wrote: Say again? Surely the temp file is being written by whichever backend is executing SET PERSISTENT, and there could be more than one. Sure, but the patch acquires SetPersistentLock exclusively beforehand which seems fine to me. Why should we have such a lock? Seems like that will probably introduce as many problems as it fixes. Deadlock risk, blockages, etc. It is not necessary for atomicity, since rename() would be atomic already. Well, the patch acquires it as-is, so I made the judgement based on that. But I think that lock isn't necessarily a bad idea. While the replacement of the values clearly is atomic due to the rename I think there's another confusing behaviour lurking: Backend A: starts Backend B: starts Backend A: reads the config Backend B: reads the config Backend A: does SET PERSISTENT foobar =..; Backend B: does SET PERSISTENT foobar =..; Now B overwrites the config change A has made as they are all stored in the same file. So the only safe way to do this seems to be: LWLockAcquire(SetPersistentLock, LW_EXCLUSIVE); ProcessConfigFile(); validate_option(); write_rename(); LWLockRelease(); We can decide not to care about the above case but the window isn't that small if no reload is implied and so this seems to be overly grotty. Any opinion whether its acceptable to allow SET PERSISTENT in functions? It seems absurd to me to allow it, but Amit seems to be of another opinion. Well, it's really a definitional question I think: do you expect that subsequent failure of the transaction should cause such a SET to roll back? I think it would be entirely self-consistent to define SET PERSISTENT as a nontransactional operation. Then the implementation would just be to write the file immediately when the command is executed, and there's no particular reason why it can't be allowed inside a transaction block. 
That's how it's implemented atm, except for not allowing it in transactions. I think the reason I am wary of allowing it inside transactions is that the config file needs to be reloaded before writing the new one, and it seems dangerous to me to reload the config in all the possible situations a function can be called from. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_retainxlog for inclusion in 9.3?
Magnus Hagander mag...@hagander.net writes: On Thu, Jan 24, 2013 at 6:04 PM, Peter Eisentraut pete...@gmx.net wrote: I think it might be better to just document this as an example. I don't quite see the overhead of maintaining another tool justified. Well, obviously I don't entirely agree ;) Yes, it's a convenience command. Like pg_standby was. And like many other commands that we maintain as part of *core*, such as createuser, vacuumdb, etc. Those can all be done with an even *simpler* command than the one you suggest above. So I don't see that as an argument why it wouldn't be useful. We've discussed removing a lot of those tools, too. Not breaking backwards compatibility is probably the only reason they're still there. In the case at hand, I seem to recall from upthread that we expect this'd be obsolete in a release or two. If that's true then I think a para or two of documentation is a better idea than a tool we'll be essentially condemned to keep maintaining forever. Also, the command you suggest above does not work on Windows. You can probably write a .BAT file to do it for you, but I'm pretty sure it's impossible to do it as an archive_command there. Perhaps we could whip up such a .BAT file and put it in the docs? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_retainxlog for inclusion in 9.3?
On Thu, Jan 24, 2013 at 6:25 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Thu, Jan 24, 2013 at 6:04 PM, Peter Eisentraut pete...@gmx.net wrote: I think it might be better to just document this as an example. I don't quite see the overhead of maintaining another tool justified. Well, obviously I don't entirely agree ;) Yes, it's a convenience command. Like pg_standby was. And like many other commands that we maintain as part of *core*, such as createuser, vacuumdb, etc. Those can all be done with an even *simpler* command than the one you suggest above. So I don't see that as an argument why it wouldn't be useful. We've discussed removing a lot of those tools, too. Not breaking backwards compatibility is probably the only reason they're still there. In the case at hand, I seem to recall from upthread that we expect this'd be obsolete in a release or two. If that's true then I think a para or two of documentation is a better idea than a tool we'll be essentially condemned to keep maintaining forever. Not really sure there is such an expectation - any more than there was such an expectation when we initially put pg_standby in there. It would be *possible* to do it, certainly. But it's not like we have an actual plan. And AFAIK the stuff that was discussed upthread was a simplified version of it - not the full flexibility. That said, it's certainly a point that we'd have to maintain it. But I don't see why we'd have to maintain it beyond the point where we included the same functionality in core, if we did. Also, the command you suggest above does not work on Windows. You can probably write a .BAT file to do it for you, but I'm pretty sure it's impossible to do it as an archive_command there. Perhaps we could whip up such a .BAT file and put it in the docs? That would probably work, yes. 
-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
Andres Freund and...@anarazel.de writes: Backend A: does SET PERSISTENT foobar =..; Backend B: does SET PERSISTENT foobar =..; Now B overwrites the config change A has made as they are all stored in the same file. Say what? I thought the plan was one setting per file, so that we don't get involved in having to parse-and-edit the file contents. What was all that argument about a new directory, if we're only using one file? If we are using just one file, then I agree a lock would be needed to synchronize updates. But that seems to add a lot of complication elsewhere. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: fix corner use case of variadic fuctions usage
Hello, here is an updated version + some lines of doc + new regress tests. There was no reply to my previous mail - so I chose concat(variadic null) --- NULL concat(variadic '{}') --- empty string This behaviour should not set a precedent for any other variadic any function, because the concat() and concat_ws() functions have a specific behaviour - when called with one parameter they return a string or empty string. Regards Pavel 2013/1/23 Pavel Stehule pavel.steh...@gmail.com: 2013/1/23 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: next related example CREATE OR REPLACE FUNCTION public.myleast(VARIADIC integer[]) RETURNS integer LANGUAGE sql AS $function$ select min(v) from unnest($1) g(v) $function$ The reason you get a null from that is that (1) unnest() produces zero rows out for either a null or empty-array input, and (2) min() over zero rows produces NULL. In a lot of cases, it's not very sane for aggregates over no rows to produce NULL; the best-known example is that SUM() produces NULL, when anyone who'd not suffered brain-damage from sitting on the SQL committee would have made it return zero. So I'm not very comfortable with generalizing from this specific case to decide that NULL is the universally right result. I am testing some situations and there is no consistent idea - can we talk just about the functions concat and concat_ws? These functions are really specific - now we are talking about a corner use case and a strongly PostgreSQL-proprietary solution, so no solution should be too bad. The difference between the solutions will be at most +/- 4 simple rows per function. Possibilities: 1) A. concat(variadic NULL) = empty string, B. concat(variadic '{}') = empty string -- if we accept @A, then @B is ok 2) A. concat(variadic NULL) = NULL, B. concat(variadic '{}') = NULL -- question - is @B valid? 3) A. concat(variadic NULL) = NULL, B. concat(variadic '{}') = empty string There is no other possibility. 
I can live with any variant - we can probably find a precedent for any variant in our code or in ANSI SQL. I like @2 as a general concept for PostgreSQL variadic functions, but when we talk about concat() and concat_ws(), @1 is valid too. Please, can somebody state an opinion early? Regards Pavel regards, tom lane variadic_any_fix_20130124.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]
On 2013-01-24 12:30:02 -0500, Tom Lane wrote: Andres Freund and...@anarazel.de writes: Backend A: does SET PERSISTENT foobar =..; Backend B: does SET PERSISTENT foobar =..; Now B overwrites the config change A has made as they are all stored in the same file. Say what? I thought the plan was one setting per file, so that we don't get involved in having to parse-and-edit the file contents. What was all that argument about a new directory, if we're only using one file? I initially thought that as well (and voted for it) but after reskimming the thread and reading the patch that doesn't seem to be the case unless it's implemented in a way I don't understand: +#define PG_AUTOCONF_DIR "auto.conf.d" +#define PG_AUTOCONF_FILENAME "postgresql.auto.conf" + /* Frame auto conf name and auto conf sample temp file name */ + snprintf(AutoConfFileName, sizeof(AutoConfFileName), "%s/%s/%s", + ConfigFileDir, + PG_AUTOCONF_DIR, + PG_AUTOCONF_FILENAME); + snprintf(AutoConfTmpFileName, sizeof(AutoConfTmpFileName), "%s/%s/%s.%d", + ConfigFileDir, + PG_AUTOCONF_DIR, + PG_AUTOCONF_FILENAME, + (int) getpid()); + I don't understand what the conf.d is all about either if only one file is going to be used. If we are using just one file, then I agree a lock would be needed to synchronize updates. But that seems to add a lot of complication elsewhere. More people seem to have voted for the single file approach but I still haven't understood why... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Skip checkpoint on promoting from streaming replication
On 24 January 2013 16:52, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 24.01.2013 18:24, Simon Riggs wrote: On 6 January 2013 21:58, Simon Riggs si...@2ndquadrant.com wrote: I've been torn between the need to remove the checkpoint for speed and being worried about the implications of doing so. We promote in multiple use cases. When we end a PITR, or are performing a switchover, it doesn't really matter how long the shutdown checkpoint takes, so I'm inclined to leave it there in those cases. For failover, we need fast promotion. So my thinking is to make pg_ctl promote -m fast be the way to initiate a fast failover that skips the shutdown checkpoint. That way all existing applications work the same as before, while new users that explicitly choose to do so will gain from the new option. Here's a patch to skip checkpoint when we do pg_ctl promote -m fast. We keep the end of recovery checkpoint in all other cases. Hmm, there seems to be no way to do a fast promotion with a trigger file. True. I thought we were moving away from trigger files to use of promote. I'm a bit confused why there needs to be special mode for this. Can't we just always do the fast promotion? I agree that there's no urgency when you're doing PITR, but shouldn't do any harm either. Or perhaps always do fast promotion when starting up from standby mode, and slow otherwise. Are we comfortable enough with this to skip the checkpoint after crash recovery? I'm not. Maybe if we get no bugs we can make it do this always, in the next release. It's fast when it needs to be and safe otherwise. I may be missing something, but it looks like after a fast promotion, you don't request a new checkpoint. So it can take quite a while for the next checkpoint to be triggered by checkpoint_timeout/segments. That shouldn't be a problem, but I feel that it'd be prudent to request a new checkpoint immediately (not necessarily an immediate checkpoint, though). 
I thought of that and there is a long comment to explain why I didn't. Two problems: 1) an immediate checkpoint can cause a disk/resource usage spike, which is definitely not what you need just when a spike of connections and new SQL hits the system. 2) If we did that, we would have an EndOfRecovery record, some other records and then a Shutdown checkpoint. As I write this, (2) is wrong, because we shouldn't do a Shutdown checkpoint anyway. But I still think (1) is a valid concern. The only thing left from Kyotaro's patch is a single line of code - the call to ReadCheckpointRecord() that checks to see if the WAL records for the last two restartpoints are on disk, which was an important line of code. Why's that important, just for paranoia? If the last two restartpoints have disappeared, something's seriously wrong, and you will be in trouble e.g. if you crash at that point. Do we need to be extra paranoid when doing a fast promotion? The check is cheap, so what do we gain by skipping the check? Patch implements a new record type XLOG_END_OF_RECOVERY that behaves on replay like a shutdown checkpoint record. I put this back in from my patch because I believe it's important that we have a clear place where the WAL history changes timelineId. WAL format change bump required. Agreed, such a WAL record is essential. At replay, an end-of-recovery record should be a signal to the hot standby mechanism that there are no transactions running in the master at that point, same as a shutdown checkpoint. I had a reason why I didn't do that, but it seems to have slipped my mind. If I can't remember, I'll add it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] LATERAL, UNNEST and spec compliance
Folks, Andrew Gierth asked me to send this out as his email is in a parlous state at the moment. My comments will follow in replies. Without further ado: SQL2008 says, for 7.6 <table reference> 6) a) If TR is contained in a <from clause> FC with no intervening <query expression>, then the scope clause SC of TR is the <select statement: single row> or innermost <query specification> that contains FC. The scope of a range variable of TR is the <select list>, <where clause>, <group by clause>, <having clause>, and <window clause> of SC, together with every <lateral derived table> that is simply contained in FC and is preceded by TR, and every <collection derived table> that is simply contained in FC and is preceded by TR, and the <join condition> of all <joined table>s contained in SC that contain TR. If SC is the <query specification> that is the <query expression body> of a <simple table query> STQ, then the scope of a range variable of TR also includes the <order by clause> of STQ. This is the clause that defines the scope effect of LATERAL, and as can be seen, it defines <collection derived table>, i.e. UNNEST(), as having the same behaviour as <lateral derived table>. It is also worth noting at this point that pg's FROM func() syntax is not in the spec (the nearest is FROM TABLE(<collection value expression>)). Our implementation of UNNEST currently deviates from the spec by not being implicitly LATERAL; given the (sub)query SELECT * FROM sometable, UNNEST(somearray); then somearray is required to be a parameter or outer reference rather than a column of sometable. To get the spec's behaviour for this, we currently have to do: SELECT * FROM sometable, LATERAL UNNEST(somearray); which is non-standard syntax. (In the spec, only <table subquery> can follow LATERAL.) (We also don't accept the (optional) syntax of S301, allowing multiple parameters to UNNEST().) As I see it, the current options are: 1. Do nothing, and insist on non-standard use of the LATERAL keyword. 2. 
Add UNNEST to the grammar (or parse analysis) as a special case, making it implicitly LATERAL. (This would make implementing S301 easier, but special cases are ugly.) 3. Make all cases of SRFs in the FROM-clause implicitly LATERAL. (As far as I can tell, those cases whose behaviour would be changed by this actually produce errors in versions prior to 9.3, so no working code should be affected.) Since LATERAL is new in 9.3, I think the pros and cons of these choices should be considered now, rather than being allowed to slide by unexamined. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
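[Editor's sketch of what option 3 would mean in practice. The table and column names are illustrative only; the first query uses the explicit, non-standard LATERAL spelling that 9.3 accepts today, and the second shows the kind of FROM-clause SRF reference that would become implicitly lateral under that option:]

```sql
-- Hypothetical table for illustration.
CREATE TABLE sometable (id int, n int, somearray int[]);

-- Works in 9.3 only with the explicit (non-standard) LATERAL keyword:
SELECT s.id, u.elem
FROM sometable s, LATERAL UNNEST(s.somearray) AS u(elem);

-- Under option 3, any SRF in FROM could reference earlier FROM items
-- without the keyword, matching the spec's behaviour for
-- collection derived tables:
SELECT s.id, g
FROM sometable s, generate_series(1, s.n) AS g;
```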
Re: [HACKERS] Materialized views WIP patch
On Thu, Jan 17, 2013 at 07:54:55AM -0500, Robert Haas wrote: On Wed, Jan 16, 2013 at 1:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Where I really need someone to hit me upside the head with a clue-stick is the code I added to the bottom of RelationBuildDesc() in relcache.c. The idea is that on first access to an unlogged MV, to detect that the heap has been replaced by the init fork, set relisvalid to false, and make the heap look normal again. Hmm. I agree that relcache.c has absolutely no business doing that, but not sure what else to do instead. Seems like it might be better done at first touch of the MV in the parser, rewriter, or planner --- but the fact that I can't immediately decide which of those is right makes me feel that it's still too squishy. I think we shouldn't be doing that at all. The whole business of transferring the relation-is-invalid information from the relation to a pg_class flag seems like a Rube Goldberg device to me. I'm still not convinced that we should have a relation-is-invalid flag at all, but can we at least not have two? It seems perfectly adequate to detect that the MV is invalid when we actually try to execute a plan - that is, when we first access the heap or one of its indexes. So the bit can just live in the file-on-disk, and there's no need to have a second copy of it in pg_class. Like Kevin, I want a way to distinguish unpopulated MVs from MVs that genuinely yielded the empty set at last refresh. I agree that there's no particular need to store that fact in pg_class, and I would much prefer only storing it one way in any case. A user-visible disadvantage of the current implementation is that relisvalid remains stale until something opens the rel. That's fine for the system itself, but it can deceive user-initiated catalog queries. Imagine a check_postgres action that looks for invalid MVs to complain about. It couldn't just scan pg_class; it would need to first do something that opens every MV. I suggest the following: 1. 
Let an invalid MV have a zero-length heap. Distinguish a valid, empty MV by giving it a page with no tuples. This entails VACUUM[1] not truncating MVs below one page and the refresh operation, where necessary, extending the relation from zero pages to one. 2. Remove pg_class.relisvalid. 3. Add a bool field to RelationData. The word valid is used in that context to refer to the validity of the structure itself, so perhaps call the new field rd_scannable. RelationIsFlaggedAsValid() can become a macro; consider changing its name for consistency with the field name. 4. During relcache build, set the field to RelationGetNumberOfBlocks(rel) != 0 for MVs, fixed true for everyone else. Any operation that changes the field must, and probably would anyway, instigate a relcache invalidation. 5. Expose a database function, say pg_relation_scannable(), for querying the current state of a relation. This supports user-level monitoring. Does that seem reasonable? One semantic difference to keep in mind is that unlogged MVs will be considered invalid on the standby while valid on the master. That's essentially an accurate report, so I won't mind it. For the benefit of the archives, I note that we almost need not truncate an unlogged materialized view during crash recovery. MVs are refreshed in a VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's pg_class to that relfilenode. When a crash occurs with no refresh in flight, the latest refresh had been safely synced. When a crash cuts short a refresh, the pg_class update will not stick, and the durability of the old heap is not in doubt. However, non-btree index builds don't have the same property; we would need to force an immediate sync of the indexes to be safe here. It would remain necessary to truncate unlogged MVs when recovering a base backup, which may contain a partially-written refresh that did eventually commit. 
Future MV variants that modify the MV in place would also need the usual truncate on crash. I'm going to follow this with a review covering a broader range of topics. Thanks, nm [1] For the time being, it's unfortunate to VACUUM materialized views at all; they only ever bear frozen tuples.
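[Editor's note: point 5 of the proposal above could then support monitoring queries along these lines. pg_relation_scannable() is only the function proposed in this message, not an existing catalog function:]

```sql
-- Sketch only: pg_relation_scannable() is the proposed function, not an
-- existing one. Find materialized views that are currently unscannable
-- (i.e. never populated, or reset by recovery for unlogged MVs):
SELECT c.oid::regclass AS matview
FROM pg_class c
WHERE c.relkind = 'm'
  AND NOT pg_relation_scannable(c.oid);
```

A check_postgres-style monitor could run exactly this, without first opening every MV.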
Re: [HACKERS] Materialized views WIP patch
Hi Kevin, The patch conflicts with git master; I tested against master@{2013-01-20}. On Wed, Jan 16, 2013 at 12:40:55AM -0500, Kevin Grittner wrote: I've been struggling with two areas: - pg_dump sorting for MVs which depend on other MVs From your later messages, I understand that you have a way forward on this. - proper handling of the relisvalid flag for unlogged MVs after recovery I have discussed this in a separate email. While reading the patch to assess that topic, I found a few more things:

*** a/contrib/pg_upgrade/version_old_8_3.c
--- b/contrib/pg_upgrade/version_old_8_3.c
*** 145,151 **** old_8_3_check_for_tsquery_usage(ClusterInfo *cluster)
  			FROM pg_catalog.pg_class c,
  				pg_catalog.pg_namespace n,
  				pg_catalog.pg_attribute a
! 			WHERE c.relkind = 'r' AND
  				c.oid = a.attrelid AND
  				NOT a.attisdropped AND
  				a.atttypid = 'pg_catalog.tsquery'::pg_catalog.regtype AND
--- 145,151 ----
  			FROM pg_catalog.pg_class c,
  				pg_catalog.pg_namespace n,
  				pg_catalog.pg_attribute a
! 			WHERE c.relkind in ('r', 'm') AND
  				c.oid = a.attrelid AND
  				NOT a.attisdropped AND
  				a.atttypid = 'pg_catalog.tsquery'::pg_catalog.regtype AND

PostgreSQL 8.3 clusters won't contain materialized views, so it doesn't really matter whether this change happens or not. I suggest adding a comment, whether or not you keep the code change.

*** a/contrib/sepgsql/sepgsql.h
--- b/contrib/sepgsql/sepgsql.h
*** 32,37 ****
--- 32,39 ----
  /*
   * Internally used code of object classes
+ *
+ * NOTE: Materialized views are treated as tables for now.

This smells like a bypass of mandatory access control. Unless you've determined that this is correct within the sepgsql security model, I suggest starting with a draconian policy, like simply crippling MVs. Even if you have determined that, separating out the nontrivial sepgsql support might be good. The set of ideal reviewers is quite different. 
 */
#define SEPG_CLASS_PROCESS 0
#define SEPG_CLASS_FILE 1

*** a/contrib/vacuumlo/vacuumlo.c
--- b/contrib/vacuumlo/vacuumlo.c
*** 209,215 **** vacuumlo(const char *database, const struct _param * param)
  	strcat(buf, " AND a.atttypid = t.oid ");
  	strcat(buf, " AND c.relnamespace = s.oid ");
  	strcat(buf, " AND t.typname in ('oid', 'lo') ");
! 	strcat(buf, " AND c.relkind = 'r'");
  	strcat(buf, " AND s.nspname !~ '^pg_'");
  	res = PQexec(conn, buf);
  	if (PQresultStatus(res) != PGRES_TUPLES_OK)
--- 209,215 ----
  	strcat(buf, " AND a.atttypid = t.oid ");
  	strcat(buf, " AND c.relnamespace = s.oid ");
  	strcat(buf, " AND t.typname in ('oid', 'lo') ");
! 	strcat(buf, " AND c.relkind in ('r', 'm')");

It concerns me slightly that older vacuumlo could silently remove large objects still referenced by MVs. Only slightly, though, because the next MV refresh would remove those references anyway. Is there anything we should do to help that situation? If nothing else, perhaps backpatch this patch hunk.

+<varlistentry>
+ <term><literal>WITH OIDS</literal></term>
+ <term><literal>WITHOUT OIDS</literal></term>
+ <listitem>
+  <para>
+   These are obsolescent syntaxes equivalent to <literal>WITH (OIDS)</literal>
+   and <literal>WITH (OIDS=FALSE)</literal>, respectively. If you wish to give
+   both an <literal>OIDS</literal> setting and storage parameters, you must use
+   the <literal>WITH ( ... )</literal> syntax; see above.
+  </para>
+ </listitem>
+</varlistentry>

Let's not support OIDs on MVs. They'll be regenerated on every refresh.

*** 336,342 **** ExplainOneQuery(Query *query, IntoClause *into, ExplainState *es,
  */
 void
 ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
! 				  const char *queryString, ParamListInfo params)
 {
 	if (utilityStmt
[HACKERS] NOT VALID FKs and Constraints
9.1 introduced delayed validation for FKs, and 9.2 did the same for table constraints; however, neither has been as useful as it could be, because the lesser-locks-on-ALTER-TABLE work was reverted (see http://www.postgresql.org/message-id/1330350691-su...@alvh.no-ip.org), which prevents the lock benefits during the VALIDATE stage. I don't see any commits since then about either one. Is fixing this still on the radar for 9.3? Also, is a similar feature planned (for 9.3 or further out) for NOT NULL column constraints? Thanks, Cody Cutrer
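[Editor's note: for reference, the two-step pattern under discussion looks like this. Table and constraint names are illustrative only:]

```sql
-- Add the FK without scanning existing rows (9.1+); only new rows
-- are checked at this point:
ALTER TABLE orders
  ADD CONSTRAINT orders_customer_fk
  FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;

-- Later, validate the existing rows. This is the step whose locking the
-- reverted lock-reduction work was meant to lighten:
ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_fk;
```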
Re: [HACKERS] [PATCH] pg_isready (was: [WIP] pg_ping utility)
On Fri, Jan 25, 2013 at 1:45 AM, Phil Sorber p...@omniti.com wrote: On Wed, Jan 23, 2013 at 8:12 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Jan 24, 2013 at 8:47 AM, Phil Sorber p...@omniti.com wrote: On Wed, Jan 23, 2013 at 6:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Phil Sorber p...@omniti.com writes: On Wed, Jan 23, 2013 at 1:58 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Jan 23, 2013 at 12:27:45PM -0500, Tom Lane wrote: +1 for default timeout --- if this isn't like ping where you are expecting to run indefinitely, I can't see that it's a good idea for it to sit very long by default, in any circumstance. FYI, the pg_ctl -w (wait) default is 60 seconds: Great. That is what I came to on my own as well. Figured that might be a sticking point, but as there is precedent, I'm happy with it. I'm not sure that's a relevant precedent at all. What that number is is the time that pg_ctl will wait around for the postmaster to start or stop before reporting a problem --- and in either case, a significant delay (multiple seconds) is not surprising, because of crash-recovery work, shutdown checkpointing, etc. For pg_isready, you'd expect to get a response more or less instantly, wouldn't you? Personally, I'd decide that pg_isready is broken if it didn't give me an answer in a couple of seconds, much less a minute. What I had in mind was a default timeout of maybe 3 or 4 seconds... I was thinking that if it was in a script you wouldn't care if it was 60 seconds. If it was at the command line you would ^C it much earlier. I think the default Linux TCP connection timeout is around 20 seconds. My feeling is everyone is going to have a differing opinion on this, which is why I was hoping that some good precedent existed. I'm fine with 3 or 4, whatever can be agreed upon. +1 with 3 or 4 seconds. Aside from this issue, I have one minor comment. ISTM you need to add the following line to the end of the help message. 
This line has been included in the help message of all the other client commands. Report bugs to pgsql-b...@postgresql.org. Ok, I set the default timeout to 3 seconds, added the bugs email to the help, and also added docs that I forgot last time. Thanks! Also, still hoping to get some feedback on my other issues. set_pglocale_pgservice() should be called? I think that the command name (i.e., pg_isready) should be given to PQpingParams() as fallback_application_name. Otherwise, the server by default uses "unknown" as the application name of pg_isready. It's undesirable. Why isn't the following message output only when an invalid option is specified? Try "%s --help" for more information. When a conninfo string including the hostname or port number is specified in the -d option, pg_isready displays the wrong information, as follows. $ pg_isready -d port= /tmp:5432 - no response Regards, -- Fujii Masao
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On Thu, Jan 24, 2013 at 6:14 AM, Andres Freund and...@2ndquadrant.com wrote: That's way more along the lines of what I am afraid of than the performance stuff - but Heikki cited those, so I replied to that. Note that I didn't say this must, must go in - I just don't think Heikki's reasoning about why not hit the nail on the head. Fair enough, no argument. Before getting bogged down in technical commentary, let me say this very clearly: I am enormously grateful for your work on this project. Logical replication based on WAL decoding is a feature of enormous value that PostgreSQL has needed for a long time, and your work has made that look like an achievable goal. Furthermore, it seems to me that you have pursued the community process with all the vigor and sincerity for which anyone could ask. Serious design concerns were raised early in the process and you made radical changes to the design which I believe have improved it tremendously, and you've continued to display an outstanding attitude at every phase of this process about which I can't say enough good things. There is no question in my mind that this work is going to be the beginning of a process that revolutionizes the way people think about replication and PostgreSQL, and you deserve our sincere thanks for that. Now, the bad news is, I don't think it's very reasonable to try to commit this to 9.3. I think it is just too much stuff too late in the cycle. I've reviewed some of the patches from time to time but there is a lot more stuff and it's big and complicated and it's not really clear that we have the interface quite right yet, even though I think it's also clear that we are a lot closer than we were. I don't want to be fixing that during beta, much less after release. I tried very, very hard to get the basics of the design interface solid. Which obviously doesn't mean I am succeeding - luckily not being superhuman after all ;). 
And I think that's very much where input is desperately needed and where I failed to raise enough attention. The output plugin interface followed by the walsender interface is what needs to be most closely vetted. Those are the permanent, user/developer-exposed UI and the ones we should try to keep as stable as possible. The output plugin callbacks are defined here: http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/include/replication/output_plugin.h;hb=xlog-decoding-rebasing-cf4 To make it more agnostic of the technology used to implement changeset extraction we possibly should replace the ReorderBuffer(TXN|Change) structs being passed by something more implementation agnostic. walsender interface: http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/replication/repl_gram.y;hb=xlog-decoding-rebasing-cf4 The interesting new commands are: 1) K_INIT_LOGICAL_REPLICATION NAME NAME 2) K_START_LOGICAL_REPLICATION NAME RECPTR plugin_options 3) K_FREE_LOGICAL_REPLICATION NAME 1 and 3 allocate (respectively free) the permanent state associated with one changeset consumer, whereas START_LOGICAL_REPLICATION streams out changes starting at RECPTR. Forgive me for not having looked at the patch, but to what extent is all this, ah, documented? Btw, there are currently *no* changes to the wal format at all if wal_level < logical, except that xl_running_xacts are logged more frequently, which obviously could easily be made conditional. Barring bugs, of course. The changes with wal_level=logical aren't that big either imo: * heap_insert, heap_update prevent full page writes from removing their normal record by using a separate XLogRecData block for the buffer and the record * heap_delete adds more data (the pkey of the tuple) after the unchanged xl_heap_delete struct * On changes to catalog tables, (relfilenode, tid, cmin, cmax) are logged. 
No changes to mvcc for normal backends at all, unless you count the very slightly changed *Satisfies interface (getting passed a HeapTuple instead of HeapTupleHeader). I am not sure what you're concerned about WRT the on-disk format of the tuples? We are pretty much nailed down on that due to pg_upgrade'ability anyway, and it could be changed from this patch's POV without a problem, the output plugin just sees normal HeapTuples? Or are you concerned about the code extracting them from the xlog records? Mostly, my concern is that you've accidentally broken something, or that your code will turn out to be flaky in ways we can't now predict. My only really specific concern at this point is about the special treatment of catalog tables. We've never done anything like that before, and it feels like a bad idea. In particular, the fact that you have to WAL-log new information about cmin/cmax really suggests that we're committing ourselves to the MVCC infrastructure in a way that we weren't previously. There's some category of stuff that our MVCC implementation didn't previously
Re: [HACKERS] Materialized views WIP patch
Thanks for looking at this! Noah Misch wrote: For the benefit of the archives, I note that we almost need not truncate an unlogged materialized view during crash recovery. MVs are refreshed in a VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's pg_class to that relfilenode. When a crash occurs with no refresh in flight, the latest refresh had been safely synced. When a crash cuts short a refresh, the pg_class update will not stick, and the durability of the old heap is not in doubt. However, non-btree index builds don't have the same property; we would need to force an immediate sync of the indexes to be safe here. It would remain necessary to truncate unlogged MVs when recovering a base backup, which may contain a partially-written refresh that did eventually commit. Future MV variants that modify the MV in place would also need the usual truncate on crash. Hmm. That's a very good observation. Perhaps the issue can be punted to a future release where we start adding more incremental updates to them. I'll think on that, but on the face of it, it sounds like the best choice. I'm going to follow this with a review covering a broader range of topics. I'll need time to digest the rest of it. As you note, recent commits conflict with the last patch. Please look at the github repo where I've been working on this. I'll post an updated patch later today. https://github.com/kgrittn/postgres/tree/matview You might want to ignore the interim work on detecting the new pg_dump dependencies through walking the internal structures. I decided that was heading in a direction which might be unnecessarily fragile and slow; so I tried writing it as a query against the system tables. I'm pretty happy with the results. 
Here's the query:

with recursive w as (
  select d1.objid, d1.objid as wrkid, d2.refobjid, c2.relkind as refrelkind
  from pg_depend d1
  join pg_class c1 on c1.oid = d1.objid and c1.relkind = 'm' and c1.relisvalid
  join pg_rewrite r1 on r1.ev_class = d1.objid
  join pg_depend d2 on d2.classid = 'pg_rewrite'::regclass
    and d2.objid = r1.oid and d2.refobjid <> d1.objid
  join pg_class c2 on c2.oid = d2.refobjid
    and c2.relkind in ('m','v') and c2.relisvalid
  where d1.classid = 'pg_class'::regclass
  union
  select w.objid, w.refobjid as wrkid, d3.refobjid, c3.relkind as refrelkind
  from w
  join pg_rewrite r3 on r3.ev_class = w.refobjid
  join pg_depend d3 on d3.classid = 'pg_rewrite'::regclass
    and d3.objid = r3.oid and d3.refobjid <> w.refobjid
  join pg_class c3 on c3.oid = d3.refobjid
    and c3.relkind in ('m','v') and c3.relisvalid
  where w.refrelkind <> 'm'
),
x as (
  select objid::regclass, refobjid::regclass from w where refrelkind = 'm'
)
select 'm'::text as type, x.objid, x.refobjid from x
union all
select 'i'::text as type, x.objid, i.indexrelid as refobjid
from x
join pg_index i on i.indrelid = x.refobjid and i.indisvalid;

If we bail on having pg_class.relisvalid, then it will obviously need adjustment. -Kevin
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andres Freund escribió: I somewhat dislike the fact that CONCURRENTLY isn't really concurrent here (for the listeners: swapping the indexes acquires exclusive locks), but I don't see any other naming being better. REINDEX ALMOST CONCURRENTLY? I'm kind of unconvinced of the value proposition of this patch. I mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY today, so ... how is this better? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Thu, Jan 24, 2013 at 01:29:56PM -0500, Robert Haas wrote: On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andres Freund escribió: I somewhat dislike the fact that CONCURRENTLY isn't really concurrent here (for the listeners: swapping the indexes acquires exclusive locks), but I don't see any other naming being better. REINDEX ALMOST CONCURRENTLY? I'm kind of unconvinced of the value proposition of this patch. I mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY today, so ... how is this better? This has been on the TODO list for a while, and I don't think the renaming-in-a-transaction work needed to use drop/create is really something we want to force on users. In addition, doing that for all tables in a database is even more work, so I would be disappointed _not_ to get this feature in 9.3. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
* Robert Haas (robertmh...@gmail.com) wrote: Now, the bad news is, I don't think it's very reasonable to try to commit this to 9.3. I think it is just too much stuff too late in the cycle. I've reviewed some of the patches from time to time but there is a lot more stuff and it's big and complicated and it's not really clear that we have the interface quite right yet, even though I think it's also clear that we are a lot of closer than we were. I don't want to be fixing that during beta, much less after release. The only way to avoid this happening again and again, imv, is to get it committed early in whatever cycle it's slated to release for. We've got some serious challenges there though because we want to encourage everyone to focus on beta testing and going through the release process, plus we don't want to tag/branch too early or we create more work for ourselves. It would have been nice to get this into 9.3, but I can certainly understand needing to move it back, but can we get a slightly more specific plan around getting it in then? Thanks, Stephen
Re: [HACKERS] Strange Windows problem, lock_timeout test request
On Sat, Jan 19, 2013 at 12:15 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/19/2013 02:36 AM, Boszormenyi Zoltan wrote: A long time ago I had a lot of sympathy with this answer, but these days not so much. Getting a working mingw/msys environment sufficient to build a bare bones PostgreSQL from scratch is both cheap and fairly easy. The improvements that mingw has made in its install process, and the presence of cheap or free windows instances in the cloud combine to make this pretty simple. But since it's still slightly involved here is how I constructed one such this morning: I've used this description, skipping the Amazon part and putting it directly on my Windows computer, and it worked. Except bin/pg_ctl does not work. It just silently exits without doing anything, so I have to use bin/postgres to start the database (which is what make check uses anyway, so not a problem if you just want make check). Is that just me or is that a known problem? I've seen some discussion from 2004, but didn't find a conclusion. What advantages does mingw have over MSVC? Is it just that it is cost-free, or is it easier to use mingw than MSVC for someone used to building on Linux? (mingw certainly does not seem to have the advantage of being fast!) Would you like to put this somewhere on wiki.postgresql.org, or would you mind if I do so? Thanks for the primer, Jeff
Re: [HACKERS] Support for REINDEX CONCURRENTLY
Bruce Momjian br...@momjian.us writes: On Thu, Jan 24, 2013 at 01:29:56PM -0500, Robert Haas wrote: I'm kind of unconvinced of the value proposition of this patch. I mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY today, so ... how is this better? This has been on the TODO list for a while, and I don't think the renaming in a transaction work needed to use drop/create is really something we want to force on users. In addition, doing that for all tables in a database is even more work, so I would be disappointed _not_ to get this feature in 9.3. I haven't given the current patch a look, but based on previous discussions, this isn't going to be more than a macro for things that users can do already --- that is, it's going to be basically DROP CONCURRENTLY plus CREATE CONCURRENTLY plus ALTER INDEX RENAME, including the fact that the RENAME step will transiently need an exclusive lock. (If that's not what it's doing, it's broken.) So there's some convenience argument for it, but it's hardly amounting to a stellar improvement. I'm kind of inclined to put it off till after we fix the SnapshotNow race condition problems; at that point it should be possible to do REINDEX CONCURRENTLY more simply and without any exclusive lock anywhere. regards, tom lane
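[Editor's sketch of the manual sequence Tom describes, for a hypothetical index idx on table t; the RENAME step briefly taking an exclusive lock is the point being made:]

```sql
-- Build a replacement index without blocking writes:
CREATE INDEX CONCURRENTLY idx_new ON t (col);

-- Drop the old one without blocking writes:
DROP INDEX CONCURRENTLY idx;

-- Swap the name back; this step transiently needs an exclusive lock:
ALTER INDEX idx_new RENAME TO idx;
```

Note that, as Andres points out elsewhere in the thread, CONCURRENTLY cannot run inside a transaction block, so this sequence cannot be made atomic.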
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2013-01-24 13:29:56 -0500, Robert Haas wrote: On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andres Freund escribió: I somewhat dislike the fact that CONCURRENTLY isn't really concurrent here (for the listeners: swapping the indexes acquires exclusive locks), but I don't see any other naming being better. REINDEX ALMOST CONCURRENTLY? I'm kind of unconvinced of the value proposition of this patch. I mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY today, so ... how is this better? In the wake of beb850e1d873f8920a78b9b9ee27e9f87c95592f I wrote a script to do this and it really is harder than one might think: * you cannot do it in the database as CONCURRENTLY cannot be used in a TX * you cannot do it to toast tables (this is currently broken in the patch but should be fixable) * you cannot legally do it when foreign keys reference your unique key * you cannot do it to exclusion constraints or non-immediate indexes All of those are fixable (and most are) within REINDEX CONCURRENTLY, so I find that to be a major feature even if it's not as good as it could be. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 24.01.2013 20:27, Robert Haas wrote: Before getting bogged down in technical commentary, let me say this very clearly: I am enormously grateful for your work on this project. Logical replication based on WAL decoding is a feature of enormous value that PostgreSQL has needed for a long time, and your work has made that look like an achievable goal. Furthermore, it seems to me that you have pursued the community process with all the vigor and sincerity for which anyone could ask. Serious design concerns were raised early in the process and you made radical changes to the design which I believe have improved it tremendously, and you've continued to display an outstanding attitude at every phase of this process about which I can't say enough good things. +1. I really appreciate all the work you, Andres, have put into this. I've argued in the past myself that there should be a little tool that scrapes the WAL to do logical replication. Essentially, just what you've implemented. That said (hah, you knew there would be a but ;-)), now that I see what that looks like, I'm feeling that maybe it wasn't such a good idea after all. It sounded like a fairly small patch that greatly reduces the overhead in the master with existing replication systems like Slony, but it turned out to be a huge patch with a lot of new concepts and interfaces. - Heikki
Re: [HACKERS] Skip checkpoint on promoting from streaming replication
On 24 January 2013 17:44, Simon Riggs si...@2ndquadrant.com wrote: At replay, an end-of-recovery record should be a signal to the hot standby mechanism that there are no transactions running in the master at that point, same as a shutdown checkpoint. I had a reason why I didn't do that, but it seems to have slipped my mind. If I can't remember, I'll add it. I think it was simply to keep things simple and avoid bugs in this release. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] gistchoose vs. bloat
On 21.01.2013 15:19, Heikki Linnakangas wrote: On 21.01.2013 15:06, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote: I looked at this patch. ISTM we should not have the option at all but just do it always. I cannot believe that always-go-left is ever a preferable strategy in the long run; the resulting imbalance in the index will surely kill any possible benefit. Even if there are some cases where it miraculously fails to lose, how many users are going to realize that applies to their case and make use of the option? Sounds good to me. If I remember correctly, there was also an argument that it may be useful for repeatable test results. That's a little questionable for performance (except in those cases where few penalties are identical anyway), but could plausibly be useful for a crash report or something. Meh. There's already a random decision, in the equivalent place and for a comparable reason, in btree (cf _bt_findinsertloc). Nobody's ever complained about that being indeterminate, so I'm unconvinced that there's a market for it with gist. I wonder if it would work for gist to do something similar to _bt_findinsertloc, and have a bias towards the left page, but sometimes descend to one of the pages to the right. You would get the cache locality of usually descending down the same subtree, but also fill the pages to the right. Jeff / Alexander, want to give that a shot? I did some experimenting with that. I used the same test case Alexander did, with geonames data, and compared the unpatched version, the original patch, and the attached patch that biases towards the first best tuple found, but still sometimes chooses the other equally good ones. 
    testname    | initsize | finalsize | idx_blks_read | idx_blks_hit
----------------+----------+-----------+---------------+--------------
 patched-10-4mb | 75497472 |  90202112 |       5853604 |     10178331
 unpatched-4mb  | 75145216 |  94863360 |       5880676 |     10185647
 unpatched-4mb  | 75587584 |  97165312 |       5903107 |     10183759
 patched-2-4mb  | 74768384 |  81403904 |       5768124 |     10193738
 origpatch-4mb  | 74883072 |  82182144 |       5783412 |     10185373

All these tests were performed with shared_buffers=4MB, so that the index won't fit completely in shared buffers, and I could use the idx_blks_read/hit ratio as a measure of cache-friendliness.

The origpatch test was with a simplified version of Alexander's patch, see attached. It's the same as the original, but with the randomization option and gist-build related code removed. The patched-10 and patched-2 tests are two variants with my patch, with 1/10 and 1/2 chance of moving to the next equal tuple, respectively.

The differences in cache hit ratio might be just a result of different index sizes. I included two unpatched runs above to show that there's a fair amount of noise in these tests. That's because of the random nature of the test case; it picks rows to delete and insert at random.

I think the conclusion is that all of these patches are effective. The 1/10 variant is less effective, as expected, as it's closer in behavior to the unpatched behavior than the others. The 1/2 variant seems as good as the original patch.

I performed another test, by inserting 100 duplicate values on an empty table.

  testname  | finalsize | idx_blks_read | idx_blks_hit
------------+-----------+---------------+--------------
 unpatched  |  89030656 |         21350 |      2972033
 patched-10 |  88973312 |         21450 |      2971920
 patched-2  |  88481792 |         22947 |      2970260
 origpatch  |  61186048 |        761817 |      2221500

The original patch produces a much smaller index in this test, but since the inserts are distributed randomly, the pages are accessed randomly, which is bad for caching. A table full of duplicates isn't very realistic, but overall, I'm leaning towards my version of this patch (gistchoose-2.patch).
It has less potential for causing a regression in existing applications, but is just as effective in the original scenario of repeated delete+insert.

- Heikki

duplicatetest.sh
Description: Bourne shell script

gistbloat.sh
Description: Bourne shell script

diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 8b60774..75778f6 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -379,6 +379,7 @@ gistchoose(Relation r, Page p, IndexTuple it, /* it has compressed entry */
 	GISTENTRY	entry, identry[INDEX_MAX_KEYS];
 	bool		isnull[INDEX_MAX_KEYS];
+	int			look_further_on_equal = -1;
 
 	Assert(!GistPageIsLeaf(p));
@@ -446,6 +447,8 @@ gistchoose(Relation r, Page p, IndexTuple it, /* it has compressed entry */
 			if (j < r->rd_att->natts - 1)
 				best_penalty[j + 1] = -1;
+
+			look_further_on_equal = -1;
 		}
 		else if (best_penalty[j] == usize)
 		{
@@ -468,12 +471,46 @@
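For readers following along, the tie-breaking idea in the patches above can be sketched in isolation. The following is a self-contained illustration of the keep-or-decide-once strategy, not the actual gistchoose() code; the function name, the penalty array, and the injected coin function are all invented for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Coin flip, injected so the sketch is testable without a real PRNG. */
typedef int (*coin_fn)(void);

/*
 * Hypothetical helper mirroring the strategy discussed in the thread:
 * scan penalties[0..n-1], track the lowest penalty, and on a tie make
 * at most one random decision whether to keep the current best for all
 * subsequent ties.  Finding a strictly better tuple resets the decision.
 */
static size_t
choose_index(const double *penalties, size_t n, coin_fn coin)
{
	size_t	best = 0;
	int		keep_current_best = -1;	/* -1: undecided, 1: keep, 0: replace */

	for (size_t i = 1; i < n; i++)
	{
		if (penalties[i] < penalties[best])
		{
			best = i;
			keep_current_best = -1;	/* new best resets the keep-it decision */
		}
		else if (penalties[i] == penalties[best])
		{
			if (keep_current_best < 0)
				keep_current_best = coin();	/* decide once */
			if (!keep_current_best)
			{
				best = i;				/* replace, then become undecided again */
				keep_current_best = -1;
			}
		}
	}
	return best;
}
```

With a coin that always says "keep", ties are resolved in favor of the lowest offset; with a coin that always says "replace", the last equally-good tuple wins. The real patch additionally breaks out early on zero-penalty tuples, which this sketch omits.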
Re: [HACKERS] [PATCH] pg_isready (was: [WIP] pg_ping utility)
On Thu, Jan 24, 2013 at 1:12 PM, Fujii Masao masao.fu...@gmail.com wrote:

set_pglocale_pgservice() should be called? I think that the command name (i.e., pg_isready) should be given to PQpingParams() as fallback_application_name. Otherwise, the server by default uses "unknown" as the application name of pg_isready. That's undesirable. Why isn't the following message output only when an invalid option is specified? Try "%s --help" for more information.

I've updated the patch to address these three issues. Attached.

When a conninfo string including the hostname or port number is specified in the -d option, pg_isready displays the wrong information, as follows.

$ pg_isready -d port=
/tmp:5432 - no response

This is what I asked about in my previous email about precedence of the parameters. I can parse that with PQconninfoParse, but what are the rules for merging both individual and conninfo params together? For example, if someone did:

pg_isready -h foo -d "host=bar port=4321" -p 1234

what should the connection parameters be?

Regards,

-- 
Fujii Masao

pg_isready_timeout_v3.diff
Description: Binary data
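For what it's worth, one plausible answer to the precedence question is last-option-wins merging: process the options in command-line order, expand the -d conninfo string (e.g. via PQconninfoParse) into its individual keywords at the point where -d appears, and let any later option overwrite an earlier value before handing the final keyword/value arrays to PQpingParams(). A minimal sketch of that merge policy (the struct and helper names are invented, and this is only one possible rule, not something libpq mandates):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 8

/*
 * Tiny last-wins parameter table (hypothetical).  Real code would pass
 * the resulting keys[]/vals[] arrays, NULL-terminated, to PQpingParams().
 */
struct conn_params
{
	const char *keys[MAX_PARAMS];
	const char *vals[MAX_PARAMS];
	int			n;
};

/* Set a parameter; a later call for the same keyword overwrites it. */
static void
set_param(struct conn_params *p, const char *key, const char *val)
{
	for (int i = 0; i < p->n; i++)
	{
		if (strcmp(p->keys[i], key) == 0)
		{
			p->vals[i] = val;	/* later option wins */
			return;
		}
	}
	p->keys[p->n] = key;
	p->vals[p->n] = val;
	p->n++;
}

static const char *
get_param(const struct conn_params *p, const char *key)
{
	for (int i = 0; i < p->n; i++)
		if (strcmp(p->keys[i], key) == 0)
			return p->vals[i];
	return NULL;
}
```

Under this policy, pg_isready -h foo -d "host=bar port=4321" -p 1234 would end up with host=bar (the conninfo string came after -h) and port=1234 (the -p came last).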
Re: [HACKERS] Strange Windows problem, lock_timeout test request
On 01/24/2013 01:44 PM, Jeff Janes wrote:

On Sat, Jan 19, 2013 at 12:15 PM, Andrew Dunstan and...@dunslane.net wrote:

On 01/19/2013 02:36 AM, Boszormenyi Zoltan wrote:

A long time ago I had a lot of sympathy with this answer, but these days not so much. Getting a working mingw/msys environment sufficient to build a bare-bones PostgreSQL from scratch is both cheap and fairly easy. The improvements that mingw has made in its install process, and the presence of cheap or free Windows instances in the cloud, combine to make this pretty simple. But since it's still slightly involved, here is how I constructed one such this morning:

I've used this description, skipping the Amazon part and putting it directly on my Windows computer, and it worked. Except bin/pg_ctl does not work. It just silently exits without doing anything, so I have to use bin/postgres to start the database (which is what make check uses anyway, so not a problem if you just want make check). Is that just me or is that a known problem? I've seen some discussion from 2004, but didn't find a conclusion.

Did you copy libpq.dll from the lib directory to the bin directory? If not, try that and see if it fixes the problem.

What advantages does mingw have over MSVC? Is it just that it is cost-free, or is it easier to use mingw than MSVC for someone used to building on Linux? (mingw certainly does not seem to have the advantage of being fast!)

See Craig's email today about problems with SDKs and availability of

Would you like to put this somewhere on wiki.postgresql.org, or would you mind if I do so?

Thanks for the primer,

Jeff
Re: [HACKERS] gistchoose vs. bloat
Heikki Linnakangas hlinnakan...@vmware.com writes:

I did some experimenting with that. I used the same test case Alexander did, with geonames data, and compared the unpatched version, the original patch, and the attached patch that biases towards the first best tuple found, but still sometimes chooses one of the other equally good ones.

    testname    | initsize | finalsize | idx_blks_read | idx_blks_hit
----------------+----------+-----------+---------------+--------------
 patched-10-4mb | 75497472 |  90202112 |       5853604 |     10178331
 unpatched-4mb  | 75145216 |  94863360 |       5880676 |     10185647
 unpatched-4mb  | 75587584 |  97165312 |       5903107 |     10183759
 patched-2-4mb  | 74768384 |  81403904 |       5768124 |     10193738
 origpatch-4mb  | 74883072 |  82182144 |       5783412 |     10185373

I think the conclusion is that all of these patches are effective. The 1/10 variant is less effective, as expected, as it's closer in behavior to the unpatched behavior than the others. The 1/2 variant seems as good as the original patch.

At least on this example, it seems a tad better, if you look at index size.

A table full of duplicates isn't very realistic, but overall, I'm leaning towards my version of this patch (gistchoose-2.patch). It has less potential for causing a regression in existing applications, but is just as effective in the original scenario of repeated delete+insert.

+1 for this patch, but I think the comments could use more work. I was convinced it was wrong on first examination, mainly because it's hard to follow the underdocumented look_further_on_equal logic. I propose the attached, which is the same logic with better comments (I also chose to rename and invert the sense of the state variable, because it seemed easier to follow this way ... YMMV on that though).
regards, tom lane

diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 8b60774f50349222ee04fef13613e3592120973b..a127627334e80704edeed38d3522e4000a695abb 100644
*** a/src/backend/access/gist/gistutil.c
--- b/src/backend/access/gist/gistutil.c
*** gistchoose(Relation r, Page p, IndexTupl
*** 379,384 ****
--- 379,385 ----
  	GISTENTRY	entry, identry[INDEX_MAX_KEYS];
  	bool		isnull[INDEX_MAX_KEYS];
+ 	int			keep_current_best;
  
  	Assert(!GistPageIsLeaf(p));
*** gistchoose(Relation r, Page p, IndexTupl
*** 402,407 ****
--- 403,433 ----
  	best_penalty[0] = -1;
  
  	/*
+ 	 * If we find a tuple that's exactly as good as the currently best one, we
+ 	 * could use either one.  When inserting a lot of tuples with the same or
+ 	 * similar keys, it's preferable to descend down the same path when
+ 	 * possible, as that's more cache-friendly.  On the other hand, if all
+ 	 * inserts land on the same leaf page after a split, we're never going to
+ 	 * insert anything to the other half of the split, and will end up using
+ 	 * only 50% of the available space.  Distributing the inserts evenly would
+ 	 * lead to better space usage, but that hurts cache-locality during
+ 	 * insertion.  To get the best of both worlds, when we find a tuple that's
+ 	 * exactly as good as the previous best, choose randomly whether to stick
+ 	 * to the old best, or use the new one.  Once we decide to stick to the
+ 	 * old best, we keep sticking to it for any subsequent equally good tuples
+ 	 * we might find.  This favors tuples with low offsets, but still allows
+ 	 * some inserts to go to other equally-good subtrees.
+ 	 *
+ 	 * keep_current_best is -1 if we haven't yet had to make a random choice
+ 	 * whether to keep the current best tuple.  If we have done so, and
+ 	 * decided to keep it, keep_current_best is 1; if we've decided to
+ 	 * replace, keep_current_best is 0.  (This state will be reset to -1 as
+ 	 * soon as we've made the replacement, but sometimes we make the choice in
+ 	 * advance of actually finding a replacement best tuple.)
+ 	 */
+ 	keep_current_best = -1;
+ 
+ 	/*
  	 * Loop over tuples on page.
  	 */
  	maxoff = PageGetMaxOffsetNumber(p);
*** gistchoose(Relation r, Page p, IndexTupl
*** 446,451 ****
--- 472,480 ----
  			if (j < r->rd_att->natts - 1)
  				best_penalty[j + 1] = -1;
+ 
+ 			/* we have new best, so reset keep-it decision */
+ 			keep_current_best = -1;
  		}
  		else if (best_penalty[j] == usize)
  		{
*** gistchoose(Relation r, Page p, IndexTupl
*** 468,479 ****
  	}
  
  	/*
! 	 * If we find a tuple with zero penalty for all columns, there's no
! 	 * need to examine remaining tuples; just break out of the loop and
! 	 * return it.
  	 */
  	if (zero_penalty)
! 		break;
  	}
  
  	return result;
--- 497,537 ----
  	}
  
  	/*
! 	 * If we looped past the last column, and did not update "result",
! 	 * then this tuple is exactly as good as the prior best tuple.
! 	 */
! 	if (j == r->rd_att->natts && result != i)
! 	{
! 	if
Re: [HACKERS] Materialized views WIP patch
On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote:

Noah Misch wrote: For the benefit of the archives, I note that we almost need not truncate an unlogged materialized view during crash recovery. MVs are refreshed in a VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's pg_class to that relfilenode. When a crash occurs with no refresh in flight, the latest refresh had been safely synced. When a crash cuts short a refresh, the pg_class update will not stick, and the durability of the old heap is not in doubt. However, non-btree index builds don't have the same property; we would need to force an immediate sync of the indexes to be safe here. It would remain necessary to truncate unlogged MVs when recovering a base backup, which may contain a partially-written refresh that did eventually commit. Future MV variants that modify the MV in place would also need the usual truncate on crash.

Hmm. That's a very good observation. Perhaps the issue can be punted to a future release where we start adding more incremental updates to them. I'll think on that, but on the face of it, it sounds like the best choice.

That situation is challenging for the same reason pg_class.relisvalid was hard to implement for unlogged relations. The startup process doesn't know the relkind of the unlogged-relation relfilenodes it cleans. If you can work through all that, it's certainly a nice endpoint to not lose unlogged snapshot MVs on crash. But I intended the first half of my message as the recommendation and the above as a wish for the future.

You might want to ignore the interim work on detecting the new pg_dump dependencies through walking the internal structures. I decided that was heading in a direction which might be unnecessarily fragile and slow; so I tried writing it as a query against the system tables. I'm pretty happy with the results. Here's the query: with recursive w as [snip]

Why is the dependency problem of ordering MV refreshes and MV index builds so different from existing pg_dump dependency problems?

If we bail on having pg_class.relisvalid, then it will obviously need adjustment. Even if we don't have the column, we can have the fact of an MV's validity SQL-visible in some other way.
Re: [HACKERS] Strange Windows problem, lock_timeout test request
On 01/24/2013 02:41 PM, Andrew Dunstan wrote:

What advantages does mingw have over MSVC? Is it just that it is cost-free, or is it easier to use mingw than MSVC for someone used to building on Linux? (mingw certainly does not seem to have the advantage of being fast!)

See Craig's email today about problems with SDKs and availability of

Not sure what happened there. ... availability of free 64 bit MSVC compilers. Also, some third-party libraries are built with the Mingw compilers, and it's often best not to switch if you can help it.

Would you like to put this somewhere on wiki.postgresql.org, or would you mind if I do so?

Please go for it.

cheers

andrew
Re: [HACKERS] gistchoose vs. bloat
On Thu, Jan 24, 2013 at 11:26 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 21.01.2013 15:19, Heikki Linnakangas wrote:

On 21.01.2013 15:06, Tom Lane wrote:

Jeff Davis pg...@j-davis.com writes:

On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote: I looked at this patch. ISTM we should not have the option at all but just do it always. I cannot believe that always-go-left is ever a preferable strategy in the long run; the resulting imbalance in the index will surely kill any possible benefit. Even if there are some cases where it miraculously fails to lose, how many users are going to realize that applies to their case and make use of the option?

Sounds good to me. If I remember correctly, there was also an argument that it may be useful for repeatable test results. That's a little questionable for performance (except in those cases where few penalties are identical anyway), but could plausibly be useful for a crash report or something.

Meh. There's already a random decision, in the equivalent place and for a comparable reason, in btree (cf _bt_findinsertloc). Nobody's ever complained about that being indeterminate, so I'm unconvinced that there's a market for it with gist.

I wonder if it would work for gist to do something similar to _bt_findinsertloc, and have a bias towards the left page, but sometimes descend to one of the pages to the right. You would get the cache locality of usually descending down the same subtree, but also fill the pages to the right. Jeff / Alexander, want to give that a shot?

I did some experimenting with that. I used the same test case Alexander did, with geonames data, and compared the unpatched version, the original patch, and the attached patch that biases towards the first best tuple found, but still sometimes chooses one of the other equally good ones.
    testname    | initsize | finalsize | idx_blks_read | idx_blks_hit
----------------+----------+-----------+---------------+--------------
 patched-10-4mb | 75497472 |  90202112 |       5853604 |     10178331
 unpatched-4mb  | 75145216 |  94863360 |       5880676 |     10185647
 unpatched-4mb  | 75587584 |  97165312 |       5903107 |     10183759
 patched-2-4mb  | 74768384 |  81403904 |       5768124 |     10193738
 origpatch-4mb  | 74883072 |  82182144 |       5783412 |     10185373

All these tests were performed with shared_buffers=4MB, so that the index won't fit completely in shared buffers, and I could use the idx_blks_read/hit ratio as a measure of cache-friendliness. The origpatch test was with a simplified version of Alexander's patch, see attached. It's the same as the original, but with the randomization option and gist-build related code removed. The patched-10 and patched-2 tests are two variants with my patch, with 1/10 and 1/2 chance of moving to the next equal tuple, respectively. The differences in cache hit ratio might be just a result of different index sizes. I included two unpatched runs above to show that there's a fair amount of noise in these tests. That's because of the random nature of the test case; it picks rows to delete and insert at random. I think the conclusion is that all of these patches are effective. The 1/10 variant is less effective, as expected, as it's closer in behavior to the unpatched behavior than the others. The 1/2 variant seems as good as the original patch.

Are the two unpatched-4mb lines different just by coincidence? If so, then the variance is significant, and we need more experiments to actually compare the patched-2-4mb and origpatch-4mb lines.

There is another cause of overhead when using randomization in gistchoose: extra penalty calls. It could be significant when the index fits in cache. In order to avoid it, I deliberately changed the behaviour of my patch from "look sequentially and choose at random" to "look in random order". I think we need to include a comparison of CPU time.

-- 
With best regards, Alexander Korotkov.
Re: [HACKERS] Materialized views WIP patch
Noah Misch wrote:

On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote: Noah Misch wrote: For the benefit of the archives, I note that we almost need not truncate an unlogged materialized view during crash recovery. MVs are refreshed in a VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's pg_class to that relfilenode. When a crash occurs with no refresh in flight, the latest refresh had been safely synced. When a crash cuts short a refresh, the pg_class update will not stick, and the durability of the old heap is not in doubt. However, non-btree index builds don't have the same property; we would need to force an immediate sync of the indexes to be safe here. It would remain necessary to truncate unlogged MVs when recovering a base backup, which may contain a partially-written refresh that did eventually commit. Future MV variants that modify the MV in place would also need the usual truncate on crash.

Hmm. That's a very good observation. Perhaps the issue can be punted to a future release where we start adding more incremental updates to them. I'll think on that, but on the face of it, it sounds like the best choice.

That situation is challenging for the same reason pg_class.relisvalid was hard to implement for unlogged relations. The startup process doesn't know the relkind of the unlogged-relation relfilenodes it cleans. If you can work through all that, it's certainly a nice endpoint to not lose unlogged snapshot MVs on crash. But I intended the first half of my message as the recommendation and the above as a wish for the future.

Well, if I just don't create an init fork for MVs, they are left as they were on recovery, aren't they? So for 9.3, that solves that issue, I think. pg_class.relisvalid is a separate issue.

You might want to ignore the interim work on detecting the new pg_dump dependencies through walking the internal structures. I decided that was heading in a direction which might be unnecessarily fragile and slow; so I tried writing it as a query against the system tables. I'm pretty happy with the results. Here's the query: with recursive w as [snip]

Why is the dependency problem of ordering MV refreshes and MV index builds so different from existing pg_dump dependency problems?

If mva has indexes and is referenced by mvb, the CREATE statements are all properly ordered, but you want mva populated and indexed before you attempt to populate mvb. (Populated to get correct results, indexed to get them quickly.) We don't have anything else like that.

If we bail on having pg_class.relisvalid, then it will obviously need adjustment. Even if we don't have the column, we can have the fact of an MV's validity SQL-visible in some other way.

Sure, I didn't say we had to abandon the query -- probably just replace the relisvalid tests with a function call using the oid of the MV.

-Kevin
Re: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Jameison Martin jameis...@yahoo.com writes:

there have been a lot of different threads on this patch. I'm going to take a stab at teasing them out, summarizing them, and addressing them individually.

Thanks for reviewing the bidding so carefully.

4.3) does it violate a principle in the code (Tom's latest email)

from my perspective, the code has to deal with the fact that natts on the persistent row is different than the tupdesc already; this is the foundation upon which ALTER TABLE ADD NULL and DROP COLUMN as a metadata operation are built. So I believe that this patch strengthens the code's ability to handle the ALTER TABLE ADD NULL case by generalizing it: now the code will more frequently need to deal with the disparity between natts and tupdesc, rather than only after someone did an ALTER TABLE. I'm an advocate of making corner cases go through generalized code where possible.

I think we're already pretty convinced that that case works, since ALTER TABLE ADD COLUMN has been around for a very long time.

however, in all honesty I hadn't considered Tom's point that the patch has created a situation where natts on a row can deviate from the tupdesc that it was built with (as opposed to the current tupdesc, from which it could always deviate due to a subsequent ALTER TABLE). this is a subtle point, and I don't think I have enough experience in the Postgres code to argue one way or another about it, so I'll have to leave that determination up to people that are more familiar with the code.

To be a bit more concrete, the thing that is worrying me is that the executor frequently transforms tuples between virtual and HeapTuple formats (look at the TupleTableSlot infrastructure, junk filters, and tuplesort/tuplestore, for example). Up to now such transformation could be counted on not to affect the apparent number of columns in the tuple; but with this patch, a tuple materialized as a HeapTuple might well show an natts smaller than what you'd conclude from looking at it in virtual slot format.

Now it's surely true that any code that's broken by that would be broken anyway if it were fed a tuple direct-from-disk from a relation that had suffered a later ADD COLUMN, but there are lots of code paths where raw disk tuples would never appear. So IMO there's a pretty fair chance of exposing latent bugs with this. As an example that this type of concern isn't hypothetical, and that identifying such bugs can be very difficult, see commit 039680aff. That bug had been latent for years before it was exposed by an unrelated change, and it took several more years to track it down.

As I said, I'd be willing to take this risk if the patch showed more attractive performance benefits ... but it still seems mighty marginal from here.

regards, tom lane
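The mechanism the patch relies on can be shown without any heap machinery. The sketch below (invented helper name, not the patch's actual code) computes the natts a tuple-forming routine would store once trailing nulls are dropped; readers then treat attributes beyond the stored natts as null, the same convention that makes adding a nullable column with ALTER TABLE a metadata-only operation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: given the per-attribute null flags for a row with
 * tupdesc_natts attributes, return the number of attributes that actually
 * need to be stored once trailing nulls are truncated.  The null bitmap
 * (and the row's stored natts) then only needs to cover this many
 * attributes; anything past it is implicitly null.
 */
static size_t
effective_natts(const bool *isnull, size_t tupdesc_natts)
{
	size_t		n = tupdesc_natts;

	while (n > 0 && isnull[n - 1])
		n--;					/* drop trailing nulls */
	return n;
}
```

So a row whose last non-null column is the 3rd of 10 stores natts = 3, and its null bitmap covers 3 bits instead of 10; a row with no trailing nulls is stored exactly as before. This is also where Tom's concern enters: two rows built from the same tupdesc can now carry different stored natts values.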
Re: [HACKERS] gistchoose vs. bloat
Alexander Korotkov aekorot...@gmail.com writes:

There is another cause of overhead when using randomization in gistchoose: extra penalty calls. It could be significant when the index fits in cache. In order to avoid it, I deliberately changed the behaviour of my patch from "look sequentially and choose at random" to "look in random order". I think we need to include a comparison of CPU time.

Hmm ... actually, isn't that an argument in favor of Heikki's method? The way he's doing it, we can exit without making additional penalty calls once we've found a zero-penalty tuple and decided not to look further (which is something with a pretty high probability).

regards, tom lane
Re: [HACKERS] Re: [BUGS] BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
On Wed, Jan 23, 2013 at 04:58:57PM -0500, Bruce Momjian wrote:

Attached is a ready-to-apply version of the patch. I continued to use postmaster.pid to determine if I need to try the start/stop test --- that allows me to know which servers _might_ be running, because a server start failure might happen for many reasons, and I would prefer not to suggest the server is running if I can avoid it; the pid file gives me that.

The patch is longer than I expected because the code needed to be reordered. The starting banner indicates if it is a live check, but to check for a live server we need to start/stop the servers, and for that we need to know where the binaries are, which we didn't do until after the start banner. I removed the 'check' line for binary checks, and moved that before the banner printing. postmaster_start also now needs an option to suppress an error.

Applied.

-- 
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Materialized views WIP patch
On Thu, Jan 24, 2013 at 03:14:15PM -0500, Kevin Grittner wrote:

Noah Misch wrote: On Thu, Jan 24, 2013 at 01:29:10PM -0500, Kevin Grittner wrote: Noah Misch wrote: For the benefit of the archives, I note that we almost need not truncate an unlogged materialized view during crash recovery. MVs are refreshed in a VACUUM FULL-like manner: fill a new relfilenode, fsync it, and point the MV's pg_class to that relfilenode. When a crash occurs with no refresh in flight, the latest refresh had been safely synced. When a crash cuts short a refresh, the pg_class update will not stick, and the durability of the old heap is not in doubt. However, non-btree index builds don't have the same property; we would need to force an immediate sync of the indexes to be safe here. It would remain necessary to truncate unlogged MVs when recovering a base backup, which may contain a partially-written refresh that did eventually commit. Future MV variants that modify the MV in place would also need the usual truncate on crash.

Hmm. That's a very good observation. Perhaps the issue can be punted to a future release where we start adding more incremental updates to them. I'll think on that, but on the face of it, it sounds like the best choice.

That situation is challenging for the same reason pg_class.relisvalid was hard to implement for unlogged relations. The startup process doesn't know the relkind of the unlogged-relation relfilenodes it cleans. If you can work through all that, it's certainly a nice endpoint to not lose unlogged snapshot MVs on crash. But I intended the first half of my message as the recommendation and the above as a wish for the future.

Well, if I just don't create an init fork for MVs, they are left as they were on recovery, aren't they? So for 9.3, that solves that issue, I think. pg_class.relisvalid is a separate issue.

The startup process just looks for init forks, yes. But it's acceptable to leave the unlogged MV materials alone during *crash* recovery only. When recovering from a base backup, we once again need an init fork to refresh the unlogged-MV relations. In turn, we would still need a relisvalid implementation that copes. This is all solvable, sure, but it looks like a trip off into the weeds relative to the core aim of this patch.

Why is the dependency problem of ordering MV refreshes and MV index builds so different from existing pg_dump dependency problems?

If mva has indexes and is referenced by mvb, the CREATE statements are all properly ordered, but you want mva populated and indexed before you attempt to populate mvb. (Populated to get correct results, indexed to get them quickly.) We don't have anything else like that.

Is the REFRESH order just a replay of the CREATE order (with index builds interspersed), or can it differ?

nm
Re: [HACKERS] gistchoose vs. bloat
On 24.01.2013 22:35, Tom Lane wrote: Alexander Korotkov aekorot...@gmail.com writes: There is another cause of overhead when using randomization in gistchoose: extra penalty calls. It could be significant when the index fits in cache. In order to evade it I especially changed the behaviour of my patch from "look sequentially and choose random" to "look in random order". I think we need to include a comparison of CPU time. Hmm ... actually, isn't that an argument in favor of Heikki's method? The way he's doing it, we can exit without making additional penalty calls once we've found a zero-penalty tuple and decided not to look further (which is something with a pretty high probability). No, I think Alexander is right, although I believe the difference is minimal in practice. If we assume that there are no zero-penalty tuples on the page, with both approaches you have to call penalty on every tuple to know which is best. If there are zero-penalty tuples, then there is a small difference. With Alexander's method, you can stop looking as soon as you find a zero-penalty tuple (the order you check the tuples in is random). With my method, you can stop looking as soon as you find a zero-penalty tuple, *and* you decide to not look further. With the 1/2 probability to stop looking further, you give up on average after 2 tuples. So if I'm doing my math right, my patch does on average between 1x and 2x as many penalty calls as Alexander's, depending on how many zero-penalty tuples there are. The 2x case is when the page is full of zero-penalty tuples, in which case the difference isn't big in absolute terms; 2 penalty calls per page versus 1. BTW, one thing that I wondered about this: How expensive is random()? I'm assuming not very, but I don't really know. Alexander's patch called random() for every tuple on the page, while I call it only once for each equal-penalty tuple. 
If it's really cheap, I think my/Tom's patch could be slightly simplified by always initializing keep_current_best with random() at the top of the function, and eliminating the -1 "haven't decided yet" state. OTOH, if random() is expensive, I note that we only need one bit of randomness, so we could avoid calling random() so often if we utilized all the bits from each random() call. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] text search: restricting the number of parsed words in headline generation
On Wed, Aug 15, 2012 at 11:09:18PM +0530, Sushant Sinha wrote: I will do the profiling and present the results. Sushant, do you have any profiling results on this issue from August? --- On Wed, 2012-08-15 at 12:45 -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Is this a TODO? AFAIR nothing's been done about the speed issue, so yes. I didn't like the idea of creating a user-visible knob when the speed issue might be fixable with internal algorithm improvements, but we never followed up on this in either fashion. regards, tom lane --- On Tue, Aug 23, 2011 at 10:31:42PM -0400, Tom Lane wrote: Sushant Sinha sushant...@gmail.com writes: Doesn't this force the headline to be taken from the first N words of the document, independent of where the match was? That seems rather unworkable, or at least unhelpful. In headline generation function, we don't have any index or knowledge of where the match is. We discover the matches by first tokenizing and then comparing the matches with the query tokens. So it is hard to do anything better than first N words. After looking at the code in wparser_def.c a bit more, I wonder whether this patch is doing what you think it is. Did you do any profiling to confirm that tokenization is where the cost is? Because it looks to me like the match searching in hlCover() is at least O(N^2) in the number of tokens in the document, which means it's probably the dominant cost for any long document. I suspect that your patch helps not so much because it saves tokenization costs as because it bounds the amount of effort spent in hlCover(). I haven't tried to do anything about this, but I wonder whether it wouldn't be possible to eliminate the quadratic blowup by saving more state across the repeated calls to hlCover(). At the very least, it shouldn't be necessary to find the last query-token occurrence in the document from scratch on each and every call. 
Actually, this code seems probably flat-out wrong: won't every successful call of hlCover() on a given document return exactly the same q value (end position), namely the last token occurrence in the document? How is that helpful? regards, tom lane -- Bruce Momjian br...@momjian.us  http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus
On Wed, Aug 29, 2012 at 01:13:51PM +, Rajeev rastogi wrote: From: pgsql-bugs-ow...@postgresql.org [pgsql-bugs-ow...@postgresql.org] on behalf of Bruce Momjian [br...@momjian.us] Sent: Wednesday, August 29, 2012 8:46 AM To: Tom Lane Cc: Robert Haas; Hitoshi Harada; pgsql-b...@postgresql.org; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus On Sun, Apr 15, 2012 at 12:29:39PM -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Apr 5, 2012 at 2:39 AM, Hitoshi Harada umi.tan...@gmail.com wrote: On Wed, Apr 4, 2012 at 8:00 AM, Tom Lane t...@sss.pgh.pa.us wrote: Given the lack of complaints since 9.0, maybe we should not fix this but just redefine the new behavior as being correct? But it seems mighty inconsistent that the tuple limit would apply if you have RETURNING but not when you don't. In any case, the ramifications are wider than one example in the SPI docs. To be honest, I was surprised when I found tcount parameter is said to be applied to even INSERT. I believe people think that parameter is to limit memory consumption when returning tuples thus it'd be applied for only SELECT or DML with RETURNING. So I'm +1 for non-fix but redefine the behavior. Who wants to limit the number of rows processed inside the backend, from SPI? Yeah. Okay, apparently nobody cares about RETURNING behaving differently from non-RETURNING, so the consensus is to redefine the current behavior as correct. That means what we need is to go through the docs and see what places need to be updated (and, I guess, back-patch the changes to 9.0). I will get to this if nobody else does, but not right away. Would someone make the doc change outlined above? Thanks. I would like to work on this documentation bug. As per analysis I am planning to update following SPI function: 1. SPI_Execute: Here we will mention that argument count is used only for the kind of command which returns result i.e. 
all kinds of SELECT and DML with a RETURNING clause; count is ignored for any other kind of command. I will also add one example to illustrate the difference. 2. SPI_execute_plan_with_paramlist: Here we can just give a reference to SPI_execute, i.e. I will mention that count has the same interpretation as in SPI_execute. 3. SPI_execp: Here we can just give a reference to SPI_execute, i.e. I will mention that count has the same interpretation as in SPI_execute. Would someone please provide answers to these questions, or write a patch? -- Bruce Momjian br...@momjian.us  http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, 2013-01-16 at 17:38 -0800, Jeff Davis wrote: New version of checksums patch. And another new version of both patches. Changes: * Rebased. * Rename SetBufferCommitInfoNeedsSave to MarkBufferDirtyHint. Now that it's being used more places, it makes sense to give it a more generic name. * My colleague, Yingjie He, noticed that the FSM doesn't write any WAL, and therefore we must protect those operations against torn pages. That seems simple enough: just use MarkBufferDirtyHint (formerly SetBufferCommitInfoNeedsSave) instead of MarkBufferDirty. The FSM changes are not critical, so the fact that we may lose the dirty bit is OK. Regards, Jeff Davis replace-tli-with-checksums-20130124.patch.gz Description: GNU Zip compressed data checksums-20130124.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG #6510: A simple prompt is displayed using wrong charset
On 01/24/2013 11:19 AM, Noah Misch wrote: On Thu, Jan 24, 2013 at 08:50:36AM -0500, Andrew Dunstan wrote: On 01/24/2013 03:42 AM, Craig Ringer wrote: On 01/24/2013 01:06 AM, Alexander Law wrote: Hello, Please let me know if I can do something to get the bug fix (https://commitfest.postgresql.org/action/patch_view?id=902) committed. I would like to fix other bugs related to postgres localization, but I am not sure yet how to do it. For anyone looking for the history, the 1st post on this topic is here: http://www.postgresql.org/message-id/e1s3twb-0004oy...@wrigleys.postgresql.org Yeah. I'm happy enough with this patch. ISTM it's really a bug and should be backpatched, no? It is a bug, yes. I'm neutral on whether to backpatch. Well, that's what I did. :-) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gistchoose vs. bloat
Heikki Linnakangas hlinnakan...@vmware.com writes: BTW, one thing that I wondered about this: How expensive is random()? I'm assuming not very, but I don't really know. Alexander's patch called random() for every tuple on the page, while I call it only once for each equal-penalty tuple. If it's really cheap, I think my/Tom's patch could be slightly simplified by always initializing keep_current_best with random() at top of the function, and eliminating the -1 haven't decided yet state. I thought about that too, and concluded that random() is probably expensive enough that we don't want to call it unnecessarily. OTOH, if random() is expensive, I note that we only need one bit of randomness, so we could avoid calling random() so often if we utilized all the bits from each random() call. Meh. That would hard-wire the decision that the probability of keeping a best tuple is exactly 0.5. I'd rather keep the flexibility to tune it later. The way your patch is set up, it seems unlikely that it will call random() very many times per page, so I'm not that concerned about minimizing the number of calls further. (Also, in the probably-common case that there are no exactly equally good alternatives, this would actually be a net loss since it would add one unnecessary call.) So basically, Alexander's method requires more random() calls and fewer penalty() calls than yours, at least in typical cases. It's hard to say which is faster without benchmarking, and it would matter which opclass you were testing on. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus
Bruce Momjian br...@momjian.us writes: Would someone make the doc change outlined above? Thanks. Sorry, I'd nearly forgotten about this issue. Will see about fixing the docs. (It looks like some of the comments in execMain.c could use work too.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] autovacuum not prioritising for-wraparound tables
Hi, I have a bug pending that autovacuum fails to give priority to for-wraparound tables. When xid consumption rate is high and dead tuple creation is also high, it is possible that some tables are waiting for for-wraparound vacuums that don't complete in time because the workers are busy processing other tables that have accumulated dead tuples; the system is then down because it's too near the Xid wraparound horizon. Apparently this is particularly notorious in connection with TOAST tables, because those are always put in the tables-to-process list after regular tables. (As far as I recall, this was already reported elsewhere, but so far I have been unable to find the discussion in the archives. Pointers appreciated.) So here's a small, backpatchable patch that sorts the list of tables to process (not all that much tested yet). Tables which have the wraparound flag set are processed before those that are not. Other than this criterion, the order is not defined. Now we could implement this differently, and maybe more simply (say by keeping two lists of tables to process, one with for-wraparound tables and one with the rest) but this way it is simpler to add additional sorting criteria later: say within each category we could first process smaller tables that have more dead tuples. My intention is to clean this up and backpatch to all live branches. Comments? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 1890,1895 **** get_database_list(void)
--- 1890,1919 ----
  	return dblist;
  }
  
+ typedef struct vacuum_table
+ {
+ 	Oid		reloid;
+ 	bool	wraparound;
+ } vacuum_table;
+ 
+ /*
+  * qsort comparator.  Tables marked for wraparound sort first.
+  */
+ static int
+ vacpriority_comparator(const void *a, const void *b)
+ {
+ 	const vacuum_table *taba = *(vacuum_table *const *) a;
+ 	const vacuum_table *tabb = *(vacuum_table *const *) b;
+ 
+ 	if (taba->wraparound == tabb->wraparound)
+ 		return 0;
+ 
+ 	if (taba->wraparound)
+ 		return -1;
+ 	else
+ 		return 1;
+ }
+ 
  /*
   * Process a database table-by-table
   *
***************
*** 1903,1917 **** do_autovacuum(void)
  	HeapTuple	tuple;
  	HeapScanDesc relScan;
  	Form_pg_database dbForm;
- 	List	   *table_oids = NIL;
  	HASHCTL		ctl;
  	HTAB	   *table_toast_map;
- 	ListCell   *volatile cell;
  	PgStat_StatDBEntry *shared;
  	PgStat_StatDBEntry *dbentry;
  	BufferAccessStrategy bstrategy;
  	ScanKeyData key;
  	TupleDesc	pg_class_desc;
  
  	/*
  	 * StartTransactionCommand and CommitTransactionCommand will automatically
--- 1927,1943 ----
  	HeapTuple	tuple;
  	HeapScanDesc relScan;
  	Form_pg_database dbForm;
  	HASHCTL		ctl;
  	HTAB	   *table_toast_map;
  	PgStat_StatDBEntry *shared;
  	PgStat_StatDBEntry *dbentry;
  	BufferAccessStrategy bstrategy;
  	ScanKeyData key;
  	TupleDesc	pg_class_desc;
+ 	vacuum_table **tables;
+ 	int			ntables;
+ 	int			maxtables;
+ 	int			i;
  
  	/*
  	 * StartTransactionCommand and CommitTransactionCommand will automatically
***************
*** 1986,1991 **** do_autovacuum(void)
--- 2012,2022 ----
  					  ctl, HASH_ELEM | HASH_FUNCTION);
  
+ 	/* initialize our tasklist */
+ 	ntables = 0;
+ 	maxtables = 32;
+ 	tables = palloc(maxtables * sizeof(vacuum_table *));
+ 
  	/*
  	 * Scan pg_class to determine which tables to vacuum.
  	 *
***************
*** 2077,2085 **** do_autovacuum(void)
  		}
  		else
  		{
! 			/* relations that need work are added to table_oids */
  			if (dovacuum || doanalyze)
! 				table_oids = lappend_oid(table_oids, relid);
  
  			/*
  			 * Remember the association for the second pass.  Note: we must do
--- 2108,2133 ----
  		}
  		else
  		{
! 			/* relations that need work are added to our tasklist */
  			if (dovacuum || doanalyze)
! 			{
! 				vacuum_table *tab;
! 
! 				/* enlarge the array if necessary */
! 				if (ntables >= maxtables)
! 				{
! 					maxtables *= 2;
! 					tables = repalloc(tables,
! 									  maxtables * sizeof(vacuum_table *));
! 				}
! 
! 				tab = palloc(sizeof(vacuum_table));
! 
! 				tab->reloid = relid;
! 				tab->wraparound = wraparound;
! 
! 				tables[ntables++] = tab;
! 			}
  
  			/*
  			 * Remember the association for the second pass.  Note: we must do
***************
*** 2162,2168 **** do_autovacuum(void)
  
  		/* ignore analyze for toast tables */
  		if (dovacuum)
! 			table_oids = lappend_oid(table_oids, relid);
  	}
  
  	heap_endscan(relScan);
--- 2210,2233 ----
  
  		/* ignore analyze for toast tables */
  		if (dovacuum)
! 		{
! 			vacuum_table *tab;
! 
! 			/* enlarge the array if necessary */
! 			if (ntables >= maxtables)
! 			{
! 				maxtables *= 2;
! 				tables = repalloc(tables,
! 								  maxtables * sizeof(vacuum_table *));
! 			}
! 
! 			tab = palloc(sizeof(vacuum_table));
! 
! 			tab->reloid = relid;
! 			tab->wraparound = wraparound;
! 
!
Re: [HACKERS] COPY FREEZE has no warning
On Wed, Jan 23, 2013 at 02:02:46PM -0500, Bruce Momjian wrote: As a reminder, COPY FREEZE still does not issue any warning/notice if the freezing does not happen: Requests copying the data with rows already frozen, just as they would be after running the <command>VACUUM FREEZE</command> command. This is intended as a performance option for initial data loading. Rows will be frozen only if the table being loaded has been created in the current subtransaction, there are no cursors open and there are no older snapshots held by this transaction. If those conditions are not met the command will continue without error though will not freeze rows. It is also possible in rare cases that the request cannot be honoured for internal reasons, hence <literal>FREEZE</literal> is more of a guideline than a hard rule. Note that all other sessions will immediately be able to see the data once it has been successfully loaded. This violates the normal rules of MVCC visibility and by specifying this option the user acknowledges explicitly that this is understood. Didn't we want to issue the user some kind of feedback? As no one wanted to write this patch, I have developed the attached version. -- Bruce Momjian br...@momjian.us  http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
new file mode 100644
index 8778e8b..c8c9821
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*** CopyFrom(CopyState cstate)
*** 2011,2023 ****
  		 * where we can apply the optimization, so in those rare cases
  		 * where we cannot honour the request we do so silently.
  		 */
! 		if (cstate->freeze &&
! 			ThereAreNoPriorRegisteredSnapshots() &&
! 			ThereAreNoReadyPortals() &&
! 			(cstate->rel->rd_newRelfilenodeSubid == GetCurrentSubTransactionId() ||
! 			 cstate->rel->rd_createSubid == GetCurrentSubTransactionId()))
! 			hi_options |= HEAP_INSERT_FROZEN;
  	}
  
  	/*
  	 * We need a ResultRelInfo so we can use the regular executor's
--- 2011,2029 ----
  		 * where we can apply the optimization, so in those rare cases
  		 * where we cannot honour the request we do so silently.
  		 */
! 		if (cstate->freeze)
! 		{
! 			if (ThereAreNoPriorRegisteredSnapshots() &&
! 				ThereAreNoReadyPortals() &&
! 				(cstate->rel->rd_newRelfilenodeSubid == GetCurrentSubTransactionId() ||
! 				 cstate->rel->rd_createSubid == GetCurrentSubTransactionId()))
! 				hi_options |= HEAP_INSERT_FROZEN;
! 			else
! 				ereport(NOTICE, (errmsg("unable to honor \"freeze\" option")));
! 		}
  	}
+ 	else if (cstate->freeze)
+ 		ereport(NOTICE, (errmsg("unable to honor \"freeze\" option")));
  
  	/*
  	 * We need a ResultRelInfo so we can use the regular executor's
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
new file mode 100644
index 78c601f..126b3c7
*** a/src/test/regress/expected/copy2.out
--- b/src/test/regress/expected/copy2.out
*** SELECT * FROM vistest;
*** 334,345 ****
--- 334,347 ----
  COMMIT;
  TRUNCATE vistest;
  COPY vistest FROM stdin CSV FREEZE;
+ NOTICE:  unable to honor "freeze" option
  BEGIN;
  INSERT INTO vistest VALUES ('z');
  SAVEPOINT s1;
  TRUNCATE vistest;
  ROLLBACK TO SAVEPOINT s1;
  COPY vistest FROM stdin CSV FREEZE;
+ NOTICE:  unable to honor "freeze" option
  SELECT * FROM vistest;
   a

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus
On Thu, Jan 24, 2013 at 04:51:04PM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Would someone make the doc change outlined above? Thanks. Sorry, I'd nearly forgotten about this issue. Will see about fixing the docs. (It looks like some of the comments in execMain.c could use work too.) Thanks. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers