Re: [HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

2011-06-30 Thread Fujii Masao
On Sat, Jun 25, 2011 at 10:41 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Attached is a patch that addresses Fujii's third and most recent set of
 concerns.

Thanks for updating the patch!

 I think that Heikki is currently taking another look at my work,
 because he indicated in a message to the list a short time ago
 that, while reviewing my patch, he realised that there may be an
 independent problem with silent_mode. I will wait for his remarks
 before producing another version of the patch that incorporates those
 two small changes.

Yes, we should wait for the comments from Heikki. But I have a few other
comments:

InitPostmasterDeathWatchHandle() can be a static function because it's
used only in postmaster.c.

ReleasePostmasterDeathWatchHandle() can call ereport(FATAL) before
StartChildProcess() or BackendStartup() calls on_exit_reset() and resets
MyProcPid. This looks unsafe. If that ereport(FATAL) is unfortunately
called, a process other than postmaster would perform the postmaster's
proc-exit handlers. And that ereport(FATAL) would report the wrong pid
when %p is specified in log_line_prefix. What about closing the pipe in
ClosePostmasterPorts() and removing ReleasePostmasterDeathWatchHandle()?

+   /*
+    * Set O_NONBLOCK to allow checking for the fd's presence with a
+    * select() call
+    */
+   if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFL, O_NONBLOCK))
+   {
+       ereport(FATAL,
+               (errcode_for_socket_access(),
+                errmsg("failed to set the postmaster death watching fd's flags: %m")));
+   }

I don't think that the pipe fd needs to be set to non-blocking mode
since we don't read or write on it.
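
For reference, here is a minimal standalone sketch (my illustration, not the
patch's code) of why non-blocking mode is irrelevant for this use: select()
reports the pipe's read end as readable as soon as the writer exits and the
kernel closes the write end, whether or not O_NONBLOCK is set, because the
descriptor is only ever waited on, never read.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/select.h>

/* Standalone sketch: a child detects its parent's death by select()ing
 * on the read end of a pipe.  When the parent exits, the kernel closes
 * the parent's write end, the read end becomes readable (EOF), and the
 * select() wakes up -- no O_NONBLOCK needed, since we never read/write. */
int
main(void)
{
	int			fds[2];

	if (pipe(fds) != 0)
	{
		perror("pipe");
		return 1;
	}

	if (fork() == 0)
	{
		/* child: watch for parent death */
		fd_set		rfds;

		close(fds[1]);			/* keep only the read end */
		FD_ZERO(&rfds);
		FD_SET(fds[0], &rfds);
		/* blocks until the parent exits and its write end is closed */
		select(fds[0] + 1, &rfds, NULL, NULL, NULL);
		printf("parent is gone\n");
		return 0;
	}

	sleep(1);					/* parent: exit, closing the write end */
	return 0;
}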

http://developer.postgresql.org/pgdocs/postgres/error-style-guide.html
According to the error style guide, I think that it's better to change the
following messages:

+                errmsg("pipe() call failed to create pipe to monitor postmaster death: %m")));

"could not create pipe for monitoring postmaster death: %m"

+                errmsg("failed to close file descriptor associated with postmaster death in child process")));

"could not close postmaster pipe: %m"

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



[HACKERS] ToDo: list of active channels

2011-06-30 Thread Pavel Stehule
Hello

I use LISTEN/NOTIFY. Now I have to check whether a second application that
creates channels is active. This would be simple with a system view of
active channels.

Regards

Pavel Stehule



Re: [HACKERS] Online base backup from the hot-standby

2011-06-30 Thread Jun Ishiduka

   Process of online base backup on standby server:
1. pg_start_backup('x');
2. copy the data directory
3. copy *pg_control*
  
  Who deletes the backup_label file created by pg_start_backup()?
  Isn't pg_stop_backup() required to do that?

 You need it to take the system out of backup mode as well, don't you?

Certainly so.

Add to the above process:
  4. pg_stop_backup();

But I have not yet considered cases such as promotion while in backup mode.
I need to think about that a little more.

For this commitfest, the goal of the patch is to make the backup recoverable
using the "Minimum recovery ending location" in the control file.


Jun Ishizuka
NTT Software Corporation
TEL:045-317-7018
E-Mail: ishizuka@po.ntts.co.jp






Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread Florian Pflug
On Jun29, 2011, at 23:44 , Peter Eisentraut wrote:
 On ons, 2011-06-29 at 10:15 -0700, David E. Wheeler wrote:
 On Jun 29, 2011, at 10:13 AM, Florian Pflug wrote:
 Because there might be more than one range type for a
 base type. Say there are two range types over text, one
 with collation 'de_DE' and one with collation 'en_US'.
 What would the type of
 range('foo', 'f')
 be?
 
 The one that corresponds to the current LC_COLLATE setting.
 
 Yes, or more generally, we have logic that determines, for example, what
 collation to use for
 
 'foo' < 'f'
 
 The same logic can be used to determine what collation to use for
 
 range('foo', 'f')
 
 (In fact, if you implement range() as a user-space function, that will
 happen automatically.)

I don't think it will - as it stands, there isn't a single collatable
type RANGE but instead one *distinct* type per combination of base type,
btree opclass and collation. The reasons for that were discussed at length -
the basic argument for doing it that way was to make a range represent
a fixed set of values.

There's also no guarantee that a range type with collation LC_COLLATE
even exists.

best regards,
Florian Pflug




Re: [HACKERS] Patch file questions?

2011-06-30 Thread Florian Pflug
On Jun30, 2011, at 07:22 , Casey Havenor wrote:
 What recommendations do you have for the other systems? - Win7, Win XP and
 Mac?

Mac OS X comes with patch already installed I think - at least it gets
installed when you install XCode (Apple's IDE), which you need anyway to
get a C compiler (The current XCode version, XCode 4, includes both gcc
and clang, btw.).

If you need additional unix tools on Mac OS X, there's macports
(http://www.macports.org/).

For windows, there's cygwin (http://www.cygwin.com/), which probably also
contains a package for patch. 

best regards,
Florian Pflug




Re: [HACKERS] Adding Japanese README

2011-06-30 Thread Peter Eisentraut
On ons, 2011-06-29 at 18:48 -0500, Kevin Grittner wrote:
 I think this needs a well-defined and sustainable *process*, not
 just a set of volunteers.  I'm skeptical that a workable process can
 be devised, but I'm willing to be proven wrong.

We had translated FAQs, each with a maintainer, which were massively
outdated before we removed them.

And just as another example closer to home, I can tell even without
knowing any Japanese or Chinese that doc/README.mb.big5 and
doc/README.mb.jp are so outdated that we should consider removing them
right now.




Re: [HACKERS] Adding Japanese README

2011-06-30 Thread Peter Eisentraut
On tor, 2011-06-30 at 09:42 +0900, Tatsuo Ishii wrote:
 Why don't you worry about message translations then? Translation is a
 human process and there's no way to guarantee the translation is
 perfect.

At least for message translations we have a process and sophisticated
tools that ensure that the translation has been done and when and by
whom.  If you create additional translation maintenance effort, then
integrating it into the existing PO-based system would, I'm sure, make the
objections fewer (or different).

And another important point: the PO-based system never shows outdated
translations to the user.

But if we're adding another step to the release preparation where
someone has to chase down half a dozen people to update some file, that
creates a lot more burden and bottlenecks, including the possibility of
holding up minor releases or having to remove files in them.




Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread Peter Eisentraut
On tor, 2011-06-30 at 08:45 +0200, Florian Pflug wrote:
 I don't think it will - as it stands, there isn't a single collatable
 type RANGE but instead one *distinct* type per combination of base
 type, btree opclass and collation. The reasons for that were discussed
 at length - the basic argument for doing it that way was to make a
 range represent a fixed set of values.

How would the system catalogs be initialized under that theory: surely
you're not going to seed (nr. of types) * (nr. of collations) * (nr. of
opclasses) range types in initdb?




Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread Florian Pflug
On Jun30, 2011, at 09:05 , Peter Eisentraut wrote:
 On tor, 2011-06-30 at 08:45 +0200, Florian Pflug wrote:
 I don't think it will - as it stands, there isn't a single collatable
 type RANGE but instead one *distinct* type per combination of base
 type, btree opclass and collation. The reasons for that were discussed
 at length - the basic argument for doing it that way was to make a
 range represent a fixed set of values.
 
 How would the system catalogs be initialized under that theory: surely
 you're not going to seed (nr. of types) * (nr. of collations) * (nr. of
 opclasses) range types in initdb?

There's CREATE RANGE. By default, no range types would exist, I believe.

best regards,
Florian Pflug




Re: [HACKERS] Bug in SQL/MED?

2011-06-30 Thread Albe Laurenz
I wrote:

 If you invoke any of the SQL/MED CREATE or ALTER commands,
 the validator function is only called if an option list was given.

[...]
 Example:
[...]

The example is misleading. Here's a better one:

CREATE SERVER myoradb FOREIGN DATA WRAPPER oracle_fdw OPTIONS (foo 'bar');
ERROR:  invalid option "foo"
HINT:  Valid options in this context are: dbserver, user, password

but:
CREATE SERVER myoradb FOREIGN DATA WRAPPER oracle_fdw;
gives no error.

Yours,
Laurenz Albe



Re: [HACKERS] Parameterized aggregate subquery (was: Pull up aggregate subquery)

2011-06-30 Thread Yeb Havinga

On 2011-06-29 19:22, Hitoshi Harada wrote:

Other things are all good points. Thanks for the elaborate review!
More than anything, I'm going to fix the 6) issue, at least to find the cause.


Some more questions:
8) why are cheapest start path and cheapest total path in 
best_inner_subqueryscan the same?


9) as remarked in a different thread, more than one clause could be
pushed down into a subquery. The current patch only considers alternative
paths that have at most one clause pushed down. Is this because of the 
call site of best_inner_subqueryscan, i.e. under one join rel? Would it 
be an idea to consider, for each subquery, only a single alternative 
plan that tries to have all suitable clauses pushed down?


10) I have a hard time imagining use cases that will actually result in
an alternative plan, especially since not all subqueries are allowed to
have quals pushed down into them, and, as Simon Riggs pointed out, many
users will write queries like this with the subqueries pulled up. If the
subqueries that can't be pulled up overlap largely with the ones that are
not pushdown-safe (limit, set operations, etc.), there might be few
actual use cases for this patch.


I think the most important thing for this patch to go forward is to have 
a few examples, from which it's clear that the patch is beneficial.


regards,

--

Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data




Re: [HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

2011-06-30 Thread Heikki Linnakangas

On 30.06.2011 09:36, Fujii Masao wrote:

On Sat, Jun 25, 2011 at 10:41 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:

Attached is a patch that addresses Fujii's third and most recent set of concerns.


Thanks for updating the patch!


I think that Heikki is currently taking another look at my work,
because he indicated in a message to the list a short time ago
that, while reviewing my patch, he realised that there may be an
independent problem with silent_mode. I will wait for his remarks
before producing another version of the patch that incorporates those
two small changes.


Yes, we should wait for the comments from Heikki. But I have a few other
comments:


Here's a WIP patch with some mostly cosmetic changes I've made thus far.
I haven't tested the Windows code at all yet. It seems that no-one is 
objecting to removing silent_mode altogether, so I'm going to do that 
before committing this patch.
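
To illustrate the API change in the hunks below: WaitLatch() now takes an
event mask and returns one, instead of a boolean. A hedged sketch of a call
site follows; "mylatch" is an illustrative latch owned by the current
process, and WL_POSTMASTER_DEATH is the new wake condition this patch series
adds rather than something shown in the hunks.

	/* Hedged sketch, not code from the patch */
	Latch		mylatch;		/* assume InitLatch(&mylatch) ran earlier */
	int			rc;

	rc = WaitLatch(&mylatch,
				   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
				   500000L);	/* timeout in microseconds */

	if (rc & WL_POSTMASTER_DEATH)
		proc_exit(1);			/* postmaster is gone; bail out */
	if (rc & WL_LATCH_SET)
		ResetLatch(&mylatch);	/* consume the wake-up, re-check for work */
	/* if only WL_TIMEOUT is set, the sleep simply timed out */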


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a7f5373..155acea 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10165,7 +10165,7 @@ retry:
 	/*
 	 * Wait for more WAL to arrive, or timeout to be reached
 	 */
-	WaitLatch(&XLogCtl->recoveryWakeupLatch, 500L);
+	WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 500L);
 	ResetLatch(&XLogCtl->recoveryWakeupLatch);
 }
 else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..1a2e141 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -93,6 +93,7 @@
 #endif
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "storage/latch.h"
 #include "storage/shmem.h"
 
@@ -179,31 +180,32 @@ DisownLatch(volatile Latch *latch)
  * Wait for given latch to be set or until timeout is exceeded.
  * If the latch is already set, the function returns immediately.
  *
- * The 'timeout' is given in microseconds, and -1 means wait forever.
- * On some platforms, signals cause the timeout to be restarted, so beware
- * that the function can sleep for several times longer than the specified
- * timeout.
+ * The 'timeout' is given in microseconds. It must be >= 0 if the WL_TIMEOUT
+ * event is given, otherwise it is ignored. On some platforms, signals cause
+ * the timeout to be restarted, so beware that the function can sleep for
+ * several times longer than the specified timeout.
  *
  * The latch must be owned by the current process, ie. it must be a
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up. Note
+ * that if multiple wake-up conditions are true, there is no guarantee that
+ * we return all of them in one call, but we will return at least one.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 /*
- * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * Like WaitLatch, but with an extra socket argument for WL_SOCKET_*
+ * conditions.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
+  long timeout)
 {
 	struct timeval tv,
 			   *tvp = NULL;
@@ -212,19 +214,26 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	int			rc;
 	int			result = 0;
 
+	Assert(wakeEvents != 0);
+
+	/* Ignore WL_SOCKET_* events if no valid socket is given */
+	if (sock == PGINVALID_SOCKET)
+		wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
 
 	if (latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
 
 	/* Initialize timeout */
-	if (timeout >= 0)
+	if (wakeEvents & WL_TIMEOUT)
 	{
+		Assert(timeout >= 0);
 		tv.tv_sec = timeout / 1000000L;
 		tv.tv_usec = timeout % 1000000L;
 		tvp = &tv;
 	}
 
 	waiting = true;
-	for (;;)
+	do
 	{
 		int			hifd;
 
@@ -235,16 +244,28 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		 * do that), and the select() will return immediately.
 		 */
 		drainSelfPipe();
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			/*
+			 * Leave loop immediately, avoid blocking again. We don't attempt
+			 * to report any other events that might also be satisfied.
+			 */
 			break;

[HACKERS] Re: Review of patch "Bugfix for XPATH() if expression returns a scalar value"

2011-06-30 Thread Radosław Smogura

On Wed, 29 Jun 2011 22:26:39 +0200, Florian Pflug wrote:

On Jun29, 2011, at 19:57 , Radosław Smogura wrote:

This is a review of the patch
https://commitfest.postgresql.org/action/patch_view?id=565
"Bugfix for XPATH() if expression returns a scalar value"

SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth))
FROM (SELECT (XPATH('namespace-uri(/*)', x))[1] AS namespace
      FROM (VALUES (XMLELEMENT(name root,
                               XMLATTRIBUTES('<n' AS xmlns, 'v' AS value),
                               't'))) v(x)) as foo;

        xmlelement
-------------------------
 <root sth="&amp;lt;n"/>

	It's clearly visible that the value from the attribute is &lt;n, not <n.
Every parser will read this as &lt;n, which is not natural and will require the
consumer/developer to de-escape it on his side - roughly speaking, this will
be reported as a serious bug.


There's a problem there, no doubt, but your proposed solution just
moves the problem around.

Here's your query, reformatted to be more legible

SELECT
  XMLELEMENT(name root,
 XMLATTRIBUTES(foo.namespace AS sth)
  )
FROM (
  SELECT
(XPATH('namespace-uri(/*)', x))[1] AS namespace
  FROM (VALUES (XMLELEMENT(name root,
                            XMLATTRIBUTES('<n' AS xmlns,
  'v' AS value),'t')
  )
) v(x)) as foo;

What happens is that the XPATH expression selects the xmlns
attribute with the value '<n'. To be well-formed xml, that value
must be escaped, so what is actually returned by the XPATH
call is '&lt;n'. The XMLATTRIBUTES() function, however, doesn't
distinguish between input of type XML and input of type TEXT,
so it figures it has to represent the *textual* value '&lt;n'
in xml, and produces '&amp;lt;n'.

Not escaping textual values returned by XPATH isn't a solution,
though. For example, assume someone inserts the value returned
by the XPATH() call in your query into a column of type XML.
If XPATH() returned 'n' literally instead of escaping it,
the column would contain non-well-formed XML in a column of type
XML. Not pretty, and pretty fatal during your next dump/reload
cycle.

Or, if you want another example, simply remove the XMLATTRIBUTES
call and use the value returned by XPATH as a child node directly.

SELECT
  XMLELEMENT(name root,
 foo.namespace
  )
FROM (
  SELECT
(XPATH('namespace-uri(/*)', x))[1] AS namespace
  FROM (VALUES (XMLELEMENT(name root,
                            XMLATTRIBUTES('<n' AS xmlns,
  'v' AS value),'t')
  )
) v(x)) as foo;

     xmlelement
--------------------
 <root>&lt;n</root>

Note that this is correct! The root node contains a text node
representing the textual value '<n'. If XPATH() hadn't returned
the value escaped, the query above would have produced

  <root><n</root>

which is obviously wrong. It works in this case because XMLELEMENT
is smart enough to distinguish between child nodes given as TEXT
values (those are escaped) and child nodes provided as XML values
(those are inserted unchanged).

XMLATTRIBUTES, however, doesn't make that distinction, and simply
escapes all attribute values independent of their type. Now, the
reason it does that is probably that *not* all XML values are
well-formed attribute values without additional escaping. For example,
you cannot have literal '<' and '"' in attribute values. So if
XMLATTRIBUTES, like XMLELEMENT, never escaped values which are
already of type XML, the following piece of code

  XMLELEMENT(name a, XMLATTRIBUTES('<node/>'::xml as v))

would produce

  <a v="<node/>"/>

which, alas, is not well-formed :-(

The correct solution, I believe, is to teach XMLATTRIBUTES to
only escape those characters which are invalid in attribute
values but valid in XML (probably only '<' and '"', but I'll
check). If there are no objections, I'll prepare a separate patch
for that. I don't want to include it in this patch because it's
really a distinct issue (it even modifies a different function).
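
To sketch what I have in mind (an illustration only, not the eventual patch;
I'm assuming the XML AttValue grammar's rule that '<' and '"' may not appear
literally inside a double-quoted attribute value, while '&' in already
well-formed XML is necessarily part of an entity reference and should pass
through):

#include <stdio.h>

/* Hedged sketch of selective attribute escaping for values already of
 * type XML: escape only '<' and '"'; leave entities like "&lt;" alone. */
static void
print_attr_escaped(const char *xmlval)
{
	for (; *xmlval; xmlval++)
	{
		if (*xmlval == '<')
			fputs("&lt;", stdout);
		else if (*xmlval == '"')
			fputs("&quot;", stdout);
		else
			putchar(*xmlval);
	}
}

int
main(void)
{
	printf("<a v=\"");
	print_attr_escaped("<node/>");	/* prints &lt;node/> */
	puts("\"/>");					/* result: <a v="&lt;node/>"/>, well-formed */
	return 0;
}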

best regards,
Florian Pflug


You may manually enter invalid xml too; you don't need xpath for this.
What's more:

In PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit
I executed
SELECT XMLELEMENT(name a, XMLATTRIBUTES('<node/>'::xml as v))
and it looks like I got the correct output <a v="&lt;node/&gt;"/>
when I looked with text editor into table files I saw same value.

I will check on last master if I can reproduce this.

But indeed, PostgreSQL is currently not type-safe with respect to the XML
type. See
SELECT XMLELEMENT(name root, '<', (xpath('//text()', '<root>&lt;</root>'))[1]).


You found some problem with escaping, but the solution is bad. I think the
problem lies with the XML type, which may be used to hold both strings and
proper xml. For example, the above query can't distinguish whether
(xpath('//text()', '<root>&lt;</root>'))[1] is xml text or an xml element.


For me, adding support for scalars is really good and needed, but
escaping is not.


Regards,
Radosław Smogura




Re: [HACKERS] Parameterized aggregate subquery (was: Pull up aggregate subquery)

2011-06-30 Thread Hitoshi Harada
2011/6/30 Yeb Havinga yebhavi...@gmail.com:
 On 2011-06-29 19:22, Hitoshi Harada wrote:

 Other things are all good points. Thanks for the elaborate review!
 More than anything, I'm going to fix the 6) issue, at least to find the
 cause.

 Some more questions:
 8) why are cheapest start path and cheapest total path in
 best_inner_subqueryscan the same?

Because best_inner_indexscan has the two. Actually, one of them is
enough so far. I aligned it with the existing interface, but they might
be merged into one.

 10) I have a hard time imagining use cases that will actually result in
 an alternative plan, especially since not all subqueries are allowed to
 have quals pushed down into them, and, as Simon Riggs pointed out, many
 users will write queries like this with the subqueries pulled up. If the
 subqueries that can't be pulled up overlap largely with the ones that are
 not pushdown-safe (limit, set operations, etc.), there might be few
 actual use cases for this patch.

I have seen many cases where this planner hack would help
significantly, and which were difficult to rewrite. Why were they
difficult to rewrite? Because quals on size_m (and they have quals on
size_l, in fact) are usually very complicated (5-10 op clauses), and the
join+agg part is itself a kind of subquery in another big query. Of course
there were workarounds, like splitting the statement in two: filter
size_m, then aggregate size_l by the result of the first statement.
However, that goes against instinct. The reason a planner is in an RDBMS
is to let users write statements as simply as needed. I don't know if
the example I raise here is common or not, but I believe the example
represents a one-to-many relation simply, and therefore there should be
many users who just haven't noticed that they are currently suffering
from slow query performance.

 I think the most important thing for this patch to go forward is to have a
 few examples, from which it's clear that the patch is beneficial.

What would be good examples to show the benefit of the patch? I guess the
size_m/size_l test case shows it. What does that case lack, do you
think?

Regards,


-- 
Hitoshi Harada



Re: [HACKERS] time-delayed standbys

2011-06-30 Thread Simon Riggs
On Thu, Jun 30, 2011 at 2:56 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 29, 2011 at 9:54 PM, Josh Berkus j...@agliodbs.com wrote:
 I am not sure exactly how walreceiver handles it if the disk is full.
 I assume it craps out and eventually retries, so probably what will
 happen is that, after the standby's pg_xlog directory fills up,
 walreceiver will sit there and error out until replay advances enough
 to remove a WAL file and thus permit some more data to be streamed.

 Nope, it gets stuck and stops there.  Replay doesn't advance unless you
 can somehow clear out some space manually; if the disk is full, the disk
 is full, and PostgreSQL doesn't remove WAL files without being able to
 write files first.

 Manual (or scripted) intervention is always necessary if you reach disk
 100% full.

 Wow, that's a pretty crappy failure mode... but I don't think we need
 to fix it just on account of this patch.  It would be nice to fix, of
 course.

How is that different to running out of space in the main database?

If I try to pour a pint of milk into a small cup, I don't blame the cup.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

2011-06-30 Thread Peter Geoghegan
On 30 June 2011 08:58, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 Here's a WIP patch with some mostly cosmetic changes I've made thus far. I
 haven't tested the Windows code at all yet. It seems that no-one is
 objecting to removing silent_mode altogether, so I'm going to do that before
 committing this patch.

I'm mostly happy with the changes you've made, but I note:

Fujii is of course correct in pointing out that
InitPostmasterDeathWatchHandle() should be a static function.

s/the implementation, as described in unix_latch.c/the implementation/
 - This is my mistake. I see no reason to mention the .c file. Use
ctags.

Minor niggle, but there is a little errant whitespace at the top of
fork_process.c.

(wakeEvents & WL_TIMEOUT) != 0 -- I was going to note that this was
redundant, but then I remembered that stupid MSVC warning, which I
wouldn't have seen here because I didn't use MSVC for my Windows build,
due to an infuriating issue with winflex (our own Cygwin-built version
of flex for Windows). I wouldn't have expected the warning when it was
set to build C, though. I'm not sure exactly why it isn't necessary in
other places where we're (arguably) doing the same thing.


On 30 June 2011 07:36, Fujii Masao masao.fu...@gmail.com wrote:

 ReleasePostmasterDeathWatchHandle() can call ereport(FATAL) before
 StartChildProcess() or BackendStartup() calls on_exit_reset() and resets
 MyProcPid. This looks unsafe. If that ereport(FATAL) is unfortunately
 called, a process other than postmaster would perform the postmaster's
 proc-exit handlers. And that ereport(FATAL) would report wrong pid
 when %p is specified in log_line_prefix. What about closing the pipe in
 ClosePostmasterPorts() and removing ReleasePostmasterDeathWatchHandle()?

Hmm. That is a valid criticism. I'd rather move the
ReleasePostmasterDeathWatchHandle() call into ClosePostmasterPorts()
though.
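
For illustration, a hedged sketch of that arrangement (POSTMASTER_FD_OWN as
the index of the pipe's write end and the message wording are my assumptions,
not actual patch code):

	/* Sketch: every child already calls ClosePostmasterPorts(), so close
	 * the child's copy of the pipe's write end there instead of in a
	 * separate ReleasePostmasterDeathWatchHandle(). */
	void
	ClosePostmasterPorts(bool am_syslogger)
	{
		/* ... existing code closing the postmaster's sockets ... */

		if (close(postmaster_alive_fds[POSTMASTER_FD_OWN]) != 0)
			ereport(FATAL,
					(errcode_for_file_access(),
					 errmsg("could not close postmaster death monitoring pipe: %m")));
		postmaster_alive_fds[POSTMASTER_FD_OWN] = -1;
	}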

 http://developer.postgresql.org/pgdocs/postgres/error-style-guide.html
 According to the error style guide, I think that it's better to change the
 following messages:

I don't think that the way I've phrased my error messages is
inconsistent with that style guide, except perhaps the pipe()
reference, but if you feel it's important to try and use "could not",
I have no objections.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Bug in SQL/MED?

2011-06-30 Thread 花田 茂
(2011/06/29 21:23), Albe Laurenz wrote:
 If you invoke any of the SQL/MED CREATE or ALTER commands,
 the validator function is only called if an option list was given.
 
 That means that you cannot enforce required options at object creation
 time, because the validator function is not always called.
  I consider that unexpected and undesirable behaviour.
 
 Example:
 CREATE EXTENSION file_fdw;
 CREATE FOREIGN DATA WRAPPER file HANDLER file_fdw_handler VALIDATOR
 file_fdw_validator;
 CREATE SERVER file FOREIGN DATA WRAPPER file;
 CREATE FOREIGN TABLE flat (id integer) SERVER file OPTIONS (format
 'csv');
 SELECT * FROM flat;
 ERROR:  filename is required for file_fdw foreign tables
 
 Now file_fdw does not try to enforce the filename option, but it
 would be nice to be able to do so.
 
 The attached patch would change the behaviour so that the validator function
 is always called.
 
 
 If that change meets with approval, should file_fdw be changed so that it
 requires filename at table creation time?

I think this proposal is reasonable.  IMHO this fix is trivial enough to
be merged into the 9.1 release.

I attached a patch which fixes file_fdw to check the required option
(filename) in its validator function.  I think that such a requirement
should be checked again in PlanForeignScan(), as it has been so far.
Note that this patch requires fdw.patch to have been applied.

With this patch, file_fdw rejects commands which would leave out the
filename option.  Please see the attached sample.txt for details.

BTW, I noticed that the current documentation just says the validator function
is called for CREATE FDW/SERVER/FOREIGN TABLE, and doesn't mention
ALTER commands or USER MAPPING.  I'll post another patch for this issue
later.

Regards,
-- 
Shigeru Hanada

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 466c015..57e522f 100644
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
*** file_fdw_validator(PG_FUNCTION_ARGS)
*** 215,220 
--- 215,230 
 */
ProcessCopyOptions(NULL, true, other_options);
  
+   /*
+    * filename is a required option.  Validity of other options including
+    * relative ones has been checked in ProcessCopyOptions().
+    * Note: We don't care about its value even though it might be empty,
+    * because the COPY command doesn't.
+    */
+   if (catalog == ForeignTableRelationId && filename == NULL)
+       ereport(ERROR,
+               (errmsg("filename is required for file_fdw foreign tables")));
+ 
PG_RETURN_VOID();
  }
  
*** fileGetOptions(Oid foreigntableid,
*** 286,295 
}
prev = lc;
}
-   if (*filename == NULL)
-       ereport(ERROR,
-               (errcode(ERRCODE_FDW_UNABLE_TO_CREATE_REPLY),
-                errmsg("filename is required for file_fdw foreign tables")));
*other_options = options;
  }
  
--- 296,301 
*** filePlanForeignScan(Oid foreigntableid,
*** 308,313 
--- 314,323 
  
/* Fetch options --- we only need filename at this point */
fileGetOptions(foreigntableid, filename, options);
+   if (filename == NULL)
+       ereport(ERROR,
+               (errcode(ERRCODE_FDW_UNABLE_TO_CREATE_REPLY),
+                errmsg("filename is required for file_fdw foreign tables")));
  
/* Construct FdwPlan with cost estimates */
fdwplan = makeNode(FdwPlan);
diff --git a/contrib/file_fdw/input/file_fdw.source 
b/contrib/file_fdw/input/file_fdw.source
index 9ff7235..8d6dfa3 100644
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
*** CREATE FOREIGN TABLE tbl () SERVER file_
*** 59,64 
--- 59,65 
  ');   -- ERROR
  CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
  ');   -- ERROR
+ CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv');   -- ERROR
  
  CREATE FOREIGN TABLE agg_text (
a   int2,
diff --git a/contrib/file_fdw/output/file_fdw.source 
b/contrib/file_fdw/output/file_fdw.source
index 2ba36c9..6cc6746 100644
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
*** ERROR:  COPY delimiter cannot be newline
*** 75,80 
--- 75,82 
  CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
  ');   -- ERROR
  ERROR:  COPY null representation cannot use newline or carriage return
+ CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv');   -- ERROR
+ ERROR:  filename is required for file_fdw foreign tables
  CREATE FOREIGN TABLE agg_text (
a   int2,
b   float4
postgres=# CREATE FOREIGN TABLE foo () SERVER file;
ERROR:  filename is required for file_fdw foreign tables
postgres=# CREATE FOREIGN TABLE 

Re: [HACKERS] Review of patch "Bugfix for XPATH() if expression returns a scalar value"

2011-06-30 Thread Florian Pflug
On Jun30, 2011, at 10:33 , Radosław Smogura wrote:
 You may manually enter invalid xml too; you don't need xpath for this.

Please give an example, instead of merely asserting. I get

postgres=# select '<'::xml;
ERROR:  invalid XML content at character 8

 Much more
 In PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit
 I executed
 SELECT XMLELEMENT(name a, XMLATTRIBUTES('<node/>'::xml as v))
 and it looks like I got the correct output <a v="&lt;node/&gt;"/>
 when I looked with text editor into table files I saw same value.

I fail to see your point. XMLATTRIBUTES currently escapes
the value unconditionally, which happens to work as expected in
this case. But try
  SELECT XMLELEMENT(name a, XMLATTRIBUTES('&lt;'::xml as v))
which returns
  <a v="&amp;lt;"/>
which I guess isn't what one would expect, otherwise one wouldn't
have added the cast to xml.

I believe the latter example should probably return
  <a v="&lt;"/>
instead, just as XMLELEMENT(name a, '&lt;'::xml) returns
  <a>&lt;</a>

But again, this has little to do with the patch at hand, and should
therefore be discussed separately. The fact that XPATH's failure to
escape its return value happens to fit XMLATTRIBUTE's failure to
not escape its input value doesn't prove that both functions are
correct. Especially not if other functions like XMLELEMENT *don't*
escape their input values if they're already of type XML.

 But indeed, PostgreSQL is currently not type-safe with respect to the XML type. See
 SELECT XMLELEMENT(name root, '<', (xpath('//text()', '<root>&lt;</root>'))[1]).

That is *exactly* what my other patch fixes, the one you deem
undesirable. With that other patch applied, I get

      xmlelement
---------------------
 <root>&lt;&lt;</root>

which is correct. So here, too, I fail to see your point.

 You found some problem with escaping, but the solution is bad.
 I think the problem lies with the XML type, which may be used to hold
 both strings and proper xml.

But it's not *supposed* to hold arbitrary strings. If it were, the
input function wouldn't check whether or not the value was
valid.

Do you agree that, for any type, SELECT value_of_type_X::text::X
should never throw an error? Because that, IMHO quite basic, guarantee
is broken without proper escaping of XML values. Try

SELECT (XPATH('/text()', '<t>&lt;</t>'::xml))[1]::text::xml

on unpatched 9.0. It fails, which IMNSHO is very clearly a bug.

 For example, the above query can't distinguish whether
 (xpath('//text()', '<root>&lt;</root>'))[1] is xml text or an xml element.

Text *is* a kind of element in XML. There are different types of
nodes, one of which is the text node. The fact that it's not surrounded
by <..>, </..> doesn't make a difference, and doesn't change the quoting
rules one bit.

 For me, adding support for scalars is really good and needed, but
 escaping is not.

So far, I remain unconvinced by your arguments. What you argue for
might be sensible if we *didn't* have an XML type and instead simply
had stuff like XPATH() for type TEXT. But we *do* have a type XML,
which *does* validate it's input, so I fail to see how allowing values
of type XML which *don't* pass that input validation can be anything but
a bug.

Compare this to the textual types. We've gone through some pain in
the past to guarantee that textual types never hold values which aren't
valid text in the database's encoding (e.g., we prevent textual values
from containing byte sequences which aren't valid UTF-8 if the database
encoding is UTF-8). It follows that we should do the same for XML types.

best regards,
Florian Pflug





Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread Florian Pflug
On Jun29, 2011, at 17:41 , Jeff Davis wrote:
 Is it? That's actually too bad, since I kinda like it. But anyway,
 if that's a concern it could also be
  range_bounds(ARRAY[1,2]::int8range, '(]')
 
 What type would the result of that be? What value?

ARRAY[1,2]::int8range would return an int8range instance representing
the range [1,2] (i.e. the set of values {1,2}). range_bounds would then
modify the left bound to be exclusive, and thus return the int8range
(1,2] (i.e. the set of values {2}). range_bounds would have the signature
  range_bounds(anyrange, text) returns anyrange.

I do think we'll probably want to have functions to modify the boundary
type (open or closed) anyway, so it wouldn't be that huge of a deal if
the range constructor didn't let you specify them.

Empty ranges would be constructed by
  ARRAY[]::int8range
(Hm, ok, now I'm cheating... Currently you'd need to write
ARRAY[]::int8[]::int8range, but fixing that only needs a few lines
in the transformExpression* function that makes ARRAY[]::int8[] work).

 * It still suffers similar problems as casting back and forth to text:
 ANYARRAY is too general, doesn't really take advantage of the type
 system, and not a great fit anyway.
 
 I believe it alleviates the gravest problems of casting back and forth
 to text. It doesn't have quoting issues and it doesn't potentially lose
 information.
 
 I think it still circumvents the type system to a degree. We're just
 putting stuff in an array with no intention of really using it that way.

Well, arrays are containers, and we need two values to construct a range,
so putting them into a container first and then creating the range from that
doesn't seem so bad to me. We don't use the full set of features that arrays
provide, since we only ever expect zero, one or two entries. But I don't
think this is different from functions that only support single-dimensional
arrays - they too choose to use only a subset of the feature set of arrays.

 In any case, I wouldn't expect this to *stay* the only way to construct
 a range forever. But it does have its virtues for a first incarnation of
 the range type, I believe, mostly because it's completely unintrusive and
 won't cause any backwards-compatibility headaches in the future
 
 I'm not sure that your overloading of arrays is completely immune from
 backwards-compatibility problems, should we decide to change it later.
 
 But regardless, we have quite a lot of time to make a decision before
 9.2 is released; so let's do it once and do it right.
 
 I fear that the intermediate type will turn out to be quite intrusive,
 at least if we try to handle all the corner cases and loose ends. And if
 we don't, I'm concerned that we're painting ourselves into a corner here...
 
 Can you expand on some of the corner-cases and loose ends you're
 concerned about? Does marking it as a pseudotype and making the IO
 functions throw exceptions handle them?

Hm, I guess. I'm still no huge fan of RANGEINPUT, but if we prevent
it from being used as a column type and from being used as an argument
type, then I guess it's workable...

Btw, what happened to the idea of making RANGE(...) a special syntactic
construct instead of a normal function call? Did we discard that for its
intrusiveness, or were there other reasons?

best regards,
Florian Pflug




Re: [HACKERS] [v9.2] Fix leaky-view problem, part 1

2011-06-30 Thread Noah Misch
On Wed, Jun 29, 2011 at 05:05:22PM +0100, Kohei KaiGai wrote:
 2011/6/28 Noah Misch n...@leadboat.com:
  On Tue, Jun 28, 2011 at 10:11:59PM +0200, Kohei KaiGai wrote:

  CREATE VIEW a AS SELECT * FROM ta WHERE ac = 5;
  ALTER VIEW a OWNER TO alice;
  CREATE VIEW b AS SELECT * FROM tb WHERE bc = 6;
  ALTER VIEW b OWNER TO bob;
  SELECT * FROM a, b;
 
  Both the ac=5 and the bc=6 quals do run at the same depth despite enforcing
  security for different principals.  I can't think of a way that one view owner
  could use this situation to subvert the security of the other view owner, but I
  wanted to throw it out.
 
 Even if a view owner sets a trap in his view, we have no way to reference
 variables that come from outside of the view. In the above example, even if I
 added f_leak() into the definition of VIEW A, we cannot give it an argument
 that references VIEW B.

Good point.  Yes, it should be rigorously safe on that account.

  I was referring to this paragraph:
 
  On the technical side, I am pretty doubtful that the approach of adding a
  nestlevel to FuncExpr and RelOptInfo is the right way to go.  I believe we
  have existing code (to handle left joins) that prevents quals from being
  pushed down too far by fudging the set of relations that are supposedly needed
  to evaluate the qual.  I suspect a similar approach would work here.
 
 It seems to me the latter half of this paragraph is talking about the problem
 of unexpected qualifier push-down over the security barrier; I'm trying to
 solve that problem with the part.2 patch.
 The scenario the part.1 patch tries to solve is the order in which qualifiers
 are launched, not unexpected push-down.

Okay, you're probably correct that it wasn't referring to the topic at hand.
I'm still suspicious of the silent assumption about how quals can be assigned
to plan nodes, but I don't have any concrete ideas for avoiding that.

  In addition, the implementation will become complex if qualifiers pulled up
  from security barrier views and qualifiers pulled up from regular views are
  mixed within a single qualifier list.
 
  I only scanned the part 2 patch, but isn't the bookkeeping already happening
  for its purposes?  How much more complexity would we get to apply the same
  strategy to the behavior of this patch?
 
 If conditional, what criteria should we use to reorder the qualifiers?
 The current patch checks the depth first, then checks cost if the depth is
 the same. It is quite a simple rule. I have no idea what criteria would
 order a mix of qualifiers coming from security-barrier views and regular
 views.

Let's see.  Every qual list will have some depth d such that all quals having
depth >= d are security-relevant, and all others are not security-relevant.
(This does not hold for all means of identifying security-relevant quals, but
it does hold for the CREATE SECURITY VIEW/reloptions strategy proposed in your
part 2 patch.)  Suppose you track whether each Query node represents a
security view, then only increment the qualifier depth for such Query nodes,
rather than all Query nodes.  The tracked depth then becomes a security
partition depth.  Keep the actual sorting algorithm the same.  (Disclaimer: I
haven't been thinking about this nearly as long as you have, so I may be
missing something relatively obvious.)
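
For concreteness, the ordering rule under discussion (depth first, then cost)
could look like the comparator below; a hedged illustration with invented
names, not the patch's code:

#include <stdlib.h>

typedef struct
{
	int			depth;			/* security partition depth */
	double		cost;			/* estimated evaluation cost */
} QualSortItem;

/* Sort quals so that deeper (security-relevant) quals run first,
 * breaking ties by estimated cost, cheaper first. */
static int
qual_order_cmp(const void *a, const void *b)
{
	const QualSortItem *qa = (const QualSortItem *) a;
	const QualSortItem *qb = (const QualSortItem *) b;

	if (qa->depth != qb->depth)
		return (qa->depth > qb->depth) ? -1 : 1;
	if (qa->cost != qb->cost)
		return (qa->cost < qb->cost) ? -1 : 1;
	return 0;
}

/* usage: qsort(quals, n, sizeof(QualSortItem), qual_order_cmp); */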

As it stands, the patch badly damages the performance of this example:

CREATE FUNCTION expensive(int) RETURNS boolean LANGUAGE sql
AS 'SELECT pg_sleep(1); SELECT true' COST 100;
CREATE TABLE t(c) AS SELECT * FROM generate_series(1,3);
EXPLAIN ANALYZE
SELECT * FROM (SELECT * FROM t WHERE expensive(c)) t0 WHERE c = 2;

That doesn't even use a view, let alone a security view.  While I like the
patch's current simplicity, we need to narrow its impact.

Thanks,
nm



Re: [HACKERS] time-delayed standbys

2011-06-30 Thread Simon Riggs
On Wed, Jun 29, 2011 at 7:11 PM, Robert Haas robertmh...@gmail.com wrote:

 I don't really see how that's any different from what happens now.  If
 (for whatever reason) the master is generating WAL faster than a
 streaming standby can replay it, then the excess WAL is going to pile
 up someplace, and you might run out of disk space.   Time-delaying the
 standby creates an additional way for that to happen, but I don't
 think it's an entirely new problem.

The only way to control this is with a time delay that can be changed
while the server is running. A recovery.conf parameter doesn't allow
that, so another way is preferable.

I think the timing problems are more complex than stated. The patch relies
upon transaction completion times, but not all WAL records have a time
attached to them. Plus you only used commits anyway; I'm not sure why.
Some actions aren't even transactional, such as DROP DATABASE, amongst
others. Consecutive records can be hours apart, so it would be
possible to delay on some WAL records but then replay records that
happened minutes ago, then wait hours for the next apply. So this
patch doesn't do what it claims in all cases.

Similar discussion on max_standby_delay covered exactly that ground
and went on for weeks in 9.0. IIRC I presented the same case you just
did and we agreed in the end that was not acceptable. I'm not going to
repeat it. Please check the archives.

So, again +1 for the feature, but -1 for the currently proposed
implementation, based upon review.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] Parameterized aggregate subquery

2011-06-30 Thread Yeb Havinga

On 2011-06-30 09:39, Yeb Havinga wrote:


9) as remarked in a different thread, more than one clause could be
pushed down into a subquery. The current patch only considers alternative
paths that have at most one clause pushed down. Is this because of the 
call site of best_inner_subqueryscan, i.e. under one join rel? Would 
it be an idea to consider, for each subquery, only a single 
alternative plan that tries to have all suitable clauses pushed down?


Ehm, please forget this remark; I mistook join rel for base rel.

-- Yeb




Re: [HACKERS] [COMMITTERS] pgsql: Make the visibility map crash-safe.

2011-06-30 Thread Robert Haas
On Tue, Jun 28, 2011 at 12:29 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jun 28, 2011 at 11:44 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 out of curiosity, does it affect the previous benchmarks of the feature?

 I don't think there's much performance impact, because the only case
 where we actually have to do any real work is when vacuum comes
 through and marks a buffer we're about to update all-visible.  It
 would probably be good to run some tests to verify that, though.  I'll
 try to set something up.

I did some pgbench runs on the 32-core box Nate Boley gave me access
to.  I'm not sure that pgbench is the best workload to test this with,
but I gave it a shot.  I used these settings:

checkpoint_segments=64
shared_buffers=8GB
synchronous_commit=off
checkpoint_completion_target=0.9

I compared the performance of commit
431ab0e82819b31fcd1e33ecb52c2cd3b4b41da7 (post-patch) with commit
431ab0e82819b31fcd1e33ecb52c2cd3b4b41da7 (pre-patch).  I tried 3
15-minute pgbench runs each with one client, eight clients, and
thirty-two clients, on each branch.  I used scale factor 100,
reinitializing the entire cluster after each run.

one client (prepatch)
tps = 616.412009 (including connections establishing)
tps = 619.189040 (including connections establishing)
tps = 605.357710 (including connections establishing)

one client (postpatch)
tps = 583.295139 (including connections establishing)
tps = 609.236975 (including connections establishing)
tps = 597.674354 (including connections establishing)

eight clients (prepatch)
tps = 2635.459096 (including connections establishing)
tps = 2468.973962 (including connections establishing)
tps = 2592.889454 (including connections establishing)

eight clients (postpatch)
tps = 2602.226439 (including connections establishing)
tps = 2644.201228 (including connections establishing)
tps = 2725.760364 (including connections establishing)

thirty-two clients (prepatch)
tps = 3889.761627 (including connections establishing)
tps = 4365.501093 (including connections establishing)
tps = 3816.119328 (including connections establishing)

thirty-two clients (postpatch)
tps = 3888.620044 (including connections establishing)
tps = 3614.252463 (including connections establishing)
tps = 4090.430296 (including connections establishing)

I think it's pretty difficult to draw any firm conclusions from this
data.  The one-client and thirty-two-client results came out a bit slower; the
eight-client results came out a bit faster.  But the variation between
branches is less than the inter-run variation on the same branch, so
if there is an effect, this test doesn't clearly capture it.

Having thought about it a bit, I think that if we ran this test for
long enough, we probably could measure a small effect.   pgbench is a
very write-intensive test, so anything that writes additional WAL
records is going to cause checkpoints to happen more frequently, and
there's no denying that this change increases WAL volume slightly.

On another note, these results are also interesting from a scalability
perspective.  The eight-client results are about 4.4x the one-client
results, and the thirty-two-client results are about 6.5x the one-client
results.  Needless to say, there's a lot of room for improvement
there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] hint bit cache v6

2011-06-30 Thread Robert Haas
On Thu, Jun 30, 2011 at 12:31 AM, Jim Nasby j...@nasby.net wrote:
 Would it be reasonable to keep a second-level cache that stores individual
 XIDs instead of blocks? That would provide protection for XIDs that are
 extremely common but don't have a good fit with the pattern of XID ranges 
 that we're caching. I would expect this to happen if you had a transaction 
 that touched a bunch of data (ie: bulk load or update) some time ago (so the 
 other XIDs around it are less likely to be interesting) but not old enough to 
 have been frozen yet. Obviously you couldn't keep too many XIDs in this 
 secondary cache, but if you're just trying to prevent certain pathological 
 cases then hopefully you wouldn't need to keep that many.

Maybe, but I think that's probably still papering over the problem.
I'd really like to find an algorithm that bounds how often we can
flush a page out of the cache to some number of tuples significantly
greater than 100.  The one I suggested yesterday has that property,
for example, although it may have other problems I'm not thinking of.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding Japanese README

2011-06-30 Thread Robert Haas
On Thu, Jun 30, 2011 at 1:49 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
 2011/6/30 Itagaki Takahiro itagaki.takah...@gmail.com:
 On Thu, Jun 30, 2011 at 09:42, Tatsuo Ishii is...@postgresql.org wrote:
 BTW I will talk to some Japanese-speaking developers about my idea if the
 community agrees to add a Japanese README to the source tree, so that I am
 not the only one contributing to this project

 Now, if someone wanted to set up a web site or Wiki page with
 translations, that would be up to them.

 IMHO, the Wiki approach seems more reasonable than a README file.
 It will be suitable for adding non-Japanese translations and
 non-core developer can join to translate or fix the docs.

 +1. If we really want to prove the demand, let's start with the Wiki,
 which is less invasive than a README (though I doubt such pages would be
 kept up to date in the end).

We could consider adding a note to the README files pointing people to
an appropriate wiki page for translated versions, so that someone who
just checks out the source tree has a place to go and start looking.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

2011-06-30 Thread Robert Haas
On Thu, Jun 30, 2011 at 5:47 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 I don't think that the way I've phrased my error messages is
 inconsistent with that style guide, except perhaps the pipe()
 reference, but if you feel it's important to try and use "could not",
 I have no objections.

I like Fujii's rephrasing - we don't usually mention the name of the
system call.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] how to call the function--pqCatenateResultError()

2011-06-30 Thread Robert Haas
2011/6/27 _石头 tanji...@qq.com:
 Hello!~

        Now I encounter a function call problem in PostgreSQL's psql module.

        The situation is as follows:
        In ./src/bin/psql/common.c, I want to call the function
 pqCatenateResultError(). Function pqCatenateResultError() is declared in
 ./src/interfaces/libpq/libpq-int.h:

        extern void pqCatenateResultError(PGresult *res, const char *msg);

        and is defined in ./src/interfaces/libpq/fe-exec.c:

        void
        pqCatenateResultError(PGresult *res, const char *msg)
        {
            PQExpBufferData errorBuf;

            if (!res || !msg)
                return;
            initPQExpBuffer(&errorBuf);
            if (res->errMsg)
                appendPQExpBufferStr(&errorBuf, res->errMsg);
            appendPQExpBufferStr(&errorBuf, msg);
            pqSetResultError(res, errorBuf.data);
            termPQExpBuffer(&errorBuf);
        }

        To call this function in ./common.c, I include 'libpq-int.h' in
 ./src/bin/psql/common.h, as ./common.c includes the header file 'common.h'.
        But when I use pqCatenateResultError() in ./common.c, I first get an
 "undefined reference to pqCatenateResultError()" error.

        So I added 'extern void pqCatenateResultError(PGresult *res, const
 char *msg);' at the beginning of './common.c', but this change made no
 difference to the result.

        I do not know why this happened! Someone help me! Thank you.

        There is another situation similar to the one above:
        Function PQexec() is declared in ./src/interfaces/libpq/libpq-fe.h and
 defined in ./src/interfaces/libpq/fe-exec.c:

        extern PGresult *PQexec(PGconn *conn, const char *query);

        I can call this function with no error!
        These two situations puzzle me!~

Are you developing on Windows, by any chance?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] libpq SSL with non-blocking sockets

2011-06-30 Thread Steve Singer

On 11-06-28 02:14 PM, Martin Pihlak wrote:

 Hmm, I thought I had thought about that. There was a check in the original
 patch: if (conn->sslRetryBytes || (conn->outCount - remaining) > 0)
 So if the SSL retry buffer was emptied it would return 1 if there was
 something left in the regular output buffer. Or did I miss something
 there?

The issue I saw in the original patch was that at that point in the
function, sslRetryBytes could be zero (if the data was sent) but
conn->outCount - remaining would be an incorrect value, since remaining
is the number of bytes left to send from sslRetryBuf, NOT
conn->outBuffer.  This is no longer an issue in the updated patch.
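For anyone following along, a minimal sketch of the return-value logic
being discussed -- purely illustrative, assuming the sslRetryBuf and
sslRetryBytes fields from the patch, and not actual libpq code:

    /*
     * Hypothetical tail of a pqSendSome()-style flush routine: return 1
     * if any data is still queued in either buffer, 0 once everything
     * has been flushed.  Note that "remaining" counts unsent bytes from
     * sslRetryBuf only, so it must not be subtracted from outCount.
     */
    if (conn->sslRetryBytes > 0)
        return 1;           /* SSL retry buffer not yet drained */
    if (conn->outCount > 0)
        return 1;           /* regular output buffer still has data */
    return 0;               /* all flushed */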

I will try to take a closer look at your updated patch in the next few days.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in SQL/MED?

2011-06-30 Thread Alvaro Herrera
Excerpts from 花田 茂's message of jue jun 30 06:00:23 -0400 2011:

 I attached a patch which fixes file_fdw to check required option
 (filename) in its validator function.  I think that such requirement
 should be checked again in PlanForeignScan(), as it had been so far.
 Note that this patch requires that fdw.patch be applied first.

I think it'd be good to keep the error check in fileGetOptions() just to
ensure that we don't crash in case a catalog is messed with, though
I'd degrade it to elog(ERROR) from ereport.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small patch for GiST: move childoffnum to child

2011-06-30 Thread Alexander Korotkov
Hi Jeff,

Thank you for review.

On Thu, Jun 30, 2011 at 8:50 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 So I tried provoking situations where this surrounding code section
 would get executed, both patched and unpatched.  I have been unable to
 do so--apparently this code is for an incredibly obscure situation
 which I can't induce at will.

Yes, it also seems pretty hard for me to get this code section executed. I'm
going to ask Teodor and Oleg about it.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] hint bit cache v6

2011-06-30 Thread Merlin Moncure
On Wed, Jun 29, 2011 at 3:18 PM, Robert Haas robertmh...@gmail.com wrote:
 With the algorithm as coded, to fully flush the cache you'd have to
 find a series of *unhinted* tuples distributed across a minimum of four
 64k wide page ranges, with the number of tuples being over the 5%
 noise floor.  Furthermore, to eject a cache page you have to have more
 hits than a page already in the cache got -- hits are tracked on the
 existing pages and the candidate pages are effectively compared with the
 existing pages based on hits, with the best four picked.  Also, is it
 reasonable to expect the cache to help in these types of cases
 anyways?

 *scratches head*

 Well, surely you must need to reset the counters for the pages you've
 chosen to include in the cache at some point.  If you don't, then
 there's a strong likelihood that after some period of time the pages
 you have in there will become so firmly nailed into the cache that
 they can never be evicted.

yup -- hit counts are reset on replacement.

 Yeah, I am inclined to think that going very far in that direction is
 going to be a big pile of fail.  TBH, I'm somewhat surprised that you
 managed to get what you already have to work without a performance
 regression.  That code path is extremely hot, and injecting more
 complexity seems like it ought to be a loser.  For purposes of this
 review, I'm taking it as given that you've tested this carefully, but
 I'll admit to lingering skepticism.

You are absolutely right to be skeptical.  Here's the rub -- it's
incredibly difficult to contrive a set of circumstances where the
cache is aggressively replaced, and even if you can somehow coerce
that to happen, the underlying data is permanently altered so that it
can't happen again.  Cheating by mucking around with the xid or
altering the visibility routines to force replacement isn't really
fair.  AFAICT, unhinted tuples tend to be both recent and grouped into
a fairly narrow transaction range basically always.  Sooner or later,
at least for tables that matter, a vacuum is going to roll around and
smear the bits back into the table.  My point is that even for the
worst case I could think of -- pgbench continually jamming records
into the table -- replacements tend to be quite rare.  Let's assume,
though, that I'm wrong and the pathological case is likely to happen.

 I think the basic problem is that the cache pages are so large.  It's
 hard to make them smaller because that increases the cost of accessing
 the cache, as you already noted, but at the same time, making an
 eviction decision on a cache that holds 64K entries based on 100 data
 points seems like waving around a pretty large-caliber weapon in a
 fairly uncontrolled fashion.  Maybe it works OK 90% of the time, but
 it's hard to avoid the niggling fear that someone might accidentally
 get shot.

Right...it's essentially a rolling statistical sampling of transaction
IDs based on past accesses.  Going from say, 100 to 1000 will probably
drop your error margin quite a bit, but I really wonder how much benefit
you can get from going past that.

 Just for the sake of argument, let's suppose we made an array with 64K
 elements, each one representing one 64K chunk of the XID space.  Each
 element is a 4-byte unsigned integer, so the array takes up 256kB of
 memory... probably not good, but I'm just thinking out loud here.
 Every time we're asked about an XID, we bump the corresponding
 counter.  Periodically (say, every 64K probes) we zip through all the
 counters and consider performing a cache replacement.  We look at the
 not-already-cached page that has the highest counter value and compare
 it to the counter value for the cached pages.  If it is higher and the
 difference exceeds some safety margin (to protect against hysteresis),
 then we do the replacement.  I think something like this would allow
 you to have a lot more information available to make a direct
 apples-to-apples comparison at cache replacement time, as compared
 with what you have now.

yeah -- what you've done here is basically two things: 1. by mapping
static ranges you get to skip the sort (but not the scan) during the
replacement, and 2. by vastly expanding the sampling size you
eliminate thrashing via noise in the rolling sample.  This comes at a
significant memory cost and loss of some nimbleness in the cache (i'm
not completely sure more aggressive replacement is 'bad')  -- although
that could be mitigated by replacing at more frequent intervals.

 Now, the big problem here is that a 256kB array is probably too big,
 but because we don't want to blow out the lower-level CPU caches and
 also because every time we consider performing a cache replacement
 we're going to want to zero the whole thing, and that has to be fast.
 But we probably don't really need the array to cover the entire XID
 space.  vacuum_freeze_min_age defaults to 50 million transactions, and
 vacuum_freeze_table_age defaults to 150 million transactions.  If we
 only concern ourselves 

Re: [HACKERS] Avoid index rebuilds for no-rewrite ALTER TABLE ALTER TYPE

2011-06-30 Thread Noah Misch
On Wed, Jun 29, 2011 at 09:42:06AM -0400, Robert Haas wrote:
 On Tue, Jun 28, 2011 at 3:40 PM, Noah Misch n...@leadboat.com wrote:

  Here's the call stack in question:
 
          RelationBuildLocalRelation
          heap_create
          index_create
          DefineIndex
          ATExecAddIndex
 
  Looking at it again, it wouldn't bring the end of the world to add a 
  relfilenode
  argument to each. None of those have more than four callers.
 
 Yeah.  Those functions take an awful lot of arguments, which suggests
 that some refactoring might be in order, but I still think it's
 cleaner to add another argument than to change the state around
 after-the-fact.
 
  ATExecAddIndex()
  would then call RelationPreserveStorage() before calling DefineIndex(), which
  would in turn put things in a correct state from the start.  Does that seem
  appropriate?  Offhand, I do like it better than what I had.
 
 I wish we could avoid the whole death-and-resurrection thing
 altogether, but off-hand I'm not seeing a real clean way to do that.
 At the very least we had better comment it to death.

I couldn't think of a massive amount to say about that, but see what you think
of this level of commentary.

Looking at this again turned up a live bug in the previous version: if the old
index file were created in the current transaction, we would wrongly remove its
delete-at-abort entry as well as its delete-at-commit entry.  This leaked the
disk file.  Fixed by adding an argument to RelationPreserveStorage() indicating
which kind to remove.  Test case:

BEGIN;
CREATE TABLE t AS SELECT * FROM generate_series(1,10) t(n);
CREATE INDEX ti ON t(n);
SELECT pg_relation_filepath('ti');
ALTER TABLE t ALTER n TYPE int;
ROLLBACK;
CHECKPOINT;
-- file named above should be gone
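For clarity, a sketch of the signature change described above -- the
exact flag name is whatever the attached patch uses, but the shape is
roughly:

    extern void RelationPreserveStorage(RelFileNode rnode, bool atCommit);

where atCommit selects whether the delete-at-commit or the
delete-at-abort pending entry is the one removed.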

I also updated the ATPostAlterTypeCleanup() variable names per discussion and
moved IndexStmt.oldNode to a more-natural position in the structure.

Thanks,
nm
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml
index a90b4e2..b6fe908 100644
*** a/doc/src/sgml/xindex.sgml
--- b/doc/src/sgml/xindex.sgml
*************** ALTER OPERATOR FAMILY integer_ops USING btree ADD
*** 834,846 ****
  <para>
     In a B-tree operator family, all the operators in the family must sort
     compatibly, meaning that the transitive laws hold across all the data types
!    supported by the family: <quote>if A &lt;= B and B &lt;= C, then A &lt;=
!    C</quote>, and <quote>if A &lt; B and B &lt; C, then A &lt; C</quote>.  For each
!    operator in the family there must be a support function having the same
!    two input data types as the operator.  It is recommended that a family be
!    complete, i.e., for each combination of data types, all operators are
!    included.  Each operator class should include just the non-cross-type
!    operators and support function for its data type.
  </para>
 
  <para>
--- 834,848 ----
  <para>
     In a B-tree operator family, all the operators in the family must sort
     compatibly, meaning that the transitive laws hold across all the data types
!    supported by the family: <quote>if A &lt;= B and B &lt;= C, then A &lt;= C</quote>,
!    and <quote>if A &lt; B and B &lt; C, then A &lt; C</quote>.  Subjecting operands
!    to any number of implicit casts or binary coercion casts involving only
!    types represented in the family must not change the result other than to
!    require a similar cast thereon.  For each operator in the family there must
!    be a support function having the same two input data types as the operator.
!    It is recommended that a family be complete, i.e., for each combination of
!    data types, all operators are included.  Each operator class should include
!    just the non-cross-type operators and support function for its data type.
  </para>
 
  <para>
*************** ALTER OPERATOR FAMILY integer_ops USING btree ADD
*** 851,861 ****
     by the family's equality operators, even when the values are of different
     types.  This is usually difficult to accomplish when the types have
     different physical representations, but it can be done in some cases.
!    Notice that there is only one support function per data type, not one
!    per equality operator.  It is recommended that a family be complete, i.e.,
!    provide an equality operator for each combination of data types.
!    Each operator class should include just the non-cross-type equality
!    operator and the support function for its data type.
  </para>
 
  <para>
--- 853,865 ----
     by the family's equality operators, even when the values are of different
     types.  This is usually difficult to accomplish when the types have
     different physical representations, but it can be done in some cases.
!    Implicit casts and binary coercion casts among types represented in the
!    family must preserve this invariant.  Notice that there is only one support
!    function per data type, not one per equality operator.  It is recommended
! 

Re: [HACKERS] creating CHECK constraints as NOT VALID

2011-06-30 Thread Alvaro Herrera
Excerpts from Robert Haas's message of sáb jun 18 23:53:17 -0400 2011:

 I agree.  That's pretty contorted.  How about something like this:
 

Thanks Jaime and Robert.  I have pushed this patch with these fixes.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread Jeff Davis
On Thu, 2011-06-30 at 09:11 +0200, Florian Pflug wrote:
  How would the system catalogs be initialized under that theory: surely
  you're not going to seed (nr. of types) * (nr. of collations) * (nr. of
  opclasses) range types in initdb?
 
 There's CREATE RANGE.

Right. In that respect, it's more like a record type: many possible
record types exist, but you only define the ones you want.

 By default, no range types would exist, I believe.

I was planning to include _some_ by default. Probably not text ranges,
but integer and timestamp[tz] ranges. If nothing else, it makes it
easier to document.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread Jeff Davis
On Wed, 2011-06-29 at 10:15 -0700, David E. Wheeler wrote:
 On Jun 29, 2011, at 10:13 AM, Florian Pflug wrote:
 
  Because there might be more than one range type for a
  base type. Say there are two range types over text, one
  with collation 'de_DE' and one with collation 'en_US'.
  What would the type of
   range('foo', 'f')
  be?
 
 The one that corresponds to the current LC_COLLATE setting.

Then how do you get a text range that doesn't correspond to the
LC_COLLATE setting? Does that mean you couldn't dump/reload from a
system with one collation and get the same values in a system with a
different collation? That would be very strange.

Or, what about other types that just happen to have multiple useful
total orders?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread Jeff Davis
On Wed, 2011-06-29 at 12:34 -0400, Robert Haas wrote:
 But now that I'm thinking about this a little more, I'm worried about this 
 case:
 
 CREATE TABLE foo AS RANGE('something'::funkytype, 'somethingelse'::funkytype);
 DROP TYPE funkytype;
 
 It seems to me that the first statement had better fail, or else the
 second one is going to create a hopeless mess (imagine that a new type
 comes along and gets the OID of funkytype).

Interesting point. I don't think it's a problem because pseudo-types
can't be used that way, so that provides us a mechanism to stop it. But
it means that we have to be a little more sure that such values can't
persist anywhere.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hint bit cache v6

2011-06-30 Thread Robert Haas
On Thu, Jun 30, 2011 at 11:18 AM, Merlin Moncure mmonc...@gmail.com wrote:
 I think the basic problem is that the cache pages are so large.  It's
 hard to make them smaller because that increases the cost of accessing
 the cache, as you already noted, but at the same time, making an
 eviction decision on a cache that holds 64K entries based on 100 data
 points seems like waving around a pretty large-caliber weapon in a
 fairly uncontrolled fashion.  Maybe it works OK 90% of the time, but
 it's hard to avoid the niggling fear that someone might accidentally
 get shot.

 Right...it's essentially a rolling statistical sampling of transaction
 IDs based on past accesses.  Going from say, 100 to 1000 will probably
 drop your error margin quite a bit, but I really wonder how much benefit
 you can get from going past that.

An interesting idea might be to forcibly cause a replacement every 100
tuples (or every 1000 tuples) and see if that causes a noticeable
performance regression.  If it doesn't, that's a good data point.

I think the sour spot for this whole idea is likely to be the case
where you have a workload that is 90% read and 10% write, or something
like that.  On an all-read workload (like pgbench -S), you're
hopefully going to converge to a state where all the hint-bits are
set, once VACUUM kicks in.  And on an all-write workload I think that
you might lose the effect in the noise.  Not sure if we have an easy
way to set up such a workload with pgbench, though.

 Just for the sake of argument, let's suppose we made an array with 64K
 elements, each one representing one 64K chunk of the XID space.  Each
 element is a 4-byte unsigned integer, so the array takes up 256kB of
 memory... probably not good, but I'm just thinking out loud here.
 Every time we're asked about an XID, we bump the corresponding
 counter.  Periodically (say, every 64K probes) we zip through all the
 counters and consider performing a cache replacement.  We look at the
 not-already-cached page that has the highest counter value and compare
 it to the counter value for the cached pages.  If it is higher and the
 difference exceeds some safety margin (to protect against hysteresis),
 then we do the replacement.  I think something like this would allow
 you to have a lot more information available to make a direct
 apples-to-apples comparison at cache replacement time, as compared
 with what you have now.

 yeah -- what you've done here is basically two things: 1. by mapping
 static ranges you get to skip the sort (but not the scan) during the
 replacement, and 2. by vastly expanding the sampling size you
 eliminate thrashing via noise in the rolling sample.  This comes at a
 significant memory cost and loss of some nimbleness in the cache (i'm
 not completely sure more aggressive replacement is 'bad')  -- although
 that could be mitigated by replacing at more frequent intervals.

Well, that gets at an interesting question: how often SHOULD we
replace cache pages?  My gut is that it's 10K-100K and your gut is
that it's 100-1000, which is a hundred-fold difference.  Seems like we
need some kind of empirical data to decide what actually works well in
real life.

 Your range counting strategy might work -- I think to do it right so
 that you don't have to map the entire range is going to require two
 levels of ranges such that you activate a very wide range first before
 looking at 64k subranges. That way you could reduce memory consumption
 significantly without sorting during the replacement.

I don't think you need two levels of ranges.  Just keep track of the
minimum and maximum XID covered by the array (these must always be 64M
transactions apart, say, if the array has 1024 entries, and each cache
page covers 64K XIDs).  When you see a given XID, and it's between the
minimum and the maximum, then bump the counter in bucket XID/64K.  If
you see an XID that is older than the minimum, then ignore it; we
won't consider devoting cache pages to really old XIDs.  If you see an
XID that is newer than the maximum, then just increase the minimum and
maximum by enough to put the new XID in range.  For every 64K
increment by which you advance the minimum and maximum, you need to
zero one entry in the array (since it's no longer covering the same
portion of the XID space) but all the other entries remain intact.

(There is a small problem here of how to work out how to initially set
the minimum and maximum to something sensible, and to make sure that
they don't get out of whack if a backend sits idle for a few billion
transactions, but that seems like it should be solvable.)
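To make that bookkeeping concrete, here is a rough sketch of the
counting scheme -- names and sizes are purely illustrative, not taken
from any patch:

    #include "postgres.h"
    #include "access/transam.h"

    #define XID_BUCKET_BITS 16              /* 64K XIDs per bucket */
    #define NUM_BUCKETS     1024            /* covers 64M XIDs total */

    static uint32        hit_count[NUM_BUCKETS];
    static TransactionId window_min;        /* oldest XID still tracked */

    static void
    count_xid_probe(TransactionId xid)
    {
        TransactionId window_max = window_min +
            ((TransactionId) NUM_BUCKETS << XID_BUCKET_BITS);

        if (TransactionIdPrecedes(xid, window_min))
            return;                         /* too old: never cached */

        while (!TransactionIdPrecedes(xid, window_max))
        {
            /* slide window up one bucket, recycling the oldest slot */
            hit_count[(window_min >> XID_BUCKET_BITS) % NUM_BUCKETS] = 0;
            window_min += (TransactionId) 1 << XID_BUCKET_BITS;
            window_max += (TransactionId) 1 << XID_BUCKET_BITS;
        }

        hit_count[(xid >> XID_BUCKET_BITS) % NUM_BUCKETS]++;
    }

Every 64K probes or so, a separate pass would scan hit_count[] for an
uncached bucket whose count beats a cached page's count by the safety
margin.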

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread David E. Wheeler
On Jun 30, 2011, at 9:29 AM, Jeff Davis wrote:

 Right. In that respect, it's more like a record type: many possible
 record types exist, but you only define the ones you want.

Well, okay. How is this same problem handled for RECORD types, then?

  By default, no range types would exist, I believe.
 
 I was planning to include _some_ by default. Probably not text ranges,
 but integer and timestamp[tz] ranges. If nothing else, it makes it
 easier to document.

+1

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread David E. Wheeler
On Jun 30, 2011, at 9:34 AM, Jeff Davis wrote:

 Then how do you get a text range that doesn't correspond to the
 LC_COLLATE setting?

You cast it.

 Does that mean you couldn't dump/reload from a
 system with one collation and get the same values in a system with a
 different collation? That would be very strange.

No, pg_dump should always explicitly cast things. But there should be a 
reasonable default behavior if I'm in psql and don't cast.

 Or, what about other types that just happen to have multiple useful
 total orders?

Cast where you need it explicit, and have a reasonable default when it's not 
cast.

Best,

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] time-delayed standbys

2011-06-30 Thread Josh Berkus
On 6/30/11 2:00 AM, Simon Riggs wrote:
 Manual (or scripted) intervention is always necessary if you reach disk
 100% full.

  Wow, that's a pretty crappy failure mode... but I don't think we need
  to fix it just on account of this patch.  It would be nice to fix, of
  course.

 How is that different to running out of space in the main database?

 If I try to pour a pint of milk into a small cup, I don't blame the cup.

I have to agree with Simon here.  ;-)

We can do some things to make this easier for administrators, but
there's no way to solve the problem.  And the things we could do would
have to be advanced optional modes which aren't on by default, so they
wouldn't really help the DBA with poor planning skills.  Here's my
suggestions:

1) Have a utility (pg_archivecleanup?) which checks if we have more than
a specific setting's worth of archive logs, and breaks replication and
deletes the archive logs if we hit that number.  This would also require
some way for the standby to stop replicating *without* becoming a
standalone server, which I don't think we currently have.

2) Have a setting where, regardless of standby_delay settings, the
standby will interrupt any running queries and start applying logs as
fast as possible if it hits a certain number of unapplied archive logs.
 Of course, given the issues we had with standby_delay, I'm not sure I
want to complicate it further.

I think we've already fixed the biggest issue in 9.1, since we now have
a limit on the number of WALs the master will keep if archiving is
failing ... yes?  That's the only big *avoidable* failure mode we have,
where a failing standby effectively shuts down the master.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid index rebuilds for no-rewrite ALTER TABLE ALTER TYPE

2011-06-30 Thread Robert Haas
On Thu, Jun 30, 2011 at 11:55 AM, Noah Misch n...@leadboat.com wrote:
 On Wed, Jun 29, 2011 at 09:42:06AM -0400, Robert Haas wrote:
 On Tue, Jun 28, 2011 at 3:40 PM, Noah Misch n...@leadboat.com wrote:

  Here's the call stack in question:
 
          RelationBuildLocalRelation
          heap_create
          index_create
          DefineIndex
          ATExecAddIndex
 
  Looking at it again, it wouldn't bring the end of the world to add a 
  relfilenode
  argument to each. None of those have more than four callers.

 Yeah.  Those functions take an awful lot of arguments, which suggests
 that some refactoring might be in order, but I still think it's
 cleaner to add another argument than to change the state around
 after-the-fact.

  ATExecAddIndex()
  would then call RelationPreserveStorage() before calling DefineIndex(), 
  which
  would in turn put things in a correct state from the start.  Does that seem
  appropriate?  Offhand, I do like it better than what I had.

 I wish we could avoid the whole death-and-resurrection thing
 altogether, but off-hand I'm not seeing a real clean way to do that.
 At the very least we had better comment it to death.

 I couldn't think of a massive amount to say about that, but see what you think
 of this level of commentary.

 Looking at this again turned up a live bug in the previous version: if the old
 index file were created in the current transaction, we would wrongly remove 
 its
 delete-at-abort entry as well as its delete-at-commit entry.  This leaked the
 disk file.  Fixed by adding an argument to RelationPreserveStorage() 
 indicating
 which kind to remove.  Test case:

        BEGIN;
        CREATE TABLE t AS SELECT * FROM generate_series(1,10) t(n);
        CREATE INDEX ti ON t(n);
        SELECT pg_relation_filepath('ti');
        ALTER TABLE t ALTER n TYPE int;
        ROLLBACK;
        CHECKPOINT;
        -- file named above should be gone

 I also updated the ATPostAlterTypeCleanup() variable names per discussion and
 moved IndexStmt.oldNode to a more-natural position in the structure.

On first blush, that looks a whole lot cleaner.  I'll try to find some
time for a more detailed review soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] time-delayed standbys

2011-06-30 Thread Robert Haas
On Thu, Jun 30, 2011 at 1:00 PM, Josh Berkus j...@agliodbs.com wrote:
 On 6/30/11 2:00 AM, Simon Riggs wrote:
 Manual (or scripted) intervention is always necessary if you reach disk
 100% full.

  Wow, that's a pretty crappy failure mode... but I don't think we need
  to fix it just on account of this patch.  It would be nice to fix, of
  course.

 How is that different to running out of space in the main database?

 If I try to pour a pint of milk into a small cup, I don't blame the cup.

 I have to agree with Simon here.  ;-)

 We can do some things to make this easier for administrators, but
 there's no way to solve the problem.  And the things we could do would
 have to be advanced optional modes which aren't on by default, so they
 wouldn't really help the DBA with poor planning skills.  Here's my
 suggestions:

 1) Have a utility (pg_archivecleanup?) which checks if we have more than
 a specific setting's worth of archive logs, and breaks replication and
 deletes the archive logs if we hit that number.  This would also require
 some way for the standby to stop replicating *without* becoming a
 standalone server, which I don't think we currently have.

 2) Have a setting where, regardless of standby_delay settings, the
 standby will interrupt any running queries and start applying logs as
 fast as possible if it hits a certain number of unapplied archive logs.
  Of course, given the issues we had with standby_delay, I'm not sure I
 want to complicate it further.

 I think we've already fixed the biggest issue in 9.1, since we now have
 a limit on the number of WALs the master will keep if archiving is
 failing ... yes?  That's the only big *avoidable* failure mode we have,
 where a failing standby effectively shuts down the master.

I'm not sure we changed anything in this area for 9.1.  Am I wrong?
wal_keep_segments was present in 9.0.  Using that instead of archiving
is a reasonable way to bound the amount of disk space that can get
used, at the cost of possibly needing to rebuild the standby if things
get too far behind.  Of course, in any version, you could also use an
archive_command that will remove old files to make space if the disk
is full, with the same downside: if the standby isn't done with those
files, you're now in for a rebuild.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] time-delayed standbys

2011-06-30 Thread Robert Haas
On Thu, Jun 30, 2011 at 6:45 AM, Simon Riggs si...@2ndquadrant.com wrote:
 The only way to control this is with a time delay that can be changed
 while the server is running. A recovery.conf parameter doesn't allow
 that, so another way is preferable.

True.  We've talked about making the recovery.conf parameters into
GUCs, which would address that concern (and some others).

 I think the time problems are more complex than said. The patch relies
 upon transaction completion times, but not all WAL records have a time
 attached to them. Plus you only used commits anyway, not sure why.

For the same reason we do that with the recovery_target_* code -
replaying something like a heap insert or heap update doesn't change
the user-visible state of the database, because the records aren't
visible anyway until the commit record is replayed.

 Some actions aren't even transactional, such as DROP DATABASE, amongst

Good point.  We'd probably need to add a timestamp to the drop
database record, as that's a case that people would likely want to
defend against with this feature.
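Concretely, that would mean widening the drop-database WAL record --
something like the sketch below, where the new field is purely
illustrative and not from any actual patch:

    typedef struct xl_dbase_drop_rec
    {
        Oid         db_id;
        TimestampTz drop_time;  /* hypothetical: when the drop happened */
    } xl_dbase_drop_rec;

so that both recovery_target_time and a time-delayed standby would have
a timestamp to compare against.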

 others. Consecutive records can be hours apart, so it would be
 possible to delay on some WAL records but then replay records that
 happened minutes ago, then wait hours for the next apply. So this
 patch doesn't do what it claims in all cases.

 Similar discussion on max_standby_delay covered exactly that ground
 and went on for weeks in 9.0. IIRC I presented the same case you just
 did and we agreed in the end that was not acceptable. I'm not going to
 repeat it. Please check the archives.

I think this case is a bit different.  First, max_standby_delay is
relevant for any installation using Hot Standby, whereas this is a
feature that specifically involves time.  Saying that you have to have
time synchronization for Hot Standby to work as designed is more of a
burden than saying you need time synchronization *if you want to use
the time-delayed recovery feature*.  Second, and maybe more
importantly, no one has come up with an idea for how to make this work
reliably in the presence of time skew.  Perhaps we could provide a
simple time-skew correction feature that would work in the streaming
case (though probably not nearly as well as running ntpd), but as I
understand your argument, you're saying that most people will want to
use this with archiving.  I don't see how to make that work without
time synchronization.  In the max_standby_delay case, the requirement
is that queries not get cancelled too aggressively while at the same
time letting the standby get too far behind the master, which leaves
some flexibility in terms of how we actually make that trade-off, and
we eventually found a way that didn't require time synchronization,
which was an improvement.  But for a time-delayed standby, the
requirement at least AIUI is that the state of the standby lag the
master by a certain time interval, and I don't see any way to do that
without comparing slave timestamps with master timestamps.  If we can
find a similar clever trick here, great!  But I'm not seeing how to do
it.

Now, another option here is to give up on the idea of a time-delayed
standby altogether and instead allow the standby to lag the master by
a certain number of WAL segments or XIDs.  Of course, if we do that,
then we will not have a feature called time-delayed standbys.
Instead, we will have a feature called standbys delayed by a certain
number of WAL segments (or XIDs).  That certainly caters to some of
the same use cases, but I think it severely lacking in the usability
department.  I bet the first thing most people will do is to try to
figure out how to translate between those metrics and time, and I bet
we'll get complaints on systems where the activity load is variable
and therefore the time lag for a fixed WAL-segment lag or XID-lag is
unpredictable.  So I think keeping it defined in terms of time is the
right way forward, even though the need for external time
synchronization is, certainly, not ideal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] time-delayed standbys

2011-06-30 Thread Josh Berkus
On 6/30/11 10:25 AM, Robert Haas wrote:
 So I think keeping it defined in terms of time is the
 right way forward, even though the need for external time
 synchronization is, certainly, not ideal.

Actually, when we last had the argument about time synchronization,
Kevin Grittner (I believe) pointed out that unsynchronized replication
servers have an assortment of other issues ... like any read query
involving now().  As the person who originally brought up this hurdle, I
felt that his argument defeated mine.

Certainly I can't see any logical way to have time delay in the absence
of clock synchronization of some kind.  Also, I kinda feel like this
discussion seems aimed at overcomplicating a feature which only a small
fraction of our users will ever use.  Let's keep it as simple as possible.

As for delay on streaming replication, I'm for it.  I think that
post-9.1, thanks to pg_basebackup, the number of our users who are doing
archive log shipping is going to drop tremendously.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] time-delayed standbys

2011-06-30 Thread Kevin Grittner
Josh Berkus j...@agliodbs.com wrote:
 
 when we last had the argument about time synchronization,
 Kevin Grittner (I believe) pointed out that unsynchronized
 replication servers have an assortment of other issues ... like
 any read query involving now().
 
I don't remember making that point, although I think it's a valid
one.
 
What I'm sure I pointed out is that we have one central router which
synchronizes to a whole bunch of atomic clocks around the world
using the normal "discard the outliers and average the rest"
algorithm, and then *every single server and workstation on our
network synchronizes to that router*.  Our database servers are all
running on Linux using ntpd.  Our monitoring spams us with email if
any of the clocks falls outside nominal bounds.  (It's been many
years since we had a misconfigured server which triggered that.)
 
I think doing anything in PostgreSQL around this beyond allowing
DBAs to trust their server clocks is insane.  The arguments for
using and trusting ntpd are pretty much identical to the arguments
for using and trusting the OS file systems.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] time-delayed standbys

2011-06-30 Thread Josh Berkus
Kevin,

 I think doing anything in PostgreSQL around this beyond allowing
 DBAs to trust their server clocks is insane.  The arguments for
 using and trusting ntpd is pretty much identical to the arguments
 for using and trusting the OS file systems.

Oh, you don't want to implement our own NTP?  Coward!

;-)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] time-delayed standbys

2011-06-30 Thread Robert Haas
On Thu, Jun 30, 2011 at 1:51 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 I think doing anything in PostgreSQL around this beyond allowing
 DBAs to trust their server clocks is insane.  The arguments for
 using and trusting ntpd are pretty much identical to the arguments
 for using and trusting the OS file systems.

Except that implementing our own file system would likely have more
benefit and be less work than implementing our own time
synchronization, at least if we want it to be reliable.

Again, I am not trying to pretend that this is any great shakes.
MySQL's version of this feature apparently does somehow compensate for
time skew, which I assume must mean that their replication works
differently than ours - inter alia, it probably requires a TCP socket
connection between the servers.  Since we don't require that, it
limits our options in this area, but also gives us more options in
other areas.  Still, if I could think of a way to do this that didn't
depend on time synchronization, then I'd be in favor of eliminating
that requirement.  I just can't; and I'm inclined to think it isn't
possible.

I wouldn't be opposed to having an option to try to detect time skew
between the master and the slave and, say, display that information in
pg_stat_replication.  It might be useful to have that data for
monitoring purposes, and it probably wouldn't even be that much code.
However, I'd be a bit hesitant to use that data to correct the
amount of time we spend waiting for time-delayed replication, because
it would doubtless be extremely imprecise compared to real time
synchronization, and considerably more error-prone.  IOW, what you
said.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Enable CHECK constraints to be declared NOT VALID

2011-06-30 Thread Alvaro Herrera
Excerpts from Alvaro Herrera's message of jue jun 30 11:58:09 -0400 2011:
 Enable CHECK constraints to be declared NOT VALID
 
 [...]
 
 This patch was sponsored by Enova Financial.

Robert Hass (whose name I misspelled in the commit message above) just
mentioned to me (in an answer to my apologizing about it) that he didn't
think that mentioning sponsors for patch development was a good idea.

I don't think we have a policy for this, but I have done it for some
time now and nobody has complained, so I sort of assumed it was okay.
Besides, some of the people pouring the money in do care about it;
moreover, it provides a little incentive for other companies that might
also be in a position to fund development but lack the peer approval
of the idea, or a final little push.

So what's the general opinion here?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Enable CHECK constraints to be declared NOT VALID

2011-06-30 Thread David E. Wheeler
On Jun 30, 2011, at 12:09 PM, Alvaro Herrera wrote:

 Robert Hass (whose name I misspelled in the commit message above) just
 mentioned to me (in an answer to my apologizing about it) that he didn't
 think that mentioning sponsors for patch development was a good idea.
 
 I don't think we have a policy for this, but I have done it for some
 time now and nobody has complained, so I sort of assumed it was okay.
 Besides, some of the people pouring the money in does care about it;
 moreover, it provides a little incentive for other companies that might
 also be in a position to fund development but lack the peer approval
 of the idea, or a final little push.
 
 So what's the general opinion here?

I certainly see no harm in it, and contributors at all levels -- including 
sponsors of new features or fixes -- ought to be acknowledged and thanked.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Enable CHECK constraints to be declared NOT VALID

2011-06-30 Thread Devrim GÜNDÜZ
On Thu, 2011-06-30 at 15:09 -0400, Alvaro Herrera wrote:

snip

 I don't think we have a policy for this, but I have done it for some
 time now and nobody has complained, so I sort of assumed it was okay.
 Besides, some of the people pouring the money in do care about it;
 moreover, it provides a little incentive for other companies that
 might also be in a position to fund development but lack the peer
 approval of the idea, or a final little push.
 
 So what's the general opinion here? 

+1 for adding sponsor name to the commit message. It will encourage
companies more. 
-- 
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz




Re: [HACKERS] hint bit cache v6

2011-06-30 Thread Merlin Moncure
On Thu, Jun 30, 2011 at 11:44 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jun 30, 2011 at 11:18 AM, Merlin Moncure mmonc...@gmail.com wrote:
 I think the basic problem is that the cache pages are so large.  It's
 hard to make them smaller because that increases the cost of accessing
 the cache, as you already noted, but at the same time, making an
 eviction decision on a cache that holds 64K entries based on 100 data
 points seems like waving around a pretty large-caliber weapon in a
 fairly uncontrolled fashion.  Maybe it works OK 90% of the time, but
 it's hard to avoid the niggling fear that someone might accidentally
 get shot.

 Right...it's essentially a rolling statistical sampling of transaction
 IDs based on past accesses.  Going from say, 100 to 1000 will probably
 drop your error margin quite a bit, but I really wonder how much benefit
 you can get from going past that.

 An interesting idea might be to forcibly cause a replacement every 100
 tuples (or every 1000 tuples) and see if that causes a noticeable
 performance regression.  If it doesn't, that's a good data point.

To test this I disabled the cache check on top of
HeapTupleSatisfiesMVCC and forced a cache entry in place of it with a
bogus xid (so every tuple scanned was treated as a 'miss').  Scanning 1
million records in memory over and over went up a few percentage
points -- converging on about 280ms instead of 270ms.   This is with
the miss array bumped to 1000:

oprofile said:
regular hinted tuple case:
120  10.2041  advance_aggregates
107   9.0986  heapgettup_pagemode
 77   6.5476  ExecProject
 74   6.2925  heapgetpage
 72   6.1224  ExecProcNode
 72   6.1224  ExecScan
 69   5.8673  advance_transition_function
 66   5.6122  heap_getnext
 58   4.9320  HeapTupleSatisfiesMVCC

hinted tuple with pathological cache entry:
111   8.9588  advance_aggregates
109   8.7974  heapgettup_pagemode
 85   6.8604  ExecProject
 81   6.5375  heapgetpage
 77   6.2147  ExecScan
 70   5.6497  ExecStoreTuple
 70   5.6497  HeapTupleSatisfiesMVCC

this was a fairly short profiling run but the numbers are fairly
consistent.  the replacement is fairly cheap...sorting 1000 ints
doesn't take all that long. 100 is even faster.

 I think the sour spot for this whole idea is likely to be the case
 where you have a workload that is 90% read and 10% write, or something
 like that.  On an all-read workload (like pgbench -S), you're
 hopefully going to converge to a state where all the hint-bits are
 set, once VACUUM kicks in.  And on an all-write workload I think that
 you might lose the effect in the noise.  Not sure if we have an easy
 way to set up such a workload with pgbench, though.

it's trivial to test with pgbench -- just use the random number
facility to generate an id for some table and update where random() >
.9.   However this does not generate nearly enough 'misses' to
exercise cache replacement in any meaningful way.  My workstation vm
can punch out about 5k transactions/sec.  With only 10% getting a new
xid and writing back a tuple to the table, only 500 xids are getting
generated/second.  At that rate it takes quite a while to burn through
the 256k transactions the cache ranges over.  Also this case is not
bound by scan performance anyways making profiling it a little silly
-- HeapTupleSatisfiesMVCC is simply not as big of a bottleneck in OLTP
workloads.  Scan-heavy loads are where this matters, and scan-heavy
loads naturally tend to generate fewer xids per tuple scanned.

 Just for the sake of argument, let's suppose we made an array with 64K
 elements, each one representing one 64K chunk of the XID space.  Each
 element is a 4-byte unsigned integer, so the array takes up 256kB of
 memory... probably not good, but I'm just thinking out loud here.
 Every time we're asked about an XID, we bump the corresponding
 counter.  Periodically (say, every 64K probes) we zip through all the
 counters and consider performing a cache replacement.  We look at the
 not-already-cached page that has the highest counter value and compare
 it to the counter value for the cached pages.  If it is higher and the
 difference exceeds some safety margin (to protect against hysteresis),
 then we do the replacement.  I think something like this would allow
 you to have a lot more information available to make a direct
 apples-to-apples comparison at cache replacement time, as compared
 with what you have now.

 yeah -- what you've done here is basically two things: 1. by mapping
 static ranges you get to skip the sort (but not the scan) during the
 replacement, and 2. by vastly expanding the sampling size you
 eliminate thrashing via noise in the rolling sample.  This comes at a
 significant memory cost and loss of some nimbleness in the cache (i'm
 not completely sure more aggressive replacement is 'bad')  -- although
 that could be mitigated by replacing at more frequent intervals.

 Well, that gets at an interesting 

Re: [HACKERS] [COMMITTERS] pgsql: Make the visibility map crash-safe.

2011-06-30 Thread Jeff Davis
On Thu, 2011-06-30 at 07:50 -0400, Robert Haas wrote:
 I compare the performance of commit
 431ab0e82819b31fcd1e33ecb52c2cd3b4b41da7 (post-patch) with commit
 431ab0e82819b31fcd1e33ecb52c2cd3b4b41da7 (pre-patch).

I believe that is a copy/paste error.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] reducing the overhead of frequent table locks, v4

2011-06-30 Thread Thom Brown
On 27 June 2011 15:13, Robert Haas robertmh...@gmail.com wrote:
 I didn't get a lot of comments on the previous version of my patch
 to accelerate table locks.

 http://archives.postgresql.org/pgsql-hackers/2011-06/msg00953.php

 Here's a new version anyway.  In this version, I have:

 1. Made pg_locks show fast-path locks as well as regular locks, with a
 new Boolean column to distinguish.
 2. Renamed fpLWLock to backendLock, on the belief that we're going to
 want something like this for quite a bit more than just this one
 patch.
 3. Removed some debugging code.

Are you able to benchmark this again, and possibly get Stefan to give
it a go too?

Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add support for logging current role (what to review?)

2011-06-30 Thread Stephen Frost
* Alex Hunsaker (bada...@gmail.com) wrote:
 I think if Stephen was proposing 10 fields, or if there was a list of
 fields we were planning on adding in the next release or 3, it might
 be worth re-factoring. 

I know of at least one person (in an earlier piece of the thread
discussing this patch) who was talking about other fields they'd like
included in the CSV log which aren't currently.  I don't recall what
it was, though I think it might have been something like the line #
from inside stored procedures...

 I know of no such list, and I think this field
 useful/important enough that people who are using csv logging would
 want it anyway. +1 on just tacking on the field and causing a flag day
 for csv users.

Honestly, I think it was *me* who raised the issue that we don't have a
header for CSV logs and that it sucks for people using CSV files.  We've
changed it in the past (application_name was added, iirc) and there
wasn't much noise about it that I recall.  If everyone's happy with that,
it's fine by me.

I do want to rework the logging infrastructure (as discussed in the dev
meeting), but I see that whole thing as rather orthogonal to this
change.

Thanks,

Stephen




Re: [HACKERS] Patch file questions?

2011-06-30 Thread Casey Havenor
Ok - loaded up Linux - fun stuff. 
Figured out how to get the code for PostgreSQL version 9.0.4 - have that in a
directory. 
Got the patch from above... 
Placed it into the same directory. 

Loaded the dependencies for the final compile and install with ./configure. 

BUT 

When I run the patch I'm getting some Hunk fails?  Is this because I'm not
using the same version that the author intended the patch for? 

From the output it looks to be going through the files properly but just not
able to insert the new code? 

I've tried ignoring white space and even tried a force - of course I backed
everything up prior to this. :)

-
Warmest regards, 

Casey Havenor
--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Patch-file-questions-tp4536031p4540442.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch file questions?

2011-06-30 Thread Andrew Dunstan



On 06/30/2011 07:13 PM, Casey Havenor wrote:

 When I run the patch I'm getting some Hunk fails?  Is this because I'm not
 using the same version that the author intended the patch for?

Yes. Welcome to the world of patch bitrot.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] reducing the overhead of frequent table locks, v4

2011-06-30 Thread Robert Haas
On Thu, Jun 30, 2011 at 6:28 PM, Thom Brown t...@linux.com wrote:
 On 27 June 2011 15:13, Robert Haas robertmh...@gmail.com wrote:
 I didn't get a lot of comments on the previous version of my patch
 to accelerate table locks.

 http://archives.postgresql.org/pgsql-hackers/2011-06/msg00953.php

 Here's a new version anyway.  In this version, I have:

 1. Made pg_locks show fast-path locks as well as regular locks, with a
 new Boolean column to distinguish.
 2. Renamed fpLWLock to backendLock, on the belief that we're going to
 want something like this for quite a bit more than just this one
 patch.
 3. Removed some debugging code.

 Are you able to benchmark this again, and possibly get Stefan to give
 it a go too?

I probably can, but there's not going to be any significant
difference.  The patch hasn't changed much.  Note that the version
where Stefan saw a big performance regression when he pounded the
server was actually a combination of this patch and the fast vxid
locks patch.  If we only apply this one, I believe there will still be
enough lock manager contention to stifle the worst of those effects.

Of course, even if we apply both, there's only going to be a
regression on servers with 32 cores, I think, and even then only when
the number of clients is greater than the number of cores.  Everyone
else will see a benefit.  I think that the next thing for me to tackle
is the SInvalReadLock spinlock contention, but what I'm really hurting
for is some code review.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] time-delayed standbys

2011-06-30 Thread Fujii Masao
On Fri, Jul 1, 2011 at 2:25 AM, Robert Haas robertmh...@gmail.com wrote:
 Some actions aren't even transactional, such as DROP DATABASE, amongst

 Good point.  We'd probably need to add a timestamp to the drop
 database record, as that's a case that people would likely want to
 defend against with this feature.

This means that the recovery_target_* code would also need to deal with
the DROP DATABASE case.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] time-delayed standbys

2011-06-30 Thread Fujii Masao
On Fri, Jul 1, 2011 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jun 30, 2011 at 1:51 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 I think doing anything in PostgreSQL around this beyond allowing
 DBAs to trust their server clocks is insane.  The arguments for
 using and trusting ntpd are pretty much identical to the arguments
 for using and trusting the OS file systems.

 Except that implementing our own file system would likely have more
 benefit and be less work than implementing our own time
 synchronization, at least if we want it to be reliable.

 Again, I am not trying to pretend that this is any great shakes.
 MySQL's version of this feature apparently does somehow compensate for
 time skew, which I assume must mean that their replication works
 differently than ours - inter alia, it probably requires a TCP socket
 connection between the servers.  Since we don't require that, it
 limits our options in this area, but also gives us more options in
 other areas.  Still, if I could think of a way to do this that didn't
 depend on time synchronization, then I'd be in favor of eliminating
 that requirement.  I just can't; and I'm inclined to think it isn't
 possible.

 I wouldn't be opposed to having an option to try to detect time skew
 between the master and the slave and, say, display that information in
 pg_stat_replication.  It might be useful to have that data for
 monitoring purposes, and it probably wouldn't even be that much code.
 However, I'd be a bit hesitant to use that data to correct the
 amount of time we spend waiting for time-delayed replication, because
 it would doubtless be extremely imprecise compared to real time
 synchronization, and considerably more error-prone.  IOW, what you
 said.

I agree with Robert. It's difficult to implement a time-synchronization
feature which can deal with all the cases, and I'm not sure it's really
worth taking our time.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add support for logging current role (what to review?)

2011-06-30 Thread Alvaro Herrera
Excerpts from Stephen Frost's message of jue jun 30 18:35:40 -0400 2011:

  I know of no such list, and I think this field
  useful/important enough that people who are using csv logging would
  want it anyway. +1 on just tacking on the field and causing a flag day
  for csv users.
 
 Honestly, I think it was *me* who raised the issue that we don't have a
 header for CSV logs and that it sucks for people using CSV files.  We've
 changed it in the past (application_name was added, iirc) and there
 wasn't much noise about it that I recall.  If everyone's happy with that,
 it's fine by me.

I don't understand why it is such a big deal that 9.1 has a different CSV
table definition than 9.0 anyway (or any combination of two releases).  As
long as both are clearly and correctly documented in the respective
pages, it shouldn't be an issue at all.  If anyone attempts to load CSV
log files into the wrong table definition, the problem should make
itself evident pretty quickly.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] time-delayed standbys

2011-06-30 Thread Jaime Casanova
Fujii Masao masao.fu...@gmail.com writes:

 On Fri, Jul 1, 2011 at 2:25 AM, Robert Haas robertmh...@gmail.com wrote:
 Some actions aren't even transactional, such as DROP DATABASE, amongst

 Good point.  We'd probably need to add a timestamp to the drop
 database record, as that's a case that people would likely want to
 defend against with this feature.

 This means that the recovery_target_* code would also need to deal with
 the DROP DATABASE case.


there is no problem if you use restore point names... but of course
you lose flexibility (i.e. you can't restore to 5 minutes before now)

mmm... a lazy idea: can't we just create a restore point WAL record
*before* we actually drop the database? then we wouldn't need to modify
the recovery_target_* logic (if DROP DATABASE is the only such case,
maybe that's enough without complicating the code) and we could provide
that protection as of 9.1
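
a minimal sketch of that, assuming 9.1's XLogRestorePoint() (the restore
point name format is just illustrative):

/* in dropdb(), just before the XLOG_DBASE_DROP record is written */
if (XLogIsNeeded())
{
    char        rpname[MAXFNAMELEN];

    snprintf(rpname, sizeof(rpname),
             "before drop of database \"%s\"", dbname);
    (void) XLogRestorePoint(rpname);
}

then a plain recovery_target_name in recovery.conf stops recovery just
short of the damage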

-- 
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL 
Soporte 24x7, desarrollo, capacitación y servicios



Re: [HACKERS] Online base backup from the hot-standby

2011-06-30 Thread Jun Ishiduka

 On this commitfest, the goal of the patch is to be able to be 
 recovered using Minimum recovery ending location in the control file.

Done.

Regards.


Jun Ishizuka
NTT Software Corporation
TEL:045-317-7018
E-Mail: ishizuka@po.ntts.co.jp



standby_online_backup_02.patch
Description: Binary data



Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread Jeff Davis
On Thu, 2011-06-30 at 09:58 -0700, David E. Wheeler wrote:
 On Jun 30, 2011, at 9:29 AM, Jeff Davis wrote:
 
  Right. In that respect, it's more like a record type: many possible
  record types exist, but you only define the ones you want.
 
 Well, okay. How is this same problem handled for RECORD types, then?

What problem, exactly? For a given list of subtypes, there is only one
valid record type.

Also, record is not a great example. The implementation uses at least
one pretty horrible hack.

Regards,
Jeff Davis




Re: [HACKERS] add support for logging current role (what to review?)

2011-06-30 Thread Magnus Hagander
On Fri, Jul 1, 2011 at 03:36, Alvaro Herrera alvhe...@commandprompt.com wrote:
 Excerpts from Stephen Frost's message of jue jun 30 18:35:40 -0400 2011:

  I know of no such list, and I think this field
  useful/important enough that people who are using csv logging would
  want it anyway. +1 on just tacking on the field and causing a flag day
  for csv users.

 Honestly, I think it was *me* who raised the issue that we don't have a
 header for CSV logs and that it sucks for people using CSV files.  We've
 changed it in the past (application_name was added, iirc) and there
 wasn't much noise about it that I recall.  If everyone's happy with that,
 it's fine by me.

 I don't understand why it is so much a deal that 9.1 has a different CSV
 table definition than 9.0 anyway (or any two release combination).  As
 long as both are clearly and correctly documented in the respective
 pages, it shouldn't be an issue at all.  If anyone attempts to load CSV
 log files into the wrong table definition, the problem should make
 itself evident pretty quickly.

I don't see that as a big problem either. A file header would make it
easier for tools to detect, though - either a regular CSV header or
just a row with the version number always logged as the first line of
each file. Making it an actual CSV header has the added benefit of
being a standard mechanism that non-pg-specific tools already know how
to deal with.

If it's a pg-specific tool, it's likely going to require
version-specific information anyway; this is just one piece of it. Now,
if the order and contents of the fields are made entirely configurable,
that basically creates an unlimited number of permutations, and *that*
may make things really hard on tool developers. Without a header, it
makes the tool developers' work completely impossible - and it stays
hard even with one.
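
To make the header idea concrete, here is a rough sketch of what the
syslogger could write each time it opens a fresh csvlog file. The helper
is hypothetical, but the column list is the documented 9.1 csvlog format:

/* hypothetical helper in syslogger.c, called right after a new
 * csvlog segment is opened */
static void
write_csvlog_header(FILE *fh)
{
    fputs("log_time,user_name,database_name,process_id,connection_from,"
          "session_id,session_line_num,command_tag,session_start_time,"
          "virtual_transaction_id,transaction_id,error_severity,"
          "sql_state_code,message,detail,hint,internal_query,"
          "internal_query_pos,context,query,query_pos,location,"
          "application_name\n", fh);
}

A version-bearing first row instead (or as well) would cover the
pg-specific tools; either way the file then announces its own layout.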


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] [ADMIN] PANIC while doing failover (streaming replication)

2011-06-30 Thread Fujii Masao
On Fri, Jul 1, 2011 at 2:18 PM, Mikko Partio mpar...@gmail.com wrote:
 Hello list
 I have a two-machine cluster with PostgreSQL 9.0.4 and streaming
 replication. In a normal situation I did a failover -- touched the trigger
 file on the standby to promote it to production mode. I have done this
 previously without any complications, but this time the to-be-production
 database had a PANIC and shut itself down. From the logs:
 postgres[10751]: [2-1] 2011-06-30 17:25:24 EEST [10751]: [1-1] user=,db=
 LOG:  streaming replication successfully connected to primary
 postgres[10736]: [10-1] 2011-07-01 07:50:29 EEST [10736]: [10-1] user=,db=
 LOG:  trigger file found: /postgresql/data/finish_replication.trigger
 postgres[10751]: [3-1] 2011-07-01 07:50:29 EEST [10751]: [2-1] user=,db=
 FATAL:  terminating walreceiver process due to administrator command
 postgres[10736]: [11-1] 2011-07-01 07:50:30 EEST [10736]: [11-1] user=,db=
 LOG:  redo done at AE/8B1855C8
 postgres[10736]: [12-1] 2011-07-01 07:50:30 EEST [10736]: [12-1] user=,db=
 LOG:  last completed transaction was at log time 2011-07-01
 07:50:14.67424+03
 postgres[10736]: [13-1] 2011-07-01 07:50:30 EEST [10736]: [13-1] user=,db=
 LOG:  restored log file 000600AE008B from archive
 postgres[10736]: [14-1] 2011-07-01 07:50:30 EEST [10736]: [14-1] user=,db=
 LOG:  selected new timeline ID: 7
 postgres[10736]: [15-1] 2011-07-01 07:50:30 EEST [10736]: [15-1] user=,db=
 LOG:  restored log file 0006.history from archive
 postgres[10736]: [16-1] 2011-07-01 07:50:30 EEST [10736]: [16-1] user=,db=
 LOG:  archive recovery complete
 postgres[10736]: [17-1] 2011-07-01 07:50:30 EEST [10736]: [17-1] user=,db=
 WARNING:  page 68975 of relation base/3711580/4398155 was uninitialized
 postgres[10736]: [18-1] 2011-07-01 07:50:30 EEST [10736]: [18-1] user=,db=
 WARNING:  page 782 of relation base/3711580/4395394 was uninitialized
 postgres[10736]: [19-1] 2011-07-01 07:50:30 EEST [10736]: [19-1] user=,db=
 WARNING:  page 68948 of relation base/3711580/4398155 was uninitialized
 postgres[10736]: [20-1] 2011-07-01 07:50:30 EEST [10736]: [20-1] user=,db=
 WARNING:  page 68986 of relation base/3711580/4398155 was uninitialized
 [ there are maybe 50 lines of these WARNING messages for six
 different relations altogether, all indexes ]
 postgres[10736]: [73-1] 2011-07-01 07:50:31 EEST [10736]: [73-1] user=,db=
 PANIC:  WAL contains references to invalid pages
 postgres[10727]: [2-1] 2011-07-01 07:50:31 EEST [10727]: [2-1] user=,db=
 LOG:  startup process (PID 10736) was terminated by signal 6: Aborted
 postgres[10727]: [3-1] 2011-07-01 07:50:31 EEST [10727]: [3-1] user=,db=
 LOG:  terminating any other active server processes
 Then I started postgres again, since it had shut itself down:
 postgres[12191]: [1-1] 2011-07-01 07:50:59 EEST [12191]: [1-1] user=,db=
 LOG:  database system was interrupted while in recovery at log time
 2011-07-01 07:44:02 EEST
 postgres[12191]: [1-2] 2011-07-01 07:50:59 EEST [12191]: [2-1] user=,db=
 HINT:  If this has occurred more than once some data might be corrupted and
 you might need to choose an earlier recovery target.
 postgres[12191]: [2-1] 2011-07-01 07:50:59 EEST [12191]: [3-1] user=,db=
 LOG:  database system was not properly shut down; automatic recovery in
 progress
 postgres[12191]: [3-1] 2011-07-01 07:50:59 EEST [12191]: [4-1] user=,db=
 LOG:  redo starts at AE/8B13E398
 postgres[12191]: [4-1] 2011-07-01 07:50:59 EEST [12191]: [5-1] user=,db=
 LOG:  consistent recovery state reached at AE/8C00
 postgres[12191]: [5-1] 2011-07-01 07:50:59 EEST [12191]: [6-1] user=,db=
 LOG:  unexpected pageaddr AE/6E00 in log file 174, segment 140, offset 0
 postgres[12191]: [6-1] 2011-07-01 07:50:59 EEST [12191]: [7-1] user=,db=
 LOG:  redo done at AE/8B1855C8
 postgres[12191]: [7-1] 2011-07-01 07:50:59 EEST [12191]: [8-1] user=,db=
 LOG:  last completed transaction was at log time 2011-07-01
 07:50:14.67424+03
 postgres[12194]: [1-1] 2011-07-01 07:50:59 EEST [12194]: [1-1] user=,db=
 LOG:  autovacuum launcher started
 postgres[12189]: [1-1] 2011-07-01 07:50:59 EEST [12189]: [1-1] user=,db=
 LOG:  database system is ready to accept connections
 Are these log messages something to worry about?

http://archives.postgresql.org/pgsql-hackers/2008-12/msg01335.php
According to the above thread, this issue seems to have existed for a few
years and remains unsolved. Could you provide a self-contained test case?
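
For anyone hitting this: the PANIC itself comes from the invalid-page
tracking in src/backend/access/transam/xlogutils.c. Pages that WAL
references but that don't exist on disk are remembered in a hash table
during redo, forgiven if a later WAL record drops or truncates the
relation, and fatal if any survive to the end of recovery. Roughly
(paraphrased, not verbatim source):

void
XLogCheckInvalidPages(void)
{
    HASH_SEQ_STATUS status;
    xl_invalid_page *hentry;
    bool        foundone = false;

    if (invalid_page_tab == NULL)
        return;                 /* no invalid pages were logged */

    hash_seq_init(&status, invalid_page_tab);
    while ((hentry = (xl_invalid_page *) hash_seq_search(&status)) != NULL)
    {
        /* the real code emits a WARNING with relation/block details here */
        foundone = true;
    }

    if (foundone)
        elog(PANIC, "WAL contains references to invalid pages");
}

The uninitialized-page WARNINGs in your log are those per-page reports;
the PANIC is the summary verdict.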

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
