Re: [HACKERS] Command Triggers

2012-01-20 Thread Dimitri Fontaine
Robert Haas  writes:
> I think the OID is better than the name, but if it's easy to pass the
> name and schema, then I'm fine with it.  But I do think this is one of

It's quite easy to get the name and schema from the command, yes; here's an
example of how I'm doing it for some commands:

    case T_CreateStmt:
    {
        CreateStmt *node = (CreateStmt *) parsetree;
        cmd->schemaname = RangeVarGetNamespace(node->relation);
        cmd->objectname = node->relation->relname;
        break;
    }

    case T_AlterTableStmt:
    {
        AlterTableStmt *node = (AlterTableStmt *) parsetree;
        cmd->schemaname = RangeVarGetNamespace(node->relation);
        cmd->objectname = node->relation->relname;
        break;
    }

    case T_CreateExtensionStmt:
    {
        cmd->schemaname = NULL;
        cmd->objectname = ((CreateExtensionStmt *) parsetree)->extname;
        break;
    }

Getting the OID, on the other hand, is much harder, because each command
implements that part as it wants to, and the DefineWhatever() functions
return void. We could maybe have them return the main OID of the object
created, or we could make the CommandContext structure I'm using a
backend-global variable that any command would fill in.
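
To illustrate the second option, here is a minimal sketch of what such a
backend-global structure could look like. Only schemaname and objectname
come from the code above; the tag and objectId fields are illustrative
assumptions, not the actual patch:

#include "postgres.h"

/* Hypothetical sketch only, not the patch's actual definition. */
typedef struct CommandContextData
{
    const char *tag;            /* command tag, e.g. "CREATE TABLE" */
    char       *schemaname;     /* NULL when not schema-qualified */
    char       *objectname;
    Oid         objectId;       /* InvalidOid until the command resolves it */
} CommandContextData;

/* one backend-global instance that each supported command would fill in */
static CommandContextData currentCommand;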

In any case, adding support for the OID only works for AFTER triggers
when talking about CREATE commands and for BEFORE triggers when talking
about DROP commands, assuming that the trigger procedure is run after
we've located said OID.

It seems to me to be a couple orders of magnitude more work to get the
OID here compared to just getting the schemaname and objectname. And
getting those works in all cases, as we take them from the command itself
(we fill in the schema with the first search_path entry when it's not
given explicitly in the command).

 CREATE TABLE foo();
 NOTICE:  tag:  CREATE TABLE
 NOTICE:  enforce_local_style{public.foo}: foo

> those problems that is complex enough that we should be happy to get
> the core infrastructure in in the first commit, even with an extremely
> limited amount of functionality, and then build on it.
> Enhance-and-extend is so much easier than a monolithic code drop.

I'm happy to read that, and I'm preparing next patch version (v6) with
that goal in mind.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Remembering bug #6123

2012-01-20 Thread Kevin Grittner
"Kevin Grittner"  wrote:
> Tom Lane  wrote:
 
>> Well, the bottom line that's concerning me here is whether
>> throwing errors is going to push anyone's application into an
>> unfixable corner.  I'm somewhat encouraged that your Circuit
>> Courts software can adapt to it, since that's certainly one of
>> the larger and more complex applications out there. Or at least I
>> would be if you had actually verified that the CC code was okay
>> with the recently-proposed patch versions. Do you have any
>> thorough tests you can run against whatever we end up with?
 
> To test the new version of this patch, we would need to pick an
> application release, and use the patch through the development,
> testing, and staging cycles.  We would need to look for all
> triggers needing adjustment, and make the necessary changes.  We
> would need to figure out which triggers were important to cover,
> and ensure that testing covered all of them.
> 
> Given the discussions with my new manager this past week, I'm
> pretty sure we can work this into a release that would complete
> testing and hit pilot deployment in something like three months,
> give or take a little.  I can't actually make any promises on that
> until I talk to her next week.
 
After a couple meetings, I have approval to get this into an
application release currently in development.  Assuming that your
patch from the 13th is good for doing the testing, I think I can
post test results in about three weeks.  I'll also work on a
follow-on patch to add a couple of paragraphs and an example of the issue
to the docs by then.
 
-Kevin



Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Garick Hamlin
On Fri, Jan 20, 2012 at 09:12:13AM -0800, David E. Wheeler wrote:
> On Jan 19, 2012, at 9:07 PM, Tom Lane wrote:
> 
> > If his client encoding is UTF8, the value will be letter-perfect JSON
> > when it gets to him; and if his client encoding is not UTF8, then he's
> > already pretty much decided that he doesn't give a fig about the
> > Unicode-centricity of the JSON spec, no?
> 
> Don’t entirely agree with this. Some folks are stuck with other encodings and
> cannot change them for one reason or another. That said, they can convert
> JSON from their required encoding into UTF-8 on the client side, so there is
> a workaround.

Perhaps in addition to trying to just 'do the right thing by default',
it makes sense to have two canonicalization functions?

Say: json_utf8() and json_ascii().

They could give the same output no matter what encoding was set? 

json_utf8() would give nice output where characters were canonicalized to
native UTF-8 characters, and json_ascii() would output only non-control
ASCII characters literally and escape everything else, or something
like that?

Garick



Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Andrew Dunstan



On 01/20/2012 11:58 AM, Robert Haas wrote:

On Fri, Jan 20, 2012 at 10:45 AM, Andrew Dunstan  wrote:

XML's &#nnnn; escape mechanism is more or less the equivalent of JSON's
\unnnn. But XML documents can be encoded in a variety of encodings,
including non-unicode encodings such as Latin-1. However, no matter what the
document encoding, &#nnnn; designates the character with Unicode code point
nnnn, whether or not that is part of the document encoding's charset.

OK.


Given that precedent, I'm wondering if we do need to enforce anything other
than that it is a valid unicode code point.

Equivalence comparison is going to be difficult anyway if you're not
resolving all \u escapes. Possibly we need some sort of canonicalization
function to apply for comparison purposes. But we're not providing any
comparison ops today anyway, so I don't think we need to make that decision
now. As you say, there doesn't seem to be any defined canonical form - the
spec is a bit light on in this respect.

Well, we clearly have to resolve all \uXXXX escapes to do either comparison
or canonicalization.  The current patch does neither, but presumably we
want to leave the door open to such things.  If we're using UTF-8 and
comparing two strings, and we get to a position where one of them has
a character and the other has \uXXXX, it's pretty simple to do the
comparison: we just turn \uXXXX into a wchar_t and test for equality.
That should be trivial, unless I'm misunderstanding.  If, however,
we're not using UTF-8, we have to first turn \uXXXX into a Unicode
code point, then convert that to a character in the database encoding,
and then test for equality with the other character after that.  I'm
not sure whether that's possible in general, how to do it, or how
efficient it is.  Can you or anyone shed any light on that topic?



We know perfectly well how to turn two strings from encoding x to UTF-8 
(see mbutils.c::pg_do_encoding_conversion()).  Once we've done that, 
ISTM we have reduced this to the previous problem, as the mathematicians 
like to say.
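
As a minimal sketch of that reduction, assuming escape resolution happens
elsewhere (the function name here is made up, and note that the conversion
will still throw an error on unconvertible input, which is the concern
Robert raised):

#include "postgres.h"
#include "mb/pg_wchar.h"

/* Sketch only: compare two strings by converting both to UTF-8 first. */
static bool
json_text_equal(const char *a, const char *b)
{
    int         db_encoding = GetDatabaseEncoding();
    char       *a_utf8;
    char       *b_utf8;

    a_utf8 = (char *) pg_do_encoding_conversion((unsigned char *) a, strlen(a),
                                                db_encoding, PG_UTF8);
    b_utf8 = (char *) pg_do_encoding_conversion((unsigned char *) b, strlen(b),
                                                db_encoding, PG_UTF8);

    return strcmp(a_utf8, b_utf8) == 0;
}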



cheers

andrew



Re: [HACKERS] Vacuum rate limit in KBps

2012-01-20 Thread Greg Smith

On 01/20/2012 10:37 AM, Robert Haas wrote:

On Thu, Jan 19, 2012 at 11:29 PM, Greg Smith  wrote:

vacuum_cost_page_hit = 0.1
vacuum_cost_page_miss = 1.0
vacuum_cost_page_dirty = 2.0

Now add in the new setting, which is explicitly said to be the read value:

vacuum_cost_read_limit = 8000 # maximum page miss read rate in
kilobytes/second

That may be a little better, but I still don't think it's worth
breaking backward compatibility for.  I mean, suppose I don't care
about read rate, but I want to limit my dirty data rate to 1MB/s.
What parameters should I set?


vacuum_cost_page_dirty = 8.0

The resulting maximum rates will then be:

hit = 80MB/s
miss = 8MB/s
dirty = 1MB/s
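
(The arithmetic behind those numbers is just the read limit divided by each
page cost: 8000 kB/s over 0.1, 1.0 and 8.0 gives 80MB/s, 8MB/s and 1MB/s
respectively.)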

The question you should ask yourself next is "how do I limit my dirty data
rate to 1MB/s in 9.1?"  Working that out by hand is a good exercise, to show
just how much less complicated this proposal is than the current state of
things.  Show me how it's possible to do that in a way we can expect new DBAs
to follow, and then the idea of keeping strong backwards compatibility here
would have some weight.  I see sticking too closely to the current scheme as
more bug-level compatibility; it's fundamentally broken, by being too
difficult for most people to use in its current form.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Removing freelist (was Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?)

2012-01-20 Thread Heikki Linnakangas

On 04.01.2012 13:14, Simon Riggs wrote:

On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane  wrote:

Jim Nasby  writes:

On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:

This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.



Couldn't we just leave the buffers alone? Once an index is dropped and that's 
pushed out through the catalog then nothing should be trying to access them and 
they'll eventually just get aged out.


No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file.  It's
important that we kill the buffers immediately during relation drop.

I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist.


My patch puts things on the freelist only when it is free to do so.
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.


So, you're proposing that we remove freelist altogether? Sounds 
reasonable, but that needs to be performance tested somehow. I'm not 
sure what exactly the test should look like, but it probably should 
involve an OLTP workload, and large tables being created, populated to 
fill the cache with pages from the table, and dropped, while the OLTP 
workload is running. I'd imagine that the worst case scenario looks 
something like that.


Removing the freelist should speed up the drop, but the question is how 
costly are the extra clock sweeps that the backends have to do.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 12:14 PM, David E. Wheeler  wrote:
> On Jan 20, 2012, at 8:58 AM, Robert Haas wrote:
>
>> If, however,
>> we're not using UTF-8, we have to first turn \uXXXX into a Unicode
>> code point, then convert that to a character in the database encoding,
>> and then test for equality with the other character after that.  I'm
>> not sure whether that's possible in general, how to do it, or how
>> efficient it is.  Can you or anyone shed any light on that topic?
>
> If it’s like the XML example, it should always represent a Unicode code 
> point, and *not* be converted to the other character set, no?

Well, you can pick which way you want to do the conversion.  If the
database encoding is SJIS, and there's an SJIS character in a string
that gets passed to json_in(), and there's another string which also
gets passed to json_in() which contains \uXXXX, then any sort of
canonicalization or equality testing is going to need to convert the
SJIS character to a Unicode code point, or the Unicode code point to
an SJIS character, to see whether they match.

Err, actually, now that I think about it, that might be a problem:
what happens if we're trying to test two characters for equality and
the encoding conversion fails?  We really just want to return false -
the strings are clearly not equal if either contains even one
character that can't be converted to the other encoding - so it's not
good if an error gets thrown in there anywhere.

> At any rate, since the JSON standard requires UTF-8, such distinctions having 
> to do with alternate encodings are not likely to be covered, so I suspect we 
> can do whatever we want here. It’s outside the spec.

I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Command Triggers

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 12:14 PM, Dimitri Fontaine
 wrote:
> Robert Haas  writes:
>> I think the OID is better than the name, but if it's easy to pass the
>> name and schema, then I'm fine with it.  But I do think this is one of
>
> It's quite easy to get name and schema from the command yes, here's an
> example of how I'm doing it for some commands:
>
>                case T_CreateStmt:
>                {
>                        CreateStmt *node = (CreateStmt *)parsetree;
>                        cmd->schemaname = RangeVarGetNamespace(node->relation);
>                        cmd->objectname = node->relation->relname;
>                        break;
>                }
>
>                case T_AlterTableStmt:
>                {
>                        AlterTableStmt *node = (AlterTableStmt *)parsetree;
>                        cmd->schemaname = RangeVarGetNamespace(node->relation);
>                        cmd->objectname = node->relation->relname;
>                        break;
>                }
>
>                case T_CreateExtensionStmt:
>                {
>                        cmd->schemaname = NULL;
>                        cmd->objectname = ((CreateExtensionStmt 
> *)parsetree)->extname;
>                        break;
>                }
>
> Getting the OID on the other hand is much harder, because each command
> implements that part as it wants to, and DefineWhatever() functions are
> returning void. We could maybe have them return the main Oid of the
> object created, or we could have the CommandContext structure I'm using
> be a backend global variable that any command would stuff.
>
> In any case, adding support for the OID only works for after trigger
> when talking about CREATE commands and for before trigger if talking
> about DROP commands, assuming that the trigger procedure is run after
> we've been locating said Oid.
>
> It seems to me to be a couple orders of magnitude more work to get the
> Oid here compared to just get the schemaname and objectname. And getting
> those works in all cases as we take them from the command itself (we
> fill in the schema with the first search_path entry when it's not given
> explicitly in the command)
>
>  CREATE TABLE foo();
>  NOTICE:  tag:  CREATE TABLE
>  NOTICE:  enforce_local_style{public.foo}: foo

Hmm, OK.  But what happens if the user doesn't specify a schema name explicitly?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Vacuum rate limit in KBps

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 12:35 PM, Greg Smith  wrote:
> On 01/20/2012 10:37 AM, Robert Haas wrote:
>>
>> On Thu, Jan 19, 2012 at 11:29 PM, Greg Smith  wrote:
>>>
>>> vacuum_cost_page_hit = 0.1
>>>
>>> vacuum_cost_page_miss = 1.0
>>> vacuum_cost_page_dirty = 2.0
>>>
>>> Now add in the new setting, which is explicitly said to be the read
>>> value:
>>>
>>> vacuum_cost_read_limit = 8000 # maximum page miss read rate in
>>> kilobytes/second
>>
>> That may be a little better, but I still don't think it's worth
>>
>> breaking backward compatibility for.  I mean, suppose I don't care
>> about read rate, but I want to limit my dirty data rate to 1MB/s.
>> What parameters should I set?
>
> vacuum_cost_page_dirty = 8.0
>
> The resulting maximum rates will then be:
>
> hit = 80MB/s
> miss = 8MB/s
> dirty = 1MB/s

That is, of course, not quite what I asked for.  In fact it's likely
that my actual rate will be less than 1MB/s, because there will
probably be a miss for every dirty page (each dirtied page also incurs
the miss cost, so 9 cost units per page rather than 8).  It will
probably be about 8/9ths of a MB/s.

> The question you should ask yourself next is "how do I limit my dirty data
> rate to 1MB/s in 9.1?"  Working that out by hand is a good exercise, to show
> just how much less complicated this proposal is over the current state of
> things.

OK, sure.  Our block size is 8kB, so we need every 128 blocks to
involve 1000 ms of delay.   Obviously there are many combinations of
parameters that will make that work, but here's one: delay for 125
milliseconds after each 16 blocks:

vacuum_cost_page_hit = 0
vacuum_cost_page_miss = 0
vacuum_cost_page_dirty = 1
vacuum_cost_limit = 16
autovacuum_vacuum_cost_delay = 125ms

Maybe that strikes you as worse than what you're proposing; it strikes
me as better.  Either way I think it's not going to be a good day for
people who are bad at math.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Inline Extension

2012-01-20 Thread Dimitri Fontaine
Robert Haas  writes:
>> Maybe you would want the system to be able to determine the oldest
>> version to start from to reach the current default_version given in the
>> control file, but I guess it would be better to add another property
>> like default_full_version or such (last_stop?).
>
> Possibly that might be a better design, yes.  Especially since we
> don't order version numbers intrinsically.

It's quite simple to implement too, see attached.  As for the version
number ordering, that's a giant rat hole I don't want to be digging in,
and I think we can bypass that problem entirely.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

diff --git a/contrib/hstore/Makefile b/contrib/hstore/Makefile
index e9e5e53..2d624d3 100644
--- a/contrib/hstore/Makefile
+++ b/contrib/hstore/Makefile
@@ -5,7 +5,7 @@ OBJS = hstore_io.o hstore_op.o hstore_gist.o hstore_gin.o hstore_compat.o \
 	crc32.o
 
 EXTENSION = hstore
-DATA = hstore--1.0.sql hstore--1.1.sql hstore--1.0--1.1.sql \
+DATA = hstore--1.0.sql hstore--1.1.sql.orig hstore--1.0--1.1.sql \
 	hstore--unpackaged--1.0.sql
 
 REGRESS = hstore
diff --git a/contrib/hstore/hstore--1.1.sql b/contrib/hstore/hstore--1.1.sql
deleted file mode 100644
index e95ad32..000
--- a/contrib/hstore/hstore--1.1.sql
+++ /dev/null
@@ -1,524 +0,0 @@
-/* contrib/hstore/hstore--1.1.sql */
-
--- complain if script is sourced in psql, rather than via CREATE EXTENSION
-\echo Use "CREATE EXTENSION hstore" to load this file. \quit
-
-CREATE TYPE hstore;
-
-CREATE FUNCTION hstore_in(cstring)
-RETURNS hstore
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_out(hstore)
-RETURNS cstring
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_recv(internal)
-RETURNS hstore
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_send(hstore)
-RETURNS bytea
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE TYPE hstore (
-INTERNALLENGTH = -1,
-INPUT = hstore_in,
-OUTPUT = hstore_out,
-RECEIVE = hstore_recv,
-SEND = hstore_send,
-STORAGE = extended
-);
-
-CREATE FUNCTION hstore_version_diag(hstore)
-RETURNS integer
-AS 'MODULE_PATHNAME','hstore_version_diag'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION fetchval(hstore,text)
-RETURNS text
-AS 'MODULE_PATHNAME','hstore_fetchval'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR -> (
-	LEFTARG = hstore,
-	RIGHTARG = text,
-	PROCEDURE = fetchval
-);
-
-CREATE FUNCTION slice_array(hstore,text[])
-RETURNS text[]
-AS 'MODULE_PATHNAME','hstore_slice_to_array'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR -> (
-	LEFTARG = hstore,
-	RIGHTARG = text[],
-	PROCEDURE = slice_array
-);
-
-CREATE FUNCTION slice(hstore,text[])
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_slice_to_hstore'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION isexists(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION exist(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR ? (
-	LEFTARG = hstore,
-	RIGHTARG = text,
-	PROCEDURE = exist,
-	RESTRICT = contsel,
-	JOIN = contjoinsel
-);
-
-CREATE FUNCTION exists_any(hstore,text[])
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists_any'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR ?| (
-	LEFTARG = hstore,
-	RIGHTARG = text[],
-	PROCEDURE = exists_any,
-	RESTRICT = contsel,
-	JOIN = contjoinsel
-);
-
-CREATE FUNCTION exists_all(hstore,text[])
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists_all'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR ?& (
-	LEFTARG = hstore,
-	RIGHTARG = text[],
-	PROCEDURE = exists_all,
-	RESTRICT = contsel,
-	JOIN = contjoinsel
-);
-
-CREATE FUNCTION isdefined(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_defined'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION defined(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_defined'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION delete(hstore,text)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_delete'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION delete(hstore,text[])
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_delete_array'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION delete(hstore,hstore)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_delete_hstore'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR - (
-	LEFTARG = hstore,
-	RIGHTARG = text,
-	PROCEDURE = delete
-);
-
-CREATE OPERATOR - (
-	LEFTARG = hstore,
-	RIGHTARG = text[],
-	PROCEDURE = delete
-);
-
-CREATE OPERATOR - (
-	LEFTARG = hstore,
-	RIGHTARG = hstore,
-	PROCEDURE = delete
-);
-
-CREATE FUNCTION hs_concat(hstore,hstore)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_concat'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR || (
-	LEFTARG = hstore,
-	RIGHTARG = hstore,
-	PROCEDURE = hs_concat
-);
-
-CREATE FUNCTION hs_contains(hstore,hstore)
-RETURNS bool
-AS 

Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Tom Lane
Robert Haas  writes:
> Err, actually, now that I think about it, that might be a problem:
> what happens if we're trying to test two characters for equality and
> the encoding conversion fails?

This is surely all entirely doable given the encoding infrastructure
we already have.  We might need some minor refactoring, eg to have
a way of not throwing an error, but it's not going to be that hard
to achieve if somebody wants to do it.  So I still see little reason
for making the JSON type behave visibly differently in non-UTF8 database
encodings.

regards, tom lane



Re: [HACKERS] Command Triggers

2012-01-20 Thread Tom Lane
Robert Haas  writes:
> Hmm, OK.  But what happens if the user doesn't specify a schema name 
> explicitly?

For the creation case, RangeVarGetCreationNamespace should handle that.
The code Dimitri quoted is wrong, but not that hard to fix.

Unfortunately, the code he quoted for the ALTER case is also wrong,
and harder to fix.  Until you've done the lookup you don't know which
schema the referenced object is in.  And I don't care for the idea of
doing the lookup twice, as (a) it'll be slower and (b) there are race
cases where it will give the wrong answer, ie return an object other
than the one the ALTER eventually acts on.

Really I think there is not any single point where you can put the
command-trigger hook and be done.  In almost every case, the right
place is going to be buried somewhere within the execution of the
command.

regards, tom lane



Re: [HACKERS] Command Triggers

2012-01-20 Thread Dimitri Fontaine
Tom Lane  writes:
> For the creation case, RangeVarGetCreationNamespace should handle that.
> The code Dimitri quoted is wrong, but not that hard to fix.

Ok.

> Unfortunately, the code he quoted for the ALTER case is also wrong,
> and harder to fix.  Until you've done the lookup you don't know which
> schema the referenced object is in.  And I don't care for the idea of
> doing the lookup twice, as (a) it'll be slower and (b) there are race
> cases where it will give the wrong answer, ie return an object other
> than the one the ALTER eventually acts on.

Oh. Yes.

> Really I think there is not any single point where you can put the
> command-trigger hook and be done.  In almost every case, the right
> place is going to be buried somewhere within the execution of the
> command.

I'm finally realizing it. I already introduced a structure called
CommandContextData to keep track of the current command elements we want
to pass down to command triggers, I think we're saying that this should
be a static variable that commands will need to be editing.

The main problem with that approach is that we will need to spread
calling the command trigger entry points to every supported command, I
wanted to avoid that first. It's no big deal though, as the API is
simple enough.

Expect a new patch made this way early next week.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Checkpoint sync pause

2012-01-20 Thread Robert Haas
On Mon, Jan 16, 2012 at 8:59 PM, Greg Smith  wrote:
> [ interesting description of problem scenario and necessary conditions for 
> reproducing it ]

This is about what I thought was happening, but I'm still not quite
sure how to recreate it in the lab.

Have you had a chance to test whether Linux 3.2 does any better in this
area?  As I understand it, it doesn't do anything particularly
interesting about the willingness of the kernel to cache gigantic
amounts of dirty data, but (1) supposedly it does a better job not
yanking the disk head around by just putting foreground processes to
sleep while writes happen in the background, rather than having the
foreground processes compete with the background writer for control of
the disk head; and (2) instead of having a sharp edge where background
writing kicks in, it tries to gradually ratchet up the pressure to get
things written out.

Somehow I can't shake the feeling that this is fundamentally a Linux
problem, and that it's going to be nearly impossible to work around in
user space without some help from the kernel.  I guess in some sense
it's reasonable that calling fsync() blasts the data at the platter at
top speed, but if that leads to starving everyone else on the system
then it starts to seem a lot less reasonable: part of the kernel's job
is to guarantee all processes fair access to shared resources, and if
it doesn't do that, we're always going to be playing catch-up.

>> Just one random thought: I wonder if it would make sense to cap the
>> delay after each sync to the time spending performing that sync.  That
>> would make the tuning of the delay less sensitive to the total number
>> of files, because we won't unnecessarily wait after each sync when
>> they're not actually taking any time to complete.
>
> This is one of the attractive ideas in this area that didn't work out so
> well when tested.  The problem is that writes into a battery-backed write
> cache will show zero latency for some time until the cache is filled...and
> then you're done.  You have to pause anyway, even though it seems write
> speed is massive, to give the cache some time to drain to disk between syncs
> that push data toward it.  Even though it absorbed your previous write with
> no delay, that doesn't mean it takes no time to process that write.  With
> proper write caching, that processing is just happening asynchronously.

Hmm, OK.  Well, to borrow a page from one of your other ideas, how
about keeping track of the number of fsync requests queued for each
file, and making the delay proportional to that number?  We might have
written the same block more than once, so it could be an overestimate,
but it rubs me the wrong way to think that a checkpoint is going to
finish late because somebody ran a CREATE TABLE statement that touched
5 or 6 catalogs, and now we've got to pause for 15-18 seconds because
they've each got one dirty block.  :-(
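
One way to read that, as a rough sketch only (the function and parameter
names are hypothetical, and the inputs are assumed to come from the existing
fsync request bookkeeping):

/*
 * Hypothetical: instead of a fixed pause after every fsync, give each file
 * a share of the total pause budget proportional to how many fsync requests
 * were queued for it.
 */
static int
sync_pause_msec(int requests_for_file, int total_requests, int total_pause_msec)
{
    if (total_requests <= 0 || requests_for_file <= 0)
        return 0;
    return (int) ((double) total_pause_msec * requests_for_file / total_requests);
}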

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 2:21 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Err, actually, now that I think about it, that might be a problem:
>> what happens if we're trying to test two characters for equality and
>> the encoding conversion fails?
>
> This is surely all entirely doable given the encoding infrastructure
> we already have.  We might need some minor refactoring, eg to have
> a way of not throwing an error, but it's not going to be that hard
> to achieve if somebody wants to do it.  So I still see little reason
> for making the JSON type behave visibly differently in non-UTF8 database
> encodings.

OK.  It feels a little grotty to me, but I'll go with the flow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Remembering bug #6123

2012-01-20 Thread Tom Lane
"Kevin Grittner"  writes:
> After a couple meetings, I have approval to get this into an
> application release currently in development.  Assuming that your
> patch from the 13th is good for doing the testing, I think I can
> post test results in about three weeks.  I'll also work on a
> follow-on patch to add a couple of paragraphs and an example of the issue
> to the docs by then.

Well, the issues about wording of the error message are obviously just
cosmetic, but I think we'd better do whatever we intend to do to the
callers of heap_lock_tuple before putting the patch through testing.
I'm inclined to go ahead and make them throw errors, just to see if
that has any effects we don't like.

I'm up to my elbows in planner guts at the moment, but will try to
fix up the patch this weekend if you want.

regards, tom lane



Re: [HACKERS] Remembering bug #6123

2012-01-20 Thread Kevin Grittner
Tom Lane  wrote:
 
> I'm up to my elbows in planner guts at the moment, but will try to
> fix up the patch this weekend if you want.
 
They have scheduled testers to check on this issue next week, so it
would be great to get as close as we can on the stuff that matters.
 
-Kevin



Re: [HACKERS] Inline Extension

2012-01-20 Thread Daniel Farina
On Thu, Jan 19, 2012 at 8:15 AM, Tom Lane  wrote:
>> Heikki Linnakangas  writes:
>>> Frankly I don't see the point of this. If the extension is an independent
>>> piece of (SQL) code, developed separately from an application, with its own
>>> lifecycle, a .sql file seems like the best way to distribute it. If it's
>>> not, ie. if it's an integral part of the database schema, then why package
>>> it as an extension in the first place?
> I'm with Heikki on not believing that this is a good idea.  If you are
> trying to do careful versioning of a set of object definitions, you want
> to stick the things in a file, you don't want them just flying by in
> submitted SQL.  Also, a large part of the point of the extension
> facility is to be able to do uninstall/reinstall and version
> upgrades/downgrades, none of which are possible unless the extension
> scripts are stored somewhere.
>
> ISTM your distribution concern would be much better addressed by
> installing contrib/adminpack and then just using pg_file_write()
> to put the new extension script into the remote server's library.

I think this is somewhat rube-goldberg-esque, and denies non-superuser
roles the ability to get more version management of schema and
operators.  As-is many organizations are submitting "migrations" via
plain SQL that include committing to a version management table that
is maintained by convention, and as-is that is considered a modern-day
best-practice.

Unless someone comes up with something fundamentally new in how to
handle the problem being solved in user-land via 'migrations' and
version tables in Postgres, I think tying completely safe, sandboxed,
non-superuser operators to super-user-only (and awkward besides) file
management via FEBE on the server-side is not going to do anything but
redelegate the problem of soundness to every migration/change
management regime, and decrease reuse of useful operators that do not
require superuser.

The ship has sailed.  Encouraging use of files and .sql buys no
soundness, because everyone is moving towards overlaying version
management via pure FEBE anyway.  At best, this means to me that
Postgres is completely neutral about what version management regime
you want to use for operators and schemas.  At worst, it means
Postgres is frustratingly unhelpful in common use cases.  And somewhere
in the middle is "this may have value, but not enough to be worth
maintaining."  Given the lack of fast and widely deployed trusted PL
integrations, my interpretation of the truth falls somewhere between
the latter two.

-- 
fdr



Re: [HACKERS] Group commit, revised

2012-01-20 Thread Heikki Linnakangas
I spent some time cleaning this up. Details below, but here are the 
highlights:


* Reverted the removal of wal_writer_delay
* Doesn't rely on WAL writer. Any process can act as the "leader" now.
* Removed heuristic on big flushes
* Uses PGSemaphoreLock/Unlock instead of latches

On 20.01.2012 17:30, Robert Haas wrote:

On Thu, Jan 19, 2012 at 10:46 PM, Peter Geoghegan  wrote:

+   if (delta>  XLOG_SEG_SIZE * CheckPointSegments ||
+   !ProcGlobal->groupCommitAvailable)

That seems like a gigantic value.  I would think that we'd want to
forget about group commit any time we're behind by more than one
segment (maybe less).


I'm sure that you're right - I myself described it as horrible in my
original mail. I think that whatever value we set this to ought to be
well reasoned. Right now, your magic number doesn't seem much better
than my bogus heuristic (which only ever served as a placeholder
implementation that hinted at a well-principled formula). What's your
basis for suggesting that that limit would always be close to optimal?


It's probably not - I suspect even a single WAL segment is still too
large.  I'm pretty sure that it would be safe to always flush up to
the next block boundary, because we always write the whole block
anyway even if it's only partially filled.  So there's no possible
regression there.  Anything larger than "the end of the current 8kB
block" is going to be a trade-off between latency and throughput, and
it seems to me that there's probably only one way to figure out what's
reasonable: test a bunch of different values and see which ones
perform well in practice.  Maybe we could answer part of the question
by writing a test program which does repeated sequential writes of
increasingly-large chunks of data.  We might find out for example that
64kB is basically the same as 8kB because most of the overhead is in
the system call anyway, but beyond that the curve kinks.  Or it may be
that 16kB is already twice as slow as 8kB, or that you can go up to
1MB without a problem.  I don't see a way to know that without testing
it on a couple different pieces of hardware and seeing what happens.


I ripped away that heuristic for a flush that's "too large". After 
pondering it for a while, I came to the conclusion that as implemented 
in the patch, it was pointless. The thing is, if the big flush doesn't 
go through the group commit machinery, it's going to grab the 
WALWriteLock straight away. Before any smaller commits can proceed, they 
will need to grab that lock anyway, so the effect is the same as if the 
big flush had just joined the queue.


Maybe we should have a heuristic to split a large flush into smaller 
chunks. The WAL segment boundary would be a quite natural split point, 
for example, because when crossing the file boundary you have to issue 
separate fsync()s for the files anyway. But I'm happy with leaving that 
out for now, it's not any worse than it used to be without group commit 
anyway.



Ugh.  Our current system doesn't even require this code to be
interruptible, I think: you can't receive cancel or die interrupts
either while waiting for an LWLock or while holding it.


Right. Or within HOLD/RESUME_INTERRUPTS blocks.

The patch added some CHECK_FOR_INTERRUPTS() calls into various places in 
the commit/abort codepaths, but that's all dead code; they're all within 
HOLD/RESUME_INTERRUPTS blocks.


I replaced the usage of latches with the more traditional 
PGSemaphoreLock/Unlock. The semaphore model works just fine in this case, 
where we have an lwlock to guard the wait queue, and when a process is 
waiting we know it will be woken up, or something is messed up at a pretty 
low level. We don't need a timeout or to wake up at other signals while 
waiting. Furthermore, the WAL writer didn't have a latch before this 
patch; it's not many lines of code to initialize the latch and set up 
the signal handler for it, but it already has a semaphore that's ready 
to use.


I wonder if we should rename the file into "xlogflush.c" or something 
like that, to make it clear that this works with any XLOG flushes, not 
just commits? Group commit is the usual term for this feature, so we 
should definitely mention that in the comments, but it might be better 
to rename the files/functions to emphasize that this is about WAL 
flushing in general.


This probably needs some further cleanup to fix things I've broken, and 
I haven't done any performance testing, but it's progress. Do you have a 
shell script or something that you used for the performance tests that I 
could run? Or would you like to re-run the tests you did earlier with 
this patch?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 1760,1809  SET ENABLE_SEQSCAN TO OFF;

   
  
-  
-   commit_delay (integer)
-   
-commit_delay configuration parameter
-   
-   
-

Re: [HACKERS] Inline Extension

2012-01-20 Thread Heikki Linnakangas

On 21.01.2012 00:00, Daniel Farina wrote:

I think this is somewhat rube-goldberg-esque, and denies non-superuser
roles the ability to get more version management of schema and
operators.  As-is many organizations are submitting "migrations" via
plain SQL that include committing to a version management table that
is maintained by convention, and as-is that is considered a modern-day
best-practice.


Even if you give the version number in the CREATE EXTENSION command, 
it's by convention that people actually maintain a sane versioning 
policy. If people don't take version management seriously, you will 
quickly end up with five different versions of an extension, all with 
version number 0.1.


Another approach is to use comments on the objects saying "version 
1.23". Those generally move together with the objects themselves; they 
are included in pg_dump schema-only dump, for example, while the 
contents of a table are not.



The ship has sailed.  Encouraging use of files and .sql buys no
soundness, because everyone is moving towards overlaying version
management via pure FEBE anyway.


What is FEBE?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] gistVacuumUpdate

2012-01-20 Thread Heikki Linnakangas

On 18.01.2012 23:38, YAMAMOTO Takashi wrote:

i'm wondering because what gistVacuumUpdate used to do does not seem to
be necessarily tied to the old-style VACUUM FULL.
currently, no one will re-union keys after tuple removals, right?


Right. I believe gistVacuumUpdate needed an AccessExclusiveLock, so now 
that VACUUM FULL is gone, there is no good moment to do it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Inline Extension

2012-01-20 Thread Daniel Farina
On Fri, Jan 20, 2012 at 2:48 PM, Heikki Linnakangas
 wrote:
> Even if you give the version number in the CREATE EXTENSION command, it's by
> convention that people actually maintain a sane versioning policy. If people
> don't take version management seriously, you will quickly end up with five
> different versions of an extension, all with version number 0.1.

Projects are taking it seriously, and invest a lot of effort in it.
There is no shortage of schema versioning frameworks, of varying
levels of maturity, but some are quite complete by the standards of
their users.  However, there is little knowledge shared between them,
and no database gives them much support, so idiosyncrasy becomes
inevitable.

Unless one makes loading regular, unpackaged operators via SQL
impossible (which, on the face of it, doesn't sound like a good
idea to me), the only people who will bother with CREATE EXTENSION
with inline content will be those who care and want to maintain
version control.

I stick to my original assessment: the ship has sailed, or perhaps,
more accurately, is sailing.  Perhaps one could gain wisdom by
thinking of this problem differently: "I'm going to write a migration
tool to help developers in my framework or workflow deal with change,
and I really would like it if my database helped me by".

I don't think we should fool ourselves that we'll be letting people
shoot themselves in the foot with regard to versioning if we give them
versions.  People have been painfully and assiduously trying to avoid
version problems for some time now, with no help from the database,
and they do it entirely via pure SQL statements from Postgres' point
of view.

> Another approach is to use comments on the objects saying "version 1.23".
> Those generally move together with the objects themselves; they are included
> in pg_dump schema-only dump, for example, while the contents of a table are
> not.

Idiosyncratic, but perhaps useful for some people.  Again, workarounds
are possible, and per my previous statements, even pervasive.  That
doesn't make it a desirable thing to avoid assisting with.

> What is FEBE?

FrontEnd BackEnd Protocol, I thought? The one libpq speaks. Is there a
better name?

Here are some reasons not to do this feature, in my mind:

* Is the world ready yet to think about versioning operators in a
database the same way software is versioned?

* Related to that: If the feature is written, it will have to be
supported.  Does the feature have enough impact at this time?

* Do we know how to design the feature correctly so it will attract
use, especially as a Postgres-ism, which is related in some part to
the decaying importance (my perception) of 'database agnostic'

* Are the awkward user-land versioning strategies good enough? Would
it be better to focus elsewhere where things are even *more* awkward?

My assessment is: I know some people who would use this feature, but a
broad swathe are not loading too many operators into their database.
It's hard to know if that's because packaging operators in databases
has been so abysmally bad in the industry at large, or because the
choice of sandboxed languages is just not appealing for, say, a
Python or Ruby developer to write things that are not a series of SQL
statements, ala pgsql, and a lot of that (sans triggers) can be done
over the wire anyway.  Maybe embedded Javascript can help with this,
but it's not the Here and Now.

If I had to invest in complexity with regard to versioning, I'd
rather invest in Postgres being able to tell me a hashed version of
all or a controllable subset of the tables, types, and attributes in the
database, so as to quickly pick out drift between a freshly created
database (produced by the migrations on-disk against development) and
any cruft that sneaks into a production server that is thought to be
the same as the others.  That would have huge impact with common use
cases at this very moment in time, so it provides a reasonable
backdrop to evaluate the cost/impact of the proposed feature.

-- 
fdr



Re: [HACKERS] gistVacuumUpdate

2012-01-20 Thread Jaime Casanova
On Wed, Jan 18, 2012 at 10:05 AM, Heikki Linnakangas
 wrote:
> On 13.01.2012 06:24, YAMAMOTO Takashi wrote:
>>
>> hi,
>>
>> gistVacuumUpdate was removed when old-style VACUUM FULL was removed.
>> i wonder why.
>> it was not practical and REINDEX is preferred?
>>
>> anyway, the removal seems incomplete and there are some leftovers:
>>        F_TUPLES_DELETED
>>        F_DELETED
>>        XLOG_GIST_PAGE_DELETE
>
>
> Hmm, in theory we might bring back support for deleting pages in the future,
> I'm guessing F_DELETED and the WAL record type were left in place because of
> that.

I guess it was an oversight, because GistPageSetDeleted() is being
used in gistRedoPageDeleteRecord() and GistPageIsDeleted() in a few
other places

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación



Re: [HACKERS] Measuring relation free space

2012-01-20 Thread Jaime Casanova
On Wed, Jan 18, 2012 at 7:01 PM, Noah Misch  wrote:
> On Wed, Jan 18, 2012 at 09:46:20AM -0500, Jaime Casanova wrote:
>>
>> ignoring all non-leaf pages still gives a considerable difference
>> between pgstattuple and relation_free_space()
>
> pgstattuple() counts the single B-tree meta page as always-full, while
> relation_free_space() skips it for all purposes.  For tiny indexes, that can
> shift the percentage dramatically.
>

ok, i will reformulate the question. why is it fine to ignore non-leaf
pages but not fine to ignore the meta page?

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación



[HACKERS] REVIEW: pg_stat_statements with query tree based normalization

2012-01-20 Thread Kääriäinen Anssi
This is a short review of pg_stat_statements based on quick testing of the 
feature.

1. Installation: after managing to actually build PostgreSQL and contrib 
modules + changing
shared_preload_libraries to include pg_stat_statements I got this error:
FATAL:  could not create shared memory segment: Invalid argument
DETAIL:  Failed system call was shmget(key=5431001, size=34627584, 03600)

So, I needed to raise my SHM limits. I guess there isn't anything surprising
or erroneous about this, but I figured it is worth mentioning.

2. Usability:
  - If you have two similarly named tables in different schemas, for example
public.tbl and some_schema.tbl, these tables will get different entries in
pg_stat_statements. However, the table names are not schema-qualified, so it
is impossible to see which table is which.

# select query, calls from pg_stat_statements where query like 'select%test%';
        query        | calls 
---------------------+-------
 select * from test; |     4
 select * from test; |     2

# select * from tmp.test;
# select query, calls from pg_stat_statements where query like 'select%test%';
        query        | calls 
---------------------+-------
 select * from test; |     5
 select * from test; |     2

# select * from test;
# select query, calls from pg_stat_statements where query like 'select%test%';
        query        | calls 
---------------------+-------
 select * from test; |     5
 select * from test; |     3

- It would be nice from a user perspective to transform "where id in (list of
values)" to "where id in (?)" always, regardless of the length of the list.
Now "where id in (1, 2)" is grouped into a different pool than
"where id in (1, 2, 3)".

3. I tried to run Django's test suite a few times and see if there would be any 
unexpected
behavior. Some results (note that I haven't tried to reproduce this error on 
master
without the patch):

test_django_testdb_default=# SELECT "aggregation_publisher"."id", 
"aggregation_publisher"."name", "aggregation_publisher"."num_awards", 
MIN("aggregation_book"."pubdate") AS "earliest_book" FROM 
"aggregation_publisher" LEFT OUTER JOIN "aggregation_book" ON 
("aggregation_publisher"."id" = "aggregation_book"."publisher_id") GROUP BY 
"aggregation_publisher"."id", "aggregation_publisher"."name", 
"aggregation_publisher"."num_awards" HAVING NOT 
(MIN("aggregation_book"."pubdate") IS NULL) ORDER BY "earliest_book" ASC;

ERROR:  unrecognized node type for havingclause node: 315
test_django_testdb_default=# \d aggregation_publisher
   Table "public.aggregation_publisher"
   Column   |          Type          |                             Modifiers                              
------------+------------------------+---------------------------------------------------------------------
 id         | integer                | not null default nextval('aggregation_publisher_id_seq'::regclass)
 name       | character varying(255) | not null
 num_awards | integer                | not null
Indexes:
"aggregation_publisher_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "aggregation_book" CONSTRAINT "aggregation_book_publisher_id_fkey" 
FOREIGN KEY (publisher_id) REFERENCES aggregation_publisher(id) DEFERRABLE 
INITIALLY DEFERRED


The time used for insert statements seems suspiciously low. Maybe PostgreSQL is 
just faster than I thought :)

query  | INSERT INTO "django_content_type" ("id", "name", "app_label", 
"model") VALUES (?, ?, ?, ?)
calls  | 5490
total_time | 0.8231193

Multi-value inserts do not seem to be normalized:
query  | INSERT INTO "custom_pk_business_employees" ("business_id", 
"employee_id") VALUES ('Sears', 456), ('Sears', 123)
calls  | 1256
total_time | 0.619693

I did not see any noticeable difference in runtimes with pg_stat_statements 
installed or uninstalled (as extension).
Not tested on master without the patch at all.

Overall the feature seems to be really useful.

 - Anssi


Re: [HACKERS] Measuring relation free space

2012-01-20 Thread Noah Misch
On Fri, Jan 20, 2012 at 07:03:22PM -0500, Jaime Casanova wrote:
> On Wed, Jan 18, 2012 at 7:01 PM, Noah Misch  wrote:
> > On Wed, Jan 18, 2012 at 09:46:20AM -0500, Jaime Casanova wrote:
> >>
> >> ignoring all non-leaf pages still gives a considerable difference
> >> between pgstattuple and relation_free_space()
> >
> > pgstattuple() counts the single B-tree meta page as always-full, while
> > relation_free_space() skips it for all purposes.  For tiny indexes, that can
> > shift the percentage dramatically.
> >
> 
> ok, i will reformulate the question. why is fine ignoring non-leaf
> pages but is not fine to ignore the meta page?

pgstattuple() figures the free_percent by adding up all space available to
hold tuples and dividing that by the simple size of the relation.  Non-leaf
pages and the meta page get identical treatment: both never hold tuples, so
they do not contribute to the free space.
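
(Roughly, free_percent = 100 * free_space / table_len, where table_len is the
raw size of every page in the relation, meta page included.)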



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-20 Thread Robert Haas
On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs  wrote:
> On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas  wrote:
>> On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs  wrote:
>>> Can I just check with you that the only review comment is a one line
>>> change? Seems better to make any additional review comments in one go.
>>
>> No, I haven't done a full review yet - I just noticed that right at
>> the outset and wanted to be sure that got addressed.  I can look it
>> over more carefully, and test it.
>
> Corrected your point, and another found during retest.
>
> Works as advertised, docs corrected.

This comment needs updating:

/*
 * To drop an index safely, we must grab exclusive lock on its parent
 * table.  Exclusive lock on the index alone is insufficient because
 * another backend might be about to execute a query on the parent table.
 * If it relies on a previously cached list of index OIDs, then it could
 * attempt to access the just-dropped index.  We must therefore take a
 * table lock strong enough to prevent all queries on the table from
 * proceeding until we commit and send out a shared-cache-inval notice
 * that will make them update their index lists.
 */

Speaking of that comment, why does a concurrent index drop need any
lock at all on the parent table?  If, as the current comment text
says, the point of locking the parent table is to make sure that no
new backends can create relcache entries until our transaction
commits, then it's not needed here, or at least not for that purpose.
We're going to out-wait anyone who has the index open *after* we've
committed our changes to the pg_index entry, so there shouldn't be a
race condition there.  There may very well be a reason why we need a
lock on the parent, or there may not, but that's not it.

On the flip side, it strikes me that there's considerable danger that
ShareUpdateExclusiveLock on the index might not be strong enough.  In
particular, it would fall down if there's anyone who takes an
AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds
to access the index data despite its being marked !indisready and
!indisvalid.  There are certainly instances of this in contrib, such
as in pgstatindex(), which I'm pretty sure will abort with a nastygram
if somebody truncates away the file under it.  That might not be
enough reason to reject the patch, but I do think it would be the only
place in the system where we allow something like that to happen.  I'm
not sure whether there are any similar risks in core - I am a bit
worried about get_actual_variable_range() and gincostestimate(), but I
can't tell for sure whether there is any actual problem there, or
elsewhere.  I wonder if we should only do the *first* phase of the
drop with ShareUpdateExcusliveLock, and then after outwaiting
everyone, take an AccessExclusiveLock on the index (but not the table)
to do the rest of the work.  If that doesn't allow the drop to proceed
concurrently, then we go find the places that block against it and
teach them not to lock/use/examine indexes that are !indisready.  That
way, the worst case is that D-I-C is less concurrent than we'd like,
rather than making random other things fall over - specifically,
anything that counts on our traditional guarantee that AccessShareLock
is enough to keep the file from disappearing.
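
To spell out what I'm imagining, very roughly (most of the helper names
below are made up for illustration, not existing backend functions):

    /* Phase 1: under ShareUpdateExclusiveLock on both table and index */
    mark_index_dead(indexOid);          /* set indisvalid = indisready = false
                                         * in pg_index, send sinval */
    commit_and_start_new_transaction(); /* make the flag change visible */

    /* Phase 2: nobody can start using the index any more */
    wait_out_old_lockers_and_snapshots();   /* out-wait anyone who might
                                             * still have the index open */
    LockRelationOid(indexOid, AccessExclusiveLock); /* index only, not the
                                                     * parent table */
    index_drop(indexOid);               /* remove catalog entries and
                                         * unlink the files */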

In the department of minor nitpicks, the syntax synopsis you've added
looks wrong:

DROP INDEX
   [ IF EXISTS ] name
[, ...] [ CASCADE | RESTRICT ]
   CONCURRENTLY name

That implies that you may write if exists, followed by 1 or more
names, followed by cascade or restrict.  After that you must write
CONCURRENTLY, followed by exactly one name.  I think what you want is:

DROP INDEX [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
DROP INDEX CONCURRENTLY name

...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ]
CONCURRENTLY.  It could also be very useful to be able to concurrently
drop multiple indexes at once, since that would require waiting out
all concurrent transactions only once instead of once per drop.
Either way, I think we should have only one syntax, and then reject
combinations we can't handle in the code.  So I think we should have
the parser accept:

DROP INDEX [ IF EXISTS ] [ CONCURRENTLY ] name [, ...] [ CASCADE | RESTRICT ]

And then, if you try to use cascade, or try to drop multiple indexes
at once, we should throw an error saying that those options aren't
supported in combination with CONCURRENTLY, rather than rejecting them
as a syntax error at parse time.  That gives a better error message
and will make it easier for anyone who wants to extend this in the
future - plus it will allow things like DROP INDEX CONCURRENTLY bob
RESTRICT, which ought to be legal even if CASCADE isn't supported.
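
In other words, something like this at execution time, assuming the grammar
simply sets a boolean on DropStmt (the "concurrent" field name is just for
illustration):

    if (stmt->concurrent)
    {
        if (list_length(stmt->objects) != 1)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("DROP INDEX CONCURRENTLY does not support dropping multiple objects")));
        if (stmt->behavior == DROP_CASCADE)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("DROP INDEX CONCURRENTLY does not support CASCADE")));
    }
    /* ...whereas RESTRICT, the default, is simply allowed through */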

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] REVIEW: pg_stat_statements with query tree based normalization

2012-01-20 Thread Peter Geoghegan
On 21 January 2012 00:24, Kääriäinen Anssi  wrote:
> I did not see any noticeable difference in runtimes with pg_stat_statements 
> installed or uninstalled (as extension).
> Not tested on master without the patch at all.
>
> Overall the feature seems to be really useful.

Thanks for taking the time to review the patch!

I am going to produce another revision in response to feedback already
received. I intend to outline the steps that it will take to resolve
some outstanding issues in the next day or so. It would be nice if you
could take a look at the revised patch that is ultimately produced.
Should I keep you posted?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group commit, revised

2012-01-20 Thread Peter Geoghegan
On 20 January 2012 22:30, Heikki Linnakangas
 wrote:
> Maybe we should have a heuristic to split a large flush into smaller chunks.
> The WAL segment boundary would be a quite natural split point, for example,
> because when crossing the file boundary you have to issue separate fsync()s
> for the files anyway. But I'm happy with leaving that out for now, it's not
> any worse than it used to be without group commit anyway.

Let's quantify how much of a problem that is first.

> The patch added some CHECK_FOR_INTERRUPTS() calls into various places in the
> commit/abort codepaths, but that's all dead code; they're all within a
> HOLD/RESUME_INTERRUPTS blocks.

Fair enough, but do you think it's acceptable to say "well, we can't
have errors within critical blocks anyway, and nor can the driver, so
just assume that the driver will successfully service the request"?
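
(They are indeed no-ops there -- roughly paraphrasing miscadmin.h and
ProcessInterrupts(), not a verbatim copy:)

    HOLD_INTERRUPTS();          /* InterruptHoldoffCount++ */
    ...
    CHECK_FOR_INTERRUPTS();     /* expands to: if (InterruptPending)
                                 *                 ProcessInterrupts();
                                 * but ProcessInterrupts() returns
                                 * immediately while InterruptHoldoffCount
                                 * is non-zero, so nothing can be raised
                                 * here */
    ...
    RESUME_INTERRUPTS();        /* InterruptHoldoffCount-- */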

> Furthermore, the WAL writer didn't have a latch before this patch; it's not
> many lines of code to initialize the latch and set up the signal handler for
> it, but it already has a semaphore that's ready to use.

Uh, yes it did - WalWriterDelay is passed to WaitLatch(). It didn't
use the process latch as I did (which is initialised anyway), though I
believe it should have (on general principle, to avoid invalidation
issues when generic handlers are registered, plus because the process
latch is already initialised and available), which is why I changed
it. Whatever you do with group commit, you're going to want to look at
the changes made to the WAL Writer in my original patch outside of the
main loop, because there are one or two fixes for it included
(registering a usr1 signal handler and saving errno in existing
handlers), and because we need an alternative way of power saving if
you're not going to include the mechanism originally proposed - maybe
something similar to what has been done for the BGWriter in my patch
for that. At 5 wake-ups per second by default, the process is by a
wide margin the biggest culprit (except BGWriter, which is also 5 by
default, but that is addressed by my other patch that you're
reviewing). I want to fix that problem too, and possibly investigate
if there's something to be done about the checkpointer (though that
only has a 5 second timeout, so it's not a major concern). In any
case, we should encourage the idea that auxiliary processes will use
the proc latch, unless perhaps they only use a local latch like the
avlauncher does, imho.
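
The shape of main loop I have in mind is roughly the following, assuming a
WaitLatch() that takes a wake-event mask and a millisecond timeout, as has
been discussed for 9.2 (the details are only a sketch):

    for (;;)
    {
        ResetLatch(&MyProc->procLatch);

        /* do one round of WAL flushing work here */

        /*
         * Sleep until something sets our latch, WalWriterDelay elapses, or
         * the postmaster dies -- instead of unconditionally waking up
         * several times per second.
         */
        WaitLatch(&MyProc->procLatch,
                  WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                  WalWriterDelay);
        CHECK_FOR_INTERRUPTS();
    }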

Why did you remove the new assertions in unix_latch.c/win32_latch.c? I
think you should keep them, as well as my additional comments on latch
timeout invalidation issues in latch.h which are untouched in your
revision (though this looks to be a rough revision, so I shouldn't
read anything into that either way I suppose). In general, we should
try and use the process latch whenever we can.

> I wonder if we should rename the file into "xlogflush.c" or something like
> that, to make it clear that this works with any XLOG flushes, not just
> commits? Group commit is the usual term for this feature, so we should
> definitely mention that in the comments, but it might be better to rename
> the files/functions to emphasize that this is about WAL flushing in general.

Okay.

> This probably needs some further cleanup to fix things I've broken, and I
> haven't done any performance testing, but it's progress. Do you have a shell
> script or something that you used for the performance tests that I could
> run? Or would you like to re-run the tests you did earlier with this patch?

No, I'm using pgbench-tools, and there's no reason to think that you
couldn't get similar results on ordinary hardware, which is all I used
- obviously you'll want to make sure that you're using a file system
that supports granular fsyncs, like ext4. All of the details,
including the config for pgbench-tools, are in my original e-mail. I
have taken the time to re-run the benchmark and update the wiki with
that new information - I'd call it a draw.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group commit, revised

2012-01-20 Thread Peter Geoghegan
On 21 January 2012 03:13, Peter Geoghegan  wrote:
> I have taken the time to re-run the benchmark and update the wiki with
> that new information - I'd call it a draw.

On second thought, the occasional latency spikes that we see with my
patch (which uses the poll() based latch in the run that is
benchmarked) might be significant - difficult to say. The averages are
about the same though.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2012-01-20 Thread Alexander Korotkov
Hi!

Thank you for your feedback!

On Fri, Jan 20, 2012 at 3:33 AM, Erik Rijkers  wrote:

> The patch yields spectacular speedups with small, simple-enough regexen.
>  But it does not do a
> good enough job when guessing where to use the index and where fall back
> to Seq Scan.  This can
> lead to (also spectacular) slow-downs, compared to Seq Scan.
>
Could you give some examples of regexes where index scan becomes slower
than seq scan?


> I guessed that MAX_COLOR_CHARS limits the character class size (to 4, in
> your patch), is that
> true?   I can understand you want that value to be low to limit the above
> risk, but now it reduces
> the usability of the feature a bit: one has to split up larger
> char-classes into several smaller
> ones to make a statement use the index: i.e.:
>
Yes, MAX_COLOR_CHARS is the maximum number of characters an automaton color
may contain and still be expanded into separate characters. And there could
likely be a better solution than just having this hard limit.


> Btw, it seems impossible to Ctrl-C out of a search once it is submitted; I
> suppose this is
> normally necessary for perfomance reasons, but it would be useful te be
> able to compile a test
> version that allows it.  I don't know how hard that would be.
>
It seems that Ctrl-C was impossible because the trigram extraction procedure
runs for a long time and is not interruptible. It's not difficult to make
that procedure interruptible, but really it just shouldn't take so long.


> There is also a minor bug, I think, when running with  'set
> enable_seqscan=off'  in combination
> with a too-large regex:
>
Thanks for pointing. Will be fixed.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: index support for regexp search

2012-01-20 Thread Alexander Korotkov
On Fri, Jan 20, 2012 at 8:45 PM, Marti Raudsepp  wrote:

> On Fri, Jan 20, 2012 at 01:33, Erik Rijkers  wrote:
> > Btw, it seems impossible to Ctrl-C out of a search once it is submitted;
> I suppose this is
> > normally necessary for perfomance reasons, but it would be useful te be
> able to compile a test
> > version that allows it.
>
> I believe being interruptible is a requirement for the patch to be
> accepted.
>
> CHECK_FOR_INTERRUPTS(); should be added to the indeterminate loops.
>
Sure, that's easy to fix. But it seems that in this case it's the GIN
extract_query method that becomes slow (the index scan itself is already
interruptible). So, again, it just shouldn't run for so long.
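
(Roughly, i.e. just adding the check at the top of each expansion loop; the
loop below is only an illustration:)

    for (i = 0; i < state_count; i++)
    {
        CHECK_FOR_INTERRUPTS();
        /* ... expand colors / collect trigrams for this state ... */
    }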

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: index support for regexp search

2012-01-20 Thread Alexander Korotkov
On Fri, Jan 20, 2012 at 12:54 AM, Alexander Korotkov
wrote:

> On Fri, Jan 20, 2012 at 12:30 AM, Heikki Linnakangas <
> heikki.linnakan...@enterprisedb.com> wrote:
>
>> Apart from that, the multibyte issue seems like the big one. Any way
>>> around that?
>>
>> Conversion of pg_wchar to multibyte character is the only way I found to
> avoid serious hacking of existing regexp code. Do you think additional
> function in pg_wchar_tbl which converts pg_wchar back to multibyte
> character is possible solution?
>
Do you have any notes on that? I could make a patch which adds such a
function into core.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: index support for regexp search

2012-01-20 Thread Erik Rijkers
On Sat, January 21, 2012 06:26, Alexander Korotkov wrote:
> Hi!
>
> Thank you for your feedback!
>
> On Fri, Jan 20, 2012 at 3:33 AM, Erik Rijkers  wrote:
>
>> The patch yields spectacular speedups with small, simple-enough regexen.
>>  But it does not do a
>> good enough job when guessing where to use the index and where fall back
>> to Seq Scan.  This can
>> lead to (also spectacular) slow-downs, compared to Seq Scan.
>>
> Could you give some examples of regexes where index scan becomes slower
> than seq scan?
>


x[aeio]+q takes many minutes, uninterruptible.  It's now running for almost
30 minutes, I'm sure it will come back eventually, but I think I'll kill it
later on; I suppose you get the point  ;-)

Even with {n,m} quantifiers it's easy to hit slowdowns:

   (table azjunk6 is 112 MB, the index 693 MB.)
   (MAX_COLOR_CHARS=4  <= your original patch)


instance     table     regex          plan               time

HEAD         azjunk6   x[aeio]{1,3}q  Seq Scan           3566.088 ms
HEAD         azjunk6   x[aeio]{1,3}q  Seq Scan           3540.606 ms
HEAD         azjunk6   x[aeio]{1,3}q  Seq Scan           3495.034 ms
HEAD         azjunk6   x[aeio]{1,3}q  Seq Scan           3510.403 ms
trgm_regex   azjunk6   x[aeio]{1,3}q  Bitmap Heap Scan   3724.131 ms
trgm_regex   azjunk6   x[aeio]{1,3}q  Bitmap Heap Scan   3844.999 ms
trgm_regex   azjunk6   x[aeio]{1,3}q  Bitmap Heap Scan   3835.190 ms
trgm_regex   azjunk6   x[aeio]{1,3}q  Bitmap Heap Scan   3724.016 ms


HEAD         azjunk6   x[aeio]{1,4}q  Seq Scan           3617.997 ms
HEAD         azjunk6   x[aeio]{1,4}q  Seq Scan           3644.215 ms
HEAD         azjunk6   x[aeio]{1,4}q  Seq Scan           3636.976 ms
HEAD         azjunk6   x[aeio]{1,4}q  Seq Scan           3625.493 ms
trgm_regex   azjunk6   x[aeio]{1,4}q  Bitmap Heap Scan   7885.247 ms
trgm_regex   azjunk6   x[aeio]{1,4}q  Bitmap Heap Scan   8799.082 ms
trgm_regex   azjunk6   x[aeio]{1,4}q  Bitmap Heap Scan   7754.152 ms
trgm_regex   azjunk6   x[aeio]{1,4}q  Bitmap Heap Scan   7721.332 ms


This is with your patch as is; in instances compiled with higher
MAX_COLOR_CHARS (I did 6 and 9), it is of course even easier to dream up a
slow regex...



Erik Rijkers






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Finer Extension dependencies

2012-01-20 Thread Hitoshi Harada
On Sun, Dec 18, 2011 at 6:36 AM, Dimitri Fontaine
 wrote:
> Hi,
>
> The extensions work we began in 9.1 is not yet finished entirely
> (*cough*), so I'm opening a new patch series here by attacking the
> dependency problems.
>
> Some people want us to manage extension version numbers with sorting
> semantics so that we are able to depend on foo >= 1.2 and crazy things
> like this, and I think the need is reasonable and easier than that to
> address.

The patch applies with one reject, which I could fix easily. The make
check passed.

extension-provides.v1.patch  extensions.docx
haradh1-mac:postgresql-head haradh1$ patch -p1 <
~/Downloads/extension-provides.v1.patch
patching file contrib/pg_upgrade_support/pg_upgrade_support.c
patching file doc/src/sgml/catalogs.sgml
Hunk #2 succeeded at 3063 (offset 11 lines).
Hunk #3 succeeded at 6865 (offset 11 lines).
patching file doc/src/sgml/extend.sgml
patching file src/backend/catalog/Makefile
patching file src/backend/catalog/dependency.c
patching file src/backend/catalog/system_views.sql
patching file src/backend/commands/extension.c
patching file src/include/catalog/dependency.h
patching file src/include/catalog/indexing.h
patching file src/include/catalog/pg_extension_feature.h
patching file src/include/catalog/pg_proc.h
Hunk #1 succeeded at 4341 (offset 25 lines).
patching file src/include/commands/extension.h
patching file src/test/regress/expected/rules.out
patching file src/test/regress/expected/sanity_check.out
Hunk #1 succeeded at 103 (offset 1 line).
Hunk #2 FAILED at 163.
1 out of 2 hunks FAILED -- saving rejects to file
src/test/regress/expected/sanity_check.out.rej

What this patch does is basically:
- add pg_extension_feature to store "feature" (name) provided by extensions
- extension control file now has "provide" parameter to indicate
"feature", which is comma separated
- when creating an extension, the backend looks for "feature" required
in control file
- the installed extension has dependency on "feature"

So, the first thing is catalog.

db1=# \d pg_extension_feature;
Table "pg_catalog.pg_extension_feature"
   Column   | Type | Modifiers
------------+------+-----------
 extoid     | oid  | not null
 extfeature | name | not null
Indexes:
    "pg_extension_feature_index" UNIQUE, btree (extoid, extfeature)
    "pg_extension_feature_oid_index" UNIQUE, btree (oid)

* I'm not quite sure why pg_extension_feature_index needs the extoid column.
* I have a bigger question about adding a two-column catalog. I don't mind
the actual number of columns, but if a table has only two columns, that
implies the design may be bad. The only other two-column catalog is
pg_largeobject_metadata.

Next, some questions:
- Why is the finer dependency needed? Do you have a tangible example that
struggles with the current dependency granularity? As a developer of several
extensions, I feel fine about the existing dependency on the extension as a
whole.
- What happens if DROP EXTENSION ... CASCADE? Does it work?
- How does pg_upgrade interact with this? The dependency changes.

A minor memo.
list_extension_features(): I guess the size argument for repalloc is bogus.

So, that's pretty much what I've reviewed quickly. I'll need more time to
look at the details, but I'd like more input on the high-level design and
direction.

Thanks,
-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online base backup from the hot-standby

2012-01-20 Thread Fujii Masao
On Fri, Jan 20, 2012 at 1:01 PM, Steve Singer  wrote:
> Here is my review of this verison of the patch. I think this patch has been
> in every CF for 9.2 and I feel it is getting close to being committed.

Thanks for the review!

> Testing Review
> 
>
> I encountered this on my first replica (the one based on the master).  I am
> not sure if it is related to this patch, it happened after the pg_basebackup
> against the replica finished.
>
> TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
> "twophase.c", Line: 1238)
> LOG:  startup process (PID 1) was terminated by signal 6: Aborted

I spent an hour trying to reproduce that issue, but was not able to :(
For now I have no idea what causes it. But the patch basically doesn't touch
any code related to that area, so I'm guessing that it's a problem in HEAD
rather than in the patch...

I will spend more time diagnosing the issue. If you notice something, please
let me know.

> - set full page writes=off and did a checkpoint
> - Started the pg_basebackup
> - set full_page_writes=on and did a HUP + some database activity that might
> have forced a checkpoint.
>
> I got this message from pg_basebackup.
> ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438
> pg_basebackup: could not get WAL end position from server
>
> I point this out because the message is different than the normal "could not
> initiate base backup: FATAL:  WAL generated with full_page_writes=off" thatI
> normally see.

I guess that's because you started pg_basebackup before the checkpoint record
taken with full_page_writes=off had been replicated and replayed on the
standby. In that case pg_basebackup uses the previous checkpoint record
(possibly taken with full_page_writes=on) as the backup starting checkpoint,
so it passes the full_page_writes check at the start of the backup. Then it
fails the check at the end of the backup, which is why you got that error
message.

> We might want to add a PQerrorMessage(conn)) to
> pg_basebackup to print the error details.  Since this patch didn't actually
> change pg_basebackup I don't think your required to improve the error
> messages in it.  I am just mentioning this because it came up in testing.

Agreed.

When PQresultStatus() returns an unexpected status, the error message from
PQerrorMessage() should basically be reported. But in the one case where
pg_basebackup could not get the WAL end position, PQerrorMessage() was not
reported... This looks like an oversight in pg_basebackup... I think it's
better to fix that as a separate patch (attached). Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 4007680..873ef64 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -914,7 +914,7 @@ BaseBackup(void)
 	res = PQexec(conn, "IDENTIFY_SYSTEM");
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
-		fprintf(stderr, _("%s: could not identify system: %s\n"),
+		fprintf(stderr, _("%s: could not identify system: %s"),
 progname, PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
@@ -1049,8 +1049,8 @@ BaseBackup(void)
 	res = PQgetResult(conn);
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
-		fprintf(stderr, _("%s: could not get WAL end position from server\n"),
-progname);
+		fprintf(stderr, _("%s: could not get WAL end position from server: %s"),
+progname, PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
 	if (PQntuples(res) != 1)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online base backup from the hot-standby

2012-01-20 Thread Erik Rijkers
On Fri, January 20, 2012 05:01, Steve Singer wrote:
> On 12-01-17 05:38 AM, Fujii Masao wrote:
>> On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao  wrote:
>>> The amount of code changes to allow pg_basebackup to make a backup from
>>> the standby seems to be small. So I ended up merging that changes and the
>>> infrastructure patch. WIP patch attached. But I'd happy to split the patch 
>>> again
>>> if you want.
>> Attached is the updated version of the patch. I wrote the limitations of
>> standby-only backup in the document and changed the error messages.
>>
>
> Here is my review of this verison of the patch. I think this patch has
> been in every CF for 9.2 and I feel it is getting close to being
> committed.  The only issue of significants is a crash I encountered
> while testing, see below.
>
> I am fine with including the pg_basebackup changes in the patch it also
> makes testing some of the other changes possible.
>
>
> The documentation updates you have are good
>
> I don't see any issues looking at the code.
>
>
>
> Testing Review
> 
>
> I encountered this on my first replica (the one based on the master).  I
> am not sure if it is related to this patch, it happened after the
> pg_basebackup against the replica finished.
>
> TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
> "twophase.c", Line: 1238)
> LOG:  startup process (PID 1) was terminated by signal 6: Aborted
> LOG:  terminating any other active server processes
>
> A little earlier this postmaster had printed.
>
> LOG:  restored log file "0001001F" from archive
> LOG:  restored log file "00010020" from archive
> cp: cannot stat
> `/usr/local/pgsql92git/archive/00010021': No such file
> or directory
> LOG:  unexpected pageaddr 0/1900 in log file 0, segment 33, offset 0
> cp: cannot stat
> `/usr/local/pgsql92git/archive/00010021': No such file
> or directory
>
>
> I have NOT been able to replicate this error  and I am not sure exactly
> what I had done in my testing prior to that point.
>

I'm not sure, but it does look like this is the "mystery" bug that I
encountered repeatedly already in 9.0devel; but I was never able to
reproduce it reliably.  But I don't think it was ever solved.

  http://archives.postgresql.org/pgsql-hackers/2010-03/msg00223.php

Erik Rijkers

>
> In another test run I had
>
> - set full page writes=off and did a checkpoint
> - Started the pg_basebackup
> - set full_page_writes=on and did a HUP + some database activity that
> might have forced a checkpoint.
>
> I got this message from pg_basebackup.
> ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438
> pg_basebackup: could not get WAL end position from server
>
> I point this out because the message is different than the normal "could
> not initiate base backup: FATAL:  WAL generated with
> full_page_writes=off" thatI normally see.We might want to add a
> PQerrorMessage(conn)) to pg_basebackup to print the error details.
> Since this patch didn't actually change pg_basebackup I don't think your
> required to improve the error messages in it.  I am just mentioning this
> because it came up in testing.
>
>
> The rest of the tests I did involving changing full_page_writes
> with/without checkpoints and sighups and promoting the replica seemed to
> work as expected.
>



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Restore process during recovery

2012-01-20 Thread Simon Riggs
On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao  wrote:

Requested update


-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ce659ec..469e6d6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
 #include "postmaster/startup.h"
+#include "postmaster/walrestore.h"
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
@@ -187,7 +188,6 @@ static bool InArchiveRecovery = false;
 static bool restoredFromArchive = false;
 
 /* options taken from recovery.conf for archive recovery */
-static char *recoveryRestoreCommand = NULL;
 static char *recoveryEndCommand = NULL;
 static char *archiveCleanupCommand = NULL;
 static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
@@ -575,8 +575,8 @@ bool reachedConsistency = false;
 
 static bool InRedo = false;
 
-/* Have we launched bgwriter during recovery? */
-static bool bgwriterLaunched = false;
+/* Have we launched background procs during archive recovery yet? */
+static bool ArchRecoveryBgProcsActive = false;
 
 /*
  * Information logged when we detect a change in one of the parameters
@@ -632,8 +632,6 @@ static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt,
 			 bool randAccess);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
-static bool RestoreArchivedFile(char *path, const char *xlogfname,
-	const char *recovername, off_t expectedSize);
 static void ExecuteRecoveryCommand(char *command, char *commandName,
 	   bool failOnerror);
 static void PreallocXlogFiles(XLogRecPtr endptr);
@@ -2706,19 +2704,47 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
 
 	XLogFileName(xlogfname, tli, log, seg);
 
+#define TMPRECOVERYXLOG	"RECOVERYXLOG"
+
 	switch (source)
 	{
 		case XLOG_FROM_ARCHIVE:
+			/*
+			 * Check to see if the WALRestore process has already put the
+			 * next file in place while we were working. If so, use that.
+			 * If not, get it ourselves. This makes it easier to handle
+			 * initial state before the WALRestore is active, and also
+			 * handles the stop/start logic correctly when we have both
+			 * streaming and file based replication active.
+			 *
+			 * We queue up the next task for WALRestore after we've begun to
+			 * use this file later in XLogFileRead().
+			 *
+			 * If the WALRestore process is still active, the lock wait makes
+			 * us wait, which is just like we were executing the command
+			 * ourselves and so doesn't alter the logic elsewhere.
+			 */
+			if (XLogFileIsNowFullyRestored(tli, log, seg))
+			{
+snprintf(path, MAXPGPATH, XLOGDIR "/%s", TMPRECOVERYXLOG);
+restoredFromArchive = true;
+break;
+			}
+
 			/* Report recovery progress in PS display */
 			snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
 	 xlogfname);
 			set_ps_display(activitymsg, false);
 
 			restoredFromArchive = RestoreArchivedFile(path, xlogfname,
-	  "RECOVERYXLOG",
+	  TMPRECOVERYXLOG,
 	  XLogSegSize);
+
 			if (!restoredFromArchive)
+			{
+LWLockRelease(WALRestoreCommandLock);
 return -1;
+			}
 			break;
 
 		case XLOG_FROM_PG_XLOG:
@@ -2748,18 +2774,42 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
 		if (stat(xlogfpath, &statbuf) == 0)
 		{
 			if (unlink(xlogfpath) != 0)
+			{
+LWLockRelease(WALRestoreCommandLock);
 ereport(FATAL,
 		(errcode_for_file_access(),
 		 errmsg("could not remove file \"%s\": %m",
 xlogfpath)));
+			}
 			reload = true;
 		}
 
 		if (rename(path, xlogfpath) < 0)
+		{
+			LWLockRelease(WALRestoreCommandLock);
 			ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not rename file \"%s\" to \"%s\": %m",
 		path, xlogfpath)));
+		}
+
+		/*
+		 * Make sure we recover from the new filename, so we can reuse the
+		 * temporary filename for asynchronous restore actions.
+		 */
+		strcpy(path, xlogfpath);
+
+		/*
+		 * Tell the WALRestore process to get the next file now.
+		 * Hopefully it will be ready for use in time for the next call the
+		 * Startup process makes to XLogFileRead().
+		 *
+		 * It might seem like we should do that earlier but then there is a
+		 * race condition that might lead to replacing RECOVERYXLOG with
+		 * another file before we've copied it.
+		 */
+		SetNextWALRestoreLogSeg(tli, log, seg);
+		LWLockRelease(WALRestoreCommandLock);
 
 		/*
 		 * If the existing segment was replaced, since walsenders might have
@@ -2911,8 +2961,11 @@ XLogFileClose(void)
  * For fixed-size files, the caller may pass the expected size as an
  * additional crosscheck on successful recovery.  If the file size is not
  * known, set expectedSize = 0.
+ *
+ * Must be ca

Re: [HACKERS] WAL Restore process during recovery

2012-01-20 Thread Fujii Masao
On Fri, Jan 20, 2012 at 7:38 PM, Simon Riggs  wrote:
> On Fri, Jan 20, 2012 at 3:43 AM, Fujii Masao  wrote:
>
> Requested update

Thanks! Will review.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online base backup from the hot-standby

2012-01-20 Thread Fujii Masao
On Fri, Jan 20, 2012 at 7:37 PM, Erik Rijkers  wrote:
> I'm not sure, but it does look like this is the "mystery" bug that I 
> encountered repeatedly
> already in 9.0devel; but I was never able to reproduce it reliably.  But I 
> don't think it was ever
> solved.
>
>  http://archives.postgresql.org/pgsql-hackers/2010-03/msg00223.php

I also encountered the same issue a year ago:
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01579.php

At that moment, I identified its cause:
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01700.php

At last it was fixed:
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01910.php

But Steve has encountered it again, which means that the above fix was not
sufficient. Unless the issue derives from my patch, we should do another
round of diagnosis.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online base backup from the hot-standby

2012-01-20 Thread Simon Riggs
On Tue, Jan 17, 2012 at 10:38 AM, Fujii Masao  wrote:
> On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao  wrote:
>> The amount of code changes to allow pg_basebackup to make a backup from
>> the standby seems to be small. So I ended up merging that changes and the
>> infrastructure patch. WIP patch attached. But I'd happy to split the patch 
>> again
>> if you want.
>
> Attached is the updated version of the patch. I wrote the limitations of
> standby-only backup in the document and changed the error messages.


I'm looking at this patch and wondering why we're doing so many
press-ups to ensure full_page_writes parameter is on. This will still
fail if you use a utility that removes the full page writes, but fail
silently.

I think it would be beneficial to explicitly check that all WAL
records have full page writes actually attached to them until we
achieve consistency.

Surprised to see XLOG_FPW_CHANGE is there again after I objected to it
and it was removed. Not sure why? We already track other parameters
when they change, so I don't want to introduce a whole new WAL record
for each new parameter whose change needs tracking.

Please make a note for committer that wal version needs bumping.

I think its probably time to start a README.recovery to explain why
this works the way it does. Other changes can then start to do that as
well, so we can keep this to sane levels of complexity.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IDLE in transaction introspection

2012-01-20 Thread Magnus Hagander
On Thu, Jan 19, 2012 at 15:42, Fujii Masao  wrote:
> On Thu, Jan 19, 2012 at 10:31 PM, Magnus Hagander  wrote:
>> Applied with fairly extensive modifications. I moved things around,
>> switched to using enum instead of int+#define and a few things like
>> that. Also changed most of the markup in the docs - I may well have
>> broken some previously good language that, so if I did and someone
>> spots it, please mention it :-)
>
> The attached patch seems to need to be committed.

Thanks, applied.


> BTW, the following change in the patch is not required, but ISTM that
> it's better to change that way.
>
> -SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
> -       pg_stat_get_backend_activity(s.backendid) AS current_query
> +SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
> +       pg_stat_get_backend_activity(s.backendid) AS query

Agreed.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online base backup from the hot-standby

2012-01-20 Thread Simon Riggs
On Fri, Jan 20, 2012 at 11:04 AM, Fujii Masao  wrote:

> But Steve encountered it again, which means that the above fix is not
> sufficient. Unless the issue is derived from my patch, we should do
> another cycle of diagnosis of it.

It's my bug, and I've posted a fix but not yet applied it, just added
to open items list. The only reason for that was time pressure, which
is now gone, so I'll look to apply it sooner.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New replication mode: write

2012-01-20 Thread Simon Riggs
On Mon, Jan 16, 2012 at 4:17 PM, Simon Riggs  wrote:
> On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao  wrote:
>
>> Done. Attached is the updated version of the patch.
>
> Thanks.
>
> I'll review this first, but can't start immediately. Please expect
> something back in 2 days.

On initial review this looks fine.

I'll do a more thorough hands-on review now and commit if still OK.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2012-01-20 Thread Magnus Hagander
On Tue, Jan 17, 2012 at 21:39, Tomas Vondra  wrote:
> On 20.12.2011 19:59, Tomas Vondra wrote:
>> On 20.12.2011 11:20, Magnus Hagander wrote:
>>> 2011/12/20 Tomas Vondra :

 I haven't updated the docs yet - let's see if the patch is acceptable at
 all first.
>>>
>>> Again, without having reviewed the code, this looks like a feature
>>> we'd want, so please add some docs, and then submit it for the next
>>> commitfest!
>>
>> I've added the docs (see the attachment) and rebased to current head.
>>
>> Tomas
>
> Fixed a failing regression test (check of pg_stat_database structure).

I'm wondering if this (and also my deadlocks stats patch that's in the
queue) should, instead of inventing new pgstats messages, add fields to
the tabstat message. It sounds like that one is just for tables, but
it's already the one collecting info about commits and rollbacks, and
it's already sent on every commit.

Adding two fields to that one would add some extra bytes on every
send, but I wonder if that would ever affect performance, given the
total size of the packet? And it would certainly be lower overhead in
the cases that there *have* been temp tables used.
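
Just from memory, the message looks roughly like this (paraphrased from
pgstat.h, not verbatim), so the additions would simply be two more counters
ahead of the per-table entries:

    typedef struct PgStat_MsgTabstat
    {
        PgStat_MsgHdr m_hdr;
        Oid         m_databaseid;
        int         m_nentries;
        int         m_xact_commit;
        int         m_xact_rollback;
        /* proposed additions: */
        PgStat_Counter m_temp_files;    /* temp files created */
        PgStat_Counter m_temp_bytes;    /* bytes written to temp files */
        PgStat_TableEntry m_entry[PGSTAT_NUM_TABENTRIES];
    } PgStat_MsgTabstat;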

Thoughts?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online base backup from the hot-standby

2012-01-20 Thread Fujii Masao
Thanks for the review!

On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs  wrote:
> I'm looking at this patch and wondering why we're doing so many
> press-ups to ensure full_page_writes parameter is on. This will still
> fail if you use a utility that removes the full page writes, but fail
> silently.
>
> I think it would be beneficial to explicitly check that all WAL
> records have full page writes actually attached to them until we
> achieve consistency.

I agree that it's worth adding such a safeguard. That can be a self-contained
feature, so I'll submit a separate patch for that, to keep each patch small.

> Surprised to see XLOG_FPW_CHANGE is there again after I objected to it
> and it was removed. Not sure why? We already track other parameters
> when they change, so I don't want to introduce a whole new WAL record
> for each new parameter whose change needs tracking.

I revived that because whenever full_page_writes must be WAL-logged or
replayed, there is no need to WAL-log or replay the HS parameters, and the
opposite is also true. Logging or replaying all of them every time seems a
bit useless and makes the code less readable. ISTM that XLOG_FPW_CHANGE
makes the code simpler and avoids adding useless WAL activity, compared to
merging them all into one WAL record.

> Please make a note for committer that wal version needs bumping.

Okay, will add the note about bumping XLOG_PAGE_MAGIC.

> I think its probably time to start a README.recovery to explain why
> this works the way it does. Other changes can then start to do that as
> well, so we can keep this to sane levels of complexity.

In this CF, there are other patches which change the recovery code. So I
think it's better to do that after all of them have been committed. No need
to hurry to do that now.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup is not checking IDENTIFY_SYSTEM numbre of columns

2012-01-20 Thread Magnus Hagander
On Sun, Jan 15, 2012 at 22:00, Jaime Casanova  wrote:
> On Wed, Jan 11, 2012 at 5:11 PM, Magnus Hagander  wrote:
>>
>> No, no reason. Adding such a check would be a good idea.
>>
>
> ok. patch attached, it also adds a few PQclear() calls before
> disconnect_and_exit().

I don't think we need to care about all those PQclear() - it does an
exit() right after them anyway, so what's the point? (the disconnect
part is important of course, since otherwise we get a message in the
log on the server)

I've applied the patch without the PQclear()s, and changed it around
so that the error message shown is actually the same in all the
different places.

> btw, in BaseBackup() in line 1149 (after the patch is applied) there
> is an exit instead of disconnect_and_exit() and that is probably a
> typo too

It is, indeed. Fixed.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-01-20 Thread Robert Haas
On Sat, Jan 14, 2012 at 9:32 AM, Heikki Linnakangas
 wrote:
> Here's another version of the patch to make XLogInsert less of a bottleneck
> on multi-CPU systems. The basic idea is the same as before, but several bugs
> have been fixed, and lots of misc. clean up has been done.

This seems to need a rebase.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-20 Thread Robert Haas
On Sun, Jan 8, 2012 at 9:25 AM, Simon Riggs  wrote:
> I've taken that idea and used it to build a second Clog cache, known
> as ClogHistory which allows access to the read-only tail of pages in
> the clog. Once a page has been written to for the last time, it will
> be accessed via the ClogHistory Slru in preference to the normal Clog
> Slru. This separates historical accesses by readers from current write
> access by committers. Historical access doesn't force dirty writes,
> nor are commits made to wait when historical access occurs.

This seems to need a rebase.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-20 Thread Dimitri Fontaine
Tom Lane  writes:
> Robert Haas  writes:
>> I guess the question is: for what purpose?
>
> Indeed, it seems like such a thing is not an extension at all anymore,
> or at least it gives up many of the useful properties of extensions.

I'm thinking that a common name and version number, tracked in the
database for a set of related functions (that usually form an API), is a
useful enough property that extensions should support more use cases than
contrib-like “module centric” extensions (meaning, C coded and shipped
with a .so).

> Given the entire lack of demand from the field for such a cut-down
> concept of extension, I think we should not be in a hurry to introduce
> it.  Maybe in a year or two when we have a clearer idea of how people
> are actually using extensions, there will be a better argument for it.

Fair enough I guess (or at least I'm understanding how alone I am here),
let's hear from the field first.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 12:07 AM, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> On 01/19/2012 04:12 PM, Robert Haas wrote:
>>> On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan  wrote:
 The spec only allows unescaped Unicode chars (and for our purposes that
 means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will
 result in something that's not legal JSON.
>
>>> I understand.  I'm proposing that we not care.  In other words, if the
>>> server encoding is UTF-8, it'll really be JSON.  But if the server
>>> encoding is something else, it'll be almost-JSON.
>
>> Of course, for data going to the client, if the client encoding is UTF8,
>> they should get legal JSON, regardless of what the database encoding is,
>> and conversely too, no?
>
> Yes.  I think this argument has been mostly theologizing, along the
> lines of how many JSON characters can dance on the head of a pin.
> From a user's perspective, the database encoding is only a constraint on
> which characters he can store.

Bingo.

> He does not know or care what the bit
> representation is inside the server.  As such, if we store a non-ASCII
> character in a JSON string, it's valid JSON as far as the user is
> concerned, so long as that character exists in the Unicode standard.
> If his client encoding is UTF8, the value will be letter-perfect JSON
> when it gets to him; and if his client encoding is not UTF8, then he's
> already pretty much decided that he doesn't give a fig about the
> Unicode-centricity of the JSON spec, no?

Also agreed.  Personally, I think it may not have been a great idea to
tie the JSON spec so closely to Unicode, but I understand that it
would have been difficult to define an encoding-agnostic equivalent of
\uXXXX, since it's hard to know for sure whether an arbitrary encoding
even has a (sensible?) definition of code points, and they probably
wanted to avoid ambiguity.  But, it's bound to cause problems for any
system that runs in some other encoding, which, when so requested, we
do.  Even if we had the ability to support multiple encodings in the
same database, I'm not sure I'd be very excited about insisting that
JSON data always be stored in UTF-8, because that would introduce a
lot of unnecessary transcoding for people using other encodings and
basically unnecessarily handicap the functionality provided by the
datatype.  But at least if we had that, people would have the *option*
to use JSON with UTF-8 and get the fully spec-compliant behavior.  As
it is, they don't; the system we have forces the database encoding on
all datatypes whether they like it or not, and that ain't changing for
9.2.

> So I'm with Robert: we should just plain not care.  I would further
> suggest that maybe what we should do with incoming JSON escape sequences
> is convert them to Unicode code points and then to the equivalent
> character in the database encoding (or throw error if there is none).

The code I've written so far does no canonicalization of the input
value of any kind, just as we do for XML.  I'm inclined to leave it
that way.  Eventually, we might want to make the JSON datatype support
equality comparisons and so on, and that will require the system to
know that the letter r can be encoded as some \uXXXX sequence and
that the escape \r is equivalent to some other escape \uXXXX, but
right now all the code does is try to validate that the JSON is legal,
NOT second-guess the user's choice about how to spell things or where
to insert whitespace.  I think that's a good choice because (1) AFAIK,
there's no official canonicalization method for JSON, so whatever we
pick will be something we think is best, not an official method
sanction by the spec, (2) users might prefer the way they chose to
represent a given value over the way we choose to represent it, and
(3) by simply validating and storing the JSON object, rather than
doing any canonicalization, the input function avoids the need to do
any data copying, hopefully maximizing speed.  Canonicalization can be
added on top of what I've done here and people who want or need it can
use it; I have some ideas around how to make that leverage the
existing code that I intend to pursue for 9.3, but right now I'd
rather not go there.

So, given that framework, what the patch does is this: if you're using
UTF-8, then \uXXXX is accepted, provided that XXXX is something that
equates to a legal Unicode code point.  It isn't converted to the
corresponding character: it's just validated.  If you're NOT using
UTF-8, then it allows \uXXXX for code points up through 127 (which we
assume are the same in all encodings) and anything higher than that is
rejected.  If someone knows an easy way to check whether a \uXXXX
sequence for XXXX > 007F is a legal Unicode code point that has an
equivalent in the current server encoding, then we can add logic to
allow that case also, but personally I'm not that excited about it.
Anyone who is using \uXXXX escapes with a non-Unicode encoding is
pr

Re: [HACKERS] Command Triggers

2012-01-20 Thread Dimitri Fontaine
Robert Haas  writes:
> My advice is to forget about trying to provide the command string to
> the user for the first version of this patch.  As you're finding out,
> there's no simple, easy, obvious way of doing it, and there are N>0
> useful things that can be done without that functionality.

Actually, none of my current examples and tests rely on it. For
trigger-based replication Jan said he would need the full parse tree
rather than just the command string anyway, so we're not losing
anything more than what we were already prepared to postpone.

>  I continue
> to think that for a first version of this, we should be satisfied to
> pass just the OID.  I know that's not really what you want, but
> there's much to be said for picking a goal that is achievable in the
> limited time available, and I fear that setting your sights too high
> will lead to something that either (a) doesn't get committed, or (b)
> gets committed, but turns out not to work very well, either of which
> would be less than ideal.

Agreed, mostly.

From the code I already have I think we should still pass in the command
tag, the schema name (or null) and the object name (or null) as input
parameters to the trigger's procedure.

I'm now working on that and then the concurrency angle of the command
triggers.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] our buffer replacement strategy is kind of lame

2012-01-20 Thread Heikki Linnakangas

On 03.01.2012 17:56, Simon Riggs wrote:

> On Tue, Jan 3, 2012 at 3:18 PM, Robert Haas  wrote:
>>> 2. When a backend can't find a free buffer, it spins for a long time
>>> while holding the lock. This makes the buffer strategy O(N) in its
>>> worst case, which slows everything down. Notably, while this is
>>> happening the bgwriter sits doing nothing at all, right at the moment
>>> when it is most needed. The Clock algorithm is an approximation of an
>>> LRU, so is already suboptimal as a "perfect cache". Tweaking to avoid
>>> worst case behaviour makes sense. How much to tweak? Well,...
>>
>> I generally agree with this analysis, but I don't think the proposed
>> patch is going to solve the problem.  It may have some merit as a way
>> of limiting the worst case behavior.  For example, if every shared
>> buffer has a reference count of 5, the first buffer allocation that
>> misses is going to have a lot of work to do before it can actually
>> come up with a victim.  But I don't think it's going to provide good
>> scaling in general.  Even if the background writer only spins through,
>> on average, ten or fifteen buffers before finding one to evict, that
>> still means we're acquiring ten or fifteen spinlocks while holding
>> BufFreelistLock. I don't currently have the measurements to prove
>> that's too expensive, but I bet it is.
>
> I think its worth reducing the cost of scanning, but that has little
> to do with solving the O(N) problem. I think we need both.
>
> I've left the way open for you to redesign freelist management in many
> ways. Please take the opportunity and go for it, though we must
> realise that major overhauls require significantly more testing to
> prove their worth. Reducing spinlocking only sounds like a good way to
> proceed for this release.
>
> If you don't have time in 9.2, then these two small patches are worth
> having. The bgwriter locking patch needs less formal evidence to show
> its worth. We simply don't need to have a bgwriter that just sits
> waiting doing nothing.


I'd like to see some benchmarks that show a benefit from these patches, 
before committing something like this that complicates the code. These 
patches are fairly small, but nevertheless. Once we have a test case, we 
can argue whether the benefit we're seeing is worth the extra code, or 
if there's some better way to achieve it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup option for handling symlinks

2012-01-20 Thread Magnus Hagander
On Mon, Jan 16, 2012 at 19:52, Peter Eisentraut  wrote:
> On sön, 2012-01-08 at 22:22 +0100, Magnus Hagander wrote:
>> On Sun, Jan 8, 2012 at 21:53, Peter Eisentraut  wrote:
>> > I've recently had a possible need for telling pg_basebackup how to
>> > handle symlinks in the remote data directory, instead of ignoring them,
>> > which is what currently happens.  Possible options were recreating the
>> > symlink locally (pointing to a file on the local system) or copying the
>> > file the symlink points to.  This is mainly useful in scenarios where
>> > configuration files are symlinked from the data directory.  Has anyone
>> > else had the need for this?  Is it worth pursuing?
>>
>> Yes.
>>
>> I came up to the same issue though - in one case it would've been best
>> to copy the link, in the other case it would've been best to copy the
>> contents of the file :S Couldn't decide which was most important...
>> Maybe a switch would be needed?
>
> Yes.  Do we need to preserve the third behavior of ignoring symlinks?

I don't think we do.


> tar has an -h option that causes symlinks to be followed.  The default
> there is to archive the symlink itself.

Seems like a reasonable pattern to follow (though I think using -h
specifically is a really bad idea, the pattern of archiving the symlink
itself by default is the right one).

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online base backup from the hot-standby

2012-01-20 Thread Simon Riggs
On Fri, Jan 20, 2012 at 12:54 PM, Fujii Masao  wrote:
> Thanks for the review!
>
> On Fri, Jan 20, 2012 at 8:15 PM, Simon Riggs  wrote:
>> I'm looking at this patch and wondering why we're doing so many
>> press-ups to ensure full_page_writes parameter is on. This will still
>> fail if you use a utility that removes the full page writes, but fail
>> silently.
>>
>> I think it would be beneficial to explicitly check that all WAL
>> records have full page writes actually attached to them until we
>> achieve consistency.
>
> I agree that it's worth adding such a safeguard. That can be a self-contained
> feature, so I'll submit a separate patch for that, to keep each patch small.

Maybe, but you mean do this now as well? Not sure I like silent errors.

>> Surprised to see XLOG_FPW_CHANGE is there again after I objected to it
>> and it was removed. Not sure why? We already track other parameters
>> when they change, so I don't want to introduce a whole new WAL record
>> for each new parameter whose change needs tracking.
>
> I revived that because whenever full_page_writes must be WAL-logged
> or replayed, there is no need to WAL-log or replay the HS parameters.
> The opposite is also true. Logging or replaying all of them every time
> seems to be a bit useless, and to make the code unreadable. ISTM that
> XLOG_FPW_CHANGE can make the code simpler and avoid adding useless
> WAL activity by merging them into one WAL record.

I don't agree, but for the sake of getting on with things I'd say this
is minor, so it's no reason to block the patch.

>> Please make a note for committer that wal version needs bumping.
>
> Okay, will add the note about bumping XLOG_PAGE_MAGIC.
>
>> I think its probably time to start a README.recovery to explain why
>> this works the way it does. Other changes can then start to do that as
>> well, so we can keep this to sane levels of complexity.
>
> In this CF, there are other patches which change recovery codes. So
> I think that it's better to do that after all of them will have been 
> committed.
> No need to hurry up to do that now.

Agreed.

Will proceed to final review and if all OK, commit.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-01-20 Thread Simon Riggs
On Fri, Jan 20, 2012 at 2:11 PM, Heikki Linnakangas
 wrote:
> On 20.01.2012 15:32, Robert Haas wrote:
>>
>> On Sat, Jan 14, 2012 at 9:32 AM, Heikki Linnakangas
>>   wrote:
>>>
>>> Here's another version of the patch to make XLogInsert less of a
>>> bottleneck
>>> on multi-CPU systems. The basic idea is the same as before, but several
>>> bugs
>>> have been fixed, and lots of misc. clean up has been done.
>>
>>
>> This seems to need a rebase.
>
>
> Here you go.

I put myself down as reviewer for this. I'm planning to review this
early next week, once I've finished Fujii-san's patches.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 8:52 AM, Dimitri Fontaine
 wrote:
> Tom Lane  writes:
>> Robert Haas  writes:
>>> I guess the question is: for what purpose?
>>
>> Indeed, it seems like such a thing is not an extension at all anymore,
>> or at least it gives up many of the useful properties of extensions.
>
> I'm thinking that a common name and version number tracked in the
> database for a set of related functions (that usually form an API) is
> a useful enough property that I want extension support to cover more
> use cases than contrib-like “module centric” extensions (meaning,
> C-coded and shipped with a .so).

I see that there is some usefulness there, but I'm not sure that this
is the best way to get our hands around it.  For one thing, people can
and do handle schema versioning and schema upgrade scripts entirely in
userland.  My last implementation worked by keeping a schema_versions
table on the server with one column, a UUID.  The deployment tarball
contained a file with a list of UUIDs in it, each one associated to an
SQL script.  At install time, the install script ran through that file
in order and ran any scripts whose UUID didn't yet appear in the
table, and then added the UUIDs of the run scripts to the table.  This
might not be what any given person wants, but there's a lot of
flexibility to do things like that without any particular support from
the DBMS.  (Incidentally, this experience is what convinced me that
CREATE TABLE IF NOT EXISTS and ALTER TABLE IF EXISTS are good things to
have; my system could have been a lot simpler if I'd had those.)

One random design idea I had related to providing this functionality
in the DB core is to have a command that creates an empty extension,
maybe just "CREATE EXTENSION foo EMPTY", and an ALTER command that
forcibly changes the DB's notion of what version is installed, like
"ALTER EXTENSION foo FORCE VERSION TO '1.1'".  That would allow the
same sort of thing you're after here by using those two features plus
ALTER EXTENSION ADD/DROP, and could also be used to do other things.
For example, suppose you have DB servers A and B.  A is running an old
version of some extension and is in a shared hosting environment where
you can't get access to the box.  B, on the other hand, is your
brand-new, dedicated server.  You could upgrade the extension on the
old machine "manually", by issuing SQL commands to forcibly change its
state, and then do a dump and reload onto B.  This might be useful if,
for example, B is also running a newer DB server version that won't
support the very old version of the extension running on A.   This is
probably an unusual situation, but maybe there's some value in
allowing users who want to do such things a cleaner way to do it than
direct catalog hackery.

Anyway, I'm just thinking out loud here - I don't actually have a very
strong feeling that I know what all of the solutions are in this area,
or even all the problems.  I'm interested in hearing about your
experiences with the system, and other people's, because I certainly
do agree that there's room for improvement.  One of my personal pet
peeves is that the system doesn't know how to do an install of v1.1 by
running the v1.0 script followed by the 1.0-1.1 upgrade script, which
I fear is going to lead to a rapid uptick in the number of copies of
almost-identical scripts in our git repository.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-20 Thread Marko Kreen
On Tue, Jan 17, 2012 at 05:53:33PM +0900, Kyotaro HORIGUCHI wrote:
> Hello. This is a revised and rebased version of the patch.
> 
> a. The old term `Add Tuple Function' has been changed to `Store
>    Handler'. The reason for not using `storage' is simply the length
>    of the symbols.
> 
> b. I couldn't find a good place to put PGgetAsCString(), so it is
>    removed and storeHandler()@dblink.c touches PGresAttValue
>    directly in this new patch. The definition of PGresAttValue stays
>    in libpq-fe.h, with a comment.
> 
> c. Refined error handling in dblink.c. I think it preserves the
>    previous behavior for column number mismatches and type
>    conversion exceptions.
> 
> d. The documentation has been revised.

First, my priority is on-the-fly result processing,
not allocation optimization.  And this patch seems to make
that possible: I can process results row-by-row, without the
need to buffer all of them in the PGresult.  Which is great!

But the current API seems clumsy; I guess that's because the
patch grew from trying to replace the low-level allocator.

I would like to propose a better one-shot API:

void *(*RowStoreHandler)(PGresult *res, PGresAttValue *columns);

where the PGresAttValue * is allocated once, inside the PGresult,
and the pointers inside point directly into the network buffer.
Of course this requires replacing the current per-column malloc+copy
pattern with a per-row parse+handle pattern, but I think the resulting
API will be better:

1) Pass-through processing does not need to care about unnecessary
   per-row allocations.

2) Handlers that want a copy of the row (like regular libpq)
   can optimize allocations by having a "global" view of the row
   (e.g. one allocation for row header + data).

This also optimizes the call pattern: first libpq parses the packet,
then the row handler processes the row, with no unnecessary back-and-forth.


Summary: the current API makes various assumptions about how the row
is processed; let's remove those.
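
To make that concrete, here is a rough sketch of what a pass-through
handler could look like under the proposed signature.  This is
illustrative only: the NULL-length marker, the "len"/"value" fields of
PGresAttValue and the return convention are assumptions about the
patched libpq, not existing API.

    #include <stdio.h>
    #include "libpq-fe.h"       /* assumes the patch exposes PGresAttValue */

    #define ROW_NULL_LEN (-1)   /* assumed marker for SQL NULL columns */

    static void *
    print_row(PGresult *res, PGresAttValue *columns)
    {
        int     nfields = PQnfields(res);
        int     i;

        for (i = 0; i < nfields; i++)
        {
            if (i > 0)
                putchar('\t');
            if (columns[i].len == ROW_NULL_LEN)
                fputs("\\N", stdout);
            else
                fwrite(columns[i].value, 1, columns[i].len, stdout);
        }
        putchar('\n');

        return res;             /* assumed: any non-NULL value means
                                 * "row consumed, keep going" */
    }

A handler like this never copies column data; dblink's handler could
instead build its tuplestore row directly from the same pointers.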

-- 
marko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-20 Thread Simon Riggs
On Fri, Jan 20, 2012 at 1:37 PM, Robert Haas  wrote:
> On Sun, Jan 8, 2012 at 9:25 AM, Simon Riggs  wrote:
>> I've taken that idea and used it to build a second Clog cache, known
>> as ClogHistory which allows access to the read-only tail of pages in
>> the clog. Once a page has been written to for the last time, it will
>> be accessed via the ClogHistory Slru in preference to the normal Clog
>> Slru. This separates historical accesses by readers from current write
>> access by committers. Historical access doesn't force dirty writes,
>> nor are commits made to wait when historical access occurs.
>
> This seems to need a rebase.

OT: It would save lots of time if we had 2 things for the CF app:

1. Emails that go to appropriate people when status changes. e.g. when
someone sets "Waiting for Author" the author gets an email so they
know the reviewer is expecting something. Not knowing that wastes lots
of days, so if we want to do this in fewer days that seems like a great
place to start.

2. Something that automatically tests patches. If you submit a patch,
we spin up a blank VM and try to apply every pending patch. As soon as
one fails to apply, an email goes to the patch author. That way authors
know as soon as a recent commit invalidates something.

Those things have wasted time for me in the past, so they're
opportunities to improve the process, not must haves.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] our buffer replacement strategy is kind of lame

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 9:29 AM, Heikki Linnakangas
 wrote:
> I'd like to see some benchmarks that show a benefit from these patches,
> before committing something like this that complicates the code. These
> patches are fairly small, but nevertheless. Once we have a test case, we can
> argue whether the benefit we're seeing is worth the extra code, or if
> there's some better way to achieve it.

Agreed.  I don't really like either of these patches stylistically:
when you start sometimes locking things and sometimes not locking
things, it gets hard to reason about how the code will behave, and you
limit what you can in the future do inside the critical section
because you have to keep in mind that you don't really have full
mutual exclusion.  There are places where it may be worth making that
trade-off if we can show a large performance benefit, but I am wary of
quick hacks that may get in the way of more complete solutions we want
to implement later, especially if the performance benefit is marginal.

I'm in the process of trying to pull together benchmark numbers for
the various performance-related patches that are floating around this
CommitFest, but a big problem is that many of them need to be tested
in different ways, which are frequently not explicitly laid out in the
emails in which they are submitted.  Some of them are calculated to
improve latency, and others throughput.  Some require pgbench run with
a small scale factor, some require pgbench with a large scale factor,
and some need some completely other type of testing altogether.  Some
need short tests, others need long tests, still others require testing
with very specific parameters, and some don't really need more testing
so much as they need profiling.  Many (like the double-write patch)
need more thought about worst case scenarios, like a sequential scan
on a large, unhinted table.  Some only really need testing in one or
two scenarios, but others could affect a broad variety of workloads
and really ought to be tested in many different ways to fully
understand the behavior.   Some of them may be interdependent in
complex ways.

My current belief, based on the testing I did before Christmas and my
gut feelings about the remaining patches, is that the patches which
have the best chance of making a major difference on general workloads
are your XLOG insert scaling patch and the group commit patch.  Those
are, in fact, virtually the only ones that have had more than zero
test results posted to the list so far, and those test results are
extremely promising.  The various checkpoint-related patches also seem
somewhat promising to me, despite the lack (so far) of any test
results.  Everything else strikes me as likely to make a difference
that is either very small or only applicable in a fairly circumscribed
set of cases.  This preliminary opinion is subject to change as more
evidence comes in, of course.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch review for logging hooks (CF 2012-01)

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 2:47 AM, Greg Smith  wrote:
>> The updated patch looks good, marking as 'Ready for Committer'
>
> Patches without documentation are never ready for commit.  For this one, I'm
> not sure if that should be in the form of a reference example in contrib, or
> just something that documents that the hook exists and what the ground rules
> are for grabbing it.

Hooks are frequently not documented, and we only sometimes even bother
to include an example in contrib.  We should probably at least have a
working example for testing purposes, though, whether or not we end up
committing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 9:28 AM, Dimitri Fontaine
 wrote:
> Robert Haas  writes:
>> My advice is to forget about trying to provide the command string to
>> the user for the first version of this patch.  As you're finding out,
>> there's no simple, easy, obvious way of doing it, and there are N>0
>> useful things that can be done without that functionality.
>
> Actually, none of my current examples and tests are relying on it. For
> the trigger-based replication, Jan said he would need the full parse
> tree rather than just the command string anyway, so we're not losing
> anything beyond what we were already prepared to postpone.

Cool.

>>  I continue
>> to think that for a first version of this, we should be satisfied to
>> pass just the OID.  I know that's not really what you want, but
>> there's much to be said for picking a goal that is achievable in the
>> limited time available, and I fear that setting your sights too high
>> will lead to something that either (a) doesn't get committed, or (b)
>> gets committed, but turns out not to work very well, either of which
>> would be less than ideal.
>
> Agreed, mostly.
>
> From the code I already have I think we should still pass in the command
> tag, the schema name (or null) and the object name (or null) as input
> parameters to the trigger's procedure.

I think the OID is better than the name, but if it's easy to pass the
name and schema, then I'm fine with it.  But I do think this is one of
those problems that is complex enough that we should be happy to get
the core infrastructure in in the first commit, even with an extremely
limited amount of functionality, and then build on it.
Enhance-and-extend is so much easier than a monolithic code drop.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 9:44 AM, Simon Riggs  wrote:
> On Fri, Jan 20, 2012 at 1:37 PM, Robert Haas  wrote:
>> On Sun, Jan 8, 2012 at 9:25 AM, Simon Riggs  wrote:
>>> I've taken that idea and used it to build a second Clog cache, known
>>> as ClogHistory which allows access to the read-only tail of pages in
>>> the clog. Once a page has been written to for the last time, it will
>>> be accessed via the ClogHistory Slru in preference to the normal Clog
>>> Slru. This separates historical accesses by readers from current write
>>> access by committers. Historical access doesn't force dirty writes,
>>> nor are commits made to wait when historical access occurs.
>>
>> This seems to need a rebase.
>
> OT: It would save lots of time if we had 2 things for the CF app:
>
> 1. Emails that go to appropriate people when status changes. e.g. when
> someone sets "Waiting for Author" the author gets an email so they
> know the reviewer is expecting something. Not knowing that wastes lots
> of days, so if we want to do this in fewer days that seems like a great
> place to start.
>
> 2. Something that automatically tests patches. If you submit a patch,
> we spin up a blank VM and try to apply every pending patch. As soon as
> one fails to apply, an email goes to the patch author. That way authors
> know as soon as a recent commit invalidates something.
>
> Those things have wasted time for me in the past, so they're
> opportunities to improve the process, not must haves.

Yeah, I agree that that would be nice.  I just haven't had time to
implement much of anything for the CF application in a long time.  My
management has been very interested in the performance and scalability
stuff, so that's been my main focus for 9.2.  I'm going to see if I
can carve out some time for this once the dust settles.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch review for logging hooks (CF 2012-01)

2012-01-20 Thread Marti Raudsepp
On Fri, Jan 20, 2012 at 17:01, Robert Haas  wrote:
> We should probably at least have a
> working example for testing purposes, though, whether or not we end up
> committing it.

Martin Pihlak sent a short description of how to test the patch with
his pg_logforward module:
http://archives.postgresql.org/pgsql-hackers/2012-01/msg00790.php

That's what I used when reviewing, too.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Tom Lane
Robert Haas  writes:
> The code I've written so far does no canonicalization of the input
> value of any kind, just as we do for XML.

Fair enough.

> So, given that framework, what the patch does is this: if you're using
> UTF-8, then \uXXXX is accepted, provided that XXXX is something that
> equates to a legal Unicode code point.  It isn't converted to the
> corresponding character: it's just validated.  If you're NOT using
> UTF-8, then it allows \uXXXX for code points up through 127 (which we
> assume are the same in all encodings) and anything higher than that is
> rejected.

This seems a bit silly.  If you're going to leave the escape sequence as
ASCII, then why not just validate that it names a legal Unicode code
point and be done?  There is no reason whatever that that behavior needs
to depend on the database encoding.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group commit, revised

2012-01-20 Thread Robert Haas
On Thu, Jan 19, 2012 at 10:46 PM, Peter Geoghegan  wrote:
> On 19 January 2012 17:40, Robert Haas  wrote:
>> I don't know what you mean by this.  I think removing wal_writer_delay
>> is premature, because I think it still may have some utility, and the
>> patch removes it.  That's a separate change that should be factored
>> out of this patch and discussed separately.
>
> FWIW, I don't really care too much if we keep wal_writer_delay,
> provided it is only used in place of the patch's
> WALWRITER_NORMAL_TIMEOUT constant. I will note in passing that I doubt
> that the effect with asynchronous commits and hint bits is pronounced
> enough to have ever transferred through to someone making a practical
> recommendation to reduce wal_writer_delay to ameliorate clog
> contention.

It was very visible in some benchmarking Heikki did, and I was able to
reproduce it.

>> Yeah, I guess the shutdown sequence could get a bit complex if we try
>> to make everyone go through the WAL writer all the time.  But I wonder
>> if we could rejigger things somehow so that everything goes through
>> the WAL writer if it's dead.
>
> I'm not sure what you mean by this last bit. It sounds a bit hazardous.

That last "if" was supposed to say "unless", which may contribute to
the confusion.

> My problem with nominating a backend to the status of an auxiliary is
> that no matter what way you cut it, it increases the failure surface
> area, so to speak.

I think that's the wrong way of thinking about it.  Imagine this: we
maintain a queue of people who are waiting on the current WAL flush,
the current-flush-to LSN, and a queue of people who are waiting on the
next WAL flush, and a leader.  All this data is protected by a
spinlock.  When you want to flush WAL, you grab the spinlock.  If the
current-flush-to LSN is greater than the LSN you need, you add
yourself to the waiting-on-current-flush queue, release the spinlock,
and go to sleep.  Otherwise, if there's no leader, you become the
leader, enter your flush LSN as the current-flush-to-LSN, and release
the spinlock.  If there is a leader, you add yourself to the
waiting-on-next-flush queue, release the spinlock, and sleep.

If you become the leader, you perform the flush.  Then you retake the
spinlock, dequeue anyone waiting on the current flush, move all of the
next flush waiters (or those within a certain LSN distance) to the
current flush list, remember who is at the head of that new queue, and
release the spinlock.  You then set a flag in the PGPROC of the
backend now at the head of the next-flush queue and wake him up; he
checks that flag on waking to know whether he is the leader or whether
he's just been woken because his flush is done.  After waking him so
the next flush can get started, you wake all the people who were
waiting on the flush you already did.
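
To illustrate the hand-off, here is a minimal sketch only, not the
patch: it collapses the two explicit queues into a single broadcast,
uses a pthread mutex and condition variable in place of the spinlock
and PGPROC wakeups, and fakes the actual WAL write.

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct FlushState
    {
        pthread_mutex_t lock;           /* stands in for the spinlock */
        pthread_cond_t  flushed;        /* broadcast after every flush */
        uint64_t        flushed_upto;   /* everything <= this is durable */
        bool            leader_active;  /* somebody is flushing right now */
    } FlushState;

    static FlushState wal = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false
    };

    /* stand-in for the real write-and-fsync of WAL up to "upto" */
    static void
    do_flush(uint64_t upto)
    {
        printf("flushing WAL up to %llu\n", (unsigned long long) upto);
    }

    void
    flush_wal(uint64_t my_lsn)
    {
        pthread_mutex_lock(&wal.lock);
        for (;;)
        {
            if (wal.flushed_upto >= my_lsn)
                break;                  /* someone already covered us */

            if (!wal.leader_active)
            {
                /* become the leader; a real version would extend the
                 * target to cover the whole next-flush queue */
                uint64_t    target = my_lsn;

                wal.leader_active = true;
                pthread_mutex_unlock(&wal.lock);

                do_flush(target);

                pthread_mutex_lock(&wal.lock);
                wal.leader_active = false;
                if (target > wal.flushed_upto)
                    wal.flushed_upto = target;
                pthread_cond_broadcast(&wal.flushed);
            }
            else
            {
                /* follower: sleep until the current flush completes,
                 * then recheck whether it covered our LSN */
                pthread_cond_wait(&wal.flushed, &wal.lock);
            }
        }
        pthread_mutex_unlock(&wal.lock);
    }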

This may or may not be a good design, but I don't think it has any
more failure surface area than what you have here.  In particular,
whether or not the WAL writer is running doesn't matter; the system
can run just fine without it, and can even still do group commit.  To
look at it another way, it's not a question of whether we're treating
a regular backend as an auxiliary process; there's no rule anywhere
that backends can't be dependent on the proper operation of other
backends for proper functioning - there are MANY places that have that
property, including LWLockAcquire() and LWLockRelease().  Nor is there
any rule that background processes are more reliable than foreground
processes, nor do I believe they are.  Much of the existing patch's
failure surface seems to me to come from the coordination it requires
between ordinary backends and the background writer, and possible race
conditions appertaining thereto: WAL writer dies, backend dies,
postmaster dies, postmaster and WAL writer die together, etc.

>> +       if (delta > XLOG_SEG_SIZE * CheckPointSegments ||
>> +                       !ProcGlobal->groupCommitAvailable)
>>
>> That seems like a gigantic value.  I would think that we'd want to
>> forget about group commit any time we're behind by more than one
>> segment (maybe less).
>
> I'm sure that you're right - I myself described it as horrible in my
> original mail. I think that whatever value we set this to ought to be
> well reasoned. Right now, your magic number doesn't seem much better
> than my bogus heuristic (which only ever served as a placeholder
> implementation that hinted at a well-principled formula). What's your
> basis for suggesting that that limit would always be close to optimal?

It's probably not - I suspect even a single WAL segment is still too
large.  I'm pretty sure that it would be safe to always flush up to
the next block boundary, because we always write the whole block
anyway even if it's only partially filled.  So there's no possible
regression there.  Anything larger than "the end of the current 8kB
block" is going to be a trade-off between latency and throughput, and
it seems to me tha

Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 10:27 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> The code I've written so far does no canonicalization of the input
>> value of any kind, just as we do for XML.
>
> Fair enough.
>
>> So, given that framework, what the patch does is this: if you're using
>> UTF-8, then \uXXXX is accepted, provided that XXXX is something that
>> equates to a legal Unicode code point.  It isn't converted to the
>> corresponding character: it's just validated.  If you're NOT using
>> UTF-8, then it allows \uXXXX for code points up through 127 (which we
>> assume are the same in all encodings) and anything higher than that is
>> rejected.
>
> This seems a bit silly.  If you're going to leave the escape sequence as
> ASCII, then why not just validate that it names a legal Unicode code
> point and be done?  There is no reason whatever that that behavior needs
> to depend on the database encoding.

Mostly because that would prevent us from adding canonicalization in
the future, AFAICS, and I don't want to back myself into a corner.
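
For reference, the rule quoted above boils down to something like this
(an illustrative sketch, not the code in the patch):

    #include <stdbool.h>

    /*
     * Sketch only.  "code" is the value of the four hex digits in \uXXXX,
     * so it is always <= 0xFFFF; the escape is validated, never converted.
     */
    static bool
    json_unicode_escape_ok(unsigned int code, bool server_encoding_is_utf8)
    {
        if (server_encoding_is_utf8)
            return true;        /* any code point is accepted as-is */

        return code <= 0x7F;    /* other encodings: ASCII only, on the
                                 * assumption that those code points are
                                 * the same everywhere */
    }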

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 10:16 AM, Simon Riggs  wrote:
> On Fri, Jan 20, 2012 at 1:37 PM, Robert Haas  wrote:
>> On Sun, Jan 8, 2012 at 9:25 AM, Simon Riggs  wrote:
>>> I've taken that idea and used it to build a second Clog cache, known
>>> as ClogHistory which allows access to the read-only tail of pages in
>>> the clog. Once a page has been written to for the last time, it will
>>> be accessed via the ClogHistory Slru in preference to the normal Clog
>>> Slru. This separates historical accesses by readers from current write
>>> access by committers. Historical access doesn't force dirty writes,
>>> nor are commits made to wait when historical access occurs.
>>
>> This seems to need a rebase.
>
> Still applies and compiles cleanly for me.

D'oh.  You're right.  Looks like I accidentally tried to apply this to
the 9.1 sources.  Sigh...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-20 Thread Robert Haas
On Thu, Jan 19, 2012 at 11:29 PM, Greg Smith  wrote:
> I chewed a bit on Heikki's comment that similarity to the query planning
> parameters might be useful, and Robert's that being able to explain how the
> feature works more easily has value.  I have an initial adjustment of my
> general idea that I think moves usefully in both those directions.
>
> The existing VACUUM cost constants look like this:
>
> vacuum_cost_page_hit = 1
> vacuum_cost_page_miss = 10
> vacuum_cost_page_dirty = 20
>
> These could be adjusted to instead be ratios like the query planner ones
> (seq_page_cost, random_page_cost, etc.), referenced off a value of 1.0 for
> page miss ~= a read is expected:
>
> vacuum_cost_page_hit = 0.1
> vacuum_cost_page_miss = 1.0
> vacuum_cost_page_dirty = 2.0
>
> Now add in the new setting, which is explicitly said to be the read value:
>
> vacuum_cost_read_limit = 8000 # maximum page miss read rate in
> kilobytes/second
>
> And I can shuffle the numbers around internally such that things still work
> exactly the same, at the default parameters.  And then anyone who spends
> time learning how either the query planning or vacuum cost ratio constants
> work will find the learning curve to pick up the other set easier.

That may be a little better, but I still don't think it's worth
breaking backward compatibility for.  I mean, suppose I don't care
about read rate, but I want to limit my dirty data rate to 1MB/s.
What parameters should I set?

> It might be a bit more straightforward yet if things were renamed so it was
> more obvious that page miss~=read, but I haven't seen a good way to do that
> yet.  Renaming the reference cost value to vacuum_cost_page_read has two
> problems.  It makes the backward compatibility issues larger, and it's not
> quite true.  The way I think this should be explained, they really aren't
> the same; that's why I used ~= above.  A page miss is not guaranteed to be a
> read, it just is expected to be one in the worst case.  The read rate that
> vacuum page misses introduce will not be exactly the same as
> vacuum_cost_read_limit--but it will be below that limit, which is all it
> claims to be.

Maybe, but I still think having the read rate limit the dirty rate or
vice versa is *really* weird.
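
To spell out the coupling with some back-of-the-envelope numbers
(assuming 8 kB pages and that the whole vacuum_cost_read_limit budget is
spent on a single kind of page access; this is just arithmetic, not
PostgreSQL code):

    #include <stdio.h>

    int
    main(void)
    {
        const double page_kb = 8.0;             /* assumed block size */
        const double read_limit_kbps = 8000.0;  /* proposed default */
        const double cost_hit = 0.1;            /* ratios from the proposal */
        const double cost_miss = 1.0;
        const double cost_dirty = 2.0;

        /* budget per second, expressed in page-miss equivalents */
        double  budget = read_limit_kbps / page_kb;

        printf("all misses:  %.0f pages/s = %.0f kB/s read\n",
               budget / cost_miss, budget / cost_miss * page_kb);
        printf("all dirties: %.0f pages/s = %.0f kB/s dirtied\n",
               budget / cost_dirty, budget / cost_dirty * page_kb);
        printf("all hits:    %.0f pages/s touched in cache\n",
               budget / cost_hit);
        return 0;
    }

Under that reading, capping dirty writes at 1 MB/s means dropping
vacuum_cost_read_limit to about 2000, which also caps reads at 2 MB/s -
that's the cross-talk that seems weird to me.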

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-20 Thread Simon Riggs
On Fri, Jan 20, 2012 at 3:32 PM, Robert Haas  wrote:
> On Fri, Jan 20, 2012 at 10:16 AM, Simon Riggs  wrote:
>> On Fri, Jan 20, 2012 at 1:37 PM, Robert Haas  wrote:
>>> On Sun, Jan 8, 2012 at 9:25 AM, Simon Riggs  wrote:
>>>> I've taken that idea and used it to build a second Clog cache, known
>>>> as ClogHistory which allows access to the read-only tail of pages in
>>>> the clog. Once a page has been written to for the last time, it will
>>>> be accessed via the ClogHistory Slru in preference to the normal Clog
>>>> Slru. This separates historical accesses by readers from current write
>>>> access by committers. Historical access doesn't force dirty writes,
>>>> nor are commits made to wait when historical access occurs.
>>>
>>> This seems to need a rebase.
>>
>> Still applies and compiles cleanly for me.
>
> D'oh.  You're right.  Looks like I accidentally tried to apply this to
> the 9.1 sources.  Sigh...

No worries. It's Friday.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 10:38 AM, Simon Riggs  wrote:
> On Fri, Jan 20, 2012 at 3:32 PM, Robert Haas  wrote:
>> On Fri, Jan 20, 2012 at 10:16 AM, Simon Riggs  wrote:
>>> On Fri, Jan 20, 2012 at 1:37 PM, Robert Haas  wrote:
>>>> On Sun, Jan 8, 2012 at 9:25 AM, Simon Riggs  wrote:
>>>>> I've taken that idea and used it to build a second Clog cache, known
>>>>> as ClogHistory which allows access to the read-only tail of pages in
>>>>> the clog. Once a page has been written to for the last time, it will
>>>>> be accessed via the ClogHistory Slru in preference to the normal Clog
>>>>> Slru. This separates historical accesses by readers from current write
>>>>> access by committers. Historical access doesn't force dirty writes,
>>>>> nor are commits made to wait when historical access occurs.
>>>>
>>>> This seems to need a rebase.
>>>
>>> Still applies and compiles cleanly for me.
>>
>> D'oh.  You're right.  Looks like I accidentally tried to apply this to
>> the 9.1 sources.  Sigh...
>
> No worries. It's Friday.

http://www.youtube.com/watch?v=kfVsfOSbJY0

Of course, I even ran git log to check that I had the latest
sources... but what I had, of course, was the latest 9.1 sources,
which still have recently-timestamped commits, and I didn't look
carefully enough.  Sigh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Andrew Dunstan
On 01/20/2012 09:19 AM, Robert Haas wrote:

On Fri, Jan 20, 2012 at 12:07 AM, Tom Lane  wrote:

Andrew Dunstan  writes:

On 01/19/2012 04:12 PM, Robert Haas wrote:

On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan wrote:

The spec only allows unescaped Unicode chars (and for our purposes that
means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will
result in something that's not legal JSON.

I understand.  I'm proposing that we not care.  In other words, if the
server encoding is UTF-8, it'll really be JSON.  But if the server
encoding is something else, it'll be almost-JSON.

Of course, for data going to the client, if the client encoding is UTF8,
they should get legal JSON, regardless of what the database encoding is,
and conversely too, no?

Yes.  I think this argument has been mostly theologizing, along the
lines of how many JSON characters can dance on the head of a pin.
From a user's perspective, the database encoding is only a constraint on
which characters he can store.

Bingo.


He does not know or care what the bit
representation is inside the server.  As such, if we store a non-ASCII
character in a JSON string, it's valid JSON as far as the user is
concerned, so long as that character exists in the Unicode standard.
If his client encoding is UTF8, the value will be letter-perfect JSON
when it gets to him; and if his client encoding is not UTF8, then he's
already pretty much decided that he doesn't give a fig about the
Unicode-centricity of the JSON spec, no?

Also agreed.  Personally, I think it may not have been a great idea to
tie the JSON spec so closely to Unicode, but I understand that it
would have been difficult to define an encoding-agnostic equivalent of
\uXXXX, since it's hard to know for sure whether an arbitrary encoding
even has a (sensible?) definition of code points, and they probably
wanted to avoid ambiguity.  But, it's bound to cause problems for any
system that runs in some other encoding, which, when so requested, we
do.  Even if we had the ability to support multiple encodings in the
same database, I'm not sure I'd be very excited about insisting that
JSON data always be stored in UTF-8, because that would introduce a
lot of unnecessary transcoding for people using other encodings and
basically unnecessarily handicap the functionality provided by the
datatype.  But at least if we had that, people would have the *option*
to use JSON with UTF-8 and get the fully spec-compliant behavior.  As
it is, they don't; the system we have forces the database encoding on
all datatypes whether they like it or not, and that ain't changing for
9.2.


So I'm with Robert: we should just plain not care.  I would further
suggest that maybe what we should do with incoming JSON escape sequences
is convert them to Unicode code points and then to the equivalent
character in the database encoding (or throw error if there is none).

The code I've written so far does no canonicalization of the input
value of any kind, just as we do for XML.  I'm inclined to leave it
that way.  Eventually, we might want to make the JSON datatype support
equality comparisons and so on, and that will require the system to
know that the letter r can be encoded as some \uXXXX sequence and
that the escape \r is equivalent to some other escape \uXXXX, but
right now all the code does is try to validate that the JSON is legal,
NOT second-guess the user's choice about how to spell things or where
to insert whitespace.  I think that's a good choice because (1) AFAIK,
there's no official canonicalization method for JSON, so whatever we
pick will be something we think is best, not an official method
sanctioned by the spec, (2) users might prefer the way they chose to
represent a given value over the way we choose to represent it, and
(3) by simply validating and storing the JSON object, rather than
doing any canonicalization, the input function avoids the need to do
any data copying, hopefully maximizing speed.  Canonicalization can be
added on top of what I've done here and people who want or need it can
use it; I have some ideas around how to make that leverage the
existing code that I intend to pursue for 9.3, but right now I'd
rather not go there.

So, given that framework, what the patch does is this: if you're using
UTF-8, then \uXXXX is accepted, provided that XXXX is something that
equates to a legal Unicode code point.  It isn't converted to the
corresponding character: it's just validated.  If you're NOT using
UTF-8, then it allows \uXXXX for code points up through 127 (which we
assume are the same in all encodings) and anything higher than that is
rejected.  If someone knows an easy way to check whether a \uXXXX
sequence for XXXX > 007F is a legal Unicode code point that has an
equivalent in the current server encoding, then we can add logic to
allow that case also, but personally I'm not that excited about it.
Anyone who is using \uXXXX escapes with a non-Unicode encoding is
probably hoping that we ca

Re: [HACKERS] CLOG contention, part 2

2012-01-20 Thread Simon Riggs
On Fri, Jan 20, 2012 at 1:37 PM, Robert Haas  wrote:
> On Sun, Jan 8, 2012 at 9:25 AM, Simon Riggs  wrote:
>> I've taken that idea and used it to build a second Clog cache, known
>> as ClogHistory which allows access to the read-only tail of pages in
>> the clog. Once a page has been written to for the last time, it will
>> be accessed via the ClogHistory Slru in preference to the normal Clog
>> Slru. This separates historical accesses by readers from current write
>> access by committers. Historical access doesn't force dirty writes,
>> nor are commits made to wait when historical access occurs.
>
> This seems to need a rebase.

Still applies and compiles cleanly for me.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-20 Thread Dimitri Fontaine
Robert Haas  writes:
> peeves is that the system doesn't know how to do an install of v1.1 by
> running the v1.0 script followed by the 1.0-1.1 upgrade script, which

Did you try

  CREATE EXTENSION foo FROM 1.0;

Maybe you would want the system to be able to determine the oldest
version to start from to reach the current default_version given in the
control file, but I guess it would be better to add another property
like default_full_version or such (last_stop?).

It looks like a simple enough project; the fact that it would help our
own shipping and script maintenance could maybe justify working on it
this late in the cycle, I don't know.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2012-01-20 Thread Marti Raudsepp
On Fri, Jan 20, 2012 at 01:33, Erik Rijkers  wrote:
> Btw, it seems impossible to Ctrl-C out of a search once it is submitted;
> I suppose this is normally necessary for performance reasons, but it
> would be useful to be able to compile a test version that allows it.

I believe being interruptible is a requirement for the patch to be accepted.

CHECK_FOR_INTERRUPTS(); should be added to the indeterminate loops.
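
Something along these lines in each such loop (the loop condition and
helpers are placeholders, not the patch's actual code; only the
CHECK_FOR_INTERRUPTS() call is the point):

    #include "miscadmin.h"              /* for CHECK_FOR_INTERRUPTS() */

    while (expansion_has_more_work(state))   /* placeholder condition */
    {
        CHECK_FOR_INTERRUPTS();               /* honor Ctrl-C / query cancel */
        expand_one_step(state);               /* placeholder work */
    }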

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 11:29 AM, Dimitri Fontaine
 wrote:
> Robert Haas  writes:
>> peeves is that the system doesn't know how to do an install of v1.1 by
>> running the v1.0 script followed by the 1.0-1.1 upgrade script, which
>
> Did you try
>
>  CREATE EXTENSION foo FROM 1.0;

Well, yes, that works, but I'm going for what you wrote next:

> Maybe you would want the system to be able to determine the oldest
> version to start from to reach the current default_version given in the
> control file, but I guess it would be better to add another property
> like default_full_version or such (last_stop?).

Possibly that might be a better design, yes.  Especially since we
don't order version numbers intrinsically.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread Robert Haas
On Fri, Jan 20, 2012 at 10:45 AM, Andrew Dunstan  wrote:
> XML's &#nnnn; escape mechanism is more or less the equivalent of JSON's
> \uXXXX. But XML documents can be encoded in a variety of encodings,
> including non-Unicode encodings such as Latin-1. However, no matter what the
> document encoding, &#nnnn; designates the character with Unicode code point
> nnnn, whether or not that is part of the document encoding's charset.

OK.

> Given that precedent, I'm wondering if we do need to enforce anything other
> than that it is a valid unicode code point.
>
> Equivalence comparison is going to be difficult anyway if you're not
> resolving all \u escapes. Possibly we need some sort of canonicalization
> function to apply for comparison purposes. But we're not providing any
> comparison ops today anyway, so I don't think we need to make that decision
> now. As you say, there doesn't seem to be any defined canonical form - the
> spec is a bit light on in this respect.

Well, we clearly have to resolve all \uXXXX escapes to do either comparison
or canonicalization.  The current patch does neither, but presumably we
want to leave the door open to such things.  If we're using UTF-8 and
comparing two strings, and we get to a position where one of them has
a character and the other has \uXXXX, it's pretty simple to do the
comparison: we just turn XXXX into a wchar_t and test for equality.
That should be trivial, unless I'm misunderstanding.  If, however,
we're not using UTF-8, we have to first turn \uXXXX into a Unicode
code point, then convert that to a character in the database encoding,
and then test for equality with the other character after that.  I'm
not sure whether that's possible in general, how to do it, or how
efficient it is.  Can you or anyone shed any light on that topic?
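
For the UTF-8 case, the comparison I have in mind looks roughly like
this (a sketch that assumes valid UTF-8 input and a NUL-terminated
four-digit hex string, and ignores surrogate pairs; the non-UTF-8
conversion is exactly the part I can't fill in):

    #include <stdbool.h>
    #include <stdlib.h>

    /* decode one UTF-8 encoded character to its Unicode code point */
    static unsigned int
    utf8_codepoint(const unsigned char *s)
    {
        if (s[0] < 0x80)
            return s[0];
        if ((s[0] & 0xE0) == 0xC0)
            return ((s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        if ((s[0] & 0xF0) == 0xE0)
            return ((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6)
                   | (s[2] & 0x3F);
        return ((s[0] & 0x07) << 18) | ((s[1] & 0x3F) << 12)
               | ((s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    }

    /* does the \uXXXX escape (hex4 = "XXXX") name the same code point? */
    static bool
    escape_matches_char(const char *hex4, const unsigned char *utf8_char)
    {
        unsigned int escaped = (unsigned int) strtoul(hex4, NULL, 16);

        return escaped == utf8_codepoint(utf8_char);
    }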

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread David E. Wheeler
On Jan 19, 2012, at 9:07 PM, Tom Lane wrote:

> If his client encoding is UTF8, the value will be letter-perfect JSON
> when it gets to him; and if his client encoding is not UTF8, then he's
> already pretty much decided that he doesn't give a fig about the
> Unicode-centricity of the JSON spec, no?

Don’t entirely agree with this. Some folks are stuck with other encodings and 
cannot change them for one reason or another. That said, they can convert JSON 
from their required encoding into UTF-8 on the client side, so there is a 
workaround.

Not that this changes anything, and I agree with the overall direction of the 
discussion here. I just want to make sure we keep in mind folks who don’t 
necessarily have the freedom to switch to UTF-8. (And I say this as someone who 
*always* uses UTF-8!)

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-20 Thread David E. Wheeler
On Jan 20, 2012, at 8:58 AM, Robert Haas wrote:

> If, however,
> we're not using UTF-8, we have to first turn \uXXXX into a Unicode
> code point, then convert that to a character in the database encoding,
> and then test for equality with the other character after that.  I'm
> not sure whether that's possible in general, how to do it, or how
> efficient it is.  Can you or anyone shed any light on that topic?

If it’s like the XML example, it should always represent a Unicode code point, 
and *not* be converted to the other character set, no?

At any rate, since the JSON standard requires UTF-8, such distinctions having 
to do with alternate encodings are not likely to be covered, so I suspect we 
can do whatever we want here. It’s outside the spec.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers