Re: [HACKERS] How to determine failed connection attempt due to invalid authorization (libpq)?

2010-10-19 Thread David Fetter
On Mon, Oct 18, 2010 at 07:26:32AM -0700, David Fetter wrote:
 On Mon, Oct 18, 2010 at 10:18:25AM -0400, Robert Haas wrote:
  On Mon, Oct 18, 2010 at 10:05 AM, David Fetter da...@fetter.org wrote:
   On Mon, Oct 18, 2010 at 10:03:55AM -0400, Robert Haas wrote:
   On Sun, Oct 17, 2010 at 2:03 AM, Dmitriy Igrishin dmit...@gmail.com 
   wrote:
I've asked pgsql-general.
Unfortunately it seems that there is no better way to do it except
parsing PQerrorMessage(). Sadly.
  
   Yeah, doesn't look like it.  A quick glance at the code reveals that a
   PGresult can store individual error fields but a PGconn can store only
   a message.   :-(
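
For illustration, a minimal libpq sketch of the asymmetry described above
(the connection parameters are made up): a failed connection only exposes the
flat text of PQerrorMessage(), while a failed query's PGresult exposes
structured fields such as the SQLSTATE.

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    /* Connection failure: only a flat, human-readable message is available,
     * which is why callers end up parsing this text to detect auth errors. */
    PGconn *conn = PQconnectdb("host=localhost user=nosuchuser password=bogus");
    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
    PQfinish(conn);

    /* Query failure: the PGresult carries individual error fields. */
    conn = PQconnectdb("");             /* default connection parameters */
    if (PQstatus(conn) == CONNECTION_OK)
    {
        PGresult *res = PQexec(conn, "SELECT * FROM no_such_table");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "sqlstate: %s\n",
                    PQresultErrorField(res, PG_DIAG_SQLSTATE));
        PQclear(res);
    }
    PQfinish(conn);
    return 0;
}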
  
   Does this seem worth patching for 9.1?
  
  Please feel free to submit a patch if you have an idea how to solve it.
 
 Will look that over this evening and submit an idea for a patch :)

I'm pretty slammed by some other commitments at the moment.  Probably
won't get a chance to do this until at least this weekend.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: extensible enums

2010-10-19 Thread Dean Rasheed
On 19 October 2010 05:21, Andrew Dunstan and...@dunslane.net wrote:


 On 10/18/2010 10:52 AM, Tom Lane wrote:

 We could possibly deal with enum types that follow the existing
 convention if we made the cache entry hold a list of all the original,
 known-to-be-sorted OIDs.  (This could be reasonably compact and cheap to
 probe if it were represented as a starting OID and a Bitmapset of delta
 values, since we can assume that the initial set of OIDs is pretty close
 together.)  But we have to have that cache entry, and we have to consult
 it on every single comparison, so it's definitely going to be slower
 than before.

 So I'm thinking the comparison procedure goes like this:

 1. Both OIDs even?
        If so, just compare them numerically, and we're done.

 2. Lookup cache entry for enum type.

 3. Both OIDs in list of known-sorted OIDs?
        If so, just compare them numerically, and we're done.

 4. Search the part of the cache entry that lists sort positions.
        If not both present, refresh the cache entry.
        If still not present, throw error.

 5. Compare by sort positions.

 Step 4 is the slowest part but would be avoided in most cases.
 However, step 2 is none too speedy either, and would usually
 be required when dealing with pre-existing enums.
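
A standalone toy sketch of that five-step procedure (the cache layout below is
invented for illustration; it is not the patch's actual data structure):

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Simplified stand-in for the per-enum-type cache entry. */
typedef struct
{
    Oid sorted_oids[4];     /* the original, known-to-be-sorted OIDs */
    int nsorted;
    Oid all_oids[8];        /* every member of the enum, in sort order */
    int nall;
} EnumCache;

static bool
in_sorted_set(const EnumCache *c, Oid o)
{
    for (int i = 0; i < c->nsorted; i++)
        if (c->sorted_oids[i] == o)
            return true;
    return false;
}

static int
sort_position(const EnumCache *c, Oid o)
{
    for (int i = 0; i < c->nall; i++)
        if (c->all_oids[i] == o)
            return i;
    return -1;      /* a real implementation would refresh the cache, then error */
}

static int
enum_cmp(const EnumCache *c, Oid a, Oid b)
{
    /* 1. Both OIDs even: compare them numerically, and we're done. */
    if ((a & 1) == 0 && (b & 1) == 0)
        return (a < b) ? -1 : (a > b);
    /* 2-3. With the cache entry in hand, are both in the known-sorted set? */
    if (in_sorted_set(c, a) && in_sorted_set(c, b))
        return (a < b) ? -1 : (a > b);
    /* 4-5. Otherwise fall back to explicit sort positions. */
    int pa = sort_position(c, a);
    int pb = sort_position(c, b);
    return (pa < pb) ? -1 : (pa > pb);
}

int main(void)
{
    /* 16403 is an odd OID added "between" the original members. */
    EnumCache c = {{16400, 16402, 16404}, 3,
                   {16400, 16403, 16402, 16404}, 4};
    printf("%d\n", enum_cmp(&c, 16403, 16402));     /* prints -1 */
    return 0;
}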

 OK, I've made adjustments that I think do what you're suggesting.

 Patch is attached.


Ah, I'd missed the point about the bitmapset. In the most common case,
most of the enum elements are probably going to be in the right order,
so you save a lot by identifying that case quickly.

I didn't have time to play with hash maps myself, but I don't think it
will make much difference now because hopefully the binary search
isn't going to be hit a lot anyway.

There are a couple of things that look odd about this code though (I
haven't tested it, but they look wrong):

In the loop identifying the longest range:

for (list_end = start_pos+1; list_end < num; list_end++)
    if (mycache->sort_order_list[list_end].sort_order <
        mycache->sort_order_list[list_end - 1].sort_order ||
        mycache->sort_order_list[list_end].enum_oid -
        mycache->sort_order_list[list_end].enum_oid >= BITMAPSIZE)

That last test isn't doing anything, since it subtracts an OID from itself.
Shouldn't the second list_end be start_pos?

In the loop that sets bits in the bitmap, it looks like it's assuming
the range starts at 0. I haven't had any caffeine yet, so maybe I'm
misunderstanding it, but I can't see anywhere that sets bitmap_base.

Regards,
Dean


 Alternatively this can be pulled from
 g...@github.com:adunstan/postgresql-dev.git

 cheers

 andrew




Re: [HACKERS] WIP: extensible enums

2010-10-19 Thread Thom Brown
On 19 October 2010 05:21, Andrew Dunstan and...@dunslane.net wrote:


 On 10/18/2010 10:52 AM, Tom Lane wrote:

 We could possibly deal with enum types that follow the existing
 convention if we made the cache entry hold a list of all the original,
 known-to-be-sorted OIDs.  (This could be reasonably compact and cheap to
 probe if it were represented as a starting OID and a Bitmapset of delta
 values, since we can assume that the initial set of OIDs is pretty close
 together.)  But we have to have that cache entry, and we have to consult
 it on every single comparison, so it's definitely going to be slower
 than before.

 So I'm thinking the comparison procedure goes like this:

 1. Both OIDs even?
        If so, just compare them numerically, and we're done.

 2. Lookup cache entry for enum type.

 3. Both OIDs in list of known-sorted OIDs?
        If so, just compare them numerically, and we're done.

 4. Search the part of the cache entry that lists sort positions.
        If not both present, refresh the cache entry.
        If still not present, throw error.

 5. Compare by sort positions.

 Step 4 is the slowest part but would be avoided in most cases.
 However, step 2 is none too speedy either, and would usually
 be required when dealing with pre-existing enums.

 OK, I've made adjustments that I think do what you're suggesting.

 Patch is attached.

 Alternatively this can be pulled from
 g...@github.com:adunstan/postgresql-dev.git

Andrew, can't you get your own repo at git.postgresql.org?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935



Re: [HACKERS] WIP: extensible enums

2010-10-19 Thread Magnus Hagander
On Tue, Oct 19, 2010 at 10:19, Thom Brown t...@linux.com wrote:
 On 19 October 2010 05:21, Andrew Dunstan and...@dunslane.net wrote:


 On 10/18/2010 10:52 AM, Tom Lane wrote:

 We could possibly deal with enum types that follow the existing
 convention if we made the cache entry hold a list of all the original,
 known-to-be-sorted OIDs.  (This could be reasonably compact and cheap to
 probe if it were represented as a starting OID and a Bitmapset of delta
 values, since we can assume that the initial set of OIDs is pretty close
 together.)  But we have to have that cache entry, and we have to consult
 it on every single comparison, so it's definitely going to be slower
 than before.

 So I'm thinking the comparison procedure goes like this:

 1. Both OIDs even?
        If so, just compare them numerically, and we're done.

 2. Lookup cache entry for enum type.

 3. Both OIDs in list of known-sorted OIDs?
        If so, just compare them numerically, and we're done.

 4. Search the part of the cache entry that lists sort positions.
        If not both present, refresh the cache entry.
        If still not present, throw error.

 5. Compare by sort positions.

 Step 4 is the slowest part but would be avoided in most cases.
 However, step 2 is none too speedy either, and would usually
 be required when dealing with pre-existing enums.

 OK, I've made adjustments that I think do what you're suggesting.

 Patch is attached.

 Alternatively this can be pulled from
 g...@github.com:adunstan/postgresql-dev.git

 Andrew, can't you get your own repo at git.postgresql.org?

He certainly could, but github provides more features and a nicer look
:-) And since it's git, it doesn't matter where the repo is.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] leaky views, yet again

2010-10-19 Thread KaiGai Kohei

(2010/10/14 1:52), Tom Lane wrote:

Robert Haasrobertmh...@gmail.com  writes:

On Wed, Oct 13, 2010 at 11:45 AM, Tom Lanet...@sss.pgh.pa.us  wrote:

That's all true, but you have to consider how much the obstacle actually
gets in their way versus how painful it is on your end to create and
maintain the obstacle.  I don't think this proposed patch measures up
very well on either end of that tradeoff.



I think it would behoove us to try to separate concerns about this
particular patch from concerns about the viability of the whole
approach.  Whether or not it's useful to do X is a different question
than whether it can be done with few enough lines of code and/or
whether this patch actually does it well.


I think I left the wrong impression: I'm concerned about the whole
approach.  I haven't even read this particular patch lately.  I think
that trying to guarantee that the planner applies independent
constraints in a particular order will be expensive, fragile, and prone
to recurring security bugs no matter how it's implemented --- unless you
do it by lobotomizing query pullup/pushdown, which seems unacceptable
from a performance standpoint.

Just to give one example of what this patch misses (probably; as I said
I haven't read it lately), what about selectivity estimation?  One of
the things we like to do when we have an otherwise-unknown function is
to try it on all the histogram members and see what percentage yield
true.  That might already represent enough of an information leak to be
objectionable ... and yet, if we don't do it, the consequences for
performance could be pretty horrid because we'll have to work without
any reasonable selectivity estimate at all.  There are cases where this
technique isn't applied at the moment but probably should be, which is
why I characterize the leak-prevention idea as creating future security
issues: doing that is a constraint that will have to be accounted for in
every future planner change, and we are certain to miss the implications
sometimes.
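
As a standalone illustration of the technique described above (none of this is
the planner's actual code), probing a qual against histogram values both
yields a selectivity estimate and reveals something about the data:

#include <stdio.h>
#include <stdbool.h>

/* Stand-in for an otherwise-unknown, user-defined boolean function. */
static bool
unknown_qual(double v)
{
    return v > 42.0;
}

int main(void)
{
    /* Stand-in for the column's histogram boundary values. */
    double histogram[] = {1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
    int    n = sizeof(histogram) / sizeof(histogram[0]);
    int    pass = 0;

    for (int i = 0; i < n; i++)
        if (unknown_qual(histogram[i]))
            pass++;

    /* The passing fraction becomes the selectivity estimate; running the
     * function over histogram members is itself a small information leak. */
    printf("estimated selectivity = %.2f\n", (double) pass / n);
    return 0;
}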


Sorry, I may have misread what you pointed out.

Do you see oprrest/oprjoin being invoked internally as a problem?
Well, I also think it is a problem, as long as they can be installed
by non-superusers. But it is a separate problem from the row-level
security issues.

In my opinion, the origin of the matter is incorrect checks on the creation
of operators. A non-superuser is allowed to define a new operator with
restriction and join estimators as long as he has EXECUTE privilege
on those functions (not the EXECUTE privilege of the actual user who
ends up invoking them during the estimation phase!).
Then the optimizer may invoke these functions internally without
any privilege checks on either the function or the table being
estimated. If a malicious person wants to make other users invoke
a trojan horse, he can define a trappy operator with malicious
estimator functions, can't he?

I think this is a situation where we should require superuser privilege,
which would mean the specified functions have no malicious intention,
equivalent to the privilege needed to grant EXECUTE to public.

There is a similar problem with conversion functions.
If a malicious person wants to install a fake conversion function that
records the whole stream between server and client, that does not seem
impossible to me.

So, I'd like to suggest revising the specification of the permission
checks on these commands. If we can ensure these functions are not
malicious, there is no need to worry about information leaks via
untrusted functions that are invoked internally.

Thanks,
--
KaiGai Kohei kai...@ak.jp.nec.com



Re: [HACKERS] Simplifying replication

2010-10-19 Thread Dimitri Fontaine
Hi,

Josh Berkus j...@agliodbs.com writes:
 It is critical that we make replication easier to set up, administrate and
 monitor than it currently is.  In my conversations with people, this is more
 important to our users and the adoption of PostgreSQL than synchronous
 replication is.

I want to say a big, big +1 here. The way replication and PITR setup are
implemented now is a very good prototype; it's time to consolidate and
get to something usable by normal people, as opposed to full-time
PostgreSQL geeks.

Well, the current setup offers lots of flexibility which we'd better not
lose in the process, but the simple setup simply does not exist yet.

 1. Any postgresql standalone server can become a replication master simply
 by enabling replication connections in pg_hba.conf.  No other configuration
 is required, and no server restart is required.

That sounds as simple as changing the default wal_level to hot_standby,
and the default max_wal_senders to non-zero.

 2. Should I choose to adjust master configuration, for say performance
 reasons, most replication variables (including ones like wal_keep_segments)
 should be changeable without a server restart.

Anybody know how difficult that is without having to spend lots of time
studying the source code with the question in mind?

 3. I can configure a standby by copying the same postgresql.conf on the
 master.  I only have to change a single configuration variable (the
 primary_conninfo, or maybe a replication_mode setting) in order to start the
 server in standby mode.  GUCs which apply only to masters are ignored.

 4. I can start a new replica off the master by running a single command-line
 utility on the standby and giving it connection information to the master.
 Using this connection, it should be able to start a backup snapshot, copy
 the entire database and any required logs, and then come up in standby mode.
 All that should be required for this is one or two highport connections to
 the master.  No recovery.conf file is required, or exists.

There's a prototype to stream a base backup from a libpq connection, I
think someone here wanted to integrate that into the replication
protocol itself. It should be doable with a simple libpq connection and
all automated.

The pg_basebackup python client software is 100 lines of code. It's
mainly a recursive query to get the list of files within the master,
then two server side functions to get binary file chunks,
compressed. Then client side, a loop to decompress and write the chunks
at the right place. That's it.

  http://github.com/dimitri/pg_basebackup/blob/master/pg_basebackup.py

I could prepare a patch given some advice on the replication protocol
integration. For one, is streaming a base backup something that
walsender should care about?

 5. I can cause the standby to fail over with a single command to the
 failover server.  If this is a trigger file, then it already has a default
 path to the trigger file in postgresql.conf, so that this does not require
 reconfiguration and restart of the standby at crisis time. Ideally, I use a
 pg_failover command or something similar.

This feature is in walmgr.py from Skytools and it's something necessary
to have in -core now that we have failover standby capacity. Much
agreed, and the pg_failover command is a good idea.

BTW, do we have a clear idea of how to implement pg_ping, and should it
report the current WAL location(s) of a standby?

 6. Should I decide to make the standby the new master, this should also be
 possible with a single command and a one-line configuration on the other
 standbys.  To aid this, we have an easy way to tell which standby in a group
 are most caught up.  If I try to promote the wrong standby (it's behind or
 somehow incompatible), it should fail with an appropriate message.

That needs a way to define a group of standbys. There's nothing there
that makes them know about each other. That could fall out of the automated
registration of them in a shared catalog on the master, with this shared
catalog spread over (hard-coded) asynchronous replication (sync ==
disaster here). But there's no agreement on this feature yet.

Then you need a way to organise them in groups in this shared catalog,
and you need to ask your network admins to make it so that they can
communicate with each other.

Now say we have pg_ping (or another tool) returning the current recv,
applied and synced LSNs; it would be possible for any standby to figure
out which other ones must be shot in case you fail over here. The
failover command could list those other standbys in the group that you're
behind of, and with a force option still allow you to fail over to this
one. Then you have to STONITH the ones listed, but that's your problem
after all.

Then, of course, any standby that's not in the same group as the one
that you failed over to has to be checked and resynced.

 7. Should I choose to use archive files as well as streaming replication,
 the utilities to 

Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Itagaki Takahiro
On Mon, Oct 18, 2010 at 8:37 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Here's another version of the patch, v3. Changes:

CREATE EXTENSION is an interesting feature!
I just compiled it and tested it via SQL commands. Here is a quick report.

* There are some compiler warnings. You might be missing something in
copyfuncs and equalfuncs.

copyfuncs.c:3119: warning: ‘_copyCreateExtensionStmt’ defined but not used
copyfuncs.c:3130: warning: ‘_copyDropExtensionStmt’ defined but not used
equalfuncs.c:1593: warning: ‘_equalCreateExtensionStmt’ defined but not used
equalfuncs.c:1602: warning: ‘_equalDropExtensionStmt’ defined but not used
postinit.c: In function ‘CheckMyDatabase’:
postinit.c:341: warning: implicit declaration of function ‘ExtensionSetCVC’


* There might be some bugs in pg_dump:

postgres=# CREATE EXTENSION dblink;
NOTICE:  Installing extension 'dblink' from
'$PGHOME/share/contrib/dblink.sql', with user data
CREATE EXTENSION
postgres=# \q
$ pg_dump
pg_dump: schema with OID 2200 does not exist, but is needed for object 16411


* The example in the docs, CREATE EXTENSION hstore, dumps surprising
warning messages.
It would be better to avoid such messages, though it's not an issue
for EXTENSION itself.

WARNING:  => is deprecated as an operator name
DETAIL:  This name may be disallowed altogether in future versions of
PostgreSQL.
CONTEXT:  SQL statement /* contrib/hstore/hstore.sql.in */
(followed by dumped script)


* Docs sql-createextension.html has two odd links:

See Also
DROP EXTENSION, Table 9-61, Appendix F


-- 
Itagaki Takahiro



Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Dimitri Fontaine
Itagaki Takahiro itagaki.takah...@gmail.com writes:
 CREATE EXTENSION is interesting feature!
 I just compiled it and tested it via SQL commands. Here is a quick
 report.

Thanks for your time and interest!

 * There are some compiler warnings. You might be missing something in
 copyfuncs and equalfuncs.
 
 copyfuncs.c:3119: warning: ‘_copyCreateExtensionStmt’ defined but not used
 copyfuncs.c:3130: warning: ‘_copyDropExtensionStmt’ defined but not used
 equalfuncs.c:1593: warning: ‘_equalCreateExtensionStmt’ defined but not used
 equalfuncs.c:1602: warning: ‘_equalDropExtensionStmt’ defined but not used
 postinit.c: In function ‘CheckMyDatabase’:
 postinit.c:341: warning: implicit declaration of function ‘ExtensionSetCVC’
 

Ouch, sorry about that, I didn't spot them. Will fix and post a v4 patch soon.

 * There might be some bugs in pg_dump:
 
 postgres=# CREATE EXTENSION dblink;
 NOTICE:  Installing extension 'dblink' from
 '$PGHOME/share/contrib/dblink.sql', with user data
 CREATE EXTENSION
 postgres=# \q
 $ pg_dump
 pg_dump: schema with OID 2200 does not exist, but is needed for object 16411
 

I've hit that at some point but thought it was tied to the dependency bug
fixed in the v3 patch. I can reproduce it here; will fix too.

 * The example in the doc CREATE EXTENSION hstore dumps surprising
 warning messages,
 We would be better to avoid such messages, though it's not an issue
 for EXTENSION.
 
 WARNING:  => is deprecated as an operator name
 DETAIL:  This name may be disallowed altogether in future versions of
 PostgreSQL.
 CONTEXT:  SQL statement /* contrib/hstore/hstore.sql.in */
 (followed by dumped script)
 

I don't get a dumped script here; maybe that depends on verbosity
options?

 * Docs sql-createextension.html has two odd links:
 
 See Also
 DROP EXTENSION, Table 9-61, Appendix F
 

I didn't know if using xref would do, but if you find that odd, I will
replace with linkend and a custom label there.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] max_wal_senders must die

2010-10-19 Thread Stefan Kaltenbrunner

Josh Berkus wrote:

Hackers,

What purpose is served, exactly, by max_wal_senders?

In order for a standby to connect, it must have a superuser login, and 
replication connections must be enabled in pg_hba.conf.  How is having 
one more setting in one more file you have to enable on the master 
benefitting anyone?


Under what bizarre set of circumstances would anyone have runaway 
connections from replicas to the master?


Proposed that we simply remove this setting in 9.1.  The real maximum 
wal senders should be whatever max_connections is.


I disagree - limiting the maximum number of replication connections is
important for my use cases.
Replication connections are significantly more heavyweight than a
normal connection, and right now I, for example, simply use this setting to
prevent stupid mistakes (especially in virtualized/cloud-style environments).


What we really should look into is using a less privileged role - or a
dedicated replication role - and use the existing per-role connection
limit feature. That feature is unlimited by default, people can change
it as for every other role, and we can get rid of that GUC.



Stefan



Re: [HACKERS] max_wal_senders must die

2010-10-19 Thread Magnus Hagander
On Tue, Oct 19, 2010 at 13:14, Stefan Kaltenbrunner
ste...@kaltenbrunner.cc wrote:
 Josh Berkus wrote:

 Hackers,

 What purpose is served, exactly, by max_wal_senders?

 In order for a standby to connect, it must have a superuser login, and
 replication connections must be enabled in pg_hba.conf.  How is having one
 more setting in one more file you have to enable on the master benefitting
 anyone?

 Under what bizarre set of circumstances would anyone have runaway
 connections from replicas to the master?

 Proposed that we simply remove this setting in 9.1.  The real maximum wal
 senders should be whatever max_connections is.

 I disagree - limiting the maximum number of replication connections is
 important for my usecases.
 Replication connections are significantly more heavilyweight than a normal
 connection and right now I for example simply use this setting to prevent
 stupid mistakes (especially in virtualized^cloudstyle environments).

 What we really should look into is using a less privileged role - or
 dedicated replication role - and use the existing per role connection limit
 feature. That  feature is unlimited by default, people can change it like
 for every role and we can git rid of that guc.

+1 for being able to control it that way - that should keep it simple
for the newbie usecase, while retaining the ability for fine-grained
control for those who need it.

I think it's already on the TODO for 9.1 to use a separate role for it...

If we want something fixed *now*, should we perhaps just bump the
*default* value for max_wal_senders to 5 or something?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] comments on type attributes broken in 9.1devel

2010-10-19 Thread Robert Haas
On Tue, Oct 19, 2010 at 1:46 AM, Peter Eisentraut pete...@gmx.net wrote:
 The comment code refactoring c10575ff005c330d0475345621b7d381eb510c48
 broke comments on composite type attributes:

 COMMENT ON COLUMN foo.a is 'test';
 ERROR:  42809: foo is a composite type
 LOCATION:  heap_openrv, heapam.c:1110

 This is because as the code was moved from CommentAttribute() to
 get_object_address_attribute(), the call to relation_openrv() became a
 heap_openrv(), which causes the above error.

 Please check it out.

Thanks.  I changed that and added a regression test.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] max_wal_senders must die

2010-10-19 Thread Stefan Kaltenbrunner

Magnus Hagander wrote:

On Tue, Oct 19, 2010 at 13:14, Stefan Kaltenbrunner
ste...@kaltenbrunner.cc wrote:

Josh Berkus wrote:

Hackers,

What purpose is served, exactly, by max_wal_senders?

In order for a standby to connect, it must have a superuser login, and
replication connections must be enabled in pg_hba.conf.  How is having one
more setting in one more file you have to enable on the master benefitting
anyone?

Under what bizarre set of circumstances would anyone have runaway
connections from replicas to the master?

Proposed that we simply remove this setting in 9.1.  The real maximum wal
senders should be whatever max_connections is.

I disagree - limiting the maximum number of replication connections is
important for my usecases.
Replication connections are significantly more heavilyweight than a normal
connection and right now I for example simply use this setting to prevent
stupid mistakes (especially in virtualized^cloudstyle environments).

What we really should look into is using a less privileged role - or
dedicated replication role - and use the existing per role connection limit
feature. That  feature is unlimited by default, people can change it like
for every role and we can git rid of that guc.


+1 for being able to control it that wya - that should keep it simple
for the newbie usecase, while retaining the ability for fine-grained
control for those who need it.

I think it's already on the TODO for 9.1 to use a separate role for it...


I think we had some plans to do that - I wonder how hard it would be to
just do the dedicated role thing for now (maybe with the only constraint
that it can only be used on a replication connection) and look into
making it (technically) less privileged later?




If we want something fixed *now*, should we perhaps just bump the
*default* value for max_wal_senders to 5 or something?


Or accept -1 for unlimited and use that by default; that would fix part of
the complaint from Josh, but you would still have to restart the master
to implement a limit...




Stefan



Re: [HACKERS] leaky views, yet again

2010-10-19 Thread Robert Haas
On Oct 19, 2010, at 4:34 AM, KaiGai Kohei kai...@ak.jp.nec.com wrote:
 (2010/10/14 1:52), Tom Lane wrote:
 Robert Haasrobertmh...@gmail.com  writes:
 On Wed, Oct 13, 2010 at 11:45 AM, Tom Lanet...@sss.pgh.pa.us  wrote:
 That's all true, but you have to consider how much the obstacle actually
 gets in their way versus how painful it is on your end to create and
 maintain the obstacle.  I don't think this proposed patch measures up
 very well on either end of that tradeoff.
 
 I think it would behoove us to try to separate concerns about this
 particular patch from concerns about the viability of the whole
 approach.  Whether or not it's useful to do X is a different question
 than whether it can be done with few enough lines of code and/or
 whether this patch actually does it well.
 
 I think I left the wrong impression: I'm concerned about the whole
 approach.  I haven't even read this particular patch lately.  I think
 that trying to guarantee that the planner applies independent
 constraints in a particular order will be expensive, fragile, and prone
 to recurring security bugs no matter how it's implemented --- unless you
 do it by lobotomizing query pullup/pushdown, which seems unacceptable
 from a performance standpoint.
 
 Just to give one example of what this patch misses (probably; as I said
 I haven't read it lately), what about selectivity estimation?  One of
 the things we like to do when we have an otherwise-unknown function is
 to try it on all the histogram members and see what percentage yield
 true.  That might already represent enough of an information leak to be
 objectionable ... and yet, if we don't do it, the consequences for
 performance could be pretty horrid because we'll have to work without
 any reasonable selectivity estimate at all.  There are cases where this
 technique isn't applied at the moment but probably should be, which is
 why I characterize the leak-prevention idea as creating future security
 issues: doing that is a constraint that will have to be accounted for in
 every future planner change, and we are certain to miss the implications
 sometimes.
 
 Sorry, I might misread what you pointed out.

I think you're still misreading it. Want to try a third time?

 Do you see oprrest/oprjoin being internally invoked as a problem?
 Well, I also think it is a problem, as long as they can be installed
 by non-superusers. But, it is a separated problem from the row-level
 security issues.
 
 In my opinion, origin of the matter is incorrect checks on creation
 of operators. It allows non-superusers to define a new operator with
 restriction and join estimator as long as he has EXECUTE privilege
 on the functions. (not EXECUTE privilege of actual user who invokes
 this function on estimation phase!)
 Then, the optimizer may internally invoke these functions without
 any privilege checks on either the function or the table to be
 estimated. If a malicious one tries to make other users to invoke
 a trojan-horse, he can define a trappy operator with malicious
 estimator functions, cannot it?

This is a pretty poor excuse for a Trojan horse attack.

 

...Robert



Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Dimitri Fontaine
Itagaki Takahiro itagaki.takah...@gmail.com writes:
 * There are some compiler warnings. You might be missing something in
 copyfuncs and equalfuncs.

Fixed in v4, attached.

 * There might be some bugs in pg_dump:
 pg_dump: schema with OID 2200 does not exist, but is needed for object
 16411

Fixed in v4, attached.

The problem was with the query used in pg_dump to filter out relations
that are part of an extension, in getTables(). A composite type will
create an underlying relation of relkind 'c', but there was no direct
dependency to the extension, thus the filter failed to bypass it.

It's fixed by adding a direct internal dependency between the relation
and the extension, as that's so much easier than doing a recursive scan
of pg_depend in pg_dump SQL queries.

I will try to find out whether other cases have been forgotten like this one.

Note that as a result you need to drop extension dblink and create it
again in order to benefit from the fix. Also note that drop extension
follows the dependencies recursively, so it is not affected by this bug.

 * The example in the doc CREATE EXTENSION hstore dumps surprising
 warning messages,
 We would be better to avoid such messages, though it's not an issue
 for EXTENSION.
 
 WARNING:  => is deprecated as an operator name
 DETAIL:  This name may be disallowed altogether in future versions of
 PostgreSQL.
 CONTEXT:  SQL statement /* contrib/hstore/hstore.sql.in */
 (followed by dumped script)
 

I didn't realise that using SPI would mean dumping the whole script even
when only NOTICEs are raised. Maybe we want to protect against that in the
CREATE EXTENSION case, but I haven't looked at how to do it. Do we
want CREATE EXTENSION to be quieter than SPI usually is?

 * Docs sql-createextension.html has two odd links:
 
 See Also
 DROP EXTENSION, Table 9-61, Appendix F
 

Fixed in v4, attached.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



extension.v4.patch.gz
Description: pg_dump support for PostgreSQL extensions



Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Robert Haas
On Oct 19, 2010, at 8:33 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 I didn't realise that using SPI would mean dumping the all script in
 case of even NOTICEs. May be we want to protect against that in the
 CREATE EXTENSION case, but I didn't have a look at how to do it. Do we
 want CREATE EXTENSION to be more quiet than SPI usually is?

I don't see why.  I think the real action item here is to remove => from hstore.

...Robert


Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I don't see why.  I think the real action item here is to remove =>
 from hstore.

As input, consider that lots of extensions will create types that are
only a shell at the moment of the CREATE TYPE, and for each of those
 types you will see the (potentially > 1000 lines long) whole SQL script
dumped on the screen.

In the following output, I've filtered out the scripts, but one is written
out for each NOTICE message that you see:

dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension citext;" 2>&1 | egrep 'NOTICE|ERROR'
NOTICE:  Installing extension 'citext' from 
'/Users/dim/pgsql/exts/share/contrib/citext.sql', with user data
NOTICE:  return type citext is only a shell
NOTICE:  argument type citext is only a shell
NOTICE:  return type citext is only a shell
NOTICE:  argument type citext is only a shell
dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension cube;" 2>&1 | egrep 'NOTICE|ERROR'
NOTICE:  Installing extension 'cube' from 
'/Users/dim/pgsql/exts/share/contrib/cube.sql', with user data
NOTICE:  type cube is not yet defined
NOTICE:  return type cube is only a shell
NOTICE:  return type cube is only a shell
NOTICE:  argument type cube is only a shell
dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension earthdistance;" 2>&1 | egrep 'NOTICE|ERROR'
NOTICE:  Installing extension 'earthdistance' from 
'/Users/dim/pgsql/exts/share/contrib/earthdistance.sql', with user data
dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension fuzzystrmatch;" 2>&1 | egrep 'NOTICE|ERROR'
NOTICE:  Installing extension 'fuzzystrmatch' from 
'/Users/dim/pgsql/exts/share/contrib/fuzzystrmatch.sql', with user data
dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension hstore;" 2>&1 | egrep 'NOTICE|ERROR'
NOTICE:  Installing extension 'hstore' from 
'/Users/dim/pgsql/exts/share/contrib/hstore.sql', with user data
NOTICE:  return type hstore is only a shell
NOTICE:  argument type hstore is only a shell
NOTICE:  return type hstore is only a shell
NOTICE:  argument type hstore is only a shell
NOTICE:  return type ghstore is only a shell
NOTICE:  argument type ghstore is only a shell
dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension isn;" 2>&1 | egrep 'NOTICE|ERROR'
NOTICE:  Installing extension 'isn' from 
'/Users/dim/pgsql/exts/share/contrib/isn.sql', with user data
NOTICE:  type ean13 is not yet defined
NOTICE:  argument type ean13 is only a shell
NOTICE:  type isbn13 is not yet defined
NOTICE:  argument type isbn13 is only a shell
NOTICE:  type ismn13 is not yet defined
NOTICE:  argument type ismn13 is only a shell
NOTICE:  type issn13 is not yet defined
NOTICE:  argument type issn13 is only a shell
NOTICE:  type isbn is not yet defined
NOTICE:  argument type isbn is only a shell
NOTICE:  type ismn is not yet defined
NOTICE:  argument type ismn is only a shell
NOTICE:  type issn is not yet defined
NOTICE:  argument type issn is only a shell
NOTICE:  type upc is not yet defined
NOTICE:  argument type upc is only a shell
dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension ltree;" 2>&1 | egrep 'NOTICE|ERROR'
NOTICE:  Installing extension 'ltree' from 
'/Users/dim/pgsql/exts/share/contrib/ltree.sql', with user data
NOTICE:  type ltree is not yet defined
NOTICE:  argument type ltree is only a shell
NOTICE:  type lquery is not yet defined
NOTICE:  argument type lquery is only a shell
NOTICE:  type ltxtquery is not yet defined
NOTICE:  argument type ltxtquery is only a shell
NOTICE:  type ltree_gist is not yet defined
NOTICE:  argument type ltree_gist is only a shell

Just saying,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

PS: Oh, a repalloc() bug now. Will fix later in the afternoon, \dx or
select * from pg_extensions(); crashes with more than 10 extensions
installed in the v4 patch. That's what I get for doing that on a
Saturday evening.



Re: [HACKERS] leaky views, yet again

2010-10-19 Thread KaiGai Kohei

(2010/10/19 21:31), Robert Haas wrote:

On Oct 19, 2010, at 4:34 AM, KaiGai Koheikai...@ak.jp.nec.com  wrote:

(2010/10/14 1:52), Tom Lane wrote:

Robert Haasrobertmh...@gmail.com   writes:

On Wed, Oct 13, 2010 at 11:45 AM, Tom Lanet...@sss.pgh.pa.us   wrote:

That's all true, but you have to consider how much the obstacle actually
gets in their way versus how painful it is on your end to create and
maintain the obstacle.  I don't think this proposed patch measures up
very well on either end of that tradeoff.



I think it would behoove us to try to separate concerns about this
particular patch from concerns about the viability of the whole
approach.  Whether or not it's useful to do X is a different question
than whether it can be done with few enough lines of code and/or
whether this patch actually does it well.


I think I left the wrong impression: I'm concerned about the whole
approach.  I haven't even read this particular patch lately.  I think
that trying to guarantee that the planner applies independent
constraints in a particular order will be expensive, fragile, and prone
to recurring security bugs no matter how it's implemented --- unless you
do it by lobotomizing query pullup/pushdown, which seems unacceptable
from a performance standpoint.

Just to give one example of what this patch misses (probably; as I said
I haven't read it lately), what about selectivity estimation?  One of
the things we like to do when we have an otherwise-unknown function is
to try it on all the histogram members and see what percentage yield
true.  That might already represent enough of an information leak to be
objectionable ... and yet, if we don't do it, the consequences for
performance could be pretty horrid because we'll have to work without
any reasonable selectivity estimate at all.  There are cases where this
technique isn't applied at the moment but probably should be, which is
why I characterize the leak-prevention idea as creating future security
issues: doing that is a constraint that will have to be accounted for in
every future planner change, and we are certain to miss the implications
sometimes.


Sorry, I might misread what you pointed out.


I think you're still misreading it.


Hmm. In my understanding, it seemed to me he was concerned about possibly leaky
estimator functions, and so mentioned the horrible performance degradation if
we don't allow these functions to be executed.
So, I suggested an idea that requires all estimator functions to be installed
by superusers; that would let us assume they are safe enough.


Want to try a third time?


However, it is actually still unclear to me... :-(


Do you see oprrest/oprjoin being internally invoked as a problem?
Well, I also think it is a problem, as long as they can be installed
by non-superusers. But, it is a separated problem from the row-level
security issues.

In my opinion, origin of the matter is incorrect checks on creation
of operators. It allows non-superusers to define a new operator with
restriction and join estimator as long as he has EXECUTE privilege
on the functions. (not EXECUTE privilege of actual user who invokes
this function on estimation phase!)
Then, the optimizer may internally invoke these functions without
any privilege checks on either the function or the table to be
estimated. If a malicious one tries to make other users to invoke
a trojan-horse, he can define a trappy operator with malicious
estimator functions, cannot it?


This is a pretty poor excuse for a Trojan horse attack.





...Robert




--
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Robert Haas
On Tue, Oct 19, 2010 at 9:07 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 I don't see why.  I think the real action item here is to remove =>
 from hstore.

 As input, consider that lots of extensions will create types that are
 only a shell at the moment of the CREATE TYPE, and for each of those
 types you will see the (potentially > 1000 lines long) whole SQL script
 dumped on the screen.

 In the following script, I've filtered out the scripts, but it's written
 out for each NOTICE message that you see:

 dim ~/dev/PostgreSQL/postgresql-extension psql -c "create extension citext;" 2>&1 | egrep 'NOTICE|ERROR'
 NOTICE:  Installing extension 'citext' from 
 '/Users/dim/pgsql/exts/share/contrib/citext.sql', with user data
 NOTICE:  return type citext is only a shell
 NOTICE:  argument type citext is only a shell
 NOTICE:  return type citext is only a shell
 NOTICE:  argument type citext is only a shell

Well, perhaps it's reasonable to suppress NOTICE messages then.  That
particular message could perhaps be quieted, but we probably don't
want to do that with every NOTICE that might occur (e.g. those that
you might get on table creation for sequences, indices, etc.).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: extensible enums

2010-10-19 Thread Alvaro Herrera
Excerpts from Magnus Hagander's message of mar oct 19 05:23:31 -0300 2010:
 
 He certainly could, but github provides more features and a nicer look
 :-) And since it's git, it doesn't matter where the repo is.

Yeah.  If you have a checked out copy of the GIT repo (preferably one
with the master branch in it), try this:

git remote add venum git://github.com/adunstan/postgresql-dev.git
git fetch venum venums:venums
git checkout venums

Then you have the patch in all its Git glory, in branch venums.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] create tablespace fails silently, or succeeds improperly

2010-10-19 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
 Looking at the pg_upgrade code some more, I found that it was not
 removing the PG_VERSION file when deleting <= 8.4 tablespace files.
 This might confuse administrators so the attached patch adds the removal
 of PG_VERSION.  I would like to apply this to master and 9.0.X.
 
 ... why is that a good idea?

 The script already deletes the 8.4 database directories, but leaves
 PG_VERSION behind.  Why keep it when all the 8.4 data is gone?  The
 script also deletes PGDATA for 8.4, so there is nothing left pointing to
 that directory.

Oh, I misunderstood: I thought you were proposing to do this as an
automatic action inside pg_upgrade.  If it's part of the cleanup script,
it's fine.

regards, tom lane



Re: [HACKERS] knngist - 0.8

2010-10-19 Thread David Fetter
On Mon, Oct 18, 2010 at 04:13:04PM +0100, Mark Cave-Ayland wrote:
 David Fetter wrote:
 
 For my vote, I'd prefer either the Oid of a custom type or an array
 of Oid, Datum pairs - i.e. something we can extend in the future if
 required.
 
 This sounds a lot like a foreign key to another table.  Are you not
 proposing doing that because of performance considerations?
 
 Cheers,
 David.
 
 Well, in PostGIS a typmod contains 3 pieces of information:
 
 1) the SRID
 2) the dimension
 3) the geometry type
 
 The SRID is technically already a foreign key into another table,
 with dimension and SRID as other information. At the moment, we
 bit-pack the dimension and geometry type into the SRID and use that
 as the typmod but this only leaves 21 bits IIRC for the SRID. The
 additional complication is that SRIDs at the higher end of the range
 are allowed for anyone to use, and so people may have their own
 customised spheroids defined in this region of the table.
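
To make the arithmetic concrete, here is a small standalone sketch of that
kind of bit-packing; the field widths and layout are invented for
illustration and are not PostGIS's actual encoding:

#include <stdio.h>
#include <stdint.h>

/* Invented layout: 21 bits of SRID, 6 bits of geometry type and 2 bits of
 * dimension information, squeezed into one non-negative 32-bit typmod. */
#define SRID_BITS 21
#define TYPE_BITS 6
#define DIM_BITS  2

static int32_t
pack_typmod(uint32_t srid, uint32_t gtype, uint32_t dims)
{
    return (int32_t) ((srid  & ((1u << SRID_BITS) - 1)) |
                      ((gtype & ((1u << TYPE_BITS) - 1)) << SRID_BITS) |
                      ((dims  & ((1u << DIM_BITS)  - 1)) << (SRID_BITS + TYPE_BITS)));
}

int main(void)
{
    int32_t tm = pack_typmod(4326, 3 /* some geometry type code */, 2);

    printf("typmod = %d\n", tm);
    printf("srid unpacked = %d\n", tm & ((1 << SRID_BITS) - 1));
    printf("largest SRID that still fits = %d\n", (1 << SRID_BITS) - 1);
    return 0;
}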

Sounds like you need space for all of these separately, and once you
have that, you'll probably start thinking about other possible pieces
of information you could store, especially as you start to store new
data types and have new operators on them.

It sounds to me as though the typmod, or something like it, needs to
be able to store, in general, completely arbitrary information,
although not likely much over a page or two of memory.  Cf.  Robert
Haas's recent comment about Klingon.  While we could make this a
completely opaque structure from the SQL level, I'm not sure it's a
good idea.

 If we had a foreign key into another table, we'd need to ensure that
 no one could tamper with it as otherwise all chaos would break loose,
 e.g. breaking the geometry type constraint on a column.

Our system catalog tables have the nice property that no one can
accidentally modify them by hand, and all chaos, as you put it, can
reasonably be expected to occur should they choose to do so.

 Heck, we even have people deleting the geometry_columns table
 sometimes because they are not aware of what it does.  By storing
 this information in the PG catalog then this can't happen, plus the
 information is available easily in Form_pg_attribute without having
 to implement our own cache, with its own related problems such as
 how/when to invalidate etc.

:)

 There is also a chance that we'd want to include additional
 information in the future related to geometry validity, for example,
 which would mean further reducing the range allowed within the
 spatial_ref_sys table in its existing form.

Right.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] [PATH] Idle in transaction cancellation

2010-10-19 Thread Kevin Grittner
Andres Freund and...@anarazel.de wrote:
 
 Here is a proposed patch which enables cancellation of $subject.
 
Cool.  Some enhancements we'd like to do to Serializable Snapshot
Isolation (SSI), should the base patch make it in, would require
this capability.
 
 Currently it does *not* report any special error message to the
 client if it 
 
 starts sending commands in an (unbeknownst to it) failed
 transaction, but just the normal 25P02: current transaction is
 aborted... message.
 
 It shouldn't be hard to add that and I will propose a patch if
 people would like it (I personally am not very interested, but I
 can see people validly wanting it)
 
For SSI purposes, it would be highly desirable to be able to set the
SQLSTATE and message generated when the canceled transaction
terminates.
 
 but I would like to have some feedback on the patch first.
 
I'll take a look when I can, but it may be a couple weeks from now.
 
-Kevin



Re: [HACKERS] [PATH] Idle in transaction cancellation

2010-10-19 Thread Robert Haas
On Tue, Oct 19, 2010 at 10:18 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Andres Freund and...@anarazel.de wrote:

 Here is a proposed patch which enables cancellation of $subject.

 I'll take a look when I can, but it may be a couple weeks from now.

How about adding it to
https://commitfest.postgresql.org/action/commitfest_view/open ?

Seems like a good feature, at least from 20,000 feet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Robert Haas robertmh...@gmail.com writes:
  I don't see why.  I think the real action item here is to remove =>
 from hstore.

 As input, consider that lots of extensions will create types that are
 only a shell at the moment of the CREATE TYPE, and for each of those
 types you will see the (potentially  1000 lines long) whole SQL script
 dumped on the screen.

Only if the script is intentionally noisy.  The right fix here is
probably to bump up the min message level while running the script.

regards, tom lane



Re: [HACKERS] leaky views, yet again

2010-10-19 Thread Tom Lane
KaiGai Kohei kai...@kaigai.gr.jp writes:
 (2010/10/19 21:31), Robert Haas wrote:
 I think you're still misreading it.

 Hmm. In my understanding, it seems to me he concerned about possible leaky
 estimate functions, so he mentioned the horrible performance degrading, if
 we don't allow to execute these functions.

No, it wasn't about estimator functions at all, it was about what we do
in the absence of an estimator.

 So, I suggested an idea that enforces all estimate functions being installed
 by superusers; it enables us to assume they are enough safe.

Even if we were willing to accept a limitation as stringent as that,
preventing people from installing estimators is hardly going to cure
the problem.

regards, tom lane



Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-19 Thread Robert Haas
On Sat, Oct 16, 2010 at 12:59 PM, Terry Laurenzo t...@laurenzo.org wrote:
    - It is directly iterable without parsing and/or constructing an AST
    - It is its own representation.  If iterating and you want to tear-off a
 value to be returned or used elsewhere, its a simple buffer copy plus some
 bit twiddling.
    - It is conceivable that clients already know how to deal with BSON,
 allowing them to work with the internal form directly (ala MongoDB)
    - It stores a wider range of primitive types than JSON-text.  The most
 important are Date and binary.

When last I looked at that, it appeared to me that what BSON could
represent was a subset of what JSON could represent - in particular,
that it had things like a 32-bit limit on integers, or something along
those lines.  Sounds like it may be neither a superset nor a subset,
in which case I think it's a poor choice for an internal
representation of JSON.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-19 Thread Andrew Dunstan



On 10/19/2010 10:44 AM, Robert Haas wrote:

On Sat, Oct 16, 2010 at 12:59 PM, Terry Laurenzot...@laurenzo.org  wrote:

- It is directly iterable without parsing and/or constructing an AST
- It is its own representation.  If iterating and you want to tear-off a
value to be returned or used elsewhere, its a simple buffer copy plus some
bit twiddling.
- It is conceivable that clients already know how to deal with BSON,
allowing them to work with the internal form directly (ala MongoDB)
- It stores a wider range of primitive types than JSON-text.  The most
important are Date and binary.

When last I looked at that, it appeared to me that what BSON could
represent was a subset of what JSON could represent - in particular,
that it had things like a 32-bit limit on integers, or something along
those lines.  Sounds like it may be neither a superset nor a subset,
in which case I think it's a poor choice for an internal
representation of JSON.


Yeah, if it can't handle arbitrary precision numbers as has previously 
been stated it's dead in the water for our purposes, I think.


cheers

andrew



Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Only if the script is intentionally noisy.  The right fix here is
 probably to bump up the min message level while running the script.

You mean doing that from the SQL script itself (using SET) or in the
pg_execute_from_file() code? My guess is the former, but...

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] O_DSYNC broken on MacOS X?

2010-10-19 Thread Bruce Momjian
Robert Haas wrote:
 On Thu, Oct 7, 2010 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  Proposed doc patch attached.
 
  discusesed?  Otherwise +1
 
 Woops, thanks.  Committed with that change.  I back-patched it back to
 8.3, which is as far as it applied with only minor conflicts.

I have applied the attached patch which mentions tools/fsync for testing
fsync method performance, and clarified the new paragraph about sync
methods.
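
For anyone without the source tree handy, the kind of measurement
src/tools/fsync performs can be approximated with a trivial standalone
program like the rough sketch below (it is not the tool itself, just the
same idea: time a series of small writes, each followed by an fsync):

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

int main(void)
{
    const int n = 1000;
    char buf[8192];
    struct timeval start, end;

    memset(buf, 0, sizeof(buf));

    int fd = open("fsync_test.out", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }

    gettimeofday(&start, NULL);
    for (int i = 0; i < n; i++)
    {
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf)) { perror("write"); return 1; }
        if (fsync(fd) != 0) { perror("fsync"); return 1; }
    }
    gettimeofday(&end, NULL);

    close(fd);
    unlink("fsync_test.out");

    double secs = (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6;
    printf("%d write+fsync cycles in %.3f s (%.0f per second)\n", n, secs, n / secs);
    return 0;
}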

I am glad to see we are beefing up this area of the docs.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 87d182c..f178835 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1569,13 +1569,13 @@ SET ENABLE_SEQSCAN TO OFF;
</itemizedlist>
<para>
 Not all of these choices are available on all platforms.
-The default is the first method in the above list that is supported
-by the platform.  The default is not necessarily best; it may be
-necessary to change this setting, or other aspects of your system
-configuration, in order to create a crash-safe configuration, as
-discussed in <xref linkend="wal-reliability">, or to achieve best
-performance.
 The <literal>open_</>* options also use <literal>O_DIRECT</> if available.
+The default is the first method in the above list that is supported
+by the platform.  The default is not necessarily ideal; it might be
+necessary to change this setting or other aspects of your system
+configuration in order to create a crash-safe configuration or
+achieve optimal performance.
+These aspects are discussed in <xref linkend="wal-reliability">.
 The utility <filename>src/tools/fsync</> in the PostgreSQL source tree
 can do performance testing of various fsync methods.
 This parameter can only be set in the <filename>postgresql.conf</>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 5678d6e..7b50bdd 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -530,11 +530,13 @@
   <para>
    The <xref linkend="guc-wal-sync-method"> parameter determines how
    <productname>PostgreSQL</productname> will ask the kernel to force
-   <acronym>WAL</acronym> updates out to disk.
-   With the exception of <literal>fsync_writethrough</>, which can sometimes
-   force a flush of the disk cache even when other options do not do so,
-   all the options should be the same in terms of reliability.
-   However, it's quite platform-specific which one will be the fastest.
+   <acronym>WAL</acronym> updates out to disk.
+   All the options should be the same in terms of reliability, with
+   the exception of <literal>fsync_writethrough</>, which can sometimes
+   force a flush of the disk cache even when other options do not do so.
+   However, it's quite platform-specific which one will be the fastest;
+   you can test option speeds using the utility <filename>src/tools/fsync</>
+   in the PostgreSQL source tree.
    Note that this parameter is irrelevant if <varname>fsync</varname>
    has been turned off.
   </para>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist - 0.8

2010-10-19 Thread Teodor Sigaev

3) 3-rd boolean column (amopopr, amopfamily, amoporder)
- could be two records per operator
+ operator could be used in both roles
+ strategy number could be different for different roles



How can #3 work at all?  It's ignoring a couple of critical index
columns.  In particular, I believe the sticking point here is this
unique index:

 pg_amop_fam_strat_index UNIQUE, btree (amopfamily, amoplefttype, 
amoprighttype, amopstrategy)

I believe that the columns (amoplefttype, amoprighttype) are fixed for an operator 
and cannot be changed. So, two entries for the same operator in one opfamily should 
differ in strategy number for the different roles. It also gives a direct way to 
transfer knowledge about the role to the consistent method of GiST.



I'm not terribly thrilled with using a boolean here in any case.
Now that we have two roles an operator might play in an opclass,
who's to say there might not be more roles someday?  We should use
a column type that will support more than two roles without basic
rejiggering.


It's easy to change to a char type; for now only 's' (search) and 'o' (order) 
characters will be allowed.



BTW, have we discussed the idea of embedding the role in the strategy
number?  For example, require regular operators to have strategy


In this scheme, one operator could not be used in more than one role. Right now 
that's not a strict limitation, because we still don't have an example of a 
boolean distance.


Anyhow, it needs to distinguish roles while the IndexPath is built. So,
op_in_opfamily/get_op_opfamily_strategy/get_op_opfamily_properties and friends 
will accept an extra argument indicating the role of interest.
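
For illustration, here is a tiny sketch of what embedding the role in the
strategy number could look like (the base constant and helper names are made
up for this example; they are not from any posted patch):

#include <stdbool.h>

/*
 * Hypothetical encoding: search operators keep their usual strategy
 * numbers, while ordering (distance) operators are registered at
 * strategy + KNN_ORDER_STRATEGY_BASE.  Both the role and the "real"
 * strategy can then be recovered cheaply, e.g. inside consistent.
 */
#define KNN_ORDER_STRATEGY_BASE   100

static inline bool
strategy_is_ordering(int strategy)
{
    return strategy >= KNN_ORDER_STRATEGY_BASE;
}

static inline int
strategy_base_number(int strategy)
{
    return strategy_is_ordering(strategy)
        ? strategy - KNN_ORDER_STRATEGY_BASE
        : strategy;
}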

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist - 0.8

2010-10-19 Thread Teodor Sigaev

Thinking about it that way, perhaps we could add an integer column
amop_whats_it_good_for that gets used as a bit field.  That wouldn't
require changing the index structure, although it might break some
other things.



OPERATOR strategy_number ( op_type [ , op_type ] ) [ FOR { SEARCH |
ORDER } [, ...] ]


It's a very desirable thing to be able to distinguish roles in the consistent method 
of GiST: computation of distance could be very expensive. The only way to provide 
that in the current GiST interface is the strategy number. Of course, we could 
add a 6th argument to consistent to indicate the role, but I don't think that's a 
good decision.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-19 Thread Greg Smith

Josh Berkus wrote:
It is critical that we make replication easier to set up, administrate 
and monitor than it currently is.  In my conversations with people, 
this is more important to our users and the adoption of PostgreSQL 
than synchronous replication is.


You should enjoy one of the patches we're furiously working on then, 
which is aiming at some of the administration and monitoring pieces 
here.  I have my own grand vision of how easy replication should be to 
setup too.  Visions and plans are nice, but building functional pieces 
of them and delivering them to the community is what actually moves 
PostgreSQL forward.  So far, multiple people have done that for sync 
rep, and what we're supposed to be focused on at this stage in the 
development cycle is finishing the work related to the open CommitFest 
item that includes that.


I find this launch into a new round of bike-shedding a bit distracting.  
If you want this to be easier to use, which it's obvious to any observer 
it should be because what's delivered in 9.0 is way too complicated, 
please work on finding development resources to assign to that problem.  
Because that's the bottleneck on simplifying things, not ideas about 
what to do.  I would recommend finding or assigning a developer to work 
on integrating base backup into the streaming protocol as the biggest 
single thing that would improve the built-in replication.  All of the 
rest of the trivia about what knobs to set and such are tiny details 
that make for only a minor improvement until that's taken care of.


--
Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-19 Thread Terry Laurenzo
Agreed.  BSON was born out of implementations that either lacked arbitrary
precision numbers or had a strong affinity to an int/floating point way of
thinking about numbers.  I believe that if BSON had an arbitrary precision
number type, it would be a proper superset of JSON.

As an aside, the max range of an int in BSON is 64 bits.  Back to my original
comment that BSON was grown instead of designed, it looks like both the
32bit and 64bit integers were added late in the game and that the original
designers perhaps were just going to store all numbers as double.

Perhaps we should enumerate the attributes of what would make a good binary
encoding?
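
To make the precision point concrete, here is a small stand-alone C sketch
(purely illustrative, not part of the patch) that classifies a decimal
literal against the fixed-width types a BSON-style encoding offers:

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the narrowest of int32/int64/double that can hold the literal
 * losslessly, or note that arbitrary precision is required.
 */
static const char *
classify_number(const char *literal)
{
    char       *end;
    long long   v;

    errno = 0;
    v = strtoll(literal, &end, 10);
    if (errno == 0 && *end == '\0')
        return (v >= INT32_MIN && v <= INT32_MAX) ? "int32" : "int64";

    /* Not an in-range integer: does a double round-trip the digits? */
    {
        double      d = strtod(literal, &end);
        char        buf[64];

        if (*end != '\0')
            return "not a number";
        snprintf(buf, sizeof(buf), "%.17g", d);
        if (strcmp(buf, literal) == 0)
            return "double";
    }
    return "needs arbitrary precision";
}

int
main(void)
{
    printf("%s\n", classify_number("42"));                      /* int32 */
    printf("%s\n", classify_number("9223372036854775807"));     /* int64 */
    printf("%s\n", classify_number("98765432109876543210"));    /* arbitrary */
    return 0;
}

The last literal is the interesting case: it falls through every fixed-width
bucket, which is exactly what forces a numeric-style representation.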

Terry

On Tue, Oct 19, 2010 at 8:57 AM, Andrew Dunstan and...@dunslane.net wrote:



 On 10/19/2010 10:44 AM, Robert Haas wrote:

 On Sat, Oct 16, 2010 at 12:59 PM, Terry Laurenzot...@laurenzo.org  wrote:

- It is directly iterable without parsing and/or constructing an AST
- It is its own representation.  If iterating and you want to tear-off
 a
 value to be returned or used elsewhere, its a simple buffer copy plus
 some
 bit twiddling.
- It is conceivable that clients already know how to deal with BSON,
 allowing them to work with the internal form directly (ala MongoDB)
- It stores a wider range of primitive types than JSON-text.  The most
 important are Date and binary.

 When last I looked at that, it appeared to me that what BSON could
 represent was a subset of what JSON could represent - in particular,
 that it had things like a 32-bit limit on integers, or something along
 those lines.  Sounds like it may be neither a superset nor a subset,
 in which case I think it's a poor choice for an internal
 representation of JSON.


 Yeah, if it can't handle arbitrary precision numbers as has previously been
 stated it's dead in the water for our purposes, I think.

 cheers

 andrew



Re: [HACKERS] O_DSYNC broken on MacOS X?

2010-10-19 Thread Bruce Momjian
Greg Smith wrote:
 A.M. wrote:
  Perhaps a simpler tool could run a basic fsyncs-per-second test and prompt 
  the DBA to check that the numbers are within the realm of possibility.

 
 This is what the test_fsync utility that already ships with the database 
 should be useful for.  The way Bruce changed it to report numbers in 
 commits/second for 9.0 makes it a lot easier to use for this purpose 
 than it used to be.  I think there's still some additional improvements 
 that could be made there, but it's a tricky test to run accurately.  The 

test_fsync was designed to test various things like whether several
open-sync writes are better than two writes and an fsync, and whether you
can fsync data written on a different file descriptor.  It is really a
catch-all test right now, not one specific for choosing sync methods.
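
For anyone following along, the heart of any such test is just timing a
write-plus-fsync loop; a stripped-down sketch (not the actual src/tools/fsync
code) looks roughly like this:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define LOOPS       1000
#define TESTBLKSZ   8192

int
main(void)
{
    char            buf[TESTBLKSZ];
    struct timeval  start, stop;
    double          elapsed;
    int             fd, i;

    memset(buf, 'x', sizeof(buf));
    fd = open("fsync_test.out", O_RDWR | O_CREAT, 0600);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    gettimeofday(&start, NULL);
    for (i = 0; i < LOOPS; i++)
    {
        /* rewrite the same block, then force it to stable storage */
        if (lseek(fd, 0, SEEK_SET) < 0 ||
            write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf) ||
            fsync(fd) < 0)
        {
            perror("write/fsync");
            return 1;
        }
    }
    gettimeofday(&stop, NULL);

    elapsed = (stop.tv_sec - start.tv_sec) +
              (stop.tv_usec - start.tv_usec) / 1000000.0;
    printf("%d fsyncs in %.3f s = %.1f fsyncs/second\n",
           LOOPS, elapsed, LOOPS / elapsed);
    close(fd);
    unlink("fsync_test.out");
    return 0;
}

A drive or controller that acknowledges fsync from a volatile cache will
report implausibly high numbers here, which is the kind of suspiciousness
being discussed elsewhere in this thread.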

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/JS

2010-10-19 Thread Greg
Cool, will take a look. Thanks!


--- On Tue, 19/10/10, Itagaki Takahiro itagaki.takah...@gmail.com wrote:

From: Itagaki Takahiro itagaki.takah...@gmail.com
Subject: Re: [HACKERS] PL/JS
To: Terri Laurenzo t...@laurenzo.org
Cc: Greg grigo...@yahoo.co.uk, pgsql-hackers@postgresql.org 
pgsql-hackers@postgresql.org
Date: Tuesday, 19 October, 2010, 3:18

On Tue, Oct 19, 2010 at 9:04 AM, Terri Laurenzo t...@laurenzo.org wrote:
 Is there a pl/js yet?  I've not been able to find anything
 but was thinking about putting something together based on v8 or spidermonkey.

I'm working on PL/v8, which is a PL/JS (PL/JavaScript)
implementation based on v8.
http://code.google.com/p/plv8js/

--
Itagaki Takahiro



  

Re: [HACKERS] O_DSYNC broken on MacOS X?

2010-10-19 Thread A.M.

On Oct 19, 2010, at 11:22 AM, Bruce Momjian wrote:

 Greg Smith wrote:
 A.M. wrote:
 Perhaps a simpler tool could run a basic fsyncs-per-second test and prompt 
 the DBA to check that the numbers are within the realm of possibility.
 
 
 This is what the test_fsync utility that already ships with the database 
 should be useful for.  The way Bruce changed it to report numbers in 
 commits/second for 9.0 makes it a lot easier to use for this purpose 
 than it used to be.  I think there's still some additional improvements 
 that could be made there, but it's a tricky test to run accurately.  The 
 
 test_fsync was designed to test various things like whether several
 open-sync writes are better than two writes and an fsync, and whether you
 can fsync data written on a different file descriptor.  It is really a
 catch-all test right now, not one specific for choosing sync methods.

I am working on simplifying the test_fsync tool and making it a contrib 
function which can be run by the superuser based on the configured fsync 
method. That way, the list can ask a user to run it to report fsyncs-per-second 
for suspiciousness. The goal is to make it more accessible. I was also thinking 
about adding some notes along the lines of "Your drive fsync speed rates 
between a 5400 RPM SATA drive and a 7200 RPM SATA drive." or "Your drive fsync 
speed rates as high as RAM - your fsync method may be wrong."

Currently, the test tool is not even compiled by default.

Thoughts?

Cheers,
M
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist - 0.8

2010-10-19 Thread Teodor Sigaev

combination of parameters is one that it can handle.  I'm thinking
perhaps in lieu of a boolean, we can add another indexam method which,
if not InvalidOid, gets called when we're wondering about whether a
given clause is something that the index can order by.  Although
knngist focuses on a the ORDER BY col OP constant case, which


Hmm, interesting idea. To be more general, amcanorder (without byop suffix) 
could be eliminated too.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 Only if the script is intentionally noisy.  The right fix here is
 probably to bump up the min message level while running the script.

 You mean doing that from the SQL script itself (using SET) or in the
 pg_execute_from_file() code? My guess is the former, but...

You could argue that either way I guess.  The script knows what it
needs, but OTOH just about every extension there is will probably
be generating useless NOTICEs unless something is done, so maybe
the extension management code should take care of it for them.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-19 Thread Josh Berkus



You should enjoy one of the patches we're furiously working on then,
which is aiming at some of the administration and monitoring pieces
here.


Great, glad to hear it!  Would you be willing to go into detail?


I have my own grand vision of how easy replication should be to
setup too.


So, share it.  I'd look forward to hearing it, especially since your 
vision probably takes synch rep and quorum commit into account, which 
mine doesn't.   If not here, then on your blog.



Visions and plans are nice, but building functional pieces of
them and delivering them to the community is what actually moves
PostgreSQL forward.


*shrug*.  Robert asked me to write it up for the list based on the 
discussions around synch rep.  Now you're going to bash me for doing so?


Many of the goals I described will mean removing knobs and changing 
defaults, or even foregoing fine-grained control entirely.  If we don't 
have agreement that simplifying replication is a high-priority goal, 
then it won't happen; anyone submitting a patch will be 
this-or-that-use-cased to death and will give up.


For that matter, I'm not sure that everyone agrees that simplification 
is a worthwhile goal.  For example, somewhere between 9.0beta4 and final 
release, someone changed the defaults for max_wal_senders and 
hot_standby to 0 and off.  I don't remember there even being 
discussion about it.


The discussion around synch rep certainly showed that the natural 
tendency of this list is to add complexity with each incarnation of a 
feature.  It's the easiest way to accomodate conflicting use cases, but 
it's not the best way.


--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] create tablespace fails silently, or succeeds improperly

2010-10-19 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
  Looking at the pg_upgrade code some more, I found that it was not
  removing the PG_VERSION file when deleting = 8.4 tablespace files. 
  This might confuse administrators so the attached patch adds the removal
  of PG_VERSION.  I would like to apply this to master and 9.0.X.
  
  ... why is that a good idea?
 
  The script already deletes the 8.4 database directories, but leaves
  PG_VERSION behind.  Why keep it when all the 8.4 data is gone?  The
  script also dates PGDATA for 8.4, so there is nothing left pointing to
  that directory.
 
 Oh, I misunderstood: I thought you were proposing to do this as an
 automatic action inside pg_upgrade.  If it's part of the cleanup script,
 it's fine.

Yes, never automatic.  You can always roll back after pg_upgrade
completes.  Once you start the new server, only then can you not go
back.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-19 Thread Greg Smith

Josh Berkus wrote:
*shrug*.  Robert asked me to write it up for the list based on the 
discussions around synch rep.  Now you're going to bash me for doing so?


Sorry, next time I'll make sure to bash Robert too.  I don't have any 
problems with the basic ideas you're proposing, just concerns about when 
the right time to get into that whole giant subject is and who is going 
to work on it.


--
Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-19 Thread Josh Berkus

Stefan, Dimitri,

 I disagree - limiting the maximum number of replication connections is
 important for my usecases.

Can you explain more?  I clearly don't understand your use case.


If we want something fixed *now*, should we perhaps just bump the
*default* value for max_wal_senders to 5 or something?


If we're not going to remove it, then the default should be -1, which 
should mean same as max_connections.


--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-19 Thread Greg Smith

Josh Berkus wrote:
Under what bizarre set of circumstances would anyone have runaway 
connections from replicas to the master?


Cloud computing deployments where additional replicas are brought up 
automatically in response to demand.  It's easy to imagine a situation 
where a standby instance is spawned, starts to sync, and that additional 
load triggers *another* standby to come on board; repeat until the 
master is doing nothing but servicing standby sync requests.  
max_wal_senders provides a safety valve for that.  


I think Magnus's idea to bump the default to 5 triages the worst of the 
annoyance here, without dropping the feature (which has uses) or waiting 
for new development to complete.  I'd be in favor of just committing 
that change right now, before it gets forgotten about, and then if 
nobody else gets around to further work at least something improved here 
for 9.1.


--
Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 You could argue that either way I guess.  The script knows what it
 needs, but OTOH just about every extension there is will probably
 be generating useless NOTICEs unless something is done, so maybe
 the extension management code should take care of it for them.

Either way is the key here too, so please find attached a revised (v5)
patch which will force log_min_messages and client_min_messages to
WARNING while the script is run.

v5 also contains the \dx bug fix about repalloc.

Please note that I didn't touch any contrib yet, so that hstore will
still dump its full script here:

dim=# create extension isn;
NOTICE:  Installing extension 'isn' from 
'/Users/dim/pgsql/exts/share/contrib/isn.sql', with user data
CREATE EXTENSION
dim=# create extension hstore;
NOTICE:  Installing extension 'hstore' from 
'/Users/dim/pgsql/exts/share/contrib/hstore.sql', with user data
WARNING:  => is deprecated as an operator name
DETAIL:  This name may be disallowed altogether in future versions of 
PostgreSQL.
CONTEXT:  SQL statement /* contrib/hstore/hstore.sql.in */

The script follows here. Maybe 9.1 is when to deprecate => as an
operator name in the hstore official extension? :)

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Of course the git repo is still maintained for those prefering it:
  
http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension



extension.v5.patch.gz
Description: pg_dump support for PostgreSQL extensions

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Alvaro Herrera
Excerpts from Dimitri Fontaine's message of dom oct 17 16:30:47 -0300 2010:

 The bulk of it is now short enough to be inlined in the mail, and if you
 have more comments I guess they'll be directed at this portion of the
 patch, so let's make it easy:
 
 /*
  * We abuse some internal knowledge from spi.h here. As we don't know
  * which queries are going to get executed, we don't know what to expect
  * as an OK return code from SPI_execute().  We assume that
  * SPI_OK_CONNECT, SPI_OK_FINISH and SPI_OK_FETCH are quite improbable,
  * though, and the errors are negatives.  So a valid return code is
  * considered to be SPI_OK_UTILITY or anything from there.
  */
 SPI_connect();
 if (SPI_execute(query_string, false, 0) < SPI_OK_UTILITY)
 ereport(ERROR,
 (errcode(ERRCODE_DATA_EXCEPTION),
  errmsg("File '%s' contains invalid query", filename)));
 SPI_finish();

SPI_OK_FETCH is indeed improbable -- it's not used anywhere in the SPI
routines, and hasn't been for ages.  SPI_OK_CONNECT and SPI_OK_FINISH
are only issued by SPI_connect and SPI_finish, so presumably they can't
be returned by a script.

The error message wording needs some work; maybe
errmsg("file \"%s\" could not be executed", filename)

SPI_execute sets the query as errcontext, which is good; but apparently
there is no error position, which is bad.  I'm not sure how feasible is
to fix this.  (Not this patch's responsibility anyway.)

I happened to notice that there are several pieces of code that are
calling SPI_connect and SPI_finish without checking the return value,
notably the FTS code and xml.c.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Alvaro Herrera
Excerpts from Dimitri Fontaine's message of vie oct 15 16:15:23 -0300 2010:

 The cfparser patch didn't change, the current version is still v1.

Hmm, this needs some cleanup; the comments still talk about the old
function name; and about just the recovery.conf file.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] O_DSYNC broken on MacOS X?

2010-10-19 Thread Bruce Momjian
A.M. wrote:
 
 On Oct 19, 2010, at 11:22 AM, Bruce Momjian wrote:
 
  Greg Smith wrote:
  A.M. wrote:
  Perhaps a simpler tool could run a basic fsyncs-per-second test and 
  prompt the DBA to check that the numbers are within the realm of 
  possibility.
  
  
  This is what the test_fsync utility that already ships with the database 
  should be useful for.  The way Bruce changed it to report numbers in 
  commits/second for 9.0 makes it a lot easier to use for this purpose 
  than it used to be.  I think there's still some additional improvements 
  that could be made there, but it's a tricky test to run accurately.  The 
  
  test_fsync was designed to test various things like whether several
  open-sync writes are better than two writes and an fsync, and whether you
  can fsync data written on a different file descriptor.  It is really a
  catch-all test right now, not one specific for choosing sync methods.
 
 I am working on simplifying the test_fsync tool and making it a contrib 
 function which can be run by the superuser based on the configured fsync 
 method. That way, the list can ask a user to run it to report 
 fsyncs-per-second for suspiciousness. The goal is to make it more accessible. 
 I was also thinking about adding some notes along the lines of "Your drive 
 fsync speed rates between a 5400 RPM SATA drive and a 7200 RPM SATA drive." 
 or "Your drive fsync speed rates as high as RAM - your fsync method may be 
 wrong."
 
 Currently, the test tool is not even compiled by default.
 
 Thoughts?

Agreed.  Let me know if you have any questions.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-19 Thread Josh Berkus

Dimitri, Greg,


I want to say a big big +1 here. The way replication and PITR setup are
implemented now are a very good prototype, it's time to consolidate and
get to something usable by normal people, as opposed to PostgreSQL full
time geeks.


Well, one thing to be addressed is separating the PITR functionality 
from replication.  PITR needs a lot of features -- timelines, recovery 
stop points, etc. -- which replication doesn't need or want.  I think 
that focussing on streaming replication functionality and ignoring the 
archive logs case is probably the best way to logically separate these 
two.  Presumably anyone who needs archive logs as well will be a 
professional DBA.



I could prepare a patch given some advice on the replication protocol
integration. For one, is streaming a base backup something that
walsender should care about?


Yeah, I thought there was a prototype for this somewhere.   From a user 
perspective, using a 2nd pgport connection for the initial clone is 
fine.   I don't know if we want to worry about it otherwise from a 
resource management perspective; presumably the cloning process is going 
to be a pretty big performance hit on the master.



BTW, do we have a clear idea of how to implement pg_ping, and should it
reports current WAL location(s) of a standby?


pg_ping?


That needs a way to define a group of standby. There's nothing there
that makes them know about each other.


Let me clarify.  I meant that if I try to make a *single* standby point 
to a new master, and that new master was behind the standby when it 
failed over, then the attempt to remaster should fail with an error.


I do *not* want to get into standby groups.  That way lies madness.  ;-)


Now say we have pg_ping (or another tool) returning the current recv,
applied and synced LSNs, it would be possible for any standby to figure
out which other ones must be shot in case you failover here. The
failover command could list those other standby in the group that you're
behind of, and with a force command allow you to still failover to this
one. Now you have to STONITH the one listed, but that's your problem
after all.


The LSN isn't enough; as others have pointed out, we have a fairly 
serious failure case if a standby comes up as a master, accepts 
transactions, and then we try to remaster a 2nd standby which was 
actually ahead of the first standby at the time of master failure.  I 
haven't seen a solution posted to that yet; maybe I missed it?


 Sorry, next time I'll make sure to bash Robert too. I don't have any
 problems with the basic ideas you're proposing, just concerns about when
 the right time to get into that whole giant subject is and who is going
 to work on it.

If not now, when?  The 2nd CommitFest is almost complete.   If we're 
going to make any substantial changes, we need to have patches for the 
3rd commitfest.  And I didn't see anyone discussing simplification until 
I brought it up.


I don't realistically think that we're going to get 100% simplification 
for 9.1.  But it would be nice to at least get some components, which 
means getting agreement on how things should work, at least roughly.


--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-19 Thread Josh Berkus

On 10/19/2010 09:06 AM, Greg Smith wrote:

I think Magnus's idea to bump the default to 5 triages the worst of the
annoyance here, without dropping the feature (which has uses) or waiting
for new development to complete.  I'd be in favor of just committing
that change right now, before it gets forgotten about, and then if
nobody else gets around to further work at least something improved here
for 9.1.


Heck, even *I* could write that patch, if we're agreed.  Although you 
can commit it.


--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@commandprompt.com writes:
 The error message wording needs some work; maybe
   errmsg("file \"%s\" could not be executed", filename)
[...]
 I happened to notice that there are several pieces of code that are
 calling SPI_connect and SPI_finish without checking the return value,
 notably the FTS code and xml.c.

Please find attached pg_execute_from_file.v4.patch with fixes. I've used
the usual error messages for SPI_connect() and finish this time.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 13897,13902  postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
--- 13897,13909 
     <entry><type>record</type></entry>
     <entry>Return information about a file</entry>
    </row>
+   <row>
+    <entry>
+     <literal><function>pg_execute_from_file(<parameter>filename</> <type>text</>)</function></literal>
+    </entry>
+    <entry><type>void</type></entry>
+    <entry>Executes the <acronym>SQL</> commands contained in a file</entry>
+   </row>
   </tbody>
  </tgroup>
 </table>
***
*** 13935,13940  SELECT (pg_stat_file('filename')).modification;
--- 13942,13956 
  </programlisting>
 </para>
  
+ <indexterm>
+  <primary>pg_execute_from_file</primary>
+ </indexterm>
+ <para>
+  <function>pg_execute_from_file</> makes the server
+  execute <acronym>SQL</> commands to be found in a file. This function is
+  reserved to superusers.
+ </para>
+ 
 <para>
  The functions shown in <xref linkend="functions-advisory-locks"> manage
  advisory locks.  For details about proper use of these functions, see
***
*** 13957,13962  SELECT (pg_stat_file('filename')).modification;
--- 13973,13979 
 <entry><type>void</type></entry>
 <entry>Obtain exclusive advisory lock</entry>
 </row>
+ 
 <row>
 <entry>
  <literal><function>pg_advisory_lock(<parameter>key1</> <type>int</>, <parameter>key2</> <type>int</>)</function></literal>
*** a/src/backend/utils/adt/genfile.c
--- b/src/backend/utils/adt/genfile.c
***
*** 7,12 
--- 7,13 
   * Copyright (c) 2004-2010, PostgreSQL Global Development Group
   *
   * Author: Andreas Pflug pgad...@pse-consulting.de
+  * Dimitri Fontaine dimi...@2ndquadrant.fr
   *
   * IDENTIFICATION
   *	  src/backend/utils/adt/genfile.c
***
*** 21,26 
--- 22,28 
  #include dirent.h
  
  #include catalog/pg_type.h
+ #include executor/spi.h
  #include funcapi.h
  #include mb/pg_wchar.h
  #include miscadmin.h
***
*** 264,266  pg_ls_dir(PG_FUNCTION_ARGS)
--- 266,340 
  
  	SRF_RETURN_DONE(funcctx);
  }
+ 
+ /*
+  * Read a file then execute the SQL commands it contains.
+  */
+ Datum
+ pg_execute_from_file(PG_FUNCTION_ARGS)
+ {
+ 	text	   *filename_t = PG_GETARG_TEXT_P(0);
+ 	char	   *filename;
+ 	FILE   *file;
+ 	int64   fsize = -1, nbytes;
+ 	struct stat fst;
+ 	char   *query_string = NULL;
+ 
+ 	if (!superuser())
+ 		ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+  (errmsg("must be superuser to get file information"))));
+ 
+ 	/*
+ 	 * Only superuser can call pg_execute_from_file, and CREATE EXTENSION
+ 	 * uses that too. Don't double check the PATH. Also note that
+ 	 * extension's install files are not in $PGDATA but `pg_config
+ 	 * --sharedir`.
+ 	 */
+ 	filename = text_to_cstring(filename_t);
+ 
+ 	if (stat(filename, &fst) < 0)
+ 		ereport(ERROR,
+ (errcode_for_file_access(),
+  errmsg("could not stat file \"%s\": %m", filename)));
+ 
+ 	fsize = Int64GetDatum((int64) fst.st_size);
+ 
+ 	if ((file = AllocateFile(filename, PG_BINARY_R)) == NULL)
+ 		ereport(ERROR,
+ (errcode_for_file_access(),
+  errmsg("could not open file \"%s\" for reading: %m",
+ 		filename)));
+ 
+ 	if (ferror(file))
+ 		ereport(ERROR,
+ (errcode_for_file_access(),
+  errmsg("could not read file \"%s\": %m", filename)));
+ 
+ 	query_string = (char *)palloc((fsize+1)*sizeof(char));
+ 	memset(query_string, 0, fsize+1);
+ 	nbytes = fread(query_string, 1, (size_t) fsize, file);
+ 	pg_verifymbstr(query_string, nbytes, false);
+ 	FreeFile(file);
+ 
+ 	/*
+ 	 * We abuse some internal knowledge from spi.h here. As we don't know
+ 	 * which queries are going to get executed, we don't know what to expect
+ 	 * as an OK return code from SPI_execute().  We assume that
+ 	 * SPI_OK_CONNECT, SPI_OK_FINISH and SPI_OK_FETCH are quite improbable,
+ 	 * though, and the errors are negatives.  So a valid return code is
+ 	 * considered to be SPI_OK_UTILITY or anything from there.
+ 	 */
+ 	if (SPI_connect() != SPI_OK_CONNECT)
+ 		elog(ERROR, "SPI_connect failed");
+ 
+ 	if (SPI_execute(query_string, false, 0) < SPI_OK_UTILITY)
+ 		ereport(ERROR,
+ (errcode(ERRCODE_DATA_EXCEPTION),
+  errmsg("File '%s' could not be executed", filename)));
+ 
+ 	if (SPI_finish() != SPI_OK_FINISH)
+ 		elog(ERROR, "SPI_finish failed");
+ 
+ 	

Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@commandprompt.com writes:
 Hmm, this needs some cleanup; the comments still talk about the old
 function name; and about just the recovery.conf file.

Ah yes, thinking it's an easy patch is not helping. Please find attached
a revised version of it.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 55,60 
--- 55,61 
  #include utils/guc.h
  #include utils/ps_status.h
  #include utils/relmapper.h
+ #include utils/cfparser.h
  #include pg_trace.h
  
  
***
*** 5018,5117  str_time(pg_time_t tnow)
  }
  
  /*
-  * Parse one line from recovery.conf. 'cmdline' is the raw line from the
-  * file. If the line is parsed successfully, returns true, false indicates
-  * syntax error. On success, *key_p and *value_p are set to the parameter
-  * name and value on the line, respectively. If the line is an empty line,
-  * consisting entirely of whitespace and comments, function returns true
-  * and *keyp_p and *value_p are set to NULL.
-  *
-  * The pointers returned in *key_p and *value_p point to an internal buffer
-  * that is valid only until the next call of parseRecoveryCommandFile().
-  */
- static bool
- parseRecoveryCommandFileLine(char *cmdline, char **key_p, char **value_p)
- {
- 	char	   *ptr;
- 	char	   *bufp;
- 	char	   *key;
- 	char	   *value;
- 	static char *buf = NULL;
- 
- 	*key_p = *value_p = NULL;
- 
- 	/*
- 	 * Allocate the buffer on first use. It's used to hold both the parameter
- 	 * name and value.
- 	 */
- 	if (buf == NULL)
- 		buf = malloc(MAXPGPATH + 1);
- 	bufp = buf;
- 
- 	/* Skip any whitespace at the beginning of line */
- 	for (ptr = cmdline; *ptr; ptr++)
- 	{
- 		if (!isspace((unsigned char) *ptr))
- 			break;
- 	}
- 	/* Ignore empty lines */
- 	if (*ptr == '\0' || *ptr == '#')
- 		return true;
- 
- 	/* Read the parameter name */
- 	key = bufp;
- 	while (*ptr && !isspace((unsigned char) *ptr) &&
- 		   *ptr != '=' && *ptr != '\'')
- 		*(bufp++) = *(ptr++);
- 	*(bufp++) = '\0';
- 
- 	/* Skip to the beginning quote of the parameter value */
- 	ptr = strchr(ptr, '\'');
- 	if (!ptr)
- 		return false;
- 	ptr++;
- 
- 	/* Read the parameter value to *bufp. Collapse any '' escapes as we go. */
- 	value = bufp;
- 	for (;;)
- 	{
- 		if (*ptr == '\'')
- 		{
- 			ptr++;
- 			if (*ptr == '\'')
- *(bufp++) = '\'';
- 			else
- 			{
- /* end of parameter */
- *bufp = '\0';
- break;
- 			}
- 		}
- 		else if (*ptr == '\0')
- 			return false;		/* unterminated quoted string */
- 		else
- 			*(bufp++) = *ptr;
- 
- 		ptr++;
- 	}
- 	*(bufp++) = '\0';
- 
- 	/* Check that there's no garbage after the value */
- 	while (*ptr)
- 	{
- 		if (*ptr == '#')
- 			break;
- 		if (!isspace((unsigned char) *ptr))
- 			return false;
- 		ptr++;
- 	}
- 
- 	/* Success! */
- 	*key_p = key;
- 	*value_p = value;
- 	return true;
- }
- 
- /*
   * See if there is a recovery command file (recovery.conf), and if so
   * read in parameters for archive recovery and XLOG streaming.
   *
--- 5019,5024 
***
*** 5147,5153  readRecoveryCommandFile(void)
  		char	   *tok1;
  		char	   *tok2;
  
! 		if (!parseRecoveryCommandFileLine(cmdline, &tok1, &tok2))
  		{
  			syntaxError = true;
  			break;
--- 5054,5060 
  		char	   *tok1;
  		char	   *tok2;
  
! 		if (!cfParseOneLine(cmdline, &tok1, &tok2))
  		{
  			syntaxError = true;
  			break;
*** a/src/backend/utils/misc/Makefile
--- b/src/backend/utils/misc/Makefile
***
*** 15,21  include $(top_builddir)/src/Makefile.global
  override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
  
  OBJS = guc.o help_config.o pg_rusage.o ps_status.o superuser.o tzparser.o \
!rbtree.o
  
  # This location might depend on the installation directories. Therefore
  # we can't subsitute it into pg_config.h.
--- 15,21 
  override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
  
  OBJS = guc.o help_config.o pg_rusage.o ps_status.o superuser.o tzparser.o \
!rbtree.o cfparser.o
  
  # This location might depend on the installation directories. Therefore
  # we can't subsitute it into pg_config.h.
*** /dev/null
--- b/src/backend/utils/misc/cfparser.c
***
*** 0 
--- 1,114 
+ /*-
+  *
+  * cfparser.c
+  *	  Function for parsing RecoveryCommandFile and Extension Control files
+  *
+  * This very simple file format (line oriented, variable = 'value') is used
+  * as the recovery.conf and the extension control file format.
+  *
+  * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  * IDENTIFICATION
+  *	  src/backend/utils/misc/cfparser.c
+  *
+  *-
+  */
+ 
+ #include postgres.h
+ 
+ /*
+  * Parse one line 

Re: [HACKERS] Issues with two-server Synch Rep

2010-10-19 Thread Josh Berkus



Absolutely.  For a synch standby, you can't tolerate any standby delay
at all.  This means that anywhere from 1/4 to 3/4 of queries on the
standby would be cancelled on any high-traffic OLTP server.  Hence,
useless.


Don't agree with your numbers there and you seem to be assuming no
workarounds would be in use. A different discussion, I think.


The only viable workaround would be to delay vacuum on the master, no?


Adding the feedback channel looks trivial to me, once we've got the main
sync rep patch in. I'll handle that.


OK. I thought it would be a major challenge.  Ideally, we'd have some 
way to turn snapshot publication on or off; for a standby which was not 
receiving reads, we wouldn't need it.  Maybe make it contingent on the 
status of hot_standby GUC?  That would make sense.



For this reason, I've removed the apply mode from my patch, for now. I
want to get the simplest possible patch agreed and then add things
later.


Um, ok.  apply mode is still useful for users who are not running 
queries against the standby.  Which many won't.


--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: extensible enums

2010-10-19 Thread Andrew Dunstan



On 10/19/2010 12:21 AM, Andrew Dunstan wrote:



On 10/18/2010 10:52 AM, Tom Lane wrote:

We could possibly deal with enum types that follow the existing
convention if we made the cache entry hold a list of all the original,
known-to-be-sorted OIDs.  (This could be reasonably compact and cheap to
probe if it were represented as a starting OID and a Bitmapset of delta
values, since we can assume that the initial set of OIDs is pretty close
together.)  But we have to have that cache entry, and we have to consult
it on every single comparison, so it's definitely going to be slower
than before.

So I'm thinking the comparison procedure goes like this:

1. Both OIDs even?
If so, just compare them numerically, and we're done.

2. Lookup cache entry for enum type.

3. Both OIDs in list of known-sorted OIDs?
If so, just compare them numerically, and we're done.

4. Search the part of the cache entry that lists sort positions.
If not both present, refresh the cache entry.
If still not present, throw error.

5. Compare by sort positions.

Step 4 is the slowest part but would be avoided in most cases.
However, step 2 is none too speedy either, and would usually
be required when dealing with pre-existing enums.


OK, I've made adjustments that I think do what you're suggesting.




I've discovered and fixed a couple more bugs in this. I have one or two 
more things to fix and then I'll send a new patch.


Meanwhile, I've been testing  a database that was upgraded from 9.0, so 
it has a lot of odd-numbered Oids. It's not really clear from 
performance testing that the bitmap is a huge win, or even a win at all. 
(Of course, my implementation might suck too.) I'll continue testing.
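
For reference, here is a compressed C sketch of the five-step comparison
procedure quoted above; the cache entry type and the three helpers are
assumptions made up for illustration, not code from the patch:

/* Sketch only: EnumCacheEntry and the helper functions are assumed. */
static int
enum_cmp_sketch(Oid arg1, Oid arg2, Oid enumtypoid)
{
    EnumCacheEntry *cache;
    int         pos1, pos2;

    /* 1. Even OIDs were assigned in sort order: compare numerically. */
    if ((arg1 & 1) == 0 && (arg2 & 1) == 0)
        return (arg1 < arg2) ? -1 : (arg1 > arg2) ? 1 : 0;

    /* 2. Look up (or build) the cache entry for this enum type. */
    cache = lookup_enum_cache(enumtypoid);

    /* 3. Both in the bitmap of original, known-sorted OIDs? */
    if (oid_in_sorted_bitmap(cache, arg1) &&
        oid_in_sorted_bitmap(cache, arg2))
        return (arg1 < arg2) ? -1 : (arg1 > arg2) ? 1 : 0;

    /*
     * 4 and 5. Fall back to explicit sort positions; the helper is assumed
     * to refresh the cache entry and to error out if an OID is still
     * unknown after refreshing.
     */
    pos1 = enum_sort_position(cache, arg1);
    pos2 = enum_sort_position(cache, arg2);
    return (pos1 < pos2) ? -1 : (pos1 > pos2) ? 1 : 0;
}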


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Location for makeWholeRowVar()?

2010-10-19 Thread Tom Lane
The cause of bug #5716 is that preprocess_targetlist is trying to get
away with creating a whole-row variable marked with type RECORDOID,
even in cases where a specific composite type is known for the
referenced RTE.  This fails because now we might have non-equal Vars
representing the same RTE attribute in different places in the same
query.

I think the best fix for this is to have one and only one function
generating whole-row Vars, which means pulling out the guts of the
parser's transformWholeRowRef() function into a function that looks
like the attached.  Now I'm wondering where to put it.  I'm tempted
to put it into makefuncs.c next to makeVar(), and it fits there without
needing to add any new #includes; but it looks a tad more complicated
than most other functions in makefuncs.c.  Does anyone think this is
a bad place for it, and if so where would you put it instead?

regards, tom lane



/*
 * makeWholeRowVar -
 *    creates a Var node representing a whole row from the specified RTE
 */
Var *
makeWholeRowVar(RangeTblEntry *rte,
                Index varno,
                Index varlevelsup)
{
    Var        *result;
    Oid         toid;

    switch (rte->rtekind)
    {
        case RTE_RELATION:
            /* relation: the rowtype is a named composite type */
            toid = get_rel_type_id(rte->relid);
            if (!OidIsValid(toid))
                elog(ERROR, "could not find type OID for relation %u",
                     rte->relid);
            result = makeVar(varno,
                             InvalidAttrNumber,
                             toid,
                             -1,
                             varlevelsup);
            break;
        case RTE_FUNCTION:
            toid = exprType(rte->funcexpr);
            if (type_is_rowtype(toid))
            {
                /* func returns composite; same as relation case */
                result = makeVar(varno,
                                 InvalidAttrNumber,
                                 toid,
                                 -1,
                                 varlevelsup);
            }
            else
            {
                /*
                 * func returns scalar; instead of making a whole-row Var,
                 * just reference the function's scalar output.  (XXX this
                 * seems a tad inconsistent, especially if "f.*" was
                 * explicitly written ...)
                 */
                result = makeVar(varno,
                                 1,
                                 toid,
                                 -1,
                                 varlevelsup);
            }
            break;
        case RTE_VALUES:
            toid = RECORDOID;
            /* returns composite; same as relation case */
            result = makeVar(varno,
                             InvalidAttrNumber,
                             toid,
                             -1,
                             varlevelsup);
            break;
        default:
            /*
             * RTE is a join or subselect.  We represent this as a whole-row
             * Var of RECORD type.  (Note that in most cases the Var will be
             * expanded to a RowExpr during planning, but that is not our
             * concern here.)
             */
            result = makeVar(varno,
                             InvalidAttrNumber,
                             RECORDOID,
                             -1,
                             varlevelsup);
            break;
    }

    return result;
}

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist plans

2010-10-19 Thread Marios Vodas
On Sat, Oct 16, 2010 at 8:42 PM, Oleg Bartunov o...@sai.msu.su wrote:

 Marios,

 you're right. There are several reasons for poor documentation, but of
 course,
 no excuse, we do need good docs any way ! It's very nice you're willing to
 write one, since it's always better seen from outside of development.
 I think it'd be better to use wiki, so other people can join.
 Pictures would be nice, but I don't know if there is consensus about using
 pictures in documentation.


Wiki sounds good. Where can I find it?


 Oleg

 btw, we're thinking in the background about new indexing infrastructure -
 Spatial GiST, but it's rather big project, so we need sponsors.


 Is it related to SP-GIST (http://www.cs.purdue.edu/spgist/)?


Re: [HACKERS] Timeline in the light of Synchronous replication

2010-10-19 Thread Robert Haas
On Mon, Oct 18, 2010 at 4:31 AM, Fujii Masao masao.fu...@gmail.com wrote:
 But, even though we will have done that, it should be noted that WAL in
 A might be ahead of that in B. For example, A might crash right after
 writing WAL to the disk and before sending it to B. So when we restart
 the old master A as the standby after failover, we should need to delete
 some WAL files (in A) which are inconsistent with the WAL sequence in B.

Right.  There's no way to make it categorically safe to turn A into a
standby, because there's no way to guarantee that the fsyncs of the
WAL happen at the same femtosecond on both machines.  What we should
be looking for is a reliable way to determine whether or not it is in
fact safe.  Timelines are intended to provide that, but there are
holes, so they don't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] gist DatumGetPointer returns pointer to corrupted data

2010-10-19 Thread Marios Vodas
I index these structures in gist:

typedef struct {
 uint8 type_flag;
 float8 xi;
 float8 yi;
 Timestamp ti;
 float8 xe;
 float8 ye;
 Timestamp te;
 int32 id;
 } typ_s_flagged;

 typedef struct {
 uint8 type_flag;
 float8 xl;
 float8 yl;
 Timestamp tl;
 float8 xh;
 float8 yh;
 Timestamp th;
 } typ_b_flagged;


typ_s_flagged is the type of leaf entries and typ_b_flagged is for non-leaf
entries.
This is how I determine which type an entry is in the union, picksplit,
penalty, etc. functions (I tried to use GIST_LEAF but it produced errors at
execution time; anyway, I know this might not be best practice, but it is not wrong):

GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0); // in penalty, consistent

// or: GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0);
//     entry = entryvec->vector[i];   in union and picksplit
uint8 *type_flag = (uint8 *) DatumGetPointer(entry->key);
if (*type_flag == 0) {
  typ_s_flagged *p1 = (typ_s_flagged *) DatumGetPointer(entry->key);
} else if (*type_flag == 1) {
  typ_b_flagged *p2 = (typ_b_flagged *) DatumGetPointer(entry->key);
}


The problem is that when I access p1->te or p1->id or p2->th, the value
I get is zeros, for both structs. But I get correct values for the fields
before p1->te.
I checked my code multiple times and I didn't find a mistake like a bad size
in palloc or a wrong variable assignment.
I checked the compress function and it seems to accept and return correct
values.
Does anyone have any idea how to solve this, or why it happens?
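
One thing worth ruling out first (a guess, not a diagnosis): with a uint8
flag in front of float8/Timestamp members the compiler inserts alignment
padding, so sizeof(typ_s_flagged) is noticeably larger than the sum of the
member sizes. If the SQL-level type's internallength, or any palloc/memcpy
in the compress/decompress code, was computed from the member sizes rather
than from sizeof(), the tail of the struct (te, id, th) would never be
stored and would read back as zeros. A tiny stand-alone check:

#include <stdint.h>
#include <stdio.h>

typedef double Timestamp;       /* stand-in for the server typedef */

typedef struct
{
    uint8_t     type_flag;
    double      xi;
    double      yi;
    Timestamp   ti;
    double      xe;
    double      ye;
    Timestamp   te;
    int32_t     id;
} typ_s_flagged;

int
main(void)
{
    /* Sum of the members is 53 bytes; sizeof() is typically 64 here
     * because of padding after type_flag and at the end of the struct. */
    printf("sum of members = %zu, sizeof(struct) = %zu\n",
           sizeof(uint8_t) + 6 * sizeof(double) + sizeof(int32_t),
           sizeof(typ_s_flagged));
    return 0;
}

If anything in the GiST support code uses the smaller number, that would
explain exactly this pattern of zeroed trailing fields.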


Re: [HACKERS] Simplifying replication

2010-10-19 Thread Greg Stark
On Tue, Oct 19, 2010 at 9:16 AM, Josh Berkus j...@agliodbs.com wrote:
 Well, one thing to be addressed is separating the PITR functionality from
 replication.  PITR needs a lot of features -- timelines, recovery stop
 points, etc. -- which replication doesn't need or want.  I think that
 focussing on streaming replication functionality and ignoring the archive
 logs case is probably the best way to logically separate these two.
  Presumably anyone who needs archive logs as well will be a professional
 DBA.

The way things stand you *always* need archived logs. Even if you have
streaming set up it might try to use archived logs if it falls too far
behind.

Also all the features PITR needs are needed by replication as well.
Recovery stop points are absolutely critical. Otherwise if your
replica crashed it would have to start over from the original clone
time and replay all logs since then.

Timelines are not as obvious but perhaps that's our own mistake. When
you fail over to your replica shouldn't the new master get a new
timelineid? Isn't that the answer to the failure case when a slave
finds it's ahead of the master? If it has already replayed logs from a
different timelineid in the same lsn range then it can't switch
timelines to follow the new master. But if it hasn't then it can.
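
Stated as code, the rule being described might look something like this
(deliberately simplified, with made-up types; this is not existing server
logic):

/*
 * fork_lsn is the WAL location at which the new master's timeline
 * branched off the history the standby has been replaying.
 */
static int
can_follow_new_master(unsigned long long standby_replayed_lsn,
                      unsigned long long fork_lsn)
{
    /*
     * If the standby has not replayed anything past the branch point, its
     * history is still a prefix of the new master's and it can switch to
     * the new timeline; otherwise the histories have diverged and it has
     * to be recloned.
     */
    return standby_replayed_lsn <= fork_lsn;
}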

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplifying replication

2010-10-19 Thread Josh Berkus
Greg,

 The way things stand you *always* need archived logs. Even if you have
 streaming set up it might try to use archived logs if it falls too far
 behind.

Actually, you don't.  If you're willing to accept possible
desynchronization and recloning of the standbys, then you can skip the
archive logs.

 Timelines are not as obvious but perhaps that's our own mistake. When
 you fail over to your replica shouldn't the new master get a new
 timelineid? Isn't that the answer to the failure case when a slave
 finds it's ahead of the master? If it has already replayed logs from a
 different timelineid in the same lsn range then it can't switch
 timelines to follow the new master. But if it hasn't then it can.

Oh?  Do we have this information (i.e. what LSNs are associated with
which timeline)?


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-19 Thread Robert Haas
On Mon, Oct 18, 2010 at 6:05 PM, Greg Stark gsst...@mit.edu wrote:
 On Mon, Oct 18, 2010 at 1:27 PM, Robert Haas robertmh...@gmail.com wrote:
 It'd be kinda cool if we had it, but the work required to get there
 seems far out of proportion to the benefits ...

 I agree.  I think that's backing into the problem from the wrong end.
 The limiting factor here is that we require the entire cluster to be
 replicated.  If you could replicate individual tables/schemas, then
 this problem disappears.  Of course, that's not easy either, but if
 you're going to solve a really hard problem, you might as well pick
 that one.

 That seems like an orthogonal issue. You can have a replica with fewer
 objects -- which requires ignoring parts of the logs -- or more
 objects -- which requires being able to do local modifications and
 have local transaction states independent of the master. They're both
 potentially useful features and neither replaces the need for the
 other.

 Simon talked about filtering the transaction logs a while back and got
 a lot of pushback. But the more uses we put the slaves to the more it
 will come up. I figure we'll eventually do something for this though
 we might want more experience with the current setup before we dive
 into it.

 Adding extra objects in the slaves sounds massively harder. The idea
 of using temp tables might be a useful simplification because in the
 general case what you want is a separate set of xids with a separate
 clog and wal. That sounds like it would be a lot more complex.

Well, temp tables really want a separate set of XIDs with a separate
CLOG, too.  Admittedly, they don't necessarily need WAL, if you can
make them work without catalog entries, but that's not so easy either.

 Another possibility though is to use the MED stuff. If we could
 support creating tables locally that were actually hosted by a
 separate database then you could do updates, inserts, etc against
 those tables since they're actually happening in the remote database.
 Alternately you set up your slave to be the target of remote tables in
 a read-write data warehouse database.  But this approach has
 limitations of its own until our MED implementation is a lot more
 powerful than it is today.

Agreed on all points.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Robert Haas
On Tue, Oct 19, 2010 at 12:09 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Tom Lane t...@sss.pgh.pa.us writes:
 You could argue that either way I guess.  The script knows what it
 needs, but OTOH just about every extension there is will probably
 be generating useless NOTICEs unless something is done, so maybe
 the extension management code should take care of it for them.

 Either way is the key here too, so please find attached a revised (v5)
 patch which will force log_min_messages and client_min_messages to
 WARNING while the script is run.

It seems good to do this in the normal case, but (1) if
client_min_messages was already set higher than WARNING, we probably
should not lower it and (2) we might want to have a way to lower it
for troubleshooting purposes.
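
One way to address (1) would be to raise the level only when the current
setting is chattier than WARNING; a sketch using the usual GUC entry points
(the wrapper function is made up, and this is not from the posted patch):

#include "postgres.h"
#include "utils/guc.h"

/*
 * Sketch: quiet down NOTICE chatter before running an extension's install
 * script, but leave an already-stricter setting alone.
 */
static void
quiet_down_for_extension_script(void)
{
    if (client_min_messages < WARNING)
        SetConfigOption("client_min_messages", "warning",
                        PGC_SUSET, PGC_S_SESSION);
    if (log_min_messages < WARNING)
        SetConfigOption("log_min_messages", "warning",
                        PGC_SUSET, PGC_S_SESSION);
}

Point (2) could then fall out naturally, for example by skipping the bump
when some debugging GUC is set.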

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-19 Thread Robert Haas
On Tue, Oct 19, 2010 at 12:18 PM, Josh Berkus j...@agliodbs.com wrote:
 On 10/19/2010 09:06 AM, Greg Smith wrote:

 I think Magnus's idea to bump the default to 5 triages the worst of the
 annoyance here, without dropping the feature (which has uses) or waiting
 for new development to complete.  I'd be in favor of just committing
 that change right now, before it gets forgotten about, and then if
 nobody else gets around to further work at least something improved here
 for 9.1.

 Heck, even *I* could write that patch, if we're agreed.  Although you can
 commit it.

Setting max_wal_senders to a non-zero value causes additional work to
be done every time a transaction commits, aborts, or is prepared.
It's possible that this overhead is too trivial to be worth worrying
about; I haven't looked at it closely.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-19 Thread Martijn van Oosterhout
On Tue, Oct 19, 2010 at 02:52:01PM -0400, Robert Haas wrote:
 Well, temp tables really want a separate set of XIDs with a separate
 CLOG, too.  Admittedly, they don't necessarily need WAL, if you can
 make them work without catalog entries, but that's not so easy either.

At one point there was the idea to have a sort of permanent temporary
tables which would have a pg_class entry but each session would have
its own copy. Replicated slaves would then also be able to use this
construction.

Doesn't help with the XIDs though.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle




Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-19 Thread Robert Haas
On Tue, Oct 19, 2010 at 3:01 PM, Martijn van Oosterhout
klep...@svana.org wrote:
 On Tue, Oct 19, 2010 at 02:52:01PM -0400, Robert Haas wrote:
 Well, temp tables really want a separate set of XIDs with a separate
 CLOG, too.  Admittedly, they don't necessarily need WAL, if you can
 make them work without catalog entries, but that's not so easy either.

 At one point there was the idea to have a sort of permanent temporary
 tables which would have a pg_class entry but each session would have
 its own copy. Replicated slaves would then also be able to use this
 construction.

 Doesn't help with the XIDs though.

Hmm... yeah, I think I was the one who proposed that, actually.  :-)

The trick is that it would require us to have two pg_class tables, two
pg_attribute tables, two pg_attrdef tables, etc.: in each case, one
permanent and one temporary.  I am not sure how complex that will turn
out to be.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable snapshot isolation patch

2010-10-19 Thread Jeff Davis
On Mon, 2010-10-18 at 22:12 -0500, Kevin Grittner wrote:
 Hmmm...  When Joe was looking at the patch he exposed an intermittent
 problem with btree indexes which turned out to be related to improper
 handling of the predicate locks during index page clean-up caused by a
 vacuum.  Easy to fix once found, but it didn't always happen, even
 with identical runs.  (I'm guessing that was due to the randomness in
 picking a page to split when inserting into a group of identical
 keys.)  Perhaps a similar bug lurks in the GiST predicate locking.

I briefly looked into this when I woke up this morning, and I think I'm
close. I can reproduce it every time, so I should be able to fix this as
soon as I can find some free time (tomorrow night, probably).

I might also be able to help with the 2PC issue, but it will be at least
a week or two before I have the free time to dig into that one.
 
  Eventually, we may need to keep statistics about the number of
  conflicts happening, and start to rate-limit new serializable
  transactions (effectively reducing parallelism) to ensure that
  reasonable progress can be made (hopefully faster than serial
  execution).
  
 Ah, you've exposed just how self-serving my interest in an admission
 control policy mechanism is!  ;-)
  
 http://archives.postgresql.org/pgsql-hackers/2009-12/msg02189.php

Cool!
 
  I also *really* hope to add the SERIALIZABLE READ ONLY
  DEFERRABLE mode so that pg_dump and other read-only transactions
  don't push things into a state where the rollback rate spikes:
 
  http://archives.postgresql.org/pgsql-hackers/2010-09/msg01771.php
 
  One potential issue there is starvation. I don't see how you would
  guarantee that there will ever be a point to grab a safe snapshot,
  even if all of the other transactions are completing.
  
 After mulling it over in greater detail than previously, I see your
 point.  I'll think about it some more, but this particular idea might
 be a dead end.

I didn't quite mean that it's a dead-end. It still seems tempting to at
least try to get a safe snapshot, especially for a transaction that you
know will be long-running. Maybe a timeout or something?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-19 Thread Robert Haas
On Tue, Oct 19, 2010 at 11:22 AM, Terry Laurenzo t...@laurenzo.org wrote:
 Agreed.  BSON was born out of implementations that either lacked arbitrary
 precision numbers or had a strong affinity to an int/floating point way of
 thinking about numbers.  I believe that if BSON had an arbitrary precision
 number type, it would be a proper superset of JSON.

 As an aside, the max range of an int in BSON is 64 bits.  Back to my original
 comment that BSON was grown instead of designed, it looks like both the
 32bit and 64bit integers were added late in the game and that the original
 designers perhaps were just going to store all numbers as double.

 Perhaps we should enumerate the attributes of what would make a good binary
 encoding?

I think we should take a few steps back and ask why we think that
binary encoding is the way to go.  We store XML as text, for example,
and I can't remember any complaints about that on -bugs or
-performance, so why do we think JSON will be different?  Binary
encoding is a trade-off.  A well-designed binary encoding should make
it quicker to extract a small chunk of a large JSON object and return
it; however, it will also make it slower to return the whole object
(because you're adding serialization overhead).  I haven't seen any
analysis of which of those use cases is more important and why.

I am also wondering how this proposed binary encoding scheme will
interact with TOAST.  If the datum is compressed on disk, you'll have
to decompress it anyway to do anything with it; at that point, is
there still going to be a noticeable speed-up from using the binary
encoding?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Oct 19, 2010 at 12:09 PM, Dimitri Fontaine
 dimi...@2ndquadrant.fr wrote:
 Either way is the key here too, so please find attached a revised (v5)
 patch which will force log_min_messages and client_min_messages to
 WARNING while the script is run.

 It seems good to do this in the normal case, but (1) if
 client_min_messages was already set higher than WARNING, we probably
 should not lower it and (2) we might want to have a way to lower it
 for troubleshooting purposes.

I think the standard way of troubleshooting would be to run the
extension's script by hand, no?  So while I agree with (1),
I'm not sure we need to sweat about (2).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist - 0.8

2010-10-19 Thread Robert Haas
2010/10/19 Teodor Sigaev teo...@sigaev.ru:
 Thinking about it that way, perhaps we could add an integer column
 amop_whats_it_good_for that gets used as a bit field.  That wouldn't
 require changing the index structure, although it might break some
 other things.

 OPERATOR strategy_number ( op_type [ , op_type ] ) [ FOR { SEARCH |
 ORDER } [, ...] ]

 It's a very desirable thing to be able to distinguish roles in the consistent
 method of GiST: computation of distance could be very expensive. The only
 way to provide it in the current GiST interface is a strategy number. Of
 course, we could add a 6th argument to consistent to indicate the role, but
 I don't think that's a good decision.

To me, adding an additional argument (or maybe providing a whole
separate support function, separate from consistent) seems quite
natural, because now you have a way to pass all the other little bits
that might matter... ASC/DESC, NULLS FIRST/LAST, collation OID, etc.
You can define the additional argument as providing all of the extra
info about how the operator is being used, and, if it's being used for
ordering, the details of the requested order.  What is your thinking
on the matter?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Issues with two-server Synch Rep

2010-10-19 Thread Simon Riggs
On Tue, 2010-10-19 at 09:36 -0700, Josh Berkus wrote:
  Absolutely.  For a synch standby, you can't tolerate any standby delay
  at all.  This means that anywhere from 1/4 to 3/4 of queries on the
  standby would be cancelled on any high-traffic OLTP server.  Hence,
  useless.
 
  Don't agree with your numbers there and you seem to be assuming no
  workarounds would be in use. A different discussion, I think.
 
 The only viable workaround would be to delay vacuum on the master, no?

Sure. It's a documented workaround.

  Adding the feedback channel looks trivial to me, once we've got the main
  sync rep patch in. I'll handle that.
 
 OK. I thought it would be a major challenge.  Ideally, we'd have some 
 way to turn snapshot publication on or off; for a standby which was not 
 receiving reads, we wouldn't need it.  Maybe make it contingent on the 
 status of hot_standby GUC?  That would make sense.

Yes, I thought we could extend the hot_standby parameter to have 3 modes:

off (default) | on | feedback
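
In GUC terms that could just become an enum parameter; a rough sketch (only
config_enum_entry is the real guc.h struct, the HOT_STANDBY_* symbols here
are made up):

static const struct config_enum_entry hot_standby_options[] = {
    {"off", HOT_STANDBY_OFF, false},
    {"on", HOT_STANDBY_ON, false},
    {"feedback", HOT_STANDBY_FEEDBACK, false},
    {NULL, 0, false}
};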

  For this reason, I've removed the apply mode from my patch, for now. I
  want to get the simplest possible patch agreed and then add things
  later.
 
 Um, ok.  apply mode is still useful for users who are not running 
 queries against the standby.  Which many won't.

I agree many people won't use the standby for reads.

Why then would they care about the difference between fsync and apply?

Anyway, that suggestion is mainly so we can reach agreement on a minimal
patch, not suggesting it as a limit on released functionality.

-- 
 Simon Riggs   http://www.2ndquadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gist DatumGetPointer returns pointer to corrupted data

2010-10-19 Thread Robert Haas
On Tue, Oct 19, 2010 at 1:46 PM, Marios Vodas mvo...@gmail.com wrote:
 I index these structures in gist:

 typedef struct {
     uint8 type_flag;
     float8 xi;
     float8 yi;
     Timestamp ti;
     float8 xe;
     float8 ye;
     Timestamp te;
     int32 id;
 } typ_s_flagged;

 typedef struct {
     uint8 type_flag;
     float8 xl;
     float8 yl;
     Timestamp tl;
     float8 xh;
     float8 yh;
     Timestamp th;
 } typ_b_flagged;

 typ_s_flagged is the type of leaf entries and typ_b_flagged is for non-leaf
 entries.
 This is how I determine which type it is in the union, picksplit, penalty,
 etc. functions (I tried to use GIST_LEAF but it produced errors at execution
 time! Anyway, I know this might not be best practice, but it is not wrong).

 GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0); /* in penalty, consistent */

 /* or, in union and picksplit:
  *   GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0);
  *   entry = entryvec->vector[i];
  */
 uint8 *type_flag = (uint8 *) DatumGetPointer(entry->key);
 if (*type_flag == 0) {
     typ_s_flagged *p1 = (typ_s_flagged *) DatumGetPointer(entry->key);
 } else if (*type_flag == 1) {
     typ_b_flagged *p2 = (typ_b_flagged *) DatumGetPointer(entry->key);
 }


 The problem is that when I access p1->te or p1->id or p2->th the value I get
 is zeros, for both. But I get correct values for variables before p1->te.
 I checked my code multiple times and I didn't find a mistake like a bad size
 in palloc or a wrong variable assignment.
 I checked the compress function and it seems to accept and return correct
 values.
 Does anyone have any idea on how to solve this? Or why it happens?

Is pg_type.typlen set correctly?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Oct 19, 2010 at 12:18 PM, Josh Berkus j...@agliodbs.com wrote:
 On 10/19/2010 09:06 AM, Greg Smith wrote:
 I think Magnus's idea to bump the default to 5 triages the worst of the
 annoyance here, without dropping the feature (which has uses) or waiting
 for new development to complete.

 Setting max_wal_senders to a non-zero value causes additional work to
 be done every time a transaction commits, aborts, or is prepared.

Yes.  This isn't just a numeric parameter; it's also a boolean
indicating "do I want to pay the overhead to be prepared to be a
replication master?".  Josh has completely failed to make a case that
that should be the default.  In fact, the system would fail to start
at all if we just changed the default for max_wal_senders and not the
default for wal_level.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-19 Thread David E. Wheeler
On Oct 19, 2010, at 12:17 PM, Robert Haas wrote:

 I think we should take a few steps back and ask why we think that
 binary encoding is the way to go.  We store XML as text, for example,
 and I can't remember any complaints about that on -bugs or
 -performance, so why do we think JSON will be different?  Binary
 encoding is a trade-off.  A well-designed binary encoding should make
 it quicker to extract a small chunk of a large JSON object and return
 it; however, it will also make it slower to return the whole object
 (because you're adding serialization overhead).  I haven't seen any
 analysis of which of those use cases is more important and why.

Maybe someone has numbers on that for the XML type?

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-19 Thread Joseph Adams
On Tue, Oct 19, 2010 at 11:22 AM, Terry Laurenzo t...@laurenzo.org wrote:
 Perhaps we should enumerate the attributes of what would make a good binary
 encoding?

Not sure if we're discussing the internal storage format or the binary
send/recv format, but in my humble opinion, some attributes of a good
internal format are:

1. Lightweight - it'd be really nice for the JSON datatype to be
available in core (even if extra features like JSONPath aren't).
2. Efficiency - Retrieval and storage of JSON datums should be
efficient.  The internal format should probably closely resemble the
binary send/recv format so there's a good reason to use it.

A good attribute of the binary send/recv format would be
compatibility.  For instance, if MongoDB (which I know very little
about) has binary send/receive, perhaps the JSON data type's binary
send/receive should use it.

Efficient retrieval and update of values in a large JSON tree would be
cool, but would be rather complex, and IMHO, overkill.  JSON's main
advantage is that it's sort of a least common denominator of the type
systems of many popular languages, making it easy to transfer
information between them.  Having hierarchical key/value store support
would be pretty cool, but I don't think it's what PostgreSQL's JSON
data type should do.

On Tue, Oct 19, 2010 at 3:17 PM, Robert Haas robertmh...@gmail.com wrote:
 I think we should take a few steps back and ask why we think that
 binary encoding is the way to go.  We store XML as text, for example,
 and I can't remember any complaints about that on -bugs or
 -performance, so why do we think JSON will be different?  Binary
 encoding is a trade-off.  A well-designed binary encoding should make
 it quicker to extract a small chunk of a large JSON object and return
 it; however, it will also make it slower to return the whole object
 (because you're adding serialization overhead).  I haven't seen any
 analysis of which of those use cases is more important and why.

Speculation: the overhead involved with retrieving/sending and
receiving/storing JSON (not to mention TOAST
compression/decompression) will be far greater than that of
serializing/unserializing.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-19 Thread Boszormenyi Zoltan
Hi,

Boszormenyi Zoltan wrote:
 There is one problem with the patch: it doesn't survive
 make check. One of the regression tests fails the
 Assert(!cur_em->em_is_child);
 line in process_equivalence() in equivclass.c, but I couldn't
 yet find what causes it. The why is vaguely clear:
 something modifies the ec_members list in the eq_classes'
 tree nodes while the node is in the tree. Because I didn't find
 the offender yet, I couldn't fix it, so I send this patch as is.
 I'll try to fix it unless someone beats me to it. :)
   

I am a little closer to this bug, maybe I even found the cause of it.
I found that process_equivalence() is called from:

path/equivclass.c:
reconsider_outer_join_clause()
reconsider_full_join_clause()
plan/initsplan.c:
distribute_qual_to_rels()

The problem is with the two functions in path/equivclass.c,
as process_equivalence() and those functions all walk
the tree, and the current RBTree code can only deal with
one walk at a time. We need to push/pop the iterator state
to be able to serve more than one walker.
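
To sketch what I mean by caller-owned walk state (this is hypothetical
illustration code, not the existing rbtree.c API):

#include <stdlib.h>

/* Hypothetical node and walker types, just to illustrate the idea. */
typedef struct TreeNode
{
    struct TreeNode *left;
    struct TreeNode *right;
    void            *payload;
} TreeNode;

typedef struct TreeWalk
{
    TreeNode  **stack;      /* nodes whose right subtrees are still pending */
    int         depth;
} TreeWalk;

static void
walk_push_left(TreeWalk *walk, TreeNode *node)
{
    for (; node != NULL; node = node->left)
        walk->stack[walk->depth++] = node;
}

/* Begin an in-order walk whose state lives entirely with the caller.
 * The stack is sized by the caller's maxdepth; no overflow check here. */
TreeWalk *
walk_begin(TreeNode *root, int maxdepth)
{
    TreeWalk   *walk = malloc(sizeof(TreeWalk));

    walk->stack = malloc(sizeof(TreeNode *) * maxdepth);
    walk->depth = 0;
    walk_push_left(walk, root);
    return walk;
}

/* Return the next node in key order, or NULL when the walk is done. */
TreeNode *
walk_next(TreeWalk *walk)
{
    TreeNode   *node;

    if (walk->depth == 0)
        return NULL;
    node = walk->stack[--walk->depth];
    walk_push_left(walk, node->right);
    return node;
}

Since each walker owns its own TreeWalk, process_equivalence() and the
reconsider_*_join_clause() functions could each have a walk in flight at
the same time.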

Also, we need to split out the tree modifying part from
process_equivalence() somehow, as the tree walking
also cannot deal with node additions and deletions.

Best regards,
Zoltán Böszörményi

-- 
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, this time with a patch

2010-10-19 Thread Alvaro Herrera
Excerpts from Dimitri Fontaine's message of mar oct 19 13:09:47 -0300 2010:
 Tom Lane t...@sss.pgh.pa.us writes:
  You could argue that either way I guess.  The script knows what it
  needs, but OTOH just about every extension there is will probably
  be generating useless NOTICEs unless something is done, so maybe
  the extension management code should take care of it for them.
 
 Either way is the key here too, so please find attached a revised (v5)
 patch which will force log_min_messages and client_min_messages to
 WARNING while the script is run.

I think you should pstrdup the return value from GetConfigOption.
(As a matter of style, I'd name the local vars differently, so that they
don't show up when you search the code for the global vars; perhaps
old_cmsgs and old_lmsgs would do, for example.)
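
Something like this, I think (a sketch only, assuming the 9.0-era
GetConfigOption()/SetConfigOption() signatures; Robert's point about not
lowering an already-stricter setting is left as a comment):

/* Save the current values; pstrdup because the returned pointers may not
 * survive the SetConfigOption calls below. */
char   *old_cmsgs = pstrdup(GetConfigOption("client_min_messages"));
char   *old_lmsgs = pstrdup(GetConfigOption("log_min_messages"));

/* A real patch would first check whether the current settings are already
 * stricter than WARNING and skip the override in that case. */
SetConfigOption("client_min_messages", "warning", PGC_SUSET, PGC_S_SESSION);
SetConfigOption("log_min_messages", "warning", PGC_SUSET, PGC_S_SESSION);

/* ... run the extension's script ... */

SetConfigOption("client_min_messages", old_cmsgs, PGC_SUSET, PGC_S_SESSION);
SetConfigOption("log_min_messages", old_lmsgs, PGC_SUSET, PGC_S_SESSION);
pfree(old_cmsgs);
pfree(old_lmsgs);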

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-10-19 Thread Tom Lane
Boszormenyi Zoltan z...@cybertec.at writes:
 The problem is with the two functions in path/equivclass.c,
  as process_equivalence() and those functions all walk
  the tree, and the current RBTree code can only deal with
  one walk at a time. We need to push/pop the iterator state
  to be able to serve more than one walker.

Good luck with that --- the iteration state is embedded in the rbtree.

 Also, we need to split out the tree modifying part from
 process_equivalence() somehow, as the tree walking
 also cannot deal with node additions and deletions.

That's not happening either.  Maybe you need to think of some other data
structure to use.  Hashtable maybe?  dynahash.c at least has reasonably
well-defined limitations in this area.
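
For illustration, a minimal dynahash setup could look like this (the key and
entry layout are hypothetical -- whatever the planner cache really needs --
and the declarations assumed are the 9.0-era utils/hsearch.h ones):

#include "postgres.h"
#include "nodes/pg_list.h"
#include "utils/hsearch.h"

typedef struct EquivMemberEntry
{
    Oid     key;            /* hash key must be first in the entry */
    List   *members;        /* cached list; contents hypothetical */
} EquivMemberEntry;

static HTAB *equiv_member_hash = NULL;

static void
init_equiv_member_hash(void)
{
    HASHCTL     ctl;

    MemSet(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(EquivMemberEntry);
    ctl.hash = oid_hash;
    equiv_member_hash = hash_create("EquivMemberHash", 128, &ctl,
                                    HASH_ELEM | HASH_FUNCTION);
}

static EquivMemberEntry *
equiv_member_lookup(Oid key, bool create)
{
    bool        found;

    /* For HASH_ENTER, the caller must initialize entry->members if !found. */
    return (EquivMemberEntry *)
        hash_search(equiv_member_hash, &key,
                    create ? HASH_ENTER : HASH_FIND, &found);
}

Point lookups and insertions carry no shared iteration state; only
hash_seq_search scans have caveats, and those are documented.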

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-19 Thread Rob Wultsch
On Tue, Oct 19, 2010 at 12:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Oct 19, 2010 at 12:18 PM, Josh Berkus j...@agliodbs.com wrote:
 On 10/19/2010 09:06 AM, Greg Smith wrote:
 I think Magnus's idea to bump the default to 5 triages the worst of the
 annoyance here, without dropping the feature (which has uses) or waiting
 for new development to complete.

 Setting max_wal_senders to a non-zero value causes additional work to
 be done every time a transaction commits, aborts, or is prepared.

 Yes.  This isn't just a numeric parameter; it's also a boolean
  indicating "do I want to pay the overhead to be prepared to be a
  replication master?".  Josh has completely failed to make a case that
 that should be the default.  In fact, the system would fail to start
 at all if we just changed the default for max_wal_senders and not the
 default for wal_level.

                        regards, tom lane

If the variable is altered such that it is dynamic, could it not be
updated by the postmaster when a connection attempts to begin
replicating?

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] knngist plans

2010-10-19 Thread Oleg Bartunov

On Tue, 19 Oct 2010, Marios Vodas wrote:


On Sat, Oct 16, 2010 at 8:42 PM, Oleg Bartunov o...@sai.msu.su wrote:


Marios,

you're right. There are several reasons for the poor documentation, but of
course that's no excuse; we do need good docs anyway! It's very nice that
you're willing to write one, since it's always better seen from outside of
development.
I think it'd be better to use wiki, so other people can join.
Pictures would be nice, but I don't know if there is consensus about using
pictures in documentation.



Wiki sounds good. Where can I find it?


Lets start from this page.
http://wiki.postgresql.org/wiki/GiST
I think this page should be index page for all GiST documentation.






Oleg

btw, we're thinking in the background about new indexing infrastructure -
Spatial GiST, but it's rather big project, so we need sponsors.



Is it related to SP-GIST (http://www.cs.purdue.edu/spgist/)?


sort of.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-19 Thread Robert Haas
On Tue, Oct 19, 2010 at 3:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Oct 19, 2010 at 12:18 PM, Josh Berkus j...@agliodbs.com wrote:
 On 10/19/2010 09:06 AM, Greg Smith wrote:
 I think Magnus's idea to bump the default to 5 triages the worst of the
 annoyance here, without dropping the feature (which has uses) or waiting
 for new development to complete.

 Setting max_wal_senders to a non-zero value causes additional work to
 be done every time a transaction commits, aborts, or is prepared.

 Yes.  This isn't just a numeric parameter; it's also a boolean
 indicating "do I want to pay the overhead to be prepared to be a
 replication master?".  Josh has completely failed to make a case that
 that should be the default.  In fact, the system would fail to start
 at all if we just changed the default for max_wal_senders and not the
 default for wal_level.

On a slightly tangential note, it would be really nice to be able to
change things like wal_level and max_wal_senders on the fly.  ISTM
that needing to change the setting is actually the smaller portion of
the gripe; the bigger annoyance is that you bring the system down,
change one setting, bring it back up, take a hot backup, and then
realize that you have to shut the system down again to change the
other setting, because you forgot to do it the first time.  Since the
error message you get on the slave side is pretty explicit, the sheer
fact of needing to change max_wal_senders is only a minor
inconvenience; but the possible need to take down the system a second
time is a major one.

One of the problems with making these and similar settings changeable
on the fly is that, to be safe, we can't declare that the value is
increased until we can verify that every still-running backend has
picked up the increased value and begun acting upon it.  We have no
architecture for knowing when that has happened; the postmaster just
signals its children and forgets about it.  If you had a shared memory
array (similar to the ProcArray, but perhaps with a separate set of
locks) in which backends wrote their last-known values of GUCs that
fall into this category, it would be possible to write a function that
returns the answer to the question What is the least value of
wal_level that exists in any backend in the system?.  Maybe that's
not the right architecture; I'm not sure.  But we should see if we can
figure out something that will work.
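
To make that concrete, a hypothetical sketch (no locking and no handling of
unused slots, both of which a real version would need):

#include <limits.h>

/* One slot per backend, kept in shared memory. */
typedef struct BackendGucShmem
{
    int     wal_level;      /* last value this backend has acted upon */
} BackendGucShmem;

static BackendGucShmem *BackendGucArray;    /* MaxBackends slots */

/* Each backend updates its slot once it has picked up a new value. */
void
PublishMyWalLevel(int backend_slot, int current_wal_level)
{
    BackendGucArray[backend_slot].wal_level = current_wal_level;
}

/* "What is the least value of wal_level that exists in any backend?" */
int
MinWalLevelAcrossBackends(int max_backends)
{
    int     result = INT_MAX;
    int     i;

    for (i = 0; i < max_backends; i++)
    {
        if (BackendGucArray[i].wal_level < result)
            result = BackendGucArray[i].wal_level;
    }
    return result;
}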

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-19 Thread Robert Haas
On Tue, Oct 19, 2010 at 3:40 PM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 On Tue, Oct 19, 2010 at 3:17 PM, Robert Haas robertmh...@gmail.com wrote:
 I think we should take a few steps back and ask why we think that
 binary encoding is the way to go.  We store XML as text, for example,
 and I can't remember any complaints about that on -bugs or
 -performance, so why do we think JSON will be different?  Binary
 encoding is a trade-off.  A well-designed binary encoding should make
 it quicker to extract a small chunk of a large JSON object and return
 it; however, it will also make it slower to return the whole object
 (because you're adding serialization overhead).  I haven't seen any
 analysis of which of those use cases is more important and why.

 Speculation: the overhead involved with retrieving/sending and
 receiving/storing JSON (not to mention TOAST
 compression/decompression) will be far greater than that of
 serializing/unserializing.

I speculate that your speculation is incorrect.  AIUI, we, unlike
$COMPETITOR, tend to be CPU-bound rather than IO-bound on COPY.  But
perhaps less speculation and more benchmarking is in order.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_wal_senders must die

2010-10-19 Thread Josh Berkus

 Yes.  This isn't just a numeric parameter; it's also a boolean
 indicating "do I want to pay the overhead to be prepared to be a
 replication master?".  

Since this is the first time I've heard of the overhead, it would be
hard for me to have taken that into consideration.  If there was
discussion about this ever, I missed it.  That explains why we changed
the default in RC1 though, which was a surprise to me.

What is the overhead exactly?   What specific work do we do?  Link?

 Josh has completely failed to make a case that
 that should be the default.  In fact, the system would fail to start
 at all if we just changed the default for max_wal_senders and not the
 default for wal_level.

Well, now that you mention it, I also think that hot standby should be
the default.  Yes, I know about the overhead, but I also think that our
users who want easy replication *far* outnumber the users
who care about an extra 10% WAL overhead.

 On a slightly tangential note, it would be really nice to be able to
 change things like wal_level and max_wal_senders on the fly.

This would certainly help solve the problem.  Heck, if we could even
just change them with a reload (rather than a restart) it would be a
vast improvement.

 ISTM
 that needing to change the setting is actually the smaller portion of
 the gripe; the bigger annoyance is that you bring the system down,
 change one setting, bring it back up, take a hot backup, and then
 realize that you have to shut the system down again to change the
 other setting, because you forgot to do it the first time.  Since the
 error message you get on the slave side is pretty explicit, the sheer
 fact of needing to change max_wal_senders is only a minor
 inconvenience; but the possible need to take down the system a second
 time is a major one.

You've summed up the problem nicely.  I'll note that even though I've
already set up 3 production systems using SR, I still mess this up about
1/3 the time and have to restart and reclone.  There's just too many
settings.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-19 Thread Greg Stark
On Tue, Oct 19, 2010 at 12:03 PM, Robert Haas robertmh...@gmail.com wrote:
 The trick is that it would require us to have two pg_class tables, two
 pg_attribute tables, two pg_attrdef tables, etc.: in each case, one
 permanent and one temporary.  I am not sure how complex that will turn
 out to be.

Tom suggested using inheritance for this.

I find it strange to try constructing catalog tables to represent
these local definitions which never need to be read by any other
backend and in any case are 1:1 copies of the global catalog entries.

It seems to me simpler and more direct to just nail relcache
entries for these objects into memory and manipulate them directly.
They can be constructed from the global catalog tables and then
tweaked to point to the backend local temporary tables.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gist DatumGetPointer returns pointer to corrupted data

2010-10-19 Thread Marios Vodas
On Tue, Oct 19, 2010 at 10:23 PM, Robert Haas robertmh...@gmail.com wrote:

 Is pg_type.typlen set correctly?


Are you referring to the pg_type table?
If yes, those type structures exist only in C; I didn't write any in/out
functions, so they don't exist at the SQL level.
I pass a different data type (which does exist in SQL) to the compress
method, which changes it to typ_s_flagged.
The problem occurs in C code. I hope my explanation is clear.


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-19 Thread Terry Laurenzo
On Tue, Oct 19, 2010 at 2:46 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Oct 19, 2010 at 3:40 PM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
  On Tue, Oct 19, 2010 at 3:17 PM, Robert Haas robertmh...@gmail.com
 wrote:
  I think we should take a few steps back and ask why we think that
  binary encoding is the way to go.  We store XML as text, for example,
  and I can't remember any complaints about that on -bugs or
  -performance, so why do we think JSON will be different?  Binary
  encoding is a trade-off.  A well-designed binary encoding should make
  it quicker to extract a small chunk of a large JSON object and return
  it; however, it will also make it slower to return the whole object
  (because you're adding serialization overhead).  I haven't seen any
  analysis of which of those use cases is more important and why.
 
  Speculation: the overhead involved with retrieving/sending and
  receiving/storing JSON (not to mention TOAST
  compression/decompression) will be far greater than that of
  serializing/unserializing.

 I speculate that your speculation is incorrect.  AIUI, we, unlike
 $COMPETITOR, tend to be CPU-bound rather than IO-bound on COPY.  But
 perhaps less speculation and more benchmarking is in order.


After spending a week in the morass of this, I have to say that I am less
certain than I was on any front regarding the text/binary distinction.  I'll
take some time and benchmark different cases.  My hypothesis is that a well
implemented binary structure and conversions will add minimal overhead in
the IO + Validate case which would be the typical in/out flow.  It could be
substantially faster for binary send/receive because the validation step
could be eliminated/reduced.  Further storing as binary reduces the overhead
of random access to the data by database functions.

I'm envisioning staging this up as follows:
   1. Create a jsontext.  jsontext uses text as its internal
representation.  in/out functions are essentially a straight copy or a copy
+ validate.
   2. Create a jsonbinary type.  This uses an optimized binary format for
internal rep and send/receive.  in/out is a parse/transcode operation to
standard JSON text.
   3. Internal data access functions and JSON Path require a jsonbinary.
   4. There are implicit casts to/from jsontext and jsonbinary.

I've got a grammar in mind for the binary structure that I'll share later
when I've got some more time.  It's inspired by $COMPETITOR's format but a
little more sane, using type tags that implicitly define the size of the
operands, simplifying parsing.
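
Just to illustrate what "type tags that implicitly define the size of the
operands" could look like -- this is not the actual grammar, only a made-up
example:

/* Hypothetical tag byte; each tag fully determines what follows it. */
typedef enum JsonBinTag
{
    JB_NULL    = 0x00,      /* no payload */
    JB_FALSE   = 0x01,      /* no payload */
    JB_TRUE    = 0x02,      /* no payload */
    JB_INT32   = 0x10,      /* 4-byte integer payload */
    JB_INT64   = 0x11,      /* 8-byte integer payload */
    JB_NUMERIC = 0x12,      /* uint32 length + decimal digit string */
    JB_STRING  = 0x20,      /* uint32 length + UTF-8 bytes */
    JB_ARRAY   = 0x30,      /* uint32 element count, then elements */
    JB_OBJECT  = 0x31       /* uint32 pair count, then key/value pairs */
} JsonBinTag;

A parser can always skip a value knowing only its tag (and, for the
variable-length tags, the length word that immediately follows it).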

I'll then define the various use cases and benchmark using the different
types.  Some examples include IO No Validate, IO+Validate, Store and
Index, Internal Processing, Internal Composition, etc.

The answer may be to have both a jsontext and jsonbinary type as each will
be optimized for a different case.

Make sense?  It may be a week before I get through this.
Terry


[HACKERS] pg_rawdump

2010-10-19 Thread Stephen R. van den Berg
I spent yesterday writing a new tool pg_rawdump (which will be released as
open source in due course), which takes the table files in an arbitrary
pgsql database, and is able to transform those back into tables (including
toast values).

In the course of doing this (a customer needed it because he only had a
copy of those files, the pg_clog etc. dirs were lost), I noticed that
there are two problems which complicate recovery (needlessly):
a. It takes some guesswork to find out which file corresponds with
   which table.
b. In order to recover the table content, it is (obviously) necessary
   to correlate with the original table definition, which obviously
   is lost with the rest of the information.

In order to simplify recovery at this point (enormously), it would
have been very helpful (at almost negligible cost), to have the name
of the table, the name of the columns, and the types of the
columns available.

Why don't we insert that data into the first page of a regular table
file, in the special data area?
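
To make the idea concrete, the hint could be something as simple as this
(field names, sizes and the fixed column cap are purely illustrative, not a
proposed on-disk format), stored in the special space of block 0 and
refreshed whenever the relation's descriptor changes:

#include <stdint.h>

#define HINT_NAMELEN    64      /* roughly NAMEDATALEN, assumed */
#define HINT_MAX_COLS   16      /* arbitrary cap for this sketch */

typedef struct RelationHintColumn
{
    char        attname[HINT_NAMELEN];
    uint32_t    atttypid;       /* pg_type OID of the column's type */
    int32_t     atttypmod;
} RelationHintColumn;

typedef struct RelationHint
{
    char        relname[HINT_NAMELEN];
    uint16_t    natts;
    RelationHintColumn cols[HINT_MAX_COLS];
} RelationHint;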

I'd be willing to create a patch for that (should be pretty easy),
if nobody considers it to be a bad idea.
-- 
Stephen.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-19 Thread Pavel Stehule
2010/10/19 Greg Stark gsst...@mit.edu:
 On Tue, Oct 19, 2010 at 12:03 PM, Robert Haas robertmh...@gmail.com wrote:
 The trick is that it would require us to have two pg_class tables, two
 pg_attribute tables, two pg_attrdef tables, etc.: in each case, one
 permanent and one temporary.  I am not sure how complex that will turn
 out to be.

 Tom suggested using inheritance for this.

 I find it strange to try constructing catalog tables to represent
 these local definitions which never need to be read by any other
 backend and in any case are 1:1 copies of the global catalog entries.

 It seems to me simpler and more direct to just nail relcache
 entries for these objects into memory and manipulate them directly.
 They can be constructed from the global catalog tables and then
 tweaked to point to the backend local temporary tables.


+1

I had a very ugly implementation of global temp tables based just on the
relcache. The only problem was with refreshing the relcache. But
it's not so easy - for real use it's also necessary to handle
statistics, indexes, and access statistics.

I had an idea to modify the data page cache to support permanent
(memory-only) pages. Then we could have temporary tuples together with
standard tuples in one system table. This would be similar to memory
tables in MySQL and could be interesting in cooperation with mmap - very
fast access to some tables or pre-read tables.

Regards

Pavel

 --
 greg

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-19 Thread Greg Stark
On Tue, Oct 19, 2010 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote:
 I think we should take a few steps back and ask why we think that
 binary encoding is the way to go.  We store XML as text, for example,
 and I can't remember any complaints about that on -bugs or
 -performance, so why do we think JSON will be different?  Binary
 encoding is a trade-off.  A well-designed binary encoding should make
 it quicker to extract a small chunk of a large JSON object and return
 it; however, it will also make it slower to return the whole object
 (because you're adding serialization overhead).  I haven't seen any
 analysis of which of those use cases is more important and why.


The elephant in the room is if the binary encoded form is smaller then
it occupies less ram and disk bandwidth to copy it around. If your
database is large that alone is the dominant factor. Doubling the size
of all the objects in your database means halving the portion of the
database that fits in RAM and doubling the amount of I/O required to
complete any given operation. If your database fits entirely in RAM
either way then it still means less RAM bandwidth used which is often
the limiting factor but depending on how much cpu effort it takes to
serialize and deserialize the balance could shift either way.




-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-19 Thread Pavel Stehule

 After spending a week in the morass of this, I have to say that I am less
 certain than I was on any front regarding the text/binary distinction.  I'll
 take some time and benchmark different cases.  My hypothesis is that a well
 implemented binary structure and conversions will add minimal overhead in
 the IO + Validate case which would be the typical in/out flow.  It could be
 substantially faster for binary send/receive because the validation step
 could be eliminated/reduced.  Further storing as binary reduces the overhead
 of random access to the data by database functions.

 I'm envisioning staging this up as follows:
    1. Create a jsontext.  jsontext uses text as its internal
 representation.  in/out functions are essentially a straight copy or a copy
 + validate.
    2. Create a jsonbinary type.  This uses an optimized binary format for
 internal rep and send/receive.  in/out is a parse/transcode operation to
 standard JSON text.
    3. Internal data access functions and JSON Path require a jsonbinary.
    4. There are implicit casts to/from jsontext and jsonbinary.

some years ago I solved similar problems with the xml type. I think
you have to take a few factors into account:

a) all varlena values are compressed - you cannot get or update just
an interesting fragment; PostgreSQL always works with the complete
document

b) access to some fragment of a JSON or XML document is not really
that important, because fast access to data is solved via indexes.

c) only a few APIs allow binary communication between server and
client. Almost all interfaces use only a text-based API. I see a
possibly interesting direction for a binary protocol when someone uses
a JavaScript driver, or uses PostgreSQL in some JavaScript server-side
environment, but that isn't often the case now.

Regards

Pavel


 I've got a grammar in mind for the binary structure that I'll share later
 when I've got some more time.  It's inspired by $COMPETITOR's format but a
 little more sane, using type tags that implicitly define the size of the
 operands, simplifying parsing.

 I'll then define the various use cases and benchmark using the different
 types.  Some examples include such as IO No Validate, IO+Validate, Store and
 Index, Internal Processing, Internal Composition, etc.

 The answer may be to have both a jsontext and jsonbinary type as each will
 be optimized for a different case.

 Make sense?  It may be a week before I get through this.
 Terry



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-19 Thread Greg Stark
On Tue, Oct 19, 2010 at 1:12 PM, Stephen R. van den Berg s...@cuci.nl wrote:
 In order to simplify recovery at this point (enormously), it would
 have been very helpful (at almost negligible cost), to have the name
 of the table, the name of the columns, and the types of the
 columns available.

 Why don't we insert that data into the first page of a regular table
 file, in the special data area?

 I'd be willing to create a patch for that (should be pretty easy),
 if nobody considers it to be a bad idea.

There isn't necessarily one value for these attributes.  You can
rename columns and that rename may succeed and commit or fail and
rollback. You can drop or add columns and some rows will have or not
have the added columns at all. You could even add a column, insert
some rows, then abort -- all in a transaction. So some (aborted) rows
will have extra columns that aren't even present in the current table
definition.

All this isn't to say the idea you're presenting is impossible or a
bad idea. If this meta information was only a hint for forensic
purposes and you take into account these caveats it might still be
useful. But I'm not sure how useful. I mean, you can't really decipher
everything properly without the data in the catalog -- and you have to
premise this on the idea that you've lost everything in the catalog
but not the data in other tables. Which seems like a narrow use case.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-19 Thread Pavel Stehule
2010/10/19 Greg Stark gsst...@mit.edu:
 On Tue, Oct 19, 2010 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote:
 I think we should take a few steps back and ask why we think that
 binary encoding is the way to go.  We store XML as text, for example,
 and I can't remember any complaints about that on -bugs or
 -performance, so why do we think JSON will be different?  Binary
 encoding is a trade-off.  A well-designed binary encoding should make
 it quicker to extract a small chunk of a large JSON object and return
 it; however, it will also make it slower to return the whole object
 (because you're adding serialization overhead).  I haven't seen any
 analysis of which of those use cases is more important and why.


 The elephant in the room is if the binary encoded form is smaller then
 it occupies less ram and disk bandwidth to copy it around. If your
 database is large that alone is the dominant factor. Doubling the size
 of all the objects in your database means halving the portion of the
 database that fits in RAM and doubling the amount of I/O required to
 complete any given operation. If your database fits entirely in RAM
 either way then it still means less RAM bandwidth used which is often
 the limiting factor but depending on how much cpu effort it takes to
 serialize and deserialize the balance could shift either way.

I am not sure if this argument is important for json. This format does
not have big overhead and json documents are pretty small. Moreover, from
9.0 TOAST uses relatively aggressive compression. I would like some
standardised format for json inside pg too, but without using some
external library I don't see an advantage to using a format other
than text.

Regards

Pavel





 --
 greg

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-19 Thread Stephen R. van den Berg
Greg Stark wrote:
On Tue, Oct 19, 2010 at 1:12 PM, Stephen R. van den Berg s...@cuci.nl wrote:
 In order to simplify recovery at this point (enormously), it would
 have been very helpful (at almost negligible cost), to have the name
 of the table, the name of the columns, and the types of the
 columns available.

 Why don't we insert that data into the first page of a regular table
 file, in the special data area?

 I'd be willing to create a patch for that (should be pretty easy),
 if nobody considers it to be a bad idea.

There isn't necessarily one value for these attributes.  You can
rename columns and that rename may succeed and commit or fail and
rollback. You can drop or add columns and some rows will have or not
have the added columns at all. You could even add a column, insert
some rows, then abort -- all in a transaction. So some (aborted) rows
will have extra columns that aren't even present in the current table
definition.

True.

All this isn't to say the idea you're presenting is impossible or a
bad idea. If this meta information was only a hint for forensic
purposes and you take into account these caveats it might still be
useful.

This is exactly what I meant it for.  It would contain the most
recent alter table state (give or take some delay due to cache
flushes).

 But I'm not sure how useful. I mean, you can't really decipher
everything properly without the data in the catalog -- and you have to
premise this on the idea that you've lost everything in the catalog
but not the data in other tables. Which seems like a narrow use case.

It happens, more often than you'd think.  My client had it, I've
seen numerous google hits which show the same.
-- 
Stephen.

Be braver.  You cannot cross a chasm in two small jumps.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Domains versus arrays versus typmods

2010-10-19 Thread Tom Lane
In bug #5717, Richard Huxton complains that a domain declared like
CREATE DOMAIN mynums numeric(4,2)[1];
doesn't work properly, ie, the typmod isn't enforced in places where
it reasonably ought to be.  I dug into this a bit, and found that there
are more worms in this can than I first expected.

In the first place, plpgsql seems a few bricks shy of a load, in that
its code to handle assignment to an array element doesn't appear to
be considering arrays with typmod at all: it just passes typmod -1 to
exec_simple_cast_value() in the PLPGSQL_DTYPE_ARRAYELEM case in
exec_assign_value().  Nonetheless, if you try a case like

declare
  x numeric(4,2)[1];
begin
  x[1] := $1;

you'll find the typmod *is* enforced.  It turns out that the reason that
it works is that after constructing the modified array value,
exec_assign_value() calls itself recursively to do the actual assignment
to the array variable --- and that call sees that the array variable has
a typmod, so it applies the cast *at the array level*, ie it
disassembles the array, re-coerces each element, and builds a new array.
So it's horridly inefficient but it works.

That could and should be improved, but it seems to be just a matter
of local changes in pl_exec.c, and it's only an efficiency hack anyway.
The big problem is that none of this happens when the variable is
declared as a domain, and I believe that that is a significantly more
wide-ranging issue.

The real issue as I see it is that it's possible to subscript an array
without realizing that the array's type is really a domain that
incorporates a typmod specification.  This happens because of a hack
we did to simplify implementation of domains over arrays: we expose the
array element type as typelem of the domain as well.  For example, given
Richard's declaration we have

regression=# select typname,oid,typelem,typbasetype,typtypmod from pg_type 
where typname = 'mynums';
 typname |  oid  | typelem | typbasetype | typtypmod 
---------+-------+---------+-------------+-----------
 mynums  | 92960 |    1700 |        1231 |    262150
(1 row)

If we were to wrap another domain around that, we'd have

regression=# create domain mynums2 as mynums;
CREATE DOMAIN
regression=# select typname,oid,typelem,typbasetype,typtypmod from pg_type 
where typname = 'mynums2';
 typname |  oid  | typelem | typbasetype | typtypmod 
---------+-------+---------+-------------+-----------
 mynums2 | 92976 |    1700 |       92960 |        -1
(1 row)

and at this point it's completely unobvious from inspection of the
pg_type entry that any typmod needs to be applied when subscripting.

So I'm suspicious that there are a boatload of bugs related to this kind
of domain declaration, not just the one case in plpgsql.

I think that what we ought to do about it is to stop exposing typelem
in domains' pg_type rows.  If you want to subscript a domain value, you
should have to drill down to its base type with getBaseTypeAndTypmod,
which would also give you the correct typmod to apply.  If we set
typelem to zero in domain pg_type rows, it shouldn't take too long to
find any places that are missing this consideration --- the breakage
will be obvious rather than subtle.
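
The drill-down itself is cheap; subscripting code would do something like
this (a sketch with error handling abbreviated; getBaseTypeAndTypmod() and
get_element_type() are the existing lsyscache helpers, while exprtypid here
just stands for the type of the expression being subscripted):

/* Look through any domain(s) to the underlying array type, collecting
 * the typmod the domain declares along the way. */
int32   arraytypmod = -1;
Oid     arraytypid = getBaseTypeAndTypmod(exprtypid, &arraytypmod);
Oid     elemtypid = get_element_type(arraytypid);

if (!OidIsValid(elemtypid))
    elog(ERROR, "cannot subscript something of type %u", exprtypid);

/* arraytypmod -- e.g. the (4,2) hiding inside mynums -- must now be
 * applied when coercing values assigned to the array's elements. */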

This catalog definitional change obviously shouldn't be back-patched,
but possibly the ensuing code changes could be, since the typelem change
is just to expose where things are wrong and wouldn't be a prerequisite
for corrected code to behave correctly.  Or we could decide that this is
a corner case we're not going to try to fix in back branches.  It's been
wrong since day 0, so there's certainly an argument that it's not
important enough to risk back-patching.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable snapshot isolation patch

2010-10-19 Thread Kevin Grittner
Jeff Davis pg...@j-davis.com wrote:
 
 I briefly looked into this when I woke up this morning, and I
 think I'm close. I can reproduce it every time, so I should be
 able to fix this as soon as I can find some free time (tomorrow
 night, probably).
 
OK, I'll focus on other areas.
 
 I might also be able to help with the 2PC issue, but it will be at
 least a week or two before I have the free time to dig into that
 one.
 
If I get through the other points you raised, I'll start reading up
on 2PC.
 
 I didn't quite mean that it's a dead-end. It still seems tempting
 to at least try to get a safe snapshot, especially for a
 transaction that you know will be long-running. Maybe a timeout or
 something?
 
But what would you do when you hit the timeout?
 
One thing that would work, but I really don't think I like it, is
that a request for a snapshot for such a transaction would not only
block until it could get a clean snapshot (no overlapping
serializable non-read-only transactions which overlap serializable
transactions which wrote data and then committed in time to be
visible to the snapshot being acquired), but it would *also* block
*other* serializable transactions, if they were non-read-only, on an
attempt to acquire a snapshot.  That would at least guarantee that
the serializable read only deferrable transaction could get its
snapshot as soon as the initial set of problem overlapping
transactions completed, but it would be an exception to the "SSI
introduces no new blocking" guarantee.  :-(  I was OK with that for
the particular transaction where DEFERRABLE was requested, but to
have that block other serializable transactions seems pretty iffy to
me.
 
Short of that, I think you would just have to wait for completion of
all known problem transactions and then try again.  On a server with
a heavy write load, particularly if the length of the writing
transactions was highly variable, you could go through a lot of
iterations before getting that clean snapshot.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-19 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Tue, Oct 19, 2010 at 12:03 PM, Robert Haas robertmh...@gmail.com wrote:
 The trick is that it would require us to have two pg_class tables, two
 pg_attribute tables, two pg_attrdef tables, etc.: in each case, one
 permanent and one temporary.  I am not sure how complex that will turn
 out to be.

 Tom suggested using inheritance for this.

 I find it strange to try constructing catalog tables to represent
 these local definitions which never need to be read by any other
 backend and in any case are 1:1 copies of the global catalog entries.

 It seems to me simpler and more direct to just nail relcache
 entries for these objects into memory and manipulate them directly.

Relcache entries alone are not gonna work.  There is way too much stuff
that assumes that tables are correctly represented in the system
catalogs.

It's possible that you could make it work if you created the child
catalogs and immediately filled them with suitable entries describing
the child catalogs themselves.  Bootstrapping that might be a bit of fun
though.

The larger issue in all this is that there's so much code that supposes
that it just has to scan a particular catalog when it wants an entry,
and isn't going to go looking for child tables of the catalog.  That's
possibly fixable but is not likely to be easy, unless you can somehow
hide it within systable_beginscan and related routines.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-19 Thread Tom Lane
Terry Laurenzo t...@laurenzo.org writes:
 After spending a week in the morass of this, I have to say that I am less
 certain than I was on any front regarding the text/binary distinction.  I'll
 take some time and benchmark different cases.  My hypothesis is that a well
 implemented binary structure and conversions will add minimal overhead in
 the IO + Validate case which would be the typical in/out flow.  It could be
 substantially faster for binary send/receive because the validation step
 could be eliminated/reduced.

I think that arguments proceeding from speed of binary send/receive
aren't worth the electrons they're written on, because there is nothing
anywhere that says what the binary format ought to be.  In the case of
XML we're just using the text representation as the binary format too,
and nobody's complained about that.  If we were to choose to stick with
straight text internally for a JSON type, we'd do the same thing, and
again nobody would complain.
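
(To illustrate how little code that approach needs, here is a minimal
sketch of a text-backed input/output pair; json_validate() is a
made-up placeholder for whatever syntax check gets implemented, and
none of this is taken from the patch under discussion:)

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

/* hypothetical syntax checker; errors via ereport() on bad input */
extern void json_validate(const char *str);

PG_FUNCTION_INFO_V1(json_in);
Datum
json_in(PG_FUNCTION_ARGS)
{
    char       *str = PG_GETARG_CSTRING(0);

    json_validate(str);
    PG_RETURN_TEXT_P(cstring_to_text(str));
}

PG_FUNCTION_INFO_V1(json_out);
Datum
json_out(PG_FUNCTION_ARGS)
{
    text       *t = PG_GETARG_TEXT_PP(0);

    PG_RETURN_CSTRING(text_to_cstring(t));
}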

So, if you want to make a case for using some binary internal format or
other, make it without that consideration.

 I'm envisioning staging this up as follows:
1. Create a jsontext.  jsontext uses text as its internal
 representation.  in/out functions are essentially a straight copy or a copy
 + validate.
2. Create a jsonbinary type.  This uses an optimized binary format for
 internal rep and send/receive.  in/out is a parse/transcode operation to
 standard JSON text.

Ugh.  Please don't.  JSON should be JSON, and nothing else.  Do you see
any other datatypes in Postgres that expose such internal considerations?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rawdump

2010-10-19 Thread Tom Lane
Stephen R. van den Berg s...@cuci.nl writes:
 In order to simplify recovery at this point (enormously), it would
 have been very helpful (at almost negligible cost), to have the name
 of the table, the name of the columns, and the types of the
 columns available.

 Why don't we insert that data into the first page of a regular table
 file, in the special data area?

(1) it wouldn't necessarily fit

(2) what are you going to do to maintain it during ALTER TABLE?

(3) if there are any custom types involved, you're still lost.
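
To put rough numbers on (1): even a purely hypothetical layout for
that metadata (invented here only to do the arithmetic) blows past
the available space for wide tables:

#include "postgres.h"

/*
 * Illustrative only -- one conceivable shape for per-table metadata
 * kept in the special space.  At 72 bytes per column, a 100-column
 * table needs about 7.2 kB for the column array alone, which cannot
 * fit in an 8 kB page alongside any actual tuple data.
 */
typedef struct RawDumpColumnInfo
{
    NameData    attname;        /* 64 bytes: column name */
    Oid         atttypid;       /* 4 bytes: type OID */
    int32       atttypmod;      /* 4 bytes: typmod */
} RawDumpColumnInfo;

typedef struct RawDumpTableInfo
{
    NameData    relname;        /* table name */
    int16       natts;          /* number of columns */
    RawDumpColumnInfo cols[1];  /* variable length */
} RawDumpTableInfo;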

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-19 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 The elephant in the room is that if the binary-encoded form is smaller,
 it occupies less RAM and disk bandwidth to copy it around.

It seems equally likely that a binary-encoded form could be larger
than the text form (that's often true for our other datatypes).
Again, this is an argument that would require experimental evidence
to back it up.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-19 Thread Greg Stark
On Tue, Oct 19, 2010 at 3:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Relcache entries alone are not gonna work.  There is way too much stuff
 that assumes that tables are correctly represented in the system
 catalogs.


Well we're talking about multiple things now. In the global temporary
table case they *are* properly represented in the system catalogs.
Except for their local state, such as the actual relfilenode, all the
structural attributes are going to be accurate.

In the case of tables created locally on a slave, well, that's more complicated.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

2010-10-19 Thread Terry Laurenzo
On Tue, Oct 19, 2010 at 4:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Terry Laurenzo t...@laurenzo.org writes:
  After spending a week in the morass of this, I have to say that I am
  less certain than I was on any front regarding the text/binary
  distinction.  I'll take some time and benchmark different cases.  My
  hypothesis is that a well implemented binary structure and
  conversions will add minimal overhead in the IO + Validate case
  which would be the typical in/out flow.  It could be substantially
  faster for binary send/receive because the validation step could be
  eliminated/reduced.

 I think that arguments proceeding from speed of binary send/receive
 aren't worth the electrons they're written on, because there is nothing
 anywhere that says what the binary format ought to be.  In the case of
 XML we're just using the text representation as the binary format too,
 and nobody's complained about that.  If we were to choose to stick with
 straight text internally for a JSON type, we'd do the same thing, and
 again nobody would complain.

 So, if you want to make a case for using some binary internal format or
 other, make it without that consideration.

  I'm envisioning staging this up as follows:
     1. Create a jsontext.  jsontext uses text as its internal
        representation.  in/out functions are essentially a straight
        copy or a copy + validate.
     2. Create a jsonbinary type.  This uses an optimized binary
        format for internal rep and send/receive.  in/out is a
        parse/transcode operation to standard JSON text.

 Ugh.  Please don't.  JSON should be JSON, and nothing else.  Do you see
 any other datatypes in Postgres that expose such internal considerations?

regards, tom lane


I don't think anyone here was really presenting arguments as yet.  We're
several layers deep on speculation and everyone is saying that
experimentation is needed.

I've got my own reasons for going down this path for a solution I have in
mind.  I had thought that some part of that might have been applicable to pg
core, but if not, that's no problem.  For my own edification, I'm going to
proceed down this path and see where it leads.  I'll let the list know what
I find out.

I can understand the sentiment that JSON should be JSON and nothing else
from a traditional database server's point of view, but there is nothing
sacrosanct about it in the broader context.

Terry


Re: [HACKERS] Creation of temporary tables on read-only standby servers

2010-10-19 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Tue, Oct 19, 2010 at 3:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Relcache entries alone are not gonna work.  There is way too much stuff
 that assumes that tables are correctly represented in the system
 catalogs.

 Well we're talking about multiple things now. In the global temporary
 table case they *are* properly represented in the system catalogs.
 Except for their local state such as the actual relfilenode all the
 structural attributes are going to be accurate.

... and relpages and reltuples ... it's really not going to be that easy
to have a table that isn't described in pg_class.  Which the structure
you're describing isn't.  There might be a template for it in pg_class,
but that's something entirely different.

 In the case of tables created locally on a slave, well, that's more
 complicated.

I think they're more alike than you think.  If we had the infrastructure
to do local temp tables this way, it'd be pretty easy to use that to
instantiate per-backend copies of global temp tables.  (The global
entities would be templates, not actual tables.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

