Re: [HACKERS] [v9.2] Fix leaky-view problem, part 2

2011-07-20 Thread Yeb Havinga

On 2011-07-09 09:14, Kohei KaiGai wrote:

OK, I'll try to modify the patch according to the pg_proc flag design.
As long as the default for user-defined functions is off, and we provide
built-in functions
with appropriate configurations, it seems to me the burden on the DBA is
quite limited.


A different solution to the leaky-view problem could be to check access 
to a tuple at or near the heap-tuple visibility level, in addition to 
adding tuple access filter conditions to the query. This would retain 
the possible performance benefits of the query-rewriting solution, since 
everything is filtered at the heap-tuple visibility level before further 
processing. Fixing leaky views would not be needed because they cannot 
exist in this case, the code is straightforward, and there's less chance 
of future security bugs from either misconfiguration of leakproof 
functions or code that might introduce another leak path.


regards,

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Kohei KaiGai
2011/7/19 Yeb Havinga yebhavi...@gmail.com:
 On 2011-07-19 22:39, Heikki Linnakangas wrote:

 On 19.07.2011 12:28, Yeb Havinga wrote:

 On 2011-07-18 22:21, Kohei KaiGai wrote:

 Scientific Linux 6 is not suitable, because its libselinux version
 is older than this patch expects (libselinux-2.0.99 or later).
 My recommendation is Fedora 15 instead.

 Installing right now, thanks for the heads up!

 Would it be reasonable to #ifdef out the parts that require version 2.0.99?
 That's very recent, so it might not be available on popular distributions for
 some time, and it would be nice not to have a hard dependency on it. You
 could have autoconf rules to check for the new functions, and only use them
 if they are available.

 Contrary to the subject line, I was under the impression that the current
 patch is for the 9.2 release, since it is in a commitfest for the 9.2 release
 cycle, which would make the libselinux-2.0.99 dependency less of a problem.

Sorry, the subject line was just my typo.

I agree with Yeb's opinion. The reason we set libselinux-2.0.93 as the
minimum requirement was that the v9.1 implementation used a function
supported since 2.0.93; the userspace avc feature was omitted from the
initial version to minimize the patch size.
If we had included the userspace avc feature from the beginning, it would
have required libselinux-2.0.99.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] [v9.2] Fix leaky-view problem, part 2

2011-07-20 Thread Kohei KaiGai
2011/7/20 Yeb Havinga yebhavi...@gmail.com:
 On 2011-07-09 09:14, Kohei KaiGai wrote:

 OK, I'll try to modify the patch according to the pg_proc flag design.
 As long as the default for user-defined functions is off, and we provide
 built-in functions
 with appropriate configurations, it seems to me the burden on the DBA is
 quite limited.

 A different solution to the leaky-view problem could be to check access to a
 tuple at or near the heap-tuple visibility level, in addition to adding tuple
 access filter conditions to the query. This would retain the possible
 performance benefits of the query-rewriting solution, since everything is
 filtered at the heap-tuple visibility level before further processing. Fixing
 leaky views would not be needed because they cannot exist in this case, the
 code is straightforward, and there's less chance of future security bugs from
 either misconfiguration of leakproof functions or code that might introduce
 another leak path.

I'm not fond of this approach. The hardest problem to solve is finding
a way to distinguish the qualifiers of the security policy from the others.
Leaky functions look like harmless functions, so the optimizer will
push them down onto particular scan plans.
If they were executed during the visibility check of tuples, the same problem
would be reproduced. So I still favor a flag in the pg_proc catalog and the
idea of a security barrier.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] Re: patch review : Add ability to constrain backend temporary file space

2011-07-20 Thread Mark Kirkwood
And I thought I should mention: a big thank you to the reviewers and 
interested parties, Cedric, Tatsuo, Robert, and Tom, who did a review plus 
fixes as he committed the patch - nice work, guys!


regards

Mark



Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Florian Pflug
On Jul20, 2011, at 01:42 , Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 Updated patch attached. Do you think this is Ready for Committer?
 
 I've been looking through this patch.  While it's mostly good, I'm
 pretty unhappy with the way that the pg_xml_init/pg_xml_done code is
 deliberately designed to be non-reentrant (ie, throw an Assert if
 pg_xml_init is called twice without pg_xml_done in between).
 There are at least two issues with that:

Hm, yeah, I did it that way because I didn't see an immediate need
to support nested pg_xml_init/done calls. But you're right - since
we're already nearly there, we might as well go all the way.

 1. If you forget pg_xml_done in some code path, you'll find out from
 an Assert at the next pg_xml_init, which is probably far away from where
 the actual problem is.

Very true. In fact, I did miss one pg_xml_done() call in the xml2
contrib module initially, and it took me a while to locate the place
it was missing.

But won't we miss that error entirely if we make it re-entrant?

 2. I don't think it's entirely unlikely that uses of libxml could be
 nested.
 
 xpath_table in particular calls an awful lot of stuff between
 pg_xml_init and pg_xml_done, and is at the very least open to loss of
 control via an elog before it's called pg_xml_done.

True. But note that this function's error handling leaves something
to be desired anyway - I think it'll leak memory if it elog()s, for
example.

I was tempted to make all the functions in xml2/ use TRY/CATCH blocks
like the ones in core's xml.c do. The reason I held back was that I felt
these cleanups weren't part of the problem my patch was trying to solve.
Also, according to our docs the xml2 contrib module is deprecated, which
was another reason to make only the absolutely necessary adjustments.

 I think this patch has already paid 90% of the notational price for
 supporting fully re-entrant use of libxml.  What I'm imagining is
 that we move all five of the static variables (xml_strictness,
 xml_err_occurred, xml_err_buf, xml_structuredErrorFunc_saved,
 xml_structuredErrorContext_saved) into a struct that's palloc'd
 by pg_xml_init and eventually passed to pg_xml_done.  It could be
 passed to xml_errorHandler via the currently-unused context argument.
 A nice side benefit is that we could get rid of PG_XML_STRICTNESS_NONE.

I'd marginally prefer for the caller, not pg_xml_init(), to be responsible
for allocating the struct. That gives the caller the option to simply
allocate it on the stack.

 Now the risk factor if we do that is that if someone misses a
 pg_xml_done call, we leave an error handler installed with a context
 argument that's probably pointing at garbage, and if someone then tries
 to use libxml without re-establishing their error handler, they've 
 got problems. But they'd have problems anyway with the current form of
 the patch.

Currently it'd actually work in non-assert-enabled builds, as long
as no third-party library depends on its libxml error handler being
restored by us (which the old code simply assumed was never the case).

 We could provide some defense against this by including a
 magic identifier value in the palloc'd struct and making
 xml_errorHandler check it before doing anything dangerous.  Also, we
 could make pg_xml_done warn if libxml's current context pointer is
 different from the struct passed to it, which would provide another
 means of detection that somebody had missed a cleanup call.

There's one additional hazard. Suppose you have two functions f() and
g() defined as

g() {
  PgXmlState* xml_state = pg_xml_init();
  ...
  /* Oops. No pg_xml_done() call here */
}

f() {
  PgXmlState* xml_state = pg_xml_init();

  g();

  xmlSomeLibXmlFunctionCall();
  XML_CHECK_AND_EREPORT(xml_state, ...);

  pg_xml_done(xml_state);
}

Errors reported by xmlSomeLibXmlFunctionCall() will now be
missed by the XML_CHECK_AND_EREPORT in f(), because the error
handler will modify g()'s xml_state, not f()'s.

So we really ought to make pg_xml_done() complain if libxml's
current error context isn't what it expects.

 Unless someone sees a major hole in this idea, or a better way to do it,
 I'm going to modify the patch along those lines and commit.

If pg_xml_done() and the error handler do the safety check you suggested,
it seems fine.

best regards,
Florian Pflug




Re: Initial Review: JSON contrib modul was: Re: [HACKERS] Another swing at JSON

2011-07-20 Thread Florian Pflug
On Jul20, 2011, at 06:40 , Joey Adams wrote:
 On Wed, Jul 20, 2011 at 12:32 AM, Robert Haas robertmh...@gmail.com wrote:
 Thanks for the input.  I'm leaning in this direction too.  However, it
 will be a tad tricky to implement the conversions efficiently, ...
 
 I'm a bit confused, because I thought what I was talking about was not
 doing any conversions in the first place.
 
 We want to be able to handle \u escapes when the database encoding
 is not UTF-8.  We could leave them in place, but sooner or later
 they'll need to be converted in order to unwrap or compare JSON
 strings.

Hm, I agree that we need to handle \u escapes in JSON input.
We won't ever produce them during output though, right?

 The approach being discussed is converting escapes to the database
 encoding.  This means escapes of characters not available in the
 database encoding (e.g. \u266B in ISO-8859-1) will be forbidden.
 
 The PostgreSQL parser (which also supports Unicode escapes) takes a
 simpler approach: don't allow non-ASCII escapes unless the server
 encoding is UTF-8.

Maybe JSON should simply reject them too, then. At least for now.

How does the XML type handle this situation? It seems that it'd have
the same problem with numeric character references (&#nnnn;).

best regards,
Florian Pflug






Re: [HACKERS] sepgsql documentation fixes

2011-07-20 Thread Robert Haas
On Tue, Jul 19, 2011 at 6:10 AM, Kohei Kaigai kohei.kai...@emea.nec.com wrote:
  /etc/selinux/targeted/contexts/sepgsql_contexts:  line 33 has invalid object type db_blobs
  It is not an error, but just a notification to inform users that the
  sepgsql_contexts file contains invalid lines. It is harmless, so we can
  ignore them.
  I don't think sepgsql.sgml should mention this noise, because it comes
  purely from a problem in libselinux and refpolicy; these are external
  packages from the viewpoint of PostgreSQL.
 This is in contradiction with the current phrase in the documentation
 that's right after the sepgsql.sql loading: If the installation process
 completes without error, you can now start the server normally. IMHO if
 there are warnings that can be ignored, it would limit confusion for
 sepgsql users if the documentation would say it at this point, e.g. If
 the installation process completes without error, you can now start the
 server normally. Warnings from errors in sepgsql_contexts, a file
 external to PostgreSQL, are harmless and can be ignored.

 Indeed, it might be confusing whether the installation completed
 correctly or not.
 So, I appended more description about these messages, as follows:

 +  <para>
 +   Please note that you may see the following notifications depending on
 +   the combination of particular versions of <productname>libselinux</>
 +   and <productname>selinux-policy</>.
 +<screen>
 +/etc/selinux/targeted/contexts/sepgsql_contexts:  line 33 has invalid object type
 +</screen>
 +   These messages are harmless and the underlying problem is already fixed, so
 +   you can ignore them or update the related packages to the latest version.
 +  </para>

 See the attached patch, that contains other 3 documentation updates.

 Thank you for this clarification. I have some ideas of things that, had
 they been in the documentation, would have helped me. Instead of seeking
 agreement on each item, I propose that I gather documentation additions
 in a patch later after the review, and leave it up to you guys whether
 to include them or not.

 OK, I'd like to check them. In addition, I'll also revise the wiki page
 in parallel to keep it correct.

Does all of this apply to both 9.1 and 9.2devel?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-sepgsql-doc-revise.2.patch
Description: Binary data



Re: [HACKERS] sepgsql documentation fixes

2011-07-20 Thread Kohei KaiGai
2011/7/20 Robert Haas robertmh...@gmail.com:
 On Tue, Jul 19, 2011 at 6:10 AM, Kohei Kaigai kohei.kai...@emea.nec.com 
 wrote:
  /etc/selinux/targeted/contexts/sepgsql_contexts:  line 33 has invalid object type db_blobs
  It is not an error, but just a notification to inform users that the
  sepgsql_contexts file contains invalid lines. It is harmless, so we can
  ignore them.
  I don't think sepgsql.sgml should mention this noise, because it comes
  purely from a problem in libselinux and refpolicy; these are external
  packages from the viewpoint of PostgreSQL.
 This is in contradiction with the current phrase in the documentation
 that's right after the sepgsql.sql loading: If the installation process
 completes without error, you can now start the server normally. IMHO if
 there are warnings that can be ignored, it would limit confusion for
 sepgsql users if the documentation would say it at this point, e.g. If
 the installation process completes without error, you can now start the
 server normally. Warnings from errors in sepgsql_contexts, a file
 external to PostgreSQL, are harmless and can be ignored.

 Indeed, it might be confusing whether the installation completed
 correctly or not.
 So, I appended more description about these messages, as follows:

 +  <para>
 +   Please note that you may see the following notifications depending on
 +   the combination of particular versions of <productname>libselinux</>
 +   and <productname>selinux-policy</>.
 +<screen>
 +/etc/selinux/targeted/contexts/sepgsql_contexts:  line 33 has invalid object type
 +</screen>
 +   These messages are harmless and the underlying problem is already fixed, so
 +   you can ignore them or update the related packages to the latest version.
 +  </para>

 See the attached patch, that contains other 3 documentation updates.

 Thank you for this clarification. I have some ideas of things that, had
 they been in the documentation, would have helped me. Instead of seeking
 agreement on each item, I propose that I gather documentation additions
 in a patch later after the review, and leave it up to you guys whether
 to include them or not.

 OK, I'd like to check them. In addition, I'll also revise the wiki page
 in parallel to keep it correct.

 Does all of this apply to both 9.1 and 9.2devel?

This update came from feedback from people who tried to install
contrib/sepgsql in v9.1 and ran into trouble.
So, I think it is helpful to apply it to v9.1 as well.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] sepgsql documentation fixes

2011-07-20 Thread Robert Haas
On Wed, Jul 20, 2011 at 8:50 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 Does all of this apply to both 9.1 and 9.2devel?

 This update came from feedback from people who tried to install
 contrib/sepgsql in v9.1 and ran into trouble.
 So, I think it is helpful to apply it to v9.1 as well.

OK, done, with some corrections.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Yeb Havinga

On 2011-07-14 21:46, Kohei KaiGai wrote:

Sorry, the syscache part was mixed into the contrib/sepgsql part
in my previous post.
Please see the attached revision.

Although its functionality is quite simple (it just reduces the
number of system-call invocations), its performance
improvement is obvious.
So, I hope someone volunteers to review these patches.

This is a review of the two userspace access vector cache patches for 
SE-PostgreSQL. Proofreading the source I've seen mostly cosmetic things. 
Though I have a few questions, I think the overall shape of the patch is 
good enough to mark it ready for committer.


Remarks:

* The patches apply cleanly and compile cleanly on Fedora 15. The code 
depends on libselinux-2.0.99, and that is checked by the configure script: good.


* I ran SELECT sepgsql_restorecon(NULL) and saw speed gains and backend 
process memory increases comparable to the numbers indicated by 
KaiGai-san. Without the patch, the backend's vmsize increases by 3864kB 
from running restorecon; with the patch it increases by 8160kB, a 
difference of ~5MB. Where this change comes from is unclear to me. The 
avc_cache holds only 7 entries, and the avc memory context stats indicate 
it's only at 8kB. Investigation into the SECLABELOID syscache revealed 
that even when that cache was set to 2 buckets, the backend's vmsize 
still increased by about the same ~5MB after restorecon().


* The size of SECLABELOID is currently set to 2048, which is equal to 
the sizes of the pg_proc and pg_attribute caches and hence makes sense. 
pg_seclabel initially contains 3346 entries. Just to get some idea of 
what the cache size means for restorecon performance, I tested some 
different values (with debugging on). Down to a size of 256, restorecon 
had comparable performance; there was a small drop from ~415ms to ~425ms 
going from 128 to 64 buckets. A cache of 32 gave ~435ms; a cache of 2 
gave 680ms. Without debugging symbols, I also got slightly better 
performance on the restorecon call with a 1024-entry syscache. This 
might be something to tweak in later versions, when there is more 
experience with what this cache size means for performance on real 
databases.


* The cache is called userspace, presumably because it isn't kept in 
shared memory? If sepgsql is targeted at installations with many 
different kinds of clients (having different subject contexts), or that 
access different objects, this is a good choice.


* Checked that the cache keeps working after errors, ok.

* uavc.c defines some static variables like lru_hint, threshold and 
unlabeled. It would be nice to have a small comment with each one saying 
what the variable represents (as is done for the avc_cache struct 
members right above them).


* I had to google SELINUX_AVD_FLAGS_PERMISSIVE to understand what was 
going on. Since the reason for recording that a domain is permissive is 
to prevent flooding the log, maybe renaming avc_cache.permissive to 
avc_cache.log_violations_once would make it self-explanatory.


* selinux_status_open() is called, and its man page mentions 
selinux_status_close(). The latter currently unmaps memory and closes a 
file descriptor, which is done on process exit anyway. I don't have 
enough experience with these kinds of system calls to have a feeling for 
whether that might change in the future (in which case I'd have added a 
call to selinux_status_close() at proc_exit, just to be on the safe side).


* sepgsql_avc_unlabeled(). I was confused a bit because objects can have 
a label 'unlabeled', and the comments use wording like 'when unlabeled'. 
Does this mean with no label, or with the label 'unlabeled'?


* sepgsql_avc_compute(): the large comment at the beginning is hard to 
follow, since the sentences seem to have been mixed up a bit.


* question: why is ucontext stored in avc_cache?

* There is a new user-facing GUC parameter, avc_threshold, which is 
also documented. My initial question after reading the description was: 
what are the 'items', and how can I see the current usage / what numbers 
should I expect? Without that knowledge, any educated change to this 
parameter would be impossible. Would it be possible to remove this GUC?


* avc_init has magic numbers 384 and 4096. Maybe these could be #defines 
(with a small comment saying what they are)?


* Finally, IMO the call sites, e.g. check_relation_privileges, have 
improved in readability with this patch.


--

Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data




Re: [HACKERS] [v9.2] Fix leaky-view problem, part 2

2011-07-20 Thread Noah Misch
On Wed, Jul 20, 2011 at 09:02:59AM +0200, Yeb Havinga wrote:
 On 2011-07-09 09:14, Kohei KaiGai wrote:
 OK, I'll try to modify the patch according to the pg_proc flag design.
 As long as the default for user-defined functions is off, and we provide
 built-in functions
 with appropriate configurations, it seems to me the burden on the DBA is
 quite limited.

 A different solution to the leaky-view problem could be to check access
 to a tuple at or near the heap-tuple visibility level, in addition to
 adding tuple access filter conditions to the query. This would retain
 the possible performance benefits of the query-rewriting solution, since
 everything is filtered at the heap-tuple visibility level before further
 processing. Fixing leaky views would not be needed because they cannot
 exist in this case, the code is straightforward, and there's less chance
 of future security bugs from either misconfiguration of leakproof
 functions or code that might introduce another leak path.

The SQL-level semantics of the view define the access rules in question.  How
would you translate that into tests to apply at a lower level?

-- 
Noah Misch                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Tom Lane
I wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 I was thinking that maybe it's time for this module to hook onto the
 cleanup stuff for the xact error case; or at least have a check that it
 has been properly cleaned up elesewhere.  Maybe this can be made to work
 reentrantly if there's a global var holding the current context, and it
 contains a link to the next one up the stack.  At least, my impression
 is that the PG_TRY blocks are already messy.

 Yeah, that's another way we could go.  But I'm not sure how well it
 would interact with potential third-party modules setting up their own
 libxml error handlers.  Anybody have a thought about that?

I thought a bit more about this, and realized that there's a big
obstacle to getting rid of the PG_TRY blocks this way: most of them are
responsible for telling libxml to free some data structures, not just
restoring the error handler state.  We can't drop that aspect without
introducing session-lifespan memory leaks.

In principle we could expand the responsibilities of a
transaction-cleanup hook to include freeing data structures as well as
restoring error handlers, but the idea seems a lot messier and hence
less attractive than it did before.  I was ready to get rid of the
PG_TRY blocks until I came across this problem, but now I think I'll
stick with them.

regards, tom lane



Re: [HACKERS] [v9.2] Fix leaky-view problem, part 2

2011-07-20 Thread Yeb Havinga

On 2011-07-20 16:06, Noah Misch wrote:


The SQL-level semantics of the view define the access rules in question.  How
would you translate that into tests to apply at a lower level?
I assumed the leaky view thread was about row level security, not about 
access rules to views, since it was mentioned at the RLS wiki page for 
se-pgsql. Sorry for the confusion.


regards,
Yeb




Re: [HACKERS] [v9.2] Fix leaky-view problem, part 2

2011-07-20 Thread Yeb Havinga

On 2011-07-20 16:15, Yeb Havinga wrote:

On 2011-07-20 16:06, Noah Misch wrote:


The SQL-level semantics of the view define the access rules in
question.  How would you translate that into tests to apply at a lower
level?

I assumed the leaky view thread was about row level security, not 
about access rules to views, since it was mentioned at the RLS wiki 
page for se-pgsql. Sorry for the confusion.
Had to dig a bit for the wiki; it was this one:
http://wiki.postgresql.org/wiki/RLS#Issue:_A_leaky_VIEWs_for_RLS


regards,
Yeb




Re: [HACKERS] [v9.2] Fix leaky-view problem, part 2

2011-07-20 Thread Noah Misch
On Wed, Jul 20, 2011 at 04:23:10PM +0200, Yeb Havinga wrote:
 On 2011-07-20 16:15, Yeb Havinga wrote:
 On 2011-07-20 16:06, Noah Misch wrote:

 The SQL-level semantics of the view define the access rules in  
 question.  How
 would you translate that into tests to apply at a lower level?
 I assumed the leaky view thread was about row level security, not  
 about access rules to views, since it was mentioned at the RLS wiki  
 page for se-pgsql. Sorry for the confusion.
 Had to dig a bit for the wiki; it was this one:
 http://wiki.postgresql.org/wiki/RLS#Issue:_A_leaky_VIEWs_for_RLS

It is about row-level security, broadly.  These patches close the hazard
described in the latter half of this page:
http://www.postgresql.org/docs/9.0/static/rules-privileges.html

In the example given there, phone NOT LIKE '412%' is the (row-level) access
rule that needs to apply before any possibly-leaky function sees the tuple.

-- 
Noah Misch                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Robert Haas
On Wed, Jul 20, 2011 at 9:47 AM, Yeb Havinga yebhavi...@gmail.com wrote:
 * I ran SELECT sepgsql_restorecon(NULL) and saw speed gains and backend
 process memory increases comparable to the numbers indicated by KaiGai-san.
 Without the patch, the backend's vmsize increases by 3864kB from running
 restorecon; with the patch it increases by 8160kB, a difference of ~5MB.
 Where this change comes from is unclear to me. The avc_cache holds only 7
 entries, and the avc memory context stats indicate it's only at 8kB.
 Investigation into the SECLABELOID syscache revealed that even when that
 cache was set to 2 buckets, the backend's vmsize still increased by about
 the same ~5MB after restorecon().

 * The size of SECLABELOID is currently set to 2048, which is equal to the
 sizes of the pg_proc and pg_attribute caches and hence makes sense.
 pg_seclabel initially contains 3346 entries. Just to get some idea of what
 the cache size means for restorecon performance, I tested some different
 values (with debugging on). Down to a size of 256, restorecon had
 comparable performance; there was a small drop from ~415ms to ~425ms going
 from 128 to 64 buckets. A cache of 32 gave ~435ms; a cache of 2 gave 680ms.
 Without debugging symbols, I also got slightly better performance on the
 restorecon call with a 1024-entry syscache. This might be something to
 tweak in later versions, when there is more experience with what this
 cache size means for performance on real databases.

Both of the above points make me really nervous, especially the second
one.  We already know that the cost of initializing the system caches
is a significant chunk of backend startup overhead, and I believe that
sizeof(Dllist) = 16 on a 64-bit machine, so a 2048-entry
catcache is going to require us to zero an additional 32kB of memory
on every backend startup for a feature that most people won't use.

http://archives.postgresql.org/pgsql-hackers/2010-11/msg01632.php
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01640.php

As to the first point, the other big problem with adding a syscache
here is that it could get really big.  I've worried about that issue
previously, and Yeb's perplexity about where the memory is going is
not too reassuring.

I think there are two realistic ways forward here:

1. Bite the bullet and implement a system allowing catalog caches to
be expanded on the fly at runtime.  This would be nice because, aside
from whatever value it may have for SE-PostgreSQL, it might allow us
to shrink some of the other caches down and thereby speed up backend
startup in general, with the confidence that if the session ends up
running for a while we can expand them as necessary.  In the
particular case of SE-PostgreSQL, it might be nice to be able to set
the initial cache size to 0, and only pay the cost of setting it up if
we discover that the security label feature is in use; but maybe
that's overly complex.  On the downside, this addresses only the
startup problem (zeroing previously unreferenced memory is fairly
expensive) and not the runtime problem (using too much memory is bad).
 But perhaps there is some other solution to the second problem.

2. Implement a custom cache tailored to the needs of SE-PostgreSQL
within contrib/sepgsql.  This is a bit tricky because you need some
mechanism for receiving invalidation events.
CacheRegisterSyscacheCallback() is the usual method, but that only
lets you notice catcache invalidations, and here the whole point would
be to avoid having a catcache in the first place.  Possibly we could
add a hook to PrepareForTupleInvalidation(); assuming that the hook
engages only after the IsSystemRelation and IsToastRelation checks, it
shouldn't cost that much.  Doing it this way would (1) allow the
sepgsql-specific cache to come into existence only when needed and (2)
allow the sepgsql-specific cache to apply sepgsql-specific policies for
controlling memory use.  For example, it could cap the number of
entries in the cache and age out any that are too old; or it could
arrange to maintain a single copy of each label rather than a copy for
each object.

I think that the uavc cache stuff may still be something we can think
about committing for this 'fest (I haven't reviewed it yet), but IMHO
the syscache parts need some more thought.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Questions and experiences writing a Foreign Data Wrapper

2011-07-20 Thread Albe Laurenz
I wrote a FDW for Oracle to a) learn some server coding
and b) see how well the FDW API works for me.

I came up with three questions/experiences:

1) GetUserMapping throws an error if there is no
   user mapping for the user (or PUBLIC).
   I think that it would be much more useful if
   it would return NULL or something similar instead.
   Otherwise one would have to check for existence
   beforehand, which is no less complicated than what
   GetUserMapping does.

2) If I decide to close remote database connections after
   use, I would like to do so where reasonable.
   I would like to keep the connection open between query
   planning and query execution and close it when the
   scan is done.
   The exception could be named prepared statements.
   Is there a way to tell if that is the case during
   planning or execution?

3) I am confused by the order of function calls
   during execution of a subplan. It is like this:
 BeginForeignScan
 ReScanForeignScan
 IterateForeignScan
 IterateForeignScan
 ...
 ReScanForeignScan
 IterateForeignScan
 IterateForeignScan
 ...
 EndForeignScan
  So the first ReScan is done immediately after
  BeginForeignScan. Moreover, internal parameters are not
  set in the BeginForeignScan call.

  This is probably working as designed, but BeginForeignScan
  has no way to know whether it should execute a remote
  query or not. I ended up doing that in the first call to
  IterateForeignScan because I saw no other way.

  That seems a bit unfortunate. Is there an easy way to
  change that? If not, it should be documented.

Yours,
Laurenz Albe



Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 On Jul20, 2011, at 01:42 , Tom Lane wrote:
 1. If you forget pg_xml_done in some code path, you'll find out from
 an Assert at the next pg_xml_init, which is probably far away from where
 the actual problem is.

 Very true. In fact, I did miss one pg_xml_done() call in the xml2
 contrib module initially, and it took me a while to locate the place
 it was missing.

 But won't we miss that error entirely if we make it re-entrant?

Yeah, I don't see any very easy way to detect a missed pg_xml_done call,
but having it be an Assert failure some time later isn't good from a
robustness standpoint.

I'm of the opinion at this point that the most reliable solution is to
have a coding convention that pg_xml_init and pg_xml_done MUST be
called in the style

pg_xml_init();
PG_TRY();
...
PG_CATCH();
{
...
pg_xml_done();
PG_RE_THROW();
}
PG_END_TRY();
pg_xml_done();

If we convert contrib/xml2 over to this style, we can get rid of some of
the weirder aspects of the LEGACY mode, such as allowing xml_ereport to
occur after pg_xml_done (which would be problematic for my proposal,
since I want pg_xml_done to pfree the state including the message
buffer).

 I was tempted to make all the functions in xml2/ use TRY/CATCH blocks
 like the ones in core's xml.c do. The reason I held back was that
 I felt these cleanups weren't part of the problem my patch was trying
 to solve.

Fair point, but contorting the error handling to avoid changing xml2/ a
bit more doesn't seem like a good decision to me.  It's not like we
aren't forcing some change on that module already.

 So we really ought to make pg_xml_done() complain if libxml's
 current error context isn't what it expects.

Right, got that coded already.

regards, tom lane



Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Florian Pflug
On Jul20, 2011, at 17:08 , Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 On Jul20, 2011, at 01:42 , Tom Lane wrote:
 1. If you forget pg_xml_done in some code path, you'll find out from
 an Assert at the next pg_xml_init, which is probably far away from where
 the actual problem is.
 
 But won't we miss that error entirely if we make it re-entrant?
 
 I'm of the opinion at this point that the most reliable solution is to
 have a coding convention that pg_xml_init and pg_xml_done MUST be
 called in the style
 
   pg_xml_init();
   PG_TRY();
   ...
   PG_CATCH();
   {
   ...
   pg_xml_done();
   PG_RE_THROW();
   }
   PG_END_TRY();
   pg_xml_done();
 
 If we convert contrib/xml2 over to this style, we can get rid of some of
 the weirder aspects of the LEGACY mode, such as allowing xml_ereport to
 occur after pg_xml_done

Fine with me. 

 (which would be problematic for my proposal,
 since I want pg_xml_done to pfree the state including the message
 buffer).

I'm fine with having pg_xml_init() palloc the state and pg_xml_done()
pfree it, but I'm kinda curious about why you prefer that over making it
the callers responsibility and letting callers use a stack-allocated
struct if they wish to.

 I was tempted to make all the functions in xml2/ use TRY/CATCH blocks
 like the ones in core's xml.c do. The reason I held back was that
 I felt these cleanups weren't part of the problem my patch was trying
 to solve.
 
 Fair point, but contorting the error handling to avoid changing xml2/ a
 bit more doesn't seem like a good decision to me.  It's not like we
 aren't forcing some change on that module already.

Fair enough. Are you going to do that, or do you want me to produce an
updated patch? I can do that, but probably not before the weekend.

best regards,
Florian Pflug




Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 I'm fine with having pg_xml_init() palloc the state and pg_xml_done()
 pfree it, but I'm kinda curious about why you prefer that over making it
 the callers responsibility and letting callers use a stack-allocated
 struct if they wish to.

We could do it that way, but it would require exposing the struct
definition to callers.  As I have it coded ATM, the struct is an
opaque typedef in xml.h and only known within xml.c, which decouples
contrib/xml2 from any changes in it.  Another point is that if we
changed our minds and went over to a transaction cleanup hook,
stack-allocated structs wouldn't work at all.  Lastly, even if we
did stack-allocate the control struct, the message buffer has to be
palloc'd so it can be expanded at need.

 Fair enough. Are you going to do that, or do you want me to produce an
 updated patch? I can do that, but probably not before the weekend.

No, I'm working on it, almost done already.

regards, tom lane



Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Florian Pflug
On Jul20, 2011, at 17:37 , Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 I'm fine with having pg_xml_init() palloc the state and pg_xml_done()
 pfree it, but I'm kinda curious about why you prefer that over making it
 the callers responsibility and letting callers use a stack-allocated
 struct if they wish to.
 
 We could do it that way, but it would require exposing the struct
 definition to callers.  As I have it coded ATM, the struct is an
 opaque typedef in xml.h and only known within xml.c, which decouples
 contrib/xml2 from any changes in it.  Another point is that if we
 changed our minds and went over to a transaction cleanup hook,
 stack-allocated structs wouldn't work at all.  Lastly, even if we
 did stack-allocate the control struct, the message buffer has to be
 palloc'd so it can be expanded at need.

You've convinced me, thanks for the detailed explanation!

 Fair enough. Are you going to do that, or do you want me to produce an
 updated patch? I can do that, but probably not before the weekend.
 
 No, I'm working on it, almost done already.

Cool, thanks!

best regards,
Florian Pflug




Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Yeb Havinga

On 2011-07-20 16:54, Robert Haas wrote:


As to the first point, the other big problem with adding a syscache
here is that it could get really big.  I've worried about that issue
previously, and Yeb's perplexity about where the memory is going is
not too reassuring.
Yeah, I just realized that #buckets < cache items stored, which makes 
some of the comments I made a bit strange. If the syscaches store 
everything that's looked up then the same 5MB memory usage under 
changing #buckets makes perfect sense.


regards,
Yeb




Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Kohei Kaigai
Yeb, thanks for volunteering.

 On 2011-07-14 21:46, Kohei KaiGai wrote:
  Sorry, the syscache part was mixed to contrib/sepgsql part
  in my previous post.
  Please see the attached revision.
 
  Although its functionality is enough simple (it just reduces
  number of system-call invocation), its performance
  improvement is obvious.
  So, I hope someone to volunteer to review these patches.
 This is a review of the two userspace access vector cache patches for
 SE-PostgreSQL. Proofreading the source I've seen mostly cosmetic things.
 Though I have a few questions, I think the overall shape of the patch is
 good enough to mark it ready for committer.
 
 Remarks:
 
 * The patches apply cleanly, compile cleanly on Fedora 15. It depends on
 libselinux-2.0.99, and that is checked on by the configure script: good.
 
 * I run SELECT sepgsql_restorecon(NULL) and saw comparable numbers in
 speed gain and also backend process memory increase as indicated by
 KaiGai-san. The vmsize without the patch grows by 3864kB from running
 restorecon; with the patch it grows by 8160kB, a difference of
 ~5MB. Where this change comes from is unclear to me. The avc_cache holds
 only 7 entries, and the avc memory context stats indicate it's only at
 8kB. Investigation into the SECLABELOID syscache revealed that even when
 that cache was set to a #buckets of 2, still after restorecon() the
 backend's vmsize increased about the ~5MB more.
 
The sepgsql_restorecon(NULL) assigns a default security label to all the
database objects being controlled; thus, its workload caches the security
labels (including text data) of these objects.
So, ~5MB of difference is an upper limit of syscache usage because of
SECLABELOID.
The number of buckets is independent from the memory consumption.

 * The size of SECLABELOID is currently set to 2048, which is equal to
 sizes of the pg_proc and pg_attribute caches and hence makes sense. The
 initial contents of pg_seclabel is 3346 entries. Just to get some idea
 what the cachesize means for restorecon performance I tested some
 different values (debugging was on). Until a size of 256, restorecon had
 comparable performance; a small drop from ~415ms to ~425ms going from 128
 to 64 buckets. A cache of 32 gave ~435ms and a cache of 2 gave 680ms. Without
 debugging symbols, I also got a slightly better performance on the
 restorecon call with a 1024 syscache size. This might be something to
 tweak in later versions, when there is more experience what this cache
 size means for performance on real databases.
 
Thanks for your research.
The reason why I set 2048 is just a copy from the catalog which may
hold the biggest number of entries (pg_proc). It may be improved later.

 * The cache is called userspace, presumably because it isn't cached in
 shared memory? If sepgsql is targeted at installations with many
 different kinds of clients (having different subject contexts), or
 access different objects, this is a good choice.
 
SELinux's kernel implementation also has access vector cache:
  
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=security/selinux/avc.c

The reason why I didn't choose shared memory is that the length of the
security label text is variable, so it is not feasible to acquire a
fixed-length region in shared memory at startup time.

 * Checked that the cache keeps working after errors, ok.
 
 * uavc.c defines some static variables like lru_hint, threshold and
 unlabeled. It would be nice to have a small comment with each one that
 says what the variable represents (like is done for the avc_cache struct
 members right above it)
 
OK, I'll add comments here.

 * I had to google SELINUX_AVD_FLAGS_PERMISSIVE to understand what was
 going on. Since the goal why it is recorded a domain is permissive, is
 to prevent flooding the log, maybe renaming avc_cache.permissive into
 avc_cache.log_violations_once would explain itself.
 
It is a bit of a concern for me, because a permissive domain means it
should behave as if the system were configured in permissive mode.
Logging each violation only once is a property of permissive mode.
So, how about adding comments about permissive domains around
the code that modifies cache->allowed?

 * selinux_status_open() is called and it's man page mentions
 selinux_status_close(). It currently unmaps memory and closes a file
 descriptor, which is done on process exit anyway. I don't have enough
 experience with these kinds of system calls to have a feeling whether it
 might change in the future (and in such cases I'd have added a call to
 selinux_status_close() on proc_exit, just to be on the safe side).
 
I omitted it because the OS will handle these things correctly at
process exit time; however, it is a good idea to stay on the safe side.

 * sepgsql_avc_unlabeled(). I was confused a bit because objects can have
 a label 'unlabeled', and the comments use wording like 'when unlabeled'.
 Does this mean with no label, or with a label 'unlabeled'?
 
It means the object has no label.

Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Robert Haas
On Wed, Jul 20, 2011 at 12:04 PM, Kohei Kaigai
kohei.kai...@emea.nec.com wrote:
 The sepgsql_restorecon(NULL) assigns default security label on all the
 database objects being controlled, thus, its workload caches security
 label (including text data) of these objects.
 So, ~5MB of difference is an upper limit of syscache usage because of
 SECLABELOID.

No, it's not.  It's just the upper limit of how large it can be on an
*empty* database.  A real database could have hundreds of tables and
views and thousands of columns.  To say nothing of large objects.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Robert Haas
On Wed, Jul 20, 2011 at 12:40 PM, Kohei Kaigai
kohei.kai...@emea.nec.com wrote:
 One question is why InitCatalogCache() should be invoked from InitPostgres()?
 If we initialize the syscache when starting up the postmaster process,
 I think all the syscache buckets will be ready when child processes fork,
 and an unused syscache does not consume physical memory, even if the
 security label cache acquires 2048 buckets.

Most of the overhead seems to be the cost of the page faults required
for the kernel to map the relevant pages into the process address
space.  After a fork(), all those pages become copy-on-write, so if
the syscaches are actually used this doesn't save much of anything.
In the specific case of sepgsql it would help, though, because what
you're proposing is to add a syscache that will in most cases never be
accessed at all.  We'd still have a problem in the EXEC_BACKEND (i.e.
Windows) case, however...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Initial Review: JSON contrib modul was: Re: [HACKERS] Another swing at JSON

2011-07-20 Thread Joey Adams
On Wed, Jul 20, 2011 at 6:49 AM, Florian Pflug f...@phlo.org wrote:
 Hm, I agree that we need to handle \u escapes in JSON input.
 We won't ever produce them during output though, right?

We could, to prevent transcoding errors if the client encoding is
different than the server encoding (and neither is SQL_ASCII, nor is
the client encoding UTF8).  For example, if the database encoding is
UTF-8 and the client encoding is WIN1252, I'd think it would be a good
idea to escape non-ASCII characters.

 How does that XML type handle the situation? It seems that it'd have
 the same problem with unicode entity references &#nnnn;.

From the looks of it, XML operations convert the text to UTF-8 before
passing it to libxml.  The XML type does not normalize the input;
SELECT '&#9835;♫'::xml; simply yields &#9835;♫.  Escapes of any
character are allowed in any encoding, from the looks of it.

- Joey



Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Kohei Kaigai
 On Wed, Jul 20, 2011 at 12:40 PM, Kohei Kaigai
 kohei.kai...@emea.nec.com wrote:
  One question is why InitCatalogCache() should be invoked from 
  InitPostgres()?
  If we initialize syscache on starting up postmaster process, I think
  all the syscache buckets will be ready on child process forks, and
  unused syscache does not consume physical memory, even if security
  label acquire 2048 of buckets.
 
 Most of the overhead seems to be the cost of the page faults required
 for the kernel to map the relevant pages into the process address
 space.  After a fork(), all those pages become copy-on-write, so if
 the syscaches are actually used this doesn't save much of anything.
 In the specific case of sepgsql it would help, though, because what
 you're proposing is to add a syscache that will in most cases never be
 accessed at all.  We'd still have a problem in the EXEC_BACKEND (i.e.
 Windows) case, however...
 
Hmm. It might not work in the Windows case.

I'd like to have a discussion about syscache towards next commit-fest.
The issues may be:
 - Initial bucket allocation that in most cases is never referenced.
 - Reclaiming cache entries when the cache grows too large.

Thanks,
--
NEC Europe Ltd, SAP Global Competence Center
KaiGai Kohei kohei.kai...@emea.nec.com



Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Kohei Kaigai
 On Wed, Jul 20, 2011 at 12:04 PM, Kohei Kaigai
 kohei.kai...@emea.nec.com wrote:
  The sepgsql_restorecon(NULL) assigns default security label on all the
  database objects being controlled, thus, its workload caches security
  label (including text data) of these objects.
  So, ~5MB of difference is an upper limit of syscache usage because of
  SECLABELOID.
 
 No, it's not.  It's just the upper limit of how large it can be on an
 *empty* database.  A real database could have hundreds of tables and
 views and thousands of columns.  To say nothing of large objects.
 
Ah, sorry, you are correct.

Regarding large objects, GetSecurityLabel() is modified not to use
SECLABELOID, to avoid flooding the syscache.

Thanks,
--
NEC Europe Ltd, SAP Global Competence Center
KaiGai Kohei kohei.kai...@emea.nec.com



Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Tom Lane
I've committed this patch with the discussed changes and some other
editorialization.  I have to leave for an appointment and can't write
anything now about the changes, but feel free to ask questions if you
have any.

regards, tom lane



Re: [HACKERS] proposal: new contrib module plpgsql's embeded sql validator

2011-07-20 Thread Petr Jelínek

On 07/20/2011 05:24 AM, Tom Lane wrote:

Petr Jelínek pjmo...@pjmodos.net writes:

But, I think we should add a validation hook to the plpgsql plugin structure
so that you don't have to actually execute the function to check it -
currently there are only execution hooks, which is why the plugin only
works when you run the func (not good for automation).

If you mean that such checks would be done automatically, no, they
shouldn't be.  Consider a function that creates a table and then uses
it, or even just depends on using a table that doesn't yet exist when
you do CREATE FUNCTION.


No, certainly not by default, I would just like to be able to do it 
automatically without having to call the function.


--
Petr Jelinek



Re: [HACKERS] A few user-level questions on Streaming Replication and pg_upgrade

2011-07-20 Thread Gurjeet Singh
On Tue, Jul 19, 2011 at 8:45 PM, Bruce Momjian br...@momjian.us wrote:

 Gurjeet Singh wrote:

 [  CC to general removed --- emailing only hackers;  cross-posting is
 frowned upon. ]


I thought these questions were of interest to the general public too.



  .) Is Streaming Replication supported across minor releases, in reverse
  direction; e.g. 9.0.3 to 9.0.1
 
  I think the answer is it depends, since it would depend upon
 whether
  any SR related bug has been fixed in the 'greater' of the minor releases.
 
  I am assuming that smaller minor release to bigger minor release will
  always be supported (e.g. 9.0.1 to 9.0.3)

 Yes.


I am assuming that's a yes in both directions: older -> newer, and
newer -> older minor releases.


 We could mention in the minor release notes if we break streaming
 replication for a minor release --- or someone will tell us when we do.


I am pretty sure the Postgres community would notify its user base via
release notes.


  .) How reliable is `pg_upgrade -c` (dry run) currently; that is, how
  accurate is pg_upgrade at predicting any potential problem with the
 eventual
  in-place upgrade.
 
  I'd say it is as reliable as it gets since this is the official tool
  supported by the project, and it should not contain any known bugs. One
 has
  to use the latest and greatest 'minor' version of the tool for the major
  release they are upgrading to, though.

 Well, we make no guarantees about the software at all, so it is hard to
 make any guarantee about pg_upgrade either.


:) Given the BSD-style license, that's a fair point.

Thanks,
-- 
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] [v9.2] SECURITY LABEL on shared database object

2011-07-20 Thread Robert Haas
On Wed, Jul 6, 2011 at 1:25 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2011/7/5 Alvaro Herrera alvhe...@commandprompt.com:
 Excerpts from Kohei Kaigai's message of mar jul 05 11:46:06 -0400 2011:
  On Tue, Jul 5, 2011 at 10:49 AM, Alvaro Herrera
  alvhe...@commandprompt.com wrote:
   Excerpts from Robert Haas's message of mar jul 05 10:19:18 -0400 2011:
  
   Hmm, OK.  I guess what I'm not sure about is - how much should we
   worry about the fact that this creates several more shared (and
   therefore nailed?) system catalogs?  Anyone have an opinion on that?
  
   Several?  That would worry me, given that we currently have a small
   number (eight currently).  If it's just one more, I don't think it's
   such a big deal.  I'm not sure what you mean by nailed though -- I mean,
   for example pg_shdescription is shared but not nailed in the rd_isnailed
   sense of the word, AFAICS.
 
  Well, right now the patch has pg_shseclabel, and its index, plus a
  toast table and a toast index.  Not sure why we want/need the toast
  table  index there, but the patch has 'em as of now.
 
 As a common belief, TEXT is a variable-length data type, so pg_shseclabel
 needs to have its toast table. However, I don't expect the label field to
 get represented as a reference to an external pointer, because the average
 length of a security context is about 40-60 bytes, much less than the
 threshold to launch toast_save_datum().
 Do I need to remove these toast table & index?

 We don't have toast tables for pg_database and so on, for example, which
 means that datacl cannot go over a few hundred bytes long.  I think it
 makes sense to not have toast tables for pg_shseclabel.  Keep in mind
 that the label might be compressed before it's stored out of line, which
 gives you quite a bit of actual space.  If a security context is over
 5000 bytes in length I think you're in trouble :-)

 The attached patch removes toast table  index for pg_shseclabel.

 The current toasting.h defines toast table & index on pg_database,
 pg_shdescription and pg_db_role_setting only.
 The pg_authid and pg_tablespace don't have toast table & index
 in spite of having variable-length fields.
 So, it might not be necessary for all the shared relations.

I have committed this with fairly extensive revisions.  The main
things I changed were:

- The pg_dump/pg_dumpall support was broken for databases and
needlessly inefficient for roles and tablespaces.  (Parenthetically,
it is hard to blame anyone for screwing up the code here when it is
spaghetti code to begin with.)

- I did not commit the contrib/sepgsql parts of the patch, as I
haven't reviewed them.  I think it would be helpful for you to
resubmit those separately.

- I didn't think it was a good idea to make pg_shseclabel.provider a
name while leaving pg_seclabel.provider as a text field.  Surely these
two catalogs need to stay parallel.  For the same reason, I didn't
like the idea of installing a syscache for pg_shseclabel while
pg_seclabel soldiers on without one.  So for now I ripped out that
logic and instead made it work the old, slow way.  I know we need a
better solution here, but I want to come up with a plan that handles
both catalogs symmetrically and then do it all at once, rather than
piecemeal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Kohei Kaigai
 On Wed, Jul 20, 2011 at 9:47 AM, Yeb Havinga yebhavi...@gmail.com wrote:
  * I run SELECT sepgsql_restorecon(NULL) and saw comparable numbers in speed
  gain and also backend process memory increase as indicated by KaiGai-san.
  The vmsize without the patch increases from running restorecon 3864kB, with
  the patch is increases 8160kB, a difference of ~5MB. Where this change comes
  from is unclear to me. The avc_cache holds only 7 entries, and the avc
  memory context stats indicate it's only at 8kB. Investigation into the
  SECLABELOID syscache revealed that even when that cache was set to a
  #buckets of 2, still after restorecon() the backend's vmsize increased about
  the ~5MB more.
 
  * The size of SECLABELOID is currently set to 2048, which is equal to sizes
  of the pg_proc and pg_attribute caches and hence makes sense. The initial
  contents of pg_seclabel is 3346 entries. Just to get some idea what the
  cachesize means for restorecon performance I tested some different values
  (debugging was on). Until a size of 256, restorecon had comparable
  performance. Small drop from ~415 to ~425 from 128 to 64. Cache of 32 had
  ~435ms performance. Cache of 2 had 680ms. Without debugging symbols, I also
  got a slightly better performance on the restorecon call with a 1024
  syscache size. This might be something to tweak in later versions, when
  there is more experience what this cache size means for performance on real
  databases.
 
 Both of the above points make me real nervous, especially the second
 one.  We already know that the cost of initializing the system caches
 is a significant chunk of backend startup overhead, and I believe that
 sizeof(Dllist) = 16 on 64-bit machine, so that means that 2048-entry
 catcache is going to require us to zero an additional 32kB of memory
 on every backend startup for a feature that most people won't use.
 
 http://archives.postgresql.org/pgsql-hackers/2010-11/msg01632.php
 http://archives.postgresql.org/pgsql-hackers/2010-11/msg01640.php
 
 As to the first point, the other big problem with adding a syscache
 here is that it could get really big.  I've worried about that issue
 previously, and Yeb's perplexity about where the memory is going is
 not too reassuring.
 
 I think there are two realistic ways forward here:
 
 1. Bite the bullet and implement a system allowing catalog caches to
 be expanded on the fly at runtime.  This would be nice because, aside
 from whatever value it may have for SE-PostgreSQL, it might allow us
 to shrink some of the other caches down and thereby speed up backend
 startup in general, with the confidence that if the session ends up
 running for a while we can expand them as necessary.  In the
 particular case of SE-PostgreSQL, it might be nice to be able to set
 the initial cache size to 0, and only pay the cost of setting it up if
 we discover that the security label feature is in use; but maybe
 that's overly complex.  On the downside, this addresses only the
 startup problem (zeroing previously unreferenced memory is fairly
 expensive) and not the runtime problem (using too much memory is bad).
  But perhaps there is some other solution to the second problem.
 
One question is why InitCatalogCache() should be invoked from InitPostgres()?
If we initialize the syscache when starting up the postmaster process,
I think all the syscache buckets will be ready when child processes fork,
and an unused syscache does not consume physical memory, even if the
security label cache acquires 2048 buckets.

 2. Implement a custom cache tailored to the needs of SE-PostgreSQL
 within contrib/sepgsql.  This is a bit tricky because you need some
 mechanism for receiving invalidation events.
 CacheRegisterSyscacheCallback() is the usual method, but that only
 lets you notice catcache invalidations, and here the whole point would
 be to avoid having a catcache in the first place.  Possibly we could
 add a hook to PrepareForTupleInvalidation(); assuming that the hook
 engages only after the IsSystemRelation and IsToastRelation checks, it
 shouldn't cost that much.  Doing it this way would (1) allow the
 sepgsql-specific cache to come into existence only when needed and (2)
 allow the sepgsql-specific cache to apply sepgsql-specific policies for
 controlling memory use.  For example, it could cap the number of
 entries in the cache and age out any that are too old; or it could
 arrange to maintain a single copy of each label rather than a copy for
 each object.
 
I'm interested in this idea; however, it is not work for this commit-fest.

So, I'll detach the syscache part from the existing uavc patch (and the
shared object's security label patch).

 I think that the uavc cache stuff may still be something we can think
 about committing for this 'fest (I haven't reviewed it yet), but IMHO
 the syscache parts need some more thought.
 
Thanks,
--
NEC Europe Ltd, SAP Global Competence Center
KaiGai Kohei kohei.kai...@emea.nec.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Questions and experiences writing a Foreign Data Wrapper

2011-07-20 Thread Heikki Linnakangas

On 20.07.2011 18:00, Albe Laurenz wrote:

2) If I decide to close remote database connections after
use, I would like to do so where reasonable.
I would like to keep the connection open between query
planning and query execution and close it when the
scan is done.
The exception could be named prepared statements.
Is there a way to tell if that is the case during
planning or execution?


Hmm, maybe you could add a hook to close the connection when the 
transaction ends. But actually, you'd want to keep the connection open 
across transactions too. Some sort of a general connection caching 
facility would be useful for many FDW.
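
Such a facility could be as simple as a small table keyed by a server/user
string: on a miss the FDW dials out, on a hit it reuses the open connection
across queries and transactions.  A sketch with invented names (this is not
an existing PostgreSQL API, and the connection type is left opaque):

```c
#include <string.h>

#define MAX_CONNS 8

typedef struct
{
    char  key[128];     /* e.g. "servername/username" */
    void *conn;         /* whatever handle the remote library returns */
    int   in_use;
} ConnCacheEntry;

static ConnCacheEntry conn_cache[MAX_CONNS];

/* Return a cached connection for key, or establish one via connect_fn. */
void *get_cached_connection(const char *key, void *(*connect_fn)(const char *))
{
    ConnCacheEntry *free_slot = NULL;
    int i;

    for (i = 0; i < MAX_CONNS; i++)
    {
        if (conn_cache[i].in_use && strcmp(conn_cache[i].key, key) == 0)
            return conn_cache[i].conn;      /* reuse across transactions */
        if (!conn_cache[i].in_use && free_slot == NULL)
            free_slot = &conn_cache[i];
    }
    if (free_slot == NULL)
        return connect_fn(key);             /* cache full: uncached fallback */
    strncpy(free_slot->key, key, sizeof(free_slot->key) - 1);
    free_slot->conn = connect_fn(key);
    free_slot->in_use = 1;
    return free_slot->conn;
}

/* To be called from a transaction-end or invalidation hook. */
void discard_cached_connection(const char *key, void (*disconnect_fn)(void *))
{
    int i;

    for (i = 0; i < MAX_CONNS; i++)
        if (conn_cache[i].in_use && strcmp(conn_cache[i].key, key) == 0)
        {
            disconnect_fn(conn_cache[i].conn);
            conn_cache[i].in_use = 0;
            return;
        }
}
```

The hook question from the mail then becomes: who calls
discard_cached_connection(), and when; keeping entries past transaction end
is what makes the cache pay off.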


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] lazy vxid locks, v3

2011-07-20 Thread Robert Haas
I took another look at v2 of my lazy vxid locks patch and realized
that it was pretty flaky in a couple of different ways.  Here's a
version that I think is a bit more robust, but considering the extent
of the revisions, it probably needs another round of review from
someone before I commit it.

Any review appreciated; I would prefer not to have to wait until
October to get this committed, since there is quite a bit of follow-on
work that I would like to do as well.  FWIW, the performance
characteristics are basically identical to the previous versions,
AFAICT.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


lazyvxid-v3.patch
Description: Binary data



Re: [HACKERS] cataloguing NOT NULL constraints

2011-07-20 Thread Alvaro Herrera
Excerpts from Peter Eisentraut's message of sáb jul 09 14:45:23 -0400 2011:
 On tor, 2011-07-07 at 17:34 -0400, Alvaro Herrera wrote:
  The attached patch introduces pg_constraint rows for NOT NULL
  column constraints.
 
 The information schema views check_constraints and table_constraints
 currently make up some artificial constraint names for not-null
 constraints.  I suppose this patch removes the underlying cause for
 that, so could you look into updating the information schema as well?
 You could probably just remove the separate union branches for not null
 and adjust the contype conditions.

Fixing table_constraints is pretty trivial, just like you suggest;
already done in my private tree.

I checked the check_constraints definition in the standard and it's not
clear to me that NOT NULL constraints are supposed to be there at all.
Are NOT NULL constraints considered to be CHECK constraints too?

The fix is trivial either way: if they are not to be there we should
just remove the UNION arm that deals with them.  If they are, we do
likewise and then fix the other arm as you suggest.

Thanks for the pointer.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Bruce Momjian
Tom Lane wrote:
 I've committed this patch with the discussed changes and some other
 editorialization.  I have to leave for an appointment and can't write
 anything now about the changes, but feel free to ask questions if you
 have any.

Did this fix any of the XML TODOs?

http://wiki.postgresql.org/wiki/Todo#XML

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] A few user-level questions on Streaming Replication and pg_upgrade

2011-07-20 Thread Bruce Momjian
Gurjeet Singh wrote:
 On Tue, Jul 19, 2011 at 8:45 PM, Bruce Momjian br...@momjian.us wrote:
 
  Gurjeet Singh wrote:
 
  [  CC to general removed --- emailing only hackers;  cross-posting is
  frowned upon. ]
 
 
 I thought these questions were of interest to the general public too.

What I usually do is to discuss on hackers and post a summary of
information 'general' would find interesting.

   .) Is Streaming Replication supported across minor releases, in reverse
   direction; e.g. 9.0.3 to 9.0.1
  
   I think the answer is it depends, since it would depend upon
  whether
   any SR related bug has been fixed in the 'greater' of the minor releases.
  
   I am assuming that smaller minor release to bigger minor release will
   always be supported (e.g. 9.0.1 to 9.0.3)
 
  Yes.
 
 
 I am assuming that's a yes to both the directions: older -> newer, and
 newer -> older minor releases.

Yes, I believe both directions would work unless we mention it in the
release notes, in which case it might not work, or might work older to
newer but not newer to older.

  We could mention in the minor release notes if we break streaming
  replication for a minor release --- or someone will tell us when we do.
 
 
 I am pretty sure the Postgres community would notify its user base via
 release notes.

We will if we know it, but it is possible to have rare cases where we
don't find out until after the minor release.

   .) How reliable is `pg_upgrade -c` (dry run) currently; that is, how
   accurate is pg_upgrade at predicting any potential problem with the
  eventual
   in-place upgrade.
  
   I'd say it is as reliable as it gets since this is the official tool
   supported by the project, and it should not contain any known bugs. One
  has
   to use the latest and greatest 'minor' version of the tool for the major
   release they are upgrading to, though.
 
  Well, we make no guarantees about the software at all, so it is hard to
  make any guarantee about pg_upgrade either.
 
 
 :) Given the BSD-style license, that's a fair point.

I just found out, thanks to EnterpriseDB testing, that pg_upgrade -l
doesn't work on Windows.  I will post the fix in an hour for all
released versions of pg_upgrade.  You will find it entertaining that -l
works in check mode but not in actual upgrade mode.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] A few user-level questions on Streaming Replication and pg_upgrade

2011-07-20 Thread Robert Haas
On Wed, Jul 20, 2011 at 2:53 PM, Bruce Momjian br...@momjian.us wrote:
 I am assuming that's a yes to both the directions: older -> newer, and
 newer -> older minor releases.

 Yes, I believe both directions would work unless we mention it in the
 release notes, in which case it might not work, or might work older to
 newer but not newer to older.

I don't see how this would get broken in a minor release.  We'd have
to change the WAL format or the tuple format, which is definitely not
minor release material.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] WIP fix proposal for bug #6123

2011-07-20 Thread Kevin Grittner
We're nearing completion of testing the migration of a lot of code
which used our custom Java framework into PostgreSQL functions and
triggers.  Yesterday our testers ran into surprising behavior
related to delete triggers.  A test case is presented on the -bugs
list, but basically it amounts to this:
 
(1)  We have some detail which is summarized by triggers into
related higher-level tables for performance reasons.
 
(2)  On delete, some of the higher level tables should cascade the
delete down to the lower levels.
 
(3)  Sometimes the same tables are involved in both.
 
This is complicated by the foreign key situation -- due to
conversion of less-than-perfect data and the fact that there is a
legal concept of the elected Clerk of Court in each county being the
custodian of the data we have orphaned detail in some tables which
we don't have authority to delete or create bogus parent rows for. 
(It would actually be a felony for us to do so, I think.)  Until 9.2
we won't be able to create foreign keys for these relationships, but
in each county we've created foreign keys for all relationships
without orphans.  So, this is one reason we can't count on foreign
key declarations, with the ON DELETE CASCADE option, yet we don't
want to drop the foreign keys where they exist, as they help prevent
further degradation of the data integrity.  So the DELETE from the
children should occur in the BEFORE trigger to avoid upsetting FK
logic.  Otherwise we could move the cascading deletes to the AFTER
DELETE trigger, where this odd behavior doesn't occur.
 
So basically, the goal of this patch is to ensure that a BEFORE
DELETE trigger doesn't silently fail to delete a row because that
row was updated during the BEFORE DELETE trigger firing (in our case
by secondary trigger firing).
 
If that description was too hard to follow, let me know and I'll try
again.  :-/
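 
For concreteness, the surprising behavior can be reproduced with a
stripped-down sketch in the spirit of the test case on -bugs (table and
trigger names invented here): a BEFORE DELETE trigger on the parent
cascades to the children, whose own BEFORE DELETE triggers update the
parent row, and the outer DELETE then silently skips it.
 
```sql
-- parent row summarizes child rows
CREATE TABLE parent (id int PRIMARY KEY, child_count int NOT NULL DEFAULT 0);
CREATE TABLE child  (id int PRIMARY KEY, parent_id int NOT NULL);

-- deleting a child updates the summary on its parent
CREATE FUNCTION child_del() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  UPDATE parent SET child_count = child_count - 1 WHERE id = OLD.parent_id;
  RETURN OLD;
END $$;
CREATE TRIGGER child_del BEFORE DELETE ON child
  FOR EACH ROW EXECUTE PROCEDURE child_del();

-- deleting a parent cascades to its children in a BEFORE trigger
CREATE FUNCTION parent_del() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  DELETE FROM child WHERE parent_id = OLD.id;  -- fires child_del, which
  RETURN OLD;                                  -- updates the row being deleted
END $$;
CREATE TRIGGER parent_del BEFORE DELETE ON parent
  FOR EACH ROW EXECUTE PROCEDURE parent_del();

INSERT INTO parent VALUES (1, 1);
INSERT INTO child  VALUES (10, 1);

DELETE FROM parent WHERE id = 1;
-- the child is gone, but the parent row survives: the UPDATE performed by
-- the secondary trigger made the outer DELETE silently skip the row
```
 
The patch described here is intended to make that final DELETE follow the
updated row version and remove the parent as well.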
 
[Summarizing discussion on the -bugs list,] Tom didn't feel that
there was a need to support application code which does what I
describe above, and he felt that fixing it would open a can of
worms, with logical quandaries about correct behavior. Basically,
the changes I made were within switch statements where if the row
was found to be HeapTupleUpdated in READ COMMITTED, it would follow
the ctid chain; I used similar logic for HeapTupleSelfUpdated
regardless of transaction isolation level.  The reason for not
re-firing delete triggers here is the same as the reason delete triggers
are not fired in the existing case -- it's just one delete.
 
No claims are made for completeness of this patch -- it's a proof of
concept on which I hope to get comments. Before this patch would be
production ready I would need to check for similar needs on UPDATE,
and would need to check to make sure there is no resource leakage.
It passes `make check-world`, `make installcheck-world` with a
database set for serializable transaction isolation, and the
isolation tests.  And of course, it doesn't show the behavior which
we found so astonishing -- we no longer see an attempted delete of a
parent succeed in deleting the children but leave the parent sitting
there.
 
A patch against 9.0 based on this approach may find its way into
production here in about two weeks if there are no
contraindications, so any review would be very much appreciated.
 
-Kevin
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
***
*** 1847,1854  EvalPlanQualFetch(EState *estate, Relation relation, int lockmode,
switch (test)
{
case HeapTupleSelfUpdated:
-   /* treat it as deleted; do not process */
ReleaseBuffer(buffer);
return NULL;
  
case HeapTupleMayBeUpdated:
--- 1847,1862 
switch (test)
{
case HeapTupleSelfUpdated:
ReleaseBuffer(buffer);
+   if (!ItemPointerEquals(&update_ctid, &tuple.t_self))
+   {
+   /* it was updated, so look at the updated version */
+   tuple.t_self = update_ctid;
+   /* updated row should have xmin matching this xmax */
+   priorXmax = update_xmax;
+   continue;
+   }
+   /* treat it as deleted; do not process */
return NULL;
  
case HeapTupleMayBeUpdated:
*** a/src/backend/executor/nodeModifyTable.c
--- b/src/backend/executor/nodeModifyTable.c
***

Re: [HACKERS] WIP fix proposal for bug #6123

2011-07-20 Thread Robert Haas
On Wed, Jul 20, 2011 at 2:58 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 So basically, the goal of this patch is to ensure that a BEFORE
 DELETE trigger doesn't silently fail to delete a row because that
 row was updated during the BEFORE DELETE trigger firing (in our case
 by secondary trigger firing).

I've run across this problem before while writing application code and
have found it surprising, unfortunate, and difficult to work around.
It's not so bad when it only bites you once, but as things get more
complicated and you have more and more triggers floating around, it
gets pretty darn horrible.  One of the things I've done to mitigate
the impact of this is to write as many triggers as possible as AFTER
triggers, but that's sometimes difficult to accomplish.

Your scenario is a BEFORE DELETE trigger that does an UPDATE on the
same row, but I think this problem also occurs if you have a BEFORE
UPDATE trigger that does an UPDATE on the same row.  I believe the
second update gets silently ignored.

I'm unfortunately unqualified to speak to the correctness of your
patch, but I concur with your view that the current behavior is lousy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] A few user-level questions on Streaming Replication and pg_upgrade

2011-07-20 Thread Bruce Momjian
Robert Haas wrote:
 On Wed, Jul 20, 2011 at 2:53 PM, Bruce Momjian br...@momjian.us wrote:
  I am assuming that's a yes to both the directions: older -> newer, and
  newer -> older minor releases.
 
  Yes, I believe both directions would work unless we mention it in the
  release notes, in which case it might not work, or might work older to
  newer but not newer to older.
 
 I don't see how this would get broken in a minor release.  We'd have
 to change the WAL format or the tuple format, which is definitely not
 minor release material.

If there was a bug in the xlog stream content we might not be able to
fix it without breaking compatibility.  Rare, but possible.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] WIP fix proposal for bug #6123

2011-07-20 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 I think this problem also occurs if you have a BEFORE
 UPDATE trigger that does an UPDATE on the same row.
 
I'm pretty sure you're right, and I do intend to cover that in the
final patch.
 
-Kevin



Re: [HACKERS] A few user-level questions on Streaming Replication and pg_upgrade

2011-07-20 Thread Robert Haas
On Wed, Jul 20, 2011 at 3:29 PM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 On Wed, Jul 20, 2011 at 2:53 PM, Bruce Momjian br...@momjian.us wrote:
  I am assuming that's a yes to both the directions: older -> newer, and
  newer -> older minor releases.
 
  Yes, I believe both directions would work unless we mention it in the
  release notes, in which case it might not work, or might work older to
  newer but not newer to older.

 I don't see how this would get broken in a minor release.  We'd have
 to change the WAL format or the tuple format, which is definitely not
 minor release material.

 If there was a bug in the xlog stream content we might not be able to
 fix it without breaking compatibility.  Rare, but possible.

OK, fair enough.  But I think the likelihood is extremely low.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Did this fix any of the XML TODOs?
   http://wiki.postgresql.org/wiki/Todo#XML

No, not that I know of.

regards, tom lane



Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Josh Berkus
Kaigai,

 I'd like to have a discussion about syscache towards next commit-fest.
 The issues may be:
  - Initial bucket allocation is in most cases never referenced.
  - Reclaim cache entries when the cache grows too large.

Should I move this patch to the next CF?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Robert Haas
On Wed, Jul 20, 2011 at 4:06 PM, Josh Berkus j...@agliodbs.com wrote:
 Kaigai,

 I'd like to have a discussion about syscache towards next commit-fest.
 The issues may be:
  - Initial bucket allocation is in most cases never referenced.
  - Reclaim cache entries when the cache grows too large.

 Should I move this patch to the next CF?

Not yet.  I'm still going to review the other half of this patch.

The discussion above should be undertaken on a new thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Robert Haas
On Wed, Jul 20, 2011 at 4:19 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jul 20, 2011 at 4:06 PM, Josh Berkus j...@agliodbs.com wrote:
 Kaigai,

 I'd like to have a discussion about syscache towards next commit-fest.
 The issues may be:
  - Initial bucket allocation is in most cases never referenced.
  - Reclaim cache entries when the cache grows too large.

 Should I move this patch to the next CF?

 Not yet.  I'm still going to review the other half of this patch.

On second thought, never mind: the second half still needs another
update from KaiGai, and I think we shouldn't hold the CF open for
that.  So I'll mark this Returned with Feedback.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] A few user-level questions on Streaming Replication and pg_upgrade

2011-07-20 Thread Bruce Momjian
Robert Haas wrote:
 On Wed, Jul 20, 2011 at 3:29 PM, Bruce Momjian br...@momjian.us wrote:
  Robert Haas wrote:
  On Wed, Jul 20, 2011 at 2:53 PM, Bruce Momjian br...@momjian.us wrote:
   I am assuming that's a yes to both the directions: older -> newer, and
   newer -> older minor releases.
  
   Yes, I believe both directions would work unless we mention it in the
   release notes, in which case it might not work, or might work older to
   newer but not newer to older.
 
  I don't see how this would get broken in a minor release.  We'd have
  to change the WAL format or the tuple format, which is definitely not
  minor release material.
 
  If there was a bug in the xlog stream content we might not be able to
  fix it without breaking compatibility.  Rare, but possible.
 
 OK, fair enough.  But I think the likelihood is extremely low.

It would require a case where we couldn't distinguish a valid from an
invalid WAL entry, or the streaming used by the old server was
sufficiently broken that we didn't want to support it.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Tom Lane
Kohei Kaigai kohei.kai...@emea.nec.com writes:
 I'd like to have a discussion about syscache towards next commit-fest.
 The issues may be:
  - Initial bucket allocation is in most cases never referenced.
  - Reclaim cache entries when the cache grows too large.

There used to be support for limiting the number of entries in a
syscache.  It got removed (cf commit
8b9bc234ad43dfa788bde40ebf12e94f16556b7f) because (1) it was remarkably
expensive to do it (extra list manipulations, etc), and (2) performance
tended to fall off a cliff as soon as you had a few more tables or
whatever than the caches would hold.  I'm disinclined to reverse that
decision.  It appears to me that the security label stuff needs a
different set of performance tradeoffs than the rest of the catalogs,
which means it probably ought to do its own caching, rather than trying
to talk us into pessimizing the other catalogs for seclabel's benefit.

regards, tom lane



Re: [HACKERS] Commitfest Status: Sudden Death Overtime

2011-07-20 Thread Tom Lane
[ having now read the relevant threads a bit more carefully ... ]

Florian Pflug f...@phlo.org writes:
 On Jul18, 2011, at 22:19 , Tom Lane wrote:
 Yeah, it's not clear to me either which of those are still reasonable
 candidates for committing as-is.

 There are actually three XML-related patches from me in the queue.
 I'll try to give an overview of their states - as perceived by me,
of course. If people want to comment on this, I suggest doing that
either on the existing threads from these patches or on new ones,
but not on this one - lest confusion ensue.

 * Bugfix for XPATH() if text or attribute nodes are selected
   https://commitfest.postgresql.org/action/patch_view?id=580

 Makes XPATH() return valid XML if text or attribute nodes are selected.

 I'm not aware of any issues with that one, other than Radoslaw's
 unhappiness about the change of behaviour. Since the current behaviour
 is very clearly broken (as in dump, reload, ka-Woom), I'm not prepared
 to accept that as a show-stopper.

I think we ought to apply this one.  I agree that returning invalid XML
is not acceptable.

 * Bugfix for XPATH() if expression returns a scalar value
   https://commitfest.postgresql.org/action/patch_view?id=565

 Makes XPATH() support XPath expressions which compute a scalar value,
 not a node set (i.e. apply a top-level function such as name()).
 Currently, we return an empty array in that case.

 Peter Eisentraut initially seemed to prefer creating separate functions
 for that (XPATH_STRING, ...). I argued against that, on the grounds that
 that'd make it impossible to evaluate user-supplied XPath expressions (since
 you wouldn't know which function to use). I haven't heard back from him
 after that argument.

I find your argument against XPATH_STRING() a bit unconvincing.
The use case for that is not where you are trying to evaluate an
unknown, arbitrary XPath expression; it's where you are evaluating an
expression that you *know* returns a string (ie, it has name() or some
other string-returning function at top level) and you'd like a string,
thanks, not something that you have to de-escape in order to get a
string.

Of course this use-case could also be addressed by providing a de-escape
function, but I don't really think that de_escape(xpath(...)[1]) is a
particularly friendly or efficient notation.

Now, these statements are not arguments against the patch --- as is,
XPATH() is entirely useless for expressions that return scalars, and
with the patch it at least does something usable.  So I think we should
apply it.  But there is room here for more function(s) to provide more
convenient functionality for the special case of expressions returning
strings.  I'd be inclined to provide xpath_number and xpath_bool too,
although those are less essential since a cast will get the job done.

There's some code-style aspects I don't care for in the submitted patch,
but I'll go clean those up and commit it.

regards, tom lane



Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-07-20 Thread Robert Haas
On Wed, Jul 20, 2011 at 4:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Kohei Kaigai kohei.kai...@emea.nec.com writes:
 I'd like to have a discussion about syscache towards next commit-fest.
 The issues may be:
  - Initial bucket allocation is in most cases never referenced.
  - Reclaim cache entries when the cache grows too large.

 There used to be support for limiting the number of entries in a
 syscache.  It got removed (cf commit
 8b9bc234ad43dfa788bde40ebf12e94f16556b7f) because (1) it was remarkably
 expensive to do it (extra list manipulations, etc), and (2) performance
 tended to fall off a cliff as soon as you had a few more tables or
 whatever than the caches would hold.  I'm disinclined to reverse that
 decision.  It appears to me that the security label stuff needs a
 different set of performance tradeoffs than the rest of the catalogs,
 which means it probably ought to do its own caching, rather than trying
 to talk us into pessimizing the other catalogs for seclabel's benefit.

I agree that we don't want to limit the size of the catcaches.  We've
been careful to design them in such a way that they won't blow out
memory, and so far there's no evidence that they do.  If it ain't
broke, don't fix it.  Having catcaches that can grow in size as needed
sounds useful to me, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Commitfest Status: Sudden Death Overtime

2011-07-20 Thread Florian Pflug
On Jul20, 2011, at 23:35 , Tom Lane wrote:
 I find your argument against XPATH_STRING() a bit unconvincing.
 The use case for that is not where you are trying to evaluate an
 unknown, arbitrary XPath expression; it's where you are evaluating an
 expression that you *know* returns string (ie, it has name() or some
 other string returning function at top level) and you'd like a string,
 thanks, not something that you have to de-escape in order to get a
 string.

Hm, maybe I didn't make myself clear. I'm not against having
special-case functions like XPATH_STRING() at all, as long as there's
a general-purpose function like XPATH() that deals with every expression
you throw at it.

There's a small additional concern, though, which is that there's an
XPath 2.0 spec out there, and it modifies the type system and data model
rather heavily. So before we go adding functions, it'd probably be wise
to check that we're not painting ourselves into a corner.

 Of course this use-case could also be addressed by providing a de-escape
 function, but I don't really think that de_escape(xpath(...)[1]) is a
 particularly friendly or efficient notation.

Yeah, that's pretty ugly. Having XMLESCAPE/XMLUNESCAPE (with types
TEXT -> XML and XML -> TEXT) would probably still be a good idea though,
even if it is no replacement for XPATH_STRING().

 Now, these statements are not arguments against the patch --- as is,
 XPATH() is entirely useless for expressions that return scalars, and
 with the patch it at least does something usable. So I think we should
 apply it.

Cool, so we're on the same page.

 But there is room here for more function(s) to provide more
 convenient functionality for the special case of expressions returning
 strings.  I'd be inclined to provide xpath_number and xpath_bool too,
 although those are less essential since a cast will get the job done.

No objection here. I do have a number of XML-related items that I'd like
to do for 9.2, actually. I'll write up a proposal once this commit-fest
is over, and I'll include XPATH_STRINGS and friends.

 There's some code-style aspects I don't care for in the submitted patch,
 but I'll go clean those up and commit it.

I'd offer to fix them, but I somehow have the feeling that you're already
in the middle of it ;-)

best regards,
Florian Pflug





Re: [HACKERS] Commitfest Status: Sudden Death Overtime

2011-07-20 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 There's a small additional concern, though, which is that there's an
 XPath 2.0 spec out there, and it modifies the type system and data model
 rather heavily. So before we go adding functions, it'd probably be wise
 to check that we're not painting ourselves into a corner.

Good point.  Has anybody here looked at that?

regards, tom lane



Re: [HACKERS] pg_upgrade and log file output on Windows

2011-07-20 Thread Bruce Momjian

I have fixed the bug below with the attached patches for 9.0, 9.1, and
9.2.  I did a minimal patch for 9.0 and 9.1, and a more thorough patch
for 9.2.

The use of -l/log was tested originally in pg_migrator (for 8.4) and
reported to be working, but it turns out it only worked in 'check' mode,
and threw an error during actual upgrade.  This bug was reported to me
recently by EnterpriseDB testing of PG 9.0 on Windows.  The 9.2 C
comments should make it clear that Windows doesn't allow multiple
processes to write to the same file, which should avoid future breakage.

---

Bruce Momjian wrote:
 Has anyone successfully used pg_upgrade 9.0 with -l (log) on Windows?
 
 I received a private email bug report that pg_upgrade 9.0 does not work
 with the -l/log option on Windows.  The error is:
 
  Analyzing all rows in the new cluster
  c:/MinGW/msys/1.0/home/edb/inst/bin/vacuumdb --port 55445 --username edb
  --all --analyze >> c:/MinGW/msys/1.0/home/edb/auxschedule/test.log 2>&1
  The process cannot access the file because it is being used by another
  process.

 What has me confused is this same code exists in pg_migrator, which was
 fixed to work with -l on Windows by Hiroshi Saito with this change:
 
  /*
   * On Win32, we can't send both server output and pg_ctl output
   * to the same file because we get the error:
   * The process cannot access the file because it is being used by another process.
   * so we have to send pg_ctl output to 'nul'.
   */
  sprintf(cmd, SYSTEMQUOTE "\"%s/pg_ctl\" -l \"%s\" -D \"%s\" "
	  "-o \"-p %d -c autovacuum=off -c autovacuum_freeze_max_age=2000000000\" "
	  "start >> \"%s\" 2>&1" SYSTEMQUOTE,
	  bindir, ctx->logfile, datadir, port,
  #ifndef WIN32
	  ctx->logfile);
  #else
	  DEVNULL);
  #endif
 
 The fix was not to use the same log file and output file for pg_ctl. 
 But as you can see, the pg_ctl and vacuumdb code is unchanged:
 
 prep_status(ctx, "Analyzing all rows in the new cluster");
 exec_prog(ctx, true,
           SYSTEMQUOTE "\"%s/vacuumdb\" --port %d --username \"%s\" "
           "--all --analyze >> %s 2>&1" SYSTEMQUOTE,
           ctx->new.bindir, ctx->new.port, ctx->user, ctx->logfile);
 
 I can't figure out of there is something odd about this user's setup or
 if there is a bug in pg_upgrade with -l on Windows.
 
 -- 
   Bruce Momjian  br...@momjian.us    http://momjian.us
   EnterpriseDB                       http://enterprisedb.com
 
   + It's impossible for everything to be true. +
 

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index 1a515e7..37d2eed
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*** prepare_new_cluster(migratorContext *ctx
*** 161,168 ****
  	prep_status(ctx, "Analyzing all rows in the new cluster");
  	exec_prog(ctx, true,
  			  SYSTEMQUOTE "\"%s/vacuumdb\" --port %d --username \"%s\" "
! 			  "--all --analyze >> %s 2>&1" SYSTEMQUOTE,
! 			  ctx->new.bindir, ctx->new.port, ctx->user, ctx->logfile);
  	check_ok(ctx);
  
  	/*
--- 161,174 ----
  	prep_status(ctx, "Analyzing all rows in the new cluster");
  	exec_prog(ctx, true,
  			  SYSTEMQUOTE "\"%s/vacuumdb\" --port %d --username \"%s\" "
! 			  "--all --analyze >> \"%s\" 2>&1" SYSTEMQUOTE,
! 			  ctx->new.bindir, ctx->new.port, ctx->user,
! #ifndef WIN32
! 			  ctx->logfile
! #else
! 			  DEVNULL
! #endif
! 			  );
  	check_ok(ctx);
  
  	/*
*** prepare_new_cluster(migratorContext *ctx
*** 174,181 ****
  	prep_status(ctx, "Freezing all rows on the new cluster");
  	exec_prog(ctx, true,
  			  SYSTEMQUOTE "\"%s/vacuumdb\" --port %d --username \"%s\" "
! 			  "--all --freeze >> %s 2>&1" SYSTEMQUOTE,
! 			  ctx->new.bindir, ctx->new.port, ctx->user, ctx->logfile);
  	check_ok(ctx);
  
  	get_pg_database_relfilenode(ctx, CLUSTER_NEW);
--- 180,193 ----
  	prep_status(ctx, "Freezing all rows on the new cluster");
  	exec_prog(ctx, true,
  			  SYSTEMQUOTE "\"%s/vacuumdb\" --port %d --username \"%s\" "
! 			  "--all --freeze >> \"%s\" 2>&1" SYSTEMQUOTE,
! 			  ctx->new.bindir, ctx->new.port, ctx->user,
! #ifndef WIN32
! 			  ctx->logfile
! #else
! 			  DEVNULL
! #endif
! 			  );
  	check_ok(ctx);
  
  	get_pg_database_relfilenode(ctx, CLUSTER_NEW);
*** prepare_new_databases(migratorContext *c
*** 207,213 ****
  			  "--no-psqlrc --port %d --username \"%s\" "
  			  "-f \"%s/%s\" --dbname template1 >> \"%s\"" SYSTEMQUOTE,
  			  ctx->new.bindir, ctx->new.port, ctx->user, ctx->cwd,
! 			  

[HACKERS] sinval synchronization considered harmful

2011-07-20 Thread Robert Haas
For the last week or so, in between various other tasks, I've been
trying to understand why performance drops off when you run pgbench -n
-S -c $CLIENTS -j $CLIENTS -T $A_FEW_MINUTES at very high client
counts.  The answer, in a word, is SIGetDataEntries().  I believe we
need to bite the bullet and rewrite this using a lock-free algorithm,
using memory barriers on processors with weak memory ordering.
Perhaps there is another way to do it, but nothing I've tried has
really worked so far, and I've tried quite a few things.  Here's the
data.

On unpatched master, performance scales pretty much linearly out to 32
clients.  As you add more clients, it drops off:

[1 client]
tps = 4459.203159 (including connections establishing)
tps = 4488.456559 (including connections establishing)
tps = 4449.570530 (including connections establishing)
[8 clients]
tps = 33524.668762 (including connections establishing)
tps = 33501.833907 (including connections establishing)
tps = 33382.380908 (including connections establishing)
[32 clients]
tps = 178394.863906 (including connections establishing)
tps = 178457.706972 (including connections establishing)
tps = 179365.310179 (including connections establishing)
[80 clients]
tps = 132518.586371 (including connections establishing)
tps = 130968.749747 (including connections establishing)
tps = 132574.338942 (including connections establishing)

With the lazy vxid locks patch, all of those numbers get better by a
few percentage points (the more clients, the more points) except at 80
clients, where the performance is sometimes better and sometimes
worse.  These are 5-minute runs:

[80 clients, with lazy vxid locks]
tps = 119215.958372 (including connections establishing)
tps = 113056.859871 (including connections establishing)
tps = 160562.770998 (including connections establishing)

I can't explain in detail why there is so much variation between the
5-minute runs, or why some runs perform worse than without the lazy
vxid locks, but I can say that applying the first of the two attached
patches (sinval-per-backend-mutex.patch) fixes it.  The patch gets rid
of SInvalReadLock and instead gives each backend its own spinlock.
SICleanupQueue() must therefore get somewhat fuzzy answers, but it
shouldn't matter.  The only real problem is if you need to do the
super-duper-cleanup where you subtract a constant from all the values
in unison.  I just got rid of that completely, by widening the
counters to 64 bits.  Assuming I haven't botched the implementation,
this is clearly a win.  There is still some performance drop-off going
from 32 clients to 80 clients, but it is much less.

[80 clients, with lazy vxid locks and sinval-per-backend-mutex]
tps = 167392.042393 (including connections establishing)
tps = 171336.145020 (including connections establishing)
tps = 170500.529303 (including connections establishing)

Profiling this combination of patches reveals that there is still some
pretty ugly spinlock contention on sinval's msgNumLock.  And it occurs
to me that on x86, we really don't need this lock ... or
SInvalReadLock ... or a per-backend mutex.  The whole of
SIGetDataEntries() can pretty easily be made lock-free.  The only real
changes that seem to be needed are (1) to use a 64-bit counter, so
you never need to decrement and (2) to recheck resetState after
reading the entries from the queue, to see if we got reset while we
were reading those entries.  Since x86 guarantees that writes will
become visible in the order they are executed, we only need to make
sure that the compiler doesn't rearrange things.  As long as we first
read the maxMsgNum and then read the messages, we can't read garbage.
As long as we read the messages before we check resetState, we will be
sure to notice if we got reset before we read all the messages (which
is the only way that we can have read garbage messages).
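On hardware and compilers with C11 atomics, the read ordering described above can be sketched like this. This is a hypothetical miniature of the idea, not the actual sinval code; the struct and function names are assumptions:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 64

/* Miniature single-writer message queue in the spirit of sinval */
typedef struct
{
    int              messages[QUEUE_SIZE];  /* ring buffer of messages */
    _Atomic uint64_t maxMsgNum;             /* 64-bit: never needs decrementing */
    _Atomic bool     resetState;            /* set when a reader fell too far behind */
} MsgQueue;

/* Writer: publish the message, then advance the counter with release
 * ordering so the message store becomes visible before the counter bump. */
static void
queue_put(MsgQueue *q, int msg)
{
    uint64_t n = atomic_load_explicit(&q->maxMsgNum, memory_order_relaxed);

    q->messages[n % QUEUE_SIZE] = msg;
    atomic_store_explicit(&q->maxMsgNum, n + 1, memory_order_release);
}

/* Reader, lock-free: read maxMsgNum first (acquire), copy the new
 * messages, then recheck resetState.  Returns the number of messages
 * copied, or -1 if we were reset while reading (copies may be garbage). */
static int
queue_get(MsgQueue *q, uint64_t *nextMsgNum, int *out)
{
    uint64_t max = atomic_load_explicit(&q->maxMsgNum, memory_order_acquire);
    int      n = 0;

    while (*nextMsgNum < max)
        out[n++] = q->messages[(*nextMsgNum)++ % QUEUE_SIZE];

    /* The resetState check must come after the copies, so a reset that
     * happened during the read cannot go unnoticed. */
    if (atomic_load_explicit(&q->resetState, memory_order_acquire))
        return -1;
    return n;
}
```

On x86 the acquire/release operations compile to plain loads and stores plus compiler-reordering restrictions, which matches the observation that only the compiler needs restraining there; weakly ordered processors get real barriers from the same source.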

Now this implementation (sinval-unlocked.patch) is obviously not a
serious proposal as it stands, because on processors with weak memory
ordering it will presumably blow up all over the place.  But here are
the results on x86:

[80 clients, with lazy vxid locks and sinval-unlocked]
tps = 203256.701227 (including connections establishing)
tps = 190637.957571 (including connections establishing)
tps = 190228.617178 (including connections establishing)

Now, that's actually *faster* than the above 32-client numbers.  Turns
out, with this approach, there's essentially no degradation between 32
clients and 80 clients.  It appears that even one spinlock acquisition
in SIGetDataEntries() is too many.  Currently, we have *three*: one to
get SInvalReadLock, one to get msgnumLock, and one to release
SInvalReadLock.

For contrast, on a two-core MacBook Pro, I can't measure any
difference at all between lazy-vxid,
lazy-vxid+sinval-per-backend-mutex, and lazy-vxid+sinval-unlocked.
The last might even be a touch slower, although there's enough noise
that it's hard to tell for sure, and the effect is very small if 

Re: [HACKERS] Questions and experiences writing a Foreign Data Wrapper

2011-07-20 Thread David Fetter
On Wed, Jul 20, 2011 at 08:26:19PM +0300, Heikki Linnakangas wrote:
 On 20.07.2011 18:00, Albe Laurenz wrote:
 2) If I decide to close remote database connections after
 use, I would like to do so where reasonable.
 I would like to keep the connection open between query
 planning and query execution and close it when the
 scan is done.
 The exception could be named prepared statements.
 Is there a way to tell if that is the case during
 planing or execution?
 
 Hmm, maybe you could add a hook to close the connection when the
 transaction ends. But actually, you'd want to keep the connection
 open across transactions too. Some sort of a general connection
 caching facility would be useful for many FDW.

For what it's worth, DBI-Link caches FDW handles in perl session
variables.  I didn't really consider cross-session handle caching
because the complexity would have been large and the benefit small.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] fixing PQsetvalue()

2011-07-20 Thread Robert Haas
On Mon, Jul 18, 2011 at 6:38 AM, Pavel Golub pa...@microolap.com wrote:
 Hello, Merlin.

 I hope it's OK that I've added Andrew's patch to CommitFest:
 https://commitfest.postgresql.org/action/patch_view?id=606

 I did this because beta3 was already released, but nothing has been
 done about this bug.

So I finally got around to taking a look at this patch, and I guess my
basic feeling is that I like it.  The existing code is pretty weird
and inconsistent: the logic in PQsetvalue() basically does the same
thing as the logic in pqAddTuple(), but incompatibly and less
efficiently.  Unifying them seems sensible, and the fix looks simple
enough to back-patch.

With respect to Tom's concern about boxing ourselves in, I guess it's
hard for me to get worried about that.  I've heard no one suggest
changing the internal representation libpq uses for result sets, and
even if we did, presumably the new format would also need to support
an append-a-tuple operation - or at the very worst we could cause it to
support that without much difficulty.
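The append-a-tuple operation amounts to an amortized-doubling array append, which can be sketched as follows. This is a simplified stand-in, not libpq's actual PGresult internals; the struct and field names are assumptions:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simplified result object holding an array of opaque tuple pointers */
typedef struct
{
    int    ntups;       /* tuples currently stored */
    int    tupArrSize;  /* allocated slots */
    void **tuples;      /* growable pointer array */
} Result;

/* Append one tuple, doubling the array when full so that a long series
 * of appends costs O(1) amortized per tuple. */
static bool
addTuple(Result *res, void *tup)
{
    if (res->ntups >= res->tupArrSize)
    {
        int    newSize = res->tupArrSize ? res->tupArrSize * 2 : 128;
        void **newTuples = realloc(res->tuples, newSize * sizeof(void *));

        if (newTuples == NULL)
            return false;       /* out of memory; caller reports it */
        res->tuples = newTuples;
        res->tupArrSize = newSize;
    }
    res->tuples[res->ntups++] = tup;
    return true;
}
```

Routing both PQsetvalue()'s grow-on-demand path and pqAddTuple() through one helper like this is the kind of unification the patch performs, so the two call sites cannot drift apart again.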

So, +1 from me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] patch: Allow \dd to show constraint comments

2011-07-20 Thread Josh Kupershmidt
[Resending with gzip'ed patch this time, I think the last attempt got eaten.]

On Mon, Jul 18, 2011 at 11:15 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jul 18, 2011 at 10:57 PM, Josh Kupershmidt schmi...@gmail.com wrote:
 1.) For now, I'm just ignoring the issue of visibility checks; I
 didn't see a simple way to support these checks \dd was doing:

    processSQLNamePattern(pset.db, buf, pattern, true, false,
                          n.nspname, p.proname, NULL,
                          pg_catalog.pg_function_is_visible(p.oid));

 I'm a bit leery of adding an is_visible column into pg_comments, but
 I'm not sure I have a feasible workaround if we do want to keep this
 visibility check. Maybe a big CASE statement for the last argument of
 processSQLNamePattern() would work...

 Yeah... or we could add a function pg_object_is_visible(classoid,
 objectoid) that would dispatch to the relevant visibility testing
 function based on object type.  Not sure if that's too much of a
 kludge, but it wouldn't be useful only here: you could probably use it
 in combination with pg_locks, for example.
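A dispatcher along those lines could even be prototyped in SQL today. This is a sketch only; `pg_object_is_visible` is the hypothetical function under discussion, not an existing built-in, and the CASE arms shown cover just a few catalogs:

```sql
-- Hypothetical dispatch from (classoid, objectoid) to the per-type
-- visibility function; extend with one arm per commented-on catalog.
CREATE FUNCTION pg_object_is_visible(classoid oid, objectoid oid)
RETURNS boolean AS $$
  SELECT CASE classoid
    WHEN 'pg_catalog.pg_class'::regclass::oid
      THEN pg_catalog.pg_table_is_visible(objectoid)
    WHEN 'pg_catalog.pg_proc'::regclass::oid
      THEN pg_catalog.pg_function_is_visible(objectoid)
    WHEN 'pg_catalog.pg_type'::regclass::oid
      THEN pg_catalog.pg_type_is_visible(objectoid)
    WHEN 'pg_catalog.pg_operator'::regclass::oid
      THEN pg_catalog.pg_operator_is_visible(objectoid)
    ELSE true   -- object types with no schema-visibility concept
  END
$$ LANGUAGE sql STABLE;
```

With such a function, the last argument to processSQLNamePattern() could be a single `pg_object_is_visible(classoid, objoid)` expression instead of a per-type CASE written out in psql's query.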

 Something like that might work, though an easy escape is just leaving
 the query style of \dd as it was originally (i.e. querying the various
 catalogs directly, and not using pg_comments): discussed a bit in the
 recent pg_comments thread w/ Josh Berkus.

 That's another approach.  It seems a bit lame, but...

I went ahead and patched up \dd to display its five object types with
its old-style rooting around in catalogs. I played around again with
the idea of having \dd query pg_comments, but gave up when I realized:

 1. We might not be saving much complexity in \dd's query
 2. Taking the is_system column out was probably good for pg_comments,
but exacerbates point 1.), not to mention the visibility testing that
would have to be done somehow.
 3. The objname column of pg_comments is intentionally different
than the Name column displayed by \dd; the justification for this
was that \dd's Name display wasn't actually useful to recreate the
call to COMMENT ON, but this difference in pg_comments would make it
pretty tricky to keep \dd's Name the same
 4. I still would like to get rid of \dd entirely, thus it seems less
important whether it uses pg_comments. It's down to five object types
now; I think that triggers, constraints, and rules could feasibly be
incorporated into \d+ output as Robert suggested upthread, and perhaps
operator class and operator family could get their own backslash
commands.

Some fixes:
 * shuffled the query components in describe.c's objectDescription()
so they're alphabetized by object type
 * untabified pg_comments in system_views.sql to match its surroundings
 * the WHERE d.objsubid = 0 was being omitted in one or two spots,
 * the objects with descriptions coming from pg_shdescription, which
does not have the objsubid column, were using NULL::integer instead of
0, as all the other non-column object types should have. This seemed
undesirable, and counter to what the doc page claimed.
 * fixed up psql's documentation and help string for \dd

Updated patch attached, along with a revised SQL script to make
testing easier. I can add this to the next CF.

Note, there is a separate thread[1] with just the psql changes broken
out, if it's helpful to consider the psql changes separately from
pg_comments. I do need to update the patch posted there with this
latest set of changes.

Josh
--
[1] http://archives.postgresql.org/pgsql-hackers/2011-07/msg00459.php
CREATE SCHEMA myschema;
COMMENT ON SCHEMA myschema IS 'schema comment';

CREATE DOMAIN us_postal_code AS TEXT
CHECK(
   VALUE ~ '^\\d{5}$'
OR VALUE ~ '^\\d{5}-\\d{4}$'
);
COMMENT ON DOMAIN us_postal_code IS 'domain comment';

CREATE DOMAIN uncommented_domain AS TEXT CHECK(true);

COMMENT ON TABLESPACE pg_default IS 'default tablespace';

CREATE TABLE mytbl (a serial PRIMARY KEY, b int);
COMMENT ON TABLE mytbl IS 'example table';
COMMENT ON SEQUENCE mytbl_a_seq IS 'serial sequence';
COMMENT ON COLUMN mytbl.a IS 'column comment';

CREATE TABLE myschema.another_tbl (a int);
ALTER TABLE myschema.another_tbl ADD CONSTRAINT a_chk_con CHECK(a != 0);
COMMENT ON TABLE myschema.another_tbl IS 'another_tbl comment';
COMMENT ON CONSTRAINT a_chk_con ON myschema.another_tbl IS 'constraint comment';

CREATE INDEX myidx ON mytbl (a);
COMMENT ON INDEX myidx IS 'example index';

ALTER TABLE mytbl ADD CONSTRAINT mycon CHECK (b < 100);
COMMENT ON CONSTRAINT mycon ON mytbl IS 'constraint comment';

CREATE VIEW myview AS SELECT * FROM mytbl;
COMMENT ON VIEW myview IS 'view comment';

CREATE TABLE dummy_tbl (a int);

CREATE RULE myrule AS
ON INSERT TO dummy_tbl
DO INSTEAD NOTHING;

COMMENT ON RULE myrule ON dummy_tbl IS 'bogus rule';

CREATE FUNCTION ex_trg_func() RETURNS trigger AS $$
BEGIN
	RETURN NEW;
END;
$$ LANGUAGE plpgsql;

COMMENT ON FUNCTION ex_trg_func() IS 'function comment';

create trigger ex_trg BEFORE INSERT OR UPDATE ON mytbl
  for 

[HACKERS] a few requests based on developing plpgsql_lint - maybe ToDo points

2011-07-20 Thread Pavel Stehule
Hello

The basic issue is deployment of plpgsql plugins - they need to
include plpgsql.h, but this include file is not public. The next
significant issue is that plpgsql plugins cannot be chained: only
one can be active at a time. A limiting factor is access to persistent
data via estate->plugin_info. The other issues are less significant,
but would simplify plugin development:

1. move plpgsql.h to a shared include directory
2. allow more than one active plpgsql plugin at a time
3. allow access to the function's result or the statement's result -
at minimum, access to rc
4. allow creating a temporary copy of plpgsql's statements and
executing that copy instead of the original
5. hooks for the execution handler's calls - currently it is not
possible to handle the situation where a function ends with an exception
6. a hook for validation
7. publish exec_prepare_plan, with the possibility to add a hook to this function
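Item 2, allowing several active plugins, essentially means replacing plpgsql's single rendezvous-variable slot with a chain of callbacks. A minimal sketch, where `PLpgSQL_plugin` and its fields are simplified stand-ins for the real struct:

```c
#include <stddef.h>

/* Simplified plugin descriptor; the real one carries several callbacks */
typedef struct PLpgSQL_plugin
{
    void (*func_beg) (int *counter);    /* called at function entry (simplified) */
    struct PLpgSQL_plugin *next;        /* assumed chaining field */
} PLpgSQL_plugin;

static PLpgSQL_plugin *plugin_chain = NULL;

/* Register a plugin at the head of the chain instead of overwriting
 * the single slot plpgsql supports today. */
static void
register_plugin(PLpgSQL_plugin *plugin)
{
    plugin->next = plugin_chain;
    plugin_chain = plugin;
}

/* Invoke every registered func_beg hook in turn. */
static void
call_func_beg(int *counter)
{
    for (PLpgSQL_plugin *p = plugin_chain; p != NULL; p = p->next)
        if (p->func_beg)
            p->func_beg(counter);
}
```

Per-plugin persistent state would then need to move from the single estate->plugin_info pointer into each chain entry, which is exactly the serialization limit described above.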

Regards

Pavel Stehule



Re: [HACKERS] isolation test deadlocking on buildfarm member coypu

2011-07-20 Thread Alvaro Herrera
Excerpts from Noah Misch's message of sáb jul 16 13:25:03 -0400 2011:
 On Sat, Jul 16, 2011 at 05:50:45PM +0200, Rémi Zara wrote:
  Isolation tests seem to deadlock on buildfarm member coypu (NetBSD/powerpc 
  5.1).
 
 Thanks for the report and detailed analysis.  I believe the patch here will 
 fix
 the problem:
 http://archives.postgresql.org/message-id/20110716171121.gb2...@tornado.leadboat.com
 
 If you're up for testing it locally, I'd be interested to hear how it works 
 out.

Rémi,

I have committed Noah's patch, so please unblock coypu and locust so we
can see whether they are OK now.

Thanks,

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Questions and experiences writing a Foreign Data Wrapper

2011-07-20 Thread Shigeru Hanada
Hi Albe,

(2011/07/21 0:00), Albe Laurenz wrote:
 1) GetUserMapping throws an error if there is no
 user mapping for the user (or PUBLIC).
 I think that it would be much more useful if
 it would return NULL or something similar instead.
 Otherwise one would have to check for existence
 beforehand, which is no less complicated than what
 GetUserMapping does.

Adding a new parameter missing_ok and returning NULL for not-found might
be reasonable.  BTW, in what case would you want to handle the
nonexistence of a user mapping yourself?

 2) If I decide to close remote database connections after
 use, I would like to do so where reasonable.
 I would like to keep the connection open between query
 planning and query execution and close it when the
 scan is done.
 The exception could be named prepared statements.
 Is there a way to tell if that is the case during
 planing or execution?

Only PlanForeignScan is called during planning (PREPARE); the others are
called during execution (EXECUTE), but this probably misses your point.
It seems difficult to determine whether planning is being done for a
simple SELECT or for PREPARE/EXECUTE, so you would need to keep the
connection alive until the end of the local session to share it between
planning and execution.
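A session-lifetime connection cache keyed by the user mapping is the usual way to do that. A minimal sketch, where the key fields, `open_remote_conn`, and the list-based lookup are all illustrative assumptions rather than an existing API:

```c
#include <stdlib.h>

/* One cache entry per (foreign server, user mapping) pair */
typedef struct CachedConn
{
    unsigned serverid;          /* foreign server OID (assumed key) */
    unsigned userid;            /* user mapping OID (assumed key) */
    void    *conn;              /* the live remote connection */
    struct CachedConn *next;
} CachedConn;

static CachedConn *conn_cache = NULL;

/* Stand-in for dialing the remote side; a real FDW would connect here. */
static void *
open_remote_conn(unsigned serverid, unsigned userid)
{
    (void) serverid;
    (void) userid;
    return malloc(1);           /* unique token standing in for a connection */
}

/* Return the cached connection for this key, opening one on first use;
 * planning and execution then share the same connection for the session. */
static void *
get_connection(unsigned serverid, unsigned userid)
{
    CachedConn *c;

    for (c = conn_cache; c != NULL; c = c->next)
        if (c->serverid == serverid && c->userid == userid)
            return c->conn;

    c = malloc(sizeof(CachedConn));
    c->serverid = serverid;
    c->userid = userid;
    c->conn = open_remote_conn(serverid, userid);
    c->next = conn_cache;
    conn_cache = c;
    return c->conn;
}
```

A production version would also hook transaction or session end to close or reset the cached connections, which is the general connection-caching facility suggested upthread.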

 3) I am confused by the order of function calls
 during execution of a subplan. It is like this:
   BeginForeignScan
   ReScanForeignScan
   IterateForeignScan
   IterateForeignScan
   ...
   ReScanForeignScan
   IterateForeignScan
   IterateForeignScan
   ...
   EndForeignScan
So the first ReScan is done immediately after
BeginForeignScan. Moreover, internal parameters are not
set in the BeginForeignScan call.
 
This is probably working as designed, but BeginForeignScan
has no way to know whether it should execute a remote
query or not. I ended up doing that in the first call to
IterateForeignScan because I saw no other way.

That seems a bit unfortunate. Is there an easy way to
change that? If not, it should be documented.

Executing the remote query only in BeginForeignScan would be difficult
with the current FDW API, and this issue can't be fixed in the 9.1
release.  So it would be worth documenting that a subplan's
ReScanForeignScan will also be called before the first
IterateForeignScan call.

I'm planning to propose enhancement of FDW API in next CF for 9.2
release, so your comments are very valuable.

Regards,
-- 
Shigeru Hanada
