Re: [HACKERS] Fast insertion indexes: why no developments
Andres Freund-3 wrote:
> On 2013-11-04 11:27:33 -0500, Robert Haas wrote:
>> On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire <klaussfreire@> wrote:
>>> Such a thing would help COPY, so maybe it's worth a look
>>
>> I have little doubt that a deferred insertion buffer of some kind could help performance on some workloads, though I suspect the buffer would have to be pretty big to make it worthwhile on a big COPY that generates mostly-random insertions.
>
> Even for random data, presorting the to-be-inserted data appropriately could result in much better I/O patterns.

Mmh, I'm afraid that the buffer would have to be huge to get any real advantage. You have to buffer enough values to avoid touching entire pages, which is not that easy. The LSM-tree is much more complicated than a simple memory buffer plus delayed inserts. The other index types that are supposed to help with the random-insertion workload rely on sequential insertions (at the expense of more writing, and slower reads) rather than reusing the btree pattern.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5776963.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
Simon Riggs wrote:
> Everybody on this thread is advised to look closely at Min Max indexes before starting any further work. MinMax will give us access to many new kinds of plan, plus they are about as close to perfectly efficient, by which I mean almost zero overhead, with regard to inserts as it is possible to get.

Simon, I don't understand how minmax indexes would help in a random-inserts scenario. While I would love to use minmax for other columns (since we also partition and search based on a timestamp, which is usually well clustered), I thought minmax indexes would be perfect in a mostly-incremental values scenario.
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
Arul Shaji Arulappan wrote:
> Attached is a patch that implements the first set of changes discussed in this thread originally. They are:
> (i) Implements NCHAR/NVARCHAR as distinct data types, not as synonyms, so that:
>     - psql \d can display the user-specified data types.
>     - pg_dump/pg_dumpall can output NCHAR/NVARCHAR columns as-is, not as CHAR/VARCHAR.
>     - Groundwork to implement additional features for NCHAR/NVARCHAR in the future (e.g. a separate encoding for nchar columns).
> (ii) Support for NCHAR/NVARCHAR in ECPG
> (iii) Documentation changes to reflect the new data types

If I understood the discussion correctly, the use case is that there are advantages to having a database encoding different from UTF-8, but you'd still want some UTF-8 columns. Wouldn't it be a better design to allow specifying the encoding per column? That would give you more flexibility. I know that NCHAR/NVARCHAR is SQL standard, but I still think that it is a wart.

Yours,
Laurenz Albe
Re: [HACKERS] Fast insertion indexes: why no developments
On 5 November 2013 08:32, Leonardo Francalanci m_li...@yahoo.it wrote:
> Simon Riggs wrote:
>> Everybody on this thread is advised to look closely at Min Max indexes before starting any further work. MinMax will give us access to many new kinds of plan, plus they are about as close to perfectly efficient, by which I mean almost zero overhead, with regard to inserts as it is possible to get.
>
> Simon, I don't understand how minmax indexes would help in a random-inserts scenario. While I would love to use minmax for other columns (since we also partition and search based on a timestamp, which is usually well clustered), I thought minmax indexes would be perfect in a mostly-incremental values scenario.

Minmax indexes seem to surprise many people, so broad generalisations aren't likely to be useful. I think the best thing to do is to publish some SQL requests that demonstrate in detail what you are trying to achieve and test them against minmax indexes. That way we can discuss what does work and what doesn't work well enough yet.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
Hello,

>>> With this index, you will get a different plan like this,
>>
>> Exactly my point, can we look at making windowing functions smart and make use of available indexes?
>
> I might have guessed.. Does this satisfy your needs?

Not exactly. In case I missed mentioning it, this is not a production issue for me. I am trying to see if the PostgreSQL planner produces the best plans for data-warehouse and mining oriented queries.

>> I agree to the point. I think hashes can be efficiently used for sorting (and I believe they are used for joins too when a pre-sorted data set is not available via indexes). This again could be my misinterpretation.
>
> It is true if 'Sorting' means 'key classification without orderings'. Hashes should always appear at the inner side of a join, I'm convinced. The 'ordered' nature is not required for the case if the outer side is already ordered. If not, separate sorting will be needed.

I lost you somewhere here. Maybe this is above my pay-grade :-)

> Sorry for my clumsy English :-(

No, it was not your English. :-) When I read it again and try to relate, I get your point. Actually true, hashes must be performed as a last option (if that is what you too meant), and if there are a few other operations they must be the last one to be performed, especially after sorting/grouping. Hashes must somehow make use of already sorted data (I think this is something even you indicated).

>> Well, at least with Oracle and DB2 planners I have seen that the plan produced with dense_rank performs better than a series of nested SELECT MAX().
>
> I see your point. Although I don't know what plans they generate, and I don't see how to do ordering and ranking without sorting. Could you let me see what they look like?
> # Nevertheless, I don't have the confidence that I can be of some help..

I will do that if I get a DB2 system or Oracle system running. I will try to replicate the same 2 test cases and share the plan.
One thing which I am sure of is that the below part of the plan

  Subquery Scan on __unnamed_subquery_0  (cost=12971.39..16964.99 rows=614 width=43) (actual time=2606.075..2953.937 rows=558 loops=1)

would be generated as a RID scan in DB2 (which I have seen perform better than normal subquery scans in DB2).

Regards
Sameer
Ashnik Pte Ltd
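For readers following along, here is a hypothetical pair of equivalent formulations of the kind being compared in this exchange — a dense_rank() window query versus the nested SELECT MAX() rewrite. The transactions schema is invented for illustration; the thread does not give the actual queries.

```sql
-- windowed form: latest row per account via dense_rank()
SELECT account_id, txn_time, amount
FROM (
    SELECT account_id, txn_time, amount,
           dense_rank() OVER (PARTITION BY account_id
                              ORDER BY txn_time DESC) AS rnk
    FROM transactions
) t
WHERE rnk = 1;

-- nested SELECT MAX() form the Oracle/DB2 plans were compared against
SELECT t.account_id, t.txn_time, t.amount
FROM transactions t
JOIN (
    SELECT account_id, max(txn_time) AS txn_time
    FROM transactions
    GROUP BY account_id
) m USING (account_id, txn_time);
```

The interesting planner question raised above is whether an index on (account_id, txn_time DESC) lets the windowed form avoid an explicit sort.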
Re: [HACKERS] Fast insertion indexes: why no developments
Simon Riggs wrote:
> Minmax indexes seem to surprise many people, so broad generalisations aren't likely to be useful. I think the best thing to do is to publish some SQL requests that demonstrate in detail what you are trying to achieve and test them against minmax indexes. That way we can discuss what does work and what doesn't work well enough yet.

While I do believe in testing (since "in theory there is no difference between theory and practice; in practice there is"), I would like to know the properties of the minmax index before trying it. What is it supposed to be good at? What are the pros/cons? We can't ask all the users to just try the index and see if it works for them.

As I said, my understanding is that it is very efficient (both in insertion and in searching) when data is somehow ordered in the table. But maybe I got it wrong...

Anyway, the SQL scenario is simple: a table with 4 bigint indexes; data in the fields is a random bigint in the range 1-1000. Insertion is 5-10K rows/second. One search every 1-5 seconds, by one of the indexed fields (only equality, no range). There's also an index on a timestamp field, but that's not random so it doesn't give that many problems (it's actually the one where I wanted to try the minmax).
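For concreteness, the scenario described above might be sketched as follows. All table and column names are invented; the thread only specifies four randomly-filled bigint columns plus a well-clustered timestamp.

```sql
-- ~5-10K inserts/second; one equality search every 1-5 seconds
CREATE TABLE calls (
    made_at timestamptz NOT NULL,  -- well clustered; the minmax candidate
    caller  bigint NOT NULL,       -- random values in the stated range
    callee  bigint NOT NULL,
    cell_a  bigint NOT NULL,
    cell_b  bigint NOT NULL
);
CREATE INDEX ON calls (made_at);
CREATE INDEX ON calls (caller);
CREATE INDEX ON calls (callee);
CREATE INDEX ON calls (cell_a);
CREATE INDEX ON calls (cell_b);

-- the only read pattern: all calls for one user, equality only
SELECT * FROM calls WHERE caller = 42;
```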
[HACKERS] [PATCH] configure: add git describe output to PG_VERSION when building a git tree
This makes it easy to see if the binaries were built from a real release or if they were built from a custom git tree.
---
 configure.in | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/configure.in b/configure.in
index a4baeaf..7c5b3ce 100644
--- a/configure.in
+++ b/configure.in
@@ -29,7 +29,14 @@ AC_CONFIG_AUX_DIR(config)
 AC_PREFIX_DEFAULT(/usr/local/pgsql)
 AC_SUBST(configure_args, [$ac_configure_args])
 
-AC_DEFINE_UNQUOTED(PG_VERSION, $PACKAGE_VERSION, [PostgreSQL version as a string])
+# Append git tag based version to PG_VERSION if we're building from a git
+# checkout, but don't touch PACKAGE_VERSION which is used to create other
+# version variables (major version and numeric version.)
+PG_VERSION=$PACKAGE_VERSION
+if test -d .git; then
+  PG_VERSION="$PG_VERSION (`git describe --tags --long --dirty HEAD || echo unknown`)"
+fi
+AC_DEFINE_UNQUOTED(PG_VERSION, $PG_VERSION, [PostgreSQL version as a string])
 [PG_MAJORVERSION=`expr $PACKAGE_VERSION : '\([0-9][0-9]*\.[0-9][0-9]*\)'`]
 AC_SUBST(PG_MAJORVERSION)
 AC_DEFINE_UNQUOTED(PG_MAJORVERSION, $PG_MAJORVERSION, [PostgreSQL major version as a string])
-- 
1.8.4.2
Re: [HACKERS] Fast insertion indexes: why no developments
On 5 November 2013 09:57, Leonardo Francalanci m_li...@yahoo.it wrote:
> Simon Riggs wrote:
>> Minmax indexes seem to surprise many people, so broad generalisations aren't likely to be useful. I think the best thing to do is to publish some SQL requests that demonstrate in detail what you are trying to achieve and test them against minmax indexes. That way we can discuss what does work and what doesn't work well enough yet.
>
> While I do believe in testing (since "in theory there is no difference between theory and practice; in practice there is"), I would like to know the properties of the minmax index before trying it. What is it supposed to be good at? What are the pros/cons? We can't ask all the users to just try the index and see if it works for them.

No, but then all users aren't suggesting we need a new index type, are they? I think it's reasonable for you to spend time checking whether what you require will be met, and if not, what precise query it doesn't help with, so we can better design any future new index.

> As I said, my understanding is that it is very efficient (both in insertion and in searching) when data is somehow ordered in the table. But maybe I got it wrong...
>
> Anyway, the SQL scenario is simple: a table with 4 bigint indexes; data in the fields is a random bigint in the range 1-1000. Insertion is 5-10K rows/second. One search every 1-5 seconds, by one of the indexed fields (only equality, no range). There's also an index on a timestamp field, but that's not random so it doesn't give that many problems (it's actually the one where I wanted to try the minmax).

Without meaning to pick on you, imprecise analysis yields unhelpful new features. The clearer we are about what we are trying to solve, the more likely we are to solve it well. 30 seconds' analysis of what is needed is not sufficient to justify an additional man-year of development, especially if a man-year of work has already been done and the testing of the latest feature is now at hand.

The requests on the indexed fields — are they ordered? Are they limited? Or do you really want ALL calls? What is the tolerance on those?

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Fast insertion indexes: why no developments
Simon Riggs wrote:
> On 5 November 2013 09:57, Leonardo Francalanci <m_lists@> wrote:
>> While I do believe in testing (since "in theory there is no difference between theory and practice; in practice there is"), I would like to know the properties of the minmax index before trying it. What is it supposed to be good at? What are the pros/cons? We can't ask all the users to just try the index and see if it works for them.
>
> No, but then all users aren't suggesting we need a new index type, are they? I think it's reasonable for you to spend time checking whether what you require will be met, and if not, what precise query it doesn't help with, so we can better design any future new index.

I don't understand the parallel between "we can't ask all the users to just try the index and see if it works for them" and "all users aren't suggesting we need a new index type". Anyway: I'm not suggesting we need a new index type. Please read my first post: I'm asking for info, fearing that there's just a lot of marketing/hype around those indexes. What do you mean by "spend time checking whether what you require will be met"? Met by what, minmax indexes?

Simon Riggs wrote:
>> As I said, my understanding is that it is very efficient (both in insertion and in searching) when data is somehow ordered in the table. But maybe I got it wrong...
>>
>> Anyway, the SQL scenario is simple: a table with 4 bigint indexes; data in the fields is a random bigint in the range 1-1000. Insertion is 5-10K rows/second. One search every 1-5 seconds, by one of the indexed fields (only equality, no range). There's also an index on a timestamp field, but that's not random so it doesn't give that many problems (it's actually the one where I wanted to try the minmax).
>
> Without meaning to pick on you, imprecise analysis yields unhelpful new features. The clearer we are about what we are trying to solve, the more likely we are to solve it well. 30 seconds' analysis of what is needed is not sufficient to justify an additional man-year of development, especially if a man-year of work has already been done and the testing of the latest feature is now at hand.

I've never said that my analysis justifies a man-year of work. As I said, I'm actually not confident at all that even if we had those cool indexes they would work in my scenario (marketing aside, there's not that much data/tests out there on those indexes). I just wanted to know if the matter was discussed in the past, and to get more info. At the same time, I'm reluctant to try a new index hoping it will work in my case just because it's new and a man-year of work has already been done.

Again: what's this minmax index supposed to be good at? If it's indexing data in mostly-sequential order, it won't help me. From what I got (maybe I got it wrong?) the index stores min/max values of sequences of pages. In my case I guess that those min/max values would be close to 1 (min) / 1000 (max), because I insert data in random order. So any query would scan the entire table anyway. Am I wrong?

Simon Riggs wrote:
> The requests on the indexed fields — are they ordered?

Mmmh, I don't think I understand the question... an operator searches for calls made by a user, so he searches in random order...

Simon Riggs wrote:
> Are they limited? Or do you really want ALL calls? What is the tolerance on those?

I want all the calls made by the user. But there won't be many calls for one user. And most users will never be queried (as I said, one calls-by-person search every 1-5 seconds, so a very small percentage of calls will ever be queried/retrieved).
[HACKERS] Crash recovery
Hi,

When PG crashes or the computer goes down unexpectedly, the next time the postmaster starts up it does crash recovery — actually redoing xlog records, vacuuming, etc. What module is responsible for crash recovery?

Regards,
Soroosh Sardari
Re: [HACKERS] Crash recovery
On 05.11.2013 13:21, Soroosh Sardari wrote:
> When PG crashes or the computer goes down unexpectedly, the next time the postmaster starts up it does crash recovery — actually redoing xlog records, vacuuming, etc. What module is responsible for crash recovery?

See src/backend/access/transam/xlog.c. The code to read the WAL is there. It calls redo routines in other modules; see e.g. heap_redo for replaying heap records, btree_redo for B-tree index records, and so forth.

- Heikki
Re: [HACKERS] [PATCH] configure: add git describe output to PG_VERSION when building a git tree
On 05.11.2013 12:22, Oskari Saarenmaa wrote:
> This makes it easy to see if the binaries were built from a real release or if they were built from a custom git tree.

Hmm, that would mean that a build from git checkout REL9_3_1 produces a different binary than one built from the postgresql-9.3.1.tar.gz tarball.

I can see some value in that kind of information, i.e. knowing what patches a binary was built with, but this would only solve the problem for git checkouts. Even for a git checkout, the binaries won't be automatically updated unless you run configure again, which makes it pretty unreliable. -1 from me.

PS, the git command in the patch doesn't work with my version of git:

$ git describe --tags --long --dirty HEAD
fatal: --dirty is incompatible with committishes
$ git --version
git version 1.8.4.rc3

- Heikki
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Albe Laurenz laurenz.a...@wien.gv.at
> If I understood the discussion correctly, the use case is that there are advantages to having a database encoding different from UTF-8, but you'd still want some UTF-8 columns. Wouldn't it be a better design to allow specifying the encoding per column? That would give you more flexibility.

Yes, you are right. In the previous discussion:

- That would be nice if available, but it is hard to implement multiple encodings in one database.
- Some people (I'm not sure whether many or few) are using NCHAR/NVARCHAR in other DBMSs. To invite them to PostgreSQL, it's important to support the national character feature syntactically and document it in the manual. This is the first step.
- As the second step, we can implement multiple encodings in one database. According to the SQL standard, NCHAR(n) is equivalent to CHAR(n) CHARACTER SET cs, where cs is an implementation-defined character set.

Regards
MauMau
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
MauMau wrote:
> From: Albe Laurenz laurenz.a...@wien.gv.at
>> If I understood the discussion correctly, the use case is that there are advantages to having a database encoding different from UTF-8, but you'd still want some UTF-8 columns. Wouldn't it be a better design to allow specifying the encoding per column? That would give you more flexibility.
>
> Yes, you are right. In the previous discussion:
> - That would be nice if available, but it is hard to implement multiple encodings in one database.

Granted.

> - Some people (I'm not sure whether many or few) are using NCHAR/NVARCHAR in other DBMSs. To invite them to PostgreSQL, it's important to support the national character feature syntactically and document it in the manual. This is the first step.

I looked into the standard, and it does not have NVARCHAR. The type is called NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING or NCHAR VARYING. I guess that the goal of this patch is to support Oracle syntax. But anybody trying to port CREATE TABLE statements from Oracle is already exposed to enough incompatibilities that the difference between NVARCHAR and NCHAR VARYING will not be the reason to reject PostgreSQL. In other words, I doubt that introducing the nonstandard NVARCHAR will have more benefits than drawbacks (a new reserved word).

Regarding the standard-compliant names of these data types, PostgreSQL already supports those. Maybe some documentation would help.

> - As the second step, we can implement multiple encodings in one database. According to the SQL standard, NCHAR(n) is equivalent to CHAR(n) CHARACTER SET cs, where cs is an implementation-defined character set.

That second step would definitely have benefits. But I don't think that it requires the first step that your patch implements; it is in fact orthogonal. I don't think that there is any need to change NCHAR even if we get per-column encoding; it is just syntactic sugar to support SQL feature F421. Why not tackle the second step first?

Yours,
Laurenz Albe
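As a quick illustration of the point that the standard spellings already work, PostgreSQL's grammar accepts the national-character type names as aliases for the plain character types (a sketch; exact psql display may vary by version):

```sql
CREATE TABLE nchar_demo (
    a NATIONAL CHARACTER VARYING(10), -- parsed as character varying(10)
    b NCHAR VARYING(10),              -- likewise
    c NATIONAL CHARACTER(5)           -- parsed as character(5)
);
```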
Re: [HACKERS] Fast insertion indexes: why no developments
On Tue, Nov 5, 2013 at 6:57 AM, Leonardo Francalanci m_li...@yahoo.it wrote:
> Simon Riggs wrote:
>> Minmax indexes seem to surprise many people, so broad generalisations aren't likely to be useful. I think the best thing to do is to publish some SQL requests that demonstrate in detail what you are trying to achieve and test them against minmax indexes. That way we can discuss what does work and what doesn't work well enough yet.
>
> While I do believe in testing (since "in theory there is no difference between theory and practice; in practice there is"), I would like to know the properties of the minmax index before trying it. What is it supposed to be good at? What are the pros/cons? We can't ask all the users to just try the index and see if it works for them.
>
> As I said, my understanding is that it is very efficient (both in insertion and in searching) when data is somehow ordered in the table. But maybe I got it wrong...

Well, for one, random inserts (with random data) on a min-max index have roughly a 1/N chance of requiring a write to disk, and an (N-1)/N chance of being completely free (or maybe needing a read to verify a write isn't required, but that'll probably hit shared buffers), where N is the number of tuples per page — per index page, that is. Of course, non-random workloads are a different matter.

Min-max indexes always require a sequential scan of the min-max index itself when querying. That works when you intend to query enough tuples to make up the cost (that is, more tuples than M * N * random_cost / seq_cost), where M is the number of pages in the index. Well, actually, since they result in better I/O patterns as well, the tradeoff is probably a little more tricky than that, in favor of min-max indexes. Min-max indexes tend to be very compact, so M is usually low.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Hello,

Please find attached pg_stat_statements-identification-v9.patch. I have tried to address the following review comments:

1. Use version PGSS_TUP_V1_2
2. Fixed total time being zero
3. Remove 'session_start' from the view and use the point release number to generate queryid
4. Hide only queryid and query text, not all fields, from unauthorized users
5. Removed the introduced field from view and code, as the statistics session concept is not being used
6. Removed struct Instrumentation usage
7. Updated sgml to reflect the changes made; removed all references to statistics sessions and introduced fields

regards
Sameer

pg_stat_statements-identification-v9.patch.gz
Description: GNU Zip compressed data
Re: [HACKERS] Row-security writer-side checks proposal
On Mon, Nov 4, 2013 at 8:00 PM, Craig Ringer cr...@2ndquadrant.com wrote:
> On 11/04/2013 09:55 PM, Robert Haas wrote:
>> I continue to think that this syntax is misguided. For SELECT and DELETE there is only read-side security, and for INSERT there is only write-side security, so that's OK as far as it goes, but for UPDATE both read-side security and write-side security are possible, and there ought to be a way to get one without the other. This syntax won't support that cleanly.
>
> That's what I was thinking earlier too - separate FOR READ and FOR WRITE instead. The reason I came back to insert/update/delete was that it's entirely reasonable to want to prohibit deletes but permit updates to the same tuple. Both are writes; both set xmax, it's just that one _replaces_ the tuple, the other doesn't. So really, there are four cases:
>
>   READ
>   WRITE INSERT
>   WRITE UPDATE
>   WRITE DELETE

Isn't READ similarly divisible into READ SELECT, READ UPDATE, and READ DELETE? I would generally expect that most people would want either read-side security for all commands, or read- and write-side security for all commands. I think whatever syntax we come up with for this feature ought to make each of those things straightforward to get.

> but sometimes with different predicates for read and write, i.e. you can see rows you can't modify, or can insert rows / update rows that you can't see after the change.

Yes, that's possible.

> Similarly, saying you can update but not delete seems quite reasonable to me.

If you simply want to allow UPDATE but not DELETE, you can refrain from granting the table-level privilege. The situation in which you need things separate is when you want to allow both UPDATE and DELETE, but with different RLS quals for each. On the other hand, we might choose to say: if you want to do things with that granularity, use your own triggers to enforce it, and provide only READ and WRITE for RLS.

The funny thing about this whole feature is that it's just syntax support for doing things that you can already do in other ways. If you want read-side security, create a security_barrier view and select from that instead of hitting the table directly. If you want write-side security, enforce it using triggers. So essentially what this is, I think, is an attempt to invent nicer syntax around something that we already have, and provide a one-stop-shopping experience rather than a roll-your-own experience for people who want row-level security. Now maybe that's fine. But given that, I think it's pretty important that we get the syntax right. Because if you're adding a feature primarily to add a more convenient syntax, then the syntax had better actually be convenient.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
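The roll-your-own combination described above can be sketched as follows. All object names are invented; security_barrier views have existed since 9.2, and the trigger shown is only one possible write-side policy.

```sql
-- read-side: expose only rows the current user may see
CREATE VIEW my_rows WITH (security_barrier) AS
    SELECT * FROM accounts WHERE owner = current_user;

-- write-side: reject disallowed modifications with a trigger
CREATE FUNCTION enforce_owner() RETURNS trigger AS $$
BEGIN
    IF NEW.owner <> current_user THEN
        RAISE EXCEPTION 'cannot write rows for another user';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER accounts_owner_check
    BEFORE INSERT OR UPDATE ON accounts
    FOR EACH ROW EXECUTE PROCEDURE enforce_owner();
```

The security_barrier option matters here: it stops the planner from pushing a potentially leaky user-supplied qual below the view's filtering qual.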
Re: [HACKERS] [v9.4] row level security
On Mon, Nov 4, 2013 at 8:46 PM, Craig Ringer cr...@2ndquadrant.com wrote:
> On 11/04/2013 11:17 PM, Robert Haas wrote:
>> I'd still like to hear what's wrong with what I said here: http://www.postgresql.org/message-id/ca+tgmoyr1phw3x9vnvuwdcfxkzk2p_jhtwc0fv2q58negcx...@mail.gmail.com
>
> For me, just my understanding. I'm still too new to the planner and rewriter to grasp that properly as written. I was responding to Tom's objection, and he makes a good point about CASE and optimisation. We have to be free to re-order and pre-evaluate where LEAKPROOF flags make it safe and permissible, without ever otherwise doing so. That makes the SECURITY BARRIER subquery look better, since the limited pull-up / push-down is already implemented there.
>
> Robert, any suggestions on how to approach what you suggest? I'm pretty new to the planner's guts, but I know there've been some complaints about the way the current RLS code fiddles with Vars when it changes a direct scan of a rel into a subquery scan. The code in:
>
> https://github.com/ringerc/postgres/blob/rls-9.4/src/backend/optimizer/prep/prepunion.c#L1647
>
> and
>
> https://github.com/ringerc/postgres/blob/rls-9.4/src/backend/optimizer/prep/prepunion.c#L1591
>
> seems to be what folks were complaining about earlier.

I haven't studied this patch in detail, but I see why there's some unhappiness about that code: it's an RLS-specific kluge. Just shooting from the hip here, maybe we should attack the problem of making security-barrier views updatable first, as a separate patch. I would think that if we make that work, this will also work without, hopefully, any special hackery. And we'd get a separate, independently useful feature out of it, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pgsql: Remove internal uses of CTimeZone/HasCTZSet.
On Mon, Nov 4, 2013 at 8:17 PM, Noah Misch n...@leadboat.com wrote:
> On Mon, Nov 04, 2013 at 02:34:02PM -0500, Tom Lane wrote:
>> Noah Misch n...@leadboat.com writes:
>>> On Fri, Nov 01, 2013 at 04:51:34PM +0000, Tom Lane wrote:
>>>> Remove internal uses of CTimeZone/HasCTZSet.
>>>
>>> This changed EncodeDateTime() output for USE_SQL_DATES and USE_GERMAN_DATES styles, because it inserts a space before tzn but does not insert a space before EncodeTimezone() output. Example:
>>>
>>>   set datestyle = sql,mdy;
>>>   select '2013-01-01'::timestamptz;
>>>
>>>   old output:
>>>         timestamptz
>>>   -----------------------
>>>    01/01/2013 00:00:00+00
>>>
>>>   new output:
>>>         timestamptz
>>>   ------------------------
>>>    01/01/2013 00:00:00 +00
>>>
>>> I assume this was unintended.
>>
>> Yeah, I had seen some cases of that. I don't find it of great concern. We'd have to insert some ugly special-case code that looks at the zone name to decide whether to insert a space or not, and I don't think it'd actually be an improvement to do so. (Arguably, these formats are more consistent this way.) Still, this change is probably a sufficient reason not to back-patch this part of the fix.
>
> I was prepared to suppose that no substantial clientele relies on the to_char() TZ format code expanding to blank, the other behavior that changed with this patch. It's more of a stretch to figure applications won't stumble over this new whitespace. I do see greater consistency in the new behavior, but changing type output is a big deal. While preserving the old output does require ugly special-case code, such code was in place for years until removal by this commit and the commit following it. Perhaps making the behavior change is best anyway, but we should not conclude that decision on the basis of its origin as a side effect of otherwise-desirable refactoring.

I have to admit I fear this will break a huge amount of application code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] exit_horribly vs exit_nicely in pg_dump
Hello. Examining the pg_dump sources recently, I've found that different exit procedures are used for the same kind of situation. A quick example from pg_dump.c:

if (dataOnly && schemaOnly)
    exit_horribly(NULL, "options -s/--schema-only and -a/--data-only cannot be used together\n");

if (dataOnly && outputClean)
    exit_horribly(NULL, "options -c/--clean and -a/--data-only cannot be used together\n");

if (dump_inserts && oids)
{
    write_msg(NULL, "options --inserts/--column-inserts and -o/--oids cannot be used together\n");
    write_msg(NULL, "(The INSERT command cannot set OIDs.)\n");
    exit_nicely(1);
}

I suppose this should be a call to exit_nicely() in all possible cases. The only need for calling exit_horribly() is when we are deep down in the multithreaded code, AFAIK.

-- With best wishes, Pavel mailto:pa...@gf.microolap.com
Re: [HACKERS] Row-security writer-side checks proposal
* Robert Haas (robertmh...@gmail.com) wrote: Now maybe that's fine. But given that, I think it's pretty important that we get the syntax right. Because if you're adding a feature primarily to add a more convenient syntax, then the syntax had better actually be convenient. I agree that we want to get the syntax correct, but also very clear as it's security related and we don't want anyone surprised by what happens when they use it. The idea, as has been discussed in the past, is to then allow tying RLS in with SELinux and provide MAC. Thanks, Stephen
Re: [HACKERS] [PATCH] configure: add git describe output to PG_VERSION when building a git tree
On Tue, Nov 05, 2013 at 02:06:26PM +0200, Heikki Linnakangas wrote: On 05.11.2013 12:22, Oskari Saarenmaa wrote: This makes it easy to see if the binaries were built from a real release or if they were built from a custom git tree. Hmm, that would mean that a build from git checkout REL9_3_1 produces a different binary than one built from postgresql-9.3.1.tar.gz tarball.

Good point - how about adding git describe information only if the checkout doesn't match a tag exactly? So when you build REL9_3_1 there would be no extra information, but when you have any local changes on top of it we'd add the git describe output.

I can see some value in that kind of information, ie. knowing what patches a binary was built with, but this would only solve the problem for git checkouts. Even for a git checkout, the binaries won't be automatically updated unless you run configure again, which makes it pretty unreliable. -1 from me.

I don't think we can solve the problem of finding local changes for all the things people may do, but I'd guess it's pretty common to build postgresql from a clean local git checkout, and with this change at least some portion of users would get some extra information. To solve the rerun-configure thing we could put the git version in a new header file and have a makefile dependency on .git/index for regenerating that file when needed. We could also let users override the extra version with a command line argument for configure so distributions could put the distribution package version there, for example 9.3.1-2.fc20 on my Fedora system.

PS, the git command in the patch doesn't work with my version of git:

$ git describe --tags --long --dirty HEAD
fatal: --dirty is incompatible with committishes
$ git --version
git version 1.8.4.rc3

I initially used HEAD when looking at the git describe values, but then added --dirty which obviously doesn't make sense when describing a ref.
Sorry about the broken patch, I was applying these changes manually from configure.in to configure because I didn't have the proper autoconf version installed (autoconf 2.63 doesn't seem to be available in many distributions anymore, maybe the required version could be upgraded at some point..) How about the patch below to fix the exact tag and --dirty issues?

/ Oskari

diff --git a/configure.in b/configure.in
index a4baeaf..d253286 100644
--- a/configure.in
+++ b/configure.in
@@ -29,7 +29,18 @@ AC_CONFIG_AUX_DIR(config)
 AC_PREFIX_DEFAULT(/usr/local/pgsql)
 AC_SUBST(configure_args, [$ac_configure_args])
 
-AC_DEFINE_UNQUOTED(PG_VERSION, "$PACKAGE_VERSION", [PostgreSQL version as a string])
+# Append git tag based version to PG_VERSION if we're building from a git
+# checkout that doesn't match a tag exactly.  Don't touch PACKAGE_VERSION
+# which is used to create other version variables (major version and numeric
+# version.)
+PG_VERSION=$PACKAGE_VERSION
+if test -d .git; then
+  exact=`git describe --tags --exact-match --dirty 2>/dev/null || echo dirty`
+  if echo "$exact" | grep -q dirty; then
+    PG_VERSION="$PG_VERSION (`git describe --tags --long --dirty || echo unknown`)"
+  fi
+fi
+AC_DEFINE_UNQUOTED(PG_VERSION, "$PG_VERSION", [PostgreSQL version as a string])
 [PG_MAJORVERSION=`expr "$PACKAGE_VERSION" : '\([0-9][0-9]*\.[0-9][0-9]*\)'`]
 AC_SUBST(PG_MAJORVERSION)
 AC_DEFINE_UNQUOTED(PG_MAJORVERSION, "$PG_MAJORVERSION", [PostgreSQL major version as a string])
-- 
1.8.4.2
Re: [HACKERS] [PATCH] configure: add git describe output to PG_VERSION when building a git tree
On Tue, Nov 5, 2013 at 2:05 PM, Oskari Saarenmaa o...@ohmu.fi wrote: [...] How about the patch below to fix the exact tag and --dirty issues?

A similar thing has been discussed a couple of months ago, and the idea to add any git-related information in configure has been rejected.
You can have a look at the discussion here: http://www.postgresql.org/message-id/3038.1374031...@sss.pgh.pa.us As well as a potential solution here: http://www.postgresql.org/message-id/c51433da5e804767724d60eea57f4178.squir...@webmail.xs4all.nl Regards, -- Michael
Re: [HACKERS] logical column order and physical column order
David Rowley escribió: In this case how does Postgresql know that attnum 3 is the 2nd user column in that table? Unless I have misunderstood something then there must be some logic in there to skip dropped columns and if so then could it not just grab the attphynum at that location? then just modify the 1-5 places listed above to sort on attlognum?

During parse analysis, those columns obtained from pg_attribute are transformed to target list entries; they travel through the parser and executor in that representation, and TupleDescs are constructed from those lists. Making that work correctly needs some more code than just sorting on attlognum. There are some other messy parts like handling composite types when passed to functions, COPY, and some other things I don't remember. Did you look at the places my patch touches?

-- Álvaro Herrera   http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Fast insertion indexes: why no developments
Claudio Freire wrote: Min-max indexes always require a sequential scan of the min-max index itself when querying.

I'm worried about the number of heap pages that will be scanned. My understanding is that given the random input, the index will not be selective enough, and will end up requiring a lot of page scanning to get the relevant rows. That is: the selectivity of the min/max values will be very low, given the high cardinality and randomness of the input; I'm afraid that, in most page ranges, the min will be lower than the queried values, and the max higher, resulting in lots of heap page scans.

Quick test: a table with 1M rows, with random values from 1 to 1000:

create table t as
  select generate_series(1, 1000000) as i,
         trunc(random() * 1000) as n;

Suppose a page range contains 100 rows (but my understanding is that a minmax index will use a much bigger row count...); let's find how many page ranges would have to be scanned to find a series of 50 values (again, random in the 1-1000 range):

with cte as (
  select min(n) as minn, max(n) as maxn, i/100
    from t group by i/100),
inp as (
  select generate_series(1, 50) as iinp,
         trunc(random() * 1000) as s)
select count(*)
  from inp left outer join cte on inp.s between minn and maxn
 group by iinp;

I get 9000 page ranges for 49 values out of 50... which means scanning 90% of the table. Either my sql is not correct (likely), or my understanding of the minmax index is not correct (even more likely), or the minmax index is not usable in a random inputs scenario.

-- View this message in context: http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5777009.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
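[Editorial note] The SQL experiment above can be cross-checked with a quick standalone simulation. This is a hedged sketch, not PostgreSQL code: the page-range size of 100 rows and the uniform 0-999 distribution are taken from the email, everything else (row count scaled down to 100k, probe count) is invented for illustration.

```python
import random

def minmax_ranges(values, range_size):
    """Per-range (min, max) summaries, as a minmax index would store them."""
    return [(min(values[i:i + range_size]), max(values[i:i + range_size]))
            for i in range(0, len(values), range_size)]

def fraction_matching(ranges, probe):
    """Share of page ranges whose [min, max] interval contains the probe."""
    return sum(1 for lo, hi in ranges if lo <= probe <= hi) / len(ranges)

random.seed(42)
n_rows, range_size = 100_000, 100   # scaled-down version of the 1M-row test
values = [random.randrange(1000) for _ in range(n_rows)]
ranges = minmax_ranges(values, range_size)

# With ~100 uniform values in [0, 1000) per range, the per-range min lands
# near 0 and the max near 999, so a probe value matches most ranges.
probes = [random.randrange(1000) for _ in range(50)]
avg = sum(fraction_matching(ranges, p) for p in probes) / len(probes)
print(f"average fraction of page ranges a probe matches: {avg:.2f}")
```

The result is in line with the ~90% figure from the SQL test: with high-cardinality random data, the per-range bounds are too wide to prune much of anything.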
Re: [HACKERS] [PATCH] configure: add git describe output to PG_VERSION when building a git tree
Oskari Saarenmaa o...@ohmu.fi writes: On Tue, Nov 05, 2013 at 02:06:26PM +0200, Heikki Linnakangas wrote: I can see some value in that kind of information, ie. knowing what patches a binary was built with, but this would only solve the problem for git checkouts. Even for a git checkout, the binaries won't be automatically updated unless you run configure again, which makes it pretty unreliable. -1 from me. I don't think we can solve the problem of finding local changes for all the things people may do, but I'd guess it's pretty common to build postgresql from a clean local git checkout and with this change at least some portion of users would get some extra information.

I agree with Heikki that this is basically useless. Most of my builds are from git + uncommitted changes, so telling me what the top commit was has no notable value. Even if I always committed before building, the hash tags are only decipherable until I discard that branch. So basically, this would only be useful to people building production servers from random git pulls from development or release-branch mainline. How many people really do that, and should we inconvenience everybody else to benefit them? (Admittedly, the proposed patch is only marginally inconvenient as-is, but anything that would force a header update after any commit would definitely put me on the warpath.)

BTW, I don't think the proposed patch works at all in a VPATH build.

regards, tom lane
Re: [HACKERS] pgsql: Remove internal uses of CTimeZone/HasCTZSet.
Robert Haas robertmh...@gmail.com writes: On Mon, Nov 4, 2013 at 8:17 PM, Noah Misch n...@leadboat.com wrote: I was prepared to suppose that no substantial clientele relies on the to_char() TZ format code expanding to blank, the other behavior that changed with this patch. It's more of a stretch to figure applications won't stumble over this new whitespace. I do see greater consistency in the new behavior, but changing type output is a big deal. While preserving the old output does require ugly special-case code, such code was in place for years until removal by this commit and the commit following it. Perhaps making the behavior change is best anyway, but we should not conclude that decision on the basis of its origin as a side effect of otherwise-desirable refactoring. I have to admit I fear this will break a huge amount of application code.

I don't believe that. You are talking about the intersection of

(1) people who use a brute force timezone (which we already know is a small set, else the original bug would've been noticed long ago).
(2) people who use SQL or German datestyle, neither of which is default.
(3) people whose apps will fail if there's whitespace at a specific place in timestamp output, even though in many other cases there will be whitespace at that place anyway, notably including the case where they select any regular timezone.

It's possible that this intersection is nonempty, but huge amount? Come on. This is minor compared to the application compatibility issues we come up with in every major release. More, I would argue that the number of failing applications is likely to be exceeded by the number that no longer fail upon selection of a brute-force zone, because they'd only ever been coded or tested with the normal-zone-name formatting.

regards, tom lane
Re: [HACKERS] Fast insertion indexes: why no developments
On Tue, Nov 5, 2013 at 11:28 AM, Leonardo Francalanci m_li...@yahoo.it wrote: I get 9000 pages for 49 values out of 50... which means scanning 90% of the table. Either my sql is not correct (likely), or my understanding of the minmax index is not correct (even more likely), or the minmax index is not usable in a random inputs scenario.

Yep, you're correct. That's the cost of querying random values. But real data isn't truly random, and you haven't really analyzed update cost, which is what we were talking about in that last post.
Re: [HACKERS] Fast insertion indexes: why no developments
Claudio Freire wrote: real data isn't truly random

Well, let's try normal_rand:

create table t1 as
  select trunc(normal_rand(1000000, 50, 3)) as n,
         generate_series(1, 1000000) as i;

with cte as (
  select min(n) as minn, max(n) as maxn, i/100
    from t1 group by i/100),
inp as (
  select generate_series(1, 100) as iinp,
         trunc(normal_rand(100, 50, 3)) as s)
select count(*), iinp
  from inp left outer join cte on inp.s between minn and maxn
 group by iinp;

Not that much different in my run...

Claudio Freire wrote: you haven't really analyzed update cost, which is what we were talking about in that last post.

I don't care for a better update cost if the cost to query is a table scan. Otherwise, I'll just claim that no index at all is even better than minmax: 0 update cost, pretty much the same query time. Maybe there's value in minmax indexes for sequential data, but not for random data, which is the topic of this thread. BTW I would like to see some performance tests of the minmax indexes vs btree for sequential inputs... is the gain worth it? I couldn't find any mention of performance tests in the minmax threads.

-- View this message in context: http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5777020.html
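[Editorial note] The flip side - the suggestion that minmax might still pay off for sequential data - can be sketched the same way. This is a hedged standalone simulation; the jitter model and all parameters are invented for illustration, not taken from any benchmark:

```python
import random

def minmax_ranges(values, range_size):
    # Per-range (min, max) summaries, as a minmax index would store them.
    return [(min(values[i:i + range_size]), max(values[i:i + range_size]))
            for i in range(0, len(values), range_size)]

random.seed(7)
n_rows, range_size = 100_000, 100

# Mostly-increasing data, e.g. a timestamp column filled in arrival order:
# row i holds i plus a little jitter, so each page range covers a narrow band.
clustered = [i + random.randrange(200) for i in range(n_rows)]
ranges = minmax_ranges(clustered, range_size)

probe = random.randrange(n_rows)
hits = sum(1 for lo, hi in ranges if lo <= probe <= hi)
print(f"page ranges matching one probe: {hits} of {len(ranges)}")
```

With well-clustered values a probe lands in only a handful of page ranges, which is exactly the case where the min/max summaries prune effectively.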
Re: [HACKERS] [PATCH] configure: add git describe output to PG_VERSION when building a git tree
* Tom Lane (t...@sss.pgh.pa.us) wrote: I agree with Heikki that this is basically useless. Most of my builds are from git + uncommitted changes, so telling me what the top commit was has no notable value.

The focus of this change would really be, imv anyway, for more casual PG developers, particularly those familiar with github and who work with branches pushed there by other developers.

Even if I always committed before building, the hash tags are only decipherable until I discard that branch.

Certainly true - but then, are you typically keeping binaries around after you discard the branch? Or distributing those binaries to others? Seems unlikely. However, if you're pulling in many branches from remote locations and building lots of PG binaries, keeping it all straight and being confident you didn't mix anything up can be a bit more challenging.

So basically, this would only be useful to people building production servers from random git pulls from development or release-branch mainline. How many people really do that, and should we inconvenience everybody else to benefit them?

Not many do it today because we actively discourage it by requiring patches to be posted to the mailing list, and the number of people writing PG patches is relatively small. Even so though, I can see folks like certain PG-on-cloud providers, who are doing testing, or even deployments, with various patches to provide us feedback on them, and therefore have to manage a bunch of different binaries, might find it useful.

(Admittedly, the proposed patch is only marginally inconvenient as-is, but anything that would force a header update after any commit would definitely put me on the warpath.) BTW, I don't think the proposed patch works at all in a VPATH build.

Clearly, that would need to be addressed.

All-in-all, I'm not super excited about this but I also wouldn't mind, so while not really a '+1', I'd say '+0'. Nice idea, if it isn't painful to deal with and maintain.
Thanks, Stephen
Re: [HACKERS] [PATCH] configure: add git describe output to PG_VERSION when building a git tree
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: So basically, this would only be useful to people building production servers from random git pulls from development or release-branch mainline. How many people really do that, and should we inconvenience everybody else to benefit them? Not many do it today because we actively discourage it by requiring patches to be posted to the mailing list and the number of people writing PG patches is relatively small. Even so though, I can see folks like certain PG-on-cloud providers, who are doing testing, or even deployments, with various patches to provide us feedback on them, and therefore have to manage a bunch of different binaries, might find it useful.

I can see that there might be a use for tagging multiple binaries, I just don't believe that this is a particularly helpful way to do it. The last-commit tag is neither exactly the right data nor even a little bit user-friendly. What about, say, a configure option to add a user-specified string to the version() result?

regards, tom lane
Re: [HACKERS] [PATCH] configure: add git describe output to PG_VERSION when building a git tree
On 11/05/2013 10:32 AM, Stephen Frost wrote: All-in-all, I'm not super excited about this but I also wouldn't mind, so while not really a '+1', I'd say '+0'. Nice idea, if it isn't painful to deal with and maintain. I doubt it's buying us anything worth having. What's more, it might get in the road of some other gadget that would be worth having. cheers andrew
Re: [HACKERS] [PATCH] configure: add git describe output to PG_VERSION when building a git tree
* Tom Lane (t...@sss.pgh.pa.us) wrote: What about, say, a configure option to add a user-specified string to the version() result? I quite like that idea, personally. Folks who care about it being a git tag could then trivially get that also. Thanks, Stephen
Re: [HACKERS] better atomics
Hi,

(Coming back to this now)

On 2013-10-28 21:55:22 +0100, Andres Freund wrote: The list I have previously suggested was:

* uint32 pg_atomic_load_u32(uint32 *ptr)
* void pg_atomic_store_u32(uint32 *ptr, uint32 val)

To be able to write code without spreading volatiles all over while making it very clear that that part of the code is worthy of attention.

* uint32 pg_atomic_exchange_u32(uint32 *ptr, uint32 val)
* bool pg_atomic_compare_exchange_u32(uint32 *ptr, uint32 *expected, uint32 newval)

So what I've previously proposed did use plain types for the to-be-manipulated atomic integers. I am not sure about that anymore though. C11 includes support for atomic operations, but it assumes that the to-be-manipulated variable is of a special atomic type. While I am not proposing that we try to emulate C11's API completely (basically impossible as it uses type agnostic functions, and it also exposes too much), it seems to make sense to allow PG's atomics to be implemented by C11's. The API is described at: http://en.cppreference.com/w/c/atomic

The fundamental difference to what I've suggested so far is that the atomic variable isn't a plain integer/type but a distinct datatype that can *only* be manipulated via the atomic operators. That has the nice property that it cannot be accidentally manipulated via plain operators. E.g. it wouldn't be

uint32 pg_atomic_fetch_add_u32(uint32 *ptr, uint32 add_);

but

uint32 pg_atomic_fetch_add_u32(pg_atomic_uint32 *ptr, uint32 add_);

where, for now, pg_atomic_uint32 is something like:

typedef struct pg_atomic_uint32
{
    volatile uint32 value;
} pg_atomic_uint32;

a future C11 implementation would just typedef C11's respective atomic type to pg_atomic_uint32.

Comments?

Greetings,

Andres Freund

-- Andres Freund   http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
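[Editorial note] The design point here - an opaque type that can only be touched through the atomic API - can be illustrated with a toy model. This is a hedged Python sketch of the interface shape only: Python has no real 32-bit CAS, a lock stands in for the hardware primitive, and the method names merely mirror the proposed C functions; nothing below is PostgreSQL code.

```python
import threading

class PgAtomicU32:
    """Toy analogue of the proposed pg_atomic_uint32: the value is kept
    private, so callers can't touch it except via the atomic operations."""

    _MASK = 0xFFFFFFFF  # wrap like a 32-bit unsigned integer

    def __init__(self, value=0):
        self._lock = threading.Lock()   # stand-in for the hardware atomic op
        self._value = value & self._MASK

    def load(self):
        with self._lock:
            return self._value

    def store(self, val):
        with self._lock:
            self._value = val & self._MASK

    def exchange(self, val):
        with self._lock:
            old, self._value = self._value, val & self._MASK
            return old

    def compare_exchange(self, expected, newval):
        """Returns (ok, observed); the C version would instead write the
        observed value back through the *expected out-parameter."""
        with self._lock:
            if self._value == expected:
                self._value = newval & self._MASK
                return True, expected
            return False, self._value

    def fetch_add(self, add):
        with self._lock:
            old = self._value
            self._value = (old + add) & self._MASK
            return old

counter = PgAtomicU32(0)
assert counter.fetch_add(1) == 0 and counter.load() == 1
ok, seen = counter.compare_exchange(1, 7)
assert ok and counter.load() == 7
```

The payoff of the opaque type is visible even in the toy: there is no way to write `counter + 1` by accident, which is exactly the "cannot be manipulated via plain operators" property the email argues for.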
Re: [HACKERS] [PATCH] configure: add git describe output to PG_VERSION when building a git tree
On 11/05/2013 10:53 AM, Stephen Frost wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: What about, say, a configure option to add a user-specified string to the version() result? I quite like that idea, personally. Folks who care about it being a git tag could then trivially get that also. +1. cheers andrew
Re: [HACKERS] Row-security writer-side checks proposal
On Tue, Nov 5, 2013 at 9:01 AM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: Now maybe that's fine. But given that, I think it's pretty important that we get the syntax right. Because if you're adding a feature primarily to add a more convenient syntax, then the syntax had better actually be convenient. I agree that we want to get the syntax correct, but also very clear as it's security related and we don't want anyone surprised by what happens when they use it. The idea, as has been discussed in the past, is to then allow tying RLS in with SELinux and provide MAC. No argument. I think convenient and unsurprising are closely-aligned goals. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] [PATCH] configure: allow adding a custom string to PG_VERSION
This can be used to tag custom built packages with an extra version string such as the git describe id or distribution package release version. Based on pgsql-hackers discussion: http://www.postgresql.org/message-id/20131105102226.ga26...@saarenmaa.fi

Signed-off-by: Oskari Saarenmaa o...@ohmu.fi
---
 configure.in | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/configure.in b/configure.in
index a4baeaf..5459c71 100644
--- a/configure.in
+++ b/configure.in
@@ -29,11 +29,15 @@ AC_CONFIG_AUX_DIR(config)
 AC_PREFIX_DEFAULT(/usr/local/pgsql)
 AC_SUBST(configure_args, [$ac_configure_args])
 
-AC_DEFINE_UNQUOTED(PG_VERSION, "$PACKAGE_VERSION", [PostgreSQL version as a string])
 [PG_MAJORVERSION=`expr "$PACKAGE_VERSION" : '\([0-9][0-9]*\.[0-9][0-9]*\)'`]
 AC_SUBST(PG_MAJORVERSION)
 AC_DEFINE_UNQUOTED(PG_MAJORVERSION, "$PG_MAJORVERSION", [PostgreSQL major version as a string])
+# Allow adding a custom string to PG_VERSION
+PGAC_ARG_REQ(with, extra-version, [NAME], [extra version information],
+             [PG_VERSION="$PACKAGE_VERSION ($withval)"], [PG_VERSION="$PACKAGE_VERSION"])
+AC_DEFINE_UNQUOTED(PG_VERSION, "$PG_VERSION", [PostgreSQL version as a string])
+
 AC_CANONICAL_HOST
 
 template=
@@ -1920,7 +1924,7 @@ else
 fi
 
 AC_DEFINE_UNQUOTED(PG_VERSION_STR,
-                   ["PostgreSQL $PACKAGE_VERSION on $host, compiled by $cc_string, `expr $ac_cv_sizeof_void_p \* 8`-bit"],
+                   ["PostgreSQL $PG_VERSION on $host, compiled by $cc_string, `expr $ac_cv_sizeof_void_p \* 8`-bit"],
                    [A string containing the version number, platform, and C compiler])
 
 # Supply a numeric version string for use by 3rd party add-ons
-- 
1.8.4.2
[HACKERS] Disallow pullup of a subquery with a subquery in its targetlist?
Back at http://www.postgresql.org/message-id/520d221e.2060...@gmail.com there was a complaint about strange behavior of a query that looks basically like this:

SELECT ...
FROM (SELECT ...,
             (SELECT ... ORDER BY random() LIMIT 1) AS color_id
      FROM ...) s
LEFT JOIN ...

The problem is that the planner decides it can pull up the subquery s, or flatten it into the outer query. This entails substituting the subquery's targetlist expressions for outer-query Vars referencing s, and there's more than one reference to s.color_id. So we get multiple copies of the inner subquery, and they will produce different results at runtime due to the use of random(). This results in inconsistent behavior.

We decided long ago that we should forbid pullup of subqueries that contain volatile functions in their targetlists, because of what's basically the same hazard: you might get more evaluations of the volatile functions than you expected, yielding inconsistent results and/or unwanted side-effects. I first wondered why the instance of random() didn't prevent pullup in this example. That's because contain_volatile_functions() does not recurse into SubLinks, which maybe is the wrong thing; but I'm hesitant to change it without detailed analysis of all the (many) call sites.

However, I think that a good case could also be made for fixing this by deciding that we shouldn't pull up if there are SubLinks in the subquery targetlist, period. Even without any volatile functions, multiple copies of a subquery seem like a probable loser cost-wise.

Thoughts? If we do change this, should we back-patch it?

regards, tom lane
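[Editorial note] The multiple-evaluation hazard described above can be modeled outside SQL. This is a hedged Python analogy only; color_id is an invented stand-in for the ORDER BY random() LIMIT 1 subselect, not anything from the planner:

```python
import random

random.seed(1)

def color_id():
    # Stands in for the volatile subselect: each call may return a new value.
    return random.randrange(1000)

# Flattened/pulled-up plan: the expression is substituted into *each* outer
# reference to s.color_id, so it runs once per reference and copies disagree.
pulled_up = (color_id(), color_id())

# Unflattened plan: the subquery runs once and both references see that row.
once = color_id()
not_pulled_up = (once, once)

assert not_pulled_up[0] == not_pulled_up[1]
print("pulled up:", pulled_up, "not pulled up:", not_pulled_up)
```

The point is only that substitution turns one evaluation into several independent ones, which is harmless for stable expressions but visibly wrong for volatile ones.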
Re: [HACKERS] Fast insertion indexes: why no developments
On Tue, Nov 5, 2013 at 12:25 AM, Leonardo Francalanci m_li...@yahoo.it wrote:

Andres Freund-3 wrote: On 2013-11-04 11:27:33 -0500, Robert Haas wrote: On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire <klaussfreire@> wrote: Such a thing would help COPY, so maybe it's worth a look. I have little doubt that a deferred insertion buffer of some kind could help performance on some workloads, though I suspect the buffer would have to be pretty big to make it worthwhile on a big COPY that generates mostly-random insertions. Even for random data, presorting the to-be-inserted data appropriately could result in much better io patterns. Mmh, I'm afraid that the buffer would have to be huge to get some real advantage. You have to buffer enough values to avoid touching entire pages, which is not that easy.

Some experiments I did a few years ago showed that applying sorts to the data to be inserted could be helpful even when the sort batch size was as small as one tuple per 5 pages of existing index. Maybe even less.

Cheers, Jeff
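[Editorial note] Jeff's presorting observation can be sketched with a toy buffer-cache model: sorting the batch turns scattered index-page accesses into a single sweep, so each touched page is faulted at most once. All parameters below are invented for illustration; this is not a reconstruction of his experiments.

```python
import random
from collections import OrderedDict

def cache_misses(page_sequence, cache_size):
    """Count page faults under a small LRU buffer cache."""
    cache, misses = OrderedDict(), 0
    for page in page_sequence:
        if page in cache:
            cache.move_to_end(page)
        else:
            misses += 1
            cache[page] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used page
    return misses

random.seed(0)
n_pages, keys_per_page = 10_000, 100   # "existing index" of 10,000 leaf pages
batch = [random.randrange(n_pages * keys_per_page) for _ in range(20_000)]

pages_random = [k // keys_per_page for k in batch]
pages_sorted = [k // keys_per_page for k in sorted(batch)]

m_random = cache_misses(pages_random, cache_size=100)
m_sorted = cache_misses(pages_sorted, cache_size=100)
print(f"LRU misses: random order={m_random}, sorted order={m_sorted}")
```

With random arrival order nearly every insertion faults a page, while the sorted batch visits pages monotonically and faults each distinct page exactly once - the same effect whether the batch is large or, as Jeff notes, quite small relative to the index.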
Re: Handle LIMIT/OFFSET before select clause (was: [HACKERS] Feature request: optimizer improvement)
I'm wondering what type of index would work for this, as it is a volatile function. Not knowing how PG's optimizer runs, I'm at a loss as to why this wouldn't be possible or worth doing. It seems to me that all functions in the select part of the statement could be calculated at the end of the query, after the results have been gathered, and even after the sorting had been done, as long as the column wasn't part of the order by (or perhaps group by).

I have an entire set of functions that perform in this way. For example, I'm selecting a list of all my products and the function does a complex calculation based on inventory in the warehouse + expected deliveries from the factory to determine how many of each item is available, and when they first become available. What's helpful is for the user's search criteria to initially limit the search result, and then I want to paginate the results and only show them a few at a time. In the verbose syntax I mentioned originally, the query performs well; in the most straightforward syntax, it does not. I'm not sure I even need to hint the optimizer to perform this type of an optimization, as it seems it would be beneficial (or at least not detrimental) 100% of the time.

On Sat, Nov 2, 2013 at 10:06 AM, Tom Lane t...@sss.pgh.pa.us wrote: Atri Sharma atri.j...@gmail.com writes: I understand the reasons for executing SELECT before the sort. But, couldn't we get the planner to see the LIMIT part and push the sort node above the select node for this specific case? [ Shrug... ] I don't see the point. If the OP actually cares about the speed of this query, he's going to want to avoid the sort step too, which is what makes the index a good idea. More generally, this is not a transformation we could apply unconditionally --- at the very least it'd need to be avoided when volatile functions are involved, and I don't think it's invariably a win from a cost standpoint even if there aren't semantic blockers.
But right now the planner has no real ability to reason about placement of SELECT-list evaluation: it's done in a fixed spot in any particular plan structure. I don't think it's practical to add such considerations to the rat's nest that is grouping_planner and friends. I have ambitions to replace all that with a Path-generation-and-comparison approach, and the Paths used for this would need to carry some indication of which expressions would be evaluated where. So maybe once that's done we could think about whether this is worth doing. I remain dubious though.

regards, tom lane

-- Joe's Computer Service 405-227-0951
Computer Running Slow? Call Joe! $125, Your computer will run like new!
www.JoesComputerService.net
Re: [HACKERS] [PATCH] configure: add git describe output to PG_VERSION when building a git tree
On Tue, Nov 5, 2013 at 6:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: Oskari Saarenmaa o...@ohmu.fi writes: On Tue, Nov 05, 2013 at 02:06:26PM +0200, Heikki Linnakangas wrote: I can see some value in that kind of information, ie. knowing what patches a binary was built with, but this would only solve the problem for git checkouts. Even for a git checkout, the binaries won't be automatically updated unless you run configure again, which makes it pretty unreliable. -1 from me. I don't think we can solve the problem of finding local changes for all the things people may do, but I'd guess it's pretty common to build postgresql from a clean local git checkout and with this change at least some portion of users would get some extra information. I agree with Heikki that this is basically useless. Most of my builds are from git + uncommitted changes, so telling me what the top commit was has no notable value. Even if I always committed before building, the hash tags are only decipherable until I discard that branch. I nearly always remember to set config's prefix to some directory name that describes the uncommitted changes which I am reviewing or testing. Also, including in the directory name the git commit to which those changes were applied is awkward and easy to forget to do--the kind of thing best done by software. (And if I've discarded the branch, that pretty much tells me what I need to know about the binary built from it--obsolete.) Cheers, Jeff
Re: [HACKERS] Fast insertion indexes: why no developments
On Tue, Nov 5, 2013 at 12:22 PM, Leonardo Francalanci m_li...@yahoo.it wrote: Claudio Freire wrote you haven't really analyzed update cost, which is what we were talking about in that last post. I don't care for a better update cost if the cost to query is a table scan. Otherwise, I'll just claim that no index at all is even better than minmax: 0 update cost, pretty much same query time. Maybe there's value in minmax indexes for sequential data, but not for random data, which is the topic of this thread. Well, of course, they're not magic pixie dust. But is your data really random? (or normal?) That's the thing...
Re: [HACKERS] GIN improvements part 1: additional information
On 04.11.2013 23:44, Alexander Korotkov wrote: On Mon, Oct 21, 2013 at 11:12 PM, Alexander Korotkov aekorot...@gmail.com wrote: Attached version of patch is debugged. I would like to note that the number of bugs was low and it wasn't very hard to debug. I've rerun tests on it. You can see that the numbers are improved as the result of your refactoring.

         event         |     period
-----------------------+-----------------
 index_build           | 00:01:45.416822
 index_build_recovery  | 00:00:06
 index_update          | 00:05:17.263606
 index_update_recovery | 00:01:22
 search_new            | 00:24:07.706482
 search_updated        | 00:26:25.364494
(6 rows)

     label      | blocks_mark
----------------+-------------
 search_new     |   847587636
 search_updated |   881210486
(2 rows)

     label     |   size
---------------+-----------
 new           | 419299328
 after_updates | 715915264
(2 rows)

Beside simple bugs, there was a subtle bug in the incremental item indexes update. I've added some more comments, including an ASCII picture about how item indexes are shifted. Now, I'm trying to implement support of the old page format. Then we can decide which approach to use. Attached version of patch has support of the old page format. Patch still needs more documentation and probably refactoring, but I believe the idea is clear and it can be reviewed. In the patch I have to revert the change of null category placement for compatibility with the old posting list format. Thanks, just glanced at this quickly. If I'm reading the patch correctly, old-style non-compressed posting tree leaf pages are not vacuumed at all; that's clearly not right. I argued earlier that we don't need to handle the case that compressing a page makes it larger, so that the existing items don't fit on the page anymore. I'm having some second thoughts on that; I didn't take into account the fact that the mini-index in the new page format takes up some space. I think it's still highly unlikely that there isn't enough space on an 8k page, but it's not totally crazy with a non-standard small page size. So at least that situation needs to be checked for with an ereport(), rather than an Assert. 
To handle splitting a non-compressed page, it seems that you're relying on the fact that before splitting, we try to insert, which compresses the page. The problem with that is that it's not correctly WAL-logged. The split record that follows includes a full copy of both page halves, but if the split fails for some reason, e.g. you run out of disk space, there is no WAL record at all of the compression. I'd suggest doing the compression in the insert phase on a temporary copy of the page, leaving the original page untouched if there isn't enough space for the insertion to complete. (You could argue that this case can't happen because the compression must always create enough space to insert one more item. Maybe so, but at least there should be an explicit check for that.) - Heikki
Re: [HACKERS] Fast insertion indexes: why no developments
Claudio Freire wrote Well, of course, they're not magic pixie dust. Of course they aren't. I think they can make a difference in a sequential input scenario. But I'm not the one who said that they are fit to solve the problems that other people and I are talking about in this thread. Claudio Freire wrote But is your data really random? (or normal?) That's the thing... I still don't understand. Even if the data was normal, it wouldn't work. You can try and change the 3rd parameter in normal_rand and get a narrower distribution, but at the same time the query values should be narrower... so you'll get the same results. The problem is: if the range you get between min and max is too big in each page range, you'll end up scanning a lot of heap pages. To me, the thing is: has anyone made performance tests of these minmax indexes with an input that is not sequential? (BTW I'd like to see tests for the sequential input scenario too...)
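(For reference, the normal_rand function mentioned above comes from the tablefunc extension; its third parameter is the standard deviation. A minimal sketch of the kind of non-sequential test data under discussion — table name and the specific values are hypothetical choices, not from the thread:)

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- One million values drawn from N(50000, 10000). Narrowing the third
-- argument (stddev) narrows the distribution, but each block's min/max
-- range can still be wide relative to a typical query's search range,
-- which is the point about ending up scanning many heap pages.
CREATE TABLE rand_data AS
  SELECT r::int AS val
  FROM normal_rand(1000000, 50000, 10000) AS r;
```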
Re: [HACKERS] Fast insertion indexes: why no developments
Jeff Janes wrote Some experiments I did a few years ago showed that applying sorts to the data to be inserted could be helpful even when the sort batch size was as small as one tuple per 5 pages of existing index. Maybe even less. Cool!!! Do you have any idea/hint on how I could try and replicate that? Do you remember how you did it?
Re: [HACKERS] Fast insertion indexes: why no developments
On Tue, Nov 5, 2013 at 2:52 PM, Leonardo Francalanci m_li...@yahoo.it wrote: Jeff Janes wrote Some experiments I did a few years ago showed that applying sorts to the data to be inserted could be helpful even when the sort batch size was as small as one tuple per 5 pages of existing index. Maybe even less. Cool!!! Do you have any idea/hint on how I could try and replicate that? Do you remember how you did it? I do it regularly by sorting tuples before inserting/updating. It helps quite significantly for batches of ~1000 tuples (well, in my case).
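(A minimal sketch of the presorting approach Claudio describes — staging a batch and inserting it in key order. The table and batch source are hypothetical, and strictly speaking the ORDER BY influences rather than guarantees the order in which rows reach the index:)

```sql
CREATE TABLE events (id bigint PRIMARY KEY, payload text);
CREATE TEMP TABLE batch (LIKE events);

-- Load one batch of ~1000 random-key tuples into the staging table,
-- then insert them sorted, so successive index descents touch
-- neighboring btree pages instead of jumping around randomly.
COPY batch FROM '/tmp/batch.csv' (FORMAT csv);
INSERT INTO events SELECT * FROM batch ORDER BY id;
TRUNCATE batch;
```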
Re: [HACKERS] List of binary-compatible data types
Noah, Also, JSON -> Text seems to be missing from the possible binary conversions. That's a TODO, I suppose. Only json -> text, not text -> json. Note that you can add the cast manually if you have an immediate need. Huh? Why would text -> JSON require a physical rewrite? We have to validate it, sure, but we don't need to rewrite it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] List of binary-compatible data types
On Tue, Nov 05, 2013 at 10:00:15AM -0800, Josh Berkus wrote: Noah, Also, JSON -> Text seems to be missing from the possible binary conversions. That's a TODO, I suppose. Only json -> text, not text -> json. Note that you can add the cast manually if you have an immediate need. Huh? Why would text -> JSON require a physical rewrite? We have to validate it, sure, but we don't need to rewrite it. That's all true, but the system has no concept like "this cast validates the data, never changing it". We would first need to add metadata supporting such a concept. On the other hand, "create cast (json as text) without function;" leans only on concepts the system already knows. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
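(Noah's distinction can be made concrete with the cast DDL itself: WITHOUT FUNCTION asserts binary compatibility, which holds in the json-to-text direction only. A sketch, run as a user who may create casts on these types:)

```sql
-- Safe: any valid json value is already valid text, byte for byte,
-- so no conversion function is needed.
CREATE CAST (json AS text) WITHOUT FUNCTION;
SELECT '{"a": 1}'::json::text;

-- The reverse direction cannot use WITHOUT FUNCTION safely: text -> json
-- must reject malformed input, and the system has no notion of a cast
-- that validates without rewriting -- the missing metadata Noah describes.
```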
[HACKERS] Better error message for window-function spec bizarreness
We've had a couple of complaints about the error message that's thrown for the case where you try to copy-and-modify a window definition that includes a frame clause: http://www.postgresql.org/message-id/200911191711.najhbped009...@wwwmaster.postgresql.org http://www.postgresql.org/message-id/CADyrUxP-5pNAqxjuFx9ToeTEhsxwo942PS3Bk_=jekdmvg0...@mail.gmail.com I propose the attached patch, which changes the text of the message to cannot copy window \%s\ because it has a frame clause, and then adds a HINT Omit the parentheses in this OVER clause. if the clause is just OVER (windowname) and doesn't include any attempt to modify the window's properties. I think we should back-patch this into all versions that have window functions (ie, all supported branches). Any objections or better ideas? regards, tom lane diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c index ea90e58..29183c1 100644 *** a/src/backend/parser/parse_clause.c --- b/src/backend/parser/parse_clause.c *** transformWindowDefinitions(ParseState *p *** 1735,1745 /* * Per spec, a windowdef that references a previous one copies the * previous partition clause (and mustn't specify its own). It can ! * specify its own ordering clause. but only if the previous one had * none. It always specifies its own frame clause, and the previous ! * one must not have a frame clause. (Yeah, it's bizarre that each of * these cases works differently, but SQL:2008 says so; see 7.11 ! * window clause syntax rule 10 and general rule 1.) */ if (refwc) { --- 1735,1750 /* * Per spec, a windowdef that references a previous one copies the * previous partition clause (and mustn't specify its own). It can ! * specify its own ordering clause, but only if the previous one had * none. It always specifies its own frame clause, and the previous ! * one must not have a frame clause. Yeah, it's bizarre that each of * these cases works differently, but SQL:2008 says so; see 7.11 ! 
* window clause syntax rule 10 and general rule 1. The frame ! * clause rule is especially bizarre because it makes OVER foo ! * different from OVER (foo), solely in that the latter is required ! * to throw an error if foo has a nondefault frame clause. Well, ours ! * not to reason why, but we do go out of our way to throw a useful ! * error message for such cases. */ if (refwc) { *** transformWindowDefinitions(ParseState *p *** 1778,1788 wc-copiedOrder = false; } if (refwc refwc-frameOptions != FRAMEOPTION_DEFAULTS) ereport(ERROR, (errcode(ERRCODE_WINDOWING_ERROR), ! errmsg(cannot override frame clause of window \%s\, ! windef-refname), parser_errposition(pstate, windef-location))); wc-frameOptions = windef-frameOptions; /* Process frame offset expressions */ wc-startOffset = transformFrameOffset(pstate, wc-frameOptions, --- 1783,1809 wc-copiedOrder = false; } if (refwc refwc-frameOptions != FRAMEOPTION_DEFAULTS) + { + /* + * Use this message if this is a WINDOW clause, or if it's an OVER + * clause that includes ORDER BY or framing clauses. (We already + * rejected PARTITION BY above, so no need to check that.) + */ + if (windef-name || + orderClause || windef-frameOptions != FRAMEOPTION_DEFAULTS) + ereport(ERROR, + (errcode(ERRCODE_WINDOWING_ERROR), + errmsg(cannot copy window \%s\ because it has a frame clause, + windef-refname), + parser_errposition(pstate, windef-location))); + /* Else this clause is just OVER (foo), so say this: */ ereport(ERROR, (errcode(ERRCODE_WINDOWING_ERROR), ! errmsg(cannot copy window \%s\ because it has a frame clause, ! windef-refname), ! errhint(Omit the parentheses in this OVER clause.), parser_errposition(pstate, windef-location))); + } wc-frameOptions = windef-frameOptions; /* Process frame offset expressions */ wc-startOffset = transformFrameOffset(pstate, wc-frameOptions, -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
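(The case this patch improves is the subtle difference between OVER w and OVER (w) when w carries a frame clause; a sketch using a hypothetical table:)

```sql
CREATE TABLE t (x int);

-- Fine: references the named window directly.
SELECT sum(x) OVER w
FROM t
WINDOW w AS (ORDER BY x ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);

-- Error: per SQL:2008, OVER (w) *copies* w, and a window definition
-- with a frame clause may not be copied -- hence the proposed hint to
-- omit the parentheses.
SELECT sum(x) OVER (w)
FROM t
WINDOW w AS (ORDER BY x ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);
```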
Re: Handle LIMIT/OFFSET before select clause (was: [HACKERS] Feature request: optimizer improvement)
Joe Love j...@primoweb.com writes: I'm wondering what type of index would work for this as it is a volatile function. Not knowing how PGs optimizer runs, I'm at a loss as to why this wouldn't be possible or worth doing. It seems to me that all functions in the select part of the statement could be calculated at the end of the query after the results have been gathered, and even after the sorting had been done as long as the column wasn't part of the order by (or perhaps group by). The short answer is that doing so directly contradicts the computational model defined by the SQL standard, and will break applications that rely on the current behavior. Since there's already a clear way to write the query in a way that specifies evaluating the functions after the sort/limit steps (ie, put the order by/limit in a sub-select), IMHO that's what you should do, not lobby to make the optimizer reinterpret what you wrote. regards, tom lane
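(The sub-select workaround Tom refers to can be sketched as follows; the table, function, and predicate are hypothetical stand-ins for the poster's schema:)

```sql
-- Straightforward form: availability() runs for every row matching
-- the WHERE clause, even though only 20 rows survive the LIMIT.
SELECT id, name, availability(id)
FROM products
WHERE name LIKE 'widget%'
ORDER BY name
LIMIT 20;

-- Workaround: sort and limit first in a sub-select, then evaluate the
-- expensive function over just the 20 surviving rows. The LIMIT also
-- keeps the planner from flattening the subquery into the outer query.
SELECT id, name, availability(id)
FROM (SELECT id, name
      FROM products
      WHERE name LIKE 'widget%'
      ORDER BY name
      LIMIT 20) s;
```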
Re: Handle LIMIT/OFFSET before select clause (was: [HACKERS] Feature request: optimizer improvement)
On Tue, Nov 5, 2013 at 11:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Joe Love j...@primoweb.com writes: I'm wondering what type of index would work for this as it is a volatile function. Not knowing how PGs optimizer runs, I'm at a loss as to why this wouldn't be possible or worth doing. It seems to me that all functions in the select part of the statement could be calculated at the end of the query after the results have been gathered, and even after the sorting had been done as long as the column wasn't part of the order by (or perhaps group by). The short answer is that doing so directly contradicts the computational model defined by the SQL standard, and will break applications that rely on the current behavior. Since there's already a clear way to write the query in a way that specifies evaluating the functions after the sort/limit steps (ie, put the order by/limit in a sub-select), IMHO that's what you should do, not lobby to make the optimizer reinterpret what you wrote. +1. I thought more about our earlier discussion on this, and I agree with the point that making the planner push limit over select for this specific case is not a good idea. -- Regards, Atri l'apprenant
Re: [HACKERS] List of binary-compatible data types
Noah, That's all true, but the system has no concept like "this cast validates the data, never changing it". We would first need to add metadata supporting such a concept. On the other hand, "create cast (json as text) without function;" leans only on concepts the system already knows. Yeah, I'm thinking it might be worth coming up with a solution for that specific case. As users upgrade from 9.0 and 9.1 to 9.3, they're going to want to convert their text columns containing JSON to columns of the JSON type, and are going to be surprised how painful that is. Of course, if we get binary JSON in 9.4 (Oleg?), then a binary conversion will be required, so maybe it's a moot point. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] List of binary-compatible data types
On 2013-11-05 11:15:29 -0800, Josh Berkus wrote: Noah, That's all true, but the system has no concept like this cast validates the data, never changing it. We would first need to add metadata supporting such a concept. On the other hand, create cast (json as text) without function; leans only on concepts the system already knows. Yeah, I'm thinking it might be worth coming up with a solution for that specific case. As users upgrade from 9.0 and 9.1 to 9.3, they're going to want to convert their text columns containing JSON to columns of the JSON type, and are going to be surprised how painful that is. There's zap chance of doing anything for 9.3, this would require quite a bit of code in tablecmds.c and that surely isn't going to happen in the backbranches. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] missing RelationCloseSmgr in FreeFakeRelcacheEntry?
On 2013-11-04 13:48:32 +0100, Andres Freund wrote: What about just unowning the smgr entry with if (rel->rd_smgr != NULL) smgrsetowner(NULL, rel->rd_smgr) when closing the fake relcache entry? That shouldn't require any further changes than changing the comment in smgrsetowner() that this isn't something expected to frequently happen. Attached is a patch doing it like that; it required slightly more invasive changes than that. With the patch applied we survive replay of a primary's make check run under valgrind without warnings. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 849559b9f1d0c20ea938d1dd13b37b53ebede856 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Tue, 5 Nov 2013 17:23:09 +0100 Subject: [PATCH] Un-own SMgrRelations in FreeFakeRelcacheEntry(). The previous coding left SMgrRelation->smgr_owner pointing into free'd memory after FreeFakeRelcacheEntry() was executed, which then was accessed and written to when commit messages including smgr invalidations or dropped relations were replayed during crash recovery or physical replication. Due to the memory allocation patterns during recovery this usually didn't cause big problems but was noticed using valgrind. 
--- src/backend/access/transam/xlogutils.c | 3 +++ src/backend/storage/smgr/smgr.c| 41 ++ src/include/storage/smgr.h | 1 + 3 files changed, 41 insertions(+), 4 deletions(-) diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index 5429d5e..94ddd78 100644 --- a/src/backend/access/transam/xlogutils.c +++ b/src/backend/access/transam/xlogutils.c @@ -433,6 +433,9 @@ CreateFakeRelcacheEntry(RelFileNode rnode) void FreeFakeRelcacheEntry(Relation fakerel) { + /* make sure the fakerel is not referenced anymore */ + if (fakerel-rd_smgr != NULL) + smgrunown(fakerel-rd_smgr); pfree(fakerel); } diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c index f7f1437..c9a2a36 100644 --- a/src/backend/storage/smgr/smgr.c +++ b/src/backend/storage/smgr/smgr.c @@ -85,6 +85,7 @@ static SMgrRelation first_unowned_reln = NULL; /* local function prototypes */ static void smgrshutdown(int code, Datum arg); static void remove_from_unowned_list(SMgrRelation reln); +static void add_to_unowned_list(SMgrRelation reln); /* @@ -174,9 +175,7 @@ smgropen(RelFileNode rnode, BackendId backend) for (forknum = 0; forknum = MAX_FORKNUM; forknum++) reln-md_fd[forknum] = NULL; - /* place it at head of unowned list (to make smgrsetowner cheap) */ - reln-next_unowned_reln = first_unowned_reln; - first_unowned_reln = reln; + add_to_unowned_list(reln); } return reln; @@ -191,7 +190,7 @@ smgropen(RelFileNode rnode, BackendId backend) void smgrsetowner(SMgrRelation *owner, SMgrRelation reln) { - /* We don't currently support disowning an SMgrRelation here */ + /* We don't support disowning an SMgrRelation here */ Assert(owner != NULL); /* @@ -214,6 +213,25 @@ smgrsetowner(SMgrRelation *owner, SMgrRelation reln) } /* + * smgrunown() -- Remove long-lived reference to an SMgrRelation object if one + *exists + */ +void +smgrunown(SMgrRelation reln) +{ + if (reln-smgr_owner == NULL) + return; + + /* unset the owner's reference */ + *(reln-smgr_owner) = 
NULL; + + /* unset our reference to the owner*/ + reln-smgr_owner = NULL; + + add_to_unowned_list(reln); +} + +/* * remove_from_unowned_list -- unlink an SMgrRelation from the unowned list * * If the reln is not present in the list, nothing happens. Typically this @@ -246,6 +264,21 @@ remove_from_unowned_list(SMgrRelation reln) } /* + * add_to_unowned_list -- link an SMgrRelation onto the unowned list + * + * Check remove_from_unowned_list()s comments for performance + * considerations. + */ +static void +add_to_unowned_list(SMgrRelation reln) +{ + /* place it at head of unowned list (to make smgrsetowner cheap) */ + reln-next_unowned_reln = first_unowned_reln; + first_unowned_reln = reln; +} + + +/* * smgrexists() -- Does the underlying file for a fork exist? */ bool diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h index 98b6f13..20c834d 100644 --- a/src/include/storage/smgr.h +++ b/src/include/storage/smgr.h @@ -80,6 +80,7 @@ extern void smgrinit(void); extern SMgrRelation smgropen(RelFileNode rnode, BackendId backend); extern bool smgrexists(SMgrRelation reln, ForkNumber forknum); extern void smgrsetowner(SMgrRelation *owner, SMgrRelation reln); +extern void smgrunown(SMgrRelation reln); extern void smgrclose(SMgrRelation reln); extern void smgrcloseall(void); extern void smgrclosenode(RelFileNodeBackend rnode); -- 1.8.4.21.g992c386.dirty -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
[HACKERS] TODO: Split out pg_resetxlog output into pre- and post-sections
This patch implements the following TODO item: Split out pg_resetxlog output into pre- and post-sections http://archives.postgresql.org/pgsql-hackers/2010-08/msg02040.php

On execution of pg_resetxlog using the option -n:
1. It will display values in two sections.
2. The first section will be called "Current pg_control values" or "Guessed pg_control values".
3. The first section will display all current (i.e. before change) values of the control file, or guessed values.
4. The second section will be called "Values to be used after reset".
5. The second section will display the new values of the parameters to be reset as per the user's request.

Please provide your opinion on, or expectations of, this patch. I will add the same to the November commitfest. Thanks and Regards, Kumar Rajeev Rastogi pg_resetxlogsection.patch Description: pg_resetxlogsection.patch
Re: [HACKERS] missing locking in at least INSERT INTO view WITH CHECK
Andres Freund and...@2ndquadrant.com wrote: On 2013-11-02 17:05:24 -0700, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: Also attached is 0004 which just adds a heap_lock() around a newly created temporary table in the matview code which shouldn't be required for correctness but gives warm and fuzzy feelings as well as less debugging noise. Will think about this. I agree it is probably worth doing something to reduce the noise when looking for cases that actually matter. It's pretty much free, so I don't think there really is any reason to deviate from other parts of the code. Note how e.g. copy_heap_data(), DefineRelation() and ATRewriteTable() all lock the new relation, even if it just has been created and is (and in the latter two cases will always be) invisible. None of those locations are using heap_open() as the first parameter to heap_close(). That looks kinda iffy, and the fact that it is not yet done anywhere in the code gives me pause. You probably had a reason for preferring that to a simple call to LockRelationOid(), but I'm not seeing what that reason is. I also don't understand the use of the lockmode variable here. I'm thinking of something like the attached instead. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c index d5a10ad..d5e58ae 100644 --- a/src/backend/commands/matview.c +++ b/src/backend/commands/matview.c @@ -30,6 +30,7 @@ #include miscadmin.h #include parser/parse_relation.h #include rewrite/rewriteHandler.h +#include storage/lmgr.h #include storage/smgr.h #include tcop/tcopprot.h #include utils/builtins.h @@ -243,6 +244,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString, /* Create the transient table that will receive the regenerated data. 
*/ OIDNewHeap = make_new_heap(matviewOid, tableSpace, concurrent, ExclusiveLock); + LockRelationOid(OIDNewHeap, AccessExclusiveLock); dest = CreateTransientRelDestReceiver(OIDNewHeap); /* Generate the data, if wanted. */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] missing locking in at least INSERT INTO view WITH CHECK
On 2013-11-05 11:56:25 -0800, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: On 2013-11-02 17:05:24 -0700, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: Also attached is 0004 which just adds a heap_lock() around a newly created temporary table in the matview code which shouldn't be required for correctness but gives warm and fuzzy feelings as well as less debugging noise. Will think about this. I agree is is probably worth doing something to reduce the noise when looking for cases that actually matter. It's pretty much free, so I don't think there really is any reason to deviate from other parts of the code. Note how e.g. copy_heap_data(), DefineRelation() and ATRewriteTable() all lock the new relation, even if it just has been created and is (and in the latter two cases will always be) invisible. None of those locations are using heap_open() as the first parameter to heap_close(). Oh! Sure, what I'd posted was just an absolutely crude hack that surely isn't committable. I'm thinking of something like the attached instead. Looks fine to me, maybe add a short comment? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] missing locking in at least INSERT INTO view WITH CHECK
On 2013-11-05 12:21:23 -0800, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com Looks fine to me Any thoughts on whether this should be back-patched to 9.3? I can see arguments both ways, and don't have a particularly strong feeling one way or the other. Hehe. I was wondering myself. I'd tentatively say no - unless we also backpatch the debugging patch there doesn't seem to be good reason to, since the likelihood of conflicts due to it doesn't seem very high. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Add cassert-only checks against unlocked use of relations
Hi,

There frequently have been bugs where (heap|relation|index)_open(NoLock) was used without a previous lock, which in some circumstances is an easy mistake to make and which is hard to notice. The attached patch adds --use-cassert only WARNINGs against doing so:

Add cassert-only checks against unlocked use of relations.

Specifically relation_open(), which also covers heap/index_open(), and RelationIdGetRelation() WARN if the relation is opened without being locked. index_open() now checks whether the heap relation the index is covering is locked.

To make those checks possible add StrongestLocalRelationLock() which returns the strongest locally held lock over a relation. It relies on the also added StrongestLocalLock() which searches the local locktable sequentially for matching locks.

After
1) 2a781d57dcd027df32d15ee2378b84d0c4d005d1
2) http://archives.postgresql.org/message-id/1383681385.79520.YahooMailNeo%40web162902.mail.bf1.yahoo.com
3) http://archives.postgresql.org/message-id/CAEZATCVCgboHKu_%2BK0nakrXW-AFEz_18icE0oWEQHsM-OepWhw%40mail.gmail.com
HEAD doesn't generate warnings anymore. I think Kevin will commit something akin to 2), but 3) is an actual open bug, so that patch will need to get applied beforehand.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

From c02e2c00c0dc98ff7d5f63a1d30887b8bbfe0dc3 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Wed, 23 Oct 2013 02:34:51 +0200
Subject: [PATCH 1/4] Add cassert-only checks against unlocked use of relations.

Specifically relation_open(), which also covers heap/index_open(), and RelationIdGetRelation() WARN if the relation is opened without being locked. index_open() now checks whether the heap relation the index is covering is locked.

To make those checks possible add StrongestLocalRelationLock() which returns the strongest locally held lock over a relation. It relies on the also added StrongestLocalLock() which searches the local locktable sequentially for matching locks.
---
 src/backend/access/heap/heapam.c   | 18 ++
 src/backend/access/index/indexam.c | 23 +++
 src/backend/storage/lmgr/lmgr.c    | 23 ++-
 src/backend/storage/lmgr/lock.c    | 35 +++
 src/backend/utils/cache/relcache.c | 18 ++
 src/include/storage/lmgr.h         |  2 ++
 src/include/storage/lock.h         |  2 ++
 7 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a21f31b..3fbadd4 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1038,6 +1038,24 @@ relation_open(Oid relationId, LOCKMODE lockmode)
 	pgstat_initstats(r);
 
+#if USE_ASSERT_CHECKING
+	/*
+	 * Unless locked a relation's definition can change, thus relying
+	 * on it is dangerous in that case.
+	 * We cannot rely on correct locking during bootstrap, so don't
+	 * check for it there.
+	 */
+	if (assert_enabled && IsNormalProcessingMode() && lockmode == NoLock)
+	{
+		LOCKMODE	curmode;
+
+		curmode = StrongestLocalRelationLock(relationId);
+		if (curmode == NoLock)
+			elog(WARNING, "relation_open(%u, NoLock) of relation \"%s\" without previously held lock",
+				 relationId, RelationGetRelationName(r));
+	}
+#endif
+
 	return r;
 }

diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index b878155..c75bee2 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -65,6 +65,8 @@
 #include "postgres.h"
 
+#include "miscadmin.h"
+
 #include "access/relscan.h"
 #include "access/transam.h"
 #include "catalog/index.h"
@@ -142,6 +144,8 @@ static IndexScanDesc index_beginscan_internal(Relation indexRelation,
  *
  */
 
+#include "catalog/catalog.h"
+
 /*
  * index_open - open an index relation by relation OID
  *
@@ -169,6 +173,25 @@ index_open(Oid relationId, LOCKMODE lockmode)
 				 errmsg("\"%s\" is not an index",
 						RelationGetRelationName(r))));
 
+#if USE_ASSERT_CHECKING
+	/*
+	 * Using an index's definition is only safe if the underlying
+	 * relation is already locked.
+	 * We cannot rely on correct locking during bootstrap, so don't
+	 * check for it there.
+	 */
+	if (assert_enabled && IsNormalProcessingMode())
+	{
+		LOCKMODE	mode;
+
+		mode = StrongestLocalRelationLock(r->rd_index->indrelid);
+		if (mode == NoLock)
+			elog(WARNING, "index_open(%u) of index \"%s\" without lock on heap relation %u",
+				 relationId, RelationGetRelationName(r),
+				 r->rd_index->indrelid);
+	}
+#endif
+
 	return r;
 }

diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index a978172..a6cf74a 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
Re: [HACKERS] Window functions can be created with defaults, but they don't work
I wrote:
> I noticed this while poking at the variadic-aggregates issue:

regression=# create function nth_value_def(anyelement, integer = 1) returns anyelement
  language internal window immutable strict as 'window_nth_value';
CREATE FUNCTION
regression=# SELECT nth_value_def(ten) OVER (PARTITION BY four), ten, four
  FROM (SELECT * FROM tenk1 WHERE unique2 < 10 ORDER BY four, ten) s;
The connection to the server was lost. Attempting reset: Failed.

Attached is a proposed patch against HEAD that fixes this by supporting default arguments properly for window functions. In passing, it also allows named-argument notation in window function calls, since that's free once the other thing works (because the same subroutine fixes up both things).

The reason this crashes is that the planner doesn't apply default-insertion to WindowFunc nodes, only to FuncExprs. We could make it do that, probably, but that seems to me like a feature addition. I think a more reasonable approach for back-patching is to fix function creation to disallow attaching defaults to window functions (or aggregates, for that matter, which would have the same issue if CREATE AGGREGATE had the syntax option to specify defaults). ProcedureCreate seems like an appropriate place, since it already contains a lot of sanity checks of this sort.

Having now done the patch to fix it properly, I'm more inclined to think that maybe we should just back-patch this rather than inserting an error check. It seems pretty low-risk. Comments?

			regards, tom lane

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index e3dbc4b..16aade4 100644
*** a/doc/src/sgml/syntax.sgml
--- b/doc/src/sgml/syntax.sgml
*** SELECT concat_lower_or_upper('Hello', 'W
*** 2527,2534 ****
   <note>
    <para>
!    Named and mixed call notations can currently be used only with regular
!    functions, not with aggregate functions or window functions.
    </para>
   </note>
  </sect2>
--- 2527,2535 ----
   <note>
    <para>
!    Named and mixed call notations currently cannot be used when calling an
!    aggregate function (but they do work when an aggregate function is used
!    as a window function).
    </para>
   </note>
  </sect2>
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 76c032c..4fa73a9 100644
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
*** eval_const_expressions_mutator(Node *nod
*** 2251,2256 ****
--- 2251,2306 ----
  				 */
  				return (Node *) copyObject(param);
  			}
+ 		case T_WindowFunc:
+ 			{
+ 				WindowFunc *expr = (WindowFunc *) node;
+ 				Oid			funcid = expr->winfnoid;
+ 				List	   *args;
+ 				Expr	   *aggfilter;
+ 				HeapTuple	func_tuple;
+ 				WindowFunc *newexpr;
+ 
+ 				/*
+ 				 * We can't really simplify a WindowFunc node, but we mustn't
+ 				 * just fall through to the default processing, because we
+ 				 * have to apply expand_function_arguments to its argument
+ 				 * list.  That takes care of inserting default arguments and
+ 				 * expanding named-argument notation.
+ 				 */
+ 				func_tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
+ 				if (!HeapTupleIsValid(func_tuple))
+ 					elog(ERROR, "cache lookup failed for function %u", funcid);
+ 
+ 				args = expand_function_arguments(expr->args, expr->wintype,
+ 												 func_tuple);
+ 
+ 				ReleaseSysCache(func_tuple);
+ 
+ 				/* Now, recursively simplify the args (which are a List) */
+ 				args = (List *)
+ 					expression_tree_mutator((Node *) args,
+ 											eval_const_expressions_mutator,
+ 											(void *) context);
+ 				/* ... and the filter expression, which isn't */
+ 				aggfilter = (Expr *)
+ 					eval_const_expressions_mutator((Node *) expr->aggfilter,
+ 												   context);
+ 
+ 				/* And build the replacement WindowFunc node */
+ 				newexpr = makeNode(WindowFunc);
+ 				newexpr->winfnoid = expr->winfnoid;
+ 				newexpr->wintype = expr->wintype;
+ 				newexpr->wincollid = expr->wincollid;
+ 				newexpr->inputcollid = expr->inputcollid;
+ 				newexpr->args = args;
+ 				newexpr->aggfilter = aggfilter;
+ 				newexpr->winref = expr->winref;
+ 				newexpr->winstar = expr->winstar;
+ 				newexpr->winagg = expr->winagg;
+ 				newexpr->location = expr->location;
+ 
+ 				return (Node *) newexpr;
+ 			}
  		case T_FuncExpr:
  			{
  				FuncExpr   *expr = (FuncExpr *) node;
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 2bd24c8..ede36d1 100644
*** a/src/backend/parser/parse_func.c
--- b/src/backend/parser/parse_func.c
*** ParseFuncOrColumn(ParseState *pstate, Li
*** 537,553 ****
  					 errmsg("window functions cannot return sets"),
  					 parser_errposition(pstate, location)));
  
- 		/*
- 		 * We might want to support this later, but for now reject it because
- 		 * the planner and executor wouldn't cope with NamedArgExprs in a
- 		 * WindowFunc node.
-
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Albe Laurenz laurenz.a...@wien.gv.at
> I looked into the Standard, and it does not have NVARCHAR. The type is called NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING or NCHAR VARYING.

Ouch, that's just a mistake in my mail. You are correct.

> I guess that the goal of this patch is to support Oracle syntax. But anybody trying to port CREATE TABLE statements from Oracle is already exposed to enough incompatibilities that the difference between NVARCHAR and NCHAR VARYING will not be the reason to reject PostgreSQL. In other words, I doubt that introducing the nonstandard NVARCHAR will have more benefits than drawbacks (new reserved word).

Agreed. But I'm in favor of supporting other DBMSs' syntax if it doesn't complicate the spec or implementation too much, because it can help migration to PostgreSQL. I understand PostgreSQL has made such efforts before, like PL/pgSQL which is similar to PL/SQL, the text data type, AS in the SELECT list, etc.

> But I don't think that this requires the first step that your patch implements, it is in fact orthogonal.

(It's not my patch.)

> Regarding the Standard compliant names of these data types, PostgreSQL already supports those. Maybe some documentation would help. I don't think that there is any need to change NCHAR even if we get per-column encoding, it is just syntactic sugar to support SQL Feature F421.

Maybe so. I guess the distinct type for NCHAR is for future extension and user friendliness. As one user, I expect to get "national character" instead of "char character set xxx" as output of psql \d and pg_dump when I specified "national character" in DDL. In addition, that makes it easy to use the pg_dump output for importing data to other DBMSs for some reason.

Regards
MauMau

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add cassert-only checks against unlocked use of relations
Andres Freund and...@2ndquadrant.com writes: There frequently have been bugs where (heap|relation|index)_open(NoLock) was used without a previous locks which in some circumstances is an easy mistake to make and which is hard to notice. The attached patch adds --use-cassert only WARNINGs against doing so: While I agree that there seems to be a problem here, I'm not convinced that this is the solution. The implication of a heap_open(NoLock) is that the programmer believes that some previous action must have taken a lock on the relation; if he's wrong, then the causal link that he thought existed doesn't really. But this patch is not checking for a causal link; it'll be fooled just as easily as the programmer is by a happenstance (that is, unrelated) previous lock on the relation. What's more, it entirely fails to check whether the previous lock is really strong enough for what we're going to do. I also find it unduly expensive to search the whole lock hashtable on every relation open. That's going to be a O(N^2) cost for a transaction touching N relations, and that doesn't sound acceptable, not even for assert-only code. If we're sufficiently worried by this type of bug, ISTM we'd be better off just disallowing heap_open(NoLock). At the time we invented that, every lock request went to shared memory; but now that we have the local lock table, re-locking just requires a local hash lookup followed by incrementing a local counter. That's probably pretty cheap --- certainly a lot cheaper than what you've got here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add cassert-only checks against unlocked use of relations
On 2013-11-05 16:25:53 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: There frequently have been bugs where (heap|relation|index)_open(NoLock) was used without a previous locks which in some circumstances is an easy mistake to make and which is hard to notice. The attached patch adds --use-cassert only WARNINGs against doing so: While I agree that there seems to be a problem here, I'm not convinced that this is the solution. The implication of a heap_open(NoLock) is that the programmer believes that some previous action must have taken a lock on the relation; if he's wrong, then the causal link that he thought existed doesn't really. But this patch is not checking for a causal link; it'll be fooled just as easily as the programmer is by a happenstance (that is, unrelated) previous lock on the relation. What's more, it entirely fails to check whether the previous lock is really strong enough for what we're going to do. Sure. But it already found several bugs as evidenced by the referenced thread, so it seems to be helpful enough. I also find it unduly expensive to search the whole lock hashtable on every relation open. That's going to be a O(N^2) cost for a transaction touching N relations, and that doesn't sound acceptable, not even for assert-only code. We could relatively easily optimize that to a constant factor by just iterating over the possible lockmodes. Should only take a couple more lines. I'd be really, really surprised if it even comes close to the overhead of AtEOXact_Buffers() though. Which is not to say it's a bad idea to make this check more effective though ;) If we're sufficiently worried by this type of bug, ISTM we'd be better off just disallowing heap_open(NoLock). At the time we invented that, every lock request went to shared memory; but now that we have the local lock table, re-locking just requires a local hash lookup followed by incrementing a local counter. 
That's probably pretty cheap --- certainly a lot cheaper than what you've got here. Hm. That only works though if we're using the same lockmode as before - often enough the *_open(NoLock) checks would use a weaker locklevel than the previous lock. So I think the cost of doing so would probably be noticeable. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add cassert-only checks against unlocked use of relations
Andres Freund and...@2ndquadrant.com writes: On 2013-11-05 16:25:53 -0500, Tom Lane wrote: If we're sufficiently worried by this type of bug, ISTM we'd be better off just disallowing heap_open(NoLock). At the time we invented that, every lock request went to shared memory; but now that we have the local lock table, re-locking just requires a local hash lookup followed by incrementing a local counter. That's probably pretty cheap --- certainly a lot cheaper than what you've got here. Hm. That only works though if we're using the same lockmode as before - often enough the *_open(NoLock) checks would use a weaker locklevel than the previous lock. So I think the cost of doing so would probably be noticeable. If you're not using the same lockmode as before, it's probably a bug anyway. As I said already, the entire NoLock coding technique is dependent on having a very clear idea of which previous lock-taking you're riding on the coattails of. Why wouldn't you duplicate that lock level, if we say you can't use NoLock anymore? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] missing locking in at least INSERT INTO view WITH CHECK
Andres Freund and...@2ndquadrant.com wrote: On 2013-11-05 12:21:23 -0800, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com Looks fine to me Any thoughts on whether this should be back-patched to 9.3? I can see arguments both ways, and don't have a particularly strong feeling one way or the other. Hehe. I was wondering myself. I'd tentatively say no - unless we also backpatch the debugging patch there doesn't seem to be good reason to since the likelihood of conficts due to it doesn't seem very high. Works for me. Done. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add cassert-only checks against unlocked use of relations
On 2013-11-05 16:45:49 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-11-05 16:25:53 -0500, Tom Lane wrote: If we're sufficiently worried by this type of bug, ISTM we'd be better off just disallowing heap_open(NoLock). At the time we invented that, every lock request went to shared memory; but now that we have the local lock table, re-locking just requires a local hash lookup followed by incrementing a local counter. That's probably pretty cheap --- certainly a lot cheaper than what you've got here. Hm. That only works though if we're using the same lockmode as before - often enough the *_open(NoLock) checks would use a weaker locklevel than the previous lock. So I think the cost of doing so would probably be noticeable. If you're not using the same lockmode as before, it's probably a bug anyway. As I said already, the entire NoLock coding technique is dependent on having a very clear idea of which previous lock-taking you're riding on the coattails of. Why wouldn't you duplicate that lock level, if we say you can't use NoLock anymore? Hm. That doesn't really seem to match my reading of the code. There's quite some code around that relies the fact that the parser has taken an appropriately strong lock already. E.g. the parser chooses the lockmode in addRangeTableEntry() - I don't think we want to add duplicate isLockedRefname() calls everywhere. Other users of that relation will likely just use AccessShareLock since that's sufficient for their own use. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add cassert-only checks against unlocked use of relations
On 2013-11-05 22:35:41 +0100, Andres Freund wrote:
> We could relatively easily optimize that to a constant factor by just iterating over the possible lockmodes. Should only take a couple more lines.

On that note, any chance you remember why you increased MAX_LOCKMODE by 2 to 10 back in 2001, although AccessExclusiveLock is 8? The relevant commit is 4fe42dfbc3bafa0ea615239d716a6b37d67da253.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
On 11/5/13, 1:04 AM, Arulappan, Arul Shaji wrote: Implements NCHAR/NVARCHAR as distinct data types, not as synonyms If, per SQL standard, NCHAR(x) is equivalent to CHAR(x) CHARACTER SET cs, then for some cs, NCHAR(x) must be the same as CHAR(x). Therefore, an implementation as separate data types is wrong. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add cassert-only checks against unlocked use of relations
Andres Freund and...@2ndquadrant.com writes: On that note, any chance you remember why you increased MAX_LOCKMODE by 2 to 10 back in 2001 although AccessExclusiveLock is 8? The relevant commit is 4fe42dfbc3bafa0ea615239d716a6b37d67da253 . Probably because it seemed like a round number, which 9 wasn't ... keep in mind that this data structure is nominally intended to support other lock semantics besides what LockConflicts[] defines. (BTW, it's a conceptual error to imagine that the numerical values of the lock mode codes define a strict strength ordering, which is another reason I don't care for your patch.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Window functions can be created with defaults, but they don't work
I wrote:
> Attached is a proposed patch against HEAD that fixes this by supporting default arguments properly for window functions. In passing, it also allows named-argument notation in window function calls, since that's free once the other thing works (because the same subroutine fixes up both things).

Hm, well, almost free --- turns out ruleutils.c was assuming WindowFuncs couldn't contain named arguments. Updated patch attached.

			regards, tom lane

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index e3dbc4b..16aade4 100644
*** a/doc/src/sgml/syntax.sgml
--- b/doc/src/sgml/syntax.sgml
*** SELECT concat_lower_or_upper('Hello', 'W
*** 2527,2534 ****
   <note>
    <para>
!    Named and mixed call notations can currently be used only with regular
!    functions, not with aggregate functions or window functions.
    </para>
   </note>
  </sect2>
--- 2527,2535 ----
   <note>
    <para>
!    Named and mixed call notations currently cannot be used when calling an
!    aggregate function (but they do work when an aggregate function is used
!    as a window function).
    </para>
   </note>
  </sect2>
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 76c032c..4fa73a9 100644
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
*** eval_const_expressions_mutator(Node *nod
*** 2251,2256 ****
--- 2251,2306 ----
  				 */
  				return (Node *) copyObject(param);
  			}
+ 		case T_WindowFunc:
+ 			{
+ 				WindowFunc *expr = (WindowFunc *) node;
+ 				Oid			funcid = expr->winfnoid;
+ 				List	   *args;
+ 				Expr	   *aggfilter;
+ 				HeapTuple	func_tuple;
+ 				WindowFunc *newexpr;
+ 
+ 				/*
+ 				 * We can't really simplify a WindowFunc node, but we mustn't
+ 				 * just fall through to the default processing, because we
+ 				 * have to apply expand_function_arguments to its argument
+ 				 * list.  That takes care of inserting default arguments and
+ 				 * expanding named-argument notation.
+ 				 */
+ 				func_tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
+ 				if (!HeapTupleIsValid(func_tuple))
+ 					elog(ERROR, "cache lookup failed for function %u", funcid);
+ 
+ 				args = expand_function_arguments(expr->args, expr->wintype,
+ 												 func_tuple);
+ 
+ 				ReleaseSysCache(func_tuple);
+ 
+ 				/* Now, recursively simplify the args (which are a List) */
+ 				args = (List *)
+ 					expression_tree_mutator((Node *) args,
+ 											eval_const_expressions_mutator,
+ 											(void *) context);
+ 				/* ... and the filter expression, which isn't */
+ 				aggfilter = (Expr *)
+ 					eval_const_expressions_mutator((Node *) expr->aggfilter,
+ 												   context);
+ 
+ 				/* And build the replacement WindowFunc node */
+ 				newexpr = makeNode(WindowFunc);
+ 				newexpr->winfnoid = expr->winfnoid;
+ 				newexpr->wintype = expr->wintype;
+ 				newexpr->wincollid = expr->wincollid;
+ 				newexpr->inputcollid = expr->inputcollid;
+ 				newexpr->args = args;
+ 				newexpr->aggfilter = aggfilter;
+ 				newexpr->winref = expr->winref;
+ 				newexpr->winstar = expr->winstar;
+ 				newexpr->winagg = expr->winagg;
+ 				newexpr->location = expr->location;
+ 
+ 				return (Node *) newexpr;
+ 			}
  		case T_FuncExpr:
  			{
  				FuncExpr   *expr = (FuncExpr *) node;
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 2bd24c8..ede36d1 100644
*** a/src/backend/parser/parse_func.c
--- b/src/backend/parser/parse_func.c
*** ParseFuncOrColumn(ParseState *pstate, Li
*** 537,553 ****
  					 errmsg("window functions cannot return sets"),
  					 parser_errposition(pstate, location)));
  
- 		/*
- 		 * We might want to support this later, but for now reject it because
- 		 * the planner and executor wouldn't cope with NamedArgExprs in a
- 		 * WindowFunc node.
- 		 */
- 		if (argnames != NIL)
- 			ereport(ERROR,
- 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- 					 errmsg("window functions cannot use named arguments"),
- 					 parser_errposition(pstate, location)));
- 
  		/* parse_agg.c does additional window-func-specific processing */
  		transformWindowFuncCall(pstate, wfunc, over);
--- 537,542 ----
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 04b1c4f..915fb7a 100644
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
*** get_windowfunc_expr(WindowFunc *wfunc, d
*** 7461,7466 ****
--- 7461,7467 ----
  	StringInfo	buf = context->buf;
  	Oid			argtypes[FUNC_MAX_ARGS];
  	int			nargs;
+ 	List	   *argnames;
  	ListCell   *l;
 
  	if (list_length(wfunc->args) > FUNC_MAX_ARGS)
*** get_windowfunc_expr(WindowFunc *wfunc, d
*** 7468,7485 ****
  				(errcode(ERRCODE_TOO_MANY_ARGUMENTS),
  				 errmsg("too many arguments")));
  	nargs = 0;
  	foreach(l, wfunc->args)
  	{
  		Node	   *arg = (Node *) lfirst(l);
! 		Assert(!IsA(arg,
Re: [HACKERS] Add cassert-only checks against unlocked use of relations
On 2013-11-05 17:19:21 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On that note, any chance you remember why you increased MAX_LOCKMODE by 2 to 10 back in 2001 although AccessExclusiveLock is 8? The relevant commit is 4fe42dfbc3bafa0ea615239d716a6b37d67da253 . Probably because it seemed like a round number, which 9 wasn't ... keep in mind that this data structure is nominally intended to support other lock semantics besides what LockConflicts[] defines. Hm. Given that there are Assert()s for MAX_LOCKMODE around, that adding a new method isn't possible without editing lock.c and that we use it to in shared memory structures I am not sure I see the point of that slop. Anyway, it was just a point of curiosity. (BTW, it's a conceptual error to imagine that the numerical values of the lock mode codes define a strict strength ordering, which is another reason I don't care for your patch.) Well, while I don't thing it has too much practical bearing, we could just check for *any* lock already held instead, that's all we need for the added checks. I thought it might be more useful to get the strongest lock rather than any lock for other potential checks, but if that's a contentious point... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql: small patch to correct filename formatting error in '\s FILE' output
Ian Lawrence Barwick barw...@gmail.com writes:
> 2013/9/10 Bruce Momjian br...@momjian.us:
>> I still see that weird behavior in git head:
>>
>> pgdevel=# \s history.txt
>> Wrote history to file ./history.txt.
>> pgdevel=# \s /tmp/history.txt
>> Wrote history to file .//tmp/history.txt.
>> pgdevel=# \cd /tmp
>> pgdevel=# \s /tmp/history.txt
>> Wrote history to file /tmp//tmp/history.txt.
>>
>> Should I revert the suggested patch?
>
> IIRC the patch was never applied, the reversion candidate is the existing commit 0725065b.

I reverted 0725065b. AFAICT there is no interest in making \s produce a reliable full path name. There was some interest in making \cd tell you where it'd chdir'd to, but that would be a separate patch.

			regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] configure: allow adding a custom string to PG_VERSION
On Tue, Nov 5, 2013 at 4:29 PM, Oskari Saarenmaa o...@ohmu.fi wrote: This can be used to tag custom built packages with an extra version string such as the git describe id or distribution package release version. Could you attach a proper patch to your email and register it to the next commit fest? Typically here: https://commitfest.postgresql.org/action/commitfest_view?id=20 This idea is more adaptable than your latest proposition, so some other people might be interested to review it. At least I like this idea. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
On Tue, Nov 5, 2013 at 5:30 PM, Claudio Freire klaussfre...@gmail.com wrote:
> Maybe there's value in minmax indexes for sequential data, but not for random data, which is the topic of this thread.

Well, of course, they're not magic pixie dust. But is your data really random? (or normal?)

I think minmax indexes are more akin to bitmap indexes. They will be very effective for columns with low cardinality, especially for columns that are very clustered. In the extreme, if all the values in some regions of the table are the same, then minmax indexes would be optimal. I wouldn't expect them to be very effective for a highly selective column that isn't well clustered.

It really sounds like you're describing a particular workload that btrees could just be more optimized for. Buffering all inserts in memory and merging them into the btree lazily is actually something Heikki has proposed in the past. I'm not clear if that gets you all the benefits of the indexes you described or not, but it seems to target the particular problem you're having.

-- 
greg
Re: [HACKERS] Fast insertion indexes: why no developments
Greg Stark escribió:
> I think minmax indexes are more akin to bitmap indexes. They will be very effective for columns with low-cardinality, especially for columns that are very clustered. In the extreme if all the values in some regions of the table are the same then minmax indexes would be optimal. I wouldn't expect them to be very effective for a highly selective column that isn't well clustered.

I think clustering is more important than cardinality. I mean, you can have a very useful minmax index on a float column on which maybe there are no two identical values.

I certainly don't think minmax indexes will solve all indexing problems. In the end, they're just one more tool in your DBA toolbox.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] List of binary-compatible data types
Andres, There's zap chance of doing anything for 9.3, this would require quite a bit of code in tablecmds.c and that surely isn't going to happen in the backbranches. Oh, sure, I was thinking of a workaround. Actually, being able to separate need to check contents from need to rewrite values could be useful for a lot of type conversions. I'd also love some way of doing a no-rewrite conversion between timestamp and timestamptz, based on the assumption that the original values are UTC time. That's one I encounter a lot. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git diff --check whitespace checks, gitattributes
Peter Eisentraut pete...@gmx.net writes: Attached is a patch that - Adds a .gitattributes file to configure appropriate whitespace checks for git diff --check. - Cleans up all whitespace errors found in this way in existing code. Most of that is in files not covered by pgindent, some in new code since the last pgindent. This makes the entire tree git diff --check clean. After this, future patches can be inspected for whitespace errors with git diff --check, something that has been discussed on occasion. I always (well, almost always) do git diff --check, so making it stronger sounds good to me. But it sounds like this still leaves it to the committer to remember to run it. Can we do anything about that? One open question is whether psql output pasted into documentation, in particular .sgml files, should preserve the trailing whitespace that psql produces. This is currently done inconsistently. My preference is to trim the trailing whitespace, because otherwise it's impossible to check for trailing whitespace errors in other parts of those files. Maybe we should think about fixing psql to not generate that whitespace. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
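For reference, a .gitattributes along these lines is the kind of content such a patch would carry (illustrative only -- the file Peter actually proposed may differ in its pattern list):

```
*               whitespace=space-before-tab,trailing-space
*.sgml          whitespace=space-before-tab,trailing-space
*.patch         -whitespace
```

With attributes like these in place, `git diff --check` flags trailing whitespace and space-before-tab in source files while exempting files (such as patches) where stray whitespace is expected.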
Re: [HACKERS] git diff --check whitespace checks, gitattributes
Peter Eisentraut wrote: This makes the entire tree git diff --check clean. After this, future patches can be inspected for whitespace errors with git diff --check, something that has been discussed on occasion. +1 to this, and also +1 to Tom's suggestion of making it more strongly enforced. One open question is whether psql output pasted into documentation, in particular .sgml files, should preserve the trailing whitespace that psql produces. This is currently done inconsistently. I think pasting psql output verbatim isn't such a great idea anyway. Maybe it's okay for certain specific examples, but in most cases I think it'd be better to produce native SGML tables instead of programoutput stuff. After all, it's the result set that's interesting, not the way psql presents it. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ERROR during end-of-xact/FATAL
On Thu, Oct 31, 2013 at 8:22 PM, Noah Misch n...@leadboat.com wrote: CommitTransaction() and AbortTransaction() both do much work, and large portions of that work either should not or must not throw errors. An error during either function will, as usual, siglongjmp() out. Ordinarily, PostgresMain() will regain control and fire off a fresh AbortTransaction(). The consequences thereof depend on the original function's progress:

- Before the function updates CurrentTransactionState->state, an ERROR is fully acceptable. CommitTransaction() specifically places failure-prone tasks accordingly; AbortTransaction() has no analogous tasks.
- After the function updates CurrentTransactionState->state, an ERROR yields a user-unfriendly message, e.g. WARNING: AbortTransaction while in COMMIT state. This is not itself harmful, but we've largely kept the things that can fail for pedestrian reasons ahead of that point.
- After CommitTransaction() calls RecordTransactionCommit() for an xid-bearing transaction, an ERROR upgrades to e.g. PANIC: cannot abort transaction 805, it was already committed.
- After AbortTransaction() calls ProcArrayEndTransaction() for an xid-bearing transaction, an ERROR will lead to this assertion failure:

TRAP: FailedAssertion(!(((allPgXact[proc->pgprocno].xid) != ((TransactionId) 0))), File: procarray.c, Line: 396)

If the original AbortTransaction() pertained to a FATAL, the situation is worse. errfinish() promotes the ERROR thrown from AbortTransaction() to another FATAL,

(Isn't it errstart() that promotes ERROR to FATAL?)

so we reenter proc_exit(). Thanks to the following logic in shmem_exit(),

(The following comment is actually in proc_exit_prepare(), but the effect will be the same as you described due to the logic in shmem_exit().)

we will never return to AbortTransaction():

/*
 * call all the registered callbacks.
 *
 * Note that since we decrement on_proc_exit_index each time, if a
 * callback calls ereport(ERROR) or ereport(FATAL) then it won't be
 * invoked again when control comes back here (nor will the
 * previously-completed callbacks). So, an infinite loop should not be
 * possible.
 */

As a result, we miss any cleanups that had not yet happened in the original AbortTransaction(). In particular, this can leak heavyweight locks. An asserts build subsequently fails this way:

TRAP: FailedAssertion(!(SHMQueueEmpty(&(MyProc->myProcLocks[i]))), File: proc.c, Line: 788)

In a production build, the affected PGPROC slot just continues to hold the lock until the next backend claiming that slot calls LockReleaseAll(). Oops. Bottom line: most bets are off given an ERROR after RecordTransactionCommit() in CommitTransaction() or anywhere in AbortTransaction().

When I tried the above scenario, I hit an Assert at a different place: shmem_exit() -> pgstat_beshutdown_hook() -> pgstat_report_stat()

pgstat_report_stat(bool force)
{
..
    Assert(entry->trans == NULL);
}

The reason is that in AbortTransaction(), an ERROR is thrown before the call to AtEOXact_PgStat(false). This means that when an ERROR occurs in an AbortTransaction() that was itself called as a result of a FATAL error, there are many more possible assertion failures.

Now, while those assertion failures are worth preventing on general principle, the actual field importance depends on whether things actually do fail in the vulnerable end-of-xact work. We've prevented the errors that would otherwise be relatively common, but there are several rare ways to get a late ERROR. Incomplete list:

- If smgrDoPendingDeletes() finds files to delete, mdunlink() and its callee relpathbackend() call palloc(); this is true in all supported branches. In 9.3, due to commit 279628a0, smgrDoPendingDeletes() itself calls palloc(). (In fact, it does so even when the pending list is empty -- this is the only palloc() during a trivial transaction commit.) A palloc() failure there yields a PANIC during commit.
- ResourceOwnerRelease() calls FileClose() during abort, and FileClose() raises an ERROR when close() returns EIO.
- AtEOXact_Inval() can lead to calls like RelationReloadIndexInfo(), which has many ways to throw errors. This precedes releasing heavyweight locks, so an error here during an abort pertaining to a FATAL exit orphans locks as described above. This relates to another recent thread: http://www.postgresql.org/message-id/20130805170931.ga369...@tornado.leadboat.com

What should we do to mitigate these problems? Certainly we can harden individual end-of-xact tasks to not throw errors, as we have in the past. What higher-level strategies should we consider? What about the unclean result of the FATAL-then-ERROR scenario in particular?

About the unclean FATAL-then-ERROR scenario, one way to deal with it at a high level could be to treat such a case as a backend crash, in which case the postmaster reinitialises shared memory and
Re: [HACKERS] [v9.4] row level security
On 11/05/2013 09:36 PM, Robert Haas wrote: I haven't studied this patch in detail, but I see why there's some unhappiness about that code: it's an RLS-specific kluge. Just shooting from the hip here, maybe we should attack the problem of making security-barrier views updatable first, as a separate patch.

That's the approach I've been considering. There are a few wrinkles with it, though:

(a) Updatable views are implemented in the rewriter, not the planner. The rewriter is not re-run when plans are invalidated or when the session authorization changes, etc. This means that we can't simply omit the RLS predicate for the superuser because the same rewritten parse tree might get used for both superuser and non-superuser queries. Options:

* Store the before-rewrite parse tree when RLS is discovered on one of the rels in the tree. Re-run the rewriter when re-planning. Ensure a change in current user always invalidates plans.
* Declare that it's not legal to run a query originally parsed and rewritten as superuser as a non-superuser or vice versa. This would cause a great deal of pain with PL/PgSQL.
* Always add the RLS predicate and solve the problem of reliably short-circuiting the user-supplied predicate for RLS-exempt users. We'd need a way to allow direct (not query-based) COPY TO for tables with RLS applied, preventing the rewriting of direct table access into subqueries for COPY, but since we never save plans for COPY that may be fine.
* ... ?

(b) Inheritance is a problem when RLS is done in the rewriter. As I understood it from Kohei KaiGai's description to me earlier, there was a strong preference on -hackers to enforce RLS predicates for child and parent tables completely independently. That's how RLS currently works, but it might be hard to get the same effect when applying RLS in the rewriter. We'd need to solve that, or redefine RLS's behaviour so that the predicate on a parent table applies to any child tables too. Personally I'd prefer the latter.
(c) RLS might interact differently with rules declared on tables if implemented in the rewriter, so some investigation into that would be needed.

I would think that if we make that work, this will also work without, hopefully, any special hackery. And we'd get a separate, independently useful feature out of it, too.

I tend to agree. I'm just a bit concerned about dealing with the issues around RLS-exempt operations and users.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Row-security writer-side checks proposal
On 11/05/2013 09:30 PM, Robert Haas wrote: So really, there are four cases:

READ
WRITE INSERT
WRITE UPDATE
WRITE DELETE

Isn't READ similarly divisible into READ SELECT, READ UPDATE, and READ DELETE?

Not in my opinion. No matter what the command, the read side is all about having some way to obtain the contents of the tuple. Separate READ DELETE etc would only be interesting if we wanted to let someone DELETE rows they cannot SELECT. Since we have DELETE ... RETURNING, and since users can write a predicate function for DELETE that leaks the information even if we didn't, in practice if you give the user any READ right you've given them all of them. So I don't think we can support that (except maybe by column RLS down the track).

By contrast, on the write side it seems routine to need different rules for different operations. The traditional hierarchical mandatory access model as implemented by Teradata Row Level Security, Oracle Label Based Security, etc, can have different rules for each operation. Here's a synopsis of the example described in the Teradata docs as a typical policy:

- SELECT: Current session security label must be >= the row label
- INSERT: Current session security label must be >= the new row label
- UPDATE: Session label must be >= row label to update the row. New row label must be <= the session security label.
- DELETE: Only permitted for rows with the lowest label set, ensuring a row is reviewed and declassified before deletion.

Except for the DELETE, this is actually just two policies, one for reads (session label >= row label) and one for writes (session label >= new row label). So this might be an acceptable constraint if necessary, but it'd be really good to support per-command rules, and we certainly need asymmetric read- and write- rules. I'm looking into use cases and existing examples to put this in a more concrete context, as much of the RLS discussion has been a hypothetical one that doesn't look at concrete user problems much.
Similarly, saying you can update but not delete seems quite reasonable to me. If you simply want to allow UPDATE but not DELETE, you can refrain from granting the table-level privilege. The situation in which you need things separate is when you want to allow both UPDATE and DELETE but with different RLS quals for each.

That's what I was getting at, yes. Like the example above; the set of rows you can update might be different to the set of rows you can delete. On the other hand, we might choose to say if you want to do things with that granularity use your own triggers to enforce it and provide only READ and WRITE for RLS.

The funny thing about this whole feature is that it's just syntax support for doing things that you can already do in other ways. If you want read-side security, create a security_barrier view and select from that instead of hitting the table directly. If you want write-side security, enforce it using triggers.

Right now you can't have both together, though; an UPDATE on the raw table can observe rows that wouldn't be visible via the view and can send them to the client via RAISE NOTICE or whatever. Support for automatically updatable security barrier views would take care of this issue, at which point I'd agree: RLS becomes mostly cosmetic syntactical sugar over existing capabilities. FKs would ignore RLS, much like they would if you use explicit SECURITY BARRIER views and have FKs between the base tables.

One big difference still remains though: when you add an RLS policy on a table, all procedures and views referring to that table automatically use the transparent security barrier view over the table instead. That's *not* the case when you use views manually; you have to re-create views that point to the table so they instead point to a security barrier view over the table. Again it's nothing you can't do with updatable security barrier views, but it's automatic and transparent with RLS. Now maybe that's fine.
But given that, I think it's pretty important that we get the syntax right. Because if you're adding a feature primarily to add a more convenient syntax, then the syntax had better actually be convenient. I completely agree with that. I don't have a strong opinion on the current syntax. I've looked at how some other vendors do it, and I can't say their approaches are pretty. Oracle VPD has you create a PL/SQL procedure to generate the SQL text of the desired RLS predicate. It then wants you to create a POLICY (via a PL/SQL call to a built-in package) that associates the predicate generation function with a table and sets some options controlling what statement types it applies to, when the predicate is re-generated, etc. It also has policy groups, which bundle policies together and control whether or not they're applied for a given session. The predicates of different policies are ANDed together. So to create a single RLS policy on a single table you have to write a
Re: [HACKERS] Row-security writer-side checks proposal
On 11/05/2013 10:01 PM, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: Now maybe that's fine. But given that, I think it's pretty important that we get the syntax right. Because if you're adding a feature primarily to add a more convenient syntax, then the syntax had better actually be convenient.

I agree that we want to get the syntax correct, but also very clear as it's security related and we don't want anyone surprised by what happens when they use it. The idea, as has been discussed in the past, is to then allow tying RLS in with SELinux and provide MAC.

That was my impression also. To help get closer to that point, since you were involved in the work on auto-updatable views: any hints on what might be needed to tackle making security barrier views updatable?

There's a fun little wrinkle with MAC, by the way: functional indexes. We can't allow the creation of a functional index, even by the table owner, if it uses any non-LEAKPROOF operators and functions. Otherwise the user can write a function to leak the rows, then create an index using that function. That's not a problem for the current phase of RLS because the table owner is allowed to remove the RLS constraint directly. They can also add triggers that might leak rows via CASCADEs, etc. When MAC comes into the picture we'll need to impose limits on triggers and functional indexes added to tables.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services