Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
> Date: Thu, 22 Mar 2012 13:17:01 -0400
> Subject: Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
> From: cbbro...@gmail.com
> To: kevin.gritt...@wicourts.gov
> CC: pgsql-hackers@postgresql.org
>
> On Thu, Mar 22, 2012 at 12:38 PM, Kevin Grittner wrote:
> > Tom Lane wrote:
> >> Robert Haas writes:
> >>> Well, the standard syntax apparently aims to reduce the number of returned rows, which ORDER BY does not. Maybe you could do it with ORDER BY .. LIMIT, but the idea here I think is that we'd like to sample the table without reading all of it first, so that seems to miss the point.
> >>
> >> I think actually the traditional locution is more like WHERE random() < constant, where the constant is the fraction of the table you want. And yeah, the presumption is that you'd like it to not actually read every row. (Though unless the sampling density is quite a bit less than 1 row per page, it's not clear how much you're really going to win.)
> >
> > It's all going to depend on the use cases, which I don't think I've heard described very well yet.
> >
> > I've had to pick random rows from, for example, a table of disbursements to support a financial audit. In those cases it has been the sample size that mattered, and order didn't. One interesting twist there is that for some of these financial audits they wanted the probability of a row being selected to be proportional to the dollar amount of the disbursement. I don't think you can do this without a first pass across the whole data set.
>
> This one was commonly called "Dollar Unit Sampling," though the terminology has gradually gotten internationalized.
> http://www.dummies.com/how-to/content/how-does-monetary-unit-sampling-work.html
>
> What the article doesn't mention is that some particularly large items might wind up covering multiple samples. In the example, they're looking for a sample every $3125 down the list. If there was a single transaction valued at $3, that (roughly) covers 10 of the desired samples.
>
> It isn't possible to do this without scanning across the entire table.
>
> If you want repeatability, you probably want to instantiate a copy of enough information to indicate the ordering chosen. That's probably something that needs to be captured as part of the work of the audit, so not only does it need to involve a pass across the data, it probably requires capturing a fair bit of data for posterity.
> --
> When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"

The discussion so far has gone far beyond my understanding. Could anyone briefly explain the current idea? The design details are still unfamiliar to me; I can only take the time to read the relevant material after being selected and putting time into it. For now, I'm still curious why Neil's implementation no longer works. Postgres has been patched a lot since then, but the general idea behind Neil's implementation should still hold, shouldn't it? Besides, whether this query feature is needed at all is still undecided, which seems to be another difficult point. Is this topic simply not ready for GSoC yet? If so, I think I still have time to switch to another topic. Any suggestions? Thanks.

Best Regards and Thanks,
Huang Qi Victor
Computer Science, National University of Singapore
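The fixed-interval "Dollar Unit Sampling" Chris describes can be sketched in a few lines. This is only an illustration of the idea from the thread, not code from any patch; the function name, the `interval` value, and the row layout are all invented here:

```python
def dollar_unit_sample(rows, interval):
    """Pick rows with probability proportional to their dollar amount:
    walk a running total and take a sample at every `interval` dollars.
    A single large row can cover several sampling points, so it may be
    selected more than once -- the point Chris makes above."""
    samples = []
    next_point = interval
    running = 0.0
    for row_id, amount in rows:
        running += amount
        # Take every sampling point this row's amount sweeps past.
        while running >= next_point:
            samples.append(row_id)
            next_point += interval
    return samples

# A single $31,250 transaction against a $3,125 interval covers
# ten sampling points, so it is selected ten times.
print(dollar_unit_sample([("big", 31250.0)], 3125.0))
```

Note that, as the thread observes, this requires one full pass over the running dollar total, so it cannot avoid scanning the whole table.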
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
I saw Kyotaro already answered, but I'll give my view as well.

On Thu, Mar 22, 2012 at 06:07:16PM -0400, Tom Lane wrote:
> AFAICT it breaks async processing entirely, because many changes have been made that fail to distinguish "insufficient data available as yet" from "hard error". As an example, this code at the beginning of getAnotherTuple:
>
>     /* Get the field count and make sure it's what we expect */
>     if (pqGetInt(&tupnfields, 2, conn))
>   !     return EOF;
>
> is considering three cases: it got a 2-byte integer (and can continue on), or there aren't yet 2 more bytes available in the buffer, in which case it should return EOF without doing anything, or pqGetInt detected a hard error and updated the connection error state accordingly, in which case again there is nothing to do except return EOF. In the patched code we have:
>
>     /* Get the field count and make sure it's what we expect */
>     if (pqGetInt(&tupnfields, 2, conn))
>   !     {
>   !         /* Whole the message must be loaded on the buffer here */
>   !         errmsg = libpq_gettext("protocol error\n");
>   !         goto error_and_forward;
>   !     }
>
> which handles neither the second nor third case correctly: it thinks that "data not here yet" is a hard error, and then makes sure it is an error by destroying the parsing state :-(. And if in fact pqGetInt did log an error, that possibly-useful error message is overwritten with an entirely useless "protocol error" text.

No, "protocol error" really is the only error case here:
- pqGetInt() does not set errors.
- The V3 getAnotherTuple() is called only once the packet is fully in the buffer.

> I don't think the error return cases for the row processor have been thought out too well either. The row processor is not in charge of what happens to the PGresult, and it certainly has no business telling libpq to just "exit immediately from the topmost libpq function". If we do that we'll probably lose sync with the data stream and be unable to recover use of the connection at all. Also, do we need to consider any error cases for the row processor other than out-of-memory?

No, the rule is *not* "exit to topmost", but "exit PQisBusy()". This is exactly so that any code that does not expect row-processor behaviour continues to work. From the programmer's point of view, it also means the row-processor callback causes minimal changes to existing APIs.

> If so it might be a good idea for it to have some ability to store a custom error message into the PGconn, which it cannot do given the current function API.

There already was such a function, but it was a row-processor-specific hack that could leak out and create confusion, so I rejected it. Instead there should be a generic error-setting function, equivalent to the current libpq-internal error setting. But such a generic error-setting function would require reviewing all libpq error states, as it allows an error state to appear in new situations. We would also need well-defined behaviour for client-side errors vs. incoming server errors. Considering that even the current cut-down patch is troubling committers, I would definitely suggest postponing such a generic error setter to 9.3, especially as it does not change anything coding-style-wise.

> In the same vein, I am fairly uncomfortable with the blithe assertion that a row processor can safely longjmp out of libpq. This was never foreseen in the original library coding and there are any number of places that that might break, now or in the future. Do we really need to allow it? If we do, it would be a good idea to decorate the libpq functions that are now expected to possibly longjmp with comments saying so. Tracing all the potential call paths that might be aborted by a longjmp is an essential activity anyway.

I think we *should* allow exceptions, but only from a limited set of APIs. Basically, the usefulness to users vs. the effort from our side is clearly on the side of providing it. But it is up to us to define what *limited* means (what needs the least effort from us), so that later, when users want to use exceptions in a callback, they pick the right API. Currently it seems only PQexec() with multiple SELECTs can give trouble, as the previous PGresult is kept on the stack. Should we refuse to support PQexec, or multiple SELECTs? But such a case is broken even without exceptions - or at least very confusing. I'm not sure what to do about it.

In any case, "decorating" libpq functions is the wrong approach. It suggests that the caller of e.g. PQexec() needs to take care of any possible behaviour of an unknown callback. This will not work. Instead, the allowed functions should simply be listed in the row-processor documentation. Basically, a custom callback should always be matched by a caller that knows about it and knows how to handle it. I'm not sure how to put such a suggestion into the documentation, though.

> Another design deficiency is PQskipResult(). This is badly designed for
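The three-way outcome Tom describes for pqGetInt (success / not-enough-data-yet / hard error) is easiest to see outside of C. The following Python sketch is purely illustrative - the class and names are invented here, not libpq's API - and shows why "return EOF without touching state" matters for async parsing: the partial read must leave the buffer position intact so the caller can simply retry once more bytes arrive.

```python
# Hypothetical illustration of the contract Tom describes: a reader must
# distinguish "not enough bytes yet" from a hard error, and must consume
# nothing and destroy no state in the not-enough-bytes case.

NEED_MORE = object()   # analogous to returning EOF with no error set

class Conn:
    def __init__(self):
        self.buf = b""   # bytes received so far
        self.pos = 0     # parse position within buf
        self.error = None

    def get_int16(self):
        """Return the next big-endian 2-byte integer, or NEED_MORE if
        fewer than 2 bytes are buffered.  (A hard-error path would set
        self.error; it is omitted from this sketch.)"""
        if len(self.buf) - self.pos < 2:
            return NEED_MORE          # caller retries later; nothing consumed
        raw = self.buf[self.pos:self.pos + 2]
        self.pos += 2
        return int.from_bytes(raw, "big")

conn = Conn()
conn.buf = b"\x00"                    # only one byte has arrived so far
assert conn.get_int16() is NEED_MORE  # not an error: just wait for more data
conn.buf += b"\x07"                   # the second byte arrives
assert conn.get_int16() == 7          # now the read succeeds
```

The bug Tom points out corresponds to treating the NEED_MORE outcome as a protocol error and resetting `pos` and `error` - after which the connection can never resynchronize.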
Re: [HACKERS] Refactoring simplify_function (was: Caching constant stable expressions)
On Sat, Mar 24, 2012 at 01:17, Tom Lane wrote:
> I've applied a slightly-modified version of this after reconciling it with the protransform fixes.

Cool, thanks!

> I assume you are going to submit a rebased version of the main CacheExpr patch?

Yeah, I'm still working on addressing the comments from your last email. Haven't had much time to work on it for the last 2 weeks, but I hope to finish most of it this weekend.

Regards,
Marti

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring simplify_function (was: Caching constant stable expressions)
Marti Raudsepp writes:
> Per Tom's request, I split out this refactoring from my CacheExpr patch. Basically I'm just centralizing the eval_const_expressions_mutator() call on function arguments, from multiple different places to a single location. Without this, it would be a lot harder to implement argument caching logic in the CacheExpr patch.

I've applied a slightly-modified version of this after reconciling it with the protransform fixes. I assume you are going to submit a rebased version of the main CacheExpr patch?

regards, tom lane
Re: [HACKERS] Finer Extension dependencies
Alvaro Herrera writes:
> So you still need an index on (oid), one on (extoid), and one on (extfeature).

Yes. And since the main use case for the index on (extoid) is listing a given extension's features, which we want ordered by name, the set of indexes I've been defining is now:

Indexes:
    "pg_extension_feature_name_index" UNIQUE, btree (extfeature)
    "pg_extension_feature_oid_index" UNIQUE, btree (oid)
    "pg_extension_feature_extoid_name_index" btree (extoid, extfeature)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Finer Extension dependencies
Excerpts from Dimitri Fontaine's message of vie mar 23 16:51:57 -0300 2012:
> Alvaro Herrera writes:
> >> Yes, for pg_depend; no, I don't know how to make that work with pointing to the extensions directly, because the whole point here is to be able to depend on a feature rather than the whole extension.
> >
> > Yes, I understand that -- but would it work to have the feature resolution be done at install/upgrade time, and once it's resolved, you record it by storing the extension that contains the feature? That way
>
> I don't think so, because at upgrade time you then typically only have the new .control file with the new set of features, and you need to act on the difference between the old and new features compared to the other packages' current dependencies on them.

Aha, right. So you still need an index on (oid), one on (extoid), and one on (extfeature).

--
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Apology to the community
On Fri, Mar 23, 2012 at 11:43:25AM -0700, Joshua D. Drake wrote:
>
> Hello,
>
> It has been brought to my attention a few times over the last year that I have been over the top in my presentation of myself and have in fact alienated and offended many of the community. To be honest I am unaware of everything I have done but I do take the opinion of those who have taken the time to point it out to me seriously. They have been peers, friends, and community members, some of them for over a decade.
>
> The last year has been very trying personally. If you wish to know details, please email me directly. Although, I know that I can be a difficult personality even at the best of times, I have always tried to keep the community's best interest at heart. It is this past year, and the trials associated with it, that brought about frankly what I would consider some of the worst of who I am.
>
> With that, I would like to apologize directly to this community that has provided me with so much, not just professionally but personally.

I want to commend Joshua Drake for the honesty and transparency of his comments above, and for its sentiments. Thank you, Josh.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Apology to the community
JD,

> With that, I would like to apologize directly to this community that has provided me with so much, not just professionally but personally.

Thank you for the apology.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Finer Extension dependencies
Alvaro Herrera writes:
>> Yes, for pg_depend; no, I don't know how to make that work with pointing to the extensions directly, because the whole point here is to be able to depend on a feature rather than the whole extension.
>
> Yes, I understand that -- but would it work to have the feature resolution be done at install/upgrade time, and once it's resolved, you record it by storing the extension that contains the feature? That way

I don't think so, because at upgrade time you then typically only have the new .control file with the new set of features, and you need to act on the difference between the old and new features compared to the other packages' current dependencies on them. For that to work you need to remember the exact set of per-feature dependencies between extensions. You can't trust the control files to reflect the reality you saw when installing or last updating.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
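Dimitri's point - that an upgrade must act on the difference between the old (recorded) feature set and the new one from the .control file - is essentially set arithmetic over feature names. A hypothetical sketch, with all names invented here for illustration:

```python
def feature_diff(old_features, new_features):
    """What an extension upgrade must reconcile: removed features may
    break other extensions that depend on them; added features become
    newly available to depend on; kept features need no action."""
    old, new = set(old_features), set(new_features)
    return {
        "removed": old - new,   # must check dependent extensions
        "added":   new - old,   # can now satisfy new dependencies
        "kept":    old & new,   # unchanged
    }

# Hypothetical upgrade: one feature dropped, one added.
d = feature_diff({"hstore", "hstore_ops"}, {"hstore", "hstore_json"})
assert d["removed"] == {"hstore_ops"}
assert d["added"] == {"hstore_json"}
```

This also motivates Dimitri's remark that the recorded per-feature dependencies, not the control files, must be the source of truth for `old_features`.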
Re: [HACKERS] patch for parallel pg_dump
On Fri, Mar 23, 2012 at 11:11 AM, Alvaro Herrera wrote:
> Are you going to provide a rebased version?

Yes, working on that.
Re: [HACKERS] CREATE FOREGIN TABLE LACUNA
On Fri, Mar 23, 2012 at 11:38:56AM -0700, David Fetter wrote:
> On Thu, Mar 15, 2012 at 11:23:43AM -0300, Alvaro Herrera wrote:
> > Excerpts from David Fetter's message of jue mar 15 02:28:28 -0300 2012:
> > > On Wed, Mar 14, 2012 at 12:06:20PM -0400, Robert Haas wrote:
> > > > On Wed, Mar 14, 2012 at 10:22 AM, David Fetter wrote:
> > > > >> I think that instead of inventing new grammar productions and a new node type for this, you should just reuse the existing productions for LIKE clauses and then reject invalid options during parse analysis.
> > > > >
> > > > > OK. Should I first merge CREATE FOREIGN TABLE with CREATE TABLE and submit that as a separate patch?
> > > >
> > > > I don't see any reason to do that. I merely meant that you could reuse TableLikeClause or maybe even TableElement in the grammar for CreateForeignTableStmt.
> > >
> > > Next WIP patch attached implementing this via reusing TableLikeClause and refactoring transformTableLikeClause().
> > >
> > > What say?
> >
> > Looks much better to me, but the use of strcmp() doesn't look good. ISTM that stmtType is mostly used for error messages. I think you should add some kind of identifier (such as the original parser Node) into the CreateStmtContext so that you can do an IsA() test instead -- a bit more invasive as a patch, but much cleaner.
> >
> > Also the error messages need more work.
>
> How about this one?

Oops, forgot to put the latest docs in.

Cheers,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

*** a/doc/src/sgml/ref/create_foreign_table.sgml
--- b/doc/src/sgml/ref/create_foreign_table.sgml
*** 19,26 ****
  CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name ( [
!   { column_name data_type [ OPTIONS ( option 'value' [, ... ] ) ] [ NULL | NOT NULL ] }
!     [, ... ]
  ] )
    SERVER server_name [ OPTIONS ( option 'value' [, ... ] ) ]
--- 19,26 ----
  CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name ( [
!   { { column_name data_type [ NULL | NOT NULL ] | LIKE source_table } [, ... ]
!     [ OPTIONS ( option 'value' [, ... ] ) ] }
  ] )
    SERVER server_name [ OPTIONS ( option 'value' [, ... ] ) ]

*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*** 3945,3950 ForeignTableElementList:
--- 3945,3951 ----
  ForeignTableElement:
        columnDef { $$ = $1; }
+       | TableLikeClause { $$ = $1; }
        ;

  /*

*** a/src/backend/parser/parse_utilcmd.c
--- b/src/backend/parser/parse_utilcmd.c
*** 66,71 typedef struct
--- 66,72 ----
  {
        ParseState *pstate;     /* overall parser state */
        const char *stmtType;   /* "CREATE [FOREIGN] TABLE" or "ALTER TABLE" */
+       char        relkind;    /* r = ordinary table, f = foreign table, cf. pg_catalog.pg_class */
        RangeVar   *relation;   /* relation to create */
        Relation    rel;        /* opened/locked rel, if ALTER */
        List       *inhRelations;   /* relations to inherit from */

*** 194,202 transformCreateStmt(CreateStmt *stmt, const char *queryString)
--- 195,209 ----
        cxt.pstate = pstate;
        if (IsA(stmt, CreateForeignTableStmt))
+       {
                cxt.stmtType = "CREATE FOREIGN TABLE";
+               cxt.relkind = 'f';
+       }
        else
+       {
                cxt.stmtType = "CREATE TABLE";
+               cxt.relkind = 'r';
+       }
        cxt.relation = stmt->relation;
        cxt.rel = NULL;
        cxt.inhRelations = stmt->inhRelations;

*** 623,629 transformTableConstraint(CreateStmtContext *cxt, Constraint *constraint)
  /*
   * transformTableLikeClause
   *
!  * Change the LIKE portion of a CREATE TABLE statement into
   * column definitions which recreate the user defined column portions of .
   */
--- 630,636 ----
  /*
   * transformTableLikeClause
   *
!  * Change the LIKE portion of a CREATE [FOREIGN] TABLE statement into
   * column definitions which recreate the user defined column portions of .
   */

*** 652,657 transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause *table_like_cla
--- 659,683 ----
                                        table_like_clause->relation->relname)));
        cancel_parser_errposition_callback(&pcbstate);
+
+       /*
+        * For foreign tables, disallow some options.
+        */
+       if (cxt->relkind ==
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 1:51 PM, Greg Stark wrote:
> Well it's not entirely unlikely. If you step back a web application looks like a big loop with a switch statement to go to different pages. It keeps executing the same loop over and over again and there are only a smallish number of web pages. Sure the bind variables change but there will only be so many bind values and 10% of those will get 90% of the traffic too.

That may be true, but lots of web applications have millions of users. The fact that a few hundred thousand of those may account for most of the traffic doesn't seem like it's going to help much unless there are not many users in total; and in that case it's plenty fast enough without a cache anyway.

> But the other thing that happens is that people run multiple queries aggregating or selecting from the same subset of data. So you often get things like
>
>     select count(*) from ()
>     select * from () order by foo limit 10
>     select * from () order by bar limit 10
>
> for the same . That means if we could cache the rows coming out of parts of the plan and remember those rows when we see a plan with a common subtree in the plan then we could avoid a lot of repetitive work.

Currently, we don't even recognize this situation within a plan; for example, if you do project pp LEFT JOIN person sr ON pp.sales_rep_id = sr.id LEFT JOIN person pm ON pp.project_manager_id = pm.id, the query planner will happily seq-scan the person table twice to build two copies of the same hash table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Apology to the community
On Fri, Mar 23, 2012 at 1:43 PM, Joshua D. Drake wrote:
>
> Hello,
>
> It has been brought to my attention a few times over the last year that I have been over the top in my presentation of myself and have in fact alienated and offended many of the community. To be honest I am unaware of everything I have done but I do take the opinion of those who have taken the time to point it out to me seriously. They have been peers, friends, and community members, some of them for over a decade.
>
> The last year has been very trying personally. If you wish to know details, please email me directly. Although, I know that I can be a difficult personality even at the best of times, I have always tried to keep the community's best interest at heart. It is this past year, and the trials associated with it, that brought about frankly what I would consider some of the worst of who I am.
>
> With that, I would like to apologize directly to this community that has provided me with so much, not just professionally but personally.

does this mean we have to take down the "Joshua D. Drake" dartboard?

merlin
Re: [HACKERS] query cache
Robert Haas writes:
> What I think is more common is the repeated submission of queries that are *nearly* identical, but with either different parameter bindings or different constants. It would be nice to have some kind of cache that would allow us to avoid the overhead of parsing and planning nearly identical statements over and over again, but the trick is that you have to fingerprint the query to notice that's happening in the first place, and the fingerprinting has to cost less than what the cache saves you. I don't know whether that's possible, but I suspect it's far from easy.

The traditional solution to this is to make the application do it, ie, parameterized prepared statements. Since that has direct benefits to the application as well, in that it can use out-of-line values and thereby avoid quoting and SQL-injection risks, it's not apparent that it's a good idea to expend lots of sweat to reverse-engineer parameterization from a collection of unparameterized queries.

regards, tom lane
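Tom's suggestion - let the application parameterize - is what client libraries already support: one statement text, many bindings, out-of-line values. As a self-contained illustration only (using Python's stdlib sqlite3 driver here purely so the example runs anywhere; the table and data are made up, and this is not libpq):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE disbursements (id INTEGER, amount REAL)")

# One parameterized statement, many bindings: the statement text is
# identical on every call, so the values stay out-of-line -- no quoting,
# no SQL-injection risk, and nothing for a server to re-fingerprint.
conn.executemany("INSERT INTO disbursements VALUES (?, ?)",
                 [(1, 10.0), (2, 250.0)])

row = conn.execute("SELECT amount FROM disbursements WHERE id = ?",
                   (2,)).fetchone()
print(row[0])   # → 250.0
```

In PostgreSQL terms the same idea is PREPARE once, EXECUTE with different parameters; the server then parses and plans the statement a single time.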
[HACKERS] Apology to the community
Hello,

It has been brought to my attention a few times over the last year that I have been over the top in my presentation of myself and have in fact alienated and offended many of the community. To be honest I am unaware of everything I have done, but I do take seriously the opinion of those who have taken the time to point it out to me. They have been peers, friends, and community members, some of them for over a decade.

The last year has been very trying personally. If you wish to know details, please email me directly. Although I know that I can be a difficult personality even at the best of times, I have always tried to keep the community's best interest at heart. It is this past year, and the trials associated with it, that brought about frankly what I would consider some of the worst of who I am.

With that, I would like to apologize directly to this community that has provided me with so much, not just professionally but personally.

Apologies,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
The PostgreSQL Conference - http://www.postgresqlconference.org/
@cmdpromptinc - @postgresconf - 509-416-6579
Re: [HACKERS] CREATE FOREGIN TABLE LACUNA
On Thu, Mar 15, 2012 at 11:23:43AM -0300, Alvaro Herrera wrote:
> Excerpts from David Fetter's message of jue mar 15 02:28:28 -0300 2012:
> > On Wed, Mar 14, 2012 at 12:06:20PM -0400, Robert Haas wrote:
> > > On Wed, Mar 14, 2012 at 10:22 AM, David Fetter wrote:
> > > >> I think that instead of inventing new grammar productions and a new node type for this, you should just reuse the existing productions for LIKE clauses and then reject invalid options during parse analysis.
> > > >
> > > > OK. Should I first merge CREATE FOREIGN TABLE with CREATE TABLE and submit that as a separate patch?
> > >
> > > I don't see any reason to do that. I merely meant that you could reuse TableLikeClause or maybe even TableElement in the grammar for CreateForeignTableStmt.
> >
> > Next WIP patch attached implementing this via reusing TableLikeClause and refactoring transformTableLikeClause().
> >
> > What say?
>
> Looks much better to me, but the use of strcmp() doesn't look good. ISTM that stmtType is mostly used for error messages. I think you should add some kind of identifier (such as the original parser Node) into the CreateStmtContext so that you can do an IsA() test instead -- a bit more invasive as a patch, but much cleaner.
>
> Also the error messages need more work.

How about this one?

Cheers,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

*** a/doc/src/sgml/ref/create_foreign_table.sgml
--- b/doc/src/sgml/ref/create_foreign_table.sgml
*** 19,26 ****
  CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name ( [
!   { column_name data_type [ OPTIONS ( option 'value' [, ... ] ) ] [ NULL | NOT NULL ] }
!     [, ... ]
  ] )
    SERVER server_name [ OPTIONS ( option 'value' [, ... ] ) ]
--- 19,26 ----
  CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name ( [
!   { { column_name data_type [ NULL | NOT NULL ] | LIKE source_table } [, ... ]
!     [ OPTIONS ( option 'value' [, ... ] ) ] }
  ] )
    SERVER server_name [ OPTIONS ( option 'value' [, ... ] ) ]

*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*** 3945,3950 ForeignTableElementList:
--- 3945,3951 ----
  ForeignTableElement:
        columnDef { $$ = $1; }
+       | TableLikeClause { $$ = $1; }
        ;

  /*

*** a/src/backend/parser/parse_utilcmd.c
--- b/src/backend/parser/parse_utilcmd.c
*** 66,71 typedef struct
--- 66,72 ----
  {
        ParseState *pstate;     /* overall parser state */
        const char *stmtType;   /* "CREATE [FOREIGN] TABLE" or "ALTER TABLE" */
+       char        relkind;    /* r = ordinary table, f = foreign table, cf. pg_catalog.pg_class */
        RangeVar   *relation;   /* relation to create */
        Relation    rel;        /* opened/locked rel, if ALTER */
        List       *inhRelations;   /* relations to inherit from */

*** 194,202 transformCreateStmt(CreateStmt *stmt, const char *queryString)
--- 195,209 ----
        cxt.pstate = pstate;
        if (IsA(stmt, CreateForeignTableStmt))
+       {
                cxt.stmtType = "CREATE FOREIGN TABLE";
+               cxt.relkind = 'f';
+       }
        else
+       {
                cxt.stmtType = "CREATE TABLE";
+               cxt.relkind = 'r';
+       }
        cxt.relation = stmt->relation;
        cxt.rel = NULL;
        cxt.inhRelations = stmt->inhRelations;

*** 623,629 transformTableConstraint(CreateStmtContext *cxt, Constraint *constraint)
  /*
   * transformTableLikeClause
   *
!  * Change the LIKE portion of a CREATE TABLE statement into
   * column definitions which recreate the user defined column portions of .
   */
--- 630,636 ----
  /*
   * transformTableLikeClause
   *
!  * Change the LIKE portion of a CREATE [FOREIGN] TABLE statement into
   * column definitions which recreate the user defined column portions of .
   */

*** 652,657 transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause *table_like_cla
--- 659,683 ----
                                        table_like_clause->relation->relname)));
        cancel_parser_errposition_callback(&pcbstate);
+
+       /*
+        * For foreign tables, disallow some options.
+        */
+       if (cxt->relkind == 'f')
+       {
+               if (table_like_clause->options & CREATE_TABLE_LIKE_CONSTRAINTS)
+               {
+                       ereport(ERROR,
+
Re: [HACKERS] Finer Extension dependencies
Excerpts from Dimitri Fontaine's message of vie mar 23 13:12:22 -0300 2012:
> Alvaro Herrera writes:
> > Why do features have OIDs? Is this for pg_depend entries? If so, would it work to have pg_depend entries point to extensions instead?
>
> Yes, for pg_depend; no, I don't know how to make that work with pointing to the extensions directly, because the whole point here is to be able to depend on a feature rather than the whole extension.

Yes, I understand that -- but would it work to have the feature resolution be done at install/upgrade time, and once it's resolved, you record it by storing the extension that contains the feature? That way it correctly breaks when the extension gets removed; and since we ensure that upgrading an extension means deleting its features and then inserting them anew, it would also correctly break at that point if some feature is no longer provided.

I'm not wedded to this idea, so if we think it doesn't work for some reason, I have no problem going back to the idea of having direct dependencies on features instead. But I think it's worth considering.

--
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 12:03 PM, Robert Haas wrote: > On Fri, Mar 23, 2012 at 12:29 PM, Greg Stark wrote: >> On Fri, Mar 23, 2012 at 3:49 PM, Tom Lane wrote: >>> The complication, opportunities for bugs, and general slowdown >>> associated with that would outweigh any possible gain, in the opinion >>> of most hackers who have thought about this. >> >> I wouldn't be quite so pessimistic. I think the problem is that the >> hard part in doing this for real is all the parts the proposal glosses >> over. How much memory is it worth dedicating to the cache before the >> cost of that memory costs more than it helps? How do you invalidate >> cache entries efficiently enough that it doesn't become a bottleneck? > > I think the question of how you would invalidate things is a very good one. > > The other thing that makes me skeptical of this proposal is that I am > not very sure that executing absolutely identical queries is a very > common use case for a relational database. I suppose there might be a > few queries that run over and over again (e.g. whatever you need to > render your home page), but I think those will be the exception, and > not the rule. It therefore seems likely that the overhead of such a > cache would in most cases be greater than the benefit of having it in > the first place. > > What I think is more common is the repeated submission of queries that > are *nearly* identical, but with either different parameter bindings > or different constants. It would be nice to have some kind of cache > that would allow us to avoid the overhead of parsing and planning > nearly identical statements over and over again, but the trick is that > you have to fingerprint the query to notice that's happening in the > first place, and the fingerprinting has to cost less than what the > cache saves you. I don't know whether that's possible, but I suspect > it's far from easy. 
A query cache basically addresses two use cases:

1) read-only or mostly read-only workloads
2) badly written application code (whether by human or machine)

The problem is that #1 can be optimized by any number of simple
techniques, and #2 is not a good basis for complicated internal features
with nasty trade-offs.

mysql's query cache woes are well known -- it's typical for
administrators to turn the feature off. The feature is misnamed; it's
really a 'benchmark cheating feature', since a lot of db benchmarks tend
to focus on single-user loads and/or highly repetitive queries, but it
falls over completely under real-world production workloads. Also, it's
really not that difficult to rig an ad-hoc cache in the server or on the
client side, and you can then gear it towards your particular use case.

People asking for this probably really want materialized views instead.

merlin
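The ad-hoc cache Merlin mentions can be sketched quickly on the client side. The following is a minimal illustration (not code from any real driver or from PostgreSQL itself; all names are made up): results are keyed by (sql, params), expire after a TTL, and are tagged with the tables they read so the application can invalidate them after a write.

```python
import time

class QueryCache:
    """Tiny client-side result cache keyed by (sql, params).

    Entries are tagged with the tables they read so application writes
    can invalidate them; run_query stands in for a real database call.
    """
    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self.entries = {}    # (sql, params) -> (expires_at, rows)
        self.by_table = {}   # table name -> set of cache keys

    def get(self, sql, params, tables, run_query):
        key = (sql, params)
        hit = self.entries.get(key)
        if hit and hit[0] > time.monotonic():
            return hit[1]                          # cache hit
        rows = run_query(sql, params)              # miss: run for real
        self.entries[key] = (time.monotonic() + self.ttl, rows)
        for table in tables:
            self.by_table.setdefault(table, set()).add(key)
        return rows

    def invalidate(self, table):
        """Drop every cached result that read the given table."""
        for key in self.by_table.pop(table, set()):
            self.entries.pop(key, None)
```

Because the application knows exactly which tables each query touches and when it writes to them, the invalidation problem that makes a server-side cache so hard becomes trivial here.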
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 5:03 PM, Robert Haas wrote:
> The other thing that makes me skeptical of this proposal is that I am
> not very sure that executing absolutely identical queries is a very
> common use case for a relational database. I suppose there might be a
> few queries that run over and over again (e.g. whatever you need to
> render your home page), but I think those will be the exception, and
> not the rule. It therefore seems likely that the overhead of such a
> cache would in most cases be greater than the benefit of having it in
> the first place.

Well, it's not entirely unlikely. If you step back, a web application
looks like a big loop with a switch statement to go to different pages.
It keeps executing the same loop over and over again, and there are only
a smallish number of web pages. Sure, the bind variables change, but
there will only be so many bind values, and 10% of those will get 90% of
the traffic too.

But the other thing that happens is that people run multiple queries
aggregating or selecting from the same subset of data. So you often get
things like

  select count(*) from (<subquery>)
  select * from (<subquery>) order by foo limit 10
  select * from (<subquery>) order by bar limit 10

for the same <subquery>. That means if we could cache the rows coming
out of parts of the plan and remember those rows when we see a plan with
a common subtree, then we could avoid a lot of repetitive work.

This depends on being able to recognize when we can guarantee that
subtrees of plans produce the same rows even if the surrounding tree
changes. That will be true sometimes but not other times.

-- 
greg
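The subtree-sharing idea can be sketched in miniature. In this illustration (purely hypothetical; plan nodes are modeled as plain nested dicts rather than real planner structures), each subtree is fingerprinted, and a later plan containing an identical subtree reuses the rows the first execution produced:

```python
import hashlib
import json

def subtree_key(node):
    """Stable fingerprint of a plan subtree, represented here as plain
    nested dicts/lists standing in for real planner nodes."""
    blob = json.dumps(node, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

class SubtreeCache:
    """Remember the rows a subtree produced; any later plan containing
    an identical subtree reuses them instead of re-executing it."""
    def __init__(self, execute):
        self.execute = execute   # callable: subtree -> list of rows
        self.rows = {}

    def rows_for(self, subtree):
        key = subtree_key(subtree)
        if key not in self.rows:
            self.rows[key] = self.execute(subtree)   # first execution
        return self.rows[key]                        # shared thereafter
```

In the three-query example above, all three plans would contain the same subquery subtree, so only the first query pays for executing it. The hard part Greg identifies, proving that the subtree is guaranteed to produce the same rows in each context, is exactly what this sketch glosses over.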
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Fri, Mar 23, 2012 at 1:52 AM, Simon Riggs wrote: > So we have this? > > Master pg_controldata - OK txid_current_snapshot() - OK > Standby pg_controldata - OK txid_current_snapshot() - lower value > > Are there just 2 standbys? So all standbys have acted identically? Yes, I believe this is the situation. All have acted identically. Also, some new data: I took a new base backup after the epoch increment and started a new standby, and it reported txid_current_snapshot correctly, at least moments after it became consistent. This morning, however, it does not, and reports the 0-epoch number. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Sat, Mar 24, 2012 at 1:49 AM, Robert Haas wrote:
> On Fri, Mar 23, 2012 at 12:42 PM, Fujii Masao wrote:
>> On Fri, Mar 23, 2012 at 9:41 PM, Robert Haas wrote:
>>> On Fri, Mar 23, 2012 at 6:13 AM, Fujii Masao wrote:
>>>> On Fri, Mar 23, 2012 at 5:56 PM, Magnus Hagander wrote:
>>>>> Might it be a good idea to put it on its own row instead of
>>>>> changing the format of an existing row, in order not to break
>>>>> scripts and programs that are parsing the previous output?
>>>>
>>>> Good idea! What row name should we use for the WAL file containing
>>>> the REDO record? "Latest checkpoint's REDO file"?
>>>
>>> Sounds good to me. I like the idea, too. The status quo is an
>>> unnecessary nuisance, so this will be a nice usability improvement.
>>
>> Attached patch adds a new row "Latest checkpoint's REDO WAL segment:"
>> to the result of pg_controldata. I used the term "WAL segment" for the
>> row name instead of "file" because "WAL segment" is used in another
>> row, "Bytes per WAL segment:". But better name?
>
> s/segment/file/g?

Yep, "file" might be more intuitive for a user than "segment". Attached
is the "file" version of the patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

pg_controldata_walfilename_v3.patch
Description: Binary data
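For context on what the new pg_controldata row would print: a WAL file name of this era is 24 hex digits built from the timeline ID, the log ID, and the segment number within that log, where an LSN is a (xlogid, xrecoff) pair and segments default to 16MB. A rough sketch of the naming rule (mirroring the spirit of the XLogFileName macro, not the patch's actual code):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default 16MB WAL segment size

def wal_file_name(timeline, xlogid, xrecoff):
    """Name of the WAL file containing location (xlogid, xrecoff):
    timeline, log id, and segment number, each as 8 hex digits."""
    segno = xrecoff // WAL_SEG_SIZE
    return "%08X%08X%08X" % (timeline, xlogid, segno)
```

So a REDO record at, say, byte offset one segment into log 0 on timeline 1 lives in file 000000010000000000000001, which is the kind of value the "Latest checkpoint's REDO WAL file" row would report.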
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 12:29 PM, Greg Stark wrote: > On Fri, Mar 23, 2012 at 3:49 PM, Tom Lane wrote: >> The complication, opportunities for bugs, and general slowdown >> associated with that would outweigh any possible gain, in the opinion >> of most hackers who have thought about this. > > I wouldn't be quite so pessimistic. I think the problem is that the > hard part in doing this for real is all the parts the proposal glosses > over. How much memory is it worth dedicating to the cache before the > cost of that memory costs more than it helps? How do you invalidate > cache entries efficiently enough that it doesn't become a bottleneck? I think the question of how you would invalidate things is a very good one. The other thing that makes me skeptical of this proposal is that I am not very sure that executing absolutely identical queries is a very common use case for a relational database. I suppose there might be a few queries that run over and over again (e.g. whatever you need to render your home page), but I think those will be the exception, and not the rule. It therefore seems likely that the overhead of such a cache would in most cases be greater than the benefit of having it in the first place. What I think is more common is the repeated submission of queries that are *nearly* identical, but with either different parameter bindings or different constants. It would be nice to have some kind of cache that would allow us to avoid the overhead of parsing and planning nearly identical statements over and over again, but the trick is that you have to fingerprint the query to notice that's happening in the first place, and the fingerprinting has to cost less than what the cache saves you. I don't know whether that's possible, but I suspect it's far from easy. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
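The fingerprinting Robert describes can be illustrated cheaply. This is a deliberately crude, hypothetical sketch: it normalizes the query *text* by replacing literals with placeholders, whereas a real implementation (as Robert implies) would have to fingerprint the parse tree, and would have to do so faster than parsing and planning from scratch:

```python
import hashlib
import re

# Matches single-quoted string literals (with '' escapes) and bare numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")

def fingerprint(sql):
    """Crude query fingerprint: lowercase, collapse whitespace, and
    replace string/numeric literals with '?' so nearly-identical
    statements collapse to one cache key."""
    normalized = _LITERAL.sub("?", sql.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    return hashlib.md5(normalized.encode()).hexdigest()
```

With this, `SELECT * FROM t WHERE id = 1` and `select * from t where id = 2` share a fingerprint, which is exactly the "nearly identical" case; the open question in the thread is whether the real, parse-tree version of this can be made cheap enough to pay for itself.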
Re: [HACKERS] [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
Noah Misch writes: > On Fri, Mar 23, 2012 at 11:31:54AM -0400, Tom Lane wrote: >> ... I've not looked >> yet at the existing transform functions, but why would they want to know >> about the original node at all? > You suggested[1] passing an Expr instead of an argument list, and your reasons > still seem good to me. That said, perhaps we should send both the original > Expr and the simplified argument list. That will help if we ever want to > fully simplify x - y * 0. (Then again, the feature is undocumented and we > could change it when that day comes.) I believe what I had in mind back then was that we'd build a new FuncExpr containing the simplified argument list. On reflection that's probably the most future-proof way to do it, since otherwise anytime we change the contents of FuncExpr, we'll be faced with possibly having to change the signature of protransform functions. Will go see what I can do with that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 12:42 PM, Fujii Masao wrote: > On Fri, Mar 23, 2012 at 9:41 PM, Robert Haas wrote: >> On Fri, Mar 23, 2012 at 6:13 AM, Fujii Masao wrote: >>> On Fri, Mar 23, 2012 at 5:56 PM, Magnus Hagander >>> wrote: Might it be a good idea to put it on it's own row instead of changing the format of an existing row, in order not to break scripts and programs that are parsing the previous output? >>> >>> Good idea! What row name should we use for the WAL file containing >>> REDO record? "Latest checkpoint's REDO file"? >> >> Sounds good to me. I like the idea, too. The status quo is an >> unnecessary nuisance, so this will be a nice usability improvement. > > Attached patch adds new row "Latest checkpoint's REDO WAL segment:" into > the result of pg_controldata. I used the term "WAL segment" for the row name > instead of "file" because "WAL segment" is used in another row "Bytes per WAL > segment:". But better name? s/segment/file/g? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 9:41 PM, Robert Haas wrote: > On Fri, Mar 23, 2012 at 6:13 AM, Fujii Masao wrote: >> On Fri, Mar 23, 2012 at 5:56 PM, Magnus Hagander wrote: >>> Might it be a good idea to put it on it's own row instead of changing >>> the format of an existing row, in order not to break scripts and >>> programs that are parsing the previous output? >> >> Good idea! What row name should we use for the WAL file containing >> REDO record? "Latest checkpoint's REDO file"? > > Sounds good to me. I like the idea, too. The status quo is an > unnecessary nuisance, so this will be a nice usability improvement. Attached patch adds new row "Latest checkpoint's REDO WAL segment:" into the result of pg_controldata. I used the term "WAL segment" for the row name instead of "file" because "WAL segment" is used in another row "Bytes per WAL segment:". But better name? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center pg_controldata_walfilename_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 11:29 AM, Greg Stark wrote:
> On Fri, Mar 23, 2012 at 3:49 PM, Tom Lane wrote:
>> The complication, opportunities for bugs, and general slowdown
>> associated with that would outweigh any possible gain, in the opinion
>> of most hackers who have thought about this.
>
> I wouldn't be quite so pessimistic. I think the problem is that the
> hard part in doing this for real is all the parts the proposal glosses
> over. How much memory is it worth dedicating to the cache before the
> cost of that memory costs more than it helps? How do you invalidate
> cache entries efficiently enough that it doesn't become a bottleneck?
>
> Also, you need to identify the specific advantages you hope a built-in
> cache would have over one implemented in the ORM or database library.
> If there aren't any advantages then those solutions are much simpler.
> And they have other advantages as well -- one of the main reasons
> people implement caches is so they can move the load away from the
> bottleneck of the database to the more easily scaled-out application.

Thanks for the input. I've had many of these thoughts myself, and I
guess how valuable a query cache would be depends on the environment the
database is used in, memory settings, and other variables. I'll
definitely give this more thought before sending an official proposal.

Billy
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 3:49 PM, Tom Lane wrote:
> The complication, opportunities for bugs, and general slowdown
> associated with that would outweigh any possible gain, in the opinion
> of most hackers who have thought about this.

I wouldn't be quite so pessimistic. I think the problem is that the hard
part in doing this for real is all the parts the proposal glosses over.
How much memory is it worth dedicating to the cache before the cost of
that memory costs more than it helps? How do you invalidate cache
entries efficiently enough that it doesn't become a bottleneck?

Also, you need to identify the specific advantages you hope a built-in
cache would have over one implemented in the ORM or database library. If
there aren't any advantages then those solutions are much simpler. And
they have other advantages as well -- one of the main reasons people
implement caches is so they can move the load away from the bottleneck
of the database to the more easily scaled-out application.

-- 
greg
Re: [HACKERS] [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
I wrote:
> However, see my response to Robert: why are we passing the original node
> to the transform function at all? It would be more useful and easier to
> work with to pass the function's fully-processed argument list, I believe.

After a bit of looking around, I realize that the current implementation
of transform functions is flat-out wrong, because whenever a transform
actually fires, it proceeds to throw away all the work that
eval_const_expressions has done on the input, and instead returns some
lightly-modified version of the original node tree. Thus for example in
the regression database:

regression=# create function foo(x float8, y int) returns numeric as
regression-# 'select ($1 + $2)::numeric' language sql;
CREATE FUNCTION
regression=# select "numeric"(foo(y := 1, x := f1), -1) from float8_tbl;
ERROR:  unrecognized node type: 310

since the adjustment of foo's named arguments is thrown away. So this
patch is going to need some work.

I continue to not see any particular reason why the transform function
should need the original node tree. I think what it *should* be getting
is: the OID of the function (currently, it's impossible for one
transform to serve more than one function, which seems like it might be
useful); the input collation (currently, transforms are basically
unusable for any collation-sensitive function); and the pre-simplified
argument list.

regards, tom lane
Re: [HACKERS] Finer Extension dependencies
Alvaro Herrera writes:
> Why do features have OIDs? Is this for pg_depend entries? If so, would
> it work to have pg_depend entries point to extensions instead?

Yes, for pg_depend; no, I don't know how to make that work with pointing
to the extensions directly, because the whole point here is to be able
to depend on a feature rather than the whole extension. Use cases:

- depend on a feature f that appeared in version y of the extension
  (bugfix, new capability)

- deprecate a feature: alter extension update removes a feature, and you
  want to know that the dependent extensions need processing (cascade to
  remove them in the operation, or update them beforehand, etc.) (still
  a manual operation, though)

I don't see how to handle those cases with a direct dependency on the
extension rather than on one of its features.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
[HACKERS] Re: [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
On Fri, Mar 23, 2012 at 11:31:54AM -0400, Tom Lane wrote: > Robert Haas writes: > > On Fri, Mar 23, 2012 at 10:55 AM, Tom Lane wrote: > >> Why exactly was this thought to be a good idea: > >> > >>> * A NULL original expression disables use of transform functions while > >>> * retaining all other behaviors. > > > I assumed that we were merely trying to avoid forcing the caller to > > provide the expression tree if they didn't have it handy, and that the > > comment was merely making allowance for the fact that someone might > > want to do such a thing. > > How would they not have the original expression tree handy? > > But now that I'm looking at this ... the API specification for transform > functions seems rather thoroughly broken anyway. Why are we passing the > original expression and nothing else? This would appear to require the > transform function to repeat all the input-normalization and > simplification work done up to this point. It would seem to me to be > more useful to pass the fully-processed argument list. I've not looked > yet at the existing transform functions, but why would they want to know > about the original node at all? You suggested[1] passing an Expr instead of an argument list, and your reasons still seem good to me. That said, perhaps we should send both the original Expr and the simplified argument list. That will help if we ever want to fully simplify x - y * 0. (Then again, the feature is undocumented and we could change it when that day comes.) [1] http://archives.postgresql.org/pgsql-hackers/2011-06/msg00915.php The existing transform functions are trivial and could survive on nearly any API we might consider. See varchar_transform(). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] query cache
Billy Earney writes: > I'm wondering if anyone would be interested in a query cache as a backend > to postgresql? I believe this has been suggested and rejected several times before. Did you look through the pgsql-hackers archives? > To invalidate cache entries, look at the transactions being committed (and > written to WAL log, if my memory serves me) and send a message to the > qcache process to invalidate any query which depends on the modfied > relation (ie, table, etc) The complication, opportunities for bugs, and general slowdown associated with that would outweigh any possible gain, in the opinion of most hackers who have thought about this. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Finer Extension dependencies
Excerpts from Dimitri Fontaine's message of vie mar 23 12:26:47 -0300 2012:
> Alvaro Herrera writes:
> > Excerpts from Dimitri Fontaine's message of vie mar 23 11:05:37 -0300 2012:
> >
> >> =# \d pg_extension_feature
> >>    Table "pg_catalog.pg_extension_feature"
> >>    Column   | Type | Modifiers
> >> ------------+------+-----------
> >>  extoid     | oid  | not null
> >>  extfeature | name | not null
> >> Indexes:
> >>     "pg_extension_feature_name_index" UNIQUE, btree (extfeature)
> >>     "pg_extension_feature_oid_index" UNIQUE, btree (oid)
> >>     "pg_extension_feature_extoid_name_index" btree (extoid, extfeature)
> >>
> >> We could maybe get rid of the (extoid, extfeature) index, which is only
> >> used to get sorted output in the list_extension_features() function, but
> >> I don't know how to do an ORDER BY scan without an index in C (yet).
> >>
> >> The ordering is then used to maintain pg_depend when the list of
> >> provided features changes at upgrade time. We fetch the ordered list of
> >> "old" feature names, then for each newly provided feature name we
> >> bsearch() the old list, which then needs to be properly ordered.
> >
> > Hm, couldn't it be done simply with a qsort()? Presumably there aren't
> > many feature entries to sort ...
>
> Mmmm… Then we would need an index on extoid to be able to list the
> features of a given extension, and that would be the only usage of such
> an index. I guess that having it include the feature's name is not so
> expensive as to try avoiding it and qsort() in the code rather than scan
> the index in order?

Well, as far as I can see, the only use of
pg_extension_feature_extoid_name_index right now is the same as the only
use for the extoid index. I mean, what you really want is to find out
the features of an extension, right? The extfeature column is just there
to provide you with the ordering, which should be easy to determine
outside of the index.

Why do features have OIDs? Is this for pg_depend entries? If so, would
it work to have pg_depend entries point to extensions instead?

-- 
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
Noah Misch writes: > On Fri, Mar 23, 2012 at 10:55:52AM -0400, Tom Lane wrote: >> Why exactly was this thought to be a good idea: >> >>> * A NULL original expression disables use of transform functions while >>> * retaining all other behaviors. > I did it that way because it looked wrong to pass the same CoerceViaIO node to > transforms of both the input and output functions. Thinking about it again > now, doing so imposes no fundamental problems. Feel welcome to change it. Oh, I see your point --- it's not obvious whether the current transform is meant for the input or the output function. Which is a very good point. In principle the transform function could figure out which end of that it must be, but it would be ugly. However, see my response to Robert: why are we passing the original node to the transform function at all? It would be more useful and easier to work with to pass the function's fully-processed argument list, I believe. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
On 23 March 2012 15:13, Andrew Dunstan wrote: > Upper casing SQL keywords is a common style, which is used in lots of our > code (e.g. regression tests, psql queries, pg_dump). I think the default > should match what is in effect our house style, and what we have > historically done. The code doesn't give preferential treatment to lower-case code - it merely puts it on an even footing. I would agree with your position if the change assumed that the user always wanted to use lower-case SQL, but it does not. Rather, it intelligently infers what the user wants. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #6510: A simple prompt is displayed using wrong charset
Excerpts from Alexander LAW's message of mar mar 20 16:50:14 -0300 2012: > Thanks, I've understood your point. > Please look at the patch. It implements the first way and it makes psql > work too. Great, thanks. Hopefully somebody with Windows-compile abilities will have a look at this. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers patch v18
Thom Brown writes: > The new command triggers work correctly. Thanks for your continued testing :) > Having looked at your regression tests, you don't seem to have enough > "before" triggers in the tests. There's no test for before CREATE > TABLE, CREATE TABLE AS or SELECT INTO. In my tests I have 170 unique > command triggers, but there are only 44 in the regression test. Is > there a reason why there aren't many tests? Now that we share the same code for ANY triggers and specific ones, I guess we could drop a lot of specific command triggers from the regression tests. > A problem still outstanding is that when I build the docs, the CREATE I would like to get back on code level review now if at all possible, and I would integrate your suggestions here into the next patch revision if another one is needed. The only point yet to address from last round from Andres is about the API around CommandFiresTrigger() and the Memory Context we use here. We're missing an explicit Reset call, and to be able to have we need to have a more complex API, because of the way RemoveObjects() and RemoveRelations() work. We would need to add no-reset APIs and an entry point to manually reset the memory context, which currently gets disposed at the same time as its parent context, the current one that's been setup before entering standard_ProcessUtility(). Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
Robert Haas writes: > On Fri, Mar 23, 2012 at 10:55 AM, Tom Lane wrote: >> Why exactly was this thought to be a good idea: >> >>> * A NULL original expression disables use of transform functions while >>> * retaining all other behaviors. > I assumed that we were merely trying to avoid forcing the caller to > provide the expression tree if they didn't have it handy, and that the > comment was merely making allowance for the fact that someone might > want to do such a thing. How would they not have the original expression tree handy? But now that I'm looking at this ... the API specification for transform functions seems rather thoroughly broken anyway. Why are we passing the original expression and nothing else? This would appear to require the transform function to repeat all the input-normalization and simplification work done up to this point. It would seem to me to be more useful to pass the fully-processed argument list. I've not looked yet at the existing transform functions, but why would they want to know about the original node at all? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
On Fri, Mar 23, 2012 at 10:55:52AM -0400, Tom Lane wrote: > Robert Haas writes: > > Add notion of a "transform function" that can simplify function calls. > > Why exactly was this thought to be a good idea: > > > * A NULL original expression disables use of transform functions while > > * retaining all other behaviors. We last spoke of that idea here, albeit in minimal detail: http://archives.postgresql.org/pgsql-hackers/2011-06/msg00918.php > AFAICT that buys nothing except to greatly complicate the API > specification for simplify_function, something that is now proving > problematic for Marti's requested refactoring [1]. If it's > inappropriate for a transform function to modify a CoerceViaIO call, > surely the transform function can be expected to know that. I did it that way because it looked wrong to pass the same CoerceViaIO node to transforms of both the input and output functions. Thinking about it again now, doing so imposes no fundamental problems. Feel welcome to change it. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] query cache
Greetings!

I've done a brief search of the postgresql mail archives, and I've
noticed a few projects for adding query caches to postgresql (for
example, Masanori Yamazaki's query cache proposal for GSOC 2011), as
well as the query cache announced at
http://www.postgresql.org/about/news/1296/ (pgc). Both of these seem to
be external solutions that act more like a proxy between clients and
servers, instead of being part of the server processes.

I'm wondering if anyone would be interested in a query cache as a
backend to postgresql? I've been playing around with the postgresql
code, and if I'm understanding the code, I believe this is possible.
I've been writing some code, but don't have anything working yet (I'm
receiving a hash table corruption error), but I'm working through it.

Here's my basic idea:

1. Intercept select queries in execMain.c at ExecuteQuery and see if the
source text of this query is in the "query hash". (Later we could make
this more sophisticated by using the query plan or some type of AST
instead of the query text, since adding or removing a space would create
a different query hash key.)

2. If the query is in the cache, return the cached results of this
query.

3. If the query is not cached, run the query like normal, grabbing the
tuples as they are sent to the "dest" and storing them in the cache.
(For now, I'm ignoring storage constraints, etc., but these details will
need to be added before going to production.)

To invalidate cache entries, look at the transactions being committed
(and written to the WAL log, if my memory serves me) and send a message
to the qcache process to invalidate any query which depends on the
modified relation (i.e., table, etc.).

For the experts out there, does this seem reasonable, or am I
misunderstanding the source code? Anyone aware of a project trying to
accomplish this?

Thanks!

Billy Earney
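The three steps plus the invalidation message can be modeled in a few lines. This is a toy sketch of the proposal's control flow only (the names `QCache`, `execute_select`, and `on_commit` are invented for illustration and correspond to nothing in the PostgreSQL source):

```python
class QCache:
    """Toy model of the proposed cache: intercept a SELECT, serve it
    from a cache keyed by its raw source text, and invalidate entries
    when a committed transaction modified a relation they depend on."""
    def __init__(self, executor):
        self.executor = executor  # callable: sql -> (rows, relations read)
        self.cache = {}           # sql text -> cached rows
        self.deps = {}            # relation -> set of dependent sql texts

    def execute_select(self, sql):
        if sql in self.cache:                     # step 2: cache hit
            return self.cache[sql]
        rows, relations = self.executor(sql)      # step 3: run normally,
        self.cache[sql] = rows                    # grab and store tuples
        for rel in relations:
            self.deps.setdefault(rel, set()).add(sql)
        return rows

    def on_commit(self, modified_relations):
        """The invalidation message sent at commit time."""
        for rel in modified_relations:
            for sql in self.deps.pop(rel, set()):
                self.cache.pop(sql, None)
```

Note that keying on raw text (step 1) makes `select 1` and `select  1` distinct entries, which is exactly why the proposal mentions moving to the plan or an AST later.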
Re: [HACKERS] Finer Extension dependencies
Alvaro Herrera writes:
> Excerpts from Dimitri Fontaine's message of vie mar 23 11:05:37 -0300 2012:
>
>> =# \d pg_extension_feature
>>    Table "pg_catalog.pg_extension_feature"
>>    Column   | Type | Modifiers
>> ------------+------+-----------
>>  extoid     | oid  | not null
>>  extfeature | name | not null
>> Indexes:
>>     "pg_extension_feature_name_index" UNIQUE, btree (extfeature)
>>     "pg_extension_feature_oid_index" UNIQUE, btree (oid)
>>     "pg_extension_feature_extoid_name_index" btree (extoid, extfeature)
>>
>> We could maybe get rid of the (extoid, extfeature) index, which is only
>> used to get sorted output in the list_extension_features() function, but
>> I don't know how to do an ORDER BY scan without an index in C (yet).
>>
>> The ordering is then used to maintain pg_depend when the list of
>> provided features changes at upgrade time. We fetch the ordered list of
>> "old" feature names, then for each newly provided feature name we
>> bsearch() the old list, which then needs to be properly ordered.
>
> Hm, couldn't it be done simply with a qsort()? Presumably there aren't
> many feature entries to sort ...

Mmmm… Then we would need an index on extoid to be able to list the
features of a given extension, and that would be the only usage of such
an index. I guess that having it include the feature's name is not so
expensive as to try avoiding it and qsort() in the code rather than scan
the index in order?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
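The upgrade-time bookkeeping being discussed (fetch the ordered list of old feature names, then binary-search it for each newly provided name) can be sketched outside of C. This is an illustrative model only, not the patch's code; in Python, `bisect` plays the role of `bsearch()` and `sorted()` the role of either the ordered index scan or a `qsort()`:

```python
import bisect

def is_present(name, sorted_names):
    """Binary search (the bsearch() step); requires a sorted list."""
    i = bisect.bisect_left(sorted_names, name)
    return i < len(sorted_names) and sorted_names[i] == name

def dropped_features(old_sorted, new_features):
    """Given the ordered list of previously provided feature names and
    the new set, return features no longer provided after an upgrade --
    the entries whose pg_depend records would need processing."""
    new_sorted = sorted(new_features)   # qsort()-style alternative to
                                        # scanning an ordered index
    return [f for f in old_sorted if not is_present(f, new_sorted)]
```

Either way the list must end up sorted before the binary searches; whether that ordering comes from the (extoid, extfeature) index or from an in-memory sort is exactly the trade-off Alvaro and Dimitri are weighing.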
[HACKERS] Re: [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
On Fri, Mar 23, 2012 at 10:55 AM, Tom Lane wrote: > Robert Haas writes: >> Add notion of a "transform function" that can simplify function calls. > > Why exactly was this thought to be a good idea: > >> * A NULL original expression disables use of transform functions while >> * retaining all other behaviors. > > AFAICT that buys nothing except to greatly complicate the API > specification for simplify_function, something that is now proving > problematic for Marti's requested refactoring [1]. If it's > inappropriate for a transform function to modify a CoerceViaIO call, > surely the transform function can be expected to know that. I assumed that we were merely trying to avoid forcing the caller to provide the expression tree if they didn't have it handy, and that the comment was merely making allowance for the fact that someone might want to do such a thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
Peter Geoghegan writes: > On 22 March 2012 22:05, Andrew Dunstan wrote: >> Should it be governed by a setting? > Perhaps, but I find the behaviour that was introduced by Peter's patch > to be a more preferable default. FWIW, I like the new behavior better too. I'm not particularly a fan of all-caps. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
On 03/23/2012 11:07 AM, Peter Geoghegan wrote:
> On 22 March 2012 22:05, Andrew Dunstan wrote:
>> On 03/22/2012 05:49 PM, Bruce Momjian wrote:
>>> Robert Haas and I are disappointed by this change. I liked the fact that I could post nice-looking SQL queries without having to use my capslock key (which I use as a second control key). Any chance of reverting this change?
>>
>> Should it be governed by a setting?
>
> Perhaps, but I find the behaviour that was introduced by Peter's patch to be a more preferable default.

Upper casing SQL keywords is a common style, which is used in lots of our code (e.g. regression tests, psql queries, pg_dump). I think the default should match what is in effect our house style, and what we have historically done.

cheers

andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for parallel pg_dump
Are you going to provide a rebased version? -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
On 22 March 2012 22:05, Andrew Dunstan wrote: > > > On 03/22/2012 05:49 PM, Bruce Momjian wrote: >> >> >> Robert Haas and I are disappointed by this change. I liked the fact >> that I could post nice-looking SQL queries without having to use my >> capslock key (which I use as a second control key). Any chance of >> reverting this change? >> > > Should it be governed by a setting? Perhaps, but I find the behaviour that was introduced by Peter's patch to be a more preferable default. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap_freeze_tuple locking requirements
Excerpts from Robert Haas's message of mié mar 21 21:50:24 -0300 2012: > heap_freeze_tuple() was apparently designed at one point to cope with > being called with either a shared or exclusive buffer lock. But none > of the current callers call it with a shared lock; they all call it > with an exclusive lock, except for the heap-rewrite code which doesn't > take (or need) a lock at all. > Since this is just dead code removal, I propose to apply this to 9.2. +1 -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
Robert Haas writes: > Add notion of a "transform function" that can simplify function calls. Why exactly was this thought to be a good idea: > * A NULL original expression disables use of transform functions while > * retaining all other behaviors. AFAICT that buys nothing except to greatly complicate the API specification for simplify_function, something that is now proving problematic for Marti's requested refactoring [1]. If it's inappropriate for a transform function to modify a CoerceViaIO call, surely the transform function can be expected to know that. regards, tom lane [1] http://archives.postgresql.org/pgsql-hackers/2012-03/msg00694.php -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
On Fri, Mar 23, 2012 at 11:51:16AM -0300, Alvaro Herrera wrote: > > Excerpts from Andrew Dunstan's message of jue mar 22 19:05:30 -0300 2012: > > > > On 03/22/2012 05:49 PM, Bruce Momjian wrote: > > > > > > Robert Haas and I are disappointed by this change. I liked the > > > fact that I could post nice-looking SQL queries without having > > > to use my capslock key (which I use as a second control key). > > > Any chance of reverting this change? > > > > > > > Should it be governed by a setting? > > A \set variable perhaps? +1 Would the old behavior be the default? +1 for defaulting to the old behavior. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
On Thu, Mar 22, 2012 at 06:05:30PM -0400, Andrew Dunstan wrote: > On 03/22/2012 05:49 PM, Bruce Momjian wrote: > >Robert Haas and I are disappointed by this change. I liked the > >fact that I could post nice-looking SQL queries without having to > >use my capslock key (which I use as a second control key). Any > >chance of reverting this change? > > > > Should it be governed by a setting? Something like (upper|lower|preserve) ? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
Excerpts from Andrew Dunstan's message of jue mar 22 19:05:30 -0300 2012: > > On 03/22/2012 05:49 PM, Bruce Momjian wrote: > > > > Robert Haas and I are disappointed by this change. I liked the fact > > that I could post nice-looking SQL queries without having to use my > > capslock key (which I use as a second control key). Any chance of > > reverting this change? > > > > Should it be governed by a setting? A \set variable perhaps? +1 Would the old behavior be the default? -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Finer Extension dependencies
Excerpts from Dimitri Fontaine's message of vie mar 23 11:05:37 -0300 2012:

> =# \d pg_extension_feature
>     Table "pg_catalog.pg_extension_feature"
>    Column   | Type | Modifiers
> ------------+------+-----------
>  extoid     | oid  | not null
>  extfeature | name | not null
> Indexes:
>     "pg_extension_feature_name_index" UNIQUE, btree (extfeature)
>     "pg_extension_feature_oid_index" UNIQUE, btree (oid)
>     "pg_extension_feature_extoid_name_index" btree (extoid, extfeature)
>
> We could maybe get rid of the (extoid, extfeature) index which is only
> used to get sorted output in list_extension_features() function, but I
> don't know how to do an ORDER BY scan without index in C (yet).
>
> The ordering is then used to maintain pg_depend when the list of
> provided features changes at upgrade time. We fetch the ordered list of
> “old” feature names then for each newly provided feature name we
> bsearch() the old list, which then needs to be properly ordered.

Hm, couldn't it be done simply with a qsort()? Presumably there aren't
many feature entries to sort ...

-- 
Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
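Alvaro's suggestion amounts to something like the following sketch: fetch the old feature names in whatever order the scan yields them, qsort() the list in memory, then bsearch() each newly provided name against it, with no ordered index scan required. Plain C strings stand in for NameData here, and the function name is invented for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator over an array of C-string pointers, usable by both
 * qsort() and bsearch(). */
static int cmp_name(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Return 1 if 'feature' is in the (possibly unsorted) old feature list,
 * 0 otherwise.  Sorts the list in place first, as qsort() replaces the
 * need for a pre-ordered index scan. */
int feature_was_provided(const char **old_features, size_t n,
                         const char *feature)
{
    qsort(old_features, n, sizeof(*old_features), cmp_name);
    return bsearch(&feature, old_features, n,
                   sizeof(*old_features), cmp_name) != NULL;
}
```

Since an extension provides at most a handful of features, the O(n log n) sort is negligible, which is presumably why Alvaro suggests it over maintaining an extra index.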
Re: [HACKERS] checkpoint patches
* Robert Haas (robertmh...@gmail.com) wrote:
> Well, how do you want to look at it?

I thought the last graph you provided was a useful way to view the results. It was my intent to make that clear in my prior email; my apologies if that didn't come through.

> Here's the data from 80th > percentile through 100th percentile - percentile, patched, unpatched, > difference - for the same two runs I've been comparing: [...] > 98 12100 24645 -12545 > 99 186043 201309 -15266 > 100 9513855 9074161 439694

Those are the areas that I think we want to be looking at/for: the outliers.

> By the way, I reran the tests on master with checkpoint_timeout=16min, > and here are the tps results: 2492.966759, 2588.750631, 2575.175993. > So it seems like not all of the tps gain from this patch comes from > the fact that it increases the time between checkpoints. Comparing > the median of three results between the different sets of runs, > applying the patch and setting a 3s delay between syncs gives you > about a 5.8% increase throughput, but also adds 30-40 seconds between > checkpoints. If you don't apply the patch but do increase time > between checkpoints by 1 minute, you get about a 5.0% increase in > throughput. That certainly means that the patch is doing something - > because 5.8% for 30-40 seconds is better than 5.0% for 60 seconds - > but it's a pretty small effect.

That doesn't surprise me too much. As I mentioned before (and Greg, please correct me if I'm wrong), I thought this patch was intended to reduce the latency spikes that we suffer from under some workloads, which can often be attributed back to i/o related contention. I don't believe it's intended or expected to seriously increase throughput.

> The picture looks similar here. Increasing checkpoint_timeout isn't > *quite* as good as spreading out the fsyncs, but it's pretty darn close. 
> For example, looking at the median of the three 98th > percentile numbers for each configuration, the patch bought us a 28% > improvement in 98th percentile latency. But increasing > checkpoint_timeout by a minute bought us a 15% improvement in 98th > percentile latency. So it's still not clear to me that the patch is > doing anything on this test that you couldn't get just by increasing > checkpoint_timeout by a few more minutes. Granted, it lets you keep > your inter-checkpoint interval slightly smaller, but that's not that > exciting. That having been said, I don't have a whole lot of trouble > believing that there are other cases where this is more worthwhile.

I could certainly see the checkpoint_timeout parameter, along with the others, as being sufficient to address this, in which case we likely don't need the patch. They're both more-or-less intended to do the same thing, and it's just a question of whether being more granular ends up helping or not.

Thanks,

Stephen
Re: [HACKERS] Unnecessary WAL archiving after failover
On Thu, Mar 22, 2012 at 12:56 AM, Robert Haas wrote:
> On Wed, Feb 29, 2012 at 5:48 AM, Fujii Masao wrote:
>> Hi,
>>
>> In streaming replication, after failover, the new master might have lots of un-applied WAL files with the old timeline ID. They are the WAL files which were recycled as future ones when the server was running as a standby. Since they will never be used later, they don't need to be archived after failover. But since they have neither a .ready nor a .done file in archive_status, checkpoints after failover newly create .ready files for them, and then finally they are archived, which might cause a disk I/O spike both in WAL and archive storage.
>>
>> To avoid the above problem, I think that un-applied WAL files with the old timeline ID should be marked as already-archived and recycled immediately at the end of recovery. Thought?
>
> I'm not an expert on this, but that makes sense to me.

Thanks for agreeing with my idea. On second thought, I found other issues with WAL archiving after failover, so let me clarify the issues again. Just after failover, there can be three kinds of WAL files in the new master's pg_xlog directory:

(1) WAL files which were recycled by a restartpoint

I've already explained upthread the issue which these WAL files cause after failover.

(2) WAL files which were restored from the archive

In 9.1 or before, the restored WAL files don't remain after failover because they are always restored onto the temporary filename "RECOVERYXLOG". So the issue which I explain from now on doesn't exist in 9.1 or before. In 9.2dev, as a result of supporting cascade replication, an archived WAL file is restored onto the correct file name so that a cascading walsender can send it to another standby. This restored WAL file has neither a .ready nor a .done archive status file. 
After failover, checkpoint checks the archive status file of the restored WAL file to attempt to recycle it, finds that it has neither .ready nor .done, and creates .ready. Because of the existence of .ready, it will be archived again even though it obviously already exists in the archival storage :(

To prevent a restored WAL file from being archived again, I think that .done should be created whenever a WAL file is successfully restored (of course this should happen only when archive_mode is enabled). Thought? Since this is an oversight of cascade replication, I'm thinking to implement the patch for 9.2dev.

(3) WAL files which were streamed from the master

These WAL files also don't have any archive status, so checkpoint creates .ready for them after failover. And then, all or many of them will be archived at a time, which would cause an I/O spike on both WAL and archival storage. To avoid this problem, I think that we should change walreceiver so that it creates .ready as soon as it completes a WAL file. Also we should change the archiver process so that it starts up even in standby mode and archives the WAL files.

If each server has its own archival storage, the above solution would work fine. But if all servers share the archival storage, multiple archiver processes in those servers might archive the same WAL file to the shared area at the same time. Is this OK? If not, to avoid this, we might need to separate archive_mode into two: one for normal mode (i.e., master), another for standby mode. If the archive is shared, we can ensure that only one archiver in the master copies the WAL file at a time by disabling WAL archiving in standby mode but enabling it in normal mode. Thought?

Invoking the archiver process in standby mode is a new feature, not a bug fix. It's too late to propose new features for 9.2, so I'll propose this for 9.3. 
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
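The fix proposed for case (2) — create .done as soon as a segment is restored, so a later checkpoint never invents a .ready for it and re-archives it — boils down to building the archive_status path for the segment and touching that file. A minimal sketch with a hypothetical helper name and simplified error handling; the real server code would actually create the empty file under pg_xlog/archive_status and fsync it:

```c
#include <stdio.h>

/* Build the ".done" archive-status path for a just-restored WAL segment,
 * e.g. "pg_xlog/archive_status/000000010000000000000001.done".
 * Returns 0 on success, -1 if the path would not fit in the buffer.
 * (Creating and fsync'ing the file itself is omitted here.) */
int mark_wal_as_archived(const char *xlog_dir, const char *segment,
                         char *path_out, size_t path_len)
{
    int n = snprintf(path_out, path_len, "%s/archive_status/%s.done",
                     xlog_dir, segment);
    if (n < 0 || (size_t) n >= path_len)
        return -1;               /* path would be truncated */
    return 0;
}
```

Calling this right after a successful restore_command (only when archive_mode is on) would leave the segment with a .done marker, so the post-failover checkpoint recycles it instead of queueing it for archiving.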
Re: [HACKERS] Regarding column reordering project for GSoc 2012
On Wed, Mar 21, 2012 at 10:51 PM, Atri Sharma wrote: > Please let me know how to proceed further. > > Waiting for your reply, sure -- let's take this discussion off line. send me a private mail and we'll discuss if/how we can get this off the ground. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 6:13 AM, Fujii Masao wrote: > On Fri, Mar 23, 2012 at 5:56 PM, Magnus Hagander wrote: >> Might it be a good idea to put it on it's own row instead of changing >> the format of an existing row, in order not to break scripts and >> programs that are parsing the previous output? > > Good idea! What row name should we use for the WAL file containing > REDO record? "Latest checkpoint's REDO file"? Sounds good to me. I like the idea, too. The status quo is an unnecessary nuisance, so this will be a nice usability improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoint patches
On Thu, Mar 22, 2012 at 8:44 PM, Stephen Frost wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Thu, Mar 22, 2012 at 3:45 PM, Stephen Frost wrote:
>> > Well, those numbers just aren't that exciting. :/
>>
>> Agreed. There's clearly an effect, but on this test it's not very big.
>
> Ok, perhaps that was because of how you were analyzing it using the 90th
> percetile..?

Well, how do you want to look at it? Here's the data from the 80th percentile through the 100th percentile - percentile, patched, unpatched, difference - for the same two runs I've been comparing:

 80      1321     1348     -27
 81      1333     1360     -27
 82      1345     1373     -28
 83      1359     1387     -28
 84      1373     1401     -28
 85      1388     1417     -29
 86      1404     1434     -30
 87      1422     1452     -30
 88      1441     1472     -31
 89      1462     1494     -32
 90      1487     1519     -32
 91      1514     1548     -34
 92      1547     1582     -35
 93      1586     1625     -39
 94      1637     1681     -44
 95      1709     1762     -53
 96      1825     1905     -80
 97      2106     2288     -182
 98      12100    24645    -12545
 99      186043   201309   -15266
100      9513855  9074161  439694

Here are the 95th-100th percentiles for each of the six runs:

ckpt.checkpoint-sync-pause-v1.10: 1709, 1825, 2106, 12100, 186043, 9513855
ckpt.checkpoint-sync-pause-v1.11: 1707, 1824, 2118, 16792, 196107, 8869602
ckpt.checkpoint-sync-pause-v1.12: 1693, 1807, 2091, 15132, 191207, 7246326
ckpt.master.10: 1734, 1875, 2235, 21145, 203214, 6855888
ckpt.master.11: 1762, 1905, 2288, 24645, 201309, 9074161
ckpt.master.12: 1746, 1889, 2272, 20309, 194459, 7833582

By the way, I reran the tests on master with checkpoint_timeout=16min, and here are the tps results: 2492.966759, 2588.750631, 2575.175993. So it seems like not all of the tps gain from this patch comes from the fact that it increases the time between checkpoints. Comparing the median of three results between the different sets of runs, applying the patch and setting a 3s delay between syncs gives you about a 5.8% increase in throughput, but also adds 30-40 seconds between checkpoints. 
If you don't apply the patch but do increase time between checkpoints by 1 minute, you get about a 5.0% increase in throughput. That certainly means that the patch is doing something - because 5.8% for 30-40 seconds is better than 5.0% for 60 seconds - but it's a pretty small effect. And here are the latency results for 95th-100th percentile with checkpoint_timeout=16min. ckpt.master.13: 1703, 1830, 2166, 17953, 192434, 43946669 ckpt.master.14: 1728, 1858, 2169, 15596, 187943, 9619191 ckpt.master.15: 1700, 1835, 2189, 22181, 206445, 8212125 The picture looks similar here. Increasing checkpoint_timeout isn't *quite* as good as spreading out the fsyncs, but it's pretty darn close. For example, looking at the median of the three 98th percentile numbers for each configuration, the patch bought us a 28% improvement in 98th percentile latency. But increasing checkpoint_timeout by a minute bought us a 15% improvement in 98th percentile latency. So it's still not clear to me that the patch is doing anything on this test that you couldn't get just by increasing checkpoint_timeout by a few more minutes. Granted, it lets you keep your inter-checkpoint interval slightly smaller, but that's not that exciting. That having been said, I don't have a whole lot of trouble believing that there are other cases where this is more worthwhile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
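For reference, percentile tables like the ones above can be derived from raw per-transaction latencies with a nearest-rank calculation along these lines. This is a sketch: the exact method the tooling in this thread used isn't stated, so results could differ slightly at the tails.

```c
#include <stdlib.h>

/* Comparator for qsort over long latency samples. */
static int cmp_long(const void *a, const void *b)
{
    long la = *(const long *) a, lb = *(const long *) b;
    return (la > lb) - (la < lb);
}

/* Nearest-rank p-th percentile of n latency samples (0 < p <= 100).
 * Sorts the samples in place, then picks the ceil(p*n/100)-th smallest. */
long latency_percentile(long *samples, size_t n, int p)
{
    qsort(samples, n, sizeof(long), cmp_long);
    size_t rank = ((size_t) p * n + 99) / 100;   /* ceil(p*n/100), 1-based */
    if (rank == 0)
        rank = 1;
    return samples[rank - 1];
}
```

Running this over each test's per-transaction latencies at p = 80..100 would reproduce a table of the shape shown above, and makes clear why the 98th-100th rows are the interesting ones: the tail ranks land on the checkpoint-induced outliers.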
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 5:56 PM, Magnus Hagander wrote: > Might it be a good idea to put it on it's own row instead of changing > the format of an existing row, in order not to break scripts and > programs that are parsing the previous output? Good idea! What row name should we use for the WAL file containing REDO record? "Latest checkpoint's REDO file"? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 10:51, Fujii Masao wrote: > On Fri, Mar 23, 2012 at 2:06 PM, Jaime Casanova wrote: >> On Thu, Mar 22, 2012 at 11:06 PM, Fujii Masao wrote: >>> >>> We can use >>> pg_xlogfile_name function to calculate that, but it cannot be executed in >>> the standby. Another problem is that pg_xlogfile_name always uses >>> current timeline for the calculation, so if the reported timeline is not >>> the same as current one, pg_xlogfile_name cannot return the correct WAL >>> file name. Making pg_controldata report that WAL file name gets rid of >>> such a complexity. >>> >> >> i would think that pg_xlogfile_name() is not allowed in the standby >> because ThisTimelineId is not very well defined in recovery but if you >> extend pg_xlogfile_name() to also receive a timelineid as you >> suggested in [1] then that version of the function could be allowed in >> the standby. >> or there is something else i'm missing? >> >> is that enough for you to solve your problem? > > Yes, we can more easily calculate the cutoff point by using that extended > pg_xlogfile_name(). But if pg_controldata reports the WAL file name, we > can calculate the cutoff point without starting the server. So I think that > it's worth changing pg_controldata that way even if we extend > pg_xlogfile_name(). +1 - I think they're both useful things, each on its own. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 2:06 PM, Jaime Casanova wrote: > On Thu, Mar 22, 2012 at 11:06 PM, Fujii Masao wrote: >> >> We can use >> pg_xlogfile_name function to calculate that, but it cannot be executed in >> the standby. Another problem is that pg_xlogfile_name always uses >> current timeline for the calculation, so if the reported timeline is not >> the same as current one, pg_xlogfile_name cannot return the correct WAL >> file name. Making pg_controldata report that WAL file name gets rid of >> such a complexity. >> > > i would think that pg_xlogfile_name() is not allowed in the standby > because ThisTimelineId is not very well defined in recovery but if you > extend pg_xlogfile_name() to also receive a timelineid as you > suggested in [1] then that version of the function could be allowed in > the standby. > or there is something else i'm missing? > > is that enough for you to solve your problem? Yes, we can more easily calculate the cutoff point by using that extended pg_xlogfile_name(). But if pg_controldata reports the WAL file name, we can calculate the cutoff point without starting the server. So I think that it's worth changing pg_controldata that way even if we extend pg_xlogfile_name(). Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
Thank you for picking this up.

> is considering three cases: it got a 2-byte integer (and can continue on), > or there aren't yet 2 more bytes available in the buffer, in which case it > should return EOF without doing anything, or pqGetInt detected a hard > error and updated the connection error state accordingly, in which case > again there is nothing to do except return EOF. In the patched code we > have: ... > which handles neither the second nor third case correctly: it thinks that > "data not here yet" is a hard error, and then makes sure it is an error by > destroying the parsing state :-(.

Marko and I think that, in protocol 3, all bytes of the incoming message should already have been loaded when entering getAnotherTuple(). The following part in pqParseInput3() does this:

| if (avail < msgLength)
| {
|     /*
|      * Before returning, enlarge the input buffer if needed to hold
|      * the whole message.
|     (snipped)..
|      */
|     if (pqCheckInBufferSpace(conn->inCursor + (size_t) msgLength,
|                              conn))
|     {
|         /*
|          * XXX add some better recovery code...
|         (snipped)..
|          */
|         handleSyncLoss(conn, id, msgLength);
|     }
|     return;

So, if the cursor state is broken just after exiting getAnotherTuple(), it had already been broken BEFORE entering getAnotherTuple(), according to the current design. That is what the 'protocol error' means. pqGetInt there should not detect any errors except a broken message.

> error, that possibly-useful error message is overwritten with an entirely > useless "protocol error" text.

Plus, the current pqGetInt seems to set its own error message only for a wrong 'bytes' parameter. On the other hand, in protocol 2 (to be removed?) the error handling mechanism does get touched, because a full load of the message is not guaranteed.

> I don't think the error return cases for the row processor have been > thought out too well either. 
> The row processor is not in charge of what > happens to the PGresult,

The default row processor stuffs the PGresult with tuples; another (say that of dblink) leaves it empty. Each row processor manages the PGresult to whatever extent it chooses.

> and it certainly has no business telling libpq to just "exit > immediately from the topmost libpq function". If we do that > we'll probably lose sync with the data stream and be unable to > recover use of the connection at all.

I don't think PGresult has any responsibility for the error handling system in the current implementation. The phrase 'exit immediately from the topmost libpq function' does not appear in the patch. The exit routes from the row processor are the following:

- Do longjmp (or use the PG_TRY-CATCH mechanism) out of the row processor.
- The row processor returns 0 when entered from PQisBusy(), which immediately exits from PQisBusy().

Cursor consistency will be kept in both cases. The cursor will already be just past the last byte of the current message.

> Also, do we need to consider any error cases for the row > processor other than out-of-memory? If so it might be a good > idea for it to have some ability to store a custom error > message into the PGconn, which it cannot do given the current > function API.

It does not seem strongly necessary for dblink or PQgetRow compared to the additional complexity it would bring, so this patch does not include it.

> In the same vein, I am fairly uncomfortable with the blithe assertion that > a row processor can safely longjmp out of libpq. This was never foreseen > in the original library coding and there are any number of places that > that might break, now or in the future. Do we really need to allow it?

To protect the row processor from longjmp'ing out, I enclosed the functions that potentially throw exceptions in a PG_TRY-CATCH clause in an early version. This was totally safe, but the penalty was not negligible because the TRY-CATCH was entered for every row. 
> If we do, it would be a good idea to decorate the libpq > functions that are now expected to possibly longjmp with > comments saying so. Tracing all the potential call paths that > might be aborted by a longjmp is an essential activity anyway.

For the present code, if not the future, I can show the trail of the confirmation process.

- There is no difference between with and without the patch at the level of getAnotherTuple() from the viewpoint of consistency.
- Assume pqParseInput3 detects that the next message has not arrived after getAnotherTuple returned. It exits immediately on reading the length of the next message. This is the same condition as longjumping.

> if (pqGetInt(&msgLength, 4, conn)) > return;

- parseInput passes it through and immediately exits in a consistent state.
- The caller of PQgetResult, PQisBusy, PQskipResult, PQnotifies, PQputCopyData, or pqHandleSendFailure finally regains control. I am convinced that the async status at that time must be PGASYNC_BUSY and the conn cursor in a consistent state.

So the ancesto
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 05:06, Fujii Masao wrote: > Hi, > > I'd like to propose to change pg_controldata so that it reports the name > of WAL file containing the latest checkpoint's REDO record, as follows: > > $ pg_controldata $PGDATA > ... > Latest checkpoint's REDO location: 0/16D6ACC (file > 00010001) > Latest checkpoint's TimeLineID: 1 > ... > > This simplifies very much the way to calculate the archive file cutoff point > because the reported WAL file is just cutoff point itself. If the file name is > not reported, we have to calculate the cutoff point from the reported > location and timeline, which is complicated calculation. We can use > pg_xlogfile_name function to calculate that, but it cannot be executed in > the standby. Another problem is that pg_xlogfile_name always uses > current timeline for the calculation, so if the reported timeline is not > the same as current one, pg_xlogfile_name cannot return the correct WAL > file name. Making pg_controldata report that WAL file name gets rid of > such a complexity. > > You may think that archive_cleanup_command is usable for that purpose. > That's true. But it's not usable simply for the case where there are more > than one standby servers. In this case, the archive file cutoff point needs > to be calculated from each standby's REDO location and timeline. > > Attached patch changes pg_controldata as above. Thought? Might it be a good idea to put it on its own row instead of changing the format of an existing row, in order not to break scripts and programs that are parsing the previous output? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Fri, Mar 23, 2012 at 12:26 AM, Daniel Farina wrote: > Some time ago I reported bug 6291[0], which reported a Xid wraparound, > both as reported in pg_controldata and by txid_current_snapshot. > Unfortunately, nobody could reproduce it. > > Today, the same system of ours just passed the wraparound mark > successfully at this time, incrementing the epoch. However, two > standbys have not done the same: they have wrapped to a low txid. At > this time, pg_controldata does report the correct epoch, as I read it, > unlike the original case. > > I have not yet tried to reproduce this in a minimal way, but I wanted > to relate this information as soon as possible. > > These systems are 9.0.6, on Ubuntu 10.04 LTS, amd64. > > [0]: http://archives.postgresql.org/pgsql-bugs/2011-11/msg00094.php So we have this? Master pg_controldata - OK txid_current_snapshot() - OK Standby pg_controldata - OK txid_current_snapshot() - lower value Are there just 2 standbys? So all standbys have acted identically? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers