Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
> Date: Thu, 22 Mar 2012 13:17:01 -0400
> Subject: Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema
> From: cbbro...@gmail.com
> To: kevin.gritt...@wicourts.gov
> CC: pgsql-hackers@postgresql.org
>
> On Thu, Mar 22, 2012 at 12:38 PM, Kevin Grittner wrote:
> > Tom Lane wrote:
> >> Robert Haas writes:
> >>> Well, the standard syntax apparently aims to reduce the number of returned rows, which ORDER BY does not. Maybe you could do it with ORDER BY .. LIMIT, but the idea here I think is that we'd like to sample the table without reading all of it first, so that seems to miss the point.
> >>
> >> I think actually the traditional locution is more like WHERE random() < constant, where the constant is the fraction of the table you want. And yeah, the presumption is that you'd like it to not actually read every row. (Though unless the sampling density is quite a bit less than 1 row per page, it's not clear how much you're really going to win.)
> >
> > It's all going to depend on the use cases, which I don't think I've heard described very well yet.
> >
> > I've had to pick random rows from, for example, a table of disbursements to support a financial audit. In those cases it has been the sample size that mattered, and order didn't. One interesting twist there is that for some of these financial audits they wanted the probability of a row being selected to be proportional to the dollar amount of the disbursement. I don't think you can do this without a first pass across the whole data set.
>
> This one was commonly called "Dollar Unit Sampling," though the terminology has gradually gotten internationalized.
> http://www.dummies.com/how-to/content/how-does-monetary-unit-sampling-work.html
>
> What the article doesn't mention is that some particularly large items might wind up covering multiple samples. In the example, they're looking for a sample every $3125 down the list. If there was a single transaction valued at $3, that (roughly) covers 10 of the desired samples.
>
> It isn't possible to do this without scanning across the entire table.
>
> If you want repeatability, you probably want to instantiate a copy of enough information to indicate the ordering chosen. That's probably something that needs to be captured as part of the work of the audit, so not only does it need to involve a pass across the data, it probably requires capturing a fair bit of data for posterity.
> --
> When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"

The discussion so far has gone far beyond my understanding. Could anyone briefly explain the current idea? The design details are still unfamiliar to me; I can only take the time to read the relevant material after being selected and putting time into it. For now, I'm still curious why Neil's implementation no longer works. Postgres has been patched a lot since then, but the general idea behind Neil's implementation should still hold, shouldn't it? Besides, whether this query feature is needed at all is still undecided, which seems to be another difficult point. Is this topic simply not ready for GSoC yet? If so, I think I still have time to switch to another topic. Any suggestions? Thanks.

Best Regards and Thanks,
Huang Qi Victor
Computer Science, National University of Singapore
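The fixed-interval "Dollar Unit Sampling" Chris describes can be sketched in a few lines. This is only an illustration of the idea from the thread, not code from any patch; the function name, the `interval` value, and the row layout are all invented here:

```python
def dollar_unit_sample(rows, interval):
    """Pick rows with probability proportional to their dollar amount:
    walk a running total and take a sample at every `interval` dollars.
    A single large row can cover several sampling points, so it may be
    selected more than once -- the point Chris makes above."""
    samples = []
    next_point = interval
    running = 0.0
    for row_id, amount in rows:
        running += amount
        # Take every sampling point this row's amount sweeps past.
        while running >= next_point:
            samples.append(row_id)
            next_point += interval
    return samples

# A single $31,250 transaction against a $3,125 interval covers
# ten sampling points, so it is selected ten times.
print(dollar_unit_sample([("big", 31250.0)], 3125.0))
```

Note that, as the thread observes, this requires one full pass over the running dollar total, so it cannot avoid scanning the whole table.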
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
I saw Kyotaro already answered, but I'll give my view as well.

On Thu, Mar 22, 2012 at 06:07:16PM -0400, Tom Lane wrote:
> AFAICT it breaks async processing entirely, because many changes have been made that fail to distinguish "insufficient data available as yet" from "hard error". As an example, this code at the beginning of getAnotherTuple:
>
>     /* Get the field count and make sure it's what we expect */
>     if (pqGetInt(&tupnfields, 2, conn))
>   !     return EOF;
>
> is considering three cases: it got a 2-byte integer (and can continue on), or there aren't yet 2 more bytes available in the buffer, in which case it should return EOF without doing anything, or pqGetInt detected a hard error and updated the connection error state accordingly, in which case again there is nothing to do except return EOF. In the patched code we have:
>
>     /* Get the field count and make sure it's what we expect */
>     if (pqGetInt(&tupnfields, 2, conn))
>   !     {
>   !         /* Whole the message must be loaded on the buffer here */
>   !         errmsg = libpq_gettext("protocol error\n");
>   !         goto error_and_forward;
>   !     }
>
> which handles neither the second nor third case correctly: it thinks that "data not here yet" is a hard error, and then makes sure it is an error by destroying the parsing state :-(. And if in fact pqGetInt did log an error, that possibly-useful error message is overwritten with an entirely useless "protocol error" text.

No, "protocol error" really is the only error case here:
- pqGetInt() does not set errors.
- The V3 getAnotherTuple() is called only once the packet is fully in the buffer.

> I don't think the error return cases for the row processor have been thought out too well either. The row processor is not in charge of what happens to the PGresult, and it certainly has no business telling libpq to just "exit immediately from the topmost libpq function". If we do that we'll probably lose sync with the data stream and be unable to recover use of the connection at all. Also, do we need to consider any error cases for the row processor other than out-of-memory?

No, the rule is *not* "exit to topmost", but "exit PQisBusy()". This is exactly so that any code that does not expect row-processor behaviour continues to work. From the programmer's point of view, it also means the row-processor callback causes minimal changes to existing APIs.

> If so it might be a good idea for it to have some ability to store a custom error message into the PGconn, which it cannot do given the current function API.

There already was such a function, but it was a row-processor-specific hack that could leak out and create confusion, so I rejected it. Instead there should be a generic error-setting function, equivalent to the current libpq-internal error setting. But such a generic error-setting function would require reviewing all libpq error states, as it allows an error state to appear in new situations. We would also need well-defined behaviour for client-side errors vs. incoming server errors. Considering that even the current cut-down patch is troubling committers, I would definitely suggest postponing such a generic error setter to 9.3, especially as it does not change anything coding-style-wise.

> In the same vein, I am fairly uncomfortable with the blithe assertion that a row processor can safely longjmp out of libpq. This was never foreseen in the original library coding and there are any number of places that that might break, now or in the future. Do we really need to allow it? If we do, it would be a good idea to decorate the libpq functions that are now expected to possibly longjmp with comments saying so. Tracing all the potential call paths that might be aborted by a longjmp is an essential activity anyway.

I think we *should* allow exceptions, but only from a limited set of APIs. Basically, the usefulness to users vs. the effort from our side is clearly on the side of providing it. But it is up to us to define what *limited* means (what needs the least effort from us), so that later, when users want to use exceptions in a callback, they pick the right API. Currently it seems only PQexec() with multiple SELECTs can give trouble, as the previous PGresult is kept on the stack. Should we refuse to support PQexec, or multiple SELECTs? But such a case is broken even without exceptions - or at least very confusing. I'm not sure what to do about it.

In any case, "decorating" libpq functions is the wrong approach. It suggests that the caller of e.g. PQexec() needs to take care of any possible behaviour of an unknown callback. This will not work. Instead, the allowed functions should simply be listed in the row-processor documentation. Basically, a custom callback should always be matched by a caller that knows about it and knows how to handle it. I'm not sure how to put such a suggestion into the documentation, though.

> Another design deficiency is PQskipResult(). This is badly designed for
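The three-way outcome Tom describes for pqGetInt (success / not-enough-data-yet / hard error) is easiest to see outside of C. The following Python sketch is purely illustrative - the class and names are invented here, not libpq's API - and shows why "return EOF without touching state" matters for async parsing: the partial read must leave the buffer position intact so the caller can simply retry once more bytes arrive.

```python
# Hypothetical illustration of the contract Tom describes: a reader must
# distinguish "not enough bytes yet" from a hard error, and must consume
# nothing and destroy no state in the not-enough-bytes case.

NEED_MORE = object()   # analogous to returning EOF with no error set

class Conn:
    def __init__(self):
        self.buf = b""   # bytes received so far
        self.pos = 0     # parse position within buf
        self.error = None

    def get_int16(self):
        """Return the next big-endian 2-byte integer, or NEED_MORE if
        fewer than 2 bytes are buffered.  (A hard-error path would set
        self.error; it is omitted from this sketch.)"""
        if len(self.buf) - self.pos < 2:
            return NEED_MORE          # caller retries later; nothing consumed
        raw = self.buf[self.pos:self.pos + 2]
        self.pos += 2
        return int.from_bytes(raw, "big")

conn = Conn()
conn.buf = b"\x00"                    # only one byte has arrived so far
assert conn.get_int16() is NEED_MORE  # not an error: just wait for more data
conn.buf += b"\x07"                   # the second byte arrives
assert conn.get_int16() == 7          # now the read succeeds
```

The bug Tom points out corresponds to treating the NEED_MORE outcome as a protocol error and resetting `pos` and `error` - after which the connection can never resynchronize.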
Re: [HACKERS] Refactoring simplify_function (was: Caching constant stable expressions)
On Sat, Mar 24, 2012 at 01:17, Tom Lane wrote:
> I've applied a slightly-modified version of this after reconciling it with the protransform fixes.

Cool, thanks!

> I assume you are going to submit a rebased version of the main CacheExpr patch?

Yeah, I'm still working on addressing the comments from your last email. Haven't had much time to work on it for the last 2 weeks, but I hope to finish most of it this weekend.

Regards,
Marti

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring simplify_function (was: Caching constant stable expressions)
Marti Raudsepp writes:
> Per Tom's request, I split out this refactoring from my CacheExpr patch. Basically I'm just centralizing the eval_const_expressions_mutator() call on function arguments, from multiple different places to a single location. Without this, it would be a lot harder to implement argument caching logic in the CacheExpr patch.

I've applied a slightly-modified version of this after reconciling it with the protransform fixes. I assume you are going to submit a rebased version of the main CacheExpr patch?

regards, tom lane
Re: [HACKERS] Finer Extension dependencies
Alvaro Herrera writes:
> So you still need an index on (oid), one on (extoid), and one on (extfeature).

Yes. And since the main use case for the index on (extoid) is listing a given extension's features, which we want ordered by name, the set of indexes I've been defining is now:

Indexes:
    "pg_extension_feature_name_index" UNIQUE, btree (extfeature)
    "pg_extension_feature_oid_index" UNIQUE, btree (oid)
    "pg_extension_feature_extoid_name_index" btree (extoid, extfeature)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Finer Extension dependencies
Excerpts from Dimitri Fontaine's message of vie mar 23 16:51:57 -0300 2012:
> Alvaro Herrera writes:
> >> Yes, for pg_depend; no, I don't know how to make that work with pointing to the extensions directly, because the whole point here is to be able to depend on a feature rather than the whole extension.
> >
> > Yes, I understand that -- but would it work to have the feature resolution be done at install/upgrade time, and once it's resolved, you record it by storing the extension that contains the feature? That way
>
> I don't think so, because at upgrade time you then typically only have the new .control file with the new set of features, and you need to act on the difference between the old and new features compared to the other packages' current dependencies on them.

Aha, right. So you still need an index on (oid), one on (extoid), and one on (extfeature).

--
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Apology to the community
On Fri, Mar 23, 2012 at 11:43:25AM -0700, Joshua D. Drake wrote:
>
> Hello,
>
> It has been brought to my attention a few times over the last year that I have been over the top in my presentation of myself and have in fact alienated and offended many of the community. To be honest I am unaware of everything I have done but I do take the opinion of those who have taken the time to point it out to me seriously. They have been peers, friends, and community members, some of them for over a decade.
>
> The last year has been very trying personally. If you wish to know details, please email me directly. Although, I know that I can be a difficult personality even at the best of times, I have always tried to keep the community's best interest at heart. It is this past year, and the trials associated with it, that brought about frankly what I would consider some of the worst of who I am.
>
> With that, I would like to apologize directly to this community that has provided me with so much, not just professionally but personally.

I want to commend Joshua Drake for the honesty and transparency of his comments above, and for its sentiments. Thank you, Josh.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Apology to the community
JD,

> With that, I would like to apologize directly to this community that has provided me with so much, not just professionally but personally.

Thank you for the apology.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Finer Extension dependencies
Alvaro Herrera writes:
>> Yes, for pg_depend; no, I don't know how to make that work with pointing to the extensions directly, because the whole point here is to be able to depend on a feature rather than the whole extension.
>
> Yes, I understand that -- but would it work to have the feature resolution be done at install/upgrade time, and once it's resolved, you record it by storing the extension that contains the feature? That way

I don't think so, because at upgrade time you then typically only have the new .control file with the new set of features, and you need to act on the difference between the old and new features compared to the other packages' current dependencies on them. For that to work you need to remember the exact set of per-feature dependencies between extensions. You can't trust the control files to reflect the reality you saw when installing or last updating.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
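Dimitri's point - that an upgrade must act on the difference between the old (recorded) feature set and the new one from the .control file - is essentially set arithmetic over feature names. A hypothetical sketch, with all names invented here for illustration:

```python
def feature_diff(old_features, new_features):
    """What an extension upgrade must reconcile: removed features may
    break other extensions that depend on them; added features become
    newly available to depend on; kept features need no action."""
    old, new = set(old_features), set(new_features)
    return {
        "removed": old - new,   # must check dependent extensions
        "added":   new - old,   # can now satisfy new dependencies
        "kept":    old & new,   # unchanged
    }

# Hypothetical upgrade: one feature dropped, one added.
d = feature_diff({"hstore", "hstore_ops"}, {"hstore", "hstore_json"})
assert d["removed"] == {"hstore_ops"}
assert d["added"] == {"hstore_json"}
```

This also motivates Dimitri's remark that the recorded per-feature dependencies, not the control files, must be the source of truth for `old_features`.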
Re: [HACKERS] patch for parallel pg_dump
On Fri, Mar 23, 2012 at 11:11 AM, Alvaro Herrera wrote:
> Are you going to provide a rebased version?

Yes, working on that.
Re: [HACKERS] CREATE FOREGIN TABLE LACUNA
On Fri, Mar 23, 2012 at 11:38:56AM -0700, David Fetter wrote:
> On Thu, Mar 15, 2012 at 11:23:43AM -0300, Alvaro Herrera wrote:
> > Excerpts from David Fetter's message of jue mar 15 02:28:28 -0300 2012:
> > > On Wed, Mar 14, 2012 at 12:06:20PM -0400, Robert Haas wrote:
> > > > On Wed, Mar 14, 2012 at 10:22 AM, David Fetter wrote:
> > > > >> I think that instead of inventing new grammar productions and a new node type for this, you should just reuse the existing productions for LIKE clauses and then reject invalid options during parse analysis.
> > > > >
> > > > > OK. Should I first merge CREATE FOREIGN TABLE with CREATE TABLE and submit that as a separate patch?
> > > >
> > > > I don't see any reason to do that. I merely meant that you could reuse TableLikeClause or maybe even TableElement in the grammar for CreateForeignTableStmt.
> > >
> > > Next WIP patch attached implementing this via reusing TableLikeClause and refactoring transformTableLikeClause().
> > >
> > > What say?
> >
> > Looks much better to me, but the use of strcmp() doesn't look good. ISTM that stmtType is mostly used for error messages. I think you should add some kind of identifier (such as the original parser Node) into the CreateStmtContext so that you can do an IsA() test instead -- a bit more invasive as a patch, but much cleaner.
> >
> > Also the error messages need more work.
>
> How about this one?

Oops, forgot to put the latest docs in.

Cheers,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

*** a/doc/src/sgml/ref/create_foreign_table.sgml
--- b/doc/src/sgml/ref/create_foreign_table.sgml
*** 19,26 ****
  CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name ( [
!   { column_name data_type [ OPTIONS ( option 'value' [, ... ] ) ] [ NULL | NOT NULL ] }
!     [, ... ]
  ] )
    SERVER server_name [ OPTIONS ( option 'value' [, ... ] ) ]
--- 19,26 ----
  CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name ( [
!   { { column_name data_type [ NULL | NOT NULL ] | LIKE source_table } [, ... ]
!     [ OPTIONS ( option 'value' [, ... ] ) ] }
  ] )
    SERVER server_name [ OPTIONS ( option 'value' [, ... ] ) ]

*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*** 3945,3950 ForeignTableElementList:
--- 3945,3951 ----
  ForeignTableElement:
        columnDef { $$ = $1; }
+       | TableLikeClause { $$ = $1; }
        ;

  /*

*** a/src/backend/parser/parse_utilcmd.c
--- b/src/backend/parser/parse_utilcmd.c
*** 66,71 typedef struct
--- 66,72 ----
  {
        ParseState *pstate;     /* overall parser state */
        const char *stmtType;   /* "CREATE [FOREIGN] TABLE" or "ALTER TABLE" */
+       char        relkind;    /* r = ordinary table, f = foreign table, cf. pg_catalog.pg_class */
        RangeVar   *relation;   /* relation to create */
        Relation    rel;        /* opened/locked rel, if ALTER */
        List       *inhRelations;   /* relations to inherit from */

*** 194,202 transformCreateStmt(CreateStmt *stmt, const char *queryString)
--- 195,209 ----
        cxt.pstate = pstate;
        if (IsA(stmt, CreateForeignTableStmt))
+       {
                cxt.stmtType = "CREATE FOREIGN TABLE";
+               cxt.relkind = 'f';
+       }
        else
+       {
                cxt.stmtType = "CREATE TABLE";
+               cxt.relkind = 'r';
+       }
        cxt.relation = stmt->relation;
        cxt.rel = NULL;
        cxt.inhRelations = stmt->inhRelations;

*** 623,629 transformTableConstraint(CreateStmtContext *cxt, Constraint *constraint)
  /*
   * transformTableLikeClause
   *
!  * Change the LIKE portion of a CREATE TABLE statement into
   * column definitions which recreate the user defined column portions of .
   */
--- 630,636 ----
  /*
   * transformTableLikeClause
   *
!  * Change the LIKE portion of a CREATE [FOREIGN] TABLE statement into
   * column definitions which recreate the user defined column portions of .
   */

*** 652,657 transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause *table_like_cla
--- 659,683 ----
                                        table_like_clause->relation->relname)));
        cancel_parser_errposition_callback(&pcbstate);
+
+       /*
+        * For foreign tables, disallow some options.
+        */
+       if (cxt->relkind ==
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 1:51 PM, Greg Stark wrote:
> Well it's not entirely unlikely. If you step back a web application looks like a big loop with a switch statement to go to different pages. It keeps executing the same loop over and over again and there are only a smallish number of web pages. Sure the bind variables change but there will only be so many bind values and 10% of those will get 90% of the traffic too.

That may be true, but lots of web applications have millions of users. The fact that a few hundred thousand of those may account for most of the traffic doesn't seem like it's going to help much unless there are not many users in total; and in that case it's plenty fast enough without a cache anyway.

> But the other thing that happens is that people run multiple queries aggregating or selecting from the same subset of data. So you often get things like
>
>     select count(*) from ()
>     select * from () order by foo limit 10
>     select * from () order by bar limit 10
>
> for the same . That means if we could cache the rows coming out of parts of the plan and remember those rows when we see a plan with a common subtree in the plan then we could avoid a lot of repetitive work.

Currently, we don't even recognize this situation within a plan; for example, if you do project pp LEFT JOIN person sr ON pp.sales_rep_id = sr.id LEFT JOIN person pm ON pp.project_manager_id = pm.id, the query planner will happily seq-scan the person table twice to build two copies of the same hash table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Apology to the community
On Fri, Mar 23, 2012 at 1:43 PM, Joshua D. Drake wrote:
>
> Hello,
>
> It has been brought to my attention a few times over the last year that I have been over the top in my presentation of myself and have in fact alienated and offended many of the community. To be honest I am unaware of everything I have done but I do take the opinion of those who have taken the time to point it out to me seriously. They have been peers, friends, and community members, some of them for over a decade.
>
> The last year has been very trying personally. If you wish to know details, please email me directly. Although, I know that I can be a difficult personality even at the best of times, I have always tried to keep the community's best interest at heart. It is this past year, and the trials associated with it, that brought about frankly what I would consider some of the worst of who I am.
>
> With that, I would like to apologize directly to this community that has provided me with so much, not just professionally but personally.

does this mean we have to take down the "Joshua D. Drake" dartboard?

merlin
Re: [HACKERS] query cache
Robert Haas writes:
> What I think is more common is the repeated submission of queries that are *nearly* identical, but with either different parameter bindings or different constants. It would be nice to have some kind of cache that would allow us to avoid the overhead of parsing and planning nearly identical statements over and over again, but the trick is that you have to fingerprint the query to notice that's happening in the first place, and the fingerprinting has to cost less than what the cache saves you. I don't know whether that's possible, but I suspect it's far from easy.

The traditional solution to this is to make the application do it, ie, parameterized prepared statements. Since that has direct benefits to the application as well, in that it can use out-of-line values and thereby avoid quoting and SQL-injection risks, it's not apparent that it's a good idea to expend lots of sweat to reverse-engineer parameterization from a collection of unparameterized queries.

regards, tom lane
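Tom's suggestion - let the application parameterize - is what client libraries already support: one statement text, many bindings, out-of-line values. As a self-contained illustration only (using Python's stdlib sqlite3 driver here purely so the example runs anywhere; the table and data are made up, and this is not libpq):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE disbursements (id INTEGER, amount REAL)")

# One parameterized statement, many bindings: the statement text is
# identical on every call, so the values stay out-of-line -- no quoting,
# no SQL-injection risk, and nothing for a server to re-fingerprint.
conn.executemany("INSERT INTO disbursements VALUES (?, ?)",
                 [(1, 10.0), (2, 250.0)])

row = conn.execute("SELECT amount FROM disbursements WHERE id = ?",
                   (2,)).fetchone()
print(row[0])   # → 250.0
```

In PostgreSQL terms the same idea is PREPARE once, EXECUTE with different parameters; the server then parses and plans the statement a single time.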
[HACKERS] Apology to the community
Hello,

It has been brought to my attention a few times over the last year that I have been over the top in my presentation of myself and have in fact alienated and offended many of the community. To be honest I am unaware of everything I have done, but I do take seriously the opinion of those who have taken the time to point it out to me. They have been peers, friends, and community members, some of them for over a decade.

The last year has been very trying personally. If you wish to know details, please email me directly. Although I know that I can be a difficult personality even at the best of times, I have always tried to keep the community's best interest at heart. It is this past year, and the trials associated with it, that brought about frankly what I would consider some of the worst of who I am.

With that, I would like to apologize directly to this community that has provided me with so much, not just professionally but personally.

Apologies,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
The PostgreSQL Conference - http://www.postgresqlconference.org/
@cmdpromptinc - @postgresconf - 509-416-6579
Re: [HACKERS] CREATE FOREGIN TABLE LACUNA
On Thu, Mar 15, 2012 at 11:23:43AM -0300, Alvaro Herrera wrote:
> Excerpts from David Fetter's message of jue mar 15 02:28:28 -0300 2012:
> > On Wed, Mar 14, 2012 at 12:06:20PM -0400, Robert Haas wrote:
> > > On Wed, Mar 14, 2012 at 10:22 AM, David Fetter wrote:
> > > >> I think that instead of inventing new grammar productions and a new node type for this, you should just reuse the existing productions for LIKE clauses and then reject invalid options during parse analysis.
> > > >
> > > > OK. Should I first merge CREATE FOREIGN TABLE with CREATE TABLE and submit that as a separate patch?
> > >
> > > I don't see any reason to do that. I merely meant that you could reuse TableLikeClause or maybe even TableElement in the grammar for CreateForeignTableStmt.
> >
> > Next WIP patch attached implementing this via reusing TableLikeClause and refactoring transformTableLikeClause().
> >
> > What say?
>
> Looks much better to me, but the use of strcmp() doesn't look good. ISTM that stmtType is mostly used for error messages. I think you should add some kind of identifier (such as the original parser Node) into the CreateStmtContext so that you can do an IsA() test instead -- a bit more invasive as a patch, but much cleaner.
>
> Also the error messages need more work.

How about this one?

Cheers,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

*** a/doc/src/sgml/ref/create_foreign_table.sgml
--- b/doc/src/sgml/ref/create_foreign_table.sgml
*** 19,26 ****
  CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name ( [
!   { column_name data_type [ OPTIONS ( option 'value' [, ... ] ) ] [ NULL | NOT NULL ] }
!     [, ... ]
  ] )
    SERVER server_name [ OPTIONS ( option 'value' [, ... ] ) ]
--- 19,26 ----
  CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name ( [
!   { { column_name data_type [ NULL | NOT NULL ] | LIKE source_table } [, ... ]
!     [ OPTIONS ( option 'value' [, ... ] ) ] }
  ] )
    SERVER server_name [ OPTIONS ( option 'value' [, ... ] ) ]

*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*** 3945,3950 ForeignTableElementList:
--- 3945,3951 ----
  ForeignTableElement:
        columnDef { $$ = $1; }
+       | TableLikeClause { $$ = $1; }
        ;

  /*

*** a/src/backend/parser/parse_utilcmd.c
--- b/src/backend/parser/parse_utilcmd.c
*** 66,71 typedef struct
--- 66,72 ----
  {
        ParseState *pstate;     /* overall parser state */
        const char *stmtType;   /* "CREATE [FOREIGN] TABLE" or "ALTER TABLE" */
+       char        relkind;    /* r = ordinary table, f = foreign table, cf. pg_catalog.pg_class */
        RangeVar   *relation;   /* relation to create */
        Relation    rel;        /* opened/locked rel, if ALTER */
        List       *inhRelations;   /* relations to inherit from */

*** 194,202 transformCreateStmt(CreateStmt *stmt, const char *queryString)
--- 195,209 ----
        cxt.pstate = pstate;
        if (IsA(stmt, CreateForeignTableStmt))
+       {
                cxt.stmtType = "CREATE FOREIGN TABLE";
+               cxt.relkind = 'f';
+       }
        else
+       {
                cxt.stmtType = "CREATE TABLE";
+               cxt.relkind = 'r';
+       }
        cxt.relation = stmt->relation;
        cxt.rel = NULL;
        cxt.inhRelations = stmt->inhRelations;

*** 623,629 transformTableConstraint(CreateStmtContext *cxt, Constraint *constraint)
  /*
   * transformTableLikeClause
   *
!  * Change the LIKE portion of a CREATE TABLE statement into
   * column definitions which recreate the user defined column portions of .
   */
--- 630,636 ----
  /*
   * transformTableLikeClause
   *
!  * Change the LIKE portion of a CREATE [FOREIGN] TABLE statement into
   * column definitions which recreate the user defined column portions of .
   */

*** 652,657 transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause *table_like_cla
--- 659,683 ----
                                        table_like_clause->relation->relname)));
        cancel_parser_errposition_callback(&pcbstate);
+
+       /*
+        * For foreign tables, disallow some options.
+        */
+       if (cxt->relkind == 'f')
+       {
+               if (table_like_clause->options & CREATE_TABLE_LIKE_CONSTRAINTS)
+               {
+                       ereport(ERROR,
+
Re: [HACKERS] Finer Extension dependencies
Excerpts from Dimitri Fontaine's message of vie mar 23 13:12:22 -0300 2012:
> Alvaro Herrera writes:
> > Why do features have OIDs? Is this for pg_depend entries? If so, would it work to have pg_depend entries point to extensions instead?
>
> Yes, for pg_depend; no, I don't know how to make that work with pointing to the extensions directly, because the whole point here is to be able to depend on a feature rather than the whole extension.

Yes, I understand that -- but would it work to have the feature resolution be done at install/upgrade time, and once it's resolved, you record it by storing the extension that contains the feature? That way it correctly breaks when the extension gets removed; and since we ensure that upgrading an extension means deleting its features and then inserting them anew, it would also correctly break at that point if some feature is no longer provided.

I'm not wedded to this idea, so if we think it doesn't work for some reason, I have no problem going back to the idea of having direct dependencies on features instead. But I think it's worth considering.

--
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 12:03 PM, Robert Haas wrote: > On Fri, Mar 23, 2012 at 12:29 PM, Greg Stark wrote: >> On Fri, Mar 23, 2012 at 3:49 PM, Tom Lane wrote: >>> The complication, opportunities for bugs, and general slowdown >>> associated with that would outweigh any possible gain, in the opinion >>> of most hackers who have thought about this. >> >> I wouldn't be quite so pessimistic. I think the problem is that the >> hard part in doing this for real is all the parts the proposal glosses >> over. How much memory is it worth dedicating to the cache before the >> cost of that memory costs more than it helps? How do you invalidate >> cache entries efficiently enough that it doesn't become a bottleneck? > > I think the question of how you would invalidate things is a very good one. > > The other thing that makes me skeptical of this proposal is that I am > not very sure that executing absolutely identical queries is a very > common use case for a relational database. I suppose there might be a > few queries that run over and over again (e.g. whatever you need to > render your home page), but I think those will be the exception, and > not the rule. It therefore seems likely that the overhead of such a > cache would in most cases be greater than the benefit of having it in > the first place. > > What I think is more common is the repeated submission of queries that > are *nearly* identical, but with either different parameter bindings > or different constants. It would be nice to have some kind of cache > that would allow us to avoid the overhead of parsing and planning > nearly identical statements over and over again, but the trick is that > you have to fingerprint the query to notice that's happening in the > first place, and the fingerprinting has to cost less than what the > cache saves you. I don't know whether that's possible, but I suspect > it's far from easy. 
A query cache basically addresses two use cases:

1) read-only or mostly read-only workloads
2) badly written application code (whether by human or machine)

The problem is that #1 can be optimized by any number of simple
techniques, and #2 is not a good basis for complicated internal features
with nasty trade-offs.

mysql's query cache woes are well known -- it's typical for
administrators to turn the feature off. The feature is misnamed; it's
really a 'benchmark cheating feature', since a lot of db benchmarks tend
to focus on single-user loads and/or highly repetitive queries, but it
falls over completely under real-world production workloads. Also, it's
really not that difficult to rig an ad-hoc cache in the server or on the
client side, and you can then gear it towards your particular use case.

People asking for this probably really want materialized views instead.

merlin
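The ad-hoc cache Merlin mentions can be sketched quickly on the client side. The following is a minimal illustration (not code from any real driver or from PostgreSQL itself; all names are made up): results are keyed by (sql, params), expire after a TTL, and are tagged with the tables they read so the application can invalidate them after a write.

```python
import time

class QueryCache:
    """Tiny client-side result cache keyed by (sql, params).

    Entries are tagged with the tables they read so application writes
    can invalidate them; run_query stands in for a real database call.
    """
    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self.entries = {}    # (sql, params) -> (expires_at, rows)
        self.by_table = {}   # table name -> set of cache keys

    def get(self, sql, params, tables, run_query):
        key = (sql, params)
        hit = self.entries.get(key)
        if hit and hit[0] > time.monotonic():
            return hit[1]                          # cache hit
        rows = run_query(sql, params)              # miss: run for real
        self.entries[key] = (time.monotonic() + self.ttl, rows)
        for table in tables:
            self.by_table.setdefault(table, set()).add(key)
        return rows

    def invalidate(self, table):
        """Drop every cached result that read the given table."""
        for key in self.by_table.pop(table, set()):
            self.entries.pop(key, None)
```

Because the application knows exactly which tables each query touches and when it writes to them, the invalidation problem that makes a server-side cache so hard becomes trivial here.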
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 5:03 PM, Robert Haas wrote:
> The other thing that makes me skeptical of this proposal is that I am
> not very sure that executing absolutely identical queries is a very
> common use case for a relational database. I suppose there might be a
> few queries that run over and over again (e.g. whatever you need to
> render your home page), but I think those will be the exception, and
> not the rule. It therefore seems likely that the overhead of such a
> cache would in most cases be greater than the benefit of having it in
> the first place.

Well, it's not entirely unlikely. If you step back, a web application
looks like a big loop with a switch statement to go to different pages.
It keeps executing the same loop over and over again, and there are only
a smallish number of web pages. Sure, the bind variables change, but
there will only be so many bind values, and 10% of those will get 90% of
the traffic too.

But the other thing that happens is that people run multiple queries
aggregating or selecting from the same subset of data. So you often get
things like

  select count(*) from (<subquery>)
  select * from (<subquery>) order by foo limit 10
  select * from (<subquery>) order by bar limit 10

for the same <subquery>. That means if we could cache the rows coming
out of parts of the plan and remember those rows when we see a plan with
a common subtree, then we could avoid a lot of repetitive work.

This depends on being able to recognize when we can guarantee that
subtrees of plans produce the same rows even if the surrounding tree
changes. That will be true sometimes but not other times.

-- 
greg
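The subtree-sharing idea can be sketched in miniature. In this illustration (purely hypothetical; plan nodes are modeled as plain nested dicts rather than real planner structures), each subtree is fingerprinted, and a later plan containing an identical subtree reuses the rows the first execution produced:

```python
import hashlib
import json

def subtree_key(node):
    """Stable fingerprint of a plan subtree, represented here as plain
    nested dicts/lists standing in for real planner nodes."""
    blob = json.dumps(node, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

class SubtreeCache:
    """Remember the rows a subtree produced; any later plan containing
    an identical subtree reuses them instead of re-executing it."""
    def __init__(self, execute):
        self.execute = execute   # callable: subtree -> list of rows
        self.rows = {}

    def rows_for(self, subtree):
        key = subtree_key(subtree)
        if key not in self.rows:
            self.rows[key] = self.execute(subtree)   # first execution
        return self.rows[key]                        # shared thereafter
```

In the three-query example above, all three plans would contain the same subquery subtree, so only the first query pays for executing it. The hard part Greg identifies, proving that the subtree is guaranteed to produce the same rows in each context, is exactly what this sketch glosses over.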
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Fri, Mar 23, 2012 at 1:52 AM, Simon Riggs wrote: > So we have this? > > Master pg_controldata - OK txid_current_snapshot() - OK > Standby pg_controldata - OK txid_current_snapshot() - lower value > > Are there just 2 standbys? So all standbys have acted identically? Yes, I believe this is the situation. All have acted identically. Also, some new data: I took a new base backup after the epoch increment and started a new standby, and it reported txid_current_snapshot correctly, at least moments after it became consistent. This morning, however, it does not, and reports the 0-epoch number. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Sat, Mar 24, 2012 at 1:49 AM, Robert Haas wrote:
> On Fri, Mar 23, 2012 at 12:42 PM, Fujii Masao wrote:
>> On Fri, Mar 23, 2012 at 9:41 PM, Robert Haas wrote:
>>> On Fri, Mar 23, 2012 at 6:13 AM, Fujii Masao wrote:
>>>> On Fri, Mar 23, 2012 at 5:56 PM, Magnus Hagander wrote:
>>>>> Might it be a good idea to put it on its own row instead of
>>>>> changing the format of an existing row, in order not to break
>>>>> scripts and programs that are parsing the previous output?
>>>>
>>>> Good idea! What row name should we use for the WAL file containing
>>>> the REDO record? "Latest checkpoint's REDO file"?
>>>
>>> Sounds good to me. I like the idea, too. The status quo is an
>>> unnecessary nuisance, so this will be a nice usability improvement.
>>
>> Attached patch adds a new row "Latest checkpoint's REDO WAL segment:"
>> to the result of pg_controldata. I used the term "WAL segment" for the
>> row name instead of "file" because "WAL segment" is used in another
>> row, "Bytes per WAL segment:". But better name?
>
> s/segment/file/g?

Yep, "file" might be more intuitive for a user than "segment". Attached
is the "file" version of the patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

pg_controldata_walfilename_v3.patch
Description: Binary data
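For context on what the new pg_controldata row would print: a WAL file name of this era is 24 hex digits built from the timeline ID, the log ID, and the segment number within that log, where an LSN is a (xlogid, xrecoff) pair and segments default to 16MB. A rough sketch of the naming rule (mirroring the spirit of the XLogFileName macro, not the patch's actual code):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default 16MB WAL segment size

def wal_file_name(timeline, xlogid, xrecoff):
    """Name of the WAL file containing location (xlogid, xrecoff):
    timeline, log id, and segment number, each as 8 hex digits."""
    segno = xrecoff // WAL_SEG_SIZE
    return "%08X%08X%08X" % (timeline, xlogid, segno)
```

So a REDO record at, say, byte offset one segment into log 0 on timeline 1 lives in file 000000010000000000000001, which is the kind of value the "Latest checkpoint's REDO WAL file" row would report.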
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 12:29 PM, Greg Stark wrote: > On Fri, Mar 23, 2012 at 3:49 PM, Tom Lane wrote: >> The complication, opportunities for bugs, and general slowdown >> associated with that would outweigh any possible gain, in the opinion >> of most hackers who have thought about this. > > I wouldn't be quite so pessimistic. I think the problem is that the > hard part in doing this for real is all the parts the proposal glosses > over. How much memory is it worth dedicating to the cache before the > cost of that memory costs more than it helps? How do you invalidate > cache entries efficiently enough that it doesn't become a bottleneck? I think the question of how you would invalidate things is a very good one. The other thing that makes me skeptical of this proposal is that I am not very sure that executing absolutely identical queries is a very common use case for a relational database. I suppose there might be a few queries that run over and over again (e.g. whatever you need to render your home page), but I think those will be the exception, and not the rule. It therefore seems likely that the overhead of such a cache would in most cases be greater than the benefit of having it in the first place. What I think is more common is the repeated submission of queries that are *nearly* identical, but with either different parameter bindings or different constants. It would be nice to have some kind of cache that would allow us to avoid the overhead of parsing and planning nearly identical statements over and over again, but the trick is that you have to fingerprint the query to notice that's happening in the first place, and the fingerprinting has to cost less than what the cache saves you. I don't know whether that's possible, but I suspect it's far from easy. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
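The fingerprinting Robert describes can be illustrated cheaply. This is a deliberately crude, hypothetical sketch: it normalizes the query *text* by replacing literals with placeholders, whereas a real implementation (as Robert implies) would have to fingerprint the parse tree, and would have to do so faster than parsing and planning from scratch:

```python
import hashlib
import re

# Matches single-quoted string literals (with '' escapes) and bare numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")

def fingerprint(sql):
    """Crude query fingerprint: lowercase, collapse whitespace, and
    replace string/numeric literals with '?' so nearly-identical
    statements collapse to one cache key."""
    normalized = _LITERAL.sub("?", sql.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    return hashlib.md5(normalized.encode()).hexdigest()
```

With this, `SELECT * FROM t WHERE id = 1` and `select * from t where id = 2` share a fingerprint, which is exactly the "nearly identical" case; the open question in the thread is whether the real, parse-tree version of this can be made cheap enough to pay for itself.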
Re: [HACKERS] [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
Noah Misch writes: > On Fri, Mar 23, 2012 at 11:31:54AM -0400, Tom Lane wrote: >> ... I've not looked >> yet at the existing transform functions, but why would they want to know >> about the original node at all? > You suggested[1] passing an Expr instead of an argument list, and your reasons > still seem good to me. That said, perhaps we should send both the original > Expr and the simplified argument list. That will help if we ever want to > fully simplify x - y * 0. (Then again, the feature is undocumented and we > could change it when that day comes.) I believe what I had in mind back then was that we'd build a new FuncExpr containing the simplified argument list. On reflection that's probably the most future-proof way to do it, since otherwise anytime we change the contents of FuncExpr, we'll be faced with possibly having to change the signature of protransform functions. Will go see what I can do with that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 12:42 PM, Fujii Masao wrote: > On Fri, Mar 23, 2012 at 9:41 PM, Robert Haas wrote: >> On Fri, Mar 23, 2012 at 6:13 AM, Fujii Masao wrote: >>> On Fri, Mar 23, 2012 at 5:56 PM, Magnus Hagander >>> wrote: Might it be a good idea to put it on it's own row instead of changing the format of an existing row, in order not to break scripts and programs that are parsing the previous output? >>> >>> Good idea! What row name should we use for the WAL file containing >>> REDO record? "Latest checkpoint's REDO file"? >> >> Sounds good to me. I like the idea, too. The status quo is an >> unnecessary nuisance, so this will be a nice usability improvement. > > Attached patch adds new row "Latest checkpoint's REDO WAL segment:" into > the result of pg_controldata. I used the term "WAL segment" for the row name > instead of "file" because "WAL segment" is used in another row "Bytes per WAL > segment:". But better name? s/segment/file/g? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 9:41 PM, Robert Haas wrote: > On Fri, Mar 23, 2012 at 6:13 AM, Fujii Masao wrote: >> On Fri, Mar 23, 2012 at 5:56 PM, Magnus Hagander wrote: >>> Might it be a good idea to put it on it's own row instead of changing >>> the format of an existing row, in order not to break scripts and >>> programs that are parsing the previous output? >> >> Good idea! What row name should we use for the WAL file containing >> REDO record? "Latest checkpoint's REDO file"? > > Sounds good to me. I like the idea, too. The status quo is an > unnecessary nuisance, so this will be a nice usability improvement. Attached patch adds new row "Latest checkpoint's REDO WAL segment:" into the result of pg_controldata. I used the term "WAL segment" for the row name instead of "file" because "WAL segment" is used in another row "Bytes per WAL segment:". But better name? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center pg_controldata_walfilename_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 11:29 AM, Greg Stark wrote:
> On Fri, Mar 23, 2012 at 3:49 PM, Tom Lane wrote:
>> The complication, opportunities for bugs, and general slowdown
>> associated with that would outweigh any possible gain, in the opinion
>> of most hackers who have thought about this.
>
> I wouldn't be quite so pessimistic. I think the problem is that the
> hard part in doing this for real is all the parts the proposal glosses
> over. How much memory is it worth dedicating to the cache before the
> cost of that memory costs more than it helps? How do you invalidate
> cache entries efficiently enough that it doesn't become a bottleneck?
>
> Also, you need to identify the specific advantages you hope a built-in
> cache would have over one implemented in the ORM or database library.
> If there aren't any advantages then those solutions are much simpler.
> And they have other advantages as well -- one of the main reasons
> people implement caches is so they can move the load away from the
> bottleneck of the database to the more easily scaled-out application.

Thanks for the input. I've had many of these thoughts myself, and I
guess how valuable a query cache would be depends on the environment the
database is used in, memory settings, and other variables. I'll
definitely give this more thought before sending an official proposal.

Billy
Re: [HACKERS] query cache
On Fri, Mar 23, 2012 at 3:49 PM, Tom Lane wrote:
> The complication, opportunities for bugs, and general slowdown
> associated with that would outweigh any possible gain, in the opinion
> of most hackers who have thought about this.

I wouldn't be quite so pessimistic. I think the problem is that the hard
part in doing this for real is all the parts the proposal glosses over.
How much memory is it worth dedicating to the cache before the cost of
that memory costs more than it helps? How do you invalidate cache
entries efficiently enough that it doesn't become a bottleneck?

Also, you need to identify the specific advantages you hope a built-in
cache would have over one implemented in the ORM or database library. If
there aren't any advantages then those solutions are much simpler. And
they have other advantages as well -- one of the main reasons people
implement caches is so they can move the load away from the bottleneck
of the database to the more easily scaled-out application.

-- 
greg
Re: [HACKERS] [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
I wrote:
> However, see my response to Robert: why are we passing the original node
> to the transform function at all? It would be more useful and easier to
> work with to pass the function's fully-processed argument list, I believe.

After a bit of looking around, I realize that the current implementation
of transform functions is flat-out wrong, because whenever a transform
actually fires, it proceeds to throw away all the work that
eval_const_expressions has done on the input, and instead returns some
lightly-modified version of the original node tree. Thus for example in
the regression database:

regression=# create function foo(x float8, y int) returns numeric as
regression-# 'select ($1 + $2)::numeric' language sql;
CREATE FUNCTION
regression=# select "numeric"(foo(y := 1, x := f1), -1) from float8_tbl;
ERROR:  unrecognized node type: 310

since the adjustment of foo's named arguments is thrown away. So this
patch is going to need some work.

I continue to not see any particular reason why the transform function
should need the original node tree. I think what it *should* be getting
is: the OID of the function (currently, it's impossible for one
transform to serve more than one function, which seems like it might be
useful); the input collation (currently, transforms are basically
unusable for any collation-sensitive function); and the pre-simplified
argument list.

regards, tom lane
Re: [HACKERS] Finer Extension dependencies
Alvaro Herrera writes:
> Why do features have OIDs? Is this for pg_depend entries? If so, would
> it work to have pg_depend entries point to extensions instead?

Yes, for pg_depend; no, I don't know how to make that work with pointing
to the extensions directly, because the whole point here is to be able
to depend on a feature rather than the whole extension. Use cases:

- depend on a feature f that appeared in version y of the extension
  (bugfix, new capability)

- deprecate a feature: alter extension update removes a feature, and you
  want to know that the dependent extensions need processing (cascade to
  remove them in the operation, or update them beforehand, etc.) (still
  a manual operation, though)

I don't see how to handle those cases with a direct dependency on the
extension rather than on one of its features.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
[HACKERS] Re: [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
On Fri, Mar 23, 2012 at 11:31:54AM -0400, Tom Lane wrote: > Robert Haas writes: > > On Fri, Mar 23, 2012 at 10:55 AM, Tom Lane wrote: > >> Why exactly was this thought to be a good idea: > >> > >>> * A NULL original expression disables use of transform functions while > >>> * retaining all other behaviors. > > > I assumed that we were merely trying to avoid forcing the caller to > > provide the expression tree if they didn't have it handy, and that the > > comment was merely making allowance for the fact that someone might > > want to do such a thing. > > How would they not have the original expression tree handy? > > But now that I'm looking at this ... the API specification for transform > functions seems rather thoroughly broken anyway. Why are we passing the > original expression and nothing else? This would appear to require the > transform function to repeat all the input-normalization and > simplification work done up to this point. It would seem to me to be > more useful to pass the fully-processed argument list. I've not looked > yet at the existing transform functions, but why would they want to know > about the original node at all? You suggested[1] passing an Expr instead of an argument list, and your reasons still seem good to me. That said, perhaps we should send both the original Expr and the simplified argument list. That will help if we ever want to fully simplify x - y * 0. (Then again, the feature is undocumented and we could change it when that day comes.) [1] http://archives.postgresql.org/pgsql-hackers/2011-06/msg00915.php The existing transform functions are trivial and could survive on nearly any API we might consider. See varchar_transform(). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] query cache
Billy Earney writes: > I'm wondering if anyone would be interested in a query cache as a backend > to postgresql? I believe this has been suggested and rejected several times before. Did you look through the pgsql-hackers archives? > To invalidate cache entries, look at the transactions being committed (and > written to WAL log, if my memory serves me) and send a message to the > qcache process to invalidate any query which depends on the modfied > relation (ie, table, etc) The complication, opportunities for bugs, and general slowdown associated with that would outweigh any possible gain, in the opinion of most hackers who have thought about this. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Finer Extension dependencies
Excerpts from Dimitri Fontaine's message of vie mar 23 12:26:47 -0300 2012:
> Alvaro Herrera writes:
> > Excerpts from Dimitri Fontaine's message of vie mar 23 11:05:37 -0300 2012:
> >
> >> =# \d pg_extension_feature
> >>    Table "pg_catalog.pg_extension_feature"
> >>    Column   | Type | Modifiers
> >> ------------+------+-----------
> >>  extoid     | oid  | not null
> >>  extfeature | name | not null
> >> Indexes:
> >>     "pg_extension_feature_name_index" UNIQUE, btree (extfeature)
> >>     "pg_extension_feature_oid_index" UNIQUE, btree (oid)
> >>     "pg_extension_feature_extoid_name_index" btree (extoid, extfeature)
> >>
> >> We could maybe get rid of the (extoid, extfeature) index, which is only
> >> used to get sorted output in the list_extension_features() function, but
> >> I don't know how to do an ORDER BY scan without an index in C (yet).
> >>
> >> The ordering is then used to maintain pg_depend when the list of
> >> provided features changes at upgrade time. We fetch the ordered list of
> >> "old" feature names, then for each newly provided feature name we
> >> bsearch() the old list, which then needs to be properly ordered.
> >
> > Hm, couldn't it be done simply with a qsort()? Presumably there aren't
> > many feature entries to sort ...
>
> Mmmm… Then we would need an index on extoid to be able to list the
> features of a given extension, and that would be the only usage of such
> an index. I guess that having it include the feature's name is not so
> expensive as to try avoiding it and qsort() in the code rather than scan
> the index in order?

Well, as far as I can see, the only use of
pg_extension_feature_extoid_name_index right now is the same as the only
use for the extoid index. I mean, what you really want is to find out
the features of an extension, right? The extfeature column is just there
to provide you with the ordering, which should be easy to determine
outside of the index.

Why do features have OIDs? Is this for pg_depend entries? If so, would
it work to have pg_depend entries point to extensions instead?

-- 
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
Noah Misch writes: > On Fri, Mar 23, 2012 at 10:55:52AM -0400, Tom Lane wrote: >> Why exactly was this thought to be a good idea: >> >>> * A NULL original expression disables use of transform functions while >>> * retaining all other behaviors. > I did it that way because it looked wrong to pass the same CoerceViaIO node to > transforms of both the input and output functions. Thinking about it again > now, doing so imposes no fundamental problems. Feel welcome to change it. Oh, I see your point --- it's not obvious whether the current transform is meant for the input or the output function. Which is a very good point. In principle the transform function could figure out which end of that it must be, but it would be ugly. However, see my response to Robert: why are we passing the original node to the transform function at all? It would be more useful and easier to work with to pass the function's fully-processed argument list, I believe. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
On 23 March 2012 15:13, Andrew Dunstan wrote: > Upper casing SQL keywords is a common style, which is used in lots of our > code (e.g. regression tests, psql queries, pg_dump). I think the default > should match what is in effect our house style, and what we have > historically done. The code doesn't give preferential treatment to lower-case code - it merely puts it on an even footing. I would agree with your position if the change assumed that the user always wanted to use lower-case SQL, but it does not. Rather, it intelligently infers what the user wants. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #6510: A simple prompt is displayed using wrong charset
Excerpts from Alexander LAW's message of mar mar 20 16:50:14 -0300 2012: > Thanks, I've understood your point. > Please look at the patch. It implements the first way and it makes psql > work too. Great, thanks. Hopefully somebody with Windows-compile abilities will have a look at this. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers patch v18
Thom Brown writes: > The new command triggers work correctly. Thanks for your continued testing :) > Having looked at your regression tests, you don't seem to have enough > "before" triggers in the tests. There's no test for before CREATE > TABLE, CREATE TABLE AS or SELECT INTO. In my tests I have 170 unique > command triggers, but there are only 44 in the regression test. Is > there a reason why there aren't many tests? Now that we share the same code for ANY triggers and specific ones, I guess we could drop a lot of specific command triggers from the regression tests. > A problem still outstanding is that when I build the docs, the CREATE I would like to get back on code level review now if at all possible, and I would integrate your suggestions here into the next patch revision if another one is needed. The only point yet to address from last round from Andres is about the API around CommandFiresTrigger() and the Memory Context we use here. We're missing an explicit Reset call, and to be able to have we need to have a more complex API, because of the way RemoveObjects() and RemoveRelations() work. We would need to add no-reset APIs and an entry point to manually reset the memory context, which currently gets disposed at the same time as its parent context, the current one that's been setup before entering standard_ProcessUtility(). Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
Robert Haas writes: > On Fri, Mar 23, 2012 at 10:55 AM, Tom Lane wrote: >> Why exactly was this thought to be a good idea: >> >>> * A NULL original expression disables use of transform functions while >>> * retaining all other behaviors. > I assumed that we were merely trying to avoid forcing the caller to > provide the expression tree if they didn't have it handy, and that the > comment was merely making allowance for the fact that someone might > want to do such a thing. How would they not have the original expression tree handy? But now that I'm looking at this ... the API specification for transform functions seems rather thoroughly broken anyway. Why are we passing the original expression and nothing else? This would appear to require the transform function to repeat all the input-normalization and simplification work done up to this point. It would seem to me to be more useful to pass the fully-processed argument list. I've not looked yet at the existing transform functions, but why would they want to know about the original node at all? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
On Fri, Mar 23, 2012 at 10:55:52AM -0400, Tom Lane wrote: > Robert Haas writes: > > Add notion of a "transform function" that can simplify function calls. > > Why exactly was this thought to be a good idea: > > > * A NULL original expression disables use of transform functions while > > * retaining all other behaviors. We last spoke of that idea here, albeit in minimal detail: http://archives.postgresql.org/pgsql-hackers/2011-06/msg00918.php > AFAICT that buys nothing except to greatly complicate the API > specification for simplify_function, something that is now proving > problematic for Marti's requested refactoring [1]. If it's > inappropriate for a transform function to modify a CoerceViaIO call, > surely the transform function can be expected to know that. I did it that way because it looked wrong to pass the same CoerceViaIO node to transforms of both the input and output functions. Thinking about it again now, doing so imposes no fundamental problems. Feel welcome to change it. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] query cache
Greetings!

I've done a brief search of the postgresql mail archives, and I've
noticed a few projects for adding query caches to postgresql (for
example, Masanori Yamazaki's query cache proposal for GSOC 2011), as
well as the query cache announced at
http://www.postgresql.org/about/news/1296/ (pgc). Both of these seem to
be external solutions that act more like a proxy between clients and
servers, instead of being part of the server processes.

I'm wondering if anyone would be interested in a query cache as a
backend to postgresql? I've been playing around with the postgresql
code, and if I'm understanding the code, I believe this is possible.
I've been writing some code, but don't have anything working yet (I'm
receiving a hash table corruption error), but I'm working through it.

Here's my basic idea:

1. Intercept select queries in execMain.c at ExecuteQuery and see if the
source text of this query is in the "query hash". (Later we could make
this more sophisticated by using the query plan or some type of AST
instead of the query text, since adding or removing a space would create
a different query hash key.)

2. If the query is in the cache, return the cached results of this
query.

3. If the query is not cached, run the query like normal, grabbing the
tuples as they are sent to the "dest" and storing them in the cache.
(For now, I'm ignoring storage constraints, etc., but these details will
need to be added before going to production.)

To invalidate cache entries, look at the transactions being committed
(and written to the WAL log, if my memory serves me) and send a message
to the qcache process to invalidate any query which depends on the
modified relation (i.e., table, etc.).

For the experts out there, does this seem reasonable, or am I
misunderstanding the source code? Anyone aware of a project trying to
accomplish this?

Thanks!

Billy Earney
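The three steps plus the invalidation message can be modeled in a few lines. This is a toy sketch of the proposal's control flow only (the names `QCache`, `execute_select`, and `on_commit` are invented for illustration and correspond to nothing in the PostgreSQL source):

```python
class QCache:
    """Toy model of the proposed cache: intercept a SELECT, serve it
    from a cache keyed by its raw source text, and invalidate entries
    when a committed transaction modified a relation they depend on."""
    def __init__(self, executor):
        self.executor = executor  # callable: sql -> (rows, relations read)
        self.cache = {}           # sql text -> cached rows
        self.deps = {}            # relation -> set of dependent sql texts

    def execute_select(self, sql):
        if sql in self.cache:                     # step 2: cache hit
            return self.cache[sql]
        rows, relations = self.executor(sql)      # step 3: run normally,
        self.cache[sql] = rows                    # grab and store tuples
        for rel in relations:
            self.deps.setdefault(rel, set()).add(sql)
        return rows

    def on_commit(self, modified_relations):
        """The invalidation message sent at commit time."""
        for rel in modified_relations:
            for sql in self.deps.pop(rel, set()):
                self.cache.pop(sql, None)
```

Note that keying on raw text (step 1) makes `select 1` and `select  1` distinct entries, which is exactly why the proposal mentions moving to the plan or an AST later.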
Re: [HACKERS] Finer Extension dependencies
Alvaro Herrera writes:
> Excerpts from Dimitri Fontaine's message of vie mar 23 11:05:37 -0300 2012:
>
>> =# \d pg_extension_feature
>>    Table "pg_catalog.pg_extension_feature"
>>    Column   | Type | Modifiers
>> ------------+------+-----------
>>  extoid     | oid  | not null
>>  extfeature | name | not null
>> Indexes:
>>     "pg_extension_feature_name_index" UNIQUE, btree (extfeature)
>>     "pg_extension_feature_oid_index" UNIQUE, btree (oid)
>>     "pg_extension_feature_extoid_name_index" btree (extoid, extfeature)
>>
>> We could maybe get rid of the (extoid, extfeature) index, which is only
>> used to get sorted output in the list_extension_features() function, but
>> I don't know how to do an ORDER BY scan without an index in C (yet).
>>
>> The ordering is then used to maintain pg_depend when the list of
>> provided features changes at upgrade time. We fetch the ordered list of
>> "old" feature names, then for each newly provided feature name we
>> bsearch() the old list, which then needs to be properly ordered.
>
> Hm, couldn't it be done simply with a qsort()? Presumably there aren't
> many feature entries to sort ...

Mmmm… Then we would need an index on extoid to be able to list the
features of a given extension, and that would be the only usage of such
an index. I guess that having it include the feature's name is not so
expensive as to try avoiding it and qsort() in the code rather than scan
the index in order?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
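The upgrade-time bookkeeping being discussed (fetch the ordered list of old feature names, then binary-search it for each newly provided name) can be sketched outside of C. This is an illustrative model only, not the patch's code; in Python, `bisect` plays the role of `bsearch()` and `sorted()` the role of either the ordered index scan or a `qsort()`:

```python
import bisect

def is_present(name, sorted_names):
    """Binary search (the bsearch() step); requires a sorted list."""
    i = bisect.bisect_left(sorted_names, name)
    return i < len(sorted_names) and sorted_names[i] == name

def dropped_features(old_sorted, new_features):
    """Given the ordered list of previously provided feature names and
    the new set, return features no longer provided after an upgrade --
    the entries whose pg_depend records would need processing."""
    new_sorted = sorted(new_features)   # qsort()-style alternative to
                                        # scanning an ordered index
    return [f for f in old_sorted if not is_present(f, new_sorted)]
```

Either way the list must end up sorted before the binary searches; whether that ordering comes from the (extoid, extfeature) index or from an in-memory sort is exactly the trade-off Alvaro and Dimitri are weighing.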
[HACKERS] Re: [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
On Fri, Mar 23, 2012 at 10:55 AM, Tom Lane wrote: > Robert Haas writes: >> Add notion of a "transform function" that can simplify function calls. > > Why exactly was this thought to be a good idea: > >> * A NULL original expression disables use of transform functions while >> * retaining all other behaviors. > > AFAICT that buys nothing except to greatly complicate the API > specification for simplify_function, something that is now proving > problematic for Marti's requested refactoring [1]. If it's > inappropriate for a transform function to modify a CoerceViaIO call, > surely the transform function can be expected to know that. I assumed that we were merely trying to avoid forcing the caller to provide the expression tree if they didn't have it handy, and that the comment was merely making allowance for the fact that someone might want to do such a thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
Peter Geoghegan writes: > On 22 March 2012 22:05, Andrew Dunstan wrote: >> Should it be governed by a setting? > Perhaps, but I find the behaviour that was introduced by Peter's patch > to be a more preferable default. FWIW, I like the new behavior better too. I'm not particularly a fan of all-caps. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
On 03/23/2012 11:07 AM, Peter Geoghegan wrote:
> On 22 March 2012 22:05, Andrew Dunstan wrote:
>> On 03/22/2012 05:49 PM, Bruce Momjian wrote:
>>> Robert Haas and I are disappointed by this change. I liked the fact that I could post nice-looking SQL queries without having to use my capslock key (which I use as a second control key). Any chance of reverting this change?
>>
>> Should it be governed by a setting?
>
> Perhaps, but I find the behaviour that was introduced by Peter's patch to be a more preferable default.

Upper casing SQL keywords is a common style, which is used in lots of our code (e.g. regression tests, psql queries, pg_dump). I think the default should match what is in effect our house style, and what we have historically done.

cheers

andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for parallel pg_dump
Are you going to provide a rebased version? -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
On 22 March 2012 22:05, Andrew Dunstan wrote: > > > On 03/22/2012 05:49 PM, Bruce Momjian wrote: >> >> >> Robert Haas and I are disappointed by this change. I liked the fact >> that I could post nice-looking SQL queries without having to use my >> capslock key (which I use as a second control key). Any chance of >> reverting this change? >> > > Should it be governed by a setting? Perhaps, but I find the behaviour that was introduced by Peter's patch to be a more preferable default. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap_freeze_tuple locking requirements
Excerpts from Robert Haas's message of mié mar 21 21:50:24 -0300 2012: > heap_freeze_tuple() was apparently designed at one point to cope with > being called with either a shared or exclusive buffer lock. But none > of the current callers call it with a shared lock; they all call it > with an exclusive lock, except for the heap-rewrite code which doesn't > take (or need) a lock at all. > Since this is just dead code removal, I propose to apply this to 9.2. +1 -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add notion of a "transform function" that can simplify function
Robert Haas writes: > Add notion of a "transform function" that can simplify function calls. Why exactly was this thought to be a good idea: > * A NULL original expression disables use of transform functions while > * retaining all other behaviors. AFAICT that buys nothing except to greatly complicate the API specification for simplify_function, something that is now proving problematic for Marti's requested refactoring [1]. If it's inappropriate for a transform function to modify a CoerceViaIO call, surely the transform function can be expected to know that. regards, tom lane [1] http://archives.postgresql.org/pgsql-hackers/2012-03/msg00694.php -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
On Fri, Mar 23, 2012 at 11:51:16AM -0300, Alvaro Herrera wrote: > > Excerpts from Andrew Dunstan's message of jue mar 22 19:05:30 -0300 2012: > > > > On 03/22/2012 05:49 PM, Bruce Momjian wrote: > > > > > > Robert Haas and I are disappointed by this change. I liked the > > > fact that I could post nice-looking SQL queries without having > > > to use my capslock key (which I use as a second control key). > > > Any chance of reverting this change? > > > > > > > Should it be governed by a setting? > > A \set variable perhaps? +1 Would the old behavior be the default? +1 for defaulting to the old behavior. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
On Thu, Mar 22, 2012 at 06:05:30PM -0400, Andrew Dunstan wrote: > On 03/22/2012 05:49 PM, Bruce Momjian wrote: > >Robert Haas and I are disappointed by this change. I liked the > >fact that I could post nice-looking SQL queries without having to > >use my capslock key (which I use as a second control key). Any > >chance of reverting this change? > > > > Should it be governed by a setting? Something like (upper|lower|preserve) ? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
Excerpts from Andrew Dunstan's message of jue mar 22 19:05:30 -0300 2012: > > On 03/22/2012 05:49 PM, Bruce Momjian wrote: > > > > Robert Haas and I are disappointed by this change. I liked the fact > > that I could post nice-looking SQL queries without having to use my > > capslock key (which I use as a second control key). Any chance of > > reverting this change? > > > > Should it be governed by a setting? A \set variable perhaps? +1 Would the old behavior be the default? -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Finer Extension dependencies
Excerpts from Dimitri Fontaine's message of vie mar 23 11:05:37 -0300 2012:

> =# \d pg_extension_feature
>     Table "pg_catalog.pg_extension_feature"
>    Column   | Type | Modifiers
> ------------+------+-----------
>  extoid     | oid  | not null
>  extfeature | name | not null
> Indexes:
>     "pg_extension_feature_name_index" UNIQUE, btree (extfeature)
>     "pg_extension_feature_oid_index" UNIQUE, btree (oid)
>     "pg_extension_feature_extoid_name_index" btree (extoid, extfeature)
>
> We could maybe get rid of the (extoid, extfeature) index which is only
> used to get sorted output in list_extension_features() function, but I
> don't know how to do an ORDER BY scan without index in C (yet).
>
> The ordering is then used to maintain pg_depend when the list of
> provided features changes at upgrade time. We fetch the ordered list of
> “old” feature names then for each newly provided feature name we
> bsearch() the old list, which then needs to be properly ordered.

Hm, couldn't it be done simply with a qsort()? Presumably there aren't
many feature entries to sort ...

-- 
Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
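Alvaro's suggestion amounts to something like the following sketch: fetch the old feature names in whatever order the scan yields them, qsort() the list in memory, then bsearch() each newly provided name against it, with no ordered index scan required. Plain C strings stand in for NameData here, and the function name is invented for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator over an array of C-string pointers, usable by both
 * qsort() and bsearch(). */
static int cmp_name(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Return 1 if 'feature' is in the (possibly unsorted) old feature list,
 * 0 otherwise.  Sorts the list in place first, as qsort() replaces the
 * need for a pre-ordered index scan. */
int feature_was_provided(const char **old_features, size_t n,
                         const char *feature)
{
    qsort(old_features, n, sizeof(*old_features), cmp_name);
    return bsearch(&feature, old_features, n,
                   sizeof(*old_features), cmp_name) != NULL;
}
```

Since an extension provides at most a handful of features, the O(n log n) sort is negligible, which is presumably why Alvaro suggests it over maintaining an extra index.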
Re: [HACKERS] checkpoint patches
* Robert Haas (robertmh...@gmail.com) wrote:
> Well, how do you want to look at it?

I thought the last graph you provided was a useful way to view the results. It was my intent to make that clear in my prior email; my apologies if that didn't come through.

> Here's the data from 80th > percentile through 100th percentile - percentile, patched, unpatched, > difference - for the same two runs I've been comparing: [...] > 98 12100 24645 -12545 > 99 186043 201309 -15266 > 100 9513855 9074161 439694

Those are the areas that I think we want to be looking at/for: the outliers.

> By the way, I reran the tests on master with checkpoint_timeout=16min, > and here are the tps results: 2492.966759, 2588.750631, 2575.175993. > So it seems like not all of the tps gain from this patch comes from > the fact that it increases the time between checkpoints. Comparing > the median of three results between the different sets of runs, > applying the patch and setting a 3s delay between syncs gives you > about a 5.8% increase throughput, but also adds 30-40 seconds between > checkpoints. If you don't apply the patch but do increase time > between checkpoints by 1 minute, you get about a 5.0% increase in > throughput. That certainly means that the patch is doing something - > because 5.8% for 30-40 seconds is better than 5.0% for 60 seconds - > but it's a pretty small effect.

That doesn't surprise me too much. As I mentioned before (and Greg, please correct me if I'm wrong), I thought this patch was intended to reduce the latency spikes that we suffer from under some workloads, which can often be attributed back to i/o related contention. I don't believe it's intended or expected to seriously increase throughput.

> The picture looks similar here. Increasing checkpoint_timeout isn't > *quite* as good as spreading out the fsyncs, but it's pretty darn close. 
> For example, looking at the median of the three 98th > percentile numbers for each configuration, the patch bought us a 28% > improvement in 98th percentile latency. But increasing > checkpoint_timeout by a minute bought us a 15% improvement in 98th > percentile latency. So it's still not clear to me that the patch is > doing anything on this test that you couldn't get just by increasing > checkpoint_timeout by a few more minutes. Granted, it lets you keep > your inter-checkpoint interval slightly smaller, but that's not that > exciting. That having been said, I don't have a whole lot of trouble > believing that there are other cases where this is more worthwhile.

I could certainly see the checkpoint_timeout parameter, along with the others, as being sufficient to address this, in which case we likely don't need the patch. They're both more-or-less intended to do the same thing, and it's just a question of whether being more granular ends up helping or not.

Thanks,

Stephen
Re: [HACKERS] Unnecessary WAL archiving after failover
On Thu, Mar 22, 2012 at 12:56 AM, Robert Haas wrote:
> On Wed, Feb 29, 2012 at 5:48 AM, Fujii Masao wrote:
>> Hi,
>>
>> In streaming replication, after failover, the new master might have lots of un-applied WAL files with the old timeline ID. They are the WAL files which were recycled as future ones when the server was running as a standby. Since they will never be used later, they don't need to be archived after failover. But since they have neither a .ready nor a .done file in archive_status, checkpoints after failover newly create .ready files for them, and then finally they are archived, which might cause a disk I/O spike both in WAL and archive storage.
>>
>> To avoid the above problem, I think that un-applied WAL files with the old timeline ID should be marked as already-archived and recycled immediately at the end of recovery. Thought?
>
> I'm not an expert on this, but that makes sense to me.

Thanks for agreeing with my idea. On second thought, I found other issues with WAL archiving after failover, so let me clarify the issues again. Just after failover, there can be three kinds of WAL files in the new master's pg_xlog directory:

(1) WAL files which were recycled by a restartpoint

I've already explained upthread the issue which these WAL files cause after failover.

(2) WAL files which were restored from the archive

In 9.1 or before, the restored WAL files don't remain after failover because they are always restored onto the temporary filename "RECOVERYXLOG". So the issue which I explain from now on doesn't exist in 9.1 or before. In 9.2dev, as a result of supporting cascade replication, an archived WAL file is restored onto the correct file name so that a cascading walsender can send it to another standby. This restored WAL file has neither a .ready nor a .done archive status file. 
After failover, checkpoint checks the archive status file of the restored WAL file to attempt to recycle it, finds that it has neither .ready nor .done, and creates .ready. Because of the existence of .ready, it will be archived again even though it obviously already exists in the archival storage :(

To prevent a restored WAL file from being archived again, I think that .done should be created whenever a WAL file is successfully restored (of course this should happen only when archive_mode is enabled). Thought? Since this is an oversight of cascade replication, I'm thinking to implement the patch for 9.2dev.

(3) WAL files which were streamed from the master

These WAL files also don't have any archive status, so checkpoint creates .ready for them after failover. And then, all or many of them will be archived at a time, which would cause an I/O spike on both WAL and archival storage. To avoid this problem, I think that we should change walreceiver so that it creates .ready as soon as it completes a WAL file. Also we should change the archiver process so that it starts up even in standby mode and archives the WAL files.

If each server has its own archival storage, the above solution would work fine. But if all servers share the archival storage, multiple archiver processes in those servers might archive the same WAL file to the shared area at the same time. Is this OK? If not, to avoid this, we might need to separate archive_mode into two: one for normal mode (i.e., master), another for standby mode. If the archive is shared, we can ensure that only one archiver in the master copies the WAL file at a time by disabling WAL archiving in standby mode but enabling it in normal mode. Thought?

Invoking the archiver process in standby mode is a new feature, not a bug fix. It's too late to propose new features for 9.2, so I'll propose this for 9.3. 
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
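The fix proposed for case (2) — create .done as soon as a segment is restored, so a later checkpoint never invents a .ready for it and re-archives it — boils down to building the archive_status path for the segment and touching that file. A minimal sketch with a hypothetical helper name and simplified error handling; the real server code would actually create the empty file under pg_xlog/archive_status and fsync it:

```c
#include <stdio.h>

/* Build the ".done" archive-status path for a just-restored WAL segment,
 * e.g. "pg_xlog/archive_status/000000010000000000000001.done".
 * Returns 0 on success, -1 if the path would not fit in the buffer.
 * (Creating and fsync'ing the file itself is omitted here.) */
int mark_wal_as_archived(const char *xlog_dir, const char *segment,
                         char *path_out, size_t path_len)
{
    int n = snprintf(path_out, path_len, "%s/archive_status/%s.done",
                     xlog_dir, segment);
    if (n < 0 || (size_t) n >= path_len)
        return -1;               /* path would be truncated */
    return 0;
}
```

Calling this right after a successful restore_command (only when archive_mode is on) would leave the segment with a .done marker, so the post-failover checkpoint recycles it instead of queueing it for archiving.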
Re: [HACKERS] Regarding column reordering project for GSoc 2012
On Wed, Mar 21, 2012 at 10:51 PM, Atri Sharma wrote: > Please let me know how to proceed further. > > Waiting for your reply, sure -- let's take this discussion off line. send me a private mail and we'll discuss if/how we can get this off the ground. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 6:13 AM, Fujii Masao wrote: > On Fri, Mar 23, 2012 at 5:56 PM, Magnus Hagander wrote: >> Might it be a good idea to put it on it's own row instead of changing >> the format of an existing row, in order not to break scripts and >> programs that are parsing the previous output? > > Good idea! What row name should we use for the WAL file containing > REDO record? "Latest checkpoint's REDO file"? Sounds good to me. I like the idea, too. The status quo is an unnecessary nuisance, so this will be a nice usability improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpoint patches
On Thu, Mar 22, 2012 at 8:44 PM, Stephen Frost wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Thu, Mar 22, 2012 at 3:45 PM, Stephen Frost wrote:
>> > Well, those numbers just aren't that exciting. :/
>>
>> Agreed. There's clearly an effect, but on this test it's not very big.
>
> Ok, perhaps that was because of how you were analyzing it using the 90th
> percetile..?

Well, how do you want to look at it? Here's the data from the 80th percentile through the 100th percentile - percentile, patched, unpatched, difference - for the same two runs I've been comparing:

 80      1321     1348     -27
 81      1333     1360     -27
 82      1345     1373     -28
 83      1359     1387     -28
 84      1373     1401     -28
 85      1388     1417     -29
 86      1404     1434     -30
 87      1422     1452     -30
 88      1441     1472     -31
 89      1462     1494     -32
 90      1487     1519     -32
 91      1514     1548     -34
 92      1547     1582     -35
 93      1586     1625     -39
 94      1637     1681     -44
 95      1709     1762     -53
 96      1825     1905     -80
 97      2106     2288     -182
 98      12100    24645    -12545
 99      186043   201309   -15266
100      9513855  9074161  439694

Here are the 95th-100th percentiles for each of the six runs:

ckpt.checkpoint-sync-pause-v1.10: 1709, 1825, 2106, 12100, 186043, 9513855
ckpt.checkpoint-sync-pause-v1.11: 1707, 1824, 2118, 16792, 196107, 8869602
ckpt.checkpoint-sync-pause-v1.12: 1693, 1807, 2091, 15132, 191207, 7246326
ckpt.master.10: 1734, 1875, 2235, 21145, 203214, 6855888
ckpt.master.11: 1762, 1905, 2288, 24645, 201309, 9074161
ckpt.master.12: 1746, 1889, 2272, 20309, 194459, 7833582

By the way, I reran the tests on master with checkpoint_timeout=16min, and here are the tps results: 2492.966759, 2588.750631, 2575.175993. So it seems like not all of the tps gain from this patch comes from the fact that it increases the time between checkpoints. Comparing the median of three results between the different sets of runs, applying the patch and setting a 3s delay between syncs gives you about a 5.8% increase in throughput, but also adds 30-40 seconds between checkpoints. 
If you don't apply the patch but do increase time between checkpoints by 1 minute, you get about a 5.0% increase in throughput. That certainly means that the patch is doing something - because 5.8% for 30-40 seconds is better than 5.0% for 60 seconds - but it's a pretty small effect. And here are the latency results for 95th-100th percentile with checkpoint_timeout=16min. ckpt.master.13: 1703, 1830, 2166, 17953, 192434, 43946669 ckpt.master.14: 1728, 1858, 2169, 15596, 187943, 9619191 ckpt.master.15: 1700, 1835, 2189, 22181, 206445, 8212125 The picture looks similar here. Increasing checkpoint_timeout isn't *quite* as good as spreading out the fsyncs, but it's pretty darn close. For example, looking at the median of the three 98th percentile numbers for each configuration, the patch bought us a 28% improvement in 98th percentile latency. But increasing checkpoint_timeout by a minute bought us a 15% improvement in 98th percentile latency. So it's still not clear to me that the patch is doing anything on this test that you couldn't get just by increasing checkpoint_timeout by a few more minutes. Granted, it lets you keep your inter-checkpoint interval slightly smaller, but that's not that exciting. That having been said, I don't have a whole lot of trouble believing that there are other cases where this is more worthwhile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
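For reference, percentile tables like the ones above can be derived from raw per-transaction latencies with a nearest-rank calculation along these lines. This is a sketch: the exact method the tooling in this thread used isn't stated, so results could differ slightly at the tails.

```c
#include <stdlib.h>

/* Comparator for qsort over long latency samples. */
static int cmp_long(const void *a, const void *b)
{
    long la = *(const long *) a, lb = *(const long *) b;
    return (la > lb) - (la < lb);
}

/* Nearest-rank p-th percentile of n latency samples (0 < p <= 100).
 * Sorts the samples in place, then picks the ceil(p*n/100)-th smallest. */
long latency_percentile(long *samples, size_t n, int p)
{
    qsort(samples, n, sizeof(long), cmp_long);
    size_t rank = ((size_t) p * n + 99) / 100;   /* ceil(p*n/100), 1-based */
    if (rank == 0)
        rank = 1;
    return samples[rank - 1];
}
```

Running this over each test's per-transaction latencies at p = 80..100 would reproduce a table of the shape shown above, and makes clear why the 98th-100th rows are the interesting ones: the tail ranks land on the checkpoint-induced outliers.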
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 5:56 PM, Magnus Hagander wrote: > Might it be a good idea to put it on it's own row instead of changing > the format of an existing row, in order not to break scripts and > programs that are parsing the previous output? Good idea! What row name should we use for the WAL file containing REDO record? "Latest checkpoint's REDO file"? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 10:51, Fujii Masao wrote: > On Fri, Mar 23, 2012 at 2:06 PM, Jaime Casanova wrote: >> On Thu, Mar 22, 2012 at 11:06 PM, Fujii Masao wrote: >>> >>> We can use >>> pg_xlogfile_name function to calculate that, but it cannot be executed in >>> the standby. Another problem is that pg_xlogfile_name always uses >>> current timeline for the calculation, so if the reported timeline is not >>> the same as current one, pg_xlogfile_name cannot return the correct WAL >>> file name. Making pg_controldata report that WAL file name gets rid of >>> such a complexity. >>> >> >> i would think that pg_xlogfile_name() is not allowed in the standby >> because ThisTimelineId is not very well defined in recovery but if you >> extend pg_xlogfile_name() to also receive a timelineid as you >> suggested in [1] then that version of the function could be allowed in >> the standby. >> or there is something else i'm missing? >> >> is that enough for you to solve your problem? > > Yes, we can more easily calculate the cutoff point by using that extended > pg_xlogfile_name(). But if pg_controldata reports the WAL file name, we > can calculate the cutoff point without starting the server. So I think that > it's worth changing pg_controldata that way even if we extend > pg_xlogfile_name(). +1 - I think they're both useful things, each on its own. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 2:06 PM, Jaime Casanova wrote: > On Thu, Mar 22, 2012 at 11:06 PM, Fujii Masao wrote: >> >> We can use >> pg_xlogfile_name function to calculate that, but it cannot be executed in >> the standby. Another problem is that pg_xlogfile_name always uses >> current timeline for the calculation, so if the reported timeline is not >> the same as current one, pg_xlogfile_name cannot return the correct WAL >> file name. Making pg_controldata report that WAL file name gets rid of >> such a complexity. >> > > i would think that pg_xlogfile_name() is not allowed in the standby > because ThisTimelineId is not very well defined in recovery but if you > extend pg_xlogfile_name() to also receive a timelineid as you > suggested in [1] then that version of the function could be allowed in > the standby. > or there is something else i'm missing? > > is that enough for you to solve your problem? Yes, we can more easily calculate the cutoff point by using that extended pg_xlogfile_name(). But if pg_controldata reports the WAL file name, we can calculate the cutoff point without starting the server. So I think that it's worth changing pg_controldata that way even if we extend pg_xlogfile_name(). Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
Thank you for picking this up.

> is considering three cases: it got a 2-byte integer (and can continue on), > or there aren't yet 2 more bytes available in the buffer, in which case it > should return EOF without doing anything, or pqGetInt detected a hard > error and updated the connection error state accordingly, in which case > again there is nothing to do except return EOF. In the patched code we > have: ... > which handles neither the second nor third case correctly: it thinks that > "data not here yet" is a hard error, and then makes sure it is an error by > destroying the parsing state :-(.

Marko and I think that, in protocol 3, all bytes of the incoming message should already have been loaded when entering getAnotherTuple(). The following part in pqParseInput3() does this:

| if (avail < msgLength)
| {
|     /*
|      * Before returning, enlarge the input buffer if needed to hold
|      * the whole message.
|     (snipped)..
|      */
|     if (pqCheckInBufferSpace(conn->inCursor + (size_t) msgLength,
|                              conn))
|     {
|         /*
|          * XXX add some better recovery code...
|         (snipped)..
|          */
|         handleSyncLoss(conn, id, msgLength);
|     }
|     return;

So, if the cursor state is broken just after exiting getAnotherTuple(), it had already been broken BEFORE entering getAnotherTuple(), according to the current design. That is what the 'protocol error' means. pqGetInt there should not detect any errors except a broken message.

> error, that possibly-useful error message is overwritten with an entirely > useless "protocol error" text.

Plus, the current pqGetInt seems to set its own error message only for a wrong 'bytes' parameter. On the other hand, in protocol 2 (to be removed?) the error handling mechanism does get touched, because a full load of the message is not guaranteed.

> I don't think the error return cases for the row processor have been > thought out too well either. 
> The row processor is not in charge of what > happens to the PGresult,

The default row processor stuffs the PGresult with tuples; another (say that of dblink) leaves it empty. Each row processor manages the PGresult to whatever extent it chooses.

> and it certainly has no business telling libpq to just "exit > immediately from the topmost libpq function". If we do that > we'll probably lose sync with the data stream and be unable to > recover use of the connection at all.

I don't think PGresult has any responsibility for the error handling system in the current implementation. The phrase 'exit immediately from the topmost libpq function' does not appear in the patch. The exit routes from the row processor are the following:

- Do longjmp (or use the PG_TRY-CATCH mechanism) out of the row processor.
- The row processor returns 0 when entered from PQisBusy(), which immediately exits from PQisBusy().

Cursor consistency will be kept in both cases. The cursor will already be just past the last byte of the current message.

> Also, do we need to consider any error cases for the row > processor other than out-of-memory? If so it might be a good > idea for it to have some ability to store a custom error > message into the PGconn, which it cannot do given the current > function API.

It does not seem strongly necessary for dblink or PQgetRow compared to the additional complexity it would bring, so this patch does not include it.

> In the same vein, I am fairly uncomfortable with the blithe assertion that > a row processor can safely longjmp out of libpq. This was never foreseen > in the original library coding and there are any number of places that > that might break, now or in the future. Do we really need to allow it?

To protect the row processor from longjmp'ing out, I enclosed the functions that potentially throw exceptions in a PG_TRY-CATCH clause in an early version. This was totally safe, but the penalty was not negligible because the TRY-CATCH was entered for every row. 
> If we do, it would be a good idea to decorate the libpq > functions that are now expected to possibly longjmp with > comments saying so. Tracing all the potential call paths that > might be aborted by a longjmp is an essential activity anyway.

For the present code, if not the future, I can show the trail of the confirmation process.

- There is no difference between with and without the patch at the level of getAnotherTuple() from the viewpoint of consistency.
- Assume pqParseInput3 detects that the next message has not arrived after getAnotherTuple returned. It exits immediately on reading the length of the next message. This is the same condition as longjumping.

> if (pqGetInt(&msgLength, 4, conn)) > return;

- parseInput passes it through and immediately exits in a consistent state.
- The caller of PQgetResult, PQisBusy, PQskipResult, PQnotifies, PQputCopyData, or pqHandleSendFailure finally regains control. I am convinced that the async status at that time must be PGASYNC_BUSY and the conn cursor in a consistent state.

So the ancesto
Re: [HACKERS] Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On Fri, Mar 23, 2012 at 05:06, Fujii Masao wrote: > Hi, > > I'd like to propose to change pg_controldata so that it reports the name > of WAL file containing the latest checkpoint's REDO record, as follows: > > $ pg_controldata $PGDATA > ... > Latest checkpoint's REDO location: 0/16D6ACC (file > 00010001) > Latest checkpoint's TimeLineID: 1 > ... > > This simplifies very much the way to calculate the archive file cutoff point > because the reported WAL file is just cutoff point itself. If the file name is > not reported, we have to calculate the cutoff point from the reported > location and timeline, which is complicated calculation. We can use > pg_xlogfile_name function to calculate that, but it cannot be executed in > the standby. Another problem is that pg_xlogfile_name always uses > current timeline for the calculation, so if the reported timeline is not > the same as current one, pg_xlogfile_name cannot return the correct WAL > file name. Making pg_controldata report that WAL file name gets rid of > such a complexity. > > You may think that archive_cleanup_command is usable for that purpose. > That's true. But it's not usable simply for the case where there are more > than one standby servers. In this case, the archive file cutoff point needs > to be calculated from each standby's REDO location and timeline. > > Attached patch changes pg_controldata as above. Thought? Might it be a good idea to put it on its own row instead of changing the format of an existing row, in order not to break scripts and programs that are parsing the previous output? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standbys, txid_current_snapshot, wraparound
On Fri, Mar 23, 2012 at 12:26 AM, Daniel Farina wrote: > Some time ago I reported bug 6291[0], which reported a Xid wraparound, > both as reported in pg_controldata and by txid_current_snapshot. > Unfortunately, nobody could reproduce it. > > Today, the same system of ours just passed the wraparound mark > successfully at this time, incrementing the epoch. However, two > standbys have not done the same: they have wrapped to a low txid. At > this time, pg_controldata does report the correct epoch, as I read it, > unlike the original case. > > I have not yet tried to reproduce this in a minimal way, but I wanted > to relate this information as soon as possible. > > These systems are 9.0.6, on Ubuntu 10.04 LTS, amd64. > > [0]: http://archives.postgresql.org/pgsql-bugs/2011-11/msg00094.php So we have this? Master pg_controldata - OK txid_current_snapshot() - OK Standby pg_controldata - OK txid_current_snapshot() - lower value Are there just 2 standbys? So all standbys have acted identically? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers