Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-05 Thread Boszormenyi Zoltan

On 2013-01-05 05:58, Amit Kapila wrote:

On Friday, January 04, 2013 10:57 PM Boszormenyi Zoltan wrote:

Hi,
I am reviewing your patch.

Thank you very much.


Since you are using a constant string, it would be a little faster
to use sizeof(string)-1 as it can be computed at compile-time
and not run the strlen() all the time this code is reached.

I referred to the code just above, for PG_TEMP_FILE_PREFIX in the same
function sendDir().
It's not only that place; there are other places where strlen is used for
PG_TEMP_FILE_PREFIX.
I think in this path, such an optimization might not help much.
However, if you think we should take care of this, then I can find the other
places where a similar change can be made, to keep it consistent?


Yes, but it needs a separate cleanup patch.

Also, since the SET PERSISTENT patch seems to create a single lock file,
sizeof(string) (which includes the 0 byte at the end of the string, so it
matches the whole filename) can be used instead of strlen(string) or
sizeof(string)-1 that match the filename as a prefix only.
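
For reference, the two cases in sketch form (PG_TEMP_FILE_PREFIX matches the
backend's constant; CONF_LOCK_FILE is a made-up stand-in for the lock file
name used by the patch):

    #include <string.h>

    #define PG_TEMP_FILE_PREFIX "pgsql_tmp"
    #define CONF_LOCK_FILE      "postgresql.auto.conf.lock"  /* hypothetical */

    /* Prefix match: sizeof(...) - 1 excludes the terminating NUL and is
     * computed at compile time, unlike strlen() on every directory entry. */
    static int
    is_temp_file(const char *name)
    {
        return strncmp(name, PG_TEMP_FILE_PREFIX,
                       sizeof(PG_TEMP_FILE_PREFIX) - 1) == 0;
    }

    /* Whole-name match: keeping the NUL in the compared length makes this
     * reject names that merely start with the constant. */
    static int
    is_conf_lock_file(const char *name)
    {
        return strncmp(name, CONF_LOCK_FILE, sizeof(CONF_LOCK_FILE)) == 0;
    }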




In create_conf_lock_file():



Can't we add a new LWLock and use it as a critical section instead
of waiting for 10 seconds? It would be quite surprising to wait
10 seconds when accidentally executing two SET PERSISTENT
statements in parallel.

Yes, you are right: adding a new LWLock will avoid the use of sleep.
However, I am just not sure whether we should add a new LWLock for only this
purpose.

Similar logic is used in CreateLockFile() for the postmaster file, but it
doesn't use sleep.
How about reducing the sleep time, or removing the sleep entirely? In that
case the user might get an error and would need to retry for the command to
succeed.

With Regards,
Amit Kapila.




--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/





[HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-05 Thread Noah Misch
On Sat, Jan 05, 2013 at 08:05:11AM +0100, Boszormenyi Zoltan wrote:
 On 2013-01-05 05:58, Amit Kapila wrote:
 On Friday, January 04, 2013 10:57 PM Boszormenyi Zoltan wrote:
 In create_conf_lock_file():

 Can't we add a new LWLock and use it as a critical section instead
 of waiting for 10 seconds? It would be quite surprising to wait
 10 seconds when accidentally executing two SET PERSISTENT
 statements in parallel.
 Yes, you are right: adding a new LWLock will avoid the use of sleep.
 However, I am just not sure whether we should add a new LWLock for only
 this purpose.

 Similar logic is used in CreateLockFile() for the postmaster file, but it
 doesn't use sleep.
 How about reducing the sleep time, or removing the sleep entirely? In that
 case the user might get an error and would need to retry for the command
 to succeed.

 I think removing the loop entirely is better.

Agreed.  A new LWLock with a narrow purpose is fine.  CreateLockFile() happens
early, before shared memory is available.  Its approach is not a relevant
precedent for code that runs later.

 However, the behaviour should be discussed by a wider audience:
 - the backends should wait for each other just like other statements
that fight for the same object (in which case locking is needed), or
 - throw an error immediately on conflicting parallel work

All other things being equal, the first choice sounds better.  (I don't think
the decision is essential to the utility of this feature.)
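
To make the first option concrete, the statement-level serialization is a
few lines (SetPersistentLock being a hypothetical new LWLock, and
write_auto_conf_file() a stand-in for whatever rewrites the generated file):

    LWLockAcquire(SetPersistentLock, LW_EXCLUSIVE);
    /* a concurrent SET PERSISTENT blocks here instead of sleep-and-retry */
    write_auto_conf_file();
    LWLockRelease(SetPersistentLock);   /* also released on error, by abort */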




Re: [HACKERS] pg_retainxlog for inclusion in 9.3?

2013-01-05 Thread Magnus Hagander
On Fri, Jan 4, 2013 at 7:13 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 1/3/13 12:30 PM, Robert Haas wrote:
 On Thu, Jan 3, 2013 at 11:32 AM, Magnus Hagander mag...@hagander.net wrote:
 Any particular reason? It goes pretty tightly together with
 pg_receivexlog, which is why I'd prefer putting it alongside that one.
 But if you have a good argument against it, I can change my mind :)

 Mostly that it seems like a hack, and I suspect we may come up with a
 better way to do this in the future.

 It does seem like a hack.  Couldn't this be implemented with a backend
 switch instead?

It definitely is a bit of a hack.

I assume by "backend switch" you mean a GUC, right? If so, no, not easily
so, because it's the archiver process that does the deleting, and this
process does not have access to a full backend interface, e.g. the
ability to run a query. We could make it look at the same data that's
currently shown in pg_stat_replication through shared memory, but that
would *only* work in the very simplest cases (e.g. a single
pg_receivexlog and no other replication). The ability to run a custom
SQL query is going to be necessary for anything a bit more advanced.
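
As a sketch of the kind of check involved (illustrative only - not
pg_retainxlog's actual code), such a tool is essentially one libpq query
mapped to an exit code:

    #include <stdio.h>
    #include <string.h>
    #include <libpq-fe.h>

    /*
     * Sketch: exit 0 ("safe to recycle") only once some pg_receivexlog
     * client has flushed past the WAL file named in argv[1].  The query
     * and the program structure are illustrative assumptions.
     */
    int
    main(int argc, char **argv)
    {
        const char *params[1] = { argv[1] };
        PGconn     *conn;
        PGresult   *res;
        int         ok;

        if (argc < 2)
            return 1;
        conn = PQconnectdb("");     /* rely on PG* environment variables */
        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }
        res = PQexecParams(conn,
                           "SELECT count(*) FROM pg_stat_replication "
                           " WHERE application_name = 'pg_receivexlog' "
                           "   AND pg_xlogfile_name(flush_location) >= $1",
                           1, NULL, params, NULL, NULL, 0);
        ok = (PQresultStatus(res) == PGRES_TUPLES_OK &&
              strcmp(PQgetvalue(res, 0, 0), "0") != 0);
        PQclear(res);
        PQfinish(conn);
        return ok ? 0 : 1;
    }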


 Also, as a small practical matter, since this is a server-side program
 (since it's being used as archive_command), we shouldn't put it into the
 pg_basebackup directory, because that would blur the lines about what to
 install where, in particular for the translations.

Good argument. That, along with it being a hack, and the comment from
Robert, means that maybe contrib/ is a better place for it, yes.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] bad examples in pg_dump README

2013-01-05 Thread Magnus Hagander
On Fri, Jan 4, 2013 at 3:36 AM, Josh Kupershmidt schmi...@gmail.com wrote:
 The ./src/bin/pg_dump README contains several inoperable examples.
 First, this suggestion:

 |   or to list tables:
 |
 |   pg_restore backup-file --table | less

 seems bogus, i.e. the --table option requires an argument specifying
 which table to restore, and does not list tables. Next, this
 suggested command:

 |  pg_restore backup-file -l --oid --rearrange | less

 hasn't worked since 7.4 or thereabouts, since we don't have
 --rearrange or --oid anymore. (I'm not sure we ever had --oid, maybe
 it was supposed to be --oid-order?). Then, three examples claim we
 support a --use flag, e.g.

 |  pg_restore backup.bck --use=toc.lis > script.sql

 where presumably --use-list was meant. Further little gripes with
 this README include mixed use of tabs and spaces for the examples, and
 lines running over the 80-column limit which at least some of our
 other READMEs seem to honor.
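
 For the record, working equivalents under today's options would look
 roughly like this (toc.lis being an edited listing produced by --list):

     pg_restore --list backup-file | less
     pg_restore --use-list=toc.lis backup.bck > script.sql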

 I started out attempting to fix up the README, keeping the original
 example commands intact, but upon further reflection I think the
 examples of dumping, restoring, and selective-restoring in that README
 don't belong there anyway. There are already better examples of such
 usage in the pg_dump/pg_restore documentation pages, and IMO that's
 where such generally-useful usage information belongs.

 I propose slimming down the pg_dump README, keeping intact the
 introductory notes and details of the tar format.

Do we need to keep it at all, really? Certainly the introductory part
is covered in the main documentation already...

I believe the tar notes are also out of date. For example, they claim
that you can't expect pg_restore to work with an uncompressed tar
format - yet the header in pg_backup_tar.c says that an uncompressed
tar format is compatible with a directory format dump, and thus you
*can* use pg_restore.

(And fwiw, the note about the user should probably go in src/port/ now
that we moved the tar header creation there a few days ago)

I would suggest we just drop the README file completely. I don't think
it adds any value at all.

Any objections to that path? :)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]

2013-01-05 Thread Magnus Hagander
On Thu, Jan 3, 2013 at 1:33 PM, Boszormenyi Zoltan z...@cybertec.at wrote:
 On 2013-01-02 11:54, Boszormenyi Zoltan wrote:

 On 2013-01-02 10:37, Boszormenyi Zoltan wrote:

 On 2013-01-02 10:12, Magnus Hagander wrote:
 Attached is the quotes-v2 patch, the function is renamed and
 the comment is modified plus the pg_basebackup v21 patch
 that uses this function.


 Rebased after pgtar.h was added. The quotes.c comment was modified
 further, so that it no longer says this function is used for SQL string literals.

Quotes patch applied with a few small changes:
1) Support for the msvc build (needs an entry added for new files that
go in src/port if they should be used on Windows)
2) Makefile entries should be alphabetical (yes, that's really trivial
nitpicking :D)


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] Re: Proposal: Store timestamptz of database creation on pg_database

2013-01-05 Thread Fabrízio de Royes Mello
On Fri, Jan 4, 2013 at 4:07 PM, Peter Eisentraut pete...@gmx.net wrote:

 On 1/3/13 3:26 PM, Robert Haas wrote:
  It's true, as we've often
  said here, that leveraging the OS facilities means that we get the
  benefit of improving OS facilities for free - but it also means that
  we never exceed what the OS facilities are able to provide.

 And that should be the deciding factor, shouldn't it?  Clearly, the OS
 timestamps do not satisfy the requirements of tracking database object
 creation times.


+1

And IMHO we must decide what we'll do, or whether we'll do anything at all.

Many ways of how to implement this (and how not to) were discussed in this
thread, so I've compiled some important points discussed before (sorry if I
forgot something):

* the original proposal was just to add a column to the shared catalog
'pg_database' to track creation time (I already sent a patch [1]), but the
discussion moved toward implementing a way to track the creation time of all
database objects

* some people said that if we implement that, then we must have some way to
alter creation times by SQL (ALTER commands) and also a way to dump and
restore this info with pg_dump/pg_restore. Some agreed and others disagreed.

* we talked about implementing it with EventTriggers, but they don't cover
shared objects (like databases, roles, tablespaces, ...); someone suggested
extending EventTriggers to cover this kind of object, or maybe providing a
way to create *shared tables* (this is what I understood). This approach
forces everyone to implement their own time-tracking system, unless someone
shares an extension that does it.

* we also discussed creating two new catalogs, one local and one shared
(like pg_description and pg_shdescription), to track creation times of all
database objects.

Please correct me if I forgot something. Anyway, we must decide what to do.

I don't know enough, but I have another idea. What do you guys think about
having tables like the stats tables to track creation times, with a GUC to
enable or disable this behavior?


Regards,


[1] http://archives.postgresql.org/pgsql-hackers/2013-01/msg00111.php

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]

2013-01-05 Thread Magnus Hagander
On Sat, Jan 5, 2013 at 3:41 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Jan 3, 2013 at 1:33 PM, Boszormenyi Zoltan z...@cybertec.at wrote:
 On 2013-01-02 11:54, Boszormenyi Zoltan wrote:

 On 2013-01-02 10:37, Boszormenyi Zoltan wrote:

 On 2013-01-02 10:12, Magnus Hagander wrote:
 Attached is the quotes-v2 patch, the function is renamed and
 the comment is modified plus the pg_basebackup v21 patch
 that uses this function.


 Rebased after pgtar.h was added. The quotes.c comment was modified
 even more so it doesn't say this function is used for SQL string literals.

 Quotes patch applied with a few small changes:
 1) Support for the msvc build (needs an entry added for new files that
 go in src/port if they should be used on Windows)
 2) Makefile entries should be alphabetical (yes, that's really trivial
 nitpicking :D)

After some further modifications, I've applied the pg_basebackup patch as well.

Thanks! And again, apologies for dragging the process out longer than
should've been necessary.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] Re: Proposal: Store timestamptz of database creation on pg_database

2013-01-05 Thread Stephen Frost
* Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote:
 * we also discussed creating two new catalogs, one local and one shared
 (like pg_description and pg_shdescription), to track creation times of all
 database objects.

Creating a separate catalog (or two) every time we want to track XYZ for
all objects is rather overkill...  Thinking about this a bit more, and
noting that pg_description/shdescription more-or-less already exist as a
framework for tracking 'something' for 'all catalog entries'- why don't
we just add these columns to those tables..?  This would also address
Peter's concern about making sure we do this 'wholesale' and in one
release rather than spread across multiple releases- just make sure it
covers the same set of things which 'comment' does.

Also, I don't think we really need a GUC for this.

Thanks,

Stephen




Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]

2013-01-05 Thread Boszormenyi Zoltan

On 2013-01-05 16:58, Magnus Hagander wrote:

On Sat, Jan 5, 2013 at 3:41 PM, Magnus Hagander mag...@hagander.net wrote:

On Thu, Jan 3, 2013 at 1:33 PM, Boszormenyi Zoltan z...@cybertec.at wrote:

On 2013-01-02 11:54, Boszormenyi Zoltan wrote:

On 2013-01-02 10:37, Boszormenyi Zoltan wrote:

On 2013-01-02 10:12, Magnus Hagander wrote:
Attached is the quotes-v2 patch, the function is renamed and
the comment is modified plus the pg_basebackup v21 patch
that uses this function.


Rebased after pgtar.h was added. The quotes.c comment was modified
even more so it doesn't say this function is used for SQL string literals.

Quotes patch applied with a few small changes:
1) Support for the msvc build (needs an entry added for new files that
go in src/port if they should be used on Windows)
2) Makefile entries should be alphabetical (yes, that's really trivial
nitpicking :D)

After some further modifications, I've applied the pg_basebackup patch as well.


Thank you very much!


Thanks! And again, apologies for dragging the process out longer than
should've been necessary.


I blamed it on me not having done reviews earlier in the commitfest,
which I finally did last week. Now, if only Tom could find some time
to review the lock_timeout patch... ;-)

Thanks again and best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/





Re: [HACKERS] enhanced error fields

2013-01-05 Thread Pavel Stehule
Hello

2013/1/4 Robert Haas robertmh...@gmail.com:
 On Fri, Dec 28, 2012 at 1:21 PM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 Now, as to the question of whether we need to make sure that
 everything is always fully qualified - I respectfully disagree with
 Stephen, and maintain my position set out in the last round of
 feedback [1], which is that we don't need to fully qualify everything,
 because *for the purposes of error handling*, which is what I think we
 should care about, these fields are sufficient:

 +   column_name text,
 +   table_name text,
 +   constraint_name text,
 +   schema_name text,

 If you use the same constraint name everywhere, you might get the same
 error message. The situations in which this scheme might fall down are
 too implausible for me to want to bloat up all those ereport sites any
 further (something that Stephen also complained about).

 I think that the major outstanding issues are concerning whether or
 not I have the API here right. I make explicit guarantees as to the
 availability of certain fields for certain errcodes (a small number of
 Class 23 - Integrity Constraint Violation codes). No one has really
 said anything about that, though I think it's important.

 I also continue to think that we shouldn't have routine_name,
 routine_schema and trigger_name fields - I think it's wrong-headed
 to have an exception handler magically act on knowledge about where
 the exception came from that has been baked into the exception - is
 there any sort of precedent for this? Pavel disagrees here. Again, I
 defer to others.

 I don't really agree with this.  To be honest, my biggest concern
 about this patch is that it will make it take longer to report an
 error.  At least in the cases where these additional fields are
 included, it will take CPU time to populate those fields, and maybe
 there will be some overhead even in the cases where they aren't even
 used (although I'd expect that to be too little to measure).  Now,
 maybe that doesn't matter in the case where the error is being
 reported back to the client, because the overhead of shipping packets
 across even a local socket likely dwarfs the overhead, but I think it
 matters a lot where you are running a big exception-catching loop.
 That is already painfully slow, and -1 from me on anything that makes
 it significantly slower.  I have a feeling this isn't the first time
 I'm ranting about this topic in relation to this patch, so apologies
 if this is old news.

We did these tests independently - Peter and I, with the same result. In the
use cases developed to highlight the impact of this patch you can see some
slowdown - if I remember well, less than 3% - and those raised exceptions
really intensively. And it is visible only from the stored procedure
environment - if somebody uses it from the client side, the impact is zero.


 But if we decide that there is no performance issue or that we don't
 care about the hit there, then I think Stephen and Pavel are right to
 want a large amount of very precise detail.  What's the use case for
 this feature?  Presumably, it's for people for whom parsing the error
 message is not a practical option, so either they textual error
 message doesn't identify the target object sufficiently precisely, and
 they want to make sure they know what it applies to; or else it's for
 people who want any error that applies to a table to be handled the
 same way (without worrying about exactly which error they have got).
 Imagine, for example, someone writing a framework that will be used as
 a basis for many different applications.  It might want to do
 something, like, I don't know, update the comment on a table every
 time an error involving that table occurs.  Clearly, precise
 identification of the table is a must.  In a particular application
 development environment, it's reasonable to rely on users to name
 things sensibly, but if you're shipping a tool that might be dropped
 into any arbitrary environment, that's significantly more dangerous.

 Similarly, for a function-related error, you'd need something
 that looks like the output of pg_proc.oid::regprocedure, or individual
 fields with the various components of that output.  That sounds like
 routine_name et. al.

Probably we don't need all the fields mentioned in ANSI SQL, because some of
these fields make no sense in pg, but routine name and trigger name are
really basic for somewhat more sophisticated work with exceptions at the
PL level.
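
For illustration, the kind of client-side handling these fields enable (a
sketch, assuming the patch exposes them through libpq's PQresultErrorField()
with codes along the lines of PG_DIAG_SCHEMA_NAME, PG_DIAG_TABLE_NAME and
PG_DIAG_CONSTRAINT_NAME; PG_DIAG_SQLSTATE already exists today):

    #include <stdio.h>
    #include <string.h>
    #include <libpq-fe.h>

    /* Sketch: route an integrity-constraint error by the object it names. */
    static void
    report_integrity_error(const PGresult *res)
    {
        const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

        if (sqlstate && strncmp(sqlstate, "23", 2) == 0)    /* Class 23 */
        {
            const char *schema = PQresultErrorField(res, PG_DIAG_SCHEMA_NAME);
            const char *table  = PQresultErrorField(res, PG_DIAG_TABLE_NAME);
            const char *constr = PQresultErrorField(res, PG_DIAG_CONSTRAINT_NAME);

            fprintf(stderr, "constraint %s violated on %s.%s\n",
                    constr ? constr : "?", schema ? schema : "?",
                    table ? table : "?");
        }
    }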

Regards

Pavel



 I'm not happy about the idea of shipping OIDs back to the client.
 OIDs are too user-visible as it is; we should try to make them less
 so, not more so.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




Re: [HACKERS] enhanced error fields

2013-01-05 Thread Pavel Stehule
2013/1/4 Peter Geoghegan pe...@2ndquadrant.com:
 On 4 January 2013 18:07, Tom Lane t...@sss.pgh.pa.us wrote:
 Exactly.  To my mind, the *entire* point of this patch is to remove the
 need for people to try to dig information out of potentially-localized
 message strings.  It's not clear to me that we have to strain to provide
 information that isn't in the currently-reported messages --- we are
 only trying to make it easier for client-side code to extract the
 information it's likely to need.

 It seems that we're in agreement, then. I'll prepare a version of the
 patch very similar to the one I previously posted, but with some
 caveats about how reliably the values can be used. I think that that
 should be fine.

is there agreement on the routine_name and trigger_name fields?

Regards

Pavel


 --
 Peter Geoghegan   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Reporting hba lines

2013-01-05 Thread Magnus Hagander
On Fri, Jun 29, 2012 at 4:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 Turned out to be a bit more work than I thought, since the current
 parser reads pg_hba byte by byte, and not line by line. So I had to
 change that. See attached, seems reasonable?

 A couple of comments:

 * In some places you have if ((c = *(*lineptr)++) != '\0') and in other
 places just if ((c = *(*lineptr)++)).  This should be consistent (and
 personally I prefer the first way).

 * I'm not sure that this conversion is right:

 !   if (c != EOF)
 !       ungetc(c, fp);
 ---
 !   if (c != '\0')
 !       (*lineptr)--;

 In the file case, it's impossible to push back EOF, and unnecessary
 since another getc will produce EOF again anyway.  In the string case,
 though, I think you might need to decrement the lineptr unconditionally,
 else next call will run off the end of the string no?

 * This bit seems a bit ugly, and not Windows-aware either:

 !           /* We don't store the trailing newline */
 !           if (rawline[strlen(rawline)-1] == '\n')
 !               rawline[strlen(rawline)-1] = '\0';
 !

 It might be better to strip trailing \n and \r from the line immediately
 upon read, rather than here.

I re-found this branch that I had completely forgotten about, when
cleaning up my repository. Oops.

Attached is an updated version of the patch, per the comments from Tom
and rebased on top of the current master. Since it's been a long time,
and there's been some code churn in the area, another round of review is
probably a good thing...
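
Tom's last point about the trailing newline amounts to something like this
at read time (a sketch, not necessarily what the attached patch does):

    size_t      len = strlen(rawline);

    /* strip trailing newline / carriage return right after reading,
     * which also covers Windows \r\n line endings */
    while (len > 0 && (rawline[len - 1] == '\n' || rawline[len - 1] == '\r'))
        rawline[--len] = '\0';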

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


hba_line3.patch
Description: Binary data



Re: [HACKERS] git author vs committer

2013-01-05 Thread Magnus Hagander
On Thu, Sep 13, 2012 at 9:00 AM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Sep 13, 2012 at 5:22 AM, Peter Eisentraut pete...@gmx.net wrote:
 On Wed, 2012-09-12 at 19:13 +0200, Magnus Hagander wrote:
 Just to be clear, what you're saying is we want to change the policy
 that says "committer must be on list of approved committers &&
 committer==author" to "committer must be on list of approved committers
 && author must be on list of approved committers"?

 Yes, that would be my request.

 Definitely sounds like a reasonable thing to do. So unless there are
 objections, I'll try to get around to getting that done not too long
 from now.

Sorry, this took much longer than I had initially planned for, but at
least it's been done now.  Hopefully I didn't break anything else in
the process :)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] enhanced error fields

2013-01-05 Thread Peter Geoghegan
On 5 January 2013 16:56, Pavel Stehule pavel.steh...@gmail.com wrote:
 It seems that we're in agreement, then. I'll prepare a version of the
 patch very similar to the one I previously posted, but with some
 caveats about how reliably the values can be used. I think that that
 should be fine.

 is there agreement on the routine_name and trigger_name fields?

Well, Tom and I are both opposed to including those fields. Peter E
seemed to support it in some way, but didn't respond to Tom's
criticisms (which were just a restatement of my own). So, it seems to
me that we're not going to do that, assuming nothing changes.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Re: Proposal: Store timestamptz of database creation on pg_database

2013-01-05 Thread Fabrízio de Royes Mello
* Stephen Frost sfr...@snowman.net wrote:

 * Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote:
  * we also discussed creating two new catalogs, one local and one
  shared (like pg_description and pg_shdescription), to track creation
  times of all database objects.

 Creating a separate catalog (or two) every time we want to track XYZ for
 all objects is rather overkill...  Thinking about this a bit more, and
 noting that pg_description/shdescription more-or-less already exist as a
 framework for tracking 'something' for 'all catalog entries'- why don't
 we just add these columns to those tables..?

But those tables are filled only when we execute a COMMENT ON statement...
so your idea is to create a 'null' comment every time we create an
object... is that it?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] enhanced error fields

2013-01-05 Thread Pavel Stehule
2013/1/5 Peter Geoghegan pe...@2ndquadrant.com:
 On 5 January 2013 16:56, Pavel Stehule pavel.steh...@gmail.com wrote:
 It seems that we're in agreement, then. I'll prepare a version of the
 patch very similar to the one I previously posted, but with some
 caveats about how reliably the values can be used. I think that that
 should be fine.

  is there agreement on the routine_name and trigger_name fields?

 Well, Tom and I are both opposed to including those fields. Peter E
 seemed to support it in some way, but didn't respond to Tom's
 criticisms (which were just a restatement of my own). So, it seems to
 me that we're not going to do that, assuming nothing changes.

if I understand correctly, Robert Haas is for including these fields - so
the score is still 2:2 - but this is not a match :)

I have no more new arguments for these fields - yes, there is no change

Pavel


 --
 Peter Geoghegan   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] bad examples in pg_dump README

2013-01-05 Thread Josh Kupershmidt
On Sat, Jan 5, 2013 at 7:34 AM, Magnus Hagander mag...@hagander.net wrote:
 On Fri, Jan 4, 2013 at 3:36 AM, Josh Kupershmidt schmi...@gmail.com wrote:
 Do we need to keep it at all, really? Certainly the introductory part
 is covered in the main documentation already...

Pretty much. (I did find note #2 mildly interesting.)

 I believe the tar notes are also out of date. For example, they claim
 that you can't expect pg_restore to work with an uncompressed tar
 format - yet the header in pg_backup_tar.c says that an uncompressed
 tar format is compatible with a directory format dump, and thus you
 *can* use pg_restore.

 (And fwiw,the note about the user should probably go in src/port/ now
 that we moved the tar header creation there a few days ago)

Hrm yeah, so the second paragraph under the tar section can certainly be axed.

 I would suggest we just drop the README file completely. I don't think
 it adds any value at all.

 Any objections to that path? :)

I think that's OK, since there's not much left in that README after
removing the bogus examples, introductory text that's covered
elsewhere, and obsolete second paragraph about the tar format. Perhaps
we could keep the other paragraphs about the tar format, either in the
header comments for pg_backup_tar.c or in src/port/, though?

Oh, and for this comment in pg_backup_tar.c:

|  *  See the headers to pg_backup_files & pg_restore for more details.

there is no longer a pg_backup_files.c.

Josh




Re: [HACKERS] pg_retainxlog for inclusion in 9.3?

2013-01-05 Thread Phil Sorber
On Tue, Jan 1, 2013 at 10:10 AM, Magnus Hagander mag...@hagander.net wrote:
 So, it turns out the reason I got no feedback on this tool, was that I
 forgot both to email about and to actually push the code to github :O
 So this is actually code that's almost half a year old and that I was
 supposed to submit for the first or second commitfest. Oops.

 So, the tool and a README for it right now can be found at
 https://github.com/mhagander/pg_retainxlog for the time being.

 The idea behind the tool is to plug a hole in the case when
 pg_receivexlog is used for log archiving, which is that you still need
 a blocking archive_command in order to make sure that files aren't
 recycled on the master. So for 9.2 you can do this with an
 archive_command that checks if the file has appeared properly on the
 slave - but that usually means you're back at requiring ssh
 connectivity between the machines, for example. Even though this
 information is actually available on the master...

 This can also be of use in pure replication scenarios where you don't
 know how to tune wal_keep_segments, letting you use actual live feedback
 instead of guessing.

 When pg_retainxlog is used as an archive_command, it will check the
 pg_stat_replication view instead of checking the slave. It will then
 only return ok once the requested logfile has been replicated to the
 slave. By default it will look for a replication client named
 pg_receivexlog, but it supports overriding the query with anything -
 so you can say things like "needs to have arrived on at least two
 replication slaves before we consider it archived". Or if used instead
 of wal_keep_segments, "needs to have arrived at *all* replication
 slaves".

 Is this a tool that people would like to see included in the general
 toolchain? If so, I'll reformat it to work in the general build
 environment and submit it for the last commitfest.

 (comments on the code itself are of course also welcome)

 --
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/



+1 to this concept, however it may be implemented.




Re: [HACKERS] Re: [COMMITTERS] pgsql: Invent a one-shot variant of CachedPlans for better performanc

2013-01-05 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 4 January 2013 22:42, Tom Lane t...@sss.pgh.pa.us wrote:
 Invent a one-shot variant of CachedPlans for better performance.

 Just a moment of reflection here but I did already invent a one-shot
 plan concept in a patch submission to 9.2, called exactly that, which
 enables a variety of optimizations, this being one.

If you're speaking of
http://archives.postgresql.org/pgsql-hackers/2011-06/msg01168.php

that patch was vastly more invasive than this one, with vastly less
clear performance characteristics (none of which were documented by test
cases).  And it had the potential for unexpected user-visible semantics
changes; for instance, as submitted it made EXPLAIN produce results
different from what might actually happen at execution.  I don't think
there's any comparison to this patch at all except in the nomenclature.

 We need a full one-shot concept linking the planner and executor for
 all sorts of reasons, not just this one. We can discuss the
 practicality of individual optimizations but the linkage should be
 clear throughout the whole infrastructure.

I thought then, and I think now, that such a concept was too squishy
to be useful as an actual guide to what to change.  The particular
arguments you advanced then have been overtaken by events anyway;
notably that Marti Raudsepp's work on caching stable subexpressions at
execution seems like a much more general answer to the problem of
handling stable functions efficiently.

regards, tom lane




Re: [HACKERS] Re: Proposal: Store timestamptz of database creation on pg_database

2013-01-05 Thread Stephen Frost
* Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote:
 But those tables are filled only when we execute COMMENT ON statement...
 then your idea is create a 'null' comment every time we create a single
 object... is it?

Yes, and have the actual 'description' field (as it's variable) at the
end of the catalog.

Regarding the semantics of it- I was thinking about how directories and
unix files work.  Basically, adding or removing a sub-object would
update the alter time on the object itself, changing an already existing
object or sub-object would update only the object/sub-object's alter
time.  Creating an object or sub/object would set its create time and
alter time to the same value.  I would distinguish 'create' from
'ctime', however, and have our 'create' time be only the actual
*creation* time of the object.  ALTER table OWNER TO user; would update
tables alter time.

Open to other thoughts on this and perhaps we should create a wiki page
to start documentating the semantics.  Once we get agreement there, it's
just a bit of code. :)

Thanks,

Stephen




Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-05 Thread Joshua Berkus
Robert,

 I'm sure it's possible; I don't *think* it's terribly easy.  The
 usual
 algorithm for cycle detection is to have each node send to the next
 node the path that the data has taken.  But, there's no unique
 identifier for each slave that I know of - you could use IP address,
 but that's not really unique.  And, if the WAL passes through an
 archive, how do you deal with that?  

Not that I know how to do this, but it seems like a more direct approach is to 
check whether there's a master anywhere up the line.  Hmmm.  Still sounds 
fairly difficult.

 I'm sure somebody could figure
 all of this stuff out, but it seems fairly complicated for the
 benefit
 we'd get.  I just don't think this is going to be a terribly common
 problem; if it turns out I'm wrong, I may revise my opinion.  :-)

I don't think it'll be that common either.  The problem is that when it does 
happen, it'll be very hard for the hapless sysadmin involved to troubleshoot.

 To me, it seems that lag monitoring between master and standby is
 something that anyone running a complex replication configuration
 should be doing - and yeah, I think anything involving four standbys
 (or cascading) qualifies as complex.  If you're doing that, you
 should
 notice pretty quickly that your replication lag is increasing
 steadily.  

There are many reasons why replication lag would increase steadily.

 You might also check pg_stat_replication on the master and
 notice that there are no connections there any more. 

Well, if you've created a true cycle, every server has one or more replicas.  
The original case I presented was the most probable cause of accidental cycles: 
the original master dies, and the on-call sysadmin accidentally connects the 
first replica to the last replica while trying to recover the cluster.

AFAICT, the only way to troubleshoot a cycle is to test every server in the 
network to see if it's a master and has replicas, and if no server is a master 
with replicas, it's a cycle.  Again, not fast or intuitive.
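
(In SQL terms the per-server probe is simple - something like:

    SELECT pg_is_in_recovery() AS is_standby,
           EXISTS (SELECT 1 FROM pg_stat_replication) AS has_replicas;

run against every node; a cycle is the case where no node reports
is_standby = false while every node still has replicas. The tedious part
is doing that across the whole network.)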

 Could someone
 miss those tell-tale signs?  Sure.  But they could also set
 autovacuum_naptime to an hour and then file a support ticket
 complaining that about table bloat - and they do.  Personally, as
 user
 screw-ups go, I'd consider that scenario (and its fourteen cousins,
 twenty-seven second cousins, and three hundred and ninety two other
 extended family members) as higher-priority and lower effort to fix
 than this particular thing.

I agree that this isn't a particularly high-priority issue.  I do think it 
should go on the TODO list, though, just in case we get a GSOC student or other 
new contributor who wants to tackle it.

--Josh






Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-01-05 Thread Peter Geoghegan
On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote:
 I'm sure it's possible; I don't *think* it's terribly easy.

I'm inclined to agree that this isn't a terribly pressing issue.
Certainly, the need to introduce a bunch of new infrastructure to
detect this case seems hard to justify.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] [PERFORM] Slow query: bitmap scan troubles

2013-01-05 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 [moved to hackers]
 On Wednesday, December 5, 2012, Tom Lane wrote:
 Hm.  To tell you the truth, in October I'd completely forgotten about
 the January patch, and was thinking that the 1/10000 cost had a lot
 of history behind it.  But if we never shipped it before 9.2 then of
 course that idea is false.  Perhaps we should backpatch the log curve
 into 9.2 --- that would reduce the amount of differential between what
 9.2 does and what previous branches do for large indexes.

 I think we should backpatch it for 9.2.3.  I've seen another email which is
 probably due to the same issue (nested loop vs hash join).  And some
 monitoring of a database I am responsible for suggests it might be heading
 in that direction as well as the size grows.

I received an off-list report of a case where not only did the 1/10000
factor cause a nestloop-vs-hashjoin decision to be made wrongly, but
even adding the ln() computation as in commit bf01e34b556 didn't fix it.
I believe the index in question was on the order of 20000 pages, so
it's not too hard to see why this might be the case:

* historical fudge factor    4 * 20000/100000 = 0.8
* 9.2 fudge factor           4 * 20000/10000 = 8.0
* with ln() correction       4 * ln(1 + 20000/10000) = 4.39 or so

At this point I'm about ready to not only revert the 100000-to-10000
change, but keep the ln() adjustment, ie make the calculation be
random_page_cost * ln(1 + index_pages/100000).  This would give
essentially the pre-9.2 behavior for indexes up to some tens of
thousands of pages, and keep the fudge factor from getting out of
control even for very very large indexes.
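
In code form the whole proposal is a one-liner (a sketch, with the GUC
passed in explicitly):

    #include <math.h>

    /* Proposed fudge factor: ~pre-9.2 values for indexes up to some tens
     * of thousands of pages, but bounded for very large ones. */
    static double
    index_fudge_factor(double random_page_cost, double index_pages)
    {
        return random_page_cost * log(1.0 + index_pages / 100000.0);
    }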

 But I am wondering if it should be present at all in 9.3.  When it was
 introduced, the argument seemed to be that smaller indexes might be easier
 to keep in cache.

No.  The argument is that if we don't have some such correction, the
planner is liable to believe that different-sized indexes have *exactly
the same cost*, if a given query would fetch the same number of index
entries.  This is quite easy to demonstrate when experimenting with
partial indexes, in particular - without the fudge factor the planner
sees no advantage of a partial index over a full index from which the
query would fetch the same number of entries.  We do want the planner
to pick the partial index if it's usable, and a fudge factor is about
the least unprincipled way to make it do so.

 The argument for increasing the penalty by a factor of 10 was that the
 smaller one could be swamped by noise such as page-boundary-roundoff
 behavior.

Yeah, I wrote that, but in hindsight it seems like a mistaken idea.
The noise problem is that because we round off page count and row count
estimates to integers at various places, it's fairly easy for small
changes in statistics to move a plan's estimated cost by significantly
more than this fudge factor will.  However, the case that the fudge
factor is meant to fix is indexes that are otherwise identical for
the query's purposes --- and any roundoff effects will be the same.
(The fudge factor itself is *not* rounded off anywhere, it flows
directly to the bottom-line cost for the indexscan.)

 One thing which depends on the index size which, as far as I can tell, is
 not currently being counted is the cost of comparing the tuples all the way
 down the index.  This would be proportional to log2(indextuples) *
 cpu_index_tuple_cost, or maybe log2(indextuples) *
 (cpu_index_tuple_cost+cpu_operator_cost), or something like that.

Yeah, I know.  I've experimented repeatedly over the years with trying
to account explicitly for index descent costs.  But every time, anything
that looks even remotely principled turns out to produce an overly large
correction that results in bad plan choices.  I don't know exactly why
this is, but it's true.

One other point is that I think it is better for any such correction
to depend on the index's total page count, not total tuple count,
because otherwise two indexes that are identical except for bloat
effects will appear to have identical costs.  So from that standpoint,
the ln() form of the fudge factor seems quite reasonable as a crude form
of index descent cost estimate.  The fact that we're needing to dial
it down so much reinforces my feeling that descent costs are close to
negligible in practice.

regards, tom lane




[HACKERS] dynamic SQL - possible performance regression in 9.2

2013-01-05 Thread Jeff Janes
On Wednesday, January 2, 2013, Tom Lane wrote:

 Jeff Janes jeff.ja...@gmail.com writes:
  Using a RULE-based partitioning instead with row by row insertion, the
  plancache changes slowed it down by 300%, and this patch doesn't change
  that.  But that seems to be down to the insertion getting planned
  repeatedly, because it decides the custom plan is cheaper than the
  generic plan.  Whatever savings the custom plan may have are clearly
  less than the cost of doing the planning repeatedly.

 That scenario doesn't sound like it has anything to do with the one being
 discussed in this thread.  But what do you mean by "rule-based
 partitioning" exactly?  A rule per se wouldn't result in a cached plan
 at all, let alone one with parameters, which would be necessary to
 trigger any use of the custom-cached-plan code path.


Right, it is not related to the dynamic SQL, but it is related to the plan-cache.


 Test cases are way more interesting than hand-wavy complaints.


Sorry, when exiled to the hinterlands I have more time to test various
things but not good enough connectivity to describe them well.  I'm
attaching the test case to load 1e5 rows into a very skinny table with 100
partitions using rules.

"origin" is from a few days ago, "origin_reduce_copies" is Heikki's patch,
and "origin_one_shot" is your now-committed patch.  (unshown are
e6faf910d75027 and e6faf910d75027_prev, but that is where the regression
was introduced)

JJ /usr/local/pgsql_REL9_1_7/
Time: 64252.6907920837 ms
JJ origin/
Time: 186657.824039459 ms
JJ origin_reduce_copies/
Time: 185370.236873627 ms
JJ origin_one_shot/
Time: 189104.484081268 ms


The root problem is that it thinks the generic plan costs about 50% more
than the custom one.  I don't know why it thinks that, or how much it is
worth chasing it down.
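
(The decision being lost here is, in rough sketch form - not the exact
plancache.c rule:

    /*
     * Sketch: keep re-planning while the average custom-plan cost
     * undercuts the generic plan's estimate.  The comparison largely
     * ignores the cost of the repeated planning itself, which is why
     * a 50% cost gap is decisive even when planning dominates.
     */
    use_custom_plan = (avg_custom_cost < generic_cost);

so a generic plan that looks 50% more expensive never gets chosen.)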

On the other hand, your patch does fix almost all of the 9.2.[012]
regression of using the following dynamic SQL trigger (instead of RULES) to
load into the same test case.

CREATE OR REPLACE FUNCTION foo_insert_trigger()
RETURNS trigger AS $$
DECLARE tablename varchar(24);
BEGIN
tablename = 'foo_' || new.partition;
EXECUTE 'INSERT INTO '|| tablename ||' VALUES (($1).*)' USING NEW ;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER foo_insert_trigger
BEFORE INSERT ON foo
FOR EACH ROW EXECUTE PROCEDURE foo_insert_trigger();

Cheers,

Jeff
## The purpose of this program is to load the same files that \copy would 
## load into a pgsql table, but do so one row at a time rather than in bulk.
## This could be useful to demonstrate the difference in loading efficiency between bulk 
## and row by row, without having to use different formats for each.

## This makes no attempt to deal with escape characters like \t and \n the same way \copy does, 
## so the loaded results will not be identical but that shouldn't matter if used only for 
## benchmarking and not for actual production loading (and why would you use this for that purpose when 
## \copy is available?)

## When loading into a 2 column unindexed untriggered table, Perl takes about half the CPU time and 
## postgres about half.  When loading to a table with triggers or indexes, Perl's slice becomes 
## mostly immaterial.

use strict;
use warnings;
use DBI;
use Time::HiRes qw(time);
my $start=time();
my $dbi=DBI->connect('DBI:Pg:');
my ($columns) = $dbi->selectrow_array("select count(*) from information_schema.columns where table_name=?", undef, $ARGV[0]);
$columns > 0 or die "no such table '$ARGV[0]'";
## prepare an insert with as many placeholders as the table has columns
my $insert=$dbi->prepare("Insert into $ARGV[0] values (" . (join ',', map '?', 1..$columns) . ')');

open my $fh, "<", $ARGV[1] or die "Couldn't open '$ARGV[1]': $!";
$dbi->begin_work();
while (<$fh>) { chomp;
	my @x=split /\t/;
	$insert->execute(@x);
};
$dbi->commit();
my $stop=time();
## make a timing output that looks like the one psql \timing would generate
print "Time: ", 1000* ($stop-$start), " ms\n";

-- -- make rand.csv like below.
-- perl -le 'print join "\t", int(rand()*1e9),$_%100 foreach 1..1e5' > rand.csv

-- -- execute on different versions:
--  while (true) ; do for f in /usr/local/pgsql_REL9_1_7/ /usr/local/pgsql_REL9_2_2/ origin_94afbd5831fbc1926f/; do pg_ctl stop ; rm -r /tmp/data; $f/bin/initdb; $f/bin/pg_ctl start -w -o "--shared_buffers=512MB --checkpoint_segments=60 --shared_buffers=16MB"; createdb; echo JJ $f;  psql -f rules_test.sql -X ; done ; done >> rules_test.sql.out

-- -- and pull results from the log file
-- tail -n 100 -f rules_test.sql.out |grep -P 'JJ|Time:' -a


drop table foo cascade;
create table foo ( x integer, partition integer);
--create index on foo (x);
create table foo_0 (like foo including indexes) inherits (foo);
create table foo_1 (like foo including indexes) inherits (foo);
create table foo_2 (like foo including indexes) inherits (foo);
create table foo_3 (like foo including indexes) inherits (foo);
create table foo_4 (like foo including indexes) inherits (foo);
create table foo_5 (like foo 

Re: [HACKERS] too much pgbench init output

2013-01-05 Thread Tatsuo Ishii
 On 19.12.2012 06:30, Jeevan Chalke wrote:
 
 
 
 On Mon, Dec 17, 2012 at 5:37 AM, Tomas Vondra t...@fuzzy.cz wrote:
 
 Hi,
 
 attached is a new version of the patch that
 
 (a) converts the 'log_step_seconds' variable to a constant (and does
 not allow changing it using a command-line option etc.)
 
 (b) keeps the current logging as a default
 
 (c) adds a -q switch that enables the new logging with a 5-second
 interval
 
  I'm still not convinced there should be yet another knob for tuning the
 log interval - opinions?
 
 
  It seems that you have generated a patch over your earlier version and
  due to that it does not apply cleanly on fresh sources.
  Please generate the patch against fresh sources.
 
 Seems you're right - I've attached the proper patch against current master.
 
  However, I have absolutely no issues with the design. Also, code review is
  already done and it looks good to me.
  I think to move forward on this we need someone from the core team. So I am
  planning to change its status to ready-for-committer.
  
  Before that, please provide an updated patch for final code review.

As a committer, I have looked into the patch. I noticed a few things:

1) In the help you put the '-q' option into the "Common options" section. I
think this should be moved to the "Initialization options" section because
the option only applies while initializing.

2) Shouldn't a long option for '-q' be provided? Something like
'--quiet-progress-logging'?

3) No patches for docs found (doc/src/sgml/pgbench.sgml)
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




[HACKERS] question: foreign key constraints and AccessExclusive locks

2013-01-05 Thread Jon Nelson
When adding a foreign key constraint on tableA which references
tableB, why is an AccessExclusive lock on tableB necessary? Wouldn't a
lock that prevents writes be sufficient, or does PostgreSQL have to
modify *both* tables in some fashion? I'm using PostgreSQL 8.4 on
Linux.

-- 
Jon




Re: [HACKERS] too much pgbench init output

2013-01-05 Thread Tomas Vondra
On 6.1.2013 03:03, Tatsuo Ishii wrote:
 As a committer, I have looked into the patch. I noticed two things:
 
 1) In the help you put '-q' option into Common options section. I
 think this should be moved to Initialization options section because
 the option is only applied while initializing.

Good point, moved.

 2) Shouldn't a long option for '-q' be provided? Something like
 '--quiet-progress-logging'?

I don't think so. Currently pgbench has either a short or a long option, not
both (for the same thing). I believe we should stick to this and either
choose -q or --quiet-logging, but not both.

 3) No patches for docs found (doc/src/sgml/pgbench.sgml)

I've added a brief description of the -q option into the docs. IMHO
it's enough but feel free to add some more details.


There's one more thing I've just noticed - the original version of the
patch simply removed the old logging, but this one keeps both old and
quiet logging. But the old logging still uses this:

fprintf(stderr, "%d of %d tuples (%d%%) done.\n",

while the new logging does this

fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",

i.e. it prints additional info about elapsed/estimated time. Do we want
to keep it this way (i.e. not to mess with the old logging) or do we
want to add these new fields to the old logging too?

I suggest to add it to the old logging, to keep the log messages the
same, the only difference being the logging frequency.


Tomas
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index e376452..31764fe 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -39,6 +39,7 @@
 #include "portability/instr_time.h"
 
 #include <ctype.h>
+#include <math.h>
 
 #ifndef WIN32
 #include <sys/time.h>
@@ -102,6 +103,7 @@ extern int	optind;
 #define MAXCLIENTS		1024
 #endif
 
+#define LOG_STEP_SECONDS	5		/* seconds between log messages */
 #define DEFAULT_NXACTS	10			/* default nxacts */
 
 int			nxacts = 0;				/* number of transactions per client */
@@ -150,6 +152,7 @@ char	   *index_tablespace = NULL;
 #define naccounts	100000
 
 bool		use_log;			/* log transaction latencies to a file */
+bool		use_quiet;			/* quiet logging onto stderr */
 bool		is_connect;			/* establish connection for each transaction */
 bool		is_latencies;		/* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
@@ -359,6 +362,7 @@ usage(void)
 		   "  -n           do not run VACUUM after initialization\n"
 		   "  -F NUM       fill factor\n"
 		   "  -s NUM       scaling factor\n"
+		   "  -q           quiet logging (one message each 5 seconds)\n"
 		   "  --foreign-keys\n"
 		   "               create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
@@ -1362,6 +1366,11 @@ init(bool is_no_vacuum)
 	char		sql[256];
 	int			i;
 
+	/* used to track elapsed time and estimate of the remaining time */
+	instr_time	start, diff;
+	double		elapsed_sec, remaining_sec;
+	int			log_interval = 1;
+
 	if ((con = doConnect()) == NULL)
 		exit(1);
 
@@ -1430,6 +1439,8 @@ init(bool is_no_vacuum)
 	}
 	PQclear(res);
 
+	INSTR_TIME_SET_CURRENT(start);
+
 	for (i = 0; i < naccounts * scale; i++)
 	{
 		int			j = i + 1;
@@ -1441,10 +1452,33 @@ init(bool is_no_vacuum)
 			exit(1);
 		}
 
-		if (j % 100000 == 0)
+		/* If we want to stick with the original logging, print a message each
+		 * 100k inserted rows. */
+		if ((! use_quiet) && (j % 100000 == 0))
 			fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
-					j, naccounts * scale,
-					(int) (((int64) j * 100) / (naccounts * scale)));
+				j, naccounts * scale,
+				(int) (((int64) j * 100) / (naccounts * scale)));
+		/* let's not call the timing for each row, but only each 100 rows */
+		else if (use_quiet && (j % 100 == 0))
+		{
+			INSTR_TIME_SET_CURRENT(diff);
+			INSTR_TIME_SUBTRACT(diff, start);
+
+			elapsed_sec = INSTR_TIME_GET_DOUBLE(diff);
+			remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
+
+			/* have we reached the next interval (or end)? */
+			if ((j == scale * naccounts) || 

Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2013-01-05 Thread Tomas Vondra
On 4.1.2013 17:42, Robert Haas wrote:
 On Mon, Dec 31, 2012 at 11:51 AM, Tomas Vondra t...@fuzzy.cz wrote:
 I thought I followed the coding style - which guidelines have I broken?
 
 + /* If there are no non-local relations, then we're done. Release the memory
 +  * and return. */
 
 Multi-line comments should start with a line containing only /* and
 end with a line containing only */.
 
 +DropRelFileNodeAllBuffers(RelFileNodeBackend * rnodes, int nnodes)
 and
 +rnode_comparator(const void * p1, const void * p2)
 
 The extra spaces after the asterisks should be removed.
 
 +void smgrdounlinkall(SMgrRelation * rels, int nrels, bool isRedo)
 +{
 
 void should be on a line by itself.
 
 Sorry to nitpick.

No, thanks for the nitpicking! Code style is important.

 As for BSEARCH_LIMIT, I don't have a great idea - maybe just
 DROP_RELATIONS_BSEARCH_LIMIT?

Sounds good. I've changed the name and fixed the codestyle issues in the
attached version of the patch.

Tomas
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 8dc622a..8c12a43 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -312,6 +312,11 @@ smgrDoPendingDeletes(bool isCommit)
PendingRelDelete *pending;
PendingRelDelete *prev;
PendingRelDelete *next;
+
+   SMgrRelation   *srels = palloc(sizeof(SMgrRelation));
+   int nrels = 0,
+   i = 0,
+   maxrels = 1;
 
prev = NULL;
for (pending = pendingDeletes; pending != NULL; pending = next)
@@ -335,14 +340,31 @@ smgrDoPendingDeletes(bool isCommit)
SMgrRelation srel;
 
 			srel = smgropen(pending->relnode, pending->backend);
-   smgrdounlink(srel, false);
-   smgrclose(srel);
+   
+				/* extend the array if needed (double the size) */
+				if (maxrels <= nrels) {
+					maxrels *= 2;
+					srels = repalloc(srels, sizeof(SMgrRelation) * maxrels);
+				}
+   
+   srels[nrels++] = srel;
}
/* must explicitly free the list entry */
pfree(pending);
/* prev does not change */
}
}
+
+	if (nrels > 0)
+   {
+   smgrdounlinkall(srels, nrels, false);
+   
+		for (i = 0; i < nrels; i++)
+   smgrclose(srels[i]);
+   }
+   
+   pfree(srels);
+
 }
 
 /*
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index dddb6c0..52ca2b0 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -62,6 +62,7 @@
 #define BUF_WRITTEN0x01
 #define BUF_REUSABLE   0x02
 
+#define DROP_RELATIONS_BSEARCH_LIMIT   10
 
 /* GUC variables */
 bool   zero_damaged_pages = false;
@@ -108,6 +109,7 @@ static volatile BufferDesc *BufferAlloc(SMgrRelation smgr,
 static void FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln);
 static void AtProcExit_Buffers(int code, Datum arg);
 
+static int rnode_comparator(const void * p1, const void * p2);
 
 /*
  * PrefetchBuffer -- initiate asynchronous read of a block of a relation
@@ -2094,35 +2096,87 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
  * 
  */
 void
-DropRelFileNodeAllBuffers(RelFileNodeBackend rnode)
+DropRelFileNodeAllBuffers(RelFileNodeBackend *rnodes, int nnodes)
 {
-   int i;
+   int i, j, n = 0;
+   RelFileNode * nodes;
+
+   nodes = palloc(sizeof(RelFileNode) * nnodes); /* non-local relations */
 
/* If it's a local relation, it's localbuf.c's problem. */
-   if (RelFileNodeBackendIsTemp(rnode))
+	for (i = 0; i < nnodes; i++)
{
-   if (rnode.backend == MyBackendId)
-   DropRelFileNodeAllLocalBuffers(rnode.node);
+   if (RelFileNodeBackendIsTemp(rnodes[i]))
+   {
+   if (rnodes[i].backend == MyBackendId)
+   DropRelFileNodeAllLocalBuffers(rnodes[i].node);
+   }
+   else
+   {
+   nodes[n++] = rnodes[i].node;
+   }
+   }
+
+   /*
+	 * If there are no non-local relations, then we're done. Release the memory
+* and return.
+*/
+   if (n == 0)
+   {
+   pfree(nodes);
return;
}
 
+   /* sort the list of rnodes (but only 

Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-01-05 Thread Tomas Vondra
On 3.1.2013 20:33, Magnus Hagander wrote:
 On Thu, Jan 3, 2013 at 8:31 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 3.1.2013 18:47, Heikki Linnakangas wrote:
 How about creating the new directory as a direct subdir of $PGDATA,
 rather than buried in global? global is supposed to contain data
 related to shared catalog relations (plus pg_control), so it doesn't
 seem like the right location for per-database stat files. Also, if we're
 going to have admins manually zapping the directory (hopefully when the
 system is offline), that's less scary if the directory is not buried as
 deep.

 That's clearly possible and it's a trivial change. I was thinking about
 that actually, but then I placed the directory into global because
 that's where the pgstat.stat originally was.
 
 Yeah, +1 for a separate directory not in global.

OK, I moved the files from global/stat to stat.
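
For illustration, a per-database file name is then built by formatting the database OID into the %d pattern; a minimal sketch using the macros from the patch below:

    char        fname[MAXPGPATH];

    /* sketch: yields e.g. "stat/16384.stat" for database OID 16384 */
    snprintf(fname, MAXPGPATH, PGSTAT_STAT_PERMANENT_DB_FILENAME,
             dbentry->databaseid);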

Tomas
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index be3adf1..4ec485e 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -64,10 +64,14 @@
 
 /* --
  * Paths for the statistics files (relative to installation's $PGDATA).
+ * Permanent and temporary, global and per-database files.
  * --
  */
-#define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
-#define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
+#define PGSTAT_STAT_PERMANENT_DIRECTORY		"stat"
+#define PGSTAT_STAT_PERMANENT_FILENAME		"stat/global.stat"
+#define PGSTAT_STAT_PERMANENT_TMPFILE		"stat/global.tmp"
+#define PGSTAT_STAT_PERMANENT_DB_FILENAME	"stat/%d.stat"
+#define PGSTAT_STAT_PERMANENT_DB_TMPFILE	"stat/%d.tmp"
 
 /* --
  * Timer definitions.
@@ -115,8 +119,11 @@ int pgstat_track_activity_query_size = 1024;
  * Built from GUC parameter
  * --
  */
+char  *pgstat_stat_directory = NULL;
 char  *pgstat_stat_filename = NULL;
 char  *pgstat_stat_tmpname = NULL;
+char  *pgstat_stat_db_filename = NULL;
+char  *pgstat_stat_db_tmpname = NULL;
 
 /*
  * BgWriter global statistics counters (unused in other processes).
@@ -219,11 +226,16 @@ static intlocalNumBackends = 0;
  */
 static PgStat_GlobalStats globalStats;
 
-/* Last time the collector successfully wrote the stats file */
-static TimestampTz last_statwrite;
+/* Write request info for each database */
+typedef struct DBWriteRequest
+{
+	Oid			databaseid;		/* OID of the database to write */
+	TimestampTz request_time;	/* timestamp of the last write request */
 
-/* Latest statistics request time from backends */
-static TimestampTz last_statrequest;
+/* Latest statistics request time from backends for each DB */
+static DBWriteRequest * last_statrequests = NULL;
+static int num_statrequests = 0;
 
 static volatile bool need_exit = false;
 static volatile bool got_SIGHUP = false;
@@ -252,11 +264,17 @@ static void pgstat_sighup_handler(SIGNAL_ARGS);
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
 static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
 Oid tableoid, bool create);
-static void pgstat_write_statsfile(bool permanent);
-static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
+static void pgstat_write_statsfile(bool permanent, bool force);
+static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent);
+static void pgstat_write_db_dummyfile(Oid databaseid);
+static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs);
+static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
 static void backend_read_statsfile(void);
 static void pgstat_read_current_status(void);
 
+static bool pgstat_write_statsfile_needed();
+static bool pgstat_db_requested(Oid databaseid);
+
 static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
 static HTAB *pgstat_collect_oids(Oid catalogid);
@@ -285,7 +303,6 @@ static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
-
 /* 
  * Public functions called from postmaster follow
  * 
@@ -549,8 +566,34 @@ startup_failed:
 void
 pgstat_reset_all(void)
 {
-   unlink(pgstat_stat_filename);
-   unlink(PGSTAT_STAT_PERMANENT_FILENAME);
+   DIR * dir;
+   struct dirent * entry;
+
+   dir = AllocateDir(pgstat_stat_directory);
+   while ((entry = ReadDir(dir, pgstat_stat_directory)) != NULL)
+   {
+		char fname[strlen(pgstat_stat_directory) + strlen(entry->d_name) + 1];
+
+   if 

Re: [HACKERS] too much pgbench init output

2013-01-05 Thread Tatsuo Ishii
 On 6.1.2013 03:03, Tatsuo Ishii wrote:
 As a committer, I have looked into the patch. I noticed two things:
 
 1) In the help you put '-q' option into Common options section. I
 think this should be moved to Initialization options section because
 the option is only applied while initializing.
 
 Good point, moved.

In addition to this, I'd suggest adding a check that -q is only possible
with the -i option, since without -i, -q is meaningless.

 2) Shouldn't a long option for '-q' be provided? Something like
 '--quiet-progress-logging'?
 
 I don't think so. Currently pgbench has either short or long option, not
 both (for the same thing). I believe we should stick to this and either
 choose -q or --quiet-logging but not both.

Ok.

 3) No patches for docs found (doc/src/sgml/pgbench.sgml)
 
 I've added a brief description of the -q option into the docs. IMHO
 it's enough but feel free to add some more details.

Good.

 There's one more thing I've just noticed - the original version of the
 patch simply removed the old logging, but this one keeps both old and
 quiet logging. But the old logging still uses this:
 
 fprintf(stderr, "%d of %d tuples (%d%%) done.\n", 
 
 while the new logging does this
 
 fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
 
 i.e. it prints additional info about elapsed/estimated time. Do we want
 to keep it this way (i.e. not to mess with the old logging) or do we
 want to add these new fields to the old logging too?
 
 I suggest to add it to the old logging, to keep the log messages the
 same, the only difference being the logging frequency.

If we do so, probably '-q' is not an appropriate option name any more,
since the only difference between the old logging and the new one is that the
former is printed every 100000 lines while the latter is every 5
seconds, which is not really quiet. What do you think?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-01-05 Thread Amit kapila

On Saturday, January 05, 2013 12:35 PM Boszormenyi Zoltan wrote:
On 2013-01-05 05:58, Amit kapila wrote:
 On Friday, January 04, 2013 10:57 PM Boszormenyi Zoltan wrote:
 Hi,
 I am reviewing your patch.
 Thank you very much.

 Yes, you are right adding a new LWLock will avoid the use of sleep.
 However I am just not sure if for only this purpose we should add a new 
 LWLock?

 Similar logic is used in CreateLockFile() for postmaster file but it doesn't 
 use sleep.
 How about reducing the time of sleep or removing sleep, in that user might 
 get
 error and he need to retry to get his command successful?

 I think removing the loop entirely is better.

 However, the behaviour should be discussed by a wider audience:
 - the backends should wait for each other just like other statements
that fight for the same object (in which case locking is needed), or
 - throw an error immediately on conflicting parallel work

Okay, I shall update the patch with the first option as suggested by Noah as well.
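
A minimal sketch of that approach, assuming a dedicated lock is added to the built-in list (the name SetPersistentLock and the helper write_auto_conf_file() are hypothetical placeholders):

    /* serialize concurrent SET PERSISTENT writers; no sleep/retry loop */
    LWLockAcquire(SetPersistentLock, LW_EXCLUSIVE);
    PG_TRY();
    {
        write_auto_conf_file(name, value);
    }
    PG_CATCH();
    {
        LWLockRelease(SetPersistentLock);
        PG_RE_THROW();
    }
    PG_END_TRY();
    LWLockRelease(SetPersistentLock);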

 I also tried to trigger one of the errors by creating the lock file manually.
 You need an extra space between the "... retry" and "or ..." literals:

 +   ereport(ERROR,
 +           (errcode_for_file_access(),
 +            errmsg("could not create lock file \"%s\": %m ", LockFileName),
 +            errhint("May be too many concurrent edit into file happening, please wait!! and retry"
 +                    "or .lock is file accidently left there please clean the file from config_dir")));

Will fix.
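
For illustration only, keeping the patch's wording and just adding the missing space at the end of the first literal, the hint would concatenate as:

    errhint("May be too many concurrent edit into file happening, please wait!! and retry "
            "or .lock is file accidently left there please clean the file from config_dir")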


Thank you for review.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] too much pgbench init output

2013-01-05 Thread Tomas Vondra
On 6.1.2013 05:07, Tatsuo Ishii wrote:
 On 6.1.2013 03:03, Tatsuo Ishii wrote:
 As a committer, I have looked into the patch. I noticed two things:

 1) In the help you put '-q' option into Common options section. I
 think this should be moved to Initialization options section because
 the option is only applied while initializing.

 Good point, moved.
 
 In addition to this, I'd suggest adding a check that -q is only possible
 with the -i option, since without -i, -q is meaningless.

Done.
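
A sketch of such a check during option handling (relying on pgbench's existing is_init_mode flag; the message text is illustrative):

    /* -q only affects initialization, so reject it without -i */
    if (use_quiet && !is_init_mode)
    {
        fprintf(stderr, "quiet logging (-q) can only be used in initialization mode (-i)\n");
        exit(1);
    }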

 There's one more thing I've just noticed - the original version of the
 patch simply removed the old logging, but this one keeps both old and
 quiet logging. But the old logging still uses this:

 fprintf(stderr, %d of %d tuples (%d%%) done.\n, 

 while the new logging does this

 fprintf(stderr, %d of %d tuples (%d%%) done (elapsed %.2f s,
 remaining %.2f s).\n,

 i.e. it prints additional info about elapsed/estimated time. Do we want
 to keep it this way (i.e. not to mess with the old logging) or do we
 want to add these new fields to the old logging too?

 I suggest to add it to the old logging, to keep the log messages the
 same, the only difference being the logging frequency.
 
 If we do so, probably '-q' is not an appropriate option name any more,
 since the only difference between the old logging and the new one is that the
 former is printed every 100000 lines while the latter is every 5
 seconds, which is not really quiet. What do you think?

AFAIK the 5 second logging is much quieter in most cases (and a bit
more verbose when the initialization gets very slow), so I think the
quiet logging is not a bad match although maybe there's a better name.

This change (adding the elapsed/remaining fields to the original logging)
would be consistent with this name, because considering a single line,
the -q is more verbose right now.

So I'd stick with the -q option and add the fields to the original
logging. But I'm not opposing a different name, I just can't think of a
better one.
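
For concreteness, both modes could then emit the same message and differ only in frequency, e.g. (a sketch based on the fprintf already in the patch):

    fprintf(stderr, "%d of %d tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
            j, naccounts * scale,
            (int) (((int64) j * 100) / (naccounts * scale)),
            elapsed_sec, remaining_sec);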

Tomas
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index e376452..39bd8a5 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -39,6 +39,7 @@
 #include "portability/instr_time.h"
 
 #include <ctype.h>
+#include <math.h>
 
 #ifndef WIN32
 #include <sys/time.h>
@@ -102,6 +103,7 @@ extern int  optind;
 #define MAXCLIENTS 1024
 #endif
 
+#define LOG_STEP_SECONDS   5   /* seconds between log messages */
 #define DEFAULT_NXACTS 10  /* default nxacts */
 
 int			nxacts = 0;			/* number of transactions per client */
@@ -150,6 +152,7 @@ char   *index_tablespace = NULL;
 #define naccounts	100000
 
 bool		use_log;		/* log transaction latencies to a file */
+bool   use_quiet;  /* quiet logging onto stderr */
 bool		is_connect;		/* establish connection for each transaction */
 bool   is_latencies;   /* report per-command latencies */
 int			main_pid;		/* main process id used in log filename */
@@ -359,6 +362,7 @@ usage(void)
 	   "  -n           do not run VACUUM after initialization\n"
 	   "  -F NUM       fill factor\n"
 	   "  -s NUM       scaling factor\n"
+	   "  -q           quiet logging (one message each 5 seconds)\n"
 	   "  --foreign-keys\n"
 	   "               create foreign key constraints between tables\n"
 	   "  --index-tablespace=TABLESPACE\n"
@@ -1362,6 +1366,11 @@ init(bool is_no_vacuum)
	char		sql[256];
int i;
 
+   /* used to track elapsed time and estimate of the remaining time */
+   instr_time  start, diff;
+   double  elapsed_sec, remaining_sec;
+   int log_interval = 1;
+
if ((con = doConnect()) == NULL)
exit(1);
 
@@ -1430,6 +1439,8 @@ init(bool is_no_vacuum)
}
PQclear(res);
 
+   INSTR_TIME_SET_CURRENT(start);
+
	for (i = 0; i < naccounts * scale; i++)
{
int j = i + 1;
@@ -1441,10 +1452,33 @@ init(bool is_no_vacuum)
exit(1);
}
 
-		if (j % 100000 == 0)
+		/* If we want to stick with the original logging, print a message each
+		 * 100k inserted rows. */
+		if ((! use_quiet) && (j % 100000 == 0))
 			fprintf(stderr, "%d of %d tuples (%d%%) done.\n",
-					j, naccounts * scale,
-					(int) (((int64) j * 100) / (naccounts * scale)));
+					j, naccounts * scale,
+					(int) (((int64) j * 100) / (naccounts * scale)));
+   /* let's not call