Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Magnus Hagander
On Thu, Sep 1, 2011 at 21:59, Tomas Vondra t...@fuzzy.cz wrote:
 I've prepared a significantly simplified version of the patch. The two
 main changes are

 (a) it does not update pg_stat_bgwriter anymore, it just prints info to
 the server log
 (b) a new GUC is not required, it's driven by log_checkpoints

The comment still refers to the checkpoint_update_limit.


 This version will log at least 10 'checkpoint status' lines (at 10%, 20%,
 30%, ...) and whenever 5 seconds have elapsed since the last log. The time
 is not checked for every buffer, but once every 128 buffers.

 So if the checkpoint is very slow, you'll get a message every 5 seconds,
 if it's fast you'll get 10 messages.

I would personally find this very annoying. If I read it correctly,
anybody with a database with no problem at all but that has
log_checkpoints on suddenly gets at least 10 times as many messages? I
generally try to advise my clients to *not* log excessively because
then they will end up not bothering to read the logs...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.conf archive_command example

2011-09-02 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I'm also wondering if providing some shell script examples of a
 fault-tolerant script to handle archiving would be useful.

 I think it would.

My usual advice is to avoid having to write one if possible, because
it's more complex than it looks.  What about recommending existing
solutions, such as walmgr from Skytools?

Even better, what about including a default archiving tool, that could
be either another script in bin/ or rather an internal command.  The
default would accept a location as an argument; for simple needs you mount
a remote filesystem and there you go.  If you need something more
complex, you can still provide it yourself.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Training and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-09-02 Thread Dimitri Fontaine
Greg Sabino Mullane g...@turnstep.com writes:
 It's off topic. But I think custom format would require a major mangling
 to be able to handle a complete cluster. This isn't just a simple matter
 of programming, IMNSHO.

 Oh, I meant just having it create separate custom format files for each
 database. As shell scripts all over the world have been doing for years,
 but it would be nice if it was simply built in.

+1
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Training and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Tomas Vondra
On 2 Září 2011, 9:47, Magnus Hagander wrote:
 On Thu, Sep 1, 2011 at 21:59, Tomas Vondra t...@fuzzy.cz wrote:
 I've prepared a significantly simplified version of the patch. The two
 main changes are

 (a) it does not update pg_stat_bgwriter anymore, it just prints info to
 the server log
 (b) a new GUC is not required, it's driven by log_checkpoints

 The comment still refers to the checkpoint_update_limit.

OK, I'll fix that.

 This version will log at least 10 'checkpoint status' lines (at 10%,
 20%, 30%, ...) and whenever 5 seconds have elapsed since the last log.
 The time is not checked for every buffer, but once every 128 buffers.

 So if the checkpoint is very slow, you'll get a message every 5 seconds,
 if it's fast you'll get 10 messages.

 I would personally find this very annoying. If I read it correctly,
 anybody with a database with no problem at all but that has
 log_checkpoints on suddenly gets at least 10 times as many messages? I
 generally try to advise my clients to *not* log excessively because
 then they will end up not bothering to read the logs...

What about logging it with a lower level, e.g. NOTICE instead of the
current LOG? If that's not a solution then a new GUC is needed I guess.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Italian PGDay 2011, Call for papers is now open

2011-09-02 Thread Gabriele Bartolini
The fifth edition of the Italian PostgreSQL Day (PGDay.IT 2011) will be 
held on November 25 in Prato.


It will cover topics for users, developers and contributors to 
PostgreSQL, as well as decision and policy makers. For more information 
about the conference, please visit the website at http://2011.pgday.it/.


Proposals for talks are now being accepted. Please note that talks will 
be accepted in both English and Italian. This call for papers is 
targeted at English speakers.


Each session will last 30 minutes, and may be on any topic related to 
PostgreSQL. Suggested topic areas include:


 * Developing applications for PostgreSQL
 * Administering large scale PostgreSQL installations
 * Case studies of PostgreSQL deployments
 * PostgreSQL tools and utilities
 * PostgreSQL hacking
 * Community & user groups
 * Tuning the server
 * Migrating from other systems
 * Scaling/replication
 * Benchmarking & hardware
 * PostgreSQL related products

Of course, proposals for talks on other PostgreSQL related topics will 
also be gladly accepted. A limited number of longer, 60-minute slots 
will be available, especially aimed at practical tutorials. Please 
indicate clearly in your submission if you wish to give a 60-minute talk.


It is also possible for you to give a tutorial on advanced use, 
administration or development of a PostgreSQL cluster. Every tutorial 
must include a live demo and didactic material for attendees, and must be 
no longer than 60 minutes. All the material required for the tutorial 
itself (e.g., slides, code snippets, howtos) is the responsibility of the 
lecturer(s).


Finally, there will be a session of five-minute lightning talks. A 
separate call for proposals will be made for these later on.


The submission deadline is October 16th, 2011. Selected speakers will be 
notified on October 18th, 2011. Please email your proposals to 
pgday2...@itpug.org, and include the following information:


 * Name
 * Company (if applicable)
 * Biography (a paragraph about you and your involvement with PostgreSQL)
 * Presentation title
 * Presentation abstract
 * Specific requests

The proposals will be considered by a committee that will finalise the 
schedule, to be published nearer the conference date. If your proposal 
is accepted, you will be informed by email.


The submission form and more detailed information are available as a PDF 
document here: 
http://2011.pgday.it/sites/default/files/PGDay2011-CFP-EN_0.pdf


We look forward to hearing from you, and seeing you in Prato in November!

Gabriele Bartolini
Italian PostgreSQL Users Group (ITPUG), President



Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Magnus Hagander
On Fri, Sep 2, 2011 at 11:01, Tomas Vondra t...@fuzzy.cz wrote:
 On 2 Září 2011, 9:47, Magnus Hagander wrote:
 On Thu, Sep 1, 2011 at 21:59, Tomas Vondra t...@fuzzy.cz wrote:
 I've prepared a significantly simplified version of the patch. The two
 main changes are

 (a) it does not update pg_stat_bgwriter anymore, it just prints info to
 the server log
 (b) a new GUC is not required, it's driven by log_checkpoints

 The comment still refers to the checkpoint_update_limit.

 OK, I'll fix that.

 This version will log at least 10 'checkpoint status' lines (at 10%,
 20%, 30%, ...) and whenever 5 seconds have elapsed since the last log.
 The time is not checked for every buffer, but once every 128 buffers.

 So if the checkpoint is very slow, you'll get a message every 5 seconds,
 if it's fast you'll get 10 messages.

 I would personally find this very annoying. If I read it correctly,
 anybody with a database with no problem at all but that has
 log_checkpoints on suddenly gets at least 10 times as many messages? I
 generally try to advise my clients to *not* log excessively because
 then they will end up not bothering to read the logs...

 What about logging it with a lower level, e.g. NOTICE instead of the
 current LOG? If that's not a solution then a new GUC is needed I guess.

I guess if it's at a DEBUG level it won't annoy anybody who doesn't
need it. Not sure if NOTICE is low enough...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] help with plug-in function for additional (partition/shard) visibility checks

2011-09-02 Thread PostgreSQL - Hans-Jürgen Schönig
hello …

i have been thinking about this issue for quite a while ...
given your idea i am not sure how this can work at all.

consider:
begin;
insert 1
insert 2
commit

assume this ends up in the same node,
now you split it into two …
1 and 2 will have exactly the same visibility to any transaction.
i am not sure how you can get this right without looking at the data.

alternative idea: what if the proxy would add / generate a filter by looking at 
the data?
a quick idea would be that once you split, you add a simple directive such as 
FILTER GENERATOR $1 or so to the PL/proxy code.
it would then, behind the scenes, arrange for the filter to be passed on.
what do you think?

regards,

hans



On Sep 1, 2011, at 10:13 AM, Hannu Krosing wrote:

 Hallow hackers
 
 I have the following problem to solve and would like to get advice on
 the best way to do it.
 
 The problem:
 
 When growing a pl/proxy based database cluster, one of the main
 operations is splitting a partition. The standard flow is as follows:
 
 1) make a copy of the partitions table(s) to another database
 2) reconfigure pl/proxy to use 2 partitions instead of one
 
 The easy part is making a copy of all or half of the table to another
 database. The hard part is fast deletion (I mean milliseconds,
 comparable to TRUNCATE) of the data that should not be in a partition (so
 that RUN ON ALL functions will continue to return the right results).
 
 It would be relatively easy, if we still had the RULES for select
 available for plain tables, but even then the eventual cleanup would
 usually mean at least 3 passes of disk writes (set xmax, write deleted
 flag, vacuum and remove)
 
 What I would like to have is the possibility of additional visibility
 checks, which would run some simple C function over tuple data (usually
 hash(fieldval) + and + or ) and return visibility (is in this partition)
 as a result. It would be best if this ran at so low a level that vacuum
 would also use it and could clean up the foreign partition data in one
 pass, without doing the delete dance first.
 
 So finally the QUESTION:
 
 where in code would be the best place to check for this so that
 
 1) both regular queries and VACUUM see it
 2) the tuple data (and not only system fields or just xmin/xmax) would
 be available for the function to use
 
 
 -- 
 ---
 Hannu Krosing
 PostgreSQL Unlimited Scalability and Performance Consultant
 2ndQuadrant Nordic
 PG Admin Book: http://www.2ndQuadrant.com/books/
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Cascaded standby message

2011-09-02 Thread Magnus Hagander
From what I can tell, every time I start a postmaster on HEAD (at least
when I've set wal_level=archive, and max_wal_senders > 0), I get the
message:
LOG:  terminating all walsender processes to force cascaded standby(s)
to update timeline and reconnect

in the startup log.

This is long before I've connected any slaves or even considered
cascading standbys - seems this message is written unnecessarily?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] help with plug-in function for additional (partition/shard) visibility checks

2011-09-02 Thread Hannu Krosing
On Fri, 2011-09-02 at 14:01 +0200, PostgreSQL - Hans-Jürgen Schönig
wrote:
 hello …
 
 i have been thinking about this issue for quite a while ...
 given your idea i am not sure how this can work at all.
 
 consider:
   begin;
   insert 1
   insert 2
   commit
 
 assume this ends up in the same node,
 now you split it into two …
 1 and 2 will have exactly the same visibility to any transaction.
 i am not sure how you can get this right without looking at the data.

It has to consider the data when determining visibility, that's the
whole point of the plug-in.

The idea is that each row belongs to a certain partition, as
determined by some function over its fields. Most often this function
is a hash of the primary key, OR-ed with a bitmap representing cluster
size and AND-ed with a bitmap for the partition(s) stored in this
database.

When you split the partition, some rows don't belong in the old
partition database anymore (and if you did a full copy, the other
half don't belong to the new one), so they should be handled as
invisible / deleted. As this can only be done by looking at the tuple
data, this needs an additional visibility function. And as this is only
needed for partitioned databases, it makes sense to implement it as a
plugin, so it would not waste cycles on non-partitioned databases.
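
To make that concrete, here is a minimal self-contained sketch of such a
per-tuple check (all names and the bitmap scheme are illustrative; this
is not an existing PostgreSQL or PL/Proxy API):

#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative only: decide whether a tuple belongs to one of the
 * partitions stored in this database.  key_hash would come from hashing
 * the tuple's partitioning field(s); cluster_mask encodes the cluster
 * size (2^n - 1); local_parts has one bit per partition kept on this
 * node (so this sketch assumes at most 32 partitions).
 */
static bool
tuple_is_local(uint32_t key_hash, uint32_t cluster_mask,
               uint32_t local_parts)
{
    uint32_t    part = key_hash & cluster_mask;   /* partition number */

    return (local_parts & (1u << part)) != 0;     /* stored here? */
}

A low-level visibility hook would call something like this with the
hashed tuple fields and treat a false result as "not visible here".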

 alternative idea: what if the proxy would add / generate a filter by 
 looking at the data?
 a quick idea would be that once you split, you add a simple directive 
 such as FILTER GENERATOR $1 or so to the PL/proxy code.
 it would then, behind the scenes, arrange for the filter to be passed on.
 what do you think?

Hmm. I'm not sure I understand what you are trying to say. Can you
elaborate a little?

 
   regards,
 
   hans
 
 
 
 On Sep 1, 2011, at 10:13 AM, Hannu Krosing wrote:
 
  Hallow hackers
  
  I have the following problem to solve and would like to get advice on
  the best way to do it.
  
  The problem:
  
  When growing a pl/proxy based database cluster, one of the main
  operations is splitting a partition. The standard flow is as follows:
  
  1) make a copy of the partitions table(s) to another database
  2) reconfigure pl/proxy to use 2 partitions instead of one
  
 The easy part is making a copy of all or half of the table to another
 database. The hard part is fast deletion (I mean milliseconds,
 comparable to TRUNCATE) of the data that should not be in a partition (so
 that RUN ON ALL functions will continue to return the right results).
  
  It would be relatively easy, if we still had the RULES for select
  available for plain tables, but even then the eventual cleanup would
  usually mean at least 3 passes of disk writes (set xmax, write deleted
  flag, vacuum and remove)
  
 What I would like to have is the possibility of additional visibility
 checks, which would run some simple C function over tuple data (usually
 hash(fieldval) + and + or ) and return visibility (is in this partition)
 as a result. It would be best if this ran at so low a level that vacuum
 would also use it and could clean up the foreign partition data in one
 pass, without doing the delete dance first.
  
  So finally the QUESTION:
  
  where in code would be the best place to check for this so that
  
  1) both regular queries and VACUUM see it
  2) the tuple data (and not only system fields or just xmin/xmax) would
  be available for the function to use
  
  
  -- 
  ---
  Hannu Krosing
  PostgreSQL Unlimited Scalability and Performance Consultant
  2ndQuadrant Nordic
  PG Admin Book: http://www.2ndQuadrant.com/books/
  
  
  -- 
  Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-hackers
  
 
 --
 Cybertec Schönig & Schönig GmbH
 Gröhrmühlgasse 26
 A-2700 Wiener Neustadt, Austria
 Web: http://www.postgresql-support.de
 
 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] help with plug-in function for additional (partition/shard) visibility checks

2011-09-02 Thread PostgreSQL - Hans-Jürgen Schönig
hello …

the goal of the entire proxy thing is to make the right query go to the right 
node / nodes.
we determine this by using a partitioning function and so on …
currently PL/proxy has only a handful of commands - one is RUN ON … which tells 
us where to put things.
assume you issue a select … some select will fall out on the target node. 
to restrict the data coming from the node you could add an additional 
constraint on the way …

say:
SELECT * FROM proxy_table WHERE a = 20;

what you want to reach the node after a split is …

SELECT * FROM proxy_table WHERE a = 20 AND col = <filter the wrong half away>

my idea is to add an additional command to the PL/proxy command set.
it should call a function generating this additional filter.
maybe somehow like that …

RUN ON hashtext($1) 
-- this one already knows about the increased cluster
GENERATE FILTER my_create_the_bloody_filter_func($1)-- this one 
would massage the query going to the node.

it would actually open the door for a lot of additional trickery.
the function would tell the proxy what to append - and this would be 
under your full control.

what do you think?
i got to think about it further but i can envision that this could be feasible 
...

hans


On Sep 2, 2011, at 2:36 PM, Hannu Krosing wrote:

 On Fri, 2011-09-02 at 14:01 +0200, PostgreSQL - Hans-Jürgen Schönig
 wrote:
 hello …
 
 i have been thinking about this issue for quite a while ...
 given your idea i am not sure how this can work at all.
 
 consider:
  begin;
  insert 1
  insert 2
  commit
 
 assume this ends up in the same node,
 now you split it into two …
 1 and 2 will have exactly the same visibility to any transaction.
 i am not sure how you can get this right without looking at the data.
 
 It has to consider the data when determining visibility, that's the
 whole point of the plug-in.
 
 The idea is that each row belongs to a certain partition, as
 determined by some function over its fields. Most often this function
 is a hash of the primary key, OR-ed with a bitmap representing cluster
 size and AND-ed with a bitmap for the partition(s) stored in this
 database.
 
 When you split the partition, some rows don't belong in the old
 partition database anymore (and if you did a full copy, the other
 half don't belong to the new one), so they should be handled as
 invisible / deleted. As this can only be done by looking at the tuple
 data, this needs an additional visibility function. And as this is only
 needed for partitioned databases, it makes sense to implement it as a
 plugin, so it would not waste cycles on non-partitioned databases.
 
 alternative idea: what if the proxy would add / generate a filter by 
 looking at the data?
 a quick idea would be that once you split, you add a simple directive 
 such as FILTER GENERATOR $1 or so to the PL/proxy code.
 it would then, behind the scenes, arrange for the filter to be passed on.
 what do you think?
 
 Hmm. I'm not sure I understand what you are trying to say. Can you
 elaborate a little?
 
 
  regards,
 
  hans
 
 
 
 On Sep 1, 2011, at 10:13 AM, Hannu Krosing wrote:
 
 Hallow hackers
 
 I have the following problem to solve and would like to get advice on
 the best way to do it.
 
 The problem:
 
 When growing a pl/proxy based database cluster, one of the main
 operations is splitting a partition. The standard flow is as follows:
 
 1) make a copy of the partitions table(s) to another database
 2) reconfigure pl/proxy to use 2 partitions instead of one
 
 The easy part is making a copy of all or half of the table to another
 database. The hard part is fast deletion (I mean milliseconds,
 comparable to TRUNCATE) of the data that should not be in a partition (so
 that RUN ON ALL functions will continue to return the right results).
 
 It would be relatively easy, if we still had the RULES for select
 available for plain tables, but even then the eventual cleanup would
 usually mean at least 3 passes of disk writes (set xmax, write deleted
 flag, vacuum and remove)
 
 What I would like to have is the possibility of additional visibility
 checks, which would run some simple C function over tuple data (usually
 hash(fieldval) + and + or ) and return visibility (is in this partition)
 as a result. It would be best if this ran at so low a level that vacuum
 would also use it and could clean up the foreign partition data in one
 pass, without doing the delete dance first.
 
 So finally the QUESTION:
 
 where in code would be the best place to check for this so that
 
 1) both regular queries and VACUUM see it
 2) the tuple data (and not only system fields or just xmin/xmax) would
 be available for the function to use
 
 
 -- 
 ---
 Hannu Krosing
 PostgreSQL Unlimited Scalability and Performance Consultant
 2ndQuadrant Nordic
 PG Admin Book: http://www.2ndQuadrant.com/books/
 
 
 -- 
 Sent via 

Re: [HACKERS] help with plug-in function for additional (partition/shard) visibility checks

2011-09-02 Thread Hannu Krosing
On Fri, 2011-09-02 at 14:51 +0200, PostgreSQL - Hans-Jürgen Schönig
wrote:
 hello …
 
 the goal of the entire proxy thing is to make the right query go to the right 
 node / nodes.
 we determine this by using a partitioning function and so on …
 currently PL/proxy has only a handful of commands - one is RUN ON … which 
 tells us where to put things.
 assume you issue a select … some select will fall out on the target node. 
 to restrict the data coming from the node you could add an additional 
 constraint on the way …
 
 say:
   SELECT * FROM proxy_table WHERE a = 20;
 
 what you want to reach the node after a split is …
 
  SELECT * FROM proxy_table WHERE a = 20 AND col = <filter the wrong half away>
 
 my idea is to add an additional command to the PL/proxy command set.
 it should call a function generating this additional filter.
 maybe somehow like that …
 
   RUN ON hashtext($1) 
 -- this one already knows about the increased cluster
   GENERATE FILTER my_create_the_bloody_filter_func($1)-- this one 
 would massage the query going to the node.
 
 it would actually open the door for a lot of additional trickery.
 the function would tell the proxy what to append - and this would be 
 under your full control.
 
 what do you think?

Hmm, it could work for the simplest cases, but this has 2 main problems:

1) you need a full SQL parser to make this generally useful for plain
SQL

and

2) it still won't work for pl/proxy's main use case - calling the same
_function_ on each partition.

 i got to think about it further but i can envision that this could be feasible 
 ...
 
   hans
 
 
 On Sep 2, 2011, at 2:36 PM, Hannu Krosing wrote:
 
  On Fri, 2011-09-02 at 14:01 +0200, PostgreSQL - Hans-Jürgen Schönig
  wrote:
  hello …
  
  i have been thinking about this issue for quite a while ...
  given your idea i am not sure how this can work at all.
  
  consider:
 begin;
 insert 1
 insert 2
 commit
  
  assume this ends up in the same node,
  now you split it into two …
  1 and 2 will have exactly the same visibility to any transaction.
  i am not sure how you can get this right without looking at the data.
  
  It has to consider the data when determining visibility, that's the
  whole point of the plug-in.
  
  The idea is that each row belongs to a certain partition, as
  determined by some function over its fields. Most often this function
  is a hash of the primary key, OR-ed with a bitmap representing cluster
  size and AND-ed with a bitmap for the partition(s) stored in this
  database.
  
  When you split the partition, some rows don't belong in the old
  partition database anymore (and if you did a full copy, the other
  half don't belong to the new one), so they should be handled as
  invisible / deleted. As this can only be done by looking at the tuple
  data, this needs an additional visibility function. And as this is only
  needed for partitioned databases, it makes sense to implement it as a
  plugin, so it would not waste cycles on non-partitioned databases.
  
  alternative idea: what if the proxy would add / generate a filter by 
  looking at the data?
  a quick idea would be that once you split, you add a simple directive 
  such as FILTER GENERATOR $1 or so to the PL/proxy code.
  it would then, behind the scenes, arrange for the filter to be passed on.
  what do you think?
  
  Hmm. I'm not sure I understand what you are trying to say. Can you
  elaborate a little?
  
  
 regards,
  
 hans
  
  
  
  On Sep 1, 2011, at 10:13 AM, Hannu Krosing wrote:
  
  Hallow hackers
  
  I have the following problem to solve and would like to get advice on
  the best way to do it.
  
  The problem:
  
  When growing a pl/proxy based database cluster, one of the main
  operations is splitting a partition. The standard flow is as follows:
  
  1) make a copy of the partitions table(s) to another database
  2) reconfigure pl/proxy to use 2 partitions instead of one
  
  The easy part is making a copy of all or half of the table to another
  database. The hard part is fast deletion (I mean milliseconds,
  comparable to TRUNCATE) of the data that should not be in a partition (so
  that RUN ON ALL functions will continue to return the right results).
  
  It would be relatively easy, if we still had the RULES for select
  available for plain tables, but even then the eventual cleanup would
  usually mean at least 3 passes of disk writes (set xmax, write deleted
  flag, vacuum and remove)
  
  What I would like to have is the possibility of additional visibility
  checks, which would run some simple C function over tuple data (usually
  hash(fieldval) + and + or ) and return visibility (is in this partition)
  as a result. It would be best if this ran at so low a level that vacuum
  would also use it and could clean up the foreign partition data in one
  pass, without doing the delete dance first.
  
  So finally the 

Re: [HACKERS] help with plug-in function for additional (partition/shard) visibility checks

2011-09-02 Thread PostgreSQL - Hans-Jürgen Schönig

On Sep 2, 2011, at 2:59 PM, Hannu Krosing wrote:

 On Fri, 2011-09-02 at 14:51 +0200, PostgreSQL - Hans-Jürgen Schönig
 wrote:
 hello …
 
 the goal of the entire proxy thing is to make the right query go to the 
 right node / nodes.
 we determine this by using a partitioning function and so on …
 currently PL/proxy has only a handful of commands - one is RUN ON … which 
 tells us where to put things.
 assume you issue a select … some select will fall out on the target node. 
 to restrict the data coming from the node you could add an additional 
 constraint on the way …
 
 say:
  SELECT * FROM proxy_table WHERE a = 20;
 
 what you want to reach the node after a split is …
 
  SELECT * FROM proxy_table WHERE a = 20 AND col = <filter the wrong half away>
 
 my idea is to add an additional command to the PL/proxy command set.
 it should call a function generating this additional filter.
 maybe somehow like that …
 
  RUN ON hashtext($1) 
 -- this one already knows about the increased cluster
  GENERATE FILTER my_create_the_bloody_filter_func($1)-- this one 
 would massage the query going to the node.
 
 it would actually open the door for a lot of additional trickery.
 the function would tell the proxy what to append - and this would be 
 under your full control.
 
 what do you think?
 
 Hmm, it could work for the simplest cases, but this has 2 main problems:
 
 1) you need a full SQL parser to make this generally useful for plain
 SQL


i think that everything beyond a simple case is pretty hard to achieve anyway. 
to me it looks pretty impossible to solve this in a generic way without some 
insane amount of labor input - at least given the ideas that have come to me 
in the past.
and yes, functions are an issue. unless you have some sort of virtually 
private database thing it is close to impossible (unless you want to try some 
nightmare based on views / constraint exclusion on the partitions or so).

regards,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Tomas Vondra
On 2 Září 2011, 12:45, Magnus Hagander wrote:
 On Fri, Sep 2, 2011 at 11:01, Tomas Vondra t...@fuzzy.cz wrote:
 What about logging it with a lower level, e.g. NOTICE instead of the
 current LOG? If that's not a solution then a new GUC is needed I guess.

 I guess if it's at a DEBUG level it won't annoy anybody who doesn't
 need it. Not sure if NOTICE is low enough..

I've changed the level to NOTICE. I guess I could put it at INFO, but
the debug levels seem too low to me. Is there a rule of thumb about
where to put messages?

Tomas

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
new file mode 100644
index 4c7cfb0..a9bdd89
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
*** BufferSync(int flags)
*** 1175,1180 ****
--- 1175,1193 ----
  	int			num_to_write;
  	int			num_written;
  	int			mask = BM_DIRTY;
+ 
+ 	int			num_since_update;
+ 
+ 	long		curr_secs,
+ 				total_secs;
+ 	int			curr_usecs,
+ 				total_usecs;
+ 	float		curr_time,
+ 				total_time;
+ 
+ 	TimestampTz startTimestamp, lastTimestamp;
+ 
+ 	int			log_interval, check_interval;
  
  	/* Make sure we can handle the pin inside SyncOneBuffer */
  	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
*** BufferSync(int flags)
*** 1238,1243 ****
--- 1251,1265 ----
  	buf_id = StrategySyncStart(NULL, NULL);
  	num_to_scan = NBuffers;
  	num_written = 0;
+ 	num_since_update = 0;
+ 
+ 	startTimestamp = GetCurrentTimestamp();
+ 	lastTimestamp = startTimestamp;
+ 
+ 	/* check the progress each 128 buffers or 10% */
+ 	log_interval = (int) round(num_to_write / 10);
+ 	check_interval = ((log_interval < 128) && (log_interval > 0)) ? log_interval : 128;
+ 
  	while (num_to_scan-- > 0)
  	{
  		volatile BufferDesc *bufHdr = &BufferDescriptors[buf_id];
*** BufferSync(int flags)
*** 1261,1266 ****
--- 1283,1334 ----
  			TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
  			BgWriterStats.m_buf_written_checkpoints++;
  			num_written++;
+ 			num_since_update++;
+ 
+ 			/*
+ 			 * Every time we write enough buffers we check the time since the
+ 			 * last log, and whenever we reach 5 seconds or 10% of the buffers
+ 			 * to write, we put out a checkpoint status message.
+ 			 *
+ 			 * The log message contains info about the total number of buffers
+ 			 * to write, how many buffers are already written, average and
+ 			 * current write speed, and an estimated remaining time.
+ 			 */
+ 			if ((log_checkpoints) && (num_written % check_interval == 0))
+ 			{
+ 				TimestampDifference(lastTimestamp,
+ 									GetCurrentTimestamp(),
+ 									&curr_secs, &curr_usecs);
+ 
+ 				/* if at least 5 seconds elapsed since the last log, log */
+ 				if ((curr_secs >= 5) || (num_since_update >= log_interval))
+ 				{
+ 					TimestampDifference(startTimestamp,
+ 										GetCurrentTimestamp(),
+ 										&total_secs, &total_usecs);
+ 
+ 					curr_time = curr_secs + (float) curr_usecs / 1000000;
+ 					total_time = total_secs + (float) total_usecs / 1000000;
+ 
+ 					elog(NOTICE, "checkpoint status: wrote %d buffers of %d (%.1f%%) in %.1f s; "
+ 						 "average %.1f MB/s (%d buffers, %ld.%03d s), "
+ 						 "current %.1f MB/s (%d buffers, %ld.%03d s), "
+ 						 "remaining %.1f s",
+ 						 num_written, num_to_write, ((float) num_written * 100 / num_to_write),

Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Peter Eisentraut
On fre, 2011-09-02 at 11:01 +0200, Tomas Vondra wrote:
 What about logging it with a lower level, e.g. NOTICE instead of the
 current LOG? If that's not a solution then a new GUC is needed I
 guess.

Changing the log level is not the appropriate solution.  Make it a
configuration parameter.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On fre, 2011-09-02 at 11:01 +0200, Tomas Vondra wrote:
 What about logging it with a lower level, e.g. NOTICE instead of the
 current LOG? If that's not a solution then a new GUC is needed I
 guess.

 Changing the log level is not the appropriate solution.  Make it a
 configuration parameter.

Frankly, logging as verbose as this is proposed to be is something
that I can't imagine anybody wanting at all, especially not in
production environments.  DEBUG3 or somewhere around there seems fine.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.conf archive_command example

2011-09-02 Thread Kevin Grittner
Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm also wondering if providing some shell script examples of a
 fault-tolerant script to handle archiving would be useful.

 I think it would.
 
 My usual advice is to avoid having to write one if possible,
 because it's more complex than it looks.  What about recommending
 existing solutions, such as walmgr from Skytools?
 
 Even better, what about including a default archiving tool, that
 could be either another script in bin/ or rather an internal
 command. The default would accept a location as an argument; for
 simple needs you mount a remote filesystem and there you go.  If
 you need something more complex, you can still provide it
 yourself.
 
In a green field I might argue for having an archive_directory GUC
instead of archive_command.  As it stands, it might be a really good
idea to provide a pg_archiveto executable which takes as arguments a
directory path and the arguments passed to the archive script.  With
a little extra effort, the executable could check for some file
which would specify what host and path should be writing archives
there, to avoid problems with copied database directories
accidentally writing to the same location as the source.
 
Such an executable seems like minimal effort compared to the
problems it would solve.
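 
A minimal sketch of that ownership check (the marker file name and
format are invented here purely for illustration; this is not part of
any existing tool):
 
#include <stdio.h>
#include <string.h>

/*
 * Illustrative: return 1 only if the marker file in the archive
 * directory names this host and data directory, so a copied cluster
 * cannot clobber the source's archive.
 */
static int
archive_dir_matches(const char *archive_dir,
                    const char *my_host, const char *my_datadir)
{
    char        path[1024];
    char        host[256];
    char        datadir[768];
    FILE       *fh;

    snprintf(path, sizeof(path), "%s/ARCHIVE_OWNER", archive_dir);
    if ((fh = fopen(path, "r")) == NULL)
        return 0;               /* no marker: refuse to archive */
    if (fscanf(fh, "%255s %767s", host, datadir) != 2)
    {
        fclose(fh);
        return 0;
    }
    fclose(fh);
    return strcmp(host, my_host) == 0 && strcmp(datadir, my_datadir) == 0;
}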
 
If there's an existing tool with appropriate licensing which is
sufficiently portable and reliable, all the better -- let's ship it
and use that for our example archive_command.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade automatic testing

2011-09-02 Thread Peter Eisentraut
On tor, 2011-09-01 at 18:55 -0400, Bruce Momjian wrote:
 Tom Lane wrote:
  Peter Eisentraut pete...@gmx.net writes:
   +# contrib/pg_upgrade/test.sh
   +#
   +# Test driver for pg_upgrade.  Initializes a new database cluster,
   +# runs the regression tests (to put in some data), runs pg_dumpall,
   +# runs pg_upgrade, runs pg_dumpall again, compares the dumps.
  
  Hm .. my experience is that that doesn't work at all, because the
  regression tests set up assorted C functions whose implementations are
  in pg_regress.so, and it creates them with absolute path references
  to pg_regress.so.  When you try to load that into another installation
  that's a different version of PG, it quite properly fails.  So I think
  that as given, this script is only useful for testing pg_upgrade of
  $currentversion to $currentversion.  Which is surely better than no test
 
 Reminder --- you can't use pg_upgrade to go from the same catalog
 version to the same catalog version because the catalog version is
 embedded in the tablespace directory name.

Well, it does work, but only because the regression tests don't keep a
tablespace around at the end.  Would pg_upgrade complain otherwise?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade automatic testing

2011-09-02 Thread Andrew Dunstan



On 09/02/2011 10:36 AM, Peter Eisentraut wrote:

On tor, 2011-09-01 at 18:55 -0400, Bruce Momjian wrote:

Tom Lane wrote:

Peter Eisentrautpete...@gmx.net  writes:

+# contrib/pg_upgrade/test.sh
+#
+# Test driver for pg_upgrade.  Initializes a new database cluster,
+# runs the regression tests (to put in some data), runs pg_dumpall,
+# runs pg_upgrade, runs pg_dumpall again, compares the dumps.

Hm .. my experience is that that doesn't work at all, because the
regression tests set up assorted C functions whose implementations are
in pg_regress.so, and it creates them with absolute path references
to pg_regress.so.  When you try to load that into another installation
that's a different version of PG, it quite properly fails.  So I think
that as given, this script is only useful for testing pg_upgrade of
$currentversion to $currentversion.  Which is surely better than no test

Reminder --- you can't use pg_upgrade to go from the same catalog
version to the same catalog version because the catalog version is
embedded in the tablespace directory name.

Well, it does work, but only because the regression tests don't keep a
tablespace around at the end.  Would pg_upgrade complain otherwise?



In any case, it would be good to get rid of the limitation if possible. 
Then we could look at creating an automated test that we could use in 
the buildfarm.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.conf archive_command example

2011-09-02 Thread Dimitri Fontaine
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 In a green field I might argue for having an archive_directory GUC
 instead of archive_command.  As it stands, it might be a really good

I would think we would then need both.  archive_command with parameters
offers both.

 idea to provide a pg_archiveto executable which takes as arguments a
 directory path and the arguments passed to the archive script.  With
 a little extra effort, the executable could check for some file
 which would specify what host and path should be writing archives
 there, to avoid problems with copied database directories
 accidentally writing to the same location as the source.
  
 Such an executable seems like minimal effort compared to the
 problems it would solve.
  
 If there's an existing tool with appropriate licensing which is
 sufficiently portable and reliable, all the better -- let's ship it
 and use that for our example archive_command.

I would like for it not to be an example, but a default value.
Something ready for production but with a very narrow use case.

-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Training and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Tomas Vondra
On 2 Září 2011, 15:44, Peter Eisentraut wrote:
 On fre, 2011-09-02 at 11:01 +0200, Tomas Vondra wrote:
 What about logging it with a lower level, e.g. NOTICE instead of the
 current LOG? If that's not a solution then a new GUC is needed I
 guess.

 Changing the log level is not the appropriate solution.  Make it a
 configuration parameter.

Why is it an inappropriate solution? There's a log_checkpoints GUC that
drives it, and you can either get basic info (a summary of the checkpoint) or
a detailed log (with a lower log level).

In the first patch I've proposed a new GUC (used to set how often the info
should be logged or disable it), but Josh Berkus pointed out that I should
get rid of it if I can. Which is what I've done in the following patches.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore --no-post-data and --post-data-only

2011-09-02 Thread Andrew Dunstan



On 09/01/2011 09:40 PM, Greg Sabino Mullane wrote:

It's off topic. But I think custom format would require a major mangling
to be able to handle a complete cluster. This isn't just a simple matter
of programming, IMNSHO.

Oh, I meant just having it create separate custom format files for each
database. As shell scripts all over the world have been doing for years,
but it would be nice if it was simply built in.




I guess it could be done, although I'm not going to do it :-) I'm more 
about making somewhat hard things easier than easy things slightly 
easier :-) You'd have to invent some sort of way to name files, possibly 
by supplying a template to the -f parameter which would fill in some 
placeholder, say a %, with the name of the database. Of course, then 
you'd have to make sure the database name didn't contain any forbidden 
characters.
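
A minimal sketch of that placeholder expansion (assuming '%' as the
placeholder; purely illustrative, not an existing pg_dump feature):

#include <stdio.h>
#include <string.h>

/* expand the first '%' in an -f style template with the database name */
static void
expand_filename(const char *tmpl, const char *dbname,
                char *out, size_t outlen)
{
    const char *pct = strchr(tmpl, '%');

    if (pct)
        snprintf(out, outlen, "%.*s%s%s",
                 (int) (pct - tmpl), tmpl, dbname, pct + 1);
    else
        snprintf(out, outlen, "%s", tmpl);
}

So "-f /backups/%.dump" with database "sales" would become
"/backups/sales.dump" (after whatever escaping of forbidden characters
one settles on).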


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Magnus Hagander
On Sep 2, 2011 5:02 PM, Tomas Vondra t...@fuzzy.cz wrote:

 On 2 Září 2011, 15:44, Peter Eisentraut wrote:
  On fre, 2011-09-02 at 11:01 +0200, Tomas Vondra wrote:
  What about logging it with a lower level, e.g. NOTICE instead of the
  current LOG? If that's not a solution then a new GUC is needed I
  guess.
 
  Changing the log level is not the appropriate solution.  Make it a
  configuration parameter.

 Why is it an inappropriate solution? There's a log_checkpoints GUC that
 drives it, and you can either get basic info (a summary of the checkpoint) or
 a detailed log (with a lower log level).

 In the first patch I've proposed a new GUC (used to set how often the info
 should be logged or disable it), but Josh Berkus pointed out that I should
 get rid of it if I can. Which is what I've done in the following patches.

Well, Josh doesn't speak for everybody ;-)

Maybe one way could be to change log_checkpoints into an enum of off, on,
debug (values open for bikeshedding of course)

/Magnus


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Tomas Vondra
On 2 Září 2011, 16:13, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
 On fre, 2011-09-02 at 11:01 +0200, Tomas Vondra wrote:
 What about logging it with a lower level, e.g. NOTICE instead of the
 current LOG? If that's not a solution then a new GUC is needed I
 guess.

 Changing the log level is not the appropriate solution.  Make it a
 configuration parameter.

 Frankly, logging as verbose as this is proposed to be is something
 that I can't imagine anybody wanting at all, especially not in
 production environments.  DEBUG3 or somewhere around there seems fine.

Yes, it can be a bit verbose, but I don't think it's that verbose. I'm one
of those who think this information about checkpoint progress may be
valuable - even on production systems - and that's why I don't want to put
it into the debug levels.

It should write about 10 lines for each checkpoint (although I've just
realized the current patch won't do that for the timed checkpoints - it'll
probably log significantly more).

My 'ideal' solution would be either to add another GUC (to turn this
on/off) or allow log_checkpoints to have three values

log_checkpoints = {off, normal, detailed}

where 'normal' provides the current output and 'detailed' produces this
more verbose output.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Tomas Vondra
On 2 Září 2011, 17:08, Magnus Hagander wrote:
 On Sep 2, 2011 5:02 PM, Tomas Vondra t...@fuzzy.cz wrote:

 On 2 Září 2011, 15:44, Peter Eisentraut wrote:
  On fre, 2011-09-02 at 11:01 +0200, Tomas Vondra wrote:
  What about logging it with a lower level, e.g. NOTICE instead of the
  current LOG? If that's not a solution then a new GUC is needed I
  guess.
 
  Changing the log level is not the appropriate solution.  Make it a
  configuration parameter.

 Why is it an inappropriate solution? There's a log_checkpoints GUC that
 drives it, and you can either get basic info (a summary of the checkpoint)
 or a detailed log (with a lower log level).

 In the first patch I've proposed a new GUC (used to set how often the
 info
 should be logged or disable it), but Josh Berkus pointed out that I
 should
 get rid of it if I can. Which is what I've done in the following
 patches.

 Well, josh doesn't speak for everybody ;-)

Sure, but I think the effort not to have a zillion GUCs makes sense.

 Maybe one way could be to change log_checkpoints into an enum of off, on,
 debug (values open for bikeshedding of course)

Yes, that's actually one of the solutions I'd prefer. Not sure why I
rejected it ...

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Kevin Grittner
Tomas Vondra t...@fuzzy.cz wrote:
 On 2 Září 2011, 15:44, Peter Eisentraut wrote:
 
 Changing the log level is not the appropriate solution.  Make it
 a configuration parameter.
 
 Why is it an inappropriate solution? There's a log_checkpoints GUC
 that drives it, and you can either get basic info (a summary of the
 checkpoint) or a detailed log (with a lower log level).
 
Maybe?:
 
log_checkpoints = 'verbose'
 
It seems to me that one of the most likely uses of this is for
investigating performance issues in production.  Like with
log_statement, you might want to turn it on briefly and examine
results.
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Magnus Hagander
On Sep 2, 2011 5:18 PM, Jaime Casanova ja...@2ndquadrant.com wrote:

 On Fri, Sep 2, 2011 at 10:10 AM, Tomas Vondra t...@fuzzy.cz wrote:
 
  My 'ideal' solution would be either to add another GUC (to turn this
  on/off) or allow log_checkpoints to have three values
 
  log_checkpoints = {off, normal, detailed}
 

 +1 on make it an enum but i prefer values {on, off, verbose}

 where on will log just the same as now

Yeah, keeping on and off working the same way as they do today is a feature
in itself.
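
For what it's worth, a sketch of what such an enum GUC could look like
in guc.c, using the existing config_enum_entry machinery (the
LOG_CHECKPOINTS_* values and the "verbose" wiring are hypothetical, not
part of any committed patch):

#include "utils/guc.h"

typedef enum
{
    LOG_CHECKPOINTS_OFF,
    LOG_CHECKPOINTS_ON,
    LOG_CHECKPOINTS_VERBOSE
} LogCheckpointsLevel;

static const struct config_enum_entry log_checkpoints_options[] = {
    {"off", LOG_CHECKPOINTS_OFF, false},
    {"on", LOG_CHECKPOINTS_ON, false},
    {"verbose", LOG_CHECKPOINTS_VERBOSE, false},
    /* keep the usual boolean spellings working, as on/off do today */
    {"true", LOG_CHECKPOINTS_ON, true},
    {"false", LOG_CHECKPOINTS_OFF, true},
    {NULL, 0, false}
};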

/Magnus


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Jaime Casanova
On Fri, Sep 2, 2011 at 10:10 AM, Tomas Vondra t...@fuzzy.cz wrote:

 My 'ideal' solution would be either to add another GUC (to turn this
 on/off) or allow log_checkpoints to have three values

 log_checkpoints = {off, normal, detailed}


+1 on making it an enum, but I prefer the values {on, off, verbose}

where on will log just the same as now

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.1] sepgsql - userspace access vector cache

2011-09-02 Thread Kohei Kaigai
 I've committed this, but I still think it would be helpful to revise
 that comment.  The term "boosted up" is not very precise or
 informative.  Could you submit a separate, comment-only patch to
 improve this?
 
I tried to put a more detailed explanation of the logic of the do { ... } while
loop in sepgsql_avc_check_valid, and of the cache field of the new security
label to be switched on execution of the procedure. Is it helpful?

Thanks,
--
NEC Europe Ltd, SAP Global Competence Center
KaiGai Kohei kohei.kai...@emea.nec.com


pgsql-uavc-comments-update.patch
Description: pgsql-uavc-comments-update.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] limit in subquery causes poor selectivity estimation

2011-09-02 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On lör, 2011-08-27 at 13:32 -0400, Tom Lane wrote:
 The larger problem is that if a subquery didn't get flattened, it's
 often because it's got LIMIT, or GROUP BY, or some similar clause that
 makes it highly suspect whether the statistics available for the table
 column are reasonable to use for the subquery outputs.  It wouldn't be
 that hard to grab the stats for test2.sha1, but then how do you want
 to adjust them to reflect the LIMIT?

 It turns out that this is a regression introduced in 8.4.8;

Well, the fact that examine_variable punts on subqueries is certainly
not a regression introduced in 8.4.8; it's always been like that.

I think your observation that 8.4.8 is worse has to be blamed on
commit 0ae8b300388c2a3eaf90e6e6f13d6be1f4d4ac2d, which introduced a
fallback rule of assuming 0.5 selectivity for a semijoin if we didn't
have non-default ndistinct estimates on both sides.  Before that, 8.4.x
would go ahead and apply its heuristic rule, essentially Min(nd2/nd1, 1),
even when one or both ndistinct values were completely made-up.

I'm not sure what we could do instead.  Perhaps you could argue that
we should just revert that commit on the grounds that it's doing more
harm than good, but I don't really believe that --- I think reverting
would just move the pain points around.  It's pure luck that 8.4.8
is worse rather than better on the particular example you cite.

On a longer-term basis, I'm looking into what we could do with
extracting stats from subqueries, but that doesn't seem like material
for a backpatch.  I have a draft patch that I've been playing with
(attached).  The main thing I feel unsure about is whether it's
reasonable to extract stats in this way from a subquery that has GROUP
BY or DISTINCT.  ISTM it's probably okay to ignore joining, sorting, or
limiting in the subquery: those might affect the stats of the subquery
output, but this is no worse than using the unmodified column statistics
for any other join-level selectivity estimate (where we already ignore
the effects of scan-level restriction clauses that will filter the
column values).  But GROUP BY or DISTINCT would entirely invalidate the
column frequency statistics, which makes me think that ignoring the
pg_statistic entry might be the thing to do.  Comments?

regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index ef29374fcccae23cb663c04470f12c22321a0e2c..a703f4a910c0e5f521f09cf9564a05c73cf803b8 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
***************
*** 95,100 ****
--- 95,101 ----
  #include "access/sysattr.h"
  #include "catalog/index.h"
  #include "catalog/pg_collation.h"
+ #include "catalog/pg_inherits_fn.h"
  #include "catalog/pg_opfamily.h"
  #include "catalog/pg_statistic.h"
  #include "catalog/pg_type.h"
*** static double convert_one_bytea_to_scalar
*** 168,173 ****
--- 169,176 ----
  			int rangelo, int rangehi);
  static char *convert_string_datum(Datum value, Oid typid);
  static double convert_timevalue_to_scalar(Datum value, Oid typid);
+ static void examine_variable_recurse(Query *subquery, AttrNumber resno,
+ 						 VariableStatData *vardata);
  static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata,
  				   Oid sortop, Datum *min, Datum *max);
  static bool get_actual_variable_range(PlannerInfo *root,
*** examine_variable(PlannerInfo *root, Node
*** 4176,4196 ****
  		}
  		else if (rte->rtekind == RTE_RELATION)
  		{
  			vardata->statsTuple = SearchSysCache3(STATRELATTINH,
  												  ObjectIdGetDatum(rte->relid),
  												  Int16GetDatum(var->varattno),
  												  BoolGetDatum(rte->inh));
  			vardata->freefunc = ReleaseSysCache;
  		}
  		else
  		{
  			/*
! 			 * XXX This means the Var comes from a JOIN or sub-SELECT. Later
! 			 * add code to dig down into the join etc and see if we can trace
! 			 * the variable to something with stats.  (But beware of
! 			 * sub-SELECTs with DISTINCT/GROUP BY/etc.	Perhaps there are no
! 			 * cases where this would really be useful, because we'd have
! 			 * flattened the subselect if it is??)
  			 */
  		}
  
--- 4179,4205 ----
  		}
  		else if (rte->rtekind == RTE_RELATION)
  		{
+ 			/* plain table, so look up the column in pg_statistic */
  			vardata->statsTuple = SearchSysCache3(STATRELATTINH,
  												  ObjectIdGetDatum(rte->relid),
  												  Int16GetDatum(var->varattno),
  												  BoolGetDatum(rte->inh));
  			vardata->freefunc = ReleaseSysCache;
  		}
+ 		else if (rte->rtekind == RTE_SUBQUERY)
+ 		{
+ 			/* recurse into subquery */
+ 			examine_variable_recurse(rte->subquery, var->varattno,
+ 									 vardata);
+ 		}
  		else
  		{
  			/*
! 			 * Otherwise, the Var comes from a FUNCTION, VALUES, or CTE RTE.
! 			 * (We won't see RTE_JOIN here because join alias Vars have
! 			 * already been flattened.)  There's not much we can do with
! 			 * 

Re: [HACKERS] [COMMITTERS] pgsql: Remove fmgr.h include in cube contrib --- caused crash on a Ge

2011-09-02 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of vie sep 02 12:20:50 -0300 2011:

 Wow, that is interesting.  So the problem is the inclusion of
 replication/walsender.h in xlog.h.  Hard to see how that could cause the
 cube regression tests to fail, but of course, it is.

Hmm, so you included walsender.h into xlog.h?  That seems a bit funny
considering that walsender.h already includes xlog.h.  It seems the
reason for this is only the AllowCascadeReplication() definition.  Maybe
that should go elsewhere instead, for example walsender.h?

I wonder if there should be a new header, something like
walsender_internal.h, for stuff like WalSnd and WalSndCtlData struct
defs.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.conf archive_command example

2011-09-02 Thread Robert Haas
On Fri, Sep 2, 2011 at 10:34 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm also wondering if providing some shell script examples of a
 fault-tolerant script to handle archiving would be useful.

 I think it would.

 My usual advice is to avoid having to write one if possible,
 because it's more complex than it looks.  What about recommending
 existing solutions, such as walmgr from Skytools?

 Even better, what about including a default archiving tool, that
 could be either another script in bin/ or rather an internal
 command. The default would accept a location as an argument; for
 simple needs you mount a remote filesystem and there you go.  If
 you need something more complex, you can still provide it
 yourself.

 In a green field I might argue for having an archive_directory GUC
 instead of archive_command.  As it stands, it might be a really good
 idea to provide a pg_archiveto executable which takes as arguments a
 directory path and the arguments passed to the archive script.  With
 a little extra effort, the executable could check for some file
 which would specify what host and path should be writing archives
 there, to avoid problems with copied database directories
 accidentally writing to the same location as the source.

 Such an executable seems like minimal effort compared to the
 problems it would solve.

 If there's an existing tool with appropriate licensing which is
 sufficiently portable and reliable, all the better -- let's ship it
 and use that for our example archive_command.

Another thought I have here is to wonder whether we should change
something on the server side so that we don't NEED such a complicated
archive_command.  I mean, copying a file to a directory somewhere is
not fundamentally a complex operation.  Nor is using ssh to copy it to
another machine.  The fact that archive_commands need to be so complex
seems like a usability defect.  The consensus seems to be that just
using something like 'cp' for your archive command won't work out
well, but maybe instead of shipping a more complicated script we
should be trying to eliminate (or at least reduce) the need for a more
complicated script.
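
To illustrate: even a bare-bones sketch of such a script, making only
the most basic attempt at safety, already looks like this (the target
directory is an assumption, and this is illustrative rather than
something I'd recommend shipping as-is):

    #!/bin/sh
    # hypothetical usage: archive_command = '/usr/local/bin/archive.sh %p %f'
    ARCHIVE_DIR=/var/pgsql/backup/wal

    test -d "$ARCHIVE_DIR" || exit 1        # target gone: fail, server retries
    test ! -f "$ARCHIVE_DIR/$2" || exit 1   # never overwrite an existing segment
    cp "$1" "$ARCHIVE_DIR/$2.tmp" || exit 1 # copy under a temporary name first
    mv "$ARCHIVE_DIR/$2.tmp" "$ARCHIVE_DIR/$2"  # atomic rename, same filesystem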

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Robert Haas
On Thu, Sep 1, 2011 at 3:59 PM, Tomas Vondra t...@fuzzy.cz wrote:
 I've prepared a significantly simplified version of the patch. The two
 main changes are

 (a) it does not update the pg_stat_bgwriter anymore, it just prints an
 info to the server log

 (b) a new GUC is not required, it's driven by the log_checkpoints

 This version will log at least 10 'checkpoint status' lines (at 10%, 20%,
 30%, ...) and whenever 5 seconds since the last log elapses. The time is
 not checked for each buffer but for 128 buffers.

 So if the checkpoint is very slow, you'll get a message every 5 seconds,
 if it's fast you'll get 10 messages.

This seems like a strange heuristic.  I understand the value of
emitting a progress report every N seconds, but why would you want a
report at 10%, 20%, 30% even if it hasn't been 5 seconds yet?  I don't
need ten progress messages on a checkpoint that only takes three
seconds to complete.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.conf archive_command example

2011-09-02 Thread Andrew Dunstan



On 09/02/2011 01:00 PM, Robert Haas wrote:

[...]

Another thought I have here is to wonder whether we should change
something on the server side so that we don't NEED such a complicated
archive_command.  I mean, copying a file to a directory somewhere is
not fundamentally a complex operation.  Nor is using ssh to copy it to
another machine.  The fact that archive_commands need to be so complex
seems like a usability defect.  The consensus seems to be that just
using something like 'cp' for your archive command won't work out
well, but maybe instead of shipping a more complicated script we
should be trying to eliminate (or at least reduce) the need for a more
complicated script.



The problem is that the number of ways you might want to do things is 
quite large. For example, you might want to copy the archives to more 
than one place for safety reasons. I pretty much always set 
archive_command to a script which I can then customize to my heart's 
content, and it seems to work pretty well. Providing a simple example of 
such a script seems like it could be useful.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.conf archive_command example

2011-09-02 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 maybe instead of shipping a more complicated script we should be
 trying to eliminate (or at least reduce) the need for a more
 complicated script.
 
That was the intent of my pg_archiveto suggestion.  I'll amend it
(based on other comments) to allow for a URL as an alternative way
to specify the directory.  So your archive_command might be:
 
'pg_archiveto /var/pgsql/backup/wal/ %p %f'
 
or:
 
'pg_archiveto http://backup-server/mydbserver/wal/ %p %f'
 
or maybe:
 
'pg_archiveto /mnt/someremotedirectory/ %p %f'
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.conf archive_command example

2011-09-02 Thread Kevin Grittner
Andrew Dunstan and...@dunslane.net wrote:
 
 For example, you might want to copy the archives to more than one
 place for safety reasons.
 
We've never felt that the way to do that was to put the logic for it
in the archive script -- we archive to a local directory and set up
rsync tasks on cron to distribute it.  Otherwise you might not
archive to one target if the other is down, or might have trouble
catching up with a target when it comes back from being down.
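
As a sketch of that arrangement (hostnames and paths are invented):

    # crontab on the archiving host; each target is synced independently,
    # so one target being down doesn't block the other
    */5 * * * *  rsync -a /var/pgsql/backup/wal/ backup1:/srv/wal/
    */5 * * * *  rsync -a /var/pgsql/backup/wal/ backup2:/srv/wal/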
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.conf archive_command example

2011-09-02 Thread Magnus Hagander
On Fri, Sep 2, 2011 at 19:13, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Andrew Dunstan and...@dunslane.net wrote:

 For example, you might want to copy the archives to more than one
 place for safety reasons.

 We've never felt that the way to do that was to put the logic for it
 in the archive script -- we archive to a local directory and set up
 rsync tasks on cron to distribute it.  Otherwise you might not
 archive to one target if the other is down, or might have trouble
 catching up with a target when it comes back from being down.

Archiving it locally will give you a window of lost data if you crash.

The point being - different people have different requirements, which
is one thing our current solution is very good at catering to - the
downside being lots of work. I don't think dumbing down the system is
a good idea - but shipping an example script probably is.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WAL low watermark during base backup

2011-09-02 Thread Magnus Hagander
Attached patch implements a low watermark wal location in the
walsender shmem array. Setting this value in a walsender prevents
transaction log removal prior to this point - similar to how
wal_keep_segments works, except with an absolute number rather than
relative. For now, this is set when running a base backup with WAL
included - to prevent the required WAL from being recycled away while
the backup is running, without having to guesstimate the value for
wal_keep_segments. (There could be other ways added to set it in the
future, but that's the only one I've done for now)

It obviously needs some documentation updates as well, but I wanted to
get some comments on the way it's done before I work on those.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 8194,8199  CreateRestartPoint(int flags)
--- 8194,8206 
   * Calculate the last segment that we need to retain because of
   * wal_keep_segments, by subtracting wal_keep_segments from
   * the given xlog location, recptr.
+  *
+  * Also check if there is any in-progress base backup that has set
+  * a low watermark preventing us from removing it.
+  *
+  * NOTE! If the last segment calculated is later than the one
+  * passed in through logId and logSeg, do *not* update the
+  * values.
   */
  static void
  KeepLogSeg(XLogRecPtr recptr, uint32 *logId, uint32 *logSeg)
***
*** 8202,8211  KeepLogSeg(XLogRecPtr recptr, uint32 *logId, uint32 *logSeg)
--- 8209,8260 
  	uint32		seg;
  	int			d_log;
  	int			d_seg;
+ 	XLogRecPtr	lowwater = {0,0};
+ 	uint32		lowwater_log = 0;
+ 	uint32		lowwater_seg = 0;
+ 
+ 	if (max_wal_senders > 0)
+ 	{
+ 		int i;
+ 
+ 		/* Check if there is a WAL sender with a low watermark */
+ 		for (i = 0; i < max_wal_senders; i++)
+ 		{
+ 			/* use volatile pointer to prevent code rearrangement */
+ 			volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+ 			XLogRecPtr	this_lowwater;
+ 
+ 			if (walsnd->pid == 0)
+ 				continue;
+ 
+ 			SpinLockAcquire(&walsnd->mutex);
+ 			this_lowwater = walsnd->lowwater;
+ 			SpinLockRelease(&walsnd->mutex);
+ 
+ 			if (XLByteLT(lowwater, this_lowwater))
+ 				lowwater = this_lowwater;
+ 		}
+ 
+ 		XLByteToSeg(lowwater, lowwater_log, lowwater_seg);
+ 	}
  
  	if (wal_keep_segments == 0)
+ 	{
+ 		/* No wal_keep_segments, so let low watermark decide */
+ 		if (lowwater_log == 0 && lowwater_seg == 0)
+ 			return;
+ 
+ 		if (lowwater_log < *logId || (lowwater_log == *logId && lowwater_seg < *logSeg))
+ 		{
+ 			*logId = lowwater_log;
+ 			*logSeg = lowwater_seg;
+ 		}
  		return;
+ 	}
  
+ 	/*
+ 	 * Calculate the cutoff point caused by wal_keep_segments
+ 	 */
  	XLByteToSeg(recptr, log, seg);
  
  	d_seg = wal_keep_segments % XLogSegsPerFile;
***
*** 8226,8231  KeepLogSeg(XLogRecPtr recptr, uint32 *logId, uint32 *logSeg)
--- 8275,8293 
  	else
  		log = log - d_log;
  
+ 	/*
+ 	 * If the low watermark is earlier than wal_keep_segments, let
+ 	 * it decide if we keep or not.
+ 	 */
+ 	if (lowwater_log > 0 || lowwater_seg > 0)
+ 	{
+ 		if (lowwater_log < log || (lowwater_log == log && lowwater_seg < seg))
+ 		{
+ 			log = lowwater_log;
+ 			seg = lowwater_seg;
+ 		}
+ 	}
+ 
  	/* don't delete WAL segments newer than the calculated segment */
  	if (log < *logId || (log == *logId && seg < *logSeg))
  	{
*** a/src/backend/replication/basebackup.c
--- b/src/backend/replication/basebackup.c
***
*** 96,101  perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
--- 96,115 
  	startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &labelfile);
  	SendXlogRecPtrResult(startptr);
  
+ 	/*
+ 	 * If we are including WAL, set a low watermark so that ordinary
+ 	 * WAL rotation won't remove the files for us.
+ 	 */
+ 	if (opt->includewal)
+ 	{
+ 		/* use volatile pointer to prevent code rearrangement */
+ 		volatile WalSnd *walsnd = MyWalSnd;
+ 
+ 		SpinLockAcquire(&walsnd->mutex);
+ 		walsnd->lowwater = startptr;
+ 		SpinLockRelease(&walsnd->mutex);
+ 	}
+ 
  	PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0);
  	{
  		List	   *tablespaces = NIL;
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***
*** 899,904  InitWalSnd(void)
--- 899,905 
  			 */
  			walsnd->pid = MyProcPid;
  			MemSet(&walsnd->sentPtr, 0, sizeof(XLogRecPtr));
+ 			MemSet(&walsnd->lowwater, 0, sizeof(XLogRecPtr));
  			walsnd->state = WALSNDSTATE_STARTUP;
  			SpinLockRelease(&walsnd->mutex);
  			/* don't need the lock anymore */
*** a/src/include/replication/walsender.h
--- b/src/include/replication/walsender.h
***
*** 46,51  typedef struct WalSnd
--- 46,57 
  	XLogRecPtr	flush;
  	XLogRecPtr	apply;
  
+ 	/*
+ 	 * Prevent xlog rotation prior to the low watermark (used during base
+ 	 * backups that include the transaction log)
+ 	 */
+ 	XLogRecPtr	lowwater;
+ 
  	/* Protects shared variables shown 

Re: [HACKERS] pg_upgrade automatic testing

2011-09-02 Thread Bruce Momjian
Peter Eisentraut wrote:
 On tor, 2011-09-01 at 18:55 -0400, Bruce Momjian wrote:
  Tom Lane wrote:
   Peter Eisentraut pete...@gmx.net writes:
+# contrib/pg_upgrade/test.sh
+#
+# Test driver for pg_upgrade.  Initializes a new database cluster,
+# runs the regression tests (to put in some data), runs pg_dumpall,
+# runs pg_upgrade, runs pg_dumpall again, compares the dumps.
   
   Hm .. my experience is that that doesn't work at all, because the
   regression tests set up assorted C functions whose implementations are
   in pg_regress.so, and it creates them with absolute path references
   to pg_regress.so.  When you try to load that into another installation
   that's a different version of PG, it quite properly fails.  So I think
   that as given, this script is only useful for testing pg_upgrade of
   $currentversion to $currentversion.  Which is surely better than no test
  
  Reminder --- you can't use pg_upgrade to go from the same catalog
  version to the same catalog version because the catalog version is
  embedded in the tablespace directory name.
 
 Well, it does work, but only because the regression tests don't keep a
 tablespace around at the end.  Would pg_upgrade complain otherwise?

The restriction is only for old clusters that contain tablespaces, and
you get this error message during the check phase:

Cannot migrate to/from the same system catalog version when
using tablespaces.
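
For reference, the per-cluster subdirectory created inside each
tablespace directory is named PG_<majorversion>_<catversion>, so an
illustrative listing (with a made-up catalog version number) looks
like:

    $ ls /srv/pg_tablespaces/space1
    PG_9.1_201105231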

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade automatic testing

2011-09-02 Thread Bruce Momjian
Andrew Dunstan wrote:
 
 
 On 09/02/2011 10:36 AM, Peter Eisentraut wrote:
  [...]
 
 
 In any case, it would be good to get rid of the limitation if possible. 
 Then we could look at creating an automated test that we could use in 
 the buildfarm.

Well, the idea of using the catalog version was that it is something we
expect would change during any change in the system catalogs or internal
data format that would require the use of pg_upgrade.  I am unclear what
other fixed value we could use for this.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL low watermark during base backup

2011-09-02 Thread Jaime Casanova
On Fri, Sep 2, 2011 at 12:52 PM, Magnus Hagander mag...@hagander.net wrote:
 Attached patch implements a low watermark wal location in the
 walsender shmem array. Setting this value in a walsender prevents
 transaction log removal prior to this point - similar to how
 wal_keep_segments work, except with an absolute number rather than
 relative.

cool! just a question, shouldn't we clear the value after the base
backup has finished?

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL low watermark during base backup

2011-09-02 Thread Magnus Hagander
On Fri, Sep 2, 2011 at 20:12, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Fri, Sep 2, 2011 at 12:52 PM, Magnus Hagander mag...@hagander.net wrote:
 Attached patch implements a low watermark wal location in the
 walsender shmem array. Setting this value in a walsender prevents
 transaction log removal prior to this point - similar to how
 wal_keep_segments work, except with an absolute number rather than
 relative.

 cool! just a question, shouldn't we clean the value after the base
 backup has finished?

We should. Thanks, will fix!

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade automatic testing

2011-09-02 Thread Andrew Dunstan



On 09/02/2011 01:55 PM, Bruce Momjian wrote:

Andrew Dunstan wrote:


[...]

In any case, it would be good to get rid of the limitation if possible.
Then we could look at creating an automated test that we could use in
the buildfarm.

Well, the idea of using the catalog version was that it is something we
expect would change during any change in the system catalogs or internal
data format that would require the use of pg_upgrade.  I am unclear what
other fixed value we could use for this.


Why not use a prefix like 'd_' and 's_' so they can't be identical?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump --exclude-table-data

2011-09-02 Thread Andrew Dunstan



On 08/24/2011 05:01 PM, Josh Berkus wrote:

For those who are (like my clients :-) ) anxious to get their hands on
this immediately, a backport patch is also attached which applies to 9.0
sources, and applies with offsets to 8.4 sources.

FWIW, I have immediate use for this in creating cut-down versions of
databases for testing purposes.  It'll eliminate a couple pages of shell
scripts for me.



OK, this seems to have some pluses and no negative comments, so it seems 
worth going forward. Do we want an equivalent pg_restore option?


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Peter Eisentraut
On fre, 2011-09-02 at 17:02 +0200, Tomas Vondra wrote:
 On 2 Září 2011, 15:44, Peter Eisentraut wrote:
  On fre, 2011-09-02 at 11:01 +0200, Tomas Vondra wrote:
  What about logging it with a lower level, e.g. NOTICE instead of the
  current LOG? If that's not a solution then a new GUC is needed I
  guess.
 
  Changing the log level is not the appropriate solution.  Make it a
  configuration parameter.
 
 Why is it inappropriate solution? There's a log_checkpoints GUC that
 drives it and you can either get basic info (summary of the checkpoint) or
 detailed log (with a lower log level).

If a user is forced to change the log level to get at one particular
piece of information, they will then also turn on countless other log
events on that level, which is annoying.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On fre, 2011-09-02 at 17:02 +0200, Tomas Vondra wrote:
 Why is it inappropriate solution? There's a log_checkpoints GUC that
 drives it and you can either get basic info (summary of the checkpoint) or
 detailed log (with a lower log level).

 If a user is forced to change the log level to get at one particular
 piece of information, they will then also turn on countless other log
 events on that level, which is annoying.

Yeah, if we're going to have this at all, some form of GUC control over
it seems necessary.  I'm still disturbed by the verbosity of the
proposed output though.  Couldn't we collapse the information into a
single log entry per checkpoint cycle?  Perhaps that would be sufficient
to just let the log_checkpoints setting be used as-is.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] symbol mismatches on minor version upgrades

2011-09-02 Thread Peter Eisentraut
On tis, 2011-08-30 at 15:31 -0400, Tom Lane wrote:
 While not wishing to deny that this can be a problem, I think you're
 overstating this aspect:
 
  Now if this had been, say, plpython, which is also developed closely
  together with the backend, but is probably shipped in a separate binary
  package and has extra dependencies, so it might reasonably not be
  upgraded at the same time, there would be additional problems.  We
  should figure out a way to advise packagers about putting in tight
  enough version dependencies when this happens.
 
 This is not possible at least in the Red Hat world, because all the
 subpackages have exact-version-and-release dependencies tying them
 together.  That's distro policy not just my whim, and I'd expect other
 server-grade distros to have similar policies. 

Well, the Debian packages don't do this.  Obviously, they could, but no
one has ever clarified this.

Exactly which distribution policy is this?  I would rather think that
this is something that upstream needs to determine.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Tomas Vondra
On 2 Září 2011, 20:46, Peter Eisentraut wrote:
 On fre, 2011-09-02 at 17:13 +0200, Tomas Vondra wrote:
 Sure, but I think the effort not to have a zillion GUCs makes sense.

 Well, I'll be the first to agree that reducing complexity in
 configuration and tuning settings is worth aiming for.

 But for logging, I'd rather have more settings, theoretically up to one
 for each possible message.  That doesn't increase complexity, as long as
 it has linear behavior.  It's debatable whether that means a new
 log_something parameter for each situation, or just a single parameter
 containing some kind of list, or something else, but that's a different
 problem.

OK, good point. The effort to make this work without a GUC obviously is a
dead end, and the idea to turn log_checkpoints into an enum seems
reasonable.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade automatic testing

2011-09-02 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Andrew Dunstan wrote:
 In any case, it would be good to get rid of the limitation if possible. 
 Then we could look at creating an automated test that we could use in 
 the buildfarm.

 Well, the idea of using the catalog version was that it is something we
 expect would change during any change in the system catalogs or internal
 data format that would require the use of pg_upgrade.  I am unclear what
 other fixed value we could use for this.

IMO there's next to no value in testing that scenario anyway, since
nobody would ever use it in the field.  What *would* be of value is
testing upgrades from previous release versions.  Probably that will
take some work in the buildfarm infrastructure as well as figuring out a
non-problematic test case to use, but that's the direction we need to
head in.

The other reasonable use-case for pg_upgrade is migrating a development
or beta-test installation across a catversion bump, but again the
tablespace directory name is not a restriction.  Perhaps we could have
a test that involves checking out the
commit-just-before-the-last-catversion-change and seeing if we can
migrate from that.
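
For instance, a rough sketch of locating that commit in a git checkout
(catversion.h is where the catalog version number lives):

    # build the tree as it stood just before the latest catversion bump
    last_bump=$(git log -1 --format=%H -- src/include/catalog/catversion.h)
    git checkout "$last_bump"^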

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sha1, sha2 functions into core?

2011-09-02 Thread k...@rice.edu
On Fri, Sep 02, 2011 at 09:54:07PM +0300, Peter Eisentraut wrote:
 On ons, 2011-08-31 at 13:12 -0500, Ross J. Reedstrom wrote:
  Hmm, this thread seems to have petered out without a conclusion. Just
  wanted to comment that there _are_ non-password storage uses for these
  digests: I use them in a context of storing large files in a bytea
  column, as a means to doing data deduplication, and avoiding pushing
  files from clients to server and back.
 
 But I suppose you don't need the hash function in the database system
 for that.
 

It is very useful to have the same hash function used internally by
PostgreSQL exposed externally. I know you can get the code and add an
equivalent one of your own...

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Tomas Vondra
On 2 Září 2011, 20:48, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
 On fre, 2011-09-02 at 17:02 +0200, Tomas Vondra wrote:
 Why is it inappropriate solution? There's a log_checkpoints GUC that
 drives it and you can either get basic info (summary of the checkpoint)
 or
 detailed log (with a lower log level).

 If a user is forced to change the log level to get at one particular
 piece of information, they will then also turn on countless other log
 events on that level, which is annoying.

 Yeah, if we're going to have this at all, some form of GUC control over
 it seems necessary.  I'm still disturbed by the verbosity of the
 proposed output though.  Couldn't we collapse the information into a
 single log entry per checkpoint cycle?  Perhaps that would be sufficient
 to just let the log_checkpoints setting be used as-is.

Yes, the GUC seems inevitable. I'm still working on the verbosity for
different checkpoint types, but I think 5 seconds for xlog and 10% for
timed checkpoints is a reasonable setting.

I'm not sure what you mean by collapsing the info into a single log entry?
That would mean I'd have to wait till the checkpoint completes, and one of
the reasons for this patch was to get info about progress while the
checkpoint is running.

For example I'd like to have this information in cases when the checkpoint
never finishes - for example when performing automated benchmarks. I
sometimes just kill the database (I need to rebuild it from scratch for
the next run) so I don't get any checkpoint message at all. Waiting for
the checkpoint would significantly increase the time for each run and thus
for the whole benchmark. With one run that probably is not a problem; with
360 runs, each minute makes a big difference.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] symbol mismatches on minor version upgrades

2011-09-02 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On tis, 2011-08-30 at 15:31 -0400, Tom Lane wrote:
 This is not possible at least in the Red Hat world, because all the
 subpackages have exact-version-and-release dependencies tying them
 together.  That's distro policy not just my whim, and I'd expect other
 server-grade distros to have similar policies. 

 Well, the Debian packages don't do this.  Obviously, they could, but no
 one has ever clarified this.

 Exactly which distribution policy is this?

https://fedoraproject.org/wiki/Packaging:Guidelines#Requiring_Base_Package

The discussion there doesn't go into all that much detail, but there
have been enough bad experiences at Red Hat with partially-updated
packages that people have very negative views of doing otherwise.

 I would rather think that
 this is something that upstream needs to determine.

Since the upstream is shipping a single tarball, it's unlikely that
they'll think about it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump --exclude-table-data

2011-09-02 Thread Josh Berkus

 OK, this seems to have some pluses and no negative comments, so it seems
 worth going forward. Do we want an equivalent pg_restore option?

I'm not sure it's *as* important for pg_restore, since I can easily use
a manifest to avoid restoring data for a single table.  So I guess it's
a question of how hard it is to add.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] CF2011-09

2011-09-02 Thread Josh Berkus
Hackers,

Who is our CommitFest Master for the next CF?  It's only around 2 weeks
away.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes:
 On 2 Září 2011, 20:48, Tom Lane wrote:
 Yeah, if we're going to have this at all, some form of GUC control over
 it seems necessary.  I'm still disturbed by the verbosity of the
 proposed output though.  Couldn't we collapse the information into a
 single log entry per checkpoint cycle?  Perhaps that would be sufficient
 to just let the log_checkpoints setting be used as-is.

 I'm not sure what you mean by collapsing the info into a single log entry?
 That would mean I'd have to wait till the checkpoint completes, and one of
 the reasons for this patch was to get info about progress while the
 checkpoint is running.

Well, to be blunt, putting stuff into the postmaster log is entirely the
wrong way to satisfy a requirement like that.  If you want to expose
progress information, it should be exposed via something dynamic like
pg_stat_activity.  What could be useful to log is statistics that people
might want to aggregate later, and I don't immediately see a reason why
such stats couldn't be logged just once at end of each checkpoint cycle.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.conf archive_command example

2011-09-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Another thought I have here is to wonder whether we should change
 something on the server side so that we don't NEED such a complicated
 archive_command.  I mean, copying a file to a directory somewhere is
 not fundamentally a complex operation.  Nor is using ssh to copy it to
 another machine.

It is once you consider error handling and catering to N combinations of
user requirements.

I think the notion that we should get rid of archive_command in favor of
something more hard-wired is sheer lunacy.  We have a nicely decoupled
arrangement for dealing with these issues now; why would we want to pull
them into the server?

Now, providing a more useful sample script is certainly reasonable.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.conf archive_command example

2011-09-02 Thread Robert Haas
On Fri, Sep 2, 2011 at 3:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Another thought I have here is to wonder whether we should change
 something on the server side so that we don't NEED such a complicated
 archive_command.  I mean, copying a file to a directory somewhere is
 not fundamentally a complex operation.  Nor is using ssh to copy it to
 another machine.

 It is once you consider error handling and catering to N combinations of
 user requirements.

 I think the notion that we should get rid of archive_command in favor of
 something more hard-wired is sheer lunacy.  We have a nicely decoupled
 arrangement for dealing with these issues now; why would we want to pull
 them into the server?

I wasn't really proposing to get rid of it, but I do wonder if there
are some configuration parameters we could add somewhere that would
make common cases easier without making really complex things
impossible.

 Now, providing a more useful sample script is certainly reasonable.

Yep, so let's start with that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CF2011-09

2011-09-02 Thread Kevin Grittner
Josh Berkus j...@agliodbs.com wrote:
 
 Who is our CommitFest Master for the next CF?  It's only around 2
 weeks away.
 
I have volunteered for that, although I can step aside if someone
else wants to do it.  (I'm assuming nobody does, but you never
know.)  If nobody else claims it within the next couple days, I'll
post a call for reviewers.
 
In any event, people should be getting those patches into the CF web
application:
 
https://commitfest.postgresql.org/action/commitfest_view/open
 
And it's never too early to sign up as a reviewer or to actually
review patches.  I'm pleasantly surprised to see that 11 patches
have already been committed, rejected, or marked ready for
committer, and two more have been reviewed and are waiting for the
author!
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.conf archive_command example

2011-09-02 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 I think the notion that we should get rid of archive_command in
 favor of something more hard-wired is sheer lunacy.
 
It's a good thing nobody proposed that.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.conf archive_command example

2011-09-02 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I wasn't really proposing to get rid of it, but I do wonder if there
 are some configuration parameters we could add somewhere that would
 make common cases easier without making really complex things
 impossible.

I think the solution to that problem is to provide a default archive
command that just does the very simple thing, namely moving the WAL file
to some place given as parameter.  Some “local” mount point.

 Now, providing a more useful sample script is certainly reasonable.

Allow me to insist here, I don't think a sample is what we need to be
providing our users.  I think they deserve a default production grade
implementation of a very simple archive command.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Tomas Vondra
On 2 Září 2011, 21:23, Tom Lane wrote:
 Tomas Vondra t...@fuzzy.cz writes:
 On 2 Září 2011, 20:48, Tom Lane wrote:
 Yeah, if we're going to have this at all, some form of GUC control over
 it seems necessary.  I'm still disturbed by the verbosity of the
 proposed output though.  Couldn't we collapse the information into a
 single log entry per checkpoint cycle?  Perhaps that would be
 sufficient
 to just let the log_checkpoints setting be used as-is.

 I'm not sure what you mean by collapsing the info into a single log
 entry?
 That would mean I'd have to wait till the checkpoint completes, and one
 of
 the reasons for this patch was to get info about progress while the
 checkpoint is running.

 Well, to be blunt, putting stuff into the postmaster log is entirely the
 wrong way to satify a requirement like that.  If you want to expose
 progress information, it should be exposed via something dynamic like
 pg_stat_activity.  What could be useful to log is statistics that people
 might want to aggregate later, and I don't immediately see a reason why
 such stats couldn't be logged just once at end of each checkpoint cycle.

The problem with pg_stat_activity is that it provides just 'current
state', no history. If you don't sample that often enough, you may
completely miss the checkpoint (and thus you'll have no info about it,
unless you enable log_checkpoints and check the log). And it's impossible
to use if you need info about something that happened in the past. And
AFAIK it does not show processes running timed checkpoints, for example.

So IMHO it's unusable for what I'm trying to achieve.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade automatic testing

2011-09-02 Thread Andrew Dunstan



On 09/02/2011 03:04 PM, Tom Lane wrote:

[...]

IMO there's next to no value in testing that scenario anyway, since
nobody would ever use it in the field.  What *would* be of value is
testing upgrades from previous release versions.  Probably that will
take some work in the buildfarm infrastructure as well as figuring out a
non-problematic test case to use, but that's the direction we need to
head in.



I'm working on this right now.

Basically the idea is to stash away build and data dirs (after we've run 
regression, PL and contrib testing) for stable branches (via a command 
line option) and then test upgrading them. A trial run on the first part 
is currently running. Once I have that sorted out I'll work on the 
testing bit ;-)


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: regular logging of checkpoint progress

2011-09-02 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes:
 On 2 Září 2011, 21:23, Tom Lane wrote:
 Well, to be blunt, putting stuff into the postmaster log is entirely the
 wrong way to satify a requirement like that.  If you want to expose
 progress information, it should be exposed via something dynamic like
 pg_stat_activity.  What could be useful to log is statistics that people
 might want to aggregate later, and I don't immediately see a reason why
 such stats couldn't be logged just once at end of each checkpoint cycle.

 The problem with pg_stat_activity is that it provides just 'current
 state', no history. If you don't sample that often enough, you may
 completely miss the checkpoint (and thus you'll have no info about it,
 unless you enable log_checkpoints and check the log). And it's imposible
 to use if you need info about something that happened in the past. And
 AFAIK it does not show processes running timed checkpoints for example.

Your requirements seem sufficiently slippery that I don't think you've
thought them through very well.  As far as I can see, the proposed patch
will mostly result in bloating the postmaster log with repetitive
information of next to no value.  I can see the reason for wanting to
know what the system is doing right now, and I can see the reason for
wanting aggregatable statistics so that you can tell over time whether
your settings need to be adjusted.  I don't see the value in a lot of
"10% done" log entries --- there is essentially no historical value in
such, IMO, because they don't correspond to any user-level activity.
(Which is what distinguishes this from, say, log_connections or
log_statements.)  The fact that you can't figure out a reasonable
frequency for making the entries is a symptom of the design being wrong
at its core.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade automatic testing

2011-09-02 Thread Bruce Momjian
Andrew Dunstan wrote:
 
 
 On 09/02/2011 01:55 PM, Bruce Momjian wrote:
  Andrew Dunstan wrote:
 
  [...]
  Well, the idea of using the catalog version was that it is something we
  expect would change during any change in the system catalogs or internal
  data format that would require the use of pg_upgrade.  I am unclear what
  other fixed value we could use for this.
 
 Why not use a prefix like 'd_' and 's_' so they can't be identical?

What would 'd' and 's' mean?  Destination and Source?  That doesn't help
us because a destination today might be a source tomorrow and we don't
rename these tablespace directory names.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.conf archive_command example

2011-09-02 Thread Dimitri Fontaine
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 (1)  We're talking about a new /bin executable to do this which
 could be referenced in an archive_command string or run from a
 script called by archive_command, right?

That, or an internal implementation.  That would be a function in the
backend that would be called when archive_command is set to some
specific value, much as test and cd are commands that refer not to some
executable on the PATH but to built-in code in bash.

But I know some people here will frown upon that idea.

 (2)  It should copy, not move, with protection against overwriting
 an existing file.

See, we need to provide a good production grade facility.  I've never
tried to do it myself, I'm just using walmgr to manage my archives.

 (3)  Maybe not in the initial version, but eventually it might be
 nice to support URLs of some known protocols in addition to local
 directories.

I guess that if patches are provided in that direction it would be kind
of hard to refuse integrating them :)

 (4)  Maybe not in the initial version, but eventually it might be
 nice to support checking for an owner file of some sort in the
 target directory, to help sort out problems with copied databases
 writing to the same location as the source.

Then we need to provide the associated restore command, which must not
be an owner here, I guess…

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sha1, sha2 functions into core?

2011-09-02 Thread Ross J. Reedstrom
On Fri, Sep 02, 2011 at 02:05:45PM -0500, k...@rice.edu wrote:
 On Fri, Sep 02, 2011 at 09:54:07PM +0300, Peter Eisentraut wrote:
  On ons, 2011-08-31 at 13:12 -0500, Ross J. Reedstrom wrote:
   Hmm, this thread seems to have petered out without a conclusion. Just
   wanted to comment that there _are_ non-password storage uses for these
   digests: I use them in a context of storing large files in a bytea
   column, as a means to doing data deduplication, and avoiding pushing
   files from clients to server and back.
  
  But I suppose you don't need the hash function in the database system
  for that.
  
 
 It is very useful to have the same hash function used internally by
 PostgreSQL exposed externally. I know you can get the code and add an
 equivalent one of your own...
 
Thanks for the support Ken, but Peter's right: the only backend use in
my particular case is to let the backend do the hash calc during bulk
loads: in the production code path, having the hash in two places
doesn't save any work, since the client code has to calculate the hash
in order to test for its existence in the backend. I suppose if the
network cost was negligible, I could just push the files anyway, and
have a before-insert trigger calculate the hash and do the dedup: then
it'd be hidden in the backend completely. But as is, I can do all the
work in the client.
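
For illustration, the client-side check amounts to something like this
(table and column names invented; the file's hash is stored alongside
the bytea):

    # hash locally, then ask the server whether it already has the file
    HASH=$(sha1sum bigfile.bin | cut -d' ' -f1)
    psql -tA -c "SELECT 1 FROM files WHERE sha1 = '$HASH'" | grep -q 1 \
      && echo "already stored, skipping upload"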

Ross
-- 
Ross Reedstrom, Ph.D. reeds...@rice.edu
Systems Engineer  Admin, Research Scientistphone: 713-348-6166
Connexions  http://cnx.orgfax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CF2011-09

2011-09-02 Thread Josh Berkus
Kevin,

 I have volunteered for that, although I can step aside if someone
 else wants to do it.  (I'm assuming nobody does, but you never
 know.)  If nobody else claims it within the next couple days, I'll
 post a call for reviewers.

Don't look at me.

I haven't deciphered Robert's code enough to figure out how to add the
activity views we should have (i.e. all reviewers who haven't done
anything for 5 days, all submitters waiting for 5 days).

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mb_regress.sh gripes

2011-09-02 Thread Josh Kupershmidt
On Thu, Sep 1, 2011 at 9:25 AM, Robert Haas robertmh...@gmail.com wrote:
 Committed, with some changes.  I used the new --if-exists option for
 dropdb rather than doing it as you had it here; I assume this may have
 been the motivation for that patch.

Thanks, and that was indeed one reason I wanted dropdb --if-exists.

 I also just made the exit code 1
 no matter how many failures there were.  That seems more normal, and I
 wasn't altogether certain that $((expr)) is completely portable.

Yeah, I wasn't sure about that either. The old book "The Unix
Programming Environment" (in its section on arithmetic in sh) claims
that using expr or bc is the only way to do such arithmetic, e.g.:
  var=`expr $var + 1`

But then [1] documents arithmetic expansion as used in the patch. But
anyway, just returning 1 as you've done seems fine.
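
For the record, the two forms side by side:

    fail=0
    fail=`expr $fail + 1`    # old Bourne-shell portable form
    fail=$(($fail + 1))      # POSIX arithmetic expansion, as in the patch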

Josh

[1] 
http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_06_04

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade automatic testing

2011-09-02 Thread Peter Eisentraut
On fre, 2011-09-02 at 16:00 -0400, Andrew Dunstan wrote:
 Basically the idea is to stash away build and data dirs (after we've run 
 regression, PL and contrib testing) for stable branches (via a command 
 line option) and then test upgrading them. A trial run on the first part 
 is currently running. Once I have that sorted out I'll work on the 
 testing bit ;-)

It won't work, unless you have a solution for fixing the paths of the
shared library modules used by the regression tests.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade automatic testing

2011-09-02 Thread Peter Eisentraut
On fre, 2011-09-02 at 15:04 -0400, Tom Lane wrote:
 IMO there's next to no value in testing that scenario anyway, since
 nobody would ever use it in the field.  What *would* be of value is
 testing upgrades from previous release versions.  Probably that will
 take some work in the buildfarm infrastructure as well as figuring out a
 non-problematic test case to use, but that's the direction we need to
 head in. 

Well, let's take a step back.  I originally developed this test runner a
few months ago while fixing upgrade issues related to composite types
that had been altered.  So this is actually useful stuff that would help
prevent these sorts of problems in the future, and would help developers
fix problems of this sort.

But if you think about it, it doesn't really test pg_upgrade, it tests
pg_dump.  So the test could just as well be moved to src/bin/pg_dump/
and be labeled pg_dump smoke test or whatever.  (Minor detail: The bug
fix above involved the --binary-upgrade flag, so it is somewhat
pg_upgrade related.)

A real pg_upgrade test suite should naturally upgrade across binary
incompatible versions.  The real question is how you develop a useful
test input.  Most pg_upgrade issues are not bugs of omission or
regression but unexpected corner cases discovered with databases of
nontrivial usage patterns.  (The recent one related to upgrade from 8.3
is an exception.)  The basic premise of pg_upgrade is: dump and restore
the schema, move over the files, that's it; the rest of the code is
workarounds for obscure details that are difficult to anticipate, let
alone test for.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade automatic testing

2011-09-02 Thread Andrew Dunstan



On 09/02/2011 06:37 PM, Peter Eisentraut wrote:

On fre, 2011-09-02 at 16:00 -0400, Andrew Dunstan wrote:

Basically the idea is to stash away build and data dirs (after we've run
regression, PL and contrib testing) for stable branches (via a command
line option) and then test upgrading them. A trial run on the first part
is currently running. Once I have that sorted out I'll work on the
testing bit ;-)

It won't work, unless you have a solution for fixing the paths of the
shared library modules used by the regression tests.


Well, we could drop those functions and not run tests that require them. 
Or we could possibly install the libraries in $libdir and hack pg_proc 
accordingly. We'd have to install them on both the source and 
destination branches, of course.


Maybe people can think of other possible solutions too. I don't think we 
should give up on this too easily.


Maybe we need to develop a test db specifically for pg_upgrade anyway.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade automatic testing

2011-09-02 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 09/02/2011 06:37 PM, Peter Eisentraut wrote:
 It won't work, unless you have a solution for fixing the paths of the
 shared library modules used by the regression tests.

 Well, we could drop those functions and not run tests that require them. 
 Or we could possibly install the libraries in $libdir and hack pg_proc 
 accordingly. We'd have to install them on both the source and 
 destination branches, of course.

The only one that's problematic is pg_regress.so; contrib modules are
already installed in $libdir.  I still think that installing
pg_regress.so in $libdir may be the most reasonable solution, assuming
that the delta involved isn't too great.  Yeah, we would have to
back-patch the changes into every release branch we want to test
upgrading from, but how risky is that really?  The *only* thing it
affects is the regression tests.

Maybe I should produce a draft patch for moving pg_regress.so that way,
and we could see how big a delta it really is.

 Maybe we need to develop a test db specifically for pg_upgrade anyway.

Possibly, but it'll always be more impoverished than the regular
regression test DBs IMO.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade automatic testing

2011-09-02 Thread Andrew Dunstan



On 09/02/2011 07:49 PM, Tom Lane wrote:

[...]

The only one that's problematic is pg_regress.so; contrib modules are
already installed in $libdir.  I still think that installing
pg_regress.so in $libdir may be the most reasonable solution, assuming
that the delta involved isn't too great.  Yeah, we would have to
back-patch the changes into every release branch we want to test
upgrading from, but how risky is that really?  The *only* thing it
affects is the regression tests.


Agreed. It doesn't seem terribly dangerous.

There are three listed in the regression db I just looked at: 
regress.so, autoinc.so and refint.so.



Maybe I should produce a draft patch for moving pg_regress.so that way,
and we could see how big a delta it really is.


Sounds like a plan.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers