Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Simon Riggs
On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote:
 On Thu, Sep 2, 2010 at 11:32 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
  I understand what you're after, the idea of being able to set
  synchronization level on a per-transaction basis is cool. But I haven't seen
  a satisfactory design for it. I don't understand how it would work in
  practice. Even though it's cool, having different kinds of standbys
  connected is a more common scenario, and the design needs to accommodate
  that too. I'm all ears if you can sketch a design that can do that.
 
 That design would affect what the standby should reply. If we choose
 async/recv/fsync/replay on a per-transaction basis, the standby
 should send multiple LSNs and the master needs to decide when
 replication has been completed. OTOH, if we choose just sync/async,
 the standby has only to send one LSN.
 
 The former seems to be more useful, but triples the number of ACK
 from the standby. I'm not sure whether its overhead is ignorable,
 especially when the distance between the master and the standby is
 very long.

No, it doesn't. There is no requirement for additional messages. It just
adds 16 bytes onto the reply message, maybe 24. If there is a noticeable
overhead from that, shoot me. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Fujii Masao
On Thu, Sep 2, 2010 at 7:24 PM, Fujii Masao masao.fu...@gmail.com wrote:
 I propose a configuration file standbys.conf, in the master:

 # STANDBY NAME    SYNCHRONOUS   TIMEOUT
 importantreplica  yes           100ms
 tempcopy          no            10s

 Seems good. In fact, how about specifying async/recv/fsync/replay in the
 SYNCHRONOUS field, instead of yes/no?

 OTOH, something like standby_name parameter should be introduced in
 recovery.conf.

 Should we allow multiple standbys with the same name? Probably yes.
 We might need to add a NUMBER field to standbys.conf in the future.

Here is the proposed detailed design:

standbys.conf
=============
# This is not initialized by initdb, so users need to create it under $PGDATA.
* The template is located in the PREFIX/share directory.

# This is read by the postmaster at startup, just as pg_hba.conf is.
* In an EXEC_BACKEND environment, each walsender must read it at startup.
* This is ignored when max_wal_senders is zero.
* FATAL is emitted when standbys.conf doesn't exist even if max_wal_senders
  is positive.

# SIGHUP makes only the postmaster re-read standbys.conf.
* New configuration doesn't affect the existing connections to the standbys,
  i.e., it's used only for subsequent connections.
* XXX: Should the existing connections react to the new configuration? What if
  the new standbys.conf doesn't have the standby_name of an existing
  connection?

# The connection from the standby is rejected if its standby_name is not listed
  in standbys.conf.
* Multiple standbys with the same name are allowed.

# The valid values of SYNCHRONOUS field are async, recv, fsync and replay.

standby_name
============
# This is a new string-typed parameter in recovery.conf.
* XXX: Should standby_name and standby_mode be merged?

# Walreceiver sends this to the master when establishing the connection.

Comments? Is the above too complicated for the first step? If so, I'd
propose instead just introducing a new recovery.conf parameter, such as
replication_mode, specifying the synchronization level.
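For illustration, a standbys.conf using the four levels could then look like this (standby names and timeout values invented for the example):

```
# STANDBY NAME    SYNCHRONOUS   TIMEOUT
reporting         async         -
nearby_sync       recv          100ms
dr_site           fsync         1s
hot_standby       replay        10s
```

This is only a sketch of the proposed format, not part of the patch.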

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 09:36, Simon Riggs wrote:

On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote:

That design would affect what the standby should reply. If we choose
async/recv/fsync/replay on a per-transaction basis, the standby
should send multiple LSNs and the master needs to decide when
replication has been completed. OTOH, if we choose just sync/async,
the standby has only to send one LSN.

The former seems to be more useful, but triples the number of ACK
from the standby. I'm not sure whether its overhead is ignorable,
especially when the distance between the master and the standby is
very long.


No, it doesn't. There is no requirement for additional messages.


Please explain how you do it then. When a commit record is sent to the 
standby, it needs to acknowledge it 1) when it has received it, 2) when 
it fsyncs it to disk and 3) when it's replayed. I don't see how you can 
get around that.


Perhaps you can save a bit by combining multiple messages together, like 
in Nagle's algorithm, but then you introduce extra delays which is 
exactly what you don't want.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Fujii Masao
On Fri, Sep 3, 2010 at 3:36 PM, Simon Riggs si...@2ndquadrant.com wrote:
 The former seems to be more useful, but triples the number of ACK
 from the standby. I'm not sure whether its overhead is ignorable,
 especially when the distance between the master and the standby is
 very long.

 No, it doesn't. There is no requirement for additional messages. It just
 adds 16 bytes onto the reply message, maybe 24. If there is a noticeable
 overhead from that, shoot me.

The reply message would be sent at least three times per WAL chunk,
i.e., when the standby has received, synced and replayed it. So ISTM
that additional messages do happen. Though I'm not sure if this really
harms performance...

You'd like to choose async/recv/fsync/replay on a per-transaction basis
rather than async/sync?

Even when async is chosen as the synchronization level in standbys.conf,
can it be changed to another level within a transaction? If so, the standby
has to send replies even if async is chosen, and most replies might be
ignored by the master.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Simon Riggs
On Fri, 2010-09-03 at 09:55 +0300, Heikki Linnakangas wrote:
 On 03/09/10 09:36, Simon Riggs wrote:
  On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote:
  That design would affect what the standby should reply. If we choose
  async/recv/fsync/replay on a per-transaction basis, the standby
  should send multiple LSNs and the master needs to decide when
  replication has been completed. OTOH, if we choose just sync/async,
  the standby has only to send one LSN.
 
  The former seems to be more useful, but triples the number of ACK
  from the standby. I'm not sure whether its overhead is ignorable,
  especially when the distance between the master and the standby is
  very long.
 
  No, it doesn't. There is no requirement for additional messages.
 
 Please explain how you do it then. When a commit record is sent to the 
 standby, it needs to acknowledge it 1) when it has received it, 2) when 
 it fsyncs it to disk and 3) when it's replayed. I don't see how you can 
 get around that.
 
 Perhaps you can save a bit by combining multiple messages together, like 
 in Nagle's algorithm, but then you introduce extra delays which is 
 exactly what you don't want.

From my perspective, you seem to be struggling to find reasons why this
should not happen, rather than seeing the alternatives that would
obviously present themselves if your attitude was a positive one. We
won't make any progress with this style of discussion.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] thousand unrelated data files in pg_default tablespace

2010-09-03 Thread Pavel Stehule
hello

2010/8/31 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 there is a dump from 8KB files

 Well, those certainly look like tables/indexes not temp files.
 So we can rule out one theory.

 You're *certain* these aren't referenced from pg_class.relfilenode
 of any of the databases in the server?

I have information that these files are not in pg_class.relfilenode.
Moreover, these files are three months old, and in that time the server
was restarted twice.

Regards

Pavel Stehule

                        regards, tom lane




[HACKERS] regclass without error?

2010-09-03 Thread Tatsuo Ishii
Hi,

Is there any way to use regclass without having ERROR?

pgpool-II needs to find the oid from a table name, and for that purpose it
issues something like SELECT 'table_name'::regclass::oid. The problem is,
if the table does not exist, an error occurs and the transaction
aborts. Ideally, if the table does not exist, the SELECT would return 0
(InvalidOid).

Any idea?
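One possible workaround, sketched here without testing, is to skip the regclass cast and query pg_class directly; a missing table then yields zero rows instead of an error (schema handling below is simplified to 'public', a real query would have to match against search_path):

```sql
SELECT c.oid
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relname = 'table_name'
   AND n.nspname = 'public';
```

The client can treat "no rows" as InvalidOid.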
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 10:45, Simon Riggs wrote:

On Fri, 2010-09-03 at 09:55 +0300, Heikki Linnakangas wrote:

On 03/09/10 09:36, Simon Riggs wrote:

On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote:

That design would affect what the standby should reply. If we choose
async/recv/fsync/replay on a per-transaction basis, the standby
should send multiple LSNs and the master needs to decide when
replication has been completed. OTOH, if we choose just sync/async,
the standby has only to send one LSN.

The former seems to be more useful, but triples the number of ACK
from the standby. I'm not sure whether its overhead is ignorable,
especially when the distance between the master and the standby is
very long.


No, it doesn't. There is no requirement for additional messages.


Please explain how you do it then. When a commit record is sent to the
standby, it needs to acknowledge it 1) when it has received it, 2) when
it fsyncs it to disk and 3) when it's replayed. I don't see how you can
get around that.

Perhaps you can save a bit by combining multiple messages together, like
in Nagle's algorithm, but then you introduce extra delays which is
exactly what you don't want.



From my perspective, you seem to be struggling to find reasons why this
should not happen, rather than seeing the alternatives that would
obviously present themselves if your attitude was a positive one. We
won't make any progress with this style of discussion.


Huh? You made a very clear claim above that you don't need additional 
messages. I explained why I don't think that's true, and asked you to 
explain why you think it is true. Whether the claim is true or not does 
not depend on my attitude.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] plan time of MASSIVE partitioning ...

2010-09-03 Thread PostgreSQL - Hans-Jürgen Schönig
hello everybody,

we came across an issue which turned out to be more serious than previously 
expected.
imagine a system with, say, 1000 partitions (heavily indexed) or so. the time 
taken by the planner is already fairly heavy in this case.

i tried this one with 5000 unindexed tables (just one col):

test=# \timing
Timing is on.
test=# prepare x(int4) AS select * from t_data order by id desc;
PREPARE
Time: 361.552 ms

you will see similar or higher runtimes in case of 500 partitions and a handful 
of indexes.

does anybody see a potential way to do a shortcut through the planner?
a prepared query is no solution here as constraint exclusion would be dead in 
this case (making the entire thing an even bigger drama).

did anybody think of a solution to this problem?
or more precisely: can there be a solution to this problem?

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de




Re: [HACKERS] Synchronization levels in SR

2010-09-03 Thread Boszormenyi Zoltan
Hi,

Dimitri Fontaine wrote:
 Simon Riggs si...@2ndquadrant.com writes:
   
 Seems strange. If you have 2 standbys and you say you would like node1
 to be the preferred candidate, then you load it so heavily that a remote
 server with by-definition much larger network delay responds first, then
 I say your preference was wrong.
 

 There's a communication mismatch here I think. The problem with the
 dynamic aspect of the system is that the admin wants to plan ahead and
 choose in advance the failover server.

 Other than that I much prefer the automatic and dynamic quorum idea.

   
 If you, Jan and Yeb wish to completely exclude standbys from being part
 of any quorum, then I guess we need to have per-standby settings to
 allow that to be defined. I'm in favour of giving people options. That
 needn't be a mandatory per-standby setting, just a non-default option,
 so that we can reduce the complexity of configuration for common
 cases.
 

 +1

   
 Maximum Performance = quorum = 0
 Maximum Availability = quorum = 1, timeout_action = commit
 Maximum Protection = quorum = 1, timeout_action = shutdown
 

 +1

 Being able to say that a given server has not been granted to
 participate into the vote allowing to reach the global durability quorum
 will allow for choosing the failover candidates.

 Now you're able to have this reporting server and know for sure that
 your sync replicated transactions are not waiting for it.

 To summarize, the current per-transaction approach would be:

  - transaction level replication synchronous behaviour
   

Sorry for answering such an old mail, but what is the purpose of
a transaction level synchronous behaviour if async transactions
can be held back by a sync transaction?

In my patch, when transactions were waiting for an ack from
the standby, they had already released all their locks; the wait
happened at the latest possible point in CommitTransaction().

In Fujii's patch (I am looking at synch_rep_0722.patch, is there
a newer one?) the wait happens in RecordTransactionCommit()
so other transactions still see the sync transaction and, most
importantly, the locks held by the sync transaction will make
async transactions waiting for the same lock wait too.

  - proxy/cascading in core
  - quorum setup for deciding any commit is safe
  - any server can be excluded from the sync quorum
  - timeout can still raise an exception or ignore (commit)?

 This last point seems to need some more discussion, or I didn't
 understand well the current positions and proposals.

 Regards,
   

Best regards,
Zoltán Böszörményi




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Simon Riggs
On Fri, 2010-09-03 at 12:33 +0300, Heikki Linnakangas wrote:
 On 03/09/10 10:45, Simon Riggs wrote:
  On Fri, 2010-09-03 at 09:55 +0300, Heikki Linnakangas wrote:
  On 03/09/10 09:36, Simon Riggs wrote:
  On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote:
  That design would affect what the standby should reply. If we choose
  async/recv/fsync/replay on a per-transaction basis, the standby
  should send multiple LSNs and the master needs to decide when
  replication has been completed. OTOH, if we choose just sync/async,
  the standby has only to send one LSN.
 
  The former seems to be more useful, but triples the number of ACK
  from the standby. I'm not sure whether its overhead is ignorable,
  especially when the distance between the master and the standby is
  very long.
 
  No, it doesn't. There is no requirement for additional messages.
 
  Please explain how you do it then. When a commit record is sent to the
  standby, it needs to acknowledge it 1) when it has received it, 2) when
  it fsyncs it to disk and 3) when it's replayed. I don't see how you can
  get around that.
 
  Perhaps you can save a bit by combining multiple messages together, like
  in Nagle's algorithm, but then you introduce extra delays which is
  exactly what you don't want.
 
  From my perspective, you seem to be struggling to find reasons why this
  should not happen, rather than seeing the alternatives that would
  obviously present themselves if your attitude was a positive one. We
  won't make any progress with this style of discussion.
 
 Huh? You made a very clear claim above that you don't need additional 
 messages. I explained why I don't think that's true, and asked you to 
 explain why you think it is true. Whether the claim is true or not does 
 not depend on my attitude.

Why exactly would we need to send 3 messages when we could send 1? 
Replace your statements of "it needs to" with "why would it" instead.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 13:20, Simon Riggs wrote:

On Fri, 2010-09-03 at 12:33 +0300, Heikki Linnakangas wrote:

On 03/09/10 10:45, Simon Riggs wrote:

On Fri, 2010-09-03 at 09:55 +0300, Heikki Linnakangas wrote:

On 03/09/10 09:36, Simon Riggs wrote:

On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote:

That design would affect what the standby should reply. If we choose
async/recv/fsync/replay on a per-transaction basis, the standby
should send multiple LSNs and the master needs to decide when
replication has been completed. OTOH, if we choose just sync/async,
the standby has only to send one LSN.

The former seems to be more useful, but triples the number of ACK
from the standby. I'm not sure whether its overhead is ignorable,
especially when the distance between the master and the standby is
very long.


No, it doesn't. There is no requirement for additional messages.


Please explain how you do it then. When a commit record is sent to the
standby, it needs to acknowledge it 1) when it has received it, 2) when
it fsyncs it to disk and 3) when it's replayed. I don't see how you can
get around that.

Perhaps you can save a bit by combining multiple messages together, like
in Nagle's algorithm, but then you introduce extra delays which is
exactly what you don't want.



From my perspective, you seem to be struggling to find reasons why this
should not happen, rather than seeing the alternatives that would
obviously present themselves if your attitude was a positive one. We
won't make any progress with this style of discussion.


Huh? You made a very clear claim above that you don't need additional
messages. I explained why I don't think that's true, and asked you to
explain why you think it is true. Whether the claim is true or not does
not depend on my attitude.


Why exactly would we need to send 3 messages when we could send 1?
Replace your statements of "it needs to" with "why would it" instead.


(scratches head..) What's the point of differentiating 
received/fsynced/replayed, if the master receives the ack for all of 
them at the same time?


Let's try this with an example: In the master, I do stuff and commit a 
transaction. I want to know when the transaction is fsynced in the 
standby. The WAL is sent to the standby, up to the commit record.


Upthread you said that:

 The standby does *not* need
 to know the wishes of transactions on the master.

So, when does standby send the single message back to the master?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Heikki Linnakangas

On 02/09/10 23:13, Tom Lane wrote:

The WaitLatch ... timeout API could use a bit of refinement.  I'd suggest
defining a negative timeout as meaning "wait forever", so that timeout = 0
can be used for "check but don't wait".  Also, it seems like the
function shouldn't just return void but should return a bool to show
whether it saw the latch set or timed out.


In case of WaitLatchOrSocket, the caller might want to know if a latch 
was set, the socket became readable, or it timed out. So we need three 
different return values.


 (Yeah, I realize the caller
 could look into the latch to find that out, but callers really ought to
 treat latches as opaque structs.)

Hmm, maybe we need a TestLatch function to check if a latch is set.


I don't think you have the select-failed logic right in
WaitLatchOrSocket; on EINTR it will suppose that FD_ISSET is a valid
test to make, which I think ain't the case.  Just continue around
the loop.


Yep.

I also realized that the timeout handling is a bit surprising with 
interrupts. After EINTR we call select() again with the same timeout, so 
a signal effectively restarts the timer. We seem to have similar 
behavior in a couple of other places, in pgstat.c and auth.c. So maybe 
that's OK and just needs to be documented, but I thought I'd bring it up.



It seems like both implementations are #include'ing more than they
ought to --- why replication/walsender.h, in particular?


Windows implementation needs it for the max_wal_senders variable, to 
allocate enough shared Event objects in LatchShmemInit. In unix_latch.c 
it's not needed.



Also, using sig_atomic_t for owner_pid is entirely not sane.
On many platforms sig_atomic_t is only a byte, and besides
which you have no need for that field to be settable by a
signal handler.


Hmm, true, it doesn't need to be set from signal handler, but is there 
an atomicity problem if one process calls ReleaseLatch while another 
process is in SetLatch? ReleaseLatch sets owner_pid to 0, while SetLatch 
reads it and calls kill() on it. Can we assume that pid_t is atomic, or 
do we need a spinlock to protect it? (Windows implementation has a 
similar issue with HANDLE instead of pid_t)


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] thousand unrelated data files in pg_default tablespace

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 11:16, Pavel Stehule wrote:

2010/8/31 Tom Lane t...@sss.pgh.pa.us:

Pavel Stehule pavel.steh...@gmail.com writes:

there is a dump from 8KB files


Well, those certainly look like tables/indexes not temp files.
So we can rule out one theory.

You're *certain* these aren't referenced from pg_class.relfilenode
of any of the databases in the server?


I have information that these files are not in pg_class.relfilenode.
Moreover, these files are three months old, and in that time the server
was restarted twice.


Maybe they're tables that were created in a transaction, but the process 
crashed hard before committing? Like:


BEGIN;
CREATE TABLE foo (...);
COPY foo FROM ...;
kill -9 postgres

That will leave behind a file like that. Do you do something like that 
in the application?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] Streaming a base backup from master

2010-09-03 Thread Heikki Linnakangas
It's been discussed before that it would be cool if you could stream a 
new base backup from the master server, via libpq. That way you would 
not need low-level filesystem access to initialize a new standby.


Magnus mentioned today that he started hacking on that, and 
coincidentally I just started experimenting with it yesterday as well 
:-). So let's get this out on the mailing list.


Here's a WIP patch. It adds a new TAKE_BACKUP command to the 
replication command set. Upon receiving that command, the master starts 
a COPY, and streams a tarred copy of the data directory to the client. 
The patch includes a simple command-line tool, pg_streambackup, to 
connect to a server and request a backup that you can then redirect to a 
.tar file or pipe to "tar x".


TODO:

* We need a smarter way to do pg_start/stop_backup() with this. At the 
moment, you can only have one backup running at a time, but we shouldn't 
have that limitation with this built-in mechanism.


* The streamed backup archive should contain all the necessary WAL files 
too, so that you don't need to set up archiving to use this. You could 
just point the tiny client tool to the server, and get a backup archive 
containing everything that's necessary to restore correctly.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 113dc3f..5bb4159 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/replication
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = walsender.o walreceiverfuncs.o walreceiver.o
+OBJS = walsender.o walreceiverfuncs.o walreceiver.o basebackup.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
new file mode 100644
index 000..9be9bdf
--- /dev/null
+++ b/src/backend/replication/basebackup.c
@@ -0,0 +1,276 @@
+/*-
+ *
+ * basebackup.c
 *	  code for taking a base backup and streaming it to a standby
+ *
+ * Portions Copyright (c) 2010-2010, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  $PostgreSQL$
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <dirent.h>
+#include <time.h>
+
+#include "access/xlog_internal.h" /* for pg_start/stop_backup */
+#include "utils/builtins.h"
+#include "lib/stringinfo.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "replication/basebackup.h"
+#include "storage/fd.h"
+
+static void sendDir(char *path);
+static void sendFile(char *path);
+static void _tarWriteHeader(char *filename, uint64 fileLen);
+
+void
+SendBaseBackup(void)
+{
+	StringInfoData buf;
+
+	DirectFunctionCall2(pg_start_backup, CStringGetTextDatum("basebackup"),
+		BoolGetDatum(true));
+
+	/* Send CopyOutResponse message */
+	pq_beginmessage(&buf, 'H');
+	pq_sendbyte(&buf, 0);		/* overall format */
+	pq_sendint(&buf, 0, 2);			/* natts */
+	pq_endmessage(&buf);
+
+	/* tar up the data directory */
+	sendDir(".");
+
+	/* Send CopyDone message */
+	pq_putemptymessage('c');
+
+	/* XXX: Is there no DirectFunctionCall0? */
+	DirectFunctionCall1(pg_stop_backup, (Datum) 0);
+}
+
+static void
+sendDir(char *path)
+{
+	DIR		   *dir;
+	struct dirent *de;
+	char		pathbuf[MAXPGPATH];
+	struct stat statbuf;
+
+	dir = AllocateDir(path);
+	while ((de = ReadDir(dir, path)) != NULL)
+	{
+		/* Skip special stuff */
+		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
+			continue;
+
+		snprintf(pathbuf, MAXPGPATH, "%s/%s", path, de->d_name);
+
+		/* Skip pg_xlog and postmaster.pid */
+		if (strcmp(pathbuf, "./pg_xlog") == 0)
+			continue;
+		if (strcmp(pathbuf, "./postmaster.pid") == 0)
+			continue;
+
+		if (lstat(pathbuf, &statbuf) != 0)
+		{
+			if (errno != ENOENT)
+			{
+				elog(WARNING, "could not stat file or directory \"%s\": %m",
+					 pathbuf);
+			}
+			continue;
+		}
+
+		if (S_ISDIR(statbuf.st_mode))
+		{
+			/* call ourselves recursively for a directory */
+			sendDir(pathbuf);
+		}
+		else if (S_ISREG(statbuf.st_mode))
+		{
+			sendFile(pathbuf);
+		}
+		else
+			elog(WARNING, "skipping special file \"%s\"", pathbuf);
+	}
+	FreeDir(dir);
+}
+
+
+/* Functions for handling tar file format, copied from pg_dump */
+
+/*
+ * Utility routine to print possibly larger than 32 bit integers in a
+ * portable fashion.  Filled with zeros.
+ */
+static void
+print_val(char *s, uint64 val, unsigned int base, size_t len)
+{
+	int			i;
+
+	for (i = len; i > 0; i--)
+	{
+		int			digit = val % base;
+
+		s[i - 1] = '0' + digit;
+		val = val / base;
+	}
+}
+
+/*
+ * Maximum file size for a tar member: The limit inherent in the
+ * format is 2^33-1 bytes (nearly 8 GB).  But we don't want to exceed
+ * what we can represent in pgoff_t.
+ */
+#define 

Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Thom Brown
On 3 September 2010 12:19, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 TODO:

 * We need a smarter way to do pg_start/stop_backup() with this. At the
 moment, you can only have one backup running at a time, but we shouldn't
 have that limitation with this built-in mechanism.

Would it be possible to not require pg_start/stop_backup() for this
new feature? (yes, I'm probably missing something obvious here)

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Dave Page
On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication
 command set. Upon receiving that command, the master starts a COPY, and
 streams a tarred copy of the data directory to the client. The patch
 includes a simple command-line tool, pg_streambackup, to connect to a server
 and request a backup that you can then redirect to a .tar file or pipe to
 tar x.

Cool. Can you add a TODO to build in code to un-tar the archive? tar
is not usually found on Windows systems, and as we already have tar
extraction code in pg_restore it could presumably be added relatively
painlessly.


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Magnus Hagander
On Fri, Sep 3, 2010 at 13:19, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 It's been discussed before that it would be cool if you could stream a new
 base backup from the master server, via libpq. That way you would not need
 low-level filesystem access to initialize a new standby.

 Magnus mentioned today that he started hacking on that, and coincidentally I
 just started experimenting with it yesterday as well :-). So let's get this
 out on the mailing list.

 Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication
 command set. Upon receiving that command, the master starts a COPY, and
 streams a tarred copy of the data directory to the client. The patch
 includes a simple command-line tool, pg_streambackup, to connect to a server
 and request a backup that you can then redirect to a .tar file or pipe to
 tar x.

 TODO:

 * We need a smarter way to do pg_start/stop_backup() with this. At the
 moment, you can only have one backup running at a time, but we shouldn't
 have that limitation with this built-in mechanism.

 * The streamed backup archive should contain all the necessary WAL files
 too, so that you don't need to set up archiving to use this. You could just
 point the tiny client tool to the server, and get a backup archive
 containing everything that's necessary to restore correctly.

For this last point, this should of course be *optional*, but it would
be very good to have that option (and probably on by default).


A couple of quick comments on points I saw that directly differ from the
code I have :-) We chatted about some of it already, but it should be
included for others...

* It should be possible to pass the backup label through, not just
hardcode it to basebackup

* Needs support for tablespaces. We should either follow the symlinks
and pick up the files, or throw an error if any exist. Silently
delivering an incomplete backup is not a good thing :-)

* Is there a point in adapting the chunk size to the size of the libpq buffers?
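
(On the tablespace point: the server already knows from the catalog which
symlinks under pg_tblspc/ it would have to follow. A sketch of the lookup,
using the 9.0 catalog layout where the raw path is still exposed in
spclocation:)

```sql
-- User-created tablespaces whose directories must be included in the
-- archive; spclocation is empty for the built-in pg_default/pg_global.
SELECT oid, spcname, spclocation
  FROM pg_tablespace
 WHERE spclocation <> '';
```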

FWIW, my implementation was as a user-defined function, which has the
advantage it can run on 9.0. But most likely this code can be ripped
out and provided as a separate backport project for 9.0 if necessary -
no need to have separate codebases.

Other than that, our code is remarkably similar.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Synchronization levels in SR

2010-09-03 Thread Fujii Masao
On Fri, Sep 3, 2010 at 6:43 PM, Boszormenyi Zoltan z...@cybertec.at wrote:
 In my patch, when the transactions were waiting for ack from
 the standby, they have already released all their locks, the wait
 happened at the latest possible point in CommitTransaction().

 In Fujii's patch (I am looking at synch_rep_0722.patch, is there
 a newer one?)

No ;)

We'll have to create the patch based on the result of the recent
discussion held on other thread.

 the wait happens in RecordTransactionCommit()
 so other transactions still see the sync transaction and most
 importantly, the locks held by the sync transaction will make
 the async  transactions waiting for the same lock wait too.

The transaction should be invisible to other transactions until
its replication has been completed. So I put the wait before
CommitTransaction() calls ProcArrayEndTransaction(). Is this unsafe?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Magnus Hagander
On Fri, Sep 3, 2010 at 13:25, Thom Brown t...@linux.com wrote:
 On 3 September 2010 12:19, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 TODO:

 * We need a smarter way to do pg_start/stop_backup() with this. At the
 moment, you can only have one backup running at a time, but we shouldn't
 have that limitation with this built-in mechanism.

 Would it be possible to not require pg_start/stop_backup() for this
 new feature? (yes, I'm probably missing something obvious here)

You don't need to run it *manually*, but the process needs to run it
automatically in the background for you. Which it does already in the
suggested patch.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Thom Brown
On 3 September 2010 12:30, Magnus Hagander mag...@hagander.net wrote:
 On Fri, Sep 3, 2010 at 13:25, Thom Brown t...@linux.com wrote:
 On 3 September 2010 12:19, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 TODO:

 * We need a smarter way to do pg_start/stop_backup() with this. At the
 moment, you can only have one backup running at a time, but we shouldn't
 have that limitation with this built-in mechanism.

 Would it be possible to not require pg_start/stop_backup() for this
 new feature? (yes, I'm probably missing something obvious here)

 You don't need to run it *manually*, but the process needs to run it
 automatically in the background for you. Which it does already in the
 suggested patch.

Ah, clearly I didn't read the patch in any detail.  Thanks :)

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 14:25, Thom Brown wrote:

On 3 September 2010 12:19, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

TODO:

* We need a smarter way to do pg_start/stop_backup() with this. At the
moment, you can only have one backup running at a time, but we shouldn't
have that limitation with this built-in mechanism.


Would it be possible to not require pg_start/stop_backup() for this
new feature? (yes, I'm probably missing something obvious here)


Well, pg_start_backup() does several things:

1. It sets the forceFullPageWrites flag, so that we don't get partial
pages in the restored database.
2. It performs a checkpoint.
3. It creates a backup label file.

We certainly need 1 and 2. We don't necessarily need to write the backup
label file to the data directory when we're streaming the backup
directly to the client; we can just include it in the streamed archive.


pg_stop_backup() also does several things:
1. It clears the forceFullPageWrites flag.
2. It writes an end-of-backup WAL record
3. It switches to new WAL segment, to get the final WAL segment archived.
4. It writes a backup history file
5. It removes the backup label file.
6. It waits for all the required WAL files to be archived.

We need 1, but the rest we could do in a smarter way. When we have more 
control of the backup process, I don't think we need the end-of-backup 
WAL record or the backup label anymore. We can add the pg_control file 
as the last file in the archive, and set minRecoveryPoint in it to the 
last WAL record needed to recover.


So no, we don't really need pg_start/stop_backup() per se, but we'll 
need something similar...
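
(For reference, the manual procedure that the built-in mechanism would
automate looks roughly like this; the second argument to pg_start_backup()
requests an immediate checkpoint:)

```sql
-- Manual base backup today: bracket an external file-level copy.
SELECT pg_start_backup('mybackup', true);  -- set forceFullPageWrites,
                                           -- checkpoint, write backup label
-- ... copy the data directory with an external tool ...
SELECT pg_stop_backup();                   -- clear the flag, write the
                                           -- end-of-backup WAL record,
                                           -- switch segment, wait for archiving
```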


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 14:28, Dave Page wrote:

On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication
command set. Upon receiving that command, the master starts a COPY, and
streams a tarred copy of the data directory to the client. The patch
includes a simple command-line tool, pg_streambackup, to connect to a server
and request a backup that you can then redirect to a .tar file or pipe to
tar x.


Cool. Can you add a TODO to build in code to un-tar the archive? tar
is not usually found on Windows systems, and as we already have tar
extraction code in pg_restore it could presumably be added relatively
painlessly.


Ok. Another obvious thing that people will want is to gzip the tar file 
while sending it, to reduce network traffic.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Magnus Hagander
On Fri, Sep 3, 2010 at 13:48, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 03/09/10 14:28, Dave Page wrote:

 On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com  wrote:

 Here's a WIP patch. It adds a new TAKE_BACKUP command to the
 replication
 command set. Upon receiving that command, the master starts a COPY, and
 streams a tarred copy of the data directory to the client. The patch
 includes a simple command-line tool, pg_streambackup, to connect to a
 server
 and request a backup that you can then redirect to a .tar file or pipe to
 tar x.

 Cool. Can you add a TODO to build in code to un-tar the archive? tar
 is not usually found on Windows systems, and as we already have tar
 extraction code in pg_restore it could presumably be added relatively
 painlessly.

 Ok. Another obvious thing that people will want is to gzip the tar file
 while sending it, to reduce network traffic.

Not necessarily obvious, needs to be configurable. There are a lot of
cases where you might not want it.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Cost estimates for parameterized paths

2010-09-03 Thread Robert Haas
On Thu, Sep 2, 2010 at 5:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Awhile back I ranted about replacing the planner's concept of inner
 indexscans with a more generalized notion of parameterized paths:
 http://archives.postgresql.org/pgsql-hackers/2009-10/msg00994.php

 The executor fixes for that are done, and now I'm grappling with getting
 the planner to do something useful with it.  The biggest problem I've run
 into is that a parameterized path can't really be assigned a fixed cost
 in the same way that a normal path can.  The current implementation of
 cost_index() depends on knowing the size of the outer relation --- that
 is, the expected number of execution loops for the indexscan --- in order
 to account for cache effects sanely while estimating the average cost of
 any one inner indexscan.  We know that that is an important thing to do
 because the cost estimates seem to be a lot closer to reality now that we
 do that than what we were getting before; so dropping the consideration is
 entirely out of the question.

 The planner is already cheating on this to a considerable extent, because
 it estimates the cost of an inner indexscan only once, using the first
 outer rel we try to join to.  That cost is cached and reused with other
 potential outer-rel join partners, even though very different numbers of
 outer rows might be involved.  This problem will get a lot worse with the
 types of plans that I hope the planner will be able to come up with after
 this fix goes in, because the indexscan might be at the bottom of a join
 nest.  So we need a real fix not another hack.

 The best idea I can come up with at the moment is to compute best case
 and worst case costs for a parameterized path, corresponding to the
 largest and smallest numbers of loops we might expect it to be repeated
 for.  The largest number of loops could be estimated via the cartesian
 product of the sizes of the other tables in the query, for example.  The
 worst case cost is its cost if only repeated once.  Then, during path
 pruning in add_path, we only discard a parameterized path if its best-case
 cost is worse than the worst-case cost of some otherwise comparable path.
 Whenever we join the parameterized path with the required outer relation,
 we redo the cost calculation using that rel's actual rowcount estimate in
 order to form a final cost estimate for the no-longer-parameterized join
 path.

Interestingly, I previously proposed almost exactly this approach to
handle a couple of other problems:

http://archives.postgresql.org/pgsql-hackers/2009-09/msg01379.php
(index only scans)
http://archives.postgresql.org/pgsql-hackers/2009-10/msg00125.php
(per-query work mem)

I'm not entirely sure whether we can use this approach for more than
one kind of problem at a time; if we can't, it's probably not a good
idea to do it at all.  I also fear that any venture into this area
will  involve slowing down add_path(), which is a hotspot when
planning large join nests.  That might not be a win, if this is the
only case it allows us to improve.

*thinks*

It strikes me that the outer tuple count is essentially being used to
derate the cost of the index probes.  So another way to look at this
is that the best-case cost of an index probe is just the CPU cost, and
the worst-case cost of an index probe is the CPU cost plus the disk
cost.  So your fear that not many parameterized paths will be
discarded is probably valid.  But then, maybe that's OK.  It seems to
me that the critical point is to make sure that we don't form them in
the first place in cases where we could get the same benefit by
commuting the joins.  Once we've gotten to the point of considering a
plan of this type, the chances that it's actually the best plan seem
pretty high.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-09-03 Thread Stephen Frost
* PostgreSQL - Hans-Jürgen Schönig (postg...@cybertec.at) wrote:
 did anybody think of a solution to this problem.
 or more precisely: can there be a solution to this problem?

Please post to the correct list (-performance) and provide information
like PG version, postgresql.conf, the actual table definition, the
resulting query plan, etc, etc...

Thanks,

Stephen




Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-09-03 Thread PostgreSQL - Hans-Jürgen Schönig

On Sep 3, 2010, at 2:04 PM, Stephen Frost wrote:

 * PostgreSQL - Hans-Jürgen Schönig (postg...@cybertec.at) wrote:
 did anybody think of a solution to this problem.
 or more precisely: can there be a solution to this problem?
 
 Please post to the correct list (-performance) and provide information
 like PG version, postgresql.conf, the actual table definition, the
 resulting query plan, etc, etc...
 
   Thanks,
 
   Stephen



hello stephen,

this seems to me more like a developer question than a -performance one.
it is not related to the table structure at all - it is basically an issue with 
incredibly large inheritance lists.
it applies to postgres 9 and most likely to everything before.
postgresql.conf is not relevant at all at this point.

the plan is pretty fine.
the question is rather: does anybody see a chance to handle such lists more 
efficiently inside postgres?
also, it is not the point whether my data structure is sane or not. it is really 
more generic - namely a shortcut for this case inside the planning process.

many thanks,

hans

--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de




Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Greg Stark
On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 * We need a smarter way to do pg_start/stop_backup() with this. At the
 moment, you can only have one backup running at a time, but we shouldn't
 have that limitation with this built-in mechanism.

Well there's no particular reason we couldn't support having multiple
pg_start_backup() calls pending either. It's just not something people
have usually needed so far.


-- 
greg



Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-09-03 Thread Stephen Frost
* PostgreSQL - Hans-Jürgen Schönig (postg...@cybertec.at) wrote:
 this seems like more a developer question to me than a pre performance one.
 it is not related to the table structure at all - it is basically an issue 
 with incredibly large inheritance lists.
 it applies to postgres 9 and most likely to everything before.
 postgresql.conf is not relevant at all at this point.

Really?  What's constraint_exclusion set to?  Is GEQO getting used?
What are the GEQO parameters set to?  Do you have any CHECK constraints
on the tables?

If you want someone else to build a test case and start looking into it,
it's best to not make them have to guess at what you've done.

 the plan is pretty fine.
 the question is rather: does anybody see a chance to handle such lists more 
 efficiently inside postgres?
 also, it is not the point if my data structure is sane or not. it is really 
 more generic - namely a shortcut for this case inside the planing process.

Coming up with cases where PG doesn't perform well in a completely
contrived, unrealistic environment isn't likely to motivate anyone to
do anything.

A small (but complete..) test case which mimics a real world environment
and real world problem would go a lot farther.  I can certainly believe
that people out there have partitions set up with lots of tables and
that it will likely grow- but they probably also have CHECK constraints,
have tweaked what constraint_exclusion is set to, have adjusted their
work_mem and related settings, maybe tweaked some planner GUCs, etc,
etc.

This is an area I'm actually interested in and curious about, I'd rather
work together on it than be told that the questions I'm asking aren't
relevant.

Thanks,

Stephen




Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 15:16, Greg Stark wrote:

On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

* We need a smarter way to do pg_start/stop_backup() with this. At the
moment, you can only have one backup running at a time, but we shouldn't
have that limitation with this built-in mechanism.


 Well there's no particular reason we couldn't support having multiple
 pg_start_backup() calls pending either. It's just not something people
 have usually needed so far.


The backup label file makes that hard. There can be only one at a time.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] thousand unrelated data files in pg_default tablespace

2010-09-03 Thread Pavel Stehule
2010/9/3 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 On 03/09/10 11:16, Pavel Stehule wrote:

 2010/8/31 Tom Lanet...@sss.pgh.pa.us:

 Pavel Stehulepavel.steh...@gmail.com  writes:

 there is a dump from 8KB files

 Well, those certainly look like tables/indexes not temp files.
 So we can rule out one theory.

 You're *certain* these aren't referenced from pg_class.relfilenode
 of any of the databases in the server?

 I have a info, so these files are not in pg_class.relfilenode. More -
 these files are three months old, and in this time was server two
 times restarted.

 Maybe they're tables that were created in a transaction, but the process
 crashed hard before committing? Like:

 BEGIN;
 CREATE TABLE foo (...);
 COPY foo FROM ...;
 kill -9 postgres

yes, it's possible - but there is no record of a server crash -
sometimes clients crash.

Regards

Pavel

 That will leave behind a file like that. Do you do something like that in
 the application?

 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] Synchronization levels in SR

2010-09-03 Thread Boszormenyi Zoltan
Fujii Masao wrote:
 On Fri, Sep 3, 2010 at 6:43 PM, Boszormenyi Zoltan z...@cybertec.at wrote:
   
 In my patch, when the transactions were waiting for ack from
 the standby, they have already released all their locks, the wait
 happened at the latest possible point in CommitTransaction().

 In Fujii's patch (I am looking at synch_rep_0722.patch, is there
 a newer one?)
 

 No ;)

 We'll have to create the patch based on the result of the recent
 discussion held on other thread.

   
 the wait happens in RecordTransactionCommit()
 so other transactions still see the sync transaction and most
 importantly, the locks held by the sync transaction will make
 the async  transactions waiting for the same lock wait too.
 

 The transaction should be invisible to other transactions until
 its replication has been completed.

Invisible? How can it be invisible? You are in RecordTransactionCommit(),
even before calling ProcArrayEndTransaction(MyProc, latestXid) and
releasing the locks the transaction holds.

  So I put the wait before
 CommitTransaction() calls ProcArrayEndTransaction(). Is this unsafe?
   

I don't know whether it's unsafe. In my patch, I only registered the Xid
at the point where you do WaitXLogSend(); this was the safe point
to set up the waiting for the sync ack. Otherwise, when the Xid registration
for the sync ack was done in CommitTransaction() later than
RecordTransactionCommit(), there was a race between the primary and
the standby. The scenario was that the standby received and processed
the COMMIT of certain Xids even before the backend on the primary
properly registered its Xid, so the backend set up the waiting for
the sync ack only after this Xid had already been acked by the standby.
The result was stuck backends.

My idea to split up the registration for wait and the waiting itself
would allow for transaction-level synchronous setting, i.e. in my
patch the transaction released the locks and did all the post-commit
cleanups *then* it waited for sync ack if needed. In the meantime,
because locks were already released, other transactions could
progress with their job, allowing e.g. async transactions to
progress and theoretically finish faster than the sync transaction
that was waiting for the ack.

The solution in my patch was not racy, registration of the Xid
was done before XLogInsert() in RecordTransactionCommit().
If the standby acked the Xid to the primary before reaching the
end of CommitTransaction(), then this backend didn't even need
to wait, because the Xid was found in its PGPROC structure
and the waiting for sync ack was torn down.

But with the LSNs, you are waiting for XactLastRecEnd,
which is set by XLogInsert(). I don't know whether it's safe to
call WaitXLogSend() after XLogFlush() in RecordTransactionCommit().
I remember that in previous instances of my patch even if I
put the waiting for sync ack directly after
latestXid = RecordTransactionCommit();
in CommitTransaction(), there were cases when I got stuck
backends after a pgbench run. I had the primary and standbys
on the same machine on different ports, so the ack was almost
instant, which wouldn't be the case with a real network. But the
race condition was still there; it just doesn't show up with networks
being slower than memory.  In your patch, the waiting happens
almost at the end of RecordTransactionCommit(), so theoretically
it has the same race condition. Am I missing something?

Best regards,
Zoltán Böszörményi

 Regards,

   




Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 7:28 AM, Dave Page dp...@pgadmin.org wrote:
 On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication
 command set. Upon receiving that command, the master starts a COPY, and
 streams a tarred copy of the data directory to the client. The patch
 includes a simple command-line tool, pg_streambackup, to connect to a server
 and request a backup that you can then redirect to a .tar file or pipe to
 tar x.

 Cool. Can you add a TODO to build in code to un-tar the archive? tar
 is not usually found on Windows systems, and as we already have tar
 extraction code in pg_restore it could presumably be added relatively
 painlessly.

It seems like the elephant in the room here is updating an existing
backup without recopying the entire data directory.  Perhaps that's
phase two, but worth keeping in mind...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Magnus Hagander
On Fri, Sep 3, 2010 at 15:24, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Sep 3, 2010 at 7:28 AM, Dave Page dp...@pgadmin.org wrote:
 On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication
 command set. Upon receiving that command, the master starts a COPY, and
 streams a tarred copy of the data directory to the client. The patch
 includes a simple command-line tool, pg_streambackup, to connect to a server
 and request a backup that you can then redirect to a .tar file or pipe to
 tar x.

 Cool. Can you add a TODO to build in code to un-tar the archive? tar
 is not usually found on Windows systems, and as we already have tar
 extraction code in pg_restore it could presumably be added relatively
 painlessly.

 It seems like the elephant in the room here is updating an existing
 backup without recopying the entire data directory.  Perhaps that's
 phase two, but worth keeping in mind...

I'd say that's a very different use-case, but still a very useful one
of course. It's probably going to be a lot more complex (it would
require bi-directional traffic, I think)...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Dave Page
On Fri, Sep 3, 2010 at 2:24 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Sep 3, 2010 at 7:28 AM, Dave Page dp...@pgadmin.org wrote:
 On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication
 command set. Upon receiving that command, the master starts a COPY, and
 streams a tarred copy of the data directory to the client. The patch
 includes a simple command-line tool, pg_streambackup, to connect to a server
 and request a backup that you can then redirect to a .tar file or pipe to
 tar x.

 Cool. Can you add a TODO to build in code to un-tar the archive? tar
 is not usually found on Windows systems, and as we already have tar
 extraction code in pg_restore it could presumably be added relatively
 painlessly.

 It seems like the elephant in the room here is updating an existing
 backup without recopying the entire data directory.  Perhaps that's
 phase two, but worth keeping in mind...

rsync? Might be easier to use that from day 1 (well, day 2) than to
retrofit later.


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-09-03 Thread Robert Haas
2010/9/3 PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at:
 i tried this one with 5000 unindexed tables (just one col):

 test=# \timing
 Timing is on.
 test=# prepare x(int4) AS select * from t_data order by id desc;
 PREPARE
 Time: 361.552 ms

 you will see similar or higher runtimes in case of 500 partitions and a 
 handful of indexes.

I'd like to see (1) a script to reproduce your test environment (as
Stephen also requested) and (2) gprof or oprofile results.
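
(A minimal script along those lines, assuming a parent with 5000 empty
children and no indexes - adjust the loop bound to taste; plain string
concatenation in EXECUTE keeps it 9.0-compatible:)

```sql
-- Build a 5000-way inheritance tree to reproduce the planning time.
CREATE TABLE t_data (id int4, tstamp date);
DO $$
BEGIN
    FOR i IN 1..5000 LOOP
        EXECUTE 'CREATE TABLE t_data_' || i || ' () INHERITS (t_data)';
    END LOOP;
END $$;
-- then, with \timing on in psql:
PREPARE x(int4) AS SELECT * FROM t_data ORDER BY id DESC;
```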

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 9:26 AM, Dave Page dp...@pgadmin.org wrote:
 rsync? Might be easier to use that from day 1 (well, day 2) than to
 retrofit later.

I'm not sure we want to depend on an external utility like that,
particularly one that users may not have installed.  And I'm not sure
if that can be made to work over a libpq channel, either.  But
certainly something with that functionality would be nice to have,
whether it ends up sharing code or not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Dave Page
On Fri, Sep 3, 2010 at 2:29 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Sep 3, 2010 at 9:26 AM, Dave Page dp...@pgadmin.org wrote:
 rsync? Might be easier to use that from day 1 (well, day 2) than to
 retrofit later.

 I'm not sure we want to depend on an external utility like that,
 particularly one that users may not have installed.  And I'm not sure
 if that can be made to work over a libpq channel, either.  But
 certainly something with that functionality would be nice to have,
 whether it ends up sharing code or not.

No, I agree we don't want an external dependency (I was just bleating
about needing tar on Windows). I was assuming/hoping there's a
librsync somewhere...


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] regclass without error?

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 4:28 AM, Tatsuo Ishii is...@postgresql.org wrote:
 Is there any way to use regclass without having ERROR?

 pgpool-II needs to find the oid from table name and for the purpose it
 issues something like SELECT 'table_name'::regproc::oid. Problem is,
 if the table does not exist, an error occured and the transaction
 aborts. Ideally if the table does not exist, the SELECT returns 0
 (InvalidOid).

 Any idea?

You can write a query against the system catalog tables.  Or you could
install a function that wraps the regclass cast in an exception
handler.
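
For illustration, a minimal sketch of the exception-handler wrapper Robert describes (the function name relname_to_oid is made up here; the cast raises SQLSTATE 42P01, undefined_table, when the relation is missing):

```sql
-- Hypothetical wrapper: returns the relation's OID, or NULL if it does not exist.
CREATE OR REPLACE FUNCTION relname_to_oid(relname text) RETURNS oid AS $$
BEGIN
    RETURN relname::regclass::oid;
EXCEPTION
    WHEN undefined_table THEN
        RETURN NULL;   -- or 0 (InvalidOid), if the caller prefers
END;
$$ LANGUAGE plpgsql STABLE;
```

A caller such as pgpool-II could then run SELECT relname_to_oid('table_name') without aborting its transaction, at the cost of requiring the function to be installed first.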

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Path question

2010-09-03 Thread Hans-Jürgen Schönig

On Sep 2, 2010, at 1:20 AM, Robert Haas wrote:

 On Sep 1, 2010, at 10:21 AM, Greg Stark gsst...@mit.edu wrote:
 For what it's worth I disagree with Tom. I think this is a situation
 where we need *both* types of solution. Ideally we will be able to use
 a plain Append node for cases where we know the relative ordering of
 the data in different partitions, but there will always be cases where
 the structured partition data doesn't actually match up with the
 ordering requested and we'll need to fall back to a merge-append node.
 
 I agree. Explicit partitioning may open up some additional optimization 
 possibilities in certain cases, but Merge Append is more general and 
 extremely valuable in its own right.
 
 ...Robert



we have revised greg's wonderful work and ported the entire thing to head.
it solves the problem of merge_append. i did some testing earlier today and
it seems the most important cases are working nicely.

here are some test cases:

test=# \d t_data
     Table "public.t_data"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | 
 tstamp | date    | 

test=# \d t_data_1
      Table "public.t_data_1"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | 
 tstamp | date    | 
Indexes:
    "idx_1" btree (id)
Check constraints:
    "t_data_1_id_check" CHECK (id >= 1 AND id <= 10000)
Inherits: t_data

test=# \d t_data_2
      Table "public.t_data_2"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | 
 tstamp | date    | 
Indexes:
    "idx_2" btree (id)
Check constraints:
    "t_data_2_id_check" CHECK (id >= 10001 AND id <= 20000)
Inherits: t_data

test=# \d t_data_3
      Table "public.t_data_3"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | 
 tstamp | date    | 
Indexes:
    "idx_3" btree (id)
Check constraints:
    "t_data_3_id_check" CHECK (id >= 20001 AND id <= 30000)
Inherits: t_data


simple windowing ...

test=# explain select *, max(id) OVER ( ORDER BY id) from t_data ;
                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 WindowAgg  (cost=149.99..2154.43 rows=32140 width=8)
   ->  Result  (cost=149.99..1672.33 rows=32140 width=8)
         ->  Append  (cost=149.99..1672.33 rows=32140 width=8)
               ->  Sort  (cost=149.78..155.13 rows=2140 width=8)
                     Sort Key: public.t_data.id
                     ->  Seq Scan on t_data  (cost=0.00..31.40 rows=2140 width=8)
               ->  Index Scan using idx_1 on t_data_1 t_data  (cost=0.00..318.25 rows=10000 width=8)
               ->  Index Scan using idx_2 on t_data_2 t_data  (cost=0.00..318.25 rows=10000 width=8)
               ->  Index Scan using idx_3 on t_data_3 t_data  (cost=0.00..318.25 rows=10000 width=8)
(9 rows)

it does a nice index scan, merges the streams, and hands the result up to the
top level, which does the windowing.

test=# select *, max(id) OVER ( ORDER BY id) from t_data LIMIT 10; 
 id |   tstamp   | max 
----+------------+-----
  1 | 2010-01-01 |   1
  2 | 2010-01-01 |   2
  3 | 2010-01-01 |   3
  4 | 2010-01-01 |   4
  5 | 2010-01-01 |   5
  6 | 2010-01-01 |   6
  7 | 2010-01-01 |   7
  8 | 2010-01-01 |   8
  9 | 2010-01-01 |   9
 10 | 2010-01-01 |  10
(10 rows)

the cost model does what it should as well:

test=# explain select *, max(id) OVER ( ORDER BY id) from t_data ;
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 WindowAgg  (cost=2872.41..3434.86 rows=32140 width=8)
   ->  Sort  (cost=2872.41..2952.76 rows=32140 width=8)
         Sort Key: public.t_data.id
         ->  Result  (cost=0.00..466.40 rows=32140 width=8)
               ->  Append  (cost=0.00..466.40 rows=32140 width=8)
                     ->  Seq Scan on t_data  (cost=0.00..31.40 rows=2140 width=8)
                     ->  Seq Scan on t_data_1 t_data  (cost=0.00..145.00 rows=10000 width=8)
                     ->  Seq Scan on t_data_2 t_data  (cost=0.00..145.00 rows=10000 width=8)
                     ->  Seq Scan on t_data_3 t_data  (cost=0.00..145.00 rows=10000 width=8)
(9 rows)

it has proven to be really valuable in my first tests.
maybe this is helpful for some people out there.

many thanks,

hans




merge-append-91-v1.diff
Description: Binary data


--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de




Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 9:32 AM, Dave Page dp...@pgadmin.org wrote:
 No, I agree we don't want an external dependency (I was just bleating
 about needing tar on Windows). I was assuming/hoping there's a
 librsync somewhere...

The rsync code itself is not modular, I believe.  I think the author
thereof kind of took the approach of placing efficiency before all.
See:

http://www.samba.org/rsync/how-rsync-works.html ... especially the
section on "The Rsync Protocol"

I Googled librsync and got a hit, but that code is a rewrite of the
source base and seems to have little or no activity since 2004.

http://librsync.sourceforge.net/

That page says: "librsync is not wire-compatible with rsync 2.x, and
is not likely to be in the future."  The current version of rsync is
3.0.7.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] english parser in text search: support for multiple words in the same position

2010-09-03 Thread Robert Haas
On Wed, Sep 1, 2010 at 2:42 AM, Sushant Sinha sushant...@gmail.com wrote:
 I have attached a patch that emits parts of a host token, a url token,
 an email token and a file token. Further, it makes sure that a
 host/url/email/file token and the first part-token are at the same
 position in tsvector.

You should probably add this patch here:

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] regclass without error?

2010-09-03 Thread Tatsuo Ishii
 You can write a query against the system catalog tables.

That was harder than I thought (schema search path, etc.).

 Or you could
 install a function that wraps the regclass cast in an exception
 handler.

That requires users to install the function. Annoying for users.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 The rsync code itself is not modular, I believe.  I think the author
 thereof kind of took the approach of placing efficiency before all.

Yeah, I looked into this when discussing this same concept at PGCon with
folks.  There doesn't appear to be a good librsync and, even if there
were, there's a heck of a lot of complexity there that we *don't* need.
rsync is a great tool, don't get me wrong, but let's not try to get in
over our heads here.

We don't need permissions handling, as an example.  I also don't think
we need the binary diff/partial file transfer capability- we already
break relations into 1G chunks (when/if they reach that size), so you
won't necessarily be copying the entire relation if you're just doing
mtime based or per-file-checksum based detection.  We don't need device
node handling, we don't need auto-ignoring files, or pattern
exclusion/inclusion, we don't really need a progress bar (though it'd be
nice.. :), etc, etc, etc.

Thanks,

Stephen




Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 On Fri, Sep 3, 2010 at 11:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 elog(FATAL) is *certainly* not a better idea.  I think there's really
 nothing that can be done, you just have to silently ignore the error.

 Hmm.. some functions called by a signal handler use elog(FATAL), e.g.,
 RecoveryConflictInterrupt() do that when unknown conflict mode is given
 as an argument. Are these calls unsafe, too?

[ shrug... ]  I stated before that the Hot Standby patch is doing
utterly unsafe things in signal handlers.  Simon rejected that.
I am waiting for irrefutable evidence to emerge from the field
(and am very confident that it will be forthcoming...) before
I argue with him further.  Meanwhile, I'm not going to accept anything
unsafe in a core facility like this patch is going to be.

regards, tom lane



Re: [HACKERS] regclass without error?

2010-09-03 Thread Marko Tiikkaja

On 2010-09-03 4:46 PM, Tatsuo Ishii wrote:

You can write a query against the system catalog tables.


That was harder than I thought (schema search path, etc.).


Or you could
install a function that wraps the regclass cast in an exception
handler.


That requires users to install the function. Annoying for users.


How about using a SAVEPOINT before the cast?  Wouldn't fail your 
transaction..
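
In session terms, Marko's suggestion looks roughly like this (the savepoint name is chosen arbitrarily):

```sql
BEGIN;
SAVEPOINT before_cast;
SELECT 'no_such_table'::regclass::oid;  -- raises ERROR: relation does not exist
ROLLBACK TO SAVEPOINT before_cast;      -- undoes only the failed statement
-- the surrounding transaction is still usable here
COMMIT;
```

The cost is that the client has to issue and track the extra savepoint commands itself.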



Regards,
Marko Tiikkaja



Re: [HACKERS] regclass without error?

2010-09-03 Thread David Fetter
On Fri, Sep 03, 2010 at 05:10:44PM +0300, Marko Tiikkaja wrote:
 On 2010-09-03 4:46 PM, Tatsuo Ishii wrote:
 You can write a query against the system catalog tables.
 
 That was harder than I thought (schema search path, etc.).
 
 Or you could
 install a function that wraps the regclass cast in an exception
 handler.
 
 That requires users to install the function. Annoying for users.
 
 How about using a SAVEPOINT before the cast?  Wouldn't fail your
 transaction..

For unattended operation, there are some issues:

* Generating appropriate SAVEPOINT names
* Keeping track of same
* Detecting errors
* Issuing ROLLBACKs to the aforementioned SAVEPOINT

None of this works super well for a bulk load.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] regclass without error?

2010-09-03 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 Is there any way to use regclass without having ERROR?

 pgpool-II needs to find the oid from a table name, and for that purpose it
 issues something like SELECT 'table_name'::regclass::oid. The problem is,
 if the table does not exist, an error occurs and the transaction
 aborts. Ideally, if the table does not exist, the SELECT would return 0
 (InvalidOid).

I don't think the cast should act that way, but I could see providing a
separate conversion function that returns 0 ... or perhaps better NULL
... if no match.

regards, tom lane



Re: [HACKERS] regclass without error?

2010-09-03 Thread Pavel Golub
Hello, guys.

You wrote:

TI Hi,

TI Is there any way to use regclass without having ERROR?

TI pgpool-II needs to find the oid from a table name, and for that purpose it
TI issues something like SELECT 'table_name'::regclass::oid. The problem is,
TI if the table does not exist, an error occurs and the transaction
TI aborts. Ideally, if the table does not exist, the SELECT would return 0
TI (InvalidOid).

Agreed with Tatsuo about having InvalidOid.

TI Any idea?
TI --
TI Tatsuo Ishii
TI SRA OSS, Inc. Japan
TI English: http://www.sraoss.co.jp/index_en.php
TI Japanese: http://www.sraoss.co.jp




-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com




Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-09-03 Thread Tom Lane
PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at writes:
 imagine a system with, say, 1000 partitions (heavily indexed) or so. the time 
 taken by the planner is already fairly heavy in this case.

As the fine manual points out, the current scheme for managing
partitioned tables isn't intended to scale past a few dozen partitions.

I think we'll be able to do better when we have an explicit
representation of partitioning, since then the planner won't
have to expend large amounts of effort reverse-engineering knowledge
about how an inheritance tree is partitioned.  Before that happens,
it's not really worth the trouble to worry about such cases.

regards, tom lane



Re: [HACKERS] plan time of MASSIVE partitioning ...

2010-09-03 Thread PostgreSQL - Hans-Jürgen Schönig

On Sep 3, 2010, at 4:40 PM, Tom Lane wrote:

 PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at writes:
 imagine a system with, say, 1000 partitions (heavily indexed) or so. the 
 time taken by the planner is already fairly heavy in this case.
 
 As the fine manual points out, the current scheme for managing
 partitioned tables isn't intended to scale past a few dozen partitions.
 
 I think we'll be able to do better when we have an explicit
 representation of partitioning, since then the planner won't
 have to expend large amounts of effort reverse-engineering knowledge
 about how an inheritance tree is partitioned.  Before that happens,
 it's not really worth the trouble to worry about such cases.
 
   regards, tom lane
 


thank you ... - the manual is clear here, but we wanted to see if there is some 
reasonably low-hanging fruit to get around this.
it is no solution, but at least a clear statement ...

many thanks,

hans


--
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de




Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 02/09/10 23:13, Tom Lane wrote:
 (Yeah, I realize the caller
 could look into the latch to find that out, but callers really ought to
 treat latches as opaque structs.)

 Hmm, maybe we need a TestLatch function to check if a latch is set.

+1.  It could be a macro for now.  I wish that we could declare Latch as
an opaque struct, but we probably need to let callers embed it in a
larger struct, so we'll just have to rely on callers to code cleanly.

 I also realized that the timeout handling is a bit surprising with 
 interrupts. After EINTR we call select() again with the same timeout, so 
 a signal effectively restarts the timer.

Actually it's platform-dependent.  On some machines (I think
BSD-derived) the value of the timeout struct gets reduced by the elapsed
time before returning, so that if you just loop around you don't get
more than the originally specified delay time in total.

 We seem to have similar 
 behavior in a couple of other places, in pgstat.c and auth.c. So maybe 
 that's OK and just needs to be documented, but I thought I'd bring it up.

Yeah.  I am hoping that this facility will greatly reduce the need for
callers to care about the exact timeout delay, so I think that what we
should do for now is just document that the timeout might be several
times as long as specified, and see how it goes from there.

We could fix the problem if we had to, by adding some gettimeofday()
calls and computing the remaining delay time each time through the
loop.  But let's avoid doing that until it's clear we need it.

 Also, using sig_atomic_t for owner_pid is entirely not sane.

 Hmm, true, it doesn't need to be set from signal handler, but is there 
 an atomicity problem if one process calls ReleaseLatch while another 
 process is in SetLatch?

If there is *any* possibility of that happening then you have far worse
problems than whether the field is atomically readable or not: the
behavior will be unpredictable at just slightly larger timescales.
This is the reason why I think it'd be better if ReleaseLatch simply
didn't exist.  That'd discourage people from designing dynamic latch
structures, which are fundamentally going to be subject to race
conditions.

regards, tom lane



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 03/09/10 15:16, Greg Stark wrote:
 On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com  wrote:
 * We need a smarter way to do pg_start/stop_backup() with this. At the
 moment, you can only have one backup running at a time, but we shouldn't
 have that limitation with this built-in mechanism.
 
 Well there's no particular reason we couldn't support having multiple
 pg_start_backup() pending either. It's just not usually something
 people have needed so far.

 The backup label file makes that hard. There can be only one at a time.

I don't actually see a use-case for streaming multiple concurrent
backups.  How many people are going to be able to afford that kind of
load on the master's I/O bandwidth?

Certainly for version 1, it would be sufficient to throw an error if
someone tries to start a backup while another one is in progress.
*Maybe*, down the road, we'd want to relax it.

regards, tom lane



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote:
 
 there's a heck of a lot of complexity there that we *don't* need.
 rsync is a great tool, don't get me wrong, but let's not try to go
 over our heads here.
 
Right -- among other things, it checks for portions of a new file
which match the old file at a different location.  For example, if
you have a very large text file, and insert a line or two at the
start, it will wind up only sending the new lines.  (Well, that and
all the checksums which help it determine that the rest of the file
matches at a shifted location.)  I would think that PostgreSQL could
just check whether *corresponding* portions of a file matched, which
is much simpler.
 
 we already break relations into 1G chunks (when/if they reach that
 size), so you won't necessairly be copying the entire relation if
 you're just doing mtime based or per-file-checksum based
 detection.
 
While 1GB granularity would be OK, I doubt it's optimal; I think CRC
checks for smaller chunks might be worthwhile.  My gut feel is that
somewhere in the 64kB to 1MB range would probably be optimal for us,
although the sweet spot will depend on how the database is used.
A configurable or self-adjusting size would be cool.
 
-Kevin



[HACKERS] Windows Tools

2010-09-03 Thread David Fetter
Hello,

At work, I've been tasked with providing some Windows connection
libraries, etc.  Are people still using MinGW, or should I just
recommend using a proprietary toolkit to do the builds?  If so, which
one(s)?

Cheers,
David (Windows n00b)
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Windows Tools

2010-09-03 Thread Dave Page
On Fri, Sep 3, 2010 at 4:11 PM, David Fetter da...@fetter.org wrote:
 Hello,

 At work, I've been tasked with providing some Windows connection
 libraries, etc.  Are people still using MinGW, or should I just
 recommend using a proprietary toolkit to do the builds?  If so, which
 one(s)?

Visual C++ 2005 for 8.3/8.4 or 2008 for 9.0. You can use the free
Express versions, but you'll need the Windows SDK as well with those
(also free). Setup info is in the docs - note that it can be a little
painful to get the SDK fully registered with 2005, but it seems to be
completely automatic with 2008.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Stephen Frost
Kevin,

* Kevin Grittner (kevin.gritt...@wicourts.gov) wrote:
 While 1GB granularity would be OK, I doubt it's optimal; I think CRC
 checks for smaller chunks might be worthwhile.  My gut feel is that
 somewhere in the 64kB to 1MB range would probably be optimal for us,
 although the sweet spot will depend on how the database is used.
 A configurable or self-adjusting size would be cool.

We have something much better, called WAL.  If people want to keep their
backup current, they should use that after getting the base backup up
and working.  We don't need to support this for the base backup, imv.

In any case, it's certainly not something required for an initial
implementation..

Thanks,

Stephen




Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Thom Brown
On 3 September 2010 16:01, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 03/09/10 15:16, Greg Stark wrote:
 On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com  wrote:
 * We need a smarter way to do pg_start/stop_backup() with this. At the
 moment, you can only have one backup running at a time, but we shouldn't
 have that limitation with this built-in mechanism.

 Well there's no particular reason we couldn't support having multiple
 pg_start_backup() pending either. It's just not usually something
 people have needed so far.

 The backup label file makes that hard. There can be only one at a time.

 I don't actually see a use-case for streaming multiple concurrent
 backups.  How many people are going to be able to afford that kind of
 load on the master's I/O bandwidth?

To make it affordable, could functionality be added to allow slaves to
become chainable? (i.e. master streams to standby 1, which streams to
standby 2 etc)  This would help reduce bandwidth for normal streaming
replication too, which would be useful on particularly busy databases.
Obviously in synchronous replication this would be horribly slow so
not feasible for that.
-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote:
 
 We have something much better, called WAL.  If people want to keep
 their backup current, they should use that after getting the base
 backup up and working.
 
Unless you want to provide support for Point In Time Recovery
without excessive recovery times.
 
 We don't need to support this for the base backup, imv.
 
We found that making a hard-link copy of the previous base backup
and using rsync to bring it up to date used 1% of the WAN bandwidth of
sending a complete, compressed base backup.  Just sending modified
files in their entirety would have bought the first order of
magnitude; recognizing the unchanged portions buys the second order
of magnitude.
 
 In any case, it's certainly not something required for an initial
 implementation..
 
No disagreement there; but sometimes it pays to know where you might
want to go, so you don't do something to make further development in
that direction unnecessarily difficult.
 
-Kevin



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 18:01, Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

On 03/09/10 15:16, Greg Stark wrote:

On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com   wrote:

* We need a smarter way to do pg_start/stop_backup() with this. At the
moment, you can only have one backup running at a time, but we shouldn't
have that limitation with this built-in mechanism.


Well there's no particular reason we couldn't support having multiple
pg_start_backup() pending either. It's just not usually something
people have needed so far.



The backup label file makes that hard. There can be only one at a time.


I don't actually see a use-case for streaming multiple concurrent
backups.  How many people are going to be able to afford that kind of
load on the master's I/O bandwidth?


It's more a matter of convenience when you're setting up test 
environments with small databases or something like that. I don't see 
many people regularly using the streaming backup for anything larger 
than a few hundred gigabytes anyway. At that point you'll most likely 
want to use something more efficient.



Certainly for version 1, it would be sufficient to throw an error if
someone tries to start a backup while another one is in progress.
*Maybe*, down the road, we'd want to relax it.


Yeah, it's OK for 1st version.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 11:20 AM, Stephen Frost sfr...@snowman.net wrote:
 Kevin,

 * Kevin Grittner (kevin.gritt...@wicourts.gov) wrote:
 While 1GB granularity would be OK, I doubt it's optimal; I think CRC
 checks for smaller chunks might be worthwhile.  My gut feel is that
 somewhere in the 64kB to 1MB range would probably be optimal for us,
 although the sweet spot will depend on how the database is used.
 A configurable or self-adjusting size would be cool.

 We have something much better, called WAL.  If people want to keep their
 backup current, they should use that after getting the base backup up
 and working.  We don't need to support this for the base backup, imv.

 In any case, it's certainly not something required for an initial
 implementation..

While I'm certainly not knocking WAL, it's not difficult to think of
cases where being able to incrementally update a backup saves you an
awful lot of bandwidth.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 On Fri, Sep 3, 2010 at 11:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 elog(FATAL) is *certainly* not a better idea.  I think there's really
 nothing that can be done, you just have to silently ignore the error.

 Hmm.. some functions called by a signal handler use elog(FATAL), e.g.,
 RecoveryConflictInterrupt() do that when unknown conflict mode is given
 as an argument. Are these calls unsafe, too?

 [ shrug... ]  I stated before that the Hot Standby patch is doing
 utterly unsafe things in signal handlers.  Simon rejected that.
 I am waiting for irrefutable evidence to emerge from the field
 (and am very confident that it will be forthcoming...) before
 I argue with him further.  Meanwhile, I'm not going to accept anything
 unsafe in a core facility like this patch is going to be.

Oh.  I thought you had ignored his objections and fixed it.  Why are
we releasing 9.0 with this problem again?  Surely this is nuts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Stephen Frost sfr...@snowman.net wrote:
 In any case, it's certainly not something required for an initial
 implementation..
 
 No disagreement there; but sometimes it pays to know where you might
 want to go, so you don't do something to make further development in
 that direction unnecessarily difficult.

I think that setting out to reimplement rsync, or to go down a design
path where we're likely to do a lot of that eventually, is the height
of folly.  We should be standing on the shoulders of other projects,
not rolling our own because of misguided ideas about people not having
those projects installed.

IOW, what I'd like to see is protocol extensions that allow an external
copy of rsync to be invoked; not build in rsync, or tar, or anything
else that we could get off-the-shelf.

regards, tom lane



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Stephen Frost sfr...@snowman.net wrote:
 In any case, it's certainly not something required for an initial
 implementation..

 No disagreement there; but sometimes it pays to know where you might
 want to go, so you don't do something to make further development in
 that direction unnecessarily difficult.

 I think that setting out to reimplement rsync, or to go down a design
 path where we're likely to do a lot of that eventually, is the height
 of folly.  We should be standing on the shoulders of other projects,
 not rolling our own because of misguided ideas about people not having
 those projects installed.

 IOW, what I'd like to see is protocol extensions that allow an external
 copy of rsync to be invoked; not build in rsync, or tar, or anything
 else that we could get off-the-shelf.

We used to use cp to create databases.  Should we go back to that system?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread David Blewett
On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 IOW, what I'd like to see is protocol extensions that allow an external
 copy of rsync to be invoked; not build in rsync, or tar, or anything
 else that we could get off-the-shelf.

Personally, I would love to see protocol-level compression added.
(Yes, going over a compressed SSH tunnel works well, but in general
isn't user-friendly.)

Josh: we talked on IRC awhile back and you mentioned that CMD had
added this in Mammoth? Would you be interested in having someone get
that integrated back into the community?

David Blewett



Re: [HACKERS] regclass without error?

2010-09-03 Thread David E. Wheeler
On Sep 3, 2010, at 7:31 AM, Tom Lane wrote:

 I don't think the cast should act that way, but I could see providing a
 separate conversion function that returns 0 ... or perhaps better NULL
 ... if no match.

+1 I could use this in pgTAP.

David



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 IOW, what I'd like to see is protocol extensions that allow an external
 copy of rsync to be invoked; not build in rsync, or tar, or anything
 else that we could get off-the-shelf.

I'd much rather use an existing library to implement it than call out to
some external utility.  That said, I'm about as thrilled with libtar as
librsync after a bit of googling around. :/

Thanks,

Stephen




Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ shrug... ]  I stated before that the Hot Standby patch is doing
 utterly unsafe things in signal handlers.  Simon rejected that.
 I am waiting for irrefutable evidence to emerge from the field
 (and am very confident that it will be forthcoming...) before
 I argue with him further.  Meanwhile, I'm not going to accept anything
 unsafe in a core facility like this patch is going to be.

 Oh.  I thought you had ignored his objections and fixed it.  Why are
 we releasing 9.0 with this problem again?  Surely this is nuts.

My original review of hot standby found about half a dozen things
I thought were broken:
http://archives.postgresql.org/pgsql-hackers/2010-05/msg00178.php
After a *very* long-drawn-out fight I fixed one of them
(max_standby_delay), largely still over Simon's objections.  I don't
have the energy to repeat that another half-dozen times, so I'm going
to wait for the suspected problems to be proven by field experience.

regards, tom lane



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 what I'd like to see is protocol extensions that allow an external
 copy of rsync to be invoked; not build in rsync, or tar, or
 anything else that we could get off-the-shelf.
 
The complexities of dealing with properly invoking rsync externally
could well require more code and be considerably more fragile than
passing the data through the existing SR connection; particularly
since to get the full benefits of rsync you need to be dealing with
a daemon which has the appropriate modules configured -- the
location of which you wouldn't easily know.
 
If we were talking about re-implementing rsync, or doing more than a
rough approximation, kinda, of 5% of what rsync does, I'd be with
you.
 
-Kevin



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 19:09, Stephen Frost wrote:

* Tom Lane (t...@sss.pgh.pa.us) wrote:

IOW, what I'd like to see is protocol extensions that allow an external
copy of rsync to be invoked; not build in rsync, or tar, or anything
else that we could get off-the-shelf.


I'd much rather use an existing library to implement it than call out to
some external utility.  That said, I'm about as thrilled with libtar as
librsync after a bit of googling around. :/


The code to build a tar archive is about 200 lines of code. The amount 
of code for untar is about the same. That's about the amount of effort 
involved. We could add zlib compression since we already link with that, but 
that's about it. I'm not interested in adding more infrastructure for 
more tools. For more complicated scenarios, you can still use 
pg_start/stop_backup() as usual, there's nothing wrong with that.
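To make the "~200 lines" point concrete, here is a toy sketch of a minimal
ustar writer; this is an illustration only, not code from any patch, and the
file name used is made up:

```python
# Sketch: how little code a minimal tar (ustar) writer needs.
import io
import struct  # not strictly needed; header is built byte-by-byte
import tarfile

def tar_header(name: bytes, size: int) -> bytes:
    """Build a 512-byte ustar header for a regular file."""
    buf = bytearray(512)
    buf[0:len(name)] = name                             # file name
    buf[100:108] = b"0000644\x00"                       # mode
    buf[108:116] = b"0000000\x00"                       # uid
    buf[116:124] = b"0000000\x00"                       # gid
    buf[124:136] = ("%011o" % size).encode() + b"\x00"  # size, octal
    buf[136:148] = b"00000000000\x00"                   # mtime
    buf[148:156] = b" " * 8                             # checksum: spaces while summing
    buf[156] = ord("0")                                 # typeflag: regular file
    buf[257:263] = b"ustar\x00"                         # magic
    buf[263:265] = b"00"                                # version
    chksum = sum(buf)
    buf[148:156] = ("%06o" % chksum).encode() + b"\x00 "
    return bytes(buf)

def tar_member(name: bytes, data: bytes) -> bytes:
    """Header plus data, padded to the 512-byte block size."""
    pad = (512 - len(data) % 512) % 512
    return tar_header(name, len(data)) + data + b"\x00" * pad

# An archive is members followed by two zero blocks.
archive = tar_member(b"base/1234/5678", b"hello") + b"\x00" * 1024

# Cross-check against the stdlib reader.
with tarfile.open(fileobj=io.BytesIO(archive)) as tf:
    member = tf.getmembers()[0]
    assert member.name == "base/1234/5678"
    assert tf.extractfile(member).read() == b"hello"
```

The untar side is the same work in reverse: read a header block, parse the
octal size field, read that many data bytes plus padding.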


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 18:53, David Blewett wrote:

On Fri, Sep 3, 2010 at 11:47 AM, Tom Lanet...@sss.pgh.pa.us  wrote:

IOW, what I'd like to see is protocol extensions that allow an external
copy of rsync to be invoked; not build in rsync, or tar, or anything
else that we could get off-the-shelf.


Personally, I would love to see protocol-level compression added.
(Yes, going over a compressed SSH tunnel works well, but in general
isn't user-friendly.)

Josh: we talked on IRC awhile back and you mentioned that CMD had
added this in Mammoth? Would you be interested in having someone get
that integrated back into the community?


There's a recent thread on pgsql-general about just that:
http://archives.postgresql.org/pgsql-general/2010-08/msg3.php

I agree with Tom's comments there, I'd like to have something to 
enable/disable SSL compression rather than implement our own. There was 
some discussion that it might not be available on JDBC SSL 
implementations, but if it's done in our protocol, you'll need changes 
to the client to make it work anyway.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Windows Tools

2010-09-03 Thread Magnus Hagander
On Fri, Sep 3, 2010 at 17:18, Dave Page dp...@pgadmin.org wrote:
 On Fri, Sep 3, 2010 at 4:11 PM, David Fetter da...@fetter.org wrote:
 Hello,

 At work, I've been tasked with providing some Windows connection
 libraries, etc.  Are people still using MinGW, or should I just
 recommend using a proprietary toolkit to do the builds?  If so, which
 one(s)?

 Visual C++ 2005 for 8.3/8.4 or 2008 for 9.0. You can use the free
 Express versions, but you'll need the Windows SDK as well with those
 (also free). Setup info is in the docs - note that it can be a little
 painful to get the SDK fully registered with 2005, but it seems to be
 completely automatic with 2008.

If you download the latest SDK, you don't even need Visual C++ Express
- you can build directly from the SDK. Assuming you don't *want* the
IDE, of course...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 12:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ shrug... ]  I stated before that the Hot Standby patch is doing
 utterly unsafe things in signal handlers.  Simon rejected that.
 I am waiting for irrefutable evidence to emerge from the field
 (and am very confident that it will be forthcoming...) before
 I argue with him further.  Meanwhile, I'm not going to accept anything
 unsafe in a core facility like this patch is going to be.

 Oh.  I thought you had ignored his objections and fixed it.  Why are
 we releasing 9.0 with this problem again?  Surely this is nuts.

 My original review of hot standby found about half a dozen things
 I thought were broken:
 http://archives.postgresql.org/pgsql-hackers/2010-05/msg00178.php
 After a *very* long-drawn-out fight I fixed one of them
 (max_standby_delay), largely still over Simon's objections.  I don't
 have the energy to repeat that another half-dozen times, so I'm going
 to wait for the suspected problems to be proven by field experience.

Bummer.  Allow me to cast a vote for doing something about the fact
that handle_standby_sig_alarm() thinks it can safely acquire an LWLock
in a signal handler.  I think we should be making our decisions on
what to change in the code based on what is technically sound, rather
than based on how much the author complains about changing it.  Of
course there may be cases where there is a legitimate difference of
opinion concerning the best way forward, but I don't believe this is
one of them.
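For readers unfamiliar with the bug class: the conventional fix is that a
signal handler does nothing but set a flag, and any lock-taking work happens
later in the main line of execution. A toy sketch of the principle (names
invented, not PostgreSQL code):

```python
# Sketch of the async-signal-safe pattern: the handler only records that
# the timeout fired; work requiring locks is deferred to the main loop.
import os
import signal

timeout_pending = False  # in C this would be a volatile sig_atomic_t

def handle_sig_alarm(signum, frame):
    # Safe inside a handler: just set a flag, take no locks.
    global timeout_pending
    timeout_pending = True

def check_for_timeout():
    # Called from the main loop, where acquiring LWLocks is safe.
    global timeout_pending
    if timeout_pending:
        timeout_pending = False
        return "cancel conflicting queries"  # lock-taking work goes here
    return None

signal.signal(signal.SIGALRM, handle_sig_alarm)
os.kill(os.getpid(), signal.SIGALRM)  # simulate the standby timer firing
print(check_for_timeout())
```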

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] regclass without error?

2010-09-03 Thread Alvaro Herrera
Excerpts from David Fetter's message of vie sep 03 10:21:00 -0400 2010:

  How about using a SAVEPOINT before the cast?  Wouldn't fail your
  transaction..
 
 For unattended operation, there are some issues:
 
 * Generating appropriate SAVEPOINT names
 * Keeping track of same
 * Detecting errors
 * Issuing ROLLBACKs to the aforementioned SAVEPOINT
 
 None of this works super well for a bulk load.

Why do you need any of this?  The logic is pretty simple:

SAVEPOINT my_savepoint;
select ...::regclass;

-- if it works
RELEASE my_savepoint;

-- if it fails
ROLLBACK TO my_savepoint;
RELEASE my_savepoint;


Granted, it's not super-performant ...
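The per-row savepoint pattern above can be driven from client code like this;
sqlite3 stands in here only because it also speaks SAVEPOINT and runs without
a server, but against PostgreSQL the SQL is identical:

```python
# Sketch of a savepoint-per-row bulk load: a failing row is rolled back
# to the savepoint without aborting the surrounding transaction.
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit txn control
conn.execute("CREATE TABLE t (n INTEGER PRIMARY KEY)")

rows = [1, 2, 2, 3]  # the duplicate 2 will violate the primary key
conn.execute("BEGIN")
for n in rows:
    conn.execute("SAVEPOINT load_row")
    try:
        conn.execute("INSERT INTO t VALUES (?)", (n,))
        conn.execute("RELEASE load_row")        # it worked
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK TO load_row")    # undo just this row
        conn.execute("RELEASE load_row")
conn.execute("COMMIT")

loaded = [r[0] for r in conn.execute("SELECT n FROM t ORDER BY n")]
print(loaded)  # -> [1, 2, 3]
```

As Alvaro notes, this costs one savepoint round trip per row, which is why
it is not super-performant for large loads.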

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] regclass without error?

2010-09-03 Thread David Fetter
On Fri, Sep 03, 2010 at 01:28:15PM -0400, Alvaro Herrera wrote:
 Excerpts from David Fetter's message of vie sep 03 10:21:00 -0400 2010:
 
   How about using a SAVEPOINT before the cast?  Wouldn't fail your
   transaction..
  
  For unattended operation, there are some issues:
  
  * Generating appropriate SAVEPOINT names
  * Keeping track of same
  * Detecting errors
  * Issuing ROLLBACKs to the aforementioned SAVEPOINT
  
  None of this works super well for a bulk load.
 
 Why do you need any of this?  The logic is pretty simple:
 
 SAVEPOINT my_savepoint;
 select ...::regclass;
 
 -- if it works
 RELEASE my_savepoint;
 
 -- if it fails
 ROLLBACK TO my_savepoint;
 RELEASE my_savepoint;
 
 Granted, it's not super-performant ...

We have no mechanism to do this, or any other check, during a bulk
load.  It'd be a great feature to have :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] regclass without error?

2010-09-03 Thread Alvaro Herrera
Excerpts from David Fetter's message of vie sep 03 13:31:00 -0400 2010:

 We have no mechanism to do this, or any other check, during a bulk
 load.  It'd be a great feature to have :)

I'm not sure what kind of bulk load you are talking about, nor what it
has to do with pgpool-II.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 17:51, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

On 02/09/10 23:13, Tom Lane wrote:

Also, using sig_atomic_t for owner_pid is entirely not sane.



Hmm, true, it doesn't need to be set from signal handler, but is there
an atomicity problem if one process calls ReleaseLatch while another
process is in SetLatch?


If there is *any* possibility of that happening then you have far worse
problems than whether the field is atomically readable or not: the
behavior will be unpredictable at just slightly larger timescales.
This is the reason why I think it'd be better if ReleaseLatch simply
didn't exist.  That'd discourage people from designing dynamic latch
structures, which are fundamentally going to be subject to race
conditions.


Each walsender needs a latch, and walsenders come and go. The version I 
first experimented with had no ReleaseLatch function; instead any process 
could call WaitLatch on any shared latch, as long as only one process 
waited on a given latch at a time. But it had exactly the same problem: 
WaitLatch had to set the pid on the Latch struct to allow other 
processes to send the signal. Another process could call SetLatch and 
read the pid field while WaitLatch is just setting it. I think we'll 
have to put a spinlock there if we can't assume that assignment of 
pid_t is atomic. It's not the end of the world..


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Cost estimates for parameterized paths

2010-09-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Sep 2, 2010 at 5:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The best idea I can come up with at the moment is to compute best case
 and worst case costs for a parameterized path,

 Interestingly, I previously proposed almost exactly this approach to
 handle a couple of other problems:

I thought it seemed familiar ;-)

 I'm not entirely sure whether we can use this approach for more than
 one kind of problem at a time; if we can't, it's probably not a good
 idea to do it at all.

I don't see why we couldn't do it in principle.  The issue of course
is how many paths survive, and can we tolerate that from a planner
performance standpoint?

On reflection I think that for parameterized paths the problem won't be
too bad, because (a) we'll ignore parameterized paths except when
considering a join to the right outer rel, so their presence in the
rel's pathlist won't cost much otherwise, and (b) we will only generate
them when there's a suitable joinclause, so the number of potential
paths will be limited.  I am not sure those statements can be made for
the other applications you suggested, though.

There's probably not much we can do at this point except code up the
idea and see how well it works.
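The pruning rule implied by best-case/worst-case costing can be sketched as
keeping only paths not dominated on both bounds (names and numbers below are
invented for illustration, not planner code):

```python
# Sketch: keep a path only if no other path beats it on *both* its
# best-case and worst-case cost, i.e. keep the Pareto frontier.
def prune_paths(paths):
    """paths: list of (name, best_cost, worst_cost) tuples."""
    kept = []
    for name, best, worst in paths:
        dominated = any(
            b <= best and w <= worst and (b < best or w < worst)
            for other, b, w in paths
            if other != name
        )
        if not dominated:
            kept.append(name)
    return kept

paths = [
    ("indexscan (param)", 10.0, 500.0),  # cheap if few outer rows
    ("seqscan",           80.0,  80.0),  # flat cost regardless
    ("bitmapscan",        90.0, 600.0),  # worse than both on both bounds
]
print(prune_paths(paths))  # -> ['indexscan (param)', 'seqscan']
```

The planner-performance question is exactly how large the surviving frontier
gets in practice.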

regards, tom lane



Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 03/09/10 17:51, Tom Lane wrote:
 If there is *any* possibility of that happening then you have far worse
 problems than whether the field is atomically readable or not: the
 behavior will be unpredictable at just slightly larger timescales.

 Each Walsender needs a latch, and walsenders come and go.

Well, then we need to think extremely hard about the circumstances in
which we need to send a cross-process latch signal to walsenders and
what the behavior needs to be in the race conditions.

 WaitLatch had to set the pid on the Latch struct to allow other 
 processes to send the signal. Another process could call SetLatch and 
 read the pid field, while WaitLatch is just setting it. I think we'll 
 have to put a spinlock there, if we can't assume that assignment of 
 pid_t is atomic. It's not the end of the world..

Yes it is.  Signal handlers can't take spinlocks (what if they interrupt
while the mainline is holding the lock?).

It's probably not too unreasonable to assume that pid_t assignment is
atomic.  But I'm still thinking that we have bigger problems than that
if there are really cases where SetLatch can execute at approximately
the same time as a latch owner is coming or going.

regards, tom lane



[HACKERS] ps buffer is incorrectly padded on the (latest) OS X

2010-09-03 Thread Alexey Klyukin
Hi,

I always wondered why ps ax|grep postgres shows several extra blank lines
after the process name, i.e.

  972   ??  Ss 0:00.69 postgres: writer process






973   ??  Ss 0:00.51 postgres: wal writer process

(I put newlines instead of spaces there). By looking into the code I've found
this part of set_ps_display:

#ifdef PS_USE_CLOBBER_ARGV
	/* pad unused memory; need only clobber remainder of old status string */
	if (last_status_len > ps_buffer_cur_len)
		MemSet(ps_buffer + ps_buffer_cur_len, PS_PADDING,
			   last_status_len - ps_buffer_cur_len);
	last_status_len = ps_buffer_cur_len;
#endif   /* PS_USE_CLOBBER_ARGV */

PS_PADDING padding on __darwin__ is set to ' '. Apparently this doesn't work
correctly with OS X 10.6. After I changed the define to use '\0' on darwin
extra blank lines (actually consisting of hundreds of spaces without a line
break) disappeared. The one-liner change follows:

===
diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index f27a52f..c2ddf33 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -76,7 +76,7 @@ bool  update_process_title = true;
 
 
 /* Different systems want the buffer padded differently */
-#if defined(_AIX) || defined(__linux__) || defined(__svr4__)
+#if defined(_AIX) || defined(__linux__) || defined(__svr4__) || defined(__darwin__)
 #define PS_PADDING '\0'
 #else
 #define PS_PADDING ' '
===

I don't have different OS X versions to test, so I'm not sure whether 10.5 or
below are also affected. Also, the patch should specifically check for 10.6,
though I don't know how to distinguish between different OS X versions in
postgres sources (any suggestions?).

Regards,
--
Alexey Klyukin  http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc




Re: [HACKERS] Windows Tools

2010-09-03 Thread David Fetter
On Fri, Sep 03, 2010 at 06:24:06PM +0200, Magnus Hagander wrote:
 On Fri, Sep 3, 2010 at 17:18, Dave Page dp...@pgadmin.org wrote:
  On Fri, Sep 3, 2010 at 4:11 PM, David Fetter da...@fetter.org wrote:
  Hello,
 
  At work, I've been tasked with providing some Windows connection
  libraries, etc.  Are people still using MinGW, or should I just
  recommend using a proprietary toolkit to do the builds?  If so,
  which one(s)?
 
  Visual C++ 2005 for 8.3/8.4 or 2008 for 9.0. You can use the free
  Express versions, but you'll need the Windows SDK as well with
  those (also free). Setup info is in the docs - note that it can be
  a little painful to get the SDK fully registered with 2005, but it
  seems to be completely automatic with 2008.
 
 If you download the latest SDK, you don't even need Visual C++
 Express - you can build directly from the SDK.  Assuming you don't
 *want* the IDE, of course...

Here are a few of the things I'd rather have than that IDE:

* Root canal therapy without anaesthesia
* Migraine
* Windows for my desktop

Fortunately, it looks like I'll get none of the above for this project :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 21:16, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

WaitLatch had to set the pid on the Latch struct to allow other
processes to send the signal. Another process could call SetLatch and
read the pid field, while WaitLatch is just setting it. I think we'll
have to put a spinlock there, if we can't assume that assignment of
pid_t is atomic. It's not the end of the world..


Yes it is.  Signal handlers can't take spinlocks (what if they interrupt
while the mainline is holding the lock?).


Ok, I see.


It's probably not too unreasonable to assume that pid_t assignment is
atomic.  But I'm still thinking that we have bigger problems than that
if there are really cases where SetLatch can execute at approximately
the same time as a latch owner is coming or going.


I don't see how to avoid it. A walsender, or any process really, can 
exit at any time. It can make the latch inaccessible to others before it 
exits to minimize the window, but it's always going to be possible that 
another process is just about to call SetLatch when you exit.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 03/09/10 21:16, Tom Lane wrote:
 It's probably not too unreasonable to assume that pid_t assignment is
 atomic.  But I'm still thinking that we have bigger problems than that
 if there are really cases where SetLatch can execute at approximately
 the same time as a latch owner is coming or going.

 I don't see how to avoid it. A walsender, or any process really, can 
 exit at any time. It can make the latch inaccessible to others before it 
 exits to minimize the window, but it's always going to be possible that 
 another process is just about to call SetLatch when you exit.

Well, in that case what we need to do is presume that the latch object
has a continuing existence but the owner/receiver can come and go.
I would suggest that InitLatch needs to initialize the object into a
valid but unowned state; there is *no* deinitialize operation; and
there are AcquireLatch and ReleaseLatch operations to become owner
or stop being owner.  We also need to define the semantics of SetLatch
on an unowned latch --- does this set a signal condition that will be
available to the next owner?

This amount of complexity might be overkill for local latches, but I
think we need it for shared ones.
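In outline, the proposed lifecycle might look like this single-process
sketch; the API names (InitLatch, AcquireLatch, ReleaseLatch, SetLatch) come
from the mail, while the implementation and the pending-signal answer to
Tom's open question are invented for illustration:

```python
# Sketch: a latch object that lives forever, with owners coming and going.
# SetLatch on an unowned latch leaves a pending flag for the next owner.
class Latch:
    def __init__(self):          # InitLatch: valid but unowned
        self.owner_pid = 0
        self.is_set = False

    def acquire(self, pid):      # AcquireLatch: become owner
        assert self.owner_pid == 0, "latch already owned"
        self.owner_pid = pid     # a pending SetLatch survives into ownership

    def release(self):           # ReleaseLatch: stop being owner; latch lives on
        self.owner_pid = 0

    def set(self):               # SetLatch: safe whether or not anyone owns it
        self.is_set = True       # on an unowned latch this is simply remembered

    def wait(self):              # WaitLatch (no real blocking in this sketch)
        fired, self.is_set = self.is_set, False
        return fired

latch = Latch()
latch.set()              # walsender slot signalled while unowned
latch.acquire(pid=4242)  # a new walsender takes over the slot
print(latch.wait())      # -> True: the pending signal was not lost
```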

regards, tom lane



Re: [HACKERS] ps buffer is incorrectly padded on the (latest) OS X

2010-09-03 Thread Tom Lane
Alexey Klyukin al...@commandprompt.com writes:
 I always wondered why ps ax|grep postgres shows several extra blank lines
 after the process name, i.e.

AFAIR it's always done that on OSX.  I thought we'd tried the '\0'
padding way back when and it didn't work nicely, but maybe Apple fixed
that.

regards, tom lane



Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Ron Mayer
Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ shrug... ]  I stated before that the Hot Standby patch is doing
 utterly unsafe things in signal handlers.  Simon rejected that.
 I am waiting for irrefutable evidence to emerge from the field
 (and am very confident that it will be forthcoming...) [...]
 
 [...]Why are
 we releasing 9.0 with this problem again?  Surely this is nuts.

Will the docs give enough info so that release note readers
will know when they're giving well-informed consent to volunteer
to produce such field evidence?





Re: [HACKERS] returning multiple result sets from a stored procedure

2010-09-03 Thread Merlin Moncure
On Thu, Sep 2, 2010 at 7:17 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from John Adams's message of jue sep 02 18:25:45 -0400 2010:
 I noticed in postgres you cannot return multiple result sets from a stored
 procedure (surprisingly as it looks like a very good dbms).

 If you're really intent about doing this, you can emulate it by
 returning a set of refcursors.

Also arrays can work pretty well, depending on how much data there is
and where it's going.

merlin



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread Martijn van Oosterhout
On Fri, Sep 03, 2010 at 09:56:12AM -0400, Stephen Frost wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
  The rsync code itself is not modular, I believe.  I think the author
  thereof kind of took the approach of placing efficiency before all.
 
 Yeah, I looked into this when discussing this same concept at PGCon with
 folks.  There doesn't appear to be a good librsync and, even if there
 was, there's a heck of alot of complexity there that we *don't* need.
 rsync is a great tool, don't get me wrong, but let's not try to go over
 our heads here.

rsync is not rocket science. All you need is for the receiving end to
send a checksum for each block it has. The server side does the same
checksum and for each block sends back same or new data.

The client and the server don't need to synchronise at all. If the
client sends nothing, the server sends everything.

The tricky part of rsync (finding blocks that have moved) is not needed
here.
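The simplified scheme described above can be sketched as follows; this is a
toy illustration of the idea, not rsync's actual wire format, and the block
size and digest choice are arbitrary:

```python
# Sketch: receiver sends one digest per block it has; sender replies per
# block with either "same" (None) or the new bytes. No moved-block search.
import hashlib

BLOCK = 4  # toy block size; a real implementation would use 8 KB pages

def digests(data: bytes):
    return [hashlib.sha1(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

def sender(new: bytes, receiver_digests):
    """One reply per block of the new file; None means 'you already have it'."""
    out = []
    for i in range(0, len(new), BLOCK):
        block = new[i:i + BLOCK]
        j = i // BLOCK
        if (j < len(receiver_digests)
                and receiver_digests[j] == hashlib.sha1(block).digest()):
            out.append(None)   # unchanged: send only a marker
        else:
            out.append(block)  # changed or appended: send the data
    return out

def receiver(old: bytes, replies):
    parts = []
    for j, reply in enumerate(replies):
        parts.append(old[j * BLOCK:(j + 1) * BLOCK] if reply is None else reply)
    return b"".join(parts)

old = b"aaaabbbbcccc"
new = b"aaaaXXXXccccdddd"
replies = sender(new, digests(old))
assert receiver(old, replies) == new
print(sum(r is None for r in replies), "of", len(replies), "blocks unchanged")
```

Note that if the client sends no digests at all, the sender simply transmits
every block, which matches the degenerate case Martijn describes.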

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle




[HACKERS] The other major HS TODO: standby promotion

2010-09-03 Thread Josh Berkus
All,

As far as I'm concerned, the other big missing feature for HS is the
ability to promote standbys to become the new master.   If we had that
feature, then HS can be the backbone of a large-scale PostgreSQL
cloud; if we don't have it, then HS does not contribute very much to
scalability beyond a couple of servers.

It also seems like it ought to be relatively easy to do, if I understand
the issues correctly.  Please advise me if I understand the three
obstacles to this:

a) for a standby to become a master to another standby, the promoted
standby must be equal to or ahead of the nonpromoted standby in the
replication stream.

b) when we bring a standby up, it comes up on a new timeline.  Since the
other standbys don't have this new timeline, they are incompatible with it.

c) when we promote a standby, it would also need to save all of its
transaction logs until the other standbys connect.

(a) seems easily enough solved in two steps: giving the DBA a way
to check where in the replication stream each standby is (I think we
already have this) and by having the re-mastering mechanism check for
regressions in timestamps or the XID sequence.

I can see two ways to tackle (b).  One would be NOT to start a new
timeline (as an option) when we promote the standby.  That method
probably has complications we don't want to get into.

The second method would be by giving standbys a way to subscribe to a
new timeline.  This seems like the better approach, as it would
logically be part of the re-mastering command.  What changes would be
required to do this?

(c) can actually already be dealt with by setting an archive_command on
each standby.  Beyond that, I don't think that we really need to do
anything; DBAs can have a choice between archiving logs to allow for
remastering of all standbys, or saving space and bandwidth, and forcing
some standbys to be re-cloned if you run out of time.  It would be nice,
eventually, to have a way to tell PostgreSQL to retain more or less WAL
segments without restarting the server, but I don't see this as critical.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] returning multiple result sets from a stored procedure

2010-09-03 Thread Josh Berkus

 I noticed in postgres you cannot return multiple result sets from a
 stored procedure (surprisingly as it looks like a very good dbms).

That feature has been on the TODO list for years.  However, nobody has
stepped forward to either write it, or to fund working on it.  If your
company has programmers or money to build this feature, it could
probably get done fairly quickly (as in, next version).

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 3:11 PM, Ron Mayer rm...@cheapcomplexdevices.com wrote:
 Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ shrug... ]  I stated before that the Hot Standby patch is doing
 utterly unsafe things in signal handlers.  Simon rejected that.
 I am waiting for irrefutable evidence to emerge from the field
 (and am very confident that it will be forthcoming...) [...]

 [...]Why are
 we releasing 9.0 with this problem again?  Surely this is nuts.

 Will the docs give enough info so that release note readers
 will know when they're giving well-informed consent to volunteer
 to produce such field evidence?

Yeah, exactly.  Good news: you can now run queries on the standby.
Bad news: we've abandoned our policy of not releasing with known bugs.
 Have fun and enjoy using PostgreSQL, the world's most advanced open
source databSegmentation fault

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] returning multiple result sets from a stored procedure

2010-09-03 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 I noticed in postgres you cannot return multiple result sets from a
 stored procedure (surprisingly as it looks like a very good dbms).

 That feature has been on the TODO list for years.  However, nobody has
 stepped forward to either write it, or to fund working on it.  If your
 company has programmers or money to build this feature, it could
 probably get done fairly quickly (as in, next version).

Part of the reason it's sat on TODO is lack of consensus about how such
a feature ought to look/work; particularly since most of the discussion
about it has considered that it'd go along with stored procedures
executing outside of transactions.  It's not just a matter of needing to
find some programming manpower.

regards, tom lane



Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 19:38, Robert Haas wrote:
 On Fri, Sep 3, 2010 at 12:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ shrug... ]  I stated before that the Hot Standby patch is doing
 utterly unsafe things in signal handlers.  Simon rejected that.
 I am waiting for irrefutable evidence to emerge from the field
 (and am very confident that it will be forthcoming...) before
 I argue with him further.  Meanwhile, I'm not going to accept anything
 unsafe in a core facility like this patch is going to be.

 Oh.  I thought you had ignored his objections and fixed it.  Why are
 we releasing 9.0 with this problem again?  Surely this is nuts.

 My original review of hot standby found about half a dozen things
 I thought were broken:
 http://archives.postgresql.org/pgsql-hackers/2010-05/msg00178.php
 After a *very* long-drawn-out fight I fixed one of them
 (max_standby_delay), largely still over Simon's objections.  I don't
 have the energy to repeat that another half-dozen times, so I'm going
 to wait for the suspected problems to be proven by field experience.

Bummer.  Allow me to cast a vote for doing something about the fact
that handle_standby_sig_alarm() thinks it can safely acquire an LWLock
in a signal handler.  I think we should be making our decisions on
what to change in the code based on what is technically sound, rather
than based on how much the author complains about changing it.  Of
course there may be cases where there is a legitimate difference of
opinion concerning the best way forward, but I don't believe this is
one of them.


Hmm, just to make the risk more concrete, here's one scenario that could 
happen:

1. Startup process tries to acquire a cleanup lock on a page. It's pinned, 
so it has to wait, and calls ResolveRecoveryConflictWithBufferPin().

2. ResolveRecoveryConflictWithBufferPin() enables the standby SIGALRM 
handler by calling enable_standby_sig_alarm(), and calls 
ProcWaitForSignal().

3. ProcWaitForSignal() calls semop() (assuming SysV semaphores here) to 
wait for the process semaphore.

4. Max standby delay is reached and SIGALRM fires. CheckStandbyTimeout() 
is called in the signal handler. CheckStandbyTimeout() calls 
SendRecoveryConflictWithBufferPin(), which calls CancelDBBackends().

5. CancelDBBackends() tries to acquire ProcArrayLock in exclusive mode. 
It's being held by another process, so we have to sleep.

6. To sleep, LWLockAcquire() calls PGSemaphoreLock(), which calls semop() 
to wait on the process semaphore.

So we now have the same process nested twice inside a semop() call. 
Looking at the Linux signal(7) man page, it is undefined what happens 
if semop() is re-entered in a signal handler while another semop() call 
is in progress in the main line of execution. Assuming it somehow works, 
which semop() call is going to return when the semaphore is incremented?


Maybe that's OK: if I'm reading the deadlock checker code correctly, it 
also calls semop() to increment another process's semaphore, and the 
deadlock checker can be invoked from a signal handler while we're in 
semop() waiting on our own process's semaphore. BTW, sem_post(), which is 
the POSIX function for incrementing a semaphore, is listed as safe to 
call in a signal handler. But it's certainly fishy.


A safer approach would be to just PGSemaphoreUnlock() in the signal 
handler, and do all the other processing outside it. You'd still call 
semop() within semop(), but at least it would be closer to the semop() 
within semop() we already do in the deadlock checker. And there would be 
less randomness from timing and lock contention involved, making it 
easier to test the behavior on various platforms.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Cost estimates for parameterized paths

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 2:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 On reflection I think that for parameterized paths the problem won't be
 too bad, because (a) we'll ignore parameterized paths except when
 considering a join to the right outer rel, so their presence in the
 rel's pathlist won't cost much otherwise,

Hmm.  Maybe they should go into a separate path list, and perhaps we
could do the min/max calculations only with that pathlist (at least
for now), thus avoiding taking a generalized penalty to handle this
specific case.  IIUC, a parameterized path should never cause an
unparameterized path to be thrown out, even if the unparameterized
path costs more than the worst-case cost for the parameterized path,
because the parameterized path constrains the possible join strategies
higher up, so what looked like a great idea at first blush might turn
out to suck when the chips are down.  Then, too, I'm not sure we can
even guarantee it will always be possible to form a valid plan around
a given parameterized path, which is another good reason not to
discard any unparameterized alternatives that may exist.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Streaming a base backup from master

2010-09-03 Thread David Blewett
On Fri, Sep 3, 2010 at 12:23 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 03/09/10 18:53, David Blewett wrote:

 On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane t...@sss.pgh.pa.us  wrote:

 IOW, what I'd like to see is protocol extensions that allow an external
 copy of rsync to be invoked; not build in rsync, or tar, or anything
 else that we could get off-the-shelf.

 Personally, I would love to see protocol-level compression added.
 (Yes, going over a compressed SSH tunnel works well, but in general
 isn't user-friendly.)

 Josh: we talked on IRC awhile back and you mentioned that CMD had
 added this in Mammoth? Would you be interested in having someone get
 that integrated back into the community?

 There's a recent thread on pgsql-general about just that:
 http://archives.postgresql.org/pgsql-general/2010-08/msg3.php

 I agree with Tom's comments there, I'd like to have something to
 enable/disable SSL compression rather than implement our own. There was some
 discussion that it might not be available on JDBC SSL implementations, but
 if it's done in our protocol, you'll need changes to the client to make it
 work anyway.

While I agree that combining SSL with compression is a great win, I'm
not sold on Tom's argument that compression is only needed in WAN
situations. I've seen great benefit from using an SSH tunnel with
compression over LAN connections (100 and 1000 Mbps). At work, we
have a private WAN on which it would be nice to use compression with
no encryption. I think it's a general-use thing. While I know
it's not the best argument, MySQL does provide compression at the
connection level.

David Blewett



Re: OT: OFF TOPIC: [HACKERS] returning multiple result sets from a stored procedure

2010-09-03 Thread Josh Berkus
John,

 I honestly do not mean any offence, just out of curiosity.
 
 If you guys care about money and time why would you spend the best years
 of your life basically copying commercial products for free?

We don't do it to copy commercial products.  We do it to build something
better than them.

 I do not have experience with open source and I kind of thought open
 source guys do not need or care about money and time.

It's a common misapprehension that open source software is somehow
produced for free.  The press has contributed to this myth a great deal
by calling open source "socialism" and "altruism".  What's actually true
about open source is that the organization which releases the product
(the open source project) is not necessarily the same organization which
pays the developers.  However, if you look at any mature, large open
source project you will find that at least 1/4 of its code contributors
are paid to work on the project by *someone*, and that those paid
developers account for 70% to 95% of the code.  PostgreSQL is no
exception to this rule.

The three differences between an open source project and proprietary
software in terms of adding new features are:

a) it's "pay or play", which means that you have the option of writing
the new feature yourself instead of funding it in cash;

b) the cost of developing new features, if you choose to fund them, is
much lower (generally a couple of orders of magnitude lower) than for
proprietary software, because of the open market for developers and the
greater efficiency of OSS development; and

c) it's *much* easier for multiple companies to contribute to the same
project if that project is open source than if it's proprietary.

Ultimately, however, if a feature is going to be added to any OSS
project, that feature is going to be paid for by someone, either in
money, time, or both.

It does help us to get feedback like the feedback you gave earlier, even
if you can't contribute to the project, because it helps us prioritize
new features.  But you should recognize that if you're not contributing
money or time to the project, you may have a long wait for the feature
*you* want.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: OT: OFF TOPIC: [HACKERS] returning multiple result sets from a stored procedure

2010-09-03 Thread Josh Berkus
On 9/3/10 2:20 PM, Josh Berkus wrote:
 However, if you look at any mature, large open
 source project you will find that at least 1/4 of its code contributors
 are paid to work on the project by *someone*, and that those paid
 developers account for 70% to 95% of the code.

Relevant link for this:
http://apcmag.com/linux-now-75-corporate.htm

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Robert Haas
On Fri, Sep 3, 2010 at 4:20 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Maybe that's ok, if I'm reading the deadlock checker code correctly, it also
 calls semop() to increment another process's semaphore, and the deadlock
 checker can be invoked from a signal handler while in semop() to wait on our
 process' semaphore. BTW, sem_post(), which is the Posix function for
 incrementing a semaphore, is listed as a safe function to call in a signal
 handler. But it's certainly fishy.

Color me confused; I may need to backpedal rapidly here.  I had
thought that what Tom was complaining about was the fact that the
signal handler was taking LWLocks, which I would have thought to be
totally unsafe.  But it seems the deadlock detector does the same
thing, more or less because the signal handlers are set up so that
they don't do anything unless we're within a limited section of code
where nothing too interesting can happen.  I'm not too sure why we
think that it's safe to invoke the deadlock detector that way, but
it's also not too clear to me why this case is any worse.

It furthermore appears that Simon's reply to Tom's complaint about
this function was: "This was modelled very closely on
handle_sig_alarm() and was reviewed by other hackers. I'm not great on
that, as you know, so if you can explain what it is I can't do, and
how that differs from handle_sig_alarm running the deadlock detector
in the same way, then I'll work on it some more."

I guess I need the same explanation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 A safer approach would be to just PGSemaphoreUnlock() in the signal 
 handler, and do all the other processing outside it.

I don't see any particularly good reason to assume that
PGSemaphoreUnlock is safe either: you're still talking about nested
semop operations.

The pre-existing SIGALRM handler uses a self-signal (kill(MyProcPid,
SIGINT)) to kick the process off any wait it might be doing.  I'd rather
do something like that.

Or maybe the work you're doing on latches would help ...

regards, tom lane



Re: [HACKERS] Cost estimates for parameterized paths

2010-09-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Sep 3, 2010 at 2:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 On reflection I think that for parameterized paths the problem won't be
 too bad, because (a) we'll ignore parameterized paths except when
 considering a join to the right outer rel, so their presence in the
 rel's pathlist won't cost much otherwise,

 Hmm.  Maybe they should go into a separate path list, and perhaps we
 could do the min/max calculations only with that pathlist (at least
 for now), thus avoiding taking a generalized penalty to handle this
 specific case.  IIUC, a parameterized path should never cause an
 unparameterized path to be thrown out,

Yeah, but the converse isn't true.  I had considered the idea of keeping
parameterized paths in a separate list, but you'd still have to examine
the main list to look for unparameterized paths that might dominate them.

regards, tom lane



Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Color me confused; I may need to backpedal rapidly here.  I had
 thought that what Tom was complaining about was the fact that the
 signal handler was taking LWLocks, which I would have thought to be
 totally unsafe.

Well, it's unsafe if the signal could interrupt mainline code that is
trying to take an LWLock or already holds the same LWLock --- or, if you
consider deadlock risks against other processes, already holding other
LWLocks might be problematic too.  And trying to use facilities like
elog or palloc is also pretty unsafe if the mainline code is using them too.

The reason the deadlock checker is okay is that there is only a *very*
narrow range of mainline code that it could possibly be interrupting,
basically nothing but the PGSemaphoreLock() call in ProcSleep.
(There's a lot of other code inside the loop there, but notice that it's
protected by tests that ensure that it won't run unless the deadlock
checker already did.)

One salient thing about ProcSleep is that you shouldn't call it while
holding any LWLocks, and another is that if you're sleeping while
holding regular heavyweight locks, the deadlock checker is exactly what
will get you out of trouble if there's a deadlock.

Now the HS case likewise appears to be set up so that the signal can
only directly interrupt ProcWaitForSignal, so I think the core issue is
whether any deadlock situations are possible.  Given that this gets
called from a low-level place like LockBufferForCleanup, I don't feel
too comfortable about that.  I certainly haven't seen any analysis or
documentation of what locks can safely be held at that point.
The deadlock checker only tries to take the LockMgr LWLocks, so
extrapolating from whether it is safe to whether touching the
ProcArrayLock is safe seems entirely unfounded.

It might be worth pointing out here that LockBufferForCleanup is already
known to be a risk factor for undetected deadlocks, even without HS in
the picture, because of the possibility of deadlocks involving a chain
of both heavyweight locks and LWLocks.  Whether HS makes it materially
worse may be something that we need field experience to determine.

regards, tom lane


