Re: [HACKERS] leaky views, yet again
On 07.10.2010 06:39, Robert Haas wrote:
> On Tue, Oct 5, 2010 at 3:42 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Right, *column* filtering seems easy and entirely secure. The angst here is about row filtering. Can we have a view in which users can see the values of a column for some rows, with perfect security that they can't identify values for the hidden rows? The stronger form is that they shouldn't even be able to tell that hidden rows exist, which is something your view doesn't try to do; but there are at least some applications where that would be desirable.
>
> I took a crack at documenting the current behavior; see attached.

Looks good. It gives the impression that you need to be able to create a custom function to exploit it, though. It would be good to mention that internal functions can be used too; revoking access to CREATE FUNCTION does not make you safe.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
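For illustration, a minimal sketch of the exploit under discussion, assuming a hypothetical accounts table behind a row-filtering view (the function name leak is made up). Because the planner orders quals by cost, a cheap user-defined function can be evaluated before the view's security qual:

    CREATE TABLE accounts (owner text, balance int);
    CREATE VIEW my_accounts AS
        SELECT * FROM accounts WHERE owner = current_user;

    -- a cheap, side-effecting function the planner may push below the view's filter
    CREATE FUNCTION leak(text, int) RETURNS boolean AS $$
    BEGIN
        RAISE NOTICE 'leaked: % %', $1, $2;  -- prints rows the view should hide
        RETURN true;
    END
    $$ LANGUAGE plpgsql COST 0.0000001;

    SELECT * FROM my_accounts WHERE leak(owner, balance);

Per Heikki's point, built-in operators and casts can achieve a similar effect, since an error raised before the security qual can expose a hidden value; a sketch of that variant appears later in the thread.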
Re: [HACKERS] todo point: plpgsql - scrollable cursors are supported
2010/10/7 Robert Haas <robertmh...@gmail.com>:
> On Mon, Oct 4, 2010 at 2:52 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> I think you can remove scrollable cursor support from the plpgsql ToDo. Scrollable cursors are supported, and the supported syntax is the same as in the core SQL language.
>
> I agree, removed. I also removed WITH HOLD cursors, which we seem to have as well.

I don't think we support the WITH HOLD cursor syntax in plpgsql yet. Maybe we have similar functionality; I don't know.

Pavel

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
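For reference, a minimal sketch of the plpgsql scrollable-cursor syntax Pavel is referring to (object names are illustrative):

    CREATE FUNCTION last_item() RETURNS int AS $$
    DECLARE
        c SCROLL CURSOR FOR SELECT i FROM generate_series(1, 10) AS s(i);
        result int;
    BEGIN
        OPEN c;
        FETCH LAST FROM c INTO result;  -- jumping to the end requires SCROLL
        CLOSE c;
        RETURN result;
    END
    $$ LANGUAGE plpgsql;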
Re: [HACKERS] Perf regression in 2.6.32 (Ubuntu 10.04 LTS)
Robert Haas wrote:
> Greg, have you run into any other evidence suggesting a problem with 2.6.32? I haven't actually checked myself yet.

Right now the only distribution shipping 2.6.32 usefully is Ubuntu 10.04, which I can't recommend anyone use on a server because their release schedules are way too aggressive to ever deliver stable versions anymore. So until either RHEL6 or Debian Squeeze ships, very late this year or early next, the performance of 2.6.32 is irrelevant to me. And by then I'm hoping that the early adopters will have squashed more of the obvious bugs here. 2.6.32 is 11 months old at this point, which makes it still a bleeding-edge kernel in my book.

--
Greg Smith, 2ndQuadrant US  g...@2ndquadrant.com  Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Re: [HACKERS] On Scalability
Any feedback from TGL and Heikki, then?

2010/7/29 Joshua D. Drake <j...@commandprompt.com>:
> On Thu, 2010-07-29 at 19:52 +0200, Vincenzo Romano wrote:
>> 2010/7/29 Joshua D. Drake <j...@commandprompt.com>:
>>> On Thu, 2010-07-29 at 19:34 +0200, Vincenzo Romano wrote:
>>>> I expect that a more complex schema will imply higher workloads on the query planner. What I don't know is how the increase in the workload will happen: linearly, sublinearly, polynomially or what?
>>
>> Do you think I should ask somewhere else? Any hint?
>
> The two people that would likely know best are on vacation, TGL and Heikki. You may have to wait a bit.
>
> Sincerely, Joshua D. Drake
>
> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
> http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

--
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
cel +393398083886 fix +390823454163 fax +3902700506964
gtalk. vincenzo.rom...@notorand.it skype. notorand.it
NON QVIETIS MARIBVS NAVTA PERITVS
Re: [HACKERS] On Scalability
On 07.10.2010 10:09, Vincenzo Romano wrote:
> Any feedback from TGL and Heikki, then?

I don't have anything to add to what others have said already. Your best advice is to test it yourself. I would expect the plan time to be linear relative to the number of partial indexes or child tables involved, except that constraint exclusion of CHECK constraints on the partitions is exponential. But I also wouldn't be surprised if there's some other non-linear aspect that shows its head with thousands of partitions.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
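A minimal harness for the "test it yourself" suggestion, assuming a 9.0 server (for DO) and psql; the table names and partition count are illustrative:

    CREATE TABLE parent (k int);

    DO $$
    BEGIN
        FOR i IN 1..1000 LOOP
            EXECUTE 'CREATE TABLE part_' || i
                 || ' (CHECK (k = ' || i || ')) INHERITS (parent)';
        END LOOP;
    END
    $$;

    SET constraint_exclusion = partition;
    \timing on
    -- for a trivial query this is mostly planning time; repeat with
    -- varying partition counts to see how it grows
    EXPLAIN SELECT * FROM parent WHERE k = 500;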
Re: [HACKERS] Issues with Quorum Commit
On Wed, 2010-10-06 at 10:57 -0700, Josh Berkus wrote:
> I also strongly believe that we should get single-standby functionality committed and tested *first*, before working further on multi-standby.

Yes, let's get k = 1 first. With k = 1 the number of standbys is not limited, so we can still have very robust and highly available architectures. So we mean first-acknowledgement-releases-waiters.

> (1) Consistency: this is another DBA-false-confidence issue. DBAs who implement (1) are liable to do so thinking that they are not only guaranteeing the consistency of every standby with the master, but the consistency of every standby with every other standby -- a kind of dummy multi-master. They are not, so it will take multiple reminders and workarounds in the docs to explain this. And we'll get complaints anyway.

This puts the matter very clearly. Setting k = N is not as good an idea as it sounds when first described.

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Issues with Quorum Commit
On Wed, 2010-10-06 at 10:57 -0700, Josh Berkus wrote:
> (2), (3) Degradation: (Jeff) these two cases make sense only if we give DBAs the tools they need to monitor which standbys are falling behind, and to drop and replace those standbys. Otherwise we risk giving DBAs false confidence that they have better-than-1-standby reliability when actually they don't. Current tools are not really adequate for this.

Current tools work just fine for identifying whether a server is falling behind. This improved in 9.0 to give fine-grained information. Nothing more is needed here within the server.

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
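For reference, the kind of fine-grained 9.0 information being alluded to: comparing WAL positions on the two sides shows how far a standby has fallen behind.

    -- on the master:
    SELECT pg_current_xlog_location();

    -- on the standby:
    SELECT pg_last_xlog_receive_location(),
           pg_last_xlog_replay_location();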
Re: [HACKERS] On Scalability
On Thu, 2010-10-07 at 10:28 +0300, Heikki Linnakangas wrote:
> constraint exclusion of CHECK constraints on the partitions is exponential

Constraint exclusion is linear with respect to the number of partitions. Why do you say exponential?

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] host name support in pg_hba.conf
On tis, 2010-10-05 at 22:28 -0400, Joachim Wieland wrote:
> This is exactly what I am seeing here. However, contrary to your case, the patch makes it even worse on my side. With the patch compiled in and a pg_hba.conf entry of "localhost", I cannot connect anymore with -h localhost; I get "no pg_hba.conf entry for host ::1". This is a mostly standard Ubuntu setup.

Could you post your /etc/hosts file?
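One plausible culprit, assuming the box has a stock Ubuntu-style /etc/hosts: if ::1 reverse-resolves to a name other than localhost, a pg_hba.conf host name of localhost can never match an IPv6 connection.

    127.0.0.1   localhost
    ::1         ip6-localhost ip6-loopback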
Re: [HACKERS] On Scalability
On 07.10.2010 10:41, Simon Riggs wrote:
> On Thu, 2010-10-07 at 10:28 +0300, Heikki Linnakangas wrote:
>> constraint exclusion of CHECK constraints on the partitions is exponential
>
> Constraint exclusion is linear with respect to the number of partitions. Why do you say exponential?

For some reason I thought the planner needs to check the constraints of the partitions against each other, but you're right, clearly that's not the case. Linear it is.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Issues with Quorum Commit
On 10/06/2010 10:01 PM, Simon Riggs wrote:
> The code to implement your desired option is more complex and really should come later.

I'm sorry, but I think of that exactly the opposite way. The timeout for automatic continuation after waiting for a standby is the addition. The wait state of the master is there anyway, whether or not it's bound by a timeout. The timeout option should thus come later.

Regards

Markus Wanner
Re: [HACKERS] Issues with Quorum Commit
Markus Wanner <mar...@bluegap.ch> writes:
>> I'm just saying that this should be an option, not the only choice.
>
> I'm sorry, I just don't see the use case for a mode that drops guarantees when they are most needed. People who don't need those guarantees should definitely go for async replication instead.

We're still talking about freezing the master and all the applications when the first standby still has to do a base backup and catch up to where the master currently is, right?

> What does a synchronous replication mode that falls back to async upon failure give you, except for a severe degradation in performance during normal operation? Why not use async right away in such a case?

It's all about the standard case you're building, sync rep, and how to manage errors. In most cases I want flexibility. An alert says the standby is down, you've lost your durability requirements, so now I'm building a new standby. Does that mean my applications are all off and the master is refusing to work? I sure hope I can choose about that, if possible per application.

Next step: the old standby has been able to boot again, thanks to the sysadmins who repaired it, so it's online again, and my replacement machine is doing a base backup. Are all the applications still unavailable? I sure hope I have a word in this decision.

>> so opening a superuser connection to act on the currently waiting transaction is still possible (pass/fail, but fail is what at this point? shutdown to wait some more offline?).
>
> Not sure I'm following here. The admin will be busy re-establishing (connections to) standbys; killing transactions on the master doesn't help anything - whether or not the master waits forever.

The idea here would be to be able to manually ACK a transaction that's waiting forever, because you know it won't get an answer and you'd prefer the application to just continue. But I see that's not a valid use case for you.

Regards,

--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Sync Rep at Oct 5
On Wed, Oct 6, 2010 at 4:06 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> The problem is how much WAL is stored on (any) node. Currently that is wal_keep_segments, which doesn't work very well, but I've seen no better ideas that cover all important cases.

What about allowing the master to read and send WAL from the archive?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Issues with Quorum Commit
On 07.10.2010 12:52, Dimitri Fontaine wrote:
> Markus Wanner <mar...@bluegap.ch> writes:
>>> I'm just saying that this should be an option, not the only choice.
>>
>> I'm sorry, I just don't see the use case for a mode that drops guarantees when they are most needed. People who don't need those guarantees should definitely go for async replication instead.
>
> We're still talking about freezing the master and all the applications when the first standby still has to do a base backup and catch up to where the master currently is, right?

Either that, or you configure your system for asynchronous replication first, and flip the switch to synchronous only after the standby has caught up. Setting up the first standby happens only once, when you initially set up the system, or if you're recovering from a catastrophic loss of the standby.

>> What does a synchronous replication mode that falls back to async upon failure give you, except for a severe degradation in performance during normal operation? Why not use async right away in such a case?
>
> It's all about the standard case you're building, sync rep, and how to manage errors. In most cases I want flexibility. An alert says the standby is down, you've lost your durability requirements, so now I'm building a new standby. Does that mean my applications are all off and the master is refusing to work?

Yes. That's why you want to have at least two standbys if you care about availability. Or, if durability isn't that important to you after all, use asynchronous replication.

Of course, if in the heat of the moment the admin is willing to forge ahead without the standby, he can temporarily change the configuration on the master. If you want the standby to be rebuilt automatically, you can incorporate that configuration change in the scripts too. The important point is that you or your scripts are in control, and you know at all times whether you can trust the standby or not. If the master makes such decisions automatically, you don't know whether the standby is trustworthy (i.e. guaranteed up-to-date) or not.

>>> so opening a superuser connection to act on the currently waiting transaction is still possible (pass/fail, but fail is what at this point? shutdown to wait some more offline?).
>>
>> Not sure I'm following here. The admin will be busy re-establishing (connections to) standbys; killing transactions on the master doesn't help anything - whether or not the master waits forever.
>
> The idea here would be to be able to manually ACK a transaction that's waiting forever, because you know it won't get an answer and you'd prefer the application to just continue. But I see that's not a valid use case for you.

I don't see anything wrong with having tools for admins to deal with the unexpected. I'm not sure overriding individual transactions is very useful, though; more likely you'll want to take the whole server offline, or change the config to allow all transactions to continue without the synchronous standby.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
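A sketch of that async-first, flip-to-sync sequence; the GUC name synchronous_replication is purely hypothetical here, since the actual knobs were still being designed in this thread:

    # postgresql.conf on the master, while the new standby takes its base backup:
    synchronous_replication = off   # hypothetical knob: run asynchronously for now

    # once the standby has demonstrably caught up:
    synchronous_replication = on    # flip to synchronous, then reload the config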
Re: [HACKERS] Issues with Quorum Commit
Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
> Either that, or you configure your system for asynchronous replication first, and flip the switch to synchronous only after the standby has caught up. Setting up the first standby happens only once, when you initially set up the system, or if you're recovering from a catastrophic loss of the standby.

Or if the standby is lagging and the master's wal_keep_segments is not sized big enough. Is that a catastrophic loss of the standby too?

>> It's all about the standard case you're building, sync rep, and how to manage errors. In most cases I want flexibility. An alert says the standby is down, you've lost your durability requirements, so now I'm building a new standby. Does that mean my applications are all off and the master is refusing to work?
>
> Yes. That's why you want to have at least two standbys if you care about availability. Or, if durability isn't that important to you after all, use asynchronous replication.

Agreed, that's a nice simple use case. Another one is to say that I want sync rep when the standby is available, but I don't have the budget for more. So I prefer a good alerting system and low-budget-no-guarantee when the standby is down; that's my risk evaluation.

> Of course, if in the heat of the moment the admin is willing to forge ahead without the standby, he can temporarily change the configuration on the master. [...] The important point is that you or your scripts are in control, and you know at all times whether you can trust the standby or not. If the master makes such decisions automatically, you don't know whether the standby is trustworthy (i.e. guaranteed up-to-date) or not.

My proposal is that the master has the information to make the decision, and the behavior is something you set up. Default to security, so wait forever and block the applications, but allow it to be set to ignore standbys that have not at least reached a given state. I don't see that you can make everybody happy without a knob here, and I don't see how we can deliver one without a clear state diagram of the standby's possible states and transitions.

The other alternative is to just not care, and accept the timeout as an option alongside the quorum, so that you just don't wait for the quorum if you so choose. It's much more dynamic and dangerous, but with a good alerting system it'll be very popular, I guess.

> I don't see anything wrong with having tools for admins to deal with the unexpected. I'm not sure overriding individual transactions is very useful, though; more likely you'll want to take the whole server offline, or change the config to allow all transactions to continue without the synchronous standby.

The question then is: should the new configuration alter running transactions? My implicit assumption was that it shouldn't, and then I need another facility, such as:

    SELECT pg_cancel_quorum_wait(procpid)
      FROM pg_stat_activity
     WHERE waiting_quorum;

Regards,

--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Issues with Quorum Commit
On Thu, 2010-10-07 at 11:46 +0200, Markus Wanner wrote:
> On 10/06/2010 10:01 PM, Simon Riggs wrote:
>> The code to implement your desired option is more complex and really should come later.
>
> I'm sorry, but I think of that exactly the opposite way.

I see why you say that. Dimitri's suggestion is an enhancement of the basic feature, just as Heikki's is. My reply was directed at Heikki, but should apply to Dimitri's idea as well.

> The timeout for automatic continuation after waiting for a standby is the addition. The wait state of the master is there anyway, whether or not it's bound by a timeout. The timeout option should thus come later.

Adding a timeout is very little code. We can take that out of the patch if that's an objection.

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] patch: SQL/MED(FDW) DDL
On Tue, 05 Oct 2010 14:59:27 -0400 Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> I wonder if we should focus our first efforts on really simple cases like CSV files (as you mentioned) and perhaps something like memcached, which has different properties than a CSV file, but extremely simple ones. I think it's inevitable that the API is going to get more complicated from release to release, and probably not in backward-compatible ways; I think it's too early to be worried about that.
>
> +1 ... we can *not* assume that we know enough to get the API right the first time. It'll have to be stated up front that it's subject to change, and you probably shouldn't be trying to develop FDWs separately from the main project for a while yet. I think that looking only at CSV might be too simple, though. It'd be good to be at least trying to build a remote-PG adapter, even with the knowledge that it's all throwaway code.

Thanks for the comments, and sorry for the delayed reply.

I propose a revised version of the FDW API which supports both CSV and PG. It is intended as a minimal API to support SQL/MED, so features such as query optimization or use of remote indexes are not considered much.

==

This FDW API is designed to support scanning a foreign table. As the SQL/MED standard says, updating external data is not supported. The instance of a foreign table may be a CSV file, a table on an external PostgreSQL server, or anything that can generate PG-style tuples.

Scanning a foreign table is done in the steps below. Note: FdwRoutine->XXX() means an FDW API function.

1. The parser parses a query string and generates a query tree. In this step, a foreign table is not distinguished from a local table.

2. The optimizer generates a Path pathnode (pathtype = T_ForeignPath) for each foreign table in the query. Because foreign tables have neither TIDs nor indexes (at least for now), we don't add a new Path type. In this step, the optimizer calls FdwRoutine->GetStatistics() to get estimated costs of the foreign scan. The FDW can calculate the costs by itself from some statistics, get them from the remote side, or leave the costs as they are.

3. The planner generates a ForeignScan plannode for each foreign Path. ForeignScan could be a typedef of Scan because it has no other member, but it has been defined as an independent structure so that other members can be added in the future.

4. The executor executes each ForeignScan node in the steps below. Throughout step 4, the FDW can keep its own information about the foreign scan in ForeignScan->FdwReply.

4-a. To initialize ForeignScan plannodes, ExecInitForeignScan() is called for each ForeignScan node. In ExecInitForeignScan(), a ForeignScanState is created from:

* a ScanState, which holds the same information as SeqScan;
* an FdwRoutine pointer (for caching);
* some catalog information about the foreign table (for caching);
* a connection established with FdwRoutine->ConnectServer().

The FDW can use GetFSConnectionByName(name) inside FdwRoutine->ConnectServer() to get a pooled connection by name. Then ExecInitForeignScan() calls FdwRoutine->Open() to tell the FDW that it's time to start the query. The FDW can do some initialization if necessary.

4-b. To retrieve a tuple from the foreign table, ExecForeignScan() calls FdwRoutine->Iterate(). The FDW should set the next tuple into the ss_ScanTupleSlot of the ScanState, or empty the slot to indicate EOF. Projection and qual evaluation are done in ExecScan() later, so ExecForeignScan() should return all columns/tuples in the external data store.

4-c. To reset the scan and rewind the cursor to the head of the foreign table, ExecForeignReScan() calls FdwRoutine->ReOpen(). This occurs when a ForeignScan node is the inner node of a nested loop join. The FDW is required to return the first tuple again at the next FdwRoutine->Iterate() call.

4-d. At the end of execution of the ForeignScan, ExecEndForeignScan() is called. ExecEndForeignScan() calls FdwRoutine->Close() to tell the FDW that no more Iterate calls will be made. The FDW can do some finalization if necessary.

5. Connections established via FdwRoutine->ConnectServer() are pooled in the backend for future queries that access the same foreign server. The pooling mechanism is implemented in the core module, not in each FDW. When a user executes the DISCARD ALL command, or the backend dies, FdwRoutine->FreeFSConnection() is called for each foreign connection to discard it.

To achieve the features above, I propose the following FDW API:

    /*
     * Almost same as SeqScan, but defined as a structure to allow adding
     * members in the future.
     */
    typedef struct ForeignScan
    {
        Scan        scan;
    } ForeignScan;

    /*
     * Handle to access FDW-dependent data.
     * Each FDW can use ForeignScanState->reply, casting between FdwReply
     * and the actual type.
     */
    typedef struct FdwReply FdwReply;

    /*
     * ForeignScanState node is used to store scan status.
     */
    typedef struct ForeignScanState
    {
        ScanState   ss;     /* its first field is NodeTag */
        ...
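For context, a sketch of the SQL/MED-style DDL such an API would sit beneath; CREATE FOREIGN DATA WRAPPER and CREATE SERVER exist since 8.4, while the foreign table syntax and the option names below are illustrative of the patch under discussion, not settled:

    CREATE FOREIGN DATA WRAPPER csv_wrapper;
    CREATE SERVER csv_files FOREIGN DATA WRAPPER csv_wrapper;

    CREATE FOREIGN TABLE passwd (username text, uid int)
        SERVER csv_files
        OPTIONS (filename '/tmp/passwd.csv', delimiter ',');

    -- planned as a ForeignScan; quals are evaluated in ExecScan() above Iterate()
    SELECT username FROM passwd WHERE uid >= 1000;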
Re: [HACKERS] todo point: plpgsql - scrollable cursors are supported
On Thu, Oct 7, 2010 at 2:38 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> 2010/10/7 Robert Haas <robertmh...@gmail.com>:
>> On Mon, Oct 4, 2010 at 2:52 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>>> I think you can remove scrollable cursor support from the plpgsql ToDo. Scrollable cursors are supported, and the supported syntax is the same as in the core SQL language.
>>
>> I agree, removed. I also removed WITH HOLD cursors, which we seem to have as well.
>
> I don't think we support the WITH HOLD cursor syntax in plpgsql yet. Maybe we have similar functionality; I don't know.

It's in the documentation... http://www.postgresql.org/docs/current/static/sql-declare.html

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] todo point: plpgsql - scrollable cursors are supported
2010/10/7 Robert Haas <robertmh...@gmail.com>:
> On Thu, Oct 7, 2010 at 2:38 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> I don't think we support the WITH HOLD cursor syntax in plpgsql yet. Maybe we have similar functionality; I don't know.
>
> It's in the documentation... http://www.postgresql.org/docs/current/static/sql-declare.html

Can be - but it isn't supported by plpgsql. This is similar to scrollable cursors: SQL has supported them for some years, but plpgsql has allowed them only for the last two. The minimal necessary change is a relatively simple update to plpgsql's parser - but I don't know whether there are other issues. See http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/pl/plpgsql/src/gram.y.diff?r1=1.99&r2=1.100

Regards

Pavel
Re: [HACKERS] Sync Rep at Oct 5
On 10-10-07 05:52 AM, Fujii Masao wrote:
> On Wed, Oct 6, 2010 at 4:06 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> The problem is how much WAL is stored on (any) node. Currently that is wal_keep_segments, which doesn't work very well, but I've seen no better ideas that cover all important cases.
>
> What about allowing the master to read and send WAL from the archive?

Then you have to deal with telling the archive how long it needs to keep WAL segments, because the master might ask for them back. If the archive is remote from the master, then you have some extra network copying going on. It would be better to let the slave that is being reconfigured read the missing WAL from the archive.
Re: [HACKERS] Issues with Quorum Commit
On 10/07/2010 01:08 PM, Simon Riggs wrote:
> Adding a timeout is very little code. We can take that out of the patch if that's an objection.

Okay. If you take it out, we are at the wait-forever option, right? If not, I definitely don't understand how you envision things happening. I've been asking [1] about that distinction before, but didn't get a direct answer.

Regards

Markus Wanner

[1]: Re: Configuring synchronous replication, Markus Wanner:
http://archives.postgresql.org/message-id/4c9c5887.4040...@bluegap.ch
Re: [HACKERS] todo point: plpgsql - scrollable cursors are supported
On Thu, Oct 7, 2010 at 7:34 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> Can be - but it isn't supported by plpgsql. This is similar to scrollable cursors: SQL has supported them for some years, but plpgsql has allowed them only for the last two. The minimal necessary change is a relatively simple update to plpgsql's parser - but I don't know whether there are other issues.

I think the issue there may be that the execution of the function is wrapped in a transaction anyway, so it's not really clear what WITH HOLD would do for you. Now, if we could decouple those things... THAT would be exciting, for a lot more reasons than just this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
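A short illustration of the conflict: at the SQL level, the whole point of WITH HOLD is to outlive the creating transaction, which a plpgsql function, running inside one transaction, cannot do:

    BEGIN;
    DECLARE c CURSOR WITH HOLD FOR
        SELECT i FROM generate_series(1, 100) AS s(i);
    COMMIT;            -- a holdable cursor survives the commit...

    FETCH 5 FROM c;    -- ...and can still be fetched afterwards
    CLOSE c;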
Re: [HACKERS] Issues with Quorum Commit
On Thu, Oct 7, 2010 at 3:30 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> Yes, let's get k = 1 first. With k = 1 the number of standbys is not limited, so we can still have very robust and highly available architectures. So we mean first-acknowledgement-releases-waiters.

+1. I like the design Greg Smith proposed yesterday (though there are details to be worked out).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] leaky views, yet again
On Thu, Oct 7, 2010 at 2:02 AM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
> On 07.10.2010 06:39, Robert Haas wrote:
>> I took a crack at documenting the current behavior; see attached.
>
> Looks good. It gives the impression that you need to be able to create a custom function to exploit it, though. It would be good to mention that internal functions can be used too; revoking access to CREATE FUNCTION does not make you safe.

OK, second try attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

[Attachment: document-leaky-views-v2.patch]
Re: [HACKERS] On Scalability
2010/10/7 Heikki Linnakangas <heikki.linnakan...@enterprisedb.com>:
> On 07.10.2010 10:41, Simon Riggs wrote:
>> Constraint exclusion is linear with respect to the number of partitions. Why do you say exponential?
>
> For some reason I thought the planner needs to check the constraints of the partitions against each other, but you're right, clearly that's not the case. Linear it is.

Making these things sub-linear (whether O(log n) or even O(1)), provided there's a way to do it, would make this RDBMS more appealing to enterprises. I mean also partial indexes (as an alternative to table partitioning). Being able to cope effectively with only a dozen child tables or so makes it more like an amateur feature. If you really need partitioning (or just hierarchical stuff), I think you'll need quite a lot more than a dozen items. If you partition by just weeks, you'll need 50+ a year.

Is there any precise direction on where to look in the code for this? Is there a way to put this on a wish list?

--
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
NON QVIETIS MARIBVS NAVTA PERITVS
Re: [HACKERS] Issues with Quorum Commit
Salut Dimitri,

On 10/07/2010 12:32 PM, Dimitri Fontaine wrote:
> Another one is to say that I want sync rep when the standby is available, but I don't have the budget for more. So I prefer a good alerting system and low-budget-no-guarantee when the standby is down; that's my risk evaluation.

I think that's a pretty special case, because a good alerting system is at least as expensive as another server that just persistently stores and ACKs incoming WAL. Why would one ever want the guarantee that sync replication gives to hold true only up to one failure, if a better guarantee doesn't cost anything extra? (Note that a good alerting system is impossible to achieve with only two servers. You need a third device anyway.)

Or put another way: a good alerting system is one that understands Postgres to some extent. It protects you from data loss in *every* case. If you attach at least two database servers to it, you get availability as long as any one of the two is up and running. No matter what happened before, even a full cluster power outage is guaranteed to recover automatically without any data loss.

[Okay, the standby mode that only stores and ACKs WAL without having a full database behind it still needs to be written. However, pg_streamrecv certainly goes in that direction already; see [1].]

Sync replication between really just two servers is asking for trouble and certainly not worth the savings in hardware cost. Better to invest in a good UPS and redundant power supplies for a single server.

> The question then is, should the new configuration alter running transactions?

It should definitely affect all currently running and waiting transactions. For anything beyond three servers, where quorum_commit could be bigger than one, it absolutely makes sense to be able to just lower the requirements temporarily, instead of having to cancel the guarantee completely.

Regards

Markus Wanner

[1]: Using streaming replication as log archiving, Magnus Hagander
http://archives.postgresql.org/message-id/aanlkti=_bzsyt8a1kjtpwzxnwyygqnvp1nbjwrnsd...@mail.gmail.com
Re: [HACKERS] Sync Rep at Oct 5
On Thu, Oct 7, 2010 at 8:46 PM, Steve Singer <ssin...@ca.afilias.info> wrote:
> Then you have to deal with telling the archive how long it needs to keep WAL segments, because the master might ask for them back.

Yeah, it's not easy to determine how long we should keep the archived WAL files. We need to calculate which WAL files are deletable according to the progress of each standby and the state of the stored base backups which might still be used for PITR.

> If the archive is remote from the master, then you have some extra network copying going on.

Yep, so I think that the master (i.e., walsender) should read the archived WAL file by using a restore_command specified by the user. If the archive is remote from the master, then we would need to specify something like scp in restore_command. Also, even if you compress the archived WAL file by using pg_compress, the master can decompress it by using pg_decompress in restore_command and transfer it.

> It would be better to let the slave that is being reconfigured read the missing WAL from the archive.

That's one of the choices. But I've heard that some people don't want to set up a shared archive area that can be accessed by both the master and the standby. For example, they feel that it's complex to configure an NFS server or automatic-scp-without-password for sharing the archived WAL files. Currently we have to increase wal_keep_segments to work around that problem. But the pg_xlog disk space is usually small and not suitable for keeping many WAL files, so we might be unable to increase wal_keep_segments. If we allow the master to stream WAL files from the archive, we don't need to increase wal_keep_segments or set up such a complex configuration. So this idea is one of the useful choices, I think.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
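A sketch of the kind of restore_command being described, assuming the archive lives on a remote host reachable over ssh (host and paths are illustrative; gzip stands in for whatever compression the archiver applied):

    # plain remote copy:
    restore_command = 'scp archive-host:/var/lib/pgarchive/%f %p'

    # decompressing on the fly:
    restore_command = 'ssh archive-host "cat /var/lib/pgarchive/%f.gz" | gunzip > %p'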
Re: [HACKERS] leaky views, yet again
* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Oct 7, 2010 at 2:02 AM, Heikki Linnakangas wrote:
>> Looks good. It gives the impression that you need to be able to create a custom function to exploit it, though. It would be good to mention that internal functions can be used too; revoking access to CREATE FUNCTION does not make you safe.
>
> OK, second try attached.

This might be overly pedantic, but I don't think 'tampering' gives the right impression. Also, there's a marked difference between viewing data by using built-ins such as casting (since you'll only get to see the first value in a column that fails the cast) and being able to write a function that pulls every row out of the table and dumps it into another table. I think it'd make a much bigger impression if you went ahead and changed the 'raise notice' to an 'insert into table x'.

Also, even if you can't create functions (due to lack of create privileges on any schema), you could use DO clauses now. Revoking usage rights on all languages should prevent both, though.

Thanks,

Stephen
Re: [HACKERS] Git cvsserver serious issue
On 09/23/2010 01:18 PM, Aidan Van Dyk wrote:
> On Thu, Sep 23, 2010 at 11:49 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Magnus Hagander <mag...@hagander.net> writes:
>>> On Thu, Sep 23, 2010 at 17:32, Andrew Dunstan <and...@dunslane.net> wrote:
>>>> Are we sure that's going to stop the DOS issue?
>>>
>>> As long as it's done right, I don't see how it wouldn't.
>>
>> There might be a cleaner way to do it, but after a moment's inspection of the script, I'd be inclined to just hack GITCVS::updater->new() to throw an error if $module is neither master nor REL\d_\d_STABLE. Keep in mind of course that I'm a lousy Perl coder.
>
> Here's a quick change that will allow you to specify the modules to export in the gitcvs section, like:
>
>     [gitcvs]
>         enabled=1
>         modules=master,REL9_0_STABLE,REL8_4_STABLE
>
> --- git-cvsserver.orig  2010-09-23 12:03:06.0 -0400
> +++ git-cvsserver       2010-09-23 13:16:53.0 -0400
> @@ -2771,6 +2771,12 @@
>      die "Git repo '$self->{git_path}' doesn't exist" unless ( -d $self->{git_path} );
>
> +    if (defined $cfg->{gitcvs}{modules})
> +    {
> +        $log->debug("Limiting modules: " . $cfg->{gitcvs}{modules});
> +        die "Invalid module $module" unless map {/^ *$module$/} split(',', $cfg->{gitcvs}{modules});
> +    }
> +
>      $self->{dbdriver} = $cfg->{gitcvs}{$state->{method}}{dbdriver} || $cfg->{gitcvs}{dbdriver} || "SQLite";
>      $self->{dbname} = $cfg->{gitcvs}{$state->{method}}{dbname} ||

OK, let's go with that. I was kinda hoping that we wouldn't have to do this at all, but Stefan has been having serious problems getting git to build and run on spoonbill, and I don't want to take up more of his time or be without it for very long.

cheers

andrew
Re: [HACKERS] leaky views, yet again
On 07.10.2010 16:10, Stephen Frost wrote:
> Also, even if you can't create functions (due to lack of create privileges on any schema), you could use DO clauses now.

There's no way to shoehorn a DO clause into a SELECT; you can't do:

    SELECT data FROM view WHERE (DO $$ RAISE NOTICE 'argument'; $$) = 1

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] leaky views, yet again
On Thu, Oct 7, 2010 at 9:10 AM, Stephen Frost <sfr...@snowman.net> wrote:
> This might be overly pedantic, but I don't think 'tampering' gives the right impression.

I'm open to suggestions.

> Also, there's a marked difference between viewing data by using built-ins such as casting (since you'll only get to see the first value in a column that fails the cast) and being able to write a function that pulls every row out of the table and dumps it into another table.

Well, that's why I give the more serious example first. Even with casting failures, there's a good chance you can probe a bunch of different rows by throwing random filter conditions into the clause:

    function_that_returns_false() OR (name = 'some_constant' AND number::box)

> I think it'd have a much bigger impression if you went ahead and changed the 'raise notice' to an 'insert into table x;'.

Possibly, but it makes the example slightly longer, and I think it's clear enough as-is.

> Also, even if you can't create functions (due to lack of create privileges on any schema), you could use DO clauses now. Revoking usage rights on all languages should prevent both though.

I don't see how to make it work with a DO clause.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
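Rendering the probing idea concretely, reusing the hypothetical accounts/my_accounts sketch from earlier in the thread; whether the cast runs before the view's filter depends on the planner's cost ordering, so this may or may not fire on a given build:

    -- assumes: CREATE VIEW my_accounts AS
    --              SELECT * FROM accounts WHERE owner = current_user;
    SELECT * FROM my_accounts WHERE (owner || ' ' || balance)::int = 0;
    -- ERROR: invalid input syntax for integer: "bob 1234"
    -- the error text exposes a row the view was supposed to hide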
Re: [HACKERS] Issues with Quorum Commit
Markus Wanner <mar...@bluegap.ch> writes:
> Why would one ever want the guarantee that sync replication gives to hold true only up to one failure, if a better guarantee doesn't cost anything extra? (Note that a good alerting system is impossible to achieve with only two servers. You need a third device anyway.)

I think you're all into durability, and that's good. The extra cost is service downtime if that's not what you're after: there's also availability, and load-balancing read queries across a system with no lag (no serving of stale data) when all is working right.

I still think your use case is a solid one, but we need to be ready to answer some other ones, which you call relaxed and wrong because of data-loss risks. My proposal is to make the risk window obvious and the behavior on entering it configurable.

Regards,

--
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Sync Rep at Oct 5
On Thu, Oct 7, 2010 at 9:08 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
> If we allow the master to stream WAL files from the archive, we don't need to increase wal_keep_segments or set up such a complex configuration. So this idea is one of the useful choices, I think.

I'm not sure anyone other than yourself has endorsed this idea, but in any case it seems off the critical path for getting this feature committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] leaky views, yet again
* Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote:
> On 07.10.2010 16:10, Stephen Frost wrote:
>> Also, even if you can't create functions (due to lack of create privileges on any schema), you could use DO clauses now.
>
> There's no way to shoehorn a DO clause into a SELECT; you can't do:
>
>     SELECT data FROM view WHERE (DO $$ RAISE NOTICE 'argument'; $$) = 1

Wow, I just kind of assumed you could; I'm not really sure why. Perhaps it'll be possible in the future though, so it might be something to think about if/when that happens. I can't see a way to abuse the view from inside a DO block or a function in the same way either.

Stephen
Re: [HACKERS] leaky views, yet again
* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Oct 7, 2010 at 9:10 AM, Stephen Frost <sfr...@snowman.net> wrote:
>> This might be overly pedantic, but I don't think 'tampering' gives the right impression.
>
> I'm open to suggestions.

Yeah, I wasn't coming up with a better word myself. :/ Maybe something with "circumvented"?

>> Also, even if you can't create functions (due to lack of create privileges on any schema), you could use DO clauses now. Revoking usage rights on all languages should prevent both though.
>
> I don't see how to make it work with a DO clause.

Yeah, Heikki pointed out that I was assuming PG could work more magic than it can today. :) Sorry for the noise.

Stephen
Re: [HACKERS] On Scalability
On Thu, Oct 7, 2010 at 8:10 AM, Vincenzo Romano <vincenzo.rom...@notorand.it> wrote:
> Making these things sub-linear (whether O(log n) or even O(1)), provided there's a way to do it, would make this RDBMS more appealing to enterprises. I mean also partial indexes (as an alternative to table partitioning). Being able to cope effectively with only a dozen child tables or so makes it more like an amateur feature. If you really need partitioning (or just hierarchical stuff), I think you'll need quite a lot more than a dozen items. If you partition by just weeks, you'll need 50+ a year. Is there any precise direction on where to look in the code for this? Is there a way to put this on a wish list?

Well, you can't just arbitrarily turn an O(n) algorithm into an O(lg n) algorithm. I think the most promising approach to scaling to large numbers of partitions is the patch that Itagaki Takahiro was working on back in July. Unfortunately, that patch still needs a lot of work - and some redesign - before it will really meet our needs.

Right now, the way to set up partitioning is to create a parent table, create a bunch of child tables that inherit from it, put mutually exclusive CHECK constraints on all the children, and make sure constraint_exclusion is on so that the planner can notice when not all children need to be scanned (see the sketch just below). As a totally general architecture, this is probably hard to beat (or to make sublinear). However, if we have DDL that allows the user to say "this is a set of child tables that are range partitions on this key column, with these boundaries", then you should be able to make the constraint exclusion calculations much more efficient, because the planner won't have to infer so much from first principles. O(lg n) doesn't seem out of the question given that architecture.

I think, though, that that is still some way off. If you're in a position to help with (or fund) the coding, it can be made to happen faster, of course.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
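A minimal sketch of that setup (table name and range boundaries are illustrative):

    CREATE TABLE measurements (logdate date, value int);

    CREATE TABLE measurements_w40
        (CHECK (logdate >= '2010-10-04' AND logdate < '2010-10-11'))
        INHERITS (measurements);
    CREATE TABLE measurements_w41
        (CHECK (logdate >= '2010-10-11' AND logdate < '2010-10-18'))
        INHERITS (measurements);

    SET constraint_exclusion = partition;  -- the default since 8.4

    -- the plan should touch only measurements_w40:
    EXPLAIN SELECT * FROM measurements WHERE logdate = '2010-10-05';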
Re: [HACKERS] Issues with Quorum Commit
On Thu, Oct 7, 2010 at 6:32 AM, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
> Or if the standby is lagging and the master's wal_keep_segments is not sized big enough. Is that a catastrophic loss of the standby too?

Sure, but that lagged standby is already asynchronous, not synchronous. If it was synchronous, it would have slowed the master down enough that it would not be lagged.

I'm really confused by all these k < N scenarios I see bandied about, because all they really amount to is "I only want *one* synchronous replica, and a bunch of asynchronous ones", with a bit of chance thrown into the mix, hoping that the synchronous one is pretty stable and the asynchronous ones aren't *too* far behind (define "too" and "far" at your leisure). And then I see a lot of posturing about how to recover when the asynchronous standbys aren't synchronous enough at some point...

> Agreed, that's a nice simple use case. Another one is to say that I want sync rep when the standby is available, but I don't have the budget for more. So I prefer a good alerting system and low-budget-no-guarantee when the standby is down; that's my risk evaluation.

That screams "wrong" in my books: "OK, I want durability, so I always want to have 2 copies of the data, but if we lose one copy, I want to keep on trucking, because I don't *really* want durability." If you want most-of-the-time mostly-2-copy durability, then really good asynchronous replication is a really good solution.

Yes, I believe you need to have a way for an admin (or process/control/config) to be able to demote a synchronous replication scenario into async (or standalone, which is just an extension of really async). But it's no longer synchronous replication at that point. And if the choice is made to keep trucking while a new standby is being brought online, made available and caught up, that's fine too. But during that period, until the slave is caught up and synchronously replicating, it's *not* synchronous replication.

So I'm not arguing that there shouldn't be a way to turn off synchronous replication once it's on, hopefully without having to take down the cluster (pg-instance-type cluster). But I am pleading that there be a way to set up PG such that synchronous replication *is* synchronously replicating, or things stop and back up until such a time as it is.

a.

--
Aidan Van Dyk                 Create like a god,
ai...@highrise.ca             command like a king,
http://www.highrise.ca/       work like a slave.
Re: [HACKERS] Git cvsserver serious issue
On Thu, Oct 7, 2010 at 15:16, Andrew Dunstan <and...@dunslane.net> wrote:
> On 09/23/2010 01:18 PM, Aidan Van Dyk wrote:
>> Here's a quick change that will allow you to specify the modules to export in the gitcvs section [...]
>
> OK, let's go with that. I was kinda hoping that we wouldn't have to do this at all, but Stefan has been having serious problems getting git to build and run on spoonbill, and I don't want to take up more of his time or be without it for very long.

Are we doing this *just* for spoonbill? If so, it's a lot easier to just filter by IP, so we don't have to maintain a patched version...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] On Scalability
Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
> On 07.10.2010 10:41, Simon Riggs wrote:
>> Constraint exclusion is linear with respect to the number of partitions. Why do you say exponential?
>
> For some reason I thought the planner needs to check the constraints of the partitions against each other, but you're right, clearly that's not the case. Linear it is.

Well, it's really more like O(mn) where m is the number of partitions and n is the number of clauses in the query --- and not only that, but the O() notation is hiding a depressingly high constant factor. And then there are practical problems like failing to exclude partitions as soon as there are any parameters in the query. There's basically no way that we're going to get decent performance for large numbers of partitions as long as we have to resort to theorem-proving to lead us to the correct partition.

regards, tom lane
Re: [HACKERS] On Scalability
2010/10/7 Robert Haas <robertmh...@gmail.com>:
> Well, you can't just arbitrarily turn an O(n) algorithm into an O(lg n) algorithm.

That's trivially true. I was not asking for the recipe to do it.

> I think the most promising approach to scaling to large numbers of partitions is the patch that Itagaki Takahiro was working on back in July. Unfortunately, that patch still needs a lot of work - and some redesign - before it will really meet our needs. Right now, the way to set up partitioning is to create a parent table, create a bunch of child tables that inherit from it, put mutually exclusive CHECK constraints on all the children, and make sure constraint_exclusion is on so that the planner can notice when not all children need to be scanned. As a totally general architecture, this is probably hard to beat (or to make sublinear).

This is exactly what's described in the official documentation. And everyone I ask for information before going deeper into testing gives me the same answer: don't try to use more than a dozen child tables.

> However, if we have DDL that allows the user to say "this is a set of child tables that are range partitions on this key column, with these boundaries", then you should be able to make the constraint exclusion calculations much more efficient, because the planner won't have to infer so much from first principles. O(lg n) doesn't seem out of the question given that architecture.

I see the main problem in the way the planner understands which partitions are useful and which are not. Having DDL support for the feature could be just syntactic sugar if the underlying mechanism is inadequate.

> I think, though, that that is still some way off. If you're in a position to help with (or fund) the coding, it can be made to happen faster, of course.

This is why I was asking for directions: browsing the whole code base looking for the relevant stuff is quite time-consuming.

--
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
NON QVIETIS MARIBVS NAVTA PERITVS
Re: [HACKERS] Git cvsserver serious issue
On 10/07/2010 09:44 AM, Magnus Hagander wrote: On Thu, Oct 7, 2010 at 15:16, Andrew Dunstan and...@dunslane.net wrote: On 09/23/2010 01:18 PM, Aidan Van Dyk wrote: On Thu, Sep 23, 2010 at 11:49 AM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Thu, Sep 23, 2010 at 17:32, Andrew Dunstan and...@dunslane.net wrote: Are we sure that's going to stop the DOS issue? As long as it's done right, I don't see how it wouldn't. There might be a cleaner way to do it, but after a moment's inspection of the script, I'd be inclined to just hack GITCVS::updater->new() to throw an error if $module is neither master nor REL\d_\d_STABLE. Keep in mind of course that I'm a lousy Perl coder. Here's a quick change that will allow you to specify a list of modules in the gitcvs section to export, like: [gitcvs] enabled=1 modules=master,REL9_0_STABLE,REL8_4_STABLE

--- git-cvsserver.orig	2010-09-23 12:03:06.000000000 -0400
+++ git-cvsserver	2010-09-23 13:16:53.000000000 -0400
@@ -2771,6 +2771,12 @@
     die "Git repo '$self->{git_path}' doesn't exist" unless ( -d $self->{git_path} );

+    if (defined $cfg->{gitcvs}{modules})
+    {
+        $log->debug("Limiting modules: " . $cfg->{gitcvs}{modules});
+        die "Invalid module $module" unless map {/^ *$module$/} split(',', $cfg->{gitcvs}{modules});
+    }
+
     $self->{dbdriver} = $cfg->{gitcvs}{$state->{method}}{dbdriver} || $cfg->{gitcvs}{dbdriver} || "SQLite";
     $self->{dbname} = $cfg->{gitcvs}{$state->{method}}{dbname} ||

OK, let's go with that. I was kinda hoping that we wouldn't have to do this at all, but Stefan has been having serious problems getting git to build and run on spoonbill, and I don't want to take up more of his time or be without it for very long. Are we doing this *just* for spoonbill? If so, it's a lot easier to just filter-by-IP, so we don't have to maintain a patched version... That's the only one I know of - there could certainly be others - but if we're going to support continued CVS use I want to be able to test it, and I don't have a static IP address. Can you filter by name lookup? I have a dyndns name. OTOH, this patch seems pretty small and simple to maintain. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
Aidan Van Dyk ai...@highrise.ca writes: Sure, but that lagged standby is already asynchronous, not synchronous. If it was synchronous, it would have slowed the master down enough that it would not be lagged. Agreed, except in the case of a joining standby. But you're saying it better than I do: Yes, I believe you need to have a way for an admin (or process/control/config) to be able to demote a synchronous replication scenario into async (or standalone, which is just an extension of really async). But it's no longer synchronous replication at that point. And if the choice is made to keep trucking while a new standby is being brought online and available and caught up, that's fine too. But during that period, until the slave is caught up and synchronously replicating, it's *not* synchronous replication. That's exactly my point. I think we need to handle the case and make it obvious that this window is a data-loss window where there's no sync rep ongoing, then offer users a choice of behaviour. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Git cvsserver serious issue
On Thu, Oct 7, 2010 at 16:07, Andrew Dunstan and...@dunslane.net wrote: On 10/07/2010 09:44 AM, Magnus Hagander wrote: On Thu, Oct 7, 2010 at 15:16, Andrew Dunstan and...@dunslane.net wrote: On 09/23/2010 01:18 PM, Aidan Van Dyk wrote: On Thu, Sep 23, 2010 at 11:49 AM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Thu, Sep 23, 2010 at 17:32, Andrew Dunstan and...@dunslane.net wrote: Are we sure that's going to stop the DOS issue? As long as it's done right, I don't see how it wouldn't. There might be a cleaner way to do it, but after a moment's inspection of the script, I'd be inclined to just hack GITCVS::updater->new() to throw an error if $module is neither master nor REL\d_\d_STABLE. Keep in mind of course that I'm a lousy Perl coder. Here's a quick change that will allow you to specify a list of modules in the gitcvs section to export, like: [gitcvs] enabled=1 modules=master,REL9_0_STABLE,REL8_4_STABLE

--- git-cvsserver.orig	2010-09-23 12:03:06.000000000 -0400
+++ git-cvsserver	2010-09-23 13:16:53.000000000 -0400
@@ -2771,6 +2771,12 @@
     die "Git repo '$self->{git_path}' doesn't exist" unless ( -d $self->{git_path} );

+    if (defined $cfg->{gitcvs}{modules})
+    {
+        $log->debug("Limiting modules: " . $cfg->{gitcvs}{modules});
+        die "Invalid module $module" unless map {/^ *$module$/} split(',', $cfg->{gitcvs}{modules});
+    }
+
     $self->{dbdriver} = $cfg->{gitcvs}{$state->{method}}{dbdriver} || $cfg->{gitcvs}{dbdriver} || "SQLite";
     $self->{dbname} = $cfg->{gitcvs}{$state->{method}}{dbname} ||

OK, let's go with that. I was kinda hoping that we wouldn't have to do this at all, but Stefan has been having serious problems getting git to build and run on spoonbill, and I don't want to take up more of his time or be without it for very long. Are we doing this *just* for spoonbill? If so, it's a lot easier to just filter-by-IP, so we don't have to maintain a patched version... That's the only one I know of - there could certainly be others - but if we're going to support continued CVS use I want to be able to test it, and I don't have a static IP address. Can you filter by name lookup? I have a dyndns name. No, filtering is by IP. ssh forwarding might work. OTOH, this patch seems pretty small and simple to maintain. True, it is rather small. Does anybody know if there's an automated way to maintain that on FreeBSD ports, and if so, how that works? I want to be *sure* we can't accidentally upgrade git-cvsserver *without* the patch, since that is a security issue. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
2010/10/7 Tom Lane t...@sss.pgh.pa.us: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 07.10.2010 10:41, Simon Riggs wrote: Constraint exclusion is linear with respect to number of partitions. Why do you say exponential? For some reason I thought the planner needs to check the constraints of the partitions against each other, but you're right, clearly that's not the case. Linear it is. Well, it's really more like O(mn) where m is the number of partitions and n is the number of clauses in the query --- and not only that, but the O() notation is hiding a depressingly high constant factor. And then there are practical problems like failing to exclude partitions as soon as there are any parameters in the query. Do the same considerations apply to partial indexes? I mean, I could replace table partitioning with the concept of index partitioning. (Well, I know it's not really the same.) Would it then be the same O(mn) for the planner to choose the right indexes given a certain query? There's basically no way that we're going to get decent performance for large numbers of partitions as long as we have to resort to theorem-proving to lead us to the correct partition. regards, tom lane I'm not sure about MySQL, but Oracle can handle large partition counts. So I would say there's a way to achieve the same goal. -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- cel +393398083886 fix +390823454163 fax +3902700506964 gtalk. vincenzo.rom...@notorand.it skype. notorand.it -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
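To make the partial-index analogy concrete, here is the kind of layout being contemplated (a sketch only; names are invented):

CREATE TABLE events (
    id       bigserial PRIMARY KEY,
    created  date NOT NULL,
    payload  text
);

CREATE INDEX events_2010_q3_idx ON events (created)
    WHERE created >= DATE '2010-07-01' AND created < DATE '2010-10-01';

CREATE INDEX events_2010_q4_idx ON events (created)
    WHERE created >= DATE '2010-10-01' AND created < DATE '2011-01-01';

To use one of these, the planner must prove the index predicate from the query's WHERE clause, so matching a query against m partial indexes involves the same kind of per-expression proof work as excluding m partitions.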
Re: [HACKERS] On Scalability
* Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: I see the main problem in the way the planner understands which partition is useful and which one is not. Having the DDL supporting the feature could just be syntactic sugar if the underlying mechanism is inadequate. I'm pretty sure the point of the DDL would be to give the user a way to communicate more information about the partitioning to the planner, not just to add syntactic sugar. With that additional information, the planner can make a faster and better decision. Stephen
Re: [HACKERS] Issues with Quorum Commit
On Thu, Oct 7, 2010 at 10:08 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Aidan Van Dyk ai...@highrise.ca writes: Sure, but that lagged standby is already asynchronous, not synchronous. If it was synchronous, it would have slowed the master down enough that it would not be lagged. Agreed, except in the case of a joining standby. *shrug* The joining standby is still asynchronous at this point. It's not synchronous replication. It's just another of the N slaves serving stale data ;-) But you're saying it better than I do: Yes, I believe you need to have a way for an admin (or process/control/config) to be able to demote a synchronous replication scenario into async (or standalone, which is just an extension of really async). But it's no longer synchronous replication at that point. And if the choice is made to keep trucking while a new standby is being brought online and available and caught up, that's fine too. But during that period, until the slave is caught up and synchronously replicating, it's *not* synchronous replication. That's exactly my point. I think we need to handle the case and make it obvious that this window is a data-loss window where there's no sync rep ongoing, then offer users a choice of behaviour. Again, I'm stating there is *no* choice in synchronous replication. It's *got* to block, otherwise it's not synchronous replication. The choice is whether you want synchronous replication or not at that point. And turning it off might be a good (best) choice for most people. I just want to make sure that: 1) There's no way to *sensibly* think it's still synchronously replicating 2) There is a way to enforce that the commits happening *are* synchronously replicated. a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
Vincenzo Romano wrote: I see the main problem in the way the planner understands which partition is useful and which one is not. Having the DDL supporting the feature could just be syntactic sugar if the underlying mechanism is inadequate. You have the order of this backwards. In order to do better than the way the current scheme is implemented, the optimizer needs higher quality metadata about the structure of the partitions to work with. Right now, it's inferring them from the CHECK constraints, which requires the whole theorem-proving bit Tom mentioned. That's never going to get any more algorithmically efficient than it already is. If the DDL that created the partitions also made better quality metadata available about the structure of the partitions, at that point it would be possible to also do better in how the optimizer pruned partitions to consider too. If the list it has was known to be in a particular structured/sorted order, the optimizer could do a binary search to find relevant partitions, rather than the linear scan required right now. Until that work is done, any other improvement attempts are doomed to fail. That's the point Robert was trying to make to you. And the fact Oracle does this is why it's able to scale to high partition counts better than PostgreSQL can. You can read more about the work that was being done here at http://wiki.postgresql.org/wiki/Table_partitioning -- Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
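To make the "higher quality metadata" idea concrete, here is a purely hypothetical DDL sketch; no such syntax exists in PostgreSQL today, and it is shown only to illustrate the shape of the information the optimizer would want:

CREATE TABLE measurements (logdate date, reading numeric)
    PARTITION BY RANGE (logdate);                       -- hypothetical

CREATE TABLE measurements_2010_q4 PARTITION OF measurements
    FOR VALUES FROM ('2010-10-01') TO ('2011-01-01');   -- hypothetical

With the bounds stored explicitly and kept in sorted order, routing a query to its partition becomes a binary search over the boundary list, roughly O(lg m), instead of an attempted proof against each child's CHECK expression.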
Re: [HACKERS] On Scalability
2010/10/7 Stephen Frost sfr...@snowman.net: * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: I see the main problem in the way the planner understands which partition is useful and which one is not. Having the DDL supporting the feature could just be syntactic sugar if the underlying mechanism is inadequate. I'm pretty sure the point of the DDL would be to give the user a way to communicate more information about the partitioning to the planner, not just to add syntactic sugar. With that additional information, the planner can make a faster and better decision. Stephen What kind of information are you thinking about? I think that the stuff you put into the CHECK condition for the table will say it all. In fact, there you have not just the column names with the relevant values, but the actual expression(s) to be checked. -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
* Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: What kind of information are you thinking about? I think that the stuff you put into the CHECK condition for the table will say it all. The problem is that CHECK conditions can contain just about anything, hence the planner needs to deal with that possibility. In fact, there you have not just the column names with the relevant values, but the actual expression(s) to be checked. Yes, that would be the problem. Proving something based on expressions is a lot more time-consuming and complicated than being explicitly told what goes where. Stephen
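As a concrete example of why proving things from arbitrary expressions is hard, consider a constraint like the following (invented, but perfectly legal):

ALTER TABLE measurements_2010_q4 ADD CONSTRAINT funky_check
    CHECK (reading IS NULL OR length(md5(reading::text)) = 32);

-- Nothing marks this as partitioning metadata (or rules it out); the
-- planner can only attempt to prove that the expression implies, or
-- contradicts, a given query's WHERE clause.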
Re: [HACKERS] On Scalability
2010/10/7 Greg Smith g...@2ndquadrant.com: Vincenzo Romano wrote: I see the main problem in the way the planner understands which partition is useful and which one is not. Having the DDL supporting the feature could just be syntactic sugar if the underlying mechanism is inadequate. You have the order of this backwards. In order to do better than the way the current scheme is implemented, the optimizer needs higher quality metadata about the structure of the partitions to work with. Right now, it's inferring them from the CHECK constraints, which requires the whole theorem-proving bit Tom mentioned. That's never going to get any more algorithmically efficient than it already is. If the DDL that created the partitions also made better quality metadata available about the structure of the partitions, at that point it would be possible to also do better in how the optimizer pruned partitions to consider too. If the list it has was known to be in a particular structured/sorted order, the optimizer could do a binary search to find relevant partitions, rather than the linear scan required right now. Do you mean the check constraint is used as plain text to be (somehow) executed? If this is the case, then you (all) are perfectly and obviously right and I'm just fishing for bicycles in the sea. I would expect a parser to ... ehm ... parse the CHECK constraint expression at CREATE TABLE time and extract all the needed high quality metadata, like the list of columns involved and the type of checks (range, value list, etc.). The same would be useful for partial indexes, as well. But maybe this is just wishful thinking. Until that work is done, any other improvement attempts are doomed to fail. That's the point Robert was trying to make to you. And the fact Oracle does this is why it's able to scale to high partition counts better than PostgreSQL can. You can read more about the work that was being done here at http://wiki.postgresql.org/wiki/Table_partitioning Done. As well as the official documentation. The point is that there are no hints on the topic. There should be a caveat in the documentation saying that partitioning is not scalable. As well as partial indexing. Thanks so far for the information. -- Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
Excerpts from Vincenzo Romano's message of jue oct 07 10:44:34 -0400 2010: Do you mean the check constraint is used as plain text to be (somehow) executed? If this is the case, then you (all) are perfectly and obviously right and I'm just fishing for bicycles in the sea. Yeah, hence this thread hasn't advanced things very much in any useful direction. That we need to improve the partitioning implementation is already known. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
2010/10/7 Alvaro Herrera alvhe...@commandprompt.com: Excerpts from Vincenzo Romano's message of jue oct 07 10:44:34 -0400 2010: Do you mean the check constraint is used as plain text to be (somehow) executed? If this is the case, then you (all) are perfectly and obviously right and I'm just fishing for bicycles in the sea. Yeah, hence this thread hasn't advanced things very much in any useful direction. That we need to improve the partitioning implementation is already known. Maybe I'm willing to help, and possibly able to. But first I need to understand things that are already known to others but not yet to me. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
2010/10/7 Stephen Frost sfr...@snowman.net: * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: What kind of information are you thinking about? I think that the stuff you put into the CHECK condition for the table will say it all. The problem is that CHECK conditions can contain just about anything, hence the planner needs to deal with that possibility. Not really. For partitioning there would be some specific constraints, as you have with DEFAULT values. In fact, there you have not just the column names with the relevant values, but the actual expression(s) to be checked. Yes, that would be the problem. Proving something based on expressions is a lot more time-consuming and complicated than being explicitly told what goes where. Consuming computing resources at DDL time should be OK if that leads to big savings at DML time (run time), in my opinion. It'd be just like compile-time optimizations. Stephen -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] O_DSYNC broken on MacOS X?
On Tue, Oct 5, 2010 at 8:11 AM, Peter Eisentraut pete...@gmx.net wrote: On mån, 2010-10-04 at 23:41 -0400, Robert Haas wrote: Well, it's not really useful, but that's how it works everywhere. On Linux, fsync carries the stuff from the kernel's RAM to the disk controller's RAM, and then it depends on some hdparm magic or something what happens next. That's a bit vaguer than I'd like. TFD says The aim of WAL is to ensure that the log is written before database records are altered, but this can be subverted by disk drives that falsely report a successful write to the kernel, when in fact they have only cached the data and not yet stored it on the disk. A power failure in such a situation might lead to irrecoverable data corruption. Administrators should try to ensure that disks holding PostgreSQL's WAL log files do not make such false reports. This leaves open the question of how they should attempt to do this; we should say what we know about that. That is explained in section 29.1 Reliability. I also notice the following sentence in our documentation, which now appears to me to be flat-out wrong: The wal_sync_method parameter determines how PostgreSQL will ask the kernel to force WAL updates out to disk. All the options should be the same in terms of reliability, but it's quite platform-specific which one will be the fastest. Obviously, we know now (if we didn't before) that this isn't the case, per my OP. Right. It was true before fsync_writethrough was invented. Proposed doc patch attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company document-wal-caveats.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
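For reference, the safe-but-slower setting discussed here is chosen in postgresql.conf; on platforms where plain fsync does not force data through the drive's write cache (macOS being the example that started this thread), the write-through variant is the conservative choice:

# postgresql.conf
wal_sync_method = fsync_writethrough    # on macOS this issues
                                        # fcntl(F_FULLFSYNC); also
                                        # available on Windows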
Re: [HACKERS] On Scalability
* Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: I would expect a parser to ... ehm ... parse the CHECK constraint expression at CREATE TABLE time and extract all the needed high quality metadata, like the list of columns involved and the type of checks (range, value list, etc.). Check constraints can be added after the table is created. Inheritance can be added/changed independently of check constraints. Hacking all of the inheritance, check constraint creation, and any other possibly involved code paths to try to figure out whether this particular table, check constraint, inheritance relationship, etc., is part of a partitioning setup isn't exactly trivial, or the right approach. Thanks, Stephen
Re: [HACKERS] On Scalability
* Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: 2010/10/7 Stephen Frost sfr...@snowman.net: * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: The problem is that CHECK conditions can contain just about anything, hence the planner needs to deal with that possibility. Not really. For partitioning there would be some specific constraints, as you have with DEFAULT values. How do we know when it's partitioning and not a CHECK constraint being used for something else..? I'll tell you: through the user using specific partitioning DDL statements. Consuming computing resources at DDL time should be OK if that leads to big savings at DML time (run time), in my opinion. It'd be just like compile-time optimizations. CHECK constraints, inheritance, etc., are general things which can be used for more than just partitioning. Abusing them to go through tons of extra gyrations to make the specific partitioning case faster at DML time (if that's really even possible... I'm not convinced you could make it bullet-proof) isn't a good approach. Thanks, Stephen
Re: [HACKERS] security hook on table creation
On Wed, Oct 6, 2010 at 5:21 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Robert Haas's message of mié oct 06 17:02:22 -0400 2010: 2010/10/5 KaiGai Kohei kai...@ak.jp.nec.com: However, we also have a few headache cases. DefineType() creates a new type object and its array type, but it does not call CommandCounterIncrement() before the end of the function, so the new type entries are not visible to plugin modules, even if we put a security hook at the tail of DefineType(). DefineFunction() has the same problem. It creates a new procedure object, but it also does not call CommandCounterIncrement() before the end of the function, except in the case when ProcedureCreate() invokes the language validator function. So I guess the first question here is why it's important to be able to see the new entry. I am thinking that you want it so that, for example, you can fetch the namespace OID to perform an SE-Linux type transition. Is that right? I'm not sure that there's any point trying to optimize these to the point of avoiding CommandCounterIncrement. Surely DefineType et al are not performance-sensitive operations. OK, fair enough. Let's just do it unconditionally then. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
2010/10/7 Stephen Frost sfr...@snowman.net: * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: 2010/10/7 Stephen Frost sfr...@snowman.net: * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: The problem is that CHECK conditions can contain just about anything, hence the planner needs to deal with that possibility. Not really. For partitioning there would be some specific constraints, as you have with DEFAULT values. How do we know when it's partitioning and not a CHECK constraint being used for something else..? Why ask? You don't need to tell them apart. Just parse the expression and extract the metadata to be used when the expression needs to be evaluated. Whether it's a plain CHECK constraint or something for partition management would then be irrelevant. I'll tell you: through the user using specific partitioning DDL statements. That could be the next step, once the underlying stuff is already in place. Consuming computing resources at DDL time should be OK if that leads to big savings at DML time (run time), in my opinion. It'd be just like compile-time optimizations. CHECK constraints, inheritance, etc., are general things which can be used for more than just partitioning. Abusing them to go through tons of extra gyrations to make the specific partitioning case faster at DML time (if that's really even possible... I'm not convinced you could make it bullet-proof) isn't a good approach. At the moment I'm not interested in particular cases. I think that CHECK constraints (as well as partial index expressions) should be handled in a more effective way. Better partitioning (both for tables and indexes) would be a side effect. Thanks for the insights. Thanks, Stephen -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
2010/10/7 Stephen Frost sfr...@snowman.net: * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: I would expect a parser to ... ehm ... parse the CHECK constraint expression at CREATE TABLE time and extract all the needed high quality metadata, like the list of columns involved and the type of checks (range, value list, etc.). Check constraints can be added after the table is created. Inheritance can be added/changed independently of check constraints. Hacking all of the inheritance, check constraint creation, and any other possibly involved code paths to try to figure out whether this particular table, check constraint, inheritance relationship, etc., is part of a partitioning setup isn't exactly trivial, or the right approach. Thanks, Stephen I think nobody will say these things are trivial. So, what'd be the right approach in your vision? I mean, if you think about partitioning a-la Oracle, then you'll have to parse those expressions anyway. -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- cel +393398083886 fix +390823454163 fax +3902700506964 gtalk. vincenzo.rom...@notorand.it skype. notorand.it -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
Vincenzo Romano vincenzo.rom...@notorand.it wrote: 2010/10/7 Stephen Frost sfr...@snowman.net: Yes, that would be the problem. Proving something based on expressions is alot more time consuming and complicated than being explicitly told what goes where. Consuming computing resources at DDL-time should be OK if that will lead to big savings at DML-time (run-time), my opinion. It'd be just like compile time optimizations. I think something you haven't entirely grasped is how pluggable PostgreSQL is -- you can not only define your own functions in a wide variety of languages (including C), but your own data types, operators, casts, index strategies, etc. Determining, even at DDL time that even a built-in datatype's expression is or isn't useful in partitioning could be quite painful in the absence of syntax specifically geared toward partitioning. If there's a CHECK constraint on a polygon column to ensure that it isn't a concave polygon, you might be facing a lot of work to know whether it's involved in partitioning. Now imagine that a CHECK constraint is on a column with a user defined type and uses the @%!! operator and that the user has changed some of the allowed implicit casts used in the expression. While this flexibility is a great strength of PostgreSQL, it makes some things more difficult to implement than they would be in more limited database products. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
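A sketch of the kind of constraint Kevin means, with invented names (the function body is a stub standing in for a real convexity test, which would be nontrivial):

CREATE FUNCTION is_convex(polygon) RETURNS boolean
    LANGUAGE sql IMMUTABLE
    AS 'SELECT true';    -- stub body, for illustration only

CREATE TABLE regions (
    boundary polygon,
    CHECK (is_convex(boundary))
);

Deciding at DDL time whether such a constraint is partitioning-shaped would require the system to reason about is_convex()'s semantics, which it cannot do in general.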
Re: [HACKERS] On Scalability
* Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: So, what'd be the right approach in your vision? Have you read http://wiki.postgresql.org/wiki/Table_partitioning and the various places it links to..? I mean, if you think about partitioning a-la Oracle, then you'll have to parse those expressions anyway. Oracle's approach is discussed there. Thanks, Stephen
Re: [HACKERS] Issues with Quorum Commit
Aidan Van Dyk ai...@highrise.ca writes: *shrug* The joining standby is still asynchronous at this point. It's not synchronous replication. It's just another of the N slaves serving stale data ;-) Agreed *here*, but if you read the threads again, you'll see that's not at all what's been talked about before my proposal. In particular, the questions about how to unlock a master's setup while its synced standby is doing a base backup should not be allowed to exist, and you seem to agree with my point. That's exactly my point. I think we need to handle the case and make it obvious that this window is a data-loss window where there's no sync rep ongoing, then offer users a choice of behaviour. Again, I'm stating there is *no* choice in synchronous replication. It's *got* to block, otherwise it's not synchronous replication. The choice is whether you want synchronous replication or not at that point. Exactly, even if I didn't dare spell it this way. What I want to propose is for the user to be able to configure things so that he loses the sync aspect of the replication if it so happens that the setup is not able to provide for it. It may sound strange, but it's needed when all you want is, e.g., a no-stale-data reporting standby. And it so happens that it's already in Simon's code, AFAIUI (yet to read it, see). And turning it off might be a good (best) choice for most people. I just want to make sure that: 1) There's no way to *sensibly* think it's still synchronously replicating 2) There is a way to enforce that the commits happening *are* synchronously replicated. We're on the same track. I don't know how to offer your options without a clear listing of standby states and transitions, which must include the synchronicity and whether you just lost it or whatnot. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] O_DSYNC broken on MacOS X?
Robert Haas wrote: Proposed doc patch attached. Looks accurate to me. I like the additional linking to the Reliability page you put in there too. Heavily referencing that important page from related areas is a good thing, particularly now that it's got a lot more details than it used to. -- Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] O_DSYNC broken on MacOS X?
Robert Haas robertmh...@gmail.com writes: Proposed doc patch attached. discusesed? Otherwise +1 regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] O_DSYNC broken on MacOS X?
On Thu, Oct 7, 2010 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Proposed doc patch attached. discusesed? Otherwise +1 Woops, thanks. Committed with that change. I back-patched it back to 8.3, which is as far as it applied with only minor conflicts. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] O_DSYNC broken on MacOS X?
On Thu, Oct 7, 2010 at 11:45 AM, Greg Smith g...@2ndquadrant.com wrote: Robert Haas wrote: Proposed doc patch attached. Looks accurate to me. I like the additional linking to the Reliability page you put in there too. Heavily referencing that important page from related areas is a good thing, particularly now that it's got a lot more details than it used to. Cool, thanks for the fast review. I suspect there are more details that could stand to be added to the WAL reliability page as well, but I don't know what they are so I can't add them. I still have the feeling that we have not put quite a large enough red, blinking light around this issue, but I don't have a concrete suggestion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
Markus Wanner wrote: I think that's a pretty special case, because the good alerting system is at least as expensive as another server that just persistently stores and ACKs incoming WAL. The cost of hardware capable of running a database server is a large multiple of what you can build an alerting machine for. I have two systems that are approaching the trash heap just at my house, relative to the main work I do, but that are fully capable of running an alerting system. Building a production quality database server requires a more significant investment: high quality disks, ECC RAM, battery-backed RAID controller, etc. Relative to what the hardware in a database server costs, what you need to build an alerting system is almost free. Oh: and most businesses that are complicated enough to need a serious database server already have them, so they actually cost nothing beyond the software setup time to point them toward the databases, too. Why does one ever want the guarantee that sync replication gives to only hold true up to one failure, if a better guarantee doesn't cost anything extra? (Note that a good alerting system is impossible to achieve with only two servers. You need a third device anyway). I do not disagree with your theory or reasoning. But as a practical matter, I'm afraid the true cost of the better guarantee you're suggesting here is additional code complexity that will likely cause this feature to miss 9.1 altogether. As far as I'm concerned, this whole diversion into the topic of quorum commit is only consuming resources away from targeting something achievable in the time frame of a single release. Sync replication between really just two servers is asking for trouble and certainly not worth the savings in hardware cost. Better invest in a good UPS and redundant power supplies for a single server. I wish I could give you the long list of data recovery projects I've worked on over the last few years, so you could really appreciate how much what you're saying here is exactly the opposite of reality. You cannot make a single server reliable enough to survive all of the things that Murphy's Law will inflict upon it, at any price. For most of the businesses I work with who want sync rep, data is not considered safe until the second copy is on storage miles away from the original, because they know this too. Personal anecdote I can share: I used to have an important project related to stock trading where I kept my backup system about 50 miles away from me. I was aiming for constant availability, while still being able to drive to the other server if needed for disaster recovery. Guess what? Even those two turned out not to be nearly independent enough; see http://en.wikipedia.org/wiki/Northeast_Blackout_of_2003 for details of how I lost both of those at the same time for days. Silly me, I'd only spread them across two adjacent states with different power providers! Not nearly good enough to avoid a correlated failure. -- Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On Scalability
2010/10/7 Stephen Frost sfr...@snowman.net: * Vincenzo Romano (vincenzo.rom...@notorand.it) wrote: So, what'd be the right approach in your vision? Have you read http://wiki.postgresql.org/wiki/Table_partitioning and the various places it links to..? I mean, if you think about partitioning a-la Oracle, then you'll have to parse those expressions anyway. Oracle's approach is discussed there. I didn't mean the implementation, but the goals achieved. Thanks, Stephen -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] O_DSYNC broken on MacOS X?
On Oct 7, 2010, at 12:26 PM, Robert Haas wrote: On Thu, Oct 7, 2010 at 11:45 AM, Greg Smith g...@2ndquadrant.com wrote: Robert Haas wrote: Proposed doc patch attached. Looks accurate to me. I like the additional linking to the Reliability page you put in there too. Heavily referencing that important page from related areas is a good thing, particularly now that it's got a lot more details than it used to. Cool, thanks for the fast review. I suspect there are more details that could stand to be added to the WAL reliability page as well, but I don't know what they are so I can't add them. I still have the feeling that we have not put quite a large enough red, blinking light around this issue, but I don't have a concrete suggestion. I think the general problem is that there is no simple way to verify that a PostgreSQL commit is pushing the bits to persistent storage. It would be helpful if there were a platform-specific, volume-specific tool to deduce this. Currently, there is no warning light that goes on when commits are not persistent. On Linux, a tool could check filesystem parameters, hdparm (if relevant), and hard drive and controller specs (possibly against a blacklist of known liars). Perhaps a simpler tool could run a basic fsyncs-per-second test and prompt the DBA to check that the numbers are within the realm of possibility. How else can a DBA today ensure that a commit is a commit? Cheers, M -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
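As a rough illustration of the "basic fsyncs-per-second test" idea, here is a sketch in Perl (not a calibrated tool; the file name and write size are arbitrary):

#!/usr/bin/perl
# Rough fsyncs-per-second probe (sketch only).
use strict;
use warnings;
use IO::Handle;
use Time::HiRes qw(time);

my $file  = shift || 'fsync_probe.dat';
my $count = 1000;

open(my $fh, '>', $file) or die "cannot open $file: $!";
$fh->autoflush(1);

my $start = time();
for (1 .. $count) {
    print $fh 'x' x 8192;                  # one 8 kB page-sized write
    $fh->sync or die "fsync failed: $!";   # IO::Handle::sync wraps fsync(2)
}
my $elapsed = time() - $start;
printf "%d fsyncs in %.2f s = %.0f fsyncs/sec\n",
    $count, $elapsed, $count / $elapsed;
unlink $file;

A consumer drive that reports thousands of fsyncs per second is almost certainly acknowledging writes from its volatile cache; a 7200 RPM disk honestly flushing cannot do much better than its rotation rate, roughly 120 per second.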
Re: [HACKERS] Issues with Quorum Commit
On 10/7/10 6:41 AM, Aidan Van Dyk wrote: I'm really confused by all these k < N scenarios I see bandied about, because all it really amounts to is "I only want *one* synchronous replica, and a bunch of asynchronous ones". And a bit of chance thrown into the mix, hoping the synchronous one is pretty stable and the asynchronous ones aren't *too* far behind (define "too" and "far" at your leisure). Effectively, yes. The difference between "k of N" synch rep and 1 synch standby + several async standbys is that in "k of N", you have a pool and aren't dependent on having a specific standby be very reliable, just that any one of them is. So if you have k = 3 and N = 10, then you can have 10 standbys and only 3 of them need to ack any specific commit for the master to proceed. As long as (a) you retain at least one of the 3 which ack'd, and (b) you have some way of determining which standby is the most caught up, data loss is fairly unlikely; you'd need to lose 4 of the 10, and the wrong 4, to lose data. The advantage of this for availability over just having k = N = 3 comes when one of the standbys is responding slowly (due to traffic) or goes offline unexpectedly due to a hardware failure. In the k = N = 3 case, the system halts. In the k = 3, N = 10 case, you can lose up to 7 standbys without the system going down. It's notable that the massively scalable transactional databases (Dynamo, Cassandra, various telecom databases, etc.) all operate this way. However, I do consider this advanced functionality and not worth pursuing until we have the k = 1 case implemented and well-tested. For comparison, Cassandra, Hypertable and Riak have been working on their k < N functionality for a couple of years now and none of them has it stable *and* fast. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On 06.10.2010 19:26, Greg Smith wrote: Now, the more relevant question, what I actually need in order for a Sync Rep feature in 9.1 to be useful to the people who want it most I talk to. That would be a simple to configure setup where I list a subset of important nodes, and the appropriate acknowledgement level I want to hear from one of them. And when one of those nodes gives that acknowledgement, commit on the master happens too. That's it. For use cases like the commonly discussed two local/two remote situation, the two remote ones would be listed as the important ones. This feels like the best way forward to me. It gives some flexibility, and doesn't need a new config file. Let me check that I got this right, and add some details to make it more concrete: Each standby is given a name. It can be something like boston1 or testserver. It does *not* have to be unique across all standby servers. In the master, you have a list of important, synchronous, nodes that must acknowledge each commit before it is acknowledged to the client. The standby name is a GUC in the standby's configuration file: standby_name='bostonserver' The list of important nodes is also a GUC, in the master's configuration file: synchronous_standbys='bostonserver, oxfordserver' To configure a simple setup with a master and one synchronous standby (which is not a very good setup from an availability point of view, as discussed to death), you give the standby a name, and put the same name in synchronous_standbys in the master. To configure a setup with a master and two standbys, so that a commit is acknowledged to the client as soon as either one of the standbys acknowledges it, you give both standbys the same name, and put that same name in the synchronous_standbys list in the master. This is the configuration that gives zero data loss in case one server fails, but also caters for availability because you don't need to halt the master if one standby fails. To configure a setup with a master and two standbys, so that a commit is acknowledged to the client after *both* standbys acknowledge it, you give the standbys different names, and list both names in synchronous_standbys in the master. I believe this will bend to most real life scenarios people have. Now, the other big fight is over wait forever vs. timeout. Personally, I stand firmly in the wait forever camp - you're nuts if you want a timeout. However, I can see that not everyone agrees :-). Fortunately, once we have robust wait forever behavior, it shouldn't be hard at all to add a timeout option on top of that, for those who want it. We should be able to have both options in 9.1. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
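Putting the proposal into concrete form (these GUCs are proposed here, not implemented anywhere; the server names are from the example above):

# Master's postgresql.conf, "either standby may ack" (1-of-2):
synchronous_standbys = 'bostonserver'
#   ... with standby_name = 'bostonserver' on *both* standbys.

# Master's postgresql.conf, "both standbys must ack" (2-of-2):
synchronous_standbys = 'bostonserver, oxfordserver'
#   ... with standby_name = 'bostonserver' on one standby
#   and standby_name = 'oxfordserver' on the other.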
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On 10/7/10, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.10.2010 19:26, Greg Smith wrote: Now, the more relevant question, what I actually need in order for a Sync Rep feature in 9.1 to be useful to the people who want it most I talk to. That would be a simple to configure setup where I list a subset of important nodes, and the appropriate acknowledgement level I want to hear from one of them. And when one of those nodes gives that acknowledgement, commit on the master happens too. That's it. For use cases like the commonly discussed two local/two remote situation, the two remote ones would be listed as the important ones. This feels like the best way forward to me. It gives some flexibility, and doesn't need a new config file. Let me check that I got this right, and add some details to make it more concrete: Each standby is given a name. It can be something like boston1 or testserver. It does *not* have to be unique across all standby servers. In the master, you have a list of important, synchronous, nodes that must acknowledge each commit before it is acknowledged to the client. The standby name is a GUC in the standby's configuration file: standby_name='bostonserver' The list of important nodes is also a GUC, in the master's configuration file: synchronous_standbys='bostonserver, oxfordserver' To configure a simple setup with a master and one synchronous standby (which is not a very good setup from an availability point of view, as discussed to death), you give the standby a name, and put the same name in synchronous_standbys in the master. To configure a setup with a master and two standbys, so that a commit is acknowledged to the client as soon as either one of the standbys acknowledges it, you give both standbys the same name, and put that same name in the synchronous_standbys list in the master. This is the configuration that gives zero data loss in case one server fails, but also caters for availability because you don't need to halt the master if one standby fails. To configure a setup with a master and two standbys, so that a commit is acknowledged to the client after *both* standbys acknowledge it, you give the standbys different names, and list both names in synchronous_standbys in the master. I believe this will bend to most real life scenarios people have. +1. I think this would have met any needs of mine in my past life as a sysadmin/dba. Now, the other big fight is over wait forever vs. timeout. Personally, I stand firmly in the wait forever camp - you're nuts if you want a timeout. However, I can see that not everyone agrees :-). Fortunately, once we have robust wait forever behavior, it shouldn't be hard at all to add a timeout option on top of that, for those who want it. We should be able to have both options I disagree that you're nuts if you want this feature fwiw. +1 on your suggested plan though :-) /D -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On Thu, Oct 7, 2010 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: So if you have k = 3 and N = 10, then you can have 10 standbys and only 3 of them need to ack any specific commit for the master to proceed. As long as (a) you retain at least one of the 3 which ack'd, and (b) you have some way of determining which standby is the most caught up, data loss is fairly unlikely; you'd need to lose 4 of the 10, and the wrong 4, to lose data. The advantage of this for availability over just having k = N = 3 comes when one of the standbys is responding slowly (due to traffic) or goes offline unexpectedly due to a hardware failure. In the k = N = 3 case, the system halts. In the k = 3, N = 10 case, you can lose up to 7 standbys without the system going down. Sure, but here is where I might not be following. If you want synchronous replication because you want query availability while making sure you're not getting stale results from your slaves, then using your k < N (k = 3 and N = 10) situation is screwing yourself. To get non-stale responses, you can only query those k = 3 servers. But you've shot yourself in the foot because you don't know which 3 of the 10 those will be. The other 7 *are* stale (by definition). They talk about picking the most caught-up slave when the master fails, but you actually need to do that for *every query*. If you say they are pretty close, so by the time you get the query to them they will be caught up, well then, all you really want is good async replication; you don't really *need* the synchronous part. The only case where I see a race-to-quorum type of k < N being useful is if you're just trying to duplicate data everywhere, but not actually querying any of the replicas. I can see that all queries go to the master, but the chances are pretty high that multiple machines are going to fail, so I want multiple replicas to be useful; but I *don't* think that's what most people are wanting in their "I want 3 of 10 servers to ack the commit". The difference between good async and sync is only the *guarantee*. If you don't need the guarantee, you don't need the synchronous part. a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On 10/7/10 10:27 AM, Heikki Linnakangas wrote: The standby name is a GUC in the standby's configuration file: standby_name='bostonserver' The list of important nodes is also a GUC, in the master's configuration file: synchronous_standbys='bostonserver, oxfordserver' This seems to abandon Simon's concept of per-transaction synchronization control. That seems like such a potentially useful feature that I'm reluctant to abandon it just for administrative elegance. Does this work together with that in some way I can't see? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
If you want synchronous replication because you want query availability while making sure you're not getting stale results from your slaves, then using your k < N (k = 3 and N = 10) situation is screwing yourself. Correct. If that is your reason for synch standby, then you should be using a k = N configuration. However, some people are willing to sacrifice consistency for durability and availability. We should give them that option (eventually), since among that triad you can never have more than two. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 10/07/2010 06:41 PM, Greg Smith wrote: The cost of hardware capable of running a database server is a large multiple of what you can build an alerting machine for. You realize you don't need lots of disks or RAM for a box that only ACKs? A box with two SAS disks and a BBU isn't that expensive anymore. I do not disagree with your theory or reasoning. But as a practical matter, I'm afraid the true cost of the better guarantee you're suggesting here is additional code complexity that will likely cause this feature to miss 9.1 altogether. As far as I'm concerned, this whole diversion into the topic of quorum commit is only consuming resources away from targeting something achievable in the time frame of a single release. So far I've been under the impression that Simon already has the code for quorum_commit k >= 1. What I'm opposing is the timeout feature, which I consider to be additional code, unneeded complexity and a foot-gun. You cannot make a single server reliable enough to survive all of the things that Murphy's Law will inflict upon it, at any price. That's exactly what I'm saying applies to two servers as well. And why a timeout is a bad thing here, because the chance that the second node fails as well is there (and is higher than you think, according to Murphy). For most of the businesses I work with who want sync rep, data is not considered safe until the second copy is on storage miles away from the original, because they know this too. Now, those are the people who really need sync rep, yes. How happy do you think those businesses were to find out that Postgres was cheating on them in case of a network outage, for example? Do they really value (write!) availability more than data safety? Silly me, I'd only spread them across two adjacent states with different power providers! Not nearly good enough to avoid a correlated failure. Thanks for sharing this. I hope you didn't lose data. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On Thu, Oct 7, 2010 at 1:39 PM, Dave Page dp...@pgadmin.org wrote: On 10/7/10, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.10.2010 19:26, Greg Smith wrote: Now, the more relevant question, what I actually need in order for a Sync Rep feature in 9.1 to be useful to the people who want it most I talk to. That would be a simple to configure setup where I list a subset of important nodes, and the appropriate acknowledgement level I want to hear from one of them. And when one of those nodes gives that acknowledgement, commit on the master happens too. That's it. For use cases like the commonly discussed two local/two remote situation, the two remote ones would be listed as the important ones. This feels like the best way forward to me. It gives some flexibility, and doesn't need a new config file. Let me check that I got this right, and add some details to make it more concrete: Each standby is given a name. It can be something like boston1 or testserver. It does *not* have to be unique across all standby servers. In the master, you have a list of important, synchronous, nodes that must acknowledge each commit before it is acknowledged to the client. The standby name is a GUC in the standby's configuration file: standby_name='bostonserver' The list of important nodes is also a GUC, in the master's configuration file: synchronous_standbys='bostonserver, oxfordserver' To configure a simple setup with a master and one synchronous standby (which is not a very good setup from an availability point of view, as discussed to death), you give the standby a name, and put the same name in synchronous_standbys in the master. To configure a setup with a master and two standbys, so that a commit is acknowledged to the client as soon as either one of the standbys acknowledges it, you give both standbys the same name, and put that same name in the synchronous_standbys list in the master. This is the configuration that gives zero data loss in case one server fails, but also caters for availability because you don't need to halt the master if one standby fails. To configure a setup with a master and two standbys, so that a commit is acknowledged to the client after *both* standbys acknowledge it, you give the standbys different names, and list both names in synchronous_standbys in the master. I believe this will bend to most real life scenarios people have. +1. I think this would have met any needs of mine in my past life as a sysadmin/dba. Before we get too far down the garden path here, this is actually substantially more complicated than what Greg proposed. Greg was proposing, as have some other folks I think, to focus only on the k=1 case - in other words, only one acknowledgment would ever be required for any given commit. I think he's right to focus on that case, because the multiple-ACKs-required solutions are quite a bit hairier. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On Thu, Oct 7, 2010 at 1:45 PM, Josh Berkus j...@agliodbs.com wrote: On 10/7/10 10:27 AM, Heikki Linnakangas wrote: The standby name is a GUC in the standby's configuration file: standby_name='bostonserver' The list of important nodes is also a GUC, in the master's configuration file: synchronous_standbys='bostonserver, oxfordserver' This seems to abandon Simon's concept of per-transaction synchronization control. That seems like such a potentially useful feature that I'm reluctant to abandon it just for administrative elegance. Does this work together with that in some way I can't see? I think they work together fine. Greg's idea is that you list the important standbys, and a synchronization guarantee that you'd like to have for at least one of them. Simon's idea - at least at 10,000 feet - is that you can take a pass on that guarantee for transactions that don't need it. I don't see why you can't have both. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
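[Editor's note: a sketch of how the two ideas might compose, assuming a session-level GUC for the synchronization guarantee. The name synchronous_replication below is hypothetical; no syntax is settled. A transaction that doesn't need the guarantee could opt out with plain SET LOCAL, while the master's standby list still governs the transactions that do.]

    BEGIN;
    SET LOCAL synchronous_replication = off;  -- hypothetical GUC: skip the
                                              -- standby-ack wait for this
                                              -- transaction only
    UPDATE page_hits SET hits = hits + 1 WHERE page_id = 42;
    COMMIT;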
Re: [HACKERS] Issues with Quorum Commit
But as a practical matter, I'm afraid the true cost of the better guarantee you're suggesting here is additional code complexity that will likely cause this feature to miss 9.1 altogether. As far as I'm concerned, this whole diversion into the topic of quorum commit is only diverting resources away from targeting something achievable in the time frame of a single release. Yes. My purpose in starting this thread was to show that k > 1 quorum commit is considerably more complex than the people who have been bringing it up in other threads seem to think it is. It is not achievable for 9.1, and maybe not even for 9.2. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Issues with two-server Synch Rep
Simon, Fujii, What follows are what I see as the major issues with making two-server synch replication work well. I would like to have each of you answer them, explaining how your patch and your design address each issue. I believe this will go a long way towards helping the majority of the community understand the options we have from your code, as well as where help is still needed.

Adding a Synch Standby
----------------------
What is the procedure for adding a new synchronous standby in your implementation? That is, how do we go from having a standby server with an empty PGDATA to having a working synchronous standby?

Snapshot Publication
--------------------
During 9.0 development discussion, one of the things we realized we needed for synch standby was publication of snapshots back to the master in order to prevent query cancel on the standby. Without this, the synch standby is useless for running read queries. Does your patch implement this? Please describe.

Management
----------
One of the serious flaws currently in HS/SR is the complexity of administration. Setting up and configuring even a single master and single standby requires editing up to 6 configuration files in Postgres, as well as dealing with file permissions. As such, any Synch Rep patch must work together with attempts to simplify administration. How does your design do this?

Monitoring
----------
Synch rep imposes severe penalties on availability if a synch standby gets behind or goes down. What replication-specific monitoring tools and hooks are available to allow administrators to take action before the database becomes unavailable?

Degradation
-----------
In the event that the synch rep standby falls too far behind, becomes unavailable, or is deliberately taken offline, what do you envision as the process for the DBA to resolve the situation? Is there any ability to commit stuck transactions?

Client Consistency
------------------
With a standby in apply mode, and a master failure at the wrong time, there is the possibility that the standby will apply a transaction at the same time that the master crashes, causing the client to never receive a commit message. Once the client reconnects to the standby, how will it know whether its transaction was committed or not? As a lesser case, a standby in apply mode will show the results of committed transactions *before* they are visible on the master. Is there any need to handle this? If so, how?

Performance
-----------
As with XA, synch rep has the potential to be so slow as to be unusable. What optimizations do you make in your approach to synch rep to make it faster than two-phase commit? What other performance optimizations have you added?

-- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
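[Editor's note: on the Client Consistency question above, one common application-level workaround is worth sketching: record a client-generated identifier in the same transaction as the real work, so that after reconnecting to a promoted standby the client can check whether its commit survived. The journal table and identifier here are hypothetical, purely for illustration.]

    -- in the original transaction, alongside the real work
    BEGIN;
    -- ... application changes ...
    INSERT INTO txn_journal (client_txn_id) VALUES ('app1-000123');
    COMMIT;

    -- after reconnecting to the (possibly promoted) standby
    SELECT EXISTS (SELECT 1 FROM txn_journal
                   WHERE client_txn_id = 'app1-000123') AS committed;

If the row is present, the whole transaction committed; if not, it is safe to retry.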
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On Thu, Oct 7, 2010 at 1:27 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Let me check that I got this right, and add some details to make it more concrete: Each standby is given a name. It can be something like boston1 or testserver. It does *not* have to be unique across all standby servers. In the master, you have a list of important, synchronous, nodes that must acknowledge each commit before it is acknowledged to the client. The standby name is a GUC in the standby's configuration file: standby_name='bostonserver' The list of important nodes is also a GUC, in the master's configuration file: synchronous_standbys='bostonserver, oxfordserver' +1. It definitely covers the scenarios I want. And it even allows the ones I don't want, and don't understand either ;-) And personally, I'd *love* it if the streaming replication protocol were adjusted so that every streaming WAL client reported back its role and receive/fsync/replay positions as part of the protocol (allowing role and positions to be something NULLable/empty/0). I think Simon demonstrated that the overhead to report it isn't high. Again, in the deployments I'm wanting, the slave isn't a PG server, but something like Magnus's stream-to-archive, so I can't query the slave to see how far behind it is. a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
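[Editor's note: a sketch of what that reporting would buy. If the master exposed the reported positions in a system view, lag could be watched from the master alone, even for non-PG receivers like the stream-to-archive case. The view and its columns below are hypothetical; pg_current_xlog_location() is the one existing function used.]

    -- hypothetical view: one row per connected WAL-streaming client
    SELECT client_name, role,
           pg_current_xlog_location() AS master_pos,
           receive_pos, fsync_pos, replay_pos    -- NULL/0 where not applicable
    FROM pg_stat_walsenders;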
Re: [HACKERS] Issues with Quorum Commit
Aidan Van Dyk ai...@highrise.ca wrote: To get non-stale responses, you can only query those k=3 servers. But you've shot yourself in the foot because you don't know which 3/10 those will be. The other 7 *are* stale (by definition). They talk about picking the caught-up slave when the master fails, but you actually need to do that for *every query*. With web applications, at least, you often don't care that the data read is absolutely up-to-date, as long as the point in time doesn't jump around from one request to the next. When we have used load balancing between multiple database servers (which has actually become unnecessary for us lately because PostgreSQL has gotten so darned fast!), we have established affinity between a session and one of the database servers, so that if they became slightly out of sync, data would not pop in and out of existence arbitrarily. I think a reasonable person could combine this technique with a 3 of 10 synchronous replication quorum to get both safe persistence of data and reasonable performance. I can also envision use cases where this would not be desirable. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On Thu, Oct 7, 2010 at 2:10 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Aidan Van Dyk ai...@highrise.ca wrote: To get non-stale responses, you can only query those k=3 servers. But you've shot yourself in the foot because you don't know which 3/10 those will be. The other 7 *are* stale (by definition). They talk about picking the caught-up slave when the master fails, but you actually need to do that for *every query*. With web applications, at least, you often don't care that the data read is absolutely up-to-date, as long as the point in time doesn't jump around from one request to the next. When we have used load balancing between multiple database servers (which has actually become unnecessary for us lately because PostgreSQL has gotten so darned fast!), we have established affinity between a session and one of the database servers, so that if they became slightly out of sync, data would not pop in and out of existence arbitrarily. I think a reasonable person could combine this technique with a 3 of 10 synchronous replication quorum to get both safe persistence of data and reasonable performance. I can also envision use cases where this would not be desirable. Well, keep in mind that all updates have to be done on the single master. That works pretty well for fine-grained replication, but I don't think it's very good for full-cluster replication. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
Robert Haas robertmh...@gmail.com wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: With web applications, at least, you often don't care that the data read is absolutely up-to-date, as long as the point in time doesn't jump around from one request to the next. When we have used load balancing between multiple database servers (which has actually become unnecessary for us lately because PostgreSQL has gotten so darned fast!), we have established affinity between a session and one of the database servers, so that if they became slightly out of sync, data would not pop in and out of existence arbitrarily. I think a reasonable person could combine this technique with a 3 of 10 synchronous replication quorum to get both safe persistence of data and reasonable performance. I can also envision use cases where this would not be desirable. Well, keep in mind that all updates have to be done on the single master. That works pretty well for fine-grained replication, but I don't think it's very good for full-cluster replication. I'm completely failing to understand your point here. Could you restate it another way? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] standby registration (was: is sync rep stalled?)
I think they work together fine. Greg's idea is that you list the important standbys, and a synchronization guarantee that you'd like to have for at least one of them. Simon's idea - at least at 10,000 feet - is that you can take a pass on that guarantee for transactions that don't need it. I don't see why you can't have both. So, two things: 1) This version of Standby Registration seems to add One More Damn Place You Need To Configure Standby (OMDPYNTCS) without adding any functionality you couldn't get *without* having a list on the master. Can someone explain to me what functionality is added by this approach vs. not having a list on the master at all? 2) I see Simon's approach, where you can designate not just synch/asynch but the synch *mode* per session, as valuable. I can imagine having transactions I just want to ack vs. transactions I want to apply according to application logic (e.g. customer personal information vs. financial transactions). This approach would still seem to remove that functionality. Does it? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 10/07/2010 03:19 PM, Dimitri Fontaine wrote: I think you're all into durability, and that's good. The extra cost is service downtime. It's just *reduced* availability. That doesn't necessarily mean downtime, if you combine cleverly with async replication. If that's not what you're after: there's also availability, and load balancing read queries on a system with no lag (no stale-data servicing) when all is working right. All I'm saying is that those use cases are much better served with async replication, maybe together with something that warns and takes action in case the standby's lag gets too big. Or what kind of customers do you think really need a no-lag solution for read-only queries? In the LAN case the lag of async rep is negligible, and in the WAN case the latencies of sync rep are prohibitive. My proposal is to make the risk window obvious and the behavior when you enter it configurable. I don't buy that. The risk calculation gets a lot simpler and more obvious with strict guarantees. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On 10/07/2010 07:44 PM, Aidan Van Dyk wrote: The only case where I see a race-to-quorum type of k < N being useful is if you're just trying to duplicate data everywhere, but not actually querying any of the replicas. I can see the "all queries go to the master, but the chances are pretty high that multiple machines are going to fail, so I want multiple replicas" scenario being useful, but I *don't* think that's what most people are wanting in their "I want 3 of 10 servers to ack the commit". What else do you think they want it for, if not for protection against data loss? (Note that the queries don't need to go to the master exclusively if you can live with some lag - and I think the vast majority of people can. The zero data loss guarantee holds true in any case, though.) The difference between good async and sync is only the *guarantee*. If you don't need the guarantee, you don't need the synchronous part. Here we are exactly on the same page again. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] a few small bugs in plpgsql
Hello, today I found a few bugs:

a) The parser allows labels in invalid positions, with a strange runtime bug:

postgres=# CREATE OR REPLACE FUNCTION foo() RETURNS void AS $$
BEGIN
  FOR i IN 1..2 invalidLabel
  LOOP
    RAISE NOTICE '%', i;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
postgres=# SELECT foo();
ERROR:  column "invalidlabel" does not exist
LINE 2: invalidLabel
        ^
QUERY:  SELECT 2 invalidLabel
CONTEXT:  PL/pgSQL function "foo" line 3 at FOR with integer loop variable
postgres=#

b) SRF functions don't have to be finished with a RETURN statement - I know there is an outer default block, but it looks like an inconsistency for SRF functions, because you can use RETURN NEXT without a final RETURN. Maybe it isn't a bug - but I am filing it as an inconsistency.

postgres=# CREATE OR REPLACE FUNCTION fg(OUT i int) RETURNS SETOF int AS $$
BEGIN
  FOR i IN 1..3 LOOP
    fg.i := i;
    RETURN NEXT;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
postgres=# select fg();
 fg
----
  1
  2
  3
(3 rows)

Regards Pavel Stehule -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On Thu, Oct 7, 2010 at 2:33 PM, Josh Berkus j...@agliodbs.com wrote: I think they work together fine. Greg's idea is that you list the important standbys, and a synchronization guarantee that you'd like to have for at least one of them. Simon's idea - at least at 10,000 feet - is that you can take a pass on that guarantee for transactions that don't need it. I don't see why you can't have both. So, two things: 1) This version of Standby Registration seems to add One More Damn Place You Need To Configure Standby (OMDPYNTCS) without adding any functionality you couldn't get *without* having a list on the master. Can someone explain to me what functionality is added by this approach vs. not having a list on the master at all? Well, then you couldn't have one strictly synchronous standby and one asynchronous standby. 2) I see Simon's approach, where you can designate not just synch/asynch but the synch *mode* per session, as valuable. I can imagine having transactions I just want to ack vs. transactions I want to apply according to application logic (e.g. customer personal information vs. financial transactions). This approach would still seem to remove that functionality. Does it? I'm not totally sure. I think we could probably avoid removing that with careful detailed design. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On Thu, Oct 7, 2010 at 2:31 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: With web applications, at least, you often don't care that the data read is absolutely up-to-date, as long as the point in time doesn't jump around from one request to the next. When we have used load balancing between multiple database servers (which has actually become unnecessary for us lately because PostgreSQL has gotten so darned fast!), we have established affinity between a session and one of the database servers, so that if they became slightly out of sync, data would not pop in and out of existence arbitrarily. I think a reasonable person could combine this technique with a 3 of 10 synchronous replication quorum to get both safe persistence of data and reasonable performance. I can also envision use cases where this would not be desirable. Well, keep in mind all updates have to be done on the single master. That works pretty well for fine-grained replication, but I don't think it's very good for full-cluster replication. I'm completely failing to understand your point here. Could you restate another way? Establishing an affinity between a session and one of the database servers will only help if the traffic is strictly read-only. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
Markus Wanner mar...@bluegap.ch writes: I don't buy that. The risk calculation gets a lot simpler and more obvious with strict guarantees. Ok, I'm lost in the use cases and analysis. I still don't understand why you want to consider the system already synchronous when it's not, whatever guarantee you're asking for. All I'm saying is that we should be able to know and show what the current system is up to, and we should be able to offer sane reactions in case of errors. You're calling it a sane reaction to block the master entirely when the standby ain't ready yet (it's still at the base backup state), and I can live with that. As an option. I say that either we go the lax quorum route, or we have to care for the details and summarize the failure cases with precision, and the possible responses with care. I don't see that being possible without a clear state for each element in the system, the transitions between those states, and a way to derive the global state of the distributed system from them. It might be that the simpler way to go here is what Greg Smith has been proposing for a long time already, and again quite recently on this thread: have all the information you need in a system table and offer to run a user-defined function to determine the state of the system. I think we managed to show what Josh Berkus wanted to know by now: this is a quagmire. Now, the problem I have is not Quorum Commit but the very definition of synchronous replication and the system we're trying to build. I'm not sure there are two of us wanting the same thing here. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
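[Editor's note: a sketch of that system-table-plus-UDF idea, reusing the hypothetical pg_stat_walsenders view from earlier in this digest. The DBA encodes the site's own definition of a healthy cluster as a query, rather than the server hard-coding one policy.]

    -- hypothetical throughout: true when at least k of the named standbys
    -- are currently streaming
    CREATE OR REPLACE FUNCTION quorum_ok(int) RETURNS boolean AS $$
        SELECT count(*) >= $1
        FROM pg_stat_walsenders                   -- hypothetical view
        WHERE state = 'streaming'                 -- hypothetical column
          AND client_name IN ('bostonserver', 'oxfordserver');
    $$ LANGUAGE sql;

    SELECT quorum_ok(1);  -- monitoring or failover scripts poll this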
Re: [HACKERS] Issues with Quorum Commit
Robert Haas robertmh...@gmail.com wrote: Establishing an affinity between a session and one of the database servers will only help if the traffic is strictly read-only. Thanks; I now see your point. In our environment, that's pretty common. Our most heavily used web app (the one for which we have, at times, needed load balancing) connects to the database with a read-only login. Many of our web apps do their writing by posting to queues which are handled at the appropriate source database later. (I had the opportunity to use one of these for real last night, to fill in a juror questionnaire after receiving a summons from the jury clerk in the county where I live.) Like I said, there are sane cases for this usage, but it won't fit everybody. I have no idea on percentages. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Git cvsserver serious issue
On 10/07/2010 10:11 AM, Magnus Hagander wrote: OTOH, this patch seems pretty small and simple to maintain. True, it is rather small. Does anybody know if there's an automated way to maintain that on freebsd ports, and if so, how that works? I want to be *sure* we can't accidentally upgrade git-cvsserver *without* the patch, since that is a security issue. Why not just make a local copy somewhere else and patch and run that? It's just a Perl script, no? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
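[Editor's note: for what it's worth, the local-copy approach is a one-off along these lines; the paths and patch file name are invented for illustration.]

    # keep a privately patched copy; the ports-managed original stays untouched
    cp /usr/local/libexec/git-core/git-cvsserver /usr/local/bin/pg-git-cvsserver
    patch /usr/local/bin/pg-git-cvsserver < git-cvsserver-auth.patch
    # then point the CVS entry point (inetd/ssh wrapper) at the patched copy

Magnus's reply later in this digest is about exactly this copy going stale the next time upstream fixes something.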
Re: [HACKERS] On Scalability
Firstly I want to say I think this discussion is overlooking some benefits of the current system in other use cases. I don't think we should get rid of the current system even once we have proper partitioning. It handles quite well use cases such as data warehouse queries that need to do a full table scan of some subset of the data which happens to be located in a single sub-table. In that case being able to do a sequential scan instead of an index range scan is a big benefit, and the overhead of the analysis is irrelevant for a data warehouse query. And the constraint may or may not have anything to do with the partitioning key. You could have constraints like customer_id in (...) for last month's financial records, so lookups for new customers don't need to check all the historical tables from before they became customers. In fact what I'm interested in doing is extending the support to use stats on children marked read-only. If we have a histogram for a table which has been marked read-only since the table was analyzed, then we could trust the upper and lower bounds or the most-frequent list to exclude partitions. That would really help for things like date-range lookups on tables where the partition key is financial quarter or invoice_id or some other nearly perfectly correlated column. None of this replaces having a good partitioning story for OLTP queries and management needs. But it extends the usefulness of that setup to data warehouse queries on other related columns that haven't been explicitly declared as the partitioning key. On Thu, Oct 7, 2010 at 8:35 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Vincenzo Romano vincenzo.rom...@notorand.it wrote: 2010/10/7 Stephen Frost sfr...@snowman.net: Yes, that would be the problem. Proving something based on expressions is a lot more time-consuming and complicated than being explicitly told what goes where. Consuming computing resources at DDL-time should be OK if that will lead to big savings at DML-time (run-time), in my opinion. It'd be just like compile-time optimizations. I think something you haven't entirely grasped is how pluggable PostgreSQL is -- you can not only define your own functions in a wide variety of languages (including C), but your own data types, operators, casts, index strategies, etc. I suspect it's likely that a partitioning system would only work with btree opclasses anyways. It might be interesting to think about what it would take to make the setups we've talked about in the past work with arbitrary operator classes, as long as those operator classes support some concept of mutual exclusion. But nothing we've talked about so far would be that flexible. Pre-analyzing the check constraints to construct a partitioning data structure might even be a plausible way to move forward -- I don't see any obvious show-stoppers. The system could look for a set of btree-opclass-based conditions that guarantee all the partitions are mutually exclusive ranges. My instincts tell me it would be less useful, though, because there's less the system would be able to do with that structure to help the user. That is, if it *can't* prove the constraints are mutually exclusive, then the user is left with a bunch of check constraints and no useful feedback about what they've done wrong. 
And if it can prove it, the user is happy, but the next time he has to add a partition he has to look at the existing partitions and carefully construct his check constraint, instead of having the system help him out by supplying one side of the bounds and providing a convenient syntax. It would also be hard to specify how to automatically add partitions, which I expect is a feature people will want eventually. There are some plus sides as well -- allowing some optimizations for check constraints without requiring the user to promise to always use that as their partitioning key in the future. But I think on the whole it would be a disadvantage. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
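[Editor's note: for readers following the thread, a minimal example of the current system under discussion - inheritance plus CHECK constraints, with constraint exclusion proving which children to skip. The table and ranges are made up for illustration.]

    CREATE TABLE invoices (invoice_id int, invoice_date date, amount numeric);

    -- one child per quarter; the CHECK constraint is what the planner analyzes
    CREATE TABLE invoices_2010q3 (
        CHECK (invoice_date >= '2010-07-01' AND invoice_date < '2010-10-01')
    ) INHERITS (invoices);

    SET constraint_exclusion = on;
    -- the planner proves invoices_2010q3 cannot hold matching rows and skips it
    SELECT sum(amount) FROM invoices WHERE invoice_date >= '2010-11-01';

Greg's point is that the proof works from the constraints alone, so it can also help with predicates unrelated to any declared partitioning key.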
Re: [HACKERS] Git cvsserver serious issue
On Thu, Oct 7, 2010 at 21:31, Andrew Dunstan and...@dunslane.net wrote: On 10/07/2010 10:11 AM, Magnus Hagander wrote: OTOH, this patch seems pretty small and simple to maintain. True, it is rather small. Does anybody know if there's an automated way to maintain that on FreeBSD ports, and if so, how that works? I want to be *sure* we can't accidentally upgrade git-cvsserver *without* the patch, since that is a security issue. Why not just make a local copy somewhere else and patch and run that? It's just a Perl script, no? Yeah, but then we have to remember to manually patch that one when somebody *else* finds/fixes a security issue. We have automatic monitoring on the ports stuff to detect when that happens. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Timeout and Synch Rep
All, In my effort to make the discussion around the design decisions of synch rep less opaque, I'm starting a separate thread about what has developed into one of the more contentious issues. I'm going to champion timeouts because I plan to use them. In fact, I plan to deploy synch rep with a timeout if it's available within 2 weeks of 9.1 being released. Without a timeout (i.e. wait forever is the only mode), that project will probably never use synch rep. Let me give you my use case so that you can understand why I want a timeout. The client is a telecommunications service provider. They have a primary server and a failover server for data updates. They also have two async slaves on older machines for reporting purposes. The failover currently does NOT accept any queries, in order to keep it as current as possible. They would like the failover to be synchronous so that they can guarantee no data loss in the event of a master failure. However, zero data loss is less important to them than uptime ... they have a five-nines SLA with their clients, and the hardware on the master is very good. So, if something happens to the standby and it cannot return an ack in 30 seconds, they would like it to degrade to asynch mode. At that point, they would also like to trigger a Nagios alert which will wake up the sysadmin with flashing red lights. Once he has resolved the problem, he would like to promote the now-asynch standby back to synch standby. Yes, this means that, in the event of a standby failure, they have a window where any failure on the master will mean data loss. The user regards this risk as acceptable, given that both the master and the failover are located in the same data center in any case, so there is always a risk of a sufficient disaster wiping out all data back to the daily backup. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
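[Editor's note: to pin down the semantics being requested, the degrade-to-async behavior might be expressed as configuration along these lines. Every GUC named here is hypothetical, sketched only for illustration.]

    # master's postgresql.conf -- illustrative names, not implemented syntax
    synchronous_standbys = 'failover1'
    replication_ack_timeout = 30s          # wait at most 30s for the standby's ack
    replication_timeout_action = degrade   # then commit anyway, drop to async,
                                           # and log so monitoring can alert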
Re: [HACKERS] standby registration (was: is sync rep stalled?)
On Thu, Oct 7, 2010 at 10:27 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: The standby name is a GUC in the standby's configuration file: standby_name='bostonserver' Fwiw I was hoping it would be possible to set every machine up with an identical postgresql.conf file. That doesn't preclude this idea, since you could start up your server with a script that sets the GUC on the command line, and that script could use whatever it wants to look up its name, such as using its hardware info to look it up in a database. But just something to keep in mind. In particular I would want to be able to configure everything identically and then have each node run some kind of program which determines its name and position in the replication structure. This implies that each node, given its identity and the total view of the structure, can figure out what it should be doing, including whether to be read-only or read-write, who to contact as its master, and whether to listen for slaves. If every node needs a configuration file specifying multiple interdependent variables which are all different from server to server, it'll be too hard to keep them all in sync. I would rather tell every node: here's how to push to the archive, here's how to pull, here's the whole master-slave structure, even the parts you don't need to know about and the redundant entry for yourself -- now here's your name, go figure out whether to push or pull and from where. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
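[Editor's note: a sketch of that startup-script idea. pg_ctl's -o flag passes options through to the postmaster, and -c name=value sets any GUC, so identical postgresql.conf files can be shipped everywhere while the node's name is derived at start time; standby_name is the GUC proposed in this thread, not existing syntax.]

    #!/bin/sh
    # derive this node's name from the host, then start with the shared config
    NODE_NAME=$(hostname -s)
    pg_ctl -D /var/lib/pgsql/data -o "-c standby_name=${NODE_NAME}" start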
Re: [HACKERS] On Scalability
On Thu, 2010-10-07 at 14:10 +0200, Vincenzo Romano wrote: Making these things sub-linear (whether O(log n) or even O(1)), provided that there's a way to, would make this RDBMS more appealing to enterprises. I mean also partial indexes (as an alternative to table partitioning). Being able to effectively cope with only a dozen child tables or so is more like an amateur feature. If you really need partitioning (or just hierarchical stuff) I think you'll need quite a bit more than a dozen items. If you partition by just weeks, you'll need 50+ a year. Is there any precise direction on where to look in the code for it? Is there a way to put this on a wish list? It's already on the wish list (TODO) and has been for many years. We've mostly lacked somebody with the experience and time/funding to complete that implementation work. I figure I'll be doing it for 9.2 now; it may be difficult to do this for the next release. Theoretically, this can be O(n log n) for range partitioning and O(1) for exact-value partitioning, though the latter isn't a frequent use case. Your conclusion that the current partitioning only works with a dozen or so items doesn't match the experience of current users, however. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On Thu, 2010-10-07 at 13:44 -0400, Aidan Van Dyk wrote: To get non-stale responses, you can only query those k=3 servers. But you've shot yourself in the foot because you don't know which 3/10 those will be. The other 7 *are* stale (by definition). They talk about picking the caught-up slave when the master fails, but you actually need to do that for *every query*. There is a big confusion around that point and I need to point out that the statement isn't accurate. It's taken me a long while to understand this. Asking for k > 1 does *not* mean those servers are time-synchronised. All it means is that the master will stop waiting after 3 acknowledgements. There is no connection between the master receiving acknowledgements and the standbys applying changes received from the master; the standbys are all independent of one another. In a bad case, those 3 acknowledgements might happen, say, 5 seconds apart on the worst and best of the 3 servers. So the first standby to receive the data could have applied the changes ~4.8 seconds before the 3rd standby. There is still a chance of reading stale data on one standby while reading fresh data on another server. In most cases the time window is small, but it still exists. The other 7 are stale with respect to the first 3. But then so are the last 9 compared with the first one. The value of k has nothing whatsoever to do with the time difference between the master and the last standby to receive/apply the changes. The gap between the first and last standby (i.e. N, not k) is the time window during which a query might or might not see a particular committed result. So standbys are eventually consistent whether or not the master relies on them to provide an acknowledgement. The only place where you can guarantee non-stale data is on the master. High values of k reduce the possibility of data loss, whereas expected cluster availability is reduced as N - k gets smaller. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with Quorum Commit
On Thu, 2010-10-07 at 19:50 +0200, Markus Wanner wrote: So far I've been under the impression that Simon already has the code for quorum_commit k = 1. I do, but it's not a parameter. The k = 1 behaviour is hardcoded and considerably simplifies the design. Moving to k > 1 is additional work, slows things down and seems likely to be fragile. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers