Re: [HACKERS] Replication documentation addition

2006-10-27 Thread Richard Troy

On Wed, 25 Oct 2006, Bruce Momjian wrote:

   ...snip...
>
> > Data partitioning is often done within a single database on a single
> > server and therefore, as a concept, has nothing whatsoever to do with
> > different servers. Similarly, the second paragraph of this section is
>
> Uh, why would someone split things up like that on a single server?
>
> > problematic. Please define your term first, then talk about some
> > implementations - this is muddying the water. Further, there are both
> > vertical and horizontal partitioning - you mention neither - and each has
> > its own distinct uses. If partitioning is mentioned, it should be more
> > complete.
>
> Uh, what exactly needs to be defined.

OK, "Data partitioning"; data partitioning begins in the RDB world with
the very notion of tables, and we partition our data during schema
development with the goal of "normalizing" the design - "third normal
form" being the one most professors talk about as a target. "Data
partitioning", then, is the intentional denormalization of the design to
accomplish some goal(s) - not all of which are listed in this document's
title. In this context, data partitioning takes two forms based upon which
axis of a two-dimensional table is to be divided, with the vertical
partition dividing attributes (as in a master/detail relationship with
one-to-one mapping), and the horizontal partition dividing based on the
domain, or value, of one or more attributes (as in your example of London
records being kept in a database in London, while Paris records are kept
in Paris).
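To make the two axes concrete, here is a minimal Python sketch, assuming a table is modeled as a list of dicts. The column names (`id`, `office`, `name`, `notes`) and the helper functions are hypothetical and only illustrate the idea; they are not a PostgreSQL feature.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

def horizontal_partition(rows: List[Row], column: str) -> Dict[object, List[Row]]:
    """Split rows by the value of one attribute (e.g. London vs. Paris)."""
    parts: Dict[object, List[Row]] = {}
    for row in rows:
        parts.setdefault(row[column], []).append(row)
    return parts

def vertical_partition(rows: List[Row], key: str,
                       core_cols: List[str]) -> Tuple[List[Row], List[Row]]:
    """Split attributes into a narrow core table and a one-to-one detail
    table; both keep the key column so they can be joined back together."""
    core = [{c: r[c] for c in [key] + core_cols} for r in rows]
    detail = [{c: v for c, v in r.items() if c == key or c not in core_cols}
              for r in rows]
    return core, detail

def rejoin(core: List[Row], detail: List[Row], key: str) -> List[Row]:
    """Undo a vertical partition with a one-to-one join on the key."""
    by_key = {d[key]: d for d in detail}
    return [{**c, **by_key[c[key]]} for c in core]

rows = [
    {"id": 1, "office": "London", "name": "A", "notes": "long free text"},
    {"id": 2, "office": "Paris", "name": "B", "notes": "more free text"},
]
by_office = horizontal_partition(rows, "office")          # divides rows
core, detail = vertical_partition(rows, "id", ["office", "name"])  # divides columns
```

The vertical case is the one described next: the frequently joined core table stays narrow, and the wide, rarely needed attributes live in the detail table.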

The point I was making was that that section of the document was in error
because it presumed there was only one form of data partitioning and that
it was horizontal. (The document is now missing, so I can't look at the
current content - it was here:
ftp://momjian.us/pub/postgresql/mypatches/replication.)

In answer to your query about why someone would use such partitioning, the
nearly universal answer is performance, and the distant second answer is
security. In one example that comes immediately to mind, there is a table
which is a central core of an application, and, as such, there's a lot to
say about the items in this table. The table's size is in the tens to
hundreds of millions of rows, and it needs to be joined with something else
in a huge fraction of queries. For performance reasons, the table's size
was therefore kept as tiny as possible, and one or more detail tables are
used for the remaining attributes that logically belong in the table - it's a
vertical partition. It's an exceptionally common technique - so common, it
probably didn't occur to you that you were even talking about it when you
spoke of "data partitioning."

> > Next, Query Broadcast Load Balancing... also needs a lot of work. First,
> > it's foremost in my memory that sending read queries everywhere and
> > returning the first result set back is a key way to improve application
> > performance at the cost of additional load on other systems - I guess
> > that's not at all what the document is after here, but it's a worthy part
> > of a dialogue on broadcasting queries. In other words, this has more parts
> > to it than just what the document now entertains. Secondly, the document
>
> Uh, do we want to go into that here?  I guess I could.
>
> > doesn't address _at_all_ whether this is a two-phase-commit environment
> > or not. If not, how are updates managed? If each server operates
> > independently and one of them fails, what do you do then? How do you know
> > _any_ server got an insert/update? ...  Each server _can't_ operate
> > independently unless the application does its own insert/update commits to
> > every one of them - and that can't be fast, nor does it load balance,
> > though it may contribute to superior uptime performance by the
> > application.
>
> I think having the application middle layer do the commits is how it
> works now.  Can someone explain how pgpool works, or should we mention
> how two-phase commit has to be done here?  pgpool2 has additional
> features.

Well, you hadn't mentioned two-phase commit at all, and it surely belongs
somewhere in this document - it's a core PG feature and enables a lot of
alternative solutions which the document discusses.

What it needs to say but doesn't (didn't?) is that the load from read
queries can be distributed for load balancing purposes but that there's no
benefit possible for writes, and that replication overhead costs could
possibly overwhelm the benefits in high-update scenarios. The point that
each server operates independently is only true if you ignore the
necessary replication - which, to my mind, links the systems and they are
not independent. ...I suppose that in a completely read-only environment -
or updated nightly by dumping tarwads or something like that, they could
be considered independent, but it's hardly worth the sentence.
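The point that reads can be spread out while writes cannot can be shown with a minimal Python sketch. The `Server` and `Router` classes are hypothetical in-memory stand-ins, not pgpool or any real middleware: each read touches one server chosen round-robin, but every write has to be applied to every copy.

```python
import itertools
from typing import Dict, List, Optional

class Server:
    """Stand-in for one database server holding a copy of the data."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.work = 0  # number of statements this server has executed

    def read(self, key: str) -> Optional[str]:
        self.work += 1
        return self.data.get(key)

    def write(self, key: str, value: str) -> None:
        self.work += 1
        self.data[key] = value

class Router:
    def __init__(self, servers: List[Server]) -> None:
        self.servers = servers
        self._next = itertools.cycle(servers)

    def read(self, key: str) -> Optional[str]:
        # Each read hits exactly one server: this is where balancing helps.
        return next(self._next).read(key)

    def write(self, key: str, value: str) -> None:
        # Each write must reach every copy, so write work grows with the
        # number of servers instead of being divided among them.
        for s in self.servers:
            s.write(key, value)
```

The sketch deliberately ignores failures: if one `write` in the loop fails, the copies diverge, which is exactly the gap two-phase commit or a replication system has to close.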

Regards,
Richard

-- 
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 o

Re: [HACKERS] Replication documentation addition

2006-10-26 Thread Andrew Sullivan
On Thu, Oct 26, 2006 at 03:06:13PM -0400, Robert Treat wrote:
> 
> Unfortunately the techdocs system won't support a url like the one above, 
> rather you'll end up with something more like the following  
> http://www.postgresql.org/docs/techdocs.54 which is the "GUI Tools Guide" 
> (which is linked in the FAQ fwiw).  Once it is in place, it will be stable 
> though. 

Surely this is what redirects were invented for, no? 

http://www.postgresql.org/replication redirects to [stable magic URL]

Put the former in the docs.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Users never remark, "Wow, this software may be buggy and hard 
to use, but at least there is a lot of code underneath."
--Damien Katz

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Replication documentation addition

2006-10-26 Thread Robert Treat
On Thursday 26 October 2006 10:45, Andrew Sullivan wrote:
> On Wed, Oct 25, 2006 at 05:46:33PM -0400, Bruce Momjian wrote:
> > Josh Berkus wrote:
> > > So, like www.postgresql.org/docs/techdocs/replication?   That would
> > > work.
> >
> > Yes.
>
> I like that idea, but I think that the URL needs to be decided upon,
> needs to be stable, and needs to be put into the docs.  (I don't see
> it ATM, I guess because the URL isn't chosen yet?)  We get so many
> questions about "what replication system" that I'm sure people are
> looking for outlines.
>
> A

Unfortunately the techdocs system won't support a url like the one above, 
rather you'll end up with something more like the following  
http://www.postgresql.org/docs/techdocs.54 which is the "GUI Tools Guide" 
(which is linked in the FAQ fwiw).  Once it is in place, it will be stable 
though. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: [HACKERS] Replication documentation addition

2006-10-26 Thread Richard Troy

On Wed, 25 Oct 2006, Josh Berkus wrote:
>
> Bruce,
>
> > It isn't designed for that.  It is designed for people to understand
> > what they want, and then they can look around for solutions.  I think
> > most agree we don't want a list of solutions in the documentation,
> > though I have a few as examples.
>
> Do they?   I've seen no discussion of the matter.  I think we should have
> them.
>
>

I completely agree; if you want to attract competent people from the
business world, one thing you have to do is respect their time by helping
them find information, especially about things they don't know exist. All
that's needed are pointers, but the pointers need to be to solid
documents/resources, not just the top of a heap - if you'll forgive the
pun.

Richard



-- 
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
[EMAIL PROTECTED], http://ScienceTools.com/



Re: [HACKERS] Replication documentation addition

2006-10-26 Thread Alexey Klyukin

Hi,

A typo:
("a write to any server has to be _propogated_")
s/propogated/propagated

Bruce Momjian wrote:

Here is a new replication documentation section I want to add for 8.2:

ftp://momjian.us/pub/postgresql/mypatches/replication

Comments welcomed.

  

--
Regards,

Alexey Klyukin  alexk(at)vollmond.org.ua
Simferopol, Crimea, Ukraine.



Re: [HACKERS] Replication documentation addition

2006-10-26 Thread Bruce Momjian

With no new additions submitted today, I have moved my text into our
SGML documentation:

http://momjian.us/main/writings/pgsql/sgml/failover.html

Please let me know what additional changes are needed.


Re: [HACKERS] Replication documentation addition

2006-10-26 Thread Andrew Sullivan
On Wed, Oct 25, 2006 at 05:46:33PM -0400, Bruce Momjian wrote:
> Josh Berkus wrote:
> > So, like www.postgresql.org/docs/techdocs/replication?   That would work.
> 
> Yes.

I like that idea, but I think that the URL needs to be decided upon,
needs to be stable, and needs to be put into the docs.  (I don't see
it ATM, I guess because the URL isn't chosen yet?)  We get so many
questions about "what replication system" that I'm sure people are
looking for outlines.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism. 
--Brad Holland


Re: [HACKERS] Replication documentation addition

2006-10-25 Thread Bruce Momjian
Josh Berkus wrote:
> Bruce,
> 
> > Most people didn't want a list because there is no way to keep it
> > current in the docs, and a secondary web site was suggested for the
> > list.
> 
> So, like www.postgresql.org/docs/techdocs/replication?   That would work.

Yes.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] Replication documentation addition

2006-10-25 Thread Josh Berkus
Bruce,

> Most people didn't want a list because there is no way to keep it
> current in the docs, and a secondary web site was suggested for the
> list.

So, like www.postgresql.org/docs/techdocs/replication?   That would work.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: [HACKERS] Replication documentation addition

2006-10-25 Thread Bruce Momjian
Josh Berkus wrote:
> Bruce,
> 
> > It isn't designed for that.  It is designed for people to understand
> > what they want, and then they can look around for solutions.  I think
> > most agree we don't want a list of solutions in the documentation,
> > though I have a few as examples.  
> 
> Do they?   I've seen no discussion of the matter.  I think we should have 
> them.

Most people didn't want a list because there is no way to keep it
current in the docs, and a secondary web site was suggested for the
list.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] Replication documentation addition

2006-10-25 Thread Josh Berkus
Bruce,

> It isn't designed for that.  It is designed for people to understand
> what they want, and then they can look around for solutions.  I think
> most agree we don't want a list of solutions in the documentation,
> though I have a few as examples.  

Do they?   I've seen no discussion of the matter.  I think we should have 
them.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: [HACKERS] Replication documentation addition

2006-10-25 Thread Bruce Momjian
Josh Berkus wrote:
> Bruce,
> 
> > > > ftp://momjian.us/pub/postgresql/mypatches/replication
> 
> I'm still not seeing anything in this patch that tells users where they can 
> get replication solutions for PostgreSQL, either OSS or commercial.

It isn't designed for that.  It is designed for people to understand
what they want, and then they can look around for solutions.  I think
most agree we don't want a list of solutions in the documentation,
though I have a few as examples.  Also, some of the solutions don't
require software, but just configuration or special hardware.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] Replication documentation addition

2006-10-25 Thread Bruce Momjian
Richard Troy wrote:
> 
> > Here is a new replication documentation section I want to add for 8.2:
> >
> > ftp://momjian.us/pub/postgresql/mypatches/replication
> >
> 
> ...Read the document, as promised...
> 
> First paragraph, "(fail over)" is inconsistent with title, "failover", as
> are other spots throughout the document. The whole document should be
> consistent and I vote for "failover" and not "fail over."

OK.  Fixed to "failover"

> Fourth paragraph, "This "sync problem" is the fundamental difficulty for
> servers working together"; "Sync problem" hasn't been defined. Actually,
> you're talking about the consistency attribute of the ACID properties of
> all competent databases: Atomicity, Consistency, Isolation, and Durability.
> At least define the term you are using - probably most easily done in the
> preceding paragraph.

OK, "sync problem" term removed, and spelled out fully.

> The fifth paragraph needs a lot more help, I think. How about this
> alternative:
>
> So-called "two-phase commit" was developed as a strategy in which two or
> more databases are updated simultaneously and none of the data is
> committed until all are committed. This guarantees consistency between the
> databases with all propagation delay being absorbed by the writer at write
> time. There are times when this propagation delay is large, so sometimes
> alternatives are worked out which we'll call here "asynchronous updates,"
> however, in these cases, there is always a window of time in which some
> transaction can be lost should a failure occur. For this reason,
> asynchronous updates are only used when the possibility of such losses is
> acceptable.

I have modified the paragraph to use some of your terms.

> Paragraphs six through to "shared disk failover" seem very awkward to me.
> I don't like them at all.
> 
> "Shared disk failover" has nothing to do with "the sync problem" as it's
> not a multiple-database solution. It's an uptime, "24 X 7 X 365" issue.
> Further, it also has nothing to do with disk arrays, though it is often
> used with RAID to help avoid disk based corruption problems.

Yes, please see updated version.  I removed the sync problem term from
there.

> The point about Warm Standby needs to include a warning about WAL that it
> MUST be sensitive to the semantics of the database design or else it's
> fatally flawed. I'm talking about "referential integrity". That is to say,
> it's inappropriate to capture updates on a table by table basis, as some
> such systems do, (I have no idea what's done by anyone in the PG world on
> this right now) because updates to one table (esp. inserts) very often
> go hand in glove with updates in other tables and to get one without the
> other can corrupt a database.

We don't have that problem.  We recover only full transactions.

> The description of "Continuously running replication server" should
> include the critical caveat - repeated if you think it's already said
> elsewhere - that it is ONLY suitable for applications in which a loss of
> (missing) update data doesn't matter. For example, an airline reservation
> system would be an inappropriate application for such a "solution" because
> what seats are available cannot be guaranteed to be correct.

I have added a note about data loss for the Slony item.

> Regarding data partitioning, I strongly disagree with the opening sentence
> in that it doesn't split a database into sets, it splits tables into sets.

OK, changed.

> Data partitioning is often done within a single database on a single
> server and therefore, as a concept, has nothing whatsoever to do with
> different servers. Similarly, the second paragraph of this section is

Uh, why would someone split things up like that on a single server?

> problematic. Please define your term first, then talk about some
> implementations - this is muddying the water. Further, there are both
> vertical and horizontal partitioning - you mention neither - and each has
> its own distinct uses. If partitioning is mentioned, it should be more
> complete.

Uh, what exactly needs to be defined.

> Next, Query Broadcast Load Balancing... also needs a lot of work. First,
> it's foremost in my memory that sending read queries everywhere and
> returning the first result set back is a key way to improve application
> performance at the cost of additional load on other systems - I guess
> that's not at all what the document is after here, but it's a worthy part
> of a dialogue on broadcasting queries. In other words, this has more parts
> to it than just what the document now entertains. Secondly, the document

Uh, do we want to go into that here?  I guess I could.

> doesn't address _at_all_ whether this is a two-phase-commit environment
> or not. If not, how are updates managed? If each server operates
> independently and one of them fails, what do you do then? How do you know
> _any_ server got an insert/update? ...  Each server _can't_ operate
> independently unl

Re: [HACKERS] Replication documentation addition

2006-10-25 Thread Josh Berkus
Bruce,

> > >   ftp://momjian.us/pub/postgresql/mypatches/replication

I'm still not seeing anything in this patch that tells users where they can 
get replication solutions for PostgreSQL, either OSS or commercial.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Replication documentation addition

2006-10-25 Thread Bruce Momjian
Alexey Klyukin wrote:
> Hi,
> 
> A typo:
> ("a write to any server has to be _propogated_")
> s/propogated/propagated

Thanks, fixed.

---


> 
> Bruce Momjian wrote:
> > Here is a new replication documentation section I want to add for 8.2:
> >
> > ftp://momjian.us/pub/postgresql/mypatches/replication
> >
> > Comments welcomed.
> >
> >   
> -- 
> Regards,
> 
> Alexey Klyukin  alexk(at)vollmond.org.ua
> Simferopol, Crimea, Ukraine.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] Replication documentation addition

2006-10-25 Thread Richard Troy

> Here is a new replication documentation section I want to add for 8.2:
>
> ftp://momjian.us/pub/postgresql/mypatches/replication
>

...Read the document, as promised...

First paragraph, "(fail over)" is inconsistent with title, "failover", as
are other spots throughout the document. The whole document should be
consistent and I vote for "failover" and not "fail over."

Fourth paragraph, "This "sync problem" is the fundamental difficulty for
servers working together"; "Sync problem" hasn't been defined. Actually,
you're talking about the consistency attribute of the ACID properties of
all competent databases: Atomicity, Consistency, Isolation, and Durability.
At least define the term you are using - probably most easily done in the
preceding paragraph.

The fifth paragraph needs a lot more help, I think. How about this
alternative:

So-called "two-phase commit" was developed as a strategy in which two or
more databases are updated simultaneously and none of the data is
committed until all are committed. This guarantees consistency between the
databases with all propagation delay being absorbed by the writer at write
time. There are times when this propagation delay is large, so sometimes
alternatives are worked out which we'll call here "asynchronous updates,"
however, in these cases, there is always a window of time in which some
transaction can be lost should a failure occur. For this reason,
asynchronous updates are only used when the possibility of such losses is
acceptable.
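The two-phase commit idea in the proposed paragraph can be sketched as a simplified in-memory model in Python. The `Participant` class and `two_phase_commit` coordinator are hypothetical illustrations. PostgreSQL itself provides the participant side through `PREPARE TRANSACTION`, `COMMIT PREPARED` and `ROLLBACK PREPARED` (8.1 and later), but the coordinator logic here is only a sketch, not how any particular tool does it.

```python
from typing import Dict, List

class Participant:
    """One database taking part in a distributed transaction (simulated)."""
    def __init__(self, name: str, will_fail: bool = False) -> None:
        self.name = name
        self.will_fail = will_fail
        self.committed: Dict[str, str] = {}
        self.pending: Dict[str, str] = {}

    def prepare(self, changes: Dict[str, str]) -> bool:
        # Phase 1: stage the changes and vote yes (True) or no (False).
        if self.will_fail:
            return False
        self.pending = dict(changes)
        return True

    def commit(self) -> None:
        # Phase 2, success: make the staged changes visible.
        self.committed.update(self.pending)
        self.pending = {}

    def rollback(self) -> None:
        # Phase 2, failure: discard the staged changes.
        self.pending = {}

def two_phase_commit(participants: List[Participant],
                     changes: Dict[str, str]) -> bool:
    """Commit the changes on all participants, or on none of them."""
    prepared: List[Participant] = []
    for p in participants:
        if p.prepare(changes):
            prepared.append(p)
        else:
            for q in prepared:  # one "no" vote aborts everyone
                q.rollback()
            return False
    for p in participants:
        p.commit()
    return True
```

The writer does not get an answer until every participant has voted, which is where "all propagation delay being absorbed by the writer at write time" comes from.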

Paragraphs six through to "shared disk failover" seem very awkward to me.
I don't like them at all.

"Shared disk failover" has nothing to do with "the sync problem" as it's
not a multiple-database solution. It's an uptime, "24 X 7 X 365" issue.
Further, it also has nothing to do with disk arrays, though it is often
used with RAID to help avoid disk based corruption problems.

The point about Warm Standby needs to include a warning about WAL that it
MUST be sensitive to the semantics of the database design or else it's
fatally flawed. I'm talking about "referential integrity". That is to say,
it's inappropriate to capture updates on a table by table basis, as some
such systems do, (I have no idea what's done by anyone in the PG world on
this right now) because updates to one table (esp. inserts) very often
go hand in glove with updates in other tables and to get one without the
other can corrupt a database.
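Bruce's answer to this point ("We recover only full transactions") can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical log of `(xid, kind, table, row)` records; this is not the real WAL format, and it models only the visible effect, not how WAL replay is actually implemented. Changes are buffered per transaction and applied only once that transaction's commit record appears, so a standby never ends up with one table's insert but not the matching insert in another table.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log record: (transaction id, kind, table, row).
# kind is "change" or "commit"; table and row are None for commit records.
Record = Tuple[int, str, Optional[str], Optional[str]]

def replay(log: List[Record]) -> Dict[str, List[str]]:
    """Apply only transactions whose commit record appears in the log."""
    pending: Dict[int, List[Tuple[str, str]]] = {}
    tables: Dict[str, List[str]] = {}
    for xid, kind, table, row in log:
        if kind == "change" and table is not None and row is not None:
            pending.setdefault(xid, []).append((table, row))
        elif kind == "commit":
            # All of the transaction's changes, across all tables, at once.
            for t, r in pending.pop(xid, []):
                tables.setdefault(t, []).append(r)
    # Whatever is left in `pending` belongs to transactions that never
    # committed (e.g. the primary failed mid-transaction) and is dropped.
    return tables
```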

The description of "Continuously running replication server" should
include the critical caveat - repeated if you think it's already said
elsewhere - that it is ONLY suitable for applications in which a loss of
(missing) update data doesn't matter. For example, an airline reservation
system would be an inappropriate application for such a "solution" because
what seats are available cannot be guaranteed to be correct.
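Why asynchronous replication has a loss window can be shown with a minimal Python sketch. It assumes a hypothetical primary that acknowledges each commit immediately and ships committed work to a standby in batches. The class and method names are illustrative only, not Slony's API.

```python
from typing import List

class AsyncPrimary:
    def __init__(self) -> None:
        self.committed: List[str] = []  # acknowledged to the client
        self.unshipped: List[str] = []  # committed but not yet on the standby
        self.standby: List[str] = []

    def commit(self, txn: str) -> None:
        # The client is told "done" before the standby has the data.
        self.committed.append(txn)
        self.unshipped.append(txn)

    def ship(self) -> None:
        # Periodic replication step: send everything queued so far.
        self.standby.extend(self.unshipped)
        self.unshipped.clear()

    def fail_over(self) -> List[str]:
        """The primary dies: return acknowledged transactions that are lost."""
        lost = list(self.unshipped)
        self.unshipped.clear()
        return lost

p = AsyncPrimary()
p.commit("seat 12A")
p.ship()
p.commit("seat 12B")
lost = p.fail_over()
```

In the airline example, "seat 12B" was confirmed to a customer, yet the standby that takes over still shows the seat as free.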

Regarding data partitioning, I strongly disagree with the opening sentence
in that it doesn't split a database into sets, it splits tables into sets.
Data partitioning is often done within a single database on a single
server and therefore, as a concept, has nothing whatsoever to do with
different servers. Similarly, the second paragraph of this section is
problematic. Please define your term first, then talk about some
implementations - this is muddying the water. Further, there are both
vertical and horizontal partitioning - you mention neither - and each has
its own distinct uses. If partitioning is mentioned, it should be more
complete.

Next, Query Broadcast Load Balancing... also needs a lot of work. First,
it's foremost in my memory that sending read queries everywhere and
returning the first result set back is a key way to improve application
performance at the cost of additional load on other systems - I guess
that's not at all what the document is after here, but it's a worthy part
of a dialogue on broadcasting queries. In other words, this has more parts
to it than just what the document now entertains. Secondly, the document
doesn't address _at_all_ whether this is a two-phase-commit environment
or not. If not, how are updates managed? If each server operates
independently and one of them fails, what do you do then? How do you know
_any_ server got an insert/update? ...  Each server _can't_ operate
independently unless the application does its own insert/update commits to
every one of them - and that can't be fast, nor does it load balance,
though it may contribute to superior uptime performance by the
application.
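The "send the read everywhere, keep the first answer" technique can be sketched with Python's standard `concurrent.futures` module. The `query_server` function and its simulated latencies are hypothetical stand-ins for real database connections.

```python
import concurrent.futures as cf
import time
from typing import Dict, List, Tuple

def query_server(name: str, latency: float, sql: str) -> Tuple[str, List[int]]:
    """Stand-in for running `sql` on one server; `latency` simulates its load."""
    time.sleep(latency)
    return name, [1, 2, 3]

def broadcast_read(servers: Dict[str, float], sql: str) -> Tuple[str, List[int]]:
    """Send the same read to every server and return the first result set."""
    pool = cf.ThreadPoolExecutor(max_workers=len(servers))
    futures = [pool.submit(query_server, n, lat, sql) for n, lat in servers.items()]
    done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    # Don't wait for the slower servers; their work is the extra load
    # this technique trades for lower latency.
    pool.shutdown(wait=False)
    return next(iter(done)).result()
```

The fastest server answers the client, but every server still executes the query, so this improves response time rather than spreading load.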

Next up: I'm not aware of any current products or projects that provide
parallel query execution, though Informix might - I can ask a colleague or
two. Either way, it's probably best to simply define the term (perhaps in
a little more detail), and not mention solutions - they change with time
anyway.

While I've never used Oracle's clustering tools, I've read up on them and
have customers who use them, and I think this description of Oracle
clustering is a mis-read on what the 

Re: [HACKERS] Replication documentation addition

2006-10-25 Thread Richard Troy

Hi Hannu, everyone,

I apologize for not having read the document in question - will do
shortly. My comments are brought about by the dialogue I read on list this
morning...

> > Here is a new replication documentation section I want to add for 8.2:
> >
> > ftp://momjian.us/pub/postgresql/mypatches/replication
>

> > Data Partitioning
> > -
> >
> > Data partitioning splits the database into data sets.  To achieve
> > replication, each data set can only be modified by one server.  For
> > example, data can be partitioned by offices, e.g. London and Paris.
> > While London and Paris servers have all data records, only London can
> > modify London records, and Paris can only modify Paris records.  Such
> > partitioning is usually accomplished in application code, though rules
> > and triggers can help enforce partitioning and keep the read-only data
> > sets current.  Slony can also be used in such a setup.  While Slony
> > replicates only entire tables, London and Paris can be placed in
> > separate tables, and inheritance can be used to access from both tables
> > using a single table name.
>
> Maybe another use of partitioning should also be mentioned. That is ,
> when partitioning is used to overcome limitations of single servers
> (especially IO and memory, but also CPU), and only a subset of data is
> stored and processed on each server.

> > I think the "official" term for this kind of "replication" is
> > Shared-Nothing Clustering.
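The London/Paris scheme in the quoted draft, where every server holds all records but only the owning office may modify its own, can be sketched in Python. The `OfficeServer` class and the `office` column are hypothetical, and propagation to the read-only copies is reduced to a direct call rather than triggers or Slony.

```python
from typing import Dict, List

class OfficeServer:
    def __init__(self, office: str) -> None:
        self.office = office
        self.rows: Dict[int, Dict[str, str]] = {}  # copy of every office's rows
        self.peers: List["OfficeServer"] = []

    def write(self, row_id: int, row: Dict[str, str]) -> None:
        # Only the owning office may modify a record; this stands in for
        # the check that application code, rules or triggers would enforce.
        if row["office"] != self.office:
            raise PermissionError(f"{self.office} cannot modify {row['office']} records")
        self.rows[row_id] = row
        for peer in self.peers:  # keep the read-only copies current
            peer.rows[row_id] = dict(row)

london, paris = OfficeServer("London"), OfficeServer("Paris")
london.peers = [paris]
paris.peers = [london]
london.write(1, {"office": "London", "name": "A"})
```

Because each record has exactly one writer, the servers never need to resolve conflicting updates; this is what makes the scheme workable without two-phase commit.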

"Data partitioning" has two fundamental flavors, "horizontal" and
"vertical", quite a handful of implementations, and even more motivations
behind why one uses either strategy and whatever implementation. The same
is true for "clustering" - a few fundamental strategies, with a larger
number of implementations and yet more motivations. Replication,
meanwhile, is yet another beast altogether, sharing the same fundamentals
of multiple flavors, implementations and motivations. … I strongly urge
keeping any documentation on these (and related) topics strictly distinct
and separate.

In my view, one should define the terms first, separately, distinctly, and
as succinctly as possible, and, following this, a dialogue on how these
may be combined can be entertained. The definitions of each should be both
complete and academic in flavor and may include implementation and
motivational information, but never "muddy the water" by mixing with
other concepts - not yet, not until after all the fundamentals have been
introduced.

I don't know much about what PostgreSQL has been doing in these areas of
late - nothing, I gather from someone's post this morning - but I'll try
to help out as I can with a paragraph or two - whatever you want,
whatever's welcome - as "I was there" when Randy Eash created the first
commercial RDBMS replicator - for Ingres - and since I created the first
commercial RDBMS front-end failover technology, also for Ingres, so I have
a pretty good handle on all the issues.

Also, I liked what Markus Schiltknecht wrote, but will have to read the
original before I can comment on his specific points.

>> I am not inclined to add commercial offerings.  If people wanted
>> commercial database offerings, they can get them from companies that
>> advertise.  People are coming to PostgreSQL for open source solutions,
>> and I think mentioning commercial ones doesn't make sense.
>>
>> If we are to add them, I need to hear that from people who haven't
>> worked in PostgreSQL commercial replication companies.
>
> I'm not coming to PostgreSQL for open source solutions. I'm coming
> to PostgreSQL for _good_ solutions.
>
> I want to see what solutions might be available for a problem I have.
> I certainly want to know whether they're freely available, commercial
> or some flavour of open source, but I'd like to know about all of them.
>
> A big part of the value of Postgresql is the applications and extensions
> that support it. Hiding the existence of some subset of those just
> because of the way they're licensed is both underselling postgresql
> and doing something of a disservice to the user of the document.

> If potential new users look through the docs and it says no options
> available for what they want or consider they will need in the future
> then they go elsewhere, if they know that some options are available
> then they will look further if they want that feature.


I agree that people look through the materials on the web site,
documentation especially, and make choices based upon what they see. Many
of us don't have time to spend a day searching the web for things we don't
even know exist. By including more information, more users will be
attracted to PostgreSQL, whether it be in the documentation or web site. I
have been SURE that certain things must exist in the PG world, but haven't
known about them with certainty due to time constraints, and would gladly
point our customers at Postgres solutions if only I knew about them. Count
this paragraph 

Re: [HACKERS] Replication documentation addition

2006-10-25 Thread Bruce Momjian

I have added this text:

Commercial Solutions


Because PostgreSQL is open source and easily extended, a number of
companies have taken PostgreSQL and created commercial closed-source
solutions with unique failover, replication, and load balancing
capabilities.


---

Hannu Krosing wrote:
> On a fine day, Tuesday 2006-10-24 at 22:57, Bruce Momjian wrote:
> > I don't think the PostgreSQL documentation should be mentioning
> > commercial solutions.
> 
> IMNSHO, having commercial solutions based on postgresql which extend
> postgres in directions not (yet?) done by core postgres is nothing to be
> ashamed of.
> 
> And we should at least mention the OSS version of Bizgres as a place
> where quite a lot of initial development is done on performance
> improvements considered too risky for mainline postgresql.
> 
> And if you need a more technical reason, you can use free libpq and psql
> to connect to even Bizgres MPP ;)
> 
> 
> > ---
> > 
> > Luke Lonergan wrote:
> > > Bruce, 
> > > 
> > > > -Original Message-
> > > > From: [EMAIL PROTECTED] 
> > > > [mailto:[EMAIL PROTECTED] On Behalf Of Bruce Momjian
> > > > Sent: Tuesday, October 24, 2006 5:16 PM
> > > > To: Hannu Krosing
> > > > Cc: PostgreSQL-documentation; PostgreSQL-development
> > > > Subject: Re: [HACKERS] Replication documentation addition
> > > > 
> > > > 
> > > > OK, I have updated the URL.  Please let me know how you like it.
> > > 
> > > There's a typo on line 8, first paragraph:
> > > 
> > > "perhaps with only one server allowing write rwork together at the same
> > > time."
> > > 
> > > Also, consider this wording of the last description:
> > > 
> > > "Single-Query Clustering..."
> > > 
> > > Replaced by:
> > > 
> > > "Shared Nothing Clustering
> > > ---
> > > 
> > > This allows multiple servers with separate disks to work together on
> > > each query. In shared nothing clusters, the work of answering each
> > > query is distributed among the servers to increase the performance
> > > through parallelism. These systems will typically feature high
> > > availability by using other forms of replication internally.
> > > 
> > > While there are no open source options for this type of clustering,
> > > there are several commercial products available that implement this
> > > approach, making PostgreSQL achieve very high performance for
> > > multi-Terabyte business intelligence databases."
> > > 
> > > - Luke
> > 
> -- 
> 
> Hannu Krosing
> Database Architect
> Skype Technologies OÜ
> Akadeemia tee 21 F, Tallinn, 12618, Estonia
> 
> Skype me:  callto:hkrosing
> Get Skype for free:  http://www.skype.com

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Replication documentation addition

2006-10-25 Thread Hannu Krosing
On a fine day, Tuesday 2006-10-24 at 22:57, Bruce Momjian wrote:
> I don't think the PostgreSQL documentation should be mentioning
> commercial solutions.

IMNSHO, having commercial solutions based on postgresql which extend
postgres in directions not (yet?) done by core postgres is nothing to be
ashamed of.

And we should at least mention the OSS version of Bizgres as a place
where quite a lot of initial development is done on performance
improvements considered too risky for mainline postgresql.

And if you need a more technical reason, you can use free libpq and psql
to connect to even Bizgres MPP ;)


> ---
> 
> Luke Lonergan wrote:
> > Bruce, 
> > 
> > > -Original Message-
> > > From: [EMAIL PROTECTED] 
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Bruce Momjian
> > > Sent: Tuesday, October 24, 2006 5:16 PM
> > > To: Hannu Krosing
> > > Cc: PostgreSQL-documentation; PostgreSQL-development
> > > Subject: Re: [HACKERS] Replication documentation addition
> > > 
> > > 
> > > OK, I have updated the URL.  Please let me know how you like it.
> > 
> > There's a typo on line 8, first paragraph:
> > 
> > "perhaps with only one server allowing write rwork together at the same
> > time."
> > 
> > Also, consider this wording of the last description:
> > 
> > "Single-Query Clustering..."
> > 
> > Replaced by:
> > 
> > "Shared Nothing Clustering
> > ---
> > 
> > This allows multiple servers with separate disks to work together on
> > each query. In shared nothing clusters, the work of answering each
> > query is distributed among the servers to increase the performance
> > through parallelism. These systems will typically feature high
> > availability by using other forms of replication internally.
> > 
> > While there are no open source options for this type of clustering,
> > there are several commercial products available that implement this
> > approach, making PostgreSQL achieve very high performance for
> > multi-Terabyte business intelligence databases."
> > 
> > - Luke
> 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Joshua D. Drake
Josh Berkus wrote:
> Bruce,
> 
>> I have updated the text.  Please let me know what else I should change.
>> I am unsure if I should be mentioning commercial PostgreSQL products in
>> our documentation.
> 
> I think you should mention the postgresql-only ones, but just briefly with a 
> link.  Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.

And to further this I would expect that it would be a subsection.. e.g;
a  or . I think the open source version should absolutely
get top billing though.

Sincerely,

Joshua D. Drake




-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Josh Berkus
Bruce,

> I have updated the text.  Please let me know what else I should change.
> I am unsure if I should be mentioning commercial PostgreSQL products in
> our documentation.

I think you should mention the postgresql-only ones, but just briefly with a 
link.  Bizgres MPP, ExtenDB, uni/cluster, and Mammoth Replicator.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Bruce Momjian

I don't think the PostgreSQL documentation should be mentioning
commercial solutions.

---

Luke Lonergan wrote:
> Bruce, 
> 
> > -Original Message-
> > From: [EMAIL PROTECTED] 
> > [mailto:[EMAIL PROTECTED] On Behalf Of Bruce Momjian
> > Sent: Tuesday, October 24, 2006 5:16 PM
> > To: Hannu Krosing
> > Cc: PostgreSQL-documentation; PostgreSQL-development
> > Subject: Re: [HACKERS] Replication documentation addition
> > 
> > 
> > OK, I have updated the URL.  Please let me know how you like it.
> 
> There's a typo on line 8, first paragraph:
> 
> "perhaps with only one server allowing write rwork together at the same
> time."
> 
> Also, consider this wording of the last description:
> 
> "Single-Query Clustering..."
> 
> Replaced by:
> 
> "Shared Nothing Clustering
> ---
> 
> This allows multiple servers with separate disks to work together on
> each query. In shared nothing clusters, the work of answering each
> query is distributed among the servers to increase the performance
> through parallelism. These systems will typically feature high
> availability by using other forms of replication internally.
> 
> While there are no open source options for this type of clustering,
> there are several commercial products available that implement this
> approach, making PostgreSQL achieve very high performance for
> multi-Terabyte business intelligence databases."
> 
> - Luke

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Bruce Momjian
Markus Schiltknecht wrote:
> Looking at that, I'm a) missing PgCluster and b) arguing that we have to 
> admit that we simply can not 'list .. replication solutions ... and how 
> to get them' because all of the solutions mentioned need quite some 
> knowledge and require a more or less complex installation and configuration.

Where is pgcluster in terms of usability?  Should I mention it?

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Bruce Momjian

I have updated the text.  Please let me know what else I should change. 
I am unsure if I should be mentioning commercial PostgreSQL products in
our documentation.

---

Hannu Krosing wrote:
> On a fine day, Tuesday 2006-10-24 at 00:20, Bruce Momjian wrote:
> > Here is a new replication documentation section I want to add for 8.2:
> > 
> > ftp://momjian.us/pub/postgresql/mypatches/replication
> 
> This is how data partitioning is currently described there
> 
> > Data Partitioning
> > -
> > 
> > Data partitioning splits the database into data sets.  To achieve
> > replication, each data set can only be modified by one server.  For
> > example, data can be partitioned by offices, e.g. London and Paris. 
> > While London and Paris servers have all data records, only London can
> > modify London records, and Paris can only modify Paris records.  Such
> > partitioning is usually accomplished in application code, though rules
> > and triggers can help enforce partitioning and keep the read-only data
> > sets current.  Slony can also be used in such a setup.  While Slony
> > replicates only entire tables, London and Paris can be placed in
> > separate tables, and inheritance can be used to access from both tables
> > using a single table name.
> 
> Maybe another use of partitioning should also be mentioned. That is,
> when partitioning is used to overcome limitations of single servers
> (especially IO and memory, but also CPU), and only a subset of data is
> stored and processed on each server.
> 
> As an example of this type of partitioning you could mention Bizgres MPP
> (a PG-based commercial product, http://www.greenplum.com ), which
> partitions data to use I/O and CPU of several DB servers for processing
> complex OLAP queries, and Pl_Proxy
> ( http://pgfoundry.org/projects/plproxy/ ) which does the same for OLTP
> loads.
> 
> I think the "official" term for this kind of "replication" is
> Shared-Nothing Clustering.
> 
> -- 
> 
> Hannu Krosing
> Database Architect
> Skype Technologies OÜ
> Akadeemia tee 21 F, Tallinn, 12618, Estonia
> 
> Skype me:  callto:hkrosing
> Get Skype for free:  http://www.skype.com
> 

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Luke Lonergan
Bruce, 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Bruce Momjian
> Sent: Tuesday, October 24, 2006 5:16 PM
> To: Hannu Krosing
> Cc: PostgreSQL-documentation; PostgreSQL-development
> Subject: Re: [HACKERS] Replication documentation addition
> 
> 
> OK, I have updated the URL.  Please let me know how you like it.

There's a typo on line 8, first paragraph:

"perhaps with only one server allowing write rwork together at the same
time."

Also, consider this wording of the last description:

"Single-Query Clustering..."

Replaced by:

"Shared Nothing Clustering
---

This allows multiple servers with separate disks to work together on
each query. In shared nothing clusters, the work of answering each
query is distributed among the servers to increase the performance
through parallelism. These systems will typically feature high
availability by using other forms of replication internally.

While there are no open source options for this type of clustering,
there are several commercial products available that implement this
approach, making PostgreSQL achieve very high performance for
multi-Terabyte business intelligence databases."

- Luke
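The core idea of the proposed text, that the work of answering each query is distributed among the servers, can be sketched as scatter/gather: each node aggregates only the rows it stores and a coordinator merges the partial results. The Python below is a minimal illustration under stated assumptions: threads stand in for separate servers, plain lists stand in for each node's local data, and the function names are hypothetical. It is not how any particular product implements this.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def local_partial(local_rows: List[float]) -> Tuple[float, int]:
    """Runs on one node: aggregate only the rows stored on that node."""
    return sum(local_rows), len(local_rows)


def distributed_avg(nodes: List[List[float]]) -> float:
    """Coordinator: send the query to every node in parallel, then merge partials."""
    if not nodes:
        raise ValueError("need at least one node")
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(local_partial, nodes))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no rows on any node")
    return total / count


# Three hypothetical nodes, each holding a different slice of a "sales amount" column.
nodes = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
print(distributed_avg(nodes))  # 3.5
```

Each node returns a (sum, count) pair rather than its own average, because an average of per-node averages is wrong whenever nodes hold different numbers of rows.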




Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Bruce Momjian

OK, I have updated the URL.  Please let me know how you like it.

---

Hannu Krosing wrote:
> On a fine day, Tuesday 2006-10-24 at 00:20, Bruce Momjian wrote:
> > Here is a new replication documentation section I want to add for 8.2:
> > 
> > ftp://momjian.us/pub/postgresql/mypatches/replication
> 
> This is how data partitioning is currently described there
> 
> > Data Partitioning
> > -
> > 
> > Data partitioning splits the database into data sets.  To achieve
> > replication, each data set can only be modified by one server.  For
> > example, data can be partitioned by offices, e.g. London and Paris. 
> > While London and Paris servers have all data records, only London can
> > modify London records, and Paris can only modify Paris records.  Such
> > partitioning is usually accomplished in application code, though rules
> > and triggers can help enforce partitioning and keep the read-only data
> > sets current.  Slony can also be used in such a setup.  While Slony
> > replicates only entire tables, London and Paris can be placed in
> > separate tables, and inheritance can be used to access from both tables
> > using a single table name.
> 
> Maybe another use of partitioning should also be mentioned. That is,
> when partitioning is used to overcome limitations of single servers
> (especially IO and memory, but also CPU), and only a subset of data is
> stored and processed on each server.
> 
> As an example of this type of partitioning you could mention Bizgres MPP
> (a PG-based commercial product, http://www.greenplum.com ), which
> partitions data to use I/O and CPU of several DB servers for processing
> complex OLAP queries, and Pl_Proxy
> ( http://pgfoundry.org/projects/plproxy/ ) which does the same for OLTP
> loads.
> 
> I think the "official" term for this kind of "replication" is
> Shared-Nothing Clustering.
> 
> -- 
> 
> Hannu Krosing
> Database Architect
> Skype Technologies OÜ
> Akadeemia tee 21 F, Tallinn, 12618, Estonia
> 
> Skype me:  callto:hkrosing
> Get Skype for free:  http://www.skype.com
> 

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Jim C. Nasby
On Mon, Oct 23, 2006 at 11:39:34PM -0400, Bruce Momjian wrote:
> Query Broadcast Replication
> ---
> 
> This involves sending write queries to multiple servers.  Read-only
> queries can be sent to a single server because there is no need for all
> servers to process it.  This can be complex to set up because functions
> like random() and CURRENT_TIMESTAMP will have different values on
> different servers, and sequences should be consistent across servers.
> Pgpool implements this type of replication.
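A minimal Python sketch of why nondeterministic functions are the sticking point, assuming a hypothetical proxy that forwards each write to every replica and sends each read to just one. Each replica's own `random.Random` instance stands in for that server evaluating random() itself. The workaround shown, computing the value once at the proxy and broadcasting the literal, is one common approach and is not a description of how Pgpool handles it; all class names here are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional


class Replica:
    """One server that evaluates each write statement locally."""

    def __init__(self, name: str, seed: int) -> None:
        self.name = name
        self.rng = random.Random(seed)  # stands in for this server's own random()
        self.rows: Dict[int, float] = {}

    def execute_write(self, key: int, value_expr: Callable[["Replica"], float]) -> None:
        # The expression is evaluated here, as if the SQL text were sent verbatim.
        self.rows[key] = value_expr(self)

    def read(self, key: int) -> Optional[float]:
        return self.rows.get(key)


class BroadcastProxy:
    """Sends writes to all replicas and spreads reads across them."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self.next_read = 0

    def write(self, key: int, value_expr: Callable[[Replica], float]) -> None:
        for replica in self.replicas:
            replica.execute_write(key, value_expr)

    def read(self, key: int) -> Optional[float]:
        # A read needs only one server, so rotate among them.
        replica = self.replicas[self.next_read % len(self.replicas)]
        self.next_read += 1
        return replica.read(key)


replicas = [Replica("a", seed=1), Replica("b", seed=2)]
proxy = BroadcastProxy(replicas)

# Like broadcasting "INSERT ... VALUES (random())": each server picks its own value.
proxy.write(1, lambda r: r.rng.random())

# Workaround: evaluate once at the proxy and broadcast the literal result.
literal = random.Random(0).random()
proxy.write(2, lambda r: literal)
```

The same reasoning applies to CURRENT_TIMESTAMP and to sequence values: anything computed on the server at execution time can differ between replicas unless it is fixed before the write is broadcast.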

Isn't there another active project that does this besides pgpool?

It's probably also worth mentioning the commercial replication schemes
that are out there.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Markus Schiltknecht

Hello Josh,

Josh Berkus wrote:
> Hmmm ... while the primer on different types of replication is fine, I 
> think what users were really looking for is a listing of the different 
> replication solutions which are available for PostgreSQL and how to get 
> them.


Well, let's see what we have:

* Shared Disk Fail Over
* Warm Standby Using Point-In-Time Recovery
* Point-In-Time Recovery

These first three require quite some configuration; AFAIK there is no 
tool or single solution you can download, install and be happy with. I 
probably wouldn't even call them 'replication solutions'. For me those 
are more like backups with fail-over capability.



* Continuously Running Fail-Over Server

(BTW, what is 'partial replication' supposed to mean here?)
Here we could link to Slony.


* Data Partitioning

Here we can't provide a link, it's just a way to handle the problem in 
the application code.



* Query Broadcast Replication

Here we could link to PgPool.


* Multi-Master Replication
  (or better: Distributed Shared Memory Replication)

No existing solution for PostgreSQL.


Looking at that, I'm a) missing PgCluster and b) arguing that we have to 
admit that we simply can not 'list .. replication solutions ... and how 
to get them' because all of the solutions mentioned need quite some 
knowledge and require a more or less complex installation and configuration.


Regards

Markus





Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Josh Berkus
Bruce,

> Here is my first draft of a new replication section for our
> documentation.  I am looking for any comments.

Hmmm ... while the primer on different types of replication is fine, I 
think what users were really looking for is a listing of the different 
replication solutions which are available for PostgreSQL and how to get 
them.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Markus Schiltknecht

Hannu Krosing wrote:

> I think the "official" term for this kind of "replication" is
> Shared-Nothing Clustering.


Well, that's just another distinction for clusters. Most of the time 
it's between Shared-Disk vs. Shared-Nothing. You could also see the very 
Big Irons as a Shared-Everything Cluster.


While it's certainly true that any kind of data partitioning for 
databases only makes sense for Shared-Nothing Clusters, I don't think 
it's a 'kind of replication'. AFAIK most database replication solutions 
are built for Shared-Nothing Clusters. (With the exception of 
PgCluster-II, I think).


Regards

Markus






Re: [HACKERS] Replication documentation addition

2006-10-24 Thread Hannu Krosing
On a fine day, Tuesday 2006-10-24 at 00:20, Bruce Momjian wrote:
> Here is a new replication documentation section I want to add for 8.2:
> 
>   ftp://momjian.us/pub/postgresql/mypatches/replication

This is how data partitioning is currently described there

> Data Partitioning
> -
> 
> Data partitioning splits the database into data sets.  To achieve
> replication, each data set can only be modified by one server.  For
> example, data can be partitioned by offices, e.g. London and Paris. 
> While London and Paris servers have all data records, only London can
> modify London records, and Paris can only modify Paris records.  Such
> partitioning is usually accomplished in application code, though rules
> and triggers can help enforce partitioning and keep the read-only data
> sets current.  Slony can also be used in such a setup.  While Slony
> replicates only entire tables, London and Paris can be placed in
> separate tables, and inheritance can be used to access from both tables
> using a single table name.
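A minimal Python sketch of the scheme quoted above, assuming two hypothetical servers that each hold every office's records but accept writes only for the office they own. The ownership check stands in for what a trigger might enforce, `sync()` stands in for a table-level replicator such as Slony copying each owner's whole table to the other server, and `select_all()` mimics reading all offices through one parent table name via inheritance. None of these names come from PostgreSQL or Slony.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class ReadOnlyPartitionError(Exception):
    """Raised when a server is asked to modify records it does not own."""


@dataclass
class OfficeServer:
    """One server in a hypothetical London/Paris setup.

    Every server stores a table per office, but accepts local writes only
    for the office it owns: the rule a trigger might enforce.
    """

    owns: str
    tables: Dict[str, Dict[int, str]] = field(default_factory=dict)

    def write(self, office: str, key: int, value: str) -> None:
        if office != self.owns:
            raise ReadOnlyPartitionError(
                f"{self.owns} server cannot modify {office} records"
            )
        self.tables.setdefault(office, {})[key] = value

    def apply_replicated(self, office: str, rows: Dict[int, str]) -> None:
        # Replication path: overwrite the read-only copy with the owner's whole table.
        self.tables[office] = dict(rows)

    def select_all(self) -> List[str]:
        # One entry point over every per-office table, like querying an inheritance parent.
        return [
            value
            for office in sorted(self.tables)
            for _, value in sorted(self.tables[office].items())
        ]


def sync(servers: List[OfficeServer]) -> None:
    """Copy each owner's table to every other server (whole-table replication)."""
    for src in servers:
        owned = src.tables.get(src.owns, {})
        for dst in servers:
            if dst is not src:
                dst.apply_replicated(src.owns, owned)


london = OfficeServer("London")
paris = OfficeServer("Paris")
london.write("London", 1, "London order 1")
paris.write("Paris", 1, "Paris order 1")
try:
    paris.write("London", 2, "rejected")  # Paris may not modify London records
except ReadOnlyPartitionError:
    pass
sync([london, paris])
```

After `sync()`, both servers return the same combined result, yet each office's records still have exactly one writable home.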

Maybe another use of partitioning should also be mentioned. That is,
when partitioning is used to overcome limitations of single servers
(especially IO and memory, but also CPU), and only a subset of data is
stored and processed on each server.

As an example of this type of partitioning you could mention Bizgres MPP
(a PG-based commercial product, http://www.greenplum.com ), which
partitions data to use I/O and CPU of several DB servers for processing
complex OLAP queries, and Pl_Proxy
( http://pgfoundry.org/projects/plproxy/ ) which does the same for OLTP
loads.
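For the case where only a subset of data is stored and processed on each server, routing usually hinges on a stable function from key to partition. The Python below is a minimal sketch under that assumption: the SHA-256-based hash, the `ShardedStore` class, and the partition count are illustrative choices, not PL/Proxy's actual hashing scheme or API.

```python
import hashlib
from typing import Dict, List, Optional


def partition_for(key: str, n_partitions: int) -> int:
    """Map a key to a partition number with a hash that is stable across processes."""
    # Python's built-in hash() of str is randomized per process, so use hashlib instead.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_partitions


class ShardedStore:
    """Each inner dict plays the role of one server holding only its share of the keys."""

    def __init__(self, n_partitions: int) -> None:
        self.shards: List[Dict[str, str]] = [{} for _ in range(n_partitions)]

    def put(self, key: str, value: str) -> None:
        self.shards[partition_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[str]:
        # Only the owning shard is consulted; the other servers are never touched.
        return self.shards[partition_for(key, len(self.shards))].get(key)


store = ShardedStore(4)
for i in range(100):
    store.put(f"user{i}", str(i))
```

Because the partition is a function of both the key and the number of partitions, changing the server count with plain modulo hashing remaps most keys; that is the main cost of this simple design.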

I think the "official" term for this kind of "replication" is
Shared-Nothing Clustering.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com






Re: [HACKERS] Replication Documentation

2006-08-03 Thread Andrew Hammond
> > "There are a number of different approaches to solving the problem of
> > replication, each with strengths and weaknesses. As a result, there
> > are a number of different replication solutions available for
> > PostgreSQL. To find out more, please refer to the website."
>
> Well, that's what I've been talking about all along, and it has also
> been the resolution at the Toronto meeting.

Great. Is the above text sufficient for the documentation then, or does
anyone have a suggestion on how to say this better?

Drew




Re: [HACKERS] Replication Documentation

2006-08-03 Thread Peter Eisentraut
Andrew Hammond wrote:
> How about "what works with a given release at the time of the
> release"?

We just threw that idea out in the context of the procedural language 
discussion because we do not have the resources to check what works.

> Arguably, neither are most of the procedural languages in the Server
> Programming section of the documentation, and yet they're included.

That is false.  The documentation documents exactly those pieces of code 
that we distribute.

> "There are a number of different approaches to solving the problem of
> replication, each with strengths and weaknesses. As a result, there
> are a number of different replication solutions available for
> PostgreSQL. To find out more, please refer to the website."

Well, that's what I've been talking about all along, and it has also 
been the resolution at the Toronto meeting.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Replication Documentation

2006-08-03 Thread Andrew Hammond
Markus Schiltknecht wrote:
> Hi,
>
> Andrew Hammond wrote:
> > I can see value in documenting what replication systems are known to
> > work (for some definition of work) with a given release in the
> > documentation for that release. Five years down the road when I'm
> > trying to implement replication for a client who's somehow locked into
> > postgres 8.2 (for whatever reason), it would be very helpful to know
> > that slony1.2 is an option. I don't know if this is sufficient
> > justification.
>
> Please keep in mind that most replication solutions (that I know of)
> are quite independent from the PostgreSQL version used. Thus,
> documenting which version of PostgreSQL can be used with which version
> of a replication system should better be covered in the documentation of
> the replication system.

I would agree to this with the caveat that there needs to be something
in the postgres documentation that points people to the various
replication systems available.

> Otherwise you would have to update the
> PostgreSQL documentation for new releases of your favorite replication
> system - which seems to lead to confusion.

Yeah, updating the docs based on other software releases would suck.
How about "what works with a given release at the time of the release"?
Perhaps this could be limited to a pointer to the docs for such
replication systems, and maybe a very brief description (based on
Chris' taxonomy)?

> > Including a separate page on the history of postgres replication to
> > date also makes some sense, at least to me. It should be relatively
> > easy to maintain.
>
> I agree that having such a 'replication guide for users of PostgreSQL'
> is a good thing to have. But I think not much of that should be part of
> the official PostgreSQL documentation - mainly because the replication
> solutions are not part of PostgreSQL.

Arguably, neither are most of the procedural languages in the Server
Programming section of the documentation, and yet they're included. I
agree that it's important to keep the documentation from getting
cluttered up with stuff that's "not part of PostgreSQL". However, I
think the very fact that so many people assume that there's no replication
for PostgreSQL simply because it's not mentioned in the documentation
shows that for many people replication is perceived as "part of" the
dbms. Even a single page in the documentation which consists of
something along the lines of the following would help these folks find
what they're looking for.

"There are a number of different approaches to solving the problem of
replication, each with strengths and weaknesses. As a result, there are
a number of different replication solutions available for PostgreSQL.
To find out more, please refer to the website."




Re: [HACKERS] Replication Documentation

2006-08-02 Thread Markus Schiltknecht

Hi,

Andrew Hammond wrote:
> I can see value in documenting what replication systems are known to
> work (for some definition of work) with a given release in the
> documentation for that release. Five years down the road when I'm
> trying to implement replication for a client who's somehow locked into
> postgres 8.2 (for whatever reason), it would be very helpful to know
> that slony1.2 is an option. I don't know if this is sufficient
> justification.


Please keep in mind that most replication solutions (that I know of) 
are quite independent from the PostgreSQL version used. Thus, 
documenting which version of PostgreSQL can be used with which version 
of a replication system should better be covered in the documentation of 
the replication system. Otherwise you would have to update the 
PostgreSQL documentation for new releases of your favorite replication 
system - which seems to lead to confusion.



> Including a separate page on the history of postgres replication to
> date also makes some sense, at least to me. It should be relatively
> easy to maintain.


I agree that having such a 'replication guide for users of PostgreSQL' 
is a good thing to have. But I think not much of that should be part of 
the official PostgreSQL documentation - mainly because the replication 
solutions are not part of PostgreSQL.


Regards

Markus



Re: [HACKERS] Replication Documentation

2006-08-02 Thread Andrew Hammond
Peter Eisentraut wrote:
> Alvaro Herrera wrote:
> > > >I don't think this sort of material belongs directly into the
> > > > PostgreSQL documentation.
> >
> > Why not?
>
> PostgreSQL documentation (or any product documentation) should be
> factual: describe what the software does and give advice on its use.
> This should be mostly independent of the external circumstances,
> because people will still read that documentation three or four years
> from now.
>
> The proposed text is, at least partially, journalistic: it evaluates
> competing ideas, gives historical and anecdotal information, reports on
> current events, and makes speculations about the future.  That is the
> sort of material that is published in periodicals or other volatile
> media.

I can see value in documenting what replication systems are known to
work (for some definition of work) with a given release in the
documentation for that release. Five years down the road when I'm
trying to implement replication for a client who's somehow locked into
postgres 8.2 (for whatever reason), it would be very helpful to know
that slony1.2 is an option. I don't know if this is sufficient
justification.

Including a separate page on the history of postgres replication to
date also makes some sense, at least to me. It should be relatively
easy to maintain.

If we do talk about replication, then including a probably separate and
presumably quite static page on the taxonomy of replication seems
necessary. As Chris notes, the term replication by itself can mean
quite a number of things.

> At the summit, we resolved, for precisely these reasons, to keep the
> journalistic parts on the web site, for clear separation from the
> shipped product and for easier updates (and for easier reference as
> well, because the PostgreSQL documentation is not the single obvious
> place to look for it) and refer to it from the documentation.
>
> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/

