Re: [HACKERS] Replication on the backend

2005-12-10 Thread Markus Schiltknecht
Hello,

On Fri, 2005-12-09 at 08:47 -0500, Christopher Browne wrote:
 We *know* (particularly those of us that have had involvement in
 actually implementing replication systems used in production
 environments) that user space implementations of replication can
 function satisfactorily.  We've implemented it.

While this might be true, allow me a side note: AFAIK the very first
functional prototype we know of was Postgres-R for PostgreSQL 6.4.2 [1].
So the very same holds true for a replication solution integrated into
the backend: we know such an implementation can function satisfactorily.

As we mostly agree, the performance bottleneck is _not_ the CPU, but the
nodes' interconnect (the network). Regarding communication between the
backends and the replication solution, performance isn't that much of an
issue, because inter-node communication will always be slower than
inter-process communication.

A different problem is how to distribute PostgreSQL with the different
upcoming replication solutions. It seems to me that most people's main
concern is not being able to get a prebuilt PostgreSQL with
_just_one_replication_solution_that_works_(tm). For most users it really
doesn't matter _how_ exactly the solution got integrated technically.

This problem gets solved with hooks and preloading a library: you could
simply provide _one_ PostgreSQL package which provides hooks for
replication solutions. Those could then provide a package with their
library. This of course is only doable if the number of hooks is kept
low.

Regards

Markus


[1] pgreplication project on gborg:
http://gborg.postgresql.org/project/pgreplication/projdisplay.php



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Replication on the backend

2005-12-09 Thread Christopher Browne
 Are you sure there is no way to implement a generic approach on the
 backend? What does the specification say?

What specification are you talking about?

 Does Oracle 10g have a core implementation of replication (cluster)?

Since replication is sold as a separate product from Oracle 10g,
obviously not.

We *know* (particularly those of us that have had involvement in
actually implementing replication systems used in production
environments) that user space implementations of replication can
function satisfactorily.  We've implemented it.

You might be able to convince us that implementing replication inside
the backend is preferable, but you'll have to have exceedingly strong
evidence in order to overcome the already known factor that doing it
outside the backend works quite well.
-- 
let name=cbbrowne and tld=gmail.com in name ^ @ ^ tld;;
http://linuxdatabases.info/info/slony.html
Now, suddenly,  I _am_ the  expanding Russian Frontier  -- Commander
Ivanova
But with very nice borders... -- Dr Franklin



Re: [HACKERS] Replication on the backend

2005-12-08 Thread Gustavo Tonini
Are you sure there is no way to implement a generic approach on the backend?
What does the specification say? Does Oracle 10g have a core implementation
of replication (cluster)?

Gustavo.
2005/12/7, Andrew Sullivan [EMAIL PROTECTED]:
On Tue, Dec 06, 2005 at 12:35:43AM -0500, Jan Wieck wrote:
 We do not plan to implement replication inside the backend. Replication
 needs are so diverse that pluggable replication support makes a lot more
 sense. To me it even makes more sense than keeping transaction support
 outside of the database itself and add it via pluggable storage add-on.

And, as I say every single time this comes up, Oracle's and IBM's and
MS's and everybody else's replication systems are _also_ add ons.  If
you don't believe me, look at the license costs.  You can get a
system without it enabled, which means (by definition) it's a modular
extension.

A

--
Andrew Sullivan  | [EMAIL PROTECTED]
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.
--Brad Holland


Re: [HACKERS] Replication on the backend

2005-12-08 Thread Andrew Dunstan


What is the point of these questions? If you have a concrete, practical 
proposal to make, please do so. Otherwise, you have already got the 
answer from the people who are actually working on replication and 
understand it far beyond abstract considerations. If you think there is 
a good reason to do replication directly in the backend code rather than 
as an addon, possibly using an agreed API, then you need to provide hard 
evidence, not mere assertion or conjecture.


cheers

andrew

Gustavo Tonini wrote:

Are you sure there is no way to implement a generic approach on the 
backend? What does the specification say? Does Oracle 10g have a core 
implementation of replication (cluster)?


Gustavo.


2005/12/7, Andrew Sullivan [EMAIL PROTECTED]:


On Tue, Dec 06, 2005 at 12:35:43AM -0500, Jan Wieck wrote:
 We do not plan to implement replication inside the backend. Replication
 needs are so diverse that pluggable replication support makes a lot more
 sense. To me it even makes more sense than keeping transaction support
 outside of the database itself and add it via pluggable storage add-on.

And, as I say every single time this comes up, Oracle's and IBM's and
MS's and everybody else's replication systems are _also_ add ons.  If
you don't believe me, look at the license costs.  You can get a
system without it enabled, which means (by definition) it's a modular
extension.

A

--
Andrew Sullivan  | [EMAIL PROTECTED]
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.
--Brad Holland



Re: [HACKERS] Replication on the backend

2005-12-08 Thread Jan Wieck

On 12/8/2005 1:28 PM, Gustavo Tonini wrote:


Are you sure there is no way to implement a generic approach on the backend? What


You mean generic as in a replication system that can do asynchronous 
master-slave, asynchronous multimaster with conflict resolution based on 
timestamps, system priority or user-defined resolution stubs, can do 
synchronous predicate locking but also supports thousands of 
asynchronous, partial replicas (salesman on the road), and last but not 
least can run as a synchronous cluster in a LAN segment? All of the above 
can of course be mixed together ... like a central cluster of 8 load 
balanced, fault tolerant systems, with async multimaster replicas in all 
external branch servers, partial multimaster replicas for the road 
warriors and some slave leaf nodes for reporting.


If you can present a prototype for the above, I am sure that we will 
change our opinion and finally settle for one, built-in replication system.



does specification say? Does Oracle 10g have a core implementation of
replication (cluster)?


What specification?


Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



Re: [HACKERS] Replication on the backend

2005-12-07 Thread Markus Schiltknecht
On Wed, 2005-12-07 at 01:04 -0800, J. Andrew Rogers wrote:
 Opteron boards get pretty damn close to Big Iron SMP fabric  
 performance in a cheap package.  Given how many companies have  
 announced plans to produce Opteron server boards with Infiniband  
 fabrics directly integrated into HyperTransport, I would say that  
 this is the future of server boards.

InfiniBand on-board? Wow, that seems very interesting. Thank you for your
hints and numbers, very helpful!

 And if postgres could actually use an infiniband fabric for  
 clustering a single database instance across Opteron servers, that  
 would be very impressive...

full ACK

Regards

Markus




Re: [HACKERS] Replication on the backend

2005-12-07 Thread J. Andrew Rogers


On Dec 6, 2005, at 9:09 PM, Gregory Maxwell wrote:

Eh, why would light limited delay be any slower than a disk on FC the
same distance away? :)

In any case, performance of PG on iscsi is just fine. You can't blame
the network... Doing multimaster replication is hard because the
locking primitives that are fine on a simple multiprocessor system
(with a VERY high bandwidth very low latency interconnect between
processors) just don't work across a network, so you're left finding
other methods and making them work...



Speed of light latency shows up pretty damn often in real networks,  
even relatively local ones.  The number of people that wonder why a  
transcontinental SLA of 10ms is not possible is astonishing.  The  
silicon fabrics are sufficiently fast that most well-designed  
networks are limited by how fast one can push photons through a  
fiber, which is significantly slower than photons through a vacuum.   
Silicon switch fabrics add latency measured in nanoseconds, which is  
effectively zero for many networks that leave the system board.
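
To put a rough number on the speed-of-light point, here is a minimal
sketch rather than a measurement. It assumes a refractive index of about
1.47 for silica fiber and uses 4000 km as a stand-in for a
transcontinental path; both figures and the function name
min_fiber_rtt_ms are illustrative assumptions, not taken from this thread.

```python
# Lower bound on round-trip time imposed purely by light propagation in fiber.
# Assumptions: refractive index ~1.47 (typical silica fiber), straight path,
# no switching, queueing or serialization delay.

C_VACUUM_KM_PER_S = 299_792.458   # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.47     # assumed value for silica fiber


def min_fiber_rtt_ms(distance_km: float) -> float:
    """Best-case round-trip time in milliseconds over distance_km of fiber."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    speed_in_fiber = C_VACUUM_KM_PER_S / FIBER_REFRACTIVE_INDEX
    one_way_seconds = distance_km / speed_in_fiber
    return 2 * one_way_seconds * 1000.0


if __name__ == "__main__":
    # 4000 km is an assumed "transcontinental" distance.
    for km in (1, 100, 4000):
        print(f"{km:>5} km: at least {min_fiber_rtt_ms(km):.3f} ms RTT")
```

Under these assumptions the 4000 km round trip comes out well above
10 ms before any equipment delay is added, which is why such an SLA
cannot be met.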


Compared to single system simple SMP, a local cluster built on a  
first-rate fabric will have about an order of magnitude higher  
latency but very similar bandwidth.  On the other hand, at those  
latencies you can increase the number of addressable processors with  
that kind of bandwidth by an order of magnitude, so it is a bit of a  
trade.  However, latency matters a lot such that one would have to be  
a lot smarter about partitioning synchronization across that fabric  
even though one would lose nothing in the bandwidth department.




But again, multimaster isn't hard because of some inherently
slow property of networks.



Eh?  As far as I know, the difficulty of multi-master is almost  
entirely a product of the latency of real networks such that they are  
too slow for scalable distributed locks.  SMP is little more than a  
distributed lock manager implemented in silicon.  Therefore, multi- 
master is hard in practice because we cannot drive networks fast  
enough.  That said, current state-of-the-art network fabrics are  
within an order of magnitude of SMP fabrics such that they could be  
real contenders, particularly once you get north of 8-16 processors.


The really sweet potential is in Opteron system boards with  
Infiniband directly attached to HyperTransport.  At that level of  
bandwidth and latency, both per node and per switch fabric, the  
architecture possibilities start to become intriguing.



J. Andrew Rogers





Re: [HACKERS] Replication on the backend

2005-12-07 Thread J. Andrew Rogers


On Dec 6, 2005, at 11:42 PM, Markus Schiltknecht wrote:
Does anybody have latency / roundtrip measurements for current  
hardware?

I'm interested in:
1Gb Ethernet,
10 Gb Ethernet,
InfiniBand,
probably even p2p usb2 or firewire links?



In another secret life, I know a bit about supercomputing fabrics.   
The latency metrics have to be thoroughly qualified.


First, most of the RTT latency numbers for network fabrics are for 0  
byte packet sizes, which really does not apply to anyone shuffling  
real data around.  For small packets, high-performance fabrics (HTX  
Infiniband, Quadrics, etc) have approximately an order of magnitude  
less latency than vanilla Ethernet, though the performance specifics  
depend greatly on the actual usage.  For large packet sizes, the  
differences in latency become far less obvious.  However, for real  
packets a performant fabric will still look very good compared to  
disk systems.  Switched fiber fabrics have enough relatively  
inexpensive throughput now to saturate most disk systems and CPU I/O  
busses; only platforms like HyperTransport can really keep up.  It is  
worth pointing out that the latency of high-end network fabrics is  
similar to large NUMA fabrics, which exposes some of the limits of  
SMP scalability.  As a point of reference, an organization that knows  
what they are doing should have no problem getting 500 microsecond  
RTT on a vanilla metropolitan area GigE fiber network -- a few  
network operators actually do deliver this on a regional scale.  For  
a local cluster, a competent design can best this by orders of  
magnitude.


There are a number of silicon limitations, but a system that connects  
the fabric directly to HyperTransport can drive several GB/s with  
very respectable microsecond latencies if the rest of the system is  
up to it.  There are Opteron system boards now that will drive  
Infiniband directly from HyperTransport.  I know Arima/Rioworks makes  
some (great server boards generally), and several other companies are  
either making them or have announced them in the pipeline.  These  
Opteron boards get pretty damn close to Big Iron SMP fabric  
performance in a cheap package.  Given how many companies have  
announced plans to produce Opteron server boards with Infiniband  
fabrics directly integrated into HyperTransport, I would say that  
this is the future of server boards.


And if postgres could actually use an infiniband fabric for  
clustering a single database instance across Opteron servers, that  
would be very impressive...


J. Andrew Rogers





Re: [HACKERS] Replication on the backend

2005-12-07 Thread Luke Lonergan
Andrew,
 
 And if postgres could actually use an infiniband fabric for 
 clustering a single database instance across Opteron servers, that 
 would be very impressive...

That's what we do with Bizgres MPP.  We've implemented an interconnect to do 
the data shuffling underneath the optimizer/executor and we currently use 
TCP/IP, though we could haul out SDP over Infiniband should we need it.
 
However, our optimizer effectively minimizes traffic over the interconnect now 
and that works well for all of the plans we've run so far.  It would be nice to 
characterize the improvements we could get from moving to 3x infiniband.
 
Regarding a direct Hypertransport to Infiniband bridge, have you looked at 
Pathscale? http://www.pathscale.com/  I know the fellow behind the scenes who 
designed it, and I think it's probably well thought out.  We were gunning for 
less than 1us RTT through the adapter and switch once, and I bet they are close.
 
- Luke




Re: [HACKERS] Replication on the backend

2005-12-07 Thread Andrew Sullivan
On Tue, Dec 06, 2005 at 12:35:43AM -0500, Jan Wieck wrote:
 We do not plan to implement replication inside the backend. Replication 
 needs are so diverse that pluggable replication support makes a lot more 
 sense. To me it even makes more sense than keeping transaction support 
 outside of the database itself and add it via pluggable storage add-on.

And, as I say every single time this comes up, Oracle's and IBM's and
MS's and everybody else's replication systems are _also_ add ons.  If
you don't believe me, look at the license costs.  You can get a
system without it enabled, which means (by definition) it's a modular
extension.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism. 
--Brad Holland



Re: [HACKERS] Replication on the backend

2005-12-06 Thread Gustavo Tonini
But, wouldn't the performance be better? And wouldn't asynchronous messages be better processed?

Thanks for replies,
Gustavo.

2005/12/6, Jan Wieck [EMAIL PROTECTED]:
On 12/5/2005 8:18 PM, Gustavo Tonini wrote:
 replication (master/slave, multi-master, etc) implemented inside
 postgres...I would like to know what has been make in this area.

We do not plan to implement replication inside the backend. Replication
needs are so diverse that pluggable replication support makes a lot more
sense. To me it even makes more sense than keeping transaction support
outside of the database itself and add it via pluggable storage add-on.

Jan

 Gustavo.
 P.S. Sorry for my bad English.

 2005/12/5, Chris Browne [EMAIL PROTECTED]:
 [EMAIL PROTECTED] (Gustavo Tonini) writes:
  What about replication or data distribution inside the backend.  This
  is a valid issue?

 I'm not sure what your question is...
 --
 (reverse (concatenate 'string gro.gultn @ enworbbc))
 http://www.ntlug.org/~cbbrowne/x.html
 Love is like a snowmobile flying over the frozen tundra that suddenly
 flips, pinning you underneath.  At night, the ice weasels come.
 -- Matt Groening

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] Replication on the backend

2005-12-06 Thread Markus Schiltknecht
On Tue, 2005-12-06 at 10:03 -0200, Gustavo Tonini wrote:
 But,  wouldn't the performance be better? And wouldn't asynchronous
 messages be better processed?

At least for synchronous multi-master replication, the performance
bottleneck is going to be the interconnect between the nodes -
integration of the replication logic into the backend most probably
doesn't affect performance that much.

I'd rather like to ask Jan what different needs for replication he
has discovered so far, and how he came to the conclusion that it's not
possible to provide a general solution.

My point for integration into the backend is flexibility: obviously the
replication code can influence the database much more from within the
backend than from the outside. For example running one complex query on
several nodes. I know, this is a very advanced feature - currently it's not
even possible to run one query on multiple backends (i.e. processors of
a multi-core system) - but I like to plan ahead instead of throwing away
code later. For such advanced features you simply have to dig around in
the backend code one day. Of course you can always add hooks, but IMHO
that only complicates matters.

Is there some discussion going on about such topics somewhere? What's up
with slony-2? The wiki on slony2.org still doesn't provide a lot of
technical information (and obviously got spammed BTW).

Regards

Markus





Re: [HACKERS] Replication on the backend

2005-12-06 Thread Jan Wieck

On 12/6/2005 8:10 AM, Markus Schiltknecht wrote:


On Tue, 2005-12-06 at 10:03 -0200, Gustavo Tonini wrote:

But,  wouldn't the performance be better? And wouldn't asynchronous
messages be better processed?


At least for synchronous multi-master replication, the performance
bottleneck is going to be the interconnect between the nodes -
integration of the replication logic into the backend most probably
doesn't affect performance that much.


That is exactly right. Thus far, processor, memory and disk speeds have 
always advanced at a faster pace than network speeds. Thus, the few 
percent of performance gain we'd get from moving things into the backend 
will be irrelevant tomorrow with 4x-core and 16x-core CPUs.



I'd rather like to ask Jan what different needs for replication he
discovered so far. And how he came to the conclusion, that it's not
possible to provide a general solution.


  - Asynchronous master to multi-slave. We have a few of those with
    Mammoth Replicator and Slony-I being the top players. Slony-I does
    need some cleanup and/or reimplementation after we have a general
    pluggable replication API in place.

  - Synchronous multimaster. There are certain attempts out there, like
    Postgres-R, pgcluster, Slony-II. Some more advanced, some less. But
    certainly nothing I would send into the ring against Oracle-Grid.

  - Asynchronous multimaster with conflict resolution. I have not seen
    any reasonable attempt on this one yet. Plus, it divides again into
    two camps. One is the idea to have one central system with thousands
    of satellites (salesman on the street), the other being two or more
    central systems doing load balancing (although this competes with
    sync-mm).
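
To make "conflict resolution" in the third item a bit more concrete,
here is a minimal sketch of one simple policy: the newest timestamp
wins, with a node priority as tie-breaker. It is not taken from any of
the projects named above. RowVersion, resolve and node_priority are
hypothetical names, and the sketch assumes the nodes' clocks are
comparable, which a real system cannot take for granted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowVersion:
    """One node's view of a row (hypothetical structure for illustration)."""
    key: str
    value: str
    timestamp: float     # commit time at the originating node (assumed comparable)
    node_priority: int   # tie-breaker: higher priority wins


def resolve(local: RowVersion, remote: RowVersion) -> RowVersion:
    """Pick the surviving version: newest timestamp, then highest priority."""
    if local.key != remote.key:
        raise ValueError("versions refer to different rows")
    # max() keeps the first argument on a full tie, so local wins in that case.
    return max(local, remote, key=lambda v: (v.timestamp, v.node_priority))


if __name__ == "__main__":
    branch = RowVersion("customer:42", "old address", 100.0, 1)
    laptop = RowVersion("customer:42", "new address", 101.0, 0)
    print(resolve(branch, laptop).value)
```

A user-defined resolution stub would replace the key function with
arbitrary application logic; the point here is only that the policy
is a pluggable decision and not part of the transport.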


My point for integration into the backend is flexibility: obviously the
replication code can influence the database much more from within the


We need a general API. It should be possible to define on a per-database 
level which shared replication module to load on connect. The init 
function of that replication module then installs all the required 
callbacks at strategic points (like heap_update(), at_commit() ...) and 
the rest is hidden in the module.
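
The idea in the paragraph above can be modelled in a few lines. This is
a toy sketch only: it is written in Python rather than backend C, and
Backend, install_hook, fire, connect and LoggingReplicationModule are
all hypothetical names. Only the hook point names heap_update and
at_commit come from the text, and nothing here is an existing
PostgreSQL API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional

HookFn = Callable[[dict], None]


class Backend:
    """Toy stand-in for a backend exposing a small, fixed set of hook points."""

    HOOK_POINTS = ("heap_update", "at_commit")

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[HookFn]] = defaultdict(list)

    def install_hook(self, point: str, fn: HookFn) -> None:
        if point not in self.HOOK_POINTS:
            raise ValueError(f"unknown hook point: {point}")
        self._hooks[point].append(fn)

    def fire(self, point: str, event: dict) -> None:
        # Called by the backend at the strategic point; no-op without hooks.
        for fn in self._hooks[point]:
            fn(event)


class LoggingReplicationModule:
    """Hypothetical module: queues row changes, ships them at commit."""

    def __init__(self) -> None:
        self.pending: List[dict] = []
        self.shipped: List[List[dict]] = []

    def init(self, backend: Backend) -> None:
        # The init function installs every callback the module needs.
        backend.install_hook("heap_update", self.on_heap_update)
        backend.install_hook("at_commit", self.on_commit)

    def on_heap_update(self, event: dict) -> None:
        self.pending.append(event)

    def on_commit(self, event: dict) -> None:
        self.shipped.append(self.pending)
        self.pending = []


def connect(database: str,
            modules: Dict[str, LoggingReplicationModule]) -> Backend:
    """Load the replication module configured for this database, if any."""
    backend = Backend()
    module: Optional[LoggingReplicationModule] = modules.get(database)
    if module is not None:
        module.init(backend)
    return backend
```

A database without a configured module gets a backend whose hook
points fire into nothing, which is the property that lets one binary
package serve several replication solutions.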



Is there some discussion going on about such topics somewhere? What's up
with slony-2? The wiki on slony2.org still doesn't provide a lot of
technical information (and obviously got spammed BTW).


Slony-II has been slow lately in the Eastern timezone.


Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



Re: [HACKERS] Replication on the backend

2005-12-06 Thread Markus Schiltknecht
Hello Jan,

On Tue, 2005-12-06 at 10:10 -0500, Jan Wieck wrote:
 We need a general API. It should be possible to define on a per-database 
 level which shared replication module to load on connect. The init 
 function of that replication module then installs all the required 
 callbacks at strategic points (like heap_update(), at_commit() ...) and 
 the rest is hidden in the module.

Thank you for your list of replication types. Those still have some
things in common. Thus your approach of providing hooks for different
modules might make sense. Only I fear that I would need way too many
hooks for what I want ;)

 Slony-II has been slow lately in the Eastern timezone.

What is that supposed to mean? Who sits in the eastern timezone?

Regards

Markus




Re: [HACKERS] Replication on the backend

2005-12-06 Thread Chris Browne
[EMAIL PROTECTED] (Gustavo Tonini) writes:
 But,  wouldn't the performance be better? And wouldn't asynchronous
 messages be better processed?

Why do you think performance would be materially affected by this?

The MAJOR performance bottleneck is normally the slow network
connection between servers.

When looked at in the perspective of that bottleneck, pretty much
everything else is just noise.  (Sometimes pretty loud noise, but
still noise :-).)
-- 
let name=cbbrowne and tld=cbbrowne.com in name ^ @ ^ tld;;
http://cbbrowne.com/info/spreadsheets.html
When the grammar checker identifies an error, it suggests a
correction and can even makes some changes for you.  
-- Microsoft Word for Windows 2.0 User's Guide, p.35:



Re: [HACKERS] Replication on the backend

2005-12-06 Thread Mario Weilguni

IMO this is not true. You can get affordable 10 Gbit network adapters, so you 
can have plenty of bandwidth in a DB server pool (if the servers are located 
in the same area). Even 1 Gbit Ethernet greatly helps here, and would make it 
possible to balance read-intensive (but not write-intensive) applications. We 
are using the Linux bonding interface with two Gbit NICs, and you need quite 
a few hard disks to reach the resulting 200 MBytes/sec of throughput. Latency 
is not bad either.

Regards,
Mario Weilguni


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Browne
Sent: Tuesday, December 06, 2005 4:43 PM
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Replication on the backend

[EMAIL PROTECTED] (Gustavo Tonini) writes:
 But,  wouldn't the performance be better? And wouldn't asynchronous
 messages be better processed?

Why do you think performance would be materially affected by this?

The MAJOR performance bottleneck is normally the slow network
connection between servers.

When looked at in the perspective of that bottleneck, pretty much
everything else is just noise.  (Sometimes pretty loud noise, but
still noise :-).)
-- 
let name=cbbrowne and tld=cbbrowne.com in name ^ @ ^ tld;;
http://cbbrowne.com/info/spreadsheets.html
When the grammar checker identifies an error, it suggests a
correction and can even makes some changes for you.  
-- Microsoft Word for Windows 2.0 User's Guide, p.35:



Re: [HACKERS] Replication on the backend

2005-12-06 Thread Michael Meskes
  Postgres-R, pgcluster, Slony-II. Some more advanced, some less. But
  certainly nothing I would send into the ring against Oracle-Grid.

Assuming that you mean Oracle Real Application Cluster (the Grid is more, 
right?) I wonder if this technology technically still counts as replication. 
AFAIK they do not replicate data but share a common data pool among different 
servers. You still have communication overhead but you write a tuple only 
once for all servers involved. Takes away a lot of overhead on a system 
that's heavily written to.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL



Re: [HACKERS] Replication on the backend

2005-12-06 Thread Rick Gigger

Just like MySql!

On Dec 5, 2005, at 10:35 PM, Jan Wieck wrote:


On 12/5/2005 8:18 PM, Gustavo Tonini wrote:


replication (master/slave, multi-master, etc) implemented inside
postgres...I would like to know what has been make in this area.


We do not plan to implement replication inside the backend.  
Replication needs are so diverse that pluggable replication support  
makes a lot more sense. To me it even makes more sense than keeping  
transaction support outside of the database itself and add it via  
pluggable storage add-on.



Jan



Gustavo.
P.S. Sorry for my bad English.
2005/12/5, Chris Browne [EMAIL PROTECTED]:


[EMAIL PROTECTED] (Gustavo Tonini) writes:
 What about replication or data distribution inside the backend.  This
 is a valid issue?

I'm not sure what your question is...
--
(reverse (concatenate 'string gro.gultn @ enworbbc))
http://www.ntlug.org/~cbbrowne/x.html
Love is like a snowmobile flying over the frozen tundra that suddenly
flips, pinning you underneath.  At night, the ice weasels come.
-- Matt Groening




--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #




Re: [HACKERS] Replication on the backend

2005-12-06 Thread Rick Gigger

  - Asynchronous master to multi-slave. We have a few of those with
Mammoth Replicator and Slony-I being the top players. Slony-I does
need some cleanup and/or reimplementation after we have a general
pluggable replication API in place.


Does this API actually have people working on it, or is it just something  
on the todo list?





Re: [HACKERS] Replication on the backend

2005-12-06 Thread Gustavo Tonini
I don't see anything in the TODO list. I'm very interested in working on that, if possible...

Gustavo.


Re: [HACKERS] Replication on the backend

2005-12-06 Thread Aly S.P Dharshi
I would classify it as a clustered database system (Oracle 10g that is).
Clustered meaning more than one node in the cluster.

ALy.

On Tue, 6 Dec 2005, Michael Meskes wrote:

  Postgres-R, pgcluster, Slony-II. Some more advanced, some less. But
  certainly nothing I would send into the ring against Oracle-Grid.

Assuming that you mean Oracle Real Application Cluster (the Grid is more,
right?) I wonder if this technology technically still counts as replication.
AFAIK they do not replicate data but share a common data pool among different
servers. You still have communication overhead but you write a tuple only
once for all servers involved. Takes away a lot of overhead on a system
that's heavily written to.

Michael


-- 
Aly S.P Dharshi
[EMAIL PROTECTED]

 A good speech is like a good dress
  that's short enough to be interesting
  and long enough to cover the subject



Re: [HACKERS] Replication on the backend

2005-12-06 Thread Jan Wieck

On 12/6/2005 11:23 AM, Mario Weilguni wrote:


IMO this is not true. You can get affordable 10 Gbit network adapters, so you 
can have plenty of bandwidth in a DB server pool (if the servers are located 
in the same area). Even 1 Gbit Ethernet greatly helps here, and would make it 
possible to balance read-intensive (but not write-intensive) applications. We 
are using the Linux bonding interface with two Gbit NICs, and you need quite 
a few hard disks to reach the resulting 200 MBytes/sec of throughput. Latency 
is not bad either.


It's not so much the bandwidth but more the roundtrips that limit your 
maximum transaction throughput. Remember, whatever the priority, you 
can't increase the speed of light.
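
As a back-of-the-envelope sketch of why round trips, not bandwidth, set
the ceiling: if each commit has to wait for a fixed number of sequential
network round trips, and conflicting commits are serialized, then at
most 1 / (round trips x RTT) of them can complete per second, however
wide the pipe is. The function name and the sample figures below are
illustrative assumptions, not measurements.

```python
def max_serial_commits_per_second(rtt_seconds: float,
                                  round_trips_per_commit: int = 1) -> float:
    """Upper bound on serialized commits/sec when each commit must wait
    for round_trips_per_commit sequential network round trips."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    if round_trips_per_commit < 1:
        raise ValueError("round_trips_per_commit must be at least 1")
    return 1.0 / (rtt_seconds * round_trips_per_commit)


if __name__ == "__main__":
    # Assumed sample RTTs: a LAN-class, a metro-class and a long-haul link.
    for label, rtt in (("100 us", 100e-6), ("500 us", 500e-6), ("40 ms", 40e-3)):
        bound = max_serial_commits_per_second(rtt, round_trips_per_commit=2)
        print(f"RTT {label}: at most {bound:,.0f} serialized commits/s")
```

Bandwidth does not appear in the formula at all; adding NICs raises
how much data each commit can carry, not how many dependent commits
fit into a second.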



Jan




Regards,
Mario weilguni


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Browne
Sent: Tuesday, December 06, 2005 4:43 PM
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Replication on the backend

[EMAIL PROTECTED] (Gustavo Tonini) writes:

But,  wouldn't the performance be better? And wouldn't asynchronous
messages be better processed?


Why do you think performance would be materially affected by this?

The MAJOR performance bottleneck is normally the slow network
connection between servers.

When looked at in the perspective of that bottleneck, pretty much
everything else is just noise.  (Sometimes pretty loud noise, but
still noise :-).)



--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



Re: [HACKERS] Replication on the backend

2005-12-06 Thread Gregory Maxwell
On 12/6/05, Jan Wieck [EMAIL PROTECTED] wrote:
  IMO this is not true. You can get affordable 10 Gbit network adapters, so 
  you can have plenty of bandwidth in a db server pool (if the servers are located in 
  the same area). Even 1 Gbit Ethernet greatly helps here, and would make it 
  possible to balance read-intensive (and not write-intensive) applications. 
  We are using the Linux bonding interface with 2 Gbit NICs, and 200 MBytes/sec 
  throughput is something you need quite a few hard disks to reach. 
  Latency is not bad either.

 It's not so much the bandwidth but more the roundtrips that limit your
 maximum transaction throughput. Remember, whatever the priority, you
 can't increase the speed of light.

Eh, why would light limited delay be any slower than a disk on FC the
same distance away? :)

In any case, performance of PG on iscsi is just fine. You can't blame
the network... Doing multimaster replication is hard because the
locking primitives that are fine on a simple multiprocessor system
(with a VERY high bandwidth very low latency interconnect between
processors) just don't work across a network, so you're left finding
other methods and making them work...

But again, multimaster isn't hard because of some inherently
slow property of networks.



Re: [HACKERS] Replication on the backend

2005-12-06 Thread Markus Schiltknecht
On Tue, 2005-12-06 at 23:19 -0500, Jan Wieck wrote:
 It's not so much the bandwidth but more the roundtrips that limit your 
 maximum transaction throughput. 

I completely agree that it is the latency that counts, not the bandwidth.

Does anybody have latency / roundtrip measurements for current hardware?
I'm interested in:
1Gb Ethernet,
10 Gb Ethernet,
InfiniBand,
probably even p2p usb2 or firewire links?

At least Quadrics claims [1] to have measured only 1.38 microseconds.
Assuming real-world conditions would give you 5 microseconds, on a 3 GHz
processor that's 15,000 CPU cycles. Which is IMHO not that much any
more. Or am I wrong (mental arithmetic never was my favourite subject)?
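
The arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation above; the function name is made up, and the 5 microsecond figure is the assumed real-world latency from the paragraph, not a measurement.

```python
def cpu_cycles_during(latency_us, clock_ghz):
    """Clock cycles that elapse on one core during latency_us microseconds."""
    # A 1 GHz clock ticks 1000 times per microsecond.
    return latency_us * 1000 * clock_ghz


if __name__ == "__main__":
    print(cpu_cycles_during(5, 3))     # assumed real-world latency: 15000
    print(cpu_cycles_during(1.38, 3))  # vendor-claimed latency
```

So the mental arithmetic holds: 5 microseconds on a 3 GHz clock is 15,000 cycles, and the claimed 1.38 microseconds comes to a little over 4,000.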

Regards
Markus

[1]
http://www.quadrics.com/quadrics/QuadricsHome.nsf/NewsByDate/98FFE60F799AC95180256FEA002A6D9D





[HACKERS] Replication on the backend

2005-12-05 Thread Gustavo Tonini
What about replication or data distribution inside the backend. This is a valid issue?

Thanks,
Gustavo.


Re: [HACKERS] Replication on the backend

2005-12-05 Thread Joshua D. Drake

Gustavo Tonini wrote:
replication (master/slave, multi-master, etc) implemented inside 
postgres...I would like to know what has been done in this area.

http://www.commandprompt.com/ - Master/Slave

Joshua D. Drake




Gustavo.

P.S. Sorry for my bad English.

2005/12/5, Chris Browne [EMAIL PROTECTED]:

[EMAIL PROTECTED] (Gustavo Tonini) writes:
 What about replication or data distribution inside the backend.  This
 is a valid issue?

I'm not sure what your question is...
--
(reverse (concatenate 'string gro.gultn @ enworbbc))
http://www.ntlug.org/~cbbrowne/x.html
Love is like a snowmobile flying over the frozen tundra that suddenly
flips, pinning you underneath.  At night, the ice weasels come.
-- Matt Groening







Re: [HACKERS] Replication on the backend

2005-12-05 Thread Christopher Kings-Lynne
replication (master/slave, multi-master, etc) implemented inside 
postgres...I would like to know what has been done in this area.


It's not in the backend; check out things like Slony (www.slony.info) 
and various commercial solutions.


Chris




Re: [HACKERS] Replication on the backend

2005-12-05 Thread Jan Wieck

On 12/5/2005 8:18 PM, Gustavo Tonini wrote:


replication (master/slave, multi-master, etc) implemented inside
postgres...I would like to know what has been done in this area.


We do not plan to implement replication inside the backend. Replication 
needs are so diverse that pluggable replication support makes a lot more 
sense. To me it even makes more sense than keeping transaction support 
outside of the database itself and adding it via a pluggable storage add-on.



Jan




Gustavo.

P.S. Sorry for my bad English.

2005/12/5, Chris Browne [EMAIL PROTECTED]:


[EMAIL PROTECTED] (Gustavo Tonini) writes:
 What about replication or data distribution inside the backend.  This
 is a valid issue?

I'm not sure what your question is...
--
(reverse (concatenate 'string gro.gultn @ enworbbc))
http://www.ntlug.org/~cbbrowne/x.html
Love is like a snowmobile flying over the frozen tundra that suddenly
flips, pinning you underneath.  At night, the ice weasels come.
-- Matt Groening







--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
