Re: [HACKERS] postgresql clustering

2005-09-30 Thread Daniel Duvall
Thanks for your reply Luke.

Bizgres looks like a very promising project.  I'll be sure to follow
it.

Thanks to everyone for their comments.  I'm starting to understand the
truth behind the hype and where these performance gains and hits stem
from.

-Dan


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] postgresql clustering

2005-09-30 Thread Daniel Duvall
What about clustered filesystems?  At first blush I would think the
overhead of something like GFS might kill performance.  Could one
potentially achieve a fail-over config using multiple nodes with GFS,
each having their own instance of PostgreSQL (but only one running at
any given moment)?

Best,
Dan




Fwd: Re: [HACKERS] postgresql clustering

2005-09-30 Thread Trent Shipley
What is the relationship between database support for clustering and grid 
computing and support for distributed databases?

Two-phase COMMIT is coming in 8.1.  What effect will this have in promoting
FOSS grid support or distribution solutions for PostgreSQL?



Re: [HACKERS] postgresql clustering

2005-09-30 Thread Luke Lonergan
Dan,

On 9/29/05 3:23 PM, Daniel Duvall [EMAIL PROTECTED] wrote:

> What about clustered filesystems?  At first blush I would think the
> overhead of something like GFS might kill performance.  Could one
> potentially achieve a fail-over config using multiple nodes with GFS,
> each having their own instance of PostgreSQL (but only one running at
> any given moment)?

Interestingly, my friend Matt O'Keefe built GFS at UMN.  I was one of his
first customers/sponsors of the research in 1998, when I implemented an
8-node shared-disk cluster on Alpha Linux using GFS and Fibre Channel.

Again, it depends on what you're doing.  If it's OLTP, you will spend too
much time in lock management for disk access, and things like Oracle RAC's
Cache Fusion become critical to reduce the number of times you have to hit
disk.  For warehousing/sequential scans, this kind of clustering is
irrelevant.

- Luke 





Re: [HACKERS] postgresql clustering

2005-09-30 Thread Hans-Jürgen Schönig

Luke Lonergan wrote:

> Dan,
>
> On 9/29/05 3:23 PM, Daniel Duvall [EMAIL PROTECTED] wrote:
>
>> What about clustered filesystems?  At first blush I would think the
>> overhead of something like GFS might kill performance.  Could one
>> potentially achieve a fail-over config using multiple nodes with GFS,
>> each having their own instance of PostgreSQL (but only one running at
>> any given moment)?
>
> Interestingly - my friend Matt O'Keefe built GFS at UMN, I was one of his
> first customers/sponsors of the research in 1998 when I implemented an
> 8-node shared disk cluster on Alpha Linux using GFS and Fibre Channel.
>
> Again - it depends on what you're doing - if it's OLTP, you will spend too
> much time in lock management for disk access and things like Oracle RAC's
> CacheFusion becomes critical to reduce the number of times you have to hit
> disks.

Hitting the disk is really bad.  However, we have seen that going over
the network for small pieces of data (e.g. locks) is even more
critical.  You will see the CPUs on all nodes running at 1% or so
while they wait for data to be exchanged over the network (latency) - this
is the real problem.

I don't know what Oracle is doing in detail, but they have a real problem
when losing a node inside the cluster (syncing again is really time
consuming).
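
A back-of-the-envelope sketch of the latency point above, in Python. The function name and all the numbers are illustrative assumptions, not measurements from any real cluster: if a transaction needs only a little CPU time but has to wait for several lock messages to cross the network, the CPU sits idle for most of the wall-clock time.

```python
def cpu_utilisation(cpu_ms_per_txn: float, lock_round_trips: int, rtt_ms: float) -> float:
    """Fraction of wall-clock time a node spends computing rather than waiting.

    Assumes lock messages are exchanged one after another and that the node
    does nothing else while it waits (a deliberately pessimistic model).
    """
    wait_ms = lock_round_trips * rtt_ms
    return cpu_ms_per_txn / (cpu_ms_per_txn + wait_ms)


# Hypothetical workload: 0.05 ms of CPU work per transaction,
# 10 lock round trips at 0.5 ms each.
utilisation = cpu_utilisation(0.05, 10, 0.5)
print(f"CPU busy {utilisation:.1%} of the time")
```

With these made-up figures the node is busy roughly 1% of the time, which is the same order as the idle CPUs described above. Halving the round-trip time helps far more than a faster CPU would.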




> For warehousing/sequential scans, this kind of clustering is
> irrelevant.

I suggest looking at Teradata - they do really nice query partitioning on
so-called AMPs (we'd simply call them nodes).  It is really nice for really
ugly warehousing queries (ugly in terms of amount of data).


Hans



--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at



Re: [HACKERS] postgresql clustering

2005-09-29 Thread Daniel Duvall
While clustering in some circles may be an open-ended buzzword --
mainly the commercial DB marketing crowd -- there are concepts beneath
the bull that are even inherent in the name.  However, I understand
your point.

From what I've researched, the concepts and practices seem to fall
under one of two abstract categorizations: fail-over (ok...
high-availability), and parallel execution (high-performance... sure).
While some consider the implementation of only one of these to qualify
as a cluster, others seem to demand that a true cluster must
implement both.

What I'm really after is a DB setup that does fail-over and parallel
execution.  Your setup sounds like it would gracefully handle the
former, but cannot achieve the latter.  Perhaps I'm simply asking too
much of a free software setup.

Thanks for your response.




Re: [HACKERS] postgresql clustering

2005-09-29 Thread Tino Wildenhain

Daniel Duvall wrote:

> While clustering in some circles may be an open-ended buzzword --
> mainly the commercial DB marketing crowd -- there are concepts beneath
> the bull that are even inherent in the name.  However, I understand
> your point.
>
> From what I've researched, the concepts and practices seem to fall
> under one of two abstract categorizations: fail-over (ok...
> high-availability), and parallel execution (high-performance... sure).

Well, I don't know why many people believe parallel execution
automatically means high performance.  Actually, most of the time
performance is much worse this way.
If your dataset remains static and you do only read-only
requests, you get higher performance through load balancing.
If, however, you make changes to the data, each change has to
be propagated to all nodes - which in fact costs performance.
This depends heavily on the link speed between the nodes.

> While some consider the implementation of only one of these to qualify
> a cluster, others seem to demand that a true cluster must
> implement both.
>
> What I'm really after is a DB setup that does fail-over and parallel
> execution.  Your setup sounds like it would gracefully handle the
> former, but cannot achieve the latter.  Perhaps I'm simply asking too
> much of a free software setup.

Commercial vendors aren't much better here - they just don't tell you :-)
There is pgpool or SQLRelay, for example, if you want to parallelize
requests.  You can combine them with the various replication mechanisms
also available for PG and get what you want - and, most important,
get what's possible.  Nobody can trick the math :-)
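
A minimal sketch of the read/write split described above, in Python. It is not how pgpool or SQLRelay are implemented; the `Router` class and the node names are hypothetical, and it assumes some replication mechanism already copies writes from the primary to the replicas.

```python
import itertools


class Router:
    """Send writes to one primary and spread reads over replicas round-robin."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads simply go to the primary.
        self._readers = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        """Return the name of the node that should run this statement."""
        first_word = sql.lstrip().split(None, 1)[0].upper() if sql.strip() else ""
        if first_word == "SELECT":
            return next(self._readers)
        # Anything that might change data goes to the primary.
        return self.primary


router = Router("primary", ["replica1", "replica2"])
print(router.route("SELECT * FROM orders"))
print(router.route("UPDATE orders SET paid = true"))
```

Classifying a statement by its first keyword is a deliberate simplification: a SELECT can call a function that writes, and SELECT ... FOR UPDATE takes locks, so a real router needs more care. The sketch also shows why only the read side scales: every write still lands on one node and then has to be propagated to all the others.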


Greets
Tino



Re: [HACKERS] postgresql clustering

2005-09-29 Thread Jonah H. Harris
On 9/29/05, Tino Wildenhain [EMAIL PROTECTED] wrote:
> Well, I don't know why many people believe parallel execution
> automatically means high performance.  Actually, most of the time
> performance is much worse this way.
> If your dataset remains static and you do only read-only
> requests, you get higher performance through load balancing.
> If, however, you make changes to the data, each change has to
> be propagated to all nodes - which in fact costs performance.
> This depends heavily on the link speed between the nodes.

I think you should clarify that the type of clustering you're
discussing is the shared-nothing model, which is most prevalent in
open-source databases.  Shared-disk and shared-memory clustered
systems do not have the propagation issue but do have others
(distributed lock manager, etc.).  Don't make blind
statements.  If you want more information about real-world
clustering, read the research for DB2 (Mainframe) and Oracle RAC.

--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/


Re: [HACKERS] postgresql clustering

2005-09-29 Thread Gaetano Mendola
Daniel Duvall wrote:
> While clustering in some circles may be an open-ended buzzword --
> mainly the commercial DB marketing crowd -- there are concepts beneath
> the bull that are even inherent in the name.  However, I understand
> your point.
>
> From what I've researched, the concepts and practices seem to fall
> under one of two abstract categorizations: fail-over (ok...
> high-availability), and parallel execution (high-performance... sure).
> While some consider the implementation of only one of these to qualify
> a cluster, others seem to demand that a true cluster must
> implement both.
>
> What I'm really after is a DB setup that does fail-over and parallel
> execution.  Your setup sounds like it would gracefully handle the
> former, but cannot achieve the latter.  Perhaps I'm simply asking too
> much of a free software setup.
>
> Thanks for your response.
 

Also consider PITR and some work I did last year:
http://archives.postgresql.org/pgsql-admin/2005-06/msg00013.php

With PITR you can have one or more remote machines that
continuously replay the log from the main server; if the main server
crashes, the mirrors can come out of replay and go online.

At that time it was not possible to connect to a replaying engine
to perform (at least) queries; I don't know if this changed in 8.1.

BTW, did someone go further with that idea? If not, I'd like to rewrite
that stuff in C (I do prefer C++).
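
A rough sketch, in Python, of the waiting step behind the log-replaying mirror described above. It is not the script from the linked post. It assumes the main server copies finished WAL segments into a shared archive directory, that files appear there atomically (for example via rename), and that the mirror's restore command is allowed to block until the segment it asks for shows up. The function name and parameters are hypothetical.

```python
import os
import shutil
import time


def wait_for_segment(archive_dir, segment, dest, poll_s=1.0, timeout_s=None):
    """Block until `segment` appears in `archive_dir`, then copy it to `dest`.

    Returns True once the file has been copied, or False if `timeout_s`
    elapses first. A timeout of None means wait forever.
    """
    src = os.path.join(archive_dir, segment)
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while not os.path.exists(src):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    shutil.copyfile(src, dest)
    return True
```

As long as the wait continues, the mirror stays in recovery. Returning False (or otherwise reporting that no more log is available) is the point where, under these assumptions, the mirror would finish replay and come online.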

Regards
Gaetano Mendola







Re: [HACKERS] postgresql clustering

2005-09-29 Thread Tino Wildenhain

Jonah H. Harris wrote:
> On 9/29/05, Tino Wildenhain [EMAIL PROTECTED] wrote:
>
>> Well, I don't know why many people believe parallel execution
>> automatically means high performance.  Actually, most of the time
>> performance is much worse this way.
>> If your dataset remains static and you do only read-only
>> requests, you get higher performance through load balancing.
>> If, however, you make changes to the data, each change has to
>> be propagated to all nodes - which in fact costs performance.
>> This depends heavily on the link speed between the nodes.
>
> I think you should clarify that the type of clustering you're discussing
> is the shared-nothing model, which is most prevalent in open-source
> databases.  Shared-disk and shared-memory clustered systems do not have
> the propagation issue but do have others (distributed lock manager,
> etc.).  Don't make blind statements.  If you want more information about
> real-world clustering, read the research for DB2 (Mainframe) and
> Oracle RAC.

No, that's not a blind statement ;) It does not matter how the
information is technically shared - shared memory must be
copied or accessed over network links if you have more than
one independent system.  Locks are information too - thus the
same constraints apply.

So no matter how you label the problem, the basic constraints
(read: communication and synchronisation overhead) will remain.

Custom solutions can circumvent some of the problems if you
can shift the problem area (e.g. have some read-only areas,
some seldom-written areas, and some data that is written often, seldom read,
and not immediately propagated).




Re: [HACKERS] postgresql clustering

2005-09-29 Thread Luke Lonergan
Daniel,

> From what I've researched, the concepts and practices seem to fall
> under one of two abstract categorizations: fail-over (ok...
> high-availability), and parallel execution (high-performance... sure).
> While some consider the implementation of only one of these to qualify
> a cluster, others seem to demand that a true cluster must
> implement both.

If you want to get a high degree of parallelism, 10s or 100s of machines are
required.  At that size, you must have fault tolerance to make the system
usable.

> What I'm really after is a DB setup that does fail-over and parallel
> execution.  Your setup sounds like it would gracefully handle the
> former, but cannot achieve the latter.  Perhaps I'm simply asking too
> much of a free software setup.

We've spent the last 3 years developing a parallel database that does both and 
I can tell you that it takes a huge development effort to get it right for the 
general audience.  Bizgres MPP is capable of handling ANSI SQL, is ACID 
compliant and scales to tens of terabytes, but it's not free (sorry about 
that).  It is tons cheaper than Oracle or Teradata though, and it's based on 
Postgres.

- Luke




Re: [HACKERS] postgresql clustering

2005-09-28 Thread Gaetano Mendola
Daniel Duvall wrote:

> I've looked at PostgreSQL and EnterpriseDB, but I can't find anything
> definitive as far as clustering capabilities.  What kinds of projects
> are there for clustering PgSQL, and are any of them mature enough for
> commercial apps?

As you well know, clustering means everything and nothing at the same time.
We do have a commercial failover cluster provided by Red Hat,
with Postgres running on it.  Postgres is installed on both nodes and the
data is stored on a SAN; only one instance of Postgres runs at a time, on one
of the two nodes.  In the last 2 years we had a failure, and the service
relocation worked as expected.

Consider also that applications should behave well during a failover, e.g. try to
close the current connection and retry opening a new one for a while.
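
A minimal sketch of that application-side behaviour, in Python. The `connect` argument stands for whatever function your database driver uses to open a connection; it, the function name, and the default retry settings are assumptions made for illustration.

```python
import time


def connect_with_retry(connect, attempts=5, delay_s=1.0, retry_on=(OSError,)):
    """Call `connect()` until it succeeds or `attempts` tries have failed.

    `connect` is any zero-argument callable that returns a connection and
    raises one of the exceptions in `retry_on` while the service is down.
    The last error is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on as exc:
            last_error = exc
            # Give the failover cluster time to relocate the service.
            if attempt < attempts - 1:
                time.sleep(delay_s)
    raise last_error
```

The caller is expected to close the broken connection first and then ask for a new one this way. The total wait (`attempts` times `delay_s`) should comfortably exceed the time the cluster needs to start Postgres on the surviving node.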

Regards
Gaetano Mendola




Re: [HACKERS] postgresql clustering

2005-09-28 Thread Joshua D. Drake

Gaetano Mendola wrote:

> Daniel Duvall wrote:
>
>> I've looked at PostgreSQL and EnterpriseDB, but I can't find anything
>> definitive  as far as clustering capabilities.  What kinds of projects
>> are there for clustering PgSQL, and are any of them mature enough for
>> commercial apps?

Are you looking for clustering or replication? There are two very
popular replication solutions: Slony-I and Mammoth Replicator.

Slony-I is an external replication solution, Mammoth Replicator is a
complete PostgreSQL + Replication solution.

Sincerely,

Joshua D. Drake

> As you well know clustering means all and nothing at the same time.
> We do have a commercial failover cluster for provided by Redhat,
> with postgres running on it. The Postgres is installed on both nodes and the
> data are stored on SAN, only one instance of postgres run at time in one
> of two nodes. In last 2 years we had a failure and the service relocation
> worked as expected.
>
> Consider also that applications shall have a good behaviour like try to
> close the current connection and retry to open a new one for a while
>
> Regards
> Gaetano Mendola




--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/




Re: [HACKERS] postgresql clustering

2005-09-22 Thread Daniel Duvall
Jonah,

I stumbled on this discussion in one of my recurring searches for an
open-source database app capable of true clustering (failover, load
balancing, etc) that I can pair with my PHP application.  A search
that, sadly, most often ends in disappointment -- there's tons and tons
of database marketing BS out there.

Part of my frustration is due to my lack of a real understanding of the
models you mentioned in your comment.  I've been searching for
meaningful text and comparisons of the different clustering models, but
have yet to find anything that truly breaks it down well (and deep).

Could you perhaps point me -- and anyone else that happens upon this
post with the same frustrations -- in the right direction?

I've looked at PostgreSQL and EnterpriseDB, but I can't find anything
definitive  as far as clustering capabilities.  What kinds of projects
are there for clustering PgSQL, and are any of them mature enough for
commercial apps?

Best,
Dan


Jonah H. Harris wrote:
 In the past couple years I've worked on several personal/business projects
 to cluster PostgreSQL and InnoDB (without MySQL). I've tested
 shared-nothing, shared-memory, and shared-disk models. IMHO, shared-disk is
 the only viable option for performance and/or large production business
 environments. Using shared-memory or shared-nothing architectures in a
 database are fine for high-availability, but are expensive from a
 business-case for added performance. I'd be happy to share any of my
 clustering knowledge with ya offline. Have fun!



 On 9/21/05, Rafik Salama [EMAIL PROTECTED] wrote:
 
  No I do not have a case study, I just read so, but what I am suggesting to
  start doing is that if there is no cluster implementation to give high
  availability of the database, I will start doing this project through the
  message passing technique and I already have in the university a cluster
  of
  19 machine intel xeon, you can see it in this URL
  http://www.cs.aucegypt.edu/~cluster
 
  But any way I was just asking so as not to reinvent the Wheel, in case
  there
  is something like that, but since there is not, I will give it a try, at
  the
  end of the day it is open source and I can do anything and if it happens
  to
  work, who knows
 
  Thanks
 
  Rafik Salama
  Systems Architect
 
  CIT Global
  CIT Building, Free Zone
  Nasr City,
  P.O.Box 11816, Cairo, Egypt
  Tel : +202 271 8794 (ext. 115)
  Fax : +202 2748335
  Cell: +2010 5410035
  http://www.citglobal.com
 
  -Original Message-
  From: David Fetter [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, September 21, 2005 8:12 PM
  To: Rafik Salama
  Cc: pgsql-hackers@postgresql.org
  Subject: Re: [HACKERS] postgresql clustering
 
  On Wed, Sep 21, 2005 at 08:01:08PM +0300, Rafik Salama wrote:
   Dear Sirs
  
   I know that that postgresql can be configured for high availability
   over a clustered environment using pgcluster,
 
  Do you have a case study showing this?
 
   I am currently studying in my masters the clustering using MPI and
   OpenMP, PVM and others packages and I have to do a project, so I was
   thinking to use this opportunity to start implementing the
   clustering over postgresql using any of the above packages.
  
   What do you think?
 
  Let a thousand schools of thought content. Let a hundred flowers
  bloom.
 
  Cheers,
  D
  --
  David Fetter [EMAIL PROTECTED] http://fetter.org/
  phone: +1 510 893 6100 mobile: +1 415 235 3778
 
  Remember to vote!
 
 
 



 --
 Respectfully,

 Jonah H. Harris, Database Internals Architect
 EnterpriseDB Corporation
 http://www.enterprisedb.com/




Re: [HACKERS] postgresql clustering

2005-09-21 Thread David Fetter
On Wed, Sep 21, 2005 at 08:01:08PM +0300, Rafik Salama wrote:
> Dear Sirs
>
> I know that postgresql can be configured for high availability
> over a clustered environment using pgcluster,

Do you have a case study showing this?

> I am currently studying in my masters the clustering using MPI and
> OpenMP, PVM and other packages and I have to do a project, so I was
> thinking to use this opportunity to start implementing the
> clustering over postgresql using any of the above packages.
>
> What do you think?

Let a thousand schools of thought contend.  Let a hundred flowers
bloom.

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!



Re: [HACKERS] postgresql clustering

2005-09-21 Thread Aly Dharshi
I think it's a great idea to give it a shot; maybe you can present a
proposal to the list on how you wish to go about it.  There could be some
experts on the list who may give you some input and direction.


Aly.

David Fetter wrote:

> On Wed, Sep 21, 2005 at 08:01:08PM +0300, Rafik Salama wrote:
>
>> Dear Sirs
>>
>> I know that postgresql can be configured for high availability
>> over a clustered environment using pgcluster,
>
> Do you have a case study showing this?
>
>> I am currently studying in my masters the clustering using MPI and
>> OpenMP, PVM and other packages and I have to do a project, so I was
>> thinking to use this opportunity to start implementing the
>> clustering over postgresql using any of the above packages.
>>
>> What do you think?
>
> Let a thousand schools of thought contend.  Let a hundred flowers
> bloom.
>
> Cheers,
> D


--
Aly Dharshi
[EMAIL PROTECTED]

 A good speech is like a good dress
  that's short enough to be interesting
  and long enough to cover the subject




Re: [HACKERS] postgresql clustering

2005-09-21 Thread Rafik Salama
No, I do not have a case study; I just read so.  What I am suggesting is
that, if there is no cluster implementation that gives high
availability of the database, I will start this project using the
message-passing technique.  I already have a cluster of
19 Intel Xeon machines at the university; you can see it at this URL:
http://www.cs.aucegypt.edu/~cluster

Anyway, I was just asking so as not to reinvent the wheel in case there
is something like that.  Since there is not, I will give it a try.  At the
end of the day it is open source and I can do anything, and if it happens to
work, who knows.

Thanks

Rafik Salama
Systems Architect

CIT Global
CIT Building, Free Zone
Nasr City,
P.O.Box 11816, Cairo, Egypt
Tel : +202 271 8794 (ext. 115)
Fax : +202 2748335
Cell: +2010 5410035
http://www.citglobal.com

-----Original Message-----
From: David Fetter [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 21, 2005 8:12 PM
To: Rafik Salama
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] postgresql clustering

On Wed, Sep 21, 2005 at 08:01:08PM +0300, Rafik Salama wrote:
> Dear Sirs
>
> I know that postgresql can be configured for high availability
> over a clustered environment using pgcluster,

Do you have a case study showing this?

> I am currently studying in my masters the clustering using MPI and
> OpenMP, PVM and other packages and I have to do a project, so I was
> thinking to use this opportunity to start implementing the
> clustering over postgresql using any of the above packages.
>
> What do you think?

Let a thousand schools of thought contend.  Let a hundred flowers
bloom.

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!




Re: [HACKERS] postgresql clustering

2005-09-21 Thread Jonah H. Harris
In the past couple of years I've worked on several personal/business
projects to cluster PostgreSQL and InnoDB (without MySQL).  I've
tested shared-nothing, shared-memory, and shared-disk models.
IMHO, shared-disk is the only viable option for performance and/or
large production business environments.  Using shared-memory or
shared-nothing architectures in a database is fine for
high availability, but is expensive from a business-case standpoint for added
performance.  I'd be happy to share any of my clustering knowledge
with ya offline.  Have fun!

--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/