Re: [HACKERS] postgresql clustering
Thanks for your reply, Luke. Bizgres looks like a very promising project. I'll be sure to follow it.

Thanks to everyone for their comments. I'm starting to understand the truth behind the hype and where these performance gains and hits stem from.

-Dan

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] postgresql clustering
What about clustered filesystems? At first blush I would think the overhead of something like GFS might kill performance. Could one potentially achieve a fail-over config using multiple nodes with GFS, each having their own instance of PostgreSQL (but only one running at any given moment)?

Best,
Dan
Fwd: Re: [HACKERS] postgresql clustering
What is the relationship between database support for clustering and grid computing, and support for distributed databases? Two-phase commit is coming in 8.1. What effect will this have in promoting FOSS grid support or distribution solutions for PostgreSQL?
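For readers wondering what two-phase commit adds for distributed setups, here is a minimal in-memory sketch of the protocol in Python. It does not talk to a database: the `Participant` class and the `two_phase_commit` function are hypothetical names made up for illustration. The SQL named in the comments (PREPARE TRANSACTION, COMMIT PREPARED, ROLLBACK PREPARED) is what the 8.1 feature exposes; everything else is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Participant:
    """Hypothetical stand-in for one database node; not a real driver."""
    name: str
    can_prepare: bool = True   # set to False to simulate a node that cannot prepare
    state: str = "idle"        # idle -> prepared -> committed, or aborted
    log: List[str] = field(default_factory=list)

    def prepare(self, gid: str) -> bool:
        # Phase 1. On a real node this would be: PREPARE TRANSACTION 'gid'
        if not self.can_prepare:
            self.state = "aborted"
            self.log.append(f"prepare of {gid} failed")
            return False
        self.state = "prepared"
        self.log.append(f"PREPARE TRANSACTION '{gid}'")
        return True

    def commit(self, gid: str) -> None:
        # Phase 2, success path: COMMIT PREPARED 'gid'
        self.state = "committed"
        self.log.append(f"COMMIT PREPARED '{gid}'")

    def rollback(self, gid: str) -> None:
        # Phase 2, failure path: ROLLBACK PREPARED 'gid'
        self.state = "aborted"
        self.log.append(f"ROLLBACK PREPARED '{gid}'")


def two_phase_commit(gid: str, participants: List[Participant]) -> bool:
    """Commit on every participant or on none. Returns True on commit."""
    prepared = []
    for node in participants:
        if node.prepare(gid):
            prepared.append(node)
        else:
            # One refusal aborts the whole transaction: undo what was prepared.
            for done in prepared:
                done.rollback(gid)
            return False
    # Every node promised it can commit, so the decision is now final.
    for node in participants:
        node.commit(gid)
    return True
```

The sketch leaves out the hard part, which is what happens when the coordinator or a node crashes between the two phases. Two-phase commit gives a clustering or distribution layer a building block for atomic multi-node writes; it is not a clustering solution by itself.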
Re: [HACKERS] postgresql clustering
Dan,

On 9/29/05 3:23 PM, Daniel Duvall [EMAIL PROTECTED] wrote:
> What about clustered filesystems? At first blush I would think the overhead of something like GFS might kill performance. Could one potentially achieve a fail-over config using multiple nodes with GFS, each having their own instance of PostgreSQL (but only one running at any given moment)?

Interestingly, my friend Matt O'Keefe built GFS at UMN. I was one of his first customers/sponsors of the research in 1998, when I implemented an 8-node shared-disk cluster on Alpha Linux using GFS and Fibre Channel.

Again, it depends on what you're doing. If it's OLTP, you will spend too much time in lock management for disk access, and things like Oracle RAC's CacheFusion become critical to reduce the number of times you have to hit disks.

For warehousing/sequential scans, this kind of clustering is irrelevant.

- Luke
Re: [HACKERS] postgresql clustering
Luke Lonergan wrote:
> Dan,
>
> On 9/29/05 3:23 PM, Daniel Duvall [EMAIL PROTECTED] wrote:
>> What about clustered filesystems? At first blush I would think the overhead of something like GFS might kill performance. Could one potentially achieve a fail-over config using multiple nodes with GFS, each having their own instance of PostgreSQL (but only one running at any given moment)?
>
> Interestingly, my friend Matt O'Keefe built GFS at UMN. I was one of his first customers/sponsors of the research in 1998, when I implemented an 8-node shared-disk cluster on Alpha Linux using GFS and Fibre Channel.
>
> Again, it depends on what you're doing. If it's OLTP, you will spend too much time in lock management for disk access, and things like Oracle RAC's CacheFusion become critical to reduce the number of times you have to hit disks.

Hitting the disk is really bad. However, we have seen that consulting the network for small portions of data (e.g. locks) is even more critical. You will see that the CPU on all nodes is running at 1% or so while the network is waiting for data to be exchanged (latency); this is the real problem. I don't know what Oracle is doing in detail, but they have a real problem when losing a node inside the cluster (syncing again is really time consuming).

> For warehousing/sequential scans, this kind of clustering is irrelevant.

I suggest looking at Teradata; they do really nice query partitioning on so-called AMPs (we'd simply call them nodes). It is really nice for really ugly warehousing queries (ugly in terms of the amount of data).

Hans

--
Cybertec Geschwinde Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at
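The "CPU at 1% while the network waits" observation above can be reproduced with back-of-the-envelope arithmetic. The following Python sketch is only an illustration: the function names are hypothetical, and the figures (10 microseconds of CPU work per transaction, 10 lock messages per transaction, 100 microseconds per network round trip) are assumed, not measured on any of the systems discussed in this thread.

```python
def cpu_utilization(cpu_s: float, round_trips: int, rtt_s: float) -> float:
    """Fraction of wall-clock time spent computing rather than waiting on the network."""
    wait_s = round_trips * rtt_s
    return cpu_s / (cpu_s + wait_s)


def max_tx_per_second(cpu_s: float, round_trips: int, rtt_s: float) -> float:
    """Upper bound on transactions per second for one serialized connection."""
    return 1.0 / (cpu_s + round_trips * rtt_s)


# Assumed figures, for illustration only.
CPU_S = 10e-6         # CPU work per transaction
ROUND_TRIPS = 10      # lock messages per transaction
RTT_S = 100e-6        # one network round trip

util = cpu_utilization(CPU_S, ROUND_TRIPS, RTT_S)
rate = max_tx_per_second(CPU_S, ROUND_TRIPS, RTT_S)
print(f"CPU busy: {util:.1%}, at most {rate:.0f} tx/s per connection")
```

With these assumed numbers the CPU is busy about 1% of the time. Faster processors do not help in that regime; only fewer or cheaper round trips do.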
Re: [HACKERS] postgresql clustering
While clustering in some circles may be an open-ended buzzword -- mainly the commercial DB marketing crowd -- there are concepts beneath the bull that are even inherent in the name. However, I understand your point. From what I've researched, the concepts and practices seem to fall under one of two abstract categorizations: fail-over (ok... high-availability), and parallel execution (high-performance... sure). While some consider the implementation of only one of these to qualify a cluster, others seem to demand that a true cluster must implement both. What I'm really after is a DB setup that does fail-over and parallel execution. Your setup sounds like it would gracefully handle the former, but cannot achieve the latter. Perhaps I'm simply asking too much of a free software setup. Thanks for your response.
Re: [HACKERS] postgresql clustering
Daniel Duvall wrote:
> While clustering in some circles may be an open-ended buzzword -- mainly the commercial DB marketing crowd -- there are concepts beneath the bull that are even inherent in the name. However, I understand your point. From what I've researched, the concepts and practices seem to fall under one of two abstract categorizations: fail-over (ok... high-availability), and parallel execution (high-performance... sure).

Well, I don't know why many people believe parallel execution automatically means high performance. Actually, most of the time the performance is much worse this way. If your dataset remains static and you do only read-only requests, you get higher performance through load balancing. If, however, you make changes to the data, each change has to be propagated to all nodes, which costs performance. How much depends heavily on the link speed between the nodes.

> While some consider the implementation of only one of these to qualify a cluster, others seem to demand that a true cluster must implement both. What I'm really after is a DB setup that does fail-over and parallel execution. Your setup sounds like it would gracefully handle the former, but cannot achieve the latter. Perhaps I'm simply asking too much of a free software setup.

Commercial vendors aren't much better here; they just don't tell you :-) There is pgpool or SQLRelay, for example, if you want to parallelize requests. You can combine them with the various replication mechanisms also available for PG and get what you want and, most important, get what's possible. Nobody can trick the math :-)

Greets
Tino
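The math referred to above can be made concrete with a deliberately simple model, sketched here in Python. The assumptions are mine, not from the thread: every node can do `node_capacity` operations per second, a read is served by exactly one node, a write must be applied on every node, and link latency (which the message above says matters a lot) is ignored. The function name is hypothetical.

```python
def cluster_throughput(nodes: int, node_capacity: float, write_fraction: float) -> float:
    """Client-visible operations per second for a fully replicated cluster.

    A read costs one unit of work on one node; a write costs one unit on
    every node. Total work available is nodes * node_capacity.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    work_per_operation = (1.0 - write_fraction) + write_fraction * nodes
    return nodes * node_capacity / work_per_operation


# Four nodes at an assumed 1000 ops/s each, with varying write shares.
for share in (0.0, 0.2, 1.0):
    print(share, cluster_throughput(4, 1000.0, share))
```

In this model a read-only workload scales linearly with the number of nodes, a write-only workload does not scale at all, and anything in between is capped by the write share. Real systems do worse than the model once propagation delay and conflicts are counted.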
Re: [HACKERS] postgresql clustering
On 9/29/05, Tino Wildenhain [EMAIL PROTECTED] wrote:
> Well, I don't know why many people believe parallel execution automatically means high performance. Actually, most of the time the performance is much worse this way. If your dataset remains static and you do only read-only requests, you get higher performance through load balancing. If, however, you make changes to the data, each change has to be propagated to all nodes, which costs performance. How much depends heavily on the link speed between the nodes.

I think you should clarify that the type of clustering you're discussing is the shared-nothing model, which is most prevalent in open-source databases. Shared-disk and shared-memory clustered systems do not have the propagation issue but do have others (distributed lock manager, etc.). Don't make blind statements. If you want more information about real-world clustering, read the research for DB2 (Mainframe) and Oracle RAC.

--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/
Re: [HACKERS] postgresql clustering
Daniel Duvall wrote:
> While clustering in some circles may be an open-ended buzzword -- mainly the commercial DB marketing crowd -- there are concepts beneath the bull that are even inherent in the name. However, I understand your point. From what I've researched, the concepts and practices seem to fall under one of two abstract categorizations: fail-over (ok... high-availability), and parallel execution (high-performance... sure). While some consider the implementation of only one of these to qualify a cluster, others seem to demand that a true cluster must implement both. What I'm really after is a DB setup that does fail-over and parallel execution. Your setup sounds like it would gracefully handle the former, but cannot achieve the latter. Perhaps I'm simply asking too much of a free software setup. Thanks for your response.

Also consider PITR and some work I did last year:
http://archives.postgresql.org/pgsql-admin/2005-06/msg00013.php

With PITR you can have one or more remote machines that continuously replay the log from the main server; if the main server crashes, a mirror can come out of replay and go online. At that time it was not possible to connect to a replaying engine to run even read-only queries; I don't know if this changed in 8.1.

BTW, did someone go further with that idea? If not, I'd like to rewrite that stuff in C (I do prefer C++).

Regards
Gaetano Mendola
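The continuous-replay idea described above hinges on one small piece: a restore step on the mirror that waits for the next log segment to arrive instead of giving up when it is missing. Below is a minimal Python sketch of that waiting step. It is not the code from the linked post; the function name, the trigger-file convention and all parameters are assumptions made for illustration. In a real setup something like this would be called from the `restore_command` setting in recovery.conf, and a production script would also have to guard against half-copied segments.

```python
import os
import shutil
import time


def restore_segment(archive_dir, segment, dest_path,
                    timeout_s=5.0, poll_s=0.5, trigger_file=None):
    """Copy one archived log segment to dest_path, waiting for it to show up.

    Returns True once the segment has been copied. Returns False if the
    (hypothetical) trigger file appears or the timeout expires first, which
    a wrapper script could map to a nonzero exit status to end the replay.
    """
    source = os.path.join(archive_dir, segment)
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(source):
            shutil.copy(source, dest_path)
            return True
        if trigger_file is not None and os.path.exists(trigger_file):
            # An operator or failover script asked the mirror to go online.
            return False
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

The trigger file is one simple way to tell a waiting mirror that the main server is gone; a heartbeat check would serve the same purpose.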
Re: [HACKERS] postgresql clustering
Jonah H. Harris wrote:
> On 9/29/05, Tino Wildenhain [EMAIL PROTECTED] wrote:
>> Well, I don't know why many people believe parallel execution automatically means high performance. Actually, most of the time the performance is much worse this way. If your dataset remains static and you do only read-only requests, you get higher performance through load balancing. If, however, you make changes to the data, each change has to be propagated to all nodes, which costs performance. How much depends heavily on the link speed between the nodes.
>
> I think you should clarify that the type of clustering you're discussing is the shared-nothing model, which is most prevalent in open-source databases. Shared-disk and shared-memory clustered systems do not have the propagation issue but do have others (distributed lock manager, etc.). Don't make blind statements. If you want more information about real-world clustering, read the research for DB2 (Mainframe) and Oracle RAC.

No, that's not a blind statement ;) It does not matter how the information is technically shared: shared memory must be copied or accessed over network links if you have more than one independent system. Locks are information too, so the same constraints apply.

So no matter how you label the problem, the basic constraints (read: communication and synchronisation overhead) will remain. Custom solutions can circumvent some of the problems if you can shift the problem area (e.g. have some read-only areas, some seldom-write areas, some high-write areas, and some seldom-read data that is not immediately propagated).
Re: [HACKERS] postgresql clustering
Daniel,

> From what I've researched, the concepts and practices seem to fall under one of two abstract categorizations: fail-over (ok... high-availability), and parallel execution (high-performance... sure). While some consider the implementation of only one of these to qualify a cluster, others seem to demand that a true cluster must implement both.

If you want to get a high degree of parallelism, 10s or 100s of machines are required. At that size, you must have fault tolerance to make the system usable.

> What I'm really after is a DB setup that does fail-over and parallel execution. Your setup sounds like it would gracefully handle the former, but cannot achieve the latter. Perhaps I'm simply asking too much of a free software setup.

We've spent the last 3 years developing a parallel database that does both, and I can tell you that it takes a huge development effort to get it right for the general audience. Bizgres MPP is capable of handling ANSI SQL, is ACID compliant and scales to tens of terabytes, but it's not free (sorry about that). It is tons cheaper than Oracle or Teradata though, and it's based on Postgres.

- Luke
Re: [HACKERS] postgresql clustering
Daniel Duvall wrote:
> I've looked at PostgreSQL and EnterpriseDB, but I can't find anything definitive as far as clustering capabilities. What kinds of projects are there for clustering PgSQL, and are any of them mature enough for commercial apps?

As you well know, clustering means all and nothing at the same time. We do have a commercial failover cluster provided by Red Hat, with Postgres running on it. Postgres is installed on both nodes and the data is stored on a SAN; only one instance of Postgres runs at a time, on one of the two nodes. In the last 2 years we had a failure, and the service relocation worked as expected.

Also consider that applications should behave well, for example by closing the current connection and retrying to open a new one for a while.

Regards
Gaetano Mendola
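The "close the connection and retry for a while" behaviour recommended above can be sketched in a few lines of Python. This is a generic illustration, not tied to any particular driver: `connect` is any zero-argument callable you supply, the function name and defaults are hypothetical, and catching the built-in `ConnectionError` is an assumption (a real database driver raises its own exception types, which you would catch instead).

```python
import time


def connect_with_retry(connect, attempts=5, delay_s=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect: zero-argument callable returning a connection object.
    sleep:   injectable so tests do not have to wait for real.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            # The service may be relocating to the other node; wait and retry.
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)
    raise last_error
```

During a failover like the one described above, the total retry window (attempts times delay) needs to be longer than the time the cluster takes to relocate the service; how long that is depends on the setup.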
Re: [HACKERS] postgresql clustering
Gaetano Mendola wrote:
> Daniel Duvall wrote:
>> I've looked at PostgreSQL and EnterpriseDB, but I can't find anything definitive as far as clustering capabilities. What kinds of projects are there for clustering PgSQL, and are any of them mature enough for commercial apps?

Are you looking for clustering or replication? There are two very popular replication solutions: Slony-I and Mammoth Replicator. Slony-I is an external replication solution; Mammoth Replicator is a complete PostgreSQL + replication solution.

Sincerely,
Joshua D. Drake

> As you well know, clustering means all and nothing at the same time. We do have a commercial failover cluster provided by Red Hat, with Postgres running on it. Postgres is installed on both nodes and the data is stored on a SAN; only one instance of Postgres runs at a time, on one of the two nodes. In the last 2 years we had a failure, and the service relocation worked as expected.
>
> Also consider that applications should behave well, for example by closing the current connection and retrying to open a new one for a while.
>
> Regards
> Gaetano Mendola

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Re: [HACKERS] postgresql clustering
Jonah,

I stumbled on this discussion in one of my recurring searches for an open-source database app capable of true clustering (failover, load balancing, etc.) that I can pair with my PHP application. A search that, sadly, most often ends in disappointment -- there's tons and tons of database marketing BS out there.

Part of my frustration is due to my lack of a real understanding of the models you mentioned in your comment. I've been searching for meaningful text and comparisons of the different clustering models, but have yet to find anything that truly breaks it down well (and deep). Could you perhaps point me -- and anyone else that happens upon this post with the same frustrations -- in the right direction?

I've looked at PostgreSQL and EnterpriseDB, but I can't find anything definitive as far as clustering capabilities. What kinds of projects are there for clustering PgSQL, and are any of them mature enough for commercial apps?

Best,
Dan

Jonah H. Harris wrote:
> In the past couple years I've worked on several personal/business projects to cluster PostgreSQL and InnoDB (without MySQL). I've tested shared-nothing, shared-memory, and shared-disk models. IMHO, shared-disk is the only viable option for performance and/or large production business environments. Using shared-memory or shared-nothing architectures in a database is fine for high-availability, but is expensive from a business-case perspective for added performance. I'd be happy to share any of my clustering knowledge with ya offline. Have fun!
Re: [HACKERS] postgresql clustering
On Wed, Sep 21, 2005 at 08:01:08PM +0300, Rafik Salama wrote:
> Dear Sirs
>
> I know that postgresql can be configured for high availability over a clustered environment using pgcluster,

Do you have a case study showing this?

> I am currently studying in my masters the clustering using MPI and OpenMP, PVM and other packages and I have to do a project, so I was thinking to use this opportunity to start implementing the clustering over postgresql using any of the above packages. What do you think?

Let a thousand schools of thought contend. Let a hundred flowers bloom.

Cheers,
D
--
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!
Re: [HACKERS] postgresql clustering
I think it's a great idea to give it a shot. Maybe you can present a proposal to the list of how you wish to go about it. There could be some experts on the list who may give you some input and direction.

Aly.

David Fetter wrote:
> On Wed, Sep 21, 2005 at 08:01:08PM +0300, Rafik Salama wrote:
>> Dear Sirs
>>
>> I know that postgresql can be configured for high availability over a clustered environment using pgcluster,
>
> Do you have a case study showing this?
>
>> I am currently studying in my masters the clustering using MPI and OpenMP, PVM and other packages and I have to do a project, so I was thinking to use this opportunity to start implementing the clustering over postgresql using any of the above packages. What do you think?
>
> Let a thousand schools of thought contend. Let a hundred flowers bloom.
>
> Cheers,
> D

--
Aly Dharshi [EMAIL PROTECTED]
A good speech is like a good dress that's short enough to be interesting and long enough to cover the subject
Re: [HACKERS] postgresql clustering
No, I do not have a case study; I just read so. What I am suggesting is that, if there is no cluster implementation to give high availability of the database, I will start doing this project using the message-passing technique. I already have at the university a cluster of 19 Intel Xeon machines; you can see it at this URL:
http://www.cs.aucegypt.edu/~cluster

Anyway, I was just asking so as not to reinvent the wheel in case there is something like that, but since there is not, I will give it a try. At the end of the day it is open source and I can do anything, and if it happens to work, who knows.

Thanks
Rafik Salama
Systems Architect
CIT Global
CIT Building, Free Zone Nasr City, P.O.Box 11816, Cairo, Egypt
Tel : +202 271 8794 (ext. 115)
Fax : +202 2748335
Cell: +2010 5410035
http://www.citglobal.com

-----Original Message-----
From: David Fetter [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 21, 2005 8:12 PM
To: Rafik Salama
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] postgresql clustering

On Wed, Sep 21, 2005 at 08:01:08PM +0300, Rafik Salama wrote:
> Dear Sirs
>
> I know that postgresql can be configured for high availability over a clustered environment using pgcluster,

Do you have a case study showing this?

> I am currently studying in my masters the clustering using MPI and OpenMP, PVM and other packages and I have to do a project, so I was thinking to use this opportunity to start implementing the clustering over postgresql using any of the above packages. What do you think?

Let a thousand schools of thought contend. Let a hundred flowers bloom.

Cheers,
D
--
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!
Re: [HACKERS] postgresql clustering
In the past couple years I've worked on several personal/business projects to cluster PostgreSQL and InnoDB (without MySQL). I've tested shared-nothing, shared-memory, and shared-disk models. IMHO, shared-disk is the only viable option for performance and/or large production business environments. Using shared-memory or shared-nothing architectures in a database is fine for high-availability, but is expensive from a business-case perspective for added performance.

I'd be happy to share any of my clustering knowledge with ya offline. Have fun!

--
Respectfully,
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/