Anyone? Please? By the way, is there like a FAQ somewhere so that I could avoid asking basic questions like this. For example, I have another question about replicate_def. How does it really work and what is it for? I was able to make data partitioning dist_def work (and later found out that not all nodes have contain the same data). Is my English comprehension really that bad or am I just relying too much on the pgpool-II manual? (Believe me, I tried reading it so many many many times but no, I'm never going to give up until I have this implemented because I strongly believe in its potential and we need it for speed on querying millions of data). Thank you very much for your help. I hope I could get a reply this time. :)
> Hello! > > In our company, we have approximately 144+ million records and are stored > in multiple databases using MySQL InnoDB engine. We are planning to > migrate them to PostgreSQL and found out about Pgpool-II. I am currently > testing the software (pgpool-II 2.1 beta2). I have successfully finished > setting up Replication Mode (1 Pgpool-II Server and 3 backend nodes). > > The MySQL server is a single Sales Report server wherein a lot of queries > are done everyday specially SELECT queries. INSERTS/UPDATES I believe, are > done after midnight. We are planning to do the write queries > (INSERT/UPDATE/etc.) realtime and at the same time, take lots of SELECT > queries. > > I compared the time taken when issuing a simple "SELECT COUNT(*) FROM > table1" on the MySQL server against the time taken on doing the same > through the Pgpool-II Server and directly to one of the backends. Here's > the result: > > * Pgpool-II Server with Replication on 3 Backends: 31.34mins > * MySQL Standalone Server: 20.56mins > * PostgreSQL Pgpool-II Directly to Backend: 14.28mins > > The time to beat is MySQL's 20.56mins. It seems that connecting directly > to one of the backend servers is faster than going through the Pgpool-II > server when doing a SELECT query (maybe because of overhead?). I'm not > sure if I read/comprehended it right from the manual, that Replication > Mode performs best on multiple simultaneous connections. (Probably a > server with lots of connections then balances the load?) Another good > thing about Replication Mode is that, as I understand it, creates a backup > of the whole database/s to each and everyone of the backend servers so > when a failure occurs on one the the backends, service continues.(But what > if the main Pgpool-II Server fails?) > > I also found out about Parallel Query Execution which is, I believe, the > right mode for our target. Our primary goal is to minimize the time taken > when doing SELECT queries and strip down the MySQL's 20.56min time limit. > We designed a simple setup to utilize both Replication and Parallel Modes: > > Pgpool-II Server (Replication Mode) > | > _______|________ > | | > Backend0 Backend1 > (Parallel Mode) (Parallel Mode) > ______|______ ______|______ > | | | | | | > B0 B1 B2 B0 B1 B2 <-- Backends for > Parallel > > Question #1: Will this setup work? > Question #2: Is this the best path to go for minimizing SELECT query > times? > Question #3: Any other suggestions please? > > Thank you for your time and help. > > Best regards, > Viril L. Calimlim > > _______________________________________________ > Pgpool-general mailing list > [email protected] > http://pgfoundry.org/mailman/listinfo/pgpool-general > > _______________________________________________ Pgpool-general mailing list [email protected] http://pgfoundry.org/mailman/listinfo/pgpool-general
