Hello, > Hi. > > >> >> It's now working in pgpool-II 2.1 RC1. No more problems. And yes I >> use >> >> PHP >> >> PDO and AdoDB. My previous post has more details. So what's the >> >> replicate_def table and replicate_select for? Thanks again. :) >> > >> > replicate_def is to replicate data, not to distribute. >> > >> > ex) partitioning data >> > node1: dist1 -> (1, 2, 3) dist2 -> (4, 5, 6) >> > node2: dist1 -> (4, 5, 6) dist2 -> (7, 8, 9) >> > node3: dist1 -> (7, 8, 9) dist2 -> (10, 11, 12) >> > >> > When pgpool executes the follwing query, the query is not parallel >> > execution. Because we need to join dist1 table on node2 and dist2 >> > table on node1. >> > >> > SELECT * FROM dist1 INNER JOIN dist2 on dist1.a = dist2.a; >> > >> > >> > ex) partitioning data >> > node1: dist1 -> (1, 2, 3) rep1 -> (4, 5, 6, 7, 8, 9, 10, 11, 12) >> > node2: dist1 -> (4, 5, 6) rep1 -> (4, 5, 6, 7, 8, 9, 10, 11, 12) >> > node3: dist1 -> (7, 8, 9) rep1 -> (4, 5, 6, 7, 8, 9, 10, 11, 12) >> > >> > pgpool can execute the following query concurrently. Because rep1 >> > table is the same on all nodes. >> > >> > SELECT * FROM dist1 INNER JOIN rep1 ON dist1.a = rep1.a; >> > >> > I think small tables are defined in replicate_def. >> >> If small tables are ideal in replicate_def then, it may not be >> applicable >> to our situation because here's the structure of the whole thing as >> planned: >> >> database: loadbal >> schema: public >> tables: salesdetail >> salesheader >> >> Tables salesdetail and salesheader both contain transaction date and >> branchcode columns. We decided to partition the data by transaction date >> because, as calculated, it partitions data more evenly as compared to by >> branchcode partitioning. We then extract the MONTH part of the >> transaction >> date then use it as an argument on the SELECT CASE function. Therefore, >> possible values are 1-12. The 2 tables are very very large (millions of >> records per table because we have like 100+ branches and >> hundreds/thousands of transactions per day). So that is the reason why >> my >> function looks like this: >
I just found out that it's not just salesdetail table and salesheader table that we need to generate a sales report from but there is another database (MySQL) that we need to migrate to PostgreSQL. It's called loadbalance_lookup. It contains several tables but some of the fields of the loadbalance_lookup.tables are the same with salesdetail and salesheader fields. The programmer said that he left joins the ones from loadbalance_lookup.table.field to one of the main tables but not both (salesdetail and salesheader). If this is the case then, I may have to create a loadbalance_lookup table and place the only needed fields inside it like branchcode and transaction_date, etc. Am I right? And I guess, partitioning should be: node1: loadbalance_lookup -> (1, 2, 3, 4) salesheader -> (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) salesdetail -> (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) node2: loadbalance_lookup -> (5, 6, 7, 8) salesheader -> (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) salesdetail -> (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) node3: loadbalance_lookup -> (9, 10, 11, 12) salesheader -> (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) salesdetail -> (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) *** Assuming: "transaction_date" field is the partitioning key column and present on all the 3 tables then we just EXTRACT(MONTH FROM $1) to get the 1-12 values. Since loadbalance_lookup fields are to be joined with either salesheader or salesdetail fields, I distributed loadbalance_lookup then replicated salesheader and salesdetail (as per the above INNER JOIN example for the right way of doing parallel query). Sorry if it looks a little more complicated this time. All we want to do is to partition our very very large data and make parallel query work to our advantage. Is my setup correct? I appreciate all your help. Thank you very much. Regards, Viril Calimlim > Do you want to execute following query ? > > SELECT * FROM salesdetail INNER JOIN salesheader using(transaction_date); > > ex) partitioning data > node1: salesdetail -> (1, 2, 3, 4) salesheader -> (1, 2, 3, 4) > node2: salesdetail -> (5, 6, 7, 8) salesheader -> (5, 6, 7, 8) > node3: salesdetail -> (9, 10, 11, 12) salesheader -> (9, 10, 11, 12) > > If JOIN is concluded by a single node, you can use the VIEW. > > Define VIEW to each node. > CREATE VIEW detail_and_header as SELECT * FROM salesdetail INNER JOIN > salesheader using(transaction_date); > > Define VIEW as partitioning table to dist_def table. > > You can use parallel query execution. > > SELECT * FROM detail_and_header; > >> >> *** Note: $1 = transaction date >> >> SELECT CASE WHEN EXTRACT(MONTH FROM $1) => 1 AND EXTRACT(MONTH FROM $1) >> <= >> 4 THEN 0 >> >> *** Months 1-4 goes to node0 >> >> Question: Is it safe to say that it is not advisable to use >> replicate_def >> because the tables are very large or should I use it to replicate the >> records on all nodes? >> >> Whenever I enter a value in the transactiondate field, ex. 2008-07-15, >> it >> goes to node1 alone and no copy on others. Is this the right behavior? >> Are >> we going to be able to take advantage of parallel query if this is going >> to be the setup? If not then what would possibly be the best >> partitioning >> scheme to take full advantage of parallel query given the >> structure/setup >> above? Our goal again is to prove that pgpool-II parallel query can cut >> the SELECT query time shorter as compared to a single MySQL server with >> same amount of data (millions). Ideally, time taken should divided by 3 >> because of 3 backends. >> >> Thank you very much for all your help. >> >> Regards, >> Viril Calimlim >> >> > >> > replicate_select is a parameter for replication mode. So you don't >> > need to set it. >> > >> > I'm not good at English. So if you cannot understand, feel free to ask >> > more. >> > >> > Regards, >> > -- >> > Yoshiyuki Asaba >> > [EMAIL PROTECTED] >> > >> >> >> >> Regards, >> >> Viril Calimlim >> >> >> >> > Hi, >> >> > >> >> > From: [EMAIL PROTECTED] >> >> > Subject: Re: [Pgpool-general] Parallel Query Mode - SELECT, >> >> Phppgadmin, >> >> > Psql >> >> > Date: Thu, 10 Jul 2008 17:25:41 +0800 (PHT) >> >> > >> >> >> This is really serious because we code using PHP and we wouldn't >> be >> >> able >> >> >> to make use of pgpool-II if we couldn't solve this problem. >> >> > >> >> > Do you use PHP-PDO? >> >> > >> >> > By the way, >> >> >> where can I get more information regarding other parameters like >> >> >> replicate_select (yes it replicates the SELECT on all nodes but >> >> what's >> >> >> the >> >> >> advantage? did I miss anything on the manual?), replicate_def >> table, >> >> >> etc. >> >> >> I am more than willing to further test the software for you to >> >> eliminate >> >> >> more bugs. Thank you so much for the speedy response. >> >> > >> >> > Could you try pgpool-II 2.1 RC1? >> >> > >> >> > Regards, >> >> > -- >> >> > Yoshiyuki Asaba >> >> > [EMAIL PROTECTED] >> >> > >> >> > >> >> > >> >> >> >> >> >> Best regards, >> >> >> Viril Calimlim >> >> >> >> >> >> > Hi, >> >> >> > >> >> >> > Do you set parallel_mode to true? >> >> >> > >> >> >> > Regards, >> >> >> > -- >> >> >> > Yoshiyuki Asaba >> >> >> > [EMAIL PROTECTED] >> >> >> > >> >> >> > >> >> >> > From: [EMAIL PROTECTED] >> >> >> > Subject: [Pgpool-general] Parallel Query Mode - SELECT, >> Phppgadmin, >> >> >> Psql >> >> >> > Date: Thu, 10 Jul 2008 15:45:11 +0800 (PHT) >> >> >> > >> >> >> >> Hello! >> >> >> >> >> >> >> >> First of all, thank you for taking the time to reply and >> excellent >> >> >> tip >> >> >> >> regarding my last question about Parallel Query. However, I've >> got >> >> a >> >> >> new >> >> >> >> problem. >> >> >> >> >> >> >> >> In Parallel Query Mode, as per the Tutorial guidelines, I >> created >> >> the >> >> >> >> following: >> >> >> >> >> >> >> >> - Database "test" >> >> >> >> - Schema "testschema" >> >> >> >> - Table "testtable" >> >> >> >> - Fields "id integer", "name varchar(20)" >> >> >> >> - Function pgpool_catalog.dist_def_test with same ranges >> >> (1..100000 >> >> >> etc) >> >> >> >> - Inserted a row in dist_def table with key "id" >> >> >> >> >> >> >> >> I have 3 backend servers 0, 1, 2 and 1 pgpool-II dedicated >> server. >> >> I >> >> >> did >> >> >> >> not insert a row in replicate_def because I still don't >> understand >> >> >> that >> >> >> >> part (sorry). >> >> >> >> >> >> >> >> My question is that whenever I issue the command: >> >> >> >> >> >> >> >> [EMAIL PROTECTED]:~$ psql -p 9999 -c "INSERT INTO >> >> >> >> testschema.testtable VALUES (100000,'testname')" test >> >> >> >> >> >> >> >> ... I checked all the backends and it send the data to node0 >> ALONE >> >> >> and >> >> >> >> when I do a SELECT *, it shows only 1 row. >> >> >> >> >> >> >> >> But when I use Phppgadmin or even run a PHP script (using PDO) >> on >> >> >> port >> >> >> >> 9999 of the pgpool-II server to insert data, I checked all the >> >> >> backends >> >> >> >> and all of them has the copy of the inserted row. When I do >> SELECT >> >> *, >> >> >> I >> >> >> >> get 3 rows of the same data. Another thing about Phppgadmin, >> when >> >> I >> >> >> >> click >> >> >> >> on the BROWSE button on testtable, it just keeps on Loading... >> >> >> meaning, >> >> >> >> it >> >> >> >> stuck forever for some reason and I had to kill all pgpool >> >> processes. >> >> >> >> >> >> >> >> tail postgresql log: lots of "LOG: unexpected EOF on client >> >> >> connection" >> >> >> >> tail pgpool log: >> >> >> >> >> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: read_kind_from_backend: >> read >> >> >> kind >> >> >> >> from 0 th backend Z NUM_BACKENDS: 3 >> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: read_kind_from_backend: >> read >> >> >> kind >> >> >> >> from 1 th backend Z NUM_BACKENDS: 3 >> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: read_kind_from_backend: >> read >> >> >> kind >> >> >> >> from 2 th backend Z NUM_BACKENDS: 3 >> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: pool_process_query: kind >> from >> >> >> >> backend: Z >> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: pool_read_message_length: >> >> slot: >> >> >> 0 >> >> >> >> length: 5 >> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: pool_read_message_length: >> >> slot: >> >> >> 1 >> >> >> >> length: 5 >> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: pool_read_message_length: >> >> slot: >> >> >> 2 >> >> >> >> length: 5 >> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: ReadyForQuery: message >> >> length: 5 >> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: ReadyForQuery: transaction >> >> >> state: I >> >> >> >> 2008-07-10 15:41:28 DEBUG: pid 6621: >> pool_connection_pool_timer: >> >> set >> >> >> >> close >> >> >> >> time 1215675688 >> >> >> >> >> >> >> >> Version: pgpool-II 2.1beta2 >> >> >> >> OS: Ubuntu 8.04 Server >> >> >> >> PostgreSQL: 8.3.1 >> >> >> >> >> >> >> >> Please help. I really want this to work. Ive been after this >> for 3 >> >> >> >> months >> >> >> >> now. >> >> >> >> >> >> >> >> Thank you very much for your help. >> >> >> >> >> >> >> >> Regards, >> >> >> >> Viril >> >> >> >> >> >> >> >> _______________________________________________ >> >> >> >> Pgpool-general mailing list >> >> >> >> [email protected] >> >> >> >> http://pgfoundry.org/mailman/listinfo/pgpool-general >> >> >> > _______________________________________________ >> >> >> > Pgpool-general mailing list >> >> >> > [email protected] >> >> >> > http://pgfoundry.org/mailman/listinfo/pgpool-general >> >> >> > >> >> >> > >> >> >> >> >> >> >> >> > >> >> > >> >> >> >> >> > >> > >> >> >> >> _______________________________________________ >> Pgpool-general mailing list >> [email protected] >> http://pgfoundry.org/mailman/listinfo/pgpool-general >> > > > -- > SRA OSS, Inc. $BF|K\;Y<R(B > Yoshiharu Mori <[EMAIL PROTECTED]> > http://www.sraoss.co.jp/ > > _______________________________________________ Pgpool-general mailing list [email protected] http://pgfoundry.org/mailman/listinfo/pgpool-general
