Hi Tatsuo, filtered logs are attached.
Can you validate the patches applied? Thanks, Agustín Almonte F.
Attachment: pgpool_pid11723.log (binary data)
On 25-09-2009, at 4:00, Tatsuo Ishii wrote:
Xavier,

Thanks for the analysis and the patches! I don't know what 0x0049050000 is either. Can you send me the log?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Tatsuo,

I think we found what the problem was. During the reset of a backend, the pgpool process sends a BEGIN command to start a transaction and expects to receive a message of kind 'N', 'E', 'C' or 'Z', but in our case the backend sends something different (0x0049050000). The process interprets part of what it received as the length of the data it needs to read from the backend, and so it blocks indefinitely while waiting to read that much data.

I don't know what the backend is sending, but it always seems to be the same data (0x0049050000), and its first byte is not any known message kind ('N', 'E', 'C', etc.).

I've attached a patch which aborts the reset operation if what was read from the backend is none of the expected message kinds. We also have some logs which might make it easier to understand the code flow, in case you want to examine them.

Cheers

On Thu, Sep 24, 2009 at 9:41 AM, Xavier Noguer <[email protected]> wrote:

Tatsuo,

Our test case was this: two backends running postgres 8.1, with a few differences between them, and the master node always having more records.

We tried to reproduce the effect in our development environment, but it didn't work the first time. I'll try again to see if I can provide you with the database dumps needed to reproduce it.

Cheers

On Thu, Sep 24, 2009 at 4:05 AM, Tatsuo Ishii <[email protected]> wrote:

Thanks for the investigation. But I could not reproduce Agustín's problem. I ran test/jdbc for testing. If you have a self-contained test case, please let me know. I would like to know why my patches did not work, and it should help me with future troubleshooting.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Hello Tatsuo,

I'm working with Agustín Almonte on this same issue. After trying the latest patch you provided, we realized that when a DEALLOCATE was sent for a prepared statement, that prepared statement was not being taken off prepared_list. This meant that prepared_list was not updated and the same DEALLOCATE was sent over and over again.

Attached you'll find a patch that takes the prepared statement off prepared_list after the DEALLOCATE for that prepared statement has been sent. We tested it and it seems to work fine.

Cheers

--- pool_process_query.c 2009-09-24 01:56:59.000000000 -0400
+++ pool_process_query.c.new 2009-09-25 03:00:23.000000000 -0400
@@ -2619,6 +2619,12 @@
             return POOL_END;
         }
         len = ntohl(len) - 4;
+
+        if (kind != 'N' && kind != 'E' && kind != 'C')
+        {
+            pool_error("do_command: error, kind is not N, E or C");
+            return POOL_END;
+        }
         string = pool_read2(backend, len);
         if (string == NULL)
         {
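
As an illustration of why the unexpected reply described in this thread can make the process block, here is a minimal sketch in C. It is not pgpool code. It assumes that the first of the five observed bytes is read as the message kind and that the next four are read as a big-endian length that counts itself, which is what the `ntohl(len) - 4` line in the patch suggests. The names `is_expected_kind` and `parse_header` are hypothetical helpers invented for this example.

```c
#include <stdint.h>

/* Hypothetical helper: the three kinds that the patch above accepts. */
int is_expected_kind(char kind)
{
    return kind == 'N' || kind == 'E' || kind == 'C';
}

/*
 * Hypothetical helper: decode a 5-byte header laid out as
 * 1 byte of kind followed by a 4-byte big-endian length that
 * includes its own 4 bytes (assumed layout, see the note above).
 *
 * Always stores the decoded kind and the remaining payload length,
 * so the caller can see what a reader without the check would have
 * tried to read. Returns 0 for an expected kind, -1 otherwise.
 */
int parse_header(const unsigned char hdr[5], char *kind, uint32_t *payload_len)
{
    /* Equivalent to ntohl() applied to the four raw length bytes. */
    uint32_t len = ((uint32_t) hdr[1] << 24) |
                   ((uint32_t) hdr[2] << 16) |
                   ((uint32_t) hdr[3] << 8)  |
                    (uint32_t) hdr[4];

    *kind = (char) hdr[0];
    *payload_len = len - 4;

    return is_expected_kind(*kind) ? 0 : -1;
}
```

Under these assumptions, feeding the bytes 00 49 05 00 00 to `parse_header` yields kind 0x00 and a payload length of 0x49050000 - 4 = 1225064444 bytes, which is more than 1 GB. A reader that went on to wait for that much data would never return, which matches the hang described above. Rejecting the header as soon as the kind is unrecognized, as the patch does, avoids issuing that read.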
