Hi Tatsuo,

The filtered logs are attached.

Can you validate the patches applied?

Thanks,
Agustín Almonte F.


Attachment: pgpool_pid11723.log
Description: Binary data



On 25-09-2009, at 4:00, Tatsuo Ishii wrote:

Xavier,

Thanks for the analysis and the patches! I don't know what 0x0049050000 is
either. Can you send me the log?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Tatsuo,

I think we found what the problem was. During the reset of a backend
the pgpool process sends a BEGIN command to start a transaction and
expects to receive a message kind 'N', 'E', 'C' or 'Z', but in our
case the backend sends something different ( 0x0049050000 ). The
process interprets part of what it received as the length of the data
it needs to read from the backend, and so blocks itself indefinitely
while waiting to read that much data.

I don't know what it is that the backend is sending, but it seems to
be always the same data (0x0049050000), and the first byte of it is
not any known message kind ('N', 'E', 'C', etc...).

I've attached a patch which aborts the reset operation if what was
read from the backend is none of the expected message kinds.
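
To illustrate why the unexpected bytes make the process block, here is a
minimal standalone sketch. It is not pgpool code: `naive_payload_len` and
`checked_payload_len` are hypothetical helpers, and it assumes the five
bytes are read as a 1-byte message kind followed by a 4-byte big-endian
length that counts itself. The set of accepted kinds is the one named in
the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the 4 length bytes that follow the kind byte (big-endian). */
uint32_t decode_len(const unsigned char *hdr)
{
    return ((uint32_t)hdr[1] << 24) | ((uint32_t)hdr[2] << 16) |
           ((uint32_t)hdr[3] << 8)  |  (uint32_t)hdr[4];
}

/* Hypothetical: trust the header blindly, as the unpatched flow does. */
uint32_t naive_payload_len(const unsigned char *hdr)
{
    assert(hdr != NULL);
    return decode_len(hdr) - 4;   /* the length field includes itself */
}

/* Hypothetical: reject unknown kinds before using the length.
 * Returns 0 and stores the payload length on success, -1 otherwise. */
int checked_payload_len(const unsigned char *hdr, uint32_t *payload_len)
{
    assert(hdr != NULL && payload_len != NULL);
    char kind = (char)hdr[0];

    if (kind != 'N' && kind != 'E' && kind != 'C' && kind != 'Z')
        return -1;                /* not an expected message kind */

    uint32_t len = decode_len(hdr);
    if (len < 4)
        return -1;                /* length must at least cover itself */

    *payload_len = len - 4;
    return 0;
}
```

With the bytes 00 49 05 00 00, the naive version yields a payload length
of 1225064444 bytes (0x49050000 minus 4), so a blocking read would wait
for more than a gigabyte that never arrives. The checked version returns
-1 instead, because 0x00 is not one of the expected kinds.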

We also have some logs which might make it easier to understand the
code flow in case you want to examine them.

Cheers


On Thu, Sep 24, 2009 at 9:41 AM, Xavier Noguer <[email protected]> wrote:
 Tatsuo,

Our test case was this: two backends running postgres 8.1, with a few
differences between them; the master node always had more records.

We tried to reproduce the effect in our development environment, but it
didn't work the first time. I'll try again to see if I can provide
you with the necessary database dumps to reproduce it.

 Cheers

On Thu, Sep 24, 2009 at 4:05 AM, Tatsuo Ishii <[email protected]> wrote:
Thanks for the investigation.

But I could not reproduce Agustín's problem. I ran test/jdbc for
testing. If you have a self-contained test case, please let me know. I
would like to know why my patches did not work; that should help me with
future troubleshooting.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

 Hello Tatsuo,

I'm working with Agustín Almonte on this same issue, and after trying
the latest patch you provided we realized that when a DEALLOCATE was
being sent for a prepared statement, that prepared statement was not
being taken off prepared_list. This meant that prepared_list was not
updated and the same DEALLOCATE was sent over and over again.

Attached you'll find a patch that takes the prepared statement off
prepared_list after having sent the DEALLOCATE for that prepared
statement. We tested it and it seems to work fine.
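
The idea can be sketched in isolation as follows. This is a simplified
illustration, not the attached patch or pgpool's actual data structure:
`struct stmt_list`, `stmt_list_add`, `stmt_list_remove`,
`reset_statements` and the `fake_send_deallocate` stub are all
hypothetical names, and the list is assumed to hold only statement names.

```c
#include <assert.h>
#include <string.h>

#define MAX_STMTS 16
#define NAME_LEN  64

/* Hypothetical stand-in for a list of prepared statement names. */
struct stmt_list {
    int  cnt;
    char names[MAX_STMTS][NAME_LEN];
};

int stmt_list_add(struct stmt_list *l, const char *name)
{
    assert(l != NULL && name != NULL);
    if (l->cnt >= MAX_STMTS || strlen(name) >= NAME_LEN)
        return -1;
    strcpy(l->names[l->cnt++], name);
    return 0;
}

/* Remove one entry by name, shifting the remaining entries down. */
int stmt_list_remove(struct stmt_list *l, const char *name)
{
    assert(l != NULL && name != NULL);
    for (int i = 0; i < l->cnt; i++) {
        if (strcmp(l->names[i], name) == 0) {
            memmove(&l->names[i], &l->names[i + 1],
                    (size_t)(l->cnt - i - 1) * sizeof l->names[0]);
            l->cnt--;
            return 0;
        }
    }
    return -1;
}

/* Stub that only counts calls; a real sender would talk to the backend. */
int deallocate_calls = 0;
int fake_send_deallocate(const char *name)
{
    (void)name;
    deallocate_calls++;
    return 0;
}

/* Send DEALLOCATE for the first entry, then drop it from the list.
 * Without the removal step the list never shrinks and the same
 * DEALLOCATE would be sent forever. Returns the number sent, or -1. */
int reset_statements(struct stmt_list *l, int (*send)(const char *))
{
    int sent = 0;
    while (l->cnt > 0) {
        char name[NAME_LEN];
        strcpy(name, l->names[0]);
        if (send(name) != 0)
            return -1;
        stmt_list_remove(l, name);
        sent++;
    }
    return sent;
}
```

Each statement is deallocated exactly once because the loop condition
depends on the list actually shrinking after every send.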

 Cheers



--- pool_process_query.c        2009-09-24 01:56:59.000000000 -0400
+++ pool_process_query.c.new    2009-09-25 03:00:23.000000000 -0400
@@ -2619,6 +2619,12 @@
                                return POOL_END;
                        }
                        len = ntohl(len) - 4;
+                       
+                       if (kind != 'N' && kind != 'E' && kind != 'C')
+                       {
+                               pool_error("do_command: error, kind is not N, E or C");
+                               return POOL_END;
+                       }
                        string = pool_read2(backend, len);
                        if (string == NULL)
                        {

_______________________________________________
Pgpool-general mailing list
[email protected]
http://pgfoundry.org/mailman/listinfo/pgpool-general
