Hi,
I'm not sure what you mean by the pcp server program. I've modified the
patch so that the check is placed in the pcp_do_child() function in
pcp_child.c, which, as far as I can tell, is where PCP commands are
received. I guess libpcp goes through this code too. I've also added the
same fix for pcp_recovery_node and pcp_attach_node, which don't handle
this case either.
Here is the server response when the node id is out of range:
DEBUG: send: tos="R", len=46
DEBUG: recv: tos="r", len=21, data=AuthenticationOK
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="e", len=21, data=NodeIdOutOfRange
DEBUG: command failed. reason=NodeIdOutOfRange
BackendError
DEBUG: send: tos="X", len=4
I hope this is what you asked for; otherwise I don't know how to do better.
Regards,
On 07/01/2011 10:12, Tatsuo Ishii wrote:
> Gilles,
>
> Thanks for the report. That is definitely a bug. There should be a
> node id range check somewhere. However, I don't think placing the
> check in the pcp_detach_node command is a good idea. Rather, we should
> put the check in the pcp server program. This way, the node id range is
> checked not only for the pcp commands but also for any program using
> the pcp library.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
>> I found an annoying problem with the PCP command pcp_detach_node. I have
>> 3 computers, each running a PostgreSQL instance, in a streaming
>> replication setup. PgPool is running on the first node, which is the
>> master. The problem occurs when you give a node id outside the range of
>> existing nodes.
>>
>> As explained above, I only have 3 nodes, so node ids go from 0 to 2.
>> If I use node id 3, which doesn't exist, here are the results:
>>
>> /usr/bin/pcp_detach_node -d 10 192.168.1.11 9898 postgres postgres 3
>>
>> DEBUG: send: tos="R", len=46
>> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
>> DEBUG: send: tos="D", len=6
>> DEBUG: recv: tos="d", len=20, data=CommandComplete
>> DEBUG: send: tos="X", len=4
>> ------------- log file ----------------
>> LOG: notice_backend_error: node 0 is not valid backend.
>> LOG: starting degeneration. shutdown host 192.168.1.13(5432)
>> LOG: execute command: /home/postgres/bin/failover.sh 2 192.168.1.13 192.168.1.11 /home/postgres/data/postgres.trigger
>> LOG: failover_handler: set new master node: 0
>> LOG: failover done. shutdown host 192.168.1.13(5432)
>> LOG: find_primary_node: primary node id is 0
>>
>> [postg...@vm1 ~]$ psql -p 9999 -c "SHOW pool_nodes;"
>> node_id | hostname | port | status | lb_weight | state
>> ---------+--------------+------+--------+-----------+-------
>> 0 | 192.168.1.11 | 5432 | 2 | 0.333333 | P
>> 1 | 192.168.1.12 | 5432 | 2 | 0.333333 | S
>> 2 | 192.168.1.13 | 5432 | 3 | 0.333333 | S
>> (3 rows)
>>
>> As you can see, node 2 has been detached instead of the command aborting
>> with an error. I have also seen the detached node be node 0, which is
>> even worse.
>>
>> I've attached a patch that returns the following:
>>
>> /usr/bin/pcp_detach_node -d 10 192.168.1.11 9898 postgres postgres 3
>> DEBUG: send: tos="R", len=46
>> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
>> DEBUG: send: tos="D", len=6
>> EOFError
>> DEBUG: send: tos="X", len=4
>> ------------- log file ----------------
>> LOG: pcp_child: node id 3 is not valid
>> LOG: PCP child 32232 exits with status 256
>> LOG: fork a new PCP child pid 32299
>>
>>
>> Regards,
>>
>> --
>> Gilles Darold
>> http://dalibo.com - http://dalibo.org
>>
--
Gilles Darold
http://dalibo.com - http://dalibo.org
702a703,718
> if ( (node_id < 0) || (node_id >= pool_config->backend_desc->num_backends) )
> {
> char code[] = "NodeIdOutOfRange";
>
> pool_error("pcp_child: node id %d is not valid", node_id);
> pcp_write(frontend, "e", 1);
> wsize = htonl(sizeof(code) + sizeof(int));
> pcp_write(frontend, &wsize, sizeof(int));
> pcp_write(frontend, code, sizeof(code));
> if (pcp_flush(frontend) < 0)
> {
> pool_error("pcp_child: pcp_flush() failed. reason: %s", strerror(errno));
> exit(1);
> }
> exit(1);
> }
724a741,756
> if ( (node_id < 0) || (node_id >= pool_config->backend_desc->num_backends) )
> {
> char code[] = "NodeIdOutOfRange";
>
> pool_error("pcp_child: node id %d is not valid", node_id);
> pcp_write(frontend, "e", 1);
> wsize = htonl(sizeof(code) + sizeof(int));
> pcp_write(frontend, &wsize, sizeof(int));
> pcp_write(frontend, code, sizeof(code));
> if (pcp_flush(frontend) < 0)
> {
> pool_error("pcp_child: pcp_flush() failed. reason: %s", strerror(errno));
> exit(1);
> }
> exit(1);
> }
775a808,810
> if ( (node_id < 0) || (node_id >= pool_config->backend_desc->num_backends) )
> {
> char code[] = "NodeIdOutOfRange";
776a812,823
> pool_error("pcp_child: node id %d is not valid", node_id);
> pcp_write(frontend, "e", 1);
> wsize = htonl(sizeof(code) + sizeof(int));
> pcp_write(frontend, &wsize, sizeof(int));
> pcp_write(frontend, code, sizeof(code));
> if (pcp_flush(frontend) < 0)
> {
> pool_error("pcp_child: pcp_flush() failed. reason: %s", strerror(errno));
> exit(1);
> }
> exit(1);
> }
_______________________________________________
Pgpool-hackers mailing list
[email protected]
http://pgfoundry.org/mailman/listinfo/pgpool-hackers