Re: [Pgpool-hackers] Major bug with pcp_detach_node

Gilles Darold Fri, 07 Jan 2011 03:26:37 -0800

Hi,

I'm not sure what you mean by the pcp server program. I've moved the
patch into the pcp_do_child() function in pcp_child.c, which as far as
I can tell is where PCP commands are received. I guess libpcp goes
through this code too. I've also added the same fix for
pcp_recovery_node and pcp_attach_node, which don't handle this case
either.
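
For illustration, the condition used in all three places boils down to
the sketch below. The helper name node_id_in_range() is hypothetical
and not part of pgpool; the attached patch writes the test inline
against pool_config->backend_desc->num_backends.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a pgpool function): returns true when node_id
 * refers to one of the configured backends, i.e. 0 .. num_backends - 1. */
bool node_id_in_range(int node_id, int num_backends)
{
    assert(num_backends >= 0);   /* a negative backend count is a caller bug */
    return node_id >= 0 && node_id < num_backends;
}
```

With 3 backends this accepts node ids 0, 1 and 2 and rejects 3 and any
negative value.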


Here is the server response when the node id is out of range:

DEBUG: send: tos="R", len=46
DEBUG: recv: tos="r", len=21, data=AuthenticationOK
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="e", len=21, data=NodeIdOutOfRange
DEBUG: command failed. reason=NodeIdOutOfRange
BackendError
DEBUG: send: tos="X", len=4
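
The len=21 in the "e" reply follows from how the patch frames the
packet: 16 characters of "NodeIdOutOfRange", plus the trailing NUL,
plus the 4-byte length field itself. A minimal sketch of that framing
is below. build_error_packet() is a hypothetical name, it fills a
buffer instead of calling pcp_write(), and it uses uint32_t where the
patch uses sizeof(int), which assumes a 4-byte int.

```c
#include <arpa/inet.h>   /* htonl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a pgpool function). Lays out an error reply in
 * the same order the patch writes it:
 *   1 byte   packet type 'e'
 *   4 bytes  length in network byte order (length field + payload)
 *   N bytes  error code string, including its trailing NUL
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t build_error_packet(unsigned char *buf, size_t buflen, const char *code)
{
    assert(buf != NULL && code != NULL);

    size_t payload = strlen(code) + 1;              /* same as sizeof(code) in the patch */
    size_t total = 1 + sizeof(uint32_t) + payload;  /* type byte + length + payload */
    if (buflen < total)
        return 0;

    uint32_t wsize = htonl((uint32_t)(payload + sizeof(uint32_t)));
    buf[0] = 'e';
    memcpy(buf + 1, &wsize, sizeof(wsize));
    memcpy(buf + 1 + sizeof(wsize), code, payload);
    return total;
}
```

For "NodeIdOutOfRange" this writes 22 bytes in total, and the length
field holds 21, the value the client prints.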

I hope this is what you requested; otherwise I don't know where better
to put the check.

Regards,

 
On 07/01/2011 10:12, Tatsuo Ishii wrote:
> Gilles,
>
> Thanks for the report.  That is definitely a bug. There should be a
> node id range check somewhere.  However I don't think placing the
> check in the pcp_detach_node command is a good idea. Rather, we should
> put the check in the pcp server program. This way the node range is
> checked not only when using the pcp commands but also when using the
> pcp library.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
>> I found an annoying problem with the PCP command pcp_detach_node. I have
>> 3 computers, each running a PostgreSQL instance, in a streaming
>> replication setup. PgPool is running on the first node, which is the
>> master. The problem occurs when you give a node id outside the range
>> of real node numbers.
>>
>> As explained above, I have just 3 nodes, so node ids go from 0 to 2.
>> If I use node id 3, which doesn't exist, here are the results:
>>
>> /usr/bin/pcp_detach_node -d 10 192.168.1.11 9898 postgres postgres 3
>>
>> DEBUG: send: tos="R", len=46
>> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
>> DEBUG: send: tos="D", len=6
>> DEBUG: recv: tos="d", len=20, data=CommandComplete
>> DEBUG: send: tos="X", len=4
>> ------------- log file ----------------
>> LOG: notice_backend_error: node 0 is not valid backend.
>> LOG: starting degeneration. shutdown host 192.168.1.13(5432)
>> LOG: execute command: /home/postgres/bin/failover.sh 2 192.168.1.13
>> 192.168.1.11 /home/postgres/data/postgres.trigger
>> LOG: failover_handler: set new master node: 0
>> LOG: failover done. shutdown host 192.168.1.13(5432)
>> LOG: find_primary_node: primary node id is 0
>>  
>> [postg...@vm1 ~]$ psql -p 9999 -c "SHOW pool_nodes;"
>>  node_id |   hostname   | port | status | lb_weight | state
>> ---------+--------------+------+--------+-----------+-------
>>  0       | 192.168.1.11 | 5432 | 2      | 0.333333  | P
>>  1       | 192.168.1.12 | 5432 | 2      | 0.333333  | S
>>  2       | 192.168.1.13 | 5432 | 3      | 0.333333  | S
>> (3 rows)
>>
>> As you can see, node 2 has been detached instead of the command
>> aborting and displaying an error. I have also seen node 0 get
>> detached this way, which is worse.
>>
>> I've attached a patch that will return the following :
>>
>> /usr/bin/pcp_detach_node -d 10 192.168.1.11 9898 postgres postgres 3
>> DEBUG: send: tos="R", len=46
>> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
>> DEBUG: send: tos="D", len=6
>> EOFError
>> DEBUG: send: tos="X", len=4
>> ------------- log file ----------------
>> LOG: pcp_child: node id 3 is not valid
>> LOG: PCP child 32232 exits with status 256
>> LOG: fork a new PCP child pid 32299
>>
>>
>> Regards,
>>
>> -- 
>> Gilles Darold
>> http://dalibo.com - http://dalibo.org
>>


-- 
Gilles Darold
http://dalibo.com - http://dalibo.org

702a703,718
> 				if ( (node_id < 0) || (node_id >= pool_config->backend_desc->num_backends) )
> 				{
> 					char code[] = "NodeIdOutOfRange";
> 
> 					pool_error("pcp_child: node id %d is not valid", node_id);
> 					pcp_write(frontend, "e", 1);
> 					wsize = htonl(sizeof(code) + sizeof(int));
> 					pcp_write(frontend, &wsize, sizeof(int));
> 					pcp_write(frontend, code, sizeof(code));
> 					if (pcp_flush(frontend) < 0)
> 					{
> 						pool_error("pcp_child: pcp_flush() failed. reason: %s", strerror(errno));
> 						exit(1);
> 					}
> 					exit(1);
> 				}
724a741,756
> 				if ( (node_id < 0) || (node_id >= pool_config->backend_desc->num_backends) )
> 				{
> 					char code[] = "NodeIdOutOfRange";
> 
> 					pool_error("pcp_child: node id %d is not valid", node_id);
> 					pcp_write(frontend, "e", 1);
> 					wsize = htonl(sizeof(code) + sizeof(int));
> 					pcp_write(frontend, &wsize, sizeof(int));
> 					pcp_write(frontend, code, sizeof(code));
> 					if (pcp_flush(frontend) < 0)
> 					{
> 						pool_error("pcp_child: pcp_flush() failed. reason: %s", strerror(errno));
> 						exit(1);
> 					}
> 					exit(1);
> 				}
775a808,810
> 				if ( (node_id < 0) || (node_id >= pool_config->backend_desc->num_backends) )
> 				{
> 					char code[] = "NodeIdOutOfRange";
776a812,823
> 					pool_error("pcp_child: node id %d is not valid", node_id);
> 					pcp_write(frontend, "e", 1);
> 					wsize = htonl(sizeof(code) + sizeof(int));
> 					pcp_write(frontend, &wsize, sizeof(int));
> 					pcp_write(frontend, code, sizeof(code));
> 					if (pcp_flush(frontend) < 0)
> 					{
> 						pool_error("pcp_child: pcp_flush() failed. reason: %s", strerror(errno));
> 						exit(1);
> 					}
> 					exit(1);
> 				}

