Hello again,
To solve the problem described in my previous email, I made some changes
to the pool_connection_pool.c file. The changes are listed below.
Basically, I wrote a new function, reinitCurrentCluster, which finds a new
value for the CurrentCluster variable when an error occurs while connecting
to a node in the cluster. The function create_cp now contains a loop, from
the label tryConnect down to goto tryConnect. We exit this loop when a valid
DB node is found or when no more nodes with status TBL_INIT or TBL_USE exist.
What do you think about these changes? I have already tested them and they
seem to work fine. Is there any chance they could be included in the
Cybercluster release code?
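For discussion, the same retry idea can also be sketched without goto, as a
loop that visits each usable node at most once. This is only a minimal,
self-contained illustration, not the Cybercluster code: DemoNode,
demo_connect, demo_scan and demo_connect_any are hypothetical stand-ins
(for ClusterTbl, the connect_*_domain_socket calls, PGRscan_cluster and the
create_cp loop), and the numeric value of TBL_USE is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* TBL_INIT (1) and TBL_ERROR_NOTICE (98) are the values quoted in the
 * original report; the value used for TBL_USE is only a placeholder. */
enum { TBL_INIT = 1, TBL_USE = 2, TBL_ERROR_NOTICE = 98 };

/* Hypothetical, simplified stand-in for a ClusterTbl entry. */
typedef struct {
    const char *hostName;
    int status;
    int reachable;  /* simulates whether a connect attempt would succeed */
} DemoNode;

/* Hypothetical stand-in for the socket connect calls: fd or -1. */
static int demo_connect(const DemoNode *node)
{
    return node->reachable ? 3 : -1;
}

/* Hypothetical stand-in for PGRscan_cluster(): returns the first node
 * whose status is still TBL_INIT or TBL_USE, or NULL if none is left. */
static DemoNode *demo_scan(DemoNode *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].status == TBL_INIT || nodes[i].status == TBL_USE)
            return &nodes[i];
    }
    return NULL;
}

/* Try usable nodes until one connects. Every failed attempt marks the
 * node as TBL_ERROR_NOTICE, so the loop runs at most n times. */
static DemoNode *demo_connect_any(DemoNode *nodes, size_t n, int *fd_out)
{
    DemoNode *cur;

    assert(fd_out != NULL);
    while ((cur = demo_scan(nodes, n)) != NULL) {
        int fd = demo_connect(cur);
        if (fd >= 0) {
            cur->status = TBL_USE;
            *fd_out = fd;
            return cur;
        }
        cur->status = TBL_ERROR_NOTICE;  /* role of notice_backend_error() */
    }
    *fd_out = -1;
    return NULL;  /* no node with status TBL_INIT or TBL_USE remains */
}
```

With three nodes where only the first is unreachable, demo_connect_any marks
the first node as TBL_ERROR_NOTICE and returns the second one. When every
node fails it returns NULL instead of exiting, which leaves the decision to
the caller.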
Appendix: pool_connection_pool.c file
.......
static int reinitCurrentCluster(void);
............
static int reinitCurrentCluster(void){
    int count;
    ClusterTbl * cluster_p = NULL;
    char * func = "reinitCurrentCluster()";

    /* get the least loaded cluster server info */
    cluster_p = PGRscan_cluster();
    count = 0;
    while (cluster_p == NULL ){
        if ( count > PGLB_CONNECT_RETRY_TIME){
            /* no usable node is left: report it and terminate this process */
            show_error("%s:no cluster available",func);
            exit(1);
        }
        cluster_p = PGRscan_cluster();
        count ++;
    }
    CurrentCluster = cluster_p;
    return STATUS_OK;
}
static POOL_CONNECTION_POOL_SLOT *create_cp(POOL_CONNECTION_POOL_SLOT *cp,
                                            int secondary_backend){
    char * func = "create_cp()";
    int fd;
    char hostName[HOSTNAME_MAX_LENGTH];

tryConnect:
    if (gethostname(hostName,sizeof(hostName)) < 0){
        show_error("%s:gethostname() failed. (%s)",func,strerror(errno));
        return NULL;
    }
    if (PGRis_same_host(hostName,CurrentCluster->hostName) == 1){
#ifdef PRINT_DEBUG
        show_debug("%s:[%s] [%s] is same",func,hostName,CurrentCluster->hostName);
#endif
        fd = connect_unix_domain_socket(secondary_backend);
    }
    else{
        fd = connect_inet_domain_socket(secondary_backend);
    }
    if (fd < 0){
#ifdef PRINT_DEBUG
        show_debug("%s:[%s] right here",func,CurrentCluster->hostName);
#endif
        /* connection failed: notify the parent, pick another node and retry */
        notice_backend_error();
        //exit(1);
        reinitCurrentCluster();
        goto tryConnect;
    }
    cp->con = pool_open(fd);
    cp->closetime = 0;
    return cp;
}
.......................................
Cheers,
Lia Domide.
_____
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Lia Domide
Sent: 13 February 2008 14:29
To: [email protected]
Subject: [Pgcluster-general] Cybercluster's Load Balance-- problem
Hi everybody,
I am testing Cybercluster with multiple nodes again.
Suppose one LB has, let's say, 3 DB nodes registered (gn1, gn2 and gn3), and
consider the situation where gn1 is down but gn2 and gn3 are running OK.
When the LB service is restarted, it assigns each DB node the state
TBL_INIT (1). The first connection request will return an error to the
backend, as the LB tries to connect to gn1 (which is down). During that
attempt gn1 will be updated in the LB's memory (status = TBL_ERROR_NOTICE = 98),
and a second connection will go to gn2 (OK).
Is there any change that could be made in Cybercluster to avoid such
situations? Could the LB repeatedly try to connect to all the DBs in its list?
Thanks in advance,
Lia Domide.