> Raymond Dans wrote:
>> Martin wrote:
>>>> Subject: [sipX-dev] XX-6547 - sipXsupervisor core dump during
>>>> replication of 10,000 user database
>>>>
>>>> In tracing the issue with replication of a 10,000 user database, I've
>>>> found a couple of areas in the system that are problematic.
>>>>
>>>> A typical database replication request involves sipXconfig gathering
>>>> all of the necessary information, constructing an XML-RPC request for
>>>> that replication and sending it to the appropriate sipXsupervisor.
>>>> sipXconfig will then wait a pre-determined amount of time for a
>>>> response. Should it not receive a response in the allotted time, the
>>>> replication will be marked as failed (problem 1).
>>> Are you saying that if there is no response it is marked
>>> failed for good and no automatic retry is performed? If so,
>>> that is a problem.
>>
>> From what I've seen, sipXconfig does not do an immediate retry on a
>> failure scenario. It will eventually try when the next send profiles
>> occurs due to some change in configuration.
>>
>
> I think that current sipXconfig behavior is better than brute-force
> retrying. Making sipXconfig retry the XML-RPC call automatically in many
> cases (certainly in this one) would make things worse. The underlying
> protocol ensures basic connectivity: if there are problems, they are not
> the kind that would clear themselves in a matter of seconds.
> D.

I think that in the current UI it is very hard for the admin to find out
what to do once a replication error has occurred (also see my recent post
on the Job Status page). We need an automatic recovery process.

There are many reasons why connectivity to another host in the cluster
can be temporarily lost. If this happens during a replication event to
that host, all the admin gets is a failure status on the Job Status page.
Most admins will just clear that page so that the annoying error, which
is displayed on every page, goes away. The host in question is left with
a failed replication. It might recover sort of 'by accident' the next
time the admin makes a config change, but even that is hard to tell. The
system might still be running, but, for example, a distributed proxy
might be using the wrong dialplan.

Did I get this right?

--martin
