Hi,
I've gone over the script and, as far as I can see, it's working as
expected until the traffic ramps up, and then OpenSIPS crashes.
Core dumps:
http://pastebin.com/CgN0h40K
http://pastebin.com/ay5TS8zD
http://pastebin.com/PGn3AqmU
Regards,
Richard
On 06/03/2017 12:14, Richard Robson wrote:
Hi,
I've tested this on the latest 2.2.3 with the same results.
http://pastebin.com/Uixb3v8G
There were a few of these in the logs too, just before the crash:
Mar 5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]: WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079270 ms), it may overlap..
Mar 5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]: WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079360 ms), it may overlap..
Mar 5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]: WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079460 ms), it may overlap..
Mar 5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]: WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079560 ms), it may overlap..
Mar 5 22:02:27 gl-sip-03 /usr/sbin/opensips[29875]: WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079660 ms), it may overlap..
Mar 5 22:02:28 gl-sip-03 /usr/sbin/opensips[29875]: WARNING:core:utimer_ticker: utimer task <tm-utimer> already scheduled for 204079170 ms (now 204079760 ms), it may overlap..
Regards,
Richard
On 03/03/2017 13:15, Richard Robson wrote:
More cores:
http://pastebin.com/MXW2VBhi
http://pastebin.com/T7JFAP2U
http://pastebin.com/u44aaVpW
http://pastebin.com/SFKKcGxE
http://pastebin.com/dwSgMsJi
http://pastebin.com/9HdGLm96
I've put 2.2.3 on the dev box now and will try to replicate the crash on
that box, but it's difficult to replicate the traffic artificially. I'll
try to reproduce the fault on the dev box over the weekend. I can't do
it on the live gateways because it would affect customer traffic.
Regards,
Richard
On 03/03/2017 11:28, Richard Robson wrote:
I've revisited the gateway failover mechanism I developed to re-route
calls to the next gateway on 500s caused by capacity limits on the
gateways we are using.
We have three gateways from one carrier and one from another. The three
are limited to 4 CPS and will return a 503 or 500 if we breach that. The
single gateway from the other carrier has plenty of capacity and should
not be a problem, so we want to catch these responses and route to the
next gateway.
We are counting the CPS and channel limits and routing to the next
gateway if we exceed the configured limit, but there are still occasions
where a 5XX is generated, which results in a rejected call. We want to
stop these rejected calls and therefore want to implement the failover
mechanism for the 5XX responses. For six months we have been failing
over whenever we think the counts are too high on any one gateway,
without a problem. But when I implement failover on a 5XX response,
OpenSIPS starts crashing.
It's difficult to generate artificial traffic that mimics the real
traffic, but I've not had a problem with the script in testing. Last
night I rolled out the new script, but by 09:15 this morning OpenSIPS
had crashed 10 times in 5 minutes, as the traffic ramped up. I rolled
back the script; it restarted OK and has not crashed since. The failover
mechanism in the script is therefore where the crash is happening.
Core dump: http://pastebin.com/CqnESCm4
I'll add more dumps later
Regards,
Richard
This is the failure route catching the 5XX:
failure_route[dr_fo] {
    xlog("[dr] Received reply to method $rm From: $fd, $fn, $ft, $fu, $fU, $si, $sp, To: $ru");
    if (t_was_cancelled()) {
        xlog("[dr] call cancelled by internal caller");
        rtpengine_manage();
        do_accounting("db", "cdr|missed");
        exit;
    }
    if (t_check_status("[54]03")) {
        route(relay_failover);
    }
    if (t_check_status("500")) {
        route(relay_failover);
    }
    do_accounting("db", "cdr|missed");
    rtpengine_manage();
    exit;
}
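As an aside, since t_check_status() matches the reply code against a regular expression, the two 5XX checks above could probably be collapsed into one test. A sketch (untested on 2.2):

```cfg
# sketch: match 500, 503 and 403 with a single regex check
if (t_check_status("500|[54]03")) {
    route(relay_failover);
}
```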
This is the route taken on failure:
route[relay_failover] {
    if (use_next_gw()) {
        xlog("[relay_failover-route] Selected Gateway is $rd");
        $avp(trunkratelimit) = $(avp(attrs){s.select,0,:});
        $avp(trunkchannellimit) = $(avp(attrs){s.select,1,:});

        ####### check channel limit ######
        get_profile_size("outbound", "$rd", "$var(size)");
        xlog("[relay_failover-route] Selected Gateway is $rd var(size) = $var(size)");
        xlog("[relay_failover-route] Selected Gateway is $rd avp(trunkcalllimit) = $avp(trunkchannellimit)");
        xlog("[relay_failover-route] Selected Gateway is $rd result = ($var(size) > $avp(trunkchannellimit))");
        if ($(var(size){s.int}) > $(avp(trunkchannellimit){s.int})) {
            xlog("[relay_failover-route] Trunk $rd exceeded $avp(trunkchannellimit) concurrent calls $var(size)");
            route(relay_failover);
        }
    } else {
        send_reply("503", "Gateways Exhausted");
        exit;
    }

    ##### We need to check Rate Limiting #######
    # Check Rate limit; $avp needs changing
    if (!rl_check("$rd", "$(avp(trunkratelimit){s.int})", "TAILDROP")) {
        # decrement the counter since we've not "used" one
        rl_dec_count("$rd");
        xlog("[ratelimiter-route] [Max CPS: $(avp(trunkratelimit){s.int}) Current CPS: $rl_count($rd)] Call to: $rU from: $fU CPS exceeded, delaying");
        $avp(initial_time) = ($Ts * 1000) + ($Tsm / 1000);
        async(usleep("200000"), relay_failover_delay);
        xlog("Should not get here!!!! after async request");
    } else {
        xlog("[relay_outbound-route] [Max CPS: $avp(trunkratelimit) Current CPS: $rl_count($rd)] Call to: $rU from: $fU not ratelimited");
    }
    t_on_failure("dr_fo");
    do_accounting("db", "cdr|missed");
    rtpengine_manage();
    if (!t_relay()) {
        xlog("[relay-route] ERROR: Unable to relay");
        send_reply("500", "Internal Error");
        exit;
    }
}
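One thing worth noting about route[relay_failover]: it calls itself when the selected gateway is over its channel limit, and the only thing bounding that recursion is use_next_gw() eventually failing. A guard at the top of the route might be worth trying; this is only a sketch, and the $avp(fo_tries) name and the cap of 5 are made up for illustration:

```cfg
# hypothetical re-entry guard at the top of route[relay_failover];
# $avp(fo_tries) and the cap of 5 are illustrative, not from the
# original script
if ($avp(fo_tries) == NULL)
    $avp(fo_tries) = 0;
$avp(fo_tries) = $avp(fo_tries) + 1;
if ($avp(fo_tries) > 5) {
    send_reply("503", "Gateways Exhausted");
    exit;
}
```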
--
Richard Robson
Greenlight Support
01382 843843
[email protected]
_______________________________________________
Users mailing list
[email protected]
http://lists.opensips.org/cgi-bin/mailman/listinfo/users