I'm observing the following scenario:

* mod_dialog callbacks trigger 2 or more times (nearly) simultaneously for the 
same dialog
* pua_dialoginfo sends PUBLISH 1, referencing etag A
* pua_dialoginfo sends PUBLISH 2, referencing etag A
* presence_dialoginfo processes PUBLISH 1, replies with new etag B
* presence_dialoginfo processes PUBLISH 2, replies with a 412 (because etag A 
no longer exists)
* pua_dialoginfo receives the 412 and re-tries it as PUBLISH 3 ("sent a PUBLISH 
within a dialog that no longer exists, send again an intial PUBLISH")
* presence_dialoginfo processes PUBLISH 3, and may or may not accept it

The situation as described is not ideal since it'll fill up your logs with 
errors, but isn't critical per se. Much more problematic is when there are more 
than 2 PUBLISHes generated for the same dialog simultaneously, as this can 
cause a (near) infinite race between the various PUBLISH requests all fighting 
to update the same etag. For example, 10 PUBLISH are sent out for etag A; all 
but one are rejected with a 412; then the other 9 keep on bouncing back and 
forth between pua_dialoginfo and presence_dialoginfo because they do not share 
the same view on the dialog's latest etag.

Even worse is when presence_dialoginfo is rejecting *all* incoming PUBLISHes 
with a 412, for example because of a database/memory/replication problem or a 
malformed request. A `t_reply("412", "Not today")` in the presence_dialoginfo 
server, combined with a single PUBLISH from pua_dialoginfo is enough to 
reproducibly brick the pua_dialoginfo server because it runs into critical 
memory fragmentation levels.

I think there are multiple ways to fix or alleviate this problem.

## pua generic

* pua (publ_cback_func) should not retry 412-failed PUBLISHes indefinitely, but 
e.g. at most once
* pua should not generate simultaneous PUBLISHes for the same presentity. It 
should delay PUBLISH 2 until PUBLISH 1 is either (permanently) accepted or 
rejected; or it should discard PUBLISH 2 immediately when it is generated.
* Perhaps make handling of 412 replies more fine-grained. Currently every 412 
reply is handled like this ("sent a PUBLISH within a dialog that no longer 
exists"), while that statement doesn't apply to all possible 412 replies. 

## pua_dialoginfo specific

* pua_dialoginfo currently subscribes to a lot of mod_dialog callbacks. For 
example subscribing to both DLGCB_CONFIRMED and DLGCB_CONFIRMED_NA will always 
get you two (rapidly succeeding) PUBLISHes with exactly the same contents. 
Subscribing to DLGCB_REQ_WITHIN means you'll get a new PUBLISH for every 
re-INVITE such as hold/unhold or codec negotiation, which is useless in many 
usecases. It would be helpful to allow configuring pua_dialoginfo with a list 
of callbacks to subscribe to.
* With or without a smaller set of mod_dialog callbacks, pua_dialoginfo can 
generate multiple PUBLISH requests with exactly the same contents. Since pua is 
aware of the (last) state that it published for the presentity, it could 
compare if the newly generated PUBLISH is any different from the last known 
state, and discard it if it's not.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/2048
_______________________________________________
Kamailio (SER) - Development Mailing List
[email protected]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev

Reply via email to