I'm observing the following scenario:
* mod_dialog callbacks trigger 2 or more times (nearly) simultaneously for the
same dialog
* pua_dialoginfo sends PUBLISH 1, referencing etag A
* pua_dialoginfo sends PUBLISH 2, referencing etag A
* presence_dialoginfo processes PUBLISH 1, replies with new etag B
* presence_dialoginfo processes PUBLISH 2, replies with a 412 (because etag A
no longer exists)
* pua_dialoginfo receives the 412 and re-tries it as PUBLISH 3 ("sent a PUBLISH
within a dialog that no longer exists, send again an intial PUBLISH")
* presence_dialoginfo processes PUBLISH 3, and may or may not accept it
The situation as described is not ideal since it'll fill up your logs with
errors, but isn't critical per se. Much more problematic is when there are more
than 2 PUBLISHes generated for the same dialog simultaneously, as this can
cause a (near) infinite race between the various PUBLISH requests all fighting
to update the same etag. For example, 10 PUBLISH are sent out for etag A; all
but one are rejected with a 412; then the other 9 keep on bouncing back and
forth between pua_dialoginfo and presence_dialoginfo because they do not share
the same view on the dialog's latest etag.
Even worse is when presence_dialoginfo is rejecting *all* incoming PUBLISHes
with a 412, for example because of a database/memory/replication problem or a
malformed request. A `t_reply("412", "Not today")` in the presence_dialoginfo
server, combined with a single PUBLISH from pua_dialoginfo is enough to
reproducibly brick the pua_dialoginfo server because it runs into critical
memory fragmentation levels.
I think there are multiple ways to fix or alleviate this problem.
## pua generic
* pua (publ_cback_func) should not retry 412-failed PUBLISHes indefinitely, but
e.g. at most once
* pua should not generate simultaneous PUBLISHes for the same presentity. It
should delay PUBLISH 2 until PUBLISH 1 is either (permanently) accepted or
rejected; or it should discard PUBLISH 2 immediately when it is generated.
* Perhaps make handling of 412 replies more fine-grained. Currently every 412
reply is handled like this ("sent a PUBLISH within a dialog that no longer
exists"), while that statement doesn't apply to all possible 412 replies.
## pua_dialoginfo specific
* pua_dialoginfo currently subscribes to a lot of mod_dialog callbacks. For
example subscribing to both DLGCB_CONFIRMED and DLGCB_CONFIRMED_NA will always
get you two (rapidly succeeding) PUBLISHes with exactly the same contents.
Subscribing to DLGCB_REQ_WITHIN means you'll get a new PUBLISH for every
re-INVITE such as hold/unhold or codec negotiation, which is useless in many
usecases. It would be helpful to allow configuring pua_dialoginfo with a list
of callbacks to subscribe to.
* With or without a smaller set of mod_dialog callbacks, pua_dialoginfo can
generate multiple PUBLISH requests with exactly the same contents. Since pua is
aware of the (last) state that it published for the presentity, it could
compare if the newly generated PUBLISH is any different from the last known
state, and discard it if it's not.
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/2048_______________________________________________
Kamailio (SER) - Development Mailing List
[email protected]
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev