Hi Andrew,

OK, let's try these steps:

1) right after starting opensips, do an "opensips-cli -x mi ps" and identify the PID of the FIFO process

2) wait for the blocking to happen

3) from a console with root privileges , do "gdb opensips _PID_OF_FIFO_"  and run "bt full"

Best regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS eBootcamp 2021
  https://opensips.org/training/OpenSIPS_eBootcamp_2021/

On 10/7/21 2:19 AM, Andrew Yager wrote:
Hi Bogdan-Andrei,

This does look to be frozen in this state too. I guess GDB is the next step.

Andrew


On Wed, 6 Oct 2021 at 17:12, Bogdan-Andrei Iancu <[email protected] <mailto:[email protected]>> wrote:

    Hi Andrew,

    What you can do is, after getting the FIFO blocked, to do a trap
    "opensips-cli trap" to try to see what the FIFO process is doing
    (or trying to do). If the trap does not work, let me know, I will
    give you the instructions on how to directly attache with GDB to
    the process.

    Best regards,

    Bogdan-Andrei Iancu

    OpenSIPS Founder and Developer
       https://www.opensips-solutions.com  <https://www.opensips-solutions.com>
    OpenSIPS eBootcamp 2021
       https://opensips.org/training/OpenSIPS_eBootcamp_2021/  
<https://opensips.org/training/OpenSIPS_eBootcamp_2021/>

    On 10/6/21 12:24 AM, Andrew Yager wrote:
    Also restarting opensips produces this:

    Restarting opensips (via systemctl): opensips.serviceERROR:
    communication exception for 'which' returned: cannot access fifo
    file /tmp/opensips_fifo: [Errno 32] Broken pipe!

    Andrew


    On Wed, 6 Oct 2021 at 08:23, Andrew Yager <[email protected]
    <mailto:[email protected]>> wrote:

        Hi,

        Just replicated this on the 3.2.2 nightly build. Debug logs
        are enabled, and this is all I can see on the last request
        that stalled:

        Oct  6 04:49:32 hvprxy osips[1186130]:
        DBG:mi_fifo:mi_fifo_callback: running command [{"jsonrpc":
        "2.0", "id": "5319", "method": "which", "params": []}]
        Oct  6 04:49:32 hvprxy osips[1186130]:
        DBG:mi_fifo:mi_fifo_callback: got mi response = [0x558865cc9560]
        Oct  6 04:49:32 hvprxy osips[1186130]:
        DBG:mi_fifo:mi_fifo_callback: running command [{"jsonrpc":
        "2.0", "id": "8377", "method": "get_statistics", "params":
        [["all"]]}]
        Oct  6 04:49:32 hvprxy osips[1186133]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186134]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186135]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186136]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186137]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186138]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186139]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186140]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186141]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186142]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186143]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186144]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186145]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186146]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186147]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186148]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186149]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186150]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186151]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186152]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186153]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186154]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186155]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186156]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186157]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186158]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186159]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186160]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186161]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186162]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186163]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186164]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186165]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186166]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186167]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186168]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186169]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186170]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186171]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186172]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186173]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1
        Oct  6 04:49:32 hvprxy osips[1186174]:
        DBG:core:ipc_handle_job: received job type 0[RPC] from process 1

        Andrew

        On Mon, 4 Oct 2021 at 22:37, Andrew Yager <[email protected]
        <mailto:[email protected]>> wrote:

            Just further to this, I think it's the ul_dump command
            that seems to cause the issue first.

            Andrew


            On Sat, 2 Oct 2021 at 13:05, Andrew Yager
            <[email protected] <mailto:[email protected]>> wrote:

                Hi,

                Not entirely sure where to start digging on this one.
                On 3.1.3 we've had an issue appear "suddenly" whereby
                our mi command output seems to "block" and not return
                any useful data. Restarting opensips processes
                restores comms.

                We end up with a huge number of
                opensips_fifo_reply_\* files in the /tmp directory,
                but effectively no data is ever written and
                opensips-cli freezes.

                We've restarted twice to "resolve" now, but the issue
                appears to reoccur. We're using the mi get_statistics
                command to monitor stats/health and uldump to pull
                some detail about usrloc data every 5 minutes.

                I don't think I can cause a debug level increase
                because the MI process doesn't seem to be
                communicating (at least it doesn't seem to work).

                opensips 3.1.3-1
                opensips-cli 0.1~20210707~572d2db-
                5.4.0-88-generic #99-Ubuntu SMP Thu Sep 23 17:29:00
                UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

                I can restart again, but if this is going to keep
                happening, I want to make sure I have something
                useful in the logs to be able to trace down.

                Andrew


    _______________________________________________
    Users mailing list
    [email protected]  <mailto:[email protected]>
    http://lists.opensips.org/cgi-bin/mailman/listinfo/users  
<http://lists.opensips.org/cgi-bin/mailman/listinfo/users>


_______________________________________________
Users mailing list
[email protected]
http://lists.opensips.org/cgi-bin/mailman/listinfo/users

Reply via email to