Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2020-01-16 Thread adeykunov
Daniel,

Got the same issue on 5.3.1 with OpenSSL 1.1 on Debian 9.
After 3 working days of tests (about 30-50 WSS clients), we suddenly got
a lot of connections stuck in CLOSE_WAIT state. Kamailio called
sig_alarm_abort() when we tried to reboot.

Thanks,
Andrey

___
Kamailio (SER) - Users Mailing List
sr-users@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users


Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-10-03 Thread Daniel-Constantin Mierla
Hello,

for the deadlock issue with libssl 1.1, a workaround using a preloaded
library was made available quite some time ago:

https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/openssl_mutex_shared

Recently that code was integrated into the core, so the preloaded library
is not needed if you run 5.1.9 or the latest branch 5.2 (to be released as
5.2.5, probably soon), as well as branch 5.3 or master.

However, a few days ago a crash was reported inside the pseudo-random
number generator (PRNG) of libssl 1.1, which seems to be caused by the
changes in libssl 1.1 towards a thread-safety-only approach. A patch was
pushed two days ago, which seemed to fix it, see:
 
https://github.com/kamailio/kamailio/issues/2077

More work is expected there in the next few days to play with variants
of prng.

Cheers,
Daniel

-- 
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio Advanced Training, Oct 21-23, 2019, Berlin, Germany -- 
https://asipto.com/u/kat



Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-10-03 Thread Jurijs Ivolga
Hi Daniel,

I hope you are well. Do you have any updates on this issue? Did you get any
response on openssl mailing list? Thank you!

With kind regards,

Jurijs


On Mon, Apr 1, 2019 at 11:55 AM Daniel-Constantin Mierla 
wrote:

> Hello,
>
> an update on this issue -- I spent a bit of time looking at the
> libssl/libcrypto library, and the problem may be the type of mutexes it
> uses internally starting with v1.1, namely pthread mutexes.
> They are not process-shared, and Kamailio is a multi-process application
> that works with the same TLS connection from multiple processes.
>
> Today I wrote to the openssl mailing list and am now waiting to see if I
> get any hints from there.
>
> Cheers,
> Daniel
>
> On 01.04.19 10:33, Kristijan Vrban wrote:
> > Hi Andrew,
> >
> > yes, with openssl 1.0.2 Kamailio has now been up and running for five
> > days. Looks good so far.
> >
> > Kristijan
> >
> > On Thu, 28 March 2019 at 11:09, Andrew Pogrebennyk wrote:
> >> On 3/26/19 3:52 PM, Kristijan Vrban wrote:
> >>>> Just curious, did you get to compile with OpenSSL 1.0 and test?
> >>> Just compiled with OpenSSL 1.0. Going to test now.
> >> Kristijan,
> >> any new occurrences since you recompiled kamailio with openssl 1.0?
> >>
> >> Regards,
> >> Andrew


Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-05-20 Thread Aymeric Moizard
Tks Daniel,

I have installed the workaround.

lsof seems to indicate that I have installed and
pre-loaded openssl_mutex_shared.so correctly.

I will let you know if I see the issue again.

Tks!
Aymeric

-- 
Antisip - http://www.antisip.com


Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-05-20 Thread Daniel-Constantin Mierla
Hello,

this kind of behaviour, blocking for a long time and then moving on, is a
symptom of the same issue. One of the observed behaviours was that
attaching with gdb and then detaching makes the code continue running, and
that is what kamctl trap does. I haven't looked deeper, but my guess is that
some signals are sent during the gdb operations.

It would be good if you could test with the workaround and see the
results. There was already a report that the issue was not seen after a
rather long running time.

Cheers,
Daniel


Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-05-18 Thread Aymeric Moizard
Hi!

I haven't used the workaround yet: I'm focusing on making sure I
have the same issue, or on figuring out how to force it to happen.

I started checking the server again today, beginning with this
command:
 $> sudo kamcmd tls.list

In my previous description, the above ended in a deadlock. Today, it finally
completed, but only after 5 minutes. (I suspect 5 minutes is abnormal.)

While the command was running:
-> UDP was working
-> TCP was not:
-> The TCP connection was ESTABLISHED, but the SIP message got no
reply.
(this was the behavior I had before)

At the same time, I took a trap with "sudo kamctl trap" (during the deadlock):
-> one thread is on "tls_list" (tls_rpc.c:154)
-> one thread is on tcpconn_get (core/tcp_main.c:1449) called from tcp_send
(core/tcp_main.c:1716)
and seems to be sending a 484 Address Incomplete on a TLS connection
-> 2 threads are on CRYPTO_THREAD_write_lock, with a backtrace showing
"SSL_do_handshake/tls_accept"

Suddenly, "sudo kamcmd tls.list" completed, and then my TCP agent received
4 answers from kamailio for the last 4 REGISTERs sent.

I have a network capture for my TCP agent.
I have a trap showing 2 threads waiting on "CRYPTO_THREAD_write_lock".

Conclusion:
The use case showed that the lock was held for a VERY long time.
The use case showed that the lock was TEMPORARY...

Side note: from my understanding of the multi-fork/openssl issue, I would
expect the deadlock to happen very soon after a kamailio restart?

Do you expect the preload workaround to help with this behavior?
Or do you consider that my issue is different?

Because there is no "real" deadlock, I don't understand why "my" issue
would be related to libssl 1.1...

My gdb trap and network capture are available by private exchange if you
need them! (please ask me by direct email)

Tks
Aymeric


Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-05-13 Thread Daniel-Constantin Mierla
Hello,

thanks for the feedback! It is good to know that it has worked well so far
for you. I see no reason not to make the preload library part of the
next release.

Just to let everyone know, for now, the built packages are pinned to
link against libssl 1.0.x.

Soon, I will approach the openssl project in order to find a proper
long-term solution.

Cheers,
Daniel


Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-05-13 Thread Floimair Florian
Hi all!

We have used the workaround with the pre-loaded library and so far this seems
to have fixed our problem (the one my colleague Kristijan Vrban reported).
At least we did not have a single failure within the last week, whereas before
the issue happened about once every 2 days.
It would be nice if this were part of the next Kamailio version.



With best regards

Florian Floimair
Innovation - Software-Development

COMMEND INTERNATIONAL GMBH
A-5020 Salzburg, Saalachstraße 51
http://www.commend.com

Security and Communication by Commend

FN 178618z | LG Salzburg


Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-05-02 Thread Daniel-Constantin Mierla
Hello,

I think one possibility to reproduce the issue would be to create a
scenario where the same connection is wanted by several processes at the
same time, and the first process that gets the lock on it needs a bit more
time to execute. Not sure how to create the case, maybe something like:

  - enable onsend_route for replies and, when getting a 200 OK reply to
an INVITE there, do a sleep()
  - because the ACK is not coming fast enough, the 200 OK should be
retransmitted by the callee; another Kamailio process will get it and will
try to send it over the same connection
  - run with not many tcp workers, maybe like tcp_children=4
  - do several calls and see how it goes

Again, not sure it covers the case properly, but it is something to
test, because the backtraces I got showed attempts to use the same connection.

Otherwise, just run it with traffic for a long time, possibly with
two kamailio instances connected via tls, so that a single connection is
used for all traffic between them, which makes it likely that many processes
try to use it.

Cheers,
Daniel

On 01.05.19 22:26, Aymeric Moizard wrote:
> HI Daniel,
>
> I have received your request and have added it to my TODO list...
>
> Unfortunately, I don't have much time currently. I will certainly do it
> later, but cannot give a time frame for it.
>
> Also, I would really like to understand how to "generate" the issue.
> (I think I had the issue only once or twice this year...)
>
> Otherwise, I will have no way to make sure the workaround
> works...
>
> Tks
> Aymeric
>
>
>
> On Mon, 15 Apr 2019 at 09:06, Daniel-Constantin Mierla wrote:
>
> Hello Aymeric,
>
> would you be able to test with tls module compiled against libssl
> 1.1 and using the pre-loaded shared object workaround?
>
>   * https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/openssl_mutex_shared
>
> You should be able to use it with any version, no need to test
> with kamailio master branch.
>
> Just clone the master branch, then:
>
> cd src/modules/tls/utils/openssl_mutex_shared
>
> make
>
> Either use it from there or copy openssl_mutex_shared.so to a location
> of your choice, then pre-load it before starting your version of Kamailio.
>
> The README.md in the folder has some more details.
>
> I would like to have some validation that it works fine before
> approaching the libssl project about allowing the locks to be
> initialized with the process-shared option.
>
> Thanks,
> Daniel
>
> On 26.03.19 16:18, Daniel-Constantin Mierla wrote:
>>
>> Hello,
>>
>> yep, locking there is expected, as listing the tls connections
>> waits until no other process is changing the content of the internal
>> tls connection structures. So it is a side effect of libssl/libcrypto
>> getting stuck and the other processes waiting for it to move
>> on. I have the Kamailio training in the USA these days, so the trip
>> and schedule of the day didn't allow me to look more at the
>> libssl/libcrypto code in order to find a solution here. It is a
>> high priority on my list, as soon as I get time during the next days.
>>
>> Cheers,
>> Daniel
>>
>> On 26.03.19 15:55, Aymeric Moizard wrote:
>>> Hi All,
>>>
>>> I was debugging a TCP issue (most probably, I may start a thread
>>> for this question).
>>>
>>> I was trying to get some info for TCP and TLS.
>>>
>>> I typed:
>>> $> sudo kamctl rpc tls.list
>>>
>>> And waited for a while until... I realized that my
>>> User-Agent, connected over TCP, was not able to register any
>>> more. I think the rpc command triggered something wrong.
>>>
>>> The device can successfully "connect" and send the REGISTER over
>>> the established TCP connection. The REGISTER does not appear in
>>> the logs any more, and I don't see any TCP traffic any more. So
>>> the behavior is the same as I had before: TCP and TLS are both
>>> not working and UDP is still working fine.
>>>
>>> kamctl does not work any more... so kamctl trap does not work...
>>>
>>> I have been able to type... manually... for (all?) kamailio threads:
>>>
>>> gdb /usr/sbin/kamailio 16500 -batch --eval-command="bt full" >>
>>> kamailio-trap-tcp-down.txt
>>>
>>> I'm temporarily putting the backtrace I have here:
>>> https://sip.antisip.com/kamailio-trap-tcp-down.txt
>>>
>>> You can see a thread stuck on the json command line: "tls_list"
>>> And many others waiting on CRYPTO_THREAD_write_lock
>>> ? might be related to:
>>> https://github.com/openssl/openssl/issues/5376
>>> SIDE NOTE:
>>> Right before I typed the last gdb command for the last thread,
>>> kamailio crashed. This was around 5 minutes after the deadlock started.
>>>
>>> Mar 26 14:47:11 sip kamailio[16493]: ERROR: 
>>> [core/tcp_main.c:2561]: tcpconn_do_send(): failed to send on
>>> 0x7ff8dfc2fdc8 (91.121.30.149:5061->62.210.97.21:49351
>>> 

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-05-01 Thread Aymeric Moizard
HI Daniel,

I have received your request and have added it to my TODO list...

Unfortunately, I don't have much time currently. I will certainly do it
later, but cannot give a time frame for it.

Also, I would really like to understand how to "generate" the issue.
(I think I had the issue only once or twice this year...)

Otherwise, I will have no way to make sure the workaround
works...

Tks
Aymeric



On Mon, 15 Apr 2019 at 09:06, Daniel-Constantin Mierla wrote:

> Hello Aymeric,
>
> would you be able to test with tls module compiled against libssl 1.1 and
> using the pre-loaded shared object workaround?
>
>   *
> https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/openssl_mutex_shared
>
> You should be able to use it with any version, no need to test with
> kamailio master branch.
>
> Just clone the master branch, then:
>
> cd src/modules/tls/utils/openssl_mutex_shared
>
> make
>
> Either from there or copy openssl_mutex_shared.so to a location you want,
> then pre-load it before starting your version of Kamailio.
>
> The README.md in the folder has some more details.
>
> I would like to have some validation that it works fine before approaching
> this topic with libssl project to allow to init the locks with shared
> process option.
>
> Thanks,
> Daniel
> On 26.03.19 16:18, Daniel-Constantin Mierla wrote:
>
> Hello,
>
> yep, locking there is expected, as listing the tls connections wait for no
> other processes to change the content of internal tls connection
> structures. So it is a side effect of libssl/libcrypto getting stuck and
> the other processing waiting for it to move one. I have the Kamailio
> training in USA these days, so the trip and schedule of the day didn't
> allow me to look more at the libsll/libcrypto code in order to find a
> solution here. It is a high priority in my list, as I get time during the
> next days.
>
> Cheers,
> Daniel
> On 26.03.19 15:55, Aymeric Moizard wrote:
>
> Hi All,
>
> I was debugging a TCP issue (most probably, I may start a thread for this
> question).
>
> I was trying to get some info for TCP and TLS.
>
> I typed:
> $> sudo kamctl rpc tls.list
>
> And waited for a while until... I realized that my User-Agent,
> connected with TCP was not able to register any more. I think the rpc
> command has introduced something wrong.
>
> The device can successfully "connect", send the REGISTER over the
> established TCP connection. The REGISTER do not appear in the logs any
> more, I don't see any traffic for TCP any more. So the behavior is the same
> as I had before: TCP and TLS are both not working and UDP is still working
> fine.
>
> kamctl do not work any more... so kamctl trap do not work...
>
> I have been able to type.. manually... for (all?) kamailio threads:
>
> gdb /usr/sbin/kamailio 16500 -batch --eval-command="bt full" >>
> kamailio-trap-tcp-down.txt
>
> I'm temporarly puting the backtrace I have here:
> https://sip.antisip.com/kamailio-trap-tcp-down.txt
>
> You can see a thread stuck on the json command line: "tls_list"
> And many other waiting on CRYPTO_THREAD_write_lock
> ? might be related to: https://github.com/openssl/openssl/issues/5376
> SIDE NOTE:
> Right before I was typing the last gdb command for the last thread,
> kamailio
> has crashed: This was around 5 minutes after the dead lock started.
>
> Mar 26 14:47:11 sip kamailio[16493]: ERROR:  [core/tcp_main.c:2561]:
> tcpconn_do_send(): failed to send on 0x7ff8dfc2fdc8 (91.121.30.149:5061->
> 62.210.97.21:49351): Broken pipe (32)
> Mar 26 14:47:11 sip kamailio[16493]: ERROR:  [core/tcp_read.c:1505]:
> tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0x7ff8dfc2fdc8 r:
> 0x7ff8dfc2fe48 (-1)
> Mar 26 14:47:11 sip kamailio[16493]: WARNING: 
> [core/tcp_read.c:1848]: handle_io(): F_TCPCONN connection marked as bad:
> 0x7ff8dfa6a408 id 846 refcnt 3
> Mar 26 14:47:11 sip kamailio[16371]: ALERT:  [main.c:755]:
> handle_sigs(): child process 16374 exited by a signal 11
> Mar 26 14:47:11 sip kamailio[16371]: ALERT:  [main.c:758]:
> handle_sigs(): core was not generated
> Mar 26 14:47:11 sip kamailio[16371]: INFO:  [main.c:781]:
> handle_sigs(): terminating due to SIGCHLD
> Mar 26 14:47:11 sip kamailio[16493]: INFO:  [main.c:836]: sig_usr():
> signal 15 received
> Mar 26 14:47:11 sip kamailio[16500]: INFO:  [main.c:836]: sig_usr():
> signal 15 received
> Mar 26 14:47:11 sip kamailio[16479]: INFO:  [main.c:836]: sig_usr():
> signal 15 received
>
>
> Unfortunalty, even if I did my best to setup my service to generate a core
> on crash, I still have "core was not generated" (debian stretch)
>
> Tks for reading!
> Regards
> Aymeric
>
>
>
> Le mar. 26 mars 2019 à 14:11, Kristijan Vrban  a
> écrit :
>
>> And again one more kamctl trap file where
>>
>> set_reply_no_connect was set.
>>
>> Am Di., 26. März 2019 um 08:53 Uhr schrieb Kristijan Vrban
>> :
>> >
>> > Attached also the output of kamctl trap
>> >
>> > Am Di., 26. März 2019 um 08:42 Uhr schrieb Kristijan Vrban
>> > :
>> > >
>> > 

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-04-15 Thread Daniel-Constantin Mierla
Hello Aymeric,

would you be able to test with the tls module compiled against libssl 1.1
and using the pre-loaded shared object workaround?

  *
https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/openssl_mutex_shared

You should be able to use it with any version, no need to test with
kamailio master branch.

Just clone the master branch, then:

cd src/modules/tls/utils/openssl_mutex_shared

make

Use openssl_mutex_shared.so either from there or from a location you copy
it to, and pre-load it when starting your version of Kamailio.

The README.md in the folder has some more details.
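
As a rough sketch of the pre-load step, assuming a Linux system with a
POSIX shell: the helper names run_preloaded and is_preloaded are made up
for illustration, and the paths in the example comment are assumptions,
not values taken from the README.

```shell
#!/bin/sh
# Sketch only: start a command with the workaround library pre-loaded.

# run_preloaded LIB CMD [ARGS...]
# Refuses to start if LIB is missing, so a typo in the path does not
# silently start the command without the workaround.
run_preloaded() {
    lib=$1
    shift
    if [ ! -f "$lib" ]; then
        echo "preload library not found: $lib" >&2
        return 1
    fi
    LD_PRELOAD="$lib" "$@"
}

# is_preloaded PID LIBNAME
# Succeeds if the running process has LIBNAME mapped (Linux /proc only).
is_preloaded() {
    grep -q -- "$2" "/proc/$1/maps"
}

# Example (assumed paths):
#   run_preloaded /usr/local/lib/openssl_mutex_shared.so \
#       /usr/sbin/kamailio -f /etc/kamailio/kamailio.cfg
#   is_preloaded "$(pgrep -o kamailio)" openssl_mutex_shared
```

Checking /proc/PID/maps afterwards is one way to confirm that the library
was really picked up by the running process.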

I would like to have some validation that it works fine before raising
this topic with the libssl project, to ask them to allow initializing the
locks with the process-shared option.

Thanks,
Daniel

On 26.03.19 16:18, Daniel-Constantin Mierla wrote:
>
> Hello,
>
> yep, locking there is expected, as listing the tls connections wait
> for no other processes to change the content of internal tls
> connection structures. So it is a side effect of libssl/libcrypto
> getting stuck and the other processing waiting for it to move one. I
> have the Kamailio training in USA these days, so the trip and schedule
> of the day didn't allow me to look more at the libsll/libcrypto code
> in order to find a solution here. It is a high priority in my list, as
> I get time during the next days.
>
> Cheers,
> Daniel
>
> On 26.03.19 15:55, Aymeric Moizard wrote:
>> Hi All,
>>
>> I was debugging a TCP issue (most probably, I may start a thread for
>> this question).
>>
>> I was trying to get some info for TCP and TLS.
>>
>> I typed:
>> $> sudo kamctl rpc tls.list
>>
>> And waited for a while until... I realized that my User-Agent,
>> connected with TCP was not able to register any more. I think the rpc
>> command has introduced something wrong.
>>
>> The device can successfully "connect", send the REGISTER over the
>> established TCP connection. The REGISTER do not appear in the logs
>> any more, I don't see any traffic for TCP any more. So the behavior
>> is the same as I had before: TCP and TLS are both not working and UDP
>> is still working fine.
>>
>> kamctl do not work any more... so kamctl trap do not work...
>>
>> I have been able to type.. manually... for (all?) kamailio threads:
>>
>> gdb /usr/sbin/kamailio 16500 -batch --eval-command="bt full" >>
>> kamailio-trap-tcp-down.txt
>>
>> I'm temporarly puting the backtrace I have here:
>> https://sip.antisip.com/kamailio-trap-tcp-down.txt
>>
>> You can see a thread stuck on the json command line: "tls_list"
>> And many other waiting on CRYPTO_THREAD_write_lock
>> ? might be related to: https://github.com/openssl/openssl/issues/5376
>> SIDE NOTE:
>> Right before I was typing the last gdb command for the last thread,
>> kamailio
>> has crashed: This was around 5 minutes after the dead lock started.
>>
>> Mar 26 14:47:11 sip kamailio[16493]: ERROR: 
>> [core/tcp_main.c:2561]: tcpconn_do_send(): failed to send on
>> 0x7ff8dfc2fdc8 (91.121.30.149:5061->62.210.97.21:49351
>> ): Broken pipe (32)
>> Mar 26 14:47:11 sip kamailio[16493]: ERROR: 
>> [core/tcp_read.c:1505]: tcp_read_req(): ERROR: tcp_read_req: error
>> reading - c: 0x7ff8dfc2fdc8 r: 0x7ff8dfc2fe48 (-1)
>> Mar 26 14:47:11 sip kamailio[16493]: WARNING: 
>> [core/tcp_read.c:1848]: handle_io(): F_TCPCONN connection marked as
>> bad: 0x7ff8dfa6a408 id 846 refcnt 3
>> Mar 26 14:47:11 sip kamailio[16371]: ALERT:  [main.c:755]:
>> handle_sigs(): child process 16374 exited by a signal 11
>> Mar 26 14:47:11 sip kamailio[16371]: ALERT:  [main.c:758]:
>> handle_sigs(): core was not generated
>> Mar 26 14:47:11 sip kamailio[16371]: INFO:  [main.c:781]:
>> handle_sigs(): terminating due to SIGCHLD
>> Mar 26 14:47:11 sip kamailio[16493]: INFO:  [main.c:836]:
>> sig_usr(): signal 15 received
>> Mar 26 14:47:11 sip kamailio[16500]: INFO:  [main.c:836]:
>> sig_usr(): signal 15 received
>> Mar 26 14:47:11 sip kamailio[16479]: INFO:  [main.c:836]:
>> sig_usr(): signal 15 received
>>
>>
>> Unfortunalty, even if I did my best to setup my service to generate a
>> core on crash, I still have "core was not generated" (debian stretch)
>>
>> Tks for reading!
>> Regards
>> Aymeric
>>
>>
>>
>> Le mar. 26 mars 2019 à 14:11, Kristijan Vrban > > a écrit :
>>
>> And again one more kamctl trap file where
>>
>> set_reply_no_connect was set.
>>
>> Am Di., 26. März 2019 um 08:53 Uhr schrieb Kristijan Vrban
>> mailto:vrban.l...@gmail.com>>:
>> >
>> > Attached also the output of kamctl trap
>> >
>> > Am Di., 26. März 2019 um 08:42 Uhr schrieb Kristijan Vrban
>> > mailto:vrban.l...@gmail.com>>:
>> > >
>> > > > Have you done a test with tools such as sipp, or was this
>> happening
>> > > > after a while, with usual phones registering?
>> > >
>> > > Usual variety of devices registering via TLS. But i can not
>> exclude
>> > > that some devices displaying 

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-04-11 Thread Richard Fuchs

Hi,

(X-posted to sr-dev as this is getting into the nitty gritty)

As a short-term workaround for this, I've been playing with the 
preloaded library approach to hijack the pthread mutex calls and force 
them to provide process-shared mutexes. AFAICT this seems to be working 
and only has the minuscule performance impact of using slower 
process-shared mutexes in all instances, even when they aren't required.


The code for the preloaded library itself is very short and simple: 
https://gist.github.com/rfuchs/1bb7348b6acbe37e557d94c2f69a1498


As a more complete patch that integrates it into the build system 
(probably badly): 
https://gist.github.com/rfuchs/b240ffe87938a45e6f2a4cf53fe29f17


Finally it requires adding it to the startup script, for example in a 
systemd service file as:


Environment='LD_PRELOAD=/usr/lib/x86_64-linux-gnu/kamailio/openssl_mutex_shared/openssl_mutex_shared.so'

(that's with a hard coded path which isn't optimal of course).
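
One way to avoid editing the packaged unit file is a systemd drop-in. The
following is a minimal sketch under assumptions: the unit is assumed to be
named kamailio.service, and the function name write_preload_dropin and the
drop-in file name are chosen here for illustration only.

```shell
#!/bin/sh
# write_preload_dropin DROPIN_DIR LIB_PATH
# Writes a systemd drop-in that sets LD_PRELOAD for the service.
write_preload_dropin() {
    dir=$1
    lib=$2
    mkdir -p "$dir" || return 1
    cat > "$dir/openssl-mutex-shared.conf" <<EOF
[Service]
Environment="LD_PRELOAD=$lib"
EOF
}

# Example (run as root; unit name is an assumption):
#   write_preload_dropin /etc/systemd/system/kamailio.service.d \
#       /usr/lib/x86_64-linux-gnu/kamailio/openssl_mutex_shared/openssl_mutex_shared.so
#   systemctl daemon-reload && systemctl restart kamailio
```

The library path is still hard coded, but it lives in one small file that
survives package upgrades of the unit itself.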

I don't consider this a proper fix, only a hacky workaround, but it
might be a solution for the very near future. Throwing it out there in
case other people have been working on similar approaches and/or
have some comments about this.


Cheers


On 01/04/2019 04.52, Daniel-Constantin Mierla wrote:
> Hello,
>
> an update on this issue -- I spent a bit of time looking at
> libssl/libcrypto library and the problem can be the type of mutexes they
> use now internally starting with v1.1, respectively the pthread mutex.
> They are not process shared and kamailio is a multi-process application,
> working with the same tls connection from multiple processes.
>
> Today I wrote to openssl mailing list, waiting now to see if I get any
> hints from there.
>
> Cheers,
> Daniel
>
> On 01.04.19 10:33, Kristijan Vrban wrote:
>> Hi Andrew,
>>
>> yes, with openssl 1.0.2 Kamailio is now up and running since five
>> days. Looks good so far.
>>
>> Kristijan
>>
>> On Thu, 28 Mar 2019 at 11:09, Andrew Pogrebennyk wrote:
>>> On 3/26/19 3:52 PM, Kristijan Vrban wrote:
>>>>> Just curious, did you get to compile with OpenSSL 1.0 and test?
>>>> Just compiled with OpenSSL 1.0 . Gone test now.
>>> Kristijan,
>>> any new occurrences since you have recompiled kamailio with openssl 1.0?
>>>
>>> Regards,
>>> Andrew
>> ___
>> Kamailio (SER) - Users Mailing List
>> sr-users@lists.kamailio.org
>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users


Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-04-01 Thread Daniel-Constantin Mierla
Hello,

an update on this issue: I spent a bit of time looking at the
libssl/libcrypto library, and the problem may be the type of mutexes it
uses internally starting with v1.1, namely pthread mutexes.
They are not process-shared, and Kamailio is a multi-process application
that works with the same TLS connection from multiple processes.

Today I wrote to the openssl mailing list and am now waiting to see if I
get any hints from there.

Cheers,
Daniel

On 01.04.19 10:33, Kristijan Vrban wrote:
> Hi Andrew,
>
> yes, with openssl 1.0.2 Kamailio is now up and running since five
> days. Looks good so far.
>
> Kristijan
>
> Am Do., 28. März 2019 um 11:09 Uhr schrieb Andrew Pogrebennyk
> :
>> On 3/26/19 3:52 PM, Kristijan Vrban wrote:
>>>> Just curious, did you get to compile with OpenSSL 1.0 and test?
>>> Just compiled with OpenSSL 1.0 . Gone test now.
>> Kristijan,
>> any new occurrences since you have recompiled kamailio with openssl 1.0?
>>
>> Regards,
>> Andrew
> ___
> Kamailio (SER) - Users Mailing List
> sr-users@lists.kamailio.org
> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users

-- 
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com


Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-04-01 Thread Kristijan Vrban
Hi Andrew,

yes, with openssl 1.0.2 Kamailio has now been up and running for five
days. Looks good so far.

Kristijan

On Thu, 28 Mar 2019 at 11:09, Andrew Pogrebennyk wrote:
>
> On 3/26/19 3:52 PM, Kristijan Vrban wrote:
> >> Just curious, did you get to compile with OpenSSL 1.0 and test?
> > Just compiled with OpenSSL 1.0 . Gone test now.
>
> Kristijan,
> any new occurrences since you have recompiled kamailio with openssl 1.0?
>
> Regards,
> Andrew


Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-28 Thread Andrew Pogrebennyk
On 3/26/19 3:52 PM, Kristijan Vrban wrote:
>> Just curious, did you get to compile with OpenSSL 1.0 and test?
> Just compiled with OpenSSL 1.0 . Gone test now.

Kristijan,
any new occurrences since you have recompiled kamailio with openssl 1.0?

Regards,
Andrew


Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-26 Thread Daniel-Constantin Mierla
Hello,

yep, locking there is expected, as listing the tls connections waits until
no other process is changing the content of the internal tls connection
structures. So it is a side effect of libssl/libcrypto getting stuck and
the other processes waiting for it to move on. I am giving the Kamailio
training in the USA these days, so the trip and the day's schedule didn't
allow me to look further at the libssl/libcrypto code in order to find a
solution here. It is a high priority on my list, and I will get to it as
I find time during the next days.

Cheers,
Daniel

On 26.03.19 15:55, Aymeric Moizard wrote:
> Hi All,
>
> I was debugging a TCP issue (most probably, I may start a thread for
> this question).
>
> I was trying to get some info for TCP and TLS.
>
> I typed:
> $> sudo kamctl rpc tls.list
>
> And waited for a while until... I realized that my User-Agent,
> connected with TCP was not able to register any more. I think the rpc
> command has introduced something wrong.
>
> The device can successfully "connect", send the REGISTER over the
> established TCP connection. The REGISTER do not appear in the logs any
> more, I don't see any traffic for TCP any more. So the behavior is the
> same as I had before: TCP and TLS are both not working and UDP is
> still working fine.
>
> kamctl do not work any more... so kamctl trap do not work...
>
> I have been able to type.. manually... for (all?) kamailio threads:
>
> gdb /usr/sbin/kamailio 16500 -batch --eval-command="bt full" >>
> kamailio-trap-tcp-down.txt
>
> I'm temporarly puting the backtrace I have here:
> https://sip.antisip.com/kamailio-trap-tcp-down.txt
>
> You can see a thread stuck on the json command line: "tls_list"
> And many other waiting on CRYPTO_THREAD_write_lock
> ? might be related to: https://github.com/openssl/openssl/issues/5376
> SIDE NOTE:
> Right before I was typing the last gdb command for the last thread,
> kamailio
> has crashed: This was around 5 minutes after the dead lock started.
>
> Mar 26 14:47:11 sip kamailio[16493]: ERROR: 
> [core/tcp_main.c:2561]: tcpconn_do_send(): failed to send on
> 0x7ff8dfc2fdc8 (91.121.30.149:5061->62.210.97.21:49351
> ): Broken pipe (32)
> Mar 26 14:47:11 sip kamailio[16493]: ERROR: 
> [core/tcp_read.c:1505]: tcp_read_req(): ERROR: tcp_read_req: error
> reading - c: 0x7ff8dfc2fdc8 r: 0x7ff8dfc2fe48 (-1)
> Mar 26 14:47:11 sip kamailio[16493]: WARNING: 
> [core/tcp_read.c:1848]: handle_io(): F_TCPCONN connection marked as
> bad: 0x7ff8dfa6a408 id 846 refcnt 3
> Mar 26 14:47:11 sip kamailio[16371]: ALERT:  [main.c:755]:
> handle_sigs(): child process 16374 exited by a signal 11
> Mar 26 14:47:11 sip kamailio[16371]: ALERT:  [main.c:758]:
> handle_sigs(): core was not generated
> Mar 26 14:47:11 sip kamailio[16371]: INFO:  [main.c:781]:
> handle_sigs(): terminating due to SIGCHLD
> Mar 26 14:47:11 sip kamailio[16493]: INFO:  [main.c:836]:
> sig_usr(): signal 15 received
> Mar 26 14:47:11 sip kamailio[16500]: INFO:  [main.c:836]:
> sig_usr(): signal 15 received
> Mar 26 14:47:11 sip kamailio[16479]: INFO:  [main.c:836]:
> sig_usr(): signal 15 received
>
>
> Unfortunalty, even if I did my best to setup my service to generate a
> core on crash, I still have "core was not generated" (debian stretch)
>
> Tks for reading!
> Regards
> Aymeric
>
>
>
> Le mar. 26 mars 2019 à 14:11, Kristijan Vrban  > a écrit :
>
> And again one more kamctl trap file where
>
> set_reply_no_connect was set.
>
> Am Di., 26. März 2019 um 08:53 Uhr schrieb Kristijan Vrban
> mailto:vrban.l...@gmail.com>>:
> >
> > Attached also the output of kamctl trap
> >
> > Am Di., 26. März 2019 um 08:42 Uhr schrieb Kristijan Vrban
> > mailto:vrban.l...@gmail.com>>:
> > >
> > > > Have you done a test with tools such as sipp, or was this
> happening
> > > > after a while, with usual phones registering?
> > >
> > > Usual variety of devices registering via TLS. But i can not
> exclude
> > > that some devices displaying behavioural problems.
> > >
> > > > Can you list the tcp connections and see if they are listed?
> > > > kamctl tcp core.tcp_list
> > >
> > > Need Kex module for that? So i can deliver next time. But when
> i do
> > > "lsof -u kamailio |grep TCP"
> > > i get a long list of more then 2000 lines with:
> > >
> > > ...
> > > kamailio 37561 kamailio 2105u     sock                0,9      0t0
> > > 27856287 protocol: TCP
> > > kamailio 37561 kamailio 2106u     sock                0,9      0t0
> > > 27856305 protocol: TCP
> > > kamailio 37561 kamailio 2107u     sock                0,9      0t0
> > > 27856306 protocol: TCP
> > > kamailio 37561 kamailio 2108u     sock                0,9      0t0
> > > 27856914 protocol: TCP
> > > ...
> > >
> > > So about the time Kamailio created a lot of socket in the TCP
> domain,
> > > but which are not bound to any port (eg 

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-26 Thread Aymeric Moizard
Hi All,

I was debugging a TCP issue (most probably, I may start a thread for this
question).

I was trying to get some info for TCP and TLS.

I typed:
$> sudo kamctl rpc tls.list

And waited for a while until... I realized that my User-Agent,
connected over TCP, was not able to register any more. I think the rpc
command has broken something.

The device can successfully "connect" and send the REGISTER over the
established TCP connection. The REGISTER does not appear in the logs any
more, and I don't see any TCP traffic any more. So the behavior is the same
as I had before: TCP and TLS are both not working and UDP is still working
fine.

kamctl does not work any more... so kamctl trap does not work either...

I was able to run the following manually for (all?) kamailio processes:

gdb /usr/sbin/kamailio 16500 -batch --eval-command="bt full" >>
kamailio-trap-tcp-down.txt
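
When kamctl trap is unavailable, the same gdb call can be looped over every
process. This is only a sketch: the function name collect_backtraces and
the GDB and KAMAILIO_BIN override variables are assumptions added here for
illustration; the gdb arguments are the ones from the command above.

```shell
#!/bin/sh
# collect_backtraces OUTFILE PID...
# Appends a full backtrace of each PID to OUTFILE, one process at a time.
# GDB and KAMAILIO_BIN may be overridden; the defaults match the command
# shown above.
collect_backtraces() {
    out=$1
    shift
    for pid in "$@"; do
        echo "---- pid $pid ----" >> "$out"
        "${GDB:-gdb}" "${KAMAILIO_BIN:-/usr/sbin/kamailio}" "$pid" \
            -batch --eval-command="bt full" >> "$out" 2>&1
    done
}

# Example:
#   collect_backtraces kamailio-trap-tcp-down.txt $(pgrep kamailio)
```

The per-process separator line makes it easier to tell afterwards which
backtrace belongs to which PID.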

I'm temporarily putting the backtrace I have here:
https://sip.antisip.com/kamailio-trap-tcp-down.txt

You can see a process stuck on the json command line: "tls_list",
and many others waiting on CRYPTO_THREAD_write_lock.

This might be related to: https://github.com/openssl/openssl/issues/5376

SIDE NOTE:
Right before I typed the gdb command for the last process, kamailio
crashed. This was around 5 minutes after the deadlock started.

Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_main.c:2561]: tcpconn_do_send(): failed to send on 0x7ff8dfc2fdc8 (91.121.30.149:5061->62.210.97.21:49351): Broken pipe (32)
Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_read.c:1505]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0x7ff8dfc2fdc8 r: 0x7ff8dfc2fe48 (-1)
Mar 26 14:47:11 sip kamailio[16493]: WARNING: <core> [core/tcp_read.c:1848]: handle_io(): F_TCPCONN connection marked as bad: 0x7ff8dfa6a408 id 846 refcnt 3
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:755]: handle_sigs(): child process 16374 exited by a signal 11
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:758]: handle_sigs(): core was not generated
Mar 26 14:47:11 sip kamailio[16371]: INFO: <core> [main.c:781]: handle_sigs(): terminating due to SIGCHLD
Mar 26 14:47:11 sip kamailio[16493]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 26 14:47:11 sip kamailio[16500]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 26 14:47:11 sip kamailio[16479]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received


Unfortunately, even though I did my best to set up my service to generate a
core on crash, I still get "core was not generated" (Debian stretch).

Thanks for reading!
Regards
Aymeric



On Tue, 26 Mar 2019 at 14:11, Kristijan Vrban wrote:

> And again one more kamctl trap file where
>
> set_reply_no_connect was set.
>
> Am Di., 26. März 2019 um 08:53 Uhr schrieb Kristijan Vrban
> :
> >
> > Attached also the output of kamctl trap
> >
> > Am Di., 26. März 2019 um 08:42 Uhr schrieb Kristijan Vrban
> > :
> > >
> > > > Have you done a test with tools such as sipp, or was this happening
> > > > after a while, with usual phones registering?
> > >
> > > Usual variety of devices registering via TLS. But i can not exclude
> > > that some devices displaying behavioural problems.
> > >
> > > > Can you list the tcp connections and see if they are listed?
> > > > kamctl tcp core.tcp_list
> > >
> > > Need Kex module for that? So i can deliver next time. But when i do
> > > "lsof -u kamailio |grep TCP"
> > > i get a long list of more then 2000 lines with:
> > >
> > > ...
> > > > kamailio 37561 kamailio 2105u sock 0,9 0t0 27856287 protocol: TCP
> > > > kamailio 37561 kamailio 2106u sock 0,9 0t0 27856305 protocol: TCP
> > > > kamailio 37561 kamailio 2107u sock 0,9 0t0 27856306 protocol: TCP
> > > > kamailio 37561 kamailio 2108u sock 0,9 0t0 27856914 protocol: TCP
> > > ...
> > >
> > > So about the time Kamailio created a lot of socket in the TCP domain,
> > > but which are not bound to any port (eg via connect(2) or listen(2) or
> > > bind(2))
> > > Until we get to the maximum number of 2048 connections.
> > >
> > > Best
> > > Kristijan
> > >
> > > Am Mo., 25. März 2019 um 14:27 Uhr schrieb Daniel-Constantin Mierla
> > > :
> > > >
> > > > Have you done a test with tools such as sipp, or was this happening
> > > > after a while, with usual phones registering?
> > > >
> > > > Can you list the tcp connections and see if they are listed?
> > > >
> > > > kamctl tcp core.tcp_list
> > > >
> > > > Cheers,
> > > > Daniel
> > > >
> > > > On 25.03.19 08:03, Kristijan Vrban wrote:
> > > > >> The solution here is to use set_reply_no_connect()
> > > > > implemented it. Now the issue has shifted to:
> > > > >
> > > > > ERROR:  [core/tcp_main.c:3959]: handle_new_connect(): maximum
> > > > > number of connections exceeded: 2048/2048
> > > > >
> > > > > But not a single TCP connection is active between Kamailio and any
> > > > > device. Seems this counter for maximum number of 

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-26 Thread Kristijan Vrban
> Just curious, did you get to compile with OpenSSL 1.0 and test?

Just compiled with OpenSSL 1.0. Going to test now.

On Tue, 26 Mar 2019 at 15:40, Joel Serrano wrote:
>
> Just curious, did you get to compile with OpenSSL 1.0 and test?
>
> On Tue, Mar 26, 2019 at 06:12 Kristijan Vrban  wrote:
>>
>> And again one more kamctl trap file where
>>
>> set_reply_no_connect was set.
>>
>> Am Di., 26. März 2019 um 08:53 Uhr schrieb Kristijan Vrban
>> :
>> >
>> > Attached also the output of kamctl trap
>> >
>> > Am Di., 26. März 2019 um 08:42 Uhr schrieb Kristijan Vrban
>> > :
>> > >
>> > > > Have you done a test with tools such as sipp, or was this happening
>> > > > after a while, with usual phones registering?
>> > >
>> > > Usual variety of devices registering via TLS. But i can not exclude
>> > > that some devices displaying behavioural problems.
>> > >
>> > > > Can you list the tcp connections and see if they are listed?
>> > > > kamctl tcp core.tcp_list
>> > >
>> > > Need Kex module for that? So i can deliver next time. But when i do
>> > > "lsof -u kamailio |grep TCP"
>> > > i get a long list of more then 2000 lines with:
>> > >
>> > > ...
>> > > kamailio 37561 kamailio 2105u sock 0,9 0t0 27856287 protocol: TCP
>> > > kamailio 37561 kamailio 2106u sock 0,9 0t0 27856305 protocol: TCP
>> > > kamailio 37561 kamailio 2107u sock 0,9 0t0 27856306 protocol: TCP
>> > > kamailio 37561 kamailio 2108u sock 0,9 0t0 27856914 protocol: TCP
>> > > ...
>> > >
>> > > So about the time Kamailio created a lot of socket in the TCP domain,
>> > > but which are not bound to any port (eg via connect(2) or listen(2) or
>> > > bind(2))
>> > > Until we get to the maximum number of 2048 connections.
>> > >
>> > > Best
>> > > Kristijan
>> > >
>> > > Am Mo., 25. März 2019 um 14:27 Uhr schrieb Daniel-Constantin Mierla
>> > > :
>> > > >
>> > > > Have you done a test with tools such as sipp, or was this happening
>> > > > after a while, with usual phones registering?
>> > > >
>> > > > Can you list the tcp connections and see if they are listed?
>> > > >
>> > > > kamctl tcp core.tcp_list
>> > > >
>> > > > Cheers,
>> > > > Daniel
>> > > >
>> > > > On 25.03.19 08:03, Kristijan Vrban wrote:
>> > > > >> The solution here is to use set_reply_no_connect()
>> > > > > implemented it. Now the issue has shifted to:
>> > > > >
>> > > > > ERROR:  [core/tcp_main.c:3959]: handle_new_connect(): maximum
>> > > > > number of connections exceeded: 2048/2048
>> > > > >
>> > > > > But not a single TCP connection is active between Kamailio and any
>> > > > > device. Seems this counter for maximum number of connections
>> > > > > now has an issue?
>> > > > >
>> > > > > Kristijan
>> > > > >
>> > > > > Am Mi., 20. März 2019 um 15:07 Uhr schrieb Daniel-Constantin Mierla
>> > > > > :
>> > > > >> Hello,
>> > > > >>
>> > > > >> based on the trap output I think I could figure out what happened
>> > > > >> there.
>> > > > >>
>> > > > >> You have tcp_children to very low value (1 or so), the problem is not
>> > > > >> actually that one, but the fact that the connection to upstream (the
>> > > > >> device/app sending the request) was closed after receiving the request
>> > > > >> and routing of the reply gets stuck in the way of:
>> > > > >>
>> > > > >>   - a reply is received and has to be forwarded
>> > > > >>   - connection was lost, so Kamailio tries to establish a new one, but
>> > > > >> takes time till fails because the upstream is behind nat or so based on
>> > > > >> the via header:
>> > > > >>
>> > > > >> Via: SIP/2.0/TLS 10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
>> > > > >>
>> > > > >>   - the reply is retransmitted and gets to another worker, which tries
>> > > > >> to forward it again, but discovers a connection structure for that
>> > > > >> destination exists (created by previous reply worker) and now waits for
>> > > > >> the connection to be released (or better said, for the mutex on writing
>> > > > >> buffer to be unlocked)
>> > > > >>
>> > > > >>   - as the second reply waits, there can be other retransmissions of the
>> > > > >> reply ending up in other workers stuck on waiting for the mutex of the
>> > > > >> connection write buffer
>> > > > >>
>> > > > >> The solution here is to use set_reply_no_connect() -- you can put it
>> > > > >> first in request_route block. I think this would be a good addition to
>> > > > >> the default configuration file as well, IMO, the sip server should not
>> > > > >> connect for sending replies and should do it also for requests that go
>> > > > >> behind nat.
>> > > > >>
>> > > > >> Cheers,
>> > > > >> Daniel
>> > > > >>
>> > > > >> On 19.03.19 10:53, Kristijan 

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-26 Thread Joel Serrano
Just curious, did you get to compile with OpenSSL 1.0 and test?

On Tue, Mar 26, 2019 at 06:12 Kristijan Vrban  wrote:

> And again one more kamctl trap file where
>
> set_reply_no_connect was set.
>
> Am Di., 26. März 2019 um 08:53 Uhr schrieb Kristijan Vrban
> :
> >
> > Attached also the output of kamctl trap
> >
> > Am Di., 26. März 2019 um 08:42 Uhr schrieb Kristijan Vrban
> > :
> > >
> > > > Have you done a test with tools such as sipp, or was this happening
> > > > after a while, with usual phones registering?
> > >
> > > Usual variety of devices registering via TLS. But i can not exclude
> > > that some devices displaying behavioural problems.
> > >
> > > > Can you list the tcp connections and see if they are listed?
> > > > kamctl tcp core.tcp_list
> > >
> > > Need Kex module for that? So i can deliver next time. But when i do
> > > "lsof -u kamailio |grep TCP"
> > > i get a long list of more then 2000 lines with:
> > >
> > > ...
> > > kamailio 37561 kamailio 2105u sock 0,9 0t0 27856287 protocol: TCP
> > > kamailio 37561 kamailio 2106u sock 0,9 0t0 27856305 protocol: TCP
> > > kamailio 37561 kamailio 2107u sock 0,9 0t0 27856306 protocol: TCP
> > > kamailio 37561 kamailio 2108u sock 0,9 0t0 27856914 protocol: TCP
> > > ...
> > >
> > > So about the time Kamailio created a lot of socket in the TCP domain,
> > > but which are not bound to any port (eg via connect(2) or listen(2) or
> > > bind(2))
> > > Until we get to the maximum number of 2048 connections.
> > >
> > > Best
> > > Kristijan
> > >
> > > Am Mo., 25. März 2019 um 14:27 Uhr schrieb Daniel-Constantin Mierla
> > > :
> > > >
> > > > Have you done a test with tools such as sipp, or was this happening
> > > > after a while, with usual phones registering?
> > > >
> > > > Can you list the tcp connections and see if they are listed?
> > > >
> > > > kamctl tcp core.tcp_list
> > > >
> > > > Cheers,
> > > > Daniel
> > > >
> > > > On 25.03.19 08:03, Kristijan Vrban wrote:
> > > > >> The solution here is to use set_reply_no_connect()
> > > > > implemented it. Now the issue has shifted to:
> > > > >
> > > > > ERROR:  [core/tcp_main.c:3959]: handle_new_connect(): maximum
> > > > > number of connections exceeded: 2048/2048
> > > > >
> > > > > But not a single TCP connection is active between Kamailio and any
> > > > > device. Seems this counter for maximum number of connections
> > > > > now has an issue?
> > > > >
> > > > > Kristijan
> > > > >
> > > > > On Wed, Mar 20, 2019 at 15:07 Daniel-Constantin Mierla wrote:
> > > > >> Hello,
> > > > >>
> > > > > >> based on the trap output, I think I could figure out what
> > > > > >> happened there.
> > > > > >>
> > > > > >> You have tcp_children set to a very low value (1 or so). The
> > > > > >> problem is not actually that, but the fact that the connection
> > > > > >> to the upstream (the device/app sending the request) was closed
> > > > > >> after the request was received, and routing of the reply gets
> > > > > >> stuck as follows:
> > > > > >>
> > > > > >>   - a reply is received and has to be forwarded
> > > > > >>   - the connection was lost, so Kamailio tries to establish a
> > > > > >> new one, but that takes time until it fails, because the
> > > > > >> upstream is behind NAT or so, based on the Via header:
> > > > > >>
> > > > > >> Via: SIP/2.0/TLS 10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
> > > > > >>
> > > > > >>   - the reply is retransmitted and gets to another worker, which
> > > > > >> tries to forward it again, but discovers that a connection
> > > > > >> structure for that destination exists (created by the previous
> > > > > >> reply worker) and now waits for the connection to be released
> > > > > >> (or, better said, for the mutex on the write buffer to be
> > > > > >> unlocked)
> > > > > >>
> > > > > >>   - as the second reply waits, there can be other retransmissions
> > > > > >> of the reply ending up in other workers, stuck waiting for the
> > > > > >> mutex of the connection write buffer
> > > > > >>
> > > > > >> The solution here is to use set_reply_no_connect() -- you can put
> > > > > >> it first in the request_route block. I think this would be a good
> > > > > >> addition to the default configuration file as well. IMO, the SIP
> > > > > >> server should not connect for sending replies, and should
> > > > > >> likewise not connect for requests that go behind NAT.
> > > > >>
> > > > >> Cheers,
> > > > >> Daniel
> > > > >>
> > > > >> On 19.03.19 10:53, Kristijan Vrban wrote:
> > > > >>> So i had again the situation. But this time, incoming udp was
> > > > >>> affected. Kamailio was sending out OPTIONS (via dispatcher
> module) to
> > > > >>> a group of asterisk machines
> > > > >>> but the 200 OK reply to the OPTIONS where not processed, so the
> > > > >>> dispatcher module set all asterisk to inactive, even though they
> > > > >>> replied 200 OK
> > > > >>>
> > > > >>> Attached the 

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-26 Thread Kristijan Vrban
> Have you done a test with tools such as sipp, or was this happening
> after a while, with usual phones registering?

The usual variety of devices registering via TLS. But I cannot rule out
that some devices are misbehaving.

> Can you list the tcp connections and see if they are listed?
> kamctl tcp core.tcp_list

Is the kex module needed for that? If so, I can provide it next time. But
when I run
"lsof -u kamailio |grep TCP"
I get a long list of more than 2000 lines like:

...
kamailio 37561 kamailio 2105u sock 0,9 0t0 27856287 protocol: TCP
kamailio 37561 kamailio 2106u sock 0,9 0t0 27856305 protocol: TCP
kamailio 37561 kamailio 2107u sock 0,9 0t0 27856306 protocol: TCP
kamailio 37561 kamailio 2108u sock 0,9 0t0 27856914 protocol: TCP
...

So over time Kamailio created a lot of sockets in the TCP domain that
are not bound to any port (e.g. via connect(2), listen(2) or bind(2)),
until we reach the maximum of 2048 connections.

Best
Kristijan

On Mon, Mar 25, 2019 at 14:27 Daniel-Constantin Mierla wrote:
>
> Have you done a test with tools such as sipp, or was this happening
> after a while, with usual phones registering?
>
> Can you list the tcp connections and see if they are listed?
>
> kamctl tcp core.tcp_list
>
> Cheers,
> Daniel
>
> On 25.03.19 08:03, Kristijan Vrban wrote:
> >> The solution here is to use set_reply_no_connect()
> > implemented it. Now the issue has shifted to:
> >
> > ERROR:  [core/tcp_main.c:3959]: handle_new_connect(): maximum
> > number of connections exceeded: 2048/2048
> >
> > But not a single TCP connection is active between Kamailio and any
> > device. Seems this counter for maximum number of connections
> > now has an issue?
> >
> > Kristijan
> >
> > On Wed, Mar 20, 2019 at 15:07 Daniel-Constantin Mierla wrote:
> >> Hello,
> >>
> >> based on the trap output I think I could figure out what happened there.
> >>
> >> You have tcp_children to very low value (1 or so), the problem is not
> >> actually that one, but the fact that the connection to upstream (the
> >> device/app sending the request) was closed after receiving the request
> >> and routing of the reply gets stuck in the way of:
> >>
> >>   - a reply is received and has to be forwarded
> >>   - connection was lost, so Kamailio tries to establish a new one, but
> >> takes time till fails because the upstream is behind nat or so based on
> >> the via header:
> >>
> >> Via: SIP/2.0/TLS
> >> 10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
> >>
> >>   - the reply is retransmitted and gets to another worker, which tries
> >> to forward it again, but discovers a connection structure for that
> >> destination exists (created by previous reply worker) and now waits for
> >> the connection to be released (or better said, for the mutex on writing
> >> buffer to be unlocked)
> >>
> >>   - as the second reply waits, there can be other retransmissions of the
> >> reply ending up in other workers stuck on waiting for the mutex of the
> >> connection write buffer
> >>
> >> The solution here is to use set_reply_no_connect() -- you can put it
> >> first in request_route block. I think this would be a good addition to
> >> the default configuration file as well, IMO, the sip server should not
> >> connect for sending replies and should do it also for requests that go
> >> behind nat.
> >>
> >> Cheers,
> >> Daniel
> >>
> >> On 19.03.19 10:53, Kristijan Vrban wrote:
> >>> So i had again the situation. But this time, incoming udp was
> >>> affected. Kamailio was sending out OPTIONS (via dispatcher module) to
> >>> a group of asterisk machines
> >>> but the 200 OK reply to the OPTIONS where not processed, so the
> >>> dispatcher module set all asterisk to inactive, even though they
> >>> replied 200 OK
> >>>
> >>> Attached the output of kamctl trap during the situation. Hope there is
> >>> any useful in it. Because after "kamctl trap" it was working again
> >>> without kamailio restart.
> >>>
> >>> Best
> >>> Kristijan
> >>>
> >>> On Mon, Mar 18, 2019 at 12:27 Daniel-Constantin Mierla wrote:
>  Hello,
> 
>  setting tcp_children=1 is not a good option for scalability; practically,
>  you set Kamailio to process a single TCP message at a time, and on high
>  traffic that won't work well.
> 
>  Maybe try to set tcp_children to 2 or 4, that should make an eventual
>  race appear faster.
> 
>  Regarding the pid, if it is an outgoing connection, then it can be
>  created by any worker process, including a UDP worker, if that was the
>  one receiving the sip message over udp and sends it out via tcp.
> 
>  Cheers,
>  Daniel
> 
>  On 18.03.19 10:09, Kristijan Vrban wrote:
> > Hi Daniel,
> >
> > for testing, i now had set: "tcp_children=1" and so far this issue did 
> > not 

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-25 Thread Sergey Safarov
I have seen similar cases when:
1) the perl module plus a Perl app was used in the Kamailio config;
2) the http_client module was used and the upstream HTTP server returned
an error message of about 64 KB.

You can check your config for calls to external servers. I think this may
be related.
Sergey

On Mon, Mar 25, 2019 at 16:28 Daniel-Constantin Mierla wrote:

> Have you done a test with tools such as sipp, or was this happening
> after a while, with usual phones registering?
>
> Can you list the tcp connections and see if they are listed?
>
> kamctl tcp core.tcp_list
>
> Cheers,
> Daniel
>
> On 25.03.19 08:03, Kristijan Vrban wrote:
> >> The solution here is to use set_reply_no_connect()
> > implemented it. Now the issue has shifted to:
> >
> > ERROR:  [core/tcp_main.c:3959]: handle_new_connect(): maximum
> > number of connections exceeded: 2048/2048
> >
> > But not a single TCP connection is active between Kamailio and any
> > device. Seems this counter for maximum number of connections
> > now has an issue?
> >
> > Kristijan
> >
> > On Wed, Mar 20, 2019 at 15:07 Daniel-Constantin Mierla wrote:
> >> Hello,
> >>
> >> based on the trap output I think I could figure out what happened there.
> >>
> >> You have tcp_children to very low value (1 or so), the problem is not
> >> actually that one, but the fact that the connection to upstream (the
> >> device/app sending the request) was closed after receiving the request
> >> and routing of the reply gets stuck in the way of:
> >>
> >>   - a reply is received and has to be forwarded
> >>   - connection was lost, so Kamailio tries to establish a new one, but
> >> takes time till fails because the upstream is behind nat or so based on
> >> the via header:
> >>
> >> Via: SIP/2.0/TLS 10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
> >>
> >>   - the reply is retransmitted and gets to another worker, which tries
> >> to forward it again, but discovers a connection structure for that
> >> destination exists (created by previous reply worker) and now waits for
> >> the connection to be released (or better said, for the mutex on writing
> >> buffer to be unlocked)
> >>
> >>   - as the second reply waits, there can be other retransmissions of the
> >> reply ending up in other workers stuck on waiting for the mutex of the
> >> connection write buffer
> >>
> >> The solution here is to use set_reply_no_connect() -- you can put it
> >> first in request_route block. I think this would be a good addition to
> >> the default configuration file as well, IMO, the sip server should not
> >> connect for sending replies and should do it also for requests that go
> >> behind nat.
> >>
> >> Cheers,
> >> Daniel
> >>
> >> On 19.03.19 10:53, Kristijan Vrban wrote:
> >>> So i had again the situation. But this time, incoming udp was
> >>> affected. Kamailio was sending out OPTIONS (via dispatcher module) to
> >>> a group of asterisk machines
> >>> but the 200 OK reply to the OPTIONS where not processed, so the
> >>> dispatcher module set all asterisk to inactive, even though they
> >>> replied 200 OK
> >>>
> >>> Attached the output of kamctl trap during the situation. Hope there is
> >>> any useful in it. Because after "kamctl trap" it was working again
> >>> without kamailio restart.
> >>>
> >>> Best
> >>> Kristijan
> >>>
> >>> On Mon, Mar 18, 2019 at 12:27 Daniel-Constantin Mierla wrote:
>  Hello,
> 
>  setting tcp_children=1 is not a good option for scalability; practically,
>  you set Kamailio to process a single TCP message at a time, and on high
>  traffic that won't work well.
> 
>  Maybe try to set tcp_children to 2 or 4, that should make an eventual
>  race appear faster.
> 
>  Regarding the pid, if it is an outgoing connection, then it can be
>  created by any worker process, including a UDP worker, if that was the
>  one receiving the sip message over udp and sends it out via tcp.
> 
>  Cheers,
>  Daniel
> 
>  On 18.03.19 10:09, Kristijan Vrban wrote:
> > Hi Daniel,
> >
> > for testing, I have now set "tcp_children=1", and so far this issue
> > has not occurred since. So there is no "kamctl trap" output to
> > provide yet.
> >
> > "kamctl ps" shows these two processes handling TCP:
> >
> > ...
> > }, {
> >   "IDX":  25,
> >   "PID":  71929,
> >   "DSC":  "tcp receiver (generic) child=0"
> > }, {
> >   "IDX":  26,
> >   "PID":  71933,
> >   "DSC":  "tcp main process"
> > }
> > ...
> >
> >
> > OK, but then I was surprised to see a TCP connection on a UDP
> > receiver child:
> >
> >
> > netstat -ntp |grep 5061
> >
> > ...
> > tcp0  0 172.17.217.10:5061  195.70.114.125:18252
> > ESTABLISHED 71895/kamailio
> > ...
> >
> > An pid 71895 is:
> >
> > }, {
> >   "IDX":  3,
> >   "PID":  

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-25 Thread Kristijan Vrban
> The solution here is to use set_reply_no_connect()

I implemented it. Now the issue has shifted to:

ERROR:  [core/tcp_main.c:3959]: handle_new_connect(): maximum
number of connections exceeded: 2048/2048

But not a single TCP connection is active between Kamailio and any
device. It seems the counter for the maximum number of connections
now has an issue?

Kristijan

On Wed, Mar 20, 2019 at 15:07 Daniel-Constantin Mierla wrote:
>
> Hello,
>
> based on the trap output, I think I could figure out what happened there.
>
> You have tcp_children set to a very low value (1 or so). The problem is
> not actually that, but the fact that the connection to the upstream (the
> device/app sending the request) was closed after the request was received,
> and routing of the reply gets stuck as follows:
>
>   - a reply is received and has to be forwarded
>   - the connection was lost, so Kamailio tries to establish a new one, but
> that takes time until it fails, because the upstream is behind NAT or so,
> based on the Via header:
>
> Via: SIP/2.0/TLS 10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
>
>   - the reply is retransmitted and gets to another worker, which tries
> to forward it again, but discovers that a connection structure for that
> destination exists (created by the previous reply worker) and now waits for
> the connection to be released (or, better said, for the mutex on the write
> buffer to be unlocked)
>
>   - as the second reply waits, there can be other retransmissions of the
> reply ending up in other workers, stuck waiting for the mutex of the
> connection write buffer
>
> The solution here is to use set_reply_no_connect() -- you can put it
> first in the request_route block. I think this would be a good addition to
> the default configuration file as well. IMO, the SIP server should not
> connect for sending replies, and should likewise not connect for requests
> that go behind NAT.
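
A minimal sketch of that placement in kamailio.cfg, assuming an otherwise
default-style routing block. The comment placeholder stands in for the
rest of the configuration, which is not shown in this thread:

```cfg
request_route {
    # Do not open a new TCP/TLS connection when sending replies for this
    # request: a reply is sent only if a connection to the destination
    # already exists.
    set_reply_no_connect();

    # ... the rest of the request routing logic goes here ...
}
```

With this in place, a reply whose original connection is gone should fail
to send instead of keeping a worker busy in a connect attempt towards an
address behind NAT.
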
>
> Cheers,
> Daniel
>
> On 19.03.19 10:53, Kristijan Vrban wrote:
> > So i had again the situation. But this time, incoming udp was
> > affected. Kamailio was sending out OPTIONS (via dispatcher module) to
> > a group of asterisk machines
> > but the 200 OK reply to the OPTIONS where not processed, so the
> > dispatcher module set all asterisk to inactive, even though they
> > replied 200 OK
> >
> > Attached the output of kamctl trap during the situation. Hope there is
> > any useful in it. Because after "kamctl trap" it was working again
> > without kamailio restart.
> >
> > Best
> > Kristijan
> >
> > On Mon, Mar 18, 2019 at 12:27 Daniel-Constantin Mierla wrote:
> >> Hello,
> >>
> >> setting tcp_children=1 is not a good option for scalability; practically,
> >> you set Kamailio to process a single TCP message at a time, and on high
> >> traffic that won't work well.
> >>
> >> Maybe try to set tcp_children to 2 or 4, that should make an eventual
> >> race appear faster.
> >>
> >> Regarding the pid, if it is an outgoing connection, then it can be
> >> created by any worker process, including a UDP worker, if that was the
> >> one receiving the sip message over udp and sends it out via tcp.
> >>
> >> Cheers,
> >> Daniel
> >>
> >> On 18.03.19 10:09, Kristijan Vrban wrote:
> >>> Hi Daniel,
> >>>
> >>> for testing, I have now set "tcp_children=1", and so far this issue
> >>> has not occurred since. So there is no "kamctl trap" output to
> >>> provide yet.
> >>>
> >>> "kamctl ps" shows these two processes handling TCP:
> >>>
> >>> ...
> >>> }, {
> >>>   "IDX":  25,
> >>>   "PID":  71929,
> >>>   "DSC":  "tcp receiver (generic) child=0"
> >>> }, {
> >>>   "IDX":  26,
> >>>   "PID":  71933,
> >>>   "DSC":  "tcp main process"
> >>> }
> >>> ...
> >>>
> >>>
> >>> OK, but then I was surprised to see a TCP connection on a UDP receiver
> >>> child:
> >>>
> >>>
> >>> netstat -ntp |grep 5061
> >>>
> >>> ...
> >>> tcp        0      0 172.17.217.10:5061  195.70.114.125:18252  ESTABLISHED 71895/kamailio
> >>> ...
> >>>
> >>> And pid 71895 is:
> >>>
> >>> }, {
> >>>   "IDX":  3,
> >>>   "PID":  71895,
> >>>   "DSC":  "udp receiver child=2 sock=127.0.0.1:5060"
> >>> }, {
> >>>
> >>>
> >>>
> >>> And if i look into it via "lsof -p 71895" (the udp receiver child)
> >>>
> >>> ...
> >>> kamailio 71895 kamailio   14u  sock 0,9  0t0 8856085 protocol: TCP
> >>> kamailio 71895 kamailio   15u  sock 0,9  0t0 8886886 protocol: TCP
> >>> kamailio 71895 kamailio   16u  sock 0,9  0t0 8854886 protocol: TCP
> >>> kamailio 71895 kamailio   17u  sock 0,9  0t0 8828915 protocol: TCP
> >>> kamailio 71895 kamailio   18u  unix 0x5f73cb91  0t0 1680314 type=DGRAM
> >>> kamailio 71895 kamailio   19u  IPv4 1846523  0t0 TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
> >>> kamailio 71895 kamailio   20u  sock0,9  0t0
> >>> 8887192 

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-23 Thread Joel Serrano
Yes I agree there... I meant the trigger. I have a feeling your issue is
with TLS, and when it happens it affects the rest...

Give it a try and let us know ;)

On Sat, Mar 23, 2019 at 12:05 Aymeric Moizard  wrote:

> Hi Joel,
>
> My issue was that no TCP traffic was working, including TLS. I guess
> it could be related to the libssl 1.1 issue.
>
> Tks!
> Aymeric
>
> On Fri, Mar 22, 2019 at 21:21 Joel Serrano wrote:
>
>> Hi Aymeric,
>>
>> Are you sure the issue is with TCP and not strictly related to TLS? I
>> highly suggest you compile with ssl1.0 and give it a try...
>>
>> If you want to read how I got to that conclusion:
>> https://github.com/kamailio/kamailio/issues/1172
>>
>> Hope it helps!
>> Joel.
>>
>> On Fri, Mar 22, 2019 at 11:58 AM Aymeric Moizard 
>> wrote:
>>
>>> Hi Daniel,
>>>
>>> Tks for the tips.
>>>
>>> My traffic does include TLS as well.
>>>
>>> For TCP settings:
>>>
>>> tcp_connection_lifetime=3600
>>> tcp_async=yes
>>> tcp_rd_buf_size=16384
>>> tcp_accept_no_cl=yes
>>> tcp_max_connections=5
>>> tcp_connect_timeout=7
>>>
>>> For TLS:
>>> enable_tls=yes
>>> tls_max_connections=5
>>>
>>> I have been using "set_forward_no_connect();" after lookup(location)
>>> for a long time.
>>>
>>> This week I added "set_reply_no_connect();" in case it helps to
>>> avoid the issue.
>>>
>>> If the issue occurs, I will try to get something via "kamctl trap".
>>>
>>> In order to get a coredump (on restart timeout?) I have added this in
>>> my kamailio.service
>>>
>>> WorkingDirectory=/var/run/kamailio
>>> LimitCORE=infinity
>>>
>>> I have also DUMP_CORE=yes in /etc/default/kamailio
>>> and disable_core_dump=no in my kamailio.cfg
>>>
>>> However, I'm not able to see any core dumps when restarting kamailio
>>> even when I see "sig_alarm_abort(): shutdown timeout triggered, dying"...
>>>
>>> Am I supposed to get a core dump in such case?
>>>
>>> Tks a lot!
>>> Aymeric
>>>
>>> On Fri, Mar 22, 2019 at 14:19 Daniel-Constantin Mierla <mico...@gmail.com> wrote:
>>>
 Do you have pure tcp traffic and facing this issue, or there are
 actually tls connections?

 What are the values for core parameters related to tcp connect and tcp
 send timeouts?

 As for restart taking long, see exit_timeout parameter:

   * https://www.kamailio.org/wiki/cookbooks/5.2.x/core#exit_timeout

 As for tls with libssl1.1/libcrypto1.1, I think I discover what the
 issue is. With v1.1 they use their own internal locking functions, not
 exposing any api to set them from outside. Before, kamailio was
 initializing the libray telling to use Kamailio locks, giving one lock per
 connection. As i could get from some gdb traces I received, with libssl
 1.1, the same internal lock is used for when attempting to connect to
 different addresses as well as when trying to write to different
 connections. If one operation is slow for what so ever reason, the others
 are waiting for the lock to be lifted by the slow operation. I am digging
 in the source code of libssl1.1 to figure out a solution, it can still take
 a bit because I am travelling for several days with no much spare time.

 Among the tunnings would be lower timeouts to connect and send, do not
 attempt to connect unless you are sure the target expects new connections
 (e.g., sending to a gateway/sip server accepting traffic via tls, but don't
 do it even for the requests routed via lookup(location) as the registration
 is using a connection with an ephemeral source port and trying to connect
 back to it will fail). If still a major issue for what so ever reason,
 using a version compiled with libssl1.0 would be something to go for it.

 Cheers,
 Daniel
 On 21.03.19 19:17, Aymeric Moizard wrote:

 Hi List,

 I want to share that I also met this issue last week with my kamailio
 5.2.2.

 As far as I was able to see, SIP application were able to "connect()"
 with TCP, but my logs wasn't reporting any of the SIP message received
 with TCP.

 I have an pike right before an xlog showing every incoming request.
 However
 I suspect the issue was not related to pike module. The log didn't
 showed unusual
 number of blocked traffic.

 I'm almost sure I haven't reached any ulimit restrictions.
 I have many TCP, UDP childreen...
 Server was not under high load
 Nothing unusual.

 I'm running the default build for debian stretch from here:
http://deb.kamailio.org/kamailio52 stretch

 And unfortunatly, I had some tiny pressure to restart the service so I
 was
 not able to get deeper into the issue.

 If I'm correct, I will certainly improve much things by using
 "set_reply_no_connect()".
 I have added it and restarted!
 (Tks Daniel for this tip!)

 I have been looking at issue reported here:

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-23 Thread Aymeric Moizard
Hi Joel,

My issue was that no TCP traffic was working, including TLS. I guess it
could be related to the libssl 1.1 issue.

Tks!
Aymeric

On Fri, Mar 22, 2019 at 21:21 Joel Serrano wrote:

> Hi Aymeric,
>
> Are you sure the issue is with TCP and not strictly related to TLS? I
> highly suggest you compile with ssl1.0 and give it a try...
>
> If you want to read how I got to that conclusion:
> https://github.com/kamailio/kamailio/issues/1172
>
> Hope it helps!
> Joel.
>
> On Fri, Mar 22, 2019 at 11:58 AM Aymeric Moizard 
> wrote:
>
>> Hi Daniel,
>>
>> Tks for the tips.
>>
>> My traffic does include TLS as well.
>>
>> For TCP settings:
>>
>> tcp_connection_lifetime=3600
>> tcp_async=yes
>> tcp_rd_buf_size=16384
>> tcp_accept_no_cl=yes
>> tcp_max_connections=5
>> tcp_connect_timeout=7
>>
>> For TLS:
>> enable_tls=yes
>> tls_max_connections=5
>>
>> I have been using "set_forward_no_connect();" after lookup(location) for
>> a long time.
>>
>> This week I added "set_reply_no_connect();" in case it helps to
>> avoid the issue.
>>
>> If the issue occurs, I will try to get something via "kamctl trap".
>>
>> In order to get a coredump (on restart timeout?) I have added this in
>> my kamailio.service
>>
>> WorkingDirectory=/var/run/kamailio
>> LimitCORE=infinity
>>
>> I have also DUMP_CORE=yes in /etc/default/kamailio
>> and disable_core_dump=no in my kamailio.cfg
>>
>> However, I'm not able to see any core dumps when restarting kamailio
>> even when I see "sig_alarm_abort(): shutdown timeout triggered, dying"...
>>
>> Am I supposed to get a core dump in such case?
>>
>> Tks a lot!
>> Aymeric
>>
>> On Fri, Mar 22, 2019 at 14:19 Daniel-Constantin Mierla wrote:
>>
>>> Do you have pure tcp traffic and facing this issue, or there are
>>> actually tls connections?
>>>
>>> What are the values for core parameters related to tcp connect and tcp
>>> send timeouts?
>>>
>>> As for restart taking long, see exit_timeout parameter:
>>>
>>>   * https://www.kamailio.org/wiki/cookbooks/5.2.x/core#exit_timeout
>>>
>>> As for TLS with libssl1.1/libcrypto1.1, I think I discovered what the
>>> issue is. With v1.1 they use their own internal locking functions and do
>>> not expose any API to set them from outside. Before, Kamailio initialized
>>> the library telling it to use Kamailio locks, giving one lock per
>>> connection. As far as I could tell from some gdb traces I received, with
>>> libssl 1.1 the same internal lock is used when attempting to connect to
>>> different addresses as well as when trying to write to different
>>> connections. If one operation is slow for whatever reason, the others
>>> wait for the lock to be released by the slow operation. I am digging
>>> into the source code of libssl1.1 to figure out a solution; it can still
>>> take a bit, because I am travelling for several days without much spare
>>> time.
>>>
>>> Among the tunings would be lower timeouts for connect and send, and not
>>> attempting to connect unless you are sure the target expects new
>>> connections (e.g., connect when sending to a gateway/SIP server accepting
>>> traffic via TLS, but do not connect for requests routed via
>>> lookup(location), as the registration uses a connection with an ephemeral
>>> source port and trying to connect back to it will fail). If it is still a
>>> major issue for whatever reason, using a version compiled with libssl1.0
>>> would be the way to go.
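
A rough sketch of how those tunings might look in kamailio.cfg. The
timeout values are illustrative assumptions, not recommendations made in
this thread, and the route name LOCATION together with the use of the
registrar, sl and tm modules follows the stock sample configuration
rather than anyone's actual setup:

```cfg
# Global parameters: give up sooner on slow connects and sends.
# Both values are in seconds; 3 is only an example value.
tcp_connect_timeout=3
tcp_send_timeout=3

route[LOCATION] {
    if (!lookup("location")) {
        sl_send_reply("404", "Not Found");
        exit;
    }
    # The contact comes from a registration, so only reuse the connection
    # the device opened; do not connect back to its ephemeral source port.
    set_forward_no_connect();
    t_relay();
    exit;
}
```

Routes that send to a gateway or SIP server known to accept new TLS
connections would simply not call set_forward_no_connect().
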
>>>
>>> Cheers,
>>> Daniel
>>> On 21.03.19 19:17, Aymeric Moizard wrote:
>>>
>>> Hi List,
>>>
>>> I want to share that I also hit this issue last week with my Kamailio
>>> 5.2.2.
>>>
>>> As far as I was able to see, SIP applications were able to "connect()"
>>> over TCP, but my logs weren't reporting any of the SIP messages received
>>> over TCP.
>>>
>>> I have a pike check right before an xlog showing every incoming request.
>>> However, I suspect the issue was not related to the pike module: the log
>>> didn't show an unusual amount of blocked traffic.
>>>
>>> I'm almost sure I haven't reached any ulimit restrictions.
>>> I have many TCP and UDP children.
>>> The server was not under high load.
>>> Nothing unusual.
>>>
>>> I'm running the default build for Debian stretch from here:
>>> http://deb.kamailio.org/kamailio52 stretch
>>>
>>> Unfortunately, I was under some pressure to restart the service, so I
>>> was not able to dig deeper into the issue.
>>>
>>> If I'm correct, I will certainly improve things a lot by using
>>> "set_reply_no_connect()".
>>> I have added it and restarted!
>>> (Tks Daniel for this tip!)
>>>
>>> I have been looking at issue reported here:
>>> "Kamailio 5.0.2 suddenly stops processing traffic, then generates a core
>>> when restarting."
>>> https://github.com/kamailio/kamailio/issues/1172
>>>
>>> I have to say that I do have libssl1.1.
>>> And I do have crash when I restart my kamailio. (even when I simply
>>> restart after a configuration modification)
>>>
>>> Mar 21 18:28:50 sip kamailio[19222]: INFO:  [main.c:836]:

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-22 Thread Joel Serrano
To make it easier for you and not have to go through the whole thread, if
you want the TL;DR start here:
https://github.com/kamailio/kamailio/issues/1172#issuecomment-312634272

On Fri, Mar 22, 2019 at 1:19 PM Joel Serrano  wrote:

> Hi Aymeric,
>
> Are you sure the issue is with TCP and not strictly related to TLS? I
> highly suggest you compile with ssl1.0 and give it a try...
>
> If you want to read how I got to that conclusion:
> https://github.com/kamailio/kamailio/issues/1172
>
> Hope it helps!
> Joel.
>
> On Fri, Mar 22, 2019 at 11:58 AM Aymeric Moizard 
> wrote:
>
>> Hi Daniel,
>>
>> Tks for the tips.
>>
>> My traffic does include TLS as well.
>>
>> For TCP settings:
>>
>> tcp_connection_lifetime=3600
>> tcp_async=yes
>> tcp_rd_buf_size=16384
>> tcp_accept_no_cl=yes
>> tcp_max_connections=5
>> tcp_connect_timeout=7
>>
>> For TLS:
>> enable_tls=yes
>> tls_max_connections=5
>>
>> I have been using "set_forward_no_connect();" after lookup(location) for
>> a long time.
>>
>> This week I added "set_reply_no_connect();" in case it helps to
>> avoid the issue.
>>
>> If the issue occurs, I will try to get something via "kamctl trap".
>>
>> In order to get a coredump (on restart timeout?) I have added this in
>> my kamailio.service
>>
>> WorkingDirectory=/var/run/kamailio
>> LimitCORE=infinity
>>
>> I have also DUMP_CORE=yes in /etc/default/kamailio
>> and disable_core_dump=no in my kamailio.cfg
>>
>> However, I'm not able to see any core dumps when restarting kamailio
>> even when I see "sig_alarm_abort(): shutdown timeout triggered, dying"...
>>
>> Am I supposed to get a core dump in such case?
>>
>> Tks a lot!
>> Aymeric
>>
>> On Fri, Mar 22, 2019 at 14:19 Daniel-Constantin Mierla wrote:
>>
>>> Do you have pure tcp traffic and facing this issue, or there are
>>> actually tls connections?
>>>
>>> What are the values for core parameters related to tcp connect and tcp
>>> send timeouts?
>>>
>>> As for restart taking long, see exit_timeout parameter:
>>>
>>>   * https://www.kamailio.org/wiki/cookbooks/5.2.x/core#exit_timeout
>>>
>>> As for TLS with libssl1.1/libcrypto1.1, I think I discovered what the
>>> issue is. With v1.1 they use their own internal locking functions and do
>>> not expose any API to set them from outside. Before, Kamailio initialized
>>> the library telling it to use Kamailio locks, giving one lock per
>>> connection. As far as I could tell from some gdb traces I received, with
>>> libssl 1.1 the same internal lock is used when attempting to connect to
>>> different addresses as well as when trying to write to different
>>> connections. If one operation is slow for whatever reason, the others
>>> wait for the lock to be released by the slow operation. I am digging
>>> into the source code of libssl1.1 to figure out a solution; it can still
>>> take a bit, because I am travelling for several days without much spare
>>> time.
>>>
>>> Among the tunings would be lower timeouts for connect and send, and not
>>> attempting to connect unless you are sure the target expects new
>>> connections (e.g., connect when sending to a gateway/SIP server accepting
>>> traffic via TLS, but do not connect for requests routed via
>>> lookup(location), as the registration uses a connection with an ephemeral
>>> source port and trying to connect back to it will fail). If it is still a
>>> major issue for whatever reason, using a version compiled with libssl1.0
>>> would be the way to go.
>>>
>>> Cheers,
>>> Daniel
>>> On 21.03.19 19:17, Aymeric Moizard wrote:
>>>
>>> Hi List,
>>>
>>> I want to share that I also hit this issue last week with my Kamailio
>>> 5.2.2.
>>>
>>> As far as I was able to see, SIP applications were able to "connect()"
>>> over TCP, but my logs weren't reporting any of the SIP messages received
>>> over TCP.
>>>
>>> I have a pike check right before an xlog showing every incoming request.
>>> However, I suspect the issue was not related to the pike module: the log
>>> didn't show an unusual amount of blocked traffic.
>>>
>>> I'm almost sure I haven't reached any ulimit restrictions.
>>> I have many TCP and UDP children.
>>> The server was not under high load.
>>> Nothing unusual.
>>>
>>> I'm running the default build for Debian stretch from here:
>>> http://deb.kamailio.org/kamailio52 stretch
>>>
>>> Unfortunately, I was under some pressure to restart the service, so I
>>> was not able to dig deeper into the issue.
>>>
>>> If I'm correct, I will certainly improve things a lot by using
>>> "set_reply_no_connect()".
>>> I have added it and restarted!
>>> (Tks Daniel for this tip!)
>>>
>>> I have been looking at issue reported here:
>>> "Kamailio 5.0.2 suddenly stops processing traffic, then generates a core
>>> when restarting."
>>> https://github.com/kamailio/kamailio/issues/1172
>>>
>>> I have to say that I do have libssl1.1.
>>> And I do have crash when I restart my kamailio. (even when I simply
>>> restart after a configuration modification)
>>>
>>> Mar 21 18:28:50 sip 

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-22 Thread Joel Serrano
Hi Aymeric,

Are you sure the issue is with TCP and not strictly related to TLS? I
highly suggest you compile with ssl1.0 and give it a try...

If you want to read how I got to that conclusion:
https://github.com/kamailio/kamailio/issues/1172

Hope it helps!
Joel.


Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-22 Thread Aymeric Moizard
Hi Daniel,

Tks for the tips.

My traffic does include TLS as well.

For TCP settings:

tcp_connection_lifetime=3600
tcp_async=yes
tcp_rd_buf_size=16384
tcp_accept_no_cl=yes
tcp_max_connections=5
tcp_connect_timeout=7

For TLS:
enable_tls=yes
tls_max_connections=5

I have been using "set_forward_no_connect();" after lookup(location) for a
long time.

This week I added "set_reply_no_connect();" in case it helps to avoid the
issue.

If the issue occurs again, I will try to get something via "kamctl trap".

In order to get a core dump (on restart timeout?) I have added this to
my kamailio.service:

WorkingDirectory=/var/run/kamailio
LimitCORE=infinity

I also have DUMP_CORE=yes in /etc/default/kamailio
and disable_core_dump=no in my kamailio.cfg.

However, I'm not able to see any core dumps when restarting Kamailio,
even when I see "sig_alarm_abort(): shutdown timeout triggered, dying"...

Am I supposed to get a core dump in such a case?

Tks a lot!
Aymeric


Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-22 Thread Daniel-Constantin Mierla
Do you have pure TCP traffic when facing this issue, or are there
actually TLS connections?

What are the values of the core parameters related to the TCP connect
and TCP send timeouts?

As for the restart taking long, see the exit_timeout parameter:

  * https://www.kamailio.org/wiki/cookbooks/5.2.x/core#exit_timeout
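
As a minimal sketch, assuming the behaviour described on that cookbook
page, the parameter goes in the global section of kamailio.cfg. The value
below is an arbitrary example, not a recommendation; the logs posted in
this thread show about one minute between signal 15 and sig_alarm_abort().

```
# kamailio.cfg, global parameters section
# Seconds to wait for the shutdown procedure before the remaining
# processes are killed (illustrative value only).
exit_timeout=20
```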

As for TLS with libssl1.1/libcrypto1.1, I think I discovered what the
issue is. With v1.1 they use their own internal locking functions, not
exposing any API to set them from outside. Before, Kamailio was
initializing the library telling it to use Kamailio locks, giving one
lock per connection. As far as I could tell from some gdb traces I
received, with libssl 1.1 the same internal lock is used when
attempting to connect to different addresses as well as when trying to
write to different connections. If one operation is slow for whatever
reason, the others wait for the lock to be released by the slow
operation. I am digging in the source code of libssl1.1 to figure out a
solution; it can still take a bit because I am travelling for several
days without much spare time.

Among the tunings would be lower timeouts for connect and send, and not
attempting to connect unless you are sure the target expects new
connections (e.g., connecting is fine when sending to a gateway/SIP
server accepting traffic via TLS, but don't do it for requests routed
via lookup(location), as the registration is using a connection with an
ephemeral source port and trying to connect back to it will fail). If it
is still a major issue for whatever reason, using a version compiled
with libssl1.0 would be the way to go.
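
A rough kamailio.cfg sketch of those tunings could look like this. The
timeout values are illustrative assumptions, not numbers given in this
thread; the route is reduced to the relevant lines, and lookup() assumes
the registrar module is loaded.

```
# global parameters: give up sooner on slow connects and blocked sends
# (values in seconds, chosen only for illustration)
tcp_connect_timeout=3
tcp_send_timeout=3

request_route {
    # ... other routing logic ...

    if (lookup("location")) {
        # registered clients are reached over the connection they
        # opened themselves; do not try to open a new one towards them
        set_forward_no_connect();
    }

    # ... relaying of the request ...
}
```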

Cheers,
Daniel


Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-21 Thread Aymeric Moizard
Hi List,

I want to share that I also hit this issue last week with my Kamailio 5.2.2.

As far as I was able to see, SIP applications were able to "connect()"
over TCP, but my logs weren't reporting any of the SIP messages received
over TCP.

I have a pike check right before an xlog showing every incoming request.
However, I suspect the issue was not related to the pike module. The log
didn't show an unusual amount of blocked traffic.

I'm almost sure I haven't reached any ulimit restrictions.
I have many TCP and UDP children.
The server was not under high load.
Nothing unusual.

I'm running the default build for Debian stretch from here:
   http://deb.kamailio.org/kamailio52 stretch

Unfortunately, I was under some pressure to restart the service, so I was
not able to dig deeper into the issue.

If I'm correct, using "set_reply_no_connect()" should improve things
considerably.
I have added it and restarted!
(Tks Daniel for this tip!)

I have been looking at the issue reported here:
"Kamailio 5.0.2 suddenly stops processing traffic, then generates a core
when restarting."
https://github.com/kamailio/kamailio/issues/1172

I have to say that I do have libssl1.1.
And I do get a crash when I restart Kamailio (even when I simply restart
after a configuration change).

Mar 21 18:28:50 sip kamailio[19222]: INFO: <core> [main.c:836]: sig_usr():
signal 15 received
Mar 21 18:28:50 sip kamailio[19175]: ERROR: ctl [ctl.c:390]: mod_destroy():
ERROR: ctl: could not delete unix socket /var/run/kamailio/kamailio_ctl:
Permission denied (13)

[... one minute with nothing ...]

Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs [jsonrpcs_fifo.c:599]:
jsonrpc_fifo_destroy(): FIFO stat failed: Permission denied
Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs [jsonrpcs_sock.c:516]:
jsonrpc_dgram_destroy(): socket stat failed: Permission denied
Mar 21 18:29:50 sip kamailio[19175]: CRITICAL: <core> [main.c:662]:
sig_alarm_abort(): shutdown timeout triggered, dying...

As the 1172 issue is closed, should I expect Kamailio to still have trouble
with libssl1.1?

I just restarted my service again (to see whether it restarts more cleanly
after only 30 minutes instead of a week):

Mar 21 19:07:30 sip kamailio[28737]: INFO: <core> [main.c:836]: sig_usr():
signal 15 received
Mar 21 19:07:31 sip kamailio[28671]: ERROR: ctl [ctl.c:390]: mod_destroy():
ERROR: ctl: could not delete unix socket /var/run/kamailio/kamailio_ctl:
Permission denied (13)

[... one minute with nothing ...]

Mar 21 19:08:30 sip kamailio[28671]: CRITICAL: <core> [main.c:662]:
sig_alarm_abort(): shutdown timeout triggered, dying...

Still not able to restart in a clean way!
Tks!
Regards
Aymeric



Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-20 Thread Daniel-Constantin Mierla
Hello,

based on the trap output I think I could figure out what happened there.

You have tcp_children set to a very low value (1 or so), but that is not
actually the problem. The problem is that the connection to upstream (the
device/app sending the request) was closed after the request was received,
and routing of the reply gets stuck as follows:

  - a reply is received and has to be forwarded
  - the connection was lost, so Kamailio tries to establish a new one, but
it takes time until that fails because the upstream is behind NAT or so,
based on the Via header:

Via: SIP/2.0/TLS 10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9

  - the reply is retransmitted and gets to another worker, which tries
to forward it again, but discovers that a connection structure for that
destination exists (created by the previous reply worker) and now waits
for the connection to be released (or better said, for the mutex on the
write buffer to be unlocked)

  - as the second reply waits, there can be other retransmissions of the
reply ending up in other workers, which get stuck waiting for the mutex of
the connection write buffer

The solution here is to use set_reply_no_connect() -- you can put it
first in the request_route block. I think this would be a good addition to
the default configuration file as well. IMO, the SIP server should not
open connections for sending replies, and the same goes for requests that
go to destinations behind NAT.
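
A minimal sketch of that placement, assuming an otherwise ordinary
kamailio.cfg (everything apart from the set_reply_no_connect() call is a
placeholder):

```
request_route {
    # Replies to this request are sent only over an already existing
    # connection; no new TCP/TLS connection is opened for them.
    set_reply_no_connect();

    # ... rest of the routing logic ...
}
```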

Cheers,
Daniel

On 19.03.19 10:53, Kristijan Vrban wrote:
> So I had the situation again. But this time, incoming UDP was
> affected. Kamailio was sending out OPTIONS (via the dispatcher module) to
> a group of Asterisk machines,
> but the 200 OK replies to the OPTIONS were not processed, so the
> dispatcher module set all Asterisk machines to inactive, even though they
> replied 200 OK.
>
> Attached is the output of kamctl trap during the situation. I hope there
> is something useful in it, because after "kamctl trap" it was working
> again without a Kamailio restart.
>
> Best
> Kristijan
>

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-18 Thread Daniel-Constantin Mierla
Hello,

setting tcp_children=1 is not a good option for scalability; practically
you set Kamailio to process a single TCP message at a time, and on high
traffic that won't work well.

Maybe try to set tcp_children to 2 or 4, that should make a possible
race appear faster.
Regarding the pid, if it is an outgoing connection, then it can be
created by any worker process, including a UDP worker, if that was the
one receiving the sip message over udp and sends it out via tcp.
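
For reference, the tcp_children change suggested above is a one-line
setting in the global section of kamailio.cfg; 4 is simply one of the two
values mentioned, not a tuned number:

```
# number of TCP worker processes that read and handle SIP over TCP/TLS
tcp_children=4
```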

Cheers,
Daniel

On 18.03.19 10:09, Kristijan Vrban wrote:
> Hi Daniel,
>
> for testing, I have now set "tcp_children=1", and so far this issue has
> not occurred again. So there is nothing to provide from "kamctl trap" yet.
>
> "kamctl ps" shows these two processes handling TCP:
>
> ...
> }, {
>   "IDX":  25,
>   "PID":  71929,
>   "DSC":  "tcp receiver (generic) child=0"
> }, {
>   "IDX":  26,
>   "PID":  71933,
>   "DSC":  "tcp main process"
> }
> ...
>
>
> OK, but then I was surprised to see a TCP connection on a UDP receiver child:
>
>
> netstat -ntp |grep 5061
>
> ...
> tcp        0      0 172.17.217.10:5061      195.70.114.125:18252    ESTABLISHED 71895/kamailio
> ...
>
> And PID 71895 is:
>
> }, {
>   "IDX":  3,
>   "PID":  71895,
>   "DSC":  "udp receiver child=2 sock=127.0.0.1:5060"
> }, {
>
>
>
> And if I look into it via "lsof -p 71895" (the UDP receiver child):
>
> ...
> kamailio 71895 kamailio   14u  sock        0,9  0t0 8856085 protocol: TCP
> kamailio 71895 kamailio   15u  sock        0,9  0t0 8886886 protocol: TCP
> kamailio 71895 kamailio   16u  sock        0,9  0t0 8854886 protocol: TCP
> kamailio 71895 kamailio   17u  sock        0,9  0t0 8828915 protocol: TCP
> kamailio 71895 kamailio   18u  unix 0x5f73cb91  0t0 1680314 type=DGRAM
> kamailio 71895 kamailio   19u  IPv4    1846523  0t0     TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
> kamailio 71895 kamailio   20u  sock        0,9  0t0 8887192 protocol: TCP
> kamailio 71895 kamailio   21u  sock        0,9  0t0 8813634 protocol: TCP
> kamailio 71895 kamailio   22u  unix 0xc19bd102  0t0 1681407 type=STREAM
> kamailio 71895 kamailio   23u  sock        0,9  0t0 8850488 protocol: TCP
> ...
>
> There is not only the ESTABLISHED TCP session, but also these empty
> "protocol: TCP" sockets.
> What are they doing there in the UDP receiver? Is that how it's supposed
> to be?
>
> Kristijan
>

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-14 Thread Daniel-Constantin Mierla
Can you get the file written by `kamctl trap`? It should have the backtrace
for all Kamailio processes. You need the latest Kamailio 5.2.

Also, get the output for: kamctl ps

Cheers,
Daniel

On 14.03.19 13:52, Kristijan Vrban wrote:
> When I attach via gdb to one of the TCP workers, I see this:
>
> (gdb) bt
> #0  0x7fdaf4d14470 in futex_wait (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
> #1  futex_wait_simple (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135
> #2  __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at pthread_rwlock_wrlock.c:67
> #3  0x7fdaf0912ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> #4  0x7fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> #5  0x7fdaf08a6f69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> #6  0x7fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> #7  0x7fdaf0c31144 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
> #8  0x7fdaf0c2bddb in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
> #9  0x7fdaf0c22858 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
> #10 0x7fdaf0c1af61 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
> #11 0x7fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, error=0x7e2a2df0) at tls_server.c:422
> #12 0x7fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98, flags=0x7e2c318c) at tls_server.c:1116
> #13 0x556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98, read_flags=0x7e2c318c) at core/tcp_read.c:469
> #14 0x556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98, bytes_read=0x7e2c3184, read_flags=0x7e2c318c) at core/tcp_read.c:1496
> #15 0x556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1, idx=-1) at core/tcp_read.c:1862
> #16 0x556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0, t=2, repeat=0) at core/io_wait.h:1065
> #17 0x556ead5f6b35 in tcp_receive_loop (unix_sock=49) at core/tcp_read.c:1974
> #18 0x556ead4c8e24 in tcp_init_children () at core/tcp_main.c:4853
> #19 0x556ead3c352a in main_loop () at main.c:1735
> #20 0x556ead3ca5f8 in main (argc=13, argv=0x7e2c3828) at main.c:2675
>
> Am Do., 14. März 2019 um 13:41 Uhr schrieb Kristijan Vrban
> :
>> Hi, with full debug is see this in log for every incoming TCP SIP request:
>>
>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG:
>>  [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp
>> receiver, connection passed to the least busy one (105)
>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG:
>>  [core/tcp_main.c:3875]: send2child(): selected tcp worker 2
>> 27(17937) for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928
>>
>> So the Kamailio TCP process is working, and received TCP traffic. But
>> the tcp workers are somehow busy.
>>
>> When i attach via strace to the TCP worker, i do not see any activity. Just:
>>
>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL
>>
>> and nothing, even when i see the main tcp process choose this worker process.
>>
>> Kristijan
>>
>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb Kristijan Vrban
>> :
>>> first of all thanks for the feedback. i prepared our system now to run
>>> with debug=3
>>> I hope to see more then then.
>>>
>>> Am Mi., 27. Feb. 2019 um 11:53 Uhr schrieb Kristijan Vrban
>>> :
 Hi kamailios,

 i have a creepy situation with v5.2.1 stable Kamilio. After a day or
 so, Kamailio stop to process incoming SIP traffic via TCP. The
 incoming TCP network packages get TCP-ACK from the OS (Debian 9,
 4.18.0-15-generic-Linux) but Kamailio does not show any processing for
 the SIP-Traffic incoming via TCP. No logs, nothing. While traffic via
 UDP is working just totally fine.

 When i look via command "netstat -ntp" is see, that the Recv-Q get
 bigger and bigger. e.g.:

 Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program
 name tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252 ESTABLISHED
 31347/kamailio

 After Kamailio restart, all is working fine again for a day. We have
 maybe 10-20 devices online via TCP and low call volume (1-2 call per
 minute). The only settings for tcp we have is "tcp_delayed_ack=no"

 How to could we debug this situation? Again, no error, no warings in
 the log. Just nothing.

 Kristijan
> ___
> Kamailio (SER) - Users Mailing List
> sr-users@lists.kamailio.org
> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users

-- 
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda

Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-14 Thread Kristijan Vrban
When I attach via gdb to one of the TCP workers, I see this:

(gdb) bt
#0  0x7fdaf4d14470 in futex_wait (private=, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  futex_wait_simple (private=, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135
#2  __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at pthread_rwlock_wrlock.c:67
#3  0x7fdaf0912ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#4  0x7fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#5  0x7fdaf08a6f69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#6  0x7fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#7  0x7fdaf0c31144 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#8  0x7fdaf0c2bddb in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#9  0x7fdaf0c22858 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#10 0x7fdaf0c1af61 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#11 0x7fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, error=0x7e2a2df0) at tls_server.c:422
#12 0x7fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98, flags=0x7e2c318c) at tls_server.c:1116
#13 0x556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98, read_flags=0x7e2c318c) at core/tcp_read.c:469
#14 0x556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98, bytes_read=0x7e2c3184, read_flags=0x7e2c318c) at core/tcp_read.c:1496
#15 0x556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1, idx=-1) at core/tcp_read.c:1862
#16 0x556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0 , t=2, repeat=0) at core/io_wait.h:1065
#17 0x556ead5f6b35 in tcp_receive_loop (unix_sock=49) at core/tcp_read.c:1974
#18 0x556ead4c8e24 in tcp_init_children () at core/tcp_main.c:4853
#19 0x556ead3c352a in main_loop () at main.c:1735
#20 0x556ead3ca5f8 in main (argc=13, argv=0x7e2c3828) at main.c:2675
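
When there are many workers to check, reading each backtrace by eye gets tedious. Below is a minimal Python sketch (not part of Kamailio or gdb) that pulls the function names out of `bt` output like the one above and reports whether the innermost frames look like a lock wait. The names `parse_frames`, `lock_wait_summary` and the `LOCK_HINTS` list are hypothetical helpers chosen for illustration, and the sample is a shortened copy of the backtrace above.

```python
import re

# A gdb frame header looks like "#3  0x7fdaf0912ee9 in CRYPTO_THREAD_write_lock () from ..."
# or, when gdb prints no address, "#1  futex_wait_simple (private=...".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")

# Name fragments that suggest a frame is waiting on a lock.
# This list is an assumption made for this sketch, not something gdb provides.
LOCK_HINTS = ("futex_wait", "pthread_rwlock", "pthread_mutex", "CRYPTO_THREAD")


def parse_frames(backtrace):
    """Return [(frame_number, function_name), ...] for every frame header line."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


def lock_wait_summary(backtrace):
    """Report whether frame #0 is a lock wait and which named function led there."""
    names = [name for _, name in parse_frames(backtrace)]
    lock_frames = [n for n in names if any(hint in n for hint in LOCK_HINTS)]
    # First named frame that is neither unresolved ("??") nor a lock frame.
    caller = next((n for n in names if n != "??" and n not in lock_frames), None)
    return {
        "blocked": bool(names) and names[0] in lock_frames,
        "lock_frames": lock_frames,
        "caller": caller,
    }


# Shortened copy of the backtrace above, wrapped the way the mail client wrapped it.
SAMPLE_BT = """\
#0  0x7fdaf4d14470 in futex_wait (private=,
expected=1, futex_word=0x7fdaeca92f8c) at
../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  futex_wait_simple (private=, expected=1,
futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135
#2  __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at
pthread_rwlock_wrlock.c:67
#3  0x7fdaf0912ee9 in CRYPTO_THREAD_write_lock () from
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#4  0x7fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#6  0x7fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#10 0x7fdaf0c1af61 in SSL_do_handshake () from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
#11 0x7fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98,
error=0x7e2a2df0) at tls_server.c:422
"""

if __name__ == "__main__":
    print(lock_wait_summary(SAMPLE_BT))
```

Because continuation lines never start with `#`, the parser works on wrapped and unwrapped backtraces alike. It only matches on names, so treat a positive result as a hint to look closer, not as proof of a deadlock.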









Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-03-14 Thread Kristijan Vrban
Hi, with full debug I see this in the log for every incoming TCP SIP request:

Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp receiver, connection passed to the least busy one (105)
Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: [core/tcp_main.c:3875]: send2child(): selected tcp worker 2 27(17937) for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928

So the Kamailio TCP main process is working and receives TCP traffic, but
the TCP workers are somehow busy.

When I attach via strace to the TCP worker, I do not see any activity. Just:

futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL

and nothing more, even when I see the main TCP process choose this worker process.
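
To see how often this happens and which worker keeps getting picked, the two `send2child()` messages can be tallied from the log. This is only a rough Python sketch: it assumes each log entry sits on one line (as in the syslog file, not as wrapped by the mail client), and it reads the number in parentheses as the busy counter of the chosen worker and the `27(17937)` part as process number and PID, which is my interpretation of the message and not something confirmed in this thread. `summarize_send2child` is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns for the two send2child() debug messages quoted above.
NO_FREE_RE = re.compile(
    r"send2child\(\): WARNING: no free tcp receiver, "
    r"connection passed to the least busy one \((\d+)\)"
)
SELECTED_RE = re.compile(
    r"send2child\(\): selected tcp worker (\d+) (\d+)\((\d+)\) "
    r"for activity on \[([^\]]+)\]"
)


def summarize_send2child(lines):
    """Count 'no free tcp receiver' warnings and selections per worker PID."""
    warnings = 0
    max_busy = 0
    selected_by_pid = Counter()
    for line in lines:
        match = NO_FREE_RE.search(line)
        if match:
            warnings += 1
            max_busy = max(max_busy, int(match.group(1)))
            continue
        match = SELECTED_RE.search(line)
        if match:
            selected_by_pid[int(match.group(3))] += 1
    return {
        "warnings": warnings,
        "max_busy": max_busy,
        "selected_by_pid": dict(selected_by_pid),
    }


# The two log entries from this message, each joined onto a single line.
SAMPLE_LOG = [
    "Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: "
    "[core/tcp_main.c:3871]: send2child(): WARNING: no free tcp receiver, "
    "connection passed to the least busy one (105)",
    "Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: "
    "[core/tcp_main.c:3875]: send2child(): selected tcp worker 2 27(17937) "
    "for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928",
]

if __name__ == "__main__":
    print(summarize_send2child(SAMPLE_LOG))
```

A PID that is selected again and again while the counter in the warning keeps growing would be the first candidate to attach gdb or strace to.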

Kristijan



Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-02-27 Thread Kristijan Vrban
First of all, thanks for the feedback. I have now prepared our system to
run with debug=3.
I hope to see more then.



Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-02-27 Thread Sergey Safarov
I think you need to increase LimitNOFILE in the systemd unit file:
https://www.freedesktop.org/software/systemd/man/systemd.exec.html

[Service]
LimitNOFILE=99
.
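
Before changing the unit file it may help to check which limit is actually in effect. The Python sketch below is an illustration only: it parses the "Max open files" row of a Linux `/proc/<pid>/limits` file and, for comparison, reads the limit of the current process via the standard `resource` module. The helper names and the sample text are made up for this example.

```python
import re
import resource

# The row of /proc/<pid>/limits that holds the soft and hard open-file limits.
LIMIT_RE = re.compile(r"^Max open files\s+(\S+)\s+(\S+)", re.MULTILINE)


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits; None means unlimited."""
    match = LIMIT_RE.search(limits_text)
    if match is None:
        raise ValueError("no 'Max open files' row found")

    def to_int(value):
        return None if value == "unlimited" else int(value)

    return to_int(match.group(1)), to_int(match.group(2))


def own_nofile_limit():
    """Return (soft, hard) RLIMIT_NOFILE of the process running this script."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


# Made-up sample in the layout the Linux kernel uses for /proc/<pid>/limits.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             63431                63431                processes
Max open files            1024                 4096                 files
Max locked memory         65536                65536                bytes
"""

if __name__ == "__main__":
    print(parse_max_open_files(SAMPLE_LIMITS))
    print(own_nofile_limit())
```

For a running Kamailio process, the same parser could be fed the contents of `/proc/<pid>/limits`, and the number of entries in `/proc/<pid>/fd` shows how close the process is to the soft limit.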

Sergey



Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-02-27 Thread Ivaylo Markov
Hello,

I believe this issue is related -
https://github.com/kamailio/kamailio/issues/1172. We encountered the
problem before and the solution is to link kamailio-tls-modules with
libssl1.0.X instead of libssl1.1.



Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-02-27 Thread Jurijs Ivolga
Hi,

Just to add that in my case, after some period of time with a lot of TLS
clients (100k+), I got a lot of TCP connections in CLOSE_WAIT state. When
the number of connections in CLOSE_WAIT state exceeded 1k, Kamailio stopped
receiving traffic via TLS, while UDP kept working fine at the same time.
From my point of view it looked like there was an issue somewhere on the
Linux side, because Kamailio never got anything... At least this is what I
remember... I still plan to work on it someday. :) And if I find out, I'll
let you know.
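
To watch for this condition, the output of `netstat -ntp` can be summarized per connection state, together with the sockets whose Recv-Q is not draining. The following is a minimal Python sketch under the assumption that the output has the usual column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name). `summarize_netstat` is a hypothetical helper, and the remote addresses in the sample are made up.

```python
from collections import Counter


def summarize_netstat(output, recvq_threshold=0):
    """Return (state counts, sockets with Recv-Q above the threshold, largest first)."""
    states = Counter()
    backlog = []
    for line in output.splitlines():
        fields = line.split()
        # Skip banner and header lines and anything that is not a TCP socket row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        try:
            recv_q = int(fields[1])
        except ValueError:
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        states[state] += 1
        if recv_q > recvq_threshold:
            backlog.append((recv_q, local, remote, state))
    backlog.sort(reverse=True)
    return states, backlog


# Made-up sample rows in netstat -ntp layout.
SAMPLE_NETSTAT = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address        Foreign Address      State       PID/Program name
tcp     4566      0 172.17.217.12:5060   198.51.100.7:57252   ESTABLISHED 31347/kamailio
tcp        0      0 172.17.217.12:5061   198.51.100.8:40112   CLOSE_WAIT  31350/kamailio
tcp        1      0 172.17.217.12:5061   198.51.100.9:40113   CLOSE_WAIT  31350/kamailio
"""

if __name__ == "__main__":
    counts, stuck = summarize_netstat(SAMPLE_NETSTAT)
    print(dict(counts))
    print(stuck)
```

Run periodically, a growing CLOSE_WAIT count or a Recv-Q that only increases would show the problem building up before TLS traffic stops completely.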

Jurijs




Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-02-27 Thread Kristijan Vrban
When I attach strace to the Kamailio process that is bound to the TCP
port, I sporadically get this:

[], 46, 5000) = 0
epoll_wait(17, [{EPOLLIN, {u32=2692971064, u64=139924137540152}}], 46, 5000) = 1
accept(14, {sa_family=AF_INET, sin_port=htons(59766), sin_addr=inet_addr("xxx.xx.xxx.xxx")}, [28->16]) = 275
fcntl(275, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(275, F_SETFL, O_RDWR|O_NONBLOCK) = 0
epoll_ctl(17, EPOLL_CTL_ADD, 275, {EPOLLIN|EPOLLRDHUP, {u32=2692977328, u64=139924137546416}}) = 0
epoll_wait(17, [{EPOLLIN, {u32=2692977328, u64=139924137546416}}], 47, 5000) = 1
epoll_ctl(17, EPOLL_CTL_DEL, 275, 0x7ffdae44ee4c) = 0
recvmsg(53, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(56, 0x7ffdae44ed90, 16, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(56, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\210ku\230B\177\0\0", iov_len=8}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[275]}], msg_controllen=20, msg_flags=0}, 0) = 8
epoll_wait(17,

But that's all, no further processing by Kamailio.
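
For readers unfamiliar with the `sendmsg(... SCM_RIGHTS, cmsg_data=[275] ...)` call in the trace: that is how an open file descriptor is handed to another process over a Unix socket, here apparently the accepted connection being passed on to a TCP worker. The Python sketch below only illustrates the mechanism and is not Kamailio code. It runs in a single process, uses a pipe as a stand-in for the accepted connection, and the 8-byte payload `conn-ptr` is a placeholder for whatever the 8 bytes in the trace contain. It needs Python 3.9 or later on a Unix system for `socket.send_fds` and `socket.recv_fds`.

```python
import os
import socket


def pass_fd_demo():
    """Send a descriptor over a Unix socket pair and read through the received copy."""
    # Stand-in for the channel between the dispatching process and a worker.
    main_end, worker_end = socket.socketpair()
    # Stand-in for the accepted TCP connection.
    read_fd, write_fd = os.pipe()
    try:
        # Dispatcher side: 8 bytes of payload plus the descriptor as SCM_RIGHTS data.
        socket.send_fds(main_end, [b"conn-ptr"], [read_fd])

        # Worker side: receive the payload and a new descriptor for the same pipe.
        payload, fds, _flags, _addr = socket.recv_fds(worker_end, 8, 1)
        received_fd = fds[0]

        # Data written to the pipe is readable through the received descriptor.
        os.write(write_fd, b"REGISTER")
        data = os.read(received_fd, 8)
        os.close(received_fd)
        return payload, data
    finally:
        os.close(read_fd)
        os.close(write_fd)
        main_end.close()
        worker_end.close()


if __name__ == "__main__":
    print(pass_fd_demo())
```

The point for this thread is that the hand-off succeeds on the sending side (`sendmsg` returns 8) regardless of whether the receiver ever reads from the descriptor, so a blocked worker would not show up as an error in this trace.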



Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

2019-02-27 Thread Jurijs Ivolga
Hi,

I experienced something similar on Debian Stretch, whereas on Debian
Jessie it worked fine. We use TLS and I was thinking that it has something
to do with the SSL libraries, but never had the chance to find out. But
maybe my problem had nothing to do with what you just described.

Jurijs


On Wed, Feb 27, 2019 at 12:54 PM Kristijan Vrban wrote:

> Hi kamailios,
>
> I have a creepy situation with Kamailio v5.2.1 stable. After a day or
> so, Kamailio stops processing incoming SIP traffic via TCP. The
> incoming TCP packets get a TCP ACK from the OS (Debian 9,
> 4.18.0-15-generic-Linux), but Kamailio does not show any processing for
> the SIP traffic incoming via TCP. No logs, nothing. Meanwhile, traffic
> via UDP is working totally fine.
>
> When I look via the command "netstat -ntp", I see that the Recv-Q gets
> bigger and bigger, e.g.:
>
> Proto Recv-Q Send-Q Local Address       Foreign Address        State        PID/Program name
> tcp     4566      0 172.17.217.12:5060  xxx.xxx.xxx.xxx:57252  ESTABLISHED  31347/kamailio
>
> After a Kamailio restart, everything works fine again for a day. We have
> maybe 10-20 devices online via TCP and low call volume (1-2 calls per
> minute). The only TCP setting we have is "tcp_delayed_ack=no".
>
> How could we debug this situation? Again, no errors, no warnings in
> the log. Just nothing.
>
> Kristijan