Re: [OpenSIPS-Users] Rest Client Async operation

2019-06-27 Thread Ben Newlin
Liviu,

Thank you. This makes complete sense now and the documentation updates are 
excellent. Thanks as well for the workaround suggestion; we will investigate 
whether it is necessary. You do make a good point in the docs about TCP 
connection reuse in the workers, which shows my example as a bit of an extreme 
case. In practical use it is more likely the connection would exist and the 
transfer would time out instead, which would be async.

Even though the example may be extreme, in the current world of cloud computing 
having the remote server “disappear” without closing TCP connections is 
becoming a much more common concern. Cloud instances can and do just disappear 
and the connections will hang; cloud networks have issues and can leave 
connections hung as well. We have been wrestling with these issues with the 
db_postgres and db_virtual modules for some time and have an open ticket for it 
now. While it is impossible to avoid some service impairment when a necessary 
remote component is no longer available, having the entire call processing 
application block on the resource is also not a very good response. We are 
always looking for better ways to gracefully handle the unexpected.
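One socket-level mitigation for peers that vanish without closing the connection is TCP keepalive, which bounds how long a dead connection can linger. A minimal Python sketch, not an OpenSIPS feature; the tuning values are illustrative and the `TCP_KEEP*` constants are Linux-specific:

```python
import socket

def keepalive_socket() -> socket.socket:
    """TCP socket that probes idle peers, so a vanished remote host
    eventually errors out the connection instead of hanging forever."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific tuning (guarded, since the constants are optional):
    # first probe after 30s idle, then every 10s, drop after 3 misses.
    if hasattr(socket, "TCP_KEEPIDLE"):
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30)
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)
    return s
```

This does not help with a hanging connect, only with established connections whose peer disappears.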

Thanks for the time and quick responses!

Ben Newlin

From: Users  on behalf of Liviu Chircu 

Reply-To: OpenSIPS users mailing list 
Date: Thursday, June 27, 2019 at 5:05 AM
To: OpenSIPS users mailing list 
Subject: Re: [OpenSIPS-Users] Rest Client Async operation


On 26.06.2019 17:48, Ben Newlin wrote:
Thanks again for the info. I think what you are saying is that the async 
operation is not launching a new process to handle the called function, but is 
performing the function in the original worker thread and only taking advantage 
of any suspend/resume or polling functionality already exposed by the 
underlying function itself.
Just to clear this up:  the underlying functions themselves need not offer any 
polling functionality,
they just need to meet two criteria: (1) be non-blocking; (2) provide a valid 
fd for the async engine to poll on.
Any blocking I/O functionality, be it within libcurl, MySQL, etc. that meets 
the above can be adapted to
work with the async engine available in OpenSIPS 2.1+.
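The two criteria can be illustrated with a toy event loop in Python; the `selectors` module stands in for the epoll-based async engine, and all names here are illustrative, not OpenSIPS internals:

```python
import selectors
import socket

# Toy stand-in for the async engine: anything that is (1) non-blocking
# and (2) exposes a valid fd can be multiplexed on a single poller.
sel = selectors.DefaultSelector()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.socket()
client.setblocking(False)                   # criterion (1): non-blocking
client.connect_ex(listener.getsockname())   # returns immediately

sel.register(client, selectors.EVENT_WRITE) # criterion (2): pollable fd
events = sel.select(timeout=2)              # worker is free until ready
client.close()
listener.close()
sel.close()
```

When the fd becomes writable, the engine resumes the suspended work; the worker never sits blocked inside the I/O call itself.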

I understand that the practicalities of the implementation in OpenSIPS may have 
required this design, but I must re-iterate that these limitations need to be 
documented very carefully as they are very important to understand when 
designing OpenSIPS scripts with async functionality and are not described 
anywhere. I could not find anywhere in the documentation that indicates that 
async operations can potentially still block the original worker thread and 
block call processing. The closest is:

“The current OpenSIPS worker will launch the asynchronous operation, after 
which it will continue to process other pending tasks”

But this provides no elaboration on what it means for the worker to “launch” 
the operation, and more importantly it does not indicate that the launching 
itself can block, which is the key issue here.
Agreed - will try to find a way to integrate this corner-case into the docs, 
somehow.

As I said, this unfortunately makes async processing mostly useless for us. For 
both DB and REST queries if only the data transfer is async then it is only 
useful when the data being transferred is extremely large or prone to 
delays/jitter. Such transfers should be avoided during realtime processing 
whether async or not, as they will still delay the individual call even if not 
others. For small payloads, like the single JSON object common in REST 
responses, it is the connection itself that is the concern. Once connected, 
running the data transfer in async mode represents no real gain.
Then I recommend you stop using rest_client, which currently optimizes system 
resource
usage for setups where the TCP connect cannot possibly ever hang, and resort to 
forking
a process for each HTTP request, using a construct such as:

async(exec("curl your_connect_hanging_http_endpoint"), resume_route);

Although forking a process for each request is a costly operation which will
eat more system resources during normal operation, at least this solution 
optimizes
for the worst case, when the HTTP server is down.  In this latter case, the
throughput of your SIP server won't be hindered that much, as the hanging 
connect
will be done asynchronously.
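The fork-per-request pattern this workaround relies on can be sketched in Python; `subprocess` stands in for the script's exec(), and the stand-in child below is hypothetical (a real setup would exec curl as in the construct above):

```python
import subprocess
import sys

def launch_request(cmd):
    """Fork a child per request: the parent returns immediately, so a
    hanging connect stalls only the child, never the SIP worker."""
    return subprocess.Popen(cmd, stdout=subprocess.PIPE)

# Stand-in child that just emits a JSON body (a real one would exec curl).
child = launch_request([sys.executable, "-c", "print('{}')"])

# The "resume route" equivalent: collect the result once the child exits.
body, _ = child.communicate(timeout=5)
```

The trade-off is exactly as described: a process per request is expensive in the common case, but a hung endpoint can no longer pin down a worker.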

Best regards,

Liviu Chircu

OpenSIPS Developer

http://www.opensips-solutions.com
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Rest Client Async operation

2019-06-27 Thread Liviu Chircu
I have updated the async docs [1], so everyone can read more about this 
limitation,

along with a mitigation solution for it.

Cheers,

[1]: https://www.opensips.org/Documentation/Script-Async-3-0#toc8

Liviu Chircu
OpenSIPS Developer
http://www.opensips-solutions.com




Re: [OpenSIPS-Users] Rest Client Async operation

2019-06-27 Thread Liviu Chircu


On 26.06.2019 17:48, Ben Newlin wrote:


Thanks again for the info. I think what you are saying is that the 
async operation is not launching a new process to handle the called 
function, but is performing the function in the original worker thread 
and only taking advantage of any suspend/resume or polling 
functionality already exposed by the underlying function itself.


Just to clear this up:  the underlying functions themselves need not 
offer any polling functionality,
they just need to meet two criteria: (1) be non-blocking; (2) provide a 
valid fd for the async engine to poll on.


Any blocking I/O functionality, be it within libcurl, MySQL, etc. that 
meets the above can be adapted to

work with the async engine available in OpenSIPS 2.1+.

I understand that the practicalities of the implementation in OpenSIPS 
may have required this design, but I must re-iterate that these 
limitations need to be documented very carefully as they are very 
important to understand when designing OpenSIPS scripts with async 
functionality and are not described anywhere. I could not find 
anywhere in the documentation that indicates that async operations can 
potentially still block the original worker thread and block call 
processing. The closest is:


“The current OpenSIPS worker will launch the asynchronous operation, 
after which it will continue to process other pending tasks”


But this provides no elaboration on what it means for the worker to 
“launch” the operation, and more importantly it does not indicate that 
the launching itself can block, which is the key issue here.


Agreed - will try to find a way to integrate this corner-case into the 
docs, somehow.


As I said, this unfortunately makes async processing mostly useless 
for us. For both DB and REST queries if only the data transfer is 
async then it is only useful when the data being transferred is 
extremely large or prone to delays/jitter. Such transfers should be 
avoided during realtime processing whether async or not, as they will 
still delay the individual call even if not others. For small 
payloads, like the single JSON object common in REST responses, it is 
the connection itself that is the concern. Once connected, running the 
data transfer in async mode represents no real gain.


Then I recommend you stop using rest_client, which currently optimizes 
system resource
usage for setups where the TCP connect cannot possibly ever hang, and 
resort to forking

a process for each HTTP request, using a construct such as:

    async(exec("curl your_connect_hanging_http_endpoint"), resume_route);

Although forking a process for each request is a costly operation which will
eat more system resources during normal operation, at least this 
solution optimizes

for the worst case, when the HTTP server is down.  In this latter case, the
throughput of your SIP server won't be hindered that much, as the 
hanging connect

will be done asynchronously.

Best regards,

Liviu Chircu
OpenSIPS Developer
http://www.opensips-solutions.com



Re: [OpenSIPS-Users] Rest Client Async operation

2019-06-26 Thread Ben Newlin
Liviu,

Thanks again for the info. I think what you are saying is that the async 
operation is not launching a new process to handle the called function, but is 
performing the function in the original worker thread and only taking advantage 
of any suspend/resume or polling functionality already exposed by the 
underlying function itself.

This is very different from the way I had hoped async would work. The 
documentation does not speak to the specific implementation of async, so it was 
all assumptions on my part. But to me when you say async it means a separate 
process or thread is being created to perform the function. This would allow 
that process to block on the connect or any other aspect of the called function 
as necessary without blocking call processing, which is the desired outcome of 
any async operation.

I understand that the practicalities of the implementation in OpenSIPS may have 
required this design, but I must re-iterate that these limitations need to be 
documented very carefully as they are very important to understand when 
designing OpenSIPS scripts with async functionality and are not described 
anywhere. I could not find anywhere in the documentation that indicates that 
async operations can potentially still block the original worker thread and 
block call processing. The closest is:

“The current OpenSIPS worker will launch the asynchronous operation, after 
which it will continue to process other pending tasks”

But this provides no elaboration on what it means for the worker to “launch” 
the operation, and more importantly it does not indicate that the launching 
itself can block, which is the key issue here.

As I said, this unfortunately makes async processing mostly useless for us. For 
both DB and REST queries if only the data transfer is async then it is only 
useful when the data being transferred is extremely large or prone to 
delays/jitter. Such transfers should be avoided during realtime processing 
whether async or not, as they will still delay the individual call even if not 
others. For small payloads, like the single JSON object common in REST 
responses, it is the connection itself that is the concern. Once connected, 
running the data transfer in async mode represents no real gain.

As far as investigating why the server is not responding, of course we will 
always do this. But we cannot anticipate or fix every cause of remote system 
failure. In order to design a resilient system we cannot assume the remote 
server will always be there. We had believed async would allow call processing 
to proceed in failure cases like this without blocking.

Ben Newlin

From: Users  on behalf of Liviu Chircu 

Reply-To: OpenSIPS users mailing list 
Date: Wednesday, June 26, 2019 at 9:23 AM
To: "users@lists.opensips.org" 
Subject: Re: [OpenSIPS-Users] Rest Client Async operation


It's the same process doing both the connect and the transfer.  The problem is 
that libcurl, as it stands now,
is not able to give me a file descriptor to poll on, until it connects [1].  
See this section:

"When libcurl returns -1 in max_fd, it is because libcurl currently does 
something that isn't possible
for your application to monitor with a socket and unfortunately you can then 
not know exactly when the
current action is completed using select(). You then need to wait a while 
before you proceed and call
curl_multi_perform anyway. How long to wait? Unless curl_multi_timeout gives 
you a lower number, we
suggest 100 milliseconds or so, but you may want to test it out in your own 
particular conditions to
find a suitable value."
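The fallback the libcurl docs describe (no fd to poll on yet, so wait ~100 ms and call curl_multi_perform again) amounts to a loop like the following Python sketch, where `perform` and `get_fd` are hypothetical stand-ins for curl_multi_perform and curl_multi_fdset:

```python
import select
import time

def drive_transfer(perform, get_fd, timeout=5.0, fallback=0.1):
    """Advance a transfer whose library may not expose an fd yet.

    perform() advances the transfer and returns True when it is done;
    get_fd() returns a pollable fd, or None (libcurl's max_fd == -1 case).
    Both callables are stand-ins, not real libcurl bindings.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if perform():
            return True
        fd = get_fd()
        if fd is None:
            time.sleep(fallback)                     # the ~100 ms blind wait
        else:
            select.select([fd], [fd], [], fallback)  # real fd: poll instead
    return False
```

The blind-wait branch is precisely why the pre-connect phase cannot be made event-driven: without an fd there is nothing for the engine to block on.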

Regarding your issue: I would investigate further the reason why the connect is 
hanging, and not getting
rejected immediately when your server is down.  That would solve all your 
blocking issues.

The same with MySQL connections which go down: only after the connection is up 
are we able to obtain
its file descriptor to asynchronously poll on.  So if connect to DB_HOST:3306 
hangs, so will OpenSIPS.
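The contrast can be seen with plain sockets, where (unlike the libcurl and MySQL client APIs described here) the fd exists before the handshake completes: a non-blocking connect() returns at once and the fd can be polled for writability. A Python sketch:

```python
import errno
import select
import socket

# A local listener so the connect attempt has somewhere to go.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)

s = socket.socket()
s.setblocking(False)
rc = s.connect_ex(listener.getsockname())       # returns immediately,
                                                # typically EINPROGRESS

_, writable, _ = select.select([], [s], [], 2)  # poll instead of hanging
err = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)  # 0 == connected
s.close()
listener.close()
```

When the client library hides the socket until the connect succeeds, this pre-connect polling is simply unavailable to the caller, which is the root of the blocking behavior discussed in this thread.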

Regards,

[1]: 
https://curl.haxx.se/libcurl/c/curl_multi_fdset.html

Liviu Chircu

OpenSIPS Developer

http://www.opensips-solutions.com
On 25.06.2019 18:41, Ben Newlin wrote:
but I guess my question would be why isn’t the entire operation run in async? 
Why must the connect be performed in the current process and only the transfer 
be in another process?


Re: [OpenSIPS-Users] Rest Client Async operation

2019-06-26 Thread Liviu Chircu

That is exactly why we need to extract the connection/transfer fd in the
first place: so we can throw it into epoll_wait() and have it act as
an event generator :)

Liviu Chircu
OpenSIPS Developer
http://www.opensips-solutions.com

On 26.06.2019 16:38, SamyGo wrote:

once we somehow get the event from libcurl




Re: [OpenSIPS-Users] Rest Client Async operation

2019-06-26 Thread SamyGo
Hi Liviu,

Is it possible to suspend the transaction and resume it once we somehow get
the event from libcurl as soon as the connect is done?
I'm imagining something like the way APNS is usually done these days. The only thing
missing here is the event from rest_client!

Can this mechanism help with the problem of backlog on the OpenSIPS thread?

Best Regards,
Sammy


On Wed., Jun. 26, 2019, 9:23 a.m. Liviu Chircu,  wrote:

> It's the same process doing both the connect and the transfer.  The
> problem is that libcurl, as it stands now,
> is not able to give me a file descriptor to poll on, until it connects
> [1].  See this section:
>
> "When libcurl returns -1 in max_fd, it is because libcurl currently does
> something that isn't possible
> for your application to monitor with a socket and unfortunately you can
> then not know exactly when the
> current action is completed using select(). You then need to wait a while
> before you proceed and call
> curl_multi_perform anyway. How long to wait? Unless curl_multi_timeout
> gives you a lower number, we
> suggest 100 milliseconds or so, but you may want to test it out in your
> own particular conditions to
> find a suitable value."
>
> Regarding your issue: I would investigate further the reason why the
> connect is hanging, and not getting
> rejected immediately when your server is down.  That would solve all your
> blocking issues.
>
> The same with MySQL connections which go down: only after the connection
> is up are we able to obtain
> its file descriptor to asynchronously poll on.  So if connect to
> DB_HOST:3306 hangs, so will OpenSIPS.
>
> Regards,
>
> [1]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html
>
> Liviu Chircu
> OpenSIPS Developer
> http://www.opensips-solutions.com
>
> On 25.06.2019 18:41, Ben Newlin wrote:
>
> but I guess my question would be why isn’t the entire operation run in
> async? Why must the connect be performed in the current process and only
> the transfer be in another process?
>


Re: [OpenSIPS-Users] Rest Client Async operation

2019-06-26 Thread Liviu Chircu
It's the same process doing both the connect and the transfer.  The 
problem is that libcurl, as it stands now,
is not able to give me a file descriptor to poll on, until it connects 
[1].  See this section:


"When libcurl returns -1 in max_fd, it is because libcurl currently does 
something that isn't possible
for your application to monitor with a socket and unfortunately you can 
then not know exactly when the
current action is completed using select(). You then need to wait a 
while before you proceed and call
curl_multi_perform anyway. How long to wait? Unless curl_multi_timeout 
gives you a lower number, we
suggest 100 milliseconds or so, but you may want to test it out in your 
own particular conditions to

find a suitable value."

Regarding your issue: I would investigate further the reason why the 
connect is hanging, and not getting
rejected immediately when your server is down.  That would solve all 
your blocking issues.


The same with MySQL connections which go down: only after the connection 
is up are we able to obtain
its file descriptor to asynchronously poll on.  So if connect to 
DB_HOST:3306 hangs, so will OpenSIPS.


Regards,

[1]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html

Liviu Chircu
OpenSIPS Developer
http://www.opensips-solutions.com

On 25.06.2019 18:41, Ben Newlin wrote:
but I guess my question would be why isn’t the entire operation run in 
async? Why must the connect be performed in the current process and 
only the transfer be in another process?


Re: [OpenSIPS-Users] Rest Client Async operation

2019-06-25 Thread Ben Newlin
Liviu,

Thank you for the response and explanation, and for investigating the timing 
discrepancies. I must say though that this answer is very disappointing. We are 
not transferring large files or data sets during realtime call processing (I 
hope no one is!), so the transfer time is not really a concern; it is the 
connect time that can cause severe backlogs when the far end is not available, 
as in my test case. So as it stands the async functionality for rest_client is 
mostly useless to us as it doesn’t prevent the most common cause of congestion.

I can appreciate that libcurl does not provide access during the connect, but I 
guess my question would be why isn’t the entire operation run in async? Why 
must the connect be performed in the current process and only the transfer be 
in another process? My expectation when using any async function was that all 
execution of the underlying function called would be performed in a separate 
process, not that the function could pick and choose which parts would block 
and which would not.

Are other async function implementations similar to this? For example, does an 
async_db_query establish the DB connection in the current thread in blocking 
mode and only perform the actual query as async?

I would also recommend updating the documentation for both async and 
rest_client specifically to make this limitation clear, as it was not at all 
clear to us.

Ben Newlin

From: Users  on behalf of Liviu Chircu 

Reply-To: OpenSIPS users mailing list 
Date: Tuesday, June 25, 2019 at 8:52 AM
To: "users@lists.opensips.org" 
Subject: Re: [OpenSIPS-Users] Rest Client Async operation


Hi Ben,

Regarding the "blocking rest_get()" problem: due to the specifics of the 
libcurl API, there is
no direct access to the socket file descriptors during the initial TCP connect 
handshake.  Thus,
this phase will be synchronous (blocking), while all transfers (send HTTP req, 
receive reply) are
done asynchronously, regardless of how the data gets sent or arrives (all at 
once, in chunks,
byte-by-byte, etc.).

So, indeed: on the specific test case that the remote TCP endpoint is either 
unresponsive or it is
blocked by a firewall, the throughput of your OpenSIPS may be significantly 
affected when talking HTTP
to such peers even if you perform async HTTP requests.  Maybe this is just my 
opinion, but I believe
that the average case is when the HTTP server/API is down and the TCP connect 
attempt will get rejected.
See [1] for more on this topic.

Regarding the 5s vs. 8s timeouts, I will have to confirm the behavior first and 
get back to you.
Async is heavily reliant on tm, so I wouldn't exclude the possibility that the 
transaction timeout
overrides, in some cases, the async logic's timeout.

Best regards,

[1]: 
https://serverfault.com/questions/521359/why-do-some-connections-time-out-and-others-get-refused

Liviu Chircu

OpenSIPS Developer

http://www.opensips-solutions.com
On 25.06.2019 00:33, Ben Newlin wrote:
Hello all,

During testing I have found behavior that I cannot explain based on our current 
understanding of the operation of the async and rest_client functionality. I 
have reproduced this behavior with a fairly simple script configuration, the 
relevant piece of which is below:

children=8

loadmodule "rest_client.so"
modparam("rest_client", "connection_timeout", 5)
modparam("rest_client", "connect_poll_interval", 1)
modparam("rest_client", "curl_timeout", 5)

route {
  xlog("L_ALERT", "before t_newtran()\r\n");
  t_newtran();
  xlog("L_ALERT", "after t_newtran()\r\n");
  async(rest_post("http://0.0.0.0", "{}", "application/json", 
"$var(body)", "$var(ctype)", "$var(rcode)"), test_resume);
  xlog("L_ALERT", "after async\r\n");
}
route[test_resume] {
  xlog("L_ALERT", "test_resume: $retcode\r\n");
  t_reply("503", "Service Unavailable");
}

The expected operation is to immediately trigger t_newtran() which sends a 100 
response, then time out on the rest_post call in 5 seconds and respond with a 
503. For individual calls this does occur, although the timeout consistently 
takes several seconds too long, usually ~8s, as can be seen in the logs from 
the above code:

Jun 24 19:10:56 [337] before t_newtran()
Jun 24 19:10:56 [337] after t_newtran()
Jun 24 19:11:04 [337] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 19:11:04 [337] test_resume: -2

The extra time is troubling, but the bigger concern is that when I run multiple 
concurrent calls, I find that the 8 UDP children are still all blocking against 
the rest_post call.

Re: [OpenSIPS-Users] Rest Client Async operation

2019-06-25 Thread Liviu Chircu

Hi Ben,

Regarding the "blocking rest_get()" problem: due to the specifics of the 
libcurl API, there is
no direct access to the socket file descriptors during the initial TCP 
connect handshake.  Thus,
this phase will be synchronous (blocking), while all transfers (send 
HTTP req, receive reply) are
done asynchronously, regardless of how the data gets sent or arrives 
(all at once, in chunks,

byte-by-byte, etc.).

So, indeed: on the specific test case that the remote TCP endpoint is 
either unresponsive or it is
blocked by a firewall, the throughput of your OpenSIPS may be 
significantly affected when talking HTTP
to such peers even if you perform async HTTP requests.  Maybe this is 
just my opinion, but I believe
that the average case is when the HTTP server/API is down and the TCP 
connect attempt will get rejected.

See [1] for more on this topic.

Regarding the 5s vs. 8s timeouts, I will have to confirm the behavior 
first and get back to you.
Async is heavily reliant on tm, so I wouldn't exclude the possibility 
that the transaction timeout

overrides, in some cases, the async logic's timeout.

Best regards,

[1]: 
https://serverfault.com/questions/521359/why-do-some-connections-time-out-and-others-get-refused


Liviu Chircu
OpenSIPS Developer
http://www.opensips-solutions.com

On 25.06.2019 00:33, Ben Newlin wrote:


Hello all,

During testing I have found behavior that I cannot explain based on 
our current understanding of the operation of the async and 
rest_client functionality. I have reproduced this behavior with a 
fairly simple script configuration, the relevant piece of which is below:


children=8

loadmodule "rest_client.so"

modparam("rest_client", "connection_timeout", 5)

modparam("rest_client", "connect_poll_interval", 1)

modparam("rest_client", "curl_timeout", 5)

route {

xlog("L_ALERT", "before t_newtran()\r\n");

t_newtran();

xlog("L_ALERT", "after t_newtran()\r\n");

async(rest_post("http://0.0.0.0", "{}", "application/json", 
"$var(body)", "$var(ctype)", "$var(rcode)"), test_resume);


xlog("L_ALERT", "after async\r\n");

}

route[test_resume] {

xlog("L_ALERT", "test_resume: $retcode\r\n");

t_reply("503", "Service Unavailable");

}

The expected operation is to immediately trigger t_newtran() which 
sends a 100 response, then time out on the rest_post call in 5 seconds 
and respond with a 503. For individual calls this does occur, although 
the timeout consistently takes several seconds too long, usually ~8s, 
as can be seen in the logs from the above code:


Jun 24 19:10:56 [337] before t_newtran()

Jun 24 19:10:56 [337] after t_newtran()

Jun 24 19:11:04 [337] ERROR:rest_client:start_async_http_req: connect 
timeout on http://0.0.0.0 (5s)


Jun 24 19:11:04 [337] test_resume: -2

The extra time is troubling, but the bigger concern is that when I run 
multiple concurrent calls, I find that the 8 UDP children are still 
all blocking against the rest_post call. Here are the logs for that test:


Jun 24 21:19:51 [337] before t_newtran()

Jun 24 21:19:51 [337] after t_newtran()

Jun 24 21:19:51 [338] before t_newtran()

Jun 24 21:19:51 [338] after t_newtran()

Jun 24 21:19:51 [335] before t_newtran()

Jun 24 21:19:51 [335] after t_newtran()

Jun 24 21:19:51 [333] before t_newtran()

Jun 24 21:19:51 [333] after t_newtran()

Jun 24 21:19:52 [331] before t_newtran()

Jun 24 21:19:52 [331] after t_newtran()

Jun 24 21:19:52 [336] before t_newtran()

Jun 24 21:19:52 [336] after t_newtran()

Jun 24 21:19:52 [332] before t_newtran()

Jun 24 21:19:52 [332] after t_newtran()

Jun 24 21:19:52 [334] before t_newtran()

Jun 24 21:19:52 [334] after t_newtran()

Jun 24 21:19:59 [337] ERROR:rest_client:start_async_http_req: connect 
timeout on http://0.0.0.0 (5s)


Jun 24 21:19:59 [337] test_resume: -2

Jun 24 21:19:59 [337] before t_newtran()

Jun 24 21:19:59 [337] after t_newtran()

Jun 24 21:19:59 [338] ERROR:rest_client:start_async_http_req: connect 
timeout on http://0.0.0.0 (5s)


Jun 24 21:19:59 [338] test_resume: -2

Jun 24 21:19:59 [338] before t_newtran()

Jun 24 21:19:59 [338] after t_newtran()

Jun 24 21:19:59 [335] ERROR:rest_client:start_async_http_req: connect 
timeout on http://0.0.0.0 (5s)


Jun 24 21:19:59 [335] test_resume: -2

Jun 24 21:19:59 [335] before t_newtran()

Jun 24 21:19:59 [335] after t_newtran()

Jun 24 21:19:59 [333] ERROR:rest_client:start_async_http_req: connect 
timeout on http://0.0.0.0 (5s)


Jun 24 21:19:59 [333] test_resume: -2

Jun 24 21:19:59 [333] before t_newtran()

Jun 24 21:19:59 [333] after t_newtran()

Jun 24 21:19:59 [331] ERROR:rest_client:start_async_http_req: connect 
timeout on http://0.0.0.0 (5s)


Jun 24 21:19:59 [331] test_resume: -2

Jun 24 21:19:59 [331] before t_newtran()

Jun 24 21:19:59 [331] after t_newtran()

Jun 24 21:19:59 [336] ERROR:rest_client:start_async_http_req: connect 
timeout on http://0.0.0.0 (5s)


Jun 24 21:19:59 [336] test_resume: -2

Jun 24 21:19:59 [336] before t_newtran()

Jun 24 21:19:59 [336] after t_newtran()

[OpenSIPS-Users] Rest Client Async operation

2019-06-24 Thread Ben Newlin
Hello all,

During testing I have found behavior that I cannot explain based on our current 
understanding of the operation of the async and rest_client functionality. I 
have reproduced this behavior with a fairly simple script configuration, the 
relevant piece of which is below:

children=8

loadmodule "rest_client.so"
modparam("rest_client", "connection_timeout", 5)
modparam("rest_client", "connect_poll_interval", 1)
modparam("rest_client", "curl_timeout", 5)

route {
  xlog("L_ALERT", "before t_newtran()\r\n");
  t_newtran();
  xlog("L_ALERT", "after t_newtran()\r\n");
  async(rest_post("http://0.0.0.0", "{}", "application/json", "$var(body)", 
"$var(ctype)", "$var(rcode)"), test_resume);
  xlog("L_ALERT", "after async\r\n");
}
route[test_resume] {
  xlog("L_ALERT", "test_resume: $retcode\r\n");
  t_reply("503", "Service Unavailable");
}

The expected operation is to immediately trigger t_newtran() which sends a 100 
response, then time out on the rest_post call in 5 seconds and respond with a 
503. For individual calls this does occur, although the timeout consistently 
takes several seconds too long, usually ~8s, as can be seen in the logs from 
the above code:

Jun 24 19:10:56 [337] before t_newtran()
Jun 24 19:10:56 [337] after t_newtran()
Jun 24 19:11:04 [337] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 19:11:04 [337] test_resume: -2

The extra time is troubling, but the bigger concern is that when I run multiple 
concurrent calls, I find that the 8 UDP children are still all blocking against 
the rest_post call. Here are the logs for that test:

Jun 24 21:19:51 [337] before t_newtran()
Jun 24 21:19:51 [337] after t_newtran()
Jun 24 21:19:51 [338] before t_newtran()
Jun 24 21:19:51 [338] after t_newtran()
Jun 24 21:19:51 [335] before t_newtran()
Jun 24 21:19:51 [335] after t_newtran()
Jun 24 21:19:51 [333] before t_newtran()
Jun 24 21:19:51 [333] after t_newtran()
Jun 24 21:19:52 [331] before t_newtran()
Jun 24 21:19:52 [331] after t_newtran()
Jun 24 21:19:52 [336] before t_newtran()
Jun 24 21:19:52 [336] after t_newtran()
Jun 24 21:19:52 [332] before t_newtran()
Jun 24 21:19:52 [332] after t_newtran()
Jun 24 21:19:52 [334] before t_newtran()
Jun 24 21:19:52 [334] after t_newtran()
Jun 24 21:19:59 [337] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:19:59 [337] test_resume: -2
Jun 24 21:19:59 [337] before t_newtran()
Jun 24 21:19:59 [337] after t_newtran()
Jun 24 21:19:59 [338] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:19:59 [338] test_resume: -2
Jun 24 21:19:59 [338] before t_newtran()
Jun 24 21:19:59 [338] after t_newtran()
Jun 24 21:19:59 [335] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:19:59 [335] test_resume: -2
Jun 24 21:19:59 [335] before t_newtran()
Jun 24 21:19:59 [335] after t_newtran()
Jun 24 21:19:59 [333] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:19:59 [333] test_resume: -2
Jun 24 21:19:59 [333] before t_newtran()
Jun 24 21:19:59 [333] after t_newtran()
Jun 24 21:19:59 [331] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:19:59 [331] test_resume: -2
Jun 24 21:19:59 [331] before t_newtran()
Jun 24 21:19:59 [331] after t_newtran()
Jun 24 21:19:59 [336] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:19:59 [336] test_resume: -2
Jun 24 21:19:59 [336] before t_newtran()
Jun 24 21:19:59 [336] after t_newtran()
Jun 24 21:19:59 [332] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:19:59 [332] test_resume: -2
Jun 24 21:19:59 [332] before t_newtran()
Jun 24 21:19:59 [332] after t_newtran()
Jun 24 21:20:00 [334] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:20:00 [334] test_resume: -2
Jun 24 21:20:00 [334] before t_newtran()
Jun 24 21:20:00 [334] after t_newtran()
Jun 24 21:20:07 [337] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:20:07 [337] test_resume: -2
Jun 24 21:20:07 [338] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:20:07 [338] test_resume: -2
Jun 24 21:20:07 [335] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:20:07 [335] test_resume: -2
Jun 24 21:20:07 [333] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:20:07 [333] test_resume: -2
Jun 24 21:20:07 [331] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:20:07 [331] test_resume: -2
Jun 24 21:20:07 [336] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:20:07 [336] test_resume: -2
Jun 24 21:20:07 [332] ERROR:rest_client:start_async_http_req: connect timeout 
on http://0.0.0.0 (5s)
Jun 24 21:20:07