I’ve patched the code to do as you said: create sockets until a NULL comes
back, then close them all, printing a lot for debugging for now. What I see on
the Mac is the following:
- the sockets are happily created up to 1024, irrespective of the ulimit -n
value being 256, 512, 1024 or anything up to 8192. (I need to grep the code to
understand where this 1024 limit comes from, as it surely isn’t respecting
ulimit -n.)
- the test fails when it tries to close the sockets. Depending on the ulimit
value, it cannot close all the sockets, and either hangs forever
(limit = 8192) or fails partway with an assertion and a stack smash
affecting the close loop:
  - for ulimit -n 8192, the last lines are “…\nSocket close: 112\nSocket clos”
    and then it hangs forever
  - for ulimit -n 1024, 512 and 256, it doesn’t hang, but reports a “Bad file
    descriptor (signaler.cpp:110)” and the counter gets mixed up.
- For example, for 256, where (…) means the order is correct: the assertion
fires at 794, then i jumps to 695 down to 691, then there is an empty line
(with?), then it is back at 788 (where are 793 to 789?), and it dies at 696:
Socket close: 795
Socket close: 794Bad file descriptor (signaler.cpp:110)
Socket close: 695
Socket close: 694
(…)
Socket close: 691
Socket close: 788
Socket close: 787
(…)
Socket close: 696
(nothing else)
I’m gonna put the kid to bed, and then compare the compilation and behavior on
Linux and try to understand the code a little better, but if anyone already
has any idea about:
- why it is possible to create 1024 sockets irrespective of the file handle
limits (is this 1024 hardcoded somewhere in the code?)
- where the “stack overflow” that affects the close in such a random manner
could be.
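To check the first question from inside the process, rather than trusting the
shell, the test could print the descriptor limit it actually runs under. A
minimal sketch using POSIX getrlimit with RLIMIT_NOFILE; fd_soft_limit is a
hypothetical helper name, and this only reports the OS limit, it says nothing
about where the 1024 in libzmq comes from:

```cpp
#include <sys/resource.h>

// Hypothetical helper: return the soft limit on open file descriptors for
// the current process (the value `ulimit -n` sets), or -1 if the query
// fails or the limit is reported as unlimited.
long fd_soft_limit ()
{
    struct rlimit lim;
    if (getrlimit (RLIMIT_NOFILE, &lim) != 0)
        return -1;
    if (lim.rlim_cur == RLIM_INFINITY)
        return -1;
    return static_cast<long> (lim.rlim_cur);
}
```

Printing fd_soft_limit () next to the "Socket limit: %d" line would show
directly whether the two numbers are related: if the socket count stays at 1024
while this value changes, the cap is not coming from ulimit -n.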
brb
index d7d85d7..b0c4ce6 100644
--- a/tests/test_many_sockets.cpp
+++ b/tests/test_many_sockets.cpp
@@ -22,7 +22,8 @@
#include <stdio.h>
#include <stdlib.h>
-const int no_of_sockets = 5000;
+const int no_of_sockets = 1024 * 1024;
+
int main(void)
{
@@ -36,15 +37,19 @@ int main(void)
for ( int i = 0; i < no_of_sockets; ++i )
{
sockets[i] = zmq_socket(ctx, ZMQ_PAIR);
- if (sockets[i])
- ++sockets_created;
+ if (!sockets[i])
+ break;
+ printf("Socket created: %d\n", i);
+ ++sockets_created;
}
- assert(sockets_created < no_of_sockets);
+ printf("Socket limit: %d\n", sockets_created);
- for ( int i = 0; i < no_of_sockets; ++i )
- if (sockets[i])
- zmq_close (sockets[i]);
+ for ( int i = sockets_created-1; i >= 0; --i )
+ {
+ printf("Socket close: %d\n", i);
+ zmq_close (sockets[i]);
+ }
zmq_ctx_destroy (ctx);
return 0;
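The intent of the patch is: create until the first NULL, then close everything
in reverse order. Since, as far as I understand, zmq_ctx_destroy waits until
every socket of the context has been closed, the close loop needs to run down
to and including index 0. A stand-alone sketch of that shape, where create_fn
and close_fn are hypothetical stand-ins for zmq_socket and zmq_close so that it
builds without libzmq:

```cpp
#include <cstddef>
#include <vector>

// Create handles until create_fn returns NULL or max_handles is reached.
// create_fn stands in for zmq_socket (ctx, ZMQ_PAIR); it is an assumption
// made for illustration, not the libzmq API.
template <typename CreateFn>
std::vector<void *> create_until_null (std::size_t max_handles, CreateFn create_fn)
{
    std::vector<void *> handles;
    for (std::size_t i = 0; i < max_handles; ++i) {
        void *h = create_fn ();
        if (!h)
            break;
        handles.push_back (h);
    }
    return handles;
}

// Close every handle in reverse creation order, including index 0.
// close_fn stands in for zmq_close. Returns how many handles were closed.
template <typename CloseFn>
std::size_t close_all_reverse (const std::vector<void *> &handles, CloseFn close_fn)
{
    std::size_t closed = 0;
    // Count down with i > 0 and index with i - 1, because an unsigned
    // index cannot go below zero.
    for (std::size_t i = handles.size (); i > 0; --i) {
        close_fn (handles[i - 1]);
        ++closed;
    }
    return closed;
}
```

Comparing the return value of close_all_reverse with the number of sockets
created gives a cheap check that nothing was left open before the context is
destroyed.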
On Nov 9, 2013, at 12:42, Pieter Hintjens <[email protected]> wrote:
> The test_many_sockets should IMO create sockets in a loop, without
> limit, until it receives a NULL return, and then exit happily. The
> goal being to check that libzmq does not crash or assert when this
> condition hits.
>
> On Sat, Nov 9, 2013 at 1:19 PM, Bruno D. Rodrigues
> <[email protected]> wrote:
>> So if the limits are raised, should the test fail, or still pass even
>> though it is no longer testing anything?
>>
>> I don't think it's a good idea to have tests depend on externalities,
>> as they should run consistently no matter what ulimit -n is (as long
>> as it's sane). Can I assume a default of 1024 (is that the
>> default/minimum on Linux?) If so, I'll try to have a look at them later.
>>
>> 1. Assume a ulimit of 1024? Why does the doc say 1200?
>>
>> 2. Should that test pass or fail if the limits are raised?
>>
>>
>>
>> --
>> Bruno Rodrigues
>> Sent from my iPhone
>>
>> On 09/11/2013, at 11:25, Pieter Hintjens <[email protected]> wrote:
>>
>>> XFAIL is intentional failure, which is normal for the two tests that have
>>> it.
>>>
>>> The test_many_sockets test is meant to exceed system limits and check
>>> libzmq deals with it correctly. It shouldn't need raising process
>>> handles to pass. The code may still be flaky on OS/X.
>>>
>>>
>>> On Sat, Nov 9, 2013 at 12:18 PM, Bruno D. Rodrigues
>>> <[email protected]> wrote:
>>>> With the current master, I get all pass on macosx as long as I run ulimit
>>>> -n 8192 before.
>>>>
>>>> The test_many_sockets fails because it creates 5K sockets. The shutdown
>>>> may fail for similar reasons.
>>>>
>>>> ============================================================================
>>>> Testsuite summary for zeromq 4.1.0
>>>> ============================================================================
>>>> # TOTAL: 46
>>>> # PASS: 44
>>>> # SKIP: 0
>>>> # XFAIL: 2
>>>> # FAIL: 0
>>>> # XPASS: 0
>>>> # ERROR: 0
>>>>
>>>> (I assume the 2 XFAIL are okish?)
>>>>
>>>>
>>>>> On Nov 9, 2013, at 9:38, Pieter Hintjens <[email protected]> wrote:
>>>>>
>>>>>> On Sat, Nov 9, 2013 at 8:17 AM, Matt Connolly <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> test_many_sockets.log:
>>>>>> Assertion failed: nbytes == sizeof (dummy) (signaler.cpp:149)
>>>>>
>>>>> Some unhandled error condition on line 140; presumably specific to OS/X.
>>>>>
>>>>> -Pieter
>>>>> _______________________________________________
>>>>> zeromq-dev mailing list
>>>>> [email protected]
>>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>
>>>
>>>
>>> --
>>> -
>>> Pieter Hintjens
>>> CEO of iMatix.com
>>> Founder of ZeroMQ community
>>> blog: http://hintjens.com