** Description changed:

- Fix applied in stress-ng [1] on V0.10.09 and in Focal (development series),
- working on the backport patch for the stable releases (Bionic, Disco, Eoan).
+ [Impact]
  
- [1] https://kernel.ubuntu.com/git/cking/stress-
+  * Users running stress-ng's 'af-alg' stressor (which is part of the 'cpu' and
+   'os' classes of stressors) with 50+ instances, might get failure exit status
+   and the message 'bind failed, errno=110 (Connection timed out)'.
+ 
+  * For MAAS users, this means the CPU hardware tests (that run the 'cpu' class
+    of stressors) on larger systems might report 'FAILED' status, thus possibly
+    misleading administrators about the hardware present in the system.
+ 
+  * It has been determined that the problem root cause is related to concurrent
+    module loading request threshold in the kernel (50), which is exercised by
+    the crypto API at the time of the bind() system call (so to load the crypto
+    algorithm module requested).
+ 
+  * The problem happens due to a race condition between the instance that exceeded
+    the threshold of concurrent module loading (50 requests), then timed out while
+    waiting for a second chance (5 seconds), and another instance that successfully
+    made it and requested the module load but the module's self-tests didn't finish
+    within the time-out running in the first instance (60 seconds), as all the CPUs
+    are currently under stress; this error is then returned to userspace/bind().
+ 
+  * Not all instances fail with that error, as once the crypto algorithm module
+    is successfully loaded (i.e., by another concurrent instance and the module
+    self-tests eventually finished), the problem no longer occurs.
+ 
+  * The fix simply checks for ETIMEDOUT errno/failure on the bind() system call,
+    and performs a bounded retry loop (3 attempts), as the module may just have
+    been loaded successfully by another instance.
+ 
+ [Test Case]
+ 
+  * A synthetic reproducer is available; a kernel module that uses kprobes to
+    force the synchronization of af-alg instances to happen in the way needed
+    to reproduce the problem.
+ 
+  * With the kernel module loaded, one of the af-alg instances (not all of them)
+    hits the bind() connection timed out error if this fix/patch is not applied.
+ 
+ [Regression Potential]
+ 
+  * The code changes are minimal and contained within the af-alg stressor
+ code.
+ 
+  * Differences in behavior might be af-alg/cpu/os stressors that now pass/exit
+    with successful status on larger systems.
+ 
+ [Other Info]
+ 
+  * Fix applied in stress-ng [1] on V0.10.09 and in Focal (development
+ series).
+ 
+  * Backport provided for these stable releases: Bionic, Disco, Eoan.
+ 
+  [1] https://kernel.ubuntu.com/git/cking/stress-
  ng.git/commit/?id=637e0a9b7050cc69e76eeb7b61c14a659d8b8cfd
+ 
+ [Original Description]
+ 
+ The MAAS hardware test for CPU (long/12h) fails due to stress-ng-af-alg
+ bind() errors.
+ 
+ stress-ng-cpu-long <...> Failed [View log]
+ 
+ disabled 'cpu-online' as it may hang the machine (enable it with the --pathological option)
+ dispatching hogs: 72 af-alg, 72 atomic, 72 branch, 72 bsearch, 72 cache, 72 context, 72 cpu, 72 crypt, 72 fp-error, 72 funccall, 72 getrandom, 72 heapsort, 72 hsearch, 72 icache, 72 ioport, 72 lockbus, 72 longjmp, 72 lsearch, 72 malloc, 72 matrix, 72 membarrier, 72 memcpy, 72 mergesort, 72 nop, 72 numa, 72 opcode, 72 qsort, 72 radixsort, 72 rdrand, 72 str, 72 stream, 72 tree, 72 tsc, 72 tsearch, 72 vecmath, 72 wcs, 72 zlib
+ stress-ng-numa: system has 2 of a maximum 1024 memory NUMA nodes
+ stress-ng-stream: stressor loosely based on a variant of the STREAM benchmark code
+ stress-ng-stream: do NOT submit any of these results to the STREAM benchmark results
+ stress-ng-stream: Using CPU cache size of 25344K
+ stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
+ stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
+ stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
+ stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
+ stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
+ stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
+ ...
+ process 6626 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
+ process 6673 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
+ process 6713 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
+ process 6751 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
+ process 6800 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
+ ...
+ unsuccessful run completed in 44935.38s (12 hours, 28 mins, 55.38 secs)
+ ...

** Description changed:

  [Impact]
  
-  * Users running stress-ng's 'af-alg' stressor (which is part of the 'cpu' and
-   'os' classes of stressors) with 50+ instances, might get failure exit status
-   and the message 'bind failed, errno=110 (Connection timed out)'.
+  * Users running stress-ng's 'af-alg' stressor (which is part of the 'cpu'
+    and 'os' classes of stressors) with 50+ instances, might get failure exit
+    status and the message 'bind failed, errno=110 (Connection timed out)'.
  
-  * For MAAS users, this means the CPU hardware tests (that run the 'cpu' class
-    of stressors) on larger systems might report 'FAILED' status, thus possibly
-    misleading administrators about the hardware present in the system.
+  * For MAAS users, this means the CPU hardware tests (that run the 'cpu'
+    class of stressors) on larger systems might report 'FAILED' status, thus
+    possibly misleading admins about the hardware present in the system.
  
-  * It has been determined that the problem root cause is related to concurrent
-    module loading request threshold in the kernel (50), which is exercised by
-    the crypto API at the time of the bind() system call (so to load the crypto
-    algorithm module requested).
+  * It has been determined the problem root cause is related to concurrent
+    module loading request threshold in the kernel (50), which is exercised
+    by the crypto API at the time of the bind() system call (so to load the
+    crypto algorithm module requested).
  
-  * The problem happens due to a race condition between the instance that exceeded
-    the threshold of concurrent module loading (50 requests), then timed out while
-    waiting for a second chance (5 seconds), and another instance that successfully
-    made it and requested the module load but the module's self-tests didn't finish
-    within the time-out running in the first instance (60 seconds), as all the CPUs
-    are currently under stress; this error is then returned to userspace/bind().
+  * The problem happens due to a race condition between the instance that
+    exceeded the threshold of concurrent module loading (50 requests), then
+    timed out while waiting for a second chance (5 seconds), and another
+    instance that successfully made it and requested the module load but the
+    module's self-tests didn't finish within the time-out running in the first
+    instance (60 seconds), as all the CPUs are currently under stress; 
+    this error is then returned to userspace/bind().
  
-  * Not all instances fail with that error, as once the crypto algorithm module
-    is successfully loaded (i.e., by another concurrent instance and the module
-    self-tests eventually finished), the problem no longer occurs.
+  * Not all instances fail with that error, as once the crypto algorithm
+    module is successfully loaded (i.e., by another concurrent instance
+    and the module self-tests eventually finished), the problem no longer
+    occurs.
  
-  * The fix simply checks for ETIMEDOUT errno/failure on the bind() system call,
-    and performs a bounded retry loop (3 attempts), as the module may just have
-    been loaded successfully by another instance.
+  * The fix simply checks for ETIMEDOUT errno/failure on the bind() system
+    call, and performs a bounded retry loop (3 attempts), as the module may
+    just have been loaded successfully by another instance.
  
  [Test Case]
  
-  * A synthetic reproducer is available; a kernel module that uses kprobes to
-    force the synchronization of af-alg instances to happen in the way needed
-    to reproduce the problem.
+  * A synthetic reproducer is available; a kernel module that uses kprobes to
+    force the synchronization of af-alg instances to happen in the way needed
+    to reproduce the problem.
  
-  * With the kernel module loaded, one of the af-alg instances (not all of them)
-    hits the bind() connection timed out error if this fix/patch is not applied.
+  * With the kernel module loaded, one of the af-alg instances (not all of them)
+    hits the bind() connection timed out error if this fix/patch is not applied.
  
  [Regression Potential]
  
-  * The code changes are minimal and contained within the af-alg stressor
+  * The code changes are minimal and contained within the af-alg stressor
  code.
  
-  * Differences in behavior might be af-alg/cpu/os stressors that now pass/exit
-    with successful status on larger systems.
+  * Differences in behavior might be af-alg/cpu/os stressors that now pass/exit
+    with successful status on larger systems.
  
  [Other Info]
  
-  * Fix applied in stress-ng [1] on V0.10.09 and in Focal (development
+  * Fix applied in stress-ng [1] on V0.10.09 and in Focal (development
  series).
  
-  * Backport provided for these stable releases: Bionic, Disco, Eoan.
+  * Backport provided for these stable releases: Bionic, Disco, Eoan.
  
-  [1] https://kernel.ubuntu.com/git/cking/stress-
+  [1] https://kernel.ubuntu.com/git/cking/stress-
  ng.git/commit/?id=637e0a9b7050cc69e76eeb7b61c14a659d8b8cfd
  
  [Original Description]
  
  The MAAS hardware test for CPU (long/12h) fails due to stress-ng-af-alg
  bind() errors.
  
  stress-ng-cpu-long <...> Failed [View log]
  
  disabled 'cpu-online' as it may hang the machine (enable it with the --pathological option)
  dispatching hogs: 72 af-alg, 72 atomic, 72 branch, 72 bsearch, 72 cache, 72 context, 72 cpu, 72 crypt, 72 fp-error, 72 funccall, 72 getrandom, 72 heapsort, 72 hsearch, 72 icache, 72 ioport, 72 lockbus, 72 longjmp, 72 lsearch, 72 malloc, 72 matrix, 72 membarrier, 72 memcpy, 72 mergesort, 72 nop, 72 numa, 72 opcode, 72 qsort, 72 radixsort, 72 rdrand, 72 str, 72 stream, 72 tree, 72 tsc, 72 tsearch, 72 vecmath, 72 wcs, 72 zlib
  stress-ng-numa: system has 2 of a maximum 1024 memory NUMA nodes
  stress-ng-stream: stressor loosely based on a variant of the STREAM benchmark code
  stress-ng-stream: do NOT submit any of these results to the STREAM benchmark results
  stress-ng-stream: Using CPU cache size of 25344K
  stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
  stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
  stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
  stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
  stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
  stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
  ...
  process 6626 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
  process 6673 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
  process 6713 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
  process 6751 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
  process 6800 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
  ...
  unsuccessful run completed in 44935.38s (12 hours, 28 mins, 55.38 secs)
  ...

** Description changed:

  [Impact]
  
-  * Users running stress-ng's 'af-alg' stressor (which is part of the 'cpu'
-    and 'os' classes of stressors) with 50+ instances, might get failure exit
-    status and the message 'bind failed, errno=110 (Connection timed out)'.
+  * Users running stress-ng's 'af-alg' stressor (which is part of the 'cpu'
+    and 'os' classes of stressors) with 50+ instances, might get failure exit
+    status and the message 'bind failed, errno=110 (Connection timed out)'.
  
-  * For MAAS users, this means the CPU hardware tests (that run the 'cpu'
-    class of stressors) on larger systems might report 'FAILED' status, thus
-    possibly misleading admins about the hardware present in the system.
+  * For MAAS users, this means the CPU hardware tests (that run the 'cpu'
+    class of stressors) on larger systems might report 'FAILED' status, thus
+    possibly misleading admins about the hardware present in the system.
  
-  * It has been determined the problem root cause is related to concurrent
-    module loading request threshold in the kernel (50), which is exercised
-    by the crypto API at the time of the bind() system call (so to load the
-    crypto algorithm module requested).
+  * It has been determined the problem root cause is related to concurrent
+    module loading request threshold in the kernel (50), which is exercised
+    by the crypto API at the time of the bind() system call (so to load the
+    crypto algorithm module requested).
  
-  * The problem happens due to a race condition between the instance that
-    exceeded the threshold of concurrent module loading (50 requests), then
-    timed out while waiting for a second chance (5 seconds), and another
-    instance that successfully made it and requested the module load but the
-    module's self-tests didn't finish within the time-out running in the first
-    instance (60 seconds), as all the CPUs are currently under stress; 
-    this error is then returned to userspace/bind().
+  * The problem happens due to a race condition between the instance that
+    exceeded the threshold of concurrent module loading (50 requests), then
+    timed out while waiting for a second chance (5 seconds), and another
+    instance that successfully made it and requested the module load but the
+    module's self-tests didn't finish within the time-out running in the first
+    instance (60 seconds), as all the CPUs are currently under stress;
+    this error is then returned to userspace/bind().
  
-  * Not all instances fail with that error, as once the crypto algorithm
-    module is successfully loaded (i.e., by another concurrent instance
-    and the module self-tests eventually finished), the problem no longer
-    occurs.
+  * Not all instances fail with that error, as once the crypto algorithm
+    module is successfully loaded (i.e., by another concurrent instance
+    and the module self-tests eventually finished), the problem no longer
+    occurs.
  
-  * The fix simply checks for ETIMEDOUT errno/failure on the bind() system
-    call, and performs a bounded retry loop (3 attempts), as the module may
-    just have been loaded successfully by another instance.
+  * The fix simply checks for ETIMEDOUT errno/failure on the bind() system
+    call, and performs a bounded retry loop (3 attempts), as the module may
+    just have been loaded successfully by another instance.
  
  [Test Case]
  
-  * A synthetic reproducer is available; a kernel module that uses kprobes to
-    force the synchronization of af-alg instances to happen in the way needed
-    to reproduce the problem.
+  * A synthetic reproducer is available; a kernel module that uses kprobes to
+    force the synchronization of af-alg instances to happen in the way needed
+    to reproduce the problem.
  
-  * With the kernel module loaded, one of the af-alg instances (not all of them)
-    hits the bind() connection timed out error if this fix/patch is not applied.
+  * With the kernel module loaded, one of the af-alg instances (not all of
+    them) hits the bind() connection timed out if this patch is not applied.
  
  [Regression Potential]
  
-  * The code changes are minimal and contained within the af-alg stressor
+  * The code changes are minimal and contained within af-alg stressor
  code.
  
-  * Differences in behavior might be af-alg/cpu/os stressors that now pass/exit
-    with successful status on larger systems.
+  * Differences in behavior might be af-alg/cpu/os stressors that now
+    pass/exit with successful status on larger systems.
  
  [Other Info]
  
   * Fix applied in stress-ng [1] on V0.10.09 and in Focal (development
  series).
  
   * Backport provided for these stable releases: Bionic, Disco, Eoan.
  
   [1] https://kernel.ubuntu.com/git/cking/stress-
  ng.git/commit/?id=637e0a9b7050cc69e76eeb7b61c14a659d8b8cfd
  
  [Original Description]
  
  The MAAS hardware test for CPU (long/12h) fails due to stress-ng-af-alg
  bind() errors.
  
  stress-ng-cpu-long <...> Failed [View log]
  
  disabled 'cpu-online' as it may hang the machine (enable it with the --pathological option)
  dispatching hogs: 72 af-alg, 72 atomic, 72 branch, 72 bsearch, 72 cache, 72 context, 72 cpu, 72 crypt, 72 fp-error, 72 funccall, 72 getrandom, 72 heapsort, 72 hsearch, 72 icache, 72 ioport, 72 lockbus, 72 longjmp, 72 lsearch, 72 malloc, 72 matrix, 72 membarrier, 72 memcpy, 72 mergesort, 72 nop, 72 numa, 72 opcode, 72 qsort, 72 radixsort, 72 rdrand, 72 str, 72 stream, 72 tree, 72 tsc, 72 tsearch, 72 vecmath, 72 wcs, 72 zlib
  stress-ng-numa: system has 2 of a maximum 1024 memory NUMA nodes
  stress-ng-stream: stressor loosely based on a variant of the STREAM benchmark code
  stress-ng-stream: do NOT submit any of these results to the STREAM benchmark results
  stress-ng-stream: Using CPU cache size of 25344K
  stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
  stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
  stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
  stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
  stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
  stress-ng-af-alg: bind failed, errno=110 (Connection timed out)
  ...
  process 6626 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
  process 6673 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
  process 6713 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
  process 6751 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
  process 6800 (stress-ng-af_alg) terminated with an error, exit status=1 (stress-ng core failure)
  ...
  unsuccessful run completed in 44935.38s (12 hours, 28 mins, 55.38 secs)
  ...
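
For illustration only, the bounded retry described under [Impact] (check
bind() for ETIMEDOUT, give up after 3 attempts) might look roughly like the
sketch below. This is not the actual stress-ng patch [1]: the names
bind_retry_on_timeout(), open_hash_socket() and BIND_MAX_ATTEMPTS are
hypothetical, and the "hash" socket type is just an example assumption.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if_alg.h>

#ifndef AF_ALG
#define AF_ALG 38               /* fallback for older libc headers */
#endif

#define BIND_MAX_ATTEMPTS 3     /* attempt count taken from the description */

/*
 * Hypothetical helper: call bind() and retry only when it fails with
 * ETIMEDOUT, since another instance may have finished loading the
 * crypto module in the meantime. Any other error is returned at once.
 * Returns 0 on success, -1 on failure with errno set by bind().
 */
static int bind_retry_on_timeout(int fd, const struct sockaddr *addr,
                                 socklen_t len)
{
    int attempt;

    for (attempt = 0; attempt < BIND_MAX_ATTEMPTS; attempt++) {
        if (bind(fd, addr, len) == 0)
            return 0;
        if (errno != ETIMEDOUT)
            break;              /* not the module-load race; do not retry */
    }
    return -1;
}

/*
 * Hypothetical usage: open an AF_ALG "hash" socket for the named
 * algorithm. Returns the socket fd, or -1 with errno set.
 */
static int open_hash_socket(const char *alg_name)
{
    struct sockaddr_alg sa;
    int fd = socket(AF_ALG, SOCK_SEQPACKET, 0);

    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.salg_family = AF_ALG;
    strncpy((char *)sa.salg_type, "hash", sizeof(sa.salg_type) - 1);
    strncpy((char *)sa.salg_name, alg_name, sizeof(sa.salg_name) - 1);

    if (bind_retry_on_timeout(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        int saved_errno = errno;

        close(fd);
        errno = saved_errno;    /* report the bind() error, not close()'s */
        return -1;
    }
    return fd;
}
```

A caller could use open_hash_socket("sha256") and treat a return of -1 with
errno == ETIMEDOUT as the failure from the log above persisting across all
attempts. Retrying only on ETIMEDOUT keeps genuine errors (for example an
unknown algorithm name) failing fast.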

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1851553

Title:
  stress-ng-af-alg: bind failed, errno=110 (Connection timed out)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/stress-ng/+bug/1851553/+subscriptions
