[Touch-packages] [Bug 1629226] Re: systemd's service killed by cgroup controller pids

2017-05-22 Thread Balint Reczey
Regarding the original report: the following simple program keeps the maximum
allowed number of children running. It is not killed by cgroups; only the
fork() call fails:
---
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MASTER_SLEEP_NS 100L
#define CHILD_SLEEP_S 5

int main(void)
{
  pid_t pid;
  struct timespec master_sleep = {0, MASTER_SLEEP_NS};

  for (;;) {
    pid = fork();
    if (pid < 0) {
      /* fork() failed (e.g. EAGAIN at the task limit): report, pause, retry */
      perror("fork failed:");
      nanosleep(&master_sleep, NULL);
    }
    if (pid == 0) {
      /* child: stay alive for a while, then exit */
      sleep(CHILD_SLEEP_S);
      exit(0);
    }
    nanosleep(&master_sleep, NULL);
    /* collect exited children */
    while (waitpid(-1, NULL, WNOHANG) > 0)
      ;
  }
}
---
[Unit]
Description=Reproducer 
After=multi-user.target

[Service]
ExecStart=/home/user/reproducer
Type=simple
TasksMax=512

[Install]
WantedBy=multi-user.target
---
● reproducer.service - Reproducer
   Loaded: loaded (/etc/systemd/system/reproducer.service; disabled; vendor preset: enabled)
   Active: active (running) since Mon 2017-05-22 13:16:50 UTC; 3min 3s ago
 Main PID: 11778 (reproducer)
    Tasks: 512 (limit: 512)
   Memory: 55.4M
      CPU: 1min 2.794s
   CGroup: /system.slice/reproducer.service
           ├─11778 /home/rbalint/reproducer
           ├─18144 /home/rbalint/reproducer
...
           ├─26763 /home/rbalint/reproducer
           ├─26764 /home/rbalint/reproducer
           ├─26765 /home/rbalint/reproducer
           └─26766 /home/rbalint/reproducer

May 22 13:20:14 zesty-test reproducer[11778]: fork failed:: Resource temporarily unavailable
May 22 13:20:14 zesty-test reproducer[11778]: fork failed:: Resource temporarily unavailable
May 22 13:20:14 zesty-test reproducer[11778]: fork failed:: Resource temporarily unavailable

---

Bash, on the other hand, exits after a few failed forks:

● reproducer.service - Reproducer
   Loaded: loaded (/etc/systemd/system/reproducer.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2017-05-22 13:22:38 UTC; 3s ago
  Process: 14281 ExecStart=/home/rbalint/reproducer.sh (code=exited, status=0/SUCCESS)
 Main PID: 14287 (code=exited, status=254)
      CPU: 639ms

May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:38 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: Interrupted system call
May 22 13:22:38 zesty-test systemd[1]: reproducer.service: Main process exited, code=exited, status=254/n/a
May 22 13:22:38 zesty-test systemd[1]: reproducer.service: Unit entered failed state.


http://sources.debian.net/src/bash/4.4-5/jobs.c/?hl=1919#L1919

  /* Create the child, handle severe errors.  Retry on EAGAIN. */
  while ((pid = fork ()) < 0 && errno == EAGAIN && forksleep < FORKSLEEP_MAX)
    {
      /* bash-4.2 */
      sigprocmask (SIG_SETMASK, &oset, (sigset_t *)NULL);
      /* If we can't create any children, try to reap some dead ones. */
      waitchld (-1, 0);

      errno = EAGAIN;   /* restore errno */
      sys_error ("fork: retry");
      RESET_SIGTERM;

      if (sleep (forksleep) != 0)
        break;
      forksleep <<= 1;

      if (interrupt_state)
        break;
      sigprocmask (SIG_SETMASK, &set, (sigset_t *)NULL);
    }
...
  if (pid < 0)
    {
      sys_error ("fork");

      /* Kill all of the processes in the current pipeline. */
      terminate_current_pipeline ();

      /* Discard the current pipeline, if any. */
      if (the_pipeline)
        kill_current_pipeline ();

      last_command_exit_value = EX_NOEXEC;
      throw_to_top_level ();    /* Reset signals, etc. */
    }
...
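
The retry loop quoted above can be approximated in shell. This is only an
illustrative sketch, not bash's implementation: the function name
retry_with_backoff and its max_delay argument are made up for this example,
and the real initial delay and FORKSLEEP_MAX value should be checked in
jobs.c.

```sh
#!/bin/sh
# Hypothetical helper (not part of bash): run "$@" until it succeeds,
# sleeping 1, 2, 4, ... seconds between attempts. Give up once the next
# delay would reach max_delay, mirroring the "forksleep < FORKSLEEP_MAX"
# condition in the quoted loop.
retry_with_backoff() {
    max_delay=$1
    shift
    delay=1
    until "$@"; do
        if [ "$delay" -ge "$max_delay" ]; then
            echo "giving up: $*" >&2
            return 1
        fi
        echo "retry in ${delay}s: $*" >&2
        sleep "$delay"
        delay=$((delay * 2))    # same doubling as "forksleep <<= 1"
    done
    return 0
}
```

For example, `retry_with_backoff 16 some_command` would sleep 1, 2, 4 and 8
seconds between attempts and return 1 after the fifth consecutive failure,
which is the kind of bounded retry that makes the shell give up instead of
looping forever.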

I believe this is by design, and I think the approach is reasonable.

A shell should not try to keep itself alive by forking new processes once
it has already hit system limits several times. Other tools are
available for implementing servers with worker pools that adapt to
system limits not known in advance.

If you know the number of workers needed in advance, I suggest setting
TasksMax high enough, or to infinity if you don't want to rely on
cgroup fork limits.
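
As a rough illustration of staying below a known task limit, a script can cap
how many background jobs it keeps alive at once instead of forking thousands
in one go. The sketch below is not part of the original report:
run_in_batches and its arguments are hypothetical names, and the batch size
would have to be chosen well below the unit's TasksMax.

```sh
#!/bin/sh
# Hypothetical helper: run "$@ N" in the background for N = 1..total,
# keeping at most "batch" children alive at any time.
# Note: the variables below are global, which is fine for a short script.
run_in_batches() {
    total=$1
    batch=$2
    shift 2
    i=1
    running=0
    while [ "$i" -le "$total" ]; do
        "$@" "$i" &
        running=$((running + 1))
        i=$((i + 1))
        if [ "$running" -ge "$batch" ]; then
            wait            # reap the whole batch before forking more
            running=0
        fi
    done
    wait                    # reap the final, possibly partial batch
}
```

For example, `run_in_batches 2048 256 foo` would start `foo 1` through
`foo 2048` in batches of 256, assuming foo is a command or shell function
that takes the job number as its argument. Waiting for a whole batch is
simpler than a true pool but leaves slots idle while the slowest job in each
batch finishes.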


** Changed in: bash (Ubuntu)
   Status: In Progress => Invalid

** Summary changed:

- systemd's service killed by cgroup controller pids
+ Bash exits after a few failed fork()-s


[Touch-packages] [Bug 1629226] Re: systemd's service killed by cgroup controller pids

2017-05-22 Thread Balint Reczey
** Changed in: bash (Ubuntu)
   Status: Triaged => In Progress

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to bash in Ubuntu.
https://bugs.launchpad.net/bugs/1629226

Title:
  systemd's service killed by cgroup controller pids

Status in The Ubuntu-power-systems project:
  New
Status in bash package in Ubuntu:
  In Progress

Bug description:
  Problem Description
  ===
  I wrote a simple systemd service that forks child processes aggressively,
  but the service quickly failed:

  % sudo systemctl status reproducer.service
  ● reproducer.service - Reproducer of systemd services killed by ips
     Loaded: loaded (/etc/systemd/system/reproducer.service; disabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2016-03-18 06:58:37 CDT; 2min 43s ago
    Process: 5103 ExecStart=/home/hpt/reproducer/reproducer.sh (code=exited, status=0/SUCCESS)
   Main PID: 5105 (code=exited, status=254)

  Mar 18 06:58:36 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
  Mar 18 06:58:36 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
  Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
  Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
  Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
  Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
  Mar 18 06:58:37 pinelp3 systemd[1]: reproducer.service: Main process exited, code=exited, status=254/n/a
  Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
  Mar 18 06:58:37 pinelp3 systemd[1]: reproducer.service: Unit entered failed state.
  Mar 18 06:58:37 pinelp3 systemd[1]: reproducer.service: Failed with result 'exit-code'.

  The default task limit of systemd services is 512. It looks like the
  service is terminated by the kernel's pids cgroup controller. I think
  this isn't correct: failing to fork child processes shouldn't cause
  the parent to die.

  
  % cat /etc/systemd/system/reproducer.service 
  [Unit]
  Description=Reproducer of systemd services killed by ips
  After=multi-user.target

  [Service]
  ExecStart=/home/hpt/reproducer/reproducer.sh
  Type=forking

  [Install]
  WantedBy=multi-user.target

  % cat /home/hpt/reproducer/reproducer.sh
  #!/bin/bash

  foo()
  {
      #exec sh -c "echo $1: \$\$;sleep 60"
      echo $1:
      sleep 60
  }

  bar()
  {
      c=1
      while true
      do
          for ((i=1;i<=2048;i++))
          do
              foo $c &
              ((c++))
          done

          wait
          c=1
      done
  }

  # main
  bar &

  disown -a

  exit 0

  
  ---uname output---
  Linux pinelp3 4.4.0-12-generic #28-Ubuntu SMP Wed Mar 9 00:40:38 UTC 2016 
ppc64le ppc64le ppc64le GNU/Linux
   
  Machine Type = IBM,8408-E8E,lpar 

  Steps to Reproduce
  
  1. install the simple service in "Problem description"
  2. sudo systemctl start reproducer.service
  3. wait 2~3 minutes
   
  == Comment: #3 - Vaishnavi Bhat  - 2016-03-22 11:21:55 ==
  From the machine,
  root@pinelp3:~# ulimit -a
  core file size  (blocks, -c) 0
  data seg size   (kbytes, -d) unlimited
  scheduling priority (-e) 0
  file size   (blocks, -f) unlimited
  pending signals (-i) 48192
  max locked memory   (kbytes, -l) 64
  max memory size (kbytes, -m) unlimited
  open files  (-n) 1024
  pipe size(512 bytes, -p) 8
  POSIX message queues (bytes, -q) 819200
  real-time priority  (-r) 0
  stack size  (kbytes, -s) 8192
  cpu time   (seconds, -t) unlimited
  max user processes  (-u) 48192
  virtual memory  (kbytes, -v) unlimited
  file locks  (-x) unlimited

  root@pinelp3:~# ps aux | wc -l   --> While the service is running
  1084
  root@pinelp3:~# ps aux | wc -l   "
  1084
  root@pinelp3:~# ps aux | wc -l   "
  1084
  root@pinelp3:~# ps aux | wc -l   "
  1084
  root@pinelp3:~# ps aux | wc -l   --> While the service is not running
  572

  root@pinelp3:~# free -m   --> While service is running
                total        used        free      shared  buff/cache   available
  Mem:          12117         628         459          22       11029

[Touch-packages] [Bug 1629226] Re: systemd's service killed by cgroup controller pids

2017-05-17 Thread Steve Langasek
** Changed in: bash (Ubuntu)
   Importance: Undecided => Medium

** Changed in: bash (Ubuntu)
   Status: New => Triaged

** Changed in: bash (Ubuntu)
Milestone: None => ubuntu-17.05

** Changed in: bash (Ubuntu)
 Assignee: Taco Screen team (taco-screen-team) => Balint Reczey (rbalint)

Status in The Ubuntu-power-systems project:
  New
Status in bash package in Ubuntu:
  Triaged


[Touch-packages] [Bug 1629226] Re: systemd's service killed by cgroup controller pids

2017-04-26 Thread Manoj Iyer
** Also affects: ubuntu-power-systems
   Importance: Undecided
   Status: New

Status in The Ubuntu-power-systems project:
  New
Status in bash package in Ubuntu:
  New
