Re: lockup in bgp_delete()

2017-03-21 Thread Luke Shumaker
On Mon, 20 Mar 2017 18:29:07 -0400,
Eduardo Bustamante wrote:
> 
> This was reported a month ago:
> http://lists.gnu.org/archive/html/bug-bash/2017-02/msg00025.html

Oh! Thank you.  I tried searching the list archives, but nothing came
up.  Odd; it shows up in searches now.

-- 
Happy hacking,
~ Luke Shumaker



Re: lockup in bgp_delete()

2017-03-20 Thread Chet Ramey
On 3/20/17 6:29 PM, Eduardo Bustamante wrote:
> This was reported a month ago:
> http://lists.gnu.org/archive/html/bug-bash/2017-02/msg00025.html

The devel git branch on Savannah has several fixes for this.  If you
don't want to run that on a server, you can just snag the jobs.c file
from the latest snapshot and either try to just drop it in, or see
if the attached patch works.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
*** ../bash-4.4-patched/jobs.c	2016-11-11 13:42:55.0 -0500
--- jobs.c	2017-02-22 15:16:28.0 -0500
***************
*** 813,818 ****
struct pidstat *ps;
  
!   bucket = pshash_getbucket (pid);
!   psi = bgp_getindex ();
ps = &bgpids.storage[psi];
  
--- 796,815 ----
struct pidstat *ps;
  
!   /* bucket == existing chain of pids hashing to same value
!  psi = where were going to put this pid/status */
! 
!   bucket = pshash_getbucket (pid);	/* index into pidstat_table */
!   psi = bgp_getindex ();		/* bgpids.head, index into storage */
! 
!   /* XXX - what if psi == *bucket? */
!   if (psi == *bucket)
! {
! #ifdef DEBUG
!   internal_warning ("hashed pid %d (pid %d) collides with bgpids.head, skipping", psi, pid);
! #endif
!   bgpids.storage[psi].pid = NO_PID;		/* make sure */
!   psi = bgp_getindex ();			/* skip to next one */
! }
! 
ps = &bgpids.storage[psi];
  
***************
*** 842,845 ****
--- 839,843 ----
  {
struct pidstat *ps;
+   ps_index_t *bucket;
  
ps = &bgpids.storage[psi];
***************
*** 847,856 ****
  return;
  
!   if (ps->bucket_next != NO_PID)
  bgpids.storage[ps->bucket_next].bucket_prev = ps->bucket_prev;
!   if (ps->bucket_prev != NO_PID)
  bgpids.storage[ps->bucket_prev].bucket_next = ps->bucket_next;
else
! *(pshash_getbucket (ps->pid)) = ps->bucket_next;
  }
  
--- 845,861 ----
  return;
  
!   if (ps->bucket_next != NO_PIDSTAT)
  bgpids.storage[ps->bucket_next].bucket_prev = ps->bucket_prev;
!   if (ps->bucket_prev != NO_PIDSTAT)
  bgpids.storage[ps->bucket_prev].bucket_next = ps->bucket_next;
else
! {
!   bucket = pshash_getbucket (ps->pid);
!   *bucket = ps->bucket_next;	/* deleting chain head in hash table */
! }
! 
!   /* clear out this cell, just in case */
!   ps->pid = NO_PID;
!   ps->bucket_next = ps->bucket_prev = NO_PIDSTAT;
  }
  
***************
*** 859,863 ****
   pid_t pid;
  {
!   ps_index_t psi;
  
if (bgpids.storage == 0 || bgpids.nalloc == 0 || bgpids.npid == 0)
--- 864,868 ----
   pid_t pid;
  {
!   ps_index_t psi, orig_psi;
  
if (bgpids.storage == 0 || bgpids.nalloc == 0 || bgpids.npid == 0)
***************
*** 865,871 ****
  
/* Search chain using hash to find bucket in pidstat_table */
!   for (psi = *(pshash_getbucket (pid)); psi != NO_PIDSTAT; psi = bgpids.storage[psi].bucket_next)
! if (bgpids.storage[psi].pid == pid)
!   break;
  
if (psi == NO_PIDSTAT)
--- 870,883 ----
  
/* Search chain using hash to find bucket in pidstat_table */
!   for (orig_psi = psi = *(pshash_getbucket (pid)); psi != NO_PIDSTAT; psi = bgpids.storage[psi].bucket_next)
! {
!   if (bgpids.storage[psi].pid == pid)
! 	break;
!   if (orig_psi == bgpids.storage[psi].bucket_next)	/* catch reported bug */
! 	{
! 	  internal_warning ("bgp_delete: LOOP: psi (%d) == storage[psi].bucket_next", psi);
! 	  return 0;
! 	}
! }
  
if (psi == NO_PIDSTAT)
***************
*** 905,909 ****
   pid_t pid;
  {
!   ps_index_t psi;
  
if (bgpids.storage == 0 || bgpids.nalloc == 0 || bgpids.npid == 0)
--- 917,921 ----
   pid_t pid;
  {
!   ps_index_t psi, orig_psi;
  
if (bgpids.storage == 0 || bgpids.nalloc == 0 || bgpids.npid == 0)
***************
*** 911,917 ****
  
/* Search chain using hash to find bucket in pidstat_table */
!   for (psi = *(pshash_getbucket (pid)); psi != NO_PIDSTAT; psi = bgpids.storage[psi].bucket_next)
! if (bgpids.storage[psi].pid == pid)
!   return (bgpids.storage[psi].status);
  
return -1;
--- 923,936 ----
  
/* Search chain using hash to find bucket in pidstat_table */
!   for (orig_psi = psi = *(pshash_getbucket (pid)); psi != NO_PIDSTAT; psi = bgpids.storage[psi].bucket_next)
! {
!   if (bgpids.storage[psi].pid == pid)
! 	return (bgpids.storage[psi].status);
!   if (orig_psi == bgpids.storage[psi].bucket_next)	/* catch reported bug */
! 	{
! 	  internal_warning ("bgp_search: LOOP: psi (%d) == storage[psi].bucket_next", psi);
! 	  return -1;
! 	}
! }
  
return -1;
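
For readers following the hunk at 845,861, here is a minimal,
self-contained sketch of the unlink step it changes: removing a cell
from an index-linked chain, updating the hash-table head when the cell
has no predecessor, and then clearing the cell.  This is not the bash
source.  The names `cell' and `chain_unlink', the `table'/`nbuckets'
parameters, and the modulo hash are all hypothetical, and the early
return on NO_PID is an assumption, since the hunk shows only a bare
`return;' as context.

```c
#include <assert.h>
#include <stddef.h>

#define NO_PID     (-1)   /* marks an unused cell */
#define NO_PIDSTAT (-1)   /* marks "no index" (end of chain, no predecessor) */

/* Hypothetical, simplified stand-in for struct pidstat. */
struct cell {
  int pid;
  int bucket_next;   /* index of next cell in this chain, or NO_PIDSTAT */
  int bucket_prev;   /* index of previous cell, or NO_PIDSTAT if chain head */
};

/* Unlink storage[psi] from its chain.  table[] holds the head index of
   each chain; the modulo hash is an assumption made for illustration. */
void
chain_unlink (struct cell *storage, int *table, int nbuckets, int psi)
{
  struct cell *ps;

  assert (storage != NULL && table != NULL && nbuckets > 0);
  ps = &storage[psi];
  if (ps->pid == NO_PID)          /* assumed guard: cell already free */
    return;

  if (ps->bucket_next != NO_PIDSTAT)
    storage[ps->bucket_next].bucket_prev = ps->bucket_prev;
  if (ps->bucket_prev != NO_PIDSTAT)
    storage[ps->bucket_prev].bucket_next = ps->bucket_next;
  else
    /* deleting the chain head: the table must point at the successor */
    table[(unsigned) ps->pid % (unsigned) nbuckets] = ps->bucket_next;

  /* clear the cell so stale links cannot be followed later */
  ps->pid = NO_PID;
  ps->bucket_next = ps->bucket_prev = NO_PIDSTAT;
}
```

Clearing the links after unlinking is what keeps a reused cell from
carrying an old bucket_next into a new chain, which is one way a chain
could end up pointing back at itself.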


Re: lockup in bgp_delete()

2017-03-20 Thread Eduardo Bustamante
This was reported a month ago:
http://lists.gnu.org/archive/html/bug-bash/2017-02/msg00025.html



lockup in bgp_delete()

2017-03-20 Thread Luke Shumaker

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64' 
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-unknown-linux-gnu' 
-DCONF_VENDOR='unknown' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' 
-DSHELL -DHAVE_CONFIG_H   -I.  -I. -I./include -I./lib  -D_FORTIFY_SOURCE=2 
-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong 
-DDEFAULT_PATH_VALUE='/usr/local/sbin:/usr/local/bin:/usr/bin' 
-DSTANDARD_UTILS_PATH='/usr/bin' -DSYS_BASHRC='/etc/bash.bashrc' 
-DSYS_BASH_LOGOUT='/etc/bash.bash_logout' -DNON_INTERACTIVE_LOGIN_SHELLS 
-Wno-parentheses -Wno-format-security
uname output: Linux build64-par 4.9.11-gnu-1 #1 SMP PREEMPT Sun Feb 19 18:36:28 
UYT 2017 x86_64 GNU/Linux
Machine Type: x86_64-unknown-linux-gnu

Bash Version: 4.4
Patch Level: 12
Release Status: release

Description:

Occasionally, on one of my servers, a bash script in a cron
job locks up, and pegs one of the CPU cores at 100%.
Attaching to it with GDB, I see that it is stuck in the loop
in bgp_delete: it is looking for the index of the pid to
delete in bgpids.storage, but the pid is not there.  Since
bgpids.storage is a circular linked list, the loop just goes
around and around and never exits.
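
The failure mode described above can be sketched in a few lines of C.
This is an illustration only, not the bash source: `cell' and
`chain_search_bounded' are hypothetical names, and the step-count
guard is one possible way to detect a cycle, chosen here because a
valid chain can never visit more cells than the storage array holds.

```c
#include <assert.h>
#include <stddef.h>

#define NO_PIDSTAT (-1)   /* end-of-chain marker */

/* Hypothetical, simplified stand-in for struct pidstat. */
struct cell {
  int pid;
  int bucket_next;   /* index of next cell in the chain, or NO_PIDSTAT */
};

/* Walk the chain starting at index `head', looking for `pid'.
   Returns the index of the matching cell, NO_PIDSTAT if the chain ends
   without a match, or -2 if more than `nalloc' links were followed,
   which can only happen when the chain contains a cycle.  Without the
   step counter, a cycle that does not contain `pid' never terminates. */
int
chain_search_bounded (const struct cell *storage, int nalloc, int head, int pid)
{
  int psi, steps;

  assert (storage != NULL && nalloc > 0);
  for (psi = head, steps = 0; psi != NO_PIDSTAT; psi = storage[psi].bucket_next)
    {
      if (storage[psi].pid == pid)
        return psi;
      if (++steps > nalloc)
        return -2;   /* cycle detected */
    }
  return NO_PIDSTAT;
}
```

With cells 0 and 1 pointing at each other, a search for a pid that
neither cell holds returns -2 instead of spinning.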

Repeat-By:

I'm not sure.  I've seen it 3 times: on 2017-02-18 (with
4.4.11), 2017-02-27 (with 4.4.11), and 2017-03-20 (with
4.4.12).  The cron job runs daily.  So I don't quite know what
causes it.  I've left it running, and can attach to it with
GDB to answer questions, or anything.

-- 
Happy hacking,
~ Luke Shumaker