Re: Bash lockups

2010-05-28 Thread Carl Johnson
Giorgos Keramidas  writes:

> On Fri, 21 May 2010 09:30:05 -0700, Carl Johnson  wrote:
>> Giorgos Keramidas  writes:
>>> Does this lock-up happen if you leave the shell 'idle' for too long
>>> over an ssh session?  There may be problems with stateful connection
>>> tracking between your terminal and the remote shell :-/
>>
>> No, I don't think that could be the problem.  I am just using ssh
>> between local machines and there is no firewall between them.  It also
>> often seems to happen to a shell as I switch away from it to another
>> one.  One suspicion is that something is sending a signal to the shell
>> as it switches, and bash sometimes doesn't handle that signal
>> properly.
>>
>> I also should have mentioned that I have been running bash as my
>> default shell for years under Linux and have never seen this problem
>> there.
>>
>> Thanks for the suggestion.
>
> That's ok.  If you can attach to the bash process with ktrace please try
> to grab a ktrace file from a deadlocked shell.  We may be able to see
> why it gets deadlocked by running kdump(8) on the shell trace file.
>
> You can run a second shell under ktrace (and hope that the parent
> doesn't deadlock before the traced child shell), by running:
>
> bash$ ktrace -f bash.trace bash --login
>
> When you exit from the child shell you can dump ktrace(8) events from
> the bash.trace file with:
>
> bash$ kdump -f bash.trace > logfile 2>&1
>
> Looking near the last records dumped in 'logfile' should be quite
> informative if the process is dead-locked or spinning around the same
> code over and over again.

I finally got one after starting ktrace a few days ago.  It is
informative, but it raises as many questions as it answers.  It
basically just wrote out the prompt, *started* to set up for reading
the input and just stopped.  I ran gdb on it and it is stuck looping
somewhere in getenv.  I don't have the system compiled with debugging,
so I have limited information on what it is doing there.  I checked
multiple times, and I also saw getenv running routines such as memset,
strlen, mbrtowc, and wcsnrtombs.

The following is the tail end of the 'kdump -Ef' output:
  67263 bash 61412.013860 GIO   fd 2 wrote 28 bytes
   0x0000 0d0f 1b5b 316d 5b63 6172 6c6a 4063 6a62 7364 3874 207e 5d24 1b5b  |...[1m[ca...@cjbsd8t ~]$.[|
   0x001a 6d20                                                              |m |

  67263 bash 61412.013867 RET   write 28/0x1c
  67263 bash 61412.013874 CALL  sigprocmask(SIG_SETMASK,0x80e133c,0)
  67263 bash 61412.013880 RET   sigprocmask 0

and the following is the similar section of a normal prompt:
  67263 bash 61403.461469 GIO   fd 2 wrote 27 bytes
   0x0000 0f1b 5b31 6d5b 6361 726c 6a40 636a 6273 6438 7420 7e5d 241b 5b6d  |..[1m[ca...@cjbsd8t ~]$.[m|
   0x001a 20                                                                | |
  67263 bash 61403.461476 RET   write 27/0x1b
  67263 bash 61403.461483 CALL  sigprocmask(SIG_SETMASK,0x80e133c,0)
  67263 bash 61403.461489 RET   sigprocmask 0
  67263 bash 61403.461497 CALL  sigprocmask(SIG_BLOCK,0,0x80e1e3c)
  67263 bash 61403.461504 RET   sigprocmask 0
  67263 bash 61403.461513 CALL  read(0,0xbfbfd95f,0x1)

I just realized there is an extra CR at the beginning of that prompt
(28 bytes instead of 27) that I don't see elsewhere, but nothing else
before that looks different.  This one is an i386 8.0 release, but I
also have another hung shell in an amd64 7.3 release system in
VirtualBox.  I just checked my other ktrace logs and I found one
other place where that extra CR occurs, but there is no lockup there
and that was my other system.
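
The byte counts can be checked directly from the two GIO dumps above.
The following is only a minimal sketch, assuming the hex payloads are
copied by hand with the offset columns removed; the function names
hex_digits, hex_len and first_byte are made up for illustration.

```shell
#!/bin/sh
# Hex payloads copied from the two GIO records above (offsets removed).
hung='0d0f 1b5b 316d 5b63 6172 6c6a 4063 6a62 7364 3874 207e 5d24 1b5b 6d20'
normal='0f1b 5b31 6d5b 6361 726c 6a40 636a 6273 6438 7420 7e5d 241b 5b6d 20'

# hex_digits: remove whitespace so only the hex digits remain
hex_digits() {
    printf '%s' "$1" | tr -d ' \n'
}

# hex_len: number of bytes encoded by the payload (two hex digits per byte)
hex_len() {
    digits=$(hex_digits "$1")
    echo $(( ${#digits} / 2 ))
}

# first_byte: the first two hex digits, i.e. the first byte written
first_byte() {
    hex_digits "$1" | cut -c1-2
}

echo "hung:   $(hex_len "$hung") bytes, first byte 0x$(first_byte "$hung")"
echo "normal: $(hex_len "$normal") bytes, first byte 0x$(first_byte "$normal")"
```

This reports 28 bytes starting with 0x0d (carriage return) for the hung
prompt and 27 bytes starting with 0x0f for the normal one.  Apart from
that leading byte the two payloads are identical.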

The following is a section of a backtrace from gdb: 
  #0  0x28308540 in mbrtowc () from /lib/libc.so.7
  #1  0x080c7ce6 in getenv ()
  #2  0x080c1335 in getenv ()
  #3  0x080ae1d4 in getenv ()
  #4  0x080ac4b0 in getenv ()
  #5  0x080ac815 in getenv ()
  #6  0x080c3955 in getenv ()
  #7  0x080c3ac9 in getenv ()
  #8  0x080ac4b0 in getenv ()
  #9  0x080ac815 in getenv ()
  #10 0x080acb6c in getenv ()
  #11 0x080acf55 in getenv ()
  #12 0x08054611 in ?? ()
  #13 0x284a9a80 in ?? ()
  ...
  #67 0x2832cbfd in time () from /lib/libc.so.7

The first few entries change when I let it run for a while, but the
last 8-9 getenv addresses and everything before them remain the same.
There are a total of about 65 backtrace entries this time, some of
which are 0x addresses which seem suspicious.  The backtrace
from the other hung shell is also in getenv, but I didn't have ktrace
running on that one.

I am at the limit of my experience, so does anybody else have any
ideas about what could cause this, or how I could trace it further?  I
am keeping the processes attached to gdb, so I can do further checking
on them if anyone has any other ideas.  Thanks in advance for any
help, and thanks for the help that allowed me to get this far.

-- 
Carl Johnson    ca...@peak.org


Re: Bash lockups

2010-05-21 Thread Carl Johnson
Giorgos Keramidas  writes:

> On Fri, 21 May 2010 09:30:05 -0700, Carl Johnson  wrote:
>> Giorgos Keramidas  writes:
>>> Does this lock-up happen if you leave the shell 'idle' for too long
>>> over an ssh session?  There may be problems with stateful connection
>>> tracking between your terminal and the remote shell :-/
>>
>> No, I don't think that could be the problem.  I am just using ssh
>> between local machines and there is no firewall between them.  It also
>> often seems to happen to a shell as I switch away from it to another
>> one.  One suspicion is that something is sending a signal to the shell
>> as it switches, and bash sometimes doesn't handle that signal
>> properly.
>>
>> I also should have mentioned that I have been running bash as my
>> default shell for years under Linux and have never seen this problem
>> there.
>>
>> Thanks for the suggestion.
>
> That's ok.  If you can attach to the bash process with ktrace please try
> to grab a ktrace file from a deadlocked shell.  We may be able to see
> why it gets deadlocked by running kdump(8) on the shell trace file.
>
> You can run a second shell under ktrace (and hope that the parent
> doesn't deadlock before the traced child shell), by running:
>
> bash$ ktrace -f bash.trace bash --login
>
> When you exit from the child shell you can dump ktrace(8) events from
> the bash.trace file with:
>
> bash$ kdump -f bash.trace > logfile 2>&1
>
> Looking near the last records dumped in 'logfile' should be quite
> informative if the process is dead-locked or spinning around the same
> code over and over again.

Thanks for the detailed information.  I have been mostly a Linux user,
so this is new for me.  It hasn't been happening very often lately, so
it might be a while now.  I will definitely try to keep any hung
processes around to try your suggestions.

-- 
Carl Johnson    ca...@peak.org

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"


Re: Bash lockups

2010-05-21 Thread Giorgos Keramidas
On Fri, 21 May 2010 09:30:05 -0700, Carl Johnson  wrote:
> Giorgos Keramidas  writes:
>> Does this lock-up happen if you leave the shell 'idle' for too long
>> over an ssh session?  There may be problems with stateful connection
>> tracking between your terminal and the remote shell :-/
>
> No, I don't think that could be the problem.  I am just using ssh
> between local machines and there is no firewall between them.  It also
> often seems to happen to a shell as I switch away from it to another
> one.  One suspicion is that something is sending a signal to the shell
> as it switches, and bash sometimes doesn't handle that signal
> properly.
>
> I also should have mentioned that I have been running bash as my
> default shell for years under Linux and have never seen this problem
> there.
>
> Thanks for the suggestion.

That's ok.  If you can attach to the bash process with ktrace please try
to grab a ktrace file from a deadlocked shell.  We may be able to see
why it gets deadlocked by running kdump(8) on the shell trace file.

You can run a second shell under ktrace (and hope that the parent
doesn't deadlock before the traced child shell), by running:

bash$ ktrace -f bash.trace bash --login

When you exit from the child shell you can dump ktrace(8) events from
the bash.trace file with:

bash$ kdump -f bash.trace > logfile 2>&1

Looking near the last records dumped in 'logfile' should be quite
informative if the process is dead-locked or spinning around the same
code over and over again.
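
As a rough sketch of what looking near the last records can mean in
practice, the helper below prints the type and name of the final CALL
or RET record in a kdump text log.  The function name last_record is
hypothetical, and it assumes one record per line in either the plain or
the timestamped kdump layout.  The sample lines are the normal-prompt
records quoted in the 2010-05-28 message of this thread.

```shell
#!/bin/sh
# last_record: read kdump text on stdin and print the type and syscall
# name of the last CALL or RET record, for example "CALL read".
last_record() {
    awk '
        {
            # The record type is field 3 in plain kdump output and field 4
            # when a timestamp column is present.
            for (i = 3; i <= 4; i++) {
                if ($i == "CALL" || $i == "RET") {
                    type = $i
                    name = $(i + 1)
                    sub(/\(.*/, "", name)   # drop the argument list, if any
                }
            }
        }
        END { if (type != "") print type, name }
    '
}

# Sample: tail of the normal-prompt trace quoted in this thread.
last_record <<'EOF'
  67263 bash 61403.461497 CALL  sigprocmask(SIG_BLOCK,0,0x80e1e3c)
  67263 bash 61403.461504 RET   sigprocmask 0
  67263 bash 61403.461513 CALL  read(0,0xbfbfd95f,0x1)
EOF
```

For the sample this prints "CALL read", which is what a shell waiting
at its prompt would be expected to end on.  A log that ends on some
other record while the process is using CPU would be consistent with a
loop in userland, but that is an interpretation and not something the
trace proves by itself.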



Re: Bash lockups

2010-05-21 Thread Carl Johnson
Giorgos Keramidas  writes:

> On Wed, 19 May 2010 16:14:52 -0700, Carl Johnson  wrote:
>> I have been experimenting with FreeBSD for a while, and I consistently
>> get bash lockups at irregular intervals when it is otherwise idle.  By
>> lockup, I mean that it stops responding to the keyboard and uses 100%
>> CPU.  It will sometimes go for days with no problems, but I had two
>> yesterday, and another today.  They have occurred on test systems
>> running in VirtualBox and on a real computer, both i386 and amd64
>> images, and a mixture of 7.1, 7.3 and 8.0.  They usually seem to
>> happen when I am switching tabs in konsole or switching shells in
>> screen, but other times I think they happen when I am not even using
>> the system.  The only thing I have found I can do is to do a kill -9
>> and start a new shell.
>
> Does this lock-up happen if you leave the shell 'idle' for too long over
> an ssh session?  There may be problems with stateful connection tracking
> between your terminal and the remote shell :-/

No, I don't think that could be the problem.  I am just using ssh
between local machines and there is no firewall between them.  It also
often seems to happen to a shell as I switch away from it to another
one.  One suspicion is that something is sending a signal to the shell
as it switches, and bash sometimes doesn't handle that signal
properly.

I also should have mentioned that I have been running bash as my
default shell for years under Linux and have never seen this problem
there.

Thanks for the suggestion.

-- 
Carl Johnson    ca...@peak.org



Re: Bash lockups

2010-05-20 Thread Giorgos Keramidas
On Wed, 19 May 2010 16:14:52 -0700, Carl Johnson  wrote:
> I have been experimenting with FreeBSD for a while, and I consistently
> get bash lockups at irregular intervals when it is otherwise idle.  By
> lockup, I mean that it stops responding to the keyboard and uses 100%
> CPU.  It will sometimes go for days with no problems, but I had two
> yesterday, and another today.  They have occurred on test systems
> running in VirtualBox and on a real computer, both i386 and amd64
> images, and a mixture of 7.1, 7.3 and 8.0.  They usually seem to
> happen when I am switching tabs in konsole or switching shells in
> screen, but other times I think they happen when I am not even using
> the system.  The only thing I have found I can do is to do a kill -9
> and start a new shell.

Does this lock-up happen if you leave the shell 'idle' for too long over
an ssh session?  There may be problems with stateful connection tracking
between your terminal and the remote shell :-/



Re: Bash lockups

2010-05-20 Thread Carl Johnson
vogelke+u...@pobox.com (Karl Vogel) writes:

>>> On Wed, 19 May 2010 16:14:52 -0700, 
>>> Carl Johnson  said:
>
> C> I have been experimenting with FreeBSD for a while, and I consistently
> C> get bash lockups at irregular intervals when it is otherwise idle.
> C> Does anybody have any suggestions on how I could try to trace this?
>
>    1.  Get a process-table list every minute or so via cron.  It might show
>    something else running or trying to run when you have your lockups.
>    Try "ps -axw -o user,pid,ppid,pgid,tt,start,time,command".
>
>    2.  Get the PID of the bash session, and run something like this as root:
>
>    pid=12345
>    k=1
>    while true; do
>        truss -p $pid 2>&1 | head -1000 > /dir-with-lots-of-space/$k
>        k=`expr $k + 1`
>    done
>
>    This should break the truss output into 1000-line chunks and let you
>    clean out the directory before it chews up all your space.  Hopefully
>    one of the truss files will show something useful after a lockup.

Thanks for the ideas.  I keep several windows with shells open so I
don't want to trace all of them yet.  I don't even know what the
shells are doing when they lock up, so for now I'll just wait until
one locks up and then try truss to see what it is actually doing.
This happens only occasionally, so I will probably have to wait a
while.

I don't know that this is actually just a bash problem, since I have never
had it happen running on Linux in at least 10 years.
-- 
Carl Johnson    ca...@peak.org



Re: Bash lockups

2010-05-20 Thread Karl Vogel
>> On Wed, 19 May 2010 16:14:52 -0700, 
>> Carl Johnson  said:

C> I have been experimenting with FreeBSD for a while, and I consistently
C> get bash lockups at irregular intervals when it is otherwise idle.
C> Does anybody have any suggestions on how I could try to trace this?

   1.  Get a process-table list every minute or so via cron.  It might show
   something else running or trying to run when you have your lockups.
   Try "ps -axw -o user,pid,ppid,pgid,tt,start,time,command".

   2.  Get the PID of the bash session, and run something like this as root:

   pid=12345
   k=1
   while true; do
       truss -p $pid 2>&1 | head -1000 > /dir-with-lots-of-space/$k
       k=`expr $k + 1`
   done

   This should break the truss output into 1000-line chunks and let you
   clean out the directory before it chews up all your space.  Hopefully
   one of the truss files will show something useful after a lockup.
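
   A variation on step 2, offered only as a sketch: split(1) can cut a
   single stream into 1000-line files, so truss does not have to be
   restarted for each chunk.  The function name chunk_output and the
   "chunk." file prefix are made up for illustration, and the
   demonstration feeds generated lines through the function instead of
   real truss output.

```shell
#!/bin/sh
# chunk_output: read stdin and write it to DIR/chunk.aa, DIR/chunk.ab, ...
# with at most 1000 lines per file.  DIR is the first argument.
chunk_output() {
    split -l 1000 - "$1/chunk."
}

# Demonstration with 2500 generated lines standing in for truss output.
dir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 2500; i++) print "line", i }' | chunk_output "$dir"
ls "$dir"
```

   With a real process the same function would be fed from truss, for
   example "truss -p $pid 2>&1 | chunk_output /dir-with-lots-of-space".
   Older chunk files can still be removed while the pipeline is running.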

-- 
Karl Vogel  I don't speak for the USAF or my company

REMOTE CONTROL - female, because it gives a man pleasure, he'd be lost
without it, and while he doesn't always know the right buttons to push,
he keeps trying.   --from the "What gender are they?" list


Bash lockups

2010-05-19 Thread Carl Johnson
I have been experimenting with FreeBSD for a while, and I consistently
get bash lockups at irregular intervals when it is otherwise idle.  By
lockup, I mean that it stops responding to the keyboard and uses 100%
CPU.  It will sometimes go for days with no problems, but I had two
yesterday, and another today.  They have occurred on test systems
running in VirtualBox and on a real computer, both i386 and amd64
images, and a mixture of 7.1, 7.3 and 8.0.  They usually seem to
happen when I am switching tabs in konsole or switching shells in
screen, but other times I think they happen when I am not even using
the system.  The only thing I have found I can do is to do a kill -9
and start a new shell.

Does anybody have any suggestions on how I could try to trace this?  I
haven't been able to find any bug reports, but I don't know enough to
know how to search the FreeBSD problem reports very well.

Thanks for any help.  I already subscribe to this list, so there is no
need to cc me.
-- 
Carl Johnson    ca...@peak.org
