Rocco Caputo wrote:
On Thu, Feb 03, 2005 at 05:01:26PM -0800, Ofer Nave wrote:
I've spent all day trying to figure out why my program segfaults when I hit Ctrl-C. It's a massive program, with thousands of lines and dozens of modules and PoCo::Child and PoCo::IKC, and it spawns processes on remote hosts and communicates with them... total mess.
Aie!! Thanks for shrinking the test case. I don't have Sybase to run it against, but it's a lot easier to inspect by eye this way.
Thank *you* for looking at my test case. :)
After many hours, I finally got it down to a simple eight-line program that demonstrates the problem:

#!/usr/bin/perl -w
use DBD::Sybase;
use POE;
print "started (PID=$$)\n";
print "going into infinite loop\n";
my $i = 0;
while ( 1 ) { $i += 2; $i -= 2; last if ( $i == 9999 ); }
print "done\n";

If I run this program and then hit Ctrl-C, it says 'Segmentation fault' and returns to the prompt. It also leaves behind a fork of the process with a PID equal to the original PID + 2. That is, if the program had PID 13582 when it started, then after it segfaults there's an identical process hanging around in the process table with PID 13584.

POE doesn't fork, so I can only blame DBD::Sybase for that.

I just did a quick test with only DBD::Sybase included, starting and killing my script over and over. If I ran ps immediately afterward, I was sometimes quick enough to catch something like this:
onave 14882 0.0 0.0 18700 4200 pts/4 S 11:04 0:00 /usr/bin/perl -w ./seg.pl sybase
onave 14884 0.0 0.0 0 0 pts/4 Z 11:04 0:00 [seg.pl <defunct>]
onave 14885 0.0 0.0 0 0 pts/4 Z 11:04 0:00 [seg.pl <defunct>]
onave 14886 0.0 0.0 0 0 pts/4 Z 11:04 0:00 [seg.pl <defunct>]
onave 14887 0.0 0.0 0 0 pts/4 Z 11:04 0:00 [seg.pl <defunct>]
onave 14888 0.0 0.0 0 0 pts/4 Z 11:04 0:00 [seg.pl <defunct>]
These hung around for a second before disappearing. Why starting a script that does nothing but use DBD::Sybase and enter an infinite while loop would result in six processes, I don't know. I skimmed through the DBD::Sybase v1.04 Perl module and didn't see a single call to system(), fork(), or even a backtick. But I can't account for the C code that goes with it, or for DBI itself, which also gets loaded.
If I comment out 'use POE', it doesn't segfault. If I comment out 'use DBD::Sybase', it ignores my Ctrl-C and I have to use the kill command.
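A minimal sketch of why an installed SIGINT handler makes Ctrl-C look "ignored": when a handler exists for INT and it neither exits nor dies, Perl runs it and the program simply carries on. The handler here is a plain stand-in, not POE's actual handler, and the counter $caught is a hypothetical name used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in handler (hypothetical, not POE's): it counts the signal
# but does not call exit() or die(), so the process keeps running.
my $caught = 0;
$SIG{INT} = sub { $caught++ };

# Simulate pressing Ctrl-C by sending SIGINT to ourselves.
kill 'INT', $$;

# Perl's deferred ("safe") signals run the handler at the next safe
# point, so allow a short moment for it to fire.
for (1 .. 100) {
    last if $caught;
    select(undef, undef, undef, 0.01);
}

print $caught ? "SIGINT handled; process still running\n"
              : "SIGINT not seen\n";
```

Without the `$SIG{INT}` line, the same `kill` would terminate the script with the default disposition, which matches the difference between a plain script and one where a module has installed handlers.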
POE before 0.30 registers signal handlers for most of %SIG because it supports a generic _signal event. POE 0.30 and beyond don't register signal handlers willy-nilly, so at the very least ^C should interrupt your test program.
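The "%SIG is almost empty without POE" test mentioned in this thread can be sketched as a small script that reports which %SIG entries currently have a handler. The helper handled_signals() is hypothetical, and the INT handler is installed only so there is something to report; uncommenting `use POE;` with a pre-0.30 POE should, per the description above, show many more entries.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Uncomment to compare against a POE release that installs handlers:
# use POE;

# Hypothetical helper: return { signal name => handler kind } for every
# %SIG entry that is set to something other than the default disposition.
# Code refs are reported as 'CODE'; strings such as 'IGNORE' as-is.
sub handled_signals {
    my %report;
    for my $name (sort keys %SIG) {
        my $h = $SIG{$name};
        next unless defined $h && length $h;
        next if $h eq 'DEFAULT';
        $report{$name} = ref($h) || $h;
    }
    return \%report;
}

# Illustrative handler so the report is not empty.
$SIG{INT} = sub { print "caught SIGINT\n" };

my $handled = handled_signals();
printf "%d of %d %%SIG entries have a handler\n",
    scalar(keys %$handled), scalar(keys %SIG);
print "  $_ => $handled->{$_}\n" for sort keys %$handled;
```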
Ah. Cool.
I'm not exactly sure how to go about solving this problem. I also don't know much about what POE is doing with signals in the background, but from a few tests it's obvious that %SIG is almost empty in a typical non-POE Perl script, while the minute you include POE, all signals get handlers. That might explain why I can't Ctrl-C when I include POE, but it doesn't explain why Ctrl-C segfaults when I include both POE and DBD::Sybase.

I don't want to sound cold-hearted, but you should probably try the test case against the latest CPAN release of POE. The newer signal semantics may solve your problem.

I'm sure hoping they do, anyway. :)

I grabbed a copy of v0.3009 and installed it in a tmp directory. My test script no longer screams segfault when I hit Ctrl-C, and though I did see a few leftover processes at first, I now can't seem to reproduce them. I just started and killed the script 50 times, and the process table stayed clean. So it's a marked improvement.
However, throwing the new version into my real script didn't help. It still segfaults on Ctrl-C. :( I'll post again when I have more time to re-analyze with v0.3009.
-ofer