Using Solaris 2.8 and Perl v5.8.7 built for sun4-solaris-thread-multi

I have a small script that is given a collection of email addresses (anywhere from 1 to 800+) along with a message, and it sends the message to each address in turn. If the address is a standard email address, the script simply emails it. If it's an address hosted by my2way.com, the script uses the LWP / HTTP modules to post the message to their web page.

When the program starts up, it spawns up to 25 worker threads to work through the
collection of addresses. On rare occasions, a thread create fails and my program goes downhill from there. Not being a Perl expert, I'm struggling to find out how
to recover gracefully.

I could post the entire script if needed, but here are the key parts.

#!/apps/soc/bin/perl
use threads;
use threads::shared;

sub worker_thread ($)
{
   my $thread_number = $_[0];
   my $single_address;
   while (1)
   {
       {        # used for locking
           lock (@temp_address_list);
           if (!defined ($single_address = shift @temp_address_list)) #get the next address
           {
               # print LOGFILE format_log_header () . "Thread: $thread_number NULL addresses left - exiting: $thread_number\n";
               return;  #terminate thread
           }
       }        # unlock here
       if (!send_page ($single_address, $thread_number))
       {
           return;        #terminate this thread if it errors
       }
   }
}


##        Now start a loop of each address to send to.
##
if ($address_count > $max_thread_count) {
   $threads_started = $max_thread_count;
}
else {
   $threads_started = $address_count;
}

for ($loop = 0; $loop < $threads_started; $loop++)
{
   print LOGFILE format_log_header () . "Thread Starting: $loop\n";
   print format_log_header () . "Thread Starting: $loop\n";
   $threads_list[$loop] = threads->new(\&worker_thread, $loop);
}

for ($loop = 0; $loop < $threads_started; $loop++)
{
   $threads_list[$loop]->join;
   print "Thread completed\n";
   print LOGFILE format_log_header () . "Reaping thread:  $loop\n";
}

The problem is I occasionally get this error:
FATAL: Callback called exit at /apps/soc/paging/send_page.pl line 447, <STDIN> line 15.
where line 447 is the threads->new statement.
After this error, I may also get this error further into the script:
FATAL: END failed--call queue aborted at /apps/soc/paging/send_page.pl line 11
which points to the header comments of the script.

My problem is probably as simple as not checking whether the thread create fails, but what is the best way to do this? Also, any ideas on how to reduce or eliminate the failure
in the first place? It happens in maybe only 1 in 500 executions.
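
To make the question concrete, here is one possible shape for such a check. It is only a sketch under stated assumptions: start_workers and reap_workers are hypothetical names that are not in the real script, and it assumes that threads->create (the same call as threads->new) returns undef when creation fails, which is what the threads documentation for newer releases describes and may not hold for the version bundled with 5.8.7. It also does not establish whether eval can trap the "Callback called exit" failure shown above.

```perl
use strict;
use warnings;
use threads;

# Hypothetical helper: try to start $wanted workers running $code and
# return only the thread objects that were actually created.
sub start_workers {
    my ($wanted, $code) = @_;
    my @started;
    for my $n (0 .. $wanted - 1) {
        # eval traps a die raised during creation; an undef result is
        # treated as a failed create (assumption, see the note above).
        my $thr = eval { threads->create($code, $n) };
        if (!defined $thr) {
            warn "Thread $n failed to start: "
               . ($@ || 'create returned undef') . "\n";
            next;    # carry on with the workers that did start
        }
        push @started, $thr;
    }
    return @started;
}

# Join only the threads that exist and collect their return values.
sub reap_workers {
    return map { scalar $_->join } @_;
}
```

With something like this, the join loop would walk the returned list instead of counting from 0 to $threads_started, so a slot that never got a thread is never joined. Because the workers pull addresses from a shared list, the remaining threads would still drain it if one create fails.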

Thanks for any help. Again, if people need the full script, I can easily post it.
Brian


