Dear all,

I am trying to speed up a very long procedure that I need to run on multiple
files, and I thought that I could multithread different jobs on different
files across multiple CPUs. For some reason that I don't really get, I only
achieve a very small time gain. I have included my script, which essentially
repeats the same function, extractSeq(), on multiple files using a maximum of
four threads.

I would really appreciate it if someone could help me finally understand how
to use threads to speed up some of my lengthy scripts.

Thanks

Marco

#!/usr/local/bin/perl -w

use strict;
use Bio::SeqIO;
use threads;
use Getopt::Std;

our $opt_p;

init();
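my @thr;
for (my $i=0;$i<=$#ARGV;$i++){
  # Start one thread per input file
  push @thr, threads->new(\&extractSeq, $ARGV[$i]);
  # Once $opt_p threads have been started (or the last file is reached),
  # wait for the whole batch to finish before starting any more
  if (scalar(@thr) == $opt_p || $i == $#ARGV){
    print "Running ",scalar(@thr)," parallel jobs\n";
    $_->join for @thr;
    undef @thr;
  }
}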

sub extractSeq {
  my $file=shift;

  my ($dir,$pre,$suf) = ($file=~/(^.+\/|^)(.+)\.(.+$)/);
  my $out_name = "$pre"."_CleanSeq.$suf";

  my $seqin = Bio::SeqIO->new(-file => $file,
             -format =>'fasta');

  my $seq_out = Bio::SeqIO->new(-file => ">$out_name",
                  -format => 'fasta');

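  while (my $seq = $seqin->next_seq){
    # Keep only the reads that contain the B primer
    if ($seq->seq =~ /AGATC/){
      # $-[0] is the 0-based start of the match, so this keeps everything
      # up to and including the 5 nt of the primer
      $seq->seq($seq->subseq(1,$-[0]+5));

      $seq_out->write_seq($seq);
    }
  }
  return(0);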
}


sub init {
  getopts("p:");
  unless (@ARGV) {
    print("extractseq.pl [-p 4] seq_1.fa [seq_2.fa ...]\n\n",
      "Take Solexa sequences in Fasta format and\n",
      "\t1)Find the B primer\n",
      "\t2)Extract the sequences before the B primer leaving 5 nt of B primer\n\n",
      "-p\tNumber of processors to be used to process the files when more than ",
      "one file is passed to the command line\n",
      "\tDefault 4\n\n");
    exit(1);
  }
  $opt_p = 4 unless $opt_p;
  return(0);
}
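
For anyone who wants to try the trimming step without BioPerl installed, here
is a minimal standalone sketch of what extractSeq() does to each sequence. The
helper name trim_at_primer and the two sample reads are made up for
illustration and are not part of the script above; the sketch assumes that
subseq(1,N) corresponds to the first N characters of the sequence string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): keep everything up to
# and including the first 5 nt of the B primer. Returns undef in scalar
# context when the primer is absent.
sub trim_at_primer {
    my ($seq) = @_;
    return unless $seq =~ /AGATC/;
    # $-[0] is the 0-based offset where the match starts, so the first
    # $-[0] + 5 characters end right after "AGATC".
    return substr($seq, 0, $-[0] + 5);
}

# Made-up reads, for illustration only.
for my $read (qw(TTTTAGATCGGAAG ACGTACGT)) {
    my $trimmed = trim_at_primer($read);
    print "$read -> ",
        (defined $trimmed ? $trimmed : "(no primer, skipped)"), "\n";
}
```

The first read contains the primer at offset 4 and is cut right after it; the
second has no primer and is skipped, just as the script writes nothing for
such reads.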

--
Marco Blanchette, Ph.D.
Assistant Investigator
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018
