Re: rearrange text

2003-08-31 Thread John W. Krahn
[ Top-posting fixed ]


Mike Robeson wrote:
 
 In article [EMAIL PROTECTED], [EMAIL PROTECTED] (John W. Krahn)
 wrote:
 
  You keep changing the specs Mike.  :-)   Based on your code and data above,
  this will work:
 
  #!/usr/bin/perl
  use warnings;
  use strict;

  print "Enter the path of the INFILE to be processed: ";
  chomp( my $infile = <STDIN> );
  open INFILE, '<', $infile or die "Can't open $infile for input: $!";

  print "Enter in the path of the OUTFILE: ";
  chomp( my $outfile = <STDIN> );
  open OUTFILE, '>', $outfile or die "Can't open $outfile for output: $!";

  print "Enter in the LENGTH you want the sequence to be: ";
  my ( $len ) = <STDIN> =~ /(\d+)/ or die "Invalid length parameter";

  print OUTFILE "R 1 $len\n\n"; # The top of the file is supposed

  $/ = '>';  # Set input record separator

  while ( <INFILE> ) {
      chomp;
      next unless s/^\s*(\S+)//;
      my $name = $1;
      my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
      print OUTFILE " @char   $name\n";
  }

  close INFILE;
  close OUTFILE;

  __END__
 
 Yeah, the parameters keep changing because my buddy and I had
 incorrectly remembered the format it was supposed to be (several times).

If you think it's hard on your end, the only data I can test on is the
data you post here.  :-)


 Sorry about that. Anyway, what you provided did work with some changes.
 The code you sent didn't do anything other than give me the command
 prompt again, but I managed to get it to work anyway. However, I can't
 seem to figure out why I cannot use this line:
 
 my @char = ( split( // ), ( '-' ) x ( $len - length ) );

split( // ) returns a list of ALL the characters in $_ (newlines
included), and the use of length() assumes that ONLY valid characters
are in $_.


 instead of this line:
 
 my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];

/[acgt]/g returns a list of ONLY the characters 'a', 'c', 'g' and 't',
and using the list slice allows for the possibility that there are
invalid characters in $_.
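Here is a small sketch that makes the difference concrete. The sub names `pad_split` and `pad_slice` are made up for illustration. Both lines are run on a record like the ones read with a '>' record separator, where the sequence still contains its newlines. On that kind of input, the `split` version treats the newlines as characters, so they end up in the output and the padding comes out short. The match-and-slice version keeps only a/c/g/t:

```perl
use strict;
use warnings;

# Mike's preferred line: split out EVERY character, then pad by length().
sub pad_split {
    my ( $seq, $len ) = @_;
    local $_ = $seq;
    my @char = ( split( // ), ( '-' ) x ( $len - length ) );
    return "@char";
}

# John's line: keep only a/c/g/t, append $len hyphens, slice to $len.
sub pad_slice {
    my ( $seq, $len ) = @_;
    local $_ = $seq;
    my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
    return "@char";
}

# A record that spans two lines still contains its newlines.
my $record = "agat\nagg\n";
print "split: [", pad_split( $record, 12 ), "]\n";   # newlines leak into the line
print "slice: [", pad_slice( $record, 12 ), "]\n";   # a g a t a g g - - - - -
```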


 in the script - I get errors ( I mean there are other changes that need
 to be made but these lines are my major focus). I just like the way the
 first line of code calculates the amount of dashes to add. It's just an
 aesthetic thing. :-)

It works fine if you KNOW that ONLY valid characters are in $_.


 Note I changed it to a-z because we use many other
 characters than atcg; for example, n means unknown base. Also, I just
 do not understand the second line of code above as clearly as I do the
 first. From what I can gather you are making, say, 50 dashes and then
 filling in the dashes that match the characters within [a-z] from left
 to right as long as there are characters to fill in the dashes. Does
 that make sense?

The match /[acgt]/g creates a list of the valid characters from $_,
$len hyphens are appended on the end, and that list is sliced to the
length of $len.
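It may be easier to follow the same idiom split into its three steps. In this sketch `pad_to` is a hypothetical name. Nothing is "filled in". The letters come first, the hyphens are appended after them, and the slice cuts the list down to $len elements. Because $len hyphens are always appended, the list is never too short. The same slice also truncates a sequence that is longer than $len:

```perl
use strict;
use warnings;

# The list-slice idiom written out one step at a time.
sub pad_to {
    my ( $seq, $len ) = @_;
    my @valid  = $seq =~ /[acgt]/g;         # 1. only the valid characters, in order
    my @padded = ( @valid, ('-') x $len );   # 2. append $len hyphens, always enough
    return @padded[ 0 .. $len - 1 ];         # 3. keep exactly the first $len elements
}

print join( ' ', pad_to( 'agatagatcgcatcga', 22 ) ), "\n";   # padded with hyphens
print join( ' ', pad_to( 'acgtacgtacgt', 5 ) ), "\n";        # truncated to 5
```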


 Anyway, when I add $/ = '>'; to the original script below I get a
 
I see no script below.

 concatenation error (why?). Basically, I am trying to understand why

Did you chomp the input?  Did you test for valid data before processing
it?
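Assume the "concatenation error" was Perl's "Use of uninitialized value ... in concatenation" warning. One likely cause is that with `$/ = '>'` each record *ends* with the '>' instead of starting with it. A pattern like `s/^\s*>(.+)//` then never matches, and `$1` stays undefined. Below is a minimal sketch of the chomp-then-validate pattern. It uses a hypothetical `record_to_line` sub that takes one record as a string:

```perl
use strict;
use warnings;

# Turn one record, as read with $/ = '>', into an output line.
# With that separator the '>' sits at the END of each record, so a
# pattern expecting a leading '>' fails and $1 is never set.
sub record_to_line {
    my ( $record, $len ) = @_;
    local $/ = '>';
    local $_ = $record;
    chomp;                           # removes the trailing '>' (the current $/)
    return unless s/^\s*(\S+)//;     # validate first: no name, no output
    my $name = $1;                   # only read $1 after a successful match
    my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
    return " @char   $name";
}

print record_to_line( "dog\nagat\nag\n>", 8 ), "\n";
```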

 certain bits of code break the script and others don't. I find I learn
 things better by trying alternate forms of code to figure out
 relationships. However, I have been looking through my perl books like
 crazy and can't seem to understand some of these relationships. I know
 it will take time and experience... I guess I'll pick up another perl
 book that provides another perspective.
 
 Sorry about taking so long with this, I am trying to make an honest
 effort to learn. :-)

No problem.   :-)


John
-- 
use Perl;
program
fulfillment

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: rearrange text

2003-08-31 Thread Mike Robeson
Yeah, the parameters keep changing because my buddy and I had
incorrectly remembered the format it was supposed to be (several times).
Sorry about that. Anyway, what you provided did work with some changes.
The code you sent didn't do anything other than give me the command
prompt again, but I managed to get it to work anyway. However, I can't
seem to figure out why I cannot use this line:

my @char = ( split( // ), ( '-' ) x ( $len - length ) );

instead of this line: 

my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];

in the script - I get errors ( I mean there are other changes that need 
to be made but these lines are my major focus). I just like the way the 
first line of code calculates the amount of dashes to add. It's just an 
aesthetic thing. :-) Note I changed it to a-z because we use many other
characters than atcg; for example, n means unknown base. Also, I just
do not understand the second line of code above as clearly as I do the
first. From what I can gather you are making, say, 50 dashes and then
filling in the dashes that match the characters within [a-z] from left
to right as long as there are characters to fill in the dashes. Does
that make sense?

Anyway, when I add $/ = '>'; to the original script below I get a
concatenation error (why?). Basically, I am trying to understand why
certain bits of code break the script and others don't. I find I learn 
things better by trying alternate forms of code to figure out 
relationships. However, I have been looking through my perl books like 
crazy and can't seem to understand some of these relationships. I know 
it will take time and experience... I guess I'll pick up another perl 
book that provides another perspective.

Sorry about taking so long with this, I am trying to make an honest 
effort to learn. :-)

-Thanks
-Mike
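About the change to [a-z]: `/[a-z]/ig` accepts any letter, so a stray 'x' or a typo would be kept as if it were a base. If the goal is only to allow ambiguity codes such as 'n', one option is to list the IUPAC nucleotide codes explicitly: a, c, g, t, u plus r, y, k, m, s, w, b, d, h, v and n. Here is a sketch with a hypothetical `pad_iupac` helper, using the same pad-and-slice idiom:

```perl
use strict;
use warnings;

# IUPAC nucleotide codes: a c g t u, plus the ambiguity codes r y k m s w b d h v n.
my $IUPAC = qr/[acgtunrykmswbdhv]/i;

sub pad_iupac {
    my ( $seq, $len ) = @_;
    my @char = ( $seq =~ /$IUPAC/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
    return "@char";
}

print pad_iupac( "agnt\nryx", 10 ), "\n";   # 'x' is not a nucleotide code, so it is dropped
```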





Re: rearrange text

2003-08-30 Thread Mike
Well this is the final code I put together with everyone's help from this
group:

#!/usr/bin/perl
use warnings;
use strict;

print "Enter the path of the INFILE to be processed:\n";

chomp (my $infile = <STDIN>);

open(INFILE, '<', $infile)
  or die "Can't open INFILE for input: $!";

print "Enter in the path of the OUTFILE:\n";

chomp (my $outfile = <STDIN>);

open(OUTFILE, '>', $outfile)
  or die "Can't open OUTFILE for output: $!";

print "Enter in the LENGTH you want the sequence to be:\n";

chomp (my $len = <STDIN>);

my ($name, @seq);
while ( <INFILE> ) {
    chomp;
    unless ( /^\s*$/ or s/^\s*>(.+)// ) {
        $name = $1;
        my @char = ( split( // ), ( '-' ) x ( $len - length ) );
        push @seq, ' ' . "@char   $name";
    }
}

{
    local $" = "\n";
    print OUTFILE "R 1 $len\n\n@seq\n"; # The top of the file is supposed
}

close INFILE;
close OUTFILE;



Basically it will take this file:

>dog
atcgc
>cat
atcgctac
>mouse
agctata


and turn it into this:
R 1 10
 a t c g c - - - - -   dog
 a t c g c t a c - -   cat
 a g c t a t a - - -   mouse

However, I forgot that sometimes the input data is like this:

>dog
agatgtagt
agtggttga
agggagc
>cat
gcatcgatg
agcatatgc
>mouse
actagcatc
acgtacgat

That is, the sequence of letters can span multiple lines. I would like
the above script to handle input data whose sequences span several
lines as well as data whose sequences do not, and to produce the
output described above.

You all have been much help! I have really learned a lot with the help 
you've given so far!

-Thanks!
-Mike
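Sequences that span several lines can also be handled without changing the record separator. Here is a sketch of that alternative: read line by line, and append each sequence line to the record started by the most recent header. It assumes header lines begin with '>' as in FASTA. `read_records` is a hypothetical helper, and the sample data is read from an in-memory filehandle so the sketch runs on its own:

```perl
use strict;
use warnings;

# Collect [ name, sequence ] pairs; a sequence may span several lines.
sub read_records {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        if ( $line =~ /^\s*>\s*(\S+)/ ) {        # header line starts a new record
            push @records, [ $1, '' ];
        }
        elsif (@records) {                        # sequence line: add to the current record
            ( my $bases = $line ) =~ s/\s+//g;    # drop the newline and any spaces
            $records[-1][1] .= $bases;
        }
    }
    return @records;
}

my $data = ">dog\nagatgtagt\nagtggttga\nagggagc\n>cat\ngcatcgatg\nagcatatgc\n>mouse\nactagcatc\nacgtacgat\n";
open my $fh, '<', \$data or die "Can't open in-memory data: $!";

my $len = 30;
for my $rec ( read_records($fh) ) {
    my ( $name, $seq ) = @$rec;
    my @char = ( $seq =~ /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
    print " @char   $name\n";
}
```

Lines that appear before the first header are ignored in this sketch.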






Re: rearrange text

2003-08-30 Thread John W. Krahn
Mike wrote:
 
 Well this is the final code I put together with everyone's help from this
 group:
 
 #!/usr/bin/perl
 use warnings;
 use strict;

 print "Enter the path of the INFILE to be processed:\n";
 chomp (my $infile = <STDIN>);
 open(INFILE, '<', $infile)
   or die "Can't open INFILE for input: $!";
 print "Enter in the path of the OUTFILE:\n";
 chomp (my $outfile = <STDIN>);
 open(OUTFILE, '>', $outfile)
   or die "Can't open OUTFILE for output: $!";
 print "Enter in the LENGTH you want the sequence to be:\n";
 chomp (my $len = <STDIN>);

 my ($name, @seq);
 while ( <INFILE> ) {
     chomp;
     unless ( /^\s*$/ or s/^\s*>(.+)// ) {
         $name = $1;
         my @char = ( split( // ), ( '-' ) x ( $len - length ) );
         push @seq, ' ' . "@char   $name";
     }
 }

 {
     local $" = "\n";
     print OUTFILE "R 1 $len\n\n@seq\n"; # The top of the file is supposed
 }

 close INFILE;
 close OUTFILE;
 
 [snip]
 
 However, I forgot that sometimes the input data is like this:

 >dog
 agatgtagt
 agtggttga
 agggagc
 >cat
 gcatcgatg
 agcatatgc
 >mouse
 actagcatc
 acgtacgat

 That is, the sequence of letters can span multiple lines. I would like
 the above script to handle input data whose sequences span several
 lines as well as data whose sequences do not, and to produce the
 output described above.

You keep changing the specs Mike.  :-)   Based on your code and data above, this will 
work:

#!/usr/bin/perl
use warnings;
use strict;

print "Enter the path of the INFILE to be processed: ";
chomp( my $infile = <STDIN> );
open INFILE, '<', $infile or die "Can't open $infile for input: $!";

print "Enter in the path of the OUTFILE: ";
chomp( my $outfile = <STDIN> );
open OUTFILE, '>', $outfile or die "Can't open $outfile for output: $!";

print "Enter in the LENGTH you want the sequence to be: ";
my ( $len ) = <STDIN> =~ /(\d+)/ or die "Invalid length parameter";

print OUTFILE "R 1 $len\n\n"; # The top of the file is supposed

$/ = '>';  # Set input record separator

while ( <INFILE> ) {
    chomp;
    next unless s/^\s*(\S+)//;
    my $name = $1;
    my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
    print OUTFILE " @char   $name\n";
}

close INFILE;
close OUTFILE;

__END__
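This sketch shows what setting `$/` to '>' does by listing each record a loop like the one above would see. The `records_by_separator` name is hypothetical. Each read stops just after a '>'. The first read therefore returns a lone '>', which is empty once chomped and gets skipped. Later records carry their trailing '>', which `chomp` removes. The newlines inside a record are ignored by `/[acgt]/g`. The sketch assumes '>' appears only at the start of header lines:

```perl
use strict;
use warnings;

# Read records using '>' as the input record separator.
sub records_by_separator {
    my ($fh) = @_;
    local $/ = '>';                            # each read stops just after a '>'
    my @out;
    while ( my $rec = <$fh> ) {
        chomp $rec;                            # removes that trailing '>' (the current $/)
        next unless $rec =~ s/^\s*(\S+)//;     # the first read is a lone '>': skip it
        my $name = $1;                         # copy before the next match resets $1
        my $seq  = join '', $rec =~ /[acgt]/g; # newlines inside the record are ignored
        push @out, [ $name, $seq ];
    }
    return @out;
}

my $data = ">dog\nagatgtagt\nagtggttga\nagggagc\n>cat\ngcatcgatg\nagcatatgc\n";
open my $fh, '<', \$data or die "Can't open in-memory data: $!";
for my $r ( records_by_separator($fh) ) {
    print "$r->[0]: $r->[1]\n";
}
```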



John



Re: rearrange text

2003-08-26 Thread David Wall


--On Monday, August 25, 2003 6:50 PM -0400 Mike Robeson 
[EMAIL PROTECTED] wrote:

OK, I feel like an idiot. When I initially asked for help with this, I
forgot two little details (I only just realized it). I was supposed to
add the number of sequences as well as the length of the sequences at
the top of the output file.

That is, this file:

>dog
agatagatcgcatcga
>cat
acgcttcgatacgctagctta
>mouse
agatatacgggtt

is really supposed to be:

3 22
a g a t a g a t c g c a t c g a - - - - - -dog
a c g c t t c g a t a c g c t a g c t t a -cat
a g a t a t a c g g g t t - - - - - - - - -mouse
The '3' represents the number of individual sequences in the file (i.e.
dog, cat, mouse). And the 22 is the number of letters and dashes there
are. The length is already in the script as $len. I am able to get the
length listed at the top. However, I cannot find a way to have the
number of sequences (the 3 in this case) printed to the top.

Here's one way (slightly altering John's solution), but it will use lots of
memory if the sequences are long.

#!/usr/bin/perl
use warnings;
use strict;
my ($name, $num_seq, @seq);
my $len = 30;
while ( <DATA> ) {
   unless ( /^\s*$/ or s/^\s*>(\S+)// ) {
      my $name = $1;
      my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
      push @seq, "@char$name";
      $num_seq++;
   }
}
{
   local $" = "\n";
   print "$num_seq $len\n@seq\n";
}
__DATA__
 >dog
agatagatcgcatcga
 >cat
acgcttcgatacgctagctta
 >mouse
agatatacgggt
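If memory is a concern, a two-pass sketch avoids holding every output line. It counts the header lines first and writes the "count length" line. It then rewinds with `seek` and streams the records, keeping only the current one in memory. This assumes a seekable input (a regular file, not a pipe or STDIN) and headers that begin with '>'. `convert_two_pass` is a hypothetical name:

```perl
use strict;
use warnings;

sub convert_two_pass {
    my ( $in, $out, $len ) = @_;

    # Pass 1: count the header lines.
    my $count = 0;
    while ( my $line = <$in> ) {
        $count++ if $line =~ /^\s*>/;
    }
    seek( $in, 0, 0 ) or die "Can't rewind input: $!";
    print {$out} "$count $len\n";

    # Pass 2: stream the records, holding only the current one.
    my ( $name, $seq ) = ( undef, '' );
    my $emit = sub {                       # print the record collected so far
        return unless defined $name;
        my @char = ( $seq =~ /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
        print {$out} "@char$name\n";
    };
    while ( my $line = <$in> ) {
        if ( $line =~ /^\s*>\s*(\S+)/ ) {
            my $next_name = $1;            # save before $emit runs another match
            $emit->();
            ( $name, $seq ) = ( $next_name, '' );
        }
        else {
            $seq .= $line;                 # newlines are skipped later by /[acgt]/g
        }
    }
    $emit->();                             # the last record has no header after it
    return $count;
}

my $in_data  = " >dog\nagatagatcgcatcga\n >cat\nacgcttcgatacgctagctta\n >mouse\nagatatacgggtt\n";
my $out_data = '';
open my $in,  '<', \$in_data  or die "Can't open input: $!";
open my $out, '>', \$out_data or die "Can't open output: $!";
convert_two_pass( $in, $out, 22 );
close $out;
print $out_data;
```

Reading from STDIN or a pipe would still need the buffering approach, since those cannot be rewound.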




Re: rearrange text

2003-08-25 Thread Mike Robeson

I do not know what happened, but the text didn't get formatted correctly
on the list. But this is how the output should really have been:

a g a t a g a t c g c a t c g a - - - - - -dog
a c g c t t c g a t a c g c t a g c t t a -cat
a g a t a t a c g g g t t - - - - - - - - -mouse

That is, I want the edited sequence data and the name on the same line.


-Thanks
-Mike






Re: rearrange text

2003-08-25 Thread zsdc
Mike Robeson wrote:

I do not know what happened, but the text didn't get formatted correctly
on the list. But this is how the output should really have been:

a g a t a g a t c g c a t c g a - - - - - -dog
a c g c t t c g a t a c g c t a g c t t a -cat
a g a t a t a c g g g t t - - - - - - - - -mouse

That is, I want the edited sequence data and the name on the same line.

I'd hate to mess with your DNA (but just in case -- I, for one, welcome 
our new super dog-cat-mouse mutant overlords) but I'll post two links 
you may find interesting (if you don't already know them, that is).

First, you might take a look at the bioperl project: http://bioperl.org/ 
There's a mailing list you may find very useful, bioperl-l at 
bioperl.org: http://www.bioperl.org/MailList.shtml

There's also a book, Beginning Perl for Bioinformatics written by James 
Tisdall: http://www.oreilly.com/catalog/begperlbio/

I wish you good luck with your creations.

-zsdc.



Re: rearrange text

2003-08-25 Thread John W. Krahn
Mike Robeson wrote:
 
 In article [EMAIL PROTECTED], [EMAIL PROTECTED] (John W. Krahn)
 wrote:
 
  According to your data this should work:
 
  #!/usr/bin/perl
  use warnings;
  use strict;
 
  my $len = 30;  # pad out to this length
  while ( <DATA> ) {
      unless ( s/^\s*>// ) {
          chomp;
          my @char = ( split( // ), ( '-' ) x ( $len - length ) );
          $_ = "@char\n";
      }
      print;
  }

  __DATA__
   >dog
  agatagatcgcatcga
   >cat
  acgcttcgatacgctagctta
   >mouse
  agatatacgggt
 
 I do not know what happened, but the text didn't get formatted correctly
 on the list. But this is how the output should really have been:
 
 a g a t a g a t c g c a t c g a - - - - - -dog
 a c g c t t c g a t a c g c t a g c t t a -cat
 a g a t a t a c g g g t t - - - - - - - - -mouse
 
 That is, I want the edited sequence data and the name on the same line.

#!/usr/bin/perl
use warnings;
use strict;

my $len = 30;
my $name;
while ( <DATA> ) {
    chomp;
    unless ( s/^\s*>(.+)// ) {
        $name = $1;
        my @char = ( split( // ), ( '-' ) x ( $len - length ) );
        print "@char$name\n";
    }
}

__DATA__
 >dog
agatagatcgcatcga
 >cat
acgcttcgatacgctagctta
 >mouse
agatatacgggt
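The loop above appears to rely on `$1` still holding the name captured on the previous (header) line, since a failed match leaves `$1` unchanged. The sketch below keeps the name in an explicit variable instead, so the logic does not depend on that behavior. `same_line_output` is a hypothetical name. Like the original, it expects one sequence line per record:

```perl
use strict;
use warnings;

# Remember the current name explicitly instead of relying on $1
# carrying over from the header line.
sub same_line_output {
    my ( $fh, $len ) = @_;
    my ( $name, @out );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^\s*>\s*(\S+)/ ) {            # header: remember the name
            $name = $1;
            next;
        }
        next unless defined $name && $line =~ /\S/;  # skip blanks and lines before any header
        my @char = ( $line =~ /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
        push @out, "@char$name";
    }
    return @out;
}

my $data = " >dog\nagatagatcgcatcga\n >cat\nacgcttcgatacgctagctta\n >mouse\nagatatacgggt\n";
open my $fh, '<', \$data or die "Can't open in-memory data: $!";
print "$_\n" for same_line_output( $fh, 22 );
```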



John



Re: rearrange text

2003-08-25 Thread Mike Robeson
John,
Thanks for the help!! I knew it had to be simple... but I just didn't see
it! I just need to add some more code to it but I think I can take it 
from here.

Thanks again!
-Mike







Re: rearrange text

2003-08-25 Thread John W. Krahn
Mike Robeson wrote:
 
 In article [EMAIL PROTECTED], [EMAIL PROTECTED] (John W. Krahn)
 wrote:
 
  Mike Robeson wrote:
  
   I do not know what happend but the text didn't get formatted correctly
   on the list. But this is how the out put should really have been:
  
   a g a t a g a t c g c a t c g a - - - - - -dog
   a c g c t t c g a t a c g c t a g c t t a -cat
   a g a t a t a c g g g t t - - - - - - - - -mouse
  
   That is, I want the edited sequence data and the name on the same line.
 
  #!/usr/bin/perl
  use warnings;
  use strict;
 
  my $len = 30;
  my $name;
  while ( <DATA> ) {
      chomp;
      unless ( s/^\s*>(.+)// ) {
          $name = $1;
          my @char = ( split( // ), ( '-' ) x ( $len - length ) );
          print "@char$name\n";
      }
  }

  __DATA__
   >dog
  agatagatcgcatcga
   >cat
  acgcttcgatacgctagctta
   >mouse
  agatatacgggt
 
 Thanks for the help!! I knew it had to be simple... but I just didn't see
 it! I just need to add some more code to it but I think I can take it
 from here.

You can make that a bit more robust.  :-)

#!/usr/bin/perl
use warnings;
use strict;

my $len = 30;
while ( <DATA> ) {
    unless ( /^\s*$/ or s/^\s*>(\S+)// ) {
        my $name = $1;
        my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
        print "@char$name\n";
    }
}

__DATA__
 >dog
agatagatcgcatcga
 >cat
acgcttcgatacgctagctta
 >mouse
agatatacgggt



John



Re: rearrange text

2003-08-25 Thread Mike Robeson
OK, I feel like an idiot. When I initially asked for help with this, I
forgot two little details (I only just realized it). I was supposed to
add the number of sequences as well as the length of the sequences at
the top of the output file.

That is, this file:

>dog
agatagatcgcatcga
>cat
acgcttcgatacgctagctta
>mouse
agatatacgggtt

is really supposed to be:

3 22
a g a t a g a t c g c a t c g a - - - - - -dog
a c g c t t c g a t a c g c t a g c t t a -cat
a g a t a t a c g g g t t - - - - - - - - -mouse

The '3' represents the number of individual sequences in the file (i.e. 
dog, cat, mouse). And the 22 is the number of letters and dashes there 
are. The length is already in the script as $len. I am able to get the 
length listed at the top. However, I cannot find a way to have the 
number of sequences (the 3 in this case) printed to the top. 

Is there a way that I can just add the header to the beginning of the
output file afterwards?

Sorry about this. I didn't realize I forgot to include this info. I
guess I am too busy trying to learn PERL and I am not paying attention to
what I need PERL to do for me!  :-)

-Thanks
-Mike
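Files can only grow at the end, so there is no direct way to put a header in front of text that is already in the file. The header has to be written first, which means the count must be known beforehand. Buffering everything in memory is one way. Another is to stream the converted lines to a temporary file, then write the header and copy the body after it. Here is a sketch of that second approach using the core File::Temp module. `write_with_header` is a hypothetical name, and FASTA headers starting with '>' are assumed:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

sub write_with_header {
    my ( $in, $out, $len ) = @_;
    my ( $tmp, $tmp_name ) = tempfile( UNLINK => 1 );   # removed at program exit
    my $count = 0;
    {
        local $/ = '>';                                  # one FASTA record per read
        while ( my $rec = <$in> ) {
            chomp $rec;
            next unless $rec =~ s/^\s*(\S+)//;
            my $name = $1;
            my @char = ( $rec =~ /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
            print {$tmp} "@char$name\n";
            $count++;
        }
    }
    print {$out} "$count $len\n";                        # the header goes first...
    seek( $tmp, 0, 0 ) or die "Can't rewind $tmp_name: $!";
    while ( my $line = <$tmp> ) {                        # ...then the body is copied after it
        print {$out} $line;
    }
    close $tmp;
    return $count;
}

my $in_data  = ">dog\nagat\n>cat\nacgtac\n";
my $out_data = '';
open my $in,  '<', \$in_data  or die "Can't open input: $!";
open my $out, '>', \$out_data or die "Can't open output: $!";
write_with_header( $in, $out, 8 );
close $out;
print $out_data;
```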





RE: rearrange text

2003-08-24 Thread Wagner, David --- Senior Programmer Analyst --- WGO
Mike Robeson wrote:
 Hello,
 
 I am a relatively new PERL beginner and have been trying to work with
 simple bioinformatics stuff. I have so far written some very useful
 but simple bioinformatics scripts. However recently I have been
 trying to work on a script to no avail. I have a text file whose
 contents are: 
 
  >dog
 agatagatcgcatcga
  >cat
 acgcttcgatacgctagctta
  >mouse
 agatatacgggt
 
  and so on...
 
 I would like to turn that into this:
 
 a g a t a g a t c g c a t c g a - - - - - - - - - - - - - - -
 dog
 a c g c t t c g a t a c g c t a g c t t a - - - - - - - - - -
 cat
 a g a t a t a c g g g t t  - - - - - - - - - - - - - - - - - - -
 mouse
 
 Notice that the sequence of letters varies however I need the lines in
 the newly formed file to be equal in length by adding the appropriate
 amount of dashes. For those in the know I am trying to convert a FASTA
 file into a DCSE file.
 
 I have been beating my head for the past 2 weeks and I cannot figure
 out how to do this. I do not expect a complete answer (I would like to
 try figuring this out on my own as much as possible) but rather some
 guidance. Any detailed pseudo-code would be appreciated!!
 
 -Thanks!
 -Mike
Here is a shot at it, or at least one way to try it:

#!perl -w

use strict;

my @MyWorka = ();
my $In = 0;
my $MyItem1 ;
my $MyItem2 ;
my $MyMaxLen = 35;

while ( <DATA> ) {
    chomp;
    next if ( /^\s*$/ );
    if ( /^\s+>(\S+)/ ) {
        $MyItem1 = $1;
        chomp($MyItem2 = <DATA>);
        $In++;
        $MyItem2 =~ s/\s+//g;
        my $MyLen = length($MyItem2);
        if ( $MyLen < $MyMaxLen ) {
            my $MyExtra = $MyMaxLen - $MyLen;
            $MyItem2 .= sprintf "%s", '-'x$MyExtra;
        }
        @MyWorka = split(//,$MyItem2);
        printf "%-2s"x$MyMaxLen , @MyWorka;
        printf "\n%-s\n\n",
               $MyItem1;
    }
}

__DATA__
 >dog
agatagatcgcatcga
 >cat
acgcttcgatacgctagctta
 >mouse
agatatacgggt

Output:
a g a t a g a t c g c a t c g a - - - - - - - - - - - - - - - - - - -
dog

a c g c t t c g a t a c g c t a g c t t a - - - - - - - - - - - - - -
cat

a g a t a t a c g g g t - - - - - - - - - - - - - - - - - - - - - - -
mouse



**
This message contains information that is confidential
and proprietary to FedEx Freight or its affiliates.
It is intended only for the recipient named and for
the express purpose(s) described therein.
Any other use is prohibited.






Re: rearrange text

2003-08-24 Thread John W. Krahn
Mike Robeson wrote:
 
 Hello,

Hello,

 I am a relatively new PERL beginner and have been trying to work with
 simple bioinformatics stuff. I have so far written some very useful but
 simple bioinformatics scripts. However recently I have been trying
 to work on a script to no avail. I have a text file whose contents are:
 
  >dog
 agatagatcgcatcga
  >cat
 acgcttcgatacgctagctta
  >mouse
 agatatacgggt
 
  and so on...
 
 I would like to turn that into this:
 
 a g a t a g a t c g c a t c g a - - - - - - - - - - - - - - -
 dog
 a c g c t t c g a t a c g c t a g c t t a - - - - - - - - - -
 cat
 a g a t a t a c g g g t t  - - - - - - - - - - - - - - - - - - -
 mouse
 
 Notice that the sequence of letters varies however I need the lines in
 the newly formed file to be equal in length by adding the appropriate
 amount of dashes. For those in the know I am trying to convert a FASTA
 file into a DCSE file.
 
 I have been beating my head for the past 2 weeks and I cannot figure
 out how to do this. I do not expect a complete answer (I would like to
 try figuring this out on my own as much as possible) but rather some
 guidance. Any detailed pseudo-code would be appreciated!!

According to your data this should work:

#!/usr/bin/perl
use warnings;
use strict;

my $len = 30;  # pad out to this length
while ( <DATA> ) {
    unless ( s/^\s*>// ) {
        chomp;
        my @char = ( split( // ), ( '-' ) x ( $len - length ) );
        $_ = "@char\n";
    }
    print;
}

__DATA__
 >dog
agatagatcgcatcga
 >cat
acgcttcgatacgctagctta
 >mouse
agatatacgggt



John



Re: rearrange text

2003-08-24 Thread John W. Krahn
David --- Senior Programmer Analyst --- Wgo Wagner wrote:
 
 #!perl -w
 use strict;
 
 my @MyWorka = ();
 my $In = 0;
 my $MyItem1 ;
 my $MyItem2 ;

There is no reason to declare these variables with file scope as they
are only used inside the while loop.

 my $MyMaxLen = 35;
 
 while ( <DATA> ) {
 chomp;
 next if ( /^\s*$/ );
 if ( /^\s+>(\S+)/ ) {
 $MyItem1 = $1;

  my $MyItem1 = $1;

 chomp($MyItem2 = <DATA>);

  chomp( my $MyItem2 = <DATA> );

 $In++;

What does this do?  It isn't used anywhere else.

 $MyItem2 =~ s/\s+//g;
 my $MyLen = length($MyItem2);
 if ( $MyLen < $MyMaxLen ) {
 my $MyExtra = $MyMaxLen - $MyLen;
 $MyItem2 .= sprintf "%s", '-'x$MyExtra;

The use of sprintf is a bit redundant.

  $MyItem2 .= '-' x $MyExtra;

  }
 @MyWorka = split(//,$MyItem2);
 printf "%-2s"x$MyMaxLen , @MyWorka;
 printf "\n%-s\n\n",
 $MyItem1;

No need for printf here.

  print join ' ', @MyWorka;
  print "\n$MyItem1\n\n";

  }
  }
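The points above about `x` and `printf` can be checked side by side. This sketch uses illustrative variable names and shows three things. The repetition operator builds the padding directly. `join ' '`, array interpolation and a `%-2s` format give the same line once the trailing space from `%-2s` is trimmed. And `local $"` changes the interpolation separator only inside its block:

```perl
use strict;
use warnings;

my @chars = split //, 'agatagatcgcatcga';
my $max   = 22;

# The repetition operator builds the padding directly; no sprintf is needed.
push @chars, ('-') x ( $max - @chars );

# Three ways to put one space between the characters.
my $via_join   = join ' ', @chars;
my $via_interp = "@chars";                         # arrays interpolate joined by $" (a space by default)
my $via_printf = sprintf "%-2s" x @chars, @chars;  # each char left-justified in 2 columns
$via_printf =~ s/ $//;                             # ...which leaves one trailing space to remove

# local changes $" only until the end of the enclosing block.
my $first_three;
{
    local $" = "\n";
    $first_three = "@chars[0..2]";
}

print "$via_join\n";
```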



John