Re: rearrange text
[ Top-posting fixed ]

Mike Robeson wrote:
> In article [EMAIL PROTECTED], [EMAIL PROTECTED] (John W. Krahn) wrote:
> > You keep changing the specs Mike. :-)
> >
> > Based on your code and data above, this will work:
> >
> > #!/usr/bin/perl
> > use warnings;
> > use strict;
> >
> > print "Enter the path of the INFILE to be processed: ";
> > chomp( my $infile = <STDIN> );
> > open INFILE, $infile or die "Can't open $infile for input: $!";
> >
> > print "Enter in the path of the OUTFILE: ";
> > chomp( my $outfile = <STDIN> );
> > open OUTFILE, ">$outfile" or die "Can't open $outfile for output: $!";
> >
> > print "Enter in the LENGTH you want the sequence to be: ";
> > my ( $len ) = <STDIN> =~ /(\d+)/ or die "Invalid length parameter";
> >
> > print OUTFILE "R 1 $len\n\n"; # The top of the file is supposed
> >
> > $/ = '>';  # Set input record separator
> >
> > while ( <INFILE> ) {
> >     chomp;
> >     next unless s/^\s*(\S+)//;
> >     my $name = $1;
> >     my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
> >     print OUTFILE "@char $name\n";
> > }
> >
> > close INFILE;
> > close OUTFILE;
> >
> > __END__
>
> Yeah, the parameters keep changing because my buddy and I had
> incorrectly remembered the format it was supposed to be (several times).

If you think it's hard on your end, the only data I can test on is the
data you post here. :-)

> Sorry about that. Anyway, what you provided did work with some changes.
> The code you sent didn't do anything other than give me the command
> prompt again - but I managed to get it to work anyway. However, I can't
> seem to figure out why I cannot use this line:
>
> my @char = ( split( // ), ( '-' ) x ( $len - length ) );

split( // ) returns a list of ALL the characters in $_ and the use of
length() assumes that ONLY valid characters are in $_.

> instead of this line:
>
> my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];

/[acgt]/g returns a list of ONLY the characters 'a', 'c', 'g' and 't'
and using the list slice assumes that there may be invalid characters
in $_.

> in the script - I get errors (I mean there are other changes that need
> to be made but these lines are my major focus).
> I just like the way the first line of code calculates the amount of
> dashes to add. It's just an aesthetic thing. :-)

It works fine if you KNOW that ONLY valid characters are in $_.

> Note I changed it to a-z because we use many other characters than
> atcg, for example n means unknown base. Also, I just do not understand
> the second line of code above as clearly as I do the first. From what I
> can gather you are making, say, 50 dashes and then filling in the dashes
> that match the characters within [a-z] from left to right as long as
> there are characters to fill in the dashes. Does that make sense?

//g creates a list of valid characters from $_ with $len hyphens
appended on the end and that list is sliced to the length of $len.

> Anyway, when I add $/ = '>'; to the original script below I get a

I see no script below.

> concatenation error (why?). Basically, I am trying to understand why

Did you chomp the input? Did you test for valid data before processing
it?

> certain bits of code break the script and others don't. I find I learn
> things better by trying alternate forms of code to figure out
> relationships. However, I have been looking through my perl books like
> crazy and can't seem to understand some of these relationships. I know
> it will take time and experience... I guess I'll pick up another perl
> book that provides another perspective.
>
> Sorry about taking so long with this, I am trying to make an honest
> effort to learn. :-)

No problem. :-)

John
--
use Perl;
program
fulfillment

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
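The difference between the two lines discussed above can be seen side by side in a small sketch. The helper names pad_by_length and pad_by_slice are hypothetical and do not appear in the thread; the bodies are the two expressions under discussion, applied to an explicit argument instead of $_.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the split/length approach.
# Assumes $seq holds ONLY valid characters and is no longer than $len.
sub pad_by_length {
    my ( $seq, $len ) = @_;
    my @char = ( split( //, $seq ), ( '-' ) x ( $len - length($seq) ) );
    return "@char";
}

# Hypothetical helper: the match-and-slice approach.
# Keeps only letters, appends $len hyphens, then slices to $len items.
sub pad_by_slice {
    my ( $seq, $len ) = @_;
    my @char = ( $seq =~ /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];
    return "@char";
}

print pad_by_length( 'acgt',  6 ), "\n";    # clean input: both agree
print pad_by_slice(  'acgt',  6 ), "\n";
print pad_by_length( 'ac gt', 6 ), "\n";    # stray space is kept and counted
print pad_by_slice(  'ac gt', 6 ), "\n";    # stray space is dropped
```

With clean input both helpers return the same string. With a stray space in the input, the split version keeps the space as an element and pads one hyphen too few, while the slice version still returns exactly $len items.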
Re: rearrange text
Yeah, the parameters keep changing because my buddy and I had
incorrectly remembered the format it was supposed to be (several times).
Sorry about that. Anyway, what you provided did work with some changes.
The code you sent didn't do anything other than give me the command
prompt again - but I managed to get it to work anyway. However, I can't
seem to figure out why I cannot use this line:

my @char = ( split( // ), ( '-' ) x ( $len - length ) );

instead of this line:

my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];

in the script - I get errors (I mean there are other changes that need
to be made but these lines are my major focus).
I just like the way the first line of code calculates the amount of
dashes to add. It's just an aesthetic thing. :-)

Note I changed it to a-z because we use many other characters than atcg,
for example n means unknown base. Also, I just do not understand the
second line of code above as clearly as I do the first. From what I can
gather you are making, say, 50 dashes and then filling in the dashes that
match the characters within [a-z] from left to right as long as there are
characters to fill in the dashes. Does that make sense?

Anyway, when I add $/ = '>'; to the original script below I get a
concatenation error (why?). Basically, I am trying to understand why
certain bits of code break the script and others don't. I find I learn
things better by trying alternate forms of code to figure out
relationships. However, I have been looking through my perl books like
crazy and can't seem to understand some of these relationships. I know it
will take time and experience... I guess I'll pick up another perl book
that provides another perspective.

Sorry about taking so long with this, I am trying to make an honest
effort to learn. :-)

-Thanks
-Mike

In article [EMAIL PROTECTED], [EMAIL PROTECTED] (John W. Krahn) wrote:
> Mike wrote:
> > Well this is the final code I put together with everyone's help from
> > this group:
> >
> > #!/usr/bin/perl
> > use warnings;
> > use strict;
> >
> > print "Enter the path of the INFILE to be processed:\n";
> > chomp (my $infile = <STDIN>);
> > open(INFILE, $infile) or die "Can't open INFILE for input: $!";
> >
> > print "Enter in the path of the OUTFILE:\n";
> > chomp (my $outfile = <STDIN>);
> > open(OUTFILE, ">$outfile") or die "Can't open OUTFILE for input: $!";
> >
> > print "Enter in the LENGTH you want the sequence to be:\n";
> > chomp (my $len = <STDIN>);
> >
> > my ($name, @seq);
> > while ( <INFILE> ) {
> >     chomp;
> >     unless ( /^\s*$/ or s/^\s*>(.+)// ) {
> >         $name = $1;
> >         my @char = ( split( // ), ( '-' ) x ( $len - length ) );
> >         push @seq, ' '."@char $name";
> >     }
> > }
> >
> > {
> >     local $" = "\n";
> >     print OUTFILE R 1 [EMAIL PROTECTED]; # The top of the file is supposed
> > }
> >
> > close INFILE;
> > close OUTFILE;
>
> [snip]
>
> > However, I forgot that sometimes the input data is like this:
> >
> > >dog
> > agatgtagt
> > agtggttga
> > agggagc
> > >cat
> > gcatcgatg
> > agcatatgc
> > >mouse
> > actagcatc
> > acgtacgat
> >
> > That is, the sequence of letters can span multiple lines. I would like
> > the above script to handle input data that can possibly span several
> > lines as well as those that do not, and output as mentioned above.
>
> You keep changing the specs Mike. :-)
>
> Based on your code and data above, this will work:
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> print "Enter the path of the INFILE to be processed: ";
> chomp( my $infile = <STDIN> );
> open INFILE, $infile or die "Can't open $infile for input: $!";
>
> print "Enter in the path of the OUTFILE: ";
> chomp( my $outfile = <STDIN> );
> open OUTFILE, ">$outfile" or die "Can't open $outfile for output: $!";
>
> print "Enter in the LENGTH you want the sequence to be: ";
> my ( $len ) = <STDIN> =~ /(\d+)/ or die "Invalid length parameter";
>
> print OUTFILE "R 1 $len\n\n"; # The top of the file is supposed
>
> $/ = '>';  # Set input record separator
>
> while ( <INFILE> ) {
>     chomp;
>     next unless s/^\s*(\S+)//;
>     my $name = $1;
>     my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
>     print OUTFILE "@char $name\n";
> }
>
> close INFILE;
> close OUTFILE;
>
> __END__
>
> John
Re: rearrange text
Well this is the final code I put together with everyone's help from
this group:

#!/usr/bin/perl
use warnings;
use strict;

print "Enter the path of the INFILE to be processed:\n";
chomp (my $infile = <STDIN>);
open(INFILE, $infile) or die "Can't open INFILE for input: $!";

print "Enter in the path of the OUTFILE:\n";
chomp (my $outfile = <STDIN>);
open(OUTFILE, ">$outfile") or die "Can't open OUTFILE for input: $!";

print "Enter in the LENGTH you want the sequence to be:\n";
chomp (my $len = <STDIN>);

my ($name, @seq);
while ( <INFILE> ) {
    chomp;
    unless ( /^\s*$/ or s/^\s*>(.+)// ) {
        $name = $1;
        my @char = ( split( // ), ( '-' ) x ( $len - length ) );
        push @seq, ' '."@char $name";
    }
}

{
    local $" = "\n";
    print OUTFILE R 1 [EMAIL PROTECTED]; # The top of the file is supposed
}

close INFILE;
close OUTFILE;

Basically it will take this file:

>dog
atcgc
>cat
atcgctac
>mouse
agctata

and turn it into this:

R 1 10

a t c g c - - - - - dog
a t c g c t a c - - cat
a g c t a t a - - - mouse

However, I forgot that sometimes the input data is like this:

>dog
agatgtagt
agtggttga
agggagc
>cat
gcatcgatg
agcatatgc
>mouse
actagcatc
acgtacgat

That is, the sequence of letters can span multiple lines. I would like
the above script to handle input data that can possibly span several
lines as well as those that do not, and output as mentioned above.

You all have been much help! I have really learned a lot with the help
you've given so far!

-Thanks!
-Mike

In article [EMAIL PROTECTED], [EMAIL PROTECTED] (David Wall) wrote:
> --On Monday, August 25, 2003 6:50 PM -0400 Mike Robeson
> [EMAIL PROTECTED] wrote:
> > OK, I feel like an idiot. When I initially asked for help with this I
> > just realized that I forgot two little details. I was supposed to add
> > the number of sequences as well as the length of the sequences at the
> > top of the output file.
> >
> > That is this file:
> >
> > >dog
> > agatagatcgcatcga
> > >cat
> > acgcttcgatacgctagctta
> > >mouse
> > agatatacgggtt
> >
> > is really supposed to be:
> >
> > 3 22
> > a g a t a g a t c g c a t c g a - - - - - -dog
> > a c g c t t c g a t a c g c t a g c t t a -cat
> > a g a t a t a c g g g t t - - - - - - - - -mouse
> >
> > The '3' represents the number of individual sequences in the file
> > (i.e. dog, cat, mouse). And the 22 is the number of letters and dashes
> > there are. The length is already in the script as $len. I am able to
> > get the length listed at the top. However, I cannot find a way to have
> > the number of sequences (the 3 in this case) printed to the top.
>
> Here's one way (slightly altering John's solution), but it will use
> lots of memory if the sequences are long.
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> my ($name, $num_seq, @seq);
> my $len = 30;
> while ( <DATA> ) {
>     unless ( /^\s*$/ or s/^\s*>(\S+)// ) {
>         my $name = $1;
>         my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
>         push @seq, "@char$name";
>         $num_seq++;
>     }
> }
>
> {
>     local $" = "\n";
>     print [EMAIL PROTECTED];
> }
>
> __DATA__
> >dog
> agatagatcgcatcga
> >cat
> acgcttcgatacgctagctta
> >mouse
> agatatacgggt
Re: rearrange text
Mike wrote:
> Well this is the final code I put together with everyone's help from
> this group:
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> print "Enter the path of the INFILE to be processed:\n";
> chomp (my $infile = <STDIN>);
> open(INFILE, $infile) or die "Can't open INFILE for input: $!";
>
> print "Enter in the path of the OUTFILE:\n";
> chomp (my $outfile = <STDIN>);
> open(OUTFILE, ">$outfile") or die "Can't open OUTFILE for input: $!";
>
> print "Enter in the LENGTH you want the sequence to be:\n";
> chomp (my $len = <STDIN>);
>
> my ($name, @seq);
> while ( <INFILE> ) {
>     chomp;
>     unless ( /^\s*$/ or s/^\s*>(.+)// ) {
>         $name = $1;
>         my @char = ( split( // ), ( '-' ) x ( $len - length ) );
>         push @seq, ' '."@char $name";
>     }
> }
>
> {
>     local $" = "\n";
>     print OUTFILE R 1 [EMAIL PROTECTED]; # The top of the file is supposed
> }
>
> close INFILE;
> close OUTFILE;

[snip]

> However, I forgot that sometimes the input data is like this:
>
> >dog
> agatgtagt
> agtggttga
> agggagc
> >cat
> gcatcgatg
> agcatatgc
> >mouse
> actagcatc
> acgtacgat
>
> That is, the sequence of letters can span multiple lines. I would like
> the above script to handle input data that can possibly span several
> lines as well as those that do not, and output as mentioned above.

You keep changing the specs Mike. :-)

Based on your code and data above, this will work:

#!/usr/bin/perl
use warnings;
use strict;

print "Enter the path of the INFILE to be processed: ";
chomp( my $infile = <STDIN> );
open INFILE, $infile or die "Can't open $infile for input: $!";

print "Enter in the path of the OUTFILE: ";
chomp( my $outfile = <STDIN> );
open OUTFILE, ">$outfile" or die "Can't open $outfile for output: $!";

print "Enter in the LENGTH you want the sequence to be: ";
my ( $len ) = <STDIN> =~ /(\d+)/ or die "Invalid length parameter";

print OUTFILE "R 1 $len\n\n"; # The top of the file is supposed

$/ = '>';  # Set input record separator

while ( <INFILE> ) {
    chomp;
    next unless s/^\s*(\S+)//;
    my $name = $1;
    my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
    print OUTFILE "@char $name\n";
}

close INFILE;
close OUTFILE;

__END__

John
--
use Perl;
program
fulfillment
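How the record-separator approach copes with sequences that span several lines can be sketched on an in-memory file. This assumes the separator in the posted script was '>' (the archive appears to have stripped angle brackets) and that the input is FASTA-style with one '>' header per record; the sub name format_records is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read '>'-delimited records from a string and
# return one padded line per record.
sub format_records {
    my ( $text, $len ) = @_;
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    local $/ = '>';    # assumption: each record ends at the next '>'
    my @lines;
    while ( my $rec = <$fh> ) {
        chomp $rec;                           # chomp removes the trailing '>'
        next unless $rec =~ s/^\s*(\S+)//;    # first word is the name
        my $name = $1;
        # Whatever is left may contain newlines; only the bases are kept.
        my @char = ( $rec =~ /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
        push @lines, "@char $name";
    }
    close $fh;
    return @lines;
}

print "$_\n" for format_records( ">dog\nacg\nt\n>cat\ngg\n", 6 );
```

The first read returns only the leading '>', which chomp reduces to an empty string, so the `next unless` line skips it. Because the name is removed before the bases are matched, letters in a name such as "cat" are not mistaken for sequence data.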
Re: rearrange text
--On Monday, August 25, 2003 6:50 PM -0400 Mike Robeson
[EMAIL PROTECTED] wrote:
> OK, I feel like an idiot. When I initially asked for help with this I
> just realized that I forgot two little details. I was supposed to add
> the number of sequences as well as the length of the sequences at the
> top of the output file.
>
> That is this file:
>
> >dog
> agatagatcgcatcga
> >cat
> acgcttcgatacgctagctta
> >mouse
> agatatacgggtt
>
> is really supposed to be:
>
> 3 22
> a g a t a g a t c g c a t c g a - - - - - -dog
> a c g c t t c g a t a c g c t a g c t t a -cat
> a g a t a t a c g g g t t - - - - - - - - -mouse
>
> The '3' represents the number of individual sequences in the file
> (i.e. dog, cat, mouse). And the 22 is the number of letters and dashes
> there are. The length is already in the script as $len. I am able to
> get the length listed at the top. However, I cannot find a way to have
> the number of sequences (the 3 in this case) printed to the top.

Here's one way (slightly altering John's solution), but it will use lots
of memory if the sequences are long.

#!/usr/bin/perl
use warnings;
use strict;

my ($name, $num_seq, @seq);
my $len = 30;
while ( <DATA> ) {
    unless ( /^\s*$/ or s/^\s*>(\S+)// ) {
        my $name = $1;
        my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
        push @seq, "@char$name";
        $num_seq++;
    }
}

{
    local $" = "\n";
    print [EMAIL PROTECTED];
}

__DATA__
>dog
agatagatcgcatcga
>cat
acgcttcgatacgctagctta
>mouse
agatatacgggt
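The buffering idea above (collect every formatted line first, then print the count and length ahead of them) might look like the following sketch. The sub name with_count_header and the [ name, sequence ] record layout are assumptions made for illustration, and the "count length" header follows the "3 22" example in Mike's message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format every record, then prepend "count length".
# Each record is an array ref: [ name, sequence ].
sub with_count_header {
    my ( $len, @records ) = @_;
    my @seq;
    for my $rec (@records) {
        my ( $name, $bases ) = @$rec;
        my @char = ( $bases =~ /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
        push @seq, "@char $name";
    }
    my $num_seq = @seq;    # array in scalar context gives the count
    return join "\n", "$num_seq $len", '', @seq, '';
}

print with_count_header( 6, [ dog => 'acgt' ], [ cat => 'gg' ] );
```

Everything is held in @seq until the end, which is the memory cost mentioned above: the count is only known once the last record has been read.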
Re: rearrange text
I do not know what happened but the text didn't get formatted correctly
on the list. But this is how the output should really have been:

a g a t a g a t c g c a t c g a - - - - - -dog
a c g c t t c g a t a c g c t a g c t t a -cat
a g a t a t a c g g g t t - - - - - - - - -mouse

That is, I want the edited sequence data and the name on the same line.

-Thanks
-Mike

In article [EMAIL PROTECTED], [EMAIL PROTECTED] (John W. Krahn) wrote:
> Mike Robeson wrote:
> > Hello,
>
> Hello,
>
> > I am a relatively new PERL beginner and have been trying to work with
> > simple bioinformatics stuff. I have so far written some very useful
> > but simple bioinformatics scripts. However recently I have been trying
> > to work on a script to no avail. I have a text file whose contents
> > are:
> >
> > >dog
> > agatagatcgcatcga
> > >cat
> > acgcttcgatacgctagctta
> > >mouse
> > agatatacgggt
> >
> > and so on...
> >
> > I would like to turn that into this:
> >
> > a g a t a g a t c g c a t c g a - - - - - - - - - - - - - - - dog
> > a c g c t t c g a t a c g c t a g c t t a - - - - - - - - - - cat
> > a g a t a t a c g g g t t - - - - - - - - - - - - - - - - - - - mouse
> >
> > Notice that the sequence of letters varies, however I need the lines
> > in the newly formed file to be equal in length by adding the
> > appropriate amount of dashes. For those in the know I am trying to
> > convert a FASTA file into a DCSE file. I have been beating my head for
> > the past 2 weeks and I cannot figure out how to do this. I do not
> > expect a complete answer (I would like to try figuring this out on my
> > own as much as possible) but rather some guidance. Any detailed
> > pseudo-code would be appreciated!!
>
> According to your data this should work:
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> my $len = 30;  # pad out to this length
>
> while ( <DATA> ) {
>     unless ( s/^\s*>// ) {
>         chomp;
>         my @char = ( split( // ), ( '-' ) x ( $len - length ) );
>         $_ = "@char\n";
>     }
>     print;
> }
>
> __DATA__
> >dog
> agatagatcgcatcga
> >cat
> acgcttcgatacgctagctta
> >mouse
> agatatacgggt
>
> John
Re: rearrange text
Mike Robeson wrote:
> I do not know what happened but the text didn't get formatted correctly
> on the list. But this is how the output should really have been:
>
> a g a t a g a t c g c a t c g a - - - - - -dog
> a c g c t t c g a t a c g c t a g c t t a -cat
> a g a t a t a c g g g t t - - - - - - - - -mouse
>
> That is, I want the edited sequence data and the name on the same line.

I'd hate to mess with your DNA (but just in case -- I, for one, welcome
our new super dog-cat-mouse mutant overlords) but I'll post two links you
may find interesting (if you don't already know them, that is).

First, you might take a look at the bioperl project:
http://bioperl.org/

There's a mailing list you may find very useful, bioperl-l at
bioperl.org:
http://www.bioperl.org/MailList.shtml

There's also a book, Beginning Perl for Bioinformatics written by James
Tisdall:
http://www.oreilly.com/catalog/begperlbio/

I wish you good luck with your creations.

-zsdc.
Re: rearrange text
Mike Robeson wrote:
> In article [EMAIL PROTECTED], [EMAIL PROTECTED] (John W. Krahn) wrote:
> > According to your data this should work:
> >
> > #!/usr/bin/perl
> > use warnings;
> > use strict;
> >
> > my $len = 30;  # pad out to this length
> >
> > while ( <DATA> ) {
> >     unless ( s/^\s*>// ) {
> >         chomp;
> >         my @char = ( split( // ), ( '-' ) x ( $len - length ) );
> >         $_ = "@char\n";
> >     }
> >     print;
> > }
> >
> > __DATA__
> > >dog
> > agatagatcgcatcga
> > >cat
> > acgcttcgatacgctagctta
> > >mouse
> > agatatacgggt
>
> I do not know what happened but the text didn't get formatted correctly
> on the list. But this is how the output should really have been:
>
> a g a t a g a t c g c a t c g a - - - - - -dog
> a c g c t t c g a t a c g c t a g c t t a -cat
> a g a t a t a c g g g t t - - - - - - - - -mouse
>
> That is, I want the edited sequence data and the name on the same line.

#!/usr/bin/perl
use warnings;
use strict;

my $len = 30;
my $name;
while ( <DATA> ) {
    chomp;
    unless ( s/^\s*>(.+)// ) {
        $name = $1;
        my @char = ( split( // ), ( '-' ) x ( $len - length ) );
        print "@char$name\n";
    }
}

__DATA__
>dog
agatagatcgcatcga
>cat
acgcttcgatacgctagctta
>mouse
agatatacgggt

John
--
use Perl;
program
fulfillment
Re: rearrange text
John,

Thanks for the help!! I knew it had to be simple... but I just didn't
see it! I just need to add some more code to it but I think I can take it
from here.

Thanks again!
-Mike

In article [EMAIL PROTECTED], [EMAIL PROTECTED] (John W. Krahn) wrote:
> Mike Robeson wrote:
> > In article [EMAIL PROTECTED], [EMAIL PROTECTED] (John W. Krahn)
> > wrote:
> > > According to your data this should work:
> > >
> > > #!/usr/bin/perl
> > > use warnings;
> > > use strict;
> > >
> > > my $len = 30;  # pad out to this length
> > >
> > > while ( <DATA> ) {
> > >     unless ( s/^\s*>// ) {
> > >         chomp;
> > >         my @char = ( split( // ), ( '-' ) x ( $len - length ) );
> > >         $_ = "@char\n";
> > >     }
> > >     print;
> > > }
> > >
> > > __DATA__
> > > >dog
> > > agatagatcgcatcga
> > > >cat
> > > acgcttcgatacgctagctta
> > > >mouse
> > > agatatacgggt
> >
> > I do not know what happened but the text didn't get formatted
> > correctly on the list. But this is how the output should really have
> > been:
> >
> > a g a t a g a t c g c a t c g a - - - - - -dog
> > a c g c t t c g a t a c g c t a g c t t a -cat
> > a g a t a t a c g g g t t - - - - - - - - -mouse
> >
> > That is, I want the edited sequence data and the name on the same
> > line.
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> my $len = 30;
> my $name;
> while ( <DATA> ) {
>     chomp;
>     unless ( s/^\s*>(.+)// ) {
>         $name = $1;
>         my @char = ( split( // ), ( '-' ) x ( $len - length ) );
>         print "@char$name\n";
>     }
> }
>
> __DATA__
> >dog
> agatagatcgcatcga
> >cat
> acgcttcgatacgctagctta
> >mouse
> agatatacgggt
>
> John
Re: rearrange text
Mike Robeson wrote:
> In article [EMAIL PROTECTED], [EMAIL PROTECTED] (John W. Krahn) wrote:
> > Mike Robeson wrote:
> > > I do not know what happened but the text didn't get formatted
> > > correctly on the list. But this is how the output should really
> > > have been:
> > >
> > > a g a t a g a t c g c a t c g a - - - - - -dog
> > > a c g c t t c g a t a c g c t a g c t t a -cat
> > > a g a t a t a c g g g t t - - - - - - - - -mouse
> > >
> > > That is, I want the edited sequence data and the name on the same
> > > line.
> >
> > #!/usr/bin/perl
> > use warnings;
> > use strict;
> >
> > my $len = 30;
> > my $name;
> > while ( <DATA> ) {
> >     chomp;
> >     unless ( s/^\s*>(.+)// ) {
> >         $name = $1;
> >         my @char = ( split( // ), ( '-' ) x ( $len - length ) );
> >         print "@char$name\n";
> >     }
> > }
> >
> > __DATA__
> > >dog
> > agatagatcgcatcga
> > >cat
> > acgcttcgatacgctagctta
> > >mouse
> > agatatacgggt
>
> Thanks for the help!! I knew it had to be simple... but I just didn't
> see it! I just need to add some more code to it but I think I can take
> it from here.

You can make that a bit more robust. :-)

#!/usr/bin/perl
use warnings;
use strict;

my $len = 30;
while ( <DATA> ) {
    unless ( /^\s*$/ or s/^\s*>(\S+)// ) {
        my $name = $1;
        my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
        print "@char$name\n";
    }
}

__DATA__
>dog
agatagatcgcatcga
>cat
acgcttcgatacgctagctta
>mouse
agatatacgggt

John
--
use Perl;
program
fulfillment
Re: rearrange text
OK, I feel like an idiot. When I initially asked for help with this I
just realized that I forgot two little details. I was supposed to add the
number of sequences as well as the length of the sequences at the top of
the output file.

That is this file:

>dog
agatagatcgcatcga
>cat
acgcttcgatacgctagctta
>mouse
agatatacgggtt

is really supposed to be:

3 22
a g a t a g a t c g c a t c g a - - - - - -dog
a c g c t t c g a t a c g c t a g c t t a -cat
a g a t a t a c g g g t t - - - - - - - - -mouse

The '3' represents the number of individual sequences in the file (i.e.
dog, cat, mouse). And the 22 is the number of letters and dashes there
are. The length is already in the script as $len. I am able to get the
length listed at the top. However, I cannot find a way to have the number
of sequences (the 3 in this case) printed to the top. Is there a way that
I can just append to the outfile at the beginning of a file?

Sorry about this. I didn't realize I forgot to include this info. I
guess I am too busy trying to learn PERL and I am not paying attention to
what I need PERL to do for me! :-)

-Thanks
-Mike

In article [EMAIL PROTECTED], [EMAIL PROTECTED] (John W. Krahn) wrote:
> Mike Robeson wrote:
> > In article [EMAIL PROTECTED], [EMAIL PROTECTED] (John W. Krahn)
> > wrote:
> > > Mike Robeson wrote:
> > > > I do not know what happened but the text didn't get formatted
> > > > correctly on the list. But this is how the output should really
> > > > have been:
> > > >
> > > > a g a t a g a t c g c a t c g a - - - - - -dog
> > > > a c g c t t c g a t a c g c t a g c t t a -cat
> > > > a g a t a t a c g g g t t - - - - - - - - -mouse
> > > >
> > > > That is, I want the edited sequence data and the name on the
> > > > same line.
> > >
> > > #!/usr/bin/perl
> > > use warnings;
> > > use strict;
> > >
> > > my $len = 30;
> > > my $name;
> > > while ( <DATA> ) {
> > >     chomp;
> > >     unless ( s/^\s*>(.+)// ) {
> > >         $name = $1;
> > >         my @char = ( split( // ), ( '-' ) x ( $len - length ) );
> > >         print "@char$name\n";
> > >     }
> > > }
> > >
> > > __DATA__
> > > >dog
> > > agatagatcgcatcga
> > > >cat
> > > acgcttcgatacgctagctta
> > > >mouse
> > > agatatacgggt
> >
> > Thanks for the help!! I knew it had to be simple... but I just didn't
> > see it! I just need to add some more code to it but I think I can
> > take it from here.
>
> You can make that a bit more robust. :-)
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> my $len = 30;
> while ( <DATA> ) {
>     unless ( /^\s*$/ or s/^\s*>(\S+)// ) {
>         my $name = $1;
>         my @char = ( /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
>         print "@char$name\n";
>     }
> }
>
> __DATA__
> >dog
> agatagatcgcatcga
> >cat
> acgcttcgatacgctagctta
> >mouse
> agatatacgggt
>
> John
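On the question of putting the sequence count at the top of the output file: one option that avoids holding every line in memory is to read the input twice, counting headers on the first pass and writing on the second. The sketch below is not from the thread. It assumes a seekable input handle, '>' header lines, and one sequence line per header; the sub name two_pass is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pass 1 counts headers, pass 2 writes the output.
sub two_pass {
    my ( $in, $out, $len ) = @_;

    my $num_seq = 0;
    while ( my $line = <$in> ) {
        $num_seq++ if $line =~ /^\s*>/;
    }
    seek $in, 0, 0 or die "Can't rewind input: $!";

    print {$out} "$num_seq $len\n\n";    # the header goes out first

    my $name;
    while ( my $line = <$in> ) {
        chomp $line;
        next if $line =~ /^\s*$/;
        if ( $line =~ /^\s*>\s*(\S+)/ ) { $name = $1; next }
        next unless defined $name;       # skip data that precedes any header
        my @char = ( $line =~ /[acgt]/g, ( '-' ) x $len )[ 0 .. $len - 1 ];
        print {$out} "@char $name\n";
    }
    return $num_seq;
}

my $demo_text = ">dog\nacgt\n>cat\ngg\n";
open my $demo_in, '<', \$demo_text or die "Can't open in-memory input: $!";
two_pass( $demo_in, \*STDOUT, 6 );
close $demo_in;
```

The trade-off is reading the file twice in exchange for constant memory use, which only matters if the sequences are long.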
RE: rearrange text
Mike Robeson wrote:
> Hello,
>
> I am a relatively new PERL beginner and have been trying to work with
> simple bioinformatics stuff. I have so far written some very useful but
> simple bioinformatics scripts. However recently I have been trying to
> work on a script to no avail. I have a text file whose contents are:
>
> >dog
> agatagatcgcatcga
> >cat
> acgcttcgatacgctagctta
> >mouse
> agatatacgggt
>
> and so on...
>
> I would like to turn that into this:
>
> a g a t a g a t c g c a t c g a - - - - - - - - - - - - - - - dog
> a c g c t t c g a t a c g c t a g c t t a - - - - - - - - - - cat
> a g a t a t a c g g g t t - - - - - - - - - - - - - - - - - - - mouse
>
> Notice that the sequence of letters varies, however I need the lines in
> the newly formed file to be equal in length by adding the appropriate
> amount of dashes. For those in the know I am trying to convert a FASTA
> file into a DCSE file. I have been beating my head for the past 2 weeks
> and I cannot figure out how to do this. I do not expect a complete
> answer (I would like to try figuring this out on my own as much as
> possible) but rather some guidance. Any detailed pseudo-code would be
> appreciated!!
>
> -Thanks!
> -Mike

Here is a shot and at least one way to try it:

#!perl -w
use strict;

my @MyWorka = ();
my $In = 0;
my $MyItem1 ;
my $MyItem2 ;
my $MyMaxLen = 35;

while ( <DATA> ) {
    chomp;
    next if ( /^\s*$/ );
    if ( /^\s+(\S+)/ ) {
        $MyItem1 = $1;
        chomp($MyItem2 = <DATA>);
        $In++;
        $MyItem2 =~ s/\s+//g;
        my $MyLen = length($MyItem2);
        if ( $MyLen < $MyMaxLen ) {
            my $MyExtra = $MyMaxLen - $MyLen;
            $MyItem2 .= sprintf "%s", '-'x$MyExtra;
        }
        @MyWorka = split(//,$MyItem2);
        printf "%-2s"x$MyMaxLen , @MyWorka;
        printf "\n%-s\n\n", $MyItem1;
    }
}
__DATA__
dog
agatagatcgcatcga
cat
acgcttcgatacgctagctta
mouse
agatatacgggt

Output:

a g a t a g a t c g c a t c g a - - - - - - - - - - - - - - - - - - -
dog

a c g c t t c g a t a c g c t a g c t t a - - - - - - - - - - - - - -
cat

a g a t a t a c g g g t - - - - - - - - - - - - - - - - - - - - - - -
mouse
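The printf line in the script above relies on repeating a format string with the x operator, one "%-2s" per character. A minimal sketch of that idiom, using sprintf so the result can be inspected (the sub name spaced is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: one "%-2s" conversion per item, so each item is
# left-justified in a two-column field.
sub spaced {
    my @chars = @_;
    # In scalar context @chars gives the item count, so the format is
    # repeated exactly once per item.
    return sprintf( '%-2s' x @chars, @chars );
}

print spaced( split //, 'acg--' ), "\n";
```

Each field is padded to two columns, so the result carries a trailing space after the last item, unlike a join on ' '.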
Re: rearrange text
Mike Robeson wrote:
> Hello,

Hello,

> I am a relatively new PERL beginner and have been trying to work with
> simple bioinformatics stuff. I have so far written some very useful but
> simple bioinformatics scripts. However recently I have been trying to
> work on a script to no avail. I have a text file whose contents are:
>
> >dog
> agatagatcgcatcga
> >cat
> acgcttcgatacgctagctta
> >mouse
> agatatacgggt
>
> and so on...
>
> I would like to turn that into this:
>
> a g a t a g a t c g c a t c g a - - - - - - - - - - - - - - - dog
> a c g c t t c g a t a c g c t a g c t t a - - - - - - - - - - cat
> a g a t a t a c g g g t t - - - - - - - - - - - - - - - - - - - mouse
>
> Notice that the sequence of letters varies, however I need the lines in
> the newly formed file to be equal in length by adding the appropriate
> amount of dashes. For those in the know I am trying to convert a FASTA
> file into a DCSE file. I have been beating my head for the past 2 weeks
> and I cannot figure out how to do this. I do not expect a complete
> answer (I would like to try figuring this out on my own as much as
> possible) but rather some guidance. Any detailed pseudo-code would be
> appreciated!!

According to your data this should work:

#!/usr/bin/perl
use warnings;
use strict;

my $len = 30;  # pad out to this length

while ( <DATA> ) {
    unless ( s/^\s*>// ) {
        chomp;
        my @char = ( split( // ), ( '-' ) x ( $len - length ) );
        $_ = "@char\n";
    }
    print;
}

__DATA__
>dog
agatagatcgcatcga
>cat
acgcttcgatacgctagctta
>mouse
agatatacgggt

John
--
use Perl;
program
fulfillment
Re: rearrange text
David --- Senior Programmer Analyst --- Wgo Wagner wrote:
> #!perl -w
> use strict;
>
> my @MyWorka = ();
> my $In = 0;
> my $MyItem1 ;
> my $MyItem2 ;

There is no reason to declare these variables with file scope as they
are only used inside the while loop.

> my $MyMaxLen = 35;
>
> while ( <DATA> ) {
>     chomp;
>     next if ( /^\s*$/ );
>     if ( /^\s+(\S+)/ ) {
>         $MyItem1 = $1;

        my $MyItem1 = $1;

>         chomp($MyItem2 = <DATA>);

        chomp( my $MyItem2 = <DATA> );

>         $In++;

What does this do? It isn't used anywhere else.

>         $MyItem2 =~ s/\s+//g;
>         my $MyLen = length($MyItem2);
>         if ( $MyLen < $MyMaxLen ) {
>             my $MyExtra = $MyMaxLen - $MyLen;
>             $MyItem2 .= sprintf "%s", '-'x$MyExtra;

The use of sprintf is a bit redundant.

            $MyItem2 .= '-' x $MyExtra;

>         }
>         @MyWorka = split(//,$MyItem2);
>         printf "%-2s"x$MyMaxLen , @MyWorka;
>         printf "\n%-s\n\n", $MyItem1;

No need for printf here.

        print join ' ', @MyWorka;
        print "\n$MyItem1\n\n";

>     }
> }

John
--
use Perl;
program
fulfillment
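Taken together, the review suggestions (lexical variables inside the loop, plain string repetition instead of sprintf, join instead of printf) could be folded into one small routine. This is only a sketch; the sub name pad_line is hypothetical and not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip whitespace, pad with hyphens up to $max,
# and return the characters separated by single spaces.
sub pad_line {
    my ( $seq, $max ) = @_;
    $seq =~ s/\s+//g;
    my $extra = $max - length($seq);
    $seq .= '-' x $extra if $extra > 0;    # string repetition, no sprintf
    return join ' ', split //, $seq;       # join, no printf
}

print pad_line( 'agatatacgggt', 20 ), "\n";
```

As in the reviewed script, a sequence longer than $max is returned unpadded and is not truncated.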