Hi Jumana,

Following up. Let's change the subject line. This makes it much easier for folks to see that this is a new topic of conversation.

[Apologies to the others on the list for my last reply: I didn't notice the wrong subject line or the long quoted digest. I'll try to be more careful next time.]

Jumana, I would strongly suggest separating string parsing issues from computational issues. The reason for suggesting Biopython is twofold: not only do you get to avoid writing a FASTA parser, but it also gets you in the right mindset of processing _multiple_ sequences.

You are encountering this problem, as your comment suggests:

> I wrote a program close to what Denis suggested , however it works only if I
> have one sequence (one header and one sequence), I can not modify it to work
> if I have several sequences (like above).

You want the structure of your program to do an analysis on each biological sequence, rather than just on each character of your sequence.

###
### pseudocode below:
###
from Bio import SeqIO
import sys

def doAnalysis(record):
    # record is one FASTA entry: record.id is the header id, record.seq the sequence
    print("I see:", record.id, record.seq)
    ## fill me in

# SeqIO.parse yields one record at a time from standard input
for record in SeqIO.parse(sys.stdin, 'fasta'):
    doAnalysis(record)
###

And you can fill in the details of doAnalysis() so that it does the nucleotide counting and only needs to worry about the contents of the record's single sequence.

In bioinformatics contexts, you must either deal with memory consumption yourself or use libraries that naturally lend themselves to doing things in a memory-careful way; otherwise your computer will start swapping memory to disk. At least, that is the case unless your data sets are trivial, which I am guessing they are not.

In short, please use the Biopython library. It will handle a lot of issues that you are not considering, including memory consumption and correct, stream-oriented parsing of FASTA.

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
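
P.S. In case it helps, here is one possible way doAnalysis() could be filled in. Treat it as a minimal sketch rather than the definitive answer. The names count_nucleotides, do_analysis, and analyze_stream are my own hypothetical helpers, and I am assuming plain DNA where only the letters A, C, G, and T are reported (anything else, such as N, is counted but left out of the summary). Biopython is imported inside analyze_stream only so that the counting helpers can be tried without it installed.

```python
from collections import Counter


def count_nucleotides(sequence):
    """Count every character in one sequence, ignoring case."""
    # str() lets this accept either a plain string or a Biopython Seq.
    return Counter(str(sequence).upper())


def do_analysis(record):
    """Summarize one record that has .id and .seq attributes.

    Returns the record id and the counts for A, C, G, and T only
    (an assumption for this sketch; other letters are not reported).
    """
    counts = count_nucleotides(record.seq)
    return record.id, {base: counts[base] for base in "ACGT"}


def analyze_stream(handle):
    """Yield one summary per FASTA record read from an open file handle."""
    from Bio import SeqIO  # requires Biopython

    # SeqIO.parse is an iterator, so records are handled one at a time.
    for record in SeqIO.parse(handle, "fasta"):
        yield do_analysis(record)
```

Something like `for seq_id, counts in analyze_stream(sys.stdin): print(seq_id, counts)` would then print one line per sequence. The point of the design is that the counting code never sees more than one sequence, and the parsing is left entirely to the library.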