Re: processing the genetic code with python?
> I'm writing your name down and this is the last time I'm doing homework for you.
> James

Wow, you are really a pretentious asshole. If you don't want to provide people with help, don't bother. And that code's incorrect anyway.
Re: processing the genetic code with python?
[EMAIL PROTECTED] wrote:
>> I'm writing your name down and this is the last time I'm doing homework for you.
>> James
>
> Wow, you are really a pretentious asshole. If you don't want to provide people with help, don't bother. And that code's incorrect anyway.

So a smiley has to be explicitly included for you to perceive something as being funny?

regards
 Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd www.holdenweb.com
Love me, love my blog holdenweb.blogspot.com
Re: processing the genetic code with python?
David E. Konerding DSD staff [EMAIL PROTECTED] wrote:
> I don't really understand precisely what you're trying to do. First off, those aren't base pairs, they're bases. Only when you have double-stranded DNA (or RNA, or some other oddball cases) would they be base pairs.

Isn't that just a standard way to write DNA pairs? After all, every a is paired with a t, and every c is paired with a g, so it is redundant to specify both ends of the pair.
--
- Tim Roberts, [EMAIL PROTECTED]
Providenza Boekelheide, Inc.
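The pairing rule in the reply above (a with t, c with g) is why one strand fully determines the other. A minimal sketch, assuming lowercase DNA letters as in the thread's sample; the names `PAIRS`, `complement` and `reverse_complement` are illustrative and do not come from the thread:

```python
# Watson-Crick pairing for lowercase DNA bases: a<->t, c<->g.
PAIRS = str.maketrans("acgt", "tgca")

def complement(strand):
    """Return the base paired with each position of `strand`."""
    return strand.translate(PAIRS)

def reverse_complement(strand):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]

print(complement("atggct"))          # taccga
print(reverse_complement("atggct"))  # agccat
```

Because the second strand can always be derived this way, sequence files usually store only one strand, which is what the question's sample appears to do.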
Re: processing the genetic code with python?
Thanks so much, guys, for all your help. I've had a month to learn Python and do this for my project. I had the basics down but just kept getting stuck. I won't message again - promise! Hehehe, thanks again everyone.
Re: processing the genetic code with python?
Diez B. Roggisch wrote:
> nuttydevil wrote:
>> I've tried various ways of doing this but keep coming unstuck along the way. Has anyone got any suggestions for how they would tackle this problem? Thanks for any help received!
>
> Show us your ways, show us where you got stuck - then we might be able to help you.

Also, take a look at the biopython package, which will almost certainly save you large amounts of time on tasks such as the one you describe.

http://www.biopython.org/

regards
 Steve
Re: processing the genetic code with python?
In article [EMAIL PROTECTED], nuttydevil [EMAIL PROTECTED] wrote:
> I have many notepad documents that all contain long chunks of genetic code. They look something like this:
>
> atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacag
> tacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaa
> agcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgt
> ggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaa
>
> Basically, I want to design a program using python that can open and read these documents.

Start by googling for biopython.
Re: processing the genetic code with python?
In article [EMAIL PROTECTED], nuttydevil wrote:
> I have many notepad documents that all contain long chunks of genetic code. They look something like this:
>
> atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacag
> tacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaa
> agcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgt
> ggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaa
>
> Basically, I want to design a program using python that can open and read these documents. However, I want them to be read 3 base pairs at a time (to analyse them codon by codon) and find the value assigned to each codon. An example of this is below:
>
> ** If the three base pairs were UUU the value assigned to it (from the codon value table) would be 0.296
>
> The program has to read all the sequence three pairs at a time, then I want to get all the values for each codon, multiply them together and put them to the power of 1 / the length of the sequence in codons (which is the length of the whole sequence divided by three).

I don't really understand precisely what you're trying to do. First off, those aren't base pairs, they're bases. Only when you have double-stranded DNA (or RNA, or some other oddball cases) would they be base pairs. Second, I don't know what the codon-to-value function is. Is this frequency (i.e. number n occurrences of codon X out of N total codons)? Or is the lookup table provided for you? Anyway, I can help you with most of the preprocessing. For example,

> However, to make things even more complicated, the notepad sequences are in lowercase and the codon value table is in uppercase, so the sequences need to be converted into uppercase. Also, the Ts in the DNA sequences need to be changed to Us (again to match the codon value table). And finally, before the DNA sequences are read and analysed I need to remove the first 50 codons (i.e. the first 150 letters) and the last 20 codons (the last 60 letters) from the DNA sequence.
>
> I've also been having problems ensuring the program reads ALL the sequence 3 letters at a time.

So, if the file is called notepad.txt, I'd do what you did above as:

    o = open("notepad.txt")
    l = o.readlines()                  ## read all lines
    l = [line.strip() for line in l]   ## strip newlines
    l = "".join(l)                     ## join into one string (in case codon boundaries cross lines)
    l = l[150:-60]                     ## drop the first 50 codons (150 letters) and the last 20 codons (60 letters)
    l = l.upper()
    print(l)
    codons = []
    for i in range(0, len(l), 3):
        codons.append(l[i:i+3])
    print(codons)

That gets you about 30% of the way there.

Dave
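On the concern about reading ALL of the sequence three letters at a time: one way to check is to split the cleaned string into codons and report any leftover letters explicitly instead of letting a short final slice pass unnoticed. This is only a sketch; `split_codons` is a hypothetical helper, not code from the thread, and it assumes the file has already been joined into a single string.

```python
def split_codons(seq):
    """Split `seq` into 3-letter codons; return (codons, leftover).

    `leftover` holds the 0, 1 or 2 trailing letters that do not
    form a complete codon, so nothing is silently dropped.
    """
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return codons, seq[usable:]

# First 20 letters of the sample in the question, converted to
# uppercase RNA letters as the question asks.
sample = "atggctaaactgaccaagcg".upper().replace("T", "U")
codons, leftover = split_codons(sample)
print(codons)    # ['AUG', 'GCU', 'AAA', 'CUG', 'ACC', 'AAG']
print(leftover)  # CG
```

A non-empty `leftover` suggests the input length is not a multiple of three, which is worth investigating before any table lookup.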
Re: processing the genetic code with python?
nuttydevil wrote:
> I have many notepad documents that all contain long chunks of genetic code. They look something like this:
>
> atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacag
> tacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaa
> agcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgt
> ggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaa
>
> Basically, I want to design a program using python that can open and read these documents. However, I want them to be read 3 base pairs at a time (to analyse them codon by codon) and find the value assigned to each codon. An example of this is below:
>
> ** If the three base pairs were UUU the value assigned to it (from the codon value table) would be 0.296
>
> The program has to read all the sequence three pairs at a time, then I want to get all the values for each codon, multiply them together and put them to the power of 1 / the length of the sequence in codons (which is the length of the whole sequence divided by three).
>
> However, to make things even more complicated, the notepad sequences are in lowercase and the codon value table is in uppercase, so the sequences need to be converted into uppercase. Also, the Ts in the DNA sequences need to be changed to Us (again to match the codon value table). And finally, before the DNA sequences are read and analysed I need to remove the first 50 codons (i.e. the first 150 letters) and the last 20 codons (the last 60 letters) from the DNA sequence.
>
> I've also been having problems ensuring the program reads ALL the sequence 3 letters at a time. I've tried various ways of doing this but keep coming unstuck along the way. Has anyone got any suggestions for how they would tackle this problem?

Yes: use python.

> Thanks for any help received!

I couldn't help myself. I strongly suggest you study this example. It will cut your coding time way down in the future.

I'm writing your name down and this is the last time I'm doing homework for you.

James

    from functools import reduce
    from operator import mul

    table = { 'AUG' : 0.98999, 'CCC' : 0.9755 } # == you fill this in
    filename = 'notepad.txt' # assumed file name; point this at your sequence file
    trim_front = 50
    trim_back = 20

    # Why I did this:
    # Python =1 line per thought; you have to love it
    data = ''.join([s.strip() for s in open(filename)])
    data = data.upper().replace('T', 'U')
    codons = [data[i:i+3] for i in range(0, len(data), 3)] # Alex Martelli
    trimmed = codons[trim_front:-trim_back]
    product = reduce(mul, [table[codon] for codon in trimmed])
    value = product**(1.0/len(trimmed)) # == is this really ALL codons?
    print(value) # useless print statement

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095
http://www.jamesstroud.com/
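The quantity the question describes, the product of the per-codon values raised to the power 1 / (number of codons), is a geometric mean. Multiplying hundreds of values below 1 can drift toward floating-point underflow, so one commonly used alternative is to average the logarithms and exponentiate at the end. The following is a sketch under assumptions: the table holds only the three values quoted in the thread, `geometric_mean` is an illustrative name, and every table value is assumed to be strictly positive (otherwise `math.log` fails).

```python
import math

# Values quoted in the thread; a real table would cover all 64 codons.
table = {"UUU": 0.296, "AUG": 0.98999, "CCC": 0.9755}

def geometric_mean(codons, table):
    """Geometric mean of the table values of `codons`, computed via logs."""
    if not codons:
        raise ValueError("no codons to average")
    log_sum = sum(math.log(table[c]) for c in codons)
    return math.exp(log_sum / len(codons))

codons = ["AUG", "UUU", "CCC", "UUU"]
print(geometric_mean(codons, table))
```

For short inputs this should agree with the direct `product ** (1.0 / n)` form up to rounding; the log form mainly helps on long sequences.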
Re: processing the genetic code with python?
James Stroud wrote:
> # Why I did this:
> # Python =1 line per thought; you have to love it

I noticed a typo. Should be Python = 1 line per thought.

James