Re: processing the genetic code with python?

2006-03-10 Thread brainy_muppet



 I'm writing your name down and this is the last time I'm doing homework
 for you.

 James



Wow, you are really a pretentious asshole. If you don't want to provide
people with help, don't bother. 

And that code's incorrect anyway.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: processing the genetic code with python?

2006-03-10 Thread Steve Holden
[EMAIL PROTECTED] wrote:
 
I'm writing your name down and this is the last time I'm doing homework
for you.

James


 
 
 Wow, you are really a pretentious asshole. If you don't want to provide
 people with help, don't bother. 
 
 And that code's incorrect anyway.
 

So a smiley has to be explicitly included for you to perceive something 
as being funny?

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd www.holdenweb.com
Love me, love my blog holdenweb.blogspot.com



Re: processing the genetic code with python?

2006-03-09 Thread Tim Roberts
David E. Konerding DSD staff [EMAIL PROTECTED] wrote:

I don't really understand precisely what you're trying to do.  

First off, those aren't base pairs, they're bases. Only when you have
double-stranded DNA (or RNA, or some other oddball cases) would they be
base pairs.

Isn't that just a standard way to write DNA pairs?  After all, every a is
paired with a t, and every c is paired with a g, so it is redundant
to specify both ends of the pair.
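Tim's point, that one strand fully determines its partner, is easy to check in code. A minimal sketch, assuming lowercase DNA as in the posted files. Because the two strands run antiparallel, the partner strand read in its own 5'-to-3' direction is the reverse complement. The function names here are illustrative only.

```python
# Watson-Crick pairing for lowercase DNA: a<->t, c<->g
PAIRS = str.maketrans("acgt", "tgca")

def complement(strand):
    """Return the base paired with each position of `strand`, in the same order."""
    return strand.translate(PAIRS)

def reverse_complement(strand):
    """Return the partner strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]

seq = "atggctaaactg"
print(complement(seq))          # taccgatttgac
print(reverse_complement(seq))  # cagtttagccat
```

Complementing twice gives back the original strand, which is exactly why writing only one strand loses no information.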
-- 
- Tim Roberts, [EMAIL PROTECTED]
  Providenza  Boekelheide, Inc.


Re: processing the genetic code with python?

2006-03-07 Thread nuttydevil
Thanks so much, guys, for all your help. I've had a month to learn
python and do this for my project; I had the basics down but just kept
coming unstuck. I won't message again, promise! Hehehe, thanks again
everyone.



Re: processing the genetic code with python?

2006-03-06 Thread Steve Holden
Diez B. Roggisch wrote:
 nuttydevil wrote:
 
I've tried various ways of doing this but keep coming unstuck along the
way. Has anyone got any suggestions for how they would tackle this
problem?
Thanks for any help received!
 
 
 Show us your ways, show us where you got stuck, and then we might be able
 to help you.

Also, take a look at the biopython package, which will almost certainly 
save you large amounts of time on tasks such as the one you describe.

   http://www.biopython.org/

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd www.holdenweb.com
Love me, love my blog holdenweb.blogspot.com



Re: processing the genetic code with python?

2006-03-06 Thread Roy Smith
In article [EMAIL PROTECTED],
nuttydevil [EMAIL PROTECTED] wrote:
I have many notepad documents that all contain long chunks of genetic
code. They look something like this:

atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacag
tacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaa
agcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgt
ggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaa

Basically, I want to design a program using python that can open and
read these documents.

Start by googling for biopython.


Re: processing the genetic code with python?

2006-03-06 Thread David E. Konerding DSD staff
In article [EMAIL PROTECTED], nuttydevil wrote:
 I have many notepad documents that all contain long chunks of genetic
 code. They look something like this:
 
 atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacag
 tacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaa
 agcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgt
 ggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaa
 
 Basically, I want to design a program using python that can open and
 read these documents. However, I want them to be read 3 base pairs at a
 time (to analyse them codon by codon) and find the value assigned to
 each codon. An example of this is below:
 
 ** If the three base pairs were UUU the value assigned to it (from the
 codon value table) would be 0.296
 
 The program has to read all the sequence three pairs at a time, then I
 want to get all the values for each codon, multiply them together and
 put them to the power of 1 / the length of the sequence in codons
 (which is the length of the whole sequence divided by three).
 

I don't really understand precisely what you're trying to do.  

First off, those aren't base pairs, they're bases. Only when you have
double-stranded DNA (or RNA, or some other oddball cases) would they be
base pairs.

Second, I don't know what the codon-to-value function is. Is this a
frequency (i.e. the number n of occurrences of codon X out of N total
codons)? Or is the lookup table provided for you?
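If the value really were a frequency, as Dave guesses (n occurrences of codon X out of N total codons), it could be computed straight from the sequence. A sketch for Python 3, assuming the sequence is already one uppercase RNA string; `codon_frequencies` is a hypothetical name, and ignoring a trailing partial codon is an assumption rather than anything the original post specifies.

```python
from collections import Counter

def codon_frequencies(seq):
    """Map each codon to n / N: its count divided by the total number of codons."""
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = [seq[i:i+3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: n / total for codon, n in counts.items()}

freqs = codon_frequencies("AUGAUGCCC")
print(freqs["AUG"], freqs["CCC"])   # two thirds and one third
```

If instead the table is handed out with the assignment, a plain dict lookup replaces this step entirely.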

Anyway, I can help you with most of the preprocessing. You wrote:

However, to make things even more complicated, the notebook sequences
 are in lowercase and the codon value table is in uppercase, so the
 sequences need to be converted into uppercase. Also, the Ts in the DNA
 sequences need to be changed to Us (again to match the codon value
 table). And finally, before the DNA sequences are read and analysed I
 need to remove the first 50 codons (i.e. the first 150 letters) and the
 last 20 codons (the last 60 letters) from the DNA sequence. I've also
 been having problems ensuring the program reads ALL the sequence 3
 letters at a time.

So, if the file is called notepad.txt, I'd do what you describe above like this:

with open('notepad.txt') as f:
    l = [line.strip() for line in f]  ## read all lines, stripping newlines
l = ''.join(l)  ## join into one string (in case codon boundaries cross lines)
l = l.upper().replace('T', 'U')  ## match the uppercase RNA codon table
l = l[150:-60]  ## drop the first 50 codons (150 letters) and last 20 (60 letters)
print(l)

codons = []
for i in range(0, len(l), 3):
    codons.append(l[i:i+3])  ## one 3-letter codon per step

print(codons)


That gets you about 30% of the way there.
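Dave's loop silently produces a short final "codon" if the trimmed length is not a multiple of three, which is one way a program can fail to read ALL the sequence three letters at a time, as the original post asks. A sketch that checks this up front; the name `split_codons` and the choice to raise `ValueError` instead of dropping the leftover bases are assumptions.

```python
def split_codons(seq):
    """Split seq into 3-letter codons, refusing to silently drop a partial codon."""
    leftover = len(seq) % 3
    if leftover:
        raise ValueError("sequence length %d leaves %d extra base(s)"
                         % (len(seq), leftover))
    return [seq[i:i+3] for i in range(0, len(seq), 3)]

print(split_codons("AUGGCUAAA"))  # ['AUG', 'GCU', 'AAA']
```

Failing loudly makes a mis-trimmed or truncated file show up immediately instead of surfacing later as a missing key in the codon table.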

Dave


Re: processing the genetic code with python?

2006-03-06 Thread James Stroud
nuttydevil wrote:
 I have many notepad documents that all contain long chunks of genetic
 code. They look something like this:
 
 atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacag
 tacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaa
 agcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgt
 ggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaa
 
 Basically, I want to design a program using python that can open and
 read these documents. However, I want them to be read 3 base pairs at a
 time (to analyse them codon by codon) and find the value assigned to
 each codon. An example of this is below:
 
 ** If the three base pairs were UUU the value assigned to it (from the
 codon value table) would be 0.296
 
 The program has to read all the sequence three pairs at a time, then I
 want to get all the values for each codon, multiply them together and
 put them to the power of 1 / the length of the sequence in codons
 (which is the length of the whole sequence divided by three).
 
 However, to make things even more complicated, the notebook sequences
 are in lowercase and the codon value table is in uppercase, so the
 sequences need to be converted into uppercase. Also, the Ts in the DNA
 sequences need to be changed to Us (again to match the codon value
 table). And finally, before the DNA sequences are read and analysed I
 need to remove the first 50 codons (i.e. the first 150 letters) and the
 last 20 codons (the last 60 letters) from the DNA sequence. I've also
 been having problems ensuring the program reads ALL the sequence 3
 letters at a time.
 
 I've tried various ways of doing this but keep coming unstuck along the
 way. Has anyone got any suggestions for how they would tackle this
 problem?

Yes: use python.

 Thanks for any help received!
 

I couldn't help myself. I strongly suggest you study this example. It 
will cut your coding time way down in the future.

I'm writing your name down and this is the last time I'm doing homework 
for you.

James


from functools import reduce   # a builtin on Python 2; must be imported on Python 3
from operator import mul

filename = 'notepad.txt'   # the sequence file to process
table = { 'AUG' : 0.98999, 'CCC' : 0.9755 }   # you fill this in
trim_front = 50   # codons to drop from the start
trim_back = 20    # codons to drop from the end

# Why I did this:
# Python =1 line per thought; you have to love it
data = ''.join([s.strip() for s in open(filename)])
data = data.upper().replace('T', 'U')
codons = [data[i:i+3] for i in range(0, len(data), 3)]  # Alex Martelli
trimmed = codons[trim_front:-trim_back]
product = reduce(mul, [table[codon] for codon in trimmed])
value = product**(1.0/len(trimmed))  # 1/N over the trimmed codons; or should N be ALL codons?

print(value)   # useless print statement
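The `product**(1.0/len(trimmed))` step is a geometric mean of the codon values. Over a long gene, the raw product of hundreds of values below 1 can underflow to 0.0, so a common alternative is to average logarithms instead. A minimal sketch, assuming Python 3 and strictly positive table values; `geometric_mean` is a hypothetical helper name, not something from the thread.

```python
import math
from functools import reduce
from operator import mul

def geometric_mean(values):
    """Return (v1 * v2 * ... * vN) ** (1/N) for positive values, via exp(mean(log v))."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# UUU's value comes from the original post; AUG and CCC from the example table above
table = {'UUU': 0.296, 'AUG': 0.98999, 'CCC': 0.9755}
codons = ['AUG', 'UUU', 'CCC', 'UUU']
print(geometric_mean(table[c] for c in codons))

# With many small values the direct product underflows, but the log form does not
tiny = [1e-5] * 1000
print(reduce(mul, tiny))       # 0.0
print(geometric_mean(tiny))    # approximately 1e-05
```

For short sequences both forms agree to floating-point precision; the log form only matters once the product gets very small.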


-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/


Re: processing the genetic code with python?

2006-03-06 Thread James Stroud
James Stroud wrote:
 # Why I did this:
 # Python =1 line per thought; you have to love it

I noticed a typo. Should be Python = 1 line per thought.

James

-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/