There are a bunch of new tests up at shootout.alioth.debian.org for which Python does not yet have code. I've taken a crack at one of them, a task to print the reverse complement of a gene transcription. Since there are a lot of minds on this newsgroup that are much better at optimization than I, I'm posting the code I came up with to see if anyone sees any opportunities for substantial improvement. Without further ado:

    import string
    import sys

    # complement table; '\n' maps to itself
    table = string.maketrans('ACBDGHK\nMNSRUTWVY', 'TGVHCDM\nKNSYAAWBR')

    def show(s):
        # write the reverse complement, 60 characters per line
        i = 0
        for char in s.upper().translate(table)[::-1]:
            if i == 60:
                print
                i = 0
            sys.stdout.write(char)
            i += 1
        print

    def main():
        seq = ''
        for line in sys.stdin:
            # header and comment lines end the current sequence
            if line[0] == '>' or line[0] == ';':
                if seq != '':
                    show(seq)
                    seq = ''
                print line,
            else:
                seq += line[:-1]
        show(seq)

    main()

Don't know if this is faster for your data, but I think you could also write this as (untested):

    import string
    import sys

    # table as default argument value so you don't have to do
    # a global lookup each time it's used
    def show(seq, table=string.maketrans('ACBDGHK\nMNSRUTWVY',
                                         'TGVHCDM\nKNSYAAWBR')):
        seq = seq.upper().translate(table)[::-1]
        # print string in slices of length 60
        for i in range(0, len(seq), 60):
            print seq[i:i+60]

    def main():
        seq = []
        # alias methods to avoid repeated lookup
        join = ''.join
        append = seq.append
        for line in sys.stdin:
            # note only one "line[0]" by using "in" test
            if line[0] in ';>':
                # note no need to check if seq is empty; show now prints
                # nothing for an empty string
                show(join(seq))
                print line,
                del seq[:]
            else:
                append(line[:-1])
        # the last sequence has no header after it, so show it here
        show(join(seq))

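
For anyone trying this on Python 3, here is a rough sketch of the same slice-based approach. It is an adaptation, not the code from the thread: string.maketrans becomes str.maketrans, and the names TABLE, reverse_complement and process are made up for illustration. It returns lines instead of printing them so it is easy to check.

```python
# Same complement mapping as in the thread, minus the '\n' entry,
# since newlines are stripped before translation here.
TABLE = str.maketrans('ACBDGHKMNSRUTWVY', 'TGVHCDMKNSYAAWBR')

def reverse_complement(seq, width=60):
    """Return the reverse complement of seq as lines of at most width chars."""
    rc = seq.upper().translate(TABLE)[::-1]
    # an empty sequence gives an empty range, hence no lines at all
    return [rc[i:i + width] for i in range(0, len(rc), width)]

def process(lines):
    """Yield output lines for FASTA-style input lines (hypothetical helper)."""
    seq = []
    for line in lines:
        if line.startswith(('>', ';')):
            # a header or comment ends the sequence collected so far
            yield from reverse_complement(''.join(seq))
            yield line.rstrip('\n')
            del seq[:]
        else:
            seq.append(line.rstrip('\n'))
    # the last sequence has no header after it
    yield from reverse_complement(''.join(seq))
```

Usage would be something like `for out in process(sys.stdin): print(out)` after an `import sys`.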
Making seq into a list instead of a string (and using .extend instead of the + operator) didn't give any speed improvements. Neither did using a dictionary instead of the translate function, or using reversed() instead of s[::-1]. The latter surprised me, since I would have guessed using an iterator to be more efficient. Since the shootout also tests memory usage, should I be using reversed for that reason?
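
If you want to check the reversed() versus [::-1] comparison on your own machine, a small timeit script along these lines should do. This is a Python 3 sketch with a made-up test sequence, and the numbers will depend on your interpreter and data. Note that reversed() hands back an iterator, so getting a string out of it takes an extra ''.join, which the slice does not need.

```python
import timeit

seq = 'ACGT' * 25000  # hypothetical 100,000-base test sequence

def with_slice(s=seq):
    # builds the reversed string in one step
    return s[::-1]

def with_reversed(s=seq):
    # reversed() yields one character at a time; join rebuilds the string
    return ''.join(reversed(s))

if __name__ == '__main__':
    for fn in (with_slice, with_reversed):
        best = min(timeit.repeat(fn, number=20, repeat=3))
        print('%-14s %.4f s for 20 runs' % (fn.__name__, best))
```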
reversed() won't save you any memory -- you're already loading the entire string into memory anyway.
Interesting tidbit: del seq[:] tests faster than seq = []

    $ python -m timeit -s "lst = range(1000)" "lst = []"
    10000000 loops, best of 3: 0.159 usec per loop

    $ python -m timeit -s "lst = range(1000)" "del lst[:]"
    10000000 loops, best of 3: 0.134 usec per loop

It's probably the right way to go in this case anyway -- no need to create a new empty list each time.
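
Speed aside, the two spellings are not interchangeable once a bound method such as append = seq.append has been aliased: del seq[:] empties the same list object, while seq = [] rebinds the name and leaves the alias pointing at the old list. A small sketch of the difference (function names are made up for illustration):

```python
def clear_in_place():
    seq = ['ACGT']
    append = seq.append   # alias bound to this particular list object
    del seq[:]            # empties that same object
    append('TTTT')        # lands in seq
    return seq

def rebind():
    seq = ['ACGT']
    append = seq.append   # alias bound to the original list
    seq = []              # new list; the alias still targets the old one
    append('TTTT')        # goes to the old list, not to seq
    return seq
```

Here clear_in_place() returns ['TTTT'] and rebind() returns []. One caveat on the timings above: timeit runs the setup only once per repeat, so after the first pass lst is already empty, and both statements are mostly being measured on an empty list.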
STeVe