Jacob Lee wrote:
There are a bunch of new tests up at shootout.alioth.debian.org for which
Python does not yet have code. I've taken a crack at one of them, a task
to print the reverse complement of a gene transcription. Since there are a
lot of minds on this newsgroup that are much better at optimization than
I, I'm posting the code I came up with to see if anyone sees any
opportunities for substantial improvement. Without further ado:

table = string.maketrans('ACBDGHK\nMNSRUTWVY', 'TGVHCDM\nKNSYAAWBR')

def show(s):
    i = 0
    for char in s.upper().translate(table)[::-1]:
        if i == 60:
            print
            i = 0
        sys.stdout.write(char)
        i += 1
    print

def main():
    seq = ''
    for line in sys.stdin:
        if line[0] == '>' or line[0] == ';':
            if seq != '':
                show(seq)
                seq = ''
            print line,
        else:
            seq += line[:-1]
    show(seq)

main()

Don't know if this is faster for your data, but I think you could also write this as (untested):


import string
import sys

# table as default argument value so you don't have to do
# a global lookup each time it's used

def show(seq, table=string.maketrans('ACBDGHK\nMNSRUTWVY',
                                     'TGVHCDM\nKNSYAAWBR')):
    seq = seq.upper().translate(table)[::-1]
    # print string in slices of length 60
    for i in range(0, len(seq), 60):
        print seq[i:i+60]

def main():
    seq = []
    # alias methods to avoid repeated lookup
    join = ''.join
    append = seq.append
    for line in sys.stdin:
        # note only one "line[0]" by using "in" test
        if line[0] in ';>':
            # note no need to check if seq is empty; show now prints
            # nothing for an empty string
            show(join(seq))
            print line,
            del seq[:]
        else:
            append(line[:-1])
    # print whatever was collected after the last header line
    show(join(seq))

main()
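
If you want to try the same idea on Python 3, where string.maketrans and the print statement are gone, something like the following should be close. This is only a sketch: the names reverse_complement and wrap are mine, not from the shootout, and it uses str.maketrans in place of string.maketrans.

```python
# Python 3 sketch; reverse_complement and wrap are hypothetical helper names.
TABLE = str.maketrans('ACBDGHKMNSRUTWVY', 'TGVHCDMKNSYAAWBR')

def reverse_complement(seq, table=TABLE):
    # complement every base, then reverse the whole string with a slice
    return seq.upper().translate(table)[::-1]

def wrap(seq, width=60):
    # split the string into slices of at most `width` characters;
    # an empty string yields an empty list, so nothing is printed for it
    return [seq[i:i + width] for i in range(0, len(seq), width)]

# usage: print a made-up 150-base sequence in 60-column lines
print('\n'.join(wrap(reverse_complement('acgtn' * 30))))
```

Returning the lines instead of printing them inside the helper makes it easier to check the slicing separately from the I/O.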


Making seq into a list instead of a string (and using .extend instead of
the + operator) didn't give any speed improvements. Neither did using a
dictionary instead of the translate function, or using reversed() instead
of s[::-1]. The latter surprised me, since I would have guessed using an
iterator to be more efficient. Since the shootout also tests memory usage,
should I be using reversed for that reason?

reversed() won't save you any memory -- you're already loading the entire string into memory anyway.
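
To make that concrete: reversed() gives you an iterator, but the slicing version of show needs a real string to slice, so the iterator would have to be joined back into a full copy anyway. A small sketch (Python 3 syntax, with a made-up sample string):

```python
s = 'GATTACA' * 1000             # made-up sample sequence

by_slice = s[::-1]               # builds the reversed copy in one step
by_iter = ''.join(reversed(s))   # lazy iterator, but join() still builds a full copy

# both approaches end up with the same complete string in memory
same = (by_slice == by_iter)
```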



Interesting tidbit: del seq[:] tests faster than seq = []

$ python -m timeit -s "lst = range(1000)" "lst = []"
10000000 loops, best of 3: 0.159 usec per loop

$ python -m timeit -s "lst = range(1000)" "del lst[:]"
10000000 loops, best of 3: 0.134 usec per loop

It's probably the right way to go in this case anyway -- no need to create a new empty list each time.
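
There is also a correctness reason to prefer del seq[:] in the code above: main() aliases append = seq.append, and that bound method stays tied to the original list object. Clearing in place keeps the alias valid, whereas rebinding seq = [] would leave append feeding the old list. A quick sketch of the difference (the variable names are only for illustration):

```python
seq = []
append = seq.append      # bound to this particular list object

append('ACGT')
del seq[:]               # empties the same object in place
append('TTGA')
in_place = list(seq)     # the alias still targets seq

old = seq
seq = []                 # rebinds the name to a brand-new list
append('CCAA')
rebound = list(seq)      # append went to the old list, not the new one
```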

STeVe
--
http://mail.python.org/mailman/listinfo/python-list
