Hi all, New to the list and fairly new to pypy. First of all, congrats on the new 1.6 release-- the growing support for numpy is very exciting (go, fight, win, take state!).
So I snagged the 1.6 release to test if it would be faster on the kind of code I often write: Bioinformatics. In this snippet, the point is to check the mappability of the genome-- if a particular substring appears more than once in the genome, the region is called unmappable. Machine Specs: 64bit Ubuntu 11.04 119048 CPython 2.7.1 pystones/second 416667 pypy1.6 pystones/second The CPython version of the following code takes a bit more than a minute to run on the 21st chromosome of the human reference genome, but the pypy version has been going for 27+ minutes and hasn't yet finished the first step of loading the genome as a dict of strings. Am I using some construct that's particularly difficult for pypy? Am I missing something? hg18 chrom 21 is available at http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/chr21.fa.gz import sys def slide_dna(dna, windowsize): for i in xrange(len(dna) - windowsize): slice = dna[i:i+windowsize] if 'N' not in slice: yield slice fasta_file = sys.argv[1] # should be *.fa print 'loading dna from', fasta_file chroms = {} dna = None for l in open(fasta_file): if l.startswith('>'): # new chromosome if dna is not None: chroms[chrom] = dna chrom = l.strip().replace('>', '') dna = '' else: dna += l.rstrip() if dna is not None: chroms[chrom] = dna for length in [15]:#, 25, 35, 45, 55, 65, 75]: print 'now on', length mappable = 0 repeat = 0 s = {} for dna in chroms.itervalues(): for slice in slide_dna(dna, length): try: s[slice] += 1 except KeyError: s[slice] = 1 print 'built, now counting' for dna in chroms.itervalues(): for slice in slide_dna(dna, length): if s[slice] == 1: mappable += 1 else: repeat += 1 print 'for substring length %s, mappable: %s, repeat: %s' % (length, mappable, repeat) -- Jake Biesinger Graduate Student Xie Lab, UC Irvine
_______________________________________________ pypy-dev mailing list pypy-dev@python.org http://mail.python.org/mailman/listinfo/pypy-dev