On May 12, 2009, at 3:08 AM, michel bellis wrote:
> Chris,
>
> The results I have sent are in order as I entered in the pydev console
> (eclipse on Windows XP). So the os.listdir is before construction of
> the NLMSA.

That's odd. Why would there be .idDict and .seqIDdict files *before* building the NLMSA? I guess you already tried to build it before, and these are left over from old builds.

> I tried your code and got a lot of files longer than 1 000 000. But I
> do not understand what is the limit exactly.

The limit is that my brain was inoperative when I wrote that. I meant 1 billion, because I wanted it to report any sequence that might go over the 32 bit int limit. But what's 3 orders of magnitude between friends, anyway?

> I ran it with a different value (>= 2 000 000 000) and was surprised
> to find this:
> mm8.chrM 2666011489
>
> So I reconstructed mm8 without chrM. And then at the end I got this:
> mm8.chr9_random 2666012422 (which was hg18.chr9_random 1146434

OK, as I thought, something is going wrong with the reading of your sequence file. Was chrM the *last* sequence in the file? And chr9_random the next-to-last sequence? That would imply that the last sequence length is always getting screwed up. Looking at the seqfmt.pyx code, I don't see any obvious reason why that could occur. Assuming that 'mm8' is the name of your fasta input file, try the following test code:

    from pygr import seqfmt
    f = file('mm8')
    d = {}
    seqfmt.read_fasta_lengths(d, f, 'mm8')
    print d  # see if the sequence length is already crazy at this point

Finally, what platform (Windows via Cygwin?) and Python / Pyrex versions are you using?

-- Chris
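As an independent cross-check on whatever seqfmt reports, the lengths can also be counted in plain Python and compared against the signed 32 bit limit (2**31 - 1 = 2147483647). This is only a rough sketch and not part of pygr: the names fasta_lengths, over_limit and INT32_MAX are made up for illustration, and it assumes an ordinary FASTA file in which each header line starts with '>' and the ID is the first word after it.

```python
import io

INT32_MAX = 2**31 - 1  # 2147483647, largest signed 32-bit integer


def fasta_lengths(handle):
    """Return {sequence_id: length} for a FASTA file-like object.

    Hypothetical helper, not a pygr function. The ID is taken to be
    the first whitespace-delimited word after '>'.
    """
    lengths = {}
    seq_id = None
    for line in handle:
        line = line.strip()
        if line.startswith('>'):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ''
            lengths[seq_id] = 0
        elif seq_id is not None:
            # Count residues on this line (whitespace already stripped).
            lengths[seq_id] += len(line)
    return lengths


def over_limit(lengths, limit=INT32_MAX):
    """Return sorted (id, length) pairs whose length exceeds limit."""
    return sorted((k, v) for k, v in lengths.items() if v > limit)


if __name__ == '__main__':
    # Tiny in-memory example; for a real file pass open('mm8') instead.
    sample = io.StringIO(u">chr1 demo\nACGT\nAC\n>chrM\nGGG\n")
    lengths = fasta_lengths(sample)
    print(lengths)
    print(over_limit(lengths))
```

If the lengths counted this way look sane for the last sequence while the values from read_fasta_lengths do not, that would point at the reader rather than at the input file.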