I have no idea what causes it, but maybe something goes wrong during the construction of the pygr seqlen files: pureseq has, I think, the correct size, but some lengths in seqlen are wrong (chrM is a small file, 17 KB, but its length in seqlen is huge).
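One way to narrow this down would be to count the sequence lengths directly from the FASTA text, independently of pygr, and compare them with what seqlen reports. The following is only a minimal sketch, assuming plain FASTA with '>' header lines; `fasta_lengths` and `INT32_MAX` are names made up for illustration and are not part of pygr, and the demo data is invented.

```python
import io

# Largest value that fits in a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def fasta_lengths(handle):
    """Yield (record_id, sequence_length) for each record in a FASTA stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            fields = line[1:].split()
            # Keep only the first word of the header as the record id.
            name, length = (fields[0] if fields else ''), 0
        elif name is not None:
            # Sequence line: count its residues (whitespace already stripped).
            length += len(line)
    if name is not None:
        yield name, length


if __name__ == '__main__':
    # Invented two-record example; replace with open(path) for a real file.
    demo = io.StringIO(u">chrM\nACGT\nAC\n>chr1\nGGG\n")
    for name, length in fasta_lengths(demo):
        flag = ' (exceeds 32-bit limit)' if length > INT32_MAX else ''
        print('%s\t%d%s' % (name, length, flag))
```

Running this on each input file and on the concatenated file should show whether the lengths are already wrong in the FASTA text or only go wrong once pygr builds its index.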
First I concatenate the FASTA files with this program:

    import os

    def concatenate(inputFileDir, outputFile):
        os.chdir(inputFileDir)
        print('processing ... %s' % os.getcwd())
        fileList = os.listdir(inputFileDir)
        outFile = open(outputFile, 'w')
        # Append every file in the directory, in os.listdir() order
        for fileName in fileList:
            for line in open(fileName, 'r'):
                outFile.write(line)
        outFile.close()
        print('%s is constructed' % outputFile)

    if __name__ == '__main__':
        INPUTFILE_DIR = 'D:/data/ucsc/mm8'
        OUTPUTFILE = 'D:/data/ucsc/genome/mm8'
        concatenate(INPUTFILE_DIR, OUTPUTFILE)

Then I construct the pygr files with this program:

    import os
    from pygr import seqdb

    def make_blast_db(inputFileDir):
        os.chdir(inputFileDir)
        fileList = os.listdir(inputFileDir)
        # Build a pygr BlastDB for each file in the directory
        for fileName in fileList:
            seqdb.BlastDB(fileName)

    if __name__ == '__main__':
        # CREATE PYGR RESOURCE
        INPUTFILE_DIR = 'D:/data/ucsc/genome'
        make_blast_db(INPUTFILE_DIR)

On May 12, 3:32 pm, Istvan Albert <istvan.alb...@gmail.com> wrote:
> On May 12, 6:08 am, michel bellis <fill.i...@9online.fr> wrote:
>
> > I tried your code and got a lot of files longer than 1 000 000. But I
> > do not understand what the limit is exactly.
>
> The number refers to the length of the sequence, not the size of the
> file. The limit of a 32-bit signed integer is 2,147,483,647.
>
> The point that Chris was making is that each human chromosome is at
> most 245 million bp long, so how could you end up with sequences that
> are over 2 billion long?
>
> Istvan