Hi Chris,

I think you should check whether the given chromosome is actually in the SequenceFileDB.
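Because the real question is whether the warnings come from the files or from the pygr code, it can help to compare the files directly first, without pygr. The sketch below is plain Python. The helper names (maf_sequence_names, fasta_ids, missing_sequences, check_files) are hypothetical and not part of pygr. It assumes that each MAF 's' line names its sequence as assembly.chromosome (e.g. ponAbe2.chr6), that SequenceFileDB uses the first whitespace-delimited word of each FASTA header as the sequence ID, and that each genome is a single FASTA file named exactly after its prefix, as in the script quoted below.

```python
import glob
import os
from collections import defaultdict


def maf_sequence_names(lines):
    """Group the sequence names on MAF 's' lines by assembly prefix.

    Returns a dict such as {'ponAbe2': {'chr6', ...}, 'hg19': {...}}.
    """
    names = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) > 1 and fields[0] == 's':
            # Assumption: the source field looks like 'ponAbe2.chr6';
            # split on the first dot only, so 'hg19.chr6_qbl_hap6' stays whole.
            assembly, _, chrom = fields[1].partition('.')
            names[assembly].add(chrom)
    return names


def fasta_ids(lines):
    """Return the sequence IDs in FASTA lines (first word after '>')."""
    ids = set()
    for line in lines:
        if line.startswith('>'):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def missing_sequences(maf_names, ids_by_assembly):
    """Map each assembly with a FASTA file to the MAF names its FASTA lacks.

    Assemblies that appear only in the MAF files are skipped.
    """
    missing = {}
    for assembly, known in ids_by_assembly.items():
        absent = maf_names.get(assembly, set()) - known
        if absent:
            missing[assembly] = sorted(absent)
    return missing


def check_files(maf_dir, genome_dir, prefixes):
    """Compare every *.maf in maf_dir with the FASTA files genome_dir/<prefix>."""
    maf_names = defaultdict(set)
    for path in sorted(glob.glob(os.path.join(maf_dir, '*.maf'))):
        with open(path) as maf:
            for assembly, chroms in maf_sequence_names(maf).items():
                maf_names[assembly] |= chroms

    ids_by_assembly = {}
    for prefix in prefixes:
        # Assumption: one FASTA file per genome, named exactly after its
        # prefix (e.g. .../single_file/hg19), as in the question's script.
        with open(os.path.join(genome_dir, prefix)) as fasta:
            ids_by_assembly[prefix] = fasta_ids(fasta)

    missing = missing_sequences(maf_names, ids_by_assembly)
    for assembly in sorted(missing):
        print('%s: %d MAF sequence name(s) not in FASTA, e.g. %s'
              % (assembly, len(missing[assembly]),
                 ', '.join(missing[assembly][:5])))
    return missing
```

For example, check_files('/home/chris/Storage/Data/Public/Human/hg19_MAF', '/home/chris/Storage/Data/Public/Genomes/single_file', ['hg19', 'panTro2', 'ponAbe2', 'rheMac2', 'mm9', 'rn4']) prints one line per genome whose FASTA lacks some of the names used in the MAF files. Assemblies that appear in the MAF files but have no FASTA supplied (most of the 46 in multiz46way) are skipped. If nothing is reported, the names in the files agree, and it is worth double-checking how the PrefixUnionDict is built: its prefixes are the keys of the dictionary passed to it, so those keys need to match the assembly names used in the MAF files ('hg19', 'ponAbe2', ...).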
For a quick interactive check, look the chromosome up directly in each SequenceFileDB (here hg19 and ponAbe2 are the SequenceFileDB objects for those genomes):

>>> hg19['chr6_qbl_hap6']
>>> ponAbe2['chr6']

I don't see any problems with your NLMSA building script.

If you want to build the NLMSA from the UCSC Genome Browser's MULTIZ alignments rather than from custom ones, I suggest another solution. You can download all the genomes and pre-built NLMSA files from:

http://biodb.bioinformatics.ucla.edu/PYGRDATA/
http://biodb.bioinformatics.ucla.edu/GENOMES/

Alternatively, you can pass the download=True option to download and build the SequenceFileDB and NLMSA automatically from biodb2.bioinformatics.ucla.edu. First set the WORLDBASEPATH environment variable, for example in your shell, so that your resources are saved in '.' (the current directory) and the biodb2 pygr resources are reached via http://biodb2.bioinformatics.ucla.edu:5000:

export WORLDBASEPATH='.,http://biodb2.bioinformatics.ucla.edu:5000'

Then, in Python:

from pygr import worldbase
worldbase('Bio.MSA.UCSC.hg19_multiz46way', download=True)

--
Namshin Kim

On Fri, Sep 17, 2010 at 6:29 AM, Chris Fuller <chriskful...@gmail.com> wrote:
> Hello Pygr-dev,
>
> It seems that others have previously run into warnings like:
>
> *** WARNING: Unknown sequence hg19.chr6_qbl_hap6 ignored...
> *** WARNING: Unknown sequence panTro2.chr6 ignored...
> *** WARNING: Unknown sequence ponAbe2.chr6 ignored...
>
> when building an NLMSA using MAF files. I'm running into thousands of
> these when using multiz46way and six corresponding genomes, all
> downloaded from UCSC. With grep I can verify that, for instance,
> ponAbe2.chr6 references exist in chr6.maf and that my ponAbe2 fasta
> file really contains a >chr6 header.
>
> How can I determine if these errors originate in my files or my pygr
> code? Any suggestions?
>
> Thank you,
>
> Chris
>
> Chris Fuller
> ch...@genome.ucsf.edu
>
> The code I'm using (in Eclipse) is:
>
> import os, glob
> from pygr import cnestedlist, seqdb
>
> # Create list of full paths to all MAF files involved
> maf_path_string = '/home/chris/Storage/Data/Public/Human/hg19_MAF'
> maf_files_list = glob.glob(maf_path_string + '/*.maf')
>
> # Create list of full paths to each Genome in single FASTA format
> genomes = {}
> seqlist = ['hg19', 'panTro2', 'ponAbe2', 'rheMac2', 'mm9', 'rn4']
> genomes_path_string = '/home/chris/Storage/Data/Public/Genomes/single_file'
> seqlist_path = []
> for i in range(len(seqlist)):
>     seqlist_path.append(genomes_path_string + '/' + seqlist[i])
>
> for orgstr in seqlist_path:
>     genomes[orgstr] = seqdb.SequenceFileDB(orgstr)
> genomeUnion = seqdb.PrefixUnionDict(genomes)
>
> # Now build it:
> NLMSA_path = '/home/chris/Storage/Data/Public/Human/hg19_MAF/NLMSA'
> msa = cnestedlist.NLMSA(pathstem=NLMSA_path, mode='w',
>                         seqDict=genomeUnion, mafFiles=maf_files_list,
>                         bidirectional=False)
> msa.build(saveSeqDict=True)
>
> --
> You received this message because you are subscribed to the Google Groups
> "pygr-dev" group.
> To post to this group, send email to pygr-...@googlegroups.com.
> To unsubscribe from this group, send email to
> pygr-dev+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/pygr-dev?hl=en.