Hello Pygr-dev,

It seems that others have previously run into warnings like:
    *** WARNING: Unknown sequence hg19.chr6_qbl_hap6 ignored...
    *** WARNING: Unknown sequence panTro2.chr6 ignored...
    *** WARNING: Unknown sequence ponAbe2.chr6 ignored...

when building an NLMSA using MAF files. I'm running into thousands of these when using multiz46way and six corresponding genomes, all downloaded from UCSC. With grep I can verify that, for instance, ponAbe2.chr6 references exist in chr6.maf and that my ponAbe2 FASTA file really contains a >chr6 header.

How can I determine if these errors originate in my files or my pygr code? Any suggestions?

Thank you,
Chris

Chris Fuller
ch...@genome.ucsf.edu

The code I'm using (in Eclipse) is:

    import os, glob
    from pygr import cnestedlist, seqdb

    # Create list of full paths to all MAF files involved
    maf_path_string = '/home/chris/Storage/Data/Public/Human/hg19_MAF'
    maf_files_list = glob.glob(maf_path_string + '/*.maf')

    # Create list of full paths to each Genome in single FASTA format
    genomes = {}
    seqlist = ['hg19', 'panTro2', 'ponAbe2', 'rheMac2', 'mm9', 'rn4']
    genomes_path_string = '/home/chris/Storage/Data/Public/Genomes/single_file'
    seqlist_path = []
    for i in range(len(seqlist)):
        seqlist_path.append(genomes_path_string + '/' + seqlist[i])
    for orgstr in seqlist_path:
        genomes[orgstr] = seqdb.SequenceFileDB(orgstr)
    genomeUnion = seqdb.PrefixUnionDict(genomes)

    # Now build it:
    NLMSA_path = '/home/chris/Storage/Data/Public/Human/hg19_MAF/NLMSA'
    msa = cnestedlist.NLMSA(pathstem=NLMSA_path, mode='w',
                            seqDict=genomeUnion, mafFiles=maf_files_list,
                            bidirectional=False)
    msa.build(saveSeqDict=True)
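
The grep check described above can also be scripted, which makes it easier to see whether the names in the MAF files line up with the names the sequence dictionary would expose. The following is only a minimal sketch in plain Python, without pygr: the helper names (maf_source_names, fasta_ids, union_names, unknown_sources), the sample lines, and the '/some/dir' paths are all hypothetical. It also assumes that a prefix dictionary exposes sequences under names of the form prefix + '.' + sequence ID, which should be confirmed against the pygr documentation before drawing conclusions.

```python
def maf_source_names(lines):
    """Collect the source-name field of every MAF sequence ('s') line."""
    names = set()
    for line in lines:
        fields = line.split()
        # An 's' line has 7 fields: s, src, start, size, strand, srcSize, text
        if len(fields) >= 7 and fields[0] == 's':
            names.add(fields[1])
    return names


def fasta_ids(lines):
    """Collect sequence IDs (first word after '>') from FASTA header lines."""
    ids = set()
    for line in lines:
        if line.startswith('>'):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def union_names(prefix_to_ids, separator='.'):
    """Build the 'prefix.seqID' names for a prefix -> IDs mapping.

    Assumption: this mirrors how a prefix dictionary names its sequences.
    """
    names = set()
    for prefix, ids in prefix_to_ids.items():
        for seq_id in ids:
            names.add(prefix + separator + seq_id)
    return names


def unknown_sources(maf_names, known_names):
    """Return MAF source names with no exact match among the known names."""
    return sorted(maf_names - known_names)


# Hypothetical sample data standing in for chr6.maf and a genome FASTA file
maf_lines = [
    "##maf version=1",
    "a score=1234.0",
    "s hg19.chr6    1000 10 + 171115067 ACGTACGTAC",
    "s ponAbe2.chr6 2000 10 + 174210431 ACGTACGTAC",
    "",
]
fasta_lines = [">chr6", "ACGTACGTAC"]

maf_names = maf_source_names(maf_lines)
ids = fasta_ids(fasta_lines)

# Dictionary keyed by assembly name versus keyed by full file path
by_name = union_names({'hg19': ids, 'ponAbe2': ids})
by_path = union_names({'/some/dir/hg19': ids, '/some/dir/ponAbe2': ids})

print(unknown_sources(maf_names, by_name))  # []
print(unknown_sources(maf_names, by_path))  # ['hg19.chr6', 'ponAbe2.chr6']
```

In this sketch the MAF names only resolve when the dictionary keys are the bare assembly names, so comparing the keys actually passed to the prefix dictionary against the prefixes used in the MAF files is one thing worth ruling out. Names such as hg19.chr6_qbl_hap6 would additionally stay unresolved whenever the FASTA file simply lacks that haplotype sequence.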