Hello Pygr-dev,

It seems that others have previously run into warnings like:

 *** WARNING: Unknown sequence hg19.chr6_qbl_hap6 ignored...
 *** WARNING: Unknown sequence panTro2.chr6 ignored...
 *** WARNING: Unknown sequence ponAbe2.chr6 ignored...

when building an NLMSA from MAF files. I get thousands of these with
multiz46way and the six corresponding genomes, all downloaded from
UCSC. With grep I can verify that, for instance, ponAbe2.chr6 is
referenced in chr6.maf and that my ponAbe2 FASTA file really contains
a >chr6 header.

How can I determine whether these warnings originate in my files or in
my pygr code?  Any suggestions?

Thank you,

Chris

Chris Fuller
ch...@genome.ucsf.edu

The code I'm using (in Eclipse) is:

import os, glob
from pygr import cnestedlist,seqdb

# Create list of full paths to all MAF files involved
maf_path_string = '/home/chris/Storage/Data/Public/Human/hg19_MAF'
maf_files_list = glob.glob(maf_path_string + '/*.maf')

# Create list of full paths to each Genome in single FASTA format
genomes = {}
seqlist = ['hg19', 'panTro2', 'ponAbe2', 'rheMac2', 'mm9', 'rn4']
genomes_path_string = '/home/chris/Storage/Data/Public/Genomes/single_file'
seqlist_path = []
for name in seqlist:
    seqlist_path.append(genomes_path_string + '/' + name)

for orgstr in seqlist_path:
    genomes[orgstr] = seqdb.SequenceFileDB(orgstr)
genomeUnion = seqdb.PrefixUnionDict(genomes)

# Now build it:
NLMSA_path = '/home/chris/Storage/Data/Public/Human/hg19_MAF/NLMSA'
msa = cnestedlist.NLMSA(pathstem=NLMSA_path, mode='w',
                        seqDict=genomeUnion, mafFiles=maf_files_list,
                        bidirectional=False)
msa.build(saveSeqDict=True)
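
One check that might help separate "files" from "code" is to compare the
prefixes used in the MAF files against the keys of the dictionary handed to
PrefixUnionDict. The sketch below uses only the standard library and does not
call pygr. The helper names maf_source_names and unmatched_prefixes are
hypothetical, and it assumes (I have not confirmed this in the pygr source)
that the warning appears when a name of the form prefix.seqid from a MAF "s"
line cannot be resolved through the union dictionary.

```python
def maf_source_names(lines):
    """Collect distinct source names (e.g. 'ponAbe2.chr6') from MAF 's' lines.

    `lines` is any iterable of text lines, such as an open MAF file.
    """
    names = set()
    for line in lines:
        fields = line.split()
        # MAF sequence lines: s <src> <start> <size> <strand> <srcSize> <text>
        if len(fields) >= 7 and fields[0] == 's':
            names.add(fields[1])
    return names


def unmatched_prefixes(source_names, prefixes, separator='.'):
    """Return MAF prefixes (text before the first separator) missing from `prefixes`.

    Assumption: the union dictionary resolves 'prefix.seqid' by looking up
    'prefix' among its keys.
    """
    seen = set(name.split(separator, 1)[0] for name in source_names)
    return sorted(seen - set(prefixes))
```

With the variables from the script above, this could be called as
unmatched_prefixes(maf_source_names(open(maf_files_list[0])), genomes.keys()).
An empty list would suggest the dictionary keys are not the issue. Any
prefixes it returns are ones the dictionary has no matching key for, which
would point at how the genomes dictionary is keyed rather than at the
downloaded files.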

