Re: [pygr] Problems Building NLMSA from MAF

Namshin Kim Thu, 16 Sep 2010 18:28:19 -0700

Hi Chris,

I think you should check whether the given chromosome is actually in
your SequenceFileDB.


i.e., look up the given chromosome directly:

>>> hg19['chr6_qbl_hap6']
>>> ponAbe2['chr6']

I don't see any problems with your NLMSA building script.

If you want to build the NLMSA from the UCSC Genome Browser MULTIZ
alignments rather than custom ones, I suggest another solution. You can
download all the genomes and pre-built NLMSA files from
http://biodb.bioinformatics.ucla.edu/PYGRDATA/ and
http://biodb.bioinformatics.ucla.edu/GENOMES/

Or you can pass the download=True option to automatically download and build
the SequenceFileDB and NLMSA from biodb2.bioinformatics.ucla.edu:

# Save your resources in '.' (the current directory) and connect to the
# biodb2 pygr resources via http://biodb2.bioinformatics.ucla.edu:5000
WORLDBASEPATH = '.,http://biodb2.bioinformatics.ucla.edu:5000'

from pygr import worldbase
worldbase('Bio.MSA.UCSC.hg19_multiz46way', download=True)
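
WORLDBASEPATH is normally set as an environment variable in the shell. If you would rather set it from inside a script, a sketch along these lines may work, assuming pygr reads the variable when worldbase is first used. The helper names worldbase_path and fetch_msa are hypothetical, not pygr functions.

```python
import os


def worldbase_path(local_dir, server_url):
    """Join a local directory and a remote server into a WORLDBASEPATH value."""
    # The value is a comma-separated list of locations.
    return ','.join([local_dir, server_url])


# Set the variable before pygr is imported (assumption: pygr picks it up
# from the environment).
os.environ['WORLDBASEPATH'] = worldbase_path(
    '.', 'http://biodb2.bioinformatics.ucla.edu:5000')

try:
    from pygr import worldbase
except ImportError:
    worldbase = None  # pygr is not installed in this environment


def fetch_msa(name='Bio.MSA.UCSC.hg19_multiz46way'):
    """Download and build the named resource; needs pygr and network access."""
    if worldbase is None:
        raise RuntimeError('pygr is not installed')
    return worldbase(name, download=True)
```

Calling fetch_msa() then performs the same download as the two lines above it.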

--
Namshin Kim




On Fri, Sep 17, 2010 at 6:29 AM, Chris Fuller <chriskful...@gmail.com> wrote:

> Hello Pygr-dev,
>
> It seems that others have previously run into warnings like:
>
>  *** WARNING: Unknown sequence hg19.chr6_qbl_hap6 ignored...
>  *** WARNING: Unknown sequence panTro2.chr6 ignored...
>  *** WARNING: Unknown sequence ponAbe2.chr6 ignored...
>
> when building an NLMSA using MAF files.  I'm running into thousands of
> these when using multiz46way and six corresponding genomes, all
> downloaded from UCSC.  With grep I can verify that, for instance,
> ponAbe2.chr6 references exist in chr6.maf and that my ponAbe2 fasta
> file really contains a >chr6 header.
>
> How can I determine if these errors originate in my files or my pygr
> code?  Any suggestions?
>
> Thank you,
>
> Chris
>
> Chris Fuller
> ch...@genome.ucsf.edu
>
> The code I'm using (in Eclipse) is:
>
> import os, glob
> from pygr import cnestedlist,seqdb
>
> # Create list of full paths to all MAF files involved
> maf_path_string = '/home/chris/Storage/Data/Public/Human/hg19_MAF'
> maf_files_list = glob.glob(maf_path_string + '/*.maf')
>
> # Create list of full paths to each Genome in single FASTA format
> genomes ={}
> seqlist = ['hg19','panTro2', 'ponAbe2', 'rheMac2', 'mm9', 'rn4']
> genomes_path_string = '/home/chris/Storage/Data/Public/Genomes/single_file'
> seqlist_path = []
> for i in range(len(seqlist)):
>    seqlist_path.append(genomes_path_string + '/' + seqlist[i])
>
> for orgstr in seqlist_path:
>    genomes[orgstr] = seqdb.SequenceFileDB(orgstr)
> genomeUnion = seqdb.PrefixUnionDict(genomes)
>
> # Now build it:
> NLMSA_path = '/home/chris/Storage/Data/Public/Human/hg19_MAF/NLMSA'
> msa = cnestedlist.NLMSA(pathstem=NLMSA_path, mode='w',
> seqDict=genomeUnion, mafFiles=maf_files_list, bidirectional=False)
> msa.build(saveSeqDict=True)
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "pygr-dev" group.
> To post to this group, send email to pygr-...@googlegroups.com.
> To unsubscribe from this group, send email to
> pygr-dev+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/pygr-dev?hl=en.
>
>
