[pygr] Re: multiz44way?

Christopher Lee Thu, 03 Sep 2009 13:37:38 -0700

Hi Paul,
Yes, it should also get all dependencies with download=True.

How this is supposed to work: when a resource is saved to worldbase,
its dependencies should also have worldbase IDs.  For example, in the
case of the 44-genome alignment, its seqDict (a PrefixUnionDict)
contains references to each of the 44 genomes.  When the alignment is
pickled by worldbase, pickling automatically recurses into the
dependencies, including each of the 44 genomes.  Those genomes should
each be marked with its worldbase ID; in that case pickling simply
saves the worldbase ID rather than embedding the pickle of the genome
in the pickle of the alignment.  When Pygr unpickles the alignment, it
simply launches new worldbase requests for these IDs in order to get
the genomes.  If the alignment was requested with download=True, the
new requests are also made with download=True.
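
To illustrate the mechanism described above, here is a minimal sketch
built on the standard library's pickle persistent-ID hooks
(persistent_id / persistent_load).  It is not Pygr's actual code, and
it is written in present-day Python 3 syntax: Resource, REGISTRY,
REQUEST_LOG, save() and fetch() are hypothetical names invented for
the example, and the ID strings are used only as labels.

```python
import io
import pickle

REGISTRY = {}      # hypothetical stand-in for a worldbase server: ID -> pickle bytes
REQUEST_LOG = []   # records every (ID, download) request made by fetch()


class Resource:
    """Toy resource; wb_id stays None until the resource is saved."""

    def __init__(self, name, deps=()):
        self.name = name
        self.deps = list(deps)
        self.wb_id = None


class RefPickler(pickle.Pickler):
    def __init__(self, file, root):
        super().__init__(file)
        self.root = root

    def persistent_id(self, obj):
        # A dependency that carries an ID is stored as that ID only.
        # The root object itself is always pickled in full.
        if isinstance(obj, Resource) and obj is not self.root and obj.wb_id:
            return obj.wb_id
        return None


class RefUnpickler(pickle.Unpickler):
    def __init__(self, file, download):
        super().__init__(file)
        self.download = download

    def persistent_load(self, pid):
        # Each stored ID triggers a new request, made with the same
        # download flag as the request that is being unpickled.
        return fetch(pid, download=self.download)


def save(resource, wb_id):
    """Mark the resource with its ID and store its pickle under that ID."""
    resource.wb_id = wb_id
    buf = io.BytesIO()
    RefPickler(buf, root=resource).dump(resource)
    REGISTRY[wb_id] = buf.getvalue()


def fetch(wb_id, download=False):
    """Log the request, then unpickle the resource stored under wb_id."""
    REQUEST_LOG.append((wb_id, download))
    return RefUnpickler(io.BytesIO(REGISTRY[wb_id]), download).load()


# Save two genomes first, so that they carry IDs when the alignment is saved.
hg18 = Resource("hg18")
mm9 = Resource("mm9")
save(hg18, "Bio.Seq.Genome.HUMAN.hg18")
save(mm9, "Bio.Seq.Genome.MOUSE.mm9")
msa = Resource("multiz44way", deps=[hg18, mm9])
save(msa, "Bio.MSA.UCSC.hg18_multiz44way")

loaded = fetch("Bio.MSA.UCSC.hg18_multiz44way", download=True)
print(REQUEST_LOG)
```

In this sketch the single fetch() of the alignment produces one
further request per genome, each carrying the same download flag.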


When the alignment was first saved to worldbase, its genomes would
have been marked with their worldbase IDs if either
   - those genomes were themselves obtained from worldbase; or
   - those genomes were saved to worldbase in the same commit or a
     previous commit.

If a genome did not have a worldbase ID when the alignment was saved  
to worldbase, worldbase obviously cannot just save the genome as its  
worldbase ID, but instead must include its pickle in the alignment  
pickle.  In that case it will be retrieved along with the alignment,  
but it cannot be saved to worldbase locally (again because it has no  
ID).  That could conceivably lead to the situation you are seeing  
(where only the alignment is stored in your local metabase).
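
The fallback case can be sketched the same way.  Again this only
illustrates the idea with the standard library's pickle hooks, not
Pygr's implementation: Genome, Alignment, the store dictionary and
the "local_assembly" genome are hypothetical.

```python
import io
import pickle


class Genome:
    def __init__(self, name, wb_id=None):
        self.name = name
        self.wb_id = wb_id   # None: never saved to or obtained from worldbase


class Alignment:
    def __init__(self, genomes):
        self.genomes = genomes


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        if isinstance(obj, Genome) and obj.wb_id:
            return obj.wb_id   # genome has an ID: store a reference only
        return None            # no ID: the object is pickled in full


class RefUnpickler(pickle.Unpickler):
    def __init__(self, file, store):
        super().__init__(file)
        self.store = store     # hypothetical ID -> resource lookup
        self.requested = []

    def persistent_load(self, pid):
        self.requested.append(pid)
        return self.store[pid]


hg18 = Genome("hg18", wb_id="Bio.Seq.Genome.HUMAN.hg18")
scratch = Genome("local_assembly")   # hypothetical genome with no ID

buf = io.BytesIO()
RefPickler(buf).dump(Alignment([hg18, scratch]))

unpickler = RefUnpickler(io.BytesIO(buf.getvalue()), store={hg18.wb_id: hg18})
restored = unpickler.load()

# Only the genome that had an ID is requested by ID; the other one comes
# back as an embedded copy that still has no ID to be stored under.
print(unpickler.requested, restored.genomes[1].wb_id)
```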

-- Chris

On Sep 3, 2009, at 1:21 PM, Paul Rigor (gmail) wrote:

> No genomes were listed once I downloaded the 44way alignment.   
> Here's an ipython session
>
> In [1]: from pygr import worldbase
>
> In [2]: worldbase.dir()
> Out[2]:
> ['0root',
>  '0version',
>  'Bio.MSA.UCSC.hg18_multiz44way',
>  'Bio.MSA.UCSC.hg18_multiz44way.txt',
>  '__doc__.Bio.MSA.UCSC.hg18_multiz44way',
>  '__doc__.Bio.MSA.UCSC.hg18_multiz44way.txt']
>
>
> From my understanding, the multiple alignment is generated from MAF  
> files and the coordinates are mapped to existing genome resources.   
> So you're saying that with this single MSA download step, all of the  
> necessary genomes will also be downloaded?
>
> Thanks!
> Paul
