[pygr] Re: not reinventing the wheel on Annotations

Christopher Lee Tue, 25 Aug 2009 18:01:20 -0700

On Aug 25, 2009, at 5:40 PM, jbiesinger wrote:
>>
> I know this would be a tall order, but it would be awesome to have
> some kind of automatic mapping between these feature types.  I can't
> speak for the earlier GFF versions, but GFF3 uses the Parent attribute
> to associate genes->mRNA->exons etc. Would you create the graph
> representing the complete set (or some part) of Parent-child
> relationships? In other words, have the children be accessible through
> the parent automagically, e.g.,
> myGene = annotDB['NM_027672']
> mRNAs = myGene.mRNA
> exons = mRNAs[0].exons
> for exon in exons:
>    print exon.start, exon.stop


Sure, why not -- these are all just one-to-many relations, right?  As  
long as the mapping information for each step (gene --> mRNAs;
mRNA --> exons) exists somewhere, we should be able to tell Pygr how to get
it and use it.  My first guess is we would model this as a one-to-many  
mapping with no edge information.

First question: where are these mappings stored?  In a database  
table?  If the schema is something straightforward (e.g. for gene -->  
mRNA, a table with an mRNA unique ID and a gene ID foreign key value),  
it should be easy to plug that in to Pygr using one of its standard  
classes like sqlgraph.SQLGraph.
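
As a rough illustration of the "straightforward" schema described above, here is a minimal sketch using only Python's standard sqlite3 module. The table name (mrna), the column names (mrna_id, gene_id), the sample IDs and the helper mrnas_for_gene are all hypothetical, chosen for this example; nothing here is Pygr's own API, and how such a table would be handed to sqlgraph.SQLGraph is not shown.

```python
import sqlite3

# Hypothetical schema: one row per mRNA, with the owning gene as a foreign key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mrna (mrna_id TEXT PRIMARY KEY, gene_id TEXT NOT NULL)"
)
# Index the foreign key so gene --> mRNAs lookups do not scan the whole table.
conn.execute("CREATE INDEX mrna_gene_idx ON mrna (gene_id)")
conn.executemany(
    "INSERT INTO mrna VALUES (?, ?)",
    [("mRNA1", "geneA"), ("mRNA2", "geneA"), ("mRNA3", "geneB")],  # made-up IDs
)


def mrnas_for_gene(conn, gene_id):
    """Return the mRNA IDs whose foreign key points at gene_id (one-to-many)."""
    rows = conn.execute(
        "SELECT mrna_id FROM mrna WHERE gene_id = ? ORDER BY mrna_id",
        (gene_id,),
    )
    return [row[0] for row in rows]


print(mrnas_for_gene(conn, "geneA"))
```

The same shape (child unique ID plus parent foreign key) would apply to the mRNA --> exons step; no edge information is stored, only the link itself.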

>
> I'm thinking of mapping from the annotationDB (gff file) to itself.
> Would you need to create a separate table defining these relationships
> (like splices in
> http://www.doe-mbi.ucla.edu/~leec/newpygrdocs/tutorials/worldbase.html#worldbase-schema-a-simple-framework-for-managing-database-schemas)?

If the GFF mappings are stored in flat files (instead of a database),  
I guess we'd have some extra work to provide truly scalable solutions  
for:
- parsing the file
- building an on-disk index (using something like shelve or sqlite) so  
we can work with these datasets in Python without always having the  
first step be "load all the data into memory".  Python's support for  
shelve seems to be waning, so sqlite seems like the way to go.  That  
would take us right back to the database case I outlined above, since  
Pygr's sqlgraph classes work transparently with sqlite / MySQL...
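
The two steps listed above (parse the file, build an on-disk index) might look roughly like the following sketch. It assumes a GFF3 file with nine tab-separated columns and ID / Parent attributes in the ninth, and it is written for current Python 3 and the standard sqlite3 module. The function names, the parent_child table and its columns, and the sample records are hypothetical and not part of Pygr. Features that carry a Parent but no ID are skipped here, which a real loader would need to handle.

```python
import sqlite3
from urllib.parse import unquote


def parse_gff3_parents(lines):
    """Yield (child_id, parent_id, feature_type) for every Parent link."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # annotation section is over; sequence data follows
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attrs = {}
        for field in cols[8].split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key.strip()] = value.strip()
        child = attrs.get("ID")
        if child is None or "Parent" not in attrs:
            continue  # simplification: only features with both ID and Parent
        # A feature may list several parents, separated by commas.
        for parent in attrs["Parent"].split(","):
            yield unquote(child), unquote(parent), cols[2]


def build_index(lines, db_path):
    """One-time load of the Parent links into an sqlite table (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    with conn:  # commit everything together
        conn.execute(
            "CREATE TABLE IF NOT EXISTS parent_child "
            "(parent_id TEXT, child_id TEXT, child_type TEXT)"
        )
        conn.execute(
            "CREATE INDEX IF NOT EXISTS parent_idx ON parent_child (parent_id)"
        )
        # The generator is consumed row by row, so the file is never fully in memory.
        conn.executemany(
            "INSERT INTO parent_child VALUES (?, ?, ?)",
            ((p, c, t) for c, p, t in parse_gff3_parents(lines)),
        )
    return conn


# Made-up sample records; with a real file, pass the open file object instead.
sample = [
    "##gff-version 3\n",
    "chr1\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene1\n",
    "chr1\t.\tmRNA\t1000\t9000\t.\t+\t.\tID=mrna1;Parent=gene1\n",
    "chr1\t.\texon\t1000\t1200\t.\t+\t.\tID=exon1;Parent=mrna1\n",
]
conn = build_index(sample, ":memory:")
children = conn.execute(
    "SELECT child_id, child_type FROM parent_child WHERE parent_id = ?",
    ("gene1",),
).fetchall()
print(children)
```

Passing a file path instead of ":memory:" gives the on-disk index, so later sessions can query the links without re-parsing the GFF file.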

If the data are in a flat file, could you provide some example code  
for simply parsing it?  If you just show me how you would load the  
data into memory, I can add code for turning it into an sqlite index  
(a one-time operation), which would then be usable by SQLGraph.

>
> It would be great to have all of this information available via pygr.

Let's make it happen!

-- Chris

