On Nov 11, 2009, at 2:37 PM, Marek Szuba wrote:
>
> On Wed, 11 Nov 2009 12:34:00 -0800
> "C. Titus Brown" <c...@msu.edu> wrote:
>
>>> 1. Can we live without original Ensembl exon IDs?
>> I can ;).  But I don't understand why we would need to.
> Well, if we absolutely and positively NEED Ensembl exon IDs then the
> whole idea of interfacing with Ensembl via UCSC is useless and we need
> to get back to talking directly to Ensembl.

I think you're expressing this too pessimistically.  At a minimum, the
UCSC exon annotation should make it dramatically easier to join against
the Ensembl exon annotations.  In 90% of cases exon annotations from
these two sources will be uniquely mappable to each other simply by
matching their sequences; the remaining ambiguities should be
resolvable from the context of their neighboring exons.

In other words, let's consider a hierarchy of three approaches, ordered
by increasing amount of work:

1. UCSC supplies Ensembl exon IDs, so we're done.

2. We run an automated JOIN process that builds a mapping of UCSC exon
   annotations to Ensembl exon IDs.  If a tiny fraction cannot be
   mapped, that is not a big problem.  We make this mapping available
   in Worldbase.

3. We give up on UCSC altogether and go back to trying to figure out
   how to map Ensembl coordinates to UCSC genome coordinates.  But
   we've been trying to get that information for 2 - 3 years now with
   no success.

If option #1 is out, then let's consider option #2.

>
>> Yes, it should be easy to do.  I'm not entirely sure what the best
>> mechanism will be though; is the problem that individual exon info
>> will have to be extracted from the blob dynamically?
> Pretty much...

This is no problem.  We just need to decide on a scheme for assigning
each UCSC exon a unique ID.  I guess I'd advocate just using a string
consisting of chromosome ID + start + stop, e.g. something like
"1.10000:10150".

>
>> I could imagine an ensGene wrapper object with an exons list-like
>> object that in turn dynamically pulls exon information out of the
>> blobs, e.g. code like this
> [...]
>> Is this the sort of thing we need?
> Yup, that's it - possibly accompanied by caching of already-located
> exons.

Sure, no problem.  Using the exon ID scheme proposed above, this becomes
utterly trivial -- a class whose __getitem__() just echoes back the
chromosome ID, start, stop for the annotation DB to look up from the
genome db...

-- Chris
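
To make the ID scheme and the option #2 join concrete, here is a rough
sketch in plain Python.  It does not use the pygr API at all: the names
make_exon_id, parse_exon_id, ExonSliceInfo and map_unique_by_sequence
are hypothetical, the "<chrom>.<start>:<stop>" layout is just read off
the example ID above, and the join only handles the uniquely matching
case (resolving ambiguities from neighboring exons is left out).

```python
import re
from collections import Counter, defaultdict

# Assumed layout, taken from the example ID "1.10000:10150":
# <chromosome ID>.<start>:<stop>
_EXON_ID = re.compile(r"^(?P<chrom>.+)\.(?P<start>\d+):(?P<stop>\d+)$")


def make_exon_id(chrom, start, stop):
    """Build an ID string such as '1.10000:10150'."""
    return f"{chrom}.{start}:{stop}"


def parse_exon_id(exon_id):
    """Split an ID into (chrom, start, stop); raise KeyError if malformed."""
    match = _EXON_ID.match(exon_id)
    if match is None:
        raise KeyError(exon_id)
    return match["chrom"], int(match["start"]), int(match["stop"])


class ExonSliceInfo:
    """Dict-like object whose __getitem__ echoes back the coordinates
    encoded in the ID, so no per-exon table has to be stored."""

    def __getitem__(self, exon_id):
        return parse_exon_id(exon_id)


def map_unique_by_sequence(ucsc_exons, ensembl_exons):
    """Join UCSC exon IDs to Ensembl exon IDs by exact sequence match.

    Both arguments are plain dicts {exon_id: sequence}.  An exon is
    mapped only when its sequence occurs exactly once on each side;
    everything else is returned in the 'unresolved' list.
    """
    ensembl_by_seq = defaultdict(list)
    for ens_id, seq in ensembl_exons.items():
        ensembl_by_seq[seq].append(ens_id)
    ucsc_seq_counts = Counter(ucsc_exons.values())

    mapping, unresolved = {}, []
    for ucsc_id, seq in ucsc_exons.items():
        hits = ensembl_by_seq.get(seq, [])
        if len(hits) == 1 and ucsc_seq_counts[seq] == 1:
            mapping[ucsc_id] = hits[0]
        else:
            unresolved.append(ucsc_id)
    return mapping, unresolved
```

With this, ExonSliceInfo()["1.10000:10150"] hands back the chromosome
ID, start and stop directly, and the unresolved list is what a second
pass using neighboring-exon context would need to work through.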