Hello everyone, I have just pushed the first version of Ensembl-in-UCSC code to my GitHub repository ('ucsc_ensembl' branch):
http://github.com/mkszuba/pygr/commit/193d2d24a472d2deb1090128168c057775185164

Its status is as follows:

1. Transcript and gene AnnotationDBs can be created and used without problems, talking directly to the UCSC MySQL server.

2. Exon annotations are successfully extracted from the transcripts they are embedded in and used to create a working exon AnnotationDB, with exon identifiers of the form 'transcript_id:rank'. This of course requires parsing all Ensembl transcripts in the UCSC database, which takes about 12 minutes. That is not a problem if we end up packing these databases into NLMSAs and storing them in worldbase, but allowing users to talk to UCSC directly would likely require a redesign which would e.g. only fetch and parse transcripts on demand.

3. Fetching original exon IDs from Ensembl does NOT work yet. All it should require is a SELECT query with a three-table join, but for reasons unclear to me I get a 'no database selected' error (to make the error appear, remove the try..except block from lines 27-31).

4. There is no support for protein annotations yet. In principle this is trivial to achieve: just use the table ensGtp to translate a protein ID to a transcript ID, then return the appropriate transcript data. However, I do not know how to attach SQLTable to a join result instead of a real table (NB. for the same reason I used cursor.execute when trying to get the Ensembl exon ID), and building this AnnotationDB on the client side (à la the one for exons) feels rather wasteful.

5. The list of Ensembl database names for different versions of their data is hardcoded. The names in question follow the format 'homo_sapiens_core_XX_YYY', and while 'XX', the actual version number, is trivial to obtain from UCSC, 'YYY' is not. Not sure how we should proceed here.

Cheers,
-- MS
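
P.S. To make the on-demand idea from point 2 a bit more concrete, here is a minimal sketch. It is not the code in the commit above. The names LazyExonLookup, parse_coords and fetch_row are made up for illustration, and it assumes that fetch_row returns the chrom, strand, exonStarts and exonEnds columns of one transcript row, with the exon coordinates stored as comma-separated strings. It also assumes that ranks are 1-based and counted in transcript order, so that minus-strand exons are reversed; the real code may do this differently.

```python
def parse_coords(text):
    """Turn a comma-separated coordinate string such as '100,300,' into ints."""
    return [int(field) for field in text.split(',') if field]


class LazyExonLookup(object):
    """Resolve 'transcript_id:rank' exon IDs, parsing a transcript on first use.

    fetch_row is a hypothetical callable: fetch_row(transcript_id) must return
    (chrom, strand, exon_starts, exon_ends) for that transcript.
    """

    def __init__(self, fetch_row):
        self._fetch_row = fetch_row
        self._cache = {}  # transcript_id -> (chrom, strand, [(start, end), ...])

    def _exons(self, transcript_id):
        # Only hit the database the first time a transcript is requested.
        if transcript_id not in self._cache:
            chrom, strand, starts, ends = self._fetch_row(transcript_id)
            pairs = list(zip(parse_coords(starts), parse_coords(ends)))
            if strand == '-':
                # Assumption: rank follows transcript order, not genomic order.
                pairs.reverse()
            self._cache[transcript_id] = (chrom, strand, pairs)
        return self._cache[transcript_id]

    def __getitem__(self, exon_id):
        transcript_id, rank_text = exon_id.rsplit(':', 1)
        rank = int(rank_text)
        chrom, strand, pairs = self._exons(transcript_id)
        if not 1 <= rank <= len(pairs):
            raise KeyError(exon_id)
        start, end = pairs[rank - 1]
        return chrom, strand, start, end
```

With something like this, asking for 'some_transcript:2' would cost one query for that transcript rather than a pass over the whole table, and later lookups on the same transcript come from the cache. Whether that fits the AnnotationDB interface is a separate question.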